
Metabolic Systems Biology of Leishmania major University of Minho, 2019 v

ACKNOWLEDGEMENTS

I am extremely grateful to my supervisors, Dr. Isabel Cristina de Almeida Pereira da Rocha and Dr. Sónia Madalena Azevedo Carneiro, for their critical guidance and suggestions throughout my Ph.D. studies. I also gratefully acknowledge their valuable scientific advice and the opportunity they gave me to explore this field widely and independently.

I gratefully express my appreciation to SilicoLife Lda and CEB (University of Minho) for providing the infrastructure required for my research. I especially thank Simao Soares, CEO of SilicoLife Lda, for providing such a good platform and for all his help during my work. I also thank Bruno Pereira (systems biologist at SilicoLife Lda) and Hugo Giesteira (former programmer at SilicoLife Lda) for scientific and technical assistance during various phases of the work. I am very grateful to the BiSBII group (Systems Biology group at the University of Minho) for technical suggestions on my project and for improving my knowledge of this field. My special thanks go to Sara Correia (Researcher at the University of Minho) for helping me with Java programming during my project. I am also thankful to the Initial Training Network GlycoPar, funded by the FP7 Marie Curie Actions of the European Commission (FP7-PEOPLE-2013-ITN-608295), for financial support.

My research journey began at the National Bureau of Animal Genetic Resources, Karnal (India), where I did my first research internship under the supervision of Dr. Dinesh Kumar during my first university education at IIT Guwahati (India). I am extremely grateful for his motivational advice, which helped me to start my research career. I am also very thankful to my previous supervisor, Prof. Vikash Dubey, and co-supervisor, Prof. Arun Goyal, for their scientific guidance on my graduation (B.Tech) project at IIT Guwahati. I also thank Dr. Peter Ashton for supervising my master's degree (Master of Research) at the University of York (UK), and Professor Tero Attokallio (EMBL Group Leader at FIMM, University of Helsinki, Finland) for in-depth suggestions and guidance on the project that I completed under his supervision after my master's degree. My special thanks go to all the professors, teachers, and researchers at IIT Guwahati (India), IIT Delhi (India), the University of York (UK), the University of Helsinki (Finland) and the University of Oslo (Norway) for their scientific guidance during my graduation, post-graduation and internships. I also want to thank my school teachers for my basic education and for their career-oriented guidance.

I would like to thank my friends Dr. Arunav Pradhan, Dr. Dileep Bishi, Neelam, Shweta Singh, Dr. Mouli Thalluri, Balaji Sompalle, Ramya, Dr. Sandeep Kaushik, Shailendra, Nidhi, Dr. Praveen Sher and others for brightening my life during this phase. I also thank my college friends Vipin Ranga and Hemant Bhaskar for their valuable support. Lastly, I express special thanks to my wife, Sujata Tantuway, who has always been with me to provide moral support and inspiration, and to my parents for their unconditional support and encouragement during this special phase of my life.


ABSTRACT Metabolic Systems Biology of Leishmania major

Protozoan parasitic diseases such as leishmaniasis, toxoplasmosis, and sleeping sickness are among the major causes of death worldwide. The emerging drug resistance of parasitic species and their adaptive mechanisms of infection are a major concern in developing medical treatments. Understanding the genotype/phenotype of parasites during infection can help in developing effective anti-parasitic therapies. In recent years, systems biology approaches, in particular genome-scale metabolic modelling, have been proposed to attain such understanding. This methodology allows omics data (e.g. transcriptomics, proteomics and metabolomics) to be incorporated to understand the stage-specific metabolism of many organisms, including protozoan parasites such as Leishmania, Toxoplasma, and Plasmodium. Metabolic behaviour during infection can thus be characterized, including the metabolic involvement of small or complex carbohydrates, which has so far been poorly studied at the systems level. This thesis aims to present a comprehensive understanding of the metabolism of protozoan parasites and, specifically, the influence of glycans and glycoconjugates during infection. First, a review of the different data types and methodologies used to study glycans and glycoconjugates in protozoan parasites, from their structures to their functions, is presented. Glycobiology databases, in particular glycomic and glycoproteomic data resources, were extensively reviewed, addressing problems in accessing and integrating data that arise from inconsistencies in the identification and representation of glycans and from poor inter-linkage between databases. The focus is on graphic and text-based glycan structural notations and on the available tools for interconverting these encoding formats, in order to improve inter-linkage and interoperability among glycomic databases.

Next, metabolic modelling of parasitic cells using omics data (e.g. transcriptomics, proteomics, and glycomics) is used to understand metabolism and the role of carbohydrates at the systems level. To explore the metabolism of human protozoan parasites, a constraint-based metabolic model of L. major, iAC560, was used as a case study and extended (ext-iAC560) to include pathways for the metabolism of lipids and larger fatty acids and for the biosynthesis of carbohydrates. Flux Balance Analysis (FBA) was used to simulate the metabolism of Leishmania under promastigote (glucose-rich environment) and amastigote (amino acid, amino sugar, and lipid-rich environment) conditions. The model also helped to assess which metabolic pathways are active or inactive in the synthesis of sugar nucleotides, which are essential precursors in the biosynthesis of glycans and glycoconjugates, in the promastigote and amastigote stages. Furthermore, to improve metabolic predictions, the Gene Inactivity Moderated by Metabolism and Expression (GIMME) algorithm was used with flux-based simulations to improve the consistency between predicted fluxes and gene expression data of L. major in the promastigote stage. Applying GIMME improved the flux distribution across various pathways and helped to understand the metabolism of the Leishmania promastigote stage.

Gene deletion analysis using the ext-iAC560 model predicted 53 potential drug target genes in L. major. Many of these genes have already been characterized as essential in other protozoan species, while 10 genes (LmjF35.5330, LmjF36.2540, LmjF32.1960, LmjF33.0680, LmjF28.1280, LmjF21.1430, LmjF09.1040, LmjF06.1070, and LmjF06.0350 in both the promastigote and amastigote stages, and LmjF36.6950 only in the amastigote stage) are predicted as novel drug targets. More than 70% of the predicted essential genes showed a lethal phenotype by preventing the biosynthesis of more than two cellular building blocks. The predicted novel essential genes are associated with lipid and fatty acid biosynthesis, but their essentiality has not been tested in human protozoan parasites or closely related species. Searches of the literature and chemical databases (e.g. DrugBank and TDR Target) found that around 80% of the predicted essential genes have enzyme-based inhibitors in parasitic and non-parasitic species. Most of these enzyme-based inhibitors have not been tested in Leishmania species; however, molecules such as Carbamide phenylacetate, CDV, Terbinafine, and 4-(dimethylaminomethyl)-2,6-di(propan-2-yl)phenol, which have been tested against essential genes in other protozoan species, are of major interest, as they are more likely to elicit similar responses in Leishmania.


RESUMO Metabolic Systems Biology of Leishmania major

Parasitic diseases caused by protozoa, such as leishmaniasis, toxoplasmosis and sleeping sickness, are among the leading causes of death worldwide. The resistance of these parasites to certain drugs used in treatment and their capacity to adapt during infection are among the main factors to consider when developing medical treatments. Understanding the genotype/phenotype of these parasites during the infection process is important for developing more effective anti-parasitic therapies. More recently, systems biology approaches, and in particular the modelling of genome-scale metabolic networks, have been proposed to understand the metabolism of organisms under different growth conditions. These methodologies allow the incorporation of omics data (e.g. transcriptomics, proteomics and metabolomics), which improve the characterization of the metabolism of organisms, including protozoan parasites such as Leishmania, Toxoplasma and Plasmodium.

This thesis aims to present a comprehensive review of the metabolism of protozoan parasites and, specifically, the involvement of different sugars and glycoconjugates during the infection process. To this end, the different data types and methodologies used to study these molecules and their structures, as well as their functions in protozoan parasites, were explored. Glycobiology databases, and in particular omics data related to these molecules, were analysed, addressing the problem of data access and integration caused by inconsistencies in the identification and representation of these data. The graphical and textual representation of these structures was carefully reviewed, and different bioinformatics tools for interconverting the different formats were explored in order to improve the inter-linkage and interoperability between databases.

To explore the metabolism of protozoan parasites, the metabolic model iAC560 of L. major was used as a case study. The model was extended (ext-iAC560) to cover the metabolism of some lipids and long-chain fatty acids and the biosynthesis of some carbohydrates. Metabolism was simulated mainly with methods based on Flux Balance Analysis (FBA), making it possible to describe the metabolism of Leishmania in the promastigote stage (glucose-rich medium) and in the amastigote stage (medium rich in lipids, amino acids and amino sugars). Methods such as GIMME (Gene Inactivity Moderated by Metabolism and Expression) were also used to improve the consistency between the predicted fluxes and gene expression data.

Additional analyses of gene essentiality were carried out, identifying 53 potential drug target genes in L. major. Many of these genes have already been characterized as essential in other protozoan species, while another 10 genes (LmjF35.5330, LmjF36.2540, LmjF32.1960, LmjF33.0680, LmjF28.1280, LmjF21.1430, LmjF09.1040, LmjF06.1070, and LmjF06.0350 in the promastigote and amastigote stages, and LmjF36.6950 in the amastigote stage) are predicted as novel drug targets. More than 70% of the predicted essential genes were shown to cause a lethal phenotype by preventing the biosynthesis of more than two essential cellular components when deleted under promastigote or amastigote conditions. Searches of the literature and chemical databases (e.g. DrugBank and TDR Target) showed that around 80% of the predicted essential genes have enzymatic inhibitors in parasitic and non-parasitic species. Most of these inhibitors have not been tested in Leishmania species; however, molecules such as carbamide phenylacetate, CDV, terbinafine and 4-(dimethylaminomethyl)-2,6-di(propan-2-yl)phenol, which have been tested against essential genes in other protozoan species, are of greater interest, as they are more likely to elicit similar responses in Leishmania.


LIST OF CONTENTS

Acknowledgements
Abstract
Resumo
List of Contents
List of Figures
List of Tables
Scientific Output
List of Abbreviations

Chapter 1: General Introduction
    Abstract
    1.1 Glycobiology of human protozoan parasites
        1.1.1 Glycans, glycoconjugates and their biosynthesis
        1.1.2 Biological roles of the glycans and glycoconjugates
    1.2 Data types and methodologies in Glycobiology
        1.2.1 Spectra-based methodologies
        1.2.2 Array-based methods
        1.2.3 Kinetic characterization of glycosyltransferase enzymes
        1.2.4 Functional analysis and phenotyping
    1.3 Data resources
    1.4 Glycoinformatic tools
        1.4.1 Glycan 3D structures analysis
        1.4.2 Glycosylation prediction
    1.5 Systematic study of human protozoan parasites
        1.5.1 Study of human parasites: traditional approaches and recent advances
        1.5.2 Metabolism of protozoan parasites: current knowledge and challenges
    1.6 Metabolic network modeling
        1.6.1 Metabolic network reconstruction, curation, and flux analyses
        1.6.2 Integration of omics data into metabolic network modeling
        1.6.3 Previous systems-wide metabolic studies
        1.6.4 System-based characterization of carbohydrates
    1.7 Objectives of the thesis
    1.8 Outline of the thesis
    1.9 References

Chapter 2: Accessing and Interlinking Databases in Glycobiology
    Abstract
    2.1 Introduction
    2.2 Glycomic resources and integration with other databases
    2.3 Notations to represent glycan structures
        2.3.1 Graphical representation of glycan structures
        2.3.2 Text-based representation of glycan structures
    2.4 Glycan notations in glycomic databases
    2.5 Tools to convert encoding formats for glycan structure
    2.6 Discussion and Conclusions
    2.7 Appendix 2a
    2.8 References

Chapter 3: Integrated Metabolic Flux and Omics Analysis of Leishmania major Metabolism
    Abstract
    3.1 Introduction
    3.2 Methodology
        3.2.1 Model extension and refinement
        3.2.2 Biomass composition
        3.2.3 In-silico media formulation
            3.2.3.1 Modified Media for Promastigote (MMP)
            3.2.3.2 Modified Media for Amastigote (MMA)
            3.2.3.3 Definition of reaction flux constraints
        3.2.4 Integration of gene expression data
    3.3 Results and Discussion
        3.3.1 Extended metabolic model
        3.3.2 Model simulations and phenotypes predictions
            3.3.2.1 Leishmania metabolism
            3.3.2.2 Sugar nucleotides (SugNuc) biosynthesis in promastigote and amastigote stage
            3.3.2.3 Phenotypic effects of gene knockouts in sugar nucleotides biosynthesis
        3.3.3 GIMME simulations and consistency analyses
            3.3.3.1 Gene coverage in ext-iAC560, transcriptomic and protein expression study
            3.3.3.2 Consistency between model predictions and transcriptomics data
    3.4 Conclusions
    3.5 Appendix 3a
    3.6 Appendix 3b
    3.7 Appendix 3c
    3.8 Appendix 3d
    3.9 References

Chapter 4: Gene Essentiality Analysis in Leishmania major
    Abstract
    4.1 Introduction
    4.2 Methodology
        4.2.1 Reaction knockout and phenotype predictions
        4.2.2 Gene essentiality prediction
        4.2.3 Identification of the non-human homologous genes
    4.3 Results and Discussion
        4.3.1 Reaction knockout phenotypes and model comparison
        4.3.2 Single/double gene deletion analysis
        4.3.3 Local or global effects of gene deletion
        4.3.4 Drug target identification
    4.4 Conclusions
    4.5 Appendix 4a
    4.6 Appendix 4b
    4.7 Appendix 4c
    4.8 Appendix 4d
    4.9 References

Chapter 5: Summary and Future Perspectives
    5.1 Summary
    5.2 Future perspectives
    5.3 References


LIST OF FIGURES

Figure 1.1 Overview of the different types of data from various experimental techniques used to characterize glycoconjugates in order to understand their structures and functions, biosynthetic pathways and associated genes/enzymes, interactions between sugar and non-sugar molecules, and phenotypic effects of altering the activity of glycosyltransferase (GT) enzymes or glycan binding proteins (GBPs).

Figure 1.2 Involvement of current omics technologies in understanding the metabolism and roles of carbohydrates in human protozoan parasites.

Figure 1.3 A general schematic representation of the energy metabolism pathways commonly found in most human protozoan parasites using glucose as carbon source.

Figure 1.4 Metabolic network modeling. A) Illustration of a metabolic reaction network. B) A metabolic reaction network can be represented mathematically by mass balance equations. C) The stoichiometric matrix represents the stoichiometry of all reactions in the network. D) Flux constraints can be imposed by defining minimum and maximum bounds for each reaction flux; vi and bi,j refer to internal and exchange fluxes in the network, respectively. E) Solving linear equations assuming a steady-state condition.

Figure 2.1 Illustration of various notations to represent the glycan (Gal)3(Glc)1(GlcNAc)2(cer)1 (KEGG ID: G01905). This glycan consists of three galactoses (Gal), two N-acetylglucosamines (GlcNAc) and a glucose (Glc) unit attached to a non-sugar molecule, ceramide (Cer) (not illustrated). Glycosidic bonds in IUPAC (2D) and CFG/Essentials are illustrated using IUPAC nomenclature, where “α” and “β” represent the anomeric linkage between different monosaccharides.

Figure 2.2 Machine-readable formats representing the following glycan structure: (Gal)3(Glc)1(GlcNAc)2(cer)1.

Figure 2.3 Use of graphic/text-based glycan notations across the sixteen glycomic databases included in the analysis.

Figure 2.4 The cross-references (or cross-links to other databases) used in different glycomic databases.

Figure 2.5 Tools for the interconversion of text-based glycan structural encoding formats. The colored circles represent tools that convert the input formats listed in the first column into the other formats shown in the first row. Blank grey boxes indicate that no tools are available to convert one format into the other.

Figure 2.6 Search tools used to access glycomic databases that can be queried using different input formats, graphic (e.g. CFG and GlycoCT) or text (e.g. LinearCode, GlycoCT (condensed), IUPAC, and KCF). For example, the GlycoCT format can be used on the GlycanBuilder platform to access information related to glycan structures from the GlycoWorkBench and UniCarbKB databases.

Figure 2.7 The search platform provided by the GlyTouCan database. Search queries are permitted using graphic, text-based and file-based inputs. The output of the query includes glycan information in WURCS, GlycoCT (condensed), IUPAC and CFG formats.

Figure 3.1 Workflow for integrating gene expression data into the metabolic model to calculate inconsistency and improve flux distribution predictions for the L. major promastigote. A) The Gene Inactivity Moderated by Metabolism and Expression (GIMME) algorithm was used to constrain the flux (i.e. zero and non-zero for reactions with associated gene expression values below and above the threshold, respectively) in the metabolic network representing the promastigote condition of L. major. Flux Variability Analysis (FVA) was used to analyze the solution space over different thresholds, while protein expression data provided further validation of the predicted reaction fluxes. B) An example scheme for calculating inconsistency scores (IS) using gene expression and flux values in a given metabolic network. The IS was calculated assuming synthesis of metabolite E as the target.

Figure 3.2 Active/inactive pathways in lipid metabolism based on flux distributions predicted from Parsimonious Flux Balance Analysis (pFBA) simulation using the metabolic model ext-iAC560 in the promastigote and amastigote stages of L. major. Modified Media for Promastigote (MMP) and Modified Media for Amastigote (MMA), as described in the methodology, were used as growth media to simulate promastigote and amastigote phenotypes, respectively.

Figure 3.3 Active/inactive pathways in the biosynthesis of essential sugar nucleotides (SugNuc) in the promastigote and amastigote stages. The reactions represented by red arrows indicate active flux in the promastigote stage, when glucose is the main carbon source for the synthesis of SugNuc. The green arrows represent the active flux distribution in the amastigote stage, when lipids, two additional amino acids (asp-L and ala-D), and amino sugars are consumed as the main carbon sources to synthesize essential SugNuc. Alternatively, other carbon sources such as Fructose (Fru), Galactose (Gal), Arabinose (Ara), and Mannose (Man) can be used to synthesize SugNuc.

Figure 3.4 Predicted active and inactive sugar nucleotide (SugNuc) biosynthetic pathways after knocking out the genes GND (A) and GFAT (B) one at a time in the amastigote stage. A dashed arrow represents restricted synthesis of the corresponding metabolites (marked with a red circle), which causes lethal effects.

Figure 3.5 A) Common genes considered in the transcriptomic study (Rastrojo et al. 2013), the proteomic study (Pawar et al. 2014) and the current metabolic model ext-iAC560. B) Density plot of the FPKM values of the 10275 genes considered in the transcriptomic study and coverage of the genes that are expressed in the proteomic study.

Figure 3.6 Expression level (FPKM values in the promastigote stage) of the 570 genes considered in the metabolic model ext-iAC560. The blue dots represent genes that are expressed at the protein level in the proteomic study (Pawar et al. 2014). The red dots show genes with zero FPKM values.

Figure 3.7 Evaluating inconsistencies between ext-iAC560 model predictions and gene expression data from L. major promastigote cells. A) Inconsistency scores (IS) were calculated for different expression threshold values, while estimating the number of reactions with gene expression levels below a threshold value and predicted flux values equal to and different from zero. B) Zoom-in of plot A for lower threshold values, showing the variation in the inconsistency score and the number of reactions with gene expression below the threshold and with flux values different from and equal to zero.

Figure 3.8 GIMME-constrained pFBA simulations of promastigote conditions of L. major. A) Changes in the number of reactions with predicted flux = 0 or ≠ 0 after implementing GIMME at different gene expression thresholds. The reactions which became inactive due to GIMME-based constraints are indicated by the green bar. B) Percentage cellular distribution of the reactions which showed binary changes in their fluxes after GIMME (at gene expression threshold 70).

Figure 3.9 Changes in the flux distribution of the glycolysis-associated pathways after GIMME constraints (at threshold = 70) in the L. major promastigote. The red arrows represent fluxes that changed after imposing GIMME constraints. Legend: PEP: phosphoenolpyruvate; oaa: oxaloacetate; succ: succinate; ala_L: L-alanine; TCA: tricarboxylic acid; pyr: pyruvate; mal: malate. 1: L-alanine transaminase; 2: malic enzyme (NAD); 3: malate dehydrogenase.

Figure 4.1 Number of essential genes that affected the biosynthesis of macromolecules in biomass. The pFBA-based simulation was carried out in MMP (promastigote) and MMA (amastigote) media to simulate growth phenotypes in wild-type and knockout conditions. Genes whose knockout reduced the synthesis of at least one building block to zero were considered essential for growth.


Figure 4.2 Distribution of the 53 predicted potential drug target genes based on literature evidence referring to essentiality data in Leishmania, other parasitic species, and non-parasitic species. All the genes were searched in the DrugBank and TDR Target databases and in the literature to find reported enzymatic inhibitors.

Figure 5 A schematic representation of the strategy of the whole study, including the various resources used, the analyses performed and the main findings. Light green shaded boxes represent key outcomes of the study, while grey boxes represent the main analyses performed.


LIST OF TABLES

Table 1.1 Summary of various methodologies and their applications to study the glycoconjugates.

Table 1.2 Detailed information on glycan databases. The graphical and text formats that each database uses for representing glycan structures, and its connectivity to other glycomics and glycoproteomics databases, are given in the corresponding columns.

Table 1.3 Databases providing data on GT enzymes and GBPs, including gene expression profiles, phenotypic studies in mouse, lectins and glycan-protein interactions.

Table 1.4 Databases and tools which facilitate different analytical tasks, such as accessing the glycomics databases, glycan structure visualization, structural notation translation, and glycan 3D structural analysis.

Table 1.5 Web-based tools for predicting N-, O-, C-linked glycosylation, glycation, and glypiation in different organisms. The corresponding column for each tool describes the machine learning method used to develop the tool.

Table 1.6 Metabolic models of the human protozoan parasites. The number of genes, reactions, metabolites, and compartments in each model are given in the corresponding columns.

Table 3.1 Number of reactions classified as type1, type2 and type3 from Flux Variability Analysis (FVA) results, considering simulations with/without GIMME-based constraints (i.e. deleting MURs at gene expression threshold = 70), with 90% flexibility of maximum biomass in the promastigote condition.

Table 4.1 Predicted Essential (E) and Non-Essential (NE) genes in single/double gene knockout (KO) analysis using the metabolic model ext-iAC560 in promastigote and amastigote conditions. Parsimonious Flux Balance Analysis (pFBA)-based simulation was run to predict growth phenotypes in the gene knockout condition.

Table 4.2 Database- and literature-based information for the 53 predicted essential genes (which are also non-homologous to human genes) in Leishmania.

Table 4.3 Additional information about the 10 novel genes predicted as potential drug targets in L. major.


SCIENTIFIC OUTPUT

Manuscripts for publication

1. Shakyawar S, Rocha I, and Carneiro S (2018). Integrated Metabolic Flux and Omics Analysis of Leishmania major metabolism. BICOB 2018 - 10th International Conference on Bioinformatics and Computational Biology. No. 135052, Las Vegas, USA, March 19-21, 198-203, 2018. ISBN: 978-194343611-8. (http://hdl.handle.net/1822/55203).

2. Shakyawar S, Soares S, Rocha I, Carneiro, S. Accessing and interlinking databases in glycobiology. (Ready for submission).

3. Shakyawar S, Soares S, Rocha I, Carneiro, S. Identification of potential drug targets in Leishmania major using metabolic modelling approach. (Under preparation).

Programming scripts for flux-based analyses (Available at https://github.com/shakyawar/SupplMaterial_PhD)

Script set 1 (java): for performing pFBA analysis in conjunction with GIMME over varying gene expression threshold values.

Script set 2 (java): for performing reaction knockout simulations, where a reaction is knocked out and biomass is measured to observe a lethal or non-lethal effect.

Script set 3 (python): for performing general analyses using omics data (e.g. transcriptomics and proteomics) and the metabolic model ext-iAC560, such as finding the genes common to all studies and density analysis using the FPKM values of all the genes considered in the transcriptomics study.

Script set 4 (java): for performing gene essentiality analysis, where the synthesis of biomass building blocks is checked when a particular gene is knocked out from the model. The essentiality of the gene is determined from the biomass values in the wild-type and knockout conditions.
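To give a feel for the kind of calculation that Script set 1 refers to, the following is a minimal Python sketch of an inconsistency score evaluated over several expression thresholds. It is not taken from the repository above. It assumes the score is the sum, over reactions whose expression lies below the threshold, of (threshold - expression) multiplied by the absolute flux, which is the usual GIMME-style penalty; the scripts used in the thesis may compute it differently. The function name, reaction identifiers and all numbers are hypothetical.

```python
def inconsistency_score(expression, fluxes, threshold):
    """GIMME-style penalty: sum of (threshold - expression) * |flux|
    over reactions whose expression is below the threshold.
    Assumed definition for illustration only."""
    score = 0.0
    for reaction, flux in fluxes.items():
        level = expression.get(reaction)
        if level is None:
            # No expression data for this reaction: it is not penalised.
            continue
        if level < threshold:
            score += (threshold - level) * abs(flux)
    return score


# Hypothetical reaction-level expression values (e.g. FPKM mapped to reactions).
expression = {"R1": 120.0, "R2": 40.0, "R3": 5.0, "R4": 90.0}
# Hypothetical fluxes from a pFBA-like simulation; R5 has no expression data.
fluxes = {"R1": 10.0, "R2": 2.0, "R3": 0.0, "R4": -4.0, "R5": 1.0}

# Scan a few thresholds, as one would when looking for a suitable cut-off.
for threshold in (10, 50, 100):
    print(threshold, inconsistency_score(expression, fluxes, threshold))
```

In this toy data set, R3 is poorly expressed but carries no flux, so it never contributes to the score; only reactions that are both below the threshold and active are penalised.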


LIST OF ABBREVIATIONS

aPPGs    Non-filamentous proteophosphoglycans (PPGs)
Ara    Arabinose
CARP    CArbohydrate Ramachandran Plot
CAZy    Carbohydrate-Active Enzyme
CSS    Carbohydrate Structure Suite
EF    Essentiality Factor
ER    Endoplasmic Reticulum
F6P    Fructose-6-phosphate
FBA    Flux Balance Analysis
FPKM    Fragments Per Kilobase of transcript per Million mapped reads
fPPGs    Filamentous proteophosphoglycans (PPGs)
FVA    Flux Variability Analysis
Gal    Galactose
GBP(s)    Glycan Binding Protein(s)
GDP-Ara    GDP-Arabinose
GDP-Man    GDP-Mannose
GEMs    Genome-scale models
GFAT    Glutamine-fructose-6-phosphate amidotransferase
GIMME    Gene Inactivity Moderated by Metabolism and Expression
GIPLs    Glycosylinositol phospholipids
Glc6P    Glucose-6-phosphate
GlcN    Glucosamine
GlcNAc    N-acetylglucosamine
GNAD    N-acetylglucosamine 6-phosphate deacetylase
GNAT    N-acetylglucosamine acetyltransferases
GND    N-acetylglucosamine 6-phosphate deaminase
GP63    Glycoprotein 63
GPI    Glycosylphosphatidylinositol
GPs    Glycoproteins
GT    Glycosyltransferase
HPCE    High-performance capillary electrophoresis
HPLC    High-performance liquid chromatography
IS    Inconsistency score
KO    Knockout
LC    Liquid Chromatography
LfDB    Lectin Frontier DataBase
LPGs    Lipophosphoglycans
MALDI-TOF    Matrix-Assisted Laser Desorption Ionization and Time Of Flight measurement
Man    Mannose
MIRs    Metabolically important reactions
MMA    Modified Media for Amastigote
MMP    Modified Media for Promastigote
mPPGs    GPI-anchored proteophosphoglycans (PPGs)
MS    Mass Spectrometry
MURs    Metabolically unwanted reactions
NMR    Nuclear magnetic resonance
OGTs    O-GlcNAc transferases
PDB    Protein Data Bank
pFBA    Parsimonious Flux Balance Analysis
PI    1-O-alkyl-2-lyso-phosphatidyl(myo)inositol
PL    Phospholipids
PPGs    Proteophosphoglycans
PSTMM    Probabilistic sibling-dependent tree Markov model
SBML    Systems Biology Markup Language
SL    Sphingolipids
SugNuc    Sugar Nucleotides
TCA    Tricarboxylic acid
UDP-Gal    UDP-Galactose
UDP-Galf    UDP-Galactofuranose
UDP-Glc    UDP-Glucose
UDP-GlcNAc    UDP-N-acetylglucosamine
UGTs    UDP-glycosyltransferases
WT    Wild Type


CHAPTER 1

General Introduction

“Almost all aspects of life are engineered at the molecular level, and without understanding molecules, we can only have a very sketchy understanding of life itself.”

Francis Crick, A Personal View of Scientific Discovery (1988)


Abstract

This chapter provides a general introduction to the systems biology approaches applied to the study of human protozoan parasites. Genome-scale modeling coupled with the integration of omics data has been widely used to characterize the metabolic behavior of various organisms. Here, genome-scale metabolic modeling and omics approaches are covered and evaluated with respect to their ability to provide a systems-level understanding of parasite metabolism. In particular, the biosynthesis of glycans and glycoconjugates is examined, with a focus on their systems-wide involvement in regulating metabolism and in shaping the survival strategies of the parasites in a harsh environment.

Previous efforts to understand the metabolic profile of the parasites at the systems level are comprehensively reviewed, highlighting their limitations and the challenges for next-generation applications of metabolic modeling combined with multi-omics data to understand the complexity of parasite metabolism. The basic protocol for flux-based analyses using the metabolic network is described, with a view to applying it to the omics data-driven, system-level characterization of metabolic components, including glycans and glycoconjugates, in order to understand metabolism and genotype-phenotype relationships in protozoan parasites.


1.1 Glycobiology of human protozoan parasites

1.1.1 Glycans, glycoconjugates and their biosynthesis

Protozoan parasites such as Plasmodium, Leishmania, Trypanosoma, and Toxoplasma are causative agents of a variety of human diseases, including malaria, leishmaniasis, sleeping sickness and toxoplasmosis. These parasites adopt different survival strategies, which typically involve various glycans and glycoconjugates. Glycoconjugates are the products of glycosylation, the enzyme-driven attachment of glycan (sugar) molecules to non-glycan molecules (e.g. proteins and lipids) (Bishop and Gagneux 2007; Holt 2011; Hokke and van Diepen 2017; Baum et al. 2014). More precisely, this process involves the transfer of sugar moieties from activated monosaccharides (e.g. UDP-Glucose, GDP-Mannose, and UDP-N-acetylglucosamine) to specific acceptor molecules (Spiro 2002; Aebi 2013). Glycosylation is one of the most common post-translational modification (PTM) processes in cells, and almost 50% of mammalian proteins undergo this type of modification (Apweiler et al. 1999; Hertel and Zhang 2014). The glycoconjugates form the glycocalyx, which participates in many biologically significant activities that involve the recognition of glycans at the cell surface of many eukaryotes, including human protozoan parasites (Hokke and van Diepen 2017; Baum et al. 2014). Glycobiology, the sub-branch of biology dealing with the study of carbohydrates, therefore plays a significant role in understanding the roles of carbohydrates in human parasites, including those associated with the metabolism, survival, growth and other regulatory cellular activities of the parasite at different life stages. Various glycoconjugates found in different protozoan parasites are discussed next.

Glycoproteins (GP)

GP are glycan-protein complexes in which a specific amino acid residue of the protein is glycosidically attached to monosaccharides, mostly by either N-linked or O-linked glycosylation, named according to the amino acid atom to which the monosaccharide chain is linked (Medina-Acosta et al. 1989; Daubenspeck et al. 2015). N-linked glycosylation typically involves attachment of a monosaccharide to asparagine (Asn) in the Asn-X-Ser or Asn-X-Thr sequence of proteins, where X can be any amino acid except proline (Pro), while O-linked glycosylation involves attachment of a monosaccharide to serine (Ser) or threonine (Thr) in the same amino acid sequence (Medina-Acosta et al. 1989; Aebi 2013; An et al. 2009). A less frequent type of glycosylation is C-mannosylation, which involves the attachment of a mannose sugar to a carbon (C) atom of a tryptophan residue to form the C-mannosyl-tryptophan complex, and is mediated by C-mannosyltransferase enzymes (Beer et al. 1995; Ihara et al. 2015). A special type of glycosylation, called glypiation, which is widely detected on surface glycoproteins in eukaryotes, involves the enzymatic addition of a glycosylphosphatidylinositol (GPI) molecule, a specific class of glycolipid containing phosphatidylinositol, to the C-terminal end of the protein chain (Udenfriend and Kodukula 1995; Fujita and Kinoshita 2010). In the context of PTM, glycation, by contrast, is the non-enzymatic attachment of a glycan to protein and lipid molecules to form glycoconjugates (Bodiga et al. 2013). The mechanisms of bond formation between glycan and non-glycan molecules in the different types of glycosylation have been extensively reviewed (Mazola et al. 2011a; Strasser 2016).
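The Asn-X-Ser/Thr rule described above can be illustrated with a minimal Python sketch that scans a protein sequence for candidate N-glycosylation sequons. The function name `find_sequons` and the toy peptide are illustrative assumptions and are not taken from any tool or sequence used in this work. A matching sequon only marks a potential site, since not every sequon is glycosylated in vivo.

```python
import re

# N-glycosylation sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps the match one residue wide, so overlapping sequons
# (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each candidate sequon."""
    seq = sequence.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 3])
            for m in SEQUON.finditer(seq)]


# Toy peptide invented for illustration only (not a real GP63 fragment).
toy_peptide = "MANGSAPNPTQNNTS"
print(find_sequons(toy_peptide))
```

In the toy peptide, the Asn followed by Pro is skipped, while the two overlapping sequons near the C-terminus are both reported.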

In N-linked glycoproteins, the glycan core contains multiple sugar units such as mannose (Man), N-acetylglucosamine (GlcNAc) and glucose (Glc), which are attached to the protein via glycosylation. In these glycoproteins, the formation of the glycosidic linkage between glycans and proteins begins with the addition of two molecules of GlcNAc to dolichol phosphate (P-Dol), forming a GlcNAc-α1-PP-Dol complex on the cytoplasmic face of the endoplasmic reticulum (ER). This is followed by ALG glycosyltransferase-mediated attachment of Man and Glc residues to form a glycan anchor with different oligosaccharide molecules. The protein then binds to the glycan core by forming a glycosidic linkage. This phase of the process is highly conserved across eukaryotic organisms, including protozoan parasites (Varki et al. 2017). The subsequent part of the process includes trimming of the Glc and Man residues, and extension by adding monosaccharides such as GlcNAc and fucose (Fuc) in the Golgi apparatus to provide the required functionality to the glycan-protein complex (Varki et al. 2009; Zhang and Wang 2016).

GP63 is a well-known example of an N-linked glycoprotein, which exists on the cell surface of Leishmania (Medina-Acosta et al. 1989; Kaur et al. 2011) and is anchored by a GPI molecule (Schneider et al. 1990). Two major N-glycans linked with GP63, Man6GlcNAc2 and GlcMan6GlcNAc2, have been characterized in both L. mexicana and L. major (Olafson et al. 1990; Funk et al. 1997). Similarly, Variant Surface Glycoproteins (VSG) are dimeric proteins found on the cell surface of African trypanosomes (Ferguson 1997; Cross et al. 2014). These are N-glycosylated GPI-anchored proteins, mostly classified as class I or class II based on the position of the glycosylation sites in the amino acid sequence of the C-terminal domains (Carrington et al. 1991; Mehlert et al. 2010). Class I VSG contain only one glycosylation site, at the 50th residue from the C-terminus, with an associated Man-rich N-glycan (Man5-9GlcNAc2); class II VSG have five to six glycosylation residues at the 170th residue from the C-terminus. Similarly, procyclic acidic repetitive proteins (PARPs) in T. brucei are also found attached to N-linked glycans (e.g. Man5GlcNAc2), and are anchored by a GPI molecule (Treumann et al. 1997; Haikarainen et al. 2017). Examples of O-linked glycoproteins include merozoite surface protein-2 (MSP-2) in P. falciparum during the intraerythrocytic stages (Low et al. 2007), and cyst wall proteins (CWP) in protozoan parasites of the genera Giardia (Ebneter et al. 2016) and Toxoplasma (Tomita et al. 2013). Similar to the VSG of trypanosomes, variant-specific surface proteins (VSPs) are also an example of O-linked glycoproteins, expressed on the surface of the parasite G. lamblia (Papanastasiou et al. 1997; Li et al. 2013).

Lipophosphoglycans (LPG)

LPG are cell surface glycoconjugates of the Leishmania parasite. Their structure is composed of a PI anchor, a glycan core, repeating disaccharide phosphate units and a small oligosaccharide cap (Turco and Descoteaux 1992; Novozhilova and Bovin 2010). The PI anchor consists of a lipid derivative of 1-O-alkyl-2-lyso-phosphatidyl(myo)inositol, while the glycan core mainly includes glucosamine (GlcN), Man and galactofuranose (Galf) residues (Ferguson 1992). The repeat unit, which consists of Gal(β1,4)Man(α1)-PO4, is attached to the glycan core. The oligosaccharide chain, which mainly includes galactose (Gal) and Man, is attached as a cap at the end (Turco 1988; Novozhilova and Bovin 2010). Structural characterization also showed that the lipid anchor, the glycan core, and the Gal(β1,4)Man(α1)-PO4 backbone of the repeat units are conserved in different Leishmania species; however, variations have been detected in the carbohydrate side chains of the repeat units. For example, no side chains were detected in LPG of L. donovani from Sudan, while β-Glc was found attached to every four to five repeat units in LPG of L. donovani from India (Mahoney and Turco 1999). Similarly, in L. mexicana, the side chains of the repeat units are mainly modified with β-Glc, whereas in L. tropica, the side chains consist of over 19 different types of monosaccharides (McConville et al. 1995). Recently, variation in the carbohydrate part of LPG was also studied in L. braziliensis and L. infantum (Assis et al. 2012). The biosynthesis of LPG is not yet clearly understood; however, studies have shown that approximately 5 × 10^6 copies of this glycoconjugate are present on the cell surface of Leishmania (Turco and Descoteaux 1992). During metacyclogenesis (i.e. the conversion of non-infective promastigotes to a non-dividing infective form of promastigotes; the promastigote stage refers to the stage when Leishmania resides inside the sandfly), the number of LPG repeat units doubles, from 15 to 30, when the parasite detaches from the midgut of the sandfly (Sacks et al. 1995). Similar to LPG, lipopeptidophosphoglycans (LPPG) are the major glycoconjugates expressed on the cell surface of Trypanosoma cruzi epimastigotes, with approximately 1.5 × 10^7 copies per cell (Golgher et al. 1993). The structure of LPPG mainly consists of a glycan core linked to the lipid component inositolphosphoceramide via a non-acetylated glucosamine, where the glycan core contains Man, Galf and 2-aminoethylphosphonate (2-AEP) (Lederkremer et al. 1991; Guha-Niyogi et al. 2001). The major variations in the structure of LPPG were observed at the position of Galf in the glycan core, where in most cases Galf residues are linked to Man(α1,2)Man. The lipid component of LPPG is mainly composed of complex lipids such as palmitoylsphinganine, palmitoylsphingosine, and lignoceroylsphinganine (Guha-Niyogi et al. 2001).

Glycosylinositol phospholipids (GIPL)

GIPL are low molecular weight glycolipids expressed on the cell surface of Leishmania. These glycoconjugates consist of a PI anchor and a glycan core without any attachment to proteins or polysaccharides (McConville and Ferguson 1993; Assis et al. 2012a). Based on the components of the glycan core, GIPL are categorized as:

i) type 1, which consists of Manα1,6Manα1,4GlcNα1,6-PI;

ii) type 2, which consists of Manα1,3Manα1,4GlcNα1,6-PI;

iii) type 3 or hybrid, which consists of Manα1,6(Manα1,3)Manα1,4GlcNα1,6-PI.

The lipid part of GIPL includes 1-O-alkyl-2-O-acylglycerol. The lipid component alkyl-acyl-PI is the major constituent of type 1 and hybrid GIPL, while type 2 GIPL are rich in longer alkyl chains (Zawadzki et al. 1998; Assis et al. 2012a). The lipid composition of GIPL is distinct and enriched in alkyl-acyl-PI compared to LPG and the GPI anchor (McConville and Blackwell 1991). There is little supporting evidence about the biosynthesis of GIPL in protozoan parasites. However, it was proposed that the early steps of its synthesis are the same as in the biosynthesis of the protein anchors, which involves the formation of a GlcNAc-PI complex by transferring GlcNAc to PI in L. mexicana (Proudfoot et al. 1995; Smith et al. 1997; Oppenheimer et al. 2011).


Secreted glycoconjugates

Secreted glycoconjugates are a family of heavily glycosylated proteins synthesized by Leishmania. The main complex molecules in this category include secreted acid phosphatases (sAPs) and proteophosphoglycans (PPG). sAPs are acid phosphatase enzymes that are synthesized in mono- or oligomeric form by various Leishmania species such as L. donovani, L. tropica, L. amazonensis and L. aethiopica (Fernandes et al. 2013). These are highly glycosylated enzymes encoded by multiple genes with high sequence similarity in different species (Ilg et al. 1994a; Papadaki et al. 2015; Shakarian et al. 2002). sAPs from L. donovani are glycosylated on C-terminal serine/threonine-rich domains, where the glycan core consists of 6Gal(β1,4)Man(α1-)PO4 repeat units, which are structurally similar to those in LPG. In contrast, sAPs in L. mexicana carry a modification at the Man(α1)PO4 residue in the Gal-Man-PO4 repeat units (Ilg et al. 1994; Ilg et al. 1991).

Proteophosphoglycans (PPG) are another type of highly glycosylated protein and are observed in Leishmania during different stages of its life cycle. PPG can be categorized as i) filamentous PPG (fPPG), which consist of 95% phosphoglycans, with serine, alanine, and proline as the major amino acid residues in the proteins; ii) GPI-anchored PPG (mPPG), which consist of highly glycosylated proteins associated with a GPI anchor; and iii) non-filamentous PPG (aPPG), which consist of a polypeptide with modified phosphoglycans on serine residues of the proteins (Ilg et al. 1994a; Sacks and Kamhawi 2001; Assis et al. 2012; Novozhilova and Bovin 2010). In each of these molecules, the glycan core includes structures like [Glc(β1,3)]1-2Gal(β1,4)Man and a backbone consisting of Gal-Man-PO4 (Guha-Niyogi et al. 2001; Novozhilova and Bovin 2010). In order to understand the biosynthesis of PPG, high-throughput techniques such as Mass Spectrometry (MS) were used to study their glycan components, giving an idea of how sugar molecules are linked with each other in L. major (Majumdar et al. 2005) and Entamoeba histolytica (Arya et al. 2003). However, the process by which glycans are attached to proteins in the PPG complex is not yet well understood in the parasites, although a recent study reported the expression profile of the genes involved in the biosynthesis of PPG in L. infantum (Alcolea et al. 2016).

1.1.2 Biological roles of the glycans and glycoconjugates

Glycoconjugates are crucial in several metabolic and host-mediated interactions in human protozoan parasites. The outer covering of most parasitic cells contains these glycoconjugates and plays a vital role in many cellular processes involving glycan recognition. These molecules are very specific to the biological processes and play a central role in regulating the organism's metabolism (Bishop and Gagneux 2007; Holt 2011; Pollyanna et al. 2017; Bergman et al. 2004; Varki 2017).

In general, the glycan part of most glycoconjugates is the key player, especially in the context of cell-to-cell interactions between parasitic and non-parasitic cells. For instance, the interaction of the GPI-anchored GP63 protein with fibronectin receptors of the host macrophage, which promotes Leishmania infection (Brittingham et al. 1999), is modulated mainly by the N-glycans attached to GP63 (Hsiao et al. 2017). Moreover, other studies also found that N-glycans are involved in providing thermotolerance to many organisms, including protozoan parasites, under adverse environmental conditions (Naderer et al. 2008; Garfoot et al. 2018; Sanz et al. 2016). The structural diversity of glycans on the cell surface helps in cell-to-cell interactions, as previously shown in the context of parasitic infection (Sunter and Gull 2017; Forestier et al. 2015). The outer glycan moiety acts as a docking site in cell recognition and provides cell surface antigenicity in Leishmania and other parasites (Rek et al. 2009; Cohen 2015; Anish et al. 2013). Previous studies also described the mechanism of interaction of surface glycans with non-sugar molecules from a structural point of view (Rek et al. 2009; Hebert and Molinari 2012).

Glycans are also involved in protein folding, providing specific functions to secreted proteins in eukaryotes (Vasudevan and Haltiwanger 2014; Xu and Ng 2015; Chaffey et al. 2017); however, this role is less explored in protozoan parasites. Only a few studies showed the involvement of N-glycans in the structural folding of secreted glycosylated proteins in protozoan parasites such as T. gondii and P. falciparum (Luk et al. 2008; Tan et al. 2014; Samuelson and Robbins 2015). A recent study also found modulation of sAP enzymes during Leishmania infection, suggesting a role in virulence (Ghouila et al. 2017). Previous studies also explored the amino acid sequences and three-dimensional (3D) structures of a few glycoproteins in order to reveal the mechanism of glycan-protein interactions, as well as to identify protein glycosylation sites in protozoan parasites. For example, the Arg-Gly-Asp sequence of the GP63 protein was proposed to bind to a macrophage receptor during the interaction between Leishmania and the human macrophage (Russell and Wright 1988; Brittingham et al. 1999; Liu and Uzonna 2012). In a different study, three potential glycosylation sites were identified in the Leishmania GP63 protein (Button and McMaster 1988). Similar to GP63, N-glycosylated VSG help Trypanosoma to escape from the host's immune system by forming a dense coat over its surface (Zamze et al. 1991; Borst and Cross 1982; Pinger et al. 2017). The VSG coat provides an impenetrable barrier that protects the parasitic cells from several host antibodies (Schwede et al. 2015). N-glycans with high mannose content (e.g. Asn-GlcNAc2-Man3-Fuc) on surface proteins in the merozoite stage of P. falciparum are involved in promoting the invasion of human erythrocytes (Burghaus et al. 1999).

Among the other widely studied glycoconjugates, GIPLs, LPGs, and PPGs are also involved in several surface activities of the parasitic cell (Rogers 2012; Franco et al. 2012; Srivastava et al. 2013; Suzuki et al. 2002). The structural variations in the glycan core of LPG and PPG serve as ligands for binding to lectins in the midgut of the sandfly, helping parasites such as Leishmania to remain in the midgut. The LPG surface coat of metacyclic Leishmania resists host defences, allowing the parasite to differentiate into the amastigote form. Previous studies showed the involvement of LPG and PPG at various stages, from the growth of procyclic Leishmania to macrophage infection by metacyclic Leishmania in hosts (Sacks and Kamhawi 2001; Srivastava et al. 2013; Rogers 2012; Eggimann et al. 2015). The parasite synthesizes very little or no LPG in the amastigote stage, so LPG plays an insignificant role in amastigote survival (Spath et al. 2000; Turco and Sacks 1991); at this stage, GIPL are the major constituents that interact with macrophages and protect the parasite from environmental hazards (Assis et al. 2012; Passero et al. 2015). GIPL modulate signaling events such as nitric oxide (NO) synthesis, which is critical for parasite clearance inside the macrophage (Suzuki et al. 2002; Proudfoot et al. 1995; Olivier et al. 2005). GIPL in Trypanosoma cruzi were also found to be essential for resisting host defenses and promoting infection in the human host (Brodskyn et al. 2002; Flores-García et al. 2011; Previato et al. 2003).

The intimate involvement of glycoconjugates in the strategies used for survival and for spreading infection has made carbohydrates attractive targets for developing antiparasitic drugs and vaccines. Advances in the knowledge of the structures and biosynthesis of these molecules have provided deeper insight into their interactions with the hosts, helping to develop antiparasitic therapies. For example, the essential GPI molecules have been validated as a chemotherapeutic drug target in P. falciparum (Werz and Seeberger 2005; Morotti et al. 2017) and T. brucei (Smith et al. 2004; Hong and Kinoshita 2009). Similarly, Galf is an important component of surface glycoproteins and has been identified as a potential drug target in Leishmania spp. and T. cruzi (Oppenheimer et al. 2011). In later work, molecular dynamics-based studies identified potential drug target sites in GPI12, an important enzyme in the GPI biosynthetic pathway in L. major (Singh et al. 2015). Similarly, GDP-mannose pyrophosphorylase, an important enzyme in the biosynthesis of several mannose-containing glycoconjugates, was studied as a potential drug target in L. mexicana (Davis et al. 2004; Lackovic et al. 2010). In L. major, UDP-galactopyranose mutase, an enzyme involved in LPG biosynthesis, was also studied as a possible drug target, as it plays a vital role in virulence in the host (Kashif et al. 2018; Kleczka et al. 2007). In a recent study, the enzyme glucosamine-6-phosphate N-acetyltransferase, which is actively involved in glycoconjugate biosynthesis, was also validated as a potential drug target in P. falciparum (Cova et al. 2018). The immunogenicity of surface glycans has also been studied in Leishmania and other protozoan parasites with the aim of developing vaccines (La Greca and Magez 2011; Moura et al. 2017; Hewitt and Seeberger 2001). Past and current research in this area, together with its challenges and future scope, has been extensively discussed in a previous review (Jaurigue and Seeberger 2017). The potential of glycans as drug targets in parasites is also discussed in (Pollyanna et al. 2017), while ongoing research and future challenges regarding glycoconjugates, and the enzymes involved in their biosynthesis, as potential targets are reviewed in (Castillo-Acosta et al. 2017; Rodrigues et al. 2015; Oppenheimer et al. 2011).

1.2 Data types and methodologies in Glycobiology

The study of carbohydrates, from the structural to the functional level, has attracted a large scientific community since carbohydrates were found to be involved in many cellular processes in different organisms, as discussed in the previous section. Basic molecular biology tools and recent advances in omics techniques have been used to characterize simple and complex carbohydrates in different organisms. Current research in Glycobiology mainly focuses on:
• the characterization of glycans, glycoconjugates, and building blocks (i.e. sugar nucleotides),
• the identification of glycosylation sites in glycoproteins,
• the discovery of carbohydrate-active enzymes and their kinetics, and of glycoconjugate biosynthetic pathways,
• the detection of molecular interactions between sugar and non-sugar molecules,
• and the study of phenotypic effects caused by irregularities in the activities of glycosyltransferase (GT) enzymes, glycan binding proteins (GBPs), glycans, and glycoconjugates.


Figure 1.1: The overview of different types of data from various experimental techniques to characterize glycoconjugates for understanding their structures and functions, biosynthetic pathways and associated genes/enzymes, interactions between sugar and non-sugar molecules, and phenotypic effects by altering the activity of glycosyltransferase (GT) enzymes or glycan binding proteins (GBPs).

As such, multiple analytical techniques are used to explore and analyze glycobiological phenomena in a biological system, generating different types of data that can be used in profiling, kinetics, interaction, gene expression, and phenotyping studies (Figure 1.1). These studies have been used to characterize not only the glycome of cellular systems but also their interactions with other systems and their role in mediating phenotypic changes. The different methodologies and their technical requirements, along with the limitations in data analysis, are discussed next and summarized in Table 1.1.


Table 1.1: Summary of various methodologies and their applications to study the glycoconjugates.

General classification | Methodology | Applications | References
Spectra-based | Mass Spectrometry | For profiling and decoding structural information in carbohydrates | (Kailemia et al. 2014; Lundborg and Widmalm 2011)
Spectra-based | Nuclear magnetic resonance (NMR) | For studying anomeric configuration, monosaccharides composition, glycosidic linkage and other structural properties of carbohydrate molecules | (Lundborg and Widmalm 2011; Leeflang et al. 2000)
Array-based | DNA microarray | For analyzing stage-specific expression of genes which are directly/indirectly involved in carbohydrate biosynthesis | (Naito-Matsui and Takematsu 2014; Ma et al. 2009)
Array-based | Glycan microarray | For detecting the interactions of immobilized glycans with the lectins, glycoprotein or antibodies | (Song et al. 2014; Smith et al. 2010)
Array-based | Lectin microarray | For detecting the interactions of immobilized lectins with the glycans in the sample | (Zhang et al. 2016; Sun et al. 2016)
Kinetic-based | Enzyme kinetics | For determining the activity of glycosyltransferase enzymes to study the extent of interactions between glycans and non-sugar molecules | (Augustin, J. M. and Bak 2013; Quirós et al. 2000)
Functional genomics | Functional genomics | For providing functional annotations to newly discovered glycosyltransferase enzymes | (Sharma et al. 2013; Jaillon et al. 2007)
Mouse phenotyping | Mouse phenotyping | For analyzing phenotypic effects on altering the activity of glycosyltransferase gene/enzyme in mouse | (Kuan et al. 2010; Orr et al. 2013)

1.2.1 Spectra-based methodologies

Mass Spectrometry (MS)

MS-based methods have emerged as a fast and sensitive way to determine glycan profiles and analyze their molecular structures in various organisms, including human protozoan parasites (Kailemia et al. 2014; Lundborg and Widmalm 2011; Mehta et al. 2016; Morelle and Michalski 2007). To elucidate glycan structures and capture information about their conjugate partners, MS experiments can be performed either on isolated glycans or on glycoconjugates. In the first case, glycan molecules are released from the glycoconjugate via either an enzymatic or a chemical process, followed by a reaction with antibodies to detect the glycans (Cummings et al. 2017; Cummings and Etzler 2009).

For analyzing glycoproteins, the chemical process usually involves treatment of the sample with sodium hypochlorite (NaClO) to release the glycans; however, this method completely degrades the protein part of the glycoproteins (Song et al. 2016). Similarly, hydrazinolysis is used to release O-glycans and N-glycans from glycoproteins without preserving the peptides (Nakakita et al. 2007; Kozak et al. 2014). N-bromosuccinimide (NBS) is another compound, used to decarboxylate small N-glycopeptides to release N-glycans without affecting their structures (Song et al. 2016). In the enzymatic cleavage of N-glycans, two types of enzymes are used: endoglycosidases (e.g. N-glycosidase), which cleave most N-glycans from the glycoproteins, and exoglycosidases, which cleave a specific portion of the non-reducing end of the N-glycans (Kronewitter et al. 2010; Guthrie and Magnelli 2016). The released glycans in the mixture are labelled with fluorescent dyes, such as 2-aminobenzamide (2-AB), 2,6-diaminopyridine (DAP), or biotinylated 2,6-diaminopyridine (BAP) (Ruhaak et al. 2010; Mulloy et al. 2009; Mulloy et al. 2017), and further separated using chromatographic techniques such as high-performance liquid chromatography (HPLC) or high-performance capillary electrophoresis (HPCE), followed by MS to analyze the structure of the glycans (Cummings and Etzler 2009; Han and Costello 2013).

Exoglycosidase-cleaved N-glycans can be analyzed using Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) profiling to determine the structure and linkage of the terminal monosaccharides (Morelle and Michalski 2007). In this respect, enzymatic cleavage is more useful than chemical cleavage for analyzing the structure and linkage of specific carbohydrates in glycoproteins. O-glycans are much more complex than N-glycans; similarly, O-glycosylated glycopeptides are highly heterogeneous compared to N-glycoproteins, making MS analysis more difficult (Turnock and Ferguson 2007; Mulagapati et al. 2017). At the same time, the lack of O-glycosidase enzymes makes the release of O-glycans from glycoproteins more difficult; however, chemical methods can be used, as described above. MALDI coupled with time-of-flight measurement (MALDI-TOF) can be used to screen a complex mixture of glycoconjugates to analyze the monosaccharide sequence and specific linkages of the glycans (Brito et al. 2015; Selman et al. 2012). This methodology is capable of generating information about the diversity and nature of the glycans. Apart from MALDI, Electrospray Ionization (ESI) is another MS ionization technique used to obtain glycan profiles, while tandem mass spectrometry is a powerful way to study the structures of carbohydrates (Liu et al. 2014; Bocker et al. 2011; Harvey 2005). The characterization of glycans from glycoproteins using MS-based methods has also provided information on glycosylation sites and the type of linkage between glycans and peptides for many proteins, such as galactomannoproteins from Aspergillus fumigatus (Morelle et al. 2005), glycoproteins from the parasite Giardia intestinalis (Morelle et al. 2006; Bandini et al. 2016; Li et al. 2015) and several human recombinant proteins (Bocker et al. 2011; Harvey 2005; Liu et al. 2006; Smith et al. 2015; Zaia 2008; Cova et al. 2015). Recently, ion mobility (IM) spectrometry coupled with MS has been introduced as a more sophisticated and sensitive technique to improve glycan characterization, owing to the capability of IM to differentiate isomeric glycans (Chen et al. 2018; Huang et al. 2013).
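As a rough illustration of how a glycan composition relates to the peaks seen in an MS profile, the following Python sketch computes the monoisotopic mass of a free glycan from its residue counts. The residue masses are standard monoisotopic values. The function names, the choice of a sodium adduct, and the omission of any fluorescent label or other derivatization are simplifying assumptions made for this example and do not describe the protocol of the cited studies.

```python
# Monoisotopic masses (Da) of monosaccharide residues inside a glycan chain,
# i.e. the free sugar minus the water lost when a glycosidic bond forms.
RESIDUE_MASS = {
    "Hex": 162.05282,     # hexose residue, e.g. Man, Glc, Gal
    "HexNAc": 203.07937,  # N-acetylhexosamine residue, e.g. GlcNAc
    "dHex": 146.05791,    # deoxyhexose residue, e.g. Fuc
}
WATER = 18.01056    # added once for the free reducing end
SODIUM = 22.98922   # Na+ (sodium atom minus one electron)


def glycan_mass(composition):
    """Monoisotopic mass of a free, unlabelled glycan from its residue counts."""
    return sum(RESIDUE_MASS[res] * n for res, n in composition.items()) + WATER


def sodium_adduct_mz(composition):
    """m/z of the singly charged [M+Na]+ ion (an assumed ionization mode)."""
    return glycan_mass(composition) + SODIUM


# Compositions of the two GP63 N-glycans named earlier in this chapter.
for name, comp in {
    "Man6GlcNAc2": {"Hex": 6, "HexNAc": 2},
    "GlcMan6GlcNAc2": {"Hex": 7, "HexNAc": 2},
}.items():
    print(f"{name}: M = {glycan_mass(comp):.4f} Da, "
          f"[M+Na]+ = {sodium_adduct_mz(comp):.4f}")
```

Because Man, Glc and Gal are all hexoses with the same residue mass, a composition-level calculation of this kind cannot distinguish isomeric glycans, which is one reason why exoglycosidase digestion, tandem MS or ion mobility are needed for structural assignment.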

Glycoconjugates such as glycolipids can be analyzed in intact form using MS methods (Levery 2006; Ojima et al. 2015). High-performance thin layer chromatography (HPTLC) is the most common separation technique used for analyzing glycolipids. After separation, glycolipids can be probed using reactions with glycan binding proteins such as lectins and antibodies, followed by MS analysis (Meisen et al. 2011). Another sophisticated technique for analyzing glycolipids is hydrophilic interaction liquid chromatography (HILIC) in conjunction with MS, as previously used to separate alpha2-3- and alpha2-6-sialylated ganglioside species of different carbohydrates, followed by analysis using tandem MS experiments (Kirsch et al. 2009). Similarly, liquid chromatography-tandem mass spectrometry (LC-MS/MS) can also be used to analyze lipids, as previously done to characterize phospholipids and glycolipids from the pathogenic bacterium Enterococcus faecalis (Rashid et al. 2017).

Nucleotide sugars, such as GDP-fucose (GDP-Fuc), GDP-mannose (GDP-Man), GDP-arabinose (GDP-Ara), UDP-glucose (UDP-Glc), UDP-galactose (UDP-Gal), UDP-N-acetylglucosamine (UDP-GlcNAc), UDP-rhamnopyranose (UDP-Rha), UDP-xylose (UDP-Xyl), and UDP-glucuronic acid (UDP-GlcA), which are the main precursors in the biosynthesis of glycoconjugates in different parasites, have also been characterized using LC/MS-based methods. These methods mainly allow the abundance of nucleotide sugars and their involvement in glycosylation processes to be determined. Successful applications of MS-based techniques in this context include the characterization of five sugar nucleotides (GDP-Fuc, GDP-Man, UDP-Glc, UDP-Gal, and UDP-GlcNAc) in three protozoan parasites, namely T. brucei, T. cruzi, and L. major (Turnock and Ferguson 2007). The same study also characterized UDP-Rha, UDP-Xyl, and UDP-GlcA only in T. cruzi, and GDP-Ara uniquely in L. major. Similarly, various UDP-sugars were analyzed using MS-based methods in (Behmüller et al. 2014), human-Chinese hamster ovary (CHO) cells (Pabst et al. 2010), and human (Walia et al. 2017; Takeda et al. 2014). Applications of MS-based methods for the characterization of sugar molecules have been reviewed in previous articles (Lundborg and Widmalm 2011; Barb and Prestegard 2011; Leymarie and Zaia 2012; Shajahan et al. 2017).

Nuclear magnetic resonance (NMR)

Another powerful technology for the structural characterization of glycans is NMR, which uses the 1H and 13C chemical shifts to study the anomeric configuration, monosaccharide composition, glycosidic linkages and other structural modifications of the molecules (Lundborg and Widmalm 2011; Barb and Prestegard 2011; Leeflang et al. 2000; Buda et al. 2016); however, it requires samples of more than 95% purity. NMR exploits the fact that nuclei with non-zero spin, such as 1H and 13C, absorb and re-emit energy at characteristic frequencies when placed in an external magnetic field. These methods can quantitatively analyze a mixture of known compounds; for new compounds, the measured spectra can be compared against a spectral library of known compounds. Previous reports describe step-by-step procedures for performing NMR experiments on various biological samples (Lundborg and Widmalm 2011; Mulloy et al. 2009). For example, nano-NMR (NMR coupled with a nano-probe) has been used to analyze mixtures of complex carbohydrates (Manzi et al. 2000). This technique can use very low amounts of sample (at the microgram level) to characterize specific glycans and their linkage and branching patterns (Manzi et al. 2000).

Databases such as GLYCOSCIENCES.de (Lütteke et al. 2006) and CASPER (Lundborg and Widmalm 2011; Jansson et al. 2006) provide NMR chemical shifts and coupling constants1 for different glycans, which are good resources for comparing raw NMR data in order to analyze the structural features of newly identified glycan molecules. Other computational tools, such as GlyNest (Loß et al. 2006) and ProSpectND (http://sourceforge.net/projects/prospectnd/), provide open-source packages for analyzing raw NMR data to determine chemical shifts in glycan molecules.

1.2.2 Array-based methods

Glycan array-based profiling approaches use high-throughput technologies similar to DNA microarrays to screen thousands of antibodies, glycoproteins, and lectins in a biological sample (Song et al. 2014; Smith et al. 2010). The working strategy mainly involves immobilizing natural and synthetic glycans on a solid surface in a discrete pattern, followed by incubation with the sample containing lectins, glycoproteins or antibodies. Fluorescence-labeled secondary antibodies can then be used to detect glycan-protein interactions (Boer et al. 2007; Purohit et al. 2018). More recently, lectin-based microarrays have also become popular for analyzing the glycosylation features of glycoproteins and glycans from sera, crude cell samples or bacteria (Kuno et al. 2005; Rosenfeld et al. 2007; Cao et al. 2013). This technique instead involves the immobilization of lectins on a solid surface; the lectins bind to glycans in the sample, which may be detected either by prior labeling or by overlaying fluorescence-labeled antibodies. Lectin microarrays have the advantage of detecting glycans attached to other molecules without requiring any pre-processing to release them, as is needed in MS-based methods, for instance (Zou et al. 2017; Hirabayashi et al. 2013). The methodology is also capable of detecting lectin-glycan interactions, which are 100-1000 times weaker than antibody-antigen interactions (Uchiyama et al. 2008; Zhang et al. 2016). Typically, plant lectins, such as legume lectins (Etzler et al. 2009; Lagarda-Diaz et al. 2017), ricin B chain-like lectins (Cummings and Etzler 2009), monocot-derived GNA-related lectins (Barre et al. 2001) and hevein-type lectins (Wright et al. 1991; Itakura et al. 2017), are used in lectin-based microarray experiments, mainly because of their availability, stability and specificity. Both glycan and lectin microarrays help to identify interactions between glycan and non-glycan molecules, which regulate several biological activities. Although microarray-based methods are not quantitative, they are still useful for detecting inter-molecular interactions between glycans and their conjugate molecules.

1 Coupling constant is a measure of the interaction between a pair of protons in NMR analysis.

1.2.3 Kinetic characterization of glycosyltransferase enzymes

The kinetics of glycosyltransferase (GT) enzymes, which selectively catalyze the transfer of a sugar moiety from a glycosyl donor to an acceptor molecule (Rini et al. 2009; Gantt et al. 2013), have been poorly studied compared to those of other enzymes. Previous studies have mainly focused on monitoring the transfer of radiolabelled sugar moieties from donor to acceptor molecules during the glycosylation process (Donovan et al. 1999; Weerapana et al. 2005). Usually, chromatographic techniques such as HPLC and HPCE are needed to separate the glycan molecules from their substrates in order to detect the transferred glycan molecules. Fluorescent chemosensors that bind nucleotide groups (Cao and Heagy 2004; Oesch and Luedtke 2015), or antibodies that bind to the products of the glycosylation reaction (Wongkongkatep et al. 2006), are also used as detection reagents. Other methods use a phosphatase enzyme and measure the phosphate released during the glycosylation reaction (Wu et al. 2011). The reactions involving GT enzymes can be studied using Michaelis-Menten parameters such as Km, Vmax, and the inhibition constant (Ki) (Michaelis and Menten 1913), to gain insight into the functions of the enzymes.
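As a minimal illustration of how these parameters relate to a measured rate, the sketch below evaluates the Michaelis-Menten equation, v = Vmax [S] / (Km + [S]). The function name and all numerical values are hypothetical and are not taken from any of the cited studies. The treatment of Ki assumes simple competitive inhibition, which is only one of several possible inhibition models.

```python
def gt_initial_rate(s, vmax, km, inhibitor=0.0, ki=None):
    """Michaelis-Menten initial rate for a substrate concentration s.

    If ki is given, the textbook competitive-inhibition form is used
    (an assumption of this sketch), in which the apparent Km is scaled
    by (1 + [I] / Ki).
    """
    if s < 0 or inhibitor < 0:
        raise ValueError("concentrations must be non-negative")
    if km <= 0 or (ki is not None and ki <= 0):
        raise ValueError("Km and Ki must be positive")
    km_apparent = km if ki is None else km * (1.0 + inhibitor / ki)
    return vmax * s / (km_apparent + s)


# Hypothetical values, chosen only to show the behaviour of the equation.
v_half = gt_initial_rate(s=2.0, vmax=10.0, km=2.0)        # [S] = Km gives Vmax / 2
v_inhibited = gt_initial_rate(s=2.0, vmax=10.0, km=2.0,
                              inhibitor=1.0, ki=1.0)      # apparent Km doubles
```

When [S] equals Km the rate is half of Vmax, which is why Km is commonly read as a measure of the apparent affinity of the enzyme for its donor or acceptor substrate.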

Kinetic studies with GT enzymes also help to determine the extent of interactions between glycans and non-sugar molecules. For instance, the transfer of the sugar moiety from UDP-Glc to the sapogenins oleanolic acid and hederagenin, catalyzed by UDP-glycosyltransferase (UGT) enzymes, was studied using kinetic parameters (Augustin and Bak 2013). Similarly, the activities of a glycosyltransferase in Streptomyces antibioticus (Quirós et al. 2000) and of cyclodextrin glycosyltransferase (CGTase) in Klebsiella pneumoniae (Gastón et al. 2009) were studied using kinetics-based experiments. In another study, the activity of sterol glycosyltransferase, which catalyzes the transfer of glucose (from the UDP-Glc donor) to sterol (the glycosyl acceptor), was analyzed by measuring the kinetic parameter Km in Micromonospora rhodorangea (Hoang et al. 2016). A recent review discussed various methods for studying the kinetics of glycosyltransferases in different species (Ngo and Suits 2017).

1.2.4 Functional analysis and phenotyping

Current high-throughput gene expression methods, such as DNA microarrays and RNAseq, are powerful technologies for extracting information on the roles of GT genes and glycan/glycoconjugate biosynthetic genes under different physiological conditions or in different types of cells. For example, differential expression studies on GT genes have been carried out in different types of human cells such as monocytes, dendritic cells, and macrophages (Trottein et al. 2009). Similarly, stage-specific gene expression analysis was carried out in the protozoan parasite L. infantum (Alcolea et al. 2014). The availability of a large number of well-annotated genomes facilitates similarity-based searches in genome databases, such as GeneDB (Logan-Klumpler et al. 2012) and EuPathDB (Aurrecoechea et al. 2013), to infer the functions of newly discovered genes in pathogens and other species. For example, the genes UGT78D1 and UGT73C6 from A. thaliana were assigned functions as 3-O-glycosyltransferase and flavonol 7-O-glycosyltransferase enzymes, respectively, based on sequence similarity in a database search, which was experimentally confirmed by further analyzing flavonol biosynthesis and protein expression of the enzymes (Jones et al. 2003). Similarly, the functions of 146 UDP-glycosyltransferase (UGT) genes in Oryza sativa (Sharma et al. 2013), 164 UGT genes in Medicago truncatula (Jaillon et al. 2007) and UDP-glycosyltransferase genes in trypanosomatids (Pereira and Jackson 2018) were assigned based on comparative genomics analysis. Transcriptomics studies also help to understand the functions of genes in secondary metabolism, as previously done for flavonoid UGT genes in A. thaliana (Yonekura-Sakakibara et al. 2007) and GT genes in protozoan parasites (Cantacessi et al. 2015). However, combined studies using genomics, transcriptomics, and proteomics approaches are much more powerful for understanding the metabolic involvement of GT genes. For example, comparative analysis of transcriptomic and proteomic data showed that four genes (encoding for the -rich protein, PfEMP-2, the glycophorin-binding protein 2, and a putative protein) are preferentially expressed in the cerebral malaria parasite (Bertin et al. 2016). Similarly, transcriptomics and metabolomics approaches are also quite useful for identifying the functions of newly discovered genes, as previously studied in human and protozoan parasites (Cantacessi et al. 2015; Scheltema et al. 2010). Databases such as ArrayExpress (Parkinson 2004), which provide highly curated and annotated functional genomics data, are useful in this context. EMBL (https://www.ebi.ac.uk) also provides a platform linking ArrayExpress with Ensembl (Aken et al. 2016) and UniProt (The UniProt Consortium 2014), which can be used for the functional annotation of newly discovered genes. Since functional genomics requires large-scale analysis of high-throughput data, substantial computational resources are generally needed to process the data and reduce noise.

Mouse phenotyping experiments can be carried out to study the phenotypic effects of altering the activity of GT or GBP genes in mouse lines. They help to understand the biological function of enzymes and the phenotypic effect of deleting them or altering their activity in biosynthetic pathways; however, this method is quite expensive due to the high maintenance cost of the experiments. This methodology is often applied to study human GT and GBP genes because of the high similarity between human and mouse genes (Emes et al. 2003; Dowell 2011). Genes are engineered in mice, for example using knockout models, and the altered phenotype is observed to infer the specific function of the genes. For example, overexpression of the O-GlcNAc transferase (OGT) enzyme causes diabetic phenotypes in mouse (McClain et al. 2002). In another study, multiple phenotypic effects (related to the control of reproduction and spleen B-cell functions) were observed upon knocking out the Lc3 synthase enzyme (B3gnt5 gene), which catalyzes lacto-neolacto ganglioside synthesis (Kuan et al. 2010). A comprehensive analysis of 36 mouse mutants with defects in GT enzymes or glycan-binding proteins identified more than 300 phenotypic changes in the mouse (Orr et al. 2013). Other similar gene knockout studies and their associated phenotypic effects in mouse lines are discussed in previous articles (Lowe and Marth 2003).

1.3 Data resources

Glycomic databases store a tremendous amount of information on glycans and glycoconjugates, from their structures to other chemical properties such as molecular weight and glycan-binding specificity. GlycomeDB (Ranzinger et al. 2011) (now part of GlyTouCan 1.0 (Ceroni et al. 2008; Aoki-Kinoshita et al. 2016)), GlycoWorkbench (Ceroni et al. 2008) and CFG (Consortium for Functional Glycomics) are among the most used databases providing structural information on carbohydrates (Table 1.2), as also reported in previous reviews (Baycin-Hizal et al. 2014; Artemenko et al. 2012) and textbooks (von der Lieth et al. 2009). More recently, GlyTouCan was upgraded to GlyTouCan 1.0, becoming one of the largest repositories of glycan structures (Ceroni et al. 2008; Aoki-Kinoshita et al. 2016). Some databases hold more specific data, such as GLYCOSCIENCES.de, which provides NMR-based glycan 3D structures (Sussman et al. 1998). These databases are powerful resources that support tools for structure-based computational analysis, including drug design (Ruyck et al. 2016; Hasan et al. 2015; Meng et al. 2011) and molecular modeling (Geromichalos 2007; Donga et al. 2016). Other databases of interest for the glycobiology community are glycoproteomic databases (Table 1.3), which focus more on the chemical properties of glycoproteins, their glycosylation sites and the types of glycosylation processes with which they are associated. UniPep (Zhang et al. 2006) and dbOGAP (Wang et al. 2011) are among the most popular glycoproteomic databases, storing human N-linked and O-linked glycoproteins, respectively, and providing both experimental and computationally predicted information. Similarly, GlycoProtDB (Kaji et al. 2012) provides data on N-linked glycoproteins and their associated glycans, whereas O-GlycBase (Gupta et al. 1999a) provides information on O-linked and C-linked glycoproteins. A few databases, such as GlycoFly (Baycin-Hizal et al. 2011) and GlycoFish (Baycin-Hizal et al. 2011), provide data derived from specific experimental methods such as hydrazide chemistry and solid phase extraction.

Some databases provide information on enzymes involved in glycosylation and glycoconjugate biosynthetic pathways (Table 1.3). Carbohydrate-Active Enzyme (CAZy) (Lombard et al. 2014) is one of the largest resources providing information on glycan-associated enzymes such as glycosyltransferases, glycoside hydrolases, polysaccharide lyases and carbohydrate esterases. Data in CAZy are well organized, easy to access and linked to many other databases such as Protein Data Bank (PDB) (Berman 2000), GenBank (Benson et al. 2013) and UniProt (The UniProt Consortium 2014). In a similar context, the Carbohydrate Structure Glycosyltransferase Database (CSDB_GT) (Egorova and Toukach 2017) provides more curated structural and functional information on GT enzymes. The CAZy and KEGG PATHWAY databases (http://www.genome.jp/kegg/pathway.html) are quite helpful for studying the functional roles of GT enzymes in metabolic pathways. Similarly, KEGG Orthology, a sub-dataset of KEGG (Kanehisa and Goto 2000), is a good resource for more general information, such as gene names, bibliographic links, and sources of GT enzymes. In addition, databases such as GlycoGeneDB (Narimatsu 2004), Gene Expression Omnibus (GEO) (Edgar and Lash 2002) and CFG provide expression data for genes involved in glycosylation processes, which may be important for many glycomic studies, especially for the functional analysis of GT enzymes.

Other relevant data, derived from lectin/glycan microarray experiments, facilitate high-throughput studies of glycan-protein interactions. CFG is one of the largest resources storing glycan-protein interaction data from glycan microarray and lectin microarray experiments. Lectin Frontier DataBase (LfDB) (Hirabayashi et al. 2015) also provides glycan-protein interaction data from glycan-based microarray experiments. Similarly, GlyAffinity (Frank and Schloissnig 2010) provides access to curated glycan-array data imported from the CFG and LfDB databases, focusing especially on different aspects of the interactions between glycans and proteins. GlycoEpitopeDB (Kawasaki et al. 2006) provides manually curated data for glycoproteins which are classified as antibodies. Other smaller databases, such as SugarBindDB (Shakhsheer et al. 2013), AnimalLectinDB (Kumar and Mittal 2011) and CancerLectinDB (Damodaran et al. 2008), also provide information on lectins and their associated target glycans. For the glycan databases, the formats used to represent glycan structures and the connectivity with other databases are given in Table 1.2. Briefly, the glycomic databases use two types of format to represent glycan structures: 1) text-based formats, and 2) graphical formats. Text-based formats represent the monosaccharides of a glycan as text, while graphical formats denote the monosaccharides using graphical symbols. The structural formats and the representation of glycan structures are discussed further in Chapter 2.
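To make the idea of a text-based format concrete, the following minimal sketch stores a branched glycan as a tree and writes it out as a single bracketed string in the style of IUPAC condensed notation. The class and function names are hypothetical, the linkage labels use "a" and "b" for the alpha and beta anomers, and the ordering rules that real formats apply to branches are not implemented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Residue:
    name: str                     # monosaccharide symbol, e.g. "Man"
    linkage: str = ""             # linkage to the parent, e.g. "a1-3"; empty at the reducing end
    children: List["Residue"] = field(default_factory=list)


def to_condensed(res: Residue) -> str:
    """Write a glycan tree as one line: main chain inline, side branches in brackets."""
    own = res.name + (f"({res.linkage})" if res.linkage else "")
    if not res.children:
        return own
    main, *branches = res.children   # first child continues the main chain
    prefix = to_condensed(main) + "".join(f"[{to_condensed(b)}]" for b in branches)
    return prefix + own


# Trimannosyl core shared by N-glycans, rooted at the reducing-end GlcNAc.
core = Residue("GlcNAc", "", [
    Residue("GlcNAc", "b1-4", [
        Residue("Man", "b1-4", [
            Residue("Man", "a1-3"),
            Residue("Man", "a1-6"),
        ]),
    ]),
])

print(to_condensed(core))  # Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
```

A graphical format would encode the same tree with one symbol per monosaccharide and one edge per linkage, so converting between the two types of format amounts to traversing this kind of tree.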

Other databases compiling functional data from mouse phenotypic experiments are also of interest to understand the role of GT enzymes and GBPs at the systems level (Table 1.3). CFG contains 29 mutant mouse models, currently managed by the Mutant Mouse Regional Resource Center (MMRRC) (Lloyd 2003), from different phenotyping experiments involving GT enzyme knockout studies. In a similar context, a large-scale phenotyping study on 36 mouse lines described the phenotypic effects of various glycogene alterations in the mouse, but the information is still not stored in any public database (Orr et al. 2013). The International Mouse Phenotyping Consortium (IMPC) (http://www.mousephenotype.org/) is a large resource providing functional insight into various human genes by carrying out systematic phenotyping studies on around 20000 knockout mouse strains; however, its coverage of glycogenes is poor. So far, there is no publicly available database that comprehensively stores data from mouse phenotyping studies specifically for glycogenes.

Only a few initiatives have been taken to collect information on complex molecules such as LPG, GIPL, and GPI-anchors. The BioCyc (Caspi et al. 2014) and LeishCyc (Doyle et al. 2009) databases provide information on the biosynthetic pathways of these complex molecules. Similarly, the LmSmdB database (Patel et al. 2016) provides specific information on various components of the metabolic and gene regulatory networks of L. major and Schistosoma mansoni. Data from the different databases mentioned above are useful for characterizing glycoconjugates in several species; however, the different data types have remained separated in glycomics research. This is mainly because of the unavailability of computational strategies to perform integrative analysis. Unfortunately, there is no specific database providing kinetic data on GT enzyme activity. Most kinetic studies are performed independently, and no framework is available to integrate these data into other analyses. Kinetic data can be useful for studying the functions of GT enzymes at the systems level. Most glycomic databases provide structural data on glycans and glyco-components in different organisms, but the information is dispersed and poorly linked with other glycomic and glycoproteomic databases, making integrative analyses difficult, especially for protozoan parasites. Further efforts are needed to standardize these databases in terms of their data and linked information.

Table 1.2: Detailed information on glycan databases. The graphical and text formats that each database uses for representing glycan structures, and its connectivity to other glycomics and glycoproteomics databases, are given in the corresponding columns.

| Database | Description | Graphical format | Text format | Connectivity to other databases | References | Web links |
|---|---|---|---|---|---|---|
| BCSDB | Database to provide structures and raw NMR data of glycans from bacterial species. | CFG, IUPAC (2D) | GlycoCT (condensed), GlycoCT (XML), GlydeII, LINUCS | | (Toukach 2011) | http://csdb.glycoscience.ru/bacterial |
| Cambridge Structural Database (CSD) | Database for crystal structures of many mono- and oligosaccharides from x-ray and neutron diffraction analyses. | MOL | pdb | PDB | (Allen 2002; Groom et al. 2016) | http://www.ccdc.cam.ac.uk |
| CFG | A comprehensive resource for data on glycan structures from different experiments, such as MS profiling, Glycan/Lectin microarrays, glycogene expression and mouse phenotyping. | CFG, IUPAC (2D) | LinearCode, IUPAC (linear) | GLYCOSCIENCES.de, GLYCAM-Web, CarbBank | Consortium for Functional Glycomics | http://www.functionalglycomics.org |
| GDB | Database for N- and O-linked glycan structures annotated from PDB database. | IUPAC (2D) | - | PDB, KEGG | (Nakahara et al. 2008) | http://www.glycostructures.jp/ |
| GFDB | Database to provide torsion angle distribution from the glycan structures in the PDB database. | CFG | - | PDB | (Jo and Im 2013) | http://www.glycanstructure.org/ |
| GlycoBase (now part of UniCarbKB) | Repertoire for structural data of glycans which are characterized using HPLC-based methods. | CFG, Oxford, IUPAC (2D) | GlycoCT (condensed) | CarbBank, GLYCOSCIENCES.de | (Campbell et al. 2008) | http://wwsw.nibrt.ie/ |
| GlycomeDB (now part of GlyTouCan) | Database to provide structural and taxonomic data of glycans from different resources. | CFG, Oxford, IUPAC (2D), GlycoCT notation | GlycoCT (condensed), GlycoCT (XML), GlydeII | GLYCOSCIENCES.de, CFG, KEGG, BCSDB, CarbBank, GlycoBase, PDB | (Ranzinger et al. 2011) | http://www.glycome-db.org/ ; https://glytoucan.org/ |
| *GlycoMind | Commercial database to provide information on structures, functions, and interaction of glycan molecules with proteins. | - | LinearCode | - | Glycominds Ltd. | http://www.glycominds.com |
| GLYCOSCIENCES.de | The database provides glycan structures, NMR data, and analytical software tools. | IUPAC (2D) | LINUCS | PDB | (Lütteke et al. 2006) | http://www.glycosciences.de |
| GlycoSuiteDB (now part of UniCarbKB) | Comprehensive resources for glycan structures curated from the literature. | IUPAC (2D) | IUPAC (linear) | SWISS-PROT/TrEMBL | (Cooper et al. 2003) | http://glycosuitedb.expasy.org |
| EUROCarbDB (now part of UniCarbKB) | Database to provide glycan structures and their biological context along with interpretation with HPLC, MS/MS, and NMR data. | - | - | - | (Von Der Lieth et al. 2011) | http://www.ebi.ac.uk/eurocarb |
| GlyTouCan 1.0 | Uncurated repository for glycan structures with globally unique accession number to each entry. | CFG | GlycoCT (condensed), WURCS, IUPAC (condensed and extended) | BCSDB, GlycoEpitope, UniCarbKB, CarbBank, CFG, GlycO, Glycobase, GLYCOSCIENCES.de, JCGGDB, KEGG, PDB | (Ceroni et al. 2008; Aoki-Kinoshita et al. 2016) | https://glytoucan.org/ |
| JCGGDB | The repertoire of data on glycans, glycosylated proteins, lectins, glycosylation site, genes and glycan biosynthesis. | CFG | InChIKey | GlyTouCan, KEGG, CarbBank, GMDB | (Maeda et al. 2015) | http://jcggdb.jp/index_en.html |
| KEGG GLYCAN | Database for glycan structures, glycogenes, and glycan biosynthesis related data. | IUPAC (2D) | KCF | CCSD, GlyTouCan, JCGGDB | (Hashimoto et al. 2006) | http://www.genome.jp/kegg/glycan |
| MonosaccharideDB | A comprehensive resource for compositions, stereo-specific information, intermolecular linkage and residues information of monosaccharides. | CFG | modified GlycoCT (condensed), IUPAC (linear), pdb, BCSDB format, GlydeII | CarbBank, GLYCOSCIENCES.de, BCSDB, PDB | | http://www.monosaccharidedb.org |
| SugarBindDB | Database to provide information on structures of the lectin binding glycans. | CFG, Oxford, IUPAC (2D) | IUPAC (linear), GlycoCT (condensed) | UniCarbKB, UniProtKB | (Shakhsheer et al. 2013) | http://sugarbind.expasy.org |
| *SWEET-DB | Database to provide annotation and cross-references for glycans related data. | IUPAC (2D) | LINUCS | CarbBank | (Loss et al. 2002) | http://www.dkfz.de/spec2/sweetdb/ |
| UniCarb-DB | Database to provide LC/MS-MS, HPLC data for glycan structures from different organisms. | CFG, Oxford, IUPAC (2D) | - | GlyTouCan 1.0 | (Hayes et al. 2011) | http://www.unicarb-db.org |
| UniCarbKB | Database of curated glycan structures and glycoproteins from different resources. | CFG, Oxford, IUPAC (2D) | GlycoCT | PubChem, GlycoBase, UniCarb-DB | (Campbell et al. 2014) | http://unicarbkb.org |
| Glyco3D | Portal for 3D features of saccharides (mono, di, oligo and poly), glycosyltransferases and lectins. | - | pdb | - | (Pérez et al. 2015) | http://glyco3d.cermav.cnrs.fr |

*Database URL is not functional (records on Sept 10, 2017)

Table 1.3: Databases providing data on GT enzymes and GBPs, including gene expression profiles, phenotypic studies in mouse, lectins and glycan-protein interactions.

| Databases | Description | Reference | Web links |
|---|---|---|---|
| Glycan/lectin microarrays databases | | | |
| CFG | Microarrays data for different glycans, lectins, antibodies and pathogenic proteins from human | Consortium for Functional Glycomics | http://www.functionalglycomics.org/glycomics/publicdata/microarray.jsp |
| *GlyAffinity | Glycan arrays data from different resources. | (Frank and Schloissnig 2010) | http://worm.mpi-cbg.de/affinity/ |
| LfDB | Glycan arrays data for various lectins | (Hirabayashi et al. 2015) | http://jcggdb.jp/rcmg/glycodb/LectinSearch |
| Glycosyltransferase (GT) enzymes database | | | |
| CAZy | Information on different families of GT enzymes | (Lombard et al. 2014) | http://www.cazy.org/ |
| CAZypedia | Descriptive information (Wiki) for GT | (Kanehisa and Goto 2000) | http://www.cazypedia.org/ |
| CFG | GT enzymes involved in mammalian glycans/glycoconjugates biosynthesis | Consortium for Functional Glycomics | http://www.functionalglycomics.org/glycomics/molecule/jsp/glycoEnzyme/geMolecule.jsp |
| KEGG Orthology | Information on GT enzymes and catalyzed reactions | (Kanehisa et al. 2016) | http://www.genome.jp/kegg/ko.html |
| KEGG pathways | GT enzymes involved in the biosynthesis of glycans and glycoconjugates in various species | (Kanehisa 2002) | http://www.genome.jp/kegg/pathway.html |
| Databases for lectins and glycan-binding proteins (GBPs) | | | |
| AnimalLectinDB | A collection of lectins from mammalian species | (Kumar and Mittal 2011) | http://www.research-bioinformatics.in |
| CFG | A collection of GBPs and glycan epitopes for related diseases | Consortium for Functional Glycomics | http://www.functionalglycomics.org/glycomics/molecule/jsp/gbpMolecule-home.jsp |
| CancerLectinDB | Database for lectins relevant to cancer | (Damodaran et al. 2008) | http://proline.physics.iisc.ernet.in/cancerdb |
| dbOGAP | Collection of O-linked glycosylated proteins from H. sapiens, M. musculus, R. norvegicus, D. melanogaster, X. laevis | (Wang et al. 2011) | http://cbsb.lombardi.georgetown.edu/hulab/OGAP.html |
| Genomics Resource for Animal Lectins | Information on structures and functions of animal lectins | | http://www.imperial.ac.uk/research/animallectins/ |
| GlycoFish | A collection of Zebrafish N-glycosylated proteins and peptides | (Baycin-Hizal et al. 2011) | http://betenbaugh.jhu.edu/GlycoFish |
| GlycoFly | Resource for N-glycosylated proteins from D. melanogaster | (Baycin-Hizal et al. 2011) | http://betenbaugh.jhu.edu/GlycoFly |
| GlycoProtDB | Information on N-glycosylated proteins and their glycosylated sites | (Kaji et al. 2012) | http://jcggdb.jp/rcmg/gpdb/index.action |
| GlycoSuiteDB (now part of UniCarbKB) | A collection of O- and N-linked glycoproteins from previous literature | (Cooper et al. 2003) | http://www.glycosuite.com |
| LfDB | Information on lectin interactions with glycans in the different organisms. | (Hirabayashi et al. 2015) | http://jcggdb.jp/rcmg/glycodb/LectinSearch |
| O-GlycBase | Collection of O- and C-glycosylated proteins from previous literature | (Gupta et al. 1999a) | www.cbs.dtu.dk/databases/OGLYCBASE/ |
| Pathogen Adherence Carbohydrate Database (PACDB) | Information on GBPs and associated glycans from various human pathogens | | http://jcggdb.jp/search/PACDB.cgi |
| SugarBindDB | Database to provide lectins from pathogens and their associated glycans. | (Shakhsheer et al. 2013) | http://sugarbind.expasy.org |
| UniPep | A collection of human N-glycosylated proteins and peptides | (Zhang et al. 2006) | http://www.unipep.org |
| *GlycoEpitopeDB | A collection of manually curated data for glycoproteins which are classified as antibodies | (Kawasaki et al. 2006) | http://www.glyco.is.ritsumei.ac.jp/epitope/ |
| Gene expression databases for GTs and GBPs | | | |
| CFG | GT gene expression data | Consortium for Functional Glycomics | http://www.functionalglycomics.org/glycomics/publicdata/microarray.jsp |
| Gene Expression Omnibus (GEO) | Expression profiling data for GT genes | (Edgar and Lash 2002) | http://www.ncbi.nlm.nih.gov/geo/ |
| GlycoGeneDB (now part of JCGGDB) | Information on gene names, enzyme names, DNA sequences and expression profile for GT genes | (Maeda et al. 2015) | http://riodb.ibase.aist.go.jp/rcmg/ggdb/ |
| Mouse phenotyping databases/resources | | | |
| CFG | Data from the phenotypic analysis in the mouse after GT/GBP genes knockouts | Consortium for Functional Glycomics | http://www.functionalglycomics.org/glycomics/publicdata/phenotyping.jsp |
| International Mouse Phenotyping Consortium (IMPC) | Consortium to provide data from mouse phenotyping experiments for various human genes | (Koscielny et al. 2014) | http://www.mousephenotype.org/ |
| Mutant Mouse Regional Resource Center (MMRRC) | Repository of mutant mouse lines from phenotyping experiments performed on various human genes | (Lloyd 2003) | http://www.mmrrc.org |
| Phenotypic survey | Resource for >300 phenotypic changes from studies on 36 mouse mutants | (Orr et al. 2013) | http://www.ncbi.nlm.nih.gov/pubmed/23118208/ |

*Database URL is not functional (records on September 10, 2017)

1.4 Glycoinformatic tools

1.4.1 Glycan 3D structures analysis

Glycan structures modeling tools

NMR and X-ray studies of glycans, glycoconjugates and protein molecules play a key role in simplifying biological complexity; however, the branched nature of glycoconjugates complicates their analysis compared to simpler molecules. To facilitate the analysis of raw data from these experiments, various databases and web-based tools have been developed. For example, SWEET II (Bohne et al. 1998) converts a glycan sequence to its most probable 3D structures. The interface accepts user-input monosaccharide symbols, which are assembled into oligomers based on the linkage information. The conformational space for each glycosidic bond is then generated and optimized to obtain the most stable 3D structures. Similarly, the GlycoMapsDB database (Frank et al. 2007) stores more than two thousand calculated conformational maps for polysaccharide fragments, which help in checking the torsion values of the glycosidic linkages during optimization of the 3D model generated by the SWEET II tool. PDB has been linked with GlycoMapsDB in order to retrieve glycosidic torsions in glycan molecules. Similarly, GlyProt (Bohne-Lang and Von der Lieth 2005) models the attachment of N-glycans to the available glycosylation sites in the 3D structure of proteins. The most stable N-glycan conformation is selected based on various simulations and comparisons with experimental data. In the same context, another tool, Glydict (http://www.glycosciences.de/modeling/glydict/), was developed to perform molecular dynamics simulations for predicting the most suitable glycan 3D structures, taking the influence of branching into account. Similar to Glydict, Shape (Rosen et al. 2009) is a genetic algorithm-based tool for automated modeling, which predicts the conformations of carbohydrate structures of at most 100 atoms in size.

Quality checking tools for 3D structures of Glycans

Quality assessment of the 3D structure of a glycan is important before performing structure-based analyses. PDB Carbohydrate REsiduecheck (pdb-care) (Lütteke and von der Lieth 2004a) was developed to check and improve the quality of the 3D structures from PDB. For a similar purpose, CArbohydrate Ramachandran Plot (CARP) (Lütteke et al. 2005) was developed to draw the Ramachandran plots² of carbohydrate 3D structures. The tool is useful for checking the quality of structures by examining torsion angles plotted against each other. The program incorporates glycan structural data available from PDB. The application uses the pdb2linucs tool (Lütteke et al. 2004) to convert pdb files to LINUCS format (Bohne-Lang et al. 2001) for torsion analysis. Output can be extracted in the form of conformational maps showing energetically preferred regions. Carbohydrate Structure Suite (CSS) (Lütteke et al. 2005) can also be used to access glycan structures from PDB to perform 3D structural analysis.

² A Ramachandran plot describes the dihedral angles ψ against φ of the monosaccharides in glycan structures, and is generally used for the structural validation of carbohydrate molecules. In general, the method was used to analyse protein structures.
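The torsion angles that tools such as CARP plot against each other are ordinary dihedral angles defined by four consecutive atoms. As a minimal illustration (this is not code from CARP or GlycoTorsion), the Python sketch below computes a signed dihedral angle from four 3D coordinates. The function name `dihedral`, the example coordinates, and the sign convention (0° for cis, ±180° for trans, positive for a clockwise rotation when viewed along the central bond) are assumptions made for this example; which four atoms define φ and ψ of a glycosidic linkage depends on the convention adopted by each tool.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) defined by four 3D points."""
    b1 = _sub(p2, p1)          # bond atom1 -> atom2
    b2 = _sub(p3, p2)          # central bond atom2 -> atom3
    b3 = _sub(p4, p3)          # bond atom3 -> atom4
    n1 = _cross(b1, b2)        # normal of the plane (atom1, atom2, atom3)
    n2 = _cross(b2, b3)        # normal of the plane (atom2, atom3, atom4)
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_len
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to give an easy-to-check angle.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying such a function to every glycosidic linkage of a structure and plotting ψ against φ gives a Ramachandran-like scatter plot of the kind described above.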

Tools for analyzing torsion angles and glycosylation sites

GlycoTorsion (Lütteke et al. 2005) was developed to analyze the torsion angles of the various monosaccharide linkages in glycan structures. The application is integrated with the translation tool “pdb2linucs”, which imports carbohydrate data from pdb files and converts them to LINUCS format for characterizing torsion angles. The application was later upgraded to analyze ring torsions, N-acetyl torsions, omega torsions and Asn-residue side chain torsions. In the same way, GlySeq (Lütteke et al. 2005) focuses on statistical analysis of the sequences neighboring the glycosylation sites in glycoproteins. Similar to GlySeq, GlyVicinity (Rojas-Macias and Lutteke 2015) also uses PDB data to analyze the environment surrounding the glycosylation site, and provides connectivity to GlySeqDB (Lütteke et al. 2005), a database of glycoprotein amino acid sequences. This application overcomes a drawback of GlySeq, which cannot analyze carbohydrates that are surrounded by amino acids but lack a covalent linkage to the protein. As a further development, SuMo is an experimental tool under development at GLYCOSCIENCES.de that focuses on sugar motif search in carbohydrate structures.

Glycan motif search and substructure analysis tools

Motif analysis is an attractive strategy for revealing biologically important information from glycan 3D structures. The Multiple Carbohydrate Alignment with Weight (MCAW) tool (Hosoda et al. 2017) was developed to understand the glycan recognition mechanism of glycan-binding molecules based on the alignment of glycan tree structures and their weights, as described in (Hosoda et al. 2012). The methodology is useful for determining glycan patterns, which help to understand their interactions with non-sugar molecules. Another profiling tool, PSTMM, based on the probabilistic sibling-dependent tree Markov model (Akune et al. 2010), is also capable of identifying patterns in glycan structures. This probabilistic strategy was developed based on the methodology described in (Akune et al. 2010); however, the tool suffers from the problem of over-fitting the data. The Glycan Kernel Tool (Akune et al. 2010) was also developed, based on the strategy described in (Jiang et al. 2011), to identify glycan motifs; its working strategy incorporates the similarity and biochemical information of the glycans. This method is capable of identifying biologically important motifs in glycan structures. Glycan Pathway Predictor (GPP) (Aoki-Kinoshita 2015) was developed to predict N-glycosylation biosynthetic pathways based on the mathematical strategies described in (Krambeck and Betenbaugh 2005; Krambeck et al. 2009). As input, the tool requires glycan structures in KCF encoding format (Hattori et al. 2003), a list of the enzymes available in the organism, and the molecular mass to be produced in the metabolic pathways. These tools overcame the over-fitting problem of the previously developed PSTMM tool. In the same context, the GlycoMiner tool (Ozohanics et al. 2008) focuses on mining motifs and significant subtrees in complex glycan structures. The application implements the concept of "closed frequent subtrees" developed in a previous study (Hashimoto et al. 2008). The identified subtrees were validated for their significance in glycobiological processes. Other widely recognized computational tools for analyzing 3D structures of glycans are listed in Table 1.4.

To facilitate more interactive analysis of the 3D structures of glycans, glycoconjugates, and glycoproteins, the existing 3D tools can be integrated with glycan structural databases. Similar to CSS and CARP, which provide access to the PDB for glycan 3D structural analysis, other tools such as GlycoTorsion and GlyVicinity, which perform statistical analysis on glycan structures, could also be integrated into structural databases. The Cambridge Structural Database (CSD) (Allen 2002; Groom et al. 2016), which provides > 900,000 highly curated 3D structures of molecules from X-ray and neutron diffraction analyses, is also a good resource for carbohydrate structure-associated analyses in glycomics research. Recently, the GLYCAM-Web server (http://glycam.org/) was developed by Prof. Robert J. Woods at the Complex Carbohydrate Research Center (CCRC) to perform 3D structure modeling of glycan molecules. Additionally, the platform supports modeling of specific oligosaccharide conformations and glycoprotein 3D structures. Similarly, the Glyco3D portal (Pérez et al. 2015) provides 3D features of glycans and GBPs calculated mainly from diffraction experiments. This information can be used for different 3D structural analyses of glycans and glycoproteins.

Table 1.4: Databases and tools which facilitate different analytical tasks, such as accessing the glycomics databases, glycan structures visualization, structural notation translations, and glycan 3D structural analysis.

Glycosciences.de

| Type of entity | Tool/Database | Description | Reference | Web link |
|---|---|---|---|---|
| Glycan structures database | Glycosciences.de | Database to provide glycan structures from various sources | (Lütteke et al. 2006) | http://www.glycosciences.de/database/structure/ |
| 3D structures analysis tools | GlycoMapsDB | Database for calculated glycosidic linkage conformations from the glycan structures | (Frank et al. 2007) | http://www.glycosciences.de/modeling/glycomapsdb |
| 3D structures analysis tools | CARP | Generating Ramachandran-like plots of carbohydrate linkage | (Lütteke et al. 2005) | http://www.glycosciences.de/tools/carp/ |
| 3D structures analysis tools | pdb-care | Carbohydrate residues error checking tools in pdb files | (Lütteke and von der Lieth 2004) | http://www.glycosciences.de/tools/pdbcare/ |
| 3D structures analysis tools | GlycoTorsion | Statistical analysis of carbohydrate torsion angles | (Lütteke et al. 2005) | http://www.glycosciences.de/tools/glytorsion/ |
| 3D structures analysis tools | GlyVicinity | Statistical analysis of the amino acids present in the vicinity of carbohydrate residues | (Rojas-Macias and Lutteke 2015) | http://www.glycosciences.de/tools/glyvicinity/ |
| 3D structures analysis tools | GlySeq | Statistical analysis of the sequences around glycosylation sites in the proteins | (Lütteke et al. 2005) | http://www.glycosciences.de/tools/glyseq/ |
| 3D structures analysis tools | SWEET II | Conversion of the primary sequence of a complex carbohydrate into a reliable 3D molecular model | (Bohne et al. 1998) | http://www.glycosciences.de/modeling/sweet2/doc/index.php |
| 3D structures analysis tools | GlyProt | Tool for predicting glycosylation sites in proteins | (Bohne-Lang and Von der Lieth 2005) | http://www.glycosciences.de/modeling/glyprot/php/main.php |
| 3D structures analysis tools | Glydict | N-glycan 3D structures prediction tool | (Frank et al. 2002) | http://www.glycosciences.de/modeling/glydict/ |
| 3D structures analysis tools | SuMo (under development) | Tools to perform motif search in glycans | under development | http://www.glycosciences.de/tools/sumo/ |
| Structural notation translation | pdb2linucs | Tool to convert pdb format to LINUCS or IUPAC format | (Lütteke et al. 2004) | http://www.glycosciences.de/tools/pdb2linucs/ |
| Structural notation translation | LiGraph | Tool to convert the graphical format of glycan to ASCII IUPAC sugar nomenclature | (Lütteke et al. 2004) | http://www.glycosciences.de/tools/LiGraph/ |

KEGG tools

| Type of entity | Tool/Database | Description | Reference | Web link |
|---|---|---|---|---|
| Glycan structures database | KEGG GLYCAN Structure Database | Database to provide glycan structures, bibliography, and links to other glycomic resources | (Hashimoto et al. 2006) | http://www.genome.jp/kegg/glycan/ |
| Glycan structures search tool | KCaM Search Tool | Tool to provide KCF based search into KEGG database. It also provides structural alignment analysis on glycan structures | (Aoki et al. 2004) | http://www.genome.jp/tools/kcam/ |
| Drawing tool | KegDraw Tool | Interface to draw glycan structures to perform different analyses | (Hashimoto et al. 2006) | http://www.kegg.jp/kegg/download/kegtools.html |

WURCS Working Group

| Type of entity | Tool/Database | Description | Reference | Web link |
|---|---|---|---|---|
| Structural notation translation | GlycoCT to Mol | To convert GlycoCT to mol format | (Tanaka et al. 2014) | http://www.wurcs-wg.org/tool/converter/glycocttomol/input |
| Structural notation translation | mol to WURCS | To convert mol to WURCS format | (Tanaka et al. 2014) | http://www.wurcs-wg.org/tool/mol2wurcs.php |
| Structural notation translation | GlycoCT to WURCS | To convert GlycoCT to WURCS format | (Tanaka et al. 2014) | http://www.wurcs-wg.org/tool/converter/glycocttowurcs/input |
| Structural notation translation | WURCS to IUPAC | To convert WURCS to IUPAC format | (Tanaka et al. 2014) | http://www.wurcs-wg.org/tool/WURCS2IUPAC.php |
| Structural notation translation | Chemical Structures to WURCS | To convert chemical structures to WURCS format | (Tanaka et al. 2014) | http://www.wurcs-wg.org/tool/edit/draw2wurcs.php |
| Glycan structures visualization | GlyTouCan Glycan Viewer | To visualize the glycan structures | (Tanaka et al. 2014) | http://www.wurcs-wg.org/tool/idviewer.php |
| ID Conversion | GlyTouCan-GlycomeDB | To map IDs in GlyTouCan and GlycomeDB databases | (Tanaka et al. 2014) | http://www.wurcs-wg.org/tool/idconv.php |
| Glycan search tool | GlycoCT String Search (GlycoCT) | To facilitate GlycoCT based search into GlycomeDB database | (Tanaka et al. 2014) | http://www.wurcs-wg.org/db/glycomedb |

RINGS Resources

| Type of entity | Tool/Database | Description | Reference | Web link |
|---|---|---|---|---|
| Glycan structures database | RINGS database | The resource for glycan structures | (Akune et al. 2010) | http://rings.t.soka.ac.jp/ |
| Glycan structures visualization | DrawRINGS | Tool to draw glycan structures to make a query into RINGS database | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/DrawRings/drawrings2.pl |
| Glycan structures analysis | GlycomeAtlas | Facilitates visualization of structures of the glycome from human and mouse tissue samples | (Konishi and Aoki-Kinoshita 2012) | https://rings.t.soka.ac.jp/GlycomeAtlas/GUI.html |
| Glycan structures analysis | Glycan Kernel Tool | Glycan structural classification tools | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/Kernel/kernel-tool.pl |
| Glycan structures analysis | Profile PSTMM tool | To generate glycan profiles from glycan structural data | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/ProfilePSTMM/profile-training_index.pl |
| Glycan structures analysis | MCAW | To perform multiple glycan structures alignment | (Hosoda et al. 2017) | http://rings.t.soka.ac.jp/cgi-bin/tools/MCAW/mcaw_index.pl |
| Glycan structures analysis | Glycan Miner | For mining alpha-closed subtrees in the glycan structures | (Aoki-Kinoshita 2013) | http://rings.t.soka.ac.jp/cgi-bin/tools/GlycanMiner/Miner_index.pl |
| Glycan structures analysis | Glycan Pathway Predictor (GPP) | To predict N-glycan biosynthetic pathways from given glycan structures | (Aoki-Kinoshita 2015) | http://rings.t.soka.ac.jp/cgi-bin/tools/GPP/gpp_index.pl |
| Structural notation translation | GlycoCT to KCF | To convert GlycoCT to KCF format | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/GlycoCTtoKCF_au/glycoct_index_au.pl |
| Structural notation translation | GLYDE2 to KCF | To convert GlydeII to KCF format | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/GLYDE2toKCF/glyde2_index.pl |
| Structural notation translation | IUPAC to KCF | To convert IUPAC to KCF format | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/IUPACtoKCF_au/iupactokcf_index_au.pl |
| Structural notation translation | KCF to IMAGE | To convert KCF to CFG image format | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/KCFtoIMAGE/kcf_to_image_index.pl |
| Structural notation translation | KCF to LinearCode | To convert KCF to LinearCode format | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/KCFtoLinearCode/kcf_to_linearcode_index.pl |
| Structural notation translation | KCF to LINUCS | To convert KCF to LINUCS format | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/KCFtoLINUCS/kcf_to_linucs_index.pl |
| Structural notation translation | KCF to mol | To convert KCF to mol format | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/KCFtoMol/kcfToMol_index.pl |
| Structural notation translation | KEGG GLYCAN ID to KCF | To map KEGG glycan ID to KCF format | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/KEGG_GLYCAN_IDtoKCF/keggtokcf_index.pl |
| Structural notation translation | LINUCS to KCF | To convert LINUCS to KCF format | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/LINUCStoKCF/linucs_to_kcf_index.pl |
| Structural notation translation | LinearCode to KCF | To convert LinearCode to KCF format | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/LinearCodetoKCF/linearcode_to_kcf_index.pl |
| Structural notation translation | Convert Tool | To convert GlycoCT condensed, KCF, IUPAC, LinearCode, LINUCS to almost any other format (depending on input) | (Akune et al. 2010) | http://rings.t.soka.ac.jp/cgi-bin/tools/utilities/convert/index.pl |

Other independent tools for 3D structural data analysis

| Type of entity | Tool/Database | Description | Reference | Web link |
|---|---|---|---|---|
| | Shape | Molecular conformation prediction in carbohydrates | (Rosen et al. 2009) | http://sourceforge.net/projects/shapega/ |
| | Structure Feature Analysis Tool (SFAT) | Structure-based method to predict N-glycosylation in eukaryotes | (Lam et al. 2013) | http://hive.biochemistry.gwu.edu/tools/sfat |
| | CSS | Perform automatic analysis on 3D structures of glycans from Protein Data Bank (PDB) | (Lütteke et al. 2005) | www.dkfz.de/spec/css/ |
| | GLYCAM-Web | Set of tools for predicting glycan 3D structures and performing different analysis | web server at http://glycam.org/ | http://glycam.org/ |

1.4.2 Glycosylation prediction

Tools for predicting N-, O-, and C-linked glycosylation

Various prediction tools, such as GlycoEP (Chauhan et al. 2013) and NetNGlyc (Gupta et al. 2004), have been developed for predicting N-glycosylation sites in protein sequences (Table 1.5). Most of these tools were developed using machine learning methods trained on mammalian data, and are thus limited to predictions for mammalian proteins, such as alpha-1-acid glycoprotein and alpha-1-antitrypsin. This limitation was tackled by other tools, such as the Glycosylation Prediction Program (GPP) (Hamby and Hirst 2008) and EnsembleGly (Caragea et al. 2007), which are capable of predicting N- and O-glycosylation in both prokaryotic and eukaryotic proteins. More recently, the web-based tool SFAT (Lam et al. 2013) was developed, which uses a protein's 3D structural information to predict N-glycosylation with 93% accuracy in Homo sapiens, Mus musculus, Drosophila melanogaster, A. thaliana and Saccharomyces cerevisiae. This tool uses the Classification And Regression Tree (CART)³ algorithm to exploit structural information of the glycoproteins for predicting N-glycosylation in eukaryotes, but its application is limited to these five species. If the 3D structure of a glycoprotein is not available, the web server automatically uses a predicted structure; however, in such cases the prediction accuracy of the method is reduced by 19%.

³ CART refers to a decision tree algorithm that can be used for classification or regression predictive modelling problems.

O-glycosylation, which occurs in prokaryotes and eukaryotes, has also been modeled using different computational methods. Tools capable of predicting mucin-type O-glycosylation include NetOGlyc (Hansen et al. 1998), Oglyc (Li et al. 2006), GPP, and CKSAAP_OGlySite (Chen et al. 2008) (Table 1.5). Among the various machine learning-based tools, GPP is the most accurate at predicting glycosylation, with an accuracy of 90.8% for Ser sites, 92.0% for Thr sites and 92.8% for Asn sites. Another tool, GlycoX (An et al. 2006), is a MATLAB-based program which requires the experimentally determined mass of each glycopeptide and the glycan spectrum to determine N- and O-glycosylation sites, as well as the abundance of glycans on the binding sites in a protein sequence. Compared to N- and O-glycosylation, mannosylation has been poorly studied, possibly because of the lack of sophisticated experimental technologies. Only a few efforts have been made in the past to predict mannosylation sites in proteins. To the best of our knowledge, EnsembleGly, GlycoEP, and NetCGlyc (Julenius 2007) are the only tools which facilitate prediction of C-mannosylation sites in a given protein sequence (Table 1.5).


Table 1.5: Web-based tools for predicting N-, O-, C-linked glycosylation, glycation, and glypiation in different organisms. The corresponding column of each tool describes the machine learning method used to develop the tool.

Type of glycosylation columns: N-linked, O-linked, C-linked, Glycation, Glypiation.

| Web servers | Methods | Reference | Web links |
|---|---|---|---|
| DictyOGlyc | NN | (Gupta et al. 1999) | http://www.cbs.dtu.dk/services/DictyOGlyc |
| NetOGlyc 4.0 | NN | (Hansen et al. 1998) | http://www.cbs.dtu.dk/services/NetOGlyc |
| Glycosylation Prediction Program (GPP) | RF | (Hamby and Hirst 2008) | http://comp.chem.nottingham.ac.uk/glyco |
| EnsembleGly | SVM | (Caragea et al. 2007) | http://glycomics.ccrc.uga.edu/GlycomicsPortal/showEntry.action?id=76 |
| Oglyc | SVM | (Li et al. 2006) | http://www.biosino.org/Oglyc |
| GlycoEP | SVM | (Chauhan et al. 2013) | http://www.imtech.res.in/raghava/glycoep |
| ISOGlyP | Glycopeptides randomization | (Gerken et al. 2011) | http://isoglyp.utep.edu |
| CKSAAP_OglySite | SVM | (Chen et al. 2008) | http://bioinformatics.cau.edu.cn/zzd_lab/CKSAAP_OGlySite |
| NetNGlyc | ANN | (Gupta et al. 2004) | http://www.cbs.dtu.dk/services/NetNGlyc |
| NetCGlyc | NN | (Julenius 2007) | http://www.cbs.dtu.dk/services/NetCGlyc |
| YinOYang 1.2 | NN | (Fankhauser and Mäser 2005) | http://www.cbs.dtu.dk/services/YinOYang |
| SFAT | CART | (Lam et al. 2013) | http://hive.biochemistry.gwu.edu/tools/sfat |
| GPI-SOM | ANN | (Fankhauser and Mäser 2005) | http://gpi.unibe.ch |
| PredGPI | HMM and SVM | (Pierleoni et al. 2008) | http://gpcr.biocomp.unibo.it/predgpi |
| FragAnchor | ANN and HMM | (Poisson et al. 2007) | http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html |
| Big-PI | Scoring Function | (Eisenhaber et al. 1999) | http://mendel.imp.ac.at/gpi/gpi_server.html |
| NetGlycate 1.0 | ANN | (Johansen et al. 2006) | http://www.cbs.dtu.dk/services/NetGlycate/ |

Legend: RF - Random Forest; CART - Classification and Regression Tree; NN - Neural Network; ANN - Artificial Neural Network; HMM - Hidden Markov Model; SVM - Support Vector Machine; “Light Green” - Tools for eukaryotic species only; “Grey” - Tools for both eukaryotic and prokaryotic species.


Tools for predicting glypiation and glycation

Computational tools such as Big-PI (Eisenhaber et al. 1999) and DGPI (Kronegg and Buloz 1999) were developed for predicting glypiation in proteins. These tools incorporated machine learning methods, in which protein features such as the amino acid composition around the glypiation site of the target proteins were used as training datasets in cross-validation. Later, an Artificial Neural Network (ANN)-based tool, GPI-SOM (Fankhauser and Mäser 2005), was developed to improve prediction accuracy by performing mutagenesis⁴ experiments at the C-terminal end, which addressed the difficulty of identifying the key amino acids that recognize the GPI anchor. The previous model suffered from a high false positive rate because of the presence of integral proteins with transmembrane domains. FragAnchor (Poisson et al. 2007) was another tool, based on ANN and a Hidden Markov Model (HMM), developed to overcome the difficulty of correctly identifying C-termini susceptible to GPI integration. The model correctly predicted 91% of the glypiation sites in proteins from the Swiss-Prot database (Cooper et al. 2003); however, the method was less sensitive to the background noise in the database as it could only read the C-terminus of the proteins. Subsequently, PredGPI (Pierleoni et al. 2008) was developed by coupling an HMM with a Support Vector Machine (SVM), which further reduced false positive predictions compared to the previously developed tools. More recently, a more accurate SVM-based approach was developed for predicting GPI-anchored proteins with 96% prediction accuracy (Cao et al. 2009). To study glycation sites in proteins, the ANN-based tool NetGlycate-1.0 (Johansen et al. 2006) was developed to predict glycation of ε-amino groups of lysine residues in mammalian proteins.
Subsequently, an effort was made to improve the prediction of glycation sites by developing SVM models with “Maximum Relevance Minimum Redundancy (mRMR)” and “Incremental Feature Selection (IFS)” (Liu et al. 2015). Although this more recent method is more accurate in terms of the maximal Matthews correlation coefficient (MCC)⁵, the dataset used is smaller, which suggests that more experimental data are needed to train the model and improve its accuracy in predicting glycation.
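The Matthews correlation coefficient used to compare these predictors is computed from the four entries of a binary confusion matrix as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The Python sketch below shows this standard formula alongside sensitivity and specificity; the function names are hypothetical, the counts are made up for illustration and are not taken from any of the cited studies, and returning 0 when the denominator is zero is a common convention assumed here.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator

def sensitivity(tp, fn):
    """Fraction of true sites that are predicted as sites."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-sites that are predicted as non-sites."""
    return tn / (tn + fp)

if __name__ == "__main__":
    # Illustrative counts only.
    tp, tn, fp, fn = 80, 90, 10, 20
    print(mcc(tp, tn, fp, fn), sensitivity(tp, fn), specificity(tn, fp))
```

Unlike plain accuracy, the MCC stays informative when the numbers of positive and negative sites are very different, which is typical for glycation and glycosylation datasets.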

Current glycosylation prediction tools (Table 1.5), except for SFAT, were developed based on machine learning algorithms and apparently did not incorporate any 3D structural information of proteins and glycans during the model learning process. The process of glycosylation depends on many factors, such as the charge, molecular size, sequence and orientation of the binding pocket in the proteins, as well as on the target glycans, which unfortunately were not considered in the previous models. As discussed earlier, SFAT used structural information of the glycoproteins for predicting glycosylation; however, no 3D information about the target glycans was used. To promote research in this context, a strategy for using 3D structural tools in predicting glycosylation has been proposed in previous reviews (Mazola et al. 2011).

⁴ Mutagenesis experiment refers to genetic changes in the organism that result in a mutation at the C-terminal side of the protein.
⁵ The maximal Matthews Correlation Coefficient (MCC) is used to account for true and false positives and negatives in machine learning-based models. It thereby reflects both the sensitivity and the specificity of the model.

1.5 Systematic study of human protozoan parasites

1.5.1 Study of human parasites: traditional approaches and recent advances

Human protozoan parasites adopt unique features to infect their hosts and survive in harsh environments. Leishmaniasis, trypanosomiasis, and malaria, caused by Leishmania, Trypanosoma, and Plasmodium, respectively, are among the most common protozoan parasitic diseases and pose a significant threat to the health of billions of people worldwide each year (Havelaar et al. 2015; Murray et al. 2012; Nii-Trebi 2017). Over the years, the analytical techniques for detecting and studying protozoan parasites have changed significantly, from simple microscopic examination to serology-based assays such as the enzyme-linked immunosorbent assay (ELISA) (Hancock and Tsang 1986) and the luciferase immunoprecipitation system (LIPS) (Burbelo et al. 2005). However, these methodologies failed to discriminate between closely related species or genetic variants. More recently, molecular biology-based methods such as real-time polymerase chain reaction (RT-PCR) (Lopez and Pardo 2010; Chaudry et al. 2012) and proteomics technologies (Sánchez-Ovejero et al. 2016) have shown potential for studying parasites with greater sensitivity and specificity. In the current post-genomic era, studies focus increasingly on high-throughput DNA-based technologies (e.g. genomics and transcriptomics), which are sensitive enough to distinguish even closely related parasitic species and to reveal their genetic makeup. Several parasitic species, such as L. major and P. falciparum, have been sequenced and analyzed at the genome level (Oyola et al. 2016; Gannavaram et al. 2017; Real et al. 2013), which further facilitated the analysis of stage-specific expression and the roles of genes in parasites. For instance, DNA microarray technologies provided a platform for analyzing environment-specific expression (based on quantitation of mRNA levels) of thousands of genes in a parasite at a time (Lasonder et al. 2008; Kim et al. 2015).


Further advances in proteomics and metabolomics facilitated the characterization of various metabolic components, while glycomics allowed the profiling of carbohydrates (Ferguson 1999; Rodrigues et al. 2015). These omics studies, in particular transcriptomics and proteomics, explored the links between genes, proteins and enzyme-driven reactions occurring in the cells by establishing gene-protein-reaction (G-P-R) relationships (Vedadi et al. 2007; Machado et al. 2016), and subsequently provided a set of metabolic reactions encoded in the genomes of protozoan parasites (Swann et al. 2015; Tymoshenko et al. 2015). As illustrated in Figure 1.2, genomics, proteomics, and metabolomics are useful for studying metabolic components such as genes, enzymes, and metabolites in the context of metabolism; however, glycomics (the study of carbohydrates) has rarely been used to understand the metabolic connections of carbohydrates in protozoan parasites.
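A G-P-R relationship is commonly written as a Boolean rule over genes: isoenzymes are joined by OR, and subunits of an enzyme complex by AND. The Python sketch below illustrates this idea under stated assumptions; the gene identifiers, reaction names, the nested-tuple rule encoding and the function names are all hypothetical and are not taken from any parasite annotation or from a specific modelling library.

```python
# Hypothetical G-P-R rules, for illustration only.
# A rule is either a gene identifier or a tuple ("and" | "or", rule, rule, ...).
GPR_RULES = {
    "reaction_with_isoenzymes": ("or", "geneA", "geneB"),
    "reaction_with_complex": ("and", "geneC", "geneD"),
    "reaction_with_single_gene": "geneE",
}

def is_supported(rule, present_genes):
    """Evaluate a G-P-R rule against the set of genes that are present."""
    if isinstance(rule, str):
        return rule in present_genes
    operator, *operands = rule
    results = [is_supported(operand, present_genes) for operand in operands]
    if operator == "and":
        return all(results)
    if operator == "or":
        return any(results)
    raise ValueError(f"unknown operator: {operator}")

def active_reactions(rules, present_genes):
    """Names of the reactions whose rule is satisfied, in sorted order."""
    return sorted(name for name, rule in rules.items()
                  if is_supported(rule, present_genes))

if __name__ == "__main__":
    print(active_reactions(GPR_RULES, {"geneA", "geneC"}))
```

Evaluating such rules against a list of annotated or expressed genes gives the set of reactions a genome can support, which is the step that links sequence data to a metabolic reaction set.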

Figure 1.2: Involvement of current omics technologies to understand metabolism and roles of carbohydrates in human protozoan parasites.

The study of the structural components of glycoconjugates in glycomics is comparatively difficult because of their complex branching nature (Krishnamoorthy and Mahal 2009). Moreover, the study of carbohydrate biosynthesis is even more complicated, as it is not a template-driven process like the synthesis of protein molecules. However, attempts have been made to understand the biosynthesis and metabolic roles of these complex molecules. For example, MS was used to study the abundance of sugar nucleotides, which are essential precursors in the biosynthesis of glycoconjugates (Turnock and Ferguson 2007). As described in section 1.2, several other studies, which focused on enzymes associated with glycosylation and on glycan-protein interactions, subsequently improved knowledge of the biosynthesis of sugar molecules in parasites (Boer et al. 2007; Kuno et al. 2005; Augustin and Bak 2013). Previous reviews also emphasized the difficulties in the metabolic characterization and biosynthesis of these complex carbohydrates in protozoan parasites (Kafsack and Llinás 2010; Monis et al. 2005; Tyagi et al. 2015). At present, various databases and resources provide data on metabolic components, including glycans, in protozoan parasites (as discussed in section 1.3), which are useful for performing systems-wide analyses to understand the involvement of glycans/glycoconjugates in interaction-associated metabolism and disease-associated phenotypes in the parasites (Swann et al. 2015; Grech et al. 2006). In this context, databases such as EuPathDB, TriTrypDB (Aslett et al. 2009), BioCyc, PlasmoDB (Kissinger et al. 2001) and LeishCyc provide sequence information, annotation data, and other omics data for systematically studying the metabolism of various protozoan parasites. The current knowledge and challenges in understanding the metabolism of protozoan parasites are discussed next.

1.5.2 Metabolism of protozoan parasites: current knowledge and challenges

Studying metabolism is important in order to understand the strategies that parasites use while interacting with their hosts and evading the immune system to spread infection; however, it is difficult, as most parasites, such as Leishmania and Toxoplasma, uniquely change their metabolism and morphology to adapt and grow inside the host organisms (Mukhopadhyay and Mandal 2006; Novozhilova and Bovin 2010; Lamotte et al. 2017). The survival strategies inside the host involve several metabolic components, including glycoconjugates (Mukhopadhyay and Mandal 2006; Novozhilova and Bovin 2010). Moreover, the survival of the parasites is associated with the host metabolism (Naderer and McConville 2008; Weilhammer et al. 2012; Wallqvist et al. 2016), making the study of coupled metabolism even more difficult.

Currently, many human protozoan parasites have been characterized at the genome and proteome levels using high-throughput technologies (Real et al. 2013; El-Sayed et al. 2005; Leifso et al. 2007; Gardner et al. 2002; Ivens et al. 2005), which has helped to understand the stage-specific genetic regulation associated with metabolism. Studies focusing on glycolysis/gluconeogenesis pathways and associated enzymes revealed that most parasites, including Leishmania, Trypanosoma and Toxoplasma, use glucose to produce energy via glycolysis, whereas under the glucose-limiting conditions of the host environment, other carbon sources such as lipids and fatty acids are the main source of energy (Mehta et al. 2006; Naderer et al. 2006; Jesus et al. 2014; Veras and De Menezes 2016); however, some of these pathways are not present in protozoans such as Trichomonas vaginalis (Saunders et al. 2011). Glycolysis is metabolically connected with the tricarboxylic acid (TCA) cycle in mitochondria, which uses its end products, such as pyruvate, to synthesize amino acids, lipids and other metabolites in many protozoan parasites (Olszewski et al. 2010; McConville et al. 2015; Saunders et al. 2010). A general schematic illustration of metabolic pathways for energy metabolism and biosynthesis of amino acids is shown in Figure 1.3.

In the glycosome, the conversion of glucose to 1,3-bisphosphoglycerate consumes both ATP and NAD+, which need to be recycled within the same compartment in order to regulate the glycolytic flux (Hammond et al. 1985; McConville et al. 2015; Michels et al. 2006), as illustrated in Figure 1.3. To maintain the balance of ATP and NAD+, cytosolic phosphoenolpyruvate (PEP) can be converted to succinate, which results in the regeneration of these metabolites, as studied in procyclic T. brucei and L. mexicana (Hammond et al. 1985; Saunders et al. 2011). This pathway includes several enzymes, such as PEP carboxykinase, fumarase, and NADH-dependent fumarate reductase (Bakker et al. 1997). Leishmania also expresses a similar set of enzymes to maintain glycosomal energy balance (Guerra et al. 2006; Sosa et al. 2015).
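The cofactor bookkeeping behind this argument can be made explicit with a small stoichiometric tally. The Python sketch below is a simplified illustration, not a model from this work: it uses textbook stoichiometries, tracks only ATP and NAD+, omits ADP, NADH, CO2, water and protons, and adds malate dehydrogenase as an assumed intermediate step between PEP carboxykinase and fumarase, since that enzyme is not named in the text above. The data layout and the function name are hypothetical.

```python
# Each step: (enzyme, net cofactor change, times the step runs).
# Negative values are consumed, positive values are produced.
GLUCOSE_TO_13BPG = [
    ("hexokinase", {"ATP": -1}, 1),
    ("phosphofructokinase", {"ATP": -1}, 1),
    # One glucose gives two triose phosphates, so this step runs twice.
    ("glyceraldehyde-3-phosphate dehydrogenase", {"NAD+": -1}, 2),
]

PEP_TO_SUCCINATE = [  # per molecule of PEP
    ("PEP carboxykinase", {"ATP": +1}, 1),
    ("malate dehydrogenase (assumed step)", {"NAD+": +1}, 1),
    ("fumarase", {}, 1),
    ("NADH-dependent fumarate reductase", {"NAD+": +1}, 1),
]

def net_cofactors(steps):
    """Sum the cofactor changes over all steps of a pathway."""
    totals = {}
    for _enzyme, change, times in steps:
        for cofactor, amount in change.items():
            totals[cofactor] = totals.get(cofactor, 0) + amount * times
    return totals

if __name__ == "__main__":
    print("glucose -> 1,3-bisphosphoglycerate:", net_cofactors(GLUCOSE_TO_13BPG))
    print("PEP -> succinate (per PEP):", net_cofactors(PEP_TO_SUCCINATE))
```

Under these simplifying assumptions, the upper part of glycolysis costs two ATP and two NAD+ per glucose, while each PEP routed to succinate returns one ATP and two NAD+; a full balance would need the remaining glycosomal reactions, which are not modelled here.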

In general, understanding the metabolism of intracellular parasites, in particular Leishmania amastigotes (inside macrophages), is more difficult as these are less accessible for study. For most protozoan parasites, including Leishmania and Toxoplasma, laboratory experiments are difficult, as it is hard to reproduce the natural environmental conditions that parasites face inside the host, which limits their study under the conditions in which they have evolved naturally. However, efforts have been made to develop and improve artificial growth media for studying parasites in different stages of their life cycle (Limoncu et al. 2004; Schuster 2002; Kalani et al. 2016). Metabolism inside the host essentially depends on the nutritional environment. For example, in the bloodstream stage of trypanosomes, glucose is catabolized aerobically, which requires oxygen, while in Leishmania amastigotes (inside the human host), glycolysis is significantly reduced due to glucose- and oxygen-limiting conditions (Moyersoen et al. 2004). Other studies found that beta-oxidation of fatty acids increases in the amastigote stage of L. mexicana (Hart and Coombs 1982). Several studies also described the functionality of central carbon metabolism for the synthesis of energy and essential metabolites in parasites such as Leishmania (Opperdoes and Coombs 2007; Saunders et al. 2010), Plasmodium (Olszewski and Llinás 2011) and Trypanosomes (Bringaud et al. 2006; Smith and Bütikofer 2010).

Figure 1.3: A general schematic representation of energy metabolism pathways commonly found in most human protozoan parasites using glucose as carbon source. Legend: Glc6P: Glucose-6-phosphate; F6P: Fructose-6-phosphate; F1,6biP: Fructose-1,6-bisphosphate; DHAP: Dihydroxyacetone phosphate; g3p: Glyceraldehyde-3-phosphate; 1,3bpg: 1,3-bisphosphoglycerate; PEP: Phosphoenolpyruvate.

The biosynthesis of glycoconjugates such as GIPL, LPG, and GP and their metabolic involvement are poorly studied in protozoan parasites (Mukhopadhyay and Mandal 2006; Hederos and Konradsson 2006; Proudfoot et al. 1995); however, studies have been done on simple sugars such as GlcN, GlcNAc, Glc, ribose (Rib), Gal and Man, which the parasite can use as carbon sources (Naderer et al. 2010; Turnock and Ferguson 2007). For example, Leishmania expresses a variety of transporters to take up sugars like Glc, GlcN, and GlcNAc as carbon sources (Rodriguez-Contreras et al. 2007). As illustrated in Figure 1.3, the presence of glucose activates glycolysis and subsequently mitochondrial metabolism in Leishmania and trypanosomes. Mannogen (repertoire of mannose) synthesis occurs only in Leishmania and Crithidia; these pathways are absent in trypanosomes (Gorin et al. 1979; Ralton et al. 2003). The enzymes associated with mannose synthesis have not been identified yet; however, other enzymes associated with the conversion of glucose-6-phosphate/fructose-6-phosphate to GDP-Man were characterized (Garami and Ilg 2001). The biosynthesis of sugar nucleotides, which are important precursors in the synthesis of glycans and complex glycoconjugates such as GIPL, GP, and LPG, mainly depends on the availability of the sugars. For example, in the Leishmania promastigote, glucose is mainly used, while in the amastigote form, GlcNAc and GlcN are the main sugars used for the synthesis of sugar nucleotides (Naderer et al. 2010) (Figure 1.3).

Several metabolic components associated with different pathways have been studied in parasites; however, integrated studies that allow a combined analysis of all pathways at the systems level, in order to understand metabolic profiles and the associated phenotypes, are still in their infancy. This is possibly due to the lack of strategies and computational methodologies to systematically analyze multi-omics data in combination with metabolic modeling. As discussed above, several resources provide organism-specific data on metabolic components, which can be used to perform a comprehensive analysis of genotype/phenotype relationships in parasites. Previous attempts at systems-based analyses, and the application of different algorithms for integrating omics data into metabolic network analyses to understand the metabolism of protozoan parasites, are discussed next.

1.6 Metabolic network modeling

1.6.1 Metabolic network reconstruction, curation, and flux analyses

A metabolic network represents a set of chemical reactions that defines the metabolism and physiological properties of a cell. Metabolic network modeling, as an integral part of systems biology (Kitano 2002), is an effective and sophisticated approach to systematically study the metabolic profile of an organism as well as to understand its genotype/phenotype in a particular environmental condition (Martín-Jiménez et al. 2017; Wu and Chan 2012; Plata et al. 2010). The methodology is also useful for analyzing gene and metabolite essentiality, enzyme robustness and flux variability in the context of the metabolism (Chavali et al. 2012). Previously, these techniques were used for metabolic characterization and identification of potential drug targets in several organisms, including human parasites such as Leishmania and Plasmodium falciparum (Chavali et al. 2008; Plata et al. 2010; Beste et al. 2007; Kim et al. 2010; Raghunathan et al. 2010).

The methodology requires genome and experimental information about the metabolism, such as metabolic enzymes and their functions, the stoichiometry of the associated reactions, and gene-reaction associations, to study the genotype/phenotype of an organism at the systems level. The working strategy mainly involves draft reconstruction of the metabolic network, refinement and manual curation, followed by metabolic flux analysis and validation. The preliminary network draft can be built using all the metabolic reactions identified in the genome or supported by experimental evidence, the associated enzymes, and the cellular compartmentalization of the metabolites (Figure 1.4A). Various automated tools such as GEMSiRV (Liao et al. 2012) and Merlin (Dias et al. 2010) provide a preliminary reconstruction of the metabolic network by incorporating genome sequence information. These tools compare the target genome sequence with highly annotated genomes in order to identify the most probable annotations to be included in the metabolic network. In the same context, the PRIAM software (Claudel-Renard et al. 2003) provides a possible functional annotation of enzymes based on a comparison across multiple functional domains of already characterized enzymes. More recently, the EnzDP protocol (Nguyen et al. 2015), with an accuracy of 94.5% based on five-fold cross-validation, was developed to improve enzyme annotations for the metabolic network. Other strategies for better reannotation in metabolic network reconstruction are reviewed by Pfau et al. (2016). The draft models are further curated and refined by incorporating additional supporting information on the genes, proteins and their association with the reactions (Feist et al. 2009; Dias et al. 2018). Several other studies and reviews describe the standard procedure for generating high-quality metabolic networks and protocols for curating the models using experimental data (Thiele and Palsson 2010; Francke et al. 2005; Dias et al. 2018; Hartleb et al. 2018).
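The ingredients of a draft reconstruction listed above (reactions, stoichiometry, compartment-tagged metabolites and gene-reaction associations) can be held in a very small data structure. The following sketch is illustrative only: the reaction names, the gene identifiers, the `_e`/`_c` compartment suffixes and the `build_stoichiometric_matrix` helper are hypothetical and are not taken from Merlin, GEMSiRV or any other tool named in this section.

```python
# Hypothetical three-reaction draft network. Metabolite suffixes mark the
# compartment (_e: extracellular, _c: cytosol); this is an assumed convention.
reactions = {
    "GLC_transport": {"stoich": {"glc_e": -1, "glc_c": 1}, "genes": ["geneA"]},
    "HEX": {
        "stoich": {"glc_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1},
        "genes": ["geneB"],
    },
    # A gap-filling reaction with no gene assigned yet.
    "PGI_gapfill": {"stoich": {"g6p_c": -1, "f6p_c": 1}, "genes": []},
}


def build_stoichiometric_matrix(reactions):
    """Return (metabolite ids, reaction ids, S) with S[i][j] the coefficient
    of metabolite i in reaction j (negative: consumed, positive: produced)."""
    metabolites = sorted({m for r in reactions.values() for m in r["stoich"]})
    rxn_ids = list(reactions)
    row = {m: i for i, m in enumerate(metabolites)}
    S = [[0] * len(rxn_ids) for _ in metabolites]
    for j, rid in enumerate(rxn_ids):
        for met, coeff in reactions[rid]["stoich"].items():
            S[row[met]][j] = coeff
    return metabolites, rxn_ids, S


metabolites, rxn_ids, S = build_stoichiometric_matrix(reactions)

# Reactions without a gene association are natural candidates for curation.
orphan_reactions = [rid for rid, r in reactions.items() if not r["genes"]]
```

Listing the reactions that lack a gene association is one simple way to flag the parts of a draft that most need manual curation.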

Flux Balance Analysis (FBA) is a widely used mathematical approach for analyzing biochemical reaction networks to predict the genotype-phenotype relationship of an organism. The technique allows constraint-based flux analyses to predict the steady-state flow of metabolites through a metabolic network while maximizing an objective function (Kauffman et al. 2003). In order to perform FBA, a mass balance is written for each metabolite in the metabolic network, giving rise to a system of ordinary differential equations (Figure 1.4B). These equations are then converted into matrix form with the stoichiometric matrix S and the flux vector v, dx/dt = S·v, where x is the vector of all metabolite concentrations in the network and S is an m×n matrix with m metabolites and n chemical reactions (Figure 1.4C). The flux vector v contains all flux values within the metabolic network, each describing the rate at which one metabolite is converted into another. The model can be metabolically constrained by imposing restrictions on the reaction fluxes based on relevant biological data (Figure 1.4D). Various strategies that use omics data to place flux constraints on the metabolic reactions are discussed in Section 1.6.2. As an essential part of the analysis, a linear objective function is defined over the metabolic network and optimized during model simulation. The objective function(s) can represent growth or the production of ATP or other specific metabolites in a given environmental condition. An important assumption of stoichiometric models and constraint-based FBA is the "steady state", meaning that the rate of change of the concentration of every metabolite is zero, i.e. S·v = 0 (Figure 1.4E). This can be justified biologically in the sense that the conversion of one metabolite into another is faster than any other genetic or behavioural change in the organism (Segre et al. 2002). 
After imposing the steady-state condition, the flux distribution across the various pathways is obtained by solving the resulting set of linear equations under the chosen objective function.
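The formulation above (maximize a linear objective subject to S·v = 0 and flux bounds) is a linear program, and can be sketched on a toy network as follows. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the four-reaction network, its bounds and the `run_fba` helper are hypothetical, and SciPy's general-purpose `linprog` solver stands in for a dedicated constraint-based modeling package.

```python
import numpy as np
from scipy.optimize import linprog


def run_fba(S, bounds, objective_index):
    """Maximize the flux through one reaction subject to S.v = 0 and bounds."""
    n_reactions = S.shape[1]
    c = np.zeros(n_reactions)
    c[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    result = linprog(
        c,
        A_eq=S,
        b_eq=np.zeros(S.shape[0]),  # steady state: no net accumulation
        bounds=bounds,
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


# Toy network (hypothetical): rows are metabolites A and B, columns are
#   R1: -> A (uptake), R2: A -> B, R3: B -> (objective), R4: A -> (by-product)
S = np.array([
    [1.0, -1.0, 0.0, -1.0],  # metabolite A
    [0.0, 1.0, -1.0, 0.0],   # metabolite B
])
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]  # uptake limited to 10

fluxes = run_fba(S, bounds, objective_index=2)
```

In this toy case the uptake bound is the only limiting constraint, so the optimal objective flux equals the uptake limit and the by-product route carries no flux.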

Figure 1.4: Metabolic network modeling. A) Illustration of a metabolic reaction network. B) A metabolic reaction network can be represented mathematically by mass balance equations. C) The stoichiometric matrix represents the stoichiometry of all reactions in the network. D) Flux constraints can be imposed by defining minimum and maximum bounds for each reaction flux; vi and bi,j refer to internal and exchange fluxes in the network, respectively. E) Solving the linear equations under the steady-state assumption.

1.6.2 Integration of omics data into metabolic network modeling

Omics technologies such as genomics, proteomics, transcriptomics, metabolomics and glycomics have provided a data-rich platform that enables scientists to study a large number of molecular components such as DNA, proteins, carbohydrates, and metabolites in an organism. At the same time, advances in experimental technologies such as MS, lectin/glycan microarrays, mouse phenotyping, enzyme kinetics and gene microarrays have generated an enormous amount of data to characterize metabolic components, as well as to understand the roles of glycosylation and glycoconjugates in many organisms including protozoan parasites (Avila et al. 1991; McConville and Blackwell 1991; Phillips and Turco 2015). The use of these data in metabolic network modeling has grown significantly, improving the metabolic characterization of various prokaryotes and eukaryotes at the systems level (Saha et al. 2014; Zhang and Hua 2016); however, these methodologies have been poorly explored for understanding the metabolism of human protozoan parasites (Cantacessi et al. 2015; Pinney et al. 2007; Swann et al. 2015).


Metabolic network analyses using FBA-based methods allow flux constraints to be imposed based on physiological, regulatory and nutritional conditions to improve the predicted flux distributions (Kauffman et al. 2003; Abedpour and Kollmann 2015). Biologically relevant omics data can be integrated into the metabolic model to provide an extra layer of flux constraints across the network, reducing the solution space and thereby improving model predictions (Forst 2010; Kim and Lun 2014; Blazier and Papin 2012). In this context, the use of mRNA expression data in metabolic flux analyses has made significant progress in characterizing various cellular phenotypes, where the networks were constrained by activating, deactivating or specifying the activity level of reactions based on the gene expression level (Covert and Palsson 2002; Blazier and Papin 2012). Subsequently, further algorithms were developed to improve flux prediction: some, such as GIMME (Becker and Palsson 2008) and iMAT (Shlomi et al. 2008), use on/off constraints on the reaction fluxes based on expression thresholds, while others, such as E-Flux (Jensen and Papin 2011) and PROM (Chandrasekaran and Price 2010), use relative reaction flux values based on the relative expression level. In terms of the omics data used, GIMME, iMAT and E-Flux take a single set of gene expression data, while PROM and MADE (Chandrasekaran and Price 2010) allow the integration of multiple gene expression datasets into the metabolic models. These algorithms also differ in the assumptions they make to constrain the metabolic network using gene expression data. For example, E-Flux maximizes the biomass function under lower and upper flux limits derived from gene expression, whereas iMAT maximizes the fit between gene expression data and flux values, such that fluxes through highly expressed reactions are maximized and fluxes through lowly expressed reactions are minimized.
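The two families of constraints described above can be caricatured in a few lines. The sketch below is a deliberate simplification and not an implementation of GIMME, iMAT, E-Flux or PROM: it assumes that gene expression has already been mapped to reaction-level values through the gene-reaction associations, and the function names, reaction identifiers, cutoff and expression values are all hypothetical.

```python
def on_off_bounds(expression, bounds, cutoff):
    """Threshold-style constraint: close reactions whose expression is below
    the cutoff; reactions without expression data are left unchanged."""
    constrained = {}
    for rxn, (lower, upper) in bounds.items():
        if rxn in expression and expression[rxn] < cutoff:
            constrained[rxn] = (0.0, 0.0)
        else:
            constrained[rxn] = (lower, upper)
    return constrained


def scaled_bounds(expression, bounds):
    """Relative-style constraint: scale each reaction's bounds by its
    expression relative to the highest observed value."""
    top = max(expression.values())
    constrained = {}
    for rxn, (lower, upper) in bounds.items():
        scale = expression.get(rxn, top) / top  # no data: keep full range
        constrained[rxn] = (lower * scale, upper * scale)
    return constrained


# Hypothetical reaction-level expression values and default flux bounds.
bounds = {"R1": (0, 10), "R2": (-1000, 1000), "R3": (0, 1000)}
expression = {"R1": 8.0, "R2": 2.0}  # R3 has no expression data

switched = on_off_bounds(expression, bounds, cutoff=4.0)
scaled = scaled_bounds(expression, bounds)
```

Either set of bounds would then be passed to an FBA-type optimization; the published algorithms differ in how they choose thresholds, handle missing data and reconcile the constraints with a required metabolic function.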

Other methods use fluxomics (Wiback et al. 2004; Carreira et al. 2013) and metabolomics data (Cakir et al. 2006; Topfer et al. 2015) to improve the condition-specific predictability of metabolic models. Integrative Omics Metabolic Analysis (IOMA) (Yizhak et al. 2010) and Mass Action Stoichiometric Simulation (MASS) (Jamshidi and Palsson 2010) allow the integration of metabolomics and proteomics data into metabolic models to constrain the reaction fluxes in the metabolic pathways. A more recently developed method, GIM3E (Gene Inactivation Moderated by Metabolism, Metabolomics, and Expression) (Schmidt et al. 2013), is also capable of using metabolomics and gene expression data to impose metabolic constraints in order to improve flux predictions. In the same context, E-Flux2 (the E-Flux method combined with minimization of the L2 norm) and SPOT (Simplified Pearson cOrrelation with Transcriptomic data) were developed to use transcriptomics data along with a known carbon source and objective function to improve predictions of context-specific metabolic functionality (Kim et al. 2016). Although these algorithms improve the accuracy of predicted flux distributions, they lack consistency when used across different conditions and organisms (Song et al. 2014a). Methods such as GIMME and GIM3E, which require a metabolic functionality (e.g. an objective function), are more useful for studying prokaryotes, while methods such as iMAT, which do not require an objective function, perform better for multi-cellular eukaryotes. A comparative analysis of the application of different algorithms across various organisms is given by Robaina and Nikoloski (2014).

In terms of the application of these algorithms to integrate omics data into genome-scale models, the few successful examples include the integration of RNAseq data into the metabolic model of L. infantum (Sharma et al. 2017), tissue/cell type-specific omics data into the human metabolic model Recon 2 (Ryu et al. 2015), proteomics data into the metabolic model of Enterococcus faecalis (Großeholz et al. 2016), and multi-omics datasets into the metabolic models of Escherichia coli (Kim et al. 2016a; Enjalbert et al. 2011) to understand the metabolism and the correlation between genotype and phenotype. This strategy has also improved drug target predictions in many medically important organisms such as Aspergillus fumigatus (Kaltdorf et al. 2016) and P. falciparum (Ludin et al. 2012). In particular, E-Flux and PROM have shown promising improvements in drug target prediction for M. tuberculosis (Chandrasekaran and Price 2010), while iMAT was used to understand tissue-specific metabolism in humans (Shlomi et al. 2008). Several reviews have extensively discussed the scope of and strategies for integrating omics data into genome-scale models (GEMs) (Kim and Lun 2014; Blazier and Papin 2012; Joyce and Palsson 2006; Machado and Herrgård 2014); nevertheless, these strategies have not been widely used to analyze the metabolic networks of protozoan parasites in order to understand their stage-specific metabolism and the associated roles of carbohydrates.

1.6.3 Previous systems-wide metabolic studies

Systems biology approaches such as metabolic network modeling have been widely explored not only to understand the metabolism but also to study disease phenotypes, pathogenicity and drug-target predictions in protozoan parasites (Pitkanen et al. 2010; Chavali et al. 2008; Carey et al. 2017). To my knowledge, eleven genome-scale metabolic models have so far been developed to study the metabolism and genotype/phenotype of protozoan parasites in different environmental conditions (Table 1.6). In general, these systems-wide analyses included automated or semi-automated draft reconstruction of the reaction network using genome sequences and reactomics data (Devoid et al. 2013; Reyes et al. 2012), followed by manual curation and flux-based analyses. Omics data such as transcriptomics were also used to constrain reaction fluxes to improve phenotype predictions (Saha et al. 2014).

Most of the models describe central carbon metabolism in the cytosol, glycosome, and mitochondria; however, a few of them, such as iAC560 and iAS556, also include the biosynthesis and degradation pathways of lipids and larger fatty acids in the metabolic network. The flux distribution across the various pathways was predicted by performing FBA-based simulations in defined nutrient media, which were formulated based on experimental as well as computational evidence. For example, metabolic network modeling of trypanosomes proposed that hexokinase and phosphofructokinase can consume ATP from the cytosol, which was validated experimentally (Guerra-Giraldez et al. 2002). Similarly, the effect of ATP synthase on the robustness of the metabolic network was predicted using model iAC560 and subsequently verified experimentally (Chavali et al. 2008). In terms of overall validation, a few models such as iAS556 (L. infantum) used 13C isotope enrichment data of various metabolites to compare the stage-specific metabolism of the parasite (Subramanian and Sarkar 2017), while other models such as iAC560 (L. major) and iCS382 (T. gondii) were validated using drug sensitivity assays to analyze the effects of enzyme inhibition on parasite growth (Song et al. 2013; Chavali et al. 2008). Reaction knockout simulations under various defined media were also performed to analyze the prediction performance of various models, including iAC560 (Chavali et al. 2008; Carey et al. 2017).
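A reaction knockout simulation of the kind mentioned above can be sketched by fixing one reaction's bounds to zero and re-solving the FBA problem. The example below is a minimal sketch under stated assumptions: the toy network with two parallel routes, the reaction names and both helper functions are hypothetical, and SciPy's `linprog` is used as a generic linear programming solver rather than the software used for the published models.

```python
import numpy as np
from scipy.optimize import linprog


def max_objective(S, bounds, objective_index):
    """Maximal steady-state flux through the objective reaction (0 if infeasible)."""
    c = np.zeros(S.shape[1])
    c[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    result = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                     bounds=bounds, method="highs")
    return -result.fun if result.success else 0.0


def single_reaction_knockouts(S, bounds, objective_index, names):
    """Objective after each single knockout, relative to the unperturbed model."""
    wild_type = max_objective(S, bounds, objective_index)
    ratios = {}
    for j, name in enumerate(names):
        if j == objective_index:
            continue  # knocking out the objective itself is uninformative
        knocked_out = list(bounds)
        knocked_out[j] = (0.0, 0.0)  # force zero flux through reaction j
        ratios[name] = max_objective(S, knocked_out, objective_index) / wild_type
    return ratios


# Toy network (hypothetical): uptake -> A; two parallel routes A -> B; B -> objective
names = ["uptake", "route_1", "route_2", "objective"]
S = np.array([
    [1.0, -1.0, -1.0, 0.0],  # metabolite A
    [0.0, 1.0, 1.0, -1.0],   # metabolite B
])
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]

ratios = single_reaction_knockouts(S, bounds, objective_index=3, names=names)
essential = [name for name, ratio in ratios.items() if ratio < 1e-6]
```

In this toy case only the uptake reaction is essential, because either parallel route can compensate for the loss of the other; this kind of redundancy is one reason why knockout predictions are sensitive to annotation and gap-filling choices.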

Irrespective of the genome coverage of the reconstruction and the methodologies used for validation, most of these models have been evaluated by measuring phenotypes after the knockout of genes/reactions, which unfortunately resulted in low prediction accuracy. Potential causes of the poor performance of these models include wrong annotations of metabolic enzymes, incorrect reaction reversibility, and incorrect filling of pathway gaps with reactions to which no gene has been assigned so far (Reed et al. 2006).
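Prediction accuracy in this sense is typically obtained by comparing predicted and experimentally observed essentiality gene by gene. The following sketch shows one common way of scoring such a comparison; the gene identifiers, the example calls and the `essentiality_scores` helper are hypothetical and do not reproduce the evaluation of any model in Table 1.6.

```python
def essentiality_scores(predicted, observed):
    """Compare predicted and observed essentiality (True = essential)
    over the genes present in both dictionaries."""
    genes = predicted.keys() & observed.keys()
    tp = sum(1 for g in genes if predicted[g] and observed[g])
    tn = sum(1 for g in genes if not predicted[g] and not observed[g])
    fp = sum(1 for g in genes if predicted[g] and not observed[g])
    fn = sum(1 for g in genes if not predicted[g] and observed[g])
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical predictions and experimental calls for four genes.
predicted = {"gene1": True, "gene2": True, "gene3": False, "gene4": False}
observed = {"gene1": True, "gene2": False, "gene3": False, "gene4": True}

scores = essentiality_scores(predicted, observed)
```

Reporting sensitivity and specificity alongside accuracy is useful here because essential genes are usually a minority, so a model that predicts "non-essential" for every gene can still score a high accuracy.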


Table 1.6: Metabolic models of the human protozoan parasites. Details of number of genes, reactions, metabolites, and compartments in each model are mentioned in the corresponding columns.

| Parasite species | Number of ORFs | Model ID | Genes | Metabolites | Reactions | Compartments | Reference |
|---|---|---|---|---|---|---|---|
| L. infantum | 8109 | iAS142 | 142 | 231 | 237 | o,m,mm,c,e | (Subramanian et al. 2015) |
| L. infantum | 8109 | iAS556 | 556 | 1160 | 1260 | y,c,m,mm,nm,r,a,v,n,e | (Subramanian and Sarkar 2017) |
| L. major | 8370 | iAC560 | 560 | 1101 | 1112 | c,a,e,m,y,ER,f,n | (Chavali et al. 2008) |
| L. donovani | | iMS604 | 604 | 1135 | 1159 | a,f,y,c,m,n,pm | (Sharma et al. 2017) |
| P. falciparum | 5615 | | 579 | 1622 | 998 | c,i,m,n,r,v | (Huthmacher et al. 2010) |
| P. falciparum | 5615 | iTH366 | 366 | 616 | 1001 | c,m,a,e | (Plata et al. 2010) |
| P. falciparum | 5615 | iPfa | 325 | 1258 | 1066 | c,m,a,e,n | (Chiappino-Pepe et al. 2017) |
| P. falciparum | 5615 | iPfal17 | 482 | 991 | 1192 | c,m,a,v,e | (Carey et al. 2017) |
| T. gondii | 8920 | iCS382 | 382 | 384 | 571 | a,c,rm,m,mm | (Song et al. 2013) |
| T. cruzi | 3054 | iSR215 | 215 | 158 | 162 | e,c,m,o | (Roberts et al. 2009) |
| C. hominis | 3884 | iNV213 | 213 | 158 | 162 | c,e | (Vanee et al. 2010) |

Legend: a: acidocalcisome, c: cytosol, e: extra-organism/extracellular, f: flagellum, m: mitochondrion, n: nucleus, o: glyoxysomes, r/ER: endoplasmic reticulum, v: vacuole, y: glycosome, mm: mitochondrial membrane, nm: nuclear membrane, rm: ER membrane, ORFs: Open Reading Frames.

In almost all cases (Table 1.6), the initial reconstruction of the metabolic network used comparative genomics or text mining methods to improve the functional annotation of enzymes. The comparative genomics approach might have predicted some enzyme functions that are not present in the organism. Similarly, filling the reaction gaps in the metabolic network using indirect experimental evidence (i.e. from closely related species) might be a reason why some models perform poorly. Apart from these annotation issues, in most cases the models were not constrained using multi-omics data, owing to the lack of the appropriate strategies/methodologies discussed in the previous section. This might also be one of the reasons for the reduced predictability of these models. With regard to improving the predictability of metabolic models, several reviews have discussed the causes of reduced prediction accuracy across all components of the metabolic network modeling process (Pinney et al. 2007), while other reviews have focused on methodologies for improving the quality of current metabolic networks (Pitkanen et al. 2010; Pfau et al. 2016).

1.6.4 System-based characterization of carbohydrates

As discussed in Section 1.2, glycomic studies have significantly increased the number of characterized complex carbohydrate structures (Kailemia et al. 2014; Lundborg and Widmalm 2011; Mehta et al. 2016; Morelle and Michalski 2007), while other methodologies such as transcriptomics, proteomics, enzyme kinetics, glycoproteomics, and functional glycomics have improved knowledge of their biosynthesis by exploiting the metabolic functionalities of glycosyltransferase enzymes in protozoan parasites (Novozhilova and Bovin 2010). Yet, much less effort has been made to understand the metabolic roles of carbohydrates at the systems level in these organisms. Recent studies characterized the precursors (i.e. activated nucleotide sugars) for the biosynthesis of complex glycoconjugates such as LPG, GIPL and GP (Turnock and Ferguson 2007; Sanz et al. 2013), while another study measured metabolic fluxes and metabolite levels in trypanosomes (Kim et al. 2015); however, the metabolic understanding of these molecules at the systems level has not been emphasized.

The existing metabolic models (Table 1.6) give little consideration to the pathways for the biosynthesis of carbohydrates, including glycans and complex glycoconjugates. The internal reactions in these models are associated with cellular compartments such as the acidocalcisome, cytoplasm, flagellum, mitochondrion, nucleus, glyoxysomes, vacuole, glycosome, and mitochondrial membrane; other compartments such as the ER and the Golgi apparatus, which are the primary locations for the synthesis of glycans/glycoconjugates, are disregarded (Table 1.6). The coverage of GT enzymes, which catalyze the formation of glycosidic linkages in carbohydrate synthesis, is also very poor in these metabolic models. For these reasons, the study of carbohydrate metabolism using these models is very limited.

The availability of valuable data (as discussed in Section 1.2) and of various studies focused on carbohydrate synthesis (Naderer et al. 2010; Naderer et al. 2006; Rodrigues et al. 2015) provides an opportunity to perform integrative systems-level analyses, with the challenge of adopting systems biology approaches, for understanding complex carbohydrates from a metabolic perspective in protozoan parasites. The relevant glycobiology data from various databases can be incorporated, along with the existing data integration algorithms, to explore the metabolic involvement of carbohydrates in protozoan parasites.

1.7 Objectives of the thesis

Human protozoan parasites such as Leishmania and Trypanosoma cause widespread neglected tropical diseases, including leishmaniasis and sleeping sickness, in humans. Parasitic infection is promoted by the strategies these parasites adopt to synthesize essential metabolites, including glycans and glycoconjugates, and ultimately to alter their metabolism in order to survive in the harsh environment inside human hosts. In the post-genomic era, despite significant advances in omics technologies to characterize metabolic components, integrated systems biology-based strategies are still lacking for understanding the stage-specific metabolism and the metabolic roles of carbohydrate molecules in protozoan parasites. Owing to the rapid growth of omics data in this context, this thesis considers the following objectives:

➢ To provide an overview of glycobiology research in terms of the data types, experimental methodologies, important resources, and analytical tools used to understand carbohydrates from structure to function in various organisms, including human protozoan parasites.

➢ To investigate glycomic databases with respect to the glycan-related information they store, their ease of access and their inter-connectivity with other important databases, in order to enhance glycobiology research.

➢ To compile current knowledge on the metabolism of protozoan parasites from the literature, glycomics databases and other resources. This objective also includes investigating previous metabolic studies to understand their strengths and limitations for studying stage-specific metabolism and the metabolic roles of carbohydrates such as glycans, glycoconjugates, and sugar nucleotides in protozoan parasites at the systems level.

➢ To develop a systems biology-based strategy for integrating available omics data into metabolic network analysis to improve the understanding of stage-specific metabolism and of the roles of carbohydrates such as sugar nucleotides, which are the building blocks of glycans and glycoconjugates, in different stages of the parasites' life cycle.


➢ To develop a strategy involving metabolic network analysis for identifying potential drug target genes which affect the growth of the parasites in different environmental conditions.

1.8 Outline of the thesis

This thesis focuses primarily on the application of genome-scale metabolic modeling, in a data-rich discipline, to perform systems-wide metabolic network analyses in order to understand stage-specific metabolism as well as the biosynthesis and metabolic involvement of sugar nucleotides (e.g. UDP-Glucose (UDP-Glc), GDP-Arabinose (GDP-Ara), UDP-Galactose (UDP-Gal), GDP-Mannose (GDP-Man), and UDP-N-acetyl Glucosamine (UDP-GlcNAc)), which are the main precursors of glycans and glycoconjugates in Leishmania major. The strategy is further used for the identification of potential drug target genes in Leishmania major. A chapter-wise outline of the thesis is given below.

Chapter 1 provides a comprehensive review of the different data types in glycobiology and of methodologies such as Mass Spectrometry (MS), lectin/glycan microarrays, mouse phenotyping and enzyme kinetics, which are used to study glycans and glycoconjugates from the structural to the functional level. Further, different data resources and important analytical tools for glycan analysis are extensively reviewed. The motivation for the subsequent work is developed by introducing metabolic modeling approaches that use relevant omics data and different algorithms to improve the understanding of metabolism and to decipher the metabolic involvement of carbohydrates in protozoan parasites. An in-depth investigation of previous systems-based metabolic studies describes their limitations for studying carbohydrates (e.g. glycans, glycoconjugates, and sugar nucleotides) and their metabolic roles in parasites at the systems level.

In Chapter 2, glycobiology databases, in particular glycomic and glycoproteomic data resources, are extensively reviewed, addressing problems in accessing and integrating data that arise from inconsistencies in the identification and representation of stored glycans, as well as from the poor inter-linkage between databases such as GlyTouCan, CarbBank, and UniProtKB. The focus is on graphic and text-based glycan structural notations (e.g. WURCS, LinearCode, and GlycoCT) and on exploiting available tools (e.g. RINGS Resources and WURCS-WG) to interconvert these encoding formats, in order to improve inter-linkage and interoperability among various glycomic databases.

In Chapter 3, a metabolic model, ext-iAC560 (an extension of the existing model iAC560), is developed by additionally incorporating pathways for the metabolism of lipids and larger fatty acids, and of carbohydrates including sugar nucleotides such as UDP-Glc, GDP-Ara, UDP-Gal and UDP-GlcNAc in L. major. Parsimonious Flux Balance Analysis (PFBA)-based simulation, in combination with Gene Inactivity Moderated by Metabolism and Expression (GIMME) using gene expression data, is performed to improve the consistency between gene expression and the metabolic model. These analyses are also used to predict fluxes across various pathways, which helps to understand the stage-specific metabolism of Leishmania.

Chapter 4 describes an application of flux-based simulation using model ext-iAC560 to identify 53 potential drug target genes (including 10 novel targets) in L. major. These target genes are capable of affecting the biosynthesis of the building blocks of biomass, and consequently the growth of L. major. Database and literature searches on the potential drug targets describe their essentiality and druggability in other parasitic and non-parasitic species. Further investigations explore enzyme inhibitors (already tested in other parasitic/non-parasitic species) that might work as antileishmanial drugs against the predicted targets.

Finally, Chapter 5 presents the summary and concluding remarks of the thesis, integrating the major outcomes of the study and suggesting possible lines of future research.


1.9 References

Abedpour, N. and Kollmann, M., 2015. Resource constrained flux balance analysis predicts selective pressure on the global structure of metabolic networks. BMC Systems Biology, 9(1), p. 88. Aebi, M., 2013. N-linked protein glycosylation in the ER. Biochimica et Biophysica Acta - Molecular Cell Research, 1833(11), pp.2430–2437. Aken, B.L. et al., 2016. The Ensembl gene annotation system. Database, 2016, p.baw093. Akune, Y. et al., 2010. The RINGS resource for glycome informatics analysis and data mining on the Web. Omics : a journal of integrative biology, 14(4), pp.475–486. Alcolea, P.J. et al., 2016. In vitro infectivity and differential gene expression of Leishmania infantum metacyclic promastigotes: Negative selection with peanut agglutinin in culture versus isolation from the stomodeal valve of Phlebotomus perniciosus. BMC Genomics, 17(1), p.375. Alcolea, P.J. et al., 2014. Stage-specific differential gene expression in Leishmania infantum: From the foregut of Phlebotomus perniciosus to the human phagocyte. BMC Genomics, 15(1), p.849. Allen, F.H., 2002. The Cambridge Structural Database: A quarter of a million crystal structures and rising. Acta Crystallographica Section B: Structural Science, 58(3 PART 1), pp.380–388. An, H.J. et al., 2006. A new computer program (GlycoX) to determine simultaneously the glycosylation sites and oligosaccharide heterogeneity of glycoproteins. Journal of Proteome Research, 5(10), pp.2800–2808. An, H.J. et al., 2009. Determination of glycosylation sites and site-specific heterogeneity in glycoproteins. Current Opinion in Chemical Biology, 13(4), pp.421–426. Anish, C. et al., 2013. Immunogenicity and diagnostic potential of synthetic antigenic cell surface glycans of Leishmania. ACS Chemical Biology, 8(11), pp.2412–2422. Aoki-Kinoshita, K. et al., 2016. GlyTouCan 1.0 - The international glycan structure repository. Nucleic Acids Research, 44(D1), pp.D1237–D1242. Aoki-Kinoshita, K.F., 2015. 
Analyzing glycan structure synthesis with the glycan pathway predictor (GPP) tool. Methods in Molecular Biology, 1273, pp.139–147. Aoki-Kinoshita, K.F., 2013. Mining frequent subtrees in glycan data using the RINGS Glycan Miner Tool. Methods in Molecular Biology, 939, pp.87–95. Aoki, K.F. et al., 2004. KCaM (KEGG Carbohydrate Matcher): A software tool for analyzing the structures of carbohydrate sugar chains. Nucleic Acids Research, 32(WEB SERVER ISS.), pp.W267–W272. Apweiler, R. et al., 1999. On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochimica et Biophysica Acta - General Subjects, 1473(1), pp.4–8. Artemenko, N.V. et al., 2012. Databases and tools in glycobiology. Methods in Molecular Biology, 899, pp.325–350. Arya, R. et al., 2003. Biosynthesis of Entamoeba histolytica proteophosphoglycan in vitro. Molecular and Biochemical Parasitology, 126(1), pp.1–8. Aslett, M. et al., 2009. TriTrypDB: A functional genomic resource for the Trypanosomatidae. Nucleic Acids Research, 38, pp.D457–D462. Assis, R.R. et al., 2012a. Glycoinositolphospholipids from Leishmania braziliensis and L. infantum: Modulation of innate immune system and variations in carbohydrate structure. PLoS Neglected Tropical Diseases, 6(2), p.e1543. Assis, R.R. et al., 2012. Glycoconjugates in New World species of Leishmania: Polymorphisms in lipophosphoglycan and glycoinositolphospholipids and interaction with hosts. Biochimica et Biophysica Acta - General Subjects, 1820(9), pp.1354–1365. Augustin, J.M. and Bak, S., 2013. Determination of Enzyme Kinetic Parameters of UDP-glycosyltransferases. Bio-protocol, 3(14), p.e825. Aurrecoechea, C. et al., 2013. EuPathDB: The eukaryotic pathogen database. Nucleic Acids Research, 41(D1). Avila, J.L., Rojas, M. and Acosta, A., 1991. Glycoinositol phospholipids from American Leishmania and

Trypanosoma spp.: Partial characterization of the glycan cores and the human humoral immune response to them. Journal of Clinical Microbiology, 29(10), pp.2305–2312. Bakker, B.M. et al., 1997. Glycolysis in bloodstream form Trypanosoma brucei can be understood in terms of the kinetics of the glycolytic enzymes. Journal of Biological Chemistry, 272(6), pp.3207–3215. Bandini, G. et al., 2016. O-fucosylated glycoproteins form assemblies in close proximity to the nuclear pore complexes of Toxoplasma gondii. Proceedings of the National Academy of Sciences of the United States of America, 113(41), pp.11567–11572. Barb, A.W. and Prestegard, J.H., 2011. NMR analysis demonstrates immunoglobulin G N-glycans are accessible and dynamic. Nature chemical biology, 7(3), pp.147–153. Barre, A. et al., 2001. Mannose-binding plant lectins: Different structural scaffolds for a common sugar- recognition process. Biochimie, 83(7), pp.645–651. Baum, L.G. et al., 2014. Microbe-host interactions are positively and negatively regulated by galectin-glycan interactions. Frontiers in Immunology, 5, p. 284. Baycin-Hizal, D. et al., 2011. GlycoFish: A database of zebrafish N -linked glycoproteins identified using SPEG method coupled with LC/MS. Analytical Chemistry, 83(13), pp.5296–5303. Baycin-Hizal, D. et al., 2014. Glycoproteomic and glycomic databases. Clinical proteomics, 11(1), p.15. Becker, S.A. and Palsson, B.O., 2008. Context-specific metabolic networks are consistent with experiments. PLoS Computational Biology, 4(5), p.e1000082. Beer, T. et al., 1995. The hexopyranosyl residue that is C-glycosidically linked to the side chain of tryptophan-7 in human RNase Us is alpha-mannopyranose. Biochemistry, 34, pp.11785–11789. Behmüller, R. et al., 2014. Quantitative HPLC-MS analysis of nucleotide sugars in plant cells following off-line SPE sample preparation. Analytical and Bioanalytical Chemistry, 406(13), pp.3229–3237. Benson, D.A. et al., 2013. GenBank. 
Nucleic Acids Research, 41(D1), p.D36-42. Bergman, M.P. et al., 2004. Helicobacter pylori modulates the T helper cell 1/T helper cell 2 balance through phase-variable interaction between lipopolysaccharide and DC-SIGN. The Journal of experimental medicine, 200(8), pp.979–990. Berman, H.M., 2000. The Protein Data Bank. Nucleic Acids Research, 28(1), pp.235–242. Bertin, G.I. et al., 2016. Proteomic analysis of Plasmodium falciparum parasites from patients with cerebral and uncomplicated malaria. Scientific Reports, 6, p.26773. Beste, D.J. et al., 2007. GSMN-TB: a web-based genome-scale network model of Mycobacterium tuberculosis metabolism. Genome biology, 8(5), p.R89. Bishop, J.R. and Gagneux, P., 2007. Evolution of carbohydrate antigens - Microbial forces shaping host glycomes? Glycobiology, 17(5), pp. 23R-34R. Blazier, A.S. and Papin, J.A., 2012. Integration of expression data in genome-scale metabolic network reconstructions. Frontiers in Physiology, 3, p.299. Bocker, S., Kehr, B. and Rasche, F., 2011. Determination of glycan structure from tandem mass spectra. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(4), pp.976–986. Bodiga, V.L., Eda, S.R. and Bodiga, S., 2013. Advanced glycation end products: role in pathology of diabetic cardiomyopathy. Heart Failure Reviews, pp.1–15. Boer, A.R. et al., 2007. General microarray technique for immobilization and screening of natural glycans. Analytical Chemistry, 79(21), pp.8107–8113. Bohne-Lang, A. et al., 2001. LINUCS: LInear Notation for Unique description of Carbohydrate Sequences. Carbohydrate Research, 336(1), pp.1–11. Bohne-Lang, A. and Von der Lieth, C.W., 2005. GlyProt: In silico glycosylation of proteins. Nucleic Acids Research, 33, pp.W214-W219. Bohne, A., Lang, E. and Von der Lieth, C.W., 1998. W3-SWEET: Carbohydrate Modeling By Internet. Molecular modeling annual, 4(1), pp.33–43. Borst, P. and Cross, G.A.M., 1982. Molecular basis for trypanosome antigenic variation. Cell, 29(2), pp.291–


303. Bringaud, F., Rivière, L. and Coustou, V., 2006. Energy metabolism of trypanosomatids: Adaptation to available carbon sources. Molecular and Biochemical Parasitology, 149(1), pp.1–9. Brito, A.E. et al., 2015. Benchmark study of automatic annotation of MALDI-TOF N-glycan profiles. Journal of Proteomics, 129, pp.71–77. Brittingham, A. et al., 1999. Interaction of Leishmania gp63 with cellular receptors for fibronectin. Infection and Immunity, 67(9), pp.4477–4484. Brodskyn, C. et al., 2002. Glycoinositolphospholipids from Trypanosoma cruzi interfere with macrophages and dendritic cell responses. Infection and Immunity, 70(7), pp.3736–3743. Buda, S., Nawój, M. and Mlynarski, J., 2016. Recent Advances in NMR Studies of Carbohydrates. In Annual Reports on NMR Spectroscopy, 89, pp. 185–223. Burbelo, P.D., Goldman, R. and Mattson, T.L., 2005. A simplified immunoprecipitation method for quantitatively measuring antibody responses in clinical sera samples by using mammalian-produced Renilla luciferase-antigen fusion proteins. BMC biotechnology, 5, p.22. Burghaus, P. a et al., 1999. Analysis of recombinant merozoite surface protein-1 of Plasmodium falciparum expressed in mammalian cells. Molecular and Biochemical Parasitology, 104(2), pp.171–183. Button, L.L. and McMaster, W.R., 1988. Molecular cloning of the major surface antigen of Leishmania. The Journal of experimental medicine, 167(2), pp.724–729. Cakir, T. et al., 2006. Integration of metabolome data with metabolic networks reveals reporter reactions. Molecular systems biology, 2, p.50. Campbell, M.P. et al., 2008. GlycoBase and autoGU: Tools for HPLC-based glycan analysis. Bioinformatics, 24(9), pp.1214–1216. Campbell, M.P. et al., 2014. UniCarbKB: Building a knowledge platform for glycoproteomics. Nucleic Acids Research, 42(D1), pp. D215-D221. Cantacessi, C. et al., 2015. The past, present, and future of Leishmania genomics and transcriptomics. Trends in Parasitology, 31(3), pp.100–108. Cao, H. 
and Heagy, M.D., 2004. Fluorescent chemosensors for carbohydrates: A decade’s worth of bright spies for saccharides in review. Journal of Fluorescence, 14(5), pp.569–584. Cao, W. et al., 2009. Computational protocol for screening GPI-anchored proteins. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). pp. 164–175. Cao, Z. et al., 2013. Modulation of glycan detection on specific glycoproteins by lectin multimerization. Analytical Chemistry, 85(3), pp.1689–1698. Caragea, C. et al., 2007. Glycosylation site prediction using ensembles of Support Vector Machine classifiers. BMC bioinformatics, 8, p.438. Carey, M.A., Papin, J.A. and Guler, J., 2017. Novel Plasmodium falciparum metabolic network reconstruction identifies shifts associated with clinical antimalarial resistance. BMC Genomics, 18(1), p.543. Carreira, R. et al., 2013. Algorithms to infer metabolic flux ratios from fluxomics data. In Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM. pp. 205–209. Carrington, M. et al., 1991. Variant specific glycoprotein of Trypanosoma brucei consists of two domains each having an independently conserved pattern of residues. Journal of Molecular Biology, 221(3), pp.823–835. Caspi, R. et al., 2014. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Research, 42(D1), pp.D742–D753. Castillo-Acosta, V.M., Balzarini, J. and González-Pacanowska, D., 2017. Erratum: Surface Glycans: A Therapeutic Opportunity for Kinetoplastid Diseases. Trends in Parasitology, 33(10), pp.775–787. Ceroni, A. et al., 2008. GlycoWorkbench: A tool for the computer-assisted annotation of mass spectra of glycans. Journal of Proteome Research, 7(4), pp.1650–1659.

Chaffey, P.K. et al., 2017. Quantitative Effects of O-Linked Glycans on Protein Folding. Biochemistry, 56(34), pp.4539–4548. Chandrasekaran, S. and Price, N.D., 2010a. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proceedings of the National Academy of Sciences of the United States of America, 107(41), pp.17845–50. Chaudry, J.Z. et al., 2012. Real time polymerase chain reaction for the detection of malarial parasite. Journal of the College of Physicians and Surgeons--Pakistan : JCPSP, 22(2), pp.98–100. Chauhan, J.S., Rao, A. and Raghava, G.P.S., 2013. In silico Platform for Prediction of N-, O- and C-Glycosites in Eukaryotic Protein Sequences. PLoS ONE, 8(6), p.e67008. Chavali, A.K. et al., 2012. A metabolic network approach for the identification and prioritization of antimicrobial drug targets. Trends Microbiol, 20(3), pp.113–123. Chavali, A.K. et al., 2008. Systems analysis of metabolism in the pathogenic trypanosomatid Leishmania major. Molecular Systems Biology, 4, p.177. Chen, Y.Z. et al., 2008. Prediction of mucin-type O-glycosylation sites in mammalian proteins using the composition of k-spaced amino acid pairs. BMC bioinformatics, 9, p.101. Chen, Z., Glover, M.S. and Li, L., 2018. Recent advances in ion mobility–mass spectrometry for improved structural characterization of glycans and glycoconjugates. Current Opinion in Chemical Biology, 42, pp.1– 8. Chiappino-Pepe, A. et al., 2017. Bioenergetics-based modeling of Plasmodium falciparum metabolism reveals its essential genes, nutritional requirements, and thermodynamic bottlenecks. PLoS Computational Biology, 13(3), p.e1005397. Claudel-Renard, C. et al., 2003. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Research, 31(22), pp.6633–6639. Cohen, M., 2015. Notable aspects of glycan-protein interactions. Biomolecules, 5(3), pp.2056–2072. Consortium for Functional Glycomics. 
http://www.functionalglycomics.org/. Cooper, C.A. et al., 2003. GlycoSuiteDB: A curated relational database of glycoprotein glycan structures and their biological sources. 2003 update. Nucleic Acids Research, 31(1), pp.511–513. Cova, M. et al., 2015. Sugar activation and glycosylation in Plasmodium. Malaria journal, 14(1), p.427. Cova, M. et al., 2018. The Apicomplexa-specific glucosamine-6-phosphate N-acetyltransferase gene family encodes a key enzyme for glycoconjugate synthesis with potential as therapeutic target. Scientific Reports, 8(4005), p.4005. Covert, M.W. and Palsson, B., 2002. Transcriptional regulation in constraints-based metabolic models of Escherichia coli. Journal of Biological Chemistry, 277(31), pp.28058–28064. Cross, G.A.M., Kim, H.S. and Wickstead, B., 2014. Capturing the variant surface glycoprotein repertoire (the VSGnome) of Trypanosoma brucei Lister 427. Molecular and Biochemical Parasitology, 195(1), pp.59–73. Cummings, R.D. et al., 2017. Glycan-Recognizing Probes as Tools. In Essentials of Glycobiology. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press, pp. 2015–2017. Cummings, R.D. and Etzler, M.E., 2009. Antibodies and Lectins in Glycan Analysis. In A. Varki et al., eds. Essentials of Glycobiology 2nd Edition. Cold Spring Harbor Laboratory Press, pp. 633–648. Damodaran, D. et al., 2008. CancerLectinDB: A database of lectins relevant to cancer. Glycoconjugate Journal, 25(3), pp.191–198. Daubenspeck, J.M. et al., 2015. General N- and O-linked glycosylation of lipoproteins in mycoplasmas and role of exogenous oligosaccharide. PLoS ONE, 10(11), p. e0143362. Davis, A.J. et al., 2004. Properties of GDP-mannose Pyrophosphorylase, a Critical Enzyme and Drug Target in Leishmania mexicana. Journal of Biological Chemistry, 279(13), pp.12462–12468. Dias, O. et al., 2010. Merlin: Metabolic models reconstruction using genome-scale information. In IFAC Proceedings Volumes (IFAC-PapersOnline). pp. 120–125. Dias, O. et al., 2018. 
Reconstructing high-quality large-scale metabolic models with merlin. In Methods in


Molecular Biology, pp. 1–36. Donga, M.-H. et al., 2016. Molecular modeling studies, synthesis and biological evaluation of dabigatran analogues as thrombin inhibitors. Bioorg Med Chem, 24(2), pp.73–84. Donovan, R.S. et al., 1999. A solid-phase glycosyltransferase assay for high-throughput screening in drug discovery research. Glycoconjugate Journal, 16(10), pp.607–615. Dowell, R.D., 2011. The similarity of gene expression between human and mouse tissues. Genome Biology, 12(1), p.101. Doyle, M. a et al., 2009. LeishCyc: a biochemical pathways database for Leishmania major. BMC systems biology, 3, p.57. Ebneter, J.A. et al., 2016. Cyst-Wall-Protein-1 is fundamental for Golgi-like organelle neogenesis and cyst-wall biosynthesis in Giardia lamblia. Nature Communications, 7, p.13859. Edgar, R. and Lash, A., 2002. The Gene Expression Omnibus (GEO): A Gene Expression and Hybridization Repository. NCBI Handbook, 30(1),pp.207-210. Eggimann, G.A. et al., 2015. The role of phosphoglycans in the susceptibility of Leishmania mexicana to the temporin family of anti-microbial peptides. Molecules, 20(2), pp.2775–2785. Egorova, K.S. and Toukach, P. V, 2017. CSDB_GT: a new curated database on glycosyltransferases. Glycobiology, 27(4), pp.285–290. Eisenhaber, B., Bork, P. and Eisenhaber, F., 1999. Prediction of potential GPI-modification sites in proprotein sequences. Journal of molecular biology, 292(3), pp.741–758. El-Sayed, N.M. et al., 2005. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science (New York, N.Y.), 309(5733), pp.409–15. Emes, R.D. et al., 2003. Comparison of the genomes of human and mouse lays the foundation of genome zoology. Human Molecular Genetics, 12(7), pp.701–709. Enjalbert, B., Jourdan, F. and Portais, J.C., 2011. Intuitive visualization and analysis of multi-omics data and application to Escherichia coli carbon metabolism. PLoS ONE, 6(6), p. e21318. Etzler, M.E., Surolia, A. and Cummings, R.D., 2009. L-type Lectins. In A. 
Varki et al., eds. Essentials of Glycobiology. Cold Spring Harbor Laboratory Press, Cold Spring Harbor (NY), pp. 415–424. Fankhauser, N. and Mäser, P., 2005. Identification of GPI anchor attachment signals by a Kohonen self-organizing map. Bioinformatics, 21(9), pp.1846–1852. Feist, A.M. et al., 2009. Reconstruction of biochemical networks in microorganisms. Nature Reviews Microbiology, 7(2), pp.129–143. Ferguson, M.A., 1999. The structure, biosynthesis and functions of glycosylphosphatidylinositol anchors, and the contributions of trypanosome research. Journal of cell science, 112, pp.2799–2809. Ferguson, M.A., 1997. The surface glycoconjugates of trypanosomatid parasites. Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 352(1359), pp.1295–302. Ferguson, M.A., 1992. Glycosyl-phosphatidylinositol membrane anchors: the tale of a tail. Biochem Soc Trans., 20(2), pp.243–256. Fernandes, A.C.S. et al., 2013. Different secreted phosphatase activities in Leishmania amazonensis. FEMS Microbiology Letters, 340(2), pp.117–128. Flores-García, Y. et al., 2011. IL-10-IFN-γ double producers CD4+ T cells are induced by immunization with an amastigote stage specific derived recombinant protein of Trypanosoma cruzi. International Journal of Biological Sciences, 7(7), pp.1093–1100. Forestier, C.-L., Gao, Q. and Boons, G.-J., 2015. Leishmania lipophosphoglycan: how to establish structure-activity relationships for this highly complex and multifunctional glycoconjugate? Frontiers in Cellular and Infection Microbiology, 4, p.193. Forst, C. V., 2010. Host-pathogen systems biology. In Infectious Disease Informatics. pp. 123–147. Francke, C., Siezen, R.J. and Teusink, B., 2005. Reconstructing the metabolic network of a bacterium from its genome. Trends in Microbiology, 13(11), pp.550–558.

Franco, L.H., Beverley, S.M. and Zamboni, D.S., 2012. Innate immune activation and subversion of mammalian functions by Leishmania lipophosphoglycan. Journal of Parasitology Research, 2012, p.165126. Frank, M. et al., 2002. Rapid generation of a representative ensemble of N-glycan conformations. In silico biology, 2(3), pp.427–439. Frank, M., Lütteke, T. and von der Lieth, C.W., 2007. GlycoMapsDB: A database of the accessible conformational space of glycosidic linkages. Nucleic Acids Research, 35, p.287-290. Frank, M. and Schloissnig, S., 2010. Bioinformatics and molecular modeling in glycobiology. Cellular and Molecular Life Sciences, 67(16), pp.2749–2772. Fujita, M. and Kinoshita, T., 2010. Structural remodeling of GPI anchors during biosynthesis and after attachment to proteins. FEBS Letters, 584(9), pp.1670–1677. Funk, V.A. et al., 1997. A unique, terminally glucosylated oligosaccharide is a common feature on Leishmania cell surfaces. Molecular and Biochemical Parasitology, 84(1), pp.33–48. Gannavaram, S. et al., 2017. Whole genome sequencing of live attenuated Leishmania donovani parasites reveals novel biomarkers of attenuation and enables product characterization. Scientific Reports, 7(1), p.4718. Gantt, R.W. et al., 2013. Broadening the scope of glycosyltransferase-catalyzed sugar nucleotide synthesis. Proceedings of the National Academy of Sciences, 110(19), pp.7648–7653. Garami, A. and Ilg, T., 2001. The Role of Phosphomannose Isomerase in Leishmania mexicana Glycoconjugate Synthesis and Virulence. Journal of Biological Chemistry, 276(9), pp.6566–6575. Gardner, M.J. et al., 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature, 419(6906), pp.498–511. Garfoot, A.L. et al., 2018. O-mannosylation of proteins enables Histoplasma yeast survival at mammalian body temperatures. mBio, 9(1), p. e02121-17. Gastón, J.A.R. et al., 2009. Cyclodextrin glycosyltransferase from Bacillus circulans DF 9R: Activity and kinetic studies. 
Enzyme and Microbial Technology, 45(1), pp.36–41. Gerken, T.A. et al., 2011. Emerging paradigms for the initiation of mucin-type protein O-glycosylation by the polypeptide GalNAc transferase family of glycosyltransferases. Journal of Biological Chemistry, 286(16), pp.14493–14507. Geromichalos, G.D., 2007. Importance of molecular computer modeling in anticancer drug development. Journal of B.U.ON. : official journal of the Balkan Union of Oncology, 12, pp.S101-18. Ghouila, A. et al., 2017. Comparative genomics of Tunisian Leishmania major isolates causing human cutaneous leishmaniasis with contrasting clinical severity. Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases, 50, pp.110–120. Glycominds Ltd. http://www.glycominds.com. Golgher, D.B. et al., 1993. Galactofuranose-containing glycoconjugates of epimastigote and trypomastigote forms of Trypanosoma cruzi. Molecular and Biochemical Parasitology, 60(2), pp.249–264. Gorin, P.A.J. et al., 1979. Structure of the D‐Mannan and D‐Arabino‐D‐Galactan in Crithidia fasciculata: Changes in Proportion with Age of Culture. The Journal of Protozoology, 26(3), pp.473–478. Greca, F. and Magez, S., 2011. Vaccination against trypanosomiasis. Human Vaccines, 7(11), pp.1225–1233. Grech, K., Watt, K. and Read, A.F., 2006. Host-parasite interactions for virulence and resistance in a malaria model system. Journal of Evolutionary Biology, 19(5), pp.1620–1630. Groom, C.R. et al., 2016. The Cambridge structural database. Acta Crystallographica Section B: Structural Science, Crystal Engineering and Materials, 72(2), pp.171–179. Großeholz, R. et al., 2016. Integrating highly quantitative proteomics and genome-scale metabolic modeling to study pH adaptation in the human pathogen Enterococcus faecalis. npj Systems Biology and Applications, 2(1), p.16017. Guerra-Giraldez, C., Quijada, L. and Clayton, C.E., 2002. 
Compartmentation of enzymes in a microbody, the glycosome, is essential in Trypanosoma brucei. Journal of cell science, 115, pp.2651–2658.


Guerra, D.G. et al., 2006. The mitochondrial FAD-dependent glycerol-3-phosphate dehydrogenase of Trypanosomatidae and the glycosomal redox balance of insect stages of Trypanosoma brucei and Leishmania spp. Molecular and Biochemical Parasitology, 149(2), pp.155–169. Guha-Niyogi, A., Sullivan, D.R. and Turco, S.J., 2001. Glycoconjugate structures of parasitic protozoa. Glycobiology, 11(4), pp.45R–59R. Gupta, R. et al., 1999a. O-GLYCBASE version 4.0: A revised database of O-glycosylated proteins. Nucleic Acids Research, 27(1), pp.370–372. Gupta, R. et al., 1999. Scanning the available Dictyostelium discoideum proteome for O-linked GlcNAc glycosylation sites using neural networks. Glycobiology, 9(10), pp.1009–1022. Gupta, R., Jung, E. and Brunak, S., 2004. NetNGlyc: Prediction of N-glycosylation sites in human proteins. Pac. Symp. Biocomput, pp.310–322.

Guthrie, E.P. and Magnelli, P.E., 2016. Using glycosidases to remove, trim, or modify glycans on therapeutic proteins. BioProcess International, 14(2). http://www.bioprocessintl.com/analytical/cell-line-development/using-glycosidases-to-remove-trim-or-modify-glycans-on-therapeutic-proteins/. Haikarainen, T. et al., 2017. Structural and Biochemical Characterization of Poly-ADP-ribose Polymerase from Trypanosoma brucei. Scientific Reports, 7(1), p.3642. Hamby, S.E. and Hirst, J.D., 2008. Prediction of glycosylation sites using random forests. BMC bioinformatics, 9, p.500. Hammond, D.J., Aman, R.A. and Wang, C.C., 1985. The role of compartmentation of glycerol kinase in the synthesis of ATP within the glycosome of Trypanosoma brucei. Journal of Biological Chemistry, 260(29), pp.15646–15654. Han, L. and Costello, C.E., 2013. Mass spectrometry of glycans. Biochemistry (Mosc), 78(7), pp.710–720. Hancock, K. and Tsang, V.C.W., 1986. Development and optimization of the FAST-ELISA for detecting antibodies to Schistosoma mansoni. Journal of Immunological Methods, 92(2), pp.167–176. Hansen, J.E. et al., 1998. NetOglyc: Prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconjugate Journal, 15(2), pp.115–130. Hart, D.T. and Coombs, G.H., 1982. Leishmania mexicana: Energy metabolism of amastigotes and promastigotes. Experimental Parasitology, 54(3), pp.397–409. Hartleb, D., Fritzemeier, C.J. and Lercher, M., 2018. Automated high-quality reconstruction of metabolic networks from high-throughput data. bioRxiv, p.282251. http://dx.doi.org/10.1101/282251. Harvey, D.J., 2005. Structural determination of N-linked glycans by matrix-assisted laser desorption/ionization and electrospray ionization mass spectrometry. Proteomics, 5(7), pp.1774–1786. Hasan, M.A. et al., 2015. Molecular-docking study of malaria drug target enzyme transketolase in Plasmodium falciparum 3D7 portends the novel approach to its treatment.
Source code for biology and medicine, 10, p.7. Hashimoto, K. et al., 2006. KEGG as a glycome informatics resource. Glycobiology, 16(5), pp.63R–70R. Hashimoto, K. et al., 2008. Mining significant tree patterns in carbohydrate sugar chains. Bioinformatics, 24(16), pp.i167–i173. Hattori, M. et al., 2003. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc, 125(39), pp.11853–11865. Havelaar, A.H. et al., 2015. World Health Organization Global Estimates and Regional Comparisons of the Burden of Foodborne Disease in 2010. PLoS Medicine, 12(12), p.e1001923. Hayes, C.A. et al., 2011. UniCarb-DB: A database resource for glycomic discovery. Bioinformatics, 27(9), pp.1343–1344. Hebert, D.N. and Molinari, M., 2012. Flagging and docking: Dual roles for N-glycans in protein quality control and cellular proteostasis. Trends in Biochemical Sciences, 37(10), pp.404–410. Hederos, M. and Konradsson, P., 2006. Synthesis of the Trypanosoma cruzi LPPG heptasaccharyl myo-inositol. Journal of the American Chemical Society, 128(10), pp.3414–3419. Hertel, F. and Zhang, J., 2014. Monitoring of post-translational modification dynamics with genetically encoded

fluorescent reporters. Biopolymers, 101(2), pp.180–187. Hewitt, M.C. and Seeberger, P.H., 2001. Solution and solid-support synthesis of a potential leishmaniasis carbohydrate vaccine. Journal of Organic Chemistry, 66(12), pp.4233–4243. Hirabayashi, J. et al., 2013. Lectin microarrays: concept, principle and applications. Chemical Society Reviews, 42(10), p.4443. Hirabayashi, J. et al., 2015. The lectin frontier database (LfDB), and data generation based on frontal affinity chromatography. Molecules, 20(1), pp.951–973. Hoang, N.H. et al., 2016. Kinetic studies on recombinant UDP-glucose: sterol 3-O-β-glycosyltransferase from Micromonospora rhodorangea and its bioconversion potential. AMB Express, 6(1), p.52. Hokke, C.H. and van Diepen, A., 2017. Helminth glycomics – glycan repertoires and host-parasite interactions. Molecular and Biochemical Parasitology, 215, pp.47–57. Holt, W. V., 2011. Mechanisms of sperm storage in the female reproductive tract: An interspecies comparison. Reproduction in Domestic Animals, 46, pp.68–74. Hong, Y. and Kinoshita, T., 2009. Trypanosome glycosylphosphatidylinositol biosynthesis. Korean Journal of Parasitology, 47(3), pp.197–204. Hosoda, M., Akune, Y. and Aoki-Kinoshita, K.F., 2017. Development and application of an algorithm to compute weighted multiple glycan alignments. Bioinformatics, 33(9), pp.1317–1323. Hosoda, M., Akune, Y. and Aoki-Kinoshita, K.F., 2012. Multiple Alignment with Weights Applied to Carbohydrate to Extract Binding Recognition Patterns. Lecture Notes in Computer Science, 7632, pp.49–58. Hsiao, C.-T. et al., 2017. Fibronectin in cell adhesion and migration via N-glycosylation. Oncotarget, 8(41), pp. 70653–70668. Huang, Y., Gelb, A.S. and Dodds, E.D., 2013. Carbohydrate and Glycoconjugate Analysis by Ion Mobility Mass Spectrometry: Opportunities and Challenges. Current Metabolomics, 1, pp.291–305. Huthmacher, C. et al., 2010. 
Antimalarial drug targets in Plasmodium falciparum predicted by stage-specific metabolic network analysis. BMC systems biology, 4(1), p.120. Ihara, Y. et al., 2015. C-Mannosylation: Modification on Tryptophan in Cellular Proteins. In Glycoscience: Biology and Medicine. pp. 1091–1100. Ilg, T. et al., 1994. Characterization of phosphoglycan-containing secretory products of Leishmania. Parasitology, 108(S1), pp.S63–S71. Ilg, T. et al., 1994a. O- and N-glycosylation of the Leishmania mexicana-secreted acid phosphatase. Characterization of a new class of phosphoserine-linked glycans. Journal of Biological Chemistry, 269(39), pp.24073–24081. Ilg, T. et al., 1991. Secreted Acid-Phosphatase Of Leishmania-Mexicana - A Filamentous Phosphoglycoprotein Polymer. Proceedings of the National Academy of Sciences of the United States of America, 88(19), pp.8774–8778. Itakura, Y. et al., 2017. Sugar-binding profiles of chitin-binding lectins from the hevein family: A comprehensive study. International Journal of Molecular Sciences, 18(6), p. E1160. Ivens, A.C. et al., 2005. The genome of the kinetoplastid parasite, Leishmania major. Science, 309(5733), pp.436–442. Jaillon, O. et al., 2007. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 449(7161), pp.463–7. Jamshidi, N. and Palsson, B., 2010. Mass action stoichiometric simulation models: Incorporating kinetics and regulation into stoichiometric models. Biophysical Journal, 98(2), pp.175–185. Jansson, P.E., Stenutz, R. and Widmalm, G., 2006. Sequence determination of oligosaccharides and regular polysaccharides using NMR spectroscopy and a novel Web-based version of the computer program casper. Carbohydrate Research, 341(8), pp.1003–1010. Jaurigue, J.A. and Seeberger, P.H., 2017. Parasite Carbohydrate Vaccines. Frontiers in Cellular and Infection Microbiology, 7, p.248.


Jensen, P.A. and Papin, J.A., 2011. Functional integration of a metabolic network model and expression data without arbitrary thresholding. Bioinformatics, 27(4), pp.541–547. Jesus, J.B., Mesquita-Rodrigues, C. and Cuervo, P., 2014. Proteomics advances in the study of Leishmania parasites and leishmaniasis. Sub-Cellular Biochemistry, 74, pp.323–349. Jiang, H., Aoki-Kinoshita, K.F. and Ching, W.-K., 2011. Extracting glycan motifs using a biochemically- weighted kernel. Bioinformation, 7(8), pp.405–412. Jo, S. and Im, W., 2013. Glycan fragment database: A database of PDB-based glycan 3D structures. Nucleic Acids Research, 41(D1), pp.D470-D474. Johansen, M.B., Kiemer, L. and Brunak, S., 2006. Analysis and prediction of mammalian protein glycation. Glycobiology, 16(9), pp.844–853. Jones, P. et al., 2003. UGT73C6 and UGT78D1, Glycosyltransferases Involved in Flavonol Glycoside Biosynthesis in Arabidopsis thaliana. Journal of Biological Chemistry, 278(45), pp.43910-43918. Joyce, A.R. and Palsson, B.O., 2006. The model organism as a system: integrating “omics” data sets. Nature reviews. Molecular cell biology, 7(3), pp.198–210. Julenius, K., 2007. NetCGlyc 1.0: Prediction of mammalian C-mannosylation sites. Glycobiology, 17(8), pp.868–876. Kafsack, B.F.C. and Llinás, M., 2010. Eating at the Table of Another: Metabolomics of Host-Parasite Interactions. Cell Host and Microbe, 7(2), pp.90–99. Kailemia, M.J. et al., 2014. Oligosaccharide analysis by mass spectrometry: A review of recent developments. Analytical Chemistry, 86(1), pp.196–212. Kaji, H. et al., 2012. Large-scale identification of N-glycosylated proteins of mouse tissues and construction of a glycoprotein database, GlycoProtDB. Journal of Proteome Research, 11(9), pp.4553–4566. Kalani, H. et al., 2016. Comparison of eight cell-free media for maintenance of Toxoplasma gondii tachyzoites. Iranian Journal of Parasitology, 11(1), pp.104–109. Kaltdorf, M. et al., 2016. 
Systematic identification of anti-fungal drug targets by a metabolic network approach. Frontiers in Molecular Biosciences, 3, pp.1–19. Kanehisa, M. et al., 2016. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Research, 44(D1), pp.D457–D462. Kanehisa, M., 2002. The KEGG database. Novartis Found Symp, 247, pp.91–101. Kanehisa, M. and Goto, S., 2000. Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 28, pp.27–30. Kashif, M. et al., 2018. Identification of novel inhibitors against UDP-galactopyranose mutase to combat leishmaniasis. Journal of Cellular Biochemistry, 119(3), pp.2653–2665. Kauffman, K.J., Prakash, P. and Edwards, J.S., 2003. Advances in flux balance analysis. Current Opinion in Biotechnology, 14(5), pp.491–496. Kaur, T., Sobti, R.C. and Kaur, S., 2011. Cocktail of gp63 and Hsp70 induces protection against Leishmania donovani in BALB/c mice. Parasite Immunology, 33(2), pp.95–103. Kawasaki, T. et al., 2006. GlycoEpitope: the Integrated Database of Carbohydrate Antigens and Antibodies. Trends in Glycoscience and Glycotechnology, 18(102), pp.267–272. Kim, D.H. et al., 2015. LC–MS-based absolute metabolite quantification: application to metabolic flux measurement in trypanosomes. Metabolomics, 11(6), pp.1721–1732. Kim, H.U., Kim, T.Y. and Lee, S.Y., 2010. Genome-scale metabolic network analysis and drug targeting of multi-drug resistant pathogen Acinetobacter baumannii AYE. Molecular bioSystems, 6(2), pp.339–348. Kim, M. et al., 2016a. Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli. Nature Communications, 7, p.13090. Kim, M.K. et al., 2016. E-Flux2 and sPOT: Validated methods for inferring intracellular metabolic flux distributions from transcriptomic data. PLoS ONE, 11(6), p.e0157101. Kim, M.K. and Lun, D.S., 2014. Methods for integration of transcriptomic data in genome-scale metabolic

models. Computational and Structural Biotechnology Journal, 11(18), pp.59–65. Kirsch, S. et al., 2009. On-line nano-HPLC/ESI QTOF MS monitoring of α2-3 and α2-6 sialylation in granulocyte glycosphingolipidome. Biological Chemistry, 390(7), pp.657–672. Kitano, H., 2002. Computational systems biology. Nature, 420(6912), pp.206–210. Kleczka, B. et al., 2007. Targeted gene deletion of Leishmania major UDP-galactopyranose mutase leads to attenuated virulence. Journal of Biological Chemistry, 282(14), pp.10498–10505. Konishi, Y. and Aoki-Kinoshita, K.F., 2012. The GlycomeAtlas tool for visualizing and querying glycome data. Bioinformatics, 28(21), pp.2849–2850. Koscielny, G. et al., 2014. The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data. Nucleic Acids Research, 42(D1), pp.D802-809. Kozak, R.P. et al., 2014. Improved nonreductive O-glycan release by hydrazinolysis with ethylenediaminetetraacetic acid addition. Analytical Biochemistry, 453(1), pp.29–37. Krambeck, F.J. et al., 2009. A mathematical model to derive N-glycan structures and cellular enzyme activities from mass spectrometric data. Glycobiology, 19(11), pp.1163–1175. Krambeck, F.J. and Betenbaugh, M.J., 2005. A mathematical model of N-linked glycosylation. Biotechnology and Bioengineering, 92(6), pp.711–728. Krishnamoorthy, L. and Mahal, L.K., 2009. Glycomic analysis: An array of technologies. ACS Chemical Biology, 4(9), pp.715–732. Kronegg, J. and Buloz, D., 1999. Detection/prediction of GPI cleavage site (GPI-anchor) in a protein (DGPI), Available at: http://129.194.185.165/dgpi/index_en.html. Kronewitter, S.R. et al., 2010. Human serum processing and analysis methods for rapid and reproducible N- glycan mass profiling. Journal of Proteome Research, 9(10), pp.4952–4959. Kuan, C.T. et al., 2010. 
Multiple phenotypic changes in mice after knockout of the B3gnt5 gene, encoding Lc3 synthase--a key enzyme in lacto-neolacto ganglioside synthesis. BMC developmental biology, 10(1), p.114. Kumar, D. and Mittal, Y., 2011. AnimalLectinDb: An integrated animal lectin database. Bioinformation, 6(3), pp.134–6. Kuno, A. et al., 2005. Evanescent-field fluorescence-assisted lectin microarray: a new strategy for glycan profiling. Nature methods, 2(11), pp.851–856. Lackovic, K. et al., 2010. Inhibitors of Leishmania GDP-mannose pyrophosphorylase identified by high- throughput screening of small-molecule chemical library. Antimicrobial Agents and Chemotherapy, 54(5), pp.1712–1719. Lagarda-Diaz, I., Guzman-Partida, A.M. and Vazquez-Moreno, L., 2017. Legume lectins: Proteins with diverse applications. International Journal of Molecular Sciences, 18(6), p.1242. Lam, P.V.N. et al., 2013. Structure-based Comparative Analysis and Prediction of N-linked Glycosylation Sites in Evolutionarily Distant Eukaryotes. Genomics, Proteomics and Bioinformatics, 11(2), pp.96–104. Lamotte, S. et al., 2017. The enemy within: Targeting host–parasite interaction for antileishmanial drug discovery. PLoS Neglected Tropical Diseases, 11(6), p. e0005480. Lasonder, E. et al., 2008. Proteomic profiling of Plasmodium sporozoite maturation identifies new proteins essential for parasite development and infectivity. PLoS Pathogens, 4(10), p. e1000195. Lederkremer, R.M. et al., 1991. Complete structure of the glycan of lipopeptidophosphoglycan from Trypanosoma cruzi epimastigotes. Journal of Biological Chemistry, 266(35), pp.23670–23672. Leeflang, B.R. et al., 2000. Structure elucidation of glycoprotein glycans and of polysaccharides by NMR spectroscopy. In Journal of Biotechnology. Elsevier Sci B.V., pp. 115–122. Leifso, K. et al., 2007. Genomic and proteomic expression analysis of Leishmania promastigote and amastigote life stages: The Leishmania genome is constitutively expressed. 
Molecular and Biochemical Parasitology, 152(1), pp.35–46. Levery, S.B., 2006. Glycosphingolipid structural analysis and glycosphingolipidomics. Methods in Enzymology, 405, pp.300–369.

Leymarie, N. and Zaia, J., 2012. Effective use of mass spectrometry for glycan and glycopeptide structural analysis. Analytical Chemistry, 84(7), pp.3040–3048. Li, H. et al., 2015. Automated N-glycan profiling of a mutant Trypanosoma rangeli sialidase expressed in Pichia pastoris, using tandem mass spectrometry and bioinformatics. Glycobiology, 25(12), pp.1350–1361. Li, S. et al., 2006. Predicting O-glycosylation sites in mammalian proteins by using SVMs. Computational Biology and Chemistry, 30(3), pp.203–208. Li, W., Saraiya, A.A. and Wang, C.C., 2013. Experimental verification of the identity of variant-specific surface proteins in Giardia lamblia trophozoites. mBio, 4(3), pp. e00321-13. Liao, Y.C. et al., 2012. GEMSiRV: A software platform for GEnome-scale metabolic model simulation, reconstruction and visualization. Bioinformatics, 28(13), pp.1752–1758. Von Der Lieth, C.W. et al., 2011. EUROCarbDB: An open-access platform for . Glycobiology, 21(4), pp.493–502. von der Lieth, C.W., Lütteke, T. and Frank, M., 2009. Bioinformatics for glycobiology and glycomics: An introduction. John Wiley and Sons, Ltd. ISBN:9780470029619 DOI:10.1002/9780470029619. Limoncu, M. et al., 2004. Evaluation of three new culture media for the cultivation and isolation of Leishmania parasites. Journal of Basic Microbiology, 44(3), pp.197–202. Liu, D. and Uzonna, J.E., 2012. The early interaction of Leishmania with macrophages and dendritic cells and its influence on the host immune response. Frontiers in Cellular and Infection Microbiology, 2, p.83. Liu, H. et al., 2014. Mass spectrometry-based analysis of glycoproteins and its clinical applications in cancer biomarker discovery. Clin. Proteomics, 11(1), p.14. Liu, X. et al., 2006. Mass spectrometry-based glycomics strategy for exploring N-linked glycosylation in eukaryotes and bacteria. Analytical Chemistry, 78(17), pp.6081–6087. Liu, Y. et al., 2015. Predict and Analyze Protein Glycation Sites with the mRMR and IFS Methods. 
BioMed Research International, 2015, p.561547. Lloyd, K., 2003. The Mutant Mouse Regional Resource Center Program. Breast Cancer Res., 5, p.7. Logan-Klumpler, F.J. et al., 2012. GeneDB-an annotation database for pathogens. Nucleic Acids Research, 40(D1), pp. D98-108. Lombard, V. et al., 2014. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Research, 42(D1), pp. D490–D495. Lopez, I. and Pardo, M.A., 2010. Evaluation of a real-time polymerase chain reaction (PCR) Assay for detection of Anisakis simplex parasite as a food-borne allergen source in seafood products. Journal of Agricultural and Food Chemistry, 58(3), pp.1469–1477. Loss, A. et al., 2002. SWEET-DB: an attempt to create annotated data collections for carbohydrates. Nucleic acids research, 30(1), pp.405–408. Loß, A. et al., 2006. GlyNest and CASPER: Two independent approaches to estimate 1H and 13C NMR shifts of glycans available through a common web-interface. Nucleic Acids Research, 34(WEB. SERV. ISS.), pp. W733-737. Low, A. et al., 2007. Merozoite surface protein 2 of Plasmodium falciparum: Expression, structure, dynamics, and fibril formation of the conserved N-terminal domain. Biopolymers, 87(1), pp.12–22. Lowe, J.B. and Marth, J.D., 2003. A genetic approach to Mammalian glycan function. Annual review of biochemistry, 72(1), pp.643–691. Ludin, P. et al., 2012. In silico prediction of antimalarial drug target candidates. Int J Parasitol Drugs Drug Resist, 2, pp.191–199. Luk, F.C.Y., Johnson, T.M. and Beckers, C.J., 2008. N-linked glycosylation of proteins in the protozoan parasite Toxoplasma gondii. Molecular and Biochemical Parasitology, 157(2), pp.169–178. Lundborg, M. and Widmalm, G., 2011. Structural analysis of glycans by NMR chemical shift prediction. Analytical Chemistry, 83(5), pp.1514–1517. Lütteke, T. et al., 2006. GLYCOSCIENCES.de: An internet portal to support glycomics and glycobiology

research. Glycobiology, 16(5), pp.71R–81R. Lütteke, T., Frank, M. and von der Lieth, C.W., 2005. Carbohydrate Structure Suite (CSS): Analysis of carbohydrate 3D structures derived from the PDB. Nucleic Acids Research, 33(DATABASE ISS.), pp. D242-246. Lütteke, T., Frank, M. and Von Der Lieth, C.W., 2004. Data mining the protein data bank: Automatic detection and assignment of carbohydrate structures. In Carbohydrate Research. pp. 1015–1020. Lütteke, T. and von der Lieth, C.W., 2004a. pdb-care (PDB carbohydrate residue check): a program to support annotation of complex carbohydrate structures in PDB files. BMC bioinformatics, 5, p.69. Ma, R. et al., 2009. Post-translational and transcriptional regulation of glycolipid glycosyltransferase genes in apoptotic breast carcinoma cells: VII. Studied by DNA-microarray after treatment with l-PPMP. In Glycoconjugate Journal. pp. 647–661. Macedo, C.S. et al., 2010. Overlooked post-translational modifications of proteins in Plasmodium falciparum: N- and O-glycosylation - A review. Memorias do Instituto Oswaldo Cruz, 105(8), pp.949–956. Machado, D. and Herrgård, M., 2014. Systematic Evaluation of Methods for Integration of Transcriptomic Data into Constraint-Based Models of Metabolism. PLoS Computational Biology, 10(4),p. e1003580. Machado, D., Herrgård, M.J. and Rocha, I., 2016. Stoichiometric Representation of Gene–Protein–Reaction Associations Leverages Constraint-Based Analysis from Reaction to Gene-Level Phenotype Prediction. PLoS Computational Biology, 12(10), p. e1005140. Maeda, M. et al., 2015. JCGGDB: Japan consortium for glycobiology and glycotechnology database. Methods in Molecular Biology, 1273, pp.161–179. Mahoney, A.B. and Turco, S.J., 1999. Characterization of the glucosyltransferases that assemble the side chains of the Indian Leishmania donovani lipophosphoglycan. Archives of Biochemistry and Biophysics, 372(2), pp.367–374. Majumdar, D. et al., 2005. 
Synthesis of proteophosphoglycans of Leishmania major and Leishmania mexicana. Journal of Organic Chemistry, 70(5), pp.1691–1697. Manzi, A.E. et al., 2000. Exploring the glycan repertoire of genetically modified mice by isolation and profiling of the major glycan classes and nano-NMR analysis of glycan mixtures. Glycobiology, 10(7), pp.669–689. Martín-Jiménez, C.A. et al., 2017. Genome-scale reconstruction of the human astrocyte metabolic network. Frontiers in Aging Neuroscience, 9, p.23. Mazola, Y., Chinea, G. and Musacchio, A., 2011. Glycosylation and bioinformatics: Current status for glycosylation prediction tools. Biotecnologia Aplicada, 28(1), pp.6–12. Mazola, Y., Chinea, G. and Musacchio A., 2011a. Integrating Bioinformatics Tools to Handle Glycosylation. PLoS Computational Biology, 7(12), p. e1002285. McClain, D.A. et al., 2002. Altered glycan-dependent signaling induces insulin resistance and hyperleptinemia. Proc Natl Acad Sci U S A, 99(16), pp.10695–10699. McConville, M.J. et al., 2015. Leishmania carbon metabolism in the macrophage phagolysosome- feast or famine? F1000Research, 4(F1000 Faculty Rev), p.938. McConville, M.J. and Blackwell, J.M., 1991. Developmental changes in the glycosylated phosphatidylinositols of Leishmania donovani. Characterization of the promastigote and amastigote glycolipids. The Journal of biological chemistry, 266(23), pp.15170–15179. McConville, M.J. and Ferguson, M. A., 1993. The structure, biosynthesis and function of glycosylated phosphatidylinositols in the parasitic protozoa and higher eukaryotes. Biochemical Journal, 29, pp.305–324. McConville M.J. et al., 1995. Structure of Leishmania lipophosphoglycan: inter- and intra-specific polymorphism in Old World species. Biochem, 310(3), pp.807–818. Medina-Acosta, E. et al., 1989. The promastigote surface protease (gp63) of Leishmania is expressed but differentially processed and localized in the amastigote stage. Molecular and Biochemical Parasitology, 37(2), pp.263–273. 
Mehlert, A., Sullivan, L. and Ferguson, M.A.J., 2010. Glycotyping of Trypanosoma brucei variant surface glycoprotein MITat1.8. Molecular and Biochemical Parasitology, 174(1), pp.74–77.

Mehta, M., Sonawat, H.M. and Sharma, S., 2006. Glycolysis in Plasmodium falciparum results in modulation of host enzyme activities. Journal of Vector Borne Diseases, 43(3), pp.95–103. Mehta, N. et al., 2016. Mass Spectrometric Quantification of N-Linked Glycans by Reference to Exogenous Standards. Journal of Proteome Research, 15(9), pp.2969–2980. Meisen, I., Mormann, M. and Müthing, J., 2011. Thin-layer chromatography, overlay technique and mass spectrometry: A versatile triad advancing glycosphingolipidomics. Biochimica et Biophysica Acta - Molecular and Cell Biology of Lipids, 1811(11), pp.875–896. Meng, X.-Y. et al., 2011. Molecular docking: a powerful approach for structure-based drug discovery. Current computer-aided drug design, 7(2), pp.146–57. Michaelis, L. and Menten, M.L., 1913. Die Kinetik der Invertinwirkung. Biochem Z, 49(February), pp.333–369. Michels, P.A.M. et al., 2006. Metabolic functions of glycosomes in trypanosomatids. Biochimica et Biophysica Acta - Molecular Cell Research, 1763(12), pp.1463–1477. Monis, P.T. et al., 2005. Emerging technologies for the detection and genetic characterization of protozoan parasites. Trends in parasitology, 21(7), pp.340–346. Morelle, W. et al., 2006. Characterization of N-glycans of recombinant human thyrotropin using mass spectrometry. Rapid communications in mass spectrometry : RCM, 20(3), pp.331–345. Morelle, W. et al., 2005. Characterization of the N-linked glycans of Giardia intestinalis. Glycobiology, 15(5), pp.549–559. Morelle, W. and Michalski, J.C., 2007. Analysis of protein glycosylation by mass spectrometry. Nature protocols, 2(7), pp.1585–1602. Morotti, A., Teixeira, M.M. and Carvalho, I., 2017. Protozoan parasites glycosylphosphatidylinositol anchors: structures, functions and trends for drug discovery. Curr Med Chem., Epub ahead. doi: 10.2174/0929867324666170727110801. Moura, A.P. V. et al., 2017. Virus-like Particle Display of the α-Gal Carbohydrate for Vaccination against Leishmania Infection. 
ACS Central Science, 3(9), pp.1026–1031. Moyersoen, J. et al., 2004. Biogenesis of peroxisomes and glycosomes: Trypanosomatid glycosome assembly is a promising new drug target. FEMS Microbiology Reviews, 28(5), pp.603–643. Mukhopadhyay, S. and Mandal, C., 2006. Glycobiology of Leishmania donovani. Indian Journal of Medical Research, 123(3), pp.203–220. Mulagapati, S., Koppolu, V. and Raju, T.S., 2017. Decoding of O-Linked Glycosylation by Mass Spectrometry. Biochemistry, 56(9), pp.1218–1226. Mulloy, B. et al., 2017. Structural Analysis of Glycans. In Essentials of Glycobiology. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press, p. 2015-2017. Mulloy, B., Hart, G.W. and Stanley, P., 2009. Structural Analysis of Glycans, The Consortium of Glycobiology Editors, Available at: http://www.ncbi.nlm.nih.gov/pubmed/20301234. Murray, C.J.L. et al., 2012. Disability-adjusted life years (DALYs) for 291 diseases and injuries in 21 regions, 1990-2010: A systematic analysis for the Global Burden of Disease Study 2010. The Lancet, 380(9859), pp.2197–2223. Naderer, T. et al., 2006. Virulence of Leishmania major in macrophages and mice requires the gluconeogenic enzyme fructose-1,6-bisphosphatase. Proceedings of the National Academy of Sciences of the United States of America, 103(14), pp.5502–5507. Naderer, T., Heng, J. and McConville, M.J., 2010. Evidence that intracellular stages of Leishmania major utilize amino sugars as a major carbon source. PLoS Pathogens, 6(12). Naderer, T. and McConville, M.J., 2008. The Leishmania-macrophage interaction: A metabolic perspective. Cellular Microbiology, 10(2), pp.301–308. Naderer, T., Wee, E. and McConville, M.J., 2008. Role of hexosamine biosynthesis in Leishmania growth and virulence. Molecular Microbiology, 69(4), pp.858–869. Naito-Matsui, Y. and Takematsu, H., 2014. Remodeling of glycans using glycosyltransferase genes. Methods in Molecular Biology, 1200, pp.379–387.

Nakahara, T., et al., 2008. Glycoconjugate Data Bank:Structures - An annotated glycan structure database and N -glycan primary structure verification service. Nucleic Acids Research, 36, pp.D368–D371. Nakakita, S., et al., 2007. A practical approach to N-glycan production by hydrazinolysis using hydrazine monohydrate. Biochemical and Biophysical Research Communications, 362(3), pp.639–645. Narimatsu, H., 2004. Construction of a human glycogene library and comprehensive functional analysis. In Glycoconjugate Journal. pp. 17–24. Ngo, M. and Suits, M.D.L., 2017. Methods for determining glycosyltransferase kinetics. In Methods in Molecular Biology. pp. 59–70. Nguyen, N.-N. et al., 2015. EnzDP: Improved enzyme annotation for metabolic network reconstruction based on domain composition profiles. Journal of Bioinformatics and Computational Biology, 13(5), p.1543003. Nii-Trebi, N.I., 2017. Emerging and Neglected Infectious Diseases: Insights, Advances, and Challenges. BioMed Research International, 2017. doi: 10.1155/2017/5245021. Novozhilova, N. and Bovin, N., 2010. Structure, Functions, and Biosynthesis of Glycoconjugates of Leishmania spp. Cell surface. Biochemistry (Moscow), 75(6), pp.686–694. Oesch, D. and Luedtke, N.W., 2015. Fluorescent chemosensors of carbohydrate triols exhibiting TICT emissions. Chem. Commun., 51(63), pp.12641–12644. Ojima, T. et al., 2015. Glycolipid dynamics in generation and differentiation of induced pluripotent stem cells. Scientific Reports, 5, p.14988. Olafson, R.W. et al., 1990. Structures of the N-linked oligosaccharides of Gp63, the major surface glycoprotein, from Leishmania mexicana amazonensis. Journal of Biological Chemistry, 265(21), pp.12240–12247. Olivier, M., Gregory, D.J. and Forget, G., 2005. Subversion mechanisms by which Leishmania parasites can escape the host immune response: A signaling point of view. Clinical Microbiology Reviews, 18(2), pp.293– 305. Olszewski, K.L. et al., 2010. 
Branched tricarboxylic acid metabolism in Plasmodium falciparum. Nature, 466(7307), pp.774–778. Olszewski, K.L. and Llinás, M., 2011. Central carbon metabolism of Plasmodium parasites. Molecular and Biochemical Parasitology, 175(2), pp.95–103. Oppenheimer, M., Valenciano, A.L. and Sobrado, P., 2011. Biosynthesis of Galactofuranose in Kinetoplastids: Novel Therapeutic Targets for Treating Leishmaniasis and Chagas’ Disease. Enzyme Research, 2011, pp.1– 13. Opperdoes, F.R. and Coombs, G.H., 2007. Metabolism of Leishmania: proven and predicted. Trends in Parasitology, 23(4), pp.149–158. Orr, S.L. et al., 2013. A phenotype survey of 36 mutant mouse strains with gene-targeted defects in glycosyltransferases or glycan-binding proteins. Glycobiology, 23(3), pp.363–380. Oyola, S.O. et al., 2016. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification. Malaria Journal, 15(1), p.597. Ozohanics, O. et al., 2008. GlycoMiner: A new software tool to elucidate glycopeptide composition. Rapid Communications in Mass Spectrometry, 22(20), pp.3245–3254. Pabst, M. et al., 2010. Nucleotide and nucleotide sugar analysis by liquid chromatography- electrospray ionization-mass spectrometry on surface-conditioned porous graphitic carbon. Analytical Chemistry, 82(23), pp.9782–9788. Papadaki, A. et al., 2015. The Leishmania donovani acid ecto-phosphatase Ld MAcP: insight into its structure and function. Biochemical Journal, 467(3), pp.473–486. Papanastasiou, P. et al., 1997. The variant-specific surface protein of Giardia, VSP4A1, is a glycosylated and palmitoylated protein. The Biochemical journal, 322, pp.49–56. Parkinson, H., 2004. ArrayExpress--a public repository for microarray gene expression data at the EBI. Nucleic Acids Research, 33, pp.D553–D555. Passero, L.F.D. et al., 2015. Differential modulation of macrophage response elicited by glycoinositolphospholipids and lipophosphoglycan from Leishmania (Viannia) shawi. Parasitology

International, 64(4), pp.32–35. Patel, P., Mandlik, V. and Singh, S., 2016. LmSmdB: An integrated database for metabolic and gene regulatory network in Leishmania major and Schistosoma mansoni. Genomics Data, 7, pp.115–118. Pereira, S.S. and Jackson, A.P., 2018. UDP-glycosyltransferase genes in trypanosomatid genomes have diversified independently to meet the distinct developmental needs of parasite adaptations. BMC Evol Biol., 18, p.31. Pérez S. et al., 2015. Glyco3D: a portal for structural glycosciences. Methods Mol Biol., 1273, pp.241–58. Pfau, T., Pacheco, M.P. and Sauter, T., 2016. Towards improved genome-scale metabolic network reconstructions: Unification, transcript specificity and beyond. Briefings in Bioinformatics, 17(6), pp.1060– 1069. Phillips, M.R. and Turco, S.J., 2015. Characterization of a ricin-resistant mutant of Leishmania donovani that expresses lipophosphoglycan. Glycobiology, 25(4), pp.428–437. Pierleoni, A., Martelli, P.L. and Casadio, R., 2008. PredGPI: a GPI-anchor predictor. BMC bioinformatics, 9, p.392. Pinger, J., Chowdhury, S. and Papavasiliou, F.N., 2017. Variant surface glycoprotein density defines an immune evasion threshold for African trypanosomes undergoing antigenic variation. Nature Communications, 8(1), p.828. Pinney, J.W. et al., 2007. Metabolic reconstruction and analysis for parasite genomes. Trends in Parasitology, 23(11), pp.548–554. Pitkanen, E., Rousu, J. and Ukkonen, E., 2010. Computational methods for metabolic reconstruction. Current Opinion in Biotechnology, 21(1), pp.70–77. Kissinger, J.C. et al., 2001. PlasmoDB: An integrative database of the Plasmodium falciparum genome. Tools for accessing and analyzing finished and unfinished sequence data. The Plasmodium Genome Database Collaborative. Nucleic acids research, 29(1), pp.66-69. Plata, G. et al., 2010a. Reconstruction and flux-balance analysis of the Plasmodium falciparum metabolic network. Molecular Systems Biology, 6, p. 408. Poisson, G. et al., 2007. 
FragAnchor: A Large-Scale Predictor of Glycosylphosphatidylinositol Anchors in Eukaryote Protein Sequences by Qualitative Scoring. Genomics, Proteomics and Bioinformatics, 5(2), pp.121–130. Pollyanna, S.G. et al., 2017. Decoding the Role of Glycans in Malaria. Front. Microbiol., 8, p.1071. Previato, J.O. et al., 2003. Glycoinositolphospholipid from Trypanosoma cruzi: Structure, Biosynthesis and Immunobiology. Advances in Parasitology, 56, pp.1–41. Proudfoot, L. et al., 1995. Biosynthesis of the glycolipid anchor of lipophosphoglycan and the structurally related glycoinositolphospholipids from Leishmania major. The Biochemical journal, 308, pp.45–55. Proudfoot, L., O’Donnell, C. A. and Liew, F.Y., 1995. Glycoinositolphospholipids of Leishmania major inhibit nitric oxide synthesis and reduce leishmanicidal activity in murine macrophages. European journal of immunology, 25(3), pp.745–750. Purohit, S. et al., 2018. Multiplex glycan bead array for high throughput and high content analyses of glycan binding proteins. Nature Communications, 9(1), p.258. Quirós, L.M. et al., 2000. Glycosylation of macrolide antibiotics. Purification and kinetic studies of a macrolide glycosyltransferase from Streptomyces antibioticus. Journal of Biological Chemistry, 275(16), pp.11713– 11720. Raghunathan, A. et al., 2010. Systems approach to investigating host-pathogen interactions in infections with the biothreat agent Francisella. Constraints-based model of Francisella tularensis. BMC Systems Biology, 4(1), p.118. Ralton, J.E. et al., 2003. Evidence that Intracellular β1-2 Mannan Is a Virulence Factor in Leishmania Parasites. Journal of Biological Chemistry, 278(42), pp.40757–40763. Ranzinger, R. et al., 2011. GlycomeDB-A unified database for carbohydrate structures. Nucleic Acids Research, 39, pp. D373-376.

Rashid, R. et al., 2017. Comprehensive analysis of phospholipids and glycolipids in the opportunistic pathogen Enterococcus faecalis. PLoS ONE, 12(4), p. e0175886. Real, F. et al., 2013. The genome sequence of Leishmania (Leishmania) amazonensis: Functional annotation and extended analysis of gene models. DNA Research, 20(6), pp.567–581. Reed, J.L. et al., 2006. Towards multidimensional genome annotation. Nature reviews Genetics, 7(2), pp.130– 141. Rek, A., Krenn, E. and Kungl, A.J., 2009. Therapeutically targeting protein-glycan interactions. British Journal of Pharmacology, 157(5), pp.686–694. Rini, J., Esko, J. and Varki, A., 2009. Chapter 5: Glycosyltransferases and Glycan-processing Enzymes. In Essentials of Glycobiology. 2nd edition. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press, pp.2015-2017. Robaina E.S. and Nikoloski, Z., 2014. Generalized framework for context-specific metabolic model extraction methods. Frontiers in Plant Science, 5, p.491. Roberts, S.B. et al., 2009. Proteomic and network analysis characterize stage-specific metabolism in Trypanosoma cruzi. BMC systems biology, 3, p.52. Rodrigues, J.A. et al., 2015. Parasite Glycobiology: A Bittersweet Symphony. PLoS Pathog, 11(11), p. e1005169. Rodriguez-Contreras, D. et al., 2007. Phenotypic characterization of a glucose transporter null mutant in Leishmania mexicana. Molecular and Biochemical Parasitology, 153(1), pp.9–18. Rogers, M.E., 2012. The role of Leishmania proteophosphoglycans in sand fly transmission and infection of the mammalian host. Frontiers in Microbiology, 3, p.223. Rojas-Macias, M.A. and Lutteke, T., 2015. Statistical analysis of amino acids in the vicinity of carbohydrate residues performed by glyvicinity. Methods in Molecular Biology, 1273, pp.215–226. Rosen, J., Miguet, L. and Pérez, S., 2009. Shape: Automatic conformation prediction of carbohydrates using a genetic algorithm. Journal of Cheminformatics, 1(1), p.16. Rosenfeld, R. et al., 2007. 
A lectin array-based methodology for the analysis of protein glycosylation. Journal of Biochemical and Biophysical Methods, 70(3), pp.415–426. Ruhaak, L.R. et al., 2010. Glycan labeling strategies and their use in identification and quantification. Analytical and Bioanalytical Chemistry, 397(8), pp.3457–3481. Russell, D.G. and Wright, S.D., 1988. Complement receptor type 3 (CR3) binds to an Arg-Gly-Asp-containing region of the major surface glycoprotein, gp63, of Leishmania promastigotes. The Journal of experimental medicine, 168(1), pp.279–292. Ruyck, J. et al., 2016. Molecular docking as a popular tool in drug design, an in silico travel. Adv. Appl. Bioinform. Chem., 9, pp.1–11. Ryu, J.Y., Kim, H.U. and Lee, S.Y., 2015. Reconstruction of genome-scale human metabolic models using omics data. Integr. Biol., 7(8), pp.859–868. Sacks, D. and Kamhawi, S., 2001. Molecular Aspects of Parasite-Vector and Vector-Host Interactions in Leishmaniasis. Annual Review of Microbiology, 55(1), pp.453–483. Sacks, D.L. et al., 1995. Stage-specific binding of Leishmania donovani to the sand fly vector midgut is regulated by conformational changes in the abundant surface lipophosphoglycan. The Journal of experimental medicine, 181(2), pp.685–697. Saha, R., Chowdhury, A. and Maranas, C.D., 2014. Recent advances in the reconstruction of metabolic models and integration of omics data. Current Opinion in Biotechnology, 29(1), pp.39–45. Samuelson, J. and Robbins, P.W., 2015. Effects of N-glycan precursor length diversity on quality control of protein folding and on protein glycosylation. Seminars in Cell and Developmental Biology, 41, pp.121–128. Sánchez-Ovejero, C. et al., 2016. Sensing parasites: Proteomic and advanced bio-detection alternatives. Journal of Proteomics, 136, pp.145–156. Sanz, S. et al., 2013. Biosynthesis of GDP-fucose and other sugar nucleotides in the blood stages of Plasmodium falciparum. Journal of Biological Chemistry, 288(23), pp.16506–16517.

Sanz, S. et al., 2016. The disruption of GDP-fucose de novo biosynthesis suggests the presence of a novel fucose-containing glycoconjugate in Plasmodium asexual blood stages. Scientific Reports, 6, p.37230. Saunders, E.C. et al., 2010. Central carbon metabolism of Leishmania parasites. Parasitology, 137(9), pp.1303– 1313. Saunders, E.C. et al., 2011. Isotopomer profiling of Leishmania mexicana promastigotes reveals important roles for succinate fermentation and aspartate uptake in Tricarboxylic Acid Cycle (TCA) anaplerosis, glutamate synthesis, and growth. Journal of Biological Chemistry, 286(31), pp.27706–27717. Scheltema, R.A. et al., 2010. The potential of metabolomics for Leishmania research in the post-genomics era. Parasitology, 137(9), pp.1291–1302. Schmidt, B.J. et al., 2013. GIM3E: Condition-specific models of cellular metabolism developed from metabolomics and expression data. Bioinformatics, 29(22), pp.2900–2908. Schneider, P. et al., 1990. Structure of the glycosyl-phosphatidylinositol membrane anchor of the Leishmania major promastigote surface protease. Journal of Biological Chemistry, 265(28), pp.16955–16964. Schuster, F.L., 2002. Cultivation of Plasmodium spp. Clinical Microbiology Reviews, 15(3), pp.355–364. Schwede, A. et al., 2015. How Does the VSG Coat of Bloodstream Form African Trypanosomes Interact with External Proteins? PLoS Pathogens, 11(12), p. e1005259. Segre, D., Vitkup, D. and Church, G.M., 2002. Analysis of optimality in natural and perturbed metabolic networks. Proceedings of the National Academy of Sciences, 99(23), pp.15112–15117. Selman, M.H.J. et al., 2012. MALDI-TOF-MS analysis of sialylated glycans and glycopeptides using 4-chloro- α-cyanocinnamic acid matrix. Proteomics, 12(9), pp.1337–1348. Shajahan, A. et al., 2017. Glycomic and glycoproteomic analysis of glycoproteins—a tutorial. Analytical and Bioanalytical Chemistry, 409(19), pp.4483–4505. Shakarian, A.M. et al., 2002. 
Molecular dissection of the functional domains of a unique, tartrate-resistant, surface membrane acid phosphatase in the primitive human pathogen Leishmania donovani. Journal of Biological Chemistry, 277(20), pp.17994–18001. Shakhsheer, B. et al., 2013. SugarBind database (SugarBindDB): A resource of pathogen lectins and corresponding glycan targets. Journal of Molecular Recognition, 26(9), pp.426–431. Sharma, M. et al., 2017. A systematic reconstruction and constraint-based analysis of Leishmania donovani metabolic network: identification of potential antileishmanial drug targets. Mol. BioSyst., 13(5), pp.955–969. Sharma, R. et al., 2013. Construction of a rice glycoside hydrolase phylogenomic database and identification of targets for biofuel research. Frontiers in plant science, 4, p.330. Shlomi, T. et al., 2008. Network-based prediction of human tissue-specific metabolism. Nature biotechnology, 26(9), pp.1003–1010. Singh, S., Mandlik, V. and Shinde, S., 2015. Molecular dynamics simulations and statistical coupling analysis of GPI12 in L. major: functional co-evolution and conservedness reveals potential drug–target sites. Mol. BioSyst., 11(3), pp.958–968. Smith, D.F., Song, X. and Cummings, R.D., 2010. Use of glycan microarrays to explore specificity of glycan- binding proteins. Methods in Enzymology, 480(C), pp.417–444. Smith, D.G.S. et al., 2015. Data set for mass spectrometric analysis of recombinant human serum albumin from various expression systems. Data in Brief, 4, pp.583–586. Smith, T.K. et al., 2004. Chemical validation of GPI biosynthesis as a drug target against African sleeping sickness. EMBO Journal, 23(23), pp.4701–4708. Smith, T.K. et al., 1997. Early steps in glycosylphosphatidylinositol biosynthesis in Leishmania major. The Biochemical journal, 326, pp.393–400. Smith, T.K. and Bütikofer, P., 2010. Lipid metabolism in Trypanosoma brucei. Molecular and Biochemical Parasitology, 172(2), pp.66–79. Song, C. et al., 2013. 
Metabolic reconstruction identifies strain-specific regulation of virulence in Toxoplasma gondii. Molecular systems biology, 9(708), p.708.

Song, H.S., Reifman, J. and Wallqvist, A., 2014a. Prediction of metabolic flux distribution from gene expression data based on the flux minimization principle. PLoS ONE, 9(11), p. e112524. Song, X. et al., 2014. Chemistry of natural glycan microarrays. Current Opinion in Chemical Biology, 18(1), pp.70–77. Song, X. et al., 2016. Oxidative release of natural glycans for functional glycomics. Nature Methods, 13(6), pp.528–534. Sosa, M.H., Giordana, L. and Nowicki, C., 2015. Exploring biochemical and functional features of Leishmania major phosphoenolpyruvate carboxykinase. Archives of Biochemistry and Biophysics, 583, pp.120–129. Spath, G.F. et al., 2000. Lipophosphoglycan is a virulence factor distinct from related glycoconjugates in the protozoan parasite Leishmania major. Proceedings of the National Academy of Sciences, 97(16), pp.9258– 9263. Spiro, R.G., 2002. Protein glycosylation: nature, distribution, enzymatic formation, and disease implications of glycopeptide bonds. Glycobiology, 12(4), p.43R–56R. Srivastava, S. et al., 2013. Leishmania expressed lipophosphoglycan interacts with Toll-like receptor (TLR)-2 to decrease TLR-9 expression and reduce anti-leishmanial responses. Clinical and Experimental Immunology, 172(3), pp.403–409. Strasser, R., 2016. Plant protein glycosylation. Glycobiology, 26(9), pp.926–939. Subramanian, A., Jhawar, J. and Sarkar, R.R., 2015. Dissecting Leishmania infantum energy metabolism - A systems perspective. PLoS ONE, 10(9), p. e0137976. Subramanian, A. and Sarkar, R.R., 2017. Revealing the mystery of metabolic adaptations using a genome scale model of Leishmania infantum. Scientific Reports, 7(1), p.10262. Sun, Y. et al., 2016. A Human Lectin Microarray for Sperm Surface Glycosylation Analysis. Molecular and Cellular Proteomics, 15(9), pp.2839–2851. Sunter, J. and Gull, K., 2017. Shape, form, function and Leishmania pathogenicity: from textbook descriptions to biological understanding. Open Biology, 7(9), p.170165. Sussman, J.L. 
et al., 1998. Protein Data Bank (PDB): Database of three-dimensional structural information of biological macromolecules. In Acta Crystallographica Section D: Biological Crystallography, 54, pp. 1078– 1084. Suzuki, E. et al., 2002. Role of beta-D-galactofuranose in Leishmania major macrophage invasion. Infection and Immunity, 70(12), pp.6592–6596. Swann, J. et al., 2015. Systems analysis of host-parasite interactions. Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 7(6), pp.381–400. Takeda, Y. et al., 2014. Both isoforms of human UDP-glucose:glycoprotein glucosyltransferase are enzymatically active. Glycobiology, 24(4), pp.344–350. Tan, N.Y. et al., 2014. Sequence-based protein stabilization in the absence of glycosylation. Nature Communications, 5, p.3099. Tanaka, K. et al., 2014. WURCS: The Web3 unique representation of carbohydrate structures. Journal of Chemical Information and Modeling, 54(6), pp.1558–1566. The UniProt Consortium, 2014. UniProt: a hub for protein information. Nucleic Acids Research, 43(Database issue), pp.D204-12. Thiele, I. and Palsson, B.Ø., 2010. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature protocols, 5(1), pp.93–121. Tomita, T. et al., 2013. The Toxoplasma gondii Cyst Wall Protein CST1 Is Critical for Cyst Wall Integrity and Promotes Bradyzoite Persistence. PLoS Pathogens, 9(12), pp.1–15. Topfer, N., Kleessen, S. and Nikoloski, Z., 2015. Integration of metabolomics data into metabolic networks. Frontiers in Plant Science, 6, p.49. Toukach, P. V., 2011. Bacterial carbohydrate structure database 3: Principles and realization. Journal of Chemical Information and Modeling, 51(1), pp.159–170.


Treumann, A. et al., 1997. Structural characterisation of two forms of procyclic acidic repetitive protein expressed by procyclic forms of Trypanosoma brucei. Journal of Molecular Biology, 269(4), pp.529–547. Trottein, F. et al., 2009. Glycosyltransferase and sulfotransferase gene expression profiles in human monocytes, dendritic cells and macrophages. Glycoconjugate Journal, 26(9), pp.1259–1274. Turco, S.J., 1988. The lipophosphoglycan of Leishmania. Parasitology Today, 4(9), pp.255–257. Turco, S.J. and Descoteaux, A., 1992. The lipophosphoglycan of Leishmania parasites. Annu Rev Microbiol, 46, pp.65–94. Turco, S.J. and Sacks, D.L., 1991. Expression of a stage-specific lipophosphoglycan in Leishmania major amastigotes. Molecular and Biochemical Parasitology, 45(1), pp.91–99. Turnock, D.C. and Ferguson, M.A.J., 2007. Sugar nucleotide pools of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major. Eukaryotic Cell, 6(8), pp.1450–1463. Tyagi, R. et al., 2015. Pan-phylum Comparison of Nematode Metabolic Potential. PLoS Neglected Tropical Diseases, 9(5), p. e0003788. Tymoshenko, S. et al., 2015. Metabolic Needs and Capabilities of Toxoplasma gondii through Combined Computational and Experimental Analysis. PLoS Computational Biology, 11(5), p. e1004261. Uchiyama, N. et al., 2008. Optimization of evanescent-field fluorescence-assisted lectin microarray for high- sensitivity detection of monovalent oligosaccharides and glycoproteins. Proteomics, 8(15), pp.3042–3050. Udenfriend, S. and Kodukula, K., 1995. How glycosylphosphatidylinositol-anchored membrane proteins are made. Annual review of biochemistry, 64, pp.563–591. Vanee, N. et al., 2010. A genome-scale metabolic model of Cryptosporidium hominis. Chemistry and Biodiversity, 7(5), pp.1026–1039. Varki, A., 2017. Biological roles of glycans. Glycobiology, 27(1), pp.3–49. Varki, A., Cummings, R. and Esko, J., 2009. N-Glycans. In Essentials of Glycobiology. 2nd edition. 
Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press. Vasudevan, D. and Haltiwanger, R.S., 2014. Novel roles for O-linked glycans in protein folding. Glycoconjugate Journal, 31(6), pp.417–426. Vedadi, M. et al., 2007. Genome-scale protein expression and structural biology of Plasmodium falciparum and related Apicomplexan organisms. Molecular and Biochemical Parasitology, 151(1), pp.100–110. Veras, P.S.T. and De Menezes, J.P.B., 2016. Using proteomics to understand how Leishmania parasites survive inside the host and establish infection. International Journal of Molecular Sciences, 17(8), p.1270. Walia, G. et al., 2017. The effects of UDP-sugars, UDP and Mg2+on uridine diphosphate glucuronosyltransferase activity in human liver microsomes. Xenobiotica, 26, pp.1-9. Wallqvist, A. et al., 2016. Metabolic host responses to malarial infection during the intraerythrocytic developmental cycle. BMC Systems Biology, 10(1), p.58. Wang, J. et al., 2011. dbOGAP - an integrated bioinformatics resource for protein O-GlcNAcylation. BMC bioinformatics, 12, p.91. Weerapana, E. et al., 2005. Investigating bacterial N-linked glycosylation: Synthesis and glycosyl acceptor activity of the undecaprenyl pyrophosphate-linked bacillosamine. Journal of the American Chemical Society, 127(40), pp.13766–13767. Weilhammer, D.R. et al., 2012. Host metabolism regulates growth and differentiation of Toxoplasma gondii. International Journal for Parasitology, 42(10), pp.947–959. Werz, D.B. and Seeberger, P.H., 2005. Carbohydrates as the next frontier in pharmaceutical research. Chemistry - A European Journal, 11(11), pp.3194–3206. Wiback, S.J., Mahadevan, R. and Palsson, B., 2004. Using Metabolic Flux Data to Further Constrain the Metabolic Solution Space and Predict Internal Flux Patterns: The Escherichia coli Spectrum. Biotechnology and Bioengineering, 86(3), pp.317–331. Wongkongkatep, J. et al., 2006. 
Label-free, real-time glycosyltransferase assay based on a fluorescent artificial chemosensor. Angewandte Chemie - International Edition, 45(4), pp.665–668.

Wright, H.T., Sandrasegaram, G. and Wright, C.S., 1991. Evolution of a family of N-acetylglucosamine binding proteins containing the disulfide-rich domain of wheat germ agglutinin. Journal of molecular evolution, 33(3), pp.283–294. Wu, M. and Chan, C., 2012. Human metabolic network: Reconstruction, simulation, and applications in systems biology. Metabolites, 2(1), pp.242–253. Wu, Z.L. et al., 2011. Universal phosphatase-coupled glycosyltransferase assay. Glycobiology, 21(6), pp.727– 733. Xu, C. and Ng, D.T.W., 2015. Glycosylation-directed quality control of protein folding. Nature Reviews Molecular Cell Biology, 16(12), pp.742–752. Yizhak, K. et al., 2010. Integrating quantitative proteomics and metabolomics with a genome-scale metabolic network model. Bioinformatics (Oxford, England), 26(12), pp.i255-i260. Yonekura-Sakakibara, K. et al., 2007. Identification of a flavonol 7-O-rhamnosyltransferase gene determining flavonoid pattern in Arabidopsis by transcriptome coexpression analysis and reverse genetics. Journal of Biological Chemistry, 282(20), pp.14932–14941. Zaia, J., 2008. Mass Spectrometry and the Emerging Field of Glycomics. Chemistry and Biology, 15(9), pp.881–892. Zamze, S.E. et al., 1991. Structural characterization of the asparagine-linked oligosaccharides from Trypanosoma brucei Type II and Type III variant surface glycoproteins. Journal of Biological Chemistry, 266(30), pp.20244–20261. Zawadzki, J. et al., 1998. The glycoinositolphospholipids from Leishmania panamensis contain unusual glycan and lipid moieties. Journal of Molecular Biology, 282(2), pp.287–299. Zhang, C. and Hua, Q., 2016. Applications of genome-scale metabolic models in biotechnology and systems medicine. Frontiers in Physiology, 6, p. 413. Zhang, H. et al., 2006. UniPep--a database for human N-linked glycosites: a resource for biomarker discovery. Genome biology, 7(8), p.R73. Zhang, L., Luo, S. and Zhang, B., 2016. 
The use of lectin microarray for assessing glycosylation of therapeutic proteins. mAbs, 8(3), pp.524–535. Zhang, X. and Wang, Y., 2016. Glycosylation Quality Control by the Golgi Structure. Journal of Molecular Biology, 428(16), pp.3183–3193. Zou, X. et al., 2017. A standardized method for lectin microarray-based tissue glycome mapping. Scientific Reports, 7, p. 43560.


CHAPTER 2

Accessing And Interlinking Databases In Glycobiology

"...tracking data provenance matters especially in an age when databases continuously integrate information emerging in the literature."

Raja Mazumder, Nature blog, (June 2017)


Abstract

Current glycomic databases store valuable information on glycan structures and their functional roles; however, poor interoperability and interconnectivity among databases have limited access to most datasets, mainly due to the use of unstandardized encoding formats to represent glycan structures and the lack of computational tools to translate between formats. The present work addresses these issues by exploring sixteen commonly used glycomic resources, including GlyTouCan, the Consortium for Functional Glycomics (CFG) databases, CarbBank and UniProtKB, and their links to other databases, such as glycoproteomic databases (e.g. AnimalLectinDB and CancerLectinDB) and other relevant resources like SWISS-PROT/TrEMBL, KEGG, Carbohydrate-Active Enzyme (CAZy) and PubMed. In particular, the representation of glycan structures and the availability of web-based tools for translating between encoding formats were inspected to assess the existing connectivity among glycan databases. This work shows that, although some databases provide limited cross-references, the use of translation tools and generalized encoding formats like WURCS would improve interoperability between glycomic databases, as well as with other databases. This chapter is expected to bring valuable information to the glycomics community, improving current strategies for glycomics data analysis. The findings encourage collaborative and integrative studies in glycomics research, particularly the development of bioinformatics tools to expand data accessibility.

2.1 Introduction

The development and application of high-throughput technologies, including Mass Spectrometry (MS), Nuclear Magnetic Resonance (NMR), lectin/glycan microarrays and mouse phenotyping, have revolutionized glycomics studies on many organisms, resulting in the discovery of many glycan structures and enabling the study of their roles in cellular systems. As a consequence, the need for computational resources that store these data and ensure easy access and interoperability between different data resources has become more pressing.

Various databases have been developed to store glycan-related information. Publicly available databases such as CarbBank (CCSD) (Doubet et al. 1989), GlycomeDB (Ranzinger et al. 2008) (now part of GlyTouCan (Aoki-Kinoshita et al. 2015)) and GlycoWorkbench (Ceroni et al. 2008) are some of the most used resources, containing a large number of glycan structures and experimental data from different studies. CFG (Consortium for Functional Glycomics) also provides data from different experiments, such as lectin/glycan microarrays, MS, mouse phenotyping and gene microarrays. Similarly, GlycoGeneDB (Narimatsu 2004), part of the Japan Consortium for Glycobiology and Glycotechnology Database (JCGGDB) (Maeda et al. 2015), provides expression data on glyco-enzymes involved in the biosynthesis of glycans, glycoconjugates and Glycan Binding Proteins (GBPs) in different organisms. Other important databases are discussed in Chapter 1 (see section 1.3). These web-based resources have been reported in previous reviews (Baycin-Hizal et al. 2014; Aoki-Kinoshita 2013; von der Lieth 2007) as important tools for exploring glycan structures and understanding their interactions with other biomolecules. However, information is scattered across many databases and has frequently been found to be inconsistent among them, limiting accessibility and usability (Aoki-Kinoshita 2013; Cummings and Pierce 2014; Ranzinger et al. 2008). Most glycomic databases do not provide user-friendly interfaces to extract data, which, together with the lack of universal formats for representing glycan structures, makes it more difficult for the glycomics community to access and share information. Besides, most glycomic databases provide only a few external cross-references to other resources, which limits access to different types of data.

Currently, different databases use different formats to represent glycan structures. For instance, the KEGG database (Kanehisa and Goto 2000) uses the KCF format (Hattori et al. 2003), whereas the GlyTouCan database uses IUPAC (International Union of Pure and Applied Chemistry, 2013), GlycoCT (condensed) (Herget et al. 2008) or Web3 Unique Representation of Carbohydrate Structures (WURCS) (Tanaka et al. 2014) formats. The lack of a universal representation format makes it difficult to verify, with any certainty, the equivalence between glycan structures and to integrate glycan-related data from various glycomic databases. This problem has been tackled to some extent using several tools for the translation of these encoding formats. However, many of these tools do not translate formats both ways, probably due to the loss of structural information when converting complex formats into simpler ones.

So far, implementing frameworks or strategies to make data storage and integration more complete, accurate, and timely remains a major challenge for the glycoinformatics community, especially with the recent accumulation of omics data. This study focuses on data from different glycomic and glycoproteomic resources (Chapter 1, section 1.3), addressing problems in integrating information that arise from inconsistency in the identification and representation of glycans, as well as from the poor inter-linkage between databases. The main focus is on graphical and text-based glycan structural notations and on the tools available to interconvert these encoding formats, with the aim of improving inter-linkage and interoperability among glycomic databases.

This study will hopefully contribute to the discussion on data standards and integration in the field of glycobiology, and will help the glycoinformatics community to efficiently fill the gaps in using current glycomics/glycoproteomics databases and analytical tools for glycobiology research.


2.2 Glycomic resources and integration with other databases

As comprehensively discussed in Chapter 1 (see section 1.3), glycomic databases compile different levels of information on glycans, from their structure to other chemical properties, like molecular weight or glycan-binding specificity. More specialized databases, such as GlycoBase (Campbell et al. 2008) (now part of UniCarbKB (Campbell et al. 2014)) and Glycan Mass Spectral DB (GMDB) (Kameyama et al. 2005), provide analytical data on specific glycan structures (e.g. N- and O-linked glycans), while databases like GlyTouCan and CarbBank, or other resources like CFG, provide a comprehensive compilation of glycan-related data (e.g. structures, profiling data, etc.). Previous reviews (Baycin-Hizal et al. 2014; Aoki-Kinoshita 2013; von der Lieth 2007) and textbooks (von der Lieth et al. 2009; Aoki-Kinoshita 2017) have also discussed the importance of most of these glycan resources. Linkage between glycomic databases, and especially to glycoproteomic databases, is still poor due to the non-unified representation of glycan structures. So far, many glycoproteomic databases that provide information on lectins and their associated glycans, such as SugarBindDB (Mariethoz et al. 2015), AnimalLectinDB (Kumar and Mittal 2011) and CancerLectinDB (Damodaran et al. 2008), are poorly linked to larger glycomic databases, such as CFG and GlyTouCan. Similarly, databases such as GLYCOSCIENCES.de (Lütteke et al. 2006) and Glyco3D (Pérez et al. 2015), which provide 3D structural data of major importance for glycan structural modeling projects, are poorly linked to glycoproteomic databases. In the same context, Protein Data Bank (PDB) (Sussman et al. 1998), KEGG and ExPASy (Tools et al. 2010) are useful resources for glycoprotein 3D structural modelling (Capriles et al. 2010; Gerloff et al. 2005) and protein-ligand docking (Forli et al. 2016; Perryman et al. 2015), but unfortunately do not provide links to most glycomic and glycoproteomic databases. 
Besides glycan-related data, information on other interacting biomolecules is also important in some studies. Therefore, databases like Carbohydrate-Active Enzyme (CAZy) (Lombard et al. 2014) for carbohydrate-active enzymes, KEGG and Carbohydrate Structure Glycosyltransferase Database (CSDB_GT) (Egorova and Toukach 2017) for structural and functional information on glycosyltransferases (GTs), and Mutant Mouse Regional Resource Center (MMRRC) (Lloyd 2003) and The International Mouse Phenotype Consortium (IMPC) (http://www.mousephenotype.org/) for functional data from mouse phenotypic experiments, are often consulted and help to elucidate the functional roles of glycans. Various glycomic, glycoproteomic and other relevant databases are listed in Table 1.2 (Chapter 1) and Table 1.3 (Chapter 1).

2.3 Notations to represent glycan structures

2.3.1 Graphical representation of glycan structures

The graphical representation of glycan structures is generally more complex than that of other biomolecules due to the multi-branched nature of glycans. Various notations have been defined to represent monosaccharides in glycan structures (Figure 2.1). For instance, the graphical notation published in the textbook “Essentials of Glycobiology” (Varki et al. 1999) (here referred to as CFG/Essentials) was adopted by various databases, such as UniCarbKB, SugarBindDB, GlyTouCan and GlycoBase, with very few modifications, largely because of its popularity among the glycobiology community. This notation represents monosaccharides as colored symbols and annotates glycosidic bonds with their stereochemical type and linkage positions. Other notations, such as IUPAC (2D) and GlycoCT, have been proposed and used by various databases such as GlyTouCan, GlycoBase, and SugarBindDB. The symbolic representation of monosaccharides defined by the Oxford Glycobiology Institute (Harvey et al. 2011) was also adopted by many databases, such as SugarBindDB and UniCarbKB. Different aspects of the Oxford and CFG style symbolic nomenclatures have been described in previous reviews (Varki et al. 2009; Varki et al. 2015).

Graphical notations are useful to visualize glycan structures, especially to capture the composition of glycans and the glycosidic bonds between monosaccharides. Unfortunately, no standardized notation has been defined so far and databases keep using different formats to represent glycan structures. Recently, UniCarbKB made available a software tool, GlycanBuilder (Ceroni et al. 2007), that illustrates the same glycan using more than one symbolic notation, facilitating the visualization and identification of glycan structures in most glycomic databases and the switching between glycan representation formats.


Figure 2.1: Illustration of various notations to represent the glycan (Gal)3(Glc)1(GlcNAc)2(cer)1 (KEGG ID: G01905). This glycan consists of three galactose (Gal), two N-acetylglucosamine (GlcNAc) and one glucose (Glc) units attached to a non-sugar molecule, ceramide (Cer) (not illustrated). Glycosidic bonds in IUPAC (2D) and CFG/Essentials are illustrated using IUPAC nomenclature, where “α” and “β” denote the anomeric configuration of the linkage between monosaccharides.
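The composition string used in the caption above, (Gal)3(Glc)1(GlcNAc)2(cer)1, is the simplest machine-readable description of this glycan: it records which residues are present and how many, but no linkage or branching. As a minimal sketch, assuming the string always follows the "(name)count" pattern seen here, it could be parsed as follows. The function name `parse_composition` is hypothetical and not part of any glycomics tool.

```python
import re


def parse_composition(text):
    """Parse a composition string such as '(Gal)3(Glc)1' into a dict of counts.

    Assumes every residue is written as '(name)count'; anything else is ignored.
    """
    counts = {}
    for name, number in re.findall(r"\(([^()]+)\)(\d+)", text):
        counts[name] = counts.get(name, 0) + int(number)
    return counts


composition = parse_composition("(Gal)3(Glc)1(GlcNAc)2(cer)1")

# Ceramide is the non-sugar part, so it is excluded when counting monosaccharides.
sugars = {name: n for name, n in composition.items() if name.lower() != "cer"}

print(composition)           # residue counts, including ceramide
print(sum(sugars.values()))  # number of monosaccharide units
```

Such a composition is enough to compare residue counts between database entries, but two glycans with the same composition can still differ in linkage and branching, which is why the structural notations discussed in this section are needed.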

2.3.2 Text-based representation of glycan structures

Text-based notations are better suited to software applications and should capture the most essential structural details of glycans, facilitating the exchange and integration of glycomics data. However, because of the complex nature of glycan structures, most text-based formats are still unable to capture all known glycan features, such as glycosidic linkage, torsions, and anomericity.

Computer-readable text-based formats developed so far include: linear IUPAC, LINUCS, KCF, GlycoCT (condensed and XML), Bacterial Carbohydrate Structure Database (BCSDB) sequence format (Toukach 2011), CabosML (Kikuchi et al. 2005), WURCS format, GlydeII (Ranzinger et al. 2017) and LinearCode (Banin et al. 2002). Each of these formats accommodates different levels of structural information. For instance, IUPAC is one of the simplest text-based structural encoding formats, showing limitations in the representation of variations in the carbohydrate branches. The LINUCS format can, however, represent more branching features and was first implemented by the SWEET-DB database (Loss et al. 2002). XML-based representations like GlydeII (an extension of Glyde (Sahoo et al. 2005)) and GlycoCT (XML) are able to encode several structural features, especially concerning the linkage and the isotopomeric representations of monosaccharides. These formats have been frequently used in exchanging glycan information and developing several web-based applications for structural analysis (Eavenson et al. 2015; Barh et al. 2017). Among the other formats, KCF, CabosML, BCSDB sequence, LinearCode and WURCS (the latest development) are able to represent a larger number of structural features, such as compositions, repeating units and linkage, than previously developed formats. The WURCS format is capable of accommodating different features of glycan structures and can also be used in Semantic Web applications to integrate data from different databases (Tanaka et al. 2014). Apart from these notations, the pdb and mol formats are specific to the representation of 3D information of glycan molecules and have been widely used by the PDB and PubChem databases (Bolton et al. 2008; Berman 2000). Figure 2.2 shows different machine-readable formats to represent glycan molecules, which are used by most databases mentioned in Table 1.2 (Chapter 1).

Figure 2.2: Machine-readable formats representing the following glycan structure: (Gal)3(Glc)1(GlcNAc)2(cer)1. Note: Text-based formats were generated using different tools that are discussed in section 2.4 “Glycan notations in glycomic databases”. The GlycoCT (condensed) format was converted to LinearCode and LINUCS using the GlycanBuilder tool, while RINGS tools were used to convert LINUCS into KCF format. GlyTouCan was used to convert GlycoCT (condensed) to WURCS, and GlycanBuilder converted LINUCS into GlydeII format.

2.4 Glycan notations in glycomic databases

Glycomic databases usually use different notations to represent glycan structures. In general, graphical notations like IUPAC or CFG/Essentials are widely used, but text-based notations, such as GlycoCT (condensed), are also used. As shown in Figure 2.3 (see also Table 1.2, Chapter 1), almost 15 of the analyzed databases use graphical notations, with IUPAC and CFG/Essentials being the most used formats. Nevertheless, text-based notations are often used, especially GlycoCT (condensed), while formats like WURCS, KCF and GlycoCT (XML) are among the least used.

In spite of significant developments in glycan structural notations, the lack of standardized notations and/or translation tools to support interoperability among databases limits access to relevant glycan information. The need to interlink glycomic databases (with each other and with other biological databases) becomes even more significant when investigating glycan functional roles, which benefits from the combination of multiple omics data types, such as genomics and proteomics. For example, when inspecting glycan biosynthetic pathways, researchers might want to combine information from enzymatic studies with gene or protein expression data, which could be scattered across many different databases that lack cross-references or use different notations to identify glycans or glycan-related data. The KEGG GLYCAN database (Hashimoto et al. 2006) holds glycan-related data such as the enzymes, genes, and pathways involved in the synthesis/degradation of glycans, and uses KCF to represent glycan structures; however, it is poorly linked with other relevant databases such as CAZy, which carries similar information. Besides, these databases do not use the same notations, and translation tools converting structural formats are limited. For instance, CFG stores information in IUPAC (linear) and LinearCode formats, but neither provides external links to other databases such as GlyTouCan or KEGG, nor offers translation tools to convert these formats into other important formats such as KCF, WURCS, and LINUCS.

Besides the application of translation tools, cross-referencing between different databases is equally important for improving access to data. At present, almost all glycomic databases are poorly cross-linked to other glycomic or glycoproteomic databases, except GlyTouCan, which provides cross-links to nearly 12 other databases (Figure 2.4).


Figure 2.3: Use of graphic/text-based glycan notations across sixteen glycomic databases included in the analysis.

Figure 2.4: The cross references (or cross-links to other databases) used in different glycomic databases.


2.5 Tools to convert encoding formats for glycan structure

Several efforts have been made to implement tools capable of converting one structural encoding format into another. So far, six major web applications that facilitate the interconversion between text-based encoding formats have been recognized (Figure 2.5): RINGS, GlycanBuilder, GLYCOSCIENCES.de, WURCS working group (WURCS-WG) (www.wurcs-wg.org), and in-built features in GlyTouCan and UniCarbKB. Yet, no single tool is available that interconverts all formats.

Figure 2.5: Tools for the interconversion of text-based glycan structural encoding formats. The colored circles represent tools that take the input formats listed in the first column and convert them into the formats shown in the first row. Blank grey boxes indicate that there are no tools available to convert one format into the other. *These are the databases which have in-built features to facilitate format translations.

WURCS-WG provides translations from GlycoCT (condensed) to WURCS, mol to WURCS, WURCS to IUPAC and others; however, some of the most important translations, such as WURCS to KCF and LINUCS, are still not implemented. GlyTouCan also offers an internal platform to switch between the GlycoCT (condensed) and WURCS formats. Similarly, RINGS provides two-way translations between KCF and LINUCS, which are helpful to link the KEGG GLYCAN database with other glycomic databases. Some two-way translations between the pdb and LINUCS formats are provided by GLYCOSCIENCES.de, which is important to interlink the PDB and Glyco3D databases with other glycomic databases. In-built translation tools in different databases are described in Table 1.4 (Chapter 1).

Despite all these available tools, the ability to interconvert all these encoding formats is limited. RINGS and GlycanBuilder allow only 10 and 12 interconversions, respectively. Significant efforts are still needed to develop tools for the remaining 58 “possible” translations, especially focusing on the conversion of pdb and mol to other formats, to interlink current glycomic databases with PDB and PubChem databases. The application of sequential translational tools, like RINGS to convert KCF to GlycoCT, followed by WURCS-WG to convert GlycoCT to WURCS could be a solution.
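The idea of chaining translation tools can be sketched as a shortest-path search over a graph whose nodes are encoding formats and whose edges are available conversions. The edge list below is only an illustrative subset taken from the conversions named in this section, not the full content of Figure 2.5, and the function `find_chain` is a hypothetical helper rather than part of any of the tools discussed. "GlycoCT" stands for GlycoCT (condensed).

```python
from collections import deque

# (source format, target format, tool): illustrative subset from the text above.
CONVERSIONS = [
    ("GlycoCT", "WURCS", "WURCS-WG"),
    ("mol", "WURCS", "WURCS-WG"),
    ("WURCS", "IUPAC", "WURCS-WG"),
    ("KCF", "LINUCS", "RINGS"),
    ("LINUCS", "KCF", "RINGS"),
    ("KCF", "GlycoCT", "RINGS"),
    ("pdb", "LINUCS", "GLYCOSCIENCES.de"),
    ("LINUCS", "pdb", "GLYCOSCIENCES.de"),
]


def find_chain(source, target, conversions=CONVERSIONS):
    """Breadth-first search for the shortest chain of conversions.

    Returns a list of (from_format, to_format, tool) steps, an empty list if
    source and target are the same, or None if no chain exists.
    """
    graph = {}
    for src, dst, tool in conversions:
        graph.setdefault(src, []).append((dst, tool))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        fmt, steps = queue.popleft()
        if fmt == target:
            return steps
        for nxt, tool in graph.get(fmt, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, steps + [(fmt, nxt, tool)]))
    return None


print(find_chain("KCF", "WURCS"))  # two steps: RINGS, then WURCS-WG
print(find_chain("WURCS", "KCF"))  # None: no route in this subset
```

With this subset, the search recovers the KCF to GlycoCT to WURCS route suggested above and finds no route from WURCS back to KCF. A real implementation would also have to account for information lost at each step, which a plain path search ignores.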

Converting one format into another primarily involves mapping between the different naming schemes used to represent glycan structures. During this process, structural information is very likely to be lost, especially when translating from complex formats (with more extensive information) to simpler ones (with less information). For example, KCF to LinearCode conversion by RINGS involves the loss of information, as LinearCode naming schemes cannot accommodate all the features of the KCF format. An example is shown in Appendix 2a. This shows that merely developing translation tools is not a foolproof solution for interlinking glycomic databases. Further, standardization of the inter-translation of different glycan formats should be considered. In the future, glycomic resources should store glycan structural data using encoding formats that hold more detailed information and, if necessary, convert only to formats with the same level of information.
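The information loss described above can be made explicit by treating each format as the set of structural features it can encode. The sketch below is only an illustration of that reasoning: the format names and feature assignments in `CAPABILITIES` are placeholders and make no claim about what any real encoding format supports.

```python
# Hypothetical capability table: which structural features each format can encode.
# Names and assignments are placeholders, not statements about real formats.
CAPABILITIES = {
    "rich_format": {"composition", "linkage", "anomericity", "repeating_units"},
    "simple_format": {"composition", "linkage"},
}


def lost_features(used_features, target_format, capabilities=CAPABILITIES):
    """Return the features of a structure that the target format cannot encode."""
    return set(used_features) - capabilities[target_format]


def is_lossless(source_format, target_format, capabilities=CAPABILITIES):
    """True when the target format can encode everything the source format can."""
    return capabilities[source_format] <= capabilities[target_format]


# A structure that uses repeating units, stored in the richer format.
structure = {"composition", "linkage", "repeating_units"}

print(sorted(lost_features(structure, "simple_format")))  # features that would be dropped
print(is_lossless("rich_format", "simple_format"))        # rich to simple loses information
print(is_lossless("simple_format", "rich_format"))        # simple to rich does not
```

A check of this kind is one way to apply the recommendation above: a conversion is only accepted when the target format encodes at least the features that the stored structure actually uses.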

Also, visualization of glycan structures should be standardized independently on the text-based encoding formats used in different databases. Tools like GlycanBuilder can support the

Metabolic Systems Biology of Leishmania major University of Minho, 2019 92 Chapter 2

visualization of glycan structures using many different formats, such as LINUCS, IUPAC or GlycoCT (XML) formats, and provide a clear representation of glycans. Additionally, this interface allows querying GlycoWorkbench using either graphic drawing or GlycoCT (XML) inputs. The GlycanBuilder tool has also been integrated into UniCarbKB database to facilitate graphics-based searches. Similarly, GlycoViewer (Joshi et al. 2010) was developed to allow the visualization of glycan structures and provide search functionalities in GlycoSuiteDB database (Cooper et al. 2003) (now part of UniCarbKB) using either IUPAC or graphic-based inputs. GlycomeAtlas (Konishi and Aoki-Kinoshita 2012) is another tool which is preloaded with data from CFG and provides visualization of glycan structures, but only from human and mouse samples. The platform also facilitates uploading and visualizing structures from its own glycan dataset. Several other tools such as DrawRINGS (Akune et al. 2010), GlycoVault (Nimmagadda et al. 2008) and KEGG Carbohydrate Matcher (KCaM) tool (Aoki et al. 2004) are helpful to perform structure-based searches based on graphical as well as different structural encoding inputs as shown in Figure 2.6. Many of these tools and databases provide multiple options for making search queries; yet, are usually limited to only one database, except KCaM tool, which allows carbohydrate sequence similarity search in KEGG and CarbBank databases, as well as GlycanBuilder which allows querying GlycoWorkbench and UniCarbKB databases. Table 1.4 (Chapter 1) lists drawing and search tools that allow querying glycomic databases. A recent review explored more on using glycan structural information to query specific databases such as GlycoBase, GlycoWorkbench, UniCarb-DB (Hayes et al. 2011), GlycoDigest (Gotz et al. 2014), and CASPER (Lisacek et al. 2017). CFG also facilitates queries using LinearCode and IUPAC formats; yet still poorly linked to other glycomic databases. 
Figure 2.6 provides an overview of the querying tools that use different graphic and text-based encoding formats to facilitate searches in key glycomic databases. As an example, Figure 2.7 presents some of the search and query features of the GlyTouCan database, from text (GlycoCT-condensed or WURCS) and graphic (manual drawing in CFG graphic formats) inputs to the upload of a file containing at least two glycan structures in the previously mentioned formats.

Figure 2.6: Search tools used to access glycomic databases that can be queried using different input formats, graphic (e.g. CFG and GlycoCT) or text (e.g. LinearCode, GlycoCT (condensed), IUPAC, and KCF). For example, the GlycoCT format can be used on the GlycanBuilder platform to access glycan structure-related information from the GlycoWorkbench and UniCarbKB databases.


Figure 2.7: The search platform provided by the GlyTouCan database. The search query is permitted using graphic, text-based and file-based inputs. The output of the query includes glycan information in WURCS, GlycoCT (condensed), IUPAC and CFG formats.

Accessing multiple databases and performing search queries using more than one encoding format is of paramount importance in glycomics research. Using format translation tools to generate the appropriate input formats is feasible, but such tools should provide high-quality outputs and ideally be integrated within data resources that are highly cross-linked to other databases. In this context, GlycanBuilder can be considered a good initiative, accepting both GlycoCT (condensed) and WURCS as input formats to search the GlyTouCan database and returning results in GlycoCT (condensed), IUPAC and WURCS formats. Considering the importance of the WURCS format, tools focused on WURCS-associated translations are of extreme importance for future glycomics research. Additionally, GlyTouCan helps to integrate data with other databases, as it provides a comparatively high number of cross-references to other data resources, including glycomic and glycoproteomic databases.

2.6 Discussion and Conclusions

Despite the availability of high-throughput technologies and various glycomic databases, glycobiology research has not truly benefited from these advances, mainly because of the poor inter-connectivity of glycomic and glycoproteomic databases, the lack of unified glycan representation notations and the lack of user-friendly frameworks or tools assisting glycobiologists. A significant effort has been made to develop various graphic and text-based formats to represent glycan structures; still, none of them is universally used. Among the most recognized glycomics databases, CFG uses LinearCode and IUPAC, GLYCOSCIENCES.de exclusively uses the LINUCS format, and GlyTouCan stores glycans in the GlycoCT (condensed), IUPAC and WURCS formats. This indicates clearly divided views within the glycoinformatics community. While some databases, such as KEGG and GlyTouCan, provide extensive information about the structures and functions of many glycans, as well as the enzymes involved in the biosynthesis of glycans and glycoconjugates, others provide valuable information on structural features of glycans retrieved from experiments, such as MS/MS and NMR data, but store it in completely different encoding formats and, on top of that, lack cross-references to other glycomic databases. There is also limited linkage of these databases with other popular bioinformatics resources, such as PubChem, PDB and the ExPASy server (especially PROSITE, UniProt, and ENZYME). GlycoSuiteDB was a promising initiative in this context, which focused on inter-linking glycan information with the SWISS-PROT/TrEMBL (Bairoch and Apweiler 2000) and PubMed repositories. One of the biggest challenges in the integration of glycomics data is the development of a standardized format for representing the complex structures of glycans. The WURCS format is perhaps the most suitable format for storing structural information of glycans and could be adopted by various glycomic databases.
However, a short-term solution should rely on the integration of data through the interconversion of formats used by different databases. Most recently, various user-friendly tools provided by RINGS and WURCS-WG, and those integrated within databases like GlyTouCan and UniCarbKB, have been made available to facilitate inter-translation of the various encoding notations; however, these tools do not allow all possible conversions. For example, GlyTouCan provides conversion from GlycoCT (condensed) to IUPAC, but the opposite is not possible. Others, like GlycanBuilder and the RINGS tools, allow two-way conversions, but when translating complex formats like GLYDE-II or GlycoCT to simpler formats such as IUPAC or LinearCode, some of the structural information may be lost during parsing and translation. Future work should address this limitation and minimize the loss of information during translation. In spite of developments in the capabilities of translation tools, some translations, such as pdb/mol to GlycoCT/KCF/LINUCS conversion and vice versa, have not been implemented so far. These conversions are much needed to link 3D structural information on glycans to glycomic profiling data, and can be achieved either by developing new translation tools or by serial application of existing tools. Further developments in the same context provide a graphical representation of glycan structures and link it to the text-based encoding notations. GlycoViewer, GlycomeAtlas, GlycanBuilder, and KCaM are examples of popular tools that can be used to visualize glycan structures, link graphic formats to text formats, and provide graphic or text-based input for searching glycan-related data in different databases.
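The idea of reaching an unsupported conversion through serial application of existing tools can be illustrated as a path search over a directed graph of format conversions. The sketch below is only an illustration under stated assumptions: the edge list is not a complete inventory of available converters, the GlycoCT to IUPAC and KCF/LinearCode edges are the ones mentioned in this chapter, and the LinearCode to GlycoCT edge with the tool name "hypothetical-converter" is invented purely to make a multi-step chain possible.

```python
from collections import deque

# (source format, target format, tool)
# The first three edges follow conversions mentioned in this chapter.
# The last edge is a HYPOTHETICAL placeholder used only for illustration.
CONVERSIONS = [
    ("GlycoCT", "IUPAC", "GlyTouCan"),   # one-way: the reverse is not offered
    ("KCF", "LinearCode", "RINGS"),
    ("LinearCode", "KCF", "RINGS"),
    ("LinearCode", "GlycoCT", "hypothetical-converter"),
]


def conversion_chain(source, target, conversions=CONVERSIONS):
    """Breadth-first search for the shortest chain of conversions.

    Returns a list of (from_format, to_format, tool) steps, an empty list
    when source and target are the same, or None when no chain exists.
    """
    graph = {}
    for src, dst, tool in conversions:
        graph.setdefault(src, []).append((dst, tool))

    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for nxt, tool in graph.get(fmt, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(fmt, nxt, tool)]))
    return None


if __name__ == "__main__":
    print(conversion_chain("KCF", "IUPAC"))      # three-step chain
    print(conversion_chain("IUPAC", "GlycoCT"))  # no chain in this edge list
```

Each additional step in such a chain is another opportunity for the loss of structural information discussed above, so the shortest chain is a reasonable default choice.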

Collaborative efforts to facilitate various analyses of glycans (e.g. structure visualization, notation inter-translation, and 3D structural analysis) on the same platform, as provided by GLYCOSCIENCES.de, RINGS, KEGG and WURCS-WG (Table 1.4, Chapter 1), are useful for extracting important biological information from experimental data. Several other tools, such as GlycomeAtlas, DrawRINGS and KCaM, were developed for cross-linking glycan notations, graphical representations, and structural information among various databases (Figure 2.6). As a drawback, each tool, except KCaM and GlycanBuilder, gives access to a single database only, which restricts access to the other databases. The performance of these tools can be improved either by increasing the search space or by utilizing translation tools that facilitate data searches using any required input format. Recent advances in glycoinformatics methods have addressed some of the difficulties related to accessing and cross-linking the databases in glycomics research; however, efforts are still needed to organize and store glycomics information in a structured way to make data access and integration easier.


Key notes-

• The use of non-standardized notation formats to represent glycan structures is the main cause of poor interoperability and interconnectivity between the various glycomics and glycoproteomics databases.
• There is a need to develop universal computer-readable formats capable of accommodating as many features of glycan structures as possible, to promote glycomics research.
• Many glycomics and glycoproteomics databases contain useful information for glycobiology research, but their accessibility is limited by the lack of suitable web-API services.
• Tools developed for inter-translating glycan notations from one format to another are quite useful for accessing and interlinking various glycomics databases; however, more practical implementations of these tools are needed.


2.7 Appendix 2a

Figure I: The inter-translation between KCF format and LinearCode using RINGS tools. The colored boxes show inconsistencies in the position of sugar molecules (e.g. GlcNAc, Man, and Gal) after translation. Legend: GlcNAc, N-acetylglucosamine; Man, mannose; Gal, galactose.

2.8 References

Akune, Y. et al., 2010. The RINGS resource for glycome informatics analysis and data mining on the Web. Omics: a journal of integrative biology, 14(4), pp.475–486.
Aoki-Kinoshita, K. et al., 2015. GlyTouCan 1.0 - The international glycan structure repository. Nucleic Acids Research, pp.D1237–D1242.
Aoki-Kinoshita, K.F., 2017. A Practical Guide to Using Glycomics Databases, Tokyo: Springer. Available at: http://www.springer.com/gp/book/9784431564522.
Aoki-Kinoshita, K.F., 2013. Using databases and web resources for glycomics research. Molecular & cellular proteomics: MCP, 12(4), pp.1036–45.
Aoki, K.F. et al., 2004. KCaM (KEGG Carbohydrate Matcher): A software tool for analyzing the structures of carbohydrate sugar chains. Nucleic Acids Research, 32(WEB SERVER ISS.), pp.W267–W272.
Bairoch, A. and Apweiler, R., 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research, 28(1), pp.45–48.
Banin, E. et al., 2002. A novel Linear Code(R) nomenclature for complex carbohydrates. Trends in Glycoscience and Glycotechnology, 14(77), pp.127–137.
Barh, D., Zambare, V. and Azevedo, V., 2017. OMICS: Applications in Biomedical, Agricultural, and Environmental Sciences, CRC Press. https://www.taylorfrancis.com/books/e/9781466562837.
Baycin-Hizal, D. et al., 2014. Glycoproteomic and glycomic databases. Clinical proteomics, 11(1), p.15.
Berman, H.M., 2000. The Protein Data Bank. Nucleic Acids Research, 28(1), pp.235–242.
Bolton, E.E. et al., 2008. PubChem: Integrated Platform of Small Molecules and Biological Activities. Annual Reports in Computational Chemistry, 4, pp.217–241.
Campbell, M.P. et al., 2008. GlycoBase and autoGU: Tools for HPLC-based glycan analysis. Bioinformatics, 24(9), pp.1214–1216.
Campbell, M.P. et al., 2014. UniCarbKB: Building a knowledge platform for glycoproteomics. Nucleic Acids Research, 42(D1), pp.D215–221.
Capriles, P.V.S.Z. et al., 2010. Structural modelling and comparative analysis of homologous, analogous and specific proteins from Trypanosoma cruzi versus Homo sapiens: putative drug targets for Chagas’ disease treatment. BMC genomics, 11, p.610.
Ceroni, A. et al., 2008. GlycoWorkbench: A tool for the computer-assisted annotation of mass spectra of glycans. Journal of Proteome Research, 7(4), pp.1650–1659.
Ceroni, A., Dell, A. and Haslam, S.M., 2007. The GlycanBuilder: a fast, intuitive and flexible software tool for building and displaying glycan structures. Source code for biology and medicine, 2, p.3.
Consortium for Functional Glycomics. http://www.functionalglycomics.org/.
Cooper, C.A. et al., 2003. GlycoSuiteDB: A curated relational database of glycoprotein glycan structures and their biological sources. 2003 update. Nucleic Acids Research, 31(1), pp.511–513.
Cummings, R.D. and Pierce, J.M., 2014. The challenge and promise of glycomics. Chemistry and Biology, 21(1), pp.1–15.
Damodaran, D. et al., 2008. CancerLectinDB: A database of lectins relevant to cancer. Glycoconjugate Journal, 25(3), pp.191–198.
Doubet, S. et al., 1989. The Complex Carbohydrate Structure Database. Trends in biochemical sciences, 14(12), pp.475–477.
Eavenson, M. et al., 2015. Qrator: A web-based curation tool for glycan structures. Glycobiology, 25(1), pp.66–73.
Egorova, K.S. and Toukach, P. V, 2017. CSDB_GT: a new curated database on glycosyltransferases. Glycobiology, 27(4), pp.285–290.
Forli, S. et al., 2016. Computational protein-ligand docking and virtual drug screening with the AutoDock suite. Nature protocols, 11(5), pp.905–919.
Gerloff, D.L. et al., 2005. Structural models for the protein family characterized by gamete surface protein Pfs230 of Plasmodium falciparum. Proceedings of the National Academy of Sciences of the United States of America, 102(38), pp.13598–13603.
International Union of Pure and Applied Chemistry, 2013. Compendium of chemical terminology - The Gold Book. Available at: http://goldbook.iupac.org.
Gotz, L. et al., 2014. GlycoDigest: A tool for the targeted use of exoglycosidase digestions in glycan structure determination. Bioinformatics, 30(21), pp.3131–3133.
Harvey, D.J. et al., 2011. Symbol nomenclature for representing glycan structures: Extension to cover different carbohydrate types. Proteomics, 11(22), pp.4291–4295.
Hashimoto, K. et al., 2006. KEGG as a glycome informatics resource. Glycobiology, 16(5), pp.63R–70R.
Hattori, M. et al., 2003. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc, 125(39), pp.11853–11865.
Hayes, C.A. et al., 2011. UniCarb-DB: A database resource for glycomic discovery. Bioinformatics, 27(9), pp.1343–1344.
Herget, S. et al., 2008. GlycoCT - a unifying sequence format for carbohydrates. Carbohydrate Research, 343(12), pp.2162–2171.
Joshi, H.J. et al., 2010. GlycoViewer: A tool for visual summary and comparative analysis of the glycome. Nucleic Acids Research, 38, pp.W667–670.
Kameyama, A. et al., 2005. A strategy for identification of oligosaccharide structures using observational multistage mass spectral library. Analytical Chemistry, 77(15), pp.4719–4725.
Kanehisa, M. and Goto, S., 2000. Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 28, pp.27–30.
Kikuchi, N. et al., 2005. The carbohydrate sequence markup language (CabosML): An XML description of carbohydrate structures. Bioinformatics, 21(8), pp.1717–1718.
Konishi, Y. and Aoki-Kinoshita, K.F., 2012. The GlycomeAtlas tool for visualizing and querying glycome data. Bioinformatics, 28(21), pp.2849–2850.
Kumar, D. and Mittal, Y., 2011. AnimalLectinDb: An integrated animal lectin database. Bioinformation, 6(3), pp.134–136.
von der Lieth, C.W., 2007. Databases and informatics for glycobiology and glycomics. Comprehensive glycoscience - From chemistry to systems biology - Volume 2: Analysis of glycans, pp.329–346.
von der Lieth, C.W., Lütteke, T. and Frank, M., 2009. Bioinformatics for glycobiology and glycomics: An introduction, Chichester: John Wiley & Sons. p.494.
Lisacek, F. et al., 2017. Databases and Associated Tools for Glycomics and Glycoproteomics. Methods Mol Biol., 1503, pp.235–264.
Lloyd, K., 2003. The Mutant Mouse Regional Resource Center Program. Breast Cancer Res., 5, p.7.
Lombard, V. et al., 2014. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Research, 42(D1), pp.D490–495.
Loss, A. et al., 2002. SWEET-DB: an attempt to create annotated data collections for carbohydrates. Nucleic Acids Research, 30(1), pp.405–408.
Lütteke, T. et al., 2006. GLYCOSCIENCES.de: An internet portal to support glycomics and glycobiology research. Glycobiology, 16(5), pp.71R–81R.
Maeda, M. et al., 2015. JCGGDB: Japan consortium for glycobiology and glycotechnology database. Methods in Molecular Biology, 1273, pp.161–179.
Mariethoz, J. et al., 2015. SugarBindDB, a resource of glycan-mediated host-pathogen interactions. Nucleic Acids Research, 44(D1), pp.D1243–1250.
Narimatsu, H., 2004. Construction of a human glycogene library and comprehensive functional analysis. In Glycoconjugate Journal. pp.17–24.
Nimmagadda, S. et al., 2008. GlycoVault: A Bioinformatics infrastructure for glycan pathway visualization, analysis and modeling. In Proceedings - International Conference on Information Technology: New Generations, ITNG 2008. pp.692–697.
Pérez, S. et al., 2015. Glyco3d: A portal for structural glycosciences. Methods in Molecular Biology, 1273, pp.241–258.
Perryman, A.L. et al., 2015. A virtual screen discovers novel, fragment-sized inhibitors of Mycobacterium tuberculosis InhA. Journal of Chemical Information and Modeling, 55(3), pp.645–659.
Ranzinger, R. et al., 2008. GlycomeDB - integration of open-access carbohydrate structure databases. BMC bioinformatics, 9, p.384.
Ranzinger, R. et al., 2017. GLYDE-II: The GLYcan data exchange format. Perspectives in Science, 11, pp.24–30.
Sahoo, S.S. et al., 2005. GLYDE - An expressive XML standard for the representation of glycan structure. Carbohydrate Research, 340(18), pp.2802–2807.
Sussman, J.L. et al., 1998. Protein Data Bank (PDB): Database of three-dimensional structural information of biological macromolecules. In Acta Crystallographica Section D: Biological Crystallography. pp.1078–1084.
Tanaka, K. et al., 2014. WURCS: The Web3 unique representation of carbohydrate structures. Journal of Chemical Information and Modeling, 54(6), pp.1558–1566.
Tools, D., Mirrors, S. and Contact, A., 2010. ExPASy Proteomics Server. https://expasy.org/
Toukach, P. V., 2011. Bacterial carbohydrate structure database 3: Principles and realization. Journal of Chemical Information and Modeling, 51(1), pp.159–170.
Varki, A. et al., 1999. Essentials of Glycobiology, Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press; ISBN-10: 0-87969-559-5.
Varki, A. et al., 2009. Symbol nomenclature for glycan representation. Proteomics, 9(24), pp.5398–5399.
Varki, A. et al., 2015. Symbol Nomenclature for Graphical Representations of Glycans. Glycobiology, 25(12), pp.1323–1324.


CHAPTER 3

Integrated Metabolic Flux And Omics Analysis Of Leishmania major Metabolism

“We have got the human genome sequenced, we have got the parasite genome sequenced, we have got the mosquito genome sequenced. Somehow you should be able to work out some way of controlling everything.”

Dickson Despommier, TWiP Episode 10 (2010)


Abstract

Leishmaniasis is a virulent parasitic infection that poses a significant threat to human health worldwide. The existing drugs are becoming less effective due to the ability of Leishmania spp. to alter their metabolism to adapt to harsh environments. Understanding how this parasite manipulates its metabolism inside its hosts (e.g. sandfly and human) might underpin new ways to prevent the disease and develop effective treatment strategies.

Despite significant advances in omics technologies, the biochemistry of protozoan parasites still lacks an understanding of the molecular components that determine metabolic behavior under varying conditions. Metabolic network modeling can help to identify physiologically relevant nodes in a metabolic network and thereby to understand the metabolism of parasites.

The present work proposes the metabolic model ext-iAC560 (an extension of the existing metabolic model iAC560), with additional reactions describing the metabolism of lipids, including long chain fatty acids, and carbohydrates, to study the metabolic behavior of L. major. The Gene Inactivity Moderated by Metabolism and Expression (GIMME) algorithm coupled with Parsimonious Flux Balance Analysis (pFBA)-based simulations was used to analyze the consistency between flux predictions and gene expression data. Improved flux distributions were obtained, allowing a more accurate understanding of stage-specific metabolism, in particular under promastigote conditions. The biosynthesis of sugar nucleotides, which are the main precursors in the synthesis of glycans and glycoconjugates in the promastigote and amastigote forms of L. major, was further inspected. Flux distributions helped to understand the lethal and non-lethal phenotypes observed after knocking out the carbohydrate-active enzymes N-acetylglucosamine 6-phosphate deaminase (GND) and glutamine-fructose-6-phosphate amidotransferase (GFAT), respectively, in the amastigote condition.

3.1 Introduction

Protozoan parasites of the genus Leishmania belong to the family Trypanosomatidae and cause a large spectrum of human diseases affecting around 12 million people worldwide (Nii-Trebi 2017). Existing therapies involve the administration of drugs like sodium stibogluconate, meglumine antimoniate, amphotericin B and miltefosine, which are limited by various factors, including host toxicity and lack of efficacy (Guerin et al. 2002; Goto and Lindoso 2010; Ponte-Sucre et al. 2017). Considering the endemic severity of the diseases, there is an urgent need to develop novel and effective antileishmanial therapies. In recent years, putative drug targets associated with metabolic pathways have been identified in several parasites, such as Plasmodium falciparum (Qidwai et al. 2014), Trypanosoma brucei (Gu et al. 2013) and Toxoplasma gondii (Wei et al. 2013). In order to identify and exploit these metabolic drug targets, it is important to understand the metabolic behavior of the organism under different environmental conditions, especially inside the host. Previous studies have observed significant alterations in the metabolism of Leishmania at different stages of its life cycle, where it faces different nutritional environments (Opperdoes and Coombs 2007). For example, the promastigote form (inside the sand fly) preferentially uses glucose as a carbon source, while amastigotes (inside the macrophage) use glucosamine (GlcN) and its derivative N-acetylglucosamine (GlcNAc) along with some lipids and amino acids, involving different catabolic activities (Naderer et al. 2010; Naderer et al. 2006). Similarly, the biosynthesis of glycans and glycoconjugates, such as lipophosphoglycans (LPG), glycosylinositol phospholipids (GIPL) and glycoproteins (GP), changes between the two stages.
For example, high levels of the glycoconjugate LPG are observed on the cell surface of promastigotes, which should help Leishmania to survive in the sand fly midgut (Srivastava et al. 2013; Eggimann et al. 2015). On the other hand, high levels of GIPL might help amastigotes to interact with macrophages and modulate nitric oxide (NO) synthesis to promote infection and evade the host immune response (Assis et al. 2012; Passero et al. 2015). In spite of the vital roles of carbohydrates in parasite survival, their metabolic involvement is not fully understood. However, a previous study showed that the availability of various sugars, such as hexoses (e.g. glucose (Glc), mannose (Man), and galactose (Gal)) and amino sugars (e.g. glucosamine (GlcN) and N-acetylglucosamine (GlcNAc)), is a determining factor for alterations in parasite metabolism as well as for the synthesis of essential metabolites, including glycans and glycoconjugates (Turnock and Ferguson 2007). Unfortunately, previous studies poorly explained the metabolic basis of carbohydrate biosynthesis, especially of carbohydrates associated with the cell surface, in different environments. In fact, it is still unknown whether the observed metabolic changes result from, or give rise to, the different parasitic stages. For example, in the promastigote stage, glucose is the main carbon source for energy production via glycolysis, while only a few enzymes of the tricarboxylic acid (TCA) cycle are active; in the amastigote stage, i.e. under glucose-limiting conditions, glycolytic enzymes are less functional (McConville and Naderer 2011; Opperdoes and Coombs 2007), whereas gluconeogenesis pathways are active (Rodriguez-Contreras and Hamilton 2014). Metabolic network modeling is an effective approach for systematically studying the metabolic behavior of an organism and for understanding the relationship between its genotype and phenotype. Previously, these methods have been used to understand the cellular metabolism of many medically important organisms, such as Mycobacterium tuberculosis (Beste et al. 2007; Ma et al. 2015), Acinetobacter baumannii (Kim et al. 2010), Francisella tularensis (Raghunathan et al. 2010), and human parasites like Leishmania major (Chavali et al. 2008) and Plasmodium falciparum (Plata et al. 2010; Carey et al. 2017), though with low prediction accuracy. Integrating omics data with metabolic network analysis can improve a model's predictions and hence our understanding of various aspects, such as metabolic alterations associated with environmental conditions, essential genes and the metabolic flux variability of essential reactions (Saha et al. 2014). Relevant data can be integrated into the metabolic model to add an extra layer of metabolic flux constraints and improve prediction accuracy. Various methods like GIMME (Becker and Palsson 2008), iMAT (Shlomi et al. 2008), E-Flux (Jensen and Papin 2011) and PROM (Chandrasekaran and Price 2010) have been made available for the integration of gene expression data into metabolic models.
Similarly, other methods such as Integrative Omics Metabolic Analysis (IOMA) (Yizhak et al. 2010) and Mass Action Stoichiometric Simulation (MASS) (Jamshidi and Palsson 2010) allow the integration of metabolomics and proteomics data into metabolic models to constrain the reaction fluxes in the metabolic pathways. Successful examples in this context include the integration of RNAseq data into the Leishmania infantum model (Sharma et al. 2017), proteomics data into the metabolic model of Enterococcus faecalis (Großeholz et al. 2016), and multi-omics data into metabolic models of Escherichia coli (Joyce and Palsson 2006; Kim et al. 2016) to understand the metabolism and associated predicted phenotypes. These strategies have also improved the prediction of drug targets in many medically important organisms such as Aspergillus fumigatus (Kaltdorf et al. 2016), Plasmodium falciparum (Ludin et al. 2012) and L. major (Chavali et al. 2012). In spite of the availability of abundant omics data and various methodologies, only a few studies have employed these strategies to understand the metabolism of Leishmania (Chavali et al. 2008; Subramanian and Sarkar 2017; Sharma et al. 2017). Here, we modified the genome-scale model iAC560 (Chavali et al. 2008) of L. major to accommodate the metabolism and biosynthesis of sugar nucleotides (SugNuc) such as GDP-mannose (GDP-Man), GDP-arabinose (GDP-Ara), UDP-glucose (UDP-Glc), UDP-galactose (UDP-Gal), UDP-galactofuranose (UDP-Galf), and UDP-N-acetylglucosamine (UDP-GlcNAc). We also included pathways for lipid-associated metabolism. Further, gene expression data were integrated into the metabolic model using the Gene Inactivity Moderated by Metabolism and Expression (GIMME) method (Becker and Palsson 2008) to improve the predicted flux distributions and thus our understanding of Leishmania metabolism.


3.2 Methodology

3.2.1 Model extension and refinement

The existing metabolic model iAC560 (Chavali et al. 2008) was extended to include pathways for the biosynthesis of SugNuc and fatty acids. Pathways for the consumption and degradation of complex lipids were also added. Metabolic reactions and enzyme-coding genes were collected from databases like KEGG (Kanehisa and Goto 2000) and LeishCyc (Doyle et al. 2009). However, as some reaction steps were not associated with any specific gene, homology search tools like BLAST (Madden 2013) were applied to find the highest scoring protein sequences (i.e. with the criteria % identity ≥ 40%, alignment length ≥ 70% and E-value ≤ 1.0e-30), as suggested in (Sharma et al. 2017), and associate those with the putative metabolic activities. To perform BLAST, genes studied in other parasites such as Toxoplasma gondii and Plasmodium spp., as well as those included in the validated metabolic network of the phylogenetically close species L. donovani (iMS606) (Sharma et al. 2017), were used as templates. Additionally, based on experimental evidence, some metabolic reactions were altered in terms of reversibility and/or compartments, while new transport reactions for sugar nucleotides, lipids, and fatty acids were also included. A few reactions were deleted based on existing experimental evidence. For example, the reaction (reaction ID “R_ACCOACrm” in the model) representing the conversion of acetyl-CoA to malonyl-CoA (which is further used for fatty acid synthesis) in mitochondria was deleted, as it wrongly represented the compartments of the associated metabolites according to the BioCyc database (Caspi et al. 2014). Another reaction (reaction ID “R_HACD1m” in the metabolic model), for the conversion of acetoacetyl-CoA to hydroxybutyryl-CoA, was deleted, as it was not included in the TrypanoCyc database (Shameer et al. 2015). Refer to Supplementary Material 1 for added, deleted or altered reactions.

3.2.2 Biomass composition

The macromolecular composition of L. major cells was also corrected by incorporating experimental data to define a more accurate stoichiometry of the components in the objective function(s) representing growth of the parasite. Protein, DNA, and RNA contents were estimated from L. donovani studies (Sharma et al. 2017), while carbohydrate, lipid, and polyamine contents were calculated using experimental data from the protozoan Tetrahymena (Gates et al. 1982; Hellung-Larsen and Andersen 1989) and L. mexicana (Ralton et al. 2003). Individual carbohydrates, such as mannan, lipophosphoglycan (LPG), glycoinositol phospholipid (GIPL), and N-glycans, were estimated as follows: mannan was assumed to represent 80% and 90% of all carbohydrates in the promastigote and amastigote stages, respectively (Ralton et al. 2003), while LPG, GIPL, and N-glycans together would represent the remaining 20% and 10%, respectively. The relative mass fractions (w/w) of LPG, GIPL, and N-glycans were estimated based on previous studies (Turco and Sacks 1991; McConville and Ferguson 1993; Descoteaux and Turco 1993; Kink and Chang 1988). The building blocks (i.e. sugar nucleotides), whose molar fractions were calculated using the molecular composition of each glycoconjugate from the BioCyc database, were added to the biomass equation. Further details on biomass calculations can be found in Appendix 3b.
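The homology criteria given in Section 3.2.1 (% identity ≥ 40%, alignment length ≥ 70%, E-value threshold of 1.0e-30) can be read as a simple filter over BLAST hits. The following is a minimal sketch, not the procedure actually used in this work. It assumes BLAST tabular output with the twelve default columns, interprets the alignment-length criterion as a percentage of the query length (an assumption, since the text does not say what the 70% refers to), and uses hypothetical sequence identifiers.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pct_identity: float
    align_length: int
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Parse BLAST tabular lines with the 12 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        float(f[10]), float(f[11])))
    return hits


def passes(hit, query_lengths, min_identity=40.0, min_coverage=70.0,
           max_evalue=1e-30):
    """Apply the three thresholds. Coverage is ASSUMED to mean
    alignment length as a percentage of the query length."""
    coverage = 100.0 * hit.align_length / query_lengths[hit.query]
    return (hit.pct_identity >= min_identity
            and coverage >= min_coverage
            and hit.evalue <= max_evalue)


def best_hits(hits, query_lengths):
    """Keep, per query, the passing hit with the highest bit score."""
    best = {}
    for hit in hits:
        if not passes(hit, query_lengths):
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best


# Hypothetical example data (identifiers and values are invented).
example_lines = [
    "q1\tsA\t55.0\t300\t120\t3\t1\t300\t5\t304\t1e-80\t400.0",
    "q1\tsB\t35.0\t380\t200\t4\t1\t380\t1\t380\t1e-90\t450.0",  # identity too low
    "q2\tsC\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-40\t200.0",   # coverage too low
]
example_lengths = {"q1": 400, "q2": 400}
example_best = best_hits(parse_tabular(example_lines), example_lengths)
```

In this toy input only the first hit satisfies all three thresholds, so `example_best` contains a single entry for `q1`.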

3.2.3 In-silico media formulation

3.2.3.1 Modified Media for Promastigote (MMP)

MMP was formulated based on previous studies using experimental conditions that promote the promastigote stage of L. major (Hart and Coombs 1982; Rainey and MacKenzie 1991). The medium comprises fifteen nutrient sources: twelve amino acids (L-arginine (arg-L), L-cysteine (cys-L), L-histidine (his-L), L-isoleucine (ile-L), L-leucine (leu-L), L-lysine (lys-L), L-methionine (met-L), L-phenylalanine (phe-L), L-threonine (thr-L), L-tyrosine (tyr-L), L-valine (val-L) and L-proline (pro-L)), together with hypoxanthine (hxan), phosphate, and D-glucose, under oxygen-rich conditions. While D-glucose and L-proline serve as the major carbon sources for Leishmania promastigotes, the remaining components, particularly the other amino acids, were included considering the experimental conditions described in (Merlen et al. 1999; Schuster and Sullivan 2002).

3.2.3.2 Modified Media for Amastigote (MMA)

MMA includes all fifteen nutrients from MMP, together with some additional amino sugars, amino acids, lipids and fatty acids, making a total of twenty nutrients in the presence of oxygen. The amino sugars GlcN and GlcNAc were included considering the findings in (Naderer et al. 2010), which showed that the degradation of glycosaminoglycans inside macrophages provides GlcN and GlcNAc as major carbon sources during the amastigote stage. Stearic acid and phosphatidylethanolamine (peLM) were also included, based on different studies showing that Leishmania utilizes lipids from host cells and transports them into the cytosol (Naderer and McConville 2008; Zhang et al. 2007; McConville and Blackwell 1991; Winter et al. 1994; Zhang et al. 2005). L-aspartate (asp-L) and D-alanine (ala-D) were also added to MMA, based on experimental measurements showing high consumption of these amino acids by amastigotes (Saunders et al. 2011).

3.2.3.3 Definition of reaction flux constraints

Model simulations were performed using the previously described media with specific reaction constraints for the uptake of external nutrients (see Appendix 3b). Since no data are available to set maximal uptake rates for most media components, their uptake was left effectively unconstrained, with lower and upper bounds of -1000 and 0, respectively. For media components used as major carbon sources, the lower and upper bounds were set to -100 and 0. Furthermore, flux constraints for L-proline, glucose, and oxygen uptake were set differently to simulate phenotypes in the amastigote and promastigote stages, based on previous phenotype observations (Hart and Coombs 1982; Murray et al. 2005). For example, the absolute value of the lower bound (i.e. the maximum flux) for L-proline uptake was reduced by 90% in amastigote compared to promastigote simulations, based on a previous study showing a significant decrease in the consumption of this particular amino acid in L. mexicana amastigotes (Hart and Coombs 1982). Likewise, the maximum flux of the glucose uptake reaction was set 90% lower than in the promastigote stage, considering previous findings (Murray et al. 2005; Garami and Ilg 2001) that glucose levels in the parasitophorous vacuole (PV) are poor. Furthermore, oxygen uptake during the amastigote stage was significantly reduced compared to the promastigote stage, considering that oxygen is limited in Leishmania-infected macrophages (Mahnke et al. 2014; Degrossoli et al. 2011). The lower and upper limits of the uptake fluxes for all other nutrients were constrained as for the promastigote: -100 to 0 for major carbon sources, and -1000 to 0 for other nutrients (see Appendix 3b).
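The uptake constraints described above can be sketched in a few lines of Python. This is only an illustration, not the Java/OptFlux code used in this work: the function name, the `scaled` argument, and the identifier `glc-D` are hypothetical, and the actual exchange reactions and bounds are those listed in Appendix 3b. The oxygen bound is left out because no numerical value is stated in this section.

```python
# Illustrative sketch only; identifiers are hypothetical (see Appendix 3b for the real ones).
DEFAULT_BOUNDS = (-1000.0, 0.0)        # nutrients without measured uptake rates
CARBON_SOURCE_BOUNDS = (-100.0, 0.0)   # major carbon sources


def uptake_bounds(nutrients, carbon_sources, scaled=None):
    """Return {nutrient: (lower, upper)}; a negative flux denotes uptake.

    `scaled` maps a nutrient to the fraction of its maximum uptake that is kept,
    e.g. 0.1 for a 90% reduction.
    """
    scaled = scaled or {}
    bounds = {}
    for nutrient in nutrients:
        if nutrient in carbon_sources:
            lower, upper = CARBON_SOURCE_BOUNDS
        else:
            lower, upper = DEFAULT_BOUNDS
        bounds[nutrient] = (lower * scaled.get(nutrient, 1.0), upper)
    return bounds


nutrients = ["glc-D", "pro-L", "arg-L"]   # small subset, for illustration
carbon_sources = {"glc-D", "pro-L"}

promastigote = uptake_bounds(nutrients, carbon_sources)
# Amastigote: glucose and L-proline uptake limited to 10% of the promastigote maximum.
amastigote = uptake_bounds(nutrients, carbon_sources,
                           scaled={"glc-D": 0.1, "pro-L": 0.1})
```

Keeping the stage-specific reductions as scaling factors, rather than hard-coded bounds, makes it explicit which constraints differ between the two simulated stages.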

3.2.4 Integration of gene expression data

Gene expression data (FPKM¹ values) for 10732 L. major genes in the promastigote stage (Rastrojo et al. 2013) were integrated with the extended metabolic model (termed ext-iAC560) by applying the GIMME approach. Because transcriptomics data were available only for L. major promastigotes, these analyses were performed for the promastigote stage. The GIMME algorithm was implemented in Java by Sara Correia in her Ph.D. work (Correia 2016). The other Java scripts, briefly described in Appendix 3d, were written by me to use OptFlux modules (Rocha et al. 2010) and to run flux-based simulations with the metabolic model ext-iAC560. All the Java code written by me, together with related material, can be accessed from the GitHub repository (https://github.com/shakyawar/SupplMaterial_PhD).

1 FPKM (Fragments Per Kilobase Million) is a method for estimating the relative abundance of transcripts in terms of fragments observed in an RNA-Seq experiment.

Briefly, the GIMME implementation considers genes with an expression level below the threshold as inactive, and thus assigns zero metabolic flux to all the associated reactions in the model. Since the model describes the relationship between genes, proteins, and reactions using logical rules, called Gene-Protein-Reaction (GPR) rules, the final gene expression value is derived from these rules for reactions that are associated with more than one gene. In the case of isoenzymes (where two genes are combined with the logical operator "OR"), the highest gene expression value among all the associated genes is considered, while in the case of enzyme complexes (where two genes are combined with the logical operator "AND"), the lowest gene expression value among all the associated genes is considered. While simulating the model, the algorithm may bring a few of these inactive reactions (which were assigned zero flux based on gene expression values) back into the simulation to achieve an optimal solution (i.e. maximum growth in this case). The reconsidered reactions are the so-called metabolically important reactions (MIRs), as they are required to achieve maximum biomass in that condition. The remaining reactions are blocked and termed metabolically unwanted reactions (MURs) in that particular metabolic state. Inconsistencies between the metabolic model and gene expression data are estimated based on the MIRs that are re-inserted in the model. GIMME solves a linear programming (LP) problem on the reconsidered reactions to minimize this inconsistency. As such, inconsistency scores (IS) are calculated and associated with each metabolic reaction. Accordingly, metabolic reactions can be categorized as follows:

(1) inactive, so-called MURs (expression levels below the threshold and metabolic flux² equal to zero);
(2) potentially inactive, so-called MIRs (expression levels below the threshold and non-zero metabolic flux²);
(3) potentially active (expression levels above the threshold and metabolic flux² equal to zero);
(4) active (expression levels above the threshold and non-zero metabolic flux²).
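The GPR handling and the four reaction categories can be illustrated with a short Python sketch. It is not the Java implementation used in this work (Correia 2016); the nested-tuple representation of GPR rules, the gene names, and the numerical tolerance are assumptions made for this example, as is the choice to treat an expression value exactly equal to the threshold as "above".

```python
def gpr_expression(rule, expression):
    """Expression value of a reaction from its GPR rule.

    `rule` is either a gene identifier (str) or a tuple (operator, [sub-rules])
    with operator "and" (enzyme complex) or "or" (isoenzymes).
    """
    if isinstance(rule, str):
        return expression[rule]
    operator, parts = rule
    values = [gpr_expression(part, expression) for part in parts]
    # Complex: limited by the least expressed subunit; isoenzymes: by the most expressed one.
    return min(values) if operator == "and" else max(values)


def classify(reaction_expression, flux, threshold, tol=1e-9):
    """Assign a reaction to one of the four categories listed above."""
    below = reaction_expression < threshold
    carries_flux = abs(flux) > tol
    if below and not carries_flux:
        return "inactive (MUR)"
    if below and carries_flux:
        return "potentially inactive (MIR)"
    if not carries_flux:
        return "potentially active"
    return "active"


# Hypothetical genes and FPKM values.
expression = {"geneA": 5.0, "geneB": 120.0}
isoenzymes = ("or", ["geneA", "geneB"])       # evaluates to 120.0
enzyme_complex = ("and", ["geneA", "geneB"])  # evaluates to 5.0

print(classify(gpr_expression(isoenzymes, expression), flux=0.0, threshold=70))
print(classify(gpr_expression(enzyme_complex, expression), flux=2.5, threshold=70))
```

With a threshold of 70, the isoenzyme reaction is classified as potentially active and the enzyme-complex reaction as a potentially inactive MIR.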

Flux spans³ based on Flux Variability Analysis (FVA) and flux distributions from Parsimonious Flux Balance Analysis (pFBA⁴) were compared over different thresholds to analyze the predicted solution space, as illustrated in Figure 3.1A. Different threshold values were tested and inconsistency scores (IS) were recalculated as described by Becker and Palsson (2008) (Figure 3.1B). The predicted changes in the metabolic operability of reactions (i.e. the direction and amount of flux carried by a reaction) after GIMME implementation were also compared, and proteomic data from a previous study (Pawar et al. 2014) were used to further validate the model simulations.
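For reference, the optimizations behind these quantities are commonly written as below. This is the textbook formulation rather than a transcription of the OptFlux implementation; the symbols (stoichiometric matrix S, flux vector v, objective vector c, bounds lb and ub, optimal growth Z*, and the optimality fraction γ) are notation introduced for this sketch.

```latex
\begin{align}
  % FBA: maximize the growth objective at steady state
  Z^{*} &= \max_{v} \; c^{T} v
        && \text{s.t. } S v = 0, \;\; lb \le v \le ub \\
  % pFBA: among the optimal solutions, minimize the total flux
  v^{\mathrm{pFBA}} &= \arg\min_{v} \sum_{i} |v_i|
        && \text{s.t. } S v = 0, \;\; c^{T} v = Z^{*}, \;\; lb \le v \le ub \\
  % FVA: minimum and maximum flux of each reaction i near the optimum
  v_i^{\min} &= \min_{v} v_i, \quad v_i^{\max} = \max_{v} v_i
        && \text{s.t. } S v = 0, \;\; c^{T} v \ge \gamma Z^{*}, \;\; lb \le v \le ub \\
  % Flux span of reaction i
  \mathrm{span}_i &= v_i^{\max} - v_i^{\min}
\end{align}
```

Under GIMME constraints, reactions treated as inactive have lb = ub = 0, which can only shrink the feasible region and therefore the flux spans.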

2 Metabolic flux was calculated by performing Parsimonious Flux Balance Analysis (pFBA) in combination with GIMME.
3 Flux span refers to the difference between the maximum and minimum flux values that a reaction can carry according to FVA.
4 pFBA refers to the flux balance analysis approach that incorporates flux parsimony, meaning that the total flux through all pathways needed to achieve an objective is minimized.


Figure 3.1: Workflow for integrating gene expression data into the metabolic model to calculate inconsistency and improve flux distribution predictions for L. major promastigotes. A) The Gene Inactivity Moderated by Metabolism and Expression (GIMME) algorithm was used to constrain the flux (i.e. zero and non-zero for the reactions with associated gene expression values below and above the threshold, respectively) in the metabolic network representing the promastigote condition of L. major. Flux Variability Analysis (FVA) was used to analyze the solution space over different thresholds, while protein expression data provided further validation of the predicted reaction fluxes. B) An exemplifying scheme for calculating inconsistency scores (IS) using gene expression and flux values in a given metabolic network. The IS was calculated assuming synthesis of metabolite E as the target.


3.3 Results and Discussion

3.3.1 Extended metabolic model

The original metabolic model iAC560 consisted of 560 genes, 1112 reactions, and 1101 metabolites in eight unique subcellular compartments (Chavali et al. 2008), while the extended model ext-iAC560 additionally includes 64 reactions associated with 33 genes (10 new and 23 already existing in the previous model), and 42 metabolites. Two reactions representing the biomass composition in the promastigote and amastigote stages were also added, as described in Appendix 3b. Various reactions consuming sugars like GlcN, GlcNAc, arabinose (Ara), Gal, and Man, along with biosynthetic reactions for sugar nucleotides, such as UDP-Glc, UDP-Gal, UDP-Galf, GDP-Ara, and GDP-Man, were newly added to represent sugar metabolism and glycan biosynthesis in the promastigote and amastigote stages of L. major. Pathways for the consumption and degradation of lipids, such as phosphatidylethanolamine, along with the required transport reactions, were also included in order to explore lipid-associated metabolism, especially in the amastigote stage. Regarding fatty acid synthesis, reactions were edited to produce, for instance, the acetyl acyl carrier protein complex (acACP), which is an important component for synthesizing other larger protein complexes such as malonyl acyl carrier protein (malACP) in the cytosol. These reactions are critical for utilizing acyl carrier protein (ACP), which is an important carrier of acyl intermediates during fatty acid synthesis in the cytosol (Byers and Gong 2007). A few compartmental changes were also made, e.g. mitochondrial acetyl-CoA carboxylase was changed to cytosolic acetyl-CoA carboxylase for the synthesis of malonyl-CoA in fatty acid biosynthesis pathways, as suggested in previous studies (Oldberg and Brunengraber 1980; Ferré and Foufelle 2007). This correction was important to utilize cytosolic acetyl-CoA produced from citrate, which is a product of the TCA cycle in mitochondria, as described in a previous study (Saunders et al. 2010) focusing on the central carbon metabolism of the Leishmania parasite. A reaction for transporting citrate from the mitochondria to the cytosol was also added. Further, in order to convert citrate to acetyl-CoA in the cytosol, a reaction representing ATP-dependent phosphorylation of citrate was included, as described in the previous study (Saunders et al. 2010). All the added or altered reactions are provided in Supplementary Material 1. The Systems Biology Markup Language (SBML) file of the final metabolic model and all supplementary material files can be accessed from the online GitHub repository (https://github.com/shakyawar/SupplMaterial_PhD).


3.3.2 Model simulations and phenotypes predictions

3.3.2.1 Leishmania metabolism

As introduced earlier, Leishmania parasites reside either in the promastigote form inside the sandfly or in the amastigote form inside the host organism (mostly inside human macrophages), where the main energy and nutrient sources vary from sugars and amino acids to lipids and fatty acids (Naderer et al. 2010; Naderer et al. 2006; McConville et al. 2015). Moreover, procyclic promastigotes, the non-infective forms of Leishmania in the digestive tract of the sandfly, face a glucose-rich environment, whereas metacyclic promastigotes, the infective forms of the parasite in the mouthparts of the fly, face glucose-limiting conditions. Subsequently, during the early amastigote stage, parasites encounter a glucose-limiting environment inside the host and change their metabolism accordingly to survive in that environment (Peters et al. 2008; Saunders et al. 2014). Nutrient availability varies between these conditions and significantly alters the metabolism of Leishmania as it responds to the external environment.

Promastigote stage

Model ext-iAC560 simulations with MMP medium indicate that glucose is the main carbon source, metabolized via glycolysis to pyruvate, which can be further oxidized in the mitochondria for energy production in the promastigote stage. The predicted pFBA flux distribution showed that early steps in glycolysis include ATP-dependent phosphorylation of glucose, followed by regeneration of ATP by pyruvate kinase (PYK, cytosolic) and phosphoglycerate kinase (PGK, glycosomal and cytosolic). The predicted secretion of by-products like succinate in glucose-rich medium is experimentally supported by previous studies (Rainey and MacKenzie 1991; Costa et al. 2011). Other predicted by-products, such as acetate and L-alanine (ala-L), have also been reported in previous studies (Ter Kuile 1999; Costa et al. 2011). Moreover, these predictions are supported by the results of other studies (Van Hellemond et al. 2005; Tielens and van Hellemond 2009), which characterized energy metabolism and the associated by-products in distinct trypanosomatids. The predicted urea production is the result of arginase-driven catalysis of L-arginine, as characterized in previous studies (Silva et al. 2012; Green et al. 1990). A negligible amount of zymosterol is unexpectedly predicted as a by-product, which can be explained by the lack of degradation pathways in the model. Zymosterol can be used to produce derivatives of ergosterol, but the experimental support for these pathways in Leishmania is poor (Shameer et al. 2015). Part of the glycolytic flux enters the mitochondria to activate the TCA cycle and lipid biosynthetic pathways, in agreement with a previous study (Saunders et al. 2011) describing the activation of these pathways in glucose-rich medium, while part of the glucose-6-phosphate (Glc6P) is channelled to the synthesis of sugar nucleotides (SugNuc) (discussed in more detail in the next section), the main precursors for the formation of glycans and glycoconjugates in Leishmania. The active flux in the citrate metabolic pathway shows that mitochondrial citrate is transported to the cytosol and converted into acetyl-CoA, which is used for lipid and fatty acid biosynthesis, in contrast to the original model iAC560 (Chavali et al. 2008), which incorrectly predicted that mitochondrial acetyl-CoA is used for lipid/fatty acid synthesis (Figure 3.2). The current predictions are supported by previous studies (Coustou et al. 2008; Van Weelden et al. 2005; Van Hellemond et al. 1998), which characterized pathways associated with the biosynthesis of mitochondrial acetyl-CoA in Trypanosoma brucei and other protozoan parasites.

The original model iAC560 predicted L. major phenotypes in a medium lacking glucose (containing 14 nutrients, as mentioned in the methodology). Although some of the phenotypic predictions from the previous models agreed with laboratory results, a medium without glucose is not a realistic representation of the environment that Leishmania faces inside the sand fly, as glucose is the main carbon source for energy in this phase (Hart and Coombs 1982; Rainey and MacKenzie 1991). The present simulations overcome some of those limitations concerning the media components by considering a more realistic environment for L. major promastigotes.

Amastigote stage

The model simulations in MMA medium, which provides more realistic environmental conditions for Leishmania amastigotes, help to understand the stage-specific sugar and lipid/fatty acid metabolic strategies that parasites use for survival inside the host, which were not addressed with the previous model iAC560. The predicted pFBA flux distribution after imposing metabolic constraints based on MMA medium, as described in the methodology, showed a significant difference in the active metabolic profile of L. major in the amastigote stage compared to the promastigote stage. In general, the flux distribution describes major metabolic differences in glycolysis/gluconeogenesis, lipid biosynthesis and fatty acid degradation pathways in this stage. In lipid metabolism, some lipids such as phosphatidylethanolamine, which is an essential part of biomass, are directly scavenged from the PV, switching off several pathways for their synthesis (Figure 3.2). The externally consumed lipids are also used as precursors in the synthesis of other complex lipids such as phospholipids (PL) and sphingolipids (SL). Leishmania may utilize phosphatidylethanolamine to synthesize complex glycolipids critical for surface-associated interactions with macrophages under these conditions, as discussed in a previous study (Suzuki et al. 2008). Unlike in the promastigote stage, part of the carbon flux could be obtained from the degradation of complex lipids and then used to synthesize simpler lipids like fatty acids (Figure 3.2). In a similar context, the beta-oxidation of lipids is one of the major active pathways used to synthesize fatty acyl-CoA compounds such as stearyl-CoA (strcoa), tetradecanoyl-CoA (tdcoa) and palmitoyl-CoA (pmtcoa) in the amastigote stage, which are either part of biomass or used in the formation of glycolipid complexes, as also found in a previous study (Hart and Coombs 1982) (Figure 3.2). These analyses of the switching on/off of reactions in the lipid biosynthesis pathways help to understand the metabolic behavior of the parasite in a lipid/fatty acid-rich environment, which was not well explained by previous models such as iAC560, iAS556, and iAS142.

Part of the TCA cycle carbon flux derives from the degradation of asp-L and ala-L, which forces the activation of gluconeogenic pathways, as suggested in previous studies (Rosenzweig et al. 2008; Naderer et al. 2006; Rodriguez-Contreras and Hamilton 2014) describing the activation of gluconeogenic pathways in an amino acid-rich environment. Moreover, unlike in the promastigote stage, the glycolytic flux is reduced in this stage because of the glucose-limiting conditions. The predicted flux distribution also showed that part of the gluconeogenic flux is used to synthesize fructose 6-phosphate (F6P), which is a key intermediate in SugNuc biosynthesis (Figure 3.2). The predicted flux through the TCA cycle is higher than that predicted for the promastigote, probably because of the higher consumption of amino acids, especially ala-L and asp-L, as well as fatty acids like stearic acid. However, only a restricted amount of flux is responsible for the synthesis of NADH, which results in less ATP generation through the electron transport chain compared to the promastigote stage. A reduced flux through the ATP synthase-driven oxidative phosphorylation reaction producing mitochondrial ATP is also observed in the amastigote stage, which is supported by an experimental study reporting a significant reduction in the expression of ATP synthase in the amastigote stage (Leifso et al. 2007). The predicted flux values of all reactions in the promastigote and amastigote stages can be found in Supplementary Material 2.


Figure 3.2: Active/inactive pathways in lipid metabolism based on flux distributions predicted from Parsimonious Flux Balance Analysis (pFBA) simulation using metabolic model ext-iAC560 in promastigote and amastigote stages of L. major. Modified Media for Promastigote (MMP) and Modified Media for Amastigote (MMA), as described in the methodology, were used as growth media to simulate promastigote and amastigote phenotypes, respectively. Legend: Glc6P: Glucose-6-phosphate, F6P: Fructose-6-phosphate, F1,6biP: Fructose 1,6-bisphosphate, g3p: Glyceraldehyde 3-phosphate, DHAP: Dihydroxyacetone phosphate, GlcN: Glucosamine, GlcNAc: N-acetylglucosamine, accoa: Acetyl-CoA, oaa: Oxaloacetate, PEP: Phosphoenolpyruvate, pyr: Pyruvate, Pe_LM: Phosphatidylethanolamine.

Although the model simulations in MMP and MMA medium were useful to understand energy and carbohydrate metabolism in the promastigote and amastigote stages, the synthesis of metabolites such as succinate and alanine is not well explained by the current flux distribution in the promastigote condition. This is mainly because of incorrect predictions of the direction of the reaction fluxes. For example, the predicted conversion of ala-L to pyruvate in the mitochondria is not a well-supported process, especially in a glucose-rich environment, because glycolytic flux is the main driving force for pyruvate synthesis in this state. Synthesis of pyruvate is important for understanding central carbon metabolism, which cannot be described well using the current predictions. This emphasizes the need for strategies that impose stronger metabolic constraints on the model in order to predict an improved flux distribution for a particular environmental condition. As described in Chapter 1 (section 1.6.2), relevant omics data (e.g. gene expression data, metabolomics data, and protein expression data) can be used to provide an extra layer of constraints in flux-based simulations to improve metabolic flux predictions. The following sections describe an application of the GIMME algorithm that uses gene expression data in flux-based simulations with the metabolic model ext-iAC560 to analyze the consistency between metabolic flux and expression data, and the resulting improvement in the flux distribution, to better understand the parasite's metabolism in the promastigote stage.

Before performing GIMME simulations, sugar nucleotides biosynthesis in promastigote and amastigote medium was analyzed, as discussed next.

3.3.2.2 Sugar nucleotides (SugNuc) biosynthesis in promastigote and amastigote stage

Comparing flux predictions in MMP medium (promastigote stage) with those in MMA medium (amastigote stage) showed that, in the promastigote stage, glucose is mostly used to synthesize the essential SugNuc (red arrows in Figure 3.3), while in the amastigote stage GlcN or GlcNAc are alternatively used for synthesizing SugNuc, especially UDP-GlcNAc and GDP-Man (green arrows in Figure 3.3). The consumption of amino sugars has been demonstrated by previous studies in which L. major amastigotes were able to survive in media with GlcN or GlcNAc as sole carbon sources (Naderer et al. 2010). The synthesis of SugNuc depends primarily on the consumption of the available sugars; however, other carbon sources such as lipids and amino acids are also consumed and contribute to the synthesis of SugNuc via gluconeogenic pathways, particularly in the amastigote stage.

Glucose uptake is essential in both the promastigote and amastigote stages of the L. major life cycle; however, the consumption rate is significantly reduced in the amastigote stage (Rainey and MacKenzie 1991; Saunders et al. 2014). The glucose-limiting environment in the macrophage forces Leishmania to scavenge other sugars, especially GlcN and GlcNAc, which are used as primary carbon sources. However, the slow consumption of glucose even in a GlcN- and GlcNAc-rich environment has not been explained by previous models and studies. The present model ext-iAC560 shows that glucose consumption is essential in the amastigote environment mainly for the synthesis of GDP-Ara (thin green arrow in Figure 3.3), which is an essential precursor of LPG that is uniquely expressed in L. major among different trypanosomatids (Lillico et al. 2003). Inside the macrophage, a small amount of glucose possibly comes either from the degradation of glycans and glycoconjugates in the macrophage or from host cells. This is the first time that a flux-based analysis provides an overview of active/inactive metabolic pathways for the synthesis of SugNuc as essential metabolites in the promastigote and amastigote stages of L. major.

Further, knockout simulations were performed for important genes in the sugar nucleotide biosynthesis pathways in order to observe phenotypic effects in Leishmania, as discussed in the next section.

Figure 3.3: Active/inactive pathways in the biosynthesis of essential sugar nucleotides (SugNuc) in promastigote and amastigote stages. The reactions represented by red arrows indicate active flux in the promastigote stage, when glucose is the main carbon source for the synthesis of SugNuc. The green arrows represent the active flux distribution in the amastigote stage, when lipids, two additional amino acids (asp-L and ala-D), and amino sugars are consumed as main carbon sources to synthesize essential SugNuc. Alternatively, other carbon sources such as Fructose (Fru), Galactose (Gal), Arabinose (Ara), and Mannose (Man) can be used to synthesize SugNuc. Legend: Ara: Arabinose, GDP-Ara: GDP-Arabinose, Ara5P: Arabinose-5-phosphate, Glc: Glucose, Glc6P: Glucose-6-phosphate, Glc1P: Glucose-1-phosphate, UDP-Glc: UDP-glucose, Man: Mannose, Man6P: Mannose-6-phosphate, Man1P: Mannose-1-phosphate, GDP-Man: GDP-Mannose, Gal: Galactose, Gal1P: Galactose-1-phosphate, UDP-Gal: UDP-Galactose, UDP-Galf: UDP-Galactofuranose, Fru: Fructose, F6P: Fructose-6-phosphate, GlcN: Glucosamine, gam6p: Glucosamine-6-phosphate, GlcNAc: N-acetylglucosamine, acgam6p: N-acetylglucosamine-6-phosphate, acgam1p: N-acetylglucosamine-1-phosphate, UDP-GlcNAc: UDP-N-acetylglucosamine.


3.3.2.3 Phenotypic effects of gene knockouts in the sugar nucleotides biosynthesis

The consumption of amino sugars such as GlcN and GlcNAc in the amastigote stage is important primarily for the synthesis of SugNuc, as well as for keeping central carbon metabolism active, which is required for the survival and virulence of the parasite (Naderer et al. 2010) (Figure 3.3). The metabolism of these sugars is mediated by metabolic enzymes such as N-acetylglucosamine 6-phosphate deacetylase (GNAD) and Glucosamine 6-phosphate deaminase (GND), while other active enzymes in these pathways include Glutamine-fructose-6-phosphate amidotransferase (GFAT) and Glucosamine-6-phosphate acetylase (GNAT) in the amastigote phase of Leishmania (Naderer et al. 2010; Naderer et al. 2015). GND and GNAD facilitate the conversion of amino sugars (e.g. GlcNAc and GlcN) to Glc6P in the glycosome, whereas GFAT and GNAT are involved in the pathway synthesizing UDP-GlcNAc from fructose-6-phosphate in the cytoplasm (Figure 3.4). In order to predict knockout phenotypes, GND and GFAT were deleted from the network one at a time, and pFBA simulations were performed under MMA conditions to predict the flux distribution in SugNuc biosynthetic pathways in amastigotes. The predicted flux distribution after GND knockout showed the inability of Leishmania to synthesize Glc6P, which restricted the synthesis of SugNuc (especially UDP-Glc, UDP-Gal, and UDP-Galf) and consequently had lethal effects on the parasite (Figure 3.4A). These predictions are supported by a previous study (Naderer et al. 2010), which showed that Leishmania with a GND knockout is unable to grow in media containing amino sugars as the sole carbon source. The current predictions provide a possible explanation for the lethal effect: Leishmania is unable to synthesize UDP-Gal, UDP-Galf, and UDP-Glc, which are essential for biomass. Unlike GND, GFAT knockout simulations predicted non-lethal effects for amastigotes, since not all alternative pathways for synthesizing essential SugNuc, especially GDP-Man and UDP-GlcNAc, are blocked (Figure 3.4B). A previous study (Naderer et al. 2008) proposed that GlcNAc or GlcN are essential sugars for the synthesis of SugNuc, especially UDP-GlcNAc. These analyses are useful for understanding the effect of gene knockouts on the biosynthesis of SugNuc, which is directly linked to the growth of the parasite. The strategy can also be utilized to identify carbohydrate-associated drug targets for developing novel antileishmanial therapies.
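How a gene deletion is translated into reaction constraints before rerunning pFBA can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the OptFlux routine used for the simulations: the gene identifiers, the bounds, and the reaction `ISO_EXAMPLE` are hypothetical, and GPR rules use the same assumed representation of a gene name or a tuple (operator, [sub-rules]).

```python
def is_functional(rule, deleted_genes):
    """True if the GPR rule can still be satisfied after deleting `deleted_genes`."""
    if isinstance(rule, str):
        return rule not in deleted_genes
    operator, parts = rule
    results = [is_functional(part, deleted_genes) for part in parts]
    # A complex ("and") needs every subunit; isoenzymes ("or") need only one.
    return all(results) if operator == "and" else any(results)


def knockout_bounds(gpr_rules, bounds, deleted_genes):
    """Copy `bounds`, fixing at zero every reaction whose GPR is no longer satisfied."""
    constrained = dict(bounds)
    for reaction, rule in gpr_rules.items():
        if not is_functional(rule, deleted_genes):
            constrained[reaction] = (0.0, 0.0)
    return constrained


# Hypothetical gene identifiers and bounds, for illustration only.
gpr_rules = {
    "GND": "gene_gnd",
    "GFAT": "gene_gfat",
    "ISO_EXAMPLE": ("or", ["gene_gnd", "gene_other"]),
}
bounds = {"GND": (-1000.0, 1000.0), "GFAT": (0.0, 1000.0), "ISO_EXAMPLE": (0.0, 1000.0)}

gnd_knockout = knockout_bounds(gpr_rules, bounds, {"gene_gnd"})
print(gnd_knockout["GND"])          # (0.0, 0.0): the reaction is blocked
print(gnd_knockout["ISO_EXAMPLE"])  # unchanged: an isoenzyme is still available
```

The constrained bounds would then be passed to the flux simulation; a knockout is predicted to be lethal when the biomass reaction can no longer carry flux.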

Figure 3.4: Predicted active and inactive sugar nucleotides (SugNuc) biosynthetic pathways after knocking out genes GND (A) and GFAT (B) one by one in amastigote stage. Dashed arrow represents a restricted synthesis of the corresponding metabolites (marked with red circle) which cause lethal effects. NOTE: The pFBA was performed using model ext-iAC560 in Modified Media for Amastigote (MMA medium) to predict flux distribution after gene knockouts. Legend: Ara: Arabinose, GDP-Ara: GDP-Arabinose, Arab5P: Arabinose-5-phosphate, Glc: Glucose, Glc6P: Glucose-6-phosphate, Glc1P: Glucose-1-phosphate, UDP-Glc: UDP-glucose, Man: Mannose, Man6P: Mannose-6-phosphate, Man1P: Mannose-1-phosphate, GDP-Man: GDP-Mannose, Gal: Galactose, Gal1P: Galactose-1-phosphate, UDP-Gal: UDP-Galactose, UDP-Galf: UDP-Galactofuranose, Fru: Fructose, F6P: Fructose-6-phosphate, GlcN: Glucosamine, gam6p: Glucosamine-6-phosphate, GlcNAc: N-acetylglucosamine, acgam6p: N-acetylglucosamine-6-phosphate, acgam1p: N-acetylglucosamine-1-phosphate, UDP-GlcNAc: UDP-N-acetylglucosamine.

3.3.3 GIMME simulations and consistency analyses

3.3.3.1 Gene coverage in ext-iAC560, transcriptomic and protein expression study

Transcriptomics data (gene expression data in the promastigote condition) for the metabolic genes were used in the GIMME analysis, while protein-level expression of the genes was used to further validate the model predictions obtained with the GIMME approach. Nearly all genes in the ext-iAC560 model (566 out of 570 genes) are covered by the transcriptomics data (Rastrojo et al. 2013) (Figure 3.5 A) and nearly 70% are expressed at the protein level (Pawar et al. 2014) (Figure 3.6). Density analysis of expression levels showed that a large number of genes in the transcriptomics data have low FPKM values (between 0 and 50), while only a few genes have very high FPKM values (above 1000). Genes that were expressed at the protein level have minimum and maximum FPKM values of 1.8 and 1357.4, respectively (Figure 3.5 B).

Figure 3.5 A) Common genes considered in the transcriptomic study (Rastrojo et al. 2013), proteomic study (Pawar et al. 2014) and current metabolic model ext-iAC560. B) The density plot of FPKM values of the 10275 genes considered in the transcriptomic study and coverage of the genes which are expressed in the proteomic study.

Figure 3.6: Expression level (FPKM values in promastigote stage) of 570 genes considered in the metabolic model ext-iAC560. The blue dots represent genes that are expressed at the protein level in the proteomic study (Pawar et al. 2014). The red dots show genes with zero FPKM values.

3.3.3.2 Consistency between model predictions and transcriptomics data

Integration of high-throughput omics data into metabolic simulations helps to improve the prediction of phenotypes. In the present analysis, the gene expression data were used to constrain the reaction fluxes during the pFBA simulation using model ext-iAC560, followed by a consistency analysis between expression data and predicted fluxes. As described in the methodology, different tests were run in which the gene expression threshold value was changed, and it was observed that IS values increase with the threshold value (Figure 3.7A), particularly above a threshold of 11 (Figure 3.7B). Below this threshold, IS values are close to zero, indicating that there are only a few inconsistencies between predicted fluxes and the gene expression levels associated with the corresponding reactions. As the threshold value increases, the number of inactive reactions (i.e. reactions with expression levels below the threshold and metabolic flux equal to zero, or MURs) increases (green line in Figure 3.7A); at the same time, more reactions with non-zero predicted fluxes and expression levels below the threshold (classified as potentially inactive reactions in the methodology, or MIRs) are also found (blue line in Figure 3.7A), which increases the level of inconsistency between expression data and flux predictions (red line in Figure 3.7A). Although the number of MURs increases with the threshold value, in agreement with the metabolic predictions, increasing the threshold value also tends to exclude reactions (the so-called MIRs) that should be active according to pFBA-based simulations.

As shown in Figure 3.7A, the number of MIRs reaches a maximum of 249 at the highest expression threshold value of 368. That is, at gene expression threshold 368, GIMME reconsiders 249 reactions (whose gene expression levels are below the threshold and whose fluxes were assigned zero) in the model simulation to maximize the growth function. These reconsidered reactions are critical for the optimal performance of the network (i.e. maximizing growth in this case) and give rise to inconsistency, IS = 1195.5×10⁴ in this case (as per the calculation described in Figure 3.1B). Although the number of MIRs is lower than the number of MURs at a particular threshold value, the MIRs contribute far more to the increase in IS values. Therefore, the threshold value should be carefully selected. In the lower range, for example at a threshold value of 12 (close to the area where IS increases sharply, Figure 3.7B), GIMME simulations predicted 30 genes (out of 570) with expression levels below the threshold value, corresponding to 23 reactions, of which 16 were considered MURs and 7 were MIRs.
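The dependence of IS on both the number of MIRs and their fluxes can be made concrete with a small Python sketch. It assumes the standard GIMME definition (Becker and Palsson 2008), in which each reaction with expression below the threshold is penalized by the distance to the threshold multiplied by the absolute flux it carries; the exact calculation used here is the one shown in Figure 3.1B, and the reaction names and numbers below are invented for illustration.

```python
def inconsistency_score(reaction_expression, fluxes, threshold):
    """Sum over reactions of max(0, threshold - expression) * |flux|."""
    score = 0.0
    for reaction, flux in fluxes.items():
        penalty = max(0.0, threshold - reaction_expression[reaction])
        score += penalty * abs(flux)
    return score


def count_mirs_and_murs(reaction_expression, fluxes, threshold, tol=1e-9):
    """Count below-threshold reactions with non-zero flux (MIRs) and zero flux (MURs)."""
    mirs = murs = 0
    for reaction, flux in fluxes.items():
        if reaction_expression[reaction] < threshold:
            if abs(flux) > tol:
                mirs += 1
            else:
                murs += 1
    return mirs, murs


# Invented example: expression per reaction (after applying GPR rules) and pFBA fluxes.
reaction_expression = {"R1": 10.0, "R2": 100.0, "R3": 40.0}
fluxes = {"R1": 2.0, "R2": 5.0, "R3": 0.0}

print(inconsistency_score(reaction_expression, fluxes, threshold=70))  # 120.0
print(count_mirs_and_murs(reaction_expression, fluxes, threshold=70))  # (1, 1)
```

In this example only R1 contributes to the score, (70 - 10) × 2 = 120: R3 lies below the threshold but carries no flux (a MUR), and R2 lies above the threshold. This is why a stable number of MIRs can still produce a rising IS when those reactions carry larger fluxes or the threshold moves further above their expression levels.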

Figure 3.7: Evaluating inconsistencies between ext-iAC560 model predictions and gene expression data from L. major promastigote cells. A) Inconsistency scores (IS) were calculated for different expression threshold values, together with the number of reactions with gene expression levels below the threshold and predicted flux values equal to or different from zero. B) Zoom-in of plot A for lower threshold values, showing the variation in the inconsistency score and in the number of reactions with gene expression below the threshold and flux values different from or equal to zero. Legend: MIRs- metabolically important reactions; MURs- metabolically unwanted reactions.

Importantly, most of the seven predicted MIRs (R_PPA, R_MCMAT2m, R_MCMAT4m, R_MCMAT3m, R_MCMAT8m, R_ACMAT1m, and R_DHORTS) belong to fatty acid biosynthesis pathways. Under promastigote conditions, it is reasonable that these reactions are critical for the synthesis of fatty acids, which are not available as a carbon source in the gut of the sand fly. Below threshold 12, no MIRs were predicted, so there was no inconsistency between gene expression and predicted fluxes. At higher thresholds, especially in the range between 60 and 100 (simulations were performed at threshold 70 in this case), GIMME simulations identified 274 MURs, but the number of MIRs also increased, which gave rise to inconsistency (IS = 69 × 10⁴) between gene expression and predicted fluxes. Above threshold 70, the IS increases even further while the number of MIRs appears to stabilize, which means that beyond this point GIMME predicts MIRs with higher fluxes. As shown in the methodology section (Figure 3.1B), the calculation of IS depends not only on the number of MIRs but also on the flux values of these reactions. To limit inconsistency while giving emphasis to MIRs and their fluxes, a threshold in the range between 60 and 100 (70 in this case) was chosen for further GIMME simulations to predict fluxes across various pathways, which are interpreted in the following sections. The distribution of the metabolic genes in transcriptomic and proteomic studies at gene expression threshold 70 is provided in Table I of Appendix 3c.

FVA analyses were performed under GIMME constraints using a threshold value of 70. Briefly, the idea was to evaluate the changes in metabolic predictions imposed by GIMME constraints (especially MURs) and to estimate their impact on the predicted metabolic flexibility under the defined conditions. Therefore, FVA analyses with and without GIMME constraints were compared. These analyses showed that the flux spans of several metabolic reactions (including those that were activated) were reduced significantly, probably because the number of branching reactions in the metabolic network was lowered. The reactions activated and inactivated by GIMME constraints improved the understanding of Leishmania metabolism in the promastigote stage.

Reactions were categorized as follows: type1, minimum and maximum FVA fluxes equal to zero; type2, minimum and maximum FVA fluxes both different from zero and of the same sign (either positive or negative); and type3, minimum and maximum FVA fluxes equal to the lower and upper bounds of the reaction, respectively (Table 3.1). In terms of metabolic importance, type1 reactions are always inactive, type2 reactions are always active, and type3 reactions can be active or inactive in a particular state. The results showed that the number of type2 reactions increased while the number of type3 reactions decreased, which suggests that GIMME-based constraints reduced the metabolic flexibility associated with large FVA spans. Reactions of type1, with FVA spans of zero (i.e. blocked reactions), also contribute to this decrease in metabolic flexibility, as the number of possible alternatives for carbon distribution within the network decreases. In general, the flux spans of several metabolically active reactions are reduced after GIMME implementation. Minimum and maximum flux values from FVA analyses with and without GIMME-based constraints, for reactions whose fluxes changed significantly, are presented in Table I of Appendix 3a.

Table 3.1: Number of reactions classified as type1, type2 and type3 from Flux Variability Analysis (FVA) results, considering simulations with and without GIMME-based constraints (i.e. deleting MURs at gene expression threshold = 70), with 90% flexibility of maximum biomass in promastigote condition.

| Reaction category | Minimum (min) and maximum (max) FVA values | Number of reactions without GIMME | Number of reactions with GIMME |
|---|---|---|---|
| type 1 | min=0 and max=0 | 472 | 654 |
| type 2 | min/max<0 or min/max>0 | 239 | 321 |
| type 3 | min=lower bound and max=upper bound | 464 | 200 |
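The categories used in Table 3.1 can be sketched in a few lines of Python. This is only an illustration of the stated definitions: the numerical tolerance, the extra category "other" for ranges that match none of the three definitions, and the default bounds of ±100000 are assumptions made here. The example ranges are taken from Table I of Appendix 3a (without GIMME and with GIMME at threshold 70).

```python
def classify_fva(fva_min, fva_max, lower, upper, tol=1e-6):
    """Return 'type1', 'type2', 'type3' or 'other' for one reaction."""
    if abs(fva_min) <= tol and abs(fva_max) <= tol:
        return "type1"  # min = max = 0: always inactive (blocked)
    if fva_min > tol or fva_max < -tol:
        return "type2"  # range excludes zero: always active
    if abs(fva_min - lower) <= tol and abs(fva_max - upper) <= tol:
        return "type3"  # range spans the full bounds: active or inactive
    return "other"  # assumption: anything not covered by the three definitions


def flux_span(fva_min, fva_max):
    """Width of the FVA range, used as a simple measure of flexibility."""
    return fva_max - fva_min


BOUND = 100000.0  # assumed default flux bound

# reaction: ((min, max) without GIMME, (min, max) with GIMME at threshold 70)
fva = {
    "R_TYRTA": ((-100000.0, 100000.0), (0.0, 0.0)),
    "R_ATPSm": ((55.3627, 67612.7304), (245.6493, 748.1146)),
}

for rxn, (without, with_gimme) in fva.items():
    before = classify_fva(*without, -BOUND, BOUND)
    after = classify_fva(*with_gimme, -BOUND, BOUND)
    print(rxn, before, "->", after,
          round(flux_span(*without), 4), "->", round(flux_span(*with_gimme), 4))
```

In this sketch R_TYRTA moves from type3 to type1, while R_ATPSm stays type2 but with a much narrower span, which is the kind of loss of flexibility summarised in Table 3.1.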

Additionally, predicted flux distributions from pFBA with and without GIMME were compared. In general, flux distributions did not change significantly, most likely because of the small differences in the number of active and non-active reactions (Figure 3.8A); however, a few reactions changed their flux values from zero to non-zero and vice versa. The reactions that changed flux values from zero to non-zero are mostly transport reactions, probably to maintain the inter-compartmental flux associated with the metabolic functionality of the other reactions showing significant changes in their fluxes. However, reactions belonging to other metabolic pathways, such as “Glycerophospholipid metabolism” (13.04%) and “Fatty acid metabolism” (20.29%) (Figure 3.8B), were also identified. These changes in flux are partially supported by proteomic data for L. major (Pawar et al. 2014), which showed that 23 reactions that changed flux values from zero to non-zero after GIMME implementation are associated with genes that are expressed at the protein level in the promastigote stage. On the other hand, 16 reactions deactivated after GIMME (i.e. whose flux values changed to zero) are associated with genes that were also expressed at the protein level. Protein expression was also positive for the 20 reactions that showed significant changes in their non-zero fluxes. The reactions with significant changes in their flux values, together with the protein expression of their associated genes, are listed in Table I of Appendix 3a.

After imposing GIMME-based constraints (at threshold = 70), the pFBA simulation was performed in MMP medium (as defined in the methodology), maximizing biomass, to predict the flux distribution across different metabolic pathways. As discussed in the previous section, many reactions changed their fluxes from zero to non-zero or vice versa, and many changed the direction of their flux after GIMME implementation (Table I of Appendix 3a). One example is the activation of the reaction converting malate to pyruvate, catalysed by malic enzyme (LmjF24.0770) in the cytosol, after applying GIMME constraints. This is plausible, as it helps to maintain the NAD/NADH pool as well as the synthesis of succinate (an end product) from phosphoenolpyruvate (PEP)-associated pathways in this state (Figure 3.9). The demand for malate in this case is met by the reaction representing the synthesis of malate from oxaloacetate (oaa), catalysed by malate dehydrogenase (LmjF28.2860), which had the opposite directionality before GIMME implementation, as shown in Figure 3.9.

Figure 3.8: GIMME-constrained pFBA simulations of promastigote conditions of L. major. A) Changes in the number of reactions with predicted flux = 0 or ≠ 0 after implementing GIMME at different gene expression thresholds. The reactions that became inactive due to GIMME-based constraints are shown by the green bar. B) Percentage cellular distribution of the reactions whose flux values changed from zero to non-zero or vice versa after GIMME (at gene expression threshold 70). Legend: GIMME- Gene Inactivity Moderated by Metabolism and Expression; MURs- Metabolically Unwanted Reactions; pFBA- Parsimonious Flux Balance Analysis. The calculations in both A and B correspond to the promastigote condition of L. major.

Interestingly, the genes associated with both reactions are expressed at the protein level (Pawar et al., 2014), supporting the current predictions. Cytosolic malate is also transported to the mitochondria to synthesize fumarate and succinate in this compartment. Activation of the reaction representing the synthesis of L-alanine from pyruvate is predicted only when GIMME constraints are applied; without GIMME, the direction of this conversion was reversed in the mitochondria. The reversed direction is not expected in L. major promastigotes, because the main driving flux for the synthesis of metabolites such as pyruvate, alanine, and acetyl-CoA in the mitochondria is the glycolytic flux, especially when glucose is the main carbon source (Saunders et al. 2011), as shown in Figure 3.9. L-alanine is further used as a precursor in other metabolic reactions and contributes to biomass in Leishmania. The improvement in the predicted flux distribution after applying GIMME-based constraints provided a better understanding of Leishmania metabolism in the promastigote stage. All changes in the value and direction of reaction fluxes after GIMME implementation, along with the protein expression of the associated genes, are provided in Table I of Appendix 3a.
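The kinds of change discussed above (activation, deactivation and reversal of direction) can be expressed as a simple comparison of two pFBA flux values. The sketch below is illustrative only: the category names and the tolerance are assumptions made here, and the flux pairs are the pFBA values before GIMME and with GIMME at threshold 70 as listed in Table I of Appendix 3a.

```python
def flux_change(before, after, tol=1e-6):
    """Describe how a reaction's pFBA flux changed after GIMME constraints."""
    was_active = abs(before) > tol
    is_active = abs(after) > tol
    if not was_active and is_active:
        return "activated"      # zero -> non-zero
    if was_active and not is_active:
        return "deactivated"    # non-zero -> zero
    if was_active and is_active and before * after < 0:
        return "reversed"       # sign change: flux runs in the opposite direction
    return "same state"         # both zero, or same direction


# reaction: (pFBA flux before GIMME, pFBA flux with GIMME at threshold 70)
pfba = {
    "R_ME1x": (0.0, 162.4966),
    "R_TYRTA": (13125.8643, 0.0),
    "R_ALATA_Lm": (99837.5034, -161.9115),
    "R_ATPSm": (25394.9432, 457.6477),
}

changes = {rxn: flux_change(before, after) for rxn, (before, after) in pfba.items()}
for rxn, label in changes.items():
    print(rxn, label)
```

Under these definitions the malic enzyme reaction R_ME1x is activated and the L-alanine transaminase reaction R_ALATA_Lm is reversed, in line with the description above.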

Figure 3.9: Changes in the flux distribution of the glycolysis-associated pathways after GIMME constraints (at threshold = 70) in L. major promastigote. The red arrows represent fluxes that changed after imposing GIMME constraints. Legend: PEP: Phosphoenolpyruvate; oaa: oxaloacetate; succ: succinate; ala_L: L-alanine; TCA: tricarboxylic acid; pyr: Pyruvate; mal: Malate. 1: L-alanine transaminase; 2: Malic enzyme (NAD); 3: Malate dehydrogenase.


3.4 Conclusions

Metabolic models help to understand the metabolism of an organism under defined environmental conditions. In this work, the model ext-iAC560 (an extension of the existing model iAC560) provided a metabolic framework to understand the biosynthesis of sugar nucleotides such as UDP-Glucose, GDP-Arabinose, UDP-Galactose and UDP-N-acetyl-Glucosamine, which are the main precursors of glycans and glycoconjugates in the promastigote and amastigote stages of Leishmania. Gene deletion studies of enzymes such as GND (which catalyses the conversion of Glucosamine-6-phosphate to Fructose-6-phosphate in the glycosome) and GFAT (which catalyses the conversion of Fructose-6-phosphate to Glucosamine-6-phosphate in the cytosol), which play vital roles in sugar nucleotide biosynthesis, predicted lethal and non-lethal phenotypes in the amastigote environment, in agreement with earlier experimental studies. The model simulation in MMA medium (amastigote stage) characterized the phenotype of Leishmania inside the human host, where it uses lipids, amino sugars, and amino acids as major nutrients. GIMME implementation, which incorporates gene expression data (FPKM values from an RNAseq experiment) to constrain the fluxes of metabolic reactions, improved the pFBA-based flux distributions across various pathways and thereby the understanding of L. major metabolism in the promastigote stage. Moreover, the predicted directionality of some metabolic reactions (e.g. pyruvate to alanine conversion and transport of malate between cytosol and mitochondria) was biologically more reliable after imposing GIMME-based constraints in pFBA simulations. Overall, the strategy was helpful to constrain the metabolic model ext-iAC560 and to improve pFBA-based flux predictions in order to understand L. major promastigote metabolism.

3.5 Appendix 3a

Table I: Minimum and maximum flux values (from FVA analyses) of the reactions without and with GIMME (at thresholds 12 and 70) in promastigote condition. Bold text marks reactions whose flux value (from pFBA) at threshold 70 changed significantly compared with threshold 12. The protein expression of the genes associated with the reactions is given in the last column.

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_ATPSm | genes1 | 55.3627 | 67612.7304 | 55.3627 | 67577.1606 | 245.6493 | 748.1146 | 25394.9432 | 154.1241 | 457.6477 | no |
| R_PItm | ((LmjF05.0290 or LmjF35.4420) or LmjF35.4430) | 39.824 | 86097.6772 | 118.7016 | 86095.0151 | 121.9206 | 1051.4451 | 38016.5681 | 154.2915 | 701.4258 | no |
| R_PI4P5K_LM | LmjF34.3090 | 0 | 5560.3119 | 0 | 5550.625 | 0.3052 | 0.3393 | 0 | 0 | 0.3392 | no |
| R_HIBHrm | ((LmjF32.3670 or LmjF32.3660) or LmjF32.3650) | 0 | 0.6547 | 0 | 0.6547 | 0 | 0 | 0.6453 | 0.6453 | 0 | no |
| R_TYRTA | (LmjF36.2360 or LmjF35.0820) | -100000 | 100000 | -100000 | 100000 | 0 | 0 | 13125.8643 | -100000 | 0 | no |
| R_FACOAL140 | genes2 | 0.0015 | 429.6415 | 0.0015 | 379.8671 | 0.0015 | 9.4954 | 0.0016 | 0.0016 | 0.3167 | yes (LmjF01.0470, LmjF01.0490, LmjF01.0520, LmjF01.0530, LmjF03.0230, LmjF13.0420) |
| R_FACOAL160 | genes3 | 0.0044 | 1177.0604 | 0.0044 | 1167.0979 | 0.0044 | 17.8395 | 1.264 | 1.264 | 0.9502 | yes (LmjF03.0230, LmjF01.0470, LmjF01.0490, LmjF01.0520, LmjF01.0530, LmjF13.0420) |


Table I: (continued).

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_MDHm | (((LmjF34.0150 or LmjF34.0140) or LmjF34.0160) or LmjF34.0130) | -99999.744 | 99809.3116 | -99999.7439 | 99809.3116 | 6.5195 | 156.0011 | -99967.78 | -209.3076 | 119.0813 | yes (LmjF34.0150, LmjF34.0140, LmjF34.0160, LmjF34.0130) |
| R_PGKg | ((LmjF20.0110 and LmjF20.0100) and LmjF30.3380) | -23865.627 | 50152.0946 | 6.2804 | 50151.9529 | 141.3314 | 212.1647 | -12450.9417 | 6.9712 | 168.8897 | yes (LmjF20.0110, LmjF20.0100, LmjF30.3380) |
| R_PGK | ((LmjF20.0110 and LmjF20.0100) and LmjF30.3380) | -187.1932 | 50113.0846 | -187.1932 | 49983.4792 | -52.1422 | 18.691 | -23.2879 | -185.7845 | -23.8587 | yes (LmjF20.0110, LmjF20.0100, LmjF30.3380) |
| R_GLUKg | ((LmjF36.2320 or LmjF21.0240) or LmjF21.0250) | 0 | 99.9705 | 0 | 99.9705 | 0 | 99.9705 | 99.9673 | 0.6587 | 99.9673 | yes (LmjF36.2320, LmjF21.0240, LmjF21.0250) |
| R_NDPK1 | (LmjF35.3870 and LmjF32.2950) | -36996.757 | 9934.9159 | -36994.3607 | 9880.4756 | -454.549 | -119.2409 | -12744.7302 | -124.3206 | -306.2065 | yes (LmjF35.3870, LmjF32.2950) |
| R_CSm | (LmjF18.0670 or LmjF18.0680) | 0 | 130.3745 | 0 | 130.3745 | 0 | 112.6942 | 31.9357 | 31.9357 | 91.6309 | yes (LmjF18.0670, LmjF18.0680) |
| R_ACSm | (LmjF23.0540 or LmjF23.0710) | 0 | 31169.8248 | 0 | 31168.4296 | 1.7513 | 304.0854 | 12635.8517 | 15.4421 | 198.0668 | yes (LmjF23.0540, LmjF23.0710) |
| R_AKGDe1 | (LmjF27.0880 or LmjF36.3470) | 0 | 36.142 | 0 | 36.142 | 0 | 1.4507 | 1.93 | 1.93 | 1.4298 | yes (LmjF27.0880, LmjF36.3470) |


Table I: (continued).

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_PPCKg | (LmjF27.1805 or LmjF27.1810) | -25559.621 | 13.0653 | -25517.8463 | 13.0653 | -25517.8463 | 12.4106 | -30.2591 | -192.7557 | -192.7557 | yes (LmjF27.1805) |
| R_GSADHm | LmjF03.0200 | -100000 | 620.8456 | -100000 | 620.8456 | 0.1197 | 1.5705 | 238.3441 | 238.3441 | 1.5628 | yes (LmjF03.0200) |
| R_PPA | LmjF03.0910 | 30.4879 | 50738.035 | 75.3666 | 50724.1202 | 75.7377 | 517.263 | 12719.5409 | 99.1313 | 281.8398 | yes (LmjF03.0910) |
| R_HDDR5m | (LmjF27.2440 or LmjF24.2030) | 0 | 62.0866 | 0 | 0 | 0 | 6.9589 | 0 | 0 | 0.3151 | yes (LmjF24.2030) |
| R_PAPAm_LM | (LmjF18.0440 or LmjF19.1350) | 0 | 67559.8979 | 0 | 67519.6522 | 0 | 593.012 | 0 | 1.0479 | 1.049 | yes (LmjF18.0440, LmjF19.1350) |
| R_ATPS | (LmjF18.1510 or LmjF18.1520) | 0 | 99.2462 | 0 | 99.2462 | 0 | 93.137 | 0 | 0 | 59.8634 | yes (LmjF18.1520) |
| R_HTDR6m | (LmjF27.2440 or LmjF24.2030) | 0 | 62.0866 | 0 | 0 | 0 | 6.9589 | 0 | 0 | 0.3151 | yes (LmjF24.2030) |
| R_FRDm | (LmjF35.1190 or LmjF35.0830) | 0 | 315.8052 | 0 | 315.5216 | 0 | 126.6076 | 0 | 0 | 88.7585 | yes (LmjF35.1190, LmjF35.0830) |
| R_HPROxm | LmjF03.0200 | 0 | 100000 | 0 | 100000 | 176.8248 | 257.7683 | 0 | 0 | 236.2601 | yes (LmjF03.0200) |
| R_HPROam | LmjF13.1680 | 0 | 100000 | 0 | 100000 | 176.8248 | 257.7683 | 0 | 0 | 236.2601 | yes (LmjF13.1680) |
| R_ME1x | LmjF24.0770 | 0 | 25537.9955 | 0 | 25496.2204 | 0 | 25496.22 | 0 | 162.4966 | 162.4966 | yes (LmjF24.0770) |
| R_DDMAT5m | LmjF05.0520 | 0 | 62.0866 | 0 | 0 | 0 | 6.9589 | 0 | 0 | 0.3151 | yes (LmjF05.0520) |
| R_DEMAT4m | LmjF05.0520 | 0 | 62.0866 | 0 | 0 | 0 | 6.9589 | 0 | 0 | 0.3151 | yes (LmjF05.0520) |
| R_MAT6m | LmjF05.0520 | 0 | 62.0866 | 0 | 0 | 0 | 6.9589 | 0 | 0 | 0.3151 | yes (LmjF05.0520) |


Table I: (continued).

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_THFOCi | LmjF06.0860 | 0 | 100000 | 0 | 100000 | 0.0033 | 0.0037 | 0 | 0 | 0.0037 | yes (LmjF06.0860) |
| R_MMEm | LmjF26.0020 | -100.6193 | 0 | -100.6193 | 0 | 80.5821 | 99.9646 | 0 | 0 | 99.9607 | yes (LmjF26.0020) |
| R_PGL | LmjF26.2700 | 0 | 75 | 0 | 75 | 0.1806 | 10.486 | 0 | 0 | 0.2006 | yes (LmjF26.2700) |
| R_PETOHMm_LM | LmjF31.2290 | 0 | 2.2823 | 0 | 2.2823 | 2.0532 | 2.2823 | 0 | 0 | 2.2813 | yes (LmjF31.2290) |
| R_MFAPSm_LM | LmjF31.3120 | 0 | 2.2823 | 0 | 2.2823 | 2.0532 | 2.2823 | 0 | 0 | 2.2813 | yes (LmjF31.3120) |
| R_PMETMm_LM | LmjF31.3120 | 0 | 2.2823 | 0 | 2.2823 | 2.0532 | 2.2823 | 0 | 0 | 2.2813 | yes (LmjF31.3120) |
| R_AGPATi_LM | LmjF32.1960 | 0 | 5275.7154 | 0 | 5266.5318 | 4.3245 | 5.1763 | 0 | 4.8002 | 4.805 | yes (LmjF32.1960) |
| R_MCMAT5m | LmjF33.2720 | 0 | 62.0866 | | | 0 | 6.9589 | 0 | 0 | 0.3151 | yes (LmjF33.2720) |
| R_MCMAT6m | LmjF33.2720 | 0 | 62.0866 | | | 0 | 6.9589 | 0 | 0 | 0.3151 | yes (LmjF33.2720) |
| R_UPPRTr | LmjF34.1040 | -90000 | 0 | -90000 | 0 | -90000 | 0 | 0 | -10000 | -10000 | yes (LmjF34.1040) |
| R_GPAM_LM | LmjF34.1090 | 0 | 5275.7154 | 0 | 5266.5318 | 4.3245 | 5.1763 | 0 | 4.8002 | 4.805 | yes (LmjF34.1090) |
| R_PSD_LM | LmjF35.4590 | 0 | 3.791 | 0 | 3.791 | 2.9392 | 3.791 | 0 | 0 | 3.2658 | yes (LmjF35.4590) |
| R_DHFOR2 | LmjF06.0860 | -100000 | 100000 | -100000 | 100000 | 0.0033 | 0.0037 | 100000 | -100000 | 0.0037 | yes (LmjF06.0860) |
| R_G3PDg | LmjF10.0510 | 0 | 23876.3002 | 0 | 5.2442 | 4.3924 | 5.2442 | 12625.2852 | 4.8756 | 4.8805 | yes (LmjF10.0510) |

Table I: (continued).

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_ALATA_Lm | LmjF12.0630 | -68704.217 | 99880.2577 | -68695.4215 | 99880.2577 | -163.3843 | -135.051 | 99837.5034 | -1126.821 | -161.9115 | yes (LmjF12.0630) |
| R_P5CRrm | LmjF13.1680 | -620.8456 | 99915.3621 | -620.8456 | 99853.8721 | -1.5705 | -0.1197 | -238.3441 | -238.3441 | -1.5628 | yes (LmjF13.1680) |
| R_CDPDSP_LM | LmjF14.1200 | 0 | 3.791 | 0 | 3.791 | 2.9392 | 3.791 | 3.2625 | 0 | 3.2658 | yes (LmjF14.1200) |
| R_SUCD2_u6m | LmjF15.0990 | 62.8157 | 470.146 | 63.0993 | 470.146 | 78.1245 | 276.3828 | 133.2828 | 133.2828 | 220.0654 | yes (LmjF15.0990) |
| R_GLUDx__LBRACKET_m_RBRACKET_ | LmjF15.1010 | -100000 | 100000 | -100000 | 100000 | -45.8691 | -5.3568 | 86874.8435 | -263.2194 | -26.4572 | yes (LmjF15.1010) |
| R_step_CitAcetlCoA | LmjF18.0670 or LmjF18.0680 | 0 | 95.6833 | 0 | 95.6833 | 0 | 112.6942 | 31.4357 | 31.4357 | 91.6309 | yes (LmjF18.0670, LmjF18.0680) |
| R_ASPTA1m | LmjF24.0370 | -99785.721 | 99999.7439 | -99785.7209 | 99999.7439 | -46.7629 | -6.2506 | 99999.7157 | 241.2433 | -27.4503 | yes (LmjF24.0370) |
| R_TPIg | LmjF24.0850 | -23777.317 | 98.9828 | 69.5904 | 94.5904 | 80.5525 | 94.5904 | -12526.4143 | 93.9953 | 93.9893 | yes (LmjF24.0850) |
| R_AKGDe2 | LmjF28.2420 | 0 | 36.142 | 0 | 36.142 | 0 | 1.4507 | 1.93 | 1.93 | 1.4298 | yes (LmjF28.2420) |
| R_MDH | LmjF28.2860 | -100000 | 99785.4647 | -100000 | 99785.4647 | -396.6427 | -250.6013 | 99733.8542 | -187.1148 | -354.0891 | yes (LmjF28.2860) |
| R_GLUDy | LmjF28.2910 | -100000 | 100000 | -100000 | 100000 | 5.9945 | 46.5065 | -86874.1359 | 263.9269 | 27.1655 | yes (LmjF28.2910) |
| R_step_ACCOACrmChange | LmjF31.2970 | 0 | 17779.1831 | 74.4852 | 17776.9203 | 74.2017 | 101.0412 | 20.8914 | 20.8914 | 82.4463 | yes (LmjF31.2970) |
| R_RPE | LmjF33.1570 | -100000 | 100000 | -100000 | 100000 | -0.2138 | 6.6778 | -99999.693 | 99999.7864 | -0.2138 | yes (LmjF33.1570) |
| R_ASPTA1 | LmjF35.0820 | -100000 | 99785.4647 | -100000 | 99785.4647 | 5.9945 | 46.5067 | -100000 | -241.5276 | 27.1657 | yes (LmjF35.0820) |
| R_FRDg | LmjF35.1180 | 0.3461 | 316.1513 | 0.3461 | 315.8677 | 0.3461 | 10.6515 | 0.3417 | 0.7842 | 0.7842 | yes (LmjF35.1180) |
| R_GAPD | LmjF35.4750 | -47350.906 | 51138.8319 | 342.9091 | 51041.6546 | 342.9091 | 396.3445 | -24848.69 | 392.1291 | 392.1213 | yes (LmjF35.4750) |


Table I: (continued).

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_NDPKn3 | (LmjF35.3870 and LmjF32.2950) | -99996.661 | 99999.9665 | -99996.6614 | 99999.9665 | 3.3386 | 4.1904 | -0.0555 | 99999.9536 | 3.7096 | yes (LmjF35.3870, LmjF32.2950) |
| R_SUCOGDPm | ((LmjF25.2130 and LmjF36.2950) or (LmjF25.2140 and LmjF36.2950)) | -36996.946 | -65.2734 | -36994.5499 | -119.7137 | -454.7382 | -119.4301 | -12744.9403 | -124.5307 | -306.4168 | yes (LmjF25.2130, LmjF36.2950, LmjF36.2950) |
| R_ACACT7rm | ((LmjF23.0690 or LmjF31.1630) or LmjF31.1640) | -45.3581 | 0 | -45.3581 | -0.2836 | -12.7836 | 0 | -0.3148 | -0.3148 | 0 | yes (LmjF23.0690) |
| R_ECOAH7m | ((LmjF26.1550 or LmjF29.2310) or LmjF35.0360) | -45.3581 | 0 | -45.3581 | -0.2836 | -12.7836 | 0 | -0.3148 | -0.3148 | 0 | yes (LmjF26.1550, LmjF29.2310, LmjF35.0360) |
| R_HDCOADm | (LmjF06.0880 or LmjF28.2510) | 0 | 45.3581 | 0.2836 | 45.3581 | 0 | 12.7836 | 0.3148 | 0.3148 | 0 | yes (LmjF06.0880, LmjF28.2510) |
| R_ECOAH16m | ((LmjF29.2310 or LmjF26.1550) or LmjF35.0360) | 0 | 0.6547 | 0 | 0.6547 | 0 | 0 | 0.6453 | 0.6453 | 0 | yes (LmjF26.1550, LmjF29.2310, LmjF35.0360) |
| R_PAPA_LM | (LmjF18.0440 or LmjF19.1350) | 0 | 45039.9319 | 0 | 45019.2879 | 0 | 444.759 | 1.0479 | 0 | 0 | yes (LmjF18.0440, LmjF19.1350) |
| R_HACD7m | (LmjF26.1550 or LmjF36.1140) | -45.3581 | 0 | -45.3581 | -0.2836 | -12.7836 | 0 | -0.3148 | -0.3148 | 0 | yes (LmjF26.1550, LmjF36.1140) |
| R_ACOAD9m | (LmjF28.2510 or LmjF06.0880) | 0 | 0.6547 | 0 | 0.6547 | 0 | 0 | 0.6453 | 0.6453 | 0 | yes (LmjF06.0880, LmjF28.2510) |

Table I: (continued).

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_ICDHym | (LmjF33.2550 or LmjF10.0290) | 0 | 34.6913 | 0 | 34.6913 | 0 | 0 | 0.5 | 0.5 | 0 | yes (LmjF33.2550, LmjF10.0290) |
| R_PHETA1 | (LmjF35.0820 or LmjF36.2360) | -100000 | 100000 | -100000 | 100000 | 0 | 0 | 100000 | 99541.1304 | 0 | yes (LmjF35.0820) |
| R_DHFOR2a | LmjF06.0860 | -100000 | 100000 | -100000 | 100000 | 0 | 0 | -100000 | 100000 | 0 | yes (LmjF06.0860) |
| R_HACOADm | (LmjF36.1140 or LmjF26.1550) | 0 | 0.6547 | 0 | 0.6547 | 0 | 0 | 0.6453 | 0.6453 | 0 | yes (LmjF26.1550, LmjF36.1140) |
| R_ALDD8xr_m | LmjF25.1120 | -5.2442 | 23871.9077 | 0 | 0 | 0 | 0 | 12620.4096 | 0 | 0 | yes (LmjF25.1120) |
| R_TAL | LmjF16.0760 | -0.1069 | 24.9038 | -0.1069 | 24.9038 | 0 | 0 | -0.1068 | -0.1068 | 0 | yes (LmjF16.0760) |
| R_VALTAm | LmjF27.2030 | 0 | 0.6547 | 0 | 0.6547 | 0 | 0 | 0.6453 | 0.6453 | 0 | yes (LmjF27.2030) |
| R_MFAPS_LM | LmjF31.3120 | 0 | 2.2823 | 0 | 2.2823 | 0 | 0 | 2.279 | 2.279 | 0 | yes (LmjF31.3120) |
| R_PMETM_LM | LmjF31.3120 | 0 | 2.2823 | 0 | 2.2823 | 0 | 0 | 2.279 | 2.279 | 0 | yes (LmjF31.3120) |
| R_Xu5ptg | | -100000 | 100000 | -100000 | 100000 | -100000 | 99656.991 | -100000 | 99999.4794 | -0.4382 | |
| R_EX_hxan_LPAREN_e_RPAREN_ | | -0.339 | -0.3049 | -0.339 | -0.3049 | -0.339 | -0.3049 | -100000 | -100000 | -0.3388 | |
| R_DTDPtn | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | -100000 | -100000 | 0 | |
| R_IDPtn | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | -100000 | -100000 | 0 | |
| R_PHCHGSm | | -100000 | 0 | -100000 | 0 | 0 | 0 | -100000 | -100000 | 0 | |
| R_PALATa | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | -100000 | -99541.1304 | 0 | |
| R_DATPtn | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | -100000 | 0.0555 | 0 | |
| R_CDPtn | | -100000 | 100000 | -100000 | 100000 | 6.7772 | 8.4808 | -100000 | -0.0372 | 7.5302 | |


Table I: (continued).

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_CYTK1 | | -99996.653 | 100000 | -99996.6532 | 100000 | 6.7772 | 8.4808 | -99996.2387 | -99996.285 | 7.5302 | |
| R_ALAtm | | -99880.258 | 68704.2173 | -99880.2577 | 68695.4215 | 135.051 | 163.3843 | -99837.5034 | 1126.821 | 161.9115 | |
| R_PYRtm | | -99864.949 | 68719.5261 | -99864.949 | 68710.7302 | 135.051 | 163.3843 | -99837.5034 | 1126.821 | 161.9115 | |
| R_NH4tm | | -100000 | 100000 | -100000 | 100000 | 5.3568 | 45.8691 | -86874.8435 | 263.2194 | 26.4572 | |
| R_step_TransCitMalCM | | -50140.313 | 49910.322 | -50140.1712 | 49910.322 | 15.2241 | 96.1434 | -49969.3446 | -90.1084 | 74.6246 | |
| R_13DPGt | | -47538.1 | 100000 | 180.7159 | 100000 | 325.4582 | 411.6002 | -24871.9779 | 206.3447 | 368.2625 | |
| R_TYRt6m | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | -13125.8643 | 100000 | 0 | |
| R_GTPtm | | -36996.946 | -65.2734 | -36994.5499 | -119.7137 | -454.7382 | -119.4301 | -12744.9403 | -124.5307 | -306.4168 | |
| R_PPItm | | -31172.9 | -0.136 | -31171.5047 | -0.136 | -304.2213 | -1.8872 | -12636.0026 | -18.8555 | -198.2179 | |
| R_GLYALDtm | | -23871.908 | 5.2442 | 0 | 0 | 0 | 0 | -12620.4096 | 0 | 0 | |
| R_GLYCRtm | | -23871.908 | 5.2442 | 0 | 0 | 0 | 0 | -12620.4096 | 0 | 0 | |
| R_GLYCtg | | -23871.908 | 5275.7834 | 0 | 5262.2073 | 0 | 0 | -12620.4096 | 0 | 0 | |
| R_3PGt | | -23865.627 | 50152.0946 | 6.2804 | 50151.9529 | 141.3314 | 212.1647 | -12450.9417 | 6.9712 | 168.8897 | |
| R_H2Ot5 | | -454.5394 | -347.6771 | -454.5394 | -347.6771 | -421.6144 | -347.7055 | -400.1884 | -400.1884 | -398.1942 | |
| R_EX_o2_LPAREN_e_RPAREN_ | | -327.5592 | -156.6993 | -327.5592 | -156.6993 | -287.863 | -180.0321 | -244.8922 | -244.8922 | -242.0441 | |
| R_G5SADsm | | -620.8456 | 100000 | -620.8456 | 100000 | -1.5705 | -0.1197 | -238.3441 | -238.3441 | -1.5628 | |
| R_GLUT | | -619.3948 | 99908.4334 | -619.3948 | 99847.0375 | -0.1331 | -0.1197 | -236.9141 | -236.9141 | -0.133 | |
| R_CO2t | | -142.109 | -18.5157 | -142.109 | -18.5157 | -96.951 | -18.5157 | -58.2177 | -58.2177 | -56.0798 | |


Table I: (continued).

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_EX_pi_LPAREN_e_RPAREN | | -100 | -4.3167 | -100 | -4.3167 | -117.0109 | -4.3167 | -36.2273 | -36.2273 | -96.4273 | |
| R_COAtm | | -12334.804 | 10037.5 | -12332.3234 | 9963.0148 | -103.8553 | 3.0639 | -30.4917 | -20.8914 | -92.0562 | |
| R_CO2tm | | -89.7359 | 12.5 | -89.7359 | 12.5 | -1.4507 | 12.5 | -6.3378 | -6.3378 | -1.4298 | |
| R_TDCOAtm | | -12.7836 | 366.0135 | -12.7836 | 315.8793 | -12.5241 | 2.7595 | -0.0268 | -0.3148 | 0.2883 | |
| R_AHCYStm | | -6.8469 | 0 | -6.8469 | 0 | -6.8469 | -6.1595 | 0 | 0 | -6.8439 | |
| R_ACPexch | | -62.0866 | 0 | -54.527 | 0 | -11.714 | 0 | 0 | 0 | -1.3703 | |
| R_TAGDPtg | | 0 | 0 | 0 | 0 | -0.1069 | 3.3389 | 0 | 0 | -0.1069 | |
| R_PCtm | | -0.0228 | 0 | -0.0228 | 0 | -0.0228 | -0.0205 | 0 | 0 | -0.0228 | |
| R_12DGRtm | | -312.238 | 450.2068 | -311.694 | 450.1764 | -0.0105 | 4.4381 | 0 | -0.0105 | -0.0105 | |
| R_FA140ACPH | | 0 | 52.9178 | 0 | 0 | 0 | 6.9589 | 0 | 0 | 0.3151 | |
| R_FA140ACPtm | | 0 | 52.9178 | 0 | 0 | 0 | 6.9589 | 0 | 0 | 0.3151 | |
| R_HDDHL5m | | 0 | 62.0866 | 0 | 0 | 0 | 6.9589 | 0 | 0 | 0.3151 | |
| R_HTDHL6m | | 0 | 62.0866 | 0 | 0 | 0 | 6.9589 | 0 | 0 | 0.3151 | |
| R_ACOATAm | | 0 | 62.0866 | 0 | 54.527 | 0 | 11.714 | 0 | 0 | 1.3703 | |
| R_AMETtm | | 0 | 6.8469 | 0 | 6.8469 | 6.1595 | 6.8469 | 0 | 0 | 6.8439 | |
| R_CTPtn | | -100000 | 99996.6196 | -100000 | 99996.6196 | -4.2322 | -3.3804 | 0.0091 | -100000 | -3.756 | |
| R_PS_LMtm | | -0.0379 | 0.0379 | -0.0379 | 0.0379 | 0 | 0 | 0.0326 | 0 | 0 | |
| R_VT | | 0 | 0.6547 | 0 | 0.6547 | 0 | 0 | 0.6453 | 0.6453 | 0 | |
| R_ALINCOAtm | | 0 | 844.1145 | 0 | 842.6451 | 0.6919 | 0.8282 | 0.768 | 0 | 0.7688 | |
| R_GLINCOAtm | | 0 | 844.1145 | 0 | 842.6451 | 0.6919 | 0.8282 | 0.768 | 0 | 0.7688 | |


Table I: (continued).

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_STRCOAtm | | 0 | 1077.7906 | 0 | 1070.2782 | 0.8649 | 14.4992 | 0.96 | 0 | 0.961 | |
| R_PMTCOAtm | | -13.3507 | 986.0614 | -13.3507 | 977.6815 | -12.5723 | 13.562 | 1.1788 | 0.3148 | 0.8649 | |
| R_FA160ACPH | | 0 | 46.2088 | 0 | 46.2088 | 0 | 10.8464 | 1.2591 | 1.2591 | 0.9452 | |
| R_FA160ACPtm | | 0 | 46.2088 | 0 | 46.2088 | 0 | 10.8464 | 1.2591 | 1.2591 | 0.9452 | |
| R_FAS_an_m2 | | -11.4542 | 50.773 | 2.0795 | 50.773 | 0 | 11.714 | 2.3083 | 2.3083 | 1.9955 | |
| R_OLCOAtm | | 0 | 3481.9722 | 0 | 3475.911 | 2.8541 | 3.4163 | 3.1681 | 0 | 3.1713 | |
| R_MALCOAtm | | -12.5 | 17779.1831 | 66.1671 | 17776.9203 | 61.7017 | 93.7119 | 20.8914 | 20.8914 | 82.4463 | |
| R_CO2tp | | -25.7328 | 25559.454 | -25.7328 | 25517.6789 | -12.5781 | 58.2551 | 30.0733 | 192.5698 | 30.6509 | |
| R_PEPt | | -25039.31 | 193.4736 | -24910.0016 | 193.4736 | -12.4106 | 193.4736 | 30.2591 | 192.7557 | 192.7557 | |
| R_PROtm | | -99915.362 | 620.8456 | -99853.8721 | 620.8456 | 0.1197 | 1.5705 | 238.3441 | 238.3441 | 1.5628 | |
| R_EX_h2o_LPAREN_e_RPAREN_ | | 347.6771 | 454.5394 | 347.6771 | 454.5394 | 347.7055 | 421.6144 | 400.1884 | 400.1884 | 398.1942 | |
| R_ADK1m | | 0 | 31169.8248 | 0 | 31168.4296 | 1.7513 | 304.0854 | 12635.8517 | 15.4421 | 198.0668 | |
| R_ASCTm | | 0 | 36860.1845 | 0 | 36857.7886 | 18.0148 | 353.3229 | 12642.4042 | 21.9946 | 205.0263 | |
| R_GDPtm | | 65.2734 | 36996.9458 | 119.7137 | 36994.5499 | 119.4301 | 454.7382 | 12744.9403 | 124.5307 | 306.4168 | |
| R_34HPPt2m | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | 13125.8643 | -100000 | 0 | |
| R_CITtam | | -49910.322 | 50140.3129 | -49910.322 | 50140.1712 | -96.1434 | -15.2241 | 49969.3446 | 90.1084 | -74.6246 | |
| R_H2Otm | | -100000 | 99950.4205 | -100000 | 99945.4025 | -833.0402 | -224.043 | 74461.9344 | -54.6709 | -532.0147 | |
| R_Ru5ptg | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | 99999.4794 | -100000 | 0 | |
| R_ASPtm | | -99785.721 | 99999.7439 | -99785.7209 | 99999.7439 | -46.7629 | -6.2506 | 99999.7157 | 241.2433 | -27.4503 | |
| R_CMPtn | | -99996.653 | 100000 | -99996.6532 | 100000 | -4.2821 | -3.4303 | 99999.9536 | 100000 | -3.8115 | |


Table I: (continued).

| Reaction ID | Associated genes | FVA min (without GIMME) | FVA max (without GIMME) | FVA min (GIMME, threshold 12) | FVA max (GIMME, threshold 12) | FVA min (GIMME, threshold 70) | FVA max (GIMME, threshold 70) | pFBA flux (before GIMME, in ext-iAC560) | pFBA flux (GIMME, threshold 12) | pFBA flux (GIMME, threshold 70) | Protein expression (gene ID) (Pawar et al., 2014) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R_CYTK1n | | -99996.653 | 100000 | -99996.6532 | 100000 | -4.2821 | -3.4303 | 99999.9536 | 100000 | -3.8115 | |
| R_DTTPtn | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | 100000 | 100000 | 0 | |
| R_PHPYRtm | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | 100000 | 99541.1304 | 0 | |
| R_ITPtn | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | 100000 | 100000 | 0 | |
| R_DADPtn | | -100000 | 100000 | -100000 | 100000 | 0 | 0 | 100000 | -0.0555 | 0 | |

genes1- ( ( ( ( ( ( ( ( ( ( (( ( ( ( LmjF05.0500 and LmjF25.1170 ) and LmjF21.0740 ) and LmjF30.3600 ) and LmjF21.1770 ) or ( ( ( ( LmjF05.0500 and LmjF25.1170 ) and LmjF24.0630 ) and LmjF30.3600 ) and LmjF21.1770 )) or ( ( ( ( LmjF05.0500 and LmjF25.1170 ) and LmjF26.0460 ) and LmjF30.3600 ) and LmjF21.1770 )) or ( ( ( ( LmjF05.0500 and LmjF25.1180 ) and LmjF21.0740 ) and LmjF30.3600 ) and LmjF21.1770 )) or ( ( ( ( LmjF05.0500 and LmjF25.1180 ) and LmjF24.0630 ) and LmjF30.3600 ) and LmjF21.1770 )) or ( ( ( ( LmjF05.0500 and LmjF25.1180 ) and LmjF26.0460 ) and LmjF30.3600 ) and LmjF21.1770 )) or ( ( ( ( LmjF05.0510 and LmjF25.1170 ) and LmjF21.0740 ) and LmjF30.3600 ) and LmjF21.1770 )) or ( ( ( ( LmjF05.0510 and LmjF25.1170 ) and LmjF24.0630 ) and LmjF30.3600 ) and LmjF21.1770 )) or ( ( ( ( LmjF05.0510 and LmjF25.1170 ) and LmjF26.0460 ) and LmjF30.3600 ) and LmjF21.1770 )) or ( ( ( ( LmjF05.0510 and LmjF25.1180 ) and LmjF21.0740 ) and LmjF30.3600 ) and LmjF21.1770 )) or ( ( ( ( LmjF05.0510 and LmjF25.1180 ) and LmjF24.0630 ) and LmjF30.3600 ) and LmjF21.1770 )) or ( ( ( ( LmjF05.0510 and LmjF25.1180 ) and LmjF26.0460 ) and LmjF30.3600 ) and LmjF21.1770 ));

genes2- ( ( ( ( ( ( (LmjF01.0470 or LmjF01.0490) or LmjF01.0500) or LmjF01.0510) or LmjF01.0520) or LmjF01.0530) or LmjF03.0230) or LmjF13.0420);

genes3- ( ( ( ( ( ( (LmjF01.0500 or LmjF01.0470) or LmjF01.0490) or LmjF01.0520) or LmjF03.0230) or LmjF01.0510) or LmjF01.0530) or LmjF13.0420);

pFBA- Parsimonious Flux Balance Analysis; FVA- Flux Variability Analysis; GIMME- Gene Inactivity Moderated by Metabolism and Expression.

3.6 Appendix 3b

Biomass formulation

In the original genome-scale metabolic model iAC560 of L. major, the percentage of protein, RNA, and carbohydrate was estimated from experimental data for the protozoan Tetrahymena (Gates et al. 1982; Hellung-Larsen and Andersen 1989). Similarly, the percent dry weight of the polyamine pools was taken from Leishmania promastigotes (L. enrietti and L. donovani) (Beach et al. 1979; Glew et al. 1988), and the lipid components from the metabolic network reconstructions of H. pylori and M. barkeri (Feist et al. 2006; Thiele et al. 2005). In the present study, the stoichiometric coefficients of biomass components such as protein, DNA, and RNA were estimated from a recent experimental study on L. donovani (Sharma et al. 2017) (Table I).

Table I: Percent fraction of macromolecules in biomass.

Metabolic model              Stage                              Protein  DNA   RNA   Carbohydrate  Lipid  Polyamine pools  Total
Original model (iAC560)      Promastigote                       45       1.6   11    27            15     0.4              100
Extended model (ext-iAC560)  Promastigote (Sharma et al. 2017)  50.9     2.7   4.5   26.5          15     0.4              100
Extended model (ext-iAC560)  Amastigote (estimated)             46.1     2.4   4.1   33.4          13.6   0.4              100

The carbohydrate fraction mainly comprises mannan, LPG, GIPL, and N-glycans. The composition of each carbohydrate fraction was calculated by taking mannan as 80% and 90% of all carbohydrates in the promastigote and amastigote stages, respectively (Ralton et al. 2003). For the biomass fractions of the remaining carbohydrates, we considered studies of the differential expression of LPG, GIPL, and N-glycans (Turco and Sacks 1991; McConville and Ferguson 1993; Descoteaux and Turco 1993; Kink and Chang 1988). The percent fractions calculated from this experimental evidence are given in Table II. The membrane-bound proteophosphoglycans (mPPGs) are structurally similar to LPG and are expressed in lower proportion than LPG and GIPL in the promastigote stage (Ilg 2000). In addition, the biosynthesis of mPPG and the enzymes involved in the process have not been fully defined. For these reasons, mPPGs were not considered in the present study.

Table II: Biomass composition (percent fraction) of mannan, LPG, GIPL and N-glycans in the promastigote and amastigote stages of L. major.

Stage         Mannan  LPG    GIPL   N-Glycans  Total
Promastigote  10.05   6.25   10.00  0.50       26.5
Amastigote    30.06   0.37   2.94   0.04       33.4

Sugar nucleotides, mainly UDP-Glucose (UDP-Glc), UDP-Galactose (UDP-Gal), UDP-N-acetylglucosamine (UDP-GlcNAc), GDP-Mannose (GDP-Man), UDP-Galactofuranose (UDP-Galf), and GDP-Arabinose (GDP-Ara), are the building blocks of the different carbohydrate macromolecules in the parasite. To maintain the consistency of the model, we introduced these sugar nucleotides into the biomass equation instead of the complex carbohydrate macromolecules.

The molar fraction of each sugar nucleotide in each carbohydrate macromolecule was calculated first. From the molar fractions, the stoichiometric coefficient of each building block (in mmol per gram of dry cell weight, gDW) was then calculated, separately for the promastigote and amastigote stages of the parasite's life cycle. As an example, the biomass composition (mmol/gDW) of UDP-Gal in the promastigote stage was calculated from the data in Table III as follows.

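\[
\text{UDP-Gal (mmol/gDW)} = \frac{\text{g LPG}}{\text{g DW}} \times \frac{1}{\mathrm{MW(LPG)}} \times \frac{\text{mol UDP-Gal}}{\text{mol LPG}} \times 1000\,\frac{\text{mmol}}{\text{mol}} = 0.186
\]

where

\[
\mathrm{MW(LPG)} = \sum_{i} \left[ \frac{\text{mol SugNuc}_i}{\text{mol LPG}} \times \mathrm{MW}(\text{SugNuc}_i) \right]
\]

In the example, MW (LPG) = 156.16 g LPG / mol LPG.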

Table III: The molar ratio of each sugar nucleotide in LPG. The biomass composition was calculated for the promastigote stage.

Sugar nucleotide  Molar ratio             MW       Biomass composition
(SugNuc)          (mol SugNuc/mol LPG)    (g/mol)  (mmol/gDW)
UDP-Glc           0.0000                  162.30   0.00000
UDP-Gal           0.47059                 162.30   0.05249
UDP-GlcNAc        0.01471                 203.35   0.00164
GDP-Man           0.27941                 162.14   0.03117
UDP-Galf          0.01471                 160.28   0.00164
GDP-Ara           0.22059                 132.11   0.02460
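The calculation described above can be checked numerically. The following minimal sketch recomputes MW(LPG) as the molar-ratio-weighted sum of the molecular weights in Table III, and then derives a UDP-Gal coefficient under the assumption that the coefficient equals (g LPG per gDW) / MW(LPG) × molar ratio × 1000. The LPG weight fraction of 6.25% is taken from Table II; the function and variable names are illustrative and are not taken from the thesis scripts.

```python
# Molar ratios (mol SugNuc / mol LPG) and molecular weights (g/mol) from Table III.
TABLE_III = {
    "UDP-Glc":    (0.00000, 162.30),
    "UDP-Gal":    (0.47059, 162.30),
    "UDP-GlcNAc": (0.01471, 203.35),
    "GDP-Man":    (0.27941, 162.14),
    "UDP-Galf":   (0.01471, 160.28),
    "GDP-Ara":    (0.22059, 132.11),
}


def average_mw(composition):
    """Molar-ratio-weighted molecular weight of the macromolecule (g/mol)."""
    return sum(ratio * mw for ratio, mw in composition.values())


def coefficient_mmol_per_gdw(macro_fraction, macro_mw, molar_ratio):
    """Assumed formula: g macromolecule per gDW -> mmol building block per gDW."""
    return macro_fraction / macro_mw * molar_ratio * 1000.0


mw_lpg = average_mw(TABLE_III)
# Assumption: LPG is 6.25% of promastigote dry weight (Table II).
udp_gal = coefficient_mmol_per_gdw(0.0625, mw_lpg, TABLE_III["UDP-Gal"][0])

print(f"MW(LPG) = {mw_lpg:.2f} g/mol")
print(f"UDP-Gal = {udp_gal:.3f} mmol/gDW")
```

The recomputed MW(LPG) agrees with the 156.16 g/mol quoted in the example to within rounding. With these assumed inputs the UDP-Gal coefficient comes out close to, but not exactly at, the reported 0.186 mmol/gDW, so the inputs used in the original spreadsheet may differ slightly from the ones assumed here.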

After the sugar nucleotides were included in the biomass equation, the stoichiometric coefficient of each compound was converted into g/gDW in order to check the total weight of biomass produced. A correction factor was then determined for each stoichiometric coefficient to balance the biomass equation so that it produces 1 g of DW. The Excel spreadsheet with the biomass calculations is provided in Supplementary Material 3.
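The weight check and rescaling step can be illustrated with a small sketch. It assumes a single uniform correction factor applied to all coefficients, which is one simple way to make the equation sum to 1 g/gDW; the spreadsheet in Supplementary Material 3 may apply the correction differently. The three components and their values are hypothetical and serve only to show the arithmetic.

```python
def biomass_weight(coeffs_mmol, mw_g_per_mol):
    """Total weight (g/gDW) of a biomass equation with coefficients in mmol/gDW."""
    return sum(coeffs_mmol[m] * mw_g_per_mol[m] / 1000.0 for m in coeffs_mmol)


def rescale_to_one_gram(coeffs_mmol, mw_g_per_mol):
    """Return the uniform correction factor and the rescaled coefficients."""
    factor = 1.0 / biomass_weight(coeffs_mmol, mw_g_per_mol)
    return factor, {m: c * factor for m, c in coeffs_mmol.items()}


# Hypothetical components (not values from the thesis).
coeffs = {"compound_A": 2.0, "compound_B": 1.0, "compound_C": 0.5}   # mmol/gDW
weights = {"compound_A": 100.0, "compound_B": 300.0, "compound_C": 500.0}  # g/mol

before = biomass_weight(coeffs, weights)
factor, corrected = rescale_to_one_gram(coeffs, weights)
after = biomass_weight(corrected, weights)

print(f"weight before correction: {before:.3f} g/gDW")
print(f"correction factor:        {factor:.4f}")
print(f"weight after correction:  {after:.3f} g/gDW")
```

A uniform factor preserves the relative proportions of the components, which is why it is used in this sketch.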

Biomass Equations

The complete biomass equations for the two metabolic stages are given below.

Promastigote (inside sand fly)

0.48942(Alanine) + 0.29238(Arginine) + 0.10638(Asparagine) + 0.19758(Aspartic acid) + 0.07696(Cysteine) + 0.2438(Glutamate) + 0.16662(Glutamine) + 0.26266(Glycine) + 0.10983(Histidine) + 0.12129(Isoleucine) + 0.37407(Leucine) + 0.13536(Lysine) + 0.09183(Methionine) + 0.12(Phenylalanine) + 0.23523(Proline) + 0.36495(Serine) + 0.24375(Threonine) + 0.04387(Tryptophan) + 0.09747(Tyrosine) + 0.29138(Valine) + 0.01693(dAMP) + 0.02508(dCMP) + 0.02508(dGMP) + 0.01693(dTMP) + 0.02597(AMP) + 0.04159(CMP) + 0.04232(GMP) + 0.02452(UMP) + 0.05715(ergosterol) + 0.02584(triglyceride) + 0.01239(zymosterol) + 0.00128(Diacylglycerol) + 0.00149(Monoacylglycerol) + 0.02685(Phosphatidylethanolamine) + 0.06222(Phosphatidylcholine) + 0.00925(Phosphatidylinositol) + 0.00206(cardiolipin) + 0.00337(Putrescine) + 0.00068(Spermidine) + 32.61204(Water) + 32.61204(ATP) + 0.00146(Mannan) + 0.00096(UDP-Glc) + 0.26269(UDP-Gal) + 0.13971(UDP-GlcNAc) + 0.45926(GDP-Man) + 0.05633(UDP-Galf) + 0.08764(GDP-Ara) → 32.26(h) + 32.26(pi) + 32.26(adp) + 0.14739(udp) + 0.17799(gdp) + 1(Biom)

Amastigote (inside human host)

0.44165(Alanine) + 0.26384(Arginine) + 0.09600(Asparagine) + 0.17829(Aspartic acid) + 0.06945(Cysteine) + 0.22000(Glutamate) + 0.15036(Glutamine) + 0.23702(Glycine) + 0.09911(Histidine) + 0.10945(Isoleucine) + 0.33756(Leucine) + 0.12215(Lysine) + 0.08287(Methionine) + 0.10829(Phenylalanine) + 0.21227(Proline) + 0.32933(Serine) + 0.21996(Threonine) + 0.03959(Tryptophan) + 0.08796(Tyrosine) + 0.26294(Valine) + 0.01528(dAMP) + 0.02263(dCMP) + 0.02263(dGMP) + 0.01528(dTMP) + 0.02344(AMP) + 0.03753(CMP) + 0.03819(GMP) + 0.02213(UMP) + 0.05158(ergosterol) + 0.02332(triglyceride) + 0.01118(zymosterol) + 0.00116(Diacylglycerol) + 0.00134(Monoacylglycerol) + 0.02423(Phosphatidylethanolamine) + 0.05615(Phosphatidylcholine) + 0.00835(Phosphatidylinositol) + 0.00186(cardiolipin) + 0.00304(Putrescine) + 0.00062(Spermidine) + 32.48072(Water) + 32.48072(ATP) + 0.00433(Mannan) + 0.00007(UDP-Glc) + 0.03370(UDP-Galp) + 0.03852(UDP-GlcNAc) + 0.10587(GDP-Man) + 0.01538(UDP-Galf) + 0.00523(GDP-Ara) → 32.26(h) + 32.26(pi) + 32.26(adp) + 0.07038(udp) + 0.08917(gdp) + 1(Biom)

In-silico media for promastigote and amastigote

Modified Media for Promastigote (MMP)

L-arginine (arg-L), L-cysteine (cys-L), L-histidine (his-L), L-isoleucine (ile-L), L-leucine (leu-L), L-lysine (lys-L), L-methionine (met-L), L-phenylalanine (phe-L), L-threonine (thr-L), L-tyrosine (tyr-L), L-valine (val-L) and L-proline (pro-L), hypoxanthine (hxan), phosphate, oxygen, and D-glucose (glc-D)

Modified Media for Amastigote (MMA)

L-arginine (arg-L), L-cysteine (cys-L), L-histidine (his-L), L-isoleucine (ile-L), L-leucine (leu-L), L-lysine (lys-L), L-methionine (met-L), L-phenylalanine (phe-L), L-threonine (thr-L), L-tyrosine (tyr-L), L-valine (val-L) and L-proline (pro-L), hypoxanthine (hxan), phosphate, oxygen, D-glucose (glc-D), stearic acid (strAcid), D-glucosamine (GlcN), N-acetyl-D-glucosamine (GlcNAc), phosphatidylethanolamine (peLM) and L-alanine (ala-L)

Table IV: Flux constraints set to simulate the growth conditions, and predicted pFBA fluxes showing the consumption of nutrients, in the promastigote and amastigote conditions.

                                             Promastigote stage                   Amastigote stage
                                             Flux constraints    Predicted flux   Flux constraints    Predicted flux
Uptake reaction                              Minimum   Maximum   (pFBA)           Minimum   Maximum   (pFBA)
R_EX_hxan_LPAREN_e_RPAREN_                   -1000     0         -0.3387          -1000     0         -0.0528
R_EX_val_DASH_L_LPAREN_e_RPAREN_             -1000     0         -0.0944          -1000     0         -0.0296
R_EX_o2_LPAREN_e_RPAREN_                     -1000     0         -237.5243        -23.7524  0         -23.7524
R_EX_thr_DASH_L_LPAREN_e_RPAREN_             -1000     0         -3.5482          -1000     0         -0.0886
R_EX_his_DASH_L_LPAREN_e_RPAREN_             -1000     0         -0.0356          -1000     0         -0.0111
R_EX_trp_DASH_L_LPAREN_e_RPAREN_             -1000     0         -0.0142          -1000     0         -0.0044
R_EX_pi_LPAREN_e_RPAREN_                     -1000     0         -36.5638         -1000     0         -0.2913
R_EX_phe_DASH_L_LPAREN_e_RPAREN_             -1000     0         -0.5521          -1000     0         -0.0221
R_EX_pro_DASH_L_LPAREN_e_RPAREN_             -1000     0         -0.2092          -0.0209   0         -0.0209
R_EX_arg_DASH_L_LPAREN_e_RPAREN_             -1000     0         -0.0962          -1000     0         -0.0302
R_EX_lys_DASH_L_LPAREN_e_RPAREN_             -1000     0         -0.0438          -1000     0         -0.0137
R_EX_met_DASH_L_LPAREN_e_RPAREN_             -1000     0         -0.0297          -1000     0         -0.0093
R_EX_cys_DASH_L_LPAREN_e_RPAREN_             -1000     0         -0.0249          -1000     0         -0.0078
R_EX_b_DASH_D_DASH_glucose_LPAREN_e_RPAREN_  -100      0         -100             -10       0         -10
R_EX_ile_DASH_L_LPAREN_e_RPAREN_             -100      0         -100             -1000     0         -0.0123
R_EX_leu_DASH_L_LPAREN_e_RPAREN_             -1000     0         -61.6353         -1000     0         -13.1942
R_EX_ala_DASH_L_LPAREN_e_RPAREN_             0         0         0                -1000     0         -7.9042
step_EX_peLM_exchange                        0         0         0                -1000     0         -0.0102
step_EX_strAcid_exchange                     0         0         0                -100      0         -1.2466
step_EX_Nacetyl_Glucosamine_exchange         0         0         0                -100      0         -0.0048
R_EX_asp_DASH_L_LPAREN_e_RPAREN_             0         0         0                -100      0         -58.9703
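To read Table IV, it helps to separate uptake reactions whose predicted flux sits exactly on its lower bound (the constraint is limiting) from those with slack. The sketch below does this for a subset of rows copied from Table IV; the helper names are illustrative and not part of the thesis scripts.

```python
# Subset of Table IV: (reaction, promastigote min, max, pFBA flux,
#                      amastigote min, max, pFBA flux).
ROWS = [
    ("R_EX_o2_LPAREN_e_RPAREN_", -1000, 0, -237.5243, -23.7524, 0, -23.7524),
    ("R_EX_pro_DASH_L_LPAREN_e_RPAREN_", -1000, 0, -0.2092, -0.0209, 0, -0.0209),
    ("R_EX_b_DASH_D_DASH_glucose_LPAREN_e_RPAREN_", -100, 0, -100, -10, 0, -10),
    ("R_EX_leu_DASH_L_LPAREN_e_RPAREN_", -1000, 0, -61.6353, -1000, 0, -13.1942),
    ("R_EX_ala_DASH_L_LPAREN_e_RPAREN_", 0, 0, 0, -1000, 0, -7.9042),
]
TOL = 1e-6


def within_bounds(lower, upper, flux):
    """True when the predicted flux respects its flux constraints."""
    return lower - TOL <= flux <= upper + TOL


def uptake_limited(lower, flux):
    """True when uptake is allowed (lower < 0) and the flux reaches that bound."""
    return lower < 0 and abs(flux - lower) <= TOL


limiting_promastigote = [r[0] for r in ROWS if uptake_limited(r[1], r[3])]
limiting_amastigote = [r[0] for r in ROWS if uptake_limited(r[4], r[6])]
all_feasible = all(
    within_bounds(r[1], r[2], r[3]) and within_bounds(r[4], r[5], r[6])
    for r in ROWS
)

print("limiting in promastigote:", limiting_promastigote)
print("limiting in amastigote:  ", limiting_amastigote)
print("all fluxes within bounds:", all_feasible)
```

In this subset, glucose uptake is limiting in both stages, while oxygen and proline uptake become limiting only under the tighter amastigote constraints.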

3.7 Appendix 3c

Table I: Distribution of the metabolic genes in the transcriptomic and proteomic studies (for promastigote L. major) at a threshold equal to 70.

                   Protein expression (Pawar et al., 2014)
Gene expression    Yes      No       Total
< threshold        325      142      467
> threshold        76       23       99

Note: Four metabolic genes (LmjF34.4330, LmjF31.3130, LmjF31.2470, and LmjF20.1550) in the transcriptomic study do not have FPKM values.
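The counts in Table I can be turned into row-wise proportions to compare protein-level evidence below and above the expression threshold. The sketch below only re-expresses the numbers of Table I; the dictionary layout and names are illustrative.

```python
# Counts from Table I (threshold = 70): genes with / without protein evidence.
COUNTS = {
    "< threshold": {"yes": 325, "no": 142},
    "> threshold": {"yes": 76, "no": 23},
}

row_totals = {row: c["yes"] + c["no"] for row, c in COUNTS.items()}
fraction_with_protein = {
    row: c["yes"] / row_totals[row] for row, c in COUNTS.items()
}
genes_with_fpkm = sum(row_totals.values())

for row in COUNTS:
    print(f"{row}: {row_totals[row]} genes, "
          f"{fraction_with_protein[row]:.1%} with protein evidence")
print("genes with FPKM values:", genes_with_fpkm)
```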

3.8 Appendix 3d

All supplementary materials are provided at https://github.com/shakyawar/SupplMaterial_PhD

A brief description of programming scripts written for performing various analyses in this study

(Available at https://github.com/shakyawar/SupplMaterial_PhD)

Script set 1 (Java): performs pFBA in conjunction with GIMME over varying gene expression threshold values.

Script set 2 (Java): performs reaction knockout simulations, in which a reaction is knocked out and the biomass flux is computed to determine whether the knockout is lethal or non-lethal.

Script set 3 (Python): performs general analyses using omics data (e.g. transcriptomics and proteomics) and the metabolic model ext-iAC560, such as finding the genes common to all studies and a density analysis of the FPKM values of all genes considered in the transcriptomic study.
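As a reminder of what the expression threshold in script set 1 controls: in GIMME (Becker and Palsson 2008), a reaction whose expression lies below the threshold receives a penalty equal to the shortfall, and the penalised flux is then minimised subject to the required biomass production. The sketch below computes only these penalty weights for the two thresholds used in this work (12 and 70). The reaction names and expression values are hypothetical, and the Java scripts in the repository may differ in detail.

```python
def gimme_penalties(expression, threshold):
    """Penalty per reaction: shortfall below the threshold, zero otherwise."""
    return {rxn: max(0.0, threshold - value) for rxn, value in expression.items()}


# Hypothetical reaction-level expression scores (e.g. derived from FPKM values).
expression = {"rxn_A": 5.0, "rxn_B": 40.0, "rxn_C": 150.0}

penalties_12 = gimme_penalties(expression, 12)
penalties_70 = gimme_penalties(expression, 70)

print("threshold 12:", penalties_12)
print("threshold 70:", penalties_70)
```

Raising the threshold from 12 to 70 penalises more reactions and penalises them more strongly, which is why the flux ranges in the FVA tables narrow at the higher threshold.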

3.9 References

Assis, R.R. et al., 2012. Glycoinositolphospholipids from Leishmania braziliensis and L. infantum: Modulation of innate immune system and variations in carbohydrate structure. PLoS Neglected Tropical Diseases, 6(2), p.e1543.
Beach, D.H., Holz, G.G. Jr and Anekwe, G.E., 1979. Lipids of Leishmania promastigotes. J Parasitol, 65(2), pp.201–216.
Becker, S.A. and Palsson, B.O., 2008. Context-specific metabolic networks are consistent with experiments. PLoS Computational Biology, 4(5), p.e1000082.
Beste, D.J. et al., 2007. GSMN-TB: a web-based genome-scale network model of Mycobacterium tuberculosis metabolism. Genome biology, 8(5), p.R89.
Byers, D.M. and Gong, H., 2007. Acyl carrier protein: structure-function relationships in a conserved multifunctional protein family. Biochemistry and cell biology, 85(6), pp.649–662.
Carey, M.A., Papin, J.A. and Guler, J.L., 2017. Novel Plasmodium falciparum metabolic network reconstruction identifies shifts associated with clinical antimalarial resistance. BMC Genomics, 18(1), p.543.
Caspi, R. et al., 2014. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Research, 42(D1), pp.D459–D471.
Chandrasekaran, S. and Price, N.D., 2010. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proceedings of the National Academy of Sciences of the United States of America, 107(41), pp.17845–50.
Chavali, A.K. et al., 2012. Metabolic network analysis predicts efficacy of FDA-approved drugs targeting the causative agent of a neglected tropical disease. BMC Systems Biology, 6(1), p.27.
Chavali, A.K. et al., 2008. Systems analysis of metabolism in the pathogenic trypanosomatid Leishmania major. Molecular Systems Biology, 4, p.177.
Correia, S., 2016. Ph.D Thesis. University of Minho, http://repositorium.sdum.uminho.pt/handle/1822/45264.
Costa, T.L. et al., 2011. Energetic metabolism of axenic promastigotes of Leishmania (Viannia) braziliensis. Experimental Parasitology, 128(4), pp.438–443.
Coustou, V. et al., 2008. Glucose-induced remodeling of intermediary and energy metabolism in procyclic Trypanosoma brucei. Journal of Biological Chemistry, 283(24), pp.16343–16354.
Degrossoli, A. et al., 2011. The Influence of Low Oxygen on Macrophage Response to Leishmania Infection. Scandinavian Journal of Immunology, 74(2), pp.165–175.
Descoteaux, A. and Turco, S.J., 1993. The lipophosphoglycan of Leishmania and macrophage protein kinase C. Parasitology Today, 9(12), pp.468–471.
Doyle, M.A. et al., 2009. LeishCyc: a biochemical pathways database for Leishmania major. BMC systems biology, 3, p.57.
Eggimann, G.A. et al., 2015. The role of phosphoglycans in the susceptibility of Leishmania mexicana to the temporin family of anti-microbial peptides. Molecules, 20(2), pp.2775–2785.
Feist, A.M. et al., 2006. Modeling methanogenesis with a genome-scale metabolic reconstruction of Methanosarcina barkeri. Molecular systems biology, 2, p.2006.0004.
Ferré, P. and Foufelle, F., 2007. SREBP-1c transcription factor and lipid homeostasis: Clinical perspective. Hormone Research, 68(2), pp.72–82.
Garami, A. and Ilg, T., 2001. The Role of Phosphomannose Isomerase in Leishmania mexicana Glycoconjugate Synthesis and Virulence. Journal of Biological Chemistry, 276(9), pp.6566–6575.
Gates, M.A., Rogerson, A. and Berger, J., 1982. Dry to wet weight biomass conversion constant for Tetrahymena elliotti (Ciliophora, Protozoa). Oecologia, 55(2), pp.145–148.
Glew, R.H. et al., 1988. Biochemistry of the Leishmania species. Microbiol Rev, 52, pp.412–432.
Goto, H. and Lindoso, J.A., 2010. Current diagnosis and treatment of cutaneous and mucocutaneous leishmaniasis. Expert Rev Anti Infect Ther, 8(4), pp.419–433.
Green, S.J. et al., 1990. Leishmania major amastigotes initiate the L-arginine-dependent killing mechanism in IFN-gamma-stimulated macrophages by induction of tumor necrosis factor-alpha. Journal of immunology (Baltimore, Md. : 1950), 145(12), pp.4290–7.
Großeholz, R. et al., 2016. Integrating highly quantitative proteomics and genome-scale metabolic modeling to study pH adaptation in the human pathogen Enterococcus faecalis. npj Systems Biology and Applications, 2(1), p.16017.
Gu, X. et al., 2013. Mathematical Modelling of Polyamine Metabolism in Bloodstream-Form Trypanosoma brucei: An Application to Drug Target Identification. PLoS ONE, 8(1), p.e53734.
Guerin, P.J. et al., 2002. Visceral leishmaniasis: current status of control, diagnosis, and treatment, and a proposed research and development agenda. The Lancet infectious diseases, 2(8), pp.494–501.
Hart, D.T. and Coombs, G.H., 1982. Leishmania mexicana: Energy metabolism of amastigotes and promastigotes. Experimental Parasitology, 54(3), pp.397–409.
Van Hellemond, J.J., Bakker, B.M. and Tielens, A.G.M., 2005. Energy metabolism and its compartmentation in Trypanosoma brucei. Advances in Microbial Physiology, 50, pp.199–226.
Van Hellemond, J.J., Opperdoes, F.R. and Tielens, A.G.M., 1998. Trypanosomatidae produce acetate via a mitochondrial acetate:succinate CoA transferase. Proceedings of the National Academy of Sciences, 95(6), pp.3036–3041.
Hellung-Larsen, P. and Andersen, A.P., 1989. Cell volume and dry weight of cultured Tetrahymena. Journal of cell science, 92, pp.319–324.
Ilg, T., 2000. Proteophosphoglycans of Leishmania. Parasitology Today, 16(11), pp.489–497.
Jamshidi, N. and Palsson, B., 2010. Mass action stoichiometric simulation models: Incorporating kinetics and regulation into stoichiometric models. Biophysical Journal, 98(2), pp.175–185.
Jensen, P.A. and Papin, J.A., 2011. Functional integration of a metabolic network model and expression data without arbitrary thresholding. Bioinformatics, 27(4), pp.541–547.
Joyce, A.R. and Palsson, B.O., 2006. The model organism as a system: integrating "omics" data sets. Nature reviews. Molecular cell biology, 7(3), pp.198–210.
Kaltdorf, M. et al., 2016. Systematic identification of anti-fungal drug targets by a metabolic network approach. Frontiers in Molecular Biosciences, 3, pp.1–19.
Kanehisa, M. and Goto, S., 2000. Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 28, pp.27–30.
Kim, H.U., Kim, T.Y. and Lee, S.Y., 2010. Genome-scale metabolic network analysis and drug targeting of multi-drug resistant pathogen Acinetobacter baumannii AYE. Molecular bioSystems, 6(2), pp.339–348.
Kim, M. et al., 2016. Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli. Nature Communications, 7, p.13090.
Kink, J.A. and Chang, K.P., 1988. N-Glycosylation as a biochemical basis for virulence in Leishmania mexicana amazonensis. Molecular and Biochemical Parasitology, 27(2–3), pp.181–190.
Ter Kuile, B.H., 1999. Regulation and adaptation of glucose metabolism of the parasitic protist Leishmania donovani at the enzyme and mRNA levels. Journal of Bacteriology, 181(16), pp.4863–4872.
Leifso, K. et al., 2007. Genomic and proteomic expression analysis of Leishmania promastigote and amastigote life stages: The Leishmania genome is constitutively expressed. Molecular and Biochemical Parasitology, 152(1), pp.35–46.
Lillico, S. et al., 2003. Essential roles for GPI-anchored proteins in African trypanosomes revealed using mutants deficient in GPI8. Molecular biology of the cell, 14(3), pp.1182–1194.
Ludin, P. et al., 2012. In silico prediction of antimalarial drug target candidates. Int J Parasitol Drugs Drug Resist, 2, pp.191–199.
Ma, S. et al., 2015. Integrated Modeling of Gene Regulatory and Metabolic Networks in Mycobacterium tuberculosis. PLoS Computational Biology, 11(11), p.e1004543.
Madden, T., 2013. The BLAST sequence analysis tool. The BLAST Sequence Analysis Tool, pp.1–17.
Mahnke, A. et al., 2014. Hypoxia in Leishmania major skin lesions impairs the NO-dependent leishmanicidal activity of macrophages. The Journal of investigative dermatology, 134(9), pp.2339–2346.

McConville, M.J. et al., 2015. Leishmania carbon metabolism in the macrophage phagolysosome- feast or famine? F1000Research, 4(F1000 Faculty Rev), p.938.
McConville, M.J. and Blackwell, J.M., 1991. Developmental changes in the glycosylated phosphatidylinositols of Leishmania donovani. Characterization of the promastigote and amastigote glycolipids. The Journal of biological chemistry, 266(23), pp.15170–15179.
McConville, M.J. and Ferguson, M.A., 1993. The structure, biosynthesis and function of glycosylated phosphatidylinositols in the parasitic protozoa and higher eukaryotes. Biochemical Journal, 294(Pt 2), pp.305–324.
McConville, M.J. and Naderer, T., 2011. Metabolic pathways required for the intracellular survival of Leishmania. Annual Review of Microbiology, 65, pp.543–61.
Merlen, T. et al., 1999. Leishmania spp.: Completely defined medium without serum and macromolecules (CDM/LP) for the continuous in vitro cultivation of infective promastigote forms. American Journal of Tropical Medicine and Hygiene, 60(1), pp.41–50.
Murray, H.W. et al., 2005. Advances in leishmaniasis. Lancet, 366(9496), pp.1561–1577.
Naderer, T. et al., 2015. Intracellular Survival of Leishmania major Depends on Uptake and Degradation of Extracellular Matrix Glycosaminoglycans by Macrophages. PLoS Pathogens, 11(9), pp.1–20.
Naderer, T. et al., 2006. Virulence of Leishmania major in macrophages and mice requires the gluconeogenic enzyme fructose-1,6-bisphosphatase. Proceedings of the National Academy of Sciences of the United States of America, 103(14), pp.5502–5507.
Naderer, T., Heng, J. and McConville, M.J., 2010. Evidence that intracellular stages of Leishmania major utilize amino sugars as a major carbon source. PLoS Pathogens, 6(12), p.e1001245.
Naderer, T. and McConville, M.J., 2008. The Leishmania-macrophage interaction: A metabolic perspective. Cellular Microbiology, 10(2), pp.301–308.
Naderer, T., Wee, E. and McConville, M.J., 2008. Role of hexosamine biosynthesis in Leishmania growth and virulence. Molecular Microbiology, 69(4), pp.858–869.
Nii-Trebi, N.I., 2017. Emerging and Neglected Infectious Diseases: Insights, Advances, and Challenges. BioMed Research International, 2017, p.5245021.
Oldberg, R.P. and Brunengraber, H., 1980. Contributions of cytosolic and mitochondrial acetyl-CoA syntheses to the activation of lipogenic acetate in rat liver. Adv Exp Med Biol., 132, pp.413–418.
Opperdoes, F.R. and Coombs, G.H., 2007. Metabolism of Leishmania: proven and predicted. Trends in Parasitology, 23(4), pp.149–158.
Passero, L.F.D. et al., 2015. Differential modulation of macrophage response elicited by glycoinositolphospholipids and lipophosphoglycan from Leishmania (Viannia) shawi. Parasitology International, 64(4), pp.32–35.
Pawar, H. et al., 2014. Neglected Tropical Diseases and Omics Science: Proteogenomics Analysis of the Promastigote Stage of Leishmania major Parasite. Omics : a journal of integrative biology, 18(8), pp.1–14.
Peters, N.C. et al., 2008. In vivo imaging reveals an essential role for neutrophils in leishmaniasis transmitted by sand flies. Science (New York, N.Y.), 321(5891), pp.970–4.
Plata, G. et al., 2010. Reconstruction and flux-balance analysis of the Plasmodium falciparum metabolic network. Molecular Systems Biology, 6(408), p.408.
Ponte-Sucre, A. et al., 2017. Drug resistance and treatment failure in leishmaniasis: A 21st century challenge. PLOS Neglected Tropical Diseases, 11(12), p.e0006052.
Qidwai, T. et al., 2014. Antimalarial Drug Targets and Drugs Targeting Dolichol Metabolic Pathway of Plasmodium falciparum. Current Drug Targets, 15, pp.374–409.
Raghunathan, A. et al., 2010. Systems approach to investigating host-pathogen interactions in infections with the biothreat agent Francisella. Constraints-based model of Francisella tularensis. BMC Systems Biology, 4(1), p.118.
Rainey, P.M. and MacKenzie, N.E., 1991. A carbon-13 nuclear magnetic resonance analysis of the products of glucose metabolism in Leishmania pifanoi amastigotes and promastigotes. Molecular and biochemical parasitology, 45(2), pp.307–315.
Ralton, J.E. et al., 2003. Evidence that Intracellular β1-2 Mannan Is a Virulence Factor in Leishmania Parasites. Journal of Biological Chemistry, 278(42), pp.40757–40763.
Rastrojo, A. et al., 2013. The transcriptome of Leishmania major in the axenic promastigote stage: transcript annotation and relative expression levels by RNA-seq. BMC genomics, 14(1), p.223.
Rocha, I. et al., 2010. OptFlux: an open-source software platform for in silico metabolic engineering. BMC systems biology, 4(1), p.45.
Rodriguez-Contreras, D. and Hamilton, N., 2014. Gluconeogenesis in Leishmania mexicana: Contribution of glycerol kinase, phosphoenolpyruvate carboxykinase, and pyruvate phosphate dikinase. Journal of Biological Chemistry, 289(47), pp.32989–33000.
Rosenzweig, D. et al., 2008. Retooling Leishmania metabolism: from sand fly gut to human macrophage. FASEB journal : official publication of the Federation of American Societies for Experimental Biology, 22(2), pp.590–602.
Saha, R., Chowdhury, A. and Maranas, C.D., 2014. Recent advances in the reconstruction of metabolic models and integration of omics data. Current Opinion in Biotechnology, 29(1), pp.39–45.
Saunders, E.C. et al., 2010. Central carbon metabolism of Leishmania parasites. Parasitology, 137(9), pp.1303–1313.
Saunders, E.C. et al., 2014. Induction of a Stringent Metabolic Response in Intracellular Stages of Leishmania mexicana Leads to Increased Dependence on Mitochondrial Metabolism. PLoS Pathogens, 10(1), p.e1003888.
Saunders, E.C. et al., 2011. Isotopomer profiling of Leishmania mexicana promastigotes reveals important roles for succinate fermentation and aspartate uptake in Tricarboxylic Acid Cycle (TCA) anaplerosis, glutamate synthesis, and growth. Journal of Biological Chemistry, 286(31), pp.27706–27717.
Schuster, F.L. and Sullivan, J.J., 2002. Cultivation of clinically significant hemoflagellates. Clinical Microbiology Reviews, 15(3), pp.374–389.
Shameer, S. et al., 2015. TrypanoCyc: A community-led biochemical pathways database for Trypanosoma brucei. Nucleic Acids Research, 43(D1), pp.D637–D644.
Sharma, M. et al., 2017. A systematic reconstruction and constraint-based analysis of Leishmania donovani metabolic network: identification of potential antileishmanial drug targets. Mol. BioSyst., 277, pp.38245–38253.
Shlomi, T. et al., 2008. Network-based prediction of human tissue-specific metabolism. Nature biotechnology, 26(9), pp.1003–1010.
Silva, M.F.L. et al., 2012. Leishmania amazonensis arginase compartmentalization in the glycosome is important for parasite infectivity. PLoS ONE, 7(3), p.e34022.
Srivastava, S. et al., 2013. Leishmania expressed lipophosphoglycan interacts with Toll-like receptor (TLR)-2 to decrease TLR-9 expression and reduce anti-leishmanial responses. Clinical and Experimental Immunology, 172(3), pp.403–409.
Subramanian, A. and Sarkar, R.R., 2017. Revealing the mystery of metabolic adaptations using a genome scale model of Leishmania infantum. Scientific Reports, 7(1), p.10262.
Suzuki, E. et al., 2008. Trypanosomatid and fungal glycolipids and sphingolipids as infectivity factors and potential targets for development of new therapeutic strategies. Biochimica et Biophysica Acta - General Subjects, 1780(3), pp.362–369.
Thiele, I. et al., 2005. Expanded metabolic reconstruction of Helicobacter pylori (iIT341 GSM/GPR): An in silico genome-scale characterization of single- and double-deletion mutants. Journal of Bacteriology, 187(16), pp.5818–5830.
Tielens, A.G.M. and van Hellemond, J.J., 2009. Surprising variety in energy metabolism within Trypanosomatidae. Trends in Parasitology, 25(10), pp.482–490.
Turco, S.J. and Sacks, D.L., 1991. Expression of a stage-specific lipophosphoglycan in Leishmania major amastigotes. Molecular and Biochemical Parasitology, 45(1), pp.91–99.
Turnock, D.C. and Ferguson, M.A.J., 2007. Sugar nucleotide pools of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major. Eukaryotic Cell, 6(8), pp.1450–1463.
Van Weelden, S.W.H. et al., 2005. New functions for parts of the krebs cycle in procyclic Trypanosoma brucei, a cycle not operating as a cycle. Journal of Biological Chemistry, 280(13), pp.12451–12460.
Wei, F., Wang, W. and Liu, Q., 2013. Protein kinases of Toxoplasma gondii: Functions and drug targets. Parasitology Research, 112(6), pp.2121–2129.
Winter, G. et al., 1994. Surface antigens of Leishmania mexicana amastigotes: characterization of glycoinositol phospholipids and a macrophage-derived glycosphingolipid. J. Cell Sci., 107, pp.2471–82.
Yizhak, K. et al., 2010. Integrating quantitative proteomics and metabolomics with a genome-scale metabolic network model. Bioinformatics (Oxford, England), 26(12), pp.i255–i260.
Zhang, K. et al., 2005. Leishmania salvage and remodelling of host sphingolipids in amastigote survival and acidocalcisome biogenesis. Molecular Microbiology, 55(5), pp.1566–1578.
Zhang, K. et al., 2007. Redirection of sphingolipid metabolism toward de novo synthesis of ethanolamine in Leishmania. The EMBO journal, 26(4), pp.1094–104.

CHAPTER 4

Gene Essentiality Analysis In Leishmania major

"...so that it becomes possible to kill the parasites without damaging the body to any great extent."

Paul Ehrlich, Readings in Pharmacology (1909)

Abstract

Acquired resistance of Leishmania parasites to chemotherapy has become a significant public health issue. The parasite relies primarily on its adaptive metabolism to bypass the effects of drugs inside the human host, which necessitates the discovery of new drugs for treatment and prevention that target these resistance mechanisms. The advent of omics data has provided a great opportunity to understand the genotypes and phenotypes associated with these resistance mechanisms in parasites, which helps to identify potential drug targets in order to eradicate the disease. Systems biology approaches, in particular metabolic network modeling, have emerged as a promising strategy for identifying potential metabolic drug targets.

In this work, Parsimonious Flux Balance Analysis (pFBA)-based simulations using the metabolic model ext-iAC560 (presented in Chapter 3) predicted 89 and 84 essential genes in the promastigote and amastigote stages of L. major, respectively. Further analyses identified 53 genes without human homologs. Of these, 48 are essential in both stages, four (LmjF35.4590, LmjF28.3005, LmjF14.1200, and LmjF05.0520) are uniquely essential in the promastigote stage, and one (LmjF36.6950) is uniquely essential in the amastigote stage; all can be considered potential drug targets. The analyses also identified nine novel metabolic target genes (LmjF35.5330, LmjF36.2540, LmjF32.1960, LmjF33.0680, LmjF28.1280, LmjF21.1430, LmjF09.1040, LmjF06.1070, and LmjF06.0350) in the promastigote and amastigote stages, while one novel gene (LmjF36.6950) was identified as a potential drug target in the amastigote stage. More than 70% of the predicted essential genes are associated with the biosynthesis of at least two building blocks, and nearly 80% have known enzymatic inhibitors (based on information from the DrugBank database and the literature); however, these potential targets have been poorly studied in Leishmania species. Further, reported scientific data suggest that molecules such as carbamide phenylacetate, CDV, terbinafine, and 4-(dimethylaminomethyl)-2,6-di(propan-2-yl)phenol could be used as potential antileishmanial drugs.

4.1 Introduction

Leishmaniasis is a tropical and sub-tropical disease caused by Leishmania parasites. So far, nearly 53 species of Leishmania have been characterized and 20 are known to infect humans, including Leishmania major, Leishmania mexicana, Leishmania donovani, Leishmania braziliensis and Leishmania infantum (Alvar et al. 2012). Considering the endemic severity of leishmaniasis and the emergence of resistance of Leishmania species to the existing drugs (e.g. miltefosine, paromomycin and amphotericin B), there is an urgent need for developing novel and effective antileishmanial therapies.

In addition to conventional methods that rely on growth observations, high-throughput technologies such as transcriptomics and proteomics have been widely used to characterize parasite genotypes and phenotypes in order to find potential drug targets (Barribeau et al. 2014; Yamagishi et al. 2014; Ismail et al. 2016). Bioinformatics resources and platforms have been developed to analyze large-scale data and identify drug targets (Cilingir et al. 2012; Azevedo and Soares 2009), while computational approaches contribute significantly to the drug discovery process by reducing time, cost and resources in various organisms, including protozoan parasites (Ou-Yang et al. 2012; Ma et al. 2010; Katsila et al. 2016). For example, access to the genome sequences of various parasites, such as Leishmania (Ivens et al. 2005), Trypanosoma (El-Sayed et al. 2005), and Toxoplasma (Kissinger et al. 2003; Tymoshenko et al. 2015), together with high-throughput methods, has led to the identification of a large number of potential drug targets, as in previous studies with Plasmodium (Bozdech et al. 2003) and Leishmania (Depledge et al. 2009).

In the same context, systems biology approaches such as metabolic network analyses, which facilitate the system-wide characterization of an organism by utilizing multi-omics data, have emerged as promising tools to understand metabolism and identify potential drug targets in parasites (Durmus et al. 2015; Zou et al. 2013). In particular, gene essentiality analysis using a metabolic network is one of the most common methods to study organism phenotypes under gene knockout conditions, which subsequently helps in identifying potential drug targets. Other systems-based analyses of the metabolic network, such as enzyme robustness and flux variability, are also useful for predicting drug targets (Chavali et al. 2012a; Jiang et al. 2015). More specific applications of these methodologies have been proposed for predicting gene/enzyme essentiality and robustness in the metabolic network (Larhlimi et al. 2011; Edwards and Palsson 2000; Tobalina et al. 2016).


Previously, these approaches were used for predicting potential drug targets in many medically challenging pathogens such as Mycobacterium tuberculosis (Jamshidi and Palsson 2007; Fang et al. 2010) and Acinetobacter baumannii, and in protozoan parasites such as Plasmodium falciparum (Plata et al. 2010); however, for other protozoan parasites, including Leishmania, these strategies have achieved poor prediction accuracy. For human protozoan parasites, various genome-scale metabolic networks have been constructed and analyzed to elucidate the essential metabolic components, such as metabolites, reactions and genes, that contribute to pathogenicity (Jamshidi and Palsson 2007; Chavali et al. 2012a) (see section 1.6.4 in Chapter 1). For example, metabolic models for Leishmania species such as L. infantum (iAS556 and iAS142) (Subramanian et al. 2015; Subramanian and Sarkar 2017), L. donovani (iMS604) (Sharma et al. 2017), and L. major (iAC560) (Chavali et al. 2008) were developed to describe the metabolism of the parasites; however, systems-based approaches were poorly utilized for predicting essential genes. The main challenge is to develop a strategy that imposes proper constraints during flux-based analysis in order to predict critical components of the metabolic network, which can in turn help to identify potential drug targets in promastigote and amastigote conditions.

The present study uses metabolic model ext-iAC560 (developed and described in Chapter 3) and performs flux-based analysis for predicting genes essential for growth in promastigote and amastigote L. major. Further, literature-based searches and homology analyses are used to prioritize the drug targets in Leishmania.


4.2 Methodology

4.2.1 Reaction knockout and phenotype predictions

Reaction knockout simulations using the metabolic model ext-iAC560 were performed to predict growth phenotypes. A dataset of 67 reactions whose knockout phenotypes had previously been studied in Leishmania and Trypanosoma, collected from the literature, was used to evaluate the prediction accuracy of ext-iAC560. Each reaction was knocked out in turn (i.e. its minimum and maximum flux were both set to zero), and Parsimonious Flux Balance Analysis (pFBA¹) was performed to predict biomass growth. Modified Media for Promastigote (MMP) and Modified Media for Amastigote (MMA) (described in section 3.2.3 in Chapter 3) were used to simulate growth phenotypes in promastigote and amastigote conditions, respectively. A reaction was considered lethal if its knockout resulted in zero predicted biomass. Previous metabolic models of Leishmania species, namely L. major (iAC560) (Chavali et al. 2008), L. donovani (iMS604) (Sharma et al. 2017) and L. infantum (iAS142) (Subramanian et al. 2015), were also assessed in terms of the number of reactions (out of 67) tested in knockout simulations and the number of correctly predicted phenotypes. Some of these models were not available in proper SBML format, which caused problems in flux-based simulations; therefore, only published data on reaction knockout simulations and phenotype predictions using these models were considered.

4.2.2 Gene essentiality prediction

Single and double gene deletion analyses were carried out using the metabolic model ext-iAC560 to predict gene essentiality for growth of L. major in the promastigote and amastigote stages. Each gene in the model was knocked out in turn (i.e. the lower and upper bounds of the reaction(s) solely associated² with that gene were set to zero), and pFBA was performed to predict growth. The MMP and MMA media were applied to simulate growth in the promastigote and amastigote stages, respectively. The predicted biomass flux values in wild-type (WT) and knockout (KO) conditions were compared to calculate the essentiality factor (EF), as described below:

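$$\text{Essentiality Factor (EF)} = \frac{\text{predicted biomass flux in KO condition}}{\text{predicted biomass flux in WT condition}}$$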

Based on EF values, genes were categorized as Essential (E) or Non-Essential (NE), as suggested in (Sharma et al. 2017), and represented below:

¹ pFBA refers to the flux balance analysis approach that incorporates flux parsimony, meaning that the total flux through all pathways required to achieve an objective is minimized.
² Sole association of gene(s) with reaction(s) refers to the condition where no other gene(s) can activate the reaction(s).


Essential (E): 0 ≤ EF < 10⁻⁶
Non-essential (NE): 10⁻⁶ ≤ EF ≤ 1
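The classification rule can be illustrated with a minimal Python sketch. It assumes that EF is the ratio of the KO biomass flux to the WT biomass flux, which is how the thresholds above read; the function names, gene labels and flux values are hypothetical and serve only as an illustration.

```python
def essentiality_factor(growth_ko: float, growth_wt: float) -> float:
    """Ratio of knockout to wild-type biomass flux (assumed EF definition)."""
    if growth_wt <= 0:
        raise ValueError("wild-type growth must be positive")
    # Solvers can return tiny negative or slightly larger values; clamp to [0, 1].
    return min(max(growth_ko / growth_wt, 0.0), 1.0)


def classify(ef: float, threshold: float = 1e-6) -> str:
    """Return 'E' for 0 <= EF < threshold, otherwise 'NE'."""
    return "E" if ef < threshold else "NE"


# Hypothetical biomass fluxes, for illustration only.
wild_type = 0.85
knockouts = {"geneA": 0.0, "geneB": 0.85, "geneC": 0.21}
labels = {
    gene: classify(essentiality_factor(flux, wild_type))
    for gene, flux in knockouts.items()
}
print(labels)
```

Note that, under this rule, a gene whose knockout only reduces growth (such as the hypothetical geneC) is still classified as NE; only a complete loss of predicted biomass counts as essential.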

The genes predicted as NE in the single gene knockout analysis were further tested for double gene deletion effects on parasite growth, where a pair of genes was knocked out from the model before performing pFBA. The EF of each tested gene pair was calculated in the same way as in the single gene knockout analysis. The knockout effects on the biosynthesis of specific biomass building blocks were further inspected by comparing the predicted fluxes under WT and KO conditions.
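The deletion procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the gene-reaction rules, gene names and reaction names are hypothetical, and the `toy_growth` function is a stand-in for the pFBA simulation (it is not a flux calculation). The sketch shows how sole association decides which reactions are switched off, and how only individually non-essential genes are paired for double deletion.

```python
from itertools import combinations

# Hypothetical gene-reaction rules: each reaction maps to a list of alternative
# gene sets (isozymes); a set with several genes is a complex needing all of them.
GPR = {
    "R1": [{"g1"}],
    "R2": [{"g2"}, {"g3"}],   # g2 and g3 are isozymes
    "R3": [{"g4", "g5"}],     # complex of g4 and g5
}


def blocked_reactions(deleted, gpr=GPR):
    """Reactions for which no alternative gene set survives the deletion."""
    deleted = set(deleted)
    return {rxn for rxn, alts in gpr.items() if all(alt & deleted for alt in alts)}


def toy_growth(blocked, required=frozenset({"R1", "R2", "R3"})):
    """Stand-in for pFBA: growth is 1.0 unless a required reaction is blocked."""
    return 0.0 if blocked & required else 1.0


genes = sorted(set().union(*(alt for alts in GPR.values() for alt in alts)))
single_lethal = [g for g in genes if toy_growth(blocked_reactions([g])) == 0.0]
non_essential = [g for g in genes if g not in single_lethal]
# Only pairs of individually non-essential genes are tested, as in the text.
pair_lethal = [
    pair for pair in combinations(non_essential, 2)
    if toy_growth(blocked_reactions(pair)) == 0.0
]
print(single_lethal, pair_lethal)
```

In this toy network, deleting either isozyme alone leaves R2 active, so the lethal effect appears only when both are removed together, which is the kind of non-trivial lethal pair the double deletion analysis looks for.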

4.2.3 Identification of the non-human homologous genes

All predicted essential genes were subjected to BLASTP analysis (Madden 2013) against the human proteome to identify genes without human homologs. The protein sequences of all the genes were downloaded in FASTA format from GenBank (https://www.ncbi.nlm.nih.gov), and the NCBI server (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was used to perform BLASTP. Best hits were considered homologous based on the following criteria (Paul et al. 2014): identity >35%, e-value <0.0001 and sequence coverage >75%. The remaining non-homologous genes were searched in databases (e.g. TDR Target Database (Magariños et al. 2012), DrugBank (Wishart et al. 2008), and MetaCyc (Caspi et al. 2014)) and in the literature for information on their essentiality and knockout phenotypes in Leishmania or other closely related species. Further, chemical databases such as DrugBank were searched to find available enzyme inhibitors, the druggability of the protein target, and other relevant information about the predicted essential genes in Leishmania.
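The homology filter can be sketched in Python as below. The sketch assumes that a hit counts as a human homolog only when all three criteria are met together, and that the best hit per query has already been parsed from the BLASTP output; the `Hit` record, function names, gene labels and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str        # query gene identifier
    identity: float   # percent identity of the best human hit
    evalue: float     # e-value of the best human hit
    coverage: float   # percent sequence coverage


def is_human_homolog(hit, min_identity=35.0, max_evalue=1e-4, min_coverage=75.0):
    """True when the hit passes all three thresholds (assumed to be conjunctive)."""
    return (
        hit.identity > min_identity
        and hit.evalue < max_evalue
        and hit.coverage > min_coverage
    )


def non_homologous(queries, best_hits):
    """Queries with no best hit that passes the homology thresholds."""
    homologs = {hit.query for hit in best_hits if is_human_homolog(hit)}
    return [q for q in queries if q not in homologs]


# Hypothetical best hits, for illustration only; geneC has no hit at all.
queries = ["geneA", "geneB", "geneC"]
best_hits = [
    Hit("geneA", identity=62.0, evalue=1e-50, coverage=90.0),  # homolog
    Hit("geneB", identity=41.0, evalue=1e-10, coverage=40.0),  # coverage too low
]
targets = non_homologous(queries, best_hits)
print(targets)
```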

Predicted essential gene pairs (also known as non-trivial lethal gene pairs) from the double gene deletion analysis were compared with the results of a previous study (Chavali et al. 2012), which identified synthetic lethal gene pairs and verified them experimentally by measuring Leishmania phenotypes in the presence of FDA-approved drugs.

A brief description of the programming scripts used to carry out the above-mentioned analyses and simulations is provided in Appendix 4d.

4.3 Results and Discussion

4.3.1 Reaction knockout phenotypes and model comparison

Reaction knockout simulations were performed to evaluate the prediction accuracy of the current model in comparison with the previous models. The original model iAC560 predicted reaction knockout phenotypes with 66.7% accuracy (24 correct predictions out of 36 reactions tested) in promastigote condition, while the present model ext-iAC560 showed ~72% accuracy (48 correct predictions out of 67 reactions tested) (Table I of Appendix 4c). In promastigote condition, ext-iAC560 predicted the correct phenotypes for the individual knockouts of four reactions, i.e. UDPG4Ex (UDPglucose 4-epimerase), PGK (phosphoglycerate kinase), ADPTr (1-(5'-Phosphoribosyl)-5-amino-4-imidazolecarboxamide:pyrophosphate phosphoribosyltransferase), and ADNK1g (adenosine kinase), which were wrongly predicted by the original iAC560. As the in silico growth media used for reaction knockout simulation with iAC560 and ext-iAC560 are almost identical, and no additional flux constraint was imposed in the ext-iAC560 simulation, the improvement in phenotype prediction accuracy can be attributed mainly to the corrections to the network structure (e.g. deletion, addition and alteration of metabolic reactions based on experimental evidence) made in the present model. The recently developed model iMS604, which represents the metabolic network of L. donovani, includes a higher number of genes than any of the other metabolic models, including ext-iAC560, but showed only ~60% accuracy for predicting reaction knockout phenotypes in promastigote condition. Only a few reaction knockouts were wrongly predicted by iMS604 but correctly predicted by ext-iAC560. The metabolic model iAS142 is a very small model that represents only the central carbon metabolism of L. infantum. Although its prediction accuracy is the highest, its coverage in terms of the number of reactions tested in the knockout simulation is much lower than that of any other model (Table I of Appendix 4c).
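The accuracy figures quoted above follow directly from the reported counts, as the short Python sketch below shows. The `score` helper and its example phenotypes are hypothetical; they only illustrate how predictions could be compared with literature phenotypes when a model does not cover every reaction in the dataset.

```python
def accuracy(correct: int, tested: int) -> float:
    """Percentage of tested knockouts with a correctly predicted phenotype."""
    return 100.0 * correct / tested


def score(predicted: dict, observed: dict) -> tuple:
    """Count correct predictions over the reactions a model can actually test."""
    tested = [rxn for rxn in observed if rxn in predicted]
    correct = sum(predicted[rxn] == observed[rxn] for rxn in tested)
    return correct, len(tested)


# Counts reported in the text (correct predictions, reactions tested).
reported = {"iAC560": (24, 36), "ext-iAC560": (48, 67)}
for model, (correct, tested) in reported.items():
    print(f"{model}: {accuracy(correct, tested):.1f}% ({correct}/{tested})")
# iAC560: 66.7% (24/36)
# ext-iAC560: 71.6% (48/67)

# Hypothetical phenotypes, for illustration only.
observed = {"rxn1": "lethal", "rxn2": "viable", "rxn3": "lethal"}
predicted = {"rxn1": "lethal", "rxn2": "lethal"}  # rxn3 is absent from the model
print(score(predicted, observed))  # (1, 2)
```

Reporting the number of reactions tested alongside the percentage matters here, because a small model can reach a high accuracy on very few reactions.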

In amastigote condition, the lethal phenotypes resulting from knockout of the reactions associated with glucosamine-6-phosphate acetylase (GNAT) and glucosamine-6-phosphate deaminase (GND), as previously studied by Naderer et al. (2015), are predicted only by the current model. None of the previous models considered these enzymes, which play a vital role in sugar nucleotide biosynthesis in Leishmania. Similarly, knockout of the reaction catalyzed by aconitate hydratase (reaction ID "ACONTm" in the model) in L. mexicana showed a lethal phenotype in amastigote condition (Naderer et al. 2010), which is wrongly predicted by model iAS142 but correctly predicted by ext-iAC560. Other models, including iAC560 and iMS604, did not consider this reaction in the knockout studies. Another lethal phenotype, supported by a previous study of amastigote L. donovani (Boitz and Ullman 2006a), was correctly predicted by ext-iAC560 after knockout of hypoxanthine phosphoribosyltransferase (reaction ID "HXPRTg" in the model ext-iAC560); this phenotype was also predicted by iMS604 (Table I of Appendix 4c). These improved phenotype predictions in the amastigote stage result not only from the structural changes in the metabolic network, but also from the use of the in silico MMA medium (formulated in the current study, see section 3.2.3 in Chapter 3), which is a more realistic growth medium for simulating the growth of Leishmania amastigotes under knockout conditions. The number of reactions tested and the number of correctly predicted phenotypes are considerably higher for the current model than for iAC560 and for the other metabolic models iAS142 and iMS604, suggesting that the present model better represents the metabolic network of L. major and is more capable of predicting accurate reaction knockout phenotypes in defined environmental conditions. The present model was therefore used to identify genes essential for growth by performing gene knockout simulations.

4.3.2 Single/double gene deletion analysis

Single gene deletion analysis predicted 89 and 84 essential (E) genes in the promastigote and amastigote stages, respectively (Table 4.1). Of these, 81 essential genes are common to both stages, while 8 genes (LmjF35.4590, LmjF31.2970, LmjF28.3005, LmjF06.0950, LmjF05.0520, LmjF12.0630, LmjF14.1200, and LmjF33.2720) and 3 genes (LmjF36.6950, LmjF10.0010, and LmjF33.3270) are uniquely identified as essential in the promastigote and amastigote stages, respectively. Genes exclusively predicted as essential in the promastigote stage mainly belong to the amino sugar metabolism, fatty acid biosynthesis, and glycerophospholipid metabolic pathways, while those in the amastigote stage are mostly from fatty acid synthesis pathways. This is mainly due to differences in the media composition used to simulate growth phenotypes in the two stages: MMP is mainly composed of glucose and some amino acids, while MMA includes glucose, fatty acids and other lipids, and amino acids, as described in section 3.2.3 of Chapter 3.

The genes predicted as non-essential (NE) in each condition were considered for double gene deletion analysis, which identified 52 gene pairs as essential in both the promastigote and amastigote stages. Additionally, 63 and 10 gene pairs were identified as uniquely essential in promastigote and amastigote conditions, respectively (Table 4.1).
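The stage comparison is plain set arithmetic, sketched below in Python. The stage-specific gene identifiers are those listed above; the 81 shared genes are represented by placeholder names, which are hypothetical and stand in for the actual list.

```python
# Stage-specific essential genes as listed in the text.
promastigote_only = {
    "LmjF35.4590", "LmjF31.2970", "LmjF28.3005", "LmjF06.0950",
    "LmjF05.0520", "LmjF12.0630", "LmjF14.1200", "LmjF33.2720",
}
amastigote_only = {"LmjF36.6950", "LmjF10.0010", "LmjF33.3270"}
# Placeholder names for the 81 genes essential in both stages (hypothetical).
common = {f"common_{i:02d}" for i in range(81)}

promastigote = common | promastigote_only
amastigote = common | amastigote_only

shared = promastigote & amastigote
print(len(promastigote), len(amastigote), len(shared))
print(len(promastigote - amastigote), len(amastigote - promastigote))

# The same bookkeeping applies to the gene pairs from double deletion.
common_pairs, promastigote_unique_pairs, amastigote_unique_pairs = 52, 63, 10
print(common_pairs + promastigote_unique_pairs,
      common_pairs + amastigote_unique_pairs)
```

The totals recovered this way (89 and 84 genes, 115 and 62 gene pairs) agree with the values reported in Table 4.1.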


Table 4.1: Predicted Essential (E) and Non-Essential (NE) genes in single/double gene knockout (KO) analysis using the metabolic model ext-iAC560 in promastigote and amastigote conditions. Parsimonious Flux Balance Analysis (pFBA)-based simulation was run to predict growth phenotypes in gene knockout conditions.

| | Essential (E): 0 ≤ EF < 10⁻⁶ | Non-essential (NE): 10⁻⁶ ≤ EF ≤ 1 |
|---|---|---|
| Number of genes in single gene knockout analysis | | |
| Promastigote | 89 | 481 |
| Amastigote | 84 | 485 |
| Common genes | 81 | |
| Number of gene pairs in double gene knockout analysis | | |
| Promastigote | 115 | 82566 |
| Amastigote | 62 | 117307 |
| Common gene pairs | 52 | |

4.3.3 Local or global effects of gene deletion

The knockout of each essential (E) gene influenced the biosynthesis of one or more biomass building blocks, and consequently affected the metabolic network locally (i.e. a single metabolic pathway) or globally (i.e. multiple metabolic pathways). In promastigote, the predicted flux values representing the biosynthesis of the 48 building blocks of biomass (see the biomass equation of the model ext-iAC560, Appendix 3b in Chapter 3) in WT and KO conditions showed that knocking out each of the 89 predicted essential genes in turn affects³ the synthesis of at least one building block, preventing biomass growth. In amastigote condition, likewise, knockout of each of the 84 predicted genes influences³ the synthesis of at least one building block and is ultimately lethal. Most of the genes (81 genes, as mentioned in Table 4.1) are predicted as essential in both the promastigote and amastigote stages. On average, around 70% of the essential genes were found to influence³ the biosynthesis of more than two biomass building blocks in each condition. At the same time, knockout of most of the predicted essential genes affects the metabolic network globally (see Table I of Appendix 4a), indicating that the metabolic pathways in Leishmania are highly interconnected. Knockout of a few genes (e.g. LmjF19.0710, LmjF28.1970, LmjF29.1960, LmjF32.2950, LmjF35.1180, and LmjF35.3870) affected³ the synthesis of more than 10 building blocks constituting multiple macromolecules (e.g. RNA, DNA, carbohydrates, lipids, proteins, and polyamines) in both stages of Leishmania (see Table I of Appendix 4a). However, genes such as LmjF32.3310, LmjF31.2640, LmjF31.2650, and LmjF26.0830 affected a larger number of building blocks and/or macromolecules in the promastigote stage than in the amastigote stage. These genes belong to glycolysis/gluconeogenesis and amino acid metabolic pathways, which are more critical in the amastigote stage than in the promastigote stage. The number of essential genes affecting individual macromolecules in each stage is shown in Figure 4.1. Interestingly, individual knockout of a large number of essential genes affected macromolecules such as RNA, protein, lipids, and carbohydrates, while only a few genes affected DNA, polyamines and other macromolecules by influencing³ the biosynthesis of associated monomers such as dAMP, dCMP, putrescine, and spermidine in both the promastigote and amastigote stages. Table I of Appendix 4a shows that some essential genes (e.g. LmjF11.0590, LmjF11.1100, LmjF13.1620, LmjF18.0020, and LmjF21.1430) influence the biosynthesis of protein and lipid simultaneously. Knockout of these genes probably affects the synthesis of common metabolites and cofactors that are used for the biosynthesis of both proteins and lipids. The analysis also showed that essential genes whose knockout affected carbohydrate biosynthesis rarely influenced protein synthesis but did influence lipid biosynthesis, suggesting that carbohydrate and lipid synthesis, and probably their functions, are metabolically more connected in Leishmania. Genes such as LmjF06.0370, LmjF19.0710, LmjF29.1830, LmjF35.3870, and LmjF30.2600, whose knockout influences multiple macromolecules (e.g. RNA, DNA, carbohydrate, and protein), might be of interest as possible metabolic drug targets, as they affect the growth of Leishmania by influencing the metabolic network globally. These analyses help to understand, to some extent, the metabolic connectivity between different pathways, and to identify genes that influence the metabolic network globally and thereby restrict Leishmania growth in different environmental conditions.

³ This refers to the situation when knockout of the gene reduces synthesis of biomass building block(s) to zero.
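The comparison of WT and KO fluxes described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the flux values are hypothetical, and "local" versus "global" is approximated here by whether the lost building blocks belong to one or several macromolecule classes, which is a proxy for the pathway-based definition used in the text.

```python
def lost_building_blocks(wt_flux, ko_flux, tol=1e-9):
    """Building blocks synthesized in WT whose synthesis flux is zero in the KO."""
    return sorted(
        block for block, flux in wt_flux.items()
        if abs(flux) > tol and abs(ko_flux.get(block, 0.0)) <= tol
    )


def effect_scope(lost, macromolecule_of):
    """'global' if the lost blocks span several macromolecule classes (assumed proxy)."""
    classes = {macromolecule_of[block] for block in lost}
    if not classes:
        return "none"
    return "global" if len(classes) > 1 else "local"


# Monomers named in the text; the flux values are hypothetical.
macromolecule_of = {
    "dAMP": "DNA", "dCMP": "DNA",
    "putrescine": "polyamines", "spermidine": "polyamines",
}
wt = {"dAMP": 0.02, "dCMP": 0.02, "putrescine": 0.01, "spermidine": 0.01}
ko_a = {"dAMP": 0.0, "dCMP": 0.0, "putrescine": 0.01, "spermidine": 0.01}
ko_b = {"dAMP": 0.0, "dCMP": 0.02, "putrescine": 0.0, "spermidine": 0.0}

lost_a = lost_building_blocks(wt, ko_a)
lost_b = lost_building_blocks(wt, ko_b)
print(lost_a, effect_scope(lost_a, macromolecule_of))
print(lost_b, effect_scope(lost_b, macromolecule_of))
```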

Figure 4.1: Number of essential genes whose knockout affected the biosynthesis of macromolecules in biomass. The pFBA-based simulation was carried out in MMP (promastigote) and MMA (amastigote) media to simulate growth phenotypes in wild-type and knockout conditions. Genes whose knockout reduced the synthesis of at least one building block to zero were considered essential for growth. Legend: pFBA: Parsimonious Flux Balance Analysis; MMP: Modified Media for Promastigote; MMA: Modified Media for Amastigote.

4.3.4 Drug target identification

An ideal drug target should not closely resemble any biological entity in humans, such as a gene or protein, in order to avoid unintended effects of the drug. To minimize such resemblance, BLASTP analysis of all essential genes against the human proteome was performed (as described in the methodology), which identified 48 non-human homologous genes that are essential in both the promastigote and amastigote stages of L. major. Additionally, four genes in promastigote (i.e. LmjF35.4590: phosphatidylserine decarboxylase and LmjF14.1200: CDPdiacylglycerol-serine O-phosphatidyltransferase from glycerophospholipid metabolism, LmjF28.3005: N-acetylglucosamine-6-phosphate synthase from amino sugar metabolism, and LmjF05.0520: malonyl-CoA C-acyltransferase from fatty acid biosynthesis pathways) and one gene in amastigote (i.e. LmjF36.6950: linoleoyl-CoA desaturase from fatty acid synthesis pathways) were found to be non-human homologous and uniquely essential for growth of the parasite (Table 4.2). In total, 53 genes were classified as potential drug targets in promastigote and amastigote conditions. The unique essential gene LmjF36.6950 in the amastigote stage is associated with the biosynthesis of gamma-linoleoyl-CoA, which is further involved in the formation of larger essential CoA complexes such as docosotetranoyl-CoA and docosohexaenoyl-CoA. In the promastigote stage, LmjF35.4590 and LmjF14.1200 are involved in glycerophospholipid metabolism, while LmjF28.3005 and LmjF05.0520 are associated with the amino sugar and fatty acid biosynthetic pathways. These unique target genes were not predicted in the same way (i.e. as uniquely essential in one particular stage) by the previous L. major metabolic model iAC560; however, orthologs of LmjF35.4590 (phosphatidylserine decarboxylase) and LmjF14.1200 (CDPdiacylglycerol-serine O-phosphatidyltransferase) were predicted as essential by the L. donovani metabolic model iMS604. Gene targets from lipid and sugar biosynthesis pathways are of particular importance. As described in Chapter 1, lipids and sugars are mainly used for the formation of glycolipid and glycoconjugate complexes, which provide a protective layer on the cell surface of Leishmania. The unique gene targets, especially those from lipid biosynthesis pathways, can influence glycolipid configuration and can thus be used in antileishmanial drug development strategies.

Many of the currently predicted drug target genes have previously been analyzed in parasitic and non-parasitic species, and the existing literature evidence was used to support the current predictions. Literature and database searches for each drug target showed that sixteen genes (out of 53) are essential in Leishmania, while twenty-seven genes are essential in Trypanosoma or other parasitic species (Figure 4.2 and Table 4.2). Interestingly, six genes (LmjF11.1100, LmjF15.1460, LmjF18.0200, LmjF21.0850, LmjF28.1970, and LmjF35.4410), which are reported in the literature as essential in Leishmania as well as in other parasitic species (Figure 4.2 and Table 4.2), were predicted as essential in L. major only by the current model; the ortholog of just one of them, LmjF21.0850 (xanthine phosphoribosyltransferase), was predicted as essential in L. donovani using the metabolic model iMS604. Genes that are essential across various parasitic species are likely to share features associated with similar metabolic functions, and so offer an opportunity for further studies of common functional features, which can help to fill metabolic gaps and to understand the metabolism of each species. From the sugar nucleotide biosynthesis pathways, the predicted drug target genes LmjF33.2520 (UDP-N-acetylglucosamine pyrophosphorylase) in promastigote and amastigote, and LmjF28.3005 (N-acetylglucosamine-6-phosphate synthase) in the promastigote stage, are supported by previous studies revealing the essentiality of these genes for growth of T. brucei (Stokes et al. 2008; Mariño et al. 2011). These genes were not analyzed in the previous models; however, they were predicted as essential, and as potential drug targets, in the current flux-based analysis using ext-iAC560.

Figure 4.2: Distribution of the 53 predicted potential drug target genes based on literature evidence referring to essentiality data in Leishmania, other parasitic species, and non-parasitic species. All the genes were searched in the DrugBank and TDR Target databases and in the literature to find reported enzymatic inhibitors.

The analysis also predicted nine novel genes (LmjF35.5330, LmjF36.2540, LmjF32.1960, LmjF33.0680, LmjF28.1280, LmjF21.1430, LmjF09.1040, LmjF06.1070, and LmjF06.0350) (Table 4.2 and Table 4.3), which can be considered potential drug targets in both conditions, while one gene, LmjF36.6950, is predicted as a novel potential drug target in the amastigote stage only. Little or no information is available about the essentiality of these genes in human protozoan parasites or closely related species. These genes, which affect growth in silico, are associated with various pathways including steroid biosynthesis, glycerophospholipid metabolism, and amino acid metabolism (Table 4.2 and Table 4.3). For example, in the glycerophospholipid metabolic pathway, the predicted target 1-acyl-sn-glycerol-3-phosphate acyltransferase (LmjF32.1960) catalyzes the synthesis of phosphatidate (pa_LM), which is an intermediate in the biosynthesis of higher lipids such as phosphatidylcholine (pc_LM) and phosphatidylethanolamine (pe_LM). Synthesis of these lipids is essential, as they are part of the biomass in both conditions. A previous study showed that miltefosine (hexadecylphosphocholine [HePC]), the first orally active antileishmanial drug, significantly affects the synthesis of pc_LM and pe_LM, possibly by partially altering the activity of the phosphatidylethanolamine-N-methyltransferase enzyme (which catalyzes the conversion of these lipids to other lipids), leading to reduced proliferation of L. donovani promastigotes (Rakotomanga et al. 2007). This reduction in lipid content could also be explained if HePC altered the activity of 1-acyl-sn-glycerol-3-phosphate acyltransferase, affecting pe_LM and pc_LM content and reducing growth, as predicted in the current analysis; however, further experimentation is needed to support this. Similarly, 1,2-diacyl-sn-glycerol O-acyltransferase (LmjF09.1040) is another predicted potential drug target; it is associated with the synthesis of the lipid triglyceride, which contributes to biomass in both conditions.

Phenylalanine 4-hydroxylase (PAH, LmjF28.1280), predicted as a potential drug target in both conditions, catalyzes the hydroxylation of the aromatic side-chain of phenylalanine to generate tyrosine, which is further used as a precursor for other amino acids such as glutamate, and also contributes to biomass. According to previous studies, this enzyme uses tetrahydrobiopterin (H4B) to perform its catalytic action in Leishmania (Nare et al. 2009). Related studies also showed that L. major requires H4B for growth (pathways related to this metabolite are not covered in the present model); however, its metabolic roles are not yet clearly understood (Beck and Ullman 1990; Ong et al. 2011). Targeting the PAH enzyme might alter the level of H4B and thereby affect the growth of the parasite. The TDR Target database lists some of these genes as essential in other species, including E. coli and M. tuberculosis. The DrugBank database provides information about available enzymatic inhibitors for only one gene (i.e. LmjF28.1280) (Table 4.2). Similarly, as mentioned above, for LmjF36.6950 (linoleoyl-CoA desaturase) from the fatty acid synthesis pathway, which is predicted as a unique potential drug target gene in amastigote Leishmania, no information about its essentiality in protozoan parasites is available in the literature or databases (Table 4.2). The original model iAC560 did not predict this gene as essential for growth. Other relevant information for all the predicted novel genes is provided in Table 4.3. Taken together, this information highlights the need for further studies characterizing the pathways in which these genes are involved, in order to determine their metabolic roles and their essentiality for the growth of Leishmania.

As shown in Table 4.2, the druggability⁵ of the enzymes associated with most of the novel drug target genes has not been analyzed in the past; however, 3D structural models of many of these proteins are available in other parasitic species, and can be used to identify druggable pockets and thus assess druggability in Leishmania. Interestingly, LmjF28.1280 (predicted as a novel essential gene), associated with tyrosine formation in amino acid metabolic pathways, has high druggability (0.9), suggesting that it can be a potential target; however, further experimental characterization is needed in Leishmania. Among the other predicted drug targets, LmjF05.0350, LmjF06.0860, LmjF12.0280, LmjF13.1620, LmjF21.0845, LmjF11.1100, LmjF21.0850, LmjF21.0810, LmjF16.0530, and LmjF28.1280, which have already been characterized in other parasitic and non-parasitic species, also have high druggability (≥ 0.8), suggesting that these targets are likely to bind drug-like molecules and can be used as potential antileishmanial drug targets.

Around 80% of the predicted drug target genes have enzymatic inhibitors reported in parasitic and non-parasitic species, according to the literature and chemical databases (e.g. DrugBank and TDR Target) (Table 4.2). The inhibitors found in DrugBank are either approved by the FDA (Food and Drug Administration) or at the investigational/experimental stage. Most of these enzyme inhibitors have not been tested in Leishmania species; however, those tested against essential genes in other protozoan species are of major interest, as they are more likely to elicit similar responses in Leishmania. For example, inhibitor molecules (e.g. carbamide phenylacetate, CDV, terbinafine, and 4-(dimethylaminomethyl)-2,6-di(propan-2-yl)phenol) that have been tested against enzymes in Trypanosoma species (whose orthologous genes in L. major are LmjF06.0860, LmjF21.0845, LmjF11.1100, and LmjF31.2650) may work as potential antileishmanial drugs; however, their effects in humans remain a matter for further investigation.

In the double gene knockout analysis, 115 non-trivial lethal gene pairs were predicted to be essential for growth of Leishmania promastigotes. The original model iAC560 (Chavali et al. 2008) predicted only 56 non-trivial lethal gene pairs, of which 15 were not predicted as essential in the current analysis. The currently predicted gene pairs also included seven pairs (i.e. LmjF27.2050-LmjF22.1290, LmjF20.0100-LmjF24.0850, LmjF24.0850-LmjF30.3380, LmjF04.0960-LmjF34.0110, LmjF30.3520-LmjF30.3500, LmjF05.0510-LmjF05.0500 and LmjF25.1180-LmjF25.1170) that were considered high-priority synthetic lethal gene pairs based on druggability⁵ and gene essentiality in a previous study (Chavali et al. 2012) (Table I of Appendix 4b). Among the current findings, the gene pair LmjF05.0510-LmjF05.0500 is associated with halofantrine (an antimalarial drug), which showed antileishmanial activity in a previous study (Chavali et al. 2012). In the amastigote stage, the current metabolic model predicted 62 gene pairs (52 common to the promastigote and amastigote stages, and 10 unique to the amastigote stage), which were not predicted by the previous model iAC560. Of particular interest, the gene pair LmjF05.0510-LmjF05.0500, which is predicted as essential in promastigote, is also predicted as a lethal pair in the amastigote stage, providing further scope for testing halofantrine in amastigote L. major. The other 10 lethal gene pairs uniquely predicted in the amastigote stage are also of interest and should be tested in the laboratory. All predicted double gene combinations, and those also found in the previous studies (Chavali et al. 2012; Chavali et al. 2008), are listed in Table I of Appendix 4b.

⁵ The druggability is defined as the capability of the target to bind drug-like small chemical molecule(s), and is represented by a score between 0 and 1. A higher score corresponds to better druggability.
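When lethal gene pairs from two studies are compared, the order of the genes within a pair should not matter. A minimal Python sketch of such an order-independent comparison is given below; the pairs LmjF05.0510-LmjF05.0500 and LmjF30.3520-LmjF30.3500 are taken from the text, while the remaining pairs and the helper name are hypothetical.

```python
def unordered(pairs):
    """Represent each gene pair as a frozenset so that (a, b) equals (b, a)."""
    return {frozenset(pair) for pair in pairs}


current = unordered([
    ("LmjF05.0510", "LmjF05.0500"),
    ("LmjF30.3520", "LmjF30.3500"),
    ("geneX", "geneY"),              # hypothetical pair
])
previous = unordered([
    ("LmjF05.0500", "LmjF05.0510"),  # same pair, reversed order
    ("geneP", "geneQ"),              # hypothetical pair
])

shared = current & previous
only_previous = previous - current
print(sorted(sorted(pair) for pair in shared))
print(sorted(sorted(pair) for pair in only_previous))
```

By the counts reported above, 56 − 15 = 41 of the gene pairs predicted by iAC560 are also predicted by the current model in promastigote.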

Table 4.2: Database- and literature-based information for the 53 predicted essential genes (which are also non-human homologous) in Leishmania. Note: light red - essential genes in Leishmania; light grey - genes that are essential in Leishmania and other species; light blue - essential genes in protozoan parasites (other than Leishmania) and other species; light green - novel predicted essential genes for which little or no information about their essentiality in protozoan parasites or closely related species is available in the literature and databases. Bold text represents genes that are uniquely essential in either the promastigote or the amastigote stage in Leishmania.

Gene ID | Stage in which the gene was predicted as essential | Cellular localization | Enzyme Name (TriTrypDB) | TDR Target database: Essentiality (in Leishmania or Orthologues) | TDR Target database: Druggability | TDR Target database: 3D structure models | Organism in which gene/protein is essential (Literature) | Enzyme Inhibitors in Literature | Enzyme Inhibitors in databases | Database
LmjF05.0350 | amast, promast | x | trypanothione reductase | E | 0.9 | Y | Leishmania amazonensis, Leishmania donovani, Leishmania major (Aguero et al. 2008) | | (N-{(1s)-4-[Bis(2-Chloroethyl)Amino]-1-Methylbutyl}-N-(6-Chloro-2-Methoxy-9-Acridinyl)Amine, Flavin adenine dinucleotide, Trypanothione, Maleic Acid) | DrugBank
LmjF06.0860 | amast, promast | c | dihydrofolate reductase-thymidylate synthase | E | 1 | Y | Leishmania major (Cruz and Beverley 1990); Leishmania donovani (Scott et al. 1987) | Carbamide phenylacetate, pyrimethamine and methotrexate (Gilbert 2002) | Trimethoprim, cycloguanil, Pyrimethamine, Proguanil, Sulfadoxine | DrugBank
LmjF12.0280 | amast, promast | c | ornithine decarboxylase, putative | E | 1 | Y | Leishmania donovani (Boitz et al. 2009) | M-2, M-5, Mangiferin (Das et al. 2016) | Pyridoxal Phosphate, Putrescine, Pyridoxine-5'-Phosphate, N-Pyridoxyl-Glycine-5-Monophosphate, Alpha-Difluoromethylornithine, N'-Pyridoxyl-Lysine-5'-Monophosphate, Geneticin, Spermine, Eflornithine | DrugBank

LmjF34.1090 | amast, promast | c, m, x | dihydroxyacetonephosphate acyltransferase | E | 0.6 | Y | Leishmania major (Zufferey and Mamoun 2006) | sn-glycerol-3-phosphate (Datta and Hajra 1984) | |
LmjF29.1960 | amast, promast | x | Fumarate hydratase class I, cytosolic | E | No data | Y | Leishmania major (Feliciano et al. 2012) | | Pyromellitic Acid, 3-Trimethylsilylsuccinic Acid, Malate Ion, Citric Acid | DrugBank
LmjF35.3870 | amast, promast | c, n | nucleoside diphosphate kinase, putative | E | No data | Y | Leishmania braziliensis (Moreira and Murta 2016) | SU11652 (Vieira et al. 2017) | Guanosine-5'-Diphosphate, Lamivudine, Tenofovir, Adefovir Dipivoxil | DrugBank
LmjF13.1620 | amast, promast | er | squalene monooxygenase-like protein | E | 0.8 | Y | Leishmania major (Xu et al. 2014) | Clotrimazole (Marriott 1980), terbinafine (Ryder 1992) | Tolnaftate | DrugBank
LmjF16.0580 | amast, promast | c | dihydroorotase, putative | E | 0.3 | Y | Leishmania donovani (Tiwari et al. 2016) | Biotin sulfone, Kaempferol (Tiwari et al. 2016) | |
LmjF21.0845 | amast, promast | x | hypoxanthine-guanine phosphoribosyltransferase | E | 0.8 | Y | Leishmania donovani (Shih et al. 1998) | | CDV, (Alpha-Phosphoribosylpyrophosphoric Acid, 5--Monophosphate-9-Beta-D-Ribofuranosyl Xanthine, 9-Deazaguanine, 3h-Pyrazolo[4,3-D]Pyrimidin-7-Ol, Azathioprine, Mercaptopurine, Tioguanine); [3-azaniumyl-1-hydroxy-1-[hydroxy(oxido)phosphoryl]propyl]-hydroxyphosphinate | DrugBank; TDR Database
LmjF26.0830 | amast, promast | c | aspartate--ammonia ligase, putative | E | 0.2 | Y | Leishmania donovani (Manhas et al. 2014) | | L-Aspartic Acid, Adenosine triphosphate, L-Asparagine | DrugBank

LmjF11.1100 | amast, promast | er | Lanosterol 14-alpha demethylase | E | 0.8 | Y | Trypanosoma cruzi and Leishmania spp (Lepesheva and Waterman 2011) | | Ketoconazole, Terbinafine, Itraconazole | TDR Database
LmjF15.1460 | amast, promast | c, x | phosphomevalonate kinase protein, putative | E | No data | Y | Leishmania major and Trypanosoma brucei (Sgraja et al. 2007) | | |
LmjF18.0200 | amast, promast | c | UDP-galactopyranose mutase | E | No data | Y | Trypanosoma cruzi and Leishmania major (Pedersen and Turco 2003; Kizjakina et al. 2013) | Compound 1, Compound 2, Compound 3 AND Compound 4 (Soltero-Higgin et al. 2004) | Flavin adenine dinucleotide, Bicine | DrugBank
LmjF21.0850 | amast, promast | x | xanthine phosphoribosyltransferase | E | 0.8 | Y | Not essential in Leishmania donovani | | CDV, meropenem, Ertapenem | TDR Database
LmjF28.1970 | amast, promast | c | ribose 5-phosphate isomerase, putative | E | No data | Y | Leishmania donovani (Kaur et al. 2016); Trypanosoma cruzi (Stern et al. 2007) | 4-phosphoerythronate (Woodruff and Wolfenden 1979) | Citric Acid | DrugBank
LmjF35.4410 | amast, promast | e | amino acid permease, putative | E | 0.1 | Y | Leishmania donovani (Akerman et al. 2004); Cryptococcus neoformans (Fernandes et al. 2015) | Eugenol (Martho et al. 2016) | |
LmjF34.2110 | amast, promast | m | cardiolipin synthetase, putative | NE | No data | No data | Trypanosoma brucei (Serricchio and Bütikofer 2012) | 1-Decanoyl-sn-glycero-3-phosphorylcholine (Schlame and Haldar 1993) | |

LmjF31.3120 | amast, promast | c, m | phosphatidylethanolaminen-methyltransferase-like protein | No data | No data | N | Severely affected phenotype in Plasmodium knowlesi (Sen et al. 2013) | 3-Deazaadenosine (Haydn et al. 1982) | |
LmjF21.0810 | amast, promast | m | methionyl-tRNA synthetase | E | 0.9 | Y | Staphylococcus aureus MetRS (Green et al. 2009) | REP3123 and REP8839 (Hussain et al. 2015) | |
LmjF16.0530 | amast, promast | c | dihydroorotate dehydrogenase (fumarate) | E | 0.9 | Y | Plasmodium falciparum (Phillips and Rathod 2010) | DSM265 (Phillips et al. 2015) | Atovaquone | DrugBank
LmjF16.0550 | amast, promast | x | orotidine-5-phosphate decarboxylase/orotate phosphoribosyltransferase, putative | E | 0.3 | Y | Plasmodium falciparum and Trypanosoma brucei (Krungkrai et al. 2003) | 6-iodouridine 5'-monophosphate (Bello et al. 2007) | |
LmjF35.4590 | promast | c, m | phosphatidylserine decarboxylase, putative | E | No data | N | Trypanosoma brucei (Farine et al. 2017) | | Phosphatidyl serine | DrugBank
LmjF35.1180 | amast, promast | x | NADH-dependent fumarate reductase, putative | E | No data | Y | Trypanosoma brucei (Coustou et al. 2006) | | Flavin adenine dinucleotide | DrugBank
LmjF29.1830 | amast, promast | m | dihydrolipoamide dehydrogenase, putative | E | 0.3 | Y | Plasmodium falciparum (Pei et al. 2010); Trypanosoma brucei (Mazet et al. 2013) | | Flavin adenine dinucleotide | DrugBank
LmjF01.0480 | amast, promast | c | beta eliminating lyase, putative | No data | 0.2 | Y | Toxoplasma gondii (Starnes et al. 2009) | | |

LmjF31.2650 | amast, promast | m | acetoin dehydrogenase e3 component-like protein | E | 0.3 | Y | Trypanosoma brucei (Gutiérrez-Correa 2006) | | 4-(dimethylaminomethyl)-2,6-di(propan-2-yl)phenol, CPZ, Isopromethazine, Alimemazine, Methixen | TDR Database
LmjF31.2640 | amast, promast | m | 2-oxoglutarate dehydrogenase, e3 component, lipoamidedehydrogenase-like protein | E | 0.3 | Y | Trypanosoma brucei (Roldán et al. 2011) | | (Valproic Acid, NADH); (CPZ, Isopromethazine, (+-)-Alimemazine) | DrugBank; TDR Database
LmjF31.0560 | amast, promast | c, x | mevalonate kinase, putative | NE | No data | Y | Trypanosoma cruzi (Ferreira et al. 2016) | | FARNESYL THIOPYROPHOSPHATE | DrugBank
LmjF05.0520 | promast | m | nuclear receptor binding factor-like protein | NE | No data | Y | Plasmodium falciparum (and Toxoplasma gondii) (Muench et al. 2007); E. coli (Ling et al. 2004); Listeria monocytogenes (Yao et al. 2016) | Ethionamide, Isoniazid (Surolia and Surolia 2001) | Triclosan | DrugBank
LmjF33.2300 | amast, promast | x | udp-glc 4'-epimerase, putative | E | No data | Y | Trypanosoma brucei (JR et al. 2002) | 2-hydroxy-5-nitrobenzyl bromide (Geren et al. 1977) | |
LmjF33.2520 | amast, promast | c | UDP-N-acetylglucosamine pyrophosphorylase, putative | E | No data | Y | Trypanosoma brucei (Stokes et al. 2008) | Compound1 (Urbaniak et al. 2013) | |
LmjF30.3600 | amast, promast | m | ATP synthase, epsilon chain, putative | E | No data | Y | Trypanosoma brucei (Schnaufer et al. 2005) | | Enflurane, Isoflurane, Methoxyflurane, Halothane, Desflurane, Sevoflurane | DrugBank

LmjF07.0200 | amast, promast | m | phosphatidylglycerolphosphate synthase, mitochondrial, putative | E | No data | N | Trypanosoma brucei (Serricchio and Bütikofer 2013) | CDP-diacylglycerol (Hirabayashi et al. 1976) | |
LmjF26.2700 | amast, promast | c, x | 6-phosphogluconolactonase | E | No data | Y | Plasmodium falciparum (Allen et al. 2015) | | Formic Acid, Citric Acid | DrugBank
LmjF28.2910 | amast, promast | c | glutamate dehydrogenase, putative | E | 0.8 | Y | Plasmodium falciparum (Aparicio et al. 2010) | isophthalic acid (Aparicio et al. 2010) | NADH, L-Glutamic Acid, Guanosine-5'-Triphosphate, Hexachlorophene | DrugBank
LmjF28.3005 | promast | c | glucose-6-phosphate synthase | E | No data | Y | Trypanosoma brucei (Mariño et al. 2011) | pseudo-substrate N-acetylglucosamine-6-phosphate (Hurtado-Guerrero et al. 2007) | |
LmjF06.0370 | amast, promast | c | glutamine synthetase, putative | E | 0.6 | Y | Mycobacterium tuberculosis (Tullius et al. 2003) | | Glycine approved, Glutathione, gamma-Glutamylcysteine, Phosphoaminophosphonic Acid-Adenylate, L-Cysteine, Acetylcysteine | DrugBank
LmjF06.0650 | amast, promast | er | lanosterol synthase, putative | E | 0.6 | Y | Trypanosoma cruzi (Urbina 2009) | | Oxiconazole | DrugBank
LmjF26.1620 | amast, promast | c, m | CDP-DAG synthase | NE | No data | Y | Trypanosoma brucei (Lilley et al. 2014) | Phosphatidylinositol (D’Souza et al. 2014) | |
LmjF17.1390 | amast, promast | c | myo-inositol-1(or 4)-monophosphatase 1, putative | E | 0.7 | Y | Trypanosoma brucei (Martin and Smith 2005) | Lithium (Atack and Fletcher 1994) | L-Myo-Inositol-1-Phosphate | DrugBank

LmjF14.1200 | promast | c, m | phosphatidylserine synthase, putative | E | No data | N | Trypanosoma brucei (Serricchio and Bütikofer 2013) | Choline (Stone and Vance 2000) | Phosphatidyl serine | DrugBank
LmjF21.1770 | amast, promast | m | ATP synthase F1 subunit gamma protein, putative | E | No data | Y | Trypanosoma brucei (Dean et al. 2013; Schnaufer et al. 2005) | N,N'-Dicyclohexylcarbodiimide (Lu et al. 2014) | Enflurane, Isoflurane, Methoxyflurane, Halothane, Desflurane, Sevoflurane | DrugBank
LmjF36.0040 | amast, promast | c | N-acetylglucosamine-6-phosphate deacetylase-like protein | No data | No data | Y | Gluconacetobacter xylinus (Yadav et al. 2011); Listeria monocytogenes EGD (Popowska et al. 2012) | 1,10-phenanthroline and EDTA (Souza et al. 1997) | |
LmjF35.5330 | amast, promast | c, x | isopentenyl-diphosphate delta-isomerase (type II), putative | E | 0.2 | Y | No data for essentiality in human protozoan parasite | NE21650 (Thompson et al. 2002) | |
LmjF36.2540 | amast, promast | er | C-4 sterol methyl oxidase, putative | No data | No data | N | No data for essentiality in human protozoan parasite | | |
LmjF32.1960 | amast, promast | c, m | 1-acyl-sn-glycerol-3-phosphateacyltransferase-like protein | E | 0.6 | Y | No data for essentiality in human protozoan parasite | | |
LmjF33.0680 | amast, promast | er | sterol C-24 reductase, putative | No data | No data | Y | No data for essentiality in human protozoan parasite | | |

LmjF28.1280 | amast, promast | c | phenylalanine-4-hydroxylase | E | 0.9 | Y | No data for essentiality in human protozoan parasite | | Sapropterin, Norepinephrine, L-Phenylalanine, Quinonoid 7,8-Tetrahydrobiopterin, Beta(2-Thienyl)Alanine, 7,8-Dihydrobiopterin, Norleucine, Epinephrine, Droxidopa | DrugBank
LmjF21.1430 | amast, promast | m | 2-oxoisovalerate dehydrogenase alpha subunit, putative | E | No data | Y | No data for essentiality in human protozoan parasite | | |
LmjF09.1040 | amast, promast | c | phospholipid:diacylglycerol acyltransferase, putative | E | No data | N | No data for essentiality in human protozoan parasite | | |
LmjF06.1070 | amast, promast | c | deoxyribose-phosphate aldolase, putative | E | No data | Y | No data for essentiality in human protozoan parasite | | |
LmjF06.0350 | amast, promast | er | NAD(P)-dependent steroid dehydrogenase protein, putative | E | No data | Y | No data for essentiality in human protozoan parasite | | |
LmjF36.6950 | amast | c | linoleoyl-coa desaturase | | No data | | No data for essentiality in human protozoan parasite | | |

Legend: c: cytosol; x: glycosome; m: mitochondria; e: extra cellular space; er: endoplasmic reticulum; n: nucleus. E: essential; NE: Non-essential; Y: Yes; N: No; promast: promastigote stage; amast: amastigote stage

Table 4.3: Additional information about 10 novel genes predicted as potential drug targets in L. major.

Gene ID | EC number | Metabolic stage | Cellular localization | Enzyme Name (TriTrypDB) | Metabolic pathway(s)
LmjF35.5330 | 5.3.3.2 | amast, promast | c, x | isopentenyl-diphosphate delta-isomerase (type II), putative | Steroid Biosynthesis
LmjF36.2540 | 1.1.1.170 | amast, promast | er | C-4 sterol methyl oxidase, putative | Steroid Biosynthesis
LmjF32.1960 | 2.3.1.51 | amast, promast | c, m | 1-acyl-sn-glycerol-3-phosphateacyltransferase-like protein | Glycerophospholipid metabolism
LmjF33.0680 | - | amast, promast | er | sterol C-24 reductase, putative | Steroid Biosynthesis
LmjF28.1280 | 1.14.16.1 | amast, promast | c | phenylalanine-4-hydroxylase | Phenylalanine, Tyrosine, Tryptophan biosynthesis
LmjF21.1430 | 1.2.4.4 | amast, promast | m | 2-oxoisovalerate dehydrogenase alpha subunit, putative | Valine, leucine, and isoleucine degradation
LmjF09.1040 | 2.3.1.158 | amast, promast | c | phospholipid:diacylglycerol acyltransferase, putative | Glycerophospholipid metabolism
LmjF06.1070 | 4.1.2.4 | amast, promast | c | deoxyribose-phosphate aldolase, putative | Pentose Phosphate Pathway
LmjF06.0350 | 5.3.3.1 | amast, promast | er | NAD(P)-dependent steroid dehydrogenase protein, putative | Steroid Biosynthesis
LmjF36.6950 | 1.14.19.3 | amast | c | linoleoyl-coa desaturase | Fatty Acid Synthesis

Legend: c: cytosol; x: glycosome; m: mitochondria; er: endoplasmic reticulum; promast: promastigote stage; amast: amastigote stage

4.4 Conclusions

Gene knockout analysis using the metabolic model ext-iAC560 identified 48 non-human homologous genes that are essential in both the promastigote and amastigote stages of L. major. The analysis also predicted four genes (LmjF35.4590, LmjF28.3005, LmjF14.1200, and LmjF05.0520) and one gene (LmjF36.6950) that are uniquely essential in the promastigote and amastigote stages, respectively. Knockout of any essential gene affected the metabolic network locally or globally by influencing the synthesis of single or multiple building block(s) of biomass, leading to lethal phenotypes. The methodology also predicted nine novel essential genes (LmjF35.5330, LmjF36.2540, LmjF32.1960, LmjF33.0680, LmjF28.1280, LmjF21.1430, LmjF09.1040, LmjF06.1070, and LmjF06.0350) in both promastigote and amastigote conditions, and one gene (LmjF36.6950) in the amastigote condition only, all of which can be considered as potential drug targets. Furthermore, compiling information about essential genes from the literature and databases helped to support the current predictions and to prioritize the drug targets. Additionally, available enzyme inhibitors (tested in other parasitic species), which can be considered as candidate antileishmanial drugs, were suggested. The strategy was helpful to identify essential genes and prioritize potential drug targets in L. major and can be readily applied to other parasitic or non-parasitic species.
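As a rough illustration of how druggability scores could feed into target prioritization, the following sketch ranks a handful of genes by the TDR druggability values listed in Table 4.2. The `prioritize` helper and the cut-off `min_score=0.5` are hypothetical choices made for this example only; they are not criteria stated in this work.

```python
# Druggability scores copied from a few rows of Table 4.2;
# None marks an entry reported as "No data".
candidates = {
    "LmjF05.0350": 0.9,
    "LmjF06.0860": 1.0,
    "LmjF12.0280": 1.0,
    "LmjF26.0830": 0.2,
    "LmjF35.4410": 0.1,
    "LmjF29.1960": None,
}


def prioritize(scores, min_score=0.5):
    """Rank genes by score (highest first) and list unscored genes separately.

    min_score is a hypothetical cut-off used only for illustration.
    """
    scored = {gene: s for gene, s in scores.items() if s is not None}
    ranked = sorted(scored, key=lambda gene: (-scored[gene], gene))
    shortlist = [gene for gene in ranked if scored[gene] >= min_score]
    unscored = sorted(gene for gene, s in scores.items() if s is None)
    return shortlist, unscored


shortlist, unscored = prioritize(candidates)
print("shortlist:", shortlist)
print("no druggability data:", unscored)
```

Genes without a score are kept in a separate list instead of being discarded, since a missing druggability value in the database does not by itself rule out a target.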

4.5 Appendix 4a

Table I: List of essential genes that affect biomass growth by influencing the biosynthesis of building blocks and the corresponding macromolecules in the biomass. Only cases where knockout of a gene reduces the synthesis of at least one building block to zero were considered. Eight genes are uniquely identified as essential in the promastigote condition, while four genes are identified as essential only in the amastigote condition.

Columns: Gene ID; Promastigote (RNA, DNA, Carbohydrate, Lipid, Protein, Polyamine, Number of affected macromolecules in biomass, Number of affected building block of biomass); Amastigote (RNA, DNA, Carbohydrate, Lipid, Protein, Polyamine, Number of affected macromolecules in biomass, Number of affected building block of biomass)

LmjF01.0480 Yes Yes Yes 3 6 Yes Yes Yes 3 4

LmjF04.0580 Yes 1 1 Yes 1 1

LmjF05.0350 Yes Yes Yes 3 9 Yes Yes Yes 3 9

LmjF06.0350 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF06.0370 Yes Yes Yes Yes 4 7 Yes Yes Yes Yes 4 6

LmjF06.0650 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF06.0860 Yes 1 1 Yes 1 1

LmjF06.1070 Yes 1 1 Yes 1 1

LmjF07.0090 Yes 1 2 Yes 1 2

LmjF07.0200 Yes 1 1 Yes 1 1

LmjF07.0805 Yes 1 1 Yes 1 1

LmjF09.1040 Yes 1 1 Yes 1 1

LmjF11.0590 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF11.1100 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF12.0280 Yes 1 2 Yes 1 2

LmjF13.1620 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF14.1360 Yes 1 1 Yes 1 1

LmjF15.1040 Yes Yes Yes 3 9 Yes Yes Yes 3 9

LmjF15.1460 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF16.0530 Yes Yes Yes 3 8 Yes Yes Yes 3 8

LmjF16.0540 Yes Yes Yes 3 8 Yes Yes Yes 3 8

LmjF16.0550 Yes Yes Yes 3 8 Yes Yes Yes 3 8

LmjF16.0580 Yes Yes Yes 3 8 Yes Yes Yes 3 8

LmjF16.0590 Yes Yes Yes 3 8 Yes Yes Yes 3 8

LmjF17.1390 Yes 1 1 Yes 1 1

LmjF18.0020 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF18.0200 Yes 1 1 Yes 1 1

LmjF18.0990 Yes 1 3 Yes 1 3

LmjF19.0710 Yes Yes Yes Yes Yes 5 22 Yes Yes Yes Yes Yes 5 20

LmjF20.0560 Yes 1 1 Yes 1 1

LmjF21.0640 Yes 1 3 Yes 1 3

LmjF21.0845 Yes Yes Yes 4 6 Yes Yes Yes 3 4

LmjF21.1430 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF21.1770 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF22.0110 Yes Yes Yes 3 4 Yes Yes Yes 3 4

LmjF22.1360 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF23.0110 Yes 1 2 Yes 1 2

LmjF24.1630 Yes Yes Yes 3 7 Yes Yes Yes 3 11

LmjF26.0830 Yes Yes 3 3 Yes 1 1

LmjF26.1620 Yes 1 4 Yes 1 2

LmjF26.2480 Yes 1 1 Yes 1 1

LmjF26.2700 Yes Yes Yes 3 9 Yes Yes Yes 3 9

LmjF27.0930 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF27.2030 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF28.0890 Yes 1 3 Yes 1 3

LmjF28.1280 Yes 1 1 Yes 1 1

LmjF28.1970 Yes Yes Yes Yes 5 14 Yes Yes Yes Yes 4 14

LmjF29.1830 Yes Yes Yes Yes 4 13 Yes Yes Yes 3 6

LmjF29.1960 Yes Yes Yes Yes 4 14 Yes Yes Yes Yes 4 14

LmjF30.2600 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF30.3190 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF30.3600 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF31.0560 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF31.2290 Yes 1 1 Yes 1 1

LmjF31.2640 Yes Yes Yes Yes 4 13 Yes Yes Yes 3 6

LmjF31.2650 Yes Yes Yes Yes 4 13 Yes Yes Yes 3 6

LmjF31.2940 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF31.3120 Yes 1 1 Yes 1 1

LmjF31.3130 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF32.1580 Yes 1 2 Yes 1 2

LmjF32.1960 Yes 1 7 Yes 1 5

LmjF32.2320 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF32.2950 Yes Yes Yes Yes 4 12 Yes Yes Yes Yes 4 10

LmjF32.3310 Yes Yes Yes Yes 4 13 Yes Yes Yes 3 6

LmjF33.0680 Yes 1 1 Yes 1 1

LmjF33.2300 Yes 1 2 Yes 1 2

LmjF33.2520 Yes 1 1 Yes 1 1

LmjF34.0080 Yes Yes Yes 3 9 Yes Yes Yes 3 9

LmjF34.1090 Yes 1 7 Yes 1 5

LmjF34.2110 Yes 1 1 Yes 1 1

LmjF35.1180 Yes Yes Yes Yes 4 14 Yes Yes Yes Yes 4 14

LmjF35.1480 Yes 1 2 Yes 1 2

LmjF35.3340 Yes Yes Yes 3 9 Yes Yes Yes 3 9

LmjF35.3870 Yes Yes Yes Yes 4 12 Yes Yes Yes Yes 4 10

LmjF35.4410 Yes Yes Yes 3 10 Yes Yes Yes 3 9

LmjF35.5330 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF36.1960 Yes 1 2 Yes 1 2

LmjF36.2540 Yes Yes Yes 3 7 Yes Yes Yes 3 6

LmjF36.3910 Yes 1 2 Yes 1 2

LmjF36.4430 Yes 1 1 Yes 1 1

LmjF36.6390 Yes 1 2 Yes 1 2

LmjF12.0630 Yes Yes 9 2

LmjF35.4590 Yes 1 2

LmjF31.2970 Yes 1 5

LmjF33.2720 Yes 1 7

LmjF06.0950 Yes 1 1

LmjF14.1200 Yes 1 2

LmjF05.0520 Yes 1 7

LmjF28.3005 Yes 1 1

LmjF36.6950 Yes 1 5

LmjF33.3270 Yes 1 5

LmjF10.0010 Yes 1 4
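The two counts reported per gene in Table I above (affected macromolecules and affected building blocks) can be derived from a list of blocked building blocks, as in the following minimal sketch. The building-block-to-macromolecule mapping, the gene names, and the knockout results are hypothetical placeholders, not values taken from ext-iAC560.

```python
# Hypothetical mapping from biomass building blocks to macromolecule classes.
BLOCK_TO_MACROMOLECULE = {
    "dATP": "DNA",
    "dTTP": "DNA",
    "UTP": "RNA",
    "putrescine": "Polyamine",
    "spermidine": "Polyamine",
    "ergosterol": "Lipid",
}

# Hypothetical knockout results: building blocks whose synthesis drops to zero.
BLOCKED = {
    "geneA": {"putrescine", "spermidine"},
    "geneB": {"dTTP", "UTP", "ergosterol"},
    "geneC": set(),  # no building block fully blocked, so it is not tabulated
}


def summarize(blocked_blocks):
    """Return (sorted macromolecule classes, class count, building block count)."""
    classes = sorted({BLOCK_TO_MACROMOLECULE[b] for b in blocked_blocks})
    return classes, len(classes), len(blocked_blocks)


# Keep only knockouts that reduce at least one building block to zero.
table = {gene: summarize(blocks) for gene, blocks in BLOCKED.items() if blocks}

for gene, (classes, n_macro, n_blocks) in table.items():
    print(gene, ", ".join(classes), n_macro, n_blocks)
```

A knockout that blocks two polyamine precursors is therefore counted as one affected macromolecule and two affected building blocks.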

4.6 Appendix 4b

Table I: List of non-trivial lethal gene pairs predicted in current double gene deletion study using metabolic model ext-iAC560. The corresponding column provides information if the gene pairs are also predicted or studied in the previous studies (Chavali et al. 2012) and (Chavali et al. 2008) in L. major.

Columns: Gene pairs; ext-iAC560 (Promast, Amast); iAC560 (Chavali et al. 2008) (Promast); Metabolic study (Chavali et al. 2012) (Promast)

LmjF03.0100 LmjF24.0320 yes yes yes
LmjF03.0740 LmjF24.0320 yes yes yes
LmjF04.1130 LmjF24.0320 yes yes yes
LmjF05.0180 LmjF25.0020 yes yes yes
LmjF05.0500 LmjF05.0510 yes yes yes Yes
LmjF07.0060 LmjF24.0320 yes yes yes
LmjF11.0220 LmjF34.4330 yes yes
LmjF16.0480 LmjF16.0440 yes yes
LmjF18.0510 LmjF03.0200 yes yes
LmjF18.0510 LmjF13.1680 yes yes
LmjF19.1560 LmjF17.0725 yes yes
LmjF20.0100 LmjF10.0510 yes yes
LmjF20.0100 LmjF24.0850 yes yes yes yes
LmjF20.0100 LmjF25.1120 yes yes
LmjF20.0100 LmjF35.3080 yes yes
LmjF20.0100 LmjF35.4750 yes yes
LmjF20.0110 LmjF10.0510 yes yes
LmjF20.0110 LmjF24.0850 yes yes yes
LmjF20.0110 LmjF25.1120 yes yes
LmjF20.0110 LmjF35.3080 yes yes
LmjF20.0110 LmjF35.4750 yes yes
LmjF21.1710 LmjF24.0320 yes yes yes
LmjF22.1290 LmjF27.2050 yes yes yes yes

LmjF24.0320 LmjF15.0990 yes yes yes
LmjF24.0320 LmjF20.0840 yes yes yes
LmjF24.0320 LmjF23.0370 yes yes yes
LmjF24.0320 LmjF28.2680 yes yes yes
LmjF24.0320 LmjF31.1570 yes yes yes
LmjF24.0320 LmjF31.2580 yes yes yes
LmjF24.0320 LmjF36.6995 yes yes yes
LmjF25.1130 LmjF24.0320 yes yes yes
LmjF25.1170 LmjF25.1180 yes yes yes Yes
LmjF26.1710 LmjF24.0320 yes yes yes
LmjF28.2370 LmjF14.1320 yes yes
LmjF29.2510 LmjF12.0670 yes yes yes
LmjF30.3120 LmjF30.3110 yes yes yes
LmjF30.3380 LmjF10.0510 yes yes
LmjF30.3380 LmjF24.0850 yes yes yes yes
LmjF30.3380 LmjF25.1120 yes yes
LmjF30.3380 LmjF35.3080 yes yes
LmjF30.3380 LmjF35.4750 yes yes
LmjF30.3520 LmjF30.3500 yes yes yes yes
LmjF33.1090 LmjF36.2260 yes yes yes
LmjF34.0110 LmjF04.0960 yes yes yes yes
LmjF34.0120 LmjF04.0960 yes yes yes
LmjF35.1380 LmjF24.0320 yes yes yes
LmjF35.1540 LmjF24.0320 yes yes Yes

LmjF35.2160 LmjF33.0235 yes yes yes
LmjF35.3080 LmjF10.0510 yes yes yes
LmjF35.4750 LmjF24.0850 yes yes yes
LmjF36.1360 LmjF25.2370 yes yes yes
LmjF36.5390 LmjF33.1930 yes yes yes
LmjF03.0100 LmjF12.0530 yes
LmjF03.0100 LmjF36.1260 yes
LmjF03.0100 LmjF29.2510 yes
LmjF03.0740 LmjF12.0530 yes
LmjF03.0740 LmjF36.1260 yes
LmjF03.0740 LmjF29.2510 yes
LmjF04.1130 LmjF12.0530 yes
LmjF04.1130 LmjF36.1260 yes
LmjF04.1130 LmjF29.2510 yes
LmjF07.0060 LmjF12.0530 yes
LmjF07.0060 LmjF36.1260 yes
LmjF07.0060 LmjF29.2510 yes
LmjF12.0530 LmjF36.6995 yes
LmjF12.0530 LmjF36.1260 yes
LmjF12.0530 LmjF31.2580 yes
LmjF12.0530 LmjF31.1570 yes
LmjF12.0530 LmjF28.2680 yes
LmjF12.0530 LmjF24.2060 yes
LmjF12.0530 LmjF23.0370 yes
LmjF12.0530 LmjF20.0840 yes
LmjF12.0530 LmjF12.0670 yes

LmjF12.0670 LmjF36.1260 yes
LmjF16.0760 LmjF12.0530 yes
LmjF16.0760 LmjF36.1260 yes
LmjF16.0760 LmjF29.2510 yes
LmjF20.0100 LmjF36.2950 yes yes
LmjF20.0110 LmjF36.2950 yes yes
LmjF20.0840 LmjF36.1260 yes
LmjF21.1710 LmjF12.0530 yes
LmjF21.1710 LmjF36.1260 yes
LmjF21.1710 LmjF29.2510 yes
LmjF23.0370 LmjF36.1260 yes
LmjF24.0320 LmjF12.0670 yes
LmjF24.0370 LmjF35.0820 yes
LmjF25.1120 LmjF35.4750 yes
LmjF25.1130 LmjF12.0530 yes
LmjF25.1130 LmjF36.1260 yes
LmjF25.1130 LmjF29.2510 yes
LmjF26.1710 LmjF12.0530 yes
LmjF26.1710 LmjF36.1260 yes
LmjF26.1710 LmjF29.2510 yes
LmjF29.2510 LmjF36.6995 yes
LmjF29.2510 LmjF12.0530 yes
LmjF29.2510 LmjF31.2580 yes
LmjF29.2510 LmjF31.1570 yes

LmjF29.2510 LmjF28.2680 yes
LmjF29.2510 LmjF24.2060 yes
LmjF29.2510 LmjF23.0370 yes
LmjF29.2510 LmjF20.0840 yes
LmjF30.3380 LmjF36.2950 yes yes
LmjF31.1570 LmjF36.1260 yes
LmjF35.1380 LmjF12.0530 yes
LmjF35.1380 LmjF36.1260 yes
LmjF35.1380 LmjF29.2510 yes
LmjF35.1540 LmjF12.0530 yes
LmjF35.1540 LmjF36.1260 yes
LmjF35.1540 LmjF29.2510 yes
LmjF35.4750 LmjF10.0510 yes
LmjF36.1260 LmjF31.2580 yes
LmjF36.1260 LmjF28.2680 yes
LmjF36.1260 LmjF24.2060 yes
LmjF36.2950 LmjF35.4750 yes Yes
LmjF36.6995 LmjF36.1260 yes
LmjF02.0500 LmjF20.0100 yes
LmjF02.0500 LmjF20.0110 yes
LmjF02.0500 LmjF30.3380 yes
LmjF03.0910 LmjF31.1220 yes
LmjF06.0880 LmjF28.2510 yes
LmjF16.1340 LmjF31.0590 yes Yes
LmjF16.1340 LmjF29.2140 yes Yes
LmjF16.1340 LmjF36.2380 yes Yes
LmjF24.2250 LmjF14.0510 yes

LmjF26.1550 LmjF36.1140 yes
LmjF06.0860 LmjF21.1210 yes yes
LmjF06.1070 LmjF27.0420 yes
LmjF18.1330 LmjF31.2290 yes
LmjF18.1330 LmjF31.3120 yes
LmjF23.1480 LmjF12.0630 yes
LmjF27.0420 LmjF01.0480 yes
LmjF27.2440 LmjF24.2030 yes
LmjF31.2290 LmjF36.5900 yes
LmjF31.3120 LmjF35.1470 yes
LmjF31.3120 LmjF36.5900 yes
LmjF34.4600 LmjF20.0970 yes
LmjF36.2360 LmjF35.0820 yes
LmjF36.4430 LmjF06.0950 yes
LmjF35.1470 LmjF31.2290 yes
LmjF32.3260 LmjF36.4430 yes

Legend: Promast: Promastigote stage; Amast: Amastigote stage

4.7 Appendix 4c

Table I: Predicted phenotypes of the reaction knockout simulations using ext-iAC560 and previous metabolic models. Details of the 67 reactions considered in the reaction knockout studies using ext-iAC560 are listed. A ‘-’ in the “Predicted phenotypes” columns indicates that the reaction is either absent from the metabolic network or was not considered in the knockout studies using the corresponding metabolic model. Additional information about each reaction from the literature and databases is given in the corresponding columns. The shaded rows represent cases where the predicted phenotype from at least one of the models does not match the experimental data.

Sl. No. | Reactions (ID) | Name | Gene Rxn association | Experimental information: Phenotypes | Experimental information: Organism | Experimental information: References | Predicted phenotypes: iAC560 (L. major; MM+Glc, MM+Glc+aa), iMS604 (L. donovani), iAS142 (L. infantum), ext-iAC560 (L. major; Promastigote, MMP media; Amastigote, MMA media); Comments
1 | ARGt | L-arginine transport via proton symport | (((LmjF14.0320 or LmjF22.0230) or LmjF11.0520) or LmjF27.0670) | L | L. donovani | (Shaked-Mishan et al. 2006) | - L - NL -
2 | GRTT | Farnesyl pyrophosphate synthase | LmjF22.1360 | L | L. major | (Aripirala et al. 2014; Ortiz-Gómez et al. 2006; Aripirala et al. 2011) | - L - L -
3 | GRTTx | Farnesyl pyrophosphate synthase | LmjF22.1360 | L | L. major | (Aripirala et al. 2014; Ortiz-Gómez et al. 2006; Aripirala et al. 2011) | - L - L -
4 | IPCS_LM | Inositol phosphorylceramide synthase (IPC) Synthase | LmjF35.4990 | L | L. major | (Mandlik et al. 2014; Smith and Bütikofer 2010; Denny et al. 2006; Mina et al. 2009) | - NL - NL -
5 | TMDS | Thymidylate synthase | LmjF06.0860 | NL | L. major | (Beverley et al. 1986) | - NL - L -

6 | PHE4MOi | Phenylalanine 4-monooxygenase | LmjF28.1280 | NL | L. major | (Ong et al. 2011; Lye et al. 2011) | - L - L -
7 | PHE4MOi2 | Phenylalanine 4-monooxygenase | LmjF28.1280 | NL | L. major | (Ong et al. 2011; Lye et al. 2011) | - L - L -
8 | MTHFR | 5,10-methylenetetrahydrofolate reductase (FADH2) | LmjF36.6390 | NL | L. major | (Vickers et al. 2006) | - NL - L -
9 | FTHFLr | formate-tetrahydrofolate ligase | LmjF30.2600 | L | L. major | (Murta et al. 2009) | - L - L -
10 | ADSL1r | adenylsuccinate lyase | LmjF04.0460 | L | L. donovani | (Boitz et al. 2013) | - NL - NL -
11 | ADSL2r | adenylsuccinate lyase | LmjF04.0460 | L | L. donovani | (Boitz et al. 2013) | - NL - NL -
12 | PMANMg | phosphomannomutase | LmjF34.3780 | NL | L. mexicana | (Garami et al. 2001) | - NL NL NL -
13 | ADMDCi | Adenosylmethionine decarboxylase | (LmjF30.3110 or LmjF30.3120) | L | L. donovani | (Singh et al. 2013) | L L - L -
14 | HMGCOARi | Hydroxymethylglutaryl CoA reductase | LmjF30.3190 | L | L. donovani | (Dinesh et al. 2014) | - L - L -
15 | HMGCOARi_x | Hydroxymethylglutaryl CoA reductase | LmjF30.3190 | L | L. donovani | (Dinesh et al. 2014) | - L - L -
16 | MAN6PI | mannose-6-phosphate isomerase | LmjF32.1580 | NL | L. mexicana | (Garami and Ilg 2001) | L L NL L -
17 | ADNK1c | adenosine kinase | ((LmjF30.0880 or LmjF34.3600) or LmjF30.0890) | NL | L. donovani | (Iovannisci and Ullman 1984) | NL L - NL -
18 | ADNK1er | adenosine kinase | ((LmjF30.0880 or LmjF34.3600) or LmjF30.0890) | NL | L. donovani | (Iovannisci and Ullman 1984) | NL L - NL -
19 | ADNK1m | adenosine kinase | ((LmjF30.0880 or LmjF34.3600) or LmjF30.0890) | NL | L. donovani | (Iovannisci and Ullman 1984) | NL L - NL -

Metabolic Systems Biology of Leishmania major University of Minho, 2019 192 Chapter 4

Table I: (continued). Experimental information Predicted phenotypes

iAC560 iMS604 iAS142 ext-iAC560

(ID)

Comments

mastigote

No.

donovani

.

l

L. L. major, L. major,

S Reactions Name Rxn association Gene Phenotypes Organism References major L. (MM+Glc, MM+Glc+aa) L. infantum L. Pro Amastigote ( MMPmedia) ( MMA media)

((LmjF30.0880 or (Iovannisci and 20 ADNK1g adenosine kinase LmjF34.3600) or NL L. donovani L L - NL - Ullman 1984) LmjF30.0890) serine C- (LmjF34.3740 and (Zhang et al. 21 SERPTr NL L. major NL NL NL - palmitoyltransferase LmjF35.0320) 2003) adenylosuccinate 22 ADSS_i LmjF13.1190 NL L. donovani (Boitz et al. 2013) - NL - NL - synthetase 1-(5'-Phosphoribosyl)-5- amino-4- (Boitz and Ullman 23 ADPT2 imidazolecarboxamide:pyroph LmjF26.0140 NL L. donovani L - NL - 2010) osphate phosphoribosyltransferase 1-(5'-Phosphoribosyl)-5- amino-4- (Boitz and Ullman 24 ADPTr imidazolecarboxamide:pyroph LmjF26.0140 NL L. donovani L L - NL - 2010) osphate phosphoribosyltransferase (Manhas et al. 25 ASNS3 asparagine synthetase LmjF26.0830 L L. donovani - L - L - 2014) (Boitz and Ullman xanthine 26 XPRTgr LmjF21.0850 NL L. donovani 2006a; Boitz and NL NL - L - phosphoribosyltransferase Ullman 2006b) 27 ORNDC ornithine decarboxylase LmjF12.0280 L L. donovani (Boitz et al. 2009) L L - L - (McCall et al. 28 ST14DMr Sterol 14-demethylase LmjF11.1100 L L. donovani L - L - 2015) ATP synthase, (Luque-Ortega et 29 ATPSm Genes1 L L. donovani L L L L - mitochondrial al. 2008) ribose-5-phosphate 30 RPI LmjF28.1970 L L. infantum (Faria et al. 2016) - L - L - isomerase cytochrome c oxidase, (Luque-Ortega 31 CYOO6m Genes2 L L. donovani NL NL L NL - mitochondrial and Rivas 2007) mannose-1-phosphate 32 MAN1PT1 LmjF23.0110 NL L. mexicana (Stewart et al. 2005) L L - L - guanylyltransferase

Gene Essentiality Analysis in Leishmania major 193

Table I: (continued). Experimental information Predicted phenotypes

iAC560 iMS604 iAS142 ext-iAC560

)

es

Comments lc+aa

L. L. major, L. major,

Sl. No. Sl. (ID) Reactions Name Rxn association Gene Phenotypes Organism Referenc major L. (MM+Glc, MM+G donovani L. infantum L. Promastigote Amastigote ( MMPmedia) ( MMA media) (Roberts et al. 33 ARGNg arginase LmjF35.1480 L L. mexicana L L - L - 2004) argininosuccinate (Lakhal-Naouar et 34 ARGSSr LmjF23.0260 NL L. donovani - NL - NL - synthase, reversible al. 2012) myo-Inositol-1-phosphate 35 MI1PSB LmjF14.1360 L L. mexicana (Ilg T 2002) L L - L - synthase fructose-bisphosphatase, (Naderer et al. 36 FBPg LmjF04.1160 NL L. major - NL NL NL - glycosomal 2006) succinate dehydrogenase SUCD2_u6 (Mondal et al. 37 (ubiquinone-6), LmjF15.0990 NL L. donovani - NL - NL - m 2014) mitochondrial (Cunningham and 38 TRYR trypanothione reductase LmjF05.0350 L L. donovani L L - L - Fairlamb 1995) (Gilroy et al. 39 SPMS spermidine synthase LmjF04.0580 L L. donovani L L - L - 2011) glycerol-3-phosphate (Zufferey and 40 GPAM_L LmjF34.1090 NL L. major - L - L - acyltransferase (L. major) Mamoun 2005) glycerol-3-phosphate (Zufferey and 41 GPAMm_LM LmjF34.1090 NL L. major - L - L - acyltransferase (L. major) Mamoun 2005) DHAPAx_L glycerol-3-phosphate (Zufferey and 42 LmjF34.1090 NL L. major - L - L - M acyltransferase (L. major) Mamoun 2005) hypoxanthine Axenic (Boitz and Ullman 43 HXPRTg phosphoribosyltransferase LmjF21.0845 L L. donovani amastigote - L - L L 2006a) (Hypoxanthine) stage 2-oxoglutarate (Bochud- (LmjF27.0880 or 44 AKGDe1 dehydrogenase NL T. brucei Allemann and NL - - NL - LmjF36.3470) (lipoamide) Schneider 2002)

Metabolic Systems Biology of Leishmania major University of Minho, 2019 194 Chapter 4

Table I: (continued). Experimental information Predicted phenotypes

iAC560 iMS604 iAS142 ext-iAC560

(ID)

ciation

Comments vani

media)

asso

No.

dono

.

l

L. L. major, L. major,

S Reactions Name Rxn Gene Phenotypes Organism References major L. (MM+Glc, MM+Glc+aa) L. infantum L. Promastigote ( MMPmedia) Amastigote ( MMA TrypanoFAN database (http://trypanofan. hnRNP arginine N- (LmjF03.0600 and 45 ARMT NL T. brucei path.cam.ac.uk/cgi NL - - NL - methyltransferase LmjF12.1270) - bin/WebObjects/tr ypanofan) TrypanoFAN database (http://trypanofan. ATP synthase, 46 ATPS3v Genes3 NL T. brucei path.cam.ac.uk/cgi NL - - NL - Acidocalcisome - bin/WebObjects/tr ypanofan) (Hidalgo-Zarco and 47 DUTPDPm dUTP diphosphatase LmjF06.0560 L L. major Gonzalez- NL - - NL - Pazanowska 2001) (Hidalgo-Zarco and 48 DUTPDPn dUTP diphosphatase LmjF06.0560 L L. major Gonzalez- NL - - NL - Pazanowska 2001) hypoxanthine (Hwang and Ullman 49 HXPRTg phosphoribosyltransferase LmjF21.0845 NL L. donovani L - - L - 1997) (Hypoxanthine) ((LmjF18.1380 and (Bochud- pyruvate dehydrogenase, 50 PDHe1 LmjF25.1710) and NL T. brucei Allemann and NL - - NL - mitochondrial LmjF35.0050) Schneider 2002) glucose-6-phosphate TrypanoFAN 51 PGI1 LmjF12.0530 NL T. brucei NL - - NL - isomerase, glycosomal database glucose-6-phosphate TrypanoFAN 52 PGI2 LmjF12.0530 NL T. brucei NL - - NL - isomerase, glycosomal database glucose-6-phosphate TrypanoFAN 53 PGI3 LmjF12.0530 NL T. brucei NL - - NL - isomerase, glycosomal database

Gene Essentiality Analysis in Leishmania major 195

Table I: (continued). Experimental information Predicted phenotypes

iAC560 iMS604 iAS142 ext-iAC560

(ID)

Comments

No.

donovani

.

l

L. L. major, L. major,

S Reactions Name Rxn association Gene Phenotypes Organism References major L. (MM+Glc, MM+Glc+aa) L. infantum L. Promastigote Amastigote ( MMPmedia) ( MMA media) ((LmjF20.0110 and phosphoglycerate kinase, TrypanoFAN 54 PGKg LmjF20.0100) and NL T. brucei NL - - NL - glycosomal database LmjF30.3380) ((LmjF20.0110 and phosphoglycerate kinase, TrypanoFAN 55 PGK LmjF20.0100) and NL T. brucei L - - NL - glycosomal database LmjF30.3380) (((((LmjF20.1120 or LmjF24.2010) or

1-phosphatidylinositol 3- LmjF34.4530) or TrypanoFAN 56 PIN3K_LM NL T. brucei NL - - NL - kinase LmjF30.1850) or database LmjF02.0120) or LmjF34.3940) phosphoenolpyruvate (LmjF27.1805 or (Coustou et al. 57 PPCKg NL T. brucei NL - - NL - carboxykinase, glycosomal LmjF27.1810) 2003) pyruvate phosphate (Coustou et al. 58 PPDKg LmjF11.1000 NL T. brucei NL - - NL - dikinase, glycosome 2003) (LmjF35.0030 or 59 PYK pyruvate kinase L T. brucei (Coustou et al. 2003) NL - - NL - LmjF35.0020) (Bochud-Allemann 60 SUCD1rm Succinate dehydrogenase LmjF24.1630 NL T. brucei L - - L - and Schneider 2002) ((LmjF25.2130 and Succinyl CoA synthetase (Bochud- SUCOGDP LmjF36.2950) or 61 (GDP-forming), L T. brucei Allemann and L - - L - m (LmjF25.2140 and mitochondrial Schneider 2002) LmjF36.2950)) 62 UDPG4Ex UDPglucose 4-epimerase LmjF33.2300 L T. brucei (JR et al. 2002) NL - - L - Intracellular 63 ACONTm Aconitate hydratase LmjF18.0510 L L. mexicana (Naderer et al. 2010) Leishmania - - NL - L mexicana glucosamine-6-phosphate (Naderer et al. 64 G6PDAr LmjF32.3260 L L. major - - - - L deaminase; glycosomal 2010)

Metabolic Systems Biology of Leishmania major University of Minho, 2019 196 Chapter 4

Table I: (continued). Experimental information Predicted phenotypes

iAC560 iMS604 iAS142 ext-iAC560

e

Comments

notypes

L. L. major, L. major,

Sl. No. Sl. (ID) Reactions Name Rxn association Gene Phe Organism References major L. (MM+Glc, MM+Glc+aa) donovani L. infantum L. Promastigote Amastigot ( MMPmedia) ( MMA media) step_GlcNa (Naderer et al. 65 GlcNAc6P deacetylase LmjF36.0040 NL L. major - - - NL - c6pTo 2008) ACGAM6P Glucosamine-6-phosphate (Naderer et al. 66 LmjF28.3005 L L. major Amastigote - - - - L Si acetylase 2015) glutamine-fructose-6- (Naderer et al. 67 GF6PTAr LmjF06.0950 L L. major Amastigote - - - - NL phosphate transaminase 2015) Prediction accuracy 66.67% 60.5% 83.3% 71.6% Note: Flux-based analysis was not performed to predict knockout effects using metabolic model except ext-iAC560. The results from the published models, as it is, are considered in this analysis. Legend: MM: Minimal Media containing L-arginine, L-cysteine, L-histidine, L-isoleucine, L-leucine, L-lysine, L-methionine, L-phenylalanine, L-threonine, L- tyrosine, L-valine, hypoxanthine, phosphate, and oxygen; MMP: MM+glucose+L-proline; MMA: MMP+ Stearic acid, D-Glucosamine, N-Acetyl-D-glucosamine, Phosphatidylethanolamine and D-alanine; Glc: Glucose; aa: amino acids; L: Lethal; NL: Non-lethal.

1 (((((((((((((((LmjF05.0500 and LmjF25.1170) and LmjF21.0740) and LmjF30.3600) and LmjF21.1770) or ((((LmjF05.0500 and LmjF25.1170) and LmjF24.0630) and LmjF30.3600) and LmjF21.1770)) or ((((LmjF05.0500 and LmjF25.1170) and LmjF26.0460) and LmjF30.3600) and LmjF21.1770)) or ((((LmjF05.0500 and LmjF25.1180) and LmjF21.0740) and LmjF30.3600) and LmjF21.1770)) or ((((LmjF05.0500 and LmjF25.1180) and LmjF24.0630) and LmjF30.3600) and LmjF21.1770)) or ((((LmjF05.0500 and LmjF25.1180) and LmjF26.0460) and LmjF30.3600) and LmjF21.1770)) or ((((LmjF05.0510 and LmjF25.1170) and LmjF21.0740) and LmjF30.3600) and LmjF21.1770)) or ((((LmjF05.0510 and LmjF25.1170) and LmjF24.0630) and LmjF30.3600) and LmjF21.1770)) or ((((LmjF05.0510 and LmjF25.1170) and LmjF26.0460) and LmjF30.3600) and LmjF21.1770)) or ((((LmjF05.0510 and LmjF25.1180) and LmjF21.0740) and LmjF30.3600) and LmjF21.1770)) or ((((LmjF05.0510 and LmjF25.1180) and LmjF24.0630) and LmjF30.3600) and LmjF21.1770)) or ((((LmjF05.0510 and LmjF25.1180) and LmjF26.0460) and LmjF30.3600) and LmjF21.1770))

2 (((((((((((LmjF36.6995 and LmjF23.0370) and LmjF03.0100) and LmjF28.2680) and LmjF03.0740) and LmjF12.0670) and LmjF26.1710) and LmjF21.1710) and LmjF25.1130) and LmjF31.1570) and LmjF04.1130) and LmjF20.0840)

3 ((((((((((((((LmjF21.1340 and LmjF21.1790) and LmjF23.0130) and LmjF28.1160) and LmjF30.3660) and LmjF34.3670) and LmjF05.1140) and LmjF28.2430) and LmjF18.0560) and LmjF35.0700) and LmjF36.3100) and LmjF12.0520) and LmjF23.0340) and LmjF23.1510) or (((((((((((((LmjF21.1340 and LmjF21.1800) and LmjF23.0130) and LmjF28.1160) and LmjF30.3660) and LmjF34.3670) and LmjF05.1140) and LmjF28.2430) and LmjF18.0560) and LmjF35.0700) and LmjF36.3100) and LmjF12.0520) and LmjF23.0340) and LmjF23.1510))
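The prediction accuracy reported for each model in Table I can be read as the share of scored reactions whose predicted phenotype matches the experimental one. The Python sketch below illustrates that calculation under an assumption the text does not state explicitly: entries marked '-' are skipped rather than counted as mismatches. The function name `prediction_accuracy` is illustrative, and the example uses only reactions 1 to 5 of Table I (ext-iAC560, promastigote), not the full data set or the scripts of this study.

```python
def prediction_accuracy(experimental, predicted):
    """Return the percentage of scored reactions where prediction == experiment.

    Assumption: a predicted value of "-" means the reaction was not simulated
    with that model, so the pair is left out of the score.
    """
    scored = [(e, p) for e, p in zip(experimental, predicted) if p != "-"]
    if not scored:
        raise ValueError("no scored reactions")
    matches = sum(1 for e, p in scored if e == p)
    return 100.0 * matches / len(scored)


# Reactions 1-5 of Table I: ARGt, GRTT, GRTTx, IPCS_LM, TMDS.
experimental = ["L", "L", "L", "L", "NL"]
predicted = ["NL", "L", "L", "NL", "L"]  # ext-iAC560, promastigote

accuracy = prediction_accuracy(experimental, predicted)
print(f"{accuracy:.1f}%")
```

For this five-reaction subset only GRTT and GRTTx match, so the subset accuracy differs from the value reported for all 67 reactions.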

4.8 Appendix 4d

A brief description of programming scripts written for performing various analyses in this study

(Available at https://github.com/shakyawar/SupplMaterial_PhD)

Script set 2 (Java): Performs reaction knockout simulations, in which a reaction is knocked out and the biomass is measured to determine whether the effect is lethal or non-lethal.
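The loop structure of such a reaction knockout screen can be sketched as follows. This is not the flux-based Java implementation from the repository: to stay dependency-free, the sketch replaces the biomass simulation with a simple producibility check (iterative network expansion from the available nutrients), and the toy network, metabolite names and function names are all hypothetical.

```python
def can_make_biomass(reactions, nutrients, biomass_precursors, knocked_out=None):
    """Stand-in for a biomass simulation: can every precursor still be produced?

    reactions maps a reaction ID to (substrates, products). A reaction fires
    once all of its substrates are available; this ignores stoichiometry and
    flux bounds, unlike a flux-based analysis.
    """
    available = set(nutrients)
    active = {rid: rxn for rid, rxn in reactions.items() if rid != knocked_out}
    changed = True
    while changed:
        changed = False
        for substrates, products in active.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return set(biomass_precursors) <= available


def reaction_knockout_screen(reactions, nutrients, biomass_precursors):
    """Knock out each reaction in turn and label the outcome L or NL."""
    return {
        rid: "NL" if can_make_biomass(reactions, nutrients, biomass_precursors, rid) else "L"
        for rid in reactions
    }


# Hypothetical toy network: one linear route with a duplicated last step.
toy_reactions = {
    "R1": (["glc"], ["g6p"]),
    "R2": (["g6p"], ["pyr"]),
    "R3a": (["pyr"], ["accoa"]),
    "R3b": (["pyr"], ["accoa"]),
}
phenotypes = reaction_knockout_screen(toy_reactions, ["glc"], ["accoa"])
print(phenotypes)
```

In the toy network the two parallel reactions back each other up, so only the knockouts of R1 and R2 are labelled lethal.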

Script set 4 (Java): Performs gene essentiality analysis, in which the synthesis of biomass building blocks is checked when a particular gene is knocked out of the model. The essentiality of the gene is determined from the biomass values in the wild-type and knockout conditions.
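A gene knockout is commonly propagated to reactions through Boolean gene associations of the kind listed in the "Rxn association" column of Table I: isozymes are joined by "or", subunits of a complex by "and". The Python sketch below shows one way to evaluate such a rule and to compare biomass values. Whether the Java scripts proceed in exactly this way is not stated here; the function names and the 5% cutoff in `is_essential` are assumptions made for illustration.

```python
import re

# Gene identifiers in the form used in Table I, e.g. "LmjF30.3110".
GENE_ID = re.compile(r"LmjF\d+\.\d+")
# After substitution, only these tokens may remain in the rule.
SAFE_RULE = re.compile(r"(?:True|False|and|or|[()\s])+")


def reaction_is_active(gene_rule, knocked_out_genes):
    """Evaluate a Boolean gene association with the given genes deleted."""
    expression = GENE_ID.sub(
        lambda match: str(match.group(0) not in knocked_out_genes), gene_rule
    )
    if not SAFE_RULE.fullmatch(expression):
        raise ValueError(f"unexpected token in gene rule: {gene_rule!r}")
    # The expression now contains only True/False/and/or/parentheses.
    return bool(eval(expression, {"__builtins__": {}}, {}))


def is_essential(wild_type_biomass, knockout_biomass, cutoff=0.05):
    """Hypothetical criterion: essential if biomass falls below cutoff * wild type."""
    return knockout_biomass < cutoff * wild_type_biomass


isozyme_rule = "(LmjF30.3110 or LmjF30.3120)"   # ADMDCi in Table I
complex_rule = "(LmjF34.3740 and LmjF35.0320)"  # SERPTr in Table I

still_active = reaction_is_active(isozyme_rule, {"LmjF30.3110"})
complex_active = reaction_is_active(complex_rule, {"LmjF34.3740"})
print(still_active, complex_active)
```

Deleting one isozyme leaves the "or" rule satisfied, whereas deleting one subunit disables the "and" rule; the affected reactions would then be blocked before the biomass values are compared.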


4.9 References

Akerman, M. et al., 2004. Novel motifs in amino acid permease genes from Leishmania. Biochemical and Biophysical Research Communications, 325(1), pp.353–366.
Allen, S.M. et al., 2015. Plasmodium falciparum glucose-6-phosphate dehydrogenase 6-phosphogluconolactonase is a potential drug target. The FEBS journal, 282(19), pp.3808–23.
Alvar, J. et al., 2012. Leishmaniasis worldwide and global estimates of its incidence. PLoS ONE, 7(5), p.e35671.
Aparicio, I.M. et al., 2010. Susceptibility of Plasmodium falciparum to glutamate dehydrogenase inhibitors-A possible new antimalarial target. Molecular and Biochemical Parasitology, 172(2), pp.152–155.
Aripirala, S. et al., 2011. Inhibitors of Leishmania major Farnesyl Diphosphate Synthase: Crystallographic and Calorimetric Studies. Biophysical Journal, 100(3), pp.218a–219a.
Aripirala, S. et al., 2014. Structural and thermodynamic basis of the inhibition of Leishmania major farnesyl diphosphate synthase by nitrogen-containing bisphosphonates. Acta Crystallographica Section D: Biological Crystallography, 70(3), pp.802–810.
Atack, J.R. and Fletcher, S.R., 1994. Inhibitors of inositol monophosphatase. Drugs of the Future, 19(9), pp.857–866.
Azevedo, W.F. and Soares, M.B.P., 2009. Selection of targets for drug development against protozoan parasites. Current drug targets, 10(3), pp.193–201.
Barribeau, S.M. et al., 2014. Gene expression differences underlying genotype-by-genotype specificity in a host–parasite system. Proceedings of the National Academy of Sciences, 111(9), pp.3496–3501.
Beck, J.T. and Ullman, B., 1990. Nutritional requirements of wild-type and folate transport-deficient Leishmania donovani for pterins and folates. Molecular and Biochemical Parasitology, 43(2), pp.221–230.
Bello, A.M. et al., 2007. A potent, covalent inhibitor of orotidine 5′-monophosphate decarboxylase with antimalarial activity. Journal of Medicinal Chemistry, 50(5), pp.915–921.
Beverley, S.M., Ellenberger, T.E. and Cordingley, J.S., 1986. Primary structure of the gene encoding the bifunctional dihydrofolate reductase-thymidylate synthase of Leishmania major. Proceedings of the National Academy of Sciences of the United States of America, 83(8), pp.2584–2588.
Bochud-Allemann, N. and Schneider, A., 2002. Mitochondrial substrate level phosphorylation is essential for growth of procyclic Trypanosoma brucei. Journal of Biological Chemistry, 277(36), pp.32849–32854.
Boitz, J.M. et al., 2013. Adenylosuccinate synthetase and adenylosuccinate lyase deficiencies trigger growth and infectivity deficits in Leishmania donovani. Journal of Biological Chemistry, 288(13), pp.8977–8990.
Boitz, J.M. et al., 2009. Leishmania donovani ornithine decarboxylase is indispensable for parasite survival in the mammalian host. Infection and Immunity, 77(2), pp.756–763.
Boitz, J.M. and Ullman, B., 2006a. A conditional mutant deficient in hypoxanthine-guanine phosphoribosyltransferase and xanthine phosphoribosyltransferase validates the purine salvage pathway of Leishmania donovani. Journal of Biological Chemistry, 281(23), pp.16084–16089.
Boitz, J.M. and Ullman, B., 2010. Amplification of adenine phosphoribosyltransferase suppresses the conditionally lethal growth and virulence phenotype of Leishmania donovani mutants lacking both hypoxanthine-guanine and xanthine phosphoribosyltransferases. Journal of Biological Chemistry, 285(24), pp.18555–18564.
Boitz, J.M. and Ullman, B., 2006b. Leishmania donovani singly deficient in HGPRT, APRT or XPRT are viable in vitro and within mammalian macrophages. Molecular and Biochemical Parasitology, 148(1), pp.24–30.
Bozdech, Z. et al., 2003. The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biology, 1(1), p.E5.
Caspi, R. et al., 2014. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Research, 42(D1), pp.D742–753.
Chavali, A.K. et al., 2012a. A metabolic network approach for the identification and prioritization of antimicrobial drug targets. Trends Microbiol, 20(3), pp.113–123.
Chavali, A.K. et al., 2012. Metabolic network analysis predicts efficacy of FDA-approved drugs targeting the causative agent of a neglected tropical disease. BMC Systems Biology, 6(1), p.27.

Chavali, A.K. et al., 2008. Systems analysis of metabolism in the pathogenic trypanosomatid Leishmania major. Molecular Systems Biology, 4, p.177.
Cilingir, G., Broschat, S.L. and Lau, A.O.T., 2012. ApicoAP: The first computational model for identifying apicoplast-targeted proteins in multiple species of apicomplexa. PLoS ONE, 7(5), p.e36598.
Coustou, V. et al., 2003. ATP generation in the Trypanosoma brucei procyclic form: cytosolic substrate level is essential, but not oxidative phosphorylation. The Journal of biological chemistry, 278(49), pp.49625–49635.
Coustou, V. et al., 2006. Fumarate is an essential intermediary metabolite produced by the procyclic Trypanosoma brucei. Journal of Biological Chemistry, 281(37), pp.26832–26846.
Cruz, A. and Beverley, S.M., 1990. Gene replacement in parasitic protozoa. Nature, 348, pp.171–173.
Cunningham, M.L. and Fairlamb, A.H., 1995. Trypanothione Reductase from Leishmania donovani Purification, Characterisation and Inhibition by Trivalent Antimonials. European Journal of Biochemistry, 230(2), pp.460–468.
D’Souza, K. et al., 2014. Distinct properties of the two isoforms of CDP-diacylglycerol synthase. Biochemistry, 53(47), pp.7358–7367.
Das, M., Singh, S. and Dubey, V.K., 2016. Novel Inhibitors of Ornithine Decarboxylase of Leishmania Parasite (LdODC): The Parasite Resists LdODC Inhibition by Overexpression of Spermidine Synthase. Chem Biol Drug Des, 87(3), pp.352–60.
Datta, N.S. and Hajra, A.K., 1984. Does microsomal glycerophosphate acyltransferase also catalyze the acylation of dihydroxyacetone phosphate? FEBS Letters, 176(1), pp.264–268.
Dean, S. et al., 2013. Single point mutations in ATP synthase compensate for mitochondrial genome loss in trypanosomes. Proceedings of the National Academy of Sciences of the United States of America, 110(36), pp.14741–14746.
Denny, P.W. et al., 2006. The protozoan inositol phosphorylceramide synthase: A novel drug target that defines a new class of sphingolipid synthase. Journal of Biological Chemistry, 281(38), pp.28200–28209.
Depledge, D.P. et al., 2009. Comparative expression profiling of Leishmania: Modulation in gene expression between species and in different host genetic backgrounds. PLoS Neglected Tropical Diseases, 3(7), p.e476.
Dinesh, N. et al., 2014. Exploring Leishmania donovani 3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGR) as a potential drug target by biochemical, biophysical and inhibition studies. Microbial Pathogenesis, 66, pp.14–23.
Durmus, S. et al., 2015. A review on computational systems biology of pathogen-host interactions. Frontiers in Microbiology, 6, p.235.
Edwards, J.S. and Palsson, B.O., 2000. Robustness analysis of the Escherichia coli metabolic network. Biotechnology Progress, 16(6), pp.927–939.
El-Sayed, N.M. et al., 2005. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science (New York, N.Y.), 309(5733), pp.409–15.
Fang, X., Wallqvist, A. and Reifman, J., 2010. Development and analysis of an in vivo-compatible metabolic network of Mycobacterium tuberculosis. BMC Systems Biology, 4(1), p.160.
Faria, J. et al., 2016. Disclosing the essentiality of ribose-5-phosphate isomerase B in Trypanosomatids. Scientific Reports, 6, p.26937.
Farine, L. et al., 2017. Phosphatidylserine synthase 2 and phosphatidylserine decarboxylase are essential for aminophospholipid synthesis in Trypanosoma brucei. Molecular Microbiology, 104(3), pp.412–427.
Feliciano, P.R. et al., 2012. Fumarate hydratase isoforms of Leishmania major: Subcellular localization, structural and kinetic properties. International Journal of Biological Macromolecules, 51, pp.25–31.
Fernandes, J.D.S. et al., 2015. The role of amino acid permeases and tryptophan biosynthesis in Cryptococcus neoformans survival. PLoS ONE, 10(7), p.e0132369.
Ferreira, É.R. et al., 2016. Unique behavior of Trypanosoma cruzi mevalonate kinase: A conserved glycosomal enzyme involved in host cell invasion and signaling. Scientific Reports, 6(April), p.24610.
Garami, A. and Ilg, T., 2001. The Role of Phosphomannose Isomerase in Leishmania mexicana Glycoconjugate Synthesis and Virulence. Journal of Biological Chemistry, 276(9), pp.6566–6575.
Garami, A., Mehlert, A. and Ilg, T., 2001. Glycosylation defects and virulence phenotypes of Leishmania mexicana phosphomannomutase and dolicholphosphate-mannose synthase gene deletion mutants. Molecular and cellular biology, 21(23), pp.8168–83.
Geren, C.R., Geren, L.M. and Ebner, K.E., 1977. Inhibition and inactivation of bovine mammary and liver UDP galactose 4 epimerases. Journal of Biological Chemistry, 252(6), pp.2089–2094.
Gilbert, I.H., 2002. Inhibitors of dihydrofolate reductase in Leishmania and Trypanosomes. Biochimica et Biophysica Acta - Molecular Basis of Disease, 1587(2–3), pp.249–257.
Gilroy, C. et al., 2011. Spermidine synthase is required for virulence of Leishmania donovani. Infection and Immunity, 79(7), pp.2764–2769.
Green, L.S. et al., 2009. Inhibition of methionyl-tRNA synthetase by REP8839 and effects of resistance mutations on enzyme activity. Antimicrobial Agents and Chemotherapy, 53(1), pp.86–94.
Gutiérrez-Correa, J., 2006. Trypanosoma cruzi dihydrolipoamide dehydrogenase as target for phenothiazine cationic radicals. Effect of antioxidants. Current drug targets, 7(9), pp.1155–79.
Haydn, P.P. et al., 1982. Inhibition of Phosphatidylethanolamine N-Methylation by 3-Deazaadenosine Stimulates the Synthesis of Phosphatidylcholine via the CDP-Choline Pathway. The Journal of Biological Chemistry, 257, pp.6362–6367.
Hidalgo-Zarco, F. and Gonzalez-Pazanowska, D., 2001. Trypanosomal dUTPases as potential targets for drug design. Curr Protein Pept Sci, 2(4), pp.389–397.
Hirabayashi, T., Larson, T.J. and Dowhan, W., 1976. Membrane-associated phosphatidylglycerophosphate synthetase from Escherichia coli: purification by substrate affinity chromatography on cytidine 5’-diphospho-1,2-diacyl-sn-glycerol sepharose. Biochemistry, 15(24), pp.5205–11.
Hurtado-Guerrero, R. et al., 2007. Glucose-6-phosphate as a probe for the glucosamine-6-phosphate N-acetyltransferase Michaelis complex. FEBS Letters, 581(29), pp.5597–5600.
Hussain, T., Yogavel, M. and Sharma, A., 2015. Inhibition of protein synthesis and malaria parasite development by drug targeting of methionyl-tRNA synthetases. Antimicrobial Agents and Chemotherapy, 59(4), pp.1856–1867.
Hwang, H.Y. and Ullman, B., 1997. Genetic analysis of purine metabolism in Leishmania donovani. J Biol Chem, 272(31), pp.19488–19496.
Ilg T, 2002. Generation of myo-inositol-auxotrophic Leishmania mexicana mutants by targeted replacement of the myo-inositol-1-phosphate synthase gene. Mol Biochem Parasitol., 120(1), pp.151–156.
Iovannisci, D.M. and Ullman, B., 1984. Characterization of a mutant Leishmania donovani deficient in adenosine kinase activity. Mol Biochem Parasitol, 12(2), pp.139–151.
Ismail, H.M. et al., 2016. Artemisinin activity-based probes identify multiple molecular targets within the asexual stage of the malaria parasites Plasmodium falciparum 3D7. Proceedings of the National Academy of Sciences, 113(8), pp.2080–2085.
Ivens, A.C. et al., 2005. The genome of the kinetoplastid parasite, Leishmania major. Science, 309(5733), pp.436–442.
Jamshidi, N. and Palsson, B.Ø., 2007. Investigating the metabolic capabilities of Mycobacterium tuberculosis H37Rv using the in silico strain iNJ661 and proposing alternative drug targets. BMC systems biology, 1, p.26.
Jiang, P. et al., 2015. Network analysis of gene essentiality in functional genomics experiments. Genome Biology, 16(1), p.239.
JR, R., ML, G. and Milne KG, F.M., 2002. Galactose metabolism is essential for the African sleeping sickness parasite Trypanosoma brucei. Proc Natl Acad Sci U S A., 30(99), pp.5884–9.
Katsila, T. et al., 2016. Computational approaches in target identification and drug discovery. Computational and Structural Biotechnology Journal, 14, pp.177–184.
Kaur, P.K. et al., 2016. Mutational and structural analysis of conserved residues in ribose-5-phosphate isomerase B from Leishmania donovani: Role in substrate recognition and conformational stability. PLoS ONE, 11(3), p.e0150764.
Kissinger, J.C. et al., 2003. ToxoDB: Accessing the Toxoplasma gondii genome. Nucleic Acids Research, 31(1), pp.234–236.
Kizjakina, K., Tanner, J.J. and Sobrado, P., 2013. Targeting UDP-galactopyranose mutases from eukaryotic human pathogens. Current pharmaceutical design, 19(14), pp.2561–73.

Krungkrai, J. et al., 2003. Molecular biology and biochemistry of malarial parasite pyrimidine biosynthetic pathway. The Southeast Asian journal of tropical medicine and public health, 34 Suppl 2, pp.32–43.
Lakhal-Naouar, I. et al., 2012. Leishmania donovani Argininosuccinate Synthase Is an Active Enzyme Associated with Parasite Pathogenesis. PLoS Neglected Tropical Diseases, 6(10), p.e1849.
Larhlimi, A. et al., 2011. Robustness of metabolic networks: a review of existing definitions. Bio Systems, 106(1), pp.1–8.
Lepesheva, G.I. and Waterman, M.R., 2011. Sterol 14alpha-Demethylase (CYP51) as a Therapeutic Target for Human Trypanosomiasis and Leishmaniasis. Current topics in medicinal chemistry, 11(16), pp.2060–2071.
Lilley, A.C. et al., 2014. The essential roles of cytidine diphosphate-diacylglycerol synthase in bloodstream form Trypanosoma brucei. Molecular Microbiology, 92(3), pp.453–470.
Ling, L.L. et al., 2004. Identification and Characterization of Inhibitors of Bacterial Enoyl-Acyl Carrier Protein Reductase. Antimicrobial Agents and Chemotherapy, 48(5), pp.1541–1547.
Lu, P., Lill, H. and Bald, D., 2014. ATP synthase in mycobacteria: Special features and implications for a function as drug target. Biochimica et Biophysica Acta - Bioenergetics, 1837(7), pp.1208–1218.
Luque-Ortega, J.R. et al., 2008. Human antimicrobial peptide histatin 5 is a cell-penetrating peptide targeting mitochondrial ATP synthesis in Leishmania. The FASEB Journal, 22(6), pp.1817–1828.
Luque-Ortega, J.R. and Rivas, L., 2007. Miltefosine (hexadecylphosphocholine) inhibits cytochrome c oxidase in Leishmania donovani promastigotes. Antimicrobial Agents and Chemotherapy, 51(4), pp.1327–1332.
Lye, L.F. et al., 2011. Phenylalanine hydroxylase (PAH) from the lower eukaryote Leishmania major. Molecular and Biochemical Parasitology, 175(1), pp.58–67.
Ma, X.H. et al., 2010. In-silico approaches to multi-target drug discovery computer aided multi-target drug design, multi-target virtual screening. Pharmaceutical Research, 27(5), pp.739–749.
Madden, T., 2013. The BLAST sequence analysis tool. The BLAST Sequence Analysis Tool, pp.1–17.
Magariños, M.P. et al., 2012. TDR targets: A chemogenomics resource for neglected diseases. Nucleic Acids Research, 40(D1), pp.D1118–D1127.
Mandlik, V., Shinde, S. and Singh, S., 2014. Molecular evolution of the enzymes involved in the sphingolipid metabolism of Leishmania: selection pressure in relation to functional divergence and conservation. BMC Evolutionary Biology, 14(1), p.142.
Manhas, R. et al., 2014. Identification and functional characterization of a novel bacterial type asparagine synthetase A: a tRNA synthetase paralog from Leishmania donovani. The Journal of biological chemistry, 289(17), pp.12096–108.
Mariño, K. et al., 2011. Characterization, localization, essentiality, and high-resolution crystal structure of glucosamine 6-phosphate N-acetyltransferase from Trypanosoma brucei. Eukaryotic Cell, 10(7), pp.985–997.
Marriott, M.S., 1980. Inhibition of sterol biosynthesis in Candida albicans by imidazole-containing antifungals. Journal of General Microbiology, 117(1), pp.253–255.
Martho, K.F.C. et al., 2016. Amino acid permeases and virulence in Cryptococcus neoformans. PLoS ONE, 11(10), p.e0163919.
Martin, K.L. and Smith, T.K., 2005. The myo-inositol-1-phosphate synthase gene is essential in Trypanosoma brucei. Biochemical Society Transactions, 33(Pt 5), pp.983–985.
Mazet, M. et al., 2013. Revisiting the Central Metabolism of the Bloodstream Forms of Trypanosoma brucei: Production of Acetate in the Mitochondrion Is Essential for Parasite Viability. PLoS Neglected Tropical Diseases, 7(12), p.e2587.
McCall, L.I. et al., 2015. Targeting Ergosterol Biosynthesis in Leishmania donovani: Essentiality of Sterol 14alpha-demethylase. PLoS Neglected Tropical Diseases, 9(3), p.e0003588.
Mina, J.G. et al., 2009. The Trypanosoma brucei sphingolipid synthase, an essential enzyme and drug target. Molecular and Biochemical Parasitology, 168(1), pp.16–23.
Mondal, S., Roy, J.J. and Bera, T., 2014. Generation of adenosine tri-phosphate in Leishmania donovani amastigote forms. Acta parasitologica / Witold Stefański Institute of Parasitology, Warszawa, Poland, 59(1), pp.11–6.


Moreira, D.S. and Murta, S.M.F., 2016. Involvement of nucleoside diphosphate kinase b and elongation factor 2 in Leishmania braziliensis antimony resistance phenotype. Parasites & vectors, 9(1), p.641. Muench, S.P. et al., 2007. Studies of Toxoplasma gondii and Plasmodium falciparum enoyl acyl carrier protein reductase and implications for the development of antiparasitic agents. Acta Crystallographica Section D: Biological Crystallography, 63(3), pp.328–338. Murta, S.M.F. et al., 2009. Methylene tetrahydrofolate dehydrogenase/cyclohydrolase and the synthesis of 10-CHO- THF are essential in Leishmania major. Molecular Microbiology, 71(6), pp.1386–1401. Naderer, T. et al., 2015. Intracellular Survival of Leishmania major Depends on Uptake and Degradation of Extracellular Matrix Glycosaminoglycans by Macrophages. PLoS Pathogens, 11(9), pp.1–20. Naderer, T. et al., 2006. Virulence of Leishmania major in macrophages and mice requires the gluconeogenic enzyme fructose-1,6-bisphosphatase. Proceedings of the National Academy of Sciences of the United States of America, 103(14), pp.5502–5507. Naderer, T., Heng, J. and McConville, M.J., 2010. Evidence that intracellular stages of Leishmania major utilize amino sugars as a major carbon source. PLoS Pathogens, 6(12), p. e1001245. Naderer, T., Wee, E. and McConville, M.J., 2008. Role of hexosamine biosynthesis in Leishmania growth and virulence. Molecular Microbiology, 69(4), pp.858–869. Nare, B. et al., 2009. PTR1-dependent synthesis of tetrahydrobiopterin contributes to oxidant susceptibility in the trypanosomatid protozoan parasite Leishmania major. Current Genetics, 55(3), pp.287–299. Ong, H.B. et al., 2011. Dissecting the metabolic roles of pteridine reductase 1 in Trypanosoma brucei and Leishmania major. Journal of Biological Chemistry, 286(12), pp.10429–10438. Ortiz-Gómez, A. et al., 2006. 
Farnesyl diphosphate synthase is a cytosolic enzyme in Leishmania major promastigotes and its overexpression confers resistance to risedronate. Eukaryotic Cell, 5(7), pp.1057–1064. Ou-Yang, S.S. et al., 2012. Computational drug discovery. Acta Pharmacologica Sinica, 33(9), pp.1131–1140. Paul, M.L.S. et al., 2014. Essential gene identification and drug target prioritization in Leishmania species. Molecular BioSystems, 10(5), p.1184. Pedersen, L.L. and Turco, S.J., 2003. Galactofuranose metabolism: A potential target for antimicrobial chemotherapy. Cellular and Molecular Life Sciences, 60(2), pp.259–266. Pei, Y. et al., 2010. Plasmodium pyruvate dehydrogenase activity is only essential for the parasite’s progression from liver infection to blood infection. Molecular Microbiology, 75(4), pp.957–971. Phillips, M.A. et al., 2015. A long-duration dihydroorotate dehydrogenase inhibitor (DSM265) for prevention and treatment of malaria. Science Translational Medicine, 7(296), p.296ra111. Phillips, M.A. and Rathod, P.K., 2010. Plasmodium dihydroorotate dehydrogenase: a promising target for novel anti- malarial chemotherapy. Infect.Disord.Drug Targets., 10(1871–5265 (Print)), pp.226–239. Plata, G. et al., 2010. Reconstruction and flux-balance analysis of the Plasmodium falciparum metabolic network. Molecular Systems Biology, 6(408), p.408. Popowska, M., Osińska, M. and Rzeczkowska, M., 2012. N-acetylglucosamine-6-phosphate deacetylase (NagA) of Listeria monocytogenes EGD, an essential enzyme for the metabolism and recycling of amino sugars. Archives of Microbiology, 194(4), pp.255–268. Rakotomanga, M. et al., 2007. Miltefosine affects lipid metabolism in Leishmania donovani promastigotes. Antimicrobial Agents and Chemotherapy, 51(4), pp.1425–1430. Roberts, S.C. et al., 2004. Arginase plays a pivotal role in polyamine precursor metabolism in Leishmania: Characterization of gene deletion mutants. Journal of Biological Chemistry, 279(22), pp.23668–23678. Roldán, A. et al., 2011. 
Lipoamide dehydrogenase is essential for both bloodstream and procyclic Trypanosoma brucei. Molecular Microbiology, 81(3), pp.623–639. Ryder, N.S., 1992. Terbinafine: Mode of action and properties of the squalene epoxidase inhibition. British Journal of Dermatology, 126, pp.2–7. Schlame, M. and Haldar, D., 1993. Cardiolipin is synthesized on the matrix side of the inner membrane in rat liver mitochondria. The Journal of biological chemistry, 268(1), pp.74–9.

Schnaufer, A. et al., 2005. The F1-ATP synthase complex in bloodstream stage trypanosomes has an unusual and essential function. EMBO Journal, 24(23), pp.4029–4040. Scott D.A., Coombs G.H. and Sanderson B.E., 1987. Effects of methotrexate and other antifolates on the growth and dihydrofolate reductase activity of Leishmania promastigotes. Biochem Pharmacol, 36(12), p.2043–5. Sen, P., Vial, H.J. and Radulescu, O., 2013. Kinetic modelling of phospholipid synthesis in Plasmodium knowlesi unravels crucial steps and relative importance of multiple pathways. BMC systems biology, 7, p.123. Serricchio, M. and Bütikofer, P., 2012. An essential bacterial-type cardiolipin synthase mediates cardiolipin formation in a eukaryote. Proceedings of the National Academy of Sciences of the United States of America, 109(16), pp.E954- 61. Serricchio, M. and Bütikofer, P., 2013. Phosphatidylglycerophosphate synthase associates with a mitochondrial inner membrane complex and is essential for growth of Trypanosoma brucei. Molecular Microbiology, 87(3), pp.569– 579. Sgraja, T., Smith, T.K. and Hunter, W.N., 2007. Structure, substrate recognition and reactivity of Leishmania major mevalonate kinase. BMC structural biology, 7, p.20. Shaked-Mishan, P. et al., 2006. A novel high-affinity arginine transporter from the human parasitic protozoan Leishmania donovani. Molecular Microbiology, 60(1), pp.30–38. Sharma, M. et al., 2017. A systematic reconstruction and constraint-based analysis of Leishmania donovani metabolic network: identification of potential antileishmanial drug targets. Mol. BioSyst., 13(5), pp.955–969. Shih, S. et al., 1998. Localization and targeting of the Leishmania donovani hypoxanthine- guanine phosphoribosyltransferase to the glycosome. Journal of Biological Chemistry, 273(3), pp.1534–1541. Singh, S.P., Agnihotri, P. and Pratap, J.V., 2013. Characterization of a Novel Putative S-Adenosylmethionine Decarboxylase-Like Protein from Leishmania donovani. PLoS ONE, 8(6), p. 
e65912. Smith, T.K. and Bütikofer, P., 2010. Lipid metabolism in Trypanosoma brucei. Molecular and Biochemical Parasitology, 172(2), pp.66–79. Soltero-Higgin, M. et al., 2004. Identification of inhibitors for UDP-galactopyranose mutase. Journal of the American Chemical Society, 126(34), pp.10532–10533. Souza, J.M., Plumbridge, J. a and Calcagno, M.L., 1997. N-acetylglucosamine-6-phosphate deacetylase from Escherichia coli: purification and molecular and kinetic characterization. Archives of biochemistry and biophysics, 340(2), pp.338–46. Starnes, G.L. et al., 2009. Aldolase Is Essential for Energy Production and Bridging Adhesin-Actin Cytoskeletal Interactions during Parasite Invasion of Host Cells. Cell Host and Microbe, 5(4), pp.353–364. Stern, A.L. et al., 2007. Ribose 5-phosphate isomerase type B from Trypanosoma cruzi: kinetic properties and site- directed mutagenesis reveal information about the reaction mechanism. The Biochemical journal, 401(1), pp.279– 85. Stewart, J. et al., 2005. Characterisation of a Leishmania mexicana knockout lacking guanosine diphosphate-mannose pyrophosphorylase. International Journal for Parasitology, 35(8), pp.861–873. Stokes, M.J. et al., 2008. The synthesis of UDP-N-acetylglucosamine is essential for bloodstream form Trypanosoma brucei in vitro and in vivo and UDP-N-acetylglucosamine starvation reveals a hierarchy in parasite protein glycosylation. Journal of Biological Chemistry, 283(23), pp.16147–16161. Stone, S.J. and Vance, J.E., 2000. Phosphatidylserine synthase-1 and -2 are localized to mitochondria-associated membranes. Journal of Biological Chemistry, 275(44), pp.34534–34540. Subramanian, A., Jhawar, J. and Sarkar, R.R., 2015. Dissecting Leishmania infantum energy metabolism - A systems perspective. PLoS ONE, 10(9), p. e0137976. Subramanian, A. and Sarkar, R.R., 2017. Revealing the mystery of metabolic adaptations using a genome scale model of Leishmania infantum. Scientific Reports, 7(1), p.10262. Surolia, N. 
and Surolia, A., 2001. Triclosan offers protection against blood stages of malaria by inhibiting enoyl-ACP reductase of Plasmodium falciparum. Nature medicine, 7(2), pp.167–73. Aguero F. et al., 2008. Genomic-scale prioritization of drug targets: the TDR Targets database. Nat Rev Drug Discov. 7(11), pp.900-907

Thompson, K. et al., 2002. Identification of a bisphosphonate that inhibits isopentenyl diphosphate isomerase and farnesyl diphosphate synthase. Biochemical and biophysical research communications, 290(2), pp.869–73. Tiwari, K., Kumar, R. and Dubey, V.K., 2016. Biochemical characterization of dihydroorotase of Leishmania donovani: Understanding pyrimidine metabolism through its inhibition. Biochimie, 131, pp.45–53. Tobalina, L. et al., 2016. Assessment of FBA Based Gene Essentiality Analysis in Cancer with a Fast Context-Specific Network Reconstruction Method. PloS one, 11(5), p.e0154583. Tullius, M. V., Harth, G. and Horwitz, M.A., 2003. Glutamine synthetase GlnA1 is essential for growth of Mycobacterium tuberculosis in human THP-1 macrophages and guinea pigs. Infection and Immunity, 71(7), pp.3927–3936. Tymoshenko, S. et al., 2015. Metabolic Needs and Capabilities of Toxoplasma gondii through Combined Computational and Experimental Analysis. PLoS Computational Biology, 11(5), p. e1004261. Urbaniak, M.D. et al., 2013. A novel allosteric inhibitor of the uridine diphosphate N-acetylglucosamine pyrophosphorylase from Trypanosoma brucei. ACS Chemical Biology, 8(9), pp.1981–1987. Urbina, J.A., 2009. Ergosterol biosynthesis and drug development for Chagas disease. Memorias do Instituto Oswaldo Cruz, 104(SUPPL. 1), pp.311–318. Vickers, T.J. et al., 2006. Biochemical and genetic analysis of methylenetetrahydrofolate reductase in Leishmania metabolism and virulence. Journal of Biological Chemistry, 281(50), pp.38150–38158. Vieira, P.S. et al., 2017. Pyrrole-indolinone SU11652 targets the nucleoside diphosphate kinase from Leishmania parasites. Biochemical and Biophysical Research Communications, 488(3), pp.461–465. Wishart, D.S. et al., 2008. DrugBank: A knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Research, 36(SUPPL. 1), pp. D901-D906. Woodruff, W.W. and Wolfenden, R., 1979. Inhibition of ribose-5-phosphate isomerase by 4-phosphoerythronate. 
Journal of Biological Chemistry, 254(13), pp.5866–5867. Xu, W. et al., 2014. Sterol Biosynthesis Is Required for Heat Resistance but Not Extracellular Survival in Leishmania. PLoS Pathogens, 10(10), p. e1004427. Yadav, V. et al., 2011. N-acetylglucosamine 6-phosphate deacetylase (nagA) is required for N-acetyl glucosamine assimilation in gluconacetobacter xylinus. PLoS ONE, 6(6), p. e18099. Yamagishi, J. et al., 2014. Interactive transcriptome analysis of malaria patients and infecting Plasmodium falciparum. Genome Research, 24(9), pp.1433–1444. Yao, J. et al., 2016. Enoyl-acyl carrier protein reductase I (FabI) is essential for the intracellular growth of Listeria monocytogenes. Infection and Immunity, 84(12), pp.3597–3607. Zhang, K. et al., 2003. Sphingolipids are essential for differentiation but not growth in Leishmania. EMBO Journal, 22(22), pp.6016–6026. Zou, J. et al., 2013. Advanced systems biology methods in drug discovery and translational biomedicine. BioMed Research International, 2013, p. 742835. Zufferey, R. and Mamoun, C. Ben, 2006. Leishmania major expresses a single dihydroxyacetone phosphate acyltransferase localized in the glycosome, important for rapid growth and survival at high cell density and essential for virulence. Journal of Biological Chemistry, 281(12), pp.7952–7959. Zufferey, R. and Mamoun, C. Ben, 2005. The initial step of glycerolipid metabolism in Leishmania major promastigotes involves a single glycerol-3-phosphate acyltransferase enzyme important for the synthesis of triacylglycerol but not essential for virulence. Molecular Microbiology, 56(3), pp.800–810.

CHAPTER 5

Summary And Future Perspectives

“I think the light of science is so dazzling that it can be evaluated only by studying its reflection from the absorbing mirror of life; and life brings one back to wildness.”

Charles A. Lindbergh, In 'The Wisdom of Wilderness', Life (1967)

5.1 Summary

Human protozoan parasites, especially Leishmania, Trypanosoma, Toxoplasma, and Plasmodium, are major causes of human death in tropical and subtropical regions (Nii-Trebi 2017). Treatment of the associated diseases relies mainly on chemotherapy, whose effectiveness has been reduced by emerging drug resistance in these parasites. Several factors, in particular genomic mutations driven by environmental changes and by exposure to chemicals and antimicrobial compounds, are mainly responsible for this resistance to existing drugs (Lashley 2004; Koning 2017; Mohapatra 2014). From a metabolic perspective, resistant parasites can dynamically modulate their metabolism (e.g. the choice of carbon source and the synthesis of required compounds such as glycans and glycoconjugates) to adapt to ecologically harsh niches, especially inside the human host (Tong et al. 2015). Current research on anti-parasitic drugs focuses mainly on identifying potential drug targets (e.g. genes, enzymes, and metabolites), followed by selecting chemical compounds that disrupt the normal functioning of the target and thereby impair parasite growth. Moreover, a few recent studies have focused on carbohydrates as drug targets (Field et al. 2017; Pollyanna et al. 2017), since these molecules are involved in several cellular processes, including carbohydrate recognition on the parasite cell surface (Bishop and Gagneux 2007; Holt 2011; Cova et al. 2015; Pollyanna et al. 2017). Studies of the structures and functions of these carbohydrates, within the field of glycobiology, have grown significantly in recent years (Mendonça-Previato et al. 2005; Cummings and Nyame 2008; Rodrigues et al. 2015); however, their metabolic involvement in parasites is not yet well understood. Further, exploring the strategies that parasites use to alter their metabolism, including carbohydrate-associated metabolic functions, might underpin new ways to develop effective treatments against protozoan parasites.

Although high-throughput technologies, including mass spectrometry, nuclear magnetic resonance, high-throughput RNA sequencing, proteomics, and glycomics, have enabled the characterization of metabolic components including carbohydrates, sophisticated computational strategies are still needed to use these large-scale data in systems-based analyses of metabolism and genotype/phenotype relationships, which can help to identify novel drug targets in protozoan parasites. The aim of this work is to use systems biology approaches, in particular metabolic modeling incorporating omics data, to study metabolism and the involvement of glyco components in cellular processes, which can in turn help to identify physiologically relevant nodes (e.g. potential drug target genes) in the metabolic network of the parasites.

At the beginning of the work, a literature-based investigation was carried out to explore the structures, biosynthesis, and biological roles of glycans and glycoconjugates such as lipophosphoglycans (LPG), glycosylinositol phospholipids (GIPL), and glycoproteins (GP) in various organisms, especially the human protozoan parasites. To understand glycobiology research at the laboratory level, experimental methodologies such as spectra-based glycan profiling, kinetics-based characterization of glycosyltransferases (GTs), and array-based methods were reviewed; these methods are used to study carbohydrate biosynthesis, the kinetics of the enzymes involved, and the molecular interactions of carbohydrates with other biological moieties in the parasites (section 1.1 of Chapter 1). In addition, glycobiology databases, including GlyTouCan (Ceroni et al. 2008; Aoki-Kinoshita et al. 2016), the Consortium for Functional Glycomics (CFG) databases (http://www.functionalglycomics.org), and GLYCOSCIENCES.de (Lütteke et al. 2006), were reviewed to explore glycan-related data for various organisms, in particular protozoan parasites. The intention of these studies was to understand the omics data that could be used in subsequent systems-level analyses of the metabolism of human protozoan parasites and of the roles of carbohydrates. Other important analytical tools, for analyzing 3D glycan structures and for predicting glycosylation sites in proteins, were also reviewed for their utility and applications in modeling and systems-based studies (please refer to section 1.2 of Chapter 1).

The literature search also found that, in protozoan parasites, carbohydrates have been characterized more independently than other biological molecules, such as the genes and proteins characterized with transcriptomics and proteomics techniques; however, systems-level studies, especially those aimed at understanding their metabolic roles and their correlation with genotype/phenotype, are still in their infancy. So far, eleven metabolic models have been developed (Table 1.6 of Chapter 1), and in most cases flux-based analysis was performed to understand, at the systems level, the metabolism and the biosynthesis of growth-associated molecules such as DNA, RNA, proteins, and ATP in protozoan parasites such as Leishmania, Trypanosoma, and Plasmodium. My preliminary inspection found that most of these models, such as iAC560 (L. major) (Chavali et al. 2008) and iCS382 (T. gondii) (Song et al. 2013), describe central carbon metabolism, which is important for providing energy for survival and infection. Only a few studies used integrative approaches that incorporate omics data into flux-based analyses to understand the stage-specific metabolism of amino acids and lipids in parasites (Sharma et al. 2017; Chavali et al. 2008). Previous metabolic models poorly described the metabolic involvement of glycans, glycoconjugates, and their building blocks such as UDP-Glucose (UDP-Glc), GDP-Arabinose (GDP-Ara), UDP-Galactose (UDP-Gal), GDP-Mannose (GDP-Man), and UDP-N-acetylglucosamine (UDP-GlcNAc) in Leishmania. Additionally, based on reaction knockout phenotype predictions, most of these models predicted gene essentiality with poor accuracy under the defined environmental conditions. To analyze the causes of this poor performance, the basics of systems biology approaches, such as genome-scale modeling and metabolic network analysis, which are used to study the metabolism and genotype/phenotype relationships of an organism at the systems level, were explored. In this context, several aspects of the methodology, such as the reconstruction of a metabolic network, its curation, and flux-based analyses in a data-rich environment, were reviewed. Several algorithms that integrate omics data into metabolic network analyses to improve model predictions were also reviewed, for example GIMME (Becker and Palsson 2008), iMAT (Shlomi et al. 2008), E-Flux (Jensen and Papin 2011), and PROM (Chandrasekaran and Price 2010), which use gene expression data to constrain fluxes in the metabolic network (please refer to section 1.6.2 of Chapter 1).

An investigation of glycobiology databases found that the data in glycomic and glycoproteomic databases are often inconsistent and scattered, which hinders their accessibility and usability, as also reported in previous studies (Aoki-Kinoshita 2013; Cummings and Pierce 2014; Ranzinger et al. 2008). Most databases do not provide user-friendly interfaces for extracting glycan-related data. This, together with the use of multiple unstandardized encoding formats for representing glycan information, such as WURCS (Tanaka et al. 2014), GlycoCT (Herget et al. 2008), LinearCode (Banin et al. 2002), and GlydeII (Ranzinger et al. 2017), makes it more difficult for the glycomics community to access and share information. Specifically, most glycomics databases are poorly linked with other resources, such as glycoproteomic databases (e.g. AnimalLectinDB (Kumar and Mittal 2011) and CancerLectinDB (Damodaran et al. 2008)) and other popular databases such as SWISS-PROT/TrEMBL (Bairoch and Apweiler 2000), KEGG GLYCAN (Hashimoto et al. 2006), and the Carbohydrate-Active Enzymes database (CAZy) (Lombard et al. 2014).

Subsequent analyses of graphic and text-based formats for representing glycans concluded that analytical tools such as the RINGS resources (Akune et al. 2010), GlycanBuilder (Ceroni et al. 2007), and WURCS-WG (www.wurcs-wg.org), which convert one glycan representation format into another, should be used to improve the usability and interconnectivity of glycomic databases. Moreover, emphasis should be given to developing a simple and more general format for representing glycan structures (like WURCS, which is simple and can accommodate a variety of glycan features), in order to improve the accessibility and connectivity of the databases (please refer to Chapter 2).

Chapter 3 described the application of systems biology approaches, in particular metabolic network modeling combined with omics data, to understand the metabolism of protozoan parasites, using L. major as a case study. More specifically, the metabolic model ext-iAC560 was developed by extending the existing constraint-based model iAC560 of L. major with pathways for the metabolism of lipids and carbohydrates, including sugar nucleotides such as UDP-Glc, GDP-Ara, UDP-Gal, and UDP-GlcNAc, which are essential precursors in the biosynthesis of complex glycans and glycoconjugates in parasites. The Gene Inactivity Moderated by Metabolism and Expression (GIMME) algorithm (introduced in Chapter 1) was used in flux-based simulations to assess the consistency between the metabolic model and gene expression data of L. major in the promastigote stage. In this case study, the inconsistency between flux and gene expression data increased as the threshold was increased (Figure 3.7 of Chapter 3), suggesting that the gene expression threshold should be chosen carefully (preferably in the lower ranges) in such integrative analyses. Using Flux Variability Analysis (FVA), the flux span of each metabolic reaction was compared across the gene expression thresholds used in the GIMME-based flux balance analysis. With a gene expression threshold of 70, selected on the basis of the consistency analysis, the flux distribution across different pathways was predicted using a Parsimonious Flux Balance Analysis (pFBA) simulation. The fluxes of several reactions that became active after GIMME implementation were supported by the protein expression of the associated enzymes reported in a previous study (Pawar et al. 2014). The improved flux distribution helped to describe stage-specific energy metabolism, especially in the promastigote (glucose-rich environment) stage of Leishmania. At the same time, the model described active and inactive metabolic pathways for synthesizing sugar nucleotides, which are essential precursors in the biosynthesis of glycoconjugates, in the promastigote and amastigote stages; this had not been described by any of the previous models of Leishmania species (see Section 3.3.2 of Chapter 3). To understand the knockout phenotypes of carbohydrate-active enzymes, a gene deletion analysis was carried out in amastigote medium. Knockout of GND (which catalyses the conversion of Glucosamine-6-phosphate (Glc6P) to Fructose-6-phosphate (F6P) in the glycosome) left Leishmania unable to synthesize Glc6P from the amino sugar degradation pathways, which restricted the synthesis of sugar nucleotides (in particular UDP-Glc, UDP-Gal, and UDP-Galf) and was consequently lethal to the parasite. In contrast, knockout of the GFAT enzyme (which catalyses the conversion of F6P to Glucosamine-6-phosphate in the cytosol) does not block all the pathways for synthesizing essential sugar nucleotides, especially GDP-Man and UDP-GlcNAc, and therefore has a non-lethal effect in an amino sugar-rich environment (Figure 3.4 of Chapter 3). These predicted phenotypes, which are supported by a previous study (Naderer et al. 2008) explaining the requirement of the associated enzymes for growth, can be used in antileishmanial drug development strategies. As a limitation of the GIMME method used in the present analysis, the activity of metabolic reactions was described in an on/off manner based on the gene expression threshold. The method does not use the relative expression of metabolic genes, which would be biologically more realistic, to constrain reaction fluxes in the model. The GIMME-driven changes in the flux distributions of several reactions in the metabolic network are only partially supported by protein expression data. The remaining inconsistent predictions might result from the chosen threshold or from the different medium used in the protein expression study. Further experiments are needed to validate the model's predictions.
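The effect of the threshold on the GIMME inconsistency can be illustrated with a minimal Python sketch. It assumes the usual form of the GIMME penalty, in which every reaction whose expression falls below the threshold contributes (threshold minus expression) times the absolute flux to the score. The reaction names, flux values, and expression values are hypothetical and are not taken from ext-iAC560. The full algorithm minimizes this score by linear programming subject to stoichiometric and growth constraints, which is not reproduced here.

```python
def gimme_inconsistency(fluxes, expression, threshold):
    """Sum (threshold - expression) * |flux| over reactions expressed below threshold."""
    score = 0.0
    for reaction, flux in fluxes.items():
        level = expression.get(reaction)
        if level is None:
            continue  # reactions without expression data carry no penalty
        if level < threshold:
            score += (threshold - level) * abs(flux)
    return score


# Hypothetical toy values, for illustration only.
fluxes = {"R1": 10.0, "R2": 4.0, "R3": 0.0, "R4": 6.0}
expression = {"R1": 120.0, "R2": 50.0, "R3": 10.0, "R4": 80.0}

for threshold in (40, 70, 100):
    print(threshold, gimme_inconsistency(fluxes, expression, threshold))
```

For a fixed flux distribution the score can only grow as the threshold rises, because more reactions fall below it and each penalty weight increases. This is in line with the trend reported for Figure 3.7 of Chapter 3.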

Chapter 4 focused on constraint-based flux analysis using the metabolic model ext-iAC560 to identify essential genes, i.e. genes whose deletion affects parasite growth by reducing the synthesis of one or more biomass building blocks to zero, in Leishmania promastigotes and amastigotes. Before the gene essentiality analysis, reaction knockout simulations were performed on a dataset of 67 reactions with knockout phenotypes from the literature, to evaluate the accuracy of ext-iAC560 in predicting knockout phenotypes. Comparison with the knockout phenotype results of other models of Leishmania species, namely iAC560 (L. major), iMS604 (L. donovani), and iAS142 (L. infantum), showed that ext-iAC560 predicts reaction knockout phenotypes with better accuracy (~72%) than the existing models. Moreover, both the number of tested knockout simulations and the number of correctly predicted phenotypes are higher for ext-iAC560 than for the previous metabolic models (see Figure 4.4 of Chapter 4). Since the model performed better in predicting reaction knockout phenotypes, a gene essentiality analysis was then carried out. It predicted 89 and 84 genes (81 of them common to both conditions) as essential for growth in the promastigote and amastigote stages, respectively, because their deletion affected the synthesis of macromolecules (e.g. protein, carbohydrate, DNA, RNA, and lipid). Further analysis identified 53 genes without human homologs (48 common to both conditions, 4 specific to the promastigote stage, and one specific to the amastigote stage), which can be considered potential drug targets in L. major. This analysis also predicted nine novel metabolic target genes (LmjF35.5330, LmjF36.2540, LmjF32.1960, LmjF33.0680, LmjF28.1280, LmjF21.1430, LmjF09.1040, LmjF06.1070, and LmjF06.0350) in the promastigote and amastigote stages, and one novel gene (LmjF36.6950) in the amastigote stage, for which little or no information about essentiality in protozoan parasites is available. Around 80% of the predicted drug target genes (including some of the novel target genes) have enzyme-based inhibitors in the DrugBank database and the literature. Most of these inhibitors, especially those tested in other parasitic species, have not been tested in Leishmania. This offers an opportunity for further studies to determine the sensitivity of Leishmania to such inhibitor molecules. Genes that are actively involved in the synthesis of sugar nucleotides (e.g. LmjF33.2520: UDP-N-acetylglucosamine pyrophosphorylase and LmjF28.3005: N-acetylglucosamine-6-phosphate synthase) were predicted as essential for growth for the first time using ext-iAC560. In previous studies, these enzymes were proposed as potential drug targets in T. brucei, based on essentiality analysis for growth (Stokes et al. 2008; Mariño et al. 2011). In addition, other genes such as LmjF11.1100, LmjF15.1460, LmjF18.0200, LmjF28.1970, and LmjF35.4410, which are reported as essential in Leishmania in the literature, were also predicted as essential using ext-iAC560, whereas the previous models iAC560 and iMS604 did not analyze these genes. The data compiled from DrugBank suggested that molecules such as carbamide phenylacetate, CDV, terbinafine, and 4-(dimethylaminomethyl)-2,6-di(propan-2-yl)phenol could be used as potential antileishmanial drugs, while miltefosine is suggested for testing in amastigote Leishmania for its ability to alter the content of lipids such as phosphatidylcholine and phosphatidylethanolamine, which are essential for growth of the parasite. As a limitation of the present analysis, a gene knockout in Leishmania was defined as lethal only when the synthesis of at least one building block of biomass was reduced to zero; however, this criterion does not always hold, because even a slight reduction in the biosynthesis of essential building blocks may sometimes be associated with lethal effects. The predicted novel target genes have the potential to be used in developing antileishmanial therapies and should therefore be analysed further in the laboratory.

Overall, the strategy was helpful in applying metabolic network modeling approaches, incorporating omics data on glycans, metabolic genes, and enzymes, to understand stage-specific metabolism and the biosynthesis of essential metabolites, including sugar nucleotides, in L. major. The implementation of the GIMME algorithm for integrating gene expression data into flux-based simulations with ext-iAC560 helped to impose strong constraints on the metabolic network and thereby improve predictions. The strategy of the whole work, including literature-based investigations, the various analyses, and key results, is schematically illustrated in Figure 5.

Figure 5: A schematic representation of the strategy of the whole study, including the resources used, the analyses performed, and the main findings. Light green shaded boxes represent key outcomes of the study, while grey boxes represent the main analyses performed. Legend: MS: Mass Spectrometry; NMR: Nuclear magnetic resonance; UDP-Glc: UDP-Glucose; UDP-GlcNAc: UDP-N-acetylglucosamine; GDP-Man: GDP-Mannose; GIMME: Gene Inactivity Moderated by Metabolism and Expression; pFBA: Parsimonious Flux Balance Analysis.
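The lethality criterion and the accuracy comparison summarized for Chapter 4 can be illustrated with a small Python sketch. Note that it replaces flux balance analysis with a simpler qualitative reachability check (network expansion): a knockout is called lethal when at least one biomass precursor can no longer be produced from the medium. The toy network, its lumped reaction names, the medium, and the reference phenotypes are hypothetical and only show the bookkeeping; they are not taken from ext-iAC560 or from the 67-reaction dataset.

```python
def producible(reactions, medium):
    """Return all metabolites reachable from the medium (network expansion)."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            # A reaction can fire once all of its substrates are available.
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def is_lethal(reactions, medium, precursors, knockout):
    """A knockout is lethal if any biomass precursor becomes unproducible."""
    remaining = {name: r for name, r in reactions.items() if name != knockout}
    return not set(precursors) <= producible(remaining, medium)


# Hypothetical, heavily lumped toy network: name -> (substrates, products).
reactions = {
    "HEX": (["glucose"], ["G6P"]),
    "PGI": (["G6P"], ["F6P"]),
    "GFAT": (["F6P"], ["GlcN6P"]),
    "SALVAGE": (["GlcNAc"], ["GlcN6P"]),
    "UAP": (["GlcN6P"], ["UDP-GlcNAc"]),
    "UGP": (["G6P"], ["UDP-Glc"]),
}
medium = ["glucose", "GlcNAc"]          # assumed amino sugar-rich medium
precursors = ["UDP-Glc", "UDP-GlcNAc"]  # assumed biomass building blocks

predicted = {name: is_lethal(reactions, medium, precursors, name)
             for name in reactions}

# Hypothetical reference phenotypes (True = lethal), for illustration only.
reference = {"HEX": True, "PGI": True, "GFAT": False, "UAP": True}

correct = sum(predicted[name] == lethal for name, lethal in reference.items())
accuracy = correct / len(reference)
print(predicted)
print(f"accuracy = {accuracy:.2f}")
```

In this toy case the GFAT knockout is predicted as non-lethal because the assumed salvage route still supplies the same intermediate. The quantitative criterion used in the thesis (synthesis of a biomass building block reduced to zero) would require a flux balance analysis solver.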

5.2 Future perspectives

As highlighted in Chapter 2, most glycomic databases provide scattered information on glycan structures, probably because of the complexity of their representation and the lack of a standard notation format. Current omics technologies have significantly increased the number of characterized glycans and glycoconjugates, creating a demand for suitable repositories to store these data. The prime challenge is to develop universal representation formats for storing these data, so that they can be easily accessed and shared on public platforms for various analyses. To tackle this problem, future research should focus either on the sequential application of existing translation tools like RINGS (Akune et al. 2010) or on the development of more sophisticated tools for interconverting glycan structures, in order to improve the connectivity between glycomic and glycoproteomic databases.
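The idea of applying existing translation tools sequentially can be sketched as a shortest-path search over the formats that the available converters connect. In the Python sketch below the format names come from the text, but the converter names (tool_A, tool_B, tool_C) and the conversions they are assumed to support are hypothetical placeholders, not a description of what RINGS or any other resource actually provides.

```python
from collections import deque


def conversion_path(converters, source, target):
    """Breadth-first search for the shortest chain of converters between formats."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for (src, dst), tool in converters.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [(tool, src, dst)]))
    return None  # no chain of converters links the two formats


# Hypothetical converter table: (input format, output format) -> tool name.
converters = {
    ("LinearCode", "GlycoCT"): "tool_A",
    ("GlycoCT", "WURCS"): "tool_B",
    ("GlydeII", "GlycoCT"): "tool_C",
}

path = conversion_path(converters, "LinearCode", "WURCS")
for tool, src, dst in path:
    print(f"{src} -> {dst} via {tool}")
```

A missing path (a `None` result) would point to a pair of formats for which a new interconversion tool is needed.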

In this data-rich discipline, systems-level characterization of metabolic complements, in particular glycans and glycoconjugates, is important for understanding their biosynthesis and stage-specific activation in protozoan parasites. The application of systems biology approaches that use glyco-related data for a systems-level understanding of carbohydrate metabolism in parasites is still in its infancy. Therefore, new and sophisticated systems-based strategies should be developed and promoted to use large-scale multi-omics data for studying the metabolic roles of carbohydrates in parasites. Further developments in the reconstruction of Genome-Scale Models (GEMs) should focus on integrating biosynthetic pathways for sugars, glycans, glycoproteins, and other complex glycoconjugates, and on incorporating as much biological data as possible (especially glycomics data), to better understand the metabolic roles of these molecules in parasites under different environmental conditions. To enhance the metabolic characterization of glycans and glycoconjugates in human parasites, databases such as MetaCyc (Caspi et al. 2014), LeishCyc (Doyle et al. 2009), KEGG, and CAZy provide useful information on the metabolic complements involved in the biosynthesis of these complex molecules. Appropriate methodologies should be developed to make better use of the relevant multi-experimental data from these resources for exploring the stage-specific metabolic roles of carbohydrates at the systems level. These strategies may help to understand the metabolic connections of carbohydrates and subsequently accelerate the identification of carbohydrate-based drug targets in Leishmania as well as in other parasites.

In the same context, future systems-based strategies should also focus on developing integrated host-pathogen immuno-metabolic models by combining metabolic models of Leishmania and human macrophages, which should help to explore the metabolic interactions of the parasites with the human immune system. For example, the metabolic model ext-iAC560 (L. major, from this work) and the human metabolic model Recon 2.2 (Swainston et al. 2016) can be integrated, incorporating relevant omics data into the flux-based simulation, to study the combined metabolism of Leishmania and the macrophage and to explore their interactions from a metabolic point of view. The transcriptomic study of Leishmania-infected macrophages (Fernandes et al. 2016) could be helpful in such metabolic simulations. These analyses can identify metabolic alterations in the infected macrophage that promote parasite survival and help the parasites to spread infection in the human host by changing its metabolic functionalities. The appropriate application of systems biology approaches can make effective use of such strategies and omics data to improve understanding of the coupled metabolism of Leishmania and the infected macrophage, which might further help to identify high-priority drug targets in infected macrophages to kill the parasites.

5.3 References

Akune, Y. et al., 2010. The RINGS resource for glycome informatics analysis and data mining on the Web. Omics : a journal of integrative biology, 14(4), pp.475–486. Aoki-Kinoshita, K.F. et al., 2016. GlyTouCan 1.0 - The international glycan structure repository. Nucleic Acids Research, 44(D1), pp.D1237–D1242. Aoki-Kinoshita, K.F., 2013. Using databases and web resources for glycomics research. Molecular & cellular proteomics : MCP, 12(4), pp.1036–1045. Bairoch, A. and Apweiler, R., 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research, 28(1), pp.45–48. Banin, E. et al., 2002. A novel Linear Code® nomenclature for complex carbohydrates. Trends in Glycoscience and Glycotechnology, 14(77), pp.127–137. Becker, S.A. and Palsson, B.O., 2008. Context-specific metabolic networks are consistent with experiments. PLoS Computational Biology, 4(5), p. e1000082. Bishop, J.R. and Gagneux, P., 2007. Evolution of carbohydrate antigens - Microbial forces shaping host glycomes? Glycobiology, 17(5), pp. 23R-34R. Caspi, R. et al., 2014. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Research, 42(D1), pp. D742-D753. Ceroni, A. et al., 2008. GlycoWorkbench: A tool for the computer-assisted annotation of mass spectra of glycans. Journal of Proteome Research, 7(4), pp.1650–1659. Ceroni, A., Dell, A. and Haslam, S.M., 2007. The GlycanBuilder: a fast, intuitive and flexible software tool for building and displaying glycan structures. Source code for biology and medicine, 2, p.3. Chandrasekaran, S. and Price, N.D., 2010. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proceedings of the National Academy of Sciences of the United States of America, 107(41), pp.17845–50. Chavali, A.K. et al., 2008. 
Systems analysis of metabolism in the pathogenic trypanosomatid Leishmania major. Molecular Systems Biology, 4, p.177. Cova, M. et al., 2015. Sugar activation and glycosylation in Plasmodium. Malaria Journal, 14(1), p. 427. Cummings, R.D. and Nyame, A.K., 2008. Glycobiology of protozoan and helminthic parasites. In Carbohydrates in Chemistry and Biology. pp. 867–894. Cummings, R.D. and Pierce, J.M., 2014. The challenge and promise of glycomics. Chemistry and Biology, 21(1), pp.1–15. Damodaran, D. et al., 2008. CancerLectinDB: A database of lectins relevant to cancer. Glycoconjugate Journal, 25(3), pp.191–198. Doyle, M. A. et al., 2009. LeishCyc: a biochemical pathways database for Leishmania major. BMC systems biology, 3, p.57. Fernandes MC, Dillon LAL, Belew AT, Bravo HC, Mosser DM, El-Sayed NM. 2016. Dual transcriptome profiling ofLeishmania-infected human macrophages revealsdistinct reprogramming signatures. mBio 7(3):e00027-16. doi:10.1128/mBio.00027-16. Field, M.C. et al., 2017. Anti-trypanosomatid drug discovery: An ongoing challenge and a continuing need. Nature Reviews Microbiology, 15(4), pp.217–231. Hashimoto, K. et al., 2006. KEGG as a glycome informatics resource. Glycobiology, 16(5), pp. 63R-70R. Herget, S. et al., 2008. GlycoCT-a unifying sequence format for carbohydrates. Carbohydrate Research, 343(12), pp.2162–2171. Holt, W. V., 2011. Mechanisms of sperm storage in the female reproductive tract: An interspecies comparison.

Metabolic Systems Biology of Leishmania major University of Minho, 2019 216 Chapter 5

Reproduction in Domestic Animals, 46, pp.68–74. Jensen, P.A. and Papin, J.A., 2011. Functional integration of a metabolic network model and expression data without arbitrary thresholding. Bioinformatics, 27(4), pp.541–547. Koning, H.P., 2017. Drug resistance in protozoan parasites. Emerging Topics in Life Sciences, 1, pp.627–632. Kumar, D. and Mittal, Y., 2011. AnimalLectinDb: An integrated animal lectin database. Bioinformation, 6(3), pp.134–6. Lashley, F.R., 2004. Emerging infectious diseases: vulnerabilities, contributing factors and approaches. Expert Review of Anti-infective Therapy, 2(2), pp.299–316. Lombard, V. et al., 2014. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Research, 42(D1), pp. D490-495. Lütteke, T. et al., 2006. GLYCOSCIENCES.de: An internet portal to support glycomics and glycobiology research. Glycobiology, 16(5), p.71R–81R. Mariño, K. et al., 2011. Characterization, localization, essentiality, and high-resolution crystal structure of glucosamine 6-phosphate N-acetyltransferase from Trypanosoma brucei. Eukaryotic Cell, 10(7), pp.985–997. Mendonça-Previato, L. et al., 2005. Protozoan parasite-specific carbohydrate structures. Current Opinion in Structural Biology, 15(5), pp.499–505. Mohapatra, S., 2014. Drug resistance in leishmaniasis: Newer developments. Tropical Parasitology, 4(1), p.4. Naderer, T., Wee, E. and McConville, M.J., 2008. Role of hexosamine biosynthesis in Leishmania growth and virulence. Molecular Microbiology, 69(4), pp.858–869. Nii-Trebi, N.I., 2017. Emerging and Neglected Infectious Diseases: Insights, Advances, and Challenges. BioMed Research International, 2017. doi: 10.1155/2017/5245021 Pawar, H. et al., 2014. Neglected Tropical Diseases and Omics Science: Proteogenomics Analysis of the Promastigote Stage of Leishmania major Parasite. Omics : a journal of integrative biology, 18(8), pp.1–14. Pollyanna, S.G. et al., 2017. Decoding the Role of Glycans in Malaria. Front. Microbiol., 9(8), p.1071. 
Ranzinger, R. et al., 2008. GlycomeDB - integration of open-access carbohydrate structure databases. BMC bioinformatics, 9, p.384. Ranzinger, R. et al., 2017. GLYDE-II: The GLYcan data exchange format. Perspectives in Science, 11, pp.24–30. Rodrigues, J.A. et al., 2015. Parasite Glycobiology: A Bittersweet Symphony. PLoS Pathog, 11(11), p. e1005169. Sharma, M. et al., 2017. A systematic reconstruction and constraint-based analysis of Leishmania donovani metabolic network: identification of potential antileishmanial drug targets. Mol. BioSyst., 13(5), pp.955–969. Shlomi, T. et al., 2008. Network-based prediction of human tissue-specific metabolism. Nature biotechnology, 26(9), pp.1003–1010. Song, C. et al., 2013. Metabolic reconstruction identifies strain-specific regulation of virulence in Toxoplasma gondii. Molecular systems biology, 9(708), p.708. Stokes, M.J. et al., 2008. The synthesis of UDP-N-acetylglucosamine is essential for bloodstream form Trypanosoma brucei in vitro and in vivo and UDP-N-acetylglucosamine starvation reveals a hierarchy in parasite protein glycosylation. Journal of Biological Chemistry, 283(23), pp.16147–16161. Swainston, N. et al., 2016. Recon 2.2: from reconstruction to model of human metabolism. Metabolomics, 12(7), p.109. Tanaka, K. et al., 2014. WURCS: The Web3 unique representation of carbohydrate structures. Journal of Chemical Information and Modeling, 54(6), pp.1558–1566. Tong, M. et al., 2015. Infectious Diseases, Urbanization and Climate Change: Challenges in Future China. International Journal of Environmental Research and Public Health, 12(9), pp.11025–11036. Summary and Future Perspectives 217
