Article type: Special Feature Review

Surveying across the domains of life unveils promising drug targets in pathogens

Sheena M. H. Chua and James A. Fraser*

Australian Infectious Diseases Research Centre
School of Chemistry & Molecular Biosciences
The University of Queensland, Brisbane, Queensland, Australia

*Correspondence e-mail: [email protected]

This is the author manuscript accepted for publication and has undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/IMCB.12389

This article is protected by copyright. All rights reserved

Running head

Purine biosynthesis across the domains of life

Keywords

Drug target, gene fusions, immunotherapy, infectious diseases, purine metabolism, purinosome

Abstract

Purines play an integral role in cellular processes such as energy metabolism, cell signalling, and encoding the genetic makeup of all living organisms, ensuring that the purine metabolic pathway is maintained across all domains of life. To gain a deeper understanding of purine biosynthesis via the de novo biosynthetic pathway, the genes encoding purine metabolic enzymes from 35 archaea, 69 bacteria and 99 eukaryotic species were investigated. While the classic elements of the canonical purine metabolic pathway were utilised in all domains, a subset of familiar biochemical roles was found to be performed by unrelated proteins in some members of the Archaea and Bacteria. In the Bacteria, a major differentiating feature of de novo purine biosynthesis is the increasing prevalence of gene fusions, where two or more purine biosynthesis enzymes that perform consecutive biochemical functions in the pathway are encoded by a single gene. All species in the Eukaryota exhibited the most common fusions seen in the Bacteria, in addition to new gene fusions that potentially increase metabolic flux. This complexity is taken further in humans, where a reversible biomolecular assembly of enzymes known as the purinosome has been identified, allowing short-term regulation in response to metabolic cues whilst expanding on the benefits that can come from gene fusion. By surveying purine metabolism across all domains of life we have identified important features of the purine biosynthetic pathway that can potentially be exploited as prospective drug targets.

Purines and the origin of life

The most widely accepted model for the origin of life is that it arose from organic molecules produced by simple prebiotic chemical reactions. The Miller-Urey experiment showed that amino acids essential for creating proteins could be formed from simple molecules like water, methane, ammonia and hydrogen.1 Further investigations established that the purine nucleobases adenine and guanine, key components of both RNA and DNA, could also form under conditions likely present on primitive Earth.2 Over 3.5 billion years later, all living things from the simplest unicellular life to more complex multicellular organisms require purines for survival. Purines play an integral role in diverse processes including energy metabolism, cell signalling, and encoding the genetic makeup of all organisms, providing strong selective pressure that ensures the purine de novo biosynthetic pathway is maintained across all domains of life.

Purines contain a six-membered pyrimidine ring fused to a five-membered imidazole ring. The first of these heterocyclic compounds were discovered in urinary calculi: uric acid (from the French urique for “urine”) in 1776 and xanthine (from the Greek xanthos for “yellow”) in 1838.3, 4 Guanine was identified in 1846 and named for the bird guano in which it was found, and four years later hypoxanthine (“xanthine with less oxygen”) was isolated from cow spleen.5, 6 It was not until 1884 that the term “purine” was coined from the Latin purum (“pure”) and urinae (“urine”) by Fischer, who was awarded a Nobel Prize for his achievements in the field of purine synthesis.7 Experiments on ox pancreas in 1886 led to the identification of adenine (from the Greek aden for “gland”), which in turn led to Kossel being awarded a Nobel Prize for the discovery that nucleic acids were composed of the purines adenine and guanine alongside the pyrimidines thymine, cytosine and uracil.8

The biochemical processes underpinning the synthesis of purines would not be identified until over half a century later. The determination of the intermediates involved in de novo purine biosynthesis began in the 1930s with the discovery that hypoxanthine could be detected in the livers of pigeons but not chickens, ducks, rats or guinea pigs.9 Key intermediates in the pathway were subsequently elucidated by feeding isotopically labelled substrates to pigeons and analysing uric acid crystallised from their droppings.10 These in vivo experiments were followed by in vitro studies purifying and assaying individual enzymes from pigeon liver, cow liver, and Saccharomyces cerevisiae, together laying the foundation for our current understanding of the biochemical processes required for the de novo biosynthesis of purines in the Eukaryota.11-13

The biochemistry of purine biosynthesis in the Eukaryota

De novo purine and pyrimidine biosynthesis both require the ATP-dependent phosphorylation of ribose-5-phosphate (R5P) by ribose-phosphate diphosphokinase (EC 2.7.6.1) to produce phosphoribosyl-pyrophosphate (PRPP).14

The first dedicated step of de novo purine biosynthesis is the hydrolysis of L-glutamine and transfer of the liberated amine group to PRPP by PRPP amidotransferase (PRPPA, EC 2.4.2.14), creating phosphoribosyl-amine (PRA; Figure 1).15 Next, phosphoribosyl-glycinamide (GAR) synthetase (GARS, EC 6.3.4.13) catalyses the ATP-dependent ligation of L-glycine to PRA (via a phosphorylated intermediate) to yield GAR.16 GAR transformylase (N10-fTHF-GART, EC 2.1.2.2) then ligates the formyl group from 10-formyltetrahydrofolate (N10-fTHF) to GAR, producing phosphoribosyl-formylglycinamide (FGAR).17 The subsequent activation of the FGAR amide oxygen by the ATP-grasp domain of phosphoribosyl-formylglycinamidine (FGAM) synthetase (FGAMS, EC 6.3.5.3) produces an iminophosphate intermediate that is amidated by ammonia, channelled via a structural domain from the glutaminase domain, to create FGAM.18 Finally, phosphoribosyl-aminoimidazole (AIR) synthetase (AIRS, EC 6.3.3.1) catalyses the ATP-dependent activation of the FGAM formyl oxygen, which reacts with a nearby nitrogen to close the five-membered imidazole ring of the purine base and form AIR.19

Formation of the pyrimidine ring then begins with phosphoribosyl-carboxyaminoimidazole (CAIR) synthetase (CAIRM-II type CAIRS, EC 4.1.1.21) carboxylating AIR using CO2 to form CAIR.20 Phosphoribosyl-aminoimidazolesuccinocarboxamide (SAICAR) synthetase (SAICARS, EC 6.3.2.6) then mediates the ATP-dependent ligation of L-aspartate to CAIR, forming SAICAR; the following β-elimination of fumarate from SAICAR by adenylosuccinate (ADS) lyase (ADSL, EC 4.3.2.2) produces phosphoribosyl-aminoimidazolecarboxamide (AICAR).21 AICAR transformylase (N10-fTHF-AICART, EC 2.1.2.3) then ligates the formyl group from N10-fTHF to AICAR to form phosphoribosyl-formamidocarboxamide (FAICAR).22 Closure of the pyrimidine ring (completing formation of the purine base) occurs via elimination of water from FAICAR by inosine monophosphate (IMP) cyclohydrolase (IMPC, EC 3.5.4.10) and cyclisation to produce IMP.23
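The ten dedicated steps from PRPP to IMP described above can be written down as an ordered data structure, which makes it easy to check that each product feeds the next reaction. The following is a minimal sketch in Python; the enzyme abbreviations, EC numbers and intermediates are those given in the text for the eukaryotic pathway, while the names `Step`, `DE_NOVO_TO_IMP`, `is_linear` and `intermediates` are illustrative assumptions and not part of the original survey.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    enzyme: str     # abbreviation as used in the text
    ec: str         # EC number as given in the text
    substrate: str  # pathway intermediate consumed
    product: str    # pathway intermediate produced


# Eukaryotic de novo steps from PRPP to IMP, in the order described above.
DE_NOVO_TO_IMP: List[Step] = [
    Step("PRPPA", "2.4.2.14", "PRPP", "PRA"),
    Step("GARS", "6.3.4.13", "PRA", "GAR"),
    Step("N10-fTHF-GART", "2.1.2.2", "GAR", "FGAR"),
    Step("FGAMS", "6.3.5.3", "FGAR", "FGAM"),
    Step("AIRS", "6.3.3.1", "FGAM", "AIR"),
    Step("CAIRM-II type CAIRS", "4.1.1.21", "AIR", "CAIR"),
    Step("SAICARS", "6.3.2.6", "CAIR", "SAICAR"),
    Step("ADSL", "4.3.2.2", "SAICAR", "AICAR"),
    Step("N10-fTHF-AICART", "2.1.2.3", "AICAR", "FAICAR"),
    Step("IMPC", "3.5.4.10", "FAICAR", "IMP"),
]


def is_linear(steps: List[Step]) -> bool:
    """Return True if every step's product is the next step's substrate."""
    return all(a.product == b.substrate for a, b in zip(steps, steps[1:]))


def intermediates(steps: List[Step]) -> List[str]:
    """List the intermediates in order, starting from the first substrate."""
    return [steps[0].substrate] + [s.product for s in steps]


if __name__ == "__main__":
    print(" -> ".join(intermediates(DE_NOVO_TO_IMP)))
    print("linear:", is_linear(DE_NOVO_TO_IMP))
```

Only the primary pathway intermediates are modelled here; cofactors such as ATP, L-glutamine and N10-fTHF are deliberately left out to keep the sketch short.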

Purine de novo biosynthesis bifurcates after IMP. In the adenosine monophosphate (AMP) biosynthesis branch, ADS synthetase (ADSS, EC 6.3.4.4) transfers the γ-phosphate from GTP to IMP, forming the intermediate 6-phosphoryl IMP; the 6-phosphoryl group is then displaced by the α-amino group of L-aspartate to form ADS.24, 25 ADS lyase converts ADS to AMP through the same β-elimination mechanism that it performs earlier in the pathway.26, 27 AMP can be converted to ADP by adenylate kinase (ADK), then ATP by nucleoside diphosphate kinase (NDK).28, 29 In the GMP biosynthesis branch, IMP dehydrogenase (IMPDH, EC 1.1.1.205) catalyses the NAD+-dependent oxidation of IMP to form XMP.30, 31 Next, the ATP pyrophosphatase domain of GMP synthetase (GMPS, EC 6.3.5.2) adenylates XMP to form an adenylated intermediate, while the glutamine amidotransferase domain produces ammonia to add an amine group to the intermediate and yield GMP.27 GMP can be converted to GDP by guanylate kinase (GUK), then GTP by nucleoside diphosphate kinase (NDK).29, 32

In addition to the de novo pathway, exogenous purines can also be scavenged from the environment via a salvage pathway through the action of phosphoribosyltransferases that convert adenine, hypoxanthine, xanthine and guanine to AMP, IMP, XMP and GMP, respectively.31, 33

The history of purine metabolism as a drug target

Given the requirement for significant quantities of nucleic acids in highly proliferative cells such as cancers, immune cells and infecting pathogens, it was proposed that targeting of specific biological processes in the purine metabolism pathway may lead to effective therapeutic treatments.34, 35 This premise led to the conception of rational drug design, earning Hitchings and Elion a Nobel Prize.36

Pharmaceuticals that act through the purine biosynthetic pathway were subsequently developed. Allopurinol, an inhibitor of the purine degradation enzyme xanthine oxidase, blocks the accumulation of uric acid and, by extension, the occurrence of gout.37 Azathioprine is converted into the purine analogue 6-mercaptopurine before it is salvaged by hypoxanthine-guanine phosphoribosyltransferase to disrupt purine metabolism; originally designed as an anticancer drug, this compound was found to be more effective as an immunosuppressant in organ transplant recipients.38 Acyclovir is sequentially metabolised by viral thymidine kinase, host guanylate kinase and host nucleoside diphosphate kinase to create acyclovir triphosphate, a purine analogue that selectively incorporates into, and terminates synthesis of, viral DNA in the treatment of viral infections such as herpes.39 Beyond these synthesised compounds, natural products have also been found that interfere with purine metabolism. Mycophenolic acid from Penicillium brevicompactum inhibits IMP dehydrogenase in a wide range of species, including in humans where it is used as an immunosuppressant.33, 40

Despite the medical advances that have been made by targeting purine metabolism, many of the enzymes in the pathway remain unexplored, particularly in the field of antibiotic development. To better understand why some biochemical reactions may serve as superior targets, it is worthwhile to look beyond the Eukaryota and consider purine metabolism across all domains of life.

Archaea and an alternative purine lifestyle

The original characterisation of purine biosynthesis in the 1950s did not include studies of the Archaea, unsurprising given that this domain (which contains no known pathogens) was not recognised until two decades later. However, with the exception of proposed symbionts that are unable to create purines de novo, studies of these autochthons of diverse habitats reveal that while all of the classic elements of the canonical eukaryotic purine metabolic pathway are present in the domain, a subset of the familiar biochemical roles are performed by unrelated proteins in many species (Figure 2).

The most widespread examples of alternative purine metabolism enzymes in the Archaea are AICAR transformylase and GAR transformylase. In the Eukaryota, these enzymes use N10-fTHF as a formyl donor, but in the Archaea the ability to synthesise this formyl donor is not universal. Consistent with this, the eukaryotic forms of these enzymes are only found in certain species capable of synthesising N10-fTHF.41, 42 Many Archaea instead employ formate-dependent variants of AICAR transformylase (formate-AICART, EC 6.3.4.23) and GAR transformylase (formate-GART, EC 2.1.2.-) to generate formylphosphate that supplies the formyl group for ligation to AICAR and GAR respectively (Figure 1).43, 44 Formate is abundant in hydrothermal vents where extreme thermophilic Archaea thrive, and it has been proposed that the formate-dependent enzymes are beneficial in this type of niche.45, 46

In contrast to the biochemically distinct formate- and N10-fTHF-dependent mechanisms of the transformylases, there are two types of IMP cyclohydrolase in the Archaea that appear to possess the same enzymatic activity. One of these is of the type identified in early eukaryotic studies, and the other is an unrelated protein with the same function (archaeal-IMPC, EC 3.5.4.10) (Figure 1).47, 48 The two are equally common in the species we have investigated (Figure 2).

Members of the Archaea carboxylate AIR to CAIR using one of three distinct paths (Figure 1). The first path is a two-step process that requires a phosphoribosyl-carboxyaminoimidazole (CAIR) synthetase (CAIRS, EC 6.3.4.18) that catalyses ATP-dependent phosphorylation of bicarbonate to create a carboxyphosphate intermediate that is then ligated to AIR, yielding N(5)-phosphoribosyl-carboxyaminoimidazole (N(5)-CAIR). A class I CAIR mutase (CAIRM-I, EC 5.4.99.18), the dominant class of the enzyme in the Archaea, rearranges the position of this carboxylate group to form CAIR.49, 50 The second path occurs in species that have class I CAIR mutase but no CAIR synthetase. It has been proposed that in these species, during growth in niches replete with CO2 or HCO3-, the class I enzyme catalyses the direct formation of CAIR on its own, analogous to the suggested environmental link associated with the formate-requiring transformylases.51, 52 The third path is a single-step CO2-dependent carboxylation reaction initially identified in the Eukaryota, which in the Archaea has only been reported in members of the genus Archaeoglobus.53 The enzyme responsible is appropriately deemed a CAIR synthetase based on its biochemical function, but at the sequence level it more closely resembles the class I PurE CAIR mutase of Escherichia coli. Referred to in the literature as a “class II PurE”, this enzyme could therefore also be described as a “class II CAIR mutase-type CAIR synthetase” (CAIRM-II type CAIRS, EC 4.1.1.21).

Overall, alternative purine metabolism enzymes are a major theme in the Archaea, and intriguingly, this is a story that appears to be incomplete. Multiple species lack open reading frames likely to encode homologues of GAR transformylase, IMP cyclohydrolase or IMP dehydrogenase, or a combination of these. Whether novel, undiscovered alternative proteins exist that perform these essential biochemical reactions, or whether the biochemical step is bypassed (as with CAIR synthetase), is unknown.53

Fusion-powered purine metabolism in the Bacteria

As with the Archaea, some species of the Bacteria lack the de novo pathway altogether; in this case, however, they are normally obligate pathogens that rely entirely on scavenging purines from the host.54, 55 Alternative purine biosynthetic enzymes also occur in the Bacteria, but to a lesser extent than in the Archaea (Figure 3). Formate-dependent AICAR transformylase appears to be absent and archaeal IMP cyclohydrolase is rare. In contrast, formate-dependent GAR transformylase is present in one third of the species investigated, alongside the ubiquitous N10-fTHF-dependent GAR transformylase. The metabolic redundancy of retaining two different forms of GAR transformylase has been demonstrated by deletion studies in E. coli, where either GAR transformylase alone is sufficient for FGAR biosynthesis and survival in purine-free media.56

The major differentiating feature of the de novo purine biosynthetic pathway in the Bacteria is an increasing prevalence of gene fusions where two or more purine biosynthesis enzymes are encoded by a single gene. These are almost exclusively fusions of genes that encode enzymes performing consecutive biochemical functions in the pathway.

The effect of selective pressure on the fusion of genes in the purine biosynthesis pathway is highlighted by analysis of steps that require multiple distinct biochemical activities. The conversion of FGAR to FGAM requires three proteins: an ATP-grasp protein that activates the FGAR amide oxygen, a glutaminase that produces ammonia to amidate the intermediate and produce FGAM, and a structural protein that channels the ammonia from the glutaminase to the ATP-grasp protein. In some bacteria, these are encoded by separate genes; in others the ATP-grasp and structural genes are fused to create a single larger protein; and in most the glutaminase joins this fusion to create a triple domain protein (Figure 3). Each of these arrangements can also be found in the Archaea, albeit with the fused forms least common (Figure 2). Likewise, the conversion of XMP to GMP requires two separate enzymatic activities: ATP pyrophosphatase for XMP adenylation, and glutamine amidotransferase to add an amine group to create GMP.27, 57 In the Bacteria these are fused into a single bifunctional GMP synthetase protein (Figure 3), while in the majority of the Archaea they exist as separate genes (Figure 2).

In a similar fashion, genes that encode enzymes performing discrete but consecutive steps in the pathway can also be fused. The genes encoding N10-fTHF-dependent AICAR transformylase (which ligates a formyl group to AICAR, forming FAICAR) and IMP cyclohydrolase (which cyclises FAICAR to form IMP) sometimes exist as separate genes in the Bacteria, but far more commonly occur as a single gene fusion (Figure 3). The same fusion occurs in the Archaea, but is much rarer.

Every fusion involving a purine biosynthesis gene that we have observed in the Bacteria is to another purine biosynthesis gene, and this is also true in the Archaea, indicating that these fusions are not arbitrary. Furthermore, the three most predominant purine biosynthesis gene fusions in the Bacteria entail the joining of genes that encode proteins performing consecutive biochemical steps in the pathway. In keeping with applications in synthetic biology where sequentially scaffolding enzymes can facilitate substrate channelling and significantly increase flux through a pathway, the most parsimonious explanation of these data is that there is a strong preference towards gene fusion for more efficient purine metabolism.58

Profusion of purine metabolic gene fusion in the Eukaryota

As with the Archaea and Bacteria, some members of the Eukaryota lack the de novo purine biosynthesis pathway, usually obligate parasites or pathogens that are not free-living.59, 60 However, the majority of the Eukaryota are capable of de novo purine biosynthesis, and these contain the most common fusions evident in the Bacteria: the triple domain FGAM synthetase, the bifunctional GMP synthetase fusion, and the bifunctional N10-fTHF-dependent AICAR transformylase/IMP cyclohydrolase (referred to as “ATIC” in the Metazoa; Figure 4).
The theme of gene fusions encoding multi-domain enzymes that perform sequential pathway steps goes even further in the Eukaryota. In the Fungi, the Viridiplantae and many of the protists, the carboxylation of AIR to CAIR occurs via the sequential action of CAIR synthetase and class I CAIR mutase, which are now fused into a single protein; this arrangement also occurs in the bacterium Kocuria rhizophila. In the Amoebozoa, this fusion is further expanded to include the enzyme that catalyses the next step in the pathway, SAICAR synthetase, which ligates L-aspartate to CAIR to create SAICAR. The Metazoa have a simpler form of this fusion that only includes the metazoan class II CAIR mutase-type CAIR synthetase and SAICAR synthetase (dubbed “PAICS”).

One enzyme fusion present in all Fungi and a subset of Amoebozoa occurs between two enzymes responsible for non-consecutive steps of the pathway: GAR synthetase (ligation of L-glycine to PRA creating GAR) and AIR synthetase (cyclisation of FGAM to form AIR). However, this gene fusion expands in the Metazoa with the addition of N10-fTHF-dependent GAR transformylase, which catalyses the consecutive step after GAR synthetase (ligation of a formyl group to GAR to produce FGAR), forming the trifunctional protein “TrifGART”. While this example initially deviates from the trend of fusion of consecutive steps, the third addition follows the theme, bringing together the enzymes for conversion of PRA to GAR and GAR to FGAR into a single protein.

Unlike the Archaea and the Bacteria, there are multiple examples in the Eukaryota where purine biosynthesis genes have fused with genes whose function is not associated with this primary metabolic process (Figure 4, Table 1). For example, a range of animals possess a fusion of class II CAIR mutase-type CAIR synthetase/SAICAR synthetase with a predicted Myb-like DNA-binding domain, and several protists have the trifunctional GAR synthetase/GAR transformylase/AIR synthetase fused to an MGS-like domain protein. Interestingly, the coral Stylophora pistillata contains a fusion of IMP dehydrogenase to aspartyl-tRNA synthetase, and the fungi of the order Tremellales, which includes the pathogenic Cryptococcus species complex, exhibit a similar fusion, this time between SAICAR synthetase and tyrosyl-tRNA synthetase.

Finally, across the Eukaryota domain there are multiple examples of purine biosynthesis genes being duplicated, most often in the Viridiplantae (Figure 4). While there are a limited number of changes in purine biosynthesis gene copy number in the Archaea (Figure 2) and the Bacteria (Figure 3), in the Eukaryota the frequency of gene duplications drastically increases. To date there are no reports of neofunctionalization.
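The eukaryotic fusions named in this section can be compared against the order of steps in the pathway to see which of them join consecutive reactions. The sketch below is a hedged illustration in Python: the step order follows the pathway as described in the text, but the shortened enzyme labels, the dictionary names and both helper functions are assumptions introduced for this example, not part of the survey itself.

```python
from typing import Dict, List, Tuple

# Position (1-based) of each dedicated step between PRPP and IMP.
# Labels are shortened forms of the enzyme abbreviations used in the text.
STEP_INDEX: Dict[str, int] = {
    "PRPPA": 1, "GARS": 2, "GART": 3, "FGAMS": 4, "AIRS": 5,
    "CAIRS": 6, "SAICARS": 7, "ADSL": 8, "AICART": 9, "IMPC": 10,
}

# Fusions named in the text, listed by their component activities.
FUSIONS: Dict[str, List[str]] = {
    "ATIC": ["AICART", "IMPC"],
    "PAICS": ["CAIRS", "SAICARS"],
    "GARS/AIRS": ["GARS", "AIRS"],
    "TrifGART": ["GARS", "GART", "AIRS"],
}


def consecutive_pairs(parts: List[str]) -> List[Tuple[int, int]]:
    """Return the pairs of adjacent pathway steps joined within one fusion."""
    idx = sorted(STEP_INDEX[p] for p in parts)
    return [(a, b) for a, b in zip(idx, idx[1:]) if b - a == 1]


def is_fully_consecutive(parts: List[str]) -> bool:
    """True if the fused activities form one unbroken run of pathway steps."""
    idx = sorted(STEP_INDEX[p] for p in parts)
    return all(b - a == 1 for a, b in zip(idx, idx[1:]))


if __name__ == "__main__":
    for name, parts in FUSIONS.items():
        print(name, is_fully_consecutive(parts), consecutive_pairs(parts))
```

Under these assumptions, ATIC and PAICS are unbroken runs, the GARS/AIRS fusion joins no adjacent steps, and TrifGART contains one adjacent pair (steps 2 and 3), matching the description of the GAR transformylase addition bringing the fusion back in line with the trend.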
The prevalence and size of purine biosynthesis gene fusions in the Eukaryota support the hypothesis that close proximity of these enzymes enhances efficiency, raising the question of whether this process will continue towards all enzymatic activities of the pathway existing within a single protein. Such a protein could conceivably evolve; the combined length of the purine biosynthetic enzymes in humans is approximately 6,000 amino acid residues, while the largest known protein (Titin) can reach almost 36,000 residues in length.61 While theoretically possible, a different mechanism for temporally and spatially bringing the components of the pathway together appears to already exist.

The higher order structure of purine biosynthesis in humans

Just as the fusion of enzymes that function in sequential steps of a pathway can enable substrate channelling, facilitate regulation of metabolic flux, raise catalytic efficiency, and prevent substrate degradation and diffusion, so can the formation of multiprotein complexes.62, 63 Over the last decade, studies in human cell lines have enabled the detection of a dynamic mesoscale protein assembly responsible for the production of ATP and GTP.

Studies of fluorescently tagged de novo purine biosynthesis enzymes in the human HeLa cell line revealed subcellular colocalisation in response to purine availability; the enzymes clustered together into a higher order structure dubbed the “purinosome”, a metabolon that formed in the absence of exogenous purines.64 Furthermore, cells derived from patients suffering from either ADS lyase deficiency or AICA-ribosiduria (AICAR transformylase/IMP cyclohydrolase deficiency) were unable to form these multienzyme assemblies under purine depleted conditions, nor could HeLa cells in which individual de novo purine biosynthesis genes had been disrupted. Together, these data suggest that all components of the pathway are required for purinosome formation.65-67

Fluorescence recovery after photobleaching (FRAP) and cytoplasmic protein-protein interaction studies revealed the existence of a stable core complex consisting of the first half of the de novo biosynthesis pathway: PRPP amidotransferase, GAR synthetase/AIR synthetase/GAR transformylase (TrifGART) and FGAM synthetase.68 The enzymes from the second half of the pathway appear to transiently interact with this core to form the purinosome.69 Found in the cytoplasm in association with mitochondria and microtubules, formation is regulated by G-protein coupled receptor signalling, phosphorylation and interaction with molecular chaperones such as HSP90.70-73 Purinosomes form primarily, but not exclusively, during the G1 phase of the cell cycle, preceding S phase when purine demand is high due to DNA replication.74

Unlike gene fusion, compartmentalisation of purine biosynthesis in a reversible biomolecular assembly of enzymes allows short-term regulation in response to metabolic cues. As such, the existence of the purinosome provides an additional level of control over a process that is critical to numerous functions in the cell. However, all studies to date have taken place in vitro, and exclusively in human cells. The prevalence of this metabolon in vivo, particularly with regard to tissue specificity, remains to be shown. Perhaps more importantly, the question of whether the purinosome exists beyond humans, beyond the Metazoa, or even beyond the Eukaryota remains to be answered.

The future of purine metabolism as a drug target

The most well-known therapeutic agents that function through purine metabolism were developed during the golden age of drug discovery between the 1950s and 1980s.36 These studies were performed from a reductionist standpoint: focusing on a specific target in a specific organism was the cornerstone of the rational drug design strategy. In the post-genomic world where this important primary metabolic process can be considered across all of the domains of life, rational drug design can now exploit similarities or contrasts between species, genera, orders, phyla, or even domains. From this viewpoint, the pathway can be targeted to inhibit purine biosynthesis in few species, in many, or potentially in all depending on the desired outcome.

If the goal is to target just a few species, characteristics unique to the purine de novo biosynthesis pathway in one organism or a small group of closely related species could be exploited. An excellent example is the fungal pathogen Cryptococcus neoformans of the Tremellales. The fungi of this order possess an unusual gene fusion that joins SAICAR synthetase to the cytoplasmic tyrosyl-tRNA synthetase, providing two levels of vulnerability: the conditional lethality associated with loss of purine biosynthesis, and the essential ability to synthesise proteins (Figure 5). The fusion has created an irreversible link between these otherwise unrelated processes, providing the opportunity for the development of therapeutic agents that are organism-specific whilst interfering with two critical cellular processes simultaneously. To our knowledge, the concurrent inhibition of two pathways by targeting a fusion protein has never been demonstrated.

The development of a drug that targets a single pathogen is less useful than the development of a drug that can target many. Analysis of purine metabolism across all domains reveals prospective targets shared by many species. The alternative enzymes that are prevalent in the Archaea do not make good targets; this domain contains no known pathogens, and the Bacteria that have the most common of these (the formate-dependent GAR transformylase) appear to have metabolic redundancy that makes the protein non-essential. In contrast, the CAIR synthetase in most organisms is a much better target because it is unrelated to its counterpart in the Metazoa (Figure 5). Furthermore, the metazoan enzyme also exists as a fusion with SAICAR synthetase, providing an additional level of differentiation.

Targeting all free-living organisms could be most easily achieved through the inhibition of those enzymes that are universally shared but do not have the potentially confounding influence of gene fusions. PRPP amidotransferase and ADS lyase are present in all free-living species and are not part of a common fusion. Inhibition of either could yield a new series of antibacterials, herbicides, antifungals, anticancer agents or immunosuppressants.

By surveying purine metabolism across all domains of life, we have identified important features which could inform each of these goals, and the strategies that could be employed to address them. An antimicrobial that also targets a human enzyme is not necessarily undesirable. As typified by mycophenolic acid, a natural product used by P. brevicompactum to combat other microbes, an antimicrobial drug that also inhibits a human purine biosynthesis enzyme can still be valuable in the clinic as an immunosuppressant or anticancer drug. Furthermore, beyond the insights that can be gained from genomic analyses, the purinosome represents another avenue of investigation. Which of these strategies provides the most effective approach is dependent on developing a deeper understanding of this evolutionarily conserved pathway.

References

1. Miller SL. A production of amino acids under possible primitive earth conditions. Science 1953; 117: 528-529.
2. Oro J. Mechanism of Synthesis of Adenine from Hydrogen Cyanide under Possible Primitive Earth Conditions. Nature 1961; 191: 1193-1194. doi: 10.1038/1911193a0.

3. Scheele KW. Examen Chemicum Calculi urinarii, Opuscula 11. Nucleic Acids: Chemical Catalog Co New York, NY; 1776, p 73.
4. Wohler F, Liebig J. Uber Marcet's Xanthic-Oxyd. Annl Pharm 1838 b; 26: 340-345.
5. Unger JB. Annalen der Chemie und Pharmacie: vereinigte Zeitschrift des Neuen Journals der Pharmacie für Ärzte, Apotheker und Chemiker u. des Magazins für Pharmacie und Experimentalkritik. Winter, 1846.
6. Scherer P. Über einen im thierischen Organismus vorkommenden, dem Xanthicoxyd verwandten Körper. Justus Liebigs. Ann Chem 1850; 73: 328-334.
7. Fischer E, Hess O. Synthese von Indolderivaten. Berichte der deutschen chemischen Gesellschaft 1884; 17: 559-568.
8. Kossel A. Weitere Beiträge zur Chemie des Zellkerns. Zeitschrift für physiologische Chemie 1886. p. 248.
9. Orström A, Orström M, Krebs HA. The formation of hypoxanthine in pigeon liver. The Biochemical journal 1939; 33: 990-994.
10. Buchanan JM, Sonne JC, Delluva AM. Biological precursors of uric acid; the role of lactate, glycine, and carbon dioxide as precursors of the carbon chain and nitrogen atom 7 of uric acid. J Biol Chem 1948; 173: 81-98.
11. Schulman MP, Buchanan JM. Biosynthesis of the purines: II. Metabolism of 4-amino-5-imidazolecarboxamide in pigeon liver. Journal of Biological Chemistry 1952; 196: 513-526.
12. Williams WJ, Buchanan JM. Biosynthesis of the purines: IV. The metabolism of 4-amino-5-imidazolecarboxamide in yeast. Journal of Biological Chemistry 1953; 202: 253-262.
13. Korn ED, Buchanan JM. Biosynthesis of the purines: VI. Purification of liver nucleoside phosphorylase and demonstration of nucleoside synthesis from 4-amino-5-imidazolecarboxamide, adenine, and 2,6-diaminopurine. Journal of Biological Chemistry 1955; 217: 183-192.
14. Li S, Lu Y, Peng B, Ding J. Crystal structure of human phosphoribosylpyrophosphate synthetase 1 reveals a novel allosteric site. The Biochemical journal 2007; 401: 39-47.
15. Messenger LJ, Zalkin H. Glutamine phosphoribosylpyrophosphate amidotransferase from Escherichia coli. Purification and properties. The Journal of biological chemistry 1979; 254: 3382.
16. Wang W, Kappock TJ, Stubbe J, Ealick SE. X-ray Crystal Structure of Glycinamide Ribonucleotide Synthetase from Escherichia coli. Biochemistry 1998; 37: 15647-15662.
17. Almassy RJ, Janson CA, Kan C-C, Hostomska Z. Structures of Apo and Complexed Escherichia coli Glycinamide Ribonucleotide Transformylase. Proceedings of the National Academy of Sciences of the United States of America 1992; 89: 6114-6118.
18. Anand R, Hoskins AA, Bennett EM, Sintchak MD, Stubbe J, Ealick SE. A model for the Bacillus subtilis formylglycinamide ribonucleotide amidotransferase multiprotein complex. Biochemistry 2004; 43: 10343-10352.
19. Schrimsher JL, Schendel FJ, Stubbe J, Smith JM. Purification and characterization of aminoimidazole ribonucleotide synthetase from Escherichia coli. Biochemistry 1986; 25: 4366-4371.
20. Firestine SM, Poon SW, Mueller EJ, Stubbe J, Davisson VJ. Reactions catalyzed by 5-aminoimidazole ribonucleotide carboxylases from Escherichia coli and Gallus gallus: a case for divergent catalytic mechanisms. Biochemistry 1994; 33: 11927-11934.
21. Chitty JL, Blake KL, Blundell RD, et al. Cryptococcus neoformans ADS lyase is an enzyme essential for virulence whose crystal structure reveals features exploitable in antifungal drug design. J Biol Chem 2017; 292: 11829-11839.
22. Wolan DW, Greasley SE, Beardsley GP, Wilson IA. Structural Insights into the Avian AICAR Transformylase Mechanism. Biochemistry 2002; 41: 15505-15513.
23. Vergis JM, Beardsley GP. Catalytic Mechanism of the Cyclohydrolase Activity of Human Aminoimidazole Carboxamide Ribonucleotide Formyltransferase/Inosine Monophosphate Cyclohydrolase. Biochemistry 2004; 43: 1184-1192.
24. Poland BW, Silva MM, Serra MA, et al. Crystal structure of adenylosuccinate synthetase from Escherichia coli.
Evidence for convergent evolution of GTP-binding 485 domains. Journal of Biological Chemistry 1993; 268: 25334-25342. 486 25. Blundell RD, Williams SJ, Morrow CA, Ericsson DJ, Kobe B, Fraser JA. Purification, 487 crystallization and preliminary X-ray analysis of adenylosuccinate synthetase from 488 the fungal pathogen Cryptococcus neoformans. Acta Crystallogr Sect F Struct Biol Author Manuscript Author 489 Cryst Commun 2013; 69: 1033-1036. 490 26. Tsai M, Koo J, Yip P, Colman RF, Segall ML, Howell PL. Substrate and product 491 complexes of Escherichia coli adenylosuccinate lyase provide new insights into the 492 enzymatic mechanism. Journal of molecular biology 2007; 370: 541-554.

This article is protected by copyright. All rights reserved 493 27. Chitty JL, Tatzenko TL, Williams SJ, et al. GMP Synthase Is Required for Virulence 494 Factor Production and Infection by Cryptococcus neoformans. J Biol Chem 2017; 495 292: 3049-3059. 496 28. Dahnke T, Shi Z, Yan H, Jiang RT, Tsai MD. Mechanism of adenylate kinase. 497 Structural and functional roles of the conserved arginine-97 and arginine-132. 498 Biochemistry 1992; 31: 6318-6328. 499 29. Schneider B, Babolat M, Xu YW, Janin J, Véron M, Deville-Bonne D. Mechanism of 500 phosphoryl transfer by nucleoside diphosphate kinase pH dependence and role of the 501 Lys16 and Tyr56 residues. European journal of biochemistry 2001; 268: 502 1964-1971. 503 30. Hedstrom L. IMP dehydrogenase: structure, mechanism, and inhibition. Chemical 504 reviews 2009; 109: 2903-2928. 505 31. Morrow CA, Valkov E, Stamp A, et al. De novo GTP biosynthesis is critical for 506 virulence of the fungal pathogen Cryptococcus neoformans. PLoS Pathog 2012; 8: 507 e1002957. 508 32. Stehle T, Schulz GE. Refined structure of the complex between guanylate kinase and 509 its substrate GMP at 2·0 Å resolution. Journal of molecular biology 1992; 224: 1127- 510 1141. 511 33. Blundell RD, Williams SJ, Arras SD, et al. Disruption of de Novo Adenosine 512 Triphosphate (ATP) Biosynthesis Abolishes Virulence in Cryptococcus neoformans. 513 ACS infectious diseases 2016; 2: 651-663. 514 34. Hitchings GH, Elion GB, Falco EA, Russell PB, Sherwood MB, Vanderwerff H. 515 Antagonists of nucleic acid derivatives I. The Lactobacillus casei model. Journal of 516 Biological Chemistry 1950; 183: 1-9. 517 35. Nieuwenhuis P, Opstelten D. Functional anatomy of germinal centers. American 518 Journal of Anatomy 1984; 170: 421-435. 519 36. Elion GB. The Purine Path to Chemotherapy. Science 1989; 244: 41-47. 520 37. Hitchings GH, Elion GB. Chemical Suppression of the Immune Response. 521 Pharmacological Reviews 1963; 15: 365-405. 522 38. Hamilton L, Elion GB. 
The fate of 6-mercaptopurine in man. Ann N Y Acad Sci 1954; Author Manuscript Author 523 60: 304-314. 524 39. Elion GB. The biochemistry and mechanism of action of acyclovir. Journal of 525 Antimicrobial Chemotherapy 1983; 12: 9-17.

This article is protected by copyright. All rights reserved 526 40. Anderson HA, Bracewell JM, Fraser AR, Jones D, Robertson GW, Russell JD. 5- 527 Hydroxymaltol and mycophenolic acid, secondary metabolites from Penicillium 528 echinulatum. Transactions of the British Mycological Society 1988; 91: 649-651. 529 41. White RH. Analysis and characterization of the folates in the nonmethanogenic 530 archaebacteria. Journal of bacteriology 1988; 170: 4608-4612. 531 42. Buchenau B, Thauer RK. Tetrahydrofolate-specific enzymes in Methanosarcina 532 barkeri and growth dependence of this methanogenic archaeon on folic acid or p- 533 aminobenzoic acid. Archives of microbiology 2004; 182: 313-325. 534 43. Thoden JB, Firestine S, Nixon A, Benkovic SJ, Holden HM. Molecular Structure of 535 Escherichia coli PurT-Encoded Glycinamide Ribonucleotide Transformylase. 536 Biochemistry 2000; 39: 8791-8802. 537 44. Ownby K, Xu H, White RH. A Methanocaldococcus jannaschii archaeal signature 538 gene encodes for a 5-formaminoimidazole-4-carboxamide-1-beta-D-ribofuranosyl 5'- 539 monophosphate synthetase. A new enzyme in purine biosynthesis. J Biol Chem 2005; 540 280: 10881-10887. 541 45. White RH. Distribution of folates and modified folates in extremely thermophilic 542 bacteria. Journal of bacteriology 1991; 173: 1987-1991. 543 46. Martin W, Russell MJ. On the origin of biochemistry at an alkaline hydrothermal 544 vent. Philosophical transactions of the Royal Society of London Series B, Biological 545 sciences 2007; 362: 1887-1925. 546 47. Kang Y-N, Tran A, White RH, Ealick SE. A Novel Function for the N-Terminal 547 Nucleophile Fold Demonstrated by the Structure of an Archaeal Inosine 548 Monophosphate Cyclohydrolase. Biochemistry 2007; 46: 5050-5062. 549 48. Hunter CA, Plymale NI, Smee KM, Sarisky CA. Experimental characterization of two 550 archaeal inosine 5'-monophosphate cyclohydrolases. PloS one 2019; 14: e0223983. 551 49. Thoden JB, Kappock TJ, Stubbe J, Holden HM. 
Three-dimensional structure of N5- 552 carboxyaminoimidazole ribonucleotide synthetase: a member of the ATP grasp 553 . Biochemistry 1999; 38: 15480-15492. 554 50. Mathews II, Kappock TJ, Stubbe J, Ealick SE. Crystal structure of Escherichia coli 555 PurE, an unusual mutase in the purine biosynthetic pathway. Structure 1999; 7: 1395- Author Manuscript Author 556 1406. 557 51. Dismukes GC, Klimov VV, Baranov SV, Kozlov YN, DasGupta J, Tyryshkin A. The 558 origin of atmospheric oxygen on Earth: the innovation of oxygenic photosynthesis. 559 Proc Natl Acad Sci U S A 2001; 98: 2170-2175.

This article is protected by copyright. All rights reserved 560 52. Koonin EV GMB. Sequence - Evolution - Function: Computational Approaches in 561 Comparative Genomics. . Kluwer Academic; 2003. 562 53. Brown AM, Hoopes SL, White RH, Sarisky CA. Purine biosynthesis in archaea: 563 variations on a theme. Biol Direct 2011; 6: 63-63. 564 54. Tipples G, McClarty G. The obligate intracellular bacterium Chlamydia trachomatis 565 is auxotrophic for three of the four ribonucleoside triphosphates. Molecular 566 Microbiology 1993; 8: 1105-1114. 567 55. Jain S, Sutchu S, Rosa PA, Byram R, Jewett MW. Borrelia burgdorferi harbors a 568 transport system essential for purine salvage and mammalian infection. Infection and 569 immunity 2012; 80: 3086-3093. 570 56. Jelsbak L, Mortensen MIB, Kilstrup M, Olsen JE. The In Vitro Redundant Enzymes 571 PurN and PurT Are Both Essential for Systemic Infection of Mice in Salmonella 572 enterica Serovar Typhimurium. Infect Immun 2016; 84: 2076-2085. 573 57. Oliver JC, Linger RS, Chittur SV, Davisson VJ. Substrate activation and 574 conformational dynamics of guanosine 5'-monophosphate synthetase. Biochemistry 575 2013; 52: 5225-5235. 576 58. Pröschel M, Detsch R, Boccaccini AR, Sonnewald U. Engineering of Metabolic 577 Pathways by Artificial Enzyme Channels. Frontiers in Bioengineering and 578 Biotechnology (Review).2015; 3. 579 59. Ceron CR, Caldas RD, Felix CR, Mundim MH, Roitman I. Purine metabolism in 580 trypanosomatids. The Journal of protozoology 1979; 26: 479-483. 581 60. Wang CC, Aldritt S. Purine salvage networks in lamblia. The Journal of 582 experimental medicine 1983; 158: 1703-1712. 583 61. Wang K, McClure J, Tu A. Titin: major myofibrillar components of striated muscle. 584 Proc Natl Acad Sci U S A 1979; 76: 3698-3702. 585 62. Miles EW, Rhee S, Davies DR. The Molecular Basis of Substrate Channeling. 586 Journal of Biological Chemistry 1999; 274: 12193-12196. 587 63. Kastritis Panagiotis L, Gavin A-C. 
Enzymatic complexes across scales. Essays in 588 Biochemistry 2018; 62: 501-514. 589 64. An S, Kumar R, Sheets ED, Benkovic SJ. Reversible Compartmentalization of de Author Manuscript Author 590 Novo Purine Biosynthetic Complexes in Living Cells. Science 2008; 320: 103. 591 65. Baresova V, Skopova V, Sikora J, et al. Mutations of ATIC and ADSL affect 592 purinosome assembly in cultured skin fibroblasts from patients with AICA-ribosiduria 593 and ADSL deficiency. Human molecular genetics 2012; 21: 1534-1543.

Table 1. Purine biosynthesis enzymes fused to proteins not associated with this metabolic process.

| Species | Purine biosynthesis enzyme | Fused protein |
|---|---|---|
| Perkinsus marinus | […] | Conserved protein d 9 |
| Blastocystis spp. | PRPP amidotransferase | Cwf15/Cwc15 cell cycle control protein |
| Thalassiosira pseudonana; Aphanomyces invadans; Phytophthora infestans | Trifunctional GAR synthetase, GAR transformylase and AIR synthetase | MGS-like domain protein |
| Cryptococcus neoformans | SAICAR synthetase | tyrosyl-tRNA synthetase |
| Stylophora pistillata | IMP dehydrogenase | aspartyl-tRNA synthetase |
| Podarcis muralis; Ornithorhynchus anatinus; Manis javanica; Tursiops truncatus; Camelus ferus | Bifunctional class II CAIR mutase-type CAIR synthetase and SAICAR synthetase | Myb-li ng domain |
| Camelus ferus | GMP synthase | Uncharacterised protein (150 amino acids) |

Figure Legends

Figure 1. The de novo purine biosynthesis pathway. Canonical purine biosynthesis enzymes from the Eukaryota are represented as coloured hexagons, while non-eukaryotic alternatives are represented as coloured circles. Adjacent shapes sharing an arrow represent enzyme domains that exist as separate proteins in some species but are fused in others. Sections of chemical structures present in the final purine base are highlighted in red.

Figure 2. The purine biosynthesis enzymes of the Archaea. Purine biosynthesis enzymes from the genomes of 35 species across the Archaea were identified through literature searches and reciprocal BLAST analyses. Canonical purine biosynthesis enzymes from the Eukaryota are represented as coloured hexagons, while non-eukaryotic alternatives are represented as coloured circles. Fusion of purine biosynthesis enzymes is represented by hexagons that are joined. Gene amplification is indicated with brackets with copy number in subscript. Enzymes that were not found are represented by a dash. Species with the same combination of purine biosynthesis enzymes for inosine monophosphate (IMP) synthesis are categorised in coloured boxes. The taxonomy-based phylogenetic tree was created using phyloT, an online tree generator based on the NCBI taxonomy and Genome Taxonomy Database. Phylum and order are indicated above each species.

Figure 3. The purine biosynthesis enzymes of the Bacteria. Purine biosynthesis enzymes from the genomes of 69 species across the Bacteria were identified through literature searches and reciprocal BLAST analyses. Canonical purine biosynthesis enzymes from the Eukaryota are represented as coloured hexagons, while non-eukaryotic alternatives are represented as coloured circles. Fusion of purine biosynthesis enzymes is represented by hexagons that are joined. Gene amplification is indicated with brackets with copy number in subscript. Enzymes that were not found are represented by a dash. Species with the same combination of purine biosynthesis enzymes for inosine monophosphate (IMP) synthesis are categorised in coloured boxes. The taxonomy-based phylogenetic tree was created using phyloT, an online tree generator based on the NCBI taxonomy and Genome Taxonomy Database. Phylum and order are indicated above each species.

Figure 4. The purine biosynthesis enzymes of the Eukaryota. Purine biosynthesis enzymes from the genomes of 99 species across the Eukaryota were identified through literature searches and reciprocal BLAST analyses. Canonical purine biosynthesis enzymes from the Eukaryota are represented as coloured hexagons, while non-eukaryotic alternatives are represented as coloured circles. Fusion of purine biosynthesis enzymes is represented by hexagons that are joined. Gene amplification is indicated with brackets with copy number in subscript. Enzymes that were not found are represented by a dash. Partial gene identifications are represented by hexagons with a dashed outline. Proteins unrelated to purine biosynthesis are represented by a dark grey oval with a white asterisk (refer to Table 1 for protein details). Species with the same combination of purine biosynthesis enzymes for inosine monophosphate (IMP) synthesis are categorised in coloured boxes. The taxonomy-based phylogenetic tree was created using phyloT, an online tree generator based on the NCBI taxonomy and Genome Taxonomy Database. Phylum and order are indicated above each species.

Figure 5. Comparing purine biosynthesis in archaea, pathogenic bacteria, pathogenic fungi and humans to identify future drug targets. Canonical purine biosynthesis enzymes from the Eukaryota are represented as coloured hexagons, while non-eukaryotic alternatives are represented as coloured circles. Enzyme fusions are represented by hexagons joined together. The order of biochemical activities for purine biosynthesis is represented by a line. In Cryptococcus neoformans, tyrosyl-tRNA synthetase protein unrelated to purine biosynthesis is represented by a dark grey oval.

Conflict of Interest

The authors declare no conflict of interest.

Figure 1 (imcb_12389_f1.eps). Pathway diagram. Intermediates labelled: PRPP, PRA, GAR, FGAR, FGAM, AIR, N5-CAIR, CAIR, SAICAR, AICAR, FAICAR, IMP, ADS, AMP, ADP, ATP, XMP, GMP, GDP, GTP. Enzymes labelled: PRPPA, GARS, N10-fTHF-GART, Formate-GART, FGAMS, AIRS, CAIRS, CAIRM-I, CAIRM-II type, SAICARS, ADSL, N10-fTHF-AICART, Formate-AICART, IMPC, Archaeal-IMPC, ADSS, ADK, NDK, IMPDH, GMPS, GUK.

Figure 2 (imcb_12389_f2.eps). Enzyme chart for the archaeal species, arranged by phylum and order. Columns: PRPPa, GARs, GARt, FGAMs, AIRs, CAIRs, CAIRm, SAICARs, ADSl, AICARt, IMPc, ADSs, IMPdh, GMPs.

Figure 3 (imcb_12389_f3.eps). Enzyme chart for the bacterial species, arranged by phylum and order. Columns: PRPPa, GARs, GARt, FGAMs, AIRs, CAIRs, CAIRm, SAICARs, ADSl, AICARt, IMPc, ADSs, IMPdh, GMPs.

Figure 4 (imcb_12389_f4.eps). Enzyme chart for the eukaryotic species, arranged by phylum and order. Columns: PRPPa, GARs, GARt, FGAMs, AIRs, CAIRs, CAIRm, SAICARs, ADSl, AICARt, IMPc, ADSs, IMPdh, GMPs.

Figure 5 (imcb_12389_f5.eps). Comparison chart. Rows: Archaea (Thermococcus gammatolerans), Bacteria (Streptococcus pneumoniae), Fungi (Cryptococcus neoformans, with Tys1 shown), Animals (Homo sapiens). Columns: PRPPA, GARS, GART, FGAMS, AIRS, CAIRS, CAIRM, SAICARS, ADSL, AICART, IMPC. Each row runs from PRPP to IMP.