bioRxiv preprint doi: https://doi.org/10.1101/2020.12.31.425007; this version posted January 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

Designing novel biochemical pathways to commodity chemicals using ReactPRED and RetroPath2.0

Authors and Affiliations
• Eleanor Vigrass
• M. Ahsanul Islam
• Department of Chemical Engineering, Loughborough University, Loughborough, Leicestershire, LE11 3TU, UK

Corresponding Author
• M. Ahsanul Islam ([email protected])

Abstract
Commodity chemicals are high-demand chemicals used by chemical industries to synthesise countless chemical products of daily use. For many of these chemicals, the main production process uses petroleum-based feedstocks. Concerns over these limited resources and their associated environmental problems, as well as mounting global pressure to reduce CO2 emissions, have motivated efforts to find biochemical pathways capable of producing these chemicals. Advances in metabolic engineering have led to the development of technologies capable of designing novel biochemical pathways to commodity chemicals. The computational software tools ReactPRED and RetroPath2.0 were used to design 49 novel pathways to produce benzene, phenol, and 1,2-propanediol, all industrially important chemicals for which few biochemical pathways are known.
A pragmatic methodology for pathway curation was developed to analyse the thousands to millions of pathways generated by the software. This method uses publicly accessible biological databases, including MetaNetX, PubChem, and MetaCyc, to analyse the generated outputs and assign EC numbers to the predicted reactions. The workflow described here for pathway generation and curation can be used to develop novel biochemical pathways to commodity chemicals from numerous starting compounds.

Key words: Biochemical pathways, cheminformatics tools, commodity chemicals, ReactPRED, RetroPath2.0, retrosynthesis

Introduction
Commodity chemicals, such as ethylene, propylene, benzene, phenol, ethanol, and toluene, are high-value chemicals used by industries to synthesise countless chemical products of daily use. Spanning products from pharmaceuticals to biofuels (Bengelsdorf and Dürre, 2017; Straathof, 2014), global chemical turnover was valued at €3475 billion in 2017, and demand is expected to rise further (Cefic, 2018). Both organic and inorganic commodities are mainly derived from fossil fuel-based petroleum feedstocks, a process that releases harmful direct and indirect greenhouse gases such as CO2 and CO into the atmosphere. Concerns over these limited fossil-fuel resources and increasing global pressure to reduce greenhouse gas emissions (UNEP, 2017) have led to an urgent need to find sustainable biochemical routes capable of producing these chemicals and satisfying demand for them.

Biochemical routes involving fermentation and enzymatic methods have been widely discussed in the literature for sustainable production of commodity chemicals (Saha, 2003; Siebert and Wendisch, 2015). Fermentation is a microbial process that uses microorganisms such as bacteria and yeast to produce enzymes (Renge et al., 2012), which then catalyse the biochemical reactions producing commodities from sugars and other biomass resources (Straathof, 2014). For example, the production of ethanol via the fermentation of syngas (Bengelsdorf et al., 2013; Bengelsdorf and Dürre, 2017) and the conversion of protein waste to cinnamic acid and β-alanine (Kumar et al., 2015) are microbially mediated fermentation processes. Enzymes are highly selective, but they can also catalyse numerous non-selective or non-specific reactions in addition to the specific reaction the enzyme has evolved for (Kumar et al., 2015; Straathof, 2014). This ability to catalyse non-specific reactions is known as ‘enzyme promiscuity’ (Tawfik, 2010), and depends on the substrates and cofactors involved in the reactions (Delépine et al., 2018; Shin et al., 2013; Tawfik, 2010). Although billions of years of evolution have enriched the repertoire of natural biochemical reaction networks of an organism, many chemical commodities cannot be produced ‘naturally’ because their synthesis exceeds an organism’s natural capabilities (Wang et al., 2017).
Additionally, there is a lack of knowledge on promiscuous enzyme activities, such as the number of promiscuous reactions that enzymes can partake in (Lee et al., 2012; Shin et al., 2013; Wang et al., 2017). These limitations prevent the discovery and implementation of potential biochemical pathways to high-value commodity chemicals.

Recent advances in cheminformatics and bioinformatics have enabled the design of novel (i.e., biologically unknown) biochemical pathways (Brunk et al., 2012; Medema et al., 2012), and have expanded our knowledge of promiscuous enzyme activities through the design and implementation of computational tools (Hadadi et al., 2019; Wang et al., 2017). Many of these state-of-the-art computational tools aid metabolic engineering efforts by designing novel pathways for numerous applications, including bioremediation of xenobiotics (Finley et al., 2009), novel drug discovery (Moura et al., 2016), and production of commodity chemicals (Islam et al., 2017; Yim et al., 2011). Widely used cheminformatics tools include From Metabolite to Metabolite (FMM) (http://fmm.mbc.nctu.edu.tw/), BNICE (Hatzimanikatis et al., 2005), DESHARKY (Rodrigo et al., 2008), PathPred (Moriya et al., 2010), and MRE (Kuwahara et al., 2016). These tools have been applied in numerous studies and have been extensively discussed elsewhere (Brunk et al., 2012; Henry et al., 2010; Islam et al., 2017; Medema et al., 2012; Wang et al., 2017).

Many of these computational tools ‘retrosynthetically’ generate biochemical pathways by iteratively applying ‘generalised reaction rules’ to transform and connect target compounds to the metabolites of interest (Hadadi et al., 2016; Medema et al., 2012; Wang et al., 2017). The generalised reaction rules are derived using the EC (Enzyme Commission) number information of known biochemical reactions assigned by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB, 1992). These tools can generate both novel and known biochemical reactions; however, a significant limitation of most tools is the combinatorial explosion of predicted pathways caused by the use of generalised reaction rules. The number of pathways generated can reach the thousands and, in some cases, millions, presenting the challenge of efficiently post-processing the generated pathways to find meaningful results (Islam et al., 2017). Although publications describing a specific software tool explain how to use it and generate results, there is often no further guidance on how to curate these results to obtain useful pathways, a crucial need for practising metabolic engineers. This gap leads researchers to develop individual curation methods that are mainly tool- or software-specific, as well as specific to the studies conducted.

In this study, two powerful computational cheminformatics tools, ReactPRED (Sivakumar et al., 2016) and RetroPath2.0 (Delépine et al., 2018), were applied to design novel biochemical pathways to produce three commodity chemicals: benzene, phenol, and 1,2-propanediol. These target compounds were chosen based on their limited biochemical pathway knowledge (i.e., how many pathways are known in the current biological databases) and global demand.
For example, global demand for benzene in 2016 was estimated at 46 million tonnes (Pérez-Uresti et al., 2017). RetroPath2.0 and ReactPRED are relatively new, open-source, and customisable cheminformatics tools. We chose to use these tools based on their ability to predict novel retrosynthetic (i.e., transforming the target compounds into their simpler precursors) and synthetic (i.e., using simpler precursor compounds to construct target molecules) pathways through identifying the chemical bond transformations occurring in the