bioRxiv preprint doi: https://doi.org/10.1101/2020.12.31.425007; this version posted January 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
Designing novel biochemical pathways to commodity chemicals using ReactPRED and RetroPath2.0

Authors and Affiliations
• Eleanor Vigrass
• M. Ahsanul Islam
• Department of Chemical Engineering, Loughborough University, Loughborough, Leicestershire, LE11 3TU, UK

Corresponding Author
• M. Ahsanul Islam ([email protected])

Abstract
Commodity chemicals are high-demand chemicals, used by chemical industries to synthesise
countless chemical products of daily use. For many of these chemicals, the main production
process uses petroleum-based feedstocks. Concerns over these limited resources and their
associated environmental problems, as well as mounting global pressure to reduce CO2
emissions, have motivated efforts to find biochemical pathways capable of producing these
chemicals. Advances in metabolic engineering have led to the development of technologies
capable of designing novel biochemical pathways to commodity chemicals. The computational
software tools ReactPRED and RetroPath2.0 were utilised to design 49 novel pathways to
produce benzene, phenol, and 1,2-propanediol, all industrially important chemicals for which
biochemical pathway knowledge is limited. A pragmatic methodology for pathway curation was
developed to analyse the thousands to millions of pathways that were generated using the
software. This method utilises publicly accessible biological databases, including MetaNetX,
PubChem, and MetaCyc, to analyse the generated outputs and assign EC numbers to the
predicted reactions. The workflow described here for pathway generation and curation can be
used to develop novel biochemical pathways to commodity chemicals from numerous starting
compounds.

Key words: Biochemical pathways, cheminformatics tools, commodity chemicals,
ReactPRED, RetroPath2.0, retrosynthesis

Introduction
Commodity chemicals, such as ethylene, propylene, benzene, phenol, ethanol, and toluene, are
high-value chemicals used by industries to synthesise countless chemical products of daily use.
These products range from pharmaceuticals to biofuels (Bengelsdorf and Dürre, 2017; Straathof,
2014). The global chemical turnover was valued at €3475 billion in 2017, and this demand is
expected to rise further in the future (Cefic, 2018). Both organic and inorganic commodities are
mainly derived from fossil fuel-based petroleum feedstocks, and their production releases harmful
direct and indirect greenhouse gases such as CO2 and CO into the atmosphere. Concerns over these
limited fossil-fuel resources and increasing global pressure to reduce greenhouse gas emissions
(UNEP, 2017) have led to an urgent need to find sustainable biochemical routes capable of
producing these chemicals and satisfying their demands.

Biochemical routes involving fermentation and enzymatic methods have been widely discussed in
the literature for sustainable production of commodity chemicals (Saha, 2003; Siebert and
Wendisch, 2015). Fermentation is a microbial process that uses microorganisms such as
bacteria and yeast to produce enzymes (Renge et al., 2012), which then catalyse the
biochemical reactions producing commodities from sugars and other biomass resources
(Straathof, 2014). For example, the production of ethanol via the fermentation of syngas
(Bengelsdorf et al., 2013; Bengelsdorf and Dürre, 2017) and the conversion of protein waste to
cinnamic acid and β-alanine (Kumar et al., 2015) are microbially mediated fermentation
processes. Enzymes are highly selective, but they can also catalyse numerous
non-selective or non-specific reactions in addition to the specific reaction the enzyme has
evolved for (Kumar et al., 2015; Straathof, 2014). This ability to catalyse non-specific
reactions is known as ‘enzyme promiscuity’ (Tawfik, 2010), and is dependent on the
substrates and cofactors involved in the reactions (Delépine et al., 2018; Shin et al., 2013;
Tawfik, 2010). Although billions of years of evolution have enriched the repertoire of natural
biochemical reaction networks of an organism, many chemical commodities cannot be
produced ‘naturally’ because their synthesis lies beyond an organism’s natural capabilities
(Wang et al., 2017). Additionally, there is a lack of knowledge on promiscuous enzyme activities,
such as the number of promiscuous reactions in which enzymes can take part (Lee et al., 2012;
Shin et al., 2013; Wang et al., 2017). These limitations prevent the discovery and implementation
of potential biochemical pathways to high-value commodity chemicals.

Recent advances in cheminformatics and bioinformatics have enabled the design of novel (i.e.,
biologically unknown) biochemical pathways (Brunk et al., 2012; Medema et al., 2012), and
have expanded our knowledge of promiscuous enzyme activities through the design and
implementation of computational tools (Hadadi et al., 2019; Wang et al., 2017). Many of these
state-of-the-art computational tools are equipped with unique abilities to aid metabolic
engineering efforts by designing novel pathways for numerous applications, including
bioremediation of xenobiotics (Finley et al., 2009), novel drug discovery (Moura et al., 2016),
and production of commodity chemicals (Islam et al., 2017; Yim et al., 2011). Examples of
some of the widely used cheminformatics tools include From Metabolite to Metabolite (FMM)
(http://fmm.mbc.nctu.edu.tw/), BNICE (Hatzimanikatis et al., 2005), DESHARKY (Rodrigo
et al., 2008), PathPred (Moriya et al., 2010), and MRE (Kuwahara et al., 2016). These tools
have been applied in numerous studies and have been extensively discussed elsewhere (Brunk
et al., 2012; Henry et al., 2010; Islam et al., 2017; Medema et al., 2012; Wang et al., 2017).

Many of these computational tools ‘retrosynthetically’ generate biochemical pathways by
iteratively applying ‘generalised reaction rules’ to transform and connect target compounds
to the metabolites of interest (Hadadi et al., 2016; Medema et al., 2012; Wang et al., 2017).
The generalised reaction rules are derived using the EC (Enzyme Commission) number
information of known biochemical reactions assigned by the Nomenclature Committee of the
International Union of Biochemistry and Molecular Biology (NC-IUBMB, 1992). These tools
have the capability of generating novel and known biochemical reactions; however, a
significant limitation that most tools suffer from is the combinatorial explosion of predicted
pathways caused by the use of generalised reaction rules. The number of pathways generated can
reach the thousands and, in some cases, the millions, presenting the challenge of efficient
post-processing of the generated pathways to find meaningful results (Islam et al., 2017). Although
publications relevant to a specific software provide information on how to use and generate
results using the software, often there is no further guidance on how to curate these results to
obtain useful pathways: a crucial need for practising metabolic engineers. This need leads to
the development of individual curation methods that are mainly tool- or software-specific, as well
as specific to the conducted studies.

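The combinatorial growth described above can be illustrated with a toy sketch. Everything in it is a hypothetical placeholder: the three "rules" simply append a label to a string, whereas real tools match bond-transformation patterns on molecular structures. It only shows why applying every rule to every product at each step makes the pathway count grow geometrically with pathway length.

```python
from typing import Callable, Dict, List

# Hypothetical "generalised rules": each maps a compound label to product labels.
# These are NOT ReactPRED or RetroPath2.0 rules; they are illustrative only.
RULES: Dict[str, Callable[[str], List[str]]] = {
    "oxidise": lambda c: [c + "=O"],
    "hydroxylate": lambda c: [c + "-OH"],
    "methylate": lambda c: [c + "-CH3"],
}


def count_pathways(start: str, max_length: int) -> List[int]:
    """Return the number of linear pathways of each length 1..max_length."""
    frontier = [start]  # end compounds of all pathways built so far
    counts = []
    for _ in range(max_length):
        next_frontier = []
        for compound in frontier:
            for rule in RULES.values():
                # Every applicable rule extends every existing pathway.
                next_frontier.extend(rule(compound))
        frontier = next_frontier
        counts.append(len(frontier))
    return counts


print(count_pathways("C", 4))  # [3, 9, 27, 81]
```

With only three always-applicable rules the count triples at each step; a rule set with over a thousand rules, even if only a small fraction match a given compound, can plausibly reach millions of candidates within a few steps.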
In this study, two powerful computational cheminformatics tools, ReactPRED (Sivakumar et
al., 2016) and RetroPath2.0 (Delépine et al., 2018), were applied to design novel biochemical
pathways to produce three commodity chemicals: benzene, phenol, and 1,2-propanediol. These
target compounds were chosen based on their limited biochemical pathway knowledge (i.e.,
how many pathways are known in the current biological databases) and global demand. For
example, it was estimated that the global demand for benzene in 2016 was 46 million tonnes
(Pérez-Uresti et al., 2017). RetroPath2.0 and ReactPRED are relatively new, open-source, and
customisable cheminformatics tools. We chose to use these tools based on their ability to
predict novel retrosynthetic (i.e., transforming the target compounds to their simpler
precursors) and synthetic (i.e., using simpler precursor compounds to construct target
molecules) pathways through identifying the chemical bond transformations occurring in the
reactions (Delépine et al., 2018; Sivakumar et al., 2016). After generating pathways
automatically, a pathway curation method was developed for each tool to analyse millions of
generated pathways and remove redundant results. Initially, pathways were examined for
specific starting compounds, such as acetate, pyruvate, and glucose, as these compounds are
abundantly available in cells and widely used in their metabolism. Next, the pathways
containing these compounds were screened based on thermodynamic feasibility. The feasible
pathways were further analysed to examine the compounds generated, and individual reactions
were assigned an Enzyme Commission (EC) number: a numerical classification scheme for
enzyme-catalysed reactions (Egelhofer et al., 2010). This task was accomplished by comparing
the generated reactions to known reactions in the MetaNetX (Moretti et al., 2016), MetaCyc
(Caspi et al., 2020), and KEGG (Kanehisa et al., 2020) databases. Finally, both software
programmes were compared to assess their relative advantages and limitations for finding
novel biochemical pathways to target compounds.

Materials and methods
Automated generation of pathways
Novel biochemical pathways were constructed for the production of 1,2-propanediol, benzene,
and phenol using the computational software programmes ReactPRED and RetroPath2.0.
Detailed descriptions of both algorithms and their functionalities can be found elsewhere
(Delépine et al., 2018; Sivakumar et al., 2016). Both software programmes require a number
of inputs to generate biochemical pathways. In the case of ReactPRED, these inputs included
a set of generalised reaction rules developed based on the EC numbers of biochemical reactions
found in the MetaCyc (Caspi et al., 2020) database, cofactors (NAD, NADP), and information on
the target compounds (benzene, phenol, 1,2-propanediol) and source compounds (glucose,
pyruvate, acetate) in the SMILES format (Weininger, 1988). For RetroPath2.0, the inputs
included a set of reaction rules generated based on the biochemical reactions in the MetaNetX
(Moretti et al., 2016) database and the reactions in the genome-scale E. coli metabolic model,
iJO1366 (Orth et al., 2011), as well as the source, sink, and cofactor compounds, including
benzene, phenol, NAD, and NADP. RetroPath2.0 then generated pathways retrosynthetically, and
ReactPRED synthetically, by iteratively applying the generalised reaction rules to
generate reactions connecting the target compounds to the metabolites present within the
MetaCyc and MetaNetX databases. The thermodynamic feasibility of both the ReactPRED- and
RetroPath2.0-generated pathways was analysed by estimating the standard Gibbs free energy
of the generated reactions using the group contribution method (Jankowski et al., 2008; Noor
et al., 2012).

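For orientation, the general form of a group contribution estimate can be written as follows. This is a textbook-style sketch rather than the exact formulation of the cited works, and the symbols are assumptions introduced here: $n_{ij}$ is the number of occurrences of structural group $i$ in compound $j$, $g_i$ is the fitted energy contribution of group $i$, and $\nu_j$ is the stoichiometric coefficient of compound $j$ (negative for reactants, positive for products).

```latex
\Delta_f G^{\circ}_j \approx \sum_i n_{ij}\, g_i ,
\qquad
\Delta G^{\circ} = \sum_j \nu_j \, \Delta_f G^{\circ}_j
```

Under this convention, a reaction is treated as thermodynamically feasible in the standard state when $\Delta G^{\circ} < 0$.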
Manual curation of the generated pathways
The automatically generated pathways were analysed and manually curated based on the
reactions and compounds involved in reactions, and the overall pathway feasibility. For
RetroPath2.0, most of the generated reactions that were biologically known were automatically
assigned an EC number. However, the unknown or novel reactions were examined and
compared to similar reactions in the MetaNetX (Moretti et al., 2016) database. Also, the
generated compounds were all examined to verify their identities by comparing them with the
compounds in the MetaNetX and PubChem (Kim et al., 2019) databases. If the compounds
were identified and existed in the databases, they were assigned to corresponding reactions
with an EC number, while unconfirmed compounds were discarded (Figure 1). ReactPRED
generated compounds in the SMILES format (Weininger, 1988), which were examined and
verified using the PubChem database. Unidentified compounds were discarded, while identified
compounds were further assessed using the MetaCyc (Caspi et al., 2020) and KEGG (Kanehisa
et al., 2020) databases to confirm whether the generated compounds were present in the biological
databases. The identified compounds and corresponding reactions were then assigned an EC
number based on the reaction rule and cofactor information used to generate those reactions.
Moreover, the compounds not present in MetaCyc and KEGG were further analysed with the
online CDK depicter tool (Willighagen et al., 2017) to confirm the bond transformations
occurring within the proposed reactions (Figure 1). An EC number was then assigned to the
reactions using the provided reaction rule and cofactor information.

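The ReactPRED branch of this curation procedure can be summarised as a small decision function. The sketch below is only an illustration of the logic: the curation in this study was manual, and the compound labels and in-memory sets are hypothetical stand-ins for look-ups in PubChem, MetaCyc, and KEGG.

```python
# Hypothetical stand-ins for database look-ups (labels are placeholders,
# not real database contents).
PUBCHEM = {"compound_A", "compound_B"}
BIOLOGICAL_DBS = {"compound_A"}  # present in MetaCyc or KEGG


def curate_compound(name: str) -> str:
    """Return the curation outcome for one ReactPRED-generated compound."""
    if name not in PUBCHEM:
        # Unidentified compounds (and the pathways containing them) are dropped.
        return "discard"
    if name in BIOLOGICAL_DBS:
        # Known biological compound: EC number follows from rule and cofactor.
        return "assign EC from reaction rule and cofactor"
    # Only in PubChem: confirm the bond transformation first (CDK depict step).
    return "confirm bond transformation, then assign EC"


for label in ("compound_A", "compound_B", "compound_X"):
    print(label, "->", curate_compound(label))
```

Keeping the three outcomes explicit makes it easy to tally accepted and discarded pathways per target compound.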
Results and discussion
Analysis of the generated pathways using ReactPRED
Reactions and pathways were generated using ReactPRED’s default reaction rule set, which
included a total of 1462 reaction rules (Sivakumar et al., 2016), and the SMILES format of the
starting compounds. Glucose, pyruvate, and acetate were used to generate synthetic pathways
up to the pathway length of 3, while phenol, benzene, and 1,2-propanediol were used as starting
compounds to retrosynthetically generate pathways up to the pathway length of 2.

Figure 2 illustrates the number of pathways generated with increasing pathway length. The
number of generated pathways is linked to the number of potential bond transformations
available in the starting compound. For example, the number of glucose pathways produced at
each pathway length is greater than the number of acetate and pyruvate pathways produced
(Figure 2A). Additionally, the number of phenol pathways significantly increased from 1952
at pathway length 1 to 2,337,181 at pathway length 3 (Figure 2B), further illustrating that more
potential for bond transformations in an input compound generates more outputs. From the
generated pathways, thermodynamically feasible pathways to the target compounds, i.e., those
with a negative standard Gibbs free energy, were examined. Table 1 shows the number
of thermodynamically feasible pathways generated to the target compounds at each pathway
length. No pathways were found to synthetically generate benzene. Table 2 shows the number
of thermodynamically feasible retrosynthetic pathways generated at each pathway length.

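As a rough worked example of this growth, the phenol counts quoted above can be converted into an average per-step multiplication factor. The calculation assumes, purely for illustration, that growth between pathway lengths 1 and 3 is geometric; the function name is hypothetical.

```python
def effective_branching(n_short: int, n_long: int, extra_steps: int) -> float:
    """Average factor by which the pathway count multiplies per added step."""
    return (n_long / n_short) ** (1.0 / extra_steps)


# Phenol pathway counts reported in the text at pathway lengths 1 and 3.
factor = effective_branching(1952, 2337181, extra_steps=2)
print(round(factor, 1))  # 34.6
```

Under this assumption, each additional step multiplies the number of phenol pathways by roughly 35, which is why manual inspection becomes impractical beyond short pathway lengths.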
Comparing both the synthetic and retrosynthetic results, more pathways were generated
retrosynthetically because there was more potential for bond transformations in the
retrosynthetic starting compounds than in the compounds used for the synthetic analysis.
Therefore, more reaction rules were automatically applied to these compounds to generate
more outputs, i.e., reactions. Pathways were further analysed individually based on the identity
of the compounds involved in the pathways (Figure 3). Pathways were discarded if they
included compounds unidentifiable in the PubChem database (Kim et al., 2019). For instance,
many of the generated compounds contained carbon atoms with 5 or more bonds, which means
these compounds would be unlikely to exist in nature. From the synthetic outputs, two
pathways to 1,2-propanediol and one pathway to phenol were accepted (Figure 3A). Figure
3B shows the number of accepted and discarded pathways to each target compound for the
retrosynthetic outputs: one acetate, ten pyruvate, and seven glucose to 1,2-propanediol
pathways were accepted, while seven acetate, five pyruvate, and fifty-nine glucose to phenol
pathways were accepted. No acetate to benzene or pyruvate to benzene pathways were accepted
because each of these categories of pathways included at least one unidentifiable compound;
however, fifteen glucose to benzene pathways were accepted. The accepted reactions were
assigned EC numbers of enzymes capable of catalysing the reactions, following the procedure
described in Materials and Methods.

Analysis of the generated pathways using RetroPath2.0
The RetroPath2.0 algorithm generated pathways using 14,302 reaction rules known to the E.
coli metabolism, benzene as the starting compound, and a set of ‘sink’ compounds. The sink
compounds are the native metabolites of the chassis organism, i.e., E. coli (Delépine
et al., 2018). The algorithm converged after 7 iterations, as no new reaction was generated from
further runs of the algorithm (Figure 4). Figure 4A illustrates the total number of generated
reactions, the number of reactions assigned to EC numbers, and the number of compounds
generated at each iteration. The number of generated compounds and reactions peaked at
iteration 3, generating in total thirty-seven reactions and eighty-nine compounds, while the total
number of reactions and compounds significantly decreased to three and five after the third
iteration. This reduction in compounds and reactions can be attributed to the handling of ‘sink’
compounds by the algorithm. Ordinarily, the total number of reactions should exponentially
increase with each iteration of the algorithm. However, RetroPath2.0 removes all outputs in
which the generated compounds match those that are in the sink set (Delépine et al., 2018),
thus preventing further iterations of the algorithm on those generated compounds. Each
reaction was further analysed, and reactions were discarded if they contained compounds
unidentifiable in the PubChem (Kim et al., 2019) database. Thus, no reactions were accepted
from those that were generated after the fourth iteration (Figure 4B). However, three, two, and
twelve reactions were accepted from the reactions generated after the 1st, 2nd, and 3rd iteration
of the algorithm, respectively (Figure 4B). RetroPath2.0 automatically assigns EC numbers to
each reaction. Only one accepted reaction from the first iteration set and ten accepted reactions
from the third iteration set were assigned to EC numbers. The unassigned accepted reactions
were then compared to the ReactPRED results to identify similar reactions. Additionally, the
reaction rule and co-substrate information were examined to manually assign an EC number to
the unassigned reactions.

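The effect of sink handling on convergence can be sketched with a toy retrosynthetic expansion. This is not RetroPath2.0's implementation: the retro-rules, compound labels, and sink set below are hypothetical, and the loop only illustrates why dropping precursors that are already in the sink stops the reaction count from growing indefinitely.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical retro-rules: product -> list of alternative precursor sets.
RETRO_RULES: Dict[str, List[Tuple[str, ...]]] = {
    "target": [("A",), ("B", "sink1")],
    "A": [("sink2",), ("C",)],
    "B": [("sink1",)],
    "C": [],  # no rule applies: dead end
}
SINK: Set[str] = {"sink1", "sink2"}  # native metabolites of the chassis


def retro_expand(target: str, max_iterations: int = 10) -> List[int]:
    """Return the number of reactions generated at each iteration."""
    frontier = {target}
    seen = {target}
    reactions_per_iteration = []
    for _ in range(max_iterations):
        new_reactions = 0
        next_frontier = set()
        for compound in frontier:
            for precursors in RETRO_RULES.get(compound, []):
                new_reactions += 1
                for precursor in precursors:
                    # Sink compounds (and already expanded ones) are not expanded.
                    if precursor in SINK or precursor in seen:
                        continue
                    seen.add(precursor)
                    next_frontier.add(precursor)
        if new_reactions == 0:
            break  # converged: no new reaction was generated
        reactions_per_iteration.append(new_reactions)
        frontier = next_frontier
    return reactions_per_iteration


print(retro_expand("target"))  # [2, 3]
```

In this toy network the expansion stops after two productive iterations because every remaining precursor is either in the sink or a dead end, mirroring the rise and subsequent fall in reaction counts described above.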
No direct pathways to glucose, acetate, or pyruvate were found using RetroPath2.0. However,
pathways were generated to connect compounds present within the E. coli metabolism, as well
as compounds found only in the PubChem database. Additionally, generating pathways
retrosynthetically using phenol as the starting compound produced the same results as benzene,
but no pathways were generated using 1,2-propanediol as the starting compound.

Analysis of the accepted pathways
From the generated outputs of ReactPRED and RetroPath2.0, in total 49 pathways consisting
of 106 reactions connecting acetate, glucose, and pyruvate to phenol, benzene, and
1,2-propanediol were accepted. No 1-step pathway connecting benzene or 1,2-propanediol to the
target starting compounds, i.e., glucose, acetate, and pyruvate, was identified. Each pathway
contains at least one novel step, while 25 (51%) of the accepted pathways are entirely composed
of novel reactions. No pathways were identified in which all reaction steps were known, i.e.,
found in the MetaCyc, MetaNetX, and KEGG databases. Many of the accepted pathways
contained identical reaction rules, as well as compounds found in the PubChem database only.
For example, 13 of the 26 (50%) glucose to phenol pathways (Supplementary data) contained
compounds only identifiable in the PubChem database. Many of these compounds are synthetic
(man-made) compounds that are found only in the retrosynthetically generated pathways. Figure
5 shows the number of accepted pathways from glucose, pyruvate, and acetate to each
commodity chemical.

The thermodynamic feasibility of the accepted pathways was analysed based on the overall
standard Gibbs free energy of reactions (∆G°) of each pathway. The standard Gibbs free energy
of both the ReactPRED- and RetroPath2.0-generated reactions was estimated using the group
contribution method (Jankowski et al., 2008; Noor et al., 2012). Figure 6 shows a few notable
examples of the accepted pathways discussed in this section, while Figure 7 depicts their
thermodynamic feasibility information. Additional information on the pathways discussed in
this section can be found in the supplementary data provided.

Pathways 1-7 (Supplementary data) are the predicted pathways from acetate to 1,2-propanediol
and phenol. Pathways 1, 3, and 4 are composed of the same novel reaction in the 1st step, while
pathway 2 includes only novel reactions (Figure 6). Each acetate to 1,2-propanediol producing
pathway is thermodynamically feasible, although pathways 3 and 4 include reaction steps with
positive ∆G° (Figure 7). Further comparison of pathways 1, 3, and 4 shows that the acetate to
methylglyoxal reaction is thermodynamically more favourable than the acetate to
hydroxyacetone reaction, resulting in a larger negative ∆G° for the corresponding pathways.
The first reaction step in pathways 5, 6, and 7 includes a novel reaction. This novel step
involves the transfer of alkyl or aryl groups in 5 and 6, while in 7, this novel step uses a
haloacetate dehalogenase-catalysed reaction. All three pathways generate phenol through an
arylesterase reaction in the final reaction step. Each pathway is overall thermodynamically
feasible. However, pathway 5 was found to be the most thermodynamically favourable and
pathway 7 the least favourable of the three acetate to phenol pathways.

Pathways 8-21 (Supplementary data) are the predicted pathways connecting pyruvate
to the target commodity chemicals. Pathways 8-12 are predicted to produce hydroxyacetone in
the first reaction step using the same novel reaction. Pathways 13-17 predict the production of
lactic acid from pyruvate in the first reaction step. This step is a novel reaction for pathways
13-16, while it is a biologically known reaction in pathway 17. Only pathways 9 and 10 include
a known reaction in the 2nd reaction step. All pyruvate to 1,2-propanediol pathways are overall
thermodynamically feasible, with each pathway having an overall negative ∆G°. However,
further comparison of the reaction steps revealed that the pathways producing 1,2-propanediol
via lactic acid were thermodynamically more favourable than the pathways through
hydroxyacetone.

Pathways 18-21 produce phenol from pyruvate. The first reaction step in pathways 18 and 21 and
the 2nd step in pathway 20 are known reactions, while the other reactions are novel in these
pathways. Pathway 19 is predicted to use phenyl phosphono hydrogen phosphate as a
co-reactant (Supplementary data). Phenyl phosphono hydrogen phosphate was only identified in
the PubChem database, indicating that it is not a natural biological compound. Examining the
thermodynamic feasibility of the pyruvate to phenol pathways reveals that pathways 18, 19, and
20 have the same ∆G° values (-4.61 and -4.5 kcal/mol) for the first and second reaction steps,
while only pathway 21 has different ∆G° values (-4.6 and -1.7 kcal/mol) for the two reactions.
These estimates indicate that pathways 18, 19, and 20 are overall thermodynamically more
favourable than pathway 21.

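The comparison of the pyruvate to phenol pathways can be reproduced with a few lines of arithmetic. The sketch below assumes that the overall ∆G° of a pathway is the sum of its step values, which is how the overall values appear to be used in this section; the step values are those quoted above, and the function and variable names are hypothetical.

```python
from typing import Dict, List

# Step-wise standard Gibbs free energies (kcal/mol) quoted in the text.
PATHWAY_STEPS: Dict[str, List[float]] = {
    "pathways 18-20": [-4.61, -4.5],
    "pathway 21": [-4.6, -1.7],
}


def overall_dg(step_dgs: List[float]) -> float:
    """Overall standard Gibbs free energy, assumed to be the sum over steps."""
    return sum(step_dgs)


def is_feasible(step_dgs: List[float]) -> bool:
    """A pathway is treated as feasible if its overall value is negative."""
    return overall_dg(step_dgs) < 0


for name, steps in PATHWAY_STEPS.items():
    print(name, round(overall_dg(steps), 2), is_feasible(steps))

# The most favourable pathway has the most negative overall value.
most_favourable = min(PATHWAY_STEPS, key=lambda k: overall_dg(PATHWAY_STEPS[k]))
print(most_favourable)
```

Summing the steps gives about -9.11 kcal/mol for pathways 18-20 and about -6.3 kcal/mol for pathway 21, consistent with the ranking stated in the text. The same test also accepts a pathway with an individual positive step as long as the sum remains negative.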
All predicted glucose to 1,2-propanediol pathways, i.e., pathways 22-25, were
retrosynthetically generated using ReactPRED and are composed of two novel reactions.
Notably, all of these pathways contain a compound in the first reaction step only found
in the PubChem database: 2-(hydroxymethyl)-6-(1-hydroxypropane-2-yloxy)oxane-3,4,5-triol
(pathway 22 in Figure 6). The reactions in pathways 22-25 are all thermodynamically feasible;
pathway 22 is estimated to be the most thermodynamically favourable glucose to
1,2-propanediol pathway, while pathway 25 is the least. The presence of NADP+ as a cofactor in
the first reaction step of pathway 22 is likely to make it thermodynamically more favourable,
as NADP+ works alongside enzymes to provide energy for cellular reactions (Xiao et al., 2018).

The only 1-step pathway from glucose to phenol (Pathway 26 in Figure 6) was generated
retrosynthetically using ReactPRED. Pathways 27-32 include a known reaction in the first step,
while the second reaction step in each of these pathways is novel (Supplementary data). All of
these pathways utilised water produced from the first step as a reactant for the second reaction.
Pathways 33-40 and 44 are predicted to consist of novel reactions in both reaction steps.
Interestingly, pathways 33-40 generated phenol from phenyl-α-D-glucoside in the second step
using the same reaction; this reaction was classified as α-galactosidase and was assigned EC
3.2.1.21. Pathways 41 and 42 included the same known reaction in the second reaction step,
while this 2nd step in pathway 43 was identified as reaction R05626 using the KEGG database.
Overall, each glucose to phenol pathway is thermodynamically feasible even though some
pathways include reactions with a positive ∆G°.

The second steps in pathways 45 and 46 were retrosynthetically generated by the RetroPath2.0
software in the first iteration (Figure 6). These pathways were compared to the ReactPRED
generated pathways to find similarities and construct novel reactions. Pathway 45 was
confirmed by creating a customised reaction rule set and generating the reactions using
ReactPRED’s pathway prediction system (Sivakumar et al., 2016), while pathway 46 was
confirmed through comparison of ReactPRED’s retrosynthetic pathway results. Pathways 47-49
were all retrosynthetically generated by ReactPRED and consisted of two novel reactions.
Each glucose to benzene pathway is overall thermodynamically feasible. Pathway 48 was
identified as the most thermodynamically favourable glucose to benzene pathway with an
estimated overall ∆G° of -105 kcal/mol, while pathway 46 was the least thermodynamically
favourable, having an overall ∆G° of -9.4 kcal/mol.

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.31.425007; this version posted January 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.
A closer examination of the accepted pathways further revealed similarities in the co-substrate
use and bond transformation information, as well as in the estimated ∆ᵣG⁰ for multiple reaction
steps. For example, 32 of the 80 accepted glucose pathways contain a reverse sucrose
alpha-glucohydrolase reaction in the first reaction step to produce sucrose and water
(Supplementary data). These pathways used water as the starting compound in the next reaction
step and compounds only identified in the PubChem database as co-reactants. Similarly, pathways
22-25, 34, and 46-49 included co-reactants that were only identified in the PubChem database.
The presence of a compound only in PubChem but not in the biological databases (MetaCyc,
MetaNetX, KEGG) suggests that the compound is man-made or synthetic and may not be made
by biological systems. Sixteen of the 28 glucose pathways contained one of these compounds,
whilst none of the acetate or pyruvate pathways did. Moreover, many other
pathways, such as pathways 6-8, 23-26, and 28-30, produce target compounds using commodity
chemicals as co-reactants or generate CO2 (Pathway 46 in Figure 6) in their reaction steps.
Thus, the proposed pathways, although novel, may not necessarily be considered ‘green’.
Finally, most of the retrosynthetically generated pathways, including pathways 1, 13-16, 19,
22-26, 33-40, and 43-45, are completely composed of novel reactions, indicating that
retrosynthetic generation allows more potential novel reactions to be uncovered.
367
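The database reasoning above amounts to a set-membership check per compound. The Python sketch below shows one way such a check could be organised; it is an illustrative assumption and not the curation code used in this study. The function `classify_compound`, the label strings, and the lookup tables are hypothetical, and a real workflow would query each database rather than use hard-coded sets.

```python
from typing import Dict, Set

BIOLOGICAL_DBS = ("MetaCyc", "MetaNetX", "KEGG")


def classify_compound(name: str, databases: Dict[str, Set[str]]) -> str:
    """Label a compound by the databases in which it can be found."""
    in_biological = any(name in databases.get(db, set()) for db in BIOLOGICAL_DBS)
    in_pubchem = name in databases.get("PubChem", set())
    if in_biological:
        return "biological"
    if in_pubchem:
        return "pubchem_only"  # possibly synthetic, as argued in the text
    return "unidentified"      # candidate for discarding during curation


# Hypothetical lookup tables (placeholder compound names).
toy_databases = {
    "MetaCyc": {"phenol", "sucrose"},
    "KEGG": {"phenol"},
    "MetaNetX": {"sucrose"},
    "PubChem": {"phenol", "sucrose", "compound_X"},
}
labels = {
    c: classify_compound(c, toy_databases)
    for c in ("phenol", "compound_X", "compound_Y")
}
```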
Comparative analysis of ReactPRED and RetroPath2.0
Both software tools, ReactPRED and RetroPath2.0, were applied to generate novel biochemical
pathways to three industrially important commodity chemicals: benzene, phenol, and
1,2-propanediol. These cheminformatics tools were designed to be user-friendly and customisable
to conduct user-specific pathway design tasks (Delépine et al., 2018; Sivakumar et al., 2016).
Additionally, both tools have unique features that enable the design of biochemical pathways
of various lengths from different starting compounds to different targets.

Both ReactPRED and RetroPath2.0 allow users to customise their inputs to generate reactions.
For ReactPRED, the input reaction rules are completely customisable, and the software’s
reaction rule creation system allows the user to generate tailored reaction rules for specific
tasks (Sivakumar et al., 2016). However, this study only utilised ReactPRED’s default reaction
rule set to generate outputs, i.e., compounds, reactions, and pathways. Additionally,
ReactPRED’s default reaction rule set was created using reactions present in the MetaCyc
database (Caspi et al., 2020), and identical rules were merged together, indicating that the
novel reactions may be catalysed by more than one enzyme. Uniquely, ReactPRED estimates
the overall Gibbs free energy change of the generated reactions, allowing users to assess the
thermodynamic feasibility of the generated outputs. Further, ReactPRED allows users to view
and search for pathways based on thermodynamic feasibility, molecular weight, and
substructure through the user-friendly pathway analysis system.
387
By comparison, RetroPath2.0 allows the user to tailor not only the reaction rules but also the
‘sink’ compounds to find novel pathways in the context of a specific chassis organism. This
study used the software’s default reaction rule set and sink compounds, which were developed
based on the genome-scale metabolic model of E. coli, iJO1366 (Orth et al., 2011), and
MetaNetX (Moretti et al., 2016), a meta-database consisting of reactions extracted from the
KEGG, MetaCyc, Rhea (Lombardot et al., 2019), and Reactome (Jassal et al., 2020) databases.
Additionally, RetroPath2.0 automatically assigns an EC number to each reaction within the
chassis strain. Uniquely, RetroPath2.0 uses ‘sinks’ to prevent further iterations of the algorithm
on compounds found within the chassis strain. This strategy not only shortens the
execution time of the algorithm but also limits the combinatorial explosion of pathways
that is usually generated with cheminformatics tools.
399
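The role of sink compounds described above can be sketched as a breadth-first retrosynthetic expansion that never expands a compound already present in the sink set. This is an illustrative toy and not RetroPath2.0's implementation; the rule dictionary, the single-letter compound names, and the function `retro_expand` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical retro-rules: each product maps to alternative precursor sets.
RetroRules = Dict[str, List[Set[str]]]


def retro_expand(target: str, rules: RetroRules, sinks: Set[str],
                 max_iterations: int) -> List[Set[str]]:
    """Return the compounds newly generated at each iteration.

    Compounds in `sinks` are never expanded further, which is the
    pruning idea described in the text.
    """
    frontier = {target}
    seen = {target}
    history: List[Set[str]] = []
    for _ in range(max_iterations):
        generated: Set[str] = set()
        for compound in frontier:
            if compound in sinks:
                continue  # already available in the chassis: stop here
            for precursors in rules.get(compound, []):
                generated |= precursors
        new = generated - seen
        if not new:
            break  # nothing left to expand
        history.append(new)
        seen |= new
        frontier = new
    return history


# Toy rule set: T is made from A and B, A from C, B from D, and so on.
toy_rules: RetroRules = {
    "T": [{"A", "B"}],
    "A": [{"C"}],
    "B": [{"D"}],
    "C": [{"E"}],
    "D": [{"F"}],
}
without_sinks = retro_expand("T", toy_rules, set(), max_iterations=5)
with_sinks = retro_expand("T", toy_rules, {"A", "B"}, max_iterations=5)
```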
Although ReactPRED and RetroPath2.0 have benefits to aid biochemical pathway design, both
software tools have limitations. For example, both tools generate compounds that
do not exist in nature, i.e., the compounds are unidentifiable in both biochemical and chemical
databases. This is an important limitation, as these compounds and the relevant reactions cannot
be removed automatically from the generated results. Therefore, each pathway must
be individually analysed to find and discard the reactions and pathways containing these
compounds. Another major limitation of ReactPRED is its long execution time: generating the
desired outputs can take anywhere from a few seconds to a week. The time taken
for ReactPRED to generate outputs depends on the number of potential bond
transformations available for the starting compound and the number of reaction rules used to
predict reactions. For instance, larger input molecules with more bond transformation potential
take longer to process. Furthermore, similar to many other cheminformatics tools,
ReactPRED suffers from the combinatorial explosion of predicted reactions and pathways. As
the pathway length increases, the number of predictions can reach the millions, requiring
greater effort to sift through the data to find meaningful results. Additionally, assigning EC
numbers to the ReactPRED-predicted reactions is challenging and requires extensive
analysis of the reactions, as discussed in this study, to assign a complete EC number.
417
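The growth in predictions with pathway length can be made concrete with a back-of-envelope count. The sketch assumes, purely for illustration, that every compound admits the same number of predicted reactions (a uniform branching factor); real rule sets branch unevenly, and the value of 100 is hypothetical rather than taken from this study.

```python
def pathway_count(branching_factor: int, length: int) -> int:
    """Number of linear pathways of a given length under uniform branching."""
    return branching_factor ** length


# Hypothetical branching factor of 100 candidate reactions per compound.
counts = [pathway_count(100, n) for n in (1, 2, 3)]
```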
A significant limitation of both tools is that neither ReactPRED nor RetroPath2.0
can propose targeted pathways, i.e., automatically generate only the desired reactions
connecting starting compounds to target compounds. Instead, both algorithms continue to
generate reactions iteratively until they terminate according to specific cut-off parameters
such as pathway length and bond transformation diameter. Hence, a substantial amount of
downstream pathway curation work is required to find meaningful and novel results.
424
Conclusions
Cheminformatics tools, ReactPRED and RetroPath2.0, were utilised to design novel
biochemical pathways to produce three industrially important commodity chemicals with
limited biochemical knowledge: benzene, phenol, and 1,2-propanediol. All 49 designed
pathways from glucose, acetate, and pyruvate contained at least one novel step, i.e., a
biologically unknown reaction, and all were found to be thermodynamically feasible. A novel
methodology for curating the thousands to millions of pathways generated by both software
tools was developed, and this method can be used as a guide for designing biochemical
pathways to produce not only commodity chemicals but also nutraceuticals and
pharmaceuticals. RetroPath2.0 and ReactPRED were also comparatively assessed to provide
further insight into their effectiveness as biochemical pathway design tools, as well as their
advantages and limitations in the context of a specific design task. Although both software
tools are user-friendly and help design novel pathways, they also produce thousands of
pathways containing compounds that do not exist in nature. Hence, this study can be used to
develop practical pathway curation strategies when using similar cheminformatics tools to design
biochemical pathways. Moreover, the designed pathways can serve as valuable hypotheses
for experimental implementation in suitable chassis organisms for sustainable
production of bio-based commodity chemicals.
443
References
Bengelsdorf, F.R., Dürre, P., 2017. Gas fermentation for commodity chemicals and fuels. Microb. Biotechnol. 10, 1167–1170. https://doi.org/10.1111/1751-7915.12763
Bengelsdorf, F.R., Straub, M., Dürre, P., 2013. Bacterial synthesis gas (syngas) fermentation. Environ. Technol. (United Kingdom). https://doi.org/10.1080/09593330.2013.827747
Brunk, E., Neri, M., Tavernelli, I., Hatzimanikatis, V., Rothlisberger, U., 2012. Integrating computational methods to retrofit enzymes to synthetic pathways. Biotechnol. Bioeng. https://doi.org/10.1002/bit.23334
Caspi, R., Billington, R., Keseler, I.M., Kothari, A., Krummenacker, M., Midford, P.E., Ong, W.K., Paley, S., Subhraveti, P., Karp, P.D., 2020. The MetaCyc database of metabolic pathways and enzymes-a 2019 update. Nucleic Acids Res. https://doi.org/10.1093/nar/gkz862
Cefic, 2018. Facts & Figures of the European chemical industry.
Delépine, B., Duigou, T., Carbonell, P., Faulon, J.L., 2018. RetroPath2.0: A retrosynthesis workflow for metabolic engineers. Metab. Eng. 45, 158–170. https://doi.org/10.1016/j.ymben.2017.12.002
Egelhofer, V., Schomburg, I., Schomburg, D., 2010. Automatic assignment of EC numbers. PLoS Comput. Biol. 6. https://doi.org/10.1371/journal.pcbi.1000661
Finley, S.D., Broadbelt, L.J., Hatzimanikatis, V., 2009. Computational framework for predictive biodegradation. Biotechnol. Bioeng. https://doi.org/10.1002/bit.22489
Hadadi, N., Hafner, J., Shajkofci, A., Zisaki, A., Hatzimanikatis, V., 2016. ATLAS of Biochemistry: A Repository of All Possible Biochemical Reactions for Synthetic Biology and Metabolic Engineering Studies. ACS Synth. Biol. https://doi.org/10.1021/acssynbio.6b00054
Hadadi, N., MohammadiPeyhani, H., Miskovic, L., Seijo, M., Hatzimanikatis, V., 2019. Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites. Proc. Natl. Acad. Sci. 116, 7298–7307. https://doi.org/10.1073/pnas.1818877116
Hatzimanikatis, V., Li, C., Ionita, J.A., Henry, C.S., Jankowski, M.D., Broadbelt, L.J., 2005. Exploring the diversity of complex metabolic networks. Bioinformatics. https://doi.org/10.1093/bioinformatics/bti213
Henry, C.S., Dejongh, M., Best, A.A., Frybarger, P.M., Linsay, B., Stevens, R.L., 2010. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotechnol. 28, 977–982. https://doi.org/10.1038/nbt.1672
Islam, M.A., Hadadi, N., Ataman, M., Hatzimanikatis, V., Stephanopoulos, G., 2017. Exploring biochemical pathways for mono-ethylene glycol (MEG) synthesis from synthesis gas. Metab. Eng. 41, 173–181. https://doi.org/10.1016/j.ymben.2017.04.005
Jankowski, M.D., Henry, C.S., Broadbelt, L.J., Hatzimanikatis, V., 2008. Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys. J. 95, 1487–1499. https://doi.org/10.1529/biophysj.107.124784
Jassal, B., Matthews, L., Viteri, G., Gong, C., Lorente, P., Fabregat, A., Sidiropoulos, K., Cook, J., Gillespie, M., Haw, R., Loney, F., May, B., Milacic, M., Rothfels, K., Sevilla, C., Shamovsky, V., Shorser, S., Varusai, T., Weiser, J., Wu, G., Stein, L., Hermjakob, H., D’Eustachio, P., 2020. The reactome pathway knowledgebase. Nucleic Acids Res. https://doi.org/10.1093/nar/gkz1031
Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M., Tanabe, M., 2020. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. https://doi.org/10.1093/nar/gkaa970
Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B.A., Thiessen, P.A., Yu, B., Zaslavsky, L., Zhang, J., Bolton, E.E., 2019. PubChem 2019 update: Improved access to chemical data. Nucleic Acids Res. 47, D1102–D1109. https://doi.org/10.1093/nar/gky1033
Kumar, M.B., Gao, Y., Shen, W., He, L., 2015. Valorisation of protein waste: An enzymatic approach to make commodity chemicals. Front. Chem. Sci. Eng. 307.
Kuwahara, H., Alazmi, M., Cui, X., Gao, X., 2016. MRE: a web tool to suggest foreign enzymes for the biosynthesis pathway design with competing endogenous reactions in mind. Nucleic Acids Res. https://doi.org/10.1093/nar/gkw342
Lee, J.W., Na, D., Park, J.M., Lee, J., Choi, S., Lee, S.Y., 2012. Systems metabolic engineering of microorganisms for natural and non-natural chemicals. Nat. Chem. Biol. 8, 536–546. https://doi.org/10.1038/nchembio.970
Lombardot, T., Morgat, A., Axelsen, K.B., Aimo, L., Hyka-Nouspikel, N., Niknejad, A., Ignatchenko, A., Xenarios, I., Coudert, E., Redaschi, N., Bridge, A., 2019. Updates in Rhea: SPARQLing biochemical reaction data. Nucleic Acids Res. https://doi.org/10.1093/nar/gky876
Medema, M.H., Van Raaphorst, R., Takano, E., Breitling, R., 2012. Computational tools for the synthetic design of biochemical pathways. Nat. Rev. Microbiol. https://doi.org/10.1038/nrmicro2717
Moretti, S., Martin, O., Van Du Tran, T., Bridge, A., Morgat, A., Pagni, M., 2016. MetaNetX/MNXref - Reconciliation of metabolites and biochemical reactions to bring together genome-scale metabolic networks. Nucleic Acids Res. 44, D523–D526. https://doi.org/10.1093/nar/gkv1117
Moriya, Y., Shigemizu, D., Hattori, M., Tokimatsu, T., Kotera, M., Goto, S., Kanehisa, M., 2010. PathPred: An enzyme-catalyzed metabolic pathway prediction server. Nucleic Acids Res. https://doi.org/10.1093/nar/gkq318
Moura, M., Finkle, J., Stainbrook, S., Greene, J., Broadbelt, L.J., Tyo, K.E.J., 2016. Evaluating enzymatic synthesis of small molecule drugs. Metab. Eng. https://doi.org/10.1016/j.ymben.2015.11.006
NC-IUBMB, 1992. Nomenclature committee of the international union of biochemistry and molecular biology. [WWW Document].
Noor, E., Bar-Even, A., Flamholz, A., Lubling, Y., Davidi, D., Milo, R., 2012. An integrated open framework for thermodynamics of reactions that combines accuracy and coverage. Bioinformatics. https://doi.org/10.1093/bioinformatics/bts317
Orth, J.D., Conrad, T.M., Na, J., Lerman, J.A., Nam, H., Feist, A.M., Palsson, B., 2011. A comprehensive genome-scale reconstruction of Escherichia coli metabolism-2011. Mol. Syst. Biol. https://doi.org/10.1038/msb.2011.65
Pérez-Uresti, S., Adrián-Mendiola, J., El-Halwagi, M., Jiménez-Gutiérrez, A., 2017. Techno-Economic Assessment of Benzene Production from Shale Gas. Processes 5, 33. https://doi.org/10.3390/pr5030033
Renge, V.C., Khedkar, S. V, Nandurkar, N.R., 2012. Enzyme Synthesis By Fermentation Method: a Review 2, 585–590.
Rodrigo, G., Carrera, J., Prather, K.J., Jaramillo, A., 2008. DESHARKY: Automatic design of metabolic pathways for optimal cell growth. Bioinformatics 24, 2554–2556. https://doi.org/10.1093/bioinformatics/btn471
Saha, B.C., 2003. Commodity chemicals production by fermentation: An overview. Ferment. Biotechnol. 862, 3–17. https://doi.org/10.1021/bk-2003-0862.ch001
Shin, J.H., Kim, H.U., Kim, D.I., Lee, S.Y., 2013. Production of bulk chemicals via novel metabolic pathways in microorganisms. Biotechnol. Adv. 31, 925–935. https://doi.org/10.1016/j.biotechadv.2012.12.008
Siebert, D., Wendisch, V.F., 2015. Metabolic pathway engineering for production of 1,2-propanediol and 1-propanol by Corynebacterium glutamicum. Biotechnol. Biofuels 8, 1–13. https://doi.org/10.1186/s13068-015-0269-0
Sivakumar, T.V., Giri, V., Park, J.H., Kim, T.Y., Bhaduri, A., 2016. ReactPRED: A tool to predict and analyze biochemical reactions. Bioinformatics. https://doi.org/10.1093/bioinformatics/btw491
Straathof, A.J.J., 2014. Transformation of biomass into commodity chemicals using enzymes or cells. Chem. Rev. 114, 1871–1908. https://doi.org/10.1021/cr400309c
Tawfik, O.K. and D.S., 2010. Enzyme Promiscuity: A Mechanistic and Evolutionary Perspective. Annu. Rev. Biochem. 79, 471–505. https://doi.org/10.1146/annurev-biochem-030409-143718
UNEP, 2017. The Emissions Gap Report 2017.
Wang, L., Dash, S., Ng, C.Y., Maranas, C.D., 2017. A review of computational tools for design and reconstruction of metabolic pathways. Synth. Syst. Biotechnol. https://doi.org/10.1016/j.synbio.2017.11.002
Weininger, D., 1988. SMILES, a Chemical Language and Information System: 1: Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 28, 31–36. https://doi.org/10.1021/ci00057a005
Willighagen, E.L., Mayfield, J.W., Alvarsson, J., Berg, A., Carlsson, L., Jeliazkova, N., Kuhn, S., Pluskal, T., Rojas-Chertó, M., Spjuth, O., Torrance, G., Evelo, C.T., Guha, R., Steinbeck, C., 2017. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J. Cheminform. https://doi.org/10.1186/s13321-017-0220-4
Xiao, W., Wang, R.-S., Handy, D.E., Loscalzo, J., 2018. NAD(H) and NADP(H) Redox Couples and Cellular Energy Metabolism. Antioxid. Redox Signal. 28, 251–272. https://doi.org/10.1089/ars.2017.7216
Yim, H., Haselbeck, R., Niu, W., Pujol-Baxley, C., Burgard, A., Boldt, J., Khandurina, J., Trawick, J.D., Osterhout, R.E., Stephen, R., Estadilla, J., Teisan, S., Schreyer, H.B., Andrae, S., Yang, T.H., Lee, S.Y., Burk, M.J., Van Dien, S., 2011. Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat. Chem. Biol. https://doi.org/10.1038/nchembio.580
581
Figure legends
Figure 1: Schematic of the overall workflow followed to design biochemical pathways using RetroPath2.0 and ReactPRED. The input to both software tools included source or target compounds, sink compounds, and reaction rules developed from biochemical databases and the genome-scale E. coli metabolic model, iJO1366. The generated outputs in Excel (RetroPath2.0) and SMILES (ReactPRED) formats were manually curated and extensively analysed with the MetaNetX, PubChem, MetaCyc, and KEGG databases, as well as with the online CDK depicter tool, to remove non-natural compounds and assign EC numbers to accepted reactions (see text for details).
Figure 2: Total number of pathways generated by the ReactPRED software. (A) shows the number of pathways generated synthetically using acetate, glucose, and pyruvate as starting compounds, and (B) shows the number of retrosynthetic pathways generated using 1,2-propanediol, phenol, and benzene as starting compounds. The total number of compounds generated at each pathway length is also illustrated.
Figure 3: The numbers of accepted and discarded pathways to the target chemicals obtained from synthetic and retrosynthetic runs of the ReactPRED software are shown in (A) and (B), respectively. The thermodynamically feasible pathways were subjected to further pathway pruning (see materials and methods). The accepted pathways are the ones that passed the curation criteria, while the discarded pathways failed them (see materials and methods).
Figure 4: Illustration of the results generated by the RetroPath2.0 software. (A) shows the number of reactions and compounds generated, as well as the EC numbers assigned to generated reactions using this software. Notably, 76% of the total reactions were assigned an EC number at iteration 3 (represented by the green bar). (B) shows the number of accepted and discarded reactions generated at each iteration of RetroPath2.0. The predicted reactions were subjected to further pruning (see materials and methods). The accepted pathways are the ones that passed the curation criteria, while the discarded pathways failed them (see materials and methods).
Figure 5: Number of accepted pathways to target commodity chemicals from glucose, pyruvate, and acetate. Different categories of accepted pathways to phenol, benzene, and 1,2-propanediol are shown from the starting compounds: glucose, pyruvate, and acetate. Each of these pathways is thermodynamically feasible and passed the pathway curation criteria.
Figure 6: Examples of a few predicted pathways with assigned EC numbers. Each pathway contains at least one novel step (blue), while pathways 1, 3, 6, and 18 contain biologically known steps (green) as well. Pathways 22, 26, 45, 46, and 49 include only novel steps.
Figure 7: Analysis of the thermodynamic feasibility of the pathways shown in Figure 6. Pathways 1-3, 6, and 18 are shown in (A), and pathways 22, 26, 45-46, and 49 are shown in (B). The overall thermodynamic feasibility of each pathway was evaluated by estimating the standard Gibbs free energy of reaction (∆ᵣG⁰) for each step, followed by combining the ∆ᵣG⁰ values for all relevant reactions in a pathway (see text for details).
Tables
Table 1: Number of thermodynamically feasible and synthetically generated pathways to target chemicals (1,2-propanediol, benzene, phenol) from synthetic starting compounds (acetate, pyruvate, glucose)
Pathway length   Thermodynamically feasible pathways to:
                 1,2-Propanediol   Benzene   Phenol
1                0                 0         1
2                1                 0         21
3                2                 0         164

Table 2: Number of thermodynamically feasible and retrosynthetically generated pathways to target compounds (acetate, glucose, pyruvate) from retrosynthetic inputs (1,2-propanediol, benzene, phenol)
Pathway length   Thermodynamically feasible pathways to:
                 Acetate   Glucose   Pyruvate
1                0         1         0
2                2220      3580      57
[Figure 1: workflow schematic. Recoverable labels: source compounds (example SMILES O=C(O)C) and sink compounds as inputs; RetroPath2.0 output in Excel format and ReactPRED output in plain-text SMILES format; automated compound finder with checks against the MetaNetX, PubChem, MetaCyc, and KEGG databases; manual curation with the online CDK depicter tool; EC number assignment to reactions.]
[Figure 2: bar charts of the number of pathways (y-axis) against pathway length (x-axis). (A) Acetate, glucose, and pyruvate outputs at pathway lengths 1-3. (B) 1,2-Propanediol, phenol, and benzene outputs at pathway lengths 1-2.]
[Figure 3: bar charts of the number of accepted and discarded pathways. (A) Pathways to 1,2-propanediol and phenol. (B) Pathways from glucose, acetate, and pyruvate to 1,2-propanediol, benzene, and phenol.]
[Figure 4: bar charts per RetroPath2.0 iteration. (A) Total reactions, assigned reactions, and total compounds. (B) Accepted and discarded reactions.]
[Figure 5: bar chart of the number of accepted pathways in each category: 1-step glucose to phenol; 2-step glucose to phenol; 2-step glucose to benzene; 2-step glucose to 1,2-propanediol; 2-step pyruvate to 1,2-propanediol; 2-step pyruvate to phenol; 2-step acetate to 1,2-propanediol; 3-step acetate to 1,2-propanediol; 2-step acetate to phenol.]
[Figure 6: pathway diagrams distinguishing novel and known reactions for pathways 1, 2, 3, 6, 18, 22, 26, 45, 46, and 49. EC numbers legible in the extracted figure: pathway 3 (2.8.3.19, 1.1.1.283, 1.1.1.6); pathway 6 (2.5.1.49, 3.1.1.2); pathway 18 (2.7.1.40, 2.7.1.53); pathway 22 (1.17.1.3, 3.2.1.31); pathway 26 (2.7.1.142); pathway 45 (2.7.1.142, 1.2.1.-/1.2.1.86); pathway 46 (2.3.1.90, 4.1.1.98); pathway 49 (2.7.1.142, 1.14.13.243).]
[Figure 7: plots of ∆ᵣG⁰ (kcal/mol; y-axis) against pathway length (x-axis). (A) Pathways 1, 2, 3, 6, and 18. (B) Pathways 22, 26, 45, 46, and 49.]