
Full Papers Chemistry—Methods doi.org/10.1002/cmtd.202000044

A Machine Learning-Enabled Autonomous Flow Chemistry Platform for Process Optimization of Multiple Reaction Metrics

Mohammed I. Jeraal,[b] Simon Sung,[b] and Alexei A. Lapkin*[a, b]

[a] Prof. A. A. Lapkin, Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS (UK). E-mail: [email protected]
[b] Dr. M. I. Jeraal, Dr. S. Sung, Prof. A. A. Lapkin, Cambridge Centre for Advanced Research and Education in Singapore Ltd., 1 Create Way, CREATE Tower #05-05, 138602, Singapore

Supporting information for this article is available on the WWW under https://doi.org/10.1002/cmtd.202000044

Chemistry—Methods 2021, 1, 71–77. © 2020 The Authors. Published by Wiley-VCH GmbH. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Self-optimization of chemical reactions using machine learning multi-objective algorithms has the potential to significantly shorten overall process development time. It also provides users with valuable information about economic and environmental factors. The self-optimization flow chemistry system reported here uses the Thompson Sampling Efficient Multi-Objective (TS-EMO) algorithm. It identifies optimum reaction conditions and trade-offs (Pareto fronts) between conflicting optimization objectives, such as yield, cost, space-time yield, and E-factor, in a data-efficient manner. Advantageously, the robust system consists exclusively of commercially available equipment, together with a user-friendly MATLAB graphical user interface. It was shown to run 131 experiments autonomously over 69 hours without interruption.

1. Introduction

Despite the prevalence of established techniques such as Design of Experiments (DoE), reaction optimization is still often a difficult and time-consuming task for chemists.[1] Identifying where improvements can be made is challenging because of the large number of process variables, with many possible combinations that should be tested. This issue can be alleviated using self-optimizing systems. These combine programmable chemical handlers, a machine-learning reaction optimization algorithm, and online analytical techniques in a real-time adaptive feedback optimization loop (Figure 1). Examples of analytical methods suitable for self-optimizing experimental systems include:

- gas chromatography (GC);
- high-performance liquid chromatography (HPLC);
- mass spectrometry (MS);
- in situ infrared (IR) spectroscopy;
- nuclear magnetic resonance (NMR) spectroscopy.[2]

A significant advantage of these systems is that the optimization procedure can be entirely automated, with no user intervention required.

Figure 1. General flow chart of a reaction self-optimization system.

Reaction optimization conducted by chemists is typically measured against multiple performance criteria, such as yield, cost, impurity profile, and environmental impact. The ability of an automated process to self-optimize for multiple objectives is therefore highly desirable. The majority of existing self-optimizing systems use single-objective optimization algorithms, such as the Nelder-Mead simplex (NMSIM) and Stable Noisy Optimization by Branch and FIT (SNOBFIT).[3–5] Multi-objective optimization is significantly more complex, and few algorithms have been demonstrated to perform it efficiently. Multiple objectives can be scalarized into a single function. However, the weighting assigned to each objective is then subjective, a choice that true multi-objective optimization avoids.

Another key point for multi-objective algorithms is that objectives sometimes compete with one another (e.g. yield vs. cost). This makes it impossible to find a single set of "utopian" conditions that give optimal values for both objectives. One representation of competing objectives is a Pareto front (Figure 2).[6] This is a set of non-dominated data points in which neither objective can be improved without a detrimental effect on the other, i.e. it shows the trade-off between objectives. An example of an algorithm for efficient multi-objective reaction optimization is the open-source Thompson Sampling Efficient Multi-Objective (TS-EMO) algorithm.[7] Lapkin and co-workers[6,8–10] have demonstrated the quality of the Pareto fronts it generates, and its efficiency in identifying them, compared with alternative algorithms such as ParEGO.[11] Other examples of multi-objective algorithms[12] developed for chemical processes include Phoenics[13] and Chimera.[14]

Figure 2. An illustration of a Pareto front (made up of non-dominated solutions) in a system with two competing optimization objectives. Values in the infeasible region under the Pareto front are inaccessible to the optimization process.

Flow chemistry has significant advantages over batch methods for self-optimizing systems. First, it is inherently safer under high-temperature and high-pressure conditions (process intensification conditions). Second, in situ analysis and closed-loop optimization are easier to implement in flow. The reaction solution can be sampled directly and automatically using in-line small-volume injectors or non-invasive spectroscopic methods. Furthermore, successive flow chemistry reactions can be conveniently started with different continuous reaction variables by changing reactor temperatures and flow rates. Conversely, screening continuous variables in batch reactions is inefficient and typically requires expensive robotic equipment.[15]

Self-optimization flow systems reported in the literature typically use custom-designed setups (consisting of pumps, reactors, samplers, and analytical equipment) interfaced with in-house software. This could hinder the widespread adoption and rapid development of these tools. Furthermore, systems are sometimes developed for specific reactions, and modifying a system for a different reaction often requires considerable effort and time, even by experts.[16] In contrast, commercially available modular flow chemistry systems, for example those by Vapourtec, have been shown to be effective for many different reactions.[17–21] For more complex and scripted applications such as self-optimization, some systems can also be controlled remotely through their standard software packages. This uses application programming interfaces (APIs) provided by the manufacturers for popular programming environments such as MATLAB or Python.

In this study we aim to further develop autonomous self-optimization flow chemistry systems. To this end, we built a robust implementation, based on commercially available equipment and a proven ML algorithm, that is suitable for various single-step reaction optimization studies. The system was demonstrated on a sample reaction with competing reaction pathways, where optimization of process parameters is known to lead to multiple possible "optimal" sets of reaction conditions. We also aim to further investigate how the TS-EMO Bayesian optimizer balances exploitation and exploration of the experimental parameter space.

2. Results and Discussion

The case study reaction was the aldol condensation between benzaldehyde (1) and acetone (2), catalyzed by sodium hydroxide (3) base, to give the desired benzylideneacetone (4) product (Scheme 1). Possible side-reactions form dibenzylideneacetone (5) or acetone polymerization side-products. This makes the reaction an ideal test of how carefully the algorithm controls the reaction conditions it chooses.

Scheme 1. Reaction scheme for the sodium hydroxide (3)-catalyzed aldol condensation case study between benzaldehyde (1) and acetone (2) to produce benzylideneacetone (4) at reactor temperature, T, with residence time, t_res.

The self-optimization system used in this work features exclusively commercially available equipment and the TS-EMO multi-objective optimization algorithm (Figure 3). The flow chemistry equipment consists of two Vapourtec R2 modules, which control solution flows, and an R4 reactor module, which controls reactor temperatures. These parameters are controlled from within the software provided by the manufacturer. The system was designed for mesoscale flow chemistry[22] and uses plug-flow modelling. It calculates the flow rates and pump timings for the desired reaction-zone plug sizes and determines the solution compositions within a plug. It also automatically signals the reaction samplers and analytical equipment when the system is deemed to have reached steady state. These features make it easy to sample the reaction mixture directly at steady state, using a microliter injector, into an online HPLC-UV instrument.

A bespoke MATLAB user interface was developed to control all aspects of the self-optimization process.[23] Its functions include:

- control of physical equipment through an interface with commercial software;
- creation of training data sets;
- reading HPLC data and calculating the optimization objectives;
- the complete, autonomous execution of flow chemistry experiments.

This process was repeated iteratively until the user terminated the MATLAB environment. It should be noted that downstream processes, such as purification steps, were not taken into consideration in this work. Therefore,
the cost and chemical use in these subsequent processes were not accounted for in the objective calculations.

Figure 3. Schematic of the self-optimization system. It contains Vapourtec flow chemistry pumps and reactor, a 4-way sample injector, HPLC-UV analysis, and algorithmic reaction optimization, all controlled from a MATLAB-based environment. BPR: back pressure regulator.

Four continuous variables were optimized in all cases in this study:

(i) the molar equivalents of acetone (relative to benzaldehyde);
(ii) the molar equivalents of sodium hydroxide (relative to benzaldehyde);
(iii) reactor temperature (T);
(iv) residence time (t_res).

See Table 1 for the user-defined lower and upper limits. The volume of benzaldehyde solution was fixed for each reaction. The upper limit for T was set at 70 °C to help avoid acetone polymerization, which produces poorly soluble products that clog the flow path and tubular reactor.[24] The residence time limits were chosen with two aims. The lower limit ensured that reactor pressure was not excessive in shorter experiments. The upper limit kept the longest experiments to within 45 min in total.

Table 1. Continuous variable limits for the self-optimization of the aldol condensation reaction shown in Scheme 1. Molar equivalents are relative to the number of moles of 1. Solution concentrations: [1] = 0.5 M in MeCN; [2] = 6.73 M in MeCN; [3] = 0.1 M in EtOH.

| Self-optimization (objectives; training set) | Limit | Molar equivalents of 2 | Molar equivalents of 3 | T/°C | t_res/min |
|---|---|---|---|---|---|
| One (yield and cost; 20 experiments) | Lower | 1 | 0.02 | 30 | 5 |
| | Upper | 10 | 0.2 | 70 | 15 |
| Two (yield and cost; 2 experiments) | Lower | 1 | 0.02 | 30 | 5 |
| | Upper | 50 | 0.2 | 70 | 15 |
| Three (STY and E-factor; 20 experiments) | Lower | 1 | 0.02 | 30 | 5 |
| | Upper | 10 | 0.2 | 70 | 15 |

The first self-optimization performed in this study targeted reaction conditions that would maximize yield (Eq. 1) and minimize cost (Eq. 2). In Eq. 2, material costs were based on the prices at which the materials were purchased at the kg scale from a commercial supplier.

$$\text{Yield}/\% = \frac{\text{actual } n_{\text{product}}}{\text{theoretical } n_{\text{product}}} \times 100 \tag{1}$$

$$\text{Cost}/\pounds\,\text{L}^{-1} = \frac{\text{total cost of all materials}}{V_{\text{total}}} \tag{2}$$

In Eq. 1, n_product is the number of moles of product 4. In Eq. 2, the total cost of all materials covers reaction solvents, reactants, and reagents, and V_total is the total volume of the reaction mixture. To initialize the TS-EMO algorithm, a training dataset of 20 experiments was generated using Latin hypercube sampling (LHS) and performed autonomously (Figure 4).[25] Previous studies have recommended 10 training experiments per continuous variable.[26] Here, however, five experiments per variable (20 in total) were selected, based on the efficiency of TS-EMO observed in a previous experimental application.[6]

A further 47 iterations were designed by the ML algorithm and rapidly converged to form a clear Pareto front. The highest yield on this front was 56.1% at a cost of 7.44 £ L⁻¹. The lowest-cost reaction was at 6.51 £ L⁻¹ but had a much poorer yield of 10.1%, illustrating the trade-off between these two objectives (Figure 4). The Pareto front shows that the yield can be increased significantly, from 10.1 to 53.4%, for a relatively minor increase in cost, from 6.51 to 6.70 £ L⁻¹. The variable that contributed most to this increase in yield was the equivalents of relatively inexpensive acetone (Figure 5). This is likely because higher acetone equivalents suppress the formation of side-product 5. The increased concentration of 2 also speeds up formation of the desired product 4. However, it is also clear that increasing the acetone equivalents has a small detrimental effect on the cost (Figure 5). A minor increase in yield from 53.4 to 56.1% on the Pareto front corresponds to a large increase in cost, from 6.70 to 7.44 £ L⁻¹.

Figure 4. A plot of cost vs. yield for experiments in the self-optimization of the aldol condensation reaction in Scheme 1, with limits from Table 1 (Self-Optimization One). The initial training set experiments and self-optimization experiments together form a Pareto front, showing the trade-off between the two optimization targets.

Consider the costliest data point on the Pareto front (Table 2, Entry 1). Compare it with a representative data point in the cluster just before the sharp increase in cost (Table 2, Entry 2). The main cause of the large cost difference is the molar equivalents of sodium hydroxide used. The costly experiment used fewer equivalents (0.10 equiv.) than the other data point (0.19 equiv.). As a result, only about half the volume of sodium hydroxide solution was pumped into the reaction. The benzaldehyde and acetone solutions therefore made up a greater proportion of the total reaction volume. The high calculated cost can be explained by the relatively high prices of benzaldehyde and acetone (see ESI for material prices).

The reactant and reagent molar equivalents clearly affected the target objectives. In contrast, Figure 5 shows poor correlation between t_res and either the yield or the cost objective. Reaction temperature, T, showed no correlation with cost, but it did show a positive correlation with yield. As shown in Figure 5, yield improves when the temperature is increased from 30 °C to above 50 °C.

Figure 5. 3D (upper) and 2D (lower) plots of experiments run in Self-Optimization One of the aldol condensation reaction depicted in Scheme 1. Each point represents a single experiment executed during the optimization. The graph displays the four variables and two objectives as follows: (x) molar equivalents of 2, (y) residence time in the heated reactor, and (z) reactor temperature. The point size denotes the molar equivalents of 3 in each run. The core color of each point represents the yield (%), and the shell color represents the cost of each experiment (£ L⁻¹), as shown in the legend. The lower figure is identical to the upper one but rotated to show the data as viewed along the y-axis.

Table 2. Reaction variables and conditions, objective values, and volumes of each compound solution added, for two representative data points from the self-optimization of the aldol condensation reaction shown in Scheme 1, with limits from Table 1 (Self-Optimization One). V_n denotes the volume of solution n added, where n = 1, 2 or 3 corresponds to benzaldehyde, acetone and sodium hydroxide, respectively. The table shows how the solution volumes added change the cost value.

| Entry | 1/equiv. | 2/equiv. | 3/equiv. | T/°C | t_res/min | Yield/% | Cost/£ L⁻¹ | V1/mL | V2/mL | V3/mL |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 9.63 | 0.10 | 49 | 7.36 | 56.1 | 7.44 | 5.0 | 3.6 | 2.5 |
| 2 | 1.00 | 9.24 | 0.19 | 50 | 7.01 | 52.4 | 6.80 | 5.0 | 3.4 | 4.7 |

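A Pareto front like the one in Figure 4 is simply the non-dominated subset of the experiments. The Python below is an illustrative sketch, not code from the MATLAB interface or from TS-EMO. It filters (yield, cost) pairs, treating yield as maximized and cost as minimized, and applies the filter to the four pairs quoted in this section. Applied to the full data set in the ESI, the same filter would return the complete set of non-dominated experiments.

```python
from typing import List, Tuple

# (yield / %, cost / GBP per L): maximize the first, minimize the second.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Non-dominated subset of points, sorted by increasing cost.

    The pairwise O(n^2) check is adequate for the ~100 experiments of one run.
    """
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p[1])


# Only the four (yield, cost) pairs quoted in the text and Table 2;
# the full data set is in the ESI.
quoted = [(56.1, 7.44), (10.1, 6.51), (53.4, 6.70), (52.4, 6.80)]
print(pareto_front(quoted))  # → [(10.1, 6.51), (53.4, 6.7), (56.1, 7.44)]
```

Table 2, Entry 2 (52.4%, 6.80 £ L⁻¹) drops out of this reduced set. The 53.4%, 6.70 £ L⁻¹ point mentioned in the text has both a higher yield and a lower cost.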

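The 20-experiment training set of Self-Optimization One was generated by Latin hypercube sampling (LHS). The paper does not give its sampling code, and ref. [25] concerns orthogonal-maximin designs. The Python below is therefore only a sketch of basic random LHS over the Self-Optimization One limits in Table 1 (five runs per variable, 20 in total). It splits each variable's range into 20 equal strata and guarantees exactly one run in each. It does not apply any maximin or orthogonality criterion. The variable names are illustrative.

```python
import random

# Self-Optimization One limits from Table 1, as (lower, upper).
LIMITS = {
    "equiv_acetone": (1.0, 10.0),
    "equiv_naoh": (0.02, 0.2),
    "temperature_C": (30.0, 70.0),
    "t_res_min": (5.0, 15.0),
}


def latin_hypercube(n, limits, seed=None):
    """Basic random Latin hypercube design with n runs.

    Each variable's range is split into n equal strata. Every stratum is
    used exactly once, and a uniform random point is drawn inside it.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in limits.items():
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across variables
        columns[name] = [lo + (hi - lo) * (s + rng.random()) / n for s in strata]
    return [{name: columns[name][i] for name in limits} for i in range(n)]


design = latin_hypercube(20, LIMITS, seed=1)
for run in design[:3]:
    print({name: round(value, 3) for name, value in run.items()})
```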
The reaction conditions used in the 67 experiments showed that the optimization algorithm often selected molar equivalents of acetone and sodium hydroxide close to their upper limits. A complete list of reaction conditions for all experiments is available in the ESI. As mentioned previously, the algorithm converged to form the Pareto front immediately after the 20 training experiments. Bayesian optimization processes typically trade off exploration of the optimization space against its exploitation.[27,28] In this respect, the system described in this work showed a greater tendency towards exploitation.

This behavior is analogous to earlier chemical reaction optimizations using the TS-EMO algorithm reported by Bourne, Lapkin and co-workers.[6] In that work, the efficient optimization with relatively few training experiments was attributed to the wide range of experimental conditions, and of yield and cost values, in the initial training set.[28]

To further investigate the exploration and exploitation characteristics of TS-EMO, and its effectiveness in locating the Pareto front, the optimization was repeated with only two poorly yielding experiments in the initial training set. In addition, the upper limit for the molar equivalents of acetone was increased to 50 equiv. (Table 1). The aim was to allow greater variation in the yield and cost objectives, since the previous optimization experiments were often conducted near the 10 equiv. upper limit.

Figure 6. A plot of cost vs. yield for experiments in the self-optimization of the aldol condensation reaction in Scheme 1, with limits from Table 1 (Self-Optimization Two). The self-optimization experiments together form a Pareto front, showing the trade-off between the two optimization targets.
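The exploration/exploitation behavior discussed here comes from Thompson sampling. At each iteration, a function is drawn at random from the Gaussian-process (GP) posterior, and the next experiment is placed where that sample is best. The NumPy sketch below shows a single-objective, one-dimensional version of this step on hypothetical data.

It is not TS-EMO. Per ref. [7], TS-EMO draws samples for every objective and searches them with a genetic algorithm (NSGA-II). The kernel, length scale, noise level and grid below are arbitrary illustrative choices.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_grid, noise=1e-4):
    """Zero-mean GP posterior mean and covariance evaluated on x_grid."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    L = np.linalg.cholesky(K)
    K_s = rbf_kernel(x_grid, x_obs)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = rbf_kernel(x_grid, x_grid) - v.T @ v
    return mean, cov


def thompson_next(x_obs, y_obs, x_grid, rng):
    """One Thompson-sampling step: draw a posterior sample, return its argmax."""
    mean, cov = gp_posterior(x_obs, y_obs, x_grid)
    jitter = 1e-6 * np.eye(len(x_grid))  # keeps the Cholesky factorization stable
    sample = mean + np.linalg.cholesky(cov + jitter) @ rng.standard_normal(len(x_grid))
    return float(x_grid[np.argmax(sample)])


# HYPOTHETICAL 1-D toy problem: x is a scaled variable in [0, 1],
# y a scaled objective to maximize. These are NOT data from the paper.
rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 1.0, 101)
x_obs = np.array([0.1, 0.4, 0.9])
y_obs = np.array([0.2, 0.5, 0.3])
picks = [thompson_next(x_obs, y_obs, x_grid, rng) for _ in range(5)]
print(picks)
```

Near the observations the posterior variance is small, so samples stay close to the mean and picks cluster around the best observed point (exploitation). In unsampled gaps the variance is large, so some draws peak there (exploration). The balance between the two therefore depends on how the training data cover the space.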
The two training experiments gave yields of 3% and 5%. Starting from these, the system performed an additional 129 TS-EMO-designed experiments autonomously, without interruption or error, over 69 hours. A full list of reaction conditions for all experiments is available in the ESI. After the first six optimization iterations, the conditions selected by the algorithm already gave yield and cost values close to the Pareto front (Figure 6). This further demonstrates the algorithm's tendency to exploit, rather than explore, the optimization space when locating the Pareto front.

The optimization experiments are, however, more scattered than in the initial optimization of 47 experiments. This indicates that the algorithm retains a tendency to explore when the uncertainty about the Pareto front is low. The spread of reactions in the Pareto region is far greater in this extended optimization than in the original run. It could therefore be argued that an optimal number of training experiments helps to improve the efficiency of the optimization process.

Figure 7. Plots of cost vs. yield for the experiments on the Pareto front in the self-optimizations of the aldol condensation reaction in Scheme 1, with limits from Table 1.

The cost vs. yield Pareto fronts produced by the two optimizations with different initial training sets are comparable (Figure 7). This suggests that acetone equivalents above 10 had a negligible effect on these objectives.

The final self-optimization performed in this study targeted reaction conditions that would maximize space-time yield (STY) and minimize the environmental impact, measured by the E-factor. Space-time yield is a measure of reactor productivity. It relates the mass of product 4 formed (m_product) to the reactor volume (V_reactor) and t_res (Eq. 3). The E-factor[29] is defined as the ratio of the mass of waste (m_waste) to m_product (Eq. 4).

$$\text{STY}/\text{g}\,\text{L}^{-1}\,\text{h}^{-1} = \frac{m_{\text{product}}}{V_{\text{reactor}} \times t_{\text{res}}} \tag{3}$$
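Eq. 3 mixes laboratory units (mL, min) with reporting units (L, h), and m_product follows from the yield of Eq. 1. The Python sketch below makes both conversions explicit for the conditions of Table 2, Entry 1. It makes two assumptions:

- benzaldehyde (1) is the limiting reagent, since acetone is always used in excess (Table 1);
- the reactor volume is hypothetical, because V_reactor is not stated in this section.

The printed STY is therefore illustrative only and is not a value from the paper.

```python
# Molar mass of benzylideneacetone (4), C10H10O, in g/mol.
M_PRODUCT = 146.19


def product_mass_g(yield_pct, n_benzaldehyde_mmol):
    """Mass of 4 (g) implied by Eq. 1, ASSUMING 1 is the limiting reagent."""
    return yield_pct / 100.0 * (n_benzaldehyde_mmol / 1000.0) * M_PRODUCT


def space_time_yield(m_product_g, v_reactor_ml, t_res_min):
    """Eq. 3 in g L^-1 h^-1: converts mL to L and min to h."""
    return m_product_g / ((v_reactor_ml / 1000.0) * (t_res_min / 60.0))


# Table 2, Entry 1: 56.1 % yield, 2.5 mmol of 1 (5.0 mL of 0.5 M), t_res = 7.36 min.
# V_REACTOR_ML is a HYPOTHETICAL value; the reactor volume is not given here.
V_REACTOR_ML = 10.0
m = product_mass_g(56.1, 2.5)
sty = space_time_yield(m, V_REACTOR_ML, 7.36)
print(f"m_product = {m:.3f} g, STY = {sty:.1f} g/L/h (illustrative)")
```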


$$\text{E-factor} = \frac{m_{\text{waste}}}{m_{\text{product}}} \tag{4}$$

The process was started with the same 20-experiment training set and the same lower and upper variable limits (Table 1) as Self-Optimization One. Notably, when log₁₀(E-factor) is plotted against STY, the training set is not spread across the optimization space as it had been for the previous optimizations (Figure 8). This suggested that there was no trade-off between the targets, and that a utopian optimum could therefore exist in this instance.

After 55 TS-EMO optimization iterations, the results confirmed that there was no Pareto front for these objectives. Instead, the algorithm identified an optimum with STY = 237.43 g L⁻¹ h⁻¹ and E-factor = 39.7 (Figure 8). As in the previous optimizations for maximum yield, the best conditions for high STY and low E-factor used high acetone equivalents (9.94 equiv.). They also used a short t_res of 5.1 min.

The absence of a Pareto front is due to the similar densities of the benzaldehyde, acetone and sodium hydroxide solutions (0.795, 0.785 and 0.792 g mL⁻¹, respectively; see ESI for derivation). The solvent accounts for most of the waste generated, so the amount of waste is very similar between experiments. Both STY and E-factor therefore depend mostly on the quantity of product, which allowed a single optimum to be identified.

Figure 8. A plot of E-factor against STY for experiments in the self-optimization of the aldol condensation reaction in Scheme 1, with limits from Table 1 (Self-Optimization Three). In this case there is a single optimum solution for the two optimization targets. This indicates that there is no trade-off between them, and therefore no Pareto front.

3. Conclusions

A self-optimization system was built from four parts:

- a bespoke MATLAB user interface;
- a commercially available flow chemistry system;
- sampling and HPLC equipment;
- a self-optimizing algorithm.

The system operated autonomously and without interruption for as many as 131 reactions over 69 hours. The multi-objective optimization algorithm rapidly exploited the optimization space and located optimum reaction conditions. When competing objectives were investigated, it also located the key trade-off zones. In the aldol condensation case study shown in Scheme 1, yield and cost competed with each other: optimizations to maximize yield and minimize cost formed a clear Pareto front. In contrast, optimizations to maximize STY and minimize E-factor converged towards a single set of optimum reaction conditions.

The commercial system employed is modular. The flow chemistry setup can therefore easily be modified with different supported components (such as pumps, tubing, reactors, and purification modules). Additional chemical handlers for reactant loading and product collection can also be added. Discrete variables, such as reagents and solvents, are a further extension. TS-EMO was recently reported to optimize solvent choice successfully for a ruthenium-catalyzed asymmetric hydrogenation reaction.[10] Work on handling discrete variables is currently underway in our laboratory. Its aim is to demonstrate improved capabilities and efficiencies using robotic workflows in process development.

Experimental Section

Benzaldehyde was purified by washing with aqueous 10% Na2CO3 solution, isolated by liquid–liquid separation, distilled under reduced pressure, and then stored under a nitrogen atmosphere. All other chemicals were used as received.

A schematic of the flow chemistry equipment and HPLC analysis components of the self-optimization system is shown in Figure 3.

HPLC analysis was performed using an Agilent 1260 Infinity system equipped with:

- a G1311B quaternary pump;
- an Eclipse XDB-C18 column (Agilent product number: 961967-302);
- a G1314F variable wavelength detector (VWD).

Compounds were separated using the following quaternary pump method:

1. The initial mobile phase was a 5:95 (v/v) binary mixture of acetonitrile and water flowing at 0.2 mL min⁻¹.
2. Immediately after sample injection, the flow rate and the acetonitrile:water ratio were changed steadily to 1 mL min⁻¹ and 95:5 (v/v) over the first 5 min.
3. At 1 mL min⁻¹, the mixture was returned to 5:95 (v/v) acetonitrile:water over 1.5 min in a linear gradient.
4. This ratio was held constant at 1 mL min⁻¹ for the next 1.5 min, after which the analysis was complete (8 min in total).
5. The method then returned to a flow rate of 0.2 mL min⁻¹.

The VWD wavelength was changed over the 8 min analysis as follows:

- 254 nm for the first 4.50 min;
- 333 nm for the next 1 min;
- 225 nm for a further 0.57 min;
- 254 nm for the remainder of the run.
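The HPLC pump and detector programs above are easier to check as timetables. The Python sketch below encodes them as breakpoint lists and returns the flow rate, acetonitrile fraction and VWD wavelength at any time in the 8 min run. Linear interpolation for the "steadily changed" first 5 min is an assumption. The data structure is illustrative and is not the format of the Agilent method file.

```python
from bisect import bisect_right

# Pump program breakpoints: (time / min, flow / mL min^-1, MeCN / % v/v).
# Linear interpolation between breakpoints is ASSUMED for "changed steadily".
PUMP_PROGRAM = [
    (0.0, 0.2, 5.0),   # at injection: 5:95 MeCN:water at 0.2 mL/min
    (5.0, 1.0, 95.0),  # ramp to 95:5 at 1 mL/min over the first 5 min
    (6.5, 1.0, 5.0),   # linear return to 5:95 over 1.5 min
    (8.0, 1.0, 5.0),   # 1.5 min hold; run ends, then flow resets to 0.2 mL/min
]

# Detector program: (start time / min, wavelength / nm), each held until the next.
VWD_PROGRAM = [(0.0, 254), (4.50, 333), (5.50, 225), (6.07, 254)]


def pump_state(t_min):
    """Return (flow rate, % MeCN) at time t_min within the 8 min run."""
    if not PUMP_PROGRAM[0][0] <= t_min <= PUMP_PROGRAM[-1][0]:
        raise ValueError("time outside the 8 min method")
    for (t0, f0, b0), (t1, f1, b1) in zip(PUMP_PROGRAM, PUMP_PROGRAM[1:]):
        if t_min <= t1:
            frac = (t_min - t0) / (t1 - t0)
            return f0 + frac * (f1 - f0), b0 + frac * (b1 - b0)


def vwd_wavelength(t_min):
    """Detector wavelength (nm) in effect at time t_min."""
    starts = [start for start, _ in VWD_PROGRAM]
    return VWD_PROGRAM[bisect_right(starts, t_min) - 1][1]


for t in (0.0, 2.5, 4.75, 5.75, 7.0):
    flow, mecn = pump_state(t)
    print(f"{t:4.2f} min: {flow:.2f} mL/min, {mecn:.0f}% MeCN, {vwd_wavelength(t)} nm")
```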

The Flow Commander software that controls the Vapourtec flow chemistry equipment was operated from a custom MATLAB user interface.[23] In this interface, the user selects the optimization variables and defines:

- their limits;
- the physical properties of the reactants;
- the HPLC parameters;
- the reaction scale;
- the optimization objectives;
- the number of training experiments.

Each optimization cycle (Figure 3) then ran as follows:

1. From the flow rate of each reactant solution, Flow Commander calculated when the reaction mixture reached steady state.
2. At that point it triggered the VICI Valco 4-port, 2-position sample injector, which took a 60 nL sample from the flow path and sent it to the HPLC system for analysis.
3. Once the HPLC analysis was complete, the chromatogram retention times and peak areas were extracted, and yield, cost, STY, and/or E-factor were calculated automatically.
4. The new values, together with all previous values, were passed automatically to the TS-EMO optimization algorithm, which returned the reaction conditions for the next experiment.
5. MATLAB sent these conditions to Flow Commander for autonomous execution of the next reaction.

In all experiments, the volume of benzaldehyde solution (containing naphthalene as an internal standard) was kept constant at a user-specified quantity. Throughout this study, each experiment was executed, analyzed and processed by the ML algorithm before the conditions for the next experiment were generated. Complete tables of the reaction conditions used for all experiments can be found in the ESI.

Acknowledgements

The project is funded by Pharma Innovation Programme Singapore (PIPS).

Conflict of Interest

The authors declare no conflict of interest.

Keywords: aldol reactions · flow chemistry · green chemistry · machine-learning · synthesis optimization

[1] S. A. Weissman, N. G. Anderson, Org. Process Res. Dev. 2015, 19, 1605–1633.
[2] C. Mateos, M. J. Nieves-Remacha, J. A. Rincón, React. Chem. Eng. 2019, 4, 1536–1544.
[3] W. Huyer, A. Neumaier, ACM Trans. Math. Softw. 2008, 35, 1–25.
[4] M. W. Routh, P. A. Swartz, M. B. Denton, Anal. Chem. 1977, 49, 1422–1428.
[5] J. A. Nelder, R. Mead, Comput. J. 1965, 7, 308–313.
[6] A. M. Schweidtmann, A. D. Clayton, N. Holmes, E. Bradford, R. A. Bourne, A. A. Lapkin, Chem. Eng. J. 2018, 352, 277–282.
[7] E. Bradford, A. M. Schweidtmann, A. Lapkin, J. Glob. Optim. 2018, 71, 407–438.
[8] D. Helmdach, P. Yaseneva, P. K. Heer, A. M. Schweidtmann, A. A. Lapkin, ChemSusChem 2017, 10, 3632–3643.
[9] A. D. Clayton, A. M. Schweidtmann, G. Clemens, J. A. Manson, C. J. Taylor, C. G. Niño, T. W. Chamberlain, N. Kapur, A. J. Blacker, A. A. Lapkin, R. A. Bourne, Chem. Eng. J. 2019, 123340.
[10] Y. Amar, A. M. Schweidtmann, P. Deutsch, L. Cao, A. Lapkin, Chem. Sci. 2019, 10, 6697–6706.
[11] J. Knowles, IEEE Trans. Evol. Comput. 2006, 10, 50–66.
[12] C. A. Coello Coello, S. González Brambila, J. Figueroa Gamboa, M. G. Castillo Tapia, R. Hernández Gómez, Complex Intell. Syst. 2020, 6, 221–236.
[13] F. Häse, L. M. Roch, C. Kreisbeck, A. Aspuru-Guzik, ACS Cent. Sci. 2018, 4, 1134–1145.
[14] F. Häse, L. M. Roch, A. Aspuru-Guzik, Chem. Sci. 2018, 9, 7642–7655.
[15] M. B. Plutschack, B. Pieber, K. Gilmore, P. H. Seeberger, Chem. Rev. 2017, 117, 11796–11893.
[16] N. Cherkasov, Y. Bai, A. J. Expósito, E. V. Rebrov, React. Chem. Eng. 2018, 3, 769–780.
[17] P. R. D. Murray, D. L. Browne, J. C. Pastre, C. Butters, D. Guthrie, S. V. Ley, Org. Process Res. Dev. 2013, 17, 1192–1208.
[18] F. Lévesque, P. H. Seeberger, Angew. Chem. Int. Ed. 2012, 51, 1706–1709; Angew. Chem. 2012, 124, 1738–1741.
[19] B. Y. Park, M. T. Pirnot, S. L. Buchwald, J. Org. Chem. 2020, 85, 3234–3244.
[20] E. T. Sletten, M. Nuño, D. Guthrie, P. H. Seeberger, Chem. Commun. 2019, 55, 14598–14601.
[21] M. Elsherbini, B. Winterson, H. Alharbi, A. A. Folgueiras-Amador, C. Génot, T. Wirth, Angew. Chem. Int. Ed. 2019, 58, 9811–9815.
[22] R. C. Wheeler, O. Benali, M. Deal, E. Farrant, S. J. F. MacDonald, B. H. Warrington, Org. Process Res. Dev. 2007, 11, 704–710.
[23] The MATLAB user interface and self-optimisation code (excluding proprietary Vapourtec code lines) is freely available on GitHub at http://github.com/cares-pips/mat-optimiser, 2020.
[24] M. I. Jeraal, N. Holmes, G. R. Akien, R. A. Bourne, Tetrahedron 2018, 74, 3158–3164.
[25] V. R. Joseph, Y. Hung, Stat. Sin. 2008, 18, 171–186.
[26] J. L. Loeppky, J. Sacks, W. J. Welch, Technometrics 2009, 51, 366–376.
[27] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, N. De Freitas, Proc. IEEE 2016, 104, 148–175.
[28] S. Sano, T. Kadowaki, K. Tsuda, S. Kimura, J. Pharm. Innov. 2019, 1–11.
[29] R. A. Sheldon, ACS Sustainable Chem. Eng. 2018, 6, 32–48.

Manuscript received: August 18, 2020
Version of record online: December 16, 2020