A mechanism-aware and multiomic machine-learning pipeline characterizes yeast cell growth

Christopher Culley (a,b), Supreeta Vijayakumar (b), Guido Zampieri (b), and Claudio Angione (b,c,1)

(a) Faculty of Engineering and Physical Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom; (b) Department of Computer Science and Information Systems, Teesside University, Middlesbrough TS1 3BX, United Kingdom; and (c) Healthcare Innovation Centre, Teesside University, Middlesbrough TS1 3BX, United Kingdom

Edited by Jens Nielsen, BioInnovation Institute, Copenhagen, Denmark, and approved June 12, 2020 (received for review February 16, 2020)

www.pnas.org/cgi/doi/10.1073/pnas.2002959117

Metabolic modeling and machine learning are emerging key components in next generation tools targeting the systems and synthetic biology genotype–phenotype–environment relationship. Rather than being used in isolation, it is becoming clear that their value is maximized when they are combined. However, the potential of integrating these two frameworks for omic data augmentation and integration is largely unexplored. We propose, rigorously assess, and compare machine-learning–based data integration techniques, combining gene expression profiles with computationally generated metabolic flux data to predict yeast growth. To this end, we create strain-specific metabolic models for 1,143 Saccharomyces cerevisiae mutants and we test 27 machine-learning methods, incorporating state-of-the-art feature selection and multiview learning approaches. We propose a multiview neural network using fluxomic and transcriptomic data, showing that the former increases the predictive accuracy of the latter and reveals functional patterns that are not directly deducible from gene expression alone. We test the proposed neural network on a further 86 strains generated in a different experiment, therefore verifying its robustness to an additional independent dataset. Finally, we show that introducing mechanistic flux features also improves the predictions for knockout strains whose genes were not modeled in the metabolic reconstruction. Our results thus demonstrate that fusing experimental cues with in silico models, based on known biochemistry, can contribute with disjoint information toward biologically informed and interpretable machine learning. Overall, this study provides tools for understanding and manipulating complex phenotypes, increasing both the prediction accuracy and the extent of discernible mechanistic insights.

systems biology | flux balance analysis | machine learning | multimodal learning

Significance

Linking genotype and phenotype is a fundamental problem in biology, key to several biomedical and biotechnological applications. Cell growth is a central phenotypic trait, resulting from interactions between environment, gene regulation, and metabolism, yet its functional bases are still not completely understood. We propose and test a machine-learning approach that integrates large-scale gene expression profiles and mechanistic metabolic models, for characterizing cell growth and understanding its driving mechanisms in Saccharomyces cerevisiae. At its core, a custom-built multimodal learning method merges experimentally generated and model-generated data. We show that our approach can leverage the advantages of both machine learning and metabolic modeling, revealing unknown interactions between biological domains, incorporating mechanistic knowledge, and therefore overcoming black-box limitations of conventional data-driven approaches.

Author contributions: G.Z. and C.A. designed research; C.C. performed research; C.A. contributed new reagents/analytic tools; C.C. and S.V. analyzed data; C.C., S.V., G.Z., and C.A. wrote the paper; G.Z. supervised the project; and C.A. acquired funding and administered the project.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

Published under the PNAS license.

Data deposition: All data, models, and code used in this work are available on GitHub at https://github.com/multiOmicMechanismAwareML/CodeBase, along with the information for replicating the results presented.

1 To whom correspondence may be addressed. Email: [email protected].

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2002959117/-/DCSupplemental.

The analysis of complex, high-dimensional biological data from heterogeneous sources is currently one of the main bottlenecks in molecular biology. Such data are generated by a range of high-throughput devices that target specific biological processes or biomolecules and are collectively known as omic data. Representative examples are the global genetic composition (the genome) and its overall activation of genes at a certain time (the transcriptome). Popular technologies permit the monitoring of various phenomena on a genetic and epigenetic level. However, in several applications, information on genes may have limited relevance to the task at hand, describing only a part of the biological processes taking place in the cellular phenotype. Metabolic data are closer to the phenotype but, despite recent innovations in sampling technologies, metabolic activity on a large omic scale is still challenging (1). Machine learning provides tools to identify and exploit patterns within this information, which can aid our understanding of the underlying biological mechanisms (2). In this context, the heterogeneity of omic data has fostered the development and application of multimodal learning methods (3).

Machine-learning techniques generally ignore previous biological knowledge in driving the pattern analysis, limiting the trustworthiness and interpretability of any obtained model. To fill these gaps, constraint-based modeling (CBM) can be used to simulate steady-state metabolism on a cellular scale. Metabolic flux profiles generated in silico have been previously used to inform machine-learning predictive models (4–9), in some cases providing specific advantages, as recently reviewed (10). However, an integrative multimodal learning approach that exploits the potential of models combined with experimental omics, and that is therefore able to incorporate mechanistic biological knowledge and fully integrate it in the learning process, is still lacking.

In this work, we propose a multimodal learning framework that leverages both transcriptomic data and strain-specific metabolic models to predict phenotypic traits of interest. We use this framework to predict the cellular growth for 1,143 strains of Saccharomyces cerevisiae, one of the main eukaryotic platforms in basic research as well as in biotechnology and, more recently, used for characterizing the processes associated with human diseases (11).
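To make the idea of a steady-state, expression-constrained metabolic model concrete, the following is a minimal sketch and not the pipeline used in this work. It assumes a hypothetical linear pathway (uptake, r1, r2, biomass) with made-up flux bounds, and it scales the upper bounds proportionally to an expression fold change, which is an illustrative rule rather than the mapping given in Materials and Methods. For a linear chain the flux balance optimum can be read off directly, so no linear-programming solver is needed.

```python
"""Toy steady-state flux model for a linear pathway (illustrative only)."""


def scaled_bounds(base_upper, fold_change):
    """Scale each reaction's upper flux bound by an expression fold change.

    Assumption: plain proportional scaling, chosen only for illustration.
    Reactions without an entry in fold_change keep their base bound.
    """
    return {rxn: base_upper[rxn] * fold_change.get(rxn, 1.0) for rxn in base_upper}


def linear_pathway_growth(upper):
    """Maximal biomass flux of the chain uptake -> r1 -> r2 -> biomass.

    At steady state each internal metabolite is produced and consumed at the
    same rate, so every flux in a linear chain is equal. Maximizing biomass
    therefore gives the tightest upper bound along the chain.
    """
    return min(upper.values())


# Hypothetical base bounds (arbitrary units) for the four reactions.
base = {"uptake": 10.0, "r1": 8.0, "r2": 6.0, "biomass": 20.0}

# Reference strain: no change in expression.
wild_type = linear_pathway_growth(scaled_bounds(base, {}))

# Mutant with the gene behind r1 expressed at half the reference level.
mutant = linear_pathway_growth(scaled_bounds(base, {"r1": 0.5}))
```

In this toy setting the mutant grows more slowly only because the halved r1 bound becomes the bottleneck; a genome-scale model has branching pathways, so the same question requires solving a linear program.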

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Cellular growth and gene expression are closely related in unicellular organisms, as they coparticipate in mutual regulation. On the one hand, growth is sustained by genes implicated in ribosomal and translational functions. In parallel, the expression of genes is affected by global and unspecific regulation originating from the physiological state of the cell (12). This relationship has yet to be fully understood, and therefore predicting cellular growth following genetic manipulations is still challenging. Understanding and controlling cellular growth have important applications in disease modeling, in biotechnology, and for the development of efficient cell factories (13). CRISPR-Cas9–enabled genetic engineering now allows modifying yeast DNA with single-nucleotide precision in vivo (14), achieving engineered strains that maximize a desired output. However, the identification of such strains is a complex issue (15). For instance, streamlining yeast metabolism for the production of valuable compounds often requires the deletion of multiple genes and efficient diversion of resources toward production pathways (16).

In an attempt to fully elucidate relationships between cellular growth and other processes, mathematical models have been developed, particularly in bacteria and yeast (17–19). For instance, coarse-grained models were designed to describe the global relationship between the allocation of resources toward protein synthesis and growth (20). Further, extensive models of metabolic networks are commonly used to simulate cellular metabolism under different growth conditions (21, 22). These models offer quantitative mechanistic representations of molecular processes, but often require detailed knowledge about uptake rates from the environment to achieve precise estimates.

On the other hand, accurate and flexible models connecting gene expression and cell growth can be obtained by data-driven statistical and machine-learning methods. As gene expression maintains a steady state during the log phase (23), it is possible to predict the growth rate even in cases where experimental measurements are not feasible. This is particularly relevant in the development of synthetic systems, where phenotypic traits have to be tightly controlled. Previous research has focused on building linear predictive models for yeast growth (24) and more recently machine learning for both Escherichia coli and S. cerevisiae (25). While both studies used gene expression profiles alone, metabolic activity is also tightly bound to cell growth (26).

Our idea is that reconnecting metabolic activity to cell growth with a data-driven and multiview approach should support more accurate machine-learning predictions, while incorporating biological mechanisms within the learning process. To investigate this idea, we used a compendium of 1,143 single-gene knockout S. cerevisiae strains, with their genome-wide expression profiles as training data to build models that predict cell doubling times. We augmented the array of biological predictors by incorporating a metabolic modeling phase, wherein we use transcriptomic profile integration in CBM to simulate strain-specific metabolism using parsimonious flux balance analysis (pFBA). From these simulations, we extracted reaction fluxes as additional features (fluxomic data). We then applied machine-learning methods using the transcriptomic and fluxomic datasets combined across 27 data–method combinations, testing different approaches for their multiview integration. When the integration of the two omics was performed within a neural network architecture, we found a significant improvement compared to using transcriptomic data alone. Upon finding that the proposed model, a multimodal artificial neural network, achieves the best performance, we tested it on a further 86 “unseen” strains generated in a different experiment and not used in the training phase, verifying its robustness to this independent dataset.

Our contributions thus focus on two aspects: 1) an investigation into the viability of building predictive models using transcriptomic and fluxomic information through a comparison of machine-learning, feature selection, and multiview data integration approaches and 2) an examination of the benefits of using metabolic modeling in building multimodal machine-learning predictive models, evaluating to what extent these mechanistic data are used to drive the learning process.

Results

Our goal was to develop and evaluate a multiomic mechanism-aware pipeline for predicting S. cerevisiae growth rate. To this end, we developed the workflow summarized in Fig. 1. In brief, we used CBM of metabolism to estimate the metabolic activity of each yeast mutant in the exponential growth phase, starting from their transcriptional activity. Then, we built and cross-compared 27 machine-learning models of yeast growth from a combination of transcript abundance and metabolic flux information. These steps and their output are described in detail in the following.

Strain-Specific Metabolic Modeling of Yeast Mutants. Genome-scale metabolic models (GSMMs) aim to capture and simulate the entire metabolic activity within a cell. Since different transcription rates lead to alterations of cell behavior, we used gene expression data to create 1,229 strain-specific models that emulate the corresponding metabolism. Through these simulations, we extracted a measure of this metabolic activity in the form of reaction fluxes for each strain (fluxomic data).

In particular, we focused on a transcriptomic dataset with 1,143 single deletion strains of S. cerevisiae (27) and a second dataset comprising 86 single and double mutants (28), for a total of 1,229 strains. The former was used as the main resource for model training, optimization, and testing, while the latter served as an experimentally independent test set in the predictive modeling stage. We used a recently refined GSMM of yeast metabolism (29) in conjunction with Eq. 2 in Materials and Methods to build the corresponding 1,229 strain-specific models. This was achieved through a set of 908 genes involved in metabolism, represented within the yeast GSMM and put in relation to the biochemical reactions they control. In the following, we refer to the full transcriptomic profiles as “gene expression” (GE) data and to the reduced transcript information from these 908 genes as “metabolic gene expression” (MGE), as depicted in Fig. 1B.

To create the strain-specific metabolic models, we altered the reaction bounds within the yeast GSMM based on expression fold-change levels in the MGE dataset. To reproduce nutritional conditions, we set the uptake rates according to the feed composition used in the original study (Materials and Methods). We then used pFBA to determine the reaction fluxes for the entire network by maximizing the biomass accumulation rate subject to model constraints. In this setting, we ensure that metabolic activity is coupled with gene expression and independent of environmental conditions, which are homogeneous across all strains (Fig. 2A). Fig. 2 B and C shows the relationship between the pFBA-predicted biomass accumulation rate and the experimentally measured relative cell doubling time in the two sets of mutants. As expected, we obtained a clear negative correlation between the two quantities, with a Pearson’s correlation coefficient (PCC) = −0.66, P < 10^−15 in the first set and PCC = −0.76, P < 10^−15 in the second set.

Metabolic modeling of the yeast mutant populations also allowed us to identify pathways of biological interest that are highly correlated with growth, therefore providing means to assess the mechanistic knowledge supporting the machine-learning models we developed in the next stage. Fig. 2D shows the mean absolute correlation of fluxes inside each pathway with the relative doubling time. Among those pathways that correlate most strongly with growth (|PCC| ≥ 0.6) we found amino acid and aminoacyl-tRNA metabolism, as well as pathways involved in producing the fuel for growth such as starch,

sucrose, riboflavin, and fructose metabolism, in keeping with previous experimental results (30). Other highly correlated pathways act as intermediaries between processes that are important for cell growth, such as purine and galactose metabolism. Furthermore, we identified C5-branched dibasic acid metabolism, which has been found to regulate cell growth (31); RNA degradation, which has been shown to be strongly correlated with yeast growth rates (32); and sulfur metabolism, which can actively promote initial cell division (33).

Fig. 1. Our multiomic integration and prediction framework, including all of the datasets and machine-learning methods used in this study. The input (A) is a gene expression screen of 1,143 single-knockout yeast strains with their relative growth rate (plus 86 single- and double-knockout strains used for independent validation). Our methodology is divided into two main stages. In the metabolic modeling stage (B), we extracted the gene expression (GE) data for the genes involved in metabolism (MGE) and used the data to tailor the flux constraints of the GSMM in a strain-specific manner. Next, we applied pFBA to such strain-specific GSMMs to obtain the associated metabolic fluxes (MFs). In the machine-learning stage (C), we used the GE, MGE, and MF data to construct machine-learning models of yeast growth. This was achieved through (C.I) single-view learning, using only GE, MGE, or MF; (C.II) concatenation, feature selection, and single-view learning, reducing the number of GE and MF predictors; and (C.III) multiview learning, integrating the multiomic data coupled with algorithms designed for multiple data sources (also referred to as data modes or data views). In total, 27 dataset–model combinations were tested in this stage, including a custom multimodal neural network (MMANN).
Panels: (A) Data and definitions; (B) Strain-specific metabolic modeling; (C) Machine learning, with C.I methods SVR, RF, and ANN, C.II feature selection by SGL, NSGA-II, and iRF, and C.III methods BEMKL, BRF, and MMANN.

Fig. 2. Results of strain-specific metabolic modeling of yeast knockouts. (A) Relationship between cell growth and the main biological processes. While most models consider either gene expression or metabolism, here we seek to integrate both views within a unified computational framework. In our study, environmental conditions are fixed, and hence cellular growth and metabolism are mainly driven by the influence of varying gene regulation and expression. (B and C) Yeast mutant experimental relative doubling time plotted against their biomass accumulation rate, computationally estimated by strain-specific pFBA, for both the initial set (B) and the experimentally independent test set (C). The negative correlation suggests that our strain-specific constraint-based modeling approach recapitulates the measured yeast growth. (D) Mean absolute correlation between experimental relative doubling time and strain-specific GSMM reaction fluxes within each metabolic pathway. High correlations were identified for meiosis, amino acids, and carbohydrates metabolism.
Axes: (B and C) FBA growth rate [1/h] against log2 of doubling time fold change, with PCC = −0.66, P < 10^−15 in B and PCC = −0.76, P < 10^−15 in C; (D) mean absolute PCC per metabolic pathway.

Finally, the fact that growth rate is also correlated with pyrimidine metabolism supports recent research

suggesting that its limitation causes the depletion of UTP and CTP, which in turn limits RNA biosynthesis, a limiting factor for cell growth (34).

Prediction of Cellular Growth Based on Transcriptomic and Fluxomic Profiles. Starting from GE and metabolic flux (MF) profiles of yeast mutants as two data views, we used the associated relative growth rate as a target to train our predictive machine-learning models. As the nutritional conditions are fixed for all of the strains, we assumed that variation on the level of gene regulation and expression is the main contributor to metabolism and growth. In this stage, we adopted the workflow depicted in Fig. 1C.

First, we explored three traditional machine-learning techniques, each one with previous encouraging results in biological predictive tasks: 1) support vector regression (SVR), often the learning tool of choice in computational biology due to its nonlinear decision boundary and ability to handle high-dimensional datasets (35, 36); 2) random forest (RF), able to handle heterogeneous data types in high dimensions and to account for both correlation and interaction among features, which has led to success in predictive modeling in multiple biological domains (37); and 3) artificial neural networks (ANNs), extremely effective in learning and modeling complex systems, with recent research reconstructing cell functionality (38) and predicting phenotypes from multiomic data (39). We applied these methods to GE, MGE, and MF data separately, in a single-view fashion, to obtain a baseline performance for the following steps.

In a second stage, we studied the integration of base omic datasets. Because our combined data represent two distinct views on the same biological systems, to thoroughly investigate the use of complementary information we explored three data strategies: 1) early integration, where GE and MF are concatenated and treated as a single dataset denoted as GE-MF; 2) intermediate integration, where model building is carried out on a combined transformation of the input views; and 3) late integration, where a model is separately built within each view and then the models are fused (3).

For intermediate and late integration, we used three multiview methods based on those employed in the single-view scenario. First, we considered Bayesian efficient multiple-kernel learning (BEMKL) (40), applying separate radial basis kernels to the MF and GE datasets. Second, we used bagged random forest (BRF) with distinct forests learned on transcriptomic and fluxomic profiles. Finally, we designed and built a multimodal artificial neural network (MMANN) to independently extract latent information from the two omic views and then fuse it together via additional neural layers (see SI Appendix for details).

The multiomic datasets considered in our predictive framework have a large number of features, which in general can contribute to various extents toward the predicted growth value. Noncontributing features add noise to the data, therefore giving potentially weaker predictive models while increasing the training effort. To overcome this “curse of dimensionality,” feature selection and regularization techniques were incorporated with the aim of isolating the most predictive features. Also in this task, we explored three state-of-the-art approaches: 1) sparse group lasso (SGL) (41), due to its ability to take into account the correlated and modular nature of biological functions; 2) nondominated sorting genetic algorithm II (NSGA-II) (42), for its ability to optimize multiple objectives; and 3) iterative random forests (iRF) (43), for its ability to capture nonlinear interactions among features (SI Appendix). Each of these techniques offers a different perspective on feature selection and is applied to GE-MF as an additional step of early integration. We thereby created three further datasets (SGL data, NSGA-II data, and iRF data, respectively) comprising the features identified by each of these approaches.

Comparison of 27 Multiomic Machine-Learning Models of Yeast Growth. The methods outlined in the previous section globally constitute a wide and diversified collection of state-of-the-art data-driven prediction tools, applicable to different sets of omic data. To identify the most effective approach, we performed a systematic comparison of their predictive accuracy, covering 27 dataset–method combinations. We evaluated each combination by training and optimizing a model with 80% of the 1,143 samples in our primary dataset and testing it with the remaining 20%. The hyperparameters were selected by grid search as described in Materials and Methods. The entire procedure was repeated 100 times to capture the random variation in training and validation, while maintaining the same final test set.

Table 1 and Fig. 3 give a breakdown of the predictive modeling results. First, we found highly variable scores for single-omic predictions, depending on whether they referred to transcriptomic or fluxomic data. In fact, both GE and MGE consistently achieved higher accuracy than MF profiles. Analogously, the complete GE performs better than the MGE subset, therefore highlighting the importance of metabolic or nonmetabolic genes that are not currently used by the yeast GSMM. Second, our results suggest that early- and late-integration approaches on average do not improve single-omic accuracy, although also this trend is associated with large variation depending on the specific data–method combination. Conversely, a small but tangible improvement was observed for intermediate integration approaches. Third, SVR- and ANN-based approaches generally tend to be more accurate than tree-based approaches. It is interesting to observe that, overall, the most accurate dataset–method combination is the MMANN model using both GE and MF, immediately followed by SVR trained on GE alone, with statistically significant median absolute error (MDAE) differences between the two (Fig. 3D).

By examining the predictive scores achieved by single-view and multiview ANNs, we notice a clear improvement of multiomic models against the stand-alone GE- and MGE-based models, in contrast to other multiview methods. It thus emerges that ANNs constitute the most suitable framework for the integration of transcriptomic and fluxomic data in terms of predictive benefits, among those considered here. Our results also suggest that, despite the relatively weak performance of the fluxes alone, their useful information cannot be discerned from GE and is therefore complementary to it. This is supported by examining the prediction output correlations shown in Fig. 3D, where the models produced using the fluxomic data have a prediction set that largely differs from the other models. MMANNs seem thus to use the metabolic modeling to gain information that cannot be acquired from the gene expression alone. Additionally, using fluxes as additional features improves the ability to mechanistically explain the predictions from ANNs, making them biologically interpretable.

Furthermore, data condensation through feature selection (SGL, NSGA-II, and iRF data) increases the predictive capability of SVR and occasionally RF, but our results indicate that this is not the case with ANNs. Since our ANNs include at least two hidden layers, this suggests that ANNs can identify predictive nonlinear relationships among genes and metabolic reactions that involve a larger set of features.

Generalization to an Experimentally Independent Dataset. For a machine-learning model to be considered generalizable and of high utility, performance stability is paramount. Especially in those settings where new data are collected in environments that differ from those of the training data, it is imperative that the prediction accuracy does not degrade under this new and unseen setting. However, this can be challenging to achieve when all of the training, validation, and test data originate from a single experiment (44). To verify the ability of our MMANN model to

generalize to experimentally independent data, we applied it to a different set of yeast mutants cultivated in the same nutritional conditions. Importantly, the new mutants comprise not only single-knockout strains but also double knockouts, therefore exposing our model to epistatic effects on which it was not trained (28). This additional analysis allowed us to investigate the question of whether our multiomic MMANN model, trained only on single mutants, could also generate reasonable predictions for double mutants (further details can be found in Materials and Methods).

Fig. 3C shows the results on the experimentally independent test set. In the single-knockout case, mean absolute error (MAE) and MDAE increase, but root-mean-squared error (RMSE) and Pearson's correlation coefficient (PCC) improve compared to the first test case. This might be caused by batch effects across experiments that often represent a potential source of systematic error, particularly visible on the level of MDAE (45). However, the key patterns are captured, as RMSE and PCC are consistent with previous tests. Double knockouts were not present in the training dataset and therefore, expectedly, the model performs less well in this out-of-distribution scenario. We also note that, even in this double-gene knockout setting, the correlation with the target growth rates is particularly strong. This suggests that, if a relative rather than absolute strain identification is required, then training on single knockouts and testing on double knockouts using the MMANN approach would give a setting from which strains could be compared with confidence. Taken together, assuming an appropriate training environment and batch effect corrections, these results support the use of MMANN as a robust predictive method for this task and demonstrate strong generalization across experiments.

Table 1. Full set of accuracy scores across all 27 dataset–algorithm combinations shown in Fig. 3: root-mean-squared error (RMSE), mean absolute error (MAE), median absolute error (MDAE), Pearson's correlation coefficient (PCC), and fluxomic features representation (FFR, the percentage of metabolic flux features over the total number of features). Columns: omics integration, dataset(s), method, RMSE, MAE, MDAE, PCC, and FFR (%). Rows: single omics (GE, MGE, and MF, each with SVR, RF, and ANN); early integration (GE-MF, SGL data, NSGA-II data, and iRF data, each with SVR, RF, and ANN); intermediate and late integration (GE and MF, and MGE and MF, each with BEMKL, BRF, and MMANN). Values in boldface type represent the best scores for each data integration scenario, while the best global performance for each measure is highlighted by an asterisk. The MMANN model consistently outperforms all other models and, with 36% of the features being fluxomic, demonstrates the utility of the additional metabolic modeling stage in our pipeline.

Functional Classification of Relevant Multiomic Predictors. As described above, the application of feature selection methods allowed us to reduce the number of biological variables to facilitate model learning. At the same time, it provided us with concise sets of predictors that hold a strong association with cellular growth from a data-driven point of view. We found that SGL yields 71 GE and 36 MF features as most relevant, while iRF identifies 68 unique GE features. Third, with the NSGA-II selection, nine variable sets are selected as members of the Pareto front of possible optimal solutions (SI Appendix), which include 218 GE and 51 MF unique features. Fig. 4A illustrates the metabolic pathways associated with the GSMM reactions selected by each of these algorithms, while Fig. 4B shows the main functional categories for the selected genes, obtained by querying the PANTHER classification system (46).

Among all biological processes, metabolic processes are the most prominent class for all three feature selection algorithms. By examining the organic metabolic processes, we found that a large proportion of reactions and pathways correspond to the biosynthesis and metabolism of macromolecules and organic compounds, such as factors for transcription, initiation, and translational elongation (Fig. 4C). This is consistent with the

role in protein synthesis played by the translational machinery, which is critical for cell growth (47). No functional class was found statistically enriched, indicating that the joint contribution of multiple processes determines the actual growth rate.

Fig. 3. Machine-learning yeast growth prediction results. (A) Comparison of model predictive performance across data integration strategy and machine-learning model type. Intermediate integration is overall the most effective approach and notably better than single-omic models. Concomitantly, ANN- and SVR-based techniques appear generally more effective than tree-based techniques. (B) Comparison of model accuracy for all dataset–learning algorithm combinations, corresponding to numeric results shown in Table 1. The MMANN method using both GE and MF profiles is overall the most accurate model, followed by GE-based SVR. (C) Error scores on the experimentally independent test set. Dashed red lines represent the corresponding error score on the main test set, while shaded areas represent their associated SD. (D) In red, Pearson's correlation between error score vectors on the test set, for each pair of data–method combination. In blue, P values are shown of Wilcoxon rank-sum tests assessing the significance of MDAE differences, for each pair of data–method combination. *, **, and *** represent significance thresholds of 0.05, 0.01, and 0.001, respectively, rescaled by Bonferroni correction.

Fig. 4. Contribution of the omic features to the learning process. (A) Pathway classification of the metabolic features selected by SGL, NSGA-II, and MMANN. (B and C) Functional classification of the genes selected by SGL, NSGA-II, iRF, or MMANN, based on Gene Ontology biological processes and metabolic molecular functions, respectively. The number of features per functional class is independent of the selection method for SGL, NSGA-II, and iRF (χ² test of independence, null hypothesis retained, P > 0.05 for biological processes and P > 0.05 for metabolic processes), but dependent for MMANN (χ² test of independence, null hypothesis rejected, P < 0.05 for biological processes and P < 0.05 for metabolic processes). (D) Overlap in the individual features selected by SGL, NSGA-II, or iRF. A single feature is shared among iRF, NSGA-II, and SGL: YDR472W. This suggests that individual features are used interchangeably by the feature selection methods (e.g., highly correlated gene expression values, or reactions with similar flux in a linear pathway) while, at a higher functional level, the signal represented by the expression of the selected genes is consistent across all the methods (as shown in B). (E) Distribution of feature importance in the MMANNs. These distributions are extracted from the MF and GE components of the MMANN models. Although the GE SHAP values have an overall higher contribution, the MF view has a small number of features determined as highly contributing, demonstrating their predictive utility. (F) Metabolic flux through the citric acid cycle in two mutants, PET112 (Left) and ATG10 (Right), illustrating how condition-specific CBM can capture pathway-level metabolic perturbations generated by the knockout of two genes not present in the GSMM, whose fluxes are exploited downstream by the machine-learning approaches. The color scale from gray (low) to red (high) indicates the amount of flux carried by each reaction in the pathway.

Regarding MF features, SGL selected reactions largely involved in the metabolism of secondary metabolites, glycerolipids, and glycerophospholipids, whereas reactions selected by NSGA-II encapsulate a more diverse variety of functions (Fig. 4A), ranging from the biosynthesis of amino acids and secondary metabolites to the metabolism of fatty acids, glycerophospholipids, and nucleotides. The gene YDR472W (also known as TRS31) was selected by all three feature selection methods and encodes a core component of a subunit present in TRAPP complexes, which are responsible for Rab-mediated vesicle trafficking (48). All other selected genes and metabolic reactions are exclusive to one or

two methods. Among the nine features selected by both iRF and NSGA-II, there are genes encoding binding proteins and transporters (Dataset S1). Similarly, the genes selected by SGL and NSGA-II also coded for mitochondrial transport and mRNA binding. The selection of genes linked to tRNA and cellular amino acid-related metabolic processes is consistent with the process of translational elongation during the assembly of amino acids into proteins, which consequently affects cellular growth and maximization of biomass. Despite the limited overlap among the features selected by the three methods (Fig. 4D), their high-level functional classification is statistically coherent (χ² tests of independence, null hypothesis retained, P = 0.72 for biological processes and P = 0.18 for metabolic processes). This is consistent with the nature of cell systems, based on functional modularity and redundancy, and characterized by widespread cross-correlated omic cues.

For metabolic genes or reactions, their contribution to cell growth could be inferred also through CBM-only approaches, e.g., by simulating the effect of their artificial alterations. To compare a CBM-only approach with our multimodal machine-learning approach, we performed a sensitivity analysis through in silico single-gene knockdown directly within the metabolic model, examining the impact on the biomass accumulation rate (SI Appendix). The genes and pathways that have the greatest effect on the biomass are listed in Dataset S1, among which we found some overlap with the feature selection algorithms. The down-regulation of genes related to tRNA metabolic processes and the biosynthesis of amino acids such as arginine and phenylalanine resulted in zero biomass flux, consistent with the features identified by SGL and NSGA-II. From the perspective of individual algorithms, overlapping iRF-selected genes are related to pyrimidine and phospholipid biosynthesis and to the pentose phosphate pathway. The NSGA-II genes whose deletion resulted in zero biomass are related to the metabolism of vitamin D and sphingolipid biosynthesis.

Analogously, we carried out a flux-coupling analysis to identify reaction fluxes on which growth rate is mutually dependent (fully coupled) or unilaterally dependent (directionally coupled) (49) (see SI Appendix for details). A total of 234 reactions were classified in either one of the two categories (Dataset S1). Also in this case, we observed an overlap between some features that were selected by SGL or NSGA-II. Of the 36 reactions selected by SGL, only 3 reactions are coupled with the biomass pseudoreaction (with 1 fully coupled and 2 directionally coupled reactions), whereas 19 of the 51 reactions selected by NSGA-II were found to be coupled (with 1 fully coupled and 18 directionally coupled reactions). However, it should be noted that CBM approaches are limited to the enzymes included in the genome-scale metabolic model and overlook the role of external biological factors. Thus, we argue that our integrative framework can be complementary to more traditional CBM approaches and capture cross-omic relationships missed by them.

Interestingly, when examining rules within the GSMM that dictate the gene–protein-reaction associations, some of the reactions selected uncover formerly overlooked connections. For instance, the reactions involved in glycerophospholipid metabolism are selected by SGL but the corresponding genes are not. In fact, a closer inspection of these results revealed that the functionalities of the selected gene and reaction features hardly overlap. Five reactions that constitute part of the glycerophospholipid metabolism are controlled either exclusively or partially by the gene YPR140W, which is essential for maintaining the phospholipid content of the mitochondrial membrane. Indeed, S. cerevisiae is a popular choice of organism for studying glycerophospholipid homeostasis in eukaryotes, owing to tolerance with respect to its membrane lipid composition (50). These results support the case for the inclusion of both flux and gene features to augment the machine-learning model with more data, while improving our mechanistic understanding of the role that each omic plays in the wider biological context.

Finally, given the high prediction accuracy of MMANN models, we sought to determine their most contributing features. To this end, we exploited recent advances in ANN interpretation via the SHapley Additive exPlanations (SHAP) method (51), a general approach for determining the contribution (called SHAP value) of individual features to model outputs. We applied SHAP to a randomly selected model from the set of MMANN models, selecting features with absolute mean SHAP values in the top percentile as highly relevant and obtaining 71 belonging to the transcriptomic domain and 10 to GSMM reaction fluxes (Dataset S1). MMANN-associated GE features yield statistically significant differences from those selected by the feature selection methods in terms of functional classification (Fig. 4 B and C, χ² tests of independence, null hypothesis rejected, P = 6.3 · 10⁻⁴ for biological processes and P = 2.2 · 10⁻³ for metabolic processes). The information extracted by these models thus seems notably distinct, which may explain the higher performance of MMANNs. Among the top-contributing genes in MMANNs, many produce proteins binding to RNA, with several genes acting as mRNA splicing factors involved in preprocessing via the spliceosome. Some genes encode proteins that bind to DNA to repair mismatched nucleotides, as well as proteins responsible for dephosphorylation and protein/tRNA modification. This, along with the presence of an amino acid transporter gene, reaffirms the role of protein synthesis in relation to growth. Among the top-contributing reactions, the main pathways (glycerophospholipid and inositol metabolism) are very closely linked, since inositol signaling is responsible for homeostasis and regulation of lipid metabolism (52).

Contribution of Fluxomic Information in Multiomic Machine-Learning Models. Although from the single-omic results it is clear that a large contribution in the most accurate multimodal learning model (MMANN) comes from the transcriptomic data, we showed that a significant and complementary amount of relevant signal is present in the metabolic view. Thus, we further investigated the extent to which this method exploits the information in MF rather than in GE. The variable importance distribution for each data source, estimated through SHAP, is plotted in Fig. 4E. Although transcriptomic features have a higher mean absolute SHAP value and constitute the majority of the information used, fluxomic features also contribute a subset with high SHAP values. This shows that the predictive improvement obtained by the addition of MF profiles is directly attributable to active information sourcing from this data view.

Finally, to ascertain how the addition of MF affected the predictive accuracy on individual knockout strains, we compared the absolute error differences between ANNs (using only GE) and MMANNs (using both GE and MF). The knockout strains that recorded the highest differences between the mean errors were regarded as providing a more accurate prediction of growth rate due to the addition of MF to the model. The full list of strains for this analysis can be found in Dataset S2. Among the 20 highest differences were many gene knockouts that played a role in DNA transcription or RNA processing, as well as enzymes involved in the sorting and modification of proteins. Interestingly, only 2 of these 20 genes are present within the GSMM. This shows that MF and machine learning can jointly contribute toward extracting more accurate and biologically interpretable predictions by indirectly propagating perturbations on biological components into a GSMM, even when such components are not explicitly included in the GSMM. As an example, Fig. 4F displays the difference in metabolic flux in the citric acid cycle between two different mutants, illustrating how our condition-specific CBM approach can capture metabolic perturbations generated by the knockout of genes not present in the GSMM (PET112 and

ATG10), which in turn can be exploited by a data-driven model used downstream. This advocates the use of metabolic reactions as features for machine-learning methods, using ad hoc feature selection techniques for any given application.

Discussion

This work investigates the application of multiview and multistage learning to integrating experimental and in silico-generated omic data for the prediction of growth. This proposed framework is systematically evaluated across several data integration and machine-learning approaches. The wide spectrum of models and techniques considered here provides a useful starting point for future benchmarking.

We verified that combining experimental transcriptomic and artificial fluxomic data can increase the prediction strength over individual omics, although the improvement is subject to the predictive model choice. In our study, the largest improvement was obtained through multimodal artificial neural networks, with neural networks being the strongest predictive model overall. Additionally, we demonstrated that the advantages in terms of prediction accuracy and biological insights can reach beyond what is mechanistically directly captured by the metabolic reconstruction used to generate the fluxomic profiles.

Although transcriptomic-constrained flux balance analysis is widely used in genome-scale metabolic modeling, there are additional methods that can inject further constraints in the solution space of simulations (53–55). Similarly, additional information may lie in strain-specific flux models. For instance, additional features could be extracted from a metabolic model, e.g., from the results of flux variability analysis or sampling. While in this work we focused on cross-comparing machine-learning methods, an analogous survey could be performed on the level of constraint-based modeling techniques to generate reaction-level fluxes, as well as on the level of different metabolic reconstructions. Furthermore, in this work, we adopted transcriptomic data as a base benchmark, given their widespread use across biology and biotechnology studies. In the cases where further omic data are available, they could be implemented to perform predictions across different biological layers (5). Similarly, our framework could be extended to investigating varying environmental conditions.

It is interesting to note that multimodal artificial neural networks achieve higher accuracy compared to other methods and to single-view neural networks overall, but transcriptomics-based support vector regression also achieves good performance scores. Indeed, multiomic data integration does not always guarantee improved predictions, especially when benchmarking over gene expression (56). While any difference in accuracy generally depends on the task, our findings demonstrate that the knowledge embedded in genome-scale metabolic models is complementary to gene expression and may support data-driven models in a variety of scenarios. Therefore, its exploitation by support vector regression also appears to be a promising framework for further improving the predictions, guided by transcriptomic and fluxomic data, once such complementarity is fully exploited.

Finally, it is important to note that metabolic flux information has a straightforward mechanistic interpretation, as it is directly linked to the underlying biochemistry. Data augmentation based on metabolic networks, combined with multiview learning, can therefore increase predictivity while providing direct mechanistic insights into the condition-specific interaction of metabolites that give rise to the phenotypic outcome. This can translate into advantages in terms of human ability to trust and employ biologically interpretable machine-learning models, especially in scenarios where it is more important to understand the effect of cell or metabolic engineering operations (10). Our results thus support the extension of such data- and knowledge-based multiomic machine learning to biological engineering and to other relevant phenotypic targets, such as the secretion of metabolites for drug development.

Materials and Methods

Transcriptomic and Growth Data. The main transcriptomic dataset used in this work was collected in a previous study (27), which provides two-channel microarray profiles for 1,484 single-gene deletion strains of S. cerevisiae during the midlog phase. We downloaded these data from the supplementary material of a second study (57), which also provides relative growth rates compared to the wild type for 1,312 strains, expressed as the log2 of the doubling-time ratio between each strain and the wild type. After merging the transcriptomic profiles and their associated growth rates, we obtained 1,143 samples, which we used in the following stages.

An independent dataset for testing the proposed MMANN was obtained from a third study (28), providing gene expression profiles for single and double gene deletion strains of S. cerevisiae on the same microarray platform. Among these strains, we selected the single mutants (14 strains) that do not overlap with those in our primary dataset and all the double mutants (72 strains). In this second dataset, 58 of the genes present in the main training dataset were missing. To ensure consistency of the gene sets, i.e., the same features, and feed the new data into our pretrained models, we imputed gene expression values for the missing genes by linear regression based on the other variables. Upon imputation of missing values, the 86 mutants obtained represented an experimentally independent set of conditions and served as a real-case scenario for using our proposed MMANN method.

Genome-Scale Metabolic Modeling. A GSMM is a collection of all known biochemical reactions and transmembrane transporters that occur within an organism. The reaction network is mathematically represented as a stoichiometric matrix S, capturing the exact proportions of reactants and products involved in each biochemical transformation (58). Reaction rates (fluxes) are mass and energy balanced assuming a metabolic steady state and can be described by a vector v of reaction fluxes through the network, limited by their lower and upper bounds given by v_lb and v_ub. The constraints v_lb and v_ub can be modified to model varying genetic or environmental factors, yielding a context-specific metabolic model consistent with experimental data.

We estimated the metabolic fluxes associated to each transcriptional condition by solving the following parsimonious FBA problem:

\[ \min_{v} \lVert v \rVert \quad \text{subject to} \quad S v = 0, \quad w^{\top} v = f, \quad v_{lb}\,\Theta \le v \le v_{ub}\,\Theta. \qquad [1] \]

Here w is a binary vector expressing the biomass pseudoreaction as a unique objective, while f is the maximal growth rate achievable by the network under the given constraints. The impact of each transcriptional condition is represented by Θ, which is the gene set expression vector obtained by mapping the expression of the individual genes onto the associated reactions. This involves converting logical gene–protein-reaction association rules into max/min operations, as

\[ \Theta(g_1 \wedge g_2) = \min\{\theta(g_1), \theta(g_2)\}, \qquad \Theta(g_1 \vee g_2) = \max\{\theta(g_1), \theta(g_2)\}, \qquad [2] \]

where θ(g) represents the expression level of a gene g, and Θ represents the effective expression level of the gene set {g_1, g_2} (59). We refer the reader to SI Appendix for more details regarding the nutritional conditions.

In this work, we used the Sce926 yeast GSMM, which includes 926 genes, 3,494 reactions, and 2,223 metabolites (29). Among these 926 genes, a total of 908 (98%) are present in our main transcriptomic dataset. To solve Eq. 1, we used the COBRA toolbox 3.0 (60) with the PDCO solver. The solutions provide steady-state fluxes for every reaction in the Sce926 GSMM across the 1,143 yeast strains from the main dataset and the 86 strains from the experimentally independent dataset.

Machine-Learning Models. To predict the relative doubling time, expressed as the log2 of the doubling-time ratio with respect to the wild type, we started from the transcriptomic and fluxomic profiles as features, and used the following supervised learning methods: SVR (35), RF (37), and ANNs (61). To integrate omic profiles and obtain multiomic machine-learning models, we employed the following multiview methods: BEMKL

(40), BRF, and MMANNs. Further, to reduce the number of omic predictors, we employed SGL (41), NSGA-II (42), and iRF (43) (see SI Appendix for details on each of these methods).

Machine-Learning Model Selection, Training, and Testing. To assess model generalization, we randomly split our samples into train and test subsets composing 80% and 20% of the main dataset, respectively. Training data were used for fitting the models and learning latent patterns present in the data, which can predict the relative doubling time of yeast mutants. Since many of the adopted methods have hyperparameters that can impact the learning process, we performed a grid search to identify the optimal hyperparameter settings with the use of validation data subsets. Using the 80% data portion, we applied fivefold cross-validation repeated three times for all methods, except the ANN-based models, for which we used a fixed 10% of the training set for validation. After selecting the hyperparameters, we trained each model again, this time using the full training data, validation samples included. To measure model performance, we used the obtained models to make predictions on all of the samples in the test set, which are disjoint from those in the training and hyperparameter selection phases. To account for stochastic variability, whether in cross-validation or during the optimization process in the case of ANN, we repeated the training–test procedure 100 times for each combination of dataset and ANN-based model and repeated the selection–training–test procedure 100 times for each other dataset–method combination. Feature selection methods were optimized and applied one time only. Finally, we applied a randomly selected MMANN model to the experimentally independent test set to simulate a real-use scenario. To ensure full reproducibility, we provide the train–test split indexes and the random seed used, along with details on methods, software packages, and hyperparameter search spaces in SI Appendix.

Data Normalization and Performance Metrics. When feeding the different data views to the machine-learning techniques, we used z-score normalization, where the mean and SD of the training data were also used to normalize the test data to prevent information leakage. We used the normalized data in all of the learning approaches due to the different data distributions of the two views (fluxes and gene expression), also noting in general that normalization is a requirement for SVR and enables faster convergence in ANNs.

The hyperparameter selection focused on minimizing the RMSE,

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}}, \qquad [3]$$

where model predictions $\hat{y}_i$ are compared with observed growth rates $y_i$ across all $n$ strains. Because the errors are squared, the RMSE penalizes large prediction errors more heavily. When evaluating and comparing models, we used three additional metrics, namely the MAE,

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |\hat{y}_i - y_i|}{n}; \qquad [4]$$

the MDAE,

$$\mathrm{MDAE} = \mathrm{median}(|\hat{y}_1 - y_1|, \ldots, |\hat{y}_n - y_n|); \qquad [5]$$

and the PCC. MDAE statistical differences across data–method pairs were estimated by Wilcoxon rank-sum tests through the wilcox.test R function, whose P values were adjusted via Bonferroni correction.

Artificial Neural Network Interpretation. To quantify the variable contributions in the MMANN models, we used the SHAP method (51). SHAP uses a game-theoretic approach to determine the importance of a particular feature to individual data inputs. SHAP values are thus feature importance scores defined to satisfy local accuracy, missingness, and consistency properties. We used a variant of the SHAP method specifically designed for ANN models, called Deep SHAP (51), whose working principle is the back propagation of unit activation differences to input features. The top-contributing features inspected in terms of biological classification were chosen as those in the largest mean SHAP value percentile, where the mean was computed over the training samples.

Biological Feature Classification. The biological classification for the genes identified by the feature selection methods and SHAP was obtained with the PANTHER classification system (46). The KEGG pathway annotation (62) for GSMM reactions was obtained from a curated S. cerevisiae GSMM (63). The statistical enrichment tests on PANTHER were run with default parameters. To assess associations between the feature selection methods and the selected gene features, χ2 independence tests were run on biological and metabolic process classification classes via the chisq.test R function. These tests were performed first across SGL, NSGA-II, and iRF and finally with the inclusion of the MMANN features obtained through SHAP.

Data Availability. The microarray and growth data obtained for this study are available on Gene Expression Omnibus (GEO) (accession nos. GSE42526, GSE42527, and GSE42536), on ArrayExpress (E-MTAB-1383, E-MTAB-1384, and E-MTAB-1385), and as flat files from the authors of the original studies (27, 28, 57). The yeast metabolic model can be found in the supplementary material of the corresponding paper (29). All data, models, and code used in this work are also available on GitHub at https://github.com/multiOmicMechanismAwareML/CodeBase, along with the information for replicating the results presented.

ACKNOWLEDGMENTS. C.C. was supported by the United Kingdom Research and Innovation (UKRI) Centre for Doctoral Training (CDT) in Machine Intelligence for Nano-electronic Devices and Systems (EP/S024298/1). C.A. received funding from Biotechnology and Biological Sciences Research Council (BBSRC), Grants CBMNet-PoC-D0156 and NPRONET-BIV-015 (BB/L013754/1). G.Z. and C.A. were also supported by Teesside University and by UKRI Research England's Teesside, Hull and York - mobilising bioeconomy knowledge exchange (THYME) project.
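As an illustration of the gene set expression mapping described under Genome-Scale Metabolic Modeling, where a gene set joined by AND takes the minimum expression level of its genes and a gene set joined by OR takes the maximum, the following minimal Python sketch evaluates a nested rule. It is not the implementation used in this work: the nested-tuple rule encoding, the function names gene_set_expression and scale_bounds, the gene identifiers and values, and the multiplicative scaling of the flux bounds are assumptions made for the example.

```python
def gene_set_expression(rule, theta):
    """Evaluate a gene set rule: 'and' maps to min, 'or' maps to max.

    rule  : a gene identifier (str), or a tuple (operator, operand, ...)
            with operator "and" or "or" (hypothetical encoding).
    theta : dict mapping gene identifier -> expression level.
    """
    if isinstance(rule, str):
        return theta[rule]
    operator, *operands = rule
    values = [gene_set_expression(operand, theta) for operand in operands]
    if operator == "and":
        return min(values)
    if operator == "or":
        return max(values)
    raise ValueError(f"unknown operator: {operator!r}")


def scale_bounds(lower, upper, set_expression):
    """Scale one reaction's flux bounds by its gene set expression.

    The multiplicative form is an assumption made for this sketch.
    """
    return lower * set_expression, upper * set_expression


# Hypothetical expression levels for three genes.
theta = {"g1": 0.5, "g2": 2.0, "g3": 1.5}

# (g1 AND g2) OR g3, e.g. an enzyme complex with an isozyme alternative.
rule = ("or", ("and", "g1", "g2"), "g3")

big_theta = gene_set_expression(rule, theta)
bounds = scale_bounds(-10.0, 10.0, big_theta)
```

Encoding the rule as a nested tuple keeps the recursion short; a real pipeline would more likely parse the rule strings stored in the metabolic model.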

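The normalization and performance metrics described under Data Normalization and Performance Metrics can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the code used in this work: the function names, the toy values, and the use of the population SD are choices made for the example. The mean and SD are estimated on the training data only and then reused for the test data.

```python
import math
import statistics


def fit_zscore(train_values):
    """Estimate mean and SD on the training data only."""
    mean = statistics.fmean(train_values)
    sd = statistics.pstdev(train_values)  # population SD: an assumption
    return mean, sd


def apply_zscore(values, mean, sd):
    """Normalize values with training-set statistics (sd must be nonzero)."""
    return [(v - mean) / sd for v in values]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mdae(y_true, y_pred):
    """Median absolute error."""
    return statistics.median(abs(p - t) for t, p in zip(y_true, y_pred))


def pcc(y_true, y_pred):
    """Pearson correlation coefficient."""
    mean_t = statistics.fmean(y_true)
    mean_p = statistics.fmean(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy feature column: statistics come from the training split only.
train_feature = [1.0, 2.0, 3.0]
test_feature = [2.0, 4.0]
mean, sd = fit_zscore(train_feature)
test_scaled = apply_zscore(test_feature, mean, sd)

# Toy observed and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
scores = {
    "RMSE": rmse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "MDAE": mdae(y_true, y_pred),
    "PCC": pcc(y_true, y_pred),
}
```

With a single large error among otherwise exact predictions, the RMSE exceeds the MAE and the MDAE stays at zero, which shows why the metrics are reported together.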
1. S. Niedenführ, W. Wiechert, K. Nöh, How to measure metabolic fluxes: A taxonomic guide for 13C fluxomics. Curr. Opin. Biotechnol. 34, 82–90 (2015).
2. M. W. Libbrecht, W. S. Noble, Machine learning applications in genetics and genomics. Nat. Rev. Genet. 16, 321–332 (2015).
3. Y. Li, F. X. Wu, A. Ngom, A review on machine learning principles for multi-view biological data integration. Briefings Bioinf. 19, 325–340 (2016).
4. I. Shaked, M. A. Oberhardt, N. Atias, R. Sharan, E. Ruppin, Metabolic network prediction of drug side effects. Cell Systems 2, 209–213 (2016).
5. M. Kim, N. Rai, V. Zorraquino, I. Tagkopoulos, Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli. Nat. Commun. 7, 13090 (2016).
6. E. Yaneske, C. Angione, The poly-omics of ageing through individual-based metabolic modelling. BMC Bioinformatics 19, 415 (2018).
7. J. H. Yang et al., A white-box machine learning approach for revealing antibiotic mechanisms of action. Cell 177, 1649–1661 (2019).
8. C. Culley, S. Vijayakumar, G. Zampieri, C. Angione, "Combining metabolic modelling with machine learning accurately predicts yeast growth rate" in 11th International Workshop on Bio-Design Automation, P. Lio', A. Wipat, J. Haseloff, A. Phillips, S. J. Dunn, Eds. (University of Cambridge, Cambridge, United Kingdom, 2019), pp. 26–27.
9. M. B. Guebila, I. Thiele, Predicting gastrointestinal drug effects using contextualized metabolic models. PLoS Comput. Biol. 15, e1007100 (2019).
10. G. Zampieri, S. Vijayakumar, E. Yaneske, C. Angione, Machine and deep learning meet genome-scale metabolic modeling. PLoS Comput. Biol. 15, e1007084 (2019).
11. R. Yu, J. Nielsen, Yeast systems biology in understanding principles of physiology underlying complex human diseases. Curr. Opin. Biotechnol. 63, 63–69 (2020).
12. S. Levy, N. Barkai, Coordination of gene expression with growth rate: A feedback or a feed-forward strategy? FEBS Lett. 583, 3974–3978 (2009).
13. M. P. Pacheco, T. Bintener, T. Sauter, Towards the network-based prediction of repurposed drugs using patient-specific metabolic models. EBioMedicine 43, 26–27 (2019).
14. Z. Bao et al., Genome-scale engineering of Saccharomyces cerevisiae with single-nucleotide precision. Nat. Biotechnol. 36, 505 (2018).
15. T. S. Gardner, Synthetic biology: From hype to impact. Trends Biotechnol. 31, 123–125 (2013).
16. F. David, V. Siewers, Advances in yeast genome engineering. FEMS Yeast Res. 15, 1–14 (2015).
17. V. Shahrezaei, S. Marguerat, Connecting growth with gene expression: Of noise and numbers. Curr. Opin. Microbiol. 25, 127–135 (2015).
18. H. De Jong et al., Mathematical modelling of microbes: Metabolism, gene expression and growth. J. R. Soc. Interface 14, 20170502 (2017).
19. M. J. Herrgård et al., A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Nat. Biotechnol. 26, 1155–1160 (2008).
20. M. Scott, C. W. Gunderson, E. M. Mateescu, Z. Zhang, T. Hwa, Interdependence of cell growth and gene expression: Origins and consequences. Science 330, 1099–1102 (2010).
21. J. D. Orth, I. Thiele, B. Ø. Palsson, What is flux balance analysis? Nat. Biotechnol. 28, 245–248 (2010).
22. Y. Chen, G. Li, J. Nielsen, "Genome-scale metabolic modeling from yeast to human cell models of complex diseases: Latest advances and challenges" in Yeast Systems Biology, S. Oliver, J. Castrillo, Eds. (Springer, 2019), pp. 329–345.
23. V. Pelechano, J. E. Pérez-Ortín, There is a steady-state transcriptome in exponentially growing yeast cells. Yeast 27, 413–422 (2010).
24. E. M. Airoldi et al., Predicting cellular growth from gene expression signatures. PLoS Comput. Biol. 5, e1000257 (2009).

25. T. P. Wytock, A. E. Motter, Predicting growth rate from gene expression. Proc. Natl. Acad. Sci. U.S.A. 116, 367–372 (2019).
26. N. Slavov, D. Botstein, Coupling among growth rate response, metabolic cycle, and cell division cycle in yeast. Mol. Biol. Cell 22, 1997–2009 (2011).
27. P. Kemmeren et al., Large-scale genetic perturbations reveal regulatory networks and an abundance of gene-specific repressors. Cell 157, 740–752 (2014).
28. K. Sameith et al., A high-resolution gene expression atlas of epistasis between gene-specific transcription factors exposes potential mechanisms for genetic interactions. BMC Biol. 13, 112 (2015).
29. R. Chowdhury, A. Chowdhury, C. D. Maranas, Using gene essentiality and synthetic lethality information to correct yeast and CHO cell genome-scale models. Metabolites 5, 536–570 (2015).
30. J. R. Broach, Nutritional control of growth and development in yeast. Genetics 192, 73–105 (2012).
31. M. Kondo et al., The rate of cell growth is regulated by purine biosynthesis via ATP production and G1 to S phase transition. J. Biochem. 128, 57–64 (2000).
32. J. García-Martínez et al., The cellular growth rate controls overall mRNA turnover, and modulates either transcription or degradation rates of particular gene regulons. Nucleic Acids Res. 44, 3643–3658 (2015).
33. H. M. Blank, S. Gajjar, A. Belyanin, M. Polymenis, Sulfur metabolism actively promotes initiation of cell division in yeast. PloS One 4, e8018 (2009).
34. V. M. Boer, C. A. Crutchfield, P. H. Bradley, D. Botstein, J. D. Rabinowitz, Growth-limiting intracellular metabolites in yeast growing under diverse nutrient limitations. Mol. Biol. Cell 21, 198–211 (2010).
35. A. Ben-Hur, C. S. Ong, S. Sonnenburg, B. Schölkopf, G. Rätsch, Support vector machines and kernels for computational biology. PLoS Comput. Biol. 4, e1000173 (2008).
36. S. Huang et al., Applications of support vector machine (SVM) learning in cancer genomics. Cancer Genomics Proteomics 15, 41–51 (2018).
37. X. Chen, H. Ishwaran, Random forests for genomic data analysis. Genomics 99, 323–329 (2012).
38. J. Ma et al., Using deep learning to model the hierarchical structure and function of a cell. Nat. Methods 15, 290–298 (2018).
39. W. Guo, Y. Xu, X. Feng, Deep metabolism: A deep learning system to predict phenotype from genome sequencing. arXiv:1705.03094 (8 May 2017).
40. M. Gönen, "Bayesian efficient multiple kernel learning" in Proceedings of the 29th International Conference on Machine Learning, J. Langford, J. Pineau, Eds. (Omnipress, Madison, WI, 2012), pp. 91–98.
41. N. Simon, J. Friedman, T. Hastie, R. Tibshirani, A sparse-group lasso. J. Comput. Graph Stat. 22, 231–245 (2013).
42. K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 182–197 (2002).
43. S. Basu, K. Kumbier, J. B. Brown, B. Yu, Iterative random forests to discover predictive and stable high-order interactions. Proc. Natl. Acad. Sci. U.S.A. 115, 1943–1948 (2018).
44. D. M. Camacho, K. M. Collins, R. K. Powers, J. C. Costello, J. J. Collins, Next-generation machine learning for biological networks. Cell 173, 1581–1592 (2018).
45. W. W. B. Goh, W. Wang, L. Wong, Why batch effects matter in omics data, and how to avoid them. Trends Biotechnol. 35, 498–507 (2017).
46. H. Mi et al., Protocol update for large-scale genome and gene function analysis with the PANTHER classification system (v. 14.0). Nat. Protoc. 14, 703–721 (2019).
47. T. E. Dever, T. G. Kinzy, G. D. Pavitt, Mechanism and regulation of protein synthesis in Saccharomyces cerevisiae. Genetics 203, 65–107 (2016).
48. S. Zou, Y. Liu, G. Min, Y. Liang, Trs20, Trs23, Trs31 and Bet5 participate in autophagy through GTPase Ypt1 in Saccharomyces cerevisiae. Arch. Biol. Sci. 70, 109–118 (2018).
49. A. Larhlimi, L. David, J. Selbig, A. Bockmayr, F2C2: A fast tool for the computation of flux coupling in genome-scale metabolic networks. BMC Bioinformatics 13, 57 (2012).
50. A. I. de Kroon, Lipidomics in research on yeast membrane lipid homeostasis. Biochim. Biophys. Acta Mol. Cell Biol. Lipids 1862, 797–799 (2017).
51. S. M. Lundberg, S. I. Lee, "A unified approach to interpreting model predictions" in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett, Eds. (Curran Associates, 2017), pp. 4765–4774.
52. J. Patton-Vogt, A. I. de Kroon, Phospholipid turnover and acyl chain remodeling in the yeast ER. Biochim. Biophys. Acta Mol. Cell Biol. Lipids 1865, 158462 (2020).
53. D. Machado, M. Herrgård, Systematic evaluation of methods for integration of transcriptomic data into constraint-based models of metabolism. PLoS Comput. Biol. 10, e1003580 (2014).
54. S. Opdam et al., A systematic evaluation of methods for tailoring genome-scale metabolic models. Cell Systems 4, 318–329 (2017).
55. S. Vijayakumar, M. Conway, P. Lió, C. Angione, Seeing the wood for the trees: A forest of methods for optimization and omic-network integration in metabolic modelling. Briefings Bioinformatics 19, 1218–1235 (2017).
56. B. Ray et al., Information content and analysis methods for multi-modal high-throughput biomedical data. Sci. Rep. 4, 4411 (2014).
57. E. O'Duibhir et al., Cell cycle population effects in perturbation studies. Mol. Syst. Biol. 10, 732 (2014).
58. B. Ø. Palsson, Systems Biology: Constraint-Based Reconstruction and Analysis (Cambridge University Press, 2015).
59. C. Angione, Integrating splice-isoform expression into genome-scale models characterizes breast cancer metabolism. Bioinformatics 34, 494–501 (2018).
60. L. Heirendt et al., Creation and analysis of biochemical constraint-based models using the COBRA toolbox v. 3.0. Nat. Protoc. 14, 639–702 (2019).
61. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015).
62. M. Kanehisa, Y. Sato, M. Furumichi, K. Morishima, M. Tanabe, New approach for understanding genome variations in KEGG. Nucleic Acids Res. 47, D590–D595 (2019).
63. B. Sánchez, F. Li, H. Lu, E. Kerkhoven, J. Nielsen, SysBioChalmers/yeast-GEM. https://github.com/SysBioChalmers/yeast-GEM. Accessed 12 August 2018.
