Differential Metabolic and Coexpression Networks of Plant Metabolism

Supplementary Material

Nooshin Omranian1,2, Sabrina Kleessen3, Takayuki Tohge1, Sebastian Klie3, Georg Basler4, Bernd Mueller-Roeber1,2, Alisdair R. Fernie1 and Zoran Nikoloski1

1 Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam, Germany
2 Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
3 targenomix GmbH, Am Mühlenberg 11, Potsdam, Germany
4 Estación Experimental del Zaidín CSIC, 18008 Granada, Spain

Corresponding author: Nikoloski, Z. ([email protected])

Singular value decomposition of a stoichiometric matrix and derived cumulative distribution

We use singular value decomposition (SVD) of the stoichiometric matrices to determine the normalized cumulative singular value spectra for the 16 plant networks. Let $S$ be a stoichiometric matrix, with rows denoting metabolites, columns corresponding to reactions, and entries representing the stoichiometric coefficients with which metabolites participate in a reaction. The singular values of $S$ are given by the diagonal entries of $D$, whereby:

$$S = U D V^{T}.$$

The cumulative singular value spectra are obtained from the singular values $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r$ by dividing the cumulative sums $\sum_{i=1}^{k} \sigma_i$, $k = 1, \dots, r$, by the sum of all singular values, $\sum_{i=1}^{r} \sigma_i$ [S1].
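As a minimal sketch of this computation, the following NumPy snippet derives the normalized cumulative singular value spectrum of a matrix. The function name and the toy matrix `S_toy` are hypothetical illustrations and are not taken from the analysis itself.

```python
import numpy as np

def cumulative_singular_value_spectrum(S):
    """Return the normalized cumulative singular value spectrum of S."""
    # compute_uv=False returns only the singular values, sorted in descending order.
    sigma = np.linalg.svd(np.asarray(S, dtype=float), compute_uv=False)
    # Cumulative sums divided by the total; the last entry is therefore 1.
    return np.cumsum(sigma) / np.sum(sigma)

# Hypothetical toy network: 3 metabolites (rows) x 4 reactions (columns).
S_toy = np.array([
    [1, -1,  0,  0],
    [0,  1, -1,  0],
    [0,  0,  1, -1],
])
spectrum = cumulative_singular_value_spectrum(S_toy)
```

The sketch assumes that the matrix has at least one non-zero entry; an all-zero matrix would lead to a division by zero.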

Two cumulative probability distributions, $F_1$ and $F_2$, over the domain $X$, can be compared by the Kolmogorov-Smirnov statistic [S2], given by

$$d_{KS}(F_1, F_2) = \sup_{x \in X} \left| F_1(x) - F_2(x) \right|.$$

This statistic was used to build the heatmap in Figure 2A of the main text. The resulting distance matrix can be used in conjunction with any clustering algorithm of choice.
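The statistic can be sketched in a few lines of NumPy. Because spectra of networks of different size contain different numbers of singular values, the sketch assumes that each spectrum is first interpolated onto a common grid of normalized ranks; this alignment step, the helper names, and the toy spectra are assumptions made for illustration only.

```python
import numpy as np

def ks_statistic(F1, F2):
    """Largest absolute difference of two distributions sampled on the same grid."""
    F1 = np.asarray(F1, dtype=float)
    F2 = np.asarray(F2, dtype=float)
    if F1.shape != F2.shape:
        raise ValueError("both distributions must be sampled on the same grid")
    return float(np.max(np.abs(F1 - F2)))

def resample_spectrum(spectrum, grid):
    """Interpolate a cumulative spectrum onto normalized ranks (an assumption)."""
    spectrum = np.asarray(spectrum, dtype=float)
    ranks = np.arange(1, len(spectrum) + 1) / len(spectrum)
    return np.interp(grid, ranks, spectrum)

# Hypothetical cumulative spectra of two networks with 3 and 4 singular values.
grid = np.linspace(0.0, 1.0, 101)
F_a = resample_spectrum([0.5, 0.8, 1.0], grid)
F_b = resample_spectrum([0.4, 0.7, 0.9, 1.0], grid)
d_ab = ks_statistic(F_a, F_b)
```

Applying `ks_statistic` to all pairs of networks would yield a symmetric distance matrix with zeros on the diagonal.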

Comparison of metabolic networks and distance matrices

The similarity between two metabolic networks can be quantified by the ratio of the number of reactions present in both networks to the number of all reactions present in at least one of the two networks. If $R_1$ and $R_2$ denote the reaction sets of two metabolic networks, their Jaccard distance is given by:

$$d_J(R_1, R_2) = 1 - \frac{|R_1 \cap R_2|}{|R_1 \cup R_2|}.$$

This distance index was used to build the heatmap in Figure 2B of the main text. The resulting distance matrix can be used in conjunction with any clustering algorithm of choice.
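A minimal sketch of the Jaccard distance between two reaction sets, using the standard definition of one minus the ratio of shared to total reactions, is given below. The reaction identifiers are hypothetical, and returning zero for two empty networks is a convention chosen here.

```python
def jaccard_distance(reactions_a, reactions_b):
    """Jaccard distance between two collections of reaction identifiers."""
    a, b = set(reactions_a), set(reactions_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty networks are identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical reaction sets of two networks.
net1 = {"R1", "R2", "R3"}
net2 = {"R2", "R3", "R4", "R5"}
d_12 = jaccard_distance(net1, net2)  # 2 shared reactions out of 5 in total
```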

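Two (distance) matrices, $X$ and $Y$, can be compared based on the RV coefficient given by [S3]:

$$RV(X, Y) = \frac{\mathrm{tr}\left( E(XY^{T}) \, E(YX^{T}) \right)}{\sqrt{\mathrm{tr}\left( E(XX^{T})^{2} \right) \, \mathrm{tr}\left( E(YY^{T})^{2} \right)}},$$

where $E$ denotes expectation and $\mathrm{tr}$ denotes the trace of a matrix.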
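For illustration, a sample version of the RV coefficient can be sketched as follows. The sketch assumes that both matrices have the same number of rows and that their columns are mean-centered before comparison; whether and how the distance matrices were centered in the analysis is not stated here, so this choice and the toy data are assumptions.

```python
import numpy as np

def rv_coefficient(A, B):
    """Sample RV coefficient of two matrices with the same number of rows."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Column-centering is an assumption of this sketch.
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    AA = A @ A.T
    BB = B @ B.T
    # tr(AA BB) equals the squared Frobenius norm of A^T B, so it is non-negative.
    return float(np.trace(AA @ BB) / np.sqrt(np.trace(AA @ AA) * np.trace(BB @ BB)))

# Hypothetical toy matrices with 6 rows each.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(6, 6))
Y_toy = rng.normal(size=(6, 6))
rv_xy = rv_coefficient(X_toy, Y_toy)
```

The coefficient takes values between 0 and 1 and equals 1 when one centered matrix is a scalar multiple of the other. A matrix with constant columns would give a zero denominator and is not handled by this sketch.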

Distribution of co-expression values over sets of genes

Given a family of gene sets $\mathcal{G} = \{G_1, \dots, G_k\}$, such that each gene set $G_i$ represents some grouping of genes based on a criterion (e.g., involvement in a metabolic pathway), one can determine two characteristic distributions of co-expression. Let $C_i$, $1 \leq i \leq k$, denote the set of correlation values obtained from all pairs of genes in the corresponding $G_i$; assuming that the pairs of genes are ordered, let $m_i$ denote the index of the pair of genes whose correlation is largest in absolute value, and let $c_{i,j}$ denote the value of the correlation for the $j$-th pair from the gene set $G_i$. The first characteristic distribution is over the set of correlation values given by $\bigcup_{i=1}^{k} C_i$ and the second is over the set of correlation values given by $\{c_{i,m_i} : 1 \leq i \leq k\}$. We refer to the first distribution as “distribution of all pairwise correlations in the family of gene sets $\mathcal{G}$” and to the second as “distribution of maximum pairwise correlations in the family of gene sets $\mathcal{G}$” (or for short, all and max). While the first distribution captures the typical behavior in a family of gene sets $\mathcal{G}$, the second emphasizes the extremes in $\mathcal{G}$.

The distribution of maximum pairwise correlations was used in the analysis of Chae et al. [S4] with $\mathcal{G}$ given by: pathways of specialized metabolism, pathways of non-specialized metabolism, specialized metabolism clusters, non-specialized metabolism clusters, neighboring genes, and random clusters. Since we are interested in comparing the behavior of non-specialized and specialized metabolic pathways, we determined and compared the distributions of all as well as maximum pairwise correlations in these two families of gene sets.
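The two distributions can be sketched as follows. The sketch assumes Pearson correlation (the correlation measure is not specified above) and uses hypothetical gene names and expression profiles; for the max distribution, it keeps the signed correlation of the pair with the largest absolute value.

```python
from itertools import combinations

import numpy as np

def pairwise_correlations(expr, genes):
    """Pearson correlations (an assumption) for all gene pairs of one gene set."""
    return np.array([np.corrcoef(expr[g], expr[h])[0, 1]
                     for g, h in combinations(genes, 2)])

def all_and_max(expr, gene_sets):
    """Return the values underlying the 'all' and the 'max' distributions."""
    all_vals, max_vals = [], []
    for genes in gene_sets:
        c = pairwise_correlations(expr, genes)
        if c.size == 0:
            continue  # gene sets with fewer than two genes contribute no pairs
        all_vals.extend(c)
        # Signed correlation of the pair that is largest in absolute value.
        max_vals.append(c[np.argmax(np.abs(c))])
    return np.array(all_vals), np.array(max_vals)

# Hypothetical expression profiles over four conditions.
expr = {
    "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 2, 4],
    "g4": [4, 3, 2, 1], "g5": [2, 1, 4, 3],
}
gene_sets = [["g1", "g2", "g3"], ["g4", "g5"]]
all_vals, max_vals = all_and_max(expr, gene_sets)
```

Here the first gene set contributes three correlation values and the second contributes one to the all distribution, whereas each gene set contributes exactly one value to the max distribution.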

Co-expression networks and assortativity statistic

Co-expression networks were generated by determining the correlation for all pairs of genes and applying a threshold chosen to ensure a false discovery rate (FDR) of 0.01. Each node in the network corresponds to a gene, and two genes are connected by an edge if the correlation between their profiles is above the determined threshold [S5]. Each node in the network can be categorized as involved in non-specialized metabolic pathways or in specialized metabolic pathways. Based on this information, each node $i$ receives a weight $x_i$, which corresponds to the number of neighboring genes that have the same category as node $i$. The assortativity statistic corresponds to the correlation of the weights given the network structure, and was shown to correspond to Moran’s $I$ under the assumption of network homogeneity [S6]:

$$I = \frac{n}{\sum_{i} \sum_{j} w_{ij}} \cdot \frac{\sum_{i} \sum_{j} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^{2}},$$

where $n$ is the number of nodes, $\bar{x}$ is the mean of the node weights, and $w_{ij}$ denotes the weight of the edge between nodes $i$ and $j$ and takes a value of zero if the network does not include this edge. Usually $w_{ij} = 1$ or $w_{ij} = 1/k_i$, where $k_i$ is the degree of node $i$, given by the number of its neighbors. The values for $I$ are between $-1$ and $1$ and can be interpreted in the same way as correlations.
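A minimal sketch of the assortativity statistic, written as the standard form of Moran's I on a network, is given below. The toy adjacency matrix, the category labels, and the function names are hypothetical; the optional row normalization corresponds to edge weights equal to the inverse of the node degree.

```python
import numpy as np

def same_category_weights(adj, category):
    """Node weight: number of neighbors that share the node's category."""
    adj = np.asarray(adj)
    category = np.asarray(category)
    same = category[:, None] == category[None, :]
    return (adj * same).sum(axis=1).astype(float)

def morans_i(adj, x, row_normalize=False):
    """Moran's I of node values x on a network given by its adjacency matrix."""
    W = np.asarray(adj, dtype=float)
    x = np.asarray(x, dtype=float)
    if row_normalize:
        # Edge weights 1/degree; isolated nodes keep an all-zero row.
        deg = W.sum(axis=1, keepdims=True)
        W = np.divide(W, deg, out=np.zeros_like(W), where=deg > 0)
    z = x - x.mean()
    denom = np.sum(z ** 2)
    if denom == 0:
        return float("nan")  # undefined when all node values are equal
    return float(len(x) / W.sum() * (z @ W @ z) / denom)

# Hypothetical network: edges 0-1, 0-2, 1-2, 2-3, 3-4.
adj = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1
category = ["non-specialized"] * 3 + ["specialized"] * 2
x = same_category_weights(adj, category)
I_toy = morans_i(adj, x)
```

In this toy network the nodes of the triangle each have two neighbors of their own category and the remaining two nodes have one, so neighboring nodes tend to carry similar weights and the statistic is positive.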

Supplementary references

S1 Duarte, N.C. et al. (2007) Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc. Natl. Acad. Sci. U. S. A. 104, 1777–82
S2 Lilliefors, H.W. (1967) On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown. J. Am. Stat. Assoc. 62, 399–402
S3 Robert, P. and Escoufier, Y. (1976) A Unifying Tool for Linear Multivariate Statistical Methods: The RV-Coefficient. J. R. Stat. Soc. Ser. C (Applied Stat.) 25, 257–265
S4 Chae, L. et al. (2014) Genomic signatures of specialized metabolism in plants. Science 344, 510–3
S5 Toubiana, D. et al. (2012) Network analysis: tackling complex data to study plant metabolism. Trends Biotechnol. 31, 29–36
S6 Kleessen, S. et al. (2013) Data integration through proximity-based networks provides biological principles of organization across scales. Plant Cell 25, 1917–27