Supplementary material for SINCERITIES: Inferring gene regulatory network from time-stamped cross-sectional single cell transcriptional expression data Nan Papili Gao1,2, S.M. Minhaz Ud-Dean3 and Rudiyanto Gunawan1,2

1 Institute for Chemical and Bioengineering, ETH Zurich, Zurich, Switzerland 2 Swiss Institute of Bioinformatics, Lausanne, Switzerland 3 Werner Siemens Imaging Center, Department of Preclinical Imaging and Radiopharmacy, Eberhard Karls University Tuebingen, Tuebingen, Germany

Distribution distances In addition to the Kolmogorov-Smirnov distance, we also evaluated the use of the Cramér–von Mises (CM) criterion [1] as the DD metric. The CM criterion is given by:

(S1)

where denotes the CM criterion of gene j in the time window Δtl, and denotes the cumulative distribution function of gene j expression (Ej) at time point tl (l = 1, 2, …, n−1). In addition to the CM criterion, we also tested the Anderson-Darling (AD) criterion [2], which is given by:

(S2)

The CM and AD criteria provide more sensitive measures of the global change in the distribution than the KS distance [3]. In contrast, the KS distance better reflects the shift of the center of the distribution. We evaluated the performance of SINCERITIES using the CM criterion using the in silico single cell dataset (see Methods). Table S2 below report the AUROCs and AUPRs for the CM and AD criteria, showing that these criteria could provide a comparable performance to the KS distance. However, for the THP-1 differentiation dataset, both CM and AD criteria gave much poorer AUROC and AUPR values than the KS distance (AUROC: 0.54 for CM, 0.56 for AD vs. 0.70 for KS, AUPR: 0.20 for CM, 0.22 for AD vs. 0.33 for KS). The reason for the poor performance of AD and CM might have to do with the reference TF network of THP-1, which came from population-average transcriptional data of RNAi experiments. Since the KS is a more sensitive metric of the mean-shift than AD and CM, the GRN prediction from SINCERITIES using KS agreed better with the RNAi experiments.

Table S2 Performance comparison for SINCERITIES with KS, AD or CM distance on in silico data. Regularization methods: Lasso and Elastic-net While we recommended using ridge regression, SINCERITIES could also be implemented using two additional regularization strategies, namely Lasso (Least Absolute Shrinkage and Selection Operator) and elastic-net. The three methods differ only in the penalty function used in the least square objective function in Eq. (4) in the main text. In contrast to ridge regression, the Lasso regularization enforces an L1 norm penalty in the least square objective function, as follows

(S3)

Meanwhile, the elastic net uses a penalty function that combines those from the Lasso and ridge regression, with the following least square objective function:

(S4)

Setting γ to 1 would give the Lasso regularization, while setting γ to 0 would give the ridge regression. Here, we again used LOOCV to determine the parameters  and γ. In the case of elastic net, we performed LOOCV to obtain the optimal  value for discrete values of γ between 0.1 and 0.9 with a step size of 0.1 (i.e. γ = 0.1, 0.2, …, 0.9). The final optimal combination of  and γ again corresponded to the minimum cross validation error among the LOOCV runs. Table S3 reports the performance of SINCERITIES using the KS distance using the Lasso and elastic net regularization strategies for the in silico single cell dataset. For 10-gene gold standard GRNs, the ridge regression gave significantly higher AUROCs and AUPRs (p-value<0.05, paired t- tests) than the Lasso and elastic net.

Table S3 Performance comparison for SINCERITIES using KS distance with Ridge, Elastic-net, and Lasso on in silico data.

Supplementary figures

DIFFUSION MAPS

4

2 3

C 0 D - 5 - 2

0 - 4 3 2 1 0 5 D C 2 - 1 - 2 D C 1

Figure S1 Three-dimensional projection of in silico single cell data (10-gene Network E. coli 1) using diffusion map [4].

A B 1 1

0 . 9 0 . 9

0 . 8 0 . 8

e 0 . 7 e 0 . 7 m m i i t t

g g n n i i l l

p 0 . 6 p 0 . 6 m m a a s s

l l a a n n i 0 . 5 i 0 . 5 g g i i r r O O

s s v v

t t

s 0 . 4 s 0 . 4 u u l l e e d d n n a 0 . 3 a 0 . 3 W W

0 . 2 0 . 2

0 . 1 0 . 1

0 0 0 1 0 0 2 0 0 3 0 0 4 0 0 5 0 0 6 0 0 7 0 0 8 0 0 0 1 0 0 2 0 0 3 0 0 4 0 0 5 0 0 6 0 0 7 0 0 8 0 0 C e l l s C e l l s

Figure S2 Comparison between Wanderlust [5] pseudotime and the true cell sampling time points for in silico single cell data (10-gene Network E. coli 1). Wanderlust algorithm is applied to (A) the original dataset in high- dimensional space, and (B) low-dimensional (3D) diffusion map projection data. 0 h 1 h TSNE PCA 6 h 60 DIFFUSION MAPS 1 2 h 2 4 h 4 8 h 7 2 h 40 9 6 h 6

20 5 4 2

0 ) T 2 0 %

3 1 C 1

D (

-20 0 3 C

P - 5 1 0 - 2 -40 0 - 5 - 1 0 - 1 0 0 - 1 0 - 4 - 5 4 - 2 0 -60 2 5 0 -50 -40 -30 -20 -10 0 10 20 30 40 0 5 - 2 - 3 0 P C 2 ( 1 1 % ) T 1 - 4 1 0 D C 2 1 0 D C 1 P C 1 ( 1 5 % )

Figure S3 Low dimensional projection of THP-1 human myeloid leukemia cell differentiation data using (a) principal component analysis (PCA), (b) t-Distributed Stochastic Neighbor Embedding (t-SNE) [6] and (c) diffusion map analysis.

A B 1 1

0 . 9 0 . 9

0 . 8 0 . 8

0 . 7 0 . 7 ) ) e e m m i i t t

l 0 . 6 l 0 . 6 a a n n i i g g i i r r O O 0 . 5 0 . 5 s s v v

t t s s u u l l e e

d 0 . 4 d 0 . 4 n n a a W W 0 . 3 0 . 3

0 . 2 0 . 2

0 . 1 0 . 1

0 0 0 1 0 0 2 0 0 3 0 0 4 0 0 5 0 0 6 0 0 7 0 0 8 0 0 9 0 0 1 0 0 0 0 1 0 0 2 0 0 3 0 0 4 0 0 5 0 0 6 0 0 7 0 0 8 0 0 9 0 0 1 0 0 0 Cells Cells

Figure S4 Comparison between Wanderlust pseudotime and the true cell sampling time points of THP-1 human myeloid leukemia cell differentiation. Wanderlust algorithm is applied to (A) the original dataset in high- dimensional space, and (B) low-dimensional diffusion map projection data.

References 1. Anderson TW: On the Distribution of the Two-Sample Cramer-von Mises Criterion. Ann Math Stat 1962, 33:1148–1159. 2. Anderson TW, Darling DA: Asymptotic Theory of Certain “Goodness of Fit” Criteria Based on Stochastic Processes. Ann Math Stat 1952, 23:193–212. 3. Stephens MA: Use of the Kolmogorov-Smirnov, Cramer-Von Mises and Related Statistics Without Extensive Tables. J R Stat Soc Ser B 1970, 32:115–122. 4. Coifman RR, Lafon S: Diffusion maps. Appl Comput Harmon Anal 2006, 21:5–30. 5. Bendall SC, Davis KL, Amir E-AD, Tadmor MD, Simonds EF, Chen TJ, Shenfeld DK, Nolan GP, Pe’er D: Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development. Cell 2014, 157:714–25. 6. Van Der Maaten L, Hinton G: Visualizing Data using t-SNE. J Mach Learn Res 2008, 9:2579–2605.