Supplementary Figures

Figure S1. Top shows the ordering by single cell co-expression of receptors in endothelial cells of PanglaoDB annotated by DCS for Mus musculus. This is identical to Figure 2A. The heatmap shows similarity (Euclidean distance) of correlation of the receptors with all arranged according to the optimized dendrogram ordering.
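The similarity measure named in this caption (Euclidean distance between correlation profiles, followed by an optimized dendrogram ordering) can be sketched as below. This is a minimal illustration on random data, not the DECNEO implementation: the matrix `expr`, the helper `order_by_correlation_profile`, the Ward linkage, and the use of SciPy's `optimal_ordering` option as a stand-in for the "optimized" ordering are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist, squareform


def order_by_correlation_profile(expr):
    """Order rows of expr (hypothetical receptors x cells matrix).

    Each receptor is described by its row of the Pearson correlation matrix;
    receptors are then compared by the Euclidean distance between those rows.
    """
    corr = np.corrcoef(expr)                  # receptor-by-receptor Pearson correlation
    dist = pdist(corr, metric="euclidean")    # condensed distances between correlation rows
    # Ward linkage is an assumption here; optimal_ordering reorders the leaves
    # so that adjacent leaves are as similar as possible.
    tree = linkage(dist, method="ward", optimal_ordering=True)
    order = leaves_list(tree)
    # Square distance matrix rearranged to follow the dendrogram order.
    return order, squareform(dist)[np.ix_(order, order)]


rng = np.random.default_rng(0)
expr = rng.random((12, 200))                  # synthetic data: 12 receptors, 200 cells
order, dist_ordered = order_by_correlation_profile(expr)
```

Replacing `dist` with distances derived directly from `corr` would correspond to comparing receptors by their pairwise correlation alone, which is the contrast drawn in Figure S2.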


Figure S2. Top shows the ordering by single cell co-expression of receptors in endothelial cells of PanglaoDB annotated by DCS for Mus musculus. This is identical to Figure 2A. The heatmap shows correlation among the receptors arranged according to the optimized dendrogram ordering. This figure shows that the correlation among the receptors alone would not be sufficient to detect the comberon (highlighted with a red background).


Figure S3. Top shows the ordering of the dendrogram by Pearson correlation among receptors in endothelial cells of PanglaoDB annotated by DCS for Mus musculus. The heatmap shows Pearson correlation of the receptors arranged according to the optimized dendrogram ordering. The comberon (red background) can be found, but it contains fewer known angiogenesis receptors than with the main method (5 vs. 10).


Figure S4. Peak memberships for endothelial cells from PanglaoDB Mus musculus annotated by DCS. The “Combination of 3” measure shows 5 peaks in this realization. Genes belonging to the largest peak are highlighted in blue.


Figure S5. Empirical distribution of the maxima of 107 realizations of 100 randomizations of the dendrogram order of the “Combination of 3” measure, computed by the DECNEO main implementation on the DCS-annotated PanglaoDB Mus musculus datasets. This distribution is binned over all possible values and presented with blue markers. The generalized extreme value distribution, $F(x; \xi, \mu, \sigma) = \exp\left[-\left(1 - \xi (x - \mu)/\sigma\right)^{1/\xi}\right]$, is found to fit the empirical distribution, where the fitted parameters $\xi = 9.643 \cdot 10^{-2}$, $\mu = 2.523 \cdot 10^{-1}$ and $\sigma = 2.727 \cdot 10^{-2}$ determine the shape, location and scale of the distribution. The fitted distribution is calculated for the same bins as the empirical distribution and displayed with red markers. The horizontal axis in the first four panels is the maximum frequency in a realization. Panels show (A) the cumulative probability distribution function (CDF) in linear scale, (B) the CDF in logarithmic scale, (C) the discrete probability distribution function (PDF) in linear scale, and (D) the PDF in logarithmic scale. The numerical difference between the analytical and empirical distributions is quantified by the root mean square error (RMSE), indicated in each of the panels. Panel (E) shows the empirical distribution of the maximum values increasing with the number of realizations (horizontal axis). Panel (F) shows the model distribution of the maximum values increasing with the number of realizations (horizontal axis).
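As a rough check of the distribution in this caption, the sketch below evaluates the CDF in the form written there and compares it with `scipy.stats.genextreme`, whose shape parameter `c` follows the same sign convention as the caption's formula. The evaluation grid and the helper names `gev_cdf` and `rmse` are assumptions made for illustration; the empirical data from the figure are not reproduced here.

```python
import numpy as np
from scipy.stats import genextreme

# Fitted parameters quoted in the caption (shape, location, scale).
XI, MU, SIGMA = 9.643e-2, 2.523e-1, 2.727e-2


def gev_cdf(x, xi=XI, mu=MU, sigma=SIGMA):
    """CDF in the caption's form: exp[-(1 - xi (x - mu) / sigma)^(1 / xi)]."""
    x = np.asarray(x, dtype=float)
    # Clip at zero so that points above the upper support bound give CDF = 1.
    base = np.clip(1.0 - xi * (x - mu) / sigma, 0.0, None)
    return np.exp(-base ** (1.0 / xi))


def rmse(a, b):
    """Root mean square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


x = np.linspace(0.20, 0.50, 31)      # hypothetical grid of maximum frequencies
model = gev_cdf(x)
reference = genextreme.cdf(x, c=XI, loc=MU, scale=SIGMA)
print(rmse(model, reference))        # difference between the two evaluations
```

The same `rmse` helper could be applied to a binned empirical CDF and the model CDF evaluated on the same bins, which is the comparison the caption describes.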


Figure S6. Software module dependencies. DECNEO modules rely on the Python-based numpy, scipy, pandas, scikit-learn, matplotlib and other packages.


Figure S7. Ordering by single cell co-expression of receptors in endothelial cells of PanglaoDB annotated by Alona for: (A) Mus musculus, and (B) Homo sapiens. See Figure 2 for a detailed description of the dendrogram and information panels.


Figure S8. Alternative implementation used for validation. PanglaoDB annotated by Alona for: (A) Mus musculus, and (B) Homo sapiens. See Figure 2 for a more detailed description of the dendrogram and information panels.


Figure S9. Genes from multiple peaks of 100 bootstrap experiments generated from PanglaoDB Mus musculus data annotated by DCS and analyzed with the DECNEO main implementation for the “Combination of 3” measure. Color-coding indicates peak heights. White indicates that a gene is not in a given peak. Genes and peaks are ordered by agglomerative clustering with Ward's method. Dendrograms highlight the groupings of genes and peaks. Showcased clusters have their gene memberships annotated in the information boxes.
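The two-way ordering described in this caption can be sketched with SciPy's agglomerative clustering. The example is a minimal illustration on synthetic data: the `heights` matrix (genes by peaks, with 0 standing for "gene not in peak") and its dimensions are assumptions, not values from the figure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(1)
heights = rng.random((20, 8))        # hypothetical matrix: 20 genes x 8 peaks
heights[heights < 0.5] = 0.0         # 0 marks a gene that is absent from a peak

# Cluster rows (genes) and columns (peaks) separately with Ward's method,
# then read the leaf order from each dendrogram.
gene_order = leaves_list(linkage(heights, method="ward"))
peak_order = leaves_list(linkage(heights.T, method="ward"))

# Matrix rearranged so that similar genes and similar peaks sit next to each other.
ordered = heights[np.ix_(gene_order, peak_order)]
```

Plotting `ordered` as a heatmap, with zeros mapped to white, would give a layout of the kind the caption describes.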


Figure S10. Homo sapiens choroid single cell dataset enriched for endothelial cells from Voigt et al. (22). (A) t-SNE projection plot of the subset (13,700 cells that passed quality control of DCS) of 18,621 cells, color-coded by annotated cell type. Cells that did not pass quality control of DCS (4,921 cells, or 26.4%) are included in panel (B) only, where relative proportions of cells by annotated cell type are shown. (C) Same as (A) but color-coded by batch identifier. The largest cluster (which contains endothelial cells) is formed by cells from all batches, indicating low batch effects. Proportions of annotated cells in batches are shown in (D).


Figure S11. Metascape analysis of the 11 gene groups for multiple peaks.

Figure S12. Dendrogram ordered by single cell co-expression of receptors in endothelial cells of the independent validation dataset annotated by DCS. The procedure is similar to that shown in Figure 2 for the main dataset. The comberon is shown with a red background.


Figure S13. Second Metascape enrichment. Receptors of the top enrichment term of tissue-specific genes are removed to investigate the secondary set of enrichment terms for: (A) Mus musculus lung endothelial cells, and (B) Homo sapiens choroid endothelial cells.


Supplementary Tables

Table S1. Alternative choices for the gene co-expression metrics, clustering similarity metrics and linkage methods analyzed for PanglaoDB-DCS dataset. See file “Table S1.xlsx”.

Table S2. Validation of the comberon by splits of the PanglaoDB-DCS Mus musculus data. See file “Table S2.xlsx”.

Table S3. Comberon GO enrichment analysis filtered to show terms from the Biological Process class. See file “Table S3.xlsx”.

Table S4. Complete version of Table 1 with bootstrap frequencies for 879 receptors. See file “Table S4.xlsx”.

Table S5. Independent dataset composition. See file “Table S5.xlsx”.

Table S6. Pearson correlation of largest-peak bootstrap frequencies from Table S4. “Alona (alt.)” is the alternative implementation of DECNEO using the Alona endothelial cell annotation.

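Column labels give approach, species and combination, in the same order as the rows. Only the upper triangle is shown.

| Approach | Species | Combination | DCS, Mus musculus, 3 | DCS, Mus musculus, 4 | DCS, Homo sapiens, 3 | DCS, Homo sapiens, 4 | Alona, Mus musculus, 3 | Alona, Mus musculus, 4 | Alona, Homo sapiens, 3 | Alona, Homo sapiens, 4 | Alona (alt.), Mus musculus, 3 | Alona (alt.), Mus musculus, 4 | Alona (alt.), Homo sapiens, 3 | Alona (alt.), Homo sapiens, 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DCS | Mus musculus | 3 | 1.00 | 0.99 | 0.76 | 0.77 | 0.85 | 0.87 | 0.55 | 0.57 | 0.64 | 0.64 | 0.58 | 0.62 |
| DCS | Mus musculus | 4 | | 1.00 | 0.75 | 0.76 | 0.83 | 0.85 | 0.53 | 0.54 | 0.61 | 0.61 | 0.56 | 0.60 |
| DCS | Homo sapiens | 3 | | | 1.00 | 1.00 | 0.75 | 0.75 | 0.62 | 0.61 | 0.53 | 0.54 | 0.69 | 0.69 |
| DCS | Homo sapiens | 4 | | | | 1.00 | 0.75 | 0.76 | 0.63 | 0.62 | 0.54 | 0.54 | 0.69 | 0.69 |
| Alona | Mus musculus | 3 | | | | | 1.00 | 1.00 | 0.65 | 0.65 | 0.79 | 0.79 | 0.71 | 0.72 |
| Alona | Mus musculus | 4 | | | | | | 1.00 | 0.64 | 0.64 | 0.78 | 0.78 | 0.69 | 0.71 |
| Alona | Homo sapiens | 3 | | | | | | | 1.00 | 0.98 | 0.61 | 0.61 | 0.70 | 0.69 |
| Alona | Homo sapiens | 4 | | | | | | | | 1.00 | 0.60 | 0.61 | 0.69 | 0.68 |
| Alona (alt.) | Mus musculus | 3 | | | | | | | | | 1.00 | 1.00 | 0.67 | 0.67 |
| Alona (alt.) | Mus musculus | 4 | | | | | | | | | | 1.00 | 0.67 | 0.67 |
| Alona (alt.) | Homo sapiens | 3 | | | | | | | | | | | 1.00 | 0.98 |
| Alona (alt.) | Homo sapiens | 4 | | | | | | | | | | | | 1.00 |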


Table S7. Receptors were clustered from the multiple peak analysis into 11 receptor groups.

| Group | Average peak height | Frequency of group | Number of genes | Genes |
|---|---|---|---|---|
| P1 | 1.00 | 1.00 | 13 | ADGRF5, ADGRL4, CD93, CDH5, ENG, ESAM, FLT1, KDR, PECAM1, PTPRB, RAMP2, TEK, TIE1 |
| P6 | 0.54 | 0.56 | 18 | ADGRL2, APLNR, BAMBI, CAV1, CD151, CD40, CD59, FCGRT, FZD6, IFNGR1, ITGA6, LRP10, LRP8, PROCR, PTPN12, RTP4, STRA6, TNFRSF1A |
| P11 | 0.18 | 0.50 | 29 | ACKR2, ACVRL1, ADORA2A, ADRB2, BMPR2, CCRL2, CLEC2D, DYSF, EPHB4, GPR4, IGF1R, IL10RB, IL2RG, IL6ST, ITGA1, ITGA4, NRP1, OSMR, PEAR1, PLSCR4, PTPN18, PTPRM, SCARF1, SDC3, SIRPB1, STAB1, TGFBR2, TGFBR3, TWF2 |
| P9 | 0.37 | 0.50 | 22 | ADGRE5, ADGRG3, AQP1, CALCRL, CD300LG, CD36, CD55, CLEC14A, CLEC1A, CXCL16, FLT4, GPIHBP1, GPR182, ITGA9, KIT, NPR1, NRP2, PLAUR, PLXND1, RARG, ROBO4, THBD |
| P7 | 0.53 | 0.35 | 13 | ATP6AP2, CANX, CD9, HFE, IFNGR2, LAMP1, LAMP2, LIFR, NCSTN, PPARD, PTPN1, PTPRA, REEP5 |
| P3 | 0.72 | 0.32 | 12 | AXL, CD248, CD63, CSPG4, EDNRA, GPER1, IFITM1, NOTCH3, P2RY14, PDGFRB, PTH1R, S1PR3 |
| P10 | 0.27 | 0.30 | 45 | ADIPOR2, AMFR, BMPR1A, BUD31, CD1D, CDH2, CLU, COLEC12, DERL1, DERL2, EGFR, GHR, GPC4, GPR108, GPRC5C, IL11RA, IL15RA, IL17RC, IL6R, ITGB5, LDLR, LRP1, LSR, MAGED1, NECTIN1, NOTCH2, NR1D1, NR1H3, NR2F6, P2RX4, PGRMC1, PLXNB1, PLXNB2, PTPRD, PTPRF, RORA, RXRA, SDC1, SDC2, SDC4, SERPING1, SLC16A2, TNFRSF12A, TRAF4, ZNRF3 |
| P2 | 0.79 | 0.27 | 9 | ADIPOR1, APLP2, CD81, CR1, ITGB1, KDELR2, NR3C1, REEP3, S1PR1 |
| P5 | 0.61 | 0.22 | 16 | ABCA1, F11R, FAS, GPR146, IFNAR2, INSR, ITGA5, ITPR2, NR4A1, OCLN, PTPRG, SCARB1, SIGIRR, SLC40A1, TFRC, TNFRSF1B |
| P8 | 0.39 | 0.15 | 21 | C5AR1, CD14, CD300A, CD48, CD53, CLEC4C, CLEC6A, CSF1R, CX3CR1, FCGR1B, GPR183, GPR34, IGSF6, IL10RA, ITGAM, ITGB2, P2RY12, P2RY6, PTAFR, TREM2, TYROBP |
| P4 | 0.68 | 0.13 | 44 | ADCYAP1R1, ADGRB1, ADGRB2, ADGRL1, CELSR2, CNR1, CNTFR, CNTNAP2, EPHA5, FXYD6, FZD3, GABBR1, GABRA2, GABRB1, GABRB3, GABRG2, GFRA4, GLRB, GPR162, GPR85, GRIA1, GRIA2, GRIA4, GRIK2, GRIK5, GRIN2B, IGSF3, LINGO1, LRP11, NR2F1, NRXN1, NRXN2, NRXN3, NTRK2, NTRK3, PLXNA4, PTPRS, PTPRZ1, REEP2, ROBO2, SCN8A, SORL1, THRA, TUBB3 |

Table S8. The top-10 tissue-specific receptors of endothelial cells of PanglaoDB and choroid annotated by DCS. All data were processed as described in Figure 5. The right-most column is the minimum max-peak bootstrap count.

Homo Mus Mus Mus Mus Mus Mus Mus Homo Species sapiens musculus musculus musculus musculus musculus musculus musculus sapiens Subventri Olfactory Hypothal Bone Tissue Choroid cular Lung Kidney Heart Testis bulb amus marrow zone Max 0.90 0.89 0.73 0.95 0.98 0.96 0.99 0.76 0.95 PTPRG ACVRL1 IGF1R CALCRL NRP1 IGF1R APLP2 CAV1 CAV1 BMPR2 APLP2 ACVRL1 CD36 CD300LG FCGRT REEP3 ITGA6 FCGRT NOTCH1 ITGB1 SLC16A2 GPIHBP1 EDNRB ACVRL1 ROBO4 CD36 CALCRL PTPRM CAV1 TFRC THBD S1PR1 SLC40A1 GPR182 THBD BCAM Top-10 LIFR S1PR1 PTPRG ACVRL1 RAMP3 PAQR5 STAB1 S1PR1 TGFBR2 receptors IL6ST CD81 SLC40A1 CLEC14A CALCRL IL10RB CLEC2D GPIHBP1 HLA-F NRP1 CD151 PAQR5 CLEC1A THBD CD151 CD36 CLEC1A CLEC14A PLXND1 BMPR2 TGFBR2 S1PR1 CD81 OCLN PLXND1 CD300LG LPAR6 ITGA5 THBD CD151 CAV1 CD9 CAV1 FLT4 EPHB4 ROBO4 OSMR SLC16A4 CD59 BMPR2 GPIHBP1 CD59 MRC1 CLEC14A CLEC2B Min 0.75 0.58 0.52 0.78 0.57 0.57 0.86 0.54 0.68

Table S9. GO biological processes, KEGG and Reactome enrichment analysis of the 11 gene groups using PyIOmica (28). See file “Table S6.xlsx”.

Table S10. Analysis of endothelial cells from selected studies (same as in Figure 4). SRAs from the same tissue are combined into one group to find the average correlation among them (column “Inside”) and the average correlation between them and all other SRAs (column “Outside”). The table is ordered by column “Inside” in descending order. Since most “Inside” values are larger than the “Outside” values, we quantitatively observe that SRAs from similar tissues tend to group together.

| Species | Tissue | Studies | Inside | Outside |
|---|---|---|---|---|
| Mus musculus | Hypothalamus | 4 | 0.865 | 0.592 |
| Mus musculus | Olfactory bulb | 2 | 0.827 | 0.612 |
| Mus musculus | Lung | 5 | 0.817 | 0.579 |
| Mus musculus | Subventricular zone | 2 | 0.776 | 0.604 |
| Homo sapiens | Testis | 3 | 0.744 | 0.521 |
| Mus musculus | Kidney | 3 | 0.734 | 0.594 |
| Mus musculus | Bone marrow | 2 | 0.716 | 0.496 |
| Mus musculus | Heart | 2 | 0.311 | 0.470 |
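The “Inside” and “Outside” averages of Table S10 can be illustrated with a small sketch. It assumes a symmetric correlation matrix between SRAs and one tissue label per SRA; the function name `inside_outside`, the exclusion of the diagonal from the “Inside” average, and the toy numbers are assumptions for illustration and are not taken from the study.

```python
import numpy as np


def inside_outside(corr, labels, tissue):
    """Mean correlation within one tissue group and between it and all other SRAs."""
    labels = np.asarray(labels)
    members = labels == tissue
    # Correlations among SRAs of the same tissue, without the self-correlations.
    block = corr[np.ix_(members, members)]
    off_diagonal = ~np.eye(int(members.sum()), dtype=bool)
    inside = block[off_diagonal].mean()
    # Correlations between SRAs of this tissue and SRAs of every other tissue.
    outside = corr[np.ix_(members, ~members)].mean()
    return float(inside), float(outside)


# Toy example: four hypothetical SRAs from two tissues.
corr = np.array([
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.1],
    [0.3, 0.4, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
])
labels = ["Lung", "Lung", "Heart", "Heart"]
lung_inside, lung_outside = inside_outside(corr, labels, "Lung")
print(lung_inside, lung_outside)
```

A group whose “Inside” value exceeds its “Outside” value is more similar to itself than to the rest, which is the pattern the table reports for most tissues.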

Table S11. Metascape enrichment of tissue-specific genes (excluding the 13 genes relevant for all endothelial cells, shown at the top of Table 1) for Homo sapiens choroid endothelial cells and Mus musculus lung endothelial cells.

Lung genes: ACVRL1, ADGRE5, ADGRL2, APLP2, AQP1, BCAM, BMPR2, CALCRL, CAV1, CD151, CD36, CD9, CLEC14A, CLEC1A, EDNRB, FAS, GPIHBP1, ITGA1, ITGA6, ITGB1, KIT, LAMP1, LYVE1, NPR3, NRP1, S1PR1, THBD

Choroid genes: ADIPOR2, BMPR2, CALCRL, CD46, CFH, F11R, FZD4, IGF2R, IL4R, IL6ST, ITGA1, ITGA10, ITGA5, ITGAV, KIDINS220, LIFR, LPAR6, LRP11, NOTCH1, NRP1, OSMR, PLSCR4, PLXND1, PTPRG, PTPRM, ROBO4, S1PR1, TGFBR2

| Tissue | Term | Description | Log(q-value) | Found genes | Total genes |
|---|---|---|---|---|---|
| Lung | GO:0007160 | cell-matrix adhesion | -6.086 | 8 | 231 |
| Lung | GO:0050900 | leukocyte migration | -5.213 | 9 | 518 |
| Lung | GO:0019935 | cyclic-nucleotide-mediated signaling | -5.182 | 7 | 224 |
| Lung | GO:0048514 | blood vessel morphogenesis | -4.400 | 9 | 707 |
| Lung | GO:0009611 | response to wounding | -4.400 | 9 | 708 |
| Lung | GO:0051345 | positive regulation of hydrolase activity | -4.120 | 9 | 783 |
| Choroid | GO:0048514 | blood vessel morphogenesis | -9.112 | 13 | 707 |
| Choroid | GO:0030155 | regulation of cell adhesion | -9.112 | 13 | 738 |
| Choroid | GO:0045446 | endothelial cell differentiation | -5.029 | 6 | 122 |
| Choroid | M47 | PID INTEGRIN CS PATHWAY | -4.634 | 4 | 26 |
| Choroid | GO:0038165 | oncostatin-M-mediated signaling pathway | -4.414 | 3 | 7 |
| Choroid | hsa05200 | Pathways in cancer | -4.237 | 8 | 575 |

Table S12. The comparison of the bootstrap frequencies of the independent validation dataset and the original PanglaoDB-DCS dataset. See file “Table S12.xlsx”.
