Supplementary Materials

A comprehensive analysis of expression changes in a high replicate and open-source dataset of differentiating hiPSC-derived cardiomyocytes

Tanya Grancharova1,2, Kaytlyn A. Gerbin1,2, Alexander B. Rosenberg3,4, Charles M. Roco3,4,5, Joy Arakaki2, Colette DeLizzo2, Stephanie Q. Dinh2, Rory Donovan-Maiye2, Mathew Hirano3, Angelique Nelson2, Joyce Tang2, Julie A. Theriot2,6, Calysta Yan2, Vilas Menon7, Sean P. Palecek8, Georg Seelig3,9, Ruwanthi N. Gunawardane2*

1 Authors contributed equally
2 Allen Institute for Cell Science, Seattle, WA
3 Department of Electrical & Computer Engineering, University of Washington, Seattle, WA
4 Parse Biosciences, Seattle, WA
5 Department of Bioengineering, University of Washington, Seattle, WA
6 Department of Biology and Howard Hughes Medical Institute, University of Washington, Seattle, WA
7 Department of Neurology, Columbia University Irving Medical Center, New York, NY
8 Department of Chemical and Biological Engineering, University of Wisconsin - Madison, Madison, WI
9 Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA
* Corresponding author, [email protected]

Supplementary Figure S1

[Figure S1. Panels A-C: sample tables. Rows: differentiation experiments (Diff Exp1 to Diff Exp7) and undifferentiated samples (Undiff). Columns: protocol (Protocol 1, Protocol 2), time point (D0 (undifferentiated), D12, D24, D90, D14, D26), and cell line (WTC-11, TOMM20, TNNI1). Panels D-E: per-sample UMAPs grouped by time point (D: D0, D12, D24, D90; E: D12, D14, D24, D26).]

Supplementary Figure S1. Sample overview
A. Table summarizing all scRNA-seq samples. Each row represents one differentiation experiment (plates/samples set up on one day; see Fig. 5B). Numbers in the table indicate the number of wells that were collected. Wells were not pooled.
B. Table highlighting scRNA-seq samples used in Figs. 1-3, Supplementary Figs. S2-S4, focusing on gene expression during cardiomyocyte differentiation with Protocol 1 at D0, D12, D24 and D90.
C. Table highlighting scRNA-seq samples used in reproducibility analysis (Figs. 4-5, Supplementary Figs. S6-S7).
D. UMAPs showing each of the samples from D0, D12, D24, D90 Protocol 1 analysis (D0 n = 2 samples; D12 n = 9 samples; D24 n = 9 samples; D90 n = 5 samples). Each individual sample highlighted in green in B is shown in a separate UMAP in red. The UMAP is the same as in Fig. 1. Box colored by time point (colors as in Fig. 1D) is drawn around all samples from a given time point.
E. UMAPs showing each of the samples from D12, D14, D24, D26 reproducibility analysis (D12 n = 14 samples; D14 n = 14 samples; D24 n = 9 samples; D26 n = 11 samples). Each individual sample highlighted in green in C is shown in a separate UMAP in red. The UMAP is the same as in Fig. 4. Box colored by time point (colors as in Fig. 4C) is drawn around all samples from a given time point.

Supplementary Figure S2

[Figure S2. Panel A: heatmap annotated by cluster and day (D12, D24, D90), with a z-score color scale. Genes shown: FN1, COL3A1, COL1A2, COL1A1, POSTN, THBS2, OGN, GPC5, EYS, FSTL5, RP5-964N17.1, FLT1, KDR, EGFL7, ESM1, ELTD1, MFAP4, IGF2, H19, A2M, C7, GATA4, ZFPM2, KCNIP4, MYH6, ACTC1, TTN, TNNT2, GPR126, AFP, SERPINA1, LRP2, LINC00842, SBSPON, KCNMA1, PRTG, RP11-175E9.1, GABRP, GRHL2, CNTN1, DCT, TRPM3, ADAMTSL3, TYRP1, SERPINF1, ELN, SULF1, BNC2, PCDH9, GRIA4, RELN, RMST, CPAMD8, CENPF, MKI67, PDZRN4, CNTN5, CTNNA2. Panels B-E: UMAPs (axes UMAP1, UMAP2) colored by transcript abundance of CTNNA2 (B), OGN (C), TRPM3 (D), and FN1 (E), with clusters C5, C7, and C11 outlined.]

Supplementary Figure S2: Non-cardiomyocyte cell types identified in D12, D24, and D90 samples generated with differentiation Protocol 1
A. Heatmap showing the top differentially expressed genes from each pairwise comparison between non-cardiomyocyte clusters (non-cardiomyocytes were identified by the absence of marker TNNT2; see Fig. 1C-D). Heatmap includes cardiomyocyte cluster 7, proliferative cardiomyocyte cluster 12, and all differentiated non-cardiomyocyte clusters. Normalized transcript abundance was centered and scaled across each gene (z-score color scale to the right of heatmap; red = standard deviations above mean; blue = standard deviations below mean; white = mean; for visualization purposes, 4 was set as the maximum z-score, and z-scores > 4 were set to 4). The dendrogram is based on hierarchical clustering of genes. Each row corresponds to one cell.
B. UMAP from Fig. 1 colored by transcript abundance of CTNNA2, highlighting non-cardiomyocyte cluster C11 (orange shading in A and outline on UMAP). Increased red shading reflects higher levels of transcript.
C. UMAP from Fig. 1 colored by transcript abundance of OGN, highlighting non-cardiomyocyte cluster C5 (mint green shading in A and outline on UMAP).
D. UMAP from Fig. 1 colored by transcript abundance of TRPM3, highlighting non-cardiomyocyte cluster C11 (orange shading in A and outline on UMAP) and cardiomyocyte cluster C7 (purple shading in A and outline on UMAP).
E. UMAP from Fig. 1 colored by transcript abundance of FN1, highlighting non-cardiomyocyte cluster C5 (mint green shading in A and outline on UMAP).
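The heatmap scaling described for panel A (center and scale each gene, then cap z-scores at 4) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `scale_gene`, the toy abundances, and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev

def scale_gene(values, cap=4.0):
    """Center and scale one gene across cells, then cap z-scores at `cap`."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (an assumption)
    if sd == 0:
        # A gene with no variation carries no signal; show it at the mean.
        return [0.0 for _ in values]
    # Only the upper end is capped, as stated in the caption.
    return [min((v - mu) / sd, cap) for v in values]

# Hypothetical normalized abundances for one gene in 30 cells,
# with a single strongly expressing cell.
abundance = [0.0] * 29 + [10.0]
z = scale_gene(abundance)
print(max(z), min(z))
```

Capping keeps a handful of extreme cells from compressing the color scale for the rest of the heatmap.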

Supplementary Figure S3

[Figure S3. Panels A-B: heatmaps of feature-selected genes annotated by day (A: D12, D24; B: D24, D90), with z-score color scales. Panel C: dot plot of enriched GO categories for the comparisons D0v12, D12v24, and D24v90, with dot size indicating gene count. Panel D: transcript abundance distributions in clusters C2, C0, C1, and C3 for genes grouped as SMADs/TFs, Signaling, Ion channels, and Metabolic, with the max value listed per gene.]

Supplementary Figure S3: Gene changes in cardiomyocyte populations over time
A. Heatmap of the top 40 ranked genes from feature selection analysis between D12 and D24 cardiomyocytes. Genes that overlap between (A) and (B) are marked with an “*”.
B. Heatmap of the top 40 ranked genes from feature selection analysis between D24 and D90 cardiomyocytes. Genes that overlap between (A) and (B) are marked with an “*”.
C. Enriched gene ontology (GO) categories were identified for differentially expressed genes between Day 0 (C2) and Day 12 (C0), Day 12 (C0) and Day 24 (C1), and Day 24 (C1) and Day 90 (C3). Size of the circle represents the number of differentially expressed genes in each category. Top 10 enriched GO categories (ranked by multiple testing adjusted p-value) are shown for each pairwise comparison. See Supplementary Table S4.
D. Functional gene categories that change between D24 and D90 cardiomyocytes. Transcript abundance distributions are shown for C2 (D0), C0 (D12), C1 (D24), and C3 (D90). Max value = maximum value of log1p normalized counts; dot = median.

Supplementary Figure S4

[Figure S4. Panels A and E: number of times each gene was selected in 1000 bootstraps vs. log(lambda). Panels B and F: number of selected genes vs. log(lambda). Panel C: log2 fold change vs. mean expression, with selected genes labeled. Panel D: prediction accuracy vs. gene set size. Panel G: MYH7 vs. MYH6 transcript abundance (D12, D24). Panel H: IGF2 vs. A2M transcript abundance (D24, D90).]

Supplementary Figure S4: Feature selection analysis in D12 vs D24 and in D24 vs D90 cardiomyocytes
A. D12 vs. D24 bootstrapped penalized regression selection frequency for each gene across the sequence of regularization parameter (lambda) values. Selected genes are color coded based on size of the selected gene set as in Fig. 3A-B.
B. Number of selected genes from D12 vs. D24 feature selection at different values of lambda. See Fig. 3A-B.
C. D24 vs. D90 cardiomyocyte log2 fold change vs. mean expression. Top feature selected genes are color coded in red (top 3 genes), blue (top 12 genes), and purple (top 40 genes); all other genes are shown in gray.
D. Accuracy of predicting cell age (D24 vs. D90) in holdout data using feature selected gene sets of different sizes. The prediction accuracy for a set of highly variable genes between D24 and D90 is shown as the dot to the right of the x-axis break. Prediction accuracies for random gene sets of the same size are shown as box plots with outliers omitted (100 random samples for each gene set size).
E. D24 vs. D90 bootstrapped penalized regression selection frequency for each gene across the sequence of regularization parameter (lambda) values. Selected genes are color coded based on size of the selected gene set as in C.
F. Number of selected genes from D24 vs. D90 feature selection at different values of lambda.
G. Scatter plot of transcript abundances of the top two D12 vs. D24 feature selection genes, MYH6 and MYH7, in D12 and D24 cardiomyocytes.
H. Scatter plot of transcript abundances of the top two D24 vs. D90 feature selection genes, IGF2 and A2M, in D24 and D90 cardiomyocytes.

Supplementary Figure S5

[Figure S5. RNA FISH images at D18 and D30. Panel A: MYL7, MYL2, MEIS2, ESRRG, FABP3, TNNT2. Panel B: MYH6 and MYH7, COL2A1 and VCAN, MYL7 and MYL2.]

Supplementary Figure S5: RNA FISH in cardiomyocytes validates genes identified in scRNA-seq data analysis
A. RNA FISH transcripts are shown in white for the following genes at D18 and D30, as labeled: MYL7, MYL2, MEIS2, ESRRG, FABP3, TNNT2. Nuclei are labeled with DAPI (cyan). Scale bars = 20 µm.
B. RNA FISH transcripts are shown in representative fields of view at D18 and D30 for the following gene pairs: MYH6 (white) and MYH7 (magenta), COL2A1 (white) and VCAN (magenta), and MYL7 (white) and MYL2 (magenta). Nuclei are labeled with DAPI (cyan). Scale bars = 20 µm. Transcript abundance for all genes shown in this figure is quantified in Fig. 3D.

Supplementary Figure S6

[Figure S6. Panel A: probability density vs. % variance explained (R2) for Day, Differentiation experiment, Differentiation protocol, Day grouped (early, late), # Genes, # UMI, and Cell line. Panel B: pairwise scatter plots of Exp1 to Exp5 with Spearman correlations ranging from 0.961 to 0.977. Panels C-E: per-experiment UMAPs for D14 (C), D24 (D), and D26 (E); violin plots of MKI67, TNNT2, MYH6, and MYH7 by cluster with max values; cluster breakdown by differentiation experiment and cell line (WTC-11, TOMM20-mEGFP, TNNI1-mEGFP).]

Supplementary Figure S6: Sources of biological variability and differences in gene expression across differentiation experiments.
A. Distributions of gene-wise variance explained (coefficient of determination) for cell variables of interest (day of differentiation, differentiation experiment, differentiation protocol, cell line, # of genes detected, # of UMIs detected). Analysis was performed on the set of highly variable genes in D12, D14, D24, and D26 cells.
B. Scatter plots of population transcript abundances between differentiation experiments. Each point is a gene, and Spearman correlations are shown in the upper right. UMAPs on the right highlight cells from each differentiation experiment in red.
C. D14 cardiomyocytes (TNNT2+ cells) from all 5 differentiation experiments were independently clustered and visualized using UMAP. Each differentiation experiment is individually colored in red in the left column of UMAPs. Top right UMAP is colored by cluster. Violin plot shows distributions of marker genes across clusters with cluster breakdown by differentiation experiment and cell line shown below.
D. D24 cardiomyocytes (TNNT2+ cells) from all 5 differentiation experiments were independently clustered and visualized using UMAP. Each differentiation experiment is individually colored in red in the left column of UMAPs. D24 samples were not collected in differentiation experiments 1 and 2. Top right UMAP is colored by cluster. Violin plot shows distributions of marker genes across clusters with cluster breakdown by differentiation experiment and cell line shown below.
E. D26 cardiomyocytes (TNNT2+ cells) from all 5 differentiation experiments were independently clustered and visualized using UMAP. Each differentiation experiment is individually colored in red in the left column of UMAPs. D26 samples were not collected in differentiation experiments 1 and 2. Top right UMAP is colored by cluster. Violin plot shows distributions of marker genes across clusters with cluster breakdown by differentiation experiment and cell line shown below.
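As an illustration of the gene-wise variance explained in panel A, the sketch below computes R2 for one gene against a single categorical cell variable using the one-way ANOVA decomposition (between-group sum of squares over total sum of squares). The function name, the toy values, and the restriction to one categorical variable at a time are assumptions; the source does not state how R2 was computed.

```python
from statistics import mean

def percent_variance_explained(expression, labels):
    """R2 (in percent) of one gene's expression explained by a categorical variable."""
    grand_mean = mean(expression)
    ss_total = sum((x - grand_mean) ** 2 for x in expression)
    if ss_total == 0:
        return 0.0
    groups = {}
    for x, label in zip(expression, labels):
        groups.setdefault(label, []).append(x)
    # Between-group sum of squares: variation of group means around the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    return 100.0 * ss_between / ss_total

# Hypothetical abundances of one gene in four cells.
expression = [1.0, 2.0, 5.0, 6.0]
day = ["D12", "D12", "D24", "D24"]
experiment = ["Exp1", "Exp2", "Exp1", "Exp2"]

r2_day = percent_variance_explained(expression, day)
r2_experiment = percent_variance_explained(expression, experiment)
print(round(r2_day, 1), round(r2_experiment, 1))
```

Repeating this for every highly variable gene and every cell variable gives one distribution of R2 values per variable, which is the kind of curve plotted in panel A.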

Supplementary Figure S7

[Figure S7. Panel A: scatter plots of Protocol 2 vs. Protocol 1 transcript abundance for D12/D14 (Rs = 0.97) and D24/D26 (Rs = 0.96). Panel B: log2 fold change D14 vs. D26 against transcript abundance, with genes marked as selected or not selected. Panels C-F: for Exp1 to Exp4, UMAPs colored by day and cluster; violin plots of MKI67, TNNT2, MYH6, and MYH7 by cluster with max values; number of cells per cluster by day and by sample; transcript density of MYH6 and MYH7 vs. transcript abundance (D12 vs D14, D24 vs D26). Panel G: heatmap annotated by day (D12, D14, D24, D26) and cluster (0-4), with a z-score color scale.]

Supplementary Figure S7: Differences in gene expression between two directed differentiation protocols in D12, D14, D24, and D26 cardiomyocytes
A. Left: Scatter plot of population transcript abundances between Protocol 1 at D12 and Protocol 2 at D14. Right: Scatter plot of population transcript abundances between Protocol 1 at D24 and Protocol 2 at D26. Each point represents a gene. Spearman correlation is shown in lower right.
B. Log2 fold change vs. mean expression for Protocol 2 D14 vs D26 cardiomyocytes. Bootstrapped sparse regression analysis was performed as described for Fig. 3A, and selected genes are highlighted in black.
C. Experiment 1 cardiomyocytes (TNNT2+ cells) from all collected time points (D12 and D14) were independently clustered and visualized using UMAP to compare differentiation protocols. UMAPs color coded by day/protocol and cluster are shown. Group violin plot shows distributions of marker genes in clusters with cluster breakdown by day and sample shown below. Probability density distribution of MYH6 and MYH7 transcript abundance in all Exp1 D12 and D14 cardiomyocytes is shown at the bottom.
D. Experiment 2 cardiomyocytes (TNNT2+ cells) from all collected time points (D12 and D14) were independently clustered and visualized using UMAP to compare differentiation protocols. UMAPs color coded by day/protocol and cluster are shown. Group violin plot shows distributions of marker genes in clusters with cluster breakdown by day and sample shown below. Probability density distribution of MYH6 and MYH7 transcript abundance in all Exp2 D12 and D14 cardiomyocytes is shown at the bottom.
E. Experiment 3 cardiomyocytes (TNNT2+ cells) from all collected time points (D12, D14, D24, and D26) were independently clustered and visualized using UMAP to compare differentiation protocols. UMAPs color coded by day/protocol and cluster are shown. Group violin plot shows distributions of marker genes in clusters with cluster breakdown by day and sample shown below. Probability density distribution of MYH6 and MYH7 transcript abundance in all Exp3 D12, D14, D24, and D26 cardiomyocytes is shown at the bottom.
F. Experiment 4 cardiomyocytes (TNNT2+ cells) from all collected time points (D12, D14, D24, and D26) were independently clustered and visualized using UMAP to compare differentiation protocols. UMAPs color coded by day/protocol and cluster are shown. Group violin plot shows distributions of marker genes in clusters with cluster breakdown by day and sample shown below. Probability density distribution of MYH6 and MYH7 transcript abundance in all Exp4 D12, D14, D24, and D26 cardiomyocytes is shown at the bottom.
G. Heatmap showing top differentially expressed genes between non-proliferative cardiomyocyte clusters (c0-4) in differentiation Experiment 5. See Exp5 clusters in Fig. 5F-G. Normalized transcript abundance was centered and scaled across each row (z-score color scale below heatmap; red = standard deviations above mean; blue = standard deviations below mean; white = mean; for visualization purposes, 4 was set as the maximum z-score, and z-scores > 4 were set to 4). The dendrogram is based on hierarchical clustering of genes. Each column corresponds to one cell.
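The Spearman correlation reported in panel A compares gene ranks rather than raw abundances. A minimal sketch, assuming each list holds one population-level abundance per gene (the values below are made up): rank both lists, averaging ranks across ties, then take the Pearson correlation of the ranks.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-gene abundances under the two protocols.
protocol1 = [0.0, 0.0, 1.2, 3.5, 6.1]
protocol2 = [0.1, 0.1, 1.0, 4.0, 5.0]
rs = spearman(protocol1, protocol2)
print(rs)
```

Because only ranks enter the calculation, a monotone but nonlinear relationship between the two protocols still yields a correlation of 1.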

Supplementary Table S1: scRNA-seq sample metadata
Metadata for all samples included in the scRNA-seq data set. Sequencing_batch refers to the sequencing batch that samples belonged to (seq1 or seq2); cell_line indicates one of three cell lines: AICS0 (AICS-00 WTC-11), AICS11 (AICS-0011 cl.27 TOMM20-mEGFP), or AICS37 (AICS-0037 cl.172 TNNI1-mEGFP). Protocols are listed as Protocol 1 (small molecule), Protocol 1 (small molecule 7.5/7.5), and Protocol 2 (cytokine). See the “Cardiomyocyte differentiation using two protocols” section of the Materials and Methods for protocol details. Differentation_experiment refers to independent experiment setups, Exp1 through Exp7, which correspond to the differentiation_start. Differentiation_start refers to the date on which undifferentiated stem cells were seeded for cardiac differentiation, with the undifferentiated cell passage number and seeding density noted in 10x10^6 cells per well (M, million) in columns H and I, respectively. The harvest date and the day when spontaneous beating was observed are recorded (# of days after D0, when differentiation was initiated; see the Directed cardiomyocyte differentiation section of Materials and Methods; nr = not recorded). Percent_ctnt is the percent of the harvested population that expressed cardiac troponin T (cTnT) by flow cytometry analysis.

Supplementary Table S2: Differentially expressed genes between clusters
List of differentially expressed (DE) genes from pairwise cluster comparisons for D0 (C2), D12 (C0), D24 (C1), and D90 (C3) (see clusters in Figs. 1, 2, 3, and Supplementary Figure S3D). Each tab lists the DE genes between one pair of clusters. LogFC = log2 fold change between groups; logCPM = mean log2 of counts per million; LR = likelihood ratio statistic; PValue = p-value before multiple testing correction; FDR = multiple testing adjusted p-value, using the Benjamini-Hochberg method to control the false discovery rate; up = fraction of non-zero cells for gene in up-regulated cluster (cluster in pair with higher transcript abundance); down = fraction of non-zero cells for gene in down-regulated cluster (cluster in pair with lower transcript abundance).
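For readers unfamiliar with the FDR column, the Benjamini-Hochberg adjustment can be sketched as below. This is the standard procedure written from scratch for illustration; the function name and the example p-values are hypothetical and are not taken from the table.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
print(fdr)
```

Each raw p-value is multiplied by (number of tests / rank), and the running minimum ensures that a gene with a smaller raw p-value never receives a larger adjusted value.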

Supplementary Table S3: Feature selection analysis genes ranked by lambda
Genes selected from the bootstrapped sparse regression analysis are listed, ranked by the lambda value (regularization parameter) at which they were first selected. Each tab shows a different set (D12 vs D24, D24 vs D90, and D14 vs D26). See Fig. 3A-B, Supplementary Figs. S4 and S7B, and the scRNA-seq feature selection analysis section of Materials and Methods. Selected = gene with non-zero coefficient in all 1,000 bootstrap rounds at a given value of lambda.
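The "Selected" definition above can be illustrated with a small sketch of bootstrapped sparse regression. Everything specific here is an assumption made for illustration: the model (L1-penalized logistic regression fitted by proximal gradient descent), the toy data, the lambda grid, and the use of 20 bootstrap rounds instead of 1,000. The source only states that a gene counts as selected when its coefficient is non-zero in every bootstrap round at a given lambda.

```python
import math
import random

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_l1_logistic(X, y, lam, step=0.1, n_iter=60):
    """Gene weights of an L1-penalized logistic regression (intercept unpenalized)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w

def selection_frequency(X, y, lambdas, n_boot=20, seed=0):
    """Fraction of bootstrap rounds in which each gene has a non-zero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = {lam: [0] * p for lam in lambdas}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cells with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        for lam in lambdas:
            for j, wj in enumerate(fit_l1_logistic(Xb, yb, lam)):
                if wj != 0.0:
                    counts[lam][j] += 1
    return {lam: [c / n_boot for c in counts[lam]] for lam in lambdas}

# Toy data: 40 cells from two time points; gene 0 separates them, genes 1-3 are noise.
data_rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[3.0 * label + data_rng.uniform(-0.1, 0.1)] + [data_rng.random() for _ in range(3)]
     for label in y]

freq = selection_frequency(X, y, lambdas=[0.01, 0.1, 10.0])
# A gene is "selected" at a given lambda only if it is non-zero in every round.
selected = {lam: [j for j, f in enumerate(fs) if f == 1.0] for lam, fs in freq.items()}
print(selected)
```

Ranking genes by the largest lambda at which they are still selected gives an ordering of the kind listed in this table: genes that survive stronger penalties rank higher.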

Supplementary Table S4: Enriched gene ontology categories
Enriched gene ontology (GO) categories were identified for differentially expressed genes between time points Day 0 (C2) and Day 12 (C0), Day 12 (C0) and Day 24 (C1), and Day 24 (C1) and Day 90 (C3) (group; column J). Table shows the top 10 enriched GO categories ranked by adjusted p-value from each pairwise comparison. ID = GO accession; GeneRatio = # of differentially expressed genes that overlap background gene set and are annotated with given GO term / # of differentially expressed genes that overlap background gene set; BgRatio = # of genes from background gene set that are annotated with GO term / # of genes in background gene set; pvalue = hypergeometric p-value; p.adjust = Benjamini-Hochberg adjusted p-value; qvalue = false discovery rate; geneID = list of differentially expressed genes that overlap background gene set and are annotated with given GO term; Count = # of differentially expressed genes that overlap background gene set and are annotated with given GO term. This table is used to make the plot in Supplementary Fig. S3C.
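The GeneRatio, BgRatio, and pvalue columns are related through a hypergeometric over-representation test. The sketch below assumes the usual one-sided form, P(X >= Count); the function name and the small numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from math import comb

def enrichment_pvalue(count, n_de, n_annotated, n_background):
    """One-sided hypergeometric p-value, P(X >= count).

    count        : DE genes annotated with the GO term
    n_de         : DE genes that overlap the background gene set
    n_annotated  : background genes annotated with the GO term
    n_background : genes in the background gene set
    """
    upper = min(n_de, n_annotated)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_de - i)
        for i in range(count, upper + 1)
    )
    return tail / comb(n_background, n_de)

# Hypothetical example: 10 background genes, 4 annotated with the term,
# 3 DE genes, 2 of which carry the annotation.
count, n_de, n_annotated, n_background = 2, 3, 4, 10
gene_ratio = count / n_de               # GeneRatio = 2/3
bg_ratio = n_annotated / n_background   # BgRatio = 4/10
p = enrichment_pvalue(count, n_de, n_annotated, n_background)
print(gene_ratio, bg_ratio, p)
```

A term is enriched when GeneRatio is large relative to BgRatio; the p-value measures how surprising that excess would be if the DE genes were drawn at random from the background.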

Supplementary Table S5: Genes evaluated using RNA FISH
Genes evaluated using RNA FISH are listed, along with the protein name (Column B) and the NCBI accession number (Column C) of the sequence used to design each probe set. All probe sets can be ordered from Molecular Instruments/Molecular Technologies using the unique probe ID listed here (Column D).