Single-Cell Transcriptome Sequencing of 18,787 Human Induced Pluripotent Stem Cells Identifies Differentially Primed Subpopulations
Quan H. Nguyen¹*, Samuel W. Lukowski¹*, Han Sheng Chiu¹, Anne Senabouth¹, Timothy J. C. Bruxner¹, Angelika N. Christ¹, Nathan J. Palpant¹*, Joseph E. Powell¹,²*

¹ Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
² Queensland Brain Institute, University of Queensland, Brisbane, Australia
* These authors contributed equally

Supplementary Figures and Tables

Table S1. Summary statistics for sequencing and mapping data of five samples

Sample    Number    Mean reads  Median genes  Total genes  Median UMIs  Total number  Percent       Remaining cells
          of cells  per cell    per cell      detected     per cell     of reads      mapped reads  post filtering
Sample 1  2,779     71,256      3,662         20,356       17,769       198,022,303   62.6          2,426
Sample 2  545       318,909     4,547         18,557       27,341       173,805,798   63.4          424
Sample 3  3,103     56,459      2,944         19,261       11,214       175,193,813   71.8          2,906
Sample 4  9,192     38,637      2,016         21,491       5,963        355,158,219   68.9          8,294
Sample 5  4,863     26,471      2,804         20,235       10,150       128,728,889   67.2          4,737

Table S2. Summary of the cell and gene filtering process

Procedure                                                                        Count
Cells removed by library size (outside 3 x MAD range) [a]                        0
Cells removed by number of detected genes (outside 3 x MAD range)                77
Cells removed by reads mapped to mitochondrial genes (outside 3 x MAD range)     1,559
Cells removed by reads mapped to ribosomal genes (outside 3 x MAD range)         102
Cells removed by reads mapped to mitochondrial genes (> 20% of total reads) [b]  0
Cells removed by reads mapped to ribosomal genes (> 50% of total reads)          0
Genes removed by number of expressed cells (< 1% of total cells) [c]             16,674
Remaining cells post filtering                                                   18,787
Remaining genes post filtering                                                   16,064

[a] MAD stands for median absolute deviation. A cell whose library size, number of detected genes, or reads mapped to mitochondrial or ribosomal genes fell outside the 3 x MAD range was removed.
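The 3 x MAD rule described in footnote a can be illustrated with a short sketch. It assumes a two-sided cut-off around the median and the raw (unscaled) MAD; the source does not state whether a scaling constant was applied or which software performed the filtering. The function name `mad_outlier_mask` and the per-cell gene counts are hypothetical and serve only as an illustration.

```python
from statistics import median


def mad_outlier_mask(values, n_mads=3.0):
    """Flag values lying outside median +/- n_mads * MAD (raw, unscaled MAD)."""
    values = list(values)
    med = median(values)
    # MAD: median of the absolute deviations from the median.
    mad = median([abs(v - med) for v in values])
    lower, upper = med - n_mads * mad, med + n_mads * mad
    return [v < lower or v > upper for v in values]


# Hypothetical numbers of detected genes for six cells (not data from the study).
genes_per_cell = [3600, 3700, 3650, 3500, 3800, 900]
is_outlier = mad_outlier_mask(genes_per_cell)
kept = [g for g, out in zip(genes_per_cell, is_outlier) if not out]
print(kept)  # [3600, 3700, 3650, 3500, 3800]
```

In this toy input the median is 3,625 and the MAD is 100, so the accepted range is 3,325 to 3,925 and only the cell with 900 detected genes is flagged.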
[b] A second round of filtering used hard cut-offs for mitochondrial or ribosomal genes.
[c] A gene that was detected in fewer than 1% of the total cells was removed.

Table S3. Cell number distribution across subpopulations [a]

Sample    Subpopulation 1  Subpopulation 2  Subpopulation 3  Subpopulation 4  Total
Sample 1  1,118 (46.1)     1,306 (53.8)     2 (0.1)          0 (0)            2,426
Sample 2  215 (50.7)       203 (47.9)       4 (0.9)          2 (0.5)          424
Sample 3  1,410 (48.5)     1,024 (35.2)     294 (10.1)       178 (6.1)        2,906
Sample 4  4,095 (49.4)     3,975 (47.9)     211 (2.5)        13 (0.2)         8,294
Sample 5  2,245 (47.4)     2,469 (52.1)     15 (0.3)         8 (0.2)          4,737
Total     9,083 (48.3)     8,977 (47.8)     526 (2.8)        201 (1.1)        18,787

[a] Numbers in parentheses are the percentages of cells in each subpopulation within each sample.

Table S4. Number of overexpressed genes between subpopulations [a]

n_ij       Subpopulation 1  Subpopulation 2  Subpopulation 3  Subpopulation 4
Cluster 1  0                16               651              2,366
Cluster 2  33               0                546              2,433
Cluster 3  90               27               0                2,093
Cluster 4  111              80               441              0

[a] n_ij is the number of genes expressed at a higher level in the subpopulation in row i than in the subpopulation in column j.

Table S5.
Expression of known pluripotent and lineage-primed markers

                   Average expression in each subpopulation
Known markers [a]  Subpopulation 1  Subpopulation 2  Subpopulation 3  Subpopulation 4
POU5F1             1845.07 [b]      1933.63          1446.8           1258.13
DNMT3B             302.71           342.87           182.57           174.65
SOX2               280.83           299.64           235.07           219.07
NODAL              210.43           211.83           129.12           36.06
UTF1               178.16           201.13           64.28            44.08
LIN28A             169.93           189.92           111.54           113.18
LEFTY1             160.57           168.91           102.02           18.58
GDF3               98.25            85.41            68.12            23.07
SDC2               87.92            80.54            64.37            53.14
NANOG              42.48            46.83            25.09            22.94
LIN28B             38.2             42.88            27.31            11.72
HESX1              34.19            26.61            122.18           187.97
LCK                32.87            33.29            31.28            25.34
LEFTY2             26.79            29.41            16.14            13.64
ZFP42              26.43            19.67            137.86           231.49
OTX2               24.05            25.71            43.32            72.13
DPPA2              23.98            22.13            23.57            0
IDO1               20.11            15.16            53.78            69.5
HHEX               17.25            17.04            12.57            0
NR5A2              13.91            16.44            7.16             0
DPPA5              12.59            11.41            8.09             9.08
PDGFRA             11.67            12.01            13.97            3
MIXL1              10.72            11.93            2.93             6.04
HEY1               9.54             11               3.81             3.92
TRIM22             9.16             8.23             5.6              22.13
COL2A1             8.01             10               2.96             3.62
MAP2               6.51             7.93             8.46             0
KLF4               5.99             7                5.79             0
FOXA2              4.23             5.37             1.79             0
EOMES              3.85             4.97             3.82             0
DMBX1              2.67             2.81             0.94             0
CXCL5              2.31             2.4              1.52             0
IL6ST              1.58             1.41             0.98             0
TBX3               1.49             1.92             0                0
CPLX2              1.47             2                2.53             0
PAPLN              1.42             1.26             0                0
RGS4               1.4              2.19             1.95             3.83
TFCP2L1            1.34             1.74             0                0
PAX6               1.27             0.95             1.25             0
CLDN1              1.22             1.73             0                3.4
DRD4               1.17             1.59             0                0
OLFM3              1.06             0.94             1.12             0
NPPB               0.89             0.76             0                0
GATA4              0.87             0.96             0                0
TM4SF1             0.79             0.42             2.52             0
FOXP2              0.71             0.66             0                0
KLF5               0.68             0.81             0                0
CDH9               0.55             0.38             0                0
SST                0.41             0.5              0                0
SNAI2              0.39             0.46             2.37             0
ALOX5              0.33             0.58             0                0
T                  0.31             0.27             0                0
FGF4               0.26             0.41             0                0
NOS2               0.23             0.26             1.63             0
NKX2-5             0.19             0.27             0                0
MYO3B              0.18             0.3              0                0

[a] Markers are ordered from high to low expression values in Subpopulation 1.

Table S6.
Number of cells expressing known pluripotent and lineage-primed markers across subpopulations

Gene symbol [a]  Subpopulation 1 [b,c]  Subpopulation 2  Subpopulation 3  Subpopulation 4  Total cells
POU5F1           8979 (98.9)            8909 (99.2)      465 (88.4)       162 (80.6)       18515 (98.6)
DNMT3B           6412 (70.6)            6689 (74.5)      148 (28.1)       43 (21.4)        13292 (70.8)
SOX2             6159 (67.8)            6235 (69.5)      188 (35.7)       48 (23.9)        12630 (67.2)
UTF1             4579 (50.4)            4889 (54.5)      69 (13.1)        11 (5.5)         9548 (50.8)
NANOG            1722 (19.0)            1813 (20.2)      27 (5.1)         6 (3.0)          3568 (19.0)
LCK              1326 (14.6)            1297 (14.4)      32 (6.1)         8 (4.0)          2663 (14.2)
OTX2             1081 (11.9)            1042 (11.6)      37 (7.0)         18 (9.0)         2178 (11.6)
HESX1            1100 (12.1)            895 (10.0)       88 (16.7)        40 (19.9)        2123 (11.3)
DPPA2            940 (10.3)             896 (10.0)       29 (5.5)         0 (0.0)          1865 (9.9)
IDO1             788 (8.7)              588 (6.6)        47 (8.9)         19 (9.5)         1442 (7.7)
ZFP42            732 (8.1)              529 (5.9)        99 (18.8)        52 (25.9)        1412 (7.5)
DPPA5            531 (5.8)              476 (5.3)        9 (1.7)          3 (1.5)          1019 (5.4)
TRIM22           401 (4.4)              373 (4.2)        8 (1.5)          4 (2.0)          786 (4.2)
KLF4             296 (3.3)              337 (3.8)        6 (1.1)          0 (0.0)          639 (3.4)
FOXA2            218 (2.4)              250 (2.8)        3 (0.6)          0 (0.0)          471 (2.5)
CXCL5            105 (1.2)              99 (1.1)         2 (0.4)          0 (0.0)          206 (1.1)
TBX3             74 (0.8)               85 (0.9)         0 (0.0)          0 (0.0)          159 (0.8)
TFCP2L1          66 (0.7)               78 (0.9)         0 (0.0)          0 (0.0)          144 (0.8)
KLF5             32 (0.4)               39 (0.4)         0 (0.0)          0 (0.0)          71 (0.4)

[a] Genes are ordered from high to low by the total number of cells expressing each marker.
[b] Expression values are in counts per million (cpm). Each cell had a similar total expression of approximately one million cpm.
[c] For each subpopulation, the absolute cell numbers are shown along with the percentage (%) of the subpopulation expressing that gene in parentheses.

Table S7.
Functional enrichment analysis of differentially expressed (DE) genes for cells in subpopulation one compared to cells in the remaining subpopulations

Reactome pathway                                                  Total genes in pathway  Genes in gene set  P-value      FDR          Hit genes
mRNA Splicing - Major Pathway                                     127                     8                  1.5 x 10^-6  1.0 x 10^-4  YBX1, LSM3, FUS, POLR2L, POLR2I, HNRNPA3, HSPA8, SRSF2
RNA polymerase III transcription initiation from type 2 promoter  27                      3                  7.4 x 10^-6  0.02         POLR2L, GTF3C6, POLR3H
Golgi associated vesicle biogenesis                               27                      3                  7.4 x 10^-6  0.02         FTL, FTH1, HSPA8
Respiratory electron transport, ATP synthesis by chemiosmotic     117                     5                  9.4 x 10^-6  0.02         ATP5E, UQCR10, UQCRH, COX7C, NDUFA2
  coupling, and heat production by uncoupling proteins.
RNA polymerase II transcription elongation                        44                      3                  3.0 x 10^-4  0.04         POLR2L, POLR2I, SSRP1
Golgi associated vesicle biogenesis                               27                      3                  7.4 x 10^-4  0.02         FTL, FTH1, HSPA8
RNA polymerase II transcription                                   108                     4                  5.2 x 10^-3  0.06         POLR2L, POLR2I, SSRP1, SRSF2
Cellular senescence                                               125                     2                  0.2          0.2          HMGA1, ANAPC15

Table S8.