Integrated Genomic Analysis of Diverse Induced Pluripotent Stem Cells from the Progenitor Cell Biology Consortium

The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters

Citation Salomonis, N., P. Dexheimer, L. Omberg, R. Schroll, S. Bush, J. Huo, L. Schriml, et al. 2016. “Integrated Genomic Analysis of Diverse Induced Pluripotent Stem Cells from the Progenitor Cell Biology Consortium.” Stem Cell Reports 7 (1): 110-125. doi:10.1016/ j.stemcr.2016.05.006. http://dx.doi.org/10.1016/j.stemcr.2016.05.006.

Published Version doi:10.1016/j.stemcr.2016.05.006

Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:27822302

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA Stem Cell Reports Resource

Integrated Genomic Analysis of Diverse Induced Pluripotent Stem Cells from the Progenitor Cell Biology Consortium

Nathan Salomonis,1,16 Phillip J. Dexheimer,1,16 Larsson Omberg,2 Robin Schroll,3 Stacy Bush,3 Jeffrey Huo,4,5 Lynn Schriml,6 Shannan Ho Sui,7,8 Mehdi Keddache,9 Christopher Mayhew,10 Shiva Kumar Shanmukhappa,11 James Wells,10 Kenneth Daily,2 Shane Hubler,2 Yuliang Wang,12 Elias Zambidis,4,5 Adam Margolin,2,12 Winston Hide,7,8,13 Antonis K. Hatzopoulos,14 Punam Malik,3 Jose A. Cancelas,3,15 Bruce J. Aronow,1,17 and Carolyn Lutzko3,15,17,* 1Department of Biomedical Informatics, Cincinnati Children’s Hospital, Cincinnati, OH 45229, USA 2Sage Bionetworks, Seattle, WA 98109, USA 3Division of Experimental Hematology and Cancer Biology, Cincinnati Children’s Hospital, Cincinnati, OH 45229, USA 4Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA 5Division of Pediatric Oncology, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21205, USA 6Department of Epidemiology and Public Health, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA 7Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA 8Harvard Stem Cell Institute, Cambridge, MA 02138, USA 9Division of Human Genetics 10Division of Developmental Biology 11Division of Pathology Cincinnati Children’s Hospital, Cincinnati, OH 45229, USA 12Computational Biology Program, Oregon Health & Science University, Portland, OR 97239, USA 13Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield S10 2HQ, UK 14Division of Cardiovascular Medicine, Departments of Medicine and Cell and Developmental Biology, Vanderbilt University, Nashville, TN 37232, USA 15Hoxworth Blood Center, University of Cincinnati, Cincinnati, OH 45229, USA 16Co-first author 17Co-senior author *Correspondence: [email protected] http://dx.doi.org/10.1016/j.stemcr.2016.05.006

SUMMARY

The rigorous characterization of distinct induced pluripotent stem cells (iPSC) derived from multiple reprogramming technologies, somatic sources, and donors is required to understand potential sources of variability and downstream potential. To achieve this goal, the Progenitor Cell Biology Consortium performed comprehensive experimental and genomic analyses of 58 iPSC from ten laboratories generated using a variety of reprogramming , vectors, and cells. Associated global molecular characterization studies identified func- tionally informative correlations in expression, DNA methylation, and/or copy-number variation among key developmental and oncogenic regulators as a result of donor, sex, line stability, reprogramming technology, and cell of origin. Furthermore, X- inactivation in PSC produced highly correlated differences in teratoma-lineage staining and regulator expression upon differentiation. All experimental results, and raw, processed, and metadata from these analyses, including powerful tools, are interactively accessible from a new online portal at https://www.synapse.org to serve as a reusable resource for the stem cell community.

INTRODUCTION making it unclear whether all methodologies produce iPSC with a similar quality and stability. Pluripotent stem cells (PSC) have been used to study hu- A variety of studies have compared the expression pro- man development, model disease, and generate cellular files, pluripotentiality, and genetic and epigenetic stability tools for regenerative medicine. Human embryonic stem of hESC and iPSC including lines generated using different cells (hESC) have been considered the functional, genetic, strategies, distinct parental somatic cell types, or reprog- and epigenetic gold standard in the field (Thomson et al., ramming methods (Bock et al., 2011; International Stem 1998). Methods of somatic cell reprogramming to generate Cell Initiative et al., 2007; Mu¨ller et al., 2011; Rouhani induced PSC (iPSC) (Takahashi and Yamanaka, 2006) are et al., 2014; Schlaeger et al., 2015). However, these have continually being improved and have enabled the genera- been limited to a few variables, have multiple methods or tion of iPSC using a variety of somatic cell sources, gene laboratories collecting and processing samples, and typi- combinations, and methodologies. However, due to the cally employ a single genomics platform. ‘‘Multi-omics’’ intensive resources required for iPSC generation and char- analyses have proved to be essential in deciphering com- acterization, direct comparisons of iPSC generated using plex gene regulatory programs, as demonstrated by ana- a wide range of technologies and cell sources from multi- lyses of iPSC reprogramming transitional states (Clancy ple independent laboratories have rarely been performed, et al., 2014; Lee et al., 2014; Tonge et al., 2014).

110 Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 j ª 2016 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). The Progenitor Cell Biology Consortium (PCBC) of the of the protocols, genomic analyses, and metadata produced National Heart, Lung and Blood Institute was founded through this effort, we developed a sophisticated interac- to study iPSC reprogramming and differentiation and tive data portal, the interface of which is exemplified in develop strategies to address the challenges presented by Figure 1. In addition to integrated provenance annotations the transplantation of these cells. These questions include, for every raw data file, script, or processed result file, data but are not limited to: (1) Do iPSC consistently generate all can be queried through an interactive heatmap viewer three germ layers? (2) How prevalent is copy-number that displays and inter-relates gene expression, DNA variation (CNV) in iPSC generated using different reprog- methylation, and miRNA expression for queried genes, ramming methodologies? (3) Do different reprogramming pathways, and gene signatures produced in the analyses methods affect global methylation, gene, splicing and described here. These signatures have been further propa- microRNA (miRNA) expression profiles? (4) Can aberrant gated into ToppGene (Chen et al., 2009) for interactive PSC gene regulation be identified on a global basis? (5) queries. Synapse IDs are included to access the resources, How do variables such as X-chromosome inactivation data, metadata, ontologies, and other information through (XCI) affect iPSC quality, stability, and differentiation the Synapse online repository. potential? To advance these goals, the PCBC developed a Central Cell Characterization Core and Bioinformatics Screening of Lines Core to perform standardized and comprehensive character- The data from the first 64 lines (58 iPSC and 6 hESC) ization of iPSC generated using different somatic cell sour- enrolled in the study are presented here with their charac- ces, methodologies, and vectors. The characterized iPSC teristics outlined in Figure 2A (details in syn2767694). All are being made available through WiCell Research Institute. lines completed a standardized screen to ensure they met Using integrative analyses across genomic analysis plat- a basic set of criteria. This included self-renewal in defined forms, we present comparative results on phenotype, ge- feeder-free conditions, expression of markers of pluripo- netics, epigenetics, and gene regulation for a diverse panel tency and a lack of expression of markers of differentiation, of iPSC and hESC. Standardized methods and strict control a normal karyotype, and the ability to grow sufficient of reagents during cell culture, sample collection, and assay quantities of cells for the analyses (Tables S2 and S3; Fig- performance were used to evaluate the innate potential and ure S1). Overall, 6 hESC and 35 iPSC (64%) met these limitations of these cells with fewer confounding factors. criteria and 23 iPSC did not (36%) (Table S4). Abnormal Our use of this uniform analytical methodology allowed karyotypes were observed in seven lines (Table S5), with us to discover candidate regulators of the fate of reprog- karyotypes for all lines available (syn2679104). The most rammed cells. To maximize the utility of this resource, we consistent flow cytometry anomalies were TRA-1-81 and developed an interactive open data portal for access to TRA-1-60 below 90% or an increase in SSEA-1 above 5% the raw data, metadata, results, and protocols from these (Figure 2B). Due to contamination, difficulty in expanding experiments for further analysis (https://www.synapse. cells, and/or abnormal karyotype, not all lines were org/PCBC). included in functional pluripotency assays.

Pluripotency Analysis RESULTS Pluripotency was evaluated in a teratoma assay on 49 lines. Forty-six of the lines met the screening criteria outlined in Study Design and Synapse Analysis Portal Table S3 and 45 of these lines generated teratomas. Three An overview of the study is presented in Figure 1. The lines did not meet the PSC screening criteria with decreased evaluation of iPSC from multiple laboratories and method- expression of self-renewal markers and increased differenti- ologies required highly structured cell-line annotations ation in culture (SC12-021, SC12-023, and SC14-082), and and well-documented protocols to make comprehensive all three successfully generated teratomas. All teratomas comparisons possible. Metadata standards were developed were scored by a clinical pathologist, and representatives to capture the origin of each line, starting cell type, donor of all three embryonic germ layers were identified in demographics, and reprogramming parameters (derivation all tumors (detailed information is available at Synapse method, vector type, reprogramming genes, culture condi- syn2882785). We performed immunostaining analysis on tions). These metadata were provided by the originating teratomas from a subset of lines to confirm pluripotency laboratory and confirmed and augmented with in vitro (muscle-specific actin [MSA], neurofilament, and a-feto- genetic and experimental characterization of the line. ) and OCT4 to evaluate the presence of undifferen- RNA sequencing (RNA-seq) was performed at an acceptable tiated PSC (Figure S1). This included two lines that did not depth to facilitate accurate gene-expression quantification meet the screening criteria and independent iPSC from the (Supplemental Experimental Procedures). To facilitate use same donor as controls (Table S7), and three teratomas that

Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 111 A B

Donor Sample Synapse Protocols Ontologies Donor cells ISA-Tab Reprogramming Metadata Karyotype iPSC hESC FACS Vials shipped to C4 qPCR Teratoma Central Cell Characterization Omics Input Core (C4) Omics Output Thaw and culture in Provenance standardized conditions Code Standard Preliminary Screening Sterility & Mycoplasma Karyotype Analysis Flow Cytometry Unstable Stable lines lines C subset of lines

Pluripotency Molecular “omics” Alternative Analysis characterization Splicing RNA-Seq Teratoma Gene Analysis DNA RNA Expression

miR-Seq SNP array DNA-Methylation 450k array Donor CNV Genetics Omics Integration D

Figure 1. Study Design and Synapse Results Portal (A) Overview of study design and data deposited online in Synapse. FACS, fluorescence-activated cell sorting; ISA-Tab, investigation/ study/assay tab-delimited format. (B) Synapse heatmap viewer displaying probes corresponding to EB-induced transcription factors and associated DNA-methylation levels to visually detected outliers in PSC. (C) Provenance for the creation of normalized gene-expression values, associated scripts, quality control, metadata, and annotations. (D) Gene-enrichment analysis in ToppGene displaying top-ranking PCBC stem cell signatures. had regions histopathologically classified as poorly differ- Evaluation of CNV Changes in iPSC entiated as well as independent teratomas generated from Genetic stability was evaluated between independent lines the same lines (Table S8). The immunostaining confirmed with common donors by CNV SNP microarrays. Although pluripotency in all tumors (Figure S1). OCT4 staining was two SNP genotyping arrays were used, all lines derived observed in one teratoma with a poorly differentiated re- from a single donor were run on the same platform (see gion (SC12-034), although other teratomas from this line Experimental Procedures). Variations were observed in all were fully differentiated and did not have OCT-4-stained lines and on all (Figure S2). Excluding hu- regions. Two teratomas from other lines (SC11-014 and man leukocyte antigen-associated regions, 724 non-benign SC11-0013) with poorly differentiated regions did not or clinically significant CNV from 529 unique genomic have OCT4 immunoreactivity, although we did not have loci were identified (syn3105726). Although not signifi- adjacent sections for staining (Table S8). cant, lines reprogrammed with integrating vectors trended

112 Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 Sendai Figure 2. iPSC Line Characteristics, Flow A ICM Virus Other RNA Cytometry Analysis, and CNV Accumu- MSC 9% 5% 10% 3% 7% OSK-L-l- lation Lenti p53KD (A) Reprogramming variables for: origi- 17% Retro 12% OSKM nating cell type (left), reprogramming vec- Fibroblast 7% Blood 40% tor (middle), and gene combination (right). 49% 39% Plasmid OSKM- Reprogramming gene combinations: OSKM 64% NLT is composed of POU5F1 (also known as 38% OCT4), SOX2, KLF4, and c-MYC; OSK-L-l- Cell of Origin Reprogramming Reprogramming p53KD includes LIN28A and TP53 knock- n=64 Vector n=58 Gene n=58 down and l-MYC instead of c-MYC; OSKM-NLT includes NANOG, LIN28A, and SV40 large T antigen. ICM, inner cell mass; MSC, mesen- B chymal stem cell. 100 SSEA-1 SSEA-4 TRA-1-60 TRA-1-81 CD9 OCT4 1000 SSSEA-1 SSSEA-4 TTRA-1-60 TTRA-1-81 CCD9 OCT4 (B) Flow cytometry analysis classified iPSC 80 80 as stable (n = 41) or unstable (n = 11). Boxes ve i t 60 i 60 represent the first and third quartiles, os 40 P 40 whiskers show the complete range, and the horizontal line is the median. 20 20 ercent Percent Positive P Percent Positive (C) Commonly observed CNV predicted as 0 0 non-benign or clinically significant and Stable Lines UnstableUnstable LinesLines observed from at least three independent genetic donors are listed on the right C CNV+expressionCNV+expression CNV and shown as a heatmap (blue). Red cells HomozygousHomozygous deletion 2121q21.2q2 q indicate that CNV overlaps genes with Duplicationpq <500kb 17q2117q21.31 Deletion <200kb 4q13.1q concordant expression differences. Concor- Deletion <200kb 19p13.3p Homozygousyg deletion 15q215q26.3q dant genes with known function are labeled ID1 BCL2L1 Duplicationp >500kb 20q20q11.21q11. HM13 TPX2 Deletion <200kb 5q35.2q on the left, with previously identified Deletion <200kb 7p22.1p Homozygousyg deletion 8q248q24.3 q tumor-suppressor genes in purple and cell- Deletion <200kb 16p13.3p Deletion <200kb 12q23.3q growth-promoting and oncogenesis-pro- Deletion <200kb 17p13.3p Deletion <200kb 4q21.21q moting genes in green. Lines from the same Deletion <200kb 8q24.23q Deletion <200kb 3q23q donor are highlighted in the same color. Deletion <200kb 5p15.2p Deletion <200kb 21q221q22.12q 2.12 Deletion <200kb 13q32.1q Homozygousyg deletion 5p155p15.2 p Homozygousyg deletion 4q24q21.21q Deletion <200kb 10q23.1q Deletion <200kb 20q13.33q EIF4E3 Deletion <200kb 3p13p Deletion <200kb 8q24.3q Deletion <200kb 2p16.3p Deletion <200kb 21q221q22.3q 2.3 Deletion <200kb 16q12.1q Duplicationpp <500kb 12p1312p13.31 Homozygousyg deletion 12q12q13.13q Mosaic Duplicationp 15q11.215qq11. FOXO4 Deletion <200kb 7q36.3q BEX1 SRPX Mosaic monosomy Xp22.3Xp22.33-Xq28 EDA2R GPC3 D011:SC11-011 D002:SC11-005 D004:SC11-016 D014:SC11-014 D013:SC11-013 D003:SC11-010 D004:SC11-018 D015:SC11-015 D012:SC11-012 D003:SC11-008 D001:SC11-003 D004:SC11-017 D003:SC11-009 D001:SC11-004 D009:SC12-030 D008:SC13-049 D008:SC12-026 D009:SC12-029 D022:SC13-045 D020:SC13-043 D026:SC14-066 D009:SC12-031 D009:SC12-028 D007:SC12-024 D007:SC12-023 D004:SC12-021 D016:SC12-006 D025:SC12-002 D021:SC13-044 D005A:SC12-005 D005B:SC12-007

1st degree toward a higher frequency of clinically significant CNV same donor samples that had variable CNV (Table S6), (58%) compared with non-integrating vectors (41%). with some donors having higher frequencies of CNV Our study included different iPSC generated from the than others (such as D001, 2, 3, 4, and 9). same donor sample and reprogramming methods, thereby We discovered 102 non-benign CNV shared by at least enabling the direct evaluation of the CNV present in the two distinct donors, with 83 of these CNV variably present donor versus those induced during reprogramming and in two or more distinct samples from a common donor. culture. We observed CNV that were specific to the donor, Two donors (D004 and D003) were solely responsible for and others present among multiple genetically distinct 46 of these CNV, while 26 were recurrent among multiple iPSC (Figure S2). We identified lines generated from the donors (Figure S2C). A more stringent analysis considering

Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 113 Variable Genes (2x, at least 3 and 3 samples) Diff Donor Vector Gene A State GenderGend ID Cell Type of Origin Type Combination RunPassage CellLine PPSCSC mal malee 2 fibroblast lentivirus OSKM 119 21 SC11-006 PPSCSC mal malee 2 fibroblast lentivirus OSKM 119 21 SC11-006 PPSCSC mal malee 2 fibroblast lentivirus OSKM 119 26 SC11-007 PPSCSC m maleale 2 fibroblast lentivirus OSKM 119 26 SC11-007 PSPSCC m maleale 2 fibroblast lentivirus OSKM 119 18 SC11-005 PSPSCC m maleale 2 fibroblast lentivirus OSKM 119 18 SC11-005 PPSCSC m maleale 8 fibroblast retrovirus OSKM 219 19 SC13-049 PPSCSC mal malee 16 fibroblast plasmid OSK-L-l-p53KD 144 24 SC12-006 PSPSCC malemale 16 fibroblast plasmid OSK-L-l-p53KD 144 24 SC12-006 PSPSCC m maleale 17 mesenchymal stem cell plasmid OSK-L-l-p53KD 182 15 SC12-034 PPSCSC ma malele 9 fibroblast RNA OSKM-L 181 20 SC12-029 PPSCSC malemale 10 CD34+ cells plasmid OSK-L-l-p53KD 182 17 SC12-036 PPSCSC m maleale 10 CD34+ cells plasmid OSK-L-l-p53KD 182 19 SC12-037 PPSCSC ma malele 7 fibroblast Sendai Virus OSKM 172 20 SC12-025 PPSCSC mal malee 4 fibroblast plasmid OSKM-NLT 154 25 SC11-018 PPSCSC malemale 4 fibroblast plasmid OSKM-NLT 144 25 SC11-018 PSPSCC mal malee 14 inner cell mass N/A N/A 133 33 SC11-014 PSPSCC mal malee 14 inner cell mass N/A N/A 133 33 SC11-014 PSPSCC mal malee 18 inner cell mass N/A N/A 420 23 SC12-038 PPSCSC mal malee 9 fibroblast RNA OSKM-L 181 20 SC12-030 PPSCSC m maleale 9 fibroblast Sendai Virus OSKM 181 20 SC12-031 PPSCSC mal malee 6 fibroblast retrovirus OSKM 181 19 SC12-019 PPSCSC mal malee 4 fibroblast plasmid OSKM-NLT 154 25 SC11-017 PPSCSC mal malee 4 fibroblast plasmid OSKM-NLT 144 25 SC11-017 PPSCSC mal malee 4 fibroblast plasmid OSKM-NLT 172 26 SC12-022 PPSCSC mal malee 30 CD34+ cells lentivirus OSKM 102 IPS18 PPSCSC mal malee 30 CD34+ cells lentivirus OSKM 102 IPS18 PPSCSC mal malee 4 fibroblast plasmid OSKM-NLT 133 29 SC11-016 PSPSCC malemale 4 fibroblast plasmid OSKM-NLT 133 29 SC11-016 PPSCSC mal malee 15 inner cell mass N/A N/A 133 15 SC11-015 PPSCSC m maleale 7 fibroblast retrovirus OSKM 219 32 SC12-024 PPSCSC m maleale 9 fibroblast RNA OSKM-L 181 21 SC12-028 PPSCSC ma malele 6 fibroblast plasmid OSK-L-l-p53KD 181 15 SC12-020 PPSCSC ma malele 15 inner cell mass N/A N/A 133 15 SC11-015 PPSCSC m maleale 7 fibroblast lentivirus OS-NL 182 32 SC12-023 PPSCSC ma malele 17 mesenchymal stem cell plasmid OSK-L-l-p53KD 569 14 SC12-035 PPSCSC mal malee 4 fibroblast plasmid OSKM-NLT 172 33 SC12-021 PPSCSC f femaleema 20 mononuclear plasmid OSKM-NLT 420 15 SC13-043 PPSCSC f femaleema 19 inner cell mass N/A N/A 219 28 SC12-039 PPSCSC fema female 25 inner cell mass N/A N/A 558 27 SC14-067 PSPSCC fema female 26 inner cell mass N/A N/A 558 34 SC14-066 PPSCSC femalefema 22 mononuclear plasmid OSKM-NLT 219 16 SC13-045 PPSCSC fema female 21 mononuclear plasmid OSKM-NLT 219 23 SC13-044 PPSCSC femalefema 27 CD34+ cells plasmid OSKM 569 15 SC14-069 PPSCSC femalefema 5 fibroblast retrovirus OSKM 181 26 SC12-007 PPSCSC femalefema 3 CD34+ cells plasmid OSKM-NLT 149 15 SC11-009 PPSCSC femalefema 3 CD34+ cells plasmid OSKM-NLT 133 15 SC11-009 PPSCSC fema female 3 CD34+ cells plasmid OSKM-NLT 149 17 SC11-010 PSPSCC femalefema 3 CD34+ cells plasmid OSKM-NLT 149 17 SC11-010 PSPSCC fema female 12 CD34+ cells plasmid OSKM 149 16 SC11-012 PSPSCC fema female 12 CD34+ cells plasmid OSKM 149 16 SC11-012 PSPSCC femalefema 13 CD34+ cells plasmid OSKM 149 17 SC11-013 PSC fema female 13 CD34+ cells plasmid OSKM 149 17 SC11-013 PPSCSC fema female 1 fibroblast lentivirus OSKM 133 25 SC11-003 PPSCSC f femaleema 1 fibroblast lentivirus OSKM 133 25 SC11-003 PPSCSC f femaleema 3 CD34+ cells plasmid OSKM-NLT 149 23 SC11-008 PPSCSC f femaleema 3 CD34+ cells plasmid OSKM-NLT 149 23 SC11-008 PPSCSC femalefema 1 fibroblast lentivirus OSKM 133 17 SC11-004 PSCPSC fema female 1 fibroblast lentivirus OSKM 133 17 SC11-004 PPSCSC fema female 1 fibroblast lentivirus OSKM 133 23 SC11-002 PSPSCC fema female 1 fibroblast lentivirus OSKM 133 23 SC11-002 PSPSCC f femaleema 31 fibroblast plasmid OSK-L-l-p53KD 144 26 SC12-005 PPSCSC femalefema 31 fibroblast plasmid OSK-L-l-p53KD 144 26 SC12-005 PSPSCC fema female 24 fibroblast retrovirus OSKM 154 16 SC12-002 PSPSCC femalefema 25 inner cell mass N/A N/A 144 H9 PSPSCC femalefema 25 inner cell mass N/A N/A 102 H9 PSPSCC femalefema 25 inner cell mass N/A N/A 102 H9 PSPSCC femalefema 25 inner cell mass N/A N/A 119 H9 PSPSCC femalefema 25 inner cell mass N/A N/A 119 H9 PSPSCC femalefema 28 mesenchymal stem cell lentivirus OSKM 119 18 SC11-001 PSPSCC femalefema 28 mesenchymal stem cell lentivirus OSKM 119 18 SC11-001 PSPSCC fema female 25 inner cell mass N/A N/A 154 H9Hypox EBEB femalefema EBEB femalefema EBEB femalefema EBEB male EBEB malemale EBEB malemale EBEB male -1.2 0 1.2 EBEB malemale EBEB malemale Differentiation Expression (log2)

AbnormalAbnormal AbnormalAbnormal B KaryotypeKaryotype KaryotypeKaryotype C

Male Female 6

3

0 BM.mono PC2 (5.8%) CB Fib −3 hESC p53KD.Amnio.MSC p53KD.BM.CD34

−15 −10 −5 0 PC 1 (70.4%)

Figure 3. Global Gene Expression and Methylation Variation between iPSC (A) Hierarchical clustering of the most variable genes observed among iPSC and hESC (n = 1,031). These genes were chosen by selecting the reliably expressed genes (n = 9,670) that varied at least 2-fold between six or more samples and correlated (r > 0.5) to the expression of at least ten other genes (AltAnalyze, Predict Groups analysis). Yellow indicates upregulated and blue indicates downregulated genes. (legend continued on next page)

114 Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 CNV shared among at least three donors identified a set chymal stem cell-derived iPSC from the same donor and of 31 frequently affected genomic loci, suggesting that laboratory (D017) exhibited a similar EB-like signature. they occurred during iPSC reprogramming or that the start- Neither the D017 nor the H9 samples displayed apparent ing samples were mosaic (Abyzov et al., 2012; McConnell global DNA methylation differences, demonstrating the et al., 2013; Young et al., 2012)(Figure 2). utility of distinct genomic platforms in assessing PSC qual- Comparison of the CNV and RNA-seq data identified 19 ity (Figure S3). non-benign and clinically significant CNV that overlap To identify differences associated with major cell-line with differentially expressed genes in a manner consistent variables, we performed all possible pairwise comparisons with the detected duplication or deletion (syn2731183) from each metadata category for gene expression, splicing, (Figure 2C). This included 88 downregulated genes in miRNA, and DNA methylation (syn3094629). We iden- deleted regions, 79 of which correspond to the frequently tified 355 differentially expressed genes from these com- observed X-chromosome mosaic monosomy. Among 26 parisons and 3,451 differential methylated DNA probes. upregulated genes in duplicated regions, a duplication of As expected, laboratory of origin accounted for the largest 20q11.21 corresponded to the upregulation of nine over- number of differences, likely because several iPSC deriva- lapping genes, including four (ID1, BCL2L1, HM13, and tion protocols and cell types of origin were largely unique TPX2) previously shown to promote hESC survival or to a single laboratory (e.g., RNA-based reprogramming, oncogenic potential (Nguyen et al., 2014). We also found stromal priming) and could therefore mask handling compatible regulation of the cancer susceptibility genes or other technical differences between laboratories. The FYN (6q21 duplication), ERCC2 (19q13.32 duplication), major distinguishing reprogramming variables from the and NIN (14q22.1 duplication), as well as the tumor-sup- methylation analyses were cell type of origin (1,427 pressor genes FOXO4, BEX1, SRPX, EDA2R, GPC3 (X mono- probes), method of reprogramming (1,346 probes), and somy), ING2 (4q35.1 deletion), and EIF4E3 (3p13 deletion) sex (520 probes). Clustering of these methylation profiles (Osborne et al., 2013). These results are consistent with readily distinguished lines based on both sex and abnormal these CNV conferring a survival or proliferative advantage. karyotypes (Figure 3B), while PCA segregated samples based on cell of origin (Figure 3C). Although these samples Global Expression and Methylation Analysis of PSC consistently segregated by cell of origin independently of To understand the molecular determinants of PSC quality the donor sex, these differences could not be directly attrib- as a function of reprogramming method and somatic cell uted to blood and fibroblast somatic methylation profile origin, we performed mRNA, miRNA, and methylation differences (data not shown). profiling on iPSC and hESC with profiles from hESC- To examine the impact on possible pathways, we looked derived embryoid bodies (EB) as a control. at the enrichment of our discovered reprogramming regu- Relative to EB, hESC and iPSC were largely indistin- lated genes among (GO) terms for each of guishable from each other at the global gene-expression the different measurement platforms (Figure 4A). The level by both hierarchical clustering and principal compo- most prominent pathway level effects were found in the nent analysis (PCA) (syn3107554). Greater variability was methylation comparisons with hESC for a wide array of observed from analogous DNA methylation and miRNA biological comparisons and tested reprogramming vari- profiles (Figure S3). However, restricting the analysis to ables. We observed consistent regulation of inflammatory genes with varying expression only in PSC identifies and immune response, ion homeostasis, and regulation donor, sex, reprogramming technology, and originating of cell proliferation gene sets, particularly among all laboratory as the major driving covariates by hierarchical iPSC compared with hESC, among the different profiling clustering (Figure 3A). These differences did not clearly technologies. associate with the passage number of the profiled PSC. Of To determine whether differential methylation might interest, H9 cells (D025) analyzed greater than ten passages be a source of observed gene-expression differences, we apart displayed a highly variable signature with the higher compared the expression profiles of these differentially passage more similar to EB. Likewise, one of two mesen- regulated genes and probes, based on common gene

Expression is shown relative to day 17 EB derived from multiple hESC (median-based normalization applied to preferentially identify PSC variance). Selected metadata associated with each cell line are shown on the right, with identical terms in each column sharing a color. (B) Expression clustering of all CpG methylation probes on the X chromosome, with blue indicating hypo- and red hypermethylation. Lines with an abnormal karyotype are indicated. (C) Principal component analysis of all differentially methylated probes for all evaluated PSC lines, colored according to cell of origin. BM.mono, bone marrow-derived monocytes; CB, umbilical cord blood; Fib, fibroblasts; Amnio.MSC, amniotic fluid-derived mesenchymal stem cells; BM.CD34, bone marrow-derived CD-34+ cells; p53KD, OSKL-l-p53KD reprogramming vector.

Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 115 A

antigen regulation positive processing negative ofhomophilic regulation interleukin-4 biological andregulation response presentation negative cellof adhesion cellnegative productionadhesion of to cellular fateepithelial virusregulation negative commitmentvia regulation peptidyl-tyrosine responseMHC celltransforming ofregulation classmigrationnegative neuron of toion viral IIlipid immunehomeostasis apoptosis of genome regulation growth dephosphorylation regulation macroautophagy positive response factor replicationmetanephros of ofactin canonical regulation betacell Wnt filament proliferation receptornegative receptor developmentneuronWnt of bundlecellchromatin receptorsignaling regulation signaling positive proliferationmigration assemblymulticellular signaling remodelingcellular regulation pathwayof embryonic hydrolase signalresponse organismal neuron of transduction sequence-specificnegativeaxisactivity to cellulardifferentiation specification hormonedevelopment protein regulation calcium-mediatedresponse involved calcium-dependentstabilization stimulus positiveDNA of intonegative cell mitotic lipopolysaccharideactivity negative growthregulation transmembranesignaling cellregulation cell-cell regulationcycle mRNA of cellcarbohydrate checkpoint of adhesion cellular catabolicdevelopmentcell oftransport mRNA transcriptionproliferation regulationlipid processmetabolic transport metabolic from of glucoseprocess RNAprocess pol transport II lentivirus retrovirus CD34+ cells mononuclear A blood plasmid fibroblast DN iPSC

OSKM-NLT Enrichment Z-score OSK-L-l-p53KD

Methylation OSKM -3 lentivirus retrovirus CD34+ cells mononuclear blood plasmid 0 fibroblast iPSC

Gene OSKM-NLT OSK-L-l-p53KD OSKM Expression lentivirus retrovirus 3 CD34+ cells mononuclear blood plasmid fibroblast iPSC OSKM-NLT Splicing OSK-L-l-p53KD

Alternative OSKM

B CDDNA-Methylation β values

0 0.5 1 FRG1B SLC6A5 α CBLN4 MC3R PEG3 LINC00939/ 25 LINC00943 20 TMEM132C PTPRN2 15

DPP6 10 RPKM NFIC 5 EIF3D 0 * CLSTN2 AX747064 OR1A2 hES iPS inner cell mass CD34+ cell fibroblastsmesenchymal stem cell mononuclear cell

E RNA-Seq Fold (log2) Stem Cell/Proliferation -1 0 1 Ectoderm -2 0 2 MYC MYC FGF19 FGF19 NANOG NANOG RCC2 RCC2

SOX4 SOX4 ZBTB22 ZBTB22 DERA DERA

PPP1R16A PPP1R16A EFS EFS MORN4 MORN4 p0 p10 ESC Passage Number

(legend on next page)

116 Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 annotations (e.g., promoter, body, or UTR location of the MYC were enriched in negative regulators of differentia- probe). This analysis indicates that 21% of all differen- tion, stem cell maintenance, and positive regulation of tially methylated probes correspond to gene-expression cell proliferation, while anti-correlated genes were en- changes in the PSC, while 43% of all differentially ex- riched in experimentally observed ectoderm differentia- pressed genes appear to be due to underlying differential tion upregulated genes (ToppGene). To test whether these DNA methylation (Pearson r < 0.5). Only negative cor- differences could be related to PSC quality and increased relations were considered from these analyses. Taken passaging, we compared the expression of these same genes together, these data suggest that while iPSC are largely with hESC from a previously described single-cell RNA-seq similar to hESC at the level of gene expression, observed dataset (Figure 4E) (Yan et al., 2013). Clustering of both differences are frequently correlated with changes in DNA early (passage 0) and late (passage 10) single-cell hESC methylation. confirmed that NANOG and MYC high lines were most similar to early-passage hESC. Comparison of hESC and iPSC Among DNA-methylation profiles, comparison of all Genomic Impact of Reprogramming Technology, Cell iPSC to hESC yielded 180 differentially methylated sites, of Origin, and iPSC Stability with 52% of these anti-correlated with gene expression Among the 64 lines evaluated, 41 underwent genomics (n = 93). A more relaxed analysis (non-adjusted moderated characterization, with five unstable lines included as con- t test p < 0.05) of unique donor samples indicated that trols. These 46 lines comprised five cell-of-origin groups, methylation probes largely segregated by donor and cell five reprogramming vector types, and five distinct gene of origin when subjected to hierarchical clustering (Fig- combinations. Comprehensive pairwise comparisons of ure 4B). In agreement with previously published studies, all metadata categories across each genomics platform DPP6, TMEM132C, and PTPRT were among the most highlighted a large number of genes (syn3106206), splicing differentially methylated loci between iPSC and hESC variants (syn3106266), methylation probes (syn3106255), (Figure 4C). In addition, we found that several interesting and miRNAs (syn3106244) strongly associated with one or gene loci were hypomethylated (FRG1B, SLC6A5) and more of these variables (Figure 5). To our knowledge, few of hypermethylated (PTPRN2, LINC00939, CBLN4, MC3R, these molecular differences have previously been reported. NFIC, EIF3D, CLSTN2, AX747064, and OR1A2) in iPSC. Many of the most significant differences were observed Genes hypermethylated in iPSC were associated with among differentially methylated probes (Figures 5A and neuronal differentiation and genomic targets of the poly- S4A). For example, SOX2 was hypermethylated in retroviral comb repressive complex 2 (PRC2) (ToppGene). The most lines relative to all hESC and nearly all iPSC. Reciprocal highly differentially expressed iPSC versus hESC gene, differences in gene expression were frequently observed the paternally imprinted PEG3, was also anti-correlated for these and many other differentially methylated genes. with DNA-methylation probes (Pearson r < 0.98) For all genomic analyses, the small number of unique (Figure 4D). donors available for certain reprogramming methods A close examination of the expression of core pluripo- limited the power of our analysis. However, the availability tency factors across all PSC identified a large number of of a small number of iPSC derived from the same donor genes correlated and anti-correlated with NANOG and with different methods provides additional confirma- MYC (Figure 4E). Genes coexpressed with NANOG and tion of our findings. For example, differentially expressed

Figure 4. Global Reprogramming Impact on Pathways in PSC (A) Pathway-level impact of reprogramming methodology or initiating cell type as compared with hESC, based on statistically enriched GO terms for each comparison and profiling technology. Red indicates higher Z scores, corresponding to lower GO-Elite enrichment p values. Dashed blue lines indicate common regulated pathways in the different applied profiling methods and in the same reprogramming comparisons. (B) DNA-methylation profiles for probes significantly differing between hESC and iPSC (non-adjusted p < 0.05). Colored bars below each cluster indicate cell of origin. (C) Hierarchically clustered subset of the DNA-methylation probes with the lowest p values (adjusted p < 0.05). Associated genes for each probe cluster are indicated on the right. (D) PEG3 expression in hESC and iPSC lines. Box and whiskers plot of unique donor PSC values are represented. Anti-correlated gene expression and DNA methylation are denoted by a red alpha. A blue asterisk indicates significant differential expression. (E) Expression clustering (HOPACH) of genes correlated and anti-correlated with NANOG and MYC gene expression in all PSC. To the right of this cluster, early- (P0) and late-passage (P10) single-cell hESC (Yan et al., 2013) are shown in the same gene order. Genes with red names are negative regulators of differentiation, stem cell maintenance genes, or positive regulators of cell proliferation, while genes with blue names are associated with ectoderm differentiation, based on ToppGene-associated annotations from multiple sources.

Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 117 α αα A LARGE (cg11945717) SOX2 (cg25444208) TCHH (cg05523911) DNAH9 (cg18324798) 1.0 * * 1.0 1.0 1.0 * 0.8 0.8 * 0.8 0.8 0.6 0.6 0.6 * 0.6 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 Beta Values Beta Values Beta Values Beta Values 0.0 0.0 0.0 0.0

ICM hES iPS ICM hES NLT Retro Lenti CD34+ Mono- CD34+ OSKM- nuclear Fibroblastα α α α B 15 LARGE 2.0 TXNRD2 15 DNAJ15 80 ZFP42 60 * 1.5 10 * 10 * M 1.0 40 * RPKM RPK 5 RPKM 5 RPKM * 0.5 20 0 0.0 0 0

ICM ICM Lenti Retro CD34+ Mono- Sendai OSKM Stable nuclear Plasmid OSK-L-l Unstable Fibroblast -p53KD α OSKM-NLT α C hsa-mir-92b hsa-miR-30c-5p hsa-miR-130b-5p hsa-miR-141-5p 25 1.0 0.15 0.00 20 * 0.8 * 0.10 * -0.05 15 0.6 0.05 * 10 0.4 0.00 -0.10 5 0.2 -0.05 Normalized Exp Normalized Exp Normalized Exp Normalized Exp 0 0.0 -0.10 -0.15

ICM ICM ICM ICM Blood Blood Blood

Fibroblast Fibroblast Fibroblast Fibroblast ‘ D SRSF5 Intron 4 Retention SNHG5 Novel alt-3 splice-site FRAS1 Cassette Exon 53 54 hESC hESC hESC 26 97 74 384 33 10

19 76 72 287 16 29 17 16 86 68 24 Lenti OSK-L-I 8 OSK-L-I 146 71 -p53KD -p53KD 17 70 59 37 70 772 495 13 70235325 70238323 86387646 79385848 79390503

Figure 5. Candidate Factors Associated with iPSC Derivation Method (A–C) The top differentially regulated (A) DNA methylation, (B) mRNA gene expression, and (C) normalized miRNA expression profiles associated with each indicated comparison. Box and whiskers plot of unique donor PSC values are represented. For TXNRD2, sample expression values for the same genetic donor (D007) are indicated for the three indicated reprogramming methods in red. Anti-correlated gene expression and DNA methylation are denoted by a red alpha. Blue asterisks indicate significantly differentially expressed genes (adjusted p < 0.05) versus hESC or the indicated comparator. Retro, retroviral; Lenti, lentiviral; RPKM, reads per kilobase per million mapped reads. (D) Examples of splicing events visualized in the software IGV (Broad Institute), with associated genomic read-alignment depth and junction read counts indicated for a single representative sample. retroviral and lentiviral associated genes (e.g., TXNRD2, unstable lines, decreased expression of crucial PSC genes JUN, UCP2, and HIST1H2BF; Figures 5B and S4B) were (ZFP42 and TRIM6) was associated with increased promoter consistently observed from uniquely reprogrammed lines and gene methylation of these genes. Using a 96-gene from a single donor (D007). Notably, these genes are qPCR panel we verified differential expression for multi- involved in multiple pathways related to oxidative stress. ple genes where corresponding probes were present (e.g., Differential expression of multiple genes affecting cell ZFP42; Figure S4C). growth and differentiation (ID2, ID4, JAG1, IGFBP5, and In total, 41 miRNAs were statistically associated with GLT1D1) were observed with OSK-L-l-p53KD, relative to at least one reprogramming variable. Among these, we other gene reprogramming combinations or hESC. In observed three miRNAs (miR-92b, miR-30c-1, miR-30c-2)

118 Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 β values

AC0 0.5 1 B female male *hsa-mir-18b-5p XIST 0.05 20 C1 0.00 15 C2 10 -0.05 5 C3 -0.10 RPKM 0

Normalized Exp -0.15 -5

iPS Male hES inner cell mass CD34+ cell fibroblasts female Female female mesenchymal stem cell mononuclear cell

β D XIST expression DNA-Methylation values XIST expression RNA-Seq fold change (log2) 0 0.35 0.7 -1.2 0 1.2 gender chrX p

q

X-ratio ICM:SC11-015 ICM:SC11-015 ICM:SC14-067 ICM:SC12-039 ICM:SC14-066 ICM:SC12-038 ICM:SC14-067 ICM:SC12-039 ICM:SC14-066 ICM:SC12-038 MSC:SC12-034 MSC:SC12-035 MSC:SC12-034 MSC:SC12-035 mono:SC13-045 mono:SC13-043 mono:SC13-044 mono:SC13-043 mono:SC13-044 mono:SC13-045 CD34+:SC11-012 CD34+:SC11-010 CD34+:SC11-010 CD34+:SC11-013 CD34+:SC11-012 CD34+:SC11-013 CD34+:SC14-069 CD34+:SC12-036 CD34+:SC14-069 CD34+:SC12-036 fibroblast:SC11-016 fibroblast:SC11-003 fibroblast:SC11-016 fibroblast:SC11-003 fibroblast:SC12-005 fibroblast:SC12-006 fibroblast:SC12-023 fibroblast:SC12-028 fibroblast:SC12-019 fibroblast:SC13-049 fibroblast:SC12-005 fibroblast:SC12-006 fibroblast:SC12-023 fibroblast:SC12-019 fibroblast:SC12-028 fibroblast:SC13-049

E 1.4 0.40 18 F 0.38 16 GPC4 1.3 14 TEX10 0.36 12 SOX2 1.2 0.34 10 1.1 0.32 8 NANOG RPKM

X-ratio 6 0.30 RBBP7 1.0 4 0.28 2 POU5F1 % X-ch methyl CSNK2A1 0.9 0.26 0 X-chr methyl PSC XIST PSC X-ratio PSC PSMD10

Figure 6. Impact of X-Chromosome Inactivation and Sex on iPSC (A) Segregation of PSC based on differentially regulated sex-associated DNA-methylated probes (HOPACH). (B) Normalized miRNA expression values in male and female lines from distinct donors. Box and whiskers plot of unique donor PSC values are represented. (C) XIST expression in independent female hESC and iPSC samples from unique donors. (D) Heatmaps of all anti-correlated (Pearson r < 0.6) methylation probes (left) and genes (right) on the X chromosome, ordered by genomic location. XIST expression values are in green and X-to-autosome expression ratios are below the heatmaps. (E) Comparison of distinct measures of XCI within female PSC as determined by RNA-seq (X-to-autosome ratio, XIST expression) and DNA-methylation array (% X-chr methylation). (F) Protein-protein interactions (BioGRID database) between genes anti-correlated with XIST (gray) and core pluripotency factors (black). with predicted mRNA targets that were differentially ex- miR-660, miR-548f-1), suggesting regulation in part by pressed in a reciprocal manner (GO-Elite) (Figure 5C). DNA methylation. Five of the 41 regulated miRNAs were also anti-correlated Alternative splicing and promoter usage was evaluated in with methylation probes (miR-141, miR-130b, miR-191, our RNA-seq data. Comparison of hESC and hESC-derived

Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 119 XIST Figure 7. EB Differentiation Outcomes AB X-ratio Correlate with Differential X-Chromo- 6 PSC some Inactivation among Female Stem 5 DUSP6 DE SHH Cell Lines 4 MESO (A) Comparison of XIST expression and X-to- 3 ECTO EB DLX5 autosome ratios in PSC and EB for the same 2 ISL1 lines (PSC line name indicated below the 1 plots).

XIST RPKM (log2) XIST 0 (B) HOPACH clustered heatmap of differen- 1.4 FOXA1 tially expressed genes in EB differentiated 1.3 ERBB3 from female PSC lines. Developmental reg- 1.2 ulators are shown on the right of the plot. 1.1 PSC X-to-autosome ratio and XIST expres- 1 GATA2 sion are displayed above the plot. 0.9 (C) Immunohistochemistry of two teratomas 0.8 X-to-autosome ratio DDC derived from a low and high XIST female PSC using a muscle-specific actin (MSA) SC SC11- SC SC14 SC SC SC SC14-067 SC12-007 SC13-043 SC12-005 SC SC11- SC antibody. SC11-008 SC11-008 SC11-009 SC11-012 SC11-013 SC11-013 SC11-010 SC14-066 SC13-043 SC13-045 SC12-039 SC14-071 SC14-067 SC12-007 SC12-005 SC14-069 SC14-070 SC13-044 14 11-012 14- 13 11- 14 13- -071 - -044 - 008 069 066 013 07 010 045 (D) Quantification of the percentage of

0 positive MSA or neurofilament (NF) staining in adjacent teratoma sections relative to Low XIST SC12-039 MSA X-to-autosome ratio C D 1.3 45 PSC X-to-autosome ratio. 1.25 40 1.2 35 1.15 30 25 1.1 20 1.05 15 1 10 0.95 5 0.9 0 % Positive MSA X-to-autosome ratio NF X-to-autosome ratio 1.3 9 High XIST SC13-043 1.25 8 1.2 7 1.15 6 5 1.1 4 1.05 3 1 2

0.95 1 % Positive NF 0.9 0 X-to-autosome ratio

SC11-008SC12-039 SC14-066 SC14-067 SC12-005 SC13-043 SC14-070 SC11-010 SC13-045

EBs identified 129 alternative exon events with a false methylated between male and female donors, the majority discovery rate p < 0.05 (syn3106284), including many of which were localized to allosomes (457 probes) (Fig- well-validated events (in MBD2, DNMT3B, SLK, ADD3, ure 6A). Similarly, most differentially expressed genes be- MARK3, FYN, NUMB, NAV2, and NFYA)(Gopalakrishna- tween male and females were also localized to allosomes Pillai and Iverson, 2011; Lu et al., 2014; Salomonis et al., (43 out of 60), as were differentially expressed miRNA 2009)(Figure S5A), suggesting that these data are reliable (4 out of 7). Predicted mRNA targets (GO-Elite) of one for more in-depth evaluation. A total of 77 alternative X-chromosomal miRNA (miR-18b) were enriched among exons were significant in a pairwise comparison of all male versus female RNA upregulated genes (Figure 6B). major reprogramming or cell-of-origin variables in PSC. This miRNA was also found to be anti-correlated to its Manual examination of highly differential but non-sig- own DNA-methylation probes, suggesting that it is regu- nificant splicing events suggest that many are valid, but de- lated by DNA methylation. tected with lower sensitivity due to reduced sequencing Genes associated with autosomal differential DNA depth (Figures 5D and S5B). methylation were enriched for PRC2 factors and targets of the PRC2 transcription factor Suz12 (Figures S6B Effect of XCI and Donor Sex and S6C). Only one DNA-methylation-regulating gene, A significant potential confounder in this dataset is donor MECP2, was itself differentially methylated between sex difference. A total of 520 probes were differentially females and males. This is consistent with prior studies

120 Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 that have identified MECP2 as a target of X inactivation or proliferation (MZF1, ZNF148, CUX1)(Figure S6D). As (Vallot et al., 2015). these genes promote distinct differentiation pathways, In mouse ESC and human somatic cells, aberrant loss we subjected a set of female PSC (n = 16) to short-term of XIST expression and corresponding breakdown of directed differentiation assays for definitive endoderm, normal XCI has been associated with reduced develop- mesoderm, ectoderm, and EB and performed RNA-seq. mental and increased oncogenic potential. In human Although XIST expression in these lines changed upon PSC, XIST expression is required for the initiation of differentiation, high XIST lines generally remained high XCI but not for XCI maintenance. Multiple classes of (average 2-fold increase versus PSC) and XIST-low re- female PSC have been described including those which mained low (Figure 7A). While most of these lines retained only undergo XCI upon differentiation (class I), those similar X-to-autosome ratios during differentiation as well, that already have undergone XCI (class II and III), and notable variance among a few lines was observed (SC11- PSC that have lost XIST during culture and have under- 009, SC14-069, SC14-071) (Figure 7A). Comparison of gone eroded XCI (class III) (Hall et al., 2008; Silva et al., the number of passages in these two sets of lines revealed 2008; Vallot et al., 2015). Six of ten iPSC from distinct that low XIST PSC had undergone significantly more pas- female donors show little to no XIST expression by sages in culture (Student’s t test p < 0.05, 6 passages more RNA-seq, with no expression in any of the three hESC on average). At least one prior study has demonstrated (Figure 6C). Restricted analysis of probes on the X chro- that failure to induce to XIST expression upon PSC differ- mosome found 1,118 methylation probes anti-correlated entiation is a hallmark of XCI erosion (Mekhoubad et al., for the same PSC with gene expression (syn3107536). 2012). Taken together, the low XIST female PSC in this These largely overlapped with a prior set of described study appear to be most consistent with class III or eroded XCI-associated probes, 619 out of 3,279 (Nazor et al., XCI. An exception to this rule was SC11-009. This line was 2012)(syn3107535). From these 1,118 probes, we find found to transition from low to high XIST expression from that lines without XIST expression have a decrease in the PSC to the EB. This same trend was observed in all X-chromosome methylation and increased X-to-auto- lineage and EB differentiations, suggesting induction of some gene-expression ratio (Figures 6DandS6C). Each XCI (class I). of these three measures of XCI were correlated to each To analyze the differentiation gene-expression results in other (r >0.6orr < 0.6) (Figure 6E). The observed the context of XCI, we focused on genes with expression continuum of predicted XCI among PSC lines supports correlated (r > 0.6 or r < 0.6) to multiple measures of prior proposed models of variable or precocious XCI PSC XCI in each differentiation state (PSC XIST expres- among cells within each PSC line, rather than 100% con- sion, X-to-autosome ratio, and degree of X-chromosome formity (Hall et al., 2008; Silva et al., 2008). Although methylation). In EB, we found 267 genes correlated to spontaneous differentiation in some cultures could multiple measures of XCI and only 12 anti-correlated (Fig- account for the increased XCI, both XIST expression ure 7B). These correlated genes were most enriched (Top- and X-to-autosome ratios were largely consistent in bio- pGene) in associated with methylation (e.g., logical RNA-seq replicates from the same PSC line (data TRMT11, COMTD1, HENMT1, CAMKMT), neuron-fate not shown). Among 116 genes anti-correlated (Pearson commitment (GATA2 ISL1, FOXA1, SHH), heart develop- r < 0.6) with XIST expression, five (RBBP7, CSNK2A1, ment (ERBB3, ISL1, SSH), and other broad developmental PSMD10, GPC4,andTEX10) shared protein interactions processes. Analysis of the early germ-layer differentia- with at least one core pluripotency factor (POU5F1, tions revealed possible precocious induction of genes SOX2,orNANOG) (BioGRID database). The X-chromo- anti-correlated with measures of XCI in each of the differ- some localized nucleosome remodeling factor RBBP7 entiations, such as myocardin and SMAD3 in definitive was the most anti-correlated with XIST expression and endoderm (Figure S6E, syn5565603). As an improved interacts with all three pluripotency factors at the protein means to evaluate the differentiation potential of these level (Figure 6E). lines, we performed immunohistochemistry on teratoma In addition to these expression differences, 646 auto- sections from female PSC (Figures 7C, 7D, and S6F). On some and allosome probes were differentially methylated average, we detected 18% positive MSA staining and in XIST-high versus XIST-low female lines (unique donors, 4% neurofilament (NF) staining in adjacent histological all probes considered). Among the 236 known associated sections. Strikingly, both differentiation markers were genes, eight transcription factors were hypermethylated correlated to multiple measures of PSC XCI (>0.6), with (Brachyury, ZNF628, CUX1) or hypomethylated (MZF1, MSA most highly correlated to XIST expression (r = SCRT2, SCML1, TFCP2, ZNF148) with high XIST expres- 0.71) and NF to X-to-autosome ratio (r = 0.67). These re- sion. Several of these factors have important roles in sults are in agreement with prior studies indicating that lineage differentiation (Brachyury, SCRT2, TFCP2, CUX1) PSC with increased XIST and XCI results in improved

Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 121 differentiations relative to PSC undergoing XCI erosion consistently different among iPSC and hESC, fewer dif- (Mekhoubad et al., 2012). ferences were observed for mRNA and miRNA. Although many DNA-methylation probes were identified that were highly distinct between iPSC and hESC, including a DISCUSSION number of probes anti-correlated with gene expression (FRG1B, CCL28, CR1L, PEG3), none could perfectly distin- The large-scale profiling of dozens of iPSC and previously guish between these cell types. characterized hESC represents an important analytical At a global level, the analysis of DNA-methylation pro- reference for the stem cell research community. Evaluating files provides important insights into molecular and cell these lines using the same post-reprogramming culture growth characteristics of PSC that would otherwise be diffi- conditions and profiling technologies has allowed us to cult to identify. Reprogramming-associated variations in carefully examine many possible variables. The creation X-linked CpG methylation is of particular interest because of metadata standards and associated ontologies was of the complex variations in degrees of XCI and X-chromo- essential to make informed comparisons and identify some reactivation between various hierarchical states of confounders in our study. All metadata, raw genomic files, pluripotency. Our analysis highlighted two distinct popu- protocols, processed results, and analyses are provided in lations of XCI female cells, with the most hypermethylated Synapse (Omberg et al., 2013). X-chromosomal PSC split between very high and absent Our studies identified 23 iPSC lines with adverse charac- XIST expression. Interestingly, none of the female hESC teristics such as contamination, karyotypic abnormalities, lines in this study expressed XIST, whereas many of the flow cytometry, or culture morphology consistent with iPSC do. Our analyses of differentiated female PSC identi- differentiation. Surprisingly, teratomas generated from 45 fied correlates between XCI and pluripotentiality that sub- of 46 lines, including three with characteristics of differen- stantiate prior proposed models and provide additional tiation, were pluripotent as they contained cells from all candidate molecular regulators for investigation (Mekhou- three embryonic germ layers. Notably, three pluripotent bad et al., 2012; Silva et al., 2008). teratomas also contained undifferentiated cells identified Genes anti-correlated with XIST, principally RBBP7, by histological or immunostaining analyses, although share protein interactions with core pluripotency regula- independent tumors from the same lines were fully differ- tors, and these differences persist in the EB. This result entiated and did not. Given that the teratoma assay is is particularly intriguing given that RBBP7, a partner of commonly used to confirm PSC pluripotency and quality PRC2 implicated in nucleosome binding, and SUZ12,a (Muller et al., 2010), these results suggest that teratoma component of PRC2 required for stability of the complex analysis should be considered within the context of other and EZH1/2 mediated catalytic activity, were highly en- analyses and results to determine the quality of the PSC riched factors in our analysis of differentially methylated line and not as a stand-alone quality measure. sex-associated autosomal genes. In addition to RBBP7, pre- Evaluation of deleterious CNV provided strong evidence dicted regulation by PRC2 was recurrent in a number of our that the same genetic abnormalities can occur in distinct covariate analyses, including iPSC versus hESC differen- iPSC lines and that such abnormalities can arise during tially methylated probes. A growing body of literature the reprogramming process. As described in other studies, now supports an important role for PRC2 in pluripo- we were unable to exclude the possibility that there was tency, XCI, and differentiation as a recruitment tool of heterogeneity in the starting cell population (Ma et al., PRC1 (Cheng et al., 2014). Although likely not relevant 2014). CNV that were coincident with differential expres- in vivo, such physical protein and epigenetic interactions sion frequently resulted in the deletion of known tumor could be undesirable in iPSC for programmed lineage suppressors or duplication of cell growth/oncogenic fac- differentiation. tors. Such genetic abnormalities could result in clonal The resource presented here provides a standardized and selection advantages that are undesirable for clinical appli- annotated dataset for the characterization of hESC and cations (Cunningham et al., 2012). iPSC that can be applied to the discovery of molecular The DNA-methylation, gene-expression, miRNA, and determinants underlying specific biological properties of splicing differences observed in these studies represent iPSC and their use for future clinical applications. It intriguing differences in PSC that could result in differ- provides information on the most relevant, informative, ences in pluripotentiality, cell growth, or potential tumor- and efficient assays to use for iPSC characterization. Finally, igenicity in vivo. The existence of consistent patterns the new data repository encodes a standard for the bio- between DNA methylation and mRNA or miRNA expres- logical, genomic, and epigenomic characteristics of high- sion provides an additional layer of confidence in these quality, stable iPSC that will serve as a valuable resource observations. While methylation profiles were highly and as iPSC technology moves into clinical translation. The

122 Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 many observed reprogramming and cell-of-origin gene- from AltAnalyze (Emig et al., 2010). miRNA expression was expression and splicing differences provide intriguing quantified with mirExpress v2.1.4 (Wang et al., 2009) using the hu- starting hypotheses to fuel new research. man miRBase 20.0 reference (syn2247097). Methylation arrays We aim to improve the breadth and utility of this new were normalized with the minfi R package (Aryee et al., 2014) resource by adding additional pluripotent lines and differ- (syn2677441). Raw data and processing scripts with exact para- meters used are available and are linked together by provenance entiated products. Integration of previously published in Synapse (Table S9). and new datasets will further facilitate advanced cross-com- parison analyses, many of which can be achieved using the SUPPLEMENTAL INFORMATION online data analysis and exploratory tools provided within the Synapse programmatic and web interface. By providing Supplemental Information includes Supplemental Experimental consistent cell-line descriptions, protocols, and associated Procedures, Supplemental Results, six figures, and nine tables and data in an easy-to-access online repository, we hope that can be found with this article online at http://dx.doi.org/10.1016/ these observations will fuel future research into the role of j.stemcr.2016.05.006. these gene signatures in resulting progenitor populations. AUTHOR CONTRIBUTIONS

EXPERIMENTAL PROCEDURES J.W., W.H., A.K.H., P.M., A.M., J.A.C., B.A., and C.L. conceived of and designed the study. N.S., P.J.D., L.O., R.S., S.B., L.S., S.H.S., M.K., C.M., K.D., and S.H. generated and assembled the data. Methods and Data Availability N.S., P.J.D., L.O., R.S., S.B., J.H., M.K., S.K.S., K.D., Y.W., E.Z., All data and methods described herein are available at https:// P.M., J.A.C., B.A., and C.L. analyzed and interpreted the data. www.synapse.org/PCBC (http://dx.doi.org/10.7303/syn1773109) N.S., P.J.D., L.O., L.S., S.H.S., J.H., and C.L. wrote the manuscript. and/or Supplemental Experimental Procedures. Accessions for specific methods are provided in methods sections and Table S9. ACKNOWLEDGMENTS For interactive analyses, customized data exploration options have been integrated into Synapse to facilitate gene-level, cluster, The authors thank the PCBC line contributors in Table S1; Dr. and ToppGene functional enrichment analyses. Michael Terrin, Ling Tang, Andrea Lefever, and Liz Casher from the Administrative Coordinating Center; and Dr. Elke Grassman and Diana Nordling from the CCHMC Translational Core Labora- Cell Lines tories for support. This work was supported by the NHLBI Progen- The lines brought into the study included commonly used but itor Cell Biology Consortium, Administrative Coordinating Center distinct variables from multiple laboratories (Figure 2A). The line (U01HL099997), Cell Characterization Core, Bioinformatics Core, identifiers, originating laboratory, and key contributing scientists and PCBC2012Pilot_01. Other support was provided by the Na- for each line are provided in Table S1. tional Heart, Lung, and Blood Institute (NHLBI) (U01HL099775), the National Institute for Child Health and Human Development Genomic and Epigenetic Molecular Characterization (NICHD) (R01HD082098), the National Institute General Medical mRNA-Seq libraries were prepared with the Illumina TruSeq kit Sciences (NIGMS) (R01GM110628), and the National Eye Institute RNAV2. Single-end libraries were sequenced at a depth of between (NEI) (R01EY023962). Under a licensing agreement between Life 10 and 30 million 50-nt reads on an Illumina HiSeq 2000. A small Technologies Corporation and the Johns Hopkins University, number (n = 3) of ESC and iPSC were also sequenced at a depth E.Z. is entitled to a share of royalty received by the University on of 50 million paired-end, stranded reads, for comparison. miRNA sales of human induced pluripotent stem cell lines. The terms of libraries were prepared with the Illumina TruSeq Small RNA kit and this arrangement are being managed by the Johns Hopkins Univer- sequenced to 1–4 million reads. Methylation was assessed with sity in accordance with its Conflict of Interest policies. the Illumina HumanMethylation450 BeadChip with annotations provided by ENCODE (Encode Project Consortium, 2012). Two Received: August 31, 2015 different assays were used for CNV analysis. 21 cell lines were as- Revised: May 9, 2016 sayed with the Illumina CytoSNP-850K BeadChip, and 29 cell lines Accepted: May 10, 2016 with the Illumina HD HumanOMNI-Quad BeadChip platform. Published: June 9, 2016 Thirty-seven lines were assayed using a TaqMan Low Density Array (Life Technologies, 4385344) containing a panel of stem cell and REFERENCES pluripotency marker genes (syn3107327). Abyzov, A., Mariani, J., Palejev, D., Zhang, Y., Haney, M.S., Toma- Data Processing sini, L., Ferrandino, A.F., Rosenberg Belmaker, L.A., Szekely, A., FASTQ files were aligned to the build GRCh37 and Wilson, M., et al. (2012). Somatic copy number mosaicism in hu- 492 University of California Santa Cruz transcriptome reference (Rose- man skin revealed by induced pluripotent stem cells. Nature , nbloom et al., 2014) using TopHat 2.0.9 (Kim et al., 2013) 438–442. (syn1773110). Gene-level RPKM (reads per kilobase per million Aryee, M.J., Jaffe, A.E., Corrada-Bravo, H., Ladd-Acosta, C., Fein- mapped reads) and alternative splicing estimates were obtained berg, A.P., Hansen, K.D., and Irizarry, R.A. (2014). Minfi: a flexible

Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 123 and comprehensive Bioconductor package for the analysis of Infin- Abnormalities in human pluripotent cells due to reprogramming ium DNA methylation microarrays. Bioinformatics 30, 1363– mechanisms. Nature 511, 177–183. 1369. McConnell, M.J., Lindberg, M.R., Brennand, K.J., Piper, J.C., Voet, Bock, C., Kiskinis, E., Verstappen, G., Gu, H., Boulting, G., Smith, T., Cowing-Zitron, C., Shumilina, S., Lasken, R.S., Vermeesch, J.R., Z.D., Ziller, M., Croft, G.F., Amoroso, M.W., Oakley, D.H., et al. Hall, I.M., et al. (2013). Mosaic copy number variation in human (2011). Reference maps of human ES and iPS cell variation enable neurons. Science 342, 632–637. high-throughput characterization of pluripotent cell lines. Cell Mekhoubad, S., Bock, C., de Boer, A.S., Kiskinis, E., Meissner, A., 144, 439–452. and Eggan, K. (2012). Erosion of dosage compensation impacts hu- Chen, J., Bardes, E.E., Aronow, B.J., and Jegga, A.G. (2009). man iPSC disease modeling. Cell Stem Cell 10, 595–609. ToppGene Suite for gene list enrichment analysis and candidate Muller, F.J., Goldmann, J., Loser, P., and Loring, J.F. (2010). A call to gene prioritization. Nucleic Acids Res. 37, W305–W311. standardize teratoma assays used to define human pluripotent cell Cheng, B., Ren, X., and Kerppola, T.K. (2014). KAP1 represses dif- lines. Cell Stem Cell 6, 412–414. ferentiation-inducible genes in embryonic stem cells through cooperative binding with PRC1 and derepresses pluripotency-asso- Mu¨ller, F.J., Schuldt, B.M., Williams, R., Mason, D., Altun, G., Papa- ciated genes. Mol. Cell. Biol. 34, 2075–2091. petrou, E.P., Danner, S., Goldmann, J.E., Herbst, A., Schmidt, N.O., et al. (2011). A bioinformatic assay for pluripotency in human Clancy, J.L., Patel, H.R., Hussein, S.M., Tonge, P.D., Cloonan, N., cells. Nat. Methods 8, 315–317. Corso, A.J., Li, M., Lee, D.S., Shin, J.Y., Wong, J.J., et al. (2014). Small RNA changes en route to distinct cellular states of induced Nazor, K.L., Altun, G., Lynch, C., Tran, H., Harness, J.V., Slavin, I., pluripotency. Nat. Commun. 5, 5522. Garitaonandia, I., Muller, F.J., Wang, Y.C., Boscolo, F.S., et al. (2012). Recurrent variations in DNA methylation in human plurip- Cunningham, J.J., Ulbright, T.M., Pera, M.F., and Looijenga, L.H. otent stem cells and their differentiated derivatives. Cell Stem Cell (2012). Lessons from human teratomas to guide development of 10, 620–634. safe stem cell therapies. Nat. Biotechnol. 30, 849–857. Nguyen, H.T., Geens, M., Mertzanidou, A., Jacobs, K., Heirman, C., Emig, D., Salomonis, N., Baumbach, J., Lengauer, T., Conklin, B.R., Breckpot, K., and Spits, C. (2014). Gain of 20q11.21 in human em- and Albrecht, M. (2010). AltAnalyze and DomainGraph: analyzing bryonic stem cells improves cell survival by increased expression of and visualizing exon expression data. Nucleic Acids Res. 38, Bcl-xL. Mol. Hum. Reprod. 20, 168–177. W755–W762. Encode Project Consortium (2012). An integrated encyclopedia of Omberg, L., Ellrott, K., Yuan, Y., Kandoth, C., Wong, C., Kellen, DNA elements in the human genome. Nature 489, 57–74. M.R., Friend, S.H., Stuart, J., Liang, H., and Margolin, A.A. (2013). Enabling transparent and collaborative computational Gopalakrishna-Pillai, S., and Iverson, L.E. (2011). A DNMT3B alter- analysis of 12 tumor types within the Cancer Genome Atlas. Nat. natively spliced exon and encoded peptide are novel biomarkers of Genet. 45, 1121–1126. human pluripotent stem cells. PLoS One 6, e20663. Osborne, M.J., Volpon, L., Kornblatt, J.A., Culjkovic-Kraljacic, B., Hall, L.L., Byron, M., Butler, J., Becker, K.A., Nelson, A., Amit, M., Baguet, A., and Borden, K.L. (2013). eIF4E3 acts as a tumor suppres- Itskovitz-Eldor, J., Stein, J., Stein, G., Ware, C., et al. (2008). X-inac- sor by utilizing an atypical mode of methyl-7-guanosine cap recog- tivation reveals epigenetic anomalies in most hESC but identifies nition. Proc. Natl. Acad. Sci. USA 110, 3877–3882. sublines that initiate as expected. J. Cell. Physiol. 216, 445–452. Rosenbloom, K.R., Armstrong, J., Barber, G.P., Casper, J., Clawson, International Stem Cell Initiative, Adewumi, O., Aflatoonian, B., H., Diekhans, M., Dreszer, T.R., Fujita, P.A., Guruvadoo, L., Haeuss- Ahrlund-Richter, L., Amit, M., Andrews, P.W., Beighton, G., Bello, ler, M., et al. (2014). The UCSC Genome Browser database: 2015 P.A., Benvenisty, N., Berry, L.S., et al. (2007). Characterization of update. Nucleic Acids Res. 43, D670–D681. human embryonic stem cell lines by the International Stem Cell Initiative. Nat. Biotechnol. 25, 803–816. Rouhani, F., Kumasaka, N., de Brito, M.C., Bradley, A., Vallier, L., Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salz- and Gaffney, D. (2014). Genetic background drives transcriptional berg, S.L. (2013). TopHat2: accurate alignment of transcriptomes variation in human induced pluripotent stem cells. PLoS Genet. 10 in the presence of insertions, deletions and gene fusions. Genome , e1004432. Biol. 14, R36. Salomonis, N., Nelson, B., Vranizan, K., Pico, A.R., Hanspers, K., Lee, D.S., Shin, J.Y., Tonge, P.D., Puri, M.C., Lee, S., Park, H., Lee, Kuchinsky, A., Ta, L., Mercola, M., and Conklin, B.R. (2009). Alter- W.C., Hussein, S.M., Bleazard, T., Yun, J.Y., et al. (2014). An epige- native splicing in the differentiation of human embryonic stem 5 nomic roadmap to induced pluripotency reveals DNA methylation cells into cardiac precursors. PLoS Comput. Biol. , e1000553. as a reprogramming modulator. Nat. Commun. 5, 5619. Schlaeger, T.M., Daheron, L., Brickler, T.R., Entwisle, S., Chan, K., Lu, Y., Loh, Y.H., Li, H., Cesana, M., Ficarro, S.B., Parikh, J.R., Salo- Cianci, A., DeVine, A., Ettenger, A., Fitzgerald, K., Godfrey, M., monis, N., Toh, C.X., Andreadis, S.T., Luckey, C.J., et al. (2014). et al. (2015). A comparison of non-integrating reprogramming Alternative splicing of MBD2 supports self-renewal in human methods. Nat. Biotechnol. 33, 58–63. 15 pluripotent stem cells. Cell Stem Cell , 92–101. Silva, S.S., Rowntree, R.K., Mekhoubad, S., and Lee, J.T. (2008). Ma, H., Morey, R., O’Neil, R.C., He, Y., Daughtry, B., Schultz, M.D., X-chromosome inactivation and epigenetic fluidity in human em- Hariharan, M., Nery, J.R., Castanon, R., Sabatini, K., et al. (2014). bryonic stem cells. Proc. Natl. Acad. Sci. USA 105, 4820–4825.

124 Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 Takahashi, K., and Yamanaka, S. (2006). Induction of pluripotent depends on a specific heterochromatin landscape. Cell Stem stem cells from mouse embryonic and adult fibroblast cultures Cell 16, 533–546. by defined factors. Cell 126, 663–676. Wang, W.C., Lin, F.M., Chang, W.C., Lin, K.Y., Huang, H.D., and Lin, Thomson, J.A., Itskovitz-Eldor, J., Shapiro, S.S., Waknitz, M.A., N.S. (2009). miRExpress: analyzing high-throughput sequencing Swiergiel, J.J., Marshall, V.S., and Jones, J.M. (1998). Embryonic data for profiling microRNA expression. BMC Bioinformatics stem cell lines derived from human blastocysts. Science 282, 10,328. 1145–1147. Yan, L., Yang, M., Guo, H., Yang, L., Wu, J., Li, R., Liu, P., Lian, Y., Tonge, P.D., Corso, A.J., Monetti, C., Hussein, S.M., Puri, M.C., Zheng, X., Yan, J., et al. (2013). Single-cell RNA-seq profiling of Michael, I.P., Li, M., Lee, D.S., Mar, J.C., Cloonan, N., et al. human preimplantation embryos and embryonic stem cells. Nat. (2014). Divergent reprogramming routes lead to alternative stem- Struct. Mol. Biol. 20, 1131–1139. cell states. Nature 516, 192–197. Young, M.A., Larson, D.E., Sun, C.W., George, D.R., Ding, L., Vallot, C., Ouimette, J.F., Makhlouf, M., Feraud, O., Pontis, J., Miller, C.A., Lin, L., Pawlik, K.M., Chen, K., Fan, X., et al. (2012). Come, J., Martinat, C., Bennaceur-Griscelli, A., Lalande, M., and Background mutations in parental cells account for most of the ge- Rougeulle, C. (2015). Erosion of X chromosome inactivation netic heterogeneity of induced pluripotent stem cells. Cell Stem in human pluripotent cells initiates with XACT coating and Cell 10, 570–582.

Stem Cell Reports j Vol. 7 j 110–125 j July 12, 2016 125