Molecular Recording of Mammalian Embryogenesis Michelle M
Total Page:16
File Type:pdf, Size:1020Kb
ARTICLE https://doi.org/10.1038/s41586-019-1184-5 Molecular recording of mammalian embryogenesis Michelle M. Chan1,2,14, Zachary D. Smith3,4,5,14, Stefanie Grosswendt6, Helene Kretzmer6, Thomas M. Norman1,2, Britt Adamson1,2,13, Marco Jost1,2,7, Jeffrey J. Quinn1,2, Dian Yang1,2, Matthew G. Jones1,2,8, Alex Khodaverdian9,10, Nir Yosef9,10,11,12, Alexander Meissner3,4,6* & Jonathan S. Weissman1,2* Ontogeny describes the emergence of complex multicellular organisms from single totipotent cells. This field is particularly challenging in mammals, owing to the indeterminate relationship between self-renewal and differentiation, variation in progenitor field sizes, and internal gestation in these animals. Here we present a flexible, high-information, multi-channel molecular recorder with a single-cell readout and apply it as an evolving lineage tracer to assemble mouse cell-fate maps from fertilization through gastrulation. By combining lineage information with single-cell RNA sequencing profiles, we recapitulate canonical developmental relationships between different tissue types and reveal the nearly complete transcriptional convergence of endodermal cells of extra-embryonic and embryonic origins. Finally, we apply our cell-fate maps to estimate the number of embryonic progenitor cells and their degree of asymmetric partitioning during specification. Our approach enables massively parallel, high-resolution recording of lineage and other information in mammalian systems, which will facilitate the construction of a quantitative framework for understanding developmental processes. The development of a multicellular organism from a single cell is an aston- Here we generate and validate a method for simultaneously reporting ishing process. Classic lineage-tracing experiments using Caenorhabditis cellular state and lineage history in mice. Our CRISPR–Cas9-based elegans revealed surprising outcomes, which include deviations between recorder is capable of generating high-information content and can lineage and functional phenotype, but nonetheless benefitted from the perform multi-channel recording with readily tunable mutation rates. highly deterministic nature of development in this organism1. More- We use the recorder as a continuously evolving lineage tracer to observe complex species generate larger and more elaborate structures that the fate map that underlies mouse embryogenesis through gastrulation, progress through multiple transitions, which raise questions about the recapitulating canonical paradigms and illustrating how lineage infor- coordination between specification and commitment to ensure faith- mation may facilitate the identification of novel cell types. ful recapitulation of an exact body plan2,3. Single-cell RNA sequencing (scRNA-seq) has permitted unprecedented explorations into cell-type A transcribed and evolving recorder heterogeneity, and has produced profiles of developing flatworms4,5, To achieve our goal of a tunable molecular recorder that is capable frogs6, zebrafish7,8 and mice9,10. More recently, CRISPR–Cas9-based of creating high-information content, we used Cas9 to generate dou- technologies have been applied to record cell lineage11–14, and combined ble-stranded breaks that result in heritable insertions or deletions with scRNA-seq to generate fate maps in zebrafish15–17. However, these (indels) after repair11–17. We record within a 205-base-pair, synthetic technologies include only one or two bursts of barcode-diversity genera- DNA ‘target site’ that contains three ‘cut sites’ and a static 8-base-pair tion, which may be limiting for other applications or organisms. ‘integration barcode’, which is delivered in multiple copies using piggy- An ideal molecular recorder for addressing developmental questions Bac transposition (Fig. 1a, b). We embedded this sequence into the 3′ in complex multicellular organisms would possess the following charac- untranslated region of a constitutively transcribed fluorescent protein teristics: (1) minimal effect on cellular phenotype; (2) high-information to enable profiling from the transcriptome. A second cassette encodes content to account for hundreds of thousands of cells; (3) a single-cell three independently transcribed and complementary guide RNAs to readout for simultaneous profiling of functional state15–17; (4) flexi- permit recording of multiple distinct signals19 (Fig. 1a, b). ble recording rates that can be tuned to a broad temporal range; and Our system is capable of high-information storage owing to the (5) continuous generation of diversity throughout the experiment. The diversity of heritable repair outcomes and the large number of target last point is especially relevant for mammalian development, in which sites, which can be distinguished by the integration barcode (Fig. 1c). spatial plans are gradually and continuously specified and may origi- DNA repair generates hundreds of unique indels, and the distribu- nate from small, transient progenitor fields. scRNA-seq has revealed tion for each cut site is different and non-uniform; some regions lead populations of cells that have a continuous spectrum of phenotypes, to highly biased outcomes, whereas others create a diverse series20–22 which implies that differentiation does not occur instantaneously and (Fig. 1c, Extended Data Fig. 1). To identify sequences that can tune the further highlights the need for an evolving recorder18. mutation rate of our recorder for time scales that are not pre-defined 1Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA. 2Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, CA, USA. 3Broad Institute of MIT and Harvard, Cambridge, MA, USA. 4Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA. 5Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA. 6Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany. 7Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA, USA. 8Integrative Program in Quantitative Biology, University of California, San Francisco, San Francisco, CA, USA. 9Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA. 10Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA. 11Chan Zuckerberg Biohub, San Francisco, CA, USA. 12Ragon Institute of Massachusetts General Hospital, MIT and Harvard University, Cambridge, MA, USA. 13Present address: Department of Molecular Biology, Lewis Sigler Institute, Princeton University, Princeton, NJ, USA. 14These authors contributed equally: Michelle M. Chan, Zachary D. Smith. *e-mail: [email protected]; [email protected] 6 JUNE 2019 | VOL 570 | NATURE | 77 RESEARCH ARTICLE d Controls bri1 bam3 ade2 1.0 a Target-site cassette Gal4-4 Target site c Mismatch EF1α 20 FP 321 pA n = 10 intBCs 0.5 IntBC Rate Three-guide cassette sgGFP Site 2 Site 1 1, P 1 3 0 2 U6 10 5 ′ 123 fraction white-O white-B white-L Chrl_106412 + 1.0 3 sgRN GFP 2 A Unique reads (%) 0 0.5 1 b Uncut Indels Site 1 All All region Constant IntBC ––+ Site 3 + Cas9 0 + 3× sgRNA 0816 0816 0816 0816 Time (days) Time (days) Time (days) Time (days) Fig. 1 | Optimization of a multi-purpose molecular recorder. a, Target- of the intBC. All, all three cut sites. d, sgRNA mismatches alter mutation site (top) and three-guide (bottom) cassettes. The target site consists of an rate. Seven protospacers (bri1, bam3, ade2, white-O, white-B, white-L and integration barcode (intBC) and three cut sites for Cas9-based recording. chrl_106412) were integrated into the coding sequence of a GFP reporter Three different single-guide RNAs (sgRNAs) are each controlled by to infer mutation rate by the fraction of GFP-positive cells over a 20-day independent promoters (in this study, mouse U6, human U6 and bovine time course. Single or dual mismatches were made in guides according to U6). FP, fluorescent protein; sites 1–3 are indicated. b, Molecular recording proximity to the protospacer adjacent motif: region 1 (proximal), region 2 principle. Each cell contains multiple genomic, intBC-distinguishable and region 3 (distal). Guides against Gal4-4 and the GFP coding sequence target-site integrations. sgRNAs direct Cas9 to cognate cut sites to generate (sgGFP) act as negative and positive controls. Sequence names in black insertion (red) or deletion mutations. Here Cas9 is either ectopically were incorporated into the target site. P, perfect complementarity between delivered or induced by doxycycline. c, Percentage of uniquely marked guide RNA and protospacer; 3, mismatch in region 3; 1, mismatch in reads recovered after recording within a K562 line with ten intBCs for region 1; 1,2, simultaneous mismatches in regions 1 and 2. six days. Information content scales with number of sites and presence a b Transposase mRNA E9.5 Body Rosa26::Cas9:eGFP Head Tail Target PiggyBAC library Yolk sac Target site Three guide array site Placenta Fertilization ITR ITR E8.5–E9.5 01Correlation c 3 d Guide array Site 1: 510 unique indels 1, 1, 1 2, 1, 2 2, 1, P P, 1, P 100 75 ) I I I 0 I D D D D D D D D D D D D D D D D D D D D D D D D D D 50 139:1: 136:2: 139:2: 136:1: 139:3: 92:47: 140:2: 94:55: 137:4: 138:8: 140:3: 141:1: 91:52: 139:4: 139:7: 136:2: 93:46: 136:3: 138:1: 133:22: 126:12: 131:21: 131:10: 139:11: 131:11: 126:11: 137:10: 136:14: 135:12: 131:22: