
bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.994954; this version posted March 19, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Single cell chromatin accessibility of the developmental cephalochordate

Dongsheng Chen1, Zhen Huang2,3, Xiangning Ding1,4, Zaoxu Xu1,4, Jixing Zhong1,4, Langchao Liang1,4, Luohao Xu5, Chaochao Cai1,4, Haoyu Wang1,4, Jiaying Qiu1,4, Jiacheng Zhu1,4, Xiaoling Wang1,4, Rong Xiang1,4, Weiying Wu1,4, Peiwen Ding1,4, Feiyue Wang1,4, Qikai Feng1,4, Si Zhou1, Yuting Yuan1, Wendi Wu1, Yanan Yan2,3, Yitao Zhou2,3, Duo Chen2,3, Guang Li6, Shida Zhu1, Fang Chen1,7, Qiujin Zhang2,3, Jihong Wu8,9,10,11 & Xun Xu1

1. BGI-Shenzhen, Shenzhen 518083, China
2. Life Science College, Fujian Normal University
3. Key Laboratory of Special Marine Bio-resources Sustainable Utilization of Fujian Province, Fuzhou 350108, P. R. China
4. BGI Education Center, University of Chinese Academy of Sciences, Shenzhen 518083, China
5. Department of Neuroscience and Developmental Biology, University of Vienna
6. State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiangan District, Xiamen, Fujian 361102, China
7. MGI, BGI-Shenzhen, Shenzhen 518083, China
8. Eye Institute, Eye and ENT Hospital, Shanghai Medical College, Fudan University, Shanghai, China
9. Shanghai Key Laboratory of Visual Impairment and Restoration, Science and Technology Commission of Shanghai Municipality, Shanghai, China
10. Key Laboratory of Myopia (Fudan University), Chinese Academy of Medical Sciences, National Health Commission, China
11. State Key Laboratory of Medical Neurobiology, Institutes of Brain Science and Collaborative Innovation Center for Brain Science, Shanghai Medical College, Fudan University, Shanghai, China

Abstract: The phylum Chordata is composed of three groups: Vertebrata, Tunicata and Cephalochordata. Single cell developmental atlases for typical species in Vertebrata (mouse, zebrafish, western clawed frog), as well as for worm and Tunicata (sea squirts), have been constructed recently. However, a single cell resolution atlas for amphioxus, a living proxy of vertebrate ancestors, has not been achieved yet. Here, we profiled more than 57 thousand cells during the development of the Florida lancelet (Branchiostoma floridae), covering important processes including embryogenesis, organogenesis and metamorphosis. We identified stage-specific and cluster-specific regulatory elements. Additionally, we revealed the regulatory codes underlying functional specification and lineage commitment. Based on epigenetic features, we constructed the developmental trajectory for lancelet, elucidating how cell fates were established progressively. Overall, our study provides the first single cell regulatory landscape of amphioxus, which could help us to understand the heterogeneity and complexity of lancelet development at single cell resolution and throw light upon the great transition from a simple chordate ancestor to modern vertebrates with amazing diversity and endless forms.

Introduction:

Vertebrates evolved through two rounds of ancient whole genome duplication, followed by functional divergence in terms of regulatory circuits and gene expression patterns1. Lancelets (amphioxus) are derived from an ancient chordate lineage and belong to the cephalochordate clade2. As the sister group of vertebrates and tunicates, cephalochordates show great morphological similarity to fossils from the Cambrian and have thus been considered “living fossils”3, or living proxies for the chordate ancestor. As a slow-evolving chordate, amphioxus is a good model for exploring evolutionary history and provides insights into vertebrate origin and evolution.

Amphioxus has been considered a key phylogenetic model for studying vertebrate origin4. In contrast to tunicates, cephalochordates have maintained their basic body plan since the Cambrian; studies of cephalochordates could therefore provide valuable insights into the patterning mechanisms conserved among chordates. Yang et al. performed RNAseq for Branchiostoma, covering 13 developmental stages ranging from fertilized egg to adult. In total, they identified 3423 genes whose expression changed significantly during development, including an increase in translation-associated genes and a decrease in cell cycle genes5. Marlétaz et al. performed whole genome sequencing for Branchiostoma lanceolatum and constructed genomic profiles including transcriptome, histone modification, DNA methylation and chromatin accessibility at bulk level. This study revealed high conservation of cis elements and gene expression patterns between vertebrates and amphioxus at the early-to-middle embryonic stage, supporting the hourglass model6.

Although previous studies revealed key components and regulatory mechanisms of lancelet development, they failed to detect rare cell populations and rapidly changing pathway signals during animal development because of the diversity of cell types and complicated cellular interactions. To understand the genomic and epigenomic landscapes of animal development, single cell sequencing methods have been widely employed in developmental biology. The developmental processes of mouse, zebrafish, clawed frog, worm and ascidian have been investigated at single cell resolution7-11, revealing cell lineages and developmental trajectories. These studies elucidated the dynamic gene expression profiles during the development of rodent, fish, amphibian, nematode and proto-vertebrate. However, a cellular and molecular atlas of more ancient species such as amphioxus is still missing. In this study, we profiled the chromatin accessibility and gene expression of Branchiostoma floridae at bulk level, covering 18 continuous developmental stages, including sperm, oocyte, zygote, 2 cell, 8 cell, 16 cell, 64 cell, blastula, early gastrulation (early-gas, or EG), late gastrulation (late-gas, or LG), early neurula (early-neu, or EN), middle neurula (mid-neu, or MN), late neurula (late-neu, or LN), larva, early metamorphosis (early-meta, or EM), middle metamorphosis (mid-meta, or MM) and late metamorphosis (late-meta, or LM). Meanwhile, we characterized single cell chromatin accessibility from blastula to larva stage. Our study revealed, for the first time, the dynamic epigenetic landscape of lancelet development at single cell resolution, and classified cells into different groups with distinct chromatin accessibility profiles. Moreover, we traced the turnover of cis regulatory elements of each cell lineage through the whole life cycle of Branchiostoma floridae.

Results:

Profiling of cis regulatory elements during amphioxus development

To evaluate the dynamics of cis regulatory elements (CREs) during Branchiostoma floridae development, we performed the assay for transposase-accessible chromatin with high throughput sequencing (ATACseq) on Branchiostoma floridae at 18 time points (Figure 1a). Over the entire developmental period studied, we detected a total of 2,181,151 peaks after stringent filtering. To evaluate the reproducibility of each stage, we performed principal component analysis (PCA) (Figure S1a), which discriminated between developmental stages and indicated that replicates of the same stage tended to share higher similarity. Genomic feature analysis (Figure S1b) showed that more than 25% of peaks fell within promoter regions. Peaks located in promoter regions increased progressively from zygote to middle neurula stage, rising dramatically after early gastrulation, and decreased afterwards, which might reflect corresponding dynamics of regulatory elements across developmental stages. Interestingly, the neurula stage possessed the fewest peaks but the largest ratio of peaks within promoter regions.

CREs of master regulators and signaling pathway components were under dynamic regulation during amphioxus development

To pinpoint the profile of dynamically accessible regions during amphioxus development, we visualized the ATACseq data of all time points using the Integrative Genomics Viewer (IGV). Hox genes encode a family of transcription factors that play crucial roles in specifying body plans along the head-to-tail axis in various animals12. In Branchiostoma floridae, a total of 15 Hox genes are clustered on chromosome 17 within a 500 kb range (Figure 1b). Based on our data, chromosome regions close to the Hox1_Bf, Hox2_Bf and Hox3_Bf genes started to open dramatically from early gastrulation stage and maintained their epigenetic signals through the metamorphosis stages. Furthermore, the signals of these three Hox gene regions were much stronger than those of Hox4_Bf to Hox15_Bf. Among the Hox genes with relatively weaker signal intensity, four main accessibility patterns were observed across developmental stages. In the first pattern, signals were rather weak in sperm, oocyte and early embryonic stages, but rose sharply from early gastrulation stage and were maintained until late metamorphosis. Hox4_Bf followed the first pattern. In the second pattern, regions were open in sperm, in zygote and from early gastrulation to late metamorphosis, but were nearly closed in oocyte and in early embryonic stages from 2-cell to blastula. Hox5_Bf, Hox6_Bf, Hox7_Bf, Hox9_Bf, Hox10_Bf, Hox11_Bf, Hox13_Bf and Hox15_Bf followed the second pattern. Hox8_Bf and Hox12_Bf followed the third pattern, in which regions proximal to the genes appeared to be closed across all embryonic stages. In the last pattern, which Hox14_Bf followed, peaks emerged in sperm, zygote, 2-cell, early gastrulation, early neurula, and early to late metamorphosis stages, but were hardly seen in the remaining stages.
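The assignment of Hox4_Bf to Hox15_Bf to the four accessibility patterns described above can be written down as a small lookup table. The following is a minimal bookkeeping sketch only: the pattern labels and variable names are illustrative assumptions and are not part of the original analysis, while the gene assignments follow the text. It checks that each of the twelve genes is assigned to exactly one pattern.

```python
# Bookkeeping sketch of the four accessibility patterns described in the text.
# Pattern labels and names are hypothetical; gene assignments follow the text.
HOX_PATTERNS = {
    "pattern_1_open_from_early_gastrulation": [4],
    "pattern_2_open_in_sperm_zygote_and_from_early_gastrulation": [5, 6, 7, 9, 10, 11, 13, 15],
    "pattern_3_closed_across_embryonic_stages": [8, 12],
    "pattern_4_intermittent": [14],
}


def gene_to_pattern(patterns):
    """Invert the table into gene name -> pattern label, rejecting duplicates."""
    lookup = {}
    for label, numbers in patterns.items():
        for n in numbers:
            name = f"Hox{n}_Bf"
            if name in lookup:
                raise ValueError(f"{name} is assigned to more than one pattern")
            lookup[name] = label
    return lookup


HOX_LOOKUP = gene_to_pattern(HOX_PATTERNS)

# Hox4_Bf to Hox15_Bf should all be covered exactly once.
expected_genes = {f"Hox{n}_Bf" for n in range(4, 16)}
missing_genes = expected_genes - set(HOX_LOOKUP)
print(len(HOX_LOOKUP), sorted(missing_genes))
```

Hox1_Bf to Hox3_Bf are deliberately left out of the table, since the text treats them as a separate, strongly accessible group.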

Signals of peaks near the Sox2, Sox7 and KLF4 genes emerged at early gastrulation stage, were maintained until larva stage, and weakened afterwards. Sox2 was previously reported to be evolutionarily conserved and essential for the pluripotency of embryonic stem cells and neural progenitor cells in various species13. KLF4 can act both as an activator and as a repressor and is involved in regulating transcription, cell proliferation and differentiation, and in maintaining stem cell populations14. The Hmbox1 gene was near open regions specific to the sperm stage, while the Casp8 gene tended to be near open peaks at almost all stages except middle neurula and late neurula.

Cell-to-cell communication pathways are of great importance for the commitment of cell fates during amphioxus development15. Here we investigated the chromatin accessibility of genes involved in crucial pathways during amphioxus development, namely the Hedgehog, Notch, Nuclear Receptor (NR), Receptor Tyrosine Kinase (RTK), Transforming Growth Factor-β (TGF-β) and Wingless/Int (Wnt) pathways (Figure 1c). Previous research reported that amphioxus carrying a mutation of the hh gene showed deformities including a curled tail and no mouth, indicating a role of the Hedgehog pathway in left/right asymmetry control in amphioxus16. Based on our data, CREs of the hh gene were accessible from early gastrulation stage until larva stage, which is consistent with a previous report that hh transcripts were first detected in the dorsal mesendoderm at the early gastrula stage17. The Lrp5 gene in the Wnt pathway and the NOTCH4 gene in the Notch pathway appeared to be accessible at almost all time points except oocyte and late neurula stage, whereas the FGFR gene in the RTK pathway tended to be accessible during early embryonic stages.

Single-cell chromatin accessibility atlas of amphioxus development

We performed single cell ATACseq on samples ranging from gastrulation to larva stage, and obtained data for around 57 thousand cells passing quality control (Figure 1a). Specifically, the chromatin accessibility status of 11081, 7026, 6316, 9301, 14215 and 9470 cells was detected from early gastrula, late gastrula, early neurula, middle neurula, late neurula and larva, respectively. Low-dimensional visualization of the chromatin accessibility atlas at the six stages revealed increasing complexity during embryogenesis (Figure 2a). Each of the 15 clusters identified by similarity of epigenomic profile comprised cells from no fewer than two adjacent sampled stages (Figure 2b), which confirms the comparability of the series of samples. Next, we clustered cells at each time point and then classified the chromatin accessibility of each cluster by a hierarchical clustering approach to infer the relatedness of clusters. We then annotated clusters by the accessibility of embryonic cell type markers of amphioxus.
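As a quick consistency check, the per-stage cell counts reported above can be summed to confirm the figure of around 57 thousand cells. The short sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Per-stage numbers of cells passing quality control, as reported in the text.
CELLS_PER_STAGE = {
    "early gastrula": 11081,
    "late gastrula": 7026,
    "early neurula": 6316,
    "middle neurula": 9301,
    "late neurula": 14215,
    "larva": 9470,
}

total_cells = sum(CELLS_PER_STAGE.values())

# Share of the atlas contributed by each stage.
stage_fraction = {stage: n / total_cells for stage, n in CELLS_PER_STAGE.items()}

print(total_cells)  # 57409
for stage, fraction in stage_fraction.items():
    print(f"{stage}: {fraction:.1%}")
```

The total of 57,409 cells agrees with the rounded figure quoted in the text, with late neurula contributing the largest share.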

At early gastrulation stage of Branchiostoma floridae, we identified 10 clusters corresponding to endoderm and ectoderm. EG1, EG3, EG4, EG8 and EG10 were defined as endoderm by the accessibility of Foxa2. EG4 and EG8 were specifically accessible at DKK2/4, a Wnt antagonist that is a marker of invaginating cells as the vegetal plate flattens18. EG5, EG6, EG7 and EG9 were identified as ectoderm by the accessibility of Pax2 and the neural lineage marker Mybl1. At late gastrulation stage, we observed two distinct patterns corresponding to endoderm and ectoderm. LG1, LG2, LG3, LG5 and LG7 were identified as endodermal, with high accessibility of canonical markers including foxa2, sox7, gsc and afp. LG4, LG6 and LG8 showed ectodermal characteristics due to accessibility at pax2 and pax6. In addition, neuronal lineage markers such as nr6a1, pax3 and sox2 were mainly accessible in LG4 and LG8. Canonical Wnt/β-catenin signaling mediates cell fate decisions and cell movements during embryogenesis across metazoans. CTNNB1 was accessible in EG5, EG6, EG7 and EG9 at early gastrulation stage, whereas at late gastrulation its accessibility was concentrated in LG8, suggesting that these cells were located around the blastopore. This is consistent with the finding that β-catenin is commonly localized at blastula stage and then downregulated in mesendoderm at late gastrulation15.

At early neurula stage, we identified nine clusters corresponding to endoderm, mesendoderm, notochord, neural plate and epidermis. EN4 and EN5 were specifically accessible at wnt8b, a marker of mesendoderm. EN1 and EN6 were identified as endoderm by the accessibility of Sox7. EN9 and EN10 were identified as neural plate by the specific accessibility of pax3 and pax6. EN8 was specifically accessible at NEUROG3, a marker of epidermis. EN7 was identified as notochord by accessibility at foxa2. At middle neurula stage, MN2, MN6, MN10, MN11 and MN12 were accessible at pax6, map2 and gfap, indicating ectodermal features. MN1, MN4 and MN5 were identified as endodermal lineage because of accessibility at foxa2, sox7, gsc etc. In addition, the notochord markers shha, mnx1 and dag1 showed accessibility in MN1, MN4 and MN5, suggesting presumptive notochord characteristics. MN3, MN7, MN9 and MN13 were identified as neuronal lineage, as they were accessible at neurod1, nefl, grin1 etc. At late neurula stage, LN5, LN6, LN7 and LN13 were accessible at foxa2 and nkx2-2, indicating endodermal features. LN2, LN11 and LN14 were identified as neural tube because of accessibility at sox2 and hh. LN4, which was accessible at phgdh, a gene involved in neural tube development, was identified as neural tube as well. The notochord marker shha showed accessibility in LN3, suggesting presumptive notochord characteristics. LN16 was accessible at egfr and neurod1, which implied an epidermal feature; hence LN16 was annotated as neuroepithelial cells. In addition, LN8 was identified as neural crest cells due to the accessibility of specific markers such as neurog3 and ascl1.

At larva stage, we identified 10 clusters corresponding to seven cell types: nerve cord, cerebral vesicle, notochord, pharyngeal region, gill slit, tail bud and hematopoietic lineage. LA8 was identified as pharyngeal region by accessibility at NKX2-1. LA2 and LA7 were accessible at foxa2, indicating notochord features, and LA13 was accessible at Slc12a8, a marker of nerve cord. LA6 was identified as cerebral vesicle by accessibility at pax6 and neurog3. The hematopoietic markers vcan1, gblba and ncr1 showed specific accessibility in LA3 and LA12, suggesting presumptive hematopoietic lineage characteristics. LA5 and LA10 were identified as gill slit by accessibility at pax9 and tbx1. LA11 showed the characteristics of tail bud due to accessibility at vasa. Major organogenesis takes place at the larval stage. Interestingly, the pharyngeal region differentiates into distinct organs in the larva with asymmetric patterning: the preoral pit and mouth on the left side, and the endostyle and club-shaped gland on the right side. Dkk2, which is expressed at the margins of the mouth opening, was specifically accessible in LA8, suggesting a left-side position of LA8 in the larva19.

Developmental trajectory of cephalochordate

We constructed a putative trajectory based on the chromatin accessibility of canonical lineage markers. In a global view, we observed distinct endoderm and ectoderm patterning at gastrula stage. Endoderm in late gastrulation, derived from that of early gastrulation, later separated into mesendoderm, notochord and gut endoderm in early neurula. The gut endoderm lineage continued to differentiate into gill slit and foregut (pharyngeal region) at larva stage.

The development of notochord was also clearly captured. Following the first appearance of a notochord cluster in early neurula (EN5), notochord clusters were identified in middle neurula (MN1), late neurula (LN15) and larva (LA2, LA7). NTRK3 started to be accessible in late gastrula and remained accessible throughout subsequent development. Similarly, the accessibility of two nuclear receptors, Nr2e3 and hnf4a, appeared in late gastrulation and increased gradually along the specification of notochord, finally peaking at larva stage.

Mesendoderm developed into hematopoietic lineage (LA5) at larva stage. We noticed that FLT1 and Flt4, both of which belong to the platelet-derived and vascular endothelial growth factor receptors (PD/VEGFRs), showed distinct accessibility from late gastrulation stage. PD/VEGFRs have been linked with the formation of the circulatory system in amphioxus, providing evidence for the validity of our trajectory. At the point where late gastrulation endoderm separates into mesendoderm and notochord lineages in early neurula, we noticed that NR5A2 was relatively more accessible in notochord (EN5) than in mesendoderm (EN1, EN7). This difference in accessibility suggests that NR5A2 contributes to the segregation of mesendoderm and notochord.

As for ectoderm, we observed a clear neural lineage in the trajectory. During the transition from late gastrulation to early neurula, ectoderm differentiated into two branches: epidermis and neural ectoderm. The epidermis layer persisted throughout embryonic development, whereas the neural ectoderm clusters developed into neural plate (EN2,6,9-12), followed by neural crest (LN3) and neural tube (LN4,9,11,14) at late neurula stage. Subsequently, at larva stage, neural tube clusters further developed into cerebral vesicle and nerve cord.

During the development of the neural lineage, we noticed dynamic changes in several nuclear receptors. For example, nr6a1 was highly accessible at gastrula stage (early and late gastrulation) but became relatively closed in subsequent developmental stages. In contrast, NTRK3 and Nr1h3 remained closed at gastrula stage, after which high accessibility was observed. We also found that notch1 displayed an accessibility pattern similar to that of NTRK3 and Nr1h3. Notch signaling has been reported to be extensively involved in somite formation and segregation; here we suspect that it also plays a role in neural lineage differentiation. It is worth noting that at larva stage, NTRK3 was accessible in nerve cord but not in cerebral vesicle, suggesting a possible role for NTRK3 in the specification of nerve cord versus cerebral vesicle.

Transposable elements function as CREs associated with neural development during amphioxus development

Transposable elements (TEs) have been proposed to contribute to genetic complexity and to provide genetic material for the evolution of phenotypic novelty. It has been reported that TEs are closely related to neural development. To investigate the function of TEs during lancelet development, we classified TEs into 67 families and studied the biological processes associated with each family. We found that the PIF-Harbinger, LINE_I-Jockey, LINE_R2-Hero, DNA_Dada and TE_unknown families are associated with nervous system development. In particular, TE_DNA_PIF-Harbinger elements are closely related to “neurotransmitter metabolic process”, including the following genes: PTX3, AGXT, PHGDH, SHMT1, PKD2, NOS1 and TLR2. TE_LINE_I-Jockey elements overlap with the regulatory elements of GNPAT, which is related to ensheathment of neurons (Figure 4a). A TE_LINE_R2-Hero element is located near the LRP4 gene, which is related to regulation of neuromuscular junction development. Several elements in TE_unknown are related to neuron projection regeneration (JAK2, LRP1, NDEL1, GRN, PTPRS, TNR, HGF, RTN4R, APOD, MATN2, TNC), regulation of neuron projection development (PTK2, RET, LRP1, LRP4, CSMD3, NRP1, etc.), positive regulation of neurogenesis (RET, NOTCH1, LRP1, HES1, NRP1, NDEL1, etc.), regulation of neuron projection regeneration (LRP1, NDEL1, GRN, PTPRS, TNR, HGF, RTN4R) and neural crest cell differentiation (ALDH1A2, KBTBD8, RET, HES1, NRP1, LRP6, FN1, ALX1, KLHL12, NRP2). In addition, an element in TE_DNA_Dada is related to the neuroinflammatory response gene FPR2 (Figure 4b).

Discussion:

Gene regulation is an important scientific question in developmental biology, as developmental patterns are largely determined by the spatial and temporal expression domains of developmental genes, which are in turn determined by the dynamic interactions between transcription factors (TFs) and CREs. CREs have been detected using experimental methods such as enhancer trapping, which is very time-consuming. We investigated the heterogeneity of cell populations of amphioxus, performed cross-stage and cross-cluster comparisons, and revealed dynamic regulatory elements during several critical biological processes including gastrulation, neurulation and metamorphosis. Furthermore, we traced the chromatin states of key regulators controlling body plan and nervous system development, and inspected the dynamics of chromatin accessibility of components of the Wnt, TGF-β, RTK, nuclear receptor and Notch signaling pathways along the developmental trajectory. Overall, we generated the first single cell epigenetic atlas of developing Branchiostoma floridae, a basal chordate lineage of particular value for studying the evolution and development of vertebrates. These results could be valuable for understanding the origin and evolution of chordates, and throw light upon the genetic and epigenetic mechanisms underlying the phenotypic novelty of chordates during evolution.

Methods:

Animal husbandry

Embryos were obtained by in vitro fertilization in filtered seawater and cultured at

25 °C.

10xATAC Sequencing Library preparation

Embryos were transferred to cold lysis buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 0.1% NP40, 0.01% digitonin, 1% BSA) and immediately pipetted 10 times. Larvae were homogenized in a Dounce homogenizer (Sigma) with 10 strokes of the loose pestle and 15 strokes of the tight pestle. The homogenate was filtered through a sterile 40-μm cell strainer (BD). Nuclei were centrifuged twice at 500 g for 5 min at 4 °C. After the first centrifugation, nuclei were resuspended in chilled PBS (Gibco) with 0.04% BSA (BBI); after the second, nuclei were resuspended in diluted nuclei buffer (10X Genomics). Nuclei suspensions were first incubated in the 10X Genomics transposition mix for 60 min, then transferred to Chromium Chip E to generate Gel Bead-In-Emulsions according to the Chromium Single Cell ATAC Reagent Kits User Guide Rev A (10X Genomics). After library construction was completed according to the Single Cell ATAC v1 workflow, the library structure was converted using the MGIEasy Universal Library Conversion Kit (App-A, MGI).

Bulk ATAC-seq Sequencing Library preparation

For amphioxus embryos and larvae, ATAC-seq was performed in two biological replicates following the original protocol20,21. Amphioxus embryos and larvae were placed in liquid nitrogen and frozen for 30 min.

Bulk ATAC-seq data processing

ATAC-seq was performed in more than two biological replicates at each developmental stage. ATAC-seq libraries were sequenced to produce an average of 78 million reads for amphioxus. Paired-end raw data were generated on a BGISEQ-500 sequencer, and adapters were trimmed by Trimmomatic-0.39 with parameters “ILLUMINACLIP:adapter.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36”. Bowtie2 was used for alignment with “--very-sensitive”; other parameters were left at their defaults. After alignment, duplicated reads were removed by the MarkDuplicates function of Picard with “REMOVE_DUPLICATES=true”. Reads with mapping quality below 30 or mapped to multiple positions of the genome were then filtered out, and the remaining reads were used for downstream analysis. The model-based analysis mode of MACS2 was used for peak calling with parameters “-B -f BAMPE”. Only peaks with -log10(P value) >= 5 were kept for the following steps. The irreproducible discovery rate (IDR) was used for merging replicates. Gained and lost peaks were identified by BEDTools intersect. The R package ChIPseeker was used for peak annotation, and the Integrative Genomics Viewer was used for peak visualization. Read counts of every unique peak were calculated using the summarizeOverlaps function of the GenomicAlignments package with options "mode='Union', inter.feature=TRUE, fragment=FALSE, singleEnd=TRUE". This function also accepted the peak information as a GenomicRanges object. We used the rlogTransformation function of the DESeq2 package to remove batch effects between different replicates of the same developmental stage. Dimensional reduction was applied using the plotPCA function of the BiocGenerics package, and replicates were presented in a 2-D plane. We then removed abnormal replicates according to the visual results. Gained and lost peak analysis between every two adjacent time points was performed by BEDTools with the "-v" parameter.
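The gained and lost peak comparison between adjacent time points can be illustrated with a small stand-alone sketch. It mimics the "no overlap" semantics of the "-v" option (report intervals in one set that overlap nothing in the other) in plain Python rather than calling BEDTools. The function names, the toy peaks, and the convention that "gained" means present only at the later stage and "lost" means present only at the earlier stage are assumptions made for illustration; the original text does not spell out the direction of comparison.

```python
from collections import defaultdict


def non_overlapping(a_peaks, b_peaks):
    """Return peaks in a_peaks that overlap no peak in b_peaks.

    Peaks are (chrom, start, end) tuples with half-open, BED-style
    coordinates, so directly adjacent intervals do not overlap.
    """
    b_by_chrom = defaultdict(list)
    for chrom, start, end in b_peaks:
        b_by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in a_peaks:
        overlaps = any(s < end and start < e for s, e in b_by_chrom[chrom])
        if not overlaps:
            kept.append((chrom, start, end))
    return kept


def gained_and_lost(earlier, later):
    """Assumed convention: gained = later-only peaks, lost = earlier-only peaks."""
    gained = non_overlapping(later, earlier)
    lost = non_overlapping(earlier, later)
    return gained, lost


# Toy peaks for two adjacent stages (illustrative values, not real data).
earlier_stage = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
later_stage = [("chr1", 150, 250), ("chr1", 900, 950), ("chr2", 80, 120)]

gained, lost = gained_and_lost(earlier_stage, later_stage)
print("gained:", gained)
print("lost:", lost)
```

This quadratic scan is adequate for a toy example; for genome-scale peak sets a sorted sweep or an interval tree, as used by dedicated tools, would be the more practical choice.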

TE analysis

Stage-specific peaks were identified and merged by BEDTools. TEs overlapping with merged peaks were classified by family and annotated to their nearest genes by ChIPseeker. Genes whose TSSs were within 3 kb of TEs were used for gene ontology analysis. The positions of TEs and peaks were visualized by IGV.
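The 3 kb distance criterion described above can be sketched as follows. This is a plain-Python stand-in for illustration, not the ChIPseeker annotation used in the study: the function name, the input layout and the toy coordinates are hypothetical, and treating the 3 kb boundary as inclusive is an assumption.

```python
import bisect


def genes_near_tes(tes, tss_by_chrom, max_dist=3000):
    """Return names of genes whose TSS lies within max_dist bp of any TE.

    tes: iterable of (chrom, start, end) with half-open coordinates.
    tss_by_chrom: dict mapping chrom -> list of (position, gene_name),
        sorted by position.
    """
    positions_by_chrom = {
        chrom: [pos for pos, _ in entries] for chrom, entries in tss_by_chrom.items()
    }
    hits = set()
    for chrom, start, end in tes:
        entries = tss_by_chrom.get(chrom, [])
        positions = positions_by_chrom.get(chrom, [])
        # TSS positions in [start - max_dist, (end - 1) + max_dist] qualify.
        lo = bisect.bisect_left(positions, start - max_dist)
        hi = bisect.bisect_right(positions, (end - 1) + max_dist)
        hits.update(gene for _, gene in entries[lo:hi])
    return hits


# Toy annotation (hypothetical gene names and coordinates).
toy_tss = {"chr1": [(1000, "geneA"), (10000, "geneB"), (20500, "geneC")]}
toy_tes = [("chr1", 3500, 4000), ("chr1", 18000, 18200)]

nearby_genes = genes_near_tes(toy_tes, toy_tss)
print(sorted(nearby_genes))
```

In this toy case geneA (2.5 kb upstream of the first TE) and geneC (about 2.3 kb downstream of the second TE) are retained, while geneB lies more than 3 kb from both elements and is excluded.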


Single cell ATAC-seq data analysis

Raw data sequenced on the MGISEQ-2000 were split into reads and cell barcodes using custom scripts. Reads were mapped to the genome using SnapTools with default parameters. To assess the quality of each single cell, we calculated ATAC-seq quality metrics such as the fragment length distribution, transcription start site (TSS) enrichment and the fraction of reads overlapping peaks. We removed cells with fewer than 6000 total fragments and fewer than 2500 unique molecular identifiers (UMIs). The data were assembled into a bin-by-cell matrix. We removed the top 5% of bins that overlapped with invariant features. We sampled 20% to 40% of the total cells per stage from six developmental stages for subsequent analysis. Log-transformation, dimension reduction using diffusion maps and clustering with a K Nearest Neighbor (KNN) graph were then applied using SnapATAC. To identify the heterogeneity of clusters at each stage, we calculated the accessibility signal of clusters using “bedGraphToBigWig”. Cell type annotation was based on cluster-specific accessibility of marker genes visualized with IGV. We called peaks with MACS2 using “-f BEDPE”, and peaks with -log10(P value) >= 5 were kept for subsequent DAR analysis.
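The peak filtering step can be sketched in Python as follows. The sketch assumes that the peaks are in the MACS2 narrowPeak format, in which the eighth column holds the -log10 p-value, and that the threshold in the text refers to that column. The function name and the three example peaks are hypothetical.

```python
import io

def filter_narrowpeak(handle, min_neg_log10_p=5.0):
    """Keep narrowPeak records whose -log10(p-value) meets the threshold.

    narrowPeak columns: chrom, start, end, name, score, strand,
    signalValue, -log10(pValue), -log10(qValue), summit offset.
    """
    kept = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track")):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if float(fields[7]) >= min_neg_log10_p:
            kept.append(fields)
    return kept

# Three made-up peaks; only the eighth column matters for the filter.
example = io.StringIO(
    "chr1\t100\t400\tpeak_1\t80\t.\t6.2\t8.1\t5.9\t150\n"
    "chr1\t900\t1200\tpeak_2\t20\t.\t2.1\t3.4\t1.8\t120\n"
    "chr2\t50\t350\tpeak_3\t45\t.\t4.0\t5.0\t3.2\t140\n"
)
kept = filter_narrowpeak(example)
names = [fields[3] for fields in kept]
print(names)
```

In practice the handle would be an opened peak file rather than an in-memory string.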

Sequencing and data availability

All 10xATAC sequence data were produced on the MGISEQ-2000 at the China National GeneBank. ATAC-seq data were produced on the BGISEQ-500 at the China National GeneBank. The data that support the findings of this study have been deposited in the CNSA (https://db.cngb.org/cnsa/) of CNGBdb with accession number CNP0000891.

Acknowledgements

We are thankful to the production team of the China National GeneBank, Shenzhen, China.

Figure legends

Fig 1: The accessible chromatin landscape and dynamics of B. floridae embryos

a) Schematic diagram showing a brief experimental workflow and the samples generated from different developmental stages, with the number of biological replicates indicated for each sample type. Abbreviations and corresponding sample stages: BL, blastula; EG, early gastrulation; LG, late gastrulation; EN, early neurula; MN, middle neurula; LN, late neurula; LA, larva; EM, early metamorphosis; MM, middle metamorphosis; LM, late metamorphosis.

b) Genome browser view of a representative region showing stage-specific chromatin accessibility.

c) Genome browser view of genes related to crucial developmental signaling pathways.

Fig 2: Single-cell chromatin accessibility atlas of B. floridae development

a) t-SNE map showing the heterogeneity of cells from different developmental stages. Cells are colored by stage (left) and by cluster (right).

b) Proportion of cells from different stages in each cluster.

c) t-SNE projection of cells from the indicated stages. Cells are colored by embryonic lineage/cell type.

d) Chromatin accessibility of lineage/cell type markers at the indicated stages.

Fig 3: Epigenomic differentiation trajectories and genome browser views of genes

a) Differentiation trajectory of cells in each cluster along developmental stages, with stages indicated on the left. Numbers in circles represent clusters.

b) Integrative Genomics Viewer tracks of several function-related genes, with stages and clusters indicated on the left. For example, EG1 represents cluster 1 of early gastrulation.

Fig 4: Transposable element analysis

a) Bubble plot showing TE classes and enriched GO terms.

b) Integrative Genomics Viewer tracks of TEs and their nearest genes.

Fig S1

a) Principal component plots of accessible regions of each replicate with stages indicated.

b) Genomic distribution of peaks detected at each stage.

[Figure 1. a) Workflow schematic (embryo, nuclei, single nucleus, sequencing) with bulkATAC and scATAC replicate counts per stage. Stage labels used across panels: Sperm, Oocyte, 1cell, 2cell, 4cell, 8cell, 16cell, 64cell, BL, EG, LG, EN, MN, LN, LA, EM, MM, LM. b) Genome browser tracks at Hox_family, Sox2, Sox7, Foxb1, Hmbox1, KLF4 and Casp8. c) Genome browser tracks grouped by pathway: Hedgehog (Sufu, hh, SMO), TGF-β (acvr2b, Bambi, NBL1), nuclear receptor (NR0B1, NR1D1, Esrrg), receptor tyrosine kinase (AKT3, FGFR_1, Plcg1), Wnt (GSK3B, Lrp5, wnt8b) and Notch (JAG1_2, Notch1_3, NOTCH4).]

[Figure 2. a) t-SNE plots (axes TSNE-1 and TSNE-2) colored by stage (BL, EG, LG, EN, MN, LN, LA) and by cluster (C1 to C15). b) Stacked proportions (0% to 100%) of stages in clusters 1 to 15. c) t-SNE plots for early gastrulation, late gastrulation, early neurula, middle neurula, late neurula and larva, with clusters labeled by annotated lineage or cell type (for example C1_mesendoderm, C7_notochord, C9_neural plate). d) Accessibility tracks per annotated cluster at the marker genes Chrd, GSC, cpn1, foxa2, pax3, wnt8b, wnt4, neurod1, NKX2-2, CSNK1A1 and Dpep1.]

[Figure 3. a) Trajectory diagram with one row of cluster numbers per stage (EG, LG, EN, MN, LN, LA). Lineage legend: endoderm, mesendoderm, gill slit, epidermis, notochord, neuronal plate, neuronal crest, neuronal tube, nerve cord, cerebral vesicle, pharyngeal region, ectoderm, hematopoietic lineage, tail bud. b) Genome browser tracks for stage clusters (EG1 to LA13) grouped by hematopoietic lineage, nerve cord, notochord and cerebral vesicle, at the genes FLT1, FLt4, WNT16, NR5A2, NTRK3, Nr1h3, nr6a1, notch1, chrd, wnt5b, Nr2e3, hnf4a, WNT7B and LEFTY1.]

[Figure 4. a) Bubble plot of TE families (TE_Unknown, TE_LINE_R2-Hero, TE_LINE_I-Jockey, TE_DNA_Zator, TE_DNA_PIF-Harbinger, TE_DNA_hAT-Ac, TE_DNA_Dada) against enriched GO terms related to neural development (for example neural crest formation, ensheathment of neurons, neuron projection extension), with a -log10(p.adjust) scale. b) Genome browser tracks across stages (Sperm to LM) at the TE:gene pairs Unknown:Kbtbd8, Satellite:NOTCH4, Unknown:NRP1 and Unknown:RTN4R.]