A Microfluidic Approach for Imaging and Sequencing Protein-DNA Interactions

bioRxiv preprint first posted online Jul. 18, 2019; doi: http://dx.doi.org/10.1101/706903. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. µDamID: a microfluidic approach for imaging and sequencing protein-DNA interactions in single cells Authors: Nicolas Altemose1, Annie Maslan1, Andre Lai1, Jonathan A. White1, Aaron M. Streets1,2,* 1Department of Bioengineering, University of California, Berkeley, Berkeley, CA, 94720 2Chan Zuckerberg Biohub, San Francisco, CA 94158 * To whom correspondence should be addressed. Email: [email protected] 1 bioRxiv preprint first posted online Jul. 18, 2019; doi: http://dx.doi.org/10.1101/706903. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. 1 Abstract 2 3 Genome regulation depends on carefully programmed protein-DNA interactions that maintain or 4 alter gene expression states, often by influencing chromatin organization. Most studies of these 5 interactions to date have relied on bulk methods, which in many systems cannot capture the 6 dynamic single-cell nature of these interactions as they modulate cell states. One method allowing 7 for sensitive single-cell mapping of protein-DNA interactions is DNA adenine methyltransferase 8 identification (DamID), which records a protein’s DNA-binding history by methylating adenine 9 bases in its vicinity, then selectively amplifies and sequences these methylated regions. These 10 interaction sites can also be visualized using fluorescent proteins that bind to methyladenines. Here 11 we combine these imaging and sequencing technologies in an integrated microfluidic platform 12 (µDamID) that enables single-cell isolation, imaging, and sorting, followed by DamID. We apply 13 this system to generate paired single-cell imaging and sequencing data from a human cell line, in 14 which we map and validate interactions between DNA and nuclear lamina proteins, providing a 15 measure of 3D chromatin organization and broad gene regulation patterns. µDamID provides the 16 unique ability to compare paired imaging and sequencing data for each cell and between cells, 17 enabling the joint analysis of the nuclear localization, sequence identity, and variability of protein- 18 DNA interactions. 19 20 21 22 23 2 bioRxiv preprint first posted online Jul. 18, 2019; doi: http://dx.doi.org/10.1101/706903. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. 24 Introduction 25 Complex life depends on the protein-DNA interactions that constitute and maintain the epigenome, 26 including interactions with histone proteins, transcription factors, DNA (de)methylases, and 27 chromatin remodeling complexes, among others. These interactions enable the static DNA 28 sequence inside the nucleus to dynamically execute different gene expression programs that shape 29 the cell’s identity and behavior. Methods for measuring protein-DNA interactions have proven 30 indispensable for understanding the epigenome, though to date most of this knowledge has derived 31 from experiments in bulk cell populations. By requiring large numbers of cells, these bulk methods 32 can fail to capture critical epigenomic processes that occur in small numbers of dividing cells, 33 including processes that influence embryo development, developmental diseases, stem cell 34 differentiation, and certain cancers. By averaging together populations of cells, bulk methods also 35 fail to capture important epigenomic dynamics occurring in asynchronous single cells during 36 differentiation or the cell cycle. Because of this, bulk methods can overlook important biological 37 heterogeneity within a tissue. It also remains difficult to pair bulk biochemical data with imaging 38 data, which inherently provide information in single cells, and which can reveal the spatial location 39 of protein-DNA interactions within the nuclei of living cells. These limitations underline the need 40 for high-sensitivity single-cell methods for measuring protein-DNA interactions. 41 42 Most approaches for mapping protein-DNA interactions rely on immunoaffinity purification, in 43 which protein-DNA complexes are physically isolated using a high-affinity antibody against the 44 protein, then purified by washing and de-complexed so the interacting DNA can be amplified and 45 measured. The most widely used among these methods is chromatin immunoprecipitation with 46 sequencing (ChIP-seq; Barski et al. 2007, Johnson et al. 2007), which has formed the backbone of 3 bioRxiv preprint first posted online Jul. 18, 2019; doi: http://dx.doi.org/10.1101/706903. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. 47 several large epigenome mapping projects (Celniker et al. 2009; ENCODE Consortium 2012; 48 Kundaje et al. 2015). One drawback of ChIP-seq is that the protein-DNA complex, which is often 49 fragile, must survive the shearing or digestion of the surrounding DNA, as well as several 50 intermediate washing and purification steps, in order to be amplified and sequenced. This results 51 in a loss of sensitivity, especially when using a small amount of starting material. More recent 52 immunoaffinity-based methods have reduced the high input requirements of ChIP-seq, but they 53 recover relatively few interactions in small numbers of cells or single cells (Wu et al. 2012, Shen 54 et al. 2014, Jakobsen et al. 2015, Rotem et al. 2015, Zhang et al. 2016, Skene et al. 2018, Harada 55 et al. 2018, Kaya-Okur et al. 2019, Carter et al. 2019, Grosselin et al. 2019). 56 57 An alternative method for probing protein-DNA interactions, called DNA adenine 58 methyltransferase identification (DamID), relies not on physical separation of protein-DNA 59 complexes (as in ChIP-seq), but on a sort of ‘chemical recording’ of protein-DNA interactions 60 onto the DNA itself, which can later be selectively amplified (Figure 1a; van Steensel and Henikoff 61 2000, Vogel et al. 2007). This method utilizes a small enzyme from E. coli called DNA adenine 62 methyltransferase (Dam). When genetically fused to the protein of interest, Dam deposits methyl 63 groups near the protein-DNA contacts at the N6 positions of adenine bases (m6A) within GATC 64 sequences (which occur once every 270 bp on average across the human genome). That is, 65 wherever the protein contacts DNA throughout the genome, m6A marks are left at GATC sites in 66 its trail. These m6A marks are highly stable in eukaryotic cells, which do not tend to methylate (or 67 demethylate) adenines (O’Brown et al. 2019). Dam expression has been shown to have no 68 discernable effect on gene expression in a human cell line, and its m6A marks were shown to be 69 passed to daughter cells, halving in quantity each generation after Dam is inactivated (Park et al. 4 bioRxiv preprint first posted online Jul. 18, 2019; doi: http://dx.doi.org/10.1101/706903. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. 70 2018). These properties allow even transient protein-DNA interactions to be recorded as 71 permanent, biologically orthogonal chemical signals on the DNA. 72 73 DamID reads out these chemical recordings of protein-DNA interactions by specifically 74 amplifying and then sequencing fragments of DNA containing the interaction site. First, genomic 75 DNA is purified and digested with DpnI, a restriction enzyme that exclusively cleaves G m6ATC 76 sites (Figure 1). Then, universal adapters are ligated onto the fragment ends to allow for 77 amplification using universal primers. Only regions with a high density of m6A produce DNA 78 fragments short enough to be amplified by Polymerase Chain Reaction (PCR) and quantified by 79 microarray or high-throughput sequencing (Wu et al. 2016). DamID has been used to explore 80 dynamic regulatory protein-DNA interactions such as transcription factor binding (Orian et al. 81 2003) and RNA polymerase binding (Southall et al. 2013) as well as protein-DNA interactions 82 that maintain large-scale genome organization. One frequent application of DamID is to study 83 large DNA domains associated with proteins at the nuclear lamina, near the inner membrane of 84 the nuclear envelope (Pickersgill et al. 2006, Guelen et al. 2008, reviewed by van Steensel and 85 Belmont 2017). Because DamID avoids the limitations of antibody binding, physical separations, 86 or intermediate purification steps, it lends itself to single-cell applications. Recently, DamID has 87 been successfully applied to sequence lamina-associated domains (LADs) in single cells in a one- 88 pot reaction, recovering hundreds of thousands of unique fragments per cell (Kind et al. 2015). 89 90 While DamID maps the sequence positions of protein-DNA interactions throughout the genome, 91 the spatial location of these interactions in the nucleus can play an important role in genome 92 regulation (Bickmore and van Steensel 2013). A recent technique demonstrated the ability to 5 bioRxiv preprint first posted online Jul. 18, 2019; doi: http://dx.doi.org/10.1101/706903. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Load more