Gene Regulation and Chromatin Structure of Mammalian Olfactory Receptors


Citation Tan, Longzhi. 2018. Gene Regulation and Chromatin Structure of Mammalian Olfactory Receptors. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:41129184

Terms of Use: This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Gene regulation and chromatin structure of mammalian olfactory receptors

A dissertation presented

by

Longzhi Tan

to

The Committee on Higher Degrees in Systems Biology

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

in the subject of

Systems Biology

Harvard University

Cambridge, Massachusetts

April 2018

© 2018 Longzhi Tan

All rights reserved.

Dissertation Advisor: Professor Xiaoliang Sunney Xie

Longzhi Tan

Gene regulation and chromatin structure of mammalian olfactory receptors

Abstract

Mammals sense odors by expressing the gene family of olfactory receptors (ORs). Despite the massive family size (around 1,000 OR genes in the mouse genome and 400 in the human genome), each sensory neuron randomly expresses one, and only one, OR. This phenomenon, termed the “one-neuron-one-receptor” rule, underlies both odor sensing in the nose and the formation of an odor map in the brain. However, how this rule is established remains a mystery. Combining theoretical modeling, single-cell transcriptomics, spatial transcriptomics, and single-cell 3D genome structures, we investigated the regulation of OR genes during neuronal development. We identified a fundamental kinetic constraint in a recent model of epigenetic OR regulation, uncovered a surprising phenomenon of transient multi-OR expression in immature neurons, created by far the most comprehensive spatial map of mouse ORs in the nose, and revealed for the first time the 3D genome structures of single diploid human and mouse cells, including olfactory sensory neurons. Our interdisciplinary approach provided valuable insights into the molecular mechanism behind the “one-neuron-one-receptor” rule of OR expression, and our methods could be widely applicable to other systems where gene regulation and chromatin structure underlie important physiological functions.

Table of Contents

• Abstract

• Table of Contents

• Acknowledgements

• Chapter 1: Introduction

• Chapter 2: Modeling the kinetics of gene regulation of olfactory receptors

• Chapter 3: Single-cell transcriptomic sequencing of olfactory sensory neurons

• Chapter 4: Mapping the expression zones of nearly all mouse olfactory receptors

• Chapter 5: Reconstructing the 3D genomes of single diploid human and mouse cells

Acknowledgements

I would like to thank my thesis advisor Xiaoliang Sunney Xie for being a fantastic mentor. He is absolutely awesome in every way. He is perhaps best known for his incredible technologies; but at heart he cares the most about scientific discoveries, about which I share the same passion. He gave me tremendous freedom in research subjects, experimental approaches, and work hours; but whenever I need his help, he gives me the best advice and points me to the best resources. I feel extremely fortunate to work with Sunney — a visionary scientist and a wonderful human being.

I would also like to thank

• current and past members of the Xie lab including Chenghang Chuck Zong, Jun Yong, Alec Chapman, Chongyi Chen, Dong Xing, Patricia Purcell, Lin Song, Chi-Han Chang, Xu Zhang, Zi Hertz He, Dan Fu, Fa-Ke Frank Lu, Yuntao Steve Mao, Wenlong Yang, Haisong Liu, Shufang Wang, David Feng Lee, Zheng Yan, Huiyi Chen, Ziqing Winston Zhao, Sabin Mulepati, Asaf Tal, Jenny Lu, Ang Li, Shasha Chong, Minbiao Ji, Lei Huang, Luoxing Xiong, Yaqiong Tang, Yi Yin, Guangyu Gavin Zhou, Bo Zhao, Yuanzhen Suo, Wenting Cai, Liyun Jessica Sang, Yunlong Richard Cao, Yan Gu, David Suter, Rahul Roy, Sijia Lu, Larry Valles, Sarah Quilty, Tracey Schaal, and Tony Jia,

• my collaborators Steven Liberles, Sabina Berretta, Stavros Lomvardas, Heng Li, and Fred Alt, and members of their labs including Qian Li, Anne Boyer-Boiteau, Kevin Monahan, Jerome Kahiapo, and Pei-Chi Peggy Wei,

• my dissertation examining committee Xiaowei Zhuang, Adam Cohen, and Tim Mitchison,

• my dissertation advisory committee (DAC) Catherine Dulac, Yang Shi, and Steven Liberles,

• my preliminary qualifying exam 2 (PQE2) committee Sean Megason, Yang Shi, Catherine Dulac, and Tim Mitchison,

• my preliminary qualifying exam 1 (PQE1) committee Andrew Murray, Michael Desai, and Johan Paulsson,

• my student symposium coach Ethan Garner,

• my rotation advisors Angela Depace, George Church, and Jack Szostak, and members of their labs including Tara Martin, Je Hyuk Jay Lee, Aaron Engelhart, and Anders Björkbom,

• my PhD program directors Tim Mitchison and Andrew Murray, classmates Daniel Flicker, Abigail Groff, John Ingraham, Yunxin Joy Jiao, Farhan Kamili, Peter Koch, Matthieu Landon, Martin Lukacisin, Eran Mick, Alex Ng, and Elizabeth Van Itallie, and administrators Sam Reed and Liz Pomerantz,

• my Neurobiology summer course directors Graeme Davis and Timothy Ryan, the rest of the faculty and assistants, and classmates Antiño Allen, Katie Ferguson, Huong Ha, Natalie Kaempf, Michael Kienzler, Kyle Lyman, Dimphna Meijer, Anne Olsen, Johnny Saldate, Eric Schreiter, Chung Yiu Jonathan Tang, Prahatha Venkatraman, and Christina Whiteus,

• my past advisors Wit Busza, Jeff Gore, and Pardis Sabeti, and

• my parents Lierong Li and Yilun Tan.

This work was supported by an NIH Director's Transformative Research Award (5R01EB010244), an NIH Director’s Pioneer Award (5DP1CA186693), and funding from Beijing Advanced Innovation Center for at to Xiaoliang Sunney Xie, a Harvard Brain Initiative Collaborative Seed Grant to Fred Alt and Xiaoliang Sunney Xie, an NIH Research Project Grant (5R01DC013289) to Steven Liberles, and an HHMI International Student Research Fellowship to me.

Chapter 1

Introduction

The mammalian nose detects an enormous number of odors; at the molecular level, these odorants are recognized by a large family of proteins, termed olfactory receptors (ORs) (Buck and Axel 1991). ORs are G-protein-coupled receptors (GPCRs) and constitute the largest gene family in many organisms. For example, the human and mouse genomes encode ~400 and ~1,000 functional OR genes, respectively (Godfrey, Malnic, and Buck 2004, Malnic, Godfrey, and Buck 2004). This large repertoire of ORs enables mammals to sense the diverse chemical world around them.

In the mammalian nose, ORs are expressed in a striking pattern, termed the “one-neuron-one-receptor” rule (Figure 1.1) (Mombaerts 2004). In the main olfactory epithelium (MOE) of rodents, each olfactory sensory neuron (OSN) randomly chooses one, and only one, OR gene for expression (Ressler, Sullivan, and Buck 1993, Vassar, Ngai, and Axel 1993), and only one of that gene's two alleles, maternal or paternal (Chess et al. 1994). This phenomenon is also observed in insects (Vosshall et al. 1999) and in fish (Barth, Dugas, and Ngai 1997). OSNs that express the same OR then converge their axons onto a few stereotypical spots, termed glomeruli, in the olfactory bulb (OB) of the brain (Ressler, Sullivan, and Buck 1994, Vassar et al. 1994). The “one-neuron-one-receptor” rule therefore ensures both the proper sense of smell in the nose and its transmission to the brain.

Figure 1.1. Schematic of the “one-neuron-one-receptor” rule of OR expression. Each OSN randomly expresses one, and only one, OR gene.

It remains a longstanding mystery how the “one-neuron-one-receptor” rule can be achieved despite the intrinsic stochasticity of gene expression. In particular, each neuron has to continuously transcribe a single OR gene while specifically and strictly silencing all ~1,000 others; even slight leakage from each of them would add up to a considerable number of undesired OR transcripts. Transgenic experiments revealed that once an OR gene is chosen for expression, its protein product elicits a feedback that silences all other OR genes (Serizawa et al. 2003, Lewcock and Reed 2004). However, it is unclear how this feedback is achieved and how it avoids silencing the chosen OR.

Recent studies suggested an epigenetic mechanism, termed “silencing the de-silencer”, that might explain how a chosen OR avoids silencing itself. In this proposed mechanism, all OR genes are initially silenced by a repressive histone methylation, H3K9me3, according to chromatin immunoprecipitation (ChIP) experiments (Magklara et al. 2011). A transcriptional activator protein KDM1A (also known as LSD1), which could demethylate H3K9 (Shi et al. 2004), is transiently expressed in neuronal progenitors and de-silences a single OR gene (Lyons et al. 2013). Expression of this OR then triggers the unfolded protein response (UPR), which in turn silences the Kdm1a gene and thus prevents de-silencing of any additional OR genes (Dalton, Lyons, and Lomvardas 2013). However, it remains unclear whether this mechanism is sufficient to explain the “one-neuron-one-receptor” rule.

In Chapter 2, we theoretically modeled the kinetics of “silencing the de-silencer” during the OR choice. Under minimal assumptions, we found that the ratio between the timescale of OR de-silencing and the timescale of de-silencer removal determined whether the “one-neuron-one-receptor” rule held. The two timescales must be separated by at least two orders of magnitude, which could be physiologically feasible but would require unusually fast removal of the de-silencer KDM1A. This result prompted us to look for additional mechanisms behind the “one-neuron-one-receptor” rule. This Chapter has been published as (Tan, Zong, and Xie 2013).

Indeed, in Chapter 3 we uncovered an additional level of complexity in OR regulation: the transient co-expression of multiple ORs before the final OR choice. For more than a decade, ORs were thought to be expressed one at a time (Li et al. 2004, Shykind et al. 2004). By single-cell transcriptomic sequencing (Chapman et al. 2015) of mouse OSNs, however, we observed co-expression of two to nine ORs per cell in a large fraction of immature OSNs. This observation provided experimental evidence for a previous hypothesis of multi-OR expression (Mombaerts 2004), and the phenomenon was discovered independently by another group (Hanchate et al. 2015). The discovery of multi-OR expression calls for new mechanisms to explain the transcriptional elimination of all but one OR gene. This Chapter has been published as (Tan, Li, and Xie 2015).

In Chapter 4, we turn to the spatial aspect of OR regulation. Different parts of the rodent nose express different subsets of ORs (Ressler, Sullivan, and Buck 1993, Vassar, Ngai, and Axel 1993). This spatial patterning, termed expression “zones”, is also observed in fish (Weth, Nadler, and Korsching 1996) and in primates (Horowitz et al. 2014), and is hypothesized to help establish the “one-neuron-one-receptor” rule by limiting the number of OR genes accessible to each OSN. By spatial transcriptomic sequencing, we determined the zones of nearly all mouse ORs at high resolution. This zonal map might help us better understand the spatial organization of smell and its molecular basis. This Chapter has been submitted for publication and posted as a preprint (Tan and Xie 2017).

In Chapter 5, we explore the 3D genome structures underlying OR regulation. Chromatin structures are believed to play a crucial role in the “one-neuron-one-receptor” rule. In particular, the nuclei of OSNs exhibit an unusual architecture, in which most OR genes aggregate in heterochromatic foci and are presumably silenced together (Clowney et al. 2012), while a network of enhancers presumably cooperates to transcribe the one chosen OR gene (Markenscoff-Papadimitriou et al. 2014). To better understand this nuclear architecture, we developed a single-cell chromatin conformation capture method, termed Dip-C, which generated the first 3D genome structures of single diploid human and mouse cells. Dip-C was partly based on LIANTI, a high-coverage whole-genome amplification (WGA) method that we also developed (Chen et al. 2017). Applying Dip-C to OSNs, we took our first step towards uncovering the chromatin-structural determinants of the “one-neuron-one-receptor” rule. This Chapter, except data from OSNs, has been submitted for publication (Tan, Xing, Chang, and Xie 2018), and the Dip-C method has been filed as part of a patent application (Xing, Chang, Tan, and Xie patent pending).

References

Barth, A. L., J. C. Dugas, and J. Ngai. 1997. "Noncoordinate expression of odorant receptor genes tightly linked in the zebrafish genome." Neuron 19 (2):359-69.

Buck, L., and R. Axel. 1991. "A novel multigene family may encode odorant receptors: a molecular basis for odor recognition." Cell 65 (1):175-87.

Chapman, A. R., Z. He, S. Lu, J. Yong, L. Tan, F. Tang, and X. S. Xie. 2015. "Single cell transcriptome amplification with MALBAC." PLoS One 10 (3):e0120889. doi: 10.1371/journal.pone.0120889.

Chen, C., D. Xing, L. Tan, H. Li, G. Zhou, L. Huang, and X. S. Xie. 2017. "Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI)." Science 356 (6334):189-194. doi: 10.1126/science.aak9787.

Chess, A., I. Simon, H. Cedar, and R. Axel. 1994. "Allelic Inactivation Regulates Olfactory Receptor Gene-Expression." Cell 78 (5):823-834. doi: 10.1016/S0092-8674(94)90562-2.

Clowney, E. J., M. A. LeGros, C. P. Mosley, F. G. Clowney, E. C. Markenskoff-Papadimitriou, M. Myllys, G. Barnea, C. A. Larabell, and S. Lomvardas. 2012. "Nuclear Aggregation of Olfactory Receptor Genes Governs Their Monogenic Expression." Cell 151 (4):724-737. doi: 10.1016/j.cell.2012.09.043.

Dalton, R. P., D. B. Lyons, and S. Lomvardas. 2013. "Co-opting the unfolded protein response to elicit olfactory receptor feedback." Cell 155 (2):321-32. doi: 10.1016/j.cell.2013.09.033.

Godfrey, P. A., B. Malnic, and L. B. Buck. 2004. "The mouse olfactory receptor gene family." Proceedings of the National Academy of Sciences of the United States of America 101 (7):2156-2161. doi: 10.1073/pnas.0308051100.

Hanchate, N. K., K. Kondoh, Z. Lu, D. Kuang, X. Ye, X. Qiu, L. Pachter, C. Trapnell, and L. B. Buck. 2015. "Single-cell transcriptomics reveals receptor transformations during olfactory neurogenesis." Science 350 (6265):1251-5. doi: 10.1126/science.aad2456.

Horowitz, L. F., L. R. Saraiva, D. Kuang, K. H. Yoon, and L. B. Buck. 2014. "Olfactory receptor patterning in a higher primate." J Neurosci 34 (37):12241-52. doi: 10.1523/JNEUROSCI.1779-14.2014.

Lewcock, J. W., and R. R. Reed. 2004. "A feedback mechanism regulates monoallelic odorant receptor expression." Proceedings of the National Academy of Sciences of the United States of America 101 (4):1069-1074. doi: 10.1073/pnas.0307986100.

Li, J., T. Ishii, P. Feinstein, and P. Mombaerts. 2004. "Odorant receptor gene choice is reset by nuclear transfer from mouse olfactory sensory neurons." Nature 428 (6981):393-9. doi: 10.1038/nature02433.

Lyons, D. B., W. E. Allen, T. Goh, L. Tsai, G. Barnea, and S. Lomvardas. 2013. "An epigenetic trap stabilizes singular olfactory receptor expression." Cell 154 (2):325-36. doi: 10.1016/j.cell.2013.06.039.

Magklara, A., A. Yen, B. M. Colquitt, E. J. Clowney, W. Allen, E. Markenscoff-Papadimitriou, Z. A. Evans, P. Kheradpour, G. Mountoufaris, C. Carey, G. Barnea, M. Kellis, and S. Lomvardas. 2011. "An Epigenetic Signature for Monoallelic Olfactory Receptor Expression." Cell 145 (4):555-570. doi: 10.1016/j.cell.2011.03.040.

Malnic, B., P. A. Godfrey, and L. B. Buck. 2004. "The human olfactory receptor gene family." Proc Natl Acad Sci U S A 101 (8):2584-9.

Markenscoff-Papadimitriou, E., W. E. Allen, B. M. Colquitt, T. Goh, K. K. Murphy, K. Monahan, C. P. Mosley, N. Ahituv, and S. Lomvardas. 2014. "Enhancer interaction networks as a means for singular olfactory receptor expression." Cell 159 (3):543-57. doi: 10.1016/j.cell.2014.09.033.

Mombaerts, P. 2004. "Odorant receptor gene choice in olfactory sensory neurons: the one receptor-one neuron hypothesis revisited." Current Opinion in Neurobiology 14 (1):31-36. doi: 10.1016/j.conb.2004.01.014.

Ressler, K. J., S. L. Sullivan, and L. B. Buck. 1993. "A zonal organization of odorant receptor gene expression in the olfactory epithelium." Cell 73 (3):597-609.

Ressler, K. J., S. L. Sullivan, and L. B. Buck. 1994. "Information coding in the olfactory system: evidence for a stereotyped and highly organized epitope map in the olfactory bulb." Cell 79 (7):1245-55.

Serizawa, S., K. Miyamichi, H. Nakatani, M. Suzuki, M. Saito, Y. Yoshihara, and H. Sakano. 2003. "Negative feedback regulation ensures the one receptor-one olfactory neuron rule in mouse." Science 302 (5653):2088-2094. doi: 10.1126/science.1089122.

Shi, Y., F. Lan, C. Matson, P. Mulligan, J. R. Whetstine, P. A. Cole, R. A. Casero, and Y. Shi. 2004. "Histone demethylation mediated by the nuclear amine oxidase homolog LSD1." Cell 119 (7):941-53. doi: 10.1016/j.cell.2004.12.012.

Shykind, B. M., S. C. Rohani, S. O'Donnell, A. Nemes, M. Mendelsohn, Y. Sun, R. Axel, and G. Barnea. 2004. "Gene switching and the stability of odorant receptor gene choice." Cell 117 (6):801-15. doi: 10.1016/j.cell.2004.05.015.

Tan, L., Q. Li, and X. S. Xie. 2015. "Olfactory sensory neurons transiently express multiple olfactory receptors during development." Mol Syst Biol 11 (12):844. doi: 10.15252/msb.20156639.

Tan, L. Z., C. H. Zong, and X. S. Xie. 2013. "Rare event of histone demethylation can initiate singular gene expression of olfactory receptors." Proceedings of the National Academy of Sciences of the United States of America 110 (52):21148-21152. doi: 10.1073/pnas.1321511111.

Tan, Longzhi, and Xiaoliang Xie. 2017. "A Near Complete Zonal Map of Mouse Olfactory Receptors." bioRxiv. doi: 10.1101/234187.

Vassar, R., S. K. Chao, R. Sitcheran, J. M. Nunez, L. B. Vosshall, and R. Axel. 1994. "Topographic organization of sensory projections to the olfactory bulb." Cell 79 (6):981-91.

Vassar, R., J. Ngai, and R. Axel. 1993. "Spatial segregation of odorant receptor expression in the mammalian olfactory epithelium." Cell 74 (2):309-18.

Vosshall, L. B., H. Amrein, P. S. Morozov, A. Rzhetsky, and R. Axel. 1999. "A spatial map of olfactory receptor expression in the Drosophila antenna." Cell 96 (5):725-36.

Weth, F., W. Nadler, and S. Korsching. 1996. "Nested expression domains for odorant receptors in zebrafish olfactory epithelium." Proc Natl Acad Sci U S A 93 (23):13321-6.

Chapter 2

Modeling the kinetics of gene regulation of olfactory receptors

Author Contribution Statement

Chenghang Zong, X. Sunney Xie, and I designed the research project. I developed the model and performed simulations. Chenghang Zong, X. Sunney Xie, and I wrote the manuscript.

Introduction

In mammals, the ability to sense odors relies on the “one-neuron-one-receptor” expression (also known as “singular” expression) of the OR gene family: despite the enormous family size (more than 1,000 genes in the mouse genome), each OSN expresses only a single allele of a single OR gene (Godfrey, Malnic, and Buck 2004, Zhang and Firestein 2002, Chess et al. 1994, Buck and Axel 1991). It has long been thought that the “one-neuron-one-receptor” rule stems from a negative feedback in which the expression of one OR specifically silences all other ORs (Mombaerts 2004, Lewcock and Reed 2004, Serizawa et al. 2003). However, this hypothesis raises the question of how the one, and only one, active OR can escape its own silencing effect. In other words, it was unclear how such a feedback could be biologically feasible.

Recent studies suggested a different mechanism for the “one-neuron-one-receptor” rule in light of epigenetic regulation. Contrary to previous belief, Magklara et al. found that silencing of OR genes precedes OR expression: the histone mark H3K9me2 on OR genes is further methylated into H3K9me3 as early as the stage of neuronal progenitors, which do not yet express any ORs (Magklara et al. 2011). H3K9me2 marks inactive genes localized in facultative heterochromatin; its further methylation into H3K9me3, however, targets OR genes to constitutive heterochromatin, the nuclear compartments that normally contain pericentromeric and telomeric regions. With this mark, all OR genes are deeply repressed in nuclear aggregates before de-silencing (Clowney et al. 2012). The existence of negative feedback was also confirmed recently: once an OR has been chosen, its expression transcriptionally represses the histone demethylase Kdm1a (also known as Lsd1) (Lyons et al. 2013, Dalton, Lyons, and Lomvardas 2013). Because Kdm1a is essential for OR de-silencing, its downregulation prevents further de-silencing and freezes the system after the choice.

Taken together, these studies provided a clear mechanism for maintaining a choice of OR genes through inhibition of histone demethylation; however, it remains unclear how the choice of only a single OR can be initiated in the first place. Although these studies hypothesized that slow de-silencing may help to choose only a single OR (Lyons et al. 2013, Clowney et al. 2012, Magklara et al. 2011), it is not obvious at a quantitative level whether this scheme is feasible. The critical question is whether slow kinetics can actually generate the “one-neuron-one-receptor” rule and, if so, what conditions the kinetic parameters must satisfy. In this Chapter, we demonstrate theoretically that rare events of histone modification indeed produce “one-neuron-one-receptor” expression on a timescale compatible with experiments. The key is a combination of slow de-silencing and fast feedback. In particular, it is the ratio between the two timescales that determines the success probability of “one-neuron-one-receptor” expression in a wide range of models. Clowney et al. suggested the inaccessibility of aggregated heterochromatin as the reason for slow de-silencing (Clowney et al. 2012). As an alternative, our model predicts that a rate-limiting step of H3K9me3-to-H3K9me2 demethylation, catalyzed by an unidentified enzyme, can also be responsible for the “one-neuron-one-receptor” rule. This is in contrast to the previous proposal that Kdm1a (H3K9me2 demethylation) is the rate-limiting enzyme (Lyons et al. 2013).

Results

We modeled each allele of OR with three epigenetic states (Figure 2.1A): an Off state that contained H3K9me3 (constitutive heterochromatin), an Intermediate state that contained H3K9me2 (facultative heterochromatin), and an On state that actively expressed the OR.

Neuronal commitment begins with extensive methylation of H3K9me2 into H3K9me3 (Intermediate→Off, with a rate kme2→me3) (Magklara et al. 2011); therefore, we initialized each allele of OR in the Off state. OR de-silencing, namely the complete demethylation of its repressive H3K9me3 mark, involves two consecutive steps: H3K9me3-to-H3K9me2 demethylation by an unidentified demethylase (Off→Intermediate, with a rate kme3→me2), and then H3K9me2 demethylation by Kdm1a (Intermediate→On, with a rate kon) (Lyons et al. 2013, Kooistra and Helin 2012, Klose and Zhang 2007). Once an allele is fully de-silenced, OR expression transcriptionally downregulates Kdm1a via a recently identified cascade (OR activates the adenylyl cyclase Adcy3, which in turn represses Kdm1a) (Lyons et al. 2013, Dalton, Lyons, and Lomvardas 2013), whose response time is denoted Δt. Although Δt was assumed here to be deterministic, our model still works if the feedback is stochastic, in which case Δt denotes the mean response time (Materials and Methods).
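To make the three transitions concrete, the single-allele dynamics can be written as a small stochastic simulation in the style of the Gillespie algorithm. This is a minimal sketch, not the code used for Figure 2.1: the function name `allele_waiting_time` and the use of Python's `random` module are illustrative assumptions, all rates are assumed positive, and the feedback is deliberately left out (kon is held constant), so the function returns the time at which one isolated allele first reaches On.

```python
import random


def allele_waiting_time(k_me2_me3, k_me3_me2, k_on, rng):
    """Gillespie-style sketch of one OR allele's path Off -> Intermediate -> On.

    Off (H3K9me3) -> Intermediate (H3K9me2) at rate k_me3_me2;
    Intermediate -> Off (re-methylation) at rate k_me2_me3;
    Intermediate -> On (Kdm1a demethylation) at rate k_on.
    Feedback is ignored here: k_on is held constant.
    Returns the time at which the allele first reaches On.
    """
    t = 0.0
    state = "Off"  # every allele starts fully silenced
    while True:
        if state == "Off":
            # Only one reaction leaves Off, so the dwell time is exponential.
            t += rng.expovariate(k_me3_me2)
            state = "Intermediate"
        else:
            k_exit = k_me2_me3 + k_on
            t += rng.expovariate(k_exit)
            # Pick which reaction fired, in proportion to its rate.
            if rng.random() < k_on / k_exit:
                return t  # fully de-silenced (On)
            state = "Off"  # a methyl group was added back


rng = random.Random(0)
samples = [allele_waiting_time(1.0, 1.0, 1.0, rng) for _ in range(20000)]
mean_wait = sum(samples) / len(samples)
```

With all three rates set to 1 (the Figure 2.1C example), the sample mean should approach (1 + 1 + 1)/(1 × 1) = 3, i.e. the sum of the three rates divided by the product kme3→me2 × kon.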

Figure 2.1. Kinetic model for the OR choice. (A) Each allele of OR was modeled by three epigenetic states: Off, Intermediate, and On. Only the Intermediate state was poised for further de-silencing by Kdm1a (also known as Lsd1). Once fully de-silenced, OR expression would transcriptionally downregulate Kdm1a within the response time Δt. (B) In each simulation, all n alleles of OR genes were initialized in the Off state (black line), and then allowed to transition into the Intermediate state (gray box) and finally into the On state (red box). The earliest de-silencing downregulated Kdm1a after time Δt, thus terminating any further de-silencing (dark red vertical line, “end of activation”). “One-neuron-one-receptor” expression (also known as “singular” expression) would be achieved if no other alleles became de-silenced during this time. Parameters were kme2→me3 = 1, kme3→me2 = 0.01, kon = 0.01, and Δt = 1 for the simulation on the left. The simulation on the right had the same parameters except for a higher kme3→me2 = 0.1.

Figure 2.1 (Continued). (C) The waiting time of each allele followed a probability distribution consisting of a linear rise on a short timescale and an exponential fall on a long timescale. In this example, the parameters were kme2→me3 = kme3→me2 = kon = 1.

The choice of ORs was simulated by initializing all n = 2800 alleles (including both the paternal and the maternal ones) in the Off state (H3K9me3) and then waiting for the earliest de-silencing to occur at any allele, which would turn off Kdm1a after the response time Δt of the feedback loop. Two representative simulations are shown in Figure 2.1B. Starting from the Off state, each allele occasionally lost one methyl group and transitioned into the Intermediate state. In some cases, a methyl group was added back, returning the allele to the Off state, whereas in other cases Kdm1a took over and further demethylated the allele for active expression. “One-neuron-one-receptor” expression would be achieved if no further de-silencing occurred within time Δt after the first de-silencing, as shown in the simulation in Figure 2.1B (left). After Δt, Kdm1a became downregulated and kon fell to zero, prohibiting further OR de-silencing. However, if any other ORs were de-silenced before the feedback took effect, as shown in Figure 2.1B (right), the “one-neuron-one-receptor” rule would be broken, because Kdm1a regulation only ensured the maintenance of previous OR choices (Lyons et al. 2013).
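The procedure above can be sketched for a single neuron as follows. Because kon stays constant until Kdm1a is switched off at (first de-silencing + Δt), the alleles evolve independently until then, so one can sample each allele's unperturbed waiting time and count how many fall inside that window. Instead of stepping through every reaction, this sketch draws each waiting time exactly: the number of Off→Intermediate attempts is geometric, and the summed exponential dwell times are gamma-distributed. The function names and this sampling shortcut are illustrative assumptions, not the thesis's simulation code; rates are assumed positive.

```python
import math
import random


def first_passage_time(k_me2_me3, k_me3_me2, k_on, rng):
    """Exact sample of one allele's Off -> On waiting time, with k_on fixed."""
    k_exit = k_me2_me3 + k_on        # total rate of leaving Intermediate
    p = k_on / k_exit                # chance an Intermediate visit ends in On
    u = 1.0 - rng.random()           # uniform on (0, 1]
    # Number of Off -> Intermediate attempts: geometric on {1, 2, ...}.
    attempts = 1 + int(math.log(u) / math.log1p(-p)) if p < 1.0 else 1
    # Summed exponential dwell times in Off and in Intermediate.
    return (rng.gammavariate(attempts, 1.0 / k_me3_me2)
            + rng.gammavariate(attempts, 1.0 / k_exit))


def simulate_or_choice(n_alleles, k_me2_me3, k_me3_me2, k_on, delta_t, rng):
    """Simulate one neuron; return (time of first de-silencing, alleles On).

    Until Kdm1a is switched off at t_first + delta_t, all alleles see the
    same k_on, so each allele's unperturbed waiting time decides whether it
    reaches On inside that window.
    """
    times = [first_passage_time(k_me2_me3, k_me3_me2, k_on, rng)
             for _ in range(n_alleles)]
    t_first = min(times)
    n_on = sum(1 for t in times if t <= t_first + delta_t)
    return t_first, n_on


rng = random.Random(42)
# Parameters of the left simulation in Figure 2.1B (n = 2800 alleles).
t_first, n_on = simulate_or_choice(2800, 1.0, 0.01, 0.01, 1.0, rng)
is_singular = (n_on == 1)
```

A neuron counts as “one-neuron-one-receptor” when `n_on == 1`; repeating the call over many neurons gives the empirical failure probability.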

In our three-state model, the waiting time for each allele to reach the On state could be solved analytically by analogy to Michaelis-Menten kinetics (English et al. 2006). Although the system relied on three parameters kme2→me3, kme3→me2, and kon, the distribution of waiting times depended on only two combinations: the geometric mean of kme3→me2 and kon, denoted $k_{\mathrm{bottleneck}} \equiv \sqrt{k_{\mathrm{me3}\to\mathrm{me2}}\,k_{\mathrm{on}}}$, which described how slow the two demethylation reactions were, and the sum of all three rates, denoted $k_{\mathrm{total}} \equiv k_{\mathrm{me2}\to\mathrm{me3}} + k_{\mathrm{me3}\to\mathrm{me2}} + k_{\mathrm{on}}$, which described how fast the system transitioned between epigenetic states. For each allele, the mean waiting time before de-silencing was $k_{\mathrm{total}}/k_{\mathrm{bottleneck}}^{2}$, and its probability density consisted of a linear rise from zero on a short timescale ($t \sim 1/k_{\mathrm{total}}$) and an exponential fall on a long timescale ($t \sim k_{\mathrm{total}}/k_{\mathrm{bottleneck}}^{2}$) (Figure 2.1C).
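For reference, one standard way to obtain this two-timescale shape is a first-passage calculation on the two transient states, Off and Intermediate, absorbed in On. The sketch below assumes the rates of Figure 2.1A are constant (no feedback) and uses only the definitions of kbottleneck and ktotal given above; it is offered as an illustration consistent with the stated results, not as a reproduction of the derivation in Materials and Methods.

```latex
\begin{aligned}
&\lambda^{2} - k_{\mathrm{total}}\,\lambda + k_{\mathrm{bottleneck}}^{2} = 0
 \;\Rightarrow\;
 \lambda_{1,2} = \tfrac{1}{2}\Bigl(k_{\mathrm{total}} \pm \sqrt{k_{\mathrm{total}}^{2} - 4k_{\mathrm{bottleneck}}^{2}}\Bigr),\\
&f(t) = \frac{k_{\mathrm{bottleneck}}^{2}}{\lambda_{1}-\lambda_{2}}\bigl(e^{-\lambda_{2}t} - e^{-\lambda_{1}t}\bigr)
 \approx k_{\mathrm{bottleneck}}^{2}\,t \quad (t \ll 1/\lambda_{1}),\\
&\langle t\rangle = \int_{0}^{\infty} t\,f(t)\,dt
 = \frac{1}{\lambda_{1}} + \frac{1}{\lambda_{2}}
 = \frac{\lambda_{1}+\lambda_{2}}{\lambda_{1}\lambda_{2}}
 = \frac{k_{\mathrm{total}}}{k_{\mathrm{bottleneck}}^{2}}.
\end{aligned}
```

When kbottleneck ≪ ktotal, λ1 ≈ ktotal and λ2 ≈ kbottleneck²/ktotal, which recovers the short (1/ktotal) and long (ktotal/kbottleneck²) timescales; the initial slope kbottleneck² is the linear rise.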

Our model exhibited “one-neuron-one-receptor” expression for a wide range of values of the two dimensionless parameters kbottleneckΔt and ktotalΔt (Figure 2.2A). In particular, two aspects contributed to the “one-neuron-one-receptor” rule: a kinetic bottleneck in de-silencing (small kbottleneckΔt), and rapid switching between epigenetic states (large ktotalΔt). In terms of the original parameters, this was equivalent to having two slow demethylation reactions (small kme3→me2 and kon) or fast methylation back into the Off state (large kme2→me3). Interestingly, either condition severely delays the first de-silencing of any OR by reducing the chance of two consecutive demethylations into the On state. For example, in the simulation in the left panel of Figure 2.1B (corresponding to ktotalΔt = 1.02 and kbottleneckΔt = 0.01), each allele spent most of its time in the Off state and only transiently switched to the Intermediate state. These observations suggested that the rarity of consecutive demethylation might be responsible for the “one-neuron-one-receptor” rule.
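The two dimensionless parameters, together with the single-allele timescales, follow directly from the three rates and Δt. The helper below is a hypothetical sketch (its name and return keys are illustrative, and all rates are assumed positive); it computes the two decay rates as the roots of λ² − ktotal·λ + kbottleneck² = 0, taking the slow root from the product of the roots to avoid cancellation error.

```python
import math


def kinetic_summary(k_me2_me3, k_me3_me2, k_on, delta_t):
    """Dimensionless groups and single-allele timescales of the three-state model.

    Assumes all rates are positive. The two decay rates are the roots of
    lambda**2 - k_total*lambda + k_bottleneck**2 = 0.
    """
    k_total = k_me2_me3 + k_me3_me2 + k_on
    k_bottleneck = math.sqrt(k_me3_me2 * k_on)
    # max() guards against a tiny negative value from floating-point rounding.
    disc = math.sqrt(max(0.0, k_total**2 - 4.0 * k_bottleneck**2))
    lam_fast = 0.5 * (k_total + disc)
    lam_slow = k_bottleneck**2 / lam_fast  # product of roots = k_bottleneck**2
    return {
        "k_bottleneck_dt": k_bottleneck * delta_t,
        "k_total_dt": k_total * delta_t,
        "mean_wait": k_total / k_bottleneck**2,  # mean Off -> On waiting time
        "rise_time": 1.0 / lam_fast,             # short timescale, ~1/k_total
        "decay_time": 1.0 / lam_slow,            # long timescale, ~k_total/k_bottleneck**2
    }


# Left simulation of Figure 2.1B: kme2->me3 = 1, kme3->me2 = kon = 0.01, dt = 1.
summary = kinetic_summary(1.0, 0.01, 0.01, 1.0)
```

For these parameters the helper reproduces ktotalΔt = 1.02 and kbottleneckΔt = 0.01; the mean waiting time (10,200 time units) is about four orders of magnitude longer than Δt, which is why each allele spends most of its time in the Off state.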

Figure 2.2. Our kinetic model generated “one-neuron-one-receptor” expression by slow de-silencing and fast feedback. (A) “One-neuron-one-receptor” expression (also known as “singularity”) was achieved with a wide range of values of two dimensionless parameters: $k_{\mathrm{bottleneck}}\Delta t \equiv \Delta t\,\sqrt{k_{\mathrm{me3}\to\mathrm{me2}}\,k_{\mathrm{on}}}$ and $k_{\mathrm{total}}\Delta t \equiv (k_{\mathrm{me2}\to\mathrm{me3}} + k_{\mathrm{me3}\to\mathrm{me2}} + k_{\mathrm{on}})\,\Delta t$. “One-neuron-one-receptor” expression could be established by a small kbottleneckΔt or a large ktotalΔt. (B) The extent of “one-neuron-one-receptor” expression, quantified by the failure probability Pfail (the probability of expressing more than one OR), was determined by the ratio of two timescales: the mean time of the first OR de-silencing T and the response time of the negative feedback Δt. We found $P_{\mathrm{fail}} = c\,\Delta t/T$ to be a general conclusion for all kinetic models with an irreversible step of de-silencing. Our model corresponded to c = 1 or π/2, and simulated data (black dots) fit well with the two theoretical predictions (grey lines). In both panels, orange regions corresponded to the physiological failure probability Pfail ≈ 2%. Each data point was the mean of 10 simulations.

Consistent with this heuristic, we found that the extent of “one-neuron-one-receptor” expression was determined by the ratio of two timescales, the mean time of the first OR de-silencing and the response time of the negative feedback; this general conclusion held for a wide range of kinetic models.
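A heuristic way to see why only the ratio Δt/T matters, and where constants such as c = 1 and c = π/2 can come from, is sketched below. It assumes n independent alleles with single-allele waiting-time density f(t), Δt ≪ T, and that the earliest de-silencing t1 falls either in the exponential tail or in the linear rise of f(t); it is an illustrative approximation, not the derivation given in Materials and Methods.

```latex
\begin{aligned}
&\textbf{Exponential fall: } f(t) \approx \lambda e^{-\lambda t}
 \;\Rightarrow\; t_1 \sim \mathrm{Exp}(n\lambda),\quad T = \frac{1}{n\lambda},\\
&\qquad P_{\mathrm{fail}} = 1 - e^{-(n-1)\lambda\,\Delta t} \approx (n-1)\,\lambda\,\Delta t \approx \frac{\Delta t}{T}
 \quad (c = 1).\\[4pt]
&\textbf{Linear rise: } f(t) \approx k_{\mathrm{bottleneck}}^{2}\,t
 \;\Rightarrow\; \Pr(t_1 > t) \approx e^{-n k_{\mathrm{bottleneck}}^{2} t^{2}/2},\quad
 T = \sqrt{\frac{\pi}{2\,n\,k_{\mathrm{bottleneck}}^{2}}},\\
&\qquad P_{\mathrm{fail}} \approx \mathbb{E}\bigl[(n-1)\,f(t_1)\,\Delta t\bigr]
 \approx n\,k_{\mathrm{bottleneck}}^{2}\,T\,\Delta t = \frac{\pi}{2}\,\frac{\Delta t}{T}
 \quad (c = \pi/2).
\end{aligned}
```

In both cases n and the kinetic rates drop out once the result is written in terms of T, which is the sense in which c depends only on the shape of the waiting-time distribution near the earliest de-silencing.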

For each choice of parameters, we recorded T, defined as the mean time of the earliest de-silencing, and the failure probability Pfail, defined as the probability of choosing more than one allele of ORs. Slower de-silencing relative to the feedback response (larger T/Δt) almost always led to higher success (smaller Pfail) (black dots in Figure 2.2B). This relationship is in fact a general property of all kinetic models with an irreversible step of de-silencing that can be turned off by negative feedback. We showed theoretically that, as long as the number of alleles n is sufficiently large, the failure probability follows $P_{\mathrm{fail}} = c\,\Delta t/T$ (with c a constant) regardless of the model details (Materials and Methods). The constant c was independent of n and of all kinetic parameters; instead, it depended only on the category of the model, based on the shape of its waiting-time distribution near zero. Our model corresponded to c = 1 when the exponential fall of the waiting-time distribution (Figure 2.1C) dominated and c = π/2 when the linear rise dominated, under our choice of n = 2800. Simulated data (black dots in Figure 2.2B) fit well with the two theoretical predictions (grey lines). Therefore, slow de-silencing and fast feedback drove “one-neuron-one-receptor” expression regardless of the details of the de-silencing process.
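The relationship Pfail ≈ c × Δt/T can be checked with a rough Monte Carlo estimate of T and Pfail. The sketch below is illustrative only: the function names are hypothetical, it samples each allele's waiting time exactly (a geometric number of Off→Intermediate attempts, with gamma-distributed summed dwell times), rates are assumed positive, n_alleles must be at least 2, and a neuron is counted as a failure when a second allele reaches On no later than Δt after the first.

```python
import math
import random


def first_passage_time(k_me2_me3, k_me3_me2, k_on, rng):
    """Exact sample of one allele's Off -> On waiting time (no feedback)."""
    k_exit = k_me2_me3 + k_on
    p = k_on / k_exit                 # chance an Intermediate visit ends in On
    u = 1.0 - rng.random()            # uniform on (0, 1]
    attempts = 1 + int(math.log(u) / math.log1p(-p)) if p < 1.0 else 1
    return (rng.gammavariate(attempts, 1.0 / k_me3_me2)
            + rng.gammavariate(attempts, 1.0 / k_exit))


def estimate_T_and_pfail(n_alleles, k_me2_me3, k_me3_me2, k_on,
                         delta_t, n_neurons, seed=0):
    """Monte Carlo estimates of T (mean earliest de-silencing) and Pfail."""
    rng = random.Random(seed)
    total_first = 0.0
    failures = 0
    for _ in range(n_neurons):
        times = [first_passage_time(k_me2_me3, k_me3_me2, k_on, rng)
                 for _ in range(n_alleles)]
        t1, t2 = sorted(times)[:2]
        total_first += t1
        # Failure: a second allele is de-silenced before Kdm1a is switched off.
        if t2 <= t1 + delta_t:
            failures += 1
    return total_first / n_neurons, failures / n_neurons
```

For example, `estimate_T_and_pfail(200, 1.0, 0.01, 0.01, 1.0, 1000)` uses hypothetical parameters (n = 200 rather than 2800, to keep the run short) for which the earliest de-silencing sits in the exponential tail; the simulated Pfail is then expected to be close to the c = 1 prediction Δt/T, roughly 2%.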

Our kinetic model suggested an intrinsic trade-off between time and success: strictly “one-neuron-one-receptor” expression (a very small failure probability Pfail) was achieved by sacrificing the rate of neuronal maturation (a long time T before the first de-silencing). Although the exact extent of “one-neuron-one-receptor” expression has not been investigated for all ~1,400 ORs, RNA FISH in the septal organ (a small domain in the MOE where most OSNs choose from a pool of only nine ORs) showed a 1–2% probability of expressing more than one OR in newborn mice or in sensory-deprived adults; in contrast, normal adults have a lower failure probability of ~0.2% due to apoptosis of neurons expressing more than one OR (Tian and Ma 2008).

To achieve a physiological failure probability of 2%, the first de-silencing must be ~100 times slower than the response of the negative feedback (T/Δt ≈ 100, orange regions in Figures 2.2A and 2.2B). The kinetics of this feedback have not been measured. However, three factors might help accelerate the response: the extremely high transcription levels of ORs and Adcy3 (Magklara et al. 2011), the use of the UPR (Dalton, Lyons, and Lomvardas 2013), and the intrinsic instability of Kdm1a (rapid precipitation and proteasomal degradation when not bound by cofactors) (Zibetti et al. 2010, Forneris et al. 2008). Given a typical Δt of 1–2 hours for transcriptional regulation (Yosef and Regev 2011), OR de-silencing, and hence OSN maturation, should take about 5–10 days. This is consistent with observations after olfactory bulbectomy, where sensory neurons regenerated 5–10 days after the operation (Gokoffski et al. 2010).
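The numbers in this paragraph follow from Pfail = c × Δt/T. The minimal arithmetic sketch below uses hypothetical function names. For Pfail = 2%, the required T/Δt is c/Pfail: 50 for c = 1 and about 79 for c = π/2, both of order 100. Taking T/Δt = 100 with Δt = 1–2 h gives T of roughly 4–8 days, the same order of magnitude as the 5–10 days quoted above.

```python
import math


def required_ratio(p_fail, c):
    """T/dt needed for a target failure probability, from P_fail = c * dt / T."""
    return c / p_fail


def maturation_days(ratio, dt_hours):
    """Mean time to the first de-silencing, T = ratio * dt, converted to days."""
    return ratio * dt_hours / 24.0


for c in (1.0, math.pi / 2):
    print(f"c = {c:.2f}: T/dt = {required_ratio(0.02, c):.0f}")
for dt_hours in (1.0, 2.0):
    print(f"dt = {dt_hours:.0f} h: T = {maturation_days(100, dt_hours):.1f} days")
```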

Our model also provided information on the rate-limiting step behind this kinetic bottleneck. In the analytical solution, the two demethylation rates, kme3→me2 and kon (Figure 2.1A), played equal roles because they jointly contributed to “one-neuron-one-receptor” expression through the kinetic bottleneck kbottleneck ≡ √(kme3→me2 · kon). Therefore, the “one-neuron-one-receptor” rule could in principle arise in several ways: a slow rate of H3K9me3 demethylation by an unidentified enzyme, a slow rate of H3K9me2 demethylation by Kdm1a, or both. However, different values of kme3→me2 and kon would affect the relative abundance of epigenetic states. If Kdm1a (kon for Intermediate→On) were the only rate-limiting step, alleles would accumulate in the Intermediate state (H3K9me2), all waiting for further demethylation by Kdm1a. In contrast, OR genes were found mostly with H3K9me3 rather than H3K9me2 in globose basal cells, neuronal progenitors, and mature OSNs in the MOE (Magklara et al. 2011). Furthermore, RNA FISH and immunostaining show that Kdm1a is expressed at a relatively high level (Lyons et al. 2013, Krolewski, Packard, and Schwob 2013), which is not compatible with Kdm1a being a rate-limiting enzyme. Therefore, a more likely explanation would be a rate-limiting step of H3K9me3-to-H3K9me2 demethylation, probably mediated by an enzyme of very low abundance. It is also possible that Kdm1a and this H3K9me3 demethylase work in a protein complex, as in the regulation of androgen-receptor-dependent genes (Wissmann et al. 2007); if so, the low abundance of the complex would account for the bottleneck. Identifying this H3K9me3 demethylase would provide key insights into the cause of singular OR expression.
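The equal roles of the two rates can be illustrated by assuming the model falls in category (II) of Materials and Methods. Near zero, the first-passage density then rises as f(x) ≈ kme3→me2 · kon · x, because an allele starting in H3K9me3 must pass through H3K9me2 before turning On. The mean time to the first de-silencing then depends only on kbottleneck.

To contrast the epigenetic states, the sketch also uses a quasi-steady-state H3K9me2:H3K9me3 ratio of kme3→me2/(kme2→me3 + kon) for a not-yet-active allele. This ratio, the rate values, and the function names are illustrative assumptions, not results of the model.

```python
import math


def bottleneck(k_me3_me2, k_on):
    """k_bottleneck = sqrt(k_me3->me2 * k_on)."""
    return math.sqrt(k_me3_me2 * k_on)


def mean_first_time(n, k_me3_me2, k_on):
    """Category (II) estimate T = sqrt(pi / (2 n f'(0))) with f'(0) = k_me3->me2 * k_on."""
    return math.sqrt(math.pi / (2 * n)) / bottleneck(k_me3_me2, k_on)


def me2_to_me3(k_me3_me2, k_me2_me3, k_on):
    """Assumed quasi-steady-state H3K9me2 : H3K9me3 ratio of a not-yet-active allele."""
    return k_me3_me2 / (k_me2_me3 + k_on)


n, k_back = 2800, 0.01  # hypothetical values; rates in arbitrary inverse-time units
for k32, kon in ((0.001, 0.1), (0.1, 0.001)):  # same product, roles swapped
    print(f"k_me3->me2 = {k32}, k_on = {kon}: "
          f"T = {mean_first_time(n, k32, kon):.3f}, "
          f"me2:me3 = {me2_to_me3(k32, k_back, kon):.3f}")
```

Both parameter sets give the same T. Only the set with a slow Kdm1a step (small kon) leaves most not-yet-active alleles in H3K9me2. That scenario is the one the text argues against, based on the observed H3K9me3 enrichment.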

So far, our model had assumed irreversible OR de-silencing by Kdm1a. However, lineage-tracing experiments showed that ~10% of OSNs abandoned their initial OR choice and switched to a different one (Shykind et al. 2004). Lyons et al. further suggested that another function of Kdm1a, demethylation of the active histone mark H3K4me2, might be responsible for turning off an already de-silenced OR (Lyons et al. 2013). This observation raised the question of whether the reversibility of OR de-silencing contributes to “one-neuron-one-receptor” expression. The ability of Kdm1a to turn off ORs can help shut down the transcription of pseudogenes (nonfunctional OR genes that cannot induce the negative feedback); however, its influence on intact OR genes has not been investigated.

We included OR switching in our model by adding a Kdm1a-mediated reaction from the active state back to the Intermediate state, with a rate kswitch (On→Intermediate). To model the negative feedback explicitly, we assumed that its response time Δt was primarily determined by the rate of Kdm1a downregulation. The other components of the feedback, ORs and Adcy3, were extremely highly transcribed (Magklara et al. 2011) and would thus respond much more rapidly. Our conclusions still held if OR or Adcy3 accumulation was instead the primary determinant of Δt (Materials and Methods). In addition, we assumed a decay rate α for Kdm1a when any ORs were active, a steady-state Kdm1a concentration of one when ORs were absent, and a concentration threshold κ below which Kdm1a could no longer induce any further de-silencing.

We found that reversible OR de-silencing could improve “one-neuron-one-receptor” expression by turning off competing ORs, but the enhancement was marginal under physiological conditions. Once the first OR allele was de-silenced, the downregulation of Kdm1a from one to κ took Δt = log(1/κ)/α. To break the “one-neuron-one-receptor” rule, a second, competing OR allele must be de-silenced during this time window and must also remain active until the end of de-silencing (Figure 2.3A). When this occurred, OR switching might help “one-neuron-one-receptor” expression by turning the second OR off. However, any significant reduction in the failure probability must be accompanied by a very high frequency of OR switching Pswitch (Figure 2.3B). Here Pswitch is defined as the percentage of OSNs that abandoned an initial OR choice at least once. For example, a mere two-fold reduction in Pfail (which requires kswitch ≈ 1/Δt) would come with a Pswitch of almost 90%. Analytical calculation showed that, when OR switching was infrequent, the relative reduction in Pfail was approximately Pswitch/2 in most situations (Materials and Methods). The physiological switching probability is ~10% (Shykind et al. 2004), corresponding to kswitch ≈ 0.1/Δt. At this level, the failure probability was reduced by a mere 5% (orange region in Figure 2.3B). Therefore, the success probability could only be affected marginally: if 98.0% of all OSNs followed the “one-neuron-one-receptor” rule without switching, physiological OR switching would only improve this to 98.1%. Note that there also existed an edge case where kon was significantly larger than both 1/T and kme2→me3 (namely, an allele that was turned off would quickly revive). In this case, OR switching actually harmed “one-neuron-one-receptor” expression (Materials and Methods). In conclusion, the kinetic constraint that we derived earlier still held in the presence of OR switching.
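The physiological estimate can be reproduced in a few lines. The sketch assumes three relations: the first-order result Pfail/Pfail(kswitch = 0) ≈ 1 − Pswitch/2 stated above, Pswitch = 1 − exp(−kswitch Δt) from the Figure 2.3 caption, and Δt = log(1/κ)/α with α = 1 and κ = 0.1 as in Figure 2.3. Function names are hypothetical.

```python
import math


def kdm1a_response_time(alpha, kappa):
    """dt for Kdm1a to decay from 1 to the threshold kappa at rate alpha."""
    return math.log(1 / kappa) / alpha


def switching_effect(p_switch):
    """Return (k_switch * dt, P_fail / P_fail(k_switch = 0)) for infrequent switching."""
    u = -math.log(1 - p_switch)  # invert P_switch = 1 - exp(-k_switch * dt)
    return u, 1 - p_switch / 2   # first-order relative reduction of P_switch / 2


dt = kdm1a_response_time(alpha=1.0, kappa=0.1)
u, ratio = switching_effect(0.10)
p_fail0 = 0.02  # 98.0% success without switching
print(f"dt = {dt:.2f}, k_switch*dt = {u:.3f}, P_fail reduced by {100 * (1 - ratio):.0f}%")
print(f"success: {100 * (1 - p_fail0):.1f}% -> {100 * (1 - p_fail0 * ratio):.1f}%")
```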

Figure 2.3. Switching between ORs might facilitate “one-neuron-one-receptor” expression, but the enhancement was marginal under physiological conditions. (A) Schematic of OR switching, modeled by the ability of ORs to be turned off by Kdm1a (On→Intermediate) with a rate kswitch. We assumed that the response of the negative feedback is primarily determined by Kdm1a depletion. We therefore modeled Kdm1a with three parameters: a decay rate α when ORs were active, a steady-state concentration of one when ORs were absent, and a threshold κ (white dashed line) below which no further de-silencing could occur. Switching between ORs could enhance “one-neuron-one-receptor” expression by silencing the competing allele (in this case, allele 3). (B) Any significant enhancement would require a large kswitch and thus a high probability of OR switching. Each representative trace was produced by fixing kme2→me3, kme3→me2, and kon and tuning kswitch. The probability of OR switching Pswitch was defined as the percentage of simulations in which at least one OR was once de-silenced and turned back off. The values of Pswitch were almost identical among the three choices of parameters; therefore, only one red trace was shown, which fit well with the analytical prediction Pswitch = 1 − exp(−kswitch Δt). The orange region corresponded to the physiological switching probability Pswitch ≈ 10%. Each data point was the mean of 10 simulations, with α = 1 and κ = 0.1 (corresponding to Δt ≈ 2.30).

Although OR switching had limited effects under physiological conditions, it could provide extra protection against large disruptions in Δt. Suppose the feedback response was artificially prolonged such that Δt ≫ 1/kswitch. Rather than producing OSNs with more than one OR, the system would get stuck in the high-Kdm1a phase and keep switching between ORs. This is because the mean lifetime of the active state (1/kswitch) was too short to fully downregulate Kdm1a. This was indeed observed after Adcy3 knockout or constitutive Kdm1a expression (Lyons et al. 2013). Therefore, the ability of Kdm1a to turn off ORs provided quality control against Δt disruptions, preventing the failure probability from exceeding a baseline of 1/(kswitch T). Such a “better-safe-than-sorry” strategy might provide an evolutionary advantage.


Discussion

The “one-neuron-one-receptor” expression of ORs is a fascinating phenomenon in neurobiology and has attracted numerous modeling efforts. Various models have attempted to explain how OSNs overcome the potential competition between multiple ORs, but none has been completely satisfactory. One model suggested that the expression of each OR might specifically silence all ORs but itself (Lewcock and Reed 2004, Serizawa et al. 2003). As in a bistable switch, mutual repression between ORs can in principle generate multistability in which each stable state supports only one active allele. However, this model was challenged by the extreme difficulty for each OR to distinguish among all ~2800 alleles while avoiding silencing itself. In another model, a short DNA sequence, the H element, was hypothesized to act as a global OR activator (Lomvardas et al. 2006). However, later experiments cast doubt on this model because deletion of the H element does not affect the expression of most ORs (Fuss, Omura, and Mombaerts 2007, Khan, Vaes, and Mombaerts 2011).

A recent epigenetic study by Magklara et al. provided crucial evidence for a completely different model (Magklara et al. 2011). In particular, they showed that OR silencing is not induced by OR expression but rather precedes it. This observation implies that OR expression must involve two phases: an activation phase that releases some ORs from their initially silent state, followed by a maintenance phase that preserves the choice of activation. At first glance, this mechanism seems less ideal than previous models: if an initial choice activates more than one OR, the maintenance phase will not reject it. “One-neuron-one-receptor” expression must rely completely on the kinetics of the activation process. It is therefore unclear whether this is a feasible mechanism and what kind of kinetics is required.

In this Chapter, we show theoretically that such an activation-maintenance scheme can indeed generate “one-neuron-one-receptor” expression. Recent studies (Lyons et al. 2013, Clowney et al. 2012, Magklara et al. 2011) emphasized the dichotomy between activation and maintenance during OR expression. In contrast, we suggest that these two phases should not be viewed separately. Neither slow nor fast kinetics of either phase alone can guarantee “one-neuron-one-receptor” expression; instead, the ratio between the two timescales determines its extent. This conclusion holds regardless of the details of the kinetic model and in the presence of OR switching (Lyons et al. 2013, Shykind et al. 2004). It also does not rely on ad hoc assumptions of enzymatic cooperativity (Alsing and Sneppen 2013, Kolterman, Iossifov, and Koulakov 2012). “One-neuron-one-receptor” expression can be achieved if OR de-silencing (namely, the complete demethylation of H3K9me3) occurs as a rare, discrete event, through slow de-silencing combined with fast feedback. The feedback loop underlying the maintenance phase (OR, Adcy3, Kdm1a) has now been identified (Lyons et al. 2013, Dalton, Lyons, and Lomvardas 2013). The only missing piece of OR singularity is therefore the de-silencing step responsible for the required kinetic bottleneck.

Our model suggests a general scheme: a slow epigenetic response coupled with a fast transcriptional response. This scheme may also apply to other systems that require monoallelic expression. Examples include V(D)J recombination in the immune system (Farago et al. 2012, Cedar and Bergman 2011, Krangel 2009, Schlimgen et al. 2008) and the expression of protocadherins (Esumi et al. 2005, Monahan et al. 2012, Magklara and Lomvardas 2013). Furthermore, understanding the kinetic constraints of the OR expression pattern may provide a guiding principle for the design of a synthetic circuit that assigns one unique barcode to each single cell in vivo.

Materials and Methods

Generalized models

Our conclusion held for a more general model of OR de-silencing and feedback. In a generalized model, the detailed kinetics of OR de-silencing (such as H3K9me3⇄H3K9me2→On) was represented by an arbitrary stochastic process starting from an Off state. Its first-passage time x had probability density function (PDF) f(x) and cumulative distribution function (CDF) F(x). Similarly, the details of the feedback (such as OR→Adcy3⟞Kdm1a) were represented by another stochastic process with a first-passage time y, a PDF g(y), and a CDF G(y). The two processes were connected by an irreversible de-silencing step that could be turned off after time x + y. This step corresponds to H3K9me2 demethylation by Kdm1a in our main model (Figure 2.1A).

Among all n = 2800 OR alleles, the mean time of the earliest de-silencing could be calculated as follows. This is the smallest of the n independent first-passage times x:

$$T = \int_0^{+\infty} x \, n f(x) \left(1 - F(x)\right)^{n-1} dx.$$

When n was sufficiently large, the integral only depended on the behavior of f(x) near x = 0. The PDF f(x) could thus be classified as follows: (I) constant around x = 0, namely f(0) > 0; (II) a linear rise from x = 0, namely f(0) = 0 but f′(0) > 0; (III) a quadratic rise from x = 0, namely f(0) = 0 and f′(0) = 0 but f′′(0) > 0; and so on. For example, models in category (I) had a mean first-de-silencing time scaling as 1/n, in particular T = 1/(n f(0)). Models in category (II) had T = √(π/(2n f′(0))), scaling as n^(−1/2).
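The two closed forms for T can be obtained in three steps. First, keep only the leading term of f(x) near zero. Second, use (1 − F(x))^(n−1) ≈ exp(−n F(x)) for large n. Third, apply the standard Gaussian moment ∫0^∞ x² e^(−b x²) dx = √π/(4 b^(3/2)). A sketch of the derivation:

```latex
\begin{aligned}
&\text{(I)}\;\; f(x) \approx f(0),\; F(x) \approx f(0)\,x:\\
&\qquad T \approx \int_0^{\infty} x\, n f(0)\, e^{-n f(0)\, x}\, dx = \frac{1}{n f(0)},\\[6pt]
&\text{(II)}\;\; f(x) \approx f'(0)\,x,\; F(x) \approx \tfrac{1}{2} f'(0)\,x^2:\\
&\qquad T \approx \int_0^{\infty} n f'(0)\, x^2\, e^{-n f'(0)\, x^2/2}\, dx
  = \frac{n f'(0)\,\sqrt{\pi}}{4\,\bigl(n f'(0)/2\bigr)^{3/2}}
  = \sqrt{\frac{\pi}{2 n f'(0)}}.
\end{aligned}
```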


The system would fail if the time difference Δx between the earliest de-silencing x(1) and the second earliest one, x(2) = x(1) + Δx, was smaller than the response time y of the feedback. This failure probability could be calculated as

$$P_\mathrm{fail} = \int_0^{+\infty} dy \int_0^{+\infty} dx_{(1)} \int_0^{y} d\Delta x \; n(n-1)\, f(x_{(1)})\, f(x_{(1)} + \Delta x) \left(1 - F(x_{(1)} + \Delta x)\right)^{n-2} g(y).$$

When the success rate was very high (Pfail ≪ 1), the feedback response y would be very fast. The innermost integral over Δx could then be replaced by multiplication by y, with the integrand evaluated at x(2) ≈ x(1). Under this approximation, we arrived at

$$P_\mathrm{fail} \approx (n-1)\, E_y\, E\!\left[\frac{f(x_{(1)})}{1 - F(x_{(1)})}\right] \approx (n-1)\, E_y\, E\!\left[f(x_{(1)})\right],$$

where E denotes expectation and Ey is the mean of y. The second step uses the fact that F(x(1)) is small. Therefore, the failure probability also depended on the category of the PDF f(x). Models in category (I) had a failure probability proportional to n, in particular Pfail = n f(0) Ey. Models in category (II) had Pfail = √(π n f′(0)/2) Ey.

Combining the above results, we found that the product of the mean first-de-silencing time T and the failure probability Pfail was a constant independent of n and of the details of the de-silencing process. For example, category (I) led to Pfail T = Ey and category (II) to Pfail T = (π/2) Ey. Denoting Ey, the mean response time of the feedback, as Δt, we arrive at the conclusion of the main model: Pfail = c × Δt/T with c = 1 or π/2.
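As a numerical check on these asymptotic results, the sketch below evaluates T and Pfail by quadrature for n = 2800 with a fixed feedback time y = Δt (so Ey = Δt). It uses two illustrative first-passage distributions: a single unit-rate exponential step (category I, f(0) = 1) and two sequential unit-rate steps (category II, f′(0) = 1). T is computed through the identity E[x(1)] = ∫ (1 − F(x))^n dx. Pfail is computed as the probability that x(2) − x(1) < Δt. The model choices and function names are our assumptions. The product Pfail T/Δt should come out close to 1 and π/2, respectively.

```python
import math


def simpson(fn, a, b, m=20_000):
    """Composite Simpson's rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = fn(a) + fn(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * fn(a + i * h)
    return total * h / 3


def t_and_pfail(f, surv, n, dt, upper):
    """Mean earliest de-silencing time T and P_fail for a fixed feedback time dt.

    f is the PDF and surv = 1 - F the survival function of one allele's
    first-passage time; both integrals are truncated at `upper`.
    """
    # E[x(1)] = integral of P(x(1) > x) = integral of surv(x)**n.
    t = simpson(lambda x: surv(x) ** n, 0.0, upper)
    # P(x(2) - x(1) < dt): earliest allele at x, at least one other in (x, x + dt].
    p = simpson(lambda x: n * f(x) * (surv(x) ** (n - 1) - surv(x + dt) ** (n - 1)),
                0.0, upper)
    return t, p


n = 2800
models = {
    # Category (I): one unit-rate exponential step, f(0) = 1.
    "I": (lambda x: math.exp(-x), lambda x: math.exp(-x), 1 / n),
    # Category (II): two sequential unit-rate steps (gamma, shape 2), f'(0) = 1.
    "II": (lambda x: x * math.exp(-x), lambda x: (1 + x) * math.exp(-x),
           math.sqrt(math.pi / (2 * n))),
}
results = {}
for name, (f, surv, t_theory) in models.items():
    dt = t_theory / 100  # fast feedback: T/dt of about 100
    t, p = t_and_pfail(f, surv, n, dt, upper=40 * t_theory)
    results[name] = (t, t_theory, p * t / dt)
    print(f"({name}) T = {t:.4e} (asymptotic {t_theory:.4e}), P_fail*T/dt = {p * t / dt:.3f}")
```

Small deviations from 1 and π/2 reflect finite n and finite Δt/T, which the asymptotic expressions neglect.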

Modeling OR switching

In the main model of OR switching (Figure 2.3), we assumed that the depletion of Kdm1a was the primary determinant of the feedback response time Δt. Here, the accumulation of either OR or Adcy3 was instead assumed to determine Δt. In this alternative model, OR/Adcy3 was produced at a rate proportional to the number of active OR alleles, and it decayed with a rate α when no alleles were active. The steady-state concentration of OR/Adcy3 was one when one allele was active, and OR/Adcy3 turned off Kdm1a instantly upon reaching a concentration threshold κ. Therefore, the response time was Δt = log(1/(1 − κ))/α. This OR/Adcy3-limited model behaved like the Kdm1a-limited model in Figure 2.3B: OR switching at a physiological probability of ~10% had a negligible effect on “one-neuron-one-receptor” expression. In fact, because OR/Adcy3 production increased when multiple alleles were active, the system was even more likely to end up with multiple ORs.
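The two response times can be checked by integrating simple rate equations. This sketch assumes a specific linear form that is consistent with, but not stated in, the description above. Kdm1a obeys dK/dt = −αK starting from K = 1. OR/Adcy3 obeys dC/dt = α(m − C) starting from C = 0 with m active alleles, giving a steady state of one for m = 1. Forward Euler integration is used for transparency; the helper name is hypothetical.

```python
import math


def first_crossing(deriv, c0, crossed, step=1e-4, t_max=100.0):
    """Integrate dC/dt = deriv(C) by forward Euler until crossed(C) becomes True."""
    t, c = 0.0, c0
    while not crossed(c):
        c += deriv(c) * step
        t += step
        if t > t_max:
            raise RuntimeError("threshold not reached")
    return t


alpha, kappa = 1.0, 0.1
# Kdm1a-limited feedback: Kdm1a decays from 1 at rate alpha until it falls to kappa.
dt_kdm1a = first_crossing(lambda k: -alpha * k, 1.0, lambda k: k <= kappa)
# OR/Adcy3-limited feedback (assumed linear form): with m active alleles,
# C rises from 0 toward m and switches Kdm1a off once it reaches kappa.
dt_or = {m: first_crossing(lambda c, m=m: alpha * (m - c), 0.0, lambda c: c >= kappa)
         for m in (1, 2)}
print(f"Kdm1a-limited: {dt_kdm1a:.3f} vs log(1/kappa)/alpha = {math.log(1 / kappa) / alpha:.3f}")
for m, t in dt_or.items():
    print(f"OR/Adcy3-limited, {m} active: {t:.4f} "
          f"vs log(m/(m - kappa))/alpha = {math.log(m / (m - kappa)) / alpha:.4f}")
```

Under this assumed form, two simultaneously active alleles reach the threshold κ sooner than one.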

Analytical calculation confirmed that OR switching did not significantly facilitate “one-neuron-one-receptor” expression. In most cases, an active OR that was turned off was unlikely to be turned on again before another OR was de-silenced. Under this approximation, the switching probability Pswitch could be calculated as the probability that the lifetime of an On state (denoted ton) was shorter than the time Δt for the feedback to take effect. Therefore,

$$P_\mathrm{switch} = P(t_\mathrm{on} < \Delta t) = \int_0^{\Delta t} k_\mathrm{switch}\, e^{-k_\mathrm{switch} t_\mathrm{on}}\, dt_\mathrm{on} = 1 - e^{-k_\mathrm{switch}\Delta t},$$

which fit well with the red trace in Figure 2.3B.

On the other hand, the failure probability Pfail could be calculated as the probability that two conditions were both met. First, the competing allele was de-silenced within Δt after the earlier allele. Second, it remained On until the feedback took effect. We denote the time difference between the first and the second activations as Δx. When Pfail ≪ 1, the density of Δx is approximately constant on [0, Δt], and we have

$$\begin{aligned}
P_\mathrm{fail} &= P(\Delta t - t_\mathrm{on} < \Delta x < \Delta t)\\
&= P(\Delta x < \Delta t) - P(\Delta x < \Delta t - t_\mathrm{on} \text{ and } t_\mathrm{on} < \Delta t)\\
&= P_\mathrm{fail}(k_\mathrm{switch} = 0)\left[1 - P_\mathrm{switch}\left(1 - \frac{E(t_\mathrm{on} \mid t_\mathrm{on} < \Delta t)}{\Delta t}\right)\right]\\
&= P_\mathrm{fail}(k_\mathrm{switch} = 0)\,\frac{1 - e^{-k_\mathrm{switch}\Delta t}}{k_\mathrm{switch}\Delta t}.
\end{aligned}$$

When switching was infrequent (kswitch Δt ≪ 1), these expressions became Pswitch ≈ kswitch Δt and Pfail ≈ Pfail(kswitch = 0)(1 − kswitch Δt/2). Therefore, Pfail was proportional to (1 − Pswitch/2) under our approximations. This prediction fit well with Figure 2.3B.
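The switching approximation can be checked by direct sampling. This sketch uses the same assumptions as the derivation, with time measured in units of Δt. Conditional on the competing allele being de-silenced within Δt, its delay Δx is taken as uniform on [0, Δt] (constant density near zero). Its On lifetime ton is exponential with rate kswitch. The simulated fraction still On when the feedback acts estimates Pfail/Pfail(kswitch = 0), that is, E[min(ton, Δt)]/Δt. This fraction should approach 1 − kswitch Δt/2 when switching is infrequent. The function name is hypothetical.

```python
import math
import random


def simulate_switching(u, samples=200_000, seed=1):
    """Sample the infrequent-switching approximation, with time in units of dt.

    u = k_switch * dt. Conditional on a possible failure, the competing allele's
    delay dx is uniform on [0, 1]; its On lifetime t_on is exponential with rate u.
    Returns (P_switch, P_fail / P_fail(k_switch = 0)).
    """
    rng = random.Random(seed)
    switched = still_on = 0
    for _ in range(samples):
        dx = rng.random()            # activation delay after the first allele
        t_on = rng.expovariate(u)    # lifetime of the competing On state
        switched += t_on < 1.0       # turned off before the feedback takes effect
        still_on += dx + t_on > 1.0  # still On when the feedback takes effect
    return switched / samples, still_on / samples


for u in (0.1, 0.5):
    p_switch, ratio = simulate_switching(u)
    print(f"u = {u}: P_switch = {p_switch:.3f} (1 - exp(-u) = {1 - math.exp(-u):.3f}), "
          f"P_fail ratio = {ratio:.3f} (1 - u/2 = {1 - u / 2:.3f})")
```

Agreement with the first-order expression is close for kswitch Δt = 0.1. It degrades for larger values, as expected from a first-order expansion.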

There existed an edge case where the above approximation was not satisfied. If kon was sufficiently large, an allele that was turned off could quickly revive into its On state before the second allele was de-silenced. This required both kon ≫ kme2→me3 and 1/kon ≪ T. In this case, most events of OR turning off resulted in the revival of the same allele, whereas “true” switching between different ORs was relatively rare. We found that switching between ORs actually impaired “one-neuron-one-receptor” expression in this case.

References

Alsing, A. K., and K. Sneppen. 2013. "Differentiation of developing olfactory neurons analysed in terms of coupled epigenetic landscapes." Nucleic Acids Res 41 (9):4755-64. doi: 10.1093/nar/gkt181.

Buck, L., and R. Axel. 1991. "A novel multigene family may encode odorant receptors: a molecular basis for odor recognition." Cell 65 (1):175-87.

Cedar, H., and Y. Bergman. 2011. " of haematopoietic cell development." Nat Rev Immunol 11 (7):478-88. doi: 10.1038/nri2991.

Chess, A., I. Simon, H. Cedar, and R. Axel. 1994. "Allelic inactivation regulates olfactory receptor gene expression." Cell 78 (5):823-34.

Clowney, E. J., M. A. LeGros, C. P. Mosley, F. G. Clowney, E. C. Markenskoff-Papadimitriou, M. Myllys, G. Barnea, C. A. Larabell, and S. Lomvardas. 2012. "Nuclear Aggregation of Olfactory Receptor Genes Governs Their Monogenic Expression." Cell 151 (4):724-37. doi: 10.1016/j.cell.2012.09.043.

Dalton, R. P., D. B. Lyons, and S. Lomvardas. 2013. "Co-opting the unfolded protein response to elicit olfactory receptor feedback." Cell 155 (2):321-32. doi: 10.1016/j.cell.2013.09.033.

English, B. P., W. Min, A. M. van Oijen, K. T. Lee, G. Luo, H. Sun, B. J. Cherayil, S. C. Kou, and X. S. Xie. 2006. "Ever-fluctuating single enzyme molecules: Michaelis-Menten equation revisited." Nat Chem Biol 2 (2):87-94. doi: 10.1038/nchembio759.

Esumi, S., N. Kakazu, Y. Taguchi, T. Hirayama, A. Sasaki, T. Hirabayashi, T. Koide, T. Kitsukawa, S. Hamada, and T. Yagi. 2005. "Monoallelic yet combinatorial expression of variable exons of the protocadherin-alpha gene cluster in single neurons." Nat Genet 37 (2):171-6. doi: 10.1038/ng1500.

Farago, M., C. Rosenbluh, M. Tevlin, S. Fraenkel, S. Schlesinger, H. Masika, M. Gouzman, G. Teng, D. Schatz, Y. Rais, J. H. Hanna, A. Mildner, S. Jung, G. Mostoslavsky, H. Cedar, and Y. Bergman. 2012. "Clonal allelic predetermination of immunoglobulin-kappa rearrangement." Nature 490 (7421):561-5. doi: 10.1038/nature11496.

Forneris, F., C. Binda, E. Battaglioli, and A. Mattevi. 2008. "LSD1: oxidative chemistry for multifaceted functions in chromatin regulation." Trends Biochem Sci 33 (4):181-9. doi: 10.1016/j.tibs.2008.01.003.

Fuss, S. H., M. Omura, and P. Mombaerts. 2007. "Local and cis effects of the H element on expression of odorant receptor genes in mouse." Cell 130 (2):373-84. doi: 10.1016/j.cell.2007.06.023.

Godfrey, P. A., B. Malnic, and L. B. Buck. 2004. "The mouse olfactory receptor gene family." Proc Natl Acad Sci U S A 101 (7):2156-61. doi: 10.1073/pnas.0308051100.

Gokoffski, K. K., S. Kawauchi, H. H. Wu, R. Santos, P. L. W. Hollenbeck, A. D. Lander, and A. L. Calof. 2010. "Feedback Regulation of Neurogenesis in the Mammalian Olfactory Epithelium: New Insights from Genetics and Systems Biology." In The Neurobiology of Olfaction, edited by A. Menini. Boca Raton (FL).

Khan, M., E. Vaes, and P. Mombaerts. 2011. "Regulation of the probability of mouse odorant receptor gene choice." Cell 147 (4):907-21. doi: 10.1016/j.cell.2011.09.049.

Klose, R. J., and Y. Zhang. 2007. "Regulation of histone methylation by demethylimination and demethylation." Nat Rev Mol Cell Biol 8 (4):307-18. doi: 10.1038/nrm2143.

Kolterman, B. E., I. Iossifov, and A. A. Koulakov. 2012. "A race model for singular olfactory receptor expression." arXiv e-prints 1201.2933. Accessed January 1, 2012.

Kooistra, S. M., and K. Helin. 2012. "Molecular mechanisms and potential functions of histone demethylases." Nat Rev Mol Cell Biol 13 (5):297-311. doi: 10.1038/nrm3327.

Krangel, M. S. 2009. "Mechanics of T cell receptor gene rearrangement." Curr Opin Immunol 21 (2):133-9. doi: 10.1016/j.coi.2009.03.009.

Krolewski, R. C., A. Packard, and J. E. Schwob. 2013. "Global expression profiling of globose basal cells and neurogenic progression within the olfactory epithelium." J Comp Neurol 521 (4):833-59. doi: 10.1002/cne.23204.

Lewcock, J. W., and R. R. Reed. 2004. "A feedback mechanism regulates monoallelic odorant receptor expression." Proc Natl Acad Sci U S A 101 (4):1069-74. doi: 10.1073/pnas.0307986100.

Lomvardas, S., G. Barnea, D. J. Pisapia, M. Mendelsohn, J. Kirkland, and R. Axel. 2006. "Interchromosomal interactions and olfactory receptor choice." Cell 126 (2):403-13. doi: 10.1016/j.cell.2006.06.035.

Lyons, D. B., W. E. Allen, T. Goh, L. Tsai, G. Barnea, and S. Lomvardas. 2013. "An epigenetic trap stabilizes singular olfactory receptor expression." Cell 154 (2):325-36. doi: 10.1016/j.cell.2013.06.039.

Magklara, A., and S. Lomvardas. 2013. "Stochastic gene expression in mammals: lessons from olfaction." Trends Cell Biol. doi: 10.1016/j.tcb.2013.04.005.

Magklara, A., A. Yen, B. M. Colquitt, E. J. Clowney, W. Allen, E. Markenscoff-Papadimitriou, Z. A. Evans, P. Kheradpour, G. Mountoufaris, C. Carey, G. Barnea, M. Kellis, and S. Lomvardas. 2011. "An Epigenetic Signature for Monoallelic Olfactory Receptor Expression." Cell 145 (4):555-70. doi: 10.1016/j.cell.2011.03.040.

Mombaerts, P. 2004. "Odorant receptor gene choice in olfactory sensory neurons: the one receptor-one neuron hypothesis revisited." Curr Opin Neurobiol 14 (1):31-6. doi: 10.1016/j.conb.2004.01.014.

Monahan, K., N. D. Rudnick, P. D. Kehayova, F. Pauli, K. M. Newberry, R. M. Myers, and T. Maniatis. 2012. "Role of CCCTC binding factor (CTCF) and cohesin in the generation of single-cell diversity of protocadherin-alpha gene expression." Proc Natl Acad Sci U S A 109 (23):9125-30. doi: 10.1073/pnas.1205074109.

Schlimgen, R. J., K. L. Reddy, H. Singh, and M. S. Krangel. 2008. "Initiation of allelic exclusion by stochastic interaction of Tcrb alleles with repressive nuclear compartments." Nat Immunol 9 (7):802-9. doi: 10.1038/ni.1624.

Serizawa, S., K. Miyamichi, H. Nakatani, M. Suzuki, M. Saito, Y. Yoshihara, and H. Sakano. 2003. "Negative feedback regulation ensures the one receptor-one olfactory neuron rule in mouse." Science 302 (5653):2088-94. doi: 10.1126/science.1089122.

Shykind, B. M., S. C. Rohani, S. O'Donnell, A. Nemes, M. Mendelsohn, Y. Sun, R. Axel, and G. Barnea. 2004. "Gene switching and the stability of odorant receptor gene choice." Cell 117 (6):801-15. doi: 10.1016/j.cell.2004.05.015.

Tian, H., and M. Ma. 2008. "Activity plays a role in eliminating olfactory sensory neurons expressing multiple odorant receptors in the mouse septal organ." Mol Cell Neurosci 38 (4):484-8. doi: 10.1016/j.mcn.2008.04.006.

Wissmann, M., N. Yin, J. M. Muller, H. Greschik, B. D. Fodor, T. Jenuwein, C. Vogler, R. Schneider, T. Gunther, R. Buettner, E. Metzger, and R. Schule. 2007. "Cooperative demethylation by JMJD2C and LSD1 promotes androgen receptor-dependent gene expression." Nat Cell Biol 9 (3):347-53. doi: 10.1038/ncb1546.

Yosef, N., and A. Regev. 2011. "Impulse control: temporal dynamics in gene transcription." Cell 144 (6):886-96. doi: 10.1016/j.cell.2011.02.015.

Zhang, X., and S. Firestein. 2002. "The olfactory receptor gene superfamily of the mouse." Nat Neurosci 5 (2):124-33. doi: 10.1038/nn800.

Zibetti, C., A. Adamo, C. Binda, F. Forneris, E. Toffolo, C. Verpelli, E. Ginelli, A. Mattevi, C. Sala, and E. Battaglioli. 2010. "Alternative splicing of the histone demethylase LSD1/KDM1 contributes to the modulation of neurite morphogenesis in the mammalian nervous system." J Neurosci 30 (7):2521-32. doi: 10.1523/JNEUROSCI.5500-09.2010.

Chapter 3

Single-cell transcriptomic sequencing of olfactory sensory neurons

Author Contribution Statement

Qian Li, X. Sunney Xie, and I designed the experiments. Qian Li and I performed the experiments and analyzed the data. Qian Li, X. Sunney Xie, and I wrote the manuscript.

Introduction

In mammalian olfactory systems, the ability to detect and discriminate between a tremendous number of odors relies on the “one-neuron-one-receptor” rule (Mombaerts 2004, Buck and Axel 1991). However, this rule has been demonstrated only by RNA fluorescent in situ hybridization (FISH), genetic labeling, and single-cell RT-PCR (Serizawa et al. 2003, Tian and Ma 2008, Tietjen et al. 2003, Shykind et al. 2004, Malnic et al. 1999). None of these methods could probe all ~1,000 ORs at the same time. In addition, little is known about the dynamics of OR expression during development.

In this Chapter, we present a direct test of the “one-neuron-one-receptor” rule by transcriptomic sequencing of single cells from adult and newborn mice. Our results support a previously proposed, yet not widely accepted, hypothesis: each cell may first transiently express multiple ORs and then eliminate all but one during development (Mombaerts 2004). This is in sharp contrast to the popular view that only one OR is expressed at any given time (Shykind 2005, Dalton, Lyons, and Lomvardas 2013). Such transient co-expression may provide a molecular basis for a recently observed critical period of olfactory axon wiring in newborn mice (Ma et al. 2014, Tsai and Barnea 2014).

Results

From the mouse MOE, we sequenced 178 single cells with an average of 2.82 million reads per cell, either single-end 100-bp or paired-end 50-bp (standard deviation = 0.83 million, min = 1.06 million, max = 4.52 million) (Figure 3.1A). The cells came from either adult mice aged 1 to 3 months (56 cells) or newborn mice at postnatal days 4 to 10 (122 cells) (Table S1). In each cell, an average of 2,826 genes (standard deviation = 921, min = 805, max = 6,399) were detected above a threshold of 1 transcript per million (TPM). The numbers of detected genes were similar between adult and newborn cells (median = 2,862 versus 2,894, P = 0.36, two-sided Wilcoxon rank-sum test) (Figure 3.1B). To ensure sample quality, cells were stained for viability after microfluidic capture and visually inspected to exclude multiple cells. Their cDNA size distributions were also analyzed to exclude cells with RNA degradation.


Figure 3.1. Single-cell transcriptomic sequencing of mouse OSNs established a time axis for neuronal development. (A) We sequenced 178 single cells from the MOE of adult and newborn mice, with stringent quality control to avoid multiple cells, dead cells, or RNA degradation. (B) Distribution of the number of detected genes among single cells. Similar numbers of genes were detected above a threshold of 1 transcript per million (TPM) in cells from adults and cells from newborns. Each gray dot denotes one single cell. The horizontal line denotes the median, and the box denotes the lower and upper quartiles. (C) A total of 44 known marker genes were used in principal component analysis (PCA) to infer the developmental stage of each single cell. These genes included most of the known markers for immature OSNs. Two black boxes indicate published expression patterns. (D) Single cells could be roughly divided into 3 sub-populations along principal component 1. Genes were sorted first by published expression patterns (from top to bottom: immature only, both immature and mature, mature only, and precursor only), and then by average expression across all cells. The color legend is above the panel. (E) Expression profiles of 3 example mature markers (Omp, Stoml3, and Gng13) and 3 example immature markers (Gap43, Gng8, and Stmn1) on PCA plots. (F) Each single cell can be classified according to principal components 1 and 2.

We first established a time axis for neuronal development among the single cells. Because of continuous neurogenesis, the MOE contains a mixture of neurons at different developmental stages, from basally located immature OSNs to apically located mature OSNs (Verhaagen et al. 1989). Single-cell data are affected by the intrinsic stochasticity of gene expression and by technical noise and biases of single-cell RNA amplification (Wu et al. 2014). To overcome these, we combined 44 marker genes in principal component analysis (PCA) to infer the developmental stages of single cells (Figure 3.1C). These genes were chosen to cover three groups. The first was most of the known markers for immature OSNs (Marcucci, Zou, and Firestein 2009, McIntyre, Titlow, and McClintock 2010, Sathyanesan et al. 2013, MacDonald, Gin, and Roskams 2005, Belluscio et al. 1998, Dalton, Lyons, and Lomvardas 2013, Hirota and Mombaerts 2004, Nedelec et al. 2004, Calof and Chikaraishi 1989, Roskams, Cai, and Ronnett 1998). The second was markers for mature OSNs (Kobayakawa et al. 2002, Bonigk et al. 1999, Sathyanesan et al. 2013, Belluscio et al. 1998, Monti-Graziadei et al. 1977, Zou et al. 2007). The third was markers for neuronal precursors (Cau et al. 1997). As expected, these genes roughly divided the single cells along principal component 1 into 3 sub-populations, from right to left: fully mature OSNs, immature OSNs, and other cells (including neuronal precursors, stem cells, and various types of supporting cells) (Figure 3.1D).
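The idea of ordering cells along principal component 1 of a marker panel can be sketched without external libraries. This is not the pipeline used for Figure 3.1. The log2(TPM + 1) transform, per-gene centering, power iteration on the gene covariance matrix, and the toy expression values are all assumptions for illustration. The sign of a principal component is arbitrary, so only the separation of the two groups is meaningful.

```python
import math
import random


def pc1_scores(expr, iters=200, seed=0):
    """Project cells onto the first principal component of marker expression.

    expr: list of cells, each a list of TPM values for the same marker genes.
    Assumed preprocessing: log2(TPM + 1) and per-gene centering; PC1 is found
    by power iteration on the gene-by-gene covariance matrix.
    """
    x = [[math.log2(v + 1) for v in cell] for cell in expr]
    n_cells, n_genes = len(x), len(x[0])
    means = [sum(cell[g] for cell in x) / n_cells for g in range(n_genes)]
    centered = [[cell[g] - means[g] for g in range(n_genes)] for cell in x]
    cov = [[sum(c[i] * c[j] for c in centered) / (n_cells - 1)
            for j in range(n_genes)] for i in range(n_genes)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n_genes)]
    for _ in range(iters):  # repeated multiplication converges to the top eigenvector
        w = [sum(cov[i][j] * v[j] for j in range(n_genes)) for i in range(n_genes)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return [sum(a * b for a, b in zip(c, v)) for c in centered]


# Hypothetical toy TPM values: columns = [immature marker, mature marker, housekeeping].
immature = [[800, 5, 100], [1200, 20, 90], [900, 10, 110]]
mature = [[10, 900, 95], [5, 1500, 105], [20, 1100, 100]]
scores = pc1_scores(immature + mature)
print([round(s, 2) for s in scores])
```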

Principal components 1 and 2 together visualized the 3 sub-populations, with more mature OSNs at the bottom right and more immature OSNs at the top (Figure 3.1E). Note that the transition between immature and mature neurons is continuous, as shown by a considerable overlap in the expression of the classic immature markers Gap43, Gng8, and Stmn1 and the classic mature markers Omp, Gng13, Cnga2, Stoml3, and Gnal (Figures 3.1E and S1). Changing the exact division line between immature and mature neurons did not affect our conclusions. Under our classification scheme (Figure 3.1F), 54 immature OSNs (6 from adults, 48 from newborns) were characterized by frequent expression of Gap43, Gng8, Gnas, Dpysl3, Dpysl5, and Hdac2; relatively high expression of Dpysl2, Stmn1, Stmn2, Emx2, Lhx2, and Tubb3; and frequent absence of Stmn4, Cnga4, Cngb1, and Adcy3. In contrast, 79 fully mature OSNs (46 from adults, 33 from newborns) showed the opposite characteristics and were mostly positive for mature markers. The ratio of immature to fully mature neurons was much higher in newborns (1.45 : 1, compared to 0.13 : 1 in adults; P = 2.5 × 10⁻⁸, two-sided Fisher’s exact test), consistent with published results (Verhaagen et al. 1989) and our RNA FISH of Omp and Gap43.
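The two-sided Fisher’s exact test used here (and again for the multi-receptor comparisons below) can be illustrated in a few lines. The following is a minimal pure-Python sketch, not the software used for the analysis; it follows the common convention of summing the hypergeometric probabilities of all 2 × 2 tables with the observed row and column totals that are no more likely than the observed table. The function name `fisher_exact_two_sided` is our own.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row and
    column totals whose probability is no larger than that of the observed table.
    """
    row1 = a + b          # total of the first row
    col1 = a + c          # total of the first column
    n = a + b + c + d     # grand total
    denom = comb(n, col1)

    def prob(x):
        # P(top-left cell == x) when all margins are fixed (hypergeometric).
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    x_min, x_max = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(x_min, x_max + 1)]
    # A small relative tolerance treats floating-point near-ties as ties.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


# Counts from the text. Rows: adults, newborns; columns: immature, fully mature OSNs.
adults, newborns = (6, 46), (48, 33)
print(f"immature : mature = {newborns[0] / newborns[1]:.2f} : 1 (newborns), "
      f"{adults[0] / adults[1]:.2f} : 1 (adults)")
print("P =", fisher_exact_two_sided(*adults, *newborns))  # compare with the reported 2.5e-8
```

The ratios print as 1.45 : 1 and 0.13 : 1, matching the text; exact enumeration is cheap at these sample sizes, so no approximation is needed.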

Our classification of immature and mature OSNs was robust against the choice of marker genes. Instead of the 44 known marker genes from the literature, we picked another set of genes in a less supervised manner. A recent study conducted RNA sequencing on two flow-sorted samples: Neurog1+ neuronal precursors (a stage earlier than Gap43+ immature OSNs) and Omp+ mature OSNs (Magklara et al. 2011). The dataset contained 27,389 genes. Among the 496 genes that were highly expressed (FPKM > 100) in at least one sample, we picked the top 100 genes enriched in neuronal precursors and the top 100 enriched in mature OSNs, based on fold changes. This set of 200 genes reproduced our main conclusions.
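A minimal sketch of this less supervised selection is shown below, assuming FPKM values for the two sorted samples are available as dictionaries keyed by gene name. Ranking by log2 fold change with a pseudocount of 1 is our assumption (the text specifies only "fold changes"), and the function name and toy values are hypothetical.

```python
import math


def pick_enriched_markers(fpkm_a, fpkm_b, min_fpkm=100.0, top_n=100, pseudocount=1.0):
    """Return (top genes enriched in sample A, top genes enriched in sample B).

    Genes are kept only if FPKM > min_fpkm in at least one sample, then ranked
    by log2 fold change; the pseudocount (assumed) avoids division by zero.
    """
    genes = sorted(set(fpkm_a) | set(fpkm_b))
    expressed = [g for g in genes
                 if max(fpkm_a.get(g, 0.0), fpkm_b.get(g, 0.0)) > min_fpkm]
    lfc = {g: math.log2((fpkm_a.get(g, 0.0) + pseudocount) /
                        (fpkm_b.get(g, 0.0) + pseudocount))
           for g in expressed}
    enriched_a = sorted((g for g in expressed if lfc[g] > 0), key=lambda g: -lfc[g])[:top_n]
    enriched_b = sorted((g for g in expressed if lfc[g] < 0), key=lambda g: lfc[g])[:top_n]
    return enriched_a, enriched_b


# Hypothetical toy values, not data from Magklara et al. (2011).
precursors = {"gene_a": 300, "gene_b": 150, "gene_c": 5, "gene_d": 500, "gene_e": 20}
mature = {"gene_a": 2, "gene_b": 60, "gene_c": 2000, "gene_d": 500, "gene_e": 1}
print(pick_enriched_markers(precursors, mature, top_n=2))
# (['gene_a', 'gene_b'], ['gene_c'])
```

Here gene_e is dropped by the expression filter and gene_d, with no fold change, is enriched in neither sample.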

In each cell, we evaluated the expression of each OR with stringent criteria (Materials and Methods). In total, we made 153 confident calls of receptor expression, including 2 trace amine-associated receptors (TAARs) (Liberles and Buck 2006), namely Taar4 in Cell 76 and Taar7e in Cell 74, and 1 vomeronasal receptor (VR) (Dulac and Axel 1995), namely Vmn1r37 in Cell 101 (Figure 3.2A). The splicing isoforms that we observed were highly consistent with a recently published assembly of OR and VR transcripts based on RNA sequencing of whole tissues (Ibarra-Soria et al. 2014). Of the 151 confident calls of OR and VR expression, 120 (79%) had all splicing isoforms agreeing with that assembly (Ibarra-Soria et al. 2014), and 13 (9%) contained a mixture of novel and published isoforms (Figure 3.2A). In addition, our single-cell results allowed us to show for the first time that multiple isoforms of the same receptor could coexist in a single cell (Figure 3.2A). Figure 3.2C shows the coverage and splicing profiles of two such examples, Olfr1507 and Olfr536. We also found that the level of receptor expression could differ by more than 3 orders of magnitude between single cells, with a median of TPM = 9.15 × 10³, corresponding to ~1% of the transcriptome, and a range from 42.1 to 1.46 × 10⁵, corresponding to 0.0042% to 15% of the transcriptome (Figure 3.2B). The very high coverage of some ORs allowed us to identify novel exons or novel combinations of known exons that had not been detected in whole tissues (Figure 3.2D).
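Because the TPM values within a cell sum to 10⁶ by construction, a TPM value converts directly into a fraction of the transcriptome. The short helper below (our own illustration) reproduces the conversions quoted above and the dynamic range between the lowest and highest receptor-expressing cells.

```python
import math


def tpm_to_percent(tpm):
    """Percentage of the transcriptome represented by a TPM value (10**6 TPM = 100%)."""
    return tpm / 1e6 * 100


for label, tpm in [("median", 9.15e3), ("minimum", 42.1), ("maximum", 1.46e5)]:
    print(f"{label}: TPM {tpm:g} = {tpm_to_percent(tpm):.4g}% of the transcriptome")

# Dynamic range between the highest and lowest receptor-expressing cells.
fold = 1.46e5 / 42.1
print(f"range: {fold:.0f}-fold, i.e. {math.log10(fold):.2f} orders of magnitude")
```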


Figure 3.2. Expression of ORs was confidently detected in single cells. (A) Composition of all confident calls of receptor expression and comparison with a recently published assembly (Ibarra-Soria et al. 2014). (B) Distribution of the level of receptor expression among all confident calls. The level of receptor expression differed by more than 3 orders of magnitude between single cells. Each gray dot denotes a confident call of receptor expression. (C) Two example ORs, Olfr1507 and Olfr536, were expressed in multiple, coexisting isoforms. Note that some of the Olfr1507 transcripts did not contain the beginning of its coding sequence and were thus presumably non-coding. Green pileups denote coverage by sequencing reads (arbitrary units, binned every 20 bp), and red lines denote spliced reads. (D) In 3 example ORs, Olfr921, Olfr114, and Olfr1255, we discovered novel exons (black asterisks) or exon combinations (red asterisks). Note that Olfr1255’s rightmost novel exon is likely part of a non-coding isoform.


To our surprise, we observed 20 cells (5 from adults, 15 from newborns) that expressed multiple ORs, seemingly violating the “one-neuron-one-receptor” rule. In total, we determined the status of OR expression for 155 of 178 (87%) single cells. In addition to the 20 multi-receptor neurons, we found 57 cells that expressed no receptors (6 from adults, 51 from newborns) and 78 cells that expressed a single receptor (38 from adults, 40 from newborns, including 1 VR cell and 2 TAAR cells) (Figure 3.3A). Among cells with a single or multiple receptors, we observed a tendency toward more multi-receptor neurons in newborns than in adults ((27 ± 6)% versus (12 ± 5)%, with standard errors), consistent with an RNA FISH study in the septal organ (Tian and Ma 2008); however, the difference was not significant (P = 0.077, two-sided Fisher’s exact test) (Figure 3.3A). Despite their different numbers of detected ORs, single- and multi-receptor neurons had similar numbers of detected genes (median = 3,075 versus 3,077, P = 0.86, two-sided Wilcoxon rank-sum test) (Figure 3.3B).
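The percentages above agree with binomial (Wald) standard errors, sqrt(p(1 − p)/n), computed from the counts in the text; we assume this is how they were derived. A minimal check, with a helper function of our own naming:

```python
import math


def proportion_with_se(k, n):
    """Proportion k/n and its binomial (Wald) standard error, sqrt(p(1-p)/n)."""
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)


# Multi-receptor vs. single-receptor neurons, counts taken from the text.
for label, multi, single in [("newborns", 15, 40), ("adults", 5, 38)]:
    p, se = proportion_with_se(multi, multi + single)
    print(f"{label}: ({100 * p:.0f} ± {100 * se:.0f})%")
# newborns: (27 ± 6)%
# adults: (12 ± 5)%
```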


Figure 3.3. A subset of 20 single cells expressed multiple ORs. Most of these cells were immature OSNs. (A) We determined the status of OR expression for 155 of 178 (87%) single cells. Among cells with a single or multiple receptors, newborn mice tended to have more multi-receptor neurons, but the difference was not significant. (B) Distribution of the number of detected genes among single cells, categorized by their status of OR expression. Similar numbers of genes were detected above a threshold of 1 TPM in single- and multi-receptor neurons (P = 0.86, two-sided Wilcoxon rank-sum test). Symbols have the same meanings as in Figure 3.1B. (C) Distribution of the total level of receptor expression among single cells. Multi-receptor neurons tend to have a slightly lower level of total OR expression than their single-receptor counterparts. Each gray dot denotes the summed expression level of all ORs in a single cell. (D) The level of total OR expression, denoted by the area of each circle, and the contribution of each OR, denoted by the size of each slice, in each of the 20 multi-receptor neurons. (E) Coverage profiles of 3 example multi-receptor neurons, expressing 2, 3, and 2 ORs, respectively. Note that Cell 5 (bottom) expressed two adjacent ORs, Olfr1030 and Olfr1031, and some isoforms shared the same upstream exon. Symbols have the same meanings as in Figure 3.2C. (F) Same as Figure 3.1F, but with multi-receptor neurons labeled by red boxes. (G) A large fraction of immature OSNs expressed multiple receptors, seemingly violating the “one-neuron-one-receptor” rule, while the rule was restored in fully mature OSNs (two-sided Fisher’s exact test).

Interestingly, multi-receptor neurons showed a lower level of total OR expression than their single-receptor counterparts (median TPM = 1.18 × 10⁴ versus 1.75 × 10⁴, P = 0.034, two-sided Wilcoxon rank-sum test) (Figure 3.3C), suggesting a possibly earlier stage of OR expression. Each multi-receptor neuron expressed an average of 2.9 ORs, with a median of 2 and a range of 2 to 9. Figure 3.3D shows the level of total OR expression and the contribution of each OR in each of these cells, and Figure 3.3E shows 3 examples of the coverage profiles in multi-receptor neurons. Curiously, in Cell 5 we observed the co-expression of two adjacent ORs, Olfr1030 and Olfr1031, on the same strand of chromosome 2. The two ORs were not highly homologous to each other (their previous names, MOR196-2 and MOR200-1, indicate different OR subfamilies), yet in some isoforms they shared the same upstream exon. It was unclear whether such local co-expression could occur for other ORs.

Based on the observation that multi-receptor neurons expressed ORs at a lower level (Figure 3.3C), we hypothesized that OSNs expressed multiple ORs at an early developmental stage. Indeed, we detected the expression of Gap43, a gene critical for axon pathfinding (Strittmatter et al. 1995, Maier et al. 1999) and a marker for immature OSNs (Verhaagen et al. 1989), in 18 of 20 multi-receptor neurons. Under our classification of immature and mature OSNs (Figure 3.1F), 17 of 20 (85%) multi-receptor neurons were immature (Figure 3.3F). This suggested that a substantial fraction of immature OSNs, (57 ± 9)% among cells with a single or multiple receptors (with standard error), expressed more than one OR; this percentage dropped dramatically to (4 ± 2)% when cells developed into fully mature OSNs (P = 1.8 × 10⁻⁸, two-sided Fisher’s exact test), restoring the “one-neuron-one-receptor” rule (Figure 3.3G).

An alternative explanation for our observations was that co-expressed ORs might arise from contamination by nearby cells or ambient RNA, and that their enrichment in immature neurons might be an artifact: in fully mature OSNs, the high level of existing OR expression might “mask” contamination and cause its apparent absence. Such “masking” might arise from competition during reverse transcription and/or PCR amplification. To rule out this possibility, we conducted a control experiment in which a “target” cell, expressing a single OR, Olfr1537, at TPM = 4.16 × 10⁴, was either reverse transcribed, amplified, and sequenced alone, or processed as a 1:10 or 1:100 mixture with a “background” cell. Because the microfluidic device used for the main results did not support such operations, we conducted the control experiment with mouth pipetting (Li et al. 2013) and a similar chemistry (Picelli et al. 2014). In all 3 mixtures, we detected the “target” OR Olfr1537 against the “background” of a cell that lowly expressed Olfr728, a cell that highly expressed Olfr1348, or a cell that expressed no receptors. Therefore, a highly expressed OR did not seem to “mask” a co-expressed OR.

Single-cell transcriptomic sequencing is known to exhibit large measurement noise, especially for lowly expressed genes (Wu et al. 2014, Chapman et al. 2015), which might give rise to artifacts. To assess the extent of technical variation, we conducted an additional control experiment in which a single cell was split into two halves that were processed separately. The two halves were highly consistent in quantifying highly expressed genes, such as the OR Olfr107 and the markers Omp, S100a5, Gng13, and Gnal. They also agreed on the absence of genes such as Gap43 and Gnas. However, for lowly or intermediately expressed genes, such as Cnga2 in this cell, detection and/or quantification was sometimes noisy. In particular, consistent detection between the two halves was frequent only for genes with TPM > 10³, while “drop-outs” (detection in only one half) dominated for genes with TPM < 10². This suggests that in OSNs, an expression level of TPM = 10² to 10³, corresponding to 0.01% to 0.1% of the transcriptome, roughly constitutes a minimal “unit” of reliable detection. In comparison, the expression levels of ORs in multi-receptor neurons were around or above this “unit.” Therefore, our detection of multiple ORs in these cells was reliable.
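The split-cell comparison can be summarized by binning genes by expression level and asking, in each bin, what fraction of detected genes were seen in both halves. The sketch below is a hypothetical illustration: the detection threshold (TPM ≥ 1, mirroring the threshold used for counting detected genes in Figure 3.3B), binning by the larger of the two TPM values on a log10 scale, and the toy gene values are all our assumptions.

```python
import math
from collections import defaultdict


def detection_concordance(half1, half2, detect_tpm=1.0):
    """Fraction of genes detected in both halves, per log10(TPM) bin.

    half1, half2: dicts mapping gene -> TPM for the two halves of one cell.
    A gene is "detected" in a half if its TPM >= detect_tpm (assumed threshold,
    must be positive). Genes detected in only one half are drop-outs. Each gene
    is binned by floor(log10(max TPM of the two halves)).
    """
    both = defaultdict(int)
    detected = defaultdict(int)
    for gene in set(half1) | set(half2):
        a, b = half1.get(gene, 0.0), half2.get(gene, 0.0)
        if max(a, b) < detect_tpm:
            continue  # absent from both halves: the halves agree on absence
        level = math.floor(math.log10(max(a, b)))
        detected[level] += 1
        if a >= detect_tpm and b >= detect_tpm:
            both[level] += 1
    return {level: both[level] / detected[level] for level in sorted(detected)}


# Hypothetical toy cell, not measured values.
half1 = {"gene_a": 40000, "gene_b": 8000, "gene_c": 50, "gene_d": 300, "gene_e": 0}
half2 = {"gene_a": 35000, "gene_b": 9000, "gene_c": 0, "gene_d": 5, "gene_e": 0}
print(detection_concordance(half1, half2))
```

With real data, many genes per bin would be needed before the fraction becomes a meaningful estimate of drop-out rates.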

Discussion

For more than a decade, ORs have been thought to be expressed one at a time during the establishment of the “one-neuron-one-receptor” rule. We found that this might not be true. Our findings suggest that the epigenetic regulation behind the “one-neuron-one-receptor” rule is more complicated than previously thought, because current models cannot explain the elimination of all but one OR from multi-receptor neurons. The transient nature of multi-receptor expression suggests a dramatic change in chromatin conformation during development, from a more permissive environment in immature OSNs to a highly compacted one in fully mature OSNs. This is consistent with recent genetic manipulations of several OR genes (Fleischmann et al. 2013). Although the same repressive histone mark, H3K9me3 (Magklara et al. 2011), is present on OR genes throughout neuronal differentiation, our results suggest that additional epigenetic factors must be involved during the transition between immature and mature OSNs. Candidates include the gradual nuclear aggregation mediated by the nuclear lamina (Clowney et al. 2012) and the developmentally regulated subunits of polycomb repressive complexes (PRCs) (Tietjen et al. 2003), which are known to compact chromatin with H3K27me3 (Armelin-Correa et al. 2014).

Our observed concurrence of multi-receptor expression, axonal growth, and synapse formation leads us to speculate that OR regulation may be non-cell-autonomous. If so, the transient expression of multiple receptors may provide a molecular basis for a recently discovered critical period of olfactory axon wiring (Ma et al. 2014, Tsai and Barnea 2014). Note that whatever the new mechanism is, the current kinetic model of epigenetic gene activation (Lyons et al. 2013, Tan, Zong, and Xie 2013) is still the biggest contributor to the “one-neuron-one-receptor” rule, narrowing the possible choices from more than 1,000 genes to fewer than 10 in each single cell. Under this parameter regime, both OR de-silencing and feedback can happen on a timescale of days, which is physiologically more feasible. Starting from there, receptor elimination may be the key to bringing the cell to the remarkable precision of “one-neuron-one-receptor.”

Materials and Methods

Single-cell sequencing of transcriptomes

All mouse experiments were performed in accordance with relevant guidelines and regulations.

Animal protocols were approved by Harvard IACUC.

The MOEs of C57BL/6J or C57BL/6NTac mice were dissected and dissociated with the Papain Dissociation System (Worthington) at 37°C for 15 minutes, followed by trituration 5-15 times with a cut P1000 pipette tip, without papain inactivation or density centrifugation. Cells were filtered through a 40 µm strainer (Falcon) and a 10 µm strainer (pluriSelect). After spinning at 400 × g for 2 minutes, cells were resuspended in DMEM (Gibco).

For the main experiments, cells were loaded at a concentration of ~750K/mL onto a 5-10 µm mRNA-Seq C1 chip (Fluidigm). Cells were washed and stained with the LIVE/DEAD Viability/Cytotoxicity Kit (Life Technologies); a chamber was discarded if its cell stained red or if it contained multiple cells. Amplified cDNA was harvested into 3 µL of DNA dilution buffer (Fluidigm) per cell.

For the control experiments, cells were plated onto a cover glass coated with 10 ng/µL poly-D-lysine (Sigma). Cells were washed with HBSS (Gibco), picked by mouth pipetting, and amplified by Smart-Seq2 (Picelli et al. 2014) with minor modifications: 22 cycles of PCR; SuperScript II and its buffer replaced by ProtoScript II (NEB) to avoid bacterial contamination in recent lots; and two rounds of bead purification to minimize primer dimers.

Amplified cDNA was analyzed on High Sensitivity DNA chips (Agilent). Cells with a large amount of short cDNA (< 1 kb) were discarded. Reads were aligned to the GRCm38/mm10 assembly of the mouse genome by TopHat 2.0.11 with default parameters. Transcript abundances were estimated by Cufflinks 2.2.1 with the UCSC Genes annotation and default parameters. Alignments were inspected in IGV (Broad Institute). TPM values were calculated after the removal of microRNAs, small nucleolar RNAs, and rRNAs from the Cufflinks output. PCA was performed on the ranks of TPM values among all single cells.
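As a hedged illustration of rank-based PCA, the sketch below replaces each gene’s TPM values by their ranks across cells (tied values receive their average rank), centers the ranks, and obtains principal components by singular value decomposition. Ranking per gene across cells, the tie handling, and the toy matrix are our assumptions; the function names are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """Rank each column of x across rows (0-based); tied values share their mean rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty_like(x)
    for j in range(x.shape[1]):
        order = np.argsort(x[:, j], kind="stable")
        sorted_vals = x[order, j]
        r = np.arange(len(order), dtype=float)
        # Runs of equal values are contiguous after sorting; give each run its mean rank.
        _, starts, counts = np.unique(sorted_vals, return_index=True, return_counts=True)
        for s, c in zip(starts, counts):
            r[s:s + c] = s + (c - 1) / 2
        ranks[order, j] = r
    return ranks


def rank_pca(tpm, n_components=2):
    """PCA of rank-transformed expression.

    tpm: array of shape (n_cells, n_genes), e.g. restricted to marker genes.
    Returns (scores, loadings): cell coordinates on the leading principal
    components and the corresponding gene loadings.
    """
    ranks = average_ranks(tpm)
    centered = ranks - ranks.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components], vt[:n_components]


# Hypothetical toy matrix: 3 "immature-like" and 3 "mature-like" cells,
# columns = [immature marker, mature marker, uninformative gene].
tpm = np.array([[100, 1, 5], [80, 2, 5], [90, 0, 5],
                [1, 120, 5], [2, 90, 5], [0, 100, 5]], dtype=float)
scores, loadings = rank_pca(tpm)
print(np.round(scores[:, 0], 2))  # PC1 separates the two groups (sign is arbitrary)
```

Ranking makes the analysis insensitive to the large dynamic range and amplification biases of single-cell TPM values, at the cost of discarding magnitude information.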

Evaluation of OR expression

In each cell, we carefully assessed the expression of each OR against 3 criteria: (a) its coding sequence must be completely covered, otherwise we may have detected a truncated, non-coding transcript; (b) a large fraction of reads must have high mapping quality, otherwise we may have detected mismapping from a homologous OR; (c) some reads must span introns, otherwise we may have detected contamination from genomic DNA. To minimize false negatives, the expression status of an OR is called “uncertain” when only 1 or 2 criteria are met.
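The three-criteria decision rule can be written as a small function. In this sketch, the three criteria are assumed to have been evaluated upstream (for example, by inspecting read pileups); the class and function names are hypothetical, and the "not detected" label for an OR meeting none of the criteria is our assumption, since the text names only the confident and “uncertain” outcomes.

```python
from dataclasses import dataclass


@dataclass
class OREvidence:
    """Outcome of the three checks for one OR in one cell (evaluated upstream)."""
    cds_fully_covered: bool   # (a) coding sequence completely covered by reads
    mostly_high_mapq: bool    # (b) a large fraction of reads have high mapping quality
    has_spliced_reads: bool   # (c) some reads span introns


def call_or_expression(ev: OREvidence) -> str:
    """Return 'confident', 'uncertain', or 'not detected' for one OR."""
    n_met = sum([ev.cds_fully_covered, ev.mostly_high_mapq, ev.has_spliced_reads])
    if n_met == 3:
        return "confident"      # all criteria met: counted as a confident call
    if n_met >= 1:
        return "uncertain"      # 1 or 2 criteria met: not treated as absent
    return "not detected"       # assumed label when no criterion is met


# Full coverage and spliced reads, but many poorly mapped reads
# (possible mismapping from a homologous OR):
print(call_or_expression(OREvidence(True, False, True)))  # uncertain
```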

RNA in situ hybridization

In situ hybridization analysis of the mouse main olfactory epithelium was performed as described previously (Liberles and Buck 2006). cRNA riboprobes were used for Omp (a 992 bp sequence amplified by primers CAAACGGCCAGCACTGATTC and ACCGGTACCACAGCCTATCT), labeled with fluorescein, and for Gap43 (a 907 bp sequence amplified by primers AGATGGTGTCAAGCCGGAAG and CCGGGGTACAGTGCAAGAAT), labeled with digoxigenin. Fluorescent images were taken on a Leica TCS SP5 II confocal microscope.

References

Armelin-Correa, L. M., L. M. Gutiyama, D. Y. Brandt, and B. Malnic. 2014. "Nuclear compartmentalization of odorant receptor genes." Proc Natl Acad Sci U S A 111 (7):2782-7. doi: 10.1073/pnas.1317036111.

Belluscio, L., G. H. Gold, A. Nemes, and R. Axel. 1998. "Mice deficient in G(olf) are anosmic." Neuron 20 (1):69-81.

Bonigk, W., J. Bradley, F. Muller, F. Sesti, I. Boekhoff, G. V. Ronnett, U. B. Kaupp, and S. Frings. 1999. "The native rat olfactory cyclic nucleotide-gated channel is composed of three distinct subunits." J Neurosci 19 (13):5332-47.

Buck, L., and R. Axel. 1991. "A novel multigene family may encode odorant receptors: a molecular basis for odor recognition." Cell 65 (1):175-87.

Calof, A. L., and D. M. Chikaraishi. 1989. "Analysis of neurogenesis in a mammalian neuroepithelium: proliferation and differentiation of an olfactory neuron precursor in vitro." Neuron 3 (1):115-27.

Cau, E., G. Gradwohl, C. Fode, and F. Guillemot. 1997. "Mash1 activates a cascade of bHLH regulators in olfactory neuron progenitors." Development 124 (8):1611-21.

Chapman, A. R., Z. He, S. Lu, J. Yong, L. Tan, F. Tang, and X. S. Xie. 2015. "Single cell transcriptome amplification with MALBAC." PLoS One 10 (3):e0120889. doi: 10.1371/journal.pone.0120889.

Clowney, E. J., M. A. LeGros, C. P. Mosley, F. G. Clowney, E. C. Markenskoff-Papadimitriou, M. Myllys, G. Barnea, C. A. Larabell, and S. Lomvardas. 2012. "Nuclear aggregation of olfactory receptor genes governs their monogenic expression." Cell 151 (4):724-37. doi: 10.1016/j.cell.2012.09.043.

Dalton, R. P., D. B. Lyons, and S. Lomvardas. 2013. "Co-opting the unfolded protein response to elicit olfactory receptor feedback." Cell 155 (2):321-32. doi: 10.1016/j.cell.2013.09.033.

Dulac, C., and R. Axel. 1995. "A novel family of genes encoding putative pheromone receptors in mammals." Cell 83 (2):195-206.

Fleischmann, A., I. Abdus-Saboor, A. Sayed, and B. Shykind. 2013. "Functional interrogation of an odorant receptor locus reveals multiple axes of transcriptional regulation." PLoS Biol 11 (5):e1001568. doi: 10.1371/journal.pbio.1001568.

Hirota, J., and P. Mombaerts. 2004. "The LIM-homeodomain protein Lhx2 is required for complete development of mouse olfactory sensory neurons." Proc Natl Acad Sci U S A 101 (23):8751-5. doi: 10.1073/pnas.0400940101.

Ibarra-Soria, X., M. O. Levitin, L. R. Saraiva, and D. W. Logan. 2014. "The olfactory transcriptomes of mice." PLoS Genet 10 (9):e1004593. doi: 10.1371/journal.pgen.1004593.

Kobayakawa, K., R. Hayashi, K. Morita, K. Miyamichi, Y. Oka, A. Tsuboi, and H. Sakano. 2002. "Stomatin-related olfactory protein, SRO, specifically expressed in the murine olfactory sensory neurons." J Neurosci 22 (14):5931-7.

Li, Q., W. J. Korzan, D. M. Ferrero, R. B. Chang, D. S. Roy, M. Buchi, J. K. Lemon, A. W. Kaur, L. Stowers, M. Fendt, and S. D. Liberles. 2013. "Synchronous evolution of an odor biosynthesis pathway and behavioral response." Curr Biol 23 (1):11-20. doi: 10.1016/j.cub.2012.10.047.

Liberles, S. D., and L. B. Buck. 2006. "A second class of chemosensory receptors in the olfactory epithelium." Nature 442 (7103):645-50. doi: 10.1038/nature05066.

Lyons, D. B., W. E. Allen, T. Goh, L. Tsai, G. Barnea, and S. Lomvardas. 2013. "An epigenetic trap stabilizes singular olfactory receptor expression." Cell 154 (2):325-36. doi: 10.1016/j.cell.2013.06.039.

Ma, L., Y. Wu, Q. Qiu, H. Scheerer, A. Moran, and C. R. Yu. 2014. "A developmental switch of axon targeting in the continuously regenerating mouse olfactory system." Science 344 (6180):194-7. doi: 10.1126/science.1248805.

MacDonald, J. L., C. S. Gin, and A. J. Roskams. 2005. "Stage-specific induction of DNA methyltransferases in olfactory receptor neuron development." Dev Biol 288 (2):461-73. doi: 10.1016/j.ydbio.2005.09.048.

Magklara, A., A. Yen, B. M. Colquitt, E. J. Clowney, W. Allen, E. Markenscoff-Papadimitriou, Z. A. Evans, P. Kheradpour, G. Mountoufaris, C. Carey, G. Barnea, M. Kellis, and S. Lomvardas. 2011. "An epigenetic signature for monoallelic olfactory receptor expression." Cell 145 (4):555-70. doi: 10.1016/j.cell.2011.03.040.

Maier, D. L., S. Mani, S. L. Donovan, D. Soppet, L. Tessarollo, J. S. McCasland, and K. F. Meiri. 1999. "Disrupted cortical map and absence of cortical barrels in growth-associated protein (GAP)-43 knockout mice." Proc Natl Acad Sci U S A 96 (16):9397-402.

Malnic, B., J. Hirono, T. Sato, and L. B. Buck. 1999. "Combinatorial receptor codes for odors." Cell 96 (5):713-23.

Marcucci, F., D. J. Zou, and S. Firestein. 2009. "Sequential onset of presynaptic molecules during olfactory sensory neuron maturation." J Comp Neurol 516 (3):187-98. doi: 10.1002/cne.22094.

McIntyre, J. C., W. B. Titlow, and T. S. McClintock. 2010. "Axon growth and guidance genes identify nascent, immature, and mature olfactory sensory neurons." J Neurosci Res 88 (15):3243-56. doi: 10.1002/jnr.22497.

Mombaerts, P. 2004. "Odorant receptor gene choice in olfactory sensory neurons: the one receptor-one neuron hypothesis revisited." Curr Opin Neurobiol 14 (1):31-6. doi: 10.1016/j.conb.2004.01.014.

Monti-Graziadei, G. A., F. L. Margolis, J. W. Harding, and P. P. Graziadei. 1977. "Immunocytochemistry of the olfactory marker protein." J Histochem Cytochem 25 (12):1311-6.

Nedelec, S., I. Foucher, I. Brunet, C. Bouillot, A. Prochiantz, and A. Trembleau. 2004. "Emx2 homeodomain transcription factor interacts with eukaryotic translation initiation factor 4E (eIF4E) in the axons of olfactory sensory neurons." Proc Natl Acad Sci U S A 101 (29):10815-20. doi: 10.1073/pnas.0403824101.

Picelli, S., O. R. Faridani, A. K. Bjorklund, G. Winberg, S. Sagasser, and R. Sandberg. 2014. "Full-length RNA-seq from single cells using Smart-seq2." Nat Protoc 9 (1):171-81. doi: 10.1038/nprot.2014.006.

Roskams, A. J., X. Cai, and G. V. Ronnett. 1998. "Expression of neuron-specific beta-III tubulin during olfactory neurogenesis in the embryonic and adult rat." Neuroscience 83 (1):191-200.

Sathyanesan, A., A. A. Feijoo, S. T. Mehta, A. F. Nimarko, and W. Lin. 2013. "Expression profile of G-protein betagamma subunit gene transcripts in the mouse olfactory sensory epithelia." Front Cell Neurosci 7:84. doi: 10.3389/fncel.2013.00084.

Serizawa, S., K. Miyamichi, H. Nakatani, M. Suzuki, M. Saito, Y. Yoshihara, and H. Sakano. 2003. "Negative feedback regulation ensures the one receptor-one olfactory neuron rule in mouse." Science 302 (5653):2088-94. doi: 10.1126/science.1089122.

Shykind, B. M. 2005. "Regulation of odorant receptors: one allele at a time." Hum Mol Genet 14 Spec No 1:R33-9. doi: 10.1093/hmg/ddi105.

Shykind, B. M., S. C. Rohani, S. O'Donnell, A. Nemes, M. Mendelsohn, Y. Sun, R. Axel, and G. Barnea. 2004. "Gene switching and the stability of odorant receptor gene choice." Cell 117 (6):801-15. doi: 10.1016/j.cell.2004.05.015.

Strittmatter, S. M., C. Fankhauser, P. L. Huang, H. Mashimo, and M. C. Fishman. 1995. "Neuronal pathfinding is abnormal in mice lacking the neuronal growth cone protein GAP-43." Cell 80 (3):445-52.

Tan, L. Z., C. H. Zong, and X. S. Xie. 2013. "Rare event of histone demethylation can initiate singular gene expression of olfactory receptors." Proc Natl Acad Sci U S A 110 (52):21148-52. doi: 10.1073/pnas.1321511111.

Tian, H., and M. Ma. 2008. "Activity plays a role in eliminating olfactory sensory neurons expressing multiple odorant receptors in the mouse septal organ." Mol Cell Neurosci 38 (4):484-8. doi: 10.1016/j.mcn.2008.04.006.

Tietjen, I., J. M. Rihel, Y. Cao, G. Koentges, L. Zakhary, and C. Dulac. 2003. "Single-cell transcriptional analysis of neuronal progenitors." Neuron 38 (2):161-75.

Tsai, L., and G. Barnea. 2014. "A critical period defined by axon-targeting mechanisms in the murine olfactory bulb." Science 344 (6180):197-200. doi: 10.1126/science.1248806.

Verhaagen, J., A. B. Oestreicher, W. H. Gispen, and F. L. Margolis. 1989. "The expression of the growth associated protein B50/GAP43 in the olfactory system of neonatal and adult rats." J Neurosci 9 (2):683-91.

Wu, A. R., N. F. Neff, T. Kalisky, P. Dalerba, B. Treutlein, M. E. Rothenberg, F. M. Mburu, G. L. Mantalas, S. Sim, M. F. Clarke, and S. R. Quake. 2014. "Quantitative assessment of single-cell RNA-sequencing methods." Nat Methods 11 (1):41-6. doi: 10.1038/nmeth.2694.

Zou, D. J., A. T. Chesler, C. E. Le Pichon, A. Kuznetsov, X. Pei, E. L. Hwang, and S. Firestein. 2007. "Absence of adenylyl cyclase 3 perturbs peripheral olfactory projections in mice." J Neurosci 27 (25):6675-83. doi: 10.1523/JNEUROSCI.0699-07.2007.

Chapter 4

Mapping the expression zones of nearly all mouse olfactory receptors

Author Contribution Statement

X. Sunney Xie and I designed the experiments. I performed the experiments and analyzed the data. X. Sunney Xie and I wrote the manuscript.

Introduction

The olfactory system relies on a topological map of OR expression. The first stage of map formation occurs in the MOE, where different OR genes are expressed in different yet partially overlapping “zones”. The zones were first discovered in rodents (Ressler, Sullivan, and Buck 1993, Vassar, Ngai, and Axel 1993), and later observed in fish (Weth, Nadler, and Korsching 1996), insects (Vosshall et al. 1999), and primates (Horowitz et al. 2014). Zonal expression is crucial for downstream projection to the main olfactory bulb (Ressler, Sullivan, and Buck 1994, Vassar et al. 1994, Mombaerts et al. 1996), and may contribute to the establishment of the “one-neuron-one-receptor” rule (Chess et al. 1994, Malnic et al. 1999, Hanchate et al. 2015, Tan, Li, and Xie 2015). However, existing zonal information either had limited resolution (Zhang et al. 2004) or was available only for a small subset of < 100 ORs (Miyamichi et al. 2005), and was limited for non-OR genes (Oka et al. 2003, Gussing and Bohm 2004, Yoshihara et al. 1997, Duggan et al. 2008, Tietjen et al. 2003, Tietjen, Rihel, and Dulac 2005, Norlin et al. 2001, Vedin et al. 2009, Whitby-Logan, Weech, and Walters 2004, Miyawaki et al. 1996, Ling et al. 2004). In this work, we generated a near-complete zonal map of 1,033 mouse ORs by mRNA sequencing of isolated MOE pieces, and identified novel non-OR genes that may exhibit zonal expression.

Results

To investigate the spatial pattern of gene expression in the mouse MOE, we sequenced mRNA from 12 isolated MOE pieces with an average of 12.5 million single-end 100-bp reads per piece (min = 4.9 million, max = 21.0 million). In mice, the expression zone of each OR assumes a complex shape that approximates a cylindrical shell, and the shells of different zones are concentric (Ressler, Sullivan, and Buck 1993). Here we follow a naming convention in which zone 1 is the most dorsomedial, zone 5 (also known as zone 4b) is the most ventrolateral, and the zone of each OR is not necessarily an integer (Miyamichi et al. 2005) (Figure 4.1A). The geometry is further complicated by an unusual, medial zone of Olfr459 (also known as OR-Z6 or MOR120-1, orthologous to human OR9A2) (Pyrski et al. 2001) (red region in Figure 4.1A). We tackled the problem of the complex zonal geometry, which is invisible to the naked eye, by randomly isolating 12 very small MOE pieces, each of which would contain only one or a few nearby zones. In this way, the expression pattern of each OR across the 12 MOE pieces would reflect its zone. Indeed, normalized expression levels of the 1,033 OR genes that were sufficiently detected (defined as having an average of at least 0.5 transcripts per million (TPM)) showed a prominent pattern of differential expression across the 12 pieces, as visualized by t-distributed stochastic neighbor embedding (t-SNE) (Maaten and Hinton 2008) and principal component analysis (PCA) (black points in Figure 4.1B). This pattern reflected zonal expression, because on the t-SNE and PCA plots a small subset of 78 ORs of known zones (Miyamichi et al. 2005, Pyrski et al. 2001) were distributed according to their zones (colored points in Figure 4.1B). For example, on the t-SNE plot, Olfr459 of the unusual zone resided in a small, separate cluster at the top, while the other 77 ORs of zones 1 to 5 followed a continuous “trajectory” along the larger cluster.


Figure 4.1. Spatial-transcriptomic mapping of the expression zones of 1,033 OR genes and 712 putative non-OR genes. (A) We investigated the spatial pattern of gene expression by sequencing mRNA from 12 randomly isolated MOE pieces. Each small piece would contain only one or a few nearby zones. (B) Normalized expression levels of the 1,033 OR genes (diamonds for Class I ORs (Zhang and Firestein 2002) and dots for Class II) that were sufficiently detected showed a prominent pattern of zonal expression across the 12 pieces, as visualized by 78 ORs of known zones (colored points) (Miyamichi et al. 2005) via t-distributed stochastic neighbor embedding (t-SNE) and principal component analysis (PCA). We assigned the 22 ORs in the smaller t-SNE cluster (red dashed circle) to the unusual zone of Olfr459 (red dot). (C) We made a standard curve from zone 1 to zone 5 by smoothing the normalized expression levels of the 77 known ORs (top heatmaps), and inferred the zones of the 1,033 ORs by finding the closest point on the standard curve (middle right heatmap). Results of the inference were visualized on a PCA plot. Putative non-OR genes that might exhibit zonal expression were selected based on expression levels, uniformity across the MOE pieces, and distance to the standard curve or to Olfr459 (bottom heatmap). In contrast, known marker genes (Omp, Gnal, Cnga2 for mature OSNs, and Gng8, Gap43 for immature OSNs) exhibited uniform expression (middle left heatmap).

We inferred the expression zones of 992 intact OR genes (81% of all 1,228, based on the GENCODE annotation) and 41 OR pseudogenes (22% of all 189), using the observed expression patterns of 78 known ORs as standards. Among the 1,033 ORs that were sufficiently detected, we began by assigning the 22 ORs in the smaller t-SNE cluster (Figure 4.1B) to the unusual zone (Pyrski et al. 2001), showing for the first time that the unusual expression pattern of Olfr459 is not an isolated case. For the remaining ORs, which would belong to the usual zones, we first made a 12-dimensional standard curve from zone 1 to zone 5 by smoothing the normalized expression levels of the 77 known ORs (Miyamichi et al. 2005) with a half-window size of ± 0.5 zones (top heatmaps in Figure 4.1C). The zone of each OR was then inferred by finding the nearest point (as measured by Euclidean distance) on the standard curve (middle heatmap and visualization on a PCA plot in Figure 4.1C) (Table S5). Uncertainties in zonal inference were estimated by leave-one-out cross-validation (LOOCV) on the 77 known ORs, yielding a root-mean-square error of 0.3 zones (min = 0.0, max = 1.1). Together, these ORs accounted for 99.4% of the total OR mRNA abundance (as measured by average TPMs) in the MOE.
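The inference procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the original code: each gene’s profile across the MOE pieces is z-scored (the exact normalization is not specified here), the standard curve is sampled on a grid of 0.1 zones, and each grid point averages the profiles of known ORs within ± 0.5 zones. The function names and the synthetic Gaussian “zones” in the usage example are hypothetical.

```python
import numpy as np


def zscore_rows(x):
    """Standardize each gene (row) across MOE pieces (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


def standard_curve(profiles, zones, step=0.1, half_window=0.5):
    """Smooth known-OR profiles along the zone axis with a moving average.

    Returns (grid, curve), where curve[i] is the mean profile of known ORs whose
    zone lies within half_window of grid[i].
    """
    zones = np.asarray(zones, dtype=float)
    grid, curve = [], []
    for g in np.round(np.arange(1.0, 5.0 + 1e-9, step), 6):
        mask = np.abs(zones - g) <= half_window + 1e-9
        if mask.any():
            grid.append(g)
            curve.append(profiles[mask].mean(axis=0))
    return np.array(grid), np.vstack(curve)


def infer_zone(profile, grid, curve):
    """Zone of the nearest standard-curve point (Euclidean distance) and that distance."""
    d = np.linalg.norm(curve - profile, axis=1)
    i = int(np.argmin(d))
    return grid[i], d[i]


def loocv_rmse(profiles, zones, **kwargs):
    """Root-mean-square error of zone inference, leaving out one known OR at a time."""
    zones = np.asarray(zones, dtype=float)
    errors = []
    for i in range(len(zones)):
        keep = np.arange(len(zones)) != i
        grid, curve = standard_curve(profiles[keep], zones[keep], **kwargs)
        errors.append(infer_zone(profiles[i], grid, curve)[0] - zones[i])
    return float(np.sqrt(np.mean(np.square(errors))))


# Synthetic example: 12 pieces spanning zones 1-5, Gaussian expression bumps.
pieces = np.linspace(1, 5, 12)


def bump(z):
    return np.exp(-(pieces - z) ** 2 / (2 * 0.5 ** 2))


known_zones = np.linspace(1, 5, 21)
known = zscore_rows([bump(z) for z in known_zones])
grid, curve = standard_curve(known, known_zones)
print(infer_zone(zscore_rows([bump(3.0)])[0], grid, curve))
print(loocv_rmse(known, known_zones))
```

Sampling the curve on a fine grid turns "nearest point on the curve" into a simple minimum over grid points; a finer step trades computation for resolution.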


We further identified 666 intact non-OR genes and 46 non-OR pseudogenes that might exhibit zonal expression. The large number of annotated genes in the mouse genome necessitated stringent criteria. We removed genes whose expression was either low (average TPM < 1) or uniform across the 12 pieces (coefficient of variation < 0.5). Among the remaining 3,228 non-OR genes, putative zonal genes were selected based on Euclidean distance to the aforementioned standard curve (699 candidate genes with distance ≤ 2) or to the unusual zone’s Olfr459 (13 genes with distance ≤ 4) (bottom heatmap in Figure 4.1C). This list included known zonal genes Acsm4 (also known as O-MACS) (Oka et al. 2003), Nqo1 (Gussing and Bohm 2004), Ncam2 (also known as OCAM) (Yoshihara et al. 1997), Foxg1 (Duggan et al. 2008), Eya2 (Tietjen et al. 2003), Msx1, Nrp2 (Norlin et al. 2001), Gstm5 (Whitby-Logan, Weech, and Walters 2004), and trace-amine-associated receptors (TAARs) (Liberles and Buck 2006). Novel genes included transcription factors (Six3, Yy2, Isl1, Prdm16, Tbx3/15, Bach2, Foxa1, Pitx1, Tcea3/l3/l5, Npas3/4, Dlx3, Twist1/2, E2f2/8, Zfp97/365/382/950), chemokines (Cxcl10/14, Ccl5/8), cytochromes (Ling et al. 2004) (Cyp2a4/2b10/2c44/2e1/7b1), and aldehyde dehydrogenases (Norlin et al. 2001) (Aldh1a7/3a1/3b1). Random permutation of MOE-piece labels yielded an estimated false discovery rate (FDR) of 51% for the usual zones. A more stringent list of 202 genes (distance ≤ 1.5 to the standard curve) had an estimated FDR of 18%. These genes provide promising candidates for zonal regulation of OR expression and of other cellular characteristics, for example cilia length (Challis et al. 2015).
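The non-OR screen and its permutation-based FDR can be sketched the same way. Several details are assumptions not stated in the text. The coefficient of variation uses the population standard deviation, and the standard curve is a precomputed array of shape (grid points × 12 pieces). The FDR is taken as the mean number of hits after jointly permuting the 12 piece labels of all genes, divided by the observed number of hits. Function names are hypothetical.

```python
import numpy as np


def select_candidates(tpm, curve, min_mean=1.0, min_cv=0.5, max_dist=2.0):
    """Indices of genes passing the expression and zonality filters.

    tpm: genes x pieces (TPM); curve: grid points x pieces (normalized).
    """
    tpm = np.asarray(tpm, dtype=float)
    mean = tpm.mean(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        cv = tpm.std(axis=1) / mean              # population standard deviation
    keep = np.flatnonzero((mean >= min_mean) & (cv >= min_cv))
    norm = tpm[keep] / mean[keep, None]          # mean of 1 across pieces
    # Distance from each retained gene to the nearest point on the curve.
    dist = np.linalg.norm(norm[:, None, :] - curve[None, :, :], axis=2).min(axis=1)
    return keep[dist <= max_dist]


def permutation_fdr(tpm, curve, n_perm=100, seed=0, **filters):
    """Expected false hits (permuted piece labels) divided by observed hits."""
    tpm = np.asarray(tpm, dtype=float)
    rng = np.random.default_rng(seed)
    observed = len(select_candidates(tpm, curve, **filters))
    # The same column permutation is applied to every gene, so the mean and
    # CV filters are unchanged and only the match to the curve is broken.
    null = [len(select_candidates(tpm[:, rng.permutation(tpm.shape[1])],
                                  curve, **filters))
            for _ in range(n_perm)]
    return float(np.mean(null)) / max(observed, 1)
```

Tightening `max_dist` from 2 to 1.5 should lower both the hit count and the estimated FDR, in line with the two lists described above.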

Distribution of zones along the mouse genome provides insights into the mechanism of zonal regulation. Within each OR gene cluster, inferred zones typically varied gradually along the chromosome, although drastic changes between adjacent ORs occasionally occurred (Figure 4.2A). In the canonical zones, two ORs differed by an average of 0.3, 0.8, 0.9, or 1.3 zones when they were separated by 10 kb, 100 kb, or 1 Mb, or located on two different chromosomes, respectively (Figure 4.2B). This hints at a model in which the pattern of chromatin silencing morphs continuously from zone to zone, in each zone exposing only a subset of ORs (namely, the ORs of that zone) to some universal machinery of transcriptional activation. This model is consistent with a recent observation that removal of H3K9-mediated heterochromatic silencing obliterated zonal expression of ORs, among other changes in OR expression (Lyons et al. 2014).
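The distance dependence in Figure 4.2B can be sketched as follows. This is an illustrative reimplementation, not the original analysis code. Its inputs are assumed to be plain lists of chromosome names, positions (bp), and inferred zones for canonical-zone ORs. `zone_difference_by_separation` follows the figure (log10 bins of width 0.5, bins with fewer than 10 pairs dropped). `mean_dz_at` follows the ± 0.1 sliding window described in Materials and Methods. Both function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def zone_difference_by_separation(chrom, pos, zone, bin_size=0.5, min_pairs=10):
    """Binned mean and s.d. of |zone difference| versus log10 separation.

    Returns (rows, inter): rows holds (mean log10 separation, mean |dz|,
    s.d. |dz|) per retained bin, ordered by separation; inter is the mean
    |dz| over pairs on different chromosomes.
    """
    bins, inter = {}, []
    for i, j in combinations(range(len(zone)), 2):
        dz = abs(zone[i] - zone[j])
        if chrom[i] != chrom[j]:
            inter.append(dz)
        elif pos[i] != pos[j]:
            s = np.log10(abs(pos[i] - pos[j]))   # direction is ignored
            bins.setdefault(int(np.floor(s / bin_size)), []).append((s, dz))
    rows = []
    for b in sorted(bins):
        pairs = np.array(bins[b])
        if len(pairs) >= min_pairs:
            rows.append((pairs[:, 0].mean(), pairs[:, 1].mean(), pairs[:, 1].std()))
    return rows, (float(np.mean(inter)) if inter else float("nan"))


def mean_dz_at(chrom, pos, zone, log10_sep, half_width=0.1):
    """Mean |dz| over same-chromosome pairs within +/- half_width of log10_sep."""
    vals = [abs(zone[i] - zone[j]) for i, j in combinations(range(len(zone)), 2)
            if chrom[i] == chrom[j] and pos[i] != pos[j]
            and abs(np.log10(abs(pos[i] - pos[j])) - log10_sep) <= half_width]
    return float(np.mean(vals)) if vals else float("nan")
```

For example, `mean_dz_at(chrom, pos, zone, 5.0)` corresponds to the 100-kb value quoted above.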

In addition, genomic locations of drastic zonal changes may harbor important regulatory sites for OR silencing. Although most non-OR candidates reside in separate locations (Figure 4.2C), several are embedded in OR clusters and share zones with nearby ORs, likely a byproduct of chromatin organization under zonal regulation.


Figure 4.2. Distribution of zones across the mouse genome. (A) Zonal information for all 1,417 annotated ORs (triangles, with the pointy end denoting the direction of transcription). Typically, inferred zones varied gradually along the chromosome, although drastic changes between adjacent ORs occasionally occurred. Numbers on the left denoted chromosomes. Grey denoted insufficient detection because of low expression levels. OR clusters were separated by vertical lines. (B) Nearby ORs tend to have similar zones. All pairs of ORs from the canonical zones were grouped into bins according to their log10 genomic separation (in base pairs, upstream or downstream) with a bin size of 0.5. Bins with fewer than 10 OR pairs were excluded. Each black dot denoted the average difference in zones and the average log10 genomic separation in each bin. Error bars denoted standard deviation. (C) Zonal information for 712 non-OR candidates (squares), displayed together with ORs (triangles, faded for visual clarity).

Discussion

Our results provide by far the most complete zonal map of mouse ORs, covering an order of magnitude more OR genes than state-of-the-art results obtained by traditional in situ hybridization (Miyamichi et al. 2005). Future work includes improving zonal resolution and inferring lowly expressed ORs by deeper sequencing of more MOE pieces; improving zonal inference by allowing a variable kernel size; investigating expression patterns within zone 1; mapping out the second stage of the olfactory map by applying the same spatial-transcriptomic approach to the olfactory bulb; and comparing zonal expression between mouse and human.

Materials and Methods

Animals

All mouse experiments were performed in accordance with relevant guidelines and regulations.

Animal protocols were approved by Harvard IACUC. An adult male mouse from the inbred strain C57BL/6NTac (Taconic) was used.

Published data

Known zonal information of 82 ORs was downloaded from Table S1 of (Miyamichi et al. 2005) and converted to modern gene names through MGI; of these, 77 were sufficiently detected in our experiment.

RNA-Seq experiments

Small pieces were isolated by forceps from a dissected MOE. Total RNA was extracted with an RNeasy Mini Kit (Qiagen) and a TissueLyser II (Qiagen) (25 Hz for 2 min, invert, and another 2 min, with 5 mm stainless steel beads), and quantified by an RNA 6000 Pico Kit on a Bioanalyzer 2100 (Agilent).

Libraries were prepared with a TruSeq RNA Sample Prep Kit v2 (Illumina) with poly(A) selection, quantified by a Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific) and a High Sensitivity DNA Analysis Kit on a Bioanalyzer 2100 (Agilent), and sequenced on a HiSeq 2500 (Illumina).

Data analysis

RNA-Seq reads were mapped to the mouse reference genome GRCm38.p5 (downloaded from the GENCODE M12 release) with tophat v2.0.11 (with default parameters) (Kim et al. 2013).

Expression levels were quantified with cufflinks v2.2.1 (with parameters “-u --max-bundle-frags 100000000”) (Trapnell et al. 2010) based on the comprehensive gene annotation (ALL) from the above GENCODE release. Restricting quantification to reads with the highest mapping quality (50) had no impact on the results. TPM values were calculated after removing genes whose gene type began with “Mt_”, “miRNA”, “rRNA”, “scRNA”, “snRNA”, “snoRNA”, “sRNA”, “scaRNA”, or “vaultRNA”. Translated protein sequences were downloaded from the above GENCODE release. Gene descriptions were downloaded from Ensembl BioMart (mouse genes GRCm38.p5). OR genes (1,417 genes) were defined as those having “Olfr” in the gene name (1,419 entries corresponding to 1,415 distinct genes, because 4 pairs refer to the same genes: ENSMUSG00000109148 and ENSMUSG00000074985 are both Olfr1452-ps1, ENSMUSG00000109806 and ENSMUSG00000073919 are both Olfr663, ENSMUSG00000109020 and ENSMUSG00000061501 are both Olfr197, and ENSMUSG00000063732 and ENSMUSG00000110991 are both Olfr908) or having “olfactory receptor” in the gene description (an additional 2 genes, “Olrf445-ps1” and “OR4P4”). Pseudogenes (5,580 genes, 189 of which are ORs) were defined as having “pseudogene” in the gene type.
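The gene filtering and OR definitions above can be expressed compactly. This sketch is not the pipeline used for the chapter. It assumes per-gene records with hypothetical field names (`gene_id`, `gene_name`, `gene_type`, `description`, `fpkm`) already parsed from cufflinks output and BioMart. TPM is recomputed from FPKM over the retained genes using the identity TPM_i = 10^6 · FPKM_i / Σ_j FPKM_j.

```python
# Gene-type prefixes excluded before computing TPM (as listed above).
EXCLUDED_TYPE_PREFIXES = ("Mt_", "miRNA", "rRNA", "scRNA", "snRNA", "snoRNA",
                          "sRNA", "scaRNA", "vaultRNA")


def recompute_tpm(genes):
    """Drop excluded gene types, then rescale FPKM to TPM (sums to 1e6)."""
    kept = [g for g in genes
            if not g["gene_type"].startswith(EXCLUDED_TYPE_PREFIXES)]
    total = sum(g["fpkm"] for g in kept)
    return {g["gene_id"]: 1e6 * g["fpkm"] / total for g in kept}


def is_or_gene(gene):
    """OR if the name contains 'Olfr' or the description mentions an OR."""
    return ("Olfr" in gene["gene_name"]
            or "olfactory receptor" in gene["description"])


def is_pseudogene(gene):
    return "pseudogene" in gene["gene_type"]


def or_gene_names(genes):
    """Distinct OR gene names; Ensembl IDs sharing one name collapse."""
    return {g["gene_name"] for g in genes if is_or_gene(g)}
```

Counting distinct names rather than Ensembl IDs is what turns the 1,419 “Olfr” entries into 1,415 genes.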

For zonal inference, expression levels (in TPMs) were normalized such that the average is 1 for each gene across the 12 MOE pieces. Two-dimensional t-SNE was performed with its MATLAB implementation on pairwise Euclidean distances between normalized expression values (the “tsne_d” function, with a default perplexity of 30). In the main text, the average difference in zones given a genomic separation was calculated from all OR pairs with log10 genomic separations within ± 0.1 of the desired value, regardless of directionality (upstream or downstream).

Data availability

Raw sequencing data were deposited at the National Center for Biotechnology Information with accession number SRP127539 at the following link: http://www.ncbi.nlm.nih.gov/sra/SRP127539


References

Challis, R. C., H. Tian, J. Wang, J. He, J. Jiang, X. Chen, W. Yin, T. Connelly, L. Ma, C. R. Yu, J. L. Pluznick, D. R. Storm, L. Huang, K. Zhao, and M. Ma. 2015. "An Olfactory Cilia Pattern in the Mammalian Nose Ensures High Sensitivity to Odors." Curr Biol 25 (19):2503-12. doi: 10.1016/j.cub.2015.07.065.

Chess, A., I. Simon, H. Cedar, and R. Axel. 1994. "Allelic Inactivation Regulates Olfactory Receptor Gene-Expression." Cell 78 (5):823-834. doi: 10.1016/S0092-8674(94)90562-2.

Duggan, C. D., S. DeMaria, A. Baudhuin, D. Stafford, and J. Ngai. 2008. "Foxg1 is required for development of the vertebrate olfactory system." J Neurosci 28 (20):5229-39. doi: 10.1523/JNEUROSCI.1134-08.2008.

Gussing, F., and S. Bohm. 2004. "NQO1 activity in the main and the accessory olfactory systems correlates with the zonal topography of projection maps." Eur J Neurosci 19 (9):2511-8. doi: 10.1111/j.0953-816X.2004.03331.x.

Hanchate, N. K., K. Kondoh, Z. Lu, D. Kuang, X. Ye, X. Qiu, L. Pachter, C. Trapnell, and L. B. Buck. 2015. "Single-cell transcriptomics reveals receptor transformations during olfactory neurogenesis." Science 350 (6265):1251-5. doi: 10.1126/science.aad2456.

Horowitz, L. F., L. R. Saraiva, D. Kuang, K. H. Yoon, and L. B. Buck. 2014. "Olfactory receptor patterning in a higher primate." J Neurosci 34 (37):12241-52. doi: 10.1523/JNEUROSCI.1779-14.2014.

Kim, D., G. Pertea, C. Trapnell, H. Pimentel, R. Kelley, and S. L. Salzberg. 2013. "TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions." Genome Biol 14 (4):R36. doi: 10.1186/gb-2013-14-4-r36.

Liberles, S. D., and L. B. Buck. 2006. "A second class of chemosensory receptors in the olfactory epithelium." Nature 442 (7103):645-50. doi: 10.1038/nature05066.

Ling, G., J. Gu, M. B. Genter, X. Zhuo, and X. Ding. 2004. "Regulation of cytochrome P450 gene expression in the olfactory mucosa." Chem Biol Interact 147 (3):247-58. doi: 10.1016/j.cbi.2004.02.003.

Lyons, D. B., A. Magklara, T. Goh, S. C. Sampath, A. Schaefer, G. Schotta, and S. Lomvardas. 2014. "Heterochromatin-mediated gene silencing facilitates the diversification of olfactory neurons." Cell Rep 9 (3):884-92. doi: 10.1016/j.celrep.2014.10.001.

Maaten, L. van der, and G. Hinton. 2008. "Visualizing data using t-SNE." J Mach Learn Res 9 (Nov):2579-2605.

Malnic, B., J. Hirono, T. Sato, and L. B. Buck. 1999. "Combinatorial receptor codes for odors." Cell 96 (5):713-23.

Miyamichi, K., S. Serizawa, H. M. Kimura, and H. Sakano. 2005. "Continuous and overlapping expression domains of odorant receptor genes in the olfactory epithelium determine the dorsal/ventral positioning of glomeruli in the olfactory bulb." J Neurosci 25 (14):3586-92. doi: 10.1523/JNEUROSCI.0324-05.2005.

Miyawaki, A., H. Homma, H. Tamura, M. Matsui, and K. Mikoshiba. 1996. "Zonal distribution of sulfotransferase for phenol in olfactory sustentacular cells." EMBO J 15 (9):2050-5.

Mombaerts, P., F. Wang, C. Dulac, S. K. Chao, A. Nemes, M. Mendelsohn, J. Edmondson, and R. Axel. 1996. "Visualizing an olfactory sensory map." Cell 87 (4):675-86.

Norlin, E. M., M. Alenius, F. Gussing, M. Hagglund, V. Vedin, and S. Bohm. 2001. "Evidence for gradients of gene expression correlating with zonal topography of the olfactory sensory map." Mol Cell Neurosci 18 (3):283-95. doi: 10.1006/mcne.2001.1019.

Oka, Y., K. Kobayakawa, H. Nishizumi, K. Miyamichi, S. Hirose, A. Tsuboi, and H. Sakano. 2003. "O-MACS, a novel member of the medium-chain acyl-CoA synthetase family, specifically expressed in the olfactory epithelium in a zone-specific manner." Eur J Biochem 270 (9):1995-2004.

Pyrski, M., Z. Xu, E. Walters, D. J. Gilbert, N. A. Jenkins, N. G. Copeland, and F. L. Margolis. 2001. "The OMP-lacZ transgene mimics the unusual expression pattern of OR-Z6, a new odorant receptor gene on mouse chromosome 6: implication for locus-dependent gene expression." J Neurosci 21 (13):4637-48.

Ressler, K. J., S. L. Sullivan, and L. B. Buck. 1993. "A zonal organization of odorant receptor gene expression in the olfactory epithelium." Cell 73 (3):597-609.

Ressler, K. J., S. L. Sullivan, and L. B. Buck. 1994. "Information coding in the olfactory system: evidence for a stereotyped and highly organized epitope map in the olfactory bulb." Cell 79 (7):1245-55.

Tan, L., Q. Li, and X. S. Xie. 2015. "Olfactory sensory neurons transiently express multiple olfactory receptors during development." Mol Syst Biol 11 (12):844. doi: 10.15252/msb.20156639.

Tietjen, I., J. Rihel, and C. G. Dulac. 2005. "Single-cell transcriptional profiles and spatial patterning of the mammalian olfactory epithelium." Int J Dev Biol 49 (2-3):201-7. doi: 10.1387/ijdb.041939it.

Tietjen, I., J. M. Rihel, Y. Cao, G. Koentges, L. Zakhary, and C. Dulac. 2003. "Single-cell transcriptional analysis of neuronal progenitors." Neuron 38 (2):161-75.

Trapnell, C., B. A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M. J. van Baren, S. L. Salzberg, B. J. Wold, and L. Pachter. 2010. "Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation." Nat Biotechnol 28 (5):511-5. doi: 10.1038/nbt.1621.

Vassar, R., S. K. Chao, R. Sitcheran, J. M. Nunez, L. B. Vosshall, and R. Axel. 1994. "Topographic organization of sensory projections to the olfactory bulb." Cell 79 (6):981-91.

Vassar, R., J. Ngai, and R. Axel. 1993. "Spatial segregation of odorant receptor expression in the mammalian olfactory epithelium." Cell 74 (2):309-18.

Vedin, V., M. Molander, S. Bohm, and A. Berghard. 2009. "Regional differences in olfactory epithelial homeostasis in the adult mouse." J Comp Neurol 513 (4):375-84. doi: 10.1002/cne.21973.

Vosshall, L. B., H. Amrein, P. S. Morozov, A. Rzhetsky, and R. Axel. 1999. "A spatial map of olfactory receptor expression in the Drosophila antenna." Cell 96 (5):725-36.

Weth, F., W. Nadler, and S. Korsching. 1996. "Nested expression domains for odorant receptors in zebrafish olfactory epithelium." Proc Natl Acad Sci U S A 93 (23):13321-6.

Whitby-Logan, G. K., M. Weech, and E. Walters. 2004. "Zonal expression and activity of glutathione S-transferase in the mouse olfactory mucosa." Brain Res 995 (2):151-7.

Yoshihara, Y., M. Kawasaki, A. Tamada, H. Fujita, H. Hayashi, H. Kagamiyama, and K. Mori. 1997. "OCAM: A new member of the neural cell adhesion molecule family related to zone-to-zone projection of olfactory and vomeronasal axons." J Neurosci 17 (15):5830-42.

Zhang, X., and S. Firestein. 2002. "The olfactory receptor gene superfamily of the mouse." Nat Neurosci 5 (2):124-33. doi: 10.1038/nn800.

Zhang, X., M. Rogers, H. Tian, X. Zhang, D. J. Zou, J. Liu, M. Ma, G. M. Shepherd, and S. J. Firestein. 2004. "High-throughput microarray detection of olfactory receptor gene expression in the mouse." Proc Natl Acad Sci U S A 101 (39):14168-73. doi: 10.1073/pnas.0405350101.

Chapter 5

Reconstructing the 3D genomes of single diploid human and mouse cells

Author Contribution Statement

Dong Xing, Chi-Han Chang, X. Sunney Xie and I designed the experiments. Dong Xing and I performed the experiments. I developed the algorithm and analyzed the data. Dong Xing, X. Sunney Xie, and I wrote the manuscript.

Introduction

The nucleus of a human diploid cell contains 46 chromosomes (23 maternal and 23 paternal), together carrying 6 Gb of genomic DNA. The 3D genome structure is believed to be crucial for the regulation of gene expression and other cellular functions (Cremer and Cremer 2001). For example, the nuclei of sensory neurons assume unusual architectures in the mouse visual (Solovei et al. 2009) and olfactory systems (Clowney et al. 2012, Lyons et al. 2014, Markenscoff-Papadimitriou et al. 2014, Lomvardas et al. 2006). Chromatin conformation capture assays, such as 3C (Dekker et al. 2002) and Hi-C (Lieberman-Aiden et al. 2009), allow for studies of 3D genome structures in bulk samples. However, such structures differ distinctly from cell to cell, necessitating single-cell measurements. Recent single-cell chromatin conformation capture methods avoided ensemble averaging (Nagano et al. 2013, Flyamer et al. 2017, Li et al. 2017, Nagano et al. 2017, Stevens et al. 2017, Ramani et al. 2017) and yielded 3D genome structures of haploid mouse cells (Nagano et al. 2017, Stevens et al. 2017). However, most functional cells in mammalian tissues are diploid, and their 3D genome structures remain elusive (Carstens, Nilges, and Habeck 2016). Here we use phased (haplotype-resolved) single-nucleotide polymorphisms (SNPs) to distinguish between the two haplotypes of each chromosome, and reveal the 3D organization of the diploid human genome. With this capability in hand, we examined the cell-type dependence of 3D genome structures and the unusual nuclear architecture in mouse OSNs.

Results

High-resolution 3D genome structures of single diploid cells require a large number of chromatin “contacts”: pairs of genomic loci that were joined by proximity ligation. We developed a chromatin conformation capture method, termed Dip-C (Figure 5.1A), that can detect more contacts than existing methods with minimal false positives. Compared to previous methods, special care was taken to avoid losing contacts by omitting biotin pulldown (Flyamer et al. 2017, Li et al. 2017) and by conducting high-coverage whole-genome amplification with Multiplex End-tagging Amplification (META) (Xing et al. U.S. provisional patent 62/509,981) (Chen et al. 2017), which introduced few artefactual chimeras (Materials and Methods). We detected a median of 1.04 million contacts per single cell (n = 17, min = 0.71 million, max = 1.48 million) from GM12878, a female lymphoblastoid cell line, and 0.84 million (n = 18, min = 0.67 million, max = 1.08 million) from peripheral blood mononuclear cells (PBMCs) of a male donor (Lu et al. 2012), at least five times the medians of existing methods. Most cells were in the G1 or G0 phase of the cell cycle. In addition, we simultaneously detected copy-number variations (CNVs), loss of heterozygosity (LOH), DNA replication, and V(D)J recombination with a 10-kb bin size.


Figure 5.1. Single-cell chromatin conformation capture and haplotype imputation by Dip-C. (A) Schematic of the chromatin conformation capture protocol. The 3D information of chromatin structure was encoded in the linear genome through proximity ligation of chromatin fragments, as in 3C (Dekker et al. 2002) and Hi-C (Lieberman-Aiden et al. 2009, Rao et al. 2014). Ligation products were then amplified by META (Xing et al. U.S. provisional patent 62/509,981) and sequenced. Colors represented genomic coordinates. (B) Imputation of the two chromosome haplotypes linked by each chromatin “contact” (red dot) in a representative single cell.

Another challenge in reconstructing diploid genomes is that the haplotypes of most chromatin contacts are unknown (Dixon et al. 2015, Servant et al. 2015, Rao et al. 2014, Giorgetti et al. 2016). To assign each contact to its correct haplotypes, we developed an imputation algorithm (Figure 5.1B). We reasoned that unknown haplotypes can be imputed from “neighboring” (in terms of genomic distances) contacts, based on the assumption that the two homologs typically contact different partners. Utilizing a novel statistical property of interchromosomal and long-range intrachromosomal contacts (Materials and Methods), we defined a contact neighborhood as a superellipse with an exponent of 0.5 and a radius of 10 Mb, within which the haplotypes of nearby contacts voted to impute the haplotypes of each contact of interest.
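A single round of superellipse voting can be sketched in plain Python. This is a simplified illustration, not the Dip-C implementation. Each contact is assumed to be a dict with hypothetical keys (`chr1`, `pos1`, `hap1`, `chr2`, `pos2`, `hap2`), with haplotypes coded 0/1 or `None` when unknown and legs already ordered consistently. The 90% majority threshold is likewise a hypothetical choice. Only fully phased neighbors vote, and only if they agree with whichever legs of the contact of interest are already phased. With an exponent of 0.5, the neighborhood |Δ1/r|^0.5 + |Δ2/r|^0.5 ≤ 1 is a concave, star-shaped region. It admits a contact that lies far away along one leg only if that contact is very close along the other leg.

```python
from collections import Counter


def in_superellipse(d1, d2, radius=10e6, exponent=0.5):
    """True if the leg offsets (d1, d2) in bp satisfy |d1/r|^n + |d2/r|^n <= 1."""
    return (abs(d1) / radius) ** exponent + (abs(d2) / radius) ** exponent <= 1.0


def impute_once(contacts, radius=10e6, exponent=0.5, min_fraction=0.9):
    """One round of neighborhood voting; returns (hap1, hap2) per contact.

    Haplotypes are 0/1, or None if unknown; unresolved legs stay None.
    """
    result = []
    for c in contacts:
        if c["hap1"] is not None and c["hap2"] is not None:
            result.append((c["hap1"], c["hap2"]))
            continue
        votes = Counter()
        for nb in contacts:
            if nb is c or nb["hap1"] is None or nb["hap2"] is None:
                continue                      # only fully phased neighbors vote
            if (nb["chr1"], nb["chr2"]) != (c["chr1"], c["chr2"]):
                continue
            if c["hap1"] is not None and nb["hap1"] != c["hap1"]:
                continue                      # must agree with known legs
            if c["hap2"] is not None and nb["hap2"] != c["hap2"]:
                continue
            if in_superellipse(nb["pos1"] - c["pos1"], nb["pos2"] - c["pos2"],
                               radius, exponent):
                votes[(nb["hap1"], nb["hap2"])] += 1
        total = sum(votes.values())
        if total:
            best, count = votes.most_common(1)[0]
            if count / total >= min_fraction:
                result.append(best)
                continue
        result.append((c["hap1"], c["hap2"]))
    return result
```

The brute-force neighbor search is quadratic in the number of contacts. A real implementation over about a million contacts per cell would index contacts by chromosome pair and position first.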

In the Dip-C algorithm, after removal of 3C/Hi-C artefacts (contacts with few neighbors (Stevens et al. 2017)) and initial imputation, haplotypes were further imputed iteratively through a series of draft 3D models (Materials and Methods). Imputation accuracy was estimated to be ~ 96% for each haplotype by cross-validation (Materials and Methods). Regions harboring CNVs or LOH, as well as an apparently damaged GM12878 cell, were excluded from reconstruction.

We reconstructed the 3D diploid human and mouse genomes at a 20-kb resolution. Reconstruction succeeded without supervision for 94% (15 out of 16) of the GM12878 cells and 67% (12 out of 18) of the PBMCs, and succeeded after removal of small problematic regions for a further 6% (one out of 16) of the GM12878 cells and 22% (4 out of 18) of the PBMCs (Materials and Methods).

Figure 5.2A showed a representative cell. Each particle represented 20 kb of chromatin, with a radius of ~ 100 nm. A lower bound for reconstruction uncertainty could be estimated from the median deviation of ~ 0.4 particle radii (~ 40 nm) across all 20-kb particles between three replicates. Inaccuracy in the energy function was difficult to estimate, and imputation was less accurate when two homologs were nearby. Well-known nuclear morphologies were observed in an M/G1-phase GM12878 cell, where chromosomes retained their characteristic V shapes from a recent mitosis, and in several PBMCs, where multiple nuclear lobes were reminiscent of the partially segmented nuclei of low-density neutrophils and other blood cell types (Figure 5.2B).
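The replicate-based lower bound on reconstruction uncertainty can be sketched as follows. The exact alignment and deviation definitions used for Dip-C are not given in this section, so treat this as one plausible reading. Replicates are assumed to be N × 3 coordinate arrays (in particle radii) with particles in the same order. Each replicate is superimposed onto the first by least-squares (Kabsch) superposition. Reflections are allowed because distance-based reconstructions do not determine chirality. The sketch then reports the median over particles of each particle's r.m.s. distance from its replicate mean.

```python
import numpy as np


def superimpose(mobile, target, allow_reflection=True):
    """Least-squares (Kabsch) superposition of mobile onto target (N x 3)."""
    mobile, target = np.asarray(mobile, float), np.asarray(target, float)
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    a, b = mobile - mu_m, target - mu_t
    u, _, vt = np.linalg.svd(a.T @ b)        # SVD of the covariance matrix
    if not allow_reflection and np.linalg.det(u @ vt) < 0:
        u[:, -1] *= -1                       # force a proper rotation
    return a @ (u @ vt) + mu_t


def median_deviation(replicates):
    """Median over particles of each particle's r.m.s. distance from the
    replicate mean, after aligning every replicate to the first one."""
    ref = np.asarray(replicates[0], float)
    aligned = np.stack([ref] + [superimpose(r, ref) for r in replicates[1:]])
    mean = aligned.mean(axis=0)
    per_particle = np.sqrt(((aligned - mean) ** 2).sum(axis=2).mean(axis=0))
    return float(np.median(per_particle))
```

A median-based summary is robust to a few poorly constrained regions, which would otherwise inflate an r.m.s. over all particles.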

The Dip-C algorithm also reconstructed 3D diploid mouse genomes from published data on mouse embryonic stem cells (mESCs) (Nagano et al. 2017) despite fewer contacts (~ 0.3 million per cell, or ~ 0.2 million by our definition), because the mouse line harbored more SNPs than humans (Materials and Methods).

Figure 5.2. 3D genome structures of single diploid human cells. (A) 3D genome structure of a representative GM12878 cell. Each particle represents 20 kb of chromatin, or a radius of ~ 100 nm. (B) Peculiar nuclear morphology in a cell that recently exited mitosis (upper panel) and in a cell with multiple nuclear lobes (lower panel). (C) Serial cross sections of a single cell showed compartmentalization of euchromatin (green) and heterochromatin (magenta), visualized by CpG frequency as a proxy (Xie et al. 2017). (D) Radial preferences across the human genome, as measured by average distances to the nuclear center of mass. Our results (black dots, smoothed by 1-Mb windows) agreed well with published DNA FISH data (gray lines) on whole chromosomes (Boyle et al. 2001) (shifted and rescaled), and provided novel fine-scale information. Axis limits were 20 and 50 particle radii for the black dots. (E) Example radial preferences of two chromosomes. The gene-rich chromosome 19 preferred the nuclear interior (left), while the gene-poor chromosome 18 almost always resided on the nuclear surface (right). (F) Stochastic fractal organization of chromatin was quantified by a matrix of radii of gyration of all possible subchains of each chromosome (heatmaps). We identified a hierarchy of single-cell domains across genomic scales (black trees). A subtree was simplified as a black triangle if either of its two subtrees was below a certain size (from left to right: 10 Mb, 2 Mb, 500 kb, 100 kb). In each panel, the region from the previous panel was shown in transparent gray. In the rightmost panel, thick sticks (top) and circles (bottom) highlighted the formation of a known CTCF loop (Rao et al. 2014). Spheres with arrows (top) highlighted the positions and orientations of the two converging CTCF sites. (G) Single-cell domains were highly heterogeneous between cells, exemplified by a 5-Mb region (chosen according to Figure 2e of (Dixon et al. 2012)). Averages were root-mean-square (r.m.s.) of radii of gyration. (H) The extent of multi-chromosome intermingling was quantified by the diversity of chromosomes, as measured by Shannon’s index, within 3 particle radii of each 20-kb particle. (I) Probability of extensive multi-chromosome intermingling (smoothed by 1-Mb windows) across the human genome. Some genomic regions were enriched in multi-chromosome intermingling. Axis limits were 0 and 0.8. GM12878 Cell 4 was excluded from (D), (G), and (I) because of its extensive chromosomal aberrations. The M/G1-phase GM12878 Cell 16 was also excluded. Genomic coordinates were in hg19.

Similar to the haploid mouse genome (Nagano et al. 2017, Stevens et al. 2017), the diploid human genome exhibited chromosome territories (Figure 5.2A) and chromatin compartments (visualized by CpG frequency as a proxy (Xie et al. 2017)), with the heterochromatic compartment B (Lieberman-Aiden et al. 2009) concentrated at the nuclear periphery and around foci in the nuclear center (Figure 5.2C). Spatial clustering of DNA sequences with similar CpG frequencies suggested a correlation between primary sequence features and 3D genome folding.

Our 3D structures revealed different radial preferences across the human genome (black dots in Figure 5.2D). Our results agreed well with whole-chromosome painting data by DNA fluorescent in situ hybridization (FISH) (Boyle et al. 2001) (gray lines in Figure 5.2D), where the gene-rich chromosome 19 and the gene-poor chromosome 18 preferred the nuclear interior and the nuclear periphery, respectively (Figure 5.2E). Within each chromosome, different segments could have distinctly different radial preferences, which were correlated with chromatin compartments. For example, the CpG-rich, euchromatic end (left) of chromosome 1 was heavily biased towards the nuclear center, while some other regions on the same chromosome were biased towards the nuclear periphery (Figure 5.2D). Such fine-scale information cannot be obtained from whole-chromosome painting (Zhou et al. 2017, Boyle et al. 2001) experiments.


Our Dip-C results provided a holistic view of the stochastic, fractal organization of chromatin across different genomic scales. In bulk Hi-C, chromatin forms a “fractal globule” with compartments (Lieberman-Aiden et al. 2009, Rao et al. 2014) and domains such as topologically associating domains (TADs) (Dixon et al. 2012) and CTCF loop-domains (Rao et al. 2014). However, such fractal organization had not been visualized genome-wide in single human cells. We observed spatial clustering (globules) and segregation (insulation) of consecutive chromatin particles along each chromosome (upper panels in Figure 5.2F). Such organization could be quantified by a matrix of radii of gyration of all possible subchains in each chromosome (lower panels in Figure 5.2F). Single-cell domains could then be identified as squares that had relatively small radii (partly similar to (Flyamer et al. 2017)) (Materials and Methods). We found single-cell domains to emerge across all genomic scales and therefore identified them through hierarchical merging, yielding a tree of domains (partly similar to (Fraser et al. 2015, Weinreb and Raphael 2016) in bulk Hi-C) (Figure 5.2F). On the smallest scale, some domains coincided with CTCF loop-domains from bulk Hi-C (Rao et al. 2014) (rightmost panels in Figure 5.2F). Single-cell domains were highly heterogeneous between cells, frequently breaking and merging bulk domains (Figure 5.2G).
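The matrix of radii of gyration for all subchains can be computed in O(N²) time with prefix sums, using R_g²(i..j) = mean‖x‖² - ‖mean x‖² over particles i through j. This NumPy sketch assumes an N × 3 array of consecutive particle coordinates along one chromosome. Domain calling and hierarchical merging are not shown.

```python
import numpy as np


def rg_matrix(coords):
    """Radius of gyration of every subchain coords[i:j+1], as an N x N matrix."""
    x = np.asarray(coords, dtype=float)
    n = len(x)
    c1 = np.vstack([np.zeros(3), np.cumsum(x, axis=0)])            # prefix sums of x
    c2 = np.concatenate([[0.0], np.cumsum((x ** 2).sum(axis=1))])  # prefix sums of |x|^2
    i, j = np.triu_indices(n)
    m = (j - i + 1).astype(float)                                  # subchain lengths
    centroid = (c1[j + 1] - c1[i]) / m[:, None]
    rg2 = (c2[j + 1] - c2[i]) / m - (centroid ** 2).sum(axis=1)
    out = np.zeros((n, n))
    out[i, j] = np.sqrt(np.clip(rg2, 0.0, None))   # clip round-off below zero
    out[j, i] = out[i, j]                          # symmetric matrix
    return out
```

For chromosome-scale inputs (about 12,000 particles for the longest human chromosome at 20 kb), filling the matrix row by row avoids materializing all index pairs at once.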

Traditional methods such as bulk Hi-C and two-color DNA FISH are pairwise measurements and thus cannot study multi-chromosome intermingling. In our 3D models, we quantified multi-chromosome intermingling by the diversity of chromosomes (Shannon’s index) near each 20-kb particle (Figure 5.2H), which revealed genomic regions that frequently contacted multiple chromosomes (Figure 5.2I). These regions were similar between the human cell types despite their different average extents of intermingling, and were mostly euchromatic (CpG-rich) for two reasons: (1) euchromatin more frequently resided on the surface of chromosomes than heterochromatin (consistent with (Nagano et al. 2013)), and (2) even when heterochromatin resided on the surface, it tended to face the nuclear periphery (Stevens et al. 2017) and thus had no partners to intermingle with. The intermingling regions partially overlapped with “hubs” identified by a recent report (Quinodoz et al. 2017).
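The intermingling index of Figure 5.2H can be sketched as the Shannon index H = -Σ p_c ln p_c. Here p_c is the fraction of particles within 3 particle radii that carry chromosome label c. The sketch rests on several assumptions. The query particle itself is counted, natural logarithms are used, and labels may be per homolog (for example "chr1 (mat)"). Distances are computed by brute force, whereas a genome-scale run would use a spatial index such as `scipy.spatial.cKDTree`.

```python
from collections import Counter
from math import log

import numpy as np


def intermingling_index(coords, chrom, radius=3.0):
    """Shannon index of chromosome labels within `radius` of each particle.

    coords: N x 3 coordinates in particle radii; chrom: N chromosome labels.
    """
    x = np.asarray(coords, dtype=float)
    labels = np.asarray(chrom)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)   # brute-force distances
    index = np.empty(len(x))
    for k in range(len(x)):
        counts = Counter(labels[d2[k] <= radius ** 2])          # includes particle k
        total = sum(counts.values())
        index[k] = -sum(c / total * log(c / total) for c in counts.values())
    return index
```

A particle surrounded only by its own chromosome scores 0, and one whose neighborhood is split evenly between two chromosomes scores ln 2.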

We then examined the structural relationship between the maternal and paternal alleles, which can only be studied in diploid cells. Our data captured the structural difference between the two alleles in genomic imprinting. At imprinted loci, the two alleles can differ drastically in transcriptional activity (Lee and Bartolomei 2013). Near the maternally transcribed H19 gene and the paternally transcribed IGF2 gene, bulk Hi-C identified different contact profiles and usage of CTCF loops between the two homologs (Rao et al. 2014). We directly visualized this ~ 0.6-Mb region in single cells (Figure 5.3A). Despite cell-to-cell heterogeneity, the maternal allele more frequently separated IGF2 from both H19 and the nearby HIDAD site and disrupted the IGF2-HIDAD CTCF loop, while the paternal allele more frequently stayed fully intermingled.


Figure 5.3. Distinct 3D structures of the maternal and the paternal alleles. (A) Structural difference between the two alleles of the imprinted H19/IGF2 locus. Despite cell-to-cell heterogeneity, the maternal allele more frequently separated IGF2 from both H19 and the nearby HIDAD site and disrupted the IGF2-HIDAD CTCF loop (white and red circles). Spheres highlighted three CTCF sites from bulk Hi-C. Heatmaps showed the r.m.s. average pairwise distances between all 20-kb particles. Haplotype-resolved bulk Hi-C (black heatmap) was from Figure 7C of (Rao et al. 2014). (B) The active (red) and inactive (blue) X chromosomes preferred extended and compact morphologies, respectively, as shown by cross sections of two representative cells. (C) Individual active and inactive X chromosomes could be distinguished by principal component analysis (PCA) of single-cell chromatin compartments, defined for each 20-kb particle as the average CpG frequency of nearby particles. (D) The inactive X chromosome tended to form previously reported “superloops” (Rao et al. 2014, Darrow et al. 2016, Giorgetti et al. 2016). Superloops were sorted by size (Mb). White spheres denoted 4 superloop anchors (DXZ4, x75, ICCE, and FIRRE). (E) 3D localization of a paternally inherited drug-response single-nucleotide mutation (rs4244285, G to A) in the CYP2C19 gene in a representative GM12878 cell. Arrows represented the directions of transcription of CYP2C genes. (F) 3D localization of two different somatic DNA deletions (results of V(D)J recombination) in the two alleles of the T-cell receptor α/δ locus in a representative T lymphocyte.

X chromosome inactivation (XCI) presents a striking example of the difference between two homologs (Lee and Bartolomei 2013). As expected, we found in the female GM12878 cell line that the active X chromosome (the maternal allele based on RNA expression, Materials and Methods) tended to exhibit an extended morphology, and the inactive X a compact one (Figure 5.3B), although in some cells the morphological difference was not obvious. More consistently, the two X chromosomes in each cell were characterized by their distinct patterns of chromatin compartments. The active X featured clear compartmentalization of euchromatin and heterochromatin, resembling that of the male X (in PBMCs); in contrast, compartments along the inactive X were more uniform. Individual X chromosomes could be clearly separated into two clusters (active and inactive) by principal component analysis (PCA) of single-cell compartments (Figure 5.3C). We also visualized the simultaneous formation of multiple “superloops” (Rao et al. 2014, Darrow et al. 2016, Giorgetti et al. 2016) in the inactive X chromosome (Figure 5.3D).
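Single-cell compartment scores and the PCA that separates active from inactive X chromosomes can be sketched as follows. The neighborhood radius of 3 particle radii is an assumption borrowed from the intermingling analysis, since "nearby" is not quantified in this section. Each row of the PCA input is assumed to be one X chromosome, with particles aligned across chromosomes. Function names are hypothetical.

```python
import numpy as np


def single_cell_compartments(coords, cpg, radius=3.0):
    """Mean CpG frequency of all particles within `radius` (self included)."""
    x = np.asarray(coords, dtype=float)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    near = (d2 <= radius ** 2).astype(float)      # neighborhood indicator matrix
    return near @ np.asarray(cpg, dtype=float) / near.sum(axis=1)


def pca_scores(profiles, n_components=2):
    """Principal-component scores of rows (one row per X chromosome)."""
    p = np.asarray(profiles, dtype=float)
    p = p - p.mean(axis=0)                        # center each particle (column)
    u, s, _ = np.linalg.svd(p, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

Because an inactive X has flatter compartment profiles, most of the variance between X chromosomes lies along the active-versus-inactive direction, which PC1 then captures.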

In contrast to XCI, it remained unknown whether single-cell compartments of two autosomal alleles also vary in a coordinated manner. By decomposing the variability of single-cell compartments into between-cell and within-cell differences (Materials and Methods), we found the two autosomal alleles to fluctuate (with respect to their median compartments) almost independently of each other, exhibiting on average near-zero Spearman’s correlation.
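One way to read this decomposition is sketched below, as an interpretation rather than the documented procedure. First, subtract each locus's median score over all cells and both homologs. Then, within each cell, compute Spearman's correlation across loci between the two homologs' residuals, and average over cells. Subtracting the locus median matters because both homologs share the same underlying compartment pattern, which would otherwise dominate the correlation. The input layout (cells × 2 homologs × loci) is hypothetical, and tied values receive average ranks.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1                                # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho as Pearson's correlation of average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def homolog_coupling(scores):
    """Mean within-cell Spearman correlation between homolog residuals.

    scores: array (cells, 2, loci) of single-cell compartment scores.
    """
    s = np.asarray(scores, dtype=float)
    resid = s - np.median(s, axis=(0, 1), keepdims=True)   # remove locus medians
    return float(np.mean([spearman(r[0], r[1]) for r in resid]))
```

Independent homolog fluctuations give values near zero, whereas fluctuations shared by both homologs of a cell push the value towards 1.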

We can pinpoint genomic changes, such as SNPs and CNVs, to their precise spatial locations in the cell nucleus. The donor of the GM12878 cell line carried a heterozygous G-to-A mutation (rs4244285) in the cytochrome P450 gene CYP2C19, leading to a truncated, non-functional protein variant, CYP2C19*2, and affecting the metabolism of hormones and drugs (Sample Details page, Coriell Institute). Figure 5.3E showed the 3D localization of this drug-response SNP on the paternally inherited chromosome 10 of a GM12878 cell. In addition to inherited mutations, single cells also harbor somatic changes. In lymphocytes, somatic V(D)J recombination generates diversity of immunoglobulins and T-cell receptors by DNA deletions and inversions. Figure 5.3F showed the 3D localization of two V(D)J recombinations at a T-cell receptor locus, leading to two different DNA deletions on the two alleles of chromosome 14 of a T lymphocyte. The capability to spatially localize genomic changes is important for studying cancers and inherited diseases, where mutations can have dramatic consequences.

We next examined the cell-type dependency of 3D genome structures. Similar to haploid mESCs (Stevens et al. 2017), chromosomes in diploid mESCs preferred the Rabl configuration (centromeres pointing towards one side of the nucleus, and telomeres towards the other), albeit to a different extent in each cell (Figure 5.4A). In contrast, the Rabl configuration was weak in most GM12878 cells and PBMCs. Most PBMCs pointed their centromeres towards the nuclear periphery and their telomeres towards the nuclear center, consistent with a previous report (Weierich et al. 2003). By contrast, the M/G1-phase GM12878 cell pointed its centromeres towards the outer rim of a characteristic mitotic rosette.

Figure 5.4. Cell-type-specific chromatin structures. (A) Quantification of the organization of centromeres and telomeres. The mESCs exhibited a stronger Rabl configuration (horizontal axis: the length of the summed centromere-to-telomere vectors normalized by the total particle number; axis limit = 0.005 particle radii), while the PBMCs tended to point centromeres outwards relative to telomeres (vertical axis: the summed centromere-to-telomere difference in distance from the nuclear center of mass normalized by the total particle number; axis limit = 0.007 particle radii). Each marker represented a single cell and was inferred by V(D)J recombination in PBMCs.

Figure 5.4 (Continued). (B) Quantification of chromosome intermingling (vertical axis: the average fraction of nearby particles that were not from the same chromosome) and chromatin compartmentalization (horizontal axis: Spearman’s correlation between each particle’s own CpG frequency and the average of nearby particles). (C) Example cross sections of 3 cell types, colored by chromosome (left) or by the multi-chromosome intermingle index (right). (D) Among the human cells, 4 cell-type clusters (shaded), namely B lymphoblastoid cells, presumable T lymphocytes, B lymphocytes, and presumable monocytes/neutrophils (PBMC Cells 9, 14, and 18), could be distinguished by the differential formation (defined as end-to-end distance ≤ 3 particle radii) of known cell-type-specific promoter-enhancer loops from published bulk promoter capture Hi-C (Javierre et al. 2016). (E) The same 4 clusters could also be distinguished by unsupervised clustering via PCA of single-cell chromatin compartments, without the need for bulk data. The two alleles of each locus were treated as two different loci. (F) An example region that was differentially compartmentalized between two cell types (black: B lymphoblastoid cells; red: presumable T lymphocytes). Right panels visualized the configuration of the ~0.5-Mb region (chr13: 62.5–63 Mb, thick yellow sticks) with respect to the rest of the genome (transparent, colored by CpG frequency) in two representative cells. Only the paternal alleles were shown. Bulk Hi-C data (black heatmap) were from previous studies (Rao et al. 2014, Jeong et al. 2017). GM12878 Cell 4 was excluded. GM12878 Cell 16 was excluded from (D) and (E).

The overall extent of chromosome intermingling also differed among the cell types. Chromosomes tended to intermingle less in mESCs and more in PBMCs, with GM12878 in the middle (Figure 5.4B, Figure 5.4C), consistent with previous reports that chromosomes intermingled less in pluripotent mESCs than in terminally differentiated fibroblasts (Maharana et al. 2016), and that chromosomes intermingled more in resting human lymphocytes than in activated ones (which resembled GM12878) (Branco et al. 2008). As expected (Nagano et al. 2017, Naumova et al. 2013), the M/G1-phase cell exhibited a low level of chromosome intermingling and the lowest level of chromatin compartmentalization.

Cell-type-dependent promoter-enhancer looping is believed to underlie differential gene expression (Javierre et al. 2016). Among the human cells, differential formation of known cell-type-specific promoter-enhancer loops (based on bulk promoter capture Hi-C of purified cell types (Javierre et al. 2016); Materials and Methods) clearly separated the single cells into four cell-type clusters: B lymphoblastoid cells (GM12878), presumable T lymphocytes, B lymphocytes, and presumable monocytes/neutrophils (Figure 5.4D).

Cell-type clusters could be equally well separated in an unsupervised manner, without prior knowledge of the cell types. Unlike ensemble-averaged structures such as protein crystal structures, single-cell 3D genomes are intrinsically stochastic and dynamic, and are further blended with measurement uncertainties. Statistical characterization such as PCA is therefore necessary to distinguish different cell types; in such an analysis, clusters of single cells correspond to valleys in a Waddington landscape (Waddington 1957) of cellular phenotypes. This kind of cell typing has been carried out with phenotypic variables such as single-cell transcriptomes (Tang et al. 2009) and open chromatin regions (Buenrostro et al. 2015, Cusanovich et al. 2015), each of which must reflect underlying structural differences in the 3D genome.

With Dip-C, we are in a position to carry out cell typing with genome structures as the sole variable. Given the high information content of 3D structures, many possible features might be used in cluster analysis. Here we chose single-cell chromatin compartments as the input variable of PCA. The four cell-type clusters were clearly separated (Figure 5.4E), with one of the most differentially compartmentalized regions shown in Figure 5.4F. Previous reports (Nagano et al. 2013, Nagano et al. 2017, Stevens et al. 2017, Ramani et al. 2017, Flyamer et al. 2017) had focused on defining the width, or spread, of a single Waddington valley, studying for example cell-cycle dynamics within a cell type and domain stochasticity within a cell-cycle phase. Our PCA result, in contrast, highlighted the consistent differences among cell types, signifying the separation between Waddington valleys.

Finally, we applied Dip-C to the mouse olfactory system. OSNs are known to form an unusual nuclear architecture, in which centromeres and OR genes from different chromosomes aggregate into epigenetically silenced foci, partly because their anchor to the nuclear periphery, the lamin-B receptor (LBR), is no longer expressed (Clowney et al. 2012, Lyons et al. 2013, Lyons et al. 2014), while a network of OR enhancers from different chromosomes also aggregates and presumably helps activate the one transcribed OR gene (Monahan et al. 2017, Lomvardas et al. 2006, Markenscoff-Papadimitriou et al. 2014, Khan, Vaes, and Mombaerts 2011, Serizawa et al. 2003). This presents an ideal application for Dip-C, because different cells typically express different ORs, necessitating single-cell approaches. In a preliminary experiment, we performed Dip-C on 11 single cells from the mouse MOE. Consistent with previous reports, 9 cells formed large, interchromosomal aggregates of OR genes (Figure 5.5). We also found chromosomes in these putative OSNs to intermingle extensively. In contrast, the 2 other cells lacked OR-gene aggregates and extensive chromosome intermingling, and were thus putative supporting cells. Our result presents a first step towards understanding the 3D chromatin structures of all OR genes and enhancers in single OSNs. Further mouse work and data analysis may provide important insights into the chromatin structure behind the “one-neuron-one-receptor” rule.

Figure 5.5. Preliminary 3D genome structures from the mouse main olfactory epithelium. OR enhancers (rightmost panels) were defined by (Monahan et al. 2017).

Discussion

Our initial examination of only a handful of cell types has clearly shown the tissue-dependence of 3D genome structures. A systematic survey of more cell types under various conditions will likely lead to new discoveries in cell differentiation, genesis of cancer, learning and memory, and aging.


Materials and Methods

Human subjects

The collection and analysis of peripheral blood from the male donor (the same donor as in Lu et al. 2012) were approved by the Institutional Review Board (IRB) at Harvard.

Animals

All mouse experiments were performed in accordance with relevant guidelines and regulations.

Animal protocols were approved by Harvard IACUC. Male B6D2F1/J (JAX) mice were used.

The MOE was dissociated as previously described (Tan, Li, and Xie 2015).

Published data

Numbers of contacts were taken from Extended Data Table 1 (column “final contacts”) of (Stevens et al. 2017) (n = 8); from the GEO accession GSE94489 (column “total_contacts” from files “GSE94489_2i_diploids_features_table.txt.gz” and “GSE94489_serum_diploids_features_table.txt.gz”, keeping only cells whose “group” column is “G1”) of (Nagano et al. 2017) (n = 750); and from Table S1 (column “total number of contacts”, keeping only cells whose “cell type” column is “K562”, “Intermediate”, “Intermediate-Hoechst”, “SN”, “SN-Hoechst”, “NSN”, or “NSN-Hoechst”) of (Flyamer et al. 2017) (n = 34 for K562 and n = 120 for mouse oocytes).

Centromere coordinates were downloaded from the Table Browser of the UCSC Genome Browser (“human” -> “Feb. 2009 (GRCh37/hg19)” -> “mapping and sequencing” -> “gap”, and “mouse” -> “Dec. 2011 (GRCm38/mm10)” -> “mapping and sequencing” -> “gap”).

Raw data of mESCs were downloaded from the GEO accession (GSE94489) of (Nagano et al. 2017). Samples were demultiplexed by fastq-multx (Aronesty 2013) (version 1.3.1). A subset of 10 diploid mESCs was chosen such that the cells were both flow-sorted and inferred in silico to be in G1 phase, harbored no large chromosomal aberrations, and were among the top cells in terms of the number of contacts. These cells yielded a median of 3.09 million author-defined contacts (n = 10, min = 2.98 million, max = 3.71 million) (Nagano et al. 2017), or 2.26 million by our definition (min = 2.01 million, max = 2.84 million).

SNP data were taken from the Illumina Platinum Genomes FTP site (the file “2016-1.0/hg19/small_variants/NA12878/NA12878.vcf.gz”) for GM12878, from Mingyu Yang (Lu et al. 2012) (the folder “Sperm_project_released_data/04.haplotypes/combined_haplotypes”) for the blood donor, and from the Sanger Institute Mouse Genomes Project (the file “mgp.v5.merged.snps_all.dbSNP142.vcf.gz”) for mESCs.

Promoter capture Hi-C interactions were taken from “PCHiC_peak_matrix_cutoff5.tsv” in Data S1 of (Javierre et al. 2016). Only intrachromosomal interactions with a minimum genomic distance of 100 kb were kept (n = 632,986). For each interaction, the interaction strength in B lymphocytes, T lymphocytes, or monocytes/neutrophils was calculated by averaging the corresponding columns (“nB” and “tB” for B lymphocytes; “nCD4”, “tCD4”, “nCD8”, and “tCD8” for T lymphocytes; “Mon” and “Neu” for monocytes/neutrophils). Cell-type-specific interactions were defined as those with strength ≥ 6 in the cell type of interest and ≤ 3 in the other 2 cell types (n = 12,338 for B lymphocytes and n = 8,538 for monocytes/neutrophils). In each single cell, the level of cell-type-specific promoter proximity was defined as the percentage of interacting pairs that were within a 3D distance of 3 particle radii.
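The thresholding rule and the proximity score above can be sketched in Python as follows. This is a minimal illustration, not the original analysis code: the column names come from the text, but the one-dictionary-per-row layout, the function names, and the particle-coordinate representation are hypothetical.

```python
import math
from statistics import mean

# Column groups named in the text (Javierre et al. 2016 peak matrix).
GROUPS = {
    "B": ["nB", "tB"],
    "T": ["nCD4", "tCD4", "nCD8", "tCD8"],
    "MonNeu": ["Mon", "Neu"],
}


def group_strengths(row):
    """Average the interaction scores of the columns in each cell-type group."""
    return {group: mean(row[col] for col in cols) for group, cols in GROUPS.items()}


def is_cell_type_specific(row, cell_type, high=6.0, low=3.0):
    """Strength >= high in `cell_type` and <= low in both other groups."""
    strengths = group_strengths(row)
    others = [v for g, v in strengths.items() if g != cell_type]
    return strengths[cell_type] >= high and all(v <= low for v in others)


def proximity_percentage(pairs, coords, cutoff=3.0):
    """Percentage of interacting particle pairs within `cutoff` particle radii.

    pairs: list of (particle_a, particle_b) keys; coords: key -> (x, y, z).
    """
    if not pairs:
        return 0.0
    close = sum(1 for a, b in pairs if math.dist(coords[a], coords[b]) <= cutoff)
    return 100.0 * close / len(pairs)
```

In this sketch, the pairs passed to `proximity_percentage` would already be restricted to intrachromosomal interactions at least 100 kb apart, as described above.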

Bulk Hi-C on GM12878 (Rao et al. 2014) and on T lymphocytes (Jeong et al. 2017) was visualized by Juicebox.js (Robinson et al. 2017) with balanced normalization.

Generation of a list of phased SNPs

For GM12878, all 2.15 million heterozygous SNPs were extracted from the VCF file, assuming a genotype format of “paternal | maternal”.

For PBMCs, raw sequencing reads of the family trio were downloaded from the SRA (SRX205465 for the blood donor, SRX205467 for the donor’s mother, SRX205466 for the donor’s father) (Lu et al. 2012) and mapped without pre-processing. SNPs were jointly called by GATK (version 3.8.0) HaplotypeCaller with default parameters and filtered with recommended parameters (“-selectType SNP” for “SelectVariants”, “--filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"” for “VariantFiltration”, and “--excludeFiltered -restrictAllelesTo BIALLELIC” for “SelectVariants” again). Population-based phasing on autosomes was performed by shapeit (version v2.r837) with its genetic map (“genetic_map_b37”), reference panel (“1000GP_Phase3”), and recommended effective population size for Asians (“--effective-size 14269”), yielding 1.92 million phased SNPs. Population-based phasing was then combined with published sperm-based phasing (1.23 million SNPs), yielding 1.98 million phased SNPs. When the two were in conflict (0.03 million out of 1.17 million shared SNPs), sperm-based results were used.
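The conflict rule for combining the two phasing sources can be illustrated with a small Python sketch. It assumes, hypothetically, that both sources have already been converted to a common dictionary keyed by (chromosome, position) with genotypes oriented as (paternal, maternal); the function name and data layout are illustrative only.

```python
def combine_phasing(population, sperm):
    """Merge two phasing results, preferring sperm-based calls on overlap.

    Both arguments map a SNP key, e.g. (chrom, pos), to a phased genotype
    (paternal_allele, maternal_allele). Returns the merged dict and the
    number of shared SNPs on which the two sources disagreed.
    """
    conflicts = sum(
        1 for key, gt in sperm.items() if key in population and population[key] != gt
    )
    merged = dict(population)  # start from population-based phasing
    merged.update(sperm)       # sperm-based phasing overrides on shared SNPs
    return merged, conflicts
```

With the counts above, the union contains 1.92 + 1.23 − 1.17 = 1.98 million SNPs, matching the reported total.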


For mESCs, the parental lines were 129S4/SvJaeJ × Castaneus (Stevens et al. 2017). As an approximation, genotypes of 129S1/SvImJ and CAST/EiJ were extracted from the dbSNP VCF file, yielding 21.50 million sites that differed between the two strains.

For the MOE, the parental lines were C57BL/6J female × DBA/2J male, each of which was extracted from the dbSNP VCF file.

Identification of the active and the inactive X chromosomes by RNA expression

Although the GM12878 cell line that we obtained (Coriell Institute) was not a single-cell clone, it anecdotally preferred the maternal X chromosome as the active one. To confirm this by RT-PCR, we extracted total RNA using RNeasy Mini with DNase (Qiagen) and a centrifugal homogenizer (Invitrogen), and synthesized cDNA using ProtoScript II (NEB) with random primers. Regions harboring the following heterozygous SNPs (according to ENCODE) were amplified with Q5 Hot Start High-Fidelity Master Mix (NEB) and analyzed by Sanger sequencing:

1. XIST (from the inactive X): rs1620574, paternal = C, maternal = T, primers = ACTGGATGGAAGACCACAAC + GTGTCTTGGGTAGCAGAAGAA.
2. EBP (from the active X): rs3048, paternal = T, maternal = G, primers = TATACACACGCAGCCATCAG + CTTCACAGCATCAAGCACAAG.
3. TBL1X (from the active X): rs16985675, paternal = A, maternal = G, primers = TGTGATGGCTGAATGGAAAGA + GAAAGGTACAGAGGGAGAGAGA.
4. SLC25A53 (also known as MCART6, from the active X): rs5916825, paternal = A, maternal = G, primers = GCACTGCAGGTGGAAAGAATA + GCTGTGGCTGGAAATCCTAAA.
5. ATRX (from the active X): rs3088074, paternal = G, maternal = C, primers = TCTCCATCAGTTGTTCCATTCT + CTTCCACTGATGGTGTCGATAA.

Isolation of PBMCs from blood

Blood was drawn into K2EDTA-coated tubes (BD) and placed on ice immediately. PBMCs were isolated according to the manual of Ficoll-Paque PLUS (GE), with 1 X PBS + 2 mM EDTA as the salt solution.

Chromatin conformation capture in Dip-C

We improved the sensitivity in three respects. First, all biotin-related steps were omitted. This modification avoided the inefficient procedures of single-cell biotin pulldown and blunt-end DNA ligation, bearing more similarity to the original 3C (Dekker et al. 2002) than to its Hi-C derivatives (Lieberman-Aiden et al. 2009, Rao et al. 2014), and was independently adopted by others (Flyamer et al. 2017, Li et al. 2017). Second, the ligation product was amplified by a sensitive and uniform whole-genome amplification method, META (Xing et al. U.S. provisional patent 62/509,981), analogous to our recently published LIANTI (Chen et al. 2017). Third, META added sequencing adapters through PCR, rather than through ligation as in traditional library preparations. This modification reduced artifactual paired-end chimeras, and was independently adopted by others (Nagano et al. 2017, Stevens et al. 2017). Below is the detailed procedure.

Cells were fixed by 2% PFA (EMS) in PBS or culture media without serum at room temperature for 10 min with rotation. PFA was quenched by the addition of 2 M glycine (0.2-um filtered) to a final concentration of 0.127 M and incubation on ice for 5 min. Cells were washed with ice-cold PBS (centrifugation: 600 g, 5 min), and pellets were stored at −80 C.

Permeabilization and digestion were performed by removing biotin-related steps from published Hi-C protocols. Results were comparable between different protocols.

In one variant (based on (Nagano et al. 2017); for replicate 1 of GM12878), cells were first permeabilized in 1 mL ice-cold Permeabilization Buffer (10 mM Tris pH 8.0, 10 mM NaCl, 0.2% Igepal CA 630 (Sigma), cOmplete Mini EDTA-free (Roche)) on ice for 30 min with occasional inversion, washed with 800 uL 1.24 X NEBuffer 3 (NEB) (centrifugation: 600 g, 6 min), and further permeabilized in 400 uL 1.24 X NEBuffer 3 + 0.3% SDS at 37 C for 1 h with 950 RPM shaking. SDS was quenched by the addition of 40 uL 20% Triton X-100 and incubation at 37 C for 1 h with 950 RPM shaking. Cells were then digested by the addition of 50 uL 25 U/uL MboI (NEB R0147M) and incubation at 37 C overnight with 950 RPM shaking.

In another variant (based on (Rao et al. 2014); for replicate 2 of GM12878 and for PBMCs), cells were first permeabilized in 500 uL ice-cold Hi-C Lysis Buffer (10 mM Tris pH 8.0, 10 mM NaCl, 0.2% Igepal CA 630) and 100 uL protease inhibitors (Sigma P8340) on ice for ≥ 15 min, washed with ice-cold Hi-C Lysis Buffer (centrifugation: 2500 g, 5 min), and further permeabilized in 50 uL 0.5% SDS at 62 C for 10 min. SDS was quenched by the addition of 145 uL water and 25 uL 10% Triton X-100 and incubation at 37 C for 15 min with rotation. Cells were then digested by the addition of 25 uL 10 X NEBuffer 2 (NEB) and 20 uL 25 U/uL MboI (NEB R0147M) and incubation at 37 C overnight with rotation. This variant might reduce clumping and disappearance of some cells.

On the next day, cells were washed with 1 mL Ligation Buffer (1 X T4 DNA ligase buffer (NEB B0202S), 0.1 mg/mL BSA (NEB B9000S)), and ligated in 1 mL Ligation Buffer and 10 uL 1 U/uL T4 DNA ligase (Life Tech 15224-025) at 16 C for 4 h.

Single-cell isolation by flow cytometry

For GM12878 and PBMCs, ligated cells (in Ligation Buffer) were filtered through a 40-um cell strainer (Falcon) and sorted with a FACSJazz flow cytometer (BD; 100-um nozzle) into 0.2-mL UV-irradiated DNA low-bind tubes (MAXYMum Recovery, Axygen) containing lysis buffer. Events were first gated on FSC and SSC as “cells”, and then on FSC and trigger pulse width as “singlets”. The sorting mode was “1.0 drop single”.

Whole-genome amplification in Dip-C

Design of META: In Nextera kits (Illumina), two different tags, each harboring one side of the Illumina adapter, were randomly inserted into the input DNA. The two tags would then function as two PCR primers to amplify the resulting fragments. Fragments that ended with two different tags would be amplified, while fragments that ended with the same tag on both sides would be lost. As a result, at least 50% of input DNA would be lost.

In META (Xing et al. U.S. provisional patent 62/509,981), such loss was greatly reduced by inserting n different tags. As a result, only 1/n of input DNA would be lost. Illumina adapters were added later by two short PCR steps.
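The 1/n figure follows if each fragment end independently receives one of the n tags with equal probability (an assumption made here for illustration): the chance that both ends carry the same tag is n × (1/n)² = 1/n. A short Python sketch compares this closed form with a Monte Carlo simulation; the function names are hypothetical.

```python
import random


def expected_loss(n_tags):
    """Expected fraction of fragments with identical tags on both ends."""
    return 1.0 / n_tags


def simulated_loss(n_tags, n_fragments=100_000, seed=0):
    """Monte Carlo estimate: draw a tag uniformly at random for each end."""
    rng = random.Random(seed)
    same = sum(
        1 for _ in range(n_fragments)
        if rng.randrange(n_tags) == rng.randrange(n_tags)
    )
    return same / n_fragments


for n in (2, 20):
    print(n, expected_loss(n), round(simulated_loss(n), 3))
```

Under this assumption, the two-tag Nextera design loses about half of the fragments, whereas the 20-tag design used here loses about 5%.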


In this work, we used META with n = 20 tags:

1. AGAAGCCGTGTGCCGGTCTA
2. ATCGTGCGGACGAGACAGCA
3. AATCCTAGCACCGGTTCGCC
4. ACGTGTTGCAGGTGCACTCG
5. ACACCACACGGCCTAGAGTC
6. TGGACAATCACGCGACCAGC
7. TCATCTAACGCGCACCGTGC
8. TTCGTCGGCTCTCTCGAACC
9. TGGTGGAGCGTGCAGACTCT
10. TATCTTCCTGCGCAGCGGAC
11. CTGACGTGTGAGGCGCTAGA
12. CCATCATCCAACCGGCTTCG
13. CACGAGAAGCCGTCCGCTTA
14. CGTACGTGCAACACTCCGCT
15. CTTGGTCAGGCGAGAAGCAC
16. GGCGTGATCAGTGCGTGGAT
17. GAGCGTTTGGTGACCGCCAT
18. GCCTGCGGTCCATTGACCTA
19. GTAAGCCACTCCAGCGTCAC
20. GATCTGTTGCGCGTCTGGTG

Preparation of META reagents: Carrier ssDNA (for use in Lysis Buffer) could be either the same as in LIANTI (5′-TCAGGTTTTCCTGAA-3′) (Chen et al. 2017) or the same as the META 20-primer Mix. Store at −20 C.

Transposome (for use in Transposition Mix) was partly similar to LIANTI (Chen et al. 2017), but with two modifications. First, one strand of the transposon was 5′-/Phos/-CTGTCTCTTATACACATCT-3′, while the other strand was in the form of 5′-[META tag]-AGATGTGTATAAGAGACAG-3′. Each of the oligos (IDT, purification: PAGE) was dissolved in 0.1 X TE to a final concentration of 100 uM. For each of the n = 20 META tags, the two strands were annealed at a final concentration of 5 uM each. The 20 annealed transposons were then pooled in equal volumes. Second, the transposase was purified after expression from the pTXB1-Tn5 plasmid (Addgene). The transposome was assembled at a final concentration of 1.25 uM dimer (2.5 uM monomer), diluted 1:10 (125 nM dimer, or 250 nM monomer), aliquoted for single use, and stored at −80 C.

20-primer Mix (for use in PCR Mix 1) was in the form of 5′-[META tag]-AGATGTGTATAAG-3′. Each of the oligos (IDT, purification: standard desalting) was dissolved in 0.1 X TE to a final concentration of 100 uM, and the 20 oligos were combined in equal volumes (100 uM total, or 5 uM each). Store at −20 C.

40-primer Mix (for use in PCR Mix 2) was in the form of 5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-[META tag]-AGATGTGTATAAG-3′ for one side of the Illumina adapter, and 5′-GACTGGAGTTCAGACGTGTGCTCTTCCGATCT-[META tag]-AGATGTGTATAAG-3′ for the other. Each of the oligos (IDT, purification: PAGE) was dissolved in 0.1 X TE to a final concentration of 50 uM, and the 40 oligos were combined in equal volumes (50 uM total, or 1.25 uM each). Store at −20 C.
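As a bookkeeping aid, the oligo designs above can be assembled programmatically. The Python sketch below builds the transposon top strands and the 20-primer and 40-primer sequences from the 20 META tags listed earlier. It also checks that the 19-nt phosphorylated strand is the reverse complement of the mosaic end, and it reproduces the per-oligo concentrations and the 78-bp minimum META adapter length (2 × (20 + 19)) quoted for the library structure. The helper names are hypothetical, and the 5′ phosphate is not modeled.

```python
META_TAGS = [
    "AGAAGCCGTGTGCCGGTCTA", "ATCGTGCGGACGAGACAGCA", "AATCCTAGCACCGGTTCGCC",
    "ACGTGTTGCAGGTGCACTCG", "ACACCACACGGCCTAGAGTC", "TGGACAATCACGCGACCAGC",
    "TCATCTAACGCGCACCGTGC", "TTCGTCGGCTCTCTCGAACC", "TGGTGGAGCGTGCAGACTCT",
    "TATCTTCCTGCGCAGCGGAC", "CTGACGTGTGAGGCGCTAGA", "CCATCATCCAACCGGCTTCG",
    "CACGAGAAGCCGTCCGCTTA", "CGTACGTGCAACACTCCGCT", "CTTGGTCAGGCGAGAAGCAC",
    "GGCGTGATCAGTGCGTGGAT", "GAGCGTTTGGTGACCGCCAT", "GCCTGCGGTCCATTGACCTA",
    "GTAAGCCACTCCAGCGTCAC", "GATCTGTTGCGCGTCTGGTG",
]
MOSAIC_END = "AGATGTGTATAAGAGACAG"      # follows the META tag in the transposon
BOTTOM_STRAND = "CTGTCTCTTATACACATCT"   # 5'-phosphorylated strand (phosphate not modeled)
PRIMER_3P = "AGATGTGTATAAG"             # 3' end shared by the 20- and 40-primer Mixes
ILLUMINA_SIDES = (
    "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "GACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def transposon_top_strands(tags):
    """5'-[META tag]-MOSAIC_END-3' for each tag."""
    return [tag + MOSAIC_END for tag in tags]


def primer_mix_20(tags):
    """5'-[META tag]-AGATGTGTATAAG-3' for each tag."""
    return [tag + PRIMER_3P for tag in tags]


def primer_mix_40(tags):
    """Both Illumina-side extensions for each tag (2 x 20 = 40 oligos)."""
    return [side + tag + PRIMER_3P for side in ILLUMINA_SIDES for tag in tags]


def per_oligo_uM(total_uM, n_oligos):
    """Concentration of each oligo after pooling equal volumes."""
    return total_uM / n_oligos
```

For example, `per_oligo_uM(100, 20)` gives the 5 uM per oligo of the 20-primer Mix, and `per_oligo_uM(50, 40)` gives the 1.25 uM per oligo of the 40-primer Mix.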

Cell lysis: Single cells were lysed in 3 uL META Lysis Buffer (20 mM Tris pH 8.0, 20 mM NaCl, 0.1% Triton X-100, 15 mM DTT, 1 mM EDTA, 1.5 mg/mL Qiagen protease, 0.5 uM carrier ssDNA) at 50 C for 6 h, 65 C for 12 h, and 70 C for 30 min. For some cell types, lysis may need to be shortened (for example 65 C for 1 h, 70 C for 15 min). Lysed cells could be stored at −80 C for a few months.

Alternatively, single cells might be directly placed in empty tubes and stored at −80 C for longer times before addition of META Lysis Buffer, although we have not tested this extensively.

Whole-genome amplification: Lysate was transposed by the addition of 5 uL Transposition Mix (leading to final concentrations of 10 mM TAPS pH 8.5, 5 mM MgCl2, 8% PEG 8000, and 1:2,640 (0.5 nM dimer) META transposome) and incubation at 55 C for 10 min. Transposases were removed by the addition of 2 uL Stop Mix (1 uL 2 mg/mL Qiagen protease diluted in water, and 1 uL 0.5 M NaCl, 75 mM EDTA) and incubation at 50 C for 40 min and 70 C for 20 min.

Whole-genome amplification was performed by the addition of 10 uL PCR Mix 1 (4 uL Q5 reaction buffer (NEB), 4 uL Q5 high GC enhancer (NEB), 0.5 uL 100 mM MgCl2, 0.5 uL 100 uM (total) META 20-primer Mix, 0.4 uL 10 mM (each) dNTP mix, 0.2 uL water, 0.2 uL 20 mg/mL BSA (NEB B9000S), 0.2 uL Q5 (NEB M0491S)) and incubation at 72 C for 3 min, 98 C for 20 s, 12 cycles of [98 C for 10 s, 65 C for 1 min, 72 C for 2 min], and 65 C for 5 min.


Optionally, the amplification product could be purified at this step and analyzed with a High Sensitivity DNA Kit on a Bioanalyzer (Agilent) for quality control.

Library preparation: Sequencing libraries were prepared by two additional PCR steps. In the first PCR step, previous primers were removed by the addition of 0.5 uL 20 U/uL ExoI (NEB M0293S) and incubation at 37 C for 30 min and 72 C for 20 min. White precipitates might form at this step or at the following steps. PCR was performed by the addition of 9.5 uL PCR Mix 2 (2 uL Q5 reaction buffer (NEB), 2 uL Q5 high GC enhancer (NEB), 3 uL 50 uM (total) META 40-primer Mix, 0.2 uL 10 mM (each) dNTP mix, 2.2 uL water, 0.1 uL Q5 (NEB M0491S)) and incubation at 98 C for 30 s, 2 cycles of [98 C for 10 s, 65 C for 1 min, 72 C for 2 min], and 65 C for 5 min. In the second PCR step, primers were similarly removed by the addition of 0.5 uL 20 U/uL ExoI (NEB M0293S) and incubation at 37 C for 30 min and 72 C for 20 min. PCR was similarly performed by the addition of 2.5 uL NEB Index Primer (NEB E7335S, E7500S, E7710S, E7730S) and 7 uL PCR Mix 3 (2 uL Q5 reaction buffer (NEB), 2 uL Q5 high GC enhancer (NEB), 2.5 uL NEB Universal Primer, 0.2 uL 10 mM (each) dNTP mix, 0.2 uL water, 0.1 uL Q5 (NEB M0491S)) and incubation at 98 C for 30 s, 2 or more cycles of [98 C for 10 s, 65 C for 1 min, 72 C for 2 min], and 65 C for 5 min. Libraries could be pooled at this step or at any step afterwards.

Libraries were purified with a DNA Clean and Concentrator-5 column (Zymo D4013) with 200 uL DNA Binding Buffer (a ratio of 1:5) and eluted in 25 uL 0.1 X TE. Size selection was performed with Ampure XP beads (Beckman Coulter, typically 0.65 X). A representative Bioanalyzer trace was shown in Figure S1B. Note that only a fraction of the molecules were Illumina libraries (harboring genomic DNA + 78 bp or more of META adapters + ~130 bp of Illumina adapters), while the others had incompatible or partial Illumina adapters. Therefore, libraries must be quantified by qPCR.

Sequencing: Libraries were quantified by qPCR and sequenced with paired-end 250-bp reads on a HiSeq 2500 (Illumina). To avoid low base diversity (especially at the 19 bp right after the META tag, which are identical in every read), 20% PhiX was added. Raw sequencing outputs were 10–47 Gb per cell, corresponding to raw sequencing depths of 3–16 X. Similar to LIANTI (Chen et al. 2017), mapped outputs were lower because of transposed sequences and overlapping reads 1 and 2.

Statistical property of interchromosomal and long-range intrachromosomal contacts

In bulk Hi-C, the probability of intrachromosomal contacts is well known to decrease systematically with genomic separation (bp); in particular, the probability of a contact joining coordinate $x$ (bp) and $x + \Delta x$ on the same chromosome is approximately proportional to $\Delta x^{-1}$ (the “fractal globule” model) (Lieberman-Aiden et al. 2009, Naumova et al. 2013, Nagano et al. 2017).

We wondered whether a similar rule governed interchromosomal contacts. In particular, given that an interchromosomal contact joined coordinate $x$ on one chromosome and $y$ on another, the two contacting chromosomes might be seen as “tethered” at $(x, y)$ and thus form more contacts nearby. Such conditional properties were hidden in bulk Hi-C because interchromosomal contacts were highly stochastic.


We began by considering the conditional probability, given a contact joining coordinate $x$ (bp) on one chromosome and $y$ (bp) on another, that another contact joined $x + \Delta x$ and $y + \Delta y$. Naively, if the two contacting chromosomes were “concatenated” at $(x, y)$ and intermingled as if they were a single chromosome, the conditional probability density would be proportional to $(\Delta x + \Delta y)^{-1}$, in other words, the inverse of the $L_1$ norm of $(\Delta x, \Delta y)$. In contrast, we found the conditional probability density to be approximately proportional to the inverse of the $L_{0.5}$ norm of $(\Delta x, \Delta y)$, which is $\left(\sqrt{\Delta x} + \sqrt{\Delta y}\right)^{2}$; in other words, $p(\Delta x, \Delta y) \propto \left(\sqrt{\Delta x} + \sqrt{\Delta y}\right)^{-2}$. Therefore, simultaneously large $\Delta x$ and $\Delta y$ were relatively disfavored, suggesting that two contacting chromosomes tended not to fully intermingle but rather to protrude into each other. This empirical formula held across different genomic scales and also for long-range intrachromosomal contacts. Similar to the unconditional $\Delta x^{-1}$ density of intrachromosomal contacts (Lieberman-Aiden et al. 2009, Naumova et al. 2013, Nagano et al. 2017), the conditional density also varied systematically across the cell cycle.

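Utilizing the $L_{0.5}$ property, we defined a contact “neighborhood” as a superellipse with radius = 10 Mb and exponent = 0.5, that is, the set of contacts satisfying $\sqrt{|\Delta x|} + \sqrt{|\Delta y|} \le \sqrt{10\ \text{Mb}}$ relative to the contact of interest. The haplotypes of contacts within this neighborhood could vote to impute the haplotypes of the contact of interest.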
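The $L_{0.5}$ distance and the superellipse neighborhood can be written compactly. The following Python sketch is an illustration under stated assumptions, not the Dip-C implementation. A contact is represented as a hypothetical tuple (chrom_a, pos_a, chrom_b, pos_b) whose legs are assumed to be consistently ordered, and two contacts are compared only if they join the same pair of chromosomes.

```python
import math


def l05_distance(dx, dy):
    """L0.5 'distance' (sqrt|dx| + sqrt|dy|)**2, in the same units as dx and dy."""
    return (math.sqrt(abs(dx)) + math.sqrt(abs(dy))) ** 2


def l1_distance(dx, dy):
    """L1 distance |dx| + |dy|, for comparison with the naive concatenation model."""
    return abs(dx) + abs(dy)


def in_neighborhood(c1, c2, radius=10_000_000):
    """True if contact c2 lies within the L0.5 superellipse of `radius` bp around c1."""
    if (c1[0], c1[2]) != (c2[0], c2[2]):
        return False
    return l05_distance(c1[1] - c2[1], c1[3] - c2[3]) <= radius
```

For example, a neighbor offset by 4 Mb on both legs is 8 Mb away in $L_1$ distance but 16 Mb away in $L_{0.5}$ distance, so it falls outside the 10-Mb neighborhood. This is the sense in which simultaneously large $\Delta x$ and $\Delta y$ are disfavored.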

3D reconstruction in Dip-C

Code will be available on GitHub (https://github.com/tanlongzhi/dip-c). Starting from FASTQ files, 3D reconstruction consisted of the following steps: preprocessing → alignment → contact identification → artifact removal → haplotype imputation (2D) → [with replicates from here on] 3 rounds of 3D reconstruction at 100-kb resolution + haplotype imputation (3D) → 2 rounds of 3D reconstruction at 20-kb resolution + haplotype imputation (3D). Details of the procedures and file formats will be found on the GitHub page; we welcome suggestions and efforts to make them better and easier to use for the single-cell 3C/Hi-C community. Below is a brief description of each analysis step:

Preprocessing: Most reads followed the format [META tag]-AGATGTGTATAAGAGACAG-[genomic DNA]-CTGTCTCTTATACACATCT-[reverse complement of another META tag], although a small fraction harbored extra META adapters (and, very rarely, genomic DNA in between, which was discarded). Similar to (Chen et al. 2017), META and Illumina adapters were removed, and the two ends (read 1 and read 2) were merged if they overlapped.
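For a read that follows the standard layout, adapter removal amounts to cutting at the two mosaic-end sequences. The Python sketch below assumes exact matches and a single insert per read. It does not handle sequencing errors, extra META adapters, or read-pair merging, all of which a real preprocessor would need to address. The function name is hypothetical.

```python
MOSAIC_END = "AGATGTGTATAAGAGACAG"     # follows the 5' META tag
MOSAIC_END_RC = "CTGTCTCTTATACACATCT"  # precedes the reverse-complemented 3' tag


def extract_insert(read):
    """Return the genomic insert between the two mosaic ends, or None.

    Everything up to and including the first MOSAIC_END is removed; if
    MOSAIC_END_RC occurs afterwards, it and everything after it are removed
    (a read shorter than its insert may never reach the far adapter).
    """
    start = read.find(MOSAIC_END)
    if start < 0:
        return None
    insert = read[start + len(MOSAIC_END):]
    end = insert.find(MOSAIC_END_RC)
    return insert if end < 0 else insert[:end]
```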

Alignment: Similar to (Chen et al. 2017), reads were mapped by BWA-MEM (Li 2013) (version 0.7.15) with default parameters to the human reference genome GRCh37 (for GM12878 and PBMCs) or to the mouse reference genome GRCm38.p5 (GENCODE, for mESCs).

Contact identification: From each read or read pair, all high-quality (mapping quality ≥ 20, edit distance per aligned bp ≤ 0.05) primary and supplementary alignments were extracted as “segments”. If a segment overlapped with a phased SNP, a haplotype was assigned if the base quality was ≥ 20. Chromatin contacts were identified as all pairs of segment end points (called “legs”) that were separated by > 1 kb in each read or read pair. PCR duplicates were removed by iteratively merging near-identical contacts (both legs differing by ≤ 1 kb). Depending on size selection and cells, 4–15% of sequencing reads yielded contacts.
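The 1-kb rules for calling contacts and collapsing PCR duplicates can be sketched in Python as follows. For simplicity, this uses a single greedy pass over sorted contacts (O(n²)) rather than the iterative merging described above. A leg is represented as a hypothetical (chromosome, position) tuple and a contact as (chrom_a, pos_a, chrom_b, pos_b). This is an illustration, not the Dip-C code.

```python
def is_contact(leg_a, leg_b, min_separation=1000):
    """Two legs form a contact if on different chromosomes or > 1 kb apart."""
    (chrom_a, pos_a), (chrom_b, pos_b) = leg_a, leg_b
    return chrom_a != chrom_b or abs(pos_a - pos_b) > min_separation


def remove_duplicates(contacts, tolerance=1000):
    """Keep one representative of each group of near-identical contacts.

    Two contacts are treated as duplicates if they join the same chromosomes
    and both legs differ by <= `tolerance` bp.
    """
    kept = []
    for c in sorted(contacts):
        duplicate = any(
            k[0] == c[0] and k[2] == c[2]
            and abs(k[1] - c[1]) <= tolerance and abs(k[3] - c[3]) <= tolerance
            for k in kept
        )
        if not duplicate:
            kept.append(c)
    return kept
```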

Artifact removal: Similar to (Stevens et al. 2017), “promiscuous” legs (alignment artifacts) were removed if > 10 other legs fell within 1 kb; subsequently, “isolated” contacts (3C/Hi-C artifacts) were removed if < 5 other contacts fell within 10 Mb in $L_{0.5}$ distance (instead of $L_\infty$ distance as in (Stevens et al. 2017)).
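The two filters can be written as brute-force O(n²) loops, which is enough to make the thresholds concrete. This Python sketch uses hypothetical tuple layouts: legs are (chromosome, position), and contacts are (chrom_a, pos_a, chrom_b, pos_b) with consistently ordered legs. It is not the Dip-C implementation; a practical version would use sorted or indexed lookups to scale to millions of contacts.

```python
import math


def promiscuous_leg_indices(legs, window=1000, max_others=10):
    """Indices of legs with more than `max_others` other legs within `window` bp."""
    flagged = set()
    for i, (chrom_i, pos_i) in enumerate(legs):
        others = sum(
            1 for j, (chrom_j, pos_j) in enumerate(legs)
            if j != i and chrom_j == chrom_i and abs(pos_j - pos_i) <= window
        )
        if others > max_others:
            flagged.add(i)
    return flagged


def l05_distance(dx, dy):
    """L0.5 'distance' (sqrt|dx| + sqrt|dy|)**2."""
    return (math.sqrt(abs(dx)) + math.sqrt(abs(dy))) ** 2


def remove_isolated(contacts, radius=10_000_000, min_others=5):
    """Drop contacts with fewer than `min_others` other contacts within
    `radius` bp in L0.5 distance (same chromosome pair required)."""
    kept = []
    for i, a in enumerate(contacts):
        neighbors = sum(
            1 for j, b in enumerate(contacts)
            if j != i and (a[0], a[2]) == (b[0], b[2])
            and l05_distance(a[1] - b[1], a[3] - b[3]) <= radius
        )
        if neighbors >= min_others:
            kept.append(a)
    return kept
```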

Haplotype imputation (2D): In each round of imputation, contacts in an “evidence” set voted to impute unknown haplotypes of contacts in a “target” set. For each target contact, a list of compatible haplotype tuples was first enumerated. For example, a contact joining the maternal chromosome 1 and an unknown haplotype of chromosome 2 would be compatible with two possible haplotype tuples, (Chr 1♀, Chr 2♂) and (Chr 1♀, Chr 2♀). Each evidence contact would then vote for haplotype tuples from this list, if such contact fell within 10 Mb in $L_{0.5}$ distance from the target contact and was compatible with one and only one haplotype tuple from the list. Imputation would occur if the winning haplotype tuple gathered ≥ 3 votes and ≥ 90% of all votes.
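The voting rule can be made concrete with a short Python sketch. Here each evidence contact that has already passed the $L_{0.5}$ neighborhood test is represented, hypothetically, by the set of haplotype tuples it is compatible with; only evidence compatible with exactly one candidate casts a vote. Haplotype labels such as ("1m", "2p") are illustrative placeholders.

```python
from collections import Counter


def impute_by_vote(candidates, evidence, min_votes=3, min_fraction=0.9):
    """Return the winning haplotype tuple for a target contact, or None.

    candidates: haplotype tuples compatible with the target contact.
    evidence: for each nearby evidence contact, the set of haplotype
              tuples it is compatible with.
    """
    tally = Counter()
    for compatible in evidence:
        hits = [c for c in candidates if c in compatible]
        if len(hits) == 1:  # vote only if exactly one candidate matches
            tally[hits[0]] += 1
    if not tally:
        return None
    winner, votes = tally.most_common(1)[0]
    if votes >= min_votes and votes >= min_fraction * sum(tally.values()):
        return winner
    return None
```

Under these thresholds, a target with 19 of 20 votes (95%) for one tuple is imputed, whereas one with 8 of 10 votes (80%) is left unknown.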

Special care was taken for intrachromosomal contacts because intrahomologous contacts were far more frequent than interhomologous contacts, especially at short ranges (small genomic separation). A target contact would be assumed intrahomologous without voting, if its two legs were separated by ≤ 10 Mb; otherwise, voting still occurred but a winning interhomologous vote would only be accepted if two legs were separated by ≥ 100 Mb. In addition, intrachromosomal contacts that had unknown haplotypes on both legs were not imputed.

The imputation procedure began with all contacts that had known haplotypes on at least one leg serving as both the target and the evidence sets. Such imputation was repeated two more times, each time with the previous results as the new evidence set. Results were subsequently cleaned by removal of isolated contacts (< 2 other contacts that had the same haplotypes within 10 Mb in $L_{0.5}$ distance).

Finally, cleaned results were used as the evidence set to impute a target set of all interchromosomal contacts that had unknown haplotypes on both legs.

For males, pseudoautosomal regions (PARs) were excluded from this step.

3D reconstruction: Simulated annealing was performed by nuc_dynamics (Stevens et al. 2017) (parameters: “-temps 20 -s 8 4 2 0.4 0.2 0.1” for 100-kb structures or “-temps 20 -s 8 4 2 0.4 0.2 0.1 0.04 0.02” for 20-kb structures) with minor modifications. First, the backbone energy function remained harmonic at large distances to reduce imputation errors. Second, removal of isolated contacts was skipped because it had already been performed. Third, the output was written in a simple “3D genome (3DG)” format (tab-delimited: chromosome name, genomic coordinate (bp), x, y, z) because the original PDB format does not allow > 99,999 atoms. Example code was provided to convert 3DG to mmCIF for visualization in PyMOL (run “set connect_mode, 4” before loading).
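Because 3DG is just five tab-separated columns, reading and writing it takes only a few lines. The sketch below is an assumption-laden illustration rather than the provided conversion code: the function names are hypothetical, and skipping blank lines and lines starting with `#` is an added convenience not described in the text.

```python
import io
from typing import Dict, List, TextIO, Tuple

Particle = Tuple[str, int, float, float, float]  # chromosome, bp, x, y, z


def read_3dg(handle: TextIO) -> List[Particle]:
    """Parse tab-delimited 3DG lines: chromosome, coordinate (bp), x, y, z."""
    particles = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: tolerate comments
            continue
        chrom, pos, x, y, z = line.split("\t")
        particles.append((chrom, int(pos), float(x), float(y), float(z)))
    return particles


def write_3dg(particles: List[Particle], handle: TextIO) -> None:
    for chrom, pos, x, y, z in particles:
        handle.write(f"{chrom}\t{pos}\t{x}\t{y}\t{z}\n")


def by_chromosome(particles: List[Particle]) -> Dict[str, List[Particle]]:
    """Group particles into per-chromosome polymers ordered by coordinate."""
    groups: Dict[str, List[Particle]] = {}
    for p in particles:
        groups.setdefault(p[0], []).append(p)
    for chrom in groups:
        groups[chrom].sort(key=lambda p: p[1])
    return groups


text = "1\t20000\t0.5\t1.0\t-2.0\n1\t0\t0.0\t0.0\t0.0\n"
print(by_chromosome(read_3dg(io.StringIO(text)))["1"][0])  # ('1', 0, 0.0, 0.0, 0.0)
```

Unlike PDB's fixed-width atom serial field, a tab-delimited table has no hard limit on the number of particles, which is the motivation given above.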

Haplotype imputation (3D): Partly similar to (Stevens et al. 2017), unknown haplotypes of each contact were imputed by comparing 3D distances in a draft structure. For each contact, a list of compatible haplotype tuples was first enumerated (as in 2D imputation). For each possible haplotype tuple, the 3D positions of the two legs were calculated by linear interpolation along the polymer of particles, and the 3D distance between them was recorded. Imputation would occur if the winning haplotype tuple (the one with the shortest 3D distance) yielded a 3D distance ≤ 20 particle radii and ≤ 0.5 times the second shortest 3D distance. Intrachromosomal contacts whose legs were separated by less than the draft structure resolution (bp) were not imputed. Finally, results were cleaned by removal of isolated contacts (< 2 other contacts that had the same haplotypes within 10 Mb in L0.5 distance). For males, PARs were included in this step.
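A minimal sketch of the 3D rule, assuming the draft structure is stored as a hypothetical dictionary keyed by `(chromosome, haplotype)` holding sorted particle coordinates (bp) and their xyz positions in particle radii. Legs outside a polymer are clamped to its end particle (an assumption), and the `resolution` default of 100 kb is a placeholder, not a value taken from the text.

```python
import bisect
import math
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]
# Hypothetical layout: structure[(chrom, hap)] = (sorted bp positions, xyz per position)
Structure = Dict[Tuple[str, str], Tuple[List[int], List[Coord]]]


def interpolate(structure: Structure, chrom: str, hap: str, pos: int) -> Coord:
    """Linearly interpolate a leg's 3D position along the particle polymer."""
    positions, coords = structure[(chrom, hap)]
    i = bisect.bisect_right(positions, pos)
    if i == 0:
        return coords[0]
    if i == len(positions):
        return coords[-1]
    f = (pos - positions[i - 1]) / (positions[i] - positions[i - 1])
    a, b = coords[i - 1], coords[i]
    return tuple(a[k] + f * (b[k] - a[k]) for k in range(3))


def impute_3d(structure: Structure, chr1: str, pos1: int, hap1: Optional[str],
              chr2: str, pos2: int, hap2: Optional[str],
              max_dist: float = 20.0, max_ratio: float = 0.5,
              resolution: int = 100_000) -> Optional[Tuple[str, str]]:
    if chr1 == chr2 and abs(pos1 - pos2) < resolution:
        return None  # legs closer than the draft resolution: not imputed
    opts1 = [hap1] if hap1 else ["m", "p"]
    opts2 = [hap2] if hap2 else ["m", "p"]
    dists = sorted(
        (math.dist(interpolate(structure, chr1, h1, pos1),
                   interpolate(structure, chr2, h2, pos2)), (h1, h2))
        for h1 in opts1 for h2 in opts2)
    if len(dists) == 1:
        return dists[0][1]
    (best, tup), (second, _) = dists[0], dists[1]
    if best <= max_dist and best <= max_ratio * second:
        return tup
    return None
```

The ratio test keeps the winner only when it is clearly closer than the runner-up, so contacts near homolog-homolog junctions stay unimputed rather than being guessed.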

Removal of repetitive regions: Similar to (Stevens et al. 2017), particles that harbored few contacts, such as centromeres and heterochromatic repeats, were removed from the final 3D structure. For each particle, the number of contact legs within 0.5 Mb was recorded. The bottom 6% of all particles were removed.
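The coverage filter can be sketched with a sorted list of leg positions per chromosome and two binary searches per particle. Function and argument names are hypothetical; "within 0.5 Mb" is read as a closed window of ±0.5 Mb, and ties at the cutoff are broken by input order (both assumptions).

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


def remove_low_coverage(particles: List[Tuple[str, int]],
                        legs: List[Tuple[str, int]],
                        window: int = 500_000,
                        fraction: float = 0.06) -> List[Tuple[str, int]]:
    """Drop the `fraction` of particles with the fewest contact legs within `window` bp."""
    legs_by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in legs:
        legs_by_chrom[chrom].append(pos)
    for positions in legs_by_chrom.values():
        positions.sort()

    def count(chrom: str, pos: int) -> int:
        positions = legs_by_chrom.get(chrom, [])
        lo = bisect.bisect_left(positions, pos - window)
        hi = bisect.bisect_right(positions, pos + window)
        return hi - lo

    counts = [count(c, p) for c, p in particles]
    n_remove = int(round(fraction * len(particles)))
    # sorted() is stable, so tied particles keep their input order.
    order = sorted(range(len(particles)), key=lambda i: counts[i])
    removed = set(order[:n_remove])
    return [p for i, p in enumerate(particles) if i not in removed]
```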

Cross-validation: For each GM12878 cell, 10% of all SNPs were randomly held out from the list of phased SNPs. The imputed haplotype of each leg was compared with its original haplotype (ground truth). Imputation accuracy was estimated as the fraction of correctly imputed legs among legs whose ground truth was known (~ 5 k such legs per cell).
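The hold-out and scoring steps are simple enough to sketch directly. Names are hypothetical, and the denominator here counts only legs that both received an imputed haplotype and have a known ground truth; whether unimputed legs were counted is not stated above, so this is an assumption.

```python
import random
from typing import Optional, Sequence, Set


def hold_out_snps(snps: Sequence[str], fraction: float = 0.1, seed: int = 0) -> Set[str]:
    """Randomly choose a fraction of phased SNPs to hide from the imputation input."""
    rng = random.Random(seed)
    k = int(round(fraction * len(snps)))
    return set(rng.sample(list(snps), k))


def imputation_accuracy(imputed: Sequence[Optional[str]],
                        truth: Sequence[Optional[str]]) -> float:
    """Fraction correct among legs with both an imputed and a ground-truth haplotype."""
    pairs = [(i, t) for i, t in zip(imputed, truth) if i is not None and t is not None]
    if not pairs:
        return float("nan")
    return sum(i == t for i, t in pairs) / len(pairs)
```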

Analysis of 3D structures

Estimation of reconstruction uncertainty: Similar to (Stevens et al. 2017), three replicate structures were generated with different random seeds. After removal of repetitive regions (see above), shared genomic particles were extracted from the replicates and aligned with the Kabsch algorithm in a pairwise manner. For each particle, r.m.s. deviation was calculated between all pairs of replicates. A lower bound for the reconstruction uncertainty of each cell was estimated by the median r.m.s. deviation across all particles.

For mESCs, five replicates were generated because their smaller numbers of contacts and limited chromosome intermingling occasionally led to suboptimal structures. These suboptimal structures satisfied fewer contacts than the other replicates and were thus excluded. Typical median r.m.s. deviation was ~ 1.3 particle radii (~ 130 nm).
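The replicate comparison can be sketched with NumPy: align each pair of replicates with the Kabsch algorithm, accumulate squared per-particle deviations over all pairs, and take the median of the per-particle r.m.s. values. Function names are hypothetical, and each replicate is assumed to be an N × 3 array of the shared particles in a common order.

```python
import itertools

import numpy as np


def kabsch_align(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Return p (N x 3) optimally rotated and translated onto q (N x 3)."""
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # flip sign to avoid a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p_c @ r.T + q.mean(axis=0)


def median_rmsd(replicates) -> float:
    """Median over particles of the per-particle r.m.s. deviation across replicate pairs."""
    sq = []
    for a, b in itertools.combinations(replicates, 2):
        aligned = kabsch_align(a, b)
        sq.append(np.sum((aligned - b) ** 2, axis=1))
    per_particle = np.sqrt(np.mean(sq, axis=0))
    return float(np.median(per_particle))
```

Replicates that differ only by a rigid-body motion give a median near zero, so any remaining deviation reflects genuine disagreement between annealing runs.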

Chromosome intermingling: The extent of chromosome intermingling of each 20-kb particle was quantified according to the chromosomes of nearby particles within 3 particle radii. Basic intermingling was defined as the fraction of nearby particles (excluding itself) that were not from the same chromosome. Multi-chromosome intermingling was defined as Shannon’s diversity index of chromosomes, $-\sum_i p_i \ln p_i$, where $p_i$ denoted the fraction of nearby particles from chromosome $i$. Another measure of diversity, species richness (the number of nearby chromosomes), yielded similar results. The two homologs of the same chromosome were counted as two different chromosomes. Note that quantification could be unreliable near repetitive regions such as the centromere.
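All three intermingling measures can be computed from the same neighbour list. The sketch is hypothetical and brute force (O(N²)): `chroms` labels each particle by homolog (e.g. `"1m"`, `"1p"`) so the two homologs count as different chromosomes; the Shannon index and richness are computed over neighbours excluding the particle itself, and particles with no neighbours get zeros (both assumptions).

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def intermingling(coords: Sequence[Tuple[float, float, float]],
                  chroms: Sequence[str],
                  radius: float = 3.0) -> List[Tuple[float, float, int]]:
    """Per particle: (basic intermingling, Shannon diversity, species richness)."""
    results = []
    for i, ci in enumerate(coords):
        nearby = [chroms[j] for j, cj in enumerate(coords)
                  if j != i and math.dist(ci, cj) <= radius]
        if not nearby:
            results.append((0.0, 0.0, 0))
            continue
        total = len(nearby)
        basic = sum(c != chroms[i] for c in nearby) / total
        counts = Counter(nearby)
        shannon = -sum((n / total) * math.log(n / total) for n in counts.values())
        results.append((basic, shannon, len(counts)))
    return results
```

A particle surrounded equally by two chromosomes scores ln 2 on the Shannon index and 2 on richness, while a particle deep inside its own territory scores 0 and 1.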

Single-cell chromatin compartments: In a previous study (Nagano et al. 2017), the single-cell chromatin compartment (“compartment association score”) of each genomic bin was defined as the average A/B compartment (as measured by bulk Hi-C) of other bins that it contacted. Partly similar to this definition, we defined the single-cell chromatin compartment of each 20-kb particle as the average CpG frequency (a proxy for A/B compartments (Xie et al. 2017)) of nearby particles (including itself) within 3 particle radii. This definition was equivalent to 3D smoothing/diffusion of CpG frequencies in each structure.

In some analyses, compartments were rank-normalized to 0–1 in each cell because CpG frequencies were more variable in highly euchromatic regions.
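The 3D smoothing and the per-cell rank normalization can be sketched as follows. Both functions are hypothetical illustrations: each particle's value is the mean CpG frequency of all particles within 3 particle radii (including itself, whose distance is zero), and rank normalization breaks ties by input order, which is a simplification.

```python
import math
from typing import List, Sequence, Tuple


def single_cell_compartments(coords: Sequence[Tuple[float, float, float]],
                             cpg: Sequence[float],
                             radius: float = 3.0) -> List[float]:
    """Average CpG frequency of particles within `radius`, including the particle itself."""
    out = []
    for ci in coords:
        vals = [cpg[j] for j, cj in enumerate(coords) if math.dist(ci, cj) <= radius]
        out.append(sum(vals) / len(vals))
    return out


def rank_normalize(values: Sequence[float]) -> List[float]:
    """Map values to ranks scaled to [0, 1]; ties are broken by input order."""
    n = len(values)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for r, i in enumerate(order):
        ranks[i] = r / (n - 1)
    return ranks
```

Rank normalization discards the absolute CpG scale, so cells are compared by the relative ordering of their particles rather than by the raw, euchromatin-dominated variance.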

Note that in addition to simple A/B compartments (with CpG frequency (Xie et al. 2017) or GC content (Flyamer et al. 2017) as a proxy), the above calculation could also be performed on other genomic vectors, such as sub-compartments (Rao et al. 2014) (for example, the Polycomb-associated sub-compartment), DNA methylation, and ChIP-seq signals.

Relationship between single-cell chromatin compartments of the two alleles: The single-cell compartment of each genomic locus has been reported to vary between single cells (Nagano et al. 2017); however, the maternal and paternal alleles could not be distinguished, and it remained unclear whether the two varied in a coordinated manner. Across the genome, single-cell compartments varied both between different cells (“between cells”) and between the two alleles of the same cell (“within cells”), and had near-identical averages for the maternal and paternal alleles. Differences between cells were concentrated in regions whose average compartment was neither extremely euchromatic nor extremely heterochromatic, consistent with a previous report in mice (Nagano et al. 2017). Interestingly, differences within cells followed a near-identical pattern, suggesting that cell-to-cell heterogeneity of chromatin compartments was dominated by allele-to-allele heterogeneity. Supporting this idea, we found on average near-zero Spearman’s correlations between the maternal and paternal compartments (median = 0.02, 0.03, and 0.05 for GM12878, presumptive T lymphocytes, and combined, respectively). Therefore, compartments of the two alleles behaved almost independently in a single cell.
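Within one cell, the maternal-paternal correlation is Spearman's ρ between the two alleles' compartment values across loci. A dependency-free sketch, with hypothetical names, computes ρ as the Pearson correlation of average ranks (the standard tie-aware definition); the per-cell values would then be summarized by their median.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```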

Potential somatic pairing: We captured potential somatic pairing (spatial proximity of the maternal and paternal alleles) in our 3D models. We found a higher degree of somatic pairing in the PBMCs, with up to 0.4% of all 20-kb particles residing within 5 particle radii (~ 500 nm) from their homologs, while mESCs exhibited the lowest degree of pairing, consistent with their lowest levels of chromosome intermingling. Note that imputation was less effective when homologous loci were in close proximity, leading to higher reconstruction uncertainty.
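The pairing statistic is the fraction of particles whose homologous particle (same chromosome and coordinate, other haplotype) lies within 5 particle radii. In this hypothetical sketch the structure is a dictionary keyed by `(chromosome, haplotype, coordinate)` with `"m"`/`"p"` labels; particles without a homolog in the structure (e.g. after repeat removal) count in the denominator but can never be paired, which is an assumption.

```python
import math
from typing import Dict, Tuple

# Hypothetical layout: (chromosome, haplotype, coordinate in bp) -> xyz
Structure = Dict[Tuple[str, str, int], Tuple[float, float, float]]


def pairing_fraction(structure: Structure, radius: float = 5.0) -> float:
    """Fraction of particles within `radius` particle radii of their homologous particle."""
    partner = {"m": "p", "p": "m"}
    total = paired = 0
    for (chrom, hap, pos), xyz in structure.items():
        total += 1
        other = structure.get((chrom, partner[hap], pos))
        if other is not None and math.dist(xyz, other) <= radius:
            paired += 1
    return paired / total if total else 0.0
```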

Identification of single-cell domains through matrices of radii of gyration: In a previous study (Stevens et al. 2017), 3D sizes, as measured by radii of gyration, were calculated in each cell for bulk-defined domains. We extended this calculation to all n(n + 1)/2 possible subchains of each chromosome (a polymer of n point-mass particles), yielding a matrix whose rows, columns, and values represented the starts, ends, and radii of subchains, respectively. Note that radii were less accurate near repetitive regions because those particles were excluded from the calculation.
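All n(n + 1)/2 radii can be filled in O(n²) time using the identity Rg² = ⟨|r|²⟩ − |⟨r⟩|² with prefix sums over the chain. This is a hypothetical sketch, not the original code: entries below the diagonal are left at 0, and tiny negative values from floating-point cancellation are clamped to 0 (with very large coordinates, centering the structure first would reduce that cancellation).

```python
import math
from typing import List, Sequence, Tuple


def radius_of_gyration_matrix(coords: Sequence[Tuple[float, float, float]]) -> List[List[float]]:
    """rg[i][j] = radius of gyration of particles i..j (inclusive), for i <= j."""
    n = len(coords)
    sx = [0.0] * (n + 1)
    sy = [0.0] * (n + 1)
    sz = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)  # prefix sums of |r|^2
    for k, (x, y, z) in enumerate(coords):
        sx[k + 1] = sx[k] + x
        sy[k + 1] = sy[k] + y
        sz[k + 1] = sz[k] + z
        s2[k + 1] = s2[k] + x * x + y * y + z * z
    rg = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m = j - i + 1
            mx = (sx[j + 1] - sx[i]) / m
            my = (sy[j + 1] - sy[i]) / m
            mz = (sz[j + 1] - sz[i]) / m
            val = (s2[j + 1] - s2[i]) / m - (mx * mx + my * my + mz * mz)
            rg[i][j] = math.sqrt(max(val, 0.0))  # clamp rounding noise
    return rg
```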

In a matrix of radii of gyration, single-cell domains were identified as squares that had relatively small radii (partly similar to (Flyamer et al. 2017)). Each 20-kb particle was initialized as a single domain (radius = 0). In each round, all possible ways of merging two adjacent domains into one domain were enumerated, and the merge leading to the smallest radius was performed. Merging was repeated until the entire chromosome became a single domain.
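The greedy merging procedure can be sketched directly. The function names are hypothetical, radii are recomputed from coordinates at every step for clarity (a real implementation would look them up in the precomputed matrix), and the output is the ordered list of merges as `(start, end, radius)`.

```python
import math
from typing import List, Sequence, Tuple


def radius_of_gyration(points: Sequence[Tuple[float, float, float]]) -> float:
    m = len(points)
    cx = sum(p[0] for p in points) / m
    cy = sum(p[1] for p in points) / m
    cz = sum(p[2] for p in points) / m
    return math.sqrt(sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
                         for p in points) / m)


def greedy_domains(coords: List[Tuple[float, float, float]]) -> List[Tuple[int, int, float]]:
    """Repeatedly merge the adjacent pair of domains giving the smallest radius."""
    domains = [(i, i) for i in range(len(coords))]  # each particle starts alone
    merges = []
    while len(domains) > 1:
        best = None
        for k in range(len(domains) - 1):
            start, end = domains[k][0], domains[k + 1][1]
            r = radius_of_gyration(coords[start:end + 1])
            if best is None or r < best[0]:
                best = (r, k)
        r, k = best
        merged = (domains[k][0], domains[k + 1][1])
        domains[k:k + 2] = [merged]
        merges.append((merged[0], merged[1], r))
    return merges


pts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (10.0, 0.0, 0.0), (10.2, 0.0, 0.0)]
print([(s, e) for s, e, _ in greedy_domains(pts)])  # [(0, 1), (2, 3), (0, 3)]
```

Because every merge is recorded, the result is a hierarchy of nested domains rather than a single partition.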

Note that in addition to radii of gyration, the above calculation could also be performed with other measures of 3D size, such as higher moments to further disfavor a few protruding particles, or relative sizes normalized by numbers of particles to favor larger but denser domains.

Configuration of centromeres and telomeres: A previous study (Stevens et al. 2017) noted a tendency toward Rabl configuration in mESCs. We further quantified the extent of Rabl configuration by the length of the summed centromere-to-telomere vectors, normalized by the total number of particles. This definition was equivalent to the length (in particle radii) of the average vector connecting each 20-kb particle to its centromeric neighbor.
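The equivalence stated above follows from a telescoping sum: along each arm, the vectors from each particle's centromeric neighbor to the particle add up to the single centromere-to-telomere vector. The hypothetical sketch below therefore sums one vector per arm. Each chromosome is given as its ordered coordinates plus the index of its centromeric particle; for an acrocentric chromosome (as in mouse) that index is 0, and the "arm" pointing toward index 0 contributes a zero vector.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def rabl_index(chromosomes: Sequence[Tuple[Sequence[Vec], int]]) -> float:
    """Length of summed centromere-to-telomere vectors divided by total particle number."""
    total = [0.0, 0.0, 0.0]
    n_particles = 0
    for coords, cen in chromosomes:
        n_particles += len(coords)
        for tel in (0, len(coords) - 1):  # one telomere at each end of the polymer
            for k in range(3):
                total[k] += coords[tel][k] - coords[cen][k]
    return math.sqrt(sum(t * t for t in total)) / n_particles


line = [(float(i), 0.0, 0.0) for i in range(11)]
print(rabl_index([(line, 0)]))  # 10/11, about 0.909
```

Chromosomes pointing in the same direction reinforce each other, while randomly oriented chromosomes cancel, so the index approaches zero in the absence of a Rabl configuration.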

We quantified the radial difference between centromeres and telomeres by the summed difference between telomere and centromere distances from the nuclear center of mass, normalized by the total number of particles. This definition was equivalent to the average change in distance (in particle radii) from the nuclear center of mass between each 20-kb particle and its centromeric neighbor.
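The radial measure follows the same per-arm telescoping logic, using scalar distances from the nuclear center of mass instead of vectors. In this hypothetical sketch the center of mass is the unweighted mean of all particle coordinates (equal particle masses assumed), and chromosomes use the same `(coordinates, centromere index)` layout with index 0 for acrocentric chromosomes. A positive value means telomeres tend to lie farther from the nuclear center than centromeres.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def radial_difference(chromosomes: Sequence[Tuple[Sequence[Vec], int]]) -> float:
    """Summed (telomere - centromere) distance from the center of mass, per particle."""
    all_coords = [p for coords, _ in chromosomes for p in coords]
    n = len(all_coords)
    center = tuple(sum(p[k] for p in all_coords) / n for k in range(3))
    total = 0.0
    for coords, cen in chromosomes:
        d_cen = math.dist(coords[cen], center)
        for tel in (0, len(coords) - 1):  # one telomere at each end
            total += math.dist(coords[tel], center) - d_cen
    return total / n
```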

References

Aronesty, Erik. 2013. "Comparison of sequencing utility programs." The Open Bioinformatics Journal 7 (1).

Boyle, S., S. Gilchrist, J. M. Bridger, N. L. Mahy, J. A. Ellis, and W. A. Bickmore. 2001. "The spatial organization of human chromosomes within the nuclei of normal and emerin-mutant cells." Hum Mol Genet 10 (3):211-9.

Branco, M. R., T. Branco, F. Ramirez, and A. Pombo. 2008. "Changes in chromosome organization during PHA-activation of resting human lymphocytes measured by cryo-FISH." Chromosome Res 16 (3):413-26. doi: 10.1007/s10577-008-1230-x.

Buenrostro, J. D., B. Wu, U. M. Litzenburger, D. Ruff, M. L. Gonzales, M. P. Snyder, H. Y. Chang, and W. J. Greenleaf. 2015. "Single-cell chromatin accessibility reveals principles of regulatory variation." Nature 523 (7561):486-90. doi: 10.1038/nature14590.

Carstens, S., M. Nilges, and M. Habeck. 2016. "Inferential Structure Determination of Chromosomes from Single-Cell Hi-C Data." PLoS Comput Biol 12 (12):e1005292. doi: 10.1371/journal.pcbi.1005292.

Chen, C., D. Xing, L. Tan, H. Li, G. Zhou, L. Huang, and X. S. Xie. 2017. "Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI)." Science 356 (6334):189-194. doi: 10.1126/science.aak9787.

Clowney, E. J., M. A. LeGros, C. P. Mosley, F. G. Clowney, E. C. Markenscoff-Papadimitriou, M. Myllys, G. Barnea, C. A. Larabell, and S. Lomvardas. 2012. "Nuclear Aggregation of Olfactory Receptor Genes Governs Their Monogenic Expression." Cell 151 (4):724-737. doi: 10.1016/j.cell.2012.09.043.

Cremer, T., and C. Cremer. 2001. "Chromosome territories, nuclear architecture and gene regulation in mammalian cells." Nat Rev Genet 2 (4):292-301. doi: 10.1038/35066075.

Cusanovich, D. A., R. Daza, A. Adey, H. A. Pliner, L. Christiansen, K. L. Gunderson, F. J. Steemers, C. Trapnell, and J. Shendure. 2015. "Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing." Science 348 (6237):910-4. doi: 10.1126/science.aab1601.

Darrow, E. M., M. H. Huntley, O. Dudchenko, E. K. Stamenova, N. C. Durand, Z. Sun, S. C. Huang, A. L. Sanborn, I. Machol, M. Shamim, A. P. Seberg, E. S. Lander, B. P. Chadwick, and E. L. Aiden. 2016. "Deletion of DXZ4 on the human inactive X chromosome alters higher-order genome architecture." Proc Natl Acad Sci U S A 113 (31):E4504-12. doi: 10.1073/pnas.1609643113.

Dekker, J., K. Rippe, M. Dekker, and N. Kleckner. 2002. "Capturing chromosome conformation." Science 295 (5558):1306-11. doi: 10.1126/science.1067799.

Dixon, J. R., I. Jung, S. Selvaraj, Y. Shen, J. E. Antosiewicz-Bourget, A. Y. Lee, Z. Ye, A. Kim, N. Rajagopal, W. Xie, Y. Diao, J. Liang, H. Zhao, V. V. Lobanenkov, J. R. Ecker, J. A. Thomson, and B. Ren. 2015. "Chromatin architecture reorganization during stem cell differentiation." Nature 518 (7539):331-6. doi: 10.1038/nature14222.

Dixon, J. R., S. Selvaraj, F. Yue, A. Kim, Y. Li, Y. Shen, M. Hu, J. S. Liu, and B. Ren. 2012. "Topological domains in mammalian genomes identified by analysis of chromatin interactions." Nature 485 (7398):376-80. doi: 10.1038/nature11082.

Flyamer, I. M., J. Gassler, M. Imakaev, H. B. Brandao, S. V. Ulianov, N. Abdennur, S. V. Razin, L. A. Mirny, and K. Tachibana-Konwalski. 2017. "Single-nucleus Hi-C reveals unique chromatin reorganization at oocyte-to-zygote transition." Nature 544 (7648):110-114. doi: 10.1038/nature21711.

Fraser, J., C. Ferrai, A. M. Chiariello, M. Schueler, T. Rito, G. Laudanno, M. Barbieri, B. L. Moore, D. C. Kraemer, S. Aitken, S. Q. Xie, K. J. Morris, M. Itoh, H. Kawaji, I. Jaeger, Y. Hayashizaki, P. Carninci, A. R. Forrest, Fantom Consortium, C. A. Semple, J. Dostie, A. Pombo, and M. Nicodemi. 2015. "Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation." Mol Syst Biol 11 (12):852. doi: 10.15252/msb.20156492.

Giorgetti, L., B. R. Lajoie, A. C. Carter, M. Attia, Y. Zhan, J. Xu, C. J. Chen, N. Kaplan, H. Y. Chang, E. Heard, and J. Dekker. 2016. "Structural organization of the inactive X chromosome in the mouse." Nature 535 (7613):575-9. doi: 10.1038/nature18589.

Javierre, B. M., O. S. Burren, S. P. Wilder, R. Kreuzhuber, S. M. Hill, S. Sewitz, J. Cairns, S. W. Wingett, C. Varnai, M. J. Thiecke, F. Burden, S. Farrow, A. J. Cutler, K. Rehnstrom, K. Downes, L. Grassi, M. Kostadima, P. Freire-Pritchett, F. Wang, Blueprint Consortium, H. G. Stunnenberg, J. A. Todd, D. R. Zerbino, O. Stegle, W. H. Ouwehand, M. Frontini, C. Wallace, M. Spivakov, and P. Fraser. 2016. "Lineage-Specific Genome Architecture Links Enhancers and Non-coding Disease Variants to Target Gene Promoters." Cell 167 (5):1369-1384 e19. doi: 10.1016/j.cell.2016.09.037.

Jeong, Mira, Xingfan Huang, Xiaotian Zhang, Jianzhong Su, Muhammad Shamim, Ivan Bochkov, Jaime Reyes, Haiyoung Jung, Emily Heikamp, Aviva Presser Aiden, Wei Li, Erez Aiden, and Margaret A. Goodell. 2017. "A Cell Type-Specific Class of Chromatin Loops Anchored at Large DNA Methylation Nadirs." bioRxiv. doi: 10.1101/212928.

Khan, M., E. Vaes, and P. Mombaerts. 2011. "Regulation of the probability of mouse odorant receptor gene choice." Cell 147 (4):907-21. doi: 10.1016/j.cell.2011.09.049.

LaSalle, J. M., and M. Lalande. 1996. "Homologous association of oppositely imprinted chromosomal domains." Science 272 (5262):725-8.

Lee, J. T., and M. S. Bartolomei. 2013. "X-inactivation, imprinting, and long noncoding RNAs in health and disease." Cell 152 (6):1308-23. doi: 10.1016/j.cell.2013.02.016.

Li, Heng. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. ArXiv e-prints 1303. Accessed March 1, 2013.

Li, X., J. Zhang, H. Zhao, Z. Pei, and Z. Xuan. 2017. Method for constructing high-resolution, high-information-content single-cell Hi-C libraries (in Chinese). Google Patents.

Lieberman-Aiden, E., N. L. van Berkum, L. Williams, M. Imakaev, T. Ragoczy, A. Telling, I. Amit, B. R. Lajoie, P. J. Sabo, M. O. Dorschner, R. Sandstrom, B. Bernstein, M. A. Bender, M. Groudine, A. Gnirke, J. Stamatoyannopoulos, L. A. Mirny, E. S. Lander, and J. Dekker. 2009. "Comprehensive mapping of long-range interactions reveals folding principles of the human genome." Science 326 (5950):289-93. doi: 10.1126/science.1181369.

Lomvardas, S., G. Barnea, D. J. Pisapia, M. Mendelsohn, J. Kirkland, and R. Axel. 2006. "Interchromosomal interactions and olfactory receptor choice." Cell 126 (2):403-13. doi: 10.1016/j.cell.2006.06.035.

Lu, S., C. Zong, W. Fan, M. Yang, J. Li, A. R. Chapman, P. Zhu, X. Hu, L. Xu, L. Yan, F. Bai, J. Qiao, F. Tang, R. Li, and X. S. Xie. 2012. "Probing meiotic recombination and aneuploidy of single sperm cells by whole-genome sequencing." Science 338 (6114):1627-30. doi: 10.1126/science.1229112.

Lyons, D. B., W. E. Allen, T. Goh, L. Tsai, G. Barnea, and S. Lomvardas. 2013. "An epigenetic trap stabilizes singular olfactory receptor expression." Cell 154 (2):325-36. doi: 10.1016/j.cell.2013.06.039.

Lyons, D. B., A. Magklara, T. Goh, S. C. Sampath, A. Schaefer, G. Schotta, and S. Lomvardas. 2014. "Heterochromatin-mediated gene silencing facilitates the diversification of olfactory neurons." Cell Rep 9 (3):884-92. doi: 10.1016/j.celrep.2014.10.001.

Maharana, S., K. V. Iyer, N. Jain, M. Nagarajan, Y. Wang, and G. V. Shivashankar. 2016. "Chromosome intermingling-the physical basis of chromosome organization in differentiated cells." Nucleic Acids Res 44 (11):5148-60. doi: 10.1093/nar/gkw131.

Markenscoff-Papadimitriou, E., W. E. Allen, B. M. Colquitt, T. Goh, K. K. Murphy, K. Monahan, C. P. Mosley, N. Ahituv, and S. Lomvardas. 2014. "Enhancer interaction networks as a means for singular olfactory receptor expression." Cell 159 (3):543-57. doi: 10.1016/j.cell.2014.09.033.

Monahan, K., I. Schieren, J. Cheung, A. Mumbey-Wafula, E. S. Monuki, and S. Lomvardas. 2017. "Cooperative interactions enable singular olfactory receptor expression in mouse olfactory neurons." Elife 6. doi: 10.7554/eLife.28620.

Nagano, T., Y. Lubling, T. J. Stevens, S. Schoenfelder, E. Yaffe, W. Dean, E. D. Laue, A. Tanay, and P. Fraser. 2013. "Single-cell Hi-C reveals cell-to-cell variability in chromosome structure." Nature 502 (7469):59-64. doi: 10.1038/nature12593.

Nagano, T., Y. Lubling, C. Varnai, C. Dudley, W. Leung, Y. Baran, N. Mendelson Cohen, S. Wingett, P. Fraser, and A. Tanay. 2017. "Cell-cycle dynamics of chromosomal organization at single-cell resolution." Nature 547 (7661):61-67. doi: 10.1038/nature23001.

Naumova, N., M. Imakaev, G. Fudenberg, Y. Zhan, B. R. Lajoie, L. A. Mirny, and J. Dekker. 2013. "Organization of the mitotic chromosome." Science 342 (6161):948-53. doi: 10.1126/science.1236083.

Quinodoz, Sofia A, Noah Ollikainen, Barbara Tabak, Ali Palla, Jan Marten Schmidt, Elizabeth Detmar, Mason Lai, Alexander Shishkin, Prashant Bhat, Vickie Trinh, Erik Aznauryan, Pamela Russell, Christine Cheng, Marko Jovanovic, Amy Chow, Patrick McDonel, Manuel Garber, and Mitchell Guttman. 2017. "Higher-order inter-chromosomal hubs shape 3-dimensional genome organization in the nucleus." bioRxiv. doi: 10.1101/219683.

Ramani, V., X. Deng, R. Qiu, K. L. Gunderson, F. J. Steemers, C. M. Disteche, W. S. Noble, Z. Duan, and J. Shendure. 2017. "Massively multiplex single-cell Hi-C." Nat Methods 14 (3):263-266. doi: 10.1038/nmeth.4155.

Rao, S. S., M. H. Huntley, N. C. Durand, E. K. Stamenova, I. D. Bochkov, J. T. Robinson, A. L. Sanborn, I. Machol, A. D. Omer, E. S. Lander, and E. L. Aiden. 2014. "A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping." Cell 159 (7):1665-80. doi: 10.1016/j.cell.2014.11.021.

Robinson, James, Douglass Turner, Neva C. Durand, Helga Thorvaldsdottir, Jill P. Mesirov, and Erez Lieberman Aiden. 2017. "Juicebox.js provides a cloud-based visualization system for Hi-C data." bioRxiv. doi: 10.1101/205740.

Serizawa, S., K. Miyamichi, H. Nakatani, M. Suzuki, M. Saito, Y. Yoshihara, and H. Sakano. 2003. "Negative feedback regulation ensures the one receptor-one olfactory neuron rule in mouse." Science 302 (5653):2088-2094. doi: 10.1126/science.1089122.

Servant, N., N. Varoquaux, B. R. Lajoie, E. Viara, C. J. Chen, J. P. Vert, E. Heard, J. Dekker, and E. Barillot. 2015. "HiC-Pro: an optimized and flexible pipeline for Hi-C data processing." Genome Biol 16:259. doi: 10.1186/s13059-015-0831-x.

Solovei, I., M. Kreysing, C. Lanctot, S. Kosem, L. Peichl, T. Cremer, J. Guck, and B. Joffe. 2009. "Nuclear Architecture of Rod Photoreceptor Cells Adapts to Vision in Mammalian Evolution." Cell 137 (2):356-368. doi: 10.1016/j.cell.2009.01.052.

Stevens, T. J., D. Lando, S. Basu, L. P. Atkinson, Y. Cao, S. F. Lee, M. Leeb, K. J. Wohlfahrt, W. Boucher, A. O'Shaughnessy-Kirwan, J. Cramard, A. J. Faure, M. Ralser, E. Blanco, L. Morey, M. Sanso, M. G. S. Palayret, B. Lehner, L. Di Croce, A. Wutz, B. Hendrich, D. Klenerman, and E. D. Laue. 2017. "3D structures of individual mammalian genomes studied by single-cell Hi-C." Nature 544 (7648):59-64. doi: 10.1038/nature21429.

Tan, L., Q. Li, and X. S. Xie. 2015. "Olfactory sensory neurons transiently express multiple olfactory receptors during development." Mol Syst Biol 11 (12):844. doi: 10.15252/msb.20156639.

Tang, F., C. Barbacioru, Y. Wang, E. Nordman, C. Lee, N. Xu, X. Wang, J. Bodeau, B. B. Tuch, A. Siddiqui, K. Lao, and M. A. Surani. 2009. "mRNA-Seq whole-transcriptome analysis of a single cell." Nat Methods 6 (5):377-82. doi: 10.1038/nmeth.1315.

Waddington, C. H. 1957. The strategy of the genes; a discussion of some aspects of theoretical biology. London: Allen & Unwin.

Weierich, C., A. Brero, S. Stein, J. von Hase, C. Cremer, T. Cremer, and I. Solovei. 2003. "Three-dimensional arrangements of centromeres and telomeres in nuclei of human and murine lymphocytes." Chromosome Res 11 (5):485-502.

Weinreb, C., and B. J. Raphael. 2016. "Identification of hierarchical chromatin domains." Bioinformatics 32 (11):1601-9. doi: 10.1093/bioinformatics/btv485.

Xie, W. J., L. Meng, S. Liu, L. Zhang, X. Cai, and Y. Q. Gao. 2017. "Structural Modeling of Chromatin Integrates Genome Features and Reveals Chromosome Folding Principle." Sci Rep 7 (1):2818. doi: 10.1038/s41598-017-02923-6.

Zhou, Y., P. Wang, F. Tian, G. Gao, L. Huang, W. Wei, and X. S. Xie. 2017. "Painting a specific chromosome with CRISPR/Cas9 for live-cell imaging." Cell Res 27 (2):298-301. doi: 10.1038/cr.2017.9.
