Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

Research Article

Epigenomics and Single-Cell Sequencing Define a Developmental Hierarchy in Langerhans Cell Histiocytosis

Florian Halbritter1,2, Matthias Farlik1,3, Raphaela Schwentner2, Gunhild Jug2, Nikolaus Fortelny1, Thomas Schnöller2, Hanja Pisa2, Linda C. Schuster1, Andrea Reinprecht4, Thomas Czech4, Johannes Gojo5, Wolfgang Holter2,5,6, Milen Minkov2,7, Wolfgang M. Bauer3, Ingrid Simonitsch-Klupp8, Christoph Bock1,9,10,11, and Caroline Hutter2,5,6

abstract Langerhans cell histiocytosis (LCH) is a rare neoplasm predominantly affecting children. It occupies a hybrid position between cancers and inflammatory dis- eases, which makes it an attractive model for studying cancer development. To explore the molecular mechanisms underlying the pathophysiology of LCH and its characteristic clinical heterogeneity, we investigated the transcriptomic and epigenomic diversity in primary LCH lesions. Using single-cell RNA sequencing, we identified multiple recurrent types of LCH cells within these biopsies, including puta- tive LCH progenitor cells and several subsets of differentiated LCH cells. We confirmed the presence of proliferative LCH cells in all analyzed biopsies using IHC, and we defined an epigenomic and - regulatory basis of the different LCH-cell subsets by chromatin-accessibility profiling. In summary, our single-cell analysis of LCH uncovered an unexpected degree of cellular, transcriptomic, and epigenomic heterogeneity among LCH cells, indicative of complex developmental hierarchies in LCH lesions.

SIGNIFICANCE: This study sketches a molecular portrait of LCH lesions by combining single-cell tran- scriptomics with epigenome profiling. We uncovered extensive cellular heterogeneity, explained in part by an intrinsic developmental hierarchy of LCH cells. Our findings provide new insights and hypotheses for advancing LCH research and a starting point for personalizing therapy.

See related commentary by Gruber et al., p. 1343.

1CeMM Research Center for Molecular Medicine of the Austrian Acad- Note: Supplementary data for this article are available at Cancer Discovery emy of Sciences, Vienna, Austria. 2St. Anna Children’s Cancer Research Online (http://cancerdiscovery.aacrjournals.org/). 3 Institute (CCRI), Vienna, Austria. Department of Dermatology, Medical F. Halbritter and M. Farlik contributed equally to this work. University of Vienna, Vienna, Austria. 4Department of Neurosurgery, Medi- cal University of Vienna, Vienna, Austria. 5Department of Pediatrics, Medi- Corresponding Authors: Caroline Hutter, St. Anna Children’s Cancer cal University of Vienna, Vienna, Austria. 6St. Anna Children’s Hospital, Research Institute (CCRI), St. Anna Kinderkrebsforschung, Zimmer- St. Anna Kinderspital, Vienna, Austria. 7Department of Pediatrics, Adoles- mannplatz 10, 1090 Vienna, Austria. Phone: 4314-0470-4043; E-mail: cent Medicine and Neonatology, Rudolfstiftung Hospital, Vienna, Austria. [email protected]; and Christoph Bock, CeMM Research Center for 8Clinical Institute of Pathology, Medical University of Vienna, Vienna, Aus- Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, tria. 9Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria. E-mail: [email protected] Vienna, Austria. 10Max Planck Institute for Informatics, Saarland Informat- Cancer Discov 2019;9:1406–21 11 ics Campus, Saarbrücken, Germany. Ludwig Boltzmann Institute for Rare doi: 10.1158/2159-8290.CD-19-0138 and Undiagnosed Diseases, Vienna, Austria. ©2019 American Association for Cancer Research.

1406 | CANCER DISCOVERY October 2019 www.aacrjournals.org

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

INTRODUCTION BRAFV600E mutation (6–8), but the pathophysiology of LCH is different from cancers with oncogenic BRAFV600E mutations, Langerhans cell histiocytosis (LCH) is a rare hematopoietic such as melanoma, non–small cell lung cancer, and colorec- neoplasm driven by gain-of-function mutations in the MAPK tal cancer. LCH cells carry few genetic alterations beyond pathway (1, 2). The name of the disease originates from the BRAFV600E or another oncogenic driver in the MAPK pathway characteristic expression of CD207, also known as , (MAP2K1, ARAF), and they show no progressive accumula- which is conventionally associated with Langerhans cells—a tion of recurrent somatic mutations (9). Instead, high levels type of resident epidermal immune cell (3). However, the - of inflammatory infiltrate suggest that cell-extrinsic, immu- tionship between LCH cells and Langerhans cells remains con- nologic stimuli could be key contributors to the formation of troversial, and LCH has also been linked to dendritic cells and LCH lesions (10–12). LCH is lethal in up to 20% of patients other myeloid cell types (4–6). LCH is phenotypically character- with risk organ–positive, multisystem disease treated accord- ized by an accumulation of CD1A and CD207 double-positive ing to current standard of care. In contrast, the disease often cells in various tissues (Fig. 1A). These lesions can develop in resolves spontaneously following unknown mechanisms in almost any organ, but are most common in skin and bone (1). patients with single lesions (1, 12). LCH severity is clinically assessed by the extent of the dis- ease (involvement of one or more organs, i.e., single-system RESULTS vs. multisystem disease) and the involvement of known risk organs (liver, spleen, hematopoietic system). Treatment deci- Langerhans Cell Histiocytosis Lesions Are sions are currently based solely on clinical presentation, as Composed of Distinct Cell Populations no reliable molecular markers for risk stratification have LCH cells within the same lesion show wide variability in been identified yet. LCH is most commonly driven by the their CD1A and CD207 levels based on IHC (Fig. 1B) and flow

October 2019 CANCER DISCOVERY | 1407

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

RESEARCH ARTICLE Halbritter et al.

A Langerhans cells histiocytosis (LCH) is a rare BELCH cells are Marker identify multiple cell types in single-cell transcriptome data of LCH lesions paediatric disease that affects various organs heterogeneous CD1A CD207 MAP2K1 CD33 BRAF Lesions formed by immune Bone lesion within biopsies Brain cells of unknown origin CD1A Cell surface markers:

Lymph node CD1A, CD207 High Skin Genetic drivers: Lung Gain of MAPK signaling, CD3D IL32 FOXP3 CD19 CD79A V600E Liver 60% BRAF CD207 Spleen Incidence: Bone ∼4.6/million (age 0–15) Hematoxylin and eosin stain Skin lesion

t-SNE dimension 2 IL3RACLEC4C TCF4 CD14 CD163 BRAFV600E Normalized UMI count Low

t-SNE dimension 1

CDLCH biopsy samples analyzed using LCH biopsy samples analyzed using F Single-cell transcriptomics captures LCH cells single-cell transcriptome sequencing single-cell transcriptome sequencing and their microenvironment Identifier Biopsy Sex Age Extent Plasmacytoid CD1A+ CD207 + LCH_ELymph FemaleChild Multi- dendritic cells (LCH cells) node system LCH_H Skin Female Infant Multi- B cells system LCH_I Bone Male Child Single- system LCH_J Skin Female Aged Multi- system LCH_K Bone Male Child Single- system t-SNE dimension 2 LCH_N Skin Male Adult Multi- t-SNE dimension 2 system T cells LCH_O Bone Male Child Single- 19,044 single-cell Monocytes/ system transcriptomes macrophages t-SNE dimension 1 t-SNE dimension 1 Patient: EHIJKNO LCH cellImmune cell

Figure 1. Single-cell transcriptome analysis captures cellular and molecular diversity of LCH lesions. A, Clinical presentation and incidence of LCH. Shown is an X-ray image of LCH bone lesions in the skull (top right), a photograph of extensive LCH skin lesions (bottom right), and a hematoxylin and eosin staining of an LCH biopsy (bottom left). B, Cellular heterogeneity in an LCH biopsy (lymph node), as revealed by immunostaining for CD1A, CD207, and an antibody that specifically detects mutated BRAFV600E . C, Overview of patient biopsy samples analyzed by single-cell RNA sequencing (RNA-seq) in this study. D, Low-dimensional projection (t-SNE plot) of the combined single-cell RNA-seq dataset across all analyzed biopsies, comprising a total of 19,044 single-cell transcriptome profiles. E, Molecular heterogeneity in LCH illustrated by low-dimensional projection (as in D) of all single-cell transcriptome profiles overlaid with the expression levels of selected marker genes (dark blue color indicates high expression levels). F, Cellular hetero- geneity in LCH illustrated by low-dimensional projection (as in D), annotated with cell types inferred from marker-.

cytometry (Supplementary Fig. S1). To characterize the cel- Focusing initially on LCH-specific transcriptional profiles lular and molecular landscape of LCH lesions, we performed that were shared across cells and across patients, we com- droplet-based single-cell transcriptome sequencing of seven pared CD1A and CD207 marker–positive LCH cells with four LCH lesions. The biopsies were obtained from patients with immune-cell populations identified in all biopsies (Supple- multisystem disease (n = 4) and single-system disease (n = 3), mentary Fig. S2D). The LCH cells showed high expression of and they were collected from different sites: lymph node (n = 1), multiple genes previously reported as specifically expressed skin (n = 3), and bone (n = 3), thus representing the wide clini- in LCH cells (14, 15), including the MMP9 gene, several genes cal spectrum of the disease (Fig. 1C; Supplementary Table S1). relevant for antigen presentation (for instance, CD1E), and Because the percentage of LCH cells within lesions can be low genes encoding members of the HLA complex (Fig. 2A and B). (1), two samples were enriched for cell populations with specific We combined the results of the individual comparisons into expression levels of the canonical LCH surface markers CD1A an LCH gene signature derived from our single-cell transcrip- and CD207 (Supplementary Fig. S1). A total of 19,044 single- tome data, which distinguished LCH cells from immune cells cell transcriptomes passed quality control and collectively gave across patients (Fig. 2B; Supplementary Fig. S3A and S3B). rise to a detailed cellular and molecular portrait of LCH lesions, We validated this gene signature on published LCH bulk spanning multiple patients and diverse clinical presentations transcriptome datasets (8, 14, 15) and were able to confirm (Fig. 1D; Supplementary Fig. S2A and S2B). In the single-cell its robustness for a wider range of LCH samples (Supple- transcriptome dataset, we identified several distinct clusters mentary Fig. S3C and S3D) and in comparison with other of immune cells, which we labeled on the basis of known cell histiocytoses, namely Erdheim–Chester disease and juvenile type–specific marker genes (Fig. 1E and F; Supplementary Fig. xanthogranuloma (Supplementary Fig. S3E). S2C). As expected, we observed T cells (some expressing FOXP3, Our LCH gene signature was enriched for genes associated indicating that these could be regulatory T cells as described with macrophages as well as dendritic cells (Fig. 2C), support- previously in LCH; ref. 13), B cells, monocytes/macrophages, ing that LCH cells share relevant properties with both cell and dendritic cells, in addition to LCH cells. types. We also observed an enrichment of genes involved in

1408 | CANCER DISCOVERY October 2019 www.aacrjournals.org

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

Developmental Hierarchy in Langerhans Cell Histiocytosis RESEARCH ARTICLE

A Gene expression in LCH cells differs from four C Immune and inflammation gene immune cell types present in the same LCH lesions signatures are enriched in LCH cells

6 6 LCH cells vs. monocytes/ LCH cells vs. T cells ARCHS4 Tissues macrophages Dendritic cell *** CST3 110 HLA–DQB2 TACSTD2 4 4 TACSTD2 Macrophage *** CD1A MMP9 Cord blood *** CD1A CD207 0.1 020 40 60 2 2 Gene density LYZ 0.0 erage UMI count (log): LCH S100A8 erage UMI count (log): LCH CCL3 IL32 Cellular response to interferon *** Av 0 CCL4 Av 0 CD3D Antigen processing/presentation *** 0 2 46 0246 Positive regulation of cell aging Average UMI count (log): Mono./mac. Average UMI count (log): T cells * 0102030 LCH cells > monocytes/macrophages LCH cells > T cells Dendritic cell Dendritic cell Cord blood Blood dendritic cell Jensen diseases Blood dendritic cell Macrophage Hepatitis B *** 0200 0200 Systemic scleroderma Monocytes/macrophages > LCH cells T cells > LCH cells *** Macrophage CD4+ T cell Langerhans cell histiocytosis *** Alveolar macrophage Regulatory T cell Peripheral blood T lymphocyte 02550 75 100 Enrichr score 0200 0200 Enrichr score Enrichr score 6 LCH cells vs. B cells 6 LCH cells vs. pDCs D targets and pathways of DNA replication and repair are enriched in LCH cells S100A4 S100A4 HALLMARK_MYC_TARGETS_V2 TACSTD2 4 4 MMP9 FDR: 0 NFKBIA CD1A Normalized ES: 2.12 HLA–DQB2 CD207

2 2 CD207 0.4 0.6 CD1E IGKC IGKC IRF7 IGHG1 IGHG1 CD79A IRF8 Average UMI count (log): LCH Average UMI count (log): LCH Average 0.0 0.2 0 0 Enrichment score (ES)

0246 0246 Higher in LCH Lower in LCH Average UMI count (log): B cellsAverage UMI count (log): pDCs

LCH cells > B cells LCH cells > pDCs MYC targets v2 *** Dendritic cell Dendritic cell MYC targets v1 *** Blood dendritic cell Blood dendritic cell E2F targets ** Cord blood Cord blood Oxidative phosphorylation ** 0200 0200 MTORC1 signaling ** B cells > LCH cells pDCs > LCH cells DNA repair ** Protein secretion * CD19+ B cell Plasmacytoid dendritic cell Fatty acid metabolism * Plasma cell Plasma cell Apical junction * Spleen (bulk tissue) Spleen (bulk tissue) Glycolysis * 0200 0200 KRAS signaling up * Enrichr scoreEnrichr score 0.00.5 1.01.5 2.0 Normalized enrichment score B LCH cells express a distinctive gene signature with intrapatient and interpatient variation LCH and immune cells DBI VIM TNF CA2 RM1 GSN PKIB PAK1 CKLF CD33 CD58 VASP CD1A CD1E CD1C GLRX PON2 RXRA LMNA GHRL PTMS MMP9 RHOC TIMP1 S100B TIGAR TEX30 FFAR4 CD207 PLEK2 M109A RPS19 PTPN6 CSTD2 CSF1R GPR84 RGS16 RGS10 PRDX1 PA TRPM4 FCGBP CCND1 PPM1N CAMK1 FILIP1L NDRG2 TRIM13 HPGDS SPINK2 S100A4 PDLIM7 RPS27L YWHAH LGALS1 PHLDA2 IL22RA2 FCER1A CARD19 MAP2K1 CDKN2A S100A 11 C15orf48 FA LGALS12 RASL10A CLEC10A TA Patient: RNASET2 NGFRAP1 TMEM14C CTNNBIP1 –404O13.5 HLA–DPA1 HLA–DPB1 HLA–DQA1 HLA–DQB1 HLA–DQB2 HLA–DQA2 –350J20.12 Cell type: RP 11 RP 11

Column z-score: Patient: E HIJKNO (gene expression)

Cell type: LCH T cell Mono./mac. B cell pDC − 1. − 1 − 0. 0 0.5 1 1.5 5 5

Figure 2. Characteristic gene expression patterns distinguish LCH cells from other immune cells present in LCH lesions. A, Scatter plots comparing gene expression (mean normalized UMI counts plotted on logarithmic scale) between LCH cells and non-LCH immune cell populations. Differentially expressed genes specific for LCH cells are colored in red, whereas genes specific for the immune cell populations are colored in violet (monocyte/macrophages), blue (T cells), green (B cells), and orange (plasmacytoid dendritic cells). The top three enriched categories from the ARCHS4 tissue database based on Enrichr are shown below. B, Heat map showing the expression of the genes in the LCH gene signature (genes overexpressed in LCH cells compared with at least three of the four immune cell populations in A) in the LCH and immune cells analyzed (rows). Cell type and patient are indicated by the colored bars on the right. C, Enrichment analysis with Enrichr for the LCH signature genes using three databases (top to bottom: ARCHS4 Tissues, Gene Ontology, or Jensen Diseases), ordered by the Enrichr combined score (58). FDR-adjusted P value: *, P < 0.1; ***, P < 0.001. Further details are provided in Supplementary Table S3. D, Gene set enrichment analysis (GSEA) for LCH-cell gene expression compared with immune cells using MSigDB hallmark signatures, including a GSEA plot for the most significant gene set (MYC_TARGETS_V2, top) and all significant enrichments visualized in the bar plot (bottom). FDR-adjustedP value: *, P < 0.1; **, P < 0.05; ***, P < 0.001.

October 2019 CANCER DISCOVERY | 1409

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

RESEARCH ARTICLE Halbritter et al.

A BD LCH cells cluster into consistent subsets Most LCH-cell subsets are shared LCH-cell subsets exhibit distinct gene signatures across patients between patients independent of S1 S2S12 S11 S13 S14 sample characteristics HMMR S4 MKI67 Patient:E HIJKNO AURKB 100 AURKA S3 CDK1 S6 50 S1 % cells in 0 S12 LCH subset S1 IL22RA2

S8 S1 S2 S3 S4 S5 S6 S7 S8 S9 CD1A S11 S7 S10 S12 S13 S14

S11 S9 C LCH-cell subsets differ by CDK1 MKI67 single-cell entropy MYBL2 1.000 AURKB S2 CD1A

t-SNE dimension 2 S5 CD300A S13

CLEC9A S10 0.995 BATF3 IRF8

S14 S12 S2 CD83 Single-cell entropy

LCH subset marker genes LAMP3 CCR7 S1 S2 S3 S4 S5 S6 S7 S8 S9 11 S11 IL7R t-SNE dimension 1 S10 S12 S13 S14 CXCR4 1 High-entropy Low-entropy HLA–DQA2 cell subsets cell subsets IFI6 0 IFIT3 MMP12 S100A3 −1

JDP2 Row z-score: MMP9

CD68 (gene expression) S14 S1 3S Clustered single-cell transcriptomes of LCH cells

EFLCH-cell subsets with higher entropy are enriched for cell cycle and DNA repair genes, Declining entropy is paralleled by lower expression H Mutation analysis confirms LCH as opposed to inflammation-related genes and disease gene signatures in low-entropy subsets of genes linked to cell cycle and DNA replication driver mutation in LCH-S1/-S12 KEGG Gene Ontology V600E TUBB (r = 0.54) BRAF Oocyte meiosis Reg. of mitotic cell cycle phase transition Mutated Unmutated *** *** 2 TUBA1B (r = 0.53) Cell cycle *** Mitotic nuclear division *** MKI67 (r = 0.52) LCH-S12 1 Progesterone–med. oocyte maturation S1 Mitotic sister chromatid segregation S1 CD1A (r = 0.30) LCH-S1 *** *** CD1A−CD207− DNA replication *** DNA metabolism *** 0 CD1A+CD207+ Bulk biopsy Cell cycle *** DNA replication *** −1 Pyrimidine metabolism S2 G1/S transition of mitotic cell cycle S2 Scaled UMI count *** *** Highest Lowest 0 100 Nicotinate/nicotinamide metabolism * Reg. of cytokine secretion ** Signaling entropy % cells Malaria Pos. reg. of NK cell cytotoxicity * Starch/sucrose metabolism S12 Reg. of TLR9 pathway S12 GIEpidermal Langerhans cell genes are Analysis using the HUMARA assay NFκB pathway *** Cytokine-mediated pathway *** primarily expressed in LCH-S1/-S2 confirms monclonality of LCH-S1/-S12 Toxoplasmosis Nuclear-transcribed mRNA catabolism ** Langerhans cell signature HUMARA Epstein–Barr virus infection ** S11 Dendritic cell chemotaxis * S11 Short allele Long allele −0.1 Influenza A *** Type I interferon patway *** LCH-S12 Tuberculosis Cellular response to type I interferon LCH-S1 *** *** − − Phagosome Neg. reg. of viral genome replication CD1A CD207 *** S13 *** S13 CD1A+CD207+ −0.2 Lysosome *** Neutrophil degranulation *** Bulk biopsy Rheumatoid arthritis *** Neutrophil mediated immunity *** Polyclonal Monoclonal Osteoclast differentiation ** S14 Reg. of long-term synaptic potentiation * S14 Scaled UMI count (mean) 0 100 02550 75 04080 120 S1 S2 S3 S4 S5 S6 S7 S8 S9 % allelic frequency S11 Enrichr score Enrichr score S10 S12 S13 S14

Figure 3. LCH lesions comprise proliferating LCH progenitors and multiple differentiated LCH-cell subsets. A, Low-dimensional projection (t-SNE plot) of LCH-cell transcriptomes, excluding non-LCH immune cells. LCH cells were clustered into 14 subsets. Dashed lines highlight the putative progeni- tor cell subsets (LCH-C1, LCH-C2) and the most differentiated cell subsets (LCH-C11 to LCH-C14) based on the analysis of transcriptome entropy (C). B, Percentage of cells assigned to each of the 14 LCH-cell subsets (from A) for each analyzed patient biopsy sample. C, LCH-cell subsets ordered by the median single-cell entropy of the corresponding single-cell transcriptomes. High entropy indicates promiscuity of gene expression, which has been described as a hallmark as undifferentiated cells. Dashed lines highlight the putative progenitor cell subsets (LCH-C1, LCH-C2) and the most differenti- ated cell subsets (LCH-C11 to LCH-C14) as in A. D, Heat map showing expression levels of differentially expressed genes between the two LCH subsets with highest entropy and the four subsets with lowest entropy. Selected hallmark genes are highlighted for each subset. E, Enrichment analysis with Enrichr for the LCH subset signature genes (from D) using KEGG pathways and Gene Ontology, ordered by the Enrichr combined score. FDR-adjusted P value: *, P < 0.1; **, P < 0.05; ***, P < 0.001. Further details are provided in Supplementary Table S3. F, Smoothed line plots displaying the relation between decreasing entropy (x-axis; ranked) and the expression of selected genes. The cell-cycle genes TUBB, TUBA1B, and MKI67 showed the highest Pearson correlation (r) between entropy and expression. A weaker correlation was found also between entropy and the expression of the canonical LCH marker gene CD1A. G, Expression of genes characteristic of (non-LCH) epidermal Langerhans cells in each LCH-cell subset, displaying the mean of the scaled UMI counts of all genes in the LCH subset signature. H, Assessment of BRAFV600E mutation burden (in percent) based on allele-specific qPCR in sorted LCH-S12 cells, LCH-S1 cells, CD1A/CD207-negative (non-LCH) cells, CD1A/CD207 double-positive LCH cells, and bulk biopsy cells. All data refer to patient sample LCH_E. I, Assessment of cell clonality using the HUMARA assay in the same sorted cell populations as in H, and in known polyclonal and monoclonal cell lines as negative and positive controls, respectively.

IFN signaling and antigen presentation, reflecting the inflam- and it established a catalog of potential targets for future matory nature of LCH lesions. Moreover, MYC-associated genes research on disease mechanisms and molecularly guided were characteristically enriched in the LCH cells, and the same therapies in LCH. was true for pathways linked to the cell cycle and DNA repair (Fig. 2D). In aggregate, our analysis identified an LCH gene LCH Lesions Harbor a Developmental Hierarchy signature that is consistent with the neoplastic (cell cycle, of LCH Progenitors and Their Progeny DNA replication, MYC) as well as the inflammatory (immu- We next exploited the single-cell resolution of our dataset to nity, inflammation, macrophages, dendritic cells) phenotype investigate cellular and molecular heterogeneity among LCH of the disease, it confirmed and extended previous reports cells. To that end, we clustered the single-cell transcriptomes describing gene-expression profiles of bulk LCH samples, of all biopsies and identified 14 discernible subsets (Fig. 3A).

1410 | CANCER DISCOVERY October 2019 www.aacrjournals.org

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

Developmental Hierarchy in Langerhans Cell Histiocytosis RESEARCH ARTICLE

Each of these subsets was represented in all samples, despite We applied our LCH subset gene signatures to published differences in sample characteristics such as disease extent and LCH bulk transcriptome datasets (refs. 8, 14, 15; Sup- site of the biopsied lesion (Fig. 3B; Supplementary Fig. S4A). plementary Fig. S4C–S4E) and observed patient-to-patient Given that LCH is generally considered a clonal disease (16, heterogeneity in the relative strength of these signatures, sup- 17), we hypothesized that the observed heterogeneity with its porting that LCH lesions can differ in their composition of recurrent subsets of LCH cells may arise from developmental LCH-cell subsets. Moreover, this analysis showed that many of processes such as cellular differentiation, dedifferentiation, the genes in the LCH-S11 signature were also highly expressed or transdifferentiation. This would be in line with the clinical in epidermal Langerhans cells, and genes in the LCH-S12 observation of a dynamic nature of LCH lesions, which often signature were highly expressed in dendritic cells, whereas the develop and resolve spontaneously (1, 12). signatures of these subsets were only weakly detectable in the Pursuing the hypothesis that a developmental hierarchy other histiocytoses analyzed (Supplementary Fig. S4E). exists within LCH lesions, we exploited previous findings Across all LCH cells (independent of their subset), we that transcriptional promiscuity is a hallmark of undifferen- observed a gradual loss of expression of cell-proliferation tiated cells, including stem cell and progenitor cell popula- genes such as TUBB, TUBA1B, and MKI67 as entropy levels tions (18–21). We quantified this property by calculating decreased and cells became more differentiated (Fig. 3F). This the “single-cell entropy” for each single-cell transcriptome, trend was further associated with a decrease in CD1A expres- which provides a measure of lack of order (interpreted here sion in the lowest entropy cells (Fig. 3F) and with a reduction as a lack of specialization), in analogy to the term’s use in in the expression of genes associated with epidermal Langer- physics (thermodynamics). We found that the 14 LCH-cell hans cells, which was most prominent in the LCH-S12 subset subsets differed in their levels of entropy, forming a gradient (Fig. 3G). Notably, the LCH-S11 subset displayed reduced of LCH cells with progressively lower entropy, corresponding expression of the overall Langerhans cell signature despite high to more restricted transcriptomes and more differentiated expression of individual Langerhans cell genes in the LCH-S11 cell states (Fig. 3C). We therefore labeled the 14 LCH-cell gene signature. Taken together, these observations lend further subsets by decreasing entropy, starting from LCH-S1 and support to a model where LCH progenitor cells with high cell LCH-S2 as the least differentiated LCH subsets and ending proliferation and high levels of CD1A marker expression give with LCH-S11 to LCH-S14 as the most differentiated sub- rise, through a gradual process, to differentiated cell subsets sets. We also identified multiple intermediate states, indica- that are less proliferative and carry gene-expression profiles tive of a continuous developmental process unfolding within reminiscent of differentiated immune cells, including that of LCH lesions. For further analysis, we focused on the extreme dendritic cells (most pronounced in the LCH-S12 subset). points of this developmental hierarchy, which exhibited To confirm that the analyzed cell subsets indeed consti- markedly distinct gene expression (Fig. 3A and D; Supple- tute bona fide LCH cells, we performed two complementary mentary Fig. S4B). validations, assaying BRAFV600E mutation status as well as cell Comparing the highest-entropy, least differentiated cell clonality for representative LCH subsets. We prospectively subsets (LCH-S1 and LCH-S2) with the four lowest-entropy enriched cells from the LCH-S1 and LCH-S12 subsets, as well subsets (LCH-S11 to LCH-S14), we identified characteris- as CD1A+CD207+ LCH cells and CD1A−CD207− non-LCH cells, tic transcriptional profiles associated with each LCH-cell for the patient sample with the highest percentage of LCH-S12 subset (Fig. 3D; Supplementary Table S2). In both LCH-S1 cells (Supplementary Fig. S4F and S4G). We then quantified and LCH-S2, we detected high expression of CD1A and of the BRAFV600E mutation rate in each sorted cell population genes associated with cell proliferation, including MKI67 using allele-specific droplet digital PCR (24). Reassuringly, both (which encodes the canonical proliferation marker Ki-67) and LCH subsets as well as the bulk LCH-cell population displayed the aurora kinases AURKA and AURKB—consistent with the a BRAFV600E mutation rate in the range of 85% to 90% (Fig. 3H). interpretation that these two subsets constitute proliferative, We further assessed clonality for the same cell populations progenitor-like LCH cells. Pathway enrichment analyses cor- using the HUMARA assay (16, 17), which evaluates X chromo- roborated their proliferative nature with specific enrichment some inactivation status in female-derived samples (such as the for DNA-replication and cell cycle–regulated genes (Fig. 3E). tested LCH lesion). Indeed, we found that both LCH subsets as In contrast, the lowest entropy and putatively more differen- well as the bulk LCH-cell population showed substantial skew- tiated LCH-cell subsets LCH-S11 to LCH-S14 were character- ing similar to the positive (monoclonal) control, whereas non- ized by high expression of immune genes involved in cellular LCH cells were more similar to the negative (polyclonal) control processes such as cytokine signaling, chemotaxis, and IFN sig­ (Fig. 3I). These results demonstrate that the LCH-S1 and LCH- naling. Specifically, LCH-S11 cells expressed markers of mature S12 cell subsets constitute bona fide LCH cells of clonal origin dendritic cells such as CD83 and LAMP3, as well as genes for that carry the BRAFV600E driver mutation. two receptors involved in Langerhans cell migration, CXCR4 We next tested whether the results obtained on the merged and CCR7 (3, 22); LCH-S12 cells expressed CLEC9A, BATF3, and dataset comprising all 7 patients with LCH were replicated in IRF8, which are characteristic of certain dendritic cell popula- the individual LCH lesions (Supplementary Fig. S5A–S5C). tions (23); LCH-S13 cells expressed genes associated with IFN Indeed, cells corresponding to the progenitor-like LCH-S1 response; and LCH-S14 cells were characterized by expression subset consistently exhibited high levels of entropy in all of MMP9, CD68, CD63, and the peptidase ANPEP (CD13), and 7 lesion-specific single-cell transcriptome datasets, whereas exhibited pathway enrichment for osteoclast differentiation and cells corresponding to the more differentiated cell subsets rheumatoid arthritis, suggesting that this LCH-cell subset may (LCH-S11 to LCH-S14) had substantially lower levels of be involved in osteolysis and tissue destruction (Fig. 3D and 3E). entropy in each analyzed LCH patient sample.

October 2019 CANCER DISCOVERY | 1411

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

RESEARCH ARTICLE Halbritter et al.

A B Single-cell transcriptome signatures distinguish between different LCH-cell subsets MKI67, HMMR, and CLEC9A identify specific LCH-cell subsets as single marker genes LCH-S1 LCH-S2 LCH-S12 HMMRMKI67 CLEC9A t-SNE dimension 2 t-SNE dimension 2 Specific: LCH-S1 Shared: LCH-S1/S2 Specific: LCH-S12 t-SNE dimension 1 LCH subset signature t-SNE dimension 1 Normalized UMI count Low High Low High

CDImmunohistochemistry identifies A subset of LCH cells is positive E A small subset of LCH cells is positive HMMR-positive LCH cells for HMMR and CD1A for CD1A and CLEC9A

20 µm 20 µm 20 µm HMMR HMMR CD1A DAPI CD1A CLEC9A DAPI

Figure 4. The cell-surface markers HMMR and CLEC9A identify specific LCH-cell subsets. A, Low-dimensional projection (t-SNE plot) of LCH-cell transcriptomes (same layout as in Fig. 3A) overlaid with the intensity of LCH subset–specific gene expression signatures (calculated as the mean scaled UMI count for all genes in the signature). Dark red color indicates high expression levels. B, Low-dimensional projection (t-SNE plot) of the LCH-cell transcriptomes (as in A) overlaid with the expression of selected LCH subset–specific marker genes (dark blue color indicates high expression levels). Expression of HMMR is specific to LCH-S1 cells,MKI67 expression overlaps strongly with both the LCH-S1 and LCH-S2 cell subsets, and CLEC9A is detected almost exclusively in LCH-S12 cells. C, IHC image of HMMR (brown) staining of an LCH biopsy. D, Immunofluorescence image of HMMR (green), CD1A (red), and DAPI (blue) staining of an LCH biopsy. E, Immunofluorescence image of CD1A (green), CLEC9A (red), and DAPI (blue) staining of an LCH biopsy.

Finally, to visualize the different LCH-cell subsets in tis- subsets from the same sample for which we confirmed sue slices of LCH lesions (with potential future relevance for BRAFV600E mutation rates and cell clonality (Fig. 3H and clinical diagnostics), we searched our transcriptome dataset I). We performed chromatin profiling by assay for trans- for subset-specific markers that can be analyzed using IHC posase-accessible chromatin using sequencing (ATAC-seq; or immunofluorescence, and we found promising candidates ref. 29) on enriched cells of the progenitor-like cell subset (Fig. 4A and B). Ki-67 (MKI67) and HMMR, in combination LCH-S1 and of two differentiated cell subsets, LCH-S11 with the LCH-specific markers CD1A and CD207, were iden- and LCH-S12 (Supplementary Fig. S7A and S7B). Tran- tified as candidate markers for the LCH-S1 and LCH-S2 sub- scriptional regulators and signaling pathways identified sets; and CLEC9A, in combination with CD1A and CD207, in this analysis may aid in understanding the develop- was identified as a candidate marker for the LCH-S12 cell mental processes in LCH lesions and potentially identify subset (Fig. 4C–E; Supplementary Fig. S6A–S6G). candidate targets for future molecularly guided therapies In aggregate, our results support a developmental hier- of LCH. archy within LCH lesions that is seeded by progenitor-like We integrated the ATAC-seq data with our matched single- LCH-S1/LCH-S2 cells and gives rise to more differentiated cell RNA-sequencing (RNA-seq) data, to connect chromatin- LCH-cell subsets, which we interpret as maturing LCH accessible regulatory regions to the genes that they may cells (LCH-S11 and LCH-S12), based on transcriptome regulate. To link regulatory regions to their most likely features resembling epidermal Langerhans cell activation target genes, we considered not only proximity in the DNA and dendritic cell maturation (25, 26), and as aggressive sequence but also proximity in the chromatin 3-D struc- LCH cells (LCH-S13 and LCH-S14), based on the expres- ture according to HiC data for hematopoietic cells (30). We sion of genes linked to destructive inflammatory behavior then plotted the average chromatin accessibility of gene- (15, 27, 28). linked regulatory regions against the expression of each respec- tive gene, focusing on differences between the LCH-S1 and Aberrant Regulatory Programs Orchestrate the LCH-S11, and between the LCH-S1 and LCH-S12 subsets Developmental Hierarchy in LCH Lesions (Fig. 5A). LCH-S12 cells displayed increased gene expression, To explore the regulatory basis of the observed develop- and a concordant gain in chromatin accessibility, for the mental hierarchy, we prospectively enriched selected LCH key myeloid regulatory gene IRF8, whereas LCH-S11 cells

1412 | CANCER DISCOVERY October 2019 www.aacrjournals.org

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

Developmental Hierarchy in Langerhans Cell Histiocytosis RESEARCH ARTICLE

A Chromatin accessibility reflects transcriptional B Chromatin accessibility of regulatory modules D Regulatory modules are specifically enriched for ChIP-seq–based differences in LCH cell subsets discriminates LCH cell subsets binding sites of transcription factors and epigenetic regulators Differentially accessible regions (n = 1,964) ChIP-seq overlap: DNMT1 RELB a1 *** *** ** ******* ** 4 −1 CENPA a12 *** *** *** ***** 3 12 *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** a11 odds

IRF8 2 2 ) KDM2B i1 ** *** *** ***** *** * *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** 2 LCH-S12 i12 1 FACS 0 Log *** *** ***** *** 0

BATF3 1S Binding site i11 2 1 3

SMYD3 JUNB 11 S1 enrichment 5A 1 ? MYB IRF4 IRF3 IRF1 SPI1 MXI1 AT AC-seq BATF TCF3 BCL3 BCL6 PHF8 MTA3 JUNB CBX3 CDK7 CDK9 LCH-S1 S TAT S TAT S TAT EP300 FOSL1 FOSL2 GTF2B MYBL2 RBBP5 KDM4A KDM1A KDM5B KDM5C S TAT BCL11A HMGN3 TCF7L2 NFATC1

LCH-S1 LCH-S12 S1 scRNA-seq: a1 i1 a12 i12 a11i11 S1 DNMT1 RELB −1 S12 Regulatory modules of genomic regions LCH_E ≥2 MXD3 JUNB S11 TF motif CENPA S1 S1 2S (ax/ix = active/inactive in LCH cell subset X) LCH-S11 0 ETV3 enrichment Overexpressed in subset: S1 Lymph node PHF19 Column z-score: 0 KDM2B REL All S12 (chromatin accessibility) Norm. UMI biopsy (LCH_E) AT AC-seq (normalized RPM, log S11 1 SMYD3 samples 2 1 3 ATCG? 5A

LCH-S1 110 AT AT − AT MYB IRF4 IRF3 IRF1 SPI1 MXI1 AT BATF TCF3 BCL3 BCL6 PHF8 MTA3 JUNB CBX3 CDK7 CDK9 Gene set ST ST ST EP300 FOSL1 FOSL2 GTF2B enrichment MYBL2 RBBP5 KDM4A KDM1A KDM5B KDM5C ST BCL11A HMGN3 TCF7L2 1 −10 NFATC1 LCH-S1 LCH-S11

scRNA-seq (normalized UMI count, log2)

Genes for developmental cell types, cell cycle, and cancer are linked to a5 regions, Regulatory modules are linked to target genes via CEas opposed to immune-related genes linked to other regulatory modules consensus binding motifs

Jensen NCI- ARCHS4 Motif prevalence: diseases nature tissues a1 *** *** *** *** **** *** *** *** *** *** *** 4 a12 *** *** *** *** *** *** *** * 3 Pancreatic cancer IL23-mediated signalling Fetal brain cortex a11 *** *** *** *** *** ** *** *** *** *** ** odds 2 Liver cancer Netrin-mediated signalling Neuronal epithelium i1 ** *** *** *** 2 a1 1 BATF3: IFN pathway Myoblast i12 *** ** ** Kidney cancer γ Log ** i11 *** *** *** ***** ***** *** *** *** *** *** * 0 Systemic lupus erythematosus ** FGF pathway Macrophage *** 1 C1 AT

Multiple sclerosis IFNγ pathway Dendritic cell REL

** *** IRF8 SPIB a12 MITF JDP2 AT ETV3 JUNB RELB SOX4 ST Asthma Ephrin B reverse signalling Myoblast BATF3 IRF8: TFDP1 FOSL2

** *** MYBL2 PRDM1 Anxiety disorder E-cadherin in nascent adherens junction Blood PBMC * NF Systemic lupus erythematosus Reg. of Kidney (bulk) * a11i1i12i11 Celiac disease P38 signalling by MAPKAP Cord blood * scRNA-seq: + JDP2: Williams–Beuren syndrome Reg. of retinoblastoma protein CD34 cell *** S1 Type 1 diabetes mellitus RAC1 pathway Kidney (bulk) ** S12

Multi. endocrine neoplasia type1 ATF-2 transcription factor network Natural killer cell ** LCH_E S11 ≥2 Small intestine cancer Integrin-linked kinase signalling Neuronal epithelium ** MYBL2: REM sleep behavior disorder IGF1 pathway Myoblast S1 * S12 0

Kidney cancer Signalling mediated by VEGFR1/2 Kidney (bulk) All S11 Norm. UMI samples Sclerocornea C-MYB transcription factor network Placenta (bulk) *** 1 C1

Kidney cancer HIV-1 Nef: Neg. effector of Fas/TNF Alveolar macrophage AT

** ** REL IRF8

+ SPIB MITF JDP2 AT ETV3 JUNB RELB Lung cancer JNK signalling in CD4 TCR pathway Dendritic cell SOX4 ST ** BATF3 TFDP1 FOSL2 MYBL2 PRDM1 010 20 0510 0102030 NF Enrichr scoreEnrichr scoreEnrichr score

Figure 5. Characteristic patterns of chromatin accessibility distinguish between LCH-cell subsets. A, Scatter plots contrasting differential gene expression with differential chromatin accessibility between progenitor-like LCH-S1 cells and the more differentiated LCH subsets LCH-S12 (top) and LCH-S11 (bottom). The x-axis denotes differences in gene expression (mean normalized UMI count based on scRNA-seq) and the y-axis displays the mean difference in chromatin accessibility over all regulatory regions linked to the respective gene [mean of two replicates, mean normalized reads per million (RPM)]. Colored points denote LCH subset–specific marker genes (from Fig. 3D). B, Heat map showing ATAC-seq signal intensity for 1,964 differentially accessible regions identified in pair-wise comparisons of one LCH-cell subset against the two other subsets. Regions are grouped into six modules based on differential chromatin accessibility. ATAC-seq signal intensity scores are scaled by column (ATAC-seq peaks) for better visualization of differences. C, Enrichment analysis with Enrichr for genes linked to each of the six modules of LCH subset–specific chromatin-accessible regions, showing the top 3 most enriched terms ordered by the Enrichr combined score (58). FDR-adjusted P value: *, P < 0.1; **, P < 0.05. Further details are provided in Supplementary Table S3. D, Heat map of transcription factor binding enrichment for LCH subset–specific modules, based on LOLA (31) analysis using a large collection of chromatin immunoprecipitation sequencing (ChIP-seq) peak profiles. Top, LOLA enrichments colored by the log2 odds ratio of enriched overlap between the region modules and ChIP-seq peaks for the corresponding transcrip- tion factor, compared with the background of all regulatory regions in the ATAC-seq dataset. Bottom, mean expression (normalized UMI count) of the gene encoding each transcription factor in different LCH subsets (showing gene expression in LCH_E, which matches the ATAC-seq data, as well as the mean expression across all patient samples). Further details are provided in Supplementary Table S3. E, Enrichment of transcription factor binding motifs for LCH subset–specific modules, based on HOCOMOCO (32) motif occurrences identified using FIMO (59). Top, motif enrichments colored by the log2 odds ratio of enriched overlap between the module regions and DNA motif hits for the corresponding transcription factor, com- pared with the background of all regulatory regions identified in the ATAC-seq dataset. Bottom, expression levels of the genes encoding the corresponding transcription factors (as in D). Right, consensus DNA motif for selected transcription factors. FDR-adjusted P value, Fisher exact test: *, P < 0.1; **, P < 0.05, ***, P < 0.005.

displayed increased expression and chromatin accessibility i11, a12, and i12 as those specifically active or inactive in for the NFκB genes REL and RELB as well as the immune the corresponding LCH subsets. We characterized these regulatory genes JUNB and ETV3. regulatory modules by three complementary lines of bioin- For a more systematic analysis of the regulatory dynam- formatic analysis. ics in LCH lesions, we identified all genomic regions that First, enrichment analysis of genes associated with the displayed differential accessibility between the LCH-S1, regulatory regions identified gene sets implicated in cancer LCH-S11, and/or LCH-S12 subsets (n = 1,964; Fig. 5B; (LCH-S1) and in immune diseases (LCH-S12), consistent Supplementary Table S2). We refer to the set of regula- with the dual nature of LCH as a neoplasm with a strong tory regions that showed increased chromatin accessibility inflammatory component (Fig. 5C; Supplementary Table S3). (activation) in LCH subset LCH-S1 relative to LCH-S11 or We also identified genes involved in various immune-related LCH-S12 as a1 (for “active”) and to those that were char- pathways, including IL and IFN signaling. Regions that were acterized by decreased accessibility (inactivation) as i1 (for differentially accessible in the more differentiated LCH-S11 “inactive”). We analogously defined region modules a11, or LCH-S12 cell subsets (region modules a11 and a12) were

October 2019 CANCER DISCOVERY | 1413

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

RESEARCH ARTICLE Halbritter et al. frequently associated with genes expressed in macrophages, ChIP-seq data and sequence motifs), and then linked regu- dendritic cells, and other mononuclear blood cells, consist- latory regions to their putative target genes (using sequence ent with the transcriptional signatures we discovered in the proximity and HiC data), which gave rise to a network gene-expression analysis. model that incorporates data from all LCH subsets. Next, Second, we sought to identify the key regulators of each we derived network models specific to each LCH subset module using enrichment analysis for transcription fac- by parameterizing this global network with the measured tor (TF) binding sites. To that end, we performed genomic chromatin accessibility and gene-expression levels in each region enrichment analysis using the LOLA software (31) LCH subset using the matched ATAC-seq and single-cell for a large collection of TF binding sites experimentally RNA-seq data for the most comprehensively characterized determined by chromatin immunoprecipitation sequencing biopsy LCH_E (Fig. 6A). (ChIP-seq; Supplementary Table S3). Of all TFs identified as This network-based analysis revealed gene-regulatory hubs significantly enriched in at least one module, 34 TFs were shared across all LCH subsets, such as the hematopoietic either among the genes previously identified as differentially TF SPI1 (PU.1; Fig. 6B), but also subset-specific differences expressed in one of the LCH subsets or directly linked to (Fig. 6C). For example, the TFs BATF, TCF3, TCF7L2, and such a gene (Fig. 5D; Supplementary Table S3). Notably, all MYBL2 were highly prominent in the regulatory network of of these TFs were enriched in regulatory regions with inac- LCH-S1. MYBL2 is known to be involved in cancer initiation cessible chromatin in LCH-S1 cells (module i1), whereas and cell proliferation, often associated with poor survival some were accessible in LCH-S11 (module a11), underlining (33, 37, 38). Lysine demethylases (KDM) including KDM1A the contrasting nature of these two LCH subsets. Several and KDM4A were also most active in LCH-S1 cells, consist- TFs were enriched in multiple region modules, indicating ent with the progenitor-like state of LCH-S1 and suggesting broad relevance for LCH cells, which included the AP1 fac- that developmental TFs may be retained in a poised state tors JUNB and FOSL2 as well as the STAT and IRF families (39–42). The regulatory network of LCH-S12 was instead of TFs. We further observed enrichment of SPI1 (also known dominated by the TFs IRF8 (an IFN-dependent mediator of as PU.1) binding sites in all modules except those linked macrophage and dendritic cell function and differentiation; to LCH-S12 [both accessible (a12) and inaccessible (i12)], refs. 34, 36, 43), BATF3, and BCL6, which were all closely indicative of a central role of this hematopoietic master connected to STAT1. Moreover, many connections of SPI1 regulator in LCH. Interestingly, binding sites for the histone to its target genes in other LCH subsets were altered in lysine demethylase KDM1A (also known as LSD1) were LCH-S12 cells, possibly contributing to the specific expres- significantly enriched in module i1 (inaccessible in LCH-S1 sion of dendritic-cell genes in these cells. For the LCH-S11 progenitors) but not in other modules, suggesting a role of subset, the regulatory network is centered on NFκB (REL this repressive histone modifier in the development of the and RELB), STAT3, and AP1 (JUNB and FOSL2), which are more differentiated LCH subsets. present in all networks but most pronounced in LCH-S11 Third, we performed enrichment analysis for a compre- cells. Finally, we observed differential node importance for hensive catalog of DNA sequence binding motifs of TFs the histone acetyltransferase and transcriptional coactivator obtained from the HOCOMOCO database (ref. 32; Fig. EP300 in both LCH-S11 and LCH-S12 cells, implicating this 5E). Motif enrichment analysis is complementary to LOLA epigenetic modifier in the specification of more differenti- analysis (which uses cell type–specific ChIP-seq data) as ated LCH-cell subsets. it focuses on DNA-binding propensities that are intrinsic In summary, we used chromatin-accessibility profiling in to the TF and largely independent of cell type. Regula- prospectively purified LCH-cell subsets in combination with tory regions specifically active in LCH-S1 (module a1) were single-cell RNA-seq data and integrative bioinformatic analy- strongly enriched for the MYBL2 motif, which has a known sis to reconstruct the gene-regulatory networks that may con- regulatory role in cancer stem cells (33). In contrast, the a12 trol LCH subset–specific gene expression. Gene expression module (active in LCH-S12 cells) was enriched for motifs of the discussed regulators was largely consistent across all of the hematopoietic TFs IRF8 and BATF3. These factors analyzed lesions (Supplementary Fig. S7C–S7E), suggesting are critical for myeloid and dendritic cell differentiation that the prototype gene-regulatory networks put forward and function (34–36), possibly explaining the more mature here indeed capture the epigenomic program underlying the myeloid identity of LCH-S12 cells. Moreover, we observed developmental hierarchy in LCH lesions. enrichment for the JDP2 motif in the i1 module. This suggests that an inactivation of AP1 may regulate differen- tiation, in line with upregulation of JDP2 transcription in DISCUSSION more differentiated subsets (Fig. 3D). LCH lesions are thought to arise from hematopoietic pro- genitor cells that carry an activating mutation in the MAPK Regulatory Networks Identify Subset-Specific signaling pathway (2, 44). Despite their clonality and depend- Regulatory Hubs in LCH Lesions ence on a single oncogenic driver (most frequently the activat- Finally, we integrated our data into gene-regulatory net- ing BRAFV600E mutation), LCH is pathophysiologically and work models of LCH to combine and visualize the regu- clinically very different from hematopoietic cancers and from latory programs that we identified in the different LCH BRAFV600E-driven solid tumors. In this study, we systemati- subsets. We assigned the identified TFs (selected on the cally investigated the cellular and molecular heterogeneity in basis of genomic region enrichment and motif analysis) primary LCH lesions using single-cell transcriptome sequenc- to the regions that they are predicted to bind (based on ing in combination with epigenome profiling ofprospectiv­ ely

1414 | CANCER DISCOVERY October 2019 www.aacrjournals.org

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

Developmental Hierarchy in Langerhans Cell Histiocytosis RESEARCH ARTICLE

A Chromatin accessibility and transcriptome profiles define gene regulatory networks specific to LCH-cell subsets

LCH-S1 LCH-S12 LCH-S11

STAT1 STAT1 STAT1

IRF8

KDM4A HMGN3 HMGN3 HMGN3 BATF3 MYBL2 TCF3 GTF2B GTF2B BATF TCF7L2 GTF2B EP300 STAT3 EP300 STAT3 KDM1A BCL6 EP300 JDP2 STAT3 BCL3 CDK9 CDK9 JUNB JUNB JUNB REL REL SPIB RBBP5 RBBP5

IRF1 IRF1 IRF1

RELB SPI1 SPI1 SPI1

Connection specific a1 i1 Constitutively expressed TF a12 i12 to regulatory module a11i11 Differentially expressed TF Node importance

B Large parts of the regulatory network are shared between LCH subsets LCH-S1 LCH-S12 LCH-S11 SPI1 SPI1 SPI1 HMGN3 IRF8 JUNB STAT1 HMGN3 HMGN3 JUNB STAT1 STAT3 STAT3 BATF3 IRF1 IRF1 JUNB REL GTF2B EP300 RELB BATF BCL6 STAT1 CDK9 IRF1 EP300 REL GTF2B CDK9 0369 0369 0369 Absolute node importance (log ) 2

C Distinct network components dominate LCH-subset-specific regulatory networks LCH-S1 LCH-S12 LCH-S11 BATF IRF8 RELB TCF3 BATF3 REL TCF7L2 BCL6 SPIB MYBL2 EP300 STAT3 KDM1A STAT1 JUNB KDM4A STAT5A IRF1 GTF2B IRF3 BCL3 STAT1 STAT2 MTA3 HMGN3 SOX4 FOSL2 IRF3 PHF8 ETV3 0 12 012 012 Differential node importance

Figure 6. Characteristic gene regulatory networks underlie the observed LCH developmental hierarchy. A, Gene regulatory networks inferred for three LCH-cell subsets (LCH-S1, LCH-S12, LCH-S11), based on single-cell transcriptome and ATAC-seq data, and the key regulators identified by the enrichment analysis in Fig. 5D and E. Nodes in the network correspond to the enriched transcription factors as well as their putative target genes (based on sequence proximity and chromatin 3-D structure). Node size is proportional to both gene-expression level and node out-degree (i.e., number of outgoing connections from the transcription factor) in the respective LCH subset. Edge colors indicate the module of the corresponding peak, and edge visibility is proportional to chromatin accessibility. The network layout was automatically generated using the igraph package. A browser-based version for interactive data exploration is available in Supplementary File S1 and on the Supplementary Website: http://LCH-hierarchy.computational- epigenetics.org. B, Bar plots showing the top 10 transcription factors ranked by node importance (calculated as a combination of gene expression level and node out-degree) in the networks in A. C, Bar plots showing the top 10 transcription factors ranked by differential node importance (node importance in one network relative to the mean across all three networks) in the networks in A.

enriched LCH-cell subsets. Our dataset comprises 19,044 that hematopoietic differentiation programs remain active single-cell transcriptomes across seven LCH biopsies, rep- in LCH cells and contribute to the cellular and molecular resenting the wide clinical spectrum of LCH. Integrative hierarchy observed in LCH lesions. Specifically, LCH-S12 cells bioinformatic analysis of this rich resource uncovered an expressed genes associated with dendritic cells, including unexpected degree of heterogeneity among LCH cells, as well CLEC9A and BATF3 (23); LCH-S11 cells carried features of as evidence of a shared developmental hierarchy that underlies maturing epidermal Langerhans cells, such as expression of all analyzed lesions. CCR7 and CXCR4 (22); and LCH-S13 as well as LCH-S14 cells Most notably, we identified two subsets of LCH cells were characterized by high expression of proinflammatory with hallmarks of progenitor cells (LCH-S1 and LCH-S2), genes, genes encoding metalloproteinases (MMP9, MMP12), including high single-cell entropy (indicative of an undif- and genes encoding aminopeptidases (ANPEP), indicating ferentiated cell state) and higher cell proliferation. On the that these LCH-cell subsets may contribute to tissue destruc- basis of our data, we put forward a conceptual model in tion as observed in LCH (45). Finally, we inferred preliminary which these progenitor-like LCH cells seed a hierarchy of gene-regulatory network models for three of the LCH-cell LCH cells that develop into more differentiated cell states subsets (LCH-S1, LCH-S12, LCH-C11), which implicated (Fig. 7). The four subsets of LCH cells with lowest transcrip- key regulators of immunity (including JAK–STAT, AP1, and tome entropy (indicative of a differentiated cell state) expressed NFκB signaling) and development (including several epi- genes reminiscent of different immune cell types, suggesting genetic modifiers) in the regulation of the developmental

October 2019 CANCER DISCOVERY | 1415

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

RESEARCH ARTICLE Halbritter et al.

LCH-S12 LCH-S1 CLEC9A HMMR

EP300 Maturing IRF8, BATF3 KDMs JAK/STAT signaling LCH-S11 AP1, NFKB CD83, LAMP3 CCR7, CXCR4 MKI67 LCH-S2 LCH-S13 AURKA/B MYBL2 MMP12 Interferon ? LCH-S14

MMP9 Destructive CD68 ANPEP Progenitors

Entropy

Figure 7. Speculative model of the LCH developmental hierarchy and underlying regulatory mechanisms. Schematic representation of the develop- mental hierarchy in LCH lesions based on the combined analysis of single-cell transcriptomes (Figs. 2 and 3) and epigenome data from prospectively sorted LCH-cell subsets (Figs. 5 and 6).

­hierarchy in LCH cells. Although the gene-regulatory net- To address the many questions posed by our preliminary works presented here are based on the analysis of one biopsy model, it will be necessary to develop new in vitro (e.g., by ATAC-seq and on external data (ChIP-seq, DNA motifs) patient-derived organoids) and in vivo (e.g., xenograft experi- from other biological systems, we have observed consistent ments; ref. 46) experimental systems for LCH research, transcriptional activity of key regulators in the other exam- and to closely align these systems with our observations in ined biopsies, supporting relevance to our conceptual under- ­primary patient samples. standing of LCH development. Our model has several potential implications for the Our proposed model (Fig. 7) provides a framework for future clinical management of LCH (and possibly other understanding the developmental and regulatory dynamics histiocytic diseases). First, we speculate that interindividual in LCH. This model will need to be refined, validated, and differences in the composition of LCH-cell subsets reflect improved by subsequent research, taking into account three differences in the aggressiveness of the disease and may cor- important points. First, the precise number of distinct LCH- relate with relevant clinical characteristics including prog- cell subsets remains somewhat arbitrary, given that cellular nosis. There are currently no established molecular markers differentiation is a gradual process, and any separation into for risk stratification in LCH, which constitutes an unmet a number of distinct cell populations is necessarily a simpli- clinical need given the range of treatment options (from fication. Although our bioinformatic analysis identified 14 watch-and-wait to intensive chemotherapy). Although sin- LCH-cell subsets in a data-driven manner, for many inves- gle-cell transcriptome sequencing may not be practical in a tigations it will make sense to combine related cell subsets, routine clinical setting, we were able to observe differences in as we have done for the two progenitor-like subsets (LCH- LCH subset composition in bulk RNA-seq (Supplementary S1 and LCH-S2). Second, it remains to be studied to what Fig. S4C–S4E), IHC, and immunofluorescence imaging (Fig. degree the cells that proceed along the developmental hier- 4; Supplementary Fig. S6). After validation in large patient archy in LCH lesions undergo irreversible cell-fate decisions cohorts, such markers may help inform patient-specific and to what degree they retain the plasticity to change from treatment decisions in LCH. Second, our data implicate sev- one LCH subset to another. For example, it seems possible eral immune-regulatory signaling pathways in the regulation that the three closely related cell subsets LCH-S11, LCH- of the developmental hierarchy, including the potentially S13, and LCH-S14 constitute different manifestations (e.g., destructive LCH-S13 and LCH-S14 cell subsets. This obser- depending on the cellular microenvironment) of one devel- vation may explain the success of anti-inflammatory drugs opmentally defined LCH-cell subset (Fig. 7, center right). such as indomethacin in the treatment of LCH, suggesting Third, our study focused on (peripheral) LCH lesions, but that such treatment might not only alleviate symptoms it is entirely possible that the developmental hierarchy in but indeed interfere with key pathogenic processes. This LCH lesions is seeded by an unseen LCH cell precursor that underlines the importance of the current randomization is located in a different anatomic location. For example, a between chemotherapy (mercaptopurine and methotrexate) plausible model of multisystem LCH may include an LCH and an anti-inflammatory drug (indomethacin) in the ongo- cell precursor in the bone marrow, which has acquired the ing international study for the treatment of LCH (LCH- clonal driver mutation and seeds LCH lesions at distal IV; ClinicalTrials.gov Identifier: NCT02205762). Third, a sites that support a proinflammatory microenvironment. thorough dissection of developmental hierarchies in other

1416 | CANCER DISCOVERY October 2019 www.aacrjournals.org

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

Developmental Hierarchy in Langerhans Cell Histiocytosis RESEARCH ARTICLE histiocytic disorders such as Erdheim–Chester disease may 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI, Roche refine our understanding of the similarities and differences Diagnostics), washed again, and mounted in aqueous mounting between these diseases and potentially give rise to more medium (PermaFluor, Thermo Fisher Scientific). Negative con- precise molecular diagnoses of diseases of the histiocytic trols were obtained by substituting isotype-matched IgG for the spectrum. respective primary antibodies and by applying the second-step antibodies and the anti-FITC antibodies alone. Image acquisition In summary, our study charts a detailed molecular map of was performed on a ZEISS LSM 510 equipped with four lasers LCH lesions, uncovering a cellular hierarchy that shows evi- and a ZEISS Axiovert 200 M using a ZEISS Plan Neofluar 40× dence of developmental, immunologic, and oncogenic regu- objective with oil immersion. Images were exported from ZEN latory mechanisms. Our results reinforce the view that LCH software (ZEISS). shares important aspects with cancers as well as immune diseases, which makes it a particularly interesting model for Flow Cytometry and Cell Sorting biomedical research. Moreover, this study demonstrates the The following antibodies were used for FACS purification of primary power of combining single-cell sequencing and epigenome LCH samples: CD45-PerCP (345809, BD Biosciences), CD1a-FITC profiling for dissecting complex developmental hierar- (555806, BD Biosciences), CD207 PE (IM3577, Beckman Coulter), chies and their regulatory underpinnings, thereby provid- CD14 Alexa Fluor 700 (A7-212-T100, Exbio). To select antibodies ing a rational basis for the development of personalized for the analysis of LCH subsets, we cross-checked marker gene lists therapies. for commercially available, high-quality antibodies. The following antibodies were used for FACS purification of LCH subsets: CD207 APC-Vio770 (130-112-371, Miltenyi Biotec), CD1a PE-Vio 770 (130- METHODS 112-872, Miltenyi Biotec), CD45 eFluor 506 (69-0459-42, eBiosci- ence), CD168 FITC (bs-4736R-FITC, Bioss), CD208 PE (130-104-392, Patient Cohort and Sample Collection Miltenyi Biotec), CD298 PerCP-Vio770 (130-101-294, Miltenyi LCH samples were obtained with informed consent from Biotec), CD300a Alexa 647 (566342, BD Biosciences), and CD370 patients undergoing routine diagnostic biopsies. Diagnosis was VioBlue (130-097-406, Miltenyi Biotec). LIVE/DEAD stain (Invitro- confirmed by central pathology review. Cell suspensions were pre- gen) was used for live/dead staining according to the manufacturer’s pared by dissociating collagenase IV (Worthington Biochemical) description. Cell sorting was performed on a FACSAria instrument and dispase II (Sigma-Aldrich)–treated tissue using a 70-μm cell (BD Biosciences). The BD FACSDiva software (BD Biosciences) was dissociation sieve (Sigma-Aldrich). Cells were then pelleted, resus- used for data analysis. pended in CellGro medium (CellGenix), and immunostained for FACS using the antibodies specified below. Sorting was performed HUMARA Assay on a BD FACSAria 1 as described previously (15). All protocols HUMARA assays were performed as described previously (47). for obtaining and studying patient material were approved by the Briefly, DNA isolated from unsorted LCH biopsy cells and LCH-cell responsible institutional review board and the ethics committee of subsets was digested with HpaII (New England Biolabs) or mock the Medical University of Vienna (Vienna, Austria). Sampling was digested. Amplification of the HUMARA STR region was performed done according to the regulations of the Declaration of Helsinki using Phusion HS II Polymerase (Thermo Fisher Scientific) and the after written informed consent. following primers: 5′-6FAM-TCCAGAATCTGTTCCAGAGCGTGC-3′, 5′-GCTGTGAAGGTTGCTGTTCCTCAT-3′. PCR products were ana- IHC lyzed and quantified using an Applied Biosystems 3730xl DNA Ana- IHC stainings were done on formalin-fixed, paraffin-embedded lyzer. The ratio of the active to the inactive X was tissue sections with antibodies against CD1a (NCL-L-CD1a-235, calculated using the area under the peak. To account for the ampli- Leica Biosystems), CD207 (CMC39221020, Cell Marque), and the fication bias of the smaller allele, a corrected ratio was calculated V600E mutation-specific BRAF antibody (06965814001, Spring Biosci- by dividing the ratio of the digested sample (allele 1/allele 2) by the ence). Staining was performed on an automated Leica BOND-III ratio of the mock digested sample (allele 1/allele 2). Ratios close to immunohistologic stainer. Images were captured with a Leica DM6 1 indicate a polyclonal cell population, and ratios above 3 are con- B using Leica LAS X software. sidered evidence of clonality (48). DNA isolated from a single-cell– derived cell line and from whole blood of a healthy donor was used Immunofluorescence as monoclonal control (ratio 4.1) or polyclonal control (ratio 0.9), OCT-embedded (Tissue-Plus O.C.T. Compound, Scigen Scien- respectively. tific), fresh-frozen tissue was cut into 7-μm sections and mounted on microscope slides (Dako). After 20 minutes of air-drying, the Allele-Specific Droplet Digital PCR sections were fixed in ice-cold acetone (Sigma-Aldrich) for 10 Droplet digital PCR for BRAFV600E was performed as described minutes, rehydrated in PBS (Gibco Life Technologies), and incu- previously (24), using the Mutation Assay BRAF p.V600E c.1799T-A, bated with PBS/2% goat serum (Dako) for 30 minutes at room Human (dHsaMDV2010027, Bio-Rad Laboratories), and the QX200 temperature. This was followed by an overnight incubation at Droplet Digital PCR System (Bio-Rad Laboratories). 4°C with either CD168 FITC (bs-4736R-FITC, Bioss) or CLEC9a PE (30-101-294, Miltenyi Biotec) diluted in PBS/2% BSA (Sigma- Aldrich). After washing with PBS, the second step antibodies (i.e., Single-Cell RNA-seq goat anti-rabbit IgG Alexa Fluor 488 and goat anti-mouse A546, Single-cell RNA-seq was performed using the 10× Genomics respectively) were applied for 1 hour at room temperature, washed Chromium Single Cell Controller with the Chromium Single Cell 3′ with PBS, blocked, and then stained with CD1a PE or CD1a FITC V2 Kit following the manufacturer’s instructions. After quality con- for 1 hour at room temperature. After washing with PBS, a sec- trol, libraries were sequenced on the Illumina HiSeq 4000 platform ond round of secondary antibody staining was performed (goat in 2 × 75bp paired-end mode. Supplementary Table S1 includes anti-mouse A546 or goat anti-FITC A488, respectively) for 1 hour an overview of sequencing data and performance metrics. Raw at room temperature. The slides were then counterstained with sequencing data were processed with the Cell Ranger v1.3.0 software

October 2019 CANCER DISCOVERY | 1417

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

RESEARCH ARTICLE Halbritter et al.

(10× Genomics) for demultiplexing and alignment to the GRCh38 Chromatin-Accessibility Mapping human reference transcriptome. Processed data were analyzed using the ATAC-seq was performed as described previously (53). Briefly, R statistics software and various Bioconductor packages. Specifi- 20,000 to 50,000 cells were lysed in a buffer containing digitonin and cally, we used Seurat v2.3.4 (49) to load preprocessed results from Tn5 transposase enzyme (Illumina). After incubation at 37°C for Cell Ranger into R and to perform quality control (removing cells 30 minutes, tagmented DNA was purified and enriched. After final with less than 1,000 genes or mitochondrial content greater than purification, the libraries were quality checked using a 2100 Bio- 10%). Unless otherwise stated, we used default parameters through- analyzer (Agilent) with high-sensitivity DNA chips. DNA concen- out. We merged all datasets, log-transformed, and normalized UMI tration was examined using Qubit Fluorometer, and the libraries counts, regressing out UMI count, mitochondrial content, cell-cycle were sequenced on the Illumina HiSeq 4000 platform in 1 × 50bp scores, patient ID, biopsy, sex, and age. We then used canonical cor- single-read mode. Supplementary Table S1 includes an overview relation analysis (CCA; using the RunMultiCCA function) using vari- of the sequencing data and performance metrics. Raw sequencing able genes in the merged dataset and seven canonical vectors. The data were trimmed using skewer v0.1.126 (54), followed by align- CCA-aligned space was used as input for low-dimensional projection ment to the GRCh38 assembly of the with Bowtie using t-SNE with all seven dimensions. Cells were clustered with the 2 v2.2.4 (ref. 55; parameters: –very-sensitive –no-discordant). Only FindClusters function (resolution = 0.2). Clusters in which at least deduplicated, uniquely mapped reads with mapping quality ≥30 25% of cells were double-positive for CD1A and CD207 (“positive were kept for further analysis. To identify accessible genomic list”; normalized UMI greater 1), and no more than 60% of cells were regions, we used MACS2 v2.1.0 (ref. 56; parameters: -q 0.1 -g hs). positive for either CD19, CD3D, CD27, IL32, CD7, NKG7, or CD163 Following initial data processing, all subsequent analyses were (“negative list”) were defined as LCH cells. These criteria select cells performed in R using Bioconductor packages. After merging peaks with high expression of canonical LCH markers and exclude cells across all ATAC-seq datasets and removing peaks that overlapped that are clearly identified as non-LCH immune cell types. We used the blacklisted regions from ENCODE (https://sites.google.com/site/ same procedure and thresholds to define immune cells usingCD1A anshulkundaje/projects/blacklists) or repetitive regions (obtained and CD207 as a negative marker list and as positive markers: CD19 from the UCSC Genome Browser), we quantified for each input for B cells, CD3D for T cells, CD14 for monocytes/macrophages, and dataset the number of reads overlapping the retained peaks. Raw IL3RA for plasmacytoid dendritic cells. To compare gene expres- read counts were loaded into DESeq2 v1.22.2 (57) for normaliza- sion of these immune cell types with that of LCH cells (Fig. 2), tion and differential analysis (using a minimum absolute log2 fold we used the FindMarkers function. To avoid possible confounding change of 1 as threshold; n = 2 replicates per LCH subset). We refer effects by sex differences (cells of both sexes are present at slightly to the set of regulatory regions that showed increased chroma- variable rates between cell types in our dataset), we performed this tin accessibility (activation) in LCH subset LCH-S1 compared to analysis stratified by sex and used the mean log2 fold change and LCH-S11 or LCH-S12 as a1 (for “active”) and to those that were maximum (worst) P value for each gene. An FDR-adjusted P value characterized by decreased accessibility (inactivation) in the same threshold of q ≤ 0.005 was used to select differentially expressed comparison as i1 (for “inactive”). Analogously, we defined region genes. To identify LCH-cell subsets, we repeated the CCA, clus- modules a11, i11, a12, and i12 as those that are specifically active tering, and dimensionality reduction after filtering out non-LCH or inactive in one LCH subset. To associate ATAC-seq peaks with cells and used the FindAllMarkers function to perform differential putative target genes, we used chromatin conformation data from analysis between LCH-cell subsets (Fig. 3), again stratified by sex 17 blood cell types (30), considering each interaction between a and using the same parameters. Differentially expressed genes and promoter and another genomic region observed in these cells as LCH-cell subset markers are listed in Supplementary Table S2. To a possible regulatory interaction. In addition, we assigned each quantify the single-cell entropy (also known as “signaling entropy”) peak also to the closest gene promoter in its vicinity. Annotated of each cell (18, 21, 50), we used the CompSRana function in the regulatory regions from the analysis of ATAC-seq data are listed in LandSCENT v0.99.2 package. Signaling entropy has been described Supplementary Table S2. as a robust measure of differentiation potential/progress. To be consistent with our other analyses, we modified the code of the LandSCENT package to allow use of our precalculated clusters and Enrichment Analysis for Genes and Genomic Regions dimensionality reduction. For the alternative analysis of LCH sub- To help understand the biological roles of differentially expressed sets in each sample separately (Supplementary Fig. S5), we filtered genes and differentially active genomic regions, we employed four the LCH dataset to contain only cells from one sample at a time types of enrichment analysis (Supplementary Table S3). First, we and performed principal component analysis, t-SNE dimensionality used Gene Set Enrichment Analysis (GSEA; v3.0) to test genes in reduction, and clustering independently (as in the standard Seu- LCH cells compared with immune cells (using the Wilcoxon test rat workflow). External microarray and RNA-seq for comparisons statistic as a ranking criterion) for an enrichment of annotations (Gene Expression Omnibus: GSE35340, GSE16395, GSE74442) from the list of hallmark gene signatures from MSigDB (Fig. 2D). were downloaded using the GEOquery package and quantile-nor- We considered pathways with an FDR-adjusted P value below 0.05 malized. Finally, to define reference gene expression signatures of significant. Second, we used the Enrichr API (ref. 58; v1.0) to test immune cells, we used microarray data from our previous research differentially expressed genes and genes linked to enhancers of (ref. 15; Gene Expression Omnibus: GSE35340, GSE114181). These interest for significant enrichment across a broad range of gene data were normalized using frma2 (parameters: summarize = “robust_ sets (Figs. 2A and C, 3E, and 5C). In all plots, we report the Enrichr weighted_average”) and ComBat from the sva package (51). The combined score calculated as log(Old.P.value) × Z.score by Enrichr. probe set with the highest variance per gene was chosen for further This amounts to a product of the significance estimate and the analysis, and we excluded genes with a z-score below 2 in all samples. magnitude of enrichment. Third, we used Locus Overlap Analysis We performed differential expression analysis with the limma (52) (ref. 31; LOLA; v1.12.0) to test regulatory regions identified in our package and used the mean of P values and fold changes to ATAC-seq data for significant overlaps with experimentally deter- determine the top genes associated with each of the four immune mined TF-binding sites from publicly available ChIP-seq data (Fig. cell types [CD1C+ dendritic cells (n = 7), plasmacytoid dendritic 5D). To this end, we used the codex, encode_tfbs, and cistrome_cistrome cells (n = 3), monocytes (n = 6), and Langerhans cells (n = 3)], databases contained in the LOLA Core database. We considered limited to genes with an FDR-adjusted P value less than 0.05 and an only terms with an FDR-adjusted P value below 0.005 and a absolute log2 fold change greater than 1. minimum absolute log2 fold change of 1 significant. We focused on

1418 | CANCER DISCOVERY October 2019 www.aacrjournals.org

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

Developmental Hierarchy in Langerhans Cell Histiocytosis RESEARCH ARTICLE transcription factors that were found in our lists of differentially Analysis and interpretation of data (e.g., statistical analysis, expressed LCH subset markers or that were closely related to those biostatistics, computational analysis): F. Halbritter, M. Farlik, (by performing a fuzzy string match, e.g., including STAT3 when R. Schwentner, N. Fortelny, T. Schnoller, H. Pisa, I. Simonitsch- only STAT1 was in the marker list). Fourth, we searched the DNA Klupp, C. Hutter sequences underlying sets of ATAC-seq peaks for matches to known Writing, review, and/or revision of the manuscript: F. Halbritter, DNA-binding motifs from the HOCOMOCO database v11 (ref. 32; M. Farlik, N. Fortelny, L.C. Schuster, M. Minkov, W.M. Bauer, Fig. 5E). For this search, we used FIMO (v4.10.2; ref. 59; parameters: I. Simonitsch-Klupp, C. Bock, C. Hutter –no-qvalue –text –bgfile motif-file), and regions with at least one hit Administrative, technical, or material support (i.e., report- (P < 0.0001) were counted. To test for differential motif enrichment, ing or organizing data, constructing databases): F. Halbritter, R. we compared the number of regions with at least one match to a Schwentner, G. Jug, T. Schnoller, J. Gojo, M. Minkov, W.M. Bauer, given motif with the respective frequency of motif hits in the set of C. Hutter all ATAC-seq peaks by using the Fisher exact test. Motifs with an Study supervision: C. Bock, C. Hutter

FDR-adjusted P value below 0.05 and log2 odds above log2(1.5) were considered significant. We focused on motifs matching factors that Acknowledgments were differentially expressed. We would like to thank all patients and their families who gave their permission to include samples and clinical data in this Network Analysis study. We also thank the Biomedical Sequencing Facility at CeMM Integrating the results of gene-centric and region-centric enrich- for assistance with next-generation sequencing, Dieter Prinz and ment analyses as well as the DNA motif search, we inferred a gene Elke Zipperer for flow cytometry support, and the members of regulatory network of LCH subsets. In this network, we defined the Bock lab for help and advice. This work was supported by “regulatory edges” (arrows in the network) as connections pointing DFG Research Fellowship (HA 7723/1-1; to F. Halbritter), Innova- from a transcription factor to its targets. The source nodes of these tion Fund of the Austrian Academy of Sciences (IF_2015_36; to edges were ATAC-seq peaks which either had an overlap with one M. Farlik), Austrian Science Fund (FWF) Special Research Pro- of the enriched ChIP-seq datasets (LOLA analysis) or a match with gramme grant (FWF SFB F 6102-B21; to M. Farlik. and C. Bock), an enriched DNA-binding motif. All source nodes belonging to the EMBO Long-Term Fellowship (ALTF 241-2017; to N. Fortenly), same TFs were merged into one node. Targets of the edges were all New Frontiers Group award of the Austrian Academy of Sciences genes linked to the respective ATAC-seq peak. LCH subset–specific and European Research Council (ERC) Starting Grant (European networks were constructed by using the single-cell RNA-seq and Union’s Horizon 2020 Research and Innovation Programme, grant ATAC-seq data for the respective subset. We defined node sizes pro- agreement no. 679146; to C. Bock), clinical investigator grant by portional to expression level and out-degree (“node importance”) as St. Anna Kinderkrebsforschung and Histiocytosis Association 2 2 follows: ng = Ng × Eg where Ng = (0, xg – xmed) is the difference research grant (to C. Hutter). of expression of a gene (xg) from the median expression (xmed) of all ∑2yy− 80th Received February 1, 2019; revised June 3, 2019; accepted July 10, genes (normalized and scaled UMI counts) and Eg = pp∈ is the sum of edge weights (y) over the 80th percentile (y80th) of all weights 2019; published first July 25, 2019. (normalized ATAC-seq read count). Edges were weighted by accessi- bility (mean of all data for the respective cell subset). Furthermore, browser-based network visualizations were generated to support References interactive exploration of the regulatory networks using the visNet- work package in R (Supplementary File S1; Supplementary Website: . 1 Allen CE, Merad M, McClain KL. Langerhans-cell histiocytosis. N http://LCH-hierarchy.computational-epigenetics.org). Node size in Engl J Med 2018;379;856–68. these networks is proportional to node importance, which was cal- 2. Emile JF, Abla O, Fraitag S, Horne A, Haroche J, Donadieu J, et al. culated in the same way as described above. Revised classification of histiocytoses and neoplasms of the mac- rophage-dendritic cell lineages. Blood 2016;127:2672–81. 3. Doebel T, Voisin B, Nagao K. Langerhans cells - the macrophage in Data and Code Availability dendritic cell clothing. Trends Immunol 2017;38:817–28. Processed single-cell RNA-seq and ATAC-seq data are openly avail- 4. Durham BH, Roos-Weil D, Baillou C, Cohen-Aubart F, Yoshimi A, able via Gene Expression Omnibus (GEO; accession: GSE133706). Miyara M, et al. Functional evidence for derivation of systemic his- Raw sequencing reads are available as controlled access via the tiocytic neoplasms from hematopoietic stem/progenitor cells. Blood European Genome–Phenome Archive (EGA; accession number 2017;130:176–80. EGAS00001003822) to safeguard patient privacy. Additional materials 5. Mass E, Jacome-Galarza CE, Blank T, Lazarov T, Durham BH, Ozkaya including interactive network visualizations and genome browser tracks N, et al. A somatic mutation in erythro-myeloid progenitors causes are provided on the Supplementary Website (http://LCH-hierarchy. neurodegenerative disease. Nature 2017;549:389–93. computational-epigenetics.org). 6. Milne P, Bigley V, Bacon CM, Néel A, McGovern N, Bomken S, et al. Hematopoietic origin of Langerhans cell histiocytosis and Erdheim- Chester disease in adults. Blood 2017;130:167–75. Disclosure of Potential Conflicts of Interest 7. Chakraborty R, Hampton OA, Shen X, Simko SJ, Shih A, Abhyankar No potential conflicts of interest were disclosed. H, et al. Mutually exclusive recurrent somatic mutations in MAP2K1 and BRAF support a central role for ERK activation in LCH patho- Authors’ Contributions genesis. Blood 2014;124:3007–15. 8. Diamond EL, Durham BH, Haroche J, Yao Z, Ma J, Parikh SA, et al. Conception and design: F. Halbritter, M. Farlik, C. Bock, C. Hutter Diverse and targetable kinase alterations drive histiocytic neoplasms. Development of methodology: F. Halbritter, M. Farlik, L.C. Schuster, Cancer Discov 2016;6:154–65. C. Bock, C. Hutter 9. Allen CE, Parsons DW. Biological and clinical significance of somatic Acquisition of data (provided animals, acquired and managed mutations in Langerhans cell histiocytosis and related histiocytic patients, provided facilities, etc.): M. Farlik, R. Schwentner, G. Jug, neoplastic disorders. Hematology 2015;2015:559–64. T. Schnoller, A. Reinprecht, T. Czech, J. Gojo, W. Holter, M. Minkov, 10. Braier J. Is Langerhans cell histiocytosis a neoplasia? Pediatr Blood W.M. Bauer, I. Simonitsch-Klupp, C. Hutter Cancer 2017;64:e26267.

October 2019 CANCER DISCOVERY | 1419

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

RESEARCH ARTICLE Halbritter et al.

.11 Degar BA, Rollins BJ. Langerhans cell histiocytosis: malignancy or transcription factor binding models for human and mouse via large- inflammatory disorder doing a great job of imitating one? Dis Model scale ChIP-Seq analysis. Nucleic Acids Res 2018;46:D252–D259. Mech 2009;2:436–9. 33. Musa J, Aynaud M-M, Mirabeau O, Delattre O, Grünewald TG. 12. Haroche J, Cohen-Aubart F, Rollins BJ, Donadieu J, Charlotte F, MYBL2 (B-Myb): a central regulator of cell proliferation, cell sur- Idbaih A, et al. Histiocytoses: emerging neoplasia behind inflamma- vival and differentiation involved in tumorigenesis. Cell Death Dis tion. Lancet Oncol 2017;18:e113–e125. 2017;8:e2895. 13. Senechal B, Elain G, Jeziorski E, Grondin V, Patey-Mariaud de Serre 34. Lee J, Zhou YJ, Ma W, Zhang W, Aljoufi A, Luh T, et al. Lineage N, Jaubert F, et al. Expansion of regulatory T cells in patients with specification of human dendritic cells is marked by IRF8 expression langerhans cell histiocytosis. PLoS Med 2007;4:1374–84. in hematopoietic stem cells and multipotent progenitors. Nat Immunol 14. Allen CE, Li L, Peters TL, Leung HC, Yu A, Man TK, et al. Cell-specific 2017;18:877–88. gene expression in Langerhans cell histiocytosis lesions reveals a 35. Murphy TL, Tussiwand R, Murphy KM. Specificity through coop- distinct profile compared with epidermal Langerhans cells. J Immu- eration: BATF-IRF interactions control immune-regulatory networks. nol 2010;184:4557–67. Nat Rev Immunol 2013;13:499–509. 15. Hutter C, Kauer M, Simonitsch-Klupp I, Jug G, Schwentner R, 36. Sichien D, Scott CL, Martens L, Vanderkerken M, Van Gassen S, Leitner J, et al. Notch is active in Langerhans cell histiocytosis and Plantinga M, et al. IRF8 transcription factor controls survival and confers pathognomonic features on dendritic cells. Blood 2012;120: function of terminally differentiated conventional and plasmacytoid 5199–208. dendritic cells, respectively. Immunity 2016;45:626–40. 16. Willman CL, Busque L, Griffith BB, Favara BE, McClain KL, Duncan 37. Björk JK, Åkerfelt M, Joutsen J, Puustinen MC, Cheng F, Sistonen L, MH, et al. Langerhans’-cell histiocytosis (histiocytosis X) – a clonal et al. Heat-shock factor 2 is a suppressor of prostate cancer invasion. proliferative disease. N Engl J Med 1994;331:154–60. Oncogene 2016;35:1770–84. 17. Yu RC, Chu C, Buluwela L, Chu AC. Clonal proliferation of Langer- 38. Ren F, Wang L, Shen X, Xiao X, Liu Z, Wei P, et al. MYBL2 is an inde- hans cells in Langerhans cell histiocytosis. Lancet 1994;343:767–8. pendent prognostic marker that has tumor-promoting functions in 18. Banerji CRS, Miranda-Saavedra D, Severini S, Widschwendter M, colorectal cancer. Am J Cancer Res 2015;5:1542–52. Enver T, Zhou JX, et al. Cellular network entropy as the energy poten- 39. Boyer LA, Plath K, Zeitlinger J, Brambrink T, Medeiros LA, Lee tial in Waddington’s differentiation landscape. Sci Rep 2013;3:3039. TI, et al. Polycomb complexes repress developmental regulators in 19. Guo M, Bao EL, Wagner M, Whitsett JA, Xu Y. SLICE: Determining murine embryonic stem cells. Nature 2006;441:349–53. cell differentiation and lineage based on single cell entropy. Nucleic 40. Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Acids Res 2017;45:gkw1278. et al. A bivalent chromatin structure marks key developmental genes 20. Shi J, Teschendorff AE, Chen W, Chen L, Li T. Quantifying Wad- in embryonic stem cells. Cell 2006;125:315–26. dington’s epigenetic landscape: a comparison of single-cell potency 41. O’Carroll D, Erhardt S, Pagani M, Barton SC, Surani MA, Jenuwein T. measures. Brief Bioinform 2018 Oct 5 [Epub ahead of print]. The polycomb-group gene Ezh2 is required for early mouse develop- 21. Teschendorff AE, Enver T. Single-cell entropy for accurate estimation ment. Mol Cell Biol 2001;21:4330–6. of differentiation potency from a cell’s transcriptome. Nat Commun 42. Valk-Lingbeek ME, Bruggeman SWM, Van Lohuizen M. Stem cells 2017;8:15599. and cancer: the polycomb connection. Cell 2004;118:409–18. 22. Villablanca EJ, Mora JR. A two-step model for Langerhans cell migra- 43. Langlais D, Barreiro LB, Gros P. The macrophage IRF8/IRF1 regu- tion to skin-draining LN. Eur J Immunol 2008;38:2975–80. lome is required for protection against infections and is associated 23. Villani A-C, Satija R, Reynolds G, Sarkizova S, Shekhar K, Fletcher J, with chronic inflammation. J Exp Med 2016;213:585–603. et al. Single-cell RNA-seq reveals new types of human blood dendritic 44. Berres ML, Allen CE, Merad M. Pathological consequence of mis- cells, monocytes, and progenitors. Science 2017;356:eaah4573. guided dendritic cell differentiation in histiocytic diseases. Adv 24. Schwentner R, Kolenová A, Jug G, Schnöller T, Ahlmann M, Meister B, Immunol 2013;120:127–61. et al. Longitudinal assessment of peripheral blood BRAFV600E levels in 45. Rust R, Kluiver J, Visser L, Harms G, Blokzijl T, Kamps W, et al. Gene patients with Langerhans cell histiocytosis. Pediatr Res 2018;85:856–64. expression analysis of dendritic/Langerhans cells and Langerhans cell 25. Hieronymus T, Zenke M, Baek JH, Seré K. The clash of Langerhans histiocytosis. J Pathol 2006;209:474–83. cell homeostasis in skin: should I stay or should I go? Semin Cell Dev 46. Zitvogel L, Pitt JM, Daillère R, Smyth MJ, Kroemer G. Mouse models Biol 2015;41:30–8. in oncoimmunology. Nat Rev Cancer 2016;16:759–73. 26. Konradi S, Yasmin N, Haslwanter D, Weber M, Gesslbauer B, Sixt 47. Swierczek SI, Agarwal N, Nussenzveig RH, Rothstein G, Wilson A, M, et al. Langerhans cell maturation is accompanied by induction of Artz A, et al. Hematopoiesis is not clonal in healthy elderly women. N-cadherin and the transcriptional regulators of epithelial-mesen- Blood 2008;112:3186–93. chymal transition ZEB1/2. Eur J Immunol 2014;44:553–60. 48. Boudewijns M, van Dongen JJM, Langerak AW. The human androgen 27. Murakami I, Morimoto A, Oka T, Kuwamoto S, Kato M, Horie Y, receptor X-chromosome inactivation assay for clonality diagnostics et al. IL-17A receptor expression differs between subclasses of Langer- of natural killer cell proliferations. J Mol Diagn 2007;9:337–44. hans cell histiocytosis, which might settle the IL-17A controversy. 49. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating Virchows Arch 2013;462:219–28. single-cell transcriptomic data across different conditions, technolo- 28. Murakami I, Matsushita M, Iwasaki T, Kuwamoto S, Kato M, Nagata gies, and species. Nat Biotechnol 2018;36:411–20. K, et al. Interleukin-1 loop model for pathogenesis of Langerhans cell 50. Teschendorff AE, Morabito SJ, Kessenbrock K, Meyer K. Integrated histiocytosis. Cell Commun Signal 2015;13:13. single-cell potency and expression landscape in mammary epithelium 29. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Trans- reveals novel bipotent-like cells associated with breast cancer risk. position of native chromatin for fast and sensitive epigenomic pro- bioRxiv 2018;496471. filing of open chromatin, DNA-binding and nucleosome 51. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The SVA pack- position. Nat Methods 2013;10:1213–8. age for removing batch effects and other unwanted variation in high- 30. Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, throughput experiments. Bioinformatics 2012;28:882–883. et al. Lineage-specific genome architecture links enhancers and non- 52. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. Limma coding disease variants to target gene promoters. Cell 2016;167:1369–84. powers differential expression analyses for RNA-sequencing and 31. Sheffield NC, Bock C. LOLA: enrichment analysis for genomic region microarray studies. Nucleic Acids Res 2015;43:e47. sets and regulatory elements in R and Bioconductor. Bioinformatics 53. Corces MR, Trevino AE, Hamilton EG, Greenside PG, Sinnott-Arm- 2015;32:587–9. strong NA, Vesuna S, et al. An improved ATAC-seq protocol reduces 32. Kulakovskiy IV, Vorontsov IE, Yevshin IS, Sharipov RN, Fedorova AD, background and enables interrogation of frozen tissues. Nat Methods Rumynskiy EI, et al. HOCOMOCO: Towards a complete collection of 2017;14:959–62.

1420 | CANCER DISCOVERY October 2019 www.aacrjournals.org

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

Developmental Hierarchy in Langerhans Cell Histiocytosis RESEARCH ARTICLE

54. Jiang H, Lei R, Ding SW, Zhu S. Skewer: A fast and accurate adapter 57. Love MI, Huber W, Anders S. Moderated estimation of fold change trimmer for next-generation sequencing paired-end reads. BMC Bio- and dispersion for RNA-seq data with DESeq2. Genome Biol 2014; informatics 2014;15:182. 15:550. 55. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. 58. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV. Enrichr: Nat Methods 2012;9:357–9. Interactive and collaborative HTML5 gene list enrichment analysis 56. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein tool. BMC Bioinformatics 2013;14:128. BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 59. Grant CE, Bailey TL, Noble WS. FIMO: Scanning for occurrences of a 2008;9:R137. given motif. Bioinformatics 2011;27:1017–8.

October 2019 CANCER DISCOVERY | 1421

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst July 25, 2019; DOI: 10.1158/2159-8290.CD-19-0138

Epigenomics and Single-Cell Sequencing Define a Developmental Hierarchy in Langerhans Cell Histiocytosis

Florian Halbritter, Matthias Farlik, Raphaela Schwentner, et al.

Cancer Discov 2019;9:1406-1421. Published OnlineFirst July 25, 2019.

Updated version Access the most recent version of this article at: doi:10.1158/2159-8290.CD-19-0138

Supplementary Access the most recent supplemental material at: Material http://cancerdiscovery.aacrjournals.org/content/suppl/2019/07/24/2159-8290.CD-19-0138.DC1

Cited articles This article cites 57 articles, 13 of which you can access for free at: http://cancerdiscovery.aacrjournals.org/content/9/10/1406.full#ref-list-1

Citing articles This article has been cited by 1 HighWire-hosted articles. Access the articles at: http://cancerdiscovery.aacrjournals.org/content/9/10/1406.full#related-urls

E-mail alerts Sign up to receive free email-alerts related to this article or journal.

Reprints and To order reprints of this article or to subscribe to the journal, contact the AACR Publications Subscriptions Department at [email protected].

Permissions To request permission to re-use all or part of this article, use this link http://cancerdiscovery.aacrjournals.org/content/9/10/1406. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC) Rightslink site.

Downloaded from cancerdiscovery.aacrjournals.org on September 27, 2021. © 2019 American Association for Cancer Research.