<<

ARTICLE

https://doi.org/10.1038/s41467-020-16164-1 OPEN Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma

Nayoung Kim 1,2,3,13, Hong Kwan Kim4,13, Kyungjong Lee 5,13, Yourae Hong 1,6, Jong Ho Cho4, Jung Won Choi7, Jung-Il Lee7, Yeon-Lim Suh8,BoMiKu9, Hye Hyeon Eum 1,2,3, Soyean Choi 1, Yoon-La Choi6,10,11, Je-Gun Joung1, Woong-Yang Park 1,2,6, Hyun Ae Jung12, Jong-Mu Sun12, Se-Hoon Lee12, ✉ ✉ Jin Seok Ahn12, Keunchil Park12, Myung-Ju Ahn 12 & Hae-Ock Lee 1,2,3,6 1234567890():,;

Advanced metastatic poses utmost clinical challenges and may present molecular and cellular features distinct from an early-stage cancer. Herein, we present single-cell tran- scriptome profiling of metastatic lung adenocarcinoma, the most prevalent histological type diagnosed at stage IV in over 40% of all cases. From 208,506 cells populating the normal tissues or early to metastatic stage cancer in 44 patients, we identify a cancer cell subtype deviating from the normal differentiation trajectory and dominating the metastatic stage. In all stages, the stromal and immune cell dynamics reveal ontological and functional changes that create a pro-tumoral and immunosuppressive microenvironment. Normal resident myeloid cell populations are gradually replaced with -derived and dendritic cells, along with T-cell exhaustion. This extensive single-cell analysis enhances our understanding of molecular and cellular dynamics in metastatic lung cancer and reveals potential diagnostic and therapeutic targets in cancer-microenvironment interactions.

1 Samsung Genome Institute, Samsung Medical Center, Seoul 06351, Korea. 2 Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Suwon 16419, Korea. 3 Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul 06591, Korea. 4 Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Korea. 5 Division of Pulmonary and Critical Care Medicine, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 06351 Seoul, Korea. 6 Department of Health Sciences and Technology, Samsung Advanced Institute for Health Sciences &Technology, Sungkyunkwan University, Seoul 06355, Korea. 7 Department of Neurosurgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Korea. 8 Department of , Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Korea. 9 Samsung Biomedical Research Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Korea. 10 Laboratory of Cancer Genomics and Molecular Pathology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Korea. 11 Department of Pathology and Translational Genomics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Korea. 12 Division of Haematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Korea. 13These authors contributed equally: Nayoung Kim, ✉ Hong Kwan Kim, Kyungjong Lee. email: [email protected]; [email protected]

NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1

on-small cell lung cancer (NSCLC) is histologically divi- Notably, metastatic lymph nodes (mLN) harbored a significant ded into adenocarcinoma, squamous cell carcinoma, and number of myeloid cells unlike normal lymph nodes (nLN), N fi large-cell carcinoma. Lung adenocarcinoma (LUAD) is indicating an association of myeloid in ltration with . the most common type, accounting for approximately 40% of all mBrain samples contained immune cells (T, B, and NK cells) at lung . LUAD is often detected at the metastatic stage with detectable levels as well as resident oligodendrocytes and myeloid prevalence in the brain, bones, and respiratory system1. Distant cells (). These cellular compositions demonstrated dif- metastasis is the major cause of mortality in lung cancer; how- ferences in tissue-specific resident populations, as well as gross ever, specific aspects of metastatic lung cancer and its associated alterations inflicted by tumor growth and invasion. Thus, our microenvironments remain poorly understood. LUAD atlas, revealed cellular dynamics and progression- Efforts made for the understanding of lung cancer progression associated changes in each cellular component at an unprece- and metastasis have largely focused on profiling of cancer cells dented scale and depth. with genetic aberrations2,3. However, progression and metastasis are also influenced by complex and dynamic features in tumor surroundings4. For instance, application of immune-checkpoint Tumor intrinsic signatures associated with LUAD progression. blockades inhibiting PD-1 (programmed cell death 1) and In the present study, we have explored intrinsic characteristics of CTLA-4 (cytotoxic T--associated protein 4) in adenocarcinoma cells through comparative analysis between immune cells has opened a new therapeutic window for meta- normal epithelial and tumor cells from surgical resection. Normal static NSCLC treatment5. The parsing of unique classes of tumor epithelial cells mainly comprised four distinct subpopulations, microenvironments in advanced cancer can reveal the key ele- including alveolar types I (AT1) and II (AT2), and club cells and ments involved in the predisposition to tumor-induced immu- ciliated cells, expressing well-defined epithelial markers (Supple- nological changes, and these elements can be exploited for novel mentary Fig. 3a-c). AT1 and AT2, the most abundant types, can immunotherapeutic strategies6. initiate LUAD in the distal airway12. In tumor tissues, epithelial Single-cell RNA-sequencing (scRNA-seq) has been recently cell types may contain residual non-malignant cells along used for the profiling of tumor microenvironments7,8. This with malignant tumor cells. To separate the definitive tumor cells technology allows massively parallel characterization of thou- from a potential non-malignant population, we have used genetic sands of cells at the transcriptome level. Previous scRNA-seq aberrations by inferring copy number variations (CNVs) from studies related to lung cancer have been limited to early stage the expression data8,13. The inferred CNV patterns con- primary tumors and normal tissues resected from a small number firmed patient-specific perturbations in malignant tLung, tL/B, of samples of mixed histological types9,10. In the present study, mLN, and mBrain cells (Supplementary Fig. 3d). In the sub- we report the comprehensive single-cell transcriptome profiling sequent tumor cell analysis, we excluded epithelial cells without of LUAD from early to advanced stages of primary cancer and CNV present in tumor tissues, because of their ambiguous distant metastases, and unveil cellular dynamics and molecular identity. features associated with the tumor progression. Using the definitive tumor and normal epithelial cells, we constructed a transcriptional trajectory14 (Fig. 2a) to adjust the inter-patient genomic heterogeneity, and find key Results programs governing the tumor progression. Indeed, transcrip- Cellular dynamics in early, advanced, and metastatic LUAD.To tional states in the trajectory revealed normal differentiation elucidate the cellular dynamics in LUAD progression, tumor from paths as well as progression-associated changes in tumors. Firstly, primary lung tissues, pleural fluids, and lymph node or brain ciliated epithelial and alveolar cells were located in separate metastases were obtained from 44 patients with treatment-naïve trajectory branches, marking their distinct differentiation states. LUAD during endobronchial ultrasound/bronchoscopy biopsy or Secondly, club cells were located between the ciliated and alveolar surgical resection (Fig. 1a, Supplementary Data 1). Distant nor- branches, indicating the intermediate differentiation state15. mal tissues or lymph nodes were also collected for comparative Lastly, tumor cells formed a branched structure, with two analyses. We cataloged 208,506 cells into nine distinct cell transcriptional states (tS1 and tS3) along the normal epithelial lineages annotated with canonical marker gene expression cells; however, one (tS2) was observed to be distinctly positioned (Fig. 1b-d, Supplementary Fig. 1, Supplementary Data 2), thus at the opposite ends of the tS1 and tS3 branches (Fig. 2a, b). In identifying epithelial (alveolar and cancer cells), stromal (fibro- the individual patient-by-patient trajectories, the separation of tS2 blasts and endothelial cells), and immune cells (T, NK, B, mye- from the normal epithelial cells was repeatedly observed loid, and MAST cells) as the common cell types, and (Supplementary Fig. 3e) despite the different trajectory structure oligodendrocytes only in brain metastases (mBrain). Due to the in each patient due to the limited representation of cellular bias introduced during tissue dissociations9, single-cell RNA components. To identify transcriptional signatures defining sequencing data overestimated the immune cell proportions in cellular states in the trajectory, we selected differentially expressed comparison to the stromal and epithelial cell types (Supplemen- specific to each tumor or normal cell state. Sets of 19, 28, tary Fig. 2a). In addition, the recovery rate of tumor cells was 79, 56, 33, and 248 genes were identified as significantly affected by the histological types of LUAD (Supplementary upregulated signatures in tumor cell states 1, 2, 3 (tS1, tS2, tS3) Fig. 2b). Therefore, we assessed the compositions of immune cell or normal cell states 1, 2, and 3 (nS1, nS2, nS3) (Supplementary subsets after removing the epithelial and stromal populations. Data 3). Most of S1- and S3-associated genes were shared but The results faithfully reproduced the immune cell profiles differentially regulated between tumor and normal cells, and detected by mass cytometry by time of-flight (CyTOF) in early related to normal epithelial functions maintaining the surfactant LUAD11. The most abundant immune cells at primary tumor homeostasis, lung alveolus development, and cilium movement sites were observed to be T and myeloid cells (Fig. 2c). By contrast, S2-associated genes showed definitive (Supplementary Fig. 2c, d). Moreover, we confirmed T and B tumor-oriented characteristics, such as aggressive cell movement lymphocyte enrichment and the decline of natural killer (NK) and and abnormal proliferation or . Hence, tS1 and myeloid cells in early- and advanced-stage lung cancers (tLung tS3 states represented a de-regulation of the normal differentia- and tL/B, respectively) compared to the normal lung tissues tion programs, whereas the tS2 tumor cell state deviated (nLung), indicating the activation of adaptive immune responses. completely from the normal transcriptional programs.

2 NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1 ARTICLE

acEarly-stage Advanced stage 208,506 cells, 9 major cell lineages mBrain

mLN and tL/B nLN (Biopsy)

tLung

nLung PE tSNE_2 tSNE_1

b nLung tLung mLN and tL/B PE mBrain nLN mLN tL/B (42,995 cells) (45,149 cells) (33,552 cells) (20,304 cells) (29,060 cells) (37,446 cells) (21,479 cells) (12,073 cells) Cell lineage Transcript

Epithelial cells Fibroblasts Endothelial cells T lymphocytes NK cells B lymphocytes 1 10 100 (K) Myeloid cells MAST cells Oligodendrocytes Undetermined

d nLungtLung mLN & tL/B PE mBrain nLN EPCAM KRT19 KRT18 CDH1 DCN THY1 COL1A1 COL1A2 PECAM1 CLDN5 Fraction of cells FLT1 RAMP2 expressing CD3D 0 CD3E 25 CD3G 50 TRAC NKG7 75 GNLY NCAM1 KLRD1 Average Exp. CD79A IGHM −2 IGHG3 −1 IGHA2 0 LYZ 1 MARCO 2 CD68 FCGR3A KIT MS4A2 GATA2 OLIG1 OLIG2 MOG CLDN11 NK cells NK cells NK cells NK cells NK cells Fibroblasts Fibroblasts Fibroblasts Fibroblasts MAST cells MAST cells MAST cells Myeloid cells Myeloid cells Myeloid cells Myeloid cells Myeloid cells Myeloid cells Undetermined Undetermined Undetermined Epithelial cells Epithelial cells Epithelial cells Epithelial cells Epithelial cells T lymphocytes T lymphocytes T lymphocytes T lymphocytes T lymphocytes T lymphocytes B lymphocytes B lymphocytes B lymphocytes B lymphocytes B lymphocytes B lymphocytes Endothelial cells Endothelial cells Oligodendrocytes Fig. 1 Comprehensive dissection and clustering of 208,506 single cells from LUAD patients. a Overview of tissue origins in the present study collection. Single-cell RNA sequencing was applied to cancer tissue-derived whole cells from primary sites (tLung and tL/B), pleural fluids (PE), lymph node (mLN), and brain metastases (mBrain), as well as normal tissues from lungs (nLung) and lymph nodes (nLN). b tSNE projection within each tissue origin, color- coded by major cell lineages and transcript counts. c tSNE plot of 208,506 single cells colored by the major cell lineages as shown in (b). d Dot plot of mean expression of canonical marker genes for nine major lineages from tissues of each origin, as indicated.

NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications 3 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1

abCell states Cell types c S1 S2 S3 AT2 3.6 10 Cilium movement S3 Ciliated Undetermined 100 Cellular protein Movement of cell or metabolic process S1 AT1 subcellular component Cilium morphogenesis Club 9 8 3.2 Positive regulation of G2/M transition of Respiratory gaseous apoptotic process mitotic cell cycle 50 exchange Malignant % ratio

value 6

P 2.8 S2 6 Negative regulation of cell proliferation −log Component_2 Component_1 0 Surfactant 4 homeostasis 2.4 S1 S2 S3 S1 S2 S3 Cell Malignant Ciliated Club AT1 AT2 Undetermined 3 AT1 nLung Chemokine−mediated Microtubule AT2 2 −based process Club tLung signaling pathway Ciliated nLung tLung nLung tLung nLung tLung Malignant 2468 0.25 1 246 –log Bonferroni –log Bonferroni –log Bonferroni d e f g Gene level Protein level tS1 tS2 tL/B and mLN LUAD LUSC 8 3 |||||| SmokingStageLocationEGFRTMB tLung nLung ** 1.0 ||||||||| ||||||||||||||| ** |||||||||||||||||||||||| |||||| 4 ** 6 ||||||||| |||||||||||||||||||||||||| |||||||||||||||||||||||| 2 |||| ||||||||||||||| | ||||||| P0028 |||| ||||| ||||||||| ||||||| 4 ||||| || |||| ||||| || ||| 0.5 || || | || || |||||||| || ||||||| P0018 2 2 1 ||||||| || | | ||||| IGFBP3 IGFBP3 || || || 2 mm || || | | | || || ||| P0020 0 0 || 0 8 3 p = 0.0014 p = 0.4 P0031 ** ** 0.0 6 | ** 2 1.0 ||| |||||| ||||||||||||||||||||| |||||||||||||||| ||| 6 ||||| |||||||||||||||||| ||||| P0006 ** 4 ||||||| |||||||||| ||||||||||||||||| ** |||||||||||||||| |||||| |||||||||||||||| 1 ||||| |||| ||||||| |||| 2 S100a2 | || P0034 4 S100A2 | |||||||| ||| |||||||||| || | || ||| tS1 tS2 0.5 | |||| | || ||||||| 0 0 tS1 tS2 || || || | ||| |||| ||| ||| || P0008 2 8 3 ||| || || | | |||||| | || ** || P0019 6 p = 0.0025 | p = 0.07 0 2 0.0

Average gene exp. 4 P0025 Survival probability 3 1.0 |||||||||||||| |||||||||||||||||||||| CK19 1 |||||||||||| ||||| KRT19 2 ||||||||||||||||||||| |||||| P0030 |||||||||||||||||||||||||||||| ||||||||||| ||||| || |||||||||||||| || ||||| |||||||| || ||||||| 2 0 0 |||| |||||||||||||||||| |||| || | P0009 8 3 ||| ||||||| || |||| || ||| || | || | tS3 0.5 ||||||| || || || ** tS3 ||| | | 1 ||| |||| | | | | || P0001 6 2 || | | ||| | | |||| Smoking Tumor 0 4 p = 0.38 p = 0.092 AG2 1 0.0 Cur AGR2 location 2 Ex tS1 tS2 tS3 nS1 nS2 nS3 tLung tL/B mLN mBrain 02468100246810 Never LUL 0 0 LLL Years Years Stage RUL %Cells nCells tS1 tS2 tS1 tS2 | High | Low I RLL 0 1500 II 25 tL/B&mLN tL/B&mLN III TMB 100 high 50 EGFR int 75 MT low 100 WT 10 Unknown

Fig. 2 Identification of novel cancer cell signature tS2 in LUAD and the connection to poor survival. a Unsupervised transcriptional trajectory of malignant and normal epithelial cells from Monocle (version 2), colored by cell states and subsets. b Relative proportion of cell subsets and tissue origins for each cell state as shown in (a). c Functional categories ( terms) of signature genes specific to each cell state as shown in (a). d Different enrichment of cell states within tLung and nLung, and associated clinical parameters. The clinical and pathological parameters originated from tLung. Cur: current smoker; Ex: ex-smoker; Never: never smoker; LUL: left upper lobe; LLL: left lower lobe; RUL: right upper lobe; RLL: right lower lobe. e Changes in average tumor cell state-specific signature gene expression with lung cancer progression (tLung and tL/B) and metastasis (mLN and mBrain). **, two-sided Wilcoxon test p-value < 0.01. f Box plots depicting single-cell gene expression and the quantified protein levels of selected markers specific to the tS2 epithelial subset. tS1-enriched tLung (n = 7; T06, T08, T09, T19, T25, T30, T34), tS2-enriched tLung (n = 4; T18, T20, T28, T31), and tL/B & mLN (n = 5; EBUS_06, EBUS_10, EBUS_13, EBUS_19, EBUS_28). **, one-way ANOVA test p-value < 0.01. Each box represents the interquartile range (IQR, the range between the 25th and 75th percentile) with the mid-point of the data, whiskers indicate the upper and lower value within 1.5 times the IQR. IHC staining of IGFBP3, S100a2, CK19, and AG2 on formalin-fixed and paraffin-embedded slides for the tS1 (T19), tS2 (T28), and tL/B & mLN (EBUS_10) samples. Scale bar, 2 mm. g, Kaplan–Meier overall survival curves of TCGA LUAD (n = 494 samples), and LUSC (n = 490 samples) patients. +: censored observations. P- value (p) was calculated using the two-sided log- test.

LUAD patients contained both tS1 and tS2 tumor subpopula- during tumor progression and/or metastasis. Survival analysis tions at different fractions, with minor numbers of tS3 (Fig. 2d, using these gene sets revealed that late-stage specific gene sets tLung). The tS2-specific gene expression was increased for tumor have the highest prediction power for poor survival in LUAD cells that were isolated from the late-stage biopsies or metastases patients. (tL/B, mLN, and mBrain), suggesting an association with tumor progression and metastasis (Fig. 2e). An increase in tS2-specific gene expression was supported at the protein level through Stromal cells orchestrate tissue remodeling and .To immunohistochemical staining of LUAD samples (Fig. 2f). We investigate dynamics in the , further tested the clinical impact of the tS2 signature using an we obtained 6168 presumed fibroblasts and endothelial cells as independent LUAD cohort from the Cancer Genome Atlas shown in Fig. 1b, and performed a principal component analysis (TCGA). Patients with high tS2 signature gene expression showed (Supplementary Fig. 5a). The first principal component was worse overall survival (two-sided log-rank test p < 0.01) than observed to split the cells into 2107 endothelial cells and 3794 those with low expression (Fig. 2g). By contrast, there was no fibroblasts, with a concordant expression of representative marker survival difference for lung squamous cell carcinoma (LUSC), genes (average log-normalized expression > 1). indicating an explicit involvement of the tS2 signature with Sub-clustering of endothelial cells (ECs) revealed eight clusters LUAD progression. (Fig. 3a). Most EC clusters were observed to belong to the normal To further identify genes related to LUAD progression and/or tissues and assigned to known vascular cell types, including tip metastasis, we directly compared tumor cells in early- versus and stalk-like cells, lymphatic ECs, and endothelial progenitor advanced-stage primary, or primary versus metastasis samples cells16,17 (Fig. 3b). By contrast, one distinct cluster was identified (Supplementary Fig. 4, Supplementary Data 4). These pairwise as tumor-derived ECs (EC-C1) present in tLung and mBrain comparisons revealed the gene sets to be differentially regulated samples (Fig. 3c, Supplementary Fig. 5b, c). Tumor ECs

4 NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1 ARTICLE

ae2,107 Endothelial cells, Cell types b 3794 Fibroblasts, Cell types 8 clusters 12 clusters

 Gene exp.            PECAM1             CLDN5 6 5     2 FLT1 6    3    1  4  RAMP2 7 0 1 0 100 Sample types 10 3 % ratio nLung mLN 4 5 2 11 1 50 9 7 tLung mBrain 8 0 tL/B 1 063 45 7 2 0 2 Cell types Cell types tSNE_2 tSNE_2 INSR Tumor ECs tSNE_1 Tumor ECs HSPG2 Tip−like ECs tSNE_1 COL14A1+ matrix FBs Mesothelial cells VWA1 Tumor COL13A1+ matrix FBs FB−like cells Tip-like ECs Stalk-like ECs Stalk−like ECs EPCs Lymphatic ECs RGCC Myofibroblasts Pericytes RAMP3 Lymphatic ECs cells Undetermined ADM Tip-like EPCs ACKR1 Relative gene exp. c 120 100 SELP f 100

Stalk-like 2 TYROBP

EPC C1QB 0 90 75 LYVE1 −2 100 75 PROX1 FLT1 60 50 Lymphatic KDR 50 FLT4 50 DLL1 Average number

Average number 30 25 DLL4 25 Cell proportion (%) JAG1 Cell proportion (%) JAG2 0 0 NOTCH1 0 0 NOTCH4

tL/B tL/B tL/B tL/B tLung mLN tLung mLN tLung mLN tLung mLN nLung mBrain nLung mBrain nLung mBrain nLung mBrain Gene exp. d g    DCN           THY1          4          COL1A1 ANXA1    CCND1     2  COL1A2  0 Sample types

100 % ratio GPX3 HPGD nLung mLN 50 tLung mBrain IGHG3 TXNIP 0 tL/B 02 16 3 5117 410 TIMP3 EPAS1 FHL1 TMEM100 IGKC Cell types Cell types SPARC COL14A1 B2M IGLC2 COL14A1+ matrix FBs FCN3 ATF3 COL4A2 COL4A1 GSN ADIRF PI16 COL13A1+ matrix FBs ACKR1 IRF1 CYGB HLA-B SCGB1A1 SAT1 Myofibroblasts

COL14A1+ PRRX1 HLA-C SLPI ANGPT2 COL13A1 Smooth muscle cells HLA-DRB1 COL15A1 HLA-E A2M Angiogenesis VWA1 TCF21 VIPR1 ITGA8 Mesothelial cells HTRA1 CXCL14 HSPG2 FB-like cells S100A4 TNFSF10 BST2 HLA-DPA1HLA-DRB5 GSN NPNT CTGF COL13A1+ TAGLN Pericytes HLA-DRA TFPI CTNNAL1 ACTA2 IGFBP7 ACTG2

MF Relative gene exp. MYH11 IL7R CCL2 MYLK 2 EMP2 SYNPO2 0 LYVE1 CRYAB CNN1 −2 SMC DES UPK3B EDN1 MSLN Node color ABCG2 CALB2 –3 0 2 Cellular response to LPS WT1

FABP5 Mesothelial CYP1B1 log2 FC APOD

EDNRB FB-like Edge thickness RGS5 FABP4 CSPG4 High score ABCC9 KCNJ8 CD36 Pericyte low score PDGFRA PDGFRB PDGFA PDGFB PDGFC PDGFD FABP4 FABP5 Lipofibroblast PPARG

h i nLung tLung j 105 150 5 40 10 tLung nLung 4 4 30 10 10 100 Tumor 20 103 103 Normal Count EpCAM-CD45- EpCAM-CD45- 50 10 2 2 10 αSMA+ % positive cells EpCAM-PerCP/Cy5.5 10 3 mm 300μm EpCAM-PerCP/Cy5.5 0 0 2 2 0 –10 –10 0 2 2 3 4 5 –102 0 102 103 104 105 –10 0 10 10 10 10 0 102 103 104 105 nLung tLung CD45-PE CD45-PE αSMA-Alexa 488

Fig. 3 Tumor endothelial cells and myofibroblasts promoting angiogenesis and tissue remodeling. a tSNE plot of endothelial cells, color-coded by clusters and cell subsets as indicated. EPCs: endothelial progenitor cells. b Three-layered complex heatmap of selected endothelial cell marker genes in each cell cluster. Top: Mean expression of known . Middle: Tissue preference of each cluster; Bottom: Relative expression map of known marker genes associated with each cell subset. Mean expression values are scaled by mean-centering, and transformed to a scale from -2 to 2. c Average cell number and relative proportion of EC subsets from tissues of each origin. nLung, n = 11 samples; tLung, n = 11 samples; tL/B, n = 3 samples; mLN, n = 2 samples; mBrain, n = 9 samples. d Functional association networks between signature genes specific to tumor ECs. e tSNE plot of fibroblasts (FBs), color- coded by clusters and cell subsets as indicated. f Average cell number and relative proportion of fibroblast subsets from tissues of each origin (excluding undetermined cells). nLung, n = 11 samples; tLung, n = 11 samples; tL/B, n = 3 samples; mLN, n = 4 samples; mBrain, n = 10 samples. g Complex heatmap of selected fibroblast marker genes in each cell cluster, as shown in (b) (excluding undetermined cells). h, IHC staining of α-SMA on formalin-fixed and paraffin-embedded slides for the independent biospecimens (n = 4). All replicates showed the similar results. i, Representative flow cytometry plots showing myofibroblast (α-SMA+) in primary tumor (T06) and normal lung (N06) tissues. j, Box plot of the percentage of myofibroblast (α-SMA+)in normal and tumor-derived EPCAM−CD45− cells. nLung (n = 5; N06, N18, N41, N42, N43) and tLung (n = 5; T06, T18, T41, T42, T43). SMC and pericytes can be included in the α-SMA + cells as minor populations. *p < 0.05 (p = 0.02), two-sided Student’s t test. Each box represents the interquartile range (IQR, the range between the 25th and 75th percentile) with the mid-point of the data, whiskers indicate the upper and lower value within 1.5 times the IQR.

NATURE COMMUNICATIONS | (2020)11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications 5 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1 demonstrated a strong activation of VEGF and Notch signaling respectively). Both normal and tumor tissues contained clusters of (Fig. 3b), which regulates the development and cell fate S100A9+ (M-C3)39 or dendritic cells (DCs). The determination of endothelial cells18,19. Gene expression network remaining clusters displayed origin-specific heterogeneity and analysis of tumor ECs further highlighted angiogenesis as the diverse characteristics, including pleural macro- upregulated genes’ functional category (Fig. 3d, Supplementary phages40 from pleural fluids (PE) (M-C8 and 9) or microglia and Data 5). Thus, brain metastases and primary tumors induced macrophages41 derived from mBrain samples (M-C11). The similar vascular changes to accommodate extensive neovascular- pleural macrophages lacked the expression of pro-inflammatory ization. Among the upregulated genes, (INSR) genes, such as IL1B and CXCL8, but expressed CD163 overexpression in the tumor vasculature was suggested as an transcripts, which are associated with a non-inflammatory phe- attractive therapeutic target20. Together, these data further notype. Overall, our data suggest that tumor-associated macro- supported the therapeutic strategies targeting pro-angiogenic phages (TAMs) in primary lung tumors and distant metastases pathways in lung cancer and brain metastases21,22. Notably, mainly propagated from mo-Macs that were ontologically dif- significantly downregulated genes in tumor ECs were related to ferent from tissue-resident macrophages (Fig. 4c, Supplementary immune activation (Fig. 3d, Supplementary Data 5), supporting a Fig. 6a, b). previous finding that tumor ECs suppress the immune To understand the transcriptional transition from monocytes responses9,23. to TAMs, we performed an unsupervised trajectory analysis to Sub-clustering of fibroblasts revealed 12 distinct clusters, infer changes in the status of macrophages from lung or lymph assigned to seven known cell types, including COL13A1+ and node samples (Supplementary Fig. 6c, d). Macrophages can COL14A1+ matrix fibroblasts, myofibroblasts, smooth muscle manifest diverse functional in health and disease cells, mesothelial cells, fibroblast-like cells in mBrain, and conditions, as pro-inflammatory or anti-inflammatory subpopu- pericytes24–26 (Fig. 3e). The COL13A1+ and COL14A1+ matrix lations42. We have detected a serial transformation of pro- fibroblasts comprised the main fibroblast types in normal lung inflammatory monocytes into macrophages along the pseudo- (FB-C0 and 6) and early stage tumor (FB-C1 and 2) tissues time axis, with cells losing their pro-inflammatory nature and (Fig. 3f, g; Supplementary Fig. 5d, e). By contrast, myofibroblasts gaining anti-inflammatory signatures (Supplementary Fig. 6e, f, in FB-C3 exclusively originated from tumor tissues, including Supplementary Data 6). This transition eventually reached a tLung, tL/B, and mLN samples. Myofibroblasts have been branching point at which the two macrophage subpopulations described as cancer-associated fibroblasts promoting extensive either retained part of their pro-inflammatory signatures, or were tissue remodeling27, angiogenesis28, and tumor progression29. skewed to an anti-inflammatory gene expression . The myofibroblasts in mLN might be fibroblastic reticular cells, Normal lung and tumor tissues were enriched in pro- and anti- which have been reported to be immunologically specialized inflammatory macrophages, respectively. We have also identified myofibroblasts using encapsulated mesenchymal sponges to anti-inflammatory macrophages in mLN, with an additional gather immune cells into the lymph node30. The fibroblast-like population (LN-Mac-S6) expressing the macrophage inflamma- cells in mBrain (Fig. 3f, g; Supplementary Fig. 5d, e; FB-C7; tory factors (MIF, CXCL3, and CCL20) (Supplementary Fig. 6f). CYP1B1+ and APOD+)26 might represent cells within the MIF-expressing macrophages also expressed IL1B and TNF at perivascular space of central nervous system (CNS) that expanded levels comparable to those in pro-inflammatory monocytes, after CNS injury31. The infiltration of myofibroblasts in LUAD indicating unique macrophage profiles in mLN. was confirmed by the expression of the marker protein alpha clusters, as shown in Fig. 4b, manifested a smooth muscle actin (α-SMA) (ACTA2 gene product) in the variegated marker gene expression suggesting the presence of tumor stroma (Fig. 3h) and in tumor-derived EPCAM−CD45− heterogeneous DC subpopulations. For a more comprehensive cells (Fig. 3i, j; Supplementary Fig. 10). Partial protein expression analysis, we re-classified DCs into six subsets, including CD1c+ of α-SMA was observed in the vascular smooth muscle cells in DCs (Langerhans cells, LCs), CD141+ DCs, CD207+CD1a+ normal tissues. Conclusively, cellular dynamics in endothelial LCs, pDCs (plasmacytoid DCs), CD163+CD14+ DCs, and cells and fibroblasts support a consistent phenotypic shift of activated DCs43,44 (Fig. 4d, e). This refined the minor DC stromal cells towards promoting tissue remodeling and angiogen- populations within the total myeloid cell clusters (Fig. 4f). esis in LUAD and distant metastases. Interestingly, pDCs were rarely found in normal lung tissues, but recovered in selected tumor tissues and metastatic lymph nodes (Fig. 4g, h). The pDCs demonstrated an immunosuppressive Suppressive immune microenvironment primed by myeloid phenotype45 represented by the upregulation of leukocyte cells. Myeloid cells play a critical role in maintaining tissue immunoglobulin-like receptor (LILR) family genes46, granzyme homeostasis, and regulate inflammation in the lung. Sub- B(GZMB) production47, and loss of CD86, CD83, CD80, and clustering of 42,245 myeloid cells, as shown in Fig. 1b, revealed LAMP3 activation marker expression45,48 (Fig. 4i). The presence them to be monocytes, macrophages, and dendritic cells (Fig. 4a, of pDCs in some LUAD tissues was confirmed through flow b). were not recovered in our experimental process. cytometry (Fig. 4j, k; Supplementary Fig. 10). Therefore, both Two macrophage types are known to populate the normal adult mo-Macs and pDCs could create an immunosuppressive lung, including the alveolar (AM) type highly expressing the microenvironment that possibly caused a sub-optimal tumor MARCO, FABP4, and MCEMP1 genes, and the interstitial type presentation in LUAD and distant metastases. derived from circulating monocytes32,33. Mo-Macs, which are functionally different from tissue-resident macrophages, are recruited and induced to express profibrotic genes during lung Activation and perturbation of adaptive immunity. Tumor- fibrosis34. We mainly detected the AM type in normal lung tis- infiltrating B cells have been identified in tertiary lymphoid sues, including anti-inflammatory AM (M−C1 and 6; APOE+, structures within NSCLC. Mediating an anti-tumor immune CD163+, and C1QB+)35–37, pro-inflammatory AM (M-C5; response, these cells have been reported to be associated with IL1B+ and CXCL8+)38, and actively cycling AM expressing anti- prolonged patient survival49. Relative proportion of B cells was inflammatory markers (M-C13)39. By contrast, lung tumor and observed to be increased in primary tumors, compared to the distant metastasis tissues were strongly enriched in mo-Macs nLung samples (Supplementary Fig. 2c, d). In the lymph nodes, B (anti- and pro-inflammatory mo-Macs in M-C0 and 2, cells were abundantly present regardless of metastases. Sub-

6 NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1 ARTICLE

a bc 1500 42,245 Myeloid cells, Cell types Gene exp. % ratio 15 clusters 2 0 4 100 50 0 S100A9 FCGR3A MARCO IL1B CD163 LYZ VCAN LGMN CTSB CD14 MAFB MAF CX3CR1 ITGAM CSF1R FABP4 MCEMP1 CXCL8 APOE SEPP1 C1QB C1QA C1QC STMN1 MKI67 TOP2A CDK1 CLEC10A CD1C CLEC4C PTCRA CCR7 LAMP3 CTSS FCN1 S100A8 12 Cell types 9 7 1000 11 7 8 12 14 14 0 2 4 4 500

11 Average number 1 3 13 9 10 5 8 0 6 mo-Mac 13 Monocytes PE tSNE_2 10 tL/BnLN Alveolar Mac tLung mLN tSNE_1 mo-Mac Microglia/Mac 6 nLung mBrain Pleural Mac Monocytes DCs 1 Microglia/Mac Alveolar Mac Undetermined 100 DCs Pleural Mac 5 3 75 2 0 Monoctye mo-lineage Pro- Anti- Cycling DC 50 Macrophage Alveolar Mac Sample types Celltypes Relative gene exp. 25 Cell proportion (%) LYZ nLung nLN mo-Mac 2 MARCO tLung mLN Monocytes 0 0 CD68 tL/B PE Alveolar Mac FCGR3A −2 mBrain Pleural Mac tL/B PE tLung nLNmLN Microglia/Mac nLung mBrain DCs Undetermined

d e CD1c+ DCs (LCs) CD141+ DCs CD207+CD1a+ LCs f 4087 Dendritic cells, Cell types DC cell subsets 7 Clusters CD1C ITGAX CLEC9A XCR1 CD207 CD1A 6

3 5 2 0 4

1 Activated DCs pDCs CD163+CD14+ DCs CCR7 LAMP3 IL3RA CLEC4C CD14 CD163 tSNE_2 tSNE_2 tSNE_1 CD1c+ DCs (LCs) pDCs tSNE_1 CD141+ DCs CD163+CD14+ DCs CD207+CD1a+ LCs Activated DCs tSNE_2 tSNE_1

g 100 h CD1c+ DCs CD141+ DCs CD207+CD1a+ LCs 15 20 15 75 15 50 10 10 10 25 Ro/e % Proportion 5 5 5 0 300 0 0 0 Activated DCs pDCs CD163+CD14+ DCs 200 40 20 10

Number 100

Ro/e 20 10 5 0

0 0 0 LN_11 LN_05 LN_02 LN_07 LN_12 LN_08 LN_06 LN_04 LN_01 LN_03 NS_19 NS_16 US_28 NS_04 NS_02 NS_17 NS_13 NS_12 NS_06 NS_03 NS_07 BUS_15 B BUS_12 BUS_13 PE PE PE E E E EBUS_10 E EBUS_06 EBUS_49 EBUS_51 EBUS_19 nLN tL/B nLN tL/B nLN tL/B tLung mLN tLung mLN tLung mLN LUNG_T30 LUNG_T28 LUNG_T09 LUNG_T20 LUNG_T19 LUNG_T31 LUNG_T25 LUNG_T06 LUNG_T08 LUNG_T18 LUNG_T34 LUNG_N19 LUNG_N01 LUNG_N30 LUNG_N28 LUNG_N18 LUNG_N08 LUNG_N06 LUNG_N34 LUNG_N31 LUNG_N09 LUNG_N20 nLung nLung nLung ONCHO_11 ONCHO_58 mBrain mBrain mBrain BR BR EFFUSION_64 EFFUSION_13 EFFUSION_12 EFFUSION_11 EFFUSION_06 * p < 0.05, vs. nLung * p < 0.05, vs. nLN p < 0.01, vs. nLung p < 0.01, vs. nLN nLung tLungtL/B nLN mLNPE mBrain ** **

i j nLung tLung k 0.26 Gated on CD45+Lin-HLA-DR+ Gated on CD45+Lin-HLA-DR+ LILRB4 LILRB1 LILRA4 GZMB CD86 CD83 CD80 LAMP3 5 5 ** ** ** ** ** ** ** ** 10 10 9 pDCs Activated DCs 104 104 6 CD1c+ DCs

3 3 % pDCs CD141+ DCs 10 10 3 CD11c-APC

CD11c-APC pDC CD207+CD1a+ LCs pDC 102 102 CD163+CD14+ DCs 0 0 0 nLung tLung 0246 0246 0246 0246 0246 0246 0246 0246 2 3 4 5 2 3 4 5 0 10 10 10 10 010 10 10 10 P0009 P0014 CD123-PE CD123-PE P0019 P0041 clustering of 27,657 B cells revealed 14 clusters (Fig. 5a) con- abundant in all samples (Fig. 5b, Supplementary Fig. 7c). We have verging to five differentiation states9,50,51, which were represented observed tissue-specific enrichment for other subsets (Supple- as follicular B cells, plasma B cells expressing immunoglobulin mentary Fig. 7c, d). Firstly, normal lung tissues were enriched in gamma (IgG), mucosa-associated lymphoid tissue-derived plasma -secreting cytotoxic cells, whose differentiation was B cells expressing IgA and joining chain, granzyme B-secreting B modulated by T-cell-derived IL-12 (ref. 51). Granzyme B secretion cells, and (GC) B cells (Fig. 5a, Supplementary from these cells could play a significant role in mediating cellular Fig. 7a, b). Specifically, GC B cells50 were separated into either cytotoxicity as an alternative to T cells53. Secondly, we found dark or zone cells with distinct transcriptional programs for more GC B cells in primary tumors and LN metastases than in proliferation or activation, respectively52 (Supplementary Fig. 7a). normal lung and lymph nodes, respectively. These data strongly Among these, follicular B cells were observed to be the most suggest highly activated humoral immune responses in some

NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications 7 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1

Fig. 4 Diversity within the myeloid cell lineage and functionality according to tissue origins. a tSNE plot of myeloid cells, color-coded by clusters and cell subsets as indicated. b Complex heatmap of selected myeloid cell marker genes in each cell cluster. Left: Tissue preference of each cluster. Right: Relative expression map of known marker genes associated with each cell subset. Mean expression values are scaled by mean-centering, and transformed to a scale from -2 to 2. Pro-: Pro-inflammatory; Anti-: Anti-inflammatory. c Average cell number and relative proportion of myeloid cell subsets from each tissue origin (excluding undetermined cells). nLung, n = 11 samples; tLung, n = 11; tL/B, n = 4; nLN, n = 10; mLN, n = 7; PE, n = 5, mBrain, n = 10. d, e tSNE plot of DCs, color-coded by clusters, cell subsets, and canonical marker gene expression (gray to red). f Partitioning of dendritic cell (DC) subsets on tSNE plot of myeloid cells in (a). g Cell number and relative proportion of DC subsets in each sample. h Tissue preference of DC subsets. RO/E is the relative score of observed cell numbers over expected cell numbers calculated by chi-square test. The RO/E values of all tissue origins are shown in different colors. Black dots represent different patients. *p < 0.05; **p < 0.01, two-sided Student’s t test. i Median expression of selected marker genes for DC subsets associated with their functionality in each DC subset. **, one-way ANOVA test p-value < 0.01. pDCs, n = 172 cells; Activated DCs, n = 456; CD1c+ DCs, n = 1,782; CD141+ DCs, n = 303; CD207+CD1a+ LCs, n = 177; CD163+CD14+ DCs, n = 1,197. j Representative flow cytometry plots showing pDC (CD11c-CD123+ DCs) populations in primary tumor (T09) and normal lung (N09) tissues. k Paired dot plot of the percentage of pDC (CD11c-CD123+ DCs) population in myeloid cells (CD45+Lin-HLA-DR+) derived from nLung-tLung paired samples (four pairs; P0009, P0014, P0019, P0041). The increase of pDC populations was detected in the selected primary tumor tissues (T09, T14, and T41). P-value = 0.26, two-sided Student’s t test. In the box plot in (h) and (i), each box represents the interquartile range (IQR, the range between the 25th and 75th percentile) with the mid-point of the data, whiskers indicate the upper and lower value within 1.5 times the IQR.

LUAD patients. Each subtype displayed a slightly different metastases (Fig. 6b). In addition, increase in these two sub- B cell receptor or Ig light chain variable gene expression profile populations in the tumor microenvironment was associated with (Supplementary Fig. 7e), suggesting the generation and clonal the high tumor mutation burden (TMB) (Fig. 6c). As high TMB is expansion of tumor antigen-specific B cells. the principal predictor of successful therapy, T lymphocytes are the central players mediating anti-tumor our results support the role of mo-Macs and exhausted CD8+ immunity and are the targets of immune-checkpoint therapies. T cells in the successful application of immune checkpoint For the sub-clustering analysis of T lymphocytes, we initially therapies in advanced LUAD. collected 91,227 cells from T and NK cell clusters (Fig. 1b) To delineate the molecular associations underlying inter- sharing common transcriptome characteristics, and confidently cellular relationships, we first constructed a cellular communica- defined 64,403 T/NK cells with a secondary cell filtration using tion network using potential receptor- pair interactions. In marker gene expression (average of log-normalized expression > the tLung, we substantiated the dominant crosstalk between the 2). The T/NK cell sub-clusters reflected heterogeneous cell tS2, a novel cancer cell state associated with LUAD progression lineages and functional states that were identified as CD8+ T and metastases, with myeloid or stromal cell types (Fig. 6d, (naïve, effector, exhausted), naïve CD4+ T, exhausted T follicular Supplementary Fig. 9, Supplementary Data 8). In the network, helper, T helper, regulatory T, and NK cells (Fig. 5c, Supple- interactions between the tS2 cells and mo-Macs were predicted to mentary Fig. 8a). In accordance with previous findings9–11,we be most significant, whereas interactions between mo-Macs and found the depletion of NK cells and the emergence of regulatory exhausted CD8+ T cells were observed to be the most prominent T cells (Tregs) in the primary tumor tissues compared to normal within the immune cell network. The proportion of mo-Macs and tissues (Fig. 5d, Supplementary Fig. 8b, c). Treg cells persisted in exhausted CD8+ T cells also demonstrated positive correlations tL/B, mLN, and mBrain, delivering a suppressive mechanism of with an increase in tS2 cancer cells (Fig. 6e). For other cell types, anti-tumor immunity during tumor progression and metastasis. we found potential interactions between tS2/Malignant cells and CD8+ T cells demonstrated a dynamic functional spectrum as tumor ECs through angiogenesis signaling molecules, such as that in naïve, cytotoxic, or exhausted states from the transcrip- VEGF-VEGFRs and -Eph receptors54 (Fig. 7a, b). Tumor tional trajectory (Fig. 5e, f, Supplementary Data 7). Exhausted ECs would receive angiogenic stimulatory signals from mo-Mac/ CD8+ T cells were mainly collected from tumor tissues (tLung, malignant cells through VEGF and its receptor FLT1/VEGFR1, tL/B, mLN, and mBrain), whereas cytotoxic effector CD8+ T cells KDR/VEGFR2, as a key mediator of angiogenesis in cancer55–57 were collected from nLung (Fig. 5g). Naive CD8+ T cells were for samples of all tumor stages or for brain metastasis samples. mostly derived from nLN and PE. Differences in the T/NK cell Further, we predicted the molecular interactions between mo- subset dynamics (NK, Treg, and cytotoxic or exhausted CD8+ Macs, exhausted CD8+ T, and cancer cells in primary tumors T cells) between primary tumor and normal lung tissues were (tLung and tL/B) or distant metastases (mLN and mBrain), which further supported by conventional flow cytometry analysis had a dominant crosstalk in LUAD (Fig. 7c). Most ligand- (Fig. 5h, Supplementary Fig. 10). Altogether, the changes in receptor pairs between mo-Macs and tS2/Malignant cells were cellular composition and gene expression phenotype of T cells involved in signaling of growth factors, such as VEGFA and confirmed the direction of tumor immunity towards immune VEGFB for all stage samples. Malignant cells would receive suppression in LUAD. activation signals from mo-Macs through TNFR (TNF- TNFRSF1A), TGFBR (TGFB1-TGFBR2), and EGFR (EREG- Inference of inter-cellular and molecular interactions. Cellular EGFR) in metastatic lymph nodes. The TNFR signaling was also dynamics during LUAD progression was further confirmed prominent in the late-stage tL/B. In return, malignant cells from through chi-square tests for different tissue distributions of 40 the metastatic lymph nodes would provide the growth signal to immune and stromal cell subsets (Fig. 6a). Tumor-specific mo-Macs (CSF1-CSF1R). Potential to the populations, such as mo-Macs, pDCs, Tregs, myofibroblasts, exhausted CD8+ T cells was mostly inhibitory, delivered by tS2/ and tumor ECs were spread out in primary tumors and distant malignant cells for samples of all tumor stages (NECTIN2-TIGIT) metastases, whereas origin-specific immune and stromal cell or for metastatic lymph nodes (LGALS9-HAVCR2). Interestingly, subsets (alveolar mac, pleural mac, microglia/mac, and FB-like mo-Macs were predicted to deliver both activating (TNF- cells) were specifically associated with their corresponding tissue TNFRSF1B/ICOS) and inhibitory (LGALS9-HAVCR2) signals to sites. The proportions of exhausted CD8+ T cells and mo-Macs exhausted CD8+ T cells. Therefore, our results have demon- were markedly increased during LUAD progression and strated the complex nature of mo-Macs in LUAD, which greatly

8 NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1 ARTICLE

a b 100 27,657 B cells, Cell types 1000 14 clusters 750 75 3 1 10 9 GC B cells in the DZ GC B cells in the LZ 500 50 2 12 GrB−secreting B cells e number 11 13 MALT B cells g 6 Plasma cells 0 250 25 4 Follicular B cells Avera Undetermined (%) Cell proportion 5 8 7 0 0 tSNE_2 PE ain PE ain tSNE_1 tL/BnLN r tL/BnLN tLung mLN tLung mLN nLung mB nLung mBr c d 2000 64,403 T/NK cells, Cell types 100 20 clusters 1500 18 Cytotoxic CD8+ T 75 19 3 Exhausted CD8+ T 9 12 Naive CD8+ T 0 1 CD8 low T 1000 50 Naive CD4+ T 11 7 Treg 13 5 10 Exhausted Tfh 17 4 500 25 8 CD4+ Th number Average

6 Cell proportion (%) 2 16 CD8+/CD4+ Mixed Th NK 15 14 Undetermined 0 0 tSNE_2

tSNE_1 ain ain tL/BnLN PE tL/BnLN PE tLung mLN tLung mLN nLung mBr nLung mBr e f CD8+ states CD8+ T cell clusters CytotoxicityExhaustion Naiveness 3 Cyt r = 0.583 r = -0.48 r = 0.23 S3 4 score o 2 2 i

c 2 1 1 Naiveness score Exhaustion score Cytotoxicity 0 0 0 −20246 −7.5 −5.0 −2.5 0.0 2.5 −7.5 −5.0 −2.5 0.0 2.5 Component_2 Component_1 Component_1 S1 4 6 ** ** ** 3 S2 3 score 4 Exhausted Naive 2 2

5681115 xicity 2 1 1 Component_2 Component_1

Cytoto 0

0 Naiveness score

Exhaustion score 0 6 15 5 11 8 11 5 6 8 15 6 5 15 8 11 Cluster Cluster Cluster

g h GatedonCD3+CD56–Gated on CD3+CD56– GatedonCD3+CD8+ Cytotoxic CD8+ T Exhausted CD8+ T 35.64 11.46 23.86 2.95 5 0.11 4.20 5 49.84 17.39 105 105 10 10 nLung mBrain Treg PD1+ 90% 4 4 4 83% 10 104 10 10 g

3 74% 103 3 10 103 8 8 10 tLung nLun 7 18 22 CD25-PE PD1-BV421 CD56-FITC CD8-BV421 2 2 25 11 10 10 10 CD16+ mLN 50% 0 0 0 37 0 PD1– 51% tL/B 2 24.55 28.36 –10 9.45 63.74 33.73 61.96 –102 27.61 5.15 –102 0102 103 104 105 0102 103 104 105 0102 103 104 105 0103 104 105 CD3-PerCP CD4-APC/Cy7 CD4-APC/Cy7 CD16-FITC Naive CD8+ T 5 5.73 13.93 15.42 2.28 0.14 11.72 67.94 14.50 10 105 5 105 9 10 PE Treg PD1+ 3 4 71% 9 nLung 10 104 104 104 tLung 24 13

tL/B ung 3 3 3 3 nLN L 10 10 10 10 mLN t CD25-PE PD1-BV421 CD56-FITC

PE CD8-BV421 2 2 2 79% mBainr 10 10 10 102 CD16+ nLN 0 0 0 2 0 PD1– –102 16.07 64.27 –10 3.38 78.92 –102 18.90 69.24 16.08 1.48 –102 2 3 4 5 2 3 4 5 –102 0102 103 104 105 010 10 10 10 010 10 10 10 –102 0102 103 104 105 CD3-PerCP CD4-APC/Cy7 CD4-APC/Cy7 CD16-FITC

Fig. 5 B cell- and T/NK cell-mediated immune responses during lung cancer progression. a tSNE plot of B cells, color-coded by clusters and cell subsets as indicated. DZ: dark zone; LZ: light zone; GrB, granzyme B; MALT: mucosa-associated lymphoid tissue. b Average cell number and relative proportion of B cell subsets from tissues of each origin (excluding undetermined cells). nLung, n = 11 samples; tLung, n = 11 samples; tL/B, n = 4 samples; nLN, 10 samples; mLN, n = 7 samples; PE, n = 5 samples, mBrain, n = 10 samples. c tSNE plot of T/NK cells, color-coded by clusters and cell subsets as indicated. Tfh: T follicular helper; Th: T helper. d Average cell number and relative proportion of T/NK cell subsets from tissues of each origin (excluding undetermined cells). nLung, n = 11 samples; tLung, n = 11 samples; tL/B, n = 4 samples; nLN, 10 samples; mLN, n = 7 samples; PE, n = 5 samples, mBrain, n = 10 samples. e Unsupervised trajectory of CD8+ T cell functional state transitions. f Correlation of Monocle components with T cell functional features (mean expression of signature genes in Supplementary Fig. 8a). Each dot indicates single cells colored by their clusters. Solid black line and the top-right text (r) denote LOESS fit and Pearson’s correlation, respectively (top). Violin plot of T cell functional features in each cluster (bottom). **, one-way ANOVA test p- value < 0.01. g Tissue distribution along functional states in CD8+ T cells. h Flow cytometry plots gated on NK and T cell subsets (Treg, cytotoxic and exhausted CD8+ T cells) from primary tumor (T08) and normal lung (N08) tissues.

NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications 9 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1

ab d

tS2 s l nLN mLN PE mBrain nLung tLung tL/B tS1 50 mo-Mac tS3 20 Myofibroblasts Monocytes COL13A1+ matrix FBs al cel

rtion (%) 40 Alveolar Mac i l Smooth muscle cells broblasts e Pleural Mac i Pericytes

10 F Microglia/Mac COL14A1+ matrix FBs 30 CD1c+ DCs Tumor ECs

doth Tip-like ECs CD141+ DCs Stalk-like ECs Cell propo

CD207+CD1a+ LCs 0 En 20 Exhausted CD8+ T Lymphatic ECs

Myeloid cells pDCs mo-Mac CD163+CD14+ DCs Monocytes Activated DCs 60 Alveolar Mac 10 pDCs Cytotoxic CD8+ T

tion (%) CD1c+ DCs Exhausted CD8+ T r 40 CD141+ DCs Naive CD8+ T CD207+CD1a+ LCs CD8 low T CD163+CD14+ DCs Myeloid cells Naive CD4+ T 20 mo−Mac Activated DCs GC B cells in the DZ Treg GC B cells in the LZ Exhausted Tfh Cell propo 0 MALT B cells

T/NK cells CD4+ Th Plasma cells CD8+/CD4+ Mixed Th Follicular B cells mal PE Treg NK r Exhausted Tfh GC B cells in the DZ No rly tumor Naive CD4+ T GC B cells in the LZ Late tumor Ea Metastases CD4+ Th GrB−secreting B cells CD8+/CD4+ Mixed Th MALT B cells Exhausted CD8+ T Lung * p < 0.05, vs. nLung Cytotoxic CD8+ T

B cells Plasma cells LN p < 0.01, vs. nLung Naive CD8+ T Follicular B cells ** CD8 low T Myofibroblasts Brain * p < 0.05, vs. nLN NK COL13A1+ matrix FBs p < 0.01, vs. nLN MAST COL14A1+ matrix FBs PE ** NK tS2 tS1 tS3 lls Mesothelial cells Treg e pDCs FB-like cells MAST c mo-Mac CD4+ Th Smooth muscle cells c Pericytes CD8 low T Monocytes Fibroblasts Tumor ECs

Pericytes CD1c+ DCs

Tip-like ECs Plasma cells

0.039 0.076 Alveolar Mac MALT B cells CD141+ DCs Stalk-like ECs Naive CD8+ T Naive CD4+ T Activated DCs Myofibroblasts Tumor ECs Exhausted Tfh

0.25 0.28 Lymphatic ECs Tip-like ECs 40 Follicular B cells

0.039 0.015 Cytotoxic CD8+ T

Stalk-like ECs Exhausted CD8+ T CD207+CD1a+ LCs GC B cells in the LZ GC B cells in the DZ 20 CD163+CD14+ DCs EPCs Smooth muscle cells CD8+/CD4+ Mixed Th COL14A1+ matrix FBs 30 COL13A1+ matrix FBs Endothelial Lymphatic ECs MAST MAST cells 20 e 10 Pearson residuals % mo-Mac 10 20 r = 0.76 (r 2 = 0.58) r = 0.83 (r 2 = 0.69) −60 0 60

% Exhausted CD8+ T 40 0 0 15 High Int Low High Int Low 10 20 TMB (/Mb) TMB (/Mb)

5 % mo-Mac

0 0

% Exhausted CD8+ T 0 25 50 75 100 0 25 50 75 100 % tS2 cancer cells % tS2 cancer cells

Fig. 6 Phenotypic changes during LUAD progression and metastases. a Tissue distribution map for each of the 40 immune and stromal cell subsets. Pearson residual calculated using the chi-square test was used to adjust cell-sampling biases between tissue origins. Brown and green colors indicate enrichment and depletion, respectively. Circle size is proportional to the contribution of a given cell. b Increased proportion of exhausted CD8+ T cells/ mo-Macs during LUAD progression and metastases. Immune cell proportion was estimated within a non-epithelial compartment. *p < 0.05; **p < 0.01, two-sided Student’s t test. nLung, n = 11 samples; tLung, n = 11 samples; tL/B, n = 4 samples; nLN, 10 samples; mLN, n = 7 samples; PE, n = 5 samples, mBrain, n = 10 samples. c Enrichment of exhausted CD8+ T cells/mo-Macs in tLung samples with a high mutational burden (TMB). The significance was determined using two-sided Student’s t test. high, n = 3 samples, int, n = 3 samples, low, n = 5 samples. d Heat map depicting the number of significant interactions between tLung cell subsets. e Association between proportional changes in exhausted CD8+ T cells/mo-Macs and tS2 cancer cells in tLung. The proportion of tS2 cells was estimated with respect to all malignant cells in each sample. Top-right text (r and r2) represents Pearson’s correlation and its coefficient of determination. In the box plot in b and c, each box represents the interquartile range (IQR, the range between the 25th and 75th percentile) with the mid-point of the data, whiskers indicate the upper and lower value within 1.5 times the IQR. influences T cell functionalities to balance immune activation and root of all three branches, suggesting versatile progenitor properties. exhaustion. Taken together, the inter-cellular interactions suggest Normal samples demonstrated a variable branch distribution a tight relationship between immune cell dynamics and molecular indicating regional heterogeneity with differential proximal (ciliated features of cancer cells that may determine the prognostic and and club) or distal (alveolar type) cells58 (Supplementary Fig. 3c). therapeutic responses in LUAD. On the contrary, cancer cells were found mostly in the S1 and S2 branches, showing de-regulation or complete deviation from nor- mal epithelial transcriptional programs. The cancer cell signature at Discussion the S2 branch, tS2, was specifically associated with lung cancer In the present study, we have depicted the cellular landscape of progression and metastases in LUAD patients. LUAD from the early to the advanced stages, encompassing the Intriguingly, club cells in the S1 and S3 branches expressed a primary and metastatic sites. This LUAD atlas has revealed the touch of distal alveolar cell marker and microtubule assembly characteristics of tumor cells and associated microenvironments, genes, respectively. This suggested the commitment towards and further illuminated changes in cellular and molecular net- alveolar or ciliated cells (Supplementary Data 3). On the contrary, works during tumor progression. To the best of our knowledge, club cells at the S2 branching point highly expressed genes this provides the most comprehensive cellular interaction map of involved in innate immune response and detoxification, repre- LUAD and a framework for future discoveries of molecular and senting their original protective function. This club cell subset cellular therapeutic targets. might include tumorigenic progenitors susceptible to oncogenic Through systemic multi-patient analyses, we have uncovered transformation, as evidenced by sharing of S2 branch formation malignant molecular features of cancer cells previously masked by with tS2 cells. inter- and intra-patient genomic heterogeneity. In particular, the We have also found that most alterations in the tumor projection of cancer and normal epithelial cells into a joint tran- microenvironment from normal lung tissues were inflicted at an scriptional trajectory has revealed their similarities and disparities, early stage, and then sustained in later stages. First, tumor ECs implicating the paths of malignant transformation. Two transcrip- acquired highly angiogenic, yet immune-compromised proper- tional branches reflected the differentiation programs for ciliated ties. Second, myofibroblasts gradually replaced matrix fibroblasts (S3) or alveolar (S1) cells in the lung. Club cells were located at the in the tumor stroma. Third, mo-Macs and dendritic cells (CD163

10 NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1 ARTICLE

a c Interactions Tissue origins tS2|mo-Mac tLung Malignant cells|mo-Mac tL/B tS2|Exhausted CD8+ T mLN Malignant cells|Exhausted CD8+ T mBrain mo-Mac|Exhausted CD8+ T mo-Mac|tS2 Interactions Interactions –log (p value) mo-Mac|Malignant cells 6 Tissue origins tS2|Tumor ECs Exhausted CD8+ T|mo-Mac 1.3 NRP1 − VEGFB Malignant cells|Tumor ECs Exhausted CD8+ T|tS2 FLT1 complex − VEGFB Exhausted CD8+ T|Malignant cells 0 FLT1 − VEGFB Tumor ECs|tS2 FLT1 complex − VEGFA HLA−C − FAM3C Tumor ECs|Malignant cells NRP1 − VEGFA Interactions ICAM1 − AREG Tissue origins EFNA1 − EPHA2 Tissue origins EFNB2 − EPHB4 tLung TNF − TNFRSF1A LAMP1 − FAM3C TNF − SEMA4C LGALS9 − CD47 tL/B TNF − DAG1 LGALS9 − MET mBrain TGFB1 − TGFbeta receptor2 NRP2 − VEGFA NAMPT − P2RY6 LGALS9 − MRC2 CD44 − HBEGF –log (p value) EREG − EGFR EGFR − TGFB1 COPA − P2RY6 CD46 − JAG1 6 CSF1R − CSF1 CDH1 − a2b1 complex 1.3 TGFB1 − aVb6 complex LGALS9 − SORL1 0 CCL3L1 − DPP4 TNFSF12 − TNFRSF12A LGALS9 − SLC1A5 FN1 − a3b1 complex LGALS9 − PTPRK TGFB1 − aVb6 complex LGALS9 − DAG1 CXADR − FAM3C ICAM1 − AREG TNFRSF10B − TNFSF10 GRN − SORT1 LAMC1 − a6b1 complex FN1 − aVb5 complex CXCL1 − ACKR1 FN1 − aVb1 complex CXCL8 − ACKR1 FN1 − a2b1 complex COPA − SORT1 EGFR − HBEGF CXCL2 − DPP4 MIF − TNFRSF10D HLA−C − FAM3C VEGFA − FLT1 VEGFA − KDR LGALS9 − CD47 SIRPA − CD47 LAMP1 − FAM3C

NRP1 − VEGFA mo-Mac | tS2/Malignant cells NRP2 − VEGFA CD74 − APP TNFSF12 − TNFRSF12A LGALS9 − MET FN1 − a3b1 complex FPR3 − HEBP1 NRP1 − VEGFB CD74 − COPA LGALS9 − SORL1 b TIGIT − NECTIN2 IFNG − Type II IFNR CD6 − ALCAM CXCR6 − CXCL16 CD44 − HBEGF CD2 − CD58 CXCR3 − CCL20 Interactions Interactions CD99 − PILRA Tissue origins mo-Mac|Tumor ECs TNFRSF1B − GRN CCL4L2 − VSIR CCL2 − ACKR1 Tumor ECs|mo-Mac CCL5 − CCR1 CXCL8 − ACKR1 PTPRC − MRC1 LAMP1 − FAM3C AXL − GAS6 Tissue origins PGRMC2 − CCL4L2 CCR1 − CCL14 EGFR − HBEGF HLA−C − FAM3C tLung EGFR − COPA FN1 − a5b1 complex EGFR − GRN CD74 − MIF tL/B EGFR − MIF EGFR − TGFB1 CCL4L2 − VSIR mBrain CD74 − APP MIF − TNFRSF14 LGALS9 − CD44 MDK − LRP1 CD40 − TNFSF13B –log (p value) TNFRSF1A − GRN PDGFB − LRP1 6 ANXA1 − FPR1 ANXA1 − FPR3 CXCL12 − CXCR4 1.3 LGALS9 − HAVCR2 LRP6 − CKLF LGALS9 − LRP1 0 TNFRSF10B − TNFSF10 CD99 − PILRA SCGB3A1 − MARCO TNFRSF1A − GRN CSF1 − SIRPA CD44 − HBEGF OSMR − OSM NRP2 − SEMA3F TFRC − TNFSF13B MIF − TNFRSF10D LGALS9 − LRP1 FN1 − a10b1 complex CXADR − FAM3C MERTK − GAS6 LGALS9 − COLEC12 ADORA3 − ENTPD1 FPR3 − HEBP1 MDK − SORL1 ICAM1 − aXb2 complex SPP1 − CD44 ICAM1 − aMb2 complex CD74 − COPA C3 − aMb2 complex TNF − FLT4 C3 − C3AR1 SPP1 − a9b1 complex NRP2 − PGF LGALS9 − HAVCR2 FN1 − a8b1 complex CDH1 − aEb7 complex NRP1 − PGF TNFRSF10B − FASLG CD47 − SIRPG NRP1 − VEGFB ICAM1 − SPN Malignant cells | Exhausted CD8+ T PROS1 − AXL TNFRSF1A − FASLG mo-Mac | Exhausted CD8+ T FLT1 complex − VEGFB FLT1 − VEGFB CD8A − CEACAM5 CD96 − NECTIN1 CCR1 − CCL23 FN1 − aVb1 complex PDCD1 − FAM3C LGALS9 − PTPRK CD94:NKG2A − HLA−E FN1 − a2b1 complex TGFB1 − TGFbeta receptor1 CCL5 − ACKR1 HLA−F − LILRB1 LGALS9 − MRC2 SPN − SIGLEC1 LGALS9 − CD47 EGFR − AREG SIRPA − CD47 CD52 − SIGLEC10 OSMR − OSM HLA−DPB1 − TNFSF13B THY1 − aMb2 complex LTBR − LTB THY1 − aXb2 complex TNF − ICOS ICAM1 − AREG PECAM1 − CD38 TGFB1 − TGFbeta receptor1 TNF − TNFRSF1B ICAM1 − aMb2 complex ALOX5 − ALOX5AP ICAM1 − aXb2 complex HLA−E − KLRC1 SPP1 − PTGER4 CD55 − ADGRE5 FAM3C − CLEC2D LGALS9 − CD44 C5AR1 − RPS19 CD74 − MIF

SPP1 − CD44 mo-Mac | Exhausted CD8+ T

Fig. 7 Significant ligand-receptor pair genes accounting for specific inter-cellular interactions. Heatmap depicting significant interactions between (a) Tumor ECs and cancer cells; (b) mo-Macs and tumor ECs; (c) Exhausted CD8+ T cells, mo-Macs, and cancer cells in our LUAD collections (tS2 in tLung and malignant cells in tL/B, mLN, and mBrain). One-sided p-value is calculated from permutation test.

NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications 11 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1

+CD14+ DCs) expanded and differentiated towards an overall pieces were re-mixed by gentle pipetting at 20-min intervals during incubation. (3) anti-inflammatory phenotype, and overpowered alveolar macro- Metastatic brain tissue was chopped into pieces that were 2–4 mm in size using a phages in lung tissues and conventional DCs in the LNs. Fourth, sterile pair of scissors, placed in a 100-mm dish, and incubated in an solution (collagenase (Gibco, Waltham, MA, USA), DNase I (Roche, Basel, Swit- B cells were activated and expanded in tumor tissues, suggesting zerland), and Dispase I (Gibco, Waltham, MA, USA); prepared in DMEM) at 37 °C humoral immune responses against tumor . Fifth, cyto- for 1 h. The tissue pieces were re-mixed by gentle pipetting at 15-min intervals toxic NK cells were diminished; however, regulatory T cells were during incubation. (4) Pleural fluids were transferred to a 50-ml tube, and the cells observed to be increased. Within the CD8+ T cell subsets, an were centrifuged at 300g. Each cell suspension was transferred to a new 50-ml (15-ml tube for biopsy exhausted T cell phenotype has expanded throughout cytotoxic samples) tube after being passed through a 70-µm strainer. The volume in the tube effector populations. These alterations in stromal and immune was readjusted to 50 ml (or 15 ml) with RPMI1640 medium, and the contents were populations cooperatively transformed immune-competent tis- centrifuged to remove the . The supernatant was aspirated, the cell pellet sues into an immune-suppressive tumor microenvironment. was resuspended in 4 ml of RPMI1640 medium, and the dead cells were removed Aberrant anti-tumor immune responses involving or using Ficoll-Paque PLUS (GE Healthcare, Chicago, IL, USA) separation. regulatory and exhausted T cells also provide therapeutic Single-cell RNA sequencing and read processing. Each cell suspension was opportunities to direct the immune reaction into productive subjected to 3′ single-cell RNA sequencing using Single Cell A Chip Kit, Single Cell directions using immune checkpoint inhibitors and other 3′ Library and Gel Bead Kit V2, and i7 Multiplex Kit (10x Genomics, Pleasanton, immune modulators. CA, USA) with a cell recovery target of 5000, following the manufacturer’s Finally, we demonstrated vibrant cell-population dynamics and instructions. Libraries were sequenced on an Illumina HiSeq2500, and mapped to molecular interactions between the tumor, stromal, and immune the GRCh38 human reference genome using the Cell Ranger toolkit (version 2.1.0). compartments. Numerous highly significant interactions were Whole-exome sequencing and data processing. Exomes of 11 formalin-fixed inferred between mo-Macs and aggressive/metastatic tumor cells and paraffin-embedded lung tumor tissues and paired blood samples were captured involving the activation of TNF, TGF-β, and EGFR signaling using the SureSelectXT Human All Exon V5 (5190-6208, Agilent, Santa Clara, pathways in tumor cells; these interactions may induce ERK- CA, USA). Sequencing libraries were constructed for the HiSeq2500 system (Illu- MAPK signaling, invasive mesenchymal phenotypes, and tumor mina, San Diego, CA, USA) and sequenced using the 100-bp paired-end mode of growth. Malignant cells differentially induced the activation of the Hiseq PE Cluster kit v4 (PE-401-4001, Illumina, San Diego, CA, USA), and the Hiseq SBS kit v4 (PE-401-4003, Illumina, San Diego, CA, USA). Exome-sequencing cellular and humoral immune responses in individual patients; reads were aligned to the hg38 reference genome using BWA-0.7.17. Putative however, they concomitantly provided inhibitory signals to duplications were marked by Picard (version picard-tools-2.18.2-SNAPSHOT). induce immune exhaustion. This dual function of immune acti- Sites potentially harboring small insertions or deletions were realigned, and reca- fi vation and regulation was also predicted for mo-Mac populations, librated by employing GATK (v4.0.5.1) modules with known variant sites identi ed from phase 3 of the 1000 Genomes Project and dbSNP-151. GATK4 Mutect2 was as they provided both stimulatory and inhibitory signals to used to call somatic mutations. The whole-exome sequencing (WES) coverage used T cells. In conclusion, these results illustrate a complex biological was 100× for the tumors and 50× for the paired blood samples. picture, but also suggest a therapeutic opportunity to target mo- Macs in addition to the other T cell populations for the redir- . Patient tissue samples subjected to 3′ single-cell RNA ection towards immune activation. sequencing and obtained from BioBank were fixed in 10% formalin, and embedded in paraffin. Thereafter, 4-µm-thick sections were prepared. The following anti- bodies and dilutions were used to detect the respective : anti-IGFBP3 Methods (mouse, 1:100, NBP2-12364, Novus Biologicals, Centennial, CO, USA), anti-CK19 Human specimens. The present study has been reviewed and approved by the (rabbit, 1:500, NB100-687, Novus Biologicals), anti-AG2 (rabbit, 1:200, NBP2- Institutional Review Board (IRB) of the Samsung Medical Center (IRB no. 2010- 27393, Novus Biologicals), anti-S100a2 (rabbit, 1:300, ab109494, Abcam, Cam- 04-039-052), and all subjects have provided their written informed consent. Forty- bridge, UK), and anti-αSMA (Mouse, M0851, DAKO Agilent, Santa Clara, four patients diagnosed with pathological LUAD were enrolled in the present study CA, USA). (Supplementary Data 1). The average age was 62.2 years old and 38.6% of them were female. A total of 58 samples were collected and immediately transferred for Flow cytometry. Dissociated cells were multi-stained with three to five antibodies single-cell isolation. Tumor tissue, distant normal lung, normal lymph node, and at 4 °C for 1 h, and then washed once with phosphate buffered saline. All antibodies metastatic brain tissues were obtained during conserving surgery at the Samsung were used at concentrations recommended by the manufacturer. After filtering Medical Center (Seoul, Korea) from LUAD patients that had not received prior through a round-bottom tube with a 40-μm strainer-cap, the cells were analyzed treatment. Normal lung tissues were separated from the malignant region by at using FACSVerse and FACSuite v1.2 software (BD Biosciences, San Jose, least 5 cm. Metastatic lymph node and lung tumor tissues were collected from CA, USA). advanced-stage LUAD patients through endobronchial ultrasound and broncho- In order to examine myofibroblasts, we first stained cells with anti-human fl scopy. Pleural uids were obtained from LUAD patients through malignant pleural EpCAM-PerCP/Cy5.5 (Cat# 347199, BD) and CD45-PE (555483, BD) antibodies. effusion. Then cells were fixed in 2% paraformaldehyde/PBS, permeabilized in Intracellular A total of six tissue samples (tumor-normal pair) from three LUAD patients staining Perm Wash Buffer (BioLegend, San Diego, CA, USA), and stained with the fl were additionally collected and immediately dissociated for the ow cytometry anti-aSMA-Alexa 488 (ab197240, Abcam) . analysis. The collected tissues were as follows: LUNG_T14 (stage IIIA), The pDC populations in lung tissues were identified using anti-CD45-BV421 LUNG_N14, LUNG_T41 (stage IIIA), LUNG_N41, LUNG_T42 (stage IA), (368521, BioLegend) and anti-Human 4-Color Dendritic Value Bundle (340565, LUNG_N42, LUNG_T43 (stage IB), LUNG_N43. BD). Due to a low proportion of pDCs in the lung tissues, only samples in which CD45+Lin-HLA-DR+ cells exceed 1% of the total cell number were used for the analysis. Sample preparation. Single-cell suspensions of the collected tissues were prepared In order to examine NK, T, and regulatory T cells, we stained cells with FITC- through mechanical dissociation and enzymatic digestion within 16 h after surgery. conjugated anti-human CD56 (CD56-FITC, Cat# 318303), CD3-PerCP (300428), Single-cell isolation was performed differently depending on the sample conditions. CD4-APC/Cy7 (317417), CD8-BV421 (344747), and CD25-PE (302605) (1) Tumor and distant normal lung tissue dissociation was performed using a antibodies. CD3-PerCP (300428), CD8a-PE (300908), CD16-FITC (360715), and tumor dissociation kit (Miltenyi Biotech, Germany) following the manufacturer’s PD-1-BV421 (329920) antibodies were used together to label cytotoxic or instructions. Briefly, tissues were cut into pieces that were 2–4 mm in size and exhausted T cells. transferred to a C tube containing the enzyme mix (enzymes H, R, and A in RPMI1640 medium). The GentleMACS programs h_tumor_01, h_tumor_02, and h_tumor_02 were run with two 30-min incubations on the MACSmix tube rotator Filtering and normalization of scRNA-seq data. We applied three quality at 37 °C. (2) Normal lymph node tissue and biopsy samples of metastatic lymph measures on raw gene-cell-barcode matrix for each cell: mitochondrial genes nodes and lung tumor tissues were dissociated using collagenase/hyaluronidase (≤20%, unique molecular identifiers (UMIs), and gene count (ranging from 100 to (STEMCELL Technologies, Vancouver, Canada) and DNase I, RNase-Free (lyo- 150,000 and 200 to 10,000). The UMI count for the genes in each cell was log- philized) (QIAGEN, Hilden, Germany). The tissues were chopped into pieces that normalized to TPM-like values, and then used in the log2 scale transcripts per were 2–4 mm in size using a sterile pair of scissors, placed in a 35−mm dish, and million (TPM) plus 1 (ref. 59). For each batch, we used the filtered cells to remove incubated in an enzyme solution (collagenase/hyaluronidase (STEMCELL Tech- genes that are expressed at low levels by counting the number of cells (min.cells) nologies, Vancouver, Canada) and DNase I, RNase-Free (lyophilized) (QIAGEN, having expression of each gene i, and excluded genes with min.cells < 0.1% cells. Hilden, Germany) prepared in RPMI1640 medium at 37 °C for 1 h. The tissue For the remaining cells and genes, we defined relative expression by centering using

12 NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1 ARTICLE the Seurat ScaleData function with the following parameters: do.scale = FALSE, do. matrix in the scale of UMI counts was provided as an input to Monocle, and then, center = TRUE, scale. = 10. The relative expression levels across the remaining its newCellDataSet function was called to create an object with the parameter subset of cells and genes were used for subsequent analysis. expressionFamily = negbinomial.size. For the macrophages, variably expressed genes were selected to have normalized variance over the fitting curve and mean expression greater than 0.001, as estimated by the Monocle dispersionTable Unsupervised dimensional reduction and clustering. Variably expressed genes function. For CD8+ T cells, variably expressed genes selected by Seurat that had a with mean expression between 0.0125 and 3 and quantile-normalized variance mean expression between 0.0125 and 3 and quantile-normalized variance larger greater than 0.5 were selected using Seurat v2.3.4 (ref. 60)(https://satijalab.org/ than 0.5 were used as inputs. The cell trajectory was inferred using default Monocle seurat/) in R version 3.4, and then used to compute the principal components parameters after dimension reduction and cell ordering. (PCs). A subset of significant PCs was selected using the JackStraw and PCEl- bowPlot functions of Seurat. Cell clustering and tSNE visualization were performed using the FindClusters and RunTSNE functions, respectively. The annotations of Identification of signature genes. We applied the Seurat FindAllMarkers func- cell identity on each cluster were defined by the expression of known marker genes. tion to identify specific genes for each cell subset. For the selection of marker genes We compared the results for cell lineage annotations based on independent cell specific to each cell state (as estimated by the trajectory analysis), we calculated the 61 62 clusters from different clustering methods as follows: Seurat, CIDR , RCA , and log2 fold change (log2FC) between two groups (a cell state/subset vs. other cells) SC3 (ref. 63) (Supplementary Fig. 1b). The clustering by Seurat algorithm showed using the Seurat FindMarkers function. The significance of the difference was high agreement with the results obtained using the other methods. determined using two-sided Student’s t test with Bonferroni correction. Signature genes were required to be expressed in >25% of cells within either of the two cell groups (marked as PCT, percentage of cells). Genes were selected as signatures Inference of CNV from scRNA-seq data. In order to separate malignant tumor based on the statistical threshold (log FC > 1, two-sided Student’s t test p-value cells from non-malignant cells, CNV aberrations were inferred from the pertur- 2 < 0.01, and adjusted p-value (Bonferroni) < 0.01). The selected genes were cate- bation of chromosomal gene expression. First, we adjusted the proportion of gorized according to the functional gene sets in the Gene Ontology (GO) Biological putative malignant cells below 20%, by adding normal cells derived from normal Process using database for annotation, visualization and integrated discovery lung tissues to the individual tumor samples collected from tLung, tL/B, mLN, and (DAVID)65,66 (https://david.ncifcrf.gov/) pathway enrichment analysis. mBrain. Secondly, we filtered out less informative genes that were expressed in less than 10 cells and that had a mean expression of less than 0.1 across all cells at the ’ log2 scale. Third, the expression of each gene was transformed into a Z-score on a Survival analysis. RNA-seq and clinical data from patients LUAD and squamous limited scale from −3 to 3. Fourth, after sorting the genes by their chromosomal cell carcinoma (LUSC) samples were obtained from the Cancer Genome Atlas position, the signal for CNV was estimated using the moving averages of 100 genes (TCGA) to evaluate the prognostic effects of gene sets derived from specific cell as the window size within individual , and adjusted using centered states. The RNA-seq data (Level 3) included 494 LUAD and 490 LUSC (updated in values across genes64. Finally, we summarized the CNV signal with two parameters, 2017) tumors, and the expression of each gene was represented using the log2 including mean squares of estimates across all windows in the x-axis, and corre- (TPM + 1) scale. We acknowledged patient survival if the time of death after lation of the CNV of each cell with the average of the top 5% of cells in the y-axis8. diagnosis was longer than 10 years for a more refined analysis of survival rate. The Cancer cells showing perturbation in their CNV signal (>0.02 mean squares or >0.2 tumor samples were divided into two classes along the 25th and 75th percentiles of CNV correlation) were classified as malignant. The codes for CNV inference are the mean expression of the target genes. Survival curves were fitted using the available at the Github repository (https://github.com/SGI-LungCancer/SingleCell). Kaplan–Meier formula in the R package ‘survival’, and visualized using the ggsurvplot function of the R package ‘survminer’. Prediction of tumor mutation burden. The tumor mutation burden (TMB) was estimated from bulk WES data for 11 formalin-fixed and paraffin-embedded lung Gene–gene functional association network. We constructed a network between tumor tissues. The TMB is defined as the total number of mutations per coding marker genes specific to tumor ECs to represent their functional association. The area of a tumor genome. The TMB status of tLung samples was determined by a network was quantified using the Jaccard index, known as an intersection over the range of TMB values: high TMB, >30 mutations/Mb; int, intermediate; low TMB, union, between the GO Biological Process terms for all possible pairs of genes. A <25 mutations/Mb. total of 68 genes (22 upregulated and 46 downregulated genes) were identified as marker genes significantly expressed in tumor ECs in comparison to all other ECs (absolute log FC > 1, two-sided Student’s t test p-value<0.01, adjusted p-value Marker gene selection specific to cancer cells. For pairwise comparisons 2 (Bonferroni) < 0.01, and PCT > 0.25). For visual clarity, we constructed the between early- versus advanced-stage primary, or primary versus metastatic cancer gene–gene network using only associations with a Jaccard index exceeding 0.05, for cells, we filtered out non-malignant cells among cancer cells from tLung, tL/B, a total of 56 genes (18 upregulated and 38 downregulated genes). This network was mLN, and mBrain. A total of 6352, 6400, 2961, and 15423 malignant cells, visualized with the force-directed layout algorithm in the open-source platform respectively, were used to identify the differentially expressed genes. Malignant cells Cytoscape (version 3.5.1)67. from tLung were used as a control group to evaluate the advanced stage- and fi metastasis-speci c gene regulation. We manually calculated the log2 fold change – (log2FC) between two groups (tL/B, mLN, mBrain versus tLung) using the Seurat Cell cell interaction network. We mapped the receptor-ligand pairs using Cell- FindMarkers function. The significance of the difference was determined using PhoneDB (www.cellphonedb.org)68,69 onto our cell subsets within tissues of each two-sided Student’s t test with Bonferroni correction. Signature genes were origin to identify cell–cell interactions. This method infers the potential interaction required to be expressed in >25% of cells within either of the two cell groups strength between two cell subsets based on gene expression level, and provides the (marked as PCT, percentage of cells). Genes were selected as signatures based on significance through permutation test (1000 times). Only receptors and ligands ’ fi the statistical threshold (absolute log2FC > 0.585, two-sided Student s t test p- expressed in more than 25% of the cells in the speci c cell subsets were considered value<0.01, and adjusted p-value (Bonferroni) <0.01). in each run. The resulting adjacency matrices were generated for all cell–cell interactions within our collection. For the analysis, we applied four filtering steps to the raw adjacency matrices: (1) interaction pairs with collagens were removed, (2) Inference of tumor cell state using trajectory analysis.Wefirst extracted the cell–cell interactions within identical cellular lineages were excluded, (3) only cell epithelial cell clusters (Fig. 1b) from the single-cell RNA sequencing data on nLung subsets defined in more than 0.1% of the cells in immune and stromal cells were samples. In order to generate a trajectory, we adopted only malignant cells, as analyzed, and (4) and significant interactions (satisfied with one-sided p-value for a estimated by CNV inference, among the epithelial cells in the tLung samples. We permutation test <0.05) were identified between cell subsets in tissues of each origin. employed the Monocle (version 2) algorithm14 using variable genes selected by Seurat as the input to determine the differential tumor cell states referenced against normal epithelial cells. The gene-cell matrix in the scale of UMI counts was pro- Reporting summary. Further information on research design is available in vided as input to Monocle, and then, its newCellDataSet function was called to the Nature Research Reporting Summary linked to this article. create an object with the parameter expressionFamily = negbinomial.size. The epithelial cell trajectory was inferred using default parameters of Monocle after Data availability dimension reduction and cell ordering. Processed data can be accessed from the NCBI Gene Expression Omnibus database (accession code GSE131907). Raw single-cell RNA sequencing and bulk WES data are Inference of the developmental trajectory for immune cells. The cell state available in the European Genome-phenome Archive database (accession code transitions for monocytes/mo-Macs and T cells were estimated using the Monocle EGAD00001005054). Single-cell expression data can be explored online at http://ureca- 14 fi (version 2) algorithm . For monocytes/mo-Mac, we rst selected single cells in singlecell.kr. clusters defined as monocytes and monocyte-derived macrophages (mo-Macs) from the whole dataset. We then generated separate trajectories for lung tissues and lymph nodes. For CD8+ T cells, we took single cells expressing CD8A and CD8B Code availability (average of log-normalized expression > 0) within the CD8+ T cell clusters (T-C5, The codes generated during this study are available at Github repository (https://github. 6, 8, 11, and 15). Similar to the trajectory analysis for tumor cells, the gene-cell com/SGI-LungCancer/SingleCell).

NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications 13 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1

Received: 22 September 2019; Accepted: 17 April 2020; 32. Gibbings, S. L. et al. Three unique interstitial macrophages in the murine lung at steady state. Am. J. Respiratory Cell Mol. Biol. 57,66–76 (2017). 33. Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20, 163–172 (2019). 34. Misharin, A. V. et al. Monocyte-derived alveolar macrophages drive lung fibrosis and persist in the lung over the life span. J. Exp. Med. 214, 2387–2404 References (2017). 1. Riihimaki, M. et al. Metastatic sites and survival in lung cancer. Lung cancer 35. Bronte, V. et al. Recommendations for myeloid-derived suppressor cell 86,78–84 (2014). nomenclature and characterization standards. Nat. Commun. 7, 12150 (2016). 2. Cancer Genome Atlas Research N. Comprehensive molecular profiling of lung 36. Baitsch, D. et al. Apolipoprotein E induces antiinflammatory phenotype in adenocarcinoma. Nature 511, 543–550 (2014). macrophages. Arterioscler. Thromb. Vasc. Biol. 31, 1160–1168 (2011). 3. Larsen, J. E. & Minna, J. D. of lung cancer: clinical 37. Benoit, M. E., Clarke, E. V., Morgado, P., Fraser, D. A. & Tenner, A. J. implications. Clin. Chest Med. 32, 703–740 (2011). Complement protein C1q directs macrophage polarization and limits 4. Popper, H. H. Progression and metastasis of lung cancer. Cancer Metastasis inflammasome activity during the uptake of apoptotic cells. J. Immunol. 188, Rev. 35,75–91 (2016). 5682–5693 (2012). 5. Chae, Y. K. et al. Current landscape and future of dual anti-CTLA4 and PD-1/ 38. Arango Duque, G. & Descoteaux, A. Macrophage : involvement in PD- blockade immunotherapy in cancer; lessons learned from clinical trials immunity and infectious diseases. Front. Immunol. 5, 491 (2014). with and non-small cell lung cancer (NSCLC). J. Immunother. 39. Tang-Huau, T. L. et al. Human in vivo-generated monocyte-derived dendritic Cancer 6, 39 (2018). cells and macrophages cross-present antigens through a vacuolar pathway. 6. Binnewies, M. et al. Understanding the tumor immune microenvironment Nat. Commun. 9, 2570 (2018). (TIME) for effective therapy. Nat. Med. 24, 541–550 (2018). 40. Kaczmarek, M. et al. Evaluation of the phenotype pattern of macrophages 7. Azizi, E. et al. Single-cell map of diverse immune phenotypes in the breast isolated from malignant and non-malignant pleural effusions. Tumour Biol. tumor microenvironment. Cell 174, 1293–1308.e36 (2018). 32, 1123–1132 (2011). 8. Puram, S. V. et al. Single-cell transcriptomic analysis of primary and 41. Li, Q. & Barres, B. A. Microglia and macrophages in brain homeostasis and metastatic tumor ecosystems in head and neck. Cancer Cell 171, 1611–1624. disease. Nat. Rev. Immunol. 18, 225–242 (2018). e24 (2017). 42. Wang, N., Liang, H. & Zen, K. Molecular mechanisms that influence the 9. Lambrechts, D. et al. Phenotype molding of stromal cells in the lung tumor macrophage m1-m2 polarization balance. Front. Immunol. 5, 614 (2014). microenvironment. Nat. Med. 24, 1277–1289 (2018). 43. Boltjes, A. & van Wijk, F. Human dendritic cell functional specialization in 10. Guo, X. et al. Global characterization of T cells in non-small-cell lung cancer steady-state and inflammation. Front. Immunol. 5, 131 (2014). by single-cell sequencing. Nat. Med. 24, 978–985 (2018). 44. Villani A. C., et al. Single-cell RNA-seq reveals new types of human 11. Lavin, Y. et al. Innate immune landscape in early lung adenocarcinoma by blood dendritic cells, monocytes, and progenitors. Science 356, eaah4573 paired single-cell analyses. Cell 169, 750–765.e17 (2017). (2017). 12. Rowbotham, S. P. & Kim, C. F. Diverse cells at the origin of lung 45. Demoulin, S., Herfs, M., Delvenne, P. & Hubert, P. Tumor microenvironment adenocarcinoma. Proc. Natl Acad. Sci. USA 111, 4745–4746 (2014). converts plasmacytoid dendritic cells into immunosuppressive/tolerogenic cells: 13. Patel, A. P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in insight into the molecular mechanisms. J. Leukoc. Biol. 93, 343–352 (2013). primary glioblastoma. Science 344, 1396–1401 (2014). 46. van der Touw, W., Chen, H. M., Pan, P. Y. & Chen, S. H. LILRB receptor- 14. Qiu, X. et al. Reversed graph embedding resolves complex single-cell mediated regulation of myeloid cell maturation and function. Cancer trajectories. Nat. Methods 14, 979–982 (2017). Immunol. Immunother. 66, 1079–1087 (2017). 15. Cheung, W. K. & Nguyen, D. X. Lineage factors and differentiation states in 47. Jahrsdorfer, B. et al. Granzyme B produced by human plasmacytoid dendritic lung cancer progression. Oncogene 34, 5771–5780 (2015). cells suppresses T-cell expansion. Blood 115, 1156–1165 (2010). 16. Blanco, R. & Gerhardt, H. VEGF and Notch in tip and stalk cell selection. Cold 48. Perrot, I. et al. Dendritic cells infiltrating human non-small cell lung cancer Spring Harb. Perspect. Med. 3, a006569 (2013). are blocked at immature stage. J. Immunol. 178, 2763–2769 (2007). 17. Zhao, Q. et al. Single-cell transcriptome analyses reveal endothelial cell 49. Bruno, T. C. et al. Antigen-presenting intratumoral B cells affect CD4(+) TIL heterogeneity in tumors and changes following antiangiogenic treatment. phenotypes in non-small cell lung cancer patients. Cancer Immunol. Res. 5, Cancer Res. 78, 2370–2382 (2018). 898–907 (2017). 18. Hellstrom, M., Phng, L. K. & Gerhardt, H. VEGF and Notch signaling: the yin 50. Milpied, P. et al. Human germinal center transcriptional programs are de- and yang of angiogenic sprouting. Cell Adh. Migr. 1, 133–136 (2007). synchronized in B cell . Nat. Immunol. 19, 1013–1024 (2018). 19. Thomas, J. L. et al. Interactions between VEGFR and Notch signaling 51. Hagn, M. et al. Human B cells differentiate into granzyme B-secreting pathways in endothelial and neural cells. Cell Mol. Life Sci. 70, 1779–1792 cytotoxic B lymphocytes upon incomplete T-cell help. Immunol. Cell Biol. 90, (2013). 457–467 (2012). 20. Nowak-Sliwinska, P. et al. Oncofoetal isoform A marks the 52. Victora, G. D. et al. Identification of human germinal center light and dark tumour ; an underestimated pathway during tumour angiogenesis zone cells and their relationship to human B-cell . Blood 120, and angiostatic treatment. Br. J. Cancer 120, 218–228 (2019). 2240–2248 (2012). 21. Hall, R. D., Le, T. M., Haggstrom, D. E. & Gentzler, R. D. Angiogenesis 53. Hagn, M. & Jahrsdorfer, B. Why do human B cells secrete granzyme B? inhibition as a therapeutic strategy in non-small cell lung cancer (NSCLC). Insights into a novel B-cell differentiation pathway. Oncoimmunology 1, Transl. Lung Cancer Res. 4, 515–523 (2015). 1368–1375 (2012). 22. Kim, W. Y. & Lee, H. Y. Brain angiogenesis in developmental and pathological 54. Shibuya, M. Vascular endothelial (VEGF) and Its receptor processes: mechanism and therapeutic intervention in brain tumors. FEBS J. (VEGFR) signaling in angiogenesis: a crucial target for anti- and pro- 276, 4653–4664 (2009). angiogenic therapies. Genes Cancer 2, 1097–1105 (2011). 23. Klein, D. The tumor vascular endothelium as decision maker in cancer 55. Carmeliet, P. VEGF as a key mediator of angiogenesis in cancer. Oncology 69 therapy. Front. Oncol. 8, 367 (2018). (Suppl. 3), 4–10 (2005). 24. Xie, T. et al. Single-cell deconvolution of fibroblast heterogeneity in mouse 56. Shibuya, M. & Claesson-Welsh, L. Signal transduction by VEGF receptors in pulmonary fibrosis. Cell Rep. 22, 3625–3640 (2018). regulation of angiogenesis and lymphangiogenesis. Exp. Cell Res. 312, 549–560 25. Kumar, A. et al. Specification and diversification of pericytes and smooth (2006). muscle cells from mesenchymoangioblasts. Cell Rep. 19, 1902–1916 (2017). 57. Guo, C., Buranych, A., Sarkar, D., Fisher, P. B. & Wang, X. Y. The role of 26. Vanlandewijck, M. et al. A molecular atlas of cell types and zonation in the tumor-associated macrophages in tumor vascularization. Vasc. Cell 5,20 brain vasculature. Nature 554, 475–480 (2018). (2013). 27. Hinz, B. et al. Recent developments in myofibroblast biology: paradigms for 58. Chen, Z., Fillmore, C. M., Hammerman, P. S., Kim, C. F. & Wong, K. K. Non- connective tissue remodeling. Am. J. Pathol. 180, 1340–1355 (2012). small-cell lung cancers: a heterogeneous set of diseases. Nat. Rev. Cancer 14, 28. Vong, S. & Kalluri, R. The role of stromal myofibroblast and extracellular 535–546 (2014). matrix in tumor angiogenesis. Genes Cancer 2, 1139–1145 (2011). 59. Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. 29. Otranto, M. et al. The role of the myofibroblast in tumor stroma remodeling. Nature 551, 333–339 (2017). Cell Adh. Migr. 6, 203–219 (2012). 60. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single- 30. Fletcher, A. L., Acton, S. E. & Knoblich, K. Lymph node fibroblastic reticular cell transcriptomic data across different conditions, technologies, and species. cells in health and disease. Nat. Rev. Immunol. 15, 350–361 (2015). Nat. Biotechnol. 36, 411–420 (2018). 31. Kelly, K. K. et al. Col1a1+ perivascular cells in the brain are a source of 61. Lin, P., Troup, M. & Ho, J. W. CIDR: Ultrafast and accurate clustering retinoic acid following stroke. BMC Neurosci. 17, 49 (2016). through imputation for single-cell RNA-seq data. Genome Biol. 18, 59 (2017).

14 NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-16164-1 ARTICLE

62. Li, H. et al. Reference component analysis of single-cell transcriptomes the website. H.E. and S.C. performed flow cytometry analyses. Y.C. performed immu- elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, nohistochemical analyses. W.P., H.J., J.S., S.L., J.A., K.P., and J.J. contributed to critical 708–718 (2017). data interpretation. H.L. and N.K. wrote the manuscript, and all authors contributed to 63. Kiselev, V. Y. et al. SC3: consensus clustering of single-cell RNA-seq data. Nat. the writing and provided comments. Methods 14, 483–486 (2017). 64. Chung, W. et al. Single-cell RNA-seq enables comprehensive tumour and Competing interests immune cell profiling in primary breast cancer. Nat. Commun. 8, 15081 (2017). The authors declare no competing interests. 65. Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. Additional information 4,44–57 (2009). Supplementary information is available for this paper at https://doi.org/10.1038/s41467- 66. Huang da, W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment 020-16164-1. tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37,1–13 (2009). Correspondence and requests for materials should be addressed to M.-J.A. or H.-O.L. 67. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003). Peer review information Nature Communications thanks the anonymous reviewers for 68. Efremova, M., Vento-Tormo, M., Teichmann, S. A. & Vento-Tormo, R. their contribution to the peer review of this work. Peer reviewer reports are available. CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nat. Protoc. 15, 1484–1506 (2020). Reprints and permission information is available at http://www.nature.com/reprints 69. Vento-Tormo, R. et al. Single-cell reconstruction of the early maternal-fetal interface in humans. Nature 563, 347–353 (2018). Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Acknowledgements The authors thank research fellows at the Samsung Genome Institute for their valuable Open Access comments and the Korean Bioinformation Center (KOBIC) for technical support to This article is licensed under a Creative Commons build and manage the data visualization website (http://ureca-singlecell.kr). We thank the Attribution 4.0 International License, which permits use, sharing, Samsung Medical Center BioBank for providing the biospecimens that were used in the adaptation, distribution and reproduction in any medium or format, as long as you give immunohistochemistry. The results shown here are in part based upon data generated by appropriate credit to the original author(s) and the source, provide a link to the Creative the TCGA Research Network: https://www.cancer.gov/tcga. This research was supported Commons license, and indicate if changes were made. The images or other third party ’ by the Collaborative Genome Program for Fostering New Post-Genome Industry of the material in this article are included in the article s Creative Commons license, unless National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT) indicated otherwise in a credit line to the material. If material is not included in the ’ (NFR-2017M3C9A6044633 to M.A. and NRF-2017M3C9A6044636 to H.L.). article s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/ Author contributions licenses/by/4.0/. M.A. and H.L. designed and supervised the study. N.K. performed data analysis, with significant contributions from Y.H. H.K. and K.L. coordinated sample collection and clinical data interpretation with help from B.K., J.H.C., J.W.C., J.L., and Y.S. Y.H. built © The Author(s) 2020

NATURE COMMUNICATIONS | (2020) 11:2285 | https://doi.org/10.1038/s41467-020-16164-1 | www.nature.com/naturecommunications 15