Tissue-Specific Cell-Free DNA Degradation Quantifies Circulating
Total Page:16
File Type:pdf, Size:1020Kb
ARTICLE https://doi.org/10.1038/s41467-021-22463-y OPEN Tissue-specific cell-free DNA degradation quantifies circulating tumor DNA burden Guanhua Zhu1, Yu A. Guo 1, Danliang Ho1, Polly Poon1, Zhong Wee Poh1, Pui Mun Wong1, Anna Gan1, Mei Mei Chang1, Dimitrios Kleftogiannis1, Yi Ting Lau1, Brenda Tay2, Wan Jun Lim2, Clarinda Chua2, Tira J. Tan2, ✉ ✉ ✉ Si-Lin Koo2, Dawn Q. Chong2, Yoon Sim Yap 2, Iain Tan1,2,3 , Sarah Ng1 & Anders J. Skanderup 1,2 Profiling of circulating tumor DNA (ctDNA) may offer a non-invasive approach to monitor 1234567890():,; disease progression. Here, we develop a quantitative method, exploiting local tissue-specific cell-free DNA (cfDNA) degradation patterns, that accurately estimates ctDNA burden independent of genomic aberrations. Nucleosome-dependent cfDNA degradation at pro- moters and first exon-intron junctions is strongly associated with differential transcriptional activity in tumors and blood. A quantitative model, based on just 6 regulatory regions, could accurately predict ctDNA levels in colorectal cancer patients. Strikingly, a model restricted to blood-specific regulatory regions could predict ctDNA levels across both colorectal and breast cancer patients. Using compact targeted sequencing (<25 kb) of predictive regions, we demonstrate how the approach could enable quantitative low-cost tracking of ctDNA dynamics and disease progression. 1 Genome Institute of Singapore (GIS), A*STAR, Singapore, Singapore. 2 National Cancer Center Singapore, Singapore, Singapore. 3 Duke-NUS Medical School, ✉ National University of Singapore, Singapore, Singapore. email: [email protected]; [email protected]; [email protected] NATURE COMMUNICATIONS | (2021) 12:2229 | https://doi.org/10.1038/s41467-021-22463-y | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-22463-y ell-free DNA (cfDNA) is present in the blood circulation Data 1). In these high ctDNA burden WGS samples, we could Cof humans. In healthy individuals, the death of normal obtain ctDNA burden estimates using existing methods22–25 that cells of the hematopoietic lineage is the main contributor infer tumor purity using matched tumor and germline high-depth of plasma cfDNA1. In cancer patients, blood plasma can carry WGS data. To identify candidate NDR features for the quanti- circulating tumor DNA (ctDNA) fragments originating from tative model, we identified tumor and blood-specific genes with tumor cells, offering non-invasive access to somatic genetic differential NDR cfDNA degradation in their promoters and first alterations in tumors. The ctDNA profile of a cancer patient is exon–intron junctions in plasma samples from healthy indivi- clinically informative in at least two major ways. Firstly, the duals and cancer patients. Using a machine learning and in silico profile can provide information about specific actionable muta- cfDNA generation approach, we then trained and tested a sparse tions that can guide therapy2–5. Secondly, the profile can be used linear model to predict ctDNA burden from NDR cfDNA cov- to infer tumor growth dynamics by estimating the amount of erage. To further explore how the approach could be useful for ctDNA in the blood6,7. This latter information offers a promising cost-effective monitoring of ctDNA dynamics, we designed a non-invasive approach to track disease progression during clin- compact (<25 kb) capture-based sequencing assay targeting pre- ical trials or therapy, offering a real-time tool to adjust therapy8,9. dictive NDRs to explore the robustness of NDR-based targeted Existing next-generation sequencing-based approaches to approach using independent plasma samples (n = 53) from CRC estimate ctDNA levels in plasma samples are based on somatic patients, and applied it to estimate ctDNA levels in longitudinally single nucleotide variant allele frequencies (SNV VAFs), copy collected plasma samples from a cohort of five colorectal cancer number aberrations (CNAs), or DNA methylation patterns10,11. patients. cfDNA targeted sequencing typically only covers a few hundred selected cancer genes because of the need for ultra-deep Association of gene expression and cfDNA fragmentation sequencing (~10,000x). ctDNA burden estimation based on patterns. Analysis of cfDNA from the healthy individuals SNVs may therefore be challenging when no clonal mutations expectedly revealed nucleosome depletion and reduced cfDNA exist among the targeted genes. Alternatively, low-pass whole- protection flanked by a series of strongly positioned nucleosomes genome sequencing (lp-WGS) yields segmental/arm-level CNAs at gene promoter regions (Fig. 2a). Consistent with a previous that also allow for inference of ctDNA burden12. However, some study21, relative coverage at the promoter NDR was inversely cancers may not have sufficient levels of aneuploidy and chro- correlated with gene expression in whole blood cells. Studies of mosomal instability13,14 needed for robust estimation. Sequen- nucleosome positioning in cells have found that, apart from cing of DNA methylation patterns may provide a general promoters, exon–intron junctions are associated with NDRs26,27. approach to quantify the cellular origin of cfDNA15. However, We therefore systematically scanned these gene regions for asso- both DNA methylation and lp-WGS profiling require separate ciation between gene expression and cfDNA relative coverage assays in addition to standard targeted gene sequencing, high- (Fig. 2a). Strikingly, we found that the first exon–intron junction lighting the need for approaches that simultaneously allow for of transcripts showed a similar association between coverage and profiling of actionable cancer mutations and quantitative esti- expression, where relative cfDNA coverage at the NDR ranging mation of ctDNA burden. from −300 to −100 bp with respect to the end of the first exon Previous studies have shown that the size distribution of exhibited a strong inverse correlation with transcript expression in cfDNA fragments has a mode of ~166 bp, suggesting that whole blood cells. However, surprisingly, correlation between nucleosome-bound DNA fragments are preserved during cell expression and cfDNA coverage was not observed at other death and shed into the circulation16,17. Nucleosome depleted exon–intron and intron–exon junctions as well as at gene ends regions (NDRs) are therefore more frequently degraded, yielding (Fig. 2a and Supplementary Fig. 1). As expected, when comparing a nucleosome-dependent degradation footprint in cfDNA pro- – highly expressed (fpkm ≥ 30) and unexpressed gene groups, we files, which can be used to infer tissue of origin18 20. Moreover, observed a strong positive correlation (Pearson r = 0.81; Spear- plasma cfDNA degradation patterns in cancer patients have man correlation, ρ = 0.85) between the cfDNA relative coverage at been used to infer tumor gene expression18,21. Here, using these promoter and first exon–intron junction NDRs across genes concepts, we hypothesized that a limited set of tumor or blood- (Fig. 2b). While relative coverage at these NDRs correlated specific NDRs could be used to infer the ctDNA burden (fraction) strongly with gene expression level, relative coverage could not in the blood circulation of cancer patients. ctDNA burden refers perfectly separate unexpressed from expressed genes (Fig. 2b, c), to the relative amount of ctDNA out of all cfDNA molecules in a suggesting that additional factors beyond gene expression con- plasma sample. Using deep cfDNA WGS data from cancer tribute to NDR cfDNA degradation. To further explore the factors patients and healthy individuals, we trained and tested a quan- affecting cfDNA degradation at NDRs, we explored the associa- titative model that infers the ctDNA burden using cfDNA tion between NDR relative coverage and a range of epigenetic sequencing data from a limited set of NDRs. We show that this features (Supplementary Fig. 2). In addition to gene expression model is accurate for plasma samples from both colorectal cancer levels (linear regression, promoter r = −0.23, junction r = −0.22), (CRC) and breast cancer (BRCA) patients (mean absolute error relative coverage was negatively correlated with DNase hyper- ≤4.3%), and we explore how it can be deployed using a compact sensitivity (promoter r = −0.60, junction r = −0.55), H3K4me3 targeted sequencing assay for low-cost and quantitative tracking (promoter r = −0.59, junction r = −0.54), and H3K27ac (pro- of patient ctDNA dynamics. moter r = −0.45, junction r = −0.41), which are markers of open chromatin, active promoters, and active enhancers respectively28. = = Results In contrast, H3K36me3 (promoter r 0.49, junction r 0.46) and H3K9me3 (promoter r = 0.11, junction r = 0.10), markers of gene Overview of approach. We collected blood samples (n = 29) bodies and heterochromatin, were positively correlated with NDR from healthy individuals and extracted plasma cfDNA for paired- relative coverage. end WGS (merged ~150x coverage) (Fig. 1). We performed tar- geted sequencing (see Methods section) of plasma samples (CRC n = 65, BRCA n = 36) from cancer patients and selected samples cfDNA coverage patterns at NDRs in colorectal cancer (CRC n = 12, BRCA n = 10) with high SNV VAFs (indicating patients. To further explore the hypothesis that NDR cfDNA high ctDNA burden) for deep ~90x cfDNA WGS (Supplementary coverage in plasma samples from cancer patients is associated 2 NATURE COMMUNICATIONS | (2021) 12:2229 | https://doi.org/10.1038/s41467-021-22463-y | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-22463-y ARTICLE 1. Identify NDRs associated with 2. Train quantitative model 3. Perform NDR-targeted cfDNA sequencing Merged cfDNA differential tumor/blood expression of ctDNA burden to monitor ctDNA burden and dynamics WGS (~150x) Healthy individual and cfDNA coverage Gene expr. Genej expr. plasma samples (n=29) i 1 Blood Blood Tumor Tumor cfDNA targeted ~90x cfDNA Genei Genej 31% 62% Colorectal cancer sequencing High ctDNA burden WGS Infer NDR NDR 0.72 0.91 ... 1.41 0.62 ctDNA 0.5 plasma samples (n=65) samples (n=12) 1.12 1.01 ... 1.03 0.14 0.91 0.99 ... 1.12 0.27 fraction 0.92 0.98 ..