Research Article

Genomic Characterization of HIV-Associated Plasmablastic Lymphoma Identifies Pervasive Mutations in the JAK–STAT Pathway

Zhaoqi Liu1,2, Ioan Filip1,2, Karen Gomez1,2, Dewaldt Engelbrecht3, Shabnum Meer4, Pooja N. Lalloo3, Pareen Patel3, Yvonne Perner5, Junfei Zhao1,2, Jiguang Wang6, Laura Pasqualucci7–9, Raul Rabadan1,2, and Pascale Willem3

Downloaded from https://bloodcancerdiscov.aacrjournals.org by guest on September 24, 2021. Copyright 2020 by American Association for Cancer Research.

Abstract

Plasmablastic lymphoma (PBL) is an aggressive B-cell non-Hodgkin lymphoma associated with immunodeficiency in the context of human immunodeficiency virus (HIV) infection or iatrogenic immunosuppression. While a rare disease in general, the incidence is dramatically increased in regions of the world with high HIV prevalence. The molecular pathogenesis of this disease is poorly characterized. Here, we defined the genomic features of PBL in a cohort of 110 patients from South Africa (15 by whole-exome sequencing and 95 by deep targeted sequencing). We identified recurrent mutations in genes of the JAK–STAT signaling pathway, including STAT3 (42%), JAK1 (14%), and SOCS1 (10%), leading to its constitutive activation. Moreover, 24% of cases harbored gain-of-function mutations in RAS family members (NRAS and KRAS). Comparative analysis with other B-cell malignancies uncovered PBL-specific somatic mutations and transcriptional programs. We also found recurrent copy number gains encompassing the CD44 gene (37%), which encodes a cell surface receptor involved in lymphocyte activation and homing, and was found expressed at high levels in all tested cases, independent of genetic alterations. These findings have implications for the understanding of the pathogenesis of this disease and the development of personalized medicine approaches.

Significance: Plasmablastic lymphoma is a poorly studied and extremely aggressive tumor. Here we define the genomic landscape of this lymphoma in HIV-positive individuals from South Africa and identify pervasive mutations in JAK–STAT3 and RAS–MAPK signaling pathways. These data offer a genomic framework for the design of improved treatment strategies targeting these circuits.

1Program for Mathematical Genomics, Columbia University, New York, New York. 2Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, New York. 3Department of Haematology and Molecular Medicine, National Health Laboratory Service, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa. 4Department of Oral Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa. 5Department of Anatomical Pathology, National Health Laboratory Service, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa. 6Division of Life Science, Department of Chemical and Biological Engineering, Center for Systems Biology and Human Health and State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Hong Kong SAR, China. 7Institute for Cancer Genetics. 8Herbert Irving Comprehensive Cancer Center, Columbia University, New York, New York. 9Department of Pathology and Cell Biology, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, New York.

Note: Supplementary data for this article are available at Blood Cancer Discovery Online (http://bloodcancerdiscov.aacrjournals.org/).

L. Pasqualucci, R. Rabadan, and P. Willem contributed equally to this article.

Corresponding Authors: Laura Pasqualucci, Institute for Cancer Genetics, Columbia University, 1130 St Nicholas Ave, Room 507B, New York, NY 10032. Phone: 212-851-5248; Fax: 212-851-5256; E-mail: [email protected]; Raul Rabadan, Columbia University Irving Medical Center, 622 West 168th Street, PH18-200, New York, NY 10032. Phone: 212-305-3896; Fax: 212-851-5149; E-mail: [email protected]; and Pascale Willem, University of the Witwatersrand and the National Health Laboratory Service, Wits Medical School, 7 York Road Parktown, Johannesburg 2193, South Africa. Phone: 271-1489-8406; Fax: 271-1489-8480; E-mail: [email protected]

Blood Cancer Discov 2020;1–14
doi: 10.1158/2643-3230.BCD-20-0051
©2020 American Association for Cancer Research.

Introduction

Plasmablastic lymphoma (PBL) is a highly aggressive lymphoma of preterminally differentiated B cells, which predominantly occurs in patients with human immunodeficiency virus (HIV)-related or iatrogenic immunodeficiency (1). As an AIDS-defining illness with a dismal prognosis, PBL is a particularly compelling problem in the Sub-Saharan African region, which accounts for approximately 54% of the estimated 37.9 million people living with HIV.

Despite the national rollout offering combination antiretroviral therapy (cART) in South Africa since 2004, the prevalence of HIV-associated mature B-cell lymphomas increases yearly (2). The burden of HIV-associated lymphomas on the country's encumbered health system is further compounded by the younger presentation age of HIV-infected patients, the challenging classification of these lymphomas, and an exceptionally aggressive clinical disease course that, although curable in a significant fraction of patients (3), often results in fatalities (4).

Previously classified as diffuse large B-cell lymphoma (DLBCL), PBL was later recognized as a distinct entity (5) with well-defined histopathologic features including large plasmablastic or immunoblastic cell morphology, loss of B-cell lineage markers (CD20, PAX5), expression of plasmacytic differentiation markers (CD38, CD138, IRF4/MUM1, BLIMP1), a high proliferation index, and frequent infection by the Epstein-Barr virus (EBV; ref. 6). While EBV infection and activating MYC translocations have been reported as the major features of these tumors in a number of studies (7), the molecular pathology of PBL remains elusive in both HIV-positive and -negative individuals. Numerous case studies and some cohorts have been well described and reviewed (6–9).

Only two studies have partially explored genetic aberrations in this disease: a comparative genomic hybridization study showed a pattern of segmental gains in PBL that more closely resembled DLBCL than plasma cell myeloma (10). The second study compared the transcriptional

2020 BLOOD CANCER DISCOVERY | OF2

profiles of 15 PBL cases to DLBCL and extraosseous plasmacytoma (11), and showed that B-cell receptor signaling genes including CD79A/B, BLK, LYN, and SWAP70 among others, were significantly downregulated in PBL compared with DLBCL; in contrast, targets of MYB and the oncogene MYC, as well as genes reflecting the known plasmacytic immunophenotype of PBL, were overexpressed. More recently, MYC and PRDM1 were investigated for mutations, structural rearrangements, and copy number gains in 36 PBL cases (12). PRDM1 mutations were found in 8 of 16 cases, frequently in association with MYC overexpression, suggesting a coordinate role in the pathogenesis of the disease. While these studies have shed light on this rare disorder, a systematic characterization of protein-changing alterations in PBL has not been performed.

The elucidation of genes and pathways that drive the initiation and maintenance of PBL is essential to better understand the biology of this cancer and, critically, to implement improved biomarkers and more effective treatment options. Here, we combined whole-exome and transcriptome sequencing followed by targeted resequencing of 110 HIV-associated PBL cases to elucidate the mutational, transcriptional, and copy number landscape of this disease. We show that mutations in various components of the JAK–STAT and MAPK–ERK pathway pervade this lymphoma, revealing this signaling cascade as a central oncogenic driver of the disease and a candidate for targeted therapy.

Results

The Landscape of Somatic Mutations in PBL

To determine the mutational landscape of PBL, we performed whole-exome sequencing (WES) in a discovery panel of paired tumor and normal DNA collected from 15 HIV-positive patients, followed by deep targeted sequencing of the top 34 candidate genes in 95 additional cases (Supplementary Table S1 and S2; Methods). Somatic mutations were identified from WES data using SAVI2 (13), an empirical Bayesian method. Overall, 2,149 nonsynonymous somatic variants were found in the 15 discovery cases, with a median of 45 per case and a total of 1,528 affected genes, of which 1,461 were mutated in the tumor-dominant clone (>15% cancer-specific allele frequency; Supplementary Table S3). Among the 15 WES cases, one was hypermutated (case PJ030), showing 845 sequence variants that were confirmed by RNA-sequencing (RNA-seq; 92% with a read depth ≥ 10; Supplementary Fig. S1A–S1C). Candidate genes were then selected for an extension screen based on the following criteria: (i) mutated in at least 2 discovery cases, (ii) expressed in normal and/or malignant B cells, (iii) known as a cancer driver gene, and/or (iv) with an established role in B-cell differentiation (Supplementary Tables S4 and S5; Methods). The 34 selected genes were all annotated in the Catalogue of Somatic Mutations in Cancer (COSMIC) database. The mean depth of coverage for WES was 54.0x. The mean depth of coverage for the targeted DNA sequencing was 121.7x and, on average, 99.7% of the target sequences were covered by at least 50 reads.

We found that genes in the JAK–STAT, MAPK–ERK, and Notch signaling pathways were commonly mutated in our PBL cohort (Fig. 1A–E). The most frequent genetic lesions affected the JAK–STAT signaling pathway with, in total, 62% of cases (68/110) harboring a mutation in at least one of five genes (STAT3, JAK1, SOCS1, JAK2, and PIM1, a direct transcriptional target of STAT3/5 that functions as part of a negative feedback loop; Fig. 1A). Among these, STAT3 was the predominant target of mutations, with 46 of 110 cases (42%) harboring clonal events (Supplementary Fig. S2A and S2B). With three exceptions, the identified variants were heterozygous missense mutations clustering in exons 19 to 22, encoding part of the SH2 domain that is required for STAT3 molecular activation via receptor association and tyrosine phosphodimer formation (14). In particular, the majority of the mutations resulted in the amino acid changes Y640F (n = 11), D661V (n = 9), S614R (n = 5), and E616G/K (n = 4; Fig. 1B), which have been categorized as gain-of-function or likely gain-of-function mutations based on experimental validation (https://oncokb.org/gene/STAT3), and overlap with the STAT3 mutation pattern described in other aggressive B-cell lymphomas and T-cell and natural killer (NK) cell malignancies (15, 16). Three additional cases showed a >75% variant allele frequency in the absence of copy number changes, consistent with a copy neutral loss of heterozygosis. Sanger-based resequencing of the involved STAT3 exons, performed in 23 cases, validated all computationally identified mutations on these samples (Fig. 1F; Supplementary Table S6).

In addition to STAT3, 15/110 PBL cases harbored heterozygous missense mutations of JAK1, encoding a tyrosine kinase that phosphorylates STAT proteins. These events did not involve the canonical hotspot seen in many other cancers (loss-of-function K860Nfs), but occurred at a highly conserved amino acid position in the JH1 kinase domain, G1097D/V (Fig. 1C). Mutations at the JAK1 G1097 codon were previously reported in intestinal T-cell lymphomas (17) and anaplastic large cell lymphoma (15), and the G1097D substitution has been documented as a gain-of-function event that triggers aberrant phosphorylation of STAT3, leading to constitutive activation of the JAK–STAT signaling pathway (15). Of note, missense mutations in STAT3 and JAK1 were found to be mutually exclusive (P = 0.04, Fisher exact test), suggesting a convergent role in activating this cascade. In contrast to STAT3 and JAK1, mutations in SOCS1 (n = 11/110 cases) were more widely distributed, consistent with its previously established role as a target of AID-mediated aberrant somatic hypermutation (Fig. 1D). Although the consequences of most SOCS1 amino acid changes will require functional validation, the presence of a start-loss mutation in one case and a nonsense mutation in a second sample is expected to result in loss-of-function and inactivation of this negative JAK/STAT regulator, consistent with a tumor suppressor role.

The second most commonly mutated program was the MAPK–ERK signaling pathway, affected in 28% of cases by mutations in the RAS gene family members NRAS (14%) and KRAS (9%), as well as in BRAF (5.5%) and MAP2K1 (3%). Nearly all (92.5%) RAS mutations were found at known functional hotspots that have been shown to affect the intrinsic RAS GTPase activity, namely G12, G13, and Q61 (ref. 18; Fig. 1E). BRAF, an integral component of the MAPK


Figure 1 panels (graphic not reproduced):
A, Mutation heatmap of 110 cases, annotated by sequencing platform (WES or targeted DNA sequencing), sample site (oral or not oral), and MYC translocation status (positive or negative). Genes shown: STAT3, JAK1, SOCS1, PIM1, JAK2, NRAS, BRAF, KRAS, MAP2K1, NCOR2, SPEN, NOTCH1, TET2, KMT2A, EP300, TRRAP, HDAC6, MYC, TP53, PRDM1, FOXP1, KLF2, TNFRSF14, B2M, NPHP4, LTB, HSPA8, PTPRD, BLM, NFKBIE, UBR5, SART1. Mutation types: missense, stop lost, splice region, inframe indel, start lost, frameshift indel.
B, STAT3 (Chr17:42313324-42388568; 770 aa): Y640F (11), D661V/N (9), E616,G618 (7), S614R (5), N647I (5), C426R (2), N567K (2), K696I (1), R152W (1), P715L (1), S683N (1), A699V (1).
C, JAK1 (Chr1:64833229-64966504; 1154 aa): G1097 (13), X989 (1).
D, SOCS1 (Chr16:11254417-11256200; 211 aa): F58L (2), F79L (2), T100 (2), S125N (2), Y64* (1), M1T (1), G139V (1), R169G (1).
E, NRAS (Chr1:114704469-114716894; 189 aa): G13 (7), Q61 (7), G12D (2).
F, Sanger traces for PJ046 STAT3 D661V (Chr17:40474419) and PJ042 STAT3 Y640F (Chr17:40474482).

Figure 1. The landscape of putative driver gene mutations in PBL. A, Sample information, MYC translocation, and somatic mutation information are shown for 110 PBL cases. The heatmap represents individual mutations in each sample, color-coded by type of mutation. B–E, Individual gene mutation maps for frequently mutated genes, showing mutation subtype, position, and evidence of mutational hotspots, based on COSMIC database information. Y-axis counts at the bottom of the maps reflect the number of identified mutations in the COSMIC database. F, Sanger validation of single nucleotide variants (SNV) in STAT3 mutants.

signaling cascade, was found mutated in 6 cases, including 3 harboring mutations at or around the well-characterized V600 residue (namely: V600E, K601N, and T599TT) and 3 showing mutations at other common hotspots within the protein kinase domain (G464E, V471F, and M689V). The valine at position 600 normally stabilizes the interaction between the BRAF glycine-rich loop and the activation segments, and its glutamate substitution confers an over 500-fold increase in activity, leading to constitutive activation of the MEK/ERK signaling cascade in the absence of extracellular stimuli. In line with their predicted gain of function, the mutant alleles in both RAS family members and BRAF were observed in heterozygosis (Supplementary Tables S3 and S5) and were actively expressed (median


RPKM value of KRAS: 13.253, NRAS: 35.601, and BRAF: 9.906).

In addition, 24% of PBL cases carried mutations in genes implicated in the Notch signaling pathway, including those encoding NOTCH1, its negative regulator SPEN, and the Notch pathway corepressor NCOR2. In particular, one sample displayed a frameshift variant in the NOTCH1 gene that is predicted to generate a truncated protein lacking the C-terminal PEST domain, and thus endowed with increased protein half-life, while 6 additional cases showed amino acid changes within the EGF repeats, the juxtamembrane heterodimerization domain (N-terminal portion), and the C-terminal PEST domain. SPEN mutations include monoallelic missense substitutions that were distributed along the protein coding exons with no particular clustering, and their functional effect remains to be determined.

Missense mutations, frequently multiple within the same allele, were also found in the MYC gene (10/110, 9%), with 4 additional cases displaying silent mutations that possibly reflect the aberrant activity of the physiologic somatic hypermutation mechanism (19) as well as the recruitment of the AID mutator enzyme by the juxtaposed immunoglobulin enhancer in cases harboring chromosomal MYC translocations. MYC rearrangements with the immunoglobulin genes are the most common cytogenetic feature in PBL, present in about half of reported cases (7, 12, 20), and were identified in 36% (31/76) of our samples using FISH (Fig. 1A). The functional consequences of MYC mutations will have to be experimentally tested; however, transcriptomic analysis revealed significantly higher expression of the MYC RNA in cases positive for the MYC translocation, consistent with MYC oncogenic activation (Supplementary Fig. S3A and S3B). The high proportion of cases showing evidence of MYC dysregulation reinforces the notion that MYC is a critical contributor to this aggressive cancer.

Aside from the genes mentioned above, we observed recurrent mutations, including truncating loss-of-function events, in TET2 (10/110, 9%), TP53 (10/110, 9%), and NPHP4 (6/110, 5%). Finally, genes encoding epigenetic modifiers, transcription factors implicated in B-cell activation (FOXP1), positioning (KLF2), and terminal differentiation (PRDM1), and receptor molecules involved in tumor immune surveillance (e.g., B2M, TNFRSF14) were mutated to a lesser extent (range: 1%–5%; Fig. 1A).

Although the relatively small number of cases prevents robust statistical analyses, we did not find any significant difference in the rate of mutated genes between samples collected from the oral cavity and from other locations (Pearson correlation coefficient = 0.822, Supplementary Fig. S2B), suggesting a genetic homogeneity of the disease regardless of the tissue of origin.

Collectively, the data presented above uncover a pervasive role for mutations affecting the JAK–STAT and MAPK–ERK pathways in the genetic landscape of PBL, which may contribute to the pathogenesis of this lymphoma by enforcing constitutive signaling activation, in concert with dysregulated MYC activity.

Somatic Copy Number Changes

To identify recurrent copy number alterations (CNA) associated with PBL, we applied the SNP-FASST2 algorithm to WES data from the 15 discovery cases (Supplementary Fig. S4A), followed by validation in an independent cohort of 31 additional PBL samples that had been processed with Affymetrix OncoScan microarrays (ref. 21; Supplementary Table S7). Comparison of CNA calls obtained in parallel from the WES and OncoScan approach in a subset of 9 samples confirmed the data were highly consistent with each other (Supplementary Fig. S4B), supporting the robustness of the analysis.

Frequent copy number gains involved large chromosomal regions (>10 Mb) including chromosome 1q (20/46 cases, 43%) and the whole or most of chromosome 7 (13/46 cases, 28%). When applying the GISTIC 2.0 algorithm to the combined cohort of 46 cases, we uncovered regions of highly recurrent amplification including 6p22.2 (the most significant), 6p22.1 and 1q21.3, all of which encompass histone gene clusters (Fig. 2A). In particular, the chromosome 6p gain covered 36 genes that encode canonical histones and has been previously reported as a common alteration in a variety of cancers, where histone gains have been linked to genetic instability (22). The significant region on chromosome 1q21.3 also included the IL6R gene and the antiapoptotic MCL1 gene, which showed increased gene expression (Fig. 2C) analogous to what has been described in 26% of activated B-cell like (ABC) DLBCLs (23). Although further studies will be needed to determine the functional impact of these alterations in PBL, chromosome 1q gains have been associated with unfavorable prognosis in multiple myeloma, suggesting a role in the pathobiology of the disease.

The second most significant focal CNA was a chromosome 11p13 regional gain targeting genes CD44 and PDHX, present in 17 of 46 cases (37%; Fig. 2A–C). CD44 is a nonkinase transmembrane glycoprotein that is induced in B cells upon antigen-mediated activation and is critically involved in multiple lymphocyte functions, including migration, homing, and the transmission of signals that regulate apoptosis. This CD44 protein is also thought to increase cancer cells' adaptive plasticity in response to the microenvironment, thus giving them a survival and growth advantage (24). Analysis of 20 samples with available RNA-seq data revealed that CD44 was consistently and highly expressed in cases harboring copy number gains (n = 2), whereas PDHX levels were undetectable, indicating that CD44 is the specific target of the 11p13 amplicon. However, all samples displayed high CD44 mRNA expression, independent of the presence of genetic aberrations (Fig. 2C); consistently, IHC analysis with a specific CD44 antibody showed very strong membranous staining in all cases tested (Fig. 2D) and confirmed this finding in an independent panel of 38 cases. Of note, although CD44 expression can be detected in normal plasma cells at both RNA and protein levels, the signal was markedly lower than in plasmablastic lymphoma cells, suggesting that elevated expression of CD44 does not simply reflect the cellular ontogeny of these tumors, and that alternative regulatory mechanisms may lead to CD44 upregulation in cases lacking CNAs (Supplementary Fig. S5A–S5C; Supplementary Table S8).

Plasmablastic Lymphoma Displays a Distinct Genetic and Transcriptional Program

To explore the genomic features of PBL in relation to other lymphoid neoplasms arising from the mature B-cell


Figure 2 panels (graphic not reproduced):
A, GISTIC 2.0 plot across chromosomes 1–22. Labeled cytobands: 1p36.21, 1q21.3, 1q23.1, 1q31.3, 1q32.1, 1q43, 2p24.1, 2q13, 2q22.1, 2q32.3, 6p22.2, 6p22.1, 6q23.2, 7q11.21, 7q34, 10q26.3, 11p13, 12p11.23, 15q11.2, 16p13.3, 18q11.1, 22q13.32. Labeled genes include members of the PRAMEF, HIST2H, and HIST1H families, HNRNPCL1, TCERG1L, MIR378C, PDHX, and CD44.
B, Chr 11 (chr11:32,620,498-44,306,998; 33 Mb to 43 Mb), 37% gain. Samples: PJ021, PJ026, PJ030, A05, A08, D03, A10, B03, B05, B07, B08, B09, B11, C03, C05, C06, C11. CD44 (chr11:35,160,417-35,253,949).
C, Scatter plot for regions 6p22.1, 1q21.3, 11p13, and 6p22.2. X-axis: −log10(q-value) from GISTIC 2.0. Y-axis: gene expression level in PBL. Labeled genes include MCL1, CD44, ADAR, BUB1, PDHX, and histone genes.
D, IHC image, sample A10.

Figure 2. Recurrent copy number changes in PBL. A, GISTIC 2.0 results showing recurrent copy number changes in PBL samples. The green line indicates q-value = 1.0 × 10−6. B, A zoomed-in view of 11p13 on 17 cases of PBL, which shows consistent focal copy number gains of CD44. The figure was generated by the IGV browser using CNV segment files from the SNP-FASST2 algorithm. C, Scatter plot representations of genes located in regions with recurrent copy number gains in PBL (q-value < 1.0 × 10−6, GISTIC 2.0). The horizontal axis indicates the −log10(q-value) from the GISTIC report, and the vertical axis is the median gene expression level (normalized RPKM value) from PBL RNA-seq data (n = 20). D, IHC showing strong CD44 protein expression on the tumor cell membrane in a representative sample (A10), which has a focal CD44 copy number gain.
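The coordinates plotted in panel C can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the input layout, and the toy numbers are invented for the example. Only the q-value cutoff (1.0 × 10−6), the −log10 transform, and the use of the median RPKM come from the caption.

```python
import math
from statistics import median

Q_CUTOFF = 1.0e-6  # significance threshold quoted in the Figure 2 caption


def scatter_points(q_values, rpkm_by_gene, q_cutoff=Q_CUTOFF):
    """Return {gene: (-log10(q), median RPKM)} for genes passing the cutoff.

    q_values:     {gene: q-value of the recurrent gain containing the gene}
    rpkm_by_gene: {gene: list of per-sample normalized RPKM values}
    Both input layouts are hypothetical choices made for this sketch.
    """
    points = {}
    for gene, q in q_values.items():
        if q < q_cutoff and gene in rpkm_by_gene:
            x = -math.log10(q)              # horizontal axis of panel C
            y = median(rpkm_by_gene[gene])  # vertical axis of panel C
            points[gene] = (x, y)
    return points


# Toy inputs, invented for illustration; these are not values from the study.
toy_q = {"GENE_A": 1e-12, "GENE_B": 1e-8, "GENE_C": 1e-3}
toy_rpkm = {
    "GENE_A": [90.0, 110.0, 100.0],
    "GENE_B": [0.0, 0.5, 0.2],
    "GENE_C": [5.0, 7.0],
}
points = scatter_points(toy_q, toy_rpkm)
```

In this toy input, a gene such as GENE_B lies in a significant region but has a near-zero median expression, which is the kind of pattern the text uses to argue that CD44, rather than PDHX, is the target of the 11p13 gain.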


lineage, we performed unsupervised clustering analysis based on mutation frequency of the top mutated genes from three cancer types obtained from public repositories (Fig. 3A). The analysis included chronic lymphocytic leukemia (CLL; ref. 25), diffuse large B-cell lymphoma (DLBCL) transcriptionally defined as ABC and germinal center B-cell-like (GCB) subtypes (26), and multiple myeloma (27). As expected on the basis of their presumed derivation from B cells committed to plasma cell differentiation, the mutational landscape of PBL was overall closer to multiple myeloma than to other mature B-cell malignancies, with mutations in RAS family members being detected in as many as 20% of cases in both diseases, while rare in DLBCL (Fig. 3A). Conversely, the mutational landscape of PBL was highly distinct from that of both GCB- and ABC-DLBCL (Fig. 3A). In particular, mutations affecting the methyltransferase KMT2D and acetyltransferase CREBBP, two among the most commonly mutated genes in DLBCL (28), were absent in plasmablastic lymphoma. ABC-DLBCL–specific mutations such as CD79A/B and MYD88 were also lacking in PBL. Of note, STAT3 was the top mutated gene in our cohort at significantly different frequencies compared with other B-cell malignancies, making it a hallmark of PBL (Fig. 3A). The differential genetic landscape of PBL is consistent with its status as a distinct entity among mature B-cell neoplasms.

To define the transcriptional profile of HIV-positive PBL and to identify unique signatures that may distinguish it from other lymphoma types, we performed RNA-seq analysis of 20 PBL samples (including 12 of the 15 discovery cases), and compared their transcriptome to that of normal B-cell subsets and other B-cell lymphoma types previously characterized in our laboratories and/or obtained from public repositories, including germinal center centroblasts (CB), naïve B cells (NB), and memory B cells (MB; ref. 29), as well as CLL (30), DLBCL (26), and multiple myeloma cell lines (31). As expected, hierarchical clustering of the top 1,000 most aberrantly expressed genes revealed that PBL and multiple myeloma were closer to each other compared with other B-cell malignancies and to normal B cells, reflecting their presumed cell of origin (Fig. 3B). Consistently, plasmablastic lymphoma and multiple myeloma lacked expression of common B-cell markers (CD19, CD20, CD40, and PAX5) and transcription factors involved in the germinal center reaction (BCL6, BCL7A, BCL11A, and SPIB), whereas expression of the master regulator of plasma cell differentiation PRDM1 and other plasma cell markers (CD138, XBP1, and IRF4) was increased (Supplementary Fig. S6). MYC expression was also higher in PBL and multiple myeloma. Other notable differences included the upregulation of IL6R, a known STAT3 target, and the downregulation of SWAP70, previously suggested as a potential biomarker of PBL (Supplementary Fig. S6; ref. 11). Finally, recent studies have suggested that EBV-positive PBLs evade immune recognition by expressing the programmed cell death protein 1 (PD-1) and its PD-L1 ligand (32). We confirmed high expression of PD-L1 in our PBL cohort, although PD-1 did not follow this pattern (Supplementary Fig. S6).

We then performed functional enrichment analysis (g:profiler) to identify biological pathways that are preferentially enriched in PBL as compared with other B-cell malignancies. Whereas DLBCL and CLL were enriched for B-cell related biological programs (e.g., B-cell activation and proliferation, lymphocyte activation and differentiation, and adaptive immune response), multiple myeloma was enriched for mitotic cell-cycle processes, with highly expressed genes including MCM10, BIRC5, CENPE, BUB1, and AURKA (Fig. 3B). The most significantly enriched pathways in PBL were related to chromatin/nucleosome assembly (Fig. 3B); particularly, histone genes encoding basic nuclear proteins were highly expressed in PBL, possibly related to the recurrent copy number gains/amplification encompassing this gene cluster on chromosomes 6p22.2 and 1q21.3 (Supplementary Fig. S7A–S7D). However, the biological significance of this observation in relation to tumorigenesis remains unknown.

Activation of the JAK–STAT Pathway in Plasmablastic Lymphoma

Given the elevated frequency of mutations targeting the JAK–STAT pathway, we used computational and IHC approaches to assess the extent of activation of this signaling cascade in PBL (Methods). To this end, we first interrogated the genome-wide transcriptional profile of PBL for enrichment in the JAK–STAT signaling pathway using previously identified signatures available in the MSigDB database. This analysis revealed a significant positive enrichment for genes implicated in this pathway, consistent with constitutive activation (Fig. 4A). This signature was expressed at higher levels in PBL when compared with multiple myeloma and CLL (Fig. 4B), with the most significant difference being observed between PBL and multiple myeloma (t test, P = 4.7 × 10−11), which is congruent with the fact that the latter had few mutations in this pathway (Fig. 3A). We then performed IHC staining with anti-STAT3 and anti-phospho-STAT3 antibodies to assess the STAT3 subcellular localization and phosphorylation, used as a read-out for signaling activation. We observed strong positive nuclear signal in 61% of tested cases (19/31), which also stained positive for the phospho-STAT3 antibody. Of note, evidence of constitutively active STAT3 was detected even in the absence of genetic alterations affecting this pathway, suggesting alternative (epigenetic) mechanisms of activation and indicating a prominent role for this cascade in the pathogenesis of the disease (Fig. 4C and D; Supplementary Fig. S8; Supplementary Table S9).

Figure 3. Comparative analysis of PBL and other B-cell malignancies. A, Unsupervised clustering based on frequencies of the most recurrently mutated genes from PBL, multiple myeloma, CLL, and two main subtypes of DLBCL (Methods). B, Hierarchical clustering of mRNA expression profiles across plasmablastic lymphoma, multiple myeloma, CLL, DLBCL and normal B cells, including centroblast (CB), naïve B (NB), and memory B cells (MB). Note that the JAK–STAT signaling pathway does not appear in this figure because only the top 1,000 most aberrantly expressed genes were selected for this analysis. Functional enrichment analysis was performed using g:profiler.
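Clustering tumor types by their mutation-frequency profiles, as in panel A, can be sketched with a small agglomerative procedure. The caption does not state the distance metric or linkage rule, so Euclidean distance and average linkage are assumptions here, and the tumor labels and frequencies are toy placeholders rather than data from any cohort.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(profiles):
    """Average-linkage agglomerative clustering (assumed settings).

    profiles: {label: [mutation frequency per gene]}, same gene order for all.
    Returns the merges in order as (cluster_a, cluster_b, distance), where
    each cluster is a tuple of labels.
    """
    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(euclidean(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(label,) for label in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy profiles over three hypothetical genes; values are invented.
toy_profiles = {
    "tumor_A": [0.40, 0.15, 0.00],
    "tumor_B": [0.30, 0.20, 0.00],
    "tumor_C": [0.00, 0.02, 0.30],
    "tumor_D": [0.02, 0.00, 0.35],
}
merges = agglomerate(toy_profiles)
```

With these toy values the two pairs with similar profiles merge first, and the pairs are joined only at the final step, mirroring how closely related entities group together in a dendrogram.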


[Figure 3 image. Panel A: per-gene mutation frequencies (axis ticks at 0.2 and 0.4) in PBL, MM, CLL, ABC DLBCL, and GCB DLBCL for STAT3, NCOR2, TET2, JAK1, NPHP4, MYC, LTB, KRAS, NRAS, DIS3, TENT5C, BRAF, FAT3, SF3B1, ATM, TP53, NOTCH1, POT1, XPO1, BIRC3, PIM1, MYD88, CD79B, PRDM1, DTX1, IRF2BP2, TNFRSF14, CREBBP, SOCS1, EZH2, BCL2, SGK1, GNA13, B2M, BTG2, BTG1, TNFAIP3, CARD11, HIST1H1E, MEF2B, KLHL6, CD58, and SPEN. Panel B: expression heat map (Z-score scale, −4 to 4) for PBL, MM, ABC, GCB, CB, NB, MB, and CLL samples, with four gene clusters annotated by enriched terms: (1) nucleosome assembly, chromatin silencing, chromatin assembly, nucleosome positioning, DNA packaging, hematopoietic development, chromatin organization, sensory perception of taste; (2) mitotic cell cycle, chromosome segregation, nuclear division, cell-cycle checkpoint, regulation of cell cycle, DNA metabolic process, organelle fission, spindle checkpoint; (3) immune system process, cell adhesion, cell activation, inflammatory response, defense response, cell migration, secretion by cell, neutrophil degranulation; (4) leukocyte activation, B-cell activation, immune response, B-cell proliferation, lymphocyte activation, adaptive immune response, lymphocyte differentiation, hemopoiesis.]


[Figure 4 image. Panel A: enrichment plot for the KEGG JAK–STAT signaling pathway (NES = 2.088, P reported as 0.000); axes: enrichment score (ES) and ranked list metric versus rank in ordered dataset. Panel B: ssGSEA enrichment scores for the KEGG JAK–STAT pathway in PBL, MM, CLL, ABC, and GCB samples; PBL vs. MM, P = 4.7e-11; PBL vs. CLL, P = 2.3e-07; PBL vs. DLBCL, P = 0.94. Panels C and D: IHC images at ×100 and ×400 for cases PJ122 (N567K), PJ159 (D661V), PJ154 (N567K), and PJ225 (Y640F).]

Figure 4. Activation of JAK–STAT pathway in PBL. A, Preranked gene set enrichment analysis indicating the significant positive enrichment of the KEGG JAK–STAT signaling pathway in PBL. The preranked gene list was generated on the basis of the median expression level in PBL samples. B, Enrichment comparisons of the JAK–STAT pathway between PBL and other B-cell malignancies by single-sample GSEA (ssGSEA). Pairwise P values were derived from t tests. C and D, IHC images showing pSTAT3 protein expression in >75% of tumor cells, confirming STAT3 activation in STAT3-mutated cases. Sections stained with anti-pSTAT3 antibody were photographed at ×100 and ×400 magnification.
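
As an illustration of the preranked enrichment statistic behind panel A, the following Python sketch computes a GSEA-style running-sum enrichment score, in which set members add a step proportional to their absolute ranking metric and non-members subtract a uniform penalty. It is not the authors' code: the gene names and metric values are toy placeholders, the weight exponent of 1 is an assumed setting, and the normalization to NES and the permutation-based P value are omitted.

```python
from typing import Dict, Set


def enrichment_score(ranked: Dict[str, float], gene_set: Set[str], weight: float = 1.0) -> float:
    """Running-sum enrichment score over a preranked gene list.

    `ranked` maps gene -> ranking metric (hypothetical values here);
    genes are walked from the highest to the lowest metric.
    """
    genes = sorted(ranked, key=ranked.get, reverse=True)
    hits = [g for g in genes if g in gene_set]
    if not hits or len(hits) == len(genes):
        raise ValueError("gene set must overlap the ranked list without covering it")

    hit_norm = sum(abs(ranked[g]) ** weight for g in hits)  # total weight of set members
    if hit_norm == 0:
        raise ValueError("set members must have a nonzero ranking metric")
    miss_step = 1.0 / (len(genes) - len(hits))              # uniform penalty per non-member

    running, best = 0.0, 0.0
    for g in genes:
        if g in gene_set:
            running += abs(ranked[g]) ** weight / hit_norm
        else:
            running -= miss_step
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


# Toy ranking: metric values are illustrative only.
toy_ranking = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 2.0,
               "g5": 1.0, "g6": -1.0, "g7": -2.0, "g8": -3.0}
top_set_score = enrichment_score(toy_ranking, {"g1", "g2"})
bottom_set_score = enrichment_score(toy_ranking, {"g7", "g8"})
```

A set concentrated at the top of the ranking yields a positive score and a set concentrated at the bottom yields a negative one, which is the sense in which panel A reports positive enrichment.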

Virus Detection in Plasmablastic Lymphoma

To analyze the viral and bacterial make-up of PBL tumors, we used the Pandora pipeline, which extracts and aligns nonhost genetic material from tumor RNA-seq data (Methods). Potentially pathogenic species were then identified by applying the BLAST algorithm against the NCBI database of viral and bacterial reference genomes. In our PBL cohort, 12 of 20 samples contained HIV-1 transcripts, with only three samples exceeding 100 mapped reads (Supplementary Fig. S9). In addition, EBV transcripts were detected in 18 of 20 samples, HCMV (human cytomegalovirus, or human betaherpesvirus 5) in 3 of 20 samples, and KSHV (Kaposi sarcoma-associated herpes virus, human herpes virus 8) in one sample (Supplementary Fig. S9). These three herpes viruses have a broad tropism, naturally infect B cells, and are known to be associated with many tumor types.

EBV reactivation is considered a major driver of PBL, predominantly with a latency I infection program, although low levels of LMP1 gene expression can be detected (33).


[Figure 5 image. Heat map of normalized virus read counts (color scale 0 to 120) for EBV genes and genomic elements across 20 PBL samples (PJ008, PJ117, PJ024, PJ046, PJ042, PJ123, PJ049, PJ121, PJ030, PJ021, PJ020, PJ007, PJ039, PJ120, PJ119, PJ026, PJ129, PJ122, PJ028, PJ125). Genes are annotated by expression program (latency I, II, III; latent; early lytic; late lytic) and include Cp, TR, oriP, oriLyt, LF1, LF2, LF3, LMP-1, LMP-2A, LMP-2B, EBNA-1, EBNA-2, EBNA-3A, EBNA-3B, EBNA-3C, RPMS1, and the BamHI-fragment open reading frames (for example BALF2, BALF4, BALF5, BILF1, BZLF2, BRLF1, and BLLF1).]

Figure 5. EBV transcription programs in PBL. Heatmap illustrating the full genome expression of EBV in the RNA-seq data of PBL samples (virus gene counts are normalized per million host reads). 16/17 EBV-positive samples show significantly higher expression of lytic genes such as BALF4 (encoding the envelope glycoprotein B) and BALF5 (encoding the DNA polymerase catalytic subunit) compared with the expression of canonical latency programs.
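
The normalization named in the caption (virus gene counts per million host reads) can be sketched in a few lines of Python. The function names, gene counts, and host read total below are hypothetical. The way the BLLF1 and EBNA2 levels are turned into a background cutoff (taking the larger of the two normalized values) is an assumption made for illustration, not the authors' stated procedure.

```python
from typing import Dict, List


def per_million_host_reads(gene_counts: Dict[str, int], host_reads: int) -> Dict[str, float]:
    """Scale raw viral gene counts to counts per million host-mapped reads."""
    if host_reads <= 0:
        raise ValueError("host_reads must be positive")
    return {gene: count * 1_000_000 / host_reads for gene, count in gene_counts.items()}


def above_background(normalized: Dict[str, float], background_genes: List[str]) -> List[str]:
    """Return genes whose normalized level exceeds every background gene (assumed rule)."""
    cutoff = max(normalized.get(g, 0.0) for g in background_genes)
    return sorted(g for g, v in normalized.items()
                  if g not in background_genes and v > cutoff)


# Hypothetical sample: raw counts and host read total are made up for illustration.
raw = {"BALF4": 900, "BALF5": 600, "LMP-1": 20, "BLLF1": 30, "EBNA-2": 10}
norm = per_million_host_reads(raw, host_reads=50_000_000)
called = above_background(norm, ["BLLF1", "EBNA-2"])
```

Dividing by the host read total makes samples with different sequencing depths comparable, so a gene is only called as expressed when it rises above the level of genes expected to be silent in the same sample.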

We thus performed a detailed RNA profiling of the EBV genome. Recapitulating previous reports, all PBL cases tested by in situ hybridization were positive for EBV-encoded RNA (EBER), while only 4 of 13 samples showed expression of the LMP1 protein on IHC (Supplementary Table S1). When using the levels of BLLF1 and EBNA2, two genes that are invariably not expressed in PBL, as the threshold for positive calls, we noted that nearly the entire viral genome was transcribed at background levels in the majority of samples (Fig. 5). This is analogous to what has been observed in nonreplicating infected B cells, where a background of transcripts can be detected in the absence of protein expression, due to regulation at the ribosomal level (34). However, several genes in the BamHI-A region of the virus were abundantly transcribed in at least 50% of cases across the cohort, including those encoding components of the viral replication machinery (namely, the DNA polymerase catalytic subunit BALF5, the single-stranded DNA-binding protein BALF2, and the lytic origin of DNA replication oriLyt), the BART-encoded protein RPMS1, the viral envelope glycoprotein BALF4, and the G-protein–coupled receptor BILF1, which is expressed predominantly during the immediate early phases of infection in vitro (35, 36). LF1, LF2, and LF3 were also highly expressed in PBL, but their functional roles are less understood. Expression of the LMP-1 mRNA was detected in 17 of 20 samples, but at much lower levels (Fig. 5), consistent with the known EBV latency programs of PBL (33).

Discussion

Our study provides a comprehensive snapshot of the genetic landscape of HIV-associated PBL and reveals highly recurrent somatic mutations affecting the JAK–STAT3 and RAS–MAPK signaling pathways as a genetic hallmark of this disease, with STAT3 representing the most prominent target. These findings underscore a central role for this transcription factor in the pathogenesis of PBL and have implications for the diagnosis and treatment of these diseases.

The STAT3 protein is an important player in multiple immune cells where it modulates a variety of physiological processes. Within the B-cell lineage, a selective role has been recognized for this factor in the differentiation of B cells into plasma cells upon antigen stimulation, as documented by both in vitro and in vivo studies (14). Briefly, in response to CD40L- and IL21-mediated signaling by T follicular helper cells, STAT3 is phosphorylated by activated JAK kinases, translocates into the nucleus as homo- or hetero-dimers, and activates the transcription of multiple targets, including the plasma cell master regulator BLIMP1. In turn, BLIMP1 downregulates BCL6 expression, an absolute prerequisite to GC exit and plasma cell differentiation (14). The STAT3 amino acid changes identified in our study include experimentally documented gain-of-function events that are predicted to have oncogenic effects by enhancing its phosphorylation and transactivation potential (37). In addition, other well-documented genetic mechanisms were found that can activate the JAK–STAT signaling pathway, including mutually exclusive upstream mutations of JAK1 or JAK2 (18/110 cases) and the loss of the STAT3 negative regulator SOCS1 (mutated in 11/110 cases). Notably, a targeted sequencing study published while this manuscript was under revision identified recurrent STAT3 mutations in 5 of 42 cases, all of which were EBV positive (38). Thus, STAT3 dysregulation may contribute to the pathogenesis of PBL by rendering these cells signaling-independent while providing proliferation and survival signals.

Constitutive activation of the JAK–STAT signaling pathway has been reported in a number of solid and hematologic malignancies and plays a central role in two lymphomas that are immunophenotypically closely related to plasmablastic lymphoma: primary effusion lymphoma (PEL) and


ALK-positive large B-cell lymphoma (LBCL). PEL is a rare and aggressive AIDS-defining disease, which is associated with infection by HHV8 and is clinically distinguishable from PBL by the presence of lymphomatous effusion in body cavities. In these cells, constitutive STAT3 activity is achieved via expression of the HHV8 viral protein IL6, which contributes to the disease in an autocrine fashion by promoting proliferation and survival (39). In ALK-positive LBCL, STAT3 activation is sustained by the ALK kinase mediated by chromosomal translocations with the CLTCL gene or the NPM1 gene (40, 41). However, direct genetic alterations of STAT3 are rare in mature B-cell lymphomas. In particular, while multiple genomic hits leading to potentiation of the JAK–STAT oncogenic pathway have been detected in 87% of Hodgkin lymphomas as well as in primary mediastinal B-cell lymphomas, the most commonly affected STAT member in these tumors is STAT6 (42–45).

The high incidence of STAT3 mutational activation in HIV-associated PBL points to STAT or JAK inhibitors as promising treatment options in this lymphoma type. While anti-STAT3 therapeutic attempts are still in development, JAK inhibitor therapy, currently used in the clinical setting, was shown to be an effective antagonist to STAT3 activation, inducing apoptosis in both anaplastic large T-cell lymphomas and ovarian cancer (46).

The second important finding of this study is the identification of frequent hotspot mutations in RAS–MAPK family members. Functional RAS activation is a common molecular feature of multiple myeloma, particularly in the relapsed/refractory setting, while it is rarely observed in de novo DLBCL, suggesting a specific role in the pathogenesis of plasma cell dyscrasias. Interestingly, and unlike in multiple myeloma, NRAS and KRAS mutations in PBL were never concurrently observed in the same case and were often well represented in the dominant tumor clone, consistent with early events. These data have direct implications for the clinical exploration of treatments inhibiting this pathway.

Our study also showed overexpression of the transmembrane glycoprotein CD44, frequently associated with copy number gains/amplifications at this locus. CD44 is an adhesion molecule that mediates cellular interaction with the microenvironment and participates in the trafficking of neoplastic cells in multiple myeloma, CLL, and ALL (47); moreover, CD44 was shown to increase cell resistance to apoptosis and to enhance cancer cell invasiveness. Whereas the role of CD44 in PBL requires functional dissection, its high expression is likely to compound the aggressiveness of the disease, as previously described in DLBCL of the ABC subtype (48). In light of this, successful anti-CD44–targeted therapy in a mouse xenograft model of human multiple myeloma may in the future represent an attractive therapeutic option for PBL (49).

The finding of increased histone mRNA abundance in PBL, together with recurrent copy number gains encompassing this gene cluster on [...], is of interest because it emerged as a distinctive feature of this disease compared with other lymphoid malignancies. Histones represent a basic component of the chromatin structure and their involvement may suggest a selective role for nucleosomal plasticity in the pathogenesis of PBL, which warrants further investigations.

Due to the small number of EBV-negative cases in our cohort (n = 2) and the overall rarity of this subset, we could not assess whether EBV status is associated with differing genetic features, as recently reported for Burkitt lymphoma (50) and suggested by the evidence of distinct transcriptomic profiles between EBV-positive and EBV-negative PBL (32). Moreover, it remains to be determined whether regional differences exist with PBL occurring outside the context of HIV immunodeficiency, or within HIV-associated PBL, given the high homogeneity of our cohort from the Gauteng region in South Africa. Thus, additional work will be required to specifically address these questions.

We found that most of the EBV genome was transcribed at very low levels in the majority of PBL cases, while prominent expression was detected for several genes in the BamHI-A region of the virus, including some early lytic genes. However, transcription of the key early lytic gene BZLF1 was noticeably absent, suggesting an incomplete lytic program. Supporting this notion, a recent study showed that increased STAT3 expression, as observed in the majority of PBL cases, decreases the susceptibility of latently infected cells to EBV lytic activation signals via an RNA-binding protein, PCBP2 (51). Thus, further protein expression studies, in particular for BRLF1 and for the BLLF1-encoded viral envelope glycoprotein gp350, are required to assess whether the observed transcriptome program translates into lytic infection. Indeed, recent ribosome profiling studies have clearly demonstrated a marked heterogeneity of lytic gene translation and complex levels of intracellular translation repression mechanisms at work in infected B cells (34).

In conclusion, the results of our study characterize HIV-associated PBL as a distinct subset of aggressive B-cell lymphoma and significantly contribute to our knowledge about the molecular pathogenesis of this disease through the identification of recurrently mutated genes, uncovering a major role for the dysregulation of JAK–STAT3 and RAS–MAPK signaling pathways. These results reveal new points of potential therapeutic intervention in these patients.

Methods

For complete experimental details and computational analyses, see also Supplementary Methods.

Patient Cohorts

Prospective samples (n = 15 with paired tumor-normal tissue; discovery cohort) were collected from patients with suspected PBL, upon informed written consent in line with the Declaration of Helsinki, and diagnosis was further confirmed by two independent pathologists. Matched normal DNA for these 15 patients was extracted from peripheral blood samples that were documented to be tumor-free. For targeted sequencing (n = 95 samples; extension cohort), formalin-fixed paraffin-embedded (FFPE) material was retrieved from the archives of the Department of Oral Pathology and Anatomical Pathology, National Health Laboratory Service and the University of the Witwatersrand. This panel included 36 nonoral PBL samples that were part of a published cohort (52). The study protocol was approved by the local Human Research Ethics Committee (IRB Reference M150390). A


summary of demographics and phenotypic markers of the discovery cohort is displayed in Supplementary Table S1, whereas demographic information for the extension cohort is included in Supplementary Table S3. For downstream nucleic acid extraction, the tumor area (>70% tumor cells) was ringed for microdissection in all samples. A third cohort of 31 well-characterized PBL samples obtained as archived FFPE material (local Human Research Ethics Committee IRB reference M101171 and 96/2011; Supplementary Table S7) was used to validate and refine copy number aberration results observed in the WES data by using Microarray OncoScan (21).

WES and RNA-Seq

Both exome and RNA-seq library preparations and sequencing were outsourced to Centrillion Genomics Services and BGI (Americas Corporation). Whole-exome libraries were prepared using the Agilent SureSelect Human All Exon V6 Kit (Agilent Technologies) and sequenced on HiSeq 2500 using TruSeq SBS v2 Reagent Kit (Illumina) at 2 × 100 bp paired-end reads with on-target coverage of 100X per sample. RNA-seq libraries were prepared using the Illumina TruSeq Stranded Total RNA Sample Prep Kit (Illumina). Flow cells with multiplexed samples were run on the HiSeq 2500 using an Illumina TruSeq SBS v2 Reagent Kit at 2 × 100 bp paired-end reads and a coverage of 50M reads per sample.

Mutation Calling

Fastq files were aligned to the human genome assembly (hg19) using the Burrows–Wheeler Aligner (version 0.6.2). Before further analysis, the initially aligned BAM files were subjected to preprocessing that sorted, indexed, and marked duplicated reads using SAMtools (version 1.7) and Picard (version 1). To identify somatic mutations from WES data for tumor samples with matched blood control, we applied the variant-calling software SAVI2, based on an empirical Bayesian method as published (13). Somatic mutations were identified on the basis of the final report of SAVI2, and following five additional criteria: (i) not annotated as a synonymous variant, intragenic variant, or intron variant; (ii) not annotated as a common SNP (dbSNP138); (iii) a variant allele frequency of >5% in the tumor sample; (iv) a variant allele read depth of <2 in the matched normal control; and (v) variant-associated reads with overall mismatch rate ≤ 0.02 as estimated by bam-readcount (version 0.8.0, https://github.com/genome/bam-readcount). All mutations described throughout this manuscript refer to nonsynonymous mutations, unless otherwise specified.

Design of Targeted Sequencing Panel

The full coding exons of 34 genes found recurrently mutated in the discovery cohort and/or previously implicated in lymphoma were analyzed in an extension panel of 95 cases (Supplementary Table S3) by targeted capture and next-generation sequencing. Genes were selected according to the following criteria: (i) allele frequency > 15%; (ii) mutated in at least 2 of 15 cases; (iii) expressed in normal or transformed B cells; and (iv) functionally annotated. In addition, we included 16 genes that were only mutated in one sample but have known roles in the pathogenesis of lymphoma and 4 genes that were not found mutated in the discovery cohort but have been previously implicated in PBL. Genes lacking clear functional annotation and/or known to represent common nonspecific mutational targets in sequencing studies (e.g., TTN, PCLO) were excluded. The complete gene list is reported in Supplementary Table S4.

Targeted Next-Generation Sequencing

The entire coding region of the 34 selected genes was isolated using the IDT xGen Predesigned Gene Capture Custom Target Enrichment Technology (Integrated DNA Technologies) and subjected to library preparation and next-generation sequencing on the Illumina HiSeq platform with 2 × 150 bp configuration. Targeted sequencing was performed at GENEWIZ. Read alignments and conventional preprocessing were conducted as described for the WES analysis. For samples lacking matched normal control, variants were filtered out if found in the dbSNP database (dbSNP138) as well as in any normal sample of the WES cohort.

Copy Number Analysis

For copy number analysis from WES data (n = 15), the Biodiscovery Multiscale BAM Reference Builder (53) was used to construct a multiscale reference (MSR) file from the alignments (BAM) of 14 paired normal samples. The MSR file was used as reference for copy-number variation (CNV) calling of all tumor alignments with the SNP-FASST2 algorithm (54), using Nexus Copy Number, v10.0 (BioDiscovery, Inc.; ref. 54). Gains and losses were defined as log2 ratio changes of at least +0.3 and −0.3, respectively, in the tumor alignment.

To validate CNV calling from WES data, 31 additional samples were processed on OncoScan FFPE Express Arrays (Affymetrix, Thermo Fisher Scientific) (Supplementary Table S7) according to the manufacturer's instructions, followed by scanning on a GeneChip Scanner 3000 7G, with the Affymetrix GeneChip Command Console (AGCC). Paired A+T and G+C CEL files were combined and analyzed with Chromosome Analysis Suite v3.3.0.139 (ChAS, Applied Biosystems, Thermo Fisher Scientific), using the OncoScan CNV workflow for FFPE, without manual recentering. We confirmed that copy number calls from OncoScan and WES data were consistent with each other by performing OncoScan on 7 cases that had been sequenced by WES and comparing the calls obtained from the two methods. Probe genomic coordinates were aligned to hg19 and the resulting OSCHP files were analyzed by Nexus Copy Number v10.0 using the SNP-FASST2 algorithm (54) with default parameters.

To detect recurrent copy number aberrations, we applied the GISTIC 2.0 algorithm using GenePattern (https://www.genepattern.org/) to the copy number segmentations of the combined cohort of 46 patients (15 cases analyzed by WES, 31 cases analyzed by OncoScan). Recurrent regions of copy number aberrations with q-value < 1.0 × 10⁻⁶ were considered significant. Furthermore, we assessed the expression level of each gene within the significant GISTIC peaks using the median value of quantile-normalized RPKM from RNA-seq data (n = 20).

Pandora: A High-Performing Pipeline for Quantifying the Bacterial and Viral Microenvironment of Bulk RNA-Seq Samples

Pandora is an open-source pipeline (https://github.com/RabadanLab/Pandora) that takes as input the total RNA sequence reads from a single sample and outputs the spectrum of detected microbial transcripts, focusing on bacteria and viruses. The Pandora workflow is divided into the following four main modules: (i) mapping to the human host genome using STAR and Bowtie2 to filter out the host reads from downstream analysis; (ii) de novo assembly of host-subtracted short reads using Trinity (55) to create contiguous full-length transcripts (contigs), which help increase the accuracy of alignment to the correct species of origin in the next step; (iii) identification of the most likely species of origin for each assembled contig with BLAST; and (iv) filtering and parsing of the BLAST results into a final report on the detected microbial abundances. We also performed gene expression profiling of all the reads that mapped to the EBV genome (56) using FeatureCounts (57) to fully characterize lytic and latent EBV programs.

Data Availability

The data that support the findings of this study are available upon request. The sequencing data have been deposited in NCBI Sequence Read Archive (SRA) under accession number PRJNA598849.
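
The five post-calling criteria listed under Mutation Calling can be expressed as a single predicate. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the `Variant` record, its field names, the consequence label strings, and the example values are all hypothetical, and SAVI2 itself is not invoked.

```python
from dataclasses import dataclass

# Consequence labels excluded by criterion (i); the exact strings are assumed.
EXCLUDED_CONSEQUENCES = {"synonymous_variant", "intragenic_variant", "intron_variant"}


@dataclass
class Variant:
    consequence: str        # annotated effect of the variant
    common_snp: bool        # True if listed as a common SNP (dbSNP138)
    tumor_vaf: float        # variant allele frequency in the tumor, as a fraction
    normal_alt_depth: int   # variant-supporting reads in the matched normal
    mismatch_rate: float    # overall mismatch rate of variant-supporting reads


def passes_filters(v: Variant) -> bool:
    """Apply the five criteria (i) to (v) from the Mutation Calling paragraph."""
    return (
        v.consequence not in EXCLUDED_CONSEQUENCES  # (i) coding-relevant annotation
        and not v.common_snp                         # (ii) not a common SNP
        and v.tumor_vaf > 0.05                       # (iii) VAF above 5% in tumor
        and v.normal_alt_depth < 2                   # (iv) fewer than 2 variant reads in normal
        and v.mismatch_rate <= 0.02                  # (v) mismatch rate at most 0.02
    )


# Hypothetical candidates; only the first satisfies all five criteria.
candidates = [
    Variant("missense_variant", False, 0.31, 0, 0.004),
    Variant("synonymous_variant", False, 0.40, 0, 0.003),
    Variant("missense_variant", False, 0.04, 0, 0.005),
    Variant("stop_gained", False, 0.22, 3, 0.006),
]
kept = [v for v in candidates if passes_filters(v)]
```

Writing the criteria as one conjunction keeps each threshold next to the roman numeral it comes from, which makes it easy to check the code against the text.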


Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Z. Liu: Conceptualization, investigation, visualization, methodology, writing-original draft, writing-review, and editing. I. Filip: Investigation, methodology, and writing-original draft. K. Gomez: Investigation, methodology, writing-review, and editing. D. Engelbrecht: Resources, validation, investigation, and visualization. S. Meer: Resources and investigation. P. Lalloo: Resources, validation, and investigation. P. Patel: Resources, validation, and investigation. Y. Perner: Resources and investigation. J. Zhao: Investigation, visualization, and methodology. J. Wang: Investigation, writing-review, and editing. L. Pasqualucci: Conceptualization, resources, supervision, funding acquisition, investigation, methodology, writing-original draft, project administration, writing-review, and editing. R. Rabadan: Conceptualization, supervision, funding acquisition, investigation, methodology, project administration, writing-review, and editing. P. Willem: Conceptualization, resources, supervision, funding acquisition, validation, investigation, visualization, writing-original draft, project administration, writing-review, and editing.

Acknowledgments

This work has been funded by NIH grants R21 CA192854 (to P. Willem, L. Pasqualucci, and R. Rabadan), R01GM117591 and U54 CA193313 (to R. Rabadan), and was initiated under the Columbia-South Africa Training Program for Research on HIV-associated Malignancies D43 CA153715, with the support of Judith Jacobson. We thank Stephen P. Goff and Henri-Jacques Delecluse for their helpful suggestions. We also thank Sonja Boy for providing additional samples for the independent copy number validation cohort, Nicole Crawford for assistance in sample collection and data curation, and Jacky Brown for help with Sanger sequencing. Whole exome capture and sequencing, and RNA sequencing were completed at Centrillion Biosciences, Inc and BGI Tech Solutions (Hong Kong). Targeted DNA sequencing was performed at Genewiz Inc.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received April 2, 2020; revised May 11, 2020; accepted May 11, 2020; published first June 10, 2020.

References

1. Campo E SH, Harris NL. WHO classification of tumors of the hematopoietic and lymphoid tissues. In: Swerdlow SH, Campo E, Harris NL, editors. Lyon, France: IARC Press; 2017. p. 321–2.
2. Carbone A, Vaccher E, Gloghini A, Pantanowitz L, Abayomi A, de Paoli P, et al. Diagnosis and management of lymphomas and other cancers in HIV-infected patients. Nat Rev Clin Oncol 2014;11:223–38.
3. Noy A, Lensing SY, Moore PC, Gupta N, Aboulafia D, Ambinder R, et al. Plasmablastic lymphoma is treatable in the HAART era. A 10 year retrospective by the AIDS Malignancy Consortium. Leuk Lymphoma 2016;57:1731–4.
4. Pather S, Mohamed Z, McLeod H, Pillay K. Large cell lymphoma: correlation of HIV status and prognosis with differentiation profiles assessed by immunophenotyping. Pathol Oncol Res 2013;19:695–705.
5. Delecluse HJ, Anagnostopoulos I, Dallenbach F, Hummel M, Marafioti T, Schneider U, et al. Plasmablastic lymphomas of the oral cavity: a new entity associated with the human immunodeficiency virus infection. Blood 1997;89:1413–20.
6. Castillo JJ, Bibas M, Miranda RN. The biology and treatment of plasmablastic lymphoma. Blood 2015;125:2323–30.
7. Valera A, Balague O, Colomo L, Martinez A, Delabie J, Taddesse-Heath L, et al. IG/MYC rearrangements are the main cytogenetic alteration in plasmablastic lymphomas. Am J Surg Pathol 2010;34:1686–94.
8. Tchernonog E, Faurie P, Coppo P, Monjanel H, Bonnet A, Algarte Genin M, et al. Clinical characteristics and prognostic factors of plasmablastic lymphoma patients: analysis of 135 patients from the LYSA group. Ann Oncol 2017;28:843–8.
9. Teruya-Feldstein J, Chiao E, Filippa D, Lin O, Comenzo R, Coleman M, et al. CD20-negative large-cell lymphoma with plasmablastic features: a clinically heterogenous spectrum in both HIV-positive and -negative patients. Ann Oncol 2004;15:1673–9.
10. Chang CC, Zhou X, Taylor JJ, Huang WT, Ren X, Monzon F, et al. Genomic profiling of plasmablastic lymphoma using array comparative genomic hybridization (aCGH): revealing significant overlapping genomic lesions with diffuse large B-cell lymphoma. J Hematol Oncol 2009;2:47.
11. Chapman J, Gentles AJ, Sujoy V, Vega F, Dumur CI, Blevins TL, et al. Gene expression analysis of plasmablastic lymphoma identifies downregulation of B-cell receptor signaling and additional unique transcriptional programs. Leukemia 2015;29:2270–3.
12. Montes-Moreno S, Martinez-Magunacelaya N, Zecchini-Barrese T, Villambrosia SG, Linares E, Ranchal T, et al. Plasmablastic lymphoma phenotype is determined by genetic alterations in MYC and PRDM1. Mod Pathol 2017;30:85–94.
13. Trifonov V, Pasqualucci L, Tiacci E, Falini B, Rabadan R. SAVI: a statistical algorithm for variant frequency identification. BMC Syst Biol 2013;7:S2.
14. Hillmer EJ, Zhang H, Li HS, Watowich SS. STAT3 signaling in immunity. Cytokine Growth Factor Rev 2016;31:1–15.
15. Crescenzo R, Abate F, Lasorsa E, Tabbo F, Gaudiano M, Chiesa N, et al. Convergent mutations and kinase fusions lead to oncogenic STAT3 activation in anaplastic large cell lymphoma. Cancer Cell 2015;27:516–32.
16. Song TL, Nairismagi ML, Laurensia Y, Lim JQ, Tan J, Li ZM, et al. Oncogenic activation of the STAT3 pathway drives PD-L1 expression in natural killer/T-cell lymphoma. Blood 2018;132:1146–58.
17. Nicolae A, Xi L, Pham TH, Pham TA, Navarro W, Meeker HG, et al. Mutations in the JAK/STAT and RAS signaling pathways are common in intestinal T-cell lymphomas. Leukemia 2016;30:2245.
18. Prior IA, Lewis PD, Mattos C. A comprehensive survey of Ras mutations in cancer. Cancer Res 2012;72:2457–67.
19. Pasqualucci L, Neumeister P, Goossens T, Nanjangud G, Chaganti R, Küppers R, et al. Hypermutation of multiple proto-oncogenes in B-cell diffuse large-cell lymphomas. Nature 2001;412:341–6.
20. Taddesse-Heath L, Meloni-Ehrig A, Scheerle J, Kelly JC, Jaffe ES. Plasmablastic lymphoma with MYC translocation: evidence for a common pathway in the generation of plasmablastic features. Mod Pathol 2010;23:991–9.
21. Boy SC, van Heerden MB, Babb C, van Heerden WF, Willem P. Dominant genetic aberrations and coexistent EBV infection in HIV-related oral plasmablastic lymphomas. Oral Oncol 2011;47:883–7.
22. Maya Miles D, Penate X, Sanmartin Olmo T, Jourquin F, Munoz Centeno MC, Mendoza M, et al. High levels of histones promote whole-genome-duplications and trigger a Swe1(WEE1)-dependent phosphorylation of Cdc28(CDK1). Elife 2018;7:e35337.
23. Wenzel S, Grau M, Mavis C, Hailfinger S, Wolf A, Madle H, et al. MCL1 is deregulated in subgroups of diffuse large B-cell lymphoma. Leukemia 2013;27:1381–90.
24. Chen C, Zhao S, Karnad A, Freeman JW. The biology and role of CD44 in cancer progression: therapeutic implications. J Hematol Oncol 2018;11:64.
25. Landau DA, Tausch E, Taylor-Weiner AN, Stewart C, Reiter JG, Bahlo J, et al. Mutations driving CLL and their evolution in progression and relapse. Nature 2015;526:525.
26. Schmitz R, Wright GW, Huang DW, Johnson CA, Phelan JD, Wang JQ, et al. Genetics and pathogenesis of diffuse large B-cell lymphoma. N Engl J Med 2018;378:1396–407.


27. Lohr JG, Stojanov P, Carter SL, Cruz-Gordillo P, Lawrence MS, Auclair D, et al. Widespread genetic heterogeneity in multiple myeloma: implications for targeted therapy. Cancer Cell 2014;25:91–101.
28. Pasqualucci L, Dalla-Favera R. Genetics of diffuse large B-cell lymphoma. Blood 2018;131:2307–19.
29. Brescia P, Schneider C, Holmes AB, Shen Q, Hussein S, Pasqualucci L, et al. MEF2B instructs germinal center development and acts as an oncogene in B cell lymphomagenesis. Cancer Cell 2018;34:453–65.
30. Ramsay AJ, Martinez-Trillos A, Jares P, Rodriguez D, Kwarciak A, Quesada V. Next-generation sequencing reveals the secrets of the chronic lymphocytic leukemia genome. Clin Transl Oncol 2013;15:3–8.
31. Chapman MA, Lawrence MS, Keats JJ, Cibulskis K, Sougnez C, Schinzel AC, et al. Initial genome sequencing and analysis of multiple myeloma. Nature 2011;471:467–72.
32. Gravelle P, Pericart S, Tosolini M, Fabiani B, Coppo P, Amara N, et al. EBV infection determines the immune hallmarks of plasmablastic lymphoma. Oncoimmunology 2018;7:e1486950.
33. Shannon-Lowe C, Rickinson A. The global landscape of EBV-associated tumours. Front Oncol 2019;9:713.
34. Bencun M, Klinke O, Hotz-Wagenblatt A, Klaus S, Tsai MH, Poirey R, et al. Translational profiling of B cells infected with the Epstein-Barr virus reveals 5′ leader ribosome recruitment through upstream open reading frames. Nucleic Acids Res 2018;46:2802–19.
35. Hammerschmidt W, Sugden B. Replication of Epstein–Barr viral DNA. Cold Spring Harb Perspect Biol 2013;5:a013029.
36. Sugimoto A, Sato Y, Kanda T, Murata T, Narita Y, Kawashima D, et al. Different distributions of Epstein-Barr virus early and late gene transcripts within viral replication compartments. J Virol 2013;87:6693–9.
37. Yu H, Lee H, Herrmann A, Buettner R, Jove R. Revisiting STAT3 signalling in cancer: new and unexpected biological functions. Nat Rev Cancer 2014;14:736–46.
38. Garcia-Reyero J, Martinez Magunacelaya N, Gonzalez de Villambrosia S, Loghavi S, Gomez Mediavilla A, Tonda R, et al. Genetic lesions in MYC and STAT3 drive oncogenic transcription factor overexpression in plasmablastic lymphoma. Haematologica 2020 Apr 9 [Epub ahead of print].
39. Cousins E, Gao Y, Sandford G, Nicholas J. Human herpesvirus 8 viral interleukin-6 signaling through gp130 promotes virus replication in primary effusion lymphoma and endothelial cells. J Virol 2014;88:12167–72.
40. Chiarle R, Simmons WJ, Cai H, Dhall G, Zamo A, Raz R, et al. Stat3 is required for ALK-mediated lymphomagenesis and provides a possible therapeutic target. Nat Med 2005;11:623–9.
41. Valera A, Colomo L, Martinez A, de Jong D, Balague O, Matheu G, et al. ALK-positive large B-cell lymphomas express a terminal B-cell differentiation program and activated STAT3 but lack MYC rearrangements. Mod Pathol 2013;26:1329–37.
42. Tiacci E, Ladewig E, Schiavoni G, Penson A, Fortini E, Pettirossi V, et al. Pervasive mutations of JAK-STAT pathway genes in classical Hodgkin lymphoma. Blood 2018;131:2454–65.
43. Steidl C, Gascoyne RD. The molecular pathogenesis of primary mediastinal large B-cell lymphoma. Blood 2011;118:2659–69.
44. Joos S, Kupper M, Ohl S, von Bonin F, Mechtersheimer G, Bentz M, et al. Genomic imbalances including amplification of the tyrosine kinase gene JAK2 in CD30+ Hodgkin cells. Cancer Res 2000;60:549–52.
45. Ritz O, Guiter C, Castellano F, Dorsch K, Melzner J, Jais JP, et al. Recurrent mutations of the STAT6 DNA binding domain in primary mediastinal B-cell lymphoma. Blood 2009;114:1236–42.
46. Johnson DE, O’Keefe RA, Grandis JR. Targeting the IL-6/JAK/STAT3 signalling axis in cancer. Nat Rev Clin Oncol 2018;15:234.
47. Redondo-Muñoz J, García-Pardo A, Teixidó J. Molecular players in hematologic tumor cell trafficking. Front Immunol 2019;10:156.
48. Tzankov A, Pehrs AC, Zimpfer A, Ascani S, Lugli A, Pileri S, et al. Prognostic significance of CD44 expression in diffuse large B cell lymphoma of activated and germinal centre B cell-like types: a tissue microarray analysis of 90 cases. J Clin Pathol 2003;56:747–52.
49. Zhong Y, Meng F, Zhang W, Li B, van Hest JC, Zhong Z. CD44-targeted vesicles encapsulating granzyme B as artificial killer cells for potent inhibition of human multiple myeloma in mice. J Control Release 2020;320:421–30.
50. Grande BM, Gerhard DS, Jiang A, Griner NB, Abramson JS, Alexander TB, et al. Genome-wide discovery of somatic coding and noncoding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood 2019;133:1313–24.
51. Koganti S, Clark C, Zhi J, Li X, Chen EI, Chakrabortty S, et al. Cellular STAT3 functions via PCBP2 to restrain Epstein-Barr virus lytic activation in B lymphocytes. J Virol 2015;89:5002–11.
52. Meer S PY, McAlpine ED, Willem P. Extraoral plasmablastic lymphomas in a high human immunodeficiency virus endemic area. Histopathology 2019;76:212–21.
53. Villela D, Costa SS, Vianna-Morgante AM, Krepischi AC, Rosenberg C. Efficient detection of chromosome imbalances and single nucleotide variants using targeted sequencing in the clinical setting. Eur J Med Genet 2017;60:667–74.
54. Ostrup O, Ahlborn LB, Lassen U, Mau-Sorensen M, Nielsen FC. Detection of copy number alterations in cell-free tumor DNA from plasma. BBA Clin 2017;7:120–6.
55. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 2011;29:644.
56. Arvey A, Tempera I, Tsai K, Chen HS, Tikhmyanova N, Klichinsky M, et al. An atlas of the Epstein-Barr virus transcriptome and epigenome reveals host-virus regulatory interactions. Cell Host Microbe 2012;12:233–45.
57. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 2013;30:923–30.


Genomic Characterization of HIV-Associated Plasmablastic Lymphoma Identifies Pervasive Mutations in the JAK–STAT Pathway

Zhaoqi Liu, Ioan Filip, Karen Gomez, et al.

Blood Cancer Discov Published OnlineFirst June 10, 2020.

Updated version: Access the most recent version of this article at doi: 10.1158/2643-3230.BCD-20-0051

Supplementary Material: Access the most recent supplemental material at http://bloodcancerdiscov.aacrjournals.org/content/suppl/2020/06/11/2643-3230.BCD-20-0051.DC1
