Transcriptome-Guided Characterization of Genomic Rearrangements in a Breast Cancer Cell Line
Total Page:16
File Type:pdf, Size:1020Kb
Transcriptome-guided characterization of genomic rearrangements in a breast cancer cell line Qi Zhaoa,1, Otavia L. Caballerob,1, Samuel Levya, Brian J. Stevensonc, Christian Iselic, Sandro J. de Souzad, Pedro A. Galanted, Dana Busama, Margaret A. Levershae, Kalyani Chadalavadae, Yu-Hui Rogersa, J. Craig Ventera,2, Andrew J. G. Simpsonb,2, and Robert L. Strausberga,2 aJ. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850; bLudwig Institute for Cancer Research, New York, NY 10021; cLudwig Institute for Cancer Research, 1015 Lausanne, Switzerland; dLudwig Institute for Cancer Research, CEP 01509-010 Sao Paulo, Brazil; and eMemorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10065 Contributed by J. Craig Venter, December 22, 2008 (sent for review December 1, 2008) We have identified new genomic alterations in the breast cancer per cell (see Fig. 2A). The SKY analysis also reveals a large cell line HCC1954, using high-throughput transcriptome sequenc- number of translocations involving most or all chromosomes. ing. With 120 Mb of cDNA sequences, we were able to identify Using 454-FLX pyrosequencing we generated 510,703 cDNA genomic rearrangement events leading to fusions or truncations of sequences of average length 245 bp from the HCC1954 cell line. genes including MRE11 and NSD1, genes already implicated in (See Methods and Fig. S1). We then initially aligned all cDNA oncogenesis, and 7 rearrangements involving other additional sequences to RefSeq mRNAs (GenBank dataset available on genes. This approach demonstrates that high-throughput tran- March 28, 2008), revealing that Ͼ384,900 reads were uniquely scriptome sequencing is an effective strategy for the characteriza- associated well with 9,221 RefSeq genes. tion of genomic rearrangements in cancers. Detection of Chimeric Genes. The set of sequences that did not cancer genome ͉ transcriptome sequencing cluster with RefSeq mRNAs were aligned to the human refer- ence genome. The remaining 47,370 sequences including those n achievable goal of the oncology community is to perform that were neither aligned to RefSeq mRNA entries nor to the Acomprehensive sequence analysis of the cancer genome and human genome at a full-length coverage were then pooled its transcripts toward the identification of new detection, diag- together and submitted to a computational analysis pipeline for nostic and intervention strategies. The onset of cancer results the detection of chimeric transcripts, as revealed by cDNA that from genomic alterations in precursor cells, and changes in the can be uniquely split between two genomic locations. Chimeric surrounding microenvironment (1) including the immune system transcripts thus identified might result from genomic rearrange- (2). Comprehensive analysis of the active genes comprising the ments. From the set of 47,370 nonalignable sequences, we transcriptome has resulted in advances in our ability to under- identified 496 sequences that could be uniquely mapped to 2 stand the pathways involved in the progression of cancer, serves distinct genomic locations that suggested genomic break points. as a tool to delineate molecular differences in cancers, even The results of this analysis are detailed in Fig. 1. among those that are from the same body site and appear similar Approximately half of the putative chimeric sequences de- by traditional approaches. Large-scale Sanger-based cDNA se- scribed 208 interchromosomal rearrangement events (243/496) quencing approaches contributed significantly to deciphering and the other half represented 210 intrachromosomal rearrange- transcriptome complexity (3–5). However, cost has been a ment events (253/496). We performed experimental validation significant limitation. To address that issue, and to attain deep for 33 putative chimeras including 9 with more than 1 supporting coverage of the transcriptome such that rare transcripts could be 454 read and 24 with only 1 supporting 454 read (Fig. 1). identified, tagging approaches such as SAGE (6), CAGE (7), and Chimeric transcripts were experimentally first validated at the MPSS (8) have been used. However, the short tags that are used transcript level. Reverse transcriptase-PCR (RT-PCR) with a give a very limited view of the complete transcript set and primer pair flanking the break junction of a chimeric transcript variations therein such as through alternative splicing, translo- was performed on an independently prepared genomic DNA- cations and point mutations. free total RNA sample of the HCC1954. The chimeras were Here, we use 454 Life Sciences pyrosequencing technology subsequently confirmed by Sanger sequencing of RT-PCR am- that enables relatively long sequence reads and deep transcrip- plified bands. All 9 chimeric cDNA with multiple 454 read tome coverage. Our primary interest was to determine whether support and 4 chimeric cDNAs with single 454 read support were we could use this technology to identify genomic alterations, thus verified in the HCC1954 transcriptome. We also tested the specifically gene fusions, and thus contribute to an integrated existence of chimeric cDNAs in a matched blood cell line view of the genome and transcriptome alterations within the (HCC1954BL). Surprisingly, most chimeric transcripts that sup- breast cancer cell line HCC1954. This cell line has been the ported an intrachromosomal rearrangement were amplified subject of several large-scale genomic analyses including com- from both HCC1954 and HCC1954BL (Table S1). However, all prehensive exome sequencing to detect somatic point mutations and BAC sequencing to identify chromosome translocations, which thus allows direct comparison between the approaches Author contributions: Q.Z., O.L.C., J.C.V., A.J.G.S., and R.L.S. designed research; O.L.C., D.B., (9–11). It is evident that the generation of sufficient depth of M.A.L., K.C., and Y.-H.R. performed research; Q.Z., O.L.C., S.L., B.J.S., C.I., S.J.d.S., and P.A.G. transcript coverage from a tumor and corresponding normal analyzed data; and Q.Z., S.L., A.J.G.S., and R.L.S. wrote the paper. tissue will identify all expressed genes and the point mutations The authors declare no conflict of interest. they contain. The focus of this current study is how translocations Freely available online through the PNAS open access option. that result in chimeric genes can be elucidated and interpreted 1Q.Z. and O.L.C. contributed equally to this work. from transcript sequencing. 2To whom correspondence may be addressed. E-mail: [email protected], [email protected], or [email protected]. Results This article contains supporting information online at www.pnas.org/cgi/content/full/ As shown by the spectral karyotyping (SKY), HCC1954 has a 0812945106/DCSupplemental. pseudotetraploid karyotype with an average of 92 chromosomes © 2009 by The National Academy of Sciences of the USA 1886–1891 ͉ PNAS ͉ February 10, 2009 ͉ vol. 106 ͉ no. 6 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0812945106 Downloaded by guest on October 2, 2021 region. Resulting candidates with either multiple or only 1 supporting 454 reads were selected and subjected to verification. A total of 36 candidates events were identified through this approach. Of these, 6 were supported by multiple independent 454 reads. Two of these 6 chimeric events were confirmed by RT-PCR at the cDNA level and LR-PCR at genomic DNA level (Table 1, Table S1, and Figs. S2 and S3). Both resulted from interchromosomal rearrangements. In total, 7 chimeric cDNAs resulting from 6 interchromosomal break events and 1 intrachromosomal break event were con- firmed at both the transcriptional and genomic levels through validation assays, all present only in HCC1954 (the tumor cell line) (Table 1). Single 454 read provided supportive evidence for only 1 of them (Table S2). These chimeras could affect the transcription and protein products of at least 9 genes. Wild-type transcripts were detected in parallel with the chimeric transcript to different extents in the cell line (Table S2). All verified break points tended to occur in either an intron or an intergenic region (Fig. 3). To produce the chimeric transcript, the transcription machinery either switched to the downstream exon (Fig. 3 A and C) or acquired a novel splicing acceptor in the intergenic sequence (Fig. 3B). In either break type, the consen- sus splicing sites were always observed on either side of the break junction of a chimeric cDNA when aligned to the genome. Although the mechanism is unknown, it is possible that the false positive chimeric cDNAs could be products of the cDNA Fig. 1. Schematic diagram of chimeric transcript detection and validation. preparation process. There are 2 types of breaks for the 6 verified interchromo- chimeric transcripts that suggested an interchromosomal rear- somal translocations: intragenic to intragenic and intragenic to rangement were only detected in HCC1954 but not in the control intergenic (Table 1 and Fig. 3 A and B). The one verified cell line (Fig. S2). intrachromosomal rearrangement is a local intronic inversion (Table 1 and Fig. 3C). We were unable to detect the alternate Validation of Genomic Rearrangements. Exon repetition and gene broken chromosome for all rearrangement events that involved fusion events have been reported to be present in normal cell lines 2 genes, suggesting that alternate transcripts were rarely pro- as a result of trans-splicing rather than of de novo genomic duced. We also observed that in most cases (5 of 7) the GENETICS rearrangements (12, 13). Thus, for those chimeras that were verified rearrangement events resulted in protein truncation rather than at the cDNA level we carried validation further to the genomic level, extension of the ORF, suggesting loss-of-function of critical using a combination of long-range PCR (LR-PCR) and FISH genes (Table 1). We noted that chromosome 8 seemed to be very experiments. For LR-PCR, we first tested the same set of primer frequently involved (5 of 7 events) in genomic rearrangements in HCC1954.