WO 2014/071279 A2, 8 May 2014 (08.05.2014)
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization, International Bureau
(10) International Publication Number: WO 2014/071279 A2
(43) International Publication Date: 8 May 2014 (08.05.2014)
(51) International Patent Classification: C12Q 1/68 (2006.01)
(21) International Application Number: PCT/US2013/068236
(22) International Filing Date: 4 November 2013 (04.11.2013)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data:
61/722,634, 5 November 2012 (05.11.2012), US
61/766,561, 19 February 2013 (19.02.2013), US
(71) Applicant: GENOMIC HEALTH, INC. [US/US]; 301 Penobscot Drive, Redwood City, California 94063 (US).
(72) Inventors: MA, Yan; 301 Penobscot Drive, Redwood City, California 94063 (US). QU, Kunbin; 301 Penobscot Drive, Redwood City, California 94063 (US). LIU, Mei-Lan; 301 Penobscot Drive, Redwood City, California 94063 (US). AMBANNAVAR, Ranjana; 301 Penobscot Drive, Redwood City, California 94063 (US). STEPHANS, James; 301 Penobscot Drive, Redwood City, California 94063 (US). PAN, John; 2547 W. Avenue 30, Los Angeles, California 90065 (US). [Continued on next page]
(74) Agent: McCLELLAN, Kelly Brett; Genomic Health, Inc., 301 Penobscot Drive, Redwood City, California 94063 (US).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).
(54) Title: GENE FUSIONS AND ALTERNATIVELY SPLICED JUNCTIONS ASSOCIATED WITH BREAST CANCER
(57) Abstract: The present invention relates to gene fusions and alternative spliced junctions associated with breast cancer. The present invention also relates to novel methods of identifying gene fusions and alternative spliced junctions in RNA sequencing data. The present invention further relates to predicting prognosis of a breast cancer patient based on the number of gene fusion events.

[Figure 1, panel A: workflow diagram divided into sample-based and cohort-based stages. Recoverable labels: Read alignments; Distant spliced reads; Candidate gene fusions; Template index; Distant spliced reads within same gene; Alternative splicing candidates; Ensembl annotation; Gene AST tables; Step 6; Step 7.]

Published: without international search report and to be republished upon receipt of that report (Rule 48.2(g))

GENE FUSIONS AND ALTERNATIVELY SPLICED JUNCTIONS ASSOCIATED WITH BREAST CANCER

SEQUENCE LISTING

[0000] The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on October 24, 2013, is named GHI-0056-PCT_SL.txt and is 1,261,876 bytes in size.

FIELD OF THE INVENTION

[0001] The present invention relates to gene fusions and genes comprising alternative spliced junctions associated with breast cancer.
The present invention also relates to methods of identifying gene fusions and genes comprising alternative spliced junctions in samples obtained from a patient with cancer. Furthermore, the present invention relates to a method of predicting the prognosis of a patient with breast cancer based on the number of gene fusion events.

INTRODUCTION

[0002] Genomic aberrations resulting in gene fusions and alternatively spliced genes play an important role in cancer. Gene fusions, for example, have been estimated to account for about 20% of human cancer morbidity. Mitelman et al., Nature Reviews Cancer 7:233-245 (2007). Gene fusions are hybrids created by joining two previously separate genes via genomic aberrations such as translocations, deletions, and inversions, or via trans-splicing between precursor mRNAs. Gene fusions may up-regulate expression of an oncogene by fusing a strong promoter to it. The first gene fusion identified in human neoplasia was BCR-ABL1 in chronic myelogenous leukemia (CML). The protein resulting from this fusion exhibits constitutive tyrosine kinase activity. Discovery of BCR-ABL1 led to development of a targeted treatment for CML using the tyrosine kinase inhibitor imatinib, which was approved in 2001. Druker et al., New England Journal of Medicine 344:1038-1042 (2001). Most of the known gene fusions have been found in hematological disorders; however, with the advent of next-generation sequencing technology, rare recurrent gene fusion events have been identified in common solid tumors. See Kohno et al., Nature Medicine 18: 375-377 (2012); Takeuchi et al., Nature Medicine 18: 378-381 (2012); Lipson et al., Nature Medicine 18: 382-384 (2012); and Ju et al., Genome Res., 22: 436-445 (2012).

[0003] In cancer, aberrantly spliced pre-mRNAs escape the quality control mechanisms within cells (e.g., the nonsense-mediated mRNA decay pathway) and are, therefore, translated into aberrant proteins. He et al., PLoS ONE 4(3):e4732 (2009).
For example, alternative splicing is known to be related to the pathogenesis of colon cancer and has been described to occur in lung adenocarcinoma. Seo et al., Genome Research 1-11 (Oct. 2012).

[0004] Transcriptome sequencing enables detection of transcriptional variants such as gene fusions and alternative splicing events. Current methods, such as ChimeraScan (Robinson et al., Nature Medicine 17: 1646-1651 (2011)), SnowShoes-FTD (Asmann et al., Cancer Res, 72: 1921-1928 (2012)), GSTRUCT-fusions (Seshagiri, S. et al., Nature 488: 660-664 (2012)), and GFP (Ju et al., Genome Res., 22: 436-445 (2012)), use paired-end data obtained from fresh frozen tissue samples to detect gene fusions. Other methods, such as TopHat-Fusion (Kim and Salzberg, Genome Biol 12: R72 (2011)), FusionMap (Ge et al., Bioinformatics, 27: 1922-1928 (2011)), and FusionFinder (Francis et al., PLoS One, 7(6):e39987 (2012)), can use single-end data from cell lines or fresh frozen tissue samples to detect gene fusions.

[0005] Because standard clinical practices include generating formalin-fixed, paraffin-embedded (FFPE) tissue samples from biopsies and surgical resections, FFPE samples provide an enormous repository of information for cancer research. Nonetheless, current methods are not well suited for investigating RNA from FFPE samples, because the RNA from such samples is often degraded and the libraries generated from those samples have low complexity and small insert sizes.

[0006] The present bioinformatics approaches identify gene fusions and alternative spliced junctions from FFPE RNA-sequencing datasets at base-pair resolution.

SUMMARY

[0007] A bioinformatics approach was developed to identify gene fusion junctions using FFPE RNA-sequencing datasets. The present invention provides gene fusion junctions that are present in breast cancer tissue samples. These gene fusions are provided in Tables A and B.
The present invention also provides a bioinformatics approach to identify alternative spliced junctions. The present invention provides alternative spliced junctions that are present in breast cancer tissue samples. These alternative spliced junctions are provided in Table 5.

[0008] The present invention accommodates the use of archived paraffin-embedded biopsy material for assay of gene fusion transcripts, and therefore is compatible with the most widely available type of biopsy material. It is also compatible with other methods of tumor tissue harvest, for example, core biopsy or fine needle aspiration.

[0009] A multiplexed, whole genome sequencing methodology was used to enable transcriptome-wide discovery of gene fusions and alternative spliced junctions using low amounts of FFPE tissue. The methods described herein support the use of single-end or paired-end sequence reads.

[0010] In one aspect, the invention provides a method for identifying a gene fusion in a biological sample obtained from a patient with cancer. The method comprises obtaining a plurality of reads from RNA sequencing of the biological sample. Each read is then mapped to the human genome. Next, the method comprises determining whether a read comprises a distant spliced junction and selecting each read that comprises a distant spliced junction. A candidate gene fusion comprising the distant spliced junction is then identified. The method also comprises creating a first set of templates for the candidate gene fusion.
The first set of templates comprises:
(1) a fusion template comprising 50 base pairs (bp) of exonic sequence of a preserved region of a donor gene and 50 bp of exonic sequence of a preserved region of an acceptor gene,
(2) a donor template comprising 50 bp of exonic sequence of a preserved region of a donor gene and 50 bp of exonic sequence of a discarded region of a donor gene,
(3) an acceptor template comprising 50 bp of exonic sequence of a discarded region of an acceptor gene and 50 bp of exonic sequence of a preserved region of an acceptor gene,
(4) a donor genomic template comprising 50 bp upstream genomic sequence of a donor splicing site and 50 bp downstream genomic sequence of a donor splicing site, and
(5) an acceptor genomic template comprising 50 bp upstream genomic sequence of an acceptor splicing site and 50 bp downstream genomic sequence of an acceptor splicing site.
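
The composition of the five templates listed above can be illustrated with a minimal Python sketch. It is not the applicants' implementation. The `JunctionContext` container, its field names, and the function `build_first_template_set` are hypothetical names introduced here for illustration only. The sketch also assumes that each 50 bp half is the flank immediately adjacent to the junction or splicing site (the last 50 bases of the left-hand sequence and the first 50 bases of the right-hand sequence); the text states the 50 bp lengths and the region pairings but not this detail.

```python
from dataclasses import dataclass

FLANK = 50  # bp taken from each side, as stated in the text


@dataclass(frozen=True)
class JunctionContext:
    """Hypothetical container for the sequences around one candidate fusion."""
    donor_preserved_exonic: str       # donor exonic sequence ending at the junction
    donor_discarded_exonic: str       # donor exonic sequence lost in the fusion
    acceptor_discarded_exonic: str    # acceptor exonic sequence lost in the fusion
    acceptor_preserved_exonic: str    # acceptor exonic sequence starting at the junction
    donor_genomic_upstream: str       # genomic sequence ending at the donor splicing site
    donor_genomic_downstream: str     # genomic sequence starting at the donor splicing site
    acceptor_genomic_upstream: str    # genomic sequence ending at the acceptor splicing site
    acceptor_genomic_downstream: str  # genomic sequence starting at the acceptor splicing site


def _left_flank(seq: str, n: int = FLANK) -> str:
    """Return the last n bases of seq (assumed to end at the junction)."""
    if len(seq) < n:
        raise ValueError(f"need at least {n} bp, got {len(seq)}")
    return seq[-n:]


def _right_flank(seq: str, n: int = FLANK) -> str:
    """Return the first n bases of seq (assumed to start at the junction)."""
    if len(seq) < n:
        raise ValueError(f"need at least {n} bp, got {len(seq)}")
    return seq[:n]


def build_first_template_set(ctx: JunctionContext) -> dict[str, str]:
    """Assemble the five 100 bp templates in the order listed in the text."""
    return {
        # (1) preserved donor exon + preserved acceptor exon
        "fusion": _left_flank(ctx.donor_preserved_exonic)
        + _right_flank(ctx.acceptor_preserved_exonic),
        # (2) preserved donor exon + discarded donor exon
        "donor": _left_flank(ctx.donor_preserved_exonic)
        + _right_flank(ctx.donor_discarded_exonic),
        # (3) discarded acceptor exon + preserved acceptor exon
        "acceptor": _left_flank(ctx.acceptor_discarded_exonic)
        + _right_flank(ctx.acceptor_preserved_exonic),
        # (4) genomic sequence spanning the donor splicing site
        "donor_genomic": _left_flank(ctx.donor_genomic_upstream)
        + _right_flank(ctx.donor_genomic_downstream),
        # (5) genomic sequence spanning the acceptor splicing site
        "acceptor_genomic": _left_flank(ctx.acceptor_genomic_upstream)
        + _right_flank(ctx.acceptor_genomic_downstream),
    }
```

Each template in this sketch is 100 bp long with the junction or splicing site at its midpoint. A read that spans the junction could then be aligned against all five templates, so that a read matching the fusion template better than the donor, acceptor, or genomic templates supports the candidate fusion. That comparison step is an inference about how such templates would be used and is not stated in the passage above.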