WO 2014/151764 A2, 25 September 2014 (25.09.2014)
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization, International Bureau
(10) International Publication Number: WO 2014/151764 A2
(43) International Publication Date: 25 September 2014 (25.09.2014)
(51) International Patent Classification: G06F 19/18 (2011.01)
(21) International Application Number: PCT/US2014/026411
(22) International Filing Date: 13 March 2014 (13.03.2014)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 61/798,941, 15 March 2013 (15.03.2013), US
(71) Applicant: VERACYTE, INC. [US/US]; 7000 Shoreline Court, Suite 250, South San Francisco, CA 94080 (US).
(72) Inventors: KENNEDY, Giulia, C.; 360 Castenada Avenue, San Francisco, CA 94116 (US). WILDE, Jonathan, I.; 1112 Clovelly Lane, Burlingame, CA 94010 (US). CHUDOVA, Darya; 5931 Taormino Avenue, San Jose, CA 95123 (US). PANKRATZ, Daniel; 111 Wilder Avenue, Los Gatos, CA 95030 (US). BARBACIORU, Catalin; 36178 Potel Common, Fremont, CA 94536 (US). WALSH, P., Sean; 5 Courtney Lane, Danville, CA 94506 (US). PAGAN, Moraima; 625 Scott Street, Apt. 104, San Francisco, CA 94117 (US).
(74) Agents: ALEMOZAFAR, Ali, R. et al.; Wilson Sonsini Goodrich & Rosati, 650 Page Mill Road, Palo Alto, CA 94304-1050 (US).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).
Published: without international search report and to be republished upon receipt of that report (Rule 48.2(g)).
(54) Title: METHODS AND COMPOSITIONS FOR CLASSIFICATION OF SAMPLES
(57) Abstract: Disclosed herein are kits, compositions, and methods relating to the classification of samples. Methods disclosed herein can also be used to diagnose conditions or to support treatment-related decisions.
METHODS AND COMPOSITIONS FOR CLASSIFICATION OF SAMPLES
CROSS REFERENCE
[0001] This application claims priority to U.S. Provisional Patent Application No. 61/798,941, filed on March 15, 2013, which is incorporated herein by reference in its entirety.
BACKGROUND
[0002] Cancer is one of the leading causes of mortality worldwide; yet for many patients, even the first step, obtaining an accurate diagnosis, is often frustrating and time-consuming. This is true of many cancers, including thyroid cancer, and it is particularly true of relatively rare diseases such as Hurthle cell adenomas and carcinomas, which account for approximately 5% of thyroid neoplasms.
[0003] An inaccurate diagnosis of cancer can lead to unnecessary follow-up procedures, including costly surgery, as well as unnecessary emotional distress for the patient. In the case of thyroid cancer, it is estimated that of the approximately 130,000 thyroid removal surgeries performed each year in the United States due to suspected malignancy, only about 54,000 are necessary; roughly 76,000 unnecessary thyroid removal surgeries are therefore performed annually.
Continued treatment costs and complications due to the need for lifelong drug therapy to replace the lost thyroid function can cause further economic and physical harm.
SUMMARY
[0004] The present disclosure provides a method for diagnosing and/or treating a subject suspected of having a disease such as cancer. In some embodiments, the method comprises: isolating ribonucleic acid (RNA) from a biological sample obtained from the subject; identifying one or more mutations within a first region of interest in the RNA sample; comparing a frequency of variation for each base pair position in the first region of interest of the RNA sample to one or more references to identify one or more mutations that are correlated with the cancer; comparing the one or more mutations identified within the first region of interest to the one or more mutations identified as correlated with the cancer, to identify the presence or absence of at least one mutation; repeating the previous steps for a second region of interest of the RNA sample, different from the first region of interest, to generate a mutation profile for the RNA; and diagnosing and/or treating the subject based on the mutation profile. In some embodiments, the steps are repeated at least 2, 10, or 100 times.
[0005] In some embodiments, the one or more references comprise frequencies of variation for single base pairs in a reference sequence. In some embodiments, these frequencies of variation are derived from at least 1000 individuals; in other embodiments, they are derived from a known cancer or from at least 40 samples.
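The per-position comparison in [0004] and [0005] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the observed frequency of variation at each position is compared with a reference background frequency (such as one derived from a population of individuals) using a one-sided binomial test, and a set of known cancer-associated positions (standing in for a reference derived from a known cancer) is then checked for presence or absence. The function names, the choice of a binomial test, the `alpha` threshold, the region names, and the toy counts are all hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k <= 0 else 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log1p(-p)
    total = sum(
        exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        for i in range(max(k, 0), n + 1)
    )
    return min(total, 1.0)


def call_region(alt_counts, depths, ref_freqs, alpha=1e-3):
    """Positions whose observed variation frequency significantly exceeds the reference.

    alt_counts[i] / depths[i] is the observed frequency of variation at position i;
    ref_freqs[i] is the (hypothetical) background frequency from a reference panel.
    Returns {position: observed_frequency} for positions with tail p-value < alpha.
    """
    calls = {}
    for pos, (k, n, p) in enumerate(zip(alt_counts, depths, ref_freqs)):
        if n > 0 and binomial_upper_tail(k, n, p) < alpha:
            calls[pos] = k / n
    return calls


def mutation_profile(regions, known_sites, alpha=1e-3):
    """Presence/absence of known cancer-associated sites across regions of interest."""
    profile = {}
    for region, (alt_counts, depths, ref_freqs) in regions.items():
        observed = call_region(alt_counts, depths, ref_freqs, alpha)
        for pos in sorted(known_sites.get(region, ())):
            profile[(region, pos)] = pos in observed
    return profile


# Toy example: per-position (alt counts, read depths, reference frequencies).
regions = {
    "regionA": ([0, 30, 1], [100, 100, 100], [0.01, 0.01, 0.01]),
    "regionB": ([0, 0], [50, 50], [0.001, 0.001]),
}
known = {"regionA": {1, 2}, "regionB": {0}}  # hypothetical known cancer sites
print(mutation_profile(regions, known))
# {('regionA', 1): True, ('regionA', 2): False, ('regionB', 0): False}
```

Looping over `regions` corresponds to repeating the steps for a second and further regions of interest; the resulting dictionary plays the role of the mutation profile. A binomial test is only one reasonable way to compare an observed frequency with a reference frequency.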
[0006] In some embodiments, a call score is assigned to each mutation identified in the RNA. In some embodiments, the mutation profile is generated using the COSMIC database of known sites of somatic variation in cancer.
[0007] In some embodiments, the identification of the presence or absence of one or more mutations is at least 90% accurate, at least 95% accurate, or 100% accurate.
[0008] This disclosure also provides a method for detecting and normalizing 3'-5' amplification bias in microarray sample data generated from a nucleic acid sample from a subject, the method comprising: obtaining a biological sample from the subject, wherein the biological sample comprises a nucleic acid sample; amplifying the nucleic acid sample to generate one or more amplicons, wherein the nucleic acid sample is amplified with the aid of one or more probes; generating a nucleic acid sequence read for an individual amplicon among the one or more amplicons; for each individual amplicon among the one or more amplicons, calculating, with the aid of a computer processor, the extent of a 3' bias for a given probe among the one or more probes based on a comparison of the nucleic acid sequence of the given probe to the nucleic acid sequence read generated for the individual amplicon; and applying a normalization procedure to correct for the 3' bias for the given probe.
[0009] In some embodiments, the nucleic acid is an mRNA transcript. In some embodiments, calculating the extent of the 3' bias further comprises determining the effective distance between the 3' end of the mRNA transcript and the given probe. In some embodiments, calculating the extent of the 3' bias further comprises determining the effective distance between one or more sites or sequences in the mRNA transcript and the given probe.
In some embodiments, calculating the extent of the 3' bias further comprises calculating a distance, or a median weighted distance, between the given probe and one or more downstream polyA sites or sequences within the mRNA transcript, wherein the weighted distance is determined by the read counts associated with each polyA site in the mRNA transcript. In some embodiments, calculating the extent of the 3' bias further comprises comparing the variability of paired intensity profiles of two or more identical probes, wherein the intensity profiles are obtained from two or more independent sets of microarray data, each generated from the same biological sample.
[0010] In some embodiments, comparing the variability of paired intensity profiles of two or more identical probes further comprises performing a per-transcript alignment of probes within the mRNA transcript to calculate the effective distance. In some embodiments, the normalization procedure further comprises generating a normalization target distribution. In some embodiments, the normalization procedure further comprises quantile normalization, wherein probes are grouped into bins and quantile normalization is applied to the probes within each bin to normalize the median intensity of probes across the bin. In some embodiments, the normalization procedure removes amplification bias from the sample data.
[0011] In some embodiments, summarization methods are applied to the normalized probe intensities and used to improve detection of differential gene expression in the microarray sample data.
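A minimal sketch of the 3' bias correction in [0009] and [0010], under several assumptions that the disclosure does not fix: transcript coordinates increase toward the 3' end, the effective distance of a probe is the read-count-weighted median distance to downstream polyA sites, probes are grouped into bins by quantiles of that distance, and standard quantile normalization across arrays is applied separately within each bin. All function names, the toy data, and the default of four bins are hypothetical; this is one interpretation, not the disclosed procedure.

```python
import numpy as np


def weighted_median_distance(probe_pos, polya_pos, polya_reads):
    """Read-count-weighted median distance from a probe to downstream polyA sites.

    Assumes transcript coordinates increase toward the 3' end; sites upstream
    of the probe are ignored. Returns NaN if no downstream site exists.
    """
    polya_pos = np.asarray(polya_pos, dtype=float)
    polya_reads = np.asarray(polya_reads, dtype=float)
    downstream = polya_pos >= probe_pos
    if not downstream.any():
        return float("nan")
    dist = polya_pos[downstream] - probe_pos
    weights = polya_reads[downstream]
    order = np.argsort(dist)
    dist, cum = dist[order], np.cumsum(weights[order])
    # first distance at which the cumulative read weight reaches half the total
    return float(dist[np.searchsorted(cum, cum[-1] / 2.0)])


def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a probes x arrays matrix."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    target = np.sort(x, axis=0).mean(axis=1)  # normalization target distribution
    return target[ranks]


def assign_bins(distances, n_bins):
    """Bin probes by effective distance using quantile-based bin edges."""
    edges = np.quantile(distances, np.linspace(0.0, 1.0, n_bins + 1))
    return np.clip(np.searchsorted(edges, distances, side="right") - 1, 0, n_bins - 1)


def binned_quantile_normalize(intensities, distances, n_bins=4):
    """Quantile-normalize each distance bin separately (distances must be finite)."""
    intensities = np.asarray(intensities, dtype=float)
    bins = assign_bins(np.asarray(distances, dtype=float), n_bins)
    out = np.empty_like(intensities)
    for b in range(n_bins):
        idx = bins == b
        if idx.any():
            out[idx] = quantile_normalize(intensities[idx])
    return out


print(weighted_median_distance(100, [50, 300, 600], [10, 1, 5]))  # 500.0

rng = np.random.default_rng(1)
x = rng.normal(8.0, 2.0, size=(40, 3))  # toy log2 intensities: 40 probes x 3 arrays
d = rng.uniform(0.0, 2000.0, size=40)   # toy effective distances to the 3' end
normalized = binned_quantile_normalize(x, d)
```

Binning by distance before normalizing compares each probe only with probes that share a similar degree of 3' bias; a single global quantile normalization (`n_bins=1`) would pool them together.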
[0012] This disclosure also provides a method for detecting heterogeneity present in microarray data, the method comprising: generating hypothetical microarray data in silico from a mixture of one or more samples; generating one or more models from the hypothetical microarray data; obtaining microarray data from a mixture of one or more samples prepared in vitro; comparing the one or more models generated from the hypothetical microarray data to the microarray data obtained from the in vitro mixture; and, based upon the comparison, assessing the strength of the one or more models.
[0013] In some embodiments, the strength of the one or more models is determined by comparing the mean squared error between each model generated and the data obtained.
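The model comparison in [0012] and [0013] can be illustrated with two-sample mixtures. The sketch below assumes log2-scale intensities that mix linearly on the unlogged scale, treats each candidate mixing fraction as one in silico model, and ranks models by mean squared error against an observed profile. The simulated stand-in for in vitro data, the fraction grid, and the function names are hypothetical, not data or code from the disclosure.

```python
import numpy as np


def in_silico_mixture(log_a, log_b, frac_a):
    """Hypothetical log2 profile of a mixture containing frac_a of sample A.

    Intensities are mixed in linear space, then returned on the log2 scale.
    """
    return np.log2(frac_a * np.exp2(log_a) + (1.0 - frac_a) * np.exp2(log_b))


def score_models(log_a, log_b, observed, fractions):
    """Mean squared error of each in silico mixture model against observed data."""
    return {
        float(f): float(np.mean((in_silico_mixture(log_a, log_b, f) - observed) ** 2))
        for f in fractions
    }


rng = np.random.default_rng(0)
log_a = rng.uniform(4.0, 14.0, size=500)  # toy log2 profile of sample A
log_b = rng.uniform(4.0, 14.0, size=500)  # toy log2 profile of sample B
# Simulated stand-in for the in vitro mixture: 30% A plus measurement noise.
observed = in_silico_mixture(log_a, log_b, 0.3) + rng.normal(0.0, 0.05, size=500)

mse = score_models(log_a, log_b, observed, np.round(np.linspace(0.0, 1.0, 11), 2))
best = min(mse, key=mse.get)  # lowest MSE = strongest model
print(best, mse[best])
```

A lower mean squared error indicates a model that better explains the observed data; a profile that no single mixture fits well would suggest heterogeneity beyond the modeled components.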