WO 2015/089333 A1, 18 June 2015 (18.06.2015)
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization, International Bureau
(10) International Publication Number: WO 2015/089333 A1
(43) International Publication Date: 18 June 2015 (18.06.2015)
(51) International Patent Classification: C12Q 1/68 (2006.01); C40B 30/04 (2006.01)
(21) International Application Number: PCT/US2014/069848
(22) International Filing Date: 11 December 2014 (11.12.2014)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 61/914,907, 11 December 2013 (11.12.2013), US; 61/987,414, 1 May 2014 (01.05.2014), US; 62/010,975, 11 June 2014 (11.06.2014), US
(71) Applicant: ACCURAGEN, INC. [US/US]; 4062 Fabian Way, Suite 1A, Palo Alto, CA 94303 (US).
(72) Inventors: LIN, Shengrong; 34568 Willbridge Terrace, Fremont, CA 94555 (US). SUN, Zhaohui; 614 Waterview Drive, Coppell, TX 75019 (US). ZHAO, Grace Qizhi; 3969 Duncan Place, Palo Alto, CA 94306 (US). TANG, Paul Ling-Fung; 2924 19th Avenue, San Francisco, CA 94132 (US).
(74) Agents: GIERING, Jeffery, C. et al; Wilson Sonsini Goodrich & Rosati, 650 Page Mill Road, Palo Alto, CA 94304-1050 (US).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, ST, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, KM, ML, MR, NE, SN, TD, TG).
Published: with international search report (Art. 21(3)); before the expiration of the time limit for amending the claims and to be republished in the event of receipt of amendments (Rule 48.2(h)).
(54) Title: COMPOSITIONS AND METHODS FOR DETECTING RARE SEQUENCE VARIANTS
[Front-page drawing: FIG. 7]
(57) Abstract: In some aspects, the present disclosure provides methods for identifying sequence variants in a nucleic acid sample. In some embodiments, a method comprises identifying sequence differences between sequencing reads and a reference sequence, and calling a sequence difference that occurs in at least two different circular polynucleotides, such as two circular polynucleotides having different junctions, as the sequence variant. In some aspects, the present disclosure provides compositions and systems useful in the described methods.

COMPOSITIONS AND METHODS FOR DETECTING RARE SEQUENCE VARIANTS

CROSS-REFERENCE

[0001] This application claims the benefit of U.S. Provisional Application No. 61/914,907, filed December 11, 2013; U.S. Provisional Application No. 61/987,414, filed May 1, 2014; and U.S. Provisional Application No. 62/010,975, filed June 11, 2014; all of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] Identifying sequence variation within complex populations is an actively growing field, particularly with the advent of large-scale parallel nucleic acid sequencing. However, large-scale parallel sequencing has significant limitations in that the inherent error frequency of commonly used techniques is higher than the frequency of many of the actual sequence variations in the population.
For example, error rates of 0.1% to 1% have been reported for standard high-throughput sequencing. As a result, detection of rare sequence variants suffers from high false-positive rates when the frequency of the variants is low, such as at or below the error rate.

[0003] There are many reasons for detecting rare sequence variants. For example, detecting rare characteristic sequences can be used to identify and distinguish the presence of a harmful environmental contaminant, such as particular bacterial taxa. A common way of characterizing bacterial taxa is to identify differences in a highly conserved sequence, such as an rRNA sequence. However, typical sequencing-based approaches to this problem face challenges arising from the sheer number of different genomes in a given sample and the degree of homology between them, which adds complexity to already laborious procedures. Improved procedures would have the potential to enhance contamination detection in a variety of settings. For example, the clean rooms used to assemble components of satellites and other spacecraft can be surveyed with the present systems and methods to understand which microbial communities are present, to develop better decontamination and cleaning techniques that prevent the introduction of terrestrial microbes to other planets or samples thereof, or to develop methodologies that distinguish data generated by putative extraterrestrial microorganisms from data generated by contaminating terrestrial microorganisms. Food monitoring applications include periodic testing of production lines at food processing plants, surveying slaughterhouses, and inspecting the kitchens and food storage areas of restaurants, hospitals, schools, correctional facilities, and other institutions for food-borne pathogens. Water reserves and processing plants may be monitored similarly.

[0004] Rare variant detection can also be important for the early detection of pathological mutations.
For instance, detection of cancer-associated point mutations in clinical samples can improve the identification of minimal residual disease during chemotherapy and can detect the appearance of tumor cells in relapsing patients. The detection of rare point mutations is also important for assessing exposure to environmental mutagens, monitoring endogenous DNA repair, and studying the accumulation of somatic mutations in aging individuals. Additionally, more sensitive methods for detecting rare variants can enhance prenatal diagnosis by enabling the characterization of fetal cells present in maternal blood.

SUMMARY OF THE INVENTION

[0005] In view of the foregoing, there is a need for improved methods of detecting rare sequence variants. The compositions and methods of the present disclosure address this need and provide additional advantages as well. In particular, the various aspects of the disclosure provide for highly sensitive detection of rare or low-frequency nucleic acid sequence variants (sometimes referred to as mutations). This includes the identification and elucidation of low-frequency nucleic acid variations (including substitutions, insertions, and deletions) in samples that may contain small amounts of variant sequences in a background of normal sequences, as well as the identification of low-frequency variations in a background of sequencing errors.

[0006] In one aspect, the disclosure provides a method of identifying a sequence variant, such as in a nucleic acid sample.
In some embodiments, the method is performed on a plurality of polynucleotides, each having a 5' end and a 3' end, and comprises: (a) circularizing individual polynucleotides of the plurality to form a plurality of circular polynucleotides, each having a junction between its 5' end and 3' end; (b) amplifying the circular polynucleotides of (a); (c) sequencing the amplified polynucleotides to produce a plurality of sequencing reads; (d) identifying sequence differences between the sequencing reads and a reference sequence; and (e) calling, as the sequence variant, a sequence difference that occurs in at least two circular polynucleotides having different junctions. In some embodiments, the method comprises identifying sequence differences between sequencing reads and a reference sequence, and calling, as the sequence variant, a sequence difference that occurs in at least two circular polynucleotides having different junctions, wherein: (a) the sequencing reads correspond to amplification products of the at least two circular polynucleotides; and (b) each of the at least two circular polynucleotides comprises a different junction formed by ligating the 5' end and 3' end of the respective polynucleotide.

[0007] The plurality of polynucleotides can be single- or double-stranded. In some embodiments, the polynucleotides are single-stranded. In some embodiments, circularizing is effected by subjecting the plurality of polynucleotides to a ligation reaction. In some embodiments, an individual circular polynucleotide has a junction that is unique among the circularized polynucleotides. In some embodiments, the sequence variant is a single nucleotide polymorphism (SNP). In some embodiments, the reference sequence is a consensus sequence formed by aligning the sequencing reads with one another. In some embodiments, the reference sequence is a known reference sequence, such as a reference genome or a portion thereof.
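The calling rule of steps (d) and (e), together with the option of using a consensus of the reads as the reference sequence, can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: each read is assumed to be already aligned to the reference without gaps and to carry a hypothetical junction_id naming the circular polynucleotide it was amplified from, the helper names consensus_reference and call_variants are likewise hypothetical, and only substitutions are handled. Repeated observations of a difference in reads that share one junction (for example, tandem copies from a single circle) count as one supporting molecule, so such a difference is not called unless a circle with a different junction also carries it.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical read model: (junction_id, aligned_sequence). junction_id labels
# the circular polynucleotide a read was amplified from; sequences are assumed
# to be gap-free and the same length as the reference (substitutions only).
Read = Tuple[str, str]
Call = Tuple[int, str, str, int]  # (position, ref_base, alt_base, n_junctions)


def consensus_reference(reads: List[Read]) -> str:
    """Build a reference by majority vote at each aligned position."""
    length = len(reads[0][1])
    if any(len(seq) != length for _, seq in reads):
        raise ValueError("reads must be aligned to a common length")
    consensus = []
    for pos in range(length):
        column = Counter(seq[pos] for _, seq in reads)
        consensus.append(column.most_common(1)[0][0])
    return "".join(consensus)


def call_variants(reads: List[Read], reference: str,
                  min_junctions: int = 2) -> List[Call]:
    """Call differences supported by at least `min_junctions` distinct circles."""
    # (position, alt_base) -> set of junctions whose reads show that difference
    support: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
    for junction_id, seq in reads:
        for pos, (ref_base, base) in enumerate(zip(reference, seq)):
            if base != ref_base:
                support[(pos, base)].add(junction_id)
    calls = [
        (pos, reference[pos], alt, len(junctions))
        for (pos, alt), junctions in support.items()
        if len(junctions) >= min_junctions
    ]
    return sorted(calls)


if __name__ == "__main__":
    reads = [
        ("j1", "ACGTACGT"),
        ("j1", "ACTTACGT"),  # G>T seen twice, but only in circle j1
        ("j1", "ACTTACGT"),
        ("j2", "ACGTACGA"),  # T>A seen in two circles (j2, j3)
        ("j3", "ACGTACGA"),
        ("j4", "ACGTACGT"),
        ("j5", "ACGTACGT"),
    ]
    ref = consensus_reference(reads)
    print(ref)                        # ACGTACGT
    print(call_variants(reads, ref))  # [(7, 'T', 'A', 2)]
```

Setting min_junctions to 2 mirrors the "at least two circular polynucleotides having different junctions" threshold of step (e); raising it trades sensitivity for specificity. Handling insertions and deletions would require alignment-aware comparison, which this sketch omits.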
In some embodiments, circularizing comprises the step of joining an adapter polynucleotide to the 5' end, the 3' end, or both the 5' end and the 3' end of a polynucleotide in the plurality of polynucleotides. In some embodiments, amplifying is effected by using a polymerase having strand-displacement activity, such as in rolling-circle amplification (RCA). In some embodiments,