US 20130130246A1
(19) United States
(12) Patent Application Publication, BENSIMON et al.
(10) Pub. No.: US 2013/0130246 A1
(43) Pub. Date: May 23, 2013

(54) METHODS FOR THE DETECTION, VISUALIZATION AND HIGH RESOLUTION PHYSICAL MAPPING OF GENOMIC REARRANGEMENTS IN BREAST AND OVARIAN CANCER GENES AND LOCI BRCA1 AND BRCA2 USING GENOMIC MORSE CODE IN CONJUNCTION WITH MOLECULAR COMBING

(71) Applicants: Aaron BENSIMON, Anthony (FR); Maurizio Ceppi, Issy-Les-Moulineaux (FR); Kevin Cheeseman, Champigny-Sur-Marne (FR); Emmanuel Conseiller, Paris (FR); Pierre Walrafen, Montrouge (FR)

(72) Inventors: Aaron BENSIMON, Anthony (FR); Maurizio Ceppi, Issy-Les-Moulineaux (FR); Kevin Cheeseman, Champigny-Sur-Marne (FR); Emmanuel Conseiller, Paris (FR); Pierre Walrafen, Montrouge (FR)

(21) Appl. No.: 13/665,404

(22) Filed: Oct. 31, 2012

Related U.S. Application Data

(60) Provisional application No. 61/553,906, filed on Oct. 31, 2011.

Publication Classification

(51) Int. Cl. C12Q 1/68 (2006.01)

(52) U.S. Cl. CPC ...... C12Q 1/6886 (2013.01); USPC ...... 435/6.11

(57) ABSTRACT

Methods for detecting genomic rearrangements in BRCA1 and BRCA2 genes at high resolution using Molecular Combing and for determining a predisposition to a disease or disorder associated with these rearrangements including predisposition to ovarian cancer or breast cancer. Primers useful for producing probes for this method and kits for practicing the methods.

Patent Application Publication May 23, 2013 Sheet 1 of 12 US 2013/0130246 A1

[Drawing sheets 1 through 12 of 12, including Fig. 1B on Sheet 2: images not reproduced.]


METHODS FOR THE DETECTION, VISUALIZATION AND HIGH RESOLUTION PHYSICAL MAPPING OF GENOMIC REARRANGEMENTS IN BREAST AND OVARIAN CANCER GENES AND LOCI BRCA1 AND BRCA2 USING GENOMIC MORSE CODE IN CONJUNCTION WITH MOLECULAR COMBING

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority to U.S. Provisional Application No. 61/553,906, filed Oct. 31, 2011, the entire contents of which are incorporated herein by reference. On Oct. 30, 2012, an International Application (PCT/IBF ; Submission number 1000168920) was also filed with the same title, the entire contents of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The invention relates to a method for detecting genomic rearrangements in BRCA1 and BRCA2 genes and loci at high resolution using Molecular Combing and relates to a method of determining a predisposition to diseases or disorders associated with these rearrangements including predisposition to ovarian cancer or breast cancer.

[0004] 2. Description of the Related Art

[0005] Breast cancer is the most common malignancy in women, affecting approximately 10% of the female population. Incidence rates are increasing annually and it is estimated that about 1.4 million women will be diagnosed with breast cancer annually worldwide and about 460,000 will die from the disease. Germline mutations in the hereditary breast and ovarian cancer susceptibility genes BRCA1 (MIM#113705) and BRCA2 (MIM#600185) are highly penetrant (King et al., 2003), (Nathanson et al., 2001). Screening is important for genetic counseling of individuals with a positive family history and for early diagnosis or prevention in mutation carriers. When a BRCA1 or BRCA2 mutation is identified, predictive testing is offered to all family members older than 18 years. If a woman tests negative, her risk becomes again the risk of the general population. If she tests positive, a personalized surveillance protocol is proposed: it includes mammographic screening from an early age, and possibly prophylactic surgery. Chemoprevention of breast cancer with anti-estrogens is also currently tested in clinical trial and may be prescribed in the future.

[0006] Most deleterious mutations consist of either small frameshifts (insertions or deletions) or point mutations that give rise to premature stop codons, missense mutations in conserved domains, or splice-site mutations resulting in aberrant transcript processing (Szabo et al., 2000). However, mutations also include more complex rearrangements, including deletions and duplications of large genomic regions that escape detection by traditional PCR-based mutation screening combined with DNA sequencing (Mazoyer, 2005).

[0007] Techniques capable of detecting these complex rearrangements include Southern blot analysis combined with long-range PCR or the protein truncation test (PTT), quantitative multiplex PCR of short fluorescent fragments (QMPSF) (Hofmann et al., 2002), real-time PCR, fluorescent DNA microarray assays, multiplex ligation-dependent probe amplification (MLPA) (Casilli et al., 2002), (Hofmann et al., 2002) and high-resolution oligonucleotide array comparative genomic hybridization (aCGH) (Rouleau et al., 2007), (Staaf et al., 2008). New approaches that provide both prescreening and quantitative information, such as qPCR-HRM and EMMA, have recently been developed and genomic capture combined with massively parallel sequencing has been proposed for simultaneous detection of small mutations and large rearrangements affecting 21 genes involved in breast and ovarian cancer (Walsh et al., 2010).

[0008] Molecular Combing is a powerful FISH-based technique for direct visualization of single DNA molecules that are attached, uniformly and irreversibly, to specially treated glass surfaces (Herrick and Bensimon, 2009), (Schurra and Bensimon, 2009). This technology considerably improves the structural and functional analysis of DNA across the genome and is capable of visualizing the entire genome at high resolution (in the kb range) in a single analysis. Molecular Combing is particularly suited to the detection of genomic imbalances such as mosaicism, loss of heterozygosity (LOH), copy number variations (CNV), and complex rearrangements such as translocations and inversions (Caburet et al., 2005), thus extending the spectrum of mutations potentially detectable in breast cancer genes. Molecular Combing has been successfully employed for the detection of large rearrangements in BRCA1 (Gad et al., 2001), (Gad et al., 2002a), (Gad et al., 2003) and BRCA2 (Gad et al., 2002b), using a first-generation "color bar coding" screening approach. However, these techniques lack resolution and cannot precisely detect large rearrangements in and around BRCA1 and BRCA2.

[0009] In distinction to the prior art techniques, as disclosed herein, the inventors provide a novel Genetic Morse Code Molecular Combing procedure that provides for high resolution visual inspection of genomic DNA samples, precise mapping of mutated exons, precise measurement of mutation size with robust statistics, simultaneous detection of BRCA1 and BRCA2 genetic structures or rearrangements, detection of genetic inversions or translocations, and substantial elimination of problems associated with repetitive DNA sequences such as Alu sequences in BRCA1 and BRCA2 loci.

BRIEF SUMMARY OF THE INVENTION

[0010] The BRCA1 and BRCA2 genes are involved, with high penetrance, in breast and ovarian cancer susceptibility. About 2% to 4% of breast cancer patients with a positive family history who are negative for BRCA1 and BRCA2 point mutations can be expected to carry large genomic alterations (deletion or duplication) in one of the two genes, and especially BRCA1. However, large rearrangements are missed by direct sequencing. Molecular Combing is a powerful FISH-based technique for direct visualization of single DNA molecules, allowing the entire genome to be examined at high resolution in a single analysis. A novel predictive genetic test based on Molecular Combing is disclosed herein. For that purpose, specific BRCA1 and BRCA2 "Genomic Morse Codes" (GMC) were designed, covering coding and non-coding regions and including large genomic portions flanking both genes. The GMC is a series of colored signals distributed along a specific portion of the genomic DNA which signals arise from probe hybridization with the probes of the invention. The concept behind the GMC has been previously defined in WIPO patent application WO/2008/028931 (which is incorporated by reference), and relates to

the method of detection of the presence of at least one domain of interest on a macromolecule to test.

[0011] A measurement strategy is disclosed for the GMC signals, and has been validated by testing 6 breast cancer patients with a positive family history and 10 control patients. Large rearrangements, corresponding to deletions and duplications of one or several exons and with sizes ranging from 3 kb to 40 kb, were detected on both genes (BRCA1 and BRCA2). Importantly, the developed GMC made it possible to unambiguously localize several tandem repeat duplications on both genes, and to precisely map large rearrangements in the problematic Alu-rich 5'-region of BRCA1. This newly developed Molecular Combing genetic test is a valuable tool for the screening of large rearrangements in BRCA1 and BRCA2 and can optionally be combined in clinical settings with an assay that allows the detection of point mutations.

[0012] A substantial technical improvement compared to the prior color bar coding approach is disclosed here that is based on the design of second-generation high-resolution BRCA1 and BRCA2 Genomic Morse Codes (GMC). Importantly, repetitive sequences were eliminated from the DNA probes, thus reducing background noise and permitting robust measurement of the color signal lengths within the GMC. Both GMC were statistically validated on samples from 10 healthy controls and then tested on six breast cancer patients with a positive family history of breast cancer. Large rearrangements were detected, with a resolution similar to the one obtained with aCGH (1-3 kb). The detected mutations demonstrate the robustness of this technology, even for the detection of problematic mutations, such as tandem repeat duplications or mutations located in genomic regions rich in repetitive elements. The developed Molecular Combing platform permits simultaneous detection of large rearrangements in BRCA1 and BRCA2, and provides novel genetic tests and test kits for breast and ovarian cancer.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] The patent or application file contains at least one drawing executed in color.

[0014] FIGS. 1A and 1B: Dot plot alignments of the BRCA1 and BRCA2 genomic regions. Dot plot matrix showing self-alignment of the 207-kb genomic regions derived from the BAC RP11-831F13 (chr17:41172482-41379594) encoding BRCA1 (1A), and the 172-kb genomic regions derived from the BAC RP11-486017 (chr13:32858070-33030569) encoding BRCA2 (1B), based on the GRCh37 genome assembly (also called hg19, April 2009 release) and using JDotter software (URL: http://athena.bioc.uvic.ca/tools/JDotter). The main diagonal represents alignment of the sequence with itself, while the lines out of the main diagonal represent similar or repetitive patterns within the sequence. The dark regions contain large numbers of repetitive sequences, whereas the bright regions contain none. The genes are represented as arrows in the 5'-->3' direction. The sizes and BAC coordinates of the genomic regions, encoding for repetitive sequences, not included in the DNA probes are indicated in the tables on the left. The bottom panels indicate the name and the size (in kb) of the DNA probes (35 for BRCA1 and 27 for BRCA2) without potentially disturbing repetitive sequences, derived from the bioinformatics analysis.

[0015] FIGS. 2A, 2B, 2C and 2D: In silico-generated Genomic Morse Codes designed for high-resolution physical mapping of the BRCA1 and BRCA2 genomic regions. Probe colors are represented here as grayscale variations: blue probes are shown as black boxes, green probes as white boxes and red probes as gray boxes. (2A) The complete BRCA1 GMC covers a genomic region of 200 kb and is composed of 18 signals (S1B1-S18B1) of a distinct color (green, red or blue). Each signal is composed of 1 (e.g., S2B1) to 3 small horizontal bars (e.g., S15B1), each bar corresponding to a single DNA probe. The region encoding the BRCA1 gene (81.2 kb) is composed of 7 "motifs" (g1b1-g7b1). Each motif is composed of 1 to 3 small horizontal bars and a black "gap" (no signal). (2B) Zoom-in on the BRCA1 gene-specific signals and relative positions of the exons. (2C) The complete BRCA2 GMC covers a genomic region of 172 kb and is composed of 14 signals (S1B2-S14B2) of a distinct color (green, red or blue). Each signal is composed of 1 (e.g., S14B2) to 5 small horizontal bars (e.g., S1B2). The region encoding the BRCA2 gene (84.2 kb) is composed of 5 motifs (g1b2-g5b2). Each motif is composed of 2 to 4 small horizontal bars and a black gap. (2D) Zoom-in on the BRCA2 gene-specific signals and relative positions of the exons. Deletions or insertions, if present, will appear in the region covered by the motifs.

[0016] FIGS. 3A and 3B: Validation of BRCA1 and BRCA2 Genomic Morse Code signals in control patients. Original microscopy images consist of three channel images where each channel is the signal from a given fluorophore; these are acquired separately in the microscopy procedure. These channels are represented here as different shades on a grayscale: blue probes are shown in black, green probes in white and red probes in dark gray, while background (absence of signal) is light gray. In diagrams, the same convention as in FIG. 2 is used. The aspect ratio was not preserved: signals have been "widened" (i.e., stretched perpendicularly to the direction of the DNA fiber) in order to improve the visibility of the probes. Typical BRCA1 (3A) and BRCA2 (3B) Genomic Morse Code signals and measured motif lengths (kb) in one control patient (absence of large rearrangements) are reported. The BRCA1 and BRCA2 signals obtained after microscopic visualization are shown at the top of the tables, including the position of the motifs related to the gene of interest. Typically 20 to 40 images (n images) were selected, and motifs were measured with GVLab software. For each motif, the following values were determined: the theoretical calculated length (calculated (kb)), the mean measured length (μ (kb)), the standard deviation (SD (kb)), the coefficient of variation (CV (%)), the difference between μ and calculated (delta), and the stretching factor (SF = (calculated/μ) x 2). In the absence of mutations, SF values are comprised between 1.8 and 2.2 and delta values are comprised between -1.9 kb and 1.9 kb (see Material and Methods in Example 1 for details).

[0017] FIGS. 4A, 4B, and 4C: Known BRCA1 large rearrangements detected in breast cancer patients. As in FIGS. 2 and 3, diagrams and microscopy images are represented in shades of gray, with the following correspondence: blue is shown as black, green as white and red as dark gray (on a light gray background) and aspect ratio in microscopy images may have been modified for clarity. DNA isolated from EBV-immortalized B lymphocytes collected from breast cancer patients was analyzed by Molecular Combing to confirm known large rearrangements previously characterized by aCGH (see Table 3). Three large rearrangements out of seven are shown in the figure: (4A) Dup ex 13 (case 1), visible as a tandem repeat duplication of the blue signal S7B1.

The g4B1 motif (16.5 kb) was first measured on a mixed population of 40 images, comprising wild type and mutated alleles, and the following values were obtained: μ(BRCA1wt + BRCA1mt signals) = 19 kb ± 3.5 kb, delta = 2.5 kb (duplication is confirmed since delta ≥ 2 kb). The images were then divided in two groups: 21 images were classified as BRCA1wt, and 19 images were classified as BRCA1mt. The size was then calculated as the difference between the motif mean sizes of the two alleles: μ(BRCA1wt) = 16.1 ± 1.6 kb, μ(BRCA1mt) = 22.2 ± 2.0 kb, mutation size μ(BRCA1mt) - μ(BRCA1wt) = 6.1 ± 1.6 kb. The bottom panel shows the MLPA fragment display (left) and the normalized MLPA results (right), arrows indicating exons interpreted as duplicated. (4B) Del ex 8-13 (case 6), visible as a deletion of the blue signal S7B1, including a large genomic portion between signals S7B1 and S8B1. The g4B1 (16.5 kb) and the g5b1 (19.7 kb) motifs were first measured on a mixed population of 23 images, yielding the following values. For g4b1: μ(BRCA1wt + BRCA1mt) = 17.5 ± 4.0 kb, delta = -2.2 kb (delta ≤ -2 kb); 13 images were then classified as BRCA1wt and 10 images as BRCA1mt: μ(BRCA1wt) = 20.8 ± 1.6 kb, μ(BRCA1mt) = 13.3 ± 1.1 kb, μ(BRCA1mt) - μ(BRCA1wt) = -7.5 ± 1.6 kb. For g5b1: μ(BRCA1wt + BRCA1mt) = 12.8 ± 5.5 kb, delta = -3.7 kb (delta ≤ -2 kb); 13 images were then classified as BRCA1wt and 10 images as BRCA1mt: μ(BRCA1wt) = 18.3 ± 1.3 kb, μ(BRCA1mt) = 5.8 ± 0.5 kb, μ(BRCA1mt) - μ(BRCA1wt) = -12.5 ± 1.0 kb. Total mutation size = mutation size g4B1 + mutation size g5b1 = -20 ± 2.8 kb. (4C) Del ex 2 (case 2), visible as a deletion of the green signal S10B1, as well as a large genomic portion of the 5' region upstream of BRCA1, including S11B1 and S12B1. To confirm the presence of the deletion in the BRCA1 gene, the g7B1 (17.7 kb) motif was first measured on a mixed population of 20 images, yielding the following values: μ(BRCA1wt + BRCA1mt) = 12.3 ± 2.9 kb, delta = -5.4 kb (deletion is confirmed since delta ≤ -2 kb). To measure mutation size within the BRCA1 gene, 11 images were then classified as BRCA1wt and 9 images as BRCA1mt, yielding the following values: μ(BRCA1wt) = 18.1 ± 0.7 kb, μ(BRCA1mt) = 8.1 ± 1.6 kb, mutation size μ(BRCA1mt) - μ(BRCA1wt) = -10 ± 1.5 kb. To include the deleted genomic region upstream of BRCA1 and determine the whole mutation size, we had to measure the genomic region between the signals S8B1 and S14B1 (89.9 kb). The S8B1-S14B1 region was first measured on 19 images, yielding the following values: μ(BRCA1wt + BRCA1mt) = 62.3 ± 18.4 kb, delta = -27.6 kb. 11 images were then classified as BRCA1wt, and 8 images as BRCA1mt, yielding the following values: μ(BRCA1wt) = 92.2 ± 3.2 kb, μ(BRCA1mt) = 51.4 ± 2.2 kb, mutation size μ(BRCA1mt) - μ(BRCA1wt) = -40.8 ± 3.5 kb. The BRCA1 signals, derived from both the wild type (BRCA1wt) and the mutated allele (BRCA1mt), obtained after microscopic visualization, are shown in the top panels. The position, nature (deletion or duplication) and size (in kb) of the detected large rearrangements are indicated in orange. The zoom-in on the BRCA1 gene-specific signals and the relative positions of the mutated exons are shown in the bottom panels. mt, mutated allele; wt, wild-type allele.

[0018] FIG. 5: GMC used for BRCA1. Another example of a high resolution genomic morse code to analyze the BRCA1 gene region is shown here. As in FIG. 2, diagrams are represented with the following correspondence: blue probes are shown as black, green as white and red as dark gray.

[0019] FIG. 6: Duplication in Exons 18-20 of BRCA1. The GMC described in FIG. 2, with probe labels modified as shown in the diagram, was hybridized on this sample. As in FIGS. 2 and 3, diagrams and microscopy images are represented in shades of gray, with the following correspondence: blue is shown as black, green as white and red as dark gray (on a light gray background) and aspect ratio in microscopy images may have been modified for clarity. By visual inspection, there appears to be a tandem duplication of the red signal S5B1. After measurement, the mutation was estimated to have a size of 6.7 ± 1.2 kb, restricted to a portion of the genome that encodes for exons 18 to 20. The estimated mutation size is fully in line with the 8.7 kb reported in the literature (Staaf, 2008). Details on the measurement and statistical analysis can be found in Example 1.

[0020] FIG. 7: examples of Alu sequences excluded from the BRCA1 (A) and BRCA2 (B) GMCs.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

[0021] Physical mapping: is the creation of a genetic map defining the position of particular elements, mutations or markers on genomic DNA, employing molecular biology techniques. Physical mapping does not require previous sequencing of the analyzed genomic DNA.

[0022] FISH: Fluorescent in situ hybridization.

[0023] Molecular Combing: a FISH-based technique for direct visualization of single DNA molecules that are attached, uniformly and irreversibly, to specially treated glass surfaces.

[0024] Predictive genetic testing: screening procedure involving direct analysis of DNA molecules isolated from human biological samples (e.g.: blood), used to detect gene mutations associated with disorders that appear after birth, often later in life. These tests can be helpful to people who have a family member with a genetic disorder, but who have no features of the disorder themselves at the time of testing. Predictive testing can identify mutations that increase a person's chances of developing disorders with a genetic basis, such as certain types of cancer.

[0025] Polynucleotides: This term encompasses naturally occurring DNA and RNA polynucleotide molecules (also designated as sequences) as well as DNA or RNA analogs with modified structure, for example, that increases their stability. Genomic DNA used for Molecular Combing will generally be in an unmodified form as isolated from a biological sample. Polynucleotides, generally DNA, used as primers may be unmodified or modified, but will be in a form suitable for use in amplifying DNA. Similarly, polynucleotides used as probes may be unmodified or modified polynucleotides capable of binding to a complementary target sequence. This term encompasses polynucleotides that are fragments of other polynucleotides such as fragments having 5, 10, 15, 20, 30, 40, 50, 75, 100, 200 or more contiguous nucleotides.

[0026] BRCA1 locus: This locus encompasses the coding portion of the human BRCA1 gene (gene ID: 672, Reference Sequence NM_007294) located on the long (q) arm of chromosome 17 at band 21, from base pair 41,196,311 to base pair 41,277,499, with a size of 81 kb (reference genome Build GRCh37/hg19), as well as its introns and flanking sequences. The following flanking sequences have been included in the BRCA1 GMC: the 102 kb upstream of the BRCA1 gene (from 41,277,500 to 41,379,500) and the 24 kb downstream of the BRCA1 gene (from 41,196,310 to 41,172,310). Thus the BRCA1 GMC covers a genomic region of 207 kb.
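As an illustrative check only, and not part of the disclosed method, the sizes quoted in the BRCA1 locus definition can be recomputed from the stated coordinates. The sketch below assumes the coordinates are GRCh37/hg19 base-pair positions on chromosome 17 as quoted above; the variable and function names are hypothetical and chosen for this example.

```python
# Illustrative arithmetic check; all names here are hypothetical.
# Coordinates are the chromosome 17 base-pair positions (GRCh37/hg19)
# quoted in the BRCA1 locus definition.
BRCA1_GENE = (41_196_311, 41_277_499)
BRCA1_UPSTREAM_FLANK = (41_277_500, 41_379_500)
BRCA1_DOWNSTREAM_FLANK = (41_172_310, 41_196_310)


def span_kb(interval):
    """Return the distance between the two endpoints of an interval, in kb."""
    start, end = interval
    return abs(end - start) / 1000.0


gene_kb = span_kb(BRCA1_GENE)
upstream_kb = span_kb(BRCA1_UPSTREAM_FLANK)
downstream_kb = span_kb(BRCA1_DOWNSTREAM_FLANK)
total_kb = gene_kb + upstream_kb + downstream_kb

print(f"gene: {gene_kb:.1f} kb")
print(f"upstream flank: {upstream_kb:.1f} kb")
print(f"downstream flank: {downstream_kb:.1f} kb")
print(f"GMC region: {total_kb:.1f} kb")
```

Under these assumptions the gene span comes to about 81.2 kb and the three segments sum to about 207 kb, which agrees with the rounded figures given in the definition.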

[0027] BRCA2 locus: This locus encompasses the coding portion of the human BRCA2 gene (gene ID: 675, Reference Sequence NM_000059.3) located on the long (q) arm of chromosome 13 at position 12.3 (13q12.3), from base pair 32,889,617 to base pair 32,973,809, with a size of 84 kb (reference genome Build GRCh37/hg19), as well as its introns and flanking sequences. The following flanking sequences have been included in the BRCA2 GMC: the 32 kb upstream of the BRCA2 gene (from 32,857,616 to 32,889,616) and the 56 kb downstream of the BRCA2 gene (from 32,973,810 to 33,029,810). Thus the BRCA2 GMC covers a genomic region of 172 kb.

[0028] Germline rearrangements: genetic mutations involving gene rearrangements occurring in any biological cells that give rise to the gametes of an organism that reproduces sexually, to be distinguished from somatic rearrangements occurring in somatic cells.

[0029] Point mutations: genetic mutations that cause the replacement of a single base nucleotide with another nucleotide of the genetic material, DNA or RNA. Often the term point mutation also includes insertions or deletions of a single base pair.

[0030] Frameshift mutations: genetic mutations caused by indels (insertions or deletions) of a number of nucleotides that is not evenly divisible by three from a DNA sequence. Due to the triplet nature of gene expression by codons, the insertion or deletion can change the reading frame (the grouping of the codons), resulting in a completely different translation from the original.

[0031] Tandem repeats duplications: mutations characterized by a stretch of DNA that is duplicated to produce two or more adjacent copies, resulting in tandem repeats.

[0032] Tandem repeat array: a stretch of DNA consisting of two or more adjacent copies of a sequence resulting in gene amplification. A single copy of this sequence in the repeat array is called a repeat unit. Gene amplifications occurring naturally are usually not completely conservative, i.e. in particular the extremities of the repeated units may be rearranged, mutated and/or truncated. In the present invention, two or more adjacent sequences with more than 90% homology are considered a repeat array consisting of equivalent repeat units. Unless otherwise specified, no assumptions are made on the orientation of the repeat units within a tandem repeat array.

[0033] Complex Rearrangements: any gene rearrangement that can be distinguished from simple deletions or duplications. Examples are translocations or inversions.

[0034] Probe: This term is used in its usual sense for a polynucleotide of the invention that hybridizes to a complementary polynucleotide sequence (target) and thus serves to identify the complementary sequence. Generally, a probe will be tagged with a marker, such as a chemical or radioactive marker that permits it to be detected once bound to its complement. The probes described herein are generally tagged with a visual marker, such as a fluorescent dye having a particular color such as blue, green or red dyes. Probes according to the invention are selected to recognize particular portions or segments of BRCA1 or BRCA2, their exons or flanking sequences. For BRCA1, probes generally range in length between 200 bp and 5,000 bp. For BRCA2, probes generally range in length between 200 bp and 6,000 bp. The name and the size of probes of the invention are described in FIG. 2. Representative probes according to the invention, such as BRCA1-1A (3,458 bp) or BRCA2-1 (2,450 bp), are described in Tables 1 and 2. In a particular embodiment of the invention, the probes are said to be "free of repetitive nucleotidic sequences". Such probes may be located in genomic regions of interest which are devoid of repetitive sequences as defined herein.

[0035] Detectable label or marker: any molecule that can be attached to a polynucleotide and which position can be determined by means such as fluorescent microscopy, enzyme detection, radioactivity, etc., or described in the US application nr. US2010/0041036A1 published on 18 Feb. 2010.

[0036] Primer: This term has its conventional meaning as a nucleic acid molecule (also designated sequence) that serves as a starting point for polynucleotide synthesis. In particular, primers may have 20 to 40 nucleotides in length and may comprise nucleotides which do not base pair with the target, providing sufficient nucleotides in their 3'-end, especially at least 20, hybridize with said target. The primers of the invention which are described herein are used to produce probes for BRCA1 or BRCA2, for example, a pair of primers is used to produce a PCR amplicon from a bacterial artificial chromosome as template DNA. The sequences of the primers used herein are referenced as SEQ ID 1 to SEQ ID 130 in Table 8. In some cases (details in Table 1), the primers contained additional sequences to these at their 5' end for ease of cloning. These additional sequences are SEQ ID 134 (containing a poly-A and a restriction site for AscI) for forward primers and SEQ ID 135 (containing a poly-A and a restriction site for PacI) for reverse primers.

[0037] Tables 1 and 2 and 8 describe representative primer sequences and the corresponding probe coordinates.

[0038] Genomic Morse Code(s): A GMC is a series of "dots" (DNA probes with specific sizes and colors) and "dashes" (uncolored spaces with specific sizes located between the DNA probes), designed to physically map a particular genomic region. The GMC of a specific gene or locus is characterized by a unique colored "signature" that can be distinguished from the signals derived by the GMCs of other genes or loci. The design of DNA probes for high resolution GMC requires specific bioinformatics analysis and the physical cloning of the genomic regions of interest in plasmid vectors. Low resolution CBC has been established without any bioinformatics analysis or cloning procedure. Repetitive nucleotidic sequences: the BRCA1 and BRCA2 gene loci contain repetitive sequences of different types: SINE, LINE, LTR and Alu. The repetitive sequences which are present in high quantity in the genome sequence but are absent from the probes, i.e. were removed from the BRCA1 and BRCA2 GMCs of the invention, are mainly Alu sequences, having lengths of about 300 bp (see FIGS. S1, S1, S2 and S3 for more details). This mainly means that the percentage of the remaining Alu-sequences within the DNA probes compared to percentage present in the reference genome is less than 10% and preferably less than 2%. Accordingly, a polynucleotide is said to be "free of repetitive nucleotidic sequences" when at least one type of repetitive sequences (e.g., Alu, SINE, LINE or LTR) selected from the types of repetitive sequences cited above is not contained in the considered probe, meaning that said probe contains less than 10%, preferably less than 2% compared to percentage present in the reference genome. Examples of Alu repeats found in the BRCA1 and BRCA2 genes are given in FIGS. 7A and 7B, while Tables 3 and 4 list the repeats identified by RepeatMasker contained in the BAC clone RP11-831F13 covering the genomic region of BRCA1 (FIG. 7A) or in the BAC clone

RP11-486O17 covering the genomic region of BRCA2 (FIG. 7B). In both cases, Alu repeats are counted separately in regions where our probes hybridize and in the regions excluded from this probe design.

0039. The term “intragenic large rearrangement” as used herein refers to deletion and duplication events that can be observed in a gene sequence, said sequence comprising in a restricted view introns and exons; and in an extended view introns, exons, the 5' region of said gene and the 3' region of said gene. The intragenic large rearrangement can also cover any gain or loss of genomic material with a consequence in the expression of the gene of interest.

0040. The term “locus” as used herein refers to a specific position of a gene or other sequence of interest on a chromosome. For BRCA1 and BRCA2, this term refers to the BRCA1 and BRCA2 genes together with their introns and flanking sequences.

0041. The term “nucleic acid” as used herein means a polymer or molecule composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, or compounds produced synthetically such as PNA which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions. Nucleic acids may be single- or double-stranded or partially duplex.

0042. The terms “ribonucleic acid” and “RNA” as used herein mean a polymer or molecule composed of ribonucleotides.

0043. The terms “deoxyribonucleic acid” and “DNA” as used herein mean a polymer or molecule composed of deoxyribonucleotides.

0044. The term “sample” as used herein relates to a material or mixture of materials, typically, although not necessarily, in fluid form, containing one or more components of interest. For Molecular Combing, the sample will contain genomic DNA from a biological source, for diagnostic applications usually from a patient. The invention concerns means, especially polynucleotides, and methods suitable for in vitro implementation on samples.

0045. The terms “nucleoside” and “nucleotide” are intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the terms “nucleoside” and “nucleotide” include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.

0046. The term “stringent conditions” as used herein refers to conditions that are compatible to produce binding pairs of nucleic acids, e.g., surface bound and solution phase nucleic acids, of sufficient complementarity to provide for the desired level of specificity in the assay while being less compatible to the formation of binding pairs between binding members of insufficient complementarity to provide for the desired specificity. Stringent assay conditions are the summation or combination (totality) of both hybridization and wash conditions.

0047. “Stringent hybridization” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization (e.g., as required for Molecular Combing or for identifying probes useful for GMC) are sequence dependent, and are different under different experimental parameters. Stringent hybridization conditions that can be used to identify nucleic acids within the scope of the invention can include for example hybridization in a buffer comprising 50% formamide, 5xSSC, and 1% SDS at 42° C., or hybridization in a buffer comprising 5xSSC and 1% SDS at 65° C., both with a wash of 0.2xSSC and 0.1% SDS at 65° C. Exemplary stringent hybridization conditions can also include a hybridization in a buffer of 40% formamide, 1 M NaCl, and 1% SDS at 37° C., and a wash in 1xSSC at 45° C. Alternatively, hybridization to filter-bound DNA in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65° C., and washing in 0.1xSSC/0.1% SDS at 68° C. can be employed. Yet additional stringent hybridization conditions include hybridization at 60° C. or higher and 3xSSC (450 mM sodium chloride/45 mM sodium citrate) or incubation at 42° C. in a solution containing 30% formamide, 1 M NaCl, 0.5% sodium sarcosine, 50 mM MES, pH 6.5. Those of ordinary skill will readily recognize that alternative but comparable hybridization and wash conditions can be utilized to provide conditions of similar stringency.

0048. A probe or primer located in a given genomic locus means a probe or a primer which hybridizes to the sequence in this locus of the genome. Generally, probes are double stranded and thus contain a strand that is identical to and another that is reverse complementary to the sequence of the given locus. A primer is single stranded and unless otherwise specified or indicated by the context, its sequence is identical to that of the given locus. When specified, the sequence may be reverse complementary to that of the given locus. In certain embodiments, the stringency of the wash conditions sets forth the conditions that determine whether a nucleic acid is specifically hybridized to a surface bound nucleic acid. Wash conditions used to identify nucleic acids may include for example a salt concentration of about 0.02 molar at pH 7 and a temperature of at least about 50° C. or about 55° C. to about 60° C.; or a salt concentration of about 0.15 M NaCl at 72° C. for about 15 minutes; or a salt concentration of about 0.2xSSC at a temperature of at least about 50° C. or about 55° C. to about 60° C. for about 15 to about 20 minutes; or, the hybridization complex is washed twice with a solution with a salt concentration of about 2xSSC containing 0.1% SDS at room temperature for 15 minutes and then washed twice by 0.1xSSC containing 0.1% SDS at 68° C. for 15 minutes; or, equivalent conditions. Stringent conditions for washing can also be for example 0.2xSSC/0.1% SDS at 42° C.

0049. A specific example of stringent assay conditions is rotating hybridization at 65° C. in a salt based hybridization buffer with a total monovalent cation concentration of 1.5 M followed by washes of 0.5xSSC and 0.1xSSC at room temperature.

0050. Stringent assay conditions are hybridization conditions that are at least as stringent as the above representative conditions, where a given set of conditions are considered to be at least as stringent if substantially no additional binding complexes that lack sufficient complementarity to provide for the desired specificity are produced in the given set of conditions as compared to the above specific conditions, where by “substantially no more” is meant less than about 5-fold more,

typically less than about 3-fold more. Other stringent hybridization conditions are known in the art and may be employed, as appropriate.

0051. “Sensitivity” describes the ability of an assay to detect the nucleic acid of interest in a sample. For example, an assay has high sensitivity if it can detect a small concentration of the nucleic acid of interest in a sample. Conversely, a given assay has low sensitivity if it only detects a large concentration of the nucleic acid of interest in a sample. A given assay's sensitivity is dependent on a number of parameters, including specificity of the reagents employed (such as types of labels, types of binding molecules, etc.), assay conditions employed, detection protocols employed, and the like. In the context of Molecular Combing and GMC hybridization, sensitivity of a given assay may be dependent upon one or more of the nature of the surface immobilized nucleic acids, the nature of the hybridization and wash conditions, the nature of the labeling system, the nature of the detection system, etc.

0052. Design of High-Resolution BRCA1 and BRCA2 Genomic Morse Codes

0053. Molecular Combing has already been used to detect large rearrangements in the BRCA1 and BRCA2 genes, but the hybridization DNA probes originally used were part of a low resolution “color bar coding” screening approach and were composed of cosmids, PACs and long-range PCR products only partially covering the BRCA1 and BRCA2 loci. Of importance, the DNA probes also encoded repetitive sequences particularly abundant at the two loci (Gad et al., 2001), (Gad et al., 2002b). As a consequence, detection of the probes often resulted in the superposition of individual colored signals (e.g., yellow spots resulting from superposition of green and red signals) and in strong background noise, undermining the quality of the images and preventing the development of a robust strategy to measure the signals length. Such a low resolution screening approach did not allow the unambiguous visualization of complex mutations, such as tandem repeat duplications (Schurra and Bensimon, 2009), (Herrick and Bensimon, 2009).

0054. The inventors found that high-resolution Genomic Morse Codes (GMC) that were designed by covering more of the BRCA1 and BRCA2 genomic regions and by removing the disturbing repetitive sequences from the DNA probes resolved the problems associated with the prior color bar coding approach.

0055. To visualize the repetitive sequences, dot-plot alignments of the BAC clones used for DNA probe cloning were first performed, based on the Genome Reference Consortium GRCh37 genome assembly (also called hg19, April 2009 release). Based on RepeatMasker analysis (www.repeatmasker.org), the percentages of Alu repetitive DNA in the BRCA1- and BRCA2-encoding BACs were 35% and 17%, respectively (data not shown). This resulted in a dark dot-plot matrix dense in repetitive sequences for BRCA1 (1.6 Alu sequences per 1 kb of DNA, compared to an average in the human genome of only 0.25 Alu/kb), and a brighter dot-plot matrix for BRCA2 (0.64 Alu/kb of DNA) (FIGS. 1A and 1B).

0056. 35 genomic regions in the BRCA1 locus and 27 regions in the BRCA2 locus that had significantly less repetitive sequences were identified and were used to design and clone DNA hybridization probes compatible with the visualization process associated with Molecular Combing. The name, size and color of the DNA hybridization probes, and the exons covered by the probes, are shown in FIG. 1 and listed in Tables 1 (BRCA1) and 2 (BRCA2). Adjacent DNA probes of the same color form a signal. Thus, a Genomic Morse Code is composed of sequences of colored signals distributed along a specific portion of the genomic DNA. Colors were chosen to create unique non-repetitive sequences of signals, which differed between BRCA1 and BRCA2. The sizes and the BAC coordinates of the genomic regions, encoding for repetitive sequences, excluded from the BRCA1/BRCA2 GMC DNA probes are shown in Tables 3 & 4. 257 Alu sequences were excluded from the BRCA1 GMC and 85 Alu sequences were excluded from the BRCA2 GMC. Examples of removed Alu sequences from both GMCs are shown in FIG. 7.

0057. To facilitate Genomic Morse Code recognition and measurement, signals located on the genes were grouped together in specific patterns called “motifs”. An electronic reconstruction of the designed BRCA1 and BRCA2 Genomic Morse Codes is shown in FIG. 2. In this design, the BRCA1 Genomic Morse Code covers a region of 200 kb, including the upstream genes NBR1, NBR2, LOC100133166, and TMEM106A, as well as the pseudogene ψBRCA1. The complete BRCA1 Genomic Morse Code is composed of 18 signals (S1B1-S18B1), and the 8 BRCA1-specific signals are grouped together in 7 motifs (g1b1-g7b1) (FIGS. 2A and 2B). The BRCA2 Genomic Morse Code covers a genomic region of 172 kb composed of 14 signals (S1B2-S14B2), and the 7 BRCA2-specific signals are grouped together in 5 motifs (g1b2-g5b2) (FIGS. 2C and 2D). Deletions or insertions, if present, are detected in the genomic regions covered by the motifs.

0058. Validation of BRCA1 and BRCA2 Genomic Morse Code Signals in Control Patients

0059. The newly designed Genomic Morse Codes were first validated on genomic DNA isolated from 10 randomly chosen control patients. Typical visualized signals and measured motif lengths for one control donor are reported in FIG. 3, with BRCA1 at the top and BRCA2 at the bottom. For each Genomic Morse Code, 20 to 30 images were typically analyzed by measuring the length of the different motifs (see nr. images in FIG. 3). Importantly, for all the motifs, the measured values were always similar to the calculated values (compare μ and calculated in FIG. 3). The robustness of BRCA1 and BRCA2 signal measurement was determined by calculating the mean of the measured motif lengths in all 10 control patients, and by comparing the mean measured values with the calculated values (see Table S1). For BRCA1, we obtained delta values (difference between μ and calculated) in the range of -0.2 kb and +0.8 kb, whereas BRCA2 delta values were in the range of -0.3 kb and +0.4 kb, underlining the precision of the developed measurement approach and confirming that the resolution of Molecular Combing is around ±1 kb (Michalet et al., 1997). Molecular Combing allows DNA molecules to be stretched uniformly with a physical distance to contour length correlation of 1 μm, equivalent to 2 kb (Michalet et al., 1997). As a consequence, in the absence of large rearrangements, the derived stretching factor (SF) has a value close to 2 kb/μm (±0.2). This was confirmed in all the analyzed control donors, with SF values in the range of 1.8-2.2 kb/μm (see SF in FIG. 3). Accordingly, in the presence of large rearrangements in both BRCA1 and BRCA2, SF values are expected to be ≥2.3 kb/μm (for deletions) or ≤1.7 kb/μm (for duplications) and the corresponding delta values are expected to be ≥2 kb (for duplications) or ≤-2 kb (for deletions). Importantly, the presence of a large rearrangement is always validated by visual inspection of the corresponding Genomic Morse Code.

0060. Detection of Known BRCA1 Large Rearrangements in Breast Cancer Patients

different populations (Sluiter 2010). The rearrangement reported here has been described three times with an identical size (36934 bp). The hotspot for recombination is explained by the presence of ψBRCA1.
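The delta and stretching-factor criteria stated for control and rearranged samples can be written as a short calculation. The following Python sketch is illustrative only and is not part of the disclosed method. It assumes delta = μ - calculated and SF = (calculated/μ) x 2, as given in the Statistical Analysis section, and uses the thresholds quoted in the text (delta ≥2 kb or ≤-2 kb; SF ≤1.7 or ≥2.3 kb/μm; control SF range 1.8-2.2 kb/μm). The names `MotifCall` and `classify_motif`, the decision to base the call on delta and to report agreement of SF separately, and the example motif lengths are assumptions made for illustration.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; how they are combined below is an assumption.
DELTA_THRESHOLD_KB = 2.0         # |delta| >= 2 kb suggests a large rearrangement
SF_DUPLICATION_MAX = 1.7         # kb/um, expected for duplications
SF_DELETION_MIN = 2.3            # kb/um, expected for deletions
SF_NORMAL_RANGE = (1.8, 2.2)     # kb/um, observed in control donors


@dataclass
class MotifCall:
    delta_kb: float              # mean measured length minus calculated length
    stretching_factor: float     # (calculated / mean measured) * 2
    call: str                    # candidate interpretation of the motif
    sf_supports_call: bool       # True if SF falls in the range expected for the call


def classify_motif(mean_measured_kb: float, calculated_kb: float) -> MotifCall:
    """Compare a motif's mean measured length with its calculated length."""
    if mean_measured_kb <= 0 or calculated_kb <= 0:
        raise ValueError("motif lengths must be positive")

    delta = mean_measured_kb - calculated_kb
    sf = (calculated_kb / mean_measured_kb) * 2.0

    if delta >= DELTA_THRESHOLD_KB:
        call = "duplication candidate"
        supported = sf <= SF_DUPLICATION_MAX
    elif delta <= -DELTA_THRESHOLD_KB:
        call = "deletion candidate"
        supported = sf >= SF_DELETION_MIN
    else:
        call = "within normal range"
        supported = SF_NORMAL_RANGE[0] <= sf <= SF_NORMAL_RANGE[1]

    return MotifCall(delta, sf, call, supported)


if __name__ == "__main__":
    # Hypothetical motif lengths in kb: (mean measured, calculated).
    for measured, calculated in [(16.1, 10.0), (13.3, 40.0), (10.5, 10.0)]:
        print(measured, calculated, classify_motif(measured, calculated))
```

Any candidate call produced this way would still need to be confirmed by visual inspection of the corresponding Genomic Morse Code, as the text requires.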
0061. Molecular Combing was then applied to 6 samples from patients with a severe family history of breast cancer and known to bear large rearrangements either on BRCA1 or BRCA2 (preliminary screening performed by MLPA or QMPSF). Importantly, the Molecular Combing analysis was a blind test, meaning that for each of the patients the identity of the mutation was unknown before the test, since it was revealed to the operator only after having completed the test on all the samples. 6 different large rearrangements were identified (see Table 5). Importantly, all 6 known mutations have been recently characterized by aCGH and break-point sequencing (Rouleau 2007) and were correctly identified and characterized by Molecular Combing. Complete characterization of the 3 most significant known BRCA1 large rearrangements is reported in FIG. 4 and is described here below.

0062. Duplication of Exon 13 (BRCA1)

0063. By visual inspection via Molecular Combing, this mutation appears as a partial tandem duplication of the blue signal S7B1 (FIG. 4A, top panel). After measurement, the mutation was estimated to have a size of 6.1 kb, restricted to a portion of the DNA probe BRCA1-8 that encodes exon 13. The estimated mutation size is fully in line with the 6.1 kb reported in the literature (Puget 1999), and according to the Breast Cancer Information Core database, this mutation belongs to the 10 most frequent mutations in BRCA1 (Szabo 2000). Duplications are difficult to detect with quantitative methods such as MLPA, often giving rise to false-positive signals (Cavalieri 2007, Staaf 2008). The characterized patient was therefore also analyzed by MLPA, and the duplication of exon 13 was confirmed. More importantly, we also detected a duplication of exons 1A-1B (FIG. 4A, bottom panel), but this mutation could not be detected by Molecular Combing (a duplication of exons 1A-1B, if present, would yield two distinct S10B1 signals). Therefore, we consider the exon 1A+1B mutation detected by MLPA to be a false-positive signal. The risk of false-positive signals is more limited in Molecular Combing.

0064. Deletion from Exon 8 to Exon 13 (BRCA1)

0065. By visual inspection, the mutation appeared as a deletion of the blue signal S7B1, including a large genomic portion between signals S7B1 and S8B1 (FIG. 4B). After measurement, the mutation was estimated to have a size of 26.7 kb in a portion of the BRCA1 gene that encodes from exon 8 to exon 13. The size reported in the literature is 23.8 kb, and this is a recurrent mutation in the French population (Mazoyer 2005, Rouleau 2007).

0066. Deletion of the 5' Region to Exon 2 (BRCA1)

0067. By visual inspection, the mutation appeared as a deletion of the green signal S10B1, as well as a large genomic portion of the 5' region upstream of BRCA1, including S11B1 and S12B1 (FIG. 4C). After measurement, the mutation was estimated to have a size of 37.1 kb, encompassing the portion of the BRCA1 gene that encodes exon 2, the entire NBR2 gene (signal S11B1), the genomic region between NBR2 and the pseudogene ψBRCA1 (signal S12B1), and a portion of ψBRCA1 (signal S13B1). Importantly, the reported size of this type of rearrangement is highly variable, originally in the range of 13.8 to 36.9 kb (Mazoyer 2005) and more recently between 40.4 and 58.1 kb (Rouleau 2007). Six different exon 1-2 deletions have been reported, 16 times, in a number of

Molecular combing proved capable of characterizing events even in this highly homologous region.

0068. The results reported herein disclose and exemplify the development of a novel genetic test based on Molecular Combing for the detection of large rearrangements in the BRCA1 and BRCA2 genes. Large rearrangements represent 10-15% of deleterious germline mutations in the BRCA1 gene and 1-7% in the BRCA2 gene (Mazoyer, 2005). Specific high-resolution GMC were designed and were tested on a series of 16 biological samples; the robustness of the associated measurement strategy was statistically validated on 10 control samples, and 6 different large rearrangements were detected and characterized in samples from patients with a severe family history of breast cancer. The robustness of the newly designed GMC, devoid of repetitive sequences, is endorsed by the fact that our Molecular Combing method confirmed the results obtained with high-resolution zoom-in aCGH (11 k) on the same samples (Rouleau et al., 2007), with a resolution in the 1-2 kb range.

0069. Tandem repeat duplications are the most difficult large rearrangements to detect. Contrary to other techniques, such as aCGH and MLPA, the capacity of Molecular Combing to visualize hybridized DNA probes at high resolution permits precise mapping and characterization of tandem repeat duplications, as shown here in case 1 (BRCA1 Dup Ex 13). aCGH can be used to determine the presence and size of duplications, but not the exact location and orientation of tandem repeat duplications. In PCR-based techniques such as MLPA, duplications are considered to be present when the ratio between the number of duplicated exons in the sample carrying a mutation and the number of exons in the control sample is at least 1.5, reflecting the presence of 3 copies of a specific exon in the mutated sample and 2 copies in the wild-type sample. The ratio of 1.5 is difficult to demonstrate unambiguously by MLPA, which often gives false-positive signals, as observed in case 1 (BRCA1 Dup Ex 13). The limits of MLPA have been underlined in several recent studies (Cavalieri et al., 2008), (Staaf et al., 2008). MLPA is limited to coding sequences and can also give false-negative scores, due to the restricted coverage of the 21 probes (Cavalieri et al., 2008). In addition, MLPA provides only limited information on the location of deletion or duplication breakpoints in the usually very large intronic or affected flanking regions, thus necessitating laborious mapping for sequence characterization of the rearrangements. Staaf et al. recently suggested that MLPA should be regarded as a screening tool that needs to be complemented by other means of mutation characterization, such as aCGH (Staaf et al., 2008). We propose Molecular Combing as such a replacement technology for MLPA or aCGH, as it unambiguously identifies and visualizes duplications.

0070. Another advantage of Molecular Combing as disclosed herein was its capacity to cover non-coding regions, including the 5' region of the BRCA1 gene and the genomic region upstream of BRCA1 that comprises the NBR2 gene, the BRCA1 pseudogene and the NBR1 gene. Recent studies show that it is very difficult to design exploitable PCR or aCGH probes in this rearrangement-prone genomic region (Rouleau et al., 2007), (Staaf et al., 2008), because of the presence of duplicated regions and the high density of Alu repeats. Genomic rearrangements typically arise from unequal homologous recombination between short interspersed nuclear elements (SINEs), including Alu repeats, long interspersed nuclear elements (LINEs), or simple repeat sequences.

0071. Molecular Combing permits precise physical mapping within these difficult regions, as shown here in cases 3 and 2 (BRCA1 Del Ex 2), where we measured mutation sizes of 38.5 kb and 37.1 kb, respectively. As cases 3 and 2 belong to the same family, the detected mutation was the same in both cases, as confirmed by aCGH (Rouleau et al., 2007). The measurement difference of 1.4 kb between these two cases is acceptable, being within the 1-2 kb definition range of the molecular combing assay. The mutation was originally described by Puget et al., who determined the mutation size (37 kb) with a first-generation molecular combing “color bar coding” screening method (Puget et al., 2002). Size estimated with aCGH was in the 40.4-58.1 kb range, because of the low density of exploitable oligonucleotide sequences in this genomic region and the reduced sensitivity of some oligonucleotides due to (Rouleau et al., 2007). Molecular combing can therefore be used for the analysis of hard-to-sequence genomic regions that contain large numbers of repetitive elements. Here we demonstrate that the high concentration of Alu sequences in BRCA1 does not represent an obstacle for molecular combing.

0072. Detection of Previously Uncharacterized BRCA1 Large Rearrangements in Breast Cancer Patients

0073. Further samples were tested, and we characterized by Molecular Combing rearrangements which other techniques had failed to accurately describe. One such example is detailed below.

0074. Triplication of Exons 1a, 1b and 2 of BRCA1 and a Portion of NBR2.

0075. We analyzed sample #7 (provided by the Institut Claudius Régaud, Toulouse, France) by Molecular Combing, using the set of probes described in FIG. 5. By visual inspection, two alleles of the BRCA1 gene were identified, differing in the length of the motif g7b1 which extends from the end of the S9B1 probe to the opposite end of the S11B1 probe. The mutation appears to be a triplication involving portions of the SYNT1 probe (SEQ ID 133) and the S10B1 probe, as was confirmed in probe color swapping experiments. This triplication of a DNA segment with a size comprised between 5 and 10 kb involves exons 1a, 1b and 2 of the BRCA1 gene and possibly part of the 5' extremity of the NBR2 gene.

0076. Such a triplication has not been reported in this genomic region yet. This may be due to the previous lack of relevant technologies to detect the mutation. Therefore, we designed tests specific to this mutation. These tests may be used to screen for this triplication or to confirm this triplication in samples where a rearrangement is suspected in this region. There are several types of possible tests, such as PCR, quantitative PCR (qPCR), MLPA, aCGH, sequencing ...

0077. Results of quantification techniques, which provide a number of copies of a given sequence (qPCR, MLPA, aCGH, ...) will not provide direct assessment of the tandem nature of the additional copies of the sequence. The triplication reported here may be suspected when sequences within exons 1a, 1b and/or 2 of BRCA1 and/or the sequences between these exons are present in multiple (more than two per diploid genome) copies. Generally speaking, when these results are above the threshold determined for duplicated sequence (which have three copies in total of the duplicated sequence), the sample should be suspected to bear a triplication on a single allele (rather than duplications of the sequence in two separate alleles). Confirmation of the triplication and its tandem nature may be obtained either through a PCR test or through a Molecular Combing test as described in this and the examples section.

0078. As this is a more direct method, we detail some PCR designs here, in the example sections. The man skilled in the art may adapt these tests through common, generally known, molecular biology methods, e.g. by modifying primer locations within the sequence ranges mentioned, and/or modifying experimental conditions (annealing temperature, elongation time, ... for PCR). Also, these tests may be included in “multiplex” tests where other mutations are also sought. For example, one or several pair(s) of primers designed to detect the triplication and described below may be used simultaneously with one or several other pair(s) of primers targeting distinct amplicons. In addition to these adaptations, several common variants exist for the molecular tests described. Nevertheless, these variants remain functionally identical to the described tests and the adaptation of our designs to these variants is easily achievable by the man skilled in the art. For example, sequencing may be replaced by targeted resequencing, where the region of interest is isolated from other genomic regions before the sequencing step, so as to increase coverage in the region of interest. As another example, semi-quantitative PCR, where DNA quantity after amplification is assessed by common agarose electrophoresis, may replace QMPSF.

0079. These results demonstrate that the developed Molecular Combing platform is a valuable tool for genetic screening of tandem repeat duplications, CNVs, and other complex rearrangements in BRCA1 and BRCA2, such as translocations and inversions, particularly in high-risk breast cancer families.

0080. A prominent application of the developed molecular diagnostic tool is as a predictive genetic test. However, the methods and tools disclosed herein may be applied as or in a companion diagnostic test, for instance, for the screening of BRCA-mutated cells in the context of the development of PARP inhibitors. Such a genetic test can be applied not only to clinical blood samples, but also to circulating cells and heterogeneous cell populations, such as tumor tissues.

EXAMPLES

Example 1

Materials and Methods

0081. Preliminary Patient Screening

0082. The Genomic Morse Code was validated on 10 samples from patients with no deleterious mutations detected in BRCA1 or BRCA2 (control patients). The genetic test was validated on 6 samples from patients with positive family history of breast cancer and known to bear large rearrangements affecting either BRCA1 or BRCA2. Total human genomic DNA was obtained from EBV-immortalized lymphoblastoid cell lines. Preliminary screening for large rearrangements was performed with the QMPSF assay (Quantitative Multiplex PCR of Short Fluorescent Fragments) in the conditions described by Casilli et al. and Tournier et al. (Casilli et al., 2002) or by means of MLPA (Multiplex Ligation Dependent Probe Amplification) using the SALSA MLPA kits P002 (MRC Holland, Amsterdam, The Netherlands) for

BRCA1 and P045 (MRC-Holland) for BRCA2. All 16 patients gave their written consent for BRCA1 and BRCA2 analysis.

0083. Molecular Combing

0084. Sample Preparation

0085. Total human genomic DNA was obtained from EBV-immortalized lymphoblastoid cell lines. A 45-μL suspension of 10^6 cells in PBS was mixed with an equal volume of 1.2% Nusieve GTG agarose (Lonza, Basel, Switzerland) prepared in 1xPBS, previously equilibrated at 50° C. The plugs were left to solidify for 30 min at 4° C., then cell membranes were solubilised and proteins digested by an overnight incubation at 50° C. in 250 μL of 0.5 M EDTA pH 8.0, 1% Sarkosyl (Sigma-Aldrich, Saint Louis, Mo., USA) and 2 mg/mL proteinase K (Eurobio, Les Ulis, France), and the plugs were washed three times at room temperature in 10 mM Tris, 1 mM EDTA pH 8.0. The plugs were then either stored at 4° C. in 0.5 M EDTA pH 8.0 or used immediately. Stored plugs were washed three times for 30 minutes in 10 mM Tris, 1 mM EDTA pH 8.0 prior to use.

0086. Probe Preparation

0087. All BRCA1 and BRCA2 probes were cloned into pCR2.1-Topo or pCR-XL-Topo (Invitrogen) plasmids by TOPO cloning, using PCR amplicons as inserts. Amplicons were obtained using bacterial artificial chromosomes (BACs) as template DNA. The following BACs were used: for BRCA1, the 207-kb BAC RP11-831F13 (chr17:41172482-41379594, Invitrogen, USA); and for BRCA2, the 172-kb BAC RP11-486O17 (chr13:32858070-33030569, Invitrogen, USA). See Tables 1 and 2 for primer sequences and probe coordinates. Primer sequences are referenced as SEQ ID 1 to SEQ ID 130. In some cases (as detailed in Table 1), additional artificial sequences were added to the 5' end of the primer for ease of cloning. These artificial sequences are SEQ ID 134 (ForwardPrimerPrefix) for forward primers and SEQ ID 135 (ReversePrimerPrefix) for reverse primers, both containing a poly-A and a restriction site for, respectively, AscI and PacI.

0088. SEQ ID 131 (BRCA1-1A), SEQ ID 132 (BRCA1-1B) and SEQ ID 133 (BRCA1-SYNT1) are examples of probe sequences.

0089. Whole plasmids were used as templates for probe labeling by random priming. Briefly, for biotin (Biota) labeling, 200 ng of template was labeled with the DNA Bioprime kit (Invitrogen) following the manufacturer's instructions, in an overnight labeling reaction. For Alexa-488 (A488) or digoxigenin (Dig) labeling, the same kit and protocol were used, but the dNTP mixture was modified to include the relevant labeled dNTP, namely Dig-11-dUTP (Roche Diagnostics, Meylan, France) or A488-7-OBEA dCTP (Invitrogen) and its unlabelled equivalent, both at 100 μM, and all other dNTPs at 200 μM. Labeled probes were stored at -20° C. For each coverslip, 5 μL of each labeled probe (1/10th of a labeling reaction product) was mixed with 10 μg of human Cot-1 and 10 μg of herring sperm DNA (both from Invitrogen) and precipitated in ethanol. The pellet was then resuspended in 22 μL of 50% formamide, 30% Blocking Aid (Invitrogen), 1xSSC, 2.5% Sarkosyl, 0.25% SDS, and 5 mM NaCl.

0090. Genomic DNA Combing and Probe Hybridization

0091. Genomic DNA was stained by 1 h incubation in 40 mM Tris, 2 mM EDTA containing 3 μM Yoyo-1 (Invitrogen, Carlsbad, Calif., USA) in the dark at room temperature. The plug was then transferred to 1 mL of 0.5 M MES pH 5.5, incubated at 68° C. for 20 min to melt the agarose, and then incubated at 42° C. overnight with 1.5 U beta-agarase I (New England Biolabs, Ipswich, Mass., USA). The solution was transferred to a combing vessel already containing 1 mL of 0.5 M MES pH 5.5, and DNA combing was performed with the Molecular Combing System on dedicated coverslips (Combicoverslips) (both from Genomic Vision, Paris, France).

0092. Combicoverslips with combed DNA were then baked for 4 h at 60° C. The coverslips were either stored at -20° C. or used immediately for hybridisation. The quality of combing (linearity and density of DNA molecules) was estimated under an epi-fluorescence microscope equipped with an FITC filter set and a 40x air objective. A freshly combed coverslip was mounted in 20 μL of a 1 mL ProLong-gold solution containing 1 μL of Yoyo-1 solution (both from Invitrogen). Prior to hybridisation, the coverslips were dehydrated by successive 3 minutes incubations in 70%, 90% and 100% ethanol baths and then air-dried for 10 min at room temperature. The probe mix (20 μL; see Probe Preparation) was spread on the coverslip, and then left to denature for 5 min at 90° C. and to hybridise overnight at 37° C. in a hybridizer (Dako). The coverslip was washed three times for 5 min in 50% formamide, 1xSSC, then 3x3 min in 2xSSC.

0093. Detection was performed with two or three successive layers of fluorophore- or streptavidin-conjugated antibodies, depending on the modified nucleotide employed in the random priming reaction (see above). For the detection of biotin labeled probes the antibodies used were Streptavidin-A594 (Invitrogen, Molecular Probes) for the 1st and 3rd layer, biotinylated goat anti-Streptavidin (Vector Laboratories) for the 2nd layer; for the detection of A488-labelled probes the antibodies used were rabbit anti-A488 (Invitrogen, Molecular Probes) for the 1st and goat anti-rabbit A488 (Invitrogen, Molecular Probes) for the 2nd layer; for the detection of digoxigenin labeled probes the antibodies used were mouse anti-Dig (Jackson Immunoresearch) for the 1st layer, rat anti-mouse AMCA (Jackson Immunoresearch) for the 2nd layer and goat anti-mouse A350 (Invitrogen, Molecular Probes) for the 3rd layer.

0094. A 20 minute incubation step was performed at 37° C. in a humid chamber for each layer, and three successive 3 minutes washes in 2xSSC, 0.1% Tween at room temperature between layers. Three additional 3 minutes washes in PBS and dehydration by successive 3 minutes washes in 70%, 90% and 100% ethanol were performed before mounting the coverslip.

0095. Image Acquisition

0096. Image acquisition was performed with a customized automated fluorescence microscope (Image Xpress Micro, Molecular Devices, Sunnyvale, Calif., USA) at 40x magnification, and image analysis and signal measurement were performed with the software ImageJ (http://rsbweb.nih.gov/ij) and JMeasure (Genomic Vision, Paris, France). Hybridisation signals corresponding to the BRCA1 and BRCA2 probes were selected by an operator on the basis of specific patterns made by the succession of probes. For all motif signals belonging to the same DNA fibre, the operator set the ends of the segment and determined its identity and length (kb), on a 1:1 scale image. The data were then output as a spreadsheet. In the final analysis, only intact motif signals were considered, confirming that no fibre breakage had occurred within the BRCA1 or BRCA2 motifs.

0097. Statistical Analysis

0098. Molecular Combing allows DNA molecules to be stretched uniformly with a physical distance to contour length

correlation of 1 µm, equivalent to 2 kb (Michalet et al., 1997). As a consequence, in the absence of large rearrangements, the derived stretching factor (SF) has a value close to 2 kb/µm (±0.2).

[0099] All 7 BRCA1 motifs (g1b1-g7b1) and all 5 BRCA2 motifs (g1b2-g5b2) were measured in all 20 biological samples. The mean size of all motifs measured in the 10 healthy controls, including the associated statistical analysis, is reported in Table S1. The size of all motifs measured in the 6 breast cancer patients, including the associated statistical analysis, is reported in Table S2. For each motif, the following values were determined: the number of measured images (n), the theoretical calculated length (calculated (kb)), the mean measured length (µ (kb)), the standard deviation (SD (kb)), the coefficient of variation (CV (%)), the difference between µ and calculated (delta), and the stretching factor (SF = (calculated/µ) x 2) (Michalet et al., 1997). In the absence of mutations, delta values are comprised between -1.9 kb and 1.9 kb, and SF values are comprised between 1.8 and 2.2. The presence of a large rearrangement on BRCA1 or BRCA2 was first identified by visual inspection of the corresponding GMC. From numerous datasets, we established that in the presence of large rearrangements in both BRCA1 and BRCA2, delta ≥ 2 kb (for duplications) or delta ≤ -2 kb (for deletions), and the corresponding SF ≥ 2.3 kb/µm (for deletions) or SF ≤ 1.7 kb/µm (for duplications). To confirm the presence of a large rearrangement, the motif(s) of interest was (were) first measured on a total population of images (typically between 20 and 40), comprising wild-type (wt) and mutated (mt) alleles. In the presence of large rearrangements, and aiming to measure the mutation size, the images were then divided in two groups, corresponding to the wt and the mt alleles. Within each of the two groups of n images, the following values were calculated: µ (kb), SD (kb), CV (%). The µ value of the wild-type allele was then compared with the µ value of the mutated allele. To this aim, we calculated the standard error of the mean (SEM = SD/√n) and the 95% confidence interval (95% CI = µ ± 2 x SEM). The mutation size was then calculated as the difference between the mean sizes of the two alleles, µ(BRCA1wt) and µ(BRCA1mt). The related error was calculated according to the following formula: [formula not recoverable from the source]

Example 2

Comparison of Genetic Morse Code and Molecular Combing of the Invention to Prior Color Bar Code Procedure

[0100] Part 1. Previous Application of Molecular Combing on Characterization of BRCA1 and BRCA2 Large Rearrangements: Design of Low Resolution Color Bar Codes (CBCs)

[0101] Molecular Combing has already been used by Gad et al. (Gad GenChrCan 2001, Gad JMG 2002) to detect large rearrangements in the BRCA1 and BRCA2 genes. The hybridization DNA probes originally used were part of a low resolution "color bar coding" screening approach composed of cosmids, PACs and long-range PCR products. Some probes were small and ranged from 6 to 10 kb, covering a small fraction of the BRCA1 and BRCA2 loci. Other probes were very big (PAC 103.014 measuring 120 kb for BRCA1 and BAC 486017 measuring 180 kb for BRCA2) and were covering the whole loci, including all the repetitive sequences. Thus, no bioinformatic analysis to identify potentially disturbing repetitive sequences has ever been performed. More importantly, no repetitive sequence has ever been excluded from the design of the CBCs. This often resulted in incomplete characterizations of the screened mutations (see Part 3). As a consequence, detection of the probes often resulted in the superposition of individual colored signals (e.g., yellow/white spots resulting from superposition of different colored signals) and in strong background noise, undermining the quality of the images and preventing the development of a robust strategy to measure the signal lengths. In addition, no DNA probe was isolated and cloned in an insert vector. The BRCA1 Color Bar Code (CBC) was composed of only 7 DNA probes (Gad et al., Genes Chromosomes and Cancer 31:75-84 (2001)), whereas the BRCA2 CBC was composed of only 8 DNA probes (Gad et al., J Med Genet (2002)). This low number of DNA probes did not allow high resolution physical mapping.

[0102] Importantly, such a low resolution screening approach did not allow the unambiguous visualization of complex mutations, such as tandem repeat duplications or triplications. In contrast, full characterization of tandem repeat duplications and triplications is possible with the high resolution GMC (see Example 1). Moreover, the accurate physical mapping of all the mutated exons was often problematic, requiring additional laborious sequencing experiments. This often resulted in incomplete characterizations of the screened mutations (see Part 3).

[0103] Part 2. New Application of Molecular Combing on Characterization of BRCA1 and BRCA2 Large Rearrangements: Design of High Resolution Genomic Morse Codes (GMCs) and Development of a Genetic Test.

[0104] An important point of novelty for the present invention is the design and cloning of high-resolution Genomic Morse Codes (GMC) for both BRCA1 and BRCA2 genomic regions. The BRCA1 GMC is composed of 35 DNA probes (FIG. 1), whereas the BRCA2 GMC is composed of 27 DNA probes (FIG. 2).

[0105] Comparative FIG. 1: in-silico generated (top) and microscopy observed (bottom) high resolution BRCA1 GMC.

[0106] Comparative FIG. 2: in-silico generated (top) and microscopy observed (bottom) high resolution GMC of BRCA2.

[0107] 35 genomic regions in BRCA1 and 27 regions in BRCA2 devoid of repetitive sequences were identified, and were used to design and clone the corresponding DNA hybridization probes. All the details of the employed DNA hybridization probes (name, size, coordinates, color and the nature of the covered exons) are listed above. The cloned DNA probes allow the accurate physical mapping of deleted exons and permit the simultaneous detection of large rearrangements in BRCA1 and BRCA2. The above-described improvement in resolution permitted the inventors to translate their observations into the development of a robust predictive genetic test for breast and ovarian cancer (see Example 1).

[0108] Part 3: High Resolution GMCs Allow the Unambiguous Detection and Visualization of Complex Mutations (e.g.: Tandem Repeat Duplications and Triplications) that Cannot be Characterized by Low Resolution CBCs

[0109] The following are selected examples of complex mutations that could not be characterized (or only partially)

by low resolution CBC, but could be precisely and unambiguously characterized by high resolution GMC:

[0110] 3.1 BRCA1 Dup ex 18-20

[0111] CBC:
The image generated by Gad et al. (case IC171712 in FIG. 1 of Gad et al. Oncogene 2001) has a low resolution and the nature and particularly the identity of the deleted exons cannot be defined by visual inspection. As a consequence, the size of the mutation has not been determined, confirming that the generated images were problematic for measurements.

GMC: (see Table S2 of Example 1)

[0112] By visual inspection, this mutation appears as a tandem duplication of the red signal S5B1. After measurement, the mutation was estimated to have a size of 6.7 ± 1.2 kb, restricted to a portion of the genome that encodes exons 18 to 20. The estimated mutation size is fully in line with the 8.7 kb reported in the literature (Staaf, 2008). Details on the measurement and statistical analysis can be found in Example 1.

[0113] Comparative FIG. 3: characterization of the BRCA1 mutation Dup ex 18-20 via CBC (top) and GMC (bottom).

[0114] 3.2 BRCA1 Del ex 8-13

[0115] CBC:

[0116] The image generated by Gad et al. (case IC657 in FIG. 1 of Gad et al. Oncogene 2001) has a low resolution and the nature of the deleted exons cannot be unambiguously defined by visual inspection. The size of the mutation after measurement was 20.0 ± 9.6 kb, having a large standard deviation.

[0117] GMC: (see FIG. 4B, Example 1)

[0118] By visual inspection, the mutation clearly appeared as a deletion of the blue signal S7B1, including a large genomic portion between signals S7B1 and S8B1. After measurement, the mutation was estimated to have a size of 20 ± 2.8 kb, having a smaller error.

[0119] 3.3 BRCA1 Dup ex 13 (6.1 kb)

[0120] CBC:

[0121] No microscopy image related to this mutation has ever been provided. The estimated mutation size was 5.8 ± 1.8 kb (case IARC3653 in FIG. 3 of Gad et al. Oncogene 2001), but is not supported by visual inspection.

[0122] GMC: (see FIG. 4A, Example 1)

[0123] By visual inspection via Molecular Combing, this mutation appears as a partial tandem duplication of the blue signal S7B1. After measurement, the mutation was estimated to have a size of 6.1 ± 1.6 kb, restricted to a portion of the DNA probe BRCA1-8 that encodes exon 13. The estimated mutation size is fully in line with the 6.1 kb reported in the literature (Puget, 1999), and according to the Breast Cancer Information Core database, this mutation belongs to the 10 most frequent mutations in BRCA1 (Szabo, 2000). Therefore, there is perfect correlation between the images and the measurements, and correlation with values present in the literature.

[0124] 3.4 Tandem Repeat Triplication of Exons 1a, 1b and 2 of BRCA1 and a Portion of NBR2.

CBC:

[0125] No tandem triplication has ever been reported using the CBC.

GMC:

[0126] By visual inspection via Molecular Combing, two alleles of the BRCA1 gene were identified in a sample provided by the Institut Claudius Regaud, Toulouse, France, differing in the length of the motif g7b1, which extends from the end of the S9B1 probe to the opposite end of the S11B1 probe. The mutation appeared to be a triplication involving portions of the SYNT1 and the S10B1 probes, as confirmed in probe color swapping experiments. This triplication of a DNA segment with a size comprised between 5 and 10 kb, and probably between 6 and 8 kb, involves exons 1a, 1b and 2 of the BRCA1 gene and possibly part of the 5' extremity of the NBR2 gene.

[0127] The CBC would have at best detected this mutation as an increase of the length of a single probe, and thus would not have been able to characterize the mutation as a tandem triplication. Contrary to Molecular Combing, none of the current molecular diagnostic technologies, such as MLPA or aCGH, could assess whether the duplication or triplication is in tandem (within BRCA1) or dispersed (out of BRCA1). This observation makes a clear difference in terms of risk evaluation, since there is no evidence that repeated genomic portions out of the BRCA1 locus are clinically significant. Molecular Combing highlights that the mutation occurs within the BRCA1 gene, thus being of clinical significance.

[0128] The following important advantages of GMC compared to CBC are evident from the examples above:
[0129] high resolution visual inspection
[0130] precise mapping of mutated exons
[0131] precise measurement of mutation size with robust statistics
[0132] simultaneous detection of BRCA1 and BRCA2
[0133] detection of inversions and translocations
[0134] absence of disturbing repetitive sequences (Alu sequences) for GMCs BRCA1 and BRCA2.

[0135] Tests Specific to Detect a Triplication in the 5' Region of BRCA1

[0136] PCR tests to detect unambiguously the triplication described above or a close triplication may distinguish non-triplicated from triplicated alleles through either one of two ways:
[0137] a - appearance of PCR fragments with the triplicated allele that do not appear with a non-triplicated allele or
[0138] b - change of size of a PCR fragment.

The organization of the sequences in a triplication may be used to design primer pairs such that the PCR amplification is only possible in a tandem repeat. If one of the primers is located in the amplified sequence and is in the same orientation as the BRCA gene (5' to 3') and the other is the reverse complementary of a sequence within the amplified sequence located upstream of the first primer (i.e. the direction from the location of the first to the second primer is the same as the direction from the 3' to the 5' end of the BRCA gene), the PCR in a non-mutated sample will not be possible as the orientation of the primers does not allow it. Conversely, in a triplicated sample, the first primer hybridizing on a repeat unit is oriented correctly relative to the second primer hybridizing in the repeat unit immediately downstream of the first primer's repeat unit. Thus, the PCR is possible. In a triplicated sample, two PCR fragments should be obtained using a pair of primers designed this way. In a sample with a duplication, only one fragment would appear. The size of the smaller PCR fragment (or the only fragment in the case of a duplication), S, is the sum of the following distances:

[0139] D, measured from the first (downstream) primer to the downstream (3' direction relative to the BRCA1 gene) breakpoint, and
[0140] U, measured from the second (upstream) primer to the upstream (5' direction relative to the BRCA1 gene) breakpoint.

This measurement thus provides a location range for both breakpoints, the downstream breakpoint being at a distance smaller than or equal to S from the location of the downstream primer (in the downstream direction) and the upstream breakpoint at a distance smaller than or equal to S from the location of the upstream primer (in the upstream direction). Besides, since the size of the triplicated sequence (L) is the sum of U+D and the distance between the two primers, L may be readily deduced from the size of the PCR fragment.

The size of the larger fragment is the sum of L and the size of the smaller fragment. Thus, by subtracting the size of the smaller fragment from the size of the larger one, the size of the triplicated sequence is readily assessable in a second, independent assessment. This reduces the uncertainty on the location of the breakpoints. Thus, a test designed this way will allow a precise characterization of the triplication. Given the location of the triplication identified here, primer pairs used to detect the triplication could include combinations of one or several of the following downstream and upstream primers (the primer designed as the downstream primer is in the direct orientation relative to the BRCA1 gene, while the upstream primer is reverse complementary to the first strand of the BRCA1 gene). In choosing a combination of primers, in addition to the prescriptions below, one must choose the primer locations so the downstream primer is located downstream of the upstream primer:

A downstream primer may be located:
[0141] i) in the region between exons 2 and 3 of BRCA1, preferably at a distance from 2-4 kb from the 3' end of exon 2, more preferably at a distance from 2.5-3 kb from the 3' end of exon 2
[0142] ii) in the region between exons 2 and 3 of BRCA1, within 2 kb from the 3' end of exon 2, preferably within 1.5 kb and more preferably within 1 kb from the 3' end of exon 2

An upstream primer may be located:
[0143] i) in the region between the BRCA1 gene and the NBR2 gene, within 2 kb from exon 1a of BRCA1, preferably within 1.5 kb and more preferably within 1 kb of exon 1a of BRCA1;
[0144] ii) within exon 1a of BRCA1 or within exon 1b or in the region between exons 1a and 1b;
[0145] iii) in the region between exons 1b and 2, or in exon 2, or in the region between exons 2 and 3.

[0146] An example of such a combination is the primer pair consisting of primers BRCA1-Synt1-R (SEQ ID 126) and BRCA1-A3A-F (SEQ ID 25);

The combinations above are not meant to be exhaustive and the man skilled in the art may well choose other locations for the upstream and downstream primers, provided the orientation and relative location of the primers is chosen as described. Several combinations of primers may be used in separate experiments or in a single experiment (in which case all of the "upstream" primers must be located upstream of all of the "downstream" primers). If more than three primers are used simultaneously (multiplex PCR), the number of PCR fragments obtained will vary depending on the exact location of the breakpoint (no PCR fragment at all will appear in non-mutated samples) and the characterization of the mutation will be difficult. Therefore, it is advisable to perform additional experiments with separate primer pairs if at least one fragment is observed in the multiplex PCR.

Importantly, with the design described in the preceding paragraphs, the orientation of the triplicated sequence is of minor importance: indeed, in a triplication, at least two of the repeat units will share the same orientation and at least one PCR fragment should be amplified. This holds true for a duplication, as in the case of an inverted repeat, a PCR fragment would be obtained from one of the primers hybridizing in two separate locations with reverse (facing) orientations, while a direct tandem repeat would generate a PCR fragment from the two primers as described above.

Another type of PCR test to reveal the triplication and its tandem nature requires the amplification of a fraction of or of the entire repeat array, using primer pairs spanning the repeated sequence (both primers remaining outside the amplified sequence), or spanning a breakpoint (one primer is within and the other outside the amplified sequence) or entirely included in the amplified sequence. These tests will generate a PCR fragment of given size in a normal sample, while in a sample with a triplication on one allele, one or more additional PCR fragments will appear, including one the size of the "normal" fragment plus twice the size of the repeat sequence. If a mutation is present, these tests will often lead to results that can have several interpretations. If a single experiment is performed and reveals a mutation, a (series of) complementary test(s) may be performed following the designs presented herein to confirm the correct interpretation.

Given the location of the triplication identified here, primer pairs used to detect the triplication could include a combination of one or several of the following primers, with at least one downstream and one upstream primer. The primer designed as the downstream primer is reverse complementary relative to the BRCA1 gene sequence, while the upstream primer is in direct orientation relative to the BRCA1 gene. In choosing a combination of primers, in addition to the prescriptions below, one must choose the primer locations so the downstream primer is located downstream of the upstream primer:

A downstream primer may be located:
[0147] i) in exon 3 of the BRCA1 gene; or
[0148] ii) in the region between exons 2 and 3 of BRCA1, preferably more than 2 kb and less than 10 kb from the 3' end of exon 2, more preferably more than 3 kb and less than 8 kb and even more preferably more than 4 kb and less than 6 kb from the 3' end of exon 2.

An upstream primer may be located:
[0149] i) in the region between the BRCA1 gene and the NBR2 gene, less than 10 kb from exon 1a of BRCA1 and more than 1 kb from exon 1a of BRCA1, preferably less than 8 kb and more than 2 kb and more preferably less than 6 kb and more than 4 kb from exon 1a of BRCA1; or
[0150] ii) in exon 1a, exon 1b or in the region between exons 1a and 1b of BRCA1; or
[0151] iii) in exon 2 or in the region between exons 1b and 2 of BRCA1 or in the region between exons 2 and 3.
[0152] iii)
[0153] iv)
[0154] Examples of such combinations are the primer pairs consisting of primers BRCA1-A3A-F (SEQ ID 25)

and BRCA1-A3A-R (SEQ ID 26) and of primers BRCA1-Synt1-F (SEQ ID 125) and BRCA1-Synt1-R (SEQ ID 126)
[0155] v) a downstream primer as described in i) and an upstream primer as described in ii)
[0156] vi) a downstream primer as described in i) and an upstream primer as described in iii)
[0157] vii) a downstream primer as described in ii) and an upstream primer as described in i)

Specific Embodiments of the Invention Include the Following

[0158] 1. A nucleic acid composition for detecting simultaneously one or more large or complex mutations or genetic rearrangements in the locus BRCA1 or BRCA2 comprising at least two colored-labeled probes containing more than 200 nucleotides and specific of each said gene, said probes being visually detectable at high resolution and free of repetitive nucleotidic sequences.

[0159] 2. A nucleic acid composition according to embodiment 1 for detecting simultaneously one or more large or complex mutations or genetic rearrangements in the locus BRCA1 or BRCA2 comprising at least three colored-labeled probes containing more than 200 nucleotides and specific of each said gene, said probes being visually detectable at high resolution and free of repetitive nucleotidic sequences.

[0160] 3. A nucleic acid composition according to embodiments 1 or 2 for detecting simultaneously one or more large or complex mutations or genetic rearrangements in BRCA1 or BRCA2 gene comprising at least three color-labeled probes containing more than 600 nucleotides and specific of each said gene, said probes being visually detectable at high resolution and free of repetitive nucleotidic sequences.

[0161] 4. A composition according to embodiments 1, 2 or 3, wherein the probes are all together visualized on a monostranded-DNA fiber or on a polynucleotidic sequence of interest or on a genome to be tested.

[0162] 5. A composition according to embodiments 1, 2, 3 or 4 comprising at least five color-labeled signal probes specific of BRCA1 or BRCA2 locus allowing detection of the following mutations: duplication, deletion, inversion, insertion, translocation or large rearrangement.

[0163] 6. A composition according to embodiments 1 to 4 comprising at least seven color-labeled signal probes specific of BRCA1 or BRCA2 locus allowing detection of the following mutations: duplication, deletion, inversion, insertion, translocation or large rearrangement.

[0164] 7. A composition according to embodiments 1 to 4 comprising at least nine color-labeled signal probes specific of BRCA1 or BRCA2 locus allowing detection of the following mutations: duplication, triplication, deletion, inversion, insertion, translocation or large rearrangement.

[0165] 8. A composition according to embodiments 1 to 7 comprising at least fourteen color-labeled signal probes specific of BRCA1 or BRCA2 locus allowing detection of the following mutations: duplication, triplication, deletion, inversion, insertion, translocation or large rearrangement.

[0166] 9. A composition according to embodiments 1 to 8 comprising at least eighteen color-labeled signal probes specific of BRCA1 or BRCA2 locus allowing detection of the following mutations: duplication, triplication, deletion, inversion, insertion, translocation or large rearrangement.

[0167] 10. A composition according to embodiments 1 to 9 wherein the genetic rearrangement or mutation detected is more than 1.5 kilobase (kb).

[0168] 11. A predictive genetic test of susceptibility of breast or ovarian cancer in a subject involving the detection (presence or absence) and optionally the characterization of one or more specific large genetic rearrangement or mutation in the coding or non coding sequences of the BRCA1 or BRCA2 locus, the rearrangement being visualized by any of the composition according to embodiments 1 to 10.

[0169] 12. A method of detection for the sensitivity of a subject to a therapeutic procedure comprising the identification of one or more genetic rearrangements or mutations in the coding or non-coding sequences of BRCA1 or BRCA2 gene or locus by visualizing by molecular combing said genetic rearrangement by using any of the composition according to embodiments 1 to 10.

[0170] 13. A method of detection of at least one large genetic rearrangement or mutation by molecular combing technique in a fluid or circulating cells or a tissue of a biological sample comprising the steps of
[0171] a) contacting the genetic material to be tested with at least two colored labeled probes according to embodiments 1 to 10, visualizing with high resolution the hybridization of step a) and optionally
[0172] b) comparing the result of step b) to the result obtained with a standardized genetic material carrying no rearrangement or mutation in BRCA1 or BRCA2 gene or locus.

[0173] 14. A composition comprising:
[0174] two or more oligonucleotide probes according to embodiments 1 to 10;
[0175] probes complementary to said oligonucleotide probes;
[0176] probes that hybridize to said probes of embodiments 1 to 10 under stringent conditions;
[0177] probes amplified by PCR using pairs of primers described in Tables 1 or 2 (SEQ ID 1 to SEQ ID 130); or
[0178] probes comprising BRCA1-1A (SEQ ID NO: 131), BRCA1-1B (SEQ ID NO: 132), or BRCA1-SYNT1 (SEQ ID NO: 133)

[0179] 15. A set of primers selected from the group of primers consisting of SEQ ID 1 to SEQ ID 70 and SEQ ID 125 to SEQ ID 130 for BRCA1

[0180] 16. A set of primers selected from the group of primers consisting of SEQ ID 71 to SEQ ID 124 for BRCA2.

[0181] 17. An isolated or purified probe produced by amplifying BRCA1 or BRCA2 coding, intron or flanking sequences using a primer pair of embodiment 15 or 16.

[0182] 18. An isolated or purified probe comprising a polynucleotide sequence of SEQ ID NO: 131 (BRCA1-1A), SEQ ID NO: 132 (BRCA1-1B) or SEQ ID NO: 133 (SYNT1), or that hybridizes to SEQ ID NO: 131 or to SEQ ID NO: 132 or to SEQ ID NO: 133 under stringent conditions.

[0183] 19. A composition comprising at least two polynucleotides each of which binds to a portion of the genome containing a BRCA1 and/or BRCA2 gene, wherein each of said at least two polynucleotides contains at least 200 contiguous nucleotides and contains less than 10% of Alu repetitive nucleotidic sequences.

[0184] 20. The composition of embodiment 19, wherein said at least two polynucleotides bind to a portion of the genome containing BRCA1.

[0185] 21. The composition of embodiment 19, wherein said at least two polynucleotides bind to a portion of the genome containing BRCA2.

[0186] 22. The composition of embodiment 19, wherein each of said at least two polynucleotides contains at least 500 up to 6,000 contiguous nucleotides and contains less than 10% of Alu repetitive nucleotidic sequences.

[0187] 23. The composition of embodiment 19, wherein the at least two polynucleotides are each tagged with a detectable label or marker.

[0188] 24. The composition of embodiment 19, comprising at least two polynucleotides that are each tagged with a different detectable label or marker.

[0189] 25. The composition of embodiment 19, comprising at least three polynucleotides that are each tagged with a different detectable label or marker.

[0190] 26. The composition of embodiment 19, comprising at least four polynucleotides that are each tagged with a different detectable label or marker.

[0191] 27. The composition of embodiment 19, comprising three to ten polynucleotides that are each independently tagged with the same or different visually detectable markers.

[0192] 28. The composition of embodiment 19, comprising eleven to twenty polynucleotides that are each independently tagged with the same or different visually detectable markers.

[0193] 29. The composition of embodiment 19, comprising at least two polynucleotides each tagged with one of at least two different detectable labels or markers.

[0194] 30. A method for detecting a duplication, triplication, deletion, inversion, insertion, translocation or large rearrangement in a BRCA1 or BRCA2 locus, BRCA1 or BRCA gene, BRCA1 or BRCA flanking sequence or intron, comprising: isolating a DNA sample, molecularly combing said sample, contacting the molecularly combed DNA with the composition of embodiment 5 as a probe for a time and under conditions sufficient for hybridization to occur, visualizing the hybridization of the composition of embodiment 5 to the DNA sample, and comparing said visualization with that obtained from a control sample of a normal or standard BRCA1 or BRCA2 locus, BRCA1 or BRCA gene, BRCA1 or BRCA flanking sequence or intron that does not contain a rearrangement or mutation.

[0195] 31. The method of embodiment 30, wherein said probe is selected to detect a rearrangement or mutation of more than 1.5 kb.

[0196] 32. The method of embodiment 30, further comprising predicting or assessing a predisposition to ovarian or breast cancer based on the kind of genetic rearrangement or mutation detected in a coding or noncoding BRCA1 or BRCA2 locus sequence.

[0197] 33. The method of embodiment 30, further comprising determining the sensitivity of a subject to a therapeutic treatment based on the kind of genetic rearrangement or mutation detected in a coding or noncoding BRCA1 or BRCA2 locus sequence.

[0198] 34. A kit for detecting a duplication, deletion, triplication, inversion, insertion, translocation or large rearrangement in a BRCA1 or BRCA2 locus, BRCA1 or BRCA2 gene, BRCA1 or BRCA2 flanking sequence or intron comprising at least two polynucleotides each of which binds to a portion of the genome containing a BRCA1 or BRCA2 gene, wherein each of said at least two polynucleotides contains at least 200 contiguous nucleotides and is free of repetitive nucleotidic sequences, wherein said at least two polynucleotides are tagged with visually detectable markers and are selected to identify a duplication, deletion, inversion, insertion, translocation or large rearrangement in a particular segment of a BRCA1 or BRCA2 locus, BRCA1 or BRCA2 gene, BRCA1 or BRCA2 flanking sequence or intron; and optionally a standard describing a hybridization profile for a subject not having a duplication, deletion, inversion, insertion, translocation or large rearrangement in a BRCA1 or BRCA2 locus, BRCA1 or BRCA gene, BRCA1 or BRCA flanking sequence or intron; one or more elements necessary to perform Molecular Combing, instructions for use, and/or one or more packaging materials.

[0199] 35. The kit of embodiment 34, wherein said at least two polynucleotides are selected to identify a duplication, deletion, inversion, insertion, translocation or large rearrangement in a particular segment of a BRCA1 or BRCA2 locus, BRCA1 or BRCA2 gene, BRCA1 or BRCA2 flanking sequence or intron associated with ovarian cancer or breast cancer.

[0200] 36. The kit of embodiment 34, wherein said at least two polynucleotides are selected to identify a duplication, deletion, inversion, insertion, translocation or large rearrangement in a particular segment of a BRCA1 or BRCA2 locus, BRCA1 or BRCA2 gene, BRCA1 or BRCA2 flanking sequence or intron associated with a kind of ovarian cancer or breast cancer sensitive to a particular therapeutic agent, drug or procedure.

[0201] 37. A method for detecting an amplification of a genomic sequence spanning the 5' end of the BRCA1 gene and consisting of at least three copies of the sequence in a sample containing genomic DNA. Accordingly, the invention relates in particular to a method for in vitro detecting, in a sample containing genomic DNA, a repeat array of multiple tandem copies of a repeat unit consisting of genomic sequence spanning the 5' end of the BRCA1 gene, wherein said repeat array consists of at least three copies of the repeat unit and said method comprises:
[0202] providing conditions enabling hybridization of a first primer with the 5' end of the target genomic sequence and hybridization of a second primer with the 3' end of said target sequence, in order to enable polymerization by PCR starting from said primers;
[0203] amplifying the sequences hybridized with the primers;
[0204] detecting, in particular with a probe, the amplicons thereby obtained and determining their size or their content, in particular their nucleotide sequence.

[0205] 38. A method of embodiment 37, where the amplified sequence is at least 2 kb long.

[0206] 39. A method of embodiment 37, where the amplified sequence is at least 5 kb long.

[0207] 40. A method of embodiment 37, where the amplified sequence is at most 20 kb long.

[0208] 41. A method of embodiment 37, where the amplified sequence is at most 10 kb long.

[0209] 42. A method of embodiment 37, where the amplified sequence is at least 2 kb and at most 20 kb long.

[0210] 43. A method of embodiment 37, where the amplified sequence is at least 5 kb and at most 10 kb long.

[0211] 44. A method of any one of embodiments 37 to 43 where the amplified sequence comprises at least one of exons 1a, 1b and 2 of the BRCA1 gene.

[0212] 45. A method of any one of embodiments 37 to 43 where the amplified sequence comprises exons 1a, 1b and 2 of the BRCA1 gene.
[0213] 46. A method of any one of embodiments 37-45 where the detection of the gene amplification is achieved by quantifying copies of a sequence included in the amplified region.
[0214] 47. A method of any one of embodiments 37-46 where the detection of the gene amplification is achieved by measuring the size of a genomic sequence encompassing the amplified sequence.
[0215] 48. A method of any one of embodiments 37-47 where the detection of the gene amplification is achieved by making use of polymerase chain reaction or other DNA amplification techniques.
[0216] 49. A method of any one of embodiments 37 to 48 where the detection of the gene amplification is achieved by quantitative polymerase chain reaction.
[0217] 50. A method of any one of embodiments 37-48 where the detection of the gene amplification is achieved by multiplex ligation-dependent probe amplification (MLPA).
[0218] 51. A method of any one of embodiments 37-48 where the detection of the gene amplification is achieved by array-based comparative genomic hybridization (aCGH).
[0219] 52. A method of any one of embodiments 37-48 where the detection of the gene amplification is achieved by quick multiplex PCR of short fragments (QMPSF).
[0220] 53. A method of any one of embodiments 37-48 wherein the downstream and upstream primers are respectively selected from the group of:
for a downstream primer:
[0221] a polynucleotide sequence in the region between exons 2 and 3 of BRCA1, preferably at a distance of 2-4 kb from the 3' end of exon 2, more preferably at a distance of 2.5-3 kb from the 3' end of exon 2 or
[0222] a polynucleotide sequence in the region between exons 2 and 3 of BRCA1, within 2 kb from the 3' end of exon 2, preferably within 1.5 kb and more preferably within 1 kb from the 3' end of exon 2
for an upstream primer:
[0223] a polynucleotide sequence in the region between the BRCA1 gene and the NBR2 gene, within 2 kb from exon 1a of BRCA1, preferably within 1.5 kb and more preferably within 1 kb of exon 1a of BRCA1 or,
[0224] a polynucleotide sequence within exon 1a of BRCA1 or within exon 1b or in the region between exons 1a and 1b or,
[0225] a polynucleotide sequence in the region between exons 1b and 2, or in exon 2, or in the region between exons 2 and 3
[0226] 54. A method of any one of embodiments 37-48 using two or more primers chosen from BRCA1-A3A-F (SEQ ID 25), BRCA1-A3A-R (SEQ ID 26), BRCA1-Synt1-F (SEQ ID 125) and BRCA1-Synt1-R (SEQ ID 126) or their reverse complementary sequences.
[0227] 55. A method of any one of embodiments 37-48 using the Synt1 probe (SEQ ID NO: 133).
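By way of illustration only, the amplicon size limits recited in embodiments 38 to 43 may be expressed as simple interval checks. The Python sketch below is not part of the disclosed method: the names SIZE_WINDOWS_BP and embodiments_satisfied are hypothetical, and 1 kb is assumed to equal 1000 bp.

```python
# Size windows in base pairs, taken from embodiments 38-43.
# None means the embodiment recites no bound on that side.
# Assumption: 1 kb = 1000 bp.
SIZE_WINDOWS_BP = {
    38: (2_000, None),
    39: (5_000, None),
    40: (None, 20_000),
    41: (None, 10_000),
    42: (2_000, 20_000),
    43: (5_000, 10_000),
}


def embodiments_satisfied(amplicon_bp):
    """Return the embodiment numbers whose size window contains amplicon_bp."""
    hits = []
    for number, (low, high) in sorted(SIZE_WINDOWS_BP.items()):
        if low is not None and amplicon_bp < low:
            continue  # shorter than the recited minimum
        if high is not None and amplicon_bp > high:
            continue  # longer than the recited maximum
        hits.append(number)
    return hits


print(embodiments_satisfied(6_000))   # inside every window
print(embodiments_satisfied(25_000))  # satisfies only the minimum-size windows
```

Under these assumptions, a 6 kb amplicon falls within all six windows, whereas a 25 kb amplicon satisfies only the two embodiments that recite a minimum size alone.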


TABLE 2
Description of the DNA probes encoding the BRCA2 GMC

Probe name | Probe size (bp) | Forward primer | Reverse primer | Start | End | Signal motif | Color | Gene | Exons
BRCA2-1 | 2450 | AAATGGAGGTCAGGGAACAA | TGGAAAGTTTGGGTATGCAG | 39 | 2488 | | | |
BRCA2-2 | | TCTCAATGTGCAAGGCAATC | TCTTGACCATGTGGCAAATAA | 3386 | 7446 | | | |
BRCA2-3a | 3822 | AATCACCCCAACCTTCAGC | GCCCAGGACAAACATTTTCA | 8935 | 12756 | | | |
BRCA2-3b | 3930 | CCCTCGCATGTATGATCTGA | CTCCTGAAGTCCTGGAAACG | 12808 | 16737 | | | |
BRCA2-3c | 3953 | TGAAATCTTTTCCCTCTCATCC | AGATTGGGCACATCGAAAAG | 16756 | | | | |
BRCA2-5 | 1903 | GGTCTTGAACACCTGCTACCC | CACTCCGGGGGTCCTAGAT | 31031 | 32933 | | | BRCA2 |
BRCA2-6 | | TCTTTAACTGTTCTGGGTCACAA | TGGCTAGAATTCAAAACACTGA | 35073 | 39175 | | | BRCA2 |
BRCA2-7 | 1854 | TTGAAGTGGGGTTTTTAAGTTACAC | CCAGCCAATTCAACATCACA | 39617 | 41470 | | | BRCA2 |
BRCA2-11 | | TTGGGACAATTCTGAGGAAAT | TGCAGGTTTTGTTAAGAGTTTCA | 52411 | 57616 | | | BRCA2 | 11
BRCA2-12 | 5734 | TGGCAAATGACTGCATTAGG | TCTTGAAGGCAAACTCTTCCA | | 64941 | | | BRCA2 | 12 -- 13
BRCA2-13 | 3251 | GGAATTGTTGAAGTCACTGAGTTGT | ACCACCAAAGGGGGAAAAC | 68200 | 71450 | SSB2 | | BRCA2 | 14
BRCA2-14 | 1681 | CAAGTCTTCAGAATGCCAGAGA | TAAACCCCAGGACAAACAGC | 72505 | 74185 | SSB2 | | BRCA2 | 15 - 16k
BRCA2-15 | 4216 | GGCTGTTTGTTGAGGAGAGG | GAAACCAGGAAATGGGGTTT | 76757 | 80972 | | | BRCA2 | 17 - 18
BRCA2-18 | 2572 | TGTTAGGGAGGAAGGAGCAA | GGATGTAACTTGTTACCCTTGAAA | 93846 | 96417 | | | BRCA2 | 22 + 23 - 24
BRCA2-19 | 2125 | TCAATAGCATGAATCTGTTGTGAA | GAGGTCTGCCACAAGTTTCC | 96951 | 99075 | | | BRCA2 |
BRCA2-20 | 2559 | GGCCCACTGGAGGTTTAAT | TTCCTTTCAATTTGTACAGAAACC | 99537 | | | | BRCA2 | 25 k
BRCA2-21 | 1568 | TGAATCAATGTGTGTGTGCAT | GTGTAGGGTCCAGCCCTATG | 102609 | 104176 | | | BRCA2 |
BRCA2-22a | 3787 | CTGAGGCTAGGAAAGCTGGA | CTGAGGCTAGGAAAGCTGGA | 104612 | 108398 | | | BRCA2 |
BRCA2-22b | | GGTTTATCCCAGGATAGAATGG | AGAAAATGTGGGGTGTAAACAG | | 112013 | | | BRCA2 | 26
BRCA2-25 | 5052 | CAGCAAACTTCAGCCATTGA | GGGACATGGCAACCAAATAC | 123134 | 128185 | | | |
BRCA2-26 | 2353 | GCACTTTCACGTCCTTTGGT | CGTCGTATTCAGGAGCCATT | | 132845 | | | |
BRCA2-27 | 2058 | CCCAGCTGGCAAACTTTTT | TCGGAGGTAATTCCCATGAC | 133176 | 135233 | | | |
BRCA2-28a | 4158 | TCAAGAGCCATGCTGACATC | AGGTAGGGTGGGGAAGAAGA | 137121 | 141278 | | | |
BRCA2-29 | 2335 | TGAGTCTACTTTGCCCATAGAGG | TTTTGCTTTCGGGAGCTTTA | 153394 | 155728 | | | |
BRCA2-30 | 2121 | TTTTTGCCTGCTTCATCOTC | GGTTTTTAAACCTGCACATGAA | 160291 | 161435 | S13B2 | B | |
BRCA2-31 | 4803 | TGAAATTTTGTTATGTGGTGCAT | TTTGAAATCTGTGGAGGTCTAGC | 161435 | 166237 | S13B2 | B | |
BRCA2-32 | 2609 | GTACCAAGGGTGGCAGAAAG | ATGGTGTTGGTTGGGTAGGA | 169818 | 172426 | S14B2 | G | |

Notes: coordinates are relative to BAC RP11-486O17, according to NCBI Build 36.1 (hg18). B = blue, G = green, R = red.
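For the rows of Table 2 in which the probe size, Start and End are all given, the listed size appears to equal End - Start + 1, that is, Start and End behave as inclusive coordinates on the BAC. The Python sketch below illustrates this cross-check for two rows of Table 2. The function and variable names are hypothetical, and the inclusive-coordinate convention is an assumption inferred from the listed values rather than a statement in the application.

```python
# Rows copied from Table 2: (probe name, listed size in bp, start, end).
TABLE2_ROWS = [
    ("BRCA2-1", 2450, 39, 2488),
    ("BRCA2-3a", 3822, 8935, 12756),
]


def span_bp(start, end):
    """Length of an inclusive coordinate interval (assumed convention)."""
    return end - start + 1


def size_mismatches(rows):
    """Return the names of probes whose listed size differs from the span."""
    return [name for name, size, start, end in rows
            if span_bp(start, end) != size]


print(size_mismatches(TABLE2_ROWS))  # both rows agree, so this prints []
```

A check of this kind may be useful for spotting transcription errors in the coordinate columns, since a single wrong digit in Start or End makes the span disagree with the listed size.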

TABLE 3 Total Alu sequences in probes 30 (10%) Total Alu sequences in excluded regions 270 (90% position in query position in repeat Alu % % % sequence (hg18 matching repeat (left) end begin linkage Seq score div. del. ins. begin end (left) -- repeat class family begin end (left) id (count) excluded 2519 7.1 1.O O.O 132 441 -3O8672 -- AluSp SINE Alu 313 O 1 7 region 1 2S 72.0 O.O. O.O 1136 1160 -307953 - AT rich Low Cplxty 25 O 2 22 S8.3 O.O O.O 1627 1662 -307451 - GC rich Low Cplxty 36 O 3 223 19.3 3.S. O.O 1708 1764 -307349 - (CGG)n Simple 2 60 O 4 21 57.1 O.O O.O 1959 1986 -307127 - GC rich Low Cplxty 28 O 5 2280 7.5 2.7 O.7 2142 2434 -306679 - AluSz SINE Alu 299 -13 6 221 6 10.4 O.O 1.4 2436 2733 -306380 - AluSX1 SINE Alu 294 -18 7 248O 4.4 2.0 (0.3 2734 3026 -306087 - AluY SINE Alu 298 -13 8 1117 15.8 O.6 O.O 3305 3475 -30S638 C AluJr SINE Alu -1 301 130 9 364 13.5 O.O O.O 3482 3533 -30SS8O C MER66A LTRAERV1 - 140 338 287 O 749 11.9 S.9 O.8 3557 3674 -305439 C AluJr SINE Alu -1.87 125 2 9 1741 6.O. 17.9 1.O 3746 3996 -30S117 C AluY SINE Alu -18 293 1 1 probe 1A 273 26.3 2.9 O.8 4677 4880 -3O4233 - G-rich Low Cplxty 208 O 2 1 22 40.9 O.O. O.O 5327 S348 -303765 - GC rich Low Cplxty 22 O 3 2331 9.6 0.7 O.3 S904 62O5 -3O2908 - AluSx SINE Alu 303 -9 4 excluded O region 2 probe 1B 2512 6.3 O.3 3.2 91SO 94.67 -299646 - AluY SINE Alu 309 -2 5 2 313 24.8 17.9 O.O 9930 OO46 -299067 C L2b LINEAL2 O 3375 3238 6 374 31.1 1.9 6.6 OOS8 O260 -298853 C L2b LINEAL2 -179 3208 3OOS 6 958 15.6 O.O 7.1 OSO8 O687 -298426 -- FRAM SINE Alu 8 175 -1 7 excluded 1420 7.5 O.O O.6 1598 1771 -297342 C AluSc SINE Alu -2 307 135 8 7 region 3 2332 8.4 O.7 O.3 1783 2078 -297O3S C AluSp SINE Alu -16 297 1 9 486 10.1 O.O 15.1 2O79 2129 -296.984 C AluSc SINE Alu -218 91 47 8 1515 13.5 O.9 O.S 2130 2344 -296.769 C AluSx SINE Alu -94 218 3 2O 2169 8.4 1.4 1.7 2353 2SO7 -296.606 C AluY SINE Alu -20 291 133 21 2672 4.7 O.O. 
O.O 2508 2807 -296.306 C AluY SINE Alu -11 3OO 1 22 2169 8.4 1.4 1.7 28O8 2941 -296,172 C AluY SINE Alu -179 132 3 21 probe 2 2169 8.4 1.4 1.7 28O8 2941 -296,172 C AluY SINE Alu -179 132 3 21 2 486 10.1 O.O 15.1 2942 2979 -296.134 C AluSc SINE Alu -177 132 99 18 381 348 4.9 O.6 3095 3256 - 295857 - MIRC SINEMIR 18 186 -82 23 219 29.5 2.8 2.8 3304 3411 - 295702 C L2c LINEAL2 -2O2 318iS 3078 24 449 32 O.O. O.O 3485 3S46 -29SS67 - SVA E Other 1318 1379 -3 25 6O1 28.4 18.6 O.O 4578 4771, -294342 -- MIRb SINEMIR 24 253 -15 26 excluded 1845 17.3 1.6 2.3 5074 S380 -293733 - AluJr SINE Alu 1 305 -7 27 6 region 4 1568 15.O 1 O.S 1.O S388 5653 -293460 - Alub SINE Alu 1 291 -21 28 3S2, 26.1 6.5 2.0 S654 S791 -293322 -- MIR3 SINEMIR 35 178 -30 29 689 11.4 O.O. O.O 6242 6346 - 292767 C L1MBS LINEAL1 O 6174 6O70 30 2643 5.6 O.O. O.O 6374 6678 - 292.435 C AluY SINE Alu -6 305 1 31 2125 10.7 3.8 O.3 6912 72OO -291,913 C AluSq2 SINE Alu -13 299 1 32 381 2.2 O.O. O.O 7660 77OS -291,408 - (CA)n Simple 2 47 O 33 US 2013/0130246 A1 May 23, 2013 20

TABLE 3-continued Total Alu sequences in probes 30 (10%) Total Alu Sequences in excluded regions 270 (90% position in query position in repeat Alu % % % Sequence (hg18 matching repeat (left) end begin linkage Seq score div. del. ins. begin end (left) -- repeat class family begin end (left) id (count) 280 2S.O 14.8 34 17883 17993 -291 120 - MIR3 SINEMIR 44 166 -102 34 2337 11.2 O.O. O.3 1823O 18541 - 290572 - AluSq2 SINE Alu 1 311 -1 35 201 35.9 O.O. 11.3 18752, 18908 -2902OS C L2c LINEAL2 -1 3386 3246 36 2S4 32.5 5.9 26 19294 19505 -289608 -- L2b LINEAL2 3073 3286 -89 37 217 21.9 O.O. O.O. 19530 19570 -289543 - (CA)n Simple 2 42 O 38 2SO6 8.1 O.O O.O 1961 6 19923 -28919O C AluY SINE Alu -3 3O8 1 39 639 21.8 3.1 2.2 19966 20118 -288995 - MIRb SINEMIR 6 162 -106 40 probe 3 639 21.8 3.1 2.2 19966 20118 -288995 - MIRb SINEMIR 6 162 -106 40 O 1.SSS 15.4 8.4 2.6 206S4 20974 -288.139 C MER44A DNATcMT O 339 1 41 381 16.3 15.1 7.4 21186 2.1311 -2878O2 C MERSA DNA haT- -54 135 1 42 Charlie 229 22.5 6.5 4.2 21507 21599 -287514 C X8 LINE LINECR1 -29 267 173 43 2OO 38.8 3.6 2.9 22836 22973 -28.6140 - MIR SINEMIR 49 187 -75 44 1354 22.8 13.O 21 23166 23655 -285.458 - MLT1E2 LTRERVL- 2 S41 -86 45 MaLR 399 20.9 O.O 6.O 23697 23808 -2853OS C MIR SINEMIR -75 193 97 46 excluded 2288 12.0 0.7 O.O 24330 24637 -284476 C AluSX1 SINE Alu O 312 3 47 11 region 5 2339 9.7 O.3 0.3 2S459 25758 -2833SS C AluSX SINE Alu -12 3OO 1 48 1409 9.1 O.O O.O 2S759 25933 -283180 C AluSq2 SINE Alu -4 3O8 134 49 1785 12.8 O.O 16 25934 26184 -282929 C AluSX SINE Alu -12 3OO S4 50 916 10.5 O.O 25 26.186 26309 -282804 -- AluSX SINE Alu 178 298 -14 51 1897 16.1 0.7 1.O 26638 26936 -282177 C AluJr SINE Alu -14 298 1 52 189 21.1 13.8 7.6 270S6 27142 -281971 C L2a LINEAL2 -3 3423 3332 53 713 22.6 2.4 3.6 2728O 27307 -281.806 C Alub SINE Alu -144 168 141 S4 1795 13.9 7.9 O.7 273O8 27587 -281526 C Alub SINE Alu -12 3OO 1 55 713 22.6 2.4 3.6 27588. 
27728 -281,385 C Alub SINE Alu -172 140 1 S4 2417 7.8 O.O 1.7 27734 28O39 -281 074 C AluSc SINE Alu -7 3O2 2 56 2080 14.0 1.O 1.9 28040 28353 -280760 C AluSz SINE Alu -1 311 1 57 probe 4 2OO 17.6 O.O. O.O 29069 291.02 -280011 - C-rich Low Cplxty 146 179 O 58 1 2386 8.5 1.3 1.6 298.63 3O169 -278944 - AluSc8 SINE Alu 1 306 -6 59 excluded 2494 7.4 O.O O.O 31175 31470 -277643 C AluSg SINE Alu -14 296 1 60 16 region 6 886 20.8 3.0 O.S 31677 31814 -277299 - MER3 DNA haT- 1 142 -67 61 Charlie 1112 16.3 O.O 1.8 31815 3.1980 -277.133 C AluJo SINE Alu -13 299 137 62 886 20.8 3.0 OS 3.1981. 32044 -277O69 - MER3 DNA haT- 143 2O7 -2 61 Charlie 396 O.O. O.O. O.O 32317 32360 -2767S3 - (CA)n Simple 2 45 O 63 2102 9.2 O.O. O.O 3241S 32675 -276438 C AluSX3 SINE Alu -15 297 37 64 2319 9.0 O.O 1.7 3291.7 33217 - 275896 - AluY SINE Alu 1 296 -15 65 2269 10.2 24 O.O 33230 33524 -27SS89 - AluSp SINE Alu 1 3O2 -11 66 1969 16.6 O.O. O.3 3398O 34275 -274838 C Alub SINE Alu -16 296 2 67 2311 8.8 O.3 2.3 34281 34585 -274528 C AluSq2 SINE Alu -13 299 1 68 199 36.4 15 O.O 34736 348O1 -274312 -- MIRC SINEMIR 60 126 -142 69 809 26.O O.7 9.3 3487O 34901 -274212 -- MIR SINEMIR 5 33 -229 70 1727 18.2 O-O 5.9 34902 3S038 -274O75 - AluSx SINE Alu 1 136 - 176 71 1897 14.9 O.O. O.4 3SO39 35313 - 273800 - AluSx SINE Alu 1 274 -38 72 1727 18.2 O-O 5.9 35314 35496 - 273617 - AluSx SINE Alu 137 303 -9 71 809 26.O O.7 9.3 35497 35,710 -273.403 - MIR SINEMIR 34 230 -32 70 1810, 17.4 1.3 1.6 35711 36014 -2.73099 C Alub SINE Alu -9 303 1 73 809 26.O O.7 9.3 36O1S 36046 -273,067 - MIR SINEMIR 231 262 O 70 670 20.9 33 12.7 36048. 36228 - 272885 - FRAM SINE Alu 1 166 O 74 437 34.S 4.7 6.3 362SO 36506 -272607 - MIRb SINEMIR 2 2S4 -14 75 2289 9.9 O.O 3.9 36764 37086 -272O27 - AluSX1 SINE Alu 1 311 -1 76 2440 4.5 O.O 1.1 37090 37406 -271707 - AluY SINE Alu 1 311 O 77 1364 10.9 O.O. 
O.O 374O7 37581 - 271532 - AluSc8 SINE Alu 133 307 -5 78 1601 18.5 O.3 4.8 3761S 37916 - 271197 - AluJr SINE Alu 2 290 -22 79 probe 5 325 27.1 8.8 10.6 386O2 38717 -27O396 - L2c LINEAL2 2331 2446 -973 8O 1 2107 10.4 O.3 3.2 38718 390OS -270 108 - AluSX1 SINE Alu 1 28O -32 81 414 O.O. O.O. O.O 39006 39051 -27OO62 -- (CAA)n Simple 3 48 O 82 325 27.1 8.8 10.6 390S2 39115 -269998 - L2c LINEAL2 2447 2SO9 -910 8O 218 28.1 9.7 3.2 39.093 39298 -2.69815 - L2c LINEAL2 2464. 2682 -737 8O excluded 218 28.1 9.7 3.2 39.093 39298 -2.69815 - L2c LINEAL2 2464. 2682 -737 8O 9 region 7 198 O.O. O.O. O.O 3943S 39456 -269657 (TTA)n Simple 2 23 O 83 116S 10.7 O.O. O.O 394.57 396OS -2695.08 AluSx SINE Alu -27 285 137 84 1808 10.O. 11.9 1.O 39609 39877 -269236 s AluSp SINE Alu -15 298 1 85


TABLE 3-continued Total Alu sequences in probes 30 (10%) Total Alu Sequences in excluded regions 270 (90% position in query position in repeat Alu % % % Sequence (hg18 matching repeat (left) end begin linkage Seq score div. del. ins. begin end (left) -- repeat class family begin end (left) id (count) 2319 9.6 0.7 O.O 92336 92638 -21647S C AluSp SINE Alu -8 305 1 2O3 1942 13.2 0.4 O.4 92899 932O2 -2.15911 C AuSc8 SINE Alu O 312 1 204 2094 11.2 3.1 O.3 93338 93.623 - 215490 - AluSX1 SINE Alu 2 295 -17 205 887 20.1 O.O O.O 93624 93.767 - 215346 C AluJo SINE Alu -32 28O 137 2O6 2S2 33.6 6.9 0.0 93.795 93910 –215203 + Tigger15a DNATcMT 530 653 -62 2O7 probe 12 252 33.6 6.9 0.0 93.795 93910 –215203 + Tigger15a DNATcMT 530 653 -62 2O7 2 468 11.4 8.6 O.O. 93927 93.996 -2.15117 C AluSq2 SINE Alu -13 299 224 208 395 24.4 2.5 2.5 93999 94116 -214997 C Charlie4Z DNA haT- -46 121 4 209 Charlie 2373 8.8 O.3 O.O. 947S9 95052 -214061 - AluSX4 SINE Alu 2 296 -16 210 23 43.5 O.O O.O 95.358 95.380 -213733 - AT rich Low Cplxty 1 23 O 211 2S8 25.6 10.1 1.2 95.449 95527 -213586 C L2c LINEAL2 -16 3371 3286 212 377 18.3 9.1 7.7 95752 95905 -213208 C L1MCS LINEAL1 -36 7925 7770 213 excluded 377 18.3 9.1 7.7 95752 95905 -213208 C L1MCS LINEAL1 -36 7925 7770 213 15 region 13 728 16.7 11.4 O.O 95916 96O47 -213066 C AluJo SINE Alu -26 286 140 214 223S 1 O.S O.3 0.3 96.061 963S4 -212759 C AluSq2 SINE Alu -18 294 1 215 823 23.1 9.4 1.1 96.357 96.637 - 212476 C L1MCS LINEAL1 -444. 7517 7255 213 2O36 13.5 O.O 1.O 96696 96992 -21.2121 - AluSX4 SINE Alu 294 -18 216 2148 11.7 O.3 1.3 96996 97302 -21.1811 - AluSg SINE Alu 3O4 -6 217 738 27.7 8.5 2.2 97396 97904 -211209 C L2a LINEAL2 -12. 
3414 2870 218 1585 12.8 O.O 20.1 9791S 98272 -210841 C AluJr4 SINE Alu -14 298 1 219 1845 13.4 4.1 2.4 98.298 98.588 -210525 C AluSX4 SINE Alu -15 297 2 220 497 11.0 33.0 O.O 98.722 98821 -210292 + FLAMC SINE Alu 133 -10 221 237 31.1 10.1 O.O 98.916 99034 -210079 - MIR3 SINEMIR 5 135 -73 222 2590 5.3 O.O O.O 1 OOO2O 100320 -2O8793 - AluYk4 SINE Alu 301 -11 223 1949 8.9 3.7 2.2 10O331 100600 -208513 - AluSg SINE Alu 2 275 -35 224 2347 7.8 O.O. O.O 1 OO630 100937 -208176 - AluY SINE Alu 311 O 225 2326 10.1 O.7 OO 100941 101.248 -2O7865 - AluSp SINE Alu 3 312 -1 226 S90 26.8 13.O. O.S 101876 102152 -206961 C L2a LINEAL2 -2 3424 31.17 227 1614 16.1 17 28 102162 1023OO -2O6813 - Alub SINE Alu 134 -168 228 233O 9.8 O.O 3.6 102301 102617 -2O6496 - AuY SINE Alu 306 -5 229 1614 16.1 17 28 102618, 102771 -2O6342 -- Alub SINE Alu 135 291 -11 228 2237 9.1 2.0 OO 102886 103183 -20593O C AluSc5 SINE Alu -8 3O4 1 230 probe 13 A 270 00 0.0 0.0 104284 104313 -204800 + (TTTTG)n Simple 30 O 231 1 16SO 4.5 S.S. O.O. 104318. 104516 -204597 C AluSX SINE Alu -37 275 66 232 excluded 8064. 14.0 7.8 5.5 106203 107278 -201835 + LTR12C LTRAERV1 3 1140 -439 233 10 region 14 2324 10.1 O.O O.3 107279 107586 -201527 - AuY SINE Alu 2 3O8 -3 234 8064. 14.O 7.8 S.S. 107587 108052 -2O106.1 - LTR12C LTRAERV1 1141 1579 O 233 939 10.O O.O 6.1 108354 108493 -200620 C FLAM C SINE Alu -11 132 1 235 2397 8.1 O.O 1.6 109001 1093O8 -1998OS C AuY SINE Alu -7 3O4 2 236 790 13.7 1.6 1.6 109.726 109849 -199264 C FLAM C SINE Alu -19 124 1 237 2100 13.8 O.3 O.O. 109852 O149 -198964 C AluSz SINE Alu -13 299 1 238 696 27.4 7.1 O.9 O153 O362 - 198751 C MIRC SINEMIR -1 267 45 239 248. 31.0 6.2. O.O O411 OS23 -198590 C L1MS LINEAL1 -747 S447 S328 240 189 7.4 O.O. 
O.O 0917 O943 -1981.70 - (TAA)n Simple 2 28 O 241 1606 7.3 O.O O.O 1079 1269 -197844 - AluY SINE Alu 104 294 -17 242 2148, 15.1 O.O O.O 1309 1619 -197494 C AluSZ6 SINE Alu -1 311 1 243 431 16.2 14.1 O.O 1625 1723 -197390 C MIRb SINEMIR -67 2O1 89 244 327 26.O O.O. 12.2 2010 2101 -197012 -- MIRC SINEMIR 37 118 - 150 245 1373 9.8 O.6 0.6 2104 2286 -196827 C AluSc SINE Alu O 309 127 246 2444 7.5 O.O 2.9 2288 26O7 -196506 C AluY SINE Alu O 311 1 247 251 22.8 3.5 1.7 2610 2667 -196446 - MIR SINEMIR 104 162 -100 245 18O 29.8 18.2 1.O 2901 2988 -1961.25 + MERSA DNA haT- 68 170 -19 248 Charlie 23O3 12.O O.O. O.O 3.162 3470 - 195643 C AluSz SINE Alu -3 309 1 249 probe 15 804 14.4 16 O.O 5549 5673 -193440 + FLAMC SINE Alu 2 128 -15 250 1 7181 6.4 O.7 O.1 5705 6977 - 192136 - L1PAS LINEAL1 487S 6154 O 251 excluded 1884 13.3 1.9 O.4 7135 7404 - 191709 - AluSz SINE Alu 1 274 -38 252 2 region 15 18O, O.O. O.O. O.O 7411 7430 -191683 + (CAAAA)n Simple 1 2O O 253 2240 12.3 1.O O.O 7441 7749 -191364 - AluSq2 SINE Alu 1 312 O 2S4 224 37.7 O.O. O.O 7758 7834 -191279 - L2 LINEAL2 458 S34 -2885 255 probe 16 652 29.2 9.5 7.2 8175 8595 - 190518 - LTR33B LTRERVL 53 482 -21 2S6 O 722 16.5 O.O 2.5 8599 8722 - 190391 - MER21C LTRERVL 1 121 -817 257 2342 12.3 O.O 2.8 8771 8897 -190216 C L1PREC2 LINEAL1 O 6160 6034 258 US 2013/0130246 A1 May 23, 2013 24

TABLE 3-continued Total Alu sequences in probes 30 (10%) Total Alu Sequences in excluded regions 270 (90% position in query position in repeat Alu % % % Sequence (hg18 matching rebeat (left) end begin linkage Seq score div. del. ins. begin end (left) -- rebeat class family begin end (left) id (count) excluded 2262 9.2 2.7 0.0 18898 11.9189 -189924 C AluSg4 SINE Alu -12 3OO 1 259 1 region 16 probe 17 2262 9.2 2.7 0.0 18898 11.9189 -189924 C AluSg4 SINE Alu -12 3OO 1 259 1 2342 123 O.O 2.8 19190 119429 -189684 C L1PREC2 LINEAL1 -127 6033 S803 258 1975 21.0 10.4 1.1 1943O 1200S1 89.062 - MER21C LTRERVL 111 790 -148 257 279 35.6 6.S 1.6 20054 120343 8877O - L2c LINEAL2 3O3O 3349 -38 260 440 17.1 4.2 6.9 2O617 120735 88378 - MLT1M LTRERVL- 83 198 -474 261 MaLR excluded 1069 13.8 0.0 1.3 2O857 121016 - 188097 - AluJo SINE Alu 135 292 -20 262 12 region 17 28 62.9 O.O. O.O 21035 121069 -188044 - AT rich Low Cplxty 1 35 O 263 2240 6.4 11 O.O 21072 121338 - 187775 - AluY SINE Alu 3 272 -39 264 2197 11.4 O.O. O.7 21453 121749 - 187364 C AluSX SINE Alu -17 295 1 26S 26S 28.2 1.4 1.4 21841 121912 - 1872O1 - MIRb SINEMIR 197 268 O 266 SO3 3 O.S 4.4 5.3 21998 122246 -186867 -- MIRb SINEMIR 19 26S -3 267 1266 11.9 O.O. 11 22278 1224.53 86660 C AluSp SINE Alu -13 3OO 127 268 726 22.S. O.O. O.O 224.57 122629 -186484 + (TATATG)n Simple 4 176 O 269 23 34.8 O.O. O.O 2263O 122652 - 186461 -- AT rich Low Cplxty 1 23 O 270 94O 113 O.8 O.O 22653 122776 - 186337 C AluSp SINE Alu -188 125 1 268 26 60.6 O.O. O.O 23439 1234.71 85.642 -- AT rich Low Cplxty 1 33 O 271 2378 7.4 O.O 1.0 23.475 123773 85340 - AluY SINE Alu 1 296 -15 272 784 13.1 O.O O.O 2427S 124381 84732 -- AluSX SINE Alu 1 107 -2OS 273 273S 4.2 O.O. 
O.O 24853 12516.1 83952 C AluY SINE Alu -2 309 1 274 2424 8.1 O.O O.O 2S836 126131 82982 C AluY SINE Alu -3 3O8 13 275 1876 10.7 1.6 5.1 26S4S 126728 -18238S C AluSX SINE Alu -17 295 108 276 2573 5.1 O.O O.O 26729 127023 82090 C AluY SINE Alu -15 296 2 277 1876 10.7 1.6 5.1 27024 1271.43 81970 C AluSX SINE Alu -205 107 1 276 probe 18 2S 72.0 O.O. O.O 27246 127270 -181843 - AT rich Low Cplxty 1 25 O 278 1 24O 21.1. 16.9 4.0 27577 127665 81448 + MIR3 SINEMIR 94 193 -15 279 1262 8.1 1.7 11 27666 127838 - 181275 - AluSp SINE Alu 124 297 -16 28O 2123 13.3 16.2 0.4 27864. 128270 -18O843 C LTR7C LTRAERV1 O 471 1 281 576. 20.3 3.1 3.9 28487 1286.14 - 180499 C MER2B DNATcMT O 336 210 282 excluded 576 20.3 3.1 3.9 28487 1286.14 - 180499 C MER2B DNATcMT O 336 210 282 4 region 18 1973 10.S 4.9 S.6 286.31 128935 8O178 C AluY SINE Alu -8 303 1 283 11SO 5.9 O.O. O.O 28.936 129070 -180043 C AluSz SINE Alu -177 135 1 284 187 33.4 7.1 9.9 29.286 129324 - 179789 - L2 LINEAL2 2142. 21.81 -1238 285 2251 1 O.O. O.O 1.0 2932S 129624 - 179489 C AluSg4 SINE Alu -14 298 2 286 187 33.4 7.1 9.9 29625 129648 -1794.65 - L2 LINEAL2 2182 2.192 -1227 285 1745 16.7 3.5 O.O 29649 12993.5 - 1791.78 C Alub SINE Alu -15 297 1 287 187 33.4 7.1 9.9 29936 1301.09 - 179004 -- L2 LINEAL2 2193 2374 -1045 285 probe 19 187 33.4 7.1 9.9 29936 1301.09 - 179004 -- L2 LINEAL2 2193 2374 -1045 285 2 S48 25.0 O.O. O.O 30353 130464 - 178649 - MER81 DNA haT- 2 113 -1 288 Bkk 397 20.0 3.0 1.0 30604 130704 -1784.09 -- LTR88b LTR/Gypsy? 722 824 -13 289 1038. 18.1 O.O O.6 3O839 131004 -178109 -- AluSZ6 SINE Alu 7 171 -141 290 2O7 O.O. O.O. O.O 31023 131045 – 178068 + (CAAAAA)n Simple 2 24 O 291 1739 17.6 O.O 2.7 31144 131445 - 177668. -- AluJr SINE Alu 1 294 -18 292 excluded 1739 17.6 0.0 2.7 31144 131445 - 177668. 
-- AluJr SINE Alu 1 294 -18 292 18 region 19 683 21.3 89 2.2 3148S 131652 - 177461 C MIRb SINEMIR -35 233 55 293 290 24.9 15.2 3.1 31818 131962 - 17715.1 - L2C LINEAL2 322S 3.386 -1 294 2015 12.O O.6 1.3 3197S 132108 - 177005 - AluSx SINE Alu 1 135 - 177 295 2358 8.6 O.O 3.0 32109 132421 76692 - AluY SINE Alu 1 3O4 -7 296 2015 12.O O.6 1.3 32422 132598 - 176515 - AluSx SINE Alu 136 310 -2 295 369 16.2 O.O 2.9 32682 132751 76362 C L1MCS LINEAL1 -523 7438 7371 297 3496 8.6 2.O 14 32752 133237 - 175876 - LTR1S LTRAERV1 1 671 -4 298 378 23.8 13.4 O.S 332.42 133382 - 175731 C L1MCS LINEAL1 -547 7495 7255 297 2042 13.2 0.3 0.7 33441 133736 -17S377 - AluSx SINE Alu 1 295 -17 299 2238 9.S. O.O. O.O 33740 134023 -17SO90 - AluSg SINE Alu 1 284 -26 3OO 371 4.7 O.O. O.O 34037 134079 -17SO34 - AluSZ6 SINE Alu 244 286 -26 301 694 29.0 9.4 4.O 341.83 134701 74412 C L2a LINEAL2 O 3375 2870 3O2 1211 10.9 39.O 1.O 347 OS 134933 - 17418O C AluSX3 SINE Alu -14 298 1 303 651 22.9 O.8 O.O 34943 135064 - 174049 C AluSz SINE Alu -1.87 125 3 303 1658 16.3 4.3 2.1 35083 135358 - 173755 C AluSz SINE Alu -30 282 1 3O4 2301 11.2 0.3 O.O 35492 135794 -173319 - AluSx SINE Alu 1 3O4 -8 305 37S 28.3 11.6 16 35871 1361.10 -1730O3 - MIRC SINEMIR 2 268 O 306 2136 114 1.O O.7 369S4 1372S1 71862 -- AluSc8 SINE Alu 1 299 -13 307 US 2013/0130246 A1 May 23, 2013 25

TABLE 3-continued Total Alu sequences in probes 30 (10%) Total Alu Sequences in excluded regions 270 (90% position in query position in repeat Alu % % % Sequence (hg18 matching repeat (left) end begin linkage Seq score div. del. ins. begin end (left) -- rebeat class family begin end (left) id (count) 2368 7.1 1.0 O.3 1372S3 137549 - 171564 - AluSp SINE Alu 3 301 -12 3O8 8O1 26.6 8.3 O.7 1381.99 1384.52 - 170661 C L2a LINEAL2 -1 342S 3153 309 1432 15.2 6.6 O.3 138490 138606 - 170507 - Alub SINE Alu 1 117 - 195 310 195 6.9 O.O O.O 1386O7 1386.35 - 170478 - (CA)n Simple 2 30 O 311 1432 15.2 6.6 O.3 1386.36 138788 - 170325 - Alub SINE Alu 11.8 287 -25 310 254 12.8 O.O O.O 138793 138831 70282 -- L1ME3 LINEAL1 6124 6162 O 312 1283 15.2 0.6 4S 138839 1391.62 - 169951 C SVAF Other -61S 760 449 313 2O29 2.1 O.O O.O 1391.63 139395 69718 - SVAC Other 1152 1384 O 314 1528 7.5 O.O 1.5 139579 139781 69332 C AuY SINE Alu -13 298 99 315 3S2O 7.6 0.2 2.8 139782 14O2S6 - 168857 C LTR2 LTRAERV1 O 463 1 316 7381 7.3 2.1 0.0 140257 141186 - 167927 C Harleq-int LTRAERV1 O 7847 68.98 316 3412O 6.3 0.8 0.3 141187 145402 -163711 C Harleq-int LTRAERV1 -996 S900 1666 316 384 42 O.O O.O 145423 145470 -163643 - L1PA3 LINELI 6103 61SO -5 317 637 8.0 4.9 1.9 145480 145581 63532 C Harleq-int LTRAERV1 -5222 1674 1570 316 S813 9.7 2.9 2.2 14559S 146781 62332 C Harleq-int LTRAERV1 -581 6 1080 1 316 3514 7.8 0.4 O.2 146783. 
147234 -161879 C LTR2 LTRAERV1 -1O 453 1 316 775 7.8 O.O O.O 14723S 147336 -161777 C AuY SINE Alu -209 102 1 315 2256 9.6 0.3 O.7 147892 1481.94 -160919 - AluSp SINE Alu 1 3O2 -11 318 probe 22 2246 7.9 3.5 O.O 148712 149001 6O112 C AluSg SINE Alu -9 301 2 319 2 21 42.9 O.O O.O 150814 150834 -158279 - GC rich Low Cplxty 1 21 O 32O 74O 146 (O.O 6.6 151349 151478 -157635 C FLAMC SINE Alu -21 122 1 321 excluded 2502 6.8 0.0 O.3 1523SS 152661 S6452 C AluY SINE Alu -S 306 1 322 5 region 20 794 13.7 1.6 1.6 152695 152818 - 156295 C FLAMC SINE Alu -19 124 1 323 2O8S 13.3 1.3 O.O 152821, 1531.20 - 155993 C AluSz SINE Alu -8 304 1 324 S63 32.8 6.6 1.5 153132 153370 -155743 C MIRC SINEMIR -10 258 3 325 791 18.7 9.2 4.2 153566 153838 155275 - L1MCS LINEAL1 7642 7927 -34 326 2240 9.6 O.O O.7 153853 154145 - 154968 - AluSc8 SINE Alu 3 293 -19 327 28 67.9 O.O O.O 154149 154176 -154937 -- AT rich Low Cplxty 1 28 O 328 216O 9.6 2.2 3.9 1543SO 154662 -154451 - AluY SINE Alu 1 308 -3 329 probe 23 216 27.8 3.8 12 154848. 154927 -154186 - L2a LINEAL2 33O2 3383 -43 330 1 298 25.O 4.6 4.6 1551S6 155264 - 153849 - L2b LINEAL2 32S6 3364 -11 331 1947 15.3 0.3 O.7 15652S 156824 - 152289 - Alub SINE Alu 1 299 -13 332 252 27.7 8.2 5.8 156901 157034 -1S2O79 C L1MC LINEAL1 -2228 S654 SS18 333 441 O.O. O.O O.O 157109 1571.57 - 151956 - (CA)n Simple 2 50 O 334 31S 28.3 5.2 O.O 157159. 157290 -151823 C L1MS LINEAL1 -6SS 5463 S326 335 excluded 813 14.2 O.O 3.5 157768 157887 -151226 C AluJo SINE Alu -196 116 1 336 3 region 21 224S 13.2 O.O O.O 157903 158212 -150901 C AluSz SINE Alu -2 310 1 337 958. 19.8 6.9 O.9 1583 OS 158506 -150607 C AluJr SINE Alu -12. 
3OO 87 338 probe 24 S1S 29.2 0.6 1.3 158572 158727 - 150386 C MIR SINEMIR -106 156 2 339 O 559 23.7 7.7 1.8 159274 159428 – 149685 C Tigger16b DNATcMT -16 321 158 340 276, 19.7 O.O O.O 159632 159697 - 149416 C L1MA9 LINEAL1 - 19 6293 6228 341 1903 14.2 6.8 O.3 1596.98 160008 - 1491OS C Tigger3a DNATcMT O 348 18 342 3O4 29.1 17 10.2 16OO14 16O193 -14892O C L1MA9 LINEAL1 -93 6219 6054 341 26 69.2 O.O O.O 16O2SO 160275 - 148838 - AT rich Low Cplxty 1 26 O 343 excluded 3O 6O.O O.O O.O 160373 160402 - 148711 - AT rich Low Cplxty 30 O 344 16 region 22 1901 16.8 0.3 O3 160410 16O707 - 1484.06 C Alub SINE Alu -14 298 1 345 2429 6.6 2.3 O.O 160926 161228 - 147885 - AluY SINE Alu 1 310 -1 346 2151 12.8 O.3 1.O 161239 161543 - 147S70 - AluSq2 SINE Alu 1 3O3 -9 347 812 17.1 O.O 1.6 161559 161687 - 1474.26 C FLAMA SINE Alu -13 129 3 348 2239 11.O O.3 1.3 161748 162O56 - 147057 C AluSZ6 SINE Alu -6 306 1 349 637 9.0 O.8 11.5 16216S 162289 - 146824 C L1MA9 LINEAL1 -33 6279 6167 350 2152 13.O O.O O.O 1623OO 162598 - 146515 C AluSx SINE Alu -12. 3OO 2 351 853 17.8 O.O O.O 1626OO 162728 - 146385 C FLAMC SINE Alu -14 129 1 352 2348, 9.8 O.O O.O 162759 163053 - 146060 C AluSc SINE Alu -13 296 2 353 753 24.7 O.O O.7 163054 163199 - 145914 C Alub SINE Alu -32 280 136 3S4 1899 16.7 2.0 O.O 1632O2 163494 -145619 C AluSZ6 SINE Alu -12. 3OO 2 355 21 67.9 O.O O.O 163511 163538 - 145575 - AT rich Low Cplxty 1 28 O 356 1411 15.6 1.9 12.5 163577 163884 - 145229 C AluJo SINE Alu -23 289 11 357 2314 10.8 O.O O.O 163906 1642O1 - 144912 C AluSx SINE Alu -16 296 1 358 247O 9.1 O.3 O.O 164346 164653 - 144460 - AluSc SINE Alu 1 309 O 359 629 21.8 7.3 O.O 164831 164954 -144159 - Alub SINE Alu 4 136 - 176 360 1493 17.2 4.8 2.0 164955 165244 -143869 - AluJo SINE Alu 2 299 -13 361 2231 9.3 O.O 1.4 165251 165587 - 143526 - AluSq2 SINE Alu 1 312 O 362 US 2013/0130246 A1 May 23, 2013 26

TABLE 3-continued Total Alu sequences in probes 30 (10%) Total Alu Sequences in excluded regions 270 (90% position in query position in repeat Alu % % % Sequence (hg18 matching repeat (left) end begin linkage Seq score div. del. ins. begin end (left) -- rebeat class family begin end (left) id (count) probe 25 5877 8.3 2.5 6.2 166057 166719 - 142394 C L1PA7 LINEAL1 -1 6153 S491 363 O excluded 5877 8.3 2.5 6.2 166057 166719 -142394 C L1PA7 LINEAL1 -1 6153 S491 363 3 region 23 2432 7.4 O.O. O.7 16672O 16701S -142098 C AuY SINE Alu -17 294 1 364 S877 8.3 2.5 6.2 167016 167038 - 142O75 C L1PA7 LINEAL1 -664 5490 5490 363 2296 11.5 O.O. O.O 167039 167343 - 141770 C AluSX3 SINE Alu -7 305 1 365 S877 8.3 2.5 6.2 167344 167416 - 141697 C L1PA7 LINEAL1 -664 5490 S420 363 2S27 8.4 O.O. O.O 1674.17 16772S -141388 C AuY SINE Alu -2 309 1 366 S877 7.4 1.O O.3 167726 168279 -140834 C L1PA7 LINEAL1 -73S S419 4870 363 probe 26 5877 7.4 1.0 0.3 167726 168279 -140834 C L1PA7 LINEAL1 -73S S419 4870 363 2 1566 16.2 8.3 0.3 16963O 169907 - 139206 C Alub SINE Alu -12 3OO 1 367 266 33.0 2.3 1.4 16996O 1701.20 - 138993 C MIRb SINEMIR -96 172 5 368 1633 22.3 O.O. O.7 170506 170806 -1383.07 - AluJr SINE Alu 1 299 -13 369 excluded 2359 8.O O.3 0.7 1712SS 171556 - 137557 C AuY SINE Alu -9 3O2 2 370 3 region 24 2345 8.4 O.O 1.O. 171SS7 171854 - 137259 C AluSg SINE Alu -12 298 4 371 244O 6.5 O.O 2.6 17189S 172204 - 136909 C AuY SINE Alu -9 3O2 1 372 probe 27 500 17.8 10.2 1.4 173641 173784 – 135329 + L.1MC4a LINEAL1 7729 7994 -1 373 O excluded 1743 15.8 0.3 6.0 174758 174905 -134208 + Alub SINE Alu 2 145 - 167 374 8 region 25 2453 8.3 0.3 O.O. 174906 1752O7 -133906 - AluSp SINE Alu 303 -10 375 1743 15.8 O.3 6.O. 1752O8 175375 -133738 - Alub SINE Alu 146 301 -11 374 2487 8.2 O.O. O.O. 
175378 175681 -133432 - AluSg7 SINE Alu 3O4 -8 376 1773 15.8 O.3 6.O 276,759 276906 -32207 - Alub SINE Alu 2 145 - 167 377 2466 8.3 0.3 O.O 276907 2772O7 -31906 - AluSp SINE Alu 3O2 -11 378 1773 15.8 O.3 6.O 2772O8 277375 -31 738 - Alub SINE Alu 146 301 -11 377 251O 8.5 O.O. O.O 277378. 277684 -31429 - AluSg7 SINE Alu 307 -5 379 probe 29 O excluded 2477 7.4 O.O O.O 278774 279071 -30042 + AuY SINE Alu 298 -13 380 6 region 26 2212 9.4 O.3 5.3 2794O6 279724 -29389 - AluSp SINE Alu 3O4 -9 381 2283 10.4 O.3 O.O 279909 28O2OS -28908 - AluSg SINE Alu 298 -12 382 2288 9.1 O.O O.7 280216 280SO1 -28612 -- AuY SINE Alu 284 -27 383 23S 22.6 7.O 2.2 280538 280623 -28490 - L1ME4a LINEAL1 S948 6037 -87 384 1552 21.2 4.2 0.3 280624 280910 -282O3 C Alub SINE Alu -14 298 1 385 2217 8.9 14 O.7 280919 28.1210 -27903 C AuY SINE Alu -17 294 1 386 probe 30 288 7.0 0.0 O.O 281782 281824 –27 289 + (GGA)n Simple 43 O 387 O excluded 2005 17.0 O.O O.O 282404 282703 -26410 C AluSZ6 SINE Alu -1 301 2 388 1 region 27 probe 31 O excluded 2341 8.6 0.7 0.7 283434 283734 -25379 + AluSX1 SINE Alu 301 -11 389 1 region 28 probe 32 O excluded 331 28.5 9.8 2.3 283817 283938 -25175 + MIRb SINEMIR 18 148 -120 390 O region 29 probe 33 328 29.2 3.2 14.3 285397 285474 -23639 + MIRb SINEMIR 3 70 -198 392 O excluded 328 29.2 3.2 14.3 285397 285.474 -23639 + MIRb SINEMIR 3 70 -198 392 10 region 30 2457 7.7 O.O. O.3 285.475 285773 -23340 C AuY SINE Alu -13 298 1 393 328, 29.2 3.2 14.3 285774 285818 - 23295 - MIRb SINEMIR 7 114 -154 392 408 34.7 8.7 2.2 285879 28S923 -23190 C L2c LINEAL2 -38 3349 3305 394 81S 17.3 O.O 3.3 28S924, 286O70 -23043 - Alub SINE Alu 145 - 167 395 24O4 7.7 O-3 (0.3 286O71 28.6369 -2.2744 - AluSc5 SINE Alu 299 -13 396 81S 17.3 O.O 3.3 286370 286,532 - 22581 - Alub SINE Alu 146 301 -11 395 408 34.7 8.7 2.2 286.533 286611 -22SO2 C L2c LINEAL2 -83 33O4 3221 394 2426 8.9 O.O. 
O.O 286612 286903 -22210 - AluSg SINE Alu 292 -18 397 408 31.6 7.5 2.4 286904, 28.7093 - 22O2O C L2c LINEAL2 -167 3220 3009 394 897 18.1 O.O O.3 2871.33 287435 -21678 - AluSZ6 SINE Alu 3O2 -10 398 2477 8.5 O.7 O.O 287436 28.7740 -21373 - AluSg SINE Alu 307 -3 399 236 28.4 6.8 6.1 287743 287888 - 21225 C L2c LINEAL2 -495 2924, 2778 394 242S 7.2 0.7 O.O 287918 288210 -20903 - AluSX4 SINE Alu 5 299 -13 400 966 14.8 O.O. O.7 288.319 2886O1 -20512 -- Alub SINE Alu 281 -31 4O1 198. 19.2 9.4 1.8 2886O2 288648 -20465 C L2c LINEAL2 -823 2596 2545 394 37O 33.9 7.3 3.9 288662 288761 -2O352 C L2c LINEAL2 -927 2492 2386 394 455 18.4 8.1 S.3 288762 288900 -20213 C MER2 DNATcMT 344 212 4O2 649 18.9 1.O 1.7 288901, 289.197 -19916 C AluJr SINE Alu -17 295 1 403 455 18.4 8.1 S.3 289198 289390 -19723 C MER2 DNATcMT -134 211 3 4O2 US 2013/0130246 A1 May 23, 2013 27

TABLE 3-continued
Total Alu sequences in probes: 30 (10%)
Total Alu sequences in excluded regions: 270 (90%)

Columns, left to right: region (probe or excluded region); score; % div.; % del.; % ins.; position in query sequence (hg18): begin, end, (left); strand (+ or C); matching repeat; repeat class/family; position in repeat: begin, end, (left) for + matches, or (left), end, begin for C matches; linkage id; Alu Seq (count) for the region.

probe 34 1455 18.4 8.1 5.3 289198 2893.90 9723 C MER2 DNATcMT -134 211 3 4O2 O 37O 31.2 49 4.4 789.391 289699 9414 C L2c LINEAL2 -1034 238S 2033 394 274 29.6 20.4 8.6 28.9992 29.0173 894O C MIRb SINEMIR -48 220 16 404 254 16.1 1.4 10.9 290149 290218 8895 - MIR SINEMIR 96 159 -103 40S excluded 254 16.1 1.4 10.9 290149 290218 8895 - MIR SINEMIR 96 159 -103 405: 2 region 31 1998 16.9 O.O. O.3 290222 290S34 8579 - Alub SINE Alu 1 312 O 4O6 2S84 6.3 O.O O.O 290614 290.913 82OO C AluY SINE Alu -11 3OO 1 407 probe 35 25 76.1 O.O O.O 291.372 291417 7696 - AT rich Low Cplxty 1 46 O 4.08 O 21 38.1 O.O O.O 291399 29.1419 7694 - AT rich Low Cplxty 1 21 O 409 228 6.7 O.O. O.O 293811 293840 5273 + (CAGCC)n Simple 3 32 O 410 excluded 1075 11.7 0.0 14 295607 29.5751 3362 + FLAMC SINE Alu 1 143 O 411 3 region 32 2297 12.3 O.O O.3 29621S 296522 2591 - AluSX1 SINE Alu 1 307 -5 412 2261 8.2 0.7 O.O 296524 2968O3 2310 - AluSg SINE Alu 22 3O3 -7 413 probe 36 611 31.6 6.1 1.2 29694O 2971.70 1943 C MIRb SINEMIR -1 267 26 414 1 796 17.6 2.3 O.O 299588 2.99718 -9395 C FLAMC SINE Alu -8 135 2 415 excluded 2282 9.0 0.3 0.3 2999.17 300205 -8908 + AluSq4 SINE Alu 1 289 -23 416 3 region 33 1752 16.3 2.0 1.7 3.00991 301290 - 7823 - AuSZ6 SINE Alu 2 302 -10 417 2156 13.3 O.7 O.3 301631 301930 -7.183 C AuSZ6 SINE Alu -10 302 2 418 probe 37 O excluded 1844 12.7 7.6 0.0 303366 303641 -5472 + AuSZ6 SINE Alu 1 297 -15 419 6 region 34 186 4.3 O.O O.O 3O3712 3O3734 -5379 - (TCTG)n Simple 2 24 O 420 1799 15.9 O.O. O.7 30373S 3040OS -S108 C AuSx3 SINE Alu -43 269 1 421 1627 16.8 0.6 8.1 3O4121 3.04299 -4814 C Alub SINE Alu -3 309 129 422 2369 10.8 O.3 O.O 3O43OO 3O46O4 -4509 C AluSc SINE Alu -2' 307 2 423 1627 16.8 O.6 8.1 3O46OS 304742 -4371 C Alub SINE Alu -184 128 14 422 36S 16.1 8.5 O.O 3O4786 304.873 -4240 C FRAM SINE Alu O 133 24 424 probe 38 219 3.6 0.0 0.0 305.000 305.027 -4086 + (CA)n Simple 2 29 O 425 O 201 7.4 O.O. O.O 3OSO28 305054 -4059 - (TC)n Simple 2 28 O 426 262 36.O O.O. 
O.O 3OS840 305978 -3135 - (TGG)n Simple 1 139 O 427 excluded 980 19.5 O.O 1.2 306413 306573 -2S4O C Alub SINE Alu -1.8 294 134 428 9 region 35 683 16.O O.O 1.S. 306S 74 306841 -2.272 C AluJr SINE Alu -14 298 35 429 O8 6.8 6.O 8.0 306893 306924 - 2189 C Charlie5 DNA haT- -1 2623 2600 430 Charlie 2498 7.1 O.O O.O 30692S 307220 -1893 - AluSg SINE Alu 1 296 -14 431 35 O.O O.O. O.O 307222 307260 - 1853 - (TA)n Simple 2 40 O 432 O8 6.8 6.O 8.0 307261 307290 -1823 C Charlie5 DNA haT- -25 2599 2574 430 Charlie 2429 10.1 O.O O.O 307291 307597 -1516 C AluSg SINE Alu -3 307 1 433 O8 6.8 6.O 8.0 307598 307634 -1479 C Charlie5 DNA haT- -51 2573 2537 430 Charlie 814 18.1 3.4 O.O 307635 307932 -1181 - AluJr SINE Alu 1 3O8 -4 434 O8 6.8 6.O 8.0 307933 307957 - 1156 C Charlie5 DNA haT- -88 2536 2SO9 430 Charlie 804 16.6 1.O 1.0 307958 3O8258 -85S C Alub SINE Alu -11 301 1 435 O8 6.8 6.O 8.0 3O8259 308509 -604 C Charlie5 DNA haT- -116 2508 2251 430 Charlie 18O, O.O. O.O. O.O 3O8538 3O8557 -556 - (TTG)n Simple 2 21 O 436 2319 9.2 O.O. O.3 3O8558 3O8843 -270 C AluSx SINE Alu -2S 287 3 437 26 80.0 O.O. O.O 3O887S 3O8914 -199 + AT rich Low Cplxty 1 40 O 438 76S 15.O 4.4 O.O 3O891S 309027 -86 + AluJo SINE Alu 1 118 -194 439 43S 14.5 O.O O.O 3O9052 309113 O C AluSZ6 SINE Alu -13 299 238 440 US 2013/0130246 A1 May 23, 2013 28

TABLE 4 Total Alu sequences in probes 11 (10.5%) Total Alu Sequences in excluded regions 93 (89.4% position in query repeat position in repeat Alu % % % Sequence (hg18 matching class (left) end begin linkage Seq score div. del. ins. begin end (left) -- repeat family begin end (left) id (count) Ex- O cluded region 1 Probe 398 34.5 9.7 1.3 240 456 - 172044 C L3 LINECR1 -71S 3384 31SO 1 O 1 Ex- 2477 7.O O.6 1.0 2534 2845 - 169655 -- AluY SINE Alu 1 311 O 2 2 cluded region 2 2391 8.5 O.O 2.3 2948 3524 -169246 - AluSg SINE Alu 3 3O2 -8 3 Probe 21 42.9 O.O. O.O 4058 4078 - 168422 -- AT rich Low com- 1 21 O 4 O 2 plexity 181 13.3 O.O O.O 5187 S216 - 167284 C L2b LINEAL2 -2 3373 3344 5 21 S3.6 O.O. O.O S344 S371 - 167129 - AT rich Low com- 1 28 O 6 plexity 25 44.O O.O. O.O 6259 6283 - 166217 - AT rich Low com- 1 25 O 7 plexity 36 69.4 O.O. O.O 6261 6296 - 1662O4 - AT rich Low com- 1 36 O 8 plexity 3OO 32.4 7.6 6.2 634-6 6569 -165931 C L2c LINEAL2 -139 3248. 3022 9 Ex- 2134 12.3 3.6 O.3 7463 7763 - 164737 C AluSp SINE Alu -2 311 1 O 3 cluded region 3 4581 12.2 3.9 2.7 7764 8O38 - 164462 -- Tigger1 DNATcMar.- 1552 1829 -589 1 Tigger 2268 12.5 O.O O.O 8039 83SO - 1641SO C AluSz SINE Alu O 312 1 2 4581 12.2 3.9 2.7 8351 8579 -163921 - Tigger1 DNATcMar- 1830 2052 -366 1 Tigger 2110 12.2 O.4 O.4 858O 8896 -163604 -- AluSc SINE Alu 1 309 O 3 4581 12.6 S.9 2.5 8897 9223 -163277 - Tigger1 DNATcMar- 2053 2418 O 1 Tigger Probe 4581 12.6 S.9 2.5 8897 9223 -163277 - Tigger1 DNATcMar- 2053 2418 O 1 O 3a Tigger 722 28.2 6.O O.9 9919 10136 -162364 C MIRb SINEMIR -14 2S4 26 4 566 16.8 1.6 24 11054 11181 - 161319 - L1MB8 LINEAL1 6051 6.177 -1 5 21 6 15.8 O.O. O.O. 11954 11991 - 1605.09 - T-rich Low com- 143 18O O 6 plexity Ex- O cluded region 4 Probe 1039 34.0 8.2 3.8 4509 SO76 -157424 C L2b LINEAL2 O 3375 2752 17 O 3b S80 10.9 8.9 O.O 5077 5177 - 157323 - L1MB4 LINEAL1 607O 6179 -1 18 1039 29.2 11.7 4.9 5178 5625 - 156875 C L2b LINEAL2 -668 2751 2301 17 392 34.2 7.O. 
O.O S699 5856 -156644 - MERSB DNA haT- 5 173 -5 19 Charlie 26O 27.O 2.2 11 6498 6587 - 155913 - MERSB DNA haT- 1 91 -87 2O Charlie 356 35.0 9.7 1.8 6639 7148 -155352 -- L2b LINEL2 687 1265 - 2154 2 Ex- 356 35.0 9.7 1.8 6639 7148 -155352 -- L2b LINEL2 687 1265 - 2154 2 O cluded region 5 Probe S82. 29.9 8.9 3.0 7310 8O31 -154469 - L2b LINEL2 1332 2163 - 1256 2 O 3c 570 21.9 5.8 O.6 8054 8209 -154291 - MERSA1 DNA haT- 2 16S -1 22 Charlie 61S 26.7 6.3 7.5 8211 8297 -1542O3 + L2b LINEL2 221S 2285 -1134 2 463 12.4 O.O. O.O 8.298 8386 -154114 C L1PB1 LINEL1 O 6151 6063 23 61S 26.7 6.3 7.5 8387 8553 - 153947 - L2b LINEL2 2286 2466 -953 2 616 28.0 83 2.9 8583 8810 -153690 C MIR SINEMIR O 262 23 24 2S1 27.6 7.8 4.5 8895 9023 -153477 - L2b LINEL2 261.8 27SO -669 2 18O 244 18.9 O.9 91.84 9278 -153222 -- L2b LINEL2 3O29 3140 -235 2 US 2013/0130246 A1 May 23, 2013 29

TABLE 4-continued Total Alu sequences in probes 11 (10.5%) Total Alu Sequences in excluded regions 93 (89.4% position in query repeat position in repeat Alu % % % Sequence (hg18 matching class (left) end begin linkage Seq score div. del. ins. begin end (left) -- repeat family begin end (left) id (count) 288 25.5 S.2 O.O. 1943O 19517 - 152983 - MIR SINEMIR 108 2O6 -62 25 409 20.3 O.9 13.S. 20554 20661 -151839 - MER2O DNA haT- 6 101 -118 26 Charlie Ex- 2283 10.6 O.O. O.7 20878 21178 - 151322 C AluSX1 SINE Alu -13 299 1 27 9 cluded region 6 26SO 5.7 O.O. O.O 21294 2.1593 -150907 C AluYk4 SINE Alu -12 3OO 1 28 411. 30.1 O.O O.O 21609 21711 -150789 C MIR SINEMIR -2 260 158 29 271. 27.3 6.5 O.O 21747 21823 -150677 - L1MEg LINEAL1 117 198 -6OO2 30 1322 24.O 7.1 2.2 21910 22707 - 149793 - L1MEg LINEAL1 667 1481 -4719 30 2394 10.8 O.O. O.O 22717 23021 -1494.79 - AluSx SINE Alu 1 305 -7 31 367 22.O 15.O S.O. 231 OS 23289 - 14921.1 - L1MEg LINEAL1 1665 1878 -4246 30 2251 12.5 16 O.O 23290 23594 -148906 - AluSX1 SINE Alu 1 310 -2 32 367 23.S 14.9 3.8 23595 23754 -148746 - L1MEg LINEAL1 1858. 2035 -416S 30 21 66.7 O.O. O.O 23863 23883 -148617 - AT rich Low com- 1 21 O 33 plexity 2312 9.8 O.O. O.O 23884 24168 - 148332 C AluSg4 SINE Alu -27 285 1 34 354 27.4 23.6 O.1 24296 244-62 - 148O38 - MIRb SINEMIR 44 240 -28 35 2271 11.0 O.O. O.3 25061 25359 - 147141 C AluSq2 SINE Alu -14 298 1 36 204 31.0 SS 4.3 2S745 2S835 - 146665 - L2c LINEAL2 3252 3343 -44 37 189 38.0 1.8 2.7 26973 27083 - 145417 - L2 LINEAL2 2741 28SO -569 38 3579 15.7 3.5 1.5 28391 28663 - 143837 -- L1MA9 LINEAL1 SSS6 S823 -489 39 2204 10.2 O.O 14 28664. 28973 -143527 - AluSx SINE Alu 1 312 O 40 3579 15.7 3.5 1.5 28.974 294.08 -143092 -- L1MA9 LINEAL1 S824 6279 -33 39 2260 11.5 O.O 1.9 2942O 29733 - 142767 C AluSx SINE Alu -3 309 2 41 388. 
29.1 18.1 O4 3OO6O 3O252 - 142248 - MIRb SINEMIR 40 266 -2 42 2247 9.7 O.3 0.7 30637 30936 - 141564 - AluSp SINE Alu 1 299 -14 43 Probe 467 24.0 104 O.O 322O6 32359 -140141 C MER3 DNA haT- -21 188 19 44 O 5 Charlie 637 15.5 13.4 4.7 32864 32983 -139517 C Charlie1a DNA haT- O 14S5 1322 45 Charlie Ex- 637 15.5 13.4 4.7 32864 32983 -139517 C Charlie1a DNA haT- O 14S5 1322 45 2 cluded Charlie region 7 2301 10.8 O.O. O.3 32984 33289 -13921.1 - AluSz SINE Alu 1 305 -7 46 637 16.9 15.4 3.0 33290 33571 -138929 C Charlie1a DNA haT -134 1321 988 45 Charlie 594 21.1 7.8 O.O 336O7 33772 - 138728 C Charlie1a DNA haT -590 865 687 45 Charlie 1745 21.7 7.6 1.8 33787 34341 - 13815.9 C Charlie1a DNA haT -804 651 67 45 Charlie 228O 10.4 1.O O.O 34508 34.805 - 137695 C AuSc8 SINE Alu -11 301 1 47 25 69.2 O.O. O.O 34861 34899 - 1376O1 - AT rich Low com- 1 39 O 48 plexity Probe SS1 28.8 9.0 2.0 35403 35590 - 136910 - MIRb SINEMIR 8 208 -60 49 O 6 346 34.6 12.2 4.0 35890 36193 - 136307 C L2c LINEAL2 -79 33O8 2981 50 243 37.6 S.S. S.S. 36411 36.666 -135834 - L2c LINEAL2 291 O 316S -222 51 186 15.2 152 O.O 36661 36,706 -135794 C L2a LINEAL2 -98 3328 3276 52 278 36.5 4.1 O.8 36911 37059 -135441 - MERSB DNA haT- 7 153 -25 53 Charlie 232 39.2 2.9 O.O 370S6 371.57 - 135343 C L2c LINEAL2 -648 2771 2667 50 293 29.1 12.7 9.0 37286 37553 -134947 C L2c LINEAL2 -2 3,385 3109 S4 22 S9.1 O.O O.O 37814 37835 - 13466S - AT rich Low com- 1 22 O 55 plexity 1767 14.8 26 O.3 38O38 383SO -1341SO C L1MC2 LINEAL1 -158 61.86 S867 56 2581 44 10.9 O.O 383S1 38783 -133717 C MER9a3 LTRERVK O 512 33 57 2SO3 12.5 S4 O.2 3879O 39214 -133286 C L1MC2 LINEAL1 -471 S873 5427 56 Ex- 2SO3 12.5 S4 O.2 3879O 39214 -133286 C L1MC2 LINEAL1 -471 S873 5427 56 1 cluded region 8 257S 6.6 O.O. O.3 3922O 39520 - 132980 C AluY SINE Alu -11 3OO 1 58 Probe 447 30.7 12.8 1.3 401.06 40462 -132O38 C L2a LINEAL2 O 3426 2972 59 1 7 1324. 19.2 10.7 1.O 4O694 4O974 - 131526 C AluJr SINE Alu -2 310 3 60 US 2013/0130246 A1 May 23, 2013 30

TABLE 4-continued Total Alu sequences in probes 11 (10.5%) Total Alu Sequences in excluded regions 93 (89.4% position in query repeat position in repeat Alu % % % Sequence (hg18 matching class (left) end begin linkage Seq score div. del. ins. begin end (left) -- rebeat family begin end (left) id (count) Ex- 2608 S.3 13 O.O 41606 41907 30593 C AuY SINE Alu -S 306 1 61 10 cluded region 9 1898. 14.0 0.4 O.O 43234 43497 29OO3 + AluSX SINE Alu 1 26S -47 62 2028 8.5 O4 1.2 43498 437SS 28745 - AuY SINE Alu 41 296 -15 63 1289 15.4 O.4 8.1 43837 44089 284.11 C Alub SINE Alu -14 298 64 64 1897 13.9 O.O. O.O 443OO 4456S 27935 C AluSX1 SINE Alu -2 310 45 65 311 17.9 O.O 1.S 44716 44783 2.7717 MERS3 DNA haT 12 78 - 115 66 491. 14.9 O.O 1.1 44783 44870 27630 - MERS3 DNA haT 107 193 O 67 480 14.4 4.8 11.0 4577O 45894 26.606 C MER44D DNATcMar- -2 703 S86 68 Tigger 1057 7.7 1.6 2.7 45879 46.064 26436 C MER44D DNATcMar- -79 626 444 68 Tigger 24OS 12.7 S.6 12 46.064 46728 25772 C Tigger? DNATcMar- -1653 838 145 69 Tigger 919 18.1 O.O O.O 46776 46930 2SS7O C MER44D DNATcMar- -549 1S6 2 68 Tigger 1210 14.2 11.8 O.8 47131 47342 25158 C AufSX SINE Alu O 312 78 70 967 18.1 O.O O.O 47SOO 47648 24852 -- Alub SINE Alu 152 300 -12 71 2O8 22.0 1.1 6.O 47867 47953 24547 - (TATG)n Simple repeat 3 85 O 72 4691. 7.6 0.2 0.6 49683 SO3O7 22193 C L1PA10 LINEAL1 -11 6157 SS36 73 1758. 20.7 O.7 O.O SO462 SOf 66 21734 - AluJr4 SINE Alu 1 307 -5 74 2343 10.9 O.O. O.3 S1130 S1431 21069 - AluSz SINE Alu 1 301 -11 75 1741 18.6 14 O.3 51949 52244 2O2S6 C AluJo SINE Alu -9 303 5 76 Probe O 11 Ex- 2443 O.4 O.O. 
O.8 S7693 S79SO 4550 - AluYaS SINE Alu 41 296 -14 77 3 cluded region 10 2O3 29.1 9.0 3.8 57957 58056 4444 + MIRC SINEMIR 63 167 -101 78 2301 9.7 1.O O.3 S8059 S8356 4144 + AluSx SINE Alu 1 3OO -12 79 219 18.6 3.1 15.8 S8361 S8424 4O76 -- MIR SINEMIR 2OO 256 -6 8O 1903 12.7 4.4 9.S 58558 S8831 3669 C Tigger3a DNATcMar- O 348 61 81 Tigger 2336 9.7 O.O 1.O 58832 591.30 3370 - AluSx SINE Alu 1 296 -16 82 1903 12.7 4.4 9.S. S9131 S922O 3280 C Tigger3a DNATcMar- -288 60 1 81 Tigger Probe 1903 12.7 4.4 9.S. S9131 S922O 3280 C Tigger3a DNATcMar- -288 60 1 81 1 12 Tigger 27O 39.8 O.O. O.O 60002 6O119 2381 - L4 LINERTE-X 1467 1584 -445 83 18O 111 O.O O.O 6O23S 60261 2239 - (A)n Simple repeat 1 27 O 84 474 10.8 9.2 O.O 60778 60842 1658 C AluSq10 SINE Alu -236 76 6 85 612 13.2 0.9 O.O 60849 60962 1538 C Charlie1a DNA haT- -26 1429 1315 86 Charlie 1915, 18.2 4.9 O.7 6096S 61374 1126 C Charlie1a DNA haT- -617 838 412 86 Charlie 321 29.3 S.9 2.1 61403 61.538 O962 C Charlie1a DNA haT -1314 141 1 86 Charlie 1905 12.3 7.7 1.4 61652 61988 OS12 C Tigger4b DNATcMar- -1 360 3 87 Tigger 656 22.7 6.7 8.5 62213 62S11 O9989 C L.1MC4a LINEAL1 -1844 6038 S745 88 309 32.5 6.3 3.3 63O88 63262 O9238 C MIRC SINEMIR -19 249 70 89 307 26.2 21.7 1.O 63277 63442 O9058 - HAL1 LINEAL1 42 241 -2266 90 820 26.3 16.0 3.2 63465. 6426S O823S - HAL1 LINEAL1 271 1172 -1335 90 744. 23.8 8.6 6.5 64278 64682 O7818 - HAL1 LINEAL1 1215 1627 -880 90 646 29.9 9.2 1.7 6471 O 64981 O7519 - HAL1 LINEAL1 1667 1958 -549 90 Ex- 646 29.9 9.2 1.7 6471 O 64981 O7519 - HAL1 LINEAL1 1667 1958 -549 90 4 cluded region 11 2221 11.7 2.O O.O 65009 65307 O7193 - AluSZ6 SINE Alu 1 305 -7 91 741, 28.S 17.7 SO 653O8 65642 O6858 - HAL1 LINEAL1 15 396 -2111 92 1932 12.4 O.4 O.O 65643 65900 O66OO - AluSx SINE Alu 42 300 -12 93 741. 
25.5 7.2 8.2 65901 66135 O6365 - HAL1 LINEAL1 397 625 - 1882 92 S13 26.8 6.3 2.2 66162 66.382 O6118 - HAL1 LINEAL1 743 972 -1535 92 226, 27.4 8.6 9.6 66385 66535 OS965 - HAL1 LINEAL1 1945 2094 -413 92 2516 7.3 O.O 1.3 66536 668SO OS6SO - AluY SINE Alu 1 311 O 94 US 2013/0130246 A1 May 23, 2013 31

TABLE 4-continued Total Alu sequences in probes 11 (10.5%) Total Alu Sequences in excluded regions 93 (89.4% position in query repeat position in repeat Alu % % % Sequence (hg18 matching class (left) end begin linkage Seq score div. del. ins. begin end (left) -- repeat family begin end (left) id (count) 226, 27.4 8.6 9.6 66851 66926 -105574 - HAL1 LINEAL1 2095 2166 -341 92 482O 10.2 21 O.O 66927 676OO -104900 - LTR12 LTRAERV1 1 688 O 95 226, 27.4 8.6 9.6 676O1 67698 -104802 -- HAL1 LINEAL1 2167 2268 -239 92 2139 11.2 O.O. O.O 67853 68168 - 104332 C AluY SINE Alu O 311 2 96 Probe 13 460 25.0 6.8 1.9 69115 69261 -103239 - L2a LINEAL2 1657 1810 -1609 97 O 85O 28.6 3.9 2.3 69391 69648 -102852 -- L2a LINEAL2 273S 2996 -423 97 345 23.9 19.3 1.4 6967O 69788 -102712 -- L2a LINEAL2 3286 3425 -1 97 327 31.5 8.0 3.0 69875 701 OO -102400 C L2 LINEAL2 -923 2496 2260 98 Ex- 2153 8.9 2.O 1.O 71648 71776 -100724 - AluSx SINE Alu 1 129 -183 99 3 cluded region 12 225 O.O. O.O. O.O 71777 71801 - 1 OO699 - (TAAA)n Simple repeat 2 26 O OO 2153 8.9 2.O 1.O 71802 71.96S -100.53S - AluSx SINE Alu 130 296 -16 99 2223 8.1 O.O. 9.2 72116 72437 -100063 C AluSp SINE Alu -1.8 295 1 O1 Probe 967 25.5 2.0 3.7 73109 73356 -99144 C MIR SINEMIR -2 260 17 O2 O 14 Ex- 2433 9.2 O.O. O.3 74262 74.565 -97935 - AluSX1 SINE Alu 1 303 -9 O3 5 cluded region 13 1011 11.4 O.O. O.7 74578 74717 -97783 - Alub SINE Alu 1 139 -173 O4 2204 12.2 O.O. O.3 7472O 7SOO7 -97493 - AluSx SINE Alu 2 288 -24 05 2390 11.0 O.7 O.O 7SOO8 75315 -97185 - AluSx SINE Alu 1 310 -2 O6 1873 27.2 6.O 3.O 7S901 76439 -96.061 C L2a LINEAL2 -8 3418 2826 O7 2284 9.4 14 O.O 7644.O 7672S -9577S C AluSx SINE Alu -22 290 1 O8 1873 25.9 6.3 2.2 76726 77867 -94.633 C L2a LINEAL2 -594 2825 1505 O7 Probe 1873 25.9 6.3 2.2 76726 77867 -94.633 C L2a LINEAL2 -594 2825 1505 O7 1 15 24 S4.8 O.O. 
O.O 77993 78023 -94.477 - AT rich Low com- 1 31 O 09 plexity 1987 14.5 O.7 2.3 78087 78396 -94.104 C AluJr SINE Alu -6 306 2 O 654 26.9 11.1 3.8 80306 8077S -9172S C HAL1 LINEAL1 -1 2506 2003 1 366 24.7 222 O.4 8091S 81145 -913SS C HAL1 LINEAL1 -698. 1809 1529 1 Ex- 366 24.7 222 O.4 8091S 81145 -913SS C HAL1 LINEAL1 -698. 1809 1529 1 15 cluded region 14 362. 14.3 O.O O.O 81186 81241 -91.259 C AluJo SINE Alu -1O 302 247 2 810 18.7 O.O. O.O 81247 81369 -91131 C AluJo SINE Alu -1.89 123 1 3 2337 10.8 1.O O.O 81439 81745 -907SS C AluSq2 SINE Alu -2 310 1 4 222 12.8 O.O. O.O 81790 81828 -90672 -- (T)n Simple repeat 1 39 O 5 645 22.8 3.0 3.0 81861 82095 -904OS C HAL1 LINEAL1 -1173 1334 1100 1 2246 12.8 O.O. O.O 82608 82904 -89596 - AluSz SINE Alu 1 297 -15 6 870 26.O 8.8 4.5 82945 83220 -89280 - L1MCS LINEAL1 6652 6915 -1046 7 2237 11.4 O.O. O.7 83221 83518 -88.982 - AluSX1 SINE Alu 1 296 -16 8 870 26.O 8.8 4.S 83519 83591 -88.909 - L1MCS LINEAL1 6916 7007 -954 7 1689 17.8 3.1 2.O 83592 83884 -88.616 - Alub SINE Alu 3 298 -14 9 870 23.0 49 4.9 83885 84.043 -88.457 - L1MCS LINEAL1 7008 7187 -774 7 238S 8.7 O.O. O.3 84O76 84374 -88126 C AuSx3 SINE Alu -1 311 14 2O 361. 24.7 11.5 6.8 84442 84.667 -87833 C HAL1 LINEAL1 -1433 1074 839 1 2526 7.4 O.3 O.O 84867 8517S -87325 C AluSg4 SINE Alu -2 310 1 21 524 30.4 1.8 O.6 8S327 854.95 -87OOS C HAL1 LINEAL1 -2O66 441 271 1 51O 25.4 7.2. 6.6 85541 85640 -86860 - MIR SINEMIR 78 186 -76 22 23O2 10.3 O.O O.O 85641 8S941 -86.559 C AluSX1 SINE Alu -11 301 1 23 51O 25.4 7.2. 6.6 85942 86O21, -86479 - MIR SINEMIR 187 259 -3 22 1959 12.4 5.7 O.O 86679 86960 -85S4O C AluSq2 SINE Alu -14 298 1 24 3783. 12.4 2.8 O.3 8778S 883.89 -84.111 C Tigger1 DNATcMar- O 2418 1799 25 Tigger 2326 9.8 6.7 O.8 8839O 88749 -837S1 C THE1D LTRERVL- O 381 1 26 MaLR 6464 20.4 3.7 4.3 887SO 89064 -83436 C THE1D-int LTRERVL- O 1651 1336 26 MaLR 1687 11.7 O.4 O.4 8906S 89294 -83.206 C AluSZ6 SINE Alu -16 296 67 27 2204 13.9 O.O. 
O.O 89.295 89603 -82.897 - AluSg SINE Alu 2 310 O 28 6464 20.4 3.7 4.3 896O4 90942 -81558 C THE1D-int LTRERVL- -316 1335 5 26 US 2013/0130246 A1 May 23, 2013 32

TABLE 4-continued Total Alu sequences in probes 11 (10.5%) Total Alu Sequences in excluded regions 93 (89.4% position in query repeat position in repeat Alu % % % Sequence (hg18 matching class (left) end begin linkage Seq score div. del. ins. begin end (left) -- repeat family begin end (left) id (count) 21 SS 11.9 7.3 1.1 90947 91303 -81197 C THE1D LTRERVL- O 381 3 26 MaLR 271 6 11.2 3.1 1.9 913O8 91627 -8O873 C Tigger1 DNATcMar- -617 1801. 1473 25 Tigger 2474 7.4 O.3 O.O 91628 91926 -80574 C AluSp SINE Alu -12. 301 2 29 271 6 11.2 3.1 1.9 91927 92061 -80439 C Tigger1 DNATcMar- -946 1472 1341 25 Tigger 691 18.9 2.0 4.8 92O)6O 92209 -80291 C Tigger1 DNATcMar- -2271 147 2 30 Tigger 2112 13.6 O.7 O.3 92309 92.610 -79890 - AluSz SINE Alu 1 303 -9 31 23 65.2 O.O. O.O 93071 93.093 -794.07 - AT rich Low com- 1 23 O 32 plexity 259 25.2 8.8 1.4 93163 93299 -79201 + Charlie16a DNA haT- 195 341 -1 33 Charlie 2340 9.7 O.7 O.O 93.378 93.675 -78825 - AluSq2 SINE Alu 1 3OO -12 34 Probe 2O2 33.9 104 24 943OS 94419 -78081 - MIR3 SINEMIR 82 205 -3 35 O 18 206 12.9 O.O. O.O 9474O 94770 -77730 - (TTA)n Simple repeat 2 32 O 36 61S 27.6 3.3 3.8 94907 95117 -77383 - MIR SINEMIR 34 243 -19 37 Ex- 323 25.3 7.1 7.8 96452 966O2 -75898 C HAL1b LINEAL1 -1336 673 523 38 1 cluded region 15 239S 1 O.S O.O O.O 96.603 96907 -75593 C AluY SINE Alu -6 305 1 39 323 25.3 7.1 7.8 96.908 97051 -75449 C HAL1b LINEAL1 -1487 S22 380 38 Probe 323 25.3 7.1 7.8 96.908 97051 -75449 C HAL1b LINEAL1 -1487 S22 380 38 1 19 1346 2S.S 13.0 3.7 97232 97.965 - 74535 C L2a LINEAL2 -1 3425 2625 40 795. 20.8 10.2 O.O 97.979 98.175 -74325 C L2a LINEAL2 -869 2SSO 2334 40 1175 5.3 O.O O.O 98188 98.319 - 74181 C AluY SINE Alu -179 132 1 41 957 25.0 3.7 5.0 98.323 98.646 -73854 C L2a LINEAL2 -1091 2328 2009 40 1822 28.0 S.S. 2.8 9866O 99147 -73353 C L2a LINEAL2 -1465 1954 1460 40 Ex- 1822 28.0 S.S. 
2.8 9866O 99147 -73353 C L2a LINEAL2 -1465 1954 1460 40 1 cluded region 16 2307 7.8 3.8 O.O 99.148 99440 -73.06.0 - AluY SINE Alu 1 3O4 -7 42 1822, 28.8 8.3 1.8 994.41 10OS2O -71980 C L2a LINEAL2 -196O 1459 259 40 Probe 1822, 28.8 8.3 1.8 994.41 10OS2O -71980 C L2a LINEAL2 -196O 1459 259 40 O 2O 229 9.1 O.O O.O OO540 10OS83 -71917 C L1MA1 LINEAL1 O 6302 6259 43 Ex- 1871 12.6 O.O. O.O O2237 102490 -7001 O - AluSx SINE Alu 44 297 -15 44 1 cluded region 17 Probe 236 24.6 4.5 2.9 O2761 102827 -69673 C HAL1b LINEAL1 -1785 224 157 38 O 21 1602 16.4 37 0.3 O2909 103217 - 69283 C MLT1C LTRERVL- -19 448 130 45 MaLR 7752 5.3 1.0 O2 O3218 104175 -68325 - LTR13A LTRERVK 1 966 O 46 Ex- 7752 5.3 1.0 O2 O3218 104175 -68325 - LTR13A LTRERVK 1 966 O 46 1 cluded region 18 1602 16.4 37 0.3 O4176 1041.89 -68311 C MLT1C LTRERVL- -338 129 115 45 MaLR 1941 15.5 O3 O.7 O4190 104485 -68O15 C AluSX3 SINE Alu -16 296 2 47 1279 12.0 10.2 11 O4490 104734 - 67766 - MER47A DNATcMar- 30 296 -70 48 Tigger Probe 1279 12.0 10.2 11 O4490 104734 - 67766 - MER47A DNATcMar- 30 296 -70 48 1 22a Tigger 1976 26.4 3.6 4.5 O481O 105732 -66,768 C L1MDa LINEAL1 -3919 2699 1780 49 298 16.3 O.O O.O OS741 105789 -66711 - MER47A DNATcMar- 307 355 -11 50 Tigger 181 32.9 3.5 2.3 O6217 1063O3 -66.197 - L2 LINEAL2 2804 2891 -528 51 667 17.2 9.0 O.O O6378 106499 - 66001 - AluJr SINE Alu 1 133 -179 52 S84 28.8 7.O 1.0 O6933 107118 -65382 C MIRb SINEMIR -63. 205 9 53 979 25.1 18.2 0.2 O7288 1076SS - 64845 C LTR16 LTRERVL -4 434 1 S4 US 2013/0130246 A1 May 23, 2013 33

TABLE 4-continued Total Alu sequences in probes 11 (10.5%) Total Alu Sequences in excluded regions 93 (89.4% position in query repeat position in repeat Alu % % % Sequence (hg18 matching class (left) end begin linkage Seq score div. del. ins. begin end (left) -- repeat family begin end (left) id (count) Ex- O cluded region 19 Probe 850 11.8 48.0 1.O. 108472 108675 -63825 - AluSz SINE Alu 1 3OO -12 55 1 22b 2O71 22.6 7.S 3.2 1086.79 109832 -62668 C L.1MC4a LINEAL1 -5 7877 6672 56 1300 27.4 6.7 S-3 109826 OSS7 -61943 C L.1MC4a LINEAL1 -166O 6222 5481 56 SO3 25.1 170 0.4 1505 1716 -60784 C MIR SINEMIR -14 248 2 57 26 76.9 O.O. O.O 1823 1848 - 60652 -- AT rich Low com- 1 26 O 58 plexity 25 48.0 O.O. O.O 1826 1850 - 60650 - AT rich Low com- 1 25 O 59 plexity Ex- 2266 11.9 O.O. O.7 2029 2338 - 60162 C AluSZ6 SINE Alu -1 311 4 60 5 cluded region 2O 434 30.8 9.8 1.8 2397 2439 -6OO61 C MIRC SINEMIR -1.8 250 211 61 347 21.8 13 O.O 2440 2S17 -59983 - MADE2 DNATcMar- 1 79 -1 62 Mariner 434 30.8 9.8 1.8 2518 2678 -59822 C MIRC SINEMIR -58. 210 30 61 709 17.2 7.0 5.1 3509 3565 -58935 C MIR SINEMIR -48 214 158 63 1081 17.9 1.O 2.0 3566 3770 -5873O C MER6B DNATcMar- -3 207 5 64 Tigger 709 17.2 7.0 5.1 3771 3884 -58616 C MIR SINEMIR -105 157 40 63 922, 13.4 O.O. O.8 5087 5220 -57280 + FLAMC SINE Alu 1 133 -10 65 2194 12.4 O.O. O.3 5855 6153 -56347 C AluSx SINE Alu -14 298 1 66 21 52.4 O.O. O.O 666.2 6682 -55818 - AT rich Low com- 1 21 O 67 plexity 228 22.7 O.O. O.O 8269 8312 -54188 C MARNA DNATcMar- -263 323 28O 68 Mariner 334 29.6 11.7 2.5 83.35 85.14 -53986 C MARNA DNATcMar- -358 228 33 68 Mariner 2S8 28.7 4.7 4.7 9667 98.16 -52684 C MERSA1 DNA haT- -7 159 10 69 Charlie 2160 12.5 O.O O.O. 
121296 121598 -SO902 -- AluSZ6 SINE Alu 1 303 -9 70 2590 48 0.3 2.6 121961 122276 -SO224 C AluY SINE Alu -2 309 1 71 2312 9.6 0.3 1.O 12252S 122837 -49663 C AluSq2 SINE Alu -1 311 1 72 Probe 383 25.5 1.O 1.O 124840 124938 -47562 -- L3 LINECR1 2392 2490 -1609 73 O 25 314 31.S 4.2 0.7 124992 1251.35 -4736S - MIRC SINEMIR 119 267 -1 74 347 26.4 16.3 1.O. 125363 125534 -46966 -- L3 LINECR1 2843. 3040 -1059 73 274 30.5 O.9 3.8 125573 125681 -468.19 C L2c LINEAL2 -1S 3372 3267 75 SO1 32.6 2.8 3.6 125939 126189 -46311 - L3 LINECR1 3577 3825 - 274 73 399 25.0 S.7 O.2 126418 126549 -45951 C MLT1H1 LTRERVL- -368 181 1 76 MaLR 24 45.8 O.O. O.O. 127392 127415 -45085 - AT rich Low com- 1 24 O 77 plexity 283 26.2 12.5 O.9 127944 128047 -44453 C L1MCS LINEAL1 -36 7925 7810 78 327 26.4 O.O. O.O. 128140 12823O -4427O C L1MCS LINEAL1 -396 7S65 7475 78 Ex- 327 26.4 O.O. O.O. 128140 12823O -4427O C L1MCS LINEAL1 -396 7S65 7475 78 3 cluded region 21 SO4 29.0 6.4 3.1 128273 1284.12 -44088 C L1MC4 LINEAL1 -2O 8022 7869 79 223S 1 O.O. O.3 4-S 128413 128.733 -43767 - AuSZ6 SINE Alu 1 3O8 -4 8O SO4 29.0 6.4 3.1 128734 128841 -43659 C L1MC4 LINEAL1 -174 7868 7766 79 27 40.7 O.O. O.O. 128958 128984 -43516 - AT rich Low com- 1 27 O 81 plexity 221 6 10.3 O.O O.7 1290O2 129293 -432O7 C AluSX1 SINE Alu -22 290 1 82 26 69.2 O.O. O.O. 1293O4 1293.29 -43171 - AT rich Low com- 1 26 O 83 plexity 716. 29.2 6.6 2.7 129439 129758 -42742 C L1MC4 LINEAL1 -495 7547 7216 79 284. 25.5 7.7 12.O. 1298O3 129944 -42556 C L1ME4a LINEAL1 -90 6034 S888 84 2477 8.5 O.O O.O. 12994S 130249 -42251 C AluSx SINE Alu -7 305 1 85 284. 25.5 7.7 12.O 1302SO 130445 -420SS C L1ME4a LINEAL1 -237 5887 5710 84 US 2013/0130246 A1 May 23, 2013 34

TABLE 4-continued Total Alu sequences in probes 11 (10.5%) Total Alu Sequences in excluded regions 93 (89.4% position in query repeat position in repeat Alu % % % Sequence (hg18 matching class (left) end begin linkage Seq score div. del. ins. begin end (left) -- repeat family begin end (left) id (count) Probe 348 38.5 OS 2.2 3072S 130910 -41590 C MIRb SINEMIR -3S 233 51 86 O 26 494. 23.5 33 1.6 30919 131O39 -41461 C L1M6 LINEAL1 -469 1805 1683 87 379 28.8 9.6 4.4 31119 131336 -41164 C MLT1J LTRERVL- -48 464 236 88 MaLR 22 63.6 O.O. O.O 3.1455 131476 -41024 - AT rich Low com- 22 O 89 plexity S59 274 4.7 S.1 31889 132146 -40354 - L2a LINEAL2 317O 3426 O 90 3SO 23.1 26 O.O 32152 132229 -40271 C L1MES LINEAL1 -321 S873 5794 91 443 28.0 21.4 3.8 32249 132461 -4OO39 C MIR SINEMIR -4 258 8 92 269 2S.O. 12.O O.7 32474 132606 -398.94 C L1MS LINEAL1 -339 S784 S637 93 S82 25.6 0.8 O.O 32696 132828 -39672 -- L2a LINEAL2 3293 3426 O 94 Ex- 2247 9.0 O.O. O.O 32904 133181 -393.19 C AluSg SINE Alu -3 279 2 95 1 cluded region 22 Probe 2247 9.0 O.O. O.O 32904 133181 -393.19 C AluSg SINE Alu -3 279 2 95 1 27 2851 6.5 22 O.3 33284 133639 -3.8861 -- THE1C LTRERVL- 3 365 -10 96 MaLR 10891 9.9 3.9 O.6 3364O 13S167 -37333 - THE1C-int LTRERVL- 1578 -2 96 MaLR 2549 7.5 2.2 4.5 3S168 1353.07 -37193 - THE1C LTRERVL- 19 16O -215 96 MaLR Ex- 2549 7.5 2.2 4.5 3S168 1353.07 -37193 - THE1C LTRERVL- 19 16O -215 96 2 cluded MaLR region 23 2O27 12.1 O.O 8.5 3S3O8 135638 -36862 C AluSX1 SINE Alu -6 306 2 97 2549 7.5 2.2 4.5 35639 135862 -36638 - THE1C LTRERVL- 161 375 O 96 MaLR 256 26.8 7.8 2.7 36283 136424 -36O76 C L1M6B LINEAL1 -156 213 65 98 2419 8.7 O.O. O.7 36753 137063 -35437 C AluSq2 SINE Alu -3 309 1 99 Probe 289 30.O 4.7 S4 371.89 137336 -3S164 C L2a LINEAL2 -4 3422 3276 2OO 1 28a. 258. 29.4 6.7 1.8 37612 13771S -34785 - MIRb SINEMIR 116 224 -44 2O1 397 25.0 3.8 2.5 39471. 
139630 -32870 C Charlie188 DNA haT- -2 340 179 2O2 Charlie 1647 17.7 2.4 4.0 39631 140006 -32494 - L1MB4 LINEAL1 5777 6146 -34 2O3 458 S.7 O.O. O.O 40640 140692 -31808 C AYb8 SINE Alu -260 58 6 204 245 20.4 2.0 O.O 4O696 14O744 -31756 C L1MS LINEAL1 -453 S671 S622 205 360 2O.S 13.3 O.O 411 OS 141238 -31262 C L1ME4a LINEAL1 -7 6117 5952 2O6 Ex- 604 23.5 13.9 O.4 41588 141796 -30704 C MIRC SINEMIR -10 258 22 2O7 9 cluded region 24 355 33.1 18, 3.6 41846 142014 -30486 C MIR3 SINEMIR -23 185 2O 208 290 30.1 11 O.O 42104 1421.96 -3O3O4 C MIR3 SINEMIR -1 2O7 114 209 24S 23.2 11.5 6.1 428OS 142882 -296.18 C L2c LINEAL2 -2O 3367 3286 210 189 7.4 O.O. O.O 43821 143847 -28653 + (CTGGGG)n Simple repeat 6 32 O 211 24 54.2 O.O. O.O 44054 144077 -28423 + GC rich Low com- 1 24 O 212 plexity 183 8.O O.O. O.O 44078 144102 -28398 - (CTG)n Simple repeat 1 25 O 213 1181 17.2 11-S 1.5 4SS89 145671 -26829 - MER33 DNA haT- 1 81 -243 214 Charlie 2001 15.5 O.O O.3 45672 145974 - 26526 - AluJr SINE Alu 1 3O2 -10 215 1181 17.2 11-S 1.5 4597S 146185 - 26315 - MER33 DNA haT- 82 324 O 214 Charlie 188 32.9 7.8 1.1 46389 146SS4 -25946 C L2 LINEAL2 -1148 2271 2095 216 247 23.3 8.6 4.0 46683 1468O8 -25692 - L2c LINEAL2 3229 3358 -17 217 2357 7.8 O3 O.O 46879 1471.93 -2S307 - AluSp SINE Alu 1 313 O 218 295 29.2 6.9 O.O 47406 147S35 -24965 - HAL1 LINEAL1 1SO 288 - 2219 219 793 22.6 S.8 4.9 47869 1481.10 - 24390 C MER46C DNATcMar- O 338 95 220 Tigger 1758 10.8 O.O. O.4 48122 148352 -24148 C Alub SINE Alu -81 231 2 221 722 16.O 7.9 7.5 48393 148639 -23861 - L1MB2 LINEAL1 S942 61.78 -5 222 298 226 O.O. O.O 486.51 148712 -23788 C MER46C DNATcMar- -274 64 3 220 Tigger US 2013/0130246 A1 May 23, 2013 35

TABLE 4-continued Total Alu sequences in probes 11 (10.5%) Total Alu Sequences in excluded regions 93 (89.4% position in query repeat position in repeat Alu % % % Sequence (hg18 matching class (left) end begin linkage Seq score div. del. ins. begin end (left) -- repeat family begin end (left) id (count) 2096 9.S 4.7 1.6 1494.17 149712 -22788 - AluSX1 SINE Alu 1 305 -7 223 2301 9.8 O.9 2.2 149713 150028 -22472 -- AluSq SINE Alu 1 312 -1 224 264. 29.2 8.3 12.8 150O88 15O137 - 22363 C MIRb SINEMIR -17 251 2O2 225 2099 11.0 O.3 7.2 15O138 150465 -22O3S C AluSx SINE Alu -5 307 1 226 266 27.9 6.O 7.6 1SO466 150634 -21866 C MIRC SINEMIR -67 2O1 38 225 278 214 15.O 4.8 151220 15131 O -21190 - L2a LINEAL2 33O3 3405 -21 227 2280 10.7 O.O. O.O 151311 1516O1 -2O899 C AluSX1 SINE Alu -21 291 1 228 278 214 15.O 4.8 1516O2. 151622 -20878 - L2a LINEAL2 34O6 3426 O 227 28 68.6 O.O. O.O 152478. 152S12 - 19988 - AT rich Low com- 1 35 O 229 plexity 2204 11.1 13 O.O 15258S 152906 - 19594 - AluSx SINE Alu 10 312 O 230 2129 11.3 O.O O.7 15292S 1532SO - 192SO C AluSz SINE Alu O 312 1 231 Probe 1328 11.5 3.0 43 154O64 1543OO -182OO C L1MA6 LINEAL1 -7 6293 6060 232 1 29 1331 9.1 O5 O.O 1543O1 154486 -18O14 - L1MA6 LINEAL1 5791 5977 -323 232 1253 11.9 O.O. O.O 154521 154688 -17812 -- AluSp SINE Alu 137 3O4 -9 233 186 4.3 O.O O.O 154690 154712 - 17788 - (CA)n Simple repeat 2 24 O 234 505 17.1 1.7 4.4 155541 155656 -16844 C Charlie4Z DNA haT- O 167 55 235 Charlie Ex- 2345 9.2 O.O 4.8 155799 1561.23 -16377 - AluSg4 SINE Alu 1 310 -2 236 6 cluded region 25 2161 10.1 2.1 O.O. 1S6S4S 156830 -15670 C AluSx SINE Alu -20 292 1 237 2127 12.2 O.O 1.7 15692O 157222 - 15278 C AluSz SINE Alu -14 298 1 238 2272 9.2 O.O 14 15747S 157817 - 14683 - AluSx SINE Alu 6 312 O 239 2219 3.4 2.7 O.O 15783O 157956 - 14544 - AluY SINE Alu 1 127 - 184 240 369 O.O. O.O. 
O.O 157957 157997 - 14503 - (TAAA)n Simple repeat 2 42 O 241 2219 3.4 2.7 O.O 157998 158132 - 14368 - AluY SINE Alu 128 269 -42 240 Probe 2231, 12.O O.3 0.7 160325 160633 -11867 C AluSX1 SINE Alu -4 3O8 1 242 2 30 1987 14.8 O.3 5.8 160810 161034 -11466 C Tigger3a DNATcMar- -20 328 106 243 1gger 1922 13.6 O.O. O.7 161035 161313 -11187 - AluSx SINE Alu 1 277 -35 244 27O O.O. O.O. O.O 161319 161348 -11152 -- (TAAA)n Simple repeat 3 32 O 245 1987 14.8 O.3 5.8 161349 161461 -11039 C Tigger3a DNATcMar- -243 105 2 243 1gger Probe 408. 29.6 1.O 118 161656 161862 -10638 - MER2OB DNA haT- 2 188 -595 246 O 31 Charlie 628 26.9 8.4 2.9 162861 163O86 -94.14 C MIR SINEMIR -23 239 2 247 S42. 30.2 3.3 0.9 16348S 163698 -8802 C L2 LINEAL2 -745 2674 2456 248 428 34.8 16.6 1.9 164306 164914 -7586 - L3 LINECR1 655 1352 -2747 249 181. 19.1 4.8 O.O 16504.8 165089 -741.1 - MIRb SINEMIR 144 187 -81 250 879 27.8 2.1 1.3 165105 165341 -7159 + Tigger13a DNATcMar- 12 2SO -521 251 1gger 450 29.4 10.1 0.0 165344 165571 - 6929 + Tigger13a DNATcMar- 342 592 - 179 252 1gger 460 22.3 7.1 44 165562 165716 -6784 + Tigger13a DNATcMar- 607 765 -6 253 1gger 3O8 24.3 O.O O.O 165721 165786 -6714 - MIRb SINEMIR 197 262 O 2S4 195 36.4 1.O 1.O 16581.6 165915 -6585 - L3 LINECR1 1344. 1443 -2656 249 585 27:S 202 0.7 166O18, 166396 - 6104 - L1MS LINEAL1 2S18. 2973 -3173 255 Ex- 585 27:S 202 0.7 166O18, 166396 - 6104 - L1MS LINEAL1 2S18. 2973 -3173 255 6 cluded region 26 2492 6.5 O.O O.O 166397 166690 -5810 C AluY SINE Alu -16 295 2 2S6 1414 15.4 1.4 19.3 166699 166938 -5562 C Alub SINE Alu -2 3OO 115 257 276 3.0 O.O. 
O.O 166939 166971 -5529 - (TC)n Simple repeat 2 34 O 258 1414 15.4 1.4 19.3 166972 167083 -5417 C Alub SINE Alu -188 114 1 257 237 28.2 10.3 2.2 167084 167217 -5283 - L1MS LINEAL1 2981 3118 -3028 255 746 18.4 O.O 3.8 167220 167355 -5145 + FLAMC SINE Alu 2 132 -11 259 299 25.1 8.5 1.1 167398 167562 - 4938 - L1MS LINEAL1 3219 3395 -2751 255 1486 16.O O.O 3.7 167618 167867 -4633 C AluJo SINE Alu -20 292 52 260 771 3.0.1 6.1 5.2 167896 1681 16 -4384 - L1MS LINEAL1 3410 3626 -2S2O 255 246O 9.3 O3 O.O 1681.17 168428 -4O72 C AluSp SINE Alu O 313 1 261 771 3.0.1 6.1 5.2 168429 168679 -3821 - L1MS LINEAL1 3627 3886 -2260 255 706 21.9 4.8 8.3 168751 169044 -3456 - L1MS LINEAL1 3929 42O8 -1938 255 US 2013/0130246 A1 May 23, 2013 36

TABLE 4-continued
Total Alu sequences in probes: 11 (10.5%)
Total Alu sequences in excluded regions: 93 (89.4%)

Columns: score; % div.; % del.; % ins.; position in query sequence (hg18): begin, end, (left); strand; matching repeat; repeat class/family; position in repeat: begin, end, (left); linkage id; Alu Seq (count)

2O31 12.3 14 O.7 169045 169336 -3164 - AluSX1 SINE Alu 1 294 -18 262 71.6 22.1 11 S.O 169349 1695.34 -2966 -- L1M4 LINEAL1 2 180 - 6362 263 927 20.2 1.2 1.7 169546 169718 -2782 C FAM SINE Alu -13 172 1 264 2029 23.8 8.0 2.8 16972O 170776 - 1724 - L1M4 LINEAL1 188 1298 -524.4 263 Probe 2029 23.8 8.0 2.8 16972O 170776 - 1724 - L1M4 LINEAL1 188 1298 -524.4 263 O 32 1480 20.6 S.8 O.O 170776 171221 -1279 - L1M2 LINEAL1 1 472 -6377 26S 607 26.4 O.7 O.O. 171233 171376 -1124 - L1M2b LINEAL1 498 642 -6567 266 3991. 25.2 2.7 33 171348 172SOO O + L1M2 LINEAL1 581 1642 -52O7 26S Ex- 3991. 25.2 2.7 33 171348 172SOO O + L1M2 LINEAL1 581 1642 -52O7 26S O cluded region 27
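The Alu content that Table 4 tallies for probes and excluded regions can be recomputed from the repeat coordinates. The following is a minimal sketch, not the applicants' procedure: the function names and the three test windows are hypothetical, the four annotation rows are transcribed from the rows above, and the 200-nucleotide and 10% thresholds are taken from claim 1. It assumes the repeat annotations do not overlap one another.

```python
from typing import Iterable, Tuple

# (begin, end, class/family) in query coordinates, 1-based inclusive.
Repeat = Tuple[int, int, str]

# Four annotation rows transcribed from the last part of Table 4.
TABLE4_ROWS = [
    (169045, 169336, "SINE/Alu"),  # AluSx1
    (169349, 169534, "LINE/L1"),   # L1M4
    (169546, 169718, "SINE/Alu"),  # FAM
    (169720, 170776, "LINE/L1"),   # L1M4
]


def alu_fraction(start: int, end: int, repeats: Iterable[Repeat]) -> float:
    """Fraction of the window [start, end] covered by SINE/Alu annotations.

    Assumes the annotations do not overlap each other.
    """
    covered = 0
    for r_begin, r_end, family in repeats:
        if family != "SINE/Alu":
            continue
        overlap = min(end, r_end) - max(start, r_begin) + 1
        if overlap > 0:
            covered += overlap
    return covered / (end - start + 1)


def passes_probe_criteria(start: int, end: int, repeats: Iterable[Repeat],
                          min_len: int = 200, max_alu: float = 0.10) -> bool:
    """Hypothetical filter: at least min_len nucleotides and less than max_alu Alu."""
    length = end - start + 1
    return length >= min_len and alu_fraction(start, end, list(repeats)) < max_alu


# Share of Alu sequences in probes vs. excluded regions (counts from Table 4).
alu_in_probes, alu_in_excluded = 11, 93
total_alu = alu_in_probes + alu_in_excluded
print(f"probes: {100 * alu_in_probes / total_alu:.1f}%, "
      f"excluded: {100 * alu_in_excluded / total_alu:.1f}%")

# Hypothetical windows, chosen only to exercise the filter.
print(passes_probe_criteria(169720, 170776, TABLE4_ROWS))  # Alu-free window
print(passes_probe_criteria(169000, 170999, TABLE4_ROWS))  # window spanning two Alu repeats
```

A window that falls entirely within a non-Alu annotation passes, while one spanning the AluSx1 and FAM repeats does not.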

TABLE 5
Description of the 6 characterized large rearrangements as detected by MLPA and Molecular Combing

Sample  Gene   MLPA status   Molecular Combing             Breakpoints (bp)          Mechanism       Mutation name                   Reference
1       BRCA1  Dup ex 13     Dup ex 13, 6.1 ± 1.6 kb       38483825-38489905         Alu-Alu HR      c.4186-1785_4358-1667dup6081    Puget et al. (1999)
2       BRCA1  Del ex 2      Del ex 2, 40.8 ± 3.5 kb       NBR1 38562663-38562427;   Pseudogene-Alu  c.-33024_80-3832del36936        Puget N, 2002, Am J Hum Genet 70: 858-865
                                                           BRCA1 38525728-38525492
3       BRCA1  Del ex 2      Del ex 2, 39.0 ± 2.6 kb       NBR1 38562663-38562427;   Pseudogene-Alu  c.-33024_80-3832del36936        Puget N, 2002, Am J Hum Genet 70: 858-865
                                                           BRCA1 38525728-38525492
4       BRCA1  Dup ex 18-20  Dup ex 18-20, 6.7 ± 1.2 kb    38460514-38470596         Alu-Alu HR      c.5075-1093_5277+2089dup10082   Staaf et al. (2008)
5       BRCA1  Del ex 15     Del ex 15, 4.1 ± 1.2 kb       38478177-38481174         Alu-Alu HR      c.4484+857_4676-1396del         Puget et al. (1999b)
6       BRCA1  Del ex 8-13   Del ex 8-13, 20 ± 2.8 kb      38507324-38483560         Alu-Alu HR      c.442-1901_4358-1404del23763    Puget et al. (1999b)

All patients were previously characterized by high-resolution aCGH, and the reported values were originally described by Rouleau et al. (Rouleau 2007).

TABLE 6
Robustness of BRCA1 and BRCA2 signals measurement in 10 control blood donors

BRCA1: mean measured motif length (kb)

Blood donor    g1b1  g2b1  g3b1  g4b1  g5b1  g6b1  g7b1
7232            8.6  10.0  13.3  16.9  19.9   9.9  16.5
7673            8.4   9.9  14.2  17.5  20.8  11.2  18.2
7639            7.7   8.8  11.5  15.3  18.0   8.4  15.0
7671            7.6  10.6  11.0  16.7  19.4   9.6  16.0
7672            8.4  10.0  13.0  16.8  20.2   9.9  17.3
An 8            7.1   8.2  12.2  14.9  18.7   8.2  15.9
An 11           8.6   9.4  11.8  16.4  20.5   9.6  17.4
An 12           8.6  11.0  12.5  17.0  20.8  11.2  18.0
An 13           8.7   9.9  13.6  17.1  20.2   9.8  17.6
An 14           8.4   9.8  12.2  16.5  20.2   9.5  17.5
μ (measured)    8.2   9.7  12.5  16.5  19.9   9.7  16.9
SD              0.5   0.8   0.9   0.8   0.9   0.9   1.0
calculated      8.5   9.5  12.3  16.5  19.7   9.3  17.7
delta           0.3  -0.2  -0.2   0.0  -0.2  -0.4   0.8

TABLE 6-continued
Robustness of BRCA1 and BRCA2 signals measurement in 10 control blood donors

BRCA2: mean measured motif length (kb)

Blood donor    g1b2  g2b2  g3b2  g4b2  g5b2
7232           20.2  24.0  15.6  20.6  21.3
7673           22.6  24.4  16.4  22.3  22.4
7639           19.7  21.3  15.5  19.2  19.4
7671           20.7  22.3  15.9  21.3  21.3
7672           21.2  23.4  16.9  21.7  21.3
An 8           20.6  21.1  15.2  20.3  19.5
An 11          22.1  23.9  15.8  21.9  21.9
An 12          21.7  24.7  17.3  22.9  21.9
An 13          22.6  22.2  16.6  21.2  20.8
An 14          22.6  23.7  17.2  22.3  21.7
μ (measured)   21.4  23.1  16.2  21.4  21.2
SD              1.0   1.2   0.7   1.1   1.0
calculated     20.7  23.5  16.1  21.1  20.8
delta          -0.7   0.4  -0.1  -0.3  -0.4

BRCA1 motifs g1b1 to g4b1
g1b1 (8.5)

Case  BRCA1 mutation    Allele   n   μ (kb)  delta   SD  CV (%)   SF
1     Dup ex 13         Tot     36      8.0   -0.5  1.2    15.0  2.1
2     Del ex 2, 36 kb   Tot     21      8.2   -0.3  1.8    22.0  2.1
3     Del ex 2, 39 kb   Tot     23      8.8    0.3  3.1    35.1  1.9
4     Dup ex 18-20      Tot     28      8.0   -0.5  1.3    16.0  2.1
5     Del ex 15         Tot     31      8.0   -0.5  1.0    11.9  2.1
6     Del ex 8-13       Tot     21      8.8    0.3  0.9    10.2  1.9

BRCA1 motifs g1b1 to g4b1
g2b1 (9.5)

Case number l CV BRCA1 Mutation l (kb) delta SD (%) SF l

1 Tot 36 10.1 O6 2.4 23.8 1.9 38 Dupex 13 Wt nut delta 2 Tot 18 9.2 -0.3 1.8 19.6 2.1 17 Del ex Wt 36 kb nut delta 3 Tot 25 11.4 1.9 3.3 28.7 1.7 25 Delex 2 Wt 39 kb nut delta 4 Tot 30 9.7 O.2 2.3 24.0 2.0 33 Dup ex Wt 22 18-2O nut 11 delta US 2013/0130246 A1 May 23, 2013 38

TABLE 6-continued Robustness of BRCA1 and BRCA2 signals measurement in 10 control blood donors 5 Tot 32 9.8 O.3 1.5 15.3 1.9 33 Del ex 15 Wt nut delta 6 Tot 22 10.8 1.3 1.9 17.6 1.9 22 Del ex 8-13 Wt nut delta BRCA1 motifs g1b1 to b4b1 g3b1 (12.3)

Case number l CV BRCA1 Mutation (kb) delta SD (%) SF SEM 95% CI Error

1 To 16 -0.7 2.1 18.1 2.1 Dupex 13 W l (8. 2 To 2.0 -0.3 1.9 15.8 2.1 Del ex W 36 kb l 8. 3 To 1.6 -4.9 3.5 3O4 2.1 Delex 2 W 39 kb l 8. 4 To S.O 2.7 3.5 23.3 1.6 Dup ex W 2.7 1.1 8.7 2.6 O.2 12.2 13.2 18-2O l 9.4 1.2 6.2 1.7 0.4 18.7 20.1 8. 6.7 5.5 7.9 1.2 5 To 17 -0.6 1.9 16.O 2.1 Del ex 15 W l 8. 6 To 16 -0.7 1.9 16.4 1.9 Del ex 8-13 W l (8. BRCA1 motifs g1b1 to b4b1 g4b1 (16.5)

Case number l CV BRCA1 Mutation l (kb) delta SD (%) SF SEM 95% CI Error

1 To 40 9.0 2.5 3.5 8.4 1.7 Dupex 13 W 21 6.1 1.6 9.8 2.0 O.3 15.4 16.8 mut 19, 22.2 2.O 8.9 1.5 O.S 21.3 23.1 (8. 6.1 4.5 7.7 1.6 2 To 22 5.9 -0.6 2.2 3.8 2.1 Del ex W 36 kb nut 8. 3 To 25 6.7 O.2 2.2 3.3 2.0 Delex 2 W 39 kb nut 8. 4 To 30 6.5 O.O 2.8 7.0 2.0 Dup ex W 18-2O nut 8. 5 To 33 4.3 -2.2 2.3 6.1 2.3 Del ex 15 W 12 6.9 1.3 7.6 2.0 0.4 16.2 17.7 mut 21 2.8 1.1 8.8 2.6 O.2 12.3 13.3 8. -4.1 -5.3 -2.8 1.2 6 To 23 7.5 -2.2 4.0 23.0 2.2 Del ex 8-13 W 13 20.8 1.6 7.9 1.6 O.S 19.9 21.7 mut 10 3.3 1.1 8.0 2.5 O.3 12.6 14.0 (8. -7.5 -90 -5.9 16 US 2013/0130246 A1 May 23, 2013 39

TABLE 6-continued
Robustness of BRCA1 and BRCA2 signals measurement in 10 control blood donors

BRCA1 motifs g5b1 to g7b1
g5b1 (19.7)

Case  BRCA1 mutation    Allele   n   μ (kb)  delta   SD  CV (%)   SF  SEM  95% CI           Error
1     Dup ex 13         Tot     37     18.5   -1.2  2.8    15.1  2.1
2     Del ex 2, 36 kb   Tot     22     19.2   -0.5  1.3     6.8  2.1
3     Del ex 2, 39 kb   Tot     19     19.6   -0.1  2.7    13.9  2.0
4     Dup ex 18-20      Tot     24     20      0.3  2.3    11    2
5     Del ex 15         Tot     28     19.4   -0.3  1.9     9.8  2.0
6     Del ex 8-13       Tot     23     12.8   -3.7  5.5    43.0  2.6
                        Wt      13     18.3         1.3     7.1  1.8  0.4  17.6 to 19.0
                        mut     10      5.8         0.5     8.6  5.7  0.2  5.5 to 6.1
                        delta                -12.5                         -13.5 to -11.5   1.0

BRCA1 motifs g5b1 to g7b1
g6b1 (9.3)

Case  BRCA1 mutation    Allele   n   μ (kb)  delta   SD  CV (%)   SF
1     Dup ex 13         Tot     34      9.1   -0.2  1.9    20.9  2.0
2     Del ex 2, 36 kb   Tot     18      8.8   -0.5  2.5    28.4  2.1
3     Del ex 2, 39 kb   Tot     19     10.5    1.2  3.0    28.3  1.8
4     Dup ex 18-20      Tot     23      9.8    0.5  1.6    17    1.9
5     Del ex 15         Tot     22      9.3    0.0  1.2    12.5  2.0
6     Del ex 8-13       Tot     21     10.5    1.2  1.9    18.1  1.8

BRCA1 motifs g5b1 to g7b1
g7b1 (17.7)

Case number l CV BRCA1 Mutation l (kb) delta SD (%) SF SEM 95% CI Error

1 Tot 31 16.2 Dupex 13 Wt nut delta US 2013/0130246 A1 May 23, 2013 40

TABLE 6-continued Robustness of BRCA1 and BRCA2 signals measurement in 10 control blood donors 2 To 2O 2.3 Del ex W 1 18.1 0.7 3.9 2.0 O.2 17.7 18.5 36 kb nut 9 8.1 1.6 19.8 4.4 O.S 7.0 9.2 (8. -10.0 -11.5 -8.5 1.5 3 To 16 1.9 Delex 2 W 5 17.3 O.S 2.9 2.0 O2 16.9 17.7 39 kb mut 11 8.7 1.2 13.8 4.1 0.4 8.0 9.4 8. -8.6 -9.8 -7.4 1.2 4 To 22 7.2 Dup ex W 18-2O nut 8. 5 To 2O 7.8 Del ex 15 W nut 8. 6 To 2O 8.0 O.3 2.4 13.3 2.1 Del ex 8-13 W nut (8. BRCA2 motifs g1b2 to g5b2 g1b2 (20.7)

Case  Allele   n   μ (kb)  delta   SD  CV (%)   SF
1     Tot     24     20.2   -0.5  2.6    12.9  2.0
2     Tot     31     20.2   -0.5  2.0     9.9  2.0
3     Tot     26     20.3   -0.4  2.2    10.8  2.0
4     Tot     21     21.3    0.6  3.2    15.0  1.9
5     Tot     27     21.5    0.8  2.1     9.8  1.9
6     Tot     21     22.6    1.9  2.0     8.8  1.8

BRCA2 motifs g1b2 to g5b2
g2b2 (23.5)

Case  Allele   n   μ (kb)  delta   SD  CV (%)   SF
1     Tot     25     22.2   -1.3  4.6    20.7  2.1
2     Tot     23     22.2   -1.3  1.0     4.5  2.1
3     Tot     28     22.5   -1.0  3.4    15.1  2.1
4     Tot     23     22.0   -1.5  3.7    16.8  2.1
5     Tot     28     22.6   -0.9  2.0     8.8  2.1
6     Tot     22     24.4    0.9  2.7    11.1  1.9

BRCA2 motifs g1b2 to g5b2
g3b2 (16.1)

Case  Allele   n   μ (kb)  delta   SD  CV (%)   SF
1     Tot     30     16.1    0.0  2.1    13.0  2.0
2     Tot     31     15.2   -0.9  1.2     7.9  2.1
3     Tot     30     16.8    0.7  2.2    13.1  1.9
4     Tot     30     16.2    0.1  2.9    17.9  2.0
5     Tot     29     16.6    0.5  2.2    13.3  1.9
6     Tot     22     18.0    1.9  2.1    11.7  1.8

BRCA2 motifs g1b2 to g5b2
g4b2 (21.1)

Case  Allele   n   μ (kb)  delta   SD  CV (%)   SF
1     Tot     28     20.6   -0.5  2.7    13.1  2.0
2     Tot     28     20.5   -0.6  1.4     6.8  2.1
3     Tot     30     21.0   -0.1  2.5    11.9  2.0
4     Tot     27     20.7   -0.4  1.9     9.2  2.0
5     Tot     28     22.4    1.3  2.1     9.4  1.9
6     Tot     20     22.8    1.7  0.9     3.9  1.9

BRCA2 motifs g1b2 to g5b2
g5b2 (20.8)

Case  Allele   n   μ (kb)  delta   SD  CV (%)   SF
1     Tot     27     20.7   -0.1  2.4    11.6  2.0
2     Tot     23     20.9    0.1  2.1          2.0
3     Tot     28     20.3   -0.5  2.9    14.3  2.0
4     Tot     19     20.9    0.1  2.6    12.4  2.0
5     Tot     23     21.3    0.5  1.4     6.6  2.0
6     Tot     17     22.3    1.5  1.4     6.3  1.9
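The summary rows of Table 6 can be checked against the per-donor values. The sketch below recomputes the mean, SD, delta and CV for motif g1b1 from the ten donor values in the first part of the table. The helper name `summarize` is illustrative, and the use of the sample standard deviation (n - 1) is an assumption, since the source does not say which estimator was used. In the donor summary the printed delta is consistent with calculated minus measured, whereas the per-case tables appear to use measured minus calculated (for example 8.0 against 8.5 gives -0.5).

```python
from statistics import mean, stdev

# g1b1 motif lengths (kb) for the 10 control blood donors, from Table 6.
g1b1 = [8.6, 8.4, 7.7, 7.6, 8.4, 7.1, 8.6, 8.6, 8.7, 8.4]
CALCULATED_G1B1 = 8.5  # "calculated" row of Table 6 for g1b1 (kb)


def summarize(values, calculated):
    """Return rounded mean, SD, delta and CV (%) for one motif."""
    mu = mean(values)
    sd = stdev(values)  # sample SD (n - 1); an assumption, see lead-in
    return {
        "mu": round(mu, 1),
        "sd": round(sd, 1),
        "delta": round(calculated - mu, 1),    # sign convention of the donor summary
        "cv_percent": round(100 * sd / mu, 1),
    }


summary = summarize(g1b1, CALCULATED_G1B1)
print(summary)
```

The rounded mean, SD and delta agree with the 8.2, 0.5 and 0.3 printed for g1b1 in the donor summary rows.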

TABLE 8

SEQ NO BRCA1-1A-F DNA Homo sapiens GGGACGGAAAGCTATGATGT
SEQ NO BRCA1-1A-R DNA Homo sapiens GGGCAGAGGTGACAGGTCTA
SEQ NO BRCA DNA Homo sapiens CCTCTGACCTGATCCCTTGA
SEQ NO BRCA1-1B-R DNA Homo sapiens ATCAGCAACAGTCCCATTCC
SEQ NO BRCA DNA Homo sapiens GCCCAGACTAGTGTTTCTTAACC
SEQ NO BRCA DNA Homo sapiens GGCATGAGGCAGCAATTTAG
SEQ NO BRCA DNA Homo sapiens TCTTTGAATCTGGGCTCTGC
SEQ NO BRCA DNA Homo sapiens GCTGTTGCTTTCTTTGAGGTG
SEQ NO BRCA DNA Homo sapiens CACAGGTATGTGGGCAGAGA
SEQ NO BRCA DNA Homo sapiens CCTCTGTTGATGGGGTCATAG
SEQ NO BRCA DNA Homo sapiens TTTGGTAGACCAGGTGAAATGA
SEQ NO BRCA DNA Homo sapiens CAAATTATGTGTGGAGGCAGA
SEQ NO BRCA DNA Homo sapiens GAAGAACGTGCTCTTTTCACG
SEQ NO BRCA DNA Homo sapiens AAAGTCTGATAACAGCTCCGAGA
SEQ NO BRCA DNA Homo sapiens TTCGATTCCCTAAGATCGTTTC
SEQ NO BRCA DNA Homo sapiens CACAGTTCTGTGTAATTTAATTTCGAT
SEQ NO BRCA DNA Homo sapiens AGGGAAGGCTCAGATACAAAC
SEQ NO BRCA DNA Homo sapiens TGCCATAGATAGAGGGCTTTTT
SEQ NO BRCA DNA Homo sapiens GCCATCTTCTTTCTCCTGCT
SEQ NO BRCA DNA Homo sapiens TTGACCTATTGCTGAATGTTGG

TABLE 8-continued

SEQ ID NO 21 BRCA1-11-F DNA Homo sapiens TTTTACCAAGGAAGGATTTTCG
SEQ ID NO 22 BRCA1-11-R DNA Homo sapiens GCTTGATCACAGATGTATGTATGAGTT
SEQ ID NO 23 BRCA1-12-F DNA Homo sapiens CCCCAGGGCTTTAAAGGTTA
SEQ ID NO 24 BRCA1-12-R DNA Homo sapiens TAGGGGTGGATATGGGTGAA
SEQ ID NO 25 BRCA1-13A-F DNA Homo sapiens ACTTCTTCAACGCGAAGAGC
SEQ ID NO 26 BRCA1-13A-R DNA Homo sapiens GACAGGCTGTGGGGTTTCT
SEQ ID NO 27 BRCA1-15-F DNA Homo sapiens TATCTGCTGGCCACTTACCA
SEQ ID NO 28 BRCA1-15-R DNA Homo sapiens TCTCGAGCCTTGAACATCC
SEQ ID NO 29 BRCA1-16-F DNA Homo sapiens CGCTCAGCTTTCATTCCAGT
SEQ ID NO 30 BRCA1-16-R DNA Homo sapiens AAACGTTCACATGTATCCCCTAA
SEQ ID NO 31 BRCA1-17-F DNA Homo sapiens CCTGGCCAGTACCCAGTAGT
SEQ ID NO 32 BRCA1-17-R DNA Homo sapiens CTGAGCCCAGAGTTTCTGCT
SEQ ID NO 33 BRCA1-18-F DNA Homo sapiens GGGCCCAAAAACCAGTAAGA
SEQ ID NO 34 BRCA1-18-R DNA Homo sapiens GGGATTGAGCGTTCACAGAT
SEQ ID NO 35 BRCA1-19-F DNA Homo sapiens GCCATCCAGTCCAGTCTCAT
SEQ ID NO 36 BRCA1-19-R DNA Homo sapiens TGCAGTTCTACCCTCCACTTG
SEQ ID NO 37 BRCA1-22-F DNA Homo sapiens CGGGTAAGTGGTGAGCTTTC
SEQ ID NO 38 BRCA1-22-R DNA Homo sapiens GACTGTCATTTAAAGGCACTTTTT
SEQ ID NO 39 BRCA1-23-F DNA Homo sapiens TGGCTAGTGTTTTGGCCTGT
SEQ ID NO 40 BRCA1-23-R DNA Homo sapiens TTCAGTGTTGCTTCTCCATTTC
SEQ ID NO 41 BRCA1-24-F DNA Homo sapiens TGTCAGACTAGCCACAGTACCA
SEQ ID NO 42 BRCA1-24-R DNA Homo sapiens AAGCGCTTCTTCATATTCTCC
SEQ ID NO 43 BRCA1-25-F DNA Homo sapiens ACCACACTCTTCTGTTTTGATGT
SEQ ID NO 44 BRCA1-25-R DNA Homo sapiens GGCACATGTACACCATGGAA
SEQ ID NO 45 BRCA1-26-F DNA Homo sapiens TTGTGTAGGTTGCCCGTTC
SEQ ID NO 46 BRCA1-26-R DNA Homo sapiens TTCAGAGAGCTGGGCCTAAA
SEQ ID NO 47 BRCA1-27-F DNA Homo sapiens GGAGGCAATCTGGAATTGAA
SEQ ID NO 48 BRCA1-27-R DNA Homo sapiens GGATCCATGATTGCTGCTTT
SEQ ID NO 49 BRCA1-28-F DNA Homo sapiens TCTCTGCTGTTTTTACAACTTTTTC
SEQ ID NO 50 BRCA1-28-R DNA Homo sapiens GGATCCATGATTGCTGCTTT
SEQ ID NO 51 BRCA1-29-F DNA Homo sapiens CCCTCTAGATACTTGTGTCCTTTTG
SEQ ID NO 52 BRCA1-29-R DNA Homo sapiens TCTGGCAGTCACAATTCAGG
SEQ ID NO 53 BRCA1-30-F DNA Homo sapiens TCCCATGACTGCATCATCTT
SEQ ID NO 54 BRCA1-30-R DNA Homo sapiens TTGAGATCAGGTCGATTCCTC
SEQ ID NO 55 BRCA1-31-F DNA Homo sapiens AAAACTCAACCCAAACAGTCA
SEQ ID NO 56 BRCA1-31-R DNA Homo sapiens CCAAGAATCACGAAGAGAGAGA
SEQ ID NO 57 BRCA1-32-F DNA Homo sapiens GACCTCATAGAGGTAGTGGAAAGAA
SEQ ID NO 58 BRCA1-32-R DNA Homo sapiens GCTCAAAGCCTTTAGAAGAAACA

TABLE 8-continued

SEQ ID NO 59 BRCA1-33-F DNA Homo sapiens GCACTGGGGAAAAGGTAGAA
SEQ ID NO 60 BRCA1-33-R DNA Homo sapiens CTCTTCAACCCAGACAGATGC
SEQ ID NO 61 BRCA1-34-F DNA Homo sapiens CAATACCCAATACAATGTAAATGC
SEQ ID NO 62 BRCA1-34-R DNA Homo sapiens CTGGGGATACTGAAACTGTGC
SEQ ID NO 63 BRCA1-35-F DNA Homo sapiens ATCAAGAAGCCTTCCCAGGT
SEQ ID NO 64 BRCA1-35-R DNA Homo sapiens TCCTTGGACGTAAGGAGCTG
SEQ ID NO 65 BRCA1-36-F DNA Homo sapiens TTCAGAACTTCCAAATACGGACT
SEQ ID NO 66 BRCA1-36-R DNA Homo sapiens GATGGAGCTGGGGTGAAAT
SEQ ID NO 67 BRCA1-37-F DNA Homo sapiens CGTGAGATTGCTCACAGGAC
SEQ ID NO 68 BRCA1-37-R DNA Homo sapiens CAAGGCATTGGAAAGGTGTC
SEQ ID NO 69 BRCA1-38-F DNA Homo sapiens AGAGGAATAGACCATCCAGAAGT
SEQ ID NO 70 BRCA1-38-R DNA Homo sapiens TCCTCCAGCACTAAAAACTGC
SEQ ID NO 71 BRCA2-1-F DNA Homo sapiens AAATGGAGGTCAGGGAACAA
SEQ ID NO 72 BRCA2-1-R DNA Homo sapiens TGGAAAGTTTGGGTATGCAG
SEQ ID NO 73 BRCA2-2-F DNA Homo sapiens TCTCAATGTGCAAGGCAATC
SEQ ID NO 74 BRCA2-2-R DNA Homo sapiens TCTTGACCATGTGGCAAATAA
SEQ ID NO 75 BRCA2-3a-F DNA Homo sapiens AATCACCCCAACCTTCAGC
SEQ ID NO 76 BRCA2-3a-R DNA Homo sapiens GCCCAGGACAAACATTTTCA
SEQ ID NO 77 BRCA2-3-F DNA Homo sapiens CCCTCGCATGTATGATCTGA
SEQ ID NO 78 BRCA2-3-R DNA Homo sapiens CTCCTGAAGTCCTGGAAACG
SEQ ID NO 79 BRCA2-3c-F DNA Homo sapiens TGAAATCTTTTCCCTCTCATCC
SEQ ID NO 80 BRCA2-3c-R DNA Homo sapiens AGATTGGGCACATCGAAAAG
SEQ ID NO 81 BRCA2-5-F DNA Homo sapiens GGTCTTGAACACCTGCTACCC
SEQ ID NO 82 BRCA2-5-R DNA Homo sapiens CACTCCGGGGGTCCTAGAT
SEQ ID NO 83 BRCA2-6-F DNA Homo sapiens TCTTTAACTGTTCTGGGTCACAA
SEQ ID NO 84 BRCA2-6-R DNA Homo sapiens TGGCTAGAATTCAAAACACTGA
SEQ ID NO 85 BRCA2-7-F DNA Homo sapiens TTGAAGTGGGGTTTTTAAGTTACAC
SEQ ID NO 86 BRCA2-7-R DNA Homo sapiens CCAGCCAATTCAACATCACA
SEQ ID NO 87 BRCA2-11-F DNA Homo sapiens TTGGGACAATTCTGAGGAAAT
SEQ ID NO 88 BRCA2-11-R DNA Homo sapiens TGCAGGTTTTGTTAAGAGTTTCA
SEQ ID NO 89 BRCA2-12-F DNA Homo sapiens TGGCAAATGACTGCATTAGG
SEQ ID NO 90 BRCA2-12-R DNA Homo sapiens TCTTGAAGGCAAACTCTTCCA
SEQ ID NO 91 BRCA2-13-F DNA Homo sapiens GGAATTGTTGAAGTCACTGAGTTGT
SEQ ID NO 92 BRCA2-13-R DNA Homo sapiens ACCACCAAAGGGGGAAAAC
SEQ ID NO 93 BRCA2-14-F DNA Homo sapiens CAAGTCTTCAGAATGCCAGAGA
SEQ ID NO 94 BRCA2-14-R DNA Homo sapiens TAAACCCCAGGACAAACAGC
SEQ ID NO 95 BRCA2-15-F DNA Homo sapiens GGCTGTTTGTTGAGGAGAGG
SEQ ID NO 96 BRCA2-15-R DNA Homo sapiens GAAACCAGGAAATGGGGTTT
SEQ ID NO 97 BRCA2-18-F DNA Homo sapiens TGTTAGGGAGGAAGGAGCAA

TABLE 8-continued

AGGCTGAGGCAGCAGAATTGCTTGAACCCGGGAGGTGGAGGTTGTAGTGAGCCGAGATTGTGCCACTG
CACTCCAGCCTGGGCAACAAGAGCCAAAGTCTGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAGAAATT
AATCTTAACAGGAAACAGAAAAAAGCAATGAAAAGCTAGAAAACATAATAGTTGATTGAAAATAACAAT
TTAGCATTTTCATTCTTACATCTTTAATTTTTATGTATCTGAGTTTTTAATTGATGGTTTAATTTGCCAGAAT
GAGAAAGAACATCCTATTTTTATGACTCTCTCCCATGGAAATGAAACATAAATGTATCCAAATGCCACAC
TATTGAGGATTTTCCTGATCACTGATTGTCATGAGTAAGTTTTGTGCTTTTTCAAAAGCAGTTTTTTCCTAC
AATGTCATTTCCTGCTTCTCTGGCTCTGATTTTCAATAAATTGATAAATTGTGAATCCTGTTTTCCTCTTATT
TTTGTTTAGCTATAATGTTGAAGGGCAAGGGAGAGGATGGTTATTTATAAATCTTGTATCGCTCTGAAAA
CACAACATACATTTTCCTTAATCTGATTAACTTGACTTCAAATATGAAAAACAACTTTCATAAAGCAGAAA
AGAATTTACCCTTTTTTATTGTGGGTAAGAGGCAATGGTA

SEQ ID NO 134 ForwardPrimerPrefix DNA Artificial Sequence AAAAGGCGCGCC
SEQ ID NO 135 ReversePrimerPrefix DNA Artificial Sequence AAAATTAATTAA

REFERENCES

[0228] Caburet, S., Conti, C., Schurra, C., Lebofsky, R., Edelstein, S. J., and Bensimon, A. (2005). Human ribosomal RNA gene arrays display a broad range of palindromic structures. Genome Res 15, 1079-1085.
[0229] Casilli, F., Di Rocco, Z. C., Gad, S., Tournier, I., Stoppa-Lyonnet, D., Frebourg, T., and Tosi, M. (2002). Rapid detection of novel BRCA1 rearrangements in high-risk breast-ovarian cancer families using multiplex PCR of short fluorescent fragments. Hum Mutat 20, 218-226.
[0230] Cavalieri, S., Funaro, A., Pappi, P., Migone, N., Gatti, R. A., and Brusco, A. (2008). Large genomic mutations within the ATM gene detected by MLPA, including a duplication of 41 kb from exon 4 to 20. Ann Hum Genet 72, 10-18.
[0231] Gad, S., Aurias, A., Puget, N., Mairal, A., Schurra, C., Montagna, M., Pages, S., Caux, V., Mazoyer, S., Bensimon, A., et al. (2001). Color bar coding the BRCA1 gene on combed DNA: a useful strategy for detecting large gene rearrangements. Genes Chromosomes Cancer 31, 75-84.
[0232] Gad, S., Bieche, I., Barrois, M., Casilli, F., Pages-Berhouet, S., Dehainault, C., Gauthier-Villars, M., Bensimon, A., Aurias, A., Lidereau, R., et al. (2003). Characterization of a 161 kb deletion extending from the NBR1 to the BRCA1 genes in a French breast-ovarian cancer family. Hum Mutat 21, 654.
[0233] Gad, S., Caux-Moncoutier, V., Pages-Berhouet, S., Gauthier-Villars, M., Coupier, I., Pujol, P., Frenay, M., Gilbert, B., Maugard, C., Bignon, Y. J., et al. (2002a). Significant contribution of large BRCA1 gene rearrangements in 120 French breast and ovarian cancer families. Oncogene 21, 6841-6847.
[0234] Gad, S., Klinger, M., Caux-Moncoutier, V., Pages-Berhouet, S., Gauthier-Villars, M., Coupier, I., Bensimon, A., Aurias, A., and Stoppa-Lyonnet, D. (2002b). Bar code screening on combed DNA for large rearrangements of the BRCA1 and BRCA2 genes in French breast cancer families. J Med Genet 39, 817-821.
[0235] Herrick, J., and Bensimon, A. (2009). Introduction to molecular combing: genomics, DNA replication, and cancer. Methods Mol Biol 521, 71-101.
[0236] Hofmann, W., Wappenschmidt, B., Berhane, S., Schmutzler, R., and Scherneck, S. (2002). Detection of large rearrangements of exons 13 and 22 in the BRCA1 gene in German families. J Med Genet 39, E36.
[0237] King, M. C., Marks, J. H., and Mandell, J. B. (2003). Breast and ovarian cancer risks due to inherited mutations in BRCA1 and BRCA2. Science 302, 643-646.
[0238] Mazoyer, S. (2005). Genomic rearrangements in the BRCA1 and BRCA2 genes. Hum Mutat 25, 415-422.
[0239] Nathanson, K. L., Wooster, R., and Weber, B. L. (2001). Breast cancer genetics: what we know and what we need. Nat Med 7, 552-556.
[0240] Puget, N., Gad, S., Perrin-Vidoz, L., Sinilnikova, O. M., Stoppa-Lyonnet, D., Lenoir, G. M., and Mazoyer, S. (2002). Distinct BRCA1 rearrangements involving the BRCA1 pseudogene suggest the existence of a recombination hot spot. Am J Hum Genet 70, 858-865.
[0241] Rouleau, E., Lefol, C., Tozlu, S., Andrieu, C., Guy, C., Copigny, F., Nogues, C., Bieche, I., and Lidereau, R. (2007). High-resolution oligonucleotide array-CGH applied to the detection and characterization of large rearrangements in the hereditary breast cancer gene BRCA1. Clin Genet 72, 199-207.
[0242] Schurra, C., and Bensimon, A. (2009). Combing genomic DNA for structural and functional studies. Methods Mol Biol 464, 71-90.
[0243] Staaf, J., Torngren, T., Rambech, E., Johansson, U., Persson, C., Sellberg, G., Tellhed, L., Nilbert, M., and Borg, A. (2008). Detection and precise mapping of germline rearrangements in BRCA1, BRCA2, MSH2, and MLH1 using zoom-in array comparative genomic hybridization (aCGH). Hum Mutat 29, 555-564.
[0244] Szabo, C., Masiello, A., Ryan, J. F., and Brody, L. C. (2000). The breast cancer information core: database design, structure, and scope. Hum Mutat 16, 123-131.
[0245] Walsh, T., Lee, M. K., Casadei, S., Thornton, A. M., Stray, S. M., Pennil, C., Nord, A. S., Mandell, J. B., Swisher, E. M., and King, M. C. (2010). Detection of inherited mutations for breast and ovarian cancer using genomic capture and massively parallel sequencing. Proc Natl Acad Sci USA 107, 12629-12633.

RELATED PATENTS AND PATENT APPLICATIONS

[0246] Lebofsky R, Walrafen P, Bensimon A: Genomic Morse Code. U.S. Pat. No. 7,985,542 B2 (application Ser. No. 11/516,673)
[0247] Murphy P D, Allen A C, Alvares C P, Critz B S, Olson S J, Schelter D B, Zeng B: Coding sequences of the human BRCA1 gene. U.S. Pat. No. 5,750,400
[0248] Skolnick M H, Goldgar D E, Miki Y, Swenson J, Kamb A, Harshman K D, Shattuck-Eidens D M, Tavtigian S V, Wiseman R W, Futreal A P: 17q-linked breast and ovarian cancer susceptibility gene. U.S. Pat. No. 5,710,001


-continued

tggcacatgc ctgcaatccc aggtactccg gaggctgagg cagcagaatt gcttgaaccc   3840
gggaggtgga ggttgtagtg agccgagatt gtgccactgc actccagcct gggcaacaag   3900
agccaaagtc tgtctcaaaa aaaaaaaaaa aaaaaaaaaa agaaattaat cttaacagga   3960
aacagaaaaa agcaatgaaa agctagaaaa cataatagtt gattgaaaat aacaatttag   4020
cattttcatt cttacatctt taatttttat gtatctgagt ttttaattga tggtttaatt   4080
tgccagaatg agaaagaaca tcctattttt atgactctct cccatggaaa tgaaacataa   4140
atgtatccaa atgccacact attgaggatt ttcctgatca ctgattgtca tgagtaagtt   4200
ttgtgctttt tcaaaagcag ttttttccta caatgtcatt tcctgcttct ctggctctga   4260
ttttcaataa attgataaat tgtgaatcct gttttcctct tatttttgtt tagctataat   4320
gttgaagggc aagggagagg atggttattt ataaatcttg tatcgctctg aaaacacaac   4380
atacattttc cttaatctga ttaacttgac ttcaaatatg aaaaacaact ttcataaagc   4440
agaaaagaat ttaccctttt ttattgtggg taagaggcaa tggta                   4485

<210> SEQ ID NO 134
<211> LENGTH: 12
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer prefix appended to the 5' end of primers for cloning purposes

<400> SEQUENCE: 134
aaaaggcgcg cc   12

<210> SEQ ID NO 135
<211> LENGTH: 12
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Synthetic primer prefix appended to the 5' end of primers for cloning purposes

<400> SEQUENCE: 135
aaaattaatt aa   12
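SEQ ID NO 134 and 135 are described as prefixes appended to the 5' end of primers for cloning purposes. As an illustration only, the sketch below attaches them to one primer pair from Table 8 (BRCA1-12-F and BRCA1-12-R, SEQ ID NO 23 and 24). The function name `add_cloning_prefix` is hypothetical, and pairing the forward prefix with -F primers and the reverse prefix with -R primers is inferred from the names ForwardPrimerPrefix and ReversePrimerPrefix. The two prefixes contain GGCGCGCC and TTAATTAA, which are the AscI and PacI recognition sequences; this section of the source does not name the enzymes.

```python
FORWARD_PREFIX = "AAAAGGCGCGCC"  # SEQ ID NO 134 (ForwardPrimerPrefix)
REVERSE_PREFIX = "AAAATTAATTAA"  # SEQ ID NO 135 (ReversePrimerPrefix)


def add_cloning_prefix(primer: str, prefix: str) -> str:
    """Return the primer (written 5'->3') with the prefix attached at its 5' end."""
    primer = primer.replace(" ", "").upper()
    if set(primer) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G and T")
    return prefix + primer


# BRCA1-12-F and BRCA1-12-R, SEQ ID NO 23 and 24 in Table 8.
forward = add_cloning_prefix("CCCCAGGGCTTTAAAGGTTA", FORWARD_PREFIX)
reverse = add_cloning_prefix("TAGGGGTGGATATGGGTGAA", REVERSE_PREFIX)
print(forward)
print(reverse)
```

Because sequences are written 5' to 3', attaching the prefix at the 5' end amounts to placing it on the left of the primer string.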

1. A composition comprising at least two polynucleotides wherein each polynucleotide binds to a portion of the genome containing a BRCA1 and/or BRCA2 gene, wherein each of said at least two polynucleotides contains at least 200 contiguous nucleotides and contains less than 10% of Alu repetitive nucleotidic sequences.

2. The composition of claim 1, wherein said at least two polynucleotides bind to a portion of the genome containing BRCA1.

3. The composition of claim 1, wherein said at least two polynucleotides bind to a portion of the genome containing BRCA2.

4. The composition of claim 1, wherein each of said at least two polynucleotides contains at least 500 up to 6000 contiguous nucleotides and contains less than 10% of Alu repetitive nucleotidic sequences.

5. The composition of claim 1, wherein the at least two polynucleotides are each tagged with a detectable label or marker.

6. The composition of claim 1, comprising at least two polynucleotides that are each tagged with a different detectable label or marker.

7. The composition of claim 1, comprising at least three polynucleotides that are each tagged with a different detectable label or marker.

8. The composition of claim 1, comprising at least four polynucleotides that are each tagged with a different detectable label or marker.

9. The composition of claim 1, comprising three to ten polynucleotides that are each independently tagged with the same or different visually detectable markers.

10. The composition of claim 1, comprising eleven to twenty polynucleotides that are each independently tagged with the same or different visually detectable markers.

11. The composition of claim 1, comprising at least two polynucleotides each tagged with one of at least two different detectable labels or markers.

12. A method for detecting a duplication, deletion, inversion, insertion, translocation or large rearrangement in a BRCA1 or BRCA2 locus, BRCA1 or BRCA gene, BRCA1 or BRCA flanking sequence or intron, comprising:
(i) isolating a DNA sample,
(ii) molecularly combing said sample,
(iii) contacting the molecularly combed DNA with the composition of claim 5 as a probe for a time and under conditions sufficient for hybridization to occur,
(iv) visualizing the hybridization of the composition of claim 5 to the DNA sample, and
(v) comparing said visualization with that obtain from a control sample of a normal or standard BRCA1 or BRCA2 locus, BRCA1 or BRCA gene, BRCA1 or BRCA flanking sequence or intron that does not contain a rearrangement or mutation.

13. The method of claim 12, wherein said probe is selected to detect a rearrangement or mutation of more than 1.5 kb.

14. The method of claim 12, further comprising predicting or assessing a predisposition to ovarian or breast cancer based on the kind of genetic rearrangement or mutation detected in a coding or noncoding BRCA1 or BRCA2 locus sequence.

15. The method of claim 12, further comprising determining the sensitivity of a subject to a therapeutic treatment based on the kind of genetic rearrangement or mutation detected in a coding or noncoding BRCA1 or BRCA2 locus sequence.

16. A kit for detecting a duplication, deletion, inversion, insertion, translocation or large rearrangement in a BRCA1 or BRCA2 locus, BRCA1 or BRCA2 gene, BRCA1 or BRCA2 flanking sequence or intron comprising
a) at least two polynucleotides wherein each polynucleotide binds to a portion of the genome containing a BRCA1 or BRCA2 gene, wherein each of said at least two polynucleotides contains at least 200 contiguous nucleotides and is free of repetitive nucleotidic sequences, wherein said at least two polynucleotides are tagged with visually detectable markers and are selected to identify a duplication, deletion, inversion, insertion, translocation or large rearrangement in a particular segment of a BRCA1 or BRCA2 locus, BRCA1 or BRCA2 gene, BRCA1 or BRCA2 flanking sequence or intron, and optionally,
b) a standard describing a hybridization profile for a subject not having a duplication, deletion, inversion, insertion, translocation or large rearrangement in a BRCA1 or BRCA2 locus, BRCA1 or BRCA gene, BRCA1 or BRCA flanking sequence or intron;
c) one or more elements necessary to perform Molecular Combing,
d) instructions for use, and/or
e) packaging materials.

17. The kit of claim 16, wherein said at least two polynucleotides are selected to identify a duplication, deletion, inversion, insertion, translocation or large rearrangement in a particular segment of a BRCA1 or BRCA2 locus, BRCA1 or BRCA2 gene, BRCA1 or BRCA2 flanking sequence or intron associated with ovarian cancer or breast cancer.

18. The kit of claim 16, wherein said at least two polynucleotides are selected to identify a duplication, deletion, inversion, insertion, translocation or large rearrangement in a particular segment of a BRCA1 or BRCA2 locus, BRCA1 or BRCA2 gene, BRCA1 or BRCA2 flanking sequence or intron associated with a kind of ovarian cancer or breast cancer sensitive to a particular therapeutic agent, drug or procedure.

19. A method for in vitro detecting in a sample containing genomic DNA, a repeat array of multiple tandem copies of a repeat unit consisting of genomic sequence spanning the 5' end of the BRCA1 gene wherein said repeat array consists of at least three copies of the repeat unit and said method comprises:
providing conditions enabling hybridization of a first primer with the 5' end of the target genomic sequence and hybridization of a second primer with the 3' end of said target sequence, in order to enable polymerization by PCR starting from said primers;
amplifying the sequences hybridized with the primers;
detecting, in particular with a probe, the amplicons thereby obtained and determining their size or their content, in particular their nucleotide sequence.

20. The method of claim 19 wherein the repeat unit encompasses the exons 1a, 1b and 2 of the BRCA1 gene and optionally encompasses a sequence of the 5' end of the NBR2 gene.

21. The method of claim 19, wherein the downstream and upstream primers are respectively selected from the group of
for a downstream primer:
a polynucleotide sequence in the region between exons 2 and 3 of BRCA1, preferably at a distance from 2-4 kb from the 3' end of exon 2, more preferably at a distance from 2.5-3 kb from the 3' end of exon 2 or
a polynucleotide sequence in the region between exons 2 and 3 of BRCA1, within 2 kb from the 3' end of exon 2, preferably within 1.5 kb and more preferably within 1 kb from the 3' end of exon 2
for an upstream primer:
a polynucleotide sequence in the region between the BRCA1 gene and the NBR2 gene, within 2 kb from exon 1a of BRCA1, preferably within 1.5 kb and more preferably within 1 kb of exon 1a of BRCA1 or,
a polynucleotide sequence within exon 1a of BRCA1 or within exon 1b or in the region between exons 1a and 1b or,
a polynucleotide sequence in the region between exons 1b and 2, or in exon 2, or in the region between exons 2 and 3.

22. The method of claim 19, wherein the primers are selected from the group of BRCA1-A3A-F (SEQ ID 25), BRCA1-A3A-R (SEQ ID 26), BRCA1-Synt1-F (SEQ ID 125) and BRCA1-Synt1-R (SEQ ID 126) or their reverse complementary sequences.

* * * * *
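Claim 22 refers to primers "or their reverse complementary sequences". As a small illustration of that operation, and not part of the claimed subject matter, the sketch below computes the reverse complement of the sequence listed in Table 8 as SEQ ID NO 25. The function name `reverse_complement` is illustrative, and only unambiguous A, C, G and T bases are handled.

```python
# Translation table mapping each base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


seq_id_25 = "ACTTCTTCAACGCGAAGAGC"  # SEQ ID NO 25 as listed in Table 8
print(reverse_complement(seq_id_25))  # GCTCTTCGCGTTGAAGAAGT
```

Applying the function twice returns the original sequence, which is a quick sanity check when transcribing primers.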