Curr. Issues Mol. Biol. (2003) 5: 43-60.
© 2003 Caister Academic Press

Detection of Single Nucleotide Polymorphisms

Pui-Yan Kwok1* and Xiangning Chen2

1Cardiovascular Research Institute, University of California, 505 Parnassus Avenue, Long 1332A, Box 0130, San Francisco, CA 94143-0130, USA
2Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, 800 E. Leigh Street, Suite 1-101, Richmond, VA 23298-0424, USA

*For correspondence.

Abstract

Single nucleotide polymorphism (SNP) detection technologies are used to scan for new polymorphisms and to determine the allele(s) of a known polymorphism in target sequences. SNP detection technologies have evolved from labor-intensive, time-consuming, and expensive processes to some of the most highly automated, efficient, and relatively inexpensive methods. Driven by the Human Genome Project, these technologies are now maturing and robust strategies are found in both SNP discovery and genotyping areas. The nearly completed human genome sequence provides the reference against which all other sequencing data can be compared. Global SNP discovery is therefore limited only by the amount of funding available for the activity. Local, targeted SNP discovery relies mostly on direct DNA sequencing or on denaturing high performance liquid chromatography (dHPLC). The number of SNP genotyping methods has exploded in recent years and many robust methods are currently available. The demand for SNP genotyping is great, however, and no one method is able to meet the needs of all studies using SNPs. Despite the considerable gains over the last decade, new approaches must be developed to lower the cost and increase the speed of SNP detection.

Introduction

When any two human genomes are compared side by side, they are 99.9% identical (Cooper et al., 1985). However, with a 3.2 billion base pair genome, each person harbors some 3.2 million differences in his/her diploid genome. Most of the differences are due to single base substitution polymorphisms, popularly known as single nucleotide polymorphisms (SNPs). While the majority of the SNPs are of no biological consequence, a fraction of the substitutions have functional significance and are the basis for the diversity found among humans (Collins et al., 1997). As genetic markers, SNPs can be used to follow the inheritance patterns of chromosomal regions from generation to generation and are powerful tools in the study of genetic factors associated with human diseases (Johnson and Todd, 2000; Risch, 2000). The study of DNA sequence variation became feasible when enzymatic and chemical methods of DNA manipulation were discovered. As the demand for genetic analysis increases, SNP detection technologies are developed at an accelerated pace, fueled by recent advances in enzymology, DNA synthesis, and analytical instrumentation.

SNP detection encompasses two broad areas: scanning DNA sequences for previously unknown polymorphisms and screening (genotyping) individuals for known polymorphisms. Scanning for new SNPs can be further divided into the global, or random, approach and the regional, or targeted, approach. Although the technologies capable of scanning DNA for new polymorphisms can be used in screening individuals for known polymorphisms, there are many more options for SNP genotyping. Indeed, many of the most exciting genome technologies are those developed for genotyping tens of thousands of SNPs in thousands of individuals. Furthermore, SNP detection technologies are applicable not only to humans, but to any living organism on earth.

The evolution of SNP detection technology is characterized by the clever adoption of new biological methods, fluorescent and other reporters, computational algorithms, and highly sensitive analytical instruments. Although the ideal SNP detection method does not exist, the field has come a long way from the early days and the technologies are sufficiently robust that it is now possible to conduct large-scale genetic studies. As the cost of SNP detection continues to drop and its throughput continues to rise, even the most ambitious studies will become economically feasible.

Technologies For Global SNP Discovery

The main issue with global SNP discovery is that, on average, there is one SNP in every 1,000 bp of DNA when two human genomes are compared to each other (The International SNP Map Working Group, 2001). To maximize the chance of finding SNPs, one must be able to scan 1,000 bp pieces of DNA in a generic way. The first attempt to identify SNPs randomly in the human genome was to scan for alterations in restriction sites in the genome (Botstein et al., 1980). Although the actual sequence variation was not determined, restriction fragment length polymorphisms (RFLPs) were the first SNPs found in a random, global approach (Donis-Keller et al., 1987). Developed before the polymerase chain reaction (PCR) was even conceived, RFLP analysis was a powerful, but very laborious, strategy. High quality genomic DNA from multiple individuals was cut with a number of restriction enzymes, separated by gel electrophoresis, and transferred to nylon filters. These Southern blots were then probed with random genomic clones to identify variations in the restriction fragment lengths. Because only small amounts of DNA were found on the filter, radioactive labels were used with the probes. Needless to say, this was a very labor-intensive and expensive way to identify SNPs, and it was almost impossible to extract the sequence differences associated with the RFLPs. Furthermore, even with probes that were thousands of bases long, one was scanning only a small fraction of the bases, those at the restriction sites.

The situation did not improve with the advent of PCR (Saiki et al., 1985), because PCR requires DNA sequence data from which to design locus-specific primers. To complicate things further, DNA sequencing and oligonucleotide synthesis were both costly activities until quite recently. Global scanning of SNPs was not seriously attempted in the early 1990s because simple sequence repeat polymorphisms (SSRPs), the highly informative microsatellite markers, were much more easily developed by probing genomic libraries with synthetic oligonucleotides bearing the simple sequence repeat motifs and sequencing the positive clones (Weber and May, 1989; Litt and Luty, 1989). When physical mapping of the human genome began in earnest in the mid-1990s, many mapped sequence-tagged sites (STSs) and their PCR primers became available (Hudson et al., 1995). Using a resequencing approach, these STSs were scanned for SNPs (Kwok et al., 1996) and several thousand SNPs were found in the first "large-scale" SNP identification project (Wang et al., 1998). However, even after all the available STSs had been scanned, the number of SNPs found amounted only to a small fraction of the SNPs needed for complex genetic analysis.

In the late 1990s, several favorable factors came together to make global SNP discovery a reality. First, the human genome project sped up its production rate in response to the efforts of a new company to sequence the human genome by whole genome shotgun sequencing. This development meant that high quality sequences of some 90% of the human genome became available for comparison (International Human Genome Sequencing Consortium, 2001; Venter et al., 2001). Second, the cost and efficiency of DNA sequencing improved significantly and one can obtain millions of 500 bp sequencing reads over a relatively short time. Third, [...] clones are from the same library, the two BACs are derived from different parental lineages 50% of the time and SNPs will be found in that situation (Taillon-Miller et al., 1998). Indeed, some 800,000 SNPs were found in the overlapping sequences by the end of 2000. The second set came from the TSC with data derived from the sequencing of clones from "reduced representation libraries". By reducing the complexity of the genome through cloning of size-selected restriction fragments from a pooled DNA sample, the same loci were sequenced multiple times when a large number of clones from each library was sequenced (Altshuler et al., 2000). Even the sequences of singleton clones were useful for SNP discovery because they could be compared to the reference human genome sequence. Some 900,000 SNPs were deposited in the public database by the end of 2000. The third set came from the whole genome shotgun sequencing project by a private enterprise and the data were kept from the public. This large set of SNPs is available only to "subscribers" of the private database (Venter et al., 2001). In addition to these large-scale projects, smaller efforts using a variety of approaches also contributed to the global SNP discovery project (for example, Irizarry et al., 2000; Buetow et al., 2001).

With the reference human genome sequence now close to completion, global SNP discovery is most efficient when one takes a shotgun sequencing approach and compares the sequencing data obtained against the reference sequence. The density of the markers is determined by the number of genomic clones sequenced. This approach is applicable to any organism as long as the reference sequence of that organism is available.

Technologies For Targeted SNP Discovery

Unlike global SNP discovery, where speed and achieving a certain density of markers are the most important factors, targeted SNP discovery focuses on