Online Resources for SNP Analysisreview 65

Total Page:16

File Type:pdf, Size:1020Kb

Online Resources for SNP Analysisreview 65 Online Resources for SNP AnalysisREVIEW 65 Online Resources for SNP Analysis A Review and Route Map Christopher Phillips Abstract The major online single nucleotide polymorphism (SNP) databases freely available as research tools for genetic analysis are explained, reviewed, and compared. An outline is given of the search strategies that can be used with the most extensive current SNP databases: National Centre for Biotechnology Information (NCBI) dbSNP and HapMap to help the user secure the most appropriate data for the research needs of clinical genetics and population genetics research. A range of online tools that can be useful in designing SNP genotyping assays are also detailed. Index Entries: Single nucleotide polymorphism, SNP; genotyping; variation; polymorphism; online data- bases; linkage disequilibrium; haplotype; haplotype block; genome; HapMap; clinical genetics; population genetics. 1. Introduction marks available. Above all, this enables researchers The validated set of human single nucleotide to improve the resolution of linkage maps, in par- polymorphisms (SNPs) is one of the most valu- ticular in the precise mapping and study of genes able resources to have come from the human contributing to complex disorders, where the con- genome mapping project (HGP). This dataset tributory effect of individual genes is small or was largely unforeseen in the initial phases of where genetics is compounded by multiple locus sequencing but as several genome equivalents interaction and environment. As well as enabling were generated from different individuals during greatly enhanced mapping, SNPs provide a new the project, the identification of SNPs became a and fascinating layer of detail to our understand- by-product of growing importance. The discov- ing of human variability and what this tells us ery of new SNP sites by resequencing and the about disease susceptibility, populations, and evo- collation of detailed information about each locus lution. Because SNPs have low recurrent mutation has continued to develop both in scope and in rates compared to other polymorphic markers they depth as the simultaneous publication of draft provide the most stable and reliable indicators of sequence and first full SNP map 5 yr ago (1,2). the evolutionary history of populations. Last, but Added to this, our knowledge of SNPs has ex- not least, SNPs are the very stuff of genetic varia- panded in parallel to an improved understanding tion as a substitution in or near a transcribed region of the structure and function of the genome and can change an amino acid sequence, the control of its gene content. The SNPs provide the most com- transcription events, or the splicing pattern for the plete and densely spaced system of genome land- resulting RNA products. Such changes arise from *Author to whom all correspondence and reprint requests should be addressed. The Spanish National Genotyping Centre CeGen, Santiago node, Genomic Medicine Group, University of Santiago de Compostela, Galicia, Spain. E-mail: [email protected]. Molecular Biotechnology 2006 Humana Press Inc. All rights of any nature whatsoever reserved. ISSN: 1073–6085/Online ISSN: 1559–0305/2006/35:1/065–098/$30.00 MOLECULAR BIOTECHNOLOGY 65 Volume 35, 2007 65_98_Phillips_MB06_0040 65 1/3/07, 4:40 PM 66 Phillips loci commonly termed coding SNPs, promoter– These developments have meant that searching site SNPs, and splice–site SNPs, respectively. for specific SNPs among a total of 9 million or Therefore, it can be argued that SNPs form the more human loci can now be achieved fairly eas- principal part of the variability affecting each ily, provided the researcher begins with clearly stage of gene expression and can justifiably be determined criteria for the markers required for a said to influence the transcriptome and proteome, study. To this end, the I can firmly recommend, as well as the genome itself. As such, SNPs are as starting points, two excellent textbooks: Human more than just ubiquitous marker points, they Molecular Genetics and Human Evolutionary are a core genome feature that will ultimately Genetics (3,4) that each cover thoroughly the help to explain one of the enduring mysteries of fields of genetic analysis and population genet- human genetics: why comparable numbers of ics, respectively, these forming the two principal genes in species at opposite ends of the evolutionary applications for SNP analysis. Both books pro- scale can create such profoundly different levels of vide, in nonscientific parlance, “one-stop shops” complexity. for understanding the extent and specific charac- In the same spirit as that prevailing in the data- teristics of SNPs needed for planning studies of base management of HGP draft and final nucle- inherited human disease (referred to as clinical otide sequence, SNP data arising from public genetics from this point on) or population genet- genome analysis have been open source from the ics. I also consider that it is important for scientists beginning; freely available in websites accessible in one area of speciality to be better acquainted by any researcher. Furthermore, the largest data- with the other, so the use of both books is recom- base, dbSNP, accepts submissions directly from mended. Careful project planning is an essential the scientific community, allowing the rapid dis- step, in addition to the scrutiny of appropriate semination of newly discovered SNP polymor- reporting publications, before any database phisms. In fact all the SNP databases described in search starts in earnest. Therefore, the major steps this review are based on open access and in this of project design involving the access of online sense are a shared scientific resource. Added to resources can be visits to PubMed (for the best this, during 2006 the bulk of privately held SNP current reports in the field), HapMap (for the review data will become available to the whole research of SNP candidates in, or close to, genes or genome community. This means one of the original pre- regions of interest), dbSNP (for the characteriza- cepts of HGP—free access to the best possible tion of the chosen SNPs), and finally use of vari- information acts to accelerate scientific progress, ous web tools for optimizing the genotyping assay equally applies to the overriding majority of human design. This is not intended to be a prescribed SNP data. Unhindered access to information is con- route—many of the sites outlined in this review sidered to be of such primary importance now that have viable alternatives that can act, above all, to a system for data release has been formalized in the supplement the data obtained from the primary recommended framework for international commu- sources described here. In addition, particular nity resource projects (http://www.welcome.ac. areas of clinical genetics such as cancer studies uk/doc_WTD003208.HTML). This framework have specialized databases that are not covered in secures, for all scientists, rapid dissemination, this review but clearly serve their field with a highest standards of database management, and more direct focus of information. the end use of data without restriction. This review was originally written for a third At the same time that genomic data started to potential application of SNP analysis, that of become available, the fields of bioinformatics and forensic identification, commonly termed DNA database management were rapidly developing to profiling. The SNPs offer the potential to greatly keep pace with the scale of the information gener- reduce amplified product size compared to exist- ated and the complexity of the searches required. ing profiling loci and therefore, could provide MOLECULAR BIOTECHNOLOGY Volume 35, 2007 65_98_Phillips_MB06_0040 66 1/3/07, 4:40 PM Online Resources for SNP Analysis 67 much improved results with the highly degraded ers (potentially as many as 1 million), despite the DNA commonly encountered in forensic analysis majority of commercially held Celera loci being or disaster victim identification. common to both public and private databases. Forensic profiling requires a specific set of Some idea of how much supplementary data may search criteria to find SNPs with the necessary eventually be released for public access can be properties for this field and readers interested in obtained from the time taken by National Centre the forensic application of SNP analysis can find for Biotechnology Information (NCBI) to scruti- more specific guidelines in the original article nize, validate, and reorganize the Celera SNPs. along with several specialized reviews (5–7). It is This has already occupied 6 months of 2005 and interesting to note that forensic SNP sets may find is likely to go beyond a full year of data process- applications in genetic studies where measure- ing on a large scale. This underlines the continu- ment of stratification and admixture ratios could ing growth in importance of SNPs to provide both be achieved using a discrimination and geo- sufficient coverage for the analysis of all parts of graphic origin marker panel. These equally spe- the genome and to help fully understand the com- cialized areas of SNP use in genetic analysis are plex mechanisms of gene expression. discussed in the overview of SNP biology. Therefore, the purpose of this review is two- 2. The Biology of SNPs—A Brief Overview fold: to compare SNP databases and to detail the The SNPs, as base modifications, represent methods that can be used to check the character- changes to the ancestral DNA sequence compris- istics of SNP markers of most relevance to a pro- ing a genome. Because the cellular mechanisms posed study. Emphasis has been placed on ways for correcting a base mismatch are extremely to narrow the selection of SNPs (or any other sup- effective, it is necessary to understand how porting data) to a manageable size—an increas- these substitution events progress from a single ingly important strategy given the huge scale of sequence change that is promptly edited back to information available.
Recommended publications
  • Biology of Human Variation Fall 2014
    Anthropology 2110 Biology of Human Variation Fall 2014 Professor: Dr. Tamara Varney Location: AT1010 Lecture time: Tues 7-10 Phone: 807-343-8204 Office: Braun Building 2001D Email: [email protected] Office Hours: Thurs 9-11am am until Oct 15 then by appointment**(if its more convenient feel free to set an appointment time) OR you can phone, email or drop in anytime to see if I am free – just be prepared to come at another time if I am busy with something I cannot interrupt I cannot guarantee that I will respond in a timely fashion if messages are left on my voice mail rather than email and students should NOT expect less than 24 hour turnaround time to email messages. Please DO NOT assume that your message, voice or email, has been received unless you receive an acknowledgement. Course Description: This course focuses on human microevolution. Topics include evolutionary theory, the genetic background of human variation, the distribution of human variation, human adaptability, and the role of disease and other influences on human evolution. Required Textbook: (available at the University Bookstore) Mielke JH, Konigsberg LW and Relethford JH. 2011. Human Biological Variation. 2nd Edition. Oxford University Press. ISBN 13: 978-0-19-538740-7 Also see the course Desire2Learn (D2L) site (look in MyCourseLinks on the LU website) for additional resources. Evaluation: Your final grade will be based on: Term Test 1 25% Oct 27 Term Test 2 30% Nov 18 Final Exam 45% Final examination period – Dec 9-19 **Note exam contingency date is Dec 20. Please note that Fri, November 6, 2015 is the last date for withdrawal without academic penalty from this course.
    [Show full text]
  • Genetic and Environmental Exposures Constrain Epigenetic Drift Over the Human Life Course
    Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press Research Genetic and environmental exposures constrain epigenetic drift over the human life course Sonia Shah,1,9 Allan F. McRae,1,9 Riccardo E. Marioni,1,2,3,9 Sarah E. Harris,2,3 Jude Gibson,4 Anjali K. Henders,5 Paul Redmond,6 Simon R. Cox,3,6 Alison Pattie,6 Janie Corley,6 Lee Murphy,4 Nicholas G. Martin,5 Grant W. Montgomery,5 John M. Starr,3,7 Naomi R. Wray,1,10 Ian J. Deary,3,6,10 and Peter M. Visscher1,3,8,10 1Queensland Brain Institute, The University of Queensland, Brisbane, 4072, Queensland, Australia; 2Medical Genetics Section, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom; 3Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, EH8 9JZ, United Kingdom; 4Wellcome Trust Clinical Research Facility, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, United Kingdom; 5QIMR Berghofer Medical Research Institute, Brisbane, 4029, Queensland, Australia; 6Department of Psychology, University of Edinburgh, Edinburgh, EH8 9JZ, United Kingdom; 7Alzheimer Scotland Dementia Research Centre, University of Edinburgh, Edinburgh, EH8 9JZ, United Kingdom; 8University of Queensland Diamantina Institute, Translational Research Institute, The University of Queensland, Brisbane, 4072, Queensland, Australia Epigenetic mechanisms such as DNA methylation (DNAm) are essential for regulation of gene expression. DNAm is dy- namic, influenced by both environmental and genetic factors. Epigenetic drift is the divergence of the epigenome as a function of age due to stochastic changes in methylation.
    [Show full text]
  • Small Variants Frequently Asked Questions (FAQ) Updated September 2011
    Small Variants Frequently Asked Questions (FAQ) Updated September 2011 Summary Information for each Genome .......................................................................................................... 3 How does Complete Genomics map reads and call variations? ........................................................................... 3 How do I assess the quality of a genome produced by Complete Genomics?................................................ 4 What is the difference between “Gross mapping yield” and “Both arms mapped yield” in the summary file? ............................................................................................................................................................................. 5 What are the definitions for Fully Called, Partially Called, Half-Called and No-Called?............................ 5 In the summary-[ASM-ID].tsv file, how is the number of homozygous SNPs calculated? ......................... 5 In the summary-[ASM-ID].tsv file, how is the number of heterozygous SNPs calculated? ....................... 5 In the summary-[ASM-ID].tsv file, how is the total number of SNPs calculated? .......................................... 5 In the summary-[ASM-ID].tsv file, what regions of the genome are included in the “exome”? .............. 6 In the summary-[ASM-ID].tsv file, how is the number of SNPs in the exome calculated? ......................... 6 In the summary-[ASM-ID].tsv file, how are variations in potentially redundant regions of the genome counted? .....................................................................................................................................................................
    [Show full text]
  • A Pilot Analysis of DNA Methylation in Candidate Genes in Brain
    cells Article Epigenetic Study in Parkinson’s Disease: A Pilot Analysis of DNA Methylation in Candidate Genes in Brain Luis Navarro-Sánchez 1 , Beatriz Águeda-Gómez 1, Silvia Aparicio 1 and Jordi Pérez-Tur 1,2,3,* 1 Unitat de Genètica Molecular, Instituto de Biomedicina de Valencia, CSIC, 46010 València, Spain; [email protected] (L.N.-S.); [email protected] (B.Á.-G.); [email protected] (S.A.) 2 Centro de Investigación Biomédica en Red en Enfermedades Neurodegenerativas (CIBERNED), 46010 València, Spain 3 Unidad Mixta de Genética y Neurología, Instituto de Investigación Sanitaria La Fe, 46026 València, Spain * Correspondence: [email protected]; Tel.: +34-96-339-1755 Received: 25 July 2018; Accepted: 21 September 2018; Published: 26 September 2018 Abstract: Efforts have been made to understand the pathophysiology of Parkinson’s disease (PD). A significant number of studies have focused on genetics, despite the fact that the described pathogenic mutations have been observed only in around 10% of patients; this observation supports the fact that PD is a multifactorial disorder. Lately, differences in miRNA expression, histone modification, and DNA methylation levels have been described, highlighting the importance of epigenetic factors in PD etiology. Taking all this into consideration, we hypothesized that an alteration in the level of methylation in PD-related genes could be related to disease pathogenesis, possibly due to alterations in gene expression. After analysing promoter regions of five PD-related genes in three brain regions by pyrosequencing, we observed some differences in DNA methylation levels (hypo and hypermethylation) in substantia nigra in some CpG dinucleotides that, possibly through an alteration in Sp1 binding, could alter their expression.
    [Show full text]
  • Expressed Sequence Tag Analysis of the Response of Apple
    Physiologia Plantarum 133: 298–317. 2008 Copyright ª Physiologia Plantarum 2008, ISSN 0031-9317 Expressed sequence tag analysis of the response of apple (Malus x domestica ‘Royal Gala’) to low temperature and water deficit Michael Wisniewskia,*, Carole Bassetta,*, John Norellia, Dumitru Macarisina, Timothy Artlipa, Ksenija Gasicb and Schuyler Korbanb aUnited States Department of Agriculture – Agricultural Research Service (USDA-ARS), Appalachian Fruit Research Station, 2217 Wiltshire Road, Kearneysville, WV 25430, USA bDepartment of Natural Resources and Environmental Sciences, University of Illinois, 310 ERML, 1201 W. Gregory Drive, Urbana, IL 61801, USA Correspondence Leaf, bark, xylem and root tissues were used to make nine cDNA libraries from *Corresponding author, non-stressed (control) ‘Royal Gala’ apple trees, and from ‘Royal Gala’ trees e-mail: [email protected]; exposed to either low temperature (5°C for 24 h) or water deficit (45% of [email protected] saturated pot mass for 2 weeks). Over 22 600 clones from the nine libraries # Received 26 September 2007; revised 3 were subjected to 5 single-pass sequencing, clustered and annotated using January 2008 BLASTX. The number of clusters in the libraries ranged from 170 to 1430. Regarding annotation of the sequences, BLASTX analysis indicated that within doi: 10.1111/j.1399-3054.2008.01063.x the libraries 65–72% of the clones had a high similarity to known function genes, 6–15% had no functional assignment and 15–26% were completely novel. The expressed sequence tags were combined into three classes (control, low-temperature and water deficit) and the annotated genes in each class were placed into 1 of 10 different functional categories.
    [Show full text]
  • Dbsnp Resources in the UCSC Database Today We Will Discuss
    dbSNP resources in the UCSC database Today we will discuss some of the variation data from dbSNP as displayed on the UCSC Genome Browser. We will start at genome.ucsc.edu at the main Browser page and we will ‘reset all user settings’, which brings us to the most recent genome assembly, hg38. [0:28 Set up GB to hg18 defaults] So to begin, let’s go down to hg18, one of the earlier version of the genome, and we will hit the go button. At the Browser graphic page, we will hit the ‘hide all’ button to turn all the data tracks off. [0:45 Turn on SNPs (130) track] Scrolling to the bottom of the screen, to the Variation and Repeats bluebar group, we’ll draw your attention to the dbSNP track, SNPs version 130. In addition, we have earlier versions: 129, 128 and 126. At the time of version 130, before moving to hg19, dbSNP was mapping all new SNPs to the hg18 genome assembly. If we click into the link above the pulldown menu, we can see that this table was last updated in 2009. We’re going to turn the track on to ‘dense’ and then ‘Submit’. This shows us all the SNPs in version 130, and let’s turn on the gene track, UCSC Genes to ‘pack’so we will see the scale. We have 310,000 bases and we have the default gene GABRA3. The region is annotated by many SNPs. Now let’s zoom into one of the exons here in the middle of the screen, and you can that at this resolution, around 5,000 basepairs, there are a dozen or so SNPs in view.
    [Show full text]
  • Sequence Variations in the Public Human Genome Data Reflect a Bottlenecked Population History
    Sequence variations in the public human genome data reflect a bottlenecked population history Gabor Marth*†, Greg Schuler*, Raymond Yeh‡, Ruth Davenport§, Richa Agarwala*, Deanna Church*, Sarah Wheelan*¶, Jonathan Baker*, Ming Ward*, Michael Kholodov*, Lon Phan*, Eva Czabarka*, Janos Murvai*, David Cutlerʈ, Stephen Wooding**, Alan Rogers**, Aravinda Chakravartiʈ, Henry C. Harpending**, Pui-Yan Kwok†,††, and Stephen T. Sherry*† *National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894; ‡Department of Genetics, Washington University School of Medicine, St. Louis, MO 63130; §Division of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63130; ¶Department of Molecular Biology and Genetics and ʈMcKusick–Nathans Institute of Genetic Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD 21205; **Department of Anthropology, University of Utah, Salt Lake City, UT 84112; and ††Cardiovascular Research Institute and Department of Dermatology, University of California, San Francisco, CA 94143 Contributed by Henry C. Harpending, November 5, 2002 Single-nucleotide polymorphisms (SNPs) constitute the great ma- (density) distribution of genomic sequence variations. To this jority of variations in the human genome, and as heritable variable end, we built a set of reagents (pairwise sequence alignments landmarks they are useful markers for disease mapping and re- and their corresponding sets of variation) by analyzing the solving population structure. Redundant coverage in overlaps of overlapping regions of large-insert clones sequenced as part of large-insert genomic clones, sequenced as part of the Human the human genome project. These data provided marker Genome Project, comprises a quarter of the genome, and it is density observations grouped by overlap fragment length.
    [Show full text]
  • Methods to Detect Selection in Populations with Applications to the Human
    P1: FQP/VEN July 6, 2000 12:24 Annual Reviews AR104-19 Annu. Rev. Genomics Hum. Genet. 2000. 01:539–59 Copyright c 2000 by Annual Reviews. All rights reserved METHODS TO DETECT SELECTION IN POPULATIONS WITH APPLICATIONS TO THE HUMAN Martin Kreitman Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637; e-mail: [email protected] Key Words natural selection, genetic drift, polymorphism, molecular evolution ■ Abstract The development of statistical tests of natural selection at the DNA level in population samples has been ongoing for the past 13 years. The current state of the field is reviewed, and the available tests of selection are described. All tests use predictions from the theory of neutrally evolving sites as a null hypothesis. Departures from equilibrium-neutral expectations can indicate the presence of natural selection acting either at one or more of the sites under investigation or at a sufficiently tightly linked site. Complications can arise in the interpretation of departures from neutrality if populations are not at equilibrium for mutation and genetic drift or if populations are subdivided, both of which are likely scenarios for humans. Attempts to understand the nonequilibrium configuration of silent polymorphism in human mitochondrial DNA illustrate the difficulty of distinguishing between selection and alternative demographic hypotheses. The range of plausible alternatives to selection will become better defined, however, as additional population genetic data sets become available, allowing better null models to be constructed. INTRODUCTION Kimura’s immensely influential formulation of the neutral theory of molecular evolution, which came in 1968, was based primarily on an argument about the magnitude of the genetic load that would be imposed by positive selection if it were the only driving force in protein evolution (56).
    [Show full text]
  • 1 a Primer on Molecular Biology
    1APrimer on Molecular Biology Alexander Zien Modern molecular biology provides a rich source of challenging machine learning problems. This tutorial chapter aims to provide the necessary biological background knowledge required to communicate with biologists and to understand and properly formalize a number of most interesting problems in this application domain. The largest part of the chapter (its first section) is devoted to the cell as the basic unit of life. Four aspects of cells are reviewed in sequence: (1) the molecules that cells make use of (above all, proteins, RNA, and DNA); (2) the spatial organization of cells (“compartmentalization”); (3) the way cells produce proteins (“protein expression”); and (4) cellular communication and evolution (of cells and organisms). In the second section, an overview is provided of the most frequent measurement technologies, data types, and data sources. Finally, important open problems in the analysis of these data (bioinformatics challenges) are briefly outlined. 1.1 The Cell The basic unit of all (biological) life is the cell. A cell is basically a watery solution of certain molecules, surrounded by a lipid (fat) membrane. Typical sizes of cells range from 1 µm (bacteria) to 100 µm (plant cells). The most important properties Life of a living cell (and, in fact, of life itself) are the following: It consists of a set of molecules that is separated from the exterior (as a human being is separated from his or her surroundings). It has a metabolism, that is, it can take up nutrients and convert them into other molecules and usable energy. The cell uses nutrients to renew its constituents, to grow, and to drive its actions (just like a human does).
    [Show full text]
  • The Genetics of Bipolar Disorder
    Molecular Psychiatry (2008) 13, 742–771 & 2008 Nature Publishing Group All rights reserved 1359-4184/08 $30.00 www.nature.com/mp FEATURE REVIEW The genetics of bipolar disorder: genome ‘hot regions,’ genes, new potential candidates and future directions A Serretti and L Mandelli Institute of Psychiatry, University of Bologna, Bologna, Italy Bipolar disorder (BP) is a complex disorder caused by a number of liability genes interacting with the environment. In recent years, a large number of linkage and association studies have been conducted producing an extremely large number of findings often not replicated or partially replicated. Further, results from linkage and association studies are not always easily comparable. Unfortunately, at present a comprehensive coverage of available evidence is still lacking. In the present paper, we summarized results obtained from both linkage and association studies in BP. Further, we indicated new potential interesting genes, located in genome ‘hot regions’ for BP and being expressed in the brain. We reviewed published studies on the subject till December 2007. We precisely localized regions where positive linkage has been found, by the NCBI Map viewer (http://www.ncbi.nlm.nih.gov/mapview/); further, we identified genes located in interesting areas and expressed in the brain, by the Entrez gene, Unigene databases (http://www.ncbi.nlm.nih.gov/entrez/) and Human Protein Reference Database (http://www.hprd.org); these genes could be of interest in future investigations. The review of association studies gave interesting results, as a number of genes seem to be definitively involved in BP, such as SLC6A4, TPH2, DRD4, SLC6A3, DAOA, DTNBP1, NRG1, DISC1 and BDNF.
    [Show full text]
  • Human DNA Sequences: More Variation and Less Race
    AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 139:23–34 (2009) Human DNA Sequences: More Variation and Less Race Jeffrey C. Long,1* Jie Li,1 and Meghan E. Healy2 1Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109-5618 2Department of Anthropology, University of New Mexico, Albuquerque, NM 87131 KEY WORDS race; DNA sequence; short tandem repeat; diversity; hierachical models ABSTRACT Interest in genetic diversity within and sity is one of nested subsets, such that the diversity in between human populations as a way to answer questions non-Sub-Saharan African populations is essentially a sub- about race has intensified in light of recent advances in set of the diversity found in Sub-Saharan African popula- genome technology. The purpose of this article is to apply tions. The actual pattern of DNA diversity creates some a method of generalized hierarchical modeling to two unsettling problems for using race as meaningful genetic DNA data sets. The first data set consists of a small sam- categories. For example, the pattern of DNA diversity ple of individuals (n 5 32 total, from eight populations) implies that some populations belong to more than one who have been fully resequenced for 63 loci that encode a race (e.g., Europeans), whereas other populations do not total of 38,534 base pairs. The second data set consists of belong to any race at all (e.g., Sub-Saharan Africans). As a large sample of individuals (n 5 928 total, from 46 popu- Frank Livingstone noted long ago, the Linnean classifica- lations) who have been genotyped at 580 loci that encode tion system cannot accommodate this pattern because short tandem repeats.
    [Show full text]
  • Dbsnp Columns
    dbSNP Columns RS dbSNP ID (i.e. rs number) RSPOS Chr position reported in dbSNP RV RS orientation is reversed Variation Property. Documentation is at VP ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf Pairs each of gene symbol:gene id. The gene symbol and id are delimited by a GENEINFO colon (:) and each pair is delimited by a vertical bar (|) dbSNPBuildID First dbSNP Build for RS SAO Variant Allele Origin: 0 - unspecified, 1 - Germline, 2 - Somatic, 3 - Both Variant Suspect Reason Codes (may be more than one value added together) 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - SSR 1kg_failed, 1024 - other WGT Weight, 00 - unmapped, 1 - weight 1, 2 - weight 2, 3 - weight 3 or more VC Variation Class PM Variant is Precious(Clinical,Pubmed Cited) Provisional Third Party Annotation(TPA) (currently rs from PHARMGKB who will TPA give phenotype data) PMC Links exist to PubMed Central article S3D Has 3D structure - SNP3D table SLO Has SubmitterLinkOut - From SNP->SubSNP->Batch.link_out Has non-synonymous frameshift A coding region variation where one allele in NSF the set changes all downstream amino acids. FxnClass = 44 Has non-synonymous missense A coding region variation where one allele in NSM the set changes protein peptide. FxnClass = 42 Has non-synonymous nonsense A coding region variation where one allele in NSN the set changes to STOP codon (TER). FxnClass = 41 Has reference A coding region variation where one allele in the set is identical REF to the reference sequence. FxnCode = 8 Has synonymous A coding region variation where one allele in the set does not SYN change the encoded amino acid.
    [Show full text]