A Comparison of Cataloged Variation Between International Hapmap

Total Page:16

File Type:pdf, Size:1020Kb

A Comparison of Cataloged Variation Between International Hapmap View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by PubMed Central Research and applications A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data Carrie C Buchanan,1,2 Eric S Torstenson,1 William S Bush,1 Marylyn D Ritchie2 1Center for Human Genetics ABSTRACT were used. Second generation linkage maps were Research, Vanderbilt University, Background Since publication of the human genome in based on microsatellites, which are short tandem Nashville, Tennessee, USA 2 2003, geneticists have been interested in risk variant repeated DNA sequences present throughout the Department of Biochemistry fi and Molecular Biology, associations to resolve the etiology of traits and complex genome. The rst published study using micro- Pennsylvania State University, diseases. The International HapMap Consortium satellites included 814 polymorphic markers.2 The University Park, Pennsylvania, undertook an effort to catalog all common variation third generation linkage maps were created using USA across the genome (variants with a minor allele SNPs. These high density maps were developed by frequency (MAF) of at least 5% in one or more ethnic Correspondence to the International HapMap Consortium (haplotype Dr Marylyn D Ritchie, groups). HapMap along with advances in genotyping mapping). HapMap aimed to compare genetic Pennsylvania State University, technology led to genome-wide association studies sequences of different individuals and identify Department of Biochemistry and which have identified common variants associated with chromosomal regions where genetic variants were Molecular Biology, 512 Wartik, many traits and diseases. In 2008 the 1000 Genomes shared. These variations quickly became the core University Park, PA 16802, USA; [email protected] Project aimed to sequence 2500 individuals and identify around which genome-wide association studies rare variants and 99% of variants with a MAF of <1%. (GWAS) were built. Researchers believed that these Received 21 October 2011 Methods To determine whether the 1000 Genomes variations among individuals could explain the Accepted 27 December 2011 Project includes all the variants in HapMap, we examined heritability of common disease. After several years of the overlap between single nucleotide polymorphisms moderately successful GWAS, a group of researchers (SNPs) genotyped in the two resources using merged decided that a more in-depth look at variation, phase II/III HapMap data and low coverage pilot data including rare variation, was necessary to explain from 1000 Genomes. additional disease heritability. The 1000 Genomes Results Comparison of the two data sets showed that Project aimed to sequence 2500 individuals and approximately 72% of HapMap SNPs were also found in gather information on variants down to 1% allele 1000 Genomes Project pilot data. After filtering out frequency with the goal of providing a more exten- HapMap variants with a MAF of <5% (separately for sive catalog of variation to the scientific community. each population), 99% of HapMap SNPs were found in In addition to cataloging human variation, both 1000 Genomes data. databases serve many other purposes. For example, Conclusions Not all variants cataloged in HapMap are GWAS were possible because of the linkage also cataloged in 1000 Genomes. This could affect disequilibrium information calculated from the decisions about which resource to use for SNP queries, SNPs in HapMap. Published sequencing studies are rare variant validation, or imputation. Both the HapMap often filtered by variants in 1000 Genomes to and 1000 Genomes Project databases are useful reduce the number of variants used in association resources for human genetics, but it is important to tests, since the individuals in 1000 Genomes are understand the assumptions made and filtering presumably healthy controls and thus variants strategies employed by these projects. detected in these data are unlikely of importance for disease. Both of these resources have enhanced the study design and analysis pipelines for common and rare variant association studies. INTRODUCTION The HapMap project was launched in October The field of human genetics has rapidly developed in 2002 on the heels of the completion of the human the past few decades. The desire for precise genomic genome sequence. The project was designed to mapping has encouraged the development of asso- build a database of common sequence variation, to ciation studies, from genome-wide linkage studies to determine allele frequencies, and to empirically both low and high throughput single nucleotide determine the linkage disequilibrium relationships polymorphism (SNP) genotyping and, most across the genome. To date, there are three phases recently, high throughput DNA sequencing. At each of HapMap. The details are listed in table 1.3 4 stage of progression, the researcher has been better In 2008, the HapMap project catalog contained able to narrow disease susceptibility genetic regions 3.5 million commonly occurring genetic variants and/or identify causal variants associated with across several populations. The allele frequencies disease. The first genetic linkage map was published and correlation patterns were critical for the in 1987 and based on restriction fragment length development and success of GWAS. However, to 1 This paper is freely available polymorphisms (RFLPs). RFLPs are DNA poly- expand the investigation of causal variants to online under the BMJ Journals morphisms that disrupt (by either creation or include rare variation, more research was required. unlocked scheme, see http:// destruction) restriction endonuclease recognition Using sequencing technology, researchers are able jamia.bmj.com/site/about/ fi unlocked.xhtml sequences. In this rst map, only 393 bi-allelic RFLPs to identify novel or rare variants. Sequencing J Am Med Inform Assoc 2012;19:289e294. doi:10.1136/amiajnl-2011-000652 289 Research and applications Table 1 HapMap details Table 3 Details for 1000 Genomes Project full project data (sequence No. of SNPs index 2010.08.04) genotyped Targeted SNPs Populations studied Continental groups Ethnicity breakdown Total Phase I 1 million Prioritized coding SNPs to attain CEU, YRI, CHB, JPT AFR 78 YRI+67 LWK+24 ASW+5 PUR 174 1 SNP for each 5 kb region EUR 90 CEU+92 TSI+43 GBR+36 FIN+17 283 Phase II 3 million Prioritized non-synonymous CEU, YRI, CHB, JPT MXL+5 PUR SNPs in coding regions ASN 68 CHB+25 CHS+84 JPT+17 MXL 194 Phase III 1.4 million Prioritized rare variants CEU, YRI, CHB, JPT, Total number of unique individuals 629 ASW, CHB, GIH, LWK, MXL, MKK, TSI AFR, African; ASN, Asian; EUR, European. SNP, single nucleotide polymorphism. which database to reference, one might base one’s decision on enables scientists to pinpoint functional variants from associa- the newest release, total number of variants, or even ethnicities ’ tion studies, improve the knowledge available to researchers included. For example, if a researcher s interest was rare variants, interested in evolutionary biology, and may lay the foundation he/she might automatically assume that data from the 1000 for predicting disease susceptibility and drug response. The 1000 Genomes Project would be the variation catalog of choice. The Genomes consortium materialized to address these needs, 1000 Genomes Project aimed to provide characterization of over 95% of variants in accessible genomic regions that have an allele primarily by providing a sequence reference database. Their aim 6 has been to ‘provide a deep characterization of human genome frequency of 1% or higher. In the previous example, presuming sequence variation as a foundation for investigating the rela- that the 1000 Genomes Project data included more rare variants tionship between genotype and phenotype.’ The pilot phase of than HapMap would be a correct assumption; the 1000 the project, which included three subprojects, provided the first Genomes Project pilot data do indeed capture more rare varia- data release. The subprojects were planned to achieve their aims tion than HapMap. However, not all rare variants found in through evaluation of sequencing technology and to develop HapMap have been found in the 1000 Genomes Project catalog. analytical pipelines for alignment, quality control, data Therefore, if one is interested in rare variants, it might be fi management, and statistical analysis.5 Details about the pilot bene cial to investigate both resources. An example of this is fi data from the 1000 Genomes Project are shown in table 2. shown in gure 1, which shows a screen shot from the NCBI Review of the pilot data shows that the project successfully browser (taken in October 2010) of a region on chromosome 7. It catalogs the vast majority of common variation. Durbin et al lists the known variants by chromosome position, rs id, func- reported that over 95% of the currently accessible variants found tional change, alleles, and many other identifying characteristics. in any individual were present in the pilot data.6 It also includes a validation column which provides links and To date (August 2011), the full 1000 Genomes Project data details about the validation status of each given variant. Of includes SNP calls, exome alignments, and genotypes for 1185 particular interest is variant rs2072413. It was validated in individuals. The end goal is to sequence approximately 2500 de- HapMap but was not sequenced by the 1000 Genomes Project identified subjects from 25 populations worldwide using next- (at this time, only pilot data from 1000 Genomes were available generation sequencing technology. In the ‘low-coverage’ full project on NCBI). If this SNP was of interest, it would be important to data, the current coverage estimate is 7.73 (64.2) and includes 15 consider the HapMap data, including rare variants. This was world populations.6 In this analysis, we downloaded an earlier somewhat surprising, and we felt that it might be pertinent to release of the full project data (released October 2010) which determine how pervasive the differences were between HapMap included 629 individuals from 15 world populations (see table 3). and 1000 Genomes Project data.
Recommended publications
  • Ensembl Genomes: Extending Ensembl Across the Taxonomic Space P
    Published online 1 November 2009 Nucleic Acids Research, 2010, Vol. 38, Database issue D563–D569 doi:10.1093/nar/gkp871 Ensembl Genomes: Extending Ensembl across the taxonomic space P. J. Kersey*, D. Lawson, E. Birney, P. S. Derwent, M. Haimel, J. Herrero, S. Keenan, A. Kerhornou, G. Koscielny, A. Ka¨ ha¨ ri, R. J. Kinsella, E. Kulesha, U. Maheswari, K. Megy, M. Nuhn, G. Proctor, D. Staines, F. Valentin, A. J. Vilella and A. Yates EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK Received August 14, 2009; Revised September 28, 2009; Accepted September 29, 2009 ABSTRACT nucleotide archives; numerous other genomes exist in states of partial assembly and annotation; thousands of Ensembl Genomes (http://www.ensemblgenomes viral genomes sequences have also been generated. .org) is a new portal offering integrated access to Moreover, the increasing use of high-throughput genome-scale data from non-vertebrate species sequencing technologies is rapidly reducing the cost of of scientific interest, developed using the Ensembl genome sequencing, leading to an accelerating rate of genome annotation and visualisation platform. data production. This not only makes it likely that in Ensembl Genomes consists of five sub-portals (for the near future, the genomes of all species of scientific bacteria, protists, fungi, plants and invertebrate interest will be sequenced; but also the genomes of many metazoa) designed to complement the availability individuals, with the possibility of providing accurate and of vertebrate genomes in Ensembl. Many of the sophisticated annotation through the similarly low-cost databases supporting the portal have been built in application of functional assays.
    [Show full text]
  • Population Genetic Considerations for Using Biobanks As International
    Carress et al. BMC Genomics (2021) 22:351 https://doi.org/10.1186/s12864-021-07618-x REVIEW Open Access Population genetic considerations for using biobanks as international resources in the pandemic era and beyond Hannah Carress1, Daniel John Lawson2 and Eran Elhaik1,3* Abstract The past years have seen the rise of genomic biobanks and mega-scale meta-analysis of genomic data, which promises to reveal the genetic underpinnings of health and disease. However, the over-representation of Europeans in genomic studies not only limits the global understanding of disease risk but also inhibits viable research into the genomic differences between carriers and patients. Whilst the community has agreed that more diverse samples are required, it is not enough to blindly increase diversity; the diversity must be quantified, compared and annotated to lead to insight. Genetic annotations from separate biobanks need to be comparable and computable and to operate without access to raw data due to privacy concerns. Comparability is key both for regular research and to allow international comparison in response to pandemics. Here, we evaluate the appropriateness of the most common genomic tools used to depict population structure in a standardized and comparable manner. The end goal is to reduce the effects of confounding and learn from genuine variation in genetic effects on phenotypes across populations, which will improve the value of biobanks (locally and internationally), increase the accuracy of association analyses and inform developmental efforts. Keywords: Bioinformatics, Population structure, Population stratification bias, Genomic medicine, Biobanks Background individuals, families, communities and populations, ne- Association studies aim to detect whether genetic vari- cessitated genomic biobanks.
    [Show full text]
  • Gene Linkage and Genetic Mapping 4TH PAGES © Jones & Bartlett Learning, LLC
    © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION Gene Linkage and © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC 4NOTGenetic FOR SALE OR DISTRIBUTIONMapping NOT FOR SALE OR DISTRIBUTION CHAPTER ORGANIZATION © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR4.1 SALELinked OR alleles DISTRIBUTION tend to stay 4.4NOT Polymorphic FOR SALE DNA ORsequences DISTRIBUTION are together in meiosis. 112 used in human genetic mapping. 128 The degree of linkage is measured by the Single-nucleotide polymorphisms (SNPs) frequency of recombination. 113 are abundant in the human genome. 129 The frequency of recombination is the same SNPs in restriction sites yield restriction for coupling and repulsion heterozygotes. 114 fragment length polymorphisms (RFLPs). 130 © Jones & Bartlett Learning,The frequency LLC of recombination differs © Jones & BartlettSimple-sequence Learning, repeats LLC (SSRs) often NOT FOR SALE OR DISTRIBUTIONfrom one gene pair to the next. NOT114 FOR SALEdiffer OR in copyDISTRIBUTION number. 131 Recombination does not occur in Gene dosage can differ owing to copy- Drosophila males. 115 number variation (CNV). 133 4.2 Recombination results from Copy-number variation has helped human populations adapt to a high-starch diet. 134 crossing-over between linked© Jones alleles. & Bartlett Learning,116 LLC 4.5 Tetrads contain© Jonesall & Bartlett Learning, LLC four products of meiosis.
    [Show full text]
  • Rare Variant Contribution to Human Disease in 281,104 UK Biobank Exomes W ­ 1,19 1,19 2,19 2 2 Quanli Wang , Ryan S
    https://doi.org/10.1038/s41586-021-03855-y Accelerated Article Preview Rare variant contribution to human disease W in 281,104 UK Biobank exomes E VI Received: 3 November 2020 Quanli Wang, Ryan S. Dhindsa, Keren Carss, Andrew R. Harper, Abhishek N ag­­, I oa nn a Tachmazidou, Dimitrios Vitsios, Sri V. V. Deevi, Alex Mackay, EDaniel Muthas, Accepted: 28 July 2021 Michael Hühn, Sue Monkley, Henric O ls so n , S eb astian Wasilewski, Katherine R. Smith, Accelerated Article Preview Published Ruth March, Adam Platt, Carolina Haefliger & Slavé PetrovskiR online 10 August 2021 P Cite this article as: Wang, Q. et al. Rare variant This is a PDF fle of a peer-reviewed paper that has been accepted for publication. contribution to human disease in 281,104 UK Biobank exomes. Nature https:// Although unedited, the content has been subjectedE to preliminary formatting. Nature doi.org/10.1038/s41586-021-03855-y (2021). is providing this early version of the typeset paper as a service to our authors and Open access readers. The text and fgures will undergoL copyediting and a proof review before the paper is published in its fnal form. Please note that during the production process errors may be discovered which Ccould afect the content, and all legal disclaimers apply. TI R A D E T A R E L E C C A Nature | www.nature.com Article Rare variant contribution to human disease in 281,104 UK Biobank exomes W 1,19 1,19 2,19 2 2 https://doi.org/10.1038/s41586-021-03855-y Quanli Wang , Ryan S.
    [Show full text]
  • Genetic Linkage Analysis
    BASIC SCIENCE SEMINARS IN NEUROLOGY SECTION EDITOR: HASSAN M. FATHALLAH-SHAYKH, MD Genetic Linkage Analysis Stefan M. Pulst, MD enetic linkage analysis is a powerful tool to detect the chromosomal location of dis- ease genes. It is based on the observation that genes that reside physically close on a chromosome remain linked during meiosis. For most neurologic diseases for which the underlying biochemical defect was not known, the identification of the chromo- Gsomal location of the disease gene was the first step in its eventual isolation. By now, genes that have been isolated in this way include examples from all types of neurologic diseases, from neu- rodegenerative diseases such as Alzheimer, Parkinson, or ataxias, to diseases of ion channels lead- ing to periodic paralysis or hemiplegic migraine, to tumor syndromes such as neurofibromatosis types 1 and 2. Arch Neurol. 1999;56:667-672 With the advent of new genetic markers tin gene, diagnosis using flanking mark- and automated genotyping, genetic map- ers requires the analysis of several family ping can be conducted extremely rap- members. idly. Genetic linkage maps have been gen- erated for the human genome and for LINKAGE OF GENES model organisms and have provided the basis for the construction of physical maps When Mendel observed an “independent that permit the rapid mapping of disease assortment of traits” (Mendel’s second traits. law), he was fortunate to have chosen traits As soon as a chromosomal location that were not localized close to one an- for a disease phenotype has been estab- other on the same chromosome.1 Subse- lished, genetic linkage analysis helps quent studies revealed that many genes determine whether the disease pheno- were indeed linked, ie, that traits did not type is only caused by mutation in a assort or segregate independently, but that single gene or mutations in other genes traits encoded by these linked genes were can give rise to an identical or similar inherited together.
    [Show full text]
  • (DDD) Project: What a Genomic Approach Can Achieve
    The Deciphering Development Disorders (DDD) project: What a genomic approach can achieve RCP ADVANCED MEDICINE, LONDON FEB 5TH 2018 HELEN FIRTH DM FRCP DCH, SANGER INSTITUTE 3,000,000,000 bases in each human genome Disease & developmental Health & development disorders Fascinating facts about your genome! –~20,000 protein-coding genes –~30% of genes have a known role in disease or developmental disorders –~10,000 protein altering variants –~100 protein truncating variants –~70 de novo mutations (~1-2 coding ie. In exons of genes) Rare Disease affects 1 in 17 people •Prior to DDD, diagnostic success in patients with rare paediatric disease was poor •Not possible to diagnose many patients with current methodology in routine use– maximum benefit in this group •DDD recruited patients with severe/extreme clinical features present from early childhood with high expectation of genetic basis •Recruitment was primarily of trios (ie The Doctor Sir Luke Fildes (1887) child and both parents) ~ 90% Making a genomic diagnosis of a rare disease improves care •Accurate diagnosis is the cornerstone of good medical practice – informing management, treatment, prognosis and prevention •Enables risk to other family members to be determined enabling predictive testing with potential for surveillance and therapy in some disorders February 28th 2018 •Reduces sense of isolation, enabling better access to support and information •Curtails the diagnostic odyssey •Not just a descriptive label; identifies the fundamental cause of disease A genomic diagnosis can be a gateway to better treatment •Not just a descriptive label; identifies the fundamental cause of disease •Biallelic mutations in the CFTR gene cause Cystic Fibrosis • CFTR protein is an epithelial ion channel regulating absorption/ secretion of salt and water in the lung, sweat glands, pancreas & GI tract.
    [Show full text]
  • Different Evolutionary Patterns of Snps Between Domains and Unassigned Regions in Human Protein‑Coding Sequences
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Springer - Publisher Connector Mol Genet Genomics (2016) 291:1127–1136 DOI 10.1007/s00438-016-1170-7 ORIGINAL ARTICLE Different evolutionary patterns of SNPs between domains and unassigned regions in human protein‑coding sequences Erli Pang1 · Xiaomei Wu2 · Kui Lin1 Received: 14 September 2015 / Accepted: 18 January 2016 / Published online: 30 January 2016 © The Author(s) 2016. This article is published with open access at Springerlink.com Abstract Protein evolution plays an important role in Furthermore, the selective strength on domains is signifi- the evolution of each genome. Because of their functional cantly greater than that on unassigned regions. In addition, nature, in general, most of their parts or sites are differently among all of the human protein sequences, there are 117 constrained selectively, particularly by purifying selection. PfamA domains in which no SNPs are found. Our results Most previous studies on protein evolution considered indi- highlight an important aspect of protein domains and may vidual proteins in their entirety or compared protein-coding contribute to our understanding of protein evolution. sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each pro- Keywords Human genome · Protein-coding sequence · tein of a given genome. To this end, based on PfamA anno- Protein domain · SNPs · Natural selection tation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in Introduction protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occur- Studying protein evolution is crucial for understanding ring within protein domains and those within unassigned the evolution of speciation and adaptation, senescence and regions.
    [Show full text]
  • Annual Scientific Report 2013 on the Cover Structure 3Fof in the Protein Data Bank, Determined by Laponogov, I
    EMBL-European Bioinformatics Institute Annual Scientific Report 2013 On the cover Structure 3fof in the Protein Data Bank, determined by Laponogov, I. et al. (2009) Structural insight into the quinolone-DNA cleavage complex of type IIA topoisomerases. Nature Structural & Molecular Biology 16, 667-669. © 2014 European Molecular Biology Laboratory This publication was produced by the External Relations team at the European Bioinformatics Institute (EMBL-EBI) A digital version of the brochure can be found at www.ebi.ac.uk/about/brochures For more information about EMBL-EBI please contact: [email protected] Contents Introduction & overview 3 Services 8 Genes, genomes and variation 8 Molecular atlas 12 Proteins and protein families 14 Molecular and cellular structures 18 Chemical biology 20 Molecular systems 22 Cross-domain tools and resources 24 Research 26 Support 32 ELIXIR 36 Facts and figures 38 Funding & resource allocation 38 Growth of core resources 40 Collaborations 42 Our staff in 2013 44 Scientific advisory committees 46 Major database collaborations 50 Publications 52 Organisation of EMBL-EBI leadership 61 2013 EMBL-EBI Annual Scientific Report 1 Foreword Welcome to EMBL-EBI’s 2013 Annual Scientific Report. Here we look back on our major achievements during the year, reflecting on the delivery of our world-class services, research, training, industry collaboration and European coordination of life-science data. The past year has been one full of exciting changes, both scientifically and organisationally. We unveiled a new website that helps users explore our resources more seamlessly, saw the publication of ground-breaking work in data storage and synthetic biology, joined the global alliance for global health, built important new relationships with our partners in industry and celebrated the launch of ELIXIR.
    [Show full text]
  • 1 Constructing the Scientific Population in the Human Genome Diversity and 1000 Genome Projects Joseph Vitti I. Introduction: P
    Constructing the Scientific Population in the Human Genome Diversity and 1000 Genome Projects Joseph Vitti I. Introduction: Populations Coming into Focus In November 2012, some eleven years after the publication of the first draft sequence of a human genome, an article published in Nature reported a new ‘map’ of the human genome – created from not one, but 1,092 individuals. For many researchers, however, what was compelling was not the number of individuals sequenced, but rather the fourteen worldwide populations they represented. Comparisons that could be made within and among these populations represented new possibilities for the scientific study of human genetic variation. The paper – which has been cited over 400 times in the subsequent year – was the output of the first phase of the 1000 Genomes Project, one of several international research consortia launched with the intent of identifying and cataloguing such variation. With the project’s phase three data release, anticipated in early spring 2014, the sample size will rise to over 2500 individuals representing twenty-six populations. Each individual’s full sequence data is made publicly available online, and is also preserved through the establishment of immortal cell lines, from which DNA can be extracted and distributed. With these developments, population-based science has been made genomic, and scientific conceptions of human populations have begun to crystallize (see appendix). Such extensive biobanking and databasing of human populations is remarkable for a number of reasons, not least among them the socially charged terrain that such an enterprise inevitably must navigate. While the 1000 Genomes Project (1000G) has been relatively uncontroversial in its reception, predecessors such as the Human Genome Diversity Project (HGDP), first conceived in 1991, faced greater difficulty.
    [Show full text]
  • NIH-GDS: Genomic Data Sharing
    NIH-GDS: Genomic Data Sharing National Institutes of Health Data type Explain whether the research being considered for funding involves human data, non- human data, or both. Information to be included in this section: • Type of data being collected: human, non-human, or both human & non-human. • Type of genomic data to be shared: sequence, transcriptomic, epigenomic, and/or gene expression. • Level of the genomic data to be shared: Individual-level, aggregate-level, or both. • Relevant associated data to be shared: phenotype or exposure. • Information needed to interpret the data: study protocols, survey tools, data collection instruments, data dictionary, software (including version), codebook, pipeline metadata, etc. This information should be provided with unrestricted access for all data levels. Data repository Identify the data repositories to which the data will be submitted, and for human data, whether the data will be available through unrestricted or controlled-access. For human genomic data, investigators are expected to register all studies in the database of Genotypes and Phenotypes (dbGaP) by the time data cleaning and quality control measures begin in addition to submitting the data to the relevant NIH-designated data repository (e.g., dbGaP, Gene Expression Omnibus (GEO), Sequence Read Archive (SRA), the Cancer Genomics Hub) after registration. Non-human data may be made available through any widely used data repository, whether NIH- funded or not, such as GEO, SRA, Trace Archive, Array Express, Mouse Genome Informatics, WormBase, the Zebrafish Model Organism Database, GenBank, European Nucleotide Archive, or DNA Data Bank of Japan. Data in unrestricted-access repositories (e.g., The 1000 Genomes Project) are publicly available to anyone.
    [Show full text]
  • Genetic Linkage of the Huntington's Disease Gene to a DNA Marker James F
    LE JOURNAL CANADIEN DES SCIENCES NEUROLOG1QUES SPECIAL FEATURE Genetic Linkage of the Huntington's Disease Gene to a DNA Marker James F. Gusella ABSTRACT: Recombinant DNA techniques have provided the means to generate large numbers of new genetic linkage markers. This technology has been used to identify a DNA marker that coinherits with the Huntington's Disease (HD) gene in family studies. The HD locus has thereby been mapped to human chromosome 4. The discovery of a genetic marker for the inheritance of HD has implications both for patient care and future research. The same approach holds considerable promise for the investigation of other genetic diseases, including Dystonia Musculorum Deformans. RESUME: Les techniques d'ADN recombine ont fourni le moyen de g£nerer un grand nombre de nouveux marqueurs a liason g6n£tique. Cette technologie a He employe" afin d'identifier un marqueur d'ADN qui co-herite avec le gene de la maladie de Huntington (MH) dans les etudes familiales. Le lieu du gene de la MH a ainsi 6te localise sur le chromosome humain numero 4. La d6couverte d'un marqueur g6n6tique pour I'her6dit6 de MH a des implications pour la soin de patients ainsi que pour la recherche dans le futur. La meme approche semble pleine de promesses pour l'investigation d'autres maladies g6ndtiques, incluant la dystonie musculaire deTormante. Can. J. Neurol. Sci. 1984; 11:421-425 Huntington's Disease currently no effective therapy to cure this devastating disease, Huntington's disease (HD) is a genetic neurodegenerative or to slow its inexorable progression. disorder first described by George Huntington in 1873 (Hunting­ ton, 1972;Hayden, 1981;Chaseetal., 1979).
    [Show full text]
  • Linkage & Genetic Mapping in Eukaryotes
    LinLinkkaaggee && GGeenneetticic MMaappppiningg inin EEuukkaarryyootteess CChh.. 66 1 LLIINNKKAAGGEE AANNDD CCRROOSSSSIINNGG OOVVEERR ! IInn eeuukkaarryyoottiicc ssppeecciieess,, eeaacchh lliinneeaarr cchhrroommoossoommee ccoonnttaaiinnss aa lloonngg ppiieeccee ooff DDNNAA – A typical chromosome contains many hundred or even a few thousand different genes ! TThhee tteerrmm lliinnkkaaggee hhaass ttwwoo rreellaatteedd mmeeaanniinnggss – 1. Two or more genes can be located on the same chromosome – 2. Genes that are close together tend to be transmitted as a unit Copyright ©The McGraw-Hill Companies, Inc. Permission required for reproduction or display 2 LinkageLinkage GroupsGroups ! Chromosomes are called linkage groups – They contain a group of genes that are linked together ! The number of linkage groups is the number of types of chromosomes of the species – For example, in humans " 22 autosomal linkage groups " An X chromosome linkage group " A Y chromosome linkage group ! Genes that are far apart on the same chromosome can independently assort from each other – This is due to crossing-over or recombination Copyright ©The McGraw-Hill Companies, Inc. Permission required for reproduction or display 3 LLiinnkkaaggee aanndd RRecombinationecombination Genes nearby on the same chromosome tend to stay together during the formation of gametes; this is linkage. The breakage of the chromosome, the separation of the genes, and the exchange of genes between chromatids is known as recombination. (we call it crossing over) 4 IndependentIndependent assortment:assortment:
    [Show full text]