Clinical NGS Database ver1.4 User’s Manual


Produced by: Shin-ya Nishio and Shin-ichi Usami
Department of Otorhinolaryngology, Shinshu University School of Medicine
3-1-1 Asahi, Matsumoto 390-8621, Japan
Tel: +81-263-37-2666 Fax: +81-263-36-9164
E-mail: [email protected]

1. Overview

Recent advances in molecular genetics technologies, notably next-generation sequencing (NGS), have drastically accelerated the identification of novel genes involved in most inherited diseases and expanded the mutational spectrum of disease-causing genes. These new technologies have led to significant breakthroughs in the field of human genetics research, but have also raised new challenges in interpreting the pathogenicity of the extraordinary number of genomic variants they generate.

This database software is designed for efficient clinical next-generation sequencing analysis of inherited disease by collecting data on a large number of variants together with clinical information. It is also intended to be easy to start up and maintain and to place a low load on computer resources, while providing powerful analysis tools for clinical next-generation sequencing.

1-1. Architecture of this database

This database software consists of 7 internal sub-databases.

1) “Core database” stores all SNV and insertion/deletion information from each patient and from in-house controls. Variant Call Format (.vcf) files produced by next-generation sequencing analysis are stored in this database.

2) “Sample List” stores patient and in-house control sample information. This database also includes project information as well as detailed patient clinical information. Records of this database are automatically generated from the “Core database”.

3) “Variant List” stores variant information, calculated allele frequencies and averaged phenotypes of the patients carrying each variant. This database is automatically generated from the “Core database”.

4) “Gene List” stores target gene lists for the capture panels used in target re-sequencing. In this version, 10 kinds of capture libraries can be submitted into this database.

5) “ANNOVAR database” stores the annotation information of each variant included in the “Variant List” database. In this version, the database is compatible with annotation information produced by ANNOVAR. Please refer to the citations for more details on the ANNOVAR software.

6) “Previously reported variants database” stores previously reported variant information.

7) “Control allele frequency database” stores control variant allele frequency information from public databases. This database is compatible with VCF-formatted files.

Fig. 1 Relationship of each sub-database
[Figure: diagram of the sub-databases and their linking fields. ANNOVAR and Gene_List are joined by “Gene Name”; Sample_List and Core_DB are joined by “ID”; Valiant_List, Core_DB, ANNOVAR, Previously_Reporte… and Control_Allele_Freq… are joined by “concatenate”. Other fields shown include Chr, Start, End, Ref, Alt, Locus, disease, OMIM#, AD, AR, project, project_NUM, hereditary, Enrichment platform, TaegetSet, gene, protein_change, position, Sample# and AveDepth.]

Sub-databases are linked to each other as illustrated in Figure 1. “Sample List”, which manages the sample IDs and clinical data, is linked to “Core DB” by the sample “ID”. The sample “ID” must therefore be unique across all samples. “Variant List”, which manages all variant information, is linked to “Core DB”, “ANNOVAR database”, “Previously reported variants database”, and “Control allele frequency database” by the “concatenate” field. The “concatenate” field is automatically generated by combining the variant information (“Chr_Start_End_Ref_Alt”).

“ANNOVAR database” is linked to the “Gene List” database by “Gene Name”.
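The “concatenate” linkage described above can be illustrated with a minimal Python sketch. It is not the database's own implementation: the `Variant` type, the `concatenate_key` function and the two dictionaries are hypothetical names introduced here, and the exact formatting of the key (for example, whether the chromosome carries a `chr` prefix) is an assumption based only on the “Chr_Start_End_Ref_Alt” pattern given in this manual. The example coordinates are the OTOF variant shown in the Variant Viewer figure.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical container for the five fields that make up the key."""
    chrom: str
    start: int
    end: int
    ref: str
    alt: str


def concatenate_key(v: Variant) -> str:
    """Join the five variant fields with underscores (Chr_Start_End_Ref_Alt)."""
    return "_".join([v.chrom, str(v.start), str(v.end), v.ref, v.alt])


# Example values taken from the Variant Viewer figure in this manual.
otof = Variant("chr2", 26681086, 26681086, "C", "T")
key = concatenate_key(otof)
print(key)

# Hypothetical in-memory stand-ins for two of the linked sub-databases.
annovar_db = {key: {"gene": "OTOF", "aa_change": "p.R1172Q"}}
control_db = {}  # in this toy example the variant is absent from the controls

# The same key is used to look the variant up in every linked sub-database.
print(annovar_db.get(key))
print(control_db.get(key))
```

Because every linked sub-database is looked up with the same string, two records match only if all five fields are written identically in both.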
Gene names must be identical between the “ANNOVAR database” and the “Gene List” database.

1-2. User interface of this database

This database software has 2 main user interfaces: the “Case Viewer” and the “Variant Viewer”.

1) “Case Viewer” (Figure 2) is an interface for efficient clinical sequencing for the diagnosis of each patient. In this interface, you can view all of the patient’s clinical information, including Sample ID, Project Name, Pedigree, and other detailed Clinical Data. In addition to the clinical information, you can view variant information after automatic filtering. This database has automatic variant filtering functions for “Protein-affecting variants,” “Low minor allele frequency among control population,” “Previously pathogenic variants,” etc. This interface is also useful for managing the “Direct sequence confirmation results,” “Family segregation results,” and the genetic “Diagnosis”. For more detailed information about the filtering process and direct sequencing result management, please refer to section 15 of this manual.

Fig. 2 Case viewer
[Screenshot of the Case Viewer; the screenshot header reads “Clinical NGS Database ver. 1.2”. Menu items: Sample list, Case viewer, Panel Info, Variant list, Variant viewer, Report maker, Annotation, Patohgenic, Control, Search, Browse, Sort, Import VCF, DB update, Export, Annotation Import, Send E-mail. Sample fields: ID (Demo10), Family_NUM (Fam001), project (DemoData), project_NUM (ShinshuMP1), Clinical diagnosis (Congenital sensorineural hearing loss), Onset_Age (0), Gender (Female), proband_ID (JHLB0001), relationship (proband), hereditary (AD/Mit, AR/Spo, X_linked, Control, Unknown), sampling_date, registration_date, center_name (Shinshu-university). Clinical data fields: Family_History, medical_history, Category_Data_1 to Category_Data_20 (YES / NO / N/A), Numerical_Data_1 to Numerical_Data_40, Imaging_Data 1 to 5. Diagnosis fields: Causative gene, Genotype, Curation_date, Curator, Diagnostic candidate, Diagnosis (OTOF homozygote), Comment of NGS analysis. Per-variant columns also include DirectSeq. (Confirmed / SeqError), Segregation (YES / NO), Previously Reported Variants Information, Clinvar Database Information and Control DB allele frequencies. Footer: All Rights Reserved 2015. Shinshu University School of Medicine Department of Otorhinolaryngology.]

NGS results (auto filtering) shown in the screenshot:

Gene symbol  Ref.Seq. ID  Exon    Base Change  AA Change  genotype  AFforGT  QD    AD  AR  CNT  X-link  UNK  Pathogenicity
ESPN         NM_031475    exon13  c.C2513A     p.A838E    het       0.548    10.3  0   1   0    0       0
USH2A        NM_206933    exon34  c.T6506A     p.I2169K   het       0.167    37.2  0   1   0    0       0
CDH23        NM_022124    exon36  c.C4762T     p.R1588W   het       0.557    10.6  0   1   0    0       0
MYH14        NM_001145    exon2   c.58_59insC  p.V20fs    het       0.500    9.1   0   1   0    0       0
MYH14        NM_001145    exon35  c.A4799G     p.N1600S   het       0.520    9.2   0   1   0    0       0
OTOF         NM_194323    exon29  c.G3515A     p.R1172Q   hom       1.000    38.0  0   7   0    0       0    AR_Pathogenic
DFNB31       NM_001083    exon6   c.G200A      p.R67H     het       0.486    8.0   0   1   0    0       0

2) “Variant Viewer” (Figure 3) is an interface for the efficient assessment of the pathogenicity of each variant. In this interface, you can view all of the information on a variant, including the IDs of the patients carrying it, the annotation information (including the computational prediction scores) in the “ANNOVAR database”, and minor allele frequency information from the 1000 genome, EVS6500 and other control data. This interface also automatically provides the average and standard deviation of the clinical information of those carrying the same variant and causative gene. For more detailed information about the variant viewer, please refer to section 16 of this manual.

Fig. 3 Variant viewer
[Screenshot of the Variant Viewer; the screenshot header reads “Clinical NGS Database ver. 1.2” and the menu is the same as in Fig. 2. Fields with the example values shown: Chr (chr2), Start (26681086), End (26681086), Ref (C), Alt (T), Func. refgene (exonic), Gene refgene (OTOF), GeneDetail (.), Exonic Func (nonsynonymo), Ref.Seq. ID (NM_194323), Exon (exon29), Base Change (c.3515G>A), AA Change (p.R1172Q), Entrez_gene_ID (9381), AAChange ("OTOF:NM_194323:exon29:c.3515G>A:p.R1172Q,OTOF:NM_001287489:exon46:c.5816G>A:p.R1939Q"), Pathogenicity (AR_Pathogenic). Other fields: Odds ratio, Curation_Date, Curator, Comment of variant interpretation, CNT_alt#]
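The per-variant summary that the Variant Viewer provides (the average and standard deviation of clinical information among patients carrying the same variant) can be sketched in Python as follows. This is an illustration only, not the software's own code: the sample IDs, the second variant key, the onset-age values and the function name `summarize_by_variant` are all hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the manual does not state which definition is used.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical carrier records: (sample ID, variant key, onset age in years).
records = [
    ("S1", "chr2_26681086_26681086_C_T", 0),
    ("S2", "chr2_26681086_26681086_C_T", 2),
    ("S3", "chr2_26681086_26681086_C_T", 4),
    ("S4", "chr1_100_100_A_G", 30),
]


def summarize_by_variant(rows):
    """Return {variant key: (number of carriers, mean, sample SD or None)}."""
    grouped = defaultdict(list)
    for _sample_id, key, value in rows:
        grouped[key].append(value)

    summary = {}
    for key, values in grouped.items():
        # The sample standard deviation is undefined for a single carrier.
        sd = stdev(values) if len(values) > 1 else None
        summary[key] = (len(values), mean(values), sd)
    return summary


summary = summarize_by_variant(records)
for key, (n, avg, sd) in summary.items():
    print(key, n, avg, sd)
```

Grouping by the variant key rather than by gene keeps the statistics specific to one allele; grouping by a gene name instead would give the per-gene summary that the manual also mentions.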