Clinical NGS Database ver1.4 User’s Manual

Produced by: Shin-ya Nishio and Shin-ichi Usami
Department of Otorhinolaryngology, Shinshu University School of Medicine
3-1-1 Asahi, Matsumoto 390-8621, Japan
Tel: +81-263-37-2666 Fax: +81-263-36-9164
E-mail: [email protected]

1. Overview

Recent advances in molecular genetics technologies, notably next-generation sequencing (NGS), have drastically accelerated the identification of novel genes involved in most inherited diseases and have expanded the mutational spectrum of disease-causing genes. These new technologies have led to significant breakthroughs in human genetics research, but have also raised new challenges in interpreting the pathogenicity of the extraordinary number of genomic variants they generate.

This database software is designed for efficient clinical NGS analysis of inherited disease by collecting data on a large number of variants together with clinical information. It is also intended to be easy to start up, easy to maintain and light on computer resources, while providing powerful analysis tools for clinical NGS.

1-1. Architecture of this database

This database software is built from 7 internal sub-databases.

1) “Core database” stores all SNV and insertion/deletion information from each patient and from in-house controls. Variant call format (.vcf) files produced by next-generation sequencing analysis are stored in this database.
2) “Sample List” stores patient and in-house control sample information. This database also includes project information as well as detailed patient clinical information. Its records are generated automatically from the “Core database”.
3) “Variant List” stores variant information, calculated allele frequencies and the averaged phenotypes of the patients carrying each variant. This database is generated automatically from the “Core database”.
4) “Gene List” stores the target gene lists for the capture panels used in targeted re-sequencing. In this version, 10 kinds of capture libraries can be registered in this database.
5) “ANNOVAR database” stores the annotation information of each variant included in the “Variant List” database. This version is compatible with annotation information produced by ANNOVAR. Please refer to the citations for more details on the ANNOVAR software.
6) “Previously reported variants database” stores previously reported variant information.
7) “Control allele frequency database” stores control variant allele frequency information from public databases. This database is compatible with VCF-formatted files.

Fig. 1 Relationship of each sub-database
(Diagram of the tables ANNOVAR, Gene_List, Sample_List, Core_DB, Valiant_List, Previously_Reporte… and Control_Allele_Freq…, joined through the “ID”, “Gene Name” and “concatenate” fields.)

Sub-databases are linked to each other as illustrated in Figure 1. “Sample List”, which manages the sample IDs and clinical data, is linked to “Core DB” by the sample “ID”. The sample “ID” must therefore be unique across all samples. “Variant List”, which manages all variant information, is linked to “Core DB”, “ANNOVAR database”, “Previously reported variants database” and “Control allele frequency database” by the “concatenate” field. The “concatenate” field is generated automatically by combining the variant information (“Chr_Start_End_Ref_Alt”). “ANNOVAR database” is linked to the “Gene List” database by “Gene Name”. Gene names must be identical between the “ANNOVAR database” and the “Gene List” database.

1-2. User interface of this database

This database software has 2 main user interfaces: the “Case Viewer” and the “Variant Viewer”.

1) “Case Viewer” (Figure 2) is an interface for efficient clinical sequencing in the diagnosis of each patient. In this interface, you can view all of the patient’s clinical information, including Sample ID, Project Name, Pedigree and other detailed Clinical Data. In addition to the clinical information, you can view the variant information that remains after automatic filtering. This database has automatic variant filtering functions for “Protein-affecting variants,” “Low minor allele frequency among control population,” “Previously pathogenic variants,” etc. This interface is also useful for managing the “Direct sequence confirmation results,” the “Family segregation results” and the genetic “Diagnosis”. For more detailed information about the filtering process and direct sequencing result management, please refer to section 15 of this manual.

Fig. 2 Case viewer
(Screenshot of the Case Viewer in Clinical NGS Database ver. 1.2 with demo data. Panels: Case_Viewer sample header (ID, Family_NUM, project, project_NUM, Clinical diagnosis, Onset_Age, Gender), Imaging_Data, Family_History, Categorical Data, Numerical_Data, Clinical information, Diagnosis (here “OTOF homozygote”), and NGS results (auto filtering) with Previously Reported Variants Information, Clinvar Database Information and Control DB columns. Screenshot footer: All Rights Reserved 2015. Shinshu University School of Medicine Department of Otorhinolaryngology.)

2) “Variant Viewer” (Figure 3) is an interface for efficiently assessing the pathogenicity of each variant. In this interface, you can get an overview of the variant information, including the IDs of the patients carrying the same variant; the annotation information, including the computational prediction scores in the “ANNOVAR database”; and minor allele frequency information from the 1000 Genomes, EVS6500 and other control data. This interface also automatically provides the average and standard deviation of the clinical information of the patients carrying the same variant and the same causative gene. For more detailed information about the Variant Viewer, please refer to section 16 of this manual.

Fig. 3 Variant viewer
(Screenshot of the Variant Viewer in Clinical NGS Database ver. 1.2 showing the OTOF variant chr2 26681086 C>T, NM_194323 exon29 c.3515G>A p.R1172Q, with Pathogenicity “AR_Pathogenic”. Fields: Chr, Start, End, Ref, Alt, Func. refgene, Gene refgene, GeneDetail, Exonic Func, Ref.Seq. ID, Exon, Base Change, AA Change, Odds ratio, Entrez_gene_ID, AAChange, Pathogenicity, Curation_Date, Curator, Comment of variant interpretation, CNT_alt#.)
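The “concatenate” field described in section 1-1 can be illustrated with a short Python sketch. This is not the database's own code: the function names `concatenate_key` and `link_by_key` and the dictionary layout are hypothetical, and the exact formatting of each component (for example, whether the chromosome carries a "chr" prefix) is an assumption based on the example variant shown in Figure 3. Only the field order and the underscore separator come from the manual.

```python
def concatenate_key(chrom, start, end, ref, alt):
    """Build a Chr_Start_End_Ref_Alt key (order and separator as in the manual)."""
    return "_".join([chrom, str(start), str(end), ref, alt])


def link_by_key(variants, table):
    """Look up each variant in another sub-database by its key.

    Variants without a matching record map to None, which is one
    hypothetical way to expose a missing link.
    """
    linked = {}
    for variant in variants:
        key = concatenate_key(*variant)
        linked[key] = table.get(key)
    return linked


# Example variant taken from the Figure 3 screenshot (OTOF, p.R1172Q).
variant_list = [("chr2", 26681086, 26681086, "C", "T")]

# Hypothetical stand-in for one record of the "ANNOVAR database".
annovar_db = {
    "chr2_26681086_26681086_C_T": {"gene": "OTOF", "aa_change": "p.R1172Q"},
}

linked = link_by_key(variant_list, annovar_db)
print(linked)
```

Because every linked sub-database is matched on the same string, a difference in how any one component is written (for example "2" versus "chr2") would be enough to prevent a match in a scheme like this one.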

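The automatic filtering mentioned for the Case Viewer can be pictured with the following Python sketch. The actual rules are described in section 15 of the manual and are not reproduced here, so everything below is an assumption made for illustration: the record layout, the function name `matched_filters`, the 0.01 frequency threshold, and the set of ANNOVAR-style labels treated as protein-affecting. The sketch only reports which of the three named criteria a variant meets and makes no claim about how the database combines them.

```python
# Illustrative ANNOVAR-style exonic function labels treated as protein-affecting.
# The set actually used by the database is not stated in this section.
PROTEIN_AFFECTING = {
    "nonsynonymous SNV",
    "stopgain",
    "stoploss",
    "frameshift insertion",
    "frameshift deletion",
}

# Hypothetical threshold for "low minor allele frequency among control population".
MAF_THRESHOLD = 0.01


def matched_filters(variant, maf_threshold=MAF_THRESHOLD):
    """Return the names of the filter criteria that a variant record satisfies."""
    hits = []
    if variant["exonic_func"] in PROTEIN_AFFECTING:
        hits.append("protein_affecting")
    if variant["control_maf"] < maf_threshold:
        hits.append("low_control_maf")
    if variant["previously_reported_pathogenic"]:
        hits.append("previously_pathogenic")
    return hits


# Hypothetical records; the values are invented for the example.
rare_missense = {
    "gene": "OTOF",
    "exonic_func": "nonsynonymous SNV",
    "control_maf": 0.0,
    "previously_reported_pathogenic": True,
}
common_synonymous = {
    "gene": "GENE_X",
    "exonic_func": "synonymous SNV",
    "control_maf": 0.35,
    "previously_reported_pathogenic": False,
}

print(matched_filters(rare_missense))
print(matched_filters(common_synonymous))
```

Keeping each criterion as a separate check makes it easy to change one threshold or label set without touching the others.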