Clinical_NGS_Database_ver1.4 User’s Manual

Produced by: Shin-ya Nishio and Shin-ichi Usami

Department of Otorhinolaryngology, Shinshu University School of Medicine 3-1-1 Asahi, Matsumoto 390-8621, Japan Tel: +81-263-37-2666 Fax: +81-263-36-9164 E-mail: [email protected]

1. Overview

Recent advances in molecular genetics technologies, notably next-generation sequencing (NGS), have drastically accelerated the identification of novel genes involved in most inherited diseases and expanded the mutational spectrum of disease-causing genes. These new technologies have led to significant breakthroughs in human genetics research, but have also raised new challenges in interpreting the pathogenicity of the extraordinary number of genomic variants they generate. This database software is designed for efficient clinical next-generation sequencing analysis of inherited disease by collecting data on a large number of variants together with clinical information. It is also intended to be easy to start up and maintain and to place a low load on computer resources, while providing powerful analysis tools for clinical next-generation sequencing.

1-1. Architecture of this database
This database software consists of 7 internal sub-databases.

1) “Core database” stores all SNV and insertion/deletion information from each patient and from in-house controls. Variant call format (.vcf) files produced by next-generation sequencing analysis are stored in this database.
2) “Sample List” stores patient and in-house control sample information. It also includes project information as well as detailed patient clinical information. Records in this database are automatically generated from the “Core database”.
3) “Variant List” stores variant information, calculated allele frequencies, and averaged phenotypes of the patients. This database is automatically generated from the “Core database”.
4) “Gene List” stores the target gene lists for the capture panels used in target re-sequencing. In this version, 10 kinds of capture libraries can be registered in this database.
5) “ANNOVAR database” stores the annotation information of each variant included in the “Variant List” database. This version is compatible with annotation information produced by ANNOVAR. Please refer to the citations for more details on the ANNOVAR software.
6) “Previously reported variants database” stores previously reported variant information.
7) “Control allele frequency database” stores control variant allele frequency information from public databases. This database accepts VCF-formatted files.

Fig. 1 Relationship of each sub-database

[Figure 1: diagram of the sub-databases and their linking fields. Sample_List and Core_DB are joined on “ID”; Core_DB, Variant_List, ANNOVAR, the previously reported variants database, and the control allele frequency database are joined on “concatenate”; ANNOVAR and Gene_List are joined on the gene name.]

Sub-databases are linked to each other as illustrated in Figure 1. “Sample List”, which manages the sample IDs and clinical data, is linked to “Core DB” by the sample “ID”. Thus, the sample “ID” must be unique across all samples.

“Variant List”, which manages all variant information, is linked to “Core DB”, “ANNOVAR database”, “Previously reported variants database”, and “Control allele frequency database” by the “concatenate” field. The “concatenate” field is automatically generated by combining the variant information (“Chr_Start_End_Ref_Alt”).

“ANNOVAR database” is linked to the “Gene List” database by “Gene Name”. Gene names must be identical between the “ANNOVAR database” and the “Gene List” database.
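To make the role of the “concatenate” field concrete, the following is a minimal Python sketch of how such a key could be built and used as a join key outside the database. The function name `make_concatenate_key` and the underscore separator are assumptions read off the “Chr_Start_End_Ref_Alt” pattern; this is not the FileMaker calculation itself.

```python
def make_concatenate_key(chrom, start, end, ref, alt, sep="_"):
    """Join the five variant coordinates into one lookup key (hypothetical helper)."""
    return sep.join(str(part) for part in (chrom, start, end, ref, alt))


# Two tables keyed by the same string can be joined with a plain dict lookup.
annotation = {
    make_concatenate_key("chr2", 26681086, 26681086, "C", "T"): {"gene": "OTOF"},
}

key = make_concatenate_key("chr2", 26681086, 26681086, "C", "T")
print(key)
print(annotation[key]["gene"])
```

Because the key is a plain string, any difference in chromosome naming (for example “chr2” versus “2”) between two sources would prevent a match, so the same convention has to be used everywhere.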

1-2. User interface of this database
This database software has 2 main user interfaces: the “Case Viewer” and the “Variant Viewer”.

1) “Case Viewer” (Figure 2) is an interface for efficient clinical sequencing-based diagnosis of each patient. In this interface, you can view all of the patient’s clinical information, including Sample ID, Project Name, Pedigree, and other detailed Clinical Data. In addition to the clinical information, you can view the variant information that remains after automatic filtering. The automatic variant filters include “Protein-affecting variants,” “Low minor allele frequency among control population,” “Previously pathogenic variants,” etc. This interface is also useful for managing the “Direct sequence confirmation results,” “Family segregation results,” and the genetic “Diagnosis”. For more detailed information about the filtering process and direct sequencing result management, please refer to section 15 of this manual.

Fig. 2 Case viewer

[Screenshot of the Case Viewer layout, titled “Clinical NGS Database ver. 1.2”. Panels: sample and project information (ID, Family_NUM, project, clinical diagnosis, onset age, gender, hereditary pattern); clinical information (Category_Data_1 to 20, Numerical_Data_1 to 40, Imaging_Data 1 to 5, family history, medical history); diagnosis (causative gene, genotype, curation date, curator, comment of NGS analysis); and NGS results after automatic filtering, shown with previously reported variant, ClinVar, and control database information for each variant.]

All Rights Reserved 2015. Shinshu University School of Medicine Department of Otorhinolaryngology.

2) “Variant Viewer” (Figure 3) is an interface for efficient assessment of the pathogenicity of each variant. In this interface, you can get an overview of the variant information, including the IDs of patients carrying the same variant, the annotation information and in silico prediction scores stored in the “ANNOVAR database”, and the minor allele frequencies in the 1000 Genomes Project, EVS6500, and other control data. This interface also automatically provides the average and standard deviation of the clinical information of patients carrying the same variant and of patients with the same causative gene. For more detailed information about the Variant Viewer, please refer to section 16 of this manual.

Fig. 3 Variant viewer

[Screenshot of the Variant Viewer layout, titled “Clinical NGS Database ver. 1.2”. Panels: variant position and annotation (Chr, Start, End, Ref, Alt, gene, exon, base change, amino acid change); odds ratios with 95% confidence intervals and p-values; variant interpretation with the ACMG variant classification criteria (PVS1, PS1 to PS4, PM1 to PM6, PP1 to PP5, BA1, BS1 to BS4, BP1 to BP7); list of patients carrying the same mutation; allele frequencies in public and in-house databases; in silico prediction scores and ClinVar status; pathogenicity data from the Previously Reported Variant Database; averaged clinical data of the variant carriers; and averaged clinical data of patients whose disease is caused by mutations in the same gene.]


2. Prior to start up

2-1. Required software and computer specs
This database software is built with FileMaker Pro, database software produced by FileMaker Inc. Please install FileMaker Pro ver. 12 or later on your computer. The software places a low load on the computer; the minimum requirements are one or more x86 or x64 processors at 1.2 GHz or faster, 4 GB or more of memory, and 8 GB or more of hard disk space. We confirmed that it runs comfortably on a MacBook Air with a 1.6 GHz Intel Core i5 processor, 8 GB of memory, and a 256 GB SSD* when holding target re-sequencing data for 2,000 or more patients.

* We recommend an SSD (solid-state drive) for speedy database access.

2-2. Supported next-generation sequencers and output files
This database software is intended to support clinical next-generation sequencing analysis on a personal computer and is aimed mainly at wet-lab researchers and clinicians. We designed it to collect only variant call data in order to reduce the database size and allow speedier browsing. VCF is the most widely used format for variant calls in next-generation sequencing analysis and is the format adopted by the 1000 Genomes Project (Figure 4). Please refer to the 1000 Genomes Project website for more detailed information about the VCF format.

1000 Genomes Project: http://www.1000genomes.org/

This database software is compatible with both Illumina and ThermoFisher sequencers. VCF files imported into this database software should consist of the 8 standard fixed columns plus 2 genotype field columns.

1. #CHROM
2. POS
3. ID
4. REF
5. ALT
6. QUAL
7. FILTER
8. INFO
9. FORMAT (genotype field keys)
10. Sample column (genotype field values)

Fig. 4 Example of VCF4 format data
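Before importing a file, it can be useful to confirm that every data line really has the ten columns listed above. The following Python sketch shows one way to do this; the function name, the column label `SAMPLE` for the tenth column, and the example record are illustrative assumptions, not part of the database software.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT",
               "QUAL", "FILTER", "INFO", "FORMAT", "SAMPLE"]


def split_vcf_record(line):
    """Split one tab-separated VCF data line into a dict of the 10 expected columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(VCF_COLUMNS):
        raise ValueError(f"expected {len(VCF_COLUMNS)} columns, found {len(fields)}")
    return dict(zip(VCF_COLUMNS, fields))


# Made-up single-sample record, used only to exercise the function.
record = split_vcf_record("chr2\t26681086\t.\tC\tT\t38.0\tPASS\tDP=432\tGT:GQ:DP\t1/1:99:432\n")
print(record["CHROM"], record["POS"], record["REF"], record["ALT"])
```

A multi-sample VCF file has more than ten columns and would be rejected by this check, which matches the single-sample layout described above.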

Any variant caller that produces VCF output is supported. However, if you want to store genotyping accuracy information (such as depth of coverage) in the database, the genotype fields defined in the FORMAT column following the INFO column should be as listed below.

samtools standard output
GT:GQ:DP:HQ

GATK UnifiedGenotyper standard output
GT:AD:DP:GQ:PL

Avadis NGS standard output
GT:AD:DP:GQ:PL

Torrent Suite software Variant Caller plugin output
ver 4.2 or higher: GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR
ver 4.0 or higher: GT:GQ:DP:FDP:RO:FRO:AO:FAO:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR
ver 2.0 or higher: GT:GQ:GL:DP:FDP:AD:APSD:AST:ABQV
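The FORMAT strings above are lists of keys; the values for each key sit in the same order in the sample column. As a rough illustration of why the key order matters, this Python sketch pairs the two columns so that depth (DP) and genotype quality (GQ) can be looked up by name. The function name and the sample values are made up for the example.

```python
def parse_genotype_field(format_keys, sample_values):
    """Pair the colon-separated FORMAT keys with the sample column's values."""
    keys = format_keys.split(":")
    values = sample_values.split(":")
    # zip stops at the shorter list, so dropped trailing values are simply absent.
    return dict(zip(keys, values))


gatk = parse_genotype_field("GT:AD:DP:GQ:PL", "0/1:12,10:22:99:255,0,255")
samtools = parse_genotype_field("GT:GQ:DP:HQ", "0/1:48:30:51,51")

print(gatk["GT"], gatk["DP"], gatk["GQ"])
print(samtools["GT"], samtools["DP"], samtools["GQ"])
```

Looking values up by key rather than by position gives the same result for both callers even though DP and GQ appear in a different order.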

3. Let’s get started!

3-1. Double click the database icon to start up
Double click the database icon to start up. After starting up, please enter the user name and password below.

User Name: admin
Password: admin

NOTE: This user name and password are provisional and intended only for the first login. We strongly recommend changing them after logging in. User accounts can be modified from the menu bar: “File” > “Manage” > “Security”. You should have one or more “Full Access” privilege accounts for administrators and, if necessary, “Data Entry Only” and/or “Read-Only Access” privilege accounts.

3-2. Try to change the layout
After opening the database, you can see the layout change icons at the top of the database window. Clicking an icon switches the database to the corresponding layout (Figure 5). For demonstration purposes, this database contains an example data set.

Fig. 5 Layout change icon set

Brief explanation of each layout in this database:

1) “Sample list” is a layout for managing patient and in-house control sample information in list format. This layout enables easy and speedy management of samples and project information.
2) “Case viewer” is an interface for efficient clinical genetic diagnosis of each patient. In this interface, you can get an overview of the patient’s clinical information and the next-generation sequencing results after automatic filtering.
3) “Panel Info” is a layout for managing the target panel information in list format.
4) “Variant list” is a layout for managing the variants identified among the patients and in-house controls in list format. This layout is used as intermediate data for annotation.
5) “Variant viewer” is an interface for efficient assessment of the pathogenicity of each variant. In this interface, you can get an overview of the variant information, including patient ID, annotation, in silico prediction scores, and minor allele frequency in public database control populations. This interface also provides the average and standard deviation of the clinical information of patients carrying the variant.
6) “Report maker” is a layout for printing the next-generation DNA sequencing analysis report for each case.
7) “Annotation” is a layout for storing the annotation of each variant, including Gene Name, Base Change, Amino Acid Change, minor allele frequency in control populations available in public databases, clinical evaluation information in the ClinVar database, and in silico prediction scores.
8) “Pathogenic” is a layout for storing previously reported variant information, including allele frequency and pathogenicity. HGMD is a very useful data source for such information.*
9) “Control” is a layout for storing minor allele frequency information in the control population. HGVD is a useful data source for this layout, especially for the East Asian population.**

ClinVar: http://www.ncbi.nlm.nih.gov/clinvar/
HGMD: http://www.hgmd.cf.ac.uk/ac/index.php
HGVD: http://www.genome.med.kyoto-u.ac.jp/SnpDB/

* Previously reported variant information included in the ClinVar database is stored in the “Annotation” layout.

** Minor allele frequency information from the 1000 Genomes Project, the EVS 6500 project, and the ExAC project is stored in the “Annotation” layout.

4. Erase all demonstration data

4-1. Erase all demonstration data
This database software ships with demonstration data for evaluating usability. Before constructing your own database, we recommend that you remove all demonstration data.

4-2. Move to Core DB
This database software has 7 internal sub-databases. Most of them can be reached with the layout change buttons explained in section 3-2, but “Core DB” is not linked to any of these buttons. To move to “Core DB”, pull down the layout pop-up menu in the status toolbar and select “Core DB”.

4-3. Delete all data from Core DB
First, select all data stored in “Core DB” by choosing “Record” > “Show All Records” from the menu bar. Then choose “Record” > “Delete All Record” to delete all demonstration data included in “Core DB”.

4-4. Delete all data from other sub-databases
As in steps 4-2 and 4-3, change the layout and delete all demonstration data from each of the sub-databases listed below.

- Sample List
- Variant List
- Gene List
- ANNOVAR

5. Define target panel information

5-1. Add genes into the database
This database software can manage information for 10 kinds of target gene sets (panels). First, please enter the target gene information in the “Gene List” database. To add a gene to the “Gene List” database, click on the “New Record” button in the status toolbar and enter the required information about each gene in your target panel.

5-2. Input target panel gene list into the database
Each gene has 15 fields, as indicated in Figure 6. Among the 15 fields, the “Gene Name” and “Ref_NM” fields are required. The “Ref_NM” field indicates the RefSeq accession number that is preferentially displayed when the gene has many transcript variants. To manage the genes included in each panel, please enter “1” (included) or “0” (not included) in the “Panel_1” to “Panel_10” fields.

Fig. 6 Gene List
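As an illustration of the 1/0 panel flags described in section 5-2, the Python sketch below builds one gene record with “Panel_1” to “Panel_10” membership flags. Only the field names `Gene_Name`, `Ref_NM`, and `Panel_1` to `Panel_10` come from this manual; the function itself and the choice of panels are hypothetical, and the remaining Gene List fields are omitted.

```python
def make_gene_record(gene_name, ref_nm, panels, n_panels=10):
    """Build one Gene List row with 1/0 membership flags for Panel_1..Panel_n."""
    if not gene_name or not ref_nm:
        raise ValueError("Gene_Name and Ref_NM are both required")
    record = {"Gene_Name": gene_name, "Ref_NM": ref_nm}
    for number in range(1, n_panels + 1):
        # 1 means the gene is included in the panel, 0 means it is not.
        record[f"Panel_{number}"] = 1 if number in panels else 0
    return record


# Example: a gene that belongs to panels 1 and 3 only (panel choice is made up).
row = make_gene_record("OTOF", "NM_194323", panels={1, 3})
print(row)
```

Rows built this way could be written to a spreadsheet and imported in bulk instead of being typed one by one.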

Important: The “Gene_Name” and “Ref_NM” fields mainly support RefSeq gene information. However, the database is also compatible with genes from other databases (UCSC, ENSEMBL, or GENCODE). In this case, you should use the same gene name and accession number throughout the whole database, including sub-databases such as ANNOVAR, Previously Reported Pathogenic Variants, etc.

Tips: Instead of entering gene information one by one, you can import it from a template Excel file. After modifying the template Excel file, named “GeneList.xlsx”, please select “File” > “Import Records” > “File” from the file menu of this database and select the modified “GeneList.xlsx” file. Prior to clicking the “Import” button, please verify the field order. For more detailed information on the data import procedure, please refer to the FileMaker Pro User’s Guide available on their website. (https://fmhelp.filemaker.com/docs/14/en/fmp14_users_guide.pdf)

Tips: The “Ref_NM” field indicates the RefSeq accession number that is preferentially displayed when the gene has many alternative splicing variants. We recommend using the reference from the consensus CDS project. (https://www.ncbi.nlm.nih.gov/projects/CCDS/CcdsBrowse.cgi)

6. Import vcf format files into the Core DB

6-1. File name extension code modification
As the second step in constructing your database, please import the variant call format files (vcf files) obtained from next-generation DNA sequencing analysis. FileMaker Pro supports Excel, tab-separated, and csv files for import but, unfortunately, does not support vcf files. Please therefore change the file name extension of each vcf file from “.vcf” to “.tab” (Figure 7).

Fig. 7 File name extension code modification
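When many files need the extension change described in section 6-1, it can be scripted. The following Python sketch copies a vcf file to a file with the same name and a “.tab” extension, leaving the original untouched; the function name is hypothetical, and copying instead of renaming is a design choice made here, not something the manual requires.

```python
import shutil
from pathlib import Path


def copy_vcf_as_tab(vcf_path):
    """Copy sample.vcf to sample.tab in the same folder and return the new path."""
    src = Path(vcf_path)
    if src.suffix.lower() != ".vcf":
        raise ValueError(f"not a .vcf file: {src}")
    dst = src.with_suffix(".tab")
    shutil.copyfile(src, dst)  # contents are unchanged; only the extension differs
    return dst
```

For a folder of files, the function could be called in a loop, for example `for vcf in Path("vcf_files").glob("*.vcf"): copy_vcf_as_tab(vcf)`, where `vcf_files` is a placeholder folder name.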

6-2. VCF import manager
After modifying the extension, please click on the “Import VCF” icon. The “VCF_Import_Manager” is then displayed (Figure 8).

Fig. 8 VCF_Import Manager

6-3. Input sample information
Following the instructions, please enter the sample ID, project, and sequencing platform (Figure 9). Please select the target set from the pull-down menu. This target set information is linked to the target panel data (please refer to section 5).

Important: “Sample_ID” will be used as the link anchor between the “Core DB” and the other sub-databases. Thus, the “Sample_ID” must be unique across all samples.

Tips: We recommend that you import as many vcf files as possible in this step. In the next step, the database software automatically creates the records of the “Sample list” and “Variant list” sub-databases. This process is automatic but time-consuming. To import the next vcf file, click the “VCF_Import_Manager” icon again and import the data in the same manner as in steps 6-1 to 6-4.

Fig. 9 Sample information section of the VCF_Import_Manager

6-4. Import header removed VCF file into the database
After entering the sample information, click on the “Import tab formatted vcf file” button and select the vcf file whose extension you changed (the .tab file).

After clicking the import button, the Import Options window will be displayed. Please confirm that “auto-enter options” is checked and click on the Import button (Figure 10).

Fig. 10 Import Options window

After importing the vcf file, an error alert as shown in Figure 11 may be displayed a few times. Please click on the “Continue” button (Figure 11).

Fig. 11 Post vcf import alert

7. Update the Sample List and Variant List

7-1. Update the Sample List and Variant List
In this step, the database software automatically creates the records of the “Sample list” and “Variant list” sub-databases. This process takes a long time; depending on the total amount of data, it may require 2-3 hours. Click on the “DB update” button and wait patiently (Figure 12).

Fig. 12 DB update icon

8. Export variants list for annotation

8-1. Export variants list for annotation
After the database update, export the variant list for annotation by clicking on the “Export” button. The exported variant list is a tab-separated file in ANNOVAR format (Figure 13).

Fig. 13 Export icon

8-2. Confirm exported ANNOVAR format file (optional)
After the above process, a tab-separated text file named “ANNOVAR.txt” is generated on your computer. Please confirm the field order. The ANNOVAR format has 6 columns (Chr, Start, End, Ref, Alt, and Other Info) (Figure 14).

Fig. 14 ANNOVAR formatted variants list
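The optional check in section 8-2 can also be done programmatically. The Python sketch below flags lines that do not look like ANNOVAR input; it assumes a tab-separated file in which the first five columns (Chr, Start, End, Ref, Alt) are mandatory and anything after them is optional extra information. The function name and the example lines are illustrative.

```python
def check_annovar_lines(lines, min_columns=5):
    """Return (line_number, reason) tuples for lines that do not look like ANNOVAR input."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < min_columns:
            problems.append((number, f"only {len(fields)} tab-separated columns"))
        elif not (fields[1].isdigit() and fields[2].isdigit()):
            problems.append((number, "Start/End are not integers"))
        elif int(fields[1]) > int(fields[2]):
            problems.append((number, "Start is greater than End"))
    return problems


example = [
    "chr2\t26681086\t26681086\tC\tT\tOTOF\n",  # well-formed line
    "chr2\t26681086\tC\tT\n",                   # End column is missing
]
for number, reason in check_annovar_lines(example):
    print(f"line {number}: {reason}")
```

To check the real export, the lines could be read with `open("ANNOVAR.txt")` and passed to the same function.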

9. Annotate variants

9-1. Annotate variants
This database software stores variant annotation information in a separate sub-database so that it can be updated easily. The annotation sub-database is designed to be compatible with the ANNOVAR software. We recommend the web-served version of ANNOVAR (wANNOVAR) for ease of use.

wANNOVAR: http://wannovar.usc.edu

Reference: Chang X, Wang K. wANNOVAR: annotating genetic variants for personal genomes via the web. J Med Genet. 2012 Jun 20

9-2. Input basic information
Enter your e-mail address and a Sample Identifier on the wANNOVAR website. Open the ANNOVAR.txt file generated in section 8 with a text editor, copy its entire contents, and paste them into the Paste Variant Calls field of the wANNOVAR website (Figure 15).

Fig. 15 wANNOVAR web site

9-3. Input parameter settings
The ANNOVAR.txt file is in ANNOVAR format, so please change the Input Format in the parameter settings from VCF to ANNOVAR (Figure 16), then click on the Submit button.

Fig. 16 wANNOVAR web site parameter settings

9-4. Download variant annotation information
When variant annotation has finished, you will receive an e-mail notifying you that the annotation process is complete. Please download the genome annotation file in csv format to your desktop (Figure 17).

Fig. 17 wANNOVAR results

Tips: Instead of wANNOVAR, it is possible to create the annotation information with a local installation of ANNOVAR. In this case, you should convert the linefeed code (newline character) from CR+LF (Windows) or CR (Macintosh) to LF (the Linux linefeed code).
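The linefeed conversion mentioned in the Tips can be done with most text editors; as an alternative, here is a minimal Python sketch. The function names are hypothetical, and the file is handled as raw bytes so that no other character is altered.

```python
def to_lf(data: bytes) -> bytes:
    """Convert CR+LF (Windows) and bare CR (classic Macintosh) line endings to LF."""
    # Replace CR+LF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file_to_lf(src_path, dst_path):
    """Read src_path as bytes, normalize line endings, and write the result to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_lf(data))
```

For example, `convert_file_to_lf("ANNOVAR.txt", "ANNOVAR_lf.txt")` would write a converted copy next to the exported file; the output file name is a placeholder.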

10. Annotate variants

10-1. Import variant annotation information into the database
To import variant annotation into the database, please click on the “Annotation Import” button and select the annotation file (Figure 18).

Fig. 18 Annotation Import icon

Prior to importing the variant annotation, please confirm the import order. The left column shows the annotation information downloaded from the wANNOVAR web site. The right column shows the database fields into which the data are imported. Between the two columns, a bidirectional arrow marks the matching field, a unidirectional arrow marks a field whose data will be imported, and a dot marks a field that will not be imported (Figure 19).

Fig. 19 Annotation Import field order

After confirming the import order, please check the Import action section of the window. The import action should be “Update matching records in found set”; also confirm that the 2 check boxes under the Import action section are checked (Figure 20). Then click on the Import button. *1

Fig. 20 Import action section

After clicking the import button, the Import Options window will be displayed. Please confirm that “auto-enter option” is checked and click on the Import button (Figure 21).

Fig. 21 Import Options window

*1 First time only: when importing annotation information into the database for the first time, modify the import settings as shown in Figure 22 (click a bidirectional arrow to change it to a unidirectional arrow).

Fig. 22 First time only settings

11. Import previously reported variant information (Optional)

11-1. Add previously reported variant information (Optional)
This database software can manage previously reported pathogenic variant information. Please enter the previously reported variant information in the “Previously_Reported_Variants_Database”. Change the layout by clicking on the “Pathogenic” button, then click on the “New Record” button in the status toolbar and enter the required information about each variant, including its pathogenicity and associated disease.

Fig. 23 “Previously_Reported_Variants_Database” icon

Brief explanation of the fields included in this sub-database:

Chr: Chromosome number in ANNOVAR format

Start: Base position number in ANNOVAR format

End: Base position number in ANNOVAR format

Ref: Reference base in ANNOVAR format

Alt: Alternative base in ANNOVAR format

gene: Gene symbol (Optional)

aa_change: Amino acid change (Optional)

base_change: Base change (Optional)

valiantLocate: Locus of variant (Optional)

pathogenicity: Pathogenicity information for each variant. This should be one of “Pathogenic”, “Likely pathogenic”, “VUS”, “Likely benign” or “Benign”

disease: Disease or syndrome name

pmid: PubMed ID (Optional)

AlleleFreq1: Allele frequency in one population (Optional)

AlleleFreq2: Allele frequency in another population (Optional)

Tips: The pathogenicity field information in this sub-database is used as one of the variant filters in the “Case Viewer”. In cases where the pathogenicity field is “Pathogenic” or “Likely pathogenic”, the variant is preferentially displayed in the “Case Viewer” independently of the other filtering parameters (Func, Exonic Func, 1000G, EVS 6500, etc.). In cases where the pathogenicity field is “VUS”, “Likely benign” or “Benign”, the variant is further selected by the other filtering parameters.

Tips: Instead of entering records one by one, you can import them from a template Excel file. After modifying the template file, named “Pathogenic.xlsx”, please select “File” > “Import Records” > “File” from the menu of this sub-database and select the modified “Pathogenic.xlsx” file. Prior to clicking the “Import” button, please verify the field order for import. For more details about the data import procedure, please refer to the FileMaker Pro User’s Guide available on their website. (https://fmhelp.filemaker.com/docs/14/en/fmp14_users_guide.pdf)

NOTE: When you import data from the “Pathogenic.xlsx” file, each value in the pathogenic column of the Excel file should be one of “Pathogenic”, “Likely pathogenic”, “VUS”, “Likely benign” or “Benign” so that the filtering mentioned above works.
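Before importing, it may help to check that every pathogenicity value is one of the five accepted terms. The following is a minimal sketch in Python, not part of the database software. It assumes the template has been exported to CSV and that the column is named pathogenicity; the CSV export, the column name, the function name and the example rows are all illustrative assumptions.

```python
import csv
import io

# The five values accepted by the pathogenicity filter, as listed in this manual.
ALLOWED = {"Pathogenic", "Likely pathogenic", "VUS", "Likely benign", "Benign"}


def find_invalid_pathogenicity(csv_text, column="pathogenicity"):
    """Return (row_number, value) pairs whose value is not an accepted term.

    Row numbers count data rows from 1 (the header row is not counted).
    The column name is an assumption; adjust it to match your file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    for row_number, row in enumerate(reader, start=1):
        value = (row.get(column) or "").strip()
        if value not in ALLOWED:
            problems.append((row_number, value))
    return problems


# Hypothetical example rows (not real variant data).
example = (
    "Chr,Start,End,Ref,Alt,pathogenicity\n"
    "chr1,100,100,A,G,Pathogenic\n"
    "chr1,200,200,C,T,likely pathogenic\n"
    "chr2,300,300,G,A,VUS\n"
)
print(find_invalid_pathogenicity(example))  # [(2, 'likely pathogenic')]
```

The comparison is deliberately case-sensitive. Whether the database itself treats “likely pathogenic” and “Likely pathogenic” as the same value is not stated in this manual, so the strict check is the conservative choice.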

12. Import control allele frequency information (Optional)

12-1. Import control allele frequency information (Optional) This database software is able to manage variant allele frequency information in the control population if necessary. Please input the allele frequency information in the “Control_Allele_Frequency_Database”. Change the layout by clicking on the “Control” button, then click on the “New Record” button in the status tool bar, and input the required information about each variant and its allele frequency information.

Fig. 24 “Control_Allele_Frequency_Database” icon

Brief explanation of the fields included in this sub-database:

Chr: Chromosome number in VCF format

position: Base position number in VCF format

Ref: Reference base in VCF format

Alt: Alternative base in VCF format

Sample#: Number of samples in the control dataset (Optional)

AveDepth: Average depth of coverage (Optional)

Ref#: Allele number of reference base (Optional)

Alt#: Allele number of alternative base (Optional)

Gene: Gene symbol

Freq: Allele frequency among the control population
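This manual does not spell out how Freq relates to the count fields. As a minimal sketch in Python, assuming that Freq is the alternative allele count divided by the total allele count, Alt# / (Ref# + Alt#), the values can be cross-checked before import. The function name is illustrative. The counts are those displayed for CNT_DB in the Variant viewer example of this manual (Sample# 729, Ref# 1454, Alt# 4, Freq .002743484), which are consistent with this assumption.

```python
def allele_frequency(ref_count, alt_count):
    """Alternative-allele frequency from reference and alternative allele counts."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no alleles counted")
    return alt_count / total


# Counts shown for CNT_DB in the Variant viewer example (Figure 32).
freq = allele_frequency(ref_count=1454, alt_count=4)
print(f"{freq:.9f}")  # 0.002743484
```

In the same example, Ref# + Alt# equals twice Sample# (2 x 729 = 1458), as expected for diploid autosomal calls with no missing genotypes.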

Tips: Instead of entering records one by one, you can import them from a template Excel file. After modifying the template file, named “Control.xlsx”, please select “File” > “Import Records” > “File” from the menu of this sub-database and select the modified “Control.xlsx” file. Prior to clicking on the “Import” button, please verify the field order for import. For more details on the data import procedure, please refer to the FileMaker Pro User’s Guide available on their website. (https://fmhelp.filemaker.com/docs/14/en/fmp14_users_guide.pdf)

NOTE: 1000 Genomes, EVS 6500, and ExAC project data are stored as a part of the ANNOVAR annotations. In-house control data can be stored in the “Core Database” or in this “Control Allele Frequency Database”.

13. Input clinical information for each sample

13-1. Input clinical information for each sample This database software is also able to manage detailed patient clinical information. In this version of the database, pedigree, inheritance mode, onset age, gender, medical history, 20 kinds of categorical data, 40 kinds of numerical data and 9 images are supported in the default settings. If you need more, please add the field as explained in the FileMaker Pro User’s Guide available on their website. (https://fmhelp.filemaker.com/docs/14/en/fmp14_users_guide.pdf)

13-2. Change layout to “Case viewer” and modify layout As a first step, please change the layout by clicking on the “Case viewer” button. The Case viewer is one of the main user interfaces for efficient clinical NGS analysis provided by this database.

Fig. 25 Case viewer icon

After changing the layout to “Case viewer”, you can see many open fields and check boxes for patient clinical information. By default, the categorical data fields and numerical data fields are simply numbered in order, so please rename the fields as appropriate.


Fig. 26 Case viewer

(Screenshot of the Case viewer layout, Clinical NGS Database ver. 1.2, showing the demo sample Demo10: the sample information fields, Imaging_Data 1 to 5, Family_History, Category_Data_1 to 20 with YES/NO/N/A check boxes, Numerical_Data_1 to 40, the genetic diagnosis fields, and the auto-filtered NGS results list.)

All Rights Reserved 2015. Shinshu University School of Medicine Department of Otorhinolaryngology.

To modify a field name, change from Browse mode to Layout mode by clicking the mode pop-up menu at the bottom of the window and selecting “Layout” (Figure 27).

Fig. 27 Mode change pop up

In Layout mode, you can change the displayed field name by double-clicking it and typing a new field name (Figure 28). After finishing the field name modification, please save the layout changes and return to Browse mode via the mode pop-up menu.

Fig. 28 Modify the displayed field name

13-3. Input clinical information of each sample After modifying the layout, please input the patient’s clinical information. All of the numerical and categorical data are used in summary fields, which automatically calculate the averaged data of patients carrying the same mutation for display in the “Variant viewer”.
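To make the idea of a variant-level average concrete, the following is a minimal sketch in Python. It is not the database's own implementation, which relies on FileMaker summary fields. It assumes that the value shown after “±” in the Variant viewer is a standard deviation, which this manual does not state. The record layout, the function name and the patient values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def average_by_variant(records, field):
    """Group patient records by variant and return {variant: (mean, sd, n)}.

    Records with a missing (None) value for `field` are skipped.
    The SD is the sample standard deviation, or None when n < 2.
    """
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(field)
        if value is not None:
            groups[rec["variant"]].append(value)
    summary = {}
    for variant, values in groups.items():
        sd = stdev(values) if len(values) > 1 else None
        summary[variant] = (mean(values), sd, len(values))
    return summary


# Hypothetical patients and values, for illustration only.
patients = [
    {"id": "P1", "variant": "GENE1:c.100A>G", "Numerical_Data_1": 40},
    {"id": "P2", "variant": "GENE1:c.100A>G", "Numerical_Data_1": 60},
    {"id": "P3", "variant": "GENE1:c.100A>G", "Numerical_Data_1": None},
    {"id": "P4", "variant": "GENE2:c.200C>T", "Numerical_Data_1": 55},
]
summary = average_by_variant(patients, "Numerical_Data_1")
for variant, (avg, sd, n) in summary.items():
    print(variant, avg, sd, n)
```

Skipping empty values rather than counting them as zero matters here: a field left blank for one patient would otherwise pull the average down for every carrier of that variant.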

14. Case viewer

14-1. Browse case viewer “Case Viewer” is an interface for obtaining a whole view of patient clinical information and NGS data in one window (Figure 29). This interface is useful for efficient clinical diagnosis as well as the systematic assessment of NGS results.

Fig. 29 Case viewer


Brief explanation of the Case viewer:

(1) Sample information section is for basic sample information including the sample ID, clinical diagnosis, onset age, gender, inheritance and other information.

(2) Imaging data section is for storing the sample pedigree, one set of imaging data and the medical history of the patient.

(3) Detailed clinical information section is for storing the detailed clinical information of the patient. The categorical data and numerical data in this section are used for calculating the averaged phenotype in the “Variant viewer”.

(4) Genetic diagnosis section is for storing the candidate genetic diagnosis and final genetic diagnosis of the patient. The final genetic diagnosis in this section is used in the “Report maker”. The final genetic diagnosis in this section is also used for calculating the gene-based averaged phenotype in the “Variant viewer”.

(5) NGS result section is the interface for managing the variant information after automatic filtering. Please refer to section 15 of this manual for filtering parameters. The Variant view button in this section links to the “Variant viewer” to obtain all gathered information on the variant.

15. About filtering

15-1. Filtering parameters This database software has an auto-filtering function for the identified variant. In the default setting, the below order and parameters are used for filtering.

(1) In cases where the pathogenicity interpretation in “Variant viewer” is “AD_pathogenic,” “AR_pathogenic,” or “XL_pathogenic,” the variant is not filtered regardless of the type of variant or minor allele frequencies.

(2) If the pathogenicity classification in the “Previously Reported Variant Database” is “Pathogenic” or “Likely pathogenic,” the variant is not filtered regardless of the type of variant or minor allele frequencies.

(3) If the pathogenicity classification in the ClinVar status in “ANNOVAR database” is “Pathogenic,” the variant is not filtered regardless of the type of variant or minor allele frequencies.

(4) When the variant is located in the intronic region, intergenic, 5’ UTR or 3’ UTR region, the variant is filtered out.*

(5) When the variant is located in the exonic region, synonymous variants are filtered out.*

(6) A variant with a minor allele frequency over 1% in the 1000 Genomes, EVS 6500 or ExAC database is filtered out.*

(7) A variant with a minor allele frequency over 1% in the “Control_Allele_Frequency_Database” is filtered out.*

(8) The in silico prediction score is not used for filtering.*

(9) The QD score is used for filtering.*
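The order of the rules above can be sketched as a single function. This is a minimal illustration in Python, not the database's own implementation. The dictionary keys are hypothetical, the region and function labels ("intronic", "UTR5", "synonymous SNV", etc.) are assumed to follow ANNOVAR-style output, and the QD cut-off is left as a parameter because this manual does not state its default value.

```python
PATHOGENIC_INTERPRETATIONS = {"ad_pathogenic", "ar_pathogenic", "xl_pathogenic"}


def passes_auto_filter(v, maf_threshold=0.01, min_qd=None):
    """Return True if the variant is kept (displayed), False if filtered out.

    `v` is a dict with hypothetical keys; missing frequencies are None.
    `min_qd` is a placeholder: the manual does not state the QD cut-off.
    """
    # Rules (1)-(3): variants already known to be pathogenic are always kept.
    if str(v.get("interpretation", "")).lower() in PATHOGENIC_INTERPRETATIONS:
        return True
    if v.get("reported_pathogenicity") in {"Pathogenic", "Likely pathogenic"}:
        return True
    if v.get("clinvar") == "Pathogenic":
        return True
    # Rule (4): intronic, intergenic and UTR variants are filtered out.
    if v.get("func") in {"intronic", "intergenic", "UTR5", "UTR3"}:
        return False
    # Rule (5): synonymous exonic variants are filtered out.
    if v.get("func") == "exonic" and v.get("exonic_func") == "synonymous SNV":
        return False
    # Rules (6)-(7): variants common in public or in-house controls are filtered out.
    for key in ("maf_1000g", "maf_evs6500", "maf_exac", "maf_control_db"):
        maf = v.get(key)
        if maf is not None and maf > maf_threshold:
            return False
    # Rule (8): in silico prediction scores are not used by default.
    # Rule (9): QD filter, applied only when a cut-off is supplied.
    if min_qd is not None and v.get("qd") is not None and v["qd"] < min_qd:
        return False
    return True


# Hypothetical variants, for illustration only.
common_missense = {"func": "exonic", "exonic_func": "nonsynonymous SNV", "maf_1000g": 0.05}
rare_missense = {"func": "exonic", "exonic_func": "nonsynonymous SNV", "maf_exac": 0.0001}
common_but_pathogenic = dict(common_missense, interpretation="AR_pathogenic")

print(passes_auto_filter(common_missense))        # False
print(passes_auto_filter(rare_missense))          # True
print(passes_auto_filter(common_but_pathogenic))  # True
```

Placing the three "always keep" rules first is what lets a curated pathogenic variant stay visible even when it would otherwise be removed by the location or frequency rules.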

15-2. Show filtering settings If you need to modify the filtering parameters, you can modify them through the “Detailed_Setting” function of the database. To move to “Detailed_Setting”, pull down the layout pop-up menu on the status tool bar and select “Detailed_Setting” (Figure 30).

Fig. 30 Layout change pop-up menu

NOTE: This database software was developed with a main focus on target re-sequencing analysis. Thus, if you would like to store large panel or whole exome sequencing results, please modify the filtering parameters. After modifying the settings, please restart your database.

15-3. Change filtering parameters In the detailed filtering setting window, it is possible to modify detailed filtering parameters (Figure 31). The modifiable filtering parameters include the minor allele frequency threshold in the public control database, intronic variant, synonymous variant, SIFT and PolyPhen2 score, control allele frequencies, and QD filtering.

Fig. 31 Detailed filtering setting window

16. Variant viewer

16-1. Variant viewer The “Variant viewer” is an interface that provides a whole view of the variant information including the many public databases, in silico prediction, and pathogenicity classification (Figure 32). This interface is useful for the efficient interpretation of each variant. To assist in this, it also indicates the averaged clinical information and odds ratio.

Fig. 32 Variant viewer

(Screenshot of the Variant viewer layout, Clinical NGS Database ver. 1.2, showing the example variant OTOF NM_194323 exon29 c.3515G>A p.R1172Q at chr2:26681086 C>T, classified AR_Pathogenic. The panels are: variant information, odds ratio, variant interpretation, the list of patients carrying the same mutation, allele frequencies in public and in-house databases, in silico prediction scores and ClinVar status, pathogenicity data from the Previously Reported Variant Database, the ACMG variant classification check list (PVS1, PS1 to PS4, PM1 to PM6, PP1 to PP5, BA1, BS1 to BS4, BP1 to BP7), averaged clinical data of the variant carriers, and averaged clinical data of the patients with mutations in the same gene. In the example, the control counts are CNT_alt# 4 and CNT_ref# 2120, and the odds ratios are 0.7 (95% CI 0.1 - 3.8) for AD/Mit and 6.0 (95% CI 2.2 - 16.6) for AR/Spo.)

Brief explanation of variant viewer:

(1) Variant information section is for basic variant information including the chromosome number, start, end, reference base, alternative base, gene name, base change in cDNA and amino acid change.

(2) Variant pathogenicity interpretation section is for managing the pathogenicity interpretation of each variant. The interpretation in this section is used automatically throughout the database, so please classify the pathogenicity of each variant carefully.

(3) Patient list section indicating patients carrying the same variant is a useful tool for comparing each patient phenotype. The Case_Preview button in this section is a link to the “Case viewer” for each patient.

(4) Allele frequency information section summarizes the minor allele frequency information of the public control populations including the 1000 Genomes, 6500 exome, and ExAC projects. “CNT_DB” indicates the allele frequency in the “Control_Allele_Frequency_Database” (see Section 12). The green boxed area indicates the inheritance-specific allele frequency information in the database.

(5) Pathogenicity data from the previously reported variant database section indicates the variant classification and disease information in the “Previously_Reported_Variants_Database” (see Section 11).

(6) Odds ratio section indicates the variant allele frequency information in the “Control_Allele_Frequency_Database”, in-house controls, the inheritance mode specific odds ratio, and 95% confidence intervals.

(7) In silico prediction section indicates the results of in silico prediction of the effect of the variant on the protein function. This field also indicates the ClinVar status if applicable.

(Example figure, panels A to F: an index case, JHLB2722, and Cases with same mutation 1 to 4, each labeled with its MYO7A genotype, for example MYO7A:c.[479C>G;2947G>T];[=], and plotted on axes running from 125 to 8,000 and from -20 to 120.)

(8) Averaged clinical information section (variant level averaged clinical information) indicates the automatically calculated averaged clinical data for patients carrying the same variant.

(9) Averaged clinical information of the patients with the same causative gene section (gene-level averaged clinical information) indicates the automatically calculated averaged clinical data for patients with the same genetic cause.

(10) Variant classification according to the ACMG and AMP guidelines section. Check the appropriate evidence boxes for the variant, and the ACMG guideline-based classification is calculated automatically.
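For the odds ratio section in item (6), this manual does not state how the odds ratio and its 95% confidence interval are calculated. The following minimal sketch in Python assumes an allele-count odds ratio with a Woolf (log-scale) interval; both the method and the function name are assumptions. The counts are read from the Variant viewer example in Figure 32, where AR/Spo cases show 48 / 4298 and AD/Mit cases show 2 / 1532. Interpreting these as alternative alleles over total alleles is also an assumption, although it reproduces the values displayed in the figure.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    All four arguments are allele counts and must be greater than zero.
    Returns (odds_ratio, lower, upper); z=1.96 gives a 95% interval.
    """
    if min(case_alt, case_ref, ctrl_alt, ctrl_ref) <= 0:
        raise ValueError("Woolf's method needs four non-zero counts")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# AR/Spo cases: 48 alternative alleles out of 4298; controls: CNT_alt# 4, CNT_ref# 2120.
ar = odds_ratio_ci(case_alt=48, case_ref=4298 - 48, ctrl_alt=4, ctrl_ref=2120)
print("AR/Spo: OR = {:.1f} (95% CI {:.1f} - {:.1f})".format(*ar))
# AR/Spo: OR = 6.0 (95% CI 2.2 - 16.6)

# AD/Mit cases: 2 alternative alleles out of 1532, same controls.
ad = odds_ratio_ci(case_alt=2, case_ref=1532 - 2, ctrl_alt=4, ctrl_ref=2120)
print("AD/Mit: OR = {:.1f} (95% CI {:.1f} - {:.1f})".format(*ad))
# AD/Mit: OR = 0.7 (95% CI 0.1 - 3.8)
```

Both results agree with the OR_AR/Spo and OR_AD/Mit values shown in the figure. Woolf's method is undefined when any count is zero, so a variant absent from controls would need a continuity correction or an exact method instead.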

17. Input direct sequencing results and family segregation information

17-1. Input direct sequencing results This database software also manages the direct sequencing results and family segregation information in the “Case_viewer” (Figure 33). Please check the appropriate box for this information.

Fig. 33 Direct sequencing results and family segregation information management section in the Case viewer

18. Add more patient data

18-1. Add more patient data If you would like to add more patient data, please refer to steps 6 to 10 in this manual. Step 7 of the process requires a lot of time.

19. Add other genetic testing data (optional)

19-1. Add other genetic testing data If you would like to add other genetic testing data, press the “Input Other” button.

19-2. Input other genetic testing results Input the other genetic testing results according to the instructions. The mutation should be input as a genomic position. The “genotype” field cannot be modified, so please fill in the “Info” field according to the instructions.

20. Report maker

20-1. Input sequencing metrics This database software has a simple report-making function for NGS results. First, please move to the “Sample list” and input the sequencing metrics (Figure 34).

Fig. 34 Sample list icon

After changing the layout, input the sequencing metrics for each sample. As sequencing metrics, this database stores “Total Reads”, “Total Base”, “On Target %”, “over x10 coverage %”, and “over x20 coverage %”. (Instead of entering them one by one, it is possible to import Excel data.)

Fig. 35 Input sequencing metrics information

20-2. Confirm the NGS report The next-generation sequencing analysis report is automatically generated from the basic sample information, the basic sequencing information, the genetic diagnosis and comments input in Step 14-(4), and the variant list and its interpretation input in Step 16-(2). To confirm the NGS report, click on the “Report Maker” icon (Figure 36).


Fig. 36 Report Maker icon

The explanation of the NGS test can be modified in Layout mode as shown in Step 13-2. Please modify the text if necessary. Further, please modify the comment and recommendation section if necessary (Figure 37).

Fig. 37 Report Maker icon

NOTE: The variant list shown in the Report maker includes VUS and other variants. For relatively large panels or whole exome sequencing analysis, the filtering parameters need to be modified to restrict the number of displayed variants.

21. Terms of use

Prior to using this database software, please read the terms of use and agreement listed below. If you agree to these terms, please send a printed agreement form or scanned agreement form to the developer.

Send to:
======
3-1-1 Asahi Matsumoto Nagano JAPAN 390-8621
Shinshu University School of Medicine Department of Otorhinolaryngology
Shin-ya Nishio Ph.D.
[email protected]
======

(Grant of Use) The developer shall grant to the user the right to use the database software on the condition that the user shall comply with the various terms and conditions set forth in these terms of use. The user acknowledges that the database software was created by the developer and that the developer shall own all rights regarding the database software.

(Scope of use) Use of this database software is granted to academic users for non-commercial purposes only. If the user wants to use this database for any commercial purposes, he/she must separately make such a request to the developer and consult on the terms and conditions of use.

(Notification of Re-editing) If the user wants to re-edit this database software, the user must notify the developer in writing of the contents of such re-editing and obtain the developer’s approval. The user is not permitted to provide to any third parties any re-edited database software without the developer’s permission.


(Related Laws) Users must comply with the related laws, regulations, and guidelines, irrespective of the time of enactment, and other provisions relating to the handling of this database software.

(Handling of Study Results) The intellectual property rights relating to the user’s study results shall be, in principle, owned by the relevant user. However, the developer shall own the relevant intellectual property rights related to the database software itself.

(Citation of theses) If the user publishes the study results in the form of a thesis, presentation, etc., he/she shall describe the fact that he/she has used this database software by citing our thesis or indicating the fact in the acknowledgements.

(Exemption) With regard to the use of this database software, the developer shall not provide any warranty to the user as to its suitability for user's purpose, the non-infringement of any third parties' intellectual property rights, or any matters relating to the use of this database software. Should any damage occur to the user or any third party through using this software, the developer shall not be liable therefor, and the user shall be liable for settling such damage at his/her own cost and expense.

(Obligation of management) This database software is developed for closed network use. If the user would like to use this database on an open network, the user should take appropriate countermeasures against unintended data access, such as (1) a VPN network connection, (2) firewall IP restriction, (3) encryption of the database, and (4) password changes at stated periods. Should any damage occur to the user or any third party through using this software in an open network, the developer shall not be liable therefor, and the user shall be liable for settling such damage at his/her own cost and expense.

(Compensation for Damage) If the user causes any damage to the developer due to any acts of breach as set forth in the preceding Articles or his/her willful intention or negligence, the developer may demand compensation for such damage.

====== Clinical NGS DB user agreement form ======

I declare that I have read, understood and agree to the terms of use described above.

Signature:

Printed Name:

Affiliation:

Address:

Phone:

E-mail address:

Date:
