Distribute! Or! Use! Other! Than! For! Personal! Purposes! Without! The! Permission! Of! Suzanne!Leal!Or!Michael!Nothnagel.!

! ! ! ! ! ! ! ! This! is! copyrighted! material.! Please! do! not! distribute! or! use! other! than! for! personal! purposes! without! the! permission! of! Suzanne!Leal!or!Michael!Nothnagel.! i ii Table&of&Contents& Monday' '................................................................................................................................'1' Introduction*to*finding*pathogenic*variants*using*NGS*data* *..............................................*1* Introduction*to*the*basics* *..................................................................................................................*3* Calculation*of*LOD*scores* *................................................................................................................*13* Cloud*computing* *.................................................................................................................................*20* Tuesday' '.............................................................................................................................'25' Penetrance* *.............................................................................................................................................*25* Software*to*perform*linkage*analysis* *........................................................................................*28* Genotyping*error*detection*using*Merlin* *.................................................................................*33* Homozygosity*mapping* *...................................................................................................................*35* Wednesday' '.......................................................................................................................'39' File*formats*for*sequence*data* *......................................................................................................*39* Thursday' '..........................................................................................................................'49' Filtering*approaches*for*the*analysis*of*NGS*data* *................................................................*49* Association*analysis*for*Mendelian*traits* *................................................................................*63* Performing*linkage*analysis*using*sequencing*data*..............................................................*69* Friday' '.................................................................................................................................'73' Evaluating*power*using*simulation*studies* *............................................................................*73* Functional*studies* *..............................................................................................................................*80* Variant*annotation* *.............................................................................................................................*85* * iii iv Analysis of Family Data Via Filtering Strategies Samples in a Family Undergoing NGS Two or More Affected & Unaffected One Affected Introduction to Finding Pathogenic Affected Individuals Individuals Individual Variants Using NGS Data: Exclude variants which are not shared by affected Linkage Analysis and Filter Approaches individuals & (present in unaffected individuals ) Suzanne M. Leal [email protected] Exclude non-coding variants and & coding variants which are found Center For Statistical Genetic in databases (ExAC) or are not rare e.g. >0.5% Baylor College oF Medicine https://www.bcm.edu/research/labs/center-for-statistical-genetics Test for segregation of identified variants with disease phenotype & sequence variants in ethnically matched controls This Strategy can Fail! PerForming Linkage Analysis • DNA samples from all informative pedigree None of the variants completely segregate with members are genotyped using arrays disease status • Parametric two-point and multipoint linkage Ø AfFected individuals are phenocopies or incorrectly diagnosed analysis performed Ø UnaFFected individuals are disease variant carriers (reduced D • For consanguineous pedigrees segregating penetrance) autosomal recessive traits Ø Sample swaps have occurred – Homozygosity mapping can also be used Ø Locus heterogeneity within the pedigree Homozygoisty Linkage Analysis Homozygosity Mapping ++ D+ Region oF ++ D+ mapper score ++ D+ Phenocopy Reduced Penetrance Sample Swaps Homozyosity BeneFits oF PerForming Linkage Analysis Trouble Shooting Using Linkage Analysis Using Genotyping Arrays • Linkage analysis can be perFormed using genotyping arrays or sequence data • Aids in selection oF individuals For sequencing • Observed LOD scores compared to • Maps the disease locus to speciFic genomic – Expected maximum LOD (EMLOD) region(s) – Maximum LOD (MLOD) • Filtering can be perFormed within several Mb, i.e. • DeFlated LOD scores can be due to – Incorrect phenotype inFormation linkage region, instead oF the entire genome – Locus heterogeneity within the pedigree – Reducing the number oF variants which need to be • Genotypes can also aid in detection of incorrect followed-up familial relationships • Testing For segregation in pedigrees • Evaluating Frequencies in ethnically matched controls 1 Non-syndromic Hearing Impairment (NSHI) • 893 NSHI families ascertained REPORT American Journal oF Human Genetics – Pakistan, USA, Switzerland, Turkey, Jordan, Hungry Mutations in KARS, Encoding Lysyl-tRNA (Roma), Poland & Germany Synthetase, Cause Autosomal-Recessive Nonsyndromic Hearing Impairment DFNB89 • Intra-familial heterogeneity in the collection Regie Lyn P. Santos-Cortez,1,8 Kwanghyuk Lee,1,8 Zahid Azeem,2,3 Patrick J. Antonellis,4,5 Lana M. Pollock,4,6 Saadullah Khan,2 Irfanullah,2 Paula B. Andrade-Elizondo,1 • 15.3% (95% CI 11.9 - 19.9%) Santos-Cortez et al. 2015 EJHG Ilene Chiu,1 Mark D. Adams,6 Sulman Basit,2 Joshua D. Smith,7 University of Washington Center for Mendelian Genomics, Deborah A. Nickerson,7 Brian M. McDermott, Jr.,4,5,6 • Linkage analysis Followed by exome sequencing led Wasim Ahmad,2 and Suzanne M. Leal1,* to the identiFication oF a number oF NSHI genes Previously, DFNB89, a locus associated with autosomal-recessive nonsyndromic hearing impairment (ARNSHI), was mapped to chromosomal region 16q21–q23.2 in three unrelated, consanguineous Pakistani families. Through whole-exome sequencing of a hearing- – impaired individual from each family, missense mutations were identified at highly conserved residues of lysyl-tRNA synthetase KARS (Santos-Cortez et al. 2013 AJHG) (KARS): the c.1129G>A (p.Asp377Asn) variant was found in one family, and the c.517T>C (p.Tyr173His) variant was found in the other two families. Both variants were predicted to be damaging by multiple bioinformatics tools. The two variants both segregated with the – ADCY1 (Santos-Cortez et al. 2014 Hum Mol Genet) nonsyndromic-hearing-impairment phenotype within the three families, and neither mutation was identified in ethnically matched controls or within variant databases. Individuals homozygous for KARS mutations had symmetric, severe hearing impairment across – TBC1D24 (Rehman et al. 2014 AJHG) all frequencies but did not show evidence of auditory or limb neuropathy. It has been demonstrated that KARS is expressed in hair cells of zebrafish, chickens, and mice. Moreover, KARS has strong localization to the spiral ligament region of the cochlea, as well as to Deiters’ cells, the sulcus epithelium, the basilar membrane, and the surface of the spiral limbus. It is hypothesized that KARS variants affect ami- noacylation in inner-ear cells by interfering with binding activity to tRNA or p38 and with tetramer formation. The identification of rare KARS variants in ARNSHI-affected families defines a gene that is associated with ARNSHI. Hearing impairment (HI) affects nearly 300 million people ilies. The knowledge that has been gained from functional, of all ages globally and increases in prevalence per decade expression, and localization studies after the identification of life.1 Children and adults with bilateral, moderate-to- of genes with mutations that cause NSHI has immensely profound HI have a poorer quality of life, which encom- expanded our understanding of inner-ear physiology. passes not only problems in physical function but also Previously, an ARNSHI-associated locus, DFNB89, was socioemotional, mental, and cognitive difficulties.2,3 In mapped to chromosomal region 16q21–q23.2 in two unre- particular, children with congenital HI must be identified lated, consanguineous Pakistani families.6 The two fam- and habilitated within the first 6 months of life so that de- ilies, 4338 and 4406 (Figures 1A and 1B), had maximum lays in theRare Homozygous Variants in the acquisition of speech, language, and reading multipoint LOD scores of 6.0 and 3.7, respectively. The ho- DFNB89 Locus (16q21-q23.2) skills can be prevented.4 mozygosity regions that overlap in the two families led to Among children with congenitalDFNB89 Region sensorineural HI, more the identification of a 16.1 Mb locus (chr16: 63.6–79.7 Mb) Bilateral symmetric moderate-to-proFound SNP genotyping (Illumina linkage panel) than 80% do not display syndromic features and ~60% that includes 180 genes. Additionally, a third consanguin- hearing impairment across all Frequencies Region oF homozygosity 16.1 Mb: have a family history of HI or a confirmed genetic etiol- eous Pakistani family, 4284 (Figure 1C), was identified, and ogy.5 Because of the complex cellular organization of the showed suggestive linkage to

Distribute! Or! Use! Other! Than! For! Personal! Purposes! Without! The! Permission! Of! Suzanne!Leal!Or!Michael!Nothnagel.!

A Decision Support System for Reporting High Throughput Sequencing of Cancers in Clinical Diagnostic Laboratories Kenneth D

Locus Reference Genomic Sequences: an Improved Basis for Describing Human DNA Variants

14 CLINICAL GENOMICS CODE SYSTEMS Information on Code Systems with Name, HL70396 Code, OID, Source Information Links, and Description

Haplosaurus Computes Protein Haplotypes for Use in Precision Drug Design

Human Genotype–Phenotype Databases: Aims, Challenges and Opportunities

Diagnosis of Neurogenetic Disorders Contribution of Next Generation Sequencing and Deep Phenotyping

Establishing an Automated Graphical Genome Analysis Platform

Genes and Variants Underlying Human Congenital Lactic Acidosis—From Genetics to Personalized Treatment

Algorithms for the Description of Molecular Sequences Issue Date: 2016-12-21

The Online Abstract Book

Accurate Mutation Annotation and Functional Prediction Enhance the Applicability of -Omics Data in Precision Medicine Tenghui Chen

Pathos: a Decision Support System for Reporting High Throughput Sequencing of Cancers in Clinical Diagnostic Laboratories Kenneth D