Interactive Pedigree Plotter for Genetic Analysis

Sveinn Már Ásgeirsson
M.Sc Thesis, Master of Science, June 2019
School of Science and Engineering, Reykjavík University

Thesis of 30 ECTS credits submitted to the School of Science and Engineering at Reykjavík University in partial fulfillment of the requirements for the degree of Master of Science (M.Sc) in Biomedical Engineering, June 2019.

Supervisors:
Bjarni V. Halldórsson, Associate Professor, Reykjavík University, Iceland
Guðbjörn F. Jónsson, Software Developer, deCODE Genetics, Iceland

Examiner:
Páll Melsted, Professor, University of Iceland, Iceland

Copyright Sveinn Már Ásgeirsson, June 2019

Abstract

Diseases and traits are often caused by mutations in the genome. A useful way to determine how these mutations occur and are inherited in families is to look at a pedigree. A pedigree is a diagram that depicts the biological relationship between an individual, its ancestors and other relatives. It is often used to examine the transmission of genetic disorders. The purpose of a pedigree is to provide an easy-to-read chart that depicts a certain characteristic or disorder in a family. It can be used for a physical characteristic like having a widow's peak or attached earlobes, or a genetic disorder like colorblindness or Huntington's disease.

With technological advances in genetics over the last two decades, such as improved genotyping methods (e.g. moving from microsatellites and other single-marker genotyping methods to chip genotyping and whole genome sequencing), researchers at deCODE now have an immense volume of data to work with. This gives them a much better and more accurate understanding of infrequent variations in the DNA that may be a contributing factor to rare diseases.
One method to analyze these rare variants is to use pedigrees, but because of the low penetrance of these variants, the researcher may have to draw very large pedigrees, with hundreds or even thousands of individuals, just to understand the inheritance of the variant. This thesis reviews a pedigree plotter, Interactive Pedigree Plotter (IPP), designed by the author, which specializes in large pedigrees, both drawing them and working with them. The "interactive" refers to allowing the user to, for example, collapse/expand, move and delete certain parts of the descendant tree, a feature that becomes important with increasing pedigree size. The IPP also offers various attribute features such as arbitrary text attributes and multiple symbols, giving the user good tools to distinguish individuals from one another as well as the ability to show multiple phenotypes, e.g. several cancer types.

To summarize, IPP is a pedigree plotter that is well equipped to handle large and complex pedigrees, enabling researchers to study rare variants. This thesis reviews IPP and its features, and then compares it to several similar pedigree plotters to see whether existing pedigree plotters are sufficiently advanced for the genetic analyses being done at deCODE Genetics.

Útdráttur (Icelandic abstract, translated)

Gagnvirkur Fjölskylduteiknari fyrir Erfðarannsóknir
Sveinn Már Ásgeirsson, June 2019

Diseases and hereditary traits often arise from mutations in the genome. A very useful way to understand how mutations arise and how they are inherited in families is to look at a pedigree (Icelandic: ættartré). Pedigrees are diagrams that describe the biological relationships between individuals, their ancestors and other relatives. Pedigrees are often used to examine the genetic transmission of variants that cause genetic disorders. The purpose of a pedigree is to provide an easy-to-read diagram that shows a particular characteristic or disease in a family. It can be used for various physical characteristics (e.g. widow's peak or attached earlobes) or genetic disorders such as colorblindness or Huntington's disease.

There have been great technological advances in genetics over the last two decades with regard to genotyping, e.g. the shift from microsatellites and other single-marker genotyping methods to chip genotyping and whole genome sequencing, which give researchers at Íslensk erfðagreining (deCODE Genetics) far more genetic data to work with. This gives a much more accurate and better understanding of rare variants in the genome that could possibly contribute to rare diseases. One way to analyze these rare variants is to use pedigrees, but because of their low penetrance the researcher may need to draw very large pedigrees, of a hundred to a thousand individuals, just to understand the inheritance of the variant.

This thesis examines a pedigree plotter, Gagnvirkur Fjölskylduteiknari (GFT; in English, Interactive Pedigree Plotter), designed by the author specifically to handle large pedigrees, both drawing them and working with them. "Interactivity" here means allowing the user to, for example, collapse/expand, move and/or delete particular branches of the tree; this interactivity becomes ever more important as pedigrees grow. GFT also offers a variety of attributes, such as text attributes whose text the user chooses freely, as well as symbol attributes. This gives the user good tools for telling individuals in the tree apart and allows several phenotypes in a single tree, for example several cancer phenotypes.

To summarize, GFT is a pedigree plotter that is well equipped to cope with large and complex pedigrees, which enables researchers to study rare variants. This thesis examines and describes GFT and its features, and then compares it with similar pedigree plotters to see whether the existing pedigree plotters are sufficiently developed to be usable for genetic research at Íslensk erfðagreining.

I dedicate this thesis to my daughter, Alba Rós Sveinsdóttir.

Acknowledgements

This work was funded by deCODE Genetics.

Preface

The programming of the software and the writing of this dissertation were done solely by the author, Sveinn Már Ásgeirsson. The design and the idea of the software were developed in collaboration with several people, including Gísli Másson, Guðbjörn F. Jónsson, Birgir Pálsson and Hreinn Stefánsson.

Contents

Acknowledgements
Preface
Contents
List of Figures
List of Tables
1 Introduction
2 Background
2.1 Pedigree
2.2 Genotype
2.3 Phenotype
2.4 Transmission genetics
2.5 Penetrance
2.6 Genetic marker (variation)
2.6.1 SNPs
2.6.2 Allele
2.6.3 Common variants
2.6.4 Rare variants
2.7 Genotyping methods
2.7.1 Microsatellite genotyping
2.7.2 Chip genotyping
2.7.3 Whole genome sequencing
2.8 Linkage analysis
2.9 GWAS
2.10 Haplotype
2.10.1 Phasing
2.10.2 Parental origin
2.10.3 Long-range phasing
3 Related Work
3.1 Online Pedigree Designers
3.1.1 Medical Pedigree
3.1.2 Progeny Pedigree Tool
3.1.3 Genial Pedigree Draw
3.2 Stand-alone Pedigree Plotters
3.2.1 HaploPainter
3.2.2 CraneFoot
3.2.3 Madeline
4 Methods
4.1 Incentive
4.2 Implementation
4.3 Necessary requirements
4.3.1 High drawing speed
4.3.2 Handling complex family patterns
4.3.3 Interaction
5 Results
5.1 Pedigree Report file
5.1.1 First line
5.1.2 Pedigree report columns
5.1.2.1 PN
5.1.2.2 Father and Mother
5.1.2.3 Sex
5.1.2.4 Yob and Yod
5.1.2.5 Affstatus
5.2 Attributes file
5.2.1 Text attributes
5.2.2 Symbols
5.2.3 Haplotypes
5.3 Layout algorithm
5.4 Interactive features
5.4.1 Deleting
5.4.2 Collapsing and expanding
5.4.3 Moving
5.4.4 Shrinking and stretching
5.5 Comparison
5.5.1 Complex families
5.5.1.1 Multiple spouses
5.5.1.2 Simple consanguinity
5.5.1.3 Complex consanguinity
5.5.1.4 Siblings have spouses that also have parents in the pedigree
5.5.1.5 Partners belong to different generations
5.5.1.6 Pedigree made up from many smaller families
5.5.1.7 Single parent connection
5.5.2 Drawing speed
5.5.3 Interaction
6 Future work
6.1 Haplotypes
6.2 Delete
6.3 Add
6.4 Export
6.5 Double consanguinity line
6.6 Recalculation
7 Summary and conclusion
8 Discussion
Bibliography

List of Figures

2.1 Standard set of pedigree symbols and an example of a pedigree.
2.2 Image showing Genotype
2.3 Image showing Phenotype
2.4 Two trio phasing examples for one marker.
3.1 Medical Pedigree.
3.2 The Progeny Pedigree Tool offers to start with a small family of maximum four generations, and the user can then add more nodes to the pedigree afterwards as well as adding text attributes and symbols.
3.3 Genial Pedigree Draw
3.4 Example of HaploPainter in action.
