Kazima Bulayeva · Oleg Bulayev · Stephen Glatt

Genomic Architecture of Schizophrenia Across Diverse Genetic Isolates
A Study of Dagestan Populations

Kazima Bulayeva
Russian Academy of Sciences
Moscow, Russia

Oleg Bulayev
Russian Academy of Sciences
Moscow, Russia

Stephen Glatt
Department of Psychiatry and Behavioral Sciences
SUNY Upstate Medical University
Syracuse, New York, USA

ISBN 978-3-319-31962-9 ISBN 978-3-319-31964-3 (eBook) DOI 10.1007/978-3-319-31964-3

Library of Congress Control Number: 2016939944

© Springer International Publishing Switzerland 2016 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature.
The registered company is Springer International Publishing AG Switzerland.

[Photograph: One of the remote highland genetic isolates where we conducted our study]

Foreword

Psychiatric disorders are among the world’s most complex and least understood ailments; yet, relative to other medical disorders, they cause a disproportionate share of suffering. The authors of this volume, Dr. Kazima Bulayeva and her son, Dr. Oleg Bulayev, have dedicated not just their scientific careers, but their lives, to combatting these disorders by increasing our understanding of their causes. Their work is the true embodiment of genetic epidemiology, defined by Newton Morton as “a science which deals with the etiology, distribution, and control of disease in groups of relatives and with inherited causes of disease in populations.”¹ This volume represents a fitting compilation of decades of labor, a true labor of love that has persevered despite many obstacles, from the financial to the political, to the physical challenges of navigating the mountainous terrain of the Caucasus, to the social and cultural difficulties of developing and nurturing deep interpersonal relationships with the indigenous people in the highlands of Dagestan. Through this effort, Drs. Bulayeva and Bulayev have been able to develop a truly unique relationship with their subjects and a matchless, singular research program.

In this monograph, you, the reader, will be immersed in the scientific workflow of Drs. Bulayeva and Bulayev, and you will feel the intensity of their painstaking effort to dissect the genetic underpinnings of mental disorders that aggregate in distinct ethnic isolates. Although technical advances in the field have ushered in an era of personal and precision medicine, Drs. Bulayeva and Bulayev show us the power and the promise of careful ascertainment, rich clinical characterization, and traditional family-based genetic analysis methods for discovering genomic loci that may harbor risk-conferring genes for mental disorders. This work is a testimonial, a tutorial even, on the rewards to be reaped through the careful application of fundamental methods of genetic epidemiology and of sound science.

¹ Morton, N. E. (1982). Outline of genetic epidemiology. New York: Karger. ISBN 3-8055-2269-X.

Stephen J. Glatt
Syracuse, NY, USA
February 10, 2016

Preface

The study of the genetics of complex diseases is one of the main priorities of modern genetics, as these diseases are the leading cause of premature death and disability. Mental diseases such as schizophrenia, depression, and bipolar disorder are among the most severe complex diseases for both patients and society. Currently, the prevalence of these mental diseases is increasing in most countries. Thousands of scientists around the world study the genetic and environmental factors that influence the development of these diseases; so far, however, this area of medical genetics remains full of conflicting results obtained by different researchers in different countries.

We have proposed a cross-isolate population approach, implemented in ethnically and demographically subdivided genetic isolates with aggregation of specific complex diseases. Unique genetic isolates with aggregation of certain complex diseases, including schizophrenia, were ascertained in our long-term population-genetic studies of the small indigenous ethnic groups of Dagestan (Northern Caucasus, Russia). This cross-isolate approach allows us to identify genomic regions that are common to all observed isolates, as well as regions specific to each of them, that contain candidate disease genes and structural genomic variants (copy number variations, CNV, and runs of homozygosity, ROH) linked with schizophrenia. Studying the same complex phenotype in diverse genetic isolates with different ancestors and high rates of endogamy and inbreeding enables the determination of the entire spectrum of genes and structural genomic variants involved in the pathogenesis of schizophrenia or any other complex disease. Genetic homogeneity and the founder effect in such isolates help identify the genomic mechanisms of disease etiopathogenesis with substantial savings of cost and time compared with large, genetically and ethnically heterogeneous populations.

The authors cherish the memory of the prominent geneticists whose support of our pioneering genetic studies among the indigenous peoples of Dagestan was of fundamental importance for the development of the work presented here: Timofeev-Resovskii, Dubinin, and Gindilis. We also express our sincere gratitude to the staff of our research group on human genetic adaptation at the N.I. Vavilov Institute of General Genetics of RAS (VIGG RAS) and to the members of the regular expeditions in Dagestan: Pavlova, Gurgenova, Kurbanov, Guseynova, and Omarova. We are grateful to Politov, Kurbatova, and all the researchers of the Department of Population Genetics of VIGG RAS for their review and recommendation to publish the results of our long-term study in this book. We are also grateful to the reviewer of the book manuscript, Professor Golimbet, whose valuable comments helped to improve the presentation of our study.

We owe endless gratitude to our foreign colleagues, whose appreciation and support of our Dagestan Genetic Heritage research program was of fundamental importance for the preservation and development of this study, in spite of the numerous difficulties in Russian science. They are outstanding scientists from the United States (Irving Gottesman, Paul Thompson and the International Scientific Corporation ENIGMA, Ming Tsuang, Hilary Coon, Henry Harpending, Lynn Jorde, Michael Hammer, and Tatiana Karafet), Italy (Giorgio Paoli and Sergio Tofanelli), Germany (Klaus-Peter Lesch), and Japan (Toru Takumi and Hideshi Kawakami). Our internships and joint work in their laboratories helped us to master the most advanced methods of molecular genetics and bioinformatics and to apply them in our studies of Dagestan genetic isolates.

Endless thanks go to our coauthor in genetic studies of schizophrenia in Dagestan isolates and scientific editor of the book, Prof. Stephen Glatt. His participation in this study helped overcome the differences in clinical and genetic methodologies between Russian and US researchers and made the results of our research available to a wide range of English-speaking colleagues and readers.

Endless thanks to all the Dagestan highlanders from diverse ethnic groups for their voluntary participation in our long-term study.

The authors are grateful to Gurgenova for help in preparing the manuscript for publication and to Marisa Peryer for the English translations.

Kazima Bulayeva, Moscow, Russia
Oleg Bulayev, Moscow, Russia
Stephen Glatt, Syracuse, NY, USA

Contents

1 Current Problems of Complex Disease Genes Mapping
  1.1 General Problems of Complex Disease Genes Mapping
  1.2 Current Approaches of Schizophrenia Spectrum Disease Gene Mapping
  1.3 The Current State of Gene Mapping of Schizophrenia Spectrum Disorders
  References

2 Descriptions and Methods of Study in Selected Genetic Isolates of Dagestan
  2.1 History and Ethno-linguistic Diversity of Dagestan
  2.2 Genetic and Demographic Structure of the Selected Isolates
  2.3 Methods of Clinical Studies
  2.4 Molecular-Genetic Methods of Study
  2.5 Genetic and Statistical Methods of Experimental Data Analysis
  References

3 Selection of Populations for Mapping Genes of Complex Diseases
  3.1 Principles of Selection of Populations for Complex Disease Gene Mapping
  3.2 Ethnogenomic Structure of Dagestan Populations
  3.3 Genetic Epidemiology Study of Selected Genetic Isolates with the Aggregation of Schizophrenia Spectrum Disorders
  3.4 Gene Pool of Selected Isolates for Mapping Genes of Schizophrenia
  3.5 Role of Inbreeding in the Aggregation of a Schizophrenia and in Its Age of Onset
  References

4 Mapping Genes of Schizophrenia in Selected Dagestan Isolates
  4.1 Haplotype Analysis in Pedigrees Ascertained in the Isolates
  4.2 Genome-Wide Nonparametric Linkage Analysis of Schizophrenia in Selected Isolates
  4.3 Genome-Wide Parametric Linkage Analysis of Schizophrenia in Selected Isolates
  4.4 Cross-Population Analysis of Genome-Wide Linkages Scan for Schizophrenia in Selected Isolates
  References

5 Common Structural Genomic Variants in Linked with SCZ Regions
  5.1 Copy Number Variations and Runs of Homozygosity Analyses in Linked with SCZ Genomic Regions in Pedigrees of Selected Isolates
  5.2 Effect of Inbreeding on CNV and ROH Segments Sizes (Kb) and on Marker Numbers
  5.3 Cross-Isolate Study of Structural Genomic Variations in Linked with Schizophrenia Regions
  References

Conclusions

Appendix: List of Genome-Wide Scanned Loci and Markers in Studied Dagestan Genetic Isolates (Weber/CHLC 9.0 Markers)

List of Figures

Fig. 1.1 The relative risk (RR) of schizophrenia during lifetime (lifetime risk of developing schizophrenia), based on the degree of genetic relatedness (Gottesman 1991). In the general population, the RR of developing schizophrenia is 1 %. In groups of relatives, the RR increases significantly with closer family ties. SCZ, schizophrenia

Fig. 2.1 Women’s hats from representatives of different ethnic groups in Dagestan (Gadzhieva 1961)

Fig. 3.1 Haplotype of the founder of an isolate, carrying the mutations that define the disease, transmitted to descendants out of the total proto-population. The haplotype block with pathogenic loci shrinks through meiosis and recombination: over the generations of the population's demographic history, its members accumulate more meiotic recombinations. Only a short segment of the ancestral haplotype with the pathogenic locus is maintained by the 50th generation in the population

Fig. 3.2 PCA plot of 250 K autosomal SNPs of 56 populations from Dagestan, the Caucasus, the Near East, Europe, Central Asia, and South Asia. The ‘drop one in’ procedure was used for analysis. PC1 and PC2 coordinates for each population were calculated as median coordinate values for individuals within populations. This revealed relatively distinct clusters of Europeans, South Asians, and Central Asians, while Dagestani samples (except Nogais and Mountain Jews, who immigrated to the Dagestan region about 700 years ago according to historical data) intermingle with other Caucasus individuals and show an affinity with European and Near Eastern samples. Dagestan-ND, ethnic groups belonging to the Dagestan and Nakh language family. Dagestan-non-ND, other ethnic groups of Dagestan (from Karafet et al. 2016)

Fig. 3.3 Distribution of Dagestani ethnic groups (Kumyks, Dargins, and Laks) with 25 racial and ethnic groups worldwide (527 people) in the space of three (PC1–PC3) principal components. Every examined individual is designated by a point, the color of which reflects ethnicity. The circle indicates Dagestani ethnic groups. The Kumyks, Dargins, and Laks show clear ethnogenic proximity to the European group

Fig. 3.4 Distance from the African centroid and distribution of the ethnic populations studied by the allele size variance of STR loci

Fig. 3.5 Network (median joining) built on the basis of 20 STR loci haplotypes of the Y-chromosome. Nuclear haplotypes of a certain number of examinees from different ethnic groups are highlighted. Pink, Kubachins; red, Avars; yellow, Chechens-Akkin; green, Tabasarans; blue, Laks (Caciagli et al. 2009)

Fig. 3.6 Multivariate analysis of the frequency of major Y-STR haplogroups in major populations of Dagestan, the Caucasus, and the Middle East. Geographical regions are indicated by the following symbols: filled squares, Dagestan; filled circles, Caucasus; filled triangles, West Asia. Gray indicates haplogroups. Legend: MJ Mountain Jews; TAT Tats; Lk Laks; Avr Avars; Kbc Kubachins; Tbs Tabasarans; Drg Dargins; Lzg Lezgins; Rtl Rutuls; Abz Abazins; Abk Abkhaz; Arm Armenians; AZB_NT Azerbaijanians-North Talysh; Che Chechens; Geo Georgians; Ins Ingush; Kbd Kabardians; Krd Kurds; Ir_Teh Teheran Iranians; Ir_Isf Isfahan Iranians; Ir Iranians; Ir_Arb Iranians Arabs; Ir_Gil Iranians-Gilaks; Ir_Bak Iranians-Bakhtiard; Ir_Maz Iranians Mazandarinians; Ir_ST Iranians-South Talysh; Jor Jordanians; Trk Turkish; Yem Yemenis

Fig. 3.7 Multidimensional scaling of the HVS-I sequence matrix (haplotype frequencies) demonstrating the genetic relationships among the ethnic populations of the Caucasus, West Asia, and Central Asia. Filled squares, Dagestan; filled circles, Caucasus; filled triangles, West Asia; open circles, Central Asia (Uzbk Uzbeks; Trkm Turkmens; Kazh Kazakhs). For symbols of other ethnic groups, see Fig. 3.5

Fig. 3.8 Contour map showing the distribution of J1 and J*(xJ2) haplogroups in ethnic groups of the Caucasus, Middle East, Central Asia, and North Africa professing Islam

Fig. 3.9 Distributions built on genetic distances between Dagestan groups (Avars, Dargins, Kumyks, Lezgins, Kubachins) and other worldwide groups by the hypervariable locus HVS1 of mtDNA

Fig. 3.10 Pedigree branches of genetic isolates DGH064 (a), DGH005 (b), DGH022 (c), and DGH011 (d). Legend: P/SCZ, possibly with schizophrenia; SCZ, schizophrenia and related spectrum disorders

Fig. 3.11 Distribution of allele sizes of 21 STR loci in groups of descendants from outbred (1), endogamous (2), and inbred (3) marriages. X-axis: the size of alleles of studied loci; Y-axis: frequency of their occurrence in these groups of descendants, %

Fig. 3.12 The distribution of D17S784 alleles in the groups of descendants of exogamous (inter-population and inter-ethnic) (1) and consanguineous (2) marriages

Fig. 3.13 A comparative analysis of the level of heterozygosity and allelic rank distribution of 28 microsatellites between 3 studied ethnic groups in Dagestan and the global summary of the John Weber lab. HWEB, level of heterozygosity in the combined sample from the John Weber lab; HLAKS, HDARG, and HTIND, level of heterozygosity in the examined groups of Laks, Dargins, and Tindals

Fig. 3.14 Distribution of the level of heterozygosity per locus H (a) and the inbreeding level F (b) in healthy subjects (1) and patients (2). SD, standard deviation; SE, standard error; X, average value

Fig. 3.15 Genealogy fragment of a primary isolate with a high frequency of cousin marriages and aggregation of paranoid schizophrenia

Fig. 3.16 The frequencies of the descendants of outbred and inbred marriages in groups of healthy subjects (N) and schizophrenia spectrum disorders patients (SCZ). Differences between the groups are statistically significant: χ² = 10.9, df = 1, p = 0.00096; Rs = −0.498, t = 3.721, p = 0.00058

Fig. 3.17 The distribution of age at onset of schizophrenia in groups of descendants of the different types of marriage

Fig. 3.18 Multivariate genetic analysis of patient groups with different age of onset in the space of two principal components (I and II)

Fig. 4.1 Haplotypes of chromosome 22 in the genealogy fragment DGH005. The sequence of chromosome loci: D22S420, D22S345, D22S689, D22S685, D22S683, D22S445

Fig. 4.2 Haplotypes of chromosome 17 in the genealogy fragment DGH005. The sequence of loci: D17S1308, D17S1298, D17S974, D17S1303, D17S947, D17S2196, D17S1294

Fig. 4.3 Haplotypes of chromosome 22 in the genealogy fragment DGH064. For the sequence of loci, see Fig. 3.18

Fig. 4.4 Haplotypes of chromosome 17 in the genealogy fragment DGH064. For STR loci, see Fig. 4.2

Fig. 4.5 Haplotypes of chromosome 17 in the genealogy fragment DGH022. The letter “Y” marks the genealogy members with genome-wide scanned microsatellites

Fig. 4.6 Haplotypes of chromosome 22 in the genealogy fragment DGH011. For the sequence of loci, see Fig. 3.18

Fig. 4.7 Haplotypes of chromosome 6 in the genealogy fragment DGH064. The sequence of loci: D6S1959, D6S2439, D6S2427

Fig. 4.8 Common genomic region 6p21.2–p22.3 found to be linked with schizophrenia (see Tables 4.2 and 4.3) in the pedigrees of the genetic isolates studied. Overall LOD = 5.3, α = 1

Fig. 4.9 Genomic region 10p11.23-p11.21 in isolate DGH011 and the 10q26.12-q26.13 region in isolates DGH005 and DGH022 were linked with schizophrenia, with dominant inheritance of disease loci. In isolate DGH034, the 10q12-q26.13 region showed recessive inheritance of disease loci. LODs varied from 1.96 to 2.7 (see Table 3.12). Overall LOD for the 2 isolates (DGH005 + DGH022) = 5.3, α − 1

Fig. 4.10 Genomic regions at 11p14.3-p13 in isolate DGH005 and at 11q23.1-q24.3 in isolates DGH034 and DGH011 linked with schizophrenia candidate genes. LODs varied from 2.1 to 2.7 (see Table 3.12)

Fig. 4.11 Genomic regions at 18p11.31 in isolates DGH022/DGH034 and at 18q12.1-q12.3 in isolate DGH011 linked with schizophrenia candidate genes. LODs varied from 1.5 to 3.0 (see Table 3.12)

Fig. 4.12 Genomic region 17p11-q12, linked with schizophrenia spectrum disorders in all studied isolates (DGH022/DGH005/DGH034/DGH011). LOD = 3.7 R/M, 2.5 R/M, 1.7 R/M, and 2.98 D/M, respectively (see Tables 4.2 and 4.3)

Fig. 4.13 Genomic region 22q11.1-q12.3, linked with schizophrenia spectrum disorders in isolates DGH005/DGH034/DGH011. LOD = 3.2 D/M, 4.4 D/M, and 3.1 D/M, respectively (see Tables 4.2 and 4.3)

Fig. 4.14 Differences between primary (PI) and secondary (SI) isolates in the percentage of meioses in chromosomes where we obtained significant linkages for SCZ, as well as in the rates of recombination events in the isolates' pedigrees. Differences between the isolates are statistically significant (t = 2.3–7.6; p < 0.05–0.000)

Fig. 5.1 Schematic representation of SNP (a) and CNV (b): deletions and duplications in a chromosome

Fig. 5.2 Genome-wide length sizes of ROH (a) and CN (b) among affected (SCZ) and healthy (N) pedigree members, summarized across the studied isolates. Stars mark chromosomes with reliable linkage in the isolates and with higher LOD values obtained in our linkage analyses

Fig. 5.3 Variation (in %) by chromosome of CN gains and losses among observed SCZ patients. The differences between groups are statistically significant: χ² = 32.385, df = 19, p = 0.02833

Fig. 5.4 CNV in the 1q21.1 and 15q11.2 regions previously reported (Stefansson et al. 2008) (A1, B1) and in SCZ-affected subjects from Dagestan genetic isolates (A2, B2). Segments with CNV and ROH were obtained in the same regions: in 1q21.1, among the genomes of 20 affected cases, we found segments with ROH in 13 genomes, deletions in 7, and gains in 3; in 15q11.2, we found segments with 7 ROH, 5 deletions, and 6 gains

Fig. 5.5 The association between the level of inbreeding and the average size of segments and the number of markers in patients with homozygous CNV (deletion = 0, duplication = 4)

Fig. 5.6 The association between the inbreeding coefficient and CNV: the frequency of homozygous copy number variations is greater at higher inbreeding coefficients

Fig. 5.7 CNV found in gene CRIM1 in the SCZ-linked region 2p22.3-p21 in isolate DGH011 (LOD = 3.1)

Fig. 5.8 CNVs established in 3 isolates in the schizophrenia-linked 6p21-p22 region with highly reliable LOD = 4.3 (DGH034), 2.92 (DGH022), and 2.3 (DGH011). The linked region contains candidate genes for schizophrenia: NOTCH4, HLA-DRB1, TNXB, HLA-DRB1, TAP2, etc. Genes localized in the linked region had deletions in eight patients and duplications in three patients

Fig. 5.9 Segments with copy number variations in the schizophrenia-linked 8p23 region. Five patients have segments with deletions and six patients have duplications. We found no ROH segments in the SCZ-linked region

Fig. 5.10 Recurrent CNVs found in five patients with a common ancestor within the linked 8p23 region

Fig. 5.11 CNV (a, 10 patients) and ROH (b, 7 patients) ‘hot spots’ obtained among SCZ cases in gene ELAVL2 (9p21.3) confirm the genomic instability reported on the DGV site

Fig. 5.12 CNV “hot spot” in the linked 17p11-p12 region and in 17q21.31

Fig. 5.13 Duplications in the CECR2 and SLC25A18 genes found in 7 SCZ patient genomes in isolates DGH005 and DGH034 at 22q11.2-q12.1. Six duplications and five deletions in genes CACNG2, PVALB, and IFT27, as well as deletions in the genomes of six patients and duplications in three patients in gene LARGE, were obtained within the second linkage peak in isolates DGH034 and DGH011 at 22q12.3

Fig. 5.14 A summary of the significant genome-wide linkages obtained in four genetic isolates (colored vertical lines), with CNV (deletions and gains) and ROH found in the linked regions. Results for the X and Y chromosomes are not presented

List of Tables

Table 1.1 The study of genes involved in dopaminergic mechanism
Table 1.2 Genes involved in serotonin mechanism
Table 2.1 List of parameters studied in the survey of isolates residents
Table 2.2 Number of ethnic groups in Dagestan and mono-ethnic villages in them
Table 2.3 Dynamics of the national structure of the rural population of Dagestan (1926–1989)
Table 3.1 Complex disease gene mapping in genetic isolates of outbred populations: advantages and disadvantages
Table 3.2 Frequency of Y-haplogroups in the studied ethnic populations of Dagestan (Caciagli et al. 2009)
Table 3.3 Analysis of genetic differentiation in the male genome of Caucasus people (Y-chr.), grouped according to different classification criteria
Table 3.4 Structure of morbidity in a number of examined mountain Dagestan isolates
Table 3.5 Description of selected isolates for the study and reconstructed pedigrees
Table 3.6 Structure of morbidity among the members of the pedigrees of studied isolates
Table 3.7 Comparative analysis of the level of heterozygosity and allelic loci ranks of chromosomes 17 and 18 in 5 Dagestan ethnic groups, and summary data from the John Weber lab
Table 3.8 Hardy–Weinberg equilibrium distribution compliance of studied genomic loci of chromosome 17
Table 3.9 Assessment of genetic similarity of examined isolates by summary of genomic loci (Nei 1978)