Molecular Genetics and Genomics https://doi.org/10.1007/s00438-018-1415-8

ORIGINAL ARTICLE

Skewing of the genetic architecture at the ZMYM3 human-specific 5′ UTR short tandem repeat in schizophrenia

F. Alizadeh1 · A. Bozorgmehr3 · J. Tavakkoly‑Bazzaz1 · M. Ohadi2

Received: 17 June 2017 / Accepted: 2 January 2018 © Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract Differential expansion of a number of human short tandem repeats (STRs) at the critical core promoter and 5′ untranslated region (UTR) supports the hypothesis that at least some of these STRs may provide a selective advantage in human evolution. Following a genome-wide screen of all human protein-coding 5′ UTRs based on the Ensembl database (http://www.ensembl.org), we previously reported that the longest STR in this interval is a (GA)32, which belongs to the X-linked zinc finger MYM-type containing 3 (ZMYM3) gene. In the present study, we analyzed the evolutionary implications of this region across species and examined the allele and genotype distribution of the “exceptionally long” STR by direct sequencing of 486 unrelated Iranian male subjects, consisting of 196 cases of schizophrenia (SCZ) and 290 controls. We found that the ZMYM3 transcript containing the STR is human-specific (ENST00000373998.5). A significant allele variance difference was observed between the cases and controls (Levene’s test for equality of variances F = 4.00, p < 0.03). In addition, six alleles were observed in the SCZ patients that were not detected in the control group (“disease-only” alleles) (mid p exact < 0.0003). Those alleles were at the extreme short and long ends of the allele distribution curve and composed 4% of the genotypes in the SCZ group. In conclusion, we found skewing of the genetic architecture at the ZMYM3 STR in SCZ. Further, we found a bell-shaped distribution of alleles and selection against alleles at the extreme ends of this STR. The ZMYM3 STR sets a prototype, the evolutionary course of which determines the range of alleles in a particular species. Extreme “disease-only” alleles and genotypes may change our perspective of adaptive evolution and complex disorders. The ZMYM3 gene “exceptionally long” STR should be sequenced in SCZ and other human-specific phenotypes/characteristics.

Keywords ZMYM3 · Short tandem repeat · Schizophrenia · Exceptionally long · Disease-only

Abbreviations
SCZ Schizophrenia
STR Short tandem repeat
TF Transcription factor
TSS Transcription start site
UTR Untranslated region
ZMYM3 Zinc finger MYM-type containing 3

Communicated by S. Hohmann.

* M. Ohadi [email protected]; [email protected]

1 Department of Medical Genetics, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
2 Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
3 Department of Neuroscience, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran, Iran

Introduction

While single nucleotide substitutions play an important role in intra- and inter-species variations, short tandem repeats (STRs) may be a more efficient source of variation. The above notion is based on the rationale that STRs span longer stretches of DNA and, therefore, can potentially recruit larger numbers of regulatory factors, e.g., transcription factors (TFs). Furthermore, the unique ability of STRs to expand or contract makes them an ideal source of evolutionary adaptation (Ohadi et al. 2012, 2015; King 2012). An increasing wealth of evidence on the importance and relevance of STRs to various phenotypes and characteristics (Bagshaw 2017; Carrat et al. 2017; Press et al. 2014; Hammock and Young 2005) indicates that this understudied category of genetic variation should no longer be ignored as junk DNA.

Certain STRs that reach exceptional length, particularly those in gene regulatory regions, may be of selective advantage with respect to speciation and adaptive evolution. Focusing on the critical core promoter interval of protein-coding genes has unraveled STRs that are directionally expanded in humans (Nikkhah et al. 2016; Rezazadeh et al. 2015; Mohammadparast et al. 2014). The zinc finger MYM-type containing 3 (ZMYM3) gene contains the longest STR identified in the 5′ untranslated region (5′ UTR) of a protein-coding gene in humans (Namdar-Aligoodarzi et al. 2015). This stretch of GA-repeats spans the core promoter and 5′ UTR. ZMYM3 is located on the X chromosome and is subject to X inactivation (van der Maarel et al. 1996). The encoded protein is a component of the histone deacetylase-containing multiprotein complex that functions by modifying chromatin structures to silence genes. There is evidence for relatively specific effects of X-linked genes on social cognition and emotional regulation (van Rijn et al. 2006). In line with this finding, a chromosomal translocation (X;13) involving ZMYM3 is associated with X-linked cognitive disability (van der Maarel et al. 1996).

Mutations/dysregulatory mechanisms in a number of X-linked genes co-occur with SCZ (Piton et al. 2011; Addington et al. 2011; Feng et al. 2009). Because of the critical location of the ZMYM3 STR and its exceptional length in humans, as well as the evidence of functionality of promoter GA-repeats (Mu and Burt 1999; Heidari et al. 2012; Valipour et al. 2013; Kumar and Bhatia 2016; Emamalizadeh et al. 2017), we hypothesized that the GA-STR in the ZMYM3 gene may be involved in processes that have caused the divergence of humans from other species, such as the higher order brain function in humans. We studied schizophrenia (SCZ) as a human-specific disease in which human-specific brain functions are severely compromised. The allele and genotype distribution of the ZMYM3 5′ UTR GA-repeat was examined in a group of patients with SCZ and controls.

Materials and methods

Subjects

A total of 486 unrelated Iranian males were recruited for this study, including patients with SCZ (n = 196) and controls (n = 290). All patients were diagnosed based on the Diagnostic and Statistical Manual of Mental Disorders (DSM-V) diagnostic criteria. Patients were selected from the Roozbeh Hospital-Tehran University of Medical Sciences, Razi Hospital-University of Social Welfare and Rehabilitation Sciences, and Iran Hospital-Iran University of Medical Sciences. All patients were clinically assessed by two expert psychiatrists and clinical psychologists. Patients were excluded if they were mentally retarded or had a history of a severe head injury, serious disorders (e.g., thyroid, heart and liver disease) and a history of substance or alcohol use or addiction within 1 year prior to the study. Healthy subjects were selected from the same areas in Tehran, and were excluded from the study if they or their first-degree relatives had a lifetime history of any psychiatric or non-psychiatric disorders interfering with brain function.

The subjects’ consent was obtained (from their guardians where necessary), and their identities remained confidential throughout the study. The research was approved by the responsible ethics committee of Tehran University of Medical Sciences and adhered to the principles outlined in an internationally recognized standard for the ethical conduct of human research.

Allele and genotype analysis of the ZMYM3 gene GA-repeat

Genomic DNA was obtained from peripheral blood using a standard precipitation method. The following primers were used to amplify the region containing the ZMYM3 5′ UTR STR, which resulted in a PCR fragment of 260 bp:

Forward: 5′ CGCACGAGAAGCAGAGAGG 3′
Reverse: 5′ TTCTCCCTGAGTCTTCCTGC 3′

PCR was carried out in a thermocycler instrument (Applied Biosystems, GeneAmp 2720, Singapore) under the following conditions: 94 °C for 4 min, followed by 40 cycles of denaturing at 94 °C for 30 s, annealing at 63 °C for 30 s, and extension at 72 °C for 30 s. A final extension was conducted at 72 °C for 5 min. All of the samples included in this study were sequenced for the ZMYM3 GA-repeat using an ABI PRISM 377 DNA sequencer.

In silico evolutionary analysis of the 5′ UTR STR of the ZMYM3 gene and the TFs binding to different lengths of this STR

In reference to the location of the human ZMYM3 (GA)32, the ZMYM3 5′ UTR from +1 to +100 bp and the transcript containing the STR were screened in all the species whose sequences were annotated for this gene in the Ensembl database.

The ConSite link http://consite.genereg.net was used to predict TFs binding to different repeat numbers of the ZMYM3 GA-repeat. For example, the pattern of TFs binding to the “disease-only” alleles at either extreme of the GA-repeat was compared to the alleles detected at the extremes in the control range.

The chi-squared test was used to compare the frequency of each allele between the case and control groups. Statistical analysis for calculating the mid p value of the “disease-only” genotypes was performed using the two by two table at OpenEpi. Levene’s test for equality of variances was used to calculate the p value for the significance of variances.
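The mid p calculation on a two by two table can be illustrated with a short sketch. This is not the OpenEpi implementation: it assumes a one-sided hypergeometric (Fisher-type) mid p, and it assumes the table is 8 of 196 cases versus 0 of 290 controls carrying a “disease-only” allele, counts taken from the allele tallies reported in the Results. The function name `mid_p_one_sided` is hypothetical, and the output may not reproduce the reported value exactly.

```python
from math import comb


def mid_p_one_sided(a: int, n_cases: int, c: int, n_controls: int) -> float:
    """One-sided mid p for a 2 x 2 table (upper tail, excess in cases).

    a, c: carriers among cases and controls, respectively.
    """
    total = n_cases + n_controls
    carriers = a + c
    denom = comb(total, carriers)

    def prob(x: int) -> float:
        # Hypergeometric probability that x of the carriers are cases.
        return comb(n_cases, x) * comb(n_controls, carriers - x) / denom

    upper = min(carriers, n_cases)
    # Tables more extreme than the observed one count in full ...
    tail = sum(prob(x) for x in range(a + 1, upper + 1))
    # ... and the observed table counts for half (the "mid" correction).
    return tail + 0.5 * prob(a)


# Assumed counts: 8/196 cases vs. 0/290 controls with a "disease-only" allele.
p = mid_p_one_sided(8, 196, 0, 290)
print(f"one-sided mid p = {p:.6f}")
```

Because the observed table is already the most extreme one possible for these margins, the tail sum is empty and the result is simply half the probability of the observed table.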


Results

In silico evolutionary analysis of the 5′ UTR STR of the ZMYM3 gene

The transcript containing the ZMYM3 5′ UTR STR (ENST00000373998.5) was found to be human-specific. This transcript and the immediate core promoter are enriched by a complex of consecutive GA-STRs (n1, n2, n3, and n4). The longest of these STRs, (GA)n4, contains 32 repeats and is the longest STR identified in the 5′ UTR of protein-coding genes in humans (Fig. 1). The STR formula consisting of (GA)n1, n2, n3, and n4 was found to be human-specific.

The human ZMYM3 core promoter and the 5′ UTR GA-STR polymorphism status in the human subjects

The (GA)n1, (GA)n2, and (GA)n3 STRs were monomorphic across the case and control subjects, at 8, 4, and 6 repeats, respectively. The (GA)n4 alleles ranged from 18 to 41 repeats in the control male subjects and from 17 to 45 repeats in the SCZ cases (Fig. 2).

Extreme “disease-only” alleles/genotypes at (GA)n4 of the ZMYM3 gene in SCZ patients vs. controls

Six alleles were detected at (GA)n4 in the SCZ group and not in the controls (mid p exact < 0.0003) (Fig. 3a, b). These so-called “disease-only” alleles/genotypes had 17 (n = 3), 20 (n = 1), 38 (n = 1), 40 (n = 1), 43 (n = 1), and 45 repeats (n = 1), where “n” stands for the number of patients having these alleles/genotypes (Figs. 2, 4). The “disease-only” alleles/genotypes made up 4% of the SCZ alleles/genotypes. There were no “control-only” alleles detected, i.e., all of the alleles that were represented by the control group were also detected in the cases.

Significant allele variance difference in the control group vs. SCZ cases

A significant allele variance difference was observed between the cases and controls, where allele variance was more restricted in the control subjects (Levene’s test for equality of variances F = 4.000, p < 0.03) (Fig. 4). The frequency of four alleles, the 17-, 26-, 32-, and 33-repeat alleles, was significantly different between the cases and controls (p < 0.05).
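To make the variance comparison concrete, the following is a minimal sketch of how a Levene statistic is computed for two groups of repeat lengths. It assumes the mean-centred form of the test (some software defaults to a median-centred variant, and the source does not say which was used), and the two lists are toy values, not the study data. The function name `levene_w` is hypothetical.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Z_ij = |Y_ij - mean_i|: spread of each observation within its group.
    z = []
    for g in groups:
        centre = mean(g)
        z.append([abs(y - centre) for y in g])

    z_bar_i = [mean(zi) for zi in z]               # per-group mean deviation
    z_bar = sum(sum(zi) for zi in z) / n_total     # overall mean deviation

    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Toy repeat-length samples (hypothetical): same mean, different spread.
narrow = [30, 31, 32, 33, 34]
wide = [24, 28, 32, 36, 40]
w = levene_w(narrow, wide)
print(f"W = {w:.4f}")
```

W is referred to an F distribution with (k − 1, N − k) degrees of freedom to obtain a p value; that step is omitted here because the Python standard library has no F distribution.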

Fig. 1 Schematic representation of the human ZMYM3 GA-repeat complex encompassing the core promoter and 5′ UTR region. (GA)n4 is the longest STR identified in the 5′ UTR of a protein-coding gene in humans. (GA)n1, (GA)n2, and (GA)n3 are monomorphic in humans. The STR formula for the four GA-STR complex is human-specific

Fig. 2 Bell-shaped allele distribution of the “exceptionally long” GA-repeat in ZMYM3 in SCZ patients and controls. The range of alleles was between 17 and 45 in the SCZ cases and 18 and 41 in the controls. Asterisks represent alleles that are significantly different in frequency between cases and controls


Fig. 3 Electropherogram of the “disease-only” genotypes in the exceptionally long STR, (GA)n4, of ZMYM3. Alleles at the short extreme (17- and 20-repeat) (a) and long extreme (38-, 40-, 43-, and 45-repeat) (b) were detected in the SCZ cases and not in the controls. The bars represent the GA-repeat

Fig. 4 The GA-repeat variance distribution in ZMYM3 between the SCZ and control groups. Allele variance was significantly different between the case and control groups. Less variation was observed in the control group, partially at the extreme ends of the STR. Asterisks represent alleles that are significantly different in frequency between cases and controls

Discussion

As a prototype of disorders unique to humans, SCZ unravels a number of human-specific predisposing factors. The ZMYM3 gene transcript containing the “exceptionally long” GA-repeat is human-specific and is located on the X chromosome. Numerous genes located on this chromosome are linked to higher order brain function in humans such as cognition, a property that is ubiquitously impaired in SCZ. ZMYM3 is linked to two other cognition deficit disorders, Alzheimer’s disease (Aubry et al. 2015) and X-linked mental retardation (van der Maarel et al. 1996). The present study provides the first evidence of the involvement of this gene as a risk factor for SCZ.

The stretch of four consecutive GA-STRs across the core promoter and 5′ UTR of human ZMYM3 is likely to conform to DNA domains and function to regulate gene expression and mRNA stability. In line with the human-specificity of the transcript containing the “exceptionally long” STR, recent reports indicate a role of repetitive sequences in the creation of new TSSs during human evolution (Li et al. 2017).

The 17- and 18-repeat alleles were the shortest alleles in the SCZ cases and controls, respectively. These alleles potentially recruit 14 and 15 TF sets, respectively. Likewise, at the long extreme end, the 41-repeat allele, the longest allele in the controls, potentially recruits 2 and 4 fewer TF sets than the 43- and 45-repeat “disease-only” alleles, respectively. These analyses are examples of how different repeat numbers can potentially recruit various numbers of TFs.

The extreme “disease-only” alleles detected in this study were non-existent across over 30,000 alleles annotated in the Genome Aggregation Database (http://www.genomAd.org). The role of extreme “disease-only” alleles as risk factors in the pathogenesis of SCZ is largely unknown. Support for such alleles at an STR locus was previously reported by our group at the core promoter of the human RIT2 gene, in which extreme deviation from the predominant 11-repeat allele was detected in SCZ (Emamalizadeh et al. 2017). While the extreme 5-repeat allele is non-existent in agricultural humans (http://www.genomAd.org), its presence has been reported in hunter-gatherer men sequenced from Southern Africa (Schuster et al. 2010). Co-occurrence of extreme “disease-only” alleles and genotypes with a number of neuropsychiatric disorders at the CYTH4 and CAV1 genes further supports this role (Khademi et al. 2017; Darvish et al. 2013; Zarif-Yeganeh et al. 2010).

Conclusion

Our findings support the selection against alleles in the extreme short and long ends of the human ZMYM3 gene STR. This STR sets a prototype, the evolutionary course of which determines the range of alleles in a particular species. Extreme “disease-only” alleles and genotypes may change our perspective of adaptive evolution and complex disorders. Large-scale sequencing of the GA-STR in the ZMYM3 gene should be performed for a wide range of characteristics and disorders that are specific to the human species.

Funding This research was funded by the University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.

Compliance with ethical standards

Conflict of interest The following authors declare that there is no conflict of interest: Fatemeh Alizadeh, Ali Bozorgmehr, Javad Tavakkoly-Bazzaz, and Mina Ohadi.

Ethical approval All procedures performed in studies involving human participants were conducted in accordance with the ethical standards of the institutional and/or national research committee and the 1964 Helsinki declaration and its later amendments or comparable ethical standards.


References

Addington AM, Gauthier J, Piton A, Hamdan FF, Raymond A, Gogtay N, Miller R, Tossell J, Bakalar J, Inoff-Germain G, Gochman P, Long R, Rapoport JL, Rouleau GA (2011) A novel frameshift mutation in UPF3B identified in brothers affected with childhood onset schizophrenia and autism spectrum disorders. Mol Psychiatry 16(3):238–239. https://doi.org/10.1038/mp.2010.59 (Epub 2010 May 18)

Aubry S, Shin W, Crary JF, Lefort R, Qureshi YH, Lefebvre C, Califano A, Shelanski ML (2015) Assembly and interrogation of Alzheimer’s disease genetic networks reveal novel regulators of progression. PLoS One 10(3):e0120352. https://doi.org/10.1371/journal.pone.0120352

Bagshaw ATM (2017) Functional mechanisms of microsatellite DNA in eukaryotic genomes. Genome Biol Evol 9(9):2428–2443. https://doi.org/10.1093/gbe/evx164

Carrat GR, Hu M, Nguyen-Tu MS, Chabosseau P, Gaulton KJ, van de Bunt M, Siddiq A, Falchi M, Thurner M, Canouil M, Pattou F, Leclerc I, Pullen TJ, Cane MC, Prabhala P, Greenwald W, Schulte A, Marchetti P, Ibberson M, MacDonald PE, Manning Fox JE, Gloyn AL, Froguel P, Solimena M, McCarthy MI, Rutter GA (2017) Decreased STARD10 expression is associated with defective insulin secretion in humans and mice. Am J Hum Genet 100(2):238–256

Darvish H et al (2013) Biased homozygous haplotypes across the human caveolin 1 upstream purine complex in Parkinson’s disease. J Mol Neurosci 51(2):389–393

Emamalizadeh B, Movafagh A, Darvish H, Kazeminasab S, Andarva M, Namdar-Aligoodarzi P, Ohadi M (2017) The human RIT2 core promoter short tandem repeat predominant allele is species-specific in length: a selective advantage for human evolution? Mol Genet Genom 292(3):611–617

Feng J, Sun G, Yan J, Noltner K, Li W, Buzin CH, Longmate J, Heston LL, Rossi J, Sommer SS (2009) Evidence for X-chromosomal schizophrenia associated with microRNA alterations. PLoS One 4(7):e6121. https://doi.org/10.1371/journal.pone.0006121

Hammock EA, Young LJ (2005) Microsatellite instability generates diversity in brain and sociobehavioral traits. Science 308(5728):1630–1634

Heidari A et al (2012) Core promoter STRs: novel mechanism for inter-individual variation in gene expression in humans. Gene 492(1):195–198

Khademi E, Alehabib E, Shandiz EE, Ahmadifard A, Andarva M, Jamshidi J, Rahimi-Aliabadi S, Pouriran R, Nejad FR, Mansoori N, Shahmohammadibeni N, Taghavi S, Shokraeian P, Akhavan-Niaki H, Paisán-Ruiz C, Darvish H, Ohadi M (2017) Support for “disease-only” genotypes and excess of homozygosity at the CYTH4 primate-specific GTTT-repeat in schizophrenia. Genet Test Mol Biomark 21(8):485–490

King DG (2012) Evolution of simple sequence repeats as mutable sites. Adv Exp Med Biol 769:10–25

Kumar S, Bhatia S (2016) A polymorphic (GA/CT)n-SSR influences promoter activity of Tryptophan decarboxylase gene in Catharanthus roseus L. Don. 10. Sci Rep 6:33280. https://doi.org/10.1038/srep33280

Li C, Lenhard B, Luscombe NM (2017) Integrated analysis sheds light on evolutionary trajectories of young transcription start sites in the human genome. bioRxiv. https://doi.org/10.1101/192757

Mohammadparast S et al (2014) Exceptional expansion and conservation of a CT-repeat complex in the core promoter of PAXBP1 in primates. Am J Primatol 76(8):747–756

Mu W, Burt DR (1999) The mouse GABA(A) receptor alpha3 subunit gene and promoter. Brain Res Mol Brain Res 73(1–2):172–180

Namdar-Aligoodarzi P, Mohammadparast S, Zaker-Kandjani B, Talebi Kakroodi S, Jafari Vesiehsari M, Ohadi M (2015) Exceptionally long 5′ UTR short tandem repeats specifically linked to primates. Gene 569(1):88–94

Nikkhah M et al (2016) An exceptionally long CA-repeat in the core promoter of SCGB2B2 links with the evolution of apes and Old World monkeys. Gene 576(1 Pt 1):109–114

Ohadi M, Mohammadparast S, Darvish H (2012) Evolutionary trend of exceptionally long human core promoter short tandem repeats. Gene 507(1):61–67

Ohadi M, Valipour E, Ghadimi-Haddadan S, Namdar-Aligoodarzi P, Bagheri A, Kowsari A, Rezazadeh M, Darvish H, Kazeminasab S (2015) Core promoter short tandem repeats as evolutionary switch codes for primate speciation. Am J Primatol 77(1):34–43

Piton A, Gauthier J, Hamdan FF, Lafrenière RG, Yang Y, Henrion E et al (2011) Systematic resequencing of X-chromosome synaptic genes in autism spectrum disorder and schizophrenia. Mol Psychiatry 16(8):867–880

Press MO, Carlson KD, Queitsch C (2014) The overdue promise of short tandem repeat variation for heritability. Trends Genet 30(11):504–512

Rezazadeh M et al (2015) A primate-specific functional GTTT-repeat in the core promoter of CYTH4 is linked to bipolar disorder in human. Prog Neuropsychopharmacol Biol Psychiatry 56:161–167

Schuster SC, Miller W, Ratan A, Tomsho LP, Giardine B, Kasson LR et al (2010) Complete Khoisan and Bantu genomes from southern Africa. Nature 463(7283):943–947

Valipour E, Kowsari A, Bayat H, Banan M, Kazeminasab S, Mohammadparast S, Ohadi M (2013) Polymorphic core promoter GA-repeats alter gene expression of the early embryonic developmental genes. Gene 531(2):175–179

van Rijn S, Swaab H, Aleman A, Kahn RS (2006) X Chromosomal effects on social cognitive processing and emotion regulation: a study with Klinefelter men (47,XXY). Schizophr Res 84(2–3):194–203

van der Maarel SM, Scholten IH, Huber I, Philippe C, Suijkerbuijk RF, Gilgenkrantz S, Kere J, Cremers FP, Ropers HH (1996) Cloning and characterization of DXS6673E, a candidate gene for X-linked mental retardation in Xq13.1. Hum Mol Genet 5(7):887–897

Zarif-Yeganeh M, Mirabzadeh A, Khorram Khorshid HR, Kamali K, Heshmati Y, Gozalpour E, Veissy K, Olad Nabi M, Najmabadi H, Ohadi M (2010) Novel extreme homozygote haplotypes at the human caveolin 1 gene upstream purine complex in sporadic Alzheimer’s disease. Am J Med Genet B Neuropsychiatr Genet 153B(1):347–349
