
New Research

Development

Evolution of the Muscarinic Acetylcholine Receptors in Vertebrates

Julia E. Pedersen,* Christina A. Bergqvist,* and Dan Larhammar

https://doi.org/10.1523/ENEURO.0340-18.2018

Department of Neuroscience, Unit of Pharmacology, Science for Life Laboratory, Uppsala University, Uppsala SE-751 24, Sweden

Abstract

The family of muscarinic acetylcholine receptors (mAChRs) consists of five members in mammals, encoded by the CHRM1-5 genes. The mAChRs are G-protein-coupled receptors, which can be divided into the following two subfamilies: M2 and M4 receptors coupling to Gi/o; and M1, M3, and M5 receptors coupling to Gq/11. However, despite the fundamental roles played by these receptors, their evolution in vertebrates has not yet been fully described. We have combined sequence-based phylogenetic analyses with comparisons of exon–intron organization and conserved synteny in order to deduce the evolution of the mAChR receptors. Our analyses verify the existence of two ancestral genes prior to the two tetraploidizations (1R and 2R). After these events, one gene had duplicated, resulting in CHRM2 and CHRM4; and the other had triplicated, forming the CHRM1, CHRM3, and CHRM5 subfamily. All five genes are still present in all vertebrate groups investigated except the CHRM1 gene, which has not been identified in some of the teleosts or in chicken or any other birds. Interestingly, the third tetraploidization (3R) that took place in the teleost predecessor resulted in duplicates of all five mAChR genes, of which all 10 are present in zebrafish. One of the copies of the CHRM2 and CHRM3 genes and both CHRM4 copies have gained introns in teleosts. Not a single separate (nontetraploidization) duplicate has been identified in any vertebrate species. These results clarify the evolution of the vertebrate mAChR family and reveal a doubled repertoire in zebrafish, inviting studies of gene neofunctionalization and subfunctionalization.

Key words: acetylcholine; G-protein-coupled receptor; muscarinic; tetraploidization; zebrafish

Significance Statement

Despite their pivotal physiologic role, the evolution of the muscarinic acetylcholine receptors (mAChRs) has not yet been resolved. By investigating the genomes of a broad selection of vertebrate species and combining three different types of data, namely sequence-based phylogeny, conserved synteny, and intron organization, we have deduced the evolution of the mAChR genes in relation to the major vertebrate tetraploidizations (1R, 2R, and 3R). Our analyses show that all vertebrate mAChR gene duplications resulted from the tetraploidizations. Interestingly, following 3R, zebrafish doubled its gene number, resulting in the 10 mAChR genes present. By knowing how and when the mAChR genes arose, studies of receptor subtype specialization and possible neofunctionalization or subfunctionalization can follow.

Received August 27, 2018; accepted October 17, 2018; First published October 24, 2018.
The authors declare no competing financial interests.
Author contributions: J.E.P., C.A.B., and D.L. designed the research. J.E.P. and C.A.B. performed research; J.E.P., C.A.B., and D.L. analyzed data; J.E.P. and D.L. wrote the paper.
This project was supported by grants from the Carl Trygger Foundation and the FACIAS Foundation.
*J.E.P. and C.A.B. contributed equally.
We thank Jan-Erik Borg for preliminary analyses at the initial stage of this project.
Correspondence should be addressed to Dan Larhammar, Department of Neuroscience, Unit of Pharmacology, Science for Life Laboratory, Box 593, Uppsala University, SE-751 24 Uppsala, Sweden. E-mail: [email protected].
Copyright © 2018 Pedersen et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
September/October 2018, 5(5) e0340-18.2018 1–12, eNeuro.org

Introduction

The muscarinic acetylcholine receptors (mAChRs) are G-protein-coupled receptors (GPCRs) involved in a variety of CNS processes such as cognition, learning, and memory. They are also present in the peripheral nervous system and smooth muscle tissue. The mAChR family consists of five different receptor subtypes named M1–M5, which are encoded by the CHRM1-5 genes. The structures of the muscarinic receptors follow the typical GPCR structure with the extracellular N terminus followed by seven transmembrane (TM) domains (TM domains 1–7), which are separated by three intracellular loops (ILs; 1–3), three extracellular loops (ELs; 1–3), and finally the intracellular C terminus. The orthosteric binding site for acetylcholine consists of a hydrophobic pocket formed by the side chains of TM domains 3–7. The crystal structures of the M2 and M3 receptors have been reported (Haga et al., 2012; Kruse et al., 2012), showing that the binding pocket contains identical amino acid residues in the M2 and M3 receptors (Haga et al., 2012; Kruse et al., 2012; Tautermann et al., 2013). In the study by Haga et al. (2012), 14 amino acid residues were found to form the antagonist binding sites and, following modeling of acetylcholine into the antagonist-binding pocket, 6 of these residues were suggested to bind acetylcholine. These six proposed acetylcholine-binding residues have also been reported to be conserved in Drosophila melanogaster (Collin et al., 2013). In the EL regions, the amino acid residues are less conserved, hence these have been targets for the design of drugs working as allosteric modulators (Christopoulos, 2002; Kruse et al., 2013, 2014). The M1, M3, and M5 receptors form one subfamily, coupling to Gq/11, and the M2 and M4 receptors form one subfamily, coupling to Gi/o. Hence, acetylcholine may give rise to different responses depending on which receptor subtype is present to initiate the signal transduction.

The mAChRs are widely expressed in the nervous system, the cardiovascular system, and the gastrointestinal tract, as well as elsewhere. In the peripheral nervous system, the mAChRs play a major role in the parasympathetic system, stimulating smooth muscle contraction and glandular secretion as well as slowing the heart rate (Eglen, 2005). In the CNS of primates and rodents, the M1, M2, and M4 receptors are the most highly expressed mAChRs in the brain, but M3 and M5 are also present (Thiele, 2013; Lebois et al., 2018). Regarding mechanisms behind gene expression and similarities or dissimilarities among vertebrate species, little is known (Lebois et al., 2018). Each gene may have multiple promoters, as has been demonstrated for CHRM2 (Krejci et al., 2004), and while some promoters are conserved across mammals, others differ and presumably contribute to anatomic or temporal differences in expression between species.

Although the muscarinic receptors have prominent roles in various nervous system functions, the evolution of the mAChR family has not yet been fully resolved. It is important to deduce evolutionary relationships to distinguish orthologs (species homologs), paralogs (gene duplicates), and ohnologs (gene duplicates resulting specifically from tetraploidization events), especially when studying species that belong to evolutionarily distant groups, for instance the commonly used experimental animals mouse/rat, chicken, and zebrafish. Furthermore, the time points of the gene duplication events are important for studies of evolutionary change between orthologs and paralogs as well as ohnologs. It is now well established that the vertebrate predecessor underwent two rounds of whole-genome duplication (i.e., tetraploidizations) before the radiation of jawed vertebrates (Nakatani et al., 2007; Putnam et al., 2008). These two events are usually referred to as 1R and 2R. In addition, the ancestor of the teleosts went through a third tetraploidization (3R) after the divergence from the most basal lineages of ray-finned fishes (Jaillon et al., 2004).

As the availability of high-quality genome assemblies is continuously increasing, it is now possible to perform a more extensive analysis of the evolution of the mAChR family. We have used an approach that combines amino acid sequence-based phylogeny and analyses of chromosomal locations for comparison of synteny and duplicated chromosome regions. We report here that the 1R and 2R genome-doubling events duplicated the two ancestral mAChR genes to the five genes that are present today in all tetrapods investigated except birds, where CHRM1 has not been found. Furthermore, the teleost 3R event doubled the repertoire once more, resulting in the 10 genes present today in zebrafish, albeit some teleosts seem to lack both copies of CHRM1. This long-lived multiplicity invites further studies of the roles of each of the subtypes.

Materials and Methods

Species included in analysis and amino acid sequence retrieval

Species sequences included in the analysis of the mAChR family were the human (Homo sapiens; Hsa), mouse (Mus musculus; Mmu), opossum (Monodelphis domestica; Mdo), chicken (Gallus gallus; Gga), anole lizard (Anolis carolinensis; Aca), frog (Xenopus tropicalis; Xtr), coelacanth (Latimeria chalumnae; Lch), spotted gar (Lepisosteus oculatus; Loc), Japanese eel (Anguilla japonica; Aja), European eel (Anguilla anguilla; Aan), zebrafish (Danio rerio; Dre), stickleback (Gasterosteus aculeatus; Gac), medaka (Oryzias latipes; Ola), tunicates (Ciona intestinalis; Cin and Ciona savignyi; Csa), and nematode (Caenorhabditis elegans; Cel). The amino acid sequences from the species listed were retrieved from the Ensembl genome browser (release 87; Zerbino et al., 2018; Ensembl Genome Browser, RRID:SCR_013367) or NCBI (NCBI; RRID:SCR_006472) databases. If sequences were not found in either of the databases, the sequence of a closely related species was used as a query sequence in a TBLASTN search (TBLASTN; RRID:SCR_011822). The Japanese and European eel genome assemblies are not annotated. Therefore, spotted gar mAChR gene sequences were used as templates to run a TBLASTN search and retrieve mAChR orthologs present in the Japanese eel; thereafter, Japanese eel sequences were used as templates in a TBLASTN search for mAChR orthologs present in the European eel. Due to a high degree of sequence conservation between all mAChR genes in the Japanese eel and European eel, only the European eel sequences were included in the sequence-based phylogenetic analysis, as this species contained a more complete mAChR gene repertoire than the Japanese eel.

Sequence alignments and phylogenetic analyses

The retrieved amino acid sequences were aligned using Jalview 2.10.3b1, with Muscle default settings (Waterhouse et al., 2009; Jalview, RRID:SCR_006459). If the amino acid sequences were aligning poorly and the predictions appeared questionable, the genomic sequences were investigated and the sequences were manually edited, by comparing sequence and consensus donor and acceptor splice sites. Manual corrections were made where the alignment appeared shifted due to a low degree of conservation, such as the first part of the sequence or the IL3 loop region between TM5 and TM6. However, these adjustments were kept to a minimum. All sequence details are included in Fig. 1-4, and the full alignment is available in Fig. 1-1 and Fig. 1-2. A maximum likelihood analysis was performed using the PhyML 3.0 web server (PhyML; RRID:SCR_014629; Guindon et al., 2010). The optimal substitution model was selected by the "Automatic Model Selection by SMS" option, with the Akaike information criterion. Additional settings selected were as follows: BIONJ as starting tree, subtree-pruning-regrafting (SPR) for tree improvement, no random starting trees, no fast likelihood methods, and finally bootstrapping with 100 replicates. The resulting tree was displayed in FigTree version 1.4.2 (FigTree, RRID:SCR_008515), rooted with C. elegans.

Conserved synteny and paralogon analysis

For synteny and paralogon analyses, the neighboring regions of the mAChR genes were investigated in human, chicken, and spotted gar. Lists of gene families in the genomic regions 10 Mb upstream and 10 Mb downstream of the CHRM2/CHRM4 and CHRM1/CHRM3/CHRM5 genes, respectively, in the spotted gar genome were retrieved using the Biomart function in Ensembl version 83 (Ensembl Genome Browser; RRID:SCR_013367). For the CHRM2/CHRM4 genes, lists of gene families were also retrieved in a similar manner in the chicken genome, due to the small number of neighboring gene families in the genomic regions in the spotted gar. Thereafter, families containing two members or more were phylogenetically analyzed by retrieving the amino acid sequences for human, chicken, coelacanth, spotted gar, and zebrafish. The region surrounding the CHRM1, CHRM3, and CHRM5 genes contained a higher number of families; therefore, families with at least three gene family members in the spotted gar were analyzed. As outgroups, tunicate, amphioxus, Drosophila, C. elegans, and in some cases other human gene sequences were included. aLRT SH-like trees were constructed using the PhyML 3.0 web server (PhyML; RRID:SCR_014629; Guindon et al., 2010) to verify the sequence orthology. To apply the optimal substitution model, the "Automatic model selection by SMS" option was selected, with the Akaike information criterion. SPR was used as the tree improvement method. If the members of a family showed unclear topology and/or weak node support and/or if the family showed a high degree of conservation and/or lack of outgroups and/or massive expansions due to local duplications, it was excluded from the analysis. The relatively low number of gene families in the regions surrounding the CHRM2 and CHRM4 genes resulted in a lack of information on the fourth chromosome and its gene family members in the spotted gar. Therefore, synteny figures of the current neighboring gene repertoire in the zebrafish were prepared and included, for the paralogon structure in the ray-finned fishes. Based on the results from the phylogeny and synteny analyses, some genes appeared incorrectly named or were lacking names; therefore, genes were renamed or named according to the "ZFIN Zebrafish Nomenclature Conventions" (ZFIN; RRID:SCR_002560), and the proposed gene names were submitted to ZFIN. The details of the gene family sequences included in the analysis are provided in Fig. 2-3 for the CHRM2 and CHRM4 paralogon and in Fig. 3-3 for the CHRM1, CHRM3, and CHRM5 paralogon, and the aLRT SH-like trees are included in Fig. 2-1 and Fig. 3-1, respectively.

Intron position analysis in teleosts

To analyze specific intron gains in the CHRM2b, CHRM3b, CHRM4a, and CHRM4b teleost genes, additional teleost species included in the analysis were the Amazon molly (Poecilia formosa; Pfo) and fugu (Takifugu rubripes; Tru). Sequences were analyzed and aligned as described above. Intron positions were determined by manual investigation of amino acid and nucleotide sequences. Transmembrane domain regions were predicted by consulting the TMHMM Server version 2.0 (TMHMM Server; RRID:SCR_014935). For comparative analyses and confirmation of intron positions, the Japanese eel was also analyzed.

Results

Two ancestral mAChR genes expanded to five following 1R and 2R

The multiple sequence alignment analysis of the mAChR genes confirmed a generally high degree of overall sequence identity both across receptor subtypes and across species. The degree of identity for the seven TM regions between one of the most slowly evolving vertebrate model species, spotted gar, and human is ~83% for CHRM1, 87% for CHRM3, and 90% for CHRM5. In the other subfamily, the identity is even higher with 96% for CHRM2 and 95% for CHRM4, displayed by Jalview 2.10.3b pairwise alignment. Overall, CHRM2 displays the highest degree of conservation, whereas CHRM1 displays the lowest. The CHRM2/CHRM4 subfamily displays a higher degree of conservation than the CHRM1/CHRM3/CHRM5 subfamily. This is also confirmed in human–chicken ortholog comparisons and human paralog comparisons. However, when including the complete amino acid sequences in the pairwise alignment, the percentage of identity drops considerably. For instance, the well conserved CHRM2 subtype decreases from 96% to 75% identity when including the complete sequence. One
reason for this is that IL3, located between TM5 and TM6, is highly variable between receptor subtypes as well as between species for each subtype (Fig. 1-1). This region is involved in interactions with G-proteins and other cytoplasmic components and is a potential target for regulatory phosphorylations. If the most variable part of the IL3 region is excluded from the pairwise alignment, the following identities are found: 70% for CHRM1; 76% for CHRM3; 80% for CHRM5; 87% for CHRM2; and 86% for CHRM4. Hence, CHRM2 increases from 75% to 87%. Due to the high variability in the IL3 region, an alignment excluding this region was prepared (Fig. 1-2) that served as the basis for the phylogenetic analyses. Further, the six amino acid residues proposed to be involved in acetylcholine binding by Collin et al. (2013) were conserved across all vertebrate sequences included in this study.

The phylogenetic maximum likelihood (PhyML) tree of the predicted mAChR proteins is displayed in Figure 1. The tree is rooted with a sequence from the nematode C. elegans. After the split between protostomes and deuterostomes, the primordial gene was duplicated in the deuterostome lineage to form the two ancestral mAChR genes present in the vertebrate predecessor, later to form the two mAChR subfamilies. The closest relatives of these two vertebrate subfamilies are two groups of tunicate sequences. The ancestor of the CHRM2/CHRM4 subfamily duplicated in the 1R-2R tetraploidizations, as shown by paralogon comparisons described below and also supported by the species distribution, resulting in the CHRM2 and CHRM4 genes (Fig. 1). The CHRM2 and CHRM4 genes are present in all vertebrates investigated. Furthermore, duplicates of the CHRM2 and CHRM4 genes are present in zebrafish, medaka, and stickleback. A duplicate of the CHRM4 gene is also present in the European eel, but only one CHRM2 gene has been found in this species. The ancestor of the CHRM1/CHRM3/CHRM5 subfamily triplicated in 1R-2R, giving rise to the CHRM1, CHRM3, and CHRM5 genes (Fig. 1). The CHRM3 and CHRM5 genes are present in all vertebrates investigated, with duplicates in the teleosts. However, the CHRM1 gene shows a slightly different species distribution; the gene has not been identified in the chicken, and we were unable to find it in any of the bird genomes. The gene is also missing in the medaka and stickleback genome assemblies, but it is present in European eel and zebrafish, and it has also retained duplicates in both species. Notably, the PhyML analysis shows that, despite exclusion of the IL3 loop with its low sequence conservation, some mAChR family genes have evolved at much higher rates, particularly in some of the teleosts. The CHRM1 orthologs also appear to have evolved at a higher rate than the other four subtypes, as shown by the long branches in the tree (Fig. 1).

Figure 1. PhyML tree of the mAChR genes (CHRM1-CHRM5), rooted with C. elegans. The tree topology is supported by a nonparametric bootstrap analysis with 100 replicates. In the multiple sequence alignment that the PhyML tree is based on, the IL3 region was excluded as this region showed a low degree of sequence conservation. In the sequence names, the species is followed by the chromosome or genomic scaffold at which the gene is located (numbers or roman numerals). If several genes are located on the same chromosome or genomic scaffold, their order is indicated by an additional number. Cin, Ciona intestinalis; Csa, Ciona savignyi. All sequence details are listed in Fig. 1-4. The Jalview sequence alignment from which the PhyML tree was created is presented in Fig. 1-2. Fig. 1-1 and Fig. 1-3 display the sequence alignment and PhyML tree of the complete sequences, including the IL3 region.

The CHRM1/CHRM3/CHRM5 subfamily has a few tunicate sequences as its closest relatives (Fig. 1), whereas the CHRM2/CHRM4 subfamily does not. Instead, there
are two groups of tunicate sequences present basally to both of the vertebrate subfamilies. However, the bootstrap values show that there is weak node support for the positioning of the tunicate sequences in the PhyML tree, and some of them have also evolved very fast, as shown by their long branches; hence, their positioning in the PhyML tree may not mirror the actual phylogeny.

The positioning of the tunicates basally to the CHRM1/CHRM3/CHRM5 subfamily shows that the expansion of this subfamily occurred after the divergence of the vertebrates and the invertebrate chordates (here represented by tunicates), a period that coincides with the timing of the 1R and 2R events. This is further supported by the species distribution of these three subtypes. Although a tunicate group is missing for the CHRM2/CHRM4 subfamily, the timing of the duplication events, as well as the species distribution of the two subtypes, coincides with the duplication events in the CHRM1/CHRM3/CHRM5 subfamily, supporting the 1R and 2R expansion hypothesis also for this subfamily. Hence, from this phylogenetic analysis we conclude that the mAChR family most likely expanded from two ancestral members present before 1R and 2R, to five members present following the vertebrate tetraploidizations. Additionally, duplicates found in the group of teleosts coincide with the timing of the teleost-specific tetraploidization, hence suggesting that those gene duplicates are paralogs resulting from 3R and can thus be called ohnologs (see below). The PhyML tree resulting from the analysis of the complete multiple sequence alignment, also including the IL3 region, is shown in Fig. 1-3. Details about all sequences included in the analysis are listed in Fig. 1-4.

Analysis of synteny blocks confirms expansion of the mAChR family by 1R and 2R

To explore the hypothesis that two ancestral mAChR genes expanded to five as a result of the basal vertebrate tetraploidizations, analyses of the mAChR neighboring genomic regions were conducted. If the members of each of the two mAChR subfamilies are located in chromosomal regions that also contain representatives from several other gene families, this would strongly indicate that a large block of genes, or even a chromosomal region or an entire chromosome, had been duplicated. The related genes resulting from these events are named ohnologs, as described in the Introduction. On the other hand, if the members of an mAChR subfamily are in completely different chromosomal neighborhoods, this would indicate independent duplications of an mAChR gene and insertion into unrelated chromosomal regions. Related chromosomal regions that arose as a result of the 1R and 2R tetraploidizations (or any other tetraploidization event) are referred to as comprising a paralogon (Coulier et al., 2000). Thus, the vertebrate ancestral genome consisted of paralogons with quartets of related regions, whereas the teleost ancestor had paralogons with eight related members as a result of 3R. In extant species, the paralogons have often secondarily lost some of the ohnologs. Our phylogenetic analysis of the mAChR family showed that the expansion of the mAChR gene family coincides with the time period of the tetraploidizations. We therefore analyzed the neighboring gene families to see whether these too expanded during this time period and also whether they have representatives in the other chromosomal regions of the same paralogon.

The genomic regions surrounding the CHRM2/CHRM4 genes in the spotted gar were retrieved, and the gene families with at least two members present were analyzed. Following the exclusion criteria at the preliminary analysis stage stated in the Materials and Methods, five gene families were included in the final analysis, namely ARHGAP, NAV, NELL, PPFIA, and SHANK. However, due to the low number of gene families found in spotted gar, the genomic regions surrounding the CHRM2/CHRM4 genes were investigated in the chicken and six additional gene families were identified. The reason why they were not found in the neighboring regions in spotted gar could be chromosomal rearrangements. The additional neighboring families are ABTB2, CREBL, CRY (for sequence details and phylogenetic analysis, see Haug et al., 2015), DGK, MYBP, and RASSF. Information about the neighboring gene families is included in Fig. 2-3, and their phylogenetic trees [aLRT (approximate likelihood-ratio test) SH (Shimodaira–Hasegawa)-like trees] are included in Fig. 2-1. From the phylogenetic analyses, orthologous and paralogous genes were determined, and the chromosomal locations of the neighboring gene families in human, chicken, and spotted gar are presented in Figure 2.

Figure 2. The evolutionary history and analysis of chromosomal regions and conserved synteny of the CHRM2 and CHRM4 genes and their neighboring gene families. The gene repertoire present in the vertebrate predecessor is displayed in the top panel, the duplication scheme further displays which orthologs were retained in the vertebrate ancestor following 1R and 2R, and finally the last three panels display the gene repertoire present in the human, chicken, and spotted gar. Crosses indicate gene loss or gene not (yet) identified. Dashed boxes represent incomplete sequences. Each paralogon member is presented in a separate color. Chicken and spotted gar illustrations are reused with permission from Daniel Ocampo Daza (source: www.egosumdaniel.se). The sequence details are listed in Fig. 2-3. The aLRT SH-like trees are displayed in Fig. 2-1, and the chromosomal regions and conserved synteny of the CHRM2 and CHRM4 genes and their neighboring gene families in zebrafish are displayed in Fig. 2-2.

In humans, the first member of this paralogon (Fig. 2, yellow) consists of regions located on three separate chromosomes (chromosomes 7, 22, and 12). In chicken and spotted gar, the orthologs are located on a single chromosome (chromosome 1 in the chicken and LG8 in spotted gar). This strongly suggests that this paralogon member was most likely split by chromosomal translocations in the lineage leading to human. The second paralogon member (Fig. 2, orange) is located on a single chromosome in human and chicken, and with one exception also in spotted gar. The third paralogon member is also restricted to a single chromosome for these gene families in all three species. Note that the MYBPHL and MYBPH genes in human are a result of a more recent local duplication (Fig. 2, red). This paralogon member has undergone more ohnolog losses than the other two members. The fourth paralogon member (Fig. 2, brown) seems to have undergone even more ohnolog losses and is only present in the human genome, with members from the MYBP, PPFIA, and SHANK families (on chromosome 19).

To investigate whether this paralogon member is present in other species, the neighboring gene family repertoire and chromosomal locations were investigated in zebrafish, presented in Fig. 2-2. The fourth paralogon member was indeed found to be present in zebrafish, represented by MYBP, PPFIA, and SHANK ohnologs (on chromosomes 3 and 24). However, it appears that additional translocations have occurred in zebrafish, most likely following the 3R event. These genes are located in a paralogon that has been studied previously, where the focus was on a region containing numerous other gene families including the visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors (Lagman et al., 2013). Therefore, although the third and fourth members of this region have undergone several ohnolog losses, it is nevertheless clear that the CHRM2 and CHRM4 genes are located in a paralogon that arose in the basal vertebrate tetraploidizations.

The CHRM1/CHRM3/CHRM5 subfamily arose from a separate ancestor gene. Our investigation of the paralogon hypothesis was initiated by retrieving the genomic regions surrounding the CHRM1/CHRM3/CHRM5 genes in spotted gar. These contained a higher number of gene families than the regions surrounding the CHRM2/CHRM4 genes and therefore the analysis was restricted to gene families with three or four members. Following the exclusion criteria stated in the Materials and Methods, 15 neighboring gene families were included in the final analysis, namely ATL, EHD, FERMT, JAG, LTBP, MERTK, NRXN, PLD, PRKD, PROX, PRPH2, PYG, SLC24A, SPTB, and TGFB. Information about the neighboring gene families is included in Fig. 3-3, and the phylogenetic trees (aLRT SH-like) are included in Fig. 3-1. Based on the phylogenetic analyses, the chromosomal locations of the gene family members are shown for human, chicken, and spotted gar in Figure 3.

The first paralogon member (Fig. 3, green) is located on a single chromosome in human and spotted gar (chromosome 11 and LG28, respectively). Three of the genes located on LG28 (LTBP3, PROX, and SPTBN2; Fig. 3, dashed boxes) were incomplete in the genome database and contain <50% of the sequence. As this might impact the topology in the aLRT SH-like trees, trees were also generated where these sequences were excluded; nevertheless, the results remained the same (Fig. 3-1). None of the genes of the first paralogon member are present in chicken, except the SPTB family member SPTBN2 (located on a scaffold). As mentioned previously, the CHRM1 gene is absent in the chicken, as well as in other birds. The second and third paralogon members in human are located on three and two different chromosomes, respectively (Fig. 3, turquoise and light blue). The second member is located on two different chromosomes in spotted gar (LG1 and LG16). In chicken, both the second and the third paralogon members are located on a single chromosome. Finally, the fourth paralogon member is located on chromosome 19 in human and on LG2 in spotted gar (Fig. 3, dark blue). In chicken, the fourth member could not be identified for four of the gene families, and three ohnologs are located on scaffolds (EHD2, PLD3, and SPTBN4). However, one member of the TGFB family, the TGFB1 gene, is located on chromosome 32, which is a very short chromosome in the chicken, consisting of only ~78 kb.

The synteny analysis in human, chicken, and spotted gar shows that these genomic regions have undergone a number of rearrangements such as translocations, and several ohnologs could not be identified. This is further seen when analyzing the CHRM1/CHRM3/CHRM5 neighboring gene repertoire and chromosomal locations in zebrafish, presented in Fig. 3-2. It appears that a number of translocations have occurred in zebrafish, as for instance the first paralogon member (Fig. 3-2, green) is located on five different chromosomes, and the second member is located on no less than seven different chromosomes (Fig. 3-2, turquoise). This paralogon too has been studied in detail in a previous study focusing on somatostatin receptor genes including multiple neighboring gene families. These chromosomal regions were found to be related through the 1R, 2R, and 3R events and thereby comprise a paralogon (Ocampo Daza et al., 2012).

Thus, the analysis of conserved synteny and paralogons of the two mAChR subfamilies and their neighboring gene families confirms our hypothesis based on the phylogenetic analyses that the mAChR gene family expanded by the 1R and 2R tetraploidizations from two ancestral genes into the five members present today in most tetrapods and some basally diverging vertebrates, and 8-10 members in teleosts, forming two distinct subfamilies belonging to two separate paralogons.

Teleost-specific intron gains in the CHRM2b, CHRM3b, CHRM4a, and CHRM4b genes

The amino acid sequence analyses and alignments revealed that some of the teleost sequences contained annotated introns in the genome assemblies, although the mAChR genes in general have been said to lack introns in the coding region (Bonner et al., 1987, 1988; Peralta et al., 1987; Seo et al., 2009). To investigate whether these introns were indeed teleost-specific gains, or whether they could be the results of gene annotation or sequencing difficulties, an extended sequence repertoire from teleosts was analyzed. In the sequence analyses, it was found that the CHRM2b, CHRM3b, CHRM4a, and CHRM4b genes have independently gained at least one intron in the proximity of the region encoding TM1 and at least one intron in the region encoding the IL between TM5 and TM6, in at least one of the teleost species investigated (Fig. 4). In CHRM2b, stickleback and medaka have gained one intron in the end of the TM1 domain (Fig. 4). This intron gain is supported by analysis of fugu and Amazon molly CHRM2b sequences, as they too contain this intron (Fig. 4-1). One intron is also present in the IL3 domain, located between TM5 and TM6, in zebrafish, stickleback, and medaka. However, it seems that this intron is not the same in the three teleost species.

September/October 2018, 5(5) e0340-18.2018 eNeuro.org New Research 8 of 12

Figure 3. The evolutionary history and analysis of chromosomal regions and conserved synteny of the CHRM1, CHRM3, and CHRM5 genes and their neighboring gene families. The gene repertoire present in the vertebrate predecessor is displayed in the top panel, the duplication scheme further displays which orthologs were retained in the vertebrate ancestor following 1R and 2R, and finally the last three panels display the gene repertoire present in the human, chicken, and spotted gar. Crosses indicate gene loss or gene not (yet) identified. Each paralogon member is presented in a separate color. Chicken and spotted gar illustrations are reused with permission from Daniel Ocampo Daza (source: www.egosumdaniel.se). The sequence details are listed in Fig. 3-3. The aLRT SH-like trees are displayed in Figure 3-1, and the chromosomal regions and conserved synteny of the CHRM2 and CHRM4 genes and their neighboring gene families in zebrafish are displayed in Figure 3-2.

Rather, one intron seems to have been gained in zebrafish and a separate intron was gained in the ancestor of stickleback and medaka (Fig. 4). No CHRM2b gene could be identified in European eel (or in the Japanese eel), and therefore it was not possible to determine the exact time point when this intron was gained in zebrafish. The intron present in stickleback and medaka was also found in fugu and Amazon molly (Extended Data Fig. 4-1).

The CHRM3b gene has gained one intron located in the beginning of the region encoding TM1 in medaka and stickleback and one additional intron in the N-terminal region in stickleback (Fig. 4). However, the first exon could not be identified in stickleback, although the presence of an intron at this position is supported by an identical intron found in fugu, which is most likely present also in Amazon molly (Extended Data Fig. 4-2). No exons upstream of the intron in the beginning of TM1 in medaka CHRM3b could be identified, and therefore it was not possible to confirm the presence of additional introns in medaka (Extended Data Fig. 4-2).

Figure 4. The localization of teleost-specific intron gains for the CHRM2b, CHRM3b, CHRM4a, and CHRM4b genes in the European eel, zebrafish, stickleback, and medaka. The top panel displays the relationship between the teleost species included in intron analysis, with the spotted gar as reference species followed by the mAChR outline and specific intron gains (indicated by colored hexagon) for the CHRM2b, CHRM3b, and CHRM4a. No CHRM2b sequence was identified in the European eel. Asterisk is present where an intron gain could not be confirmed. The sequence details are listed in Fig. 4-5. The Jalview sequence alignments of the teleost sequences analyzed are displayed for the CHRM2b, CHRM3b, CHRM4a, and CHRM4b genes in Fig. 4-1, Fig. 4-2, Fig. 4-3, and Fig. 4-4, respectively.

The CHRM4a gene has gained the largest number of introns. It has gained one intron in the N-terminal region in the ancestor of medaka and stickleback (Fig. 4). The first exon could not be found in stickleback, but there is a suitable consensus splice acceptor site present at the position corresponding to the intron present in medaka. This possible splice site is also present in fugu and Amazon molly (Extended Data Fig. 4-3). Another intron is present in the region encoding EL2, just before TM5, in stickleback and medaka (Fig. 4), as well as in fugu and Amazon molly (Extended Data Fig. 4-3). Finally, four introns have been gained in the large region encoding IL3 of CHRM4a in the ancestor of stickleback and medaka (Fig. 4) and are also present in fugu and Amazon molly (Extended Data Fig. 4-3).

Among these four genes, CHRM4b is the one that has gained the lowest number of introns. There is one intron present in the N-terminal region in zebrafish (Fig. 4). This intron is not found in any of the other teleosts analyzed. However, there is one possible intron present in European eel, but before this position there is also a methionine present that could act as a translation initiator, meaning that this intron may not be present in European eel (Extended Data Fig. 4-4). However, the methionine is not present in Japanese eel, which implies that there should be an intron present at this position. Due to these inconsistencies between European eel and Japanese eel, it is not possible to conclude whether or not there is an intron present in the N-terminal region in these species. There is also one intron gained in IL3 in CHRM4b in medaka (Fig. 4), an intron that is also present in Amazon molly (Fig. 4). Information about the teleost sequences included in this analysis is provided in Extended Data Fig. 4-5.
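The intron gains described in this section can be tallied per gene with a few lines of Python. The sketch below is only bookkeeping transcribed from the text above: the names `INTRON_GAINS` and `gains_per_gene` are hypothetical, the region labels are shorthand, and the tally treats every reported gain as confirmed. It therefore ignores the caveats noted in the text (missing first exons in stickleback, and the unresolved N-terminal intron in the eel species, which is left out).

```python
from collections import defaultdict

# (gene, region, lineage in which the gain is described, number of introns).
# Transcribed from the Results text for the species shown in Fig. 4;
# this is an illustrative summary, not an independent annotation.
INTRON_GAINS = [
    ("CHRM2b", "end of TM1", "stickleback + medaka", 1),
    ("CHRM2b", "IL3", "zebrafish", 1),
    ("CHRM2b", "IL3", "stickleback + medaka", 1),
    ("CHRM3b", "beginning of TM1", "stickleback + medaka", 1),
    ("CHRM3b", "N-terminal", "stickleback", 1),
    ("CHRM4a", "N-terminal", "stickleback + medaka", 1),
    ("CHRM4a", "EL2", "stickleback + medaka", 1),
    ("CHRM4a", "IL3", "stickleback + medaka", 4),
    ("CHRM4b", "N-terminal", "zebrafish", 1),
    ("CHRM4b", "IL3", "medaka", 1),
]


def gains_per_gene(records):
    """Sum the number of reported intron gains for each gene."""
    totals = defaultdict(int)
    for gene, _region, _lineage, count in records:
        totals[gene] += count
    return dict(totals)


totals = gains_per_gene(INTRON_GAINS)
for gene, count in sorted(totals.items()):
    print(gene, count)
```

Under these assumptions the tally agrees with the statement that CHRM4a has gained the largest number of introns; how the remaining genes rank depends on how the unconfirmed gains are counted.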

Discussion
Our analyses of the mAChR gene family are based on the following three types of information: sequence-based phylogenetic analysis; synteny and paralogon analysis; as well as analysis of teleost-specific intron gains. The combined results of these analyses show that the mAChR family expanded from two ancestral genes present in the vertebrate predecessor to five mAChR genes in an early vertebrate ancestor, as a result of the 1R and 2R tetraploidization events (Fig. 5). All five members could be identified in the vertebrate classes of mammals, birds, reptiles, amphibians, and bony fishes, with the exception of the CHRM1 gene, which, surprisingly, has not been identified in chicken or any other bird. It remains possible that the gene exists and is located on a microchromosome, because it is well known that these are underrepresented in the genome sequencing projects, probably partly due to their extremely high GC content (Burt, 2002; Zhang et al., 2014).

Figure 5. Duplication scheme of the mAChR genes following 1R, 2R, and 3R. Two mAChR genes present in the vertebrate predecessor expanded to five mAChR genes following 1R and 2R. All genes were retained in the human, chicken, and spotted gar except for the CHRM1 gene in the chicken. In the teleost predecessor, 5 mAChR genes expanded to 10 genes following 3R, of which all duplicates are retained in the zebrafish. Chicken, spotted gar, and zebrafish illustrations are reused with permission from Daniel Ocampo Daza (source: www.egosumdaniel.se).

We also identified all five mAChR genes in a ray-finned fish, the spotted gar, which represents an early branch in the ray-finned fish tree. The teleosts, which constitute 99.9% of all ray-finned fishes, are descendants of a lineage that underwent a third tetraploidization, and for the mAChR family all 10 genes deriving from this event have been retained in zebrafish. The phylogenetic analysis shows that the teleost-specific tetraploidization 3R resulted in duplicates of all mAChR genes in zebrafish, resulting in a total of 10 mAChR genes (Fig. 5). A doubled mAChR repertoire in zebrafish has been reported before (Seo et al., 2009; Nuckels et al., 2011); however, neither of the previous studies tied it to the 3R tetraploidization. Here we can explain all of these duplications by a single genomic event (and previous gene duplications by the 1R/2R events). The European eel has retained nine of the genes, lacking one of the CHRM2 duplicates, whereas medaka and stickleback are lacking the CHRM1 gene. Notably, no nontetraploidization duplicates of any of the mAChR genes were found in any of the vertebrate species analyzed.

The CHRM2/CHRM4 subfamily contains two ohnologs resulting from the 1R-2R tetraploidizations. It is unclear whether these two genes arose in 1R, and both of their duplicates after 2R were lost, or whether one copy was lost after 1R and the other was duplicated in 2R. The CHRM1/CHRM3/CHRM5 subfamily contains three of the ohnologs resulting from the two tetraploidizations. This means that the ancestral gene duplicated once in 1R, and then both copies duplicated once more in 2R, after which one ohnolog was lost. As the two tetraploidizations were probably very close in time to one another, it is difficult to say which two may be the results of the 2R tetraploidization.

Thus, the repertoire of mAChR genes is quite consistent across vertebrates. This is presumably a reflection of unique functional roles for each of the gene products. The only gene that deviates from this pattern is CHRM1. Not only does it seem to be missing in birds, it has also not been identified in a few teleosts, namely stickleback and medaka. The pairwise alignments of the mAChR sequences showed that the CHRM1 gene displayed the lowest degree of sequence identity; hence, it is more likely that this gene could be lost, or that the low degree of conservation has impeded its identification in the species where it has so far not been found. In fact, the whole paralogon member is missing in chicken, but there are still two possible reasons for this: either this whole chromosomal region was lost; or the whole region ended up on a microchromosome that is as yet unsequenced. Indeed, one gene in this paralogon member (SPTBN2) has been identified on a scaffold, perhaps indicating that additional members may be possible to identify.

Interestingly, a more thorough analysis of the amino acid sequences and especially the IL3 region, which has a low degree of sequence conservation, revealed that a number of teleost-specific introns have been gained in the coding regions of the CHRM2b, CHRM3b, CHRM4a, and CHRM4b genes. A previous study by Seo et al. (2009), which focused on a subset of mAChRs and smooth muscle contraction responses in Nile tilapia, reported that all five mAChR genes present had retained paralogs in zebrafish, resulting in 10 mAChR genes present in total. The study by Seo et al. (2009) also reported that no introns were present in the mAChR genes studied. However, with the increased availability of data, and especially genome assemblies, our analyses have identified and verified a number of intron gains in several teleosts (Fig. 4, Fig. 4-1, Fig. 4-2, Fig. 4-3, and Fig. 4-4). Our findings are supported by those of a previous study reporting that the intron turnover in Actinopterygii is high, especially in the stickleback and zebrafish (Venkatesh et al., 2014; Ravi and Venkatesh, 2018). However, to our knowledge intron gains have not been previously reported for the mAChR genes in teleosts. The intron gains have taken place in teleost genes that evolve faster than their orthologs in other lineages, especially CHRM2b, CHRM3b, and CHRM4b (Fig. 1 and Fig. 1-3).

Duplication by chromosome doubling means by default that the two ohnologs deriving from the same mother gene must initially have had identical gene regulatory elements and hence identical expression patterns, anatomically, temporally, and quantitatively. This may initially result in an additive effect on the level of gene expression, unless compensatory mechanisms are at play. Subsequently, it is possible that either or both of the ohnologs may begin to accumulate mutations, either regulatory or structural, that will alter the functions of the gene. One possibility is that one of the ohnologs maintains the functions of the mother gene, leaving the other free to take on other roles (i.e., neofunctionalization). This was a possibility favored by Ohno (1970). Alternatively, the two ohnologs may lose regulatory elements such that they subdivide the functions of the mother gene between them in a process called subfunctionalization. As the mAChR genes were doubled in zebrafish and all duplicates are retained, it would be interesting to study possible neofunctionalizations or subfunctionalizations of these genes, initially by investigating gene expression patterns in anatomic mapping studies.

References
Bonner TI, Buckley NJ, Young AC, Brann MR (1987) Identification of a family of muscarinic acetylcholine receptor genes. Science 237:527–532.
Bonner TI, Young AC, Brann MR, Buckley NJ (1988) Cloning and expression of the human and rat m5 muscarinic acetylcholine receptor genes. Neuron 1:403–410.
Burt DW (2002) Origin and evolution of avian microchromosomes. Cytogenet Genome Res 96:97–112.
Christopoulos A (2002) Allosteric binding sites on cell-surface receptors: novel targets for drug discovery. Nat Rev Drug Discov 1:198–210.
Collin C, Hauser F, de Valdivia EG, Li S, Reisenberger J, Carlsen EMM, Khan Z, Hansen NØ, Puhm F, Søndergaard L, Niemiec J, Heninger M, Ren GR, Grimmelikhuijzen CJP (2013) Two types of muscarinic acetylcholine receptors in Drosophila and other arthropods. Cell Mol Life Sci 70:3231–3242.
Coulier F, Popovici C, Villet R, Birnbaum D (2000) MetaHox gene clusters. J Exp Zool 288:345–351.
Eglen RM (2005) Muscarinic receptor subtype pharmacology and physiology. In: Progress in medicinal chemistry (King FD, Lawton G, eds), pp 105–136. Amsterdam: Elsevier.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307–321.
Haga K, Kruse AC, Asada H, Yurugi-Kobayashi T, Shiroishi M, Zhang C, Weis WI, Okada T, Kobilka BK, Haga T, Kobayashi T (2012) Structure of the human M2 muscarinic acetylcholine receptor bound to an antagonist. Nature 482:547–551.
Haug MF, Gesemann M, Lazović V, Neuhauss SCF (2015) Eumetazoan cryptochrome phylogeny and evolution. Genome Biol Evol 7:601–619.
Jaillon O, et al. (2004) Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431:946–957.
Krejci A, Bruce AW, Dolezal V, Tucek S, Buckley NJ (2004) Multiple promoters drive tissue-specific expression of the human M2 muscarinic acetylcholine receptor gene. J Neurochem 91:88–98.
Kruse AC, Hu J, Pan AC, Arlow DH, Rosenbaum DM, Rosemond E, Green HF, Liu T, Chae PS, Dror RO, Shaw DE, Weis WI, Wess J, Kobilka BK (2012) Structure and dynamics of the M3 muscarinic acetylcholine receptor. Nature 482:552–556.
Kruse AC, Ring AM, Manglik A, Hu J, Hu K, Eitel K, Hübner H, Pardon E, Valant C, Sexton PM, Christopoulos A, Felder CC, Gmeiner P, Steyaert J, Weis WI, Garcia KC, Wess J, Kobilka BK (2013) Activation and allosteric modulation of a muscarinic acetylcholine receptor. Nature 504:101–106.
Kruse AC, Kobilka BK, Gautam D, Sexton PM, Christopoulos A, Wess J (2014) Muscarinic acetylcholine receptors: novel opportunities for drug development. Nat Rev Drug Discov 13:549–560.
Lagman D, Ocampo Daza D, Widmark J, Abalo XM, Sundström G, Larhammar D (2013) The vertebrate ancestral repertoire of visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors was established by duplication of their shared genomic region in the two rounds of early vertebrate genome duplications. BMC Evol Biol 13:238.
Lebois EP, Thorn C, Edgerton JR, Popiolek M, Xi S (2018) Muscarinic receptor subtype distribution in the central nervous system and relevance to aging and Alzheimer’s disease. Neuropharmacology 136:362–373.
Nakatani Y, Takeda H, Kohara Y, Morishita S (2007) Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res 17:1254–1265.


Nuckels RJ, Forstner MRJ, Capalbo-Pitts EL, García DM (2011) Developmental expression of muscarinic receptors in the eyes of zebrafish. Brain Res 1405:85–94.
Ocampo Daza D, Sundström G, Bergqvist CA, Larhammar D (2012) The evolution of vertebrate somatostatin receptors and their gene regions involves extensive chromosomal rearrangements. BMC Evol Biol 12:231.
Ohno S (1970) Evolution by gene duplication. Berlin, Heidelberg: Springer.
Peralta EG, Ashkenazi A, Winslow JW, Smith DH, Ramachandran J, Capon DJ (1987) Distinct primary structures, ligand-binding properties and tissue-specific expression of four human muscarinic acetylcholine receptors. EMBO J 6:3923–3929.
Putnam NH, et al. (2008) The amphioxus genome and the evolution of the chordate karyotype. Nature 453:1064–1071.
Ravi V, Venkatesh B (2018) The divergent genomes of teleosts. Annu Rev Anim Biosci 6:47–68.
Seo JS, Kim M-S, Park EM, Ahn SJ, Kim NY, Jung SH, Kim JW, Lee HH, Chung JK (2009) Cloning and characterization of muscarinic receptor genes from the Nile tilapia (Oreochromis niloticus). Mol Cells 27:383–390.
Tautermann CS, Kiechle T, Seeliger D, Diehl S, Wex E, Banholzer R, Gantner F, Pieper MP, Casarosa P (2013) Molecular basis for the long duration of action and kinetic selectivity of tiotropium for the muscarinic M3 receptor. J Med Chem 56:8746–8756.
Thiele A (2013) Muscarinic signaling in the brain. Annu Rev Neurosci 36:271–294.
Venkatesh B, Lee AP, Ravi V, Maurya AK, Lian MM, Swann JB, Ohta Y, Flajnik MF, Sutoh Y, Kasahara M, Hoon S, Gangu V, Roy SW, Irimia M, Korzh V, Kondrychyn I, Lim ZW, Tay BH, Tohari S, Kong KW, et al. (2014) Elephant shark genome provides unique insights into gnathostome evolution. Nature 505:174–179.
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ (2009) Jalview version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25:1189–1191.
Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, Billis K, Cummins C, Gall A, Girón CG, Gil L, Gordon L, Haggerty L, Haskell E, Hourlier T, Izuogu OG, Janacek SH, Juettemann T, To JK, Laird MR, et al. (2018) Ensembl 2018. Nucleic Acids Res 46:D754–D761.
Zhang G, Li C, Li Q, Li B, Larkin DM, Lee C, Storz JF, Antunes A, Greenwold MJ, Meredith RW, Ödeen A, Cui J, Zhou Q, Xu L, Pan H, Wang Z, Jin L, Zhang P, Hu H, Yang W, et al. (2014) Comparative genomics reveals insights into avian genome evolution and adaptation. Science 346:1311–1320.
