UPTEC X 12 006 Degree project, 30 credits, April 2012

Molecular evolution of voltage-gated calcium channels of L and N types and their genomic regions

Jenny Widmark

Molecular Biotechnology Programme Uppsala University School of Engineering

Date of issue: 2012-04
Author: Jenny Widmark

Title (English): Molecular evolution of voltage-gated calcium channels of L and N types and their genomic regions

Abstract

The expansion of the voltage-gated calcium channel alpha 1 subunit families (CACNA1) of L and N types was investigated by combining phylogenetic analyses (neighbour-joining and maximum likelihood) with chromosomal data. Neighbouring families were analysed to see if the chromosomal regions duplicated through whole genome doublings in vertebrates. Results show that both types of CACNA1 expanded in two ancient whole genome duplications as parts of larger genomic regions. Many gene families in these regions obtained copies in an additional teleost-specific genome duplication. This diversification of CACNA1 probably contributed to evolutionary innovations in nervous system function.

Keywords: Evolution, vertebrate, voltage-gated calcium channel, whole genome duplication

Supervisor: Dan Larhammar, Uppsala University

Scientific reviewer: Tatjana Haitina, Uppsala University

Language: English
ISSN 1401-2138
Pages: 57

Biology Education Centre, Biomedical Center, Husargatan 3, Uppsala, Box 592, S-75124 Uppsala. Tel +46 (0)18 4710000, Fax +46 (0)18 471 4687


Popular science summary

Every cell in an organism contains the same set of genetic material (DNA). This set is called a genome, and the genome contains all the genes. The genomes of several species are known because they have been sequenced, that is, the order of their building blocks has been read. It has turned out that a gene present as a single copy in invertebrates often exists in several copies in vertebrates. This is explained by two genome duplications (genome doublings) in the vertebrate lineage, meaning that all genes were copied in two rounds. This is how gene families arose: similar genes that code for proteins resembling each other. Teleosts (true bony fishes) have undergone one further genome duplication in addition to the first two. This means that humans should have four times as many genes as invertebrates, and that teleosts should have twice as many genes as we do. However, many of the new copies have mutated so that several of them have been destroyed and are no longer present in the genome.

Because several copies of the same gene arose, some copies have had greater scope to change, as long as at least one copy remained to carry out the original function. In this way some of the gene copies can mutate and develop new functions. The genome duplications are therefore thought to have had great evolutionary importance, among other things for the development of the vertebrate nervous system. In this report I describe the evolution of two gene families that code for two different types of calcium channels. Calcium channels have several important functions in the nervous system, and my studies show that both families expanded in the two genome duplications and that only one of the duplicates has been lost in mammals. Teleosts have also retained most of the copies from the third genome duplication. The high degree of retained gene copies in both mammals and teleosts supports the theory that genes important for the nervous system are more likely to be retained.

Degree project, 30 credits, Master of Science in Engineering, Molecular Biotechnology Programme

Uppsala University, April 2012

Contents

1 Introduction

1.1 Calcium channels

1.2 Vertebrate evolution

1.3 Phylogenetic tree analysis

1.4 Neighbour-joining and maximum likelihood methods

2 Materials and methods

2.1 Collection of CACNA1 gene families

2.2 Identifying neighbouring gene families

2.3 Editing of genes and protein sequences

2.4 Sequence alignment and phylogenetic analyses

3 Results

3.1 CACNA1 L-type family

3.1.1 Chromosomal regions

3.2 CACNA1 N-type family

3.2.1 Chromosomal regions

4 Discussion

5 Acknowledgements

6 References

Supplementary figure S1

Supplementary figure S2

Key terms and abbreviations

Amniote: Tetrapods with amniotic sacs surrounding the embryo. Includes reptiles, birds and mammals but not amphibians, which do not have land-adapted eggs.

CACNA1: Calcium channel subunit α1.

Clade: A group which includes one common ancestor and all its descendants. In a phylogenetic tree a clade includes all branches that radiate from one single node.

Conserved synteny: Preserved co-localisation of genes on chromosomes of different species.

Homologues: Genes derived from a common ancestral DNA sequence.

Orthologues: Homologous genes in different species that diverged from a common ancestral gene by speciation.

Paralogues: Homologous genes within the same species, related by duplication within a genome.

Paralogon: A set of homologous chromosomal regions containing groups of paralogous genes, derived from a common ancestral region.

Parsimonious: From parsimony: according to this principle the hypothesis that makes the fewest assumptions, the simplest explanation, is the most parsimonious.

Teleosts: True bony fishes; the largest clade within ray-finned fishes. In the present study, the zebrafish, medaka, three-spined stickleback and spotted green pufferfish are teleosts.

Tetraploidisation: The process of genome doubling, resulting in four homologous sets of chromosomes (rather than the diploid set of two).

Tetrapods: Four-limbed vertebrates; includes amphibians, reptiles, birds and mammals.

WGD: Whole genome duplication.

1 Introduction

A large number of human diseases are caused by ion channel malfunctions, so-called channelopathies. Voltage-gated calcium channels are one type of ion channel, and their malfunction or lack of expression causes diseases such as epilepsy, blindness, paralysis, cardiac arrhythmias and migraine (Jegla et al., 2009).

The ion channels have crucial functions in the nervous system and it has previously been suggested (Jegla et al., 2009) that many of the ion channel families expanded in the vertebrate-specific tetraploidisation events called 2R and 3R (Panopoulou and Poustka, 2005). These whole genome duplications might have played a major role in the diversification of vertebrates and the evolution of the nervous system (Holland, 2009). Dan Larhammar’s research group has previously described several gene families that have expanded in this way together with their chromosomal regions, such as the HOX cluster region (Sundström et al., 2008b), the neuropeptide Y system (Larsson et al., 2008; Sundström et al., 2008a), the opioid system (Dreborg et al., 2008; Sundström, 2010) and also other ion channels such as the voltage-gated sodium channels (Widmark et al., 2011).

Here I have investigated the expansion of the calcium channel α1 subunits of L and N type by combining two types of phylogenetic analysis with chromosomal data to see if they are consistent with whole genome duplication events. In addition, I have analysed neighbouring gene families to further investigate the duplications of larger chromosomal regions.

1.1 Calcium channels

Voltage-gated calcium channels are made up of four to five different subunits: α1, β, α2, δ and γ. This study is focused on the calcium channel subunit α1 (CACNA1), which is the largest subunit and forms the actual channel. The α1 subunit genes have about 47 exons, which encode proteins of approximately 2500 amino acids. The proteins have four domains (I-IV) with six transmembrane regions (S1-S6) each, the fourth of which functions as a voltage sensor; see figure 1.

Figure 1: Schematic picture of a voltage-gated calcium channel. The channel has four domains with six transmembrane regions each. The fourth transmembrane region (in yellow) functions as a voltage sensor. Adapted from Widmark et al. (2011).

In humans there are ten different CACNA1 genes, divided into three families: L-, P/Q/N/R- (from now on referred to as N-type for convenience) and T-type; see Table 1 for a summary. The different CACNA1 proteins contribute to the pharmacological and electrophysiological diversity among the calcium channels while the other subunits mainly modulate their properties (Catterall, 2011). Three different CACNA1 genes are present in the fruit fly and the tunicate genomes, one orthologue for each CACNA1 family. This indicates that three CACNA1 genes were present before the divergence of the vertebrate lineage.

The L-type family of the CACNA1 subunits includes four different proteins in humans: CACNA1S, 1C, 1D and 1F. They typically have a long-lasting activation time and were previously thought to be dependent on a strong depolarisation for activation. However, lately it has been found that some members can be activated by smaller changes in membrane potential (Lipscombe et al., 2004). They initiate the contraction of muscle cells and the secretion from endocrine cells but are also involved in hearing, retinal signalling and urinary bladder functions.

The P/Q-type (CACNA1A), N-type (CACNA1B) and the R-type (CACNA1E) form one distinct family and are, like the L-type, activated by strong depolarisation. They initiate neurotransmission and are for example involved in synaptic transmission, long-term memory and neuronal migration during development.

The T-type includes three genes: CACNA1G, 1H and 1I. They are activated by weak depolarisation and only stay activated for a short time. Unlike the other types of CACNA1s, they are resistant to organic antagonists as well as to snake and spider toxins. Due to these properties, one of their main functions is the shaping of action potentials and patterns of repetitive firing (Catterall, 2011).

Table 1: Summary of the ten voltage-gated calcium channels in human (Catterall, 2011; Jansen et al., 2011).

Type                         Channel name  Gene name  Human chromosome  Involved in
L (long-lasting)             Cav 1.1       CACNA1S    1                 Skeletal muscle excitation
L (long-lasting)             Cav 1.2       CACNA1C    12                Cardiac muscle contraction, insulin secretion, urinary bladder function, spatial memory
L (long-lasting)             Cav 1.3       CACNA1D    3                 Sinoatrial node pacemaking, hearing, brain function (mood behavior)
L (long-lasting)             Cav 1.4       CACNA1F    X                 Retinal signaling
P/Q (purkinje)               Cav 2.1       CACNA1A    19                Fast synaptic transmission and synaptic plasticity, neuromuscular transmission
N (neural)                   Cav 2.2       CACNA1B    9                 Sympathetic functions, long-term memory, synaptic transmission, neuronal migration during development
R (residual)                 Cav 2.3       CACNA1E    1                 Synaptic transmission
T (transient, short-lived)   Cav 3.1       CACNA1G    17                Low threshold Ca2+ spike that mediates burst firing of neurons, depolarization of sinoatrial nodal cells
T (transient, short-lived)   Cav 3.2       CACNA1H    16                Nociception, relaxation of coronary arteries
T (transient, short-lived)   Cav 3.3       CACNA1I    22                Low threshold Ca2+ spike that mediates burst firing of neurons

"N": the P/Q, N and R types together form the family referred to as N-type in this study.

1.2 Vertebrate evolution

Two rounds (2R) of whole genome duplication (WGD) occurred early in the evolution of vertebrates, at least before the emergence of jawed vertebrates (see figure 2) (Ohno, 1970; Dehal and Boore, 2005; Nakatani et al., 2007; Putnam et al., 2008). The timing of these events is still uncertain, but they are thought to have happened about 500 million years ago, around the divergence of the hagfish and lamprey lineages. Due to limited genome data, no thorough whole genome investigation of the lamprey genome and its relation to 2R has been made. However, several limited studies of known gene families have been conducted which suggest divergence of the lamprey and hagfish lineages after 2R (Kuraku, 2010; Kuraku et al., 2009). A third round of whole genome duplication (3R) occurred early in the evolution of teleost fishes (Jaillon et al., 2004; Meyer and Van de Peer, 2005).

Figure 2: Estimated timing of the whole genome duplication events in vertebrate evolution (in million years ago, Mya). 2R is thought to have occurred around 500 Mya, before the divergence of the lamprey and hagfish lineages. Independent WGDs in some amphibians and teleost fishes are not shown. Adapted from Sundström (2010).

It has been suggested that the teleost-specific 3R facilitated the diversification leading to today’s large number of species of teleost fish (Amores, 1998); however, it has been argued that this radiation cannot be easily ascribed to one single event (Santini et al., 2009). Additionally, more recent WGDs have occurred in some vertebrate lineages, including the salmonid fish lineage, some cyprinid fishes and some species of frog (e.g. Xenopus laevis) (Hordvik et al., 1997; David et al., 2003; Evans et al., 2004; Van de Peer et al., 2009).

Figure 3: Schematic representation of whole genome duplications. One ancestral chromosome duplicates into four chromosomes in 2R and eight chromosomes in 3R. Species diverging after 2R, like tetrapods, should therefore ideally have four gene copies of each gene located on four chromosomes. In the teleost lineage an additional whole genome duplication, 3R, occurred, resulting in eight gene copies located on eight chromosomes.

As a result of the two rounds of WGDs, one gene becomes four genes located on four different homologous chromosomes, as depicted in figure 3. The third round of WGD then results in eight genes located on eight different chromosomes. This means that species diverging after 2R (cartilaginous fishes and tetrapods) could have at most four gene copies generated through WGD, while teleost fishes, diverging after 3R, could have up to eight copies. In this way genome duplication events give rise to gene families.
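The copy-number arithmetic in the paragraph above can be written out as a minimal sketch. The function names are hypothetical, and the loss estimate assumes that every copy arose through WGD (no local duplications), which is a simplification.

```python
def max_wgd_copies(rounds: int) -> int:
    """Upper bound on copies of one ancestral gene after `rounds` WGDs."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    # Each whole genome duplication doubles the number of copies.
    return 2 ** rounds


def minimum_losses(rounds: int, observed: int) -> int:
    """Fewest gene losses needed to explain `observed` copies.

    Assumption: all copies derive from WGD, with no local duplications.
    """
    return max(0, max_wgd_copies(rounds) - observed)


# Species diverging after 2R (e.g. tetrapods) versus teleosts after 3R.
print(max_wgd_copies(2), max_wgd_copies(3))
```

For a family with three members in a species that diverged after 2R, `minimum_losses(2, 3)` gives one loss under these assumptions.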

There are three main fates for the duplicated genes. Non-functionalisation, which is the most common one, means that the genes accumulate mutations leading to pseudogenisation and eventually gene loss. The genes might also undergo subfunctionalisation, where several gene copies share the ancestral functions between them; they can for example be expressed in different tissues or at different time points (Force et al., 1999; Hahn, 2009). The least common outcome is so-called neofunctionalisation, where genes acquire completely new functions (Lynch and Conery, 2000; Conant and Wolfe, 2008).

Since whole chromosomes have been duplicated, one can expect to find several related parts of the genome with conserved gene organisation, so-called paralogons (Coulier et al., 2000). Therefore one can include positional data to study the evolution of gene families. That several gene families are located in the vicinity of each other on several chromosomes in several species supports an expansion in WGD events. However, over time the genes might have been rearranged or undergone local duplication events.

1.3 Phylogenetic tree analysis

To investigate the evolutionary history of gene families it is useful to make sequence-based phylogenetic trees in combination with chromosomal analyses. The phylogenetic analyses use the aligned amino acid sequences to infer the most likely evolutionary relationships between the included sequences and represent this as a phylogenetic tree. By combining this with data on the chromosomal locations of the studied genes one can draw conclusions about the evolutionary events that have shaped the gene families.

Ideally, a phylogenetic tree of a gene family that has expanded in 2R and 3R would display a two-times-two topology (Hughes and Friedman, 2003) with equal branch lengths, assuming identical evolutionary rates (see figure 4A, where the WGDs are marked with red dots). In the more realistic figure 4B the gene families have been subjected to evolutionary events, such as gene losses and translocations (Taylor et al., 2001). The branch lengths also differ between members of each family, indicating different evolutionary rates. The gene family marked in green has undergone at least one gene loss and one rearrangement, and the red gene family has undergone at least two gene losses and one local duplication event or possibly a translocation of one of the original four red genes to the same chromosome as another member. The phylogenetic analysis has not been able to resolve all of the branches of the purple gene family, which may be due to a weak phylogenetic signal in the dataset as a result of very well-conserved or short sequences.

To provide a reference point for the phylogenetic analyses, trees are rooted with a so-called outgroup, a sequence more distantly related to the studied sequences than they themselves are to each other. When investigating the evolution within the vertebrates it is suitable to choose a related invertebrate gene as an outgroup. To provide relative dating for the evolutionary events that shaped the gene family, and rule out an expansion before any of the WGDs, it is necessary to include a species that diverged between the outgroup and the vertebrates. The tunicates provide such a reference since they are close relatives of vertebrates and genome sequences for three species are available.

Figure 4: Schematic representation of ideal (A) and realistic (B) phylogenies of gene families that have expanded in 2R. In (A) the evolutionary rates of all genes are equal and the time points of the WGDs are marked with red dots. In (B) the green gene family has undergone a gene loss as well as a rearrangement and the red gene family has undergone two gene losses and one local duplication event.

Since many different events could have shaped the evolution of the vertebrate gene families, it is important to include sequences representing as many of the major vertebrate lineages as is possible and practical (teleost fishes, amphibians, birds, reptiles and mammals). This also increases the phylogenetic signal of the analysis as well as the reliability of the trees. Due to the state of the current genome assembly of the western clawed frog (Silurana tropicalis) which provides very little chromosomal data I chose to not include any amphibians in the study.

1.4 Neighbour-joining and maximum likelihood methods

Two common ways of creating sequence-based phylogenetic trees are the neighbour-joining (NJ) and the maximum likelihood (ML) methods. NJ is a relatively fast method of inferring phylogenetic relationships by identifying the sequences that are closest neighbours and joining them together in the tree one by one (Saitou and Nei, 1987). To validate the NJ trees they are commonly subjected to a statistical evaluation called bootstrapping. This method repeats the NJ analysis a given number of times on resampled versions of the alignment and produces a consensus tree based on all replicates, a so-called bootstrap consensus tree. The bootstrap value at each branch point represents the proportion of replicates in which the branch is placed in the same way. The bootstrap value is therefore not a measurement of the validity of the tree but a measure of its reproducibility. Since the NJ method is very fast, as many as a thousand bootstrap replicates can be used without making the analysis time impractical.
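The joining step and the bootstrap resampling described above can be illustrated with a minimal sketch. It is not the ClustalX implementation: the function names are hypothetical, distances are plain proportions of differing positions (p-distances) rather than corrected distances, branch lengths are not computed, and the tree is returned as nested tuples with an unrooted three-way split at the top.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(alignment):
    """Pairwise p-distances, keyed by frozenset({name_a, name_b})."""
    return {frozenset((a, b)): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}


def neighbour_joining(names, dist):
    """Join the closest neighbours one pair at a time (topology only)."""
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence: sum of distances from each node to all others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
             for a in nodes}
        # Q criterion of Saitou and Nei (1987); the smallest value wins.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        new = (i, j)
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))]
                    - d[frozenset((i, j))])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    return tuple(nodes)


def resample_columns(alignment, rng):
    """One bootstrap replicate: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


# Toy distances (invented) for a tree in which A groups with B and C with D.
toy = {frozenset(p): v for p, v in [
    (("A", "B"), 2), (("C", "D"), 2), (("A", "C"), 5),
    (("A", "D"), 5), (("B", "C"), 5), (("B", "D"), 5)]}
print(neighbour_joining(["A", "B", "C", "D"], toy))
```

A bootstrap analysis would call `resample_columns` once per replicate, rebuild the distances and the tree, and count how often each grouping reappears.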

The various phylogenetic methods have different shortcomings and strengths. Therefore it is common to replicate the phylogenetic analyses using several methods. There are many methods that provide more powerful and reliable alternatives than the relatively outdated NJ. Maximum likelihood methods are more time-consuming but are considered much more reliable since they do not rely on sequence similarities alone, but search for the best tree out of all possible trees that can explain the dataset (Felsenstein, 1981). The best tree is the tree that makes the observed sequence alignment most likely, taking into account a given model of sequence evolution as well as sequence similarities. ML analyses make a starting tree which is then rearranged and tested, seeking the final topology that maximizes the statistical likelihood. A default sequence evolution model can be used for each type of ML analysis, but it is better to test different models against the dataset to find the one that fits best.

Testing maximum likelihood trees by bootstrapping makes the analysis time even longer since the same process has to be replicated a number of times. Recently other statistical methods have been developed to make maximum likelihood analysis quicker. One such method is the approximate likelihood-ratio test (aLRT) coupled with the PhyML (Phylogenetic Maximum Likelihood) tree-calculating algorithm. This method avoids replicating the maximum likelihood procedure and when compared to bootstrap tests it produces similarly reliable trees (Guindon et al., 2010). The aLRT values describe the probability that each branch position increases the likelihood of the entire tree. This means that a branch point with an aLRT value of 0.5 has a 50 percent chance of producing a tree with a higher likelihood if it were placed somewhere else.

2 Materials and methods

2.1 Collection of CACNA1 gene families

Protein sequences for the CACNA1 L-type family (CACNA1C, D, F and S) were collected from the Ensembl database version 61 by using the built-in protein family predictions in the following species: human (Homo sapiens), mouse (Mus musculus), grey short-tailed opossum (Monodelphis domestica), chicken (Gallus gallus), anole lizard (Anolis carolinensis), zebrafish (Danio rerio), medaka (Oryzias latipes), three-spined stickleback (Gasterosteus aculeatus), spotted green pufferfish (Tetraodon nigroviridis), transparent sea squirt (Ciona intestinalis) and fruit fly (Drosophila melanogaster). In general, the longest transcript prediction for each gene was chosen for the study. For proteins with diverging longest transcripts, alternate transcript predictions or automatic Genscan predictions (Burge and Karlin, 1997) from the Ensembl database were used instead. To identify additional proteins that were not included in Ensembl’s automatic family predictions, Basic Local Alignment Search Tool (BLAST) searches (Altschul et al., 1990) were performed using identified amino acid sequences as search terms with standard settings on the Ensembl database and the National Center for Biotechnology Information (NCBI) database. For some families Hidden Markov Model searches (HMMER) (Finn et al., 2011) were made to identify genes in species not included in the primary selection. The procedure was then repeated for the CACNA1 N-type family.
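The general rule of keeping the longest transcript prediction per gene could be sketched as follows. This is only an illustration: the function name, the record layout and the identifiers in the example are hypothetical, and the manual replacement of diverging transcripts described above is not modelled.

```python
def longest_transcripts(records):
    """Keep the longest protein sequence for each gene.

    records: iterable of (gene_id, transcript_id, protein_sequence).
    Returns {gene_id: (transcript_id, protein_sequence)}.
    On ties, the first transcript encountered is kept.
    """
    best = {}
    for gene, transcript, seq in records:
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (transcript, seq)
    return best


# Invented identifiers and sequences, for illustration only.
example = [
    ("geneA", "tA1", "MKT"),
    ("geneA", "tA2", "MKTLLV"),
    ("geneB", "tB1", "MS"),
]
print(longest_transcripts(example))
```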

2.2 Identifying neighbouring gene families

For the CACNA1 L-type family, a list of all identified genes located in the genomic region 5 Mb upstream and 5 Mb downstream of each CACNA1 gene was collected from the Ensembl database version 61. Genes belonging to Ensembl protein families with members located within the selected region on at least two of the chromosomes were considered for the analysis of conserved synteny. The amino acid sequences of members included in the Ensembl families were downloaded from human (Homo sapiens), mouse (Mus musculus), grey short-tailed opossum (Monodelphis domestica), chicken (Gallus gallus), anole lizard (Anolis carolinensis), zebrafish (Danio rerio), medaka (Oryzias latipes), three-spined stickleback (Gasterosteus aculeatus), spotted green pufferfish (Tetraodon nigroviridis), transparent sea squirt (Ciona intestinalis) and fruit fly (Drosophila melanogaster). In general, the longest transcript of each gene was chosen for the study. Searches to identify additional proteins were performed as described for the CACNA1 families. The procedure was then repeated for the CACNA1 N-type family.
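The selection criterion for neighbouring families can be expressed as a small filter. This is a sketch under stated assumptions: each gene is reduced to a single position in base pairs, "at least two of the chromosomes" is read as two different CACNA1-bearing chromosomes, and all names and coordinates in the example are invented.

```python
from collections import defaultdict

WINDOW_BP = 5_000_000  # 5 Mb on either side of each CACNA1 gene


def candidate_families(anchors, genes, window=WINDOW_BP):
    """Families with members near CACNA1 genes on >= 2 chromosomes.

    anchors: list of (chromosome, position) for the CACNA1 genes.
    genes:   list of (family_id, chromosome, position) for other genes.
    """
    chromosomes = defaultdict(set)
    for family, chrom, pos in genes:
        for a_chrom, a_pos in anchors:
            if chrom == a_chrom and abs(pos - a_pos) <= window:
                chromosomes[family].add(chrom)
    return sorted(f for f, c in chromosomes.items() if len(c) >= 2)


# Invented coordinates, for illustration only.
anchors = [("1", 10_000_000), ("12", 50_000_000)]
genes = [
    ("FAM_A", "1", 12_000_000), ("FAM_A", "12", 46_000_000),
    ("FAM_B", "1", 9_000_000), ("FAM_B", "12", 80_000_000),
    ("FAM_C", "3", 10_000_000),
]
print(candidate_families(anchors, genes))
```

In this toy input only `FAM_A` has members inside the window on two anchor chromosomes, so it is the only family that passes the filter.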

2.3 Editing of genes and protein sequences

For short, incomplete or diverging sequences, the full intronic and flanking nucleotide sequence was collected and the Genscan gene prediction server (http://genes.mit.edu/GENSCAN.html) (Burge and Karlin, 1997) was used to identify exons that had not been previously predicted. Sequences that were still diverged with regard to exon-intron boundaries were curated manually by following consensus for splice donor and acceptor sites as well as sequence identities to other family members. Remaining highly divergent regions were removed. Some short protein sequences that did not provide enough sequence information in the alignments were removed; however, the chromosomal position of the gene was registered in figures 6 and 8.

2.4 Sequence alignment and phylogenetic analyses

The sequences were aligned using ClustalX 2.0.12 (Larkin et al., 2007) with standard settings. All alignments were manually inspected to optimize poorly aligned sequences by using BioEdit Sequence Alignment Editor (Hall, 1999). The sequences were named with the three-letter abbreviation for the species followed by the chromosome on which the gene is located. When more than one gene located on the same chromosome was present in an alignment an additional number was added, e.g. Mmu.2_2 means gene 2 of 2 located on mouse (Mus musculus) chromosome 2.
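The naming convention can be made concrete with a short sketch. The function name is hypothetical, and only the abbreviation Mmu is taken from the text; any other abbreviation used with it would be an assumption.

```python
from collections import Counter


def label_sequences(genes):
    """Build labels such as 'Mmu.2_2' from (species, chromosome) pairs.

    genes: list of (species_abbreviation, chromosome) in alignment order.
    A running number is appended only when a species has more than one
    gene on the same chromosome in the alignment.
    """
    totals = Counter(genes)
    seen = Counter()
    labels = []
    for species, chrom in genes:
        label = f"{species}.{chrom}"
        if totals[(species, chrom)] > 1:
            seen[(species, chrom)] += 1
            label += f"_{seen[(species, chrom)]}"
        labels.append(label)
    return labels


print(label_sequences([("Mmu", "2"), ("Mmu", "2"), ("Mmu", "6")]))
```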

Phylogenetic trees were made using the neighbour-joining (NJ) method with 1000 bootstrap replicates using standard settings (Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20) in ClustalX 2.0.12. The maximum likelihood trees were generated by using the PhyML 3.0 (Guindon et al., 2010) web server available at http://atgc.lirmm.fr/phyml/. ProtTest (Abascal et al., 2005) was used to find the most suitable sequence evolution model for each alignment. JTT was found to be the best model for all studied CACNA1 gene families and neighbouring gene families except for UBA, COL and QSOX, where WAG was preferred. The selected substitution models were used together with the following settings: empirical amino acid frequencies, proportion of invariable sites and gamma distribution parameter estimated from the dataset, eight substitution rate categories, and tree topology and branch lengths optimized using both the NNI and SPR topology optimisation methods. A non-parametric SH-like approximate likelihood ratio test (aLRT) was selected for the statistical support of the topology.

All genes in the same cluster were coloured relative to the human orthologues' chromosomal position. The phylogenetic trees for the CACNA1 gene families have been rooted using the fruit fly sequences. All bootstrap and aLRT values below 50 percent (bootstrap under 500 and aLRT under 0.5) were considered non-supportive.
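The support cut-off stated above amounts to a simple rule, sketched here with a hypothetical function name. Bootstrap values are assumed to be counts out of the number of replicates, and aLRT values are assumed to lie between 0 and 1.

```python
def is_supported(value, method, replicates=1000):
    """Apply the 50 percent cut-off to a node support value.

    method: "bootstrap" (count out of `replicates`) or "aLRT" (0 to 1).
    Values below the cut-off are treated as non-supportive.
    """
    if method == "bootstrap":
        return value >= 0.5 * replicates
    if method == "aLRT":
        return value >= 0.5
    raise ValueError(f"unknown support method: {method}")


print(is_supported(612, "bootstrap"), is_supported(0.43, "aLRT"))
```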

3 Results

3.1 CACNA1 L-type family

The CACNA1 L-type family has four members in the human genome: CACNA1C, 1D, 1F and 1S. Three of the four members, CACNA1C, 1D and 1S, could be identified in all amniotes included in this study. However, CACNA1F could not be found in the chicken genome or in any other avian genomes. Three of the members, CACNA1D, 1F and 1S, have retained both 3R copies in all four teleost fishes.

The rooted NJ and ML trees show that the genes form four distinct clusters with high support (fig. 5). In the NJ tree the two 3R duplicates of CACNA1D in teleost fish do not form one single cluster, which they do in the ML tree, but they do however cluster together with the other CACNA1D genes as a group. The NJ tree shows a topology which is consistent with an expansion in 2R. Both topologies, together with the retained duplications in teleost fish, support an expansion in 2R and 3R.

Figure 5: Rooted phylogenetic trees of CACNA1 of L-type using neighbour-joining (NJ) and maximum likelihood (ML) methods. The NJ tree is made with 1000 bootstrap replicates and the bootstrap values are shown at the nodes. The corresponding numbers in the ML tree are aLRT values. The sequences are named with the three-letter abbreviation for the species followed by the chromosome on which the gene is located.

3.1.1 Chromosomal regions

From the selection criteria described in materials and methods, thirteen neighbouring gene families to CACNA1 of L-type were identified. Two of the gene families, voltage-gated calcium channel subunit alpha-2 (CACNA2D) and RNA-binding motif proteins (RBM), are being studied by others in our lab and were therefore not included in this study.

Two of the gene families, solute carrier family 26 (SLC26A) and WNT, are complex multi-member families that could not be divided into subfamilies and they were therefore excluded from the study. The A-kinase anchor proteins (AKAP) could only be found in amniotes and were hence excluded from the study.

The families included in the study are: cell division kinase (CDK), ELKS/RAB6-interacting/CAST family (ERC), IQ motif and Sec7 domain containing (IQSEC), lysine specific demethylase (KDM), kelch domain containing (KELCH), translocase of inner mitochondrial membrane 17 (TIMM17), ubiquitin modifier activating enzyme (UBA) and ubiquitin specific processing protease (USP). One of the predicted gene families, CDK, was divided into two subfamilies of which one was included in this study. All phylogenetic trees for the neighbouring families to the CACNA1 L-type family can be found in Supplementary figure S1.

For six of the eight studied families the phylogenetic trees as well as the positional data (figure 6) are in accordance with expansion in 2R. For two of the families, CDK and ERC, the topologies of the ML trees do not support an expansion in 2R; however, both the NJ trees as well as the positional data are consistent with an expansion in the vertebrate lineage.

No orthologue to the genes located on human chromosome X could be identified in the chicken genome. Missing genes in the chicken genome database are not uncommon and will be considered further.

An additional orthologue to human UBA1 was identified on mouse chromosome Y. KDM5D could only be identified in the human, mouse and opossum genomes, located on the Y chromosome in all species.

The two USP genes located next to each other in amniotes (chromosome 3 in human) are not recent local duplicates. Two USP genes were present before the divergence of vertebrates and the genes can therefore be considered as two subfamilies. Only one of these has copies indicating expansion in 2R.

Figure 6: Chromosomal region of the CACNA1 L-type family genes. Repertoire of the CACNA1 L-type gene family and adjacent genes in the human, chicken and zebrafish genomes. Each CACNA1 L-type gene’s identity is shown by a letter in its box. The number below each box shows the chromosomal position in Mb. The number to the side of each line shows the chromosome. Gene names are according to human nomenclature and the colour coding is according to the human chromosome of the cluster in the phylogenetic studies. Striped boxes indicate inconsistencies in the topologies from the two different tree methods. A chicken orthologue to the genes on human chromosome X was not identified. A recent duplication has occurred for the KDM family in the human genome. The zebrafish ERC1 gene (in italics) represents two ERC1 genes located next to each other on chromosome 4. Some 3R copies are retained in the zebrafish genome. Translocated genes or genes located on smaller scaffolds have been placed according to the most parsimonious organisation. Note that genes have been omitted for clarity and that gene order has been changed to highlight similarity.

Four of the eight studied neighbouring gene families have some members with two teleost co-orthologues to one amniote gene, which is consistent with an additional duplication in 3R. The zebrafish genome has three ERC1 genes, one located on chromosome 25 and two shorter genes located close to each other on chromosome 4. These two genes are located in opposite directions. Neither of the genes is complete, but put together they form a full-length gene. Even though no full-length protein could be identified in a protein database (nr, "non-redundant", in NCBI), a combination of the two parts was used in the analyses.

Four of the CACNA1 L-type genes are located on chromosome 8 in the zebrafish genome. When the genes on chromosome 8 are sorted according to their position along the chromosome they form four distinct regions corresponding to different paralogon members, two of which appear to be 3R copies. CACNA1S and KDM5B are located in between these two parts, and CACNA1D and ERC2 are located at one end.
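The sorting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, positions and paralogon labels below are hypothetical placeholders, not values from figure 6, and the function `contiguous_regions` is not part of the original analysis.

```python
from itertools import groupby

# Hypothetical records: (gene name, position in Mb, paralogon member label).
# Names, positions and labels are placeholders, not data from the thesis.
genes = [
    ("geneC", 30.1, "member_2"),
    ("geneA", 5.2, "member_1"),
    ("geneD", 41.7, "member_1"),
    ("geneB", 12.8, "member_1"),
    ("geneE", 48.3, "member_1"),
]

def contiguous_regions(records):
    """Sort genes by position and merge neighbours that share a label."""
    ordered = sorted(records, key=lambda r: r[1])
    regions = []
    for label, run in groupby(ordered, key=lambda r: r[2]):
        run = list(run)
        names = [name for name, _, _ in run]
        regions.append((label, names, run[0][1], run[-1][1]))
    return regions

regions = contiguous_regions(genes)
for label, names, start, end in regions:
    print(f"{label}: {', '.join(names)} ({start}-{end} Mb)")
```

With these placeholder records the same label appears in two separate runs, split by a gene carrying another label, which is the kind of pattern that suggests regions have been joined on one chromosome.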

One CDK gene present in the zebrafish, medaka and pufferfish genomes (located on zebrafish ) does not cluster with any human genes in any of the phylogenies and it is therefore not possible to assign its orthologous relationship to the other genes in the family. However, the chromosomal positions in pufferfish and medaka are in regions that form an established 3R pair together with the chromosomes housing the genes coding for CDK18 (Kasahara et al., 2007). The genes could also be a fourth retained 2R copy that has been lost in the tetrapod lineage.

3.2 CACNA1 N-type family

Three members of the N-type family are present in the human genome: CACNA1A, 1B and 1E. Two of the three genes, CACNA1B and 1E, could be identified in the genomes of all studied amniotes. CACNA1A could not be found in the chicken genome or in any other bird genome. CACNA1E has retained 3R copies in all four teleost genomes. One 3R copy of CACNA1B is missing in the pufferfish genome and one 3R copy of CACNA1A is missing in the medaka and stickleback genomes. However, a part of what could be an additional orthologue to CACNA1A was found in medaka (not included in the study due to its short length). Both the NJ and ML trees show three distinct clusters with statistical support (see fig. 7). The topologies and the chromosomal data are consistent with expansion in 2R and 3R.

Figure 7: Rooted phylogenetic trees of CACNA1 of N-type using neighbour-joining (NJ) and maximum likelihood methods (ML). The NJ tree is made with 1000 bootstrap replicates and the bootstrap values are shown at the nodes. The corresponding numbers in the ML tree are aLRT values. The sequences are named with the three-letter abbreviation for the species followed by the chromosome on which the gene is located.
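The naming convention in the caption can be illustrated with a small parser. This is a sketch under assumptions: it takes labels to be dot-separated, of the form `Hsa.12.CDK17` or `Dre.8`, with an optional gene name as the last field, and it treats a purely numeric third field as part of a scaffold identifier. The `Leaf` type and `parse_leaf` function are hypothetical helpers, not tools used in the study.

```python
from typing import NamedTuple, Optional

class Leaf(NamedTuple):
    species: str          # three-letter species abbreviation, e.g. "Hsa"
    location: str         # chromosome or scaffold identifier, e.g. "12"
    gene: Optional[str]   # gene name when the label carries one

def parse_leaf(label: str) -> Leaf:
    """Split a sequence label into species, location and optional gene name."""
    parts = label.split(".", 2)
    if len(parts) < 2 or len(parts[0]) != 3:
        raise ValueError(f"unexpected label format: {label!r}")
    species, location = parts[0], parts[1]
    gene = None
    if len(parts) == 3:
        if parts[2].isdigit():
            # Assumption: a numeric suffix belongs to a scaffold name.
            location = f"{location}.{parts[2]}"
        else:
            gene = parts[2]
    return Leaf(species, location, gene)

print(parse_leaf("Hsa.12.CDK17"))
print(parse_leaf("Dre.8"))
```

Splitting labels this way makes it straightforward to group sequences by species or by chromosome when comparing a tree with the positional data.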

3.2.1 Chromosomal regions

From the selection criteria described in materials and methods, twelve neighbouring gene families to CACNA1 of N-type were identified. One of these families, NOTCH, has been previously studied by Theodosiou et al. (2009) and was therefore not included in our phylogenetic study. However, the chromosomal position data were included in figure 8.

The families included in the study are: angiopoietin (ANGPTL), bromodomain (BRD), collagen alpha chain precursor (COL), glycosyltransferase 25 (GLT), LIM homeobox (LHX), olfactomedin (OLFM), sulfhydryl oxidase precursor (QSOX), ral guanine nucleotide dissociation stimulator (RALGDS), RAS protein activator like (RASAL), SEC16 homolog (SEC) and syntaxin (STX). One of the predicted gene families, RALGDS, was divided into two subfamilies of which one was included in this study. All phylogenetic trees for the neighbouring families to the N-type of CACNA1 can be found in Supplementary figure S2.

For six of the eleven studied families the phylogenetic trees as well as the positional data (see fig. 8) are in accordance with an expansion in 2R. For two of the families, GLT and RALGDS, no orthologue could be identified in the tunicate or amphioxus lineages. Although the topologies of the phylogenetic trees and the positional data support an expansion in 2R, these two families might therefore have expanded already before 2R.

For the BRD and SEC families the topology from one of the phylogenetic methods does not support an expansion in 2R. However, the topology from the other phylogenetic tree as well as the positional data are consistent with expansions in the vertebrate lineage. The topologies from the phylogenetic studies of the RASAL family do not support an expansion within the vertebrate lineage, but according to the positional data the genes still seem to be located within the paralogon.
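As a consistency check on the counts in this section, the classification of the eleven families can be written out with set arithmetic. The family names and the three named groups come from the text; that the six fully supported families are exactly the remaining ones is an inference by elimination, and the variable names are illustrative.

```python
# The eleven neighbouring families included in the phylogenetic study.
families = {
    "ANGPTL", "BRD", "COL", "GLT", "LHX", "OLFM",
    "QSOX", "RALGDS", "RASAL", "SEC", "STX",
}

# Groups named explicitly in the text.
no_invertebrate_orthologue = {"GLT", "RALGDS"}   # 2R supported, dating uncertain
methods_disagree = {"BRD", "SEC"}                # one tree method does not support 2R
phylogeny_unsupported = {"RASAL"}                # only positional data fit the paralogon

# Assumption: the six families in accordance with 2R are the remaining ones.
in_accordance = (
    families
    - no_invertebrate_orthologue
    - methods_disagree
    - phylogeny_unsupported
)

print(len(families), sorted(in_accordance))
```

The remainder contains six families, so the four groups add up to the eleven studied families (6 + 2 + 2 + 1).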

Only a few orthologues to the genes located on human chromosomes 6 and 19 could be identified in the chicken genome, see discussion.

Two orthologues to RGL1 were identified on two different scaffolds in the genome of the anole lizard. Due to the lack of positional data it is difficult to say whether these genes are two alleles of the same gene locus or duplicates.

Figure 8: Chromosomal region of the CACNA1 N-type family genes. Repertoire of the CACNA1 N-type gene family and adjacent genes in the human, chicken and zebrafish genomes. Each CACNA1 N-type gene’s identity is shown by a letter in its box. The number below each box shows the chromosomal position in Mb. The number to the side of each line shows the chromosome. Gene names are according to human nomenclature and the colour coding is according to the human chromosome of the cluster in the phylogenetic studies. Striped boxes indicate inconsistencies in the topologies from the two different tree methods. Most orthologues to genes located on human chromosomes 6 and 19 could not be identified in the chicken genome. Several 3R copies are retained in the zebrafish genome. The zebrafish COL5A3 gene (in italics) represents three genes located next to each other on chromosome 3. Translocated genes or genes located on smaller scaffolds have been placed according to the most parsimonious organisation. Note that genes have been omitted for clarity and that gene order has been changed to highlight similarity.

Seven of the eleven studied neighbouring gene families have some members with two orthologous teleost sequences to one amniote gene, which is consistent with an additional expansion in 3R.

Possible traces of an unknown member of the RALGDS family were found in the zebrafish genome on chromosome 21. However, due to its short length it was not included in the study. NOTCH2, located on zebrafish chromosome 13, was not included in the phylogenetic study made by Theodosiou et al. (2009) and is therefore shown in white in figure 8.

Three partial COL5A3 genes are located next to each other on chromosome 3 in the zebrafish genome, one of which is reversed relative to the other two. Their nucleotide sequences have a high degree of identity. Even though local duplicates are not uncommon for the collagen gene family, this could also be due to assembly errors. Only the gene coding for the longest protein was used in the studies. Three short collagen genes were identified in the pufferfish genome, one in the medaka genome and one in the anole lizard genome. The protein sequences encoded by these genes were all too short to include in the study and their orthologous relationships could therefore not be established.

4 Discussion

The presence of three types of CACNA1 genes in the fruit fly genome suggests that three ancestral CACNA1 genes were present before the emergence of the vertebrate lineage (Jegla et al., 2009). The CACNA1 genes of L and N type seem to have expanded before the emergence of vertebrates, in 2R, followed by an additional teleost-specific expansion in 3R, which is described in more detail in this report. An analysis of the T-type family (work in progress) indicates that this family also expanded in the same manner.

The existence of the paralogons housing the L and N-type families is supported by the reconstructed linkages showing similarities between the amphioxus and human genomes (Putnam et al., 2008) as well as the reconstruction of the vertebrate ancestral karyotype (Nakatani et al., 2007). A more extensive investigation of the L-type paralogon is currently in progress.

The expansions are supported not only by the phylogenetic studies of the CACNA1 gene families but also by studies of their neighbouring gene families. For the CACNA1 L-type gene family, six of the eight neighbouring gene families support an expansion in 2R. The phylogenies for the remaining two were ambiguous; however, the positional data support an expansion in 2R.

For the CACNA1 N-type gene family, six of the eleven neighbouring gene families support an expansion in 2R. Two of the families lack sequences in species serving as a point of relative dating. However, both the topologies and the chromosomal data support expansions in 2R. The phylogenies for two of the families were ambiguous; however, as for the L-type genes, the positional data support an expansion in 2R. The remaining neighbouring gene family shows topologies that do not support expansion in 2R; however, an expansion in 2R is supported by the chromosomal data. The deviating phylogenetic trees are probably due to uneven evolutionary rates. The previous study of the neighbouring family NOTCH (not included in our phylogenetic studies) suggests that this family also expanded in 2R (Theodosiou et al., 2009).

The high level of retained 2R and 3R copies of CACNA1 genes suggests sub- or neofunctionalisation of the genes and, as shown in table 1, the ten different CACNA1 genes display a wide range of functions. One could speculate that these expansions contributed to the increased complexity of the vertebrate nervous system. This study, together with the previous study of the related voltage-gated sodium channels, which have retained all 2R and 3R copies (Widmark et al., 2011), further supports the theory that important developmental and neuronal gene copies are more likely to be retained (Santini et al., 2009).

From the chromosomal data in fig. 6 and fig. 8 it appears that some rearrangements have occurred in the zebrafish genome, which makes evolutionary studies more difficult. Four CACNA1 genes are located on zebrafish chromosome 8 (fig. 6). However, when the genes on this chromosome are sorted according to their position it seems as if regions from one 3R pair have been joined together on one chromosome, separated by a small region of genes orthologous to another human chromosome. Genome rearrangements in the teleost lineage have been previously proposed (Kasahara et al., 2007) and were recently confirmed by an analysis of the spotted gar (Lepisosteus oculatus) genome (Amores et al., 2011). The spotted gar diverged from the teleost lineage before 3R and was found to share a more similar gene organisation with the human genome than with that of teleost fish.

As mentioned previously, no or very few orthologues to genes located on human chromosomes X, 6 and 19 could be found in the chicken genome. This is not uncommon and can to a large extent be explained by the fact that parts of the chicken genome are located on so-called microchromosomes that have proven hard to sequence in both chicken and turkey (Dalloul et al., 2010). Some of these genes have been found in other avian species, which further suggests that not all of these genes are actually lost in the chicken genome.

It is interesting that two gene families with genes located on human chromosome X (KDM and UBA) also have members on the Y chromosome. KDM5D was identified in the human, mouse and opossum genomes while UBA1 was only identified in the mouse genome, all on the Y chromosome. The human KDM genes located on the X and Y chromosomes are not a part of the pseudoautosomal regions that recombine during meiosis. However, the mammalian sex chromosomes are thought to have emerged from one pair of autosomes (Graves, 2006). Therefore, it is possible that the two UBA1 genes in mouse as well as KDM5C and 5D in several mammals originate from two different alleles that were both retained after the emergence of the sex chromosomes.

This study is based on gene predictions, which may involve some uncertainties. To reduce their impact, several species are included in the study, the sequences are compared with each other, and the sequence alignments are inspected, manually curated and edited if necessary. The phylogenies are based on algorithms that sometimes are optimised for each dataset. In order to reduce the risk of biases, two different phylogenetic algorithms have been used.
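One way to make the comparison between two tree methods concrete is to list the clades that appear in one rooted tree but not in the other. The sketch below is an illustration only and not necessarily how the topologies were compared in this study. Trees are written as nested tuples, and the leaf names `A` to `D` are placeholders.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) in a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found

# Hypothetical rooted topologies from two methods, with placeholder leaf names.
nj_tree = ((("A", "B"), "C"), "D")
ml_tree = ((("A", "C"), "B"), "D")

# Clades present in exactly one of the two trees.
conflicting = clades(nj_tree) ^ clades(ml_tree)
for clade in sorted(conflicting, key=sorted):
    print(sorted(clade))
```

Here the two trees agree on the clade containing `A`, `B` and `C` but disagree on which pair is grouped first, so the two pairwise clades are reported as conflicting.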

In addition to a deeper analysis of the paralogon housing the L-type family and the evolutionary study of the T-type family future studies of the CACNA1 genes will also involve sequence analysis and inspection of important positions for specific channel functions and mechanisms.

5 Acknowledgements

I want to thank my supervisor Dan Larhammar for always being supportive and guiding me throughout the years I have spent in the lab and for taking me in from the very beginning. I would also like to thank Daniel Ocampo Daza: your knowledge seems never-ending and your enthusiasm and dedication always show; you truly are a great teacher. A big thank you to Görel Sundström for being a great companion and for never sitting down with your feet touching the floor. David Lagman, thanks for all the great input and all the great ideas; don’t stop singing. Thanks for all the guidance, Christina Bergqvist; without you lunch breaks would not be the same. Also a big thank you to the rest of the group, and to those who came and left: Bo, Chus, Helena, Ingrid, Jasna, Kate, Niclas, Caryn, Frida and Kat. You all made my time here unforgettable. Finally, great thanks to my external supervisor Tatjana Haitina for guidance and for carefully inspecting my work.

6 References

ABASCAL, F., ZARDOYA, R., AND POSADA, D. 2005. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21:2104–2105.

ALTSCHUL, S. F., GISH, W., MILLER, W., MYERS, E. W., AND LIPMAN, D. J. 1990. Basic local alignment search tool. J Mol Biol 215:403–410.

AMORES, A. 1998. Zebrafish hox Clusters and Vertebrate Genome Evolution. Science 282:1711–1714.

AMORES, A., CATCHEN, J., FERRARA, A., FONTENOT, Q., AND POSTLETHWAIT, J. H. 2011. Genome evolution and meiotic maps by massively parallel DNA sequencing: spotted gar, an outgroup for the teleost genome duplication. Genetics 188:799–808.

BURGE, C. AND KARLIN, S. 1997. Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94.

CATTERALL, W. A. 2011. Voltage-gated calcium channels. Cold Spring Harb Perspect Biol 3:a003947.

CONANT, G. C. AND WOLFE, K. H. 2008. Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet 9:938–950.

COULIER, F., POPOVICI, C., VILLET, R., AND BIRNBAUM, D. 2000. MetaHox gene clusters. J Exp Zool 288:345–351.

DALLOUL, R. A., LONG, J. A., ZIMIN, A. V., ASLAM, L., BEAL, K., BLOMBERG LE, A., BOUFFARD, P., BURT, D. W., CRASTA, O., CROOIJMANS, R. P., COOPER, K., COULOMBE, R. A., DE, S., DELANY, M. E., DODGSON, J. B., DONG, J. J., EVANS, C., FREDERICKSON, K. M., FLICEK, P., FLOREA, L., FOLKERTS, O., GROENEN, M. A., HARKINS, T. T., HERRERO, J., HOFFMANN, S., MEGENS, H. J., JIANG, A., DE JONG, P., KAISER, P., KIM, H., KIM, K. W., KIM, S., LANGENBERGER, D., LEE, M. K., LEE, T., MANE, S., MARCAIS, G., MARZ, M., MCELROY, A. P., MODISE, T., NEFEDOV, M., NOTREDAME, C., PATON, I. R., PAYNE, W. S., PERTEA, G., PRICKETT, D., PUIU, D., QIOA, D., RAINERI, E., RUFFIER, M., SALZBERG, S. L., SCHATZ, M. C., SCHEURING, C., SCHMIDT, C. J., SCHROEDER, S., SEARLE, S. M., SMITH, E. J., SMITH, J., SONSTEGARD, T. S., STADLER, P. F., TAFER, H., TU, Z. J., VAN TASSELL, C. P., VILELLA, A. J., WILLIAMS, K. P., YORKE, J. A., ZHANG, L., ZHANG, H. B., ZHANG, X., ZHANG, Y., AND REED, K. M. 2010. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol 8.

DAVID, L., BLUM, S., FELDMAN, M. W., LAVI, U., AND HILLEL, J. 2003. Recent duplication of the common carp (Cyprinus carpio L.) genome as revealed by analyses of microsatellite loci. Mol Biol Evol 20:1425–1434.

DEHAL, P. AND BOORE, J. L. 2005. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol 3:e314.

DREBORG, S., SUNDSTRÖM, G., LARSSON, T. A., AND LARHAMMAR, D. 2008. Evolution of vertebrate opioid receptors. Proc Natl Acad Sci U S A 105:15487–15492.

EVANS, B. J., KELLEY, D. B., TINSLEY, R. C., MELNICK, D. J., AND CANNATELLA, D. C. 2004. A mitochondrial DNA phylogeny of African clawed frogs: phylogeography and implications for polyploid evolution. Mol Phylogenet Evol 33:197–213.

FELSENSTEIN, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376.

FINN, R. D., CLEMENTS, J., AND EDDY, S. R. 2011. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39:W29–37.

FORCE, A., LYNCH, M., PICKETT, F. B., AMORES, A., YAN, Y. L., AND POSTLETHWAIT, J. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531–1545.

GRAVES, J. A. M. 2006. Sex chromosome specialization and degeneration in mammals. Cell 124:901–914.

GUINDON, S., DUFAYARD, J. F., LEFORT, V., ANISIMOVA, M., HORDIJK, W., AND GASCUEL, O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307–321.

HAHN, M. W. 2009. Distinguishing among evolutionary models for the maintenance of gene duplicates. J Hered 100:605–617.

HOLLAND, L. Z. 2009. Chordate roots of the vertebrate nervous system: expanding the molecular toolkit. Nat Rev Neurosci 10:736–746.

HORDVIK, I., DE VRIES LINDSTROM, C., VOIE, A. M., LILYBERT, A., JACOB, J., AND ENDRESEN, C. 1997. Structure and organization of the immunoglobulin M heavy chain genes in Atlantic salmon, Salmo salar. Mol Immunol 34:631–639.

HUGHES, A. L. AND FRIEDMAN, R. 2003. 2R or not 2R: testing hypotheses of genome duplication in early vertebrates. J Struct Funct Genomics 3:85–93.

JAILLON, O., AURY, J. M., BRUNET, F., PETIT, J. L., STANGE-THOMANN, N., MAUCELI, E., BOUNEAU, L., FISCHER, C., OZOUF-COSTAZ, C., BERNOT, A., NICAUD, S., JAFFE, D., FISHER, S., LUTFALLA, G., DOSSAT, C., SEGURENS, B., DASILVA, C., SALANOUBAT, M., LEVY, M., BOUDET, N., CASTELLANO, S., ANTHOUARD, V., JUBIN, C., CASTELLI, V., KATINKA, M., VACHERIE, B., BIEMONT, C., SKALLI, Z., CATTOLICO, L., POULAIN, J., DE BERARDINIS, V., CRUAUD, C., DUPRAT, S., BROTTIER, P., COUTANCEAU, J. P., GOUZY, J., PARRA, G., LARDIER, G., CHAPPLE, C., MCKERNAN, K. J., MCEWAN, P., BOSAK, S., KELLIS, M., VOLFF, J. N., GUIGO, R., ZODY, M. C., MESIROV, J., LINDBLAD-TOH, K., BIRREN, B., NUSBAUM, C., KAHN, D., ROBINSON-RECHAVI, M., LAUDET, V., SCHACHTER, V., QUETIER, F., SAURIN, W., SCARPELLI, C., WINCKER, P., LANDER, E. S., WEISSENBACH, J., AND ROEST CROLLIUS, H. 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431:946–957.

JANSEN, R., TIMMERMAN, J., LOOS, M., SPIJKER, S., VAN OOYEN, A., BRUSSAARD, A. B., MANSVELDER, H. D., SMIT, A. B., DE GUNST, M., AND LINKENKAER-HANSEN, K. 2011. Novel candidate genes associated with hippocampal oscillations. PLoS One 6:e26586.

JEGLA, T. J., ZMASEK, C. M., BATALOV, S., AND NAYAK, S. K. 2009. Evolution of the human ion channel set. Combinatorial chemistry & high throughput screening 12:2–23.

KASAHARA, M., NARUSE, K., SASAKI, S., NAKATANI, Y., QU, W., AHSAN, B., YAMADA, T., NAGAYASU, Y., DOI, K., KASAI, Y., JINDO, T., KOBAYASHI, D., SHIMADA, A., TOYODA, A., KUROKI, Y., FUJIYAMA, A., SASAKI, T., SHIMIZU, A., ASAKAWA, S., SHIMIZU, N., HASHIMOTO, S.-I., YANG, J., LEE, Y., MATSUSHIMA, K., SUGANO, S., SAKAIZUMI, M., NARITA, T., OHISHI, K., HAGA, S., OHTA, F., NOMOTO, H., NOGATA, K., MORISHITA, T., ENDO, T., SHIN-I, T., TAKEDA, H., MORISHITA, S., AND KOHARA, Y. 2007. The medaka draft genome and insights into vertebrate genome evolution. Nature 447:714–9.

KURAKU, S. 2010. Palaeophylogenomics of the vertebrate ancestor – impact of hidden paralogy on hagfish and lamprey gene phylogeny. Integrative and Comparative Biology 50:124–129.

KURAKU, S., MEYER, A., AND KURATANI, S. 2009. Timing of genome duplications relative to the origin of the vertebrates: Did cyclostomes diverge before or after? Molecular Biology and Evolution 26:47–59.

LARKIN, M. A., BLACKSHIELDS, G., BROWN, N. P., CHENNA, R., MCGETTIGAN, P. A., MCWILLIAM, H., VALENTIN, F., WALLACE, I. M., WILM, A., LOPEZ, R., THOMPSON, J. D., GIBSON, T. J., AND HIGGINS, D. G. 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947–2948.

LARSSON, T. A., OLSSON, F., SUNDSTRÖM, G., LUNDIN, L. G., BRENNER, S., VENKATESH, B., AND LARHAMMAR, D. 2008. Early vertebrate chromosome duplications and the evolution of the neuropeptide Y receptor gene regions. BMC Evol Biol 8:184.

LIPSCOMBE, D., HELTON, T. D., AND XU, W. 2004. L-type calcium channels: the low down. J Neurophysiol 92:2633–2641.

LYNCH, M. AND CONERY, J. S. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151–1155.

MEYER, A. AND VAN DE PEER, Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.

NAKATANI, Y., TAKEDA, H., KOHARA, Y., AND MORISHITA, S. 2007. Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res 17:1254–1265.

OHNO, S. 1970. Evolution by gene duplication. Springer-Verlag.

PANOPOULOU, G. AND POUSTKA, A. J. 2005. Timing and mechanism of ancient vertebrate genome duplications – the adventure of a hypothesis. Trends in genetics: TIG 21:559–67.

PUTNAM, N. H., BUTTS, T., FERRIER, D. E., FURLONG, R. F., HELLSTEN, U., KAWASHIMA, T., ROBINSON-RECHAVI, M., SHOGUCHI, E., TERRY, A., YU, J. K., BENITO-GUTIERREZ, E. L., DUBCHAK, I., GARCIA-FERNANDEZ, J., GIBSON-BROWN, J. J., GRIGORIEV, I. V., HORTON, A. C., DE JONG, P. J., JURKA, J., KAPITONOV, V. V., KOHARA, Y., KUROKI, Y., LINDQUIST, E., LUCAS, S., OSOEGAWA, K., PENNACCHIO, L. A., SALAMOV, A. A., SATOU, Y., SAUKA-SPENGLER, T., SCHMUTZ, J., SHIN, I. T., TOYODA, A., BRONNER-FRASER, M., FUJIYAMA, A., HOLLAND, L. Z., HOLLAND, P. W., SATOH, N., AND ROKHSAR, D. S. 2008. The amphioxus genome and the evolution of the chordate karyotype. Nature 453:1064–1071.

SAITOU, N. AND NEI, M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425.

SANTINI, F., HARMON, L. J., CARNEVALE, G., AND ALFARO, M. E. 2009. Did genome duplication drive the origin of teleosts? A comparative study of diversification in ray-finned fishes. 9.

SUNDSTRÖM, G. 2010. Evolution of the Neuropeptide Y and Opioid Systems and their Genomic Regions, volume 585. Acta Universitatis Upsaliensis, Uppsala.

SUNDSTRÖM, G., LARSSON, T. A., BRENNER, S., VENKATESH, B., AND LARHAMMAR, D. 2008a. Evolution of the neuropeptide Y family: new genes by chromosome duplications in early vertebrates and in teleost fishes. Gen Comp Endocrinol 155:705–716.

SUNDSTRÖM, G., LARSSON, T. A., AND LARHAMMAR, D. 2008b. Phylogenetic and chromosomal analyses of multiple gene families syntenic with vertebrate Hox clusters. BMC Evol Biol 8:254.

TAYLOR, J. S., VAN DE PEER, Y., BRAASCH, I., AND MEYER, A. 2001. Comparative genomics provides evidence for an ancient genome duplication event in fish. Philos Trans R Soc Lond B Biol Sci 356:1661–1679.

THEODOSIOU, A., ARHONDAKIS, S., BAUMANN, M., AND KOSSIDA, S. 2009. Evolutionary scenarios of Notch proteins. Mol Biol Evol 26:1631–1640.

VAN DE PEER, Y., MAERE, S., AND MEYER, A. 2009. The evolutionary significance of ancient genome duplications. Nat Rev Genet 10:725–732.

WIDMARK, J., SUNDSTRÖM, G., OCAMPO DAZA, D., AND LARHAMMAR, D. 2011. Differential evolution of voltage-gated sodium channels in tetrapods and teleost fishes. Mol Biol Evol 28:859–871.

Supplementary figure S1

Phylogenetic analysis of neighbouring gene families to CACNA1 L-type genes using the neighbour joining (NJ) and maximum likelihood (ML) methods with settings as described in materials and methods. The analysis includes the CDK, ERC, IQSEC, KDM, KELCH, TIMM, UBA and USP protein families.

[Tree panels: NJ and ML phylogenetic trees for the CDK, ERC, IQSEC, KDM, KELCH, TIMM, UBA and USP families. NJ node values are bootstrap counts; ML node values are support values between 0 and 1.]

Supplementary figure S2

Phylogenetic analysis of neighbouring gene families to CACNA1 N-type genes using the neighbour joining (NJ) and maximum likelihood (ML) methods with settings as described in materials and methods. The analysis includes the ANGPTL, BRD, COL, GLT, LHX, NOTCH, OLFM, QSOX, RASAL, RALGDS, SEC and STX protein families.

[Tree panels: NJ and ML phylogenetic trees for the ANGPTL, BRD, COL, GLT, LHX and OLFM families. NJ node values are bootstrap counts; ML node values are support values between 0 and 1.]

0.5 52 Dme.3R_3

NJ Dme.3R_1 QSOX

Dme.2R

680 Cin.13q

Gga.17 781 999 Mdo. 1 1000 Hsa.9.QSOX2 1000 805 Mmu.2 1000

1000 Dre.5

Ola.9 1000 Tni.12

Aca.4 1000 1000 Gga.8

Gac.VIII 818 842 Ola.4 987

1000 Tni.1

998 Dre.8

Hsa.1.QSOX1 1000 Mmu.1 1000

Mdo.2

Dme.3R_2

0.05 Dme.2R

1 Dme.3R_1

0,988 ML Dme.3R_2 0,958 Dme.3R_3

Cin.13q

Hsa. 9. QSOX2 0,996 Mmu.2 0,995 Gga.17 0,734 1 0,995 Mdo.1

Dre.5

0,924 Ola.9 1 Tni.12 0,153 Mdo.2

0,999 Hsa.1.QSOX1 0,998 Mmu.1

0,987 Aca.4 0,979 Gga.8

0,646 Dre.8

1 Ola.4

0,986 Gac.VIII 0,82 Tni.1

0.5 53 Aca.GL343208.1 957 Aca.GL343203.1 NJ Hsa.1.RGL1 RALGDS 1000 997 997 Mmu.1 1000 Mdo.2_1

1000 Gga.8 Gac.VIII 448 999 Tni.1 1000 Ola.4 Dre.8_1 999 Gga.17 1000 999 Mdo.1 Aca.GL344285.1 555 Hsa.9.RALGDS 1000 Mmu.2 1000 Gac.XIII 997 1000 Tni.12 973 Ola.uc114 Dre.8_2 Hsa.6.RGL2 1000 1000 Mmu.17 1000 Mdo.2_2 Aca.GL343733.1

1000 Gac.XX 928 999 Tni.8 1000 Ola.16 Dre.16 Hsa.19.RGL3 379 1000 1000 Mmu.9 1000 Mdo.3 Aca.2 Ola.1 542 Gac.IX 817 1000 967 Tni.18 Dre.6 886 Gac.XI 1000 1000 Ola.8 Dre.3 Dme.3L

0.04 Dre.3 0,815 Ola.8 0,992 Gac.XI 1 ML Dre.6

0,548 Tni.18 0,71 0,988 1 Ola.1 Gac.IX Aca.2 0,999 Mdo.3 1 Mmu.9 09610,961 Hsa.19.RGL3 Mdo.2_1 0,892Mmu.1 0,944 0,909Hsa.1.RGL1

0,991 Gga.8 Aca.GL343208.1 0,999 0,984 Aca.GL343203.1 Dre.8_1

1 Tni.1 0,963 0,974Gac.VIII Ola.4 Dre.8_2 0,998 Tni.12 0,734 1 Ola.uc114 0,297 0,998 Gac.XIII Mmu.2 1 Hsa.9.RALGDS 0,955 Aca.GL344285.1 0,93 Mdo.1 0,883 0,923 Gga.17 Dre.16 1 Ola.16 0,823 Tni.8 0,788 1 Gac.XX Aca.GL343733.1 0,996 Hsa.6.RGL2 1 Mmu.17 0,167 Mdo.2_2 Dme.3L

0.8 54 Cin.13q Ola.1 987 RASAL NJ 1000 Gac.IX 949 1000 Tni.Unr_1 Dre.6 999 Hsa.19.RASAL3 1000 1000 Mmu.17 Aca.2 Hsa.6.SYNGAP1 1000 1000 Mmu.17_2 993 Mdo.2_2 1000 Aca.GL343884.1 Tgu.22r Gac.XX 1000 Ola.16 1000 1000 585 Dre.16 722 Dre.19 Tni.Unr_2 1000 Gac.X 1000 1000 Ola.11 Tni.21 Hsa.1.RASAL2 1000 1000 979 Mmu.1 1000 Mdo.2_1 1000 Gga.8 Aca.GL343300.1 1000 Dre.2 973 995 Ola.17 Tni.15r Gac.s133 997 1000 1000 Ola.9 Tni.12 Gga.17 876 812 Aca. AAWZ02036756 1000 Mdo.1 1000 Hsa.9.DAB2IP 1000 696 Mmu.2 Dre.8 1000 Gac.XIV 985 1000 Tni.4 984 Ola.12 Dre.5 Dme.X

0.06 Gac.XIV 0,997Tni.4 0,515 0,976 Ola.12 ML Dre.5 Aca.AAWZ02036756 0,369 0,987 Gga.17 1 Hsa.9.DAB2IP 1 Mmu.2 0,877 0,981 0,99 Mdo.1 Dre.8 Ola.9 0,324 1 Gac.s133 Tni.12 Gac.X 0,347 1 Tni.21 0,77 0,847 Ola.11 Dre.19 1 Dre.16 0,998Gac.XX 0,919 0,978Ola.16 1 Tni.Unr_2 Mdo.2_2 0,501 1 Mmu.17_2 0,998 0,999Hsa.6.SYNGAP1 Aca.GL343884.1 0,998 Tgu.22r Tni.15r 0,998 0,999 Ola.17 0 Dre.2 Hsa.1.RASAL2 0,999 0,583 1Mmu.1 0,898Mdo.2_1 0,998Gga.8 Aca.GL343300 .1 Cin.13q Dre.6 1 Tni.Unr_1. 0,799 0,947 Gac.IX 1 Ola.1 Aca.2 0,998 Mmu.17 1 Hsa.19.RASAL3 Dme.X

0.8 55 Cin.9p NJ SEC Aca.GL344202.1

982 Gga.17

1000 Hsa.9.SEC16A

1000 Mmu. 2 984

Mdo.1

1000 Gac.XIV

1000 Tni.4 1000

1000 Ola.12

Dre.16

996 Aca.GL343300.1

766 Gga.8

1000 Hsa.1.SEC16B

1000 Mmu.1 1000 578

Mdo.2

Dre.6

Dme.X

0.05 Ola.12

0,563 ML Gac.XIV 1

0,994 Tni.4

Dre.16

0,867 Mdo. 1

1 Mmu.2

1

0,989 Hsa.9.SEC16A 0,943 Gga.17

0,2 Aca.GL344202.1

Cin.9p

Dre.6

Aca.GL343300.1 0,68 0,982 Gga.8

1 Mdo.2

0,904 Mmu.1

0,999 Hsa.1.SEC16B

Dme.X

2.0 56 Cin.s273 NJ STX Aca.GL343286.1

842 Hsa.19.STX10

Gac.s198 877 991 Ola.s294 986

1000 Tni.18

Dre.11

Hsa.1.STX6 904 929 Mmu.1

912 Mdo.2

468 1000 Gga.8

Aca.4 1000

Dre.22

650 Tni.Unr

620 Ola.4

499 Gac.VIII

Dme.2R

0.05 Cin.s273

ML Mmu.1 0,959 Hsa.1.STX6 0,826

0,903 Mdo.2

0,972 Gga.8

Aca.4

0,908 Dre.22

0,951 Tni.Unr 0,681 Ola.4 0,829 0,509 Gac.VIII

Hsa.19.STX10

0,944 Aca.GL343286.1

0,935 Dre.11

0,998 Tni.18

0,844 Ola.s294

0,878 Gac.s198

Dme.2R

0.2 57