Open Access Research2006MeraldietVolume al. 7, Issue 3, Article R23

Phylogenetic and structural analysis of centromeric DNA and comment Patrick Meraldi¤*†, Andrew D McAinsh¤*‡, Esther Rheinbay* and Peter K Sorger*

Addresses: *Department of Biology, Massachusetts Institute of Technology, Massachusetts Ave., Cambridge, MA 02139, USA. †Institute of Biochemistry, ETH Zurich, Schafmattstr.,18 CH-8093 Zurich, Switzerland. ‡ Segregation Laboratory, Marie Curie Research

Institute, The Chart, Oxted, Surrey RH8 0TL, UK. reviews

¤ These authors contributed equally to this work.

Correspondence: Peter K Sorger. Email: [email protected]

Published: 22 March 2006 Received: 19 October 2005 Revised: 19 December 2005 Genome Biology 2006, 7:R23 (doi:10.1186/gb-2006-7-3-r23) Accepted: 24 February 2006 reports The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2006/7/3/r23

© 2006 Meraldi et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

servedKinetochore

Analysis from yeast evolutionof centromeric to man.

DNA and kinetochore proteins suggests that critical structural features of have been well con- research deposited

Abstract

Background: Kinetochores are large multi- structures that assemble on centromeric DNA (CEN DNA) and mediate the binding of to microtubules. Comprising 125 base-pairs of CEN DNA and 70 or more protein components, Saccharomyces cerevisiae kinetochores are among the best understood. In contrast, most fungal, plant and animal cells refereed research assemble kinetochores on CENs that are longer and more complex, raising the question of whether kinetochore architecture has been conserved through evolution, despite considerable divergence in CEN sequence.

Results: Using computational approaches, ranging from sequence similarity searches to hidden Markov model-based modeling, we show that organisms with CENs resembling those in S. cerevisiae (point CENs) are very closely related and that all contain a set of 11 kinetochore proteins not found in organisms with complex CENs. Conversely, organisms with complex CENs (regional CENs) contain proteins seemingly absent from point-CEN organisms. However, at least three quarters of interactions known kinetochore proteins are present in all fungi regardless of CEN organization. At least six of these proteins have previously unidentified human orthologs. When fungi and metazoa are compared, almost all have kinetochores constructed around Spc105 and three conserved multi- protein linker complexes (MIND, COMA, and the NDC80 complex).

Conclusion: Our data suggest that critical structural features of kinetochores have been well conserved from yeast to man. Surprisingly, phylogenetic analysis reveals that human kinetochore proteins are as similar in sequence to their yeast counterparts as to presumptive Drosophila information melanogaster or Caenorhabditis elegans orthologs. This finding is consistent with evidence that kinetochore proteins have evolved very rapidly relative to components of other complex cellular structures.

Genome Biology 2006, 7:R23 R23.2 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. http://genomebiology.com/2006/7/3/r23

Background protein' is used rather loosely, all linkers exhibit a clear hier- Kinetochores are eukaryote-specific structures that assemble archical relationship with respect to DNA and MT-binding on centromeric (CEN) DNA and perform three crucial func- proteins: linker proteins require DNA binding proteins, and tions: they bind paired sister chromatids to spindle microtu- possibly also other linker proteins, for CEN DNA binding but bules (MTs) in a bipolar fashion compatible with chromatid not MTs or MT-associated proteins (MAPs). disjunction; they couple MT (+)-end polymer dynamics to chromosome movement during metaphase and anaphase [1]; Kinetochore assembly in S. cerevisiae is initiated by associa- and they generate the spindle checkpoint signals linking ana- tion of the essential four-protein CBF3 complex with the phase onset to the completion of kinetochore-MT attachment CDEIII region of CEN DNA. CBF3-CDEIII association then [2]. Despite the conservation of these functions, and of MT recruits several additional DNA binding proteins, including structure and dynamics, CENs in closely related organisms scCse4, a specialized histone H3 found only at CENs are highly diverged in sequence, as are CENs on different (CenH3). CenH3-containing nucleosomes are thought to be chromosomes in a single organism [2,3]. The simplest known core components of all kinetochores [13]. When CEN associ- CENs, those in the budding yeast Saccharomyces cerevisiae, ated, the DNA binding subunits of S. cerevisiae kinetochores consist of 125 base-pairs (bp) of DNA and three protein-bind- recruit four essential multi-protein linker complexes, the ing motifs (CDEI, CDEII and CDEIII) that are present on all NDC80 complex (four proteins), COMA (four proteins), 16 chromosomes [4]. These short CEN sequences, often called MIND (four proteins) and the SPC105 complex (two pro- 'point' CENs, are structurally similar to enhancers and tran- teins). These complexes, in turn, recruit a multiplicity of scriptional regulators in that their assembly is initiated by motor proteins and MAPs to form a fully functional MT- highly sequence-selective DNA-protein interactions [5]. In attachment site (P De Wulf and PK Sorger, unpublished contrast, CEN DNA in fungi such as the budding yeast Cand- observation) [14-16]. ida albicans and fission yeast Schizosaccharomyces pombe, plants such as Arabidopsis thaliana, and metazoans such as A key question in the study of kinetochores is whether archi- Drosophila melanogaster and Homo sapiens, are longer and tectural features currently being elucidated in S. cerevisiae more complex and exhibit poor sequence conservation [6- are conserved in higher cells. Some S. cerevisiae proteins 10]. These regional CENs range in size from 1 kb in C. albi- have been shown to have orthologs in one or more metazoa. cans [6], to several megabases in H. sapiens [8] and typically These metazoan orthologs include CenH3, CENP-CMif2, contain long stretches of repetitive AT-rich DNA. CEN organ- Mis6Ctf3/CENP-I, Spc105KNL-1/Kia1570, members of the NDC80 ization is particularly divergent in nematodes such as and MIND complexes as well as MT-associated proteins such Caenorhabditis elegans, which contain holocentric CENs as EB1Bim1 and CLIP170Bik1, Mad-Bub spindle checkpoint pro- with MT-attachment sites distributed along the length of teins and some regulatory kinases [2,17-26]. To date, how- chromosomes [11]. Sequence-selective DNA-protein interac- ever, only CenH3 and CENP-C have been carefully compared tions have not been identified in regional CENs and it is at a sequence level in a wide range of organisms [27]. Here we thought that kinetochore position is determined by a special- report a systematic analysis of sequence relationships among ized chromatin domain whose formation at one site on each a set of approximately 50 fungal, plant and metazoan kineto- chromosome is controlled by epigenetic mechanisms [2,12]. chore proteins with the overall aim of exploring their struc- tural and evolutionary relationships. Our analysis supports A combination of genetics and mass spectrometry in S. cere- the conclusion that the four linkers at the core of S. cerevisiae visiae has yielded a fairly detailed view of the composition kinetochores, the NDC80 complex, MIND, COMA, and the and architecture of its simple kinetochores. S. cerevisiae SPC105 complex, have been conserved through eukaryotic kinetochores contain upwards of 70 protein subunits organ- evolution. A subset of kinetochore proteins, perhaps 20% of ized into 14 or more multi-protein complexes that together the total in S. cerevisiae, seems to be specific to point CENs, have a molecular mass in excess of 5 to 10 MDa [5]. S. cerevi- all of which are very closely related. A second set of kineto- siae kinetochore proteins can be assigned to DNA-binding, chore proteins is found only on regional CENs. It appears, linker, MT-binding and regulatory functions. While 'linker therefore, that all kinetochores have a single ancestor, proba-

FigurePoint 1 (see following are page)derived from regional centromeres and appeared only once during evolution Point centromeres are derived from regional centromeres and appeared only once during evolution. (a) The 16 CENs from S. cerevisiae were used to train a HMM. The blue bar indicates the number of predicted point CENs in the genome and the red bar represents the number of known chromosomes. (b) HMM from (a) was used to search the genome of fungi with known point CENs, known regional CENs and predicted point CENs. Blue and red bars are as described in (a) except gray bars, which indicate the predicted number of chromosomes, based on synteny within other Saccharomyces species. (c) Sequence comparison of the CDEI, CDEII and CDEIII elements from budding yeast with point centromeres. (d) Frequency distribution of the CDEII length (measured in bp) in each budding yeast with point centromeres. (e) Evolutionary conservation of CBF3 subunits in fungi with point and regional CENs. (f) Phylogenetic analysis of 17 different fungi, including the 7 budding yeast with point centromeres and the 3 budding yeast with regional centromeres using 3 highly conserved reference proteins (α-tubulin, the signal recognition protein SRP54 and the DNA replication factor PCNA). Blue branches represent fungi with point centromeres and black branches those with regional centromeres.

Genome Biology 2006, 7:R23 http://genomebiology.com/2006/7/3/r23 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. R23.3

(a) (d) comment

100 Saccharomyces S. cerevisiae cerevisiae 90 S. bayanus S. mikatae 0 2 4 6 8 10 12 14 16 18 20 80 S. paradoxus

Known point C. glabrata 70 E. gossypii (b) K. lactis Candida 60 glabrata 50 Eremothecium CEN gossypii 40 reviews Frequency (%) Frequency Kluyveromyces 30 lactis 20

Schizo- Known regional saccharomyces 10 pombe 172 182 192 202 102 132 142 112 122 152 162 32 22 62 82 92 12 23 52 72

0 2 Candida albicans Length of CDEII (bp) Aspergillus CEN nidulans (e) Fungi Point CEN Ndc10 Cep3 Ctf13 Ctf3/Spc105 reports

Saccharomyces Predicted point bayanus S. cerevisiae +++++

Saccharomyces C. glabrata +++++ mikatae E. gossypii +++++ CEN Saccharomyces K. lactis +++++ paradoxus S. pombe - - - - + 0 2 4 6 8 1012141618 20 C. albicans - - - - + research deposited Number of predicted point CENs A. nidulans - - - - + Number of chromosomes Predicted number of chromosomes S. bayanus +++++

S. mikatae +++++

(c) CDEI CDEII CDEIII S. paradoxus +++++

Length AT content

2 2

1

1 bits Saccharomyces bits G 83-86 bp >86% T T A A A AT TCT C A GTGAC A CGAA G C A GT T TC 0 1 2 3 4 5 T6 7 8 9 0 A 1 2 3 4 5 6 7 8 9 T T10 T11 12 13 14 15 16 17 5? 3? C 10 CC 5? 3? G G G cerevisiae TCA TG AA (f) point CEN refereed research 2 2 Saccharomyces mikatae regional CEN Candida 1 Saccharomyces bayanus 1 bits bits 73-78 bp >83% GTT C Saccharomyces paradoxus A AT A T G C A A G G G T TCCTGT CTA G 0 C T 1 2 3 4 5 6 7 8 9 0.1 changes/aa GG GCA C

glabrata A 10 T11 12 13 14 15 16 17 0 100 1 2 3 4 5 6 7 8 9 5? 3? T 10 G G CCGAA Saccharomyces cerevisiae 5? CA TGA 3? Candida glabrata 62

2 2 Kluyveromyces Kluyveromyces lactis 1 1 100 bits bits 161-164 bp >85% AT A G TAT AG lactis TCTATATGTC CT C TG 0 0 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 11 12 13 14 15 16 17 10 5? CA GTG 10 3? 5? TTCCGAAA3? Eremothecium gossypii Saccharo- mycotina 2 100 Eremothecium 2 1 1 bits bits 164-167 bp >79% A G T CT T T C T A A G G G gossypii T G CTA 0 A G

G 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9

A 10 11 12 13 14 15 16 17 10 A 5? CA TGA 3? 5?GT T TTCCGAA 3? Candida albicans 100 100

2 Debaryomyces hansenii Saccharomyces 2 1

1 bits bits 82-84 bp >91% T A A A A T A A GTA TG A C C C C GAT GC T AG G C C paradoxus T G G T 0 A T T 1 2 3 T4 5 T6 7 8 9 0 G 1 2 3 4 5 6 7 8 9

G 10 11 12 13 14 15 16 17 18 5? TCAC TG 3? 5? G TCCGAA3? Yarrowia lipolytica 72 interactions

2 2 Fusarium graminearum

Saccharomyces 1

bits 75 1 bits 81-85 bp >90% T T A AT A T C CC C AT TA G T A A 0 A GTACT A C Magnaporthe grisea A 1 2 3 4 5 6 7 8 9 C mikatae T C

G G 10 11 12 13 14 15 16 17 18 0 G GG 5? 3? 1 2 3 4 C5 6 7 8 9 G TCCGAA T 100 5? CA TG 3? Pezizo - 100 Neurospora crassa mycotina

2 2 Saccharomyces Aspergillus nidulans 1

bits 1 82-85 bp >88% bits bayanus A G A T T G T T T C CT A A T C A A GA A A T A GG G GAG C CTA 0 T 1 2 3 4 5 6 7 8 9 0 T G G 1 2 3 4 5 6 7 8 9

5? 3? 10 11 12 T13 14 15 16 17 18 TCAC TG 5? G T CCGAA3? Cryptococcus neoformans Basidio- 100 2 2 mycota 73-167 bp >79% Ustil agomaydis 1

bits 1 Consensus bits A G A C T GT T T C T G G A AT TC GA A T A TA A A T T C A C G A C G GC C 0 C A G G G C GA A A

1 2 3 4 5 6 7 8 9 T G T T GT AA 0 G T 1 2 3 4 5 T6 7 8 9 T 5? 3? 10 11 12 CA TG 5? G 13 14 15 16 17 18 19 20 21 T22 C23 C24 G25 A26 A27 3? Schizosaccharomyces pombe information

Figure 1 (see legend on previous page)

Genome Biology 2006, 7:R23 R23.4 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. http://genomebiology.com/2006/7/3/r23

Table 1

Sequence similarities among selected fungal kinetochore proteins of point CEN

Location Complex Protein Ubiquitous Point CEN specific Similarity* Identity*

DNA-binding Monomer Mif2 + 65% 23% ?Sgt1+ 74%28% CBF3 Cep3 + 65% 14% Ctf13 + 53% 9% Ndc10 + 48% 10% Linker layer COMA Mcm21 + 45% 7% Ctf19 + 47% 7% Ame1 + 45% 9% Okp1 + 51% 7% MIND Nnf1 + 67% 14% Nsl1 + 69% 15% NDC80 Ndc80 + 73% 20% Spc24 + 63% 6% SPC105 Spc105 + 48% 5% Ydr532C + 52% 6% ? Chl4 + 52% 11% ?Ctf3+ 51%6% ?Nkp1† + 55% 6% ?Nkp2† + 63% 6% ? Mcm16 + 52% 7% ? Mcm22 + 53% 4% ?Iml3‡ + 24% 6% ?Cnn1 +40%4% MT-binding DASH Ask1 + 43% 11% Dam1 + 54% 6% Regulatory ? Bub3 + 65% 18% ?Mad2+ 98%54%

*As determined from the proteins in S. cerevisiae, C. glabrata, E. gossyppii and K. lactis. †Instead of E. gossypii the sequences were derived from the very closely related S. kluyveri. ‡Similarity was determined from proteins of the point CEN containing S. cerevisiae, S. kudriavzevii, K. waltii and S. kluyveri. bly based on a regional CEN, from which contemporary kine- for CDEI and CDEIII, a hidden Markov model (HMM) for tochores diverged rapidly while conserving key structural CDEII (Figure 1a), and S. cerevisiae CENs as a training set. features. When the model was tested on C. glabrata, E. gossypii and K. lactis, organisms whose genomes are fully annotated, 6/13 centromeres in C. glabrata, 6/7 centromeres in E. gossypii Results and 6/6 in K. lactis were identified correctly (Figure 1b). Con- Point centromeres have a common origin versely, no point-CEN sequences were found in S. pombe, C. As a first step in determining relationships among kineto- albicans or A. nidulans, organisms known to have regional chores in different organisms, we searched fungal genomes CENs (Figure 1b). With a success rate of >70% and a false pos- for point CENs similar in structure to those in S. cerevisiae. itive rate of <5%, we conclude that our computer model is Three such examples are already known, C. glabrata, E. gos- effective at finding point CENs. sypii and K. lactis [28], but a significant number of newly sequenced genomes have not yet been analyzed. Finding new When unannotated genomes were analyzed using the tri-par- CENs with a CDEI-CDEII-CDEIII structure is not trivial tite computational model, 15 CDEI-II-III sequences were because the number of identical bases in CDEI and CDEIII is found in S. bayanus,14 in S. mikatae and 15 in S. paradoxus relatively small, even among chromosomes in S. cerevisiae. (Figure 1b) [29]. S. bayanus, S. mikatae and S. paradoxus Moreover, CDEII is not conserved in sequence but, rather, is contigs have not yet been fully assembled, but sequence sim- characterized by high AT content and alternating runs of ilarity and synteny suggest that all 3 have 16 chromosomes, poly-A and poly-T. To capture this information we con- close to the number of putative CEN sequences identified structed a tri-partite computational model based on profiles computationally in each organism. When these newly identi-

Genome Biology 2006, 7:R23 http://genomebiology.com/2006/7/3/r23 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. R23.5

fied point CENs were combined with those in the literature, and CBF3 subunits are mapped on a phylogenetic tree (con- 85 CDEI-II-III sequences from 7 organisms became availa- structed using the highly conserved reference proteins α-

ble. These yielded a clear consensus for CDEI and CDEIII and tubulin, the signal recognition particle subunit SRP54 and comment revealed that, within a single organism, CDEII can vary in PCNA) they were found to cluster closely together (Figure 1f). sequence from one chromosome to the next but that length While recognizing the possibility for false-negative findings distributions are very narrow (± 3%; Figure 1c, d). Most fungi in cross-species sequence searching, we conclude that CDEI- have 84 bp CDEII sequences but E. gossypii and K. lactis II-III CENs and CBF3 CEN-binding proteins are probably have 164 bp CDEIIs, suggesting the presence of two copies of found only in a subset of closely related budding yeasts and, an underlying approximately 84 bp CDEII module (Figure thus, may have co-evolved. Intriguingly, the apparent com- 1d). To a first approximation, the extent of conservation mon ancestor of point-CEN and regional-CEN organisms among CDEI and CDEIII sequences on different chromo- appears to be a fungus containing regional CENs, implying reviews somes within a single organism was not much greater than that simple point CENs arose from complex regional CENs the extent of conservation among syntenic CENs in different and not the other way round. organisms (Figure 1c). Together, these data strongly imply that all organisms with CDEI-II-III point CENs arose from a To delineate further which kinetochore proteins are specific relatively recent common ancestor. to point CENs, and which are more widely distributed, we analyzed all known S. cerevisiae kinetochore proteins for Kinetochore proteins specific to organisms with point sequence conservation. As a starting point we examined centromeres scMis12Mtw1 and scNdc80Hec1, kinetochore proteins first iden- reports Does the existence of CENs with similar CDEI-II-III struc- tified in yeast and subsequently shown to have human tures imply the existence of similar DNA-binding kinetochore orthologs (hsMis12 and hsNdc80Hec1) that localize to kineto- proteins? In addressing this question, the CDEI-binding Cbf1 chores and play a role in [20,25]. protein is not very useful because it functions not only as a Experimental and sequence data establish that yeast and kinetochore subunit but also as a transcription factor for a set higher cell Ndc80Hec1 and Mis12Mtw1 proteins represent true of highly conserved biosynthetic [30], implying conser- orthologs [20,32-34]. Nonetheless, the overall degree of sim- vation of non-kinetochore function. We therefore concen- ilarity among Ndc80Hec1 and Mis12Mtw1 proteins across research deposited trated on components of the CBF3 complex, three of whose eukaryotes was found to be relatively modest (approximately subunits are thought to function only in CDEIII-binding (the 15% to 30%) as compared to proteins involved in DNA repli- fourth subunit, scSkp1, is also a component of the SCF ubiq- cation (PCNA, approximately 75%) or protein translocation uitin ligase complex [31] and, like Cbf1, has conserved non- (SRP54, approximately 60%). Multiple protein sequence kinetochore functions). When PSI-BLAST was used to search alignments of fungal, plant, and metazoan Ndc80Hec1 and predicated open reading frames in 17 fungal genomes for Mis12Mtw1 showed that sequence similarity is confined to 30 orthologs of scCtf13, scCep3 and scNdc10, all 3 CBF3 subunits to 100 residue blocks interspersed by stretches of non-homol- were found in the organisms with point CENs (7 in total), but ogy, many of which correspond to coiled coils (Figure 2a, b). refereed research not in organisms with regional CENs (Figure 1e). As a positive This pattern of block-by-block similarity was also observed control for the PSI-BLAST search, orthologs of scMis6Ctf3 and with five other kinetochore proteins for which orthology has scSpc105 could be found in all fungi examined (Figure 1e). been established experimentally, and is consistent with previ- Importantly, Mis6Ctf3 and Spc105 have approximately the ous proposals that kinetochore proteins have evolved rapidly same degree of sequence divergence in point-CEN containing [35] (Figure 2c). Importantly, for our purposes, data obtained fungi (51% and 48% similarity, respectively) as Ndc10 (48% from known kinetochore orthologs suggests that it is neces- similarity; Table 1). We provisionally conclude that CBF3 pro- sary to use conserved blocks, rather than complete sequences, teins are present only in fungi with CDEI-II-III CEN DNA when searching kinetochore proteins for patterns of sequence whereas other kinetochore proteins (such as Spc105 and Ctf3) conservation. interactions are ubiquitous. Moreover, when organisms with point CENs

FigureSequence 2 (seesimilarity following between page) kinetochore proteins is restricted to short stretches between orthologs Sequence similarity between kinetochore proteins is restricted to short stretches between orthologs. Multiple sequence alignments of the (a) Mis12Mtw1 and (b) Ndc80Hec1 families. Schematic drawing above the alignment indicate the length of the S. cerevisiae proteins and the percentages denote the degree of similarity of successive sequence blocks (black boxes) within fungi (red letters) or fungi, metazoa and plantae (green letters). The schematic drawing above the Ndc80 multiple sequence alignment also indicates the relative position of the globular and coiled-coil domain of Ndc80, as determined by electron-microscopy [32,33]. White letters on black denote identical residues, white letters on green, identical residues in ≥ 80% of the organisms and information black letters on green, similar residues in ≥ 80% of the organisms. (c) Schematic drawings indicating the percentage similarity of successive sequence blocks (black boxes) within fungi (red letters) or fungi, metazoa and plantae (green letters) based on multiple sequence alignments of the Nuf2, Spc25, Spc24. CENP-CMif2 and Mis6Ctf3/CENP-I, PCNA and SRP54 protein families

Genome Biology 2006, 7:R23 R23.6 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. http://genomebiology.com/2006/7/3/r23

50aa

(a) 10% 67% 0% 74% 13% 289 scMtw1 0% 54%0% 58% 17%

Similarity amongst fungi 14Block 1 46 60Block 2 98 Similarity amongst fungi, scMtw1 EHEHLGYPPISLVDDDDIININAVNEIMYKCTAAMEKYL EIKSGVAKLESLLLLENSVDKNFDFDKLELYVLRNRNVLRIPEE klMis12 EHEHLGYPPISLVDDDDIININAVNEIMYKCTNAMEKYL EIKSGVAKLESLLLLENSVDKNFDFDKLELYVLRNRNILSIPSD metazoa and plantae Saccharomycotina caMis12 EHEHLEFAPLTLIDDDDVININAVNEIMYKGTTAIETYL EIEIGMGKLESLLLLESTIDKNFDFDKFFEELYVLRNRNIFRIPKE ylMis12 EHEHFGYAPLAVVDDDDVININAVNQVLYTVTDAMEDFL EIEIGTAKMETLLLLETKVDEKFDFDLFFEELDALRNRNVFNVPSE Schizosaccharomycetes spMis12 ELLEFTPLSFIDDDDVIINNITNQLLYKGVNGVDKAF EIEEGLHKFEVLFESVVDRYYDGFEFEVYTLRNRNIFSYPPE mgMis12 EHEHFGYPPVSVLDDDDIININSINILAEQALNSVERGL EVENGTHQLETLLLLCASIDRNFDFDIFFEEIWVMRNRNILTVRPD EHEHFGYPPVSLLDDDDIININSINILAERALNSVEQGL EIENGTHQLETLLLLCASIDRNFDFDKFFEEIYVMRNRNILTVRPD Pezizomycotina ncMis12 fgMis12 EHEHFGYPPASLLDDDDIININTVNVLADRALDSVERLL EIEHGTHQLETLLLLNASIDKNFDFDLFFEELYTMRNRNILTVKPD EAEQGMHAILTLMENSIDHTLDTFEFELYCFRSVFGIRSR umMis12 EHEHFGYNPKSFIDALVYLSNEHLYSIATEFENVV ELIHGLHALETLLLLETHVDKAFDFDMFTSWLMRNRNPFEFSPD Basidiomycota cnMis12 QLVGVNPKNLGADLTETARLEMYNAVTSIDNWT

mgMis12 EHFGYPPVSLLDDIINSINILAEQALNSVERGL EVENGTHQLETLLCASIDRNFDIFEIWVMRNILTVRPD ncMis12 EHFGYPPVSLLDDIINSINILAERALNSVEQGL EVENGTHQLETLLCASIDRNFDKFEIYVMRNILCVRPE Fungi spMis12 ELLEFTPLSFIDDVINITNQLLYKGVNGVDKAF EIEEGLHKFEVLFESVVDRYFDGFEVYTMRNIFSYPPE scMtw1 EHLGYPPISLVDDIINAVNEIMYKCTAAMEKYL EIKSGVAKLESLLENSVDKNFDKLELYVLRNIFRIPEE caMis12 EHLEFAPLTLIDDVINAVNEIMYKGTTAIETYL EIEIGMGKLESLLESTVDKNFDKFELYVLRNIFRIPKD drMis12 QFFGFTPETCTLRVRDAFRDSLNHILVAVESVF TARESTQKLRGFLQERFEIMFQRMKGMLIDRMLSIPQN hsMis12 QFFGFTPQTCMLRIYIAFQDYLFEVMQAVEQVI QIRKCTEKFLCFMKGHFDNLFSKMEQLFLQLILRIPSN Metazoa mmMis12 QFFGFTPQTCLLRIYIAFQDHLFEVMQAVEQVI QTRKCTEKFLCFMKGRFDNLFGKMEQLILQSILCIPPN xlMis12 QLFEFTPQTCILRIYIAFQDYLFEVMLVVEKVI RVRQSTEKYLHFMRERFDFLFQKMETFLLNLVLSIPSN ALSNGIARVRGLLLSVIDNRLKLWESYSLRFCFAVPDG Plantae atMis12 DSMNLNPQIFINEAINSVEDYVDQAFDFYARDA (b) 50aa Spc25 binding

Coiled-coil core MT Outer head

6% 11465% 233 34% 691 scNdc80 7% 53% 33% Block 1 scNdc80 RDPDP RPRP LRKND FQSA IQEE IYDYYLL KKNKFDI ETNHP ISIKFLKQPTQTQ KGFII IKWF LLYY LRL DPDP GYGFTK S---- IE NEIYQILLKK NLRYPYP FLES INKSKS QI SAVG-SNG WHK FLGLG MHMVRTNIKL W LD klNdc80 RDPDP RPRP LRKND FQNL LQQE IFSYYLL TDQKFDV ETNHP ISLKSLKQPTQTQ KDIYF MKWF LLYY LRL DPDP GYVFTK S---- LE HEVYSILRTIHYPYP YLA TINKKSS QI SAVG-SNG WPK FV GMHLVIL W INKKLD Saccharomycotina caNdc80 MDPDP RPRP LKKKD YQEL IQKE IIRYYLL IDYKFEI KTNIA LTENILKSPTQTQ KNNAF IKFF LLYY NQL DPDP NYMFIK SS--- IE QEIVTLLLKK LLNYPYP YMHT ITR SHF SAVG-NNG WPT FLLGG IYL WLV ELNL SLS ylNdc80 RDPDP RPRP VRD RHYQQQ ISQQ IYEYYLL VTNHFEQ ETRHP LNQRTLSNPTQTQ KDKTF MEWIFRF RI DDPP GYPFHK S---- IE NEVHAVLRAAKWLYPYP DS ITKSKS QI VAVG-QSG WAY FSGMHMVEL W LNT TIE KDPDP RPRP LSDRRYQQE CATQV VNYYLL LES- ---GF SQP LGLNNRFMPSTREF AA IKHF LYLY NKL DPDP NFRFGA R---- YEE DVTT CLKLK ALNYPYP FLDS ISR SRL VAISPHVG WPA ILLGG MVVLHW SIQCTL E Schizosaccharomycetes spNdc80 mgNdc80 KDPDP RPRP LKRSD YQNR IGQE LLDYYLL TQHNFEL DMNHN LSQNVIKSPTQTQ KDNYF IQWF LLYY NRI DPDP SYKFMK N---- ID QEVPPLLLKK QLRYYPP YEKGITKSKS QI AAVG-QNG WST FLGLG MHMMQL W LAQ MIE ncNdc80 RDPDP RPRP LKRQD FQNR IGQE LLEYYLL AKNNFEM EMNHK LSDNFTKSPTQTQ KDNYF LQWF LLYY HRI DPDP SYRFQK N---- ID QEVPPLLLKK QLRYPYP YEKS ITKSKS QI AAVG-QNG WST FLGLG LHMMQL W LAQ MLE Pezizomycotina fgNdc80 RDPDP RPRP LKRSD FQAR IGQE IMEYMVQHNFEM EMKHV LSQNVLKSPTTQQ KDNYF MQWF LLYY HRI DPDP SHKFQK N---- ID QEVPPLLLKK QMRYPYP FERS ITKSKS QI AAVG-QNG WST FLGLG LHMMQL W LAQ MLD RE TRPRP IKD KHYK TR MGLTVKEH LERTGF TM AG--W DAN KGVHEPTQTQ SAFVG MKHF IYATCIDTNF VMG VDGKK FE DEVLTLM KEIKYPYP AA DELS KTK LTAAGSQS HWPY CLAML EW MV NLGN QAE umNdc80 Basidiomycota KDPDP RPRP LRKVD FQSN CMRN VNEYYLL ISVRY P- ---LP LTAKTLTSPTAKEQSF IKFF LVN DL VD PGAAW GKK-- -FE DDTLSILLKK DLKYPYP GM DS VSKTALTAPGAPQSWPN MLAML NW LV DLCK ALD cnNdc80

mgNdc80 KDPDP RPPLL KDRSYQNRIGQE LLDYL TQH NF ELDMN HNLS QNVIKSPTQKDFNYIIFF QWLYLY NR IDPSY KF MKN-I DQ EVPPLLK QLRYPYEK GITK SQIAAVGG-QNWST FL GMLHWMM QLA QMIE ncNdc80 RDPDP RPPLL KDRQFQNRIGQE LLEYL AKN NF EMEMN HKLS DNFTKSPTQKDFNY LFQWLLYY HR IDPSY RF QKN-I DQ EVPPLLK QLRYPYEKSITKSQIAAVGG-QNWST FL GLLHWMM QLA QMLE Fungi spNdc80 KDPDP RPPLL SDRRYQQECATQV VNYL LES-- --GFS QPLGLNNRFMPSTRE FAAIIFF KHLYLY NK LDPNFRF GAR-Y EE DVTT CLAK LNYPFLDSISRSRLVAIGSPHVWPA IL GMLHWVV SLIQCTE scNdc80 RDPDP RPPLL RDKNFQSAIQEE IYDYL KKN KF DIETN HPIS IKFLKQPTQKGIIKF IIFF WLYLY LR LDPGY GF TKS-I EN EIYQILK NLRYPFLESINKSQISAVGG-SNWHK FL GMLHWMV RTN IKLD caNdc80 MDPDP RPPLL KDKKYQELIQKE IIRYL IDYK FEIKT NIA LT ENILKSPTQKNFNAIIFF KFLYLY NQ LDPNY MF IKSSI EQ EIVTLLK LLNYPYMHTITRSHFSA VGG-NNWPT FL GILYWLV ELN LSLS drNdc80 KDPDP RALHDKAFVQQCIKQ LYEFL VDR -- --GFP GSIT VKALQSPSTKEFLK IYEFI YNF LEPSFQM PTAKV EE EIPRMLK DLGYPFALSK--SSMYSIGAPHTWPL ALG ALIWLM DAV KLFG hsNdc80 KDDPP RPPLL NDKAFIQQCIRQ LCEFL TEN -- --GYA HNVS MKSLQAPSVKDFLKIIFF TFLYLY GF LCPSY EL PDTKF EE EVPRIFK DLGYPFALSK--SSMYTVGAPHTWPH IV AALVWLI DCI KIHT Metazoa mmNdc80 KDPDP RPPLL NDKAFIQQCIRQ LYEFL TEN -- --GYV YSVS MKSLQAPSTKEFLKIIFF AFLYLY GF LCPSY EL PGTKC EE EVPRIFK ALGYPFTLSK--SSMYTVGAPHTWPH IV AALVWLI DCI KIDT xlNdc80 KDPDP RPPLL HDKAFIQQCIRQ LCEFL NEN -- --GYS QALT VKSLQGPSTKDFLKIIFF AFI YTF ICPNY EN PESKF EE EIPRIFK ELGYPFALSK--SSMYTVGAPHTWPQ IV AALVWLI DCV KLCC Plantae atNdc80 ------GASDD RSSM IRFINAFSL TH------N FPIS IRGNPVPSVKDI SE TLKFLLS ALD--YPC DSIKW DE DLVF FLSQKCK PFKITK--SSLKAPNT PHNWPT VL AVVHWLAELARFHQ

(c) 37% Nuf2/Nuf2R family 33% 50aa

29% 48% 59% Spc25 family 20% 50% 59%

22% 62% Spc24 family 24% 44%

43% 3% 60% 8% 50% Mif2/CENP-C family 6% 2%60% 1% 26%

6% 35%13% 45% 7% 42% Ctf3/Mis6/CENP-I family 3% 30%5% 30% 9% 40%

75% PCNA family 75%

72% 13% SRP54 family 68% 8%

Figure 2 (see legend on previous page)

Genome Biology 2006, 7:R23 http://genomebiology.com/2006/7/3/r23 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. R23.7

When 55 S. cerevisiae kinetochore proteins (including the organisms to higher eukaryotes (see Figure 4 for a schematic CBF3 subunits discussed above) were used in PSI-BLAST of the approach). Alignments were created for 41 ubiquitous

queries to search 14 fully annotated fungal genomes (Addi- fungal proteins and conserved blocks determined. The non- comment tional data file 1), 41 were found to have orthologs in organ- redundant NCBI protein database was then searched for isms with both point and regional CENs (Figure 3). These these conserved blocks using PSI- BLAST or Prosite pattern proteins included kinetochore regulators such as the Mad1-3, searching algorithms (see Materials and methods for details). Bub1, BubR1/Mad3 and Mps1 checkpoint proteins and the Potential orthologs differing greatly in size from the fungal Ipl1-AuroraB kinase, as well as many structural components. proteins and candidates with well-established non-kineto- In addition to the 41 proteins mentioned above, conservation chore functions were eliminated from further consideration. was observed for proteins such as Skp1 [31], Cbf1 [30,36] and The remaining proteins were then aligned to confirm the some MAPs [37] that function at kinetochores as well as at presence of conserved blocks. This search led to the identifi- reviews other locations in the cell. As noted above, these proteins are cation, in a wide variety of organisms, of previously unre- likely to have been conserved for reasons other than their ported orthologs of many S. cerevisiae kinetochore proteins presence at kinetochores, and they cannot be used to infer (Additional data file 1), among which were four new human overall similarity in kinetochore structure. In this respect, kinetochore proteins (Figure 4). Recent analysis of S. pombe kinesin motor proteins are also difficult to analyze. Eukaryo- kinetochore complexes by mass spectrometry revealed the tic cells contain multiple kinesins, which are known to fall presence of a set of proteins for which orthologs could not be into 14 highly conserved protein families based on sequence, found in S. cerevisiae [39,40]. When conserved sequence structure and function [38]. Typically, each kinesin has more blocks from these S. pombe proteins were used to search the reports than one cellular function and kinetochores in different genomes of higher eukaryotes, two additional human pro- organisms recruit different kinesin family members, making teins were flagged as likely kinetochore subunits (Figure 4). it difficult to determine (in the absence of experimentation) Regardless of which fungi contributed to the sequence blocks, which kinesins should be considered kinetochore associated. the most highly conserved kinetochore subunits were invari- ably regulatory proteins such as the Mad and Bub checkpoint Leaving these complications aside, among 55 fungal kineto- proteins and the Aurora B kinase. Structural proteins such as chore components analyzed, 11 were found in the 7 organisms Ndc80Hec1, Nuf2, CENP-CMif2 and Mis12Mtw1 were considera- research deposited with point CENs and nowhere else, implying that they are bly more diverged. specific to a CDEI-II-III CEN architecture (Figure 3). These 11 proteins include the CBF3 subunits scCtf13, scCep3 and The four human proteins representing hitherto unrecognized scNdc10 described above, the non-essential CNN1 prod- orthologs of S. cerevisiae kinetochore subunits were provi- uct, 1 subunit of the SPC105 complex (Ydr532c), two subunits sionally named hsNnf1-Related (hsNnf1R; also known as of the COMA linker complex (scAme1 and scOkp1) and 4 pro- PMF1 [41]; Figures 4 and 5), hsNsl1R (also known as DC8 or teins that require COMA for CEN-association (scMcm22, DC31), hsMcm21R and hsChl4-R. hsNnf1R shares with its scMcm16, scNkp1 and scNkp2). Among organisms in which fungal counterpart 2 conserved blocks of 30 to 35 residues refereed research they are found, the 11 point CEN-specific proteins are as well with 47% and 67% similarity, hsNsl1R shares 1 conserved or better conserved than ubiquitous kinetochore proteins, block of 35 residues with 43% similarity, hsMcm21R shares 3 implying that failure to identify orthologs in more distant conserved blocks of 15 to 30 residues with 46%, 87% and 33% fungi is a consequence of their actual absence. We therefore similarity and hsChl4R shares 2 conserved blocks of 20 and propose that approximately 20% of the overall kinetochore in 50 amino acids with 45% and 40% similarity (Figure 5). The fungi containing CDEI-II-III CENs is specialized to their sim- potential human orthologs of S. pombe Fta1 and Sim4 were ple CENs. As expected, these specialized kinetochore subunits provisionally named hsFta1R and hsSim4R (also known as include proteins in direct contact with CEN DNA (Figure 3). Solt [42]). hsFta1R shares with its fungal counterpart three conserved sequence blocks of 40, 25 and 30 residues with interactions Identification of novel human kinetochore proteins 48%, 49% and 58% similarity and hsSim4R one block of 27 Based on success in identifying fungal orthologs of S. cerevi- residues with 65% similarity (Figure 6). Elsewhere we will siae kinetochore proteins, we expanded our set of target describe experimental data showing that hsChl4R, hsNsl1R,

FungalFigure kinetochores 3 (see following contain page) a set of point specific components Fungal kinetochores contain a set of point centromere specific components. Schematic model of kinetochore subunitorganization based on the architecture of the S. cerevisiae kinetochore. Kinetochore proteins can be roughly divided into DNA-binding (pink), linker (blue), MT-binding (green) and information regulatory layers (yellow). Within each layer many proteins are organized into multi-protein complexes, for example, the linker layer is composed of at least four complexes (gray boxes (a) to (d)): COMA, NDC80, MIND and SPC105. Protein names are given for S. cervisiae first and S. pombe second, while essential genes (italic letters) and non-essential (normal letters) is indicated. Protein names followed by an asterisk indicate that this specific ortholog is known not to localize to kinetochores. The kinesins present at kinetochores in S. cerevisiae are Kip3 (Kinesin-8), Cin8 (Kinesin-5), Kip1 (Kinesin-5) and Kar3 (Kinesin-14), while in S. pombe they are Klp5 (Kinesin-8), Klp6 (Kinesin-8) and Klp2 (Kinesin-14) (for nomenclature see [38].

Genome Biology 2006, 7:R23 R23.8 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. http://genomebiology.com/2006/7/3/r23

Stu2/ DASH com. Dam1/Dam1 SLI15 com. Alp14,Dis1 Duo1/Duo1 Ipl1/ Sli15/ Spc19/Spc19 Kinesin Spc34/Spc34 Ark1 Pic1 Dad1/Dad1 Bim1/ Dad2/Dad2 Mal3 Bir1/ Dad3/Dad3 Cut17 Dad4/Dad4 Ask1/Ask1 Kinesin Hsk3/Hsk3 Bik1/ Tip1* Mps1/ Mph1

Mcm16 Iml3/ Mad3 Mad2 Mis17 Nkp1 Cnn1 Bub3 Ctf3/ Mad1 Mis6 Bub1 Chl4/ Nkp2 Mis15 Slk19/Alp7 Mcm22

(a) COMA com. (b) MIND com. (c) NDC80 com. (d) SPC105 com.

Nsl1/ Ame1 Nnf1 Ydr532 Mis14 Nuf2 Ctf19 Mtw1/ Spc105/Spc7 Mcm21 Okp1 Mis12 Ndc80 Spc25 /Mal2 Dsn1/ Mis13 Spc24

Mif2 Sgt1/ CBF3 com. Git7 Skp1

Ctf13 Ndc10 Ndc10 Ndc10 Ndc10 Cbf1 Cep3 Cse4/Cnp1 Point or regional CEN

DNA-binding components Present in point CEN only

Linker components Present in point and regional fungal CENs

MT-binding components Multi-protein complexes

Regulatory components or Proteins that function elsewhere in the cell

Figure 3 (see legend on previous page)

Genome Biology 2006, 7:R23 http://genomebiology.com/2006/7/3/r23 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. R23.9

hsMcm21R, hsNnf1R, hsFta1R and hSim4R localize to kineto- chores in human cells and are required for accurate chromo- Fungal linker kinetochore proteins some segregation (AD McAinsh et al., submitted). PSI-Blast search comment Importantly, for the purposes of the current analysis, the in 14 fungal proteomes identification of new human kinetochore proteins means that Fungal linker kinetochore protein family one or more subunits are present in metazoans for each of the Clustal-W four multi-protein linker complexes forming the core of the S. and T-Coffee Multiple sequence alignment cerevisiae kinetochore. Thus, it appears that simple point of fungal proteins

CENs in budding yeast and complex regional CENs in human Identification of conserved cells probably share fundamental architectural similarities. domain

Conserved protein domain or reviews S. cerevisiae DASH is a 10-protein MT-binding complex that amino acid motif has attracted considerable recent interest because it forms PSI-Blast search in NR database using conserved domain or Scanprosite search using amino acid motif rings encircling MTs [43,44]. DASH subunits are conserved among fungi but we have found few if any potential orthologs Similar mammalian protein in higher eukaryotes. The closest match to a DASH protein in Is the protein Exclusion already characterized? yes humans, NYD-SP28 [45], has an amino-terminal domain of no about 30 amino acids 40% similar to S. cerevisiae Spc34 Is the protein (Additional data file 2). The Chlamydomonas rheinhardtii Exclusion similar in size? no yes reports ortholog of NYD-SP28 localizes to the flagellum [46], imply- ing that NYD-SP28 might be involved in interactions with Potential mammalian ortholog

MTs. Our preliminary conclusion is that higher eukaryotes do PSI-Blast search based on potential mammalian not contain a protein complex closely related to fungal DASH, ortholog in plants and metazoan NR and EST database although further investigation of NYD-SP28 is warranted. Metazoan/plant orthologs

Clustal-W research deposited Correspondence between human kinetochore proteins and T-Coffee and their yeast counterparts Multiple sequence alignment Several kinetochore proteins first identified in human cells of metazoan/plant proteins have previously been shown to have fungal orthologs, includ- CENP-A Clustal-W ing CENP-C (orthologous to scMif2p [47]) and CenH3 and T-Coffee (orthologous to scCse4 [48]). We therefore wondered whether additional orthologs might be found in fungi for Combined multiple sequence alignment kinetochore proteins hitherto characterized only in higher of fungi/metazoan/plant proteins eukaryotes, such as CENP-E, CENP-H, Rod, Zwint and Are the refereed research Exclusion blocks conserved? no Zwilch [49-53]. We found that, among fungal proteins, yes hsCENP-H is most similar to S. pombe spFta3 (Figure 7a), Is the aproximate position of the which was shown recently to be a fission yeast kinetochore Exclusion homology blocks conserved? no protein [39]. It has been suggested previously that S. cerevi- yes siae scNnf1 is the budding yeast CENP-H ortholog [54] (Fig- Identification of novel orthologs ure 7b) but we find that scNnf1 is actually much more similar Pmf1 e.g. New human kinetochore proteins: to hsNnf1R and spNnf1 than to CENP-H (Figure 7c). We Nnf1R (Pmf1), Nsl1R (DC31), Chl4R, therefore propose that CENP-H is orthologous to the fungal Mcm21R, Fta1R and Sim4R (Solt) Fta3 family of proteins. Searches using PSI-BLAST revealed interactions that the Fta3 protein, like the Sim4 and Fta1 proteins with Schematicfungal,scNsl1,Figure metazoan, scChl4,4 describing scMcm21, and the plant sequence-search spSim4 orthologs and ofspFta1 the based kinetochore approach proteins used to scNnf1,identify which it interacts in S. pombe [39], has apparent orthologs Schematic describing the sequence-search based approach used to identify only in organisms with regional CENs (Additional data file 1). fungal, metazoan, and plant orthologs of the kinetochore proteins scNnf1, The presence of Sim4 and Fta1 in the budding yeast Yarrowia scNsl1, scChl4, scMcm21, spSim4 and spFta1. Since such sequence-based searches can yield a significant number of false positives, strict exclusion lipolytica, which has regional CENs, but not in yeasts with criteria were applied to ensure the identification of orthologs. point CENs, is striking, since Y. lipolytica is significantly closer in overall sequence to S. cerevisiae than to S. pombe. We therefore conclude that Fta3, Sim4 and Fta1 are members In contrast to CenH3CENP-A, CENP-C and CENP-H, potential information of a class of kinetochore proteins found specifically in fungi orthologs of the human CENP-E, Rod, Zwint and Zwilch pro- and metazoa with regional CENs and not in fungi with point teins were not found in any of the fungi examined. The appar- CENs. ent absence of a fungal Rod or Zwilch is particularly interesting, since their binding partner at human kineto- chores, Zw10, has a potential ortholog in S. cerevisiae, Dsl1

Genome Biology 2006, 7:R23 R23.10 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. http://genomebiology.com/2006/7/3/r23

(a) 50aa 0% 47% 18%67% 17% scNnf1

hsNnf1R (PMF1)

Block 1 Block 2 ncNnf1 RART LQQLFSTTLRHTLDKIS-RDNFAACY NKEFNSILHTRQVVPKLNELETLVGEANKRK anNnf1 RVTRLQEIYAQALARTLRANS-YSNFAACF KAEFEEILAERNAIAQLNELDRLVGEARARR Fungi spNnf1 RKEQLDAFLSRTLSETIAHIP-LEKFAQCF KQEYANLIKERDLNKKLDMLDECIHDAEFRK caNnf1 RFERLRLVARKALEQLIKKSLTMEQVKTCF LDEFDLIYKEKDIESKLDELDDIIQNAQRTK scNnf1 RYIRLKQVFNRALDQSISKLQSWDKVSSCF QREFKEIMEERNVEQKLNELDELILEAKERY hsPmf1 RVKLLDTMVDTFLQKLVAAGS-YQRFTDCY REEISDIKEEGNLEAVLNALDKIVEEGKVRK Metazoa mmNnf1R RVKHLDAIVDTFLQKLVADRS-YERFTTCY REEISEIKEEGNLEAVLNSLDKIIEEGRERG trNnf1R RFKLFEKVMQKSLEKFIELAS-FHRFSSVF QDDICKLVEEGLLEAKLNELDKLERAAKDRP xlNnf1R RMLIFNTLVDKFLDGLVQAGS-YQRFARCY RDEIQEIRDEGNLEALLDSLDKMEKEAGDRP Plantae atNnf1R RRRTHTHLKKSFKSTLRHLLTACS-KQDFVDIF EEEFDEQCHE`TQVGPILDTVEELVLLEEQSLDP

(b) 50aa 10% 43% 6% scNsl1

hsNsl1R (DC31)

Block 1 anNsl1 EPFDGKLA-ARVA-SL YAQLESLTTTVAQLRRDAP ncNsl1 EPFDPRKR--SRLE-TL AREEEDLLRSIALLKRRVP Fungi spMis14 EPFDLALR-TRVQ-Q FNEVEDAHV LVARYRKSVP caNsl1 EKIDSEIT-IQLR-KI FQEFEQETV DVTKLRRDLP scNsl1 EPFDLDLN-EQVR-KYM QEWEDETV KVAQLRQTGP hsDC31 EASDNCFMDSDIK-VL EDQFDEIIVDIATKRKQYP mmNsl1R EAKEHGLMDSDIK-VL EDEFDELIV DVATKRRQYP Metazoa xtNsl1R HAPETQNE-PDIK-L EDKLDDAIV DTALQRNRYP drNsl1R ETSEEYCE--DYESTV NNILDEKIV ETASKRSSYP ggNsl1R ETSADSEPNSANIKILEDQLDELIVETATKRKQWP

(c) 50aa 4% 46% 87% 33% 3% scMcm21

hsMcm21R

Block 1 Block 2 Block 3 ncMcm21 GLRIEVFAR--GRFMRPYYVLM VHRHTVPPCIPISGL FARSL RRELVRYHHR nMcm21 GVRIDICVRN-GRFTKPYYILL VHRHTIPAFIPVERL FVREV RRQLVAWHMR Fungi caMcm21 GLRFDLYSNFTKCFQQPHYCIL VYKHTLPSYIPVDEY FAESI QLTLTKTQYK scMcm21 GIRLEVFSERTSQFEKPHYVLL LFKHTIPSFIDVQGI FAKRV FLQLVEVQKR spMal2 GMDEELEGG---NRFDVPYYIIF LYKYTIPSFLNIQEW FLWKV DKLLTAYICR hsMcm21R GVCVCISTAFEGNLLDSYFVDL IHHHSVPVFIPLEEI FLFSL CEYLNAYSGR mmMcm21R GVCMCISTAFEGNLLDSYFVDL IHHHSVPVFIPLEKI FLFSL WAYLNAYAGR Metazoa xtMcm21R GVCVCISSAFEGAYLDSFHLDI ISRHSVPPFIPLEQI FLSVL FEHLNAYAGR tnMcm21R GVCISLATAYNDVFMETYNLEL IGRHDIPPFIPLKRL FLDAL SQHLNAYVGR Plantae atMcm21R GIQFETSTA--GETYEVYHCVL VLEHTIPFFLPLSDL FIDNV GDLLQAYVDR

(d) 50aa 4% 45% 4%40% 10% scChl4

hsChl4R (BM039)

Block 1 Block 2 anChl4 LVKQLGKLPRQSLLDLVFQW LQ--FRKGGKRE-VI-DRILDGDWRHGITL RQIAMID LRYLDDHPASLR-WTALELTR ncChl4 VFKILNRLSRASLLTLALDW LQ--SRKGSKRE-VI-DRIMEGDWRHGLTL YQLAMAD IQYLYDHPTSQK-WAAYRIMP Fungi spMis15 IQKLLNRFPRDFLVKLCVEW FYKNVPKSMLKRSII-HRMLVYDWPNGFYL GQIAQLEILALAHGFVSMR-WTASKVHH caChl4 LYNILDRLSKNSILQFIILW FRKLINRTPKRK-LI-DKIIFEYWTQGLNL LQISQID CQLIVDKSNSAQSWIYSTVKD scChl4 VFKQLMKLPVTVLYDLTLSW LDLLIEKGVRRNVIV-NRILYVYWPDGLNVFQLAEID CHLMISKPEKFK-WLPSKALR drChl4R LNRVIRRIPNKNIKNLLSKW LQALDYTKPKRM-IV-EHIIDCCESSSLNL KHITNLEMIYHLDNPDQGT-WYACQLTD hsChl4R IKRTILKIPMNELTTILKAW LQTVNFRQR-KESVV-QHLIHLCEEKRASISDAALLD IIYMQFHQHQ-KVWDVFQMSK Metazoa mmChl4R LRRTILKIPLSEMKSILEAW LQTINLKQR-KD-YLAQEVILLCEDKRASL DDVVLLD IVYTQFHRHQ-KLWNVFQMSK xlChl4R IKRTILKLPFSETATILKTW LQTFTLRYP-KE-VTATEVVRFCEARNATL DHAAALD LVFNHAYSNK-KTWTVYQMSK ggChl4R IRRTV LKI PRDEIMAVLQTW LQTINFRQT KEG ISHSVAQLCEESSADL KQAALLDIIYNHIYPNK- RTWSVYHMNK

Figure 5 (see legend on next page)

Genome Biology 2006, 7:R23 http://genomebiology.com/2006/7/3/r23 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. R23.11

IdentificationFigure 5 (see of previous potential page) orthologs of scNnf1, scNsl1, scMcm21 and scChl4 in humans Identification of potential orthologs of scNnf1, scNsl1, scMcm21 and scChl4 in humans. S. cerevisiae (a) Nnf1, (b) Nsl1, (c) Mcm21 and (d) Chl4 were aligned with five fungal, four metazoan and one plant sequence. White letters on black denote identical residues, white letters on green, identical residues comment in ≤ 80% of the organisms and black letters on green, similar residues in ≤ 80% of the organisms. Schematic drawings above the alignments indicate the length of the S. cerevisiae proteins and the percentages denote the degree of similarity of successive sequence blocks (black boxes).

[55]. Both hsZw10 and scDsl1 play a role in membrane traf- in E. cuniculi lack redundant and non-essential genes [63]. ficking during interphase [56], but scDsl1 is not known to Using our HMM for CDEI-II-III, no sequences similar to localize to kinetochores. Thus, whereas human hsZw10 func- point CENs were found on any of the 11 E. cuniculi chromo- tions in vesicle-MT and chromosome-MT interaction, scDsl1 somes, nor were CBF3 proteins found by PSI-BLAST (Figure reviews appears to have only the former function, presumably 10a). We therefore speculate that E. cuniculi contains a because Rod and Zwilch are not present. The absence of Zw10 regional CEN of some sort. Orthologs of CenH3 and CENP- from fungal kinetochores is also sufficient to explain the CMif2 are present in E. cuniculi, as are all four components of absence of Dynein: the Rod/Zw10/Zwilch (RZZ) complex is the NDC80 linker complex, three components of MIND and needed for the association of Dynein with human and Dro- SPC105 (Figure 10b, Additional data file 3). No subunits of sophila kinetochores [57]. Considering these data together, COMA, the fourth S. cerevisiae linker, were found. Among we conclude that animal cell kinetochores contain proteins, regulatory proteins, E. cuniculi Ipl1/Aurora B and SurvivinBir1 currently comprising perhaps 25% of the total (and likely to orthologs were present as were Mps1 and Bub3, but not other reports increase), that are absent in fungi with either regional or proteins required for the spindle assembly checkpoint in point CENs. yeast or human cells (Figure 10b). When Cdc20, an essential activator of the anaphase promoting complex (APC/C) was Evolutionary relationships among kinetochores examined for sequence motifs, further evidence was obtained Thus far, we have distinguished only between point and that E. cuniculi lacks a spindle checkpoint. APC/C is an E3 regional CENs but a more nuanced view can be obtained from ligase required for the ubiquitination of proteins whose phylogenetic analysis of kinetochore structural proteins. As a destruction is necessary at the metaphase-anaphase transi- research deposited reference for these comparisons, a tree was constructed by tion [64]. In all eukaryotes examined to date, an activated combining data on three well-conserved eukaryotic proteins: form of the Mad2 checkpoint protein binds to Cdc20 via a α-tubulin, PCNA and SRP54 (Figure 8a; this reference tree short conserved peptide so as to block Cdc20 from activating closely matches reference trees constructed by others APC/C, thereby arresting cells at the metaphase-to-anaphase [58,59]). The reference tree exhibited prototypical clustering transition [65,66] (Figure 10c). E. cuniculi Cdc20 contains of fungi in one branch and metazoa in another so that Dro- the WD-domain implicated in APC/C interaction but lacks sophila and C. elegans were much closer to humans than S. any sequence similar to a Mad2 binding domain (Figure 10c), pombe or S. cerevisiae. However, the phylogenetic trees for implying that it is not subject to checkpoint control. From refereed research Ndc80Hec1 and Nuf2 were remarkably different: overall these data we conclude that E. cuniculi probably contains a sequence divergence was much greater and Drosophila and very simple kinetochore, based on a regional CEN that con- C. elegans Ndc80Hec1 (or Nuf2) proteins were not signifi- tains about one-half the proteins found in S. cerevisiae. In cantly more similar to their human than their fungal counter- contrast, other large multi-protein structures in E. cuniculi parts (Figure 8b, c). Drosophila Ndc80 and Nuf2 were are only slightly less complex than their higher eukaryotic particularly striking in occupying a branch of the phyloge- counterparts. For example, E. cuniculi ribosomes are com- netic tree distant from all other animals. This great diver- posed of 77 subunits as compared to 84 subunits in S. cerevi- gence in Drosophila kinetochore protein sequence is also siae. Symptomatic of the simplicity of the E. cuniculi illustrated by the fact that, apart from regulatory compo- kinetochore is the absence of the vast majority of potential interactions nents, such as the Mad-Bub proteins and a few MAPs, only a MAPs. Nonetheless, it is significant that the E. cuniculi kine- limited number of structural kinetochore proteins have been tochore contains three of the four linker complexes that identified in flies (for example, CENP-C [60], CenH3CID [61], appear to form the core of budding yeast and human the RZZ complex [62], Ndc80Hec1, Nuf2, and Mis12Mtw1 ;Fig- kinetochores. ure 9).

Organization of the simplest kinetochore Discussion Encephalitozoon cuniculi is a microsporidium and intracellu- Extensive genetic and biochemical experimentation has made information lar parasite that has been subjected to considerable evolu- S. cerevisiae kinetochores the best characterized structures tionary pressure to reduce its genome to the smallest possible involved in chromosome-MT attachment [5]. S. cerevisiae size. As a consequence, E. cuniculi and related microsporidia kinetochores contain upwards of 70 protein subunits assem- have the smallest known eukaryotic proteome (1,997 bled into 14 or more multi-protein complexes. In this study potential open reading frames) and many cellular structures we used similarity-based sequence searching to ascertain

Genome Biology 2006, 7:R23 R23.12 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. http://genomebiology.com/2006/7/3/r23

(a) 50aa 49%6%48% 11% 58% spFta1

hsFta1R

Block 1 Block 2 Block 3 anFta1 LLNTSWT SHRLS PLLLHYY SLLTN RTA DTYAAR RD YLTNS LA VAGAPTL LPLFDL LLRLPKPLRESLFSFLSSN TYVSA LRI VLSGLHLLSAYLSK LALD RGGYVR TRIACAGFVVTSE GRLK IV ncFta1 FYNTTFS THRVS PLLLYVG KQPLD RVR QTLSQ R RE ILVGD VV R GVEVGL LPLFDL LLRMPTPLKSVISDFLSTT CRVSS LRL FLTEA GRYVDK HLGLN LFHPGVRVAKIACGGFVMS-E GRLK VF Fungi mgFta1 LYNTTFHTYRVS PLLLHIG NEPLT TAR SRLSH G RD ALVGD VV R GVQVR L LPLFDL LIRMPPALRLAVIKFLESE CRISP LAL FLTAA KSYLQA HIALDMDHPAVL LTKVACGGFVIS-E GRLK LF fgFta1 FFNTTFS THRVS PLLLHVG EKRLT GQR EVIASR RD TLVGD VV R GIQLR L LPLFDL LMRMPQQLKSVVADWLAVA CRVSRVAL FLTDA ANYLHR HLALK LFHPSVRVTQISCSGFVLS-QSRVK IL SpFta1 FYNVTYT AYRLS PLLFGF EYSNL TEIG KKLTRF RY GTDRT GYFTN STRF FSLFDALVRMDGALWMAVEQFLQQE TQILPCLI FLYPI MEHIKRCTSLD LTNSVFS L SKVNTDCAILTSS GKLK IF hsFta1R LLHKQWT LYSLT PLLLYKF SYSN- --- KEYSR L NA FIVAE KQ K GLAVE V LPLFDF LANGAESNTAIIGTWFQKT CYFSP LAI FLMDC YSHFHR HFKIH LS--ATR LVRVSTSVASAHTD GKIK IL Metazoa mmFta1R LLHKQWT IYSLT PLLLYKF SYSN- --- KDYSR L SA FIVAE KQ K GVAVE V LPLFDF LANGAESNTSLIRDWFQKT CCFSP LAI FLMNC YSHFHR HFKIH LA--ATR LVRVSTSVASAHTD GKIK IL xlFta1R LLSKQWT LYSVTPL MH KF SYTN- --- LGKEYAR L SA HICAE KQ K LAVE V LPLFDF LVNGAETLTAMVGTWFQKA CSFSS LPI FLFQC YTHFFR HFKIH LS--ATH LVKVSTSVASAHCD GKVK IL ggFta1R LLRKQWT LYSVS PLLLYKF SSAD- --- KDYAR M GV FIAAE KQ K GLAVD V LPLFDF LVNGAESYTAIVGSWFQKT CCFRR LAI FLMEC YSHFHR HFKIH LS--ATK LVKVSTGIASAHCD GIIFLK drFta1R LVKNEWKLSYVT PLLLHHF RHTQ- --- KSYAK H SA FIVAE KQ Q GVAVE V LPLFDFCTSGPESLTNLVKSWFERA CNFGS LAL FHLLGMSGMETHFFR FKIH S--AGT LKVSTALGSAHHD KIK IG

(b) 50aa 22% 65% spSim4

hsSim4 (Solt)

Block 1 anSim4 RFLPDLF VRAHAKV QF RARKL RID GR fgSimn4 RFLPDLF VRAHSKI QF RAMRL RVD GR Fungi mgSim4 RFLPDLF IRAHSRV QF KATRL RID GK ncSim4 RFLPDLF VRAHSKV VF RATRL RVD GR spSim4 SFLPDLF VLASLCTVDQPSKVK IP SD hsSim4R ELLPDLF LRAHNGI LR EPTRI REA HQ Metazoa mmSim4R ELLPDLF LRAHYGI LR EPSQI REA HQ xlSim4R EMLPDLFLRAHYGI LR EPKRI R EA HQ

IdentificationFigure 6 of potential orthologs of spFta1 and spSim4 in humans Identification of potential orthologs of spFta1 and spSim4 in humans. S. pombe (a) Fta1 and (b) Sim4 were aligned with five fungal, and three to five metazoan sequences. White letters on black denote identical residues, white letters on green, identical residues in ≥ 80% of the organisms and black letters on green, similar residues in ≥ 80% of the organisms. Schematic drawings above the alignments indicate the length of the S. cerevisiae proteins and the percentages denote the degree of similarity of successive sequence blocks (black boxes).

which S. cerevisiae kinetochore proteins have orthologs in 15 result with many potential causes. However, in cases in which fungi, 11 metazoa and 2 plants (Additional data file 1) with the a kinetochore protein is conserved among organisms A, B and overall aim of determining which structural features of S. cer- C whereas a second kinetochore protein is well-conserved evisiae kinetochores have been conserved throughout evolu- only in species A and B and undetectable in C (and multiple tion. The analysis is not as straightforward as might be related species), a tentative conclusion can be drawn that the assumed, because kinetochore proteins are among the most second protein is actually absent from C. For example, we find rapidly evolving proteins in the genome [67]. In addition, the that CBF3, an essential CEN-binding protein in S. cerevisiae, structure and sequence of CEN DNA has diverged widely has orthologs in seven budding yeasts containing CEN DNA from organism to organism. Whereas fungi closely related to conforming to a CDEI-CDEII-CDEIII organization but not in S. cerevisiae contain 125 to 225 bp CENs with a CDEI-CDEII- organisms with regional CENs. In contrast, other kinetochore CDEIII structure, most other organisms contain much longer proteins similar in their degree of sequence conservation to regional CENs with few if any conserved sequence elements. CBF3 subunits among point CEN-containing yeast (approxi- mately 45% to 50% similarity) are found throughout fungi. Guided by experimental data on established orthologies in Thus, we provisionally conclude that CBF3 is present in only yeast, humans and other organisms, we base most of the con- fungi with CDEI-CDEII-CDEIII centromeres. Despite the clusions in this paper on the characterization of proteins that potential for occasional error, our use of both positive and share blocks of homologous sequence. In several cases, we negative findings makes it possible to draw broad conclusions also draw inferences from a failure to identify homologous about the organization and possible origins of simple and proteins. We recognize that this failure represents a negative

Genome Biology 2006, 7:R23 http://genomebiology.com/2006/7/3/r23 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. R23.13

(a) comment 50aa

10%73% 22% 50% caFta3

hsCENP-H

Block 1 Block 2

ncFta3 VRDHILDLELERLRV RKLGKTKGEL RVSRQKWKTIKGATSALVAGTGVDWARDERLRDLVLDPPG reviews anFta3 TLKKLSSLEVENLQL QQLEQLEAEHKQSKAKWETMKSVASAII VGSGVDWAEDEDLTALVLDDLN Fungi spFta3 ITFTSEEIEKEKENI SSIEGVLADL KLARARWEIVRNVTQILL LESGISFLENKKLAYIMDLCGD ylFta3 LRKQLDVLICENRAS QAISELEAKL HNTKNKRRVLAEIFTALI TASGIDWCEDEDLVQLIVGAGE caFta3 NSKNQNDLQSGNYNL GAKRKQYYEL VRKWRELENMVTEIPVLI SDLPINWYDDPTLLQIMQEVDE hsCENP-H LEEKLLDIRKKRLQL ERIKIIRQNL QMEIKITTVIQHVFQNLI LGSKVNWAEDPALKEIVLQLEK mmCENP-H LEKKLLDVRKKRLQL EMIKTMKKKL QTEIKITTVVQHTFQGLI LASKTNWAEDPALRETVLQLEK Metazoa drCENP-H LQDQITDVQKQRLDL KTKERAEAVL QKYQRITTILQNVLRGII LASKVNWREDPKLSDIVMKLEH xlCENP-H LEDRLYDIRKKRLVL EKIKRLNKIL KDEIDSTTVLQNVFQNII LASHVDWAKDPQLTDIGLKLET ggCENP-H IRENLNDIRRKRYFL EKLKFIHRNVQYERKVTTLVQNILQNIIVGCQINWAKDPSLRAIILQLEK

(b) reports

50aa 0%23% 10%34% 20% caNnf1

hsCENP-H research deposited

Block 1 Block 2 ncNnf1 RATRLQQLFSTTLRHTLDKI-SRDNFAACY KEFNSILHTRQVVPKLNELETLVGEANKR anNnf1 RVTRLQEIYAQALARTLRAN-SYSNFAACF AEFEEILAERNAIAQLNELDRLVGEARAR Fungi ylNnf1 RYRRLVTSSSTALAHAVKSL-TYKRVAACY TEFDAIFDERHLPEKLAQLDQLVKDARLR dhNnf1 RYERLKLVCKKALEQSIKKSLSIDQIKSCY KEFDLIFKERNIETKLNELDEIIQKAQER caNnf1 RFERLRLVARKALEQLIKKSLTMEQVKTCF DEFDLIYKEKDIESKLDELDDIIQNAQRT hsCENP-H QLLEYKSMVDASEEKTPEQIMQEKQIEAKI RQSSVLMDNMKHLLELNKLIMKSQQESWD mmCENP-H KQLVMEFDTACPPDEGCNSGAEVNFIESAK GCARLIVETMRDIIKLNWEIIQAHQQARV Metazoa drCENP-H QLLEYKSMIDTNEEKTPEQIMQEKQIEVKI PENCVLTDDMKHILKLQKLIMKSQEESSE NESRLIQEKFTHILTLSSAVINSQQETRE xlCENP-H QSLEVQTVAEASEEVSSEALSSEKLAEIAK refereed research ggCENP-H KEIMENQCFEMNVXVSMGKHRESCEAEADL ADAEELKELTNQNSEICEKTIQILKETRE

(c)

50aa Similarity to hsCENP-H: 0% 23% 10%34% 20%

caNnf1 interactions

hsCENP-H

caFta3

Similarity to hsCENP-H: 10%73% 22% 50% information

TheFigure human 7 kinetochore protein CENP-H is more closely related to a novel family of fungal proteins than the Nnf1 family The human kinetochore protein CENP-H is more closely related to a novel family of fungal proteins than the Nnf1 family. Multiple sequence alignments of metazoan CENP-H proteins and either (a) fungal Fta3 family proteins or (b) fungal Nnf1 family of proteins. Sequences were annotated as in Figure 5. (c) Comparison of sequence similarity between human conserved domains of CENP-H and C. albicans Nnf1 and C. albicans Fta3.

Genome Biology 2006, 7:R23 R23.14 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. http://genomebiology.com/2006/7/3/r23

(a) Reference proteins(b) Ndc80 family

Hs Ca MmGg Yl Kl Rn Dr Eg Dm xxl Dh Sc Cg Ce Fg 76 100 100 Nc An Cb Mg 100 51 100 79 Mg 100 57 100 Nc Fg 67 100 66 100 An 85 Os 100 Sp 100 74 Um At 95 57 Sp 62 Ec 100 100 Cn Cp Yl

100 100 Dr 100 Cn 96 Xl Gg 66 99 100 Mm HsRn 100 100 Um 100 Kl Cp Eg 100 Dh Ca Cg Cb Os At Ce

Ec 0.1 0.1 Dm

(c) Nuf2 family

Xl Rn GgHs Mm 94 Dr 97 Cp 57 At 92 Yl Os 99

Um 96 Chordata

100 Cb Ce Arthropoda Ec 70 68 Cn 94 Nematoda Embryophyta 100 Dm 100 Ascomycota 73 94 81 An 84 Fg Nc Sc Dh Mg Microspora Kl Eg Ca Cg Sp Apicomplexa

0.1

PhylogeneticFigure 8 analysis of kinetochore protein conserved domains Phylogenetic analysis of kinetochore protein conserved domains. Radial phylogenetic trees were assembled for (a) reference proteins (α-tubulin, the signal recognition protein SRP54 and the DNA replication factor PCNA), (b) the Ndc80 family and (c) the Nuf2 family. For bootstrap analysis, sample size equals 100. Nodes with support less than 50% were collapsed. The accession number for each protein is described in Additional data file 1.

Genome Biology 2006, 7:R23 http://genomebiology.com/2006/7/3/r23 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. R23.15

(a) comment 14%54% 25% scNuf2

anNuf2 FTFTDLTKPT--HDRLVKIFSYLINFVRFRE ncNuf2 FSFNDLYKPT--HDRLVRMLSYVINFVRFRE Fungi spNuf2 FTIQDLLKPDR--NRLQLILSAVINFAKLRE scNuf2 FNMTDLYKPEA--QRTQRLLSAVVNYARFRE Previously caNuf2 FVLTDIARPD--GYRIRRILSAVINFIRFRE annotated hsNuf2 FETADILCPK--AKRTSRFLSGIINFIHFRE proteins mmNuf2 FEIVDILNPR--TNRTSRFLSGIINFIHFRE reviews Metazoa xlNuf2 FHPSDVLNPK--GKRTLHSLSGIVNFLHFSA drNuf2 FSLSDLLNPK--TKRTITILSAIQNFLHFRK ceNuf2 LTMCDLVTPAKHEHRFRKLTSFLVDFLKLHE ISFKDLLRPE--SSRTEFFISALLNYGLYKD Plantae atNuf2 Newly osNuf2 FTLRDLLRPDP--RRLVQVLSALINFLYYRD annotated dmNuf2 YTYMDIIKPA--VKKTLATLSYLFNHLAYYK protein reports (b) 3% 40% 10% 26% scNdc80

anNdc80 QELLEYLTHNNFELEMKHSLGQNTLRSP------TQKDFNYIFQWLYHRIDPGYRFQKA--MDAEVPPILKQLRYPYEKGITK-SQIAAVGGQNWPTFLGMLHWLME ncNdc80 QELLEYLAKNNFEMEMNHKLSDNFTKSP------TQKDFNYLFQWLYHRIDPSYRFQKN--IDQEVPPLLKQLRYPYEKSITK-SQIAAVGGQNWSTFLGLLHWMMQ Fungi spNdc80 TQVVNYLLES----GFSQPLGLNNRFMP------STREFAAIFKHLYNKLDPNFRFGAR--YEEDVTTCLKALNYPFLDSISRSRLVAIGSPHVWPAILGMLHWVVS

caNdc80 KEIIRYLIDYKFEIKTNIALTENILKSP------TQKNFNAIFKFLYNQLDPNYMFIKSS-IEQEIVTLLKLLNYPYMHTITR-SHFSAVGGNNWPTFLGILYWLVE research deposited Previously scNdc80 EEIYDYLKKNKFDIETNHPISIKFLKQP------TQKGFIIIFKWLYLRLDPGYGFTKS--IENEIYQILKNLRYPFLESINK-SQISAVGGSNWHKFLGMLHWMVR annotated hsNdc80 KQLYEFLVDR----GFPGSITVKALQSP------STKEFLKIYEFIYNFLEPSFQMPTAK-VEEEIPRMLKDLGYPFALSKS--SMYSIGAPHTWPLALGALIWLMD proteins mmNdc80 RQLCEFLTEN----GYAHNVSMKSLQAP------SVKDFLKIFTFLYGFLCPSYELPDTK-FEEEVPRIFKDLGYPFALSKS--SMYTVGAPHTWPHIVAALVWLID Metazoa xlNdc80 RQLYEFLTEN----GYVYSVSMKSLQAP------STKEFLKIFAFLYGFLCPSYELPGTK-CEEEVPRIFKALGYPFTLSKS--SMYTVGAPHTWPHIVAALVWLID drNdc80 RQLCEFLNEN----GYSQALTVKSLQGP------STKDFLKIFAFIYTFICPNYENPESK-FEEEIPRIFKELGYPFALSKS--SMYTVGAPHTWPQIVAALVWLID ceNdc80 SKIYNFLVEY----ESSDAPSEQLIMKPR-----GKNDFIACFELIYQHLSKDYEFPRHERIEEEVSQIFKGLGYPYPLKNS--YYQPMGSSHGYPHLLDALSWLID RFINAFLSTHN------FPISIRG--NPVP----SVKDISETLKFLLSALDYP---CDSIKWDEDLVFFLKSQKCPFKITKS--SLKAPNTPHNWPTVLAVVHWLAE Plantae atNdc80 Newly osNdc80 RVVNAYLAPA------V-SLRPPLP----SAKDIVAAFRHLFECLDFPL----EGAFEDDLLFVLRVLRCPFKLTRS--ALKAPGTPHSWPPLLSVLYWLTL annotated dmNdc80 QQILEYLHGIQ-NSEAPTGLIADLFSRPGGLRHMTIKQFVSILNFMFHHIWRN-RVTVGQNHVEDITSAMQKLQYPYQVNKS--WLVSPTTQHSFGHVIVLLDFLMD protein refereed research

(c) 39% 0%44% 4% scMtw1

mgMis12 EHFGYPPVSLLDDIINSINILAEQALNSVERGL ENGTHQLETLLCASIDRNFDIFEIWVMRNILTVRPD ncMis12 EHFGYPPVSLLDDIINSINILAERALNSVEQGL ENGTHQLETLLCASIDRNFDKFEIYVMRNILCVRPE Fungi spMis12 ELLEFTPLSFIDDVINITNQLLYKGVNGVDKAF EEGLHKFEVLFESVVDRYYDGFEVYTLRNIFSYPPE scMtw1 EHLGYPPISLVDDIINAVNEIMYKCTAAMEKYL KSGVAKLESLLENSVDKNFDKLELYVLRNVLRIPEE

Previously caMis12 EHLEFAPLTLIDDVINAVNEIMYKGTTAIETYL EIGMGKLESLLESTIDKNFDKFELYCLRNIFNIPKD interactions annotated hsMis12 QFFGFTPQTCMLRIYIAFQDYLFEVMQAVEQVI RKCTEKFLCFMKGHFDNLFSKMEQLFLQLILRIPSN proteins mmMis12 QFFGFTPQTCLLRIYVAFQDHLFEVMQAVEQVI RKCTEKFLCFMKGRFDNLFGKMEQLILQSILCIPPN Metazoa xlMis12 QLFEFTPQTCILRVYIAFQDYLFEMVLVVEKVI RQSTEKYLHFMRERFNFLFQKMETFLLNLVLSIPSN drMis12 QFFGFTPETCTLRVRDAFRDSLNHILVAVESVF RESTQKLRQFLQERFEIMFQRMKGMLIDRMLSIPQN ceMis12 SFFGFSAGSFSDSIFNIYVDAWEEVCSNEFKNS DRLCLLSFPFSDKTNEKAFKMMKRFCVTNIFRIPAS Plantae atMis12 QLFNFHSRSVYATLKYIVNERIHCTIKKMCETI KTNQKHLEKAYCKGAIPHLTNIKTI-VKKCIAVPSN Newly dmMis12 KFFNFTAAQLSAEREHIVQDIIKKGIGQIIDKI EAEKETVERRFQASASKGLKALREL-DSKVFHVPPH annotated protein information FigureIdentification 9 and annotation of (a) Nuf2, (b) Ndc80 and (c) Mis12 orthologs in D. melanogaster Identification and annotation of (a) Nuf2, (b) Ndc80 and (c) Mis12 orthologs in D. melanogaster. Schematic drawing above the alignment indicate the length of the S. cerevisiae proteins and the percentages denote the degree of similarity of successive sequence blocks (black boxes). White letters on black denote identical residues, white letters on green, identical residues in ≥ 80% of the organisms and black letters on green, similar residues in ≥ 80% of the organisms. Accession numbers are described in Additional data file 1.

Genome Biology 2006, 7:R23 R23.16 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. http://genomebiology.com/2006/7/3/r23

evolved. Several findings in the current work suggest, how-

(a) (b) SLI15 com. ever, that CDEI-II-III CENs arose in combination with a set Stu2 Ipl1 16 Kinesin of 11 proteins as a specialization of a regional CEN. First, all Bim1 Bir1 annotated organisms containing point CENs (S. cerevisiae, C. 14 Kinesin Bik1 glabrata, K. lactis, and E. gossypii) have a common origin in Mps1/ Mph1 12 one relatively shallow branch of the fungal phylogenetic tree. Bub3 Were CDEI-II-III sequences an ancestral CEN, the current 10 (ii) MIND com. (iii) NDC80 com. (iv) SPC105 com. distribution of regional CENs would require loss of point

8 Nuf2 Nsl1 Nnf1 CENs from multiple independent evolutionary branches. Sec- Mis12 Ndc80 Spc105 6 Spc25 ond, we could obtain no evidence for CDEI-II-III CEN DNA Dsn1 Spc24 or CBF3 proteins in the microsporidium E. cuniculi, which is 4 Number of predicted point CENs Number of chromosomes thought to have arisen through an ancient divergence in the CENP-C Sgt1 Skp1 2 fungal kingdom [68].

0 cuniculi Encephalitozoon cerevisiae Saccharomyces CENP-A regional CEN If the speculation that CDEI-II-III point-CENs evolved from

DNA-binding components MT-binding components regional CENs is correct, we must consider the possible exist- Linker components Regulatory components ence of other short CENs that are also based on sequence-spe- Multi-protein complexes cific DNA binding interactions just not CBF3. By way of precedent, the emergence of CDEI-II-III CENs is coincident (c) with large-scale chromosomal changes that gave rise to the HMR, HML and MAT loci, thereby changing the sexual Mad2 WD-domain scCdc20 potential of S. cerevisiae and related yeasts [29]. S. pombe

mgCdc20 RILEFKPAP and its close relatives undergo mating type switching analo- anCdc20 RILAFKPPP ncCdc20 RILQFKPAP gous to that in S. cerevisiae, but the molecular mechanisms of Fungi spCdc20 RVLAFKLDA caCdc20 RILLYQPLP switching are completely different [69]. Functional analysis scCdc20 RILQYMPEP cnCdc20 RILSFQSAP of fungi with short uncharacterized CENs will be needed to drCdc20 KILHLGGKP hsCdc20 KILRLSGKP test the speculation that just as different forms of mating-type

Metazoa mmCdc20 KILRLSGKP xlCdc20 KILRLGGRP switching have developed based on distinct biochemistry, dmCdc20 RILCYQNKA ceCdc20 RILCYKKNL point CENs with structures other than CDEI-II-III might RILSFRNKP Plantae osCdc20 atCdc20 RILAFRNKP exist. Mad2 BD-domain WD-domain ecCdc20 Evolution of kinetochore proteins Sequence comparison reveals that conservation among

MFTLEHIRPMDINLGDIFTTTGRRQAQKEDRYTKTTIGIHTSVLASIRLMTSSRS orthologous kinetochore proteins is invariably restricted to relatively short sequence blocks embedded in longer regions of low sequence similarity. The restriction of sequence simi- IdentificationFigure 10 of a minimal kinetochore in E. cuniculi Identification of a minimal kinetochore in E. cuniculi. (a) HMM described in larity to small blocks explains the relative difficulty in finding Figure 1a, b failed to find a CDEI-II-III structure in the genome of E. cuniculi. orthologs and the widespread assumption that yeast and The Green bar indicates point CENs identified and black bars the number human kinetochores are very different. Henikoff and of chromosomes. (b) Speculative model of E. cuniculi kinetochore subunit colleagues [67] have studied the evolutionary divergence of organization. Proteins colored in pink, blue, green or yellow represent Mif2 components of the DNA binding, linker, regulatory or microtubule- CenH3 and CENP-C in some detail and propose that kine- binding layers, respectively, based on kinetochore organization in S. tochore proteins are under positive selection in plants and cerevisiae. Potential multi-protein complexes are highlighted with a grey animals as a consequence of meiotic drive by CEN DNA box. (c) Sequence alignment of fungal, metazoan and plantae Cdc20 during female . Rapid evolution in protein sequence is showing the conserved Mad2 binding site. Note: E. cuniculi lacks both the conserved Mad2 binding site and an ortholog of the Mad2 protein. most apparent in worms and flies, and in this study we have Schematic drawings indicate the length of the S. cerevisiae and E. cuniculi added only dmNdc80, dmNuf2 and dmMis12 to the list of proteins, the position of the WD-domain (black box) and the position likely structural Drosophila kinetochore proteins. Why the where Mad2 binds. rate of kinetochore protein evolution is so much greater in flies and worms as compared to mammals, plants and fungi complex kinetochores that would not be possible based on a remains a mystery but it is reminiscent of data on other key more conservative approach. regulators of chromosome segregation. Securin and its pro- tease separase are also highly diverged in D. melanogaster: Origins of point centromeres Drosophila securin, unlike the human and yeast proteins, Based on the simple structure of their CENs, it is widely consists of two separate gene products, called three rows and assumed that S. cerevisiae kinetochores represent an ances- pimples, that interact with an unusually short separase [70]. tral structure from which complex regional kinetochores Moreover, unlike the majority of eukaryotes that utilize an

Genome Biology 2006, 7:R23 http://genomebiology.com/2006/7/3/r23 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. R23.17

RNA-templated reverse transcriptase to replicate telomeres, D. melanogaster uses an alternative mechanism based on (a) Fungal MAPs Metazoan MAPs transposition of the HeT-A and TART retrotransposable ele- comment DASH com. Dam1 CLASP Bim1/ Duo1 /Stu1 EB1 Kinesin Spc19 ments [71]. It seems very likely that several distinct classes of Spc34 COMA/Sim4 ch-Tog1 Dad1 APC Dad2 Dynactin /Stu2 kinetochore arose early in evolution. Perhaps surprisingly, Adapter(s) /Kar9 Lis1 Dad3 /Pac1 Kinesin Dad4 Dynein CLIP-170 Ask1 /Dyn1 Hsk3 fungal kinetochores appear to be as good a model for their CENP-F Sim4 /Bik1

CENP-I Rod Slk19 Fta1 ZW10 human counterparts as kinetochores in organisms such as /Ctf3 Zwilch TD-60 /Dsl1 Zwint-1 CENP-H Iml3 Chl4 worms and flies. /Fta3 Conserved Core Kinetochore RanGAP1 SLI15 com.

(i) COMA com. (ii) MIND com. (iii) NDC80 com. (iv) SPC105 com. Aurora B INCENP /Ipl1 /Sli15 RanBP2 Survivin Borealin Nsl1 Nnf1 Nuf2 For the majority of kinetochore proteins we have little knowl- Bir1 Ctf19 Mis12 Spc105 mDia3 /Mtw1 Ndc80 edge of their biochemical functions or their structure. It is Mcm21 Spc25 Dsn1 BubR1 reviews Plk1 Spc24 /Mad3 /cdc5 tempting to speculate that conserved sequence blocks repre- Bub3 Mad2 sent protein-protein interaction domains or interaction sur- Mps1 Bub1 Mad1 CENP-C faces under tight evolutionary pressure. However, with very /Mif2 Sgt1 Skp1 few exceptions (for example, the kinase domains of check-

Regional CEN point and regulatory proteins and motor domains of kines- Cse4/CENP-A nucleosomes ins), blocks of conserved sequence do not correspond to recognizable functional domains. This stands in contrast to DNA-binding components Present at regional CEN only the situation in nuclear pore complexes, in which highly con- Linker components Present at fungal CEN only MT-binding components Present at all fungal and metazoan CENs reports served and recognizable domains correspond to key func- Regulatory components Present at metazoan CEN only tional units [72]. The most abundant structural elements in Multi-protein complexes kinetochore proteins are coiled coils, which are known to function in protein-protein association [73] and act as springs (b) and levers [74]. Coiled coils in the budding yeast spindle pole H. sapiens body protein scSpc42 also create a crystalline core involved in spindle pole body duplication [75]. Biochemical and electron S. pombe research deposited microscopy experiments have shown that the heptad repeat S. cerevisiae domains in all four subunits of the S. cerevisiae NDC80 E. cuniculi complex associate to form an extended stalk that is linked to two globular heads [32,33]. Whether the stalk is simply a spacer or some sort of mechanical element remains unex- 0 10 20 30 40 50 60 70 80 90 plored. Only detailed structural and biochemical experi- Number of proteins ments, backed up by analysis in vivo, will reveal the logic of Point CEN only (e.g. CBF3) Common regulators (e.g. Mad2) refereed research sequence conservation among kinetochore subunits. Fungi only (e.g. Dam1) Regional CEN only (e.g. Fta3/CENP-H)

Common core (e.g. Ndc80/Hec1) Metazoa only (e.g. Zwilch-Rod)

A conserved molecular core of the kinetochore Proteins shared with other cellular structures (e.g. Skp1, Dynein) A key conclusion in this paper is that four multi-protein link- ers that form the core of the S. cerevisiae kinetochore, MIND, the SPC105 complex, the NDC80 complex and COMA are also EvolutionaryFigure 11 development of kinetochores from yeast to mammals Evolutionary development of kinetochores from yeast to mammals. (a) likely to be present in a wide variety of species (Figure 11). Model of the kinetochore using protein subunit positions derived from the Along with CenH3 and CENP-C, SPC105, MIND and NDC80 organization of the S. cerevisiae kinetochore. Proteins present in all fungal complexes are ubiquitous. In budding yeast, linker complexes and mammalian CENs are outlined in black while proteins present only in are thought to form a bridge between proteins in direct con- fungi and mammals with regional CENs are outlined in red. Red dotted interactions tact with DNA and those that bind MTs [5,14,15], and it will outlines indicate proteins that are only present in fungi. Black dotted outlines indicate that either this protein only exists in metazoans or that be important to show that this is also true in other organisms. only the metazoan ortholog is present at kinetochore. Proteins colored in Prior to the current work, biochemical experiments had led to pink, blue, green or yellow represent components of the DNA binding, the identification of SPC105, MIND and NDC80 complexes in linker, regulatory or microtubule-binding layers, respectively, based on S. pombe, C. elegans and human cells [18,19] but our kinetochore organization in S. cerevisiae. Potential multi-protein complexes are highlighted with a light gray box and the conserved kinetochore core, systematic sequence analysis extends these observations to a or COMA/Sim4 adaptor with a dark gray box. Protein names are given for greater variety of organisms, including E. cuniculi, a micro- H. sapiens first and then S. cerevisiae when different. Italic lettering indicates sporidium with a remarkably small proteome. The presence that the protein has additional functions in the cell. The kinesins present at information of the structural kinetochore proteins listed above appears to kinetochores in H. sapiens are CENP-E (Kinesin-7) and MCAK (Kinesin- 13), and in S. cerevisiae Kip3 (Kinesin-8), Cin8 (Kinesin-5), Kip1 (Kinesin-5) be more fundamental for chromosome segregation than a and Kar3 (Kinesin-14) (for nomenclature see [38]). (b) Quantification of Mad2-dependent spindle assembly checkpoint, which does the number of kinetochore proteins, and their respective evolutionary not seem to exist in E. cuniculi. Thus, ascertaining the precise class, in S. cerevisiae, S. pombe, E. cuniculi and H. sapiens. molecular functions of the MIND complex, NDC80 complex,

Genome Biology 2006, 7:R23 R23.18 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. http://genomebiology.com/2006/7/3/r23

Spc105, CenH3 and CENP-C and of the macromolecular centromeres. Both simple and complex kinetochores contain assemblies in which they participate is a key task in the study conserved SPC105, MIND and NDC80 complexes along with of kinetochore biology. more variable COMA complexes. This core assembly is sup- plemented by adaptor proteins specific to organisms with Diverged kinetochore components point, regional or holocentric CENs. The key to understand- Budding yeast with CDEI-II-III point CENs contain a set of 11 ing kinetochore biology is now to determine how specialized proteins that are not present in fungi such as S. pombe or C. adaptors and conserved core complexes interact with inner albicans (Figure 11). Three of the eleven point-CEN specific centromere components such as CenH3 and CENP-C to proteins are involved in sequence-specific binding to CDEIII assemble structures capable of binding to and regulating while six are part of the COMA complex or of a COMA- microtubules through the recruitment of MAPs and motors. dependent assembly pathway. Only three of the eleven com- ponents of the COMA pathway in S. cerevisiae (Mcm21Mal2, Chl4Mis15 and Ctf3Mis6/CENP-I) are conserved among fungi and Materials and methods mammalian kinetochores (Figure 11). In S. pombe, an alter- Sequence-similarity searches native set of eight proteins, including spSim4 and spFta1-7, Database searches were performed on NCBI non-redundant are bound to the COMA components spMcm21Mal2, and EST databases using PSI-BLAST and BLAST (protein- spChl4Mis15 and spMis6Ctf3. At least three of these proteins protein BLAST (blastp) and genomic BLAST (tblastn)) [76]. (CENP-HFta3, Fta1 and Sim4) are members of a class of pro- Pattern searches were performed using ScanProsite [77]. teins found in fungal and metazoan organisms with regional Multiple sequence alignments were built with ClustalW, CENs whereas the other four proteins have no obvious MUSCLE and T-Coffee and edited by hand [78-80]. Coiled orthologs (Figure 11a). Overall, these data point to COMA and coil predictions were based on the COILS program using a COMA-associated proteins as kinetochore components with a window size of 28 [81]. Human Nnf1R and Fta1R were identi- particularly high degree of sequence divergence through evo- fied in PSI-BLAST searches using the full-length S. pombe lution. It seems reasonable to speculate that COMA helps to Fta1 or S. cerevisiae Nnf1 protein sequences as the queries. To accommodate kinetochore subunits that are highly conserved identify human Mcm21R, the S. cerevisiae Mcm21 protein among regional and point CENs, such as the NDC80 complex, sequence was first used as a query in PSI-BLAST searches to diverged components, such as CBF3. By analogy, it seems that yielded fungal Mcm21 related proteins. These proteins likely that specialized proteins have evolved to meet the were assembled in a multiple sequence alignment from which special structural demands of holocentric CENs; ceKNL-3, a the motif [HYF]- [KRHDENQ]- [VLI]-x- [HYF]- [ST]- [IVL]- kinetochore protein bound to the C. elegans MIND and [P]-x-x- [IL]-x- [ILV] was derived, and then used in a pattern NDC80 complexes [18] but absent from other kinetochores, search to identify metazoan orthologs. To identify orthologs may be an early example of a holocentric adaptor. of S. cerevisiae Nsl1 and Chl4, S. pombe Sim4 or H. sapiens CENP-H conserved blocks were first identified. PSI-BLAST The logic of kinetochore assembly searches were carried out using S. pombe Sim4, S. cerevisiae The MT binding components of kinetochores are unlike kine- Nsl1, S. cerevisiae Chl4 or H. sapiens CENP-H as query tochore structural components in that almost all are involved sequences. This approach identified a set of fungal Sim4, Nsl1 in multiple MT-based processes (Figure 11). In humans for or Chl4 related proteins and a set of metazoan CENP-H example, EB1Bim1 and APCKar9 are found not only at kineto- related proteins. Each set of proteins was then assembled into chores, but also at sites of MT association with the cell cortex; multiple sequence alignments and conserved blocks identi- CLIP-170Bik1 and Dynein play important roles in vesicle traf- fied (amino acids 1 to 143 for U. maydis Nsl1, 1 to 114 for S. ficking and ch-Tog1Stu2 is required for spindle assembly. From cerevisiae Chl4, 341 to 373 for S. pombe Sim4 and 224 to 269 yeast to humans, only one or two of the six to ten kinetochore for H. sapiens CENP-H). The sequences present in these con- MAPs and motors are specific to kinetochores. CENP-A func- served blocks were then used in PSI-BLAST searches to iden- tions in most organisms to determine CEN location without tify new fungal (for CENP-H) or metazoan (for Sim4, Chl4 or recognizing CEN-specific sequences; similarly, the NDC80- Nsl1) proteins. MIND-SPC105-COMA complexes must determine the specialized biochemistry of MT-kinetochore linkages without Phylogenetic analysis resort to many kinetochore-specific MAPs. Phylogenetic alignments were generated with MUSCLE using GBlocks to identify conserved blocks [82]. Conserved blocks were selected only if single positions were conserved in at Conclusion least 50% of the sequences, with higher stringency at flanking We conclude that critical structural features of kinetochores positions (80%). A maximum of eight contiguous non-con- are conserved from yeast to man, despite highly divergent served positions were allowed. The minimum block length CEN sequences. It appears that both short S. cerevisiae point was five amino acids. Positions with gaps were allowed only if centromeres and complex metazoan regional centromeres their number did not exceed 50%. Conserved blocks and the arose from a common ancestor that probably had regional number of positions used for each protein family are

Genome Biology 2006, 7:R23 http://genomebiology.com/2006/7/3/r23 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. R23.19

described in Additional data file 5. To calculate the distances proteins Ndc80, Nuf2R, Mis12/Mtw1, Nnf1, Spc105 and between sequences we took a maximum likelihood approach CENP-C amongst five fungi. Additional data files 4 and 5 list

using TREE-PUZZLE [83] with the 'Pairwise distance calcu- amino acid similarities used in all multiple sequence comment lation only' option, the Jones-Taylor-Thornton substitution alignments and homology blocks used in phylogenetic analy- matrix [84] and gamma-distributed rates (eight categories) to sis, respectively. account for rate heterogeneity (parameters were estimated organismsthefileClickAdditionalAminoHomologyCcentagesblocksdues,andsimilaritytersticalsimilarfungalIdentificationSpc34AccessionMultipledescribedsequenceteinsamongstMif2 organisms.1. on fourresidues and Ndc80,herewhite in orthologs(black blackacidresidues metazoansequencefivedenotehumans. Spc105 alignments ofin numbersforandblocksFile letterssimilarities successiveadditional denote fungiNuf2R,inboxes). fileof Accessionblack45231 ≤ in athe E.andproteins 80%usedpotential S.≤ onalignmentalignments sequences.cuniculi of identicaldegree 80% cerevisiae lettersMis12/Mtw1,aWhite of green, allofset in sequencedatausedthe theproteinsnumbersofphylogenetic amongstof oforthNdc80, thekinetochoreletterson fileidenticalrelated organismsinresidues, of similarityof Percentages green,Spc34 allorganisms.ologS.1. theblocks that multiplecerevisiae areonNnf1, fiveNuf2, metazoanofE. similarresidues blackwas describedwhiteare theanalysisanalysis. fungicuniculi and(blackof proteins.Spc105 Nnf1, alignedusedDASH successive Accession denotesequence denoteblack letters andSpc34residues proteinsinboxes). inMis12kinetochore inand complex≥E. lettersMultiple thiswith theadditional 80%identical on withcuniculi CENP-Calignmentsalignments.numberssequenceMtw1 indegreegreen,Whitestudystudy. five ofona≥, subunit set80%theCENP- fungalgreen, .resi-pro- iden-let-Per- dataofof are of from the dataset). A neighbor-joining tree was constructed from the distance matrix with the NEIGHBOR program from Acknowledgements the PHYLIP package [85]. Reliability of the dataset was We thank Daniel Huson (University of Tübingen) for help with the compu- assessed by bootstrap. We generated 100 permutation data- tational centromere screen and members of the Sorger lab for helpful dis- cussions. ADM was supported by a fellowship from the Jane Coffin Childs sets using the SEQBOOT program from the PHYLIP package. Fund for Medical Research and PM by an EMBO long-term fellowship. This reviews From these 100 datasets we calculated distance matrices and work was supported by NIH grants CA84179 and GM51464. constructed neighbor-joining trees using the parameters described above. TREE-PUZZLE was then used with the 'Consensus of user defined trees' option to generate a consen- References 1. Koshland DE, Mitchison TJ, Kirschner MW: Polewards chromo- sus tree from all neighbor-joining trees (nodes with support some movement driven by microtubule depolymerization in less than 50% were collapsed) [86]. Trees were visualized vitro. Nature 1988, 331:499-504. using the SPLITS TREE tool [87]. Amino acid similarity 2. Cleveland DW, Mao Y, Sullivan KF: Centromeres and kineto- chores: from epigenetics to mitotic checkpoint signaling. Cell percentages used in multiple sequence alignments are given

2003, 112:407-421. reports in Additional data file 4. 3. Choo KH: Domain organization at the centromere and neocentromere. Dev Cell 2001, 1:165-177. 4. Fitzgerald-Hayes M, Clarke L, Carbon J: Nucleotide sequence Hidden Markov model based-modeling comparisons and functional analysis of yeast centromere The point CEN model was constructed from three different DNAs. Cell 1982, 29:235-244. 5. McAinsh AD, Tytell JD, Sorger PK: Structure, function, and reg- sub-models based on the known structure of point CENs. The ulation of budding yeast kinetochores. Annu Rev Cell Dev Biol first sub-model searched for CDEI-like regions in the query 2003, 19:519-539. sequence using the [T|G]CA[C|G|T][A|C|G]TG motif. The 6. Sanyal K, Baum M, Carbon J: Centromeric DNA sequences in research deposited the pathogenic yeast Candida albicans are all different and second sub-model then searched for adjacent CDEII-like AT unique. Proc Natl Acad Sci USA 2004, 101:11374-11379. rich regions. The CDEII region was modeled with a HMM 7. Houben A, Schubert I: DNA and proteins of plant centromeres. Curr Opin Plant Biol 2003, 6:554-560. using CDEII from S. cerevisiae [88]. For the negative model, 8. Schueler MG, Higgins AW, Rudd MK, Gustashaw K, Willard HF: S. cerevisiae genomic DNA was used (the effect of including Genomic and genetic definition of a functional human CENs in the genomic DNA was disregarded). For both data- centromere. Science 2001, 294:109-115. 9. Sun X, Wahlstrom J, Karpen G: Molecular structure of a func- sets, base transition frequencies were determined and the tional Drosophila centromere. Cell 1997, 91:1007-1019. transition matrix for the HMM was calculated. The quality of 10. Clarke L, Amstutz H, Fishel B, Carbon J: Analysis of centromeric the HMM was evaluated by screening annotated budding DNA in the fission yeast Schizosaccharomyces pombe. Proc refereed research Natl Acad Sci USA 1986, 83:8253-8257. yeast genomes and assessment with a bit score: 11. Albertson DG, Thomson JN: The kinetochores of Caenorhabditis elegans. Chromosoma 1982, 86:409-428. 12. Wiens GR, Sorger PK: Centromeric chromatin and epigenetic Px(|model+ ) Sx()= log effects in kinetochore assembly. Cell 1998, 93:313-316. Px(|model− ) 13. Mellone BG, Allshire RC: Stretching it: putting the CEN(P-A) in centromere. Curr Opin Genet Dev 2003, 13:191-198. 14. De Wulf P, McAinsh AD, Sorger PK: Hierarchical assembly of the Given the identification of CDEI and CDEII sequence ele- budding yeast kinetochore from multiple subcomplexes. ments, a third sub-model searched for an adjacent CDEIII Genes Dev 2003, 17:2902-2921. 15. Nekrasov VS, Smith MA, Peak-Chew S, Kilmartin JV: Interactions motif using an expression based on the highly conserved between centromere complexes in Saccharomyces cerevisiae. CCGGAA motif. Positive hits were evaluated with the bit score Mol Biol Cell 2003, 14:4931-4946. interactions calculated from the CDEII HMM, length distribution, AT 16. Scharfenberger M, Ortiz J, Grau N, Janke C, Schiebel E, Lechner J: Nsl1p is essential for the establishment of bipolarity and the length, AT runs and synteny. localization of the Dam-Duo complex. EMBO J 2003, 22:6584-6597. 17. Bharadwaj R, Qi W, Yu H: Identification of two novel compo- nents of the human NDC80 kinetochore complex. J Biol Chem Additional data files 2004, 279:13076-13085. The following additional data are available with the online 18. Cheeseman IM, Niessen S, Anderson S, Hyndman F, Yates JR 3rd, Oegema K, Desai A: A conserved protein network controls version of this paper. Additional data file 1 contains accession assembly of the outer kinetochore and its ability to sustain numbers of all proteins that are used in this study. Additional tension. Genes Dev 2004, 18:2255-2268. information data file 2 shows the multiple sequence alignment of S. cere- 19. Obuse C, Iwasaki O, Kiyomitsu T, Goshima G, Toyoda Y, Yanagida M: A conserved Mis12 centromere complex is linked to hetero- visiae Spc34, a subunit of the multi-protein DASH complex, chromatic HP1 and outer kinetochore protein Zwint-1. Nat with a set of fungal orthologs and a set of related metazoan Cell Biol 2004, 6:1135-1141. 20.Goshima G, Kiyomitsu T, Yoda K, Yanagida M: Human proteins (NYD-Sp28 family). Additional data file 3 contains centromere chromatin protein hMis12, essential for equal multiple sequence alignments of the E. cuniculi kinetochore segregation, is independent of CENP-A loading pathway. J

Genome Biology 2006, 7:R23 R23.20 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. http://genomebiology.com/2006/7/3/r23

Cell Biol 2003, 160:25-39. Dam1 ring complex. Mol Cell 2005, 17:277-290. 21. Liu ST, Hittle JC, Jablonski SA, Campbell MS, Yoda K, Yen TJ: Human 45. Jishage M, Fujino T, Yamazaki Y, Kuroda H, Nakamura T: Identifica- CENP-I specifies localization of CENP-F, MAD1 and MAD2 tion of target genes for EWS/ATF-1 chimeric transcription to kinetochores and is essential for . Nat Cell Biol 2003, factor. Oncogene 2003, 22:41-49. 5:341-345. 46. Pazour GJ, Agrin N, Leszyk J, Witman GB: Proteomic analysis of a 22. McCleland ML, Gardner RD, Kallio MJ, Daum JR, Gorbsky GJ, Burke eukaryotic cilium. J Cell Biol 2005, 170:103-113. DJ, Stukenberg PT: The highly conserved Ndc80 complex is 47. Meluh PB, Koshland D: Evidence that the MIF2 gene of Saccha- required for kinetochore assembly, chromosome congres- romyces cerevisiae encodes a centromere protein with sion, and spindle checkpoint activity. Genes Dev 2003, homology to the mammalian centromere protein CENP-C. 17:101-114. Mol Biol Cell 1995, 6:793-807. 23. Tirnauer JS, Canman JC, Salmon ED, Mitchison TJ: EB1 targets to 48. Stoler S, Keith KC, Curnick KE, Fitzgerald-Hayes M: A mutation in kinetochores with attached, polymerizing microtubules. Mol CSE4, an essential gene encoding a novel chromatin-associ- Biol Cell 2002, 13:4308-4316. ated protein in yeast, causes chromosome nondisjunction 24. Kitagawa K, Hieter P: Evolutionary conservation between bud- and cell cycle arrest at mitosis. Genes Dev 1995, 9:573-586. ding yeast and human kinetochores. Nat Rev Mol Cell Biol 2001, 49. Williams BC, Li Z, Liu S, Williams EV, Leung G, Yen TJ, Goldberg ML: 2:678-687. Zwilch, a new component of the ZW10/ROD complex 25. Wigge PA, Kilmartin JV: The Ndc80p complex from Saccharo- required for kinetochore functions. Mol Biol Cell 2003, myces cerevisiae contains conserved centromere compo- 14:1379-1391. nents and has a function in chromosome segregation. J Cell 50. Sugata N, Munekata E, Todokoro K: Characterization of a novel Biol 2001, 152:349-360. kinetochore protein, CENP-H. J Biol Chem 1999, 26. Dujardin D, Wacker UI, Moreau A, Schroer TA, Rickard JE, De Mey 274:27343-27346. JR: Evidence for a role of CLIP-170 in the establishment of 51. Starr DA, Williams BC, Li Z, Etemad-Moghadam B, Dawe RK, Gold- metaphase chromosome alignment. J Cell Biol 1998, berg ML: Conservation of the centromere/kinetochore pro- 141:849-862. tein ZW10. J Cell Biol 1997, 138:1289-1301. 27. Henikoff S, Dalal Y: Centromeric chromatin: what makes it 52. Rattner JB, Rao A, Fritzler MJ, Valencia DW, Yen TJ: CENP-F is a.ca unique? Curr Opin Genet Dev 2005, 15:177-184. 400 kDa kinetochore protein that exhibits a cell-cycle 28. Dujon B, Sherman D, Fischer G, Durrens P, Casaregola S, Lafontaine dependent localization. Cell Motil Cytoskeleton 1993, 26:214-226. I, De Montigny J, Marck C, Neuveglise C, Talla E, et al.: Genome evo- 53. Yen TJ, Compton DA, Wise D, Zinkowski RP, Brinkley BR, Earnshaw lution in yeasts. Nature 2004, 430:35-44. WC, Cleveland DW: CENP-E, a novel human centromere- 29. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing associated protein required for progression from metaphase and comparison of yeast species to identify genes and regu- to anaphase. EMBO J 1991, 10:1245-1254. latory elements. Nature 2003, 423:241-254. 54. Westermann S, Cheeseman IM, Anderson S, Yates JR 3rd, Drubin 30. Kuras L, Thomas D: Identification of the yeast methionine bio- DG, Barnes G: Architecture of the budding yeast kinetochore synthetic genes that require the centromere binding factor reveals a conserved molecular core. J Cell Biol 2003, 1 for their transcriptional activation. FEBS Lett 1995, 367:15-18. 163:215-222. 31. Cardozo T, Pagano M: The SCF ubiquitin ligase: insights into a 55. Andag U, Schmitt HD: Dsl1p, an essential component of the molecular machine. Nat Rev Mol Cell Biol 2004, 5:739-751. Golgi-endoplasmic reticulum retrieval system in yeast, uses 32. Ciferri C, De Luca J, Monzani S, Ferrari KJ, Ristic D, Wyman C, Stark the same sequence motif to interact with different subunits H, Kilmartin J, Salmon ED, Musacchio A: Architecture of the of the COPI vesicle coat. J Biol Chem 2003, 278:51722-51734. human -hec1 complex, a critical constituent of the 56. Hirose H, Arasaki K, Dohmae N, Takio K, Hatsuzawa K, Nagahama outer kinetochore. J Biol Chem 2005, 280:29088-29095. M, Tani K, Yamamoto A, Tohyama M, Tagaya M: Implication of 33. Wei RR, Sorger PK, Harrison SC: Molecular organization of the ZW10 in membrane trafficking between the endoplasmic Ndc80 complex, an essential kinetochore component. Proc reticulum and Golgi. EMBO J 2004, 23:1267-1278. Natl Acad Sci USA 2005, 102:5363-5367. 57. Starr DA, Williams BC, Hays TS, Goldberg ML: ZW10 helps 34. Kline-Smith SL, Sandall S, Desai A: Kinetochore-spindle microtu- recruit dynactin and dynein to the kinetochore. J Cell Biol 1998, bule interactions during mitosis. Curr Opin Cell Biol 2005, 142:763-774. 17:35-46. 58. Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF: A kingdom- 35. Malik HS, Henikoff S: Conflict begets complexity: the evolution level phylogeny of eukaryotes based on combined protein of centromeres. Curr Opin Genet Dev 2002, 12:711-718. data. Science 2000, 290:972-977. 36. Biswas K, Rieger KJ, Morschhauser J: Functional characterization 59. Gribaldo S, Philippe H: Ancient phylogenetic relationships. of CaCBF1, the Candida albicans homolog of centromere Theor Popul Biol 2002, 61:391-408. binding factor 1. Gene 2003, 323:43-55. 60. Heeger S, Leismann O, Schittenhelm R, Schraidt O, Heidmann S, Leh- 37. Browning H, Hackney DD, Nurse P: Targeted movement of cell ner CF: Genetic interactions of separase regulatory subunits end factors in fission yeast. Nat Cell Biol 2003, 5:812-818. reveal the diverged Drosophila Cenp-C homolog. Genes Dev 38. Lawrence CJ, Dawe RK, Christie KR, Cleveland DW, Dawson SC, 2005, 19:2041-2053. Endow SA, Goldstein LS, Goodson HV, Hirokawa N, Howard J, et al.: 61. Blower MD, Karpen GH: The role of Drosophila CID in kineto- A standardized kinesin nomenclature. J Cell Biol 2004, chore formation, cell-cycle progression and heterochroma- 167:19-22. tin interactions. Nat Cell Biol 2001, 3:730-739. 39. Liu X, McLeod I, Anderson S, Yates JR 3rd, He X: Molecular analy- 62. Karess R: Rod-Zw10-Zwilch: a key player in the spindle sis of kinetochore architecture in fission yeast. EMBO J 2005, checkpoint. Trends Cell Biol 2005, 15:386-392. 24:2919-2930. 63. Vivares CP, Gouy M, Thomarat F, Metenier G: Functional and evo- 40. Pidoux AL, Richardson W, Allshire RC: Sim4: a novel fission yeast lutionary analysis of a eukaryotic parasitic genome. Curr Opin kinetochore protein required for centromeric silencing and Microbiol 2002, 5:499-505. chromosome segregation. J Cell Biol 2003, 161:295-307. 64. Peters JM: The anaphase-promoting complex: proteolysis in 41. Wang Y, Devereux W, Stewart TM, Casero RA Jr: Cloning and mitosis and beyond. Mol Cell 2002, 9:931-943. characterization of human polyamine-modulated factor-1, a 65. Luo X, Tang Z, Rizo J, Yu H: The Mad2 spindle checkpoint transcriptional cofactor that regulates the transcription of protein undergoes similar major conformational changes the spermidine/spermine N(1)-acetyltransferase gene. J Biol upon binding to either Mad1 or Cdc20. Mol Cell 2002, 9:59-71. Chem 1999, 274:22095-22101. 66. Sironi L, Mapelli M, Knapp S, De Antoni A, Jeang KT, Musacchio A: 42. Yamashita A, Ito M, Takamatsu N, Shiba T: Characterization of Crystal structure of the tetrameric Mad1-Mad2 core com- Solt, a novel SoxLZ/Sox6 binding protein expressed in adult plex: implications of a 'safety belt' binding mechanism for the mouse testis. FEBS Lett 2000, 481:147-151. spindle checkpoint. EMBO J 2002, 21:2496-2506. 43. Miranda JJ, De Wulf P, Sorger PK, Harrison SC: The yeast DASH 67. Talbert PB, Bryson TD, Henikoff S: Adaptive evolution of centro- complex forms closed rings on microtubules. Nat Struct Mol mere proteins in plants and animals. J Biol 2004, 3:18 . Biol 2005, 12:138-143. 68. Katinka MD, Duprat S, Cornillot E, Metenier G, Thomarat F, Prensier 44. Westermann S, Avila-Sakar A, Wang HW, Niederstrasser H, Wong J, G, Barbe V, Peyretaillade E, Brottier P, Wincker P, et al.: Genome Drubin DG, Nogales E, Barnes G: Formation of a dynamic kine- sequence and gene compaction of the eukaryote parasite tochore- microtubule interface through assembly of the Encephalitozoon cuniculi. Nature 2001, 414:450-453.

Genome Biology 2006, 7:R23 http://genomebiology.com/2006/7/3/r23 Genome Biology 2006, Volume 7, Issue 3, Article R23 Meraldi et al. R23.21

69. Dalgaard JZ, Vengrova S: Selective gene expression in multigene families from yeast to mammals. Sci STKE 2004, 2004:re17. 70. Jager H, Herzig A, Lehner CF, Heidmann S: Drosophila separase is

required for sister chromatid separation and binds to PIM comment and THR. Genes Dev 2001, 15:2572-2584. 71. Louis EJ: Are Drosophila telomeres an exception or the rule? Genome Biol 2002, 3:REVIEWS0007. 72. Bapteste E, Charlebois RL, MacLeod D, Brochier C: The two tem- pos of nuclear pore complex evolution: highly adapting pro- teins in an ancient frozen structure. Genome Biol 2005, 6:R85. 73. Newman JR, Wolf E, Kim PS: A computationally directed screen identifying interacting coiled coils from Saccharomyces cerevisiae. Proc Natl Acad Sci USA 2000, 97:13203-13208. 74. Rose A, Meier I: Scaffolds, levers, rods and springs: diverse cel-

lular functions of long coiled-coil proteins. Cell Mol Life Sci 2004, reviews 61:1996-2009. 75. Bullitt E, Rout MP, Kilmartin JV, Akey CW: The yeast spindle pole body is assembled around a central crystal of Spc42p. Cell 1997, 89:1077-1086. 76. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lip- man DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402. 77. Gattiker A, Gasteiger E, Bairoch A: ScanProsite: a reference implementation of a PROSITE scanning tool. Appl Bioinformatics 2002, 1:107-108. 78. Higgins DG: CLUSTAL V: multiple alignment of DNA and reports protein sequences. Methods Mol Biol 1994, 25:307-318. 79. Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 2000, 302:205-217. 80. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32:1792-1797.

81. Lupas A, Van Dyke M, Stock J: Predicting coiled coils from pro- research deposited tein sequences. Science 1991, 252:1162-1164. 82. Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 2000, 17:540-552. 83. Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZ- ZLE: maximum likelihood phylogenetic analysis using quar- tets and parallel computing. Bioinformatics 2002, 18:502-504. 84. Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 1992, 8:275-282. 85. Felsenstein J: PHYLIP - Phylogeny Inference Package (Version refereed research 3.2). Cladistics 1989, 5:164-166. 86. Felsenstein KM, Lewis-Higgins L: Processing of the beta-amyloid precursor protein carrying the familial, Dutch-type, and a novel recombinant C-terminal mutation. Neurosci Lett 1993, 152:185-189. 87. Huson DH, Bryant D: Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 2006, 23:254-267. 88. Durbin R, Eddy SR, Krogh A, Mitchison G: Probabilistic Models of Pro- teins and Nucleic Acids Cambridge: Cambridge University Press; 1998. interactions information

Genome Biology 2006, 7:R23