<<

University of Connecticut OpenCommons@UConn

Honors Scholar Theses Honors Scholar Program

Spring 5-1-2018

Sequencing and Analysis of B in Wallaby and the Rapid Evolution of the Centromere

Alexander Tedeschi [email protected]

Follow this and additional works at: https://opencommons.uconn.edu/srhonors_theses

Part of the Biology Commons, and the Commons

Recommended Citation Tedeschi, Alexander, "Sequencing and Analysis of in Wallaby and the Rapid Evolution of the Centromere" (2018). Honors Scholar Theses. 730. https://opencommons.uconn.edu/srhonors_theses/730 1

Sequencing and Analysis of Centromere Protein B in Wallaby and the Rapid Evolution of the Centromere

Alexander Tedeschi Honors Thesis Department of Molecular and Biology University of Connecticut April 2018

Thesis/Honors Advisor: Dr. Rachel O’Neill Graduate Advisor: Zachary Duda

2

Abstract

The centromere is a region of tightly packed that acts as the anchoring site of the protein complex in . This complex is composed of a wide variety of centromere known as CENPs and is responsible for proper segregation of in . This essential function of the centromere is highly conserved in eukaryotes, but paradoxically the

DNA content of the centromere is not. In relatively short evolutionary time, accumulating changes in DNA and the associated proteins at the centromere could be enough to create a barrier and drive evolution through a process known as “centromere drive”.

We aimed to examine the effects rapid changes in centromeric DNA have on the protein-coding sequence of closely associated CENPs of the kinetochore. This portion of the project focuses on CENP-B, an inner kinetochore protein that binds to a 17-bp recognition sequence in the centromeric DNA known as the CENP-B box whose function is still somewhat of a mystery. Through a combination of RNA-seq data analysis and Sanger sequencing, the complete cDNA and corresponding amino acid sequences of CENP-B were determined for eight of (family

Macropodidae) known to have highly divergent . These sequences were aligned and compared to each other and other mammalian CENP-B sequences in order to delineate the degree of divergence amongst species. Initial analysis reveals many amino acid changes within the Macropodidae, particularly in Macropus rufogriseus (red-necked wallaby). These data were also used to generate phylogenies, perform positive selection analysis, and make inferences about the

3 structure and function of CENP-B with respect to centromere DNA binding. Further analysis is needed to compare CENP-B sequence divergence to other proteins in the kinetochore complex.

Table of Contents

Abstract……………………………………………………………………………………………………………….2

List of Tables………………………………………………………………………………………………..………4

List of Figures………………………………………………………………………………………………………5

Background…………………………………………………………………………………………………………6

Materials and Methods……………………………………………………………………………………….11

Results……………………………………………………………………………………………………...……….20

Discussion………………………………………………………………………………………...…………….....22

Acknowledgements…………………………………………………………………………………...……….29

References…………………………………………………………………………………………………………30

4

List of Tables

Table 1: Complete List of Primers Used...... 34

Table 2: Complete List of Primer Set Combinations and Species Tested...…………...…35

Table 3: Summary of Primer Sets Used in Sequencing………………………………………….36

Table 4: Pairwise Percent Amino Acid Identities Across Macropods……………………..37

Species Abbreviations Key

magil Macropus agilis meu Macropus eugenii mgi Macropus giganteus mpm Macropus parma rob Macropus robustus rufo Macropus rufogriseus rufus Macropus rufus wbi Wallabia bicolor all All species tested

5

List of Figures

Figure 1: Illustration of Centromere Drive……….………………………………………………….38

Figure 2: Diagram of the Kinetochore Complex………….………………………………………...38

Figure 3: Differences in Sequencing PCR Kits………………………………………………………39

Figure 4: Macropodid CENP-B Amino Acid Alignment………………………………………….40

Figure 5: Mammalian CENP-B Amino Acid Alignment…………………………………………..41

Figure 6: Phylogeny of CENP-B………………………………………………………………………...... 42

Figure 7: Intrinsically Disordered Glu/Asp-Rich Regions as DNA Mimics………….…..42

Figure 8: Model of CENP-A Formation……………………………………………...43

6

Background

The DNA of eukaryotic organisms is organized into chromosomes, each of which consists of one long, linear strand of DNA wrapped around protein complexes known as . During the process of cell division, these chromosomes must be accurately replicated and segregated into each of the two daughter cells. Essential to this process is the centromere; a highly compact region of DNA that presents as a constriction in metaphase chromosomes. The centromere is the site of formation of the kinetochore, a protein complex that facilitates proper attachment of spindle during metaphase and separation of sister in anaphase. An error in the formation of the kinetochore complex that would prevent microtubules attachment could cause improper segregation of chromosomes and result in an abnormal number of chromosomes in the daughter cells (reviewed by Brinkley

1990) Thus the centromere plays a critical role in cell division in eukaryotic organisms. Both the epigenetic structure and DNA sequence of centromeric regions are distinct from surrounding regions of the . Centromeres are mainly demarcated by a H3 variant known as CENP-A (Palamer et. al 1987). CENP-

A is one of many proteins that have been shown to localize to the centromere, known collectively as CENPs. The underlying DNA at centromeres is composed of repetitive elements, mainly satellite DNA and mobile transposable elements (Brown and O’Neill, 2009). Together, CENP-A and these repetitive satellite regions form the backbone of functional centromeres, on which the rest of the kinetochore complex can form (reviewed in Fukagawa and Earnshaw 2004).

Given its importance in cell division, the function of the centromere and

7 kinetochore machinery is highly conserved in eukaryotes (reviewed in Westermann and Schleiffer 2013). Paradoxically, DNA elements within the centromere are variable and divergent among taxa (Sunkel and Coelho 1995). This problem has become known as the centromere paradox. This strange dynamic has been explained by H.S. Malik (2009) through a process called centromere drive (see

Figure 1). Biases in are the key driver of this process. During meiosis in four potential are created, but only one goes on to become the oocyte. Selection as to which of the four gametes becomes the oocyte is not always completely random, as those that can attain a more favorable orientation in meiosis are more likely to be selected (Zwick et. al. 1999). This selective advantage could potentially originate from changes in the centromere that could cause preferential attachment, increased nucleosome recruitment, or some other similar effect (Heikoff and Malik 2001). Through biased incorporation into oocytes, chromosomes with these “stronger” centromeres would be rapidly driven to fixation in the population (Heikoff and Malik 2001). However, these differences could cause centromeres to become imbalanced, increasing the likelihood of events. In order to combat this, changes to centromere proteins that would restore balance would be selected for and become fixed in the population (Heikoff and Malik

2001). This creates an evolutionary arms race between centromere sequences and associated proteins that causes the sequences of both to change rapidly, while still preserving the function of the centromere (Heikoff and Malik 2001). Over relatively short periods of evolutionary time, these rapidly changing sequences could become so diverged in different populations that the differences prevent formation of viable

8 offspring in hybrids and can act as a speciation barrier (Malik 2009). Thus, this process of centromere drive may act as a mechanism of species evolution.

The kinetochore complex itself can be described in two major sections. The first is the inner kinetochore, which contains the proteins most closely associated with the centromere itself, either through direct contact with the DNA or CENP-A nucleosome or through close association with the proteins that make this connection (Perpelescu and Fukagawa 2011). The second is the outer kinetochore, containing the proteins that interact with the microtubule during cell division

(Perpelescu and Fukagawa 2011). While it is known that centromere sequences are rapidly changing, it is not clear exactly how far into the kinetochore complex these

DNA differences have had an effect on kinetochore protein sequence. One of the goals of the lab is to identify how far-reaching these changes are, starting with the proteins in direct contact with centromeric DNA and working towards the outer kinetochore. This project specifically examines potential changes to CENP-B, one of the few proteins that directly interacts with DNA at the centromere (Masumoto et. al. 1989, Muro et. al. 1992).

Despite the amount of attention CENP-B has received in the scientific literature, the exact function of the protein remains a mystery. CENP-B is known to bind a 17-bp sequence found in centromeric satellite DNA called the CENP-B box

(Masumoto et. al. 1989). This motif is found in all centromeres within alpha satellites, but not in the centromere satellites of the (Masumoto et. al. 1989). CENP-B has also been shown to assemble alpha satellite DNA at these

CENP-B boxes and impose higher order structures, suggesting it is

9 important to centromere organization (Muro et. al. 1991). CENP-B binds to DNA through a helix-turn-helix motif at the N-terminus of the protein and is capable of dimerizing with another CENP-B protein through a homodimerization domain at the

C-terminus (Yoda et. al. 1992). This combination of dimerization and DNA binding activity allows CENP-B to bring distant CENP-B boxes together and create higher order structures (Yoda et. al. 1992 and 1998). The presence of CENP-B boxes within the centromeres of a wide range of species (Kipling et. al. 1995,

Suntronpong et. al. 2016) suggests it plays a role in centromere formation and/or function. CENP-B has also been shown to be essential in the process of de novo centromere formation (Ohzeki et. al. 2002, Masumoto et. al. 2004) and in the maintenance of stable centromeres (Fachinetti et. al. 2015). CENP-B has also been found to complex with CENP-A and could assist in its positioning and stabilization within centromeric (Fujita et. al. 2014). However, other studies have seemed to contradict essential role of CENP-B in . CENP-B null mice are completely viable and exhibit no negative effects in its absence, indicating that it is not an essential protein on its own (Perez-Castro et. al. 1998). CENP-B boxes are also absent in the in some mammal species like the African Green Monkey and in the

Y chromosome of (Goldberg et. al. 1996). While CENP-B does stabilize centromeric nucleosomes, CENP-A and CENP-C alone are sufficient for formation of the kinetochore complex (Fachinetti et. al. 2013). Abp1, the homolog of mammalian

CENP-B present in Schizosaccharomyces pombe (fission yeast) has even been shown to have a completely separate activity in the silencing of transposable elements

(Lorenz et. al. 2012). With much still unclear about the role of CENP-B in mammals,

10 one goal of this project is to make inferences about the function of CENP-B based on its complete sequence structure.

In this project, the CENP-B of eight species of marsupials of the family

Macropodidae (nine total individuals) was sequenced. These species were Macropus eugenii (tammar wallaby), Macropus agilis (agile wallaby), Macropus rufogriseus

(red-necked wallaby, two indviduals), Macropus parma (parma wallaby), Macropus robustus (common wallaroo), Macropus rufus (red kangaroo) Macropus giganteus

(eastern grey kangaroo), and Wallabia bicolor (swamp wallaby). Although unusual, wallabies make an excellent model organism for the study of rapid evolution of the centromere. Despite their recent evolutionary divergence, this clade of marsupials is characterized by a series of chromosomal rearrangements centered around the centromere, suggesting instability and potentially rapid changes (Bulazel et. al.

2007). The highly conserved CENP-B box sequence was found to be altered in the

Macropods, suggesting possible changes to the CENP-B protein itself (Bulazel et. al

2006). In addition, M. rufogriseus possesses abnormally large centromeres compared to the rest of the wallabies, making this species a good target for studying the effects of divergent centromeric DNA on the centromere protein evolution

(Bulazel et. al. 2006). Although not used in this study, hybrid cell lines of M. rufogriseus and M. agilis have been created to examine the effects of potential centromere protein incompatibilities in closely related species. Through the sequencing of the CENP-B gene in these eight species, this study aims to determine the effects of rapidly evolving centromeric DNA sequences on the CENP-B protein through comparative analysis, phylogenetic reconstruction, and positive

11 selection analysis. As one of the few DNA-contacting proteins in the inner kinetochore, the availability of the sequence of CENP-B will be imperative for further study into potential alterations of proteins deeper in the kinetochore assembly complex. The initial research into CENP-B in this study will lay the groundwork for future studies into both the function of CENP-B and the action of centromere drive on the rest of the proteins in the kinetochore.

Materials and Methods

Mapping RNA-seq Reads

The human CENP-B mRNA nucleotide sequence was downloaded from the

NCBI database. This sequence was used to perform a blastn web browser search of the nucleotide collection (nr/nt), marsupial database (taxid:92763) using default blastn algorithm parameters. These marsupial CENP-B sequences were then used to blastn to the Macropus eugenii (in house; R. O’Neill and T. Heider, personal communication) to locate CENP-B in our model species. The ExPASy translate tool was used to find the open reading frame in this isolated CENP-B sequence. The corresponding sequences, along with 150 bp from both the 5’ and 3’ end, were pulled out and imported into Geneious 8.1.9. A script for Bowtie 2 using the very sensitive and paired-end parameters was used to map the Illumina TruSeq data previously generated in the lab (S. Trusiak and Z. Duda, personal communication) for all eight wallaby species. Sequencing reads that did not map to CENP-B were excluded. Mapped reads were also imported into Geneious and aligned to the isolated M. eugenii CENP-B using the “Map to Reference” tool with default parameters.

12

Synthesizing cDNA

cDNA for all species was synthesized from previously prepared DNAsed RNA samples using the Quantabio qSctipt cDNA supermix (catalog number 101414-106) as per the manufacturer’s suggested protocol. RNA concentrations were measured via NanoDrop to determine the volume required for 1 mg RNA

Primer Design

RNA-seq reads mapped to the Macropus eugenii genome were imported into

Geneious 8.1.9. The primer design tool available in Geneious was used to generate a variety of primer sets to cross the gaps in RNA-seq data, with expected product lengths ranging from about 300 to 1000 base pairs. Initial primers were designed using M. eugenii data. Species-specific primers were designed to cover any remaining gaps that could not be bridged using these initial primers. Once designed, custom primers were ordered through Invitrogen. A full list of primers used in this study is provided in Table 1. Ordered primers were resuspended to a concentration of 1 µg/µL using clean distilled water (dH2O). These resuspended solutions were used to make working stocks of 100 ng/µL.

Performing PCR

Determining the correct PCR conditions, primers, and reagents comprised the bulk of this project. The complete list of primers and primer sets tested is descried in Tables 1 and 2. Variations in PCR conditions and reagents were based on an initial protocol as follows: 0.5 µL cDNA, 0.5 µL of forward and reverse primer, 2.0

µL 2.5mM dNTPs, 2.5 µL 10X PCR Buffer, 0.5 µL Taq polymerase, and 18.5 µL dH2O for a total 25.0 µL per reaction. The thermocycler conditions for the reaction were

13 as follows: initial denaturation of 94 ˚C for 3 minutes, 35 cycles of denaturation at 94

˚C for 30 seconds, primer annealing at 58 ˚C for 30 seconds, and extension at 72 ˚C for 45 seconds, and a final extension at 72˚C for 10 minutes.

The first change to this protocol was the use of a higher fidelity polymerase instead of the standard Taq polymerase. In this mix, the same volume of cDNA, primers, and dNTPs was used with 0.125 µL Promega GoTaq Polymerase (Catalog number M3001), 5.0 µL 5X Green GoTaq Buffer, and 16.375 µL dH2O. These reactions were run using the same PCR conditions as the initial PCRs. Gradients with an annealing temperature range of 54 to 64 ˚C were also performed both with and without GoTaq to attempt to get PCR products.

A combination of fresh cDNA and redesigned primers was used to generate the first PCR products, including peaks 2-3 in M. eugenii. M. agilis, and M. rufogriseus

(both individuals). When this method failed to work in peaks 1-2 of these same species, 2.0 µL DMSO was added to the initial mix of reagents (2.0 µL dH2O was removed to maintain the 25.0 µL volume) to assist with strand separation. This new reaction mix was run using previously described gradient conditions. The addition of DMSO lowered the effective primer annealing temperature, and all other PCRs using DMSO were run with either a set annealing temperature of 54 ˚C or a gradient range of 50-60 ˚C. DMSO in combination with primer redesign was used to generate

PCR products in all species with the exception of M. giganteus and W. bicolor.

The next alteration to the initial protocol was the use of a hot start in addition to DMSO with an initial denaturation temperature of 98 ˚C and an

14 annealing temperature of 54 ˚C. This method also failed to capture the remaining products and further changes were required.

In order to amplify the final gaps needed for sequencing, the pre-made

OneTaq Master Mix from New England BioLabs (Catalog number M0483S), specifically designed to amplify through GC-rich regions was used. This PCR kit included a Master Mix containing high fidelity polymerases and a GC buffer, and an additional high GC . 25 µL reactions were performed with this kit, using

12.5 OneTaq 2X Master Mix, 5.0 µL GC enhancer (the highest recommended amount), 0.5 µL cDNA, 0.5 µL forward and reverse primer, and 6.0 µL dH2O. These reactions were run using gradients of 50-60 ˚C and a variety of species-specific primers and successfully generated the remaining products.

Pouring and Imaging Agarose Gels

The presence and size of PCR products was determined through agarose . Gels were made at a concentration of 1% agarose using 1X TBE as a buffer. Ethidium Bromide (EtBr) was used to visualize DNA bands at a concentration of 0.35 µL per 50 mL of gel. PCR products were stained using Cresol Red at a concentration of 1X. Gels were loaded with 10 µL of sample per well. 5 µL of 2log ladder was used for size estimation. Gels were run at 95 Volts for 40 minutes and imaged using the BioRad GelDoc XR.

Cleanup of PCR Products

PCR products were purified using the QIAGEN QIAquick PCR purification kit

(Catalog number 28104) as per the manufactures suggested protocol. To achieve higher concentrations, 30 µL elution buffer (Buffer EB) was used and allowed to sit

15 for 1 minute before final elution. Concentrations of purified products were measured via NanoDrop.

Subcloning and Colony Selection

Initial PCR products were prepared for sequencing via subcloning and sequencing off of the vector. Complications in sequencing using this workflow along with the extended time to perform it made this process difficult, and most samples were prepared using direct sequencing of PCR products (skip to the

Sequencing PCR and Precipitation section). See Table 3 for details.

Purified PCR products were ligated into plasmid vectors and replicated using the Agilent Technologies StrataClone PCR Cloning Kit (Catalog number 240205). The manufacturer’s suggested protocol was followed with two small changes. First, the protocol was performed using two half reactions rather than one full reaction.

Second, SOC media was used in the initial incubation step in place of LB media. Final

LB-ampicillin plates were incubated for 16 hours at 37 ˚C.

After incubation, white colonies were selected to be grown in culture overnight. Overnight media was prepared by mixing 2.5 mL LB media with 2.5 µL of

50 µg/µL ampicillin in 15 mL culture tubes. Using a pipette tip, selected colonies were mixed with 25 µL of dH2O and allowed to sit for 10 minutes. 10 µL of this mixture was added to the appropriate culture tube. Culture tubes were incubated for 16 hours at 37 ˚C in a shaking incubator.

DNA Miniprep

After overnight incubation, all cloudy culture tubes (containing bacteria with the insert-containing vector) were used for miniprep of the plasmid DNA. Miniprep

16 was performed using the 5 Prime FastPlasmid Mini Kit (Reference number

2300010, discontinued) as per the manufacturer’s suggested protocol, using only

30µL of Elution Buffer in the final elution to increase plasmid DNA concentrations.

Concentrations of plasmid were measured via NanoDrop

An EcoRI digest was used to verify the presence of the insert in the miniprep samples. Solutions were prepared containing 5 µL miniprep DNA, 1 µL EcoRI restriction enzyme, 2 µL EcoRI buffer, and 12 µL dH2O. These solutions were incubated for 1 hour at 37 ˚C. 5 µL 5X Cresol Red was added to each solution. 10 µL of each sample was loaded onto a 1% agarose gel (described in the Pouring Agarose

Gels section) along with 5 µL of 2log ladder and run at 95 Volts for 40 minutes. Gels were imaged using the GelDoc XR to determine the presence of the insert

Sequencing PCR

Two different Sequencing PCR systems were used in this study. For sequences that were easier to amplify, the Applied Biosystems BigDye v3.1 Cycle Sequencing Kit for was used (Catalog number 4337455). Sequences that proved harder to amplify in PCR or had poor-quality data using the v3.1 kit were instead prepared using the Applied Biosystems dGTP BigDye Terminator v3.0 Cycle

Sequencing Kit (Catalog number 4390229), designed to be more effective in GC-rich sequences.

For sequencing using the BigDye v3.1 kit, 10 µL reactions were prepared for both the forward and reverse primers. For miniprep plasmid samples, this reaction contained 1.5 µL sample, 1.7 µL primer (T3 or T7), 0.5 µL BigDye, 1.7 µL dilution buffer, and 4.6 µL dH2O. For purified PCR products, the reaction mix contained 1 µL

17

PCR product, 0.5 µL primer (Forward or Reverse), 0.5 µL BigDye, 1.7 µL dilution buffer, and 6.3 µL dH2O. A thermocycler programed with the manufacturer’s recommended settings for synthesis was used to perform the reaction.

For sequencing using the dGTP Cycle Sequencing kit, 20 µL reactions were prepared for both the forward and reverse primers. All samples sequenced using this kit were PCR products (no plasmid DNA). The reaction mixture contained 2 µL

PCR product, 0.5 µL primer (Forward or Reverse), 4 µL dGTP reaction mix (half reaction), and 13.5 µL dH2O. A thermocycler programed with the manufacturer’s recommended settings for synthesis was used to perform the reaction.

Ethanol Precipitation

Ethanol-Sodium Acetate Precipitation was used to precipitate sequencing

PCR samples. All sequencing PCR products were brought to 20 µL using dH2O. Once at the appropriate volume, 2 µL 125 mM EDTA, 2 µL 3M Sodium Acetate, and 50 µL

100% nondenatured Ethanol were to each sample. This mixture was vortexed and incubated in the dark for 15 minutes at room temperature. Following this incubation, samples were spun in a cold centrifuge (4 ˚C) at max speed for 20 minutes. After centrifugation, the supernatant was drawn off and discarded. 70 µL of freshly prepared 70% Ethanol was used to wash the pellets. Samples were spun again in a cold centrifuge at max speed for 15 minutes. The supernatant was drawn off and discarded and samples were allowed to air dry until all ethanol was completely evaporated. Once dry, the precipitated samples were resuspended in 20

µL diFA. Resuspended samples were sent to the UCONN Center for Genome

Innovation for Sanger Sequencing.

18

Alignment of Sequencing Data

All Sanger Sequencing data received was imported into Geneious 8.1.9 and organized by species along with the mapped RNA-seq data. The Sanger data and

RNA-seq data together were mapped to the original Macropus eugenii genome database in all species using the Map to Reference tool with default parameters.

Once all gaps in RNA-seq data were bridged with Sanger data, the original reference sequence was removed to create a consensus sequence for CENP-B in each of the eight species. Some data in the GC-rich region of the sequence did not align properly and had to be aligned manually. Once a consensus sequence was generated for each species, the ExPASy translate tool was used to identify the open reading frame and generate amino acid sequences. These open reading frames were used to identify coding regions in the consensus sequences. The coding DNA sequences in all species were isolated and aligned using a web-based MUSCLE alignment (Edgar 2004).

Amino acid sequence alignments and percent identity data were created using the alignment tool in Geneious with MUSCLE and default parameters. A blastp web browser search of the Non-redundant protein sequences (nr) database in the mammalia taxa (Taxid: 40674) using the consensus CENP-B sequence generated from the alignment of all species was used to find amino acid sequences of CENP-B in other mammals. Several of these sequences, including Homo sapiens (human),

Mus musculus (mouse) Monodelphis domestica (opossum), Manis javanica

(pangolin), Ornithorhynchus anatinus (), Phascolarctos cinereus (koala), and a partial sequence for Sarcophilus harrisii (tasmanian devil), were used in a larger

MUSCLE alignment with all Macropodid sequences to compare CENP-B across

19 mammalian taxa. The UniProt database entry for human CENP-B was used to locate the domains of the protein to make annotations.

Generation of Phylogenies

Aligned DNA sequences of all wallaby species were converted into nexus block format using the web-based tool ALTER (ALignment Transformation

EnviRonment) (Glez-Peïa et. al. 2010). jModelTest 2.1.10 was applied to this dataset to determine the appropriate model to use in phylogenetic analyses (Darriba et. al.

2012, Guindon and Gascuel 2003). The top two scoring models, the GTR+I and

HKY+I, were used to generate phylogenies both with and without estimated priors.

MrBayes 3.2 was used to run the phylogenetic analysis of the aligned sequences with M. rufus set as the outgroup (Huelsenbeck and Ronquist 2001, Ronquist and

Huelsenbeck 2003). The program was run using the default GTR+I (nset=6, rates=propinv) and HKY+I (nset=2, rates=propinv) parameters both with and without the estimated priors determined by jModelTest for 1 million generations and a burnin fraction of 0.25. The resulting tree file was opened using FigTree v1.4.3 to create images for this analysis (Rambaut 2016).

Initial Positive Selection Analysis

Aligned FASTA-format wallaby sequences were also used to perform a preliminary site-based positive selection analysis in codeml, part of the PAML package. The phylogenies generated by MrBayes were used to create a simple,

Newick format reference tree for analysis in codeml. Using this tree and the provided alignment, the program was used to perform a codon-based positive selection analysis with two Nsites models: Nsites=1 (NearlyNeutral) and

20

2(PositiveSelection) (Yang 1997). The resulting outfile was examined to determine sites with a high probability of positive selection (ω>1).

Results

Using a variety of PCR amplification methods (described previously), the complete cDNA and amino acid sequence of CENP-B was successfully delineated in all eight species (nine total individuals) being examined. Early sequencing results in

M. eugenii revealed that the gaps in RNA-seq data were highly GC-rich and somewhat repetitive, making PCR amplification difficult. Of all the alterations in PCR conditions made, the use of fresh cDNA, a GC specific high-fidelity polymerase, and a denaturing agent such as DMSO were the most important in successfully amplifying the gap regions. Species-specific primer design did provide some assistance in amplifying more challenging species such as M. giganteus and W. bicolor. A complete breakdown of primer sets tested and successfully employed in sequencing can be found in Tables 2 and 3. Use of a GC-rich targeted sequencing PCR kit also showed a significant difference in both size and quality of Sanger reads (as seen in Figure 3)

Only four amino acid ambiguities remain among all individuals, most of which are likely explainable by single nucleotide polymorphisms (SNPs) between the two chromosomes of the individuals, indicated by the equal detection of two peaks in the

Sanger data. The complete amino acid alignment of all individuals is shown in Figure

4.

Examination of the alignment reveals a large number of amino acid alterations, insertions, or deletions, in even the most closely related wallaby species.

Several of these differences occur in the functional regions of the protein, including

21 three amino acid positions in the DNA binding helix-turn-helix (blue annotation) region and four in the homodimerization (red annotation) domain (Figure 4). The amino acid percent identity for most Macropodid species (see Table 4) was in the range of 96-98%. The individuals of M. rufogriseus appear to have the highest level of divergence from the other wallabies, with a high percent identity of 96.9% (with

W. bicolor) and a low of 94.6% (with M. rufus), although better measures than strict percent identity are needed to confirm this observation. One potential region of interest is a series of negatively charged residues near the homodimerization domain. This region corresponds to the repetitive GC-rich region in cDNA sequence and has a large when compared to human. The full mammalian alignment

(Figure 5) also shows high levels of insertions or deletions in this region. However, the two main functional domains of the protein (DNA binding and dimerization) are highly conserved across all mammals.

Of the four phylogenetic models used in MrBayes, three (GTR+I, GTR+I with priors, and HKY+I with priors) were able to generate complete phylogenetic trees.

The HKY+I model was unable to resolve a trifurcation containing M. parma, M. eugenii, and M. agilis. The three successful trees had the same branching pattern and similar branch lengths, so the GTR+I tree was selected (see Figure 6) and used in positive selection analysis.

With only one model utilized and no statistical significance analysis, the positive selection analysis performed was preliminary and needs further work to be complete. The site-based approach under the Positive Selection model (Nsites=2) implemented with PAML (Yang 1997) revealed one residue with a probability of

22 positive selection of 95%: V-395. This amino acid does not appear to be in any of the functional domains of the protein. However, one amino acid in the helix-turn-helix domain (E-113) also had a high, although not statistically significant, probability of positive selection at 90.9%. While looking at the positive selection model alone may give some initial guesses as to sites under positive selection, a log likelihood comparison of the Positive Selection model to a Nearly Neutral model is needed to more accurately determine the presence of positive selection. In addition, the use of other positive selection models such as a branch-site comparison would further verify the results obtained using the site model method.

Discussion

On GC-Rich PCR

With the bulk of this project focused on amplification of repetitive, GC-rich regions of DNA, a quick analysis of which methods were most effective for troubleshooting difficult PCRs could be useful in future studies. The use of the complete OneTaq system over standard lab Taq or even GoTaq was essential in successful performance of these PCRs and suggests the importance of using a high quality polymerase and the appropriate buffer to amplify GC-rich regions. If such a system is not available, the use of an additive such as DMSO or Betaine to disrupt secondary structure formation may be used as a substitute to greatly improve yields

(Jensen et. al. 2010). Note that because of the denaturing properties of these chemicals, a lower primer annealing temperature must be used. This does increase the risk of off-target primer annealing, so it is important to verify that the PCR products obtained are the correct size and sequence. Using freshly prepared cDNA

23 and running a gradient of annealing temperatures (especially with addition of a denaturing agent) are also potential solutions to challenging templates fixes before resorting into primer redesign. Once these methods fail, redesigning primers to be either species-specific or target a slightly different region would be the next step in troubleshooting.

CENP-B Divergence and Centromere Drive

Despite their recent evolutionary history, the CENP-B gene of the marsupials examined in this study exhibits a relatively high level of divergence. While most of these changes occur in the region between the two functional domains of the protein, several amino acid changes occurred in the highly conserved DNA binding domain. This region is highly conserved in all mammal species, even across the marsupials and eutherians, but has a surprising number of alterations within the

Macropodids (see figure 5). The helix-turn-helix domain of human and mouse, with an estimated divergence time of 96 million years (Nei and Glazko, 2002) is almost completely identical, with only one amino acid change at the end of the domain. The wallabies on the other hand with a divergence time of just 4.3 million years (Nilsson et. al. 2018) have several amino acid differences in the core of the helix-turn-helix domain, suggesting much more rapid evolution of the protein. Looking at CENP-B as a whole, human and mouse share 90.5% identity. The greatest level of divergence among the wallabies is between M. rufogriseus and M. rufus, with a 94.6% identitiy

(See Table 4). Given the difference in time of evolutionary divergence between these two pairs this level of divergence is surprisingly high, once again suggesting that

CENP-B is rapidly evolving in the Macropodids.

24

M. rufogriseus displays by far the greatest level of divergence in both the DNA binding region and main body of the protein. This is intriguing given the abnormally large of centromeres in M. rufogrisues compared to other species in the Macropus genus (Hayman 1990). Binding of CENP-B occurs along the full length of this expanded chromosome (Bulazel et. al. 2006). This new sequence data creates a direct link between changes in centromeric DNA content, differential binding ability of CENP-B, and the sequence of CENP-B. While this connection from DNA to protein could be used as evidence in favor of centromere drive, it is still not clear how the observed changes in amino acid sequence affect the function of the protein. Further study is needed to determine if CENP-B is simply evolving through coevolution with centromeric DNA or if evolution is driven by the evolutionary arms race mechanism suggested by Henikoff et. al. (2001). Regardless of the mechanism of evolution, rapid changes in the functional domains of CENP-B could alter the binding ability of the protein to equally rapidly evolving centromeric DNA in diverging populations. This decreased binding efficiency would cause incorrect positioning of CENP-A nucleosomes and prevent proper formation of the kinetochore in the potential offspring of these two populations, leading to nondisjunction and inviable offspring

(Fujita et. al. 2014, Fachinetti et. al. 2015). Thus diverging DNA and protein sequences at the centromere have the potential to create a reproductive barrier, eventually leading to speciation (Malik 2009).

The Function of CENP-B and its D/E Rich Region

Despite the body of research that has examined the role of CENP-B in formation of centromeres since its discovery, its specific function is still somewhat

25 unclear (Earnshaw and Rothfield 1985). The presence of the 17-bp CENP-B box within the centromeres of all in humans suggests it plays at least some role in centromere formation or maintenance (Haaf et. al. 1995). Much of the focus of CENP-B studies has been on its two known functional domains, the N-terminal helix-turn-helix DNA binding domain and the C-terminal homodimerization domain.

The high level of conservation in these domains among all mammals is further evidence that these regions are functionally important to CENP-B. However, one region that is overlooked in most studies that seemed intriguing is the repetitive,

Glutamic and Aspartic Acid (Glu and Asp) containing region located just before the dimerization domain. These amino acids posses a negative charge and close proximity of many charged residues makes these regions intrinsically disordered

(Uversky, 2002). Although these intrinsically disordered regions do not adopt a specific conformation, many still play an important role in the function of the protein and have been found to be important in many different cellular pathways and processes (Dyson and Wright, 2002). While not as conserved as the other functional domains, this highly negative region is present in all mammalian CENP-B sequences annotated thus far, differing mainly in the length of the negative chain.

This difference in length is most notable when comparing the marsupials to the eutherians and (see Figure 5). The intrinsically disordered nature of this domain may cause individual amino acid changes to be less severe in altering the overall function of the protein. However, if this region does serve some function other than just forming a linker between the two functional domains, the

26 accumulation of larger changes, such as the observed length differences in monotremes, marsupials, and eutherians, could have an effect on function.

Glu/Asp-rich disordered proteins have been shown primarily to perform three distinct functions (Chou and Wang, 2015) that align with experimental functions of CENP-B. Several studies have implicated CENP-B homologues in silencing of retrotransposons (Cam et. al. 2008 and Lorenz et. al. 2012). Two common functions associated with D/E rich proteins are mRNA processing and regulation of the complex both, of which could be involved in the silencing of retrotransposons (Chou and Wang, 2015). However, even more interesting is the capacity of D/E rich proteins to mimic DNA (Chou and Wang,

2015). The charge of these amino acids allows the protein to form a structure similar to that of the DNA helix by mimicking the negatively charged phosphate backbone (see Figure 7). This allows the protein to bind to other proteins that would normally interact with DNA. This function has been noted in the yeast protein

Chz1, a /H2B (Chou and Wang, 2015).

It is possible that CENP-B could display this DNA mimicry function in interacting with centromeric histone proteins and CENP-A, facilitating the incorporation of CENP-A into the nucleosome in the centromere (Figure 8). A study by Fujita et. al. (2015) suggested that the binding of CENP-B to CENP-A was important in establishing CENP-A nucleosomes, and that binding of CENP-B both stabilized the CENP-A nucleosome through direct interactions and simultaneously destabilized it through epigenetic changes. The D/E rich domain could potentially be key in facilitating this interaction by displacing DNA and destabilizing normal H3

27 nucleosomes at the centromere through an epigenetic mechanism, allowing CENP-A to be incorporated instead. The newly created CENP-A nucleosomes are still somewhat destabilized by these same epigenetic mechanisms, but could be offset by the direct binding interactions of CENP-A and –B in the DNA binding domain (Fujita et. al. 2015), creating an overall stable nucleosome.

Further study is needed to determine if CENP-B is actually capable of performing this DNA mimicry function. More crystal structures of CENP-B and the

CENP-B/CENP-A complex, especially in the rapidly evolving macopodids, could be useful in determining the relationship of CENP-B to the CENP-A nucleosome. An assay to test the binding ability of the D/E rich region specifically to both the H3 and

CENP-A nucleosome could also shed light on this interaction. A transfection study testing the binding ability of CENP-B in one species to different nucleosomes in other species might also yield interesting results. With the large amount of ambiguity still surrounding CENP-B and its function, no region of the protein should be ignored. The strange nature of Glu/Asp chains and lack of previous examination of this region make the D/E rich region an excellent candidate for future efforts in the study of CENP-B.

Future Directions

Much of the analysis performed in this study is preliminary, and more measures of quantification are needed to ascertain the divergence of CENP-B across the eight marsupial species examined herein. Completion of the positive selection analysis would provide better insight into the potential regions of CENP-B that exhibit fixed, species specific. This study was also part of a much larger ongoing

28 study into the effects on centromere drive on proteins throughout the kinetochore complex. By combining sequence data for all of the proteins of the inner kinetochore, we could potentially visualize the ripple effects that divergence in DNA at the centromere have on all the proteins in the assembly complex. A comparative study of this scale will reveal far more information about the complex dynamics of centromere formation and kinetochore assembly than by looking at any one individual protein. Between this large-scale study and further examination into less- studied regions of CENP-B (such as the Glu/Asp-rich region), the confusion surrounding the function of CENP-B may finally be eliminated.

29

Acknowledgements

First and foremost, I would like to thank my advisor and PI Dr. Rachel O’Neill for accepting me into her lab and for all her help in putting this project together.

This has been an incredible experience and I cannot thank her enough for all the support and guidance she has given me. I would also like to thank my graduate student advisor Zachary Duda for allowing me to work on this project, which is just one piece of a larger study years in the making. Zach has been my mentor for my two years in the lab, teaching me all the techniques, data analysis, and general background that I needed to work in the lab. This project would not have been possible without him and I am immensely grateful for all the time and effort he has put into training me and letting me help with his research. I would like to thank my fellow undergraduate Corbinian Wanner for his initial work on the CENP-B project and for his assistance in testing the many PCR primer sets used in this study. Finally,

I would like to thank every member of the O’Neill lab for accepting me into the lab, answering any questions I might have, and creating a welcoming environment for undergraduate research. The support I received from the lab as a whole incredible, and my experience at UCONN would not have been the same without it.

30

References

Brinkley B.R., (1990). Centromeres and : integrated domains on eukaryotic chromosomes. Curr. Opin. Cell Biol. 2(3): 446-452.

Brown J.D., O’Neill R.J. The Evolution of Centromeric DNA Sequences. In: eLS. John Wiley & Sons, Ltd: Chichester.

Bulazel K., Ferreri G.C., Eldridge M.D.B., O’Neill R.J., (2007). Species-specific shifts in centromere sequence composition are coincident with breakpoint reuse in karyotypically divergent lineages. Genome Biol. 8(8):R170.

Bulazel K., Metcalfe C., Ferreri G.C., Yu J., Eldridge M.D.B., O’Neill R.J., (2006). Cytogenetic and Molecular Evaluation of Centromere-Associated DNA Sequences From a Marsupial (Macropodidae: Macropus rufogriseus) . Genetics 172:1129–1137

Cam H.P., Noma K., Ebina H., Levin H.L., Grewal S.I., (2008). Host genome surveillance for retrotransposons by transposon-derived proteins. Nature 451:431– 6

Chou C., Wang A.H.J. (2015), Structural D/E-rich repeats play multiple roles especially in gene regulation through DNA/RNA mimicry. Mol. BioSyst. 11, 2144.

Darriba D., Taboada G.L., Doallo R., Posada D. (2012). jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9(8), 772.

Dyson H.J., Wright P.E (2002). Coupling of folding and binding for unstructured proteins. Curr. Opin. Struct. Biol.;12:54–60.

Earnshaw W.C., Rothfield N., (1985). Identification of a family of human centromere proteins using autoimmune sera from patients with scleroderma. Chromosoma 91:313–321.

Edgar R.C., (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5): 1792–1797.

Fachinetti D., Folco H.D., Nechemia-Arbely Y., Valente L.P., Nguyen K., Wong A.J., Zhu Q., Holland A. J., Desai A., Jansen L.E.T., Cleveland D.W., (2013). A two-step mechanism for epigenetic specification of centromere identity and function. Nature Cell Biol. 15(9)1056-1066.

Fachinetti D., Han J.S., McMahon M.A., Ly P., Abdullah A., Wong A.J., Cleveland D.W., (2015). DNA Sequence-Specific Binding of CENP-B Enhances the Fidelity of Human Centromere Function. Dev Cell. 33(3):314-27.

31

Fujita R., Otake K., Arimura Y., Horikoshi N., Miya Y., Shiga T., Osakabe A., Tachiwana H., Ohzeki J., Larionov V., Masumoto H., Kurumizaka H., (2015). Stable complex formation of CENP-B with the CENP-A Nucleosome. Nucleic Acids Res. May 26;43(10):4909-22

Fukagawa, T., & Earnshaw, W. C. (2014). The Centromere: Chromatin Foundation for the Kinetochore Machinery. Developmental Cell, 30(5), 496–508.

Glez-Peïa D., Gïmez-Blanco D., Reboiro-Jato M., Fdez-Riverola F., Posada D., (2010) ALTER: program-oriented conversion of DNA and protein alignments. Nucleic Acids Research 38(2):W14-W18.

Goldberg I.G., Sawhney H., Pluta A.F., Warburton P.E., Earnshaw W.C., (1996). Surprising Deficiency of CENP-B Binding Sites in African Green Monkey a-Satellite DNA: Implications for CENP-B Function at Centromeres. Mol. Cell Biol. 16(9):5156- 68.

Guindon S. and Gascuel O. (2003). A simple, fast and accurate method to estimate large phylogenies by maximum-likelihood". Systematic Biology 52: 696-704.

Haaf T., Mater A. G., Weinberg J., Ward D.C., (1995). Presence and abundance of CENP-B box sequences in great ape subsets of primate-specific α-satellite DNA. J Mol Evol 41:487-491.

Hasson D., Panchenko T., Salimian K.J., Salman M.U., Sekulic N., Alonso A., Warburton P.E., Black B.E., (2013). The octamer is the major form of CENP-A nucleosomes at human centromeres. Nature Struc. and Mol. Biol. 20:687-695.

Hayman D.L., (1990). Marsupial . Aust. J. Zool. 37: 331–349.

Henikoff S., Ahmad K , Malik H.S., (2001). The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293:1098 – 1102

Huelsenbeck, J. P., Ronquist F., (2001). MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17:754-755.

Jensen M.A., Fukushima M., Davis R.W. (2010) DMSO and Betaine Greatly Improve Amplification of GC-Rich Constructs in De Novo Synthesis. PLOS ONE 5(6): e11024.

Kipling D., Mitchell A.R.,1 Masumoto H., Wilson H.E., Nicol L., Cooke H.J. (1995). CENP-B Binds a Novel Centromeric Sequence in the Asian Mouse Mus caroli. Mol. Cell Biol. 15(8):4009-20.

Lorenz D.R., Mikheyeva I.V., Johansen P., Meyer L., Berg A., Grewal S.I., Cam H.P. (2012). CENP-B cooperates with Set1 in bidirectional transcriptional silencing and genome organization of retrotransposons. Mol Cell Biol. Oct;32(20):4215-25

32

Malik H.S., (2009). The centromere-drive hypothesis: a simple basis for centromere complexity. Prog Mol Subcell Biol 48:33-52.

Masumoto H., Masukata H., Muro Y., Nozaki N., Okazaki T., (1989). A human centromere antigen (CENP-B) interacts with a short specific sequence in alphoid DNA, a human centromeric satellite. J. Cell Biol. 109: 1963-1973.

Masumoto H., Nakano M., Ohzeki J., (2004) The role of CENP-B and a-satellite DNA: de novo assembly and epigenetic maintenance of human centromeres. Chromosome Research 12:543-556.

Muro, Y., Masumoto H., Yoda K., Nozaki N., Ohashi M., OkazakiT., (1992). Centromere protein B assembles human centromeric alpha-satellite DNA at the 17-bp sequence, CENP-B box. J. Cell Biol. 116:585-596.

Nei M., Glazko G.V., (2002). Estimation of Divergence Times for a Few Mammalian and Several Primate Species. The Journal of 93(3):157-164

Nilsson, M. A., Zheng, Y., Kumar, V., Phillips, M. J., & Janke, A. (2018). Speciation Generates in Kangaroos. Genome Biology and Evolution, 10(1), 33– 44.

Ohzeki J., Nakano M., Okada T., Masumoto H., (2002). CENP-B box is required for de novo centromere chromatin assembly on human alphoid DNA. J. Cell Biol. 159(5):765-775.

Palmer, D.K., O’Day, K., Wener, M. H., Andrews, B. S. & Margolis, R. L (1987). A 17-kD centromere protein (CENP-A) copurifies with nucleosome core particles and with histones. J. Cell Biol. 104, 805–815.

Perez-Castro A.V., Shamanski F.L., Meneses J.J., Lovato T.L., Vogel K.G., Moyzis R.K., Pedersen R., (1998). Centromeric protein B null mice are viable with no apparent abnormalities. Dev Biol. 201(2):135-43.

Perpelescu M., Fukagawa T., (2011) .The ABCs of CENPs. Chromosoma 120.5 425- 446.

Rambaut A., (2016). FigTree v1.4.3. http://tree.bio.ed.ac.uk/software/figtree/.

Ronquist F., Huelsenbeck J.P. (2003). MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572-1574.

Sunkel C.E., Coelho P.A., (1995). The elusive centromere: sequence divergence and functional conservation. Current Opinion in Genetics and Development 5:756-767.

33

Suntronpong A., Kugou K., Masumoto H., Srikulnath K., Ohshima K., Hirai H., Koga A. (2016) CENP-B box, a nucleotide motif involved in centromere formation, occurs in a New World monkey. Biol. Lett. 12: 20150817.

Uversky V.N. (2002). What does it mean to be natively unfolded? Eur J Biochem.;269(1):2–12.

Westermann S., Schleiffer A., (2013). Family matters: structural and functional conservation of centromere-associated proteins from yeast to humans. Tends in Cell Biol 23(6):260-269.

Yang, Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood Computer Applications in BioSciences 13:555-556.

Yoda K., Ando S., Okuda A., Kikuchi A., Okazaki T., (1998). In vitro assembly of the CENP-B/alpha-satellite DNA/core histone complex: CENP-B causes nucleosome positioning. Cells. Aug;3(8):533-48

Yoda K., Kitagawa K., Masumoto H., Muro Y., Okazaki T., (1992). A Human Centromere Protein, CENP-B, Has a DNA Binding Domain Containing Four Potential α-Helices at the NH2 Terminus, Which Is Separable from Dimerizing Activity. J. Cell Biol 119(6):1413-27

Zwick M.E., Salstrom J.L., Langley C.H., (1999). Genetic variation in rates of nondisjunction: association of two naturally occurring polymorphisms in the chromokinesin nod with increased rates of nondisjunction in melanogaster. Genetics 152(4):1605-14.

34

Primer Name Target Gap Primer Sequence macroCenpB_321F 1-2 Forward CCGGGAGAAGTCGTACATCATC macroCenpB_322F 1-2 Forward CGGGAGAAGTCGTACATCATCC macroCenpB_330F 1-2 Forward GTCGTACATCATCCAGGAGGTG macroCenpB_415F 1-2 Forward TGAGCACCATCCTCAAGAACAA macroCenpB_416F 1-2 Forward CTGAGCACCATCCTCAAGAACA wbi/mgiCenpB_406F 1-2 Forward TGAGCACCATCCTCAAGAACAA wbi/mgiCenpB_564F 1-2 Forward GTCAACGGCATCATCCTCAAG macroCenpB_1133R 1-2 Reverse TTGGGGTTGGCAGTATAGTCAC macroCenpB_1601R 1-2 Reverse TCCTCTTCGTCTTCCTCCTCAT macroCenpB_1679R 1-2 Reverse TCTCCTTCCTCTCCTTCCTCAG macroCenpB_1680R 1-2 Reverse CTCTCCTTCCTCTCCTTCCTCA macroCenpB_928R 1-2 Reverse CACATGGACCAAATCATGCTCC wbi/mgiCenpB_1093R 1-2 Reverse TTGGGGTTGGCAGTATAGTCAC wbi/mgiCenpB_906R 1-2 Reverse GTAGGAAGTTGTACCACAGGCA macroCenpB_303F 1-3 Forward CTGACGTTCGGGGAGAAGTC macroCenpB_2067R 1-3 Reverse AGGCTAAGTTCTGGTGTGCA macroCenpB_165F 2-3 Forward GCCTCGCAGGATGTCTTCAA magilCenpB_1606F 2-3 Forward CTGAGGAAGGAGAGGAAGGAGA meuCenpB_1787F 2-3 Forward AGGACTCCGAATCAGAGAGTGA mpmCenpB_1408F 2-3 Forward CCTCCCTGAAGCTTTGCACT mpmCenpB_1419F 2-3 Forward CTTTGCACTTTGTGGCAGCT robCenpB_1378F 2-3 Forward CTCCCTGAAGCTTTGCACTTTG rufus/mgiCenpB_1377F 2-3 Forward CTCCCTGAAGCTTTGCACTTTG rufus/mpmCenpB_1234F 2-3 Forward AAGTGGCCTTCTTCCCTTCG rufusCenpB_1206F 2-3 Forward CTAGACGTCTCGGAACTGCG wbi/mgiCenpB_1231F 2-3 Forward TTCAAGTGGCCTTCTTCCCTTC wbiCenpB_1480F 2-3 Forward ATCATGACATCGACGTGACCC macroCenpB_1916R 2-3 Reverse GGTCAGGTACCGTTTGACCA macroCenpB_1998R 2-3 Reverse CGTGGTTCTTTCTGGTCAGA macroCenpB_2020R 2-3 Reverse TGGTTCTTTGTGGTCACATGGA magilCenpB_1804R 2-3 Reverse TCACTCTCTGATTCGGAGTCCT meuCenpB_2020R 2-3 Reverse CGTGGTTCTTTCTGGTCACATG mpmCenpB_1750R 2-3 Reverse GGCTCCAAAGGCCACAAAAC robCenpB_1731R 2-3 Reverse CTGCTCATACTGGGCTCCATAG rufus/mpmCenpB_1891R 2-3 Reverse AAGTATGCCATGGCCTCACC rufusCenpB_1640R 2-3 Reverse ACTCTCCTCCTCCTCTTCTTCC rufusCenpB_1877R 2-3 Reverse GGGAACGGGCACTTCATCAT wbi/mgiCenpB_1786R 2-3 Reverse TCACTCTCTGATTCGGAGTCCT wbi/mgiCenpB_1863R 2-3 Reverse GAACGGGCACTTCATCATCATC wbiCenpB_1960R 2-3 Reverse AAGTGAAGAATGTGGCTCTGGA Table 1: Complete list of primers used in this study. Primer name describes the target species (macro or species-specific) and location in the CENP-B gene while Target Gap indicates the intended gap to be bridged by the product.

35

Table 2: Primer Sets Tested in Each Species Primer Set Target Gap Expected Product Length (bp) Species Tested Species Used for Sequencing 303F-846R 1 to 2 543 meu none 321F-1679R 1 to 2 1358 meu none 321F-1680R 1 to 2 1359 meu none 322F-1601R 1 to 2 1279 meu none 322F-1679R 1 to 2 1357 meu none 322F-1680R 1 to 2 1358 meu none 322F-928R 1 to 2 606 meu none 330F-1133R 1 to 2 803 meu none 330F-1679R 1 to 2 1349 meu none 330F-1680R 1 to 2 1350 meu none 330F-928R 1 to 2 598 meu none 405F-846R 1 to 2 441 meu none 406F-1093R 1 to 2 687 mgi, wbi none 406F-906R 1 to 2 500 mgi, wbi mgi, wbi 415F-1133R 1 to 2 718 meu none 416F-1133R 1 to 2 717 meu none 416F-928R 1 to 2 512 all magil, meu, mpm, rob, rufo, rufus 564F-1093R 1 to 2 529 mgi, wbi none 564F-906R 1 to 2 342 mgi, wbi none 303F-2067R 1 to 3 meu none 1072F-1982R 2 to 3 910 mpm mpm 1231F-1863R 2 to 3 632 mgi, wbi mgi 1231F-1960R 2 to 3 729 mgi, wbi none 1377F-1786R 2 to 3 409 mgi none 1378F-1781R 2 to 3 403 rob rob 1460F-1786R 2 to 3 326 wbi wbi 165F-1916R 2 to 3 all magil, meu, rufo 165F-1998R 2 to 3 meu none 165F-2020R 2 to 3 meu none 1206F-1877R 2 to 3 cleanup 671 rufus none 1206F-1891R 2 to 3 cleanup 685 rufus rufus 1234F-1750R 2 to 3 cleanup 516 mpm none 1234F-1877R 2 to 3 cleanup 643 rufus rufus 1234F-1891R 2 to 3 cleanup 657 mpm, rufus none 1408F-1750R 2 to 3 cleanup 342 mpm mpm 1408F-1891R 2 to 3 cleanup 483 mpm none 1419F-1750R 2 to 3 cleanup 331 mpm none 1419F-1891R 2 to 3 cleanup 472 mpm mpm 1606F-1804R 2 to 3 cleanup 198 magil magil 1787F-2020R 2 to 3 cleanup 233 meu meu

Table 2: List of all combinations of primer sets used to amplify PCR products, including the intended gap to bridge, the species primer sets were tested in, and the species there resulting PCR products were used for sequencing. Highlighting indicates primer sets were used to generate sequence data, while brackets indicate an approximated product length.

36 r

r

o

o

l

l

t

t

c

6

0

c

o

o

6

0

6

6

4

4

c

c

e

e

8

6

i

i

2

0

0

0

r

r

5

5

7

4

i

i

b

b

3

5

9

4

1

1

D

D

.

.

W

W

Peak3

2

2

4

4

2

2

3

3

s

s

e

]

u

u

t

n

6

0

c

e

e

o

5

2

8

6

l

1

0

s

4

s

0

e

i

i

6

1

2

1

c

r

9

6

5

2

r

r

i

1

5

9

4

b

1

1

g

g

D

[

u

o

o

S

f

f

u

u

r

r

.

.

M

M

8

8

8

8

1

1

1

1

Peak2

s

s

e

]

u

u

t

n

6

0

c

e

e

o

5

2

8

6

l

1

0

s

4

s

0

e

i

i

6

1

2

1

c

r

9

6

5

2

r

r

i

1

5

9

4

b

1

1

g

g

D

[

u

o

o

S

f

f

u

u

r

r

.

.

M

M

s

s

u

u

t

t

t

t

s

s

c

c

1

8

u

u

3

2

8

6

0

0

e

e

3

7

b

b

5

1

2

1

r

r

6

5

7

3

i

i

o

o

3

5

9

4

1

1

r

r

D

D

.

.

Peak1

M

M

s

s

u

u

e

e

t

t

t

t

3

1

c

c

n

n

2

0

6

6

6

3

4

2

e

e

a

a

3

0

0

0

r

r

8

2

5

5

i

g

i

g

6

5

9

4

i

i

1

1

D

D

g

g

.

.

M

M

t

7

4

c

3

7

3

e

4

r

s

s

s

8

2

i

6

t

t

u

u

u

1

1

D

f

f

f

c

c

0

7

3

2

8

6

4

4

4

e

e

u

4

7

u

u

6

1

2

1

r

r

r

r

r

5

5

5

6

3

i

i

2

5

9

4

.

.

.

t

1

1

D

D

1

6

c

5

M

M

M

9

0

e

8

r

8

2

i

6

1

1

D

t

1

9

c

2

9

1

e

7

a

a

a

r

8

4

i

4

t

t

1

1

m

m

m

D

c

2

2

c

r

r

r

0

2

8

6

0

4

e

e

8

7

4

a

a

a

1

1

2

1

r

r

6

5

9

0

5

i

i

p

p

p

9

5

9

4

t

1

1

.

.

.

D

D

0

8

c

2

5

0

e

M

M

M

4

r

7

4

i

3

1

1

D

e

s

s

s

i

i

i

]

t

t

n

l

l

l

i

i

i

4

6

6

0

c

c

o

8

5

2

8

6

g

g

g

l

0

0

1

0

0

0

0

e

e

9

6

1

2

1

c

a

a

a

r

r

8

6

9

6

6

6

2

i

i

1

1

5

9

4

.

.

.

b

iprep workflowordirect sequencing

1

1

1

1

D

D

[

u

M

M

M

S

3) and additionaland3) cleanup sequencing

-

i

i

i

i

i

i

e

e

n

n

n

]

t

n

n

e

e

e

0

7

6

0

c

o

o

3

5

2

8

6

g

g

g

l

l

2

8

1

0

0

4

0

e

3

6

1

2

1

c

c

u

u

u

r

0

7

9

6

6

5

2

i

2

1

5

9

4

e

e

e

b

b

2

1

1

1

2 and and 2 2

D

.

.

.

[

u

u

-

S

S

M

M

M

)

)

)

C

C

C

˚

˚

˚

(

(

(

h

h

h

e

e

e

t

t

t

n

n

n

r

r

r

g

g

g

o

o

o

u

u

u

i

i

i

n

n

n

t

t

t

t

t

t

e

e

e

a

a

a

a

a

a

r

r

r

L

L

L

r

r

r

e

e

e

t

t

t

a

a

a

r

r

r

r

r

r

c

c

c

p

p

p

3

2

p

p

p

p

s

s

s

-

-

e

e

e

e

e

e

u

u

u

e

e

e

u

e

e

e

m

m

m

2

1

i

i

i

r

r

r

d

d

d

m

m

m

n

seq dataseq (peaks 1 data,seq captured Geneious in 8.1.9

m

m

m

e

e

e

c

c

c

i

i

i

i

i

i

P

P

P

k

k

o

o

o

- -

a

r

r

r

T

T

T

e

e

e

r

r

r

r

r

r

a

a

f

f

f

e

p

p

p

p

p

p

p

p

p

P

P

P

l

g

g

g

e

e

o

o

o

S

S

S

C

Summaryof primer sets used forinitial Sanger Sequencingthrough

n

n

n

R

F

R

F

P

R

F

P

d

d

d

i

i

i

d

d

d

l

l

l

e

e

e

o

o

o

a

a

a

t

t

t

h

h

h

e

c

e

c

e

c

t

t

t

e

e

e

n

n

n

e

e

e

p

p

p

n

n

n

x

x

x

M

M

M

A

A

A

E

E

E

R

R

R

off ofamplifiedthe PCR figureoff to The the product. peaks the shows right and Table Table 3: RNA gapsin after.Method preparation of describes how sequencing PCR samples were usingprepared, either asubcloning and min RNA gapsin

C

C

C

P

P P

37

Amino Acid Percent Identities

8 2

8 4

1 2

s

s

s

1 3

s

u

i

n

l r

i

u

a u

s s

e

s

s

u o

i

e

n

t

i

l

u u

l

t

c

u

s

m

p e

i

o

f

s

e e

r

n

u

g

a

g

c

s s

u

u

a

a

i

s

i i

b

u a

r

r r

g

p

b

.

o

e

i

.

m

o

g g

.

r

.

.

g

s

o o

.

M

M

m

.

f f

M

u

W

M

o

u u

M

M

r r

H M

. .

M M

Homo sapiens 90.5 72.9 72.1 71.9 72.1 72.7 72.7 73.5 73.5 73.4

Mus musculus 90.5 70.7 70.4 70 70.1 70.9 70.9 71.6 71.4 71.4

M. eugenii 72.9 70.7 97.1 95.9 96.3 97.5 96.6 97.6 97.3 97.6

M. agilis 72.1 70.4 97.1 95.5 95.8 97 95.8 96.5 96.5 96.5

M. rufogriseus 1188 71.9 70 95.9 95.5 99.7 95.6 94.6 96.3 95.8 96.6

M. rufogriseus 3242 72.1 70.1 96.3 95.8 99.7 95.9 94.7 96.6 95.9 96.9

M. parma 72.7 70.9 97.5 97 95.6 95.9 96.9 96.4 96.6 96.6

M. rufus 72.7 70.9 96.6 95.8 94.6 94.7 96.9 97.5 97.6 97.3

M. giganteus 73.5 71.6 97.6 96.5 96.3 96.6 96.4 97.5 98.1 99

M. robustus 73.5 71.4 97.3 96.5 95.8 95.9 96.6 97.6 98.1 98.5

W. bicolor 73.4 71.4 97.6 96.5 96.6 96.9 96.6 97.3 99 98.5

Table 4: Pairwise comparison of CENP-B amino acid percent identities in all Macropodid species with human and mouse sequences included for comparison

38

Figure 1: An illustration of the general mechanism of centromere drive. Initial expansion of satellite allows more microtubule attachments and is driven to fixation by female meiosis, but is soon evened out by in centromere proteins. Adapted from Malik 2009.

Figure 2: Cartoon of the general kinetochore complex. Adapted from Perpelescu and Fukagawa, 2011.

39

A1

A2

B1

B2

Figure 3: The difference in sequence quality achieved from different Sequencing PCR kits. All four samples are from the same M. rufogriseus 1188 PCR product using the 165F primer. Samples A1 and A2 were generated using the BigDye Terminator v3.1 Cycle Sequencing Kit, while B1 and B2 were generated using the dGTP BigDye Terminator v3.0 Cycle Sequencing Kit.

40

-

-

B B

-

B andB

turn

B is is B

-

-

-

Amino

B in all in B eight

er CENPer

-

H: The DNAThe H:

inghelix

-

T

-

functionaldomains intrinsically Figure 4: acidalignment of CENP wallabies with CENPhuman usedfor reference. H bind helixdomain. rve superfamily:a conserved ancestralintegrase domain.Asp/Glu region: rich Negativelycharged, disordered regions theprotein. of Homodimerization Region Domain: whereCENP capable of dimerizing with anoth Createdprotein. in Geneious8.1.9

41

B B

-

-

B B

-

turn

B is

-

-

CENP

rich region: region: rich

-

edomain.

H: The DNAThe H:

-

T

-

Aminoacid alignment theMacropod of consensussequence severaland other speciesmammal. of H bindinghelix helixdomain. rve superfamily:a conservedancestral integras Asp/Glu Negativelycharged, intrinsically disordered regionsof theprotein. Homodimerization Region Domain: whereCENP capabledimerizing of another with CENP Createdprotein. in Geneious8.1.9 Figure 5: 42

M.rufus

M.robustus

M.parma

M.agilis

M.eugenii

M.rufo1188

M.rufo3242

M.giganteu

W.bicolor

Figure 6: Phylogeny of the Macropodidae generated from CENP-B cDNA sequence data using the GTR+I model.

A B

Figure 7: Model of D/E residues in proteins as DNA mimics. Pictured are A) Chz1 and B) ANP32E, two yeast chaperone proteins, with the H2A/H2B complex. Adapted from Chou and Wang 2015.

43

CenpB-Box

H3 1. The nucleosome at the H2A, H2B, H4 centromere with DNA containing } the CENP-B Box

DNA CenpB Dimer

2. The CENP-B Dimer binds to the CENP-B Box

CenpA 3. CENP-A binds to the N-Terminal domain of CENP-B while the Glu/Asp- rich C-Terminal end displaces DNA and interacts with H3 via DNA mimicry

4. The interaction of CENP-B and H3 destabilizes the nucleosome and displaces H3, while the stable CENP-A/- B complex allows CENP-A to replace H3 in the nucleosome

Dissassociation 5. With the CENP-A of CenpB Dimer nucleosome octamer in place, CENP-B and H3 dissociate

Figure 8: Formation of the octameric CENP-A nucleosome (Hasson et. al. 2013) facilitated by formation of a stable CENP-A/-B complex (Fujita et. al. 2015) and DNA mimicry interactions of CENP-B with the nucleosome (Chou and Wang 2015).

44

Supplemental

FASTA Format CENP-B cDNA Sequences

>Macropus agilis CENPB ATGGGCCCCAAGCGGAGGCAGCTGACGTTCCGGGAGAAGTCGTACATCATCCAGGAGGTGGAGAAGAACCC AGACCTCCGCAAGGGCGAGATCGCCCGGCGCTTCAACATCCCGCCGTCCACGCTGAGCACCATCCTCAAGA ACAAGCGGGCCATCCTGGCGTCCGAGCGCAAGTTCGGTGTGGCCTCCACCTGCCGCAAGACCAACAAGCTG TCGCCCTACGACAAGCTCGAGGGGCTGCTCATCGCCTGGTTCCAGCAGATCCGCGCCGCCGGCCTGCCGGT CAACGGCATCATCCTCAAGGAGAAGGCGCTGCGCATCGCTGAGGAGCTGGGCATGGACGACTTCACCGCCT CCAACGGCTGGCTGGACCGCTTCCGCCGCAGGCACGGGGTCATGACCAGGAACGGGGTGACCAAGAGCCGG GCCCAGCCACCGACCCCGGCCCAGCCGCCTCCCCTTCCCCGGGCCCCGGCCTCGGCGTCGGCCGCAGCGGG CGGTGCAGGGGGCGGTGGTGCAGGGGGCGGTGGTGCGGGCAGCCGGGGCTGGTGGGAGGAGCAGCTGCCCG CCTTGGCCGAGGGCTACGCCTCGCAGGATGTCTTCAATGCCACCGAGACCTGCCTGTGGTACAACTTCCTA CCTGACCAGGCCCCTGCCCTTGGCGGCGAGGCCCGGCCCAGCCCCCGGCGGGGCACCGAGAGGCTCGTGGT GCTCCTGTGTGCCAATGCCGATGGCAGTGAGAAGCTTCCCCCGCTGGTGGTGGGCAAGCTGGCCAAGCCCT CGGGATCCTGGGCTGGCCTGCCCTGTGACTATACTGCCAACCCCAAGGGGGGCGTCACGGCCCAGGAGCTC GCCAAGTACCTGAAGGCCCTGGATGCCCGCATGGCCAACGAGTCCCGCAGGATCCTGCTGTTGGCCAGTCG GCCGGCCGCCCCAGCCCTAGACGTCTCGGAACTGCGGCACGTTCAAGTGGCCTTCTTCCCTTCGGATGCCC TTCACCCCTTGGAGCGGGGGGTTGTCCAGCAGGTCAAGGGGCACTACCGACAGGCCCTGCTGCTCAAAGCC ATGGCCGCCCTGGAGGACCACGACCGGGCGCGGCTGCAGCTGGGCCTCCCTGAAGCTTTGCACTTTGTGGC GGCTGCGTGGGAGGCCGTGGAGCCCGCAGACATTGCCGCCTGCTTCCGAGAGGCTGGCTTCGGCGGTGGCC CAGGCGTGGGTCACATCGACGTGACCCTCAAGAGTGATGAGGAGGAAGACGAGGAGGAGGGGGAGGGGGAG GACGAGGAGGATGAGGAGGGCGAGGAAGAGGAGGGAGAGGAGGCTGAGGAAGGAGAGGAAGGAGAGGAGGA GGAGGTAGAAGAGGAGGAGGAGAGTTCCTCAGAGGGCTTAGAGGCAGAAGACTGGTCCCAGGGCGGTGGTG GGGCAGGGGGCAGTTTTGTGGCCTTTGGAGCCCAGTATGAGCAGGGGGGCCCTGCTGTTCAGCTCCTGGAG GCCGGGGAGGACTCCGAATCAGAGAGTGATGAGGAAGATGAGGATGACGAYGATGACGATGATGATGATGA CGATGATGATGACGATGATGATGATGATGAAGTGCCAGTTCCCAGTTTTGGTGAGGCCCTGGCATACTTTG CCATGGTCAAACGGTACCTGACCTCTTTCCCCATCGATGACCGTGTCCAGAGCCACATTCTTCACTTGGAG CATGATTTGATCCATGTGACCAGAAAGAACCACGCCAGACAGATGTCATCCCGAGCCCTGGCACACCAGAA CTTAGCCTGA

>Macroupus eugenii CENPB ATGGGCCCCAAGCGGAGGCAGCTGACGTTCCGGGAGAAGTCGTACATCATCCAGGAGGTGGAGAAGAACCC AGACCTCCGCAAGGGCGAGATCGCCCGGCGCTTCAACATCCCGCCGTCCACGCTGAGCACCATCCTCAAGA ACAAGCGGGCCATCCTGGCGTCCGAGCGCAAGTTCGGTGTGGCCTCCACCTGCCGCAAGACCAACAAGCTG TCGCCCTACGACAAGCTTGAGGGGCTGCTCATCGCCTGGTTCCAGCAGATCCGCGCCGCCGGCCTGCCGGT CAACGGCATCATCCTCAAGGAGAAGGCGCTGCGCATCGCCGAGGAGCTGGGCATGGAAGACTTCACCGCCT CCAACGGCTGGCTGGACCGCTTCCGCCGCAGGCACGGGGTCATGACCAGGAACGGGGTGACCAAGAGCCGG GCCCAGCCACCGACCCCGGCCCAGCCGCCTCCCCTTCCCCGGGCCCCGGCCTCGGCGTCGGCCGCAGCGGG CGGTGCAGGGGGCGGTGGTGCGGGCAGCCGGGGCTGGTGGGAGGAGCAGCTGCCCGCCTTGGCCGAGGGCT ACGCCTCGCAGGATGTCTTCAATGCCACCGAGACCTGCCTGTGGTACAACTTCCTACCTGACCAGGCCCCT GCCCTTGGCGGCGAGGCCCGGCCCAGCCCGCGGCGGGGCACCGAGAGGCTCGTGGTGCTCCTGTGTGCCAA CGCCGATGGCAGTGAGAAGCTTCCCCCGCTGGTGGTGGGCAAGCTGGCCAAGCCCTCGGGATCCCGGGCTG GCCTGCCCTGTGACTATACTGCCAACCCCAAGGGGGGCGTCACGGCCCAGGAGCTCGCCAAGTACCTGAAG GCCCTGGATGCCCGCATGGCCGACGAGTCCCGCAGGATCCTGCTGTTGGCCAGTCGGCCGGCCGCCCCAGC CCTAGACGTCTCGGAACTGCGGCACGTTCAAGTGGCCTTCTTCCCTTCCGATGCCCTTCACCCCTTGGAGC GGGGGGTTGTCCAGCAGGTCAAGGGGCACTACCGACAGGCCCTGCTGCTCAAAGCCATGGCCGCCCTGGAG GACCACGACCGGGCGCGGCTGCAGCTGGGCCTCCCTGAAGCTTTGCACTTTGTGGCGGCTGCGTGGGAGGC CGTGGAGCCCGCAGACATTGCCGCCTGCTTCCGAGAGGCTGGCTTCGGCGGTGGCCCAGGCGTGGGTCACA TCGACGTGACCCTCAAGAGTGATGAGGAGGAAGACGAAGAGGAGGAGGAGGGGGAGGACGCGGAGGATGAG GAGGGCGAGGAAGAGGAGGGAGAGGAGGCTGAGGAAGGAGAGGAAGGAGAGGAGGAGGAGGAAGAAGAGGA GGATGAGAGTTCCTCAGAGRGCTTAGAGGCAGAAGACTGGTCCCAGGGCGGTGGTGGGGCAGGGGGCAGTT TTGTGGCCTTTGGAGCCCAGTATGAGCAGGGGGGCCCTGCTGTTCAGCTCCTGGAGGCCGGGGAGGACTCC GAATCAGAGAGTGATGAGGAAGATGAGGATGACGATGATGATGATGATGACGATGATGATGACGATGACGA TGATGATGAAGTGCCCGTTCCCAGTTTTGGTGAGGCCATGGCATACTTTGCCATGGTCAAACGGTACCTGA

45

CCTCTTTCCCCATCGATGACCGTGTCCAGAGCCACATTCTTCACTTGGAGCATGATTTGGTCCATGTGACC AGAAAGAACCACGCCAGACAGATGTCATCCCGAGCCCTGGCACACCAGAACTTAGCCTGA

>Macropus giganteus CENPB ATGGGCCCCAAGCGGAGGCAGYTGACGTTCCGGGAGAAGTCGTACATCATCCAGGAGGTGGAGAAGAACCC AGACCTCCGCAAGGGCGAGATCGCCCGGCGCTTCAACATCCCGCCGTCCACGCTGAGCACCATCCTCAAGA ACAAGCGGGCCATCCTGGCGTCCGAGCGCAAGTTCGGTGTGGCCTCCACCTGCCGCAAGACCAACAAGCTG TCGCCCTACGACAAGCTCGAGGGGCTGCTCATCGCCTGGTTCCAGCAGATCCGCGCCGCCGGCCTGCCGGT CAACGGCATCATCCTCAAGGAGAAGGCGCTGCGCATCGCCGAGGAGCTGGGCATGGACGACTTCACCGCCT CCAACGGCTGGCTGGACCGCTTCCGCCGCAGGCACGGGGTCATGACCAGGAATGGGGTGACCAAGAGCCGG GCCCAGCCACCGACCCCGGCCCAGCCGCCTCCCCTTCCCCGGGCCCCAGCCTCGGTGTCGGCCGCAGCGGG CAGTGCAGGGGGCGGTGGTGCGGGCAGCCGGGGCTGGTGGGAGGAGCAGCTGCCCGCCTTGGCCGAGGGCT ACGCCTCGCAGGATGTCTTCAATGCCACCGAGACCTGCCTGTGGTACAACTTCCTACCTGACCAGGCCCCT GCCCTTGGCGGCGAGGCTCGGCCCAGCCCCCGGCGGGGCACCGAGAGGCTCGTGGTGCTCCTGTGTGCCAA CGCCGATGGCAGTGAGAAGCTTCCCCCGCTGGTGGTGGGCAAGCTGGCCAAGCCCTCGGGATCCCGGGCTG GCCTGCCCTGTGACTATACTGCCAACCCCAAGGGGGGYGTCACGGCCCAGGAGCTCGCCAAGTACCTGAAG GCCCTGGATGCCCGCATGGCCGACGAGTCCCGCAGGATCCTGCTGTTGGCCAGTCGGCCAGCCGCCCCAGC CCTAGACGTCTCGGAACTGCGGCACGTTCAAGTGGCCTTCTTCCCTTCGGATGCCCTTCAGCCCTTGGAGC GRGGGGTTGTCCAGCAGGTCAAGGGGCACTACCGACAGGCCCTGCTGCTCAAAGCCATGGCCGCCCTAGAG GACCACGACCGGGCGCGGCTGCAGCTGGGCCTCCCTGAAGCTTTGCACTTTGTGGCGGCTGCGTGGGAGGC CGTGGAGCCCGCAGACATTGCCGCCTGCTTCCGAGAGGCTGGCTTTGGCGGTGGCCCAGGCCAAGAAATCG ACGTGACCCTCAAGAGTGATGAGGAGGAAGACGAGGAGGAGGAGGAGGGGGAGGACGCGGAGGATGAGGAG GGCGAGGAAGAGGAGGGAGAGGAGGSTGAGGAAGGAGAGGAAGGAGAGGAGGAGGAGGAAGAAGAGGAGGA GGAGAGTTCCTCAGAGGGCTTAGAGGCAGAAGACTGGTCCCAGGGCGGTGGTGGGGCAGGGGGCAGTTTTG TGGCCTATGGAGCCCAGTATGAGCAGGGGGGCCCTGCTGTTCAGCTCCTGGAGGCCGGGGAGGACTCCGAA TCAGAGAGTGATGAGGAAGATGAGGATGACGATGATGACGATGATGATGATGACGATGATGATGATGATGA CGATGATGATGAAGTGCCCGTTCCCAGTTTTGGTGAGGCCATGGCATACTTTGCCATGGTCAAACGGTACC TGACCTCTTTCCCCATCGATGACCGTGTCCAGAGCCACATTCTTCACTTGGAGCATGATTTGGTCCATGTG ACCAGAAAGAACCACGCCAGACAGACGTCATCCCGAGCCCTGGCACACCAGAACTTAGCCTGA

>Macropus parma CENPB ATGGGCCCCAAGCGGAGGCAGCTGACGTTCCGGGAGAAGTCGTACATCATCCAGGAGGTGGAGAACAACCC AGACCTCCGCAAGGGCGAGATCGCCCGGCGCTTCAACATCCCGCCGTCCACGCTGAGCACCATCCTCAAGA ACAAGCGGGCCATCCTGGCGTCCGAGCGCAAGTTCGGTGTGGCCTCCACCTGCCGCAAGACCAACAAGCTG TCGCCCTACGACAAGCTCGAGGGGCTGCTCATTGCCTGGTTCCAGCAGATCCGCGCCGCCGGCCTGCCGGT CAACGGCATCATCCTCAAGGAGAAGGCGCTGCGCATCGCCGAGGAGCTGGGCATGGAGGACTTCACCGCCT CCAATGGCTGGCTGGACCGCTTCCGCCGCAGGCACGGGGTCATGACCAGGAACGGGGTGACCAAGAGCCGG GCCCAGCCACCGACCCCGGCCCAGCCGCCTCCCCTTCCCCGGGCCCCGGCCTCGGCGTCGGCCGCAGCGGG CGGTGCAGGGGGCGGTGGTGTGGGCAGCCGGGGCTGGTGGGAGCAGCAGCTGCCCGCCTTGGCCGAGGGCT ACGCCTCGCAGGATGTCTTCAATGCCACCGAGACCTGCCTGTGGTACAACTTCCTACCTGACCAGGCCCCT GCCCTTGGCGGCGAGGCCCGGCCCAGCCCCCGGCAGGGCACCGAGAGGCTCGTGGTGCTCCTGTGTGCCAA CGCCGATGGCAGTGAGAAGCTTCCCCCGCTGGTGGTGGGCAAGCTGGCCAAGCCCTCGGGATCCTGGGCTG GCCTGCCCTGTGACTATACTGCCAACCCCAAGGGGGGCGTCACGGCCCAGGAGCTCGCCAAGTACCTGAAG GCCCTGGATGCCCGCATGGCCAACGAGTCCCGCAGGATCCTGCTGTTGGCCAGTCGGCCGGCCGCCCCAGC GCTAGACGTCTCGGAACTGCGGCACGTTCAAGTGGCCTTCTTCCCTTCGGATGCCCTTCACCCCTTGGAGC GGGGGGTTATCCAGCAGGTCAAGGGGCACTACCGACAGGCCCTGCTGCTCAAAGCCATGGCCGCCCTGGAG GACCACGACCGGGCGCGGCTGCAGCTGGGCCTCCCTGAAGCTTTGCACTTTGTGGCAGCTGCGTGGGAGGC TGTGGAGCCCGCAGACATTGCCGCCTGCTTCCGAGAGGCTGGCTTCGGCGGTGGCCCAGGCGTGGGTCACA TCGACGTGACCCTCAAGAGTGATGAGGAGGAAGACGAGGAGGAGGAGGGGGAGGACGAGGAGGAGGAGGGC GAGGAAGAGGAGGGAGAGGAGGCTGAGGAAGGAGAGGAAGGAGAGGAGGAGGAGGAAGAAGAGGAGGAGGA GAGTTCCTCAGAGGGCTTAGAGGCAGAAGACTGGTCCCAGGGCGGTGGTGGGGCAGGGGGCAGTTTTGTGG CCTTTGGAGCCCAGTATGAGCAGGGCGTCCCTGCTGTTCAGCTCCTGGAGGCCGGGGAGGACTCCGAATCA GAGAGTGATGAGGAAGATGAGGATGACGATGATGACGATGATGATGATGACGATGATGATGACGATGACGA TGATGATGAAGTGCCCGTTCCCAGTTTTGGTGAGGCCATGGCATACTTTGCCATGGTCAAACGGTACCTGA CCTCTTTCCCCATCGATGACCGTGTCCAGAGCCACATTCTTCACTTGGAGCATGATTTGGTCCATGTGACC AGAAAGAACCACGCCAGACAGATGTCATCCCGAGCCCTGGCACACCAGAACTTAGCCTGA

46

>Macropus robustus CENPB ATGGGCCCCAAGCGGAGGCAGCTGACGTTCCGGGAGAAGTCGTACATCATCCAGGAGGTGGAGAAGAATCC AGACCTCCGCAAGGGCGAGATCGCCCGGCGCTTCAACATCCCGCCGTCCACGCTGAGCACCATCCTCAAGA ACAAGCGGGCCATCCTGGCGTCCGAGCGCAAGTTCGGTGTGGCCTCCACCTGCCGCAAGACCAACAAGCTG TCGCCCTACGACAAGCTCGAGGGGCTGCTCATCGCCTGGTTTCAGCAGATCCGCGCCGCCGGCCTGCCGGT CAACGGCATCATCCTCAAGGAGAAGGCGCTGCGCATCGCCGAGGAGCTGGGCATGGACGACTTCACCGCCT CCAATGGCTGGCTGGACCGCTTCCGCCGCAGGCACGGGGTCATGAGCAGGAACGGGGTGACCAAGAGCCGG GCCCAGCCACCGACCCCGGCCCAGCCGCCTCCCCTTCCCCGGGCCCCGGCCTCGGCGTCGGCCGCAGCGGG CGGTGCAGGGGGCGGTGGTGCGGGCAGCCGGGGCTGGTGGGAGGAGCAGCTGCCCATCTTGGCCGAGGGCT ACGCCTCGCAGGATGTCTTCAATGCCACCGAGACCTGCCTGTGGTACAACTTCCTACCTGACCAGGCCCCT GCCCTTGGCGGCGAGGCCCGGCCCAGCCCCCGGCGGGGCACCGAGAGGCTCGTGGTGCTCCTGTGTGCCAA CGCCGATGGCAGTGAGAAGCTTCCCCCGCTGGTGGTGGGCAAGCTGGCCAAGCCCTCGGGATCCCGGGCTG GCCTGCCCTGTGACTATACTGCCAACCCCAAGGGGGGCGTCACGGCCCAGGAGCTCGCCAAGTACCTGAAG GCCCTGGATGCCCGCATGGCCGACGAGTCCCGCAGGATCCTGCTGTTGGCCAGTCGGCCGGCCGCCCCAGC CCTAGACGTCTCGGAACTGCGGCACGTTCACGTGGCCTTCTTCCCTTCGGATGCCCTTCATCCCTTGGAGC GGGGGGTTGTCCAGCAGGTCAAGGGGCACTACCGACAGGCCCTGCTGCTCAAAGCCATGGCCGCCCTGGAG GACCACGACCGGGCGCGGCTGCAGCTGGGCCTCCCTGAAGCTTTGCACTTTGTGGCGGCTGCGTGGGAGGC CGTGGAGCCCGCAGACATTGCCGCCTGCTTCCGAGAGGCTGGCTTYGGCGGTGGCCCAGGCCAAGACATCG ACGTGACCCTCAAGAGTGATGAGGAGGAAGACGAGGAGGAGGAGGAGGGGGAGGACGAGCAGGATGAGGAG GGTGAGGAAGAGGAGGGAGAGGAGGCTGAGGAAGGAGAGGAAGGAGAGGAGGAGGAGGAAGAAGAGGAGGA GGAGAGTTCCTCAGAGGGCTTAGAGGCAGAAGACTGGTCCCAGGGCGGTGGTGGGGCAGGGGGCAGTTTTG TGGCCTATGGAGCCCAGTATGAGCAGGGGGTCCCTGCTGTTCAGCTCCTGGAGGCCGGGGAGGATTCCGAA TCAGAGAGTGATGAGGAAGATGAGGATGACGATGATGACGATGATGATGATGACGATGATGATGACGATGA CGATGATGATGAAGTGCCCGTTCCCAGTTTTGGTGAGGCCATGGCATACTTTGCCATGGTCAAACGGTACC TGACCTCTTTCCCCATCGATGACCGTGTCCAGAGCCACATTCTTCACTTGGAGCATGATTTGGTCCATGTG ACCAGAAAGAACCACGCCAGACAGACGTCATCCCGAGCCCTGGCCCACCAGAACTTAGCCTGA

>Macropus rufogriseus ind.1188 CENPB ATGGGCCCCAAGCGGAGGCAGCTGACGTTCCGGGAGAAGTCGTACATCATCCAGGAGGTGGAGAACAACCC AGACCTCCGCAAGGGCGAGATCGCCCGGCGCTTCAACATCCCGCCGTCCACGCTGAGCACCATCCTCAAGA ACAAGCGGGCCATCCTGGCGTCCGAGCGCAAGTTCGGTCTGGCCTCCACCTGCCGCAAGACCAACAAGCTG TCGCCTTACGACAAGCTCGAGGGGCTGCTCATCGCCTGGTTCCAGCAGATCCGCGCCGCCGGCCTGCAGGT CAACGGCATCATCCTCAAGGAAAAGGCGCTGCGCATCGCCGAGGAGCTGGGCATGGACGACTTCACTGCCT CCAACGGCTGGCTGGACCGCTTCCGCCGCAGGCACGGGGTCATGACCAGGAATGGGGTGACCAAGAGCCGG GCCCAGCCACCGACCCCGGCCCAGCCGCCTCCCCTTCCCCGGGCCCCAGCCTCGGTGTCGGCCGCAGCGGG CAGTGCAGGGGGCGGTGGTGCGGGCAGCCGGGGCTGGTGGGAGGAGCAGCTGCCCSCCTTGGCCGAGGGCT ACGCCTCGCAGGATGTCTTCAATGCCACCGAGACCTGCCTGTGGTACAACTTCCTACCTGACCAGGCCCCT GCCCTTGGCGGCGAGGCCCGGCCCAGCCCGCGGCGGGGCACCGAGAGGCTCGTGGTGCTCCTGTGTGCCAA CGCCGATGGCAGTGAGAAGCTTCCCCCGCTGGTGGTGGGCAAGCTGGCCAAGCCCTTGGGATCCTGGGCTG GCCTGCCCTGTGACTATATTGCCAACCCCAAGGGGGGCGTCACGGCCCAGGAGCTCGCCAGGTACCTGAAG GCCCTGGATGCCCGCATGGCCGACGAGTCCCGCAGGATCCTGCTGTTGGCCAGTCGGCCGTCCGCCCCAGC CCTAGACGTCTCGGAACTGCGGCACGTTCAAGTGGCCTTCTTCCCTTCGGATGCCCTTCAGCCCTTGGAGC GGGGGGTTGTCCAGCAGGTCAAGGGGCACTACCGACAGGCCCTGCTGTTCAAAGCCATGGCCGCCCTGGAG GACCACGACCGGGCGCGGCTGCAGCTGGGCCTCCCTGAAGCTTTGCACTTTGTGGCAGCTGCGTGGGAGGC CGTGGAGCCCGCAGACATTGCCGCCTGCTTCCGAGAGGCTGGCTTAGGCGGTGGCCCAGGCGTGGGTCACA TCGACGTGACCCTCAAGAGTGAAGAGGAGGAAGACGAGGAGGAGGAGGAGGGGGAGGACGAGCAGGATGAG GAGGGTGAGGAAGAGGAGGGAGAGGAGGCTGAGGAAGGAGAGGAAGGAGAGGAGGAGGAGGAAGAAGAGGA GGAGGAGAGTTCCTCAGAGGGCTTAGAGGCAGAAGACTGGTCCCAGGGCGGTGGTGGGGCAGGGGGCAGTT TTGTGGCCTTTGGAGCCCAGTATGAGCAGGGGGGCCCTGCWGTTCAGCTCCTGCAGGCCGGGGAGGACTCC GAATCAGAGAGTGATGAGGAAGATGAGGATGACGATGATGACGATGATGATGATGACGATGATGATGACGA TGACGATGATGATGAAGTGCCCATTCCCAGTTTTGGTGAGGCCATGGCATACTTTGCCATGGTCAAACGGT ACCTGACCTCTTTCCCCATCGATGACCGTGTCCAGAGCCACATTCTTCACTTGGAGCATGATTTGGTCCAT GTGACCAGAAAGAACCACGCCAGACAGATGTCATCCCGAGCCCTGGCACACCAGAACTTAGCCTGA

>Macropus rufogriseus ind.3242 CENPB ATGGGCCCCAAGCGGAGGCAGCTGACGTTCCGGGAGAAGTCGTACATCATCCAGGAGGTGGAGAACAACCC AGACCTCCGCAAGGGCGAGATCGCCCGGCGCTTCAACATCCCGCCGTCCACGCTGAGCACCATCCTCAAGA

47

ACAAGCGGGCCATCCTGGCGTCCGAGCGCAAGTTCGGTCTGGCCTCCACCTGCCGCAAGACCAACAAGCTG TCGCCCTACGACAAGCTCGAGGGGCTGCTCATCGCCTGGTTCCAGCAGATCCGCGCCGCCGGCCTGCAGGT CAACGGCATCATCCTCAAGGAAAAGGCGCTGCGCATCGCCGAGGAGCTGGGCATGGACGACTTCACTGCCT CCAACGGCTGGCTGGACCGCTTCCGCCGCAGGCACGGGGTCATGACCAGGAATGGGGTGACCAAGAGCCGG GCCCAGCCACCGACCCCGGCCCAGCCGCCTCCCCTTCCCCGGGCCCCAGCCTCGGTGTCGGCCGCAGCGGG CAGTGCAGGGGGCGGTGGTGCGGGCAGCCGGGGCTGGTGGGAGGAGCAGCTGCCCGCCTTGGCCGAGGGCT ACGCCTCGCAGGATGTCTTCAATGCCACCGAGACCTGCCTGTGGTACAACTTCCTACCTGACCAGGCCCCT GCCCTTGGCGGCGAGGCCCGGCCCAGCCCGCGGCGGGGCACCGAGAGGCTCGTGGTGCTCCTGTGTGCCAA CGCCGATGGCAGTGAGAAGCTTCCCCCGCTGGTGGTGGGCAAGCTGGCCAAGCCCTTGGGATCCTGGGCTG GCCTGCCCTGTGACTATATTGCCAACCCCAAGGGGGGCGTCACGGCCCAGGAGCTCGCCAAGTACCTGAAG GCCCTGGATGCCCGCATGGCCGACGAGTCCCGCAGGATCCTGCTGTTGGCCAGTCGGCCGTCCGCCCCAGC CCTAGACGTCTCGGAACTGCGGCACGTTCAAGTGGCCTTCTTCCCTTCGGATGCCCTTCAGCCCTTGGAGC GGGGGGTTGTCCAGCAGGTCAAGGGGCACTACCGACAGGCCCTGCTGTTCAAAGCCATGGCCGCCCTGGAG GACCACGACCGGGCGCGGCTGCAGCTGGGCCTCCCTGAAGCTTTGCACTTTGTGGCGGCTGCGTGGGAGGC CGTGGAGCCCGCAGACATTGCCGCCTGCTTCCGAGAGGCTGGCTTAGGCGGTGGCCCAGGCGTGGGTCACA TCGACGTGACCCTCAAGAGTGAAGAGGAGGAAGACGAGGAGGAGGAGGAGGGGGAGGACGAGCAGGATGAG GAGGGTGAGGAAGAGGAGGGAGAGGAGGCTGAGGAAGGAGAGGAAGGAGAGGAGGAGGAGGAAGAAGAGGA GGAGGAGAGTTCCTCAGAGGGCTTAGAGGCAGAAGACTGGTCCCAGGGCGGTGGTGGGGCAGGGGGCAGTT TTGTGGCCTTTGGAGCCCAGTATGAGCAGGGGGGCCCTGCWGTTCAGCTCCTGCAGGCCGGGGAGGACTCC GAATCAGAGAGTGATGAGGAAGATGAGGATGATGATGATGACGATGATGATGATGACGATGATGATGACGA TGACGATGATGATGAAGTGCCCATTCCCAGTTTTGGTGAGGCCATGGCATACTTTGCCATGGTCAAACGGT ACCTGACCTCTTTCCCCATCGATGACCGTGTCCAGAGCCACATTCTTCACTTGGAGCATGATTTGGTCCAT GTGACCAGAAAGAACCACGCCAGACAGATGTCATCCCGAGCCCTGGCACACCAGAACTTAGCCTGA

>Macropus rufus CENPB ATGGGCCCCAAGCGGAGGCAGCTGACGTTCCGGGAGAAGTCGTACATCATCCAGGAGGTGGAGAAGAACCC AGACCTCCGCAAGGGCGAGATCGCCCGGCGCTTCAACATCCCGCCGTCCACGCTGAGCACCATCCTCAAGA ACAAGCGGGCCATCCTTGCGTCCGAGCGCAAGTTCGGTGTGGCCTCCACCTGCCGCAAGACCAACAAGCTG TCGCCCTACGACAAGCTCGAGGGACTGCTCATCGCCTGGTTTCAGCAGATCCGCGCCGCCGGCCTGCCGGT CAACGGCATCATCCTCAAGGAGAAGGCGCTGCGCATCGCCGAGGAGCTGGGCATGGACGACTTCACCGCCT CCAACGGCTGGCTGGACCGCTTCCGCCGCAGGCACGGGGTCATGACCAGGAACGGGGTGACCAAGAGCCGG GCCCAGCCACCGACCCCGGCCCAGCCGCCTCCCCTTCCCCGGGCCCCGGCCTCGGCGTCTGCCGCAGCGGG CGGTGCAGGGGGCGGTGGTGTGGGCAGCCGGGGCTGGTGGGAGGCGCAGCTGCCCGTCTTGGCCGAGGGCT ACGCCTCGCAGGATGTCTTCAATGCCACCGAGACCTGCCTGTGGTACAACTTCCTACCTGACCAGGCCCCT GCCCTTGGCGGCGAGGCCCGGCCCAGCCCCCGGCGGGGCACCGAGAGGCTCGTGGTGCTCCTGTGTGCCAA CGCCGATGGCAGTGAGAAGCTTCCCCCGCTGGTGGTGGGCAAGCTGGCCAAGCCTTCGGGATCCCGGGCTG GCCTGCCCTGTGACTATACTGCCAACCCCAAGGGGGGCATCACGGCCCAGGAGCTCGCCAAGTACCTGAAG GCCCTGGATGCCCGCATGGCCGACGAGTCCCGCAGGATCCTGCTGTTGGCCAGTCGGCCGGCCGCCCCAGC CCTAGACGTCTCGGAACTGCGGCACGTTCAAGTGGCCTTCTTCCCTTCGGATGCCCTTCATCCCTTGGAGC GGGGGGTTGTCCAGCAGGTCAAGGGGCACTACCGACAGGCCCTGCTGCTCAAAGCCATGGCCGCCCTGGAG GACCACGACCGGGCGCGGCTGCAGCTGGGCCTCCCTGAAGCTTTGCACTTTGTGGCGGCTGCGTGGGAGGC CGTGGAGCCCGCAGACATTGCCGCCTGCTTCCGAGAGGCTGGCTTCGGCGGTGGCCCAGGCCAAGACATCG ACGTGACCCTCAAGAGTGATGAGGAGGAAGACGAGGAGGAGGAGGGGGAGGASGAGGAGGAGGAGGGCGAG GAAGAGGAGGGAGAGGAGGCTGAGGAAGGAGAGGAAGGAGAGGAGGAGGAGGAAGAAGAGGAGGAGGAGAG TTCCTCAGAGGGCTTAGAGGCAGAAGACTGGTCCCAGGGCGGTGGTGGGGTAGGGGCCAGTTTTGTGGCCT ATGGAGCCCAGTATGAGCAGGGTGGCCCTGCTGTTCAGCTCCTGGAGGCCGGGGAGGACTCCGAATCAGAG AGTGATGAGGAAGATGAGGATGACGATGATGACGATGATGATGATGATGATGACGATGACGATGACGATGA TGATGAAGTGCCCGTTCCCAGTTTTGGTGAGGCCATGGCATACTTTGCCATGGTCAAACGGTACCTGACCT CTTTCCCCATCGATGACCGTGTCCAGAGCCACATTCTTCACTTGGAGCATGATTTGGTCCATGTGACCAGA AAGAACCACGCCAGACAGACGTCATCCCGAGCCCTGGCCCACCAGAACTTAGCCTGA

>Wallabia bicolor CENPB ATGGGCCCCAAGCGGAGGCAGCTGACGTTCCGGGAGAAGTCGTACATCATCCAGGAGGTGGAGAAGAACCC AGACCTCCGCAAGGGCGAGATCGCCCGGCGCTTCAACATCCCGCCGTCCACGCTGAGCACCATCCTCAAGA ACAAGCGGGCCATCCTGGCGTCCGAGCGCAAGTTCGGTGTGGCCTCCACCTGCCGCAAGACCAACAAGCTG TCGCCCTACGACAAGCTCGAGGGGCTGCTCATCGCCTGGTTCCAGCAGATCCGCGCCGCCGGCCTGCCGGT CAACGGCATCATCCTCAAGGAGAAGGCGCTGCGCATCGCCGAGGAGCTGGGCATGGAGGACTTCACCGCCT

48

CCAACGGCTGGCTGGACCGCTTCCGCCGCAGGCACGGGGTCATGACCAGGAACGGGGTGACCAAGAGCCGG GCCCAGCCACCGACCCCGGCCCAGCCGCCTCCCCTTCCCCGGGCCCCAGCCTCGGTGTCGGCCGCAGCGGG CAGTGCAGGGGGCGGTGGTGCGGGCAGCCGGGGCTGGTGGGAGGAGCAGCTGCCCGCCTTGGCCGAGGGCT ACGCCTCGCAGGATGTCTTCAATGCCACCGAGACCTGCCTGTGGTACAACTTCCTACCTGACCAGGCCCCT GCCCTTGGCGGCGAGGCCCGGCCCAGCCCCCGGCGGGGCACCGAGAGGCTCGTGGTGCTCCTGTGTGCCAA CGCCGATGGCAGTGAGAAGCTTCCCCCGCTGGTGGTGGGCAAGCTGGCCAAGCCCTCGGGATCCCGGGCTG GCCTGCCCTGTGACTATACTGCCAACCCCAAGGGGGGCGTCACGGCCCAGGAGCTCGCCAAGTACCTGAAG GCCCTGGATGCCCGCATGGCCGACGAGTCCCGCAGGATCCTGCTGTTGGCCAGTCGGCCGGCCGCCCCAGC CCTAGACGTCTCGGAACTGCGGCACGTTCAAGTGGCCTTCTTCCCTTCGGATGCCCTTCAGCCCTTGGAGC GGGGGGTTGTCCAGCAGGTCAAGGGGCACTACCGACAGGCCCTGCTGCTCAAAGCCATGGCCGCCCTGGAG GACCACGACCGGGCGCGGCTGCAGCTGGGCCTCCCTGAAGCTTTGCACTTTGTGGCGGCTGCGTGGGAGGC CGTGGAGCCCGCAGACATTGCCGCCTGCTTCCGAGAGGCTGGCTTCGGCGGTGGCCCAGGCCATGACATCG ACGTGACCCTCAAGAGTGATGAGGAGGAAGACGAGGAGGAGGAGGAGGGGGAGGACGAGCAGGATGAGGAG GGTGAGGAAGAGGAGGGAGAGGAGGCTGAGGAAGGAGAGGAAGGAGAGGAGGAGGAGGAAGAAGAGGAGGA GGAGAGTTCCTCAGAGGGCTTAGAGGCAGAAGACTGGTCCCAGGGCGGTGGTGGGGCAGGGGGCAGTTTTG TGGCCTATGGAGCCCAGTATGAGCAGGGGGGCCCTGCTGTTCAGCTCCTGGAGGCCGGGGAGGACTCCGAA TCAGAGAGTGATGAGGAAGATGAGGATGACGATGATGACGATGATGATGATGACGATGATGATGACGATGA CGATGATGATGAAGTGCCCGTTCCCAGTTTTGGTGAGGCCATGGCATACTTTGCCATGGTCAAACGGTACC TGACCTCTTTCCCCATCGATGACCGTGTCCAGAGCCACATTCTTCACTTGGAGCATGATTTGGTCCATGTG ACCAGAAAGAACCACGCCAGACAGACGTCATCCCGAGCCCTGGCCCACCAGAACTTAGCCTGA

Newick Format CENP-B Phylogenetic Tree

1 M.parma, 2 M.rufo1188, 3 M.rufo3242, 4 M.rufus, 5 M.giganteu, 6 M.robustus, 7 W.bicolor, 8 M.agilis, 9 M.eugenii

(4,6,(((((1,8),9),(2,3)),5),7))