
ORGANIZATION, EVOLUTION AND FUNCTION OF ALPHA SATELLITE DNA

AT HUMAN CENTROMERES

by

M. KATHARINE RUDD

Submitted in partial fulfillment of the requirements

For the degree of Doctor of Philosophy

Dissertation Advisor: Dr. Huntington F. Willard

Department of

CASE WESTERN RESERVE UNIVERSITY

January, 2005

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of

______

candidate for the Ph.D. degree *.

(signed)______(chair of the committee)

______

______

______

______

______

(date) ______

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

Table of contents

List of Tables

List of Figures

Acknowledgements

Abstract

Chapter 1: Introduction

Chapter 2: Analysis of centromeric regions of the human

Chapter 3: Alpha satellite evolution in primates: evidence for the homogenization of monomeric alpha satellite

Chapter 4: Human artificial chromosomes with alpha satellite-based de novo centromeres show increased frequency of nondisjunction and anaphase lag

Chapter 5: Conclusions and future studies

Appendix: Sequence organization and functional annotation of human centromeres

Bibliography

List of Tables

Table 2-1: Alpha satellite in the July 2003 (Build 34) assembly of the

Table 2-2: Repeat content of the July 2003 (Build 34) human genome assembly

Table 3-1: Repeat content of the 17 satellite zone and flanking regions

Table 3-2: Mean percent identity among monomers from particular regions of alpha satellite

Table 4-1: Characteristics of human artificial chromosomes

Table 4-2: Segregation errors

List of Figures

Figure 1-1: Centromere organization among organisms

Figure 1-2: Alpha satellite organization at the human centromere

Figure 1-3: Alpha satellite evolution model

Figure 1-4: Centromere and pericentromere model

Figure 2-1: Alpha satellite location in the July 2003 (Build 34) human genome assembly

Figure 2-2: Genomic landscape of 1 Mb regions outside of the centromere gaps

Figure 2-3: Types of alpha satellite in the human genome

Figure 2-4: Alpha satellite and centromere colocalization

Figure 3-1: Alpha satellite organization in the centromeric region of

Figure 3-2: Percent identity scores for pairwise comparisons of alpha satellite monomers

Figure 3-3: Phylogenetic tree of alpha satellite on chromosome 17

Figure 3-4: Neighbor-joining tree of monomers from different chromosomes

Figure 3-5: Maximum likelihood tree of monomers from different chromosomes

Figure 3-6: Distributions of interchromosomal and intrachromosomal monomeric monomer percent identities

Figure 3-7: Genomic organization of 17q compared to the orthologous Pan troglodytes region

Figure 4-1: FISH analysis of artificial chromosomes

Figure 4-2: Anaphase segregation assay

Figure 4-3: Missegregation of natural, artificial and variant chromosomes

Figure 5-1: Model of alpha satellite evolution

Figure 5-2: Strategy for sequencing an entire human centromere

Figure A-1: Alpha satellite organization in the human genome

Figure A-2: Gaps in the public genome assembly of chromosomes X and 17

Figure A-3: Repeat content of the junction between the short arm and centromere of the

Figure A-4: Organization of D17Z1 and D17Z1-B higher-order repeats at the centromere of chromosome 17

Figure A-5: Phylogenetic analysis of 230 alpha satellite monomers from the X chromosome and chromosome 17

Figure A-6: Functional centromere annotation using a human artificial chromosome assay

Figure A-7: Genome assembly of the centromeric regions of the X chromosome, chromosome 17 and 21

ACKNOWLEDGEMENTS

I would like to thank my advisor, Hunt Willard, for introducing me to alpha satellite and nurturing my scientific development for the past five years. Hunt has made me a better scientist, writer and speaker, and has challenged me to think beyond my view of the chromosome.

I am also grateful to Pat Hunt and Terry Hassold. They have been a constant source of support throughout my graduate career, and have always made me feel like a part of their labs.

The members of the Willard lab have provided scientific discussion, thoughtful debate, and lots of fun over the years. I am especially grateful to Brenda Grimes and Mary Schueler for educating me in the ways of artificial chromosomes and alpha satellite and for discussing the complex centromere.

My friends at Case have been an integral part of my graduate school experience. Whether commiserating over proposal defenses, helping prepare for student seminars, or going out in Coventry to unwind, my friends have always been there for me.

My mother and sisters have always encouraged and supported me, even when they weren’t exactly sure what I was doing in the lab.

Organization, evolution and function of alpha satellite DNA at human centromeres

Abstract

by

M. KATHARINE RUDD

The centromere is a specialized locus responsible for ensuring proper chromosome segregation at mitosis and meiosis. Human centromeres are comprised of arrays of a primate-specific repeat known as alpha satellite DNA. Understanding the organization and evolution of alpha satellite is essential to delineate the requirements for centromere function.

The basic unit of alpha satellite is an ~171 bp monomer, and monomers may be organized in one of two types of structure. Higher-order alpha satellite is made up of monomers arranged in homogeneous multimeric higher-order repeat units. In contrast, more divergent monomeric alpha satellite lacks any higher-order periodicity. We have analyzed the alpha satellite in the human genome assembly (Build 34, July 2003), and found regions of both higher-order and monomeric alpha satellite. Although previously identified at all human centromeres, higher-order alpha satellite has only been included in the assemblies of eleven chromosomes. Monomeric alpha satellite typically lies at the edges of larger higher-order arrays, and has been included in all but three chromosome assemblies.

The organization of alpha satellite in the human genome is a product of concerted evolutionary processes. We have analyzed the relationships between alpha satellite monomers from multiple chromosomes to discern the exchange mechanisms that have shaped the arrangement of alpha satellite in the genome.

Like higher-order alpha satellite described previously, monomeric alpha satellite has a higher frequency of intrachromosomal exchange than interchromosomal exchange. However, comparing orthologous regions of human and chimpanzee alpha satellite, we find that monomeric alpha satellite is more conserved than higher-order alpha satellite.

In addition to varying in sequence organization and evolutionary history, monomeric and higher-order alpha satellites also differ in their functionality.

Using extended chromosome methods to achieve greater resolution, we have found that antibodies to centromeric proteins only colocalize with higher-order and not monomeric alpha satellite. We have also created artificial chromosomes with de novo centromeres from D17Z1 and DXZ1 higher-order alpha satellites, while other studies have shown that monomeric alpha satellite lacks this functional capacity.

This work elucidates the genomic and functional differences between higher-order and monomeric alpha satellite to further define the complex human centromere.


Chapter 1: Introduction

The centromere in all eukaryotic organisms, including humans, plays a critical role in each step of chromosome segregation in mitosis and meiosis. As the site of the proteinaceous kinetochore, the centromere is responsible for attaching chromosomes to spindle microtubules that then align the chromosomes at the metaphase plate (Rieder and Salmon 1998). Proteins localized to the centromere are also involved in the metaphase to anaphase checkpoint and signal the attachment of all chromosomes to spindle microtubules before allowing the cell to progress into anaphase (Shah and Cleveland 2000). Finally, sister chromatid cohesion must be resolved at the centromere, and proper timing of the removal of cohesion is of vital importance for segregating chromosomes (Lee and Orr-Weaver 2001).

The centromere is defined by specific DNA sequences as well as by a specialized chromatin structure. Although centromere proteins are well conserved among all organisms (Baum and Clarke 2000; Brown et al. 1993; Buchwitz et al. 1999; Earnshaw et al. 1987; Henikoff et al. 2000; Howman et al. 2000; Kalitsis et al. 1998; Oegema et al. 2001; Palmer et al. 1991; Stoler et al. 1995; Sullivan and Glass 1991; Takahashi et al. 2000; Tomkiel et al. 1994), the DNA sequence organization at the centromere is not at all well conserved (Malik and Henikoff 2002; Willard 1998). In fact, centromeres range in size and complexity from the 125 base pair point centromere found in budding yeast to the human centromere that spans several megabases. Like centromeres in most organisms, the human centromere is made up of repetitive DNA. Alpha satellite DNA, a tandemly repeated DNA family based on a fundamental unit length of ~171 bp, has been found at all human centromeres. However, the particular organization and sequence identity among alpha satellite repeats is largely chromosome-specific (Alexandrov et al. 2001; Warburton and Willard 1996; Willard 1985).

Understanding the organization of alpha satellite DNA in the human genome and its role in centromere function is the focus of this dissertation. This introductory chapter discusses the genomic organization and evolution of alpha satellite DNA, as well as the chromatin and protein requirements for centromere function, and compares human centromeres to centromeres from other species, as necessary background for chapters that follow.

Centromere organization among eukaryotes

The sequences that make up the centromeres of diverse organisms are extremely variable. Most well characterized centromeres contain repetitive DNA with an AT-richness greater than that of the genome average (Choo 2001; Koch 2000). However, individual organisms have evolved different genomic structures to create a locus capable of chromosome segregation. This section discusses the organization of centromeric DNA in yeasts, flies, plants, worms and mice, while a following section focuses specifically on the organization of human centromeres. The chromatin modifications and centromere proteins involved in centromere function are discussed in a later section.

The simplest centromere organization is found in the chromosomes of the yeast, Saccharomyces cerevisiae (Figure 1-1). Only approximately 125 bp is required for centromere function in the budding yeast, and this consensus sequence is consistent among the centromeres of all 16 chromosomes (Clarke and Carbon 1985). There are three functional elements within the S. cerevisiae centromere, CDEI, CDEII and CDEIII. Deletions within CDEI (Hegemann et al. 1988) and CDEII (Sears et al. 1995) affect chromosome segregation in mitosis and meiosis, while deletions within CDEIII completely destroy centromere function (Jehn et al. 1991). Unlike other characterized centromeres, the budding yeast centromere consists of largely unique DNA (Clarke 1990). Nonetheless, the entire centromere is highly AT-rich, and the CDEII sequences are greater than 90% AT with short poly-A and poly-T regions (Fitzgerald-Hayes et al. 1982).
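As a rough illustration of what "AT-rich" means in practice, the AT fraction of a sequence can be computed by simple counting. The sketch below is a minimal example only; the function name `at_fraction` and the toy sequence are hypothetical and are not taken from any yeast centromere.

```python
def at_fraction(sequence):
    """Return the fraction of A and T bases in a DNA sequence (0.0 to 1.0)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    at_count = sum(1 for base in sequence if base in "AT")
    return at_count / len(sequence)


# Hypothetical toy sequence (not a real CDEII element): 18 of its 20 bases are A or T.
toy_sequence = "AAAATTTTTAAAGATTTTCA"
print(f"AT content: {at_fraction(toy_sequence):.0%}")
```

Applied to a real CDEII element, a value above 0.9 would correspond to the "greater than 90% AT" figure cited above.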

In contrast to the simple centromere of the budding yeast, the fission yeast centromere is more similar to the centromeres of higher eukaryotes in its size and complexity. Schizosaccharomyces pombe centromeres are made up of inner and outer inverted repeats flanking a non-repetitive central core (Clarke et al. 1986; Nakaseko et al. 1986; Nakaseko et al. 1987) (Figure 1-1), and each of these regions is AT-rich. Among the three S. pombe chromosomes, centromeres are similar, but not identical in organization. Each centromere contains an approximately 4 kb central core (Cnt), bordered by approximately 6 kb of imperfect repeats on the left and right arms (ImrL and ImrR). The organization of the outer repeats is more variable among chromosomes, but all are made up of dg and dh repeats, each of which is about 5 kb. Overall, S. pombe centromeres are 35 - 110 kb in size (Wood et al. 2002), a huge increase as compared to the S. cerevisiae centromeres. The inner repeats and central core are necessary for centromere function and bind spindle microtubules (Baum et al. 1994; Hahnenberger et al. 1991; Nakaseko et al. 2001), whereas the outer repeats recruit proteins and are more likely responsible for other functions such as heterochromatin formation and sister chromatid cohesion (Partridge et al. 2000; Partridge et al. 2002).

Figure 1-1: Centromere organization among organisms. The S. cerevisiae centromere is made up of three domains: CDEI, CDEII and CDEIII. The S. pombe centromere is comprised of a unique central core (Cnt) flanked by inner (ImrL, ImrR) and outer repeats (dg, dh). The fly centromere has two satellite domains, interspersed with transposable elements. Arrays of 180 bp repeats and CentO repeats interspersed with retrotransposons comprise the Arabidopsis and rice centromeres, respectively. The mouse centromere is made up of adjacent arrays of minor and major satellite. C. elegans have holocentric chromosomes, thus no specific sequence is required for centromere function. Human centromeres are made up of arrays of alpha satellite DNA organized in a hierarchical repetitive structure.

Centromeres in several other organisms are characterized by long stretches of so-called “satellite DNA”. This type of sequence was first identified as satellite bands in ultracentrifuge density gradients (Corneo et al. 1967; Kit 1961; Sueoka et al. 1959); now the term satellite DNA has come to refer to any tandem repetitive sequence (Charlesworth et al. 1994). Satellite may be divided into two major groups based on the size of the repeat unit; microsatellites are 2-20 bp long, whereas minisatellites are greater than 20 bp long (Charlesworth et al. 1994). Examples of microsatellites include the human classical satellites (Gosden et al. 1975; Prosser et al. 1986) as well as satellite sequences found in fly heterochromatin (Lohe et al. 1993; Peacock et al. 1978) among others. Larger minisatellites are found at the centromeres of Arabidopsis, rice, mice and humans (see below).
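The size-based division described above can be written as a simple rule. The following is a minimal sketch that uses only the thresholds stated in the text (2-20 bp versus greater than 20 bp); the function name `classify_satellite` is hypothetical.

```python
def classify_satellite(repeat_unit_bp):
    """Classify a tandem repeat by the length of its repeat unit, in base pairs.

    Thresholds follow the text: 2-20 bp is a microsatellite,
    anything longer than 20 bp is a minisatellite.
    """
    if repeat_unit_bp < 2:
        raise ValueError("a tandem repeat unit must be at least 2 bp long")
    if repeat_unit_bp <= 20:
        return "microsatellite"
    return "minisatellite"


# Repeat unit lengths mentioned in this chapter.
for name, unit_bp in [("fly AATAT", 5), ("Arabidopsis 180 bp repeat", 180), ("alpha satellite", 171)]:
    print(f"{name}: {classify_satellite(unit_bp)}")
```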

The fly centromere has been defined by a 420 kb region of a minichromosome that is required for chromosome transmission (Murphy and Karpen 1995; Sun et al. 1997). Similar to other centromeres, the Drosophila melanogaster centromere is repetitive, made up of satellite DNA and transposable elements (Figure 1-1). There are two adjacent blocks of microsatellites, AATAT and AAGAG satellites, that are interspersed with transposons as well as AT-rich DNA (Sun et al. 2003). Normal fly centromeres have not been sequenced to date, likely due to the difficulty in sequencing and assembling highly heterochromatic regions of the genome (Hoskins et al. 2002). However, the chromatin environment of endogenous Drosophila centromeres has been very well characterized (Blower and Karpen 2001; Blower et al. 2002).

Plant centromeres are very similar to the satellite- and transposon-rich fly centromeres (Figure 1-1). The major component of the Arabidopsis thaliana centromere is an AT-rich 180 bp repeat unit (Richards et al. 1991). The A. thaliana centromere was mapped as a recombination resistant region of the chromosome (Copenhaver et al. 1998), and subsequent sequence analysis identified large arrays of the 180 bp repeat, 400 kb to 1.4 Mb among chromosomes (Copenhaver et al. 1999). The Arabidopsis centromere is also enriched for retrotransposons not usually found on chromosome arms. Similarly, the rice (Oryza sativa) centromere is predominantly comprised of a 155 bp tandem repeat known as CentO arranged in arrays ranging from 65 kb to 2 Mb among the 12 rice chromosomes (Cheng et al. 2002). These arrays are interspersed with gypsy-class retrotransposons known as centromeric retrotransposons of rice. The Arabidopsis and rice centromeres have recently been defined at the level of chromatin, and chromatin immunoprecipitation experiments using antibodies to proteins required for centromere function have been conducted in both species. As expected, centromere proteins are associated with the 180 bp repeats in Arabidopsis (Nagaki et al. 2003) and the CentO repeats in rice (Nagaki et al. 2004). However, within the functional domain of the smallest rice centromere, there are also four expressed genes. This finding is surprising since centromeres are classically characterized as heterochromatic regions resistant to gene expression (Dillon and Festenstein 2002).

Mouse centromeric DNA is comprised of two types of sequence, major and minor satellite DNA (Figure 1-1). These regions have not been well defined, but mapping studies have shown that major and minor satellite are non-overlapping arrays, with minor satellite positioned closer to the (Joseph et al. 1989; Kipling et al. 1991). The sizes of the basic repeat units in major and minor satellite are approximately 234 bp and 120 bp, respectively (Horz and Altenburger 1981; Pietras et al. 1983). Only minor satellite coincides with antibodies to centromere proteins, suggesting that this region is part of the kinetochore (Wong and Rattner 1988). This finding is supported by the fact that minor satellite is found at all mouse chromosomes (Wong and Rattner 1988), whereas major satellite is located at the centromeres in only some species of mice (Wong et al. 1990).

As opposed to all other centromeres described, the centromeres of Caenorhabditis elegans appear to be completely sequence independent (Figure 1-1). C. elegans chromosomes are holocentric, meaning that many sites along the chromosome act as a centromere, recruiting centromere proteins necessary for segregation (Buchwitz et al. 1999; Moore et al. 1999; Oegema et al. 2001). The sequence-independent nature of C. elegans centromere activity is further supported by the fact that any sequence introduced into the genome segregates as an extrachromosomal element and is heritable (Stinchcomb et al. 1985). It would be interesting to see if particular sequences along the length of the chromosome are associated with centromere proteins on endogenous C. elegans chromosomes, or if every sequence truly plays an active role in chromosome segregation. Holocentric chromosomes are a curious contrast to the monocentric chromosomes found in most other species typically containing repetitive AT-rich DNA at the centromeres.

Human centromere organization

The human centromere is made up of highly repetitive DNA known as alpha satellite. Alpha satellite was first discovered in the human genome (Manuelidis and Wu 1978) by its homology to a repetitive fraction of the African Green Monkey genome (Maio 1971). Further experiments localized these repeats to human centromeric regions by in situ hybridization (Manuelidis 1978).

All human centromeres are comprised of alpha satellite DNA, although the organization of alpha satellite varies from centromere to centromere (Alexandrov et al. 1991; Choo et al. 1990; Devilee et al. 1988; Ge et al. 1992; Greig et al. 1989; Greig et al. 1993; Haaf and Willard 1992; Hulsebos et al. 1988; Jorgensen et al. 1988; Looijenga et al. 1992; Puechberty et al. 1999; Rocchi et al. 1991; Vissel and Choo 1991; Waye et al. 1987a; Waye et al. 1987b; Waye et al. 1987c; Waye and Willard 1985; Waye and Willard 1986; Waye and Willard 1987; Waye and Willard 1989a; Willard et al. 1983; Wolfe et al. 1985). The most basic unit of alpha satellite DNA is an approximately 171 bp monomer (Manuelidis and Wu 1978), and monomers may be arranged in one of two types of alpha satellite, higher-order or monomeric. Higher-order alpha satellite is made up of monomers organized in highly identical higher-order repeat units (Willard et al. 1983; Willard and Waye 1987b; Yang et al. 1982). For example, the higher-order alpha satellite found on chromosome 17, D17Z1, is made up of sixteen monomers arranged head to tail to form a 2.7 kb higher-order repeat unit (Waye and Willard 1986) (Figure 1-2). This repeat unit is in turn repeated in tandem with over a thousand copies at the chromosome 17 centromere (Warburton and Willard 1990). Although higher-order repeats within a given array are nearly identical to each other (typically 97-100% identical (Durfy and Willard 1989; Schindelhauer and Schwarz 2002; Schueler et al. 2001; Warburton and Willard 1992)), the monomers that make up the D17Z1 higher-order repeat unit are much less homogeneous, about 76% identical to each other (Waye and Willard 1986). Higher-order alpha satellite has been found at all human centromeres, and higher-order arrays range between 240 kb (Tyler-Smith et al. 1993) and ~5 Mb (Wevrick and Willard 1989) in size.
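The sizes quoted for D17Z1 follow from simple multiplication, as the short worked example below shows. It is a sketch under stated assumptions: every monomer is taken to be exactly the nominal 171 bp (real monomers vary slightly in length), and 1000 copies is used as a lower bound for "over a thousand copies". The function names are hypothetical.

```python
NOMINAL_MONOMER_BP = 171  # nominal alpha satellite monomer length (assumption: no length variation)


def higher_order_repeat_bp(monomers_per_repeat, monomer_bp=NOMINAL_MONOMER_BP):
    """Length of one higher-order repeat unit, in base pairs."""
    return monomers_per_repeat * monomer_bp


def array_bp(monomers_per_repeat, copies, monomer_bp=NOMINAL_MONOMER_BP):
    """Length of a tandem array of higher-order repeat units, in base pairs."""
    return higher_order_repeat_bp(monomers_per_repeat, monomer_bp) * copies


# D17Z1: sixteen monomers per higher-order repeat, at least ~1000 tandem copies.
d17z1_repeat = higher_order_repeat_bp(16)
d17z1_array = array_bp(16, copies=1000)
print(f"D17Z1 repeat unit: {d17z1_repeat} bp (~{d17z1_repeat / 1e3:.1f} kb)")
print(f"D17Z1 array, lower bound: {d17z1_array} bp (~{d17z1_array / 1e6:.1f} Mb)")
```

The resulting lower bound of roughly 2.7 Mb falls within the 240 kb to ~5 Mb range reported for higher-order arrays.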

Alpha satellite with a less homogeneous monomer organization has been described on chromosomes 7, 10, 16, 21 and the X chromosome (de la Puente et al. 1998; Guy et al. 2003; Horvath et al. 2000; Ikeno et al. 1994; Jackson et al. 1996; Schueler et al. 2001; Wevrick et al. 1992). Termed “monomeric” alpha satellite, this type of alpha satellite lacks any higher-order periodicity. Monomers within a region of monomeric alpha satellite exhibit greater sequence divergence than do higher-order repeat units. Where monomeric alpha satellite has been described, it has been found adjacent to higher-order alpha satellite and is less abundant than the megabase-sized arrays of higher-order alpha satellite. Unlike higher-order alpha satellite, monomeric alpha satellite is regularly interspersed with other repeats as well as some unique sequences.
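Statements about sequence divergence in this chapter are expressed as percent identity between monomers or repeat units. A minimal sketch of that measure is given below, assuming the two sequences have already been aligned to the same length and contain no gaps; in practice an alignment program would be used first. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions at which two pre-aligned, equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical toy sequences that differ at 2 of 10 positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))
```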

Centromere function has been linked to higher-order alpha satellite, yet there is no evidence for monomeric alpha satellite contributing to proper chromosome segregation (see below). Thus, higher-order and monomeric alpha satellites occupy physically and functionally distinct regions of the chromosome.

The arrays of higher-order alpha satellite and adjacent regions including monomeric alpha satellite have thus been termed the centromere and pericentromere, respectively (Horvath et al. 2001; Jackson 2003). The exact size of the pericentromeric regions likely varies among chromosomes and has not been defined. However, the sequences that make up the pericentromere have been a subject of much interest in recent years. In addition to monomeric alpha satellite, the pericentromeres of several chromosomes have been shown to contain classical satellites, expressed genes, as well as a high concentration of segmental duplications (Guy et al. 2000; Horvath et al. 2000; Jackson et al. 1996). Segmental duplications are duplicated blocks of genomic DNA several kilobases in size (Bailey et al. 2002; Bailey et al. 2001). These duplications occur within and between chromosomes and are highly enriched in pericentromeric regions. Although no centromere has been entirely sequenced (Eichler et al. 2004), there is no evidence for the kind of interspersed sequence organization seen in regions of monomeric alpha satellite within higher-order alpha satellite arrays (Schueler et al. 2001).

Figure 1-2: Alpha satellite organization at the human centromere. An example of human centromere organization is shown here for the chromosome 17 centromere. Alpha satellite organized in higher-order arrays (red) span several megabases. In the case of D17Z1 higher-order alpha satellite, 16 monomers are arranged head to tail to comprise a 2.7 kb higher-order repeat unit. Monomeric alpha satellite (blue) lacking any higher-order periodicity is located at the edges of higher-order arrays. Higher-order repeat units are extremely homogeneous (97-100% identical), whereas monomeric monomers are on average 72% identical.

Alpha satellite evolution

The organization of alpha satellite is a product of concerted evolutionary processes (Durfy and Willard 1990; Warburton and Willard 1996; Warburton et al. 1993; Waye and Willard 1986). DNA sequences subject to concerted evolution typically exhibit higher sequence identity within a species than between species (Brown et al. 1972; Coen et al. 1982; Southern 1975). For example, higher-order repeat units from an array on one chromosome are more similar to each other than to the orthologous repeats in another species (Durfy and Willard 1990; Jorgensen et al. 1987). Alpha satellite has been found at all primate centromeres studied; however, the organization and types of alpha satellite vary among species (Alexandrov et al. 2001; Warburton et al. 1996) (Figure 1-3A). In addition to the human centromere, higher-order alpha satellite has been found at some of the centromeres of chimpanzees (Baldini et al. 1991; Warburton et al. 1996; Waye and Willard 1989b), gorillas (Durfy and Willard 1990; Waye and Willard 1989b), and orangutans (Haaf and Willard 1998; Waye and Willard 1989b). Notably, higher-order alpha satellite has not been found in more distant primates. Indeed, only monomeric alpha satellite has been found in Old World Monkeys (Rosenberg et al. 1978; Singer and Donehower 1979; Thayer et al. 1981), New World Monkeys (Alves et al. 1994; Fanning 1989) and prosimians (Maio et al. 1981; Musich et al. 1980). As the centromeres from these monkeys have not been completely analyzed, the absence of higher-order alpha satellite may reflect a limited amount of alpha satellite sampling or a legitimate lack of higher-order structure. These findings are consistent with a model of alpha satellite evolution in which higher-order alpha satellite evolved relatively recently from monomeric alpha satellite.

Like other tandem satellite families (Brown et al. 1972; Coen et al. 1982; Southern 1975), alpha satellite is subject to molecular drive mechanisms. Molecular drive is a model that attempts to explain the high sequence identity within a class of sequences. In this process, a sequence variant can quickly spread through a population and become fixed. Molecular drive operates within and between chromosomes and includes mechanisms such as unequal crossing-over, gene conversion, and transposition (Coen and Dover 1983; Dover 1982; Strachan et al. 1982; Strachan et al. 1985).

Figure 1-3: Alpha satellite evolution model. (A) Types of alpha satellite found among primate centromeres. The simplified phylogenetic tree shows approximate divergence times between humans and other primates in millions of years ago (mya) (Kumar and Hedges 1998). (B) Model of alpha satellite evolution by unequal crossing over (see text for details). Small arrowheads denote monomeric alpha satellite, larger arrows represent higher-order alpha satellite. The black diamond and square represent non-alpha satellite sequences that are present within regions of monomeric alpha satellite in human chromosomes.

Although all of these processes may be participating in alpha satellite evolution to some extent, the homogenization of alpha satellite can best be accounted for by unequal crossing-over. Smith proposed a three-step mechanism to explain the emergence of tandem satellite repeats via unequal crossing-over (Smith 1976). The first step is a mutation that creates short local homology between two regions of a given sequence. In the second step, an unequal crossover event occurs between the two regions of homology, generating two products, a deletion and a tandem duplication. In the third step, subsequent unequal crossovers between the duplicated repeats produce expansions and contractions in the number of tandem repeats. As the number of tandem repeats increases, so will the number of sites of homology, increasing the frequency of unequal crossovers between tandem repeats. Recurring crossovers will homogenize the tandem repeats, leading to highly identical repeat units. This process can explain the initial emergence of alpha satellite DNA as well as the homogenization of monomeric alpha satellite to form the higher-order repeat units that subsequently expanded to make up the megabase-sized arrays present on human centromeres (Figure 1-3).
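The reciprocal products of a single unequal crossover can be illustrated with a toy model. The sketch below is not Smith's simulation; it simply represents each tandem array as a list of repeat-unit labels and joins the two arrays at misaligned breakpoints. The function name `unequal_crossover` and the labels m1 to m4 are hypothetical.

```python
def unequal_crossover(array_a, array_b, break_a, break_b):
    """Recombine two tandem arrays at breakpoints that may be out of register.

    Each array is a list of repeat-unit labels. The left part of one array is
    joined to the right part of the other, giving two reciprocal products.
    """
    product_1 = array_a[:break_a] + array_b[break_b:]
    product_2 = array_b[:break_b] + array_a[break_a:]
    return product_1, product_2


# Two copies of the same four-unit array (for example, sister chromatids), misaligned by two units.
array = ["m1", "m2", "m3", "m4"]
expanded, contracted = unequal_crossover(array, array, break_a=3, break_b=1)
print(expanded)    # one product carries a tandem duplication of m2 and m3
print(contracted)  # the reciprocal product carries the matching deletion
```

The total number of repeat units is conserved across the two products, so every expansion of one array is paired with a contraction of the other.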

The relationships between alpha satellite on different chromosomes, homologs of the same chromosome, and sister chromatids are very informative for determining the relative rates of unequal crossover events predicted to occur in alpha satellite evolution. With the exception of the centromeres on the acrocentric chromosomes, higher-order alpha satellite in the human genome is chromosome-specific, meaning that higher-order alpha satellite on one chromosome may be distinguished from that on another chromosome (Willard and Waye 1987a; Willard and Waye 1987b). This can best be explained by unequal crossover events between homologous chromosomes that homogenized alpha satellite into a chromosome-specific higher-order array (Durfy and Willard 1989; Schindelhauer and Schwarz 2002; Schueler et al. 2001; Warburton et al. 1993). The high sequence identity among thousands of higher-order repeat units on a given array argues that intrachromosomal exchange between homologous chromosomes is an efficient mechanism for homogenizing alpha satellite.

There is also evidence of interchromosomal exchanges involving alpha satellite. Higher-order repeats from different chromosomes have related organizations and fall into suprachromosomal families (Alexandrov et al. 1988; Greig et al. 1993; Waye et al. 1987a; Waye and Willard 1986; Willard and Waye 1987a). There are four major suprachromosomal families described in the human genome. Two families contain related higher-order alpha satellite organized in dimeric structures (...ABABAB...), while a third family contains higher-order alpha satellite organized in a pentameric structure (...ABCDEABCDE...) (Alexandrov et al. 1988). Higher-order alpha satellite found on the Y chromosome, DYZ3, does not fall into one of these groups and belongs to a more divergent family (Alexandrov et al. 1993). It is interesting to note that some chromosomes contain more than one higher-order array, and in most cases these arrays are very different in sequence identity and organization (Alexandrov et al. 1991; Baldini et al. 1989; Waye et al. 1987b; Waye et al. 1987c; Wevrick and Willard 1991), suggesting that separate homogenization events gave rise to the two distinct arrays. The related higher-order arrays on different chromosomes provide evidence for interchromosomal exchange; however, the overall sequence variation among higher-order repeats within a suprachromosomal family suggests that this type of exchange event occurred much less frequently than intrachromosomal exchanges between homologous chromosomes (Warburton and Willard 1995; Waye and Willard 1986; Willard and Waye 1987a).
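The dimeric and pentameric organizations described above can be pictured as periodic strings of monomer types. The following is a minimal sketch under that simplification: each monomer is reduced to a single-letter type label, and the function name `higher_order_period` is hypothetical. Real monomers differ by sequence divergence rather than discrete labels, so this illustrates only the notion of a higher-order repeat period.

```python
def higher_order_period(monomer_types):
    """Return the smallest period p such that the label sequence
    repeats every p monomers (0 for an empty sequence)."""
    n = len(monomer_types)
    for period in range(1, n + 1):
        # The sequence has this period if every label matches the
        # label at the same position within the first repeat unit.
        if all(monomer_types[i] == monomer_types[i % period] for i in range(n)):
            return period
    return n


# Toy arrays written in the same notation as the text.
dimeric = "AB" * 6         # ...ABABAB...
pentameric = "ABCDE" * 4   # ...ABCDEABCDE...

print(higher_order_period(dimeric))
print(higher_order_period(pentameric))
```

In this representation, a dimeric family has period two and a pentameric family has period five, while an array of unrelated monomers has no period shorter than its own length.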

Yet another driving force in alpha satellite evolution involves exchanges between sister chromatids. Warburton and Willard analyzed the higher-order repeat units that make up the D17Z1 array in three individual chromosomes 17 using two-dimensional gels (Warburton and Willard 1990). Variation in higher-order repeat unit length within an array on different chromosomes 17 suggests that these variants evolve along haplotypic lineages that have arisen relatively recently (Warburton and Willard 1990; Warburton and Willard 1995). Thus, alpha satellite evolution is a multifaceted process, working at the level of exchange events between different chromosomes, between homologs of the same chromosome, and between sister chromatids.

Based on these data, it is possible to hypothesize the steps involved in alpha satellite evolution (Figure 1-3B). Following sequence mutation(s), the first alpha satellite monomer duplicated by unequal crossover to form a dimer sometime early in the primate lineage. These dimers expanded via unequal crossover mechanisms to make a large stretch of tandem monomers such as the monomeric alpha satellite found at the centromeres of Old and New World Monkeys. Subsequent unequal crossovers within chromosomes homogenized monomers further to give rise to the higher-order arrays found in the great apes.

As higher-order alpha satellite took on the role of centromere function (see below), the monomeric alpha satellite on the periphery was free to accumulate insertions and without phenotypic consequence. This leads to the current organization of a typical human centromere: a large array of higher-order alpha satellite bordered by more divergent monomeric alpha satellite interspersed with other sequences.
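The homogenizing effect of repeated unequal crossover can be illustrated with a toy simulation. The sketch below is not the model evaluated in this thesis. It assumes a single array of labeled repeat units, exchange between two misaligned copies of that array, no new mutation, and arbitrary size limits. The function names and parameters (`unequal_crossover`, `simulate`, `max_offset`, `min_len`, `max_len`) are hypothetical.

```python
import random


def unequal_crossover(array, rng, max_offset=3):
    """One exchange between two misaligned copies of the same array.

    The product joins the left part of one copy to the right part of
    the other, so units are duplicated (negative offset) or deleted
    (positive offset)."""
    n = len(array)
    i = rng.randrange(1, n)                         # breakpoint in first copy
    offset = rng.randint(-max_offset, max_offset)   # misalignment in repeat units
    j = min(max(i + offset, 1), n - 1)              # breakpoint in second copy
    return array[:i] + array[j:]


def simulate(array, rounds, seed=0, min_len=10, max_len=40):
    """Apply repeated unequal crossovers, keeping only products whose
    length stays within the assumed size limits."""
    rng = random.Random(seed)
    for _ in range(rounds):
        product = unequal_crossover(array, rng)
        if min_len <= len(product) <= max_len:
            array = product
    return array


start = list(range(20))               # 20 distinct repeat-unit variants
end = simulate(start, rounds=2000, seed=1)
print(len(set(start)), "variants before;", len(set(end)), "variants after")
```

Because no mutation is modeled, variants can only be lost or amplified, so the number of distinct variants never increases. Under these assumptions the array drifts toward a few amplified variants, which is the sense in which unequal crossover homogenizes a tandem array.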

The proposed model of alpha satellite evolution will be further evaluated in Chapter 3. Based on the model, we would expect higher-order alpha satellite to evolve more rapidly than monomeric alpha satellite. The relationships between alpha satellites on different chromosomes of the same species and between orthologous chromosomes of different species will reveal a great deal about the evolution of alpha satellite in primates.

Assembling repetitive regions of the genome

To better understand the organization and evolution of alpha satellite, it is necessary to fully sequence and analyze at least some human centromeres. Assembling the highly identical repeat units that make up higher-order alpha satellite is a daunting task. The majority of the human genome has been sequenced and assembled (Lander et al. 2001; Venter et al. 2001); yet, no human centromere has been completely assembled (Eichler et al. 2004). The centromere regions were intentionally neglected from the Human Genome Project, due largely to the assumption that they contained nothing but junk DNA, and also due to the perceived difficulty in sequencing and assembling these repetitive regions (Collins et al. 1998; Lander et al. 2001).

Although no human centromere has been completely sequenced, several higher-order arrays of alpha satellite have been extensively mapped (Jackson et al. 1996; Mahtani and Willard 1990; Mahtani and Willard 1998; Puechberty et al. 1999; Tyler-Smith and Brown 1987; Warburton and Willard 1990; Wevrick and Willard 1989; Wevrick and Willard 1991). A general strategy for mapping higher-order arrays uses restriction enzymes that regularly cut within typical genomic DNA, but that rarely cut within higher-order alpha satellite (Warburton et al. 1991). After pulsed-field gel electrophoresis, megabase-sized arrays can be resolved by Southern blot analysis. The next step in sequencing human centromeric regions should focus on connecting existing chromosome arm contigs to higher-order alpha satellite (Guy et al. 2003; Guy et al. 2000; Horvath et al. 2000; Schueler et al. 2001), and then on developing a strategy to sequence across the highly homogeneous arrays of higher-order alpha satellite.
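The logic of this mapping strategy can be sketched as an in silico digest. The example below is a toy under stated assumptions: the recognition site is an arbitrary 6 bp string chosen for illustration and is not a claim about any particular enzyme, the sequences are synthetic, and the function name `digest` is hypothetical. It shows only that an array lacking the site survives as one large fragment while the flanking DNA is cut into small pieces.

```python
def digest(sequence, site, cut_offset=0):
    """Return fragment lengths after cutting a linear sequence at
    every occurrence of a recognition site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments that arise if a cut falls at an end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


site = "GGATCC"                              # arbitrary illustrative site
flank_unit = "ATGCA" + site + "TTAGC"        # flanking DNA that contains the site
array = "AATTGCAATCGT" * 50                  # toy tandem array that lacks the site
sequence = flank_unit * 3 + array + flank_unit * 3

fragments = digest(sequence, site)
print("fragment lengths:", fragments)
print("longest fragment:", max(fragments), "array length:", len(array))
```

The longest fragment spans the entire toy array plus the flanking sequence out to the nearest site on each side, which is why the size of such a fragment gives an estimate of array length.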

The most challenging part of sequencing across megabases of higher-order alpha satellite is not the sequencing per se, but the assembly process. The human genome project used bacterial artificial chromosomes (BACs) to create a tiling path across chromosomes that were subsequently sequenced and assembled (Lander et al. 2001). This methodology could be applied to higher-order alpha satellite to create a BAC scaffold underlying a restriction-mapped array. However, the subsequent assembly of sequences within each BAC is far more complicated than assembling typical genomic DNA. Higher-order repeat units are up to 100% identical (Durfy and Willard 1989; Schindelhauer and Schwarz 2002; Schueler et al. 2001), so it would be very easy to collapse independent copies into a single sequence during assembly. The amount of sequence divergence among higher-order repeat units is comparable to the amount of variation seen in typical genomic DNA assemblies due to allelic variation or polymorphism (Lander et al. 2001; Venter et al. 2001). This is similar to the situation on the human Y chromosome. The Y chromosome is made up of highly homogeneous repetitive sequences, up to 100% identical to each other. To assemble the sequence of this problematic chromosome, Skaletsky et al. used a BAC library constructed from only one man’s Y chromosome and sequenced redundant BACs to avoid the problems associated with normal levels of polymorphism (Skaletsky et al. 2003). A similar strategy could be employed to assemble a higher-order array of alpha satellite.
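The assembly problem described above can be made concrete by asking what fraction of error-free reads from an array could be placed at more than one position. The sketch below uses synthetic sequences only. The repeat unit length, the number of copies, the read length, and the function names (`ambiguous_read_fraction`, `random_dna`) are illustrative assumptions, not values or methods taken from any sequencing project.

```python
import random
from collections import Counter


def ambiguous_read_fraction(sequence, read_len):
    """Fraction of error-free reads of length read_len that occur at
    more than one position in the sequence."""
    reads = [sequence[i:i + read_len] for i in range(len(sequence) - read_len + 1)]
    counts = Counter(reads)
    return sum(counts[r] > 1 for r in reads) / len(reads)


def random_dna(length, rng):
    """Random sequence used as a stand-in for non-repetitive DNA."""
    return "".join(rng.choice("ACGT") for _ in range(length))


rng = random.Random(0)
unit = random_dna(171, rng)          # toy repeat unit; the length is illustrative
bases = list(unit * 20)              # 20 identical tandem copies
variant = 10 * 171 + 80              # one substitution in a single copy
bases[variant] = {"A": "C", "C": "G", "G": "T", "T": "A"}[bases[variant]]
array = "".join(bases)
control = random_dna(len(array), rng)

print("tandem array:", ambiguous_read_fraction(array, 50))
print("random control:", ambiguous_read_fraction(control, 50))
```

In the toy array, only reads that overlap the single variant base can be placed uniquely, which is why rare variants within an otherwise homogeneous array, sampled from a single haplotype, are the informative landmarks for assembly.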

Centromeric chromatin and centromere function

Although, as outlined above, the organization of centromeric DNA varies widely among organisms, the chromatin modifications and proteins involved in centromere function are very well conserved from yeast to humans (Baum and Clarke 2000; Brown et al. 1993; Buchwitz et al. 1999; Earnshaw et al. 1987; Henikoff et al. 2000; Howman et al. 2000; Kalitsis et al. 1998; Oegema et al. 2001; Palmer et al. 1991; Stoler et al. 1995; Sullivan and Glass 1991; Takahashi et al. 2000; Tomkiel et al. 1994). Nevertheless, centromere function is complex in that the centromere coordinates multiple processes and different protein players are involved in each step. First of all, the centromere is the site of the kinetochore, a proteinaceous structure responsible for attaching the chromosome to spindle microtubules. To ensure proper chromosome segregation, the centromere must also satisfy spindle assembly checkpoints and release sister chromatid cohesion at the right time. This suggests that the sequences required for all aspects of centromere function may in fact be much larger than those that delineate the region of the kinetochore.

The likely primary mark of the functional centromere is the histone H3 variant CENP-A, also known as CENH3 (Ahmad and Henikoff 2002). CENP-A is a histone variant found at active centromeres in every organism studied (Sullivan et al. 2001). Depleting CENP-A in yeast (Stoler et al. 1995), flies (Blower and Karpen 2001), worms (Oegema et al. 2001), mice (Howman et al. 2000), and human cells (Valdivia et al. 1998) causes chromosome segregation defects and also has downstream effects on the localization of other centromere proteins, supporting its role as the primary epigenetic mark. CENP-A and histone H3 are interspersed at the centromeres of flies and humans (Blower et al. 2002), and CENP-A can substitute for histone H3 in reconstituted nucleosomes in vitro (Yoda et al. 2000). Given the fact that CENP-A is a histone variant, it may set up the centromere-specific chromatin conformation that then recruits other centromere proteins.

There are a number of proteins, DNA binding proteins as well as motor proteins, that are part of the kinetochore. Centromere protein B (CENP-B) is a DNA binding protein found at the centromeres of diverse organisms, and it recognizes a 17 bp sequence known as the “CENP-B box” in mouse minor satellite and human higher-order alpha satellite (Masumoto et al. 1989). The CENP-B box sequence has also been found at the centromeres of the great apes, but not in Old World Monkeys, New World Monkeys, or prosimians (Goldberg et al. 1996; Haaf et al. 1995). Despite its conservation, the role of this protein in centromere function is questionable, as knockout mice have no mitotic defects (Hudson et al. 1998) and the Y chromosome of mice and humans has no detectable CENP-B protein (Broccoli et al. 1990; Earnshaw et al. 1991). In fact, chromosome segregation errors associated with CENP-B depletion have only been seen in an S. pombe minichromosome (Irelan et al. 2001). Another DNA binding protein, CENP-C, is directly involved in centromere function, as its depletion causes chromosome segregation defects in yeast (Brown et al. 1993), C. elegans (Moore and Roth 2001; Oegema et al. 2001), mice (Kalitsis et al. 1998) and human cells (Tomkiel et al. 1994). Other proteins such as dynein, MCAK, and CENP-E are also members of the kinetochore, playing a role in chromosome movement along the microtubules (Rieder and Salmon 1998). In addition, spindle checkpoint proteins such as Mad2 and Bub1 are critical for chromosome segregation as they signal the start of anaphase once all chromosomes are attached to the spindle (Shah and Cleveland 2000).
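Locating a short recognition sequence such as the CENP-B box in a set of monomers is a simple motif search. The sketch below is illustrative only. The 17 bp degenerate consensus used as the default is an assumption taken from the wider literature rather than from this section and should be checked against the primary reference before use. The names `CENPB_BOX`, `find_motif`, and `reverse_complement` are hypothetical, and the test sequence is synthetic.

```python
import re

# IUPAC codes needed for the motif, expressed as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Assumption: a commonly cited 17 bp degenerate form of the CENP-B box.
CENPB_BOX = "NTTCGNNNNANNCGGGN"


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(sequence, motif=CENPB_BOX):
    """Return sorted (start, strand) pairs for motif matches on either
    strand; start is the 0-based leftmost forward-strand coordinate."""
    # A lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    seq = sequence.upper()
    n, k = len(seq), len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    hits += [(n - m.start() - k, "-")
             for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)


# Synthetic example: one box embedded in filler sequence.
example = "ACGT" * 3 + "CTTCGTTGGAAACGGGA" + "ACGT" * 3
print(find_motif(example))
```

Applied to aligned monomers, a search of this kind would report which monomers of a higher-order repeat unit carry a candidate box, which is the distinction between higher-order and monomeric alpha satellite drawn in the text.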

Proper resolution of sister chromatid cohesion is also required for chromosome segregation. After proceeding into mitotic anaphase, sister chromatids completely lose cohesion and separate to opposite poles of the cell. Loss of sister chromatid cohesion is a two-step process in meiosis; in the first meiotic division, cohesion is removed from chromosome arms but maintained at the centromere, and then in the second meiotic division cohesion is completely removed (Dej and Orr-Weaver 2000). The mitotic cohesin complex is made up of the proteins SCC1/Rad21, SCC3, SMC1 and SMC3; however, in meiosis Rec8 substitutes for SCC1/Rad21 (Klein et al. 1999; Lee and Orr-Weaver 2001; Parisi et al. 1999). In the absence of cohesins, chromosomes missegregate, exhibiting chromosome lag and premature sister chromatid separation (Bernard et al. 2001; Hoque and Ishikawa 2002; LeBlanc et al. 1999).

The relationships among centromeric chromatin, kinetochore formation, spindle checkpoints, and resolution of sister chromatid cohesion have been best described in S. pombe and D. melanogaster. The fission yeast centromere is made up of two main domains, the central core and inner repeats responsible for kinetochore activity (Nakaseko et al. 2001) and the outer repeats responsible for pericentromeric heterochromatin formation and sister chromatid cohesion (Figure 1-4). Both centromere chromatin domains are transcriptionally silent (Allshire et al. 1995); however, silencing is mediated by different proteins. Mutations in Swi6 and Chp1 alleviate silencing of transgenes in the outer repeats and a mutation in Mis6 alleviates silencing in the central core (Partridge et al. 2000). Swi6 is the yeast ortholog of heterochromatin protein 1 (HP1), a protein first discovered in flies and found to localize to heterochromatic regions of the Drosophila genome (James and Elgin 1986; James et al. 1989). The chromodomain protein, Chp1, is involved in heterochromatin formation and chromosome segregation (Doe et al. 1998). Mis6 is required for the proper loading of Cnp1, the S. pombe ortholog of CENP-A (Takahashi et al. 2000).

Chromatin immunoprecipitation experiments are consistent with these data, as Cnp1 and Mis6 are associated with the central core and inner repeats (Takahashi et al. 2000) while Swi6 and Chp1 associate with the outer repeats (Partridge et al. 2000). The histone methyltransferase Clr4 (Su(var)3-9 ortholog) methylates histone H3 at lysine 9 and is required for Swi6 localization to the S. pombe centromere (Bannister et al. 2001). Swi6 is required for the association of the cohesins SCC1/Rad21 (Bernard et al. 2001) and Psc3 (Nonaka et al. 2002) at the outer repeats of the S. pombe centromere. Mutations in either Swi6 (Ekwall et al. 1995) or Rad21 (Bernard et al. 2001) cause chromosome lag, suggesting that although the outer repeats are not the site of microtubule binding, they still are important for centromere function.

Figure 1-4: Centromere and pericentromere model. The S. pombe centromere has been defined at the DNA and chromatin levels by chromatin immunoprecipitation (ChIP). The CENP-A ortholog, Cnp1, and Mis6 are located at the centromeric region and heterochromatin protein Swi6 and cohesins Rad21 and Psc3 are located at the pericentromere. The D. melanogaster centromere has not been sequenced; however, numerous proteins have been cytologically localized to the centromeric and pericentromeric regions. The human centromere is made up of higher-order alpha satellite and the adjacent pericentromere is made up of monomeric alpha satellite interspersed with other sequences. CENP-A, CENP-B and CENP-C proteins associate with higher-order alpha satellite as shown by ChIP; however, the precise location of other proteins is unknown.

The fly centromeric and pericentromeric chromatin domains have a similar bipartite organization. The Drosophila CENP-A ortholog, CID, localizes to the genetically defined minichromosome centromere (see above) as well as endogenous fly centromeres (Blower and Karpen 2001; Henikoff et al. 2000). CID is also required for recruiting kinetochore and spindle checkpoint proteins such as POLO kinase, ROD, Cenp-meta and BUB1, as well as the cohesin MEI-S332 to the centromere (Blower and Karpen 2001). However, CID is not responsible for the pericentromeric localization of the heterochromatin protein HP1 or the condensation protein PROD. Mutations in polo, mei-S332, HP1 or prod have no effect on CID localization, suggesting that CID is upstream of these proteins in the centromere function pathway. Cytological studies have shown that the kinetochore proteins occupy a region separate from the more distal pericentromeric heterochromatin and sister chromatid cohesion proteins (Blower and Karpen 2001) (Figure 1-4). Nevertheless, loss of mei-S332 affects chromosome segregation, as minichromosomes in a mutant mei-S332 background have a significant drop in transmission (Lopez et al. 2000). Cells mutant for the Drosophila ortholog of Rad21 have mitotic defects such as premature sister chromatid separation and abnormal spindle morphology (Vass et al. 2003). Thus it appears that, similar to the case in S. pombe, Drosophila centromeres can be divided into centromeric and pericentromeric chromatin domains, both of which are required for proper chromosome segregation.

It is tempting to apply this domain model to the organization of the human centromere (Sullivan 2002). As described above, the human centromere is made up of alpha satellite arranged in higher-order arrays flanked by more divergent monomeric alpha satellite (Figure 1-4). Monomeric alpha satellite is interspersed with other sequences, and there is no evidence for monomeric alpha satellite involvement in centromere function (see below). However, as presented in this thesis, human artificial chromosomes derived from higher-order alpha satellite and lacking monomeric alpha satellite have an increase in anaphase lag and nondisjunction as compared to normal chromosomes (Chapter 4). It may be the case that, although monomeric alpha satellite cannot nucleate the site of the kinetochore on its own (Ikeno et al. 1998), it is required for setting up the pericentromeric chromatin state, similar to the kinetochore flanking sequences in S. pombe and D. melanogaster. Future studies carefully dissecting the locations of centromere proteins, heterochromatin proteins and cohesins at the human centromere will determine the difference between centromeric and pericentromeric domains.

Assessing centromere function in human chromosomes

The requirements for centromeric and pericentromeric functions in model organisms have been well defined, both at the level of DNA sequence and centromere protein content. Studies involving the human centromere lack the tractable genetic systems found in other organisms, making it difficult to test specific regions functionally for centromere activity. Both sequence specificity and epigenetic modifications are likely responsible for human centromere function (Figueroa et al. 1998; Harrington et al. 1997; Ikeno et al. 1998; Tomkiel et al. 1994; Yen et al. 1991). Nevertheless, the roles of DNA sequence and epigenetics in human centromere function are a topic of much debate (Choo 2000; Cleveland et al. 2003; Karpen and Allshire 1997; Murphy and Karpen 1998). Although all human, and for that matter other primate, centromeres are made up of alpha satellite DNA, there are two lines of evidence used to argue for the sequence independence of human centromeres based on the study of different types of chromosome abnormalities observed in human patient material.

Dicentric chromosomes

Dicentric human chromosomes contain two distinct arrays of alpha satellite formed by chromosome breakage and fusion events (Earnshaw et al. 1989; Higgins et al. 1999; Page et al. 1995; Page and Shaffer 1998; Sullivan and Schwartz 1995; Sullivan and Willard 1998; Therman et al. 1974). To maintain chromosome stability, only one centromere can remain active, because if the chromosome attaches to spindle microtubules at two sites it could be pulled to opposite poles of the cell, causing chromosome breakage or anaphase bridging (McClintock 1939). Dicentric chromosome stability has been hypothesized to occur by either inactivating one centromere (Therman et al. 1974) or by coordinating the activity of the two centromeres (Page and Shaffer 1998; Sullivan and Willard 1998). In either case, centromere activity is assessed by the ability to bind centromere proteins. There are several cases in which only one of the two alpha satellite arrays on a dicentric chromosome binds antibodies to centromere proteins (Earnshaw et al. 1989; Sullivan and Schwartz 1995). Both active and inactive regions of alpha satellite bind antibodies to CENP-B; however, only the active centromere binds antibodies to CENP-C and CENP-E. This suggests that one previously active centromere has been epigenetically inactivated and has lost the ability to bind proteins involved in spindle microtubule attachment. The fact that a region of alpha satellite can exist on a chromosome without conferring centromeric activity has led some to propose that alpha satellite is not sufficient for centromere function (Choo 2000; Cleveland et al. 2003; Karpen and Allshire 1997; Murphy and Karpen 1998). However, this argument is misleading. Much in the same way that a previously active gene can be silenced, human centromeres may be epigenetically inactivated during dicentric chromosome formation. Just as a silenced gene is still a “gene”, an inactive centromere is still a “centromere”. Alpha satellite is clearly sufficient for centromere function as demonstrated by artificial chromosome studies (see below).

Neocentromeres

The existence of neocentromeres also provides an argument for the sequence independence of centromere function. Neocentromeres are regions of chromosomes that do not contain typical centromeric DNA, but that have been modified epigenetically to act as a centromere and segregate the chromosome faithfully. Neocentromeres were first described in maize (Rhodes and Vilkomerson 1942), and have also been engineered in flies (Maggert and Karpen 2001; Platero et al. 1999; Williams et al. 1998).

Human neocentromeres are found on marker chromosomes detected in patient material and derived from chromosome breakage events in which a previously noncentromeric region acquires centromere activity (Depinet et al. 1997; du Sart et al. 1997). Neocentromeres have been extensively characterized to determine the molecular structure and epigenetic modifications responsible for centromere activity. There appear to be “hotspots” for neocentromere formation as certain regions of the genome are commonly rearranged to form marker chromosomes with the same breakpoints (Warburton et al. 2000). The DNA sequences underlying the centromere protein binding domains of two different neocentromeres have been analyzed (Barry et al. 1999; Satinover et al. 2001). An increase in AT-richness as compared to the genome average was found at both neocentromeres, and other sequences such as classical satellites and LTRs were also enriched at the neocentromeres. These data suggest that, although neocentromeres contain no detectable alpha satellite DNA, there could be some sequence characteristics that predispose these loci to centromere activity (Koch 2000). This hypothesis has been challenged by a finding that three marker chromosomes derived from the same region of all have different CENP-A binding domains (Alonso et al. 2003). Thus, the regions of the genome from which marker chromosomes are derived may be hotspots for chromosome breakage events; however, the acquisition of centromeric activity is probably not sequence dependent.

Other parameters such as centromere protein deposition, replication timing, and histone acetylation status likely define neocentromere function. Neocentromeres bind antibodies to centromere proteins found at normal active centromeres except for antibodies to CENP-B (Depinet et al. 1997; du Sart et al. 1997; Floridia et al. 2000; Saffery et al. 2000; Slater et al. 1999; Voullaire et al. 1999; Voullaire et al. 2001; Warburton et al. 2000). A neocentromere derived from chromosome 10q25 replicates later in the than the normal 10q25 locus (Lo et al. 2001), similar to the replication timing of normal human centromeres (Shelby et al. 2000). Additionally, treatment with the drug Trichostatin A hyperacetylates the normally hypoacetylated neocentromere derived from 10q25 and shifts the CENP-A binding domain of the neocentromere (Craig et al. 2003).

Thus, neocentromeres appear to be defined by epigenetic factors such as histone modifications and variants, centromere protein binding and replication timing rather than DNA sequence specificity. These data demonstrate that alpha satellite DNA is not always necessary for centromere function, although neocentromere formation is an extremely rare event. However, alpha satellite is the only sequence capable of recapitulating centromere function in human cells (see below), providing strong evidence for a role for DNA sequence in determining centromere identity in normal chromosomes.

Normal human centromeres

The most direct way to assess the requirements for human centromere function is by examining normal human centromeres. There are two principal strategies for determining what DNA is present at the functional centromere, either by looking for the type of DNA associated with centromere proteins on a normal chromosome or by using minichromosome and artificial chromosome assays to determine the minimal DNA sequences required for centromere function.

In the first approach, antibodies to centromere proteins known to be present at active centromeres are used to determine which DNA sequences colocalize with the active centromere. The colocalization of centromere proteins and alpha satellite DNA has been demonstrated in a number of studies. Before the purification of antibodies to specific centromere proteins, CREST antisera were used to identify the functional centromere. Serum from patients with calcinosis, Raynaud syndrome, esophageal dysmotility, scleroderma, and telangiectasia (CREST) (Moroi et al. 1980) contains antibodies to CENPs -A, -B, and -C (Earnshaw and Rothfield 1985). CREST sera were shown to colocalize with a degenerate alpha satellite probe on mechanically stretched chromosomes; however, the alpha satellite signal extended beyond the edges of the CREST immunostaining (Zinkowski et al. 1991). Similarly, antibodies to CENP-A only bind to a portion of the alpha satellite at human centromeres, at the site of the inner kinetochore (Warburton et al. 1997). Most recently, extended chromatin fiber experiments have demonstrated that antibodies to CENP-A only stain a portion of the alpha satellite at human centromeres. About one-half to two-thirds of the stretched alpha satellite region overlaps with CENP-A (Blower et al. 2002; Sullivan et al. 2002). These collective data suggest that only a subset of alpha satellite DNA is part of the functional centromere, but they do not define the particular type of alpha satellite participating in centromere function.

Haaf and Ward investigated the functionality of two distinct higher-order arrays of alpha satellite on chromosome 7, D7Z1 and D7Z2 (Waye et al. 1987c; Wevrick and Willard 1991). Studies in mechanically extended chromosomes and nuclei showed that only D7Z1, and not D7Z2, colocalized with CREST autoantibodies (Haaf and Ward 1994). Thus, on chromosome 7, centromere function is restricted to only one type of higher-order alpha satellite. Given the adjacent organization of monomeric and higher-order alpha satellite at the human centromere and the difference in repetitive structure between the two, it is interesting to determine if centromere function is restricted to higher-order alpha satellite or if monomeric alpha satellite is also part of the kinetochore. This question will be addressed in this thesis in chapter 3.

Chromatin immunoprecipitation experiments with antibodies to centromere proteins also support a functional role for alpha satellite DNA. Vafa and Sullivan first showed that alpha satellite does in fact immunoprecipitate with antibodies to CENP-A and proposed a specialized phasing for CENP-A-containing nucleosomes (Vafa and Sullivan 1997). In another study, antibodies to CENP-B and CENP-C as well as CENP-A were found to be associated with alpha satellite DNA (Ando et al. 2002). Upon cloning and sequencing the chromatin immunoprecipitated DNA, the only type of alpha satellite associated with centromere proteins contained CENP-B boxes. CENP-B recognition sites are found only in higher-order alpha satellite and not monomeric alpha satellite (Masumoto et al. 1989), suggesting that only higher-order alpha satellite is part of the centromere protein complex at the human kinetochore. These two chromatin immunoprecipitation studies are consistent with cytological centromere protein colocalization experiments and strongly support a role for alpha satellite in centromere function.

Minichromosomes

In addition to strategies that examine the DNA and protein composition at endogenous centromeres, minichromosome and artificial chromosome studies have also explored the minimal requirements for centromere function. Telomere sequences have been introduced into human cell lines to truncate existing chromosomes into smaller minichromosomes. The minichromosome can be mapped subsequently to determine the sequences responsible for centromere function on this minimal chromosome (Farr et al. 1992; Heller et al. 1996). In contrast, human artificial chromosomes are derived from naked DNA transfected into tissue culture cells (Harrington et al. 1997; Ikeno et al. 1998). Artificial chromosomes may be used as an assay to determine the types of sequences capable of forming a de novo centromere and are a valuable tool for determining the sequence requirements for centromere function (see below).

Farr et al. engineered telomere truncation chromosomes by introducing telomere repeats in a non-targeted fashion to truncate the human X chromosome at a number of locations along the q arm (Farr et al. 1992). These chromosomes were further truncated along the p arm to generate minichromosomes less than 2.4 Mb in size (Farr et al. 1995; Mills et al. 1999). The minimal chromosome that retained mitotic stability was 1.4 Mb overall with a 670 kb array of DXZ1 higher-order alpha satellite. Below this threshold, chromosomes with less DXZ1 or less flanking sequence on the p side of DXZ1 were mitotically unstable, suggesting that both higher-order alpha satellite and neighboring pericentromeric sequence may be required for proper centromere function (Spence et al. 2002).

Similar chromosome truncation studies have been conducted on the human Y chromosome (Heller et al. 1996). The smallest Y chromosome-based minichromosome exhibiting faithful segregation was 1.8 Mb overall, with an approximately 100 kb array of DYZ3 higher-order alpha satellite (Shen et al. 2001; Yang et al. 2000). These data from the X- and Y-based minichromosomes demonstrate that higher-order alpha satellite is capable of maintaining centromere function after the original chromosome has been significantly truncated and/or rearranged. The fact that the smallest minichromosomes are larger than just the higher-order alpha satellite array may reflect a requirement for other flanking sequences or may simply be an artifact of the telomere truncation process. Telomere constructs may not have integrated within higher-order alpha satellite on both sides of the array to create purely higher-order minichromosomes. Conversely, these kinds of events may have been so detrimental to chromosome segregation that they were not recoverable.

Human artificial chromosomes

Artificial chromosome studies address the requirements for centromere establishment as well as maintenance. Candidate DNA sequences are transfected into human tissue culture cells to test them for the ability to form an artificial chromosome with a de novo centromere derived from the input DNA. Two major strategies have been employed to construct artificial chromosomes, using either single DNA molecules (Ikeno et al. 1998) or a combination of linear DNA fragments (Harrington et al. 1997). Numerous artificial chromosome studies have tested alpha satellite sequences, non-alpha satellite sequences, and different types of alpha satellite for centromere functionality.

The first human artificial chromosome study combined the principal components of chromosomes (centromeres, telomeres, and genomic DNA presumably containing origins of replication) to generate small linear artificial chromosomes. Harrington et al. combined linear arrays of synthetic D17Z1 or DYZ3 higher-order alpha satellite with linearized genomic DNA and TTAGGG telomere repeats (Harrington et al. 1997). In an alternate approach, Ikeno et al. engineered a yeast artificial chromosome (YAC) construct containing alpha satellite and telomere sequences on a single molecule (Ikeno et al. 1998). YACs containing either higher-order or monomeric alpha satellite from chromosome 21 were retrofitted with telomere repeats by homologous recombination. In both studies, DNA sequences were transfected into a human fibrosarcoma cell line, HT1080. Interestingly, only higher-order alpha satellites from chromosome 17 and chromosome 21 were capable of forming artificial chromosomes with de novo centromeres. The inability of higher-order alpha satellite from the Y chromosome to form a de novo centromere has also been demonstrated in other studies (Grimes et al. 2002). Artificial chromosomes were mitotically stable in the absence of drug selection and bound antibodies to centromere proteins, demonstrating the assembly of a fully functional human centromere. These experiments suggest that higher-order and monomeric alpha satellite differ in their capacities to establish a centromere. Higher-order alpha satellite clearly has some functional capability lacking in monomeric alpha satellite.

Since these two original studies, higher-order alpha satellite from (Kouprina et al. 2003) and a chimeric YAC containing higher-order alpha satellite found on chromosomes 4, 14 and 22 (Henning et al. 1999) have also been successful in generating artificial chromosomes. Conversely, the sequences comprising the neocentromere derived from chromosome 10q25 (Saffery et al. 2001) as well as other non-alpha satellite sequences (Ebersole et al. 2000; Grimes et al. 2002) are not capable of forming artificial chromosomes with de novo centromeres in human cells. So what characteristic of higher-order alpha satellite is responsible for conferring centromere function? Is it the extremely homogeneous organization of higher-order repeats? Or the presence of CENP-B boxes in higher-order but not monomeric alpha satellite? Or are specific basepairs present in higher-order repeats besides the CENP-B box responsible for centromere function?

Expanding on the earlier study involving alpha satellite from chromosome 21 (Ikeno et al. 1998), Ohzeki et al. generated a number of constructs to begin to address the specific characteristics of higher-order alpha satellite that nucleate centromere function (Ohzeki et al. 2002). A mutation was introduced into the CENP-B boxes in the higher-order repeat unit, causing a failure to bind CENP-B protein in a gel shift assay. BACs containing either normal higher-order or mutated higher-order alpha satellite were transfected into HT1080 cells. Only normal higher-order alpha satellite was capable of artificial chromosome formation. These data suggest that the mutations in this construct are responsible for the absence of centromere function, but it remains to be determined if centromere function was abolished due to the mutation specifically in the CENP-B box or if any mutation in higher-order alpha satellite could hinder centromere function. Further experiments showed that a non-alpha satellite construct that contained CENP-B boxes was not capable of artificial chromosome formation, suggesting that CENP-B binding alone is not sufficient for centromere function (Ohzeki et al. 2002). The absence of CENP-B protein on the Y chromosome and neocentromeres suggests that it is not an integral part of centromere function. The sequence requirements for human centromere function are complex, but likely involve AT-richness and highly repetitive DNA.

Research Objectives and Thesis Outline

The following chapters will expand on the requirements for human centromere function and evaluate the role of alpha satellite DNA through genomic analyses and functional experiments. Chapter 2 describes the analysis of the types of alpha satellite in the current genome assembly (Build 34, July 2003), as well as the other types of sequences in the vicinity of human centromeres and pericentromeric regions. The functionality of higher-order and monomeric alpha satellite is tested by centromere protein colocalization experiments on centromeres from six chromosomes. In Chapter 3, the chromosome 17 centromere is analyzed for its genomic content as well as its evolutionary history. Chapter 4 tests D17Z1 and DXZ1 higher-order alpha satellites functionally for their ability to form a de novo centromere on an artificial chromosome. The segregation of DXZ1- and D17Z1-based artificial chromosomes is evaluated and compared to that of patient-derived ring chromosomes and normal chromosomes. Chapter 5 discusses the impact of these experiments in the fields of centromere and genomics as well as the future experiments to further explore what is required for human centromere function. The appendix is based on a review article that describes the types of alpha satellite in the human genome and discusses ways to functionally annotate these sequences.

Chapter 2

Analysis of the centromeric regions of the human genome assembly

M. Katharine Rudd and Huntington F. Willard

Note: This chapter has been adapted from a manuscript accepted in Trends in Genetics as a peer-reviewed “Genome Analysis” article and reformatted for this document.

Abstract. The sequence of the human genome is not yet complete, and major gaps remain at the centromere region of each chromosome. Human centromeres are composed of megabases of repetitive alpha satellite DNA, most of which is missing from the July 2003 (Build 34) genome assembly. Alpha satellite is a repeat family based on ~171 bp monomers that can be arranged either in a highly homogeneous higher-order organization or in a more heterogeneous monomeric form that lacks this higher-order periodicity. We have analyzed the ~7 megabases of alpha satellite that have been assembled thus far, and have found both higher-order and monomeric types of alpha satellite. The majority of alpha satellite in the assembly lies within 1 Mb of the centromere gaps; however, there are also small blocks of alpha satellite several megabases away from the centromere regions. The most centromere-proximal regions of the genome assembly are enriched for other types of satellites as well as segmental duplications. In addition to characterizing the organization of alpha satellite in the genome assembly, we have also functionally annotated alpha satellite on several chromosomes. Using extended chromosome methods, we have found that antibodies to centromeric proteins colocalize only with higher-order and not with monomeric alpha satellite. Thus, higher-order and monomeric alpha satellites differ in genomic organization as well as function.

Introduction

The centromere of most complex eukaryotic chromosomes is a specialized locus composed of repetitive DNA that is responsible for chromosome segregation at mitosis and meiosis (Cleveland et al. 2003; Sullivan et al. 2001). Normal human centromeres are made up of megabases of alpha satellite DNA, a repeat family based on ~171 bp monomers (Willard and Waye 1987b). These monomers may be arranged either in a highly homogeneous, multimeric organization or in a more heterogeneous monomeric form that lacks this higher-order periodicity (Alexandrov et al. 2001; Warburton and Willard 1996; Willard 1991). Despite their obvious functional significance, centromeric regions and their constituent alpha satellite sequences were largely omitted by the Human Genome Project because of their repetitive nature and the expected deficiency of genes (Collins et al. 1998); the reported assemblies (Lander et al. 2001; Venter et al. 2001) of each chromosome arm thus end an uncertain distance from the functional centromere (Schueler et al. 2001). While such regions are often considered to be difficult to sequence, it is in fact the assembly, not the sequencing itself, that presents a challenge, because of the high degree of sequence homogeneity among many hundreds or thousands of copies of a given repeated sequence.

Alpha satellite DNA has been identified at every human centromere (Alexandrov et al. 2001; Warburton and Willard 1996); however, among reported chromosome assemblies, the amount and type of alpha satellite varies. There are two major types of alpha satellite, higher-order and monomeric (Warburton and Willard 1996; Willard 1991). Higher-order alpha satellite is made up of ~171 bp monomers organized in arrays of multimeric repeat units that are highly homogeneous; in contrast, monomeric alpha satellite lacks any higher-order periodicity, and its monomers are only on average ~70% identical to each other (Wevrick et al. 1992). In addition to their different sequence organization, monomeric and higher-order alpha satellites also differ in their functionality. Higher-order alpha satellite has been demonstrated to be associated with centromere function on the basis of genomic (Schueler et al. 2001; Spence et al. 2002), biochemical (Ando et al. 2002; Vafa and Sullivan 1997) and artificial chromosome assays (Harrington et al. 1997; Ikeno et al. 1998; Schueler et al. 2001). In contrast, there is no evidence for the direct involvement of monomeric alpha satellite in centromere function (Ikeno et al. 1998).

Higher-order alpha satellite is the predominant type in the genome, present in megabase quantities at each centromere (Warburton and Willard 1996; Wevrick and Willard 1989; Willard and Waye 1987b). Where it has been studied, monomeric alpha satellite lies at the edges of higher-order arrays and is less abundant (Horvath et al. 2000; Schueler et al. 2001; Wevrick et al. 1992). This expectation notwithstanding, the vast majority of alpha satellite in the current assembly is of the monomeric type (see below), reflecting the currently incomplete nature of centromeric contigs.

Materials and Methods

Alpha satellite in the genome

The July 2003 genome assembly (Build 34) was extracted from the UCSC browser (Kent et al. 2002) (http://genome.ucsc.edu). For simplicity and because the sizes of most of the gaps in the genome have not been determined experimentally, we removed all non-centromere clone gaps within the reported chromosome arm contigs in the assembly. The resulting un-gapped assembly was divided into 1 Mb blocks starting from the centromere gaps. The amount of alpha satellite was calculated using RepeatMasker (http://repeatmasker.genome.washington.edu) and the alpha satellite within and beyond the first 1 Mb block was determined (Table 2-1, Figure 2-1). Segmental duplications in the July 2003 assembly were provided by Evan Eichler (Univ. of Washington).
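The block-based tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used for the analysis: the input format (half-open intervals measured in bp from the edge of the centromere gap on the un-gapped arm) and the function name `alpha_per_block` are hypothetical, and the example coordinates are invented.

```python
from collections import defaultdict

BLOCK_SIZE = 1_000_000  # 1 Mb blocks, counted outward from the centromere gap


def alpha_per_block(intervals, block_size=BLOCK_SIZE):
    """Sum annotated bases per block.

    intervals: iterable of (start, end) half-open coordinates in bp, measured
    from the centromere gap edge on the un-gapped arm (hypothetical input).
    Returns {block_number: bp}; block 1 is the most proximal megabase.
    """
    totals = defaultdict(int)
    for start, end in intervals:
        pos = start
        while pos < end:
            block = pos // block_size               # 0-based block index
            block_end = (block + 1) * block_size    # first base of the next block
            piece_end = min(end, block_end)         # split intervals at block edges
            totals[block + 1] += piece_end - pos
            pos = piece_end
    return dict(totals)


# Invented example: three alpha satellite annotations on one arm.
example = [(0, 400_000), (950_000, 1_020_000), (6_200_000, 6_200_500)]
blocks = alpha_per_block(example)
outside_first_mb = sum(bp for b, bp in blocks.items() if b > 1)
print(blocks, outside_first_mb)
```

An annotation that straddles a block boundary is split between the two blocks, so the per-block sums always add up to the total annotated length.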

Alpha satellite and other satellites were extracted using RepeatMasker. Alpha satellite was characterized as monomeric or higher-order using the dot-matrix program, DOTTER (Sonnhammer and Durbin 1995). Groups of monomers that appeared to have a higher-order structure by DOTTER (stringency of greater than or equal to 95% identical over 100 bp windows) were aligned using CLUSTALW (Thompson et al. 1994) and percent identity among higher-order repeats was determined (see below). As a complementary analysis, we made a database of 41 higher-order repeats reported in the literature and performed BLAST alignments vs. all alpha satellite in the July 2003 assembly (http://www.ncbi.nlm.nih.gov/BLAST/). Of these 41 known higher-order repeat families, only six (D2Z1, D7Z2, D8Z2, D17Z1-B, DXZ1, and DYZ3) were found in the assembly with alignments of greater than or equal to 97% identity. Thus, the current genome assembly is lacking most of the higher-order repeats previously reported in the literature.
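To make the windowed stringency criterion concrete, the following Python sketch scores a sequence against itself at a fixed offset and asks whether any 100 bp window reaches 95% identity. It is a simplified stand-in, not DOTTER: the comparison is ungapped, only one offset is tested at a time, and the function names and toy sequences are hypothetical.

```python
import random


def window_identity(seq, lag, window=100):
    """Yield (position, fraction identical) for each window compared with
    the window that lies `lag` bp downstream in the same sequence."""
    for i in range(0, len(seq) - lag - window + 1):
        a = seq[i:i + window]
        b = seq[i + lag:i + lag + window]
        matches = sum(x == y for x, y in zip(a, b))
        yield i, matches / window


def has_periodicity(seq, lag, window=100, threshold=0.95):
    """True if at least one window is >= threshold identical at this lag."""
    return any(frac >= threshold for _, frac in window_identity(seq, lag, window))


# Toy array: an invented 342 bp unit (two monomer lengths) repeated three times.
rng = random.Random(0)
unit = "".join(rng.choice("ACGT") for _ in range(342))
toy_array = unit * 3
print(has_periodicity(toy_array, lag=342))
```

In a real higher-order array the offset would be the repeat unit length, and a dot-matrix program examines all offsets at once; the sketch only shows what the 95%-over-100-bp threshold means for a single diagonal.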

Novel higher-order alpha satellite

Using DOTTER, we found four regions of higher-order alpha satellite not previously described in the literature in the assemblies of chromosomes 4, 10, 11, and 19. To determine percent identity among higher-order repeat units on a given chromosome arm assembly, we performed in silico restriction digests and aligned tandem higher-order repeat units using CLUSTALW. We extracted seven higher-order repeat units from the proximal q side of the centromere gap on the assembly, totaling over 15 kb in sequence. Based on CLUSTALW alignments among all possible pairwise comparisons of the seven units, percent identity ranged from 98.5-99.8%, with a mean of 99.1 +/- 0.3%. Two higher-order repeats from 10q were extracted and aligned, and they were 99.3% identical. We also found higher-order alpha satellite on the proximal p side of the centromere gap of . Six higher-order repeats were aligned via CLUSTALW, and their percent identity ranged from 97.4-99.8%, with a mean of 98.3 +/- 0.5%. Lastly, six higher-order repeats from chromosome 19p were 98.7-100% identical, with a mean of 99.3 +/- 0.4%.

Alpha satellite percent identity

Monomeric alpha satellite

To calculate the percent identity among monomeric alpha satellite monomers, we examined three regions of monomeric alpha satellite from the current assembly. Between 30 and 40 kb of sequence from regions including monomeric alpha satellite on chromosomes 3p, 15q and 17p were extracted from the July 2003 assembly using the UCSC browser (chr3:90232867-90263045, chr15:18260006-18300090, chr17:21904223-21937497). We used CLUSTALW to perform all pairwise alignments among monomers from a particular region. Upon alignment of all 461 monomers from the three regions (106030 alignments), pairwise percent identity ranged from 48.8-100%, with a mean of 71.6 +/- 8.3%.
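The all-against-all comparison can be illustrated with a short Python sketch. It assumes the sequences have already been aligned to equal length (CLUSTALW performed the alignments in the analysis above), and the identity definition used here, matching non-gap columns divided by columns that are not gaps in both sequences, is an assumption rather than the documented CLUSTALW score. The three toy sequences are invented.

```python
from itertools import combinations
from math import comb
from statistics import mean, stdev


def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences.
    Columns that are gaps in both sequences are ignored (assumed convention)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# n sequences give n*(n-1)/2 unordered pairs; 461 monomers give 106030.
print(comb(461, 2))

# Invented toy "monomers", already aligned.
monomers = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC"]
identities = [percent_identity(a, b) for a, b in combinations(monomers, 2)]
print(identities, round(mean(identities), 1), round(stdev(identities), 1))
```

The pair count confirms the figure quoted above: 461 × 460 / 2 = 106030 pairwise alignments.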

Higher-order alpha satellite

Percent identity among higher-order repeat units in the July 2003 assembly was also determined. The assemblies of chromosomes 4, 7, 8, 10, 11, 17, 19 and X contain typical higher-order alpha satellite as determined by DOTTER and subsequent CLUSTALW alignments. We analyzed 73 higher-order repeats from these chromosome assemblies, totaling over 200 kb of sequence. We used CLUSTALW to perform all pairwise alignments among higher-order repeats from each chromosome. Within chromosome arm contigs, higher-order repeat unit identity ranges from 97.5% +/- 0.5% (8q; n=9 higher-order repeats) to 99.3% +/- 0.4% (19p; n=6 higher-order repeats), with an overall average of 98.4 +/- 0.5% identical. More divergent higher-order alpha satellite is found on the assemblies of chromosomes 2, 4, 6, 7, 11, X and Y. We analyzed over 10 kb of divergent higher-order alpha satellite (from chromosomes 2, 6 and Y), and found higher-order repeat units to be 82.3-100% identical, with an average of 93.7% +/- 2.6% identical.

Alpha satellite and CENP-E colocalization

To achieve greater resolution than conventional FISH, we generated extended chromosomes by treating cells with ethidium bromide and mechanically stretching chromosomes with harsh cytospinning conditions. Extended chromosomes were prepared as described (Haaf and Ward 1994) and stained with antibodies to CENP-A or CENP-E as described (Harrington et al. 1997). The CENP-A antibody was provided by Manuel Valdivia (Valdivia et al. 1998) and the CENP-E antibody has been described previously (Harrington et al. 1997). BACs containing alpha satellite from the most proximal edges of the p and q arm contigs of chromosome 3 (RPCI-11 557B13, p; RPCI-11 124L3, q), chromosome 7 (RPCI-11 548K12, p; RPCI-11 435D24, q) and chromosome 12 (RPCI-11 191K23, p; RPCI-11 125N22, q) were hybridized to extended chromosomes stained with antibodies to CENP-E using described FISH conditions (Harrington et al. 1997). BAC RPCI-11 65I6, containing monomeric alpha satellite from the Xq contig, was hybridized to extended chromosomes stained with antibodies to CENP-E. BACs containing monomeric alpha satellite (RPCI-11 305L6, p; RPCI-11 846F4, p; RPCI-11 362P24, q) as well as D17Z1-B higher-order alpha satellite (RPCI-11 285M22, p) from chromosome 17 were hybridized to extended chromosomes stained with antibodies to CENP-A. Plasmids containing higher-order repeat units from chromosomes 8, 17 and the X chromosome were hybridized to extended chromosomes stained with antibodies to CENP-E. The plasmid p17H8 contains D17Z1 higher-order alpha satellite (Waye and Willard 1986), and the plasmid pBamX7 contains DXZ1 higher-order alpha satellite from the X chromosome (Willard et al. 1983). The higher-order repeat found on chromosome 8 (Ge et al. 1992) was subcloned from BAC RPCI-11 451D21 and confirmed by end sequencing. Between 35 and 40 metaphase spreads were scored for colocalization between each alpha satellite FISH probe and CENP-E.

CENP-A ChIP data analysis

Sequences from DNA immunoprecipitated with antibodies to tagged CENP-A (Vafa and Sullivan 1997) were kindly provided by Kevin Sullivan (Scripps Research Institute, La Jolla, CA) and were compared to alpha satellite in the reported assembly. Sixty-nine sequences from the CENP-A immunoprecipitation were aligned versus the July 2003 (Build 34) assembly, as well as a database of alpha satellite in the literature. The database contains 41 known higher-order repeats comprising over 59 kb of alpha satellite sequences. Thirty-five CENP-A associated sequences aligned to alpha satellite as the best BLAST hit (http://www.ncbi.nlm.nih.gov/BLAST/). Of these, 15/35 had high identity alignments (greater than or equal to 95% identical and >100 bp). All 15 of these sequences were higher-order alpha satellite, from chromosomes 1, 4, 8, 10, 13-21, 15, 17, 20 and X. None of the CENP-A associated sequences were determined to be monomeric alpha satellite, supporting the hypothesis that higher-order, not monomeric, alpha satellite is responsible for centromere function.
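The high-identity filter described above (at least 95% identical and longer than 100 bp) amounts to a simple predicate over alignment records. The Python sketch below shows one way to express it; the `Hit` record, its field names, and the example hits are hypothetical and do not reproduce the actual BLAST output or the CENP-A sequence set.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One best-hit alignment record (hypothetical layout)."""
    query: str
    subject: str
    percent_identity: float
    alignment_length: int  # bp


def high_identity(hits, min_identity=95.0, min_length=100):
    """Keep hits that are >= min_identity percent identical and > min_length bp."""
    return [
        h for h in hits
        if h.percent_identity >= min_identity and h.alignment_length > min_length
    ]


# Invented example records.
hits = [
    Hit("clone_a", "higher_order_repeat_1", 98.2, 240),
    Hit("clone_b", "monomeric_region_1", 81.0, 310),    # fails identity
    Hit("clone_c", "higher_order_repeat_2", 96.5, 100),  # fails length (not > 100)
    Hit("clone_d", "higher_order_repeat_3", 95.0, 101),
]
kept = high_identity(hits)
print([h.query for h in kept])
```

Note the asymmetry in the thresholds as stated in the text: identity is inclusive (greater than or equal to 95%), whereas length is strict (>100 bp).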

Non-centromeric alpha satellite

There are 133 blocks of alpha satellite greater than 5 Mb from the centromere gaps as identified by RepeatMasker. Although large blocks of alpha satellite could arise outside of the centromeric regions by inversions or other chromosome rearrangement mechanisms (Baldini et al. 1993; Yunis and Prakash 1982), we did not anticipate finding many small stretches of ectopic alpha satellite DNA. There are 60 blocks of alpha satellite < 1 kb in size, all > 5 Mb away from the centromere gaps reported in the July 2003 assembly. Of these, 13/60 lie in an intron of a gene in the Reference Sequence collection (http://www.ncbi.nih.gov/RefSeq), and 39/60 are within 10 bp of a transposable element; for this analysis, we included a 10 bp buffer to allow for discrepancies in RepeatMasker detection of alpha satellite and/or other repeats (31/39 are immediately adjacent to a transposable element). Of the 39 blocks of alpha satellite bordering transposable elements, 19 abutted a transposable element on both sides, totalling 58 alpha satellite edges next to a transposable element. Of these, 29/58 abutted an Alu element, 17/58 abutted a LINE element, 10/58 abutted an LTR, and 2/58 abutted a DNA element.
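The edge count follows from the block counts: blocks flanked on one side contribute one edge and blocks flanked on both sides contribute two. The short Python check below reproduces that arithmetic from the numbers quoted above; the variable names are illustrative only.

```python
blocks_near_te = 39   # blocks within 10 bp of a transposable element
both_sides = 19       # of those, blocks flanked on both sides
one_side = blocks_near_te - both_sides

# One edge per one-sided block, two edges per two-sided block.
edges = one_side + 2 * both_sides

edges_by_class = {"Alu": 29, "LINE": 17, "LTR": 10, "DNA": 2}
percent_by_class = {
    name: round(100 * count / edges, 1) for name, count in edges_by_class.items()
}
print(one_side, edges, percent_by_class)
```

The per-class counts sum to the same 58 edges, so Alu elements account for exactly half of the alpha satellite edges that border a transposable element.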

For validation purposes, we chose four small regions of non-centromeric alpha satellite and designed PCR primers flanking each region. Primers 578F (5'-CCAAAGTAGTCCAATCCATAG-3') and 578R (5'-AGGAACACATGCATATTCAGC-3') amplify a 113 bp region of alpha satellite on chromosome 5q34 that lies between a LINE and an Alu. Primers 738F (5'-ATCTGTACGTTCTGCCCATG-3') and 738R (5'-AGGTACCAATGGAGTGAGCC-3') amplify a 166 bp and a 99 bp region of alpha satellite flanked by LINEs, also on chromosome 5q34. Primers 389F (5'-AGTGAAGAGACATGTCCTTG-3') and 389R (5'-ACCTGCATGTTCTTCACACC-3') amplify a 38 bp region of alpha satellite next to a LINE on chromosome 2q37.3. PCR conditions were the same for each primer set (5 minute initial denaturation at 94°C followed by 35 cycles of: 94°C for 30 s, 55°C for 30 s and 72°C for 30 s). Each of these ectopic alpha satellite regions was validated by PCR in 20 unrelated individuals, and PCR products from two individuals were sequenced for each region. All 20 individuals were positive for each PCR reaction and the sequenced products agreed with the sequence in the July 2003 assembly in each case (data not shown).

Results

Alpha satellite in the genome assembly

Despite the difficulty of assembling alpha satellite and the lack of specific attention to the centromere regions for most chromosomes, a number of chromosome assemblies do include alpha satellite in their contigs. The July 2003 (Build 34) assembly (http://genome.ucsc.edu/) contains 6.6 Mb of alpha satellite (an estimated 10-fold underrepresentation (Eichler et al. 2004) (appendix)), of which 5.7 Mb lie within the most proximal megabase of each reported arm contig, adjacent to the centromere gaps. As expected, there is a sharp drop in alpha satellite content outside of the first megabase (Figure 2-1, Table 2-1). To annotate the major alpha satellite regions of the reported genome assembly, we thus focused on the most proximal megabase of each chromosome arm. Validation of the sequence assembly of centromeric regions remains an important goal for future work; nonetheless, general features of the reported contigs have been confirmed in several instances by long-range pulsed field gel mapping (Guy et al. 2003; Schueler et al. 2001). Alpha satellite content adjacent to the centromere gap varies widely among chromosomes (Fig. 2-2). Of the 43 chromosome arm assemblies examined (the five acrocentric chromosomes contain heterochromatic short arms and are not represented in the current genome assembly), nine have not reached any alpha satellite at all, suggesting that these contigs end a substantial distance away from the centromere. Only six chromosomes have >100 kb of alpha satellite assembled on both p and q arm contigs; the longest reported assembly on any chromosome is only 836 kb (Table 2-1), substantially less than the amount known to be located at each centromere on the basis of earlier molecular, cytogenetic and genomic studies (Alexandrov et al. 2001; Warburton and Willard 1996; Wevrick and Willard 1989). It is likely that this variation in coverage reflects the assembly progress on particular chromosomes rather than interchromosome differences in alpha satellite organization.

[Figure 2-1 plot. Y axis: amount of alpha satellite per block (kb), 0 to 800. X axis: blocks 1 to 10 and > 10.]

Figure 2-1. Alpha satellite location in the July 2003 (Build 34) human genome assembly. All chromosomes were divided into 1 Mb blocks starting from the centromere gap, excluding any other gaps in the chromosome assembly. Blocks are labeled 1 - 10 and > 10 corresponding to the distance from the centromere gap. Amount of alpha satellite is expressed per block on the Y axis (in kb), where each data point represents a megabase block on a different chromosome. Only blocks containing alpha satellite are plotted (no zero data points are shown).

Table 2-1. Alpha satellite in the July 2003 (Build 34) assembly of the human genome

chromosome  assembled  proximal p 1 Mb  proximal q 1 Mb  outside 1 Mb  unassembled  total
1  14039  14039  119036  133075
2  87192  53035  34157  87192
3  192123  147508  8642  35973  192123
4  66016  7272  26088  32656  66016
5  512981  401339  92797  18845  512981
6  159607  40137  118912  558  159607
7  835724  118511  707204  10009  835724
8  650524  358752  291633  139  179653  830177
9  245578  150285  95293  235452  481030
10  131485  130815  670  4611  136096
11  982406  606419  166190  209797  982406
12  629125  223565  405040  520  629125
13  2064  2064  2064
14  143  143  143
15  34846  34319  527  34846
16  479031  182749  296282  479031
17  129172  96712  32460  18984  148156
18  35610  26169  9441  35610
19  561238  160966  365924  34348  71823  633061
20  83305  69675  13554  76  83305
21  31322  31322  31322
22  6475  1006  5469  6475
X  607663  296468  310795  400  607663
Y  171600  101582  70018  171600
total (bp)  6649269  859776  7278828

Amount of alpha satellite in base pairs is listed for each chromosome. The amount of alpha satellite assembled in the proximal Mb on the p and q sides of the centromere gap (excluding clone gaps in the chromosome assembly) and also outside the first Mb is shown. Alpha satellite that has been assigned to a chromosome but is not part of a reported contig is listed as ‘unassembled’.
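As a cross-check on Table 2-1, the per-chromosome 'assembled' and 'unassembled' values can be summed and compared with the totals row. The Python sketch below does this using only numbers transcribed from the table; the dictionary layout is illustrative, and chromosomes with no 'unassembled' entry are treated as zero, which is an assumption consistent with their 'assembled' and 'total' values being equal.

```python
# 'assembled' column of Table 2-1 (bp), keyed by chromosome.
assembled = {
    "1": 14039, "2": 87192, "3": 192123, "4": 66016, "5": 512981,
    "6": 159607, "7": 835724, "8": 650524, "9": 245578, "10": 131485,
    "11": 982406, "12": 629125, "13": 2064, "14": 143, "15": 34846,
    "16": 479031, "17": 129172, "18": 35610, "19": 561238, "20": 83305,
    "21": 31322, "22": 6475, "X": 607663, "Y": 171600,
}

# 'unassembled' column (bp); chromosomes without an entry are taken as 0.
unassembled = {
    "1": 119036, "8": 179653, "9": 235452,
    "10": 4611, "17": 18984, "19": 71823,
}

total_assembled = sum(assembled.values())
total_unassembled = sum(unassembled.values())
grand_total = total_assembled + total_unassembled

# Per-chromosome totals, e.g. chromosome 8: 650524 + 179653.
per_chromosome_total = {
    chrom: bp + unassembled.get(chrom, 0) for chrom, bp in assembled.items()
}
print(total_assembled, total_unassembled, grand_total)
```

The sums reproduce the totals row of the table: 6649269 bp assembled (the 6.6 Mb cited in the text) and 7278828 bp overall once unassembled sequence is included.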

To characterize the types of alpha satellite in the current assembly, we used a combination of BLAST and DOTTER alignment tools (see Materials and Methods and Figure 2-3). By this analysis, >92% of alpha satellite in the current assembly is of the monomeric type, and only eleven chromosomes (chromosomes 2, 4, 6, 7, 8, 10, 11, 17, 19 and the X and Y) have reached higher-order alpha satellite (Fig. 2-2). Notably, even within this limited dataset, four of these assemblies contain previously undescribed families of higher-order alpha satellite (see Materials and Methods), suggesting that the complete set of centromeric repeats in the human genome has yet to be revealed.

In our analysis of alpha satellite in the current genome assembly, we found two categories of higher-order alpha satellite that differ in the degree and extent of sequence homogeneity (Figure 2-3). Within a region of higher-order alpha satellite on any one chromosome arm, the most homogeneous higher-order repeat units are 97-100% identical. In Build 34 of the current genome assembly, 9 chromosome arms have reached higher-order alpha satellite of this type; this totals ~200 kb of sequence. Our analysis of 73 higher-order repeat units shows that within chromosome arm contigs, higher-order repeat unit identity ranges from 97.5% +/- 0.5% (8q; n=9 higher-order repeats) to 99.3% +/- 0.4% (19p; n=6 higher-order repeats), with an overall average of 98.4 +/- 0.5% identical. This degree of homogeneity reflects the concerted evolution of higher-order repeat units and is consistent with previous estimates of intra-array sequence homogeneity in the human genome (Durfy and Willard 1989; Schindelhauer and Schwarz 2002; Schueler et al. 2001). Between different chromosome arrays, however, higher-order repeats are quite divergent, as well documented previously (Warburton and Willard 1996; Willard and Waye 1987b).

Figure 2-2. Genomic landscape of 1 Mb regions outside of the centromere gaps. The 1 Mb regions adjacent to the centromere gaps are depicted for the p and q arms of chromosomes 1-22, X and Y. Monomeric alpha satellite (blue), typical higher-order alpha satellite (red), and more divergent higher-order alpha satellite (pink, with asterisks), as well as other satellites (grey) are shown above the black line. Arrows depict orientation of alpha satellite monomers. Refseq genes (purple) and segmental duplications are illustrated below the black line. Segmental duplications 98-99% or > 99% identical to another region of the genome are shown as yellow and green boxes, respectively.

Other higher-order repeat units in the assembly lack the regular organization and consistent higher-order repeat length characteristic of highly homogeneous tandem arrays (Figure 2-3). Their less highly homogenized repeats, while clearly multimeric, are more divergent in both sequence and structure, with a pairwise mean identity of 93.7% +/- 2.6%. Seven chromosome assemblies contain this kind of higher-order alpha satellite, comprising ~100 kb of the current genome assembly. The nature of this second category of higher-order alpha satellite in the genome is itself likely heterogeneous. In some cases, these repeats correspond to diverged copies at the edges of an otherwise homogeneous array (Schueler et al. 2001); in other cases, they may represent vestiges of ancient arrays that are no longer present in the genome (or at least not represented in the currently assembled portion). In addition to higher-order repeats contained in long arrays, several assemblies contain evidence of very short (<1 - 10 kb) ‘islands’ characterized by local homogeneity (>98% identity between tandem multimers) within a region of otherwise monomeric alpha satellite (Figure 2-3). Such regions presumably reflect small numbers of recent homogenizing events, as predicted by evolutionary models (Dover 1982), and may thus represent the earliest stages of the appearance of new arrays.

Figure 2-3. Types of alpha satellite in the human genome. Four types of alpha satellite DNA are apparent in the current assembly of the human genome. DOTTER plots of 5 kb of alpha satellite compared to itself are shown for each type of alpha satellite. (a) Highly homogeneous higher-order alpha satellite is made up of multimeric repeat units that are 97-100% identical to one another. Higher-order repeat units are organized in tandem arrays that typically have a uniform repeat unit size and can span several Mb. (b) Other higher-order alpha satellite shows clear evidence of multimeric structure; however, these multimeric units are less regular and more divergent in sequence and are 93.7% +/- 2.6% identical on average. (c) Monomeric alpha satellite lacks any evidence of higher-order periodicity, and its monomers have an average pairwise percent identity of 71.6 +/- 8.3%. (d) Short zones of multimeric, highly homogeneous alpha satellite (<1 to 10 kb) have been found in the middle of larger expanses of monomeric alpha satellite. Although not part of a larger higher-order array, tandem repeat units within these zones are highly homogeneous (98-100% identical).

Only two chromosomes have reached arrays of highly homogeneous higher-order alpha satellite on both p and q arm contigs (Figure 2-2). Since all chromosomes are known to contain higher-order alpha satellite at their centromeres (Alexandrov et al. 2001; Warburton and Willard 1996), the fact that only the assemblies of chromosome 8 and the X chromosome have had this level of success indicates that most current assemblies terminate some distance from the functional centromere. In the two cases where there is higher-order alpha satellite on both p and q arms, the repeats are oriented in the same direction on both arms (Figure 2-2), consistent with them being part of the same homogeneous tandem array (Warburton and Willard 1996). In contrast, within the heterogeneous monomeric arrays, the orientation of alpha satellite typically switches several times within each arm contig (Schueler et al. 2001) (Figure 2-2).

Centromere Function

In addition to their distinct sequence organization, monomeric and higher-order alpha satellites also differ in their functionality. To assay the proximity of different types of alpha satellite to the functional centromere we looked for colocalization of alpha satellite sequences and antibodies to centromere proteins known to be present at active centromeres, CENP-A or CENP-E (Cleveland et al. 2003). We examined six chromosome assemblies that had reached either higher-order or monomeric alpha satellite at the most proximal edge of the centromere gap, chromosomes 3, 7, 8, 12, 17, and the X chromosome (see Figure 2-2).

The assemblies of chromosome 8 and the X chromosome are the only ones including higher-order alpha satellite on both p and q proximal contigs. Chromosomes 3 and 12 have reached monomeric alpha satellite, but not higher-order alpha satellite, in their assemblies. Two arrays of higher-order alpha satellite have been found on chromosomes 7 (Waye et al. 1987c; Wevrick and Willard 1991) and 17 (Rudd et al. 2004; Waye and Willard 1986); however, in the case of both chromosomes the genome assembly has reached only the smaller array. The two arrays on chromosome 7 have been mapped; D7Z1 higher-order alpha satellite ranges from 2 to 4 Mb among individuals, and the smaller array, D7Z2, is 100-550 kb and lies to the p side of D7Z1 (Haaf and Ward 1994; Wevrick and Willard 1991). The chromosome 7 assembly includes a BAC containing D7Z2 higher-order alpha satellite in the proximal q arm contig; however, our FISH experiments show that this BAC is actually located on the p side of the centromere, consistent with previous D7Z2 mapping studies (Haaf and Ward 1994; Wevrick and Willard 1991). This inconsistency likely reflects an error in the assembly process, as the BAC is flanked by gaps on both sides (Figure 2-2). The assembly of chromosome 17 has reached D17Z1-B higher-order alpha satellite on its proximal p arm (Rudd et al. 2004), but it has not walked into D17Z1 higher-order alpha satellite, the larger array.

Higher-order alpha satellites from chromosome 8 (D8Z2), chromosome 17 (D17Z1), and the X chromosome (DXZ1) were hybridized to extended chromosomes (see Methods) and colocalized perfectly with antibodies to centromere proteins (Figure 2-4A-C). However, the most proximal BAC containing monomeric alpha satellite on the q side of the DXZ1 array failed to colocalize with antibodies to centromere proteins (Figure 2-4D), consistent with previous functional studies of the X centromere (Schueler et al. 2001; Spence et al. 2002). Similarly, the most proximal alpha satellite BACs on the p and q sides of the centromere gaps of chromosomes 3, 7, 12 and 17 were physically distinct from the CENP signals (Figure 2-4E-H). BACs from chromosome arms 3p, 3q, 7p, 12p, 12q and 17q contain only monomeric alpha satellite, confirming the hypothesis that monomeric alpha satellite is not part of the functional centromere. BACs containing higher-order alpha satellites D7Z2 and D17Z1-B did not colocalize with antibodies to centromere proteins either, suggesting that the site of the functional centromere does not extend to these smaller arrays. This finding is consistent with experiments involving a deleted chromosome 17 (Wevrick et al. 1990). The deleted chromosome 17 has been shown to behave normally in a chromosome segregation assay (see Chapter 4) even though it is missing the entire D17Z1-B array. Overall, these data suggest that the site of the active centromere on each of the chromosomes examined is restricted to higher-order alpha satellite, and in the case of chromosomes with more than one higher-order array, only the larger array participates in centromere function.

Figure 2-4. Alpha satellite and centromere protein colocalization. Alpha satellite FISH probes are shown in red and antibodies to centromere proteins stain green. Higher-order alpha satellites from chromosomes 8 (A), 17 (B), and the X chromosome (C) colocalize with antibodies to CENP-E. (D) BAC RPCI-11 65I6 from the Xq contig does not colocalize with CENP-E antibodies. (E) BAC RPCI-11 557B13 from the chromosome 3p contig and (F) BAC RPCI-11 124L3 from the chromosome 3q contig do not colocalize with CENP-E antibodies. (G) BAC RPCI-11 305L6 from the chromosome 17p contig and (H) BAC RPCI-11 362P24 from the chromosome 17q contig do not colocalize with antibodies to CENP-A.

As an alternative approach to assess the relationship between types of alpha satellite and centromere function, we analyzed sequences immunoprecipitated with antibodies to the centromere protein CENP-A. Vafa and Sullivan identified sequences associated with CENP-A in HeLa cells (Vafa and Sullivan 1997) and we compared these to all alpha satellite sequences from the July 2003 (Build 34) genome assembly. Among the alpha satellite sequences with high identity alignments between the two data sets, only higher-order alpha satellite was found to be associated with CENP-A (see Methods), extending earlier studies (Ando et al. 2002; Vafa and Sullivan 1997), and supporting the centromere protein colocalization data. None of the monomeric alpha satellite present in the current assembly appears to be associated with centromere function by this analysis. These data support the model that higher-order alpha satellite, and not monomeric alpha satellite, is the site of the active centromere.

Non-centromeric alpha satellite

In addition to the blocks of alpha satellite adjacent to the centromere gaps in the genome assembly, there are also smaller regions of alpha satellite that do not appear to be near centromeres. In total, we found 133 blocks of alpha satellite located >5 Mb away from the centromere gaps. While the largest of these could represent ancient inversions or other chromosomal rearrangements involving centromere regions (Baldini et al. 1993; Yunis and Prakash 1982), there are 60 blocks containing <1 kb of alpha satellite. Such alpha satellite blocks could represent assembly errors, library contamination, or real occurrences of alpha satellite far away from the centromere. Two lines of evidence argue for the legitimacy of at least some of these small blocks of alpha satellite. First, we validated a subset of the blocks by PCR and sequencing in 20 unrelated individuals (see Materials and Methods), indicating that at least these segments of non-centromeric alpha satellite are legitimate. Second, 39 of the 60 blocks lie within 10 bp of a transposable element, consistent with their spread to non-centromeric locations via a transduction mechanism or an unequal crossover event involving Alu or L1 sequences adjacent to alpha satellite (Deininger et al. 2003). Interestingly, 13 such blocks of non-centromeric alpha satellite were found within introns of validated genes, demonstrating that at least small stretches of alpha satellite are not detrimental to .

These data support three possible mechanisms for movement of small blocks of alpha satellite throughout the genome. First, alpha satellite could mobilize through unequal crossover events within alpha satellite that give rise to an episome, followed by integration of the episome into a new genomic location. There is evidence of such alpha satellite episomes existing in cultured cell lines (Jones and Potter 1985; Krolewski et al. 1984; Kunisada and Yamagishi 1987). Second, alpha satellite could move via an L1-mediated transduction mechanism whereby alpha satellite flanking an L1 is transduced to a new location during the L1 retrotransposition event (Goodier et al. 2000; Pickeral et al. 2000). Third, given that clustered LINEs and Alus are located within stretches of monomeric alpha satellite outside of higher-order arrays, alpha satellite could also move through an unequal crossover event involving LINEs or Alus (Deininger and Batzer 1999; Deininger et al. 2003). The occurrence of these types of events would produce small stretches of alpha satellite outside of their normal centromeric locations, either next to transposable elements or alone.

Centromeric landscape

Centromeric or pericentromeric regions have been described historically as home to the genome’s “junk DNA” (Doolittle and Sapienza 1980; Orgel and Crick 1980). Recent studies have indicated a relatively sharp transition between the euchromatin of chromosome arms and the satellite-containing region near the centromere (Guy et al. 2003; Horvath et al. 2000; Schueler et al. 2001), raising the possibility that some genes are located quite close to alpha satellite. Indeed, there are 104 genes listed in the Reference Sequence collection (http://www.ncbi.nih.gov/RefSeq) within the most proximal 1 Mb regions of the 43 chromosome arm contigs (Figure 2-2), an average gene density of 2.5 genes/Mb. While this density is lower than the genome-wide average of ~7.5 genes/Mb, it is not substantially different from densities reported for some entire (gene-poor) chromosomes, such as chromosomes 13 (Dunham et al. 2004) and 21 (Hattori et al. 2000).

The most proximal segments of the chromosome arms are also rich in segmental duplications (She et al. 2004), potentially accounting in part for the difficulty of assembling these regions (Bailey et al. 2001). In the current assembly, 14.9% of the most proximal 1 Mb regions are part of segmental duplications >98% identical to another region of the genome (Figure 2-2). An emerging model is that segments rich in segmental duplications define some of the pericentromeric regions of the genome distal to alpha satellite, while the centromeric region itself is made up of alpha satellite and is expected to be largely devoid of such duplications.

Other repeats besides alpha satellite are also enriched at the centromere. Using RepeatMasker (http://repeatmasker.genome.washington.edu), we examined the repeat content of the combined 43 most proximal megabases adjacent to the centromere gaps, compared to the genome average. The genome as a whole and the most centromere proximal regions were 49% and 64% repetitive, respectively (Table 2-2). This enrichment in repeat content near the centromere gaps is almost entirely due to a >40-fold increase in satellite DNAs. Although alpha satellite makes up the majority of the satellite sequences that are enriched near the centromere gaps, other satellites are also significantly more frequent at the edges of the contigs as compared to the genome average (Table 2-2). These other types of satellite sequence lie just distal of alpha satellite or in some cases are interspersed among blocks of monomeric alpha satellite (Guy et al. 2003; Schueler et al. 2001) (Figure 2-2). Like segmental duplications, a high density of these non-alpha satellites may be a feature of the pericentromere rather than the centromere sensu stricto.

Table 2-2. Repeat content of the July 2003 (Build 34) human genome assembly

Repeat                    Genome Avg    Proximal 1 Mb
LINE                      21.26%        22.20%
SINE                      13.72%        9.19%
LTR                       8.72%         9.84%
DNA Elements              3.03%         1.81%
Small RNAs                0.04%         0.05%
Simple Repeats            0.92%         0.92%
Low Complexity            0.58%         0.53%
Unknown                   0.01%         0.01%
Other                     0.14%         0.22%
Satellites                0.43%         18.76%
  Alpha Satellite         0.26%         13.99%
  Gamma Satellite         0.01%         0.59%
  Beta Satellite          0.04%         0.53%
  Human Satellite 4       0.01%         0.52%
  CER DNA                 0.01%         0.50%
  Human Satellite 2       0.01%         0.49%
  (GATTG)n                <0.01%        0.46%
  (CATTC)n                0.01%         0.37%
  Gamma Satellite 2       0.01%         0.32%
  SST1                    0.01%         0.30%
  SATR1                   0.03%         0.23%
  Gamma Satellite X       <0.01%        0.21%
  REP522                  0.01%         0.12%
  Acromeric Satellite     <0.01%        0.05%
  Human Satellite 5       <0.01%        0.02%
  TAR1 DNA                <0.01%        0.02%
  D20S16 DNA              <0.01%        0.01%
  SAR DNA                 <0.01%        0.01%
  SATR2 DNA               0.01%         0.01%
  Human Satellite 6       <0.01%        0.01%
  Human Satellite 1       <0.01%        <0.01%
  LSAU DNA                <0.01%        <0.01%
  SUBTEL_SA DNA           <0.01%        <0.01%

Repeat content of the full July 2003 genome assembly compared to 43 Mb of the genome corresponding to the pooled 1 Mb proximal segments of each chromosome arm studied.
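The fold enrichment quoted for satellite DNA can be rechecked from the percentages in Table 2-2. The following is a minimal sketch, not part of the original analysis; the dictionary and the helper name `fold_change` are illustrative, and only a subset of the table rows is copied in.

```python
# Percent of sequence occupied by each repeat class, copied from Table 2-2.
# Tuple order: (genome average, pooled proximal 1 Mb segments).
TABLE_2_2 = {
    "LINE": (21.26, 22.20),
    "SINE": (13.72, 9.19),
    "LTR": (8.72, 9.84),
    "DNA Elements": (3.03, 1.81),
    "Satellites": (0.43, 18.76),
    "Alpha Satellite": (0.26, 13.99),
}


def fold_change(genome_pct, proximal_pct):
    """Return the proximal percentage divided by the genome-wide percentage.

    Hypothetical helper written for this illustration.
    """
    if genome_pct <= 0:
        raise ValueError("genome percentage must be positive")
    return proximal_pct / genome_pct


fold_changes = {name: fold_change(*pcts) for name, pcts in TABLE_2_2.items()}

for name, value in fold_changes.items():
    print(f"{name:16s} {value:6.2f}-fold")
```

Under these numbers the satellite class is the only one whose ratio departs far from 1, which is the pattern the text describes; SINEs are slightly depleted near the centromere gaps.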

Future

The centromere is a critical functional part of our genome; however, its complex repetitive organization and the assumption that it contains nothing but “junk DNA” made it a logical region to omit from assembly strategies that can be frustrated by high levels of sequence homogeneity and/or the extensive polymorphism that has been described for alpha satellite arrays (Schueler et al. 2001; Warburton and Willard 1996; Wevrick and Willard 1989). Given the current status of the assembly as analyzed here, the next phase of genome and centromere annotation might consider a targeted strategy to complete the contigs of each chromosome arm until they reach higher-order arrays of alpha satellite associated with centromere function (appendix; Schueler et al. 2001). Such a strategy, similar in some respects to that used successfully to assemble the highly complex and repetitive Y chromosome sequence (Skaletsky et al. 2003), could build on the evident heterogeneity of monomeric repeats (Horvath et al. 2000; Schueler et al. 2001; Wevrick et al. 1992) and the periodic variants that punctuate the otherwise homogeneous arrays of higher-order alpha satellite (Warburton and Willard 1996). As in the case of the Y chromosome, the correct assembly of centromeric regions would likely be facilitated by sequencing a single haplotype, as polymorphisms in alpha satellite could confound the assembly process. Such an assembly, in concert with parallel analyses at human telomeres (Riethman et al. 2004), will both provide the underlying sequence data necessary for a full annotation of elements required for human chromosome structure and function and move the genome one step closer to true completion.

Acknowledgements

We thank E. Eichler for providing access to data; E. Eichler, M. Schueler and D. Ledbetter for helpful discussions; and Patrick McConnell for assistance. This work was supported by a research grant from the March of Dimes Birth Defects Foundation and by the Duke University Institute for Genome Sciences & Policy. MKR is a predoctoral student at Case Western Reserve University, Cleveland, Ohio.

Chapter 3

Alpha satellite evolution in primates: evidence for the homogenization of monomeric alpha satellite

M. Katharine Rudd and Huntington F. Willard

Note: This manuscript is in preparation.

Abstract. Alpha satellite DNA is a family of tandemly repeated sequences found at all normal human centromeres. In addition to its role in centromere function, alpha satellite is also a model for concerted evolution, as alpha satellite repeats are more similar within a species than between species. All alpha satellite is made up of ~171 bp monomers; however, there are two types of alpha satellite in the human genome. Monomers may be arranged in extremely homogeneous higher-order repeat units, or lack any higher-order periodicity and exist as more divergent monomeric alpha satellite. In this study, we focused on the chromosome 17 centromeric region that has reached both higher-order and monomeric alpha satellite in the human genome assembly. Monomeric and higher-order alpha satellite on chromosome 17 are phylogenetically distinct, consistent with a model in which higher-order alpha satellite evolved independently of monomeric alpha satellite. We also analyzed the monomeric alpha satellite on six different chromosomes and found that monomers on the same chromosome were more similar than monomers on different chromosomes. Thus, like higher-order alpha satellite, monomeric alpha satellite undergoes higher rates of intrachromosomal exchange than interchromosomal exchange. Comparative analysis between human chromosome 17 and the orthologous chimpanzee chromosome indicates that monomeric alpha satellite is evolving at approximately the same rate as the adjacent non-alpha satellite DNA. However, orthologous higher-order alpha satellite is less conserved, suggesting different evolutionary rates for the two types of alpha satellite. This study is the first in-depth analysis of monomeric alpha satellite and provides valuable information about the mechanisms of satellite evolution.

Introduction

All alpha satellite DNA is made up of tandem monomers, approximately 171 bp each (Manuelidis and Wu 1978; Willard and Waye 1987b). As defined by monomer organization, there are two major types of alpha satellite, higher-order and monomeric (Alexandrov et al. 2001; Warburton and Willard 1996) (Chapter 2). Higher-order alpha satellite is made up of monomers arranged in multimeric repeat units that are highly identical from repeat unit to repeat unit. These higher-order repeat units are positioned head to tail to make up an array of extremely homogeneous higher-order alpha satellite several megabases in size. In contrast, monomeric alpha satellite lacks any higher-order periodicity, and monomeric monomers are far less homogeneous than are higher-order repeat units (Chapter 2). All human centromeres contain large arrays of higher-order alpha satellite (Alexandrov et al. 2001; Warburton and Willard 1996), and, where investigated, these arrays are found to be bordered by more heterogeneous monomeric alpha satellite (Guy et al. 2003; Horvath et al. 2000; Schueler et al. 2001; Wevrick et al. 1992) (Chapter 2). This adjacent organization of higher-order and monomeric alpha satellite, as well as the fact that more distant primates have only monomeric alpha satellite at their centromeres (Alves et al. 1994; Maio et al. 1981; Musich et al. 1980; Rosenberg et al. 1978; Thayer et al. 1981), has led to the hypothesis that higher-order alpha satellite evolved from monomeric alpha satellite (Alexandrov et al. 2001; Warburton and Willard 1996).

Our study of centromere genomics and evolution relies on the alpha satellite assembled in the most recent build of the human genome assembly. Despite its functional significance, the centromere has been largely omitted from the human genome assembly (Eichler et al. 2004) (Chapter 2). In fact, for each chromosome assembly there exists a centromere gap located at the edges of the most proximal p and q arm contigs. The amount of alpha satellite included in the July 2003 genome assembly (Build 34) is less than one-tenth the amount of alpha satellite estimated to be in the genome based on cytogenetic and mapping studies (Alexandrov et al. 2001; Wevrick and Willard 1989). Nonetheless, the chromosome assemblies that have reached an appreciable amount of alpha satellite are an excellent resource to begin to address questions of centromere biology and evolution. We have characterized the types of alpha satellite in the genome assembly (Chapter 2) and have used these sequences to investigate the organization and evolutionary history of alpha satellite in primates.

Like other tandem satellite families (Brown et al. 1972; Coen et al. 1982; Southern 1975), alpha satellite is subject to concerted evolution, exhibiting greater sequence identity within a species than between species (Willard and Waye 1987b). For example, higher-order repeat units from an array on a particular chromosome are more similar to each other than to the orthologous repeats in another species (Durfy and Willard 1990; Jorgensen et al. 1987). The mechanism by which concerted evolution occurs is known as molecular drive, an evolutionary process in which variants are able to quickly spread through a sequence family and fix in a population (Dover 1982). Molecular drive operates within and between chromosomes and includes mechanisms such as unequal crossing-over, gene conversion, and transposition (Dover 1982). Although all of these processes may be participating in alpha satellite evolution, the homogenization of tandem sequences can best be explained by unequal crossing-over. Smith proposed a three-step mechanism to explain the emergence of tandem satellite repeats via unequal crossing-over (Smith 1976). The first step is a mutation that creates short local homology between two regions of a given sequence. In the second step, an unequal crossover event occurs between the two regions of homology, generating two products, a deletion and a tandem duplication. In the third step, subsequent unequal crossovers between the duplicated repeats produce expansions and contractions in the number of tandem repeats. As the number of tandem repeats increases, so will the number of sites of homology, increasing the frequency of unequal crossovers within tandem repeats. Recurring crossovers will homogenize a subset of tandem repeats, leading to highly identical repeat units.
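The second and third steps of this mechanism can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions: each sequence block is treated as a labeled token, the breakpoints are chosen by hand rather than by homology search, and the function name `unequal_crossover` is ours rather than anything defined by Smith (1976).

```python
def unequal_crossover(chromatid_a, chromatid_b, break_a, break_b):
    """Exchange two arrays of sequence blocks at misaligned breakpoints.

    break_a and break_b are the numbers of blocks kept from the left end
    of each chromatid. When they differ, one recombinant gains blocks and
    the other loses the same number.
    """
    if not 0 <= break_a <= len(chromatid_a):
        raise ValueError("break_a is outside chromatid_a")
    if not 0 <= break_b <= len(chromatid_b):
        raise ValueError("break_b is outside chromatid_b")
    recombinant_1 = chromatid_a[:break_a] + chromatid_b[break_b:]
    recombinant_2 = chromatid_b[:break_b] + chromatid_a[break_a:]
    return recombinant_1, recombinant_2


# Step 2: two sister chromatids carrying blocks A-B-C-D pair out of
# register, producing a tandem duplication of B-C and a deletion of B-C.
sister = list("ABCD")
duplication, deletion = unequal_crossover(sister, sister, 3, 1)

# Step 3: the duplicated B-C units offer new sites of homology, so a
# further out-of-register exchange expands one array and contracts the other.
expanded, contracted = unequal_crossover(duplication, duplication, 5, 3)

print("".join(duplication), "".join(deletion))
print("".join(expanded), "".join(contracted))
```

The total number of blocks across the two recombinants is conserved in each exchange; only their distribution between the products changes, which is why repeated exchanges produce both expansions and contractions.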

Alpha satellite likely evolves by chromosomal exchange at several levels (Warburton and Willard 1996). Higher-order repeat units on different chromosomes have related organizations (Choo et al. 1989; Waye and Willard 1986; Willard and Waye 1987a); in fact, monomers within higher-order alpha satellite on all chromosomes except the Y chromosome fall into one of three suprachromosomal families (Alexandrov et al. 1993). The relationships among monomers from higher-order repeat units on different chromosomes suggest that interchromosomal exchanges must have occurred at least at some frequency (Warburton and Willard 1996). Intrachromosomal exchanges between homologous chromosomes also occur, giving rise to highly homogeneous chromosome-specific arrays of higher-order alpha satellite (Durfy and Willard 1989; Schindelhauer and Schwarz 2002; Warburton and Willard 1996; Willard and Waye 1987b). However, the fastest mechanism of alpha satellite exchange occurs between sister chromatids. Variation in higher-order repeat unit length within an array among individual chromosomes 17 suggests that these variants evolve along haplotypic lineages that have arisen relatively recently (Warburton and Willard 1990; Warburton and Willard 1995). Thus, alpha satellite evolves through unequal crossover events between different chromosomes, homologs of the same chromosome, and sister chromatids.

As predicted by other models of satellite evolution (Smith 1976; Strachan et al. 1982), unequal crossover events can also homogenize a subset of tandem repeats into a larger repeat unit that can then expand into a highly identical array. Greater homology among repeat units may make them a better substrate for unequal crossovers than more divergent tandem repeats (Southern 1975). These forces likely played a role in the evolution of higher-order alpha satellite from monomeric alpha satellite. The presence of monomeric alpha satellite and absence of higher-order arrays in more distant primates (Alves et al. 1994; Maio et al. 1981; Musich et al. 1980; Rosenberg et al. 1978; Thayer et al. 1981) supports a mechanism by which monomeric alpha satellite was homogenized into higher-order repeat units around the time of the emergence of the great apes (Alexandrov et al. 2001; Baldini et al. 1991; Durfy and Willard 1990; Warburton et al. 1996). The edges of these homogeneous arrays are predicted to contain more heterogeneous repeats that were not part of the unequal crossover events that created higher-order arrays, and that are likely more similar to the ancestral repeats before the emergence of the array (Smith 1976; Strachan et al. 1982).

Here, as a model of alpha satellite evolution, we have examined the alpha satellite assembled at the chromosome 17 centromere. The current assembly has reached higher-order alpha satellite on the p side of the centromere gap in addition to monomeric alpha satellite on both arms, thus providing multiple regions of alpha satellite for our study of sequence evolution in the centromeric region.

Materials and Methods

D17Z1-B array estimate

To determine the size of the D17Z1-B array relative to D17Z1, we analyzed three chromosomes 17 in which the D17Z1 array had been previously mapped. The D17Z1 arrays on chromosomes 17 in hybrid cell lines L65-14A, LT23-4C and L745 are 3.7 Mb, 3.3 Mb and 2.8 Mb, respectively (Warburton and Willard 1990). The D17Z1 array is predominantly made up of 16 monomer (16mer) higher-order repeat units; however, repeat unit length variants do exist. D17Z1 higher-order repeat units are highly identical, even among those with variant lengths, and the mean sequence divergence is 1.78% (Warburton and Willard 1995). In contrast, D17Z1-B higher-order alpha satellite has only been found to exist as a 14 monomer (14mer) repeat unit, and D17Z1-B higher-order alpha satellite is only 92% identical to D17Z1 higher-order alpha satellite. Using very stringent Southern washing conditions (68°C, 0.5% SDS/0.1X SSC), we were able to distinguish D17Z1 from D17Z1-B and calculate the amount of D17Z1-B relative to D17Z1.

We digested genomic DNA from hybrid cell lines L65-14A, LT23-4C and L745 with EcoRI. Both D17Z1 and D17Z1-B contain EcoRI sites that digest each array into higher-order repeat units. We divided the digested DNA into two blots, probing one with the D17Z1 higher-order repeat unit (Waye and Willard 1986) and probing the other with the D17Z1-B higher-order repeat unit (Rudd et al. 2004). We used high stringency Southern conditions (Willard et al. 1983) to differentiate repeats from the two higher-order arrays. As described previously, the D17Z1 probe hybridized to bands corresponding to 16mers, 15mers, 14mers, 13mers, 12mers, 11mers and 9mers (Warburton and Willard 1990), whereas the D17Z1-B probe only hybridized to a 14mer sized band. Using a phosphorimager, we calculated the pixel intensity of all the bands. The ratio of the D17Z1-B 14mer band to the sum of the D17Z1 bands allowed us to estimate the relative size of the D17Z1-B array. Based on the known sizes of the D17Z1 arrays from the chromosomes 17 in cell lines L65-14A, LT23-4C and L745, we thus estimated the D17Z1-B arrays to be approximately 930 kb, 560 kb and 500 kb, respectively.
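The arithmetic behind this estimate can be sketched as follows. This is an illustration only, assuming that band signal scales linearly with array size and that the two probes hybridize with comparable efficiency; the helper name `estimate_array_kb` and the pixel intensities passed to it are hypothetical, since the measured intensities are not reported here.

```python
# D17Z1 array sizes (kb) from Warburton and Willard 1990 and the D17Z1-B
# estimates reported in the text, keyed by hybrid cell line.
D17Z1_KB = {"L65-14A": 3700, "LT23-4C": 3300, "L745": 2800}
D17Z1B_KB = {"L65-14A": 930, "LT23-4C": 560, "L745": 500}


def estimate_array_kb(d17z1b_signal, d17z1_signals, d17z1_kb):
    """Scale a known D17Z1 array size by the D17Z1-B : D17Z1 signal ratio.

    d17z1_signals holds one intensity per D17Z1 band (16mer, 15mer, ...).
    Hypothetical helper written for this illustration.
    """
    total = sum(d17z1_signals)
    if total <= 0:
        raise ValueError("summed D17Z1 signal must be positive")
    return d17z1_kb * d17z1b_signal / total


# Signal ratios implied by the reported sizes (back-calculated, not measured).
implied_ratio = {line: D17Z1B_KB[line] / D17Z1_KB[line] for line in D17Z1_KB}

# Made-up pixel intensities in arbitrary units, for demonstration only.
example_kb = estimate_array_kb(250.0, [700.0, 150.0, 100.0, 50.0], 3700)

for line, ratio in implied_ratio.items():
    print(f"{line}: D17Z1-B is about {ratio:.2f} of the D17Z1 array")
print(f"example estimate: {example_kb:.0f} kb")
```

The back-calculated ratios suggest that D17Z1-B is roughly one-sixth to one-quarter the size of D17Z1 on the three chromosomes examined.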

Sequence Alignments and Phylogenetic Analysis

We used the UCSC browser (http://genome.ucsc.edu) (Kent et al. 2002) to extract sequences from the July 2003 assembly (Build 34) of the human genome and the November 2003 assembly (Build 1) of the Pan troglodytes genome. To identify individual alpha satellite monomers from the human chromosome 17 and the chimpanzee centromeres, we RepeatMasked (http://repeatmasker.genome.washington.edu) the sequences 1 Mb proximal on the p and q sides of the centromere gaps using a custom RepeatMasker library containing alpha satellite consensus monomers from the five suprachromosomal families (Alexandrov et al. 2001) as well as the alpha satellite monomers included in the default RepeatMasker library (Smit 1999). We isolated 617 monomers from the human chromosome 17 assembly and 308 monomers from the chimpanzee chromosome 19 assembly. We also extracted monomers from the most distal regions of monomeric alpha satellite on the p arms of the X chromosome (Schueler et al. 2001) and chromosome 8. 41 kb from Xp (UCSC position chrX:57050000-57091000) and 20.3 kb from 8p (UCSC position chr8:43444700-4346500) were RepeatMasked to isolate 85 and 104 monomers, respectively. We used CLUSTALW (Thompson et al. 1994) to compute all pairwise alignments among monomers from human chromosomes 8, 17 and the X chromosome. Pairwise percent identity scores were translated into particular color values to generate the heatmap shown in Figure 3-2 using Spotfire version 6.0 (Somerville, MA) (Dresen et al. 2003). Monomers from chimpanzee chromosome 19 and clone PTR219 (Warburton et al. 1996) were compared to monomers from human chromosome 17 using CLUSTALW, to yield the orthologous percent identity values in Figure 3-7.
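A percent identity score of the kind used for the heatmap can be derived from gapped, aligned sequences as sketched below. This is an assumption-laden illustration rather than the CLUSTALW calculation itself: identity is defined here as matches divided by the number of columns in which both sequences carry a base, the function names are ours, and the short toy sequences are invented.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical bases over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def identity_matrix(aligned):
    """Build a symmetric matrix of pairwise scores from one shared alignment.

    aligned maps a monomer name to its gapped sequence.
    """
    names = list(aligned)
    matrix = {name: {name: 100.0} for name in names}
    for name_a, name_b in combinations(names, 2):
        score = percent_identity(aligned[name_a], aligned[name_b])
        matrix[name_a][name_b] = score
        matrix[name_b][name_a] = score
    return matrix


# Invented toy sequences; real alpha satellite monomers are ~171 bp.
toy_alignment = {
    "mono_1": "CATTCTCAGAAACTTC",
    "mono_2": "CATTCTCAGAAACTTC",
    "mono_3": "CATTC-CAGTAACTTC",
}
scores = identity_matrix(toy_alignment)
print(f"{scores['mono_1']['mono_3']:.1f}")
```

Other conventions divide by the full alignment length or by the shorter sequence, so absolute values depend on the definition chosen; comparisons are meaningful only when one convention is applied throughout.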

To generate the chromosome 17 phylogenetic tree (Figure 3-3) we isolated higher-order and monomeric monomers. The 617 monomers from the human chromosome 17 assembly were added to the 16 monomers making up D17Z1 higher-order alpha satellite, one African Green Monkey monomer, and seven monomers from the BAC ends of BACs spanning D17Z1 and D17Z1-B (RPCI-11 5B18 and RPCI-11 449A3). The 641 total monomers were aligned using CLUSTALW and subsequent MEGA (Molecular Evolutionary Genetics Analysis, version 2.1, http://www.megasoftware.net) phylogenetic analyses were performed (Kumar et al. 2001). Neighbor-joining methods were employed with pairwise deletion parameters and 1000 bootstrap iterations.

For the interchromosomal neighbor-joining tree (Figure 3-4), we used the monomeric alpha satellite from chromosome 8, 17 and the X chromosome as described above. We added monomers from D8Z2, D17Z1, D17Z1-B and DXZ1 higher-order alpha satellite for a total of 760 monomers. MEGA phylogenetic analysis was performed as described for the chromosome 17 tree. We confirmed the topology of the interchromosomal tree by repeating the analysis using maximum likelihood methods (Figure 3-5). Starting with the same 760 monomers, we generated a maximum likelihood tree using PAUP 4.0 (http://paup.csit.fsu.edu/downl.html). The initial tree was generated using neighbor-joining methods, and then we performed branch-swapping with a nearest-neighbor interchange of 3.

To calculate the pairwise percent identity scores of monomeric alpha satellite monomers in Figure 3-6, we first extracted monomers from the UCSC browser using RepeatMasker as described above. We isolated 10018 monomeric monomers from the assemblies of chromosomes 3 (n = 997), 8 (n = 3024), 11 (n = 4884), 15 (n = 193), 17 (n = 495), and 20 (n = 425). Alpha satellite from these chromosomes was chosen for this analysis because the chromosomes contain higher-order alpha satellite from each of the three major suprachromosomal families (Alexandrov et al. 1988; Willard and Waye 1987b). Proximal monomeric alpha satellite from 17p (M3) was not included in the calculations due to its unusual increase in monomer percent identity (see Results). Pairwise alignments between all monomers were performed using the Needleman-Wunsch algorithm (Needleman and Wunsch 1970). We used a Kruskal-Wallis test to determine if the means from the intrachromosomal comparisons fit the same distribution as the mean from the interchromosomal comparisons. This is a non-parametric test that does not assume a normal distribution of data. The distributions of pairwise comparisons in Figure 3-6 were graphed using the graphics program R, version 1.8.0 (Ihaka and Gentleman 1996).
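The rank-based comparison described above can be sketched in a few lines. This is a minimal illustration, not the analysis that was run: it computes the Kruskal-Wallis H statistic without a tie correction, approximates the P value with a chi-square distribution on one degree of freedom (valid only when exactly two groups are compared), and uses invented percent identity values. All function names are ours.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, without the correction for ties."""
    if any(len(group) == 0 for group in groups):
        raise ValueError("every group needs at least one observation")
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def p_value_two_groups(h_statistic):
    """Upper tail of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(h_statistic / 2.0))


# Invented percent identity values, for demonstration only.
intrachromosomal = [82.0, 85.0, 88.0, 90.0]
interchromosomal = [70.0, 72.0, 75.0, 78.0]

h_stat = kruskal_wallis_h(intrachromosomal, interchromosomal)
p_val = p_value_two_groups(h_stat)
print(f"H = {h_stat:.3f}, P = {p_val:.4f}")
```

The chi-square approximation is crude for groups this small, and data containing many tied scores would call for the tie-corrected statistic provided by a statistics package.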

Junction PCR

PCR primers for the junction between D17Z1-B and proximal monomeric alpha satellite were designed to amplify only the junction fragment and not other alpha satellite. Primers jxnF (5’ CAGATTCTACAACAAGGGTG 3’) and jxnR (5’ GATGTATGCATTCATCACAG 3’) amplify a 298 bp product under high stringency conditions (5 minute initial denaturation at 94°C followed by 30 cycles of: 94°C for 30 s, 60°C for 30 s and 72°C for 20 s). Genomic DNA from individuals from five diverse populations was purchased from Coriell Cell Repositories (Camden, NJ) and amplified using the junction PCR primers. DNAs from 10 Europeans, 7 Africans North of the Sahara, 9 Africans South of the Sahara, 7 Pacific Islanders and 10 Chinese were tested using this PCR assay, and PCR products from one individual from each population were sequenced.

Results

Genomic organization of the chromosome 17 centromeric region

To better understand the organization and evolution of centromeric sequences, we focused on the chromosome 17 centromeric region. Alpha satellite from chromosome 17 has been well characterized in numerous studies (Warburton and Willard 1990; Warburton and Willard 1995; Waye and Willard 1986; Wevrick and Willard 1989) and is relatively well represented in the current genome assembly, having reached regions of monomeric and higher-order alpha satellite (Chapter 2). Mapping studies have shown that the higher-order array D17Z1 is approximately 3 Mb in size (Warburton and Willard 1990). The July 2003 genome assembly (Build 34) has not reached D17Z1 on either the p or q arm contigs of the chromosome 17 assembly. However, the 17p arm contig terminates in a distinct, yet related type of higher-order alpha satellite, D17Z1-B (appendix) (Figure 3-1). The D17Z1 higher-order repeat unit is 16 monomers long (Waye and Willard 1986), whereas the D17Z1-B repeat unit is composed of 14 monomers. The two higher-order repeat units are both made up of monomers arranged in a pentameric fashion with corresponding monomers in the same order, suggesting that they diverged from a common ancestor upon a duplication or deletion event. Unlike other chromosomes with more than one higher-order array (Alexandrov et al. 1991; Choo et al. 1990; Wevrick and Willard 1991), D17Z1 and D17Z1-B are clearly related to each other, with 92% sequence identity between the two types of higher-order repeat unit.

To better characterize the D17Z1-B array, we calculated its approximate size relative to D17Z1. The D17Z1 array was previously sized in three copies of chromosome 17 and found to be 2.8 Mb, 3.3 Mb, and 3.7 Mb in size (Warburton and Willard 1990). We examined these same chromosomes 17 and used high stringency Southern blotting to distinguish between the 92% identical D17Z1 and D17Z1-B arrays. Based on the relative intensity of hybridization, we estimate the D17Z1-B array to be 500-900 kb among individuals (see Methods). FISH experiments with probes specific for D17Z1 and D17Z1-B support the size estimate of D17Z1-B and confirm its location adjacent to D17Z1 on the p side of the centromere (appendix). Although the region between D17Z1 and D17Z1-B has not been sequenced and assembled, BAC end sequencing also supports an adjacent organization of D17Z1 and D17Z1-B. Working draft quality BACs RPCI-11 5B18 (AC146710) and RPCI-11 449A3 (AC145197) both contain D17Z1 at one end and D17Z1-B at the opposite end, and restriction digests with enzymes that isolate the higher-order repeat unit show both species of higher-order repeats in each BAC (Rudd et al. 2004).

Figure 3-1. Alpha satellite organization in the centromeric region of chromosome 17. The genomic landscape 500 kb distal of both sides of the centromere gap (dotted lines) is depicted. Blocks of monomeric alpha satellite (blue) are shown on both p and q arm contigs, and the p arm contig terminates with D17Z1-B higher-order alpha satellite (pink). The proposed organization of D17Z1 (red) and D17Z1-B (pink) is shown inside the centromere gap. Arrows indicate the orientation of alpha satellite monomers, and triangles show the junctions between alpha satellite and non-satellite sequences (alpha satellite junctions). BACs comprising the minimal tiling path are shown in brown, as are the two BACs containing D17Z1 at one end and D17Z1-B at the other. Other repeats are shown below the BAC contigs; from top to bottom, satellites (black), LINEs, SINEs, LTRs and other repeats (grey). RefSeq genes BC031617 and WSB1 are shown in purple at the bottom.

The current chromosome 17 assembly also includes four regions of monomeric alpha satellite distal to the higher-order arrays (Figure 3-1, M1 - M4). Three blocks of monomeric alpha satellite have been found on the p side of the centromere gap and one on the q side of the centromere gap. As the chromosome 17q arm contig terminates before reaching higher-order alpha satellite, there may be other undiscovered regions of monomeric alpha satellite on 17q. The four monomeric blocks each span 26-50 kb (Figure 3-1). These blocks of monomeric alpha satellite, as well as the regions in between monomeric blocks, are interspersed with other types of repeats. The junction between the most distal monomeric alpha satellite and non-alpha satellite sequences of the chromosome arms has been termed the “alpha satellite junction” (Schueler et al. 2001). Like other centromeric regions (Guy et al. 2000; Schueler et al. 2001) (Chapter 2), the sequences proximal to the alpha satellite junctions on chromosome 17 are enriched for other satellites as compared to the genome average. The collective concentration of non-alpha satellites within the region defined by the two satellite junctions is 3.65%, more than twenty-fold greater than the genome average (Table 3-1). The concentration of other repeats such as LINEs, SINEs, LTRs, and DNA transposons within the alpha satellite junctions, however, is not enriched and is similar to that of the genome average. Distal to the alpha satellite junctions, overall repeat content is comparable to that of the genome average (Table 3-1). These data suggest a sharp demarcation between satellite-rich and euchromatic regions of the genome.

There are also genes relatively close to alpha satellite (Chapter 2), and we examined the Reference Sequence Collection (RefSeq) (http://www.ncbi.nih.gov/RefSeq) (Pruitt et al. 2000) genes near alpha satellite (Figure 3-1). On the p arm, a sequence transcribed as a brain mRNA, BC031617, is located between the two most distal regions of monomeric alpha satellite, 33 kb from the nearest monomeric block. There is also a RefSeq gene, WSB1, within 96 kb of the monomeric alpha satellite on 17q. WSB1 is also expressed in the brain and contains several WD repeats as well as a SOCS box (Vasiliauskas et al. 1999). The genomic region between higher-order alpha satellite and the alpha satellite junction is part of the pericentromere rather than the functional centromere (Eichler et al. 2004; Rudd et al. 2004), as these sequences do not underlie the centromere protein complex that makes up the kinetochore (Chapter 2). The pericentromere, therefore, is a complex region, containing blocks of monomeric alpha satellite, other satellites, as well as at least some expressed genes.

Table 3-1. Repeat content of the chromosome 17 satellite zone and flanking regions

Repeat              Satellite zone  Proximal 17p    Proximal 17q    Genome
                    repeats (a)     repeats (b)     repeats (c)     average
LINE                19.83%          15.35%          11.71%          21.26%
SINE                8.50%           16.23%          21.45%          13.72%
LTR                 7.55%           9.88%           8.77%           8.72%
DNA Elements        2.22%           1.80%           3.46%           3.03%
Small RNAs          < 0.01%         < 0.01%         < 0.01%         0.04%
Simple Repeats      1.28%           1.58%           0.43%           0.92%
Low Complexity      0.71%           0.53%           0.44%           0.58%
Other               0.44%           0.12%           0.40%           0.14%
Satellites          20.49%          0.73%                           0.43%
  Alpha Satellite   16.84%                                          0.26%
  Human Satellite 2 0.01%                                           0.01%
  (GATTG)n          0.25%                                           <0.01%
  SST1              2.66%           0.19%                           0.01%
  REP522            0.73%           0.54%                           0.01%
Total Repeats       61.06%          46.24%          46.68%          48.84%

(a) from UCSC coordinates chr17:21904223-22408571 (p side of cen gap) + chr17:25408570-25671146 (q side of cen gap)
(b) from UCSC coordinates chr17:21304223-21904223
(c) from UCSC coordinates chr17:25671146-26171146

The concentration of repeats was calculated for the region between the most distal monomeric alpha satellite (satellite zone). Repeat content was also determined for the 500 kb distal of the 17p alpha satellite junction and the 17q alpha satellite junction. These values were extracted from the UCSC genome browser (http://genome.ucsc.edu) and compared to the genome average of repeat content from the July 2003 (Build 34) human genome assembly.

Higher-order and monomeric alpha satellite on chromosome 17

In addition to analyzing the organization of alpha satellite on chromosome 17, we also investigated the evolutionary relationships of alpha satellite monomers on this chromosome. All alpha satellite within 1 Mb of the centromere gap of the chromosome 17 assembly was broken into basic ~171 bp monomers and compared using pairwise alignments as well as phylogenetic methods. We performed CLUSTALW alignments (Thompson et al. 1994) between all possible pairwise combinations of monomers (617 monomers, 380072 unique alignments) and expressed each alignment percent identity score on a color scale to graphically view the relationships between monomers (Figure 3-2A). Within each of the three most distal regions of monomeric alpha satellite (M1, M2, M4), monomer percent identity was 72.2 +/- 3.8%, 70.5 +/- 4.1%, and 72.8 +/- 4.3%, respectively (Table 3-2). Monomer percent identity was higher, however, in the most proximal region of monomeric alpha satellite (M3) adjacent to D17Z1-B higher-order alpha satellite, with a mean of 81.1 +/- 3.2%. This type of localized homogenization within proximal monomeric M3 is evidenced by the four duplicated monomers within M3 that are 97.4% identical between 4mers (Figure 3-2C). Additionally, monomers from the most proximal region of monomeric alpha satellite were more similar to higher-order D17Z1 and D17Z1-B monomers than were monomers from the more distal blocks of monomeric alpha satellite (Figure 3-2A, Table 3-2). These data suggest that there are two distinct classes of monomeric alpha satellite in the chromosome 17 centromere region, proximal and distal monomeric. Proximal monomeric alpha satellite is more closely related to higher-order alpha satellite and thus may have participated in the early unequal crossover events that gave rise to higher-order alpha satellite on chromosome 17 before becoming physically isolated from the higher-order arrays.

Figure 3-2. Percent identity scores for pairwise comparisons of alpha satellite monomers.

All pairwise comparisons were calculated for alpha satellite monomers and percent identity scores were depicted according to the color scale. The chromosomal origin of alpha satellite monomers is shown at the top of the figure in alternating black and grey bars. (A) Pairwise comparisons for monomers from the assemblies of chromosomes 8, 17 and the X chromosome. Black lines indicate the boundaries of monomers from each chromosome. (B and C) Detailed versions of percent identity scores from regions indicated by arrowheads.
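The all-against-all comparison described above can be illustrated with a minimal sketch. It is not the CLUSTALW procedure used in this study: the sequences are assumed to be already aligned to a common length (gaps written as "-"), percent identity is scored over ungapped columns only, and the toy monomers and function names are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned columns with identical bases, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def pairwise_identities(monomers):
    """Score every unordered pair of monomers exactly once."""
    return {
        (i, j): percent_identity(monomers[i], monomers[j])
        for i, j in combinations(sorted(monomers), 2)
    }


# Hypothetical toy monomers (real monomers are ~171 bp).
toy = {"m1": "ACGTACGT", "m2": "ACGTACGA", "m3": "AC-TACGT"}
identities = pairwise_identities(toy)

# Number of comparisons for n monomers.
n = 617
unordered_pairs = n * (n - 1) // 2
ordered_pairs = n * (n - 1)
```

For 617 monomers, n(n-1) equals 380,072, which matches the count quoted above if each pair is counted in both orders; the number of unordered pairs is 190,036.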

Distal monomeric alpha satellite is less similar to higher-order alpha satellite than is proximal monomeric alpha satellite, and distal monomers are just as similar within a region of monomeric alpha satellite as between regions of distal monomeric alpha satellite, even between regions of distal monomeric alpha satellite on opposite chromosome arms (Figure 3-2A, Table 3-2). The similarity among distal monomeric monomers suggests that monomeric alpha satellite was homogenized on this chromosome to some extent by unequal crossover events or by some other process of concerted evolution. A duplication of thirteen highly identical monomers (overall 88.0% identical) present in the same order in distal monomeric regions M1 and M2 on 17p further supports this kind of homogenization (Figure 3-2B). Given the concentration of inter- and intrachromosomal segmental duplications bordering regions of distal monomeric alpha satellite in the pericentromeric region of chromosome 17 (Bailey et al. 2002), these blocks of monomeric alpha satellite may have undergone exchanges via segmental duplication mechanisms as well.

Table 3-2. Mean percent identity among monomers from particular regions of alpha satellite.

                  Alpha Satellite Regions
Region    # mons  17p M1        17p M2        17p M3        D17Z1-B       D17Z1         17q M4        Xp M          8p M
17p M1    141     72.2 +/- 3.8  69.1 +/- 4.5  63.6 +/- 3.1  57.2 +/- 3.1  58.0 +/- 3.5  71.8 +/- 4.1  65.4 +/- 3.3  65.8 +/- 3.6
17p M2    133                   70.5 +/- 4.1  62.1 +/- 3.1  58.1 +/- 3.1  56.3 +/- 4.0  70.9 +/- 4.1  62.7 +/- 3.4  60.3 +/- 3.8
17p M3    97                                  81.1 +/- 3.2  70.7 +/- 2.9  71.3 +/- 3.5  65.8 +/- 3.3  63.9 +/- 3.2  69.0 +/- 3.4
D17Z1-B   14                                                74.3 +/- 5.0  76.8 +/- 6.9  58.7 +/- 3.5  57.2 +/- 2.9  60.6 +/- 3.3
D17Z1     16                                                              75.9 +/- 6.3  59.2 +/- 3.9  57.6 +/- 3.2  61.6 +/- 3.4
17q M4    139                                                                           72.8 +/- 4.3  67.1 +/- 3.8  70.4 +/- 3.9
Xp M      85                                                                                          71.3 +/- 3.8  65.1 +/- 3.8
8p M      104                                                                                                       72.0 +/- 4.7

Intra-region and inter-region percent identity scores were determined for all pairwise comparisons of monomers from particular regions of alpha satellite. The chromosomal origin of each region of monomers is shown at the top and left side of the table. The mean and one standard deviation were calculated for all pairwise comparisons. The number of monomers used in each calculation is indicated (# mons).

As a second approach, we performed a phylogenetic analysis of chromosome 17 alpha satellite. We used neighbor-joining methods to examine phylogenetic relationships among the chromosome 17 alpha satellite monomers. In addition to the higher-order and monomeric alpha satellite found in the chromosome 17 assembly, we also included monomers from D17Z1 higher-order alpha satellite (Waye and Willard 1986) and a monomer from African Green Monkey alpha satellite (Rosenberg et al. 1978). Like other Old World Monkeys, African Green Monkeys have only monomeric alpha satellite at their centromeres (Goldberg et al. 1996; Rosenberg et al. 1978; Thayer et al. 1981), and this sequence serves as an outgroup for our phylogenetic analysis.
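For readers unfamiliar with neighbor joining, the following is a minimal sketch of the standard algorithm operating on a pairwise distance matrix. It is illustrative only and is not the software used for the analyses in this chapter; the function name, the internal node labels ("node0", "node1", ...) and the five-taxon example distances are all hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """names: list of labels; dist: {frozenset({a, b}): distance}.

    Returns (joins, last_edge), where each join is
    (child_a, length_a, child_b, length_b, new_node).
    """
    d = dict(dist)
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all others.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}

        def q(pair):
            a, b = pair
            return (n - 2) * d[frozenset(pair)] - r[a] - r[b]

        # Join the pair that minimizes the Q criterion.
        a, b = min(combinations(nodes, 2), key=q)
        dab = d[frozenset((a, b))]
        len_a = 0.5 * dab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dab - len_a
        u = f"node{len(joins)}"
        # Distances from the new internal node to the remaining nodes.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - dab
                )
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, len_a, b, len_b, u))
    last_edge = (nodes[0], nodes[1], d[frozenset(nodes)])
    return joins, last_edge


# Hypothetical additive distances among five taxa.
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
distances = {
    frozenset((taxa[i], taxa[j])): matrix[i][j]
    for i, j in combinations(range(len(taxa)), 2)
}
joins, last_edge = neighbor_joining(taxa, distances)
```

In practice the input distances would be derived from the monomer alignments, and the outgroup sequence would be used to root the resulting unrooted tree.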

The resulting phylogenetic tree has three major clades (Figure 3-3). Higher-order alpha satellite from D17Z1 and D17Z1-B clades together, while distal monomeric alpha satellite from both p and q arms clades together in a separate node. Proximal monomeric alpha satellite (M3) clades in a third node closest to African Green Monkey alpha satellite. These data are consistent with our hypothesis that higher-order alpha satellite evolved from monomeric alpha satellite, and that proximal monomeric alpha satellite is phylogenetically separate from distal monomeric alpha satellite.

Figure 3-3. Phylogenetic tree of alpha satellite on chromosome 17.

Neighbor-joining methods were used to generate the phylogenetic tree containing both higher-order and monomeric alpha satellite from chromosome 17. The tree contains 641 monomers, including the outgroup monomer from the African green monkey. The key at the bottom of the figure indicates the chromosomal origin of the monomers: monomeric alpha satellite (17pM1, 17pM2, 17pM3, 17qM4), higher-order alpha satellite (D17Z1 and D17Z1-B), and alpha satellite from the African Green Monkey (AGM). The scale bar represents 0.1 changes. Bootstrap values for the distal monomeric clade (89) and the higher-order clade (99) are shown.

The junction between higher-order and M3 monomeric alpha satellite on 17p is very distinct. A 220 bp monomer clearly demarcates the division between homogeneous higher-order repeat units (97-99% identical) and more divergent monomeric alpha satellite. Additionally, all monomers on either side of the junction fall into the higher-order clade or the proximal monomeric clade as predicted. This is very different from the more gradual (over 10 kb) transition between higher-order and monomeric alpha satellite on Xp (Schueler et al. 2001). The difference in transition zones between the two chromosomes may be due to the aberrant 220 bp monomer on 17p. This monomer may have punctuated the higher-order/monomeric junction since it would be misaligned for future unequal crossover events, while monomers in the transition region of Xp may have continued to cross over after the fixation of DXZ1 higher-order alpha satellite.

Because alpha satellite is rapidly evolving and array size varies among individuals (Mahtani and Willard 1990; Wevrick and Willard 1989), we asked whether the junction between higher-order and monomeric alpha satellite is static or subject to slippage. We designed a PCR assay to specifically amplify the 220 bp monomer junction in several individuals. All thirty individuals from five diverse populations were positive for the junction PCR, and sequencing one individual from each population showed 100% identity among PCR products (see Methods). These data suggest that the junction between higher-order and monomeric alpha satellite has been fixed in the population since the human lineage diverged from the last common ancestor of chimpanzees and humans.

Monomeric alpha satellite evolution

Studies involving higher-order alpha satellite (Warburton and Willard 1995), as well as other satellite families (Coen and Dover 1983; Ohta and Dover 1983), have shown that intrachromosomal exchanges occur much more rapidly than do interchromosomal exchanges. To see if this was also true in monomeric alpha satellite, we examined the phylogenetic relationships among higher-order and monomeric alpha satellites from chromosomes 8, 17 and the X chromosome using neighbor-joining methods. We isolated monomers from the known higher-order alpha satellites on these chromosomes, D8Z2 (Ge et al. 1992), D17Z1 (Waye and Willard 1986), D17Z1-B (appendix) and DXZ1 (Willard et al. 1983). Additionally, we isolated monomeric monomers from the most distal regions of the chromosome 8 (104 monomeric monomers) and X chromosome (85 monomeric monomers; Schueler et al. 2001) alpha satellite, as well as the monomeric alpha satellite on chromosome 17 described earlier. The resulting phylogenetic tree (Figure 3-4) has a very similar topology to the chromosome 17 tree (Figure 3-3). Higher-order alpha satellite from the three chromosomes clades together, and subclades are grouped by higher-order suprachromosomal subfamily. The monomers that comprise higher-order alpha satellite from chromosome 17 and the X chromosome have a pentameric structure, suggesting an ancient interchromosomal exchange between the two centromeres (Waye and Willard 1986). As such, the monomers from DXZ1, D17Z1 and D17Z1-B fall into one of five distinct subclades (Willard and Waye 1987a). Higher-order alpha satellite from chromosome 8 is a member of a dimeric suprachromosomal family (Ge et al. 1992) and consequently its monomers fall into two subclades that are distinct from the pentamer family subclades.

Figure 3-4. Neighbor-joining tree of monomers from different chromosomes.

Neighbor-joining methods were employed to generate the phylogenetic tree containing higher-order and monomeric alpha satellite from chromosomes 8, 17 and the X chromosome. The key at the bottom of the figure indicates the chromosomal origin of the monomers from monomeric alpha satellite (8pM, 17M and Xp M) and higher-order alpha satellite (D8Z2, D17Z1, D17Z1-B and DXZ1). The scale bar represents 0.1 changes. Bootstrap values for the distal monomeric clade (62) and the higher-order clade (95) are shown.

The distal monomeric alpha satellites from each chromosome are present in one large clade, separate from the 17p M3 monomeric clade. Notably, however, the majority of monomers within the distal monomeric clade fall into chromosome-specific subclades (Figure 3-4). Only seven monomers (~1% of the total number examined) were assigned to a chromosome-specific subclade other than that of the chromosome on which they are located. Given the large number of monomers in our study and the pairwise comparisons that determine neighbor-joining trees, it is not surprising that a few monomers belong to other subclades. Chromosome-specific subclades within the distal monomeric clade are further supported by maximum likelihood methods (see Methods, Figure 3-5). These data indicate that although distal monomeric monomers are more similar to monomeric alpha satellite from other chromosomes than to neighboring higher-order alpha satellite from the same chromosome, there has been sufficient local homogenization within regions of monomeric alpha satellite on a given chromosome to drive the evolution of chromosome specificity.

Figure 3-5. Maximum likelihood tree of monomers from different chromosomes.

(A) The same monomers from multiple chromosomes in Figure 3-4 were analyzed using maximum likelihood methods. (B) An enlargement of the higher-order clade. The key at the bottom of the figure indicates the chromosomal origin of the monomers (17 M, 8p M, Xp M, AGM, D17Z1-B, D17Z1, D8Z2 and DXZ1).

The relationship among alpha satellite monomers from different chromosomes is also evident in our sequence alignment results (Figure 3-2B). We used the same monomeric alpha satellite from the X chromosome and chromosome 8 to visualize the relationships between this monomeric alpha satellite and alpha satellite from chromosome 17. Among regions of distal monomeric alpha satellite, monomers from one chromosome are more similar to each other than they are to monomers from monomeric regions on other chromosomes (Table 3-2, Figure 3-2A). Within a region of distal monomeric alpha satellite, monomers from 8p are 72.0 +/- 4.7% identical and distal monomeric monomers from Xp are 71.3 +/- 3.8% identical, whereas mean percent identities between regions of distal monomeric alpha satellite from different chromosomes are generally lower (60.3-70.4%; Table 3-2). Thus, monomeric alpha satellite on each chromosome was likely subject to homogenization mechanisms prior to the emergence of higher-order alpha satellite.

To expand this analysis and to test the hypothesis that monomeric alpha satellite has been homogenized intrachromosomally, we next evaluated the relationships between monomeric monomers on other chromosomes. Due to the incomplete nature of the human genome assembly (Eichler et al. 2004) (Chapter 2), not all monomeric alpha satellite has been identified. Additionally, the amount of monomeric alpha satellite assembled for each chromosome is quite variable (Chapter 2). Given these caveats, we extracted all of the monomeric alpha satellite monomers assembled on chromosomes 3, 8, 11, 15, 17 and 20, totaling 10018 monomers. This dataset includes the distal monomeric alpha satellite from the assemblies of chromosomes 8 and 17 described earlier, as well as additional monomers from chromosome 8. We performed all pairwise alignments among these monomers and calculated the mean pairwise identity for both intrachromosomal and interchromosomal comparisons (see Methods). The resulting distributions of pairwise percent identity scores for intrachromosomal comparisons of monomers are shown in Figure 3-6 for each of the six chromosomes. The distribution for interchromosomal comparisons consists of percent identity scores for all pairwise comparisons of monomers from different chromosomes, excluding comparisons of monomers from the same chromosome. The mean of the interchromosomal distribution is significantly lower than the means of the intrachromosomal distributions (Figure 3-6) (p < 0.001). These data confirm that monomeric monomers from the same chromosome are more similar than are monomers from different chromosomes.

This finding supports a model in which intrachromosomal exchange mechanisms homogenized monomeric alpha satellite, and thus suggests that intrachromosomal exchanges occur more frequently than do interchromosomal exchanges in regions of monomeric as well as higher-order alpha satellite (Warburton and Willard 1995).
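The partition of pairwise scores into intrachromosomal and interchromosomal distributions, and the normalization of each distribution to 100%, can be sketched as follows. This is an illustration under simplifying assumptions rather than the procedure used here: monomers are toy sequences of equal length compared without alignment, the bin width is arbitrary, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def identity(a, b):
    """Percent identity of two equal-length, ungapped toy sequences."""
    if len(a) != len(b):
        raise ValueError("toy sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def split_comparisons(monomers):
    """monomers: list of (chromosome, sequence).

    Returns (intra, inter): intra maps each chromosome to its
    within-chromosome scores; inter lists between-chromosome scores.
    """
    intra, inter = {}, []
    for (chrom_a, seq_a), (chrom_b, seq_b) in combinations(monomers, 2):
        score = identity(seq_a, seq_b)
        if chrom_a == chrom_b:
            intra.setdefault(chrom_a, []).append(score)
        else:
            inter.append(score)
    return intra, inter


def normalized_histogram(scores, bin_width=5):
    """Percentage of comparisons per bin, so each distribution sums to 100%."""
    counts = Counter(int(s // bin_width) * bin_width for s in scores)
    return {b: 100.0 * c / len(scores) for b, c in sorted(counts.items())}


# Hypothetical toy data: two monomers on "A", three on "B".
toy = [
    ("A", "AAAAAAAAAA"), ("A", "AAAAAAAAAT"),
    ("B", "CCCCCCCCCC"), ("B", "CCCCCCCCCG"), ("B", "CCCCCCCCGG"),
]
intra, inter = split_comparisons(toy)
hist_b = normalized_histogram(intra["B"])
```

Normalizing by the number of comparisons lets chromosomes with very different monomer counts be plotted on a common Y axis.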

Among the intrachromosomal comparisons for each chromosome, the distributions are quite variable (Figure 3-6). The difference in means among individual chromosome distributions could reflect a biological difference in the amount of sequence homogenization on each chromosome or may be due to the missing monomers in the genome assembly. Among the six chromosomes evaluated, only the chromosome 8 assembly has reached higher-order alpha satellite on both p and q arm contigs, suggesting that most, if not all, of the monomeric alpha satellite has been identified on this chromosome. The chromosome 8 distribution of monomer comparisons has the lowest mean among intrachromosomal distributions, and this could be due to the abundance of monomers to be evaluated or local variation in intrachromosomal exchange rates across the ~600 kb region of monomeric alpha satellite for this chromosome.

Figure 3-6. Distributions of interchromosomal and intrachromosomal monomeric monomer percent identities.

Monomeric alpha satellite monomers were compared and pairwise percent identity scores were plotted. (A) Intrachromosomal pairwise comparisons were calculated for chromosomes 3, 8, 11, 15, 17, and 20. All pairwise comparisons of monomers from different chromosomes were plotted as an interchromosomal distribution (black). As chromosomes contain variable amounts of monomers in their assemblies, we normalized each distribution of pairwise comparisons to 100% so that the Y axis represents the percentage of monomer comparisons with a given percent identity score. (B) Statistics from intrachromosomal and interchromosomal comparisons. The range of pairwise percent identity scores and the mean +/- one standard deviation is listed for each distribution.

Comparative analysis of alpha satellite in primates

To better understand the concerted evolution of alpha satellite in primates, we also examined the alpha satellite organization on the orthologous chimpanzee chromosome, PTR 19. The initial assembly of the Pan troglodytes genome (Build 1, November 2003) includes alpha satellite on both the p and q arm sides of the centromere gap. Unfortunately, there are several gaps in the orthologous centromeric region, especially on the p side of the centromere gap. In order to evaluate the sequence conservation between the monomeric alpha satellite in the two species, we performed VISTA alignments (http://pipeline.lbl.gov/cgi-bin/gateway2) (Couronne et al. 2003) between a 300 kb region including monomeric alpha satellite on 17q (M4) and the orthologous region on PTR 19q (Figure 3-7A). This region has relatively few gaps in the chimpanzee assembly, so it provides a reasonable model to address the amount of sequence conservation between the two species.

Overall, the two sequences are 98.0% identical along the 277 kb of aligned sequence. This includes high percentage identity between the chimp and human RefSeq gene WSB1. The monomeric alpha satellite found in this region is also highly conserved; orthologous monomers are 98.2 +/- 1.0% identical (Figure 3-7B), similar to the overall sequence conservation in the region.

Although the PTR 19p assembly is not as comprehensive as the 19q side, chimpanzee monomers corresponding to the proximal monomeric on 17p have also been assembled. We compared orthologous monomers in this region from the two species and again found high sequence identity; the 60 aligned monomers are 98.2 +/- 1.2% identical (Figure 3-7B). The conservation of monomeric alpha satellite between chimpanzees and humans is similar to the overall conservation between human chromosome 21 and chimpanzee chromosome 22 (Watanabe et al. 2004).

Figure 3-7. Genomic organization of 17q compared to the orthologous Pan troglodytes region.

(A) The genomic organization of the chromosome 17 centromeric region is depicted; alpha satellite is colored as described in Figure 3-1. A 300 kb region containing monomeric alpha satellite on human chromosome arm 17q was compared to the orthologous region of chimpanzee chromosome arm 19q. A VISTA alignment of the two regions is shown in pink. Percent identity between aligning sequences is indicated by the Y axis (50-100% identical). Alpha satellite (blue) and the gene WSB1 (purple) are shown. Areas of low percent identity between the two can be explained by gaps in the chimpanzee assembly (black boxes) or sequences inserted in the human genome or deleted from the chimpanzee genome (colored boxes). (B) Mean percent identity +/- one standard deviation for aligned alpha satellite monomers from chimpanzees and humans.

The PTR chromosome 19 assembly has not reached higher-order alpha satellite; however, an apparently orthologous higher-order repeat has been reported previously, PTR219 (Warburton et al. 1996). We compared the sequences of PTR219, D17Z1 and D17Z1-B to determine the evolutionary relationships between the three higher-order repeats. The entire chimp higher-order repeat has not been sequenced; however, four monomers related to D17Z1 have been identified (Warburton et al. 1996). We analyzed the corresponding four monomers in PTR219, D17Z1 and D17Z1-B and looked for signature sites where one sequence differed from the other two. Among the 684 bp analyzed, there are 35 sites shared by PTR219 and D17Z1 and not D17Z1-B, and 16 sites in common between PTR219 and D17Z1-B and not D17Z1. There are also 18 sites shared by D17Z1 and D17Z1-B, but not PTR219. Overall, PTR219 and D17Z1 are 95.0% identical, while PTR219 and D17Z1-B are only 92.3% identical (Figure 3-7B). These data are consistent with the divergence between chimpanzee and human orthologs of the higher-order alpha satellite present on the X chromosome, DXZ1. Comparing the entire 2.0 kb higher-order repeat, human and chimpanzee copies of DXZ1 are 93.0% identical (Laursen et al. 1992). Thus, even though our comparative analysis of higher-order alpha satellite on chromosome 17 is only based on four monomers, data from both chromosome 17 and the X chromosome support a higher rate of divergence among higher-order as compared to monomeric alpha satellite.
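The signature-site comparison described above can be sketched as a count of aligned positions at which exactly two of three sequences agree. The function below is a hypothetical illustration on toy sequences, not the procedure used in this study. The second part recomputes the pairwise identities from the reported site counts, assuming the 684 bp alignment contains no gaps and no sites at which all three sequences differ.

```python
def signature_sites(a, b, c):
    """Classify variable sites among three aligned, equal-length sequences."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"ab_not_c": 0, "ac_not_b": 0, "bc_not_a": 0, "all_differ": 0}
    for x, y, z in zip(a, b, c):
        if x == y == z:
            continue  # invariant site
        if x == y:
            counts["ab_not_c"] += 1
        elif x == z:
            counts["ac_not_b"] += 1
        elif y == z:
            counts["bc_not_a"] += 1
        else:
            counts["all_differ"] += 1
    return counts


# Toy example with hypothetical sequences.
toy_counts = signature_sites("ACGTA", "ACGTT", "ACCTT")

# Consistency check using the reported counts
# (a = PTR219, b = D17Z1, c = D17Z1-B).
length = 684
ab_not_c, ac_not_b, bc_not_a = 35, 16, 18
# a and b differ wherever b or a is the odd sequence out.
identity_ptr219_d17z1 = 100.0 * (1 - (ac_not_b + bc_not_a) / length)
# a and c differ wherever c or a is the odd sequence out.
identity_ptr219_d17z1b = 100.0 * (1 - (ab_not_c + bc_not_a) / length)
```

Under these assumptions the recomputed values round to 95.0% and 92.3%, in agreement with the identities reported above.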

Discussion

The evolution of alpha satellite in primates gave rise to two distinct types that differ in organization as well as function. Among human centromeres, higher-order arrays are several megabases in size and are flanked by smaller stretches of monomeric alpha satellite. Higher-order alpha satellite within an array is extremely homogeneous and few other sequences have been found embedded within higher-order alpha arrays (Schueler et al. 2001). In contrast, monomeric alpha satellite is more heterogeneous in sequence and is interspersed with non-alpha satellite sequences (Guy et al. 2003; Schueler et al. 2001) (Chapter 2).

The evolution of higher-order from monomeric alpha satellite can be modeled by unequal crossover events as first described by Smith (Smith 1976). After an initial mutation creates homology between two previously unique sequences, an unequal crossover can occur between the two homologous sequences, creating a tandem duplication. Subsequent unequal crossovers can expand the number of tandem repeats, or monomers in the case of alpha satellite. This gives rise to the type of alpha satellite found in the African Green Monkey (AGM) (Rosenberg et al. 1978). AGM alpha satellite lacks any higher-order periodicity; however, monomers are on average 95% identical (Goldberg et al. 1996; Thayer et al. 1981). Like the Drosophila satellite sequences described by Dover (Strachan et al. 1982; Strachan et al. 1985), AGM alpha satellite exists in a transition state. AGM alpha satellite is organized in long stretches of monomeric alpha satellite, presumably expanded by unequal crossing-over; however, the monomers have not had sufficient time to diverge as is the case for human monomeric alpha satellite. A second layer of complexity arises when a subset of monomers is multimerized into higher-order alpha satellite. Unequal crossovers between misaligned higher-order repeat units will occur more frequently than between monomeric monomers due to the extremely high homology between repeat units, leading to an expansion and contraction of higher-order alpha satellite. The nature of higher-order alpha satellite expansion allows for the efficient spread and fixation of a sequence variant. Thus, higher-order and monomeric alpha satellite evolve at different rates, causing orthologous higher-order repeat units to be less conserved than orthologous monomeric alpha satellite in closely related species. This is indeed the case for alpha satellite in chimpanzees and humans (Figure 3-7B).

The amount of sequence divergence among monomeric monomers in the human genome is surprising for a repeat family believed to have arisen in primates. Alpha satellite was first identified in the African Green Monkey (Rosenberg et al. 1978), and subsequent studies have found alpha satellite in the genomes of other Old World Monkeys (Maio et al. 1981; Musich et al. 1980), New World Monkeys (Alves et al. 1994; Fanning 1989) and prosimians (Maio et al. 1981; Musich et al. 1980), as well as the great apes (Baldini et al. 1991; Durfy and Willard 1990; Haaf and Willard 1998; Warburton et al. 1996; Waye and Willard 1989b). In order to calculate the amount of sequence divergence among monomeric monomers on a single chromosome, we focused on chromosome 8. Since the genome assembly has reached higher-order alpha satellite on both p and q arm contigs of chromosome 8, all of the flanking monomeric alpha satellite has likely been assembled. We calculated the pairwise comparisons of all 3024 monomeric monomers in the chromosome 8 assembly and found that percent identity ranged from 52.7-100%, with a mean of 72.7 +/- 4.2%. Thus, the sequence divergence within monomeric alpha satellite is extremely high. This is surprising for a satellite repeat family in which sequence homogenization is believed to decrease sequence divergence.

To calculate the genetic distance between monomeric monomers, we first calculated the rate of nucleotide substitution between chimpanzees and humans based on the average pairwise identity of 98.2% among orthologous monomeric monomers (Figure 3-7B). Based on this value and the average pairwise divergence between monomeric monomers on chromosome 8 (27.3% divergence), one would estimate 83 Myr between humans and the most distant species containing alpha satellite in its genome (r=K/2T). This appears unlikely, however, as the most distant primate known to contain alpha satellite in its genome is the lemur (Maio et al. 1981; Musich et al. 1980), and the divergence time between lemurs and humans is estimated to be ~ 55 Mya (Goodman 1999).
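The estimate above follows from the relation r = K/2T applied twice: once to obtain a substitution rate from the chimpanzee-human comparison, and once to convert the chromosome 8 divergence into a time. The sketch below reproduces the arithmetic. The chimpanzee-human divergence time of 5.5 Myr is an assumption made here for illustration, chosen because it reproduces the quoted figure; it is not stated in this section.

```python
# Observed divergences (fraction of sites that differ).
k_chimp_human = 1 - 0.982   # orthologous monomeric monomers (Figure 3-7B)
k_chr8 = 0.273              # mean pairwise divergence on chromosome 8

# Assumed divergence time between chimpanzees and humans, in years.
t_chimp_human = 5.5e6

# r = K / 2T: substitutions per site per year along each lineage.
rate = k_chimp_human / (2 * t_chimp_human)

# Invert the same relation to obtain the time implied by the chromosome 8 divergence.
t_alpha = k_chr8 / (2 * rate)

print(f"rate = {rate:.2e} per site per year, T = {t_alpha / 1e6:.0f} Myr")
```

Because the estimated time scales linearly with the assumed chimpanzee-human divergence time, an assumed value of 6 Myr would give roughly 91 Myr instead.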

Alpha satellite is subject to concerted evolution, whereby sequence diversity is decreased by various homogenization mechanisms. However, we see an increase in sequence diversity as compared to other primate sequence variation estimates (Liu et al. 2003). This discrepancy could be explained if alpha satellite is especially mutable and/or has a lower rate of DNA repair than typical genomic sequences, though radiation studies do not support this model (Bunch et al. 1995). Alternatively, the rate of sequence divergence in alpha satellite could be variable within the primate lineage. We based our divergence rate on chimpanzee and human monomeric alpha satellite orthologs, but if this rate was not constant throughout the evolution of alpha satellite, it could lead to an overestimate of the separation time between humans and the last common ancestor containing alpha satellite in its genome. Another possibility is that the origin of alpha satellite is more complicated than originally predicted. In the simplest interpretation of Smith’s hypothesis, alpha satellite is proposed to have arisen from an original monomer that duplicated and expanded throughout the genome via unequal crossing-over (Smith 1976). However, if independent monomers arose in the genome and then subsequent exchange events homogenized these monomers, we would expect greater diversity among monomers. The amount of sequence divergence in monomeric alpha satellite in the human genome is intriguing, and the reasons for this phenomenon will be addressed as orthologous sequences are identified in other primate species as part of future sequencing efforts.

This is the first analysis of the relationships between monomeric alpha satellite monomers on different chromosomes. Like higher-order alpha satellite, monomeric alpha satellite has a higher rate of intrachromosomal exchange than interchromosomal exchange, leading to chromosome-specific regions of monomeric alpha satellite. However, monomeric alpha satellite evolves less rapidly than does higher-order alpha satellite, as evidenced by the conservation between human and orthologous chimpanzee alpha satellite. It would be interesting to examine the relationships between higher-order and monomeric alpha satellites among all chromosomes to better model the sequence of events that created the distinct yet related organization of alpha satellite on different chromosomes. Sequencing and assembling more centromeric regions of the genome will provide us access into the multifaceted evolution of alpha satellite.

Chapter 4

Human Artificial Chromosomes with Alpha Satellite-based de novo Centromeres Show Increased Frequency of Nondisjunction and Anaphase Lag

M. Katharine Rudd, Robert W. Mays, Stuart Schwartz and Huntington F. Willard

Note: This chapter is based on a manuscript that was published in Molecular and Cellular Biology (2003, 23: 7689-7697) and reformatted for this document. R.W.M. and S.S. contributed cell lines and scientific discussion.

Abstract. Human artificial chromosomes have been used to model requirements for human chromosome segregation and to explore the nature of sequences competent for centromere function. Normal human centromeres require specialized chromatin that consists of alpha satellite DNA complexed with epigenetically modified and centromere-specific proteins. While several types of alpha satellite have been used to assemble de novo centromeres in artificial chromosome assays, the extent to which they fully recapitulate normal centromere function has not been explored. Here, we have used two kinds of alpha satellite DNA, DXZ1 (from the X chromosome) and D17Z1 (from chromosome 17), to generate human artificial chromosomes. Although they are mitotically stable over many months in culture, when we examined their segregation in individual cell divisions using an anaphase assay, artificial chromosomes underwent more segregation errors than natural human chromosomes (p<0.001). Naturally occurring, but abnormal small ring chromosomes derived from chromosome 17 and the X chromosome also missegregate more than normal chromosomes, implicating overall chromosome size and/or structure in the fidelity of chromosome segregation. As different artificial chromosomes missegregate over a 5-fold range, the data suggest that variable centromeric DNA content and/or epigenetic assembly can influence the mitotic behavior of artificial chromosomes.

Introduction

Twenty years have passed since the first yeast artificial chromosomes (YACs) were constructed in an effort to elucidate the minimal components necessary for eukaryotic chromosome function (Murray and Szostak 1983). These pioneering studies demonstrated that both centromere competence and overall chromosome organization play a role in the segregation of chromosomes (Hieter et al. 1985; Murray et al. 1986; Surosky et al. 1986). From yeast to humans, mitotically stable chromosomes require centromeres and other chromosomal features including telomeres and origins of replication (de Lange 2002; Gilbert 2001; Sullivan et al. 2001). While these components have been well characterized in the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe (Beach et al. 1980; Bloom and Carbon 1982; Marahrens and Stillman 1992; Shampay et al. 1984; Walmsley et al. 1984), tractable experimental approaches to define the corresponding elements in humans or other organisms with larger chromosomes have only recently begun to be developed (Harrington et al. 1997; Larin and Mejia 2002; Sullivan et al. 2001; Willard 1998).

The first step in creating human artificial chromosomes was to identify the DNA sequences responsible for human centromere function. Despite the evident importance of the centromere in all eukaryotic organisms, the DNA sequences responsible for centromere activity are not conserved evolutionarily (Malik and Henikoff 2002). Whereas the S. cerevisiae "point" centromere is only 125 bp and is shared among all yeast chromosomes (Hyman and Sorger 1995), other eukaryotic centromeres are substantially larger, more complex and markedly heterogeneous in sequence and composition. For example, the S. pombe centromere is composed of inner and outer repeat sequences and varies from 40-100 kb in size (Clarke et al. 1986; Nakaseko et al. 1987). Arabidopsis thaliana centromeres have been localized to 500-1300 kb regions of chromosomes resistant to meiotic recombination, and the sequences at these loci are composed of 180 bp tandem repeats and retrotransposon elements (Copenhaver et al. 1999; Hall et al. 2003; Round et al. 1997). Similarly, the Drosophila centromere, modeled by a 420 kb region of a stable minichromosome, appears to be made up of simple repeats and transposable elements (Sun et al. 2003; Sun et al. 1997).

Human centromeric regions are composed of megabases of repetitive alpha satellite DNA that varies in sequence identity and organization from chromosome to chromosome (Warburton and Willard 1996). The only commonality between alpha satellite and all of the other well-characterized centromeric DNA sequences is an increase in AT-richness as compared to the genome average (Choo 2001; Koch 2000). The structure of alpha satellite is based on ~171 bp monomers repeated in tandem to make up higher-order repeat units that are in turn repeated to generate megabase-sized arrays (see Alexandrov et al. 2001 and Willard 1998 for reviews). While monomers within each higher-order repeat are only 65-90% identical to each other, the multimeric higher-order repeats within the same array are highly homogeneous (Schindelhauer and Schwarz 2002; Schueler et al. 2001; Willard and Waye 1987a). Although alpha satellite is located at the centromeres of all normal human chromosomes (Manuelidis 1978; Schueler et al. 2001; Willard and Waye 1987a), the defining characteristics of alpha satellite required for centromere function are unknown. Notwithstanding the likely role of epigenetic influences on centromere activity (Henikoff et al. 2001; Sullivan et al. 2001), the functional competence of alpha satellite may be due to its highly repetitive structure, its AT-richness, or the presence of critical binding sites for particular centromere proteins (Choo 2001; Ohzeki et al. 2002).

While it is well appreciated that centromeres play a pivotal role in chromosome segregation, the precise mechanisms are incompletely understood and are likely to be both complex and multifactorial. First, the centromere must assemble a kinetochore, a proteinaceous structure responsible for attaching the chromosome to spindle microtubules (for review see Tanaka 2002). Checkpoint pathways ensure that all chromosomes have aligned properly at metaphase before proceeding into anaphase (for review see Musacchio and Hardwick 2002). Once a bipolar spindle attachment has been established, the sister chromatids separate by resolving the cohesins that remain at the centromere until their release at anaphase (for review see Lee and Orr-Weaver 2001; Uhlmann 2003). To segregate properly, therefore, chromosomes must contain sequences competent to assemble kinetochores, maintain cohesion and then separate sister chromatids, and satisfy spindle checkpoints.

The study of human artificial chromosomes (as well as naturally occurring or engineered human chromosome rearrangements) has demonstrated that certain sequences are capable of establishing and maintaining centromeres, but the functionality of these centromeres has not been studied in detail. Studies from a number of groups have shown that alpha satellite from several human chromosomes is capable of forming de novo centromeres in artificial chromosome assays (Ebersole et al. 2000; Grimes et al. 2002; Harrington et al. 1997; Henning et al. 1999; Ikeno et al. 1998; Kouprina et al. 2003; Mejia et al. 2002; Ohzeki et al. 2002; Schueler et al. 2001). Some types of alpha satellite assemble artificial chromosomes that are maintained stably in culture over many cell divisions and contain centromeres that co-localize with kinetochore proteins associated specifically with active centromeres. Additional supporting evidence for the critical role of alpha satellite in human centromere function has come from detailed analysis of either natural or engineered chromosome rearrangements (Higgins et al. 1999; Mills et al. 1999; Schueler et al. 2001; Shen et al. 2001; Tyler-Smith et al. 1993). These data suggest that, to a first approximation, artificial and rearranged chromosomes are behaving like normal chromosomes. However, the detailed segregation of these chromosomes has not been systematically evaluated or compared to that of normal chromosomes.

Here, we have investigated the extent to which human artificial chromosomes provide a model to study chromosome segregation. In this study we examine the mitotic segregation of natural human chromosomes, human artificial chromosomes and rearranged human chromosomes to investigate the roles of centromere and chromosome structure in proper chromosome segregation.

Materials and Methods

Cell lines

The near tetraploid human fibrosarcoma cell line HT1080 was grown in Alpha MEM media (Gibco Inc.) supplemented with 10% fetal bovine serum (Hyclone), penicillin/streptomycin and glutamine. Human fibroblast cell line GM8148 (Wevrick et al. 1990) was derived from a patient with the karyotype 47,XX,del(17)(pter->p11.2::cen->qter), +der(17)(:p11.2->cen:). This cell line contains a deleted chromosome 17 missing approximately 2.5 Mb of D17Z1 as well as several megabases of euchromatin adjacent to the centromere, and also contains a ring chromosome derived from the deletion event.

A human fibroblast cell line was derived from a patient with the karyotype 46,X,r(X)/45,X. This cell line contains a small ring chromosome estimated to be approximately 10 Mb in size. The ring chromosome contains an apparently complete DXZ1 array, plus several megabases of euchromatin adjacent to DXZ1 (S. Schwartz, unpublished data). Both fibroblast lines were grown in Alpha MEM media supplemented with 15% fetal bovine serum, penicillin/streptomycin and glutamine.

Large insert clones and artificial chromosome formation

An ~85 kb NotI fragment from BAC RPCI-11 242E23 containing DXZ1 higher-order alpha satellite repeats from the human X chromosome (Schueler et al. 2001) was isolated and cloned into the pPAC4 vector, which contains a blasticidin resistance gene (Frengen et al. 2000). The construction of BAC VJ104a32, containing ~86 kb of synthetic alpha satellite from chromosome 17 (D17Z1), has been described (Harrington et al. 1997). To generate artificial chromosomes, DXZ1- and D17Z1-based constructs were transfected in multiple independent experiments into human HT1080 fibrosarcoma cells and subjected to drug selection (Grimes et al. 2002; Schueler et al. 2001).

In an alternative strategy (Mays et al., in preparation), VJ104a32 and a BAC containing ~140 kb of a genomic fragment including the human HPRT gene were both modified to contain 800 bp of human telomere repeat sequences (Harrington et al. 1997) and were then cotransfected into HT1080 cells. Transfected cells were grown in medium containing both HAT (hypoxanthine aminopterin thymidine) to select for HPRT expression and G418 to select for βgeo expression. Two artificial chromosome-containing cell lines, PF2.6 and PF2.7, were generated using this method. Pulsed-field gel analysis (as in Harrington et al. 1997) demonstrated that PF2.6 was linear and PF2.7 was circular (RWM and MKR, data not shown).

FISH and immunostaining

After transfection, drug-resistant colonies were screened for artificial chromosomes by FISH as described (Grimes et al. 2002; Harrington et al. 1997).

Once artificial chromosomes were identified cytogenetically, cell lines were grown in the presence or absence of drug selection for at least 30 days to measure mitotic stability (Grimes et al. 2002; Harrington et al. 1997), expressed on a per cell division basis, as described (Harrington et al. 1997). Metaphase chromosomes were prepared using standard protocols and 50 metaphase spreads were scored per time point. Alpha satellite FISH was performed as described (Grimes et al. 2002).
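As an illustration of how a per-division retention rate can be derived from metaphase counts, the sketch below assumes a simple geometric loss model, in which the fraction of cells carrying the artificial chromosome is multiplied by a constant retention rate at every division. This model, the function name, and the example numbers are assumptions made for illustration; the calculation actually used is the one described by Harrington et al. (1997), which may differ in detail.

```python
def retention_per_division(fraction_start, fraction_end, divisions):
    """Per-division retention rate r under the assumed model
    fraction_end = fraction_start * r ** divisions."""
    if not 0 < fraction_start <= 1 or not 0 <= fraction_end <= 1:
        raise ValueError("fractions must lie in (0, 1] and [0, 1]")
    if divisions <= 0:
        raise ValueError("divisions must be positive")
    return (fraction_end / fraction_start) ** (1.0 / divisions)


# Hypothetical example: 50 of 50 spreads positive at the start,
# 40 of 50 positive after an assumed 30 cell divisions.
rate = retention_per_division(50 / 50, 40 / 50, 30)
print(f"retention per division: {100 * rate:.2f}%")
```

The number of cell divisions elapsed is an input here, so it has to be estimated separately from the doubling time of the culture.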

Telomere PNA probes obtained from Applied Biosystems, Inc. were hybridized to chromosome spreads fixed in 3:1 methanol:acetic acid fixative as described (Lansdorp et al. 1996). Immunostaining for the centromere protein CENP-E was performed as described (Harrington et al. 1997).

Alpha satellite acquisition

Cell lines containing putative artificial chromosomes were tested for acquisition of alpha satellite from endogenous HT1080 chromosomes with centromere-specific FISH probes. Pan-centromeric primers amplified alpha satellite DNA from the HT1080 cell line that was subsequently labeled as a FISH probe, as described (Ikeno et al. 1998). An excess of unlabeled pBamX7 (Willard et al. 1983) or p17H8 (Waye and Willard 1986) DNA was denatured and preannealed with the pan-centromeric FISH probe to block FISH signals from the alpha satellite used to generate the artificial chromosome. Artificial chromosomes lacking pan-centromeric FISH signals were then tested with individual centromeric FISH cocktails, as a more rigorous test. Degenerate alpha satellite consensus primers (Weier et al. 1991) were modified (5'-TCA(A/T)(C/G)T(C/A)ACAGAGTT(G/T)AAC-3' / 5'-CACATC(A/C)CAAAG(A/T/C)AGTTTC-3') to amplify alpha satellite from monochromosomal rodent/human somatic cell hybrid DNA (from Coriell Cell Repositories, mapping panel #2). PCR products from each monochromosomal hybrid were pooled into six cocktails. Alpha satellite PCR products were pooled from chromosomes 1, 6, 13, and 20 (cocktail 1); chromosomes 2, 7, 14, and Y (cocktail 2); chromosomes 3, 8, 15, and 19 (cocktail 3); chromosomes 4, 9, 16, and 21 (cocktail 4); chromosomes 5, 11, 22, and 17 or X (cocktail 5); and chromosomes 10, 12, and 18 (cocktail 6). PCR product cocktails were directly labeled as described (Grimes et al. 2002). Cocktails of alpha satellite were hybridized to artificial chromosomes under stringent conditions (65% formamide/2× SSC washes at 42°C) as described (Harrington et al. 1997) to detect alpha satellite other than the DXZ1 or D17Z1 present on the artificial chromosome. Artificial chromosomes that were negative with the pan-centromere probe and all of the individual cocktails were concluded to be of de novo origin.

Anaphase segregation assay

To measure chromosome segregation, cell lines were arrested at metaphase with nocodazole, a benzimidazole derivative that binds to tubulin dimers, inhibiting microtubule assembly. Following metaphase arrest, cells were treated with dihydrocytochalasin B to prevent cytokinesis, suspending cells (and their segregating daughter chromosomes) in anaphase or telophase (Sullivan and Warburton 1999; Sullivan and Willard 1998). After such treatment, an estimated 20% of cells are found in anaphase or telophase configurations. Artificial chromosome, ring chromosome, and normal chromosome segregation was monitored by FISH to identify the relevant chromosome(s), using chromosome-specific alpha satellite probes. The vector probes pPAC4 (Frengen et al. 2000) and VJ104 (Harrington et al. 1997) were used to identify the DXZ1 and D17Z1 artificial chromosomes, respectively. Chromosome-specific centromere probes and whole chromosome paints (from Vysis Inc.) were used to measure chromosome segregation. For each cell line, 200 cells were scored in duplicate, for a total of 400 cells scored per data point. The frequency of missegregation was calculated for each chromosome as the number of missegregating chromosomes divided by the total number of chromosomes scored for each type.
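The frequency calculation described above can be sketched as follows. The function names and the duplicate counts are hypothetical, and the use of the sample standard deviation across the two replicate experiments is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def missegregation_rate(missegregating, total_scored):
    """Percent of scored chromosomes of one type that missegregated."""
    if total_scored <= 0:
        raise ValueError("total_scored must be positive")
    return 100.0 * missegregating / total_scored


def summarize_replicates(replicates):
    """Mean and sample standard deviation (in percent) across
    replicate experiments given as (missegregating, total) pairs."""
    rates = [missegregation_rate(m, t) for m, t in replicates]
    spread = stdev(rates) if len(rates) > 1 else 0.0
    return mean(rates), spread


# Hypothetical duplicate experiments: (missegregating, total chromosomes scored)
example = [(4, 400), (6, 400)]
avg, sd = summarize_replicates(example)
print(f"{avg:.2f} +/- {sd:.2f}%")
```

Note that the denominator is the number of chromosomes of a given type, not the number of cells, so a cell carrying several copies of a chromosome contributes several observations.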

Results

We have used two approaches to make human artificial chromosomes. In the first approach (Grimes et al. 2002; Schueler et al. 2001), we used large fragments of alpha satellite DNA cloned into BAC or PAC vectors to generate circular human artificial chromosomes. In the second approach, we used cotransfection of synthetic arrays of alpha satellite combined with large fragments of human genomic DNA and synthetic telomeres (Harrington et al. 1997; Mays et al., in preparation) to generate both linear and circular artificial chromosomes.

Type of alpha satellite influences artificial chromosome formation. Artificial chromosome studies have used a variety of chromosome-specific alpha satellite DNA sequences to generate de novo centromeres (Ebersole et al. 2000; Grimes et al. 2002; Harrington et al. 1997; Henning et al. 1999; Ikeno et al. 1998; Kouprina et al. 2003; Mejia et al. 2002; Ohzeki et al. 2002; Schueler et al. 2001). Our studies focused on higher-order X chromosome (DXZ1) and chromosome 17 (D17Z1) alpha satellite. DXZ1 and D17Z1 alpha satellite are closely related; both are composed of ~171 bp monomers arranged in a pentameric organization, and corresponding monomers within DXZ1 and D17Z1 are 85% identical on average (Waye and Willard 1986).

Despite their sequence relatedness, DXZ1 and D17Z1 form artificial chromosomes at very different frequencies in HT1080 cells. (Although the DXZ1 and D17Z1 constructs used have different BAC or PAC vectors and drug resistance markers, previous studies have shown that this difference does not significantly influence artificial chromosome formation rates (Grimes et al. 2002).) After transfecting either the DXZ1- or D17Z1-containing constructs into HT1080 cells, we selected blasticidin- or G418-resistant colonies and then detected artificial chromosomes by FISH with probes to either type of alpha satellite. Consistent with other studies (Grimes et al. 2002; Mejia et al. 2002), D17Z1 formed stable artificial chromosomes in a high proportion of drug-resistant colonies (30-40% in other studies; here 6/8 colonies). However, the artificial chromosome formation rate for DXZ1 was much lower; DXZ1 formed artificial chromosomes in only 7% (6/88) of drug-resistant colonies in three experiments.
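The colony counts reported above (6/88 for DXZ1 and 6/8 for D17Z1) can be turned into formation rates and compared with a simple 2x2 test. The sketch below is illustrative only: the text reports no statistical test for this comparison, so the choice of a two-sided Fisher's exact test, and the function names, are assumptions.

```python
from math import comb


def formation_rate(positive_colonies, total_colonies):
    """Percent of drug-resistant colonies containing an artificial chromosome."""
    return 100.0 * positive_colonies / total_colonies


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x positives in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Counts from the text: colonies with / without an artificial chromosome.
dxz1_pos, dxz1_total = 6, 88
d17z1_pos, d17z1_total = 6, 8

print(f"DXZ1:  {formation_rate(dxz1_pos, dxz1_total):.1f}%")
print(f"D17Z1: {formation_rate(d17z1_pos, d17z1_total):.1f}%")
p = fisher_exact_two_sided(dxz1_pos, dxz1_total - dxz1_pos,
                           d17z1_pos, d17z1_total - d17z1_pos)
print(f"two-sided Fisher p = {p:.2g}")
```

With only eight D17Z1 colonies the estimate of that rate is imprecise, so a test of this kind mainly confirms that the two constructs differ rather than by how much.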

Artificial chromosome composition. Previous studies have demonstrated that some artificial chromosomes may acquire DNA sequences other than the transfected input DNA (Grimes et al. 2002; Harrington et al. 1997). Because we wished to address the centromeric competence of DXZ1 and D17Z1 sequences, it was necessary to rule out that any other alpha satellite had assembled into the artificial chromosome that might confer centromere activity. To this end, we used centromeric FISH assays to detect alpha satellite besides that of the input DNA on each artificial chromosome (see Materials and Methods). While it is formally possible that the artificial chromosomes could have acquired DXZ1 or D17Z1 from endogenous chromosomes, this would escape detection using these methods. However, based on previous experiments (Harrington et al. 1997), we expect such highly specific recombination-mediated acquisition to be unlikely. Of the five DXZ1-containing chromosomes tested, three had acquired alpha satellite DNA other than DXZ1; only two, therefore, could be validated as containing de novo centromeres (Schueler et al. 2001). In contrast, none of the four D17Z1-based artificial chromosomes tested was positive for non-D17Z1 alpha satellite DNA by FISH; thus all were of de novo origin (Table 4-1). The difference between the two types of input alpha satellite is likely related to the formation efficiency of the two constructs (Grimes et al. 2002); poorly or moderately competent constructs are thus more likely to show acquisition of endogenous alpha satellite sequences. Interestingly, the two validated de novo DXZ1-based artificial chromosomes (X-4 and X-5) appear to be cytologically larger than the three acquisition DXZ1-containing artificial chromosomes (Figure 4-1). This could reflect a greater size requirement for DXZ1-based artificial chromosomes.

Table 4-1. Characteristics of human artificial chromosomes

Input DNA   Cell Line   Relative Size(a)   CENP-E(b)   Mitotic Stability(c)   Centromere(d)   Telomere(e)

Strategy 1
DXZ1        X-1         small              +           +                      acquisition     -
            X-2         small              +           -                      nd              nd
            X-3         small              +           +                      acquisition     -
            X-4         medium             +           +                      de novo         -
            X-5         medium             +           +                      de novo         -
            X-6         small              +           nd                     acquisition     -
D17Z1       17-1        small              +           +                      de novo         -
            17-2        small              +           +                      de novo         -
            17-3        medium             +           nd                     de novo         -
            17-4        medium             +           nd                     de novo         -

Strategy 2
D17Z1       PF2.6       small              +           +                      de novo         +
            PF2.7       small              +           +                      de novo         -
            17-15       large              +           +                      de novo         +

(a) Relative size was determined by DAPI staining, and artificial chromosomes were grouped into small, medium, or large categories. Small artificial chromosomes were estimated to be ~15 Mb or less, medium artificial chromosomes were estimated to be between 15 and 30 Mb, and large artificial chromosomes were estimated to be larger than 30 Mb.
(b) Immunostaining with anti-CENP-E antibodies for the presence of CENP-E on the artificial chromosome.
(c) Mitotic stability was measured as described in Materials and Methods. Mitotically stable artificial chromosomes were retained at a rate of 98.6-100% per cell division, calculated as described (Harrington et al. 1997). nd = not done.
(d) Artificial chromosomes were tested for the acquisition of non-input alpha satellite DNA sequences (see Materials and Methods).
(e) Artificial chromosomes were assayed for the presence of functional telomeres. Artificial chromosomes constructed from circular input DNA (in cell lines X-1 through 17-4) did not acquire telomere sequences as demonstrated by FISH. Artificial chromosomes in which telomere sequences were included in the input constructs (PF2.6, PF2.7) were validated for the presence of functional telomeres by gamma-irradiation and pulsed-field gel analysis.

We only assayed the artificial chromosomes in this study for the acquisition of alpha satellite, as no other sequences have been implicated in centromere function in humans (Ebersole et al. 2000; Saffery et al. 2001).

Artificial chromosome centromere function. While both DXZ1 and D17Z1 can form de novo centromeres on an artificial chromosome, the functionality of these centromeres remained to be determined. Centromere protein E (CENP-E), a kinesin-like motor protein, is part of the kinetochore (Yen et al. 1991; Yen et al. 1992) and localizes specifically to active centromeres (Sullivan and Schwartz 1995), making it a useful marker of functional centromeres. We immunostained cell lines containing human artificial chromosomes with antibodies to CENP-E. Like endogenous chromosomes, all artificial chromosomes were positive for the characteristic pattern of CENP-E staining (Figure 4-1, Table 4-1). To further evaluate the artificial chromosome centromeres, we followed the mitotic stability of several artificial chromosomes in the presence and absence of drug selection (Table 4-1, Strategy 1). Of the seven artificial chromosomes tested in this way, six were mitotically stable both in the presence and absence of drug selection and were retained at a calculated rate of 98.6-100% per cell division. The fact that the artificial chromosomes in this study are retained after many cell divisions indicates that, like previously described artificial chromosomes (Ebersole et al. 2000; Grimes et al. 2002; Harrington et al. 1997; Henning et al. 1999; Ikeno et al. 1998; Kouprina et al. 2003; Mejia et al. 2002; Ohzeki et al. 2002; Schueler et al. 2001), they segregate well, even in the absence of selection.

Figure 4-1. FISH analysis of artificial chromosomes. Arrowheads denote artificial chromosomes. DAPI (4',6-diamidino-2-phenylindole) stains chromosomes in blue. (A-C) CENP-E immunostaining in green stains active centromeres; alpha satellite FISH in red hybridizes to the artificial chromosome and the relevant endogenous centromere. Cell lines X-5 (A) and X-6 (B) contain DXZ1-based artificial chromosomes probed with DXZ1. (C) Cell line 17-1 contains a D17Z1-based artificial chromosome probed with D17Z1. (D) Cell line 17-15. A green chromosome 8 paint probe hybridizes to the endogenous chromosome 8 as well as the artificial chromosome. The D17Z1 FISH probe in red hybridizes to the artificial chromosome as well as the endogenous chromosome 17. (E) Cell line PF2.6. The green D17Z1 probe hybridizes to the artificial chromosome and to the centromeres of the endogenous chromosomes 17. An HPRT probe in red hybridizes to the artificial chromosome. The inset shows a DAPI image of the artificial chromosome. (F) Cell line 17-1. A telomere probe in green hybridizes to the ends of all chromosomes except for the artificial chromosome.

Artificial chromosome segregation in anaphase. Mitotic stability in the drug selection assay reflects the proportion of cells containing one or more artificial chromosomes over time; as such, it is only an indirect measure of segregation events in individual cell divisions. To investigate the segregation of the artificial chromosomes more directly, we implemented an assay based on use of nocodazole to trap cells in anaphase and thus to monitor chromosome segregation directly at each cell division (Sullivan and Warburton 1999; Sullivan and Willard 1998). We monitored the segregation of seven human artificial chromosomes, three formed with DXZ1 and four formed with D17Z1. Six of the seven artificial chromosomes tested were fully validated and of de novo origin (see above), while one of the DXZ1 artificial chromosomes, X-3, had acquired endogenous chromosome 7 alpha satellite and thus contained multiple types of alpha satellite. After suspending cells in anaphase, we performed FISH with probes to identify the artificial chromosome and endogenous X chromosomes or chromosomes 17 in each cell line, providing a direct comparison of the segregation of artificial chromosomes to the corresponding endogenous chromosomes in the same cell (Figure 4-2). In addition to the endogenous chromosomes 17 and X, we also examined the segregation of natural human chromosomes 7, 18, and Y as controls.

Figure 4-2. Anaphase segregation assay. Anaphase nuclei were stained blue with DAPI and hybridized with FISH probes to detect artificial and natural chromosomes. In each example, the green alpha satellite FISH probe hybridized to the artificial chromosome as well as the natural chromosomes with the same alpha satellite as the artificial chromosome. A red vector probe hybridized to the artificial chromosomes, but not the natural chromosomes. Thus, the natural chromosomes appear as green dots, while the artificial chromosomes appear as merged green and red signals. (A) Cell line X-5. The natural X chromosomes segregated 2:2 and the artificial chromosomes segregated 1:1. (B) Cell line 17-2. The natural chromosomes 17 segregated 4:4 and the artificial chromosomes segregated 2:4, reflecting a likely nondisjunction event. (C) Cell line X-5. The natural X chromosomes segregated 2:2 and the artificial chromosomes segregated 1:2 with a lagging artificial chromosome in between the other segregating chromosomes. (D) Cell line 17-3. The natural chromosomes 17 segregated 4:4 and one of the artificial chromosomes was in an anaphase bridge.

Missegregation of the natural human chromosomes tested ranged from 0.33% to 1.1% per chromosome per cell division (Figure 4-3A). These data are consistent with previous studies of chromosome segregation in other human cell lines using this or similar assays (Cimini et al. 2002; Sullivan and Willard 1998). In contrast, the human artificial chromosomes tested missegregated between 1.6% and 9.7% per chromosome per cell division (Figure 4-3B). Notably, the artificial chromosome in cell line X-3 missegregated at a rate of 2.0 +/- 0.4%, similar to the other artificial chromosomes studied. Although this artificial chromosome had acquired endogenous alpha satellite that may or may not be acting as the functional centromere, it behaved like the other artificial chromosomes. The wide range of artificial chromosome missegregation rates is due largely to the artificial chromosomes in cell lines 17-3 and 17-2, with missegregation rates of 4.3 +/- 0.05% and 9.7 +/- 0.2%, respectively. Even without including these two artificial chromosomes, however, the missegregation rates of artificial chromosomes and natural chromosomes were significantly different from each other (p<0.001).

Figure 4-3. Missegregation of natural, artificial, and variant chromosomes. Missegregation rates were calculated as the number of missegregating chromosomes divided by the total number of chromosomes present of each type. Percent of chromosomes missegregating is plotted on the Y axis (scale 0-10%). Error bars represent one standard deviation from the mean of two segregation experiments for each chromosome. Total number of cells scored for each datapoint was >400. (A) Missegregation rates for natural chromosomes 7, X, 17, 18, and Y. (B) Missegregation rates for artificial chromosomes in cell lines X-3, X-4, X-5, 17-1, 17-2, 17-3, and 17-4. Squares indicate DXZ1-based artificial chromosomes, triangles denote D17Z1-based artificial chromosomes. (C) Missegregation frequencies for the deletion chromosome 17 (d(17)), the ring chromosome 17 (r(17)), and the ring X chromosome (r(X)). (D) Missegregation rates for D17Z1-based artificial chromosomes in cell lines 17-15, PF2.6 and PF2.7.

Variant chromosome segregation. The differences in segregation between natural chromosomes and artificial chromosomes could be explained by a number of factors relevant to centromere and/or chromosome function. To shed light on the possible differences in missegregation rates between artificial and natural chromosomes, we next examined a number of variant human chromosomes. To control for possible variation between diploid human cell lines and the hyperdiploid HT1080 line, we first examined the segregation of the normal chromosomes 17 and X in the diploid lines. The frequency of missegregation detected (0.6% and 0.1%, respectively) was indistinguishable from those shown in Figure 4-3A for HT1080, indicating that such possible cell line effects are negligible.

As one hypothesis, artificial chromosomes may missegregate more than natural chromosomes due to the presence of de novo centromeres, which may be compromised in some as yet unidentified way relative to endogenous centromeres. If de novo centromeres are the only factor causing the observed increase in missegregation, then we would expect chromosomes of a comparable size with natural centromeres to segregate better than artificial chromosomes. To test this, we measured the segregation of two naturally occurring, patient-derived ring chromosomes, one derived from the X chromosome and one derived from chromosome 17. These ring chromosomes are cytologically similar in size to artificial chromosomes and contain either DXZ1 or D17Z1 alpha satellite at their natural centromeres. When analyzed in the anaphase assay, the chromosome 17-derived ring chromosome missegregated with a frequency of 2.5 +/- 0.3%, while the X-derived ring missegregated with a frequency of 4.5 +/- 0.8% (Figure 4-3C). Notably, these rates are within the range of artificial chromosome missegregation rather than the range of natural chromosome missegregation. These data argue, therefore, that the presence of de novo centromeres alone does not account for an increase in segregation defects.

The amount of alpha satellite could also play a role in chromosome segregation. With the exception of the Y centromere, natural human centromeres are composed of several megabases of alpha satellite (Mahtani and Willard 1990; Wevrick and Willard 1989), although the minimum amount of alpha satellite capable of centromere function is unknown. To investigate the possible effect of alpha satellite array size on segregation, we next analyzed the segregation of a patient-derived chromosome 17 with a deletion within the alpha satellite array. This deleted chromosome contains less than a quarter of the alpha satellite present at the normal 17 centromere (700 kb vs. 3.2 Mb) (Wevrick et al. 1990); however, it segregates normally in the anaphase segregation assay, with a missegregation rate of 0.12 +/- 0.1% (Figure 4-3C). Thus, at least above 700 kb of alpha satellite, array size does not determine the fidelity of chromosome segregation. Based on the relative intensity of FISH signals, we estimate that the artificial chromosomes in this study contain more than 700 kb of alpha satellite. It is therefore unlikely that the segregation errors we observe in artificial chromosomes are due to a lack of alpha satellite DNA.

Since both artificial chromosomes with de novo centromeres and ring chromosomes with natural centromeres missegregate more than normal chromosomes, the data suggest that some characteristic of chromosome structure other than the origin of the centromere plays a role in chromosome segregation. Both artificial chromosomes and ring chromosomes are smaller than natural chromosomes. If chromosome size is involved in segregation, then we would expect larger artificial chromosomes to segregate better than smaller artificial chromosomes, as is the case with YACs (Murray et al. 1986; Murray and Szostak 1983). To determine whether human artificial chromosome size affects segregation, we examined an artificial chromosome with a de novo D17Z1 centromere that, by translocation, had acquired a large region of chromosome 8q (Harrington et al. 1997). This chromosome (in clone 17-15) is much larger than any of the other artificial chromosomes or ring chromosomes in this study (Figure 4-1D, Table 4-1, Strategy 2) and is the only artificial chromosome with a normal euchromatic chromosome arm. Unlike the artificial chromosomes and ring chromosomes, it is linear and is capped by telomere repeats at each end of the chromosome (Harrington et al. 1997). This large linear artificial chromosome missegregates at a frequency of 0.75 +/- 0.35%, within the range of natural human chromosomes (Figure 4-3D). These data argue that chromosome size and/or organization are involved in ensuring proper segregation.

If linear structure alone contributes to proper chromosome segregation, then linear artificial chromosomes should segregate better than circular ones. Indeed, studies in yeast have shown that circular YACs missegregate more frequently than linear YACs of a comparable size (Murray and Szostak 1983). Small human artificial chromosomes could also be affected by circular structure, since the seven artificial chromosomes studied here in detail were formed from constructs without telomere sequences and, when assayed for the acquisition of telomere sequences using telomere probes (see Materials and Methods), contain no detectable telomere sequences (Table 4-1, Figure 4-1F).

To evaluate the effect of linearity on chromosome segregation, we next studied the segregation of two additional artificial chromosomes, one linear (PF2.6) and one circular (PF2.7) (Table 4-1, Strategy 2), containing copies of the human HPRT gene (RWM and MKR, data not shown). Both of these artificial chromosomes contain de novo D17Z1 alpha satellite centromeres, a fragment of genomic DNA containing HPRT gene sequences, and telomere repeats (see Materials and Methods) (Mays et al., in preparation). The missegregation frequencies for PF2.6 and PF2.7 are 4.1 +/- 0.35% and 5.2 +/- 0.55% respectively, both within the range of artificial chromosome missegregation rates (Figure 4-3D). The similarity in segregation between the circular and linear artificial chromosomes refutes the possibility that circular structure alone causes an increase in segregation errors.

Types of missegregation errors. In addition to the frequency of missegregation, the types of segregation errors differed between artificial chromosomes and natural chromosomes (Table 4-2). We analyzed each type of chromosome for the number of detected nondisjunction and anaphase lag events, as these types of error could be easily distinguished cytogenetically (Figure 4-2). (We excluded from this analysis other errors in which the mechanism could not be inferred, i.e. apparent 1:0 segregation. These uncharacterized errors were equally represented among natural and artificial chromosomes, making up 12-18% of the total segregation errors, and could reflect absence of replication, chromosome loss or hybridization failure.)

Natural chromosomes are more likely to nondisjoin than lag during anaphase (Table 4-2); of the 153 missegregating natural chromosomes detected in this study, 81% underwent nondisjunction, while only 19% lagged. This trend was also apparent in the limited number of ring chromosome missegregation events detected, among which 85% nondisjoined and 15% lagged. In contrast, missegregating artificial chromosomes (n=447) nondisjoined 60% of the time and lagged 40% of the time. The frequencies of nondisjunction and anaphase lag exhibited by artificial chromosomes and natural chromosomes are statistically different (chi-square, p<0.001), suggesting that the mechanism of segregation error varies between natural and artificial chromosomes. Even among artificial chromosomes, missegregation rates and type of missegregation vary from clone to clone (data not shown), likely reflecting the different de novo centromeres. For example, the artificial chromosome in cell line X-5 lagged 30% of the time (n=13) and nondisjoined 70% of the time (n=30) while the artificial chromosome in cell line 17-3 had a higher incidence of lag; 54% of missegregating chromosomes (n=27) vs. 46% nondisjunction (n=23). The propensity for anaphase lag in artificial chromosomes as compared to natural chromosomes and ring chromosomes suggests that de novo centromeres are structurally and functionally compromised in some manner and/or may be missing critical factors required for normal chromosome segregation.

Table 4-2. Segregation Errors

                                       Missegregating chromosomes (a)
Chromosome type   Total chromosomes    Total    NDJ          Lag
                  scored
Normal            31,867               153      124 (81%)    29 (19%)
Artificial        13,193               447      268 (60%)    179 (40%)
Ring              947                  13       11 (85%)     2 (15%)

(a) Number of missegregating chromosomes undergoing nondisjunction (NDJ) or anaphase lag.
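The chi-square comparison of error types can be checked directly from the counts in Table 4-2. The sketch below is illustrative only: it assumes a Pearson chi-square test on the 2x2 table of normal versus artificial chromosomes without a continuity correction, since the text does not state which variant of the test was used, and the function name chi_square_2x2 is introduced here for illustration.

```python
import math

# Counts from Table 4-2: (nondisjunction, anaphase lag) per chromosome type.
observed = {
    "normal": (124, 29),
    "artificial": (268, 179),
}

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction; an assumption)."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            # Expected count under independence of chromosome type and error type.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat

chi2 = chi_square_2x2(observed)
# With one degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.2f}, p = {p_value:.2g}")
```

Under these assumptions the statistic exceeds 10.83, the critical value for p = 0.001 at one degree of freedom, which is consistent with the p<0.001 reported above.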

Discussion

Chromosome segregation is a complicated process involving factors specific to centromere structure and function, as well as more general aspects of chromosome organization (Sullivan et al. 2001). In yeast, the requirements for proper chromosome segregation have been well defined using YACs as a model. Here, we have used a human artificial chromosome system as a similar model for addressing the critical DNA elements required for proper segregation of human chromosomes in mitosis. By studying the segregation of natural chromosomes, artificial chromosomes, and patient-derived variant chromosomes, we have begun to examine the requirements for proper chromosome segregation.

Chromosome size and structure affect segregation. Chromosome size is a likely factor in chromosome segregation, as revealed initially with YACs (Murray and Szostak 1983). Our data with human natural and artificial chromosomes support this notion; human artificial chromosomes and ring chromosomes analyzed in this study are smaller than normal chromosomes, and they missegregate more often than normal chromosomes (Figure 4-3). Thus, there may be a minimum chromosome size threshold (presumably smaller than the normal Y chromosome) below which chromosomes are more prone to mitotic error.

Overall chromosome structure and organization may also affect segregation. Ring chromosomes and the majority of human artificial chromosomes in this study are circular, and both of these types of chromosome missegregate more often than normal chromosomes. However, a linear artificial chromosome in this study, PF2.6, missegregates at a similar rate to the circular artificial chromosomes, arguing that circular structure alone does not account for the observed increased frequency of segregation errors. In contrast, linear artificial chromosome 17-15 missegregates within the range of normal chromosomes (Figure 4-3); however, this chromosome has acquired a region of a normal chromosome arm (Harrington et al. 1997) and consequently is much larger than the other artificial chromosomes examined in this study. This artificial chromosome may segregate properly due to its large size or to the presence of a bona fide chromosome arm. This conclusion has implications for the design of next-generation human artificial chromosomes.

The type of segregation error exhibited by chromosomes is indicative of the mechanism of centromere dysfunction. The fact that normal chromosomes and ring chromosomes undergo relatively less chromosome lag than artificial chromosomes suggests that de novo centromeres are compromised in a way that increases the frequency of anaphase lag. Artificial chromosomes are composed of alpha satellite DNA plus BAC or PAC vector sequences concatemerized into chromosomes larger than the original 100 kb input DNA. Multiple stretches of alpha satellite separated by vector sequences may cause an artificial chromosome to behave as a dicentric or multicentric chromosome. Dicentric chromosomes in yeast and humans are prone to segregation defects. In yeast, dicentric minichromosomes are mitotically unstable and have a higher copy number than monocentric minichromosomes (Koshland et al. 1987). Similarly, human dicentric chromosomes are more prone to lag than normal chromosomes (Sullivan and Willard 1998). In both of these previous studies, the distance between centromeres affects segregation; when the centromeres are close to each other, the chromosome may function as a monocentric chromosome. At the resolution of metaphase chromosomes, artificial chromosomes appear to bind centromere proteins in one discrete kinetochore; however, the precise organization of DNA and proteins is not currently known. Further analysis of the structure of artificial chromosomes may prove some to be functionally di- or multicentric and/or others to be lacking in proteins required for centromere function.

Factors affecting chromosome segregation. Among the artificial chromosomes examined in this study, we observe a wide range of missegregation rates (Figure 4-3). Furthermore, the frequency of anaphase lag is increased in artificial chromosomes relative to normal chromosomes (Table 4-2). Unlike natural human chromosomes, artificial chromosomes must form de novo centromeres in culture, a process requiring specific DNA sequences (Ohzeki et al. 2002; Willard 2001), centromeric proteins, and epigenetic chromatin modifications (Cleveland et al. 2003). As each artificial chromosome centromere is created independently, some may establish better centromeric configurations than others.

Segregation errors may arise from impaired assembly of trans-acting centromeric factors on the artificial chromosome. For example, CENP-A, a histone H3 variant found at the centromere, plays a critical role in centromere function and is conserved from yeast to human (Palmer et al. 1987; Stoler et al. 1995). While we did not directly address whether the artificial chromosomes in this study contain CENP-A, its presence is suggested by the localization of the centromere protein CENP-E (Table 4-1) (Grimes et al. 2002; Harrington et al. 1997; Ikeno et al. 1998; Masumoto et al. 1998). Furthermore, CENP-E associates with only a subset of the total mass of alpha satellite on both normal and artificial chromosomes (Figure 4-1) (Grimes et al. 2002; Schueler et al. 2001; Sullivan et al. 2002). Whether the remaining alpha satellite is truly redundant or whether it serves a more general function as pericentric heterochromatin remains unknown (Sullivan et al. 2001). Thus at the current level of analysis there are no indications that interactions between kinetochore proteins and centromeric DNA are compromised.

Examination of centromere function in model organisms demonstrates that, in addition to the CENPs, other proteins are required at the centromere to ensure proper chromosome segregation. For example, deletion of the heterochromatin protein HP1 homolog, Swi6, in fission yeast results in chromosome lag (Bernard et al. 2001). Mouse cells depleted of HP1 exhibit micronuclei, an indicator of missegregation (Taddei et al. 2001). Checkpoint proteins monitor bipolar kinetochore attachment to spindle microtubules and initiate the metaphase to anaphase transition. Absence of checkpoint proteins necessary for the anaphase promoting complex causes a delay of anaphase onset (Musacchio and Hardwick 2002). Centromeric cohesion is also critical for chromosome segregation. Absence of the Drosophila cohesion protein, MEI-S332, results in chromosome segregation defects, as well as reduced transmission frequency of minichromosomes (Lopez et al. 2000). Similarly, when Rad21, a member of the fission yeast cohesion complex, is depleted from fission yeast chromosomes there is an increase in the incidence of chromosome lag (Bernard et al. 2001). Therefore the presence of heterochromatin, checkpoint, cohesion, or other as yet unidentified proteins may be necessary for proper segregation of artificial chromosomes, and one or more of these aspects may be suboptimal in the human artificial chromosomes examined here.

This study suggests that both chromosome and centromere structure play important roles in chromosome segregation. From our analysis of artificial, variant, and normal chromosomes it appears that chromosome size and/or composition is involved in chromosome segregation. Future investigations into the organization of a range of different human artificial chromosomes may help to identify the factor(s) responsible for impaired de novo centromeres and give us more insight into the requirements (both genetic and epigenetic) for human chromosome segregation.

Acknowledgements

The authors thank Brenda Grimes, Bala Balakumaran, Kristin Scott, Mary Schueler, Beth Sullivan, Gil Van Bokkelen and John Harrington for useful discussions. This work was supported by a Franklin Delano Roosevelt research grant from the March of Dimes Birth Defects Foundation and by a Sponsored Research Agreement from Athersys, Inc. Both RWM and HFW have a financial interest in Athersys, and this potential conflict of interest has been disclosed to and managed by Case Western Reserve University. MKR was supported in part by a training grant from the National Institutes of Health.

Chapter 5: Conclusions and Future Studies

This thesis has addressed the organization, evolution and function of alpha satellite in the human genome. The majority of alpha satellite in the human genome is organized in a higher-order structure, comprising megabases of extremely homogeneous multimeric units at every centromere. More divergent monomeric alpha satellite is found at the edges of higher-order arrays. These two types of alpha satellite have different evolutionary histories and differ in their roles in centromere function.

Alpha satellite in the human genome

Before the start of this thesis work, higher-order alpha satellite had been well documented at all human centromeres (Alexandrov et al. 1991; Choo et al. 1990; Devilee et al. 1988; Ge et al. 1992; Greig et al. 1989; Greig et al. 1993; Haaf and Willard 1992; Hulsebos et al. 1988; Jorgensen et al. 1988; Looijenga et al. 1992; Puechberty et al. 1999; Rocchi et al. 1991; Vissel and Choo 1991; Waye et al. 1987a; Waye et al. 1987b; Waye et al. 1987c; Waye and Willard 1985; Waye and Willard 1986; Waye and Willard 1987; Waye and Willard 1989a; Willard et al. 1983; Wolfe et al. 1985), whereas monomeric alpha satellite had only been identified on a few chromosomes (de la Puente et al. 1998; Guy et al. 2003; Horvath et al. 2000; Ikeno et al. 1994; Jackson et al. 1996; Schueler et al. 2001; Wevrick et al. 1992). In the analysis of the July 2003 (Build 34) version of the human genome assembly (Chapter 2), higher-order alpha satellite was found in the assemblies of only eleven chromosomes. Among this higher-order alpha satellite, four novel types of higher-order alpha satellite were identified on chromosomes 4, 10, 11 and 19. Monomeric alpha satellite has been assembled on all but three chromosomes, suggesting that this type of alpha satellite commonly lies between higher-order arrays and the euchromatin of chromosome arms. As only 13 out of 43 chromosome arms have reached higher-order alpha satellite (Chapter 2, Figure 2-1), much of the pericentromeric regions of the genome are still unassembled. Furthermore, no higher-order array has been completely sequenced, so each chromosome assembly is missing megabases of centromeric sequence.

Functional alpha satellite

Given the critical role of centromeres in chromosome segregation, identifying the DNA sequences responsible for centromere function is an important task. Alpha satellite was localized to the primary constrictions of human chromosomes over 25 years ago (Manuelidis 1978). The abundance of alpha satellite at human centromeres has led to the hypothesis that alpha satellite plays a role in centromere function (Willard et al. 1989). However, the specific type of alpha satellite underlying the centromere protein domain of the kinetochore has only recently been characterized.

Chapters 2 and 4 describe the functional characterization of alpha satellite using cytological, biochemical and artificial chromosome assays. Antibodies to centromere proteins CENP-A and -E only colocalize with higher-order and not monomeric alpha satellite (Chapter 2). This functional difference between the two types of alpha satellite is also evident in the analysis of chromatin immunoprecipitation data (Ando et al. 2002; Vafa and Sullivan 1997). Among the alpha satellite sequences associated with CENP-A antibodies, only higher-order alpha satellite was identified (Chapter 2). Furthermore, higher-order alpha satellite from chromosome 17 (D17Z1) and the X chromosome (DXZ1) is capable of forming artificial chromosomes with de novo centromeres (Chapter 4). In contrast, monomeric alpha satellite is not sufficient for centromere formation in an artificial chromosome assay (Ikeno et al. 1998). All of these data demonstrate that higher-order alpha satellite is the site of the functional centromere.

Does monomeric alpha satellite have any role in chromosome function, or is it merely a by-product of alpha satellite evolution? Detailed analyses of chromosome segregation in cultured cells showed that artificial chromosomes and small patient-derived ring chromosomes missegregate more often than normal chromosomes (Chapter 4). The artificial chromosomes were composed of only higher-order alpha satellite and BAC vector sequences. The ring chromosomes have not been completely structurally defined, but they do not contain the entire higher-order array and flanking sequences present on the chromosomes from which they were derived (Wevrick et al. 1990). These findings suggest that sequences other than higher-order alpha satellite could also be involved in some aspect of chromosome segregation. The segregation defects exhibited by artificial and ring chromosomes were fairly subtle, and all of these chromosomes bound antibodies to centromere proteins, suggesting that these chromosomes had assembled a normal kinetochore (Chapter 4).

Perhaps monomeric alpha satellite flanking the higher-order arrays is involved in some aspect of pericentromeric function. Although monomeric alpha satellite cannot nucleate a centromere in an artificial chromosome assay (Ikeno et al. 1998) or bind antibodies to centromere proteins on a normal chromosome (Chapter 2), it may serve some other purpose in chromosome function. This model is consistent with the finding that mitotically stable human minichromosomes retain monomeric alpha satellite and other sequences flanking the higher-order array (Shen et al. 2001; Spence et al. 2002). Similar to the S. pombe and D. melanogaster pericentromeric regions (Chapter 1, Fig 1-4), the human pericentromere could be involved in sister chromatid cohesion or heterochromatin formation (Sullivan et al. 2001). This is an attractive model that needs further investigation to clearly delineate the chromatin modifications and specific proteins found in the human pericentromeric and centromeric regions.

Alpha satellite evolution

The organization of higher-order and monomeric alpha satellite is a product of concerted evolutionary processes. Previous studies of higher-order alpha satellite have demonstrated that intrachromosomal exchanges are far more frequent than interchromosomal exchanges (Durfy and Willard 1989; Schindelhauer and Schwarz 2002; Warburton and Willard 1996; Willard and Waye 1987b). As shown in this thesis, this is also the case for monomeric alpha satellite; monomeric alpha satellite monomers from the same chromosome are more similar to each other than to monomers on different chromosomes (Chapter 3). However, the rate of sequence divergence in higher-order and monomeric alpha satellite is quite different. Among great apes, monomeric alpha satellite is more conserved than higher-order alpha satellite (Chapter 3). In fact, comparing chimpanzee and human genomes, monomeric alpha satellite evolves at approximately the same rate as other (non-satellite) sequences (Chapter 3) (Liu et al. 2003; Watanabe et al. 2004). The difference in evolutionary rates between higher-order and monomeric alpha satellite is likely caused by an increase in crossing-over among higher-order repeat units due to their extremely high . This process would allow any changes in higher-order alpha satellite to quickly spread throughout that array and fix a new higher-order repeat unit, thus driving increased sequence divergence relative to either monomeric alpha satellite or the rest of the genome.
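The comparison of monomers within and between chromosomes rests on mean pairwise percent identity. The following is a minimal sketch of that kind of calculation, not the analysis pipeline used in Chapter 3. The sequences, the chromosome labels chrA and chrB, and the function names are hypothetical toy values chosen for illustration, and the monomers are assumed to be already aligned to equal length.

```python
from itertools import combinations, product

def percent_identity(a, b):
    """Percent of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical toy monomers (real alpha satellite monomers are about 171 bp).
monomers = {
    "chrA": ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC"],
    "chrB": ["TCGTTCGTAC", "TCGTTCGTAG", "TCGATCGTAC"],
}

# Mean identity among monomer pairs drawn from the same chromosome.
within = mean(
    percent_identity(a, b)
    for seqs in monomers.values()
    for a, b in combinations(seqs, 2)
)
# Mean identity among monomer pairs drawn from different chromosomes.
between = mean(
    percent_identity(a, b)
    for a, b in product(monomers["chrA"], monomers["chrB"])
)
print(f"within-chromosome: {within:.1f}%  between-chromosome: {between:.1f}%")
```

A higher within-chromosome mean than between-chromosome mean is the pattern expected when intrachromosomal exchange homogenizes monomers on the same chromosome.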

Based on these data, it is possible to hypothesize steps involved in alpha satellite evolution (Figure 5-1). As proposed by Smith, an initial mutation(s) might generate homology between two sequences by chance (Smith 1976). These newly homologous sequences would be subject to unequal crossing-over between sister chromatids, creating a tandem duplication. Further unequal crossovers lead to expansions and contractions in the number of repeats, or in the case of alpha satellite, monomers. Interchromosomal exchange mechanisms, such as transposition or gene conversion, might populate monomers on all chromosomes, and initially such monomers would be highly identical to one another. Indeed, this type of alpha satellite organization is found at the centromeres of the African Green Monkey (Rosenberg et al. 1978; Thayer et al. 1981). As sequence divergence increases among monomers from different chromosomes, intrachromosomal exchange mechanisms homogenize monomeric alpha satellite on the same chromosome. This leads to the chromosome-specific type of monomeric alpha satellite discussed in Chapter 3.

Figure 5-1. Model of alpha satellite evolution. Alpha satellite monomers and higher-order repeat units are depicted as arrowheads and block arrows, respectively. The numbers of monomers and higher-order repeat units per chromosome have been reduced for simplicity. Primates representative for the type of alpha satellite organization are listed in grey boxes on the left. Each step of the model is described, see the text for a more detailed explanation.

[Figure 5-1 steps, in order: first monomer; tandem duplication; monomer expansion; interchromosomal exchange transfers monomers to other chromosomes; intrachromosomal exchange homogenizes monomeric alpha satellite (chromosome-specific monomeric); unequal crossing-over generates higher-order repeats from monomeric alpha satellite; interchromosomal exchange transfers higher-order repeat units, creating suprachromosomal families; intrachromosomal exchange homogenizes higher-order alpha satellite (chromosome-specific higher-order). Primate labels, in order: Prosimians??, New World Monkeys??; African Green Monkeys; Other primates??, Lesser apes??; Great Apes: Orangutans?, Gorillas, Chimpanzees, Humans.]

Next, unequal crossing-over duplicates a subset of tandem monomers into a higher-order repeat on some or all chromosomes. There are two possibilities at this step. Higher-order alpha satellite might originate only from the monomeric alpha satellite on the same chromosome, allowing independent higher-order repeat formation on every chromosome. It is also possible that higher-order repeat units formed only on a few chromosomes and were then transferred to other chromosomes. Either mechanism could generate higher-order alpha satellite bordered by more divergent monomeric alpha satellite.

Intrachromosomal exchange will homogenize higher-order alpha satellite into large chromosome-specific arrays. Interchromosomal exchanges of higher-order alpha satellite will lead to similarity of higher-order repeat units on different chromosomes, as is evidenced by the suprachromosomal families of higher-order alpha satellite (Alexandrov et al. 1988; Waye and Willard 1986). Higher-order alpha satellite has thus far only been found in orangutans (Haaf and Willard 1998; Waye and Willard 1989b), gorillas (Durfy and Willard 1990; Waye and Willard 1989b), chimpanzees (Baldini et al. 1991; Warburton et al. 1996; Waye and Willard 1989b), and humans, suggesting that the capacity to form higher-order alpha satellite arose only in the last common ancestor of the great apes.

This model of alpha satellite evolution is based on a number of studies that have examined specific regions of alpha satellite in primates (Durfy and Willard 1989; Durfy and Willard 1990; Schueler et al. 2001; Warburton and Willard 1996; Warburton et al. 1996; Warburton and Willard 1990; Warburton and Willard 1995; Waye and Willard 1986; Wevrick et al. 1992; Willard and Waye 1987b). The work described in Chapter 3 analyzed alpha satellite from six different human chromosomes to demonstrate that monomeric alpha satellite has been homogenized by intrachromosomal exchanges. Future studies should take advantage of the wealth of data in genome assemblies to comprehensively address questions of alpha satellite evolution. As the human genome assembly improves and other primate genomes are sequenced, it will be possible to thoroughly dissect the evolution of this functionally important class of satellite DNA.

Future Studies

Sequencing the human centromere

As discussed in Chapter 2, no human centromere has been entirely sequenced. The centromere has been consciously omitted from the human genome project (Collins et al. 1998; Lander et al. 2001) due to its repetitive complexity and the assumption that it contained nothing but “junk DNA” (Doolittle and Sapienza 1980; Orgel and Crick 1980). However, the genome assembly is missing a critical functional element of the genome. Understanding the organization and evolution of the alpha satellite that makes up the human centromere is worth pursuing, at least for a handful of chromosomes.

Given the extreme homogeneity among higher-order repeat units (Durfy and Willard 1989; Schindelhauer and Schwarz 2002; Schueler et al. 2001) and polymorphism in repeat unit length (Warburton et al. 1993; Warburton and Willard 1995) and array size (Mahtani and Willard 1990; Wevrick and Willard 1989) among individuals, a specialized strategy needs to be developed to sequence a human centromere (Figure 5-2). Borrowing from the strategy used to sequence the male-specific region of the human Y chromosome (Skaletsky et al. 2003), a human centromere sequencing project could start with a single (haploid) chromosome to eliminate the confounding effects of polymorphism between homologs. To map a single centromere, a BAC library could be constructed from either a monochromosomal somatic cell hybrid or a hydatidiform mole and then screened for alpha satellite positive BACs (Ikeno et al. 1998; Kouprina et al. 2003). Monochromosomal somatic cell hybrids contain only one human chromosome in a rodent cell line background, while hydatidiform moles are completely homozygous diploid tumors that arise from the fertilization of an empty ovum (Fan et al. 2002). Either of these genomic sources would eliminate the polymorphism found in most BAC libraries. The BAC library should be generated by shearing genomic DNA to avoid a cloning bias that could possibly under-represent the higher-order repeat units of interest.

Figure 5-2. Strategy for sequencing an entire human centromere. To eliminate the confounding effects of polymorphism, the sequencing project should begin with a single chromosome, either derived from a somatic cell hybrid or hydatidiform mole cell line. The next step would be to construct a BAC library using DNA shearing to avoid a cloning bias that might exclude certain alpha satellite regions. Restriction mapping a single centromere (grey box) and BACs (black lines) corresponding to this region would create an in-depth map of the alpha satellite array. Sequencing overlapping and redundant BACs would detect rare sequence variants in higher-order alpha satellite that would aid in assembling this extremely homogeneous sequence.

[Figure 5-2 restriction sites labeled on the higher-order array map: NruI, XhoI, BamHI, BglII, BstEII, KpnI.]

Using restriction enzymes that cut frequently in the genome, but rarely in alpha satellite, a higher-order array could be mapped as described previously (Jackson et al. 1996; Mahtani and Willard 1990; Mahtani and Willard 1998; Puechberty et al. 1999; Tyler-Smith and Brown 1987; Warburton and Willard 1990; Wevrick and Willard 1989; Wevrick and Willard 1991) (Chapter 1). Alpha satellite-containing BACs would be restriction mapped in a similar fashion to build a scaffold of BACs underlying the higher-order array (Figure 5-2). Thorough sequencing of overlapping and redundant BACs would reduce the number of sequencing errors. Assembling the sequences within each BAC could take advantage of sequence variants within higher-order alpha satellite (Durfy and Willard 1987; Schindelhauer and Schwarz 2002) that will be detectable due to the elimination of polymorphism and the reduction of sequencing errors.

Sequencing an entire human centromere will provide insight into the rates and range of homogenization events that participate in alpha satellite evolution (Durfy and Willard 1989). Also, this project could possibly uncover other non-alpha satellite sequences embedded in arrays. In addition to transposable elements, there could be genes within the human centromere as is the case for the centromeres of fission yeast (Wood et al. 2002) and rice (Nagaki et al. 2004). Sequencing and assembling at least a few centromeres by this targeted approach is worth doing not only for the sake of completing the sequence of an entire chromosome, but also to better elucidate the complex human centromere.

Centromeric vs. pericentromeric chromatin

Sequencing at least one centromere in its entirety will provide a comprehensive chromosome map including pericentromeric and centromeric regions and their junctions to chromosome arms. The next step to characterize the human centromere might involve chromatin immunoprecipitation and extended chromatin fiber experiments that would define DNA sequences present at specific protein domains.

There is cytological evidence for distinct centromeric and pericentromeric domains on human chromosomes. Cohesins Rad21 and SMC1 are found in the area of the human centromere; however antibodies to both proteins lie adjacent to antibodies to centromere proteins such as CENP-B or CREST sera (Gregson et al. 2002; Hoque and Ishikawa 2001). Heterochromatin protein 1 (HP1) has also been localized in the vicinity of the human centromere (Minc et al. 1999). There are three homologous copies of HP1 in the human genome, HP1α, HP1β, and HP1γ (Nicol and Jeppesen 1994; Saunders et al. 1993). HP1α and HP1β localize at or near the centromere in mitosis, whereas HP1γ is dispersed along chromosome arms (Minc et al. 1999; Nicol and Jeppesen 1994). Recently, antibodies to HP1α have been found to lie adjacent to centromere proteins such as CENP-A (Grimes et al. 2004; Sullivan and Karpen 2004).

Chromatin immunoprecipitation experiments (Ando et al. 2002; Vafa and Sullivan 1997) using antibodies to centromere proteins, heterochromatin proteins, and cohesins would precisely map the sequences underlying these protein domains. Similarly, immunostaining stretched chromatin fibers (Blower et al. 2002; Haaf and Ward 1994) with antibodies to these proteins would also resolve these regions. Based on the experiments from Chapter 2, it is likely that centromere proteins will only be associated with higher-order alpha satellite. However, it will be interesting to determine the sequences underlying heterochromatin proteins and cohesins. It may be the case that these proteins colocalize with monomeric alpha satellite and occupy a pericentromeric domain distinct from the centromeric domain (Figure 1-4). Once the sequences involved in heterochromatin formation or sister chromatid cohesion are identified, they could be functionally tested in an artificial chromosome assay. Constructs containing monomeric alpha satellite as well as higher-order alpha satellite may form artificial chromosomes with both centromeric and pericentromeric domains. These resulting artificial chromosomes may segregate better than those derived from higher-order alpha satellite alone (Chapter 4), providing evidence for the functionality of monomeric alpha satellite in a pericentromeric role.

Identifying centromeric DNA sequences in other species

Characterizing the diverse sequences that comprise centromeres among species will help us to better understand the requirements for centromere function.

Centromeric sequences have been identified in yeasts, plants, flies and mice among others (Chapter 1). However, centromeric sequences are not related; thus standard homology searches among different genomes will not reveal the sequences present at centromeres. In contrast to centromeric DNA, centromere proteins are well conserved from yeasts to humans (Chapter 1). Therefore, a better way to identify the centromere in diverse species may be to look for sequences that are associated with centromere/kinetochore proteins.

Chromatin immunoprecipitation experiments using antibodies to CENP-A should identify centromeric DNA sequences in any species (Ando et al. 2002; Vafa et al. 1999; Vafa and Sullivan 1997). CENP-A is well conserved among organisms, and as a variant histone it is likely the primary epigenetic mark of centromere function (see Chapter 1). Rather than sequencing entire genomes and mapping centromeres, CENP-A chromatin immunoprecipitation experiments would quickly isolate centromeric DNA. This strategy has been employed to identify the centromeric sequences in the Indian muntjac (Vafa et al. 1999). Vafa et al. identified a novel 972 bp monomer repeat associated with CENP-A and subsequently showed that it was present on all muntjac chromosomes.

This kind of analysis in multiple species would provide valuable information about centromeres. Perhaps most centromeres are characterized by AT-rich satellite DNA as is the case in flies, plants, mice and primates (Chapter 1). Some of these species might have evolved higher-order structures within satellite arrays in a similar fashion to higher-order alpha satellite. Maybe like mice and humans, some centromeric sequences will include CENP-B recognition sequences that evolved convergently in different centromere sequences.

Understanding the paradigms of centromere sequence organization across multiple species will provide insight into centromere evolution and function.

Why study centromere evolution?

The importance of the centromere is indisputable. The centromere is responsible for segregating chromosomes properly, and in the absence of centromere function, aneuploidy results. Much of this thesis work has focused on the organization and evolution of alpha satellite, the type of repetitive DNA found at primate centromeres. But why should we care about the inner workings of alpha satellite evolution? This question is especially relevant given the daunting task of sequencing an entire human centromere. There are two principal reasons for studying centromere evolution. First of all, investigating centromere evolution will elucidate this locus in the context of overall genome organization. Even more importantly, by dissecting the conserved attributes of centromeric DNA across diverse organisms we will better understand what is required for centromere function.

Based on estimates of alpha satellite array sizes, alpha satellite likely makes up 2-3% of the human genome. However, only about one tenth of the alpha satellite in the genome has been included in the current human genome assembly (Chapter 2). Sequencing an entire centromere will uncover one of the few remaining unknown regions of the human genome. The human centromere contains megabases of higher-order alpha satellite, but what else is there? Could there be expressed genes embedded within arrays of higher-order alpha satellite, as is the case in rice centromeres (Nagaki et al. 2004)? Or might there be particular classes of transposable elements, as in the fly (Sun et al. 1997) and Arabidopsis (Copenhaver et al. 1999) centromeres? The only way to answer these questions and evaluate the genomic organization of the human centromere is to sequence one.

Unlike typical genic sequences, centromeric DNA is not conserved among diverse organisms (Malik and Henikoff 2002; Willard 1998). However, there could be other, less obvious, characteristics of centromeric DNA that are conserved. Investigating the commonalities among diverse centromeric DNA organizations across multiple species will define the requirements for centromere function. The work described in this thesis has focused on the organization of human centromeres and the type of alpha satellite involved in human centromere function. Nonetheless, future studies to dissect the functional requirements of centromeres should continue to move beyond primates. Identifying the centromeric DNA in multiple species by chromatin immunoprecipitation as proposed above will provide valuable data on centromere organization.

The centromeres studied to date are typically AT-rich and repetitive (Chapter 1). Flies, plants, mice and primates have all evolved different types of satellite DNA at their centromeres. Perhaps repetitive satellite DNA takes on a secondary structure required for setting up centromeric chromatin. The similarity in satellite repeat unit length among centromeres from diverse organisms such as Arabidopsis (180 bp), rice (155 bp), mice (120 bp) and primates (171 bp) is intriguing. It is possible that satellite DNA is capable of assembling a centromere-specific positioning, and that this structure is more conserved than specific DNA sequences (Ikeno et al. 1994; Yoda et al. 1998).

Or perhaps there is a subtle sequence requirement for centromeric DNA, such as dyad symmetry, as described in yeast and human centromeres as well as some repeats found at human neocentromeres (Koch 2000).

In studying the evolution of centromeres, we hope to identify the common paradigms of centromere structure across diverse species. It will then be possible to determine what is truly required for centromere function.

Appendix

Sequence Organization and Functional Annotation of Human Centromeres

M. Katharine Rudd, Mary G. Schueler and Huntington F. Willard

Note: This chapter is based on a manuscript that was published as a review article in the Cold Spring Harbor Symposia on Quantitative Biology (2004, 68: 141-149) and has been reformatted for this document.

With the near completion of the archival human genome sequence, attention has turned to defining the attributes of the sequence that are responsible for its many functions, including determination of gene content, functional identification of both short- and long-range regulatory sequences, and characterization of structural elements of the genome responsible for maintaining genome stability and integrity (Collins et al. 2003). In addition, the task remains to close the several hundred gaps that disrupt the landscape of an otherwise contiguous sequence for each of the 24 human chromosomes. The largest of these gaps correspond to regions of known or presumed heterochromatin at or near the centromeres of each chromosome, regions that were largely excluded from both the public (Lander et al. 2001) and private (Venter et al. 2001) sequencing efforts due in part to their highly repetitive DNA content. While there are several large regions of non-centromeric heterochromatin in the human karyotype (Trask 2002), the repetitive DNA associated with the primary constriction of each chromosome has been strongly implicated in centromere function (Lamb and Birchler 2003; Willard 1998) and thus remains to be fully characterized at the genomic and functional levels.

The centromere is essential for normal segregation of chromosomes in both mitotic and meiotic cells. Paradoxically, while this role is conserved throughout eukaryotic evolution, the sequences that accomplish centromere function in different organisms are not (Henikoff et al. 2001). In part reflecting the different perspectives brought to chromosome structure and function by cell biologists on the one hand and by geneticists (and now genomicists) on the other, two fundamentally different approaches have contributed to the current understanding of centromere biology. The top-down, extrinsic approach, classically adopted by cytologists and cell biologists, has focused predominantly on the functions of the kinetochore, the proteinaceous complex that assembles at the primary constriction and mediates both attachment of chromosomes to microtubules and movement of chromosomes along the mitotic and meiotic spindles (Rieder and Salmon 1998; Tanaka 2002). This approach has been highly successful in identifying a range of proteins that are critical for establishing the specialized chromatin that marks the position of the centromere on complex eukaryotic chromosomes and directs assembly of the kinetochore complex (Cleveland et al. 2003; Sullivan et al. 2001).

The complementary bottom-up approach, with its fundamentally more intrinsic focus, seeks to explore the underlying genomic basis for centromere function and to define the cis-acting DNA sequences that must, at some level, specify where a centromere forms (Nicklas 1971; Willard 1998). Influenced by the success of elegant experiments in budding yeast that defined a minimal ~120 bp sequence that directed centromere function (Clarke 1998), the large blocks of heterochromatin that dominate the pericentric regions of human (and other mammalian) genomes were initially discounted as serious candidates for anything remotely functional. However, both physical and genetic mapping studies over the past 15 years have given credibility to the possibility that the satellite DNA sequences themselves do indeed play a role in the structural integrity of the centromere and/or directly in specifying kinetochore assembly. In many complex eukaryotic genomes, including plants (Copenhaver et al. 1999; Dong et al. 1998; Nagaki et al. 2003; Zhong et al. 2002), fission yeast (Kniola et al. 2001), Drosophila (Sun et al. 2003), as well as human (Lander et al. 2001; Venter et al. 2001) and mouse (Waterston et al. 2002), a variety of repetitive DNAs have now been characterized genomically to varying degrees at or near the centromere. In an increasing number of these cases, direct functional studies now support a role for repetitive sequences in centromere specification and function (reviewed in (Cleveland et al. 2003)). In the human genome, both in-depth studies of centromeric chromatin (Ando et al. 2002; Blower et al. 2002; Vafa and Sullivan 1997) and the formation of human artificial chromosomes with de novo centromeres (Harrington et al. 1997; Ikeno et al. 1998) provide direct evidence of a functional role for centromere-associated satellite DNA.

The functional importance of centromeric sequences notwithstanding, these regions of the human genome remain poorly understood at the level of sequence assembly and annotation. Indeed, most reported contigs of pericentromeric regions in the human genome terminate at clones containing just a few heterogeneous satellite DNA repeats without reaching the extensive homogeneous arrays that characterize all normal human centromeres (Schueler et al. 2001). The exact role of primary DNA sequence in centromere function continues to be a matter of some debate, and likely both genomic and epigenetic factors are involved (Choo 2001; Cleveland et al. 2003; Sullivan et al. 2001). With growing interest in the elements of a functioning centromere, complete sequence assemblies of human centromeres, combined with their functional annotation, will be an essential resource.

Genomic organization of the alpha satellite family

Originally identified in the human genome over 25 years ago (Manuelidis 1976; Manuelidis and Wu 1978), alpha satellite DNA, defined by a diverged 171 bp motif repeated in a tandem head-to-tail fashion, has been identified at the centromeres of all normal human chromosomes studied to date (Alexandrov et al. 2001; Willard and Waye 1987b). Human chromosomes (as well as those characterized in great apes) contain alpha satellite organized hierarchically into multimeric, higher-order repeat arrays in which a defined number of monomers have been homogenized as a unit (Figure A-1). On at least the majority of human chromosomes, these large arrays span several megabases of DNA (Wevrick and Willard 1989) and are coincident with the centromere as defined by genetic mapping (Laurent et al. 2003; Mahtani and Willard 1998) and with the cytogenetically visible primary constriction and site of a number of centromere and kinetochore proteins that have been implicated in centromere function (Blower et al. 2002; Spence et al. 2002; Tyler-Smith and Willard 1993).

Figure A-1. Alpha satellite organization in the human genome. (A) The centromere region of human chromosomes is composed of megabases of repetitive alpha satellite DNA organized largely in tandem arrays that localize to the primary constriction. (B) Monomeric alpha satellite lacks a higher-order periodicity and individual monomers are ~65-80% identical in sequence. Each arrow represents a ~171 bp monomer. A dotter plot (Sonnhammer and Durbin 1995) (approximately 95-99% stringency over 100 bp) of 5 kb of monomeric alpha satellite from the chromosome 17 pericentromeric region is shown. High identity is seen only along the diagonal, indicating a lack of close sequence relationships among the approximately 30 monomers whose sequence is illustrated. (C) Higher-order arrays of alpha satellite consist of tandem multimeric units (shown as five monomer units in the schematic) that are nearly identical in sequence. D17Z1-B alpha satellite is based on a 2.4 kb higher-order repeat unit, and higher-order repeats (consisting of 14 adjacent monomers each) are ~98-99% identical to each other (see text). This multimeric, higher-order relationship is apparent from the dotter plot as diagonals at 2.4 kb intervals parallel to the self-diagonal.
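The self-comparison logic behind such a dot plot can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Dotter program itself: the function names (`window_identity`, `dot_plot`, `off_diagonal_offsets`), the 10 bp window, and the toy 20 bp repeat unit are all hypothetical choices made for brevity, whereas the figure used approximately 95-99% stringency over 100 bp on real alpha satellite sequence.

```python
def window_identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def dot_plot(seq, window=10, threshold=0.95):
    """Self-comparison: return the set of (i, j) window start positions
    whose windows are at least `threshold` identical."""
    n = len(seq) - window + 1
    dots = set()
    for i in range(n):
        wi = seq[i:i + window]
        for j in range(n):
            if window_identity(wi, seq[j:j + window]) >= threshold:
                dots.add((i, j))
    return dots


def off_diagonal_offsets(dots):
    """Distances (j - i > 0) from the self-diagonal at which dots occur."""
    return sorted({j - i for i, j in dots if j > i})


# Synthetic 20 bp unit (an assumption, not real alpha satellite).
unit = "ACGTTGCAAGCTTAGGCATC"
tandem = unit * 3          # toy stand-in for a homogeneous tandem array

single_offsets = off_diagonal_offsets(dot_plot(unit))
tandem_offsets = off_diagonal_offsets(dot_plot(tandem))
```

In this toy case a single copy of the unit produces dots only on the self-diagonal, while the tandem array produces parallel diagonals at multiples of the unit length, which is the same signature the caption describes at 2.4 kb intervals for D17Z1-B.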

In addition to these highly homogeneous arrays of multimeric alpha satellite, many centromeric regions also contain a variable amount of heterogeneous alpha satellite monomers that, unlike the higher-order repeats, fail to show any evidence of hierarchical organization (Figure A-1). Where examined at the level of genome maps, these stretches of monomeric alpha satellite flank the higher-order arrays and have been, on some chromosomes, linked on sequence contigs with the euchromatin of the chromosome arms (Horvath et al. 2001). Only on the X chromosome (see below) has a sequence contig successfully bridged from a chromosome arm, through monomeric alpha satellite, to the higher-order repeats of the functionally defined centromere (Schueler et al. 2001). However, as the precise functional determinants of the centromere on different chromosomes remain undefined and as different classes of alpha satellite DNA (and other pericentromeric satellite DNAs) show substantial sequence heterogeneity both inter- and intrachromosomally (Alexandrov et al. 2001; Lee et al. 1997), it becomes important to complete sequence contigs of each chromosome arm/centromere junction and to functionally annotate the different types of sequence element present throughout pericentromeric regions.

The centromeres of chromosomes X and 17

As models to explore the feasibility of detailed functional and genomic mapping of human centromeres, we initially selected chromosomes 17 and X (Waye and Willard 1986; Willard et al. 1983). Alpha satellite from both chromosomes has been characterized extensively; each belongs to the pentameric subfamily of human alpha satellite in which the homogeneous higher-order repeats (12- and 16-monomers long for the DXZ1 and D17Z1 arrays on the X and 17, respectively) are based on an underlying five-monomer unit. Both arrays span approximately 2-4 Mb on most copies of these chromosomes in the human population, and detailed long-range restriction maps have been determined (Mahtani and Willard 1998; Wevrick et al. 1990). Further, study of a variety of both naturally occurring and engineered human chromosome abnormalities had demonstrated correspondence between the DXZ1 and D17Z1 loci and several kinetochore proteins associated specifically with functionally active centromeres (Higgins et al. 1999; Lee et al. 2000; Mills et al. 1999; Sullivan and Willard 1998; Wevrick et al. 1990). Notwithstanding this background of information and resources, the centromeric region of these two chromosomes remained poorly represented. While the available sequence contigs of both the X chromosome and chromosome 17 are marked by a number of gaps, the largest and most notable of these lie at the centromeres (Figure A-2).

Figure A-2. Gaps in the public genome assembly of chromosomes X and 17 (April 2003 freeze http://genome.ucsc.edu/). The chromosome is shown at the top of the figure and the centromere region is indicated in red. Gaps in the sequence assembly are illustrated as vertical lines.

We focused our initial efforts on the short arm of the X chromosome (Xp), as both the public and private draft sequence assemblies terminated in clones that contained small amounts of monomeric alpha satellite (Schueler et al. 2001). As reported by Schueler et al. (2001), the final clone assembly, based on both in silico strategies and screening of multiple large-clone libraries, spanned an abrupt transition (the ‘satellite junction’, Figure A-3) from the euchromatin of proximal Xp to the first satellite sequences, encountered less than 150 kb from the most proximal fully annotated gene, ZXDA, on Xp (Schueler et al. 2001). While the most distal portion of the pericentromeric heterochromatin on Xp contains representatives of several different satellite DNA families (alpha satellite, gamma satellite, and a 35 bp satellite; see Figure A-3), it transitions proximally through ~200 kb of alpha satellite until the contig reaches a junction (the ‘array junction’, Figure A-3) with members of the higher-order repeat array DXZ1. Notably, repeats from the DXZ1 locus have been functionally annotated as the centromere on the X by deletion mapping using chromosome variants (Higgins et al. 1999; Lee et al. 2000; Schueler et al. 2001; Spence et al. 2002), by formation of de novo centromeres in human artificial chromosome assays (see below and (Rudd et al. 2003; Schueler et al. 2001)) and by mapping a domain of topoisomerase II activity associated with centromere function within the DXZ1 array near the Xp array junction (Spence et al. 2002). Thus, these studies provide a proof-of-principle for assembly and functional annotation of a human centromere (Henikoff 2002).

Figure A-3. Repeat content of the junction between the short arm euchromatin and centromere of the X chromosome. Each line and color illustrates a different type of repeat family present in the most proximal 1 Mb on Xp. The ‘satellite junction’ indicates an abrupt transition from the euchromatin of Xp and the first pericentromeric satellite sequences. At least three types of satellite are present here: monomeric alpha satellite, gamma satellite and a 35 bp satellite (HSAT4). The most proximal annotated genes on Xp (ZXDA and ZXDB) are shown, with the direction of transcription indicated by the arrows. The ‘array junction’ indicates the transition between monomeric alpha satellite and DXZ1 higher-order repeats, which extend a further ~3 Mb at the centromere. (Based on (Schueler et al. 2001))

To extend these studies and to determine how general the strategy employed on Xp might be for examining other centromeres, we selected chromosome 17 and the D17Z1 locus, which previous studies had demonstrated had the functional attributes of the centromere (Haaf et al. 1992; Harrington et al. 1997; Wevrick et al. 1990). In the course of developing a contig between D17Z1 and the short arm of chromosome 17 (17p) (Chapter 3), we discovered a novel type of higher-order repeat, D17Z1-B, that adds to the complexity of this region (Figure A-4A). Most centromere regions that have been examined in detail have revealed a single, highly homogeneous higher-order repeat array, typically several megabases in length. Studies of the DXZ1 array have both strengthened this concept (Schindelhauer and Schwarz 2002; Schueler et al. 2001) and extended it, with the description of a set of localized, diverged copies of the DXZ1 repeat at the junction between DXZ1 and monomeric alpha satellite (Schueler et al. 2001). In contrast, a few chromosomes (such as chromosome 7) are characterized by two, otherwise unrelated, higher-order repeat arrays (Wevrick and Willard 1991). Chromosome 17 appears to represent a third type of centromere organization in that there are two physically distinct higher-order repeat arrays that are clearly related to each other evolutionarily.

Figure A-4. Organization of D17Z1 and D17Z1-B higher-order repeats at the centromere of chromosome 17. (A) The array of D17Z1 repeats spans ~2.5 Mb and lies adjacent to D17Z1-B, estimated to be ~500 kb. (B) Monomer organization of D17Z1 (16-mer) and D17Z1-B (14-mer). The two higher-order repeats share a similar monomer arrangement and are 92% identical to one another. (C) Chromosomal orientation of D17Z1 and D17Z1-B, determined by fluorescence in situ hybridization. D17Z1-B (red) hybridizes to the 17p side of D17Z1 (green). Chromosomes are counter-stained with DAPI in blue.

Whereas D17Z1 comprises 16 monomers that make up a 2.7 kb higher-order repeat (Waye and Willard 1986), D17Z1-B is 2.4 kb long and made of 14 monomers. D17Z1 and D17Z1-B are both made up of monomers arranged in a pentameric fashion with corresponding monomers in the same order; however, D17Z1-B is missing two monomers present in D17Z1 (Figure A-4B), likely reflecting a deletion or duplication event mediated by unequal crossing-over. We identified two overlapping clones in this contig that contain D17Z1-B alpha satellite; BAC RPCI-11 285M22 has been completely sequenced and BAC RPCI-11 18L18 is currently partially sequenced (low-pass sequence sampling). BAC 285M22 bridges the junction between the D17Z1-B array and monomeric alpha satellite; fluorescence in situ hybridization studies demonstrated that this junction lies to the short arm side of D17Z1 (Figure A-4A,C). When we compared the higher-order repeats in 285M22 and 18L18 (n=7), all were 98-99% identical, indicating that they are part of a highly homogeneous array of alpha satellite. Notably, however, while D17Z1 and D17Z1-B share a similar multimeric structure, they are only 92% identical, demonstrating that they are distinct yet related higher-order repeats. (Such a relationship differs from a number of variants of D17Z1 that, while distinctive in their higher-order structure and their genomic localization within the large D17Z1 locus, are not distinguished by levels of overall sequence relatedness (Warburton and Willard 1990; Warburton and Willard 1995).)
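The repeat-unit sizes quoted above follow directly from the monomer counts, as the short Python sketch below illustrates. It is a rough calculation only, assuming every monomer is exactly the nominal 171 bp (real monomers vary slightly in length) and using the approximate array sizes given in Figure A-4; the function names are hypothetical.

```python
MONOMER_BP = 171  # nominal alpha satellite monomer length cited in the text


def hor_length_bp(n_monomers, monomer_bp=MONOMER_BP):
    """Approximate length of a higher-order repeat built from n monomers."""
    return n_monomers * monomer_bp


def approx_copies(array_bp, n_monomers):
    """Rough number of higher-order repeat copies in an array of a given size."""
    return array_bp / hor_length_bp(n_monomers)


d17z1_bp = hor_length_bp(16)     # 16-mer, described as ~2.7 kb
d17z1_b_bp = hor_length_bp(14)   # 14-mer, described as ~2.4 kb

# Approximate array sizes from Figure A-4: ~2.5 Mb and ~500 kb.
d17z1_copies = approx_copies(2_500_000, 16)
d17z1_b_copies = approx_copies(500_000, 14)
```

Under these assumptions the 16-mer and 14-mer come to 2,736 bp and 2,394 bp, consistent with the reported 2.7 kb and 2.4 kb units, and the two arrays would hold on the order of 900 and 200 repeat copies, respectively.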

While a contig extending across the full D17Z1 and D17Z1-B arrays remains to be achieved, we have established a complete contig linking D17Z1-B to the euchromatin of 17p (Chapter 3), essentially using the strategy developed as part of our Xp work (Schueler et al. 2001). The genomic content of this part of chromosome 17 is, however, somewhat more complex than that of the X, in that there are three different blocks of monomeric alpha satellite located within ~500 kb of D17Z1-B, separated by regions of genomic sequence populated by a number of different repeat families (both satellite and non-satellite) and at least some transcribed elements. A similar picture appears to be emerging on the long arm (17q) side of the centromere, although a contig containing stretches of monomeric alpha satellite and proximal 17q sequences has not yet been linked to the large D17Z1 array at the centromere (Chapter 3).

The coexistence within pericentromeric regions of both higher-order repeat arrays of alpha satellite and more limited stretches of monomeric alpha satellite without any detectable higher-order structure appears to be a consistent theme of a number of human chromosomes (Horvath et al. 2001). While the organizational distinction between these types of alpha satellite (i.e. multimeric vs. monomeric) is apparent at short-range with standard homology-finding programs (e.g. Figure A-1B,C), it is also important to address the phylogenetic sequence relationships among the individual monomers populating these different classes of the repeat family.

To illustrate this point, we examined approximately 100 monomers of monomeric alpha satellite from each of the Xp and 17q contigs, together with 28 monomers making up the DXZ1 and D17Z1 higher-order repeats. As seen in Figure A-5, the sequences fall into two distinct phylogenetic clades, corresponding precisely to their monomeric or multimeric origin. Within the multimeric clade, the DXZ1 and D17Z1 monomers exhibit close sequence relatedness, as established previously (Waye and Willard 1986). Notably, the monomeric clade includes monomers from both chromosomes (Figure A-5); in other words, clusters of monomeric alpha satellite from one chromosome are more closely related to monomeric alpha satellite from the other chromosome than they are to the multimeric alpha satellite that maps only a few hundred kilobases away on the same chromosome.

Figure A-5. Phylogenetic analysis of 230 alpha satellite monomers from the X chromosome and chromosome 17. Monomers from DXZ1 (light blue), D17Z1 (pink), monomeric X alpha satellite (blue, from (Schueler et al. 2001)), and monomeric 17 alpha satellite (red, from chapter 3) were analyzed using neighbor-joining methods (http://www.megasoftware.net/). One alpha satellite monomer from African green monkey (green) was used as an outgroup. 1000 bootstrap replicates were performed, and bootstrap values for the well-supported monomeric and higher-order nodes are shown in bold type.
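Distance-based tree-building methods such as neighbor-joining start from a matrix of pairwise distances among aligned sequences. The Python sketch below shows one simple way such distances can be computed (the proportion of differing sites, or p-distance). It is an illustration only and not the analysis performed here: the function names are hypothetical, the four short sequences are synthetic placeholders rather than real monomers, and the actual analysis may have used a different distance model.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_table(named_seqs):
    """Pairwise p-distances for every unordered pair of named sequences."""
    return {
        (n1, n2): p_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(named_seqs.items(), 2)
    }


# Synthetic, pre-aligned toy sequences (assumed for illustration only).
toy = {
    "hor_X": "ACGTACGTAC",
    "hor_17": "ACGTACGTAT",
    "mono_X": "ACGAAC-TTC",
    "mono_17": "ACGAACGTTC",
}
table = distance_table(toy)
```

A table like this, computed over all monomers, is the kind of input from which a neighbor-joining tree is built; small within-class distances and larger between-class distances would produce the two-clade pattern described above.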

These relationships presumably reflect the highly efficient homogenization mechanisms that drive and maintain the high degree of sequence homogeneity within higher-order repeat arrays, even across several megabases of genomic DNA (Durfy and Willard 1989; Schindelhauer and Schwarz 2002; Schueler et al. 2001; Willard and Waye 1987a). However, the phylogenetic data argue that homogenization mechanisms between monomeric and multimeric alpha satellite are inefficient or nonexistent, even over much shorter genomic distances. Plausibly, this could reflect the inhibitory effect of local discontinuities (i.e. the introduction of non-alpha satellite sequences interrupting an otherwise continuous array of tandem repeats) on homogenization mechanisms. Alternatively (or in addition), it may indicate that the initial establishment of a multimeric repeat (presumably by unequal crossing-over mediated by pairing of monomers out of register) is relatively inefficient and represents the rate-limiting step evolutionarily in generation of a highly homogeneous array (Smith 1976). This inefficiency may be a by-product of the average sequence divergence between any two monomers within monomeric alpha satellite; even neighboring monomers differ in sequence by ~20-35% (i.e. are only ~65-80% identical), presumably well below the levels of identity usually associated with unequal pairing and recombination (e.g. (Stankiewicz and Lupski 2002)). In contrast, even the very first two multimeric repeats (no matter how unlikely their formation in the first place) are virtually 100% identical in sequence, providing a highly efficient substrate for additional rounds of unequal crossing-over and thus driving formation of homogeneous higher-order repeat arrays (Warburton and Willard 1996).

The relationships among classes of alpha satellite both within a given centromeric region and between different chromosomes across the genome provide illustrative examples of a range of evolutionary mechanisms that are almost certainly at play elsewhere in the genome as well. The potential consequences of unequal crossing-over, sequence conversion, and sequence homogenization mechanisms for both human disease and genome evolution have been amply demonstrated. Notable examples include the clusters of highly homogeneous, low-copy repeats associated with genomic rearrangements (Inoue and Lupski 2002; Stankiewicz and Lupski 2002), the high level of intra- and interchromosomal duplications in the human genome (Bailey et al. 2002; Eichler and Sankoff 2003), and the unusually high level of palindromic sequences on the human Y chromosome associated with recurrent sequence conversion (Rozen et al. 2003; Skaletsky et al. 2003).

Functional genome annotation with human artificial chromosomes

To fully understand the nature of sequences associated with the centromeric regions of the human genome requires not only complete sequence annotation, but also detailed functional annotation in terms of the impact different sequence elements have on chromosome structure and function. While some sequences in the vicinity of centromeres may indeed be without demonstrable function (and likely no different in that respect from the bulk of human genome sequences), others are plausible candidates for defining boundaries between euchromatin and heterochromatin, for establishing chromosomal regions with characteristic levels of gene expression, for influencing (i.e. blocking or mediating) potential position effects on gene function, for anchoring chromosomes within preferred nuclear territories, and for contributing to sister chromatid cohesion during cell division, in addition to being responsible for centromere specification and function.

As a step towards establishing an experimental approach suitable for functional genome annotation, we and others developed an assay based on formation of human artificial chromosomes (Harrington et al. 1997; Ikeno et al. 1998), building on the success and impact of yeast artificial chromosome technology for understanding the function of components of the budding yeast genome (Murray and Szostak 1983). The development of an efficient and tractable human artificial chromosome system would involve assembly of required chromosomal elements (centromere, telomeres and origins of DNA replication), together with genomic fragments whose genic or other functions one wished to examine (Larin and Mejia 2002). Progress towards this goal has been made, and several different approaches have been used or are under development, based on co-transfection of candidate genomic sequences (Grimes et al. 2001), ligation of synthetic centromere and telomere components (Harrington et al. 1997), or modification of human sequences isolated in yeast (Henning et al. 1999; Kouprina et al. 2003) and bacterial artificial chromosome constructs (Ebersole et al. 2000; Grimes et al. 2002; Mejia et al. 2002; Rudd et al. 2003; Schueler et al. 2001). While much of the early focus of this technology has been on optimization of de novo centromere formation (Harrington et al. 1997; Ohzeki et al. 2002; Rudd et al. 2003), proof-of-principle experiments have shown that genes contained in large fragments of chromosomal DNA from the human genome can also be expressed and thus are amenable to study using such assays (Grimes et al. 2001; Ikeno et al. 1998; Mejia et al. 2001).

The most straightforward assay is illustrated in Figure A-6. In this approach, BACs containing ~30-100 kb of alpha satellite are modified to introduce a drug-resistance marker for selection in mammalian cells and, if desired, additional fragments from the human genome for functional testing. The BAC is then transfected (or microinjected) into cells in culture and the resulting drug-resistant colonies are screened for the presence of a cytogenetically visible human artificial chromosome (Figure A-6). In ~5-50% of colonies (depending in part on the particular alpha satellite sequences used), a mitotically stable human artificial chromosome is detected in a high proportion of cells, containing both vector and input sequences and co-localizing with kinetochore proteins detected by indirect immunofluorescence. Importantly, a number of non-alpha satellite and non-centromeric control genomic fragments are incapable of de novo centromere formation using this assay (Ebersole et al. 2000; Grimes et al. 2002), indicating that the assay is specific for functional centromeric sequences.

Figure A-6. Functional centromere annotation using a human artificial chromosome assay (Grimes et al. 2002; Rudd et al. 2003). Alpha satellite DNA hypothesized to play a role in centromere function (purple arrows) is cloned into a BAC vector containing a drug resistance gene (‘R’) and transfected into human tissue culture cells. Drug-resistant clones are screened for the presence of an artificial chromosome. The artificial chromosome can be identified by hybridization to a red alpha satellite probe. Like normal chromosomes, artificial chromosomes bind antibodies to centromere and kinetochore proteins (green).

Using such an assay, we have demonstrated that both DXZ1 and D17Z1 sequences are capable of generating de novo centromeres in human artificial chromosomes (Grimes et al. 2002; Harrington et al. 1997; Rudd et al. 2003; Schueler et al. 2001). This provides functional annotation for at least part of the pericentromeric contigs described earlier. However, it should be emphasized that the ability of sequences adjacent to these higher-order repeat arrays to function as centromeres in this assay has not as yet been evaluated. Further, the alpha satellite sequences alone do not completely recapitulate mitotic centromere function, as human artificial chromosomes show a level of chromosome nondisjunction and anaphase lag that is significantly higher than that of intact, endogenous centromeres (Rudd et al. 2003). This suggests that other sequences in these centromeric contigs may be necessary for at least some aspect of faithful chromosome segregation and stability. Thus, to completely annotate these regions will require testing both monomeric alpha satellite and the D17Z1-B sequences, using both the human artificial chromosome assay and extended chromatin studies to identify which DNA is involved in the assembly of the specialized chromatin that underlies centromere function (Blower et al. 2002; Cleveland et al. 2003).

Conclusions

Current studies of the genomic organization of the centromeric regions of human chromosomes reveal several features, despite the large gaps that are apparent in the current genome sequence assemblies (e.g. Figure A-2). The most proximal contigs on several chromosome arms in the genome have reached alpha satellite DNA, including those on chromosomes 7 (Hillier et al. 2003; Scherer et al. 2003), 16 (Horvath et al. 2000), 21 (Brun et al. 2003), 22 (Dunham et al. 1999), and the Y chromosome (Skaletsky et al. 2003), in addition to our work on the X chromosome and chromosome 17, as summarized here. Other fully sequenced contigs (, (Guy et al. 2003)) terminate in other types of satellite DNA, short of connecting to alpha satellite at the centromere.

However, only the contigs on Xp and 17p span from euchromatin of the chromosome arm to higher-order alpha satellite repeat arrays that have been annotated functionally with centromere assays (Figure A-7). Others (like the chromosome 21q junction illustrated in Figure A-7) terminate in monomeric alpha satellite, but are separated by a gap of undetermined size from the higher-order sequences of the functional centromere. Thus, chromosome arm/centromere junctions remain important goals for future research, requiring a combination of directed efforts to extend existing contigs and suitable functional assays to provide validation and functional annotation.

As we move towards an understanding of the organization, function and evolution of the human genome, any claims of a “complete” sequence will need to include full analysis of the pericentromeric and other heterochromatic regions of our chromosomes. The data presented here and elsewhere (Schueler et al. 2001) suggest that, notwithstanding their repetitive content, the satellite-containing centromeric regions of human chromosomes can be mapped, sequenced, assembled and annotated functionally. Complete assembly of centromere contigs should, therefore, be feasible and will provide an important source of genomic and functional data for studies of chromosome biology, as well as genome evolution.

Figure A-7. Genome assembly of the centromeric regions of the X chromosome, chromosome 17 and 21. Both the X and 17 centromeres have contiguous sequence on the short arm sides, connecting euchromatin to monomeric alpha satellite (green) to higher-order repeat alpha satellite (red). Orange indicates other satellite sequences. A chromosome 21q contig has reached monomeric alpha satellite but has not connected to higher-order alpha satellite (gap indicated by question marks). For these three human chromosomes, the centromere activity of their respective higher-order repeat alpha satellites has been functionally annotated using a human artificial chromosome assay (Harrington et al. 1997; Ikeno et al. 1998; Schueler et al. 2001).

Scientific arguments aside, there is also a strong historical and philosophical imperative for including centromeres in the final stages of gap closure in the archival, truly complete sequence of the genome of Homo sapiens. After all, which part of the Rosetta Stone would one choose to omit?

Acknowledgements

We thank Evan Eichler, Jeff Bailey, Devin Locke, and Eric Green for helpful discussions and assistance. Work in the authors’ lab has been supported by research grants from the National Institutes of Health and the March of Dimes Birth Defects Foundation.

BIBLIOGRAPHY

Ahmad, K. and S. Henikoff. 2002. Histone H3 variants specify modes of chromatin assembly. Proc Natl Acad Sci U S A 99 Suppl 4: 16477-16484.

Alexandrov, I., A. Kazakov, I. Tumeneva, V. Shepelev, and Y. Yurov. 2001. Alpha-satellite DNA of primates: old and new families. Chromosoma 110: 253-266.

Alexandrov, I.A., T.D. Mashkova, T.A. Akopian, L.I. Medvedev, L.L. Kisselev, S.P. Mitkevich, and Y.B. Yurov. 1991. Chromosome-specific alpha satellites: two distinct families on human . Genomics 11: 15-23.

Alexandrov, I.A., L.I. Medvedev, T.D. Mashkova, L.L. Kisselev, L.Y. Romanova, and Y.B. Yurov. 1993. Definition of a new alpha satellite suprachromosomal family characterized by monomeric organization. Nucleic Acids Res 21: 2209-2215.

Alexandrov, I.A., S.P. Mitkevich, and Y.B. Yurov. 1988. The phylogeny of human chromosome specific alpha satellites. Chromosoma 96: 443-453.

Allshire, R.C., E.R. Nimmo, K. Ekwall, J.P. Javerzat, and G. Cranston. 1995. Mutations derepressing silent centromeric domains in fission yeast disrupt chromosome segregation. Genes Dev 9: 218-233.

Alonso, A., R. Mahmood, S. Li, F. Cheung, K. Yoda, and P.E. Warburton. 2003. Genomic microarray analysis reveals distinct locations for the CENP-A binding domains in three human chromosome 13q32 neocentromeres. Hum Mol Genet 12: 2711-2721.

Alves, G., H.N. Seuanez, and T. Fanning. 1994. Alpha satellite DNA in neotropical primates (Platyrrhini). Chromosoma 103: 262-267.

Ando, S., H. Yang, N. Nozaki, T. Okazaki, and K. Yoda. 2002. CENP-A, -B, and -C chromatin complex that contains the I-type alpha-satellite array constitutes the prekinetochore in HeLa cells. Mol Cell Biol 22: 2229-2241.

Bailey, J.A., Z. Gu, R.A. Clark, K. Reinert, R.V. Samonte, S. Schwartz, M.D. Adams, E.W. Myers, P.W. Li, and E.E. Eichler. 2002. Recent segmental duplications in the human genome. Science 297: 1003-1007.

Bailey, J.A., A.M. Yavor, H.F. Massa, B.J. Trask, and E.E. Eichler. 2001. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res 11: 1005-1017.

Baldini, A., D.A. Miller, O.J. Miller, O.A. Ryder, and A.R. Mitchell. 1991. A chimpanzee-derived chromosome-specific alpha satellite DNA sequence conserved between chimpanzee and human. Chromosoma 100: 156-161.

Baldini, A., T. Ried, V. Shridhar, K. Ogura, L. D'Aiuto, M. Rocchi, and D.C. Ward. 1993. An alphoid DNA sequence conserved in all human and great ape chromosomes: evidence for ancient centromeric sequences at human chromosomal regions 2q21 and 9q13. Hum Genet 90: 577-583.

Baldini, A., D.I. Smith, M. Rocchi, O.J. Miller, and D.A. Miller. 1989. A human alphoid DNA clone from the EcoRI dimeric family: genomic and internal organization and chromosomal assignment. Genomics 5: 822-828.

Bannister, A.J., P. Zegerman, J.F. Partridge, E.A. Miska, J.O. Thomas, R.C. Allshire, and T. Kouzarides. 2001. Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain. Nature 410: 120-124.

Barry, A.E., E.V. Howman, M.R. Cancilla, R. Saffery, and K.H. Choo. 1999. Sequence analysis of an 80 kb human neocentromere. Hum Mol Genet 8: 217-227.

Baum, M. and L. Clarke. 2000. Fission yeast homologs of human CENP-B have redundant functions affecting cell growth and chromosome segregation. Mol Cell Biol 20: 2852-2864.

Baum, M., V.K. Ngan, and L. Clarke. 1994. The centromeric K-type repeat and the central core are together sufficient to establish a functional Schizosaccharomyces pombe centromere. Mol Biol Cell 5: 747-761.

Beach, D., M. Piper, and S. Shall. 1980. Isolation of chromosomal origins of replication in yeast. Nature 284: 185-187.

Bernard, P., J.F. Maure, J.F. Partridge, S. Genier, J.P. Javerzat, and R.C. Allshire. 2001. Requirement of heterochromatin for cohesion at centromeres. Science 294: 2539-2542.

Bloom, K.S. and J. Carbon. 1982. Yeast centromere DNA is in a unique and highly ordered structure in chromosomes and small circular minichromosomes. Cell 29: 305-317.

Blower, M.D. and G.H. Karpen. 2001. The role of Drosophila CID in kinetochore formation, cell-cycle progression and heterochromatin interactions. Nat Cell Biol 3: 730-739.

Blower, M.D., B.A. Sullivan, and G.H. Karpen. 2002. Conserved organization of centromeric chromatin in flies and humans. Dev Cell 2: 319-330.

Broccoli, D., O.J. Miller, and D.A. Miller. 1990. Relationship of minor satellite to centromere activity. Cytogenet. Cell Genet. 54: 182-186.

Brown, D.D., P.C. Wensink, and E. Jordan. 1972. A comparison of the ribosomal DNA's of Xenopus laevis and Xenopus mulleri: the evolution of tandem genes. J Mol Biol 63: 57-73.

Brown, M.T., L. Goetsch, and L.H. Hartwell. 1993. MIF2 is required for mitotic spindle integrity during anaphase spindle elongation in Saccharomyces cerevisiae. J Cell Biol 123: 387-403.

Brun, M.E., M. Ruault, M. Ventura, G. Roizes, and A. De Sario. 2003. Juxtacentromeric region of human chromosome 21: a boundary between centromeric heterochromatin and euchromatic chromosome arms. Gene 312: 41-50.

Buchwitz, B.J., K. Ahmad, L.L. Moore, M.B. Roth, and S. Henikoff. 1999. A histone-H3-like protein in C. elegans. Nature 401: 547-548.

Bunch, R.T., D.A. Gewirtz, and L.F. Povirk. 1995. Ionizing radiation-induced DNA strand breakage and rejoining in specific genomic regions as determined by an alkaline unwinding/Southern blotting method. Int J Radiat Biol 68: 553-562.

Charlesworth, B., P. Sniegowski, and W. Stephan. 1994. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371: 215-220.

Cheng, Z., F. Dong, T. Langdon, S. Ouyang, C.R. Buell, M. Gu, F.R. Blattner, and J. Jiang. 2002. Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell 14: 1691-1704.

Choo, A.K.H. 2000. Centromerization. Trends Cell Biol 10: 182-188.

Choo, K.H. 2001. Domain organization at the centromere and neocentromere. Dev Cell 1: 165-177.

Choo, K.H., E. Earle, B. Vissel, and R.G. Filby. 1990. Identification of two distinct subfamilies of alpha satellite DNA that are highly specific for human . Genomics 7: 143-151.

Choo, K.H., B. Vissel, and E. Earle. 1989. Evolution of alpha DNA on human acrocentric chromosomes. Genomics 5: 332-344.

Cimini, D., D. Fioravanti, E.D. Salmon, and F. Degrassi. 2002. Merotelic kinetochore orientation versus chromosome mono-orientation in the origin of lagging chromosomes in human primary cells. J Cell Sci 115: 507-515.

Clarke, L. 1990. Centromeres of budding and fission yeasts. Trends Genet. 6: 150-154.

Clarke, L. 1998. Centromeres: proteins, protein complexes, and repeated domains at centromeres of simple eukaryotes. Curr Opin Genet Dev 8: 212-218.

Clarke, L., H. Amstutz, B. Fishel, and J. Carbon. 1986. Analysis of centromeric DNA in the fission yeast Schizosaccharomyces pombe. Proc Natl Acad Sci U S A 83: 8253-8257.

Clarke, L. and J. Carbon. 1985. The structure and function of yeast centromeres. Ann. Rev. Genet. 19: 29-56.

Cleveland, D.W., Y. Mao, and K.F. Sullivan. 2003. Centromeres and kinetochores. From epigenetics to mitotic checkpoint signaling. Cell 112: 407-421.

Coen, E., T. Strachan, and G. Dover. 1982. Dynamics of concerted evolution of ribosomal DNA and histone gene families in the melanogaster species subgroup of Drosophila. J Mol Biol 158: 17-35.

Coen, E.S. and G.A. Dover. 1983. Unequal exchanges and the coevolution of X and Y rDNA arrays in Drosophila melanogaster. Cell 33: 849-855.

Collins, F.S., E.D. Green, A.E. Guttmacher, and M.S. Guyer. 2003. A vision for the future of genomics research. Nature 422: 835-847.

Collins, F.S., A. Patrinos, E. Jordan, A. Chakravarti, R. Gesteland, and L. Walters. 1998. New goals for the U.S. Human Genome Project: 1998-2003. Science 282: 682-689.

Copenhaver, G.P., W.E. Browne, and D. Preuss. 1998. Assaying genome-wide recombination and centromere functions with Arabidopsis tetrads. Proc Natl Acad Sci U S A 95: 247-252.

Copenhaver, G.P., K. Nickel, T. Kuromori, M.I. Benito, S. Kaul, X. Lin, M. Bevan, G. Murphy, B. Harris, L.D. Parnell, W.R. McCombie, R.A. Martienssen, M. Marra, and D. Preuss. 1999. Genetic definition and sequence analysis of Arabidopsis centromeres. Science 286: 2468-2474.

Corneo, G., E. Ginelli, and E. Polli. 1967. A satellite DNA isolated from human tissues. J Mol Biol 23: 619-622.

Couronne, O., A. Poliakov, N. Bray, T. Ishkhanov, D. Ryaboy, E. Rubin, L. Pachter, and I. Dubchak. 2003. Strategies and tools for whole-genome alignments. Genome Res 13: 73-80.

Craig, J.M., L.H. Wong, A.W. Lo, E. Earle, and K.H. Choo. 2003. Centromeric chromatin pliability and memory at a human neocentromere. Embo J 22: 2495-2504.

de la Puente, A., E. Velasco, L.A. Perez Jurado, C. Hernandez-Chico, F.M. van de Rijke, S.W. Scherer, A.K. Raap, and J. Cruces. 1998. Analysis of the monomeric alphoid sequences in the pericentromeric region of human chromosome 7. Cytogenet Cell Genet 83: 176-181.

de Lange, T. 2002. Protection of mammalian telomeres. Oncogene 21: 532-540.

Deininger, P.L. and M.A. Batzer. 1999. Alu repeats and human disease. Mol Genet Metab 67: 183-193.

Deininger, P.L., J.V. Moran, M.A. Batzer, and H.H. Kazazian, Jr. 2003. Mobile elements and mammalian genome evolution. Curr Opin Genet Dev 13: 651-658.

Dej, K.J. and T.L. Orr-Weaver. 2000. Separation anxiety at the centromere. Trends Cell Biol 10: 392-399.

Depinet, T.W., J.L. Zackowski, W.C. Earnshaw, S. Kaffe, G.S. Sekhon, R. Stallard, B.A. Sullivan, G.H. Vance, D.L. Van Dyke, H.F. Willard, A.B. Zinn, and S. Schwartz. 1997. Characterization of neo-centromeres in marker chromosomes lacking detectable alpha-satellite DNA. Hum Mol Genet 6: 1195-1204.

Devilee, P., T. Kievits, J.S. Waye, P.L. Pearson, and H.F. Willard. 1988. Chromosome-specific alpha satellite DNA: isolation and mapping of a polymorphic alphoid repeat from human chromosome 10. Genomics 3: 1-7.

Dillon, N. and R. Festenstein. 2002. Unravelling heterochromatin: competition between positive and negative factors regulates accessibility. Trends Genet 18: 252-258.

Doe, C.L., G. Wang, C. Chow, M.D. Fricker, P.B. Singh, and E.J. Mellor. 1998. The fission yeast chromo domain encoding gene chp1 (+) is required for chromosome segregation and shows a genetic interaction with alpha-tubulin. Nucleic Acids Res 26: 4222-4229.

Dong, F., J.T. Miller, S.A. Jackson, G.L. Wang, P.C. Ronald, and J. Jiang. 1998. Rice (Oryza sativa) centromeric regions consist of complex DNA. Proc Natl Acad Sci U S A 95: 8135-8140.

Doolittle, W.F. and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284: 601-603.

Dover, G. 1982. Molecular drive: a cohesive mode of species evolution. Nature 299: 111-117.

Dresen, I.M., J. Husing, E. Kruse, T. Boes, and K.H. Jockel. 2003. Software packages for quantitative microarray-based gene expression analysis. Curr Pharm Biotechnol 4: 417-437.

du Sart, D., M.R. Cancilla, E. Earle, J.I. Mao, R. Saffery, K.M. Tainton, P. Kalitsis, J. Martyn, A.E. Barry, and K.H. Choo. 1997. A functional neo-centromere formed through activation of a latent human centromere and consisting of non-alpha-satellite DNA [see comments]. Nat Genet 16: 144-153.

Dunham, A., L.H. Matthews, J. Burton, et al. 2004. The DNA sequence and analysis of human chromosome 13. Nature 428: 522-528.

Dunham, I., N. Shimizu, B.A. Roe, et al. 1999. The DNA sequence of human chromosome 22. Nature 402: 489-495.

Durfy, S.J. and H.F. Willard. 1987. Molecular analysis of a polymorphic domain of alpha satellite from the human X chromosome. Am. J. Hum. Genet. 41: 391-401.

Durfy, S.J. and H.F. Willard. 1989. Patterns of intra- and interarray sequence variation in alpha satellite from the human X chromosome: Evidence for short range homogenization of tandemly repeated DNA sequences. Genomics 5: 810-821.

Durfy, S.J. and H.F. Willard. 1990. Concerted evolution of primate alpha satellite DNA: evidence for an ancestral sequence shared by gorilla and human X chromosome alpha satellite. J. Mol. Biol. 216: 555-566.

Earnshaw, W.C., R.L. Bernat, C.A. Cooke, and N.F. Rothfield. 1991. Role of the centromere/kinetochore in cell cycle control. Cold Spring Harb Symp Quant Biol 56: 675-685.

Earnshaw, W.C., H. Ratrie, and G. Stetten. 1989. Visualization of centromere proteins CENP-B and CENP-C on a stable dicentric chromosome in cytological spreads. Chromosoma 98: 1-12.

Earnshaw, W.C. and N. Rothfield. 1985. Identification of a family of human centromere proteins using autoimmune sera from patients with scleroderma. Chromosoma 91: 313-321.

Earnshaw, W.C., K.F. Sullivan, P.S. Machlin, C.A. Cooke, D.A. Kaiser, T.D. Pollard, N.F. Rothfield, and D.W. Cleveland. 1987. Molecular cloning of cDNA for CENP-B, the major human centromere autoantigen. J. Cell Biol. 104: 817-829.

Ebersole, T.A., A. Ross, E. Clark, N. McGill, D. Schindelhauer, H. Cooke, and B. Grimes. 2000. Mammalian artificial chromosome formation from circular alphoid input DNA does not require telomere repeats. Hum Mol Genet 9: 1623-1631.

Eichler, E.E., R.A. Clark, and X. She. 2004. An assessment of the sequence gaps: unfinished business in a finished human genome. Nat Rev Genet 5: 345-354.

Eichler, E.E. and D. Sankoff. 2003. Structural dynamics of eukaryotic chromosome evolution. Science 301: 793-797.

Ekwall, K., J.P. Javerzat, A. Lorentz, H. Schmidt, G. Cranston, and R. Allshire. 1995. The chromodomain protein Swi6: a key component at fission yeast centromeres. Science 269: 1429-1431.

Fan, J.B., U. Surti, P. Taillon-Miller, L. Hsie, G.C. Kennedy, L. Hoffner, T. Ryder, D.G. Mutch, and P.Y. Kwok. 2002. Paternal origins of complete hydatidiform moles proven by whole genome single-nucleotide polymorphism haplotyping. Genomics 79: 58-62.

Fanning, T.G. 1989. Molecular evolution of centromere-associated nucleotide sequences in two species of canids. Gene 85: 559-563.

Farr, C.J., R.A.L. Bayne, D. Kipling, W. Mills, R. Critcher, and H.J. Cooke. 1995. Generation of a human X-derived minichromosome using telomere-associated chromosome fragmentation. EMBO J 14: 5444-5454.

Farr, C.J., M. Stevanovic, E.J. Thomson, P.N. Goodfellow, and H.J. Cooke. 1992. Telomere-associated chromosome fragmentation: applications in genome manipulation analysis. Nature Genetics 2: 275-282.

Figueroa, J., R. Saffrich, W. Ansorge, and M. Valdivia. 1998. Microinjection of antibodies to centromere protein CENP-A arrests cells in interphase but does not prevent mitosis. Chromosoma 107: 397-405.

Fitzgerald-Hayes, M., L. Clarke, and J. Carbon. 1982. Nucleotide sequence comparisons and functional analysis of yeast centromere DNAs. Cell 29: 235-244.

Floridia, G., G. Gimelli, O. Zuffardi, W.C. Earnshaw, P.E. Warburton, and C. Tyler-Smith. 2000. A neocentromere in the DAZ region of the human Y chromosome. Chromosoma 109: 318-327.

Frengen, E., B. Zhao, S. Howe, D. Weichenhan, K. Osoegawa, E. Gjernes, J. Jessee, H. Prydz, C. Huxley, and P.J. de Jong. 2000. Modular bacterial artificial chromosome vectors for transfer of large inserts into mammalian cells. Genomics 68: 118-126.

Ge, Y., M.J. Wagner, M. Siciliano, and D.E. Wells. 1992. Sequence, higher order repeat structure, and long-range organization of alpha satellite DNA specific to human chromosome 8. Genomics 13: 585-593.

Gilbert, D.M. 2001. Making sense of eukaryotic DNA replication origins. Science 294: 96-100.

Goldberg, I.G., H. Sawhney, A.F. Pluta, P.E. Warburton, and W.C. Earnshaw. 1996. Surprising deficiency of CENP-B binding sites in African green monkey alpha-satellite DNA: implications for CENP-B function at centromeres. Mol Cell Biol 16: 5156-5168.

Goodier, J.L., E.M. Ostertag, and H.H. Kazazian, Jr. 2000. Transduction of 3'-flanking sequences is common in L1 retrotransposition. Hum Mol Genet 9: 653-657.

Goodman, M. 1999. The genomic record of Humankind's evolutionary roots. Am J Hum Genet 64: 31-39.

Gosden, J.R., A.R. Mitchell, R.A. Buckland, R.P. Clayton, and H.J. Evans. 1975. The locations of four human satellite DNAs on human chromosomes. Exp Cell Res 92: 148-158.

Gregson, H.C., A.A. Van Hooser, A.R. Ball, Jr., B.R. Brinkley, and K. Yokomori. 2002. Localization of human SMC1 protein at kinetochores. Chromosome Res 10: 267-277.

Greig, G.M., S.B. England, H.M. Bedford, and H.F. Willard. 1989. Chromosome-specific alpha satellite DNA from the centromere of human . Am. J. Hum. Genet. 45: 862-872.

Greig, G.M., P.E. Warburton, and H.F. Willard. 1993. Organization and evolution of an alpha satellite DNA subset shared by human chromosomes 13 and 21. J Mol Evol 37: 464-475.

Grimes, B.R., J. Babcock, M.K. Rudd, B.P. Chadwick, and H.F. Willard. 2004. Assembly and characterization of heterochromatin and euchromatin on human artificial chromosomes. submitted.

Grimes, B.R., A.A. Rhoades, and H.F. Willard. 2002. Alpha-satellite DNA and vector composition influence rates of human artificial chromosome formation. Mol Ther 5: 798-805.

Grimes, B.R., D. Schindelhauer, N.I. McGill, A. Ross, T.A. Ebersole, and H.J. Cooke. 2001. Stable gene expression from a mammalian artificial chromosome. EMBO Rep 2: 910-914.

Guy, J., T. Hearn, M. Crosier, J. Mudge, L. Viggiano, D. Koczan, H.J. Thiesen, J.A. Bailey, J.E. Horvath, E.E. Eichler, M.E. Earthrowl, P. Deloukas, L. French, J. Rogers, D. Bentley, and M.S. Jackson. 2003. Genomic sequence and transcriptional profile of the boundary between pericentromeric satellites and genes on human chromosome arm 10p. Genome Res 13: 159-172.

Guy, J., C. Spalluto, A. McMurray, T. Hearn, M. Crosier, L. Viggiano, V. Miolla, N. Archidiacono, M. Rocchi, C. Scott, P.A. Lee, J. Sulston, J. Rogers, D. Bentley, and M.S. Jackson. 2000. Genomic sequence and transcriptional profile of the boundary between pericentromeric satellites and genes on human chromosome arm 10q. Hum Mol Genet 9: 2029-2042.

Haaf, T., A.G. Mater, J. Wienberg, and D.C. Ward. 1995. Presence and abundance of CENP-B box sequences in great ape subsets of primate-specific alpha-satellite DNA. J Mol Evol 41: 487-491.

Haaf, T., P.E. Warburton, and H.F. Willard. 1992. Integration of human alpha satellite DNA into Simian chromosomes: Centromere protein binding and disruption of normal chromosome segregation. Cell 70: 681-696.

Haaf, T. and D.C. Ward. 1994. Structural analysis of alpha satellite DNA and centromere proteins using extended chromatin and chromosomes. Hum Mol Genet 3: 697-709.

Haaf, T. and H.F. Willard. 1992. Organization, polymorphism, and molecular of chromosome-specific alpha-satellite DNA from the centromere of . Genomics 13: 122-128.

Haaf, T. and H.F. Willard. 1998. Orangutan alpha-satellite monomers are closely related to the human consensus sequence. Mamm Genome 9: 440-447.

Hahnenberger, K.M., J. Carbon, and L. Clarke. 1991. Identification of DNA regions required for mitotic and meiotic functions within the centromere of Schizosaccharomyces pombe. Mol. Cell. Biol. 11: 2206-2215.

Hall, S.E., G. Kettler, and D. Preuss. 2003. Centromere satellites from Arabidopsis populations: maintenance of conserved and variable domains. Genome Res 13: 195-205.

Harrington, J.J., G. Van Bokkelen, R.W. Mays, K. Gustashaw, and H.F. Willard. 1997. Formation of de novo centromeres and construction of first-generation human artificial [see comments]. Nat Genet 15: 345-355.

Hattori, M., A. Fujiyama, T.D. Taylor, et al. 2000. The DNA sequence of human chromosome 21. Nature 405: 311-319.

Hegemann, J.H., J.H. Shero, G. Cottarel, P. Philippsen, and P. Hieter. 1988. Mutational analysis of centromere DNA from chromosome VI of Saccharomyces cerevisiae. Mol. Cell. Biol. 8: 2523-2535.

Heller, R., K.E. Brown, C. Burgtorf, and W.R.A. Brown. 1996. Mini-chromosomes derived from the human Y chromosome by telomere directed chromosome breakage. PNAS 93: 7125-7130.

Henikoff, S. 2002. Near the edge of a chromosome's "black hole". Trends Genet 18: 165-167.

Henikoff, S., K. Ahmad, and H.S. Malik. 2001. The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293: 1098-1102.

Henikoff, S., K. Ahmad, J.S. Platero, and B. van Steensel. 2000. Heterochromatic deposition of centromeric histone H3-like proteins. Proc Natl Acad Sci U S A 97: 716-721.

Henning, K.A., E.A. Novotny, S.T. Compton, X.Y. Guan, P.P. Liu, and M.A. Ashlock. 1999. Human artificial chromosomes generated by modification of a yeast artificial chromosome containing both human alpha satellite and single-copy DNA sequences. Proc Natl Acad Sci U S A 96: 592-597.

Hieter, P., D. Pridmore, J.H. Hegemann, M. Thomas, R.W. Davis, and P. Philippsen. 1985. Functional selection and analysis of yeast centromeric DNA. Cell 42: 913-921.

Higgins, A.W., M.G. Schueler, and H.F. Willard. 1999. Chromosome engineering: generation of mono- and dicentric in a somatic cell hybrid system. Chromosoma 108: 256-265.

Hillier, L.W., R.S. Fulton, L.A. Fulton, et al. 2003. The DNA sequence of human chromosome 7. Nature 424: 157-164.

Hoque, M.T. and F. Ishikawa. 2001. Human chromatid cohesin component hRad21 is phosphorylated in M phase and associated with metaphase centromeres. J Biol Chem 276: 5059-5067.

Hoque, M.T. and F. Ishikawa. 2002. Cohesin defects lead to premature sister chromatid separation, kinetochore dysfunction, and spindle-assembly checkpoint activation. J Biol Chem 277: 42306-42314.

Horvath, J.E., J.A. Bailey, D.P. Locke, and E.E. Eichler. 2001. Lessons from the human genome: transitions between euchromatin and heterochromatin. Hum Mol Genet 10: 2215-2223.

Horvath, J.E., L. Viggiano, B.J. Loftus, M.D. Adams, N. Archidiacono, M. Rocchi, and E.E. Eichler. 2000. Molecular structure and evolution of an alpha satellite/non-alpha satellite junction at 16p11. Hum Mol Genet 9: 113-123.

Horz, W. and W. Altenburger. 1981. Nucleotide sequence of mouse satellite DNA. Nucleic Acids Res 9: 683-696.

Hoskins, R.A., C.D. Smith, J.W. Carlson, A.B. Carvalho, A. Halpern, J.S. Kaminker, C. Kennedy, C.J. Mungall, B.A. Sullivan, G.G. Sutton, J.C. Yasuhara, B.T. Wakimoto, E.W. Myers, S.E. Celniker, G.M. Rubin, and G.H. Karpen. 2002. Heterochromatic sequences in a Drosophila whole-genome shotgun assembly. Genome Biol 3: RESEARCH0085.

Howman, E.V., K.J. Fowler, A.J. Newson, S. Redward, A.C. MacDonald, P. Kalitsis, and K.H. Choo. 2000. Early disruption of centromeric chromatin organization in centromere protein A (Cenpa) null mice. Proc Natl Acad Sci U S A 97: 1148-1153.

Hudson, D.F., K.J. Fowler, E. Earle, R. Saffery, P. Kalitsis, H. Trowell, J. Hill, N.G. Wreford, D.M. de Kretser, M.R. Cancilla, E. Howman, L. Hii, S.M. Cutts, D.V. Irvine, and K.H. Choo. 1998. Centromere protein B null mice are mitotically and meiotically normal but have lower body and testis weights. J Cell Biol 141: 309-319.

Hulsebos, T., D. Schonk, I. van Dalen, M. Coerwinkel-Driessen, J. Schepens, H.H. Ropers, and B. Wieringa. 1988. Isolation and characterization of alphoid DNA sequences specific for the pericentric regions of chromosomes 4, 5, 9, and 19. Cytogenet Cell Genet 47: 144-148.

Hyman, A.A. and P.K. Sorger. 1995. Structure and function of kinetochores in budding yeast. Annu Rev Cell Dev Biol 11: 471-495.

Ihaka, R. and R. Gentleman. 1996. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 5: 299-314.

Ikeno, M., B. Grimes, T. Okazaki, M. Nakano, K. Saitoh, H. Hoshino, N.I. McGill, H. Cooke, and H. Masumoto. 1998. Construction of YAC-based mammalian artificial chromosomes. Nat Biotechnol 16: 431-439.

Ikeno, M., H. Masumoto, and T. Okazaki. 1994. Distribution of CENP-B boxes reflected in CREST centromere antigenic sites on long-range a-satellite DNA arrays of chromosome 21. Hum Mol Genet 3: 1245-1257.

Inoue, K. and J.R. Lupski. 2002. Molecular mechanisms for genomic disorders. Annu Rev Genomics Hum Genet 3: 199-242.

Irelan, J.T., G.I. Gutkin, and L. Clarke. 2001. Functional redundancies, distinct localizations and interactions among three fission yeast homologs of centromere protein-B. Genetics 157: 1191-1203.

Jackson, M. 2003. Duplicate, decouple, disperse: the evolutionary transience of human centromeric regions. Curr Opin Genet Dev 13: 629-635.

Jackson, M.S., C.G. See, L.M. Mulligan, and B.F. Lauffart. 1996. A 9.75-Mb map across the centromere of human chromosome 10. Genomics 33: 258-270.

James, T. and S.C.R. Elgin. 1986. Identification of a nonhistone chromosomal protein associated with heterochromatin in Drosophila and its gene. Mol. Cell. Biol. 6: 3862-3872.

James, T.C., J.C. Eissenberg, C. Craig, V. Dietrich, A. Hobson, and S.C.R. Elgin. 1989. Distribution patterns of HP1, a heterochromatin-associated nonhistone chromosomal protein of Drosophila. Eur. J. Cell Biol. 50: 170-180.

Jehn, B., R. Niedenthal, and J.H. Hegemann. 1991. In vivo analysis of the Saccharomyces cerevisiae centromere CDEIII sequence: requirements for mitotic chromosome segregation. Mol Cell Biol 11: 5212-5221.

Jones, R.S. and S.S. Potter. 1985. Characterization of cloned human alphoid satellite with an unusual monomeric construction: evidence for enrichment in HeLa small polydisperse circular DNA. Nucleic Acids Res 13: 1027-1042.

Jorgensen, A.L., C. Jones, C.J. Bostock, and A.L. Bak. 1987. Different subfamilies of alphoid repetitive DNA are present on the human and chimpanzee homologous chromosomes 21 and 22. EMBO J 6: 1691-1696.

Jorgensen, A.L., S. Kolvraa, C. Jones, and A.L. Bak. 1988. A subfamily of alphoid repetitive DNA shared by the NOR-bearing human chromosomes 14 and 22. Genomics 3: 100-109.

Joseph, A., A.R. Mitchell, and O.J. Miller. 1989. The organization of the mouse satellite DNA at centromeres. Exp. Cell Res. 183: 494-500.

Kalitsis, P., K.J. Fowler, E. Earle, J. Hill, and K.H. Choo. 1998. Targeted disruption of mouse centromere protein C gene leads to mitotic disarray and early embryo death. Proc Natl Acad Sci U S A 95: 1136-1141.

Karpen, G.H. and R.C. Allshire. 1997. The case for epigenetic effects on centromere identity and function. Trends Genet 13: 489-496.

Kent, W.J., C.W. Sugnet, T.S. Furey, K.M. Roskin, T.H. Pringle, A.M. Zahler, and D. Haussler. 2002. The human genome browser at UCSC. Genome Res 12: 996-1006.

Kipling, D., H.E. Ackford, B.A. Taylor, and H.J. Cooke. 1991. Mouse minor satellite DNA genetically maps to the centromere and is physically linked to the proximal telomere. Genomics 11: 235-241.

Kit, S. 1961. Equilibrium sedimentation in density gradients of DNA preparations from animal tissues. J Mol Biol 3: 711-716.

Klein, F., P. Mahr, M. Galova, S.B. Buonomo, C. Michaelis, K. Nairz, and K. Nasmyth. 1999. A central role for cohesins in sister chromatid cohesion, formation of axial elements, and recombination during yeast meiosis. Cell 98: 91-103.

Kniola, B., E. O'Toole, J.R. McIntosh, B. Mellone, R. Allshire, S. Mengarelli, K. Hultenby, and K. Ekwall. 2001. The domain structure of centromeres is conserved from fission yeast to humans. Mol Biol Cell 12: 2767-2775.

Koch, J. 2000. Neocentromeres and alpha satellite: a proposed structural code for functional human centromere DNA. Hum Mol Genet 9: 149-154.

Koshland, D., L. Rutledge, M. Fitzgerald-Hayes, and L.H. Hartwell. 1987. A genetic analysis of dicentric minichromosomes in Saccharomyces cerevisiae. Cell 48: 801-812.

Kouprina, N., T. Ebersole, M. Koriabine, E. Pak, I.B. Rogozin, M. Katoh, M. Oshimura, K. Ogi, M. Peredelchuk, G. Solomon, W. Brown, J.C. Barrett, and V. Larionov. 2003. Cloning of human centromeres by transformation-associated recombination in yeast and generation of functional human artificial chromosomes. Nucleic Acids Res 31: 922-934.

Krolewski, J.J., C.W. Schindler, and M.G. Rush. 1984. Structure of extrachromosomal circular DNAs containing both the Alu family of dispersed repetitive sequences and other regions of chromosomal DNA. J Mol Biol 174: 41-54.

Kumar, S. and S.B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392: 917-920.

Kumar, S., K. Tamura, I.B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17: 1244-1245.

Kunisada, T. and H. Yamagishi. 1987. Sequence organization of repetitive sequences enriched in small polydisperse circular DNAs from HeLa cells. J Mol Biol 198: 557-565.

Lamb, J.C. and J.A. Birchler. 2003. The role of DNA sequence in centromere formation. Genome Biol 4: 214.

Lander, E.S., L.M. Linton, B. Birren, et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.

Lansdorp, P.M., N.P. Verwoerd, F.M. van de Rijke, V. Dragowska, M.T. Little, R.W. Dirks, A.K. Raap, and H.J. Tanke. 1996. Heterogeneity in telomere length of human chromosomes. Hum Mol Genet 5: 685-691.

Larin, Z. and J.E. Mejia. 2002. Advances in human artificial chromosome technology. Trends Genet 18: 313-319.

Laurent, A.M., M. Li, S. Sherman, G. Roizes, and J. Buard. 2003. Recombination across the centromere of disjoined and non-disjoined chromosome 21. Hum Mol Genet 12: 2229-2239.

Laursen, H.B., A.L. Jorgensen, C. Jones, and A.L. Bak. 1992. Higher rate of evolution of X chromosome alpha-repeat DNA in human than in the great apes. EMBO J 11: 2367-2372.

LeBlanc, H.N., T.T. Tang, J.S. Wu, and T.L. Orr-Weaver. 1999. The mitotic centromeric protein MEI-S332 and its role in sister-chromatid cohesion. Chromosoma 108: 401-411.

Lee, C., R. Critcher, J.G. Zhang, W. Mills, and C.J. Farr. 2000. Distribution of gamma satellite DNA on the human X and Y chromosomes suggests that it is not required for mitotic centromere function. Chromosoma 109: 381-389.

Lee, C., R. Wevrick, R.B. Fisher, M.A. Ferguson-Smith, and C.C. Lin. 1997. Human centromeric DNAs. Hum Genet 100: 291-304.

Lee, J.Y. and T.L. Orr-Weaver. 2001. The molecular basis of sister-chromatid cohesion. Annu Rev Cell Dev Biol 17: 753-777.

Liu, G., S. Zhao, J.A. Bailey, S.C. Sahinalp, C. Alkan, E. Tuzun, E.D. Green, and E.E. Eichler. 2003. Analysis of primate genomic variation reveals a repeat-driven expansion of the human genome. Genome Res 13: 358-368.

Lo, A.W., J.M. Craig, R. Saffery, P. Kalitsis, D.V. Irvine, E. Earle, D.J. Magliano, and K.H. Choo. 2001. A 330 kb CENP-A binding domain and altered replication timing at a human neocentromere. EMBO J 20: 2087-2096.

Lohe, A.R., A.J. Hilliker, and P.A. Roberts. 1993. Mapping simple repeated DNA sequences in heterochromatin of Drosophila melanogaster. Genetics 134: 1149-1174.

Looijenga, L.H., J.W. Oosterhuis, V.T. Smit, J.W. Wessels, P. Mollevanger, and P. Devilee. 1992. Alpha satellite DNAs on chromosomes 10 and 12 are both members of the dimeric suprachromosomal subfamily, but display little identity at the nucleotide sequence level. Genomics 13: 1125-1132.

Lopez, J.M., G.H. Karpen, and T.L. Orr-Weaver. 2000. Sister-chromatid cohesion via MEI-S332 and kinetochore assembly are separable functions of the Drosophila centromere. Curr Biol 10: 997-1000.

Maggert, K.A. and G.H. Karpen. 2001. The activation of a neocentromere in Drosophila requires proximity to an endogenous centromere. Genetics 158: 1615-1628.

Mahtani, M.M. and H.F. Willard. 1990. Pulsed-field gel analysis of alpha satellite DNA at the human X chromosome centromere: high frequency polymorphisms and array size estimate. Genomics 7: 607-613.

Mahtani, M.M. and H.F. Willard. 1998. Physical and genetic mapping of the human X chromosome centromere: repression of recombination. Genome Res 8: 100-110.

Maio, J.J. 1971. DNA strand reassociation and polyribonucleotide binding in the African green monkey, Cercopithecus aethiops. J Mol Biol 56: 579-595.

Maio, J.J., F.L. Brown, and P.R. Musich. 1981. Toward a molecular paleontology of primate genomes. I. The HindIII and EcoRI dimer families of alphoid DNAs. Chromosoma 83: 103-125.

Malik, H.S. and S. Henikoff. 2002. Conflict begets complexity: the evolution of centromeres. Curr Opin Genet Dev 12: 711-718.

Manuelidis, L. 1976. Repeating restriction fragments of human DNA. Nucleic Acids Res 3: 3063-3076.

Manuelidis, L. 1978. Chromosomal localization of complex and simple repeated human DNAs. Chromosoma 66: 23-32.

Manuelidis, L. and J.C. Wu. 1978. Homology between human and simian repeated DNA. Nature 276: 92-94.

Marahrens, Y. and B. Stillman. 1992. A yeast chromosomal origin of DNA replication defined by multiple functional elements. Science 255: 817-823.

Masumoto, H., M. Ikeno, M. Nakano, T. Okazaki, B. Grimes, H. Cooke, and N. Suzuki. 1998. Assay of centromere function using a human artificial chromosome. Chromosoma 107: 406-416.

Masumoto, H., H. Masukata, Y. Muro, N. Nozaki, and T. Okazaki. 1989. A human centromere antigen (CENP-B) interacts with a short specific sequence in alphoid DNA, a human centromeric satellite. J. Cell Biol. 109: 1963-1973.

McClintock, B. 1939. The behavior in successive nuclear divisions of a chromosome broken in meiosis. PNAS 25: 405-416.

Mejia, J.E., A. Alazami, A. Willmott, P. Marschall, E. Levy, W.C. Earnshaw, and Z. Larin. 2002. Efficiency of de novo centromere formation in human artificial chromosomes. Genomics 79: 297-304.

Mejia, J.E., A. Willmott, E. Levy, W.C. Earnshaw, and Z. Larin. 2001. Functional complementation of a genetic deficiency with human artificial chromosomes. Am J Hum Genet 69: 315-326.

Mills, W., R. Critcher, C. Lee, and C.J. Farr. 1999. Generation of an approximately 2.4 Mb human X centromere-based minichromosome by targeted telomere-associated chromosome fragmentation in DT40. Hum Mol Genet 8: 751-761.

Minc, E., Y. Allory, H.J. Worman, J.C. Courvalin, and B. Buendia. 1999. Localization and phosphorylation of HP1 proteins during the cell cycle in mammalian cells. Chromosoma 108: 220-234.

Moore, L.L., M. Morrison, and M.B. Roth. 1999. HCP-1, a protein involved in chromosome segregation, is localized to the centromere of mitotic chromosomes in Caenorhabditis elegans. J Cell Biol 147: 471-480.

Moore, L.L. and M.B. Roth. 2001. HCP-4, a CENP-C-like protein in Caenorhabditis elegans, is required for resolution of sister centromeres. J Cell Biol 153: 1199-1208.

Moroi, Y., C. Peeples, M.J. Fritzler, J. Steigerwald, and E.M. Tan. 1980. Autoantibody to centromere (kinetochore) in scleroderma sera. Proc. Natl. Acad. Sci. USA 77: 1627-1631.

Murphy, T.D. and G.H. Karpen. 1995. Localization of centromere function in a Drosophila minichromosome. Cell 82: 599-609.

Murphy, T.D. and G.H. Karpen. 1998. Centromeres take flight: alpha satellite and the quest for the human centromere. Cell 93: 317-320.

Murray, A.W., N.P. Schultes, and J.W. Szostak. 1986. Chromosome length controls mitotic chromosome segregation in yeast. Cell 45: 529-536.

Murray, A.W. and J.W. Szostak. 1983. Construction of artificial chromosomes in yeast. Nature 305: 189-193.

Musacchio, A. and K.G. Hardwick. 2002. The spindle checkpoint: structural insights into dynamic signalling. Nat Rev Mol Cell Biol 3: 731-741.

Musich, P.R., F.L. Brown, and J.J. Maio. 1980. Highly repetitive component alpha and related alphoid DNAs in man and monkeys. Chromosoma 80: 331-348.

Nagaki, K., Z. Cheng, S. Ouyang, P.B. Talbert, M. Kim, K.M. Jones, S. Henikoff, C.R. Buell, and J. Jiang. 2004. Sequencing of a rice centromere uncovers active genes. Nat Genet 36: 138-145.

Nagaki, K., P.B. Talbert, C.X. Zhong, R.K. Dawe, S. Henikoff, and J. Jiang. 2003. Chromatin immunoprecipitation reveals that the 180-bp satellite repeat is the key functional DNA element of Arabidopsis thaliana centromeres. Genetics 163: 1221-1225.

Nakaseko, Y., Y. Adachi, S. Funahashi, O. Niwa, and M. Yanagida. 1986. Chromosome walking shows a highly homologous repetitive sequence present in all the centromere regions of fission yeast. EMBO J. 5: 1011-1021.

Nakaseko, Y., G. Goshima, J. Morishita, and M. Yanagida. 2001. M phase-specific kinetochore proteins in fission yeast: microtubule-associating Dis1 and Mtc1 display rapid separation and segregation during anaphase. Curr Biol 11: 537-549.

Nakaseko, Y., N. Kinoshita, and M. Yanagida. 1987. A novel sequence common to the centromere regions of Schizosaccharomyces pombe chromosomes. Nucleic Acids Res 15: 4705-4715.

Needleman, S.B. and C.D. Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48: 443-453.

Nicklas, R.B. 1971. Mitosis. Adv Cell Biol 2: 225-297.

Nicol, L. and P. Jeppesen. 1994. Human autoimmune sera recognize a conserved 26 kD protein associated with heterochromatin that is homologous to heterochromatin protein 1 of Drosophila. Chromosome Res. 2: 245-255.

Nonaka, N., T. Kitajima, S. Yokobayashi, G. Xiao, M. Yamamoto, S.I. Grewal, and Y. Watanabe. 2002. Recruitment of cohesin to heterochromatic regions by Swi6/HP1 in fission yeast. Nat Cell Biol 4: 89-93.

Oegema, K., A. Desai, S. Rybina, M. Kirkham, and A.A. Hyman. 2001. Functional analysis of kinetochore assembly in Caenorhabditis elegans. J Cell Biol 153: 1209-1226.

Ohta, T. and G.A. Dover. 1983. Population genetics of multigene families that are dispersed into two or more chromosomes. Proc Natl Acad Sci U S A 80: 4079-4083.

Ohzeki, J., M. Nakano, T. Okada, and H. Masumoto. 2002. CENP-B box is required for de novo centromere chromatin assembly on human alphoid DNA. J Cell Biol 159: 765-775.

Orgel, L.E. and F.H. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284: 604-607.

Page, S.L., W.C. Earnshaw, K.H.A. Choo, and L.G. Shaffer. 1995. Further evidence that CENP-C is a necessary component of active centromeres: studies of a dic (X;15) with simultaneous immunofluorescence and FISH. Hum Mol Genet 4: 289-294.

Page, S.L. and L.G. Shaffer. 1998. Chromosome stability is maintained by short intercentromeric distance in functionally dicentric human Robertsonian translocations. Chromosome Res 6: 115-122.

Palmer, D.K., K. O'Day, H.L. Trong, H. Charbonneau, and R.L. Margolis. 1991. Purification of the centromere-specific protein CENP-A and demonstration that it is a distinctive histone. Proc. Natl. Acad. Sci. USA 88: 3734-3738.

Palmer, D.K., K. O'Day, M.H. Wener, B.S. Andrews, and R.L. Margolis. 1987. A 17-kD centromere protein (CENP-A) copurifies with nucleosome core particles and with histones. J Cell Biol 104: 805-815.

Parisi, S., M.J. McKay, M. Molnar, M.A. Thompson, P.J. van der Spek, E. van Drunen-Schoenmaker, R. Kanaar, E. Lehmann, J.H. Hoeijmakers, and J. Kohli. 1999. Rec8p, a meiotic recombination and sister chromatid cohesion phosphoprotein of the Rad21p family conserved from fission yeast to humans. Mol Cell Biol 19: 3515-3528.

Partridge, J.F., B. Borgstrom, and R.C. Allshire. 2000. Distinct protein interaction domains and protein spreading in a complex centromere. Genes Dev 14: 783-791.

Partridge, J.F., K.S. Scott, A.J. Bannister, T. Kouzarides, and R.C. Allshire. 2002. cis-acting DNA from fission yeast centromeres mediates histone H3 methylation and recruitment of silencing factors and cohesin to an ectopic site. Curr Biol 12: 1652-1660.

Peacock, W.J., A.R. Lohe, W.L. Gerlach, P. Dunsmuir, E.S. Dennis, and R. Appels. 1978. Fine structure and evolution of DNA in heterochromatin. Cold Spring Harb Symp Quant Biol 42 Pt 2: 1121-1135.

Pickeral, O.K., W. Makalowski, M.S. Boguski, and J.D. Boeke. 2000. Frequent human genomic DNA transduction driven by LINE-1 retrotransposition. Genome Res 10: 411-415.

Pietras, D.F., K.L. Bennett, L.D. Siracusa, M. Woodworth-Gutai, V.M. Chapman, K.W. Gross, C. Kane-Haas, and N.D. Hastie. 1983. Construction of a small Mus musculus repetitive DNA library: identification of a new satellite sequence in Mus musculus. Nucleic Acids Res 11: 6965-6983.

Platero, J.S., K. Ahmad, and S. Henikoff. 1999. A distal heterochromatic block displays centromeric activity when detached from a natural centromere. Mol Cell 4: 995-1004.

Prosser, J., M. Frommer, C. Paul, and P.C. Vincent. 1986. Sequence relationships of three human satellite DNAs. J Mol Biol 187: 145-155.

Pruitt, K.D., K.S. Katz, H. Sicotte, and D.R. Maglott. 2000. Introducing RefSeq and LocusLink: curated human genome resources at the NCBI. Trends Genet 16: 44-47.

Puechberty, J., A.M. Laurent, S. Gimenez, A. Billault, M.E. Brun-Laurent, A. Calenda, B. Marcais, C. Prades, P. Ioannou, Y. Yurov, and G. Roizes. 1999. Genetic and physical analyses of the centromeric and pericentromeric regions of human chromosome 5: recombination across 5cen. Genomics 56: 274-287.

Rhodes, M.M. and H. Vilkomerson. 1942. On the anaphase movement of chromosomes. Proc Natl Acad Sci U S A 28: 433-436.

Richards, E.J., H.M. Goodman, and F.M. Ausubel. 1991. The centromere region of Arabidopsis thaliana chromosome 1 contains telomere-similar sequences. Nucleic Acids Res 19: 3351-3357.

Rieder, C.L. and E.D. Salmon. 1998. The vertebrate cell kinetochore and its roles during mitosis. Trends Cell Biol 8: 310-318.

Riethman, H., A. Ambrosini, C. Castaneda, J. Finklestein, X.L. Hu, U. Mudunuri, S. Paul, and J. Wei. 2004. Mapping and initial analysis of human subtelomeric sequence assemblies. Genome Res 14: 18-28.

Rocchi, M., N. Archidiacono, D.C. Ward, and A. Baldini. 1991. A human chromosome 9-specific alphoid DNA repeat spatially resolvable from satellite 3 DNA by fluorescent in situ hybridization. Genomics 9: 517-523.

Rosenberg, H., M. Singer, and M. Rosenberg. 1978. Highly reiterated sequences of SIMIANSIMIANSIMIANSIMIANSIMIAN. Science 200: 394-402.

Round, E.K., S.K. Flowers, and E.J. Richards. 1997. Arabidopsis thaliana centromere regions: genetic map positions and repetitive DNA structure. Genome Res 7: 1045-1053.

Rozen, S., H. Skaletsky, J.D. Marszalek, P.J. Minx, H.S. Cordum, R.H. Waterston, R.K. Wilson, and D.C. Page. 2003. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature 423: 873-876.

Rudd, M.K., R.W. Mays, S. Schwartz, and H.F. Willard. 2003. Human artificial chromosomes with alpha satellite-based de novo centromeres show increased frequency of nondisjunction and anaphase lag. Mol Cell Biol 23: 7689-7697.

Rudd, M.K., M.G. Schueler, and H.F. Willard. 2004. Sequence organization and functional annotation of human centromeres. Cold Spring Harbor Symposia on Quantitative Biology 68: 141-149.

Saffery, R., D.V. Irvine, B. Griffiths, P. Kalitsis, L. Wordeman, and K.H. Choo. 2000. Human centromeres and neocentromeres show identical distribution patterns of >20 functionally important kinetochore-associated proteins. Hum Mol Genet 9: 175-185.

Saffery, R., L.H. Wong, D.V. Irvine, M.A. Bateman, B. Griffiths, S.M. Cutts, M.R. Cancilla, A.C. Cendron, A.J. Stafford, and K.H. Choo. 2001. Construction of neocentromere-based human minichromosomes by telomere-associated chromosomal truncation. Proc Natl Acad Sci U S A 98: 5705-5710.

Satinover, D.L., G.H. Vance, D.L. Van Dyke, and S. Schwartz. 2001. Cytogenetic analysis and construction of a BAC contig across a common neocentromeric region from 9p. Chromosoma 110: 275-283.

Saunders, W.S., C. Chue, M. Goebl, C. Craig, R.F. Clark, J.A. Powers, J.C. Eissenberg, S.C.R. Elgin, N.F. Rothfield, and W.C. Earnshaw. 1993. Molecular cloning of a human homologue of Drosophila heterochromatin protein HP1 using anti-centromere autoantibodies with anti-chromo specificity. J. Cell Sci. 104: 573-582.

Scherer, S.W., J. Cheung, J.R. MacDonald, et al. 2003. Human chromosome 7: DNA sequence and biology. Science 300: 767-772.

Schindelhauer, D. and T. Schwarz. 2002. Evidence for a fast, intrachromosomal conversion mechanism from mapping of nucleotide variants within a homogeneous alpha-satellite DNA array. Genome Res 12: 1815-1826.

Schueler, M.G., A.W. Higgins, M.K. Rudd, K. Gustashaw, and H.F. Willard. 2001. Genomic and genetic definition of a functional human centromere. Science 294: 109-115.

Sears, D.D., J.H. Hegemann, J.H. Shero, and P. Hieter. 1995. Cis-acting determinants affecting centromere function, sister-chromatid cohesion and reciprocal recombination during meiosis in Saccharomyces cerevisiae. Genetics 139: 1159-1173.

Shah, J.V. and D.W. Cleveland. 2000. Waiting for anaphase: Mad2 and the spindle assembly checkpoint. Cell 103: 997-1000.

Shampay, J., J.W. Szostak, and E.H. Blackburn. 1984. DNA sequences of telomeres maintained in yeast. Nature 310: 154-157.

She, X., J.E. Horvath, Z. Jiang, G. Lui, T.S. Furey, L. Christ, R. Clark, T. Graves, C.L. Gulden, C. Alkan, J. Bailey, C. Sahinalp, M. Rocchi, D. Haussler, R. Wilson, W. Miller, S. Schwartz, and E.E. Eichler. 2004. The structure and evolution of centromeric transition regions within the human genome. Nature in press.

Shelby, R.D., K. Monier, and K.F. Sullivan. 2000. Chromatin assembly at kinetochores is uncoupled from DNA replication. J Cell Biol 151: 1113-1118.

Shen, M.H., J.W. Yang, J. Yang, C. Pendon, and W.R. Brown. 2001. The accuracy of segregation of human mini-chromosomes varies in different vertebrate cell lines, correlates with the extent of centromere formation and provides evidence for a trans-acting centromere maintenance activity. Chromosoma 109: 524-535.

Singer, D. and L. Donehower. 1979. Highly repeated DNA of the baboon: organization of sequences homologous to highly repeated DNA of the African green monkey. J Mol Biol 134: 835-842.

Skaletsky, H., T. Kuroda-Kawaguchi, P.J. Minx, H.S. Cordum, L. Hillier, L.G. Brown, S. Repping, T. Pyntikova, J. Ali, T. Bieri, A. Chinwalla, A. Delehaunty, K. Delehaunty, H. Du, G. Fewell, L. Fulton, R. Fulton, T. Graves, S.F. Hou, P. Latrielle, S. Leonard, E. Mardis, R. Maupin, J. McPherson, T. Miner, W. Nash, C. Nguyen, P. Ozersky, K. Pepin, S. Rock, T. Rohlfing, K. Scott, B. Schultz, C. Strong, A. Tin-Wollam, S.P. Yang, R.H. Waterston, R.K. Wilson, S. Rozen, and D.C. Page. 2003. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423: 825-837.

Slater, H.R., S. Nouri, E. Earle, A.W. Lo, L.G. Hale, and K.H. Choo. 1999. Neocentromere formation in a stable ring 1p32-p36.1 chromosome. J Med Genet 36: 914-918.

Smit, A.F. 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev 9: 657-663.

Smith, G.P. 1976. Evolution of repeated DNA sequences by unequal crossover. Science 191: 528-535.

Sonnhammer, E.L. and R. Durbin. 1995. A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene 167: GC1-10.

Southern, E.M. 1975. Long range periodicities in mouse satellite DNA. J Mol Biol 94: 51-69.

Spence, J.M., R. Critcher, T.A. Ebersole, M.M. Valdivia, W.C. Earnshaw, T. Fukagawa, and C.J. Farr. 2002. Co-localization of centromere activity, proteins and topoisomerase II within a subdomain of the major human X alpha-satellite array. EMBO J 21: 5269-5280.

Stankiewicz, P. and J.R. Lupski. 2002. Molecular-evolutionary mechanisms for genomic disorders. Curr Opin Genet Dev 12: 312-319.

Stinchcomb, D.T., J.E. Shaw, S.H. Carr, and D. Hirsh. 1985. Extrachromosomal DNA transformation of Caenorhabditis elegans. Mol Cell Biol 5: 3484-3496.

Stoler, S., K.C. Keith, K.E. Curnick, and M. Fitzgerald-Hayes. 1995. A mutation in CSE4, an essential gene encoding a novel chromatin associated protein in yeast, causes chromosome nondisjunction and cell cycle arrest at mitosis. Genes Dev. 9: 573-586.

Strachan, T., E. Coen, D. Webb, and G. Dover. 1982. Modes and rates of change of complex DNA families of Drosophila. J Mol Biol 158: 37-54.

Strachan, T., D. Webb, and G. Dover. 1985. Transition stages of molecular drive in multiple-copy DNA families in Drosophila. EMBO J 4: 1701-1708.

Sueoka, N., J. Marmur, and P. Doty, 2nd. 1959. Dependence of the density of deoxyribonucleic acids on guanine-cytosine content. Nature 183: 1429-1431.

Sullivan, B.A. 2002. Centromere round-up at the heterochromatin corral. Trends Biotechnol 20: 89-92.

Sullivan, B.A., M.D. Blower, and G.H. Karpen. 2001. Determining centromere identity: cyclical stories and forking paths. Nat Rev Genet 2: 584-596.

Sullivan, B.A. and G.H. Karpen. 2004. CENP-A chromatin occupies "euchromatic" histone modifications. Nature Structural Biology, in press.

Sullivan, B.A. and S. Schwartz. 1995. Identification of centromeric antigens in dicentric Robertsonian translocations: CENP-C and CENP-E are necessary components of functional centromeres. Hum Mol Genet 5: 2189-2198.

Sullivan, B.A., A.D. Skora, H.D. Le, and G.H. Karpen. 2002. CENP-A chromatin occupies "euchromatic" histone modifications. Am. J. Hum. Genet. 71: 218.

Sullivan, B.A. and P.E. Warburton. 1999. Studying progression of vertebrate chromosomes through mitosis by immunofluorescence and FISH. In Chromosome structural analysis-a practical approach (ed. W.A. Bickmore), pp. 81-101. Oxford University Press, Oxford.

Sullivan, B.A. and H.F. Willard. 1998. Stable dicentric X chromosomes with two functional centromeres. Nat. Genet. 20: 227-228.

Sullivan, K.F. and C.A. Glass. 1991. CENP-B is a highly conserved mammalian centromere protein with homology to the helix-loop-helix family of proteins. Chromosoma 100: 360-370.

Sun, X., H.D. Le, J.M. Wahlstrom, and G.H. Karpen. 2003. Sequence analysis of a functional Drosophila centromere. Genome Res 13: 182-194.

Sun, X., J. Wahlstrom, and G. Karpen. 1997. Molecular structure of a functional Drosophila centromere. Cell 91: 1007-1019.

Surosky, R.T., C.S. Newlon, and B.K. Tye. 1986. The mitotic stability of deletion derivatives of chromosome III in yeast. Proc Natl Acad Sci U S A 83: 414-418.

Taddei, A., C. Maison, D. Roche, and G. Almouzni. 2001. Reversible disruption of pericentric heterochromatin and centromere function by inhibiting deacetylases. Nat Cell Biol 3: 114-120.

Takahashi, K., E.S. Chen, and M. Yanagida. 2000. Requirement of Mis6 centromere connector for localizing a CENP-A-like protein in fission yeast. Science 288: 2215-2219.

Tanaka, T.U. 2002. Bi-orienting chromosomes on the mitotic spindle. Curr Opin Cell Biol 14: 365-371.

Thayer, R.E., M.F. Singer, and T.F. McCutchan. 1981. Sequence relationships between single repeat units of highly reiterated African Green monkey DNA. Nucleic Acids Res 9: 169-181.

Therman, E., G.E. Sarto, and K. Patau. 1974. Apparently isodicentric but functionally monocentric X chromosome in man. Amer J Hum Genet 26: 83-92.

Thompson, J.D., D.G. Higgins, and T.J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.

Tomkiel, J., C.A. Cooke, H. Saitoh, R.L. Bernat, and W.C. Earnshaw. 1994. CENP-C is required for maintaining proper kinetochore size and for a timely transition to anaphase. J Cell Biol 125: 531-545.

Trask, B.J. 2002. Human cytogenetics: 46 chromosomes, 46 years and counting. Nat Rev Genet 3: 769-778.

Tyler-Smith, C. and W.R. Brown. 1987. Structure of the major block of alphoid satellite DNA on the human Y chromosome. J Mol Biol 195: 457-470.

Tyler-Smith, C., R.J. Oakey, Z. Larin, R.B. Fisher, M. Crocker, N.A. Affara, M.A. Ferguson-Smith, M. Muenke, O. Zuffardi, and M.A. Jobling. 1993. Localization of DNA sequences required for human centromere function through an analysis of rearranged Y chromosomes. Nature Genet 5: 368-375.

Tyler-Smith, C. and H.F. Willard. 1993. Mammalian chromosome structure. Curr Opin Genet Dev 3: 390-397.

Uhlmann, F. 2003. Chromosome cohesion and separation: from men and molecules. Curr Biol 13: R104-R114.

Vafa, O., R.D. Shelby, and K.F. Sullivan. 1999. CENP-A associated complex satellite DNA in the kinetochore of the Indian muntjac. Chromosoma 108: 367-374.

Vafa, O. and K.F. Sullivan. 1997. Chromatin containing CENP-A and alpha satellite DNA is a major component of the inner kinetochore plate. Current Biology 7: 897-900.

Valdivia, M.M., J. Figueroa, C. Iglesias, and M. Ortiz. 1998. A novel centromere monospecific serum to a human autoepitope on the histone H3-like protein CENP-A. FEBS Lett 422: 5-9.

Vasiliauskas, D., S. Hancock, and C.D. Stern. 1999. SWiP-1: novel SOCS box containing WD-protein regulated by signalling centres and by Shh during development. Mech Dev 82: 79-94.

Vass, S., S. Cotterill, A.M. Valdeolmillos, J.L. Barbero, E. Lin, W.D. Warren, and M.M. Heck. 2003. Depletion of Drad21/Scc1 in Drosophila cells leads to instability of the cohesin complex and disruption of mitotic progression. Curr Biol 13: 208-218.

Venter, J.C., M.D. Adams, E.W. Myers, et al. 2001. The sequence of the human genome. Science 291: 1304-1351.

Vissel, B. and K.H. Choo. 1991. Four distinct alpha satellite subfamilies shared by human chromosomes 13, 14 and 21. Nucl. Acid Res. 19: 271-277.

Voullaire, L., R. Saffery, J. Davies, E. Earle, P. Kalitsis, H. Slater, D.V. Irvine, and K.H. Choo. 1999. Trisomy 20p resulting from inverted duplication and neocentromere formation. Am J Med Genet 85: 403-408.

Voullaire, L., R. Saffery, E. Earle, D.V. Irvine, H. Slater, S. Dale, D. du Sart, T. Fleming, and K.H. Choo. 2001. Mosaic inv dup (8p) with stable neocentromere suggests neocentromerization is a post-zygotic event. Am J Med Genet 102: 86-94.

Walmsley, R.W., C.S. Chan, B.K. Tye, and T.D. Petes. 1984. Unusual DNA sequences associated with the ends of yeast chromosomes. Nature 310: 157-160.

Warburton, P. and H. Willard. 1996. Evolution of centromeric alpha satellite DNA: molecular organization within and between human and primate chromosomes. In Human Genome Evolution (eds. M. Jackson, T. Strachan, and G. Dover), pp. 121-145. BIOS Scientific Publishers, Oxford.

Warburton, P.E., C.A. Cooke, S. Bourassa, O. Vafa, B.A. Sullivan, G. Stetten, G. Gimelli, D. Warburton, C. Tyler-Smith, K.F. Sullivan, G.G. Poirier, and W.C. Earnshaw. 1997. Immunolocalization of CENP-A suggests a distinct nucleosome structure at the inner kinetochore plate of active centromeres. Curr Biol 7: 901-904.

Warburton, P.E., M. Dolled, R. Mahmood, A. Alonso, S. Li, K. Naritomi, T. Tohma, T. Nagai, T. Hasegawa, H. Ohashi, L.C. Govaerts, B.H. Eussen, J.O. Van Hemel, C. Lozzio, S. Schwartz, J.J. Dowhanick-Morrissette, N.B. Spinner, H. Rivera, J.A. Crolla, C. Yu, and D. Warburton. 2000. Molecular Cytogenetic Analysis of Eight Inversion Duplications of Human Chromosome 13q That Each Contain a Neocentromere. Am J Hum Genet 66: 1794-1806.

Warburton, P.E., T. Haaf, J. Gosden, D. Lawson, and H.F. Willard. 1996. Characterization of a chromosome-specific chimpanzee alpha satellite subset: evolutionary relationship to subsets on human chromosomes. Genomics 33: 220-228.

Warburton, P.E., J.S. Waye, and H.F. Willard. 1993. Nonrandom localization of recombination events in human alpha satellite repeat unit variants: implications for higher-order structural characteristics within centromeric heterochromatin. Mol Cell Biol 13: 6520-6529.

Warburton, P.E., R. Wevrick, M.M. Mahtani, and H.F. Willard. 1991. Pulsed field and two-dimensional gel electrophoresis of long arrays of tandemly repeated DNA: analysis of human centromeric alpha satellite. In Methods, applications and theories of pulsed field gel electrophoresis (eds. M. Burmeister and L. Ulanovsky). Humana Press Inc., Clifton, N. J.

Warburton, P.E. and H.F. Willard. 1990. Genomic analysis of sequence variation in tandemly repeated DNA: evidence for localized homogeneous sequence domains within arrays of alpha satellite DNA. J. Mol. Biol. 216: 3-16.

Warburton, P.E. and H.F. Willard. 1992. PCR amplification of tandemly repeated DNA: analysis of intra- and interchromosomal sequence variation and homologous unequal crossing-over in human alpha satellite DNA. Nucleic Acids Res 20: 6033-6042.

Warburton, P.E. and H.F. Willard. 1995. Interhomologue sequence variation of alpha satellite DNA from human chromosome 17: evidence for concerted evolution along haplotypic lineages. J Mol Evol 41: 1006-1015.

Watanabe, H., A. Fujiyama, M. Hattori, T.D. Taylor, A. Toyoda, Y. Kuroki, H. Noguchi, A. BenKahla, H. Lehrach, R. Sudbrak, M. Kube, S. Taenzer, P. Galgoczy, M. Platzer, M. Scharfe, G. Nordsiek, H. Blocker, I. Hellmann, P. Khaitovich, S. Paabo, R. Reinhardt, H.J. Zheng, X.L. Zhang, G.F. Zhu, B.F. Wang, G. Fu, S.X. Ren, G.P. Zhao, Z. Chen, Y.S. Lee, J.E. Cheong, S.H. Choi, K.M. Wu, T.T. Liu, K.J. Hsiao, S.F. Tsai, C.G. Kim, O.O. S, T. Kitano, Y. Kohara, N. Saitou, H.S. Park, S.Y. Wang, M.L. Yaspo, and Y. Sakaki. 2004. DNA sequence and comparative analysis of chimpanzee chromosome 22. Nature 429: 382-388.

Waterston, R.H., K. Lindblad-Toh, E. Birney, et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520-562.

Waye, J.S., L.A. Creeper, and H.F. Willard. 1987a. Organization and evolution of alpha satellite DNA from human chromosome 11. Chromosoma 95: 182-188.

Waye, J.S., S.J. Durfy, D. Pinkel, S. Kenwrick, M. Patterson, K.E. Davies, and H.F. Willard. 1987b. Chromosome-specific alpha satellite DNA from human chromosome 1: Hierarchical structure and genomic organization of a polymorphic domain spanning several hundred kilobase pairs of centromeric DNA. Genomics 1: 43-51.

Waye, J.S., S.B. England, and H.F. Willard. 1987c. Genomic organization of alpha satellite DNA on human chromosome 7: evidence for two distinct alphoid domains on a single chromosome. Mol. Cell. Biol. 7: 349-356.

Waye, J.S. and H.F. Willard. 1985. Chromosome-specific alpha satellite DNA: nucleotide sequence analysis of the 2.0 kilobasepair repeat from the human X chromosome. Nucl. Acids Res. 12: 2731-2743.

Waye, J.S. and H.F. Willard. 1986. Structure, organization, and sequence of alpha satellite DNA from human chromosome 17: Evidence for evolution by unequal crossing-over and an ancestral pentamer repeat shared with the human X chromosome. Mol. Cell. Biol. 6: 3156-3165.

Waye, J.S. and H.F. Willard. 1987. Nucleotide sequence heterogeneity of alpha satellite repetitive DNA: a survey of alphoid sequences from different human chromosomes. Nucl. Acids Res. 15: 7549-7580.

Waye, J.S. and H.F. Willard. 1989a. Chromosome-specificity of satellite DNAs: short and long-range organization of a diverged dimeric subset of human alpha satellite from chromosome 3. Chromosoma 95: 275-280.

Waye, J.S. and H.F. Willard. 1989b. Concerted evolution of alpha satellite DNA: evidence for species specificity and a general lack of sequence conservation among alphoid sequences of higher primates. Chromosoma 98: 273-279.

Weier, H.U., J.N. Lucas, M. Poggensee, R. Segraves, D. Pinkel, and J.W. Gray. 1991. Two-color hybridization with high complexity chromosome-specific probes and a degenerate alpha satellite probe DNA allows unambiguous discrimination between symmetrical and asymmetrical translocations. Chromosoma 100: 371-376.

Wevrick, R., W.C. Earnshaw, P.N. Howard-Peebles, and H.F. Willard. 1990. Partial deletion of alpha satellite DNA associated with reduced amounts of the centromere protein CENP-B in a mitotically stable human chromosome rearrangement. Mol. Cell Biol. 10: 6374-6380.

Wevrick, R. and H.F. Willard. 1989. Long-range organization of tandem arrays of alpha satellite DNA at the centromeres of human chromosomes: High frequency array-length polymorphism and meiotic stability. Proc. Natl. Acad. Sci. USA 86: 9394-9398.

Wevrick, R. and H.F. Willard. 1991. Physical map of the centromeric region of human chromosome 7: relationship between two distinct alpha satellite arrays. Nucl. Acids Res. 19: 2295-2301.

Wevrick, R., V.P. Willard, and H.F. Willard. 1992. Structure of DNA near long tandem arrays of alpha satellite DNA at the centromere of human chromosome 7. Genomics 14: 912-923.

Willard, H.F. 1985. Chromosome-specific organization of human alpha satellite DNA. Am. J. Hum. Genet. 37: 524-532.

Willard, H.F. 1991. Evolution of alpha satellite. Curr Opin Genet Dev 1: 509-514.

Willard, H.F. 1998. Centromeres: the missing link in the development of human artificial chromosomes. Curr Opin Genet Dev 8: 219-225.

Willard, H.F. 2001. Neocentromeres and human artificial chromosomes: an unnatural act. Proc Natl Acad Sci U S A 98: 5374-5376.

Willard, H.F., K.D. Smith, and J. Sutherland. 1983. Isolation and characterization of a major tandem repeat family from the human X chromosome. Nucleic Acids Res 11: 2017-2033.

Willard, H.F. and J.S. Waye. 1987a. Chromosome-specific subsets of human alpha satellite DNA: Analysis of sequence divergence within and between chromosomal subsets and evidence for an ancestral pentameric repeat. J. Mol. Evol. 25: 207-214.

Willard, H.F. and J.S. Waye. 1987b. Hierarchical order in chromosome-specific human alpha satellite DNA. Trends Genet. 3: 192-198.

Willard, H.F., R. Wevrick, and P.E. Warburton. 1989. Human centromere structure: organization and potential role of alpha satellite DNA. In Aneuploidy: Mechanisms of Origin (ed. M.A. Resnick), pp. 9-18. Alan R. Liss, Inc., New York.

Williams, B.C., T.D. Murphy, M.L. Goldberg, and G.H. Karpen. 1998. Neocentromere activity of structurally acentric mini-chromosomes in Drosophila. Nat Genet 18: 30-37.

Wolfe, J., S.M. Darling, R.P. Erickson, I.W. Craig, V.J. Buckle, P.W. Rigby, H.F. Willard, and P.N. Goodfellow. 1985. Isolation and characterization of an alphoid centromeric repeat family from the human Y chromosome. J Mol Biol 182: 477-485.

Wong, A.K., F.G. Biddle, and J.B. Rattner. 1990. The chromosomal distribution of the major and minor satellite is not conserved in the genus Mus. Chromosoma 99: 190-195.

Wong, A.K.C. and J.B. Rattner. 1988. Sequence organization and cytological localization of the minor satellite of mouse. Nucl. Acids Res. 16: 11645-11661.

Wood, V., R. Gwilliam, M.A. Rajandream et al. 2002. The genome sequence of Schizosaccharomyces pombe. Nature 415: 871-880.

Yang, J.W., C. Pendon, J. Yang, N. Haywood, A. Chand, and W.R. Brown. 2000. Human mini-chromosomes with minimal centromeres. Hum Mol Genet 9: 1891-1902.

Yang, T.P., S.K. Hansen, K.K. Oishi, O.A. Ryder, and B.A. Hamkalo. 1982. Characterization of a cloned repetitive DNA sequence concentrated on the human X chromosome. Proc Natl Acad Sci U S A 79: 6593-6597.

Yen, T.J., D.A. Compton, D. Wise, R.P. Zinkowski, B.R. Brinkley, W.C. Earnshaw, and D.W. Cleveland. 1991. CENP-E, a novel human centromere-associated protein required for progression from metaphase to anaphase. EMBO J. 10: 1245-1254.

Yen, T.J., G. Li, B.T. Schaar, I. Szilak, and D.W. Cleveland. 1992. CENP-E is a putative kinetochore motor that accumulates just before mitosis. Nature 359: 536-539.

Yoda, K., S. Ando, S. Morishita, K. Houmura, K. Hashimoto, K. Takeyasu, and T. Okazaki. 2000. Human centromere protein A (CENP-A) can replace histone H3 in nucleosome reconstitution in vitro. Proc Natl Acad Sci U S A 97: 7266-7271.

Yoda, K., S. Ando, A. Okuda, A. Kikuchi, and T. Okazaki. 1998. In vitro assembly of the CENP-B/alpha-satellite DNA/core histone complex: CENP-B causes nucleosome positioning. Genes Cells 3: 533-548.

Yunis, J.J. and O. Prakash. 1982. The origin of man: a chromosomal pictorial legacy. Science 215: 1525-1530.

Zhong, C.X., J.B. Marshall, C. Topp, R. Mroczek, A. Kato, K. Nagaki, J.A. Birchler, J. Jiang, and R.K. Dawe. 2002. Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14: 2825-2836.

Zinkowski, R.P., J. Meyne, and B.R. Brinkley. 1991. The centromere-kinetochore complex: a repeat subunit model. J. Cell. Biol. 113: 1091-1110.