Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 587

Gene Mapping in Flycatchers

NICLAS BACKSTRÖM

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 UPPSALA ISBN 978-91-554-7380-8 2009 urn:nbn:se:uu:diva-9513                        !" #$$  $%$$ &  '  &    & ('  ') *'   +      ')

 ,- .) #$$) /  0      ') 1   )                          234) 3# ) ) 56. 437!72287493$73)

5     &     & ' +               '    & ' '    '     &     &   &&     ) *'   &+    '   & +''  &        &&       '    ) : +  '  &           &          &  '  , &    &     '   ' ,        & '   &' ;   <) *'     ''  &           +        '  !$$  ) *'      &    &             '   ' '      '  &  &    '=     ,      &      7   ) >       & '  +  &       '     ' '        '  +  & ' =  &         &     ) 1   & 7 ,   ,    7   , ? ;@<    &'     '      @    +' A 2$ ,    ' B #$)$$$ , +        '        ) 1   & 48 7 ,        ' '      ' '   &'  '  &' ;    <   ' '    '           ' ' '    &    &        C  '  )

     &' 6.(  , ?      

 ! " #      $  # %     # & '(#    # $)*+,-.  #   

D . ,- #$$

566. !"2!7"#!8 56. 437!72287493$73  %  %%% 72!9 ;' %CC ),)C E F %  %%% 72!9< List of papers

This thesis is based on the following papers, referred to by their Roman numerals.

I Backström, N., Brandström, M., Gustafsson, L., Qvarnström, A., Cheng, H., and Ellegren, H. 2006. Genetic mapping in a natural population of collared flycatchers (Ficedula albicol- lis): conserved synteny but gene order rearrangements on the avian Z chromosome. Genetics 174: 377-386.

II Backström, N., Fagerberg, S., and Ellegren, H. 2007. Genom- ics of natural populations: a gene-based set of reference markers evenly spread across the avian genome. Molecular 17: 964-980.

III Backström, N., Karaiskou, N., Leder, E.H., Gustafsson, L., Primmer, C.R., Qvarnström, A., and Ellegren, H. 2008. A gene-based genetic linkage map of the (Fi- cedula albicollis) reveals extensive synteny and gene order conservation during 100 million years of avian evolution. Genetics 179: 1479-1495.

IV Backström, N., Gustafsson, L., Qvarnström, A., and Ellegren, H. 2006. Levels of linkage disequilibrium (LD) in a wild bird population. Biology Letters 2: 435-438.

V Backström*, N., Lindell*, J., Zhang, Y., Palkopoulou, E., Saetre, G-P., and Ellegren, H. 2008. A high-density scan of the Z-chromosome in Ficedula flycatchers reveals candidate loci for diversifying selection and speciation. Manuscript. * = shared first authorship.

Papers number I (© Genetics Society of America), II (© Wiley-Blackwell Publishing, Inc.), III (© Genetics Society of America) and IV (© The Royal Society, UK) are reproduced with permission from the publishers. Additional papers not included in the thesis

Lindgren, G., Backström, N., Swinburne, J., Hellborg, L., Einarsson, A., Sandberg, K., Vilà, C., Binns, M. and Ellegren, H. 2004. Limited number of patrilines in horse domestication. Nature Genetics 36: 335-336.

Backström, N., Ceplitis, H., Berlin, S., and Ellegren, H. 2005. Gene conver- sion drives the evolution of HINTW, an ampliconic gene on the female- specific avian W chromosome. Molecular Biology and Evolution 22: 1992- 1999.

Berlin, S., Brandström, M., Backström, N., Axelsson, E., Smith, N.G.C., and Ellegren, H. 2006. Substitution rate heterogeneity and the male mutation bias. Journal of Molecular Evolution 62: 226-233.

Piece of art on front page, “Flugsnappare”, by Axel Backström and Ebba Backström © Axel and Ebba Backström 2008

Contents

Introduction ...... 9 The avian genome ...... 11 Background ...... 11 Karyotypes and genome sizes of ...... 12 Genomic properties and mapping ...... 14 Polymorphisms ...... 14 Recombination ...... 16 Conservation ...... 17 The collared and the pied flycatcher ...... 19 General information ...... 19 The Ficedula flycatchers as ecological models ...... 20 The collared flycatcher as a genetic model?...... 21 Analysis methods ...... 23 Microsatellite genotyping methods ...... 23 SNP genotyping methods ...... 23 SNPStream system ...... 23 GoldenGate Assay ...... 24 Gene mapping methods ...... 27 Background ...... 27 Pedigree-based approaches ...... 28 Population-based approaches ...... 34 Estimating linkage disequilibrium ...... 37 Research aims ...... 42 General aims ...... 42 Specific aims ...... 42 Summaries of papers ...... 44 Paper I: Genetic mapping in a natural population of collared flycatchers (Ficedula albicollis): conserved synteny but gene order rearrangements on the avian Z chromosome...... 44 Paper II: Genomics of natural bird populations: a gene-based set of reference markers evenly spread across the avian genome...... 45

Paper III: A gene-based genetic linkage map of the collared flycatcher (Ficedula albicollis) reveals extensive synteny and gene-order conservation during 100 million years of avian evolution...... 47 Paper IV: Levels of linkage disequilibrium in a wild bird population. .... 50 Paper V: A high-density scan of the Z-chromosome in Ficedula flycatchers reveals candidate loci for diversifying selection and speciation...... 52 Prospects for the future ...... 55 Svensk sammanfattning ...... 57 Acknowledgements ...... 61 References ...... 63

Abbreviations

nucleotide diversity

mutation rate

theta, the population mutation rate, 4Ne

W Watterson’s theta

Tajima’s theta

H Fay and Wu’s theta

population recombination rate, 4Ner bp base pair(s)

CATS comparative anchor tagged sequences cM centiMorgan

|D’| absolute value of D’, measure of LD

DNA deoxy-ribo-nucleic acid

F1 first generation offspring after a cross

Gb giga base pair indel insertion/deletion kb kilo base pair

LD linkage disequilibrium

Mb Mega base pair

PCR polymerase chain reaction

QTL quantitative trait loci r proportion of recombinant gametes with respect to two loci r2 correlation of alleles at different loci, measure of LD

S number of segregating sites

SNP single nucleotide polymorphism

SSR / STR short sequence repeat / short tandem repeat, microsatellite

Introduction

The variety of life forms inhabiting Earth is striking. No less than 1.5 million have been described by biologists and, dependent on the definition of a species, guesstimates of the total number of species populating Earth range between 10 and 80 million. Considering that current inhabitants probably only constitute a minority of the number of species that historically have been around, the total number of different living forms that have once been present must be immense. How did all these variants evolve and what are the forces that make the forms maintain genetic and phenotypic diversity in the face of selective pressures to adapt to the environment? To understand and explain these questions one has to start with the basics - determine the loca- tion and characterize the function of the genetic components to phenotypic variants.

A major aim of evolutionary biologists is to find the genetic basis of traits that are of importance for individual fitness in natural settings. Not before we have that knowledge will it be possible to to track the fate of different genotypic setups in wild populations (Ellegren and Sheldon 2008) and eva- luate the relative importance of mutation, drift and selection in shaping phe- notypic diversity (Mitchell-Olds et al. 2007). However, the information is still poor regarding what to expect in terms of numbers and nature of loci involved in a particular trait. Are there structural or regulatory elements or a combination of them (Carroll 2008; Hoekstra and Coyne 2007) and how large a role plays gene-family expansions and contractions? Are there few loci with major effects, many loci with minor effects or a mix? In addition, it is neither known how common pleiotropy is, i.e. the amount of compound functions carried out by one specific gene, nor what the general level of epi- stasis is, i.e. the joint effect of different genes in determining a trait value. Moreover, the traits dependence on the environment are weakly understood (Slate 2005). Even in model organisms like humans or mice the knowledge of the genetic background to phenotypes is sparse and there are cases where it has been shown that the reproducibility of mapping efforts is low, for ex- ample can studies in different populations point at disparate genes as major determinants of the focal trait. This incongruence generates exhaustive lists of candidates (Weiss 2008) and makes it hard to generalize and indicates that there is a need for extensive research to find the genetic basis of traits in different environmental settings and/or geographical areas. The lack of

9 knowledge is obviously even more pronounced in non-model species, and there is no comparative data at hand to indicate if the low reproducibility seen in inter-population comparisons of humans is similarly low in other systems. Actually, with the exception of a handful of study species, we have not yet reached the stage where it is possible to find the important genes in a single population.

The main focus of this thesis is to develop genetic tools (linkage maps) to use when searching for the genetic components underlying evolutionary important phenotypic traits in the collared flycatcher (Ficedula albicollis). I also examine some alternative approaches of mapping traits in the collared and the pied flycatcher (Ficedula hypoleuca). In the introductory part pre- ceding the actual research papers I will give a background to and describe the methods of the most commonly used mapping procedures. The emphasis is put on the methods used for establishing linkage maps and estimating linkage disequilibrium but I will also touch upon the subsequent steps in- volving phenotypic measures. To put large scale genetics of natural avian populations in historical perspective there is a part dealing with the known features of the avian genome. Here, the focus is on the properties that are of importance for forthcoming mapping analyses and I will also discuss why it is almost crucial to have a reference genome sequence from a reasonably close relative available when starting a large-scale mapping effort in a pre- viously uncharacterized organism. In addition, I introduce the study species and argue for why we have selected to work with the Ficedula flycatchers. In the summary of papers section I recapitulate in brief the research I have done, paper by paper and, finally, in the prospects section there is a small piece about what I anticipate will happen in the field in the near future.

10 The avian genome

Background The era of vertebrate full genome sequence analysis started with the release of the first draft of the human genome in 2001 (Lander et al. 2001) although full genome sequences for prokaryotes had been available since the publica- tion of the proteobacterium Haemophilus influenzae in 1995 (Fleischmann et al. 1995). Since these initial steps some 15 whole-genome se- quences have become publicly available (GOLD, Genomes OnLine Data- base v 2.0, http://www.genomesonline.org). A selection of examples include the house mouse (Mus musculus) (MGSC 2002), the Norway rat (Rattus norvegicus) (Gibbs et al. 2004), chimpanzee (Pan troglodytes) (TCSAC 2005), the domestic dog (Canis familiaris) (Lindblad-Toh et al. 2005) and the rhesus macaque (Macaca mulatta) (Gibbs et al. 2007). There are more genomes in the pipe-line for sequencing and annotation and to get a feeling for the progression in the field it is worth mentioning that at today’s date (September 14th, 2008) there is a total of 4,006 ongoing or finished genome projects among which 854 are published and 990 are ongoing eukaryotic projects. Before the availability of DNA sequence data for complete ge- nomes, genetic analyses were restricted to short and/or fragmented sequence samples and in some vertebrate groups (i.e. birds) there is still very limited availability of full-genome sequences and evolutionary analyses are limited to restricted pieces of the genome. Comparative genomics, the analysis of genetic function and genome organization through comparisons of genomes of different taxa, is a very young field of research in vertebrates, starting off with the completion of the mouse genome sequence in 2002 which made it possible to compare the genomes of human and mouse. In birds however, comparative genomics in its true sense has just begun.

The only avian species with its genome sequenced and annotated is chicken (Gallus gallus). It was a female inbred red jungle-fowl that was sequenced and the annotated draft genome sequence was officially released in 2004 (ICGSC 2004). This was 68 years after the first genetic map was produced for this species (Hutt 1936), a partial map consisting of 7 sex-linked genes. Hutt continued working on the sex-chromosome map and after a series of attempts he managed to extend the map to include 12 genes in 1960 (Hutt 1960). More recently, a comprehensive genetic map, predominantly based

11 on microsatellites was published (Groenen et al. 2000). In addition to genetic mapping efforts and genome sequencing, several studies have aimed at de- termining chromosome number, individual chromosome sizes and the karyo- type of chicken, e.g. (Auer et al. 2004; Fillon et al. 1998; Masabanda et al. 2004)

Karyotypes and genome sizes of birds At this stage at least 276 bird species have had their karyotype determined and/or their genome size (C-values) estimated by cytological investigations (http://www.genomesize.com (Gregory et al. 2007)). All these studies have revealed that birds have smaller genomes than most characterized mammals, amphibians and reptiles (Gregory et al. 2007) and a specific setup with only a few large- (macro-chromosomes) but many small chromosomes (micro- chromosomes) (Figure 1a and 1b). This chromosomal setup, or karyotype, seems to be strongly conserved among avian lineages even for very distantly related taxa (http://www.genomesize.com (Gregory et al. 2007)) and the only exceptions so far found include some birds of prey that have fewer chromosomes and lack the extreme size difference among chromosomal classes seen in other orders (Bed'Hom et al. 2003; de Oliveira et al. 2005).

The diploid chromosome number (2n) is typically 78-80 and of these are 10- 11 (2n = 20–22) classified as macro- (possible to distinguish individual chromosomes by flow cytometry and microscopy) and 28 (2n = 56) as mi- cro-chromosomes. In addition there is a set of sex-chromosomes (2n = 2). Females are heterogametic, carrying one each of the generally highly dimor- phic sex chromosomes Z and W and males are homogametic carrying double copies of the Z-chromosome. Interestingly, ratites (Palaeognathae) have less differentiated sex-chromosomes than all other birds, the Z- and the W- chromosome being almost similar in size and showing a high degree of banding homology (Ansari et al. 1988; Shetty et al. 1999). A pattern of less differentiated Z and W-chromosomes is also found in the California condor (Raudsepp et al. 2002).

12 Figure 1a (above) and 1b (below). A comparison of chromosome counts (Figure 1a) and C-values (Figure 1b) for the four major vertebrate classes Amphibia (n=907), Reptilia (exc. Aves, n=406), Aves (n=276) and Mammalia (n=614) as summarized from the Eukaryotic genome size databases (Gregory et al. 2007). Note the large number of chromosomes and small genome sizes of birds (Aves) compared to other classes.

In addition to being exceptionally conserved at the karyotype level, fluores- cent in situ hybridizations (FISH) with probes developed from chicken ma- cro-chromosomes and painted onto metaphase chromosome spreads of a diverse array of other bird species have revealed that avian chromosomes have experienced low levels of large scale inter-chromosomal rearrange- ments in general (Derjusheva et al. 2004; Fillon et al. 2007; Guttenbach et al. 2003; Itoh et al. 2006; Kasai et al. 2003; Nishida-Umehara et al. 2007; Raudsepp et al. 2002; Schmid et al. 2005; Shetty et al. 1999; Shibusawa et al. 2001; Shibusawa et al. 2004a; Shibusawa et al. 2004b) reviewed in (Grif-

13 fin et al. 2007). There are only two major rearrangements detected when comparing chicken and birds (Passeriformes). The first is a fission of the ancestral chromosome 1 in the lineage leading to (Derju- sheva et al. 2004) and the second is a fusion of the ancestral chromosomes 4 and 10 in chicken (Derjusheva et al. 2004; Nishida-Umehara et al. 2007; Raudsepp et al. 2002), an event that must have occurred after the split from other well characterized galliform birds like turkey (Reed et al. 2005; Shibu- sawa et al. 2004a) and common and Japanese quail (Kayang et al. 2006; Kikuchi et al. 2005; Shibusawa et al. 2001; Shibusawa et al. 2004a), hence, approximately in the last 30 million years (Dimcheff et al. 2002). Besides these autosomal changes, the Z-chromosome in particular has been involved in a few large scale inversions. Parsimony analyses indicate that the ance- stral state of the Z-chromosome is acrocentric but that recurrent inversions have happened independently in different lineages resulting in submetacen- tric or metacentric Z-chromosmes. In addition there has been addition of heterochromatic DNA and perhaps also events of centromere loss and gain (Griffin et al. 2007). As indicated in the previous paragraph, the few birds of prey so far examined have a differing chromosomal setup indicating that the rate of chromosomal rearrangements has been higher, perhaps already in the lineage leading to the bird of prey order Falconiformes (Bed'Hom et al. 2003; de Oliveira et al. 2005; Griffin et al. 2007; Nanda et al. 2006).

Genomic properties and mapping Polymorphisms A characteristic property of the avian genome is the scarcity of repeat ele- ments, true as well for short and long tandem sequence repeats like microsa- tellites (SSRs, STRs) and minisatellites, as for short and long interspersed repetitive elements (SINES and LINES) (ICGSC 2004; Primmer et al. 1997). The only relatively frequently occurring repeat is the chicken repeat 1 (CR1), a LINE transposon that was most active far back in the early radia- tion of birds and that can be found in truncated copies in genomes of extant birds (ICGSC 2004). In contrast to microsatellites and other repeat polymor- phisms, single nucleotide polymorphisms (SNPs) seem to occur at a high frequency, at least in those few species of birds that have so far been screened (Edwards and Dillon 2004; ICPMC 2004). The sample size of in- vestigated species is still small though and we cannot be confident that this is a general pattern, but large scale screening efforts in wider arrays of species will certainly reveal the pattern of SNP diversity.

The selection of genetic markers has an effect on the power and reliability of both mapping efforts and of population genetic analysis. Traditionally, the

14 most frequently used markers in genetic analyses have been microsatellites. The rationale behind choosing microsatellites has been the high levels of heterozygosity, their abundance and the straightforward genotyping methods in the form of length separation on gels. The high level of polymorphism for microsatellites has been ascribed to the approximately thousand-fold higher mutation rate of these sequences compared to common single nucleotide point mutations. The higher mutation rate is probably a result of replication slippage and has been shown to increase with repeat length (Ellegren 2004). On the other hand it has sometimes proven difficult to separate true homolo- gy (identical by descent) from homoplasy (identical by state) due to recur- rent mutations. Consequently, the interpretation of microsatellite data can sometimes be dubious, particularly for loci with very many alleles present. This is generally not a problem in mapping studies looking at segregation of alleles from one generation to another where de novo mutations are rare, but can cause biases in population genetic estimates of demographic parameters, for instance gene flow.

The absolute majority of SNPs are bi-allelic, i.e. there are only two alleles segregating at a certain nucleotide position. In theory, multiple mutations can occur at a single site (multiple hits), creating >2 alleles segregating at that particular site. However, this scenario is very unlikely given the low rate of point mutations, average rates in birds have been estimated to 3.6*10-9 per site and generation (Axelsson et al. 2004). Because of the few alleles segre- gating at each site, SNPs are generally less informative than microsatellites. The probability that an individual is heterozygous for a marker depends on the number of alleles and the respective frequencies of each allele at a locus. In general, if there are n alternative alleles at a locus there are

n*(n + 1) / 2 possible genotypes (Hartl and Clark 1997). If these alleles occur at frequen- cies n1, n2, …, ni (i is the total number of alleles at the locus) the probability of being heterozygous is

2 2 2 2 1 – (n1 + n2 + … + ni ) = 1 - ni as summarized over all values of i, i.e. 1 to i (ni is the frequency of the i:th allele), given that the population is in Hardy-Weinberg equilibrium (Hartl and Clark 1997). For example, for a locus with two segregating alleles at frequencies 0.5 and 0.5 (as informative as a bi-allelic SNP marker can be), the probability of being heterozygote is 1 - 2 * 0.52 = 0.5. In comparison, for a locus with 10 segregating alleles, each at frequency 0.1 the probability of being heterozygote will be 1 - 10 * 0.12 = 0.9. Despite this obvious caveat with SNPs, their relative abundance makes it possible to combine informa-

15 tion over sites within a certain region, i. e. to look at haplotypes rather than genotypes, thereby increasing the information level of individuals. Unless analysis involves very short sequences where recombination can be neg- lected, this procedure is only applicable to pedigree studies. In combination with the relatively cost efficient genotyping techniques available (see be- low), SNPs have been the marker of choice in an increasing number of stu- dies, not only in birds.

Recombination The specific setup of the avian karyotype, with many small micro- chromosomes and only a few large macro-chromosomes has an impact on the expected outcome of a mapping effort. If we consider the initial effort to develop a linkage map, the possibility to detect linkage between adjacent markers is strongly dependent on the amount of recombination occurring between them. Specifically, for any given physical distance, it is easier to detect linkage between markers in regions of low rather than high recombi- nation rate. For segregation to occur correctly during meioses at least one crossing over event (chiasma) per chromosome arm is required. Studies of meiotic recombination in yeast (Saccharomyces cerevisiae) show that there is also interference between chiasmata occurring on the same chromosome arm. This reduces the maximum number of events per arm to, on average, between one and two (Jones and Franklin 2006). Hence, a strong correlation between chromosome(-arm) size and rate of recombination is expected, a pattern very obvious when comparing for example the recombination rates of chicken macro- and micro-chromosomes; these chromosomal classes have an average recombination rate of 2.8 and 6.4 cM / Mb, respectively. The actual range of rates is higher and the lowest and highest rates for individual chromosomes are estimated to 2.5 and 21 cM/Mb, respectively (ICGSC 2004; Schmid et al. 2005).

The recombination rates are known to increase in telomeric regions, i.e. to- wards chromosome termini and decrease close to the centromere (Nachman 2002; Nachman and Churchill 1996; Schmid et al. 2005) and high-density genetic mapping studies, population based likelihood analyses and sperm typing experiments have shown that local rates of recombination can vary at the scale of several orders of magnitude (Jeffreys et al. 2001; Jeffreys et al. 2000; Kong et al. 2002; McVean et al. 2004; Myers et al. 2006). The ulti- mate cause for recombination variation on the regional and local scale is not completely understood. Some authors believe that recombination is the force that drives local base composition (Meunier and Duret 2004) while others have suggested that the underlying sequence context might be a determinant of recombination rate (Myers et al. 2008; Myers et al. 2006; Spencer et al. 2006). Regardless, variation in recombination rate is directly linked to the

16 power and resolution of mapping studies and might even cause biases in the interpretation of e.g. the number of genes involved in a trait (Boyle and Noor 2004). The probability to detect linkage is thus dependent on where on a chromosome the markers are located. If markers are located close to chro- mosome ends or on each side of one or several recombination hotspot there is a lower chance of detecting linkage. On the other hand, if one has a very dense marker map, the power to resolve individual order between tightly linked loci obviously increases in regions of high recombination.

Conservation The completion of the chicken genome was the tool needed to expand avian genetic analyses in natural populations from being founded on small pieces of single genes, minute sets of microsatellites or other anonymous markers to large scale analysis involving hundreds or thousands of loci or selected sets of markers (Ellegren 2005). With information from the chicken genome sequence it has become possible to extract sequence information for marker development in any bird species of choice. Given the low rate of large-scale inter-chromosomal reorganization (see above), it is also a relatively straightforward task to establish a selection of markers with an expected genomic distribution, e.g. located on a specific chromosome, a specific class of chromosomes or spread across many chromosomes, even in species of distant relationship to the chicken. However, most ecologically well studied species are from the order Passeriformes, a clade that has a deep phylogenet- ic relation to the order Galliformes to which chicken belongs. These orders share a common ancestor approximately 100 million years back in time (van Tuinen et al. 2000) which means that sequence divergence is sometimes very high. One way to overcome this obstacle is to align the chicken se- quence to an orthologous region in another species to look for sequence con- servation, a pattern indicative of purifying selection. Since chicken is the only bird species with its genome fully sequenced, the sequence comparisons in some cases have to be made to non-avian vertebrates. There is a shortcut though; large-scale EST sequencing efforts in other birds have generated cDNA sequences of quite a high number of genes where the orthologues sometimes can be traced (Axelsson et al. 2008). However, these reads rarely include the entire gene sequence.

The near finishing point of the second avian genome sequence, that of the zebra finch (Taeniopygia guttata), assures that we soon will have more de- tailed knowledge about bird genomics. A web browser is available at http://genome.ucsc.edu/cgi-bin/hgGateway?db=taeGut1 for searching the zebra finch genome sequence, but the official reference publication from the sequencing consortium is still ahead of us. With the genome sequence of two distantly related bird species at hand it will be possible to investigate the

17 degree of chromosome stability in more detail than has previously been possible. We already know from fluorescent in situ hybridizations that large scale inter-chromosomal rearrangements are uncommon. However, there are only preliminary data at hand to show if this scales down to gene clusters, individual genes or pieces of genes and data on the degree of intra- chromosomal rearrangements is sparse except for on the scale involving large chromosome pieces (Dawson et al. 2007; Dawson et al. 2006; Hale et al. 2008; Hansson et al. 2005). This is particularly important information for later stages of the mapping procedure since the possibility to use the infor- mation from already annotated genomes relies heavily on the degree of re- gional conservation between the model species and the focal species. Given that we find evidence for conservation at the regional and local levels it will be possible to screen model species genome assemblies for candidate genes in regions indicative of housing genetic loci underlying traits of interest.

Besides this, chicken is a very important food source around the world and breeders focus intensely on finding the genetic basis to production traits. This adds a particularly interesting aspect to the chicken being a model- organism for bird evolutionary genomics, since very many different morpho- logical and physiological QTL have been mapped to a specific location in chicken (Abasht et al. 2006), QTL that might affect also evolutionary impor- tant traits in natural populations.

18 The collared and the pied flycatcher

General information The collared flycatcher (Ficedula albicollis) and the pied flycatcher (Ficedu- la hypoleuca) are two of four black and white flycatchers from the Ficedula (family Muscicapidae) that inhabit the western Palearctic region during breeding season and that migrate to wintering grounds in sub-saharan . It has been suggested that breeding was restricted to geographically separated refugia during the Pleistocene glaciation events and that recurrent isolations have resulted in the formation of these species. Pollen analyses together with phylogeographic studies of deciduous trees point at a geo- graphic distribution of refugia that are reflected in the current distribution ranges of the four species, summarized in (Saetre et al. 2001a). The sug- gested refugium in north-western Africa coincides with the current distribu- tion range of the atlas flycatcher (Ficedula speculigera), the refugium on the Iberian Peninsula with parts of the distribution range of the pied flycatcher (see below), the refugium on the Apennine Peninsula with parts of the cur- rent distribution of the collared flycatcher (see below) and the refugium on the with the distribution range of the semi-collared flycatcher (Fice- dula semitorquata) (Saetre et al. 2001a). The atlas flycatcher was previously considered a subspecies to the pied flycatcher but genetic analysis suggests it should be given species status (Saetre et al. 2001b). The phylogenetic rela- tionship between the four species is not completely resolved. It seems most likely that the collared and the pied flycatcher are the closest relatives, with the atlas flycatcher as a close outgroup and the semi-collared flycatcher as a distant outgroup. This pattern is supported both from autosomal and mtDNA data (Saetre et al. 2003). However, bootstrap support values are rather low and data from the Z-chromosome give a different tree (Saetre et al. 2003).

The breeding range of the collared flycatcher is from central Italy in the west to western Russia in the east and the distribution edge to the north runs through the Czech Republic and Poland. The species also breeds on the Bal- tic Sea isles Öland and Gotland (del Hoyo et al. 2006). The census-based estimated total population size is 340.000 – 760.000 individuals and the den- sity of birds is highest in regions of deciduous forest in Czech Republic, Poland and Rumania and on Gotland () (del Hoyo et al. 2006). Popu- lation density of collared flycatcher is strongly dependent on suitable nesting

19 sites, preferably cavities in wood, and can be significantly enhanced by add- ing artificial nesting possibilities like nest-boxes. The pied flycatcher has a much wider distribution that ranges from the Iberian Peninsula in the west to central Siberia in the east and from central eastern in the south to the northernmost areas of Scandinavia in the north. The estimated census popu- lation size of pied flycatcher is about 10 times that of the collared flycatcher. The number of breeding pairs in the eastern parts of the distribution has been estimated to roughly 3 million and the European population to approximate- ly 5.25 million (del Hoyo et al. 2006).

In parts of their distribution ranges, in southern Germany, the Czech Repub- lic, in southern Poland and on the islands, the collared and the pied flycatcher breed in sympatry and hybridization occurs at a low frequen- cy. Typically around 5% of breeding pairs are mixed-species pairs (Alatalo et al. 1982; Saetre et al. 1999; Veen et al. 2001) but the frequency is depen- dent on the population density of each species in the area. Both the sympa- tric area in central Europe and on the Baltic Sea islands are most likely sec- ondary contact zones (Saetre et al. 1999) although the central European hy- brid zone is probably very much older than the Baltic sea island zone. The latter assumption is based on circumstantial data from expeditions by Lin- naéus and others and on the fact that the collared flycatcher population is steadily increasing in the region, observations that suggest that the zone might be as young as 100 - 150 years (Alatalo et al. 1990; Saetre et al. 1999). There is strong character displacement, especially among pied fly- catcher males. Individuals breeding in sympatry with collared flycatchers have a much duller coloration, similar to the female plumage while individu- als breeding in allopatric regions are distinctly colored in black and white. Also collared flycatchers have accentuated plumage characteristics in re- gions of co-occurrence, the white forehead patch being larger, the collar wider and the glossiness of the black parts more intense. This probably en- hances species recognition and decreases the occurrence of potentially mala- daptive hybridization (Roskaft et al. 1986).

The Ficedula flycatchers as ecological models The populations of pied and the collared flycatcher in Europe in general and on the Baltic Sea islands Öland and Gotland in particular have been in focus for intense ecological research for more than 25 years. There is an immense amount of data collected on morphology and phenotypic traits of importance for individual fitness in natural settings in particular. Numerous studies have described and/or characterized, and sometimes quantified, the rates and con- sequences of several secondary sexually selected plumage characters (Gus- tafsson et al. 1995), sex-ratio manipulations (Ellegren et al. 1996), cryptic

20 evolution (Merilä et al. 2001), song characteristics (Haavie et al. 2004; Qvarnström et al. 2006), life-history related traits like reproductive success and longevity (Gustafsson and Pärt 1990; Gustafsson and Sutherland 1988; Merilä and Sheldon 2000; Sendecka et al. 2007), level of extra pair copula- tions (Sheldon and Ellegren 1999), interspecific competition (Gustafsson 1987), immune function (Cichón et al. 2003), hybridization (Svedin et al. 2008; Wiley et al. 2007) and reinforcement (Saetre et al. 1997). One specific example is the collared flycatcher males fitness dependence on the size of the white forehead patch (Qvarnström 1997).

The collared flycatcher as a genetic model? To be able to carry out mapping experiments there are three requirements: phenotypic characters to map, a pedigree to follow segregation within and a linkage map to connect the phenotypes onto (Slate 2005). Besides the exten- sive phenotypic data, the most important feature of the collared flycatcher population on Gotland and Öland is the preference for these birds to breed in nest boxes. By designing study areas with a large number of nest boxes it has been possible to do long-term studies that have generated a bank of informa- tion about relatedness between individuals. Throughout the years the popula- tions have been monitored very intensely and it has been possible to identify family material with a sufficient amount of offspring for segregation analy- sis. A typical clutch size of collared flycatcher is 5–7 but occasionally a fe- male can lay up to 8 or 9 eggs (del Hoyo et al. 2006). This is a low number of offspring for genetic mapping but by taking advantage of returning adults producing offspring over several consecutive years with different partners (or sometimes the same) it has been possible to extract pedigrees based on half-sib families for a relatively large number of parents (males).

There are also drawbacks with this species. First of all, the collared flycatch- er has a large proportion of extra pair offspring in the clutches. The average frequency is about 15% (Sheldon and Ellegren 1999) although it may happen that the whole clutch is sired by a male different from the one raising the brood. This means that there is a large drop-out of individuals initially se- lected, reducing the power of analyses. In addition, the collared flycatcher is a trans-saharan migrant, wintering in south-eastern Africa (del Hoyo et al. 2006) and the survival rate of young is expected to be low. This makes it difficult to generate fitness trait values for more than one generation at a time, weakening the statistical power in analysis of co-segregation of geno- types and phenotypes.

One additional aspect of the collared flycatcher as a genetic model species regards the findings that some secondary male sexually selected plumage

21 characters seem to be linked to the Z-chromosome (Saetre et al. 2003). The lack of introgression, the high rate of inter-specific divergence and lower than expected levels of nucleotide diversity on the Z-chromosome compared to the autosomes suggest that recurrent selective sweeps act on the Z- chromosome (Borge et al. 2005b). Furthermore, recent findings through clutch-swap experiments indicate that there might be a significant Z-linked determinant to female preference for male characteristics as well (Saether et al. 2007). Theoretical analyses show that the expectation is an excess of sexually selected traits linked to the Z-chromosome in female heterogametic taxa. This may partly be due to expression of recessive Z-linked variants in the heterogametic sex but also to reduced recombination on the Z- chromosome (recombines only in males) compared to the genomic average (Qvarnström and Bailey 2008). In addition, there are theoretical evidence for a faster evolution of pre- and postzygotic barriers if loci involved are Z- linked (Servedio and Saetre 2003). Hence, the Z-chromosome seem to be the first chromosome of choice if one aims at finding genetic components that are important for male reproductive success and for formation of barriers to gene flow.

22 Analysis methods

Microsatellite genotyping methods Microsatellites (SSRs, STRs) have been used to discover and discard extra pair offspring in the family material and, in combination with gene-based SNP markers, to produce the autosomal genetic map (paper III). All STR analyses have been conducted with fluorescence labeled primers using the colors FAM, HEX and TET. Sometimes pools of STRs have been amplified together in multiplex reactions (Karaiskou et al. 2008; Leder et al. 2008). The general procedure runs as follows; primers designed to hybridize to flanking regions next to length variable STRs are used in PCR where one primer is tagged with fluorescence dye and the other is a non-tagged primer, to produce amplification products corresponding to the repeat plus the flank- ing region. For example, if an individual is polymorphic for an STR with allele lengths of 100 and 102 bp and the total length of the flanking regions, including the primer sequences, is 100 bp, the total amplicon lengths for this individual will be 200 and 202 bp, respectively.

Products are length separated in an electric field through thin capillaries filled with polyacrylamide gel. After length separation the products are flashed with UV light and the fluorescence dye emits light at a certain wave- length. Light emission is captured by a CCD camera. By running a size marker in parallel it is possible to determine the amplicon lengths. Some high capacity capillary instruments are equipped with 96 capillaries making it possible to run 96 individuals in parallel. It is also possible to run several markers at the same time, either if the markers have different size ranges and/or if markers are dyed with different tag colors, e.g. (Karaiskou et al. 2008; Leder et al. 2008).

SNP genotyping methods SNPStream system The SNPstream system (Figure 2) is a mini-sequencing method (Syvänen 2005) based on allele specific extension of primers located immediately up- stream of the polymorphism to type (Bell et al. 2002). Up to 48 SNPs with

23 the same nucleotide variation are run in multiplex PCR. There is one PCR reaction for each individual and 96 individuals can be run together in a mi- crotiter plate. The PCR products are cleaned from dNTPs and remaining primers using a standard ExoSAP protocol. PCR products are used as tem- plate for cyclic minisequencing reactions with a primer tagged with a locus specific oligonucleotide (example in Figure 2, Oligo 1) and designed to hy- bridize to the region immediately upstream of the polymorphic site. Single- nucleotide elongation steps are then run with ddNTPs labeled with two dif- ferent dyes (tamra and fluorescein), one for each allele. The locus specific oligonucleotides are used to hybridize sequencing reaction products to probes on 384 well plates and the fluorescent tags are photographed by scanning with a CCD camera and fluorescence signals are measured from the image and translated to genotypes. (Bell et al. 2002; Syvänen 2005)

GoldenGate Assay The GoldenGate Assay (Illumina, Inc., Figure 3) is a PCR-based method. Template DNA is amplified using a set of three primers, two allele-specific (Oligo 1 and 2) and one locus specific (Oligo 3). The allele specific oligo- nucleotides have allele-specific universal PCR primer site extensions in the 5’-end (Primer site 1 and 2). Likewise, the locus specific primer, which is general for both alleles and located several base-pairs downstream of the actual SNP, has a universal PCR primer site extension (Primer site 3), but also a unique, locus specific tag (Tag 1). This locus specific tag matches one probe (Probe 1) attached in many copies to a particular bead on an array. In the first round of amplification either one (example in Figure 3, homozygous for allele A) or the other (homozygous for allele G) or both (heterozygous for alleles A/G) allele specific primers and the locus specific primer will anneal and a polymerase activates extension of allele specific primers. The nick between the extending fragment from the allele specific primers and the locus specific universal primer is then ligated by a ligase to produce a full- length PCR-product of the region. The amplified template is amplified in turn, with universal primers (Primer 1, 2 and 3) in a multiplex reaction, in- volving all loci. In this reaction mix allele specific primers (primer 1 and 2) are labeled with dye Cy3 and Cy5, respectively. After multiplex PCR, the products are hybridized to the array where beads with locus specific probes are attached. For a particular locus there could thus only be hybridization of one (AA or GG homozygote) or both (AG heterozygote) of the two diffe- rently tagged amplification products. The different tags reflect UV-light at different wave-lengths and will produce fluorescent signals with different colors dependent on if they have the genotype AA, AG or GG. An image is taken of the fluorescent pattern of the array and individual beads (loci) are analyzed (Fan et al. 2006; Fan et al. 2003; Syvänen 2005).

24

Figure 2. Schematic illustration of the minisequencing method. Redrawn from www.medsci.uu.se/molmed/snpgenotyping/methods.htm.

25

Figure 3. Schematic illustration of the Illumina GoldenGate Assay method. Redrawn from www.illumina.com/print.ilmn?ID=11.

26 Gene mapping methods Background A major contribution to the understanding of the genetic basis to phenotypic traits was Alfred Henry Sturtevants publication of the first genetic map in Journal of Experimental Zoology (Sturtevant 1913). It was based on pheno- typic observations in fruit flies (Drosophila melanogaster). The study showed that genes are arranged on chromosomes in a linear manner and that the genetic elements for specific traits are located at certain positions, i.e. loci, on chromosomes. This was the initial step towards linking genotypes and phenotypes. With the increasing availability of genetic markers of dif- ferent kinds and full sequence data from many different genomic regions, mapping methods have become more diverse and in this section I will sum- marize some of the most used methods.

The strategies for finding the link between genetic loci and certain pheno- types vary with study organism and type of data at hand but can be divided into a few discrete categories (Figure 4). QTL-mapping refers to the analysis of co-segregation between genetic markers and phenotypic traits in pedigrees while association-, linkage disequilibrium-, or case/control mapping is the analysis of co-occurrence (association) of genetic marker alleles and pheno- typic traits in samples of unrelated individuals. More recently developed methods include population genetic and molecular evolutionary approaches like selective sweep mapping and comparisons of divergence levels at dif- ferent sites in coding regions. Recently it has also become possible to per- form gene expression analyses to look for transcription associated differenc- es among individuals or groups of individuals.

27 Figure 4. Categories of methods to link the observed phenotype to a candidate re- gion in the genome and, subsequently, to the causative mutation, modified from (Ellegren and Sheldon 2008). A possible way is also to use information from preced- ing functional analyses and select a potential locus from another organ- ism/population and go directly to the candidate locus level in the focal species with- out performing any indicative analyses. An important point to make is that the step from having a candidate region to pinpointing and verifying the actual causative mutation requires dense marker maps, association analysis at the single nucleotide level and subsequent functional analysis like transgenic experiments.

Pedigree-based approaches Linkage mapping Linkage mapping, or genetic mapping, refers to the procedure of finding signs of physical linkage between genetic markers through the information from segregation analysis in pedigrees. The term is also sometimes used to generally describe the analysis of linkage between genetic markers and phe- notypic traits but in this paragraph I will only consider the detection of lin- kage between different genetic markers. The method relies on the fact that, for chromosomes to segregate properly during meioses, at least one crossing- over event has to occur on each chromosome arm (Jones and Franklin 2006). Hence, there is a higher probability for alleles at loci that are located close to each other on the same chromosome to be inherited jointly than for alleles at loci that are located far apart. Loci on different chromosomes of course al-

28 ways assort independently (Mendel’s 2nd law) although the random combina- tion of paternally and maternally inherited chromosomes in the offspring generation sometimes deviates from the expected 50% ratio and this could, especially when dealing with small F1 samples, be interpreted as genetic linkage.

The principal of detecting linkage relies on analysis of inheritance of recom- binant and non-recombinant chromosomes in pedigrees. This is straightfor- ward in cases when the phase of markers is known in the parents, for exam- ple in designed crosses between heavily inbred lines of domestic species or species possible to keep in captivity (e.g. Drosophila). If only one of the parents is heterozygous for both of the genetic markers included in a pair- wise analysis, the fraction of recombinants can easily be counted in the offspring generation. The situation becomes more complicated when both parents are heterozygous. Despite knowing the phase of the parental chro- mosomes, it is not always possible to distinguish between recombinant and non-recombinant chromosomes in the offspring. Unless the offspring is ho- mozygous for at least one marker, the situation when the offspring has inhe- rited one non-recombinant chromosome from each parent can look identical to the situation when one recombinant chromosome has been inherited from each parent. When working with outbred populations the phases of the pa- rental chromosomes are rarely known and since the phases are equally likely and mutually exclusive, the linkage analysis in this case has to invoke a like- lihood analysis of the different possible outcomes (Figure 5). This is general- ly not a problem, especially not for tightly linked loci in crosses with a large number of F1s, since the recombinants always are expected to occur in a lower frequency than the parental chromosomes. However, it might affect the possibility to detect linkage, in particular in small offspring sets or for loci separated by relatively large distances. This is because the four possible offspring genotype combinations (if one parent is homozygous for both loci, as for example in Figure 5) at independently assorting loci are expected to occur at similar rates, but this pattern can be statistically demanding to dis- tinguish from the slight deviation from these ratios expected for distantly linked loci. This is particularly so if the number in each category is small.

The fraction of recombinants in the offspring is used as an estimate of the genetic distance separating the genetic markers in the analysis. The distance is generally measured in centi-Morgans (cM), the unit Morgan was coined by Alfred Henry Sturtevant (see above) to honor his tutor, Thomas Hunt Morgan. One cM refers to 1% recombinants in an offspring sample (Figure 5).

29 Figure 5. An example of a cross between two parents where the female is homozyg- ous for both genetic markers and the male is heterozygous for both markers but the phase of each of the paternal chromosomes is unknown. The male has the genotype Aa/Bb but A could either be coupled with B (case 1) or with b (case 2). In this ex- ample the number of offspring is 9. If case 1 is true, 7 of the offspring carry the parental chromosomes (NR) and 2 carry the recombinant chromosomes (R) while, if case 2 is true, 2 offspring carry the parental chromosomes and 7 carry the recombi- nant chromosomes. The interpretation in this case is pretty straightforward since there is a relatively strong deviation from the 50/50 ratio of parental/recombinant chromosomes that is expected under independent inheritance. Hence case 1 is more likely than case 2 since we never expect recombinant chromosomes to occur in the sample at more than 50% frequency. However, the sample size is small and the probability of observing this ratio might not be significantly different from what is expected under random sampling of gametes. To estimate the probability of the different outcomes, mapping algorithms calculate the likelihood of each case. Hence, the probability of observing this particular data is equal to: Likelihood (r) = (r)2 (1 - (r))7 for case 1 and Likelihood (r) = (r)7 (1 - (r))2 for case 2. If we accept case 1, the estimated genetic distance between loci A and B would be 2/9 = 22 cM.

As Sturtevant establish in his 1913 paper (Sturtevant 1913) one can extend the information from pair-wise recombination fractions among genetic markers and infer the relative position of markers along chromosomes from this. For example, if we have three gene markers (A, B and C) along a chro- mosome and we estimate the pairwise distance between markers A and B to 20 cM, between B and C to 15 cM and between A and C to 33 cM, the most

30 likely order of these markers is A-B-C. Since there is stochastic variation in the pair-wise estimates of genetic distances among loci there will not be a perfect additive relationship among loci along chromosomes. In addition to the random variation in different estimates, the lack of additivity is also a result of that the number of crossing-overs is not restricted to a single event per chromosome (Klug and Cummings 2000). Although there seem to be some interference between chiasmata at the same chromosome arm, some- times more than one crossing over occurs during the same meiotic division resulting in an underestimation of the recombinant frequency, especially for loci separated by more than approximately 10 cM (Kosambi 1944). Quan- titative data of interference is sparse and there is no general conclusion on how severe the effect is but there is some evidence showing that the level of interference can vary among chromosomal regions on a genomic scale, at least in some species (Sherman and Stack 1995). There are two established methods available to correct for the discrepancy between crossing-over and recombination and to improve the estimate of the genetic distance (E(r)) between loci, i.e. to include the possibility that recombination has occurred more than once (double cross-overs) in the calculation of recombination events (Figure 6). One method

E(r) = - ½ ln (1 - 2 r) proposed by Haldane (Haldane 1919) assumes that crossing-over is random and independently occurring along chromosomes and hence does not take interference between adjacent crossing-over into account while the method

E(r) = 1/4 ln [ (1 + 2 r) / (1 - 2 r) ] by Kosambi (Kosambi 1944) assumes constant interference at a specific level (Figure 6). Both methods assume crossover events to follow a Poisson distribution but the latter is generally preferred due to the present evidence of interference.

The most widely used method for determining the statistical significance of linkage between markers is based on the estimation of Maximum Likelihood scores for different values in the parameter space and calculates a likelihood value for each r. The algorithm then aims at finding the value of r that max- imizes the LOD-score and to assign linkage between markers if the LOD score exceeds some cut-off value. The LOD-score is the Log10 of the ratio of the likelihood of linkage (r < 0.5) over no linkage (r = 0.5) and a commonly used threshold for inferring linkage is a LOD-score of 3, i.e. the estimated level of genetic linkage is 1,000 times more likely than independent assort- ment. In a random sample that should not occur more often than in 1 out of 1,000 tests. There are several softwares available that conduct linkage analy-

31 sis with maximum likelihood, e.g. CRI-MAP (Green et al. 1990) and Join- Map (Stam 1993) and in addition to estimating pair-wise linkage between markers these softwares can also handle multipoint data and make use of the correction methods described above (e.g. Kosambi) to estimate genetic dis- tances between multiple loci in linkage groups.

Observed recombination fraction Observed recombination Observed recombination fraction Observed recombination 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 Haldane genetic distance Kosambi genetic distance Figure 6. Plots describing the mapping functions of Haldane (left) and Kosambi (right). By taking interference between chiasmata into account the Kosambi map- ping function decays faster towards the asymptote at 0.5 observed recombination fraction so that e.g. a mapping distance of 0.5 corresponds to 32% and 38% ob- served recombination for the Haldane and the Kosambi functions, respectively.

QTL mapping A quantitative phenotypic trait is a trait that shows variation on a continuous scale (e.g. height of humans) and that has an inheritance pattern that deviates from what is expected from a simple monogenic (Mendelian or qualitative) trait. QTL, or quantitative trait loci, are regions in the genome that have been found to affect a quantitative phenotypic trait (Doerge 2002). A QTL “co- operate” with other QTL to affect the phenotype, although this cannot be known in advance. Hence, in theory, it might be a single locus affecting the trait but for example a large environmental effect shaping a continuous dis- tribution of trait values in a population. QTL-mapping refers to the proce- dure where one tries to find the genomic regions harboring genetic elements affecting a phenotypic trait of interest. It is similar to linkage mapping in the sense that one follows the segregation of alleles and phenotypes in a pedi- gree and estimate statistical linkage between them. As is the case for all lin- kage analyses, the strength of the co-segregation analysis increases with the size and the variability of the marker set and increased power can be achieved by so called interval mapping (Lander and Botstein 1989; Lynch and Walsh 1998), using separate analyses for each marker-pair (interval) along a genetic map instead of a set of independent markers. The possibility

32 to pick up loci affecting the trait obviously increases with the genomic cov- erage of the genetic map.

One advantage of using captive populations for QTL mapping is that one can design crosses between genetically highly inbred lines to produce a F1 gen- eration that has a high probability of expressing variance at both marker loci (all loci fixed between parental lines) and at QTL. When crossing the F1 back to the parents (backcross design) or to other F1s (F2 design) to produce a F2 generation the distribution of QTL and markers becomes useful for statistical analyses of co-segregation of QTL and genetic markers. In gener- al, the F2 design produces three genotypes (AA, Aa, aa for alleles A and a) at each marker locus and the backcross design only two (AA, Aa with back- cross to AA parents or Aa, aa with backcross to aa parents) so that the F2 design increases the possibility to determine the degree of dominance of the QTL (Lynch and Walsh 1998). In natural populations it is generally not possible to create inbred lines and design specific crosses. In addition, it may be difficult to collect the necessary phenotypic data or to gather pedigree material. The optional natural populations possible to monitor thereby de- creases dramatically since few species have life-history traits, distribution ranges, population densities or family structures suitable for mapping ap- proaches.

The scarcity of suitable populations and species in combination with absence of genetic tools is well reflected in the lack of publications describing map- ping efforts in natural populations. There are a few exceptions though, most of which include geographically restricted populations and/or species with a model species as a close relative. For example, it has been possible to map candidate QTL responsible for coat color, coat pattern and a few morphometric characters in an island population of Soay sheep (Ovis aries) in Scotland (Beraldi et al. 2006; Beraldi et al. 2007; Gratten et al. 2007). There are also QTL detected that affect adaptive coat color variation in beach and pocket mouse species (Peromyscus polionotus & Chaetodipus intermedius) in southern USA (Hoekstra et al. 2004; Hoekstra et al. 2006), hemoglobin oxygen-binding affinity in deer mouse (Peromyscus manicula- tus) populations inhabiting different altitudes (Storz et al. 2007), the forma- tion of body armour in three-spined sticklebacks (Gasterosteus aculeatus) (Peichel et al. 2001), and birth weight in red deer (Cervus elaphus) (Slate et al. 2002). Of particular interest for avian geneticists, QTL affecting plu- mage color polymorphisms in different bird species have also been described (Doucet et al. 2004; Mundy 2005; Mundy et al. 2004; Theron et al. 2001). In addition to the scarcity of suitable populations for mapping, the process itself becomes more complicated since parents in outbred populations are expected to be much less informative, firstly because not all F1s in a cross between outbred parents will be heterozygous at all loci and secondly because indi-

33 viduals in outbred populations are likely to differ in marker - QTL phase (Lynch and Walsh 1998). However, if it is possible to breed the focal species in captivity, similar approaches as for domestic species might be applicable as exemplified by a few studies of natural populations of beach mice (Steiner et al. 2007) and three-spined sticklebacks (Peichel et al. 2001).

Population-based approaches Association mapping QTL (and monogenic trait loci) can also be assigned to genomic regions by association mapping in population samples. The traditional way (association study) is to sample individuals from one affected group (case) expressing the trait of interest, for example a disease, and from one unaffected group (con- trol). The individuals are then genotyped for a number of variable genetic markers and the statistical association between markers and trait values is investigated. One problem with working with population samples is that historical recombination events have broken up associations that might have been possible to pick up in segregation analyses in pedigrees. Hence, the number of markers needed to cover the genome in an association study is always higher than in a segregation analysis. On the other hand, by detailed association mapping it is possible to get closer to the causative locus and sometimes to pinpoint the actual causative mutation. Therefore, a commonly used approach is to combine the methods, by first conducting a segregation analysis of QTL in a pedigree to find candidate regions of the genome and then carry out a more detailed screen of that region in a case/control sample.

The statistical assessment of association is relatively straightforward. In a contingency table test it is a matter of calculating if the observed frequencies of alleles or genotypes in cases and controls are expected from random sam- pling or not (Weir 2008). However, one obvious caveat of association stu- dies relates to the large number of statistical tests carried out. In practice there is one test per genetic marker and for full genome scans with some- times hundreds of thousands of markers, the expected rate of false positives is very large. Hence, it might be necessary to discriminate and discard the proportion of stochastic positive associations found. This process tends to be conservative with traditional statistical approaches like Bonferroni- correction (Quinn and Keough 2002) that aims at excluding any false posi- tives, i.e. the risk for conducting type II errors is high. Another approach is the application of false discovery rate (FDR) thresholds. FDR refers to the process whereby one estimates the proportion of false positives in the entire set of positives. Although being much less stringent, applying FDR does not eliminate the risk of discarding very valuable true positives. Therefore, as an alternative, one might accept including a set of false positives in the first

34 genome-wide screen and evaluate the significance of each locus after a more detailed analysis (Schlötterer 2003). An alternative is to use a relatively small sample size in an initial scan and to use the positives in subsequent scans in other, larger samples; the multi-stage approach (Hirschhorn and Daly 2005).

In the later stages of the mapping process, the possibility to narrow down a genomic interval by more and more detailed association screens, i.e. using denser and denser sets of markers, ultimately depends on the level of linkage disequilibrium (LD) in the region of interest. Hence, in order to assess the coverage of a certain number of markers in an association study one has to quantify the level of linkage disequilibrium (see below) in the region before- hand.

Selective sweep mapping In contrast to QTL- and association mapping, selective sweep mapping is a scanning approach that aims at detecting regions that deviate from neutral patterns regarding nucleotide diversity, thereby inferring selection without having a previous knowledge about phenotypes, but see (Storz 2005). The method can be used to screen candidate genes identified by QTL or associa- tion studies (Namroud et al. 2008) and is thereby a complement but the me- thod relies on the assumption that the locus of interest has been subject to relatively strong directional selection in the recent past.

The idea to the approach springs from the concept of genetic hitch-hiking (Maynard Smith and Haigh 1974), the process whereby particular alleles sweep to fixation because they are linked to a positively selected mutation at another site. Dependent on the selection coefficient, this might occur rapidly enough for recombination not to have time to break up the associations be- tween the selected locus and the linked variants. This may cause a reduction in the level of within-population variability (nucleotide diversity) in the re- gion affected by the sweep. The stronger the selective coefficient and the lower the rate of recombination, the more extensive the region affected by hitch-hiking. The actual mapping procedure is a scan for the level of poly- morphisms along chromosomes to identify regions of low variability and/or high divergence between populations. The random variation in nucleotide diversity among sites is expected to be large because genomic regions have different genealogies due to recombination between loci and because muta- tion rates differ at a regional scale (Smith et al. 2002). Therefore, a single locus with low diversity or high differentiation might not provide a strong enough signal to rule out stochastic variation. Ideally, these diversity valleys or divergence peaks should therefore include several adjacent markers (Harr 2006; Kim and Stephan 2002; Schlötterer 2003). However, there are me- thods available to establish an expected distribution of values and statistical-

35 ly evaluate single locus outliers, for example, by using the ratio of variability of microsatellite markers between two closely related populations, over a large number of loci (Schlötterer 2002). This approach, looking at variation in diversity levels between populations has been useful for finding candidate regions for resistance in malaria parasites (Nair et al. 2003; Wootton et al. 2002). Another approach uses simulated neutral distributions of population differentiation as estimated by FST – statistics (Wright 1921) and compares single-locus estimates of FST to the expected distribution to find outlier loci (Beaumont and Balding 2004; Beaumont and Nichols 1996). Reduction in nucleotide diversity can also be assessed using single populations and one method uses a likelihood ratio test to discriminate between a neutral scenario and selective sweeps (Kim and Stephan 2002). The very recent availability of large-scale polymorphism data, e.g. (ICPMC 2004; IHMC 2005) allows for hitchhiking mapping on a genomic scale (Pavlidis et al. 2008) although the process is yet restricted to a very limited number of organisms where there are dense polymorphism maps at hand.

Analyses of allele frequency distributions Other varieties of the selective sweep mapping approach use observed and expected allele frequency distributions of polymorphisms. These tests were developed to investigate DNA sequence evolution using the neutral theory of molecular evolution (Kimura 1983) as a null model. Expectations from the neutral theory include that the rate of substitution equals the rate of mutation and that the probability of a new mutation going to fixation equals its current frequency in the population, i.e. allele frequency changes are determined by genetic drift only. However, a positively selected mutation will have a higher probability of going to fixation, the probability increases as a function of the selection coefficient, the relative fitness of the mutant allele compared to other alleles (Hartl and Clark 1997).

A method described by Fumio Tajima (Tajima 1989) investigates the rela- tionship between the number of polymorphic sites (S) and the average pair- wise nucleotide difference () in a set of sequences sampled from a popula- tion. Under neutrality, the estimate of the population mutation rate as calcu- lated from the number of polymorphic sites, corrected for sample size (W, (Watterson 1975)), should equal that calculated from the average number of pair-wise differences between individual chromosomes (, (Tajima 1983)). The actual test statistic, D, is calculated as - W (actually also divided by the standard deviation). If there is an excess of high frequency polymor- phisms, D increases since is inflated compared to S while if there is an excess of rare variants, D decreases since is understated compared to S. The D statistic is based on data from a single population only and does not include information about ancestral and derived variants meaning that the highest allele frequency possible is 0.5. Positively selected variants will drag

36 along with them neutral variants increasing the allele frequency of linked variants during a sweep, hence resulting in a high positive Tajimas D during a certain period of a selective sweep. On the other hand, in the end of, or immediately after a sweep the only variants remaining are at low frequency (close to fixation or newly arisen mutations) causing a negative Tajima’s D. Purifying selection (background selection of linked variants) reduces the possibility for mutations to reach higher allele frequencies and there will be an excess of low frequency variants causing Tajima’s D to be negative. Similar effects can be caused by population expansion since that results in maintenance of low frequency variants but lowers the probability of alleles increasing in frequency. This can make it hard to disentangle different selec- tive regimes and selection from demography using this test only. However, the inclusion of multiple loci might at least give a possibility to discriminate between demographic history (affects all loci) from selection (affects a sin- gle locus), e.g. one locus expressing a highly positive Tajima’s D while all others are negative could indicate an ongoing sweep or balancing selection affecting that particular locus.

Another method, in practice very similar to the Tajima’s D, uses information from an out-group making it possible to categorize allelic variants into ance- stral and derived. The method, developed by Fay and Wu (Fay and Wu 2000) includes a test statistic (H) which is the difference between and H where H is Fay and Wu’s own developed population mutation parameter, sensitive to high frequency derived variants (Fay and Wu 2000). This meas- ure takes into account the fact that, if positive selection acts on a region, it is possible for derived variants to quickly reach high frequency in a population while if the sequence evolves under neutrality, or purifying selection, de- rived variants will generally segregate at a lower frequency than ancestral variants. The possibility to detect selection increases with a combination of these statistics, for example if both the Tajima’s D statistic and the Fay and Wu’s H are significantly negative it is unlikely that background selection or population growth can explain the data (Otto 2000).

Estimating linkage disequilibrium As indicated above, the possibility to use neutral genetic markers to link phenotypic traits to specific genomic locations depends on if the neutral markers can give any information about the genotype at the locus of interest. In other words, is the neutral marker and the focal region in linkage disequi- librium. Linkage disequilibrium is defined as non-random association of alleles at different loci (Lewontin and Kojima 1960). The level of linkage disequilibrium is highly dependent on demographic history of populations, but also on selective effects and to some extent random genetic drift (Slatkin

37 2008). In general, linkage disequilibrium decays with increasing genetic and/or physical distance and the extent of linkage disequilibrium can be used to select markers for association mapping studies. If there is a region with extensive linkage disequilibrium over long physical distance, the number of markers needed to cover that region is lower than in a region where linkage disequilibrium decays rapidly. In other words, the signal from a locus of interest extends further in regions with high linkage disequilibrium. There- fore, to be able to assess the number of markers needed to cover a genomic region in a screen for association, it is necessary to have an idea about the level of linkage disequilibrium in that region.

Considering genomic levels of linkage disequilibrium, population history might be the most important determinant (Slatkin 2008). This is particularly true for population subdivision, migration and changes in population size. Levels of linkage disequilibrium are expected to increase with population subdivision since there is random drift of different alleles in different subpo- pulations. When samples are taken from both populations there will be ge- netic association between the alleles that have gone to fixation or drifted to dissimilar allele frequencies in the different subpopulations. Population bot- tlenecks also lead to an increase in average levels of linkage disequilibrium due to the more rapid loss of haplotypes compared to genotypes. This is be- cause haplotypes can be lost due to drift without losing any genotypes but genotypes, on the other hand, cannot be lost without losing haplotypes. The effect though, is strongly dependent on the severity and duration of the bot- tleneck.

Examples of very different demographic histories come from the domestic dog (Canis familiaris) and the Norway spruce (Picea abies). Domestic dogs of specific breeds have been subject to at least two different bottlenecks dur- ing the past 10,000 years. When the wild ancestors (wolves) were tamed it is likely that the number of wild forming the domestic population was small. Later on, breed formation has been associated with very selective breeding schemes. Hence, we expect the loss of haplotypes to be severe in dogs in general and even more extreme within breeds. Indeed, there is in general long range linkage disequilibrium observed in dogs (Sutter et al. 2004). On the other extreme, the Norway spruce, a species that probably has an enormous population size that has been maintained for a long period of time. It is thus expected that historical recombination events have had time to break down association between alleles at different loci. When assessing the levels of linkage disequilibrium in this species it is observed that decay is sometimes complete already between adjacent nucleotides (Heuertz et al. 2006).

38 Regional variation in linkage disequilibrium can to some extent be attributed to stochastic variation in association among alleles. However, regional varia- tion can theoretically also be the result of selection. Directional selection might lead to fixation of beneficial alleles in combination with physically linked alleles if the selective pressure is strong enough to cause the selected variant to go to fixation before recombination breaks up the association be- tween the selected and the linked variants. In addition to hitch-hiking effects, selection can also affect linkage disequilibrium if there is an epistatic inte- raction among alleles generating a higher fitness for particular allele combi- nations (Slatkin 2008). In that case linkage disequilibrium can build up be- tween loci that are not necessarily physically linked, although this force is expected to be weak.

Regional and local variation in linkage disequilibrium can also be attributed to regional and local recombination rate variation because recombination is the only major force that acts to break up associations among loci. As men- tioned previously, it is known that recombination rates vary both on genom- ic, regional and local scales. On the genomic scale, recombination rates gen- erally increase towards the telomeres at chromosome termini and decrease in the vicinity of the centromeres (Nachman 2002; Nachman and Churchill 1996). There is also positive correlation between GC-content on the mega- base scale and recombination rate. The causal relationship is not yet resolved but the effect is a large variation in recombination rate between chromosom- al regions with differences in average levels of GC-content (Meunier and Duret 2004; Spencer et al. 2006).

In addition to these large scale effects the majority of the recombination events seem to occur in recombination hot-spots, specific regions of up to a few kilobases wide that have ten- to hundred-fold higher recombination rates than the surrounding sequences. The density of these hot-spots is rather high but they are short lived. It has been estimated that the human genome harbor more than 25,000 hot-spots (Myers et al. 2005), but they seem to be relative- ly transient (Myers et al. 2006; Ptak et al. 2005; Ptak et al. 2004). Hence, although the local recombination rate might be extremely high in an active hot-spot, recombination levels should smoothen out as hot-spots appear and disappear at a relatively high rate (Myers et al. 2006). It is possible that the relocation of hot-spots is restricted to certain regions affecting the regional rate rather than the global or local rates (Spencer et al. 2006). As a conse- quence of the complex pattern of recombination histories in different parts of the genome it is generally very difficult to get a general or comprehensive picture of the levels of linkage disequilibrium in different parts of the ge- nome.

39 The level of association between loci can be estimated from population po- lymorphism data by different measures among which |D’| and r2 are the most frequently used. Both r2 and |D’| capture the association between pairs of bi- allelic loci (locus A and locus B in the examples below) and can take any value between 0 (no LD) and 1 (complete LD).

|D’| is defined as |D/Dmax|, where D = (fAB-fAfB) and Dmax = min[fafB, fAfb] if D>0, or min[fAfB, fafb] if D<0. fAB = frequency of haplotype AB and fA, fB, fa, fb = allele frequencies of alleles A, B, a and b, respectively.

2 2 r is defined as D AB / fA fa fB fb where fA, fB, fa, fb = allele frequencies of alleles A, B, a and b, respectively.

|D’| is often used to study the impact of recombination on haplotype struc- ture in a region and is intimately related to the four-gamete test of recombi- nation (Hudson and Kaplan 1985). This test relies on the assumption of an infinite sites model of mutation where no new point mutations can occur on sites that have already been hit by a mutation. Hence, if 3 of the four poss- ible haplotypes are observed in a pair-wise comparison of alleles over two loci, there is no evidence of recombination and |D’| = 1. However, recombi- nation might occur between non-informative sites (Myers and Griffiths 2003) and it is well known that the four-gamete test only picks up some 10- 15% of recombination events in areas of intermediate nucleotide diversity. |D’| can be hard to interpret if it is not 1 or 0 since it is not linearly related to the amount of recombination and it is inflated by severely biased allele fre- quencies or small sample sizes.

The other commonly used measure, r2, is the most straightforward estimate of LD since it is simply the statistical correlation between alleles at different loci, i.e. the probability that one can tell what allele to expect at one locus by determining the allelic state at another locus (Hill and Robertson 1968). Hence, for r2 to be 1, the correlation of alleles between two loci has to be perfect with only 2 of the 4 possible haplotypes observed. r2 decrease to 0 as a direct function of the decrease in correlation between alleles. r2 is also robust to biased allele frequencies and small sample sizes. Therefore, r2 is often the preferred estimate when designing association studies and for esti- mating the number of markers needed in a scan of a particular genomic re- gion. It is also useful for estimating the sample size needed to find signifi- cant association. In the case where genetic marker and the affected locus are in perfect linkage disequilibrium (r2 = 1), the information content at the marker locus is exactly as high as at the causative locus and the sample size needed to detect association does not differ from what one would need if just typing the causative locus. However, as the distance between the causative locus and the genetic marker increases, the level of linkage disequilibrium generally decays (r2 < 1) and the power to detect association decreases.

40 Therefore, to get the same power as at the causative locus the sample size (N) needed would be N / r2. The power is also strongly dependent on the allele frequency of included markers and instead of typing all markers a more cost-efficient approach is to select particular tag markers that capture a large proportion of the variation in a region (Wang et al. 2005).

41 Research aims

General aims The main aim of this work was to develop a foundation for forthcoming efforts in the search for the genetic basis of phenotypic traits in natural bird populations. The main focus has been on the collared flycatcher but the ap- proaches taken were selected to aid the development of genetic tools in other species as well. The major goal was to create detailed linkage maps covering a large proportion of the collared flycatcher genome. Besides providing the necessary genetic tool for subsequent QTL mapping experiments this would enhance the possibilities to study chromosome evolution in birds in more detail than what has previously been possible. In addition I wanted to pros- pect the chances of doing population based association scans. As well as providing undemanding sampling strategies this approach generally gene- rates narrower genomic intervals to scan for candidate loci. Furthermore, it has been suggested that many particularly interesting traits might have a genetic component linked to the Z-chromosome and one aim was therefore to use a population genetic approach and do a detailed scan for signs of se- lection over the Z-chromosome.

Specific aims I Investigate the possibility to do linkage mapping in a natural population of collared flycatchers and to produce a detailed map of the Z-chromosome for comparisons to other bird spe- cies (Paper I). II Develop a high density anchor marker set with applicability to a wide array of birds for genetic linkage mapping, phylogeny reconstruction, population genetic analyses and studies of speciation (Paper II). III Develop an autosomal genetic map covering a large propor- tion of the genome of the collared flycatcher to use in subse- quent QTL mapping efforts and for detailed studies of chro- mosomal rearrangements between bird species (Paper III).

42 IV Estimate the level of linkage disequilibrium on the Z- chromosome to get information about the density of markers needed to cover the entire collared flycatcher genome in an association scan (Paper IV). V Scan a large number of gene sequences along the Z- chromosome in both allopatric and sympatric populations of the collared flycatcher and the pied flycatcher to reveal loci that show signs of directional selection in either or both of the species (Paper V).

43 Summaries of papers

Paper I: Genetic mapping in a natural population of collared flycatchers (Ficedula albicollis): conserved synteny but gene order rearrangements on the avian Z chromosome. Detailed linkage mapping in birds have so far been restricted to domestic species like chicken (Groenen et al. 2000), quail (Kayang et al. 2004; Kiku- chi et al. 2005) and turkey (Reed et al. 2005) and there are only rudimentary maps available for wild birds, e.g. of the great reed warbler (Acrocephalus arundinaceus) (Hansson et al. 2005). In order to evaluate the possibilities to create a detailed genetic map in a natural population of collared flycatchers the Z-chromosome was selected as target for a mapping effort using a pedi- gree of 365 birds in total. Coding regions from the recently released genome sequence of chicken was used as template for BLAST searches against other vertebrate sequences deposited in GenBank (http://www.ncbi.nih.gov/ BLAST). Specifically, 135 predicted Z-linked genes from the chicken as- sembly were evaluated for sequence homology to other vertebrate genes. Primers were designed to anneal to conserved resions in exons flanking in- trons of proper length for amplification and sequencing in collared flycatch- ers. Re-sequencing in 8 unrelated collared flycatcher males generated 53 SNPs with a minor allele frequency exceeding 20% in the screening panel. These SNPs were selected as markers for genotyping in the pedigree. Geno- typing was conducted with the SNPstream minisequencing strategy (see above) (Bell et al. 2002). These 53 SNPs were from 31 different genes that subsequently were treated as single markers by combining the genotypes from SNPs within an intron to haplotypes.

All 23 gene markers were linked to each other with LOD-score > 3.0, hence constituting a single linkage group with no evidence of synteny distortion between chicken and collared flycatcher for these particular markers. In ad- dition, no females showed heterozygosity at any SNP strongly supporting that the location of the markers are indeed on the Z-chromosome also in collared flycatcher. The framework map consists of 11 loci at a total map length of 44.4 cM and the best order map, for all 23 loci, was 62.7 cM. By aligning the collared flycatcher linkage map to the homologous part of the

44 chicken Z-chromosome, there was strong gene order conservation stretching over approximately half of the chromosome. However, for the other half, at least 4 inversions are needed to explain the difference in gene order between the species. By assuming a similar physical size of the Z-chromosome in collared flycatchers as in chicken (Gregory et al. 2007) and comparing the length of the linkage groups for the most distal markers in both species we could conclude that the recombination rate for the chicken was about twice as high (135 cM) as the rate in collared flycatcher (63 cM).

The results show that synteny conservation of the avian Z-chromosome scales down to single genes but, in agreement with relatively detailed FISH- mapping studies (Shibusawa et al. 2004b), there is evidence of intra- chromosomal rearrangements, at least for parts of the chromosome. The recombination rate is lower in collared flycatcher as expected from the do- mestication hypothesis, i.e. that domesticated species in general have higher recombination rates due to strong directional selection (Rodell et al. 2004) as observed in other organism groups (Ross-Ibarra 2004). However, the discre- pancy is perhaps larger than expected since the general conclusion from the approximately doubled rate of recombination in chicken compared to hu- mans has been thought to, at least in part, be due to the greatly differing ka- ryotypes between these species (ICGSC 2004). On the other hand, there are several ecological factors also affecting recombination rate evolution (Otto and Lenormand 2002) and it is difficult to draw any general conclusions from this single observation.

Chicken is the only avian species with substantial sequence data at hand and this provides the necessary information to conduct relatively large scale ge- netic analyses in non-model birds. The success rate of generating sequence data from non-model species like the collared flycatcher will undoubtedly be enhanced with the availability of additional avian genome sequences. The strategy of using conserved regions in exons for primers to amplify adjacent introns (Lyons et al. 1997) seems to be a promising approach in a wide array of organisms (Aitken et al. 2004; Hellborg and Ellegren 2003; Primmer et al. 2002) and increases the possibilities to correctly anchor the markers to a certain genomic position.

Paper II: Genomics of natural bird populations: a gene- based set of reference markers evenly spread across the avian genome. CATS, or ‘comparative anchor tagged sequences’ were introduced by (O'Brien et al. 1993). The term was used to describe conserved hybridization

45 probes with a known location in a model species, used to map genes in ge- netically uncharacterized species. Later the method has been used to locate conserved regions in exons, flanking introns of appropriate size for PCR amplification and sequencing, e.g. (Aitken et al. 2004; Cogan et al. 2006; Palumbi and Baker 1994). In birds, the only reference genome available is chicken (ICGSC 2004). However, large scale EST sequencing efforts in the zebra finch (Taeniopygia guttata) have generated cDNA sequence data for quite a large number of genes, mainly expressed in brain (ESTIMA database, http://titan.biotec.uiuc.edu/songbird/resources). In addition, the zebra finch is in the pipe-line for full genome sequencing and the raw sequence reads are available in the trace archive (www.ncbi.nlm.nih.gov/Traces/trace.cgi? cmd=retrieve&s=search&m=obtain&retrieve=Search&val=SPECIESCODE =TAENIOPYGIA+GUTTATA). In this study, we selected 242 genes evenly spread across all annotated chicken chromosomes and used the CATS ap- proach to find exonic regions conserved between chicken and, preferably, zebra finch. In regions where no zebra finch sequence data was available, other vertebrates where used in the exonic alignments. Average distance between markers was 4.1 Mb and only five intervals exceeded 10 Mb ac- cording to chicken. Amplification success of primer pairs was evaluated using 122 of the primers and a panel of 6 different bird species, chicken (Gallus gallus), peregrine falcon (Falco peregrinus), Tengmalm’s owl (Ae- golius funereus), blue tit (Parus caeruleus) and great reed warbler (Acroce- phalus arundinaceus). In addition, all primers (242) were used for PCR- amplification attempts in collared flycatcher. All successfully amplified loci in collared flycatcher were re-sequenced in a panel of 8 unrelated males to get estimates of intron length, base composition, repeat content, nucleotide diversity, allele frequencies of SNPs and indel frequency. To be able to compare the diversity estimates to other species, two blue tits and four great reed warblers were also re-sequenced.

Amplification success was highest in chicken (93%) and decreased slightly for the passerines, blue tit (83%), great reed warbler and collared flycatcher (79%), was still relatively high for peregrine falcon (78%) but low for Tengmalm’s owl (34%). The latter was probably to some extent a result of poorer DNA quality. The outcome of the amplification assessment points toward a broad utility of these markers in avian genetical research. 200 of the 242 markers could be readily amplified and sequenced in collared fly- catchers. In total this generated 122.41 kb of sequence data (12.1 kb coding). Using a BLAST cut-off expected value of E-5, 118 of 126 loci matched the exact position in the chicken genome used for primer design. There was a strong correlation between intron size (Pearson’s correlation coefficient = 0.47) and GC-content (Pearson’s correlation coefficient = 0.67) between chicken and collared flycatcher. We found no evidence of presence of LINES (CR1) and only 0.39% of the sequence was assigned to be STRs. In

46 total, 904 polymorphisms were detected among which 33 were short indels and 871 were SNPs. The nucleotide diversity () of collared flycatcher was 0.0029, higher than for both blue tit ( = 0.0018) and great reed warbler ( = 0.0012). For 160 of the 200 sequenced markers it was at leat one SNP with minor allele frequency (MAF) > 20% and the average allele frequency dis- tribution was not significantly different from the distribution expected under strict neutrality. However, there was a skew (albeit non-significant) towards positive Tajima’s D values (average Tajima’s D over all loci = 0.58).

Considering that avian genomes seem to be very stable both in terms of the number of chromosomes and total genome size (Gregory et al. 2007), the marker set provided should cover approximately the same proportion and the same regions as it does in chicken. However, there is a bias in the coverage of the different chromosomal classes with a complete lack of information for the smallest chromosomes. The inter-marker distances suggest that it, theo- retically, should be straightforward to detect linkage between adjacent mark- ers in a linkage mapping effort. Extrapolation from the observed recombina- tion rates in chicken, turkey, quail and great-reed warblers (Groenen et al. 2000; Hansson et al. 2005; Kayang et al. 2004; Reed et al. 2005) indicates that most marker pairs should be in the interval of 5 - 10 cM.

The marker set will probably be an important contribution to the characteri- zation of genomic properties in non-model birds (Edwards 2008). In addi- tion, avian phylogeneticists have stressed the importance of scaling up from the traditionally used short pieces of mtDNA or a single or a few nuclear loci (Avise et al. 1994; Ericson et al. 2006; Ericson and Johansson 2003) to ex- tensive multi locus datasets (Edwards et al. 2005; Edwards et al. 2007; Liu and Pearl 2007; Scholl and Bird 2005) and for these purposes the marker set will probably be a useful resource. In addition, the markers should be helpful in multi-locus population genetic analyses e.g.(Becquet and Przeworski 2007; Hey and Nielsen 2004), as well as in speciation genetics, for example in the search for patterns of introgression between closely related species (Borge et al. 2005a; Borge et al. 2005b).

Paper III: A gene-based genetic linkage map of the collared flycatcher (Ficedula albicollis) reveals extensive synteny and gene-order conservation during 100 million years of avian evolution. As a consequence of the high density polymorphism maps needed to investi- gate the association between phenotypes and genotypes with population data, QTL mapping in pedigrees is the approach of choice unless the focal

47 organism has a well characterized genome sequence at hand. In addition, to be able to study chromosome evolution, detailed genetic maps are useful tools when whole genome sequence data is absent. It is known that the rate of chromosome reshuffling varies among clades and time periods (Ferguson- Smith and Trifonov 2007; Kohn et al. 2006). Comparisons between birds and mammals (Burt et al. 1999) and between closely and distantly related bird clades (Derjusheva et al. 2004; Fillon et al. 2007; Griffin et al. 2007; Guttenbach et al. 2003; Itoh and Arnold 2005; Kasai et al. 2003; Nishida- Umehara et al. 2007; Raudsepp et al. 2002; Schmid et al. 2005; Shetty et al. 1999; Shibusawa et al. 2001; Shibusawa et al. 2004a; Shibusawa et al. 2004b) have shown that large scale rearrangements have been rare through- out the evolution of birds. However, these studies, preferentially using fluo- rescence in-situ hybridizations (FISH) neither have the resolution to pick up small scale reshuffling between chromosomes, nor can they resolve intra- chromosomal rearrangements.

As a step towards QTL mapping in a wild population of collared flycatchers we made use of a previously established marker set (see Paper II) to gener- ate a comprehensive genetic linkage map in this species. Besides being ne- cessary for subsequent linkage analysis of genotypes and phenotypic meas- ures the genetic map is detailed enough to reveal inter- and intra- chromosomal changes down to a few megabases (Mb). We typed 384 SNPs corresponding to 170 different autosomal genes with the Illumina Golden Gate assay (Fan et al. 2003) in combination with in house genotyping of 71 microsatellites developed for pied flycatcher (Karaiskou et al. 2008; Leder et al. 2008) in a pedigree of 24 half-sib families at a grand total of 322 individ- uals. SNPs located within the same intron were combined to haplotypes to increase information content for each gene marker. Linkage analyses were made using CRI-MAP (Green et al. 1990) starting off with a two-point anal- ysis of all markers, assigning markers to linkage groups if the LOD-score was > 3.0. Framework maps were constructed with the build option and best order maps constructed by including all loci that were linked to at least one other locus in the initial twopoint analysis. The markers in the best order map were assigned to their relative position by recurrent runs of the option flips4. 211 of the 241 markers could be linked to at least one other marker with LOD score > 3.0. Altogether, these 211 markers formed 33 linkage groups in the length interval 0 to 325 cM. The mean genetic distance between adjacent markers was 10.0 cM and the total sex-averaged map length 1787 cM. By extrapolating from the physical position of the most terminal markers in chicken for each linkage group in collared flycatcher, we estimate the cover- age of the map to be 66%. 30 markers remained unlinked after the initial twopoint analysis and among these markers were a large proportion with a low number of informative meioses. Among the informative markers that

48 were unlinked but had a high number of informative meioses (n = 9), six were located at chromosome termini in chicken, probably in regions of high- er than average recombination rates (Wahlberg et al. 2007) making it more difficult to pick up linkage. However, we cannot rule out the possibility that this result is attributable to chromosomal rearrangements involving small pieces of chromosome ends.

By aligning the collared flycatcher linkage groups to their orthologous chicken chromosomes we could conclude that the rate of rearrangements has been very low, not only regarding inter-chromosomal changes but also for gene order rearrangements within linkage groups. We could verify the fis- sion of the ancestral chromosome 1 in the passerine lineage (Griffin et al. 2007) and the fusion of ancestral chromosome 4 and ancestral chromosome 10 in chicken. A few observations could indicate the presence of additional, previously undiscovered inter-chromosomal rearrangements but the circums- tances for these observations indicate that they, rather than representing true fissions or fusions, reflect erroneously assigned chromosome positions of markers (low BLAST expected values), a low density of markers in a partic- ular region or lack of power to detect linkage among linkage groups. By assuming that the only inter-chromosomal changes that have occured be- tween chicken and collared flycatcher are the fission of chromosome 1 and fusion of chromosomes 4 and 10 and taking the partial coverage into account we estimated the rate of change for this class of rearrangement to be no more than 2 changes / 200 million years (100 million years per lineage) * 0.66 = 1 fission or fusion per 66 million years. There were indications of 18 intra- chromosomal changes but only 7 of these were verified in the framework map. By making a similar calculation for this kind of rearrangements, the rate was estimated to be approximately 1 inversion (of a few megabases) per 19 million years. The estimates of rearrangements are crude and it cannot be ruled out that, at least intra-chromosomal rearrangements might occur at a scale not picked up with a map of our density. However, the calculations hint that chromosome stability in birds might be extreme compared to for exam- ple mammals (Pontius et al. 2007).

Why are avian genomes stable? A plausible explanation might relate to the fact that avian genomes seem to be unusually sparse in repeat sequences (ICGSC 2004; Organ et al. 2007; Primmer et al. 1997), elements that are overrepresented in regions of chromosome fragility (Bailey et al. 2004; Freudenreich 2007; Kehrer-Sawatzki and Cooper 2007). This observation might provide a neutral explanation to avian genome conservation. Birds are known to have a very low rate of formation of post-zygotic hybridization barriers (Fitzpatrick 2004; Price and Bouvier 2002), barriers that might often be formed by chromosomal rearrangements (Capanna and Castiglia 2004; Hoffman and Rieseberg 2008; Navarro and Barton 2003; Noor et al. 2001;

49 Rieseberg 2001). An intriguing consequence of the stability of avian karyo- types may therefore be a lower rate of speciation in birds than in, for in- stance, mammals.

Paper IV: Levels of linkage disequilibrium in a wild bird population. In contrast to QTL mapping, association studies provide the possibility to use population samples instead of pedigrees for the assessment of linkage between genotypes and phenotypes. Trait associations to particular geno- types are typically measured in one case and one control sample (Kruglyak 1999). This could be a more straightforward approach than QTL-mapping in natural populations since there might be problems in establishing pedigrees for segregation analysis in free-ranging study species. However, to be able to assess the number of markers needed to cover the genome (or any region of interest) in an association scan it is crucial to have information about the level of linkage disequilibrium.

82 collared flycatcher females were sampled from the Swedish population inhabiting the Baltic Sea isles Öland and Gotland. The reason for selecting females was that, in birds females are the heterogametic sex carrying only one Z-chromosome. Thereby the phase of included markers is known. 34 SNPs corresponding to 23 different genes (paper I) were genotyped in the sample using the SNPstream system (Bell et al. 2002). Pairwise LD was estimated with the measures |D’| and r2 as implemented in the software Hap- loview (Barrett et al. 2005). The patterns of decay were assessed within in- trons, between markers all along the chromosome and for a subset of mark- ers where the physical distance between adjacent markers was less than 1.4 Mb according to the chicken genome map.

There was only visible evidence for recombination in 2 out of 12 intra- intronic comparisons, average |D’| was 0.98 for these loci. However, decay then occured rapidly and background |D’|-values were reached at 3–400 kb. An interesting observation for the evaluation of the possibility for associa- tion mapping is that r2 decays below commonly applied threshold values (Carlson et al. 2004) already at < 50 kb (Figure 7). This points at figures around 20,000 markers to cover the genome (~1 Gb) in an association scan if LD is uniformly and equally long-range on autosomes as on the Z- chromosome. The estimate of 1,000 – 2,000 markers suggested in the paper (Paper IV) is based on a threshold value of |D’| exceeding 0.5. However, it is hard to interpret intermediate values of |D’| and a much more straightforward threshold is r2 > 0.5, i.e. the sample size needed to detect association to a

50 causative mutation is twice as high for the marker locus compared to typing the causative locus. In reality, it is likely that this estimate is too low since the Z-chromosome is fairly large (few crossing-overs per bp) and only re- combines in males and should be among the lowest recombining chromo- somes in the avian genome. In addition, the Z-chromosome harbors a higher density of sex-biased genes indicating strong selection might more frequent- ly affect loci on the Z-chromosome compared to the autosomes, perhaps increasing general levels of LD through hitch-hiking effects. Ideally, one would like to assess the levels of LD among a large set of regions evenly spread across different chromosomal classes to get a better idea about the general levels of LD in collared flycatchers but the obtained estimate at least gives a rough idea about the minimum number of markers needed in a scan.

Figure 7. Decay of linkage disequilibrium (r2) against physical distance according to the position in the chicken genome.

Assessments of LD in natural populations are still very scarce. There are estimates from well characterized organisms like humans (Homo sapiens) (Reich et al. 2001), fruit flies (Drosophila melanogaster) (Andolfatto and Przeworski 2000; Haddrill et al. 2005), maize (Zea mays) (Remington et al. 2001), thale cress (Arabidopsis thaliana) (Nordborg et al. 2005), a protozoan malaria parasite (Plasmodium falciparium) (Conway et al. 1999) and domes- tic chicken (Gallus gallus) (Heifetz et al. 2005) but, to our knowledge, the only wild bird species investigated so far is the red-winged blackbird (Age- laius phoeniceus) (Edwards and Dillon 2004). A general conclusion from these studies is that LD varies enormously among species, but also among

51 chromosomal regions. Unfortunately it is hard to compare our findings with previous studies in birds since the approach taken in the chicken study in- volves only between generation estimates of LD (Heifetz et al. 2005) and the study in red-winged blackbirds (Edwards and Dillon 2004) is restricted to a region of the genome (MHC) known to have very specific characteristics like balancing selection, unusually high recombination rate and epistasis (Kauppi et al. 2003; Meyer and Thompson 2001).

LD is affected both by the rate of recombination (r) and the effective popula- tion size (Ne), (the population recombination rate () for diploid organisms equals 4Ne r (Stumpf and McVean 2003)). Hence, for an outcrossing taxon that has not experienced any severe bottleneck in the recent past the average level of LD should rarely be extensive. The levels observed in collared fly- catcher are fairly high, comparable to what has been estimated in human populations (Reich et al. 2001). This indicates a demographic history involv- ing either a population bottleneck or colonization from different source pop- ulations. It could also be due to a sampling effect, i.e. sampling of a struc- tured population, but since the level of differentiation is low (95% C.I. for FST = [-0.009 0.042]) even between the most geographically separated of sampled individuals this explanation is unlikely. There are indications of a very recent colonization of Öland and Gotland (Alatalo et al. 1990). A recent founding event, in practice the same as a population bottleneck, might there- fore be a plausible explanation to the relatively long-range LD observed in collared flycatchers.

Paper V: A high-density scan of the Z-chromosome in Ficedula flycatchers reveals candidate loci for diversifying selection and speciation. Speciation is the process that generates biodiversity and to understand the route of speciation it is necessary to reveal which processes that play a role in developing barriers to gene flow between populations. Initially, genetic divergence between populations can come about through drift or adaptation (Barton 1996; Orr and Smith 1998). If speciation is to occur, the diversifying process must also involve the build-up of reproductive isolation. Local adap- tation is likely to enhance the development of reproductive isolation. This could arise through linkage between incompatibility loci and positively se- lected loci or through pleiotropic effects (loci affect both traits for local adaptation and traits that confer genetic incompatibility) (Hawthorne and Via 2001). If genetic incompatibilities have evolved in the diverging populations, then hybridization events are likely to be selected against since hybrids will suffer from reduced fitness. Recombinant genotypes with disadvantageous

52 allele combinations are likely to be purged and loci that confer genetic in- compatibilities will therefore be characterized by more pronounced genetic differentiation than loci that have no effect on fitness (Nielsen 2005).

Theoretical and empirical data suggest that the sex-chromosomes should be particularly important in the build-up of genetic incompatibilities (Pre- sgraves 2008; Qvarnström and Bailey 2008). Sex-chromosomes are present in one copy only in the heterogametic sex, which causes recessive sex-linked variants to be expressed. Hence, selection against deleterious allele combina- tions is likely to be stronger on the sex-chromosomes than on the autosomes. This is known as Haldane’s rule which stipulates that hybrids of the hetero- gametic sex suffer from reduced fertility and/or viability compared to the homogametic sex (Haldane 1922; Orr 1997). It has also been shown that the power of reinforcement is accentuated if the genetic basis of both pre- zygotic and post-zygotic isolation is sex-linked (Lemmon and Kirkpatrick 2006; Servedio and Saetre 2003). Similarly, selection for male secondary sexual characters is more efficient if the loci that affect the male traits and the female preference for that trait are inherited jointly (Reeve and Pfennig 2003). In organisms with female heterogamety the Z-chromosome is inhe- rited directly from father to son, which reduces the risk of random mixing of male trait and female preference alleles (Reeve and Pfennig 2003). In addi- tion, it is likely that male secondary sexual characters are deleterious if ex- pressed in females. Part of this antagonism can probably be resolved if sex- ually antagonistic traits are sex-linked (Arnqvist and Rowe 2005). By assessing the level of genetic differentiation of 74 Z-linked markers we aimed to find loci that might have been involved in the build-up of genetic incompatibilities between the pied and the collared flycatcher and therefore facilitated the speciation process.

We sampled 10 pied flycatchers from Germany (allopatry), 21 pied flycatch- ers from Öland (sympatry), 10 collared flycatchers from Hungary (allopatry) and 8 collared flycatchers from Öland (sympatry). These birds were re- sequenced for 74 Z-linked introns located in genes evenly distributed along the Z-chromsome. Twenty-three of the genes were selected from a previous study (Backstrom et al. 2006). Fifty-one gene markers were developed by applying the exon-primed intron-crossing (EPIC) approach (Palumbi and Baker 1994). We selected exons that flanked introns in the size range 500– 1,000 bp from regions on the chicken Z-chromsome not yet covered by the initial 23 markers. The chicken exons were aligned to the orthologous exons in zebra finch and primers were designed to anneal to conserved regions. Individual haplotypes were inferred and locus specific FST-estimates were calculated. We also estimated the level of nucleotide diversity and calculated Tajima’s D and Fay and Wu’s H to assess patterns of molecular evolution for the different regions.

53 The sequencing effort yielded 40.5 kb of sequence data and we found 34 fixed and 46 shared polymorphisms between the species. Seven loci were found to have a higher than expected level of differentiation according to a Bayesian analysis. Interestingly, two of these regions harbor genes that have been associated with plumage coloration in chicken and Japanese quail (Gunnarsson et al. 2007; Nadeau et al. 2007). There was some overlap be- tween genes with high degree of differentiation and genetic diversity was generally not higher in sympatry than in allopatry. When analyzing com- bined data and allopatric and sympatric populations separately only one gene showed significantly high differentiation between sympatric populations but low differentiation between allopatric populations This might suggest that time in sympatry have not been sufficient to result in a detectable signal from selection against hybrids or that the progress of speciation is too far gone for differences among sympatric and allopatric populations to be de- tected.

We found a higher degree of genetic variation in the collared flycatcher. This pattern was observed both in allopatric and sympatric populations. In addi- tion, Tajima’s D values were generally positive in the collared flycatcher in contrast to the generally negative values found in the pied flycatcher. This point towards different demographic scenarios for the two species following the recolonization of Europe from refugia after the last glaciation event. A negative Tajima’s D is expected if populations have been through a bottle- neck (refuge) and this has been followed by range expansion (recoloniza- tion) and population growth. This scenario should be true for both species according to their current distribution ranges. However, a higher diversity in the collared flycatcher indicates that the bottleneck effect was less severe than in the pied flycatcher. Positive Tajima’s D values could either be a re- sult of founding from genetically divergent ancestral populations, e.g. col- lared flycatchers from different refuges or of unidirectional gene flow from the pied flycatcher to the collared flycatcher.

The main objective with this study was to scan the Z-chromsome in the pied flycatcher and the collared flycatcher for signs of adaptive evolution. We found evidence for increased between-species differentiation in seven re- gions. Interestingly, two of the regions house genes associated with plumage coloration.

54 Prospects for the future

The rapid improvement of quality and the steady increase in individual read length and total sequence yield for the second generation sequencing me- thods imply that transcriptome or full genome sequencing in any organisms will be comparatively cheap and reasonably straightforward within just a few years. This will obviously lead to increased possibilities to generate large-scale genotyping and re-sequencing data also for non-model species. As a result, mapping efforts in natural populations will not be limited by marker availability. However, the general difficulties in establishing exten- sive pedigree data and phenotypic measures in large samples that span sev- eral generations will remain. As a consequence, the most likely scenario is that mapping procedures in natural populations will be focused on popula- tion samples and association approaches rather than traditional segregation analysis in pedigrees.

For populations that may be maintained in captivity, it will be possible to circumvent the difficulties encountered in natural settings and QTL-mapping might still be the approach of choice. A caveat with captive populations is that individuals in captivity might display phenotypes that differ markedly from what had been found in natural settings. Additionally, with very dense marker maps the resolution of individual markers is dependent on a high number of informative meioses and extensive pedigrees will be needed.

One approach that could be increasingly important is candidate gene investi- gations. With the accumulation of genome sequences from an increasing number of species the possibilities to conduct mapping and functional analy- sis of genes will be enhanced. With increasing availability of functional data from many species, it will also be more likely that the genetic background to a trait of interest has been described in another species previously. Instead of scanning the entire genome one might therefore be able to do detailed stu- dies of candidate loci immediately. The sparse data at hand predicts that, unless studying very close relatives, the causative mutations are unlikely to be identical between species but the pathways might be more conserved. An economic strategy would therefore be to scan candidate genes in a particular pathway before scanning the entire genome. This approach benefits from that one circumvents a large part of the problems with multiple testing. Of course, there has to be several tests conducted to screen a set of candidates

55 but it is far from comparable to the copious amounts of tests needed for a genome-wide association scan in a species with an average sized genome and an average level of LD.

In addition to the more or less classical mapping strategies mentioned above, increased access to a variety of genetic tools suggests that selective sweep mapping will be more efficient. Although this approach does not include any beforehand knowledge of the phenotype, the characterization of the loci under adaptive evolution could establish which trait that is affected later on.

For all mapping approaches one shortcoming is that the conclusions are based on associations only. Even if the level of association between a geno- type and a phenotype is 100% one cannot rule out the possibility that there are other causative loci in the genome, not included in the scan. Hence, no matter which approach taken, the ultimate test for a causative relationship between a genetic variant and a phenotype is dependent on verification through transgenic, knock-out or similar experiments. For many species this kind of verification will be very difficult, consider for example the ethical considerations behind conducting knock-outs in humans. Unless alternative experimental methods can be applied, the genotype – phenotype relations therefore will remain being just associations.

The lack of general conclusions from disease mapping efforts in human pop- ulations could be discouraging. How would it ever be possible to generate general understanding of genetics of quantitative traits in natural populations if it is not possible in humans? Presumably it will not. There will be very few traits that are determined by the presence of the same alleles in different environments. Consequently, it is likely that the genetic basis of traits has to be determined for single populations specifically and for different environ- mental conditions. That will indeed be a challenge. In addition, it is increa- singly evident that levels of expression and translation rather than structure are important for the phenotype and that there is a lot of plasticity, also in phenotypes that affect fitness. Hence, besides traditional mapping of DNA variants, functional analyses are likely to increasingly incorporate proteo- mics, micro-RNA analysis and studies of other modes of epigenetic control.

56 Svensk sammanfattning

Ett av huvudmålen för evolutionsbiologer av idag är att kartlägga de genetis- ka komponenter som ligger till grund för karaktärer (fenotyper) som har betydelse för individers överlevnad och reproduktiva framgång i naturliga miljöer. Först när det är känt kan vi få komplett förståelse för hur evolutio- nen verkar. I välkaraktäriserade modellorganismer som människa, mus, backtrav och olika husdjur och grödor börjar det i viss mån bli känt vilken effekt olika gener och regulatoriska element har, framförallt för karaktärer som har med hälsa och matproduktion att göra. Dock är det oftast många olika regioner (lokus) i arvsmassan involverade i en specifik karaktär och det finns också generellt en stor miljömässig påverkan vilket innebär att analyser i olika populationer inte nödvändigtvis resulterar i att samma regioner indi- keras som betydelsefulla för karaktären, snarare tvärtom. Det gör det svårt att hitta generella mönster. I naturliga populationer har genkartering precis påbörjats och det är egentligen bara i arter som kan hållas i fångenskap eller i populationer som är begränsade till en mycket begränsad area som det har varit möjligt att kartlägga några av de regioner i arvsmassan som har bety- delse för en viss karaktär.

Fram tills nu har vi alltså egentligen ingen kunskap alls om den genetiska bakgrunden till de karaktärer som är intressanta ur en evolutionär synvinkel. Är det varianter i de proteinkodande delarna av gener eller i de regulatoriska element som styr mängden genuttryck och alltså i förlängningen kvantiteten protein som tillverkas i cellerna? Är det oftast en eller flera regioner som påverkar, hur stor effekt har ett visst lokus och har olika regioner kombinato- riska effekter (epistatiska effekter) eller är det så att vissa gener har en på- verkan på flera karaktärer (pleiotropiska effekter)? Alla de här frågorna är i stort sett obesvarade. För att nå målen, dvs för att hitta genetiska komponen- ter som styr evolutionärt intressanta karaktärer finns ett antal olika strategier. I traditionell husdjursgenetik används information om släktskap för att följa nedärvningen av såväl genotyper som fenotyper mellan generationer (QTL mappning) och tack vare att rekombination bara sker maximalt ett fåtal gånger per kromosom och reduktionsdelning (meios), går det att hitta kopplingar mellan genetiska markörer och orsakande genetiska regioner över relativt långa avstånd. Upplösningen är dock generellt dålig och det kan vara svårt att hitta fram till den exakta positionen i genomet bara genom segrega- tionsstudier med släktskapsdata. En alternativ och komplementär strategi är

57 att använda sig av obesläktade individer från två (eller fler) olika grupper som uppvisar olika fenotyper. Genom att karaktärisera ett stort antal variabla genetiska markörer i de olika grupperna går det att analysera korrelationen (associationen) mellan en viss fenotyp och de olika genetiska varianterna (allelerna) för ett visst lokus, det kallas för associationsstudie. Associations- studier har fördelen att de ger en mycket högre upplösning än segregations- studier på grund av att de många historiska rekombinationshändelser som har skett under populationens levnad har brutit upp associationer över längre distanser. Nackdelen med associationsstudier är att det krävs en stor mängd genetiska markörer och att varje enskild markör testas individuellt vilket gör det svårt att få någon högre styrka i den statistiska analysen, man måste kor- rigera för multipla tester.

Det finns flera analysmetoder som bygger på populationsdata. Alla har gemensamt att man letar efter mönster i den genetiska variationen inom och mellan populationer som tyder på att en region är eller har varit under selek- tion. Man antar alltså att det lokus man letar efter har en effekt på fenotypen som gör att olika alleler ger olika konsekvenser för bäraren, individer skiljer sig åt i exempelvis överlevnad eller reproduktiv framgång. Fördelaktiga alle- ler tenderar därmed att öka i frekvens i populationen genom processen med naturligt urval, bättre rustade individer får fler avkommor. Genom att titta på många lokus över hela genomet kan man hitta regioner som verkar ha varit påverkade av selektion, dock kan man inte veta vilken fenotyp det lokuset påverkar innan man har gjort en associationsstudie för regionen.

Halsbandsflugsnappare (Ficedula albicollis) är en av fyra närbesläktade flugsnappararter som häckar i Europa och som antas härstamma från en gemensam anfader vars population delades upp i flera olika refuger under de olika istider som har dominerat Europa under den senaste miljonen år unge- fär. Förutom halsbandsflugsnappare häckar i Sverige svartvit flugsnappare (Ficedula hypoleuca) som antas vara halsbandsflugsnapparens närmast nu levande släkting. På Öland och Gotland, som är de enda landskap i Sverige där halsbandsflugsnappare förekommer regelbundet (svartvit är vanlig i hela landet utom i högsta fjällkedjan), häckar båda arterna parallellt (sympatri) och de populationerna har följts av zooekologer i över 25 år. Därmed finns en enorm databank som beskriver karaktärer som har betydelse för de olika individernas fitness (hur bra de klarar tillvaron i jämförelse med andra indi- vider, exempelvis genom överlevnad och reproduktionsframgång) och det finns också blod insamlat för en stor del av de individer som årligen häckar på öarna vilket har gjort att vi kunnat utröna släktskap (pedigree) mellan individer och samla släkter med ett större antal avkommor för segregations- analys.

58 För att kunna utföra genetiska analyser i en art som inte tidigare har någon karaktäriserad arvsmassa krävs att man utvecklar genetiska markörer. Gene- tiska markörer finns i ett antal olika varianter och vi har valt att i huvudsak jobba med enskilda nukleotidvarianter (SNP, single nucleotide poly- morphism) i icke-kodande delar av gener (introner). Tack vare att tamhönans genom är karaktäriserat finns sekvensinformation för alla regioner att hämta. Dock var det ungefär 100 miljoner år sedan tamhöna och halsbands- flugsnappare hade en gemensam anfader och många nya mutationer har ac- kumulerats i deras respektive genom sedan dess. Vi använde oss av sekvens- jämförelser mellan kyckling och andra karaktäriserade vertebrater för att hitta regioner i gener som var konserverade mellan arterna och använde des- sa regioner för att med hjälp av rekombinant DNA teknik som PCR (poly- merase chain reaction) och sangersekvensering karaktärisera de korrespon- derande (orthologa) regionerna i halsbandsflugsnappare. Dessa regioner (markörer) använde vi sedan i analyserna.

Den här avhandlingen omfattar genkartering, uppskattning av kopplingso- jämvikt (linkage disequilibrium, LD) och populationsgenetisk analys av halsbands- och svartvit flugsnappare. Huvudmålet har varit att utveckla ge- netiska markörer och genkartor att använda som verktyg i senare QTL mappning i halsbandsflugsnapparpopulationerna på Öland och Gotland samt att undersöka graden av kopplingsojämvikt för att få en uppfattning om möj- ligheterna att göra associationsstudier. Dessutom var målsättningen att de- taljstudera Z-kromosomen för att se om det gick att finna regioner som kan vara viktiga för artbildningsprocessen mellan halsbands- och svartvit flug- snappare. Det senare på grund av att tidigare studier i arten har indikerat att Z-kromosomen verkar hysa lokus som är viktiga för sexuellt urval i hanar och för preferenser för vissa hanliga karaktärer hos honor.

I fem olika studier presenteras en Z-kromosom karta bestående av 23 gene- tiska markörer (Paper I), en autosomal genkarta bestående av 211 kopplade och 31 okopplade markörer (Paper III), en uppsättning om 242 genbaserade markörer för ett brett användningsområde inom exempelvis genkartering, fylogenetiska och populationsgenetiska studier och artbildningsstudier (Pa- per II), en uppskattning av graden av kopplingsojämvikt på Z-kromosomen (Paper IV) och en detaljerad screening av inom- och mellanartsvariation längs Z-kromosomen för 74 gener (Paper V). Huvudresultaten innefattar; att det är möjligt att utveckla detaljerade och vältäckande genkartor även i na- turliga populationer, att graden av konservering mellan olika fågelarter är extrem i jämförelse med andra organismgrupper, att graden av kopplingso- jämvikt i halsbandsflugsnappare indikerar att det skulle krävas åtminstone 20,000 variabla markörer för att täcka genomet i en associationsstudie och att det finns regioner på Z-kromosomen som visar spår av positiv selektion, regioner som kan ha varit viktiga för upprättande av hybridiseringsbarriärer

59 mellan svartvita flugsnappare och halsbandsflugsnappare. Fortsatta studier kommer att inkludera kopplingen av genotyper i de gentiska kartorna till de fenotyper som har kartlagts genom åren. En försvårande omständighet är att en stor del av halsbandsflugsnapparna omkommer under migrationen till och från eller under tiden i vinterkvarteren i sydöstra Afrika, eller att de inte återvänder till den plats där de häckade sist eller där de föddes. Det går att på omvägar karaktärisera troliga fenotyper (breeding value) även för saknade individer genom att applicera specifika kvantitativa genetiska metoder (ani- mal model) men varianserna i estimaten är mycket stora och styrkan i de statistiska analyserna därmed ytterst liten. Det är därför troligt att framtida insatser i vilda populationer kommer att riktas mot populationsbaserade me- toder, framförallt på grund av den mycket snabba tekniska utvecklingen av metoder för DNA sekvensering och genotypning (karaktärisering av alleler) vilket kommer att göra det möjligt att uppnå den täckningsgrad som krävs för att göra en associationsstudie över ett helt genom till exempel. Det är också rimligt att anta att variationen mellan populationer kommer att vara jämförbar med vad som observeras i modellorganismer, dvs att olika lokus är olika viktiga i olika miljöer och därför behövs troligen upprepade insatser i olika regioner för att få en helhetsbild över vilka lokus som har en effekt på de fenotyper som är viktiga för individernas överlevnad i olika naturliga miljöer.

60 Acknowledgements

Thank you,

Present and past staff at the Department of Evolutionary Biology, in particu- lar:

Hans for accepting me as a PhD-student and for providing a creative re- search environment and excellent supervision. Thanks also for your generos- ity regarding projects and conferences. It has been most interesting, stimulat- ing and fun to spend these years in your lab which I know it is at the fore- front of evolutionary biological research. I certainly hope that there will be opportunities for future work together.

Sofia, Erik, Micke, Matt, Hanna, Helene, Judith, Lina, Lujiang, Martin, Jo- han, Axel, Matthieu, KiWoong, Jochen, Harriet, Eleftheria and Carina in the Molecular Evolution Group. It has been great to take part of your expe- riences and knowledge and indeed beneficial and fun to work in such a re- sourceful environment. A special thanks to Sofia for fruitful and not so fruit- ful, but most often fun discussions about much besides science, to Erik for tirelessly keeping EBC-iskalla and the Friday bandy going, to Micke for unbelievable and priceless computer skills and shared interest in “the mean fish”, to Judith for being the fountain of knowledge, always helpful and will- ing to explain difficult stuff and to Johan for a lot of valuable input during the last weeks of writing. Very big thanks also to Nilla, Johanna S and Mia for important help in the lab and pleasant lunch breaks at Yukikos and nice company in general.

Emma, Anna-Karin and Santi for being the ultimate office crew. Especially stimulating were the rare but delicious Friday-whiskeys, exciting mixes of music styles, unlimited supply of Digestive crackers and the seasonal occur- rence of fresh lambs. Thanks also to Jorge and Ülo for re-awakening my slumbering bird-brain and for nice birding trips and lots of good laughs.

The Nilsson-Ehle Donation, The Helge Ax:son-Johnsons Foundation, The Wallenberg Foundation, The Lars Hierta Memorial Foundation, The Royal Swedish Academy of Science and The EBC graduate school on Genomes and Phenotypes for economical support.

61 Also thanks to:

Yxan, Johansson and Thomas for extremely important and always very fun expeditions to lakes, streams and seas in the late Terhi and the Väddö and other more or less well-suited wrecks in search for the ultimate species of the planet. Many places still to visit though. Liljefors, Bergman, Pelle - although occurring relatively rarely these days, tours with you are always fun and incredibly often successful, no matter if it involves birds, fish or in between. Åsa, Gurra, Hampus, Matilda, Anders, Irene, Bill, Alma, Daniel, Robban, Jenny, for precious friendship, times with you are indeed relaxing and refuel- ing.

Mom, dad, Fredrik and Andreas for all support during the early years and for giving me the freedom to follow all kinds of intuitions. Thanks for not being too upset about leaking ant farms in the garage, sour smell of salamanders in the living room, decaying pieces of dead pig in the trunk of the red SAAB and similar things, details that helped set the stage for becoming a true biol- ogy geek. Seriously, you really gave me the best platform possible, thank you.

Lars and Lena, the greatest parents in law - you are and have been such good and generous friends over the years. Sofia, Emma, Henrik, Nora and Ellen for enjoyable activities at all times.

Last, but most, Johanna, Ebba and Axel for being the best family one could ever wish for. Kommer ni ihåg?

62 References

Abasht, B., J.C.M. Dekkers, and S.J. Lamont. 2006. Review of quantitative trait loci identified in the chicken. Poultry Science 85: 2079-2096. Aitken, N., S. Smith, C. Schwarz, and P.A. Morin. 2004. Single nucleotide polymorphism (SNP) discovery in mammals: a targeted-gene ap- proach. Molecular Ecology 13: 1423-1431. Alatalo, R.V., D. Eriksson, L. Gustafsson, and A. Lundberg. 1990. Hybridi- zation between pied and collared flycatchers - sexual selection and speciation theory. Journal of Evolutionary Biology 3: 375-389. Alatalo, R.V., L. Gustafsson, and A. Lundberg. 1982. Hybridization and breeding success of collared and pied flycatchers on the island of Gotland. The Auk 99: 285-291. Andolfatto, P. and M. Przeworski. 2000. A genome-wide departure from the standard neutral model in natural populations of Drosophila. Genet- ics 156: 257-268. Ansari, H.A., N. Takagi, and M. Sasaki. 1988. Morphological differentiation of sex chromosomes in three species of ratite birds. Cytogenetic and Genome Research 47: 185-188. Arnqvist, G. and L. Rowe. 2005. Sexual conflict. Princeton University Press, Princeton, NJ. Auer, H., B. Mayr, M. Lambrou, and W. Schleger. 2004. An extended chicken karyotype, including the NOR chromosome. Cytogenetics and Cell Genetics 45: 218-221. Avise, J.C., W.S. Nelson, and C.G. Sibley. 1994. Why one-kilobase se- quences from mitochondrial DNA fail to solve the hoatzin phyloge- netic enigma. Molecular Phylogenetics and Evolution 3: 175-184. Axelsson, E., L. Hultin-Rosenberg, M. Brandström, M. Zwahlén, D.F. Clay- ton, and H. Ellegren. 2008. Natural selection in avian protein-coding genes expressed in brain. Molecular Ecology 17: 3008-3017. Axelsson, E., N.G. Smith, H. Sundstrom, S. Berlin, and H. Ellegren. 2004. Male-biased mutation rate and divergence in autosomal, z-linked and w-linked introns of chicken and turkey. Molecular Biology and Evolution 21: 1538-1547. Backström, N., M. Brandström, L. Gustafsson, A. Qvarnström, H. Cheng, and H. Ellegren. 2006. Genetic mapping in a natural population of collared flycatchers (Ficedula albicollis): conserved synteny but gene order rearrangements on the avian Z chromosome. Genetics 174: 377-386.

63 Bailey, J.A., R. Baertsch, W.J. Kent, D. Haussler, and E. Eichler. 2004. Hotspots of mammalian chromosome evolution. Genome Biology 5: R23. Barrett, J.C., B. Fry, J. Maller, and M.J. Daly. 2005. Haploview, analysis and visualization of LD and haplotype maps. Bioinformatics 21: 263- 265. Barton, N. 1996. Natural selection and random genetic drift as causes of evolution on islands. Philosophical Transactionc of the Royal Socie- ty B Biological Sciences 351: 785-795. Beaumont, M.A. and D.J. Balding. 2004. Identifying adaptive genetic diver- gence among populations from genome scans. Molecular Ecology 13: 969-980. Beaumont, M.A. and R.A. Nichols. 1996. Evaluating loci for use in the ge- netic analysis of population structure. Proceedings of the Royal So- ciety B-Biological Sciences 263: 1619-1626. Becquet, C. and M. Przeworski. 2007. A new approach to estimate parame- ters of speciation models with application to apes. Genome Research 17: 1505-1519. Bed'Hom, B., P. Coullin, Z. Guillier-Gencik, S. Moulin, A. Bernheim, and V. Volobouev. 2003. Characterization of the atypical karyotype of the black-winged kite Elanus caeruleus (Falconiformes: Accipitri- dae) by means of classical and molecular cytogenetic techniques. Chromosome Research 11: 335-343. Bell, P.A., S. Chaturvedi, C.A. Gelfand, C.Y. Huang, M. Kochersperger, R. Kopla, F. Modica, M. Pohl, S. Varde, R. Zhao, X. Zhao, M.T. Boyce-Jacino, and A. Yassen. 2002. SNPstream UHT: ultra-high throughput SNP genotyping for pharmacogenomics and drug dis- covery. Biotechniques: 76-77. Beraldi, D., A.F. McRae, J. Gratten, J. Slate, P.M. Visscher, and J.M. Pem- berton. 2006. Development of a linkage map and mapping of pheno- typic polymorphisms in a free-living population of Soay sheep (Ovis aries). Genetics 173: 1521-1537. Beraldi, D., A.F. McRae, J. Gratten, J. Slate, P.M. Visscher, and J.M. Pem- berton. 2007. Mapping quantitative trait loci underlying fitness- related traits in a free-living sheep population. Evolution 61: 1403- 1416. Borge, T., K. Lindroos, P. Nadvornik, A.C. Syvanen, and G.P. Saetre. 2005a. Amount of introgression in flycatcher hybrid zones reflects regional differences in pre and post-zygotic barriers to gene ex- change. Journal of Evolutionary Biology 18: 1416-1424. Borge, T., M.T. Webster, G. Andersson, and G.P. Saetre. 2005b. Contrasting patterns of polymorphism and divergence on the Z chromosome and autosomes in two Ficedula flycatcher species. Genetics 171: 1861- 1873. Boyle, A.S. and M.A. Noor. 2004. Variation in recombination rate may bias human genetic disease mapping studies. Genetica 122: 245-252.

64 Burt, D.W., C. Bruley, I.C. Dunn, C.T. Jones, A. Ramage, A.S. Law, D.R. Morrice, I.R. Paton, J. Smith, D. Windsor, A. Sazanov, R. Fries, and D. Waddington. 1999. The dynamics of chromosome evolution in birds and mammals. Nature 402: 411-413. Capanna, E. and R. Castiglia. 2004. Chromosomes and speciation in Mus musculus domesticus. Cytogenetic and Genome Research 105: 375- 384. Carlson, C.S., M.A. Eberle, M.J. Rieder, Q. Yi, L. Kruglyak, and D.A. Nick- erson. 2004. Selecting a maximally informative set of single- nucleotide polymorphisms for association analyses using linkage disequilibrium. American Journal of Human Genetics 74: 106-120. Carroll, S.B. 2008. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell 134: 25-36. Cichón, M., J. Sendecka, and L. Gustafsson. 2003. Age-related decline in humoral immune function in collared flycatchers. Journal of Evolu- tionary Biology 16: 1205-1210. Cogan, N.O., R.C. Ponting, A.C. Vecchies, M.C. Drayton, J. George, P.M. Dracatos, M.P. Dobrowolski, T.I. Sawbridge, K.F. Smith, G.C. Spangenberg, and J.W. Forster. 2006. Gene-associated single nuc- leotide polymorphism discovery in perennial ryegrass (Lolium pe- renne L.). Molecular Genetics and Genomics 276: 101-112. Conway, D.J., C. Roper, A.M. Oduola, D.E. Arnot, P.G. Kremsner, M.P. Grobusch, C.F. Curtis, and B.M. Greenwood. 1999. High recombi- nation rate in natural populations of Plasmodium falciparum. Pro- ceedings of the National Academy of Sciences U.S.A. 96: 4506-4511. Dawson, D.A., M. Akesson, T. Burke, J.M. Pemberton, J. Slate, and B. Hansson. 2007. Gene order and recombination rate in homologous chromosome regions of the chicken and a passerine Bird. Molecular Biology and Evolution 24: 1537-1552 Dawson, D.A., T. Burke, B. Hansson, J. Pandhal, M.C. Hale, G.N. Hinten, and J. Slate. 2006. A predicted microsatellite map of the passerine genome based on chicken-passerine sequence similarity. Molecular Ecology 15: 1299-1320. de Oliveira, E.H., F.A. Habermann, O. Lacerda, I.J. Sbalqueiro, J. Wienberg, and S. Muller. 2005. Chromosome reshuffling in birds of prey: the karyotype of the world's largest eagle (Harpy eagle, Harpia harpyja) compared to that of the chicken (Gallus gallus). Chromosoma 114: 338-343. del Hoyo, J., A. Elliot, and D.A. Christie. 2006. Handbook of the birds of the world. old world flycatchers to old world warblers. Lynx Edicions, Barcelona. Derjusheva, S., A. Kurganova, F. Habermann, and E. Gaginskaya. 2004. High chromosome conservation detected by comparative chromo- some painting in chicken, pigeon and passerine birds. Chromosome Research 12: 715-723.

65 Dimcheff, D.E., S.V. Drovetski, and D.P. Mindell. 2002. Phylogeny of Te- traoninae and other galliform birds using mitochondrial 12S and ND2 genes. Molecular Phylogenetics and Evolution 24: 203-215. Doerge, R.W. 2002. Mapping and analysis of quantitative trait loci in expe- rimental populations. Nature Reviews Genetics 3: 43-52. Doucet, S.M., M.D. Shawkey, M.K. Rathburn, H.L. Mays Jr, and R. Mont- gomerie. 2004. Concordant evolution of plumage colour, feather mi- crostructure and a melanocortin receptor gene between mainland and island populations of a fairy-wren. Proceedings of the Royal Society B-Biological Sciences 271: 1663-1670. Edwards, S.V. 2008. A smörgåsbord of markers for avian ecology and evo- lution. Molecular Ecology 17: 945-946. Edwards, S.V. and M. Dillon. 2004. Hitchhiking and recombination in birds: evidence from Mhc-linked and unlinked loci in red-winged black- birds (Agelaius phoeniceus). Genetical Research 84: 175-192. Edwards, S.V., W.B. Jennings, and A.M. Shedlock. 2005. Phylogenetics of modern birds in the era of genomics. Proceedings of the Royal So- ciety B-Biological Sciences 272: 979-992. Edwards, S.V., L. Liu, and D.K. Pearl. 2007. High-resolution species trees without concatenation. Proceedings of the National Academy of Science U S A 104: 5936-5941. Ellegren, H. 2004. Microsatellites: Simple sequences with complex evolu- tion. Nature Reviews Genetics 5: 435-445. Ellegren, H. 2005. The avian genome uncovered. Trends in Ecology & Evo- lution 20: 180-186. Ellegren, H., L. Gustafsson, and B.C. Sheldon. 1996. Sex ratio adjustment in relation to paternal attractiveness in a wild bird population. Proceed- ings of the National Academy of Science U S A 93: 11723-11728. Ellegren, H. and B.C. Sheldon. 2008. Genetic basis of fitness differences in natural populations. Nature 452: 169-175. Ericson, P.G., C.L. Anderson, T. Britton, A. Elzanowski, U.S. Johansson, M. Kallersjo, J.I. Ohlson, T.J. Parsons, D. Zuccon, and G. Mayr. 2006. Diversification of Neoaves: integration of molecular sequence data and fossils. Biology Letters 2: 543-547. Ericson, P.G. and U.S. Johansson. 2003. Phylogeny of Passerida (Aves: Pas- seriformes) based on nuclear and mitochondrial sequence data. Mo- lecular Phylogenetics and Evolution 29: 126-138. Fan, J.B., M.S. Chee, and K.L. Gunderson. 2006. Highly parallell genomic assays. Nature Reviews Genetics 7: 632-644. Fan, J.B., A. Oliphant, R. Shen, B.G. Kermani, F. Garcia, K.L. Gunderson, M. Hansen, F. Steemers, S.L. Butler, P. Deloukas, L. Galver, S. Hunt, C. McBride, M. Bibikova, T. Rubano, J. Chen, E. Wickham, D. Doucet, W. Chang, D. Campbell, B. Zhang, S. Kruglyak, D. Bentley, J. Haas, P. Rigault, L. Zhou, J. Stuelpnagel, and M.S. Chee. 2003. Highly parallel SNP genotyping. Cold Spring Harbor sympo- sia on quantitative biology 68: 69-78.

66 Fay, J.C. and C.-I. Wu. 2000. Hitchhiking under positive Darwinian selec- tion. Genetics 155: 1405-1413. Ferguson-Smith, M.A. and V. Trifonov. 2007. Mammalian karyotype evolu- tion. Nature Reviews Genetics 8: 950-962. Fillon, V., M. Morisson, R. Zoorob, C. Auffray, M. Douaire, J. Gellin, and A. Vignal. 1998. Identification of 16 chicken microchromosomes by molecular markers using two-colour fluorescence in situ hybridiza- tion (FISH). Chromosome Research 6: 307-313. Fillon, V., M. Vignoles, R.P. Crooijmans, M.A. Groenen, R. Zoorob, and A. Vignal. 2007. FISH mapping of 57 BAC clones reveals strong con- servation of synteny between Galliformes and Anseriformes. Animal Genetics 38: 303-307. Fitzpatrick, B.M. 2004. Rates of evolution of hybrid inviability in birds and mammals. Evolution 58: 1865-1870. Fleischmann, R.D., M.D. Adams, O. White, R.A. Clayton, E.F. Kirkness, A.R. Kerlavage, C.J. Bult, J.-F. Tomb, B.A. Dougherty, J.M. Mer- rick, K. McKenney, G. Sutton, W. FitzHugh, C. Fields, J.D. Go- cayne, J. Scott, R. Shirley, L.-I. Liu, A. Glodek, J.M. Kelley, J.F. Weidman, C.A. Phillips, T. Spriggs, E. Hedblom, M.D. Cotton, T.R. Utterback, M.C. Hanna, D.T. Nguyen, D.M. Saudek, R.C. Brandon, L.D. Fine, J.L. Fritchman, J.L. Fuhrmann, N.S.M. Geoghagen, C.L. Gnehm, L.A. McDonald, K.V. Small, C.M. Fraser, H.O. Smith, and J.C. Venter. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496-512. Freudenreich, C.H. 2007. Chromosome fragility: molecular mechanisms and cellular consequences. Frontiers in Bioscience 12: 4911-4924. Gibbs, R.A. J. Rogers M.G. Katze R. Bumgarner G.M. Weinstock E.R. Mardis K.A. Remington R.L. Strausberg J.C. Venter R.K. Wilson M.A. Batzer C.D. Bustamante E.E. Eichler M.W. Hahn R.C. Hardi- son K.D. Makova W. Miller A. Milosavljevic R.E. Palermo A. Sie- pel J.M. Sikela T. Attaway S. Bell K.E. Bernard C.J. Buhay M.N. Chandrabose M. Dao C. Davis K.D. Delehaunty Y. Ding H.H. Dinh S. Dugan-Rocha L.A. Fulton R.A. Gabisi T.T. Garner J. Godfrey A.C. Hawes J. Hernandez S. Hines M. Holder J. Hume S.N. Jhan- giani V. Joshi Z.M. Khan E.F. Kirkness A. Cree R.G. Fowler S. Lee L.R. Lewis Z. Li Y.S. Liu S.M. Moore D. Muzny L.V. Nazareth D.N. Ngo G.O. Okwuonu G. Pai D. Parker H.A. Paul C. Pfannkoch C.S. Pohl Y.H. Rogers S.J. Ruiz A. Sabo J. Santibanez B.W. Schneider S.M. Smith E. Sodergren A.F. Svatek T.R. Utterback S. Vattathil W. Warren C.S. White A.T. Chinwalla Y. Feng A.L. Hal- pern L.W. Hillier X. Huang P. Minx J.O. Nelson K.H. Pepin X. Qin G.G. Sutton E. Venter B.P. Walenz J.W. Wallis K.C. Worley S.P. Yang S.M. Jones M.A. Marra M. Rocchi J.E. Schein R. Baertsch L. Clarke M. Csuros J. Glasscock R.A. Harris P. Havlak A.R. Jackson H. Jiang Y. Liu D.N. Messina Y. Shen H.X. Song T. Wylie L. Zhang E. Birney K. Han M.K. Konkel J. Lee A.F. Smit B. Ullmer H. Wang J. Xing R. Burhans Z. Cheng J.E. Karro J. Ma B. Raney X.

67 She M.J. Cox J.P. Demuth L.J. Dumas S.G. Han J. Hopkins A. Ka- rimpour-Fard Y.H. Kim J.R. Pollack T. Vinar C. Addo-Quaye J. Degenhardt A. Denby M.J. Hubisz A. Indap C. Kosiol B.T. Lahn H.A. Lawson A. Marklein R. Nielsen E.J. Vallender A.G. Clark B. Ferguson R.D. Hernandez K. Hirani H. Kehrer-Sawatzki J. Kolb S. Patil L.L. Pu Y. Ren D.G. Smith D.A. Wheeler I. Schenck E.V. Ball R. Chen D.N. Cooper B. Giardine F. Hsu W.J. Kent A. Lesk D.L. Nelson E. O'Brien W K. Prufer P.D. Stenson J.C. Wallace H. Ke X.M. Liu P. Wang A.P. Xiang F. Yang G.P. Barber D. Haussler D. Karolchik A.D. Kern R.M. Kuhn K.E. Smith and A.S. Zwieg. 2007. Evolutionary and biomedical insights from the rhesus macaque ge- nome. Science 316: 222-234. Gibbs, R.A. G.M. Weinstock M.L. Metzker D.M. Muzny E.J. Sodergren S. Scherer G. Scott D. Steffen K.C. Worley P.E. Burch G. Okwuonu S. Hines L. Lewis C. DeRamo O. Delgado S. Dugan-Rocha G. Miner M. Morgan A. Hawes R. Gill Celera R.A. Holt M.D. Adams P.G. Amanatides H. Baden-Tillson M. Barnstead S. Chin C.A. Evans S. Ferriera C. Fosler A. Glodek Z. Gu D. Jennings C.L. Kraft T. Nguyen C.M. Pfannkoch C. Sitter G.G. Sutton J.C. Venter T. Woo- dage D. Smith H.M. Lee E. Gustafson P. Cahill A. Kana L. Dou- cette-Stamm K. Weinstock K. Fechtel R.B. Weiss D.M. Dunn E.D. Green R.W. Blakesley G.G. Bouffard P.J. De Jong K. Osoegawa B. Zhu M. Marra J. Schein I. Bosdet C. Fjell S. Jones M. Krzywinski C. Mathewson A. Siddiqui N. Wye J. McPherson S. Zhao C.M. Fraser J. Shetty S. Shatsman K. Geer Y. Chen S. Abramzon W.C. Nierman P.H. Havlak R. Chen K.J. Durbin A. Egan Y. Ren X.Z. Song B. Li Y. Liu X. Qin S. Cawley K.C. Worley A.J. Cooney L.M. D'Souza K. Martin J.Q. Wu M.L. Gonzalez-Garay A.R. Jackson K.J. Kalafus M.P. McLeod A. Milosavljevic D. Virk A. Volkov D.A. Wheeler Z. Zhang J.A. Bailey E.E. Eichler E. Tuzun E. Birney E. Mongin A. Ureta-Vidal C. Woodwark E. Zdobnov P. Bork M. Suyama D. Torrents M. Alexandersson B.J. Trask J.M. Young H. Huang H. Wang H. Xing S. Daniels D. Gietzen J. Schmidt K. Ste- vens U. Vitt J. Wingrove F. Camara M. Mar Alba J.F. Abril R. Gui- go A. Smit I. Dubchak E.M. Rubin O. Couronne A. Poliakov N. Hubner D. Ganten C. Goesele O. Hummel T. Kreitler Y.A. Lee J. Monti H. Schulz H. Zimdahl H. Himmelbauer H. Lehrach H.J. Jacob S. Bromberg J. Gullings-Handley M.I. Jensen-Seaman A.E. Kwitek J. Lazar D. Pasko P.J. Tonellato S. Twigger C.P. Ponting J.M. Duarte S. Rice L. Goodstadt S.A. Beatson R.D. Emes E.E. Winter C. Webber P. Brandt G. Nyakatura M. Adetobi F. Chiaromonte L. El- nitski P. Eswara R.C. Hardison M. Hou D. Kolbe K. Makova W. Miller A. Nekrutenko C. Riemer S. Schwartz J. Taylor S. Yang Y. Zhang K. Lindpaintner T.D. Andrews M. Caccamo M. Clamp L. Clarke V. Curwen R. Durbin E. Eyras S.M. Searle G.M. Cooper S. Batzoglou M. Brudno A. Sidow E.A. Stone J.C. Venter B.A. Pay- seur G. Bourque C. Lopez-Otin X.S. Puente K. Chakrabarti S. -

68 terji C. Dewey L. Pachter N. Bray V.B. Yap A. Caspi G. Tesler P.A. Pevzner D. Haussler K.M. Roskin R. Baertsch H. Clawson T.S. Fu- rey A.S. Hinrichs D. Karolchik W.J. Kent K.R. Rosenbloom H. Trumbower M. Weirauch D.N. Cooper P.D. Stenson B. Ma M. Brent M. Arumugam D. Shteynberg R.R. Copley M.S. Taylor H. Riethman U. Mudunuri J. Peterson M. Guyer A. Felsenfeld S. Old S. Mockrin and F. Collins. 2004. Genome sequence of the brown Nor- way rat yields insights into mammalian evolution. Nature 428: 493- 521. Gratten, J., D. Beraldi, B.V. Lowder, A.F. McRae, P.M. Visscher, J.M. Pemberton, and J. Slate. 2007. Compelling evidence that a single nucleotide substitution in TYRP1 is responsible for coat-colour po- lymorphism in a free-living population of Soay sheep. Proceedings of the Royal Society B-Biological Sciences 274: 619-626. Green, P., K. Falls, and S. Crook. 1990. Documentation for CRIMAP, ver- sion 2.4., Washington Univ. School of Medicine, St. Louis, MO. Gregory, T.R., J.A. Nicol, H. Tamm, B. Kullman, K. Kullman, I.J. Leitch, B.G. Murray, D.F. Kapraun, J. Greilhuber, and M.D. Bennett. 2007. Eukaryotic genome size databases. Nucleic Acids Research 35: D332-338. Griffin, D.K., L.B. Robertson, H.G. Tempest, and B.M. Skinner. 2007. The evolution of the avian genome as revealed by comparative molecular cytogenetics. Cytogenetics and Genome Research 117: 64-77. Groenen, M.A., H.H. Cheng, N. Bumstead, B.F. Benkel, W.E. Briles, T. Burke, D.W. Burt, L.B. Crittenden, J. Dodgson, J. Hillel, S. Lamont, A.P. de Leon, M. Soller, H. Takahashi, and A. Vignal. 2000. A con- sensus linkage map of the chicken genome. Genome Research 10: 137-147. Gunnarsson, U., A.R. Hellström, M. Tixier-Boichard, F. Minvielle, B. Bed'Hom, S. Ito, P. Jensen, A. Rattink, A. Vereijken, and L. Anders- son. 2007. Mutations in SLC45A2 cause plumage color variation in chicken and Japanese quail. Genetics 175: 867-877. Gustafsson, L. 1987. Interspecific competition lowers fitness in collared flycatchers Ficedula albicollis: an experimental demonstration. Journal of Animal Ecology 68: 291-296. Gustafsson, L. and T. Pärt. 1990. Acceleration of senescence in the collared flycatcher Ficedula albicollis by reproductive costs. Nature 347: 279-281. Gustafsson, L., A. Qvarnström, and B.C. Sheldon. 1995. Trade-offs between life-history traits and a secondary sexual character in male collared flycatchers. Nature 375: 311-313. Gustafsson, L. and W.J. Sutherland. 1988. The costs of reproduction in the collared flycatcher Ficedula albicollis. Nature 335: 813-817. Guttenbach, M., I. Nanda, W. Feichtinger, J.S. Masabanda, D.K. Griffin, and M. Schmid. 2003. Comparative chromosome painting of chicken au- tosomal paints 1-9 in nine different bird species. Cytogenetic and Genome Research 103: 173-184.

69 Haavie, J., T. Borge, S. Bures, L.Z. Garamszegi, H.M. Lampe, J. Moreno, A. Qvarnström, J. Török, and G.P. Saetre. 2004. Flycatcher song in al- lopatry and sympatry, convergence, divergence and reinforcement. Journal of Evolutionary Biology 17: 227-237. Haddrill, P.R., K.R. Thornton, B. Charlesworth, and P. Andolfatto. 2005. Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Ge- nome Research 15: 790-799. Haldane, J.B.S. 1919. The combination of linkage values and the calculation of distances between the loci of linked factors. Journal of Genetics 8: 299-309. Haldane, J.B.S. 1922. Sex ratio and unisexual sterility in hybrid animals. Journal of Genetics 12: 279-281. Hale, M.C., H. Jensen, T.R. Birkhead, T. Burke, and J. Slate. 2008. A com- parison of synteny and gene order on the homologue of chicken chromosome 7 between two passerine species and between passe- rines and chicken. Cytogenetic and Genome Research 121: 120-129. Hansson, B., M. Akesson, J. Slate, and J.M. Pemberton. 2005. Linkage map- ping reveals sex-dimorphic map distances in a passerine bird. Pro- ceedings of the Royal Society B-Biological Sciences 272: 2289- 2298. Harr, B. 2006. Genomic islands of differentiation between house mouse subspecies. Genome Research 16: 730-737. Hartl, D.L. and A.G. Clark. 1997. Principles of Population Genetics. Sinauer Associates, Inc., Sunderland, MA. Hawthorne, D.J. and S. Via. 2001. Genetic linkage of ecological specializa- tion and reproductive isolation in pea aphids. Nature 412: 904-907. Heifetz, E.M., J.E. Fulton, N. O'Sullivan, H. Zhao, J.C. Dekkers, and M. Soller. 2005. Extent and consistency across generations of linkage disequilibrium in commercial layer chicken breeding populations. Genetics 171: 1173-1181. Hellborg, L. and H. Ellegren. 2003. Y chromosome conserved anchored tagged sequences (YCATS) for the analysis of mammalian male- specific DNA. Molecular Ecology 12: 283-291. Heuertz, M., E. De Paoli, T. Källman, H. Larsson, I. Jurman, M. Morgante, M. Lascoux, and N. Gyllenstrand. 2006. Multilocus patterns of nuc- leotide diversity, linkage disequilibrium and demographic history of Norway spruce [Picea abies (L.) Karst]. Genetics 174: 2095-2105. Hey, J. and R. Nielsen. 2004. Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genet- ics 167: 747-760. Hill, W.G. and A.R. Robertson. 1968. Linkage disequilibrium in finite popu- lations. Theoretical and Applied Genetics 38: 226-231. Hirschhorn, J.N. and M.J. Daly. 2005. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics 6: 95-108.

70 Hoekstra, H.E. and J.A. Coyne. 2007. The locus of evolution: Evo devo and the genetics of adaptation. Evolution 61: 995-1016. Hoekstra, H.E., K.E. Drumm, and M.W. Nachman. 2004. Ecological genet- ics of adaptive color polymorphism in pocket mice: geographic vari- ation in selected and neutral genes. Evolution 58: 1329-1341. Hoekstra, H.E., R.J. Hirschmann, R.A. Bundey, P.A. Insel, and J.P. Cross- land. 2006. A single amino acid mutation contributes to adaptive beach mouse color pattern. Science 313: 101-104. Hoffman, A.A. and L.H. Rieseberg. 2008. Revisiting the impact of inver- sions in evolution: from population genetic markers to drivers of adaptive shifts and speciation? Annual Review of Ecology, Evolu- tion, and Systematics 39: 21-42. Hudson, R.R. and N.L. Kaplan. 1985. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111: 147-164. Hutt, F.B. 1936. A tentative chromosome map. In Genetics of the Fowl. VI. Neue Forschungen in Tierzucht und Abstammungslebre (Duerst Festschrif) pp. 105-112. Verbands-druckerei, Bern. Hutt, F.B. 1960. New loci in the sex chromosome of the fowl. Heredity 15: 97-110. ICGSC. 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432: 695-716. ICPMC. 2004. A genetic variation map for chicken with 2.8 million single- nucleotide polymorphisms. Nature 432: 717-722. IHMC. 2005. A haplotype map of the human genome. Nature 437: 1299- 1320. Itoh, Y. and A.P. Arnold. 2005. Chromosomal polymorphism and compara- tive painting analysis in the zebra finch. Chromosome Research 13: 47-56. Itoh, Y., K. Kampf, and A.P. Arnold. 2006. Comparison of the chicken and zebra finch Z chromosomes shows evolutionary rearrangements. Chromosome Research 14: 805-815. Jeffreys, A.J., L. Kauppi, and R. Neumann. 2001. Intense punctuate meiotic recombination in the class II region of the major histocompatibility complex. Nature Genetics 29: 109-111. Jeffreys, A.J., A. Ritchie, and R. Neumann. 2000. High resolution analysis of haplotype diversity and meiotic crossover in the human TAP2 re- combination hotspot. Human Molecular Genetics 9: 725-733. Jones, G.H. and C. Franklin. 2006. Meiotic crossing-over: obligation and interference. Cell 126: 246-248. Karaiskou, N., L. Buggiotti, E.H. Leder, and C.R. Primmer. 2008. High de- gree of transferability of 86 newly developed zebra finch EST-linked microsatellite markers in 8 bird species. Journal of Heredity 99: 688-693 Kasai, F., C. Garcia, M.V. Arruga, and M.A. Ferguson-Smith. 2003. Chro- mosome homology between chicken (Gallus gallus domesticus) and

71 the red-legged partridge (Alectoris rufa); evidence of the occurrence of a neocentromere during evolution. Cytogenetic and Genome Re- search 102: 326-330. Kauppi, L., A. Sajantila, and A.J. Jeffreys. 2003. Recombination hotspots rather than population history dominate linkage disequilibrium in the MHC class II region. Human Molecular Genetics 12: 33-40. Kayang, B.B., V. Fillon, M. Inoue-Murayama, M. Miwa, S. Leroux, K. Feve, J.L. Monvoisin, F. Pitel, M. Vignoles, C. Mouilhayrat, C. Beaumont, S. Ito, F. Minvielle, and A. Vignal. 2006. Integrated maps in quail (Coturnix japonica) confirm the high degree of synte- ny conservation with chicken (Gallus gallus) despite 35 million years of divergence. Bmc Genomics 7. Kayang, B.B., A. Vignal, M. Inoue-Murayama, M. Miwa, J.L. Monvoisin, S. Ito, and F. Minvielle. 2004. A first-generation microsatellite linkage map of the Japanese quail. Animal Genetics 35: 195-200. Kehrer-Sawatzki, H. and D.N. Cooper. 2007. Structural divergence between the human and chimpanzee genomes. Human Genetics 120: 759- 778. Kikuchi, S., D. Fujima, S. Sasazaki, S. Tsuji, M. Mizutani, A. Fujiwara, and H. Mannen. 2005. Construction of a genetic linkage map of Japanese quail (Coturnix japonica) based on AFLP and microsatellite mark- ers. Animal Genetics 36: 227-231. Kim, Y. and W. Stephan. 2002. Detecting a local signature of genetic hit- chhiking along a recombining chromosome. Genetics 160: 765-777. Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge. Klug, W.S. and M.R. Cummings. 2000. Concepts of genetics. Prentice Hall, Inc., Upper Saddle River, New Jersey. Kohn, M., J. Hogel, W. Vogel, P. Minich, H. Kehrer-Sawatzki, J.A. Graves, and H. Hameister. 2006. Reconstruction of a 450-My-old ancestral vertebrate protokaryotype. Trends in Genetics 22: 203-210. Kong, A., D.F. Gudbjartsson, J. Sainz, G.M. Jonsdottir, S.A. Gudjonsson, B. Richardsson, S. Sigurdardottir, J. Barnard, B. Hallbeck, G. Masson, A. Shlien, S.T. Palsson, M.L. Frigge, T.E. Thorgeirsson, J.R. Gulch- er, and K. Stefansson. 2002. A high-resolution recombination map of the human genome. Nature Genetics 31: 241-247. Kosambi, D.D. 1944. The estimation of map distances from recombination values. Annals of Eugenics 12: 172-175. Kruglyak, L. 1999. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nature Genetics 22: 139-144. Lander, E.S. and D. Botstein. 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185-199. Lander, E.S. L.M. Linton B. Birren C. Nusbaum M.C. Zody J. Baldwin K. Devon K. Dewar M. Doyle W. FitzHugh R. Funke D. Gage K. Har- ris A. Heaford J. Howland L. Kann J. Lehoczky R. LeVine P. McE- wan K. McKernan J. Meldrim J.P. Mesirov C. Miranda W. Morris J. Naylor C. Raymond M. Rosetti R. Santos A. Sheridan C. Sougnez

72 N. Stange-Thomann N. Stojanovic A. Subramanian D. Wyman J. Rogers J. Sulston R. Ainscough S. Beck D. Bentley J. Burton C. Clee N. Carter A. Coulson R. Deadman P. Deloukas A. Dunham I. Dunham R. Durbin L. French D. Grafham S. Gregory T. Hubbard S. Humphray A. Hunt M. Jones C. Lloyd A. McMurray L. Matthews S. Mercer S. Milne J.C. Mullikin A. Mungall R. Plumb M. Ross R. Shownkeen S. Sims R.H. Waterston R.K. Wilson L.W. Hillier J.D. McPherson M.A. Marra E.R. Mardis L.A. Fulton A.T. Chinwalla K.H. Pepin W.R. Gish S.L. Chissoe M.C. Wendl K.D. Delehaunty T.L. Miner A. Delehaunty J.B. Kramer L.L. Cook R.S. Fulton D.L. Johnson P.J. Minx S.W. Clifton T. Hawkins E. Branscomb P. Predki P. Richardson S. Wenning T. Slezak N. Doggett J.F. Cheng A. Ol- sen S. Lucas C. Elkin E. Uberbacher M. Frazier R.A. Gibbs D.M. Muzny S.E. Scherer J.B. Bouck E.J. Sodergren K.C. Worley C.M. Rives J.H. Gorrell M.L. Metzker S.L. Naylor R.S. Kucherlapati D.L. Nelson G.M. Weinstock Y. Sakaki A. Fujiyama M. Hattori T. Yada A. Toyoda T. Itoh C. Kawagoe H. Watanabe Y. Totoki T. Taylor J. Weissenbach R. Heilig W. Saurin F. Artiguenave P. Brottier T. Bruls E. Pelletier C. Robert P. Wincker D.R. Smith L. Doucette- Stamm M. Rubenfield K. Weinstock H.M. Lee J. Dubois A. Rosen- thal M. Platzer G. Nyakatura S. Taudien A. Rump H. Yang J. Yu J. Wang G. Huang J. Gu L. Hood L. Rowen A. Madan S. Qin R.W. Davis N.A. Federspiel A.P. Abola M.J. Proctor R.M. Myers J. Schmutz M. Dickson J. Grimwood D.R. Cox M.V. Olson R. Kaul C. Raymond N. Shimizu K. Kawasaki S. Minoshima G.A. Evans M. Athanasiou R. Schultz B.A. Roe F. Chen H. Pan J. Ramser H. Le- hrach R. Reinhardt W.R. McCombie M. de la Bastide N. Dedhia H. Blocker K. Hornischer G. Nordsiek R. Agarwala L. Aravind J.A. Bailey A. Bateman S. Batzoglou E. Birney P. Bork D.G. Brown C.B. Burge L. Cerutti H.C. Chen D. Church M. Clamp R.R. Copley T. Doerks S.R. Eddy E.E. Eichler T.S. Furey J. Galagan J.G. Gilbert C. Harmon Y. Hayashizaki D. Haussler H. Hermjakob K. Hokamp W. Jang L.S. Johnson T.A. Jones S. Kasif A. Kaspryzk S. Kennedy W.J. Kent P. Kitts E.V. Koonin I. Korf D. Kulp D. Lancet T.M. Lowe A. McLysaght T. Mikkelsen J.V. Moran N. Mulder V.J. Polla- ra C.P. Ponting G. Schuler J. Schultz G. Slater A.F. Smit E. Stupka J. Szustakowski D. Thierry-Mieg J. Thierry-Mieg L. Wagner J. Wal- lis R. Wheeler A. Williams Y.I. Wolf K.H. Wolfe S.P. Yang R.F. Yeh F. Collins M.S. Guyer J. Peterson A. Felsenfeld K.A. Wetter- strand A. Patrinos M.J. Morgan P. de Jong J.J. Catanese K. Osoega- wa H. Shizuya S. Choi and Y.J. Chen. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921. Leder, E.H., N. Karaiskou, and C.R. Primmer. 2008. 70 new microsatellites for the pied flycatcher Ficedula hypoleuca and amplification in other passerine birds. Molecular Ecology Resources 8: 874-880. Lemmon, A.R. and M. Kirkpatrick. 2006. Reinforcement and the genetics of hybrid incompatibilities. Genetics 173: 1145-1155.

73 Lewontin, R.C. and K. Kojima. 1960. The evolutionary dynamics of com- plex polymorphisms. Evolution 14: 458-472. Lindblad-Toh, K. C.M. Wade T.S. Mikkelsen E.K. Karlsson D.B. Jaffe M. Kamal M. Clamp J.L. Chang E.J. Kulbokas, 3rd M.C. Zody E. Mau- celi X. Xie M. Breen R.K. Wayne E.A. Ostrander C.P. Ponting F. Galibert D.R. Smith P.J. DeJong E. Kirkness P. Alvarez T. Biagi W. Brockman J. Butler C.W. Chin A. Cook J. Cuff M.J. Daly D. DeCa- prio S. Gnerre M. Grabherr M. Kellis M. Kleber C. Bardeleben L. Goodstadt A. Heger C. Hitte L. Kim K.P. Koepfli H.G. Parker J.P. Pollinger S.M. Searle N.B. Sutter R. Thomas C. Webber J. Baldwin A. Abebe A. Abouelleil L. Aftuck M. Ait-Zahra T. Aldredge N. Al- len P. An S. Anderson C. Antoine H. Arachchi A. Aslam L. Ayotte P. Bachantsang A. Barry T. Bayul M. Benamara A. Berlin D. Bes- sette B. Blitshteyn T. Bloom J. Blye L. Boguslavskiy C. Bonnet B. Boukhgalter A. Brown P. Cahill N. Calixte J. Camarata Y. Cheshat- sang J. Chu M. Citroen A. Collymore P. Cooke T. Dawoe R. Daza K. Decktor S. DeGray N. Dhargay K. Dooley K. Dooley P. Dorje K. Dorjee L. Dorris N. Duffey A. Dupes O. Egbiremolen R. Elong J. Falk A. Farina S. Faro D. Ferguson P. Ferreira S. Fisher M. FitzGe- rald K. Foley C. Foley A. Franke D. Friedrich D. Gage M. Garber G. Gearin G. Giannoukos T. Goode A. Goyette J. Graham E. Grandbois K. Gyaltsen N. Hafez D. Hagopian B. Hagos J. Hall C. Healy R. Hegarty T. Honan A. Horn N. Houde L. Hughes L. Hunnicutt M. Husby B. Jester C. Jones A. Kamat B. Kanga C. Kells D. Khazano- vich A.C. Kieu P. Kisner M. Kumar K. Lance T. Landers M. Lara W. Lee J.P. Leger N. Lennon L. Leuper S. LeVine J. Liu X. Liu Y. Lokyitsang T. Lokyitsang A. Lui J. Macdonald J. Major R. Marabel- la K. Maru C. Matthews S. McDonough T. Mehta J. Meldrim A. Melnikov L. Meneus A. Mihalev T. Mihova K. Miller R. Mittelman V. Mlenga L. Mulrain G. Munson A. Navidi J. Naylor T. Nguyen N. Nguyen C. Nguyen T. Nguyen R. Nicol N. Norbu C. Norbu N. No- vod T. Nyima P. Olandt B. O'Neill K. O'Neill S. Osman L. Oyono C. Patti D. Perrin P. Phunkhang F. Pierre M. Priest A. Rachupka S. Raghuraman R. Rameau V. Ray C. Raymond F. Rege C. Rise J. Rogers P. Rogov J. Sahalie S. Settipalli T. Sharpe T. Shea M. Shee- han N. Sherpa J. Shi D. Shih J. Sloan C. Smith T. Sparrow J. Stalker N. Stange-Thomann S. Stavropoulos C. Stone S. Stone S. Sykes P. Tchuinga P. Tenzing S. Tesfaye D. Thoulutsang Y. Thoulutsang K. Topham I. Topping T. Tsamla H. Vassiliev V. Venkataraman A. Vo T. Wangchuk T. Wangdi M. Weiand J. Wilkinson A. Wilson S. Ya- dav S. Yang X. Yang G. Young Q. Yu J. Zainoun L. Zembek A. Zimmer and E.S. Lander. 2005. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438: 803-819. Liu, L. and D.K. Pearl. 2007. Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using esti- mated gene tree distributions. Systematic Biology 56: 504-514.

74 Lynch, M. and B. Walsh. 1998. Genetics and Analysis of Quantitative Traits. Sinauer Associates, Inc., Sunderland, Massachusetts. Lyons, L.A., T.F. Laughlin, N.G. Copeland, N.A. Jenkins, J.E. Womack, and S.J. O'Brien. 1997. Comparative anchor tagged sequences (CATS) for integrative mapping of mammalian genomes. Nature Genetics 15: 47-56. Masabanda, J.S., D.W. Burt, P.C. O'Brien, A. Vignal, V. Fillon, P.S. Walsh, H. Cox, H.G. Tempest, J. Smith, F. Habermann, M. Schmid, Y. Matsuda, M.A. Ferguson-Smith, R.P. Crooijmans, M.A. Groenen, and D.K. Griffin. 2004. Molecular cytogenetic definition of the chicken genome: the first complete avian karyotype. Genetics 166: 1367-1373. Maynard Smith, J. and J. Haigh. 1974. The hitch-hiking effect of a favoura- ble gene. Genetical Research 23: 23-35. McVean, G.A., S.R. Myers, S. Hunt, P. Deloukas, D.R. Bentley, and P. Donnelly. 2004. The fine-scale structure of recombination rate varia- tion in the human genome. Science 304: 581-584. Merilä, J., L.E.B. Kruuk, and B.C. Sheldon. 2001. Cryptic evolution in a wild bird population. Nature 412: 76-79. Merilä, J. and B.C. Sheldon. 2000. Lifetime reproductive success and herita- bility in nature. American Naturalist 155: 301-310. Meunier, J. and L. Duret. 2004. Recombination drives the evolution of GC- content in the human genome. Molecular Biology and Evolution 21: 984-990. Meyer, D. and G. Thompson. 2001. How selection shapes variation of the human major histocompatibility complex, a review. Annales of Hu- man Genetics 65: 1-26. MGSC. 2002. Initial sequencing and comparative analysis of the mouse ge- nome. Nature 420: 520-562. Mitchell-Olds, T., J.H. Willis, and D.B. Goldstein. 2007. Which evolutio- nary processes influence natural genetic variation for phenotypic traits? Nature Reviews Genetics 8: 845-856. Mundy, N.I. 2005. A window on the genetics of evolution: MC1R and plu- mage colouration in birds. Proceedings of the Royal Society B- Biological Sciences 272: 1633-1640. Mundy, N.I., N.S. Badcock, T. Hart, K. Scribner, K. Janssen, and N.J. Na- deau. 2004. Conserved genetic basis of a quantitative plumage trait involved in mate choice. Science 303: 1870-1873. Myers, S., L. Bottolo, C. Freeman, G. McVean, and P. Donnelly. 2005. A fine-scale map of recombination rates and hotspots across the human genome. Science 310: 321-324. Myers, S., C. Freeman, A. Auton, P. Donnelly, and G. McVean. 2008. A common sequence motif associated with recombination hot spots and genome instability in humans. Nature Genetics 40: 1124-1129 Myers, S., C.C. Spencer, A. Auton, L. Bottolo, C. Freeman, P. Donnelly, and G. McVean. 2006. The distribution and causes of meiotic recombi-

75 nation in the human genome. Biochemical Society Transactions 34: 526-530. Myers, S.R. and R.C. Griffiths. 2003. Bounds on the minimum number of recombination events in a sample history. Genetics 163: 375-394. Nachman, M.W. 2002. Variation in recombination rate across the genome: evidence and implications. Current Opinion in Genetics & Devel- opment 12: 657-663. Nachman, M.W. and G.A. Churchill. 1996. Heterogeneity in rates of recom- bination across the mouse genome. Genetics 142: 537-548. Nadeau, N.J., N.I. Mundy, D. Gourichon, and F. Minvielle. 2007. Associa- tion of a single-nucleotide substitution in TYRP1 with roux in Japa- nese quail (Coturnix japonica). Animal Genetics 38: 609-613. Nair, S., J.T. Williams, A. Brockman, L. Paiphun, M. Mayxay, P.N. Newton, J.P. Guthmann, F.M. Smithuis, T.T. Hien, N.J. White, F. Nosten, and T.J. Anderson. 2003. A selective sweep driven by pyrimetha- mine treatment in southeast Asian malaria parasites. Molecular Bi- ology and Evolution 20: 1526-1536. Namroud, M.-C., J. Beaulieu, N. Juge, J. Laroche, and J. Bousquet. 2008. Scanning the genome for gene single nucleotide polymorphisms in- volved in adaptive population differentiation in white spruce. Mole- cular Ecology 17: 3599-3613. Nanda, I., E. Karl, V. Volobouev, D.K. Griffin, M. Schartl, and M. Schmid. 2006. Extensive gross genomic rearrangements between chicken and Old World vultures (Falconiformes : Accipitridae). Cytogenetic and Genome Research 112: 286-295. Navarro, A. and N.H. Barton. 2003. Chromosomal speciation and molecular divergence: accelerated evolution in rearranged chromosomes. . Science 300: 321-324. Nielsen, R. 2005. Molecular signatures of natural selection. Annual Review of Genetics 39: 197-218. Nishida-Umehara, C., Y. Tsuda, J. Ishijima, J. Ando, A. Fujiwara, Y. Mat- suda, and D.K. Griffin. 2007. The molecular basis of chromosome orthologies and sex chromosomal differentiation in palaeognathous birds. Chromosome Research 15: 721-734. Noor, M.A., K.L. Grams, L.A. Bertucci, and J. Reiland. 2001. Chromosomal inversions and the reproductive isolation of species. Proceedings of the National Academy of Sciences U S A 98: 12084-12088. Nordborg, M., T.T. Hu, Y. Ishino, J. Jhaveri, C. Toomajian, H. Zheng, E. Bakker, P. Calabrese, J. Gladstone, R. Goyal, M. Jakobsson, S. Kim, Y. Morozov, B. Padhukasahasram, V. Plagnol, N.A. Rosenberg, C. Shah, J.D. Wall, J. Wang, K. Zhao, T. Kalbfleisch, V. Schulz, M. Kreitman, and J. Bergelson. 2005. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biology 3: e196. O'Brien, S.J., J.E. Womack, L.A. Lyons, K.J. Moore, N.A. Jenkins, and N.G. Copeland. 1993. Anchored reference loci for comparative genome mapping in mammals. Nature Genetics 3: 103-112.

76 Organ, C.L., A.M. Shedlock, A. Meade, M. Pagel, and S.V. Edwards. 2007. Origin of avian genome size and structure in non-avian dinosaurs. Nature 446: 180-184. Orr, H.A. 1997. Haldane's rule. Annual Review of Ecology and Systematics 28: 195-218. Orr, M.R. and T.B. Smith. 1998. Ecology and speciation. Trends in Ecology & Evolution 13: 502-506. Otto, S. and T. Lenormand. 2002. Resolving the paradox of sex and recom- bination. Nature Reviews Genetics 3: 252-261. Otto, S.P. 2000. Detecting the form of selection from DNA sequence data. Trends in Genetics 16: 526-529. Palumbi, S.R. and C.S. Baker. 1994. Contrasting population structure from nuclear intron sequences and mtDNA of humpback whales. Molecu- lar Biology and Evolution 11: 426-435. Pavlidis, P., S. Hutter, and W. Stephan. 2008. A population genomic ap- proach to map recent positive selection in model species. Molecular Ecology 17: 3585-3598. Peichel, C.L., K.S. Nereng, K.A. Ohgi, B.L.E. Cole, P.F. Colosimo, C.A. Buerkle, D. Schluter, and D.M. Kingsley. 2001. The genetic archi- tecture of divergence between threespine stickleback species. Nature 414: 901-905. Pontius, J.U., J.C. Mullikin, D.R. Smith, K. Lindblad-Toh, S. Gnerre, M. Clamp, J. Chang, R. Stephens, B. Neelam, N. Volfovsky, A.A. Schaffer, R. Agarwala, K. Narfstrom, W.J. Murphy, U. Giger, A.L. Roca, A. Antunes, M. Menotti-Raymond, N. Yuhki, J. Pecon- Slattery, W.E. Johnson, G. Bourque, G. Tesler, and S.J. O'Brien. 2007. Initial sequence and comparative analysis of the cat genome. Genome Research 17: 1675-1689. Presgraves, D.C. 2008. Sex-chromosomes and speciation in Drosophila. Trends in Genetics 24: 336-343. Price, T.D. and M.M. Bouvier. 2002. The evolution of F1 postzygotic in- compatibilities in birds. Evolution 56: 2083-2089. Primmer, C.R., T. Borge, J. Lindell, and G.P. Saetre. 2002. Single- nucleotide polymorphism characterization in species with limited available sequence information: high nucleotide diversity revealed in the avian genome. Molecular Ecology 11: 603-612. Primmer, C.R., T. Raudsepp, B.P. Chowdhary, A.P. Möller, and H. Ellegren. 1997. Low frequency of microsatellites in the avian genome. Ge- nome Research 7: 471-482. Ptak, S.E., D.A. Hinds, K. Koehler, B. Nickel, N. Patil, D.G. Ballinger, M. Przeworski, K.A. Frazer, and S. Paabo. 2005. Fine-scale recombina- tion patterns differ between chimpanzees and humans. Nature Ge- netics 37: 429-434. Ptak, S.E., A.D. Roeder, M. Stephens, Y. Gilad, S. Paabo, and M. Przewors- ki. 2004. Absence of the TAP2 human recombination hotspot in chimpanzees. PLoS Biology 2: e155.

77 Quinn, G.P. and M.J. Keough. 2002. Experimental Design and Data Analy- sis for Biologists. Cambridge University Press, Cambridge, UK. Qvarnström, A. 1997. Experimentally increased badge size increases male competition and reduces male parental care in the collared flycatch- er. Proceedings of the Royal Society B-Biological Sciences 264: 1225-1231. Qvarnström, A. and R.I. Bailey. 2008. Speciation through evolution of sex- linked genes. Heredity in press Qvarnström, A., J. Haavie, S.A. Saether, S.A. Eriksson, and T. Pärt. 2006. Song similarity predicts hybridization in flycatchers. Journal of Evo- lutionary Biology 19: 1202-1209. Raudsepp, T., M.L. Houck, P.C. O'Brien, M.A. Ferguson-Smith, O.A. Ryd- er, and B.P. Chowdhary. 2002. Cytogenetic analysis of California condor (Gymnogyps californianus) chromosomes: comparison with chicken (Gallus gallus) macrochromosomes. Cytogenetic and Ge- nome Research 98: 54-60. Reed, K.M., L.D. Chaves, M.K. Hall, T.P. Knutson, and D.E. Harry. 2005. A comparative genetic map of the turkey genome. Cytogenetic and Genome Research 111: 118-127. Reeve, H.K. and D.W. Pfennig. 2003. Genetic biases for showy males: are some genetic systems especially conductive to sexual selection? Proceedings of the National Academy of Science U S A 100: 1089- 1094. Reich, D.E., M. Cargill, S. Bolk, J. Ireland, P.C. Sabeti, D.J. Richter, T. La- very, R. Kouyoumjian, S.F. Farhadian, R. Ward, and E.S. Lander. 2001. Linkage disequilibrium in the human genome. Nature 411: 199-204. Remington, D.L., J.M. Thornsberry, Y. Matsuoka, L.M. Wilson, S.R. Whitt, J. Doebley, S. Kresovich, M.M. Goodman, and E.S. Buckler IV. 2001. Structure of linkage disequilibrium and phenotypic associa- tions in the maize genome. Proceedings of the National Academy of Sciences U S A 98: 11479-11484. Rieseberg, L. 2001. Chromosomal rearrangements and speciation. Trends in Ecology & Evolution 16: 351-358. Rodell, C.F., M.R. Schippe, and D.K. Keenan. 2004. Modes of selection and recombination response in Drosophila melanogaster. Journal of He- redity 95: 70-75. Roskaft, E., T. Järvi, N.E.I. Nyholm, M. Vitrolainen, W. Winkel, and H. Zang. 1986. Geographic variation in secondary sexual plumage cha- racteristics of the male pied flycatcher. Ornis Scandinavica 17: 293- 298. Ross-Ibarra, J. 2004. The evolution of recombination under domestication: a test of two hypotheses. American Naturalist 163: 105-112. Saether, S.A., G.-P. Saetre, T. Borge, C. Wiley, N. Svedin, G. Andersson, T. Veen, J. Haavie, M.R. Servedio, S. Bures, M. Kral, M.B. Hjernquist, L. Gustafsson, J. Traff, and A. Qvarnstrom. 2007. Sex chromosome-

78 linked species recognition and evolution of reproductive isolation in flycatchers. Science 318: 95-97. Saetre, G.P., T. Borge, J. Lindell, T. Moum, C.R. Primmer, B.C. Sheldon, J. Haavie, A. Johnsen, and H. Ellegren. 2001a. Speciation, introgres- sive hybridization and nonlinear rate of molecular evolution in fly- catchers. Molecular Ecology 10: 737-749. Saetre, G.P., T. Borge, K. Lindroos, J. Haavie, B.C. Sheldon, C. Primmer, and A.C. Syvanen. 2003. Sex chromosome evolution and speciation in Ficedula flycatchers. Proceedings of the Royal Society B- Biological Sciences 270: 53-59. Saetre, G.P., T. Borge, and T. Moum. 2001b. A new bird species? The tax- onomic status of 'the atlas flycatcher' assessed from DNA sequence analysis. Ibis 143: 494-497. Saetre, G.P., K. Král, S. Bures, and R.A. Ims. 1999. Dynamics of a clinal hybrid zone and a comparison with island hybrid zones of flycatch- ers (Ficedula hypoleuca and F. albicollis). Journal of Zoology 247: 53-64. Saetre, G.P., T. Moum, S. Bures, M. Král, M. Adamjan, and J. Moreno. 1997. A sexually selected character displacement in flycatchers rein- forces premating isolation. Nature 387: 589-592. Schlötterer, C. 2002. A microsatellite-based multilocus screen for the identi- fication of local selective sweeps. Genetics 160: 753-763. Schlötterer, C. 2003. Hitchhiking mapping - functional genomics from the population genetics perspective. Trends in Genetics 19: 32-38. Schmid, M., I. Nanda, H. Hoehn, M. Schartl, T. Haaf, J.M. Buerstedde, H. Arakawa, R.B. Caldwell, S. Weigend, D.W. Burt, J. Smith, D.K. Griffin, J.S. Masabanda, M.A.M. Groenen, R. Crooijmans, A. Vig- nal, V. Fillon, M. Morisson, F. Pitel, M. Vignoles, A. Garrigues, J. Gellin, A.V. Rodionov, S.A. Galkina, N.A. Lukina, G. Ben-Ari, S. Blum, J. Hillel, T. Twito, U. Lavi, L. David, M.W. Feldman, M.E. Delany, C.A. Conley, V.M. Fowler, S.B. Hedges, R. Godbout, S. Katyal, C. Smith, Q. Hudson, A. Sinclair, and S. Mizuno. 2005. Second report on chicken genes and chromosomes 2005. Cytogenet- ic and Genome Research 109: 415-479. Scholl, E.H. and D.M. Bird. 2005. Resolving tylenchid evolutionary rela- tionships through multiple gene analysis derived from EST data. Molecular Phylogenetics and Evolution 36: 536-545. Sendecka, J., M. Cichón, and L. Gustafsson. 2007. Age dependent reproduc- tive costs and the role of breeding skills in the collared flycatcher. Acta Zoologica 88: 95-100. Servedio, M.R. and G.P. Saetre. 2003. Speciation as a positive feedback loop between postzygotic and prezygotic barriers to gene flow. Proceed- ings of the Royal Society B-Biological Sciences 270: 1473-1479. Sheldon, B.C. and H. Ellegren. 1999. Sexual selection resulting from extra- pair paternity in collared flycatchers. Animal Behaviour 57: 285- 298.

79 Sherman, J.D. and S.M. Stack. 1995. Two-dimensional spreads of synapto- nemal complexes from solanaceous plants. VI. High resolution re- combination nodule map for tomato (Lycopersicon esculentum). Ge- netics 141: 683-708. Shetty, S., D.K. Griffin, and J.A. Graves. 1999. Comparative painting re- veals strong chromosome homology over 80 million years of bird evolution. Chromosome Research 7: 289-295. Shibusawa, M., S. Minai, C. Nishida-Umehara, T. Suzuki, T. Mano, K. Ya- mada, T. Namikawa, and Y. Matsuda. 2001. A comparative cytoge- netic study of chromosome homology between chicken and Japanese quail. Cytogenetics and Cell Genetics 95: 103-109. Shibusawa, M., M. Nishibori, C. Nishida-Umehara, M. Tsudzuki, J. Masa- banda, D.K. Griffin, and Y. Matsuda. 2004a. Karyotypic evolution in the Galliformes: an examination of the process of karyotypic evo- lution by comparison of the molecular cytogenetic findings with the molecular phylogeny. Cytogenetic and Genome Research 106: 111- 119. Shibusawa, M., C. Nishida-Umehara, M. Tsudzuki, J. Masabanda, D.K. Griffin, and Y. Matsuda. 2004b. A comparative karyological study of the blue-breasted quail (Coturnix chinensis, Phasianidae) and California quail (Callipepla californica, Odontophoridae). Cytoge- netic and Genome Research 106: 82-90. Slate, J. 2005. Quantitative trait locus mapping in natural populations: progress, caveats and future directions. Molecular Ecology 14: 363- 379. Slate, J., P.M. Visscher, S. MacGregor, D. Stevens, M.L. Tate, and J.M. Pemberton. 2002. A genome scan for quantitative trait loci in a wild population of red deer (Cervus elaphus). Genetics 162: 1863-1873. Slatkin, M. 2008. Linkage disequilibrium - understanding the evolutionary past and mapping the medical future. Nature Reviews Genetics 9: 477-485. Smith, N.G., M.T. Webster, and H. Ellegren. 2002. Deterministic mutation rate variation in the human genome. Genome Research 12: 1350- 1356. Spencer, C.C., P. Deloukas, S. Hunt, J. Mullikin, S. Myers, B. Silverman, P. Donnelly, D. Bentley, and G. McVean. 2006. The influence of re- combination on human genetic diversity. PLoS Genetics 2 e148 Stam, P. 1993. Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. The Plant Journal 3: 739-744. Steiner, C.C., J.N. Weber, and H.E. Hoekstra. 2007. Adaptive variation in beach mice produced by two interacting pigmentation genes. PloS Biology 5: e219. Storz, J.F. 2005. Using genome scans of DNA polymorphism to infer adap- tive population divergence. Molecular Ecology 14: 671-688. Storz, J.F., S.J. Sabatino, F.G. Hoffmann, E.J. Gering, H. Moriyama, N. Ferrand, B. Monteiro, and M.W. Nachman. 2007. The molecular ba- sis of high-altitude adaptation in deer mice. PLoS Genetics 3: e45.

80 Stumpf, M.P.H. and G.A.T. McVean. 2003. Estimating recombination rates from population-genetic data. Nature Reviews Genetics 4: 959-968. Sturtevant, A.H. 1913. The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. Journal of Expe- rimental Zoology 14: 43-59. Sutter, N.B., M.A. Eberle, H.G. Parker, B.J. Pullar, E.F. Kirkness, L. Krug- lyak, and E.A. Ostrander. 2004. Extensive and breed-specific lin- kage disequilibrium in Canis familiaris. Genome Research 14: 2388-2396. Svedin, N., C. Wiley, T. Veen, L. Gustafsson, and A. Qvarnström. 2008. Natural and sexual selection against hybrid flycatchers. Proceedings of the Royal Society B-Biological Sciences 275: 735-744. Syvänen, A.-C. 2005. Toward genome-wide SNP genotyping. Nature Genet- ics Supplement 37: S5-S10. Tajima, F. 1983. Evolutionary relationship of DNA sequences in finite popu- lations. Genetics 105: 437-460. Tajima, F. 1989. Statistical method for testing the neutral mutation hypothe- sis by DNA polymorphism. Genetics 123: 585-595. TCSAC. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437: 69-87. Theron, E., K. Hawkins, E. Bermingham, R.E. Ricklefs, and N.I. Mundy. 2001. The molecular basis of an avian plumage polymorphism in the wild: a melanocortin-1-receptor point mutation is perfectly asso- ciated with the melanic plumage morph of the bananaquit, Coereba flaveola. Current Biology 11: 550-557. Wahlberg, P., L. Stromstedt, X. Tordoir, M. Foglio, S. Heath, D. Lechner, A.R. Hellstrom, M. Tixier-Boichard, M. Lathrop, I.G. Gut, and L. Andersson. 2007. A high-resolution linkage map for the Z chromo- some in chicken reveals hot spots for recombination. Cytogenetic and Genome Research 117: 22-29. van Tuinen, M., C.G. Sibley, and S.B. Hedges. 2000. The early history of modern birds inferred from DNA sequences of nuclear and mito- chondrial ribosomal genes. Molecular Biology and Evolution 17: 451-457. Wang, W.Y.S., B.J. Barratt, D.G. Clayton, and J.A. Todd. 2005. Genome- wide association studies: theoretical and practical concerns. Nature Reviews Genetics 6: 109-118. Watterson, G.A. 1975. On the number of segregating sites in genetic models without recombination. Theoretical Population Biology 7: 256-276. Veen, T., T. Borge, S.C. Griffith, G.P. Saetre, S. Bures, L. Gustafsson, and B.C. Sheldon. 2001. Hybridization and adaptive mate choice in fly- catchers. Nature 411: 45-50. Weir, B.S. 2008. Linkage disequilibrium and association mapping. Annual review of Genomics and Human Genetics 9: 129-142. Weiss, K.M. 2008. Tilting at quixotic trait loci (QTL): an evolutionary pers- pective on genetic causation. Genetics 179: 1741-1756.

81 Wiley, C., N. Fogelberg, S.A. Saether, T. Veen, N. Svedin, J.V. Kehlenbeck, and A. Qvarnstrom. 2007. Direct benefits and costs for hybridizing Ficedula flycatchers. Journal of Evolutionary Biology 20: 854-864. Wootton, J.C., X. Feng, M.T. Ferdig, R.A. Cooper, J. Mu, D.I. Baruch, A.J. Magill, and X.Z. Su. 2002. Genetic diversity and chloroquine selec- tive sweeps in Plasmodium falciparium. Nature 418: 320-323. Wright, S. 1921. Systems of mating, I - IV. Genetics 6: 111-178.

82

Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 587

Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-9513 2009