Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 312

From QTLs to Genes: Flowering Time Variation and CONSTANS-LIKE Genes in the Black Mustard

MARITA KRUSKOPF ÖSTERBERG

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 UPPSALA ISBN 978-91-554-6903-0 2007 urn:nbn:se:uu:diva-7900

List of papers

This thesis is based on the following papers, which are referred to by their Roman numerals:

I Kruskopf Österberg, M., Shavorskaya, O., Lascoux, M., Lagercrantz, U. 2002. Naturally occurring indel variation in the Brassica nigra COL1 gene is associated with variation in flowering time. Genetics, 161:299-306

II Lagercrantz, U., Kruskopf Österberg, M., Lascoux, M. 2002. Sequence variation and haplotype structure at the putative flowering-time locus COL1 of Brassica nigra. Molecular Biology and Evolution 19:1474-1482

III Krouchi, F., Gustavsson, S., Sjödin, P., Kruskopf Österberg, M., Lagercrantz, U., Lascoux, M. Polymorphism and genotypic disequilibrium around COL1, a putative flowering time gene in Brassica nigra. (Submitted)

IV Sjödin, P., Hedman, H., Kruskopf Österberg, M., Gustafsson, S., Lagercrantz, U. Polymorphism and divergence at three duplicate genes in Brassica nigra. (Submitted)

Contents

Introduction
The species: Brassica nigra
The trait: Flowering time
Molecular biology of flowering
Promoting pathways
Enabling pathways
Floral meristem identity genes
CONSTANS-like genes in different species
Genetic control of variation in flowering time
Duplication of genes and gene families
Evolution of multigene families
Aims of the study
Results and discussion
Conclusions
Acknowledgements
References

Introduction

One of the main goals of biology is undoubtedly to understand the causes of the bewildering phenotypic variation observed both within and among species. In particular, considerable effort has been devoted to disentangling the genetic control of variation in phenotypic characters. Unfortunately, this is generally a daunting task, as most traits of interest (size, morphology, phenology, fitness, etc.) do not show simple single-locus Mendelian inheritance but are instead controlled by the environment together with a few major genes in the best cases, or a myriad of loci in the worst. The analysis of the inheritance of quantitative traits is by no means a recent endeavor: it started at the end of the nineteenth century, when Galton, Pearson and Weldon developed the necessary concepts and statistical tools. While those concepts and tools were readily embraced by plant and animal breeders and led to stunning successes, efforts to identify the genetic factors controlling the variability in quantitative traits were hampered by the lack of genetic markers, since the only markers available were a few visible morphological ones. This situation started to change in the late 1960s with the discovery of allozymes and was dramatically altered in the 1980s with the development of PCR. Today, marker availability is, for many organisms, no longer the main limiting factor. And yet identifying the genes responsible for variation in complex traits remains a demanding task, as suggested by the limited number of genes discovered so far, even in extensively studied species such as humans, Arabidopsis or Drosophila.

There are different paths to identifying genes associated with variation in quantitative traits. First, one can use linkage mapping (also known as QTL mapping; see Doerge 2002 for a recent review). Briefly, in plant species, two parents with contrasting phenotypes (tall and short, for instance, if the trait of interest is plant size) are most commonly crossed to create an F1 generation, and co-segregation of the phenotype and alleles at molecular markers is studied in the F2 or backcross progenies. There are different statistical methods to identify QTL, relying on analysis of variance (ANOVA), regression analysis or maximum likelihood (LOD score). The term QTL (quantitative trait locus) is actually a misnomer, as a QTL generally corresponds to a large DNA segment in which more than one locus might affect the trait of interest. Most studies have found many QTL with small effects and a few with strong effects (Tanksley 1993). This may, at least in part, simply be due to the limited size of most QTL studies, since discovering QTL with small effects requires very large progenies. The usual approach is to explore only the QTL with strong effects. Once a coarse QTL map is available, higher resolution can be reached in different ways, typically by identifying more recombination events in additional crosses. Occasionally the region of interest is small enough to be sequenced entirely, but this is seldom the case. In model organisms whose genomes have been entirely sequenced and annotated, the sequence can be used either directly, if one only wants to know what the genome segment looks like and whether it contains any known candidate genes, or as a template to design PCR primers in order to sequence available samples. If the genomic region corresponding to the QTL is totally unknown, chromosome walking can be an alternative. The strategy of chromosome walking, or map-based cloning, is first to identify markers very closely linked on both sides of the gene of interest. A marker on one side is then used to identify clones, typically from large-insert YAC or BAC libraries. End sequences from these clones can then be used to identify additional clones from adjacent genome segments, and the distal ends of these clones can in turn be used to reprobe the library. This process is repeated until a clone is found that contains a molecular marker on the other side of the gene. In this way a contig is produced that covers the region with overlapping clones. This method was applied with resounding success by Colosimo et al. (2005) to identify the main gene responsible for plate morph variation in sticklebacks.
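The regression-based detection of a QTL at a single marker, mentioned above, can be illustrated with a short sketch. It assumes an F2 population with marker genotypes coded 0, 1 and 2 and a normally distributed trait; the function name `lod_score`, the effect sizes and the simulated data are hypothetical and are not taken from any of the studies cited. The LOD score is obtained by comparing the residual sum of squares of a model without the marker to that of a model with an additive marker effect.

```python
import numpy as np

def lod_score(genotypes, phenotypes):
    """LOD score for one marker under a normal linear model (illustrative helper)."""
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    n = y.size
    # Null model: the trait is explained by its mean only
    rss_null = np.sum((y - y.mean()) ** 2)
    # Alternative model: intercept plus additive effect of the marker genotype
    design = np.column_stack([np.ones(n), g])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss_marker = np.sum((y - design @ beta) ** 2)
    # Log10 likelihood ratio of the two models for normally distributed residuals
    return (n / 2.0) * np.log10(rss_null / rss_marker)

# Simulated F2 progeny (assumed values, for illustration only)
rng = np.random.default_rng(0)
geno = rng.choice([0, 1, 2], size=200, p=[0.25, 0.5, 0.25])
linked_trait = 50 + 3.0 * geno + rng.normal(0, 4, size=200)   # marker tags a QTL
unlinked_trait = 50 + rng.normal(0, 4, size=200)              # no QTL at this marker

print("LOD, linked marker:  ", lod_score(geno, linked_trait))
print("LOD, unlinked marker:", lod_score(geno, unlinked_trait))
```

In a real analysis the score would be computed at every marker (or interval) along the map and compared with a significance threshold, which is typically obtained by permutation.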

For fine mapping, one alternative that has been gathering pace over the last ten years with the coming of age of genomics is association mapping, also called LD mapping (Balding 2006). In contrast to linkage mapping, which relies on linkage disequilibrium generated over a couple of generations, association mapping uses the linkage disequilibrium created through the history of the population. The advantages of association mapping over linkage mapping are that (i) the creation of new crosses, often a time-consuming and costly process when possible at all, is not needed; (ii) a large number of alleles per locus can be surveyed; and (iii) because it takes into account all recombination events that have occurred since the last common ancestor of the population, resolution can be dramatically increased. A main disadvantage of association mapping is that the number of false positives can be very large, the major source being unaccounted-for stratification within the mapping population. There are, however, various strategies to alleviate this problem. For instance, the method of Yu et al. (2006; see also Zhao et al. 2007) corrects for population structure by using a random effect to estimate the fraction of the phenotypic variation that can be explained by genome-wide correlations. One assumes that part of the resemblance between individuals is proportional to their relative relatedness (or kinship), estimated with a large number of markers along the genome. Yu et al. (2006) also include in their model the population assignments produced by the STRUCTURE algorithm (Pritchard et al. 2000). In species where dense maps are available, the genome can be scanned for association and analysed with the method just described. The absence of a whole-genome sequence in most species has led to the adoption of a candidate gene approach rather than genome scans when attempting to relate DNA polymorphism to phenotypic traits of interest through association mapping. Briefly, rather than attempting to associate random polymorphisms with the trait of interest, one focuses on loci for which information is already available, so-called candidate genes. The choice of candidate genes is guided by prior information on their involvement in the same or analogous traits in model plants (e.g. flowering time in Arabidopsis thaliana), in other species or in the species under study. In general, this information stems from physiological or molecular studies, but it is worth pointing out that population genetic studies can also be relevant here. For instance, Wright et al. (2005) carried out a genome scan in maize to identify genes that have been, or still are, under selection since maize descended from teosinte. One of the genes that showed a strong signature of selection was the tb1 gene, which had previously been identified through QTL studies and chromosome walking as one of the main genes involved in the morphological evolution of maize from teosinte (Wang et al. 1999). The candidate gene approach has been used effectively in crops (e.g. Thornsberry et al. 2001; Kruskopf-Österberg et al., 2002) and more recently in forest trees (Thumma et al. 2005; Ingvarsson et al. 2006; Gonzalez-Martinez et al. 2007).
One can then validate the identified polymorphisms through functional analysis of the most promising candidate genes, by constructing transgenic plants and studying their phenotypes in different genotypes under diverse environmental conditions.
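The effect of unaccounted-for stratification on association tests, discussed above, can be illustrated with a small simulation. The sketch below is a deliberate simplification and not the method of Yu et al. (2006): population membership is entered as a fixed covariate in an ordinary least-squares model, and the kinship random effect is omitted. The helper `snp_f_statistic`, the allele frequencies and the trait values are all assumed for illustration.

```python
import numpy as np

def snp_f_statistic(y, snp, covariates=None):
    """F statistic for adding one SNP to a linear model that may already
    contain population-structure covariates (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    columns = [np.ones(n)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    base = np.column_stack(columns)
    full = np.column_stack([base, np.asarray(snp, dtype=float)])

    def rss(design):
        # Residual sum of squares of the least-squares fit
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return np.sum((y - design @ beta) ** 2)

    rss_base, rss_full = rss(base), rss(full)
    return (rss_base - rss_full) / (rss_full / (n - full.shape[1]))

# Two subpopulations that differ in trait mean and in allele frequency,
# while the SNP itself has no causal effect (assumed values)
rng = np.random.default_rng(1)
pop = np.repeat([0, 1], 150)
snp = rng.binomial(2, np.where(pop == 0, 0.1, 0.9))
trait = 5.0 * pop + rng.normal(0.0, 1.0, size=pop.size)

naive = snp_f_statistic(trait, snp)                       # structure ignored
corrected = snp_f_statistic(trait, snp, covariates=pop)   # structure as covariate
print("F without correction:", naive)
print("F with correction:   ", corrected)
```

Without the covariate the non-causal SNP appears strongly associated with the trait simply because both differ between subpopulations; including population membership removes most of this spurious signal.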

The plant species: Brassica nigra

The genus Brassica comprises 38 species, of which six are economically important (Warwick 1993, Warwick et al. 2000). These include three diploid species, B. rapa/B. campestris (A genome, n = 10), B. nigra (B genome, n = 8) and B. oleracea (C genome, n = 9). The other three are amphidiploids and are natural hybrids of these diploid species: B. carinata (BC genome, n = 17), B. juncea (AB genome, n = 18) and B. napus (AC genome, n = 19) (Figure 1). The cytogenetic relationships between the cultivated species were revealed by Morinaga (1928) on the basis of interspecific hybridization and chromosome pairing, and are classically represented by U’s triangle (Figure 1; U, 1935). The six species of U’s triangle are usually referred to as the cultivated Brassica.

These can further be divided into vegetable, oilseed and condiment crops. B. oleracea is important as a source of vegetable crops, whereas B. juncea, B. napus and B. rapa are major oilseed crops. B. nigra and B. carinata, on the other hand, are useful as condiments. The agricultural use of B. nigra has been declining over the last 50 years, and it has gradually been replaced by B. juncea, since B. nigra scatters its seeds as they ripen, allowing large amounts of seed to escape harvest. Presently, B. nigra is mainly used as a source of condiment in India and Ethiopia. Among these six species, B. rapa, B. nigra and their amphidiploid B. juncea were the first to be domesticated (Gomez-Campo and Prakash 1999). The population genetics of Brassica nigra is still poorly known. High within-population genetic diversity and significant population substructure were found in an early study, albeit with few markers and a limited number of populations (Westman and Kresovich 1999). A more extensive study based on both chloroplast and nuclear SSR markers has corroborated these results (Alström-Rapaport et al., in preparation). In particular, the Ethiopian populations seem to have diverged from other populations and to be particularly depleted of genetic variation.

Figure 1. Genetic relationships of the cultivated Brassica species (redrawn from U, 1935)
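The chromosome numbers in U’s triangle follow a simple additive rule: each amphidiploid carries the combined haploid complements of its two diploid parents. The short sketch below checks this arithmetic using only the genome designations and chromosome numbers given in the text; the variable and function names are chosen for illustration.

```python
# Haploid chromosome numbers of the diploid species, keyed by genome letter
diploids = {"A": ("B. rapa", 10), "B": ("B. nigra", 8), "C": ("B. oleracea", 9)}

# Genome constitution of the three amphidiploid species
amphidiploids = {"B. juncea": "AB", "B. carinata": "BC", "B. napus": "AC"}

def haploid_number(genome):
    """Sum the haploid chromosome numbers of the parental genomes."""
    return sum(diploids[letter][1] for letter in genome)

for species, genome in amphidiploids.items():
    parents = " x ".join(diploids[letter][0] for letter in genome)
    print(f"{species} ({genome}): n = {haploid_number(genome)} from {parents}")
```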

The trait: Flowering time

Flowering represents a major developmental switch and its molecular basis is now well understood in A. thaliana (e.g. Mouradov et al. 2002; Boss et al., 2004). However, it is still unclear which of the genes are responsible for the extensive variation in flowering time observed in most plant species (Roux et al. 2006). Roux et al. (2006) established a list of genes potentially involved in the evolution of early flowering in A. thaliana but stressed that this list might have to be reconsidered in other species, according to their genetic architecture and ecological niche.

Molecular biology of flowering

There are several reasons why Arabidopsis thaliana is a good choice as a model plant. It is easy to work with: it has a small genome and a short generation time, and the plant is physically small, so that large numbers of plants can be grown in a small space and in a short time. Its selfing habit greatly facilitates sequencing, as the time-consuming and tiresome cloning step becomes unnecessary. On the other hand, its selfing habit makes population genetic analysis more difficult, as the species does not conform to the standard model, and, some would argue, makes its evolutionary biology less interesting, as selfing is an evolutionary dead end. The innumerable studies on A. thaliana over the years have led to an unprecedented understanding of all aspects of plant biology. In particular, the control of flowering is now well understood in that species (Mouradov et al. 2002).

Research on the different flowering time pathways and the genes involved in them has mainly been done in A. thaliana. While it is clear that this research has bearing on other plant species too, the extent to which results obtained in A. thaliana extend to other species is not yet fully known. Comparisons of A. thaliana with different monocot species show that large parts of the pathways identified in A. thaliana are conserved, but important differences are also evident (Cockram et al. 2007). Current data suggest, for example, that the two lineages may have evolved vernalization pathways independently.

Since B. nigra is closely related to A. thaliana, both are long-day plants and their genomes exhibit a high level of collinearity (Lagercrantz et al. 1996), the information from research on Arabidopsis should be readily usable in B. nigra, even if details will certainly differ.

At least four different pathways that regulate the timing of the flowering transition have been identified in A. thaliana. Two of them respond to environmental cues and two work largely independently of such cues. These pathways can conceptually be divided into those that enable flowering by regulating floral repressors (mainly FLC) and those that promote flowering. The promoting pathways comprise the photoperiod pathway and the gibberellin synthesis/signaling pathway.

Promoting pathways

Photoperiod pathway - Photoperiod (i.e., the daily duration of light) is a very important cue for adjusting flowering time to the environment in many plant species. So-called short-day (SD) plants flower when the length of the night exceeds a given threshold, whereas long-day (LD) plants need long days/short nights to start flowering. Day-neutral plants have neither requirement. Both Arabidopsis and B. nigra are classified as facultative LD plants, meaning that they will eventually flower in short days as well. In contrast, an obligate species will never flower under the “wrong” conditions.

To regulate the response to day length, plants have an internal oscillator, the circadian clock. Such an internal clock is present in a wide range of organisms, from humans all the way to microorganisms such as Synechococcus (Dunlap 1999). The circadian clock keeps track of time and can run largely independently of environmental signals. This independence can be seen by growing plants under constant environmental conditions, such as constant light or darkness. Under these conditions the functions and genes that are regulated by the clock show a rhythm with a period of ~24 h. The clock owes its name to the length of this period, circa and dies meaning approximately and day in Latin. Although the clock does not need changes in light to keep a rhythm, it receives input from light, which regulates the activity of clock components and thereby helps the clock stay synchronized with the environment. Even though the concept of circadian rhythms and clocks is not recent, great progress in the understanding of circadian clocks has been made during the last ten years. Yet many question marks remain. The clock is believed to have evolved independently several times, since unrelated clock genes are found in, e.g., animals and plants (Young and Kay 2001).

The circadian clock in plants, as in other organisms, is believed to have one or more feedback loops, making the clock a self-regulating unit. The first proposed loop in Arabidopsis contains the clock genes LHY (LATE ELONGATED HYPOCOTYL), CCA1 (CIRCADIAN CLOCK ASSOCIATED 1) and TOC1 (TIMING OF CAB EXPRESSION 1). LHY and CCA1 show high expression levels in the morning, and they repress the expression of TOC1. TOC1 is needed for the expression of LHY and CCA1, so the levels of those two eventually decrease, allowing the level of TOC1 to increase. Increased expression of TOC1 in turn promotes LHY and CCA1, making them peak again the next morning, and so on. There are also other genes, like GIGANTEA (GI), that have been suggested to function in additional loops within the clock. Besides the core genes of the clock there are genes that are regulated by the clock, like COLD AND CIRCADIAN REGULATED 2 and CHLOROPHYLL A/B BINDING PROTEIN 2. These genes regulate different functions that depend on the circadian rhythm.

The output pathway involved in the photoperiod response controls the expression of CONSTANS (CO), which has been identified as a key component of the photoperiod pathway, integrating signals from the clock and from photoperiod. The circadian clock generates a daily oscillation of CO mRNA. As the stability of the CO protein is controlled by light, the coincidence of light and high CO expression, which only occurs in long days, induces pathway integrators like SOC1 and FT and thereby flowering (Valverde et al. 2004).
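The coincidence mechanism described above can be made concrete with a toy calculation. The sketch below is purely illustrative: it assumes that CO mRNA follows a clipped cosine wave peaking 16 h after dawn and that CO protein accumulates only while the light is on. The peak time, the shape of the curve and the function name `co_protein_signal` are assumptions and are not measured values from the literature cited.

```python
import numpy as np

def co_protein_signal(day_length, peak_time=16.0, step=0.1):
    """Toy measure of CO protein accumulated over 24 h.

    day_length: hours of light, starting at dawn (t = 0)
    peak_time:  assumed time of maximum CO mRNA, in hours after dawn
    """
    t = np.arange(0.0, 24.0, step)
    # Clock-driven CO mRNA: positive half of a 24 h cosine wave
    mrna = np.clip(np.cos(2 * np.pi * (t - peak_time) / 24.0), 0.0, None)
    # CO protein is assumed stable only while the light is on
    light = t < day_length
    return float(np.sum(mrna * light) * step)

short_day = co_protein_signal(8.0)    # mRNA peak falls entirely in darkness
long_day = co_protein_signal(16.0)    # rising mRNA coincides with light
print("Short day (8 h light): ", short_day)
print("Long day (16 h light): ", long_day)
```

Under these assumptions the signal is absent in short days and substantial in long days, which mirrors the verbal argument that only long days allow light and high CO expression to coincide.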

Light is perceived through a number of photoreceptors. Photoperiodic responses are affected by red and far-red light (through phytochromes) and blue light (through cryptochromes). In A. thaliana, phytochromes are encoded by five genes (PHYA to PHYE), while two genes (CRY1 and CRY2) encode cryptochromes. The photoreceptors have two functions in the photoperiodic response: one is to entrain the clock, and the other is to affect the stability of the CO protein.

Gibberellin pathway - Gibberellic acid (GA) is a growth regulator that is involved in many aspects of plant growth and development. Here I will only mention aspects related to flowering. GA promotes flowering in Arabidopsis, particularly in non-inductive short days (Wilson et al. 1992, Putterill et al. 1995, Blazquez et al. 1998). The GA pathway can thus be interpreted as a default pathway, allowing flowering mainly under non-inductive short days. Several mutations in this pathway have been discovered, and they influence flowering time by disturbing different aspects of GA synthesis or GA signaling. The ga1-3 mutation disrupts the function of the gene GA1, which encodes an enzyme that catalyses an early step in GA biosynthesis (Sun and Kamiya, 1994). Under short-day conditions Arabidopsis ga1-3 mutants never flower, and under long days they flower slightly later than the wild type (Wilson et al. 1992). One target of GA signaling is LEAFY (LFY), which is a floral meristem identity gene (see below). In ga1-3 mutants the expression level of LFY is reduced, and if LFY is overexpressed in those mutants they flower normally (Blazquez et al. 1998; Melzer et al. 1999).

Enabling pathways

Enabling pathways can be defined as those that regulate the repressors antagonizing the activation of pathway integrators by the promoting pathways. Both the autonomous and the vernalization pathway can be characterized as enabling pathways, and both regulate the floral repressor FLOWERING LOCUS C (FLC). FRIGIDA (FRI) is usually not included in the autonomous pathway, although it regulates FLC expression in an autonomous way. FRI causes upregulation of FLC expression and thereby delays flowering. Allelic variation at FRI determines a major part of the variation in flowering time in natural populations of A. thaliana (Johanson et al. 2000).

Autonomous pathway - The autonomous pathway does not respond to exogenous cues from the environment: mutants for genes in this pathway flower late in all photoperiods. Genes in this pathway function to reduce expression of the floral repressor FLC. Late flowering in autonomous pathway mutants can be suppressed by vernalization or growth in far-red light, so these cues are thought to act in parallel with the autonomous pathway. Genes in this pathway function as a series of subgroups sharing a common target, FLC, rather than working in a hierarchical fashion.

Vernalization pathway - Vernalization occurs when plants experience cold temperatures for several weeks. This normally takes place during winter, and the result is downregulation of FLC, which enables promotion of flowering through other pathways. Vernalization results in histone modification at FLC. This is possibly mediated by VERNALIZATION INSENSITIVE3 (VIN3), VERNALIZATION2 (VRN2), VERNALIZATION1 (VRN1) and TERMINAL FLOWER2 (TFL2) (Sung and Amasino 2005).

Floral meristem identity genes

The different pathways are integrated mainly through FT and SOC1, which in turn regulate the activity of a group of genes called floral meristem identity (FMI) genes. These direct the primordia to develop into flowers instead of leaves. FMI genes in Arabidopsis include LEAFY (LFY), APETALA1 (AP1), CAULIFLOWER (CAL) and TERMINAL FLOWER1 (TFL1).

Figure 2. Main flowering time pathways (from Roux et al. 2006)

CONSTANS-like genes in different species

In Arabidopsis, CONSTANS (CO) and CONSTANS-LIKE1 (COL1) belong to a family consisting of 17 members divided into three groups (Robson et al. 2001). CO-like homologs are also found in several other plant species, e.g. rice (Yano et al. 2000), barley (Griffiths et al. 2003) and perennial ryegrass (Martin et al. 2004). Thus, CO-like genes are found in both monocots and dicots, in both LD and SD plants, and in both annual and perennial plants. In Arabidopsis, CO promotes flowering in response to long days (Putterill et al. 1995), but altered expression of COL1 and COL2 in transgenic plants had little effect on flowering time (Ledger et al. 2001). CO also seems to affect other traits, such as tuber formation in potato, which is also influenced by photoperiod (Martinez-Garcia et al. 2002).

COL1 and COL2 encode proteins with ~67% overall amino acid identity to the CO protein (Ledger et al. 2001, Putterill et al. 1995, Putterill et al. 1997). All three proteins share two zinc finger motifs at their N terminus, and a CCT motif, which is also shared by TOC1, at the C-terminal end (Ledger et al. 2001, Putterill et al. 1997). Although COL1 and COL2 have not been implicated in regulating flowering time in Arabidopsis, their expression is regulated by the circadian clock (Ledger et al. 2001). Furthermore, overexpression of COL1 results in alterations of circadian leaf movements.

Genetic control of variation in flowering time

Flowering is an important developmental switch and, as reviewed above, the major features of its molecular basis are starting to be well understood in A. thaliana (e.g. Mouradov et al. 2002; Boss et al. 2004). However, it is not known which genes contribute most to the extensive variation in flowering time observed in most plant species (Roux et al. 2006). Roux et al. (2006) established a list of genes potentially involved in the evolution of early flowering in A. thaliana but stressed that this list might have to be reconsidered in other species, according to their genetic architecture and ecological niche. In A. thaliana, several studies have implicated polymorphisms in FRI and FLC in natural flowering time variation. For instance, the two widely used laboratory strains Landsberg erecta (Ler) and Columbia (Col-0) flower early since both carry deletions at the FRI locus. These two deletions have also been identified in a large number of early-flowering accessions, but other polymorphisms have also been found (Johanson et al. 2000; Gazzani et al. 2003; Le Corre et al. 2002; Stinchcombe et al. 2004). FRI and FLC account for up to 23% of the variation in flowering time and are the main genes involved in the control of flowering time, though other genes have been shown to play a part too. Allelic variation at the photoreceptor gene PHYC was also shown to affect flowering time (Balasubramanian et al. 2006), and Caicedo et al. (2004) have suggested that CRYPTOCHROME2 (CRY2) could also be associated with variation in flowering time. These studies are generally based on so-called accessions or ecotypes, and seldom on large population samples, which probably means that a lot of natural variation still goes undiscovered. Further studies are thus likely to reveal additional allelic variation causing variation in flowering time, both at genes already shown to be involved and at new ones.

As noted above, while flowering pathways seem to be well conserved across plant species, there are major differences. For instance, some plant species require vernalization while others do not. Consequently, one does not expect a priori to see the same major genes controlling flowering time variation across plant species and, indeed, genes from other pathways have been associated with variation in flowering time in other species or in short-day plants such as rice or tobacco. In rice, flowering time (also called heading time) was shown to be controlled by five quantitative trait loci (Hd1 to Hd5) (Yano et al. 2000). Hd1 is an ortholog of A. thaliana CO, a gene that so far has not been associated with flowering time variation in A. thaliana.

Among the Brassica species, three of the four copies of FLC in B. rapa co-segregate with quantitative trait loci for flowering time (Schranz et al. 2002). The authors have argued that the three genes modulate flowering in an additive manner.

Duplication of genes and gene families

COL1 is part of a gene family, and two of the copies co-localized with flowering time QTL. It therefore seemed important to take this dimension into account in the present study. Gene families consist of genes that share similarities in function and/or sequence, and those similarities reflect their common ancestry: all genes in a gene family descend from a common ancestor from which they originated by duplication. The size of gene families varies a lot, from the smallest with only a pair of genes to very large families with up to a few thousand members. There are also differences among species. For instance, the largest gene family in Drosophila melanogaster consists of only 111 genes, while the largest one found so far in mammals consists of ~1000 genes (Gu et al. 2002).

Duplication of genes can happen in several ways. Genes can become duplicated because the large genome segment in which they are located gets duplicated, or because the entire genome gets duplicated (polyploidization). Many plant species have experienced whole-genome duplications, but the occurrence of whole-genome duplications in vertebrates remains a debated issue. Another form is gene conversion, where two rather similar genes pair up and one gene replaces all or part of its nucleotide sequence with a copy of the other gene. Yet another is unequal crossing over, where two homologous chromosomes pair up unequally during meiosis, leading to one chromosome getting larger and the other smaller. This usually happens where there already are tandem repeats, which means that one chromosome gains copies at the expense of the other. A fourth way of duplication is retrotransposition, in which an mRNA is reverse-transcribed into cDNA and then inserted back into the genome. In this form of duplication the copy lacks introns and can end up anywhere in the genome, most probably not close to its original location.

The really interesting question is what happens after the duplication. Having two identical genes and only needing one provides excellent working material for evolution, but is it used? The earliest idea was that, after the duplication, one of the copies would acquire so many mutations, due to relaxed selection pressure, that after a while it would become functionless, a so-called pseudogene or silenced gene (nonfunctionalization). If the mutation rate differs among the gene copies, they will evolve at different rates, and the most slowly evolving one will be the one that retains the original function. The fate of a pseudogene is that, after an extensive period of time, it is either deleted from the genome or has changed so much from its original structure that it is beyond recognition. In contrast to nonfunctionalization, one of the gene copies could acquire a new function (neofunctionalization). This also implies some relaxed selection, because one gene copy retains the original function and the other can then evolve more freely. Instead of accumulating deleterious mutations and structural changes, the gene copy acquires beneficial mutations or structural changes that make it useful in a new way, after which it probably comes under selection itself to retain this new function. Most likely the new function is close to the original one rather than totally different: for instance, in humans two duplicated opsin genes differ in the wavelength they absorb (Yokoyama and Yokoyama, 1989). A variant of this scenario is that the new function is already present before the duplication event. Finally, subfunctionalization has been proposed to explain cases where the two duplicate gene copies share parts of the original function of the ancestral gene. This means that both copies are needed for full gene function and are therefore kept. The basis for this can be that the genes acquire mutations that somewhat compromise their function, and those mutations are not removed since the full function is retained anyway. This scenario can sometimes involve a more specialized function of one of the genes. This theory is also called the duplication, degeneration and complementation (DDC) model. Experimentally, it has been shown to be valid for several yeast genes (Van Hoof, 2005).

Evolution of multigene families

There are different theories about how gene families evolve. First, duplicated genes can diverge over time. Concerted evolution, on the other hand, means that the genes in a gene family stay more or less alike. When mutations occur, they spread to the other members of the family by unequal crossing over or gene conversion. In this model the genes of a cluster stay more alike within a species than between species, because the species diverge from each other. The same result can be caused by purifying selection, which helps to keep the family together by removing mutations when they arise. In 1992 Nei and Hughes suggested a third possibility, which they named birth-and-death evolution. This new model was needed since the two earlier models were shown to be insufficient to explain new data from the sequencing of gene families like the MHC family. Evolution according to the birth-and-death model consists of the arrival of new genes through duplication; some of those new genes are maintained for long periods while others are silenced. It seems that gene families with highly conserved genes, for example most RNA genes, can usually be explained by concerted evolution or purifying selection, while birth-and-death evolution fits better with gene families that have extremely variable members (e.g. most immune system gene families). Finally, there are also gene families that evolve in a pattern best explained by a mix of concerted evolution and birth-and-death evolution, for example the heat shock gene family. The MADS-box gene family is possibly the most interesting gene family involved in plant development. It includes genes like FLC, SOC1, AP1, AP3, AG and PI, among others (De Bodt et al., 2003). One interesting aspect of this family is that the floral MADS-box genes seem to have arisen long before flowering plants evolved (Alvarez-Buylla et al. 2000), which agrees with the idea that duplicated genes can develop new functions.

As more and more genomes are sequenced, the possibilities to investigate how large a part of each genome is made up of duplicated genes increase. Figures as high as 65 % of genes being duplicated have been reported in A. thaliana. In bacteria, 17 % of the genes in both Helicobacter pylori and Haemophilus influenzae (Tomb et al. 1997) originated through duplication, while for Mycoplasma pneumoniae the figure is 44 % (Himmelreich et al. 1996). In humans it could be as high as 38 % (Li et al. 2001). The actual proportion of duplicated genes is probably higher, since many duplicated genes have diverged so much that they can no longer be recognized as duplicates. An average rate of origin of new duplications of 0.01 per gene per million years has been proposed (Lynch and Conery, 2000), making gene duplication an important source of material for evolution.
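To give a feeling for what a rate of 0.01 duplications per gene per million years implies, the following minimal sketch turns it into expected counts. It assumes a constant rate and, for the probability, that duplications arrive as a Poisson process; both are simplifying assumptions, and the gene count used in the example is hypothetical rather than a value from the cited studies.

```python
import math

# Rate quoted in the text from Lynch and Conery (2000).
DUPLICATION_RATE = 0.01  # new duplicates per gene per million years


def expected_duplications(n_genes, million_years, rate=DUPLICATION_RATE):
    """Expected number of duplication events across a genome (constant rate assumed)."""
    return rate * n_genes * million_years


def prob_duplicated_at_least_once(million_years, rate=DUPLICATION_RATE):
    """Probability that a given gene duplicates at least once,
    assuming duplications arrive as a Poisson process (an assumption)."""
    return 1.0 - math.exp(-rate * million_years)


if __name__ == "__main__":
    hypothetical_gene_count = 25_000  # illustrative only, not from the text
    print(expected_duplications(hypothetical_gene_count, 1))
    print(round(prob_duplicated_at_least_once(100), 3))
```

Under these assumptions a gene would be expected to duplicate about once per 100 million years, which is why even a seemingly low per-gene rate adds up to many new duplicates across a whole genome.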

Aims of the study

The major aim of this study was to identify genes associated with flowering time in Brassica nigra and to study them in more detail.

Specific aims:

I Based on a previous QTL study that identified a major QTL for flowering time in B. nigra, test for association between candidate genes located within this QTL and flowering time (paper I).

II To analyze the polymorphism along the identified gene(s) in paper I, in order to test for the signature of recent or ancient selection (paper II).

III To extend the study of polymorphism to flanking areas of the identified polymorphism and validate the association identified in paper I (paper III).

IV To analyze the molecular evolution of CO-Like genes in Brassica nigra (paper IV).

Results and discussion

Paper I

The background for this paper is a previous QTL-mapping study in Brassica nigra, the aim of which was to find QTLs affecting flowering time. Two main genomic areas influencing that trait were identified (Lagercrantz et al. 1996). In the area with the largest effect, a Brassica homolog of the Arabidopsis gene CONSTANS (CO) was found. As discussed above, this gene promotes flowering in response to long photoperiods. It was consequently chosen as a candidate gene, and its function was tested by transforming alleles from both late and early flowering individuals into Arabidopsis co mutants. Both alleles were functional and restored early flowering in a co mutant, but there was no significant difference in flowering time between plants complemented with alleles from early or late flowering plants. Additionally, sequence variation along this gene was extremely limited in Brassica nigra. Hence the QTN (quantitative trait nucleotides) responsible for the observed QTL seemed more likely to be located outside the B. nigra CO gene (BniCOa). Close to BniCOa a B. nigra CONSTANS-LIKE 1 gene (BniCOL1) was found. This gene is a duplicate of CO, but in Arabidopsis COL1 (or AtCOL2) has not been shown to have a role in regulating flowering. In B. nigra this gene was located 3.5 kb upstream of BniCOa and displayed a surprising amount of sequence variation between the early and late flowering individuals. We therefore decided to focus our efforts on BniCOL1.

In paper I we studied the variation found in this gene and tried to correlate it with differences in flowering time in natural populations. The variation consisted of 16 nucleotide substitutions and two indels (Ind1 and Ind2) separated by 235 bp; for ease of scoring we focused on the indel variation. Ind1 consisted of a trinucleotide repeat (AACn) with six different alleles in the sample, and Ind2 consisted of an 18 bp insertion/deletion polymorphism with two alleles (denoted L and S). The association study was performed in eight populations that were cultivated in two separate rounds under controlled environments. The population samples were from Germany, Greece, Spain, Portugal, Italy (three populations) and Ethiopia. The plants were scored for flowering time (time to the start of flowering), and association between polymorphism and flowering time was tested using analysis of variance. Due to the presence of strong population structure, association between genotype and flowering time was tested within each population. There was variation at Ind1 in six populations, but in none of them was it significantly correlated with flowering time. For Ind2, on the other hand, there were significant correlations between flowering time and genotype in four out of five populations. Overall, the S allele was correlated with early flowering and the L allele with late flowering. Plants with the SS genotype flowered early while those with the LL genotype generally flowered late. The LS heterozygote displayed an intermediate flowering time.
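The within-population test described above can be illustrated with a one-way analysis of variance of flowering time on Ind2 genotype. The following is a minimal sketch, not the analysis used in paper I: the flowering times are hypothetical, the function name is chosen for illustration, and only the F statistic and its degrees of freedom are computed (a p-value would additionally require the F distribution).

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic.

    groups maps a genotype label (e.g. "SS", "LS", "LL") to a list of
    flowering times. Returns (F, df_between, df_within).
    """
    samples = [values for values in groups.values() if values]
    k = len(samples)
    n_total = sum(len(values) for values in samples)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two genotype classes and replicated observations")
    grand_mean = sum(sum(values) for values in samples) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for values in samples:
        mean = sum(values) / len(values)
        # Variation of genotype means around the grand mean.
        ss_between += len(values) * (mean - grand_mean) ** 2
        # Variation of individual plants around their genotype mean.
        ss_within += sum((x - mean) ** 2 for x in values)
    if ss_within == 0:
        raise ValueError("no within-genotype variation; F is undefined")
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical flowering times (days) for ONE population, grouped by Ind2 genotype.
example_population = {
    "SS": [40, 42, 44],
    "LS": [45, 47, 49],
    "LL": [50, 52, 54],
}

f_stat, df_b, df_w = one_way_anova_f(example_population)
print(f_stat, df_b, df_w)  # 18.75 2 6
```

Running the function separately on each population, rather than on the pooled sample, mirrors the reason given in the text: with strong population structure, a pooled test could confound genotype effects with differences between populations.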

Paper II

In this second paper, following the discovery of an association between Ind2 and flowering time, we tried to find evidence of selection along the BniCOL1 gene. Since flowering time is expected to affect fitness, genomic areas influencing this trait could be under selection.

The study included 41 complete sequences (1320 bp) of B. nigra COL1. A total of 47 segregating sites were found, most of them in the coding region (40 sites); of those, 17 were synonymous and 23 nonsynonymous changes. The number of haplotypes was 26, corresponding to a haplotype diversity of 0.954. The variation is not uniformly distributed, with peaks close to Ind2 and at the 3´ end of the gene. Some of the polymorphic sites are in complete linkage disequilibrium with Ind2 (L/S). All of them are located outside the two conserved domains, in fast-evolving parts of the gene; one is found in the non-coding region. Polymorphism both at Ind1 and at other sites showed that the Ethiopian population was significantly differentiated from the other populations.
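Haplotype diversity is commonly estimated with Nei's unbiased estimator, H = n/(n-1) * (1 - sum of squared haplotype frequencies). The sketch below assumes that this estimator is the one behind the value reported above, which the text does not state explicitly; the haplotype labels in the test data are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels.

    Each element of `haplotypes` is the haplotype carried by one sampled
    sequence; identical labels mean identical haplotypes.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability of drawing the same
    # haplotype twice with replacement).
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)
```

The estimator is 0 when all sequences share one haplotype and 1 when every sequence is unique, so a value close to 1 indicates that most sampled sequences differ from each other.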

The Ind2 indel polymorphism that was previously associated with flowering time has two alleles: the L allele (long, having the insertion) and the S allele (short, lacking the insertion). The Ethiopian population is fixed for the S allele. Only the L allele is found in A. thaliana, whereas only the S allele was found in other Brassica species. To study the evolutionary divergence between the two alleles, distances were estimated with the Kimura two-parameter model and a neighbor-joining tree was built from them, using Arabidopsis thaliana as an outgroup. The L and S alleles form two different clades. In a later, more extensive study, Shavorskaya and Lagercrantz (2006) examined this indel in several species of the Brassicaceae family. Their conclusion was that the polymorphism arose after the split between Arabidopsis and Brassica/Sinapis/Raphanus but before the split between Brassica elongata and Raphanus/Sinapis/B. nigra/B. oleracea/B. rapa. This makes Ind2 a very old polymorphism that seems to have persisted for more than ten million years (Shavorskaya and Lagercrantz, 2006).
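The Kimura two-parameter distance between two aligned sequences is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites differing by a transition and by a transversion, respectively. The sketch below is a minimal illustration of that formula only; the function name is chosen for illustration, gaps and ambiguous bases are simply skipped (an assumption), and the neighbor-joining step is not shown.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguous bases (simplifying assumption)
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

A matrix of such pairwise distances is the usual input to a neighbor-joining algorithm, which is how a tree separating the L and S alleles could be obtained.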

Using an Arabidopsis sequence we identified the ancestral state of the polymorphic sites. Interestingly, in the 5´ half of the gene the ancestral state of the polymorphic sites was found in either the Ethiopian population or the European ones. In the other half of the gene the division was instead between the L and the S alleles.

To test for selection or departure from neutrality the following tests were used: Tajima´s D, the McDonald and Kreitman test, Wall’s B and Q tests, Kelly’s ZnS test, the haplotype number test (K test) and the haplotype diversity test (H test). Almost all tests failed to detect evidence of selection. Only when testing the Ethiopian population separately could we show a significant departure from neutrality. However, the negative D value that was found could well be a sign of a severe bottleneck rather than of selection.
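Of the tests listed above, Tajima's D is the simplest to illustrate: it compares the mean number of pairwise differences with the number of segregating sites scaled by the sample size. The following minimal sketch follows Tajima's standard formula and assumes a complete alignment without gaps or missing data; the function name is chosen for illustration, and significance testing is not included.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Return (D, S, pi) for a list of aligned sequences of equal length.

    S is the number of segregating sites and pi the mean number of
    pairwise differences. Assumes no gaps or missing data.
    """
    n = len(sequences)
    if n < 3:
        raise ValueError("need at least three sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    # Constants from Tajima's formula, depending only on the sample size.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (pi - seg_sites / a1) / math.sqrt(variance)
    return d, seg_sites, pi
```

A sample dominated by rare variants, as expected after a bottleneck followed by expansion or after a selective sweep, gives a negative D, which is why a negative value alone cannot distinguish demography from selection.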

Since many of the neutrality tests assume no recombination, the presence of recombination can undermine them. It is therefore important to estimate the rate of recombination. This was done for BniCOL1, leading us to the conclusion that recombination appears to be moderate, so taking the recombination rate into account seems unlikely to alter the lack of significant departure from neutrality. The recombination rate is much higher than in Arabidopsis thaliana, but that is probably because of the difference in reproductive systems: outcrossing in B. nigra and selfing in A. thaliana.

Paper III

The aim of this paper was twofold. First, we wanted to validate the association found in paper I in a larger set of populations. Second, we wanted to test whether the association found in paper I could simply reflect extensive linkage disequilibrium with areas outside the gene, most notably the intergenic space between BniCOL1 and BniCOa.

We extended the experiments both by including samples from more populations and by looking at more markers and polymorphisms around BniCOL1. Since strong population structure had previously been found, adding more populations, rather than just more samples from the same populations, should be useful because it allows us to verify the generality of the pattern found in our first study. We used individuals from 25 populations and genotyped them for polymorphism at Ind2 as well as at 4 adjacent polymorphisms, and assessed the relationship of these markers with flowering time. Many of the additional populations come from geographical areas not previously covered. By adding those populations the geographical range was much extended, especially eastwards, with the inclusion of 4 populations from India. The inclusion of these populations also allowed us to check whether the fixation of the S allele was specific to Ethiopia or, alternatively, a characteristic of southern populations. As in paper I, the association between flowering time and marker genotypes was tested using analysis of variance.

Flowering time varied considerably among the populations, with a significant population effect. As before, the Ethiopian populations flowered early, and so did the Indian populations, while the other populations displayed large variation in flowering time. Given their latitudinal origin, the Turkish populations were surprisingly late flowering. The Ethiopian populations were almost monomorphic at all 5 loci, while the Indian and Middle Eastern populations exhibited slightly more polymorphism. Most variation was found within the European populations. Because of the almost complete monomorphism of the Indian and Ethiopian populations, no estimate of genotypic disequilibrium could be obtained for them.

In the other populations the genotypic disequilibrium was found to be extensive in five populations (from Spain, France, Italy and two from Greece) and limited in all others. Ind2 and two markers in the intergenic region between BniCOL1 and BniCOa showed the highest genotypic disequilibrium.

The correlation between polymorphism at the loci and flowering time (FT) was analyzed in 15 populations (Indian, Ethiopian and Turkish populations were excluded due to their lack of polymorphism). All loci were analyzed separately in the different populations, and some combinations could not be tested due to lack of variation. In total, 55 comparisons were made and 15 of them were significant. The results confirmed our previous study concerning the Greek and the Spanish populations, with a significant relationship between FT and the polymorphism at Ind2. In addition, variation at other markers was also associated with FT in these populations, as well as in a few others. Besides Ind2, the markers with most of the significant correlations were the two markers located in the intergenic region between BniCOL1 and BniCOa. These results are compatible with the hypothesis that an important polymorphism controlling FT variation is located either within BniCOL1 or in the intergenic region between BniCOL1 and BniCOa. Furthermore, in a large number of early flowering populations from Ethiopia and India, the markers in the region are fixed for alleles associated with early flowering in variable populations. Still, a number of variable populations did not display association between FT and markers in or around BniCOL1. Thus, it is likely that additional loci contribute to flowering time variation in B. nigra. The initial QTL experiment identifying the locus in the BniCOL1 area utilized a single cross and was based on a limited population size, so several QTL might have escaped detection. Due to extensive linkage in the BniCOL1 area, the data from this study have not allowed us to narrow down the position of the QTN. Fine mapping using additional crosses, or association mapping with large sample sizes, might resolve this issue.

Paper IV

This paper focuses on the three known CO-like genes in Brassica nigra, namely BniCOL1, BniCOa and BniCOb, to obtain a broader picture of their evolution. BniCOa and BniCOb are more recent duplicates than BniCOL1, which arose from a duplication event before the split between Arabidopsis and Brassica. Still, BniCOL1 and BniCOa are located on the same chromosome just 3.5 kb apart within a major QTL for flowering time, while BniCOb is located within a second QTL (Lagercrantz et al. 1996). A joint study of the three genes therefore seemed warranted.

The degree of polymorphism varied among the three genes. BniCOb was the most variable, with BniCOL1 exhibiting only half of the level observed in BniCOb. The least variable was BniCOa, with only a quarter of the variation found in BniCOb. We tested the three genes for departure from the standard neutral model using several tests. The only test that was significant was the McDonald and Kreitman test for the BniCOL1 gene. While the McDonald-Kreitman test has many advantages, it cannot separate ancient and recent selection. We therefore do not know whether the departure from neutrality detected at BniCOL1 reflects recent or ancient selection. The rejection of neutrality in BniCOL1 is in contrast to what was found in paper II. The discrepancy between these studies may be due to the different samples used; for example, this study included a Turkish population that was very late flowering in spite of its southern location. The significant result is due to a deficit of polymorphic nonsynonymous changes relative to polymorphic synonymous changes. This could imply an increase in selective constraint in Brassica nigra and/or a comparatively higher constraint than in Arabidopsis. The latter could be because the BniCOL1 gene in Brassica nigra seems to be correlated with flowering time and is therefore under strong selective constraint, which is not the case in Arabidopsis. The former scenario, with an increase in selective constraint, can be explained by the fact that BniCOL1 is probably the only one left of the two or more initial CO-Like genes in Brassica nigra. When a duplication event takes place, one theory postulates that both genes initially experience relaxed selection pressure. Eventually one of the copies accumulates many deleterious mutations and becomes functionless, a so-called pseudogene. Thereafter the gene can either be deleted from the genome or become so different from its original structure that it is no longer recognizable. The remaining gene may then again experience an increased selection pressure.
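The McDonald and Kreitman test compares the ratio of nonsynonymous to synonymous changes among polymorphisms within a species with the same ratio among fixed differences between species, usually with a test of independence on the resulting 2 x 2 table. The sketch below is a minimal illustration using Fisher's exact test and the neutrality index; the function names are chosen for illustration and the counts in the example are hypothetical, since the table used in paper IV is not reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(
        table_prob(x)
        for x in range(lowest, highest + 1)
        if table_prob(x) <= p_observed * (1 + 1e-9)
    )


def mk_test(pn, ps, dn, ds):
    """McDonald-Kreitman test.

    pn, ps: polymorphic nonsynonymous and synonymous sites within the species.
    dn, ds: fixed nonsynonymous and synonymous differences between species.
    Returns (neutrality_index, p_value); the index is None if undefined.
    """
    p_value = fisher_exact_two_sided(pn, dn, ps, ds)
    if ps == 0 or dn == 0 or ds == 0:
        neutrality_index = None
    else:
        neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, p_value


# Hypothetical counts, for illustration only.
ni, p = mk_test(pn=1, ps=8, dn=5, ds=2)
print(ni, p)
```

A neutrality index below 1, as in this hypothetical table, corresponds to the pattern described in the text: fewer nonsynonymous polymorphisms than expected from the fixed differences.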

BniCOb showed characteristics that made us believe it is a recent pseudogene: for example, it has several mutations that interrupt its function, such as stop codons, and it has a high degree of nucleotide polymorphism. It also has a long deletion removing a large part of the highly conserved zinc finger region. This should be strongly selected against if BniCOb were a functional gene.

Conclusions

The present set of studies illustrates how one can go from QTL to genes controlling complex traits in close relatives of model species for which large amounts of data are available. Without comparative genomics data in Brassicaceae and the wealth of information on genes involved in the control of flowering time in A. thaliana, it would simply not have been possible to go from the QTL to a putative candidate locus so quickly. The magnitude of the effort needed to reach that goal in species not benefiting from this type of initial advantage is well illustrated by the work of Colosimo et al. (2005) in sticklebacks. So, in organisms where crosses are readily obtained, genomic information is available in the species itself or in a close relative, and phenotyping is rather straightforward, the strategy deployed here undoubtedly constitutes a good starting point. However, it also has some obvious limitations, most of which can fortunately be remedied. First, the initial QTL study only reflects the variation present in the initial cross, which may not necessarily be representative of the variation present in the species. Of course, more than one mapping population may be used, although, QTL mapping being a labor-intensive activity, this would imply a great deal of additional work. Second, QTLs can be very large genomic areas and harbor far more than a handful of reasonable candidate genes. It is worth pointing out that in the present case, for instance, BniCOa rather than BniCOL1 was our first choice. Third, in organisms where linkage disequilibrium is extensive it may be difficult to pinpoint the exact location of the causal polymorphism. In Brassica nigra, for instance, we could create sets of recombinant inbred lines and carry out fine-scale mapping studies. Such studies are currently underway in Brassica species of higher economic value such as B. napus and B. rapa. For future mapping efforts as well as for population genetic studies we would also need extensive SNP data. In the latter case, the availability of those SNPs would allow us to reconstruct the population history of the species and account for it when testing for the presence of selection at specific loci, such as BniCOL1.

Summary in Swedish

Popular science summary

When a plant flowers is extremely important to the plant, since it in turn affects its chances of giving rise to more plants through seed set. Because plants are physically bound to the place where they grow, they are more vulnerable to the external environment than animals. It is therefore essential to know when it is most suitable to flower with respect to the external environment, that is, when there is the right amount of light, water and warmth to produce many seeds that get the best possible conditions for germinating.

Because flowering time is so important to plants, it is also tightly regulated by a large number of genes, many of which affect each other in intricate networks. These work together to tell the plant when it is time. When a plant begins to flower is influenced by many genes, but also by the environment.

In this thesis we have tried to find where in the genome of black mustard (Brassica nigra) one or more genes with a large effect on flowering time can be found. The background is a QTL mapping study, which in brief means that one crosses plants that differ greatly in the trait one wants to study, in our case flowering time. One then studies their offspring, both with respect to flowering time and with respect to which genetic markers each individual plant displays. One looks for a pattern in which plants that, for example, flower very early always carry a particular genetic marker. These markers are spread throughout the genome. When a relationship between marker and trait is found, one can look at where in the genome that particular marker is located and in this way obtain information on where in the genome to look for a gene that affects, for example, flowering time. In our case we found two genes in a region of the genome that appeared interesting for flowering time. One of them, CONSTANS (CO), has been shown to affect flowering time in other plants, and the other, CONSTANS LIKE 1 (COL1), is, as the name suggests, very similar to the first since it is a copy. In the COL1 gene we were then able to see a relationship between variation in one part of the gene and flowering time (paper I). We then tried to confirm that result by examining more plants, from more populations and of different geographic origins (paper III). We also examined whether other parts of the genome affect flowering time by studying other regions as well, just outside the COL1 gene (also paper III). We also tried to see whether this gene is subject to selection, that is, an evolutionary pressure to be kept as it is. If the gene is involved in flowering time it is important to the plant and should therefore be protected against change (paper II).

In black mustard there are three genes that are duplicated (copied) variants of each other, and in the last paper we take a closer look at the differences between them and at their origin. The genes are COL1 and two variants of CO, called COa and COb. It appears that COL1 has acquired a different role in black mustard than in some other plants, where the same role is held by COa. It also appears that COb is in the process of losing its function, which may be because there is already a CO gene in the genome and that is enough for full function.

Acknowledgements

I think many theses in science, mine included, are definitely the result of teamwork (so I really hope I’m not the only one who will show up at the disputation defending it, or???). It is therefore not such a big surprise that this section, “Acknowledgements”, is the first to be read by the average science thesis reader. So I take this opportunity of undivided attention to really thank my team members!

- First of all, my head supervisor Martin Lascoux. Really great thanks for all the help in making this thesis. With your great passion for genetics and all kinds of statistical tests you are a great asset as a supervisor and for the whole department, even if I personally find it difficult to share your fascination for the program R! I also appreciate that you are always quick in giving feedback on written material and that you bring a French touch to the whole department (even if I am the one driving a French car...).

- Secondly, my other supervisor Ulf Lagercrantz, who showed me how to run my first PCR to start with... A more humble professor is hard to find, one who loves to sit by his computers (he always seems to have at least two running at the same time, compared to me who is fully occupied by one), but who is also very handy with all practical laboratory things, from how to fiddle with the PCR program to how to cultivate plants. Really great thanks to you too for all the help with my thesis and not least all the practical work behind it.

Great thanks also to:

- All my other Brassica team members: Per, Tomas, Oksana, Susanne, Kerstin, Maj-Britt, Harald and Fazia.

- To my fantastic room mates. I had the fortune to first share a room with Anna and Susanne, and then Laura. Thanks for great company during the years!

- To my department's team members at the Department of Plant Biology, Swedish University of Agricultural Sciences. Thank you for all the help in the lab and for all the interesting lunch room conversations, and last but not least for being so funny and nice to be around.

- And also thanks to my other department team, nowadays called the Department of Evolutionary Functional Genomics, Uppsala University. It has been great to know you all, some for a shorter time and some for my whole PhD period. I have truly enjoyed being around you and I will miss you all!

- And last but not least, to my biggest supporters in life (besides my parents!), my husband Richard (I could not have done this without you!) and our sweet children Erik, Zeenat and Bhavana.

References

Alvarez-Buylla, E R et al. (2000) An ancestral MADS-box gene duplication occurred before the divergence of plants and animals. Proceedings of the National Academy of Sciences 97:5328-5333

Balasubramanian, S et al. (2006) The PHYTOCHROME C photoreceptor gene mediates natural variation in flowering and growth response of Arabidopsis thaliana. Nature Genetics 38:711-715

Balding, D J (2006) A tutorial on statistical methods for population association studies. Nature Reviews Genetics 7:781-791

Blazquez, M A et al. (1998) Gibberellins promote flowering of Arabidopsis by activating the LEAFY promoter. The Plant Cell 10:791-800

De Bodt, S et al. (2003) And then there were many: MADS goes genomic. Trends in Plant Science 8:475-483

Boss, RK et al. (2004) Multiple pathways in the decision to flower: enabling, promoting, and resetting. The Plant Cell 16:S18-S31

Caicedo, A L et al. (2004) Epistatic interaction between Arabidopsis FRI and FLC flowering time genes generates a latitudinal cline in a life history trait. Proceedings of the National Academy of Sciences 101:15670-15675

Cockram, J et al. (2007) Control of flowering time in temperate cereals: genes, domestication, and sustainable productivity. Journal of Experimental Botany doi:10.1093/jxb/erm042

Colosimo, PF et al. (2005) Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928-1933

Doerge, RW (2002) Mapping and analysis of quantitative trait loci in experimental populations. Nature Reviews in Genetics 3:43-52.

Dunlap, J C (1999) Molecular bases for circadian clocks. The Cell 96:271-290

Griffiths, S et al. (2003) The evolution of CONSTANS-like gene families in barley, rice, and Arabidopsis. Plant Physiology 131:1855-1867

Gu, Z et al. (2002) Extent of gene duplication in the genomes of Drosophila, nematode and yeast. Molecular Biology and Evolution 19:256-262

Gomez-Campo, C and Prakash, S (1999) Origin and domestication. In Gomez-Campo (ed) Biology of Brassica coenospecies. Elsevier Science, pp 33-58

González-Martínez, SC (2007) Association genetics in Pinus taeda L. I. Wood property traits. Genetics 175:399-409

Himmelreich, R et al. (1996) Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Research 24:4420-4449

Van Hoof, A (2005) Conserved Functions of Yeast Genes Support the Duplication, Degeneration and Complementation Model for Gene Duplication. Genetics 171:1455-1461

Ingvarsson, PK (2006) Clinal variation in phyB2, a candidate gene for day-length-induced growth cessation and bud set across a latitudinal gradient in European aspen (Populus tremula). Genetics 172:1845-1853

Johanson, U et al. (2000) Molecular analysis of FRIGIDA, a major determinant of natural variation in Arabidopsis flowering time. Science 290:344-347

Lagercrantz, U et al. (1996) Comparative mapping in Arabidopsis and Brassica, congruence of genes controlling flowering time. The Plant Journal 9:13-20

Ledger, S (2001) Analysis of the function of two circadian-regulated CONSTANS-LIKE genes. The Plant Journal 26:15-22

Li, W-H et al. (2001) Evolutionary analyses of the human genome. Nature 409:847-849

Lynch, M and Conery, J S (2000) The Evolutionary Fate and Consequence of Duplicate Genes. Science 290:1151-1155

Kruskopf-Österberg, M et al. (2002) Naturally occurring indel variation in the Brassica nigra COL1 gene is associated with variation in flowering time. Genetics 161:299-306

Martin, J et al. (2004) Photoperiodic regulation of flowering in perennial ryegrass involving a CONSTANS-like homolog. Plant Molecular Biology 56: 159-169

Martinez-Garcia, J F et al. (2002) Control of photoperiod-regulated tuberization in potato by the Arabidopsis flowering-time gene CONSTANS. Proceedings of the National Academy of Science 99:15211-15216

Melzer, S et al. (1999) FPF1 modulates the competence to flowering in Arabidopsis. Plant Journal 18: 395-405

Morinaga, T (1928) Preliminary note on interspecific hybridization in Brassica. Proc. Imp. Acad. 4: 620-622

Mouradov, A et al. (2002) Control of flowering time: interacting pathways as a basis for diversity. The Plant Cell 14: S111-S130

Nei, M and Hughes, A L (1992) Balanced polymorphism and evolution by the birth-and-death process in the MHC loci. In 11th Histocompatibility Workshop and Conference (K. Tsuji, M. Aizawa and T. Sasazuki eds.), pp. 27-38. Oxford University Press, Oxford, UK.

Pritchard, JK et al. (2000) Inference of population structure using multilocus genotype data. Genetics 155:945-959

Putterill, J et al. (1995) The CONSTANS gene of Arabidopsis promotes flowering and encodes a protein showing similarities to zinc finger transcription factors. Cell 80:847-857

Putterill, J et al. (1997) The flowering time gene CONSTANS and homologue CONSTANS LIKE 1 (Accession no. Y10555 and Y10556) exist as a tandem repeat on chromosome 5 of Arabidopsis. Plant Physiology 114:396

Roux, F et al. (2006) How to be early flowering: an evolutionary perspective. Trends in Plant Science 11:375-381

Schranz, ME et al. (2002) Characterization and effects of the replicated flowering time gene FLC in . Genetics 162:1457-1468

Shavorskaya, O and Lagercrantz, U (2006) Sequence divergence at the putative flowering time locus COL1 in Brassicaceae. Molecular Phylogenetics and Evolution 39:846-854

Sun, T-P and Kamiya, Y (1994) The Arabidopsis GA1 locus encodes the cyclase ent-kaurene synthetase. Plant cell 6:1509-1518

Sung, S and Amasino, R M (2005) Remembering winter: Toward a molecular understanding of vernalization. Annual Review of Plant Biology 56:491-508

Tanksley, S D (1993) Mapping polygenes. Annual Review of Genetics 27:205-233

Thornsberry, J (2001) Dwarf8 polymorphisms associate with variation in flowering time. Nature Genetics 28: 286-289.

Thumma, BR et al. (2005) Identification of casual relationship among traits related to drought resistance in Stylosanthes scabra using QTL analysis. Journal of Experimental Botany 52:203-214

Tomb, J-F et al. (1997) The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 389:539-547

U N (1935) Genomic analysis in Brassica with special reference to the experimental formation of B. napus and peculiar mode of fertilization. Japan J Bot 7:389-452

Yokoyama, A and Yokoyama, R (1989) Molecular evolution of human visual pigment gene. Molecular Biology and Evolution 6:186-197

Young, M W and Kay, SA (2001) Time zones: a comparative genetics of circadian clocks. Nature Reviews Genetics 2:702-715

Yu, J et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics 38:203-208

Valverde, F et al. (2004) Photoreceptor regulation of CONSTANS protein in photoperiodic flowering. Science 303:1003-1006

Wang, R-L et al. (1999) The limits of selection during maize domestication. Nature 398: 236-239.

Warwick, S I and Black, L D (1993) Molecular relationships in subtribe Brassicinae (Cruciferae, tribe Brassiceae). Canadian Journal of Botany 71:906-918

Warwick, S I et al. (2000) The biology of Canadian weeds. 8. Sinapis arvensis L. (updated). Canadian Journal of Plant Science 80:939-961

Westman AL and Kresovich, S (1999) Simple sequence repeat (SSR)-based marker variation in Brassica nigra genebank accessions and weed populations. Euphytica 109: 85-92.

Wilson, R N et al. (1992) Gibberellin is required for flowering in Arabidopsis thaliana under short days. Plant physiology 100:403-408

Wright, S I (2005) The effects of artificial selection on the maize genome. Science 308:1310-1314

Yano, M et al. (2000) Hd1, a major photoperiod sensitivity quantitative trait locus in rice, is closely related to the Arabidopsis flowering time gene CONSTANS. The Plant Cell 12:2473-2483

Zhao, K et al. (2007) An Arabidopsis example of association mapping in structured samples. PLoS Genetics 3:e4.


Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 312

Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

ACTA UNIVERSITATIS UPSALIENSIS, UPPSALA 2007
Distribution: publications.uu.se
urn:nbn:se:uu:diva-7900