
A link between LEAFY and B-gene homologs in Welwitschia mirabilis sheds light on ancestral mechanisms prefiguring floral development

Edwige Moyroud1, Marie Monniaux1, Emmanuel Thévenon1, Renaud Dumas1, Charles P. Scutt2, Michael W. Frohlich2,3, François Parcy1

1 LPCV, CEA, CNRS, INRA, Université Grenoble-Alpes, BIG, 38000, Grenoble, France

2 Laboratoire de Reproduction et Développement des Plantes, UMR5667, CNRS, INRA, Université de Lyon, Ecole Normale Supérieure de Lyon, 46 allée d’Italie, 69364 Lyon Cedex 07, France.

3 Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK.

CORRESPONDING AUTHORS:

François Parcy; [email protected] 33 (0)4 38784978 @Francois_Parcy

and Michael Frohlich; [email protected] 44 (0)79 5223 2864

Twitter heading: Paving the way for flowers: a piece of the floral network predates flower origin

Twitter account: @Francois_Parcy

WORD COUNT

Total: 6197 words

Summary: 199 words

Introduction: 1415 words

Materials and Methods: 733 words

Results: 2000 words

Discussion: 1879 words

Acknowledgements: 170 words

This manuscript also contains 5 Main Figures (all in colour except Fig. 4), 4 Supplementary Figures, 5 Supplementary Tables, Supplementary Methods and Supplementary References.

SUMMARY

• Flowering plants evolved from an unidentified ancestor. Comparing the mechanisms controlling development in angiosperm flowers and gymnosperm cones may help elucidate the mysterious origin of the flower.

• We combined gene expression studies with protein behaviour characterisation in Welwitschia mirabilis to test whether the known regulatory links between LEAFY and its MADS-box gene targets, central to flower development, might also contribute to gymnosperm reproductive development.

• We found that WelLFY, one of two LEAFY-like genes in Welwitschia, could be an upstream regulator of the MADS-box genes APETALA3/PISTILLATA-like (B-genes). We demonstrated that even though their DNA binding domains are extremely similar, WelLFY and its paralog WelNDLY exhibit distinct DNA binding specificities and that, unlike WelNDLY, WelLFY shares with its angiosperm ortholog the capacity to bind promoters of Welwitschia B-genes. Finally, we identified several cis-elements mediating these interactions in Welwitschia and obtained evidence that the link between LFY homologs and B-genes is also conserved in two other gymnosperms, Pinus and Picea.

• Although functional approaches to investigate cone development in gymnosperms are limited, our state-of-the-art biophysical techniques, coupled with expression studies, provide evidence that crucial links, central to the control of floral development, may already have existed before the appearance of flowers.

KEY WORDS:

Angiosperms, bisexual structure, flower development, flower origin, gymnosperms, LEAFY, MADS-box, transcription factors

INTRODUCTION

One of the most important developmental changes in the evolutionary origin of the flower was the combining of male and female reproductive organs onto a single axis (Frohlich & Parker, 2000; Baum & Hileman, 2006; Theissen & Melzer, 2007; Rudall & Bateman, 2010). However, the origin of bisexuality in the angiosperms remains enigmatic (Bateman et al., 2006; Frohlich & Chase, 2007; Doyle, 2008; Mathews & Kramer, 2012). By comparing the genetic circuits that control the development of bisexual flowers versus unisexual gymnosperm reproductive structures (GRS), we aim to generate evidence regarding the developmental network that functioned in the last common ancestor of the living seed plants (angiosperms and extant gymnosperms). An understanding of this ancestral seed plant network should help to identify the subsequent molecular changes which led to the appearance of the first flowers in the angiosperm lineage.

In angiosperms, male and female reproductive organ identity is controlled by the combinatorial expression of B- and C-class MADS-box genes: C-gene expression confers female (carpel) identity in primordia arising from the centre of the floral meristem, while combined B- and C-gene expression confers male (stamen) identity in primordia that form in the surrounding zone (Becker & Theißen, 2003). B- and C-class genes belong to the APETALA3/PISTILLATA (AP3/PI in Arabidopsis) and AGAMOUS (AG in Arabidopsis) clades of MADS-box genes, respectively. Gymnosperms also possess MADS-box genes within these two lineages (Theißen & Becker, 2004): AG-like genes are expressed in both male and female GRS, while AP3/PI-like genes are only expressed in male GRS (Sundström & Engström, 2002; Winter et al., 2002; Zhang et al., 2004). The expression of gymnosperm AP3/PI- or AG-like transgenes in flowering plants whose native B or C genes are inactivated by mutation is sufficient to restore near wild-type flower development, suggesting that the biochemical properties of B and C homologs are widely conserved between seed plants (Winter et al., 2002; Zhang et al., 2004). To generate the bisexual structure of the first flowers, a C-class expression domain must have arisen next to a B+C-class domain on the same growing axis. Accordingly, several authors have proposed a change in the regulation of AP3/PI and/or AG homologs as a crucial event on the lineage leading to the angiosperms, and candidate genes potentially responsible for this regulatory shift have been suggested (Albert et al., 2002; Becker & Theißen, 2003; Theißen & Becker, 2004; Baum & Hileman, 2006). However, regulators of AP3/PI and/or AG homologs in extant gymnosperms, the sister group to flowering plants, remain to be identified.

The LEAFY/FLORICAULA (LFY/FLO) gene encodes a unique plant transcription factor, which, in angiosperms, patterns the floral meristem by regulating B- and C-class genes (Moyroud et al., 2009a, 2010). In Arabidopsis for instance, LFY is a direct activator of both APETALA3 (AP3) and AGAMOUS (AG) (Parcy et al., 1998; Busch et al., 1999; Lohmann et al., 2001; Lamb et al., 2002). All major groups of extant gymnosperms possess two paralogous LFY-like genes (Frohlich & Parker, 2000; Vazquez-Lobo et al., 2007), first identified in Monterey pine as PRFLL (Mellerowicz et al., 1998) and NEEDLY (NDLY) (Mouradov et al., 1998a). The only known exception to the maintenance of two LFY-like paralogs in gymnosperms is in the , at least some species of which seem to possess a single LFY-like gene (Frohlich & Meyerowitz, 1997; Shindo et al., 2001; Frohlich, 2003). Phylogenetic analyses indicate that both LFY and NDLY homologs were probably present in the last common ancestor of the living seed plants and that the NDLY gene, retained in most gymnosperms, was subsequently lost in the angiosperm lineage before the radiation of the extant flowering plants (Frohlich & Parker, 2000).

LFY-like genes are expressed in the developing GRS of all gymnosperms studied to date, consistent with a role for these genes in reproductive development (Mellerowicz et al., 1998; Mouradov et al., 1998a; Shindo et al., 2001; Carlsbecker et al., 2004; Dornelas & Rodriguez, 2005; Guo et al., 2005; Shiokawa et al., 2008). A recent study in Norway spruce provides further support for a role of LFY-like proteins in reproductive initiation: in the acrocona mutant, vegetative shoots can develop into female cones and this vegetative-to-reproductive transition correlates with an upregulation of LFY-like genes (Carlsbecker et al., 2013). A comprehensive study performed in three conifer genera also demonstrated that LFY and NDLY homologs are both expressed within male and female GRS, sometimes in overlapping territories, but that their expression patterns often diverge so that the two paralogs are expressed in mutually exclusive domains, especially in late developmental stages (Vazquez-Lobo et al., 2007). These observations suggest that LFY and NDLY make distinct contributions to GRS formation, though the identity of the genes they regulate in gymnosperms and the molecular basis accounting for their hypothetical functional divergence remain to be established.


A link between LFY-like genes and homologs of floral homeotic genes in gymnosperms has been a central postulate of many hypotheses of flower origin (Albert et al., 2002; Becker & Theißen, 2003; Theißen & Becker, 2004; Baum & Hileman, 2006), though the existence of this link has never been demonstrated. In particular, it is not known whether gymnosperm LFY-like genes perform a similar function to their angiosperm counterparts by regulating AP3/PI- and AG-like genes, as the regulatory potential of LFY-like proteins has never been analysed in any gymnosperm species. To investigate the role(s) of LFY-like genes in gymnosperms, a non-flowering seed plant lineage, we used a combination of gene expression and biophysical analyses to test for the existence of a minimal network involving LFY-like proteins and AP3/PI- and AG-like MADS-box genes in the gymnosperm Welwitschia mirabilis.

Welwitschia presents numerous advantages as a gymnosperm for molecular-developmental studies: plants can make male reproductive structures (cones) in as little as two years from seed (Van Jaarsveld, 1992) and are small enough to be isolated in controlled environments. Although the plant body is famously bizarre, the reproductive structures are generalized; they have not lost numerous parts, as have their relatives Gnetum and Ephedra, nor do they have extensive fusions, with the resulting morphological ambiguity of conifer cones (Mundry & Stützel, 2004). Welwitschia cones show long gradate development in which all stages are simultaneously available. The cones are borne on thin, branching stems that emerge from the “scaly body” between the (Fig. 1A). Each cone bears a series of opposite-decussate axillary fertile units subtended by sterile bracts (Fig. 1B-C), such that newly formed units emerge at the cone tip and become progressively older towards the base. In female cones, each axillary unit comprises three pairs of opposite bracts surrounding a central fertile ovule. In male cones, each axillary unit (Fig. 1D) consists of two pairs of opposite bracts surrounding antherophores fused in a tube bearing six stalked synangia (the pollen-producing organs). The antherophores enclose a sterile ovule (Fig. 1D), which functions in the attraction of pollinators by producing sugar-containing droplets (Endress, 1996). The sterile ovule is identifiable as an ovule because (1) it is borne in a comparable position (terminal on the axillary unit stem) to the seed-producing ovules of the female, (2) it is structurally similar to fertile ovules on the female, having a nucellus surrounded by an integument prolonged into a tube, and (3) this tube releases a droplet. However, the nucellus of the sterile ovule lacks the megaspore and megagametophyte of the fertile ovules on the female.


Orthologs of LFY and NDLY in Welwitschia (WelLFY and WelNDLY) had previously been isolated (Frohlich & Meyerowitz, 1997; Frohlich & Parker, 2000), but not further characterized. Here, we isolated five distinct MADS-box genes of the AP3/PI, AG, AGL6 and Bsister clades and showed that, among these genes, the expression of the two AP3/PI homologs and of the AG homolog was compatible with regulation by WelLFY and/or WelNDLY. Further biophysical analyses suggested that WelLFY, in particular, could regulate AP3/PI-like gene expression, as does its ortholog in angiosperms. Our analyses revealed that WelLFY and WelNDLY show distinct DNA binding specificities and that only WelLFY is able to bind efficiently the promoters of the AP3/PI-like genes present in Welwitschia. In addition, we also detected conserved LFY DNA-binding sites of high affinity in the promoters of AP3/PI homologs in four different species of conifers. Taken together, these data constitute the first evidence at the biochemical level that ancient protein-DNA interactions involving LFY-like proteins and homologs of floral homeotic MADS-box genes were already established before the appearance of angiosperms, which is a central postulate of many theories for the origin of flowers.

MATERIALS AND METHODS

Plant Material

W. mirabilis Hook. f. tissues, collected at the Huntington Botanical Gardens, San Marino, CA or at California State University, Fullerton, CA, were frozen in liquid nitrogen for mRNA extraction or fixed in 4% paraformaldehyde prior to in situ hybridization.

Gene identification and expression analysis

Welwitschia cDNAs were isolated by a combination of RT-PCR and RACE-PCR, using primers listed in Supplementary Table 1 and as described in Supplementary Methods. The phylogenetic placement of the fully sequenced cDNAs was assessed using ML phylogenetic reconstructions, as described in Supplementary Methods. The expression of LFY-like and MADS-box genes was initially analyzed in Welwitschia vegetative and reproductive tissues using semi-quantitative PCR, and then in detail in developing Welwitschia male cones using in situ hybridization, as described in Supplementary Methods. New sequences are available in GenBank (accession nos. KF145184, KF145185, KF145186, KF145187, KF145188, KF145189, KF145190).


Protein production and DNA binding analysis

Recombinant C-terminal (DNA Binding Domain, DBD; Supplementary Fig. 1) and near full-length versions of WelLFY and WelNDLY (WelLFYD and WelNDLYD; D denotes proteins starting at amino acid 58 and 56 respectively and ending at the stop codon) were expressed from pETM-11 (EMBL) and pET expression vectors and purified from E. coli cultures. Presumptive promoter regions upstream of Welwitschia MADS-box genes were amplified from Welwitschia genomic DNA by anchored PCR, as described in Supplementary Methods. DNA binding behaviour of LFY-like proteins was analyzed using three complementary methods (SELEX-seq, gel shift and SPR).

Position Weight Matrices (PWM) describing LFY-like protein binding to all possible target sequences were derived using the SELEX-seq procedure, as previously described (Chahtane et al., 2013). Briefly, SELEX-seq (Systematic Evolution of Ligands by EXponential enrichment followed by massive sequencing) is an in vitro technique allowing the purification of dsDNA sequences presenting a high affinity for a protein of interest from a random pool of 30nt-long dsDNA (library) (Liu & Stormo, 2005). The process is iterative: the first round of selection involves the synthesis of a ‘random 30nt long’ dsDNA library, followed by incubation of this library with the protein of interest and finally purification of the protein/DNA complexes. A new library is synthesised at the beginning of round 2, using the dsDNA purified in round 1 as a template. The enrichment of high affinity sequences in each successive library is estimated by gel shift. Here, libraries that gave a visible shift (cycles 3 and 4 for WelLFYD, cycles 2 and 3 for WelNDLYD) were selected and prepared for sequencing, using our recently published methodology (Chahtane et al., 2013). More than 20000 sequences were obtained for each library. The 2000 most frequent unique sequences were aligned and the frequencies of each nucleotide at each position were derived from this alignment to generate the WelLFYD PWM, with constraints of size (19 bp) and symmetry, taking into account the dependency between positions 4, 5 and 6; 9, 10 and 11; 14, 15 and 16 as previously described (Moyroud et al., 2011; Minguet et al., 2015). For WelNDLYD, the analysis of the 2000 most frequent sequences gave variable results and no robust motif was found.

Binding of LFY-like proteins to whole Welwitschia MADS-box gene promoters was quantified using surface plasmon resonance (SPR). This method can be used to investigate the interaction between a transcription factor and a long piece of DNA (typically a few kb long) (Moyroud et al., 2009b): DNA fragments are attached to a gold-coated chip in contact with a solution containing the transcription factor of interest. The formation of DNA/protein complexes produces an angular deflection of an extinction band occurring within a beam of plane-polarized light reflected from the other side of the gold layer. This deflection is measured and analysed to quantify the strength of the interaction between the two components.

Individual candidate binding sites were identified by scanning the promoter sequences in silico using the PWMs and further tested, together with known binding sites from Arabidopsis LFY target genes, using EMSA assays, as fully described in Supplementary Methods.
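The step from aligned SELEX-seq sequences to a weight matrix can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the procedure used in this study: positions are treated as independent (the WelLFYD matrix additionally accounts for dependencies within nucleotide triplets and for symmetry), the toy sites are invented, and both the pseudocount and the scoring convention (log of each count relative to the most frequent base, so that the consensus scores 0 and every other site scores below 0) are assumptions chosen for clarity.

```python
import math
from collections import Counter

BASES = "ACGT"

# Hypothetical toy alignment; real input would be thousands of aligned sites.
TOY_SITES = ["ACGT", "ACGT", "ACGA"]


def build_pwm(aligned_sites, pseudocount=1.0):
    """Build a position weight matrix from equal-length aligned sites.

    Each column maps a base to log(count / count of the most frequent base),
    so the preferred base at every position contributes 0 to the score.
    Positions are assumed to be independent (a simplification).
    """
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    pwm = []
    for pos in range(width):
        counts = Counter(site[pos] for site in aligned_sites)
        # The pseudocount avoids log(0) for bases never seen at this position.
        smoothed = {base: counts[base] + pseudocount for base in BASES}
        top = max(smoothed.values())
        pwm.append({base: math.log(smoothed[base] / top) for base in BASES})
    return pwm


def score_site(pwm, site):
    """Sum the per-position weights for a site of the same width as the PWM."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM width")
    return sum(column[base] for column, base in zip(pwm, site))


if __name__ == "__main__":
    pwm = build_pwm(TOY_SITES)
    for site in ("ACGT", "ACGA", "TTTT"):
        print(site, round(score_site(pwm, site), 3))
```

With this convention, a more negative score indicates a site that departs further from the preferred sequence, which is why a score threshold can be used to shortlist candidate sites.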

Analysis of B gene promoters in conifers

Homologs of DAL11, DAL12 and DAL13, the three B genes in Picea abies (Sundström & Engström, 2002), were identified by BLAST (Congenie.org, Dendrome Project: http://dendrome.ucdavis.edu) against the sequenced genomes of Picea glauca, Pinus taeda and Pinus lambertiana (Supplementary Table 2). Potential LEAFY binding sites were identified in the 3.5 kb upstream of the coding sequences of all B gene homologs in those four conifer species using the WelLFYD matrix.
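Scanning an upstream region with a weight matrix, as described above, amounts to scoring every window on both strands and keeping those above a threshold. The Python sketch below shows this under stated assumptions: the four-position matrix, the penalty value, the example sequence and the threshold are all hypothetical placeholders, and the real analysis used the 19 bp WelLFYD matrix with its own scoring scheme.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def toy_pwm(consensus, penalty=-5.0):
    """Hypothetical matrix: 0 for the consensus base, a fixed penalty otherwise."""
    return [{base: (0.0 if base == c else penalty) for base in "ACGT"}
            for c in consensus]


def scan_promoter(sequence, pwm, threshold):
    """Return (start, strand, score) for every window scoring above threshold.

    Both strands are scored; windows containing ambiguous bases are skipped.
    """
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) - set("ACGT"):
            continue  # skip windows with N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = sum(column[base] for column, base in zip(pwm, site))
            if score > threshold:
                hits.append((start, strand, score))
    return hits


if __name__ == "__main__":
    # Invented sequence carrying one forward and one reverse-strand match.
    hits = scan_promoter("TTAACCTTGGTTAA", toy_pwm("AACC"), threshold=-1.0)
    for start, strand, score in hits:
        print(start, strand, score)
```

Keeping the strand and position of each hit makes it straightforward to compare the best-scoring sites between promoters or between species.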

RESULTS

Expression of LFY-like genes precedes or parallels the activation of AP3/PI-like and AG-like genes in male tissues

To identify potential targets of LFY-like proteins in Welwitschia mirabilis, we isolated five distinct MADS-box cDNAs from a Welwitschia male and female cone cDNA library (Supplementary Fig. 2). Subsequent phylogenetic analyses (Fig. 1E) indicated that two of these belong to the AP3/PI clade: WelAP3/PI-1 (close to Gnetum gene GGM2) and WelAP3/PI-2 (close to Gnetum gene GGM15). The others belong to the AG (WelAG), AGL6 (WelAGL6) and GGM13/Bsister (WelBsister) clades. AP3/PI and AG-like genes in gymnosperms are thought to specify male-versus-female identity of reproductive organs, similarly to their well-characterised homologs in angiosperms (Theißen & Becker, 2004). Gymnosperm AGL6-like genes may be involved in the switch to reproductive development (Carlsbecker et al., 2004), while Bsister genes have been proposed to specifically regulate female developmental programs in gymnosperms (Becker et al., 2002a,b). To elucidate the overall expression profiles of these genes in Welwitschia vegetative and reproductive tissues, we performed semi-quantitative RT-PCR analyses. These analyses (Fig. 2) showed that WelAP3/PI-1 and WelAP3/PI-2 were both expressed exclusively in male cones, including the tip region in which new axillary units are forming. WelAG was expressed in both male and female cones, but not in leaves or female bracts (Fig. 2). WelBsister expression was detected exclusively in female Welwitschia cones (Fig. 2), most strongly in ovules, though also faintly in sterile bracts and young axillary units at the cone tip. Finally, WelAGL6 expression was detected in all tissues examined, including leaves (Fig. 2). Thus, the Welwitschia MADS-box genes investigated showed expression profiles that parallel those of their orthologs in other gymnosperms and angiosperms. In particular, Welwitschia AP3/PI-like and AG-like genes were expressed specifically in reproductive tissues, with AP3/PI-like gene expression further limited to male cones.

Expression of WelLFY and WelNDLY was detected in all tissues examined, though WelNDLY was expressed more weakly in bracts and much more weakly in male leaves than in the other tissues tested (Fig. 2). The broad expression profiles of these genes precluded simple correlations between LFY-like genes and homologs of floral homeotic genes in Welwitschia. Thus, we performed in situ hybridizations to analyse the detailed spatiotemporal expression patterns of LFY-like genes and their potential targets, WelAP3/PI and WelAG, during Welwitschia cone development. As AP3/PI-like genes are not expressed in female cones (Fig. 2), we used male cones, which express all four genes, to perform this analysis. Expression of WelLFY was first evident on the flanks of the cone apex where axillary units were about to form (Fig. 3A-B) and in very young bracts and emerging axillary unit primordia, just below the apical dome (Fig. 3B). WelNDLY transcripts, by contrast, were not detectable at the cone apex (Fig. 3D). As fertile units developed, the WelLFY signal disappeared from the bracts (Fig. 3B), but remained high in the synangium primordia (Fig. 3C) and later became apparent in the upper part of the elongating synangia (Fig. 3C).
As pollen-producing tissues started to differentiate, the WelLFY signal remained very strong, but became restricted to the regions bordering the pollen sacs. At this stage, WelLFY transcripts were clearly excluded from the tissues that would later generate pollen grains (indicated by asterisks in Fig. 3C). Similarly, WelNDLY was initially expressed in synangium primordia, though at slightly later stages this signal became restricted to the future pollen-producing tissues (indicated by asterisks in Fig. 3D). In addition, WelNDLY transcripts were observed at the top of the nucellus in the sterile ovule (Fig. 3E). As with LFY-like genes, Welwitschia AP3/PI-like and AG-like genes were also expressed during male axillary unit development. A strong WelAG signal first appeared when synangium primordia emerged (Fig. 3F). This signal was maintained as the primordia grew, though it became restricted to the cells that would give rise to pollen grains (Fig. 3F-G). WelAG expression was also visible at the top of the nucellus of the sterile ovule (Fig. 3H). The two AP3/PI-like genes showed nearly identical expression patterns: both were first expressed throughout early male organ primordia, except in the centre of the axillary unit from which the sterile ovule would emerge (Fig. 3I-J). These WelAP3/PI signals were visible as a gradual coloration along the synangium stalks, reaching maximum intensity at the top of the pollen-producing organs. As sporogenous cells differentiated, WelAP3/PI gene expression became restricted to the tissues enclosing these cells, similar to the expression of WelLFY (Fig. 3C, I and K).

Based on these expression patterns, we conclude (i) that the expression of both Welwitschia LFY-like genes precedes or parallels the activation of AP3/PI- and strong activation of AG-like genes in male tissues, and (ii) that in later stages of male cone development, the expression domain of WelLFY and the two WelAP3/PI genes becomes mutually exclusive from that of WelNDLY and WelAG. These data are consistent with a possible regulatory link between LFY/NDLY and homologs of floral homeotic genes in Welwitschia.

LFY orthologs in Welwitschia and Arabidopsis share a similar DNA-binding specificity, while that of WelNDLY differs

To investigate the roles of WelLFY and WelNDLY in transcriptional regulation, we characterized their DNA binding specificities. We produced recombinant versions of the DNA binding domains of these proteins (WelLFYDBD and WelNDLYDBD, Supplementary Fig. 1) and analyzed their properties in vitro. Size-exclusion chromatography assays indicated that WelLFYDBD and WelNDLYDBD are monomeric in solution (Supplementary Table 3) and both proteins can bind to AP1bs1, a DNA probe bearing the Arabidopsis LFY binding site from the AP1 promoter (Benlloch et al., 2011) (Fig. 4A). The band patterns obtained in EMSA for WelLFYDBD and WelNDLYDBD were reminiscent of Arabidopsis LFYDBD (Hamès et al., 2008) (Fig. 4A), suggesting that these Welwitschia proteins, like LFYDBD, bind to DNA as dimers.

We confirmed this hypothesis by mixing WelNDLYDBD with a GFP-tagged WelNDLYDBD protein (Fig. 4B); a new complex of intermediate mobility formed, which corresponded to a WelNDLYDBD/GFP-WelNDLYDBD complex bound to AP1bs1. We also observed the formation of a WelLFYDBD/GFP-WelNDLYDBD heterodimer in vitro (Fig. 4B).

Our EMSA results indicate that WelLFYDBD presents a higher affinity than WelNDLYDBD for AP1bs1 and for two other Arabidopsis LFY binding sites (AGbs2 and AP3bs1, located in the regulatory second intron of AG and in the promoter of AP3, respectively), as complexes involving WelLFYDBD are detected at lower protein concentrations (Fig. 4A, C). This suggests that, of the two Welwitschia LFY-like proteins, WelLFYDBD has the closest DNA preferences to Arabidopsis LFY. We then used a SELEX-seq (Systematic Evolution of Ligands by EXponential enrichment followed by massive sequencing) approach (Zhao et al., 2009; Moyroud et al., 2011; Chahtane et al., 2013) to characterize the DNA binding specificity of near full-length proteins (WelLFYD and WelNDLYD) that included their conserved N-terminal oligomerization domain (Sayou et al., 2016). Comparing the logo of WelLFYD to that of AtLFYD (Chahtane et al., 2013) shows that the DNA binding preferences of these two factors are very similar (Fig. 5A, Supplementary Table 4), consistent with the findings of a recent study of LFYDBD evolution (Sayou et al., 2014). The analysis of DNA sequences recovered using WelNDLY was less straightforward as it gave variable results, preventing us from generating a logo that faithfully reflects WelNDLY DNA binding specificity. Some sequences bound by WelNDLY clearly resemble those bound by LFY, whereas others are very different (Supplementary Table 5). For example, we could identify two sequences, W10 and W12, which are bound by WelNDLYD, but not by WelLFYD, to form a complex of lower mobility (Fig. 4D; Supplementary Table 5). These results indicate that WelLFY and WelNDLY DNA binding behaviours have diverged to some extent even if the two proteins display overlapping preferences.

Biochemical data support the regulation of AP3/PI-like genes by WelLFY in Welwitschia, as it occurs in angiosperms

Based on gene expression patterns, WelLFY is a more plausible candidate than WelNDLY to regulate Welwitschia AP3/PI-like genes. To further investigate this, we isolated the sequences upstream of the WelAP3/PI coding regions (2.8 kb for WelAP3/PI-1 and 3.3 kb for WelAP3/PI-2) and tested the ability of these full-length sequences to interact either with WelLFY or with WelNDLY. We used surface plasmon resonance (SPR) (Moyroud et al., 2009b) to evaluate the specific interactions of each paralog with the upstream regions of the Welwitschia AP3/PI homologs, and with a 2.2 kb genomic fragment of the WelTubulin gene used as a negative control. Neither WelLFYD nor WelNDLYD was able to bind efficiently to the WelTubulin upstream region (best-site apparent KD ≈ 22 µM) and the quality of the fits obtained for these interactions was low (χ² > 14), as is often the case when only non-specific binding occurs (Table 1). However, the presence of sites of high affinity for WelLFY (apparent KD ≈ 45 nM, χ² = 2.6) was detected in the upstream sequence of WelAP3/PI-2. This DNA region could also contain medium affinity (apparent KD ≈ 175 nM) binding sites for WelNDLY, but the insufficient quality of the fit (χ² = 13.8) prevented us from establishing their existence with confidence (Table 1). Results for the other B gene, WelAP3/PI-1, were comparable: the presence of medium affinity binding sites was detected with WelLFYD (apparent KD ≈ 730 nM, χ² = 4.5), whereas WelNDLYD was not able to bind efficiently to the upstream sequence of WelAP3/PI-1 (apparent KD ≈ 76 µM, χ² = 4.1). These results suggest that the upstream regions of both AP3/PI-like genes in Welwitschia possess binding sites for WelLFY, but not for WelNDLY.

Next, to identify those binding sites, we scanned the relevant upstream regions with the WelLFYD Position Weight Matrix (PWM) obtained from SELEX-seq (Fig. 5A-C; Supplementary Table 4). In each of the WelAP3/PI-1 and WelAP3/PI-2 upstream sequences used in SPR analyses, we detected five motifs (Fig. 5B and C) predicted as high affinity WelLFY binding sites (score > -20). We then used an EMSA assay to test whether WelLFY could bind efficiently to DNA probes corresponding to the three sites with the highest scores in the WelAP3/PI-1 upstream sequence (named bs1, bs2 and bs3) and the sites with, respectively, the highest (bs4) and the lowest (bs5) score above the -20 threshold in the WelAP3/PI-2 upstream sequence (Fig. 5D). The results of these analyses showed that all sites tested except bs5 could interact efficiently with WelLFYD, forming four distinct complexes (Fig. 5D), as in Fig. 4D. When tested with WelNDLY, these five individual sites were poorly bound, consistent with the SPR measurements on the whole promoters (Supplementary Fig. 3). Our results provide biochemical support for the proposition that WelLFY could regulate the expression of both Welwitschia AP3/PI-like genes, WelAP3/PI-1 and WelAP3/PI-2, consistent with the results of our expression analysis in male cone tissues (Fig. 3).

To further test this hypothesis and determine if a link between LFY and B genes could also be detected in other gymnosperms, we took advantage of recently released conifer genomic sequences. Norway spruce (Picea abies) possesses three B genes, DAL11, DAL12 and DAL13 (Sundström et al., 1999; Sundström & Engström, 2002).
Expression data analysis in this conifer species and constitutive expression of DAL11-13 in Arabidopsis together indicate that while DAL11 and DAL12 could specify the identity of the pollen cone meristem, DAL13 could act as a determinant of male organ (pollen-bearing organ) identity (Sundström & Engström, 2002). We identified homologs of those genes in three additional conifer genomes (Picea glauca, Pinus taeda, Pinus lambertiana) and analysed their promoter sequences for conservation of putative LFY binding sites (Supplementary Table 2 and Supplementary Fig. 4). Within each of the four species, we consistently detected the highest affinity binding sites in the promoters of DAL13-like genes (Supplementary Fig. 4). These sites were also largely conserved across species, even when the bulk of the promoter sequences diverged significantly (Supplementary Fig. 4), hinting that these could be functionally relevant and bona fide regulatory motifs. Taken together, data from Welwitschia, pine and spruce strongly suggest that a regulatory link between LFY-like proteins and B genes exists in extant gymnosperms. As LFY regulates B genes in angiosperms too, we propose that B genes were already targets of LFY-like proteins to specify the identity of the pollen-bearing organs in the last common ancestor of the living seed plants.

DISCUSSION

Expression of floral homeotic gene homologs in Welwitschia mirabilis is consistent with a combined role of B- and C-genes in specifying male identity

B- (AP3/PI) and C-class (AG) MADS-box genes control the identity of male and female reproductive organs in angiosperms. The discovery that AP3/PI- and AG-like genes are also present in gymnosperms stimulated an interest in understanding the role that homologs of floral homeotic genes could fulfil in plants that do not make flowers. Here, we observed that the expression of AP3/PI- and AG-like genes overlaps when primordia emerge in early stages of male cone development. However, WelAP3/PI-1 and WelAP3/PI-2 expression weakens and becomes spatially restricted once the pollen-producing organs are fully formed, while WelAG remains active in the pollen-producing cells and throughout ovule development. This situation, also observed in Gnetum (Becker et al., 2003), supports the idea of a male program triggered by the combined expression of B and C genes, and a female program initiated by C function alone (at least for sterile ovules). Since this feature is shared between extant gymnosperms and angiosperms, it is likely that the function of these genes was already established in the last common ancestor of extant seed plants (Winter et al., 2002; Becker & Theißen, 2003). Additionally, the absence of WelBsister expression in the sterile ovules of male cones (Fig. 2) suggests that Bsister genes may not be essential for ovule specification per se, but rather that they may play a role in the development of fully fertile ovules (e.g. in megaspore or megagametophyte formation). This result is consistent with observations made in angiosperms.
For example, in Arabidopsis, the Bsister gene TRANSPARENT 412 TESTA16/ARABIDOPSIS BSISTER (TT16/ABS) is involved in seed coat pigmentation (Nesi et al., 2002), outer integument development (De Folter et al., 2006; Prasad et al., 2010) and 414 also participates in endosperm development (Ehlers et al., 2016). This led to the conclusion that in angiosperms and perhaps in gymnosperms, Bsister genes are not essential for 416 specifying female organ identity but may contribute to the development of fertile ovules and (reviewed in Bernardi et al., 2014).


Evidence for ancient control of floral homeotic gene homologs by LFY-like genes

The conservation of AP3/PI- and AG-like gene expression between angiosperms and gymnosperms, with AG homologs expressed in both sexes and AP3/PI homologs in male tissues only, was a first clue that similar genetic mechanisms may underlie reproductive development in all seed plants (Tandre et al., 1998; Becker & Theißen, 2003; Theißen & Becker, 2004). As LFY regulates both AP3/PI- and AG-like genes in angiosperms, it was proposed that LFY homologs may fulfil a similar role in gymnosperms. However, most gymnosperms contain two LFY-like genes of ancient origin, LFY and NDLY, and previous studies, based solely on expression pattern analysis, revealed that deciphering the function of LFY-like genes in gymnosperms is a complex task, as LFY and NDLY proteins probably fulfil both separate and combined roles during cone development (Mellerowicz et al., 1998; Mouradov et al., 1998a; Shindo et al., 2001; Carlsbecker et al., 2004; Dornelas & Rodriguez, 2005; Guo et al., 2005; Vazquez-Lobo et al., 2007; Shiokawa et al., 2008).

In the present work, we observed that in Welwitschia, as in other gymnosperms, AP3/PI- and AG-like genes show typical expression profiles in male and female reproductive tissues (Figs 2 and 3), while WelLFY and WelNDLY show somewhat broader expression patterns (Fig. 2). However, a closer examination of developing male cones revealed that LFY-like gene expression always precedes or parallels WelAP3/PI and strong WelAG expression, suggesting that WelLFY and/or WelNDLY could regulate floral homeotic gene homologs in Welwitschia. To provide firmer evidence of a regulatory link than this correlation between LFY-like and MADS-box gene expression profiles, a genetic approach would be ideal. However, functional genetics to study cone development in Welwitschia is not yet possible, and so we used a range of state-of-the-art biochemical methods to determine whether important protein-DNA interactions, central to the genetic network controlling flower development in Arabidopsis and other angiosperms, could be conserved in Welwitschia. We used a combination of SELEX-Seq, SPR and EMSA analyses to show that WelLFY binds strongly and specifically to at least four sites in the presumptive promoters of two Welwitschia AP3/PI genes and that WelLFY and its ortholog in Arabidopsis exhibit nearly identical DNA binding specificities. In addition, we identified conserved high-affinity binding sites for LFY-like proteins in the promoter regions of homologues of DAL13, a presumed determinant of male organ identity, from four conifer species. These results combine to support a central tenet of hypotheses of flower origin (Frohlich & Parker, 2000; Albert et al., 2002): that LFY-like genes regulated the expression of specific classes of MADS-box genes in reproductive tissues before the appearance of the flower. In particular, we present strong evidence that the control of specific AP3/PI-like MADS-box genes by LFY/NDLY homologs predates the origin of the flower.

Functional divergence of the two LFY-like genes in gymnosperms

Specialization of the roles of LFY and NDLY homologs in gymnosperms has long been postulated but never firmly established (Frohlich & Parker, 2000; Albert et al., 2002; Vazquez-Lobo et al., 2007). Early suggestions that, in gymnosperms, LFY orthologs are only expressed in male reproductive structures while NDLY-like genes are only expressed in female cones have been refuted in conifers (Carlsbecker et al., 2004; Dornelas & Rodriguez, 2005; Vazquez-Lobo et al., 2007). Our analysis of WelLFY and WelNDLY expression patterns clearly establishes that the two paralogs are also expressed in both male and female cones in Welwitschia, a gnetalean. Thus, the non-sex-specific expression of LFY and NDLY genes is a characteristic shared by at least two extant lineages of gymnosperms. The control of male versus female cone specification, upstream of the AP3/PI and AG genes, remains unclear, but could involve the LFY genes. Ultimately, such control must begin at the earliest developmental stages of the cones. WelLFY, but not WelNDLY, is expressed in the male cone axis primordium and in the earliest primordia of male axillary units. Our evidence also shows that WelLFY, but not WelNDLY, could specify the presumed male-determining AP3/PI genes. However, without comparable data from female cones, any inferences on the role of LFY in specifying male cone identity are weak.

Here, we investigate the biochemical properties of WelLFY and WelNDLY and demonstrate that these two paralogs do not have identical DNA binding properties. LFY orthologs in angiosperms and gymnosperms have very similar DNA binding specificities (Fig. 5A), while the DNA binding specificity of their paralog NDLY has diverged (Fig. 4D): sequences containing a LFY motif are recognized by both paralogs, albeit sometimes more efficiently by WelLFY than by WelNDLY, and novel sequences are bound only by WelNDLY (Fig. 4A and B, for example). These observations could explain why both LFY and NDLY from gymnosperms, expressed in Arabidopsis or tobacco (Nicotiana tabacum), can induce flower formation and complement a lfy mutant (Mouradov et al., 1998a; Maizel et al., 2005; Shiokawa et al., 2008), while in a yeast assay, WelNDLY is not as efficient as PRFLL (an ortholog of WelLFY from Monterey pine) at activating a reporter gene under the control of the LFY binding sites from the AP1 or AG regulatory regions (Maizel et al., 2005). Our EMSA assays (Fig. 4D) also suggest that the complexes that WelNDLY forms with W10 and W12 differ from the DNA/protein complexes assembled with the canonical LFY binding sites (by having different protein:DNA ratios per complex, or different quaternary structures). Taken together, these results indicate that WelNDLY DNA binding specificity, and possibly its DNA binding mode, have changed as compared to WelLFY, indicating that LFY succeeded at least once in evolving a slightly different specificity (or an alternative mode of contacting the DNA) following a gene duplication event. However, this change is not as radical as the one that occurred between LFY and its distant homologs in algae or moss (Sayou et al., 2014).

Differences in biochemical activities may allow NDLY-like genes to fulfil different functions from their LFY paralogs: WelNDLY failed to interact efficiently with the upstream regions of the two WelAP3/PI genes, and our expression pattern analysis in late stages shows that AP3/PI-like genes are expressed in territories where WelNDLY is not expressed. Taken together, these data indicate that WelNDLY is unlikely to regulate AP3/PI-like genes in Welwitschia. By contrast, during late stages, WelNDLY expression correlates precisely with WelAG expression: both genes are first expressed in synangium primordia and later their expression becomes restricted to the cells that will give rise to pollen grains. Also, transcripts of both WelNDLY and WelAG are observed at the top of the nucellus in the sterile ovule. Thus, we speculate that WelNDLY could contribute to the regulation of WelAG expression, though confirmation of this hypothesis will require further experiments. It has been shown in various angiosperm species that the large intron conserved within the AG locus (1st or 2nd intron depending on the species) contains the regulatory elements sufficient to direct the expression of AG homologs, including the cis-element bound by LFY-like proteins (Causier et al., 2009; Moyroud et al., 2011).
Testing for the interaction between WelNDLY and the large intron of WelAG would allow us to determine whether WelNDLY has the capacity to regulate this gene. However, we did not succeed in cloning the full-length intron of WelAG, probably because this intron is very long (>10 kb). Similarly, previous studies in another gymnosperm, Cycas edentata, failed to isolate the long intron of CyAG, the AG homolog in this species, as here too the sequence is extremely long (Zhang et al., 2004). Examination of pine and spruce genomic sequences also did not allow isolation of a complete AG first intron.

Conclusion

Here we showed that DNA-protein interactions between LFY orthologs and B-gene orthologs, which are central to the floral gene network of angiosperms, can be detected in the gymnosperm Welwitschia (Gnetales). We also identified the specific sequences within the Welwitschia B-gene promoters to which the LFY ortholog transcription factor binds in vitro. We achieved this without the unknown complications of heterologous expression in a test angiosperm or yeast, but rather with well-characterized biophysical methods, using authentic gymnosperm protein and authentic DNA promoter sequences from the species under study. We also found, in other gymnosperms, that similar LFY-like binding sites are present in promoters of the orthologous B genes. As LFY also regulates B-genes in flowering plants, we propose that the relationship between LFY-like proteins and B genes is ancient and could have already been established in the last common ancestor of extant seed plants. Thus, these groups of extant seed plants not only use homologous sets of MADS-box genes to specify the identities of their reproductive organs, but may also employ similar regulators, i.e. LFY-like proteins, to control the expression of these homeotic genes. Next, we demonstrated that LFY and NDLY exhibit overlapping but distinct sets of target DNA sequences, which could reflect the functional divergence of the two LFY-like paralogs in gymnosperms. To what extent the modification of LFY-like protein behaviour participated in the evolution of the first flowers remains to be established. Our work also shows that biophysical analysis can be applied to genetically intractable organisms that occupy crucial phylogenetic positions to gain insight into the molecular mechanisms that led to morphological novelties.

ACKNOWLEDGMENTS

We thank N. Maturen, K. James and K. Warner for help with the in situ hybridization, M. Reymond and A. Chaboud for advice related to SPR experiments, N. Warthmann and D. Weigel for SELEX-seq sample sequencing, members of the Parcy laboratory for discussion and three anonymous reviewers for their insightful comments on the manuscript. We thank J. Trager and J. Folsom (Huntington Gardens, CA) and L. Song (Cal. State U. Fullerton) for providing Welwitschia material, E. Meyerowitz for providing lab space for the processing of plant tissues, G. Theissen and R. Melzer for providing the MADS gene alignment from Becker and Theissen (2003), and B. Dentinger for help with ML analysis. This work was supported by funding from the CNRS (ATIP+ to F.P.), the ANR (Plant-TFcode to F.P. and C.P.S.), PhD fellowships from the University J. Fourier, Grenoble (to E.M. and M.M.), GRAL (ANR-10-LABX-49-01), the SYNTHESYS Project (to E.M.), the Floral Genome Project (NSF Plant Genome Research Program project DBI-0115684 to M.W.F.) and by NSF DEB-9974374 (to M.W.F.).

AUTHOR CONTRIBUTIONS

E.M., M.W.F. and F.P. designed the research; E.M., M.M., E.T., R.D. and M.W.F. performed the research; E.M., M.M., M.W.F., C.P.S. and F.P. analysed the data; E.M. and F.P. wrote the manuscript with substantial help from M.W.F., C.P.S. and M.M. All authors read and helped to discuss the manuscript.

REFERENCES

Albert VA, Oppenheimer DG, Lindqvist C. 2002. Pleiotropy, redundancy and the evolution of flowers. Trends in Plant Science 7: 297–301.
Bateman RM, Hilton J, Rudall PJ. 2006. Morphological and molecular phylogenetic context of the angiosperms: contrasting the ‘top-down’ and ‘bottom-up’ approaches used to infer the likely characteristics of the first flowers. Journal of Experimental Botany 57: 3471–3503.
Baum DA, Hileman LC. 2006. A developmental genetic model for the origin of the flower. In: Ainsworth C, ed. Flowering and its manipulation. Sheffield, UK: Blackwell Publishing, 3–27.
Becker A, Bey M, Bürglin TR, Saedler H, Theissen G. 2002a. Ancestry and diversity of BEL1-like homeobox genes revealed by gymnosperm (Gnetum gnemon) homologs. Development Genes and Evolution 212: 452–457.
Becker A, Kaufmann K, Freialdenhoven A, Vincent C, Li MA, Saedler H, Theissen G. 2002b. A novel MADS-box gene subfamily with a sister-group relationship to class B floral homeotic genes. Molecular Genetics and Genomics 266: 942–950.
Becker A, Saedler H, Theissen G. 2003. Distinct MADS-box gene expression patterns in the reproductive cones of the gymnosperm Gnetum gnemon. Development Genes and Evolution 213: 567–572.
Becker A, Theißen G. 2003. The major clades of MADS-box genes and their role in the development and evolution of flowering plants. Molecular Phylogenetics and Evolution 29: 464–489.
Benlloch R, Kim MC, Sayou C, Thévenon E, Parcy F, Nilsson O. 2011. Integrating long-day flowering signals: A LEAFY binding site is essential for proper photoperiodic activation of APETALA1. Plant Journal 67: 1094–1102.
Bernardi J, Roig-Villanova I, Marocco A, Battaglia R. 2014. Communicating across generations: The B sister language. Plant Biosystems - An International Journal Dealing with all Aspects of Plant Biology 148: 150–156.
Birol I, Raymond A, Jackman SD, Pleasance S, Coope R, Taylor GA, Yuen MAM Saint, Keeling CI, Brand D, Vandervalk BP, et al. 2013. Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics 29: 1492–1497.
Busch MA, Bomblies K, Weigel D. 1999. Activation of a floral homeotic gene in Arabidopsis. Science 285: 585–587.
Carlsbecker A, Sundström JF, Englund M, Uddenberg D, Izquierdo L, Kvarnheden A, Vergara-Silva F, Engström P. 2013. Molecular control of normal and acrocona mutant seed cone development in Norway spruce (Picea abies) and the evolution of conifer ovule-bearing organs. New Phytologist 200: 261–275.
Carlsbecker A, Tandre K, Johanson U, Englund M, Engström P. 2004. The MADS-box gene DAL1 is a potential mediator of the juvenile-to-adult transition in Norway spruce (Picea abies). Plant Journal 40: 546–557.
Causier B, Bradley D, Cook H, Davies B. 2009. Conserved intragenic elements were critical for the evolution of the floral C-function. Plant Journal 58: 41–52.
Chahtane H, Vachon G, Le Masson M, Thévenon E, Périgon S, Mihajlovic N, Kalinina A, Michard R, Moyroud E, Monniaux M, et al. 2013. A variant of LEAFY reveals its capacity to stimulate meristem development by inducing RAX1. Plant Journal 74: 678–689.
Dornelas MC, Rodriguez APM. 2005. A FLORICAULA/LEAFY gene homolog is preferentially expressed in developing female cones of the tropical pine Pinus caribaea var. caribaea. Genetics and Molecular Biology 28: 299–307.
Doyle JA. 2008. Integrating Molecular Phylogenetic and Paleobotanical Evidence on Origin of the Flower. International Journal of Plant Sciences 169: 816–843.
Ehlers K, Bhide AS, Tekleyohans DG, Wittkop B, Snowdon RJ, Becker A. 2016. The MADS Box Genes ABS, SHP1, and SHP2 Are Essential for the Coordination of Cell Divisions in Ovule and Seed Coat Development and for Endosperm Formation in Arabidopsis thaliana. PLoS ONE 11: e0165075.
Endress PK. 1996. Structure and Function of Female and Bisexual Organ Complexes in Gnetales. International Journal of Plant Sciences 157: 113–125.
De Folter S, Shchennikova AV, Franken J, Busscher M, Baskar R, Grossniklaus U, Angenent GC, Immink RGH. 2006. A Bsister MADS-box gene involved in ovule and seed development in petunia and Arabidopsis. Plant Journal 47: 934–946.
Frohlich MW. 2003. An evolutionary scenario for the origin of flowers. Nature Reviews Genetics 4: 559–566.
Frohlich MW, Chase MW. 2007. After a dozen years of progress the origin of angiosperms is still a great mystery. Nature 450: 1184–1189.
Frohlich MW, Meyerowitz EM. 1997. The Search for Flower Homeotic Gene Homologs in Basal Angiosperms and Gnetales: A Potential New Source of Data on the Evolutionary Origin of Flowers. International Journal of Plant Sciences 158: S131.
Frohlich MW, Parker DS. 2000. The Mostly Male Theory of Flower Evolutionary Origins: from Genes to Fossils. Systematic Botany 25: 155–170.
Guo CL, Chen LG, He XH, Dai Z, Yuan HY. 2005. Expressions of LEAFY homologous genes in different organs and stages of Ginkgo biloba. Yi Chuan 27: 241–244.
Hamès C, Ptchelkine D, Grimm C, Thevenon E, Moyroud E, Gérard F, Martiel J-L, Benlloch R, Parcy F, Müller CW. 2008. Structural basis for LEAFY floral switch function and similarity with helix-turn-helix proteins. The EMBO Journal 27: 2628–2637.
Van Jaarsveld E. 1992. Welwitschia mirabilis in cultivation at Kirstenbosch. Veld and Flora 78: 119–120.
Jermstad KD, Eckert AJ, Wegrzyn JL, Delfino-Mix A, Davis DA, Burton DC, Neale DB. 2011. Comparative mapping in Pinus: Sugar pine (Pinus lambertiana Dougl.) and loblolly pine (Pinus taeda L.). Tree Genetics and Genomes 7: 457–468.
Lamb RS, Hill TA, Tan QK-G, Irish VF. 2002. Regulation of APETALA3 floral homeotic gene expression by meristem identity genes. Development 129: 2079–2086.
Liu J, Stormo GD. 2005. Combining SELEX with quantitative assays to rapidly obtain accurate models of protein-DNA interactions. Nucleic Acids Research 33: 1–6.
Lohmann JU, Hong RL, Hobe M, Busch MA, Parcy F, Simon R, Weigel D. 2001. A molecular link between stem cell regulation and floral patterning in Arabidopsis. Cell 105: 793–803.
Maizel A, Busch MA, Tanahashi T, Perkovic J, Kato M, Hasebe M, Weigel D. 2005. The floral regulator LEAFY evolves by substitutions in the DNA binding domain. Science 308: 260–263.
Mathews S, Kramer EM. 2012. The evolution of reproductive structures in seed plants: A re-examination based on insights from developmental genetics. New Phytologist 194: 910–923.
Mellerowicz EJ, Horgan K, Walden A, Coker A, Walter C. 1998. PRFLL - A Pinus radiata homologue of FLORICAULA and LEAFY is expressed in buds containing vegetative shoot and undifferentiated male cone primordia. Planta 206: 619–629.
Minguet EG, Segard S, Charavay C, Parcy F. 2015. MORPHEUS, a webtool for transcription factor binding analysis using position weight matrices with dependency. PLoS ONE 10: e0135586.
Mouradov A, Glassick T, Hamdorf B, Murphy L, Fowler B, Marla S, Teasdale RD. 1998a. NEEDLY, a Pinus radiata ortholog of FLORICAULA/LEAFY genes, expressed in both reproductive and vegetative meristems. Proceedings of the National Academy of Sciences of the United States of America 95: 6537–6542.
Mouradov A, Glassick TV, Hamdorf BA, Murphy LC, Marla SS, Yang Y, Teasdale RD. 1998b. Family of MADS-Box genes expressed early in male and female reproductive structures of Monterey pine. Plant Physiology 117: 55–62.
Moyroud E, Kusters E, Monniaux M, Koes R, Parcy F. 2010. LEAFY blossoms. Trends in Plant Science 15: 346–352.
Moyroud E, Minguet EG, Ott F, Yant L, Posé D, Monniaux M, Blanchet S, Bastien O, Thévenon E, Weigel D, et al. 2011. Prediction of regulatory interactions from genome sequences using a biophysical model for the Arabidopsis LEAFY transcription factor. The Plant Cell 23: 1293–1306.
Moyroud E, Reymond MCA, Hamès C, Parcy F, Scutt CP. 2009a. The analysis of entire gene promoters by surface plasmon resonance. Plant Journal 59: 851–858.
Moyroud E, Tichtinsky G, Parcy F. 2009b. The LEAFY floral regulators in angiosperms: Conserved proteins with diverse roles. Journal of Plant Biology 52: 177–185.
Mundry M, Stützel T. 2004. Morphogenesis of the reproductive shoots of Welwitschia mirabilis and Ephedra distachya (Gnetales), and its evolutionary implications. Organisms Diversity and Evolution 4: 91–108.
Nesi N, Debeaujon I, Jond C, Stewart AJ, Jenkins GI, Caboche M, Lepiniec L. 2002. The TRANSPARENT TESTA16 locus encodes the ARABIDOPSIS BSISTER MADS domain protein and is required for proper development and pigmentation of the seed coat. The Plant Cell 14: 2463–2479.
Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin Y-C, Scofield DG, Vezzi F, Delhomme N, Giacomello S, Alexeyenko A, et al. 2013. The Norway spruce genome sequence and conifer genome evolution. Nature 497: 579–584.
Parcy F, Nilsson O, Busch MA, Lee I, Weigel D. 1998. A genetic framework for floral patterning. Nature 395: 561–566.
Prasad K, Zhang X, Tobón E, Ambrose BA. 2010. The Arabidopsis B-sister MADS-box protein, GORDITA, represses fruit growth and contributes to integument development. Plant Journal 62: 203–214.
Rudall P, Bateman R. 2010. Defining the limits of flowers: the challenge of distinguishing between the evolutionary products of simple versus compound strobili. Philosophical Transactions of the Royal Society of London, Ser. B 365: 397–409.
Sayou C, Monniaux M, Nanao MH, Moyroud E, Brockington SF, Thévenon E, Chahtane H, Warthmann N, Melkonian M, Zhang Y, et al. 2014. A promiscuous intermediate underlies the evolution of LEAFY DNA binding specificity. Science 343: 645–648.
Sayou C, Nanao MH, Jamin M, Posé D, Thévenon E, Grégoire L, Tichtinsky G, Denay G, Ott F, Peirats Llobet M, et al. 2016. A SAM oligomerization domain shapes the genomic binding landscape of the LEAFY transcription factor. Nature Communications 7: 11222.
Shindo S, Sakakibara K, Sano R, Ueda K, Hasebe M. 2001. Characterization of a FLORICAULA/LEAFY homologue of Gnetum parvifolium and its implications for the evolution of reproductive organs in seed plants. International Journal of Plant Sciences 162: 1199–1209.
Shiokawa T, Yamada S, Futamura N, Osanai K, Murasugi D, Shinohara K, Kawai S, Morohoshi N, Katayama Y, Kajita S. 2008. Isolation and functional analysis of the CjNdly gene, a homolog in Cryptomeria japonica of FLORICAULA/LEAFY genes. Tree Physiology 28: 21–28.
Sundström J, Carlsbecker A, Svensson ME, Svenson M, Johanson U, Theißen G, Engström P. 1999. MADS-box genes active in developing pollen cones of Norway spruce (Picea abies) are homologous to the B-class floral homeotic genes in angiosperms. Developmental Genetics 25: 253–266.
Sundström J, Engström P. 2002. Conifer reproductive development involves B-type MADS-box genes with distinct and different activities in male organ primordia. Plant Journal 31: 161–169.
Tandre K, Svenson M, Svensson ME, Engström P. 1998. Conservation of gene structure and activity in the regulation of reproductive organ development of conifers and angiosperms. Plant Journal 15: 615–623.
Theißen G, Becker A. 2004. Gymnosperm orthologues of class B floral homeotic genes and their impact on understanding flower origin. Critical Reviews in Plant Sciences 23: 129–148.
Theissen G, Melzer R. 2007. Molecular mechanisms underlying origin and diversification of the angiosperm flower. Annals of Botany 100: 603–619.
Vazquez-Lobo A, Carlsbecker A, Vergara-Silva F, Alvarez-Buylla E, Pinero D, Engstrom P. 2007. Characterization of the expression patterns of LEAFY/FLORICAULA and NEEDLY orthologs in female and male cones of the conifer genera Picea, Podocarpus, and Taxus: implications for current evo-devo hypotheses for gymnosperms. Evolution and Development 9: 446–459.
Wegrzyn JL, Liechty JD, Stevens KA, Wu LS, Loopstra CA, Vasquez-Gross HA, Dougherty WM, Lin BY, Zieve JJ, Martínez-García PJ, et al. 2014. Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation. Genetics 196: 891–909.
Winter KU, Saedler H, Theißen G. 2002. On the origin of class B floral homeotic genes: Functional substitution and dominant inhibition in Arabidopsis by expression of an orthologue from the gymnosperm Gnetum. Plant Journal 31: 457–475.
Zhang P, Tan HTW, Pwee KH, Kumar PP. 2004. Conservation of class C function of floral organ development during 300 million years of evolution from gymnosperms to angiosperms. Plant Journal 37: 566–577.
Zhao Y, Granas D, Stormo GD. 2009. Inferring binding energies from selected binding sites. PLoS Computational Biology 5: e1000590.


TABLE

Table 1 – Kinetic constants for the interaction of Welwitschia mirabilis WelLFYΔ and WelNDLYΔ proteins with WelAP3/PI-1 or WelAP3/PI-2 upstream regions, by Surface Plasmon Resonance. The sensorgram curves corresponding to the association and dissociation between WelLFYΔ, WelNDLYΔ and the tested DNA molecules fitted best to the “heterogeneous ligand model” that assumes the existence of two types of sites, consistent with the presence of many low-affinity sites (KD(App) 1) among a few high-affinity sites (KD(App) 2). If no high-affinity sites are present, then the model proposes two types of low-affinity sites.

Protein     DNA           Length (bp)   KD(App) 1 (µM)   KD(App) 2 (µM)   χ²
WelLFYΔ     WelAP3/PI-2   3377          235              0.045            2.6
WelLFYΔ     WelAP3/PI-1   2878          427              0.732            4.5
WelLFYΔ     WelTubulin    2147          1030             22.3             18.2
WelNDLYΔ    WelAP3/PI-2   3377          884              0.175            13.8
WelNDLYΔ    WelAP3/PI-1   2878          2570             76.0             4.12
WelNDLYΔ    WelTubulin    2147          833              25.3             14.1

bp, base pair; KD(App), apparent dissociation constant; the binding reaction was modelled with a heterogeneous ligand model.
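As a reading aid for Table 1, the following minimal Python sketch illustrates how two apparent dissociation constants translate into site occupancy. It assumes two independent classes of sites that each follow a simple Langmuir isotherm at equilibrium; this is a simplification and not the kinetic fit used to obtain the values in the table. The function names, the 0.5 µM protein concentration and the `rmax` weights are hypothetical choices made for illustration only.

```python
def fractional_occupancy(conc_um, kd_um):
    """Fraction of sites of one class occupied at equilibrium (Langmuir isotherm)."""
    return conc_um / (kd_um + conc_um)


def two_class_response(conc_um, kd1_um, kd2_um, rmax1, rmax2):
    """Equilibrium signal from two independent classes of binding sites.

    rmax1 and rmax2 are hypothetical saturation signals for each class;
    they are illustration-only weights, not values taken from Table 1.
    """
    return (rmax1 * fractional_occupancy(conc_um, kd1_um)
            + rmax2 * fractional_occupancy(conc_um, kd2_um))


# Apparent constants listed in Table 1 for the truncated WelLFY protein on the
# WelAP3/PI-2 upstream region: 235 µM (low affinity) and 0.045 µM (high affinity).
KD1_UM, KD2_UM = 235.0, 0.045
PROTEIN_UM = 0.5  # assumed protein concentration, for illustration only

low = fractional_occupancy(PROTEIN_UM, KD1_UM)
high = fractional_occupancy(PROTEIN_UM, KD2_UM)
print(f"low-affinity class occupancy:  {low:.4f}")
print(f"high-affinity class occupancy: {high:.4f}")
```

Under these assumptions, the high-affinity class is close to saturation at sub-micromolar protein concentrations while the low-affinity class remains almost empty, which is the qualitative contrast the two constants are meant to capture.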

FIGURE LEGENDS

Figure 1 – Morphology of Welwitschia mirabilis and phylogenetic relationships of MADS-box genes from Welwitschia mirabilis. (A) Welwitschia mirabilis plant with female cones on stalks emerging from the scaly body at the base of the leaves, at Huntington Botanical Gardens, San Marino, CA. (B) Female cone. (C) Male cone. (D) Close-up of a dissected male axillary unit showing the bracts (b) and the pollen-producing synangia (sy) at the tips of the antherophores, which surround the sterile ovule (so) with its integument. One bract and part of the antherophore have been removed to show the central ovule. (E) PhyML tree showing approximate likelihood ratio test support values above 0.9. Clades containing the AP3/PI-like, Bsister, AG-like and AGL6-like genes are marked; other clades are collapsed to triangles. Welwitschia genes (in red) are strongly supported as sister to Gnetum genes already identified by Becker & Theißen (2003) and by Becker et al. (2003) as members of the different clades indicated, and they fall within those clades in this analysis. Source species are indicated next to the name of each gene (angiosperm species in blue, gymnosperm species in orange).

Figure 2 – Expression of WelAP3/PI-1, WelAP3/PI-2, WelAG, WelBsister, WelAGL6, WelLFY and WelNDLY in cones and leaves of Welwitschia mirabilis. Semi-quantitative RT-PCR profiles of WelLFY, WelNDLY and the newly identified Welwitschia MADS-box genes. The Welwitschia actin gene (WelActin) was used as a control. PCR cycles are given on the left. Names of the different tissues used are indicated above each lane.

Figure 3 – In situ hybridization of WelAP3/PI-1, WelAP3/PI-2, WelAG, WelLFY and WelNDLY in male Welwitschia mirabilis cones. (A) Schematic representation of a longitudinal section of a male cone showing the apex (ap), axillary units (au) and sterile bracts (br). The synangia primordia (light green) and young developing synangia (dark green) are visible in the second and third (from the top) axillary units. In later developmental stages, the pollen-producing tissues (yellow) differentiate from the rest of the synangia (blue). Inside this are the integuments of the sterile ovule (white) with the nucellus (orange) inside. (B-K) In situ hybridization of male cone sections using DIG-labelled RNA antisense WelLFY (B,C), WelNDLY (D,E), WelAG (F-H), WelAP3/PI-1 (I) and WelAP3/PI-2 (J,K) probes. The expression patterns are schematically summarized on the right. Scale bars = 200 µm, except for G, where scale bar = 100 µm. ap, apex; au, axillary unit; br, bract; sy, synangia; n, nucellus of the sterile ovule; in, integument of the ovule. Asterisks indicate the locations of sporangia.

Figure 4 – In vitro comparison of Welwitschia mirabilis WelLFY and WelNDLY DNA binding specificities. (A) Electrophoretic mobility shift assay (EMSA) with 10 nM fluorescent AP1bs1 DNA and increasing concentrations of WelLFYDBD or WelNDLYDBD. Protein concentrations, from left to right, are 0, 0.025, 0.05, 0.1, 0.2, 0.4, 0.8, 1.5, 3 and 5 µM. The complexes corresponding to a monomer (one filled circle) or a dimer (two filled circles) bound to DNA are indicated on the right. (B) EMSA with 10 nM fluorescent AP1bs1 DNA and various combinations of WelLFYDBD (0.4 µM), WelNDLYDBD (1.5 µM) and GFP-WelNDLYDBD (1.5 µM). WelLFYDBD or WelNDLYDBD homodimer (two filled circles), GFP-WelNDLYDBD homodimer (two open circles) and heterodimer involving a GFP-WelNDLYDBD and a non-GFP-labelled WelNDLYDBD protein or a GFP-WelNDLYDBD and a non-GFP-labelled WelLFYDBD protein (a filled and an open circle) are depicted. (C) EMSA with 10 nM fluorescent AGbs2 or 10 nM AP3bs1 and increasing concentrations of WelLFYDBD or WelNDLYDBD. Protein concentrations, from left to right, are 0, 0.1, 0.4 and 1.6 µM. (D) EMSA with 10 nM fluorescent AP1bs1, W10 or W12 DNA probe and 0 or 0.5 µM WelLFYΔ or 0.5 µM WelNDLYΔ. Δ denotes proteins starting at amino acid 40 on the basis of the AtLFY sequence.

Figure 5 – Identification of the binding sites of Welwitschia mirabilis protein WelLFYΔ upstream of the WelAP3/PI-1 and WelAP3/PI-2 coding sequences. (A) Logo of the WelLFYΔ Position Weight Matrix (PWM) compared to the logo of the AtLFYΔ PWM from Chahtane et al. (2013). (B) Scores of the binding sites computed with the WelLFYΔ PWM along the WelAP3/PI-1 or (C) WelAP3/PI-2 upstream sequences (bp, base pairs, counting from the start codon). (D) EMSA with 10 nM fluorescent DNA corresponding to the five sites indicated in (C) and 0.5 µM of WelLFYΔ in lanes 2-6. The sequences of the five sites tested are indicated on the right, with the 19 bp motif indicated in red.
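The scoring step shown in panels (B) and (C) can be illustrated with a minimal Python sketch of a position weight matrix scan over both DNA strands. It assumes a simple additive log-odds model with a uniform background, which may differ from the scoring scheme actually used for Fig. 5. The four-position matrix, the example sequence and the function names are hypothetical and merely stand in for the 19 bp WelLFY matrix, which is not reproduced here.

```python
import math

# Hypothetical 4-position base-frequency matrix with consensus AACG.
# It is a toy stand-in, NOT the matrix reported in Supplementary Table 4.
TOY_FREQS = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_log_odds(freqs, background=0.25):
    """Convert base frequencies to log2 odds against a uniform background."""
    return [{base: math.log2(p / background) for base, p in column.items()}
            for column in freqs]


def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def score_window(window, pwm):
    """Additive score of one window: sum of the per-position log-odds."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(seq, pwm):
    """Return (position, best-strand score) for every window along seq."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        best = max(score_window(window, pwm),
                   score_window(reverse_complement(window), pwm))
        hits.append((start, best))
    return hits


pwm = to_log_odds(TOY_FREQS)
hits = scan("TTAACGTT", pwm)  # hypothetical upstream fragment
for position, score in hits:
    print(position, round(score, 2))
```

In this toy example the consensus occurs once on the forward strand (window starting at position 2) and once on the reverse strand (window starting at position 4), so both windows receive the same top score. Plotting such scores against position gives a profile of the same kind as those in panels (B) and (C).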


SUPPORTING INFORMATION

Supplementary Figure 1: Alignment of the DNA binding domains (DBD) of AtLFY, WelLFY and WelNDLY.
Supplementary Figure 2: Alignment of the predicted amino acid sequences of the five newly identified Welwitschia mirabilis genes reveals the conserved domain structure of MIKC-type MADS-box proteins.
Supplementary Figure 3: WelLFY and WelNDLY binding to predicted WelLFY binding sites in WelAP3/PI promoters.
Supplementary Figure 4: Conservation of potential LFY homolog binding sites in the promoters of conifer B genes.

Supplementary Table 1: List of oligonucleotides used for this study.
Supplementary Table 2: B gene homologues identified in conifer species with sequenced genomes.
Supplementary Table 3: Size-exclusion chromatography assays with recombinant WelLFYDBD and WelNDLYDBD.
Supplementary Table 4: Position Weight Matrix obtained for WelLFYΔ and AtLFYΔ.
Supplementary Table 5: Examples of sequences from the SELEX-seq performed with WelNDLYΔ.

Supplementary Methods

Supplementary References