DNA recognition by Arabidopsis transcription factors ABI3 and NGA1

Giedrius Sasnauskas*, Elena Manakova, Kęstutis Lapėnas, Kotryna Kauneckaitė and Virginijus Siksnys*

Address: Institute of Biotechnology, Vilnius University, Saulėtekio avenue 7, LT-10257 Vilnius, Lithuania

* Corresponding authors: Dr. Giedrius Sasnauskas phone: +370-5-2234443 fax: +370-5-2234367 E-mail: [email protected] and Prof. Virginijus Siksnys phone: +370-5-2234365 fax: +370-5-2234367 E-mail: [email protected]

Running title: DNA recognition by ABI3 and NGA1 B3 domains

Abbreviations: ABI3-B3, B3 domain of ABI3; NGA1-B3, B3 domain of NGA1; VAL1-B3, B3 domain of VAL1; DBD, DNA-binding domain; TF, transcription factor; EMSA, electrophoretic mobility shift assay; MES, 2-(N-morpholino)ethanesulfonic acid.

Database entry: RCSB Protein Data Bank, accession number 5os9.

Keywords: NGA1, ABI3, B3 domain, X-ray crystallography, DNA recognition

Conflicts of interest. The authors declare that they have no conflicts of interest.

ABSTRACT

B3 transcription factors constitute a large plant‐specific protein superfamily, which plays a central role in plant life. Family members are characterized by the presence of B3 DNA binding domains (DBDs). To date, only a few B3 DBDs have been structurally characterized; the DNA recognition mechanism of other family members therefore remains to be elucidated. Here we analyze the DNA recognition mechanisms of two structurally uncharacterized B3 transcription factors, ABI3 and NGA1. Guided by the structure of the DNA-bound B3 domain of Arabidopsis transcriptional repressor VAL1, we have performed mutational analysis of the ABI3 B3 domain. We demonstrate that both VAL1-B3 and ABI3-B3 recognize the Sph/RY DNA sequence 5′-TGCATG-3′ via a conserved set of base-specific contacts. We have also solved a 1.8 Å apo-structure of NGA1-B3, the DBD of Arabidopsis transcription factor NGA1. We show that NGA1-B3, like the structurally related RAV1-B3 domain, is specific for the 5′-CACCTG-3′ DNA sequence, although it tolerates single substitutions in the 5′-terminal half of the recognition site. Employing distance-dependent fluorophore quenching, we show that NGA1-B3 binds the asymmetric recognition site in a defined orientation, with the ‘N-arm’ and ‘C-arm’ structural elements interacting with the 5′- and 3′-terminal nucleotides of the 5′-CACCTG-3′ sequence, respectively. Mutational analysis guided by the model of DNA-bound NGA1-B3 helped us identify NGA1-B3 residues involved in base-specific and DNA backbone contacts, providing new insights into the mechanism of DNA recognition by plant B3 domains of RAV and REM families.

INTRODUCTION

B3 transcription factors (B3 TFs) constitute a large plant‐specific protein superfamily (~10% of all TFs in the flowering plants), which plays a central role in plant life. Family members contain one or several B3 DNA binding domains (DBDs), each comprised of ~100 aa residues. The B3 superfamily is divided into four families: ARF, LAV, RAV and REM [1,2].

To date, the best mechanistically understood group is the ARF (AUXIN RESPONSE FACTOR) family, with more than 20 members in A. thaliana [3]. ARF B3 domains were shown to interact specifically with a range of 5′-NNGACA-3′ hexanucleotides [4], with 5′-GAGACA-3′ being the ‘canonical’ auxin response element (AuxRE) [5]. Crystal structures of ARF1, ARF5 and DNA-bound ARF1 have been solved, providing the mechanistic basis for DNA target specificity in auxin response [4].

The LAV or LEC2/ABI3-VAL ([LEAFY COTYLEDON 2]/[ABSCISIC ACID INSENSITIVE 3] – [VP1/ABI3-LIKE]) family is comprised of the LEC2/ABI3 and VAL subfamilies. The A. thaliana LEC2/ABI3 proteins LEC2, FUS3 and ABI3 (an ortholog of maize VP1), and the VAL proteins (VAL1, VAL2 and VAL3, also known as HSI2, HSL1 and HSL2) play an important role in various stages of plant development, from embryogenesis to seed dormancy and vernalization [1]. VP1, ABI3, VAL1, VAL2 and FUS3 specifically bind to the consensus ABI3-VP1 recognition sequence (Sph/RY element) 5′-TGCATG-3′ [6–10]. The structure of the VAL1 B3 domain bound to Sph/RY DNA was recently solved [11].

The RAV family (RELATED to ABI3/VP1) proteins participate in leaf maturation and flowering. Thirteen family members are present in A. thaliana, including RAV1-RAV2, TEM1-TEM2, and NGA1-NGA4 [2]. The RAV1 B3 domain specifically binds the DNA sequence 5′-CACCTG-3′ [12], and its apo-structure was solved by NMR [13].

The largest (more than 70 members in A. thaliana) and the least studied is the REM (REPRODUCTIVE MERISTEM) family [2]. The best known member is VRN1, which plays a role in vernalization. The apo-structure of the VRN1 B3 domain was solved by X-ray crystallography, and nonspecific DNA binding was demonstrated in vitro [14]. An NMR structure of the B3 domain of the At1g16640 protein is also available, but DNA binding by this protein has not been demonstrated [15].

Structurally similar DNA binding domains are also found in some bacterial restriction endonucleases, e.g. EcoRII [16], UbaLAI [17], BfiI [18] and NgoAVII [19]. Despite low sequence similarity and considerable differences in size, bacterial B3-like domains and plant B3 domains from ARF1 and VAL1 proteins employ a conserved wrench-like DNA binding surface to bind DNA from the major groove side, and make base-specific contacts via structural elements dubbed N-arm and C-arm [18].

Here we analyze the DNA binding determinants of the B3 domains of ABI3 and NGA1, two structurally uncharacterized Arabidopsis transcription factors. ABI3 functions in the regulation of abscisic acid-responsive genes during seed development [2,20]. It contains a LAV family B3 domain (ABI3-B3), which, like VAL1-B3, is specific for the Sph/RY element [8]. We confirm by mutational analysis that both VAL1-B3 and ABI3-B3 recognize the Sph/RY sequence via a conserved set of base-specific and DNA backbone contacts. NGA1 (NGATHA1), which promotes a general differentiation program in plants [21], contains a RAV family B3 domain (NGA1-B3). Here we present an apo-structure of NGA1-B3 and demonstrate its specificity for the canonical RAV family recognition sequence 5′-CACCTG-3′. Furthermore, we determine the orientation of NGA1-B3 on its asymmetric recognition site, and, guided by a homology model of DNA-bound NGA1-B3, test the roles of putative DNA recognition residues.

Comparison of base-specific and DNA backbone contacts made by ARF1-B3, VAL1-B3/ABI3-B3 and NGA1-B3 provides new insights into the DNA recognition mechanism of plant B3 domains, including members of the least studied REM family.

RESULTS

Structure of the NGA1-B3 domain

In the present study we focus on the DNA recognition mechanism of ABI3-B3, which recognizes the Sph/RY sequence and shares significant sequence similarity with VAL1-B3, and NGA1-B3, which is similar to RAV1-B3 DBD specific for the DNA sequence 5′-CACCTG-3′ (Fig. 1A).

We have solved the apo-structure of the NGA1 B3 DNA binding domain (NGA1-B3) at 1.8 Å resolution (data collection and refinement statistics are summarized in Table 1). The asymmetric unit contains two almost identical protein molecules (RMSD < 0.35 Å over 90 Cα atoms), with larger deviations observed only at the C-termini. NGA1-B3 is composed of 7 β strands forming a pseudo-barrel decorated by 3 α helices (Fig. 1A,B). The overall fold of the domain belongs to the SCOP double-split β-barrel fold, DNA binding pseudobarrel domain superfamily [22]. The structure of the NGA1-B3 domain is most similar to the B3 recognition domains of plant transcription factors VAL1 (PDB ID 6fas, DALI Z-score 16.8), ARF5 (1ldu, DALI Z-score 15.1), ARF1 (4ldx, 14.6), RAV1 (1wid, 14.2, Fig. 1A), At1g16640 (1yel, 10.7) and VRN1 (4i1k, 10.7), and to B3-like domains of bacterial restriction endonucleases, including BfiI (3zi5, 8.6) and EcoRII (3hqf, 7.8). Like ARF1-B3 and VAL1-B3, NGA1-B3 contains a wrench-like surface capable of DNA binding from the major groove side, with the N-arm and C-arm structural elements well positioned to provide base-specific contacts (Fig. 1C). A similar mechanism for DNA recognition was also proposed for RAV1-B3 [23].

DNA binding specificity of ABI3-B3 and NGA1-B3

Site-specific interactions of LAV family B3 DBDs in vitro were previously investigated using electrophoretic mobility shift assay (EMSA) [6,11,24,25], surface plasmon resonance and ELISA [8]. Here we have tested DNA binding of Arabidopsis ABI3-B3 by EMSA using the 12 bp oligoduplex 12/12LAV as cognate DNA and the 12/12RAV as non-cognate DNA. Under standard EMSA conditions (pH 8.3 Tris-acetate buffer) the wt ABI3-B3 protein containing an N-terminal hexahistidine tag forms a high-mobility complex with cognate DNA, and low-mobility complexes with both cognate and non-cognate DNAs (Fig. 2A). We presume that the high-mobility band is the specific ABI3-B3-DNA complex, with a single protein molecule bound to the specific sequence at the center of the 12/12LAV oligoduplex, and that the low-mobility band observed with both DNAs is a nonspecific complex. Surprisingly, we find that the wt ABI3-B3 variant lacking the hexahistidine tag (tag removal is described in Experimental Procedures) forms considerably less nonspecific complex (Fig. 2A); to reduce the effect of nonspecific ABI3-B3 binding on EMSA data interpretation, all further experiments were performed with ABI3-B3 proteins containing no tags.

To confirm the specific binding of ABI3-B3 to DNA containing an Sph/RY element, we have performed an EMSA competition experiment. ABI3-B3 was either incubated with radiolabeled cognate 12/12LAV DNA alone, or the samples were supplemented with increasing amounts of unlabeled competitor DNA, either the cognate duplex 12/12LAV, or the non-cognate duplex 12/12RAV (Fig. 2B). Quantitative analysis of the dependence of the amount of the labeled specific complex on the unlabeled competitor concentration (described in Experimental Procedures and ref. [11]) gave the dissociation constants KD(competitor) for the unlabeled DNA-protein interactions. The KD(competitor) value for the cognate unlabeled DNA (2.3 ± 0.4 µM) is significantly lower than the KD(competitor) for the non-cognate competitor (11 ± 2 µM, Fig. 2B), confirming the specificity of ABI3-B3 for DNA containing an Sph/RY sequence.

RAV family protein RAV1-B3 specifically binds the cognate sequence 5′-CACCTG-3′, as demonstrated by EMSA experiments [12], but no in vitro data are available for NGA1 and other RAV family DBDs. We find that the binding patterns of NGA1-B3 with the presumed cognate (12/12RAV) and non-cognate (12/12LAV) DNAs under standard EMSA conditions are different, but the specific and the nonspecific complexes are not clearly separated (Fig. 2C). Fortunately, much better discrimination between the presumed cognate and non-cognate DNAs is observed in a standard pH 8.3 buffer supplemented with 5 mM calcium acetate (Fig. 2C). The mechanism by which Ca2+ ions interfere with formation of the mobile nonspecific NGA1-B3 complex remains unknown; we find that it is not a simple effect of increased ionic strength, as an equivalent increase in ionic strength by addition of sodium acetate had no effect on NGA1-B3 binding (data not shown). As with ABI3-B3, we have found that removal of the hexahistidine tag reduced the low-mobility nonspecific complex; therefore, all further experiments were performed using NGA1-B3 proteins containing no tags. As with ABI3-B3, the specificity of NGA1-B3 for the canonical RAV1 recognition sequence was confirmed by an EMSA competition experiment (Fig. 2D). As might be expected, we find that the KD(competitor) value for the cognate unlabeled 12/12RAV competitor (2.3 ± 0.4 µM) is approx. 20-fold lower than the KD(competitor) determined for the non-cognate 12/12LAV DNA (45 ± 15 µM, Fig. 2D).
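The fold differences quoted for the two competition experiments can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the central KD(competitor) values reported in the text, ignores their stated uncertainties, and uses a hypothetical helper name (`fold_selectivity`) that does not come from the original analysis.

```python
# Central KD(competitor) values in µM as reported in the text (uncertainties ignored).
KD_UM = {
    "ABI3-B3": {"cognate": 2.3, "non_cognate": 11.0},
    "NGA1-B3": {"cognate": 2.3, "non_cognate": 45.0},
}


def fold_selectivity(kd_cognate, kd_non_cognate):
    """Ratio of non-cognate to cognate KD; values above 1 indicate preference for cognate DNA."""
    return kd_non_cognate / kd_cognate


for domain, kd in KD_UM.items():
    ratio = fold_selectivity(kd["cognate"], kd["non_cognate"])
    print(f"{domain}: {ratio:.1f}-fold preference for cognate DNA")
```

With these inputs the ratio for NGA1-B3 comes out close to the approx. 20-fold difference stated above, while the ratio for ABI3-B3 is roughly 5-fold.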

We note that the stability of specific complexes observed with both ABI3-B3 and NGA1-B3 is low, as complexes are observed only with micromolar DNA and protein concentrations, and the determined values for dissociation constants with cognate DNAs are in the micromolar range. Though unsuitable for precise quantification of DNA binding affinities, EMSA experiments allowed us to evaluate the ability of ABI3-B3 and NGA1-B3 to bind the canonical and altered DNA recognition sequences, and to test the effect of protein point mutations on sequence-specific binding.

Stringency of sequence recognition by ABI3-B3 and NGA1-B3

To probe the stringency of DNA recognition by the ABI3-B3 and NGA1-B3 domains, we have performed EMSA experiments with oligoduplexes containing substitutions within or adjacent to the canonical Sph/RY and RAV recognition sequences, respectively (20 variants of 12/12LAV DNA were tested with ABI3-B3, and 21 variants of 12/12RAV DNA with NGA1-B3, Table S1).

We find that almost all single base pair deviations from the canonical Sph/RY sequence 5′-TGCATG-3′ abolish formation of the specific complex by ABI3-B3. The only exception is the oligoduplex carrying a T:A to C:G replacement at the first position of the Sph/RY element (sequence 5′-CGCATG-3′), which forms a low amount of the specific (high-mobility) complex (Fig. 3A).

As with ABI3-B3, most substitutions within the RAV1 recognition sequence 5′-CACCTG-3′ abolished formation of the specific complex by NGA1-B3. However, under our experimental conditions NGA1-B3 formed a significant amount of the specific complex with DNA oligoduplexes 5′-TACCTG-3′, 5′-AACCTG-3′ and 5′-CCCCTG-3′, and a lower amount with some other DNA variants, implying that NGA1-B3 tolerates certain substitutions in the 5′-terminal half of the recognition site (Fig. 3B). Base substitutions at positions adjacent to the recognition sites had no effect on specific complex formation by ABI3-B3 and NGA1-B3 (Fig. 3A,B), indicating that ABI3-B3 and NGA1-B3 are specific (with significant tolerance for single bp substitutions in the case of NGA1-B3) for the Sph/RY (5′-TGCATG-3′) and RAV1 (5′-CACCTG-3′) sequences, respectively.

Orientation of NGA1-B3 on cognate DNA

All plant B3 and bacterial B3-like DBDs characterized to date recognize asymmetric DNA sequences and bind these sites in one defined orientation [4,16–19]. The orientation of ABI3-B3 on the Sph/RY sequence 5′-TGCATG-3′ must be identical to the orientation of the closely related VAL1-B3 domain (N-arm and C-arm residues interact with the 5′- and 3′-termini of the 5′-TGCATG-3′ sequence, respectively) [11]. In contrast, the orientation of NGA1-B3 and the related domain RAV1-B3 on the 5′-CACCTG-3′ recognition sequence (the central CC dinucleotide is the non-palindromic part of the sequence) remains unknown due to the lack of relevant co-crystal structures. To determine the NGA1-B3 binding orientation, we have employed an experimental setup based on distance-dependent fluorophore quenching (Fig. 4A). First, we removed the sole cysteine from wt NGA1-B3 (mutation C123S) and, based on a simple model of DNA-bound NGA1-B3 (an overlay of the apo-NGA1-B3 structure on the structure of DNA-bound ARF1-B3 with an extended DNA fragment), introduced single cysteines on the opposite faces of the protein, thereby generating NGA1-B3 variants C123S-K108C and C123S-R59C. The engineered cysteines were labeled with a maleimide derivative of Alexa Fluor 488, and the fluorescence of the resultant protein C123S-K108C (the C123S-R59C derivative precipitated upon labeling and could not be purified) was measured in the presence of unmodified cognate 18/18SP DNA, or modified DNA bearing BHQ-2 quenchers at the 5′- or 3′-termini of either the top or the bottom DNA strand. The length of the DNA oligoduplexes (18 bp) was selected such that it prevents direct interaction between the DNA-attached quencher and the protein-attached fluorophore, and also ensures that the distances between the quencher and the fluorescent label in the two binding orientations are sufficiently different to provide detectable changes in the quenching efficiency (Fig. 4A).

Fluorescence of the labeled NGA1-B3 was efficiently quenched by 3′-BHQ-2_DNA1 and 5′-BHQ-2_DNA2, and less efficiently by 3′-BHQ-2_DNA2 and 5′-BHQ-2_DNA1 (Fig. 4B, bar graphs). This supports the model in which NGA1-B3 binds DNA in the orientation where the N-arm contacts the 5′-terminus of the 5′-CACCTG-3′ sequence, and the C-arm contacts the 3′-terminus (Fig. 4B). We have also validated this experimental approach by performing a similar set of measurements with Alexa Fluor 488-labeled ABI3-B3 variants C663S-T596C and C663S-T646C and DNA oligoduplexes 5′-BHQ-2 and 3′-BHQ-2 (Fig. 4C). As expected, fluorescence measurements confirmed that ABI3-B3 binds the Sph/RY DNA sequence in the same orientation as VAL1-B3.
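The logic behind the NGA1-B3 quenching pattern can be made explicit with a small sketch. In an antiparallel duplex, the 3′-terminus of one strand and the 5′-terminus of the other sit at the same physical end, so the two efficiently quenching oligoduplexes should carry the quencher at one end and the two weakly quenching ones at the other. The code below assumes, purely for illustration, that DNA1 denotes the top strand (containing 5′-CACCTG-3′) and DNA2 the bottom strand; the function name `duplex_end` and the labels "left"/"right" are hypothetical.

```python
def duplex_end(strand, terminus):
    """Return the physical duplex end ('left' or 'right') carrying a given strand terminus.

    The top strand is taken to run 5'->3' from left to right, so the
    antiparallel bottom strand runs 5'->3' from right to left.
    """
    if strand not in ("top", "bottom") or terminus not in ("5'", "3'"):
        raise ValueError("strand must be 'top' or 'bottom'; terminus must be \"5'\" or \"3'\"")
    if strand == "top":
        return "left" if terminus == "5'" else "right"
    return "right" if terminus == "5'" else "left"


# Assumption: DNA1 = top strand, DNA2 = bottom strand.
efficient_quenchers = [("top", "3'"), ("bottom", "5'")]    # 3'-BHQ-2_DNA1, 5'-BHQ-2_DNA2
inefficient_quenchers = [("bottom", "3'"), ("top", "5'")]  # 3'-BHQ-2_DNA2, 5'-BHQ-2_DNA1

print({duplex_end(*q) for q in efficient_quenchers})
print({duplex_end(*q) for q in inefficient_quenchers})
```

Under this strand assignment both efficient quenchers map to a single duplex end and both inefficient ones to the opposite end, which is the internal consistency expected if the protein-attached fluorophore lies closer to one end of the bound DNA.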

Model of DNA-bound ABI3-B3

ABI3-B3 shares significant sequence similarity (36% identical, 54% similar residues) with the structurally characterized VAL1-B3, including almost full conservation of residues responsible for base-specific and DNA backbone contacts (Fig. 1A). This allowed us to build a reliable homology model of DNA-bound ABI3-B3. The resultant model was primarily used for visualization of ABI3-B3 contacts (Fig. 5A and Fig. 6A), as most amino acid residues for mutational analysis could be selected based on the VAL1-B3/ABI3-B3 sequence alignment (Fig. 5A, top). The most prominent differences between VAL1-B3 and ABI3-B3 DNA binding surfaces exposed by the model are VAL1 R306 (in ABI3-B3 replaced by N583, no direct backbone contact possible), VAL1 Q345 (R623 in ABI3-B3, may form a backbone contact) and VAL1 S327 (R604 in ABI3-B3, enables direct backbone contacts).

We find that alanine replacement of conserved ABI3 residues implicated in direct base-specific contacts, V588, W627, N629 and M634 (Fig. 5A, equivalent to VAL1 residues V311, W349, N351 and M356), abolished formation of the cognate complex (Fig. 5B). Similarly to VAL1-B3 [11], alanine replacement of L584 [equivalent to VAL1 I307, a van der Waals (vdW) contact to T1:A6 thymine] and N630 (equivalent to VAL1 N352, a water-mediated contact to C1:G6 guanine) did not eliminate the cognate complex, although the N630A mutation reduced its amount (Fig. 5B). The only discrepancy between ABI3-B3 and VAL1-B3 was observed for the R586 position (equivalent to VAL1 R309, H-bonds to G2:C5 guanine): in contrast to the VAL1-B3 R309A variant [11], the ABI3-B3 R586A mutant retained specific DNA binding (Fig. 5B).

Alanine replacements were also made at several ABI3-B3 positions implicated in direct contacts to DNA phosphates (Fig. 6A). Most of these mutations also reduced the amount of the specific complex (Fig. 6B).

Model of DNA-bound NGA1-B3

Despite the fact that ARF1-B3 and VAL1-B3 use equivalent DNA binding surfaces, the DNA orientation relative to the proteins is slightly different (an overlay in Fig. 1B), resulting in a significant shift of DNA atom positions (e.g., the DNA phosphates at the 5′-terminus of the hexanucleotide sequences in Fig. 1B are shifted by 5 Å). Such variation in DNA position complicates modeling of DNA-bound NGA1-B3, as the precise location, orientation and conformation of the 5′-CACCTG-3′ DNA sequence when bound to NGA1-B3 may differ from both ARF1-B3 and VAL1-B3. Nevertheless, we have built an approximate model of DNA-bound NGA1-B3 and attempted to validate the roles of putative NGA1 DNA-contacting residues by site-specific mutagenesis.

Based on modeling (performed as described in Experimental procedures), we have identified 3 groups of NGA1-B3 residues that may form DNA contacts:

(1) N-arm residues L47, V51 and R49, and C-arm residue W91, which are conserved in VAL1/ABI3-B3 and all (except for V51) are within H-bonding or vdW distances from DNA bases (Fig. 5A);

(2) N-arm residue N48 and C-arm residues (N92, S93, S94, Q95 and S96), which, depending on the N-arm and C-arm conformation and exact DNA position, may form water-mediated contacts or direct contacts to DNA bases (Fig. 5A);

(3) positively charged and uncharged residues K39, T40, S42, K46, K54, R87, S93 (S93 also belongs to the ‘2nd’ group), K101 and S104 that may form direct contacts to DNA phosphates (Fig. 6A).

To probe the role of these residues in DNA binding, we have replaced all the ‘1st’ and the ‘2nd’ group residues, and a subset of the ‘3rd’ group residues by alanines, and tested the ability of mutant NGA1-B3 domains to form specific complexes (Fig. 5C and Fig. 6C).

We find that NGA1-B3 mutations R49A and V51A (the ‘1st’ group) abolish formation of the specific complex; in contrast, the ‘1st’ group mutants L47A and W91A retain the specific DNA binding ability (Fig. 5C). The ‘2nd’ group mutations S94A and S96A also abolish formation of the specific complex, but N92A, S93A and Q95A have no effect (Fig. 5C and 6C). Interestingly, mutations L47A (‘1st’ group) and N48A (‘2nd’ group) apparently relax sequence recognition of NGA1-B3, as these mutants, unlike the wt variant, form a specific-like (high-mobility) complex with the ‘nonspecific’ 12/12LAV DNA (Fig. 5C). As with ABI3-B3, alanine replacements of charged residues and the conserved serine implicated in direct DNA backbone contacts (K37, S42, K46, R87, the ‘3rd’ group) abolish specific binding (Fig. 6C).

DISCUSSION

DNA binding by ABI3-B3 and NGA1-B3

In the present study we have explored the DNA binding specificity and DNA recognition determinants of two site-specific plant B3 domains, ABI3-B3 and NGA1-B3. Employing EMSA, we show that ABI3-B3, like other characterized LAV family B3 domains, is specific for the canonical Sph/RY DNA sequence 5′-TGCATG-3′, but shows some tolerance for changes at the 5′-terminal T:A base pair (Fig. 3A). We also find that NGA1-B3 specifically binds the same DNA sequence (5′-CACCTG-3′) as RAV1-B3 (Fig. 3B). This could be expected given the high sequence similarity between NGA1-B3 and RAV1-B3, including complete conservation of N- and C-arm sequences (Fig. 1A). In addition, we unequivocally determine the orientation of NGA1-B3 on its asymmetric recognition sequence (Fig. 4). Our simple experimental approach (Fig. 4A), which relies on bulk measurements of distance-dependent fluorophore quenching, could be applied to other site-specific proteins with unknown binding orientation. Notably, a similar but considerably more complex setup based on single-molecule quantification of FRET efficiency values was recently employed to determine the preferred binding orientation of restriction endonuclease BcnI [26]. We also find that NGA1-B3 has significant tolerance for certain single base pair substitutions in the 5′-terminal half of the recognition site (Fig. 3B); based on our data, its recognition sequence can be defined as 5′-(C≳A,T)(A≳C>G)(C>T>G)(C>A)TG-3′. This considerably broadens the range of potential DNA sequences targeted by RAV family transcription factors.

Base readout by site-specific B3 domains

The co-crystal structures of VAL1-B3 [11] and ARF1-B3 [4], the model of DNA-bound NGA1-B3 with a defined orientation of cognate DNA, and NMR chemical shift data for RAV1-B3 residues observed upon DNA binding [13] enable detailed comparison of base readout employed by B3 domains belonging to LAV, ARF, and RAV families (Fig. 5A). Below we discuss the base readout of VAL1-B3/ABI3-B3, NGA1-B3 and ARF1-B3; a complete list of interactions, including DNA recognition residues and available experimental evidence for RAV1-B3 and ARF5-B3, is given in Table 2.

#1-2 positions

LAV family B3 domains of VAL1/ABI3 recognize the 5′-terminal base pair via the N-arm residue I307/L584, which makes a vdW contact to the methyl group of the T1:A6 base pair (Fig. 4A). The equivalent position in NGA1-B3 is occupied by L47, which in analogy to VAL1/ABI3 can make a vdW contact to the C1:G6 base pair (Fig. 5A). Distinct specificities for the 1st base pair by VAL1/ABI3 and NGA1 domains could be due to slightly different leucine/isoleucine positions relative to the DNA. A single base-specific vdW contact is also consistent with relaxed recognition of the 1st base pair by ABI3-B3 and NGA1-B3 (Fig. 3). Interestingly, the NGA1-B3 L47A mutant forms a specific-like high-mobility complex with the ‘standard’ non-cognate oligoduplex 12/12LAV. This DNA contains an Sph/RY sequence 5′-TGCATG-3′, which differs from the cognate RAV1 sequence 5′-CACCTG-3′ at 3 positions (#1, #2 and #4). Presumably, the L47A mutation further increases the tolerance of NGA1-B3 for substitutions within the specific site, enabling formation of a specific-like complex with the Sph/RY sequence. This view is supported by the fact that no binding is observed with an ‘alternative’ non-cognate DNA 12/12NSP, which contains an inverted RAV1 sequence 5′-GTCCAC-3′ (Fig. 7A).
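The two sequence relationships used in this argument are easy to verify directly: the Sph/RY and RAV1 hexanucleotides differ at three positions, and the 12/12NSP site is the RAV1 sequence read backwards (inverted, not reverse-complemented). The snippet below is a minimal check; the helper name `mismatch_positions` is introduced here for illustration only.

```python
SPH_RY = "TGCATG"  # Sph/RY element in 12/12LAV
RAV1 = "CACCTG"    # canonical RAV1 recognition sequence


def mismatch_positions(seq_a, seq_b):
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


print(mismatch_positions(SPH_RY, RAV1))  # positions that differ between the two sites
print(RAV1[::-1])                        # inverted RAV1 sequence
```

The mismatches fall at positions 1, 2 and 4, and the inverted RAV1 sequence reads GTCCAC, matching the site described for 12/12NSP.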

N-arm I307/L584 in VAL1/ABI3 is succeeded by G308/G585 and R309/R586, with the arginine making two H-bonds to guanine of the G2:C5 base pair (Fig. 5A). Interestingly, the R586A mutation in ABI3-B3 reduced the amount of the specific complex, but did not abolish its formation (Fig. 5B); moreover, this mutant retained strict specificity for the G2:C5 base pair (Fig. 7B), suggesting that recognition of G2:C5 by ABI3-B3 (and possibly other LAV family domains) may involve additional interactions not resolved in our model.

L47 in NGA1-B3 is succeeded by two non-glycine residues, N48 and R49, in theory both capable of forming base-specific H-bonds. As expected, R49 is critical for specific binding by NGA1-B3 (Fig. 5C). We also propose that additional specificity of NGA1-B3 for the 2nd (and possibly 3rd) base pairs could be provided by the N48 residue. In the apo-structure of NGA1-B3 (PDB ID 5os9, this study) and most NMR models of RAV1 (PDB ID 1wid) [13] this asparagine points away from the DNA base-interacting surface. Nevertheless, we find that a minor change in N-arm conformation may place the N48 residue within H-bonding distance from DNA bases (one such conformation is captured in the model depicted in Fig. 5A). Like the NGA1-B3 L47A mutant, NGA1-B3 N48A also forms a specific-like complex with the ‘standard’ non-cognate DNA 12/12LAV (Fig. 5C), but not with the ‘alternative’ non-cognate DNA 12/12NSP (Fig. 7A). The decreased specificity of the NGA1-B3 N48A mutant, and NMR analysis of an equivalent asparagine in RAV1 (N201, Table 2) are consistent with direct involvement of the N-arm asparagine in base readout.

Relaxed recognition of two 5′-terminal positions of the auxin response element 5′-GAGACA-3′ was recently demonstrated for ARF1 and ARF5 [4]. This agrees well with the experimental structure of ARF1-B3, in which the 1st and the 2nd base pairs are approached by ARF1 N-arm residues H136/G137/G138 (equivalent to NGA1-B3 L47/N48/R49, Fig. 5A). While glycines 137/138 do not form any contacts to the A2:T5 base pair, H136 and the nitrogen of G137 make H-bonds to the G1:C6 base pair. Presumably, alternative conformations and/or protonation states of these residues may accommodate alternative base pairs. This view is also supported by mutagenesis of equivalent ARF5 residues (Table 2), which did not alter DNA recognition preferences of ARF5 [4].

#3 position

ARF1 recognizes the 3rd G3:C4 base pair via R181 (Fig. 5A); as expected, an alanine replacement of an equivalent residue R215 in ARF5 impaired recognition of the AuxRE sequence [4].

Recognition of the 3rd C3:G4 base pair is conserved among VAL1-B3/ABI3-B3 and NGA1, and involves a C-arm tryptophan (W349/627/91 in VAL1/ABI3/NGA1) forming a vdW interaction with the N4-C4-C5-C6 edge of cytosine C3 (Fig. 5A). Alanine replacement of this residue in ABI3-B3 abolished specific binding (Fig. 5B). In contrast, the NGA1-B3 W91A mutant retained specific binding (Fig. 5C), but it also acquired increased tolerance for substitutions at the #3 target site position (C≈T>G, Fig. 7A). This finding, together with a significant chemical shift upon cognate DNA binding observed for the equivalent RAV1 residue W245 (Table 2), supports direct involvement of the C-arm tryptophan in base readout [13,23].

#4-6 positions

VAL1-B3/ABI3-B3 recognizes the -ATG-3′ part of the Sph/RY sequence via C-arm residues M356/M634 and N351/N629, and vdW interactions of N-arm residues S302/S579 and V311/V588 (Fig. 5A). As expected, alanine replacements of ABI3-B3 residues V588, N629 and M634 impair specific binding (Fig. 5B). A similar effect of the S579A mutation (Fig. 6B), which preserves the vdW interaction, is likely due to the loss of a key DNA backbone contact (Fig. 6A, see discussion below).

Equivalents of ABI3-B3 S579/V588 in ARF1-B3 and NGA1-B3 are S131/S140 and S42/V51, respectively. According to the co-crystal structure of ARF1, both S131 and S140 may form vdW contacts to a thymine of the T3:A4 base pair (Fig. 5A). Of the two NGA1-B3 residues, according to our approximate model, only S42 makes a vdW contact to guanine in the C4:G3 base pair, while V51 is located approx. 5 Å away. Nevertheless, the V51A NGA1-B3 mutant is incapable of specific DNA binding (Fig. 5C), implying that V51 plays a direct role in DNA recognition. As with ABI3-B3, the impaired specific binding by the NGA1-B3 S42A mutant (Fig. 6C) is likely due to the loss of a key DNA backbone contact (Fig. 6A, see discussion below).

The -ACA-3′ part of the AuxRE sequence is approached by ARF1 C-arm residues Q183, R185, R186 and P184, which make vdW contacts and H-bonds to the bases (Fig. 5A). The key role of an equivalent proline in ARF5 was confirmed by mutagenesis (Table 2) [4].

The C-arm of NGA1 is shorter by two residues than in VAL1/ABI3, and lacks a methionine, which is replaced by a serine S96 (Fig. 5A). Thus, despite the fact that the last two base pairs of the NGA1 recognition sequence 5′-CACCTG-3′ overlap with the 3′-terminus of the VAL1/ABI3 recognition sequence 5′-TGCATG-3′, NGA1-B3 must employ a different base readout mechanism. Unfortunately, due to the lack of homology in the C-arm regions of VAL1/ABI3, ARF1 and NGA1, we could not model the atomic details of the -CTG-3′ DNA sequence recognition by NGA1-B3. Nevertheless, this role must be fulfilled by the C-arm residues 92-96. Alanine replacements of only two residues (S94A and S96A) abrogated specific binding, implying a direct role for these residues in base readout (Fig. 5A,C). Interestingly, RAV1 residues N246 and S250, equivalent to NGA1 N92 and S96, were implicated in base readout/DNA binding based on modeling [23] and NMR measurements [13] (Table 2).

Backbone contacts of site-specific B3 domains

B3 domains bind DNA in a positively charged cleft and make contacts to DNA backbone that encompass phosphates within and outside of the cognate hexanucleotide sequences (Fig. 6A). We identify three highly conserved positions of charged and non-charged phosphate-contacting residues:

(i) the α1 strand lysine (K297/574/126/190/37 in VAL1/ABI3/ARF1/RAV1/NGA1, Fig. 1A), which contacts the 3′-NNNNNpN-5′ phosphate in the bottom DNA strand; also conserved in B3-like domains of bacterial endonucleases EcoRII (K23) [16], UbaLAI (K27) [17], and Arabidopsis protein At1g16640 (K11); not present in VRN1-B3 [14,15];

(ii) the N-arm serine (S302/579/131/195/42 in VAL1/ABI3/ARF1/RAV1/NGA1, Fig. 1A), which contacts the 3′-NNNNpNN-5′ phosphate; also conserved in VRN1-B3 (S251), but not in At1g16640 (replaced by a glutamate E16) [18];

(iii) the C-arm (β4 strand) arginine (R623/177/241/87 in ABI3/ARF1/RAV1/NGA1), which contacts the 5′-NpNNNNN-3′ phosphate in the top strand; also conserved in VRN1-B3 (R289), At1g16640 (R53), and bacterial B3-like domains of EcoRII (R81) and BfiI (R272) [18]; this position in VAL1-B3 contains a glutamine (Q345), but an equivalent contact is made by an adjacent R347 [11].

Mutations of these residues in ABI3-B3 and NGA1-B3 abolished or impaired formation of the specific complex (Fig. 6B,C). This corroborates the view proposed by Golovenko et al. [18], which states that contacts to DNA phosphates on the opposite edges of the recognition site (‘clamp phosphates’) orient the recognition sequence in the binding cleft and facilitate formation of the network of site-specific base contacts.

DNA recognition by other B3 domains

Inspection of multiple sequence alignments of Arabidopsis B3 domains belonging to RAV (absolute conservation of N-arm residues, and nearly complete conservation of C-arm residues in 13 Arabidopsis RAV family B3 domains), LAV [11], and ARF families [4] reveals that N- and C-arm residues implicated in base readout are highly conserved among the respective family members, implying that all group members recognize their sites via a conserved set of base and DNA backbone contacts (an apparent exception is LAV family member VAL3 [11]). The least characterized and the most heterogeneous family of B3 domains is REM, with over 70 members in A. thaliana [2,27]. To date, only two family members are structurally characterized (apo-structures of At1g16640 and VRN1), with nonspecific DNA binding activity demonstrated for VRN1-B3 [14,15]. Comparison of At1g16640 and VRN1 sequences to members of the LAV/ARF/RAV families (Fig. 8) reveals that both REM domains have truncated C-arms (shorter by 2 aa than in ARF1/NGA1/RAV1, and shorter by 4 aa than in VAL1/ABI3), while At1g16640 also has a truncated N-arm (2 aa shorter than in LAV/ARF/RAV domains). Reduced length of N- and C-arms may account for the reported lack of sequence specificity of VRN1-B3 [14]; nevertheless, the length of VRN1-B3 N- and C-arms is sufficient to make contacts to 3 or 4 base pairs, corresponding to #2-4 or #2-5 base pairs of LAV/ARF/RAV hexanucleotide recognition sites. The contacted region in the case of At1g16640 would be even shorter owing to its truncated N-arm, but the lack of DNA binding in this case is more likely due to replacement of the conserved N-arm serine involved in DNA phosphate contacts (see above) by a glutamate E16 (Fig. 8). Interestingly, some REM domains contain N- and C-arms that are longer than in At1g16640 and VRN1-B3 [2,27]. It remains to be seen if some of these domains may recognize specific DNA sequences.

EXPERIMENTAL PROCEDURES

DNA oligonucleotides

Oligoduplex substrates used in this study are listed in Supporting information Table S1. All oligonucleotides were purchased from Metabion (Germany).

Radioactive labeling was performed with [γ-33P]ATP (PerkinElmer) and T4 polynucleotide kinase (Thermo Fisher Scientific). Oligoduplexes were assembled by annealing the corresponding radiolabeled and unlabeled strands.

Construction of expression vectors and site-directed mutagenesis

The genes encoding the B3 domains of A. thaliana proteins NGA1 (UniProtKB ID O82799 residues 29-143) and ABI3 (UniProtKB ID Q01593 residues 557-677 with an N-terminal (His)6 tag sequence MGH6G) with E. coli-optimized codons were synthesized by Invitrogen. The gene of NGA1-B3 was inserted into the pLATE31 plasmid (Thermo Fisher Scientific); the resultant construct contained a C-terminal (His)6 tag. The gene of ABI3-B3 with an N-terminal (His)6 sequence was cloned into the pLATE11 plasmid (Thermo Fisher Scientific). Mutations were introduced using the ‘Phusion mutagenesis kit’ (Thermo Fisher Scientific). Sequences of wt and mutant genes were confirmed by Sanger sequencing.

Protein expression and purification

All proteins were expressed in E. coli strain ER2566. Cells were grown in LB broth in the presence of ampicillin at 37°C. When A600 of the cell culture reached 0.5, the incubation temperature was lowered to 16°C, IPTG was added to 0.2 mM, and the cells were incubated for ~16 hours at 16°C and harvested by centrifugation. Cells were sonicated in a buffer containing 20 mM Tris-HCl, pH 8.0, 500 mM NaCl, 2 mM phenylmethylsulfonyl fluoride and 7 mM 2-mercaptoethanol. Lysates were cleared by centrifugation, and proteins were purified to >90% homogeneity by chromatography through HisTrap HP chelating and HiTrap Heparin HP columns (GE Healthcare). Purified proteins were stored at -20°C in a buffer containing 20 mM Tris-HCl (pH 8.0), 200 mM KCl, 1 mM DTT, and 50% v/v glycerol. Protein concentrations were determined from A280 measurements using the theoretical extinction coefficients calculated with the ProtParam tool available at http://web.expasy.org/protparam/. Protein concentrations are expressed in terms of monomer.
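The A280-based concentration estimate can be illustrated with a minimal Python sketch. It assumes the additive extinction coefficient formula used by ProtParam (5500 per Trp, 1490 per Tyr, 125 per cystine, in M^-1 cm^-1) and the Beer-Lambert law; the sequence, absorbance and path length below are hypothetical placeholders, not values from this study.

```python
# Sketch only: molar protein concentration from A280 (Beer-Lambert law).
# The demo sequence and absorbance are hypothetical, not data from the paper.

def extinction_coefficient_280(sequence: str, cystines: int = 0) -> float:
    """Theoretical molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Additive formula: 5500 per Trp, 1490 per Tyr, 125 per cystine.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Concentration in mol/L from absorbance, c = A / (epsilon * l)."""
    if epsilon <= 0:
        raise ValueError("extinction coefficient must be positive (no Trp/Tyr?)")
    return a280 / (epsilon * path_cm)


demo_sequence = "MKWYYAGWLK"  # hypothetical peptide with 2 Trp and 2 Tyr
epsilon = extinction_coefficient_280(demo_sequence)
concentration = molar_concentration(a280=0.5, epsilon=epsilon)
print(f"epsilon = {epsilon:.0f} M^-1 cm^-1, c = {concentration * 1e6:.1f} uM")
```

Because the coefficient is computed per polypeptide chain, the resulting value is a monomer concentration, in line with the convention stated above.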

Removal of hexahistidine tags

Affinity tags of wt ABI3-B3 and wt NGA1-B3 were removed by limited trypsinolysis. Incubation of either protein (1.0 mg/mL concentration) with 0.01 mg/mL trypsin (Sigma) in a pH 8.0 buffer containing 50 mM KCl and 5 mM Tris-HCl for 5-60 min at 25°C resulted in formation of stable truncated protein forms as judged by SDS-PAGE. Trypsinolysis products were purified by chromatography through HisTrap HP chelating (both proteins eluted as flow-through) and HiTrap Heparin HP columns. Mass spectrometric analysis of purified samples confirmed removal of the affinity tags: the experimentally determined MW of purified NGA1-B3 protein, 13436.30 Da, corresponds to residues 29-142 of full-length NGA1 (calculated MW 13436.22 Da); the MW of purified ABI3-B3 protein, 13669.11 Da, corresponds to residues 560-677 of full-length ABI3 (calculated MW 13668.81 Da). Subsequently, small samples of tag-free ABI3-B3 and NGA1-B3 wt/mutant proteins that were used for EMSA experiments were generated by incubation of 0.1-0.2 mL volumes of purified proteins (0.5-1.0 mg/mL) with a 100-fold lower concentration of trypsin for 20 min at 25°C. The reactions were stopped by addition of phenylmethylsulfonyl fluoride to 2 mM final concentration, and successful trypsinolysis was confirmed by SDS-PAGE. The resultant protein samples were used for EMSA without further purification.
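The agreement between the measured and calculated masses quoted above can be expressed as an absolute and a relative (ppm) deviation. The short Python sketch below uses only the four mass values given in the text; the function name is illustrative.

```python
# Sketch: deviation between measured and calculated molecular masses.
# Mass values are those quoted in the text; the helper name is illustrative.

def mass_error_ppm(measured_da: float, calculated_da: float) -> float:
    """Relative mass deviation in parts per million."""
    return (measured_da - calculated_da) / calculated_da * 1e6


samples = {
    "NGA1-B3 (residues 29-142)": (13436.30, 13436.22),
    "ABI3-B3 (residues 560-677)": (13669.11, 13668.81),
}

for name, (measured, calculated) in samples.items():
    delta = measured - calculated
    ppm = mass_error_ppm(measured, calculated)
    print(f"{name}: delta = {delta:+.2f} Da ({ppm:+.1f} ppm)")
```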

Protein crystallization

Protein crystallization was performed by the sitting drop vapor diffusion method at 4 °C. The NGA1-B3 protein in 20 mM Tris-HCl (pH 8.0 at 25 °C), 400 mM NaCl and 0.02% NaN3 was concentrated to 7.1 mg/ml (∼0.5 mM); 0.4 μL of protein solution was mixed with 0.2 μL of the crystallization solution (0.1 M Hepes-NaOH pH 7.5, 0.2 M K/Na tartrate; 1.6 M K/Na phosphate; 5% PEG 4000) using an ‘Oryx8’ robotic unit (Douglas Instruments). Crystals appeared after 2 weeks and reached the maximum size in 6 months.

Data collection and structure determination

NGA1-B3 crystals were briefly transferred into an experimentally selected cryo-buffer (0.1 M Tris-HCl, 2.1 M Na-malonate pH 7.0) and flash-frozen. Diffraction data were collected at 100 K at the P14 beamline (PETRA III / DESY) on an EIGER 16M detector. Data were processed with XDS [28], SCALA and TRUNCATE [29]. The structure was solved automatically using the molecular replacement protocol of the Auto-Rickshaw pipeline [30]. The molecular replacement procedure during the Auto-Rickshaw run was performed by the MoRDa pipeline [31] and MOLREP [32]. The best template selected by MoRDa for model building was the B3 domain of ARF5 transcription factor (PDB ID 4ldu) [4]. The further Auto-Rickshaw run consisted of rigid-body refinement (CNS [33]), density modification (PIRATE [29]), model building (ARP/wARP [34]), and structure refinement (REFMAC [35] and PHENIX [36]). The obtained model was manually corrected using COOT [37] and refined using PHENIX (phenix.refine_1.10_2155) [36]. Data collection and refinement statistics, as reported by the ‘Generate table for journal’ utility from the PHENIX [36] package, are shown in Table 1. Coordinates and structure factors of the NGA1-B3 structure were deposited in the RCSB under PDB ID: 5os9.

Homology models

The model of the DNA-bound ABI3-B3 domain was built with MODELLER [38] using the VAL1-B3–DNA structure (PDB ID 6fas, chains ACD) as the sole template. The rotamers of ABI3-B3 residues K574, K577, D580, K631 and R623 were selected such that they either make direct contacts to the DNA backbone, or occupy similar positions as in VAL1-B3.

The templates for the model of DNA-bound NGA1-B3 in MODELLER [38] were apo-NGA1-B3 (PDB ID 5os9 chain A), ARF1-B3–DNA (PDB ID 4ldx, chains ACD or chain A), and VAL1-B3–DNA (PDB ID 6fas, chain A or chains ACD). Since the DNA conformation in complexes with ARF1-B3 and VAL1-B3 is almost undistorted B-form [11], the specifically recognized hexanucleotide in both templates was simply mutated into the RAV1 sequence (5′-CACCTG-3′) matching the experimentally determined NGA1-B3 binding orientation using the ‘mutate_bases’ function of the ‘3DNA’ package [39], and directly copied into the models. To enable alternative folding of the N-arm and C-arm loops, in some MODELLER runs these loops were removed from the templates. For example, the NGA1-B3–DNA model used for visualization was generated using the following templates: VAL1-B3 lacking residues A304-G308 and P350-M356; apo-NGA1-B3 lacking residues V44-N48 and N92-S96; and an intact ARF1-B3–DNA complex with a mutated DNA sequence. The rotamers of the presumed backbone-contacting NGA1-B3 residues (K37, K46, R87, K101 and S104) were selected such that their side chains make direct contacts to DNA backbone phosphates.

Structure and sequence analysis

Protein structures were overlaid using Multiprot [40]. Structure-based sequence alignments were generated using the Staccato algorithm [41] and ESPript [42]; the phylogenetic tree of protein sequences was generated using phylogeny.fr [43]. All images of protein structures were generated using PyMOL (The PyMOL Molecular Graphics System, Version 1.7 Schrödinger, LLC). Electrostatic potential surfaces of proteins were generated using the ‘APBS Tools’ plugin in PyMOL.

Protein labeling and fluorescence measurements

A 0.1 mL 300 μM sample of a protein mutant (ABI3-B3 variants C663S-T596C or C663S-T646C, NGA1-B3 variants C123S-K108C or C123S-R59C) was transferred into the labeling buffer (0.1 M NaCl, 0.1 M Na-phosphate, pH 7.0) using a NAP-5 desalting column (GE Healthcare). The resultant protein sample (0.5 mL, 60 μM) was mixed with 1 μL of 20 mM Alexa Fluor 488 maleimide derivative dissolved in DMSO and incubated for 1 hour at room temperature. The protein sample was then diluted with 2 mL of buffer containing 10 mM Tris-HCl (pH 8.0) and 15 mM NaCl, loaded on a HiTrap Heparin HP column, eluted using a NaCl gradient and dialyzed against the storage buffer as described above. The protein labeling efficiencies, as calculated from A280 and A495 measurements, were approximately 80% for the ABI3-B3(C663S-T596C), approx. 50% for the ABI3-B3(C663S-T646C), and approx. 60% for the NGA1-B3(C123S-K108C) proteins. The NGA1-B3(C123S-R59C) protein precipitated upon labeling and could not be purified.
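A labeling efficiency of this kind is commonly estimated from the two absorbance readings by correcting A280 for the dye contribution. The Python sketch below is one possible way to do this, not necessarily the calculation used here: the dye extinction coefficient (71000 M^-1 cm^-1) and the 280 nm correction factor (0.11) are assumed typical vendor values for Alexa Fluor 488 that should be checked against the product sheet, and the absorbances and protein extinction coefficient in the example are hypothetical.

```python
# Sketch: dye-to-protein labeling efficiency from A280 and A495.
# eps_dye and cf280 are ASSUMED typical values for Alexa Fluor 488;
# the absorbances and eps_protein in the example are hypothetical.

def degree_of_labeling(a280: float, a_dye: float, eps_protein: float,
                       eps_dye: float = 71000.0, cf280: float = 0.11,
                       path_cm: float = 1.0) -> float:
    """Moles of dye per mole of protein."""
    # Subtract the dye's own absorbance at 280 nm before computing protein conc.
    protein_conc = (a280 - cf280 * a_dye) / (eps_protein * path_cm)
    dye_conc = a_dye / (eps_dye * path_cm)
    if protein_conc <= 0:
        raise ValueError("corrected A280 is not positive")
    return dye_conc / protein_conc


# Amounts in the labeling reaction, taken from the text.
protein_nmol = 0.5e-3 * 60e-6 * 1e9   # 0.5 mL of 60 uM protein
dye_nmol = 1e-6 * 20e-3 * 1e9         # 1 uL of 20 mM dye
print(f"protein: {protein_nmol:.0f} nmol, dye: {dye_nmol:.0f} nmol")

dol = degree_of_labeling(a280=0.30, a_dye=0.50, eps_protein=14000.0)
print(f"labeling efficiency = {dol * 100:.0f}%")
```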

Fluorescence measurements were performed at 25°C using a Fluoromax-3 fluorimeter. Excitation wavelength was 495 nm (2 nm slit), emission wavelength was 519 nm (4 nm slit). The samples (total volume 160 μL) contained 100 nM labeled protein with 200 nM of either unmodified 18 bp DNA, or 18 bp DNA variants carrying 5’-BHQ-2 or 3’-BHQ-2 modifications (Table S1) in the Measurement Buffer (50 mM KCl, 10 mM MES-KOH, pH 6.0 at 25°C).

Electrophoretic mobility shift assay

DNA binding was analyzed by the electrophoretic mobility shift assay (EMSA) using 33P-labeled 12 bp oligoduplexes. DNA (final concentration typically 1 μM, obtained by mixing 2-4 nM of radiolabeled DNA with an appropriate amount of unlabeled oligoduplex) was incubated with the protein (final concentrations typically varied from 0.5 to 10 μM) for 15 min in 20 μL of the binding buffer containing 40 mM Tris-acetate (pH 8.3 at 25°C), 0.1 mg/ml BSA, 10% v/v glycerol and 5 mM calcium acetate (for NGA1-B3) or 1 mM EDTA (for ABI3-B3).

Free DNA and protein-DNA complexes were separated by electrophoresis through 8% polyacrylamide gels (29:1 acrylamide/bisacrylamide, gel length:width:thickness 22:15:0.1 cm) in 40 mM Tris-acetate (pH 8.3) for 45 min at 5 V/cm. The running buffer in the case of NGA1-B3 was also supplemented with 5 mM calcium acetate. Low power consumption during electrophoresis runs (approx. 2 W or 110 V × ~18 mA using the standard buffer, and approx. 2.6 W or 110 V × ~24 mA using the buffer supplemented with calcium acetate per electrophoretic unit containing two gels and 1 liter of electrophoretic buffer) ensured that the gels remained at room temperature (~22°C), well below the melting temperature of 12 bp duplexes used in the assay, >40°C.
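The electrical figures quoted above are mutually consistent, as the following Python arithmetic sketch shows. It assumes that the 5 V/cm field strength refers to the 22 cm gel length; all numbers are taken from the text.

```python
# Sketch: consistency check of the electrophoresis settings quoted in the text.
# Assumption: the 5 V/cm field is applied over the 22 cm gel length.

def power_watts(voltage_v: float, current_ma: float) -> float:
    """Electrical power P = U * I, with current given in milliamperes."""
    return voltage_v * current_ma / 1000.0


gel_length_cm = 22.0
field_v_per_cm = 5.0
voltage_v = gel_length_cm * field_v_per_cm

standard_w = power_watts(voltage_v, 18.0)   # standard Tris-acetate buffer
calcium_w = power_watts(voltage_v, 24.0)    # buffer with 5 mM calcium acetate

print(f"voltage = {voltage_v:.0f} V")
print(f"standard buffer: {standard_w:.2f} W; with calcium acetate: {calcium_w:.2f} W")
```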

Radiolabeled DNA and protein-DNA complexes were detected using the Cyclone phosphorimager and the OptiQuant software (Packard Instrument).

EMSA competition experiments

EMSA samples contained 2 μM protein (WT ABI3-B3 or WT NGA1-B3 without hexahistidine tags), 1 μM radiolabeled 12/12LAV (with ABI3-B3) or 12/12RAV (with NGA1-B3) DNA, and variable amounts (final concentrations up to 20.0 μM) of unlabeled competitor 12/12LAV or 12/12RAV DNA. Electrophoresis was run as described above. The amounts of the radiolabeled specific protein-DNA complexes (i.e., the complex with the respective cognate 12 bp DNA) at different competitor concentrations were determined by densitometric analysis of phosphorimager images of EMSA gels. Competition data, described by two equilibria,

(Protein) + (labeled cognate DNA) <=KD(cognate)=> (labeled complex), and

(Protein) + (unlabeled competitor) <=KD(competitor)=> (unlabeled complex),

were analyzed as described in ref. [11]. All experiments were performed in triplicate; the results of non-linear regression on the combined data are reported as the optimal parameters KD(cognate) and KD(competitor) (in µM) ± 1 standard error, as given by the KyPlot (v. 2.0 beta 14) software [44] used for non-linear regression. We note that the dissociation constant KD(cognate) for the labeled cognate DNA–protein interaction (4.8 ± 0.3 µM for ABI3-B3, and 1.5 ± 0.1 µM for NGA1-B3) is not equal to the KD(competitor) value determined in the same experiment for the identical cognate competitor DNA (the differences are 1.5-2-fold, Fig. 2B,D). Most likely these discrepancies arise due to slight variations in actual concentrations of the active protein, and to partial dissociation of the protein-DNA complex during the electrophoretic runs (the latter may reduce the amount of cognate complex in the absence of the competitor, and consequently increase the observed KD(cognate) value).
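The two coupled equilibria can be solved numerically for any set of total concentrations. The Python sketch below is an illustration under stated assumptions rather than the fitting procedure of ref. [11]: it assumes 1:1 binding for both DNAs, finds the free protein concentration from the mass balance by bisection, and returns the amount of labeled complex. Function names are illustrative; the example uses the NGA1-B3 concentrations and KD values quoted in the text.

```python
# Sketch: labeled complex under competition, assuming 1:1 binding equilibria.
# Not the fitting code of ref. [11]; function names are illustrative.

def free_protein(p_tot, l_tot, c_tot, kd_l, kd_c, tol=1e-12):
    """Free protein concentration from the mass balance, found by bisection.

    p_tot = P + l_tot*P/(kd_l + P) + c_tot*P/(kd_c + P), monotonic in P.
    """
    def excess(p):
        return p + l_tot * p / (kd_l + p) + c_tot * p / (kd_c + p) - p_tot

    low, high = 0.0, p_tot
    while high - low > tol:
        mid = 0.5 * (low + high)
        if excess(mid) > 0:
            high = mid
        else:
            low = mid
    return 0.5 * (low + high)


def labeled_complex(p_tot, l_tot, c_tot, kd_l, kd_c):
    """Concentration of the protein-labeled DNA complex (same units as inputs)."""
    p = free_protein(p_tot, l_tot, c_tot, kd_l, kd_c)
    return l_tot * p / (kd_l + p)


# 2 uM protein, 1 uM labeled DNA, KD(cognate) 1.5 uM, KD(competitor) 2.3 uM.
for competitor_um in (0.0, 0.6, 2.0, 6.0, 20.0):
    bound = labeled_complex(2.0, 1.0, competitor_um, 1.5, 2.3)
    print(f"competitor {competitor_um:5.1f} uM -> labeled complex {bound:.3f} uM")
```

A curve of this form, evaluated over the competitor concentrations used in the experiment, is what a non-linear regression routine would adjust to the densitometry data by varying the two dissociation constants.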

ACKNOWLEDGMENTS

The authors thank E. Rudokienė for sequencing services and EMBL DESY staff members Dr. G. Bourenkov and Dr. J. Kallio for the help with beamline operation. This work was supported by the Research Council of Lithuania grant MIP-106/2015 (to G. S.). E. M. acknowledges travel support by iNEXT, proposal ID 1812, funded by the Horizon 2020 programme of the European Union.

AUTHOR CONTRIBUTIONS

GS conceived the study. GS and EM planned experiments. GS, EM, KL and KK performed experiments. GS analyzed data. GS and VS wrote the paper.

REFERENCES

1 Wang Y, Deng D, Zhang R, Wang S, Bian Y & Yin Z (2012) Systematic analysis of plant-specific B3 domain-containing proteins based on the genome resources of 11 sequenced species. Mol. Biol. Rep. 39, 6267–82.
2 Swaminathan K, Peterson K & Jack T (2008) The plant B3 superfamily. Trends Plant Sci. 13, 647–55.
3 Guilfoyle TJ & Hagen G (2007) Auxin response factors. Curr. Opin. Plant Biol. 10, 453–460.
4 Boer DR, Freire-Rios A, van den Berg WAM, Saaki T, Manfield IW, Kepinski S, López-Vidrieo I, Franco-Zorrilla JM, de Vries SC, Solano R, Weijers D & Coll M (2014) Structural basis for DNA binding specificity by the auxin-dependent ARF transcription factors. Cell 156, 577–89.
5 Ulmasov T, Liu ZB, Hagen G & Guilfoyle TJ (1995) Composite structure of auxin response elements. Plant Cell 7, 1611–1623.
6 Suzuki M, Kao CY & McCarty DR (1997) The conserved B3 domain of VIVIPAROUS1 has a cooperative DNA binding activity. Plant Cell 9, 799–807.
7 Reidt W, Wohlfarth T, Ellerström M, Czihal A, Tewes A, Ezcurra I, Rask L & Bäumlein H (2000) Gene regulation during late embryogenesis: the RY motif of maturation-specific gene promoters is a direct target of the FUS3 gene product. Plant J. 21, 401–8.
8 Mönke G, Altschmied L, Tewes A, Reidt W, Mock H-P, Bäumlein H & Conrad U (2004) Seed-specific transcription factors ABI3 and FUS3: molecular interaction with DNA. Planta 219, 158–166.
9 Yuan W, Luo X, Li Z, Yang W, Wang Y, Liu R, Du J & He Y (2016) A cis cold memory element and a trans epigenome reader mediate Polycomb silencing of FLC by vernalization in Arabidopsis. Nat. Genet. 48, 1527–1534.
10 Qüesta JI, Song J, Geraldo N, An H & Dean C (2016) Arabidopsis transcriptional repressor VAL1 triggers Polycomb silencing at FLC during vernalization. Science 353, 485–8.
11 Sasnauskas G, Kauneckaitė K & Siksnys V (2018) Structural basis of DNA target recognition by the B3 domain of Arabidopsis epigenome reader VAL1. Nucleic Acids Res. 46, 4316–4324.
12 Kagaya Y, Ohmiya K & Hattori T (1999) RAV1, a novel DNA-binding protein, binds to bipartite recognition sequence through two distinct DNA-binding domains uniquely found in higher plants. Nucleic Acids Res. 27, 470–8.
13 Yamasaki K, Kigawa T, Inoue M, Tateno M, Yamasaki T, Yabuki T, Aoki M, Seki E, Matsuda T, Tomo Y, Hayami N, Terada T, Shirouzu M, Osanai T, Tanaka A, Seki M, Shinozaki K & Yokoyama S (2004) Solution structure of the B3 DNA binding domain of the Arabidopsis cold-responsive transcription factor RAV1. Plant Cell 16, 3448–59.
14 King GJ, Chanson AH, McCallum EJ, Ohme-Takagi M, Byriel K, Hill JM, Martin JL & Mylne JS (2013) The Arabidopsis B3 domain protein VERNALIZATION1 (VRN1) is involved in processes essential for development, with structural and mutational studies revealing its DNA-binding surface. J. Biol. Chem. 288, 3198–207.
15 Waltner JK, Peterson FC, Lytle BL & Volkman BF (2005) Structure of the B3 domain from Arabidopsis thaliana protein At1g16640. Protein Sci. 14, 2478–83.
16 Golovenko D, Manakova E, Tamulaitiene G, Grazulis S & Siksnys V (2009) Structural mechanisms for the 5’-CCWGG sequence recognition by the N- and C-terminal domains of EcoRII. Nucleic Acids Res. 37, 6613–24.
17 Sasnauskas G, Tamulaitienė G, Tamulaitis G, Čalyševa J, Laime M, Rimšelienė R, Lubys A & Siksnys V (2017) UbaLAI is a monomeric Type IIE restriction enzyme. Nucleic Acids Res. 44, W410–W415.
18 Golovenko D, Manakova E, Zakrys L, Zaremba M, Sasnauskas G, Gražulis S & Siksnys V (2014) Structural insight into the specificity of the B3 DNA-binding domains provided by the co-crystal structure of the C-terminal fragment of BfiI restriction enzyme. Nucleic Acids Res. 42, 4113–22.
19 Tamulaitiene G, Silanskas A, Grazulis S, Zaremba M & Siksnys V (2014) Crystal structure of the R-protein of the multisubunit ATP-dependent restriction endonuclease NgoAVII. Nucleic Acids Res. 42, 14022–14030.
20 Koornneef M, Reuling G & Karssen CM (1984) The isolation and characterization of abscisic acid-insensitive mutants of Arabidopsis thaliana. Physiol. Plant. 61, 377–383.
21 Lee BH, Mai TT, Song JT & Kim JH (2017) The Arabidopsis thaliana NGATHA1 transcription factor acts as a promoter of a general differentiation program and a carpel identity factor. J. Plant Biol. 60, 352–357.
22 Fox NK, Brenner SE & Chandonia J-M (2014) SCOPe: Structural Classification of Proteins--extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res. 42, D304-9.
23 Yamasaki K, Kigawa T, Seki M, Shinozaki K & Yokoyama S (2013) DNA-binding domains of plant-specific transcription factors: structure, function, and evolution. Trends Plant Sci. 18, 267–276.
24 Nag R, Maity MK & DasGupta M (2005) Dual DNA Binding Property of ABA insensitive 3 Like Factors Targeted to Promoters Responsive to ABA and Auxin. Plant Mol. Biol. 59, 821–838.
25 Chen N, Veerappan V, Abdelmageed H, Kang M & Allen RD (2018) HSI2/VAL1 Silences AGL15 to Regulate the Developmental Transition from Seed Maturation to Vegetative Growth in Arabidopsis. Plant Cell, tpc.00655.2017.
26 Kostiuk G, Dikić J, Schwarz FW, Sasnauskas G, Seidel R & Siksnys V (2017) The dynamics of the monomeric restriction endonuclease BcnI during its interaction with DNA. Nucleic Acids Res. 45, 5968–5979.
27 Romanel EAC, Schrago CG, Couñago RM, Russo CAM & Alves-Ferreira M (2009) Evolution of the B3 DNA binding superfamily: new insights into REM family gene diversification. PLoS One 4, e5791.
28 Kabsch W (2010) XDS. Acta Crystallogr. D. Biol. Crystallogr. 66, 125–32.
29 CCP4 (1994) The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D. Biol. Crystallogr. 50, 760–3.
30 Panjikar S, Parthasarathy V, Lamzin VS, Weiss MS & Tucker PA (2005) Auto-rickshaw: an automated crystal structure determination platform as an efficient tool for the validation of an X-ray diffraction experiment. Acta Crystallogr. D. Biol. Crystallogr. 61, 449–57.
31 Vagin A & Lebedev A (2015) MoRDa, an automatic molecular replacement pipeline. Acta Crystallogr. Sect. A Found. Adv. 71, s19–s19.
32 Vagin A & Teplyakov A (2010) Molecular replacement with MOLREP. Acta Crystallogr. D. Biol. Crystallogr. 66, 22–5.
33 Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T & Warren GL (1998) Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. D. Biol. Crystallogr. 54, 905–21.
34 Langer G, Cohen SX, Lamzin VS & Perrakis A (2008) Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nat. Protoc. 3, 1171–9.
35 Murshudov GN, Skubák P, Lebedev AA, Pannu NS, Steiner RA, Nicholls RA, Winn MD, Long F & Vagin AA (2011) REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr. D. Biol. Crystallogr. 67, 355–67.
36 Afonine P V, Grosse-Kunstleve RW, Echols N, Headd JJ, Moriarty NW, Mustyakimov M, Terwilliger TC, Urzhumtsev A, Zwart PH & Adams PD (2012) Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr. D. Biol. Crystallogr. 68, 352–67.
37 Emsley P & Cowtan K (2004) Coot: model-building tools for molecular graphics. Acta Crystallogr. D. Biol. Crystallogr. 60, 2126–32.
38 Webb B & Sali A (2016) Comparative Protein Structure Modeling Using MODELLER. In Current Protocols in Protein Science p. 2.9.1-2.9.37. John Wiley & Sons, Inc., Hoboken, NJ, USA.
39 Lu X-J & Olson WK (2008) 3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures. Nat. Protoc. 3, 1213–1227.
40 Shatsky M, Nussinov R & Wolfson HJ (2004) A method for simultaneous alignment of multiple protein structures. Proteins 56, 143–56.
41 Shatsky M, Nussinov R & Wolfson HJ (2005) Optimization of multiple-sequence alignment based on multiple-structure alignment. Proteins Struct. Funct. Bioinforma. 62, 209–217.
42 Gouet P, Robert X & Courcelle E (2003) ESPript/ENDscript: Extracting and rendering sequence and 3D information from atomic structures of proteins. Nucleic Acids Res. 31, 3320–3.
43 Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, Dufayard J-F, Guindon S, Lefort V, Lescot M, Claverie J-M & Gascuel O (2008) Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 36, W465-9.
44 Yoshioka K (2002) KyPlot as a Tool for Graphical Data Analysis. In Compstat pp. 37–46. Physica-Verlag HD, Heidelberg.

SUPPORTING INFORMATION

Table S1. Oligoduplex substrates used in the study.

TABLES

Table 1. NGA1-B3 apo-structure. Data collection and refinement statistics.

Data collection
Space group    P212121
a (Å)    30.76
b (Å)    151.18
c (Å)    47.51
α/β/γ    90°/90°/90°
Wavelength (Å)    0.9763
X-ray source    DESY/PETRAIII/P14
Total reflections    42321
Unique reflections    21197
Resolution range* (Å)    75.59–1.80 (1.86–1.80)
Completeness* (%)    99 (98)
Multiplicity*    2.0 (2.0)
Mean I/σ(I)*    21.81 (6.85)
R(merge)* (%)    1.86 (10.59)
B(iso) from Wilson (Å²)    17.75

Refinement
Resolution range (Å)    75.59–1.80
Reflections work (non-anomalous) / test    21189 / 2100
Protein / solvent atoms    1894 / 260
R-factor / R-free (%)    15.5 / 20.2
R.m.s.d. bond lengths (Å) / angles (deg)    0.019 / 1.54
Ramachandran plot (%) favoured / allowed / outliers    95.6 / 4 / 0.4
Average B factors: Macromolecule / solvent / all atoms    20.0 / 29.8 / 21.2
PDB ID    5os9

* The data for the outer shell (1.86–1.80 Å) are given in parentheses.

Table 2. DNA recognition by site-specific B3 DNA binding domains.

Recognition sequence position (#1 #2 #3 #4 #5 #6) / contacting residues / evidence a

TF (reference): contacting residues (evidence)
VAL1-B3 [11]: I307 (str., mut.); R309 (str., mut.); W349 (str.); M356 (str., mut.); V351 (str., mut.); N351 (str., mut.); S302 (str.)
ABI3-B3 [this work]: L584 (model, mut.); R586 (model, mut.); W627 (model, mut.); M364 (mut., model); V588 (mut., model); N629 (mut., model); S579 (model, mut.)
ARF1-B3 [4]: H136 (str.); -; R181 (str.); Q183 (str.); R186 (str.); R185 (str.); G137 (str.); S140 (str.); P184 (str.); S131 (str.)
ARF5-B3 [4]: H170, G171 (mut.); -; R215 (mut.); P218 (mut.)
RAV1-B3 [13,23]: L200 (NMR, model); N201 (NMR); W245 (NMR, model); V204 (NMR); N246 (model); R202 (NMR, model); Q249, S250, S251 (NMR)
NGA1-B3 b (this work): L47 (model, mut.); N48? (model, mut.); W91 (model, mut.); V51 (mut., model); N92?, S93?, Q95? (model); S94?, S96? (mut., model); R49 (mut., model); S42 (model, mut.)

a Mutagenesis – (mut.); structure – (str.).
b The question marks (?) mark the ‘2nd’ group residues as discussed in the Results section.

FIGURES

Fig. 1. Site-specific A. thaliana B3 DNA recognition domains. (A) Structure-based sequence alignment of B3 DNA binding domains of NGA1 (PDB ID 5os9, this study), RAV1 (PDB ID 1wid), ARF1 (PDB ID 4ldx), VAL1 (PDB ID 6fas), and ABI3-B3 (only sequence available). Residues forming the N-arm (green box) and the C-arm (orange box) structural elements involved in DNA recognition are indicated. Charges (chrg.) at neutral pH of the aligned sequences, and % identity (id.) / similarity (sim.) / gaps of all proteins relative to NGA1-B3 are given at the end of the alignment. Circles above and below the alignment mark ABI3-B3 and NGA1-B3 residues mutated in this work. (B) Superimposition of the apo-structures of NGA1-B3 (pink, PDB ID 5os9 chain A) and RAV1-B3 (gray, PDB ID 1wid model 1). Secondary structure elements of NGA1-B3 are marked. The N-arm residues of NGA1 and RAV1 are colored green and light green, respectively; the C-arm residues of the corresponding B3 domains are colored orange and light orange. The NGA1-B3 C-arm region Y88-N92 showing 2mFo-DFc electron density contoured at the 1.6σ level is also shown. (C) Superimposition of DNA-bound structures of VAL1-B3 (yellow, PDB ID 6fas), ARF1-B3 (light pink, PDB ID 4ldx, residues 119-225), and apo-NGA1-B3 (pink, PDB ID 5os9). The N-arm residues of VAL1, ARF1 and NGA1 are colored moss-green, light green, and green, respectively; the C-arm residues of the corresponding B3 domains are colored orange, light orange and brown. The 5′-TGCATG-3′/5′-CATGCA-3′ fragments of VAL1-B3 DNA are blue/gray, the 5′-GAGACA-3′/5′-TGTCTC-3′ fragments of ARF1-B3 DNA are light blue/white.

[Fig. 2 panels: EMSA gels (A, C) and competition plots (B, D). Oligoduplexes shown: 12/12LAV, 5′-CGGTGCATGGCT-3′/3′-GCCACGTACCGA-5′; 12/12RAV, 5′-AGCCACCTGTCG-3′/3′-TCGGTGGACAGC-5′.]

Fig. 2. Site-specific DNA binding by wt ABI3-B3 and wt NGA1-B3. (A) DNA binding by ABI3-B3 containing and lacking an N-terminal hexahistidine tag in a pH 8.3 Tris-acetate buffer. Noncognate DNA was 12/12RAV, cognate DNA was 12/12LAV. Concentration of radiolabeled DNA was 1 μM, protein concentrations were 0.5, 1, 2, 5 μM; lanes ‘0’ contained no protein. Positions of free DNA, specific and nonspecific complexes are indicated by blue, red and yellow brackets, respectively. The numbers indicate % amount of the respective DNA forms in the highest protein concentration lane. (B) An EMSA competition experiment performed with ABI3-B3. The reactions contained 1 µM radiolabeled 12/12LAV DNA, 2 µM ABI3-B3 domain lacking a hexahistidine tag, and variable amounts of unlabeled cognate 12/12LAV or non-cognate 12/12RAV competitors. The plot shows the amount of the radiolabeled protein-DNA complex as a function of the cognate (open circles) and non-cognate (crosses) competitor concentration. The non-linear fit of the competition model (solid lines) to cognate competitor data (11 data points) gave KD(competitor) = 2.3 ± 0.4 µM and KD(cognate) = 4.8 ± 0.3 µM, and the fit to the non-cognate competitor data (11 data points) gave KD(competitor) = 11 ± 2 µM and KD(cognate) = 4.9 ± 0.3 µM (optimal fit values ± SEM). (C) DNA binding by NGA1-B3 containing and lacking a C-terminal hexahistidine tag in a pH 8.3 Tris-acetate buffer, and in a pH 8.3 Tris-acetate buffer supplemented with 5 mM calcium acetate. Noncognate DNA (1 µM) was 12/12LAV, cognate DNA was 12/12RAV, protein concentrations were 1, 2, 4, 10 µM. (D) An EMSA competition experiment performed with NGA1-B3. The reactions in a Ca2+-containing buffer contained 1 µM radiolabeled 12/12RAV DNA, 2 µM NGA1-B3 domain lacking a hexahistidine tag, and variable amounts of unlabeled cognate 12/12RAV or non-cognate 12/12LAV competitors. Data analysis was performed as in panel (B). The non-linear fit of the competition model to cognate competitor data (13 data points) gave KD(competitor) = 2.3 ± 0.4 µM and KD(cognate) = 1.5 ± 0.1 µM, and the fit to the non-cognate competitor data (13 data points) gave KD(competitor) = 45 ± 15 µM and KD(cognate) = 1.5 ± 0.1 µM (optimal fit values ± SEM).

[Fig. 3 image: EMSA gels of (A) WT ABI3-B3 (0.5, 1, 2, 5 µM) and (B) WT NGA1-B3 (1, 2, 4, 10 µM) with the 12/12LAV, 12/12RAV and 12/12NSP oligoduplexes and with variants altered within or adjacent to the recognition sequence (for example 12/12LAV_1A and 12/12RAV_1G); bands are labeled free DNA, specific complex and nonspecific complex.]

Fig. 3. Stringency of sequence recognition by ABI3-B3 and NGA1-B3.

(A) ABI3-B3 binding to DNA with alterations at various positions within or adjacent to the Sph/RY sequence 5′-TGCATG-3′. The positions that deviate from the standard sequence are shown as white letters on a gray background. DNA concentration was 1 µM, ABI3-B3 concentrations were 0.5, 1, 2 and 5 µM, and EMSA experiments were performed in a pH 8.3 Tris-acetate running buffer as described in Experimental Procedures. Specific substrates (Sph/RY sequence intact) are in a green box, nonspecific DNA is in a red box, and DNAs with single bp substitutions capable of cognate complex formation are in a blue box. Positions of the free DNA, specific complex and nonspecific complex are marked by blue, red and yellow brackets, respectively, and the numbers indicate the percentage of the respective DNA forms in the highest protein concentration lane. (B) NGA1-B3 binding to DNA with alterations at various positions within or adjacent to the RAV1 recognition sequence 5′-CACCTG-3′. DNA concentration was 1 µM, NGA1-B3 concentrations were 1, 2, 4 and 10 µM, and EMSA experiments were performed in a pH 8.3 Tris-acetate running buffer supplemented with 5 mM calcium acetate. The DNA oligoduplexes and protein complexes are marked as in panel (A).
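The substitution series used in both panels can be enumerated programmatically. The sketch below is illustrative only: it lists every single base pair substitution of a 6 bp recognition site together with the complementary strand, using labels such as `1A` (position 1 changed to A). The function name `single_substitutions` and the label format are assumptions made for this example, and the flanking nucleotides of the 12 bp oligoduplexes are not modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def single_substitutions(site):
    """Map 'position+new base' labels (e.g. '1A') to (top, bottom) strands.

    The top strand is written 5'->3'; the bottom strand is its base-by-base
    complement written 3'->5', the way oligoduplexes are usually drawn.
    """
    variants = {}
    for index, original in enumerate(site):
        for base in "ACGT":
            if base == original:
                continue
            top = site[:index] + base + site[index + 1:]
            bottom = "".join(COMPLEMENT[b] for b in top)
            variants[f"{index + 1}{base}"] = (top, bottom)
    return variants


sph_ry = single_substitutions("TGCATG")   # ABI3-B3 site, panel (A)
rav1 = single_substitutions("CACCTG")     # NGA1-B3 site, panel (B)
print(len(sph_ry), sph_ry["1A"])
print(len(rav1), rav1["1G"])
```

Each 6 bp site yields 18 single-substitution variants (three alternative bases at each of six positions).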

[Fig. 4 image: (A) models of NGA1-B3 variant C123S-K108C labeled with Alexa 488 bound to the 18 bp oligoduplex 5′-CTACGACAGGTGGCTATC-3′ carrying a BHQ-2 quencher at the 5′- or 3′-terminus of either strand (5′-BHQ-2_DNA1, 3′-BHQ-2_DNA1, 5′-BHQ-2_DNA2, 3′-BHQ-2_DNA2), with fluorophore-quencher distances indicated; (B) fluorescence bar graph for NGA1-B3 K108C(Alexa 488); (C) models and fluorescence bar graphs for ABI3-B3 T596C(Alexa 488) and T646C(Alexa 488) bound to the oligoduplex 5′-GATAGCCATGCACCGTAG-3′ (5′-BHQ-2 DNA and 3′-BHQ-2 DNA).]

Fig. 4. DNA binding orientation of NGA1-B3. (A) Experimental strategy. The sole cysteine in NGA1-B3 variant C123S-K108C was used for labeling with Alexa Fluor 488. According to the model of NGA1-B3 bound to 18 bp DNA, the efficiency of fluorescence quenching by BHQ-2 quenchers located at the 5′- or 3′-termini of either the 5′-CAGGTG-3′ (gray) or 5′-CACCTG-3′ (blue) DNA strands should depend on the orientation of NGA1-B3 relative to the asymmetric recognition site. In the depicted binding orientation, stronger quenching is expected with 3′-BHQ-2_DNA1 and 5′-BHQ-2_DNA2 than with 5′-BHQ-2_DNA1 and 3′-BHQ-2_DNA2. (B) Fluorescence measurements. Fluorescence intensity of samples containing protein (100 nM) and DNA (200 nM, either unlabeled or labeled with a BHQ-2 quencher at either terminus of the ‘blue’ and ‘gray’ strands) was normalized against the fluorescence intensity of the ‘Unmodified DNA’ sample (error bars denote SD of three independent experiments, p values were determined by unpaired t-tests). The fluorescence quenching is consistent with the depicted DNA binding orientation. (C) Verification of the experimental strategy with ABI3-B3. The sole cysteine in wt ABI3-B3 was replaced with a serine, and, based on the model of ABI3-B3 bound to 18 bp DNA, single cysteines were introduced on the opposite faces of the protein (ABI3-B3 variants C663S-T596C and C663S-T646C) for fluorescent labeling. The fluorescence intensity of labeled proteins was measured in the presence of cognate DNA bearing a BHQ-2 quencher at either the 5′- or 3′-terminus of the ‘gray’ strand (oligoduplexes 5′-BHQ-2 and 3′-BHQ-2, respectively), and normalized against the intensity of samples containing unmodified DNA. Relative fluorescence data (bar graphs, error bars denote SD of three independent experiments, p values were determined by unpaired t-tests) indicate that ABI3-B3 binds the Sph/RY sequence in the same orientation as VAL1-B3 (the modeled orientation).
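The normalization and comparison described in panels (B) and (C) can be sketched as follows. This is a minimal illustration, assuming a single reference intensity for the unmodified DNA sample and a pooled-variance (Student's) unpaired t-test; the legend does not state which variant of the unpaired test was used. The helper names `normalize` and `unpaired_t` and all intensity values are hypothetical placeholders, not data from this study, and converting the t statistic into a p value would additionally require the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def normalize(intensities, reference):
    """Express raw fluorescence intensities relative to a reference sample."""
    return [value / reference for value in intensities]


def unpaired_t(sample_a, sample_b):
    """Student's unpaired t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical raw intensities (arbitrary units), three replicates each;
# these numbers are placeholders, not data from the paper.
unmodified = 1000.0
quencher_near = normalize([410.0, 395.0, 430.0], unmodified)
quencher_far = normalize([820.0, 790.0, 845.0], unmodified)
t, df = unpaired_t(quencher_near, quencher_far)
print(f"t = {t:.2f}, df = {df}")
```

A strongly negative t statistic in this layout means that the first quencher position gives lower relative fluorescence, that is, stronger quenching, than the second.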

[Fig. 5 image: (A) N-arm and C-arm alignments of ABI3-B3, VAL1-B3, ARF1-B3 and NGA1-B3, with base contact schemes for ABI3 (5′-TGCATG-3′), ARF1 (5′-GAGACA-3′) and NGA1 (5′-CACCTG-3′); (B) EMSA gels of wt ABI3-B3, N-arm mutants L584A, R586A, V588A and C-arm mutants W627A, N629A, N630A, M634A with 12/12RAV (5′-AGCCACCTGTCG-3′) and 12/12LAV (5′-CGGTGCATGGCT-3′); (C) EMSA gels of wt NGA1-B3, N-arm mutants L47A, N48A, R49A, V51A and C-arm mutants W91A, N92A, S94A, Q95A, S96A with 12/12LAV and 12/12RAV.]

Fig. 5. Base readout by the site-specific B3 domains. (A) Base-specific contacts of ABI3-B3 (model), ARF1-B3 (co-crystal structure, PDB ID 4ldx), and NGA1-B3 (model). The cartoon at the top schematically depicts DNA recognition by the N-arm and C-arm structural elements and numbering of DNA bases in both strands. Alignments of the N-arm and C-arm residues from B3 domains are shown at the top; N-arm and C-arm residues implicated in direct base readout are shaded green and orange, respectively. The same coloring scheme is applied to NGA1 residues that are conserved in other B3 domains. NGA1 residues that could potentially make direct contacts to bases based on modeling are on a pink background. NGA1-B3 positions where alanine replacements either abolish formation of the specific complex or reduce target recognition stringency are marked by red and magenta boxes, respectively. Individual panels show contacts made by the N-arm (green) and the C-arm (orange) residues to 6 base pairs of the respective recognition sequence. Asterisks denote residues that make contacts to multiple base pairs. In the case of NGA1-B3, the contacts are shown only for residues that are conserved in VAL1/ABI3. (B) DNA binding by ABI3-B3 mutants. DNA was either nonspecific (12/12RAV) or specific (12/12LAV); experimental conditions were as in Fig. 3A. Positions of free DNA, specific complex and nonspecific complex are marked by blue, red and yellow brackets, respectively, and the numbers indicate the percentage of the respective DNA forms in the highest protein concentration lane. (C) DNA binding by NGA1-B3 mutants. DNA was either nonspecific (12/12LAV) or specific (12/12RAV); experimental conditions were as in Fig. 3B.
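The percentages printed next to the gel lanes express each DNA form as a share of the total signal in that lane. A minimal sketch of this conversion is given below, assuming band intensities obtained by densitometry; the function name `lane_percentages` and the intensity values are hypothetical and are not taken from the gels.

```python
def lane_percentages(band_intensities):
    """Convert band intensities from one lane into whole-number percentages."""
    total = sum(band_intensities.values())
    return {band: round(100 * value / total)
            for band, value in band_intensities.items()}


# Hypothetical densitometry readings (arbitrary units), not data from the paper.
lane = {"free DNA": 580.0, "specific complex": 1420.0}
print(lane_percentages(lane))
```

Because each value is rounded independently, the percentages in a lane with three or more bands may not add up to exactly 100.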

[Fig. 6 image: (A) DNA phosphate contact schemes and electrostatic potential surfaces (negative to positive scale) for ABI3, ARF1 and NGA1 with contacting N-arm and C-arm residues labeled; (B) EMSA gels of ABI3-B3 mutants K574A, S579A, R623A, R625A and K631A with 12/12RAV (non-specific) and 12/12LAV (specific) DNA; (C) EMSA gels of NGA1-B3 mutants K37A, S42A, K46A, R87A and S93A with 12/12LAV (non-specific) and 12/12RAV (specific) DNA.]

Fig. 6. Backbone contacts of site-specific B3 domains. (A) Direct contacts to DNA phosphates. Left: sequences of DNA fragments present in the experimental structures of ARF1-B3 and the models of DNA-bound ABI3-B3 and NGA1-B3. DNA phosphates making direct contacts to the protein are marked as orange ‘p’ letters. Center: electrostatic potential surfaces of B3 domains and cartoon representations of the DNA (colored as in Fig. 5). Phosphates that make contacts to B3 domains are shown as orange spheres, protein residues making direct contacts to the phosphates are shown in stick representation. Contacts to the phosphates of the ‘blue’ and ‘gray’ strands are shown separately on each side of the electrostatic potential surface. In all panels, N-arm residues are green, C-arm residues are orange, residues from other parts of the protein are yellow. (B) Mutations of ABI3-B3 residues that make direct contacts to DNA phosphates. Experimental conditions were as in Fig. 3A. (C) Mutations of NGA1-B3 residues that in the depicted model make direct contacts to DNA phosphates. Experimental conditions were as in Fig. 3B.

[Fig. 7 image: (A) EMSA gels of NGA1-B3(L47A) and NGA1-B3(N48A) with 12/12NSP, 12/12LAV and 12/12RAV, and of NGA1-B3(W91A) with 12/12RAV, 12/12RAV_3G, 12/12RAV_3A and 12/12RAV_3T; (B) EMSA gels of ABI3-B3(R586A) with 12/12LAV, 12/12LAV_2C, 12/12LAV_2T and 12/12LAV_2A; bands are labeled free DNA, specific complex and nonspecific complex.]

Fig. 7. DNA binding by NGA1-B3 mutants L47A, N48A, W91A and ABI3-B3 mutant R586A. (A) Experiments with NGA1-B3 mutants L47A and N48A were performed with cognate DNA 12/12RAV, standard nonspecific DNA containing 3 substitutions within the RAV1/NGA1 recognition sequence (12/12LAV; positions that deviate from the standard RAV1/NGA1 recognition sequence are shown as white letters on a gray background), and alternative nonspecific DNA 12/12NSP, containing an inverted RAV1/NGA1 recognition sequence. Experiments with NGA1-B3 mutant W91A were performed with standard cognate DNA 12/12RAV and 3 DNAs containing substitutions of the C3/G4 base pair within the canonical RAV1/NGA1 sequence. DNA concentration was 1 µM, protein concentrations were 1, 2, 4 and 10 µM, and EMSA experiments were performed as in Fig. 3B. Positions of the free DNA, the specific complex and the nonspecific complex are marked by blue, red and yellow brackets, respectively, and the numbers indicate the percentage of the respective DNA forms in the highest protein concentration lane. (B) Experiments with ABI3-B3 mutant R586A were performed with cognate DNA 12/12LAV and 3 DNAs containing substitutions of the G2/C5 base pair within the canonical Sph/RY sequence. DNA concentration was 1 µM, protein concentrations were 0.5, 1, 2 and 5 µM, and EMSA experiments were performed as in Fig. 3A.

[Fig. 8 image: sequence alignment with rows labeled NGA1-B3 29, RAV1-B3 182, ARF1-B3 120, ARF5-B3 152, VAL1-B3 289, At1g16640 3 and VRN1-B3 238; the N-arm and C-arm regions are boxed.]

Fig. 8. Structure-based sequence alignment of plant B3 domains. Alignment includes all structurally characterized plant B3 domains: NGA1 (PDB ID 5os9, this study), RAV1 (PDB ID 1wid), ARF1 (PDB ID 4ldx), ARF5 (PDB ID 4ldu), VAL1 (PDB ID 6fas), At1g16640 (PDB ID 1yel), and VRN1-B3 (PDB ID 4i1k, N-terminal residues omitted). Secondary structure elements of NGA1-B3 are shown above the alignment. Residues forming the N-arm (green box) and C-arm (orange box) structural elements are indicated.