PROTEIN-PROTEIN INTERACTION MAP OF THE Arabidopsis thaliana GENERAL TRANSCRIPTION FACTORS A, B, D, E, AND F

By

SHAI JOSHUA LAWIT

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2003

Copyright 2003

by

Shai Joshua Lawit

This document is dedicated to my family, both genetic and scientific.

ACKNOWLEDGMENTS

I thank my dearest wife, Kristel Lynn, for her undying support. I also thank my son, Benjamin Owen, for his constant interest in this document as I wrote it and for his unending smiles and hugs at all times. I thank my parents and all of my family for instilling in me a desire for education and excellence. Of course, I have a great appreciation for the members of the Gurley lab (past, present, and future) who continually contribute to this field of research. John Davis and the members of his lab (especially Chris Dervinis for helping me to get access to the Poplar genomic sequences and Ram Kishore Alavalapati for running all the PAUP analyses) deserve special thanks for technical assistance, collaboration, and helpful discussions. I would finally like to thank William B. Gurley, Eva Czarnecka-Verner, Robert Ferl, Alice Harmon, Karen Koch, Donald McCarty, Thomas Yang, Robert R. Schmidt, Waltraud I. Dunn, and the entire teaching faculty who have molded me into the scientist that I am.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION TO THE LITERATURE

   General Transcription Factors
   TATA Binding Protein and TFIID
   TATA Binding Protein-Associated Factors
      Histone-like TAFs
      TAF1 family
      Other TAFs and interactions of TFIID
   Alternative TBP- or TAF-Containing Complexes
   TAFs: Required Factors or Optional Accessories
   Interplay of GTFs
   Transcriptional Activators That Bind DNA

2 PHYLOGENETIC ANALYSIS OF POPLAR, Arabidopsis AND OTHER PLANT GENERAL TRANSCRIPTION FACTORS

   Introduction
   Methods
   Results
      TFIIA Large and Small Subunits
      TFIIB Family
      Representative TFIID Components
      TFIIEα and TFIIEβ Subunits
      TFIIFα and TFIIFβ Subunits
   Discussion
      TFIIA Large and Small Subunits
      TFIIB Family
      Representative TFIID Components
      TFIIEα and TFIIEβ Subunits
      TFIIFα Family
      TFIIFβ Family

3 BINARY PROTEIN-PROTEIN INTERACTIONS OF THE Arabidopsis thaliana GENERAL TRANSCRIPTION FACTOR IID

   Introduction
   Materials and Methods
   Results
   Discussion

4 BINARY PROTEIN-PROTEIN INTERACTIONS OF Arabidopsis TFIIA, TFIIB, TFIID, TFIIE, AND TFIIF

   Introduction
   Materials and Methods
   Results
   Discussion

5 DISCUSSION

   TFIIA Large and Small Subunits
   TFIIB Family
   TFIID Components
   TFIIEα and TFIIEβ Subunits
   TFIIFα and TFIIFβ Subunits
   Conclusion

APPENDIX

A NUCLEOTIDE AND AMINO ACID SEQUENCES OF GENERAL TRANSCRIPTION FACTORS

   TFIIA Small Subunit Sequences
   TFIIA Large Subunit Sequences
   TFIIB Family Sequences
   TATA Binding Protein Sequences
   TAF6 Sequences
   TAF9 Sequences
   TAF10 Sequences
   TAF11 Sequences
   TFIIEα Sequences
   TFIIEβ Sequences
   TFIIFα Sequences
   TFIIFβ Sequences

B AMINO ACID MULTIPLE SEQUENCE ALIGNMENTS FOR CORE DOMAINS OF THE GENERAL TRANSCRIPTION FACTORS

   TFIIA Small Subunit Alignment
   TFIIA Large Subunit Alignment
   TFIIB Family Alignment
   TBP Alignment
   TAF6 Alignment
   TAF9 Alignment
   TAF10 Alignment
   TAF11 Alignment
   TFIIEα Alignment
   TFIIEβ Alignment
   TFIIFα Alignment
   TFIIFβ Alignment

LIST OF REFERENCES

BIOGRAPHICAL SKETCH


LIST OF TABLES

Table

1-1. TATA binding protein-associated factors of the TFIID complex

1-2. Protein-protein interactions of TFIID in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae with corresponding references

1-3. Protein-protein interactions between TFIIA, TFIIB, TFIID, TFIIE, and TFIIF subunits in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae with corresponding references

2-1. Arabidopsis GTF genes, loci, genomic sizes, coding sequence sizes (counting stop codons), predicted protein molecular weights, and pI of the predicted proteins

2-2. Similarity and identity percentage ranges of the GTF protein families examined

3-1. Primers for amplification of TBP and TAF-like cDNAs and cloning into pENTR/D-Topo or pDONR207 vectors

3-2. Primers for cloning of TAF12 N-terminal, middle, and C-terminal fragments

3-3. Arabidopsis thaliana TFIID subunit cDNA GenBank accession numbers

3-4. A yeast two-hybrid targeted protein-protein interaction matrix between subunits of the Arabidopsis thaliana TFIID complex

4-1. Primers for amplification of cDNAs to Arabidopsis homologs of TFIIA, TFIIB, TFIIE, and TFIIF and cloning into the pENTR/D-Topo vector

4-2. Primers for cloning of TFIIEα2 N-terminal and C-terminal fragments

4-3. Arabidopsis thaliana TFIIA, TFIIB, TFIIE, and TFIIF component cDNA GenBank accession numbers

4-4. A yeast two-hybrid targeted protein-protein interaction matrix between components of Arabidopsis thaliana TFIIA, TFIIB, TFIIE, and TFIIF with subunits of the TFIID complex


LIST OF FIGURES

Figure

1-1. The “two-step handoff” model of removal of auto-inhibition of TFIID by the TAF1 N-terminal domains TAND1 and TAND2 (T1 and T2, respectively)

1-2. Binary protein-protein interactions of the Homo sapiens general transcription factors TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and their homologs

1-3. Binary protein-protein interactions of Drosophila melanogaster general transcription factors TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and their homologs

2-1. Unrooted phylogram of TFIIA small subunit proteins from plants, humans, fruit flies, and yeast

2-2. Unrooted phylogram of TFIIA large subunit proteins from plants, humans, fruit flies, and yeast

2-3. Unrooted phylogram of TFIIB-related proteins from plants, humans, fruit flies, yeast, and Archaea

2-4. Unrooted phylogram of TBP-related proteins from plants, humans, fruit flies, yeast, and Archaea

2-5. Unrooted phylogram of TAF6-related proteins from plants, humans, fruit flies, and yeast

2-6. Unrooted phylogram of TAF9-related proteins from plants, humans, fruit flies, and yeast

2-7. Unrooted phylogram of TAF10-related proteins from plants, humans, fruit flies, and yeast

2-8. Unrooted phylogram of TAF11-related proteins from plants, humans, fruit flies, and yeast

2-9. Unrooted phylogram of TFIIEα-related proteins from plants, humans, fruit flies, yeast, and Archaea

2-10. Unrooted phylogram of TFIIEβ-related proteins from plants, humans, fruit flies, and yeast

2-11. Unrooted phylogram of TFIIFα-related proteins from plants, humans, fruit flies, and yeast

2-12. Unrooted phylogram of TFIIFβ-related proteins from plants, humans, fruit flies, and yeast

2-13. Multiple sequence alignment of the TFIIB region containing the conserved lysine residue that is acetylated in human and yeast TFIIB (in green)

2-14. Exon-intron diagrams of Arabidopsis TAF6 and TAF6b alternative splicing forms

3-1. Histogram of percent of matings, per bait construct, that yielded colony growth

3-2. Histogram of percent of matings, per prey construct, that yielded colony growth

3-3. Immunoblots of TFIID components expressed as bait fusion proteins in MaV204K

3-4. Immunoblots of TFIID components expressed as prey fusion proteins in AH109

3-5. Colorimetric assays of the β-galactosidase reporter levels in yeast diploids containing both bait and prey plasmids

3-6. Protein-protein interactions of Arabidopsis thaliana TFIID subunits as determined by yeast two-hybrid and β-galactosidase confirmations

3-7. Protein-protein interactions of Arabidopsis thaliana TFIID subunits as determined by yeast two-hybrid and β-galactosidase confirmations

4-1. Histogram of percent of matings, per bait construct, that yielded colony growth

4-2. Histogram of percent of matings, per prey construct, that yielded colony growth

4-3. Immunoblots of TFIIA, TFIIB, TFIIE, and TFIIF components expressed as bait fusion proteins in MaV204K

4-4. Immunoblots of TFIIA, TFIIB, TFIIE, and TFIIF components expressed as prey fusion proteins in AH109

4-5. Colorimetric assays of the β-galactosidase reporter levels in yeast diploids containing both bait and prey plasmids

4-6. Protein-protein interactions of Arabidopsis thaliana TFIIA subunits with components of TFIIB, TFIID, TFIIE, TFIIF, and other TFIIA components as determined by yeast two-hybrid and β-galactosidase confirmations

4-7. Protein-protein interactions of Arabidopsis thaliana TFIIB homologs with components of TFIIA, TFIID, TFIIE, TFIIF, and other homologs of TFIIB as determined by yeast two-hybrid and β-galactosidase confirmations

4-8. Protein-protein interactions of Arabidopsis thaliana TFIID components with components of TFIIA, TFIIB, TFIIE, and TFIIF as determined by yeast two-hybrid and β-galactosidase confirmations

4-9. Protein-protein interactions of Arabidopsis thaliana TFIIE subunits with components of TFIIA, TFIIB, TFIID, TFIIF, and other TFIIE components as determined by yeast two-hybrid and β-galactosidase confirmations

4-10. Protein-protein interactions of Arabidopsis thaliana TFIIF subunits with components of TFIIA, TFIIB, TFIID, TFIIE, and other TFIIF components as determined by yeast two-hybrid and β-galactosidase confirmations

4-11. Protein-protein interactions of Arabidopsis thaliana TFIIA, TFIIB, TFIIE, and TFIIF with each other and subunits of TFIID as determined by yeast two-hybrid and β-galactosidase confirmations

4-12. Strong protein-protein interactions of Arabidopsis thaliana TFIIA, TFIIB, TFIIE, and TFIIF with each other and subunits of TFIID as determined by yeast two-hybrid and β-galactosidase confirmations

5-1. Protein-protein interactions among TFIIA, TFIIB, TFIIE, TFIID, and TFIIF that are unique to Arabidopsis thaliana as determined by yeast two-hybrid and β-galactosidase confirmations

5-2. Protein-protein interactions of Arabidopsis thaliana TFIIA, TFIIB, TFIID, TFIIE, and TFIIF that have been reported previously for Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae homologs

5-3. Interactions of Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae TFIIA, TFIIB, TFIID, TFIIE, or TFIIF that were not confirmed for Arabidopsis thaliana homologs


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

PROTEIN-PROTEIN INTERACTION MAP OF THE Arabidopsis thaliana GENERAL TRANSCRIPTION FACTORS A, B, D, E, AND F

By

Shai Joshua Lawit

December 2003

Chair: William B. Gurley
Cochair: Eva Czarnecka-Verner
Major Program: Plant Molecular and Cellular Biology

General transcription factor IID (TFIID) is central to the nucleation of the deoxyribonucleic acid (DNA)-dependent ribonucleic acid (RNA) polymerase II (PolII) preinitiation complex (PIC) and is critical to the transcriptional activation of many genes. The presence of TFIID at a promoter leads to recruitment of the other general transcription factors (GTFs) TFIIA, TFIIB, TFIIE, TFIIH, and TFIIF (in association with PolII). While GTFs have been heavily studied in metazoans and yeast, little is known about their functions in the plant kingdom. Recent studies of selected GTF proteins in plants have uncovered possible plant-specific and developmental roles, suggesting that some GTF proteins have evolved different functions since the last common ancestor of plants, animals, and fungi.

The specific objectives for characterization of the GTFs from Arabidopsis were to identify the GTF proteins and uncover their binary interactions. A number of genes for putative GTFs were identified by homology-based searches. These newly identified genes added to the two previously known TATA-binding protein (TBP) genes, three genes encoding subunits of TFIIA, and two TFIIB genes. Of these genes, 16 encoded TBP-associated factor-like proteins (TAFs), and 14 encoded putative components of TFIIA, TFIIB, TFIIE, and TFIIF in Arabidopsis. Many of their complementary DNAs (cDNAs) were cloned using reverse transcriptase-mediated polymerase chain reaction (PCR). Often, these clones were the first confirmation of messenger RNAs for their respective genes.

The cDNAs of these Arabidopsis GTF genes have been subcloned into yeast two-hybrid bait and prey vectors, and transformed into yeast MATa and MATα strains, respectively. Using a targeted interaction scheme, 1598 interactions were tested. Interactions that yielded colony growth in the yeast two-hybrid system were verified using β-galactosidase assays. A map of binary protein-protein interactions between the subunits of Arabidopsis TFIIA, TFIIB, TFIID, TFIIE, and TFIIF was constructed. Of the 176 interactions, 36.4% were protein interactions that were previously characterized in other systems. However, 63.6% (112) of the interactions were novel. This is the first comprehensive protein-protein interaction map for TFIIA, TFIIB, TFIID, TFIIE, and TFIIF and has elucidated new PIC nucleation pathways (e.g., a TAF8-TAF10 heterotetramer with extensive protein contacts).

CHAPTER 1

INTRODUCTION TO THE LITERATURE

General Transcription Factors

The central dogma of molecular biology defines the flow of genetic information as directed from deoxyribonucleic acid (DNA), to ribonucleic acid (RNA), and ultimately to protein. In eukaryotes, DNA-dependent RNA polymerase II (PolII) is responsible for the transcription of some small nuclear RNAs and all messenger RNA (mRNA, the only RNA that is translated into protein) (Burley and Roeder, 1996). Initiation of transcription by PolII is a complex process and requires interactions of many proteins that comprise the general transcription factors (GTFs) TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH (Matsui et al., 1980; Samuels et al., 1982; Albright and Tjian, 2000). In the first step, TFIID nucleates the assembly of the GTFs at the TATA element of PolII promoters. This is achieved in part by sequence-specific recognition of DNA by TATA binding protein (TBP).

Two predominant models describe PolII assembly at a promoter. The stepwise model states that GTFs are sequentially recruited to a promoter in a predetermined order (TFIID, TFIIB, TFIIA, TFIIF with PolII, TFIIE, and TFIIH) (Buratowski et al., 1989; Flores et al., 1992; Koleske and Young, 1995). The second model is called the holoenzyme model because TFIIB, TFIIE, TFIIF, TFIIH, Mediator (a compilation of suppressor of RNA polymerase B proteins and numerous other proteins), and PolII are pre-assembled as a multi-protein megadalton (MDa) complex before recruitment to the promoter (Koleske and Young, 1994, 1995). Both models require either TBP or TFIID to nucleate the assembly of the preinitiation complex (PIC) at the promoter as a first step. In total, the PolII PIC is composed of at least 48 protein subunits: 12 PolII; 18-20 TFIID, including multiple copies of some TBP-associated factor (TAF) subunits; one TFIIB; four TFIIF (α2β2) (Flores et al., 1990); four TFIIE (α2β2) (Ohkuma et al., 1990); and 9 TFIIH.
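The subunit tally quoted above can be checked with a short calculation. The following is a minimal sketch and not part of the original analysis: only the per-complex counts come from the text, while the dictionary name, the function name, and the choice to store each count as a (minimum, maximum) pair are illustrative assumptions.

```python
# Subunit counts for the PolII preinitiation complex as quoted in the text.
# Each value is a (minimum, maximum) pair; only TFIID is given as a range.
# The names below are hypothetical and chosen for illustration only.
PIC_SUBUNITS = {
    "PolII": (12, 12),
    "TFIID": (18, 20),  # includes multiple copies of some TAF subunits
    "TFIIB": (1, 1),
    "TFIIF": (4, 4),    # alpha2-beta2 heterotetramer
    "TFIIE": (4, 4),    # alpha2-beta2 heterotetramer
    "TFIIH": (9, 9),
}


def pic_subunit_range(counts):
    """Return the (minimum, maximum) total number of subunits."""
    low = sum(minimum for minimum, _ in counts.values())
    high = sum(maximum for _, maximum in counts.values())
    return low, high


if __name__ == "__main__":
    low, high = pic_subunit_range(PIC_SUBUNITS)
    print(f"PIC subunits: {low} to {high}")
```

The lower bound of this sum corresponds to the "at least 48" figure in the text. TFIIA is not among the counts listed there, so it is not included in the tally.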

In recent years, the holoenzyme model has gained preference in the transcription-oriented community. Evidence in support of the holoenzyme model includes the isolation of the large holoenzyme complex from a variety of species and the finding that artificial recruitment of holoenzyme subunits (out of order according to the stepwise model) to a promoter leads to high levels of transcription (Koleske and Young, 1994; Berk, 1999). Furthermore, early purifications of PolII lacking some members of the holoenzyme can be accounted for by low abundance relative to PolII and unfavorable protein-protein interaction conditions in the purification schemes (Koleske and Young, 1994; Berk, 1999). However, arguing against the importance of the holo-form of the enzyme is the finding that quantitative immunoblots of HeLa extracts demonstrated that only 3% of soluble PolII was present in a holoenzyme-sized complex (Kimura et al., 1999).

The stepwise model implies that multiple assembly steps are involved in formation of the PIC. Conversely, the holoenzyme model suggests that there are only two major regulatory steps in PIC formation: recruitment of either TFIID or holoenzyme to the promoter and subsequent recruitment of the reciprocal complex through protein-protein interaction (Koleske and Young, 1995). In either case, recognition of a promoter by TFIID is certainly a prominent regulatory step in the initiation of transcription by PolII.


TATA Binding Protein and TFIID

The saddle-shaped TBP constitutes the core of the TFIID complex. This evolutionarily ancient protein is essential for recognition of promoters by all three eukaryotic RNA polymerases, as well as the archaeal RNA polymerase (Rowlands et al., 1994; reviewed by Buratowski, 1994, and Burley and Roeder, 1996). Promoter specificity for DNA-dependent RNA polymerase I (PolI), which transcribes ribosomal DNA, is determined by selectivity factors. These selectivity factors are selectivity factor 1 (SL1) in humans (Learned et al., 1985), TIF-IB in mice (Clos et al., 1986), and CF in yeast (Lin et al., 1996). These selectivity factors contain TBP and three TBP-associated factors (TAFIs: TAFI48, TAFI63, and TAFI110) (Comai et al., 1994; Zomerdijk et al., 1994). The TAFIs contact the consensus elements in ribosomal DNA core promoters and are vital to PolI transcription (Learned et al., 1985; Clos et al., 1986). In addition, TAFIs contact a transcriptional activator that enhances SL1 binding and increases PolI transcription (Beckmann et al., 1995; Steffan et al., 1996).

RNA polymerase III (PolIII) is responsible for the transcription of transfer RNAs, 5S ribosomal RNA, small ribonuclear protein RNAs, and small nuclear RNAs (snRNAs, which in some cases are also transcribed by PolII) (Carmo-Fonseca et al., 2000). In mammals, TBP plus the PolIII selectivity factor PTF/SNAPc (proximal-sequence element-binding transcription-factor/small nuclear RNA-activating protein complex) recognize the snRNA promoters (Murphy et al., 1992; Sadowski et al., 1993). PTF/SNAPc is composed of four subunits that bind core snRNA promoter elements and stabilize TBP binding at these promoters (Yoon and Roeder, 1996; Mittal and Hernandez, 1997). In yeast, two essential TAFIIIs bind TBP, forming the TFIIIB complex, which functions as an snRNA promoter selectivity factor (Margottin et al., 1991; Joazeiro et al., 1994).

In addition to its role in PolI- and PolIII-mediated expression, TBP also serves as a promoter selectivity factor for PolII within the TFIID complex. In early experiments, TBP was thought to be the sole component of TFIID. These experiments showed that TBP alone could form the foundation of the PIC in vitro (Peterson et al., 1990). TBP is a saddle-shaped protein with near symmetry (Nikolov et al., 1992). The concave surface of TBP interacts with the minor groove of the DNA TATA element, and the convex surface is left open to protein-protein interactions (Chasman et al., 1993; Kim et al., 1993a; Kim et al., 1993b; Albright and Tjian, 2000). The C-terminal stirrup of TBP interacts with the 5’-end of the TATA element and TFIIB, which facilitates the directionality of the TATA interaction and transcription (Nikolov et al., 1995). The combined interaction of TBP and TFIIB with TATA leads to an 80° bend in the promoter (Kim et al., 1993b) that is believed to lead to a wrapping of DNA from –60 to +40 around PolII (Coulombe, 1999; Coulombe and Burton, 1999).

Later experiments demonstrated that TBP alone could not initiate activator-dependent transcription with some transactivators (Hoey et al., 1990; Kambadur et al., 1990; Peterson et al., 1990; Pugh and Tjian, 1990; Burley and Roeder, 1996). Moreover, it was found that TBP binding to DNA is not required at TATA-less promoters that contain an Initiator element (Zhou et al., 1992; Martinez et al., 1994). These data suggested that TFIID might be composed of TBP and accessory factors (TAFIIs, referred to as TAFs from here forward). When Dynlacht et al. (1991) found peptides associated with TBP that provided coactivator function, it was realized that TFIID was composed of more than just TBP. It is now confirmed that TBP and 8 to 14 TAFs comprise TFIID (Dynlacht et al., 1991; Sanders and Weil, 2000). Therefore, TBP recruitment (often a limiting step of PolII transcription) (Klein and Struhl, 1994; Chatterjee and Struhl, 1995) can be achieved by targeted recruitment of TAFs to a promoter. Furthermore, some of the TAFs (in addition to TBP) form the promoter-specific DNA-interacting surface of TFIID, probably increasing the stability of a promoter-TFIID interaction (Martinez et al., 1994; Burke and Kadonaga, 1997; Chalkley and Verrijzer, 1999). Since SL1, PTF/SNAPc, and TFIID all rely on TBP, they must be formed by mutual exclusion of the various TAF protein-protein interactions with TBP. This model has been experimentally demonstrated in that TAFIs exclude TAF2 and TAF1 binding to TBP, and vice versa (Comai et al., 1994).

Arabidopsis TBP has been well characterized and indeed was the first TBP structure from any organism to be elucidated by X-ray crystallography (Nikolov et al., 1992). Many subsequent papers described the structure of Arabidopsis TBP in complex with the TATA-box, TFIIA, and TFIIB and various combinations thereof (Kim et al., 1993a; Kim and Burley, 1994; Nikolov and Burley, 1994; Nikolov et al., 1995). The Arabidopsis TBP was utilized in these studies not because of an interest in the mechanisms of plant gene regulation, but rather because it closely approximates a TBP core structure, lacking the long N-terminal extensions found in metazoans. Nonetheless, the structure of the Arabidopsis TBP, its DNA-sequence recognition sites (Mukumoto et al., 1993), bending of the TATA element (Takeda et al., 1994), interactions with transcriptional regulators (Qadri et al., 1995; Reindl and Schoffl, 1998; Le Gourrierec et al., 1999; Pan et al., 1999), and its effect on growth of Arabidopsis when overexpressed (Li et al., 2001) have been well characterized.

TATA Binding Protein-Associated Factors

TAFs may have a wide range of activities (such as coactivation, repression, and even protein modification) that are thus bestowed on the TFIID complex (Albright and Tjian, 2000). It is now accepted that the TFIID complex contains at least 14 TAF subunits. Some of these subunits have enzymatic activity, while many do not have apparent catalytic properties. Most of the TAFs are in stoichiometric unity, while some are in multiple copies in a complex, and others are substoichiometric. These proteins have been studied extensively, and this work will be discussed below.

Histone-like TAFs

It is believed that a structural core of TFIID is composed of the histone-like TAFs. Yeast TAF6 (yTAF6), yTAF9, and yTAF12 have similarity with histones H4, H3, and H2B, respectively. Additionally, evidence supports a theory that yTAF4 has structural similarity with the histone H2A, forming a heterodimer with yTAF12 (Gangloff et al., 2000; Sanders and Weil, 2000). Humans contain two distinct TAF4 homologs: TAF4 and TAF4b. TAF4b appears to be a likely target of activators responsible for transcription of genes in B-cells, specifically the κ-promoter or enhancer (Dikstein et al., 1996b). Interestingly, TAF4b protein levels are post-transcriptionally regulated so as to reduce the protein levels in non-B-cells to below detection limits (Dikstein et al., 1996b). The crystal structure of Drosophila TAF9 and TAF6 (dTAF9 and dTAF6) displays histone-fold domains (HFDs) that interact with one another, forming a dTAF9-dTAF6 dimer (Xie et al., 1996). TAF dimerization is critical to TFIID formation because TAF HFDs are predicted not to fold properly when the binding partner is lacking (Burley and Roeder, 1996).

It has been hypothesized that some of the histone-like TAFs bind DNA and form a structure similar to a nucleosome, perhaps in conjunction with the TBP-TFIIB bending of DNA (Coulombe, 1999). This hypothesis was supported by data showing that TFIID introduces a negative supercoil in bound DNA, much like a nucleosome (Oelgeschlager et al., 1996). However, TAFs do not contain the arginine residues found in histone tails that bind the minor groove of DNA (Luger et al., 1997; Workman and Kingston, 1998; Albright and Tjian, 2000). Additionally, structural evidence argues against a nucleosome-like organization. Human TFIID is composed of four dimensionally equivalent domains, none of which is large enough to contain the histone-fold TAF-octamer (Brand et al., 1999). Despite this evidence, DNase I footprinting experiments suggest that DNA is in some way wound around TFIID, approximately one turn as opposed to the two turns characteristic of nucleosomes (Michel et al., 1998). Furthermore, dTAF6 and dTAF9 contact a conserved downstream promoter element (DPE) in photo-crosslinking experiments (Burke and Kadonaga, 1997). Selleck and colleagues (Selleck et al., 2001) demonstrated that bacterially expressed yeast histone-like TAFs 4, 6, 9, and 12 can self-assemble into a TAF octamer in a 2:2:2:2 ratio in vitro. Overexpression of these yeast histone-like TAFs individually has the capacity to suppress mutations in the other members of the octamer (Selleck et al., 2001). Additionally, by co-immunoprecipitation, TAF4 and TAF4b have been shown to be present within the same TFIID complex, indicating that there are likely two copies of TAF4 in TFIID (Dikstein et al., 1996b). This evidence constitutes strong support for a histone-like octamer in TFIID, or some variation thereof. Irrespective of DNA binding, multiple, possibly interchangeable, histone-fold proteins within TFIID suggest that this complex has a flexible composition modulated in response to cellular signals (Albright and Tjian, 2000).

More recently, it has been realized that other TAFs contain conserved histone-fold domains, such as TAF3, TAF8, TAF10, TAF11, and TAF13 (Leurent et al., 2002). In addition to the TAF6-TAF9 association, a number of TAF heterodimers containing HFDs have been shown to form, including human TAF11-TAF13 (hTAF11-hTAF13) (Birck et al., 1998), human and yeast TAF4-TAF12 (Gangloff et al., 2000; Reese et al., 2000; Sanders and Weil, 2000), TAF3-TAF10 (Gangloff et al., 2001b), and TAF8-TAF10 (Gangloff et al., 2001b). TAFs 11 and 13 are unique in that they are a histone-fold binding-pair that is specific to the TFIID complex (Grant et al., 1998). However, it is believed that they are structural and functional orthologs of the Spt3 subunit of the Spt-Ada-Gcn5-acetyltransferase (SAGA) complex (Apone et al., 1998; Birck et al., 1998).

The full complement of putative histone-like TAFs has been identified in Arabidopsis thaliana, with the exception of TAF3 (Table 1-1). Several of the putative Arabidopsis TAFs are members of two-gene families. These include the TAF1/1b, TAF4/4b, TAF6/6b, TAF11/11b, TAF12/12b, TAF14/14b, and TAF15/15b genes.

TAF1 family

Proteins in the TAF1 family represent the largest TAFs in animal and plant cells, and have a bevy of biochemical roles including histone acetyltransferase (HAT) activity and protein kinase activity (Takada et al., 1992; Lee and Young, 1998). The protein kinase phosphorylates the large subunit of TFIIF (α or RAP74), but ultimately has an unknown downstream function (Dikstein et al., 1996a; O'Brien and Tjian, 1998). Yeast TAF1 can also phosphorylate TFIIA, apparently raising levels of transcription and increasing the affinity for TBP (Solow et al., 2001). The TAF1 family also has a recently identified ubiquitin-conjugating activity (ubac); however, the utility of this function is yet to be elucidated (Pham and Sauer, 2000). The substrate of ubac is histone H1, which it monoubiquitylates. Histone H1 monoubiquitylation may lead to transcriptional activation, since monoubiquitylation of histones H2A and H2B has been correlated with actively transcribed genes (Davie and Murphy, 1990).

Generally, TBP binding to the TATA box and PIC formation are inhibited by nucleosomes, suggesting that a condensed chromatin state directly inhibits transcription (Workman et al., 1991; Imbalzano et al., 1994; Workman and Kingston, 1998). Furthermore, a correlation between hyperacetylated (open) chromatin and regions that are transcriptionally active leads to the hypothesis that HAT activity functions in the remodeling of chromatin to activate transcription (Ayer, 1999; Grant and Berger, 1999; Wolffe and Guschin, 2000). The TAF1 family HAT activity implies that histone acetylation is important at the core promoter to aid transcription factor/chromatin contacts (Workman and Kingston, 1998). Furthermore, it is known that human TAF1 also acetylates TFIIFα and the β subunit of TFIIE (Imhof et al., 1997). This activity is known as Factor Acetyltransferase (FAT) activity; however, the significance of FAT activity is unclear (Grant and Berger, 1999). The yTAF1 FAT/HAT might acetylate histones near the core promoter, basal transcription factors, or other unknown protein targets (Jacobson et al., 2000). The FAT/HAT and kinase activities may function in conjunction as a signal transduction cascade targeting GTFs and histones and ultimately result in gene activation (Albright and Tjian, 2000; Jacobson et al., 2000). Two putative Arabidopsis TAF1 homologs (AtTAF1 and AtTAF1b) also appear to have these conserved domains responsible for the activities described above (HAT/FAT, protein kinase, and ubiquitin-conjugating activity; E. Czarnecka-Verner and W.B. Gurley, unpublished data).

At the C-terminus, human and Drosophila TAF1 contain two tandem bromodomains that are known to bind acetylated lysine residues of histone H4 (Jacobson et al., 2000). The first bromodomain shown to bind acetylated histones (H3 and H4 NH2-terminal peptides) was from p300/CBP-associated factor (PCAF) (Dhalluin et al., 1999). A later binding study with hTAF1 found that the double bromodomain motif has an affinity for acetylated H4 peptide 70-fold greater than the single domain of PCAF (Dhalluin et al., 1999; Jacobson et al., 2000). Acetylation of four histone H4 lysines is correlated with transcriptional activity, raising the possibility that the role of the hTAF1 bromodomain(s) is to bind to the acetyl-lysines, thereby facilitating histone modification by the HAT domain (Jacobson et al., 2000). Interestingly, yeast TAF1 lacks bromodomains and a C-terminal kinase domain; however, a separate protein, bromodomain factor 1 (Bdf1), was identified that contains two bromodomains and a kinase domain and is found in association with TFIID (Matangkasombut et al., 2000). Due to these structural and functional similarities to the C-terminus of hTAF1, Bdf1 is hypothesized to be functionally analogous to the hTAF1 C-terminus.

In Arabidopsis, TAF1 and TAF1b each have a single bromodomain (W.B. Gurley,

unpublished data), and thus are hypothesized to have a limited affinity for acetylated

histone H4, perhaps as much as 70-fold lower affinity than hTAF1 (Jacobson et al.,

2000). Drawing on the parallel with yeast, it is hypothesized that Arabidopsis may also

express a protein analogous to Bdf1, supplementing the AtTAF1 and AtTAF1b single

bromodomains. However, analyses of Arabidopsis genomic sequence did not reveal a

Bdf1-analogous protein.

Another property of the TAF1 family is a capacity to auto-inhibit TFIID function

(Kokubo et al., 1993b). This regulatory property resides in the N-terminal domain

(TAND1) that acts as a TATA-element minor-groove mimic. Inhibition is achieved by

competition between the TAND1 and the TATA-box for binding to the concave surface

of TBP (Liu et al., 1998; Kotani et al., 2000). While the TAND1-TBP interaction may

add to the stability of the TFIID complex, it is not required for TFIID integrity (Albright and Tjian, 2000). A TAND1-adjacent domain TAND2 appears to stabilize the TAND1-

TBP interaction by binding the helix H2 on the convex face of TBP (Burley and Roeder,

1998). It has been shown through domain swapping that there is a functional

conservation between the TAND1 and acidic activation domains of transactivator proteins such as VP16 (Kotani et al., 2000). In domain-swapping experiments, acidic activator domains are capable of TFIID inhibition when translationally fused to yTAF1; conversely, TAND1 is capable of serving as an activator when fused to a DNA-binding domain (Kotani et al., 2000). This leads to the “two-step hand-off model” in which the auto-inhibition caused by TAND1 is competed by acidic activators. Subsequently, the

TAND2 interaction with the TBP convex surface is competed by TFIIA, ultimately leading to a cooperative removal of the inhibitory region of TAF1 from TBP (Figure 1-1)

(Kotani et al., 2000). In this model, the TFIIA-TFIID-acidic activator intermediate allows some TAFs to bind near the transcriptional initiation site (TAF2 binds Initiator,

TAF6/TAF9 dimer binds DPE) and leads to the TATA-box displacing the acidic activator

from the concave surface of TBP (Kotani et al., 2000). However, this model may not prove useful in humans or Drosophila where the TBP-TAND1 affinity is much higher

than that of yeast (Kotani et al., 2000). A study using HeLa heat-treated chromatin

demonstrated that TFIID had less capacity for transcriptional initiation than TBP alone,

possibly due to the TAND1-based inhibition of TBP in the absence of functional

transcriptional activators (Remboutsika et al., 2001).

The genomic region encoding A. thaliana TAF1b is lacking a sequence

corresponding to the yeast and metazoan TBP-inhibiting TAND1 and TAND2. On the

other hand, AtTAF1 appears to contain a TAND1 and TAND2 (E. Czarnecka-Verner and

W.B. Gurley, unpublished data). AtTAF1b seems to be the only example of a TAF1

homolog lacking a TAND1/2, possibly altering the transcriptional activation

characteristics of AtTFIID.

Interestingly, in a microarray experiment utilizing a temperature sensitive (TS)

mutant yeast only 16% of PolII promoters showed dependence on yTAF1 (Holstege et

al., 1998). Many of the genes that were down regulated upon temperature shift were cell

cycle and DNA repair genes (not unexpected given the original identification of yTAF1

as a cyclin). The surprise is in the low percentage of yTAF1-dependent genes. This

apparent low dependence on yTAF1 can be explained by the overlap in function between

TFIID and other TAF-containing complexes (such as SAGA) that do not require yTAF1.

Alternatively, the yTAF1 TS mutation could be leaky in its disruption of function, as seen

for other TS TAF mutations, and thus underrepresent the magnitude of the yTAF1 contribution

to gene regulation in these experiments (Michel et al., 1998).

Other TAFs and interactions of TFIID

Many other TAFs are known, but for the most part their functions are quite nebulous. The metazoan TAF2 family represents one exception in that it has been shown to bind the core promoter Initiator element. This was shown directly by DNase I footprinting and electrophoretic mobility shift assays (EMSA) with recombinant

Drosophila TAF2 (Albright and Tjian, 2000). However, it seems unlikely that this should be the only function of TAF2 given its large size (~150 kDa). For example, TBP binds a specific DNA sequence with only 20% of the mass of TAF2. With so many properties attributed to the TAF1 family, it would be intellectually satisfying to find other roles for TAF2. Interestingly, consensus sequences for plant and fungal Initiator elements have not been defined, and it is unknown whether the TAF2 proteins in these kingdoms bind to core promoter elements.

Other than TAF1, the only other stoichiometric TAF with enzymatic activity is

TAF7 (M. Horikoshi, personal communication). TAF7 from humans and yeast (but not

Drosophila or Arabidopsis) has similarity with the von Willebrand factor type A domain

(VWA). Interestingly, the majority of VWA-containing proteins are extracellular.

However, very ancient VWA-containing proteins (found in all eukaryotes) are intracellular proteins involved in various cellular tasks such as transcription, DNA repair, ribosomal and membrane transport, and the proteasome protein degradation pathway

(Colombatti et al., 1993). One feature common to these proteins is the formation of multiprotein complexes (Colombatti et al., 1993). It is important to note that yTAF7 most closely resembles the ATPases associated with diverse cellular activities (AAA)

ATPase family of VWA-containing proteins. A yeast two-hybrid experiment from the lab of Laurie Stargell recently demonstrated that of all the TAFs, only yeast TAF7 was

directly associated with TBP (Yatherajam et al., 2003). This, taken together with the fact

that the TAND1 domain of TAF1 binds to the concave surface of TBP, inhibiting TBP

dimerization and promoter binding, may suggest that a role of some TAFs is to prevent

and dissociate nonproductive TBP interactions. Gegonne et al. (2001) demonstrated that

hTAF7 bound hTAF1 and inhibited the FAT/HAT activity. Therefore, I propose an

alternative function for the putative TAF7 ATPase activity, namely to shut off the TAF1 FAT/HAT activity by acting as a chaperone protein. Regardless of the veracity of these models, the developing TAF7 story is of great interest.

Other classical TAFs have no recognized role besides protein-protein or protein-

DNA interactions. However, it is thought that these interactions may stabilize the PIC

(Burley and Roeder, 1996). The recent work of Yatherajam et al. (2003) elaborated the

binary protein-protein interactions within the TFIID complex of yeast. These interactions

and others for TFIID from humans, Drosophila and yeast are assembled in Table 1-2. It

is significant that a large number of the binary interactions of yeast TFIID are nucleated

by five histone-fold TAFs (TAFs 4, 6, 9, 10, and 12), four of which (TAFs 4, 6, 9, and

12) are proposed members of a nucleosome-like octamer (Yatherajam et al., 2003). In

addition, a large number of protein-protein interactions occur between TAFs and other

GTFs, as will be discussed in detail below.

Human TAF10, which has affinity for the estrogen receptor and thus may play a

role in estrogen-dependent activation, is found in only a subset of cellular TFIID

complexes (Jacq et al., 1994). Nevertheless, a TS mutation of yTAF10 was shown to

impede bulk transcription of mRNA and destabilize TFIID and SAGA upon temperature

shift (Sanders et al., 1999). These results suggest a more general role for yTAF10 than

for its human homolog hTAF10. Another example of a TFIID specialization is seen with hTAF13 (which binds hTAF11; see above). The human TAF11-TAF13 pair may be an alternative to hTAF10 because it is found only in TFIID complexes lacking hTAF10

(Jacq et al., 1994; Mengus et al., 1995).

Several TAFs (besides the TAF2, TAF6, and TAF9 families mentioned above) contact promoter DNA, and these include hTAFs 1, 4, 5, and 7 (Oelgeschlager et al., 1996).

These interactions aid in the stabilization of promoter-TFIID interactions. TBP interactions with the TATA box have similar affinity to TAF interactions with promoter

DNA (Purnell et al., 1994); therefore, TAF-promoter binding is critical for transcription from TATA-less promoters (Bell and Tora, 1999). Thus, it is hypothesized that some

TAFs function in recruiting TFIID to TATA-less promoters, participating in promoter recognition by TFIID (Bell and Tora, 1999).

The TAF15 family of TAFs is a group of proto-oncoproteins that are common sites of translocations in human sarcomas (Bertolotti et al., 1996; Attwooll et al.,

1999). These are hTAF15, TLS/FUS (translocated in liposarcoma/fusion of CHOP), and

EWS (Ewing sarcoma), all of which are RNA-binding proteins with high similarity to the RNA-binding domain (RNP-CS). TAF15 binds not only RNA but also single-stranded DNA

(ssDNA) (Bertolotti et al., 1996). Like TAF15, TLS/FUS and EWS associate with TFIID in a mutually exclusive manner (Bertolotti et al., 1996; Bertolotti et al., 1998). TAF15 and EWS contact exactly the same subunits of TFIID (Bertolotti et al., 1998), suggesting that EWS (and possibly TLS/FUS) are TAF15b (and TAF15c) proteins. Similarly to

TAF14, TAF15 and EWS are also associated with another core transcription complex, in this case PolII (Bertolotti et al., 1998). TAF15 and EWS contacted the hRPB3 subunit

(Bertolotti et al., 1998). However, only TAF15 contacted hRPB5 and hRPB7 (Bertolotti et al., 1998).

Recent work from the laboratory of Stephen Buratowski (presented at the 22nd Summer

Symposium in Molecular Biology: Chromatin Structure and Function),

in which a large-scale purification of TFIID was performed from yeast cells, has

demonstrated sub-stoichiometric association of four ubiquitin machinery proteins with

TFIID (Auty et al., 2003). Although these proteins (BRE5, BUL1, UBP3, and UBP8) are found in many other complexes, they may be operationally defined as TAFs due to their association in TFIID. Yeast TAF1 does not have a demonstrated ubiquitylation activity, nor the domains associated with this activity (Wassarman and Sauer, 2001), as reported for Drosophila TAF1 (Pham and Sauer, 2000). Perhaps the two ubiquitin-

conjugases are in some way adopting this role in yeast. Recent evidence from the

Drosophila melanogaster genome-wide protein-protein interaction study shows a ubiquitin conjugase interaction with TAF10 (Giot et al., 2003), suggesting that the presence of ubiquitylation machinery in TFIID is conserved (and not due to a complementation of an activity that yeast TAF1 is lacking).

Proteins involved in ubiquitylation (such as the E2-conjugases) could be involved

in activation of transcription and lead to degradation of inhibitory proteins (e.g., histones) or may result in a rapid turnover of transcriptional activators, which would be critical for shutting off a promoter after the activation triggers are removed. However, the role of the two ubiquitin-hydrolases is more mysterious. It has recently been suggested by

Shelly Berger (2003) that histone H2B ubiquitylation, followed rapidly by de-ubiquitylation, is required for histone H3 methylation on K4 and K36, resulting in gene

activation. If this is the case, the presence of both ubiquitylation and de-ubiquitylation enzymes in one complex may be mechanistically linked for gene activation.

Little information is available about the TFIID complex in plants. A crude preparation of TFIID has been purified from wheat germ, which appears to be a stable

complex similar to that from metazoans (Washburn et al., 1997). Unfortunately, this

purified complex was sparse, not homogeneous, and did not lead to the identification of

any subunits.

Some information on subunits of TFIID from plants is beginning to become available.

In Arabidopsis, TAF10 was found to interact with a ubiquitin conjugase as tested by

yeast two-hybrid screen (S.J. Lawit, P. Michaluk, E. Czarnecka-Verner, W.B. Gurley,

unpublished results), suggesting that plant TFIID complexes also include ubiquitylation

machinery. Also, Tamada et al. (2003) have shown that the Arabidopsis TAF10 is

transcriptionally regulated to a high degree, as it is highly expressed in developing

tissues but expressed below detection levels in mature tissues. Along this line, TAF10 is

not expressed in non-reproductive tissues following bolting of the inflorescences of

Arabidopsis. Such a close tie to development is consistent with biochemical information

on human TAF10 that is present in a subset of TFIID complexes and interacts with the

estrogen receptor, potentially playing a part in development.

Alternative TBP- or TAF-Containing Complexes

Several protein complexes other than TFIID contain TBP or TAFs, blurring the

lines of what can be considered general transcription factors. At least four types of

coactivators interact with TBP and display promoter selection properties (for review see

Lee and Young, 1998). These are TAFIs, TAFIIs (TAFs), TAFIIIs, and PTF/SNAPc.

Other coactivators/corepressors/GTFs such as SAGA, Mot1, NC2, Nots, and TFIIA,

along with TAFs, play important roles in regulating expression of mRNA (Lee and

Young, 1998; Mitsiou and Stunnenberg, 2000). The SAGA complex does not copurify

with TBP; however, multiple SAGA subunits do bind TBP individually (Spt3, Ada2,

Ada5/Spt20) and may recruit it to a promoter (Eisenmann et al., 1992; Barlev et al.,

1995; Roberts and Winston, 1996; Saleh et al., 1997; Sterner et al., 1999). Interestingly, western blots have demonstrated that TBP in yeast is around ten-fold more abundant than

TAFs, SAGA, BTAF1 (Mot1), NC2, and Nots (Lee and Young, 1998). This is consistent with the observations that TBP may be a component of a variety of protein complexes.

One alternate TFIID complex, B-TFIID, is found in yeast and humans. Human B-

TFIID is capable of nucleating basal transcription, but is unresponsive to activators much like TBP alone (Chang and Jaehning, 1997). B-TFIID functions as a global transcriptional co-repressor and was initially believed to contain several core TAFs

(Wade and Jaehning, 1996; Chang and Jaehning, 1997). Later it was established that B-

TFIID consisted of TBP and at least one TAF (BTAF1, or Mot1), but not the full complement of TAFs (Auble et al., 1994; Wade and Jaehning, 1996). BTAF1 is an essential protein in yeast and affects only a subset of the organismal transcriptome in microarray studies, possibly through promoter recruitment (Wade and Jaehning, 1996) or adenosine triphosphate (ATP)-dependent release of the rate-limiting TBP from high-affinity TATA elements for use at lower-affinity promoters (Collart, 1996). This second model is more likely since BTAF1 seems to function through dissociating TBP from

DNA in an ATP-dependent manner (Lee and Young, 1998).

Other alternative TAF complexes are hTFTC, hPCAF, ySAGA, human SPT3-

TAFII31-GCN5-L acetyltransferase (hSTAGA), ySLIK (yeast SAGA-like), and

ySALSA (yeast SAGA altered Spt8 absent), etc. These complexes are quite similar in structure and function (Struhl and Moqtaderi, 1998). Significantly, none of these complexes contain either TBP or TAF1; however, each complex contains HAT activity and a subset of TAFs (only four to five of nine histone-fold motif TAFs) (Bell and Tora,

1999). Interestingly, some well-characterized histone-binding partners are replaced in

SAGA and other alternative complexes. Examples include hTAF11-hTAF13 partnering being replaced by an intramolecular Spt3 histone-fold pairing, and yTAF12 being paired with Ada1 (Birck et al., 1998; Gangloff et al., 2000).

Outside of TFIID, perhaps the most recognized TAF-containing complex is the yeast SAGA and its human counterpart STAGA (Green, 2000). SAGA is a 1.8 – 2.0

MDa complex containing TAFs, ubiquitin-machinery proteins, Spt, Ada

(alteration/deficiency in activation), and Gcn5 subunits (Grant et al., 1998; Lee and

Young, 1998; Grant and Berger, 1999; Berger, 2003). The TAFs that copurify with

SAGA are TAF5 (WD40 domain), 6 (H4-like), 9 (H3-like), 10 (histone-fold domain), 12

(H2B-like), and 13 (histone-fold domain) (Grant et al., 1998; Grant and Berger, 1999).

The presence of ubiquitin-machinery proteins in SAGA, which like TFIID is a TAF-containing complex, further supports a functional relationship between TAFs and ubiquitylation. Yeast

Gcn5 (a HAT) is additionally found in other large protein complexes containing Ada proteins (Grant and Berger, 1999).

Yeast SAGA is somewhat redundant with other complexes including TFIID,

SWI/SNF (Switch/Sucrose non-fermenting, a chromatin remodeling complex), and suppressor of RNA polymerase B (SRB)/Mediators (Grant and Berger, 1999). A temperature sensitive mutation of yTAF9, a member of both TFIID and SAGA, tested by

a microarray experiment demonstrated that 67% of PolII genes require yTAF9 (Apone et

al., 1998; Holstege et al., 1998). A TS mutation in yTAF10 (also a member of the same

complexes) showed that it was also required for bulk mRNA expression (Sanders et al.,

1999). Similarly, a TS mutation in yTAF11 (a member of TFIID only) also decreased

PolII transcription to background levels when tested under the nonpermissive temperature

(Komarnitsky et al., 1999). However, in a similarly tested Gcn5 mutant, Gcn5 was required by

only 5% of PolII-transcribed genes (Holstege et al., 1998). An even more dramatic

demonstration of the functional redundancy of SAGA is the fact that a deletion of Spt20

causing complete loss of SAGA does not create an apparent deficiency in transcription as

monitored by total mRNA levels (Berk, 1999). Interestingly, mutants of Ada1 and Spt20 individually had more dramatic phenotypes than mutants with inactivated or eliminated

Gcn5, suggesting that HAT activity may not be the major role of SAGA (Struhl and

Moqtaderi, 1998). While SAGA function may not be requisite for viability, the association of TAFs and other transcription regulators may convey the potential for

SAGA to regulate at many different promoters (Grant et al., 1998).

Recent work by Pugh and co-workers (Huisinga and Pugh, 2003; Pugh, 2003) has further elaborated the role of SAGA (and the related SLIK and SALSA complexes).

Microarray analysis of Gcn5 deletion mutants demonstrated that 10% of the genome was activated by these Gcn5 HAT complexes, whereas a strict TAF1 TS mutant demonstrated a non-overlapping 90% dependence on TFIID (Huisinga and Pugh, 2003). Interestingly,

46% of the Gcn5-dependent genes are stress-inducible; thus nearly 1/3 of all stress-inducible genes in yeast are dependent upon SAGA for activation (Huisinga and Pugh,

2003). Pugh’s group has also discovered that the 90% of the promoters regulated by

TFIID are TATA-less, while those that are SAGA-dependent contain TATA-boxes

(Pugh, 2003). This is explained most easily by TFIID being able to correctly position

TBP at TATA-less promoters by having a firmly incorporated TBP and being able to read other contextual cues in a promoter (i.e., the DPE and Initiator element, if present). On the other hand, SAGA may not be able to read such core promoter elements since it lacks several TAFs involved in promoter recognition. This suggests that SAGA may be able to recruit TBP, but not position it correctly.

In recent years evidence has arisen that suggests a SAGA-like complex exists in

Arabidopsis. The presence of a SAGA-like complex suggests that AtTAFs may interact with alternative complexes like TAFs in other organisms (Stockinger et al., 2001;

Vlachonasios et al., 2003). Vlachonasios et al. (2003) in a series of microarray experiments demonstrated that Arabidopsis Ada2b and Gcn5 regulate 5% of the genes represented in the 8,200-gene Affymetrix array. This result is strikingly similar to the findings for yeast Gcn5, suggesting a very similar extent of regulation by a potential

SAGA-like Arabidopsis complex.

The human PCAF complex is structurally related to SAGA, containing many homologous subunits. In addition to several hAda subunits, hTAFs 9, 10, and 12 are also found associated with PCAF, which has approximately 20 subunits in all (Ogryzko et al.,

1998). The histone H4-like hTAF6 is missing from the PCAF complex, but is apparently replaced in the histone-octamer-like structure by an ortholog with 42% similarity (PCAF associated factor – TAF6L) (Ogryzko et al., 1998). There is also an hTAF5 ortholog with WD40 repeats, TAF5L (Ogryzko et al., 1998).

Drosophila TBP-related factor 1 (TRF1) is expressed only in neuronal tissues and is a component of yet another protein complex with similarities to TFIID. However, this complex contains no bona fide TAFs (Hansen et al., 1997). TRF (TRF2 or TATA binding protein-like, TLF) homologs are found in humans, Drosophila, and C. elegans with a unique subset of TRF associated factors (Chang and Jaehning, 1997; Albright and

Tjian, 2000). An interesting finding from the laboratory of Robert Roeder (Xiao et al.,

1999) places a human TRF proximal (hTRFP) in the mediator complex. The function of the various TRFs appears to be, in general, to mediate transcriptional responses

(potentially not mediated by TFIID) by a variety of activators.

TBP-free TAF-containing complex (TFTC) is a human protein complex with similarities to TFIID. TFTC lacks TBP and TAF1, but contains most other TAFs (Grant and Berger, 1999). In addition to TAFs, TFTC includes hAda3, hSPT3, and hGcn5L, which provides a HAT activity in place of TAF1 (Grant and Berger, 1999). TFTC can functionally substitute for TFIID at both TATA-containing and TATA-less promoters by nucleating PIC assembly (Wieczorek et al., 1998; Bell and Tora, 1999; Grant and Berger,

1999).

The existence of multiple TAF-containing protein complexes with HAT activity emphasizes that chromatin remodeling is essential for transcriptional activation of many promoters. In addition, this multiplicity of TAF/HAT complexes suggests a functional redundancy in activator complexes. These arguments imply that TBP and TAF recruitment to promoters is complex and the role of specific TAF-containing complexes is not well understood, even in the well-studied metazoans (Lee and Young, 1998).

TAFs: Required Factors or Optional Accessories

Several studies indicate that TAFs may have redundant or even optional roles in

transcription of PolII-dependent genes. For instance, several well-studied, strong

activators such as VP16 and Gal4 have redundant activation motifs that interact

separately with TFIID and/or holoenzyme (Chang and Jaehning, 1997). This redundant

interaction suggests that strong activators may be capable of activating transcription in

the absence of TFIID, by contacting holoenzyme through other GTFs such as SRBs,

TFIIA, and TFIIB (Burley and Roeder, 1996). Furthermore, in human embryonal

carcinoma cells, a novel TFIIA-TBP complex has been identified that is capable of

activating transcription but completely lacks TAFs (Mitsiou and Stunnenberg, 2000). In

addition, several novel coactivator complexes in mammalian systems such as vitamin D3

receptor interacting proteins/activator-recruited factor (DRIP/ARC), thyroid-hormone

associated protein complex (TRAP), and cofactor required for Sp1 activation (CRSP)

completely lack TBP and TAFs (Fondell et al., 1996; Naar et al., 1999; Rachez et al.,

1999; Ryu et al., 1999). Taken together, TFIID (and TAFs) may be optional accessories

to transcription. Alternatively, protein complexes other than TFIID and SAGA

(containing TAF subunits or other coactivators) such as TFTC and TRAP can

compensate for the lack of TFIID and SAGA under some conditions (Albright and Tjian,

2000).

Early studies employing TAF TS mutations resulting in down-regulated expression

and targeted degradation of TAFs in yeast did not demonstrate promoter dependency on

these coactivators. In some studies, PolII holoenzyme alone (no TAFs present) supported

activated transcription in vitro (Koleske and Young, 1994; Berk, 1999). However, in apparent contradiction to earlier results, experiments utilizing TS mutations demonstrated

significant promoter-dependency on TAFs. Michel et al. (1998) hypothesized that this was due to the use of tighter TS mutations than in previous studies, and that only TS mutations causing rapid cessation of growth upon temperature shift were appropriate for such studies (Berk, 1999). In fact, using “tight” TS mutants, a significant loss of transcription was observed only 30 min after the shift to the nonpermissive temperature

(Michel et al., 1998). This temperature shift also caused rapid degradation of not only the mutated protein but also the other proteins of the TFIID and SAGA complexes

(Michel et al., 1998). This result suggested that PolII cannot transcribe without a functional TFIID complex (Michel et al., 1998). Interestingly, temperature shift also resulted in a degradation of western blot-detectable TAFs and two thirds of TBP (Michel et al., 1998; Berk, 1999). This suggests that two thirds of TBP is associated with TFIID, while the other one third is either free or bound by TAFIs or TAFIIIs (Berk, 1999).

A recent study utilizing an inducible depletion strategy for chicken TAF9 (histone

H3-like TAF) demonstrated a high level of PolII transcription without detectable levels of chicken TAF9 (Chen and Manley, 2000). This elegant experimental system measured transcription directly through pulse-labeling and included steady-state measurements of several gene transcripts. This apparent inconsistency with yTAF TS experiments was mainly explained by an increased functional redundancy in the transcriptional machinery of higher eukaryotes (Chen and Manley, 2000). Such alternative mammalian complexes as PCAF and TRAP were proposed to play much larger roles than SAGA in yeast (Chen and

Manley, 2000).

Other studies lead to the conclusion that different promoters have distinct dependencies on TAFs for their expression. In yeast, TAF-independent (TAFind) and

TAF-dependent (TAFdep) promoters were identified using the TAF TS mutants. After

temperature shift of these mutants, transcript profiling was used to detect genes that were

transcriptionally dependent or independent of the various TAFs (Li et al., 2000). It was

established with the use of formaldehyde DNA-crosslinking chromatin

immunoprecipitations (ChIPs) that TAFdep promoters recruited TAFs and TBP at similar

levels (Li et al., 2000). However, ChIP of TAFind promoters indicated that TAFs were

only present at background levels (Li et al., 2000). These TAFind promoters still recruited

TBP, apparently sans TAFs, as TBP bound all promoters at levels that correlated well

with transcript accumulation (Li et al., 2000).

Interestingly, when yeast TBP was inactivated in a temperature shift experiment,

TAFs continued to be recruited to TAFdep promoters (Li et al., 2000). In general, TAFdep promoters recruit TAFs in an activator-dependent fashion, independent of other GTFs (Li et al., 2000). For instance, the binding of TBP with TAFs to a TAFdep promoter (the yeast RPS5 promoter) is lost after removal of the activator binding sites, but inactivation of TFIIB or SRB4 had minimal effect on binding of TBP to the TAFdep (ACT1 and

RPS5) promoters (Li et al., 2000). This is compelling evidence for a model in which

yeast TAFs are directly targeted for recruitment by transcriptional activators, in

parallel pulling TBP to the promoter and nucleating PIC assembly (Li et al., 2000). On the

other hand, inactivation of TFIIB or SRB4 substantially reduced binding of TBP to a

TAFind (ADH1) promoter (Le Gourrierec et al., 1999). This evidence seems to indicate

that holoenzyme recruits TBP (sans TAFs) to TAFind promoters, leading to stabilization

of each other’s interaction with the promoter.

It is clear that most TAFs are essential to yeast survival (Green, 2000). Work with temperature sensitive mutants of the histone-like yTAFs demonstrated that they were essential to PolII transcription as well (Michel et al., 1998). Michel et al. (1998)

made the argument that loss of SAGA did not cause a large drop-off in transcription because most of its components were not required for yeast viability or PolII transcription. Therefore, it seems that TFIID is required for viability, but not necessarily for correct transcription of every PolII-dependent gene. From the evidence accumulated to date, the model of TFIID serving as the major PolII coactivator seems secure in most organisms. However, there is clearly significant redundancy of coactivator complexes in yeast and metazoans. With so little known about similar coactivators in plants, it is futile at this point to postulate how their transcription is regulated.

Interplay of GTFs

Assembly of the PIC involves many GTFs and nearly an order of magnitude greater number of individual proteins. Therefore, implicit in formation of this complex are a large number of binary protein-protein interactions involving intra-GTF and inter-GTF binding partners. The interactions between TFIIA, TFIIB, TFIID, TFIIE, and TFIIF are summarized in Table 1-3 and in Figures 1-2, 1-3, and 1-4 for humans, Drosophila, and yeast, respectively. Just a few examples of the many known TAF-GTF interactions are dTAF9/hTAF9 with TFIIB; yTAF14 with TFIIF; hTAF6 with TFIIF and TFIIE; and dTAF4/hTAF4 with TFIIA (Tjian and Maniatis, 1994; Burley and Roeder, 1996).
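Binary interactions of this kind lend themselves to a simple graph representation. The following is a minimal sketch, not a method used in this work: it encodes only the example TAF-GTF pairs named in the paragraph above as an undirected adjacency map. The names `TAF_GTF_INTERACTIONS`, `build_interaction_map`, and `partners` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Binary TAF-GTF interactions exactly as listed in the text above.
# Species prefixes: d = Drosophila, h = human, y = yeast.
TAF_GTF_INTERACTIONS = [
    ("dTAF9", "TFIIB"),
    ("hTAF9", "TFIIB"),
    ("yTAF14", "TFIIF"),
    ("hTAF6", "TFIIF"),
    ("hTAF6", "TFIIE"),
    ("dTAF4", "TFIIA"),
    ("hTAF4", "TFIIA"),
]


def build_interaction_map(pairs):
    """Return an undirected adjacency map {protein: set of partners}."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def partners(graph, protein):
    """Return a sorted list of partners (empty if the protein is absent)."""
    return sorted(graph.get(protein, set()))


interaction_map = build_interaction_map(TAF_GTF_INTERACTIONS)
print(partners(interaction_map, "hTAF6"))
```

Storing each pair in both directions means the map can be queried from either the TAF side or the GTF side, which mirrors how binary interaction tables are usually read.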

While TFIID is generally responsible for nucleation of the PIC, the other GTFs play major roles as well. Regardless of which model is assumed (ordered multi-step assembly or holoenzyme), TFIIA has a somewhat controversial presence. TFIIA is composed of either two subunits (L and S) in yeast and Arabidopsis or three subunits (α,

β, and γ) in metazoans. TFIIA-α and -β are derived from post-translational cleavage of a protein homologous to the larger (L) subunit in fungi and plants (Li et al., 1999). TFIIA is able to integrate into the PIC at any step of assembly and is even capable of binding

TBP in the absence of DNA (Orphanides et al., 1996). However, when TFIIA does bind

TBP at a promoter it is able to interact with both the N-terminal stirrup of TBP and DNA upstream of the TATA element increasing the stability of the DNA-protein complex

(Langelier et al., 2001). Besides this function of TFIIA, it is suggested that TFIIA is involved in TBP anti-repression because it is able to remove repressors like Mot1. Only

TFIIA-β and -γ are required for anti-repression, but not -α. However, all three subunits are required for activation mediated by trans-activators that recruit TFIIA. In the work of

Langelier et al. (2001), TFIIA stimulated basal transcriptional activity only in the presence of TFIIEβ and TFIIFα suggesting that TFIIA may somehow be involved in enhancing the activities of TFIIE and TFIIF.

The structures of TFIIB as well as TFIIA in association with the TBP-TATA complex have been determined by X-ray crystallography (Nikolov et al., 1995; Geiger et al., 1996). The binding of TFIIB to the C-terminal stirrup of TBP at a promoter is a required step in PIC formation (regardless of which model is followed). Like the binding of TFIIA to TBP-TATA, the TFIIB-TBP-TATA complex is stabilized by TFIIB interactions with both TBP and DNA, in this case both upstream and downstream of TATA (due to the 80° bend in the TATA-element) (Malik et al., 1993; Lee and Hahn, 1995;

Tang et al., 1996). The element directly upstream of TATA that is contacted by TFIIB is termed the IIB recognition element (BRE) (Lagrange et al., 1998). The BRE is contacted in a sequence-specific manner by a helix-turn-helix DNA binding domain of TFIIB


(Lagrange et al., 1998). The BRE has a consensus sequence of 5'-G/C-G/C-G/A-C-G-C-C-3' and represents the fourth known core promoter element in addition to the TATA-element, the DPE, and the Initiator (Lagrange et al., 1998). Along with creating a more stable TBP-TATA interaction, TFIIB makes contact with a TAF, TFIIF, and PolII (Ha et al., 1993; Fang and Burton, 1996). In fact, some mutations of TFIIB have a great effect on transcription start sites suggesting that TFIIB plays a significant role in positioning of

PolII in the PIC (Orphanides et al., 1996).
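
As a minimal sketch of how the BRE consensus quoted above could be used to scan a promoter sequence, the degenerate positions can be written as a regular expression. The function name `find_bre_sites` and the example sequence are hypothetical illustrations and are not taken from Lagrange et al. (1998). An exact-match scan of this kind ignores the position of the match relative to TATA and any mismatches that TFIIB may tolerate.

```python
import re

# Consensus from the text: 5'-G/C-G/C-G/A-C-G-C-C-3'.
# Each degenerate position becomes a character class; the lookahead
# wrapper allows overlapping matches to be reported.
BRE_PATTERN = re.compile(r"(?=[GC][GC][GA]CGCC)")


def find_bre_sites(sequence):
    """Return 0-based start positions of exact BRE consensus matches."""
    return [m.start() for m in BRE_PATTERN.finditer(sequence.upper())]


# Hypothetical promoter fragment (invented for illustration): a BRE-like
# heptamer placed immediately upstream of a TATA box.
example = "ATTGGGCGCCTATAAAAGG"
print(find_bre_sites(example))
```

For the invented fragment, the only match starts at index 3, directly upstream of the TATA box, which is the arrangement described in the text.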

Choi et al. (2003) recently demonstrated that human TFIIB has the capacity to autoacetylate on K238. This autoacetylation was competitively inhibited by coenzyme A and was reversible under conditions of high coenzyme A concentration, indicating that this is a catalytic process (Choi et al., 2003). Interestingly, yeast TFIIB had the same autoacetylation capacity, and acetylation significantly increased the affinity of TFIIB for TFIIF (glutathione S-transferase pulldown efficiency increased from 15% to 90%) (Choi et al., 2003). This affinity increase suggests that TFIIB acetylation may be a key mechanism for recruitment of TFIIF and PolII to a promoter.

Like TBP and TFIIA, TFIIB has been studied to a limited degree in plants

(Baldwin and Gurley, 1996; Pan et al., 2000). Pan et al. (2000) demonstrated that TBP-

TFIIB interactions were dispensable for basal transcription and activated transcription at

strong complex promoters (CaMV 35S). However, the TBP-TFIIB interaction was

required for activated transcription at simplified artificial promoters (Pan et al., 2000).

These results can be most simply interpreted to mean that during basal and activated transcription from complex promoters, other factors besides TBP and TFIIB (perhaps

TAFs and transcriptional activators, respectively) are able to recruit PolII to the promoter.


However, at the simple, artificial promoters the rate-limiting step is no longer TBP recruitment (due to recruitment by the transactivator) but recruitment of PolII. Since the artificial promoters are TATA-containing, they may act in a TAF-independent manner, and thus TAF-holoenzyme interactions may not fully complement the lack of TBP-TFIIB interaction. In separate work, Pan et al. (1999) showed that a 14-3-3 protein binds TBP and TFIIB independently of known 14-3-3 protein-binding motifs. It was also demonstrated that this 14-3-3 protein was capable of stimulating transcription in vivo,

suggesting that 14-3-3s might act as co-activators thus creating another layer of

complexity to transcriptional regulation (Pan et al., 1999).

An exciting story that is beginning to unfold is that of the plant-specific TFIIB-related protein (pBrp), or AtTFIIB5 (Chapters 2 and 4). This protein was shown to interact in

vitro to form a TBP-TFIIB5-TATA complex (Lagrange et al., 2003). Using enhanced

yellow fluorescent protein-tagging, chloroplast-fractionation and proteolytic-cleavage

experiments analyzed by western blots, as well as plastid agglutination experiments,

Lagrange et al. (2003) have shown that AtTFIIB5 was normally localized to the cytosolic

face of the plastid envelope. This localization to the chloroplast is the first observation of any GTF stably located outside of the nucleus.

AtTFIIB5 was also observed to contain a P/E/D/S/T-rich domain that appears to

play a role in targeting this protein for degradation by the proteasome (Lagrange et al.,

2003). Upon pharmacological disruption of proteasome function and in COP9

signalosome mutants, AtTFIIB5 was observed to localize to the nucleus. Lagrange et al.

(2003) suggest a model in which an unknown plastid-derived signal leads to release of

the TFIIB5 protein from the outer envelope and movement into the nucleus. In the


nucleus, TFIIB5 would then induce transcription of genes appropriate for response to the

original signal. In this model, proteasome-mediated degradation provides rapid turnover of the nuclear-localized TFIIB5 protein, leading to tighter control of the response to

the signal.

However, this model lacks an explanation as to why the TFIIB5 protein appeared to

be released from the chloroplast under proteasome/COP9 signalosome dysfunction. I

propose a model in which the COP9 signalosome leads to degradation of a TFIIB5/pBrp

co-factor that allows the protein to localize to the nucleus. The co-factor could be either

a chaperone that escorts TFIIB5/pBrp to the nucleus or a chloroplast-docking antagonist.

Alternatively, the proteasome/COP9 signalosome may be activating either a plastid docking factor or a chloroplast targeting signal in TFIIB5/pBrp by partial proteolysis

(Gille et al., 2003). In either of these cases, TFIIB5/pBrp transport to the nucleus and

induction of transcription is likely to be the culmination of this signal transduction

pathway. Whatever the mechanism, the TFIIB5/pBrp subfamily of TFIIB-like factors is likely to play a novel role in transcriptional regulation.

Part of the role of TFIIB is to recruit TFIIF into the PIC. TFIIF is a heterotetrameric complex of two TFIIFα (RNA polymerase-associated protein 74 kDa,

RAP74) and two TFIIFβ (RAP30) molecules (Flores et al., 1990) that is tightly associated with PolII. However, in yeast a third factor interacts as part of TFIIF, the yeast TAF14 (also a member of the TFIID and SWI/SNF complexes) (Henry et al., 1994;

Cairns et al., 1996). Interestingly, although both human TFIIFα and TFIIFβ bind to

TFIIB individually, it has been shown that TFIIFα blocks the binding of TFIIFβ to TFIIB

by simultaneously binding to both proteins in the regions required for their respective

interaction (Fang and Burton, 1996). TFIIF is required for recruitment of PolII to the TATA-TBP-TFIIA-TFIIB complex, and is found tightly associated with PolII. Indeed, TFIIF is credited with stimulating the rate of transcriptional elongation (implying that it is an elongation factor in addition to its function as an initiation factor) (Flores et al., 1989).

Besides elongation stimulation, TFIIF (specifically the β-subunit) also inhibits and reverses PolII binding to non-promoter DNA regions, making this interaction promoter-specific, much like the bacterial σ-factor (Killeen and Greenblatt, 1992). TFIIFβ has sequence similarity with bacterial σ-factors and is able to bind E. coli RNA polymerase in the same region as σ70 (McCracken and Greenblatt, 1991). Interestingly, a dimer of TFIIFβ alone is able to recruit PolII to promoters and support proper initiation of transcription (Flores et al., 1991).

Three functional domains compose TFIIFβ: the TFIIFα-binding N-terminus (Fang and Burton, 1996); the polymerase-binding middle domain (Killeen and Greenblatt, 1992); and the C-terminal winged-helix domain (Groft et al., 1998). Similarly, TFIIFα can be functionally divided into three domains: the N-terminal TFIIFβ-binding domain; the highly charged middle region; and the C-terminal winged-helix domain (Kamada et al., 2001). TFIIFα seems largely to play a role in stimulating transcriptional elongation and aids TFIIFβ in removing PolII from non-specific DNA interactions. One interesting observation is the presence of a serine/threonine kinase activity in TFIIFα that is involved in transcriptional elongation (Rossignol et al., 1999). Rossignol and co-workers (1999) were unable to find an identifiable ATPase domain in TFIIFα, which must be present for kinase activity; however, weak similarity with AAA ATPase VWA-containing proteins is identifiable in A. thaliana TFIIFα (as mentioned in Chapter 2). Two

other interesting findings are the phosphorylation of TFIIFα by the TAF1 kinase domain and by a protein kinase in TFIIH (Dikstein et al., 1996a; O'Brien and Tjian, 1998; Rossignol et al., 1999). Although the significance of these activities utilizing TFIIF as a substrate is still unknown, some data suggest that both the initiation and elongation activities of TFIIF are stimulated by this phosphorylation, similar to the stimulation of TFIIA activities by phosphorylation (Kitajima et al., 1994).

After TFIIF and PolII, TFIIE is the next GTF to enter the assembling PIC, possibly together with PolII. TFIIE, like TFIIF, is a heterotetramer composed of two different proteins (α and β) (Ohkuma et al., 1990; Inostroza et al., 1991). It was found that TFIIEα without TFIIEβ has the capacity to mediate basal transcription when added to the other required factors (Ohkuma et al., 1990; Inostroza et al., 1991); however, in the recombinant form both subunits were required (Peterson et al., 1991). TFIIEα contains a zinc-finger domain that is critical for its stable incorporation into the PIC (Maxon and Tjian, 1994), a leucine repeat, and a helix-turn-helix domain, as well as sequence similarity to E. coli σ-factor region 2.1 (Ohkuma et al., 1991). TFIIEα also contains a clearly identifiable catalytic loop domain found in many protein kinases (Peterson et al., 1991), although TFIIE has no known ATPase activity. The protein-protein interaction between TFIIEα and TFIIEβ may be mediated by leucine repeats, since TFIIEβ also contains this recognizable motif (Sumimoto et al., 1991). TFIIEβ, like TFIIEα, has some sequence similarity to σ-factors, but in this case it is with region 3, which is implicated in promoter recognition (Sumimoto et al., 1991). TFIIEβ also has similarity with TFIIFβ in a region


similar to a σ-factor domain that binds to core RNA polymerase, consistent with both

TFIIE and TFIIF interactions with PolII (Sumimoto et al., 1991).

TFIIEβ contacts ssDNA through a C-terminal winged helix domain that may play a role in stabilization of the open promoter and assist DNA melting (Okamoto et al., 1998;

Okuda et al., 2000). This ssDNA-binding domain is novel in that it binds DNA in a positively charged channel on the face opposite to where winged helix domains typically interact with DNA (Okuda et al., 2000). Since TFIIE interacts with PolII and GTFs (TFIIB

and TFIIF), Okuda et al. (2000) suggested a model in which these properties in addition

to the ssDNA binding lead to a stabilization of the PIC where the promoter starts to open.

Interestingly, like TBP and TFIIB, TFIIE appears to have ancient roots. Homologs of

TFIIEα (TFE) have been identified in Archaea. TFE was not required for transcription,

but was stimulatory to transcription under conditions of limiting TBP (Bell et al., 2001).

This suggests a conserved function for TFE/TFIIEα in stabilization of the PIC.

Once TFIIE is incorporated into the PIC, it recruits TFIIH - a multi-subunit GTF

with two ATP-dependent helicases and a protein kinase (Orphanides et al., 1996). One

of the TFIIH helicases appears to be required for transcriptional initiation, but TFIIE,

TFIIH, and ATP are all dispensable on templates that are highly negatively supercoiled

(Parvin and Sharp, 1993). It is believed that this negative supercoiling greatly lowers the energetic requirement for strand separation and precludes the need for helicase activity.

Interestingly, both TFIIB and TFIIEα contain zinc-ribbon motifs, and both TFIIB and

TFIIE bind DNA between the TATA-element and the start site of transcription (Robert et

al., 1996). It has been speculated that these DNA binding motifs may play a role in

stabilizing the melted region of the promoter and as such supplement one of the main


functions of TFIIH in transcription; however, more recent evidence suggests this is a

function of the TFIIEβ ssDNA binding domain (Okamoto et al., 1998).

Another main function of TFIIH (besides transcription-coupled nucleotide excision repair, which will not be discussed here) is to stimulate elongation by hyperphosphorylation of the C-terminal domain (CTD) of the largest subunit of PolII. This evolutionarily conserved CTD is composed of many tandem repeats of a heptapeptide (YSPTSPS) of which five residues are potential recipients of phosphate moieties. The

cdk7 subunit, a cyclin-dependent kinase, of human TFIIH has been shown to be the

subunit responsible for hyperphosphorylation of the CTD, an activity that potentially leads to PolII promoter escape (Orphanides et al., 1996). One additional function of both subunits of TFIIE is to stimulate this CTD kinase activity of TFIIH (Okamoto et al.,

1998).
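
The count of five potential phosphate acceptors per CTD heptapeptide can be checked with a short sketch. It assumes only that serine, threonine, and tyrosine are the residues able to accept a phosphate; the function names and the repeat count of 10 used in the example are hypothetical and are not drawn from the cited work.

```python
# Canonical CTD heptapeptide as given in the text.
HEPTAD = "YSPTSPS"
# Assumption: residues with a side-chain hydroxyl can accept a phosphate.
ACCEPTORS = {"S", "T", "Y"}


def phospho_acceptor_sites(repeat=HEPTAD):
    """Return (1-based position, residue) pairs that could be phosphorylated."""
    return [(i, aa) for i, aa in enumerate(repeat, start=1) if aa in ACCEPTORS]


def max_phosphates(n_repeats, repeat=HEPTAD):
    """Upper bound on phosphates for n_repeats perfect copies of the repeat."""
    return n_repeats * len(phospho_acceptor_sites(repeat))


print(phospho_acceptor_sites())
print(max_phosphates(10))  # 10 is an arbitrary, hypothetical repeat count
```

The acceptor positions are Y1, S2, T4, S5, and S7, so the upper bound grows by five phosphates per perfect repeat. Real CTDs contain degenerate repeats, which this sketch does not model.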

PolII is a 12-subunit complex of approximately 500 kDa (Dvir et al., 2001). The core of PolII is composed of two large subunits, RNA polymerase B protein 1 (Rpb1) and Rpb2 (Dvir et al., 2001). The ten remaining subunits (Rpb3-12) coat the surface of these two proteins in single copies (Dvir et al., 2001). PolII is capable of unwinding double-stranded DNA (dsDNA), adding ribonucleotides to RNA transcripts, and proofreading the nascent transcript (Cramer et al., 2000). The structure of PolII shows a deep cleft between Rpb1 and Rpb2 through which ~20 bp of dsDNA is held as it enters the active site (Cramer et al., 2000). This entering dsDNA is gripped by a “pair of jaws” formed by a portion of Rpb1 with Rpb5 on one jaw and Rpb9 on the other jaw (Cramer et al., 2000). A “sliding clamp” composed of the C-terminal region of Rpb2 and the N-terminal parts of Rpb6 and Rpb1 greatly stabilizes the interaction with downstream DNA, leading to the processivity


of the enzyme (Cramer et al., 2000). Cramer et al. (2000) propose that a groove leading away from the active site, behind the hinge of the downstream-DNA sliding clamp, binds the emerging RNA, which then acts as a lock on the clamp and increases processivity.

Underneath the base of the cleft is an inverted funnel leading to two pores that give

access to the DNA-RNA hybrid and are near the active site (Cramer et al., 2000). These

pores may provide access for elongation factors, nucleotides, and exit of the 3’ end of the

mRNA during backtracking (Cramer et al., 2000). PolII is clearly a complex assembly of

protein domains with a multitude of functions (many of which are elucidated by the

structure). Unfortunately, further details must be left for other manuscripts.

The CTD of PolII is not a naked protein-tail structure as once thought; instead, it is covered by a large Mediator complex of co-activators (Orphanides et al., 1996). While the Mediator complex is not technically a GTF (because it was not identified as a factor required for basal transcription in vitro), it does merit some discussion here. Mediator is composed of approximately 60 proteins and is ~3.5 MDa (reviewed in Myers and Kornberg, 2000; Rachez and Freedman, 2001). Many Mediator genes were first identified as suppressors of CTD truncation mutants of RNA polymerase B (SRB) in yeast. Many of these so-called SRB proteins have little (if any) recognizable sequence similarity with other proteins, except SRB10 and SRB11, which are also known as cdk8 and cyclin C, respectively.

Besides the GTFs, RNA PolII, and Mediator needed for regulation of transcription, there are many co-activator and co-repressor complexes that are required in the cell, but detailed discussions are beyond the scope of this document. Virtually all of them in some way modulate the access of GTFs to their promoters. Some of these include the TAF-containing HAT complexes (TFTC; SAGA; STAGA; PCAF; nucleosome acetyltransferase of histone H4, NuA4; etc.), histone deacetylase complexes (e.g., SWI independent 3 complex, SIN3; nucleosome remodeling HD complex, NuRD; regulator of nucleolar silencing and telophase exit complex, RENT) (reviewed in Lawit and Czarnecka-Verner, 2002), ATP-dependent chromatin remodeling complexes (SWI/SNF, BRG1-associated factor (BAF), and related factors), and DNA and histone methyltransferase complexes (e.g., Complex Proteins Associated with Set1, COMPASS; Enhancer of Zeste) (Miller et al., 2001; Czermin et al., 2002), just to name the best-studied classes.

However, all of these protein complexes require some type of contextual cues to find their substrates. At some point, nearly all of these cues originate with sequence-specific

DNA-binding transcriptional regulators (either activators or repressors).

Transcriptional Activators That Bind DNA

The GTFs alone are capable of conveying basal levels of transcription from core

promoters (for a review of core promoters see Smale and Kadonaga, 2003). However,

for activated transcription several additional layers of control are often necessary. The first of these are transcriptional trans-activators; the second are cis-acting DNA elements in promoters to which transcriptional activators (and repressors) can bind. In general,

DNA binding transcriptional activators are composed minimally of two functionally

separable domains: a DNA binding domain and a transcriptional activation domain

(Ptashne, 1988).

Differential expression of the 27,000 genes of Arabidopsis implies involvement of

many different transcriptional regulators that are capable of differential combinations of

protein-protein and protein-DNA interactions. The team of scientists that analyzed the

Arabidopsis genomic information found 1,709 putative proteins with similarity to known


classes of DNA-binding domain-containing transcription factors (The Arabidopsis

Genome Initiative, 2000). This large, and potentially underestimated, set of

transcriptional regulators is certainly involved in a multitude of protein-protein interactions with each other (homo- and hetero- dimers and trimers) and with co-activator

and/or co-repressor complexes.

Many transcriptional regulators have been studied in various organisms, including

plants. Interestingly, only 8-23% of the genes encoding proteins containing DNA-

binding domains identified in Arabidopsis are similar to genes in other non-plant

eukaryotic genomes (The Arabidopsis Genome Initiative, 2000). Unlike transcription-

related genes, 48-60% of Arabidopsis protein synthesis genes have homologs in the other

eukaryotic genomes (The Arabidopsis Genome Initiative, 2000). This great disparity

reflects an independent evolution of plant transcription factors in general.

More than half of the transcription factor families in Arabidopsis (16 of 29) appear

to be specific to plants. The Apetala 2/Ethylene response element binding protein-related to ABI3/VP1 (AP2/EREBP-RAV), no apical meristem/cup shaped cotyledon 2 (NAC), and auxin response factor (ARF-AUX/IAA) families have DNA-binding domains not found outside of the plant kingdom. DOF zinc-finger, WRKY zinc-finger,

and the two-repeat MYB families contain plant-specific variants of more widespread

domains. Some large families of transcription factors (R2R3-repeat MYB, WRKY

families, etc.) have expanded in plants, with approximately 100 members in some groups.

Other classes of DNA-binding proteins are completely missing in plants such as the Rel-

like DNA-binding domain, nuclear steroid receptors, forkhead-winged helix, and POU


(Pit-1, Oct and Unc-86) domain protein families (The Arabidopsis Genome Initiative,

2000).

The functions of the individual transcription-factor family members can be regulated by expression characteristics. Another way to add a layer of control to a transcriptional activator is for it to target a co-activator or GTF that is only expressed at certain temporal or spatial coordinates. With the plethora of different GTFs in Arabidopsis, it seems likely that differential expression of GTFs targeted by different transcription factors is a key mechanism of control in plants. Therefore, to truly begin to understand transcriptional regulation of any gene in plants, we must attempt to understand the regulatory circuitry of the transcription factors, the GTFs, and the co-activator/co-repressor complexes, as well as all of their protein-protein interactions.


Table 1-1. TATA binding protein-associated factors of the TFIID complex. The TAFs are displayed as identified in (from left to right) Arabidopsis thaliana, Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens.

Arabidopsis        Yeast           Drosophila       Human             Features
225(1)/205(1b)     145(130)(1)     230(250)(1)      250(1)            BDs, PK, HAT, TAND, Ub ligase
142(2)             150(TSM1)(2)    150(2)           150(CIF150)(2)    contacts initiator
-                  47(3)           155(BIP1)(3)     140(3)            H2A-like HFD, contacts BTB domain proteins
76(4)/69(4b)       48(4)           110(4)           135(4)/105(4b)    H2A-like HFD, contacts Q-rich TAs, and IIA
78(5)              90(5)           80(85)(5)        100(95)(5)        WD-40 Repeats, contacts IIFβ
59(6)/55(6b)       60(6)           62(60)(6)        70(80)(6)         H4-like HFD, contacts IIEα, IIFα, AAs, & DPE
22(7)              67(7)           AAF54162(7)      55(7)             contacts multiple TAs, and Bdf1
40(8)              65(8)           Prodos(8)        43(8)             H3-like HFD
21(9)              17(20)(9)       42(40)(9)        31(32)(9)*        H3-like HFD, contacts acidic TAs, IIB, & DPE
15(10)             23(25)(10)      24(10)/16(10b)   30(10)*           H3-like HFD
24(11)/19(11b)     40(11)          30β(11)          28(11)            H3-like HFD
58(12)/75(12b)     61(68)(12)      30α(22)(12)      20(15)(12)*       H2B-like HFD
14(13)             19(FUN81)(13)   AAF53875(13)     18(13)            H4-like HFD
23(14)/30(14b)     30(ANC-1)(14)   -                -                 SWI/SNF, TFIIF, TFIID member
41(15)/39(15b)     -               -                68(15)            substoichiometric, contacts RNA and ssDNA
228(BTAF1)         Mot1(BTAF1)     -                172(BTAF1)        Not in TFIID complex, part of B-TFIID; helicase similarity, TBP negative regulator

Blue = Present in ySAGA, or hTFTC; = required in yeast * = Present in hPCAF Complex

Note: The names displayed in red are from the unified nomenclature designated by Laszlo Tora (Tora, 2002). Other names are based on accession numbers, mutant designations, or molecular weight (either observed or predicted). Histone-fold domain, HFD; bromodomains, BD; protein kinase, PK; transcriptional activators, TAs; acidic activators, AAs; Broad-complex, Tramtrack, and Bric-à-brac, BTB.


[Figure 1-1 diagram: three panels showing TFIID subunits (TBP; TAFs 1-13 and 15; TAND, T1, T2) and TFIIA (A) positioned on a promoter with TATA (TATAAAT) and DPE elements.]

Figure 1-1. The “two-step handoff” model of removal of auto-inhibition of TFIID by the TAF1 N-terminal domains TAND1 and TAND2 (T1 and T2, respectively). TAFs are labeled and shown in light blue, the acidic activator is shown in red, and TFIIA is shown in light green.


Table 1-2. Protein-protein interactions of TFIID in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae with corresponding references.

References Interaction Human Drosophila Yeast (Ruppert et al., (Reese et al., 1994; (Weinzierl et al., TBP-TAF1 1993; Xenarios et Kokubo et al., 1993a) al., 2002) 1998) (Verrijzer et al., (Verrijzer et al., TBP-TAF2 1994) 1994) (Dubrovskaya et (Kokubo et al., TBP-TAF5 al., 1996; Tao et 1993c) al., 1997) (Weinzierl et al., (Weinzierl et al., TBP-TAF6 1993b; Hisatake et 1993b; Kokubo et al., 1995) al., 1994) (Yatherajam et al., TBP-TAF7 2003) (Kokubo et al., TBP-TAF9 1994) (Klebanow et al., TBP-TAF10 (Jacq et al., 1994) 1996) (Xenarios et al., (Kraemer et al., TBP-TAF11 2002) 2001) (Yokomori et al., (Mengus et al., TBP-TAF12 1993b; Kokubo et (Reese et al., 2000) 1995) al., 1994) (Mengus et al., TBP-TAF13 1995) (Verrijzer et al., (Verrijzer et al., TAF1-TAF2 1994) 1994) (Mengus et al., 1995; Burley and (Kokubo et al., Roeder, 1996) (Yatherajam et al., TAF1-TAF4 1993a; Weinzierl et TAF1-TAF4b 2003) al., 1993a) (Dikstein et al., 1996b) (Dubrovskaya et (Yatherajam et al., TAF1-TAF5 al., 1996; Tao et 2003) al., 1997)


Table 1-2 continued. References Interaction Human Drosophila Yeast (Weinzierl et al., (Weinzierl et al., (Yatherajam et al., TAF1-TAF6 1993b; Hisatake et 1993b) 2003) al., 1995) (Lavigne et al., 1996) TAF1- (Yatherajam et al., TAF1-TAF7 TAF7L (Pointud et 2003) al., 2003) (Kokubo et al., (Yatherajam et al., TAF1-TAF9 1994) 2003) TAF1-TAF10 (Jacq et al., 1994) (Yokomori et al., (Yatherajam et al., TAF1-TAF11 1993b) 2003) (Yokomori et al., TAF1-TAF12 1993b) (Yatherajam et al., TAF2-TAF3 2003) TAF2- (Yatherajam et al., TAF2-TAF4 TAF4b(Dikstein et 2003) al., 1996b) (Yatherajam et al., TAF2-TAF7 2003) (Yatherajam et al., TAF2-TAF8 2003) (Yatherajam et al., TAF2-TAF10 2003) (Yokomori et al., TAF2-TAF11 1993b) (Yokomori et al., TAF2-TAF12 1993b) (Gangloff et al., (Gangloff et al., (Gangloff et al., TAF3-TAF10 2001b; Yatherajam 2001a) 2001a) et al., 2003) (Kokubo et al., (Yatherajam et al., TAF4-TAF5 1993c) 2003) (Yatherajam et al., TAF4-TAF7 2003) (Yatherajam et al., TAF4-TAF8 2003) (Kokubo et al., (Yatherajam et al., TAF4-TAF9 1994) 2003) (Yatherajam et al., TAF4-TAF10 2003)


Table 1-2 continued. References Interaction Human Drosophila Yeast (Yokomori et al., (Yatherajam et al., TAF4-TAF11 1993b) 2003) (Hoffmann et al., (Yokomori et al., (Selleck et al., 1996; Gangloff et TAF4-TAF12 1993b; Kokubo et 2001; Yatherajam et al., 2000; Werten et al., 1994) al., 2003) al., 2002) TAF5-TAF6 (Tao et al., 1997) (Ito et al., 2001) (Dubrovskaya et al., TAF5-TAF7 1996; Lavigne et al., 1996) (Yatherajam et al., TAF5-TAF8 2003) (Kokubo et al., TAF5-TAF9 (Tao et al., 1997) (Uetz et al., 2000) 1994) (Yatherajam et al., TAF5-TAF10 2003) (Lavigne et al., TAF5-TAF11 1996; Tao et al., 1997) (Lavigne et al., (Yatherajam et al., TAF5-TAF12 1996; Tao et al., 2003) 1997) (Dubrovskaya et al., TAF5-TAF13 1996; Lavigne et al., 1996) (Bertolotti et al., TAF5-TAF15 1998) (Uetz et al., 2000; (Weinzierl et al., (Weinzierl et al., Ito et al., 2001; 1993b; Hisatake et 1993b; Kokubo et TAF6-TAF9 Selleck et al., 2001; al., 1995; Xenarios al., 1994; Xie et al., Yatherajam et al., et al., 2002) 1996) 2003) (Yatherajam et al., TAF6-TAF10 2003) (Yatherajam et al., TAF6-TAF11 2003) (Hisatake et al., TAF6-TAF12 1995; Hoffmann et al., 1996) (Yatherajam et al., TAF7-TAF7 2003)


Table 1-2 continued. References Interaction Human Drosophila Yeast (Yatherajam et al., TAF7-TAF8 2003) (Lavigne et al., (Yatherajam et al., TAF7-TAF11 1996) 2003) (Hoffmann et al., TAF7-TAF12 1996) (Bertolotti et al., TAF7-TAF15 1998) TAF8-TAF10b (Uetz et al., 2000; (Hernandez- Gangloff et al., TAF8-TAF10 Hernandez and 2001b; Yatherajam Ferrus, 2001) et al., 2003) (Yatherajam et al., TAF8-TAF12 2003) (Yatherajam et al., TAF9-TAF10 2003) (Yatherajam et al., TAF9-TAF12 2003) (Klebanow et al., 1996; Gangloff et TAF10-TAF10 al., 2001b; Yatherajam et al., 2003) (Uetz et al., 2000; TAF10-TAF11 Yatherajam et al., 2003) (Mengus et al., (Yatherajam et al., TAF10-TAF12 1995; Xenarios et 2003) al., 2002) (Mengus et al., (Ito et al., 2000; TAF10-TAF13 1995; Xenarios et Yatherajam et al., al., 2002) 2003) TAF10-TAF14 (Uetz et al., 2000) (Mengus et al., (Yatherajam et al., TAF11-TAF12 1995) 2003) (Mengus et al., (Ito et al., 2001; 1995; Birck et al., TAF11-TAF13 (Giot et al., 2003) Yatherajam et al., 1998; Xenarios et 2003) al., 2002) (Bertolotti et al., TAF11-TAF15 1998)


Table 1-2 continued. References Interaction Human Drosophila Yeast (Yatherajam et al., TAF12-TAF12 2003) (Bertolotti et al., TAF13-TAF15 1998)


Table 1-3. Protein-protein interactions between TFIIA, TFIIB, TFIID, TFIIE, and TFIIF subunits in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae with corresponding references.

References Interaction Human Drosophila Yeast (Geiger et al., 1996; (Yokomori et al., TBP-TFIIA-S/γ (Sun et al., 1994) Kokubo et al., 1994) 1998) TFIIA-L20 – TBP (Geiger et al., 1996; TFIIAβ– TBP (Sun TBP-TFIIA-L (Yokomori et al., Kokubo et al., et al., 1994) 1994) 1998) (Maldonado et al., 1990; Ha et al., (Yamashita et al., TBP-TFIIB 1993; Xenarios et 1993) al., 2002) (Yokomori et al., TBP-TFIIEα (Maxon et al., 1994) 1998) (Okamoto et al., TBP-TFIIEβ 1998) Acetylation (Imhof TAF1-TFIIEβ et al., 1997) Acetylation (Ruppert and Tjian, TAF1-TFIIFα 1995; Dikstein et al., 1996a) TAF4b- (Yokomori et al., TAF4-TFIIA-L TFIIAα(Dikstein et 1993a) al., 1996b) (Dubrovskaya et al., TAF5-TFIIFβ 1996) (Hisatake et al., TAF6-TFIIEα 1995) (Hisatake et al., TAF6-TFIIFα 1995) (Klemm et al., (Goodrich et al., TAF9-TFIIB 1995; Xenarios et 1993) al., 2002) (Kraemer et al., TAF11-TFIIA-S 2001) TAF13-TFIIA-L (Giot et al., 2003)


Table 1-3 continued. References Interaction Human Drosophila Yeast TAF14-TFIIFα (Henry et al., 1994) TFIIAα-TFIIAγ (Sun et al., 1994) TFIIA-L - TFIIA-Like Factor (Ranish and Hahn, (Giot et al., 2003) TFIIA-S (ALF)-TF TFIIAγ 1991) (Upadhyaya et al., 1999) TFIIA-L (Yokomori et al., (TFIIAα/β) - 1998; Langelier et TFIIEβ al., 2001) TFIIA-L - (Uetz et al., 2000) TFIIB TFIIAγ(S)- (Langelier et al.,

TFIIEα 2001) (Okamoto et al., TFIIB-TFIIEβ (Ito et al., 2001) 1998) (Fang and Burton, TFIIB-TFIIFα 1996; Xenarios et al., 2002) (Ha et al., 1993; TFIIB-TFIIFβ Fang and Burton, (Ito et al., 2001) 1996) (Austin and Biggin, (Riechmann and (Okamoto et al., TFIIEα-TFIIEβ 1996; Giot et al., Ratcliffe, 2000; 1998) 2003) Uetz et al., 2000) (Okamoto et al., TFIIEβ-TFIIEβ 1998) (Okamoto et al., TFIIEβ-TFIIFα 1998) (Okamoto et al., TFIIEβ-TFIIFβ 1998) (Flores et al., 1990; (Austin and Biggin, TFIIFα-TFIIFβ Killeen and 1996) Greenblatt, 1992 )


[Figure 1-2 network diagram; nodes: TBP, TAF1, TAF2, TAF3, TAF4, TAF4b, TAF5, TAF5L, TAF6, TAF6L, TAF7, TAF7L, TAF8, TAF9, TAF9L, TAF10, TAF11, TAF12, TAF13, TAF15, TFIIA-α, TFIIA-β, TFIIA-γ, ALF, TFIIB, TFIIEα, TFIIEβ, TFIIFα, TFIIFβ.]

Figure 1-2. Binary protein-protein interactions of the Homo sapiens general transcription factors TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and their homologs.


[Figure 1-3 network diagram; nodes: TBP, TAF1, TAF2, TAF3, TAF4, TAF5, TAF5L, TAF6, TAF6L, TAF7, TAF8, TAF9, TAF10, TAF10b, TAF11, TAF12, TAF13, TFIIA-L30, TFIIA-L20, TFIIA-S, TFIIB, TFIIEα, TFIIEβ, TFIIFα, TFIIFβ.]

Figure 1-3. Binary protein-protein interactions of Drosophila melanogaster general transcription factors TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and their homologs.


[Figure 1-4 network diagram; nodes: TBP, TAF1, TAF2, TAF3, TAF4, TAF5, TAF6, TAF7, TAF8, TAF9, TAF10, TAF11, TAF12, TAF13, TAF14, TFIIA-L, TFIIA-S, TFIIB, TFIIEα, TFIIEβ, TFIIFα, TFIIFβ, TFIIFγ (TAF14).]

Figure 1-4. Binary protein-protein interactions of Saccharomyces cerevisiae general transcription factors TFIIA, TFIIB, TFIID, TFIIE, and TFIIF.
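
The binary interaction maps in Figures 1-2 through 1-4 are, in effect, undirected graphs whose edges are the interaction pairs listed in Tables 1-2 and 1-3. The following is a minimal sketch of one way such a map could be stored and queried. It uses only a small subset of the pair names that appear as rows in Table 1-3, omits species and references, and the helper names `build_map` and `partners` are hypothetical rather than tools used in this work.

```python
from collections import defaultdict

# A small subset of the binary interactions named as rows in Table 1-3
# (species and references are deliberately omitted in this sketch).
INTERACTIONS = [
    ("TBP", "TFIIB"),
    ("TFIIB", "TFIIFα"),
    ("TFIIB", "TFIIFβ"),
    ("TFIIB", "TFIIEβ"),
    ("TFIIEα", "TFIIEβ"),
    ("TFIIEβ", "TFIIEβ"),  # self-interaction
    ("TFIIEβ", "TFIIFα"),
    ("TFIIEβ", "TFIIFβ"),
    ("TFIIFα", "TFIIFβ"),
]


def build_map(pairs):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def partners(graph, protein):
    """Return the sorted interaction partners of one protein."""
    return sorted(graph.get(protein, set()))


interaction_map = build_map(INTERACTIONS)
print(partners(interaction_map, "TFIIB"))
```

Storing each pair in both directions keeps lookups symmetric, so a query for TFIIB returns TBP, TFIIEβ, TFIIFα, and TFIIFβ regardless of the order in which a pair was entered.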

CHAPTER 2

PHYLOGENETIC ANALYSIS OF POPLAR, Arabidopsis AND OTHER PLANT GENERAL TRANSCRIPTION FACTORS

Introduction

While there has been a great deal of interest in DNA-binding transcriptional activators in plants, comparatively little work has been done on plant regulatory factors downstream of the activators themselves. One exception is a current project that seeks to clone and characterize the majority of chromatin regulators that are readily identifiable based on similarity (Pandey et al., 2002). Another exception is the plant histone deacetylases, which have been characterized to some degree (reviewed in Lawit and Czarnecka-Verner, 2002). Since plant histone deacetylases have historically been of interest due to their involvement with the maize disease caused by an inhibitor, HC-toxin from Cochliobolus carbonum, they have been characterized more with respect to their enzymology than to their function in protein complexes or gene regulation. Only two plant histone acetyltransferase complexes have been characterized to any extent: a SAGA-like complex in Arabidopsis (Stockinger et al., 2001), and TFIID in wheat (Washburn et al., 1997).

The present study provides a detailed phylogenetic analysis of peptide subunits from the principal eukaryotic complexes responsible for transcriptional activation in plants. The model plant system Arabidopsis thaliana (ecotype Columbia) was chosen as a primary source of peptide sequences because it was the first of the plant genomes to be fully sequenced and annotated (The Arabidopsis Genome Initiative, 2000). General transcription factors (GTFs) from plant systems are readily identifiable using homology-based searches since these proteins are highly conserved within eukaryotes and in some cases within Archaea as well.

Putative peptide subunits of plant GTFs TFIIA, TFIIB, TFIID, TFIIE, and TFIIF

were identified using basic local alignment search tool (BLAST) algorithms of the

Arabidopsis genomic data. In many cases, multiple genes were identified in Arabidopsis

based on sequence similarity with GTFs in other eukaryotes. These putative Arabidopsis

GTFs were then used to identify analogous genes in other plants to evaluate how widespread the gene duplication and evolutionary changes found in Arabidopsis were in the plant kingdom. Phylogenetic analyses were performed using these other plant homologs,

and the well-characterized proteins from Homo sapiens, Drosophila melanogaster, and

Saccharomyces cerevisiae. The greatest emphasis was placed on identifying homologs in

arabidopsis, rice, and poplar (given that these plant genome projects were the most

advanced). This provided an examination of differences between GTFs from

dicotyledonous and monocotyledonous plants, as well as between herbaceous and woody

dicotyledonous plants.

Functional analysis tied with genome sequence scrutiny has identified families of

transcription factors found uniquely in plants (Riechmann et al., 2000). These large

families have encouraged efforts to systematically examine the functions of the

individual family members looking for redundancy and divergence (Eulgem et al., 2000;

Jakoby et al., 2002; Heim et al., 2003; Toledo-Ortiz et al., 2003). However, a similar

kingdom-level investigation of GTFs has not been reported. There are numerous

examples of GTF family members in metazoans that are specialized for certain functions.

Mammalian TRF2 (TBP-related factor 2) appears to be required for spermiogenesis,

Drosophila Cannonball (TAF5L) is required for spermatogenesis, and human TAF4b is part of a unique TFIID complex in follicle cells of the ovary (reviewed in Levine and

Tjian, 2003). These and other findings suggest that some GTF homologs have evolved to undertake specialized functions in animals, and similar events may have occurred in plants. The present analysis was undertaken because: 1) the sequences of three divergent seed-plant genomes are nearly complete and have the potential to allow discovery of plant-specific GTF family members; and 2) transcriptional regulation is a fundamental adaptive mechanism in plants to short-term stresses, and information on potential functional redundancy or divergence in GTFs has hitherto been lacking.

Methods

The putative proteins of Arabidopsis GTFs TFIIA, TFIIB, TFIID, TFIIE, and

TFIIF were identified using two BLAST resources. The National Center for

Biotechnology Information protein BLAST (Arabidopsis thaliana organism setting; no filtering) was used to identify the following GTF subunits: TBP1, TBP2, TAF1, TAF1b,

TAF2, TAF6, TAF6b, TAF9, TAF10, TAF11, TAF11b, and TAF12. The Arabidopsis

Information Resource (TAIR) WU-BLAST2 BLASTp using the protein database was used to identify TAF4, TAF4b, TAF5, TAF7, TAF8, TAF13, TAF14, TAF14b, TAF15,

TAF15b, TFIIA-S, TFIIA-L1-3, TFIIB1-6, TFIIEα1-3, TFIIEβ1-2, TFIIFα, and

TFIIFβ1-2 (see Table 2-1 for a summary of these genes and their respective proteins). All searches used human protein sequences as queries, with the exception of the TAF14 search in which the yeast protein sequence was used. Iterative searches were performed using the identified Arabidopsis protein sequences to identify TAF12b using TAIR WU-BLAST2 BLASTp. Nucleotide sequences and other relevant annotated data were

identified by following the appropriate links provided by TAIR and the US National

Center for Biotechnology Information (NCBI).
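As an illustration of how hits from searches such as those described above could be screened, the following minimal sketch parses tab-delimited BLAST output and keeps the hits that fall below an E-value cutoff. The 12-column tabular layout, the cutoff of 1e-10, and the example rows (identifiers and scores alike) are assumptions made for illustration only; they are not the settings or results of this study.

```python
import csv
import io

# Column names of the common 12-column tabular BLAST layout (assumed here).
COLUMNS = ["query", "subject", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-10):
    """Return {query: [(subject, evalue, bitscore), ...]} sorted by bit score."""
    hits = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(COLUMNS, row))
        evalue = float(record["evalue"])
        if evalue <= max_evalue:
            hits.setdefault(record["query"], []).append(
                (record["subject"], evalue, float(record["bitscore"])))
    for query in hits:
        # Highest bit score first.
        hits[query].sort(key=lambda hit: hit[2], reverse=True)
    return hits


# Hypothetical rows; every identifier and score is invented for the example.
EXAMPLE = "\n".join([
    "queryA\tsubject1\t62.5\t180\t60\t2\t1\t180\t5\t184\t3e-70\t250",
    "queryA\tsubject2\t58.0\t175\t70\t3\t4\t178\t9\t183\t1e-55\t201",
    "queryA\tsubject3\t31.0\t60\t40\t1\t10\t69\t100\t159\t0.5\t30.1",
])

if __name__ == "__main__":
    for query, found in best_hits(EXAMPLE).items():
        print(query, [subject for subject, _, _ in found])
```

Retained subjects could then serve as queries for a further round of searching, in the spirit of the iterative searches described above.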

Putative poplar GTFs were identified in the Department of Energy Joint Genome

Institute poplar database genomic sequence of the female black cottonwood (Populus

balsamifera subsp. trichocarpa) clone Nisqually-1 (Wullschleger et al., 2002). Searches were performed using the tBLASTn (Altschul et al., 1990; Altschul et al., 1997) function

at http://aluminum.jgi-psf.org/prod/bin/runBlast.pl?db=poplar0&dump=1&matchReads=1. Genomic contig sequences were assembled using Contig Assembly Program 3 (CAP3) (Huang and

Madan, 1999). Contig gaps were filled using iterative searches of the genomic sequence

(BLASTn) and CAP3 assembly. Contigs were analyzed to predict cDNA sequence using

Softberry software (http://www.softberry.com/berry.phtml?topic=gfind) FGENESH with

either the Dicots (Arabidopsis) or Nicotiana tabacum settings (Chicurel, 2002). Since

there is a disagreement between taxonomists and the group sequencing the Populus

genome as to the proper name of the species, either trichocarpa or balsamifera subsp.

trichocarpa, these two names are used interchangeably in this text.

Putative GTF amino acid sequences from plants other than Arabidopsis and

poplar were identified using three different resources with Arabidopsis protein sequences

as queries. NCBI BLASTp (http://www.ncbi.nlm.nih.gov/BLAST/) was utilized to

search all available protein sequences using the Viridiplantae organism setting. H.

sapiens, D. melanogaster, and S. cerevisiae sequences were collected using text searches in NCBI Entrez (http://www.ncbi.nlm.nih.gov/Entrez/). TBP, TFIIB, and TFIIEα homologies can be traced back to Archaea (Rowlands et al., 1994; Qureshi et al., 1995; Bell et al., 2001); therefore, in the case of these proteins archaeal homologs were assembled as well.

The Institute for Genomic Research (TIGR) plant expressed sequence tag (EST) databases were searched using Arabidopsis sequence queries and a tBLASTn algorithm (http://tigrblast.tigr.org/tgi/). Only full-length EST contigs were considered, and the open reading frame (ORF) encoding the sequence with GTF similarity was translated using Java based Molecular Biologist’s Workbench 1.1 (JaMBW 1.1) (Toldo, 1997) TranslatER. Finally, the Plant Genome Database (PlantGDB; http://www.plantgdb.org/) was searched with Arabidopsis queries using BLASTp on protein sequences and tBLASTn on EST contigs (ORFs were translated as above). The sequences of predicted TATA-binding proteins from poplar were assembled as above by John Davis.
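The ORF translation step applied to EST contigs can be sketched as follows. This is a minimal illustration that assumes the standard genetic code and a simple "longest ATG-to-stop ORF in six frames" rule; it is not the JaMBW TranslatER implementation, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def longest_orf(dna):
    """Return the longest ATG-initiated, stop-terminated ORF (as protein)
    found in any of the six reading frames, or '' if none exists."""
    dna = dna.upper()
    best = ""
    for strand in (dna, revcomp(dna)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Drop the last segment: it is not terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best
```

For example, `longest_orf("CCATGGCTAAATGACC")` yields `"MAK"`, read from the third forward frame. Requiring a terminating stop codon is one simple way to approximate the restriction to full-length contigs.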

The collected amino acid sequences of each GTF homology group were aligned using the Gonnet weight matrix (Gonnet et al., 1994) in ClustalX (Thompson et al., 1997). Unaligned N-terminal or C-terminal sequence extensions were deleted by hand and the datasets realigned to conservatively estimate phylogenetic distances based on conserved protein regions. ClustalX alignment outputs were produced in Nexus format and analyzed by parsimony using PAUP* (Phylogenetic Analysis Using Parsimony *and Other Methods) with 500 bootstrap replicates (PAUP* analysis performed by Ram Kishore Alavalapati) (Swofford, 2003). Phylogenetic trees were created using

TreeView (Page, 1996).
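The two ideas in this paragraph, trimming terminal extensions and bootstrap resampling, can be illustrated with a short sketch. In this study the extensions were deleted by hand and the bootstrap analysis was run in PAUP*; the automated occupancy rule, the function names, and the toy alignment below are assumptions introduced only to show the principle.

```python
import random


def trim_terminal_extensions(alignment, min_occupancy=1.0):
    """Drop leading and trailing columns in which fewer than min_occupancy
    of the sequences carry a residue (anything other than '-')."""
    names = list(alignment)
    length = len(alignment[names[0]])

    def occupied(col):
        filled = sum(alignment[name][col] != "-" for name in names)
        return filled / len(names) >= min_occupancy

    kept = [col for col in range(length) if occupied(col)]
    if not kept:
        return {name: "" for name in names}
    first, last = kept[0], kept[-1]
    return {name: alignment[name][first:last + 1] for name in names}


def bootstrap_replicates(alignment, n_replicates=500, seed=0):
    """Yield pseudo-alignments built by sampling columns with replacement."""
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    for _ in range(n_replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in cols)
               for name in names}


# Toy alignment; sequences are invented for the example.
TOY = {
    "seqA": "MKT--ACDEFGH--",
    "seqB": "---LLACDEYGHKR",
    "seqC": "----MACDEFGH--",
}

if __name__ == "__main__":
    core = trim_terminal_extensions(TOY)
    print(core)  # the shared core block of each toy sequence
    print(next(bootstrap_replicates(core, n_replicates=1)))
```

Each replicate would be analyzed by the tree-building method of choice, and the fraction of replicates that recover a given branch is reported as its bootstrap support.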

Similarity and identity matrices were created using Matrix Global Alignment Tool

(MatGAT; http://www.angelfire.com/nj2/arabidopsis/MatGAT.html) with the blocks substitution matrix 62 (BLOSUM62) similarity matrix (Campanella et al., 2003).

Truncated sequences were used as input data for those cases in which protein sequences were shortened for ClustalX alignments.
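How percent identity and similarity can be derived from a global pairwise alignment is sketched below. This is a simplified stand-in, not MatGAT itself: the scoring scheme (+2 identity, +1 same residue group, -1 otherwise, gap -2), the residue groups used in place of BLOSUM62, and the choice of alignment length as the denominator are all assumptions made for illustration.

```python
# Simplified residue groups standing in for positive BLOSUM62 pairs (assumed).
GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def score(a, b):
    """+2 for identity, +1 for same residue group, -1 otherwise."""
    if a == b:
        return 2
    if a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b):
        return 1
    return -1


def global_align(s, t, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def identity_similarity(s, t):
    """Percent identity and percent similarity over the alignment length."""
    a, b = global_align(s, t)
    if not a:
        return 0.0, 0.0
    identical = sum(x == y for x, y in zip(a, b))
    similar = sum(x != "-" and y != "-" and score(x, y) > 0
                  for x, y in zip(a, b))
    return 100 * identical / len(a), 100 * similar / len(a)
```

With the toy peptides `"ACDEFGH"` and `"ACDEYGH"`, the single F/Y substitution lowers identity to 6 of 7 positions while similarity remains complete, which is why similarity values are always at least as high as identity values in matrices of this kind.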

Results

The amino acid sequences of the assembled GTFs are shown in Appendix A. The amino acid multiple sequence alignments for core domains of the GTFs (sans N-terminal and C-terminal extensions) are found in Appendix B. Similarity and identity ranges for protein families are found in Table 2-2. Phylogenetic trees derived from the parsimony analyses are found in Figures 2-1 through 2-12.

TFIIA Large and Small Subunits

The poplar TFIIA-L1 coding sequences predicted by the Softberry, GlimmerM (http://www.tigr.org/tdb/glimmerm/glmr_form.html) (Majoros et al., 2003), and Eukaryotic GeneMark.hmm (http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi) (Lukashin and Borodovsky, 1998) programs all had C-terminal extensions of varying lengths that did not have homology with any other group, and all were missing the last seven amino acids, which were highly conserved among all other organisms examined. However, further analysis using the GeneSeqer program (http://www.plantgdb.org/cgi-bin/PlantGDB/GeneSeqer/PlantGDBgs.cgi) (Schlueter et al., 2003), which used cDNA data from all plants and from Populus only, predicted a shorter C-terminus that was highly similar to those of other organisms and ended with the ATGEFEF plant consensus. Since this model was truncated at the N-terminus due to the paucity of 5’ cDNA sequences, it was merged with the Softberry FGENESH outputs from the Nicotiana tabacum and Arabidopsis thaliana settings to yield the final prediction of the poplar TFIIA-L1 coding sequence.

TFIIB Family

A large number of full-length cDNAs and predicted genes encoding TFIIB

homologs have been identified in plants (30 plant homologs in all). It appears that the

TFIIB protein-family has undergone many duplications as well as differentiations

including a novel homolog which has a functional connection to the plastid (Lagrange et

al., 2003). Arabidopsis and poplar both have four distinct phylogenetic TFIIB clusters

(Class A, Class C, Class D, and Class E). Clearly identifiable plant homologs of TFIIB-related factors (BRFs) associated with DNA-dependent RNA polymerase III (PolIII) were excluded from the phylogenetic analysis; however, it appears that distantly related homologs from Lycopersicon esculentum and Populus may have remained.

Plant TFIIB homologs appear to have many conserved motifs first identified in the metazoan TFIIB. Among these is a lysine residue that has recently been shown to be autoacetylated in human and yeast TFIIBs (Choi et al., 2003). This lysine is conserved in many members of the Class A TFIIB family (Figures 2-3 and 2-13). Similarly, the putative zinc-ribbon domain at the N-terminus has been conserved in most family members (Appendix B3). Although it is not apparent in Appendix B3 due to N-terminal trimming for alignment purposes, AtTFIIB4 also contains this conserved metal-binding domain. Poplar TFIIB3, poplar TFIIB9, poplar TFIIB8, Lycopersicon esculentum TFIIB

AF273333, and Sulfolobus solfataricus TFB AAK40772.1 are all missing the conserved cysteine and/or histidine residues essential to this N-terminal motif. Since a known functional TFB from the archaeon Sulfolobus solfataricus is lacking this motif, it may not be required for TFIIB function in all cases. Thus, poplar TFIIB3, poplar TFIIB9, poplar TFIIB8, and Lycopersicon esculentum TFIIB AF273333 may all be functional TFIIB homologs despite the absence of this conserved motif.

Likewise, the imperfect direct repeats of amino acid sequences found in human

core-TFIIB (Nikolov et al., 1995) have been well conserved (Appendix B3). AtTFIIB6 is

lacking the second direct repeat region. This region is involved in protein-protein

interactions with PolII (Ha et al., 1993). Therefore, it is suggested that the AtTFIIB6 protein may be deficient in this PolII interaction, and could possibly function as a

negative regulator. Vitis vinifera TFIIB TC9302, as well as amino acid predictions from

poplar TFIIB4, TFIIB5 and TFIIB6 are notably lacking both direct repeats suggesting

that these proteins, if expressed, are not functional TFIIB homologs since they would

likely be deficient in TBP and PolII interactions (Ha et al., 1993).

In addition to the canonical TFIIB proteins (Class A) and BRFs (Class B), the

plastid envelope associated (Class C) TFIIB-like proteins (Lagrange et al., 2003) are

conserved in all plant lines with available sequence. The Arabidopsis AtTFIIB5/pBrp

shows two closely related homologs in Lycopersicon esculentum (Accession AAG01118)

and poplar (TFIIB7/pBrp) and high relatedness to the partial cDNAs from Spinacia

oleracea and Zea mays reported by Lagrange et al. (2003).

Representative TFIID Components

TBP has been highly conserved in plants (Table 2-2 and Figure 2-4), even in cases where plants contain duplicate TBP genes. These duplicated TBP proteins are in all cases highly similar and are not likely to have diverged functions. Plant TBP genes are tightly clustered phylogenetically, although somewhat diverged from metazoan, fungal, protistan, and archaeal TBPs and TBP-like proteins. Significantly, both imperfect repeat motifs within the protein structure are conserved in all the TBP-like proteins.

Similarly to the case with TBP, A. thaliana has two loci that encode TAF6 homologs. Both of these genes (designated TAF6 and TAF6b) are transcribed and the latter has at least four alternative splicing variants (E. Czarnecka-Verner, S.J. Lawit,

W.B. Gurley unpublished data). Clones were sequenced by the University of Florida

ICBR DNA sequencing facility. Intron-Exon diagrams of TAF6 and TAF6b isoforms are shown in Figure 2-14.

The phylogenies of TAFs 6, 9, 10, and 11 cluster readily into monocot and dicot groups. The plant proteins are well conserved and somewhat divergent from metazoan, fungal and protistan proteins. However, poplar TAF9b is more closely related to the

Chlamydomonas reinhardtii TAF9 and S. cerevisiae TAF9 than the plant TAF9 proteins.

This TAF9 homolog is perhaps a TAF9-like protein involved in other transcriptional complexes similarly to H. sapiens TAF9L (Chen and Manley, 2003). Similarly, poplar

TAF11 is roughly equally related to plant and fungal family members suggesting that it represents a more ancient form of TAF11 than is found in other plants.

TFIIEα and TFIIEβ Subunits

Similarly to the TAF proteins, the TFIIEα phylogeny pattern also formed along dicot-monocot lines. The six dicot proteins and single monocot protein were diverged and branched separately with 100% bootstrap support. Interestingly, the archaeal TFE proteins were phylogenetically more similar to plant TFIIEα than to the yeast and metazoan counterparts. This suggests a higher degree of primary structure conservation in the plant TFIIEα proteins than in the other kingdoms, possibly indicating a greater reliance on TFIIE for stabilization of the open promoter conformation.

Monocot TFIIEβ proteins cluster closely with one another as do dicots with the exception of two proteins. These exceptions are the A. thaliana TFIIEβ family members.

Both proteins from Arabidopsis are found well outside of the core plant cluster suggesting a significant divergence in sequence in the Arabidopsis TFIIEβ genes.

TFIIFα and TFIIFβ Subunits

A. thaliana TFIIFα has an 87 amino acid C-terminal extension and S. cerevisiae

TFIIFα has a 68 amino acid C-terminal extension that were removed from their

sequences for alignment purposes. Due to the large size of the TFIIFα subunit, relatively

few full-length plant cDNAs could be assembled for comparison. Furthermore, due to a

lack of sequence in one region of a poplar TFIIF gene, a full-length genomic region could

not be assembled. Therefore, only one of two poplar putative TFIIFα proteins was

phylogenetically analyzed. These proteins are well conserved throughout the length of

the protein.

The N-terminus and C-terminus of S. cerevisiae TFIIFβ were trimmed by 32 and 33

amino acids, respectively, for alignment purposes. Similarly, D. melanogaster TFIIFβ

was trimmed at the C-terminus by nine amino acids. TFIIFβ proteins are likewise well

conserved, with two exceptions. Both A. thaliana TFIIFβ1 and poplar TFIIFβ1 have

large, non-conserved insertions. It should be noted that neither of these genes has cDNA clone representation, and both are therefore only predictions (despite numerous attempts at RT-PCR cloning of AtTFIIFβ1; data not shown).

Discussion

TFIIA Large and Small Subunits

TFIIA is composed of either two subunits (L and S) in fungi and plants or three

subunits (α, β, and γ) in metazoans where α and β are derived from post-translational

cleavage of a protein homologous to fungal and plant TFIIA-L subunit (Li et al., 1999).

TFIIA interacts with both the N-terminal stirrup of TBP and DNA upstream of the TATA element (Langelier et al., 2001). TFIIA in A. thaliana seems to be encoded by four

genes, three encoding large subunit homologs and one encoding a small subunit homolog.

AtTFIIA-L1 and AtTFIIA-L2 appear to result from a recent gene-duplication due to their

high degree of identity and their close juxtaposition in chromosomal location (Fig. 2-2).

The genes encoding TFIIA-L1 and TFIIA-L2 are oriented in a tail-to-tail fashion with

their polyadenylation sites separated by 1,922 bp. AtTFIIA-L3 appears to have arisen

from a more ancient gene duplication, and has significantly diverged from other TFIIA-L

genes (Figure 2-2). The AtTFIIA-L3 protein is approximately half the size of its two

Arabidopsis homologs, although it has maintained a similar isoelectric point (pI) and

appears to be competent for assembly of the TFIIA complex based on yeast two-hybrid

interactions (Chapter 4). One hypothesis is that AtTFIIA-L3 represents an ancestral form

of the TFIIA-L protein family due to its phylogenetic clustering with fungal and

metazoan sequences.

Poplar TFIIA also appears to be encoded by four genes (two encoding large subunit

proteins, and two encoding small subunit proteins). Unfortunately, two contigs that most

likely encode one of the TFIIA-L genes could not be connected due to the presence of a

large, low-complexity (most likely intronic) region; however, the predicted TFIIA-L amino acid sequence grouped solidly with other dicot TFIIA-L sequences (data not shown). The predicted amino acid sequence from the incomplete TFIIA-L sequence has a high degree of identity with the complete form (Fig. 2-2), suggesting a recent gene duplication and redundancy in function. Interestingly, poplar has one small subunit that is highly conserved (TFIIA-S1; Fig. 2-1) and one that is quite diverged from other plant proteins (TFIIA-S2), nearly equidistant from other plant proteins and metazoan proteins.

Arabidopsis and poplar both encode TFIIA proteins that are highly diverged from

other plant proteins (a TFIIA-L3, and a TFIIA-S2, respectively). These proteins may

have evolved specialized functions within their respective organisms; however, they do not seem to be conserved within other plants that have been sequenced. This suggests that these proteins are potentially the products of evolutionary experiments in progress or may be ancestral forms of these proteins that have been retained in their respective species. However, the significance of their presence cannot be reliably predicted.

Overall, TFIIA conservation appears to be quite high. The TFIIA-S family is conserved throughout the length of the protein, while the TFIIA-L family is conserved mainly at the

N-terminal and C-terminal ends. The TFIIA-L sequence conservation pattern is consistent with the observation that human and fruit fly TFIIA is composed of three subunits, the two largest of which are derived from proteolytic cleavage of the TFIIAα/β

(TFIIA-L) pre-protein. This suggests that the middle region of TFIIA-L proteins may function as a flexible linker.

TFIIB Family

Full-length cDNAs or predicted genes encoding 30 TFIIB homologs have been identified in plants.

The TFIIB protein-family has undergone myriad duplications and differentiations including one (the Class C TFIIB-related proteins) that has evolved a functional interaction with the defining plant organelle, the plastid (Lagrange et al., 2003). The canonical member of the Class C group (TFIIB5/pBrp) was discovered by Lagrange et al.

(2003) to bind the outer envelope of the plastid, suggesting a function in signal

transduction from the plastid to the nucleus. Six distinct phylogenetic TFIIB groups are

apparent in Arabidopsis and Populus (if one accounts for the BRFs). Clear homologs of

DNA-dependent RNA polymerase III (PolIII) associated TFIIB-related factors (BRFs) from plants were excluded from my phylogenetic analysis with the exception of the

Arabidopsis proteins for use as an out-group.

Plant TFIIB homologs have a number of conserved motifs. These include a lysine residue, located 28 amino acids from the N-terminus of the second direct repeat (in the human sequence), which has recently been shown to be autoacetylated in human and yeast TFIIBs (Choi et al., 2003). This lysine is conserved in many members of the plant

Class A TFIIB family (Figures 2-3 and 2-13) suggesting a conservation of this autoacetylation activity in plants. Choi et al. (2003) did not identify the catalytic domain of this autoacetylase activity; therefore, the conservation of this domain could not be assessed. Choi et al. (2003) reported that the presence of the acetyllysine group in TFIIB increases the affinity of this protein for TFIIF, implying a role in transcriptional initiation. It is likely that an activity involved in this critical process will be conserved not only in metazoans and fungi, but also in plants. Equally significant is the absence of this lysine in several of the plant TFIIBs, suggesting plant-specific specialization among members of the TFIIB family.

Similarly, the putative zinc-ribbon domain at the N-terminus has been conserved in most family members (Appendix B3), including AtTFIIB4, although this is not apparent in Appendix B3 due to N-terminal trimming. Significantly, poplar TFIIB3, poplar TFIIB8, poplar

TFIIB9, Lycopersicon esculentum TFIIB AF273333, and Sulfolobus solfataricus TFB

AAK40772.1 are all missing the conserved cysteine and/or histidine residues essential to

this N-terminal motif. However, at least one archaeal species (Sulfolobus solfataricus) is

lacking the zinc-ribbon in its TFB suggesting that this motif may not be required for

TFIIB function in all cases.

Another conserved feature, the imperfect direct repeats (Nikolov et al., 1995), is found in most plant TFIIB homologs (Appendix B3). AtTFIIB6 is lacking the second direct repeat region, which interacts directly with PolII in animals (Ha et al., 1993). Therefore, it is suggested that this protein may be deficient in this PolII interaction and, if it is functional, could possibly play a role as a negative regulator. Four TFIIB-related proteins are lacking both direct repeats, suggesting that these proteins, if

expressed, are not functional TFIIB homologs (Ha et al., 1993).

In addition to the TFIIB-family proteins in Class A and Class B (BRFs, which were not analyzed extensively in this study), a clear conservation has been shown for the

plastid envelope associated (Class C) TFIIB-like proteins in this study and by Lagrange et al. (2003). This plant-specific TFIIB is localized to the outer plastid membrane and is not detectable in the nucleus of wild type plants (Lagrange et al., 2003). The

characterized protein AtTFIIB5/pBrp has two closely related homologs (Lycopersicon

esculentum Accession AAG01118, and poplar TFIIB7/pBrp) in addition to the partial

cDNAs from Spinacia oleracea and Zea mays reported by Lagrange et al. (2003). This

suggests that this protein has a conserved activity that is critical to plant cell functions.

Lagrange et al. (2003) suggested a functional model in which a plastid-derived signal

leads to release of the TFIIB5 protein from the outer envelope and movement into the

nucleus. Once in the nucleus, TFIIB5 induces expression of its unique transcriptosome.

The nuclear accumulation of TFIIB5 in plants deficient in COP9 suggests proteasome-mediated degradation provides a rapid turnover of the nuclear-localized TFIIB5 protein, facilitating temporal control of the signal response. However, this model does not

explain the lack of TFIIB5/pBrp protein on the plastid under conditions of

proteasome/COP9 signalosome dysfunction.

In the present study, a model is proposed in which the COP9 signalosome leads to degradation of a TFIIB5/pBrp co-factor that allows the protein to move to the nucleus. Such a co-factor may be a chaperone that escorts TFIIB5/pBrp to the nucleus, a chloroplast-docking antagonist, or a post-translational modifying regulator (e.g., a kinase or a 14-3-3 protein). In any of these cases, TFIIB5/pBrp transport to the nucleus and induction of

transcription is likely to be the culmination of this signal transduction pathway.

Whatever the role of TFIIB5/pBrp, the Class C subfamily of TFIIB-like factors plays a

novel role in plant transcriptional regulation.

There is weak bootstrap support for two additional conserved classes of TFIIB-like

proteins in plants. The Class D group contains Arabidopsis TFIIB4 and poplar TFIIB8. Class E contains Arabidopsis TFIIB3 and TFIIB6, as well as poplar TFIIB2. The functions of these proteins are unknown; however, Arabidopsis TFIIB3 and TFIIB6 have

similar interactions with other GTFs (Chapter 4).

Representative TFIID Components

TBP is widely regarded as being the rate-limiting factor of PIC formation.

Consistent with this critical role, it is among the most highly conserved proteins of the

GTFs, through all the organisms examined in this study, with 73.7% and 63.1% average similarity and identity, respectively. Likewise, TBP is highly conserved in plants (84% average identity). Similarly to animals, many plants contain duplicate TBP genes; however, unlike the case in animals the plant proteins are highly similar and are likely to

be largely redundant. Plant TBPs are tightly clustered, although significantly diverged

from metazoan, fungal, protist, and archaeal TBPs and TBP-like proteins. As would be expected, the two repeated structural domains of TBP are conserved in all the proteins in the TBP-like family.

In general, TBP-associated factors are more highly variable than TBP. There are

many cases of duplicate TAFs as well as TAF-like proteins in fungi and animals

(reviewed in Tora, 2002). One example of this in plants is the presence of two genetic loci encoding homologs of TAF6 in Arabidopsis. Upon cloning of these cDNAs

(Chapter 3) it was found that one of these genes, TAF6b, is represented by four alternatively spliced mRNAs (Figure 2-14). TAF6b has 12 coding exons in the mRNAs of three isoforms, and TAF6b-4 has only 5 coding exons due to what appears to be a premature stop codon in exon V caused by the lack of splicing of intron II. In contrast,

TAF6 has 11 coding exons and no detected alternative splicing.

The TAF genes that have been investigated in this work are clearly divergent along taxonomic lines. This is demonstrated by TAF9, for which monocot and dicot

TAF9 sequences cluster separately in the unrooted phylogram (Figure 2-6). This situation is also evident in the TAF6 phylogram (Figure 2-5). However, poplar TAF9b is more closely related to the C. reinhardtii TAF9 and S. cerevisiae TAF9 than the monocot or dicot TAF9 proteins. This TAF9 homolog is perhaps a TAF9-like protein involved in other transcriptional complexes similarly to H. sapiens TAF9L (Chen and Manley, 2003).

A second possibility is that poplar TAF9b may be a bona fide TAF9 that regulates a subset of genes in poplar, or is merely redundant. Finally, poplar TAF9b could represent an ancient form of the gene that has been maintained in this lineage.

Similarly to the situation with TAF9, TAF10 proteins are plainly grouped as either

monocots or dicots with other kingdoms clustering separately. One gymnosperm (pine)

TAF10 protein was included in this analysis, and it was found to be more similar to the

TAF10 proteins of dicots than those of monocots, consistent with the more recent

evolution of monocots.

TAF11 proteins are similarly clustered in the phylogram, with the exception of the

protein encoded by poplar TAF11. Poplar TAF11 is equally similar to yeast and plant

TAF11 proteins. Interestingly, AtTAF11 is located on Arabidopsis chromosome 4 only

five loci away from TFIIEβ2 (~27 kbp) that is in-turn located very near TFIIEα2 (see

TFIIE section below). Although TAF11 has no known direct-connection to TFIIE, this close genomic proximity seems unusually coincidental. Chromosome 3 of Oryza sativa

(rice) has genes encoding both TAF11-like and TFIIEβ-like protein; however, these

genes are separated by over 8 Mb. Unfortunately, fine mapping has not yet been performed for sequences from any dicot except Arabidopsis; therefore, comparison of GTF synteny within dicots is not yet possible.

Arabidopsis TAF11b does not have representation in EST collections, nor has it been amplified by RT-PCR. These data suggest that AtTAF11b may be a non-expressed or very weakly expressed gene. This hypothesis is supported by the evolutionary

divergence of the protein sequence in relation to other plant TAF11 amino acid

sequences. However, the sequence of AtTAF11 is actually more similar to other plant

sequences than the putative poplar TAF11 protein.

TFIIEα and TFIIEβ Subunits

A. thaliana has genes encoding three homologs of TFIIEα and two of TFIIEβ

(Table 2-1). TFIIEα and TFIIEβ of H. sapiens are acidic (pI of 4.5) and basic (pI of 9.5)

proteins, respectively (Peterson et al., 1991). The acidic properties of TFIIEα appear to

be well conserved in Arabidopsis with pI values of 4.75, 4.95, and 4.72 for Eα1, Eα2,

and Eα3, respectively. Likewise, the basic pI values are conserved in Arabidopsis

TFIIEβ proteins (10.23 and 10.04 respectively for Eβ1 and Eβ2).
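For readers unfamiliar with how theoretical pI values of this kind are obtained, the following minimal sketch estimates pI by finding the pH at which the net charge computed from the Henderson-Hasselbalch equation is zero. The pKa values are illustrative assumptions (published scales differ slightly), the peptides in the usage note are invented, and the tool actually used to compute the pI values reported in this chapter is not stated here, so the numbers produced by this sketch should not be expected to match Table 2-1.

```python
# Illustrative side-chain and terminal pKa values (assumed; scales differ).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "N_TERM": 9.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_TERM": 2.0}


def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "N_TERM" else seq.count(group)
        charge += count / (1 + 10 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "C_TERM" else seq.count(group)
        charge -= count / (1 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """Bisection search for the pH at which the net charge crosses zero.
    Net charge decreases monotonically with pH, so bisection is valid."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies above mid
        else:
            high = mid  # neutral or negative: pI lies at or below mid
    return round((low + high) / 2, 2)
```

As a usage note, a hypothetical peptide rich in aspartate and glutamate (for example `"DDDEEEDDEE"`) gives a strongly acidic estimate, whereas one rich in lysine and arginine (for example `"KKKRRRKKRR"`) gives a strongly basic estimate, mirroring the acidic TFIIEα versus basic TFIIEβ contrast described above.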

Four of these Arabidopsis TFIIE genes are clustered on chromosome 4. TFIIEα2 and TFIIEβ2 neighbor each other in a head to head inverted fashion sharing a common promoter region. TFIIEα3 and TFIIEβ1 are in relatively close proximity both in the

same orientation (18 genetic loci inserted between the genes, 83 kbp apart). The extreme proximity (only 972 bp between start codons) of TFIIEα2 and TFIIEβ2 suggests that they are direct descendants of the ancestral genes in plants and have been duplicated to create

the other loci. This hypothesis is supported by phylogenetic data in the case of TFIIEβ2 in which the Arabidopsis protein (along with the gene product of TFIIEβ1) is clustered

separately from all other TFIIEβ proteins. However, the TFIIEα2 protein clusters with

the other Arabidopsis TFIIEα sequences, within the dicot grouping.

TFIIFα Family

The poplar genome clearly encodes two TFIIFα genes; however, two contigs

encoding what appears to be the N-terminal and C-terminal regions could not be

connected due to lack of sequence in what appears to be a low-complexity intronic

region. Therefore, only one poplar putative-TFIIFα amino acid sequence is included in

my analyses.

TFIIFα is highly conserved throughout the length of the primary structure. In

metazoans and yeast, TFIIFα can be functionally divided into three domains: the N-

terminal TFIIFβ binding domain, the highly charged middle domain, and the C-terminal

winged helix domain (Kamada et al., 2001). The high conservation of the plant TFIIFα primary structures carries through to the hydrophobic residues in the C-termini

(Appendix B). Therefore, the conclusions of Kamada et al. (2001) are followed, suggesting that these proteins have a conserved winged helix domain in their C-termini.

This winged helix domain is not yet implicated in DNA-binding as are the winged helix domains of TFIIEβ, TFIIFβ, and many of the winged helix superfamily members

(Kamada et al., 2001).

TFIIFα has been reported to contain a serine/threonine kinase activity that is

involved in transcriptional elongation (Rossignol et al., 1999). However, Rossignol and

co-workers (1999) were unable to find an identifiable ATPase domain in TFIIFα.

Interestingly, I have identified a weak similarity with AAA ATPase VWA-containing

proteins in all the plant TFIIFα homologs studied within this work, suggesting that this

kinase activity is retained in plants.

TFIIFβ Family

The poplar genomic sequence has four coding regions with homology to the

Arabidopsis TFIIFβ proteins. However, only three of the poplar contigs created from the

genomic sequences appear to be transcribed into RNA. The fourth sequence (analyzed

using BLASTx) (Altschul et al., 1997) does not support an mRNA prediction of

reasonable length and encodes a stop codon within a highly conserved region of the predicted amino acid sequence. Thus, this fourth contig region is most likely a remnant of a non-functional gene duplication.

TFIIFβ proteins are highly conserved, except for large, non-conserved insertions in

A. thaliana TFIIFβ1 and poplar TFIIFβ1. It should be noted that neither of these genes has cDNA clone representation, and both are therefore only predictions. Despite the lack of cDNA support for these genes, the insertions seem to be grouped between conserved functional domains identified by Tan et al. (1995). Therefore, these insertions may be in flexible linker regions between more defined structural domains, suggesting that these gene products, if expressed, may be functional.

The tight clustering of GTF proteins (TAF6, TAF9, TAF10, TAF11, TFIIEα, and

TFIIEβ) along evolutionary lines suggests that these proteins are evolving with the species that encode them in their genomes and have functions that are somewhat resistant to minor sequence variations. This may be due to linker regions in proteins that have little need for conservation of sequence. Conversely, the lack of obvious clustering of proteins within evolutionary groupings below the kingdom level (such as in TFIIB and TBP) suggests that these proteins are very tightly conserved and that minor sequence alterations may drastically disrupt function. Thus, in the case of TFIIB and TBP, it appears likely that the conservation is so tight (at least within subclasses) that very few changes have occurred below the kingdom level, and phylogenetic analysis therefore cannot reliably resolve subgroups. Three plant-specific clusters of TFIIB-like proteins have been identified. At least one of these has evidence of a plant-specific function due to its intracellular localization to the plastid outer membrane under normal conditions

(Lagrange et al., 2003).


Table 2-1. Arabidopsis GTF genes, loci, genomic sizes, coding sequence sizes (counting stop codons), predicted protein molecular weights, and pI of the predicted proteins.

Gene        Locus       Genomic Size   CDS Size   Predicted   Predicted
                        of CDS (bp)    (bp)       Mw (kDa)    pI
TFIIA-S     At4g24440   1,094          321        12.1        5.61
TFIIA-L1    At1g07480   2,510          1,128      41.3        3.98
TFIIA-L2    At1g07470   2,628          1,128      41.2        4.02
TFIIA-L3    At5g59230   825            561        20.9        3.94
TFIIB1      At2g41630   1,846          939        34.3        6.77
TFIIB2      At3g10330   1,736          939        34.2        6.66
TFIIB3      At3g29380   1,118          1,011      37.7        6.32
TFIIB4      At3g57370   1,637          1,083      39.7        7.76
TFIIB5      At4g36650   2,245          1,512      55.7        6.14
TFIIB6      At4g10680   549            549        19.9        8.89
TBP1        At3g13445   1,453          603        22.4        10.21
TBP2        At1g55520   1,334          603        22.4        10.31
TAF1        At1g32750   9,877          5,760      217.2       5.55
TAF1b       At3g19040   8,107          5,103      192.1       7.65
TAF2        At1g73960   9,201          4,113      153.5       6.18
TAF4        At5g43130   5,537          2,163      80.6        9.34
TAF4b       At1g27720   4,026          1,854      68.9        9.84
TAF5        At5g25150   4,942          2,010      74.4        6.65
TAF6        At1g04950   3,508          1,584      58.9        8.73
TAF6b1      At1g54360   2,562          1,515      56.5        8.83
TAF6b2      At1g54360   2,562          1,494      55.7        8.84
TAF6b3      At1g54360   2,562          1,431      53.1        8.56
TAF6b4      At1g54360   857            588        22.6        9.62
TAF7        At1g55300   1,321          612        22.5        4.11
TAF8        At4g34340   1,062          1,062      39.5        4.96
TAF9        At1g54140   794            552        20.6        4.67
TAF10       At4g31720   1,482          405        14.9        5.50
TAF11       At4g20280   873            633        23.7        5.39
TAF11b      At1g20000   769            615        23.4        9.59
TAF12       At3g10070   2,604          1,620      57.7        10.42
TAF12b      At1g17440   3,257          2,052      74.8        10.46
TAF13       At1g02680   836            384        14.3        5.81
TAF14       At2g18000   1,025          609        22.8        6.16
TAF14b      At5g45600   1,608          807        30.2        7.03
TAF15       At1g50300   2,845          1,119      41.3        7.98
TAF15b-1    At5g58470   2,131          1,164      38.9        7.93
TAF15b-2    At5g58470   2,289          1,269      42.3        8.73
TFIIEα1     At1g03280   3,124          1,440      54.1        4.75
TFIIEα2     At4g20340   2,738          1,428      54.6        4.95
TFIIEα3     At4g20810   2,171          1,251      47.8        4.72
TFIIEβ1     At4g21010   1,047          828        31.5        10.23
TFIIEβ2     At4g20330   1,357          861        32.4        10.04
TFIIFα      At4g12610   3,349          1,950      72.3        5.22
TFIIFβ1     At3g52270   1,708          1,095      42.1        7.70
TFIIFβ2     At1g75510   1,376          761        29.7        6.92
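Because the CDS sizes in Table 2-1 count the stop codon, the length of each predicted protein follows directly from the CDS size. The short Python sketch below illustrates that arithmetic; the function name is hypothetical and is not part of the original analysis.

```python
def residues_from_cds(cds_bp: int) -> int:
    """Amino acids encoded by a CDS whose length includes the stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    # One codon per residue, minus the terminal stop codon.
    return cds_bp // 3 - 1


# CDS sizes taken from Table 2-1
for gene, cds_bp in [("TFIIA-S", 321), ("TAF12", 1620)]:
    print(gene, residues_from_cds(cds_bp), "aa")
```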


Table 2-2: Similarity and identity percentage ranges of the GTF protein families examined.

Protein Family   Similarity Range      Identity Range        Similarity Range      Identity Range
                 (Average)             (Average)             within Plants         within Plants
                                                             (Average)             (Average)
TFIIA-S          57.4 – 100.0 (83.5)   36.2 – 100.0 (72.2)   62.5 – 100.0 (90.1)   46.7 – 100.0 (83.4)
TFIIA-L          23.8 – 97.3 (49.4)    14.8 – 96.5 (34.5)    33.6 – 97.3 (63.2)    21.5 – 96.5 (50.5)
TFIIB (all)      13.2 – 99.7 (46.0)    8.6 – 99.4 (33.2)     16.3 – 99.7 (50.4)    10.0 – 99.4 (37.8)
TFIIB Class A    21.7 – 99.7 (58.3)    10.0 – 99.4 (46.8)    24.1 – 99.7 (63.3)    10.0 – 99.4 (54.7)
TFIIB Class C    88.4 – 91.2 (89.9)    80.2 – 86.3 (82.7)    88.4 – 91.2 (89.9)    80.2 – 86.3 (82.7)
TBP              31.4 – 99.5 (73.7)    16.5 – 98.5 (63.1)    61.5 – 99.5 (88.2)    52.8 – 98.5 (84.0)
TAF6             15.5 – 98.6 (49.8)    10.2 – 98.6 (35.2)    20.0 – 98.6 (59.3)    15.2 – 98.6 (47.5)
TAF9             22 – 98.4 (52.7)      12.1 – 96.2 (40.1)    24.1 – 98.4 (61.8)    13.2 – 96.2 (50.7)
TAF10            47.3 – 98.7 (71.5)    22.0 – 98.0 (57.3)    66.9 – 98.7 (82.5)    49.7 – 98.0 (71.6)
TAF11            21.4 – 97.2 (51.1)    10.6 – 96.3 (35.5)    31.3 – 97.2 (61.6)    17.2 – 96.3 (47.1)
TFIIEα           15.4 – 85.0 (42.6)    8.8 – 73.5 (28.3)     53.3 – 85.0 (69.1)    39.6 – 73.5 (53.5)
TFIIEβ           36.3 – 100.0 (69.6)   16.8 – 99.6 (53.4)    67.4 – 100.0 (80.7)   48.6 – 99.6 (67.1)
TFIIFα           29.1 – 94.9 (49.2)    15.1 – 89.1 (33.4)    70.8 – 94.9 (76.2)    58.5 – 89.1 (65.3)
TFIIFβ           35.4 – 100 (62.3)     18.8 – 99.6 (46.1)    48.9 – 100 (74.4)     34.8 – 99.6 (60.2)

Figure 2-1: Unrooted phylogram of TFIIA small subunit proteins from plants, humans, fruit flies, and yeast. Bootstrap percentage support values are shown on branches. Scale bar = 1 amino acid change.

Figure 2-2: Unrooted phylogram of TFIIA large subunit proteins from plants, humans, fruit flies, and yeast. Bootstrap percentage support values are shown on branches. Scale bar = 10 amino acid changes.

Figure 2-3: Unrooted phylogram of TFIIB-related proteins from plants, humans, fruit flies, yeast, and Archaea. Bootstrap percentage support values are shown on branches. Archetypical TFIIB proteins (Class A) are enclosed in blue boxes, PolIII-associated TFIIIB-related factors (Class B) are in a red box, the plastid-associated TFIIBs (Class C) are in a green box, Class D proteins are in a gold box, and Class E proteins are in an orange box. Scale bar = 10 amino acid changes.

Figure 2-4: Unrooted phylogram of TBP-related proteins from plants, humans, fruit flies, yeast, and Archaea. Bootstrap percentage support values are shown on branches. Scale bar = 10 amino acid changes.

Figure 2-5: Unrooted phylogram of TAF6-related proteins from plants, humans, fruit flies, and yeast. Bootstrap percentage support values are shown on branches. Dicot TAF6 proteins are grouped in a green box, monocot proteins are grouped in a gold box. Scale bar = 10 amino acid changes.

Figure 2-6: Unrooted phylogram of TAF9-related proteins from plants, humans, fruit flies, and yeast. Bootstrap percentage support values are shown on branches. Dicot TAF9 proteins are grouped in a green box, monocot proteins are grouped in a gold box. Scale bar = 10 amino acid changes.

Figure 2-7: Unrooted phylogram of TAF10-related proteins from plants, humans, fruit flies, and yeast. Bootstrap percentage support values are shown on branches. Dicot TAF10 proteins are grouped in a green box, monocot proteins are grouped in a gold box. Scale bar = 10 amino acid changes.

Figure 2-8: Unrooted phylogram of TAF11-related proteins from plants, humans, fruit flies, and yeast. Bootstrap percentage support values are shown on branches. Dicot TAF11 proteins are grouped in a green box, monocot proteins are grouped in a gold box. Scale bar = 10 amino acid changes.

Figure 2-9: Unrooted phylogram of TFIIEα-related proteins from plants, humans, fruit flies, yeast, and Archaea. Bootstrap percentage support values are shown on branches. Dicot TFIIEα proteins are grouped in a green box. Scale bar = 10 amino acid changes.

Figure 2-10: Unrooted phylogram of TFIIEβ-related proteins from plants, humans, fruit flies, and yeast. Bootstrap percentage support values are shown on branches. Monocot TFIIEβ proteins are grouped in a gold box. Scale bar = 10 amino acid changes.

Figure 2-11: Unrooted phylogram of TFIIFα-related proteins from plants, humans, fruit flies, and yeast. Bootstrap percentage support values are shown on branches. Scale bar = 10 amino acid changes.

Figure 2-12: Unrooted phylogram of TFIIFβ-related proteins from plants, humans, fruit flies, and yeast. Bootstrap percentage support values are shown on branches. Dicot TFIIFβ proteins are grouped in green boxes, monocot proteins are grouped in a gold box. Scale bar = 10 amino acid changes.


Arabidopsis_thaliana_BRF1_At2g45100         VLTATHIIASMKRDWMQT
Arabidopsis_thaliana_BRF2_At3g09360         VATARDIIASMKRDWIQT
Arabidopsis_thaliana_BRF3_At2g01280         ANTAKNIISSMKRDWIQT
Drosophila_melanogaster_BRF_AAF72065        SMTALRIVQRMKKDCMHS
Homo_sapiens_BRF_NP_001510.2                SMTALRLLQRMKRDWMHT
Saccharomyces_cerevisiae_BRF_NP_011762.1    VKDAVKLAQRMSKDWMFE
Populus_balsamifera_TFIIB7/pBrp             QELATHIGEVVINKCFCT
Arabidopsis_thaliana_TFIIB5_At4g36650       QELATHIGEVVINKCFCT
Lycopersicon_esculentum_AAG01118            QELATHIGEVIINKCFCT
Populus_balsamifera_TFIIB6                  ------
Populus_balsamifera_TFIIB4                  LMKV------
Populus_balsamifera_TFIIB5                  IK------
Oryza_sativa_TFIIB1_AF464908                VKAAQEAVQR-SEELDIR
Triticum_aestivum_TC68795                   VKAAQEAVQR-SEELDIR
Mesembryanthemum_crystallinum_TFIIB_TC5895  MKAAQEAVQK-SEEIDIR
Vitis_vinifera_TFIIB_TC19782                VKAAQEAVQK-SEEFDIR
Populus_balsamifera_TFIIB1                  VKAATEAVKT-SEQFDIR
Citrus_sinensis_TFIIB_CB292941              VKAAQEAVQK-SEEFDIR
Glycine_max_TFIIB_U31097                    VKAAQEAVQK-SEEFDIR
Medicago_truncatula_TC86832                 VKAAQESVQK-SEEFDIR
Arabidopsis_thaliana_TFIIB2_At3g10330       VKAAQESVQK-SEEFDIR
Arabidopsis_thaliana_TFIIB1_At2g41630       VKAAQEAVQK-SEEFDIR
Lycopersicon_esculentum_TFIIB_TC124975      IKVVQETVQK-AEEFDIR
Solanum_tuberosum_TFIIB1_TC58701            IKVVQETVQK-AEEFDIR
Populus_balsamifera_TFIIB3                  VKAATEAVKT-SEQFDIR
Populus_balsamifera_TFIIB8                  ------ELKRDG-
Arabidopsis_thaliana_TFIIB4_At3g57370       VEAALEAAESYDYMTNGR
Populus_balsamifera_TFIIB2                  VKAVHEAVEK-IQDVDIR
Oryza_sativa_TFIIB2_AAN59779                VREAQRAAQTLEDKLDVR
Drosophila_melanogaster_TFIIB_NM_057540     QRAATHIAKK-AVEMDIV
Homo_sapiens_TFIIB_NM_001514                QMAATHIARK-AVELDLV
Arabidopsis_thaliana_TFIIB3_At3g29380       IMAIPEAVEK-AENFDIR
Arabidopsis_thaliana_TFIIB6_At4g10680       ------
Saccharomyces_cerevisiae_TFIIB_M81380       TTSAEYTAKKCKEIKEIA
Methanosarcina_acetivorans_TFB_NP_615574.1  QSKSVEILRQ-ASEKELT
Sulfolobus_solfataricus_TFB_AAK40772.1      MKTAAEIIDK-AKGSGLT
Populus_balsamifera_TFIIB9                  -NPDGDLIQGFEIIETMA
Arabidopsis_thaliana_BRF4_At4g35540         VRTDGFCVEDLVMDCLSK
Lycopersicon_esculentum_TFIIB_AF273333      DIISLNVLANTHSNTMQI

Figure 2-13. Multiple sequence alignment of the TFIIB region containing the conserved lysine residue that is acetylated in human and yeast TFIIB (in green). The poplar TFIIB1 and TFIIB3 predicted amino acid sequences have a lysine one amino acid off register (in blue) that might be autoacetylated.


Figure 2-14: Exon-intron diagrams of Arabidopsis TAF6 and TAF6b alternative splicing forms. Exons are depicted as orange boxes and introns as black lines. Blue boxes show exons that are modified in different clones due to differential splicing. A. TAF6. B. TAF6b. Nucleotide annotations (positions of the AUG and stop codons) are relative to the genomic sequence. TAF6b forms are, from top to bottom, TAF6b-1, TAF6b-4, TAF6b-2, and TAF6b-3.

CHAPTER 3 BINARY PROTEIN-PROTEIN INTERACTIONS OF THE Arabidopsis thaliana GENERAL TRANSCRIPTION FACTOR IID

Introduction

The existence of the two Arabidopsis thaliana TBP proteins has been accepted for

many years (Gasch et al., 1990; Nikolov et al., 1992; Chasman et al., 1993; Heard et al.,

1993; Kim et al., 1993a; Nikolov and Burley, 1994; Rowlands et al., 1994); however, there has been very little examination of other GTFs in plants. In fact, no TBP-associated factor (TAF) from plants has been reported as cloned and biochemically verified as a TAF. With the advent of genome sequencing, gene annotation has often been performed based on homology alone, and with sufficient similarity, gene function is often assumed. TAFs, however, despite frequent sequence similarity, are defined by their association with TBP. If a protein is not found in TFIID

(with TBP), it technically is not a TAF despite sequence similarity with bona fide TAFs.

There are examples of human and Drosophila proteins that are not part of TFIID yet are very similar in sequence to TAFs and are designated “TAF-like” (Tora, 2002). Homo sapiens has TAF5L, TAF6L, TAF7L, TAF9L, and TAF11L (Tora, 2002); TAF5L and

TAF6L are both found in the p300/CBP-associated factor (PCAF) complex (as PAF65β and PAF65α, respectively). Without a direct demonstration of integration into the TFIID

complex, the next best experiment is to test protein-protein interactions with putative

TAF proteins and TBP. The yeast two-hybrid system was chosen to undertake a comprehensive analysis of the Arabidopsis TFIID protein-protein interactions.


The Stargell laboratory has recently conducted a similar analysis with the proteins of yeast TFIID (Yatherajam et al., 2003). This group found that 17% of the potential interactions tested resulted in a strong to intermediate growth response and that another 8% demonstrated a potential weak interaction (Yatherajam et al., 2003). Significantly, this study was unable to reproduce some interactions that were previously characterized by other experimental methods. These unconfirmed interactions included TBP-TAF1,

TAF1-TAF2, and TBP-TAF12, among others (Yatherajam et al., 2003). In fact, none of the TBP-TAF interactions was reproduced, with the exception of TBP-TAF7 (Yatherajam et al., 2003). Yatherajam et al. (2003) suggested either that this observation was the result of less extensive TBP-TAF interactions than previously thought or that some necessary TAF truncations removed TBP-interaction domains.

Arabidopsis homologs of all TAFs, other than TAF3, have been identified (Chapter

2, Table 2-1). These putative TAFs, as well as TBP1 and TBP2, were cloned into the

Gateway cloning system (Invitrogen, USA), and subsequently into MATCHMAKER III yeast two-hybrid vectors (Clontech, USA). These clones were transformed into yeast and protein-protein interaction tests were performed. Of the 720 combinations tested, 102 (14.2%) were positive.

Materials and Methods

Total RNA was extracted from A. thaliana ecotype Columbia suspension cells

(kindly provided by Robert Ferl, University of Florida) using Plant RNeasy (Qiagen,

USA) with the additional on-column DNase treatment (Qiagen, USA). RNA extraction was performed according to the manufacturer’s protocols. First strand cDNA synthesis


was performed with 1 µg of total Arabidopsis RNA with Superscript II reverse

transcriptase (Invitrogen) following the manufacturer’s protocol.

Primers compatible with the pENTR/D-Topo vector (Invitrogen) were designed to

amplify the coding sequences (CDS) of the identified proteins (Table 3-1). Primer

designs targeted a melting temperature of 65°C with a G or C 3’ end nucleotide preceded

by an A or T. Primers were ordered at the 10 nmole synthesis level with standard

preparation from Invitrogen. Primers were resuspended at 20 pmol/µl in TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0). Blunt-end polymerase chain reaction (PCR) products were created using the high-fidelity enzyme Platinum PFX (Invitrogen), 1x PCR Optimizer (included with Platinum PFX), Superscript II product derived from 30 ng of RNA, 35 amplification cycles, and an annealing temperature of 52°C. Elongation times varied by template using 1 min/kb + ~15 s. The PCR program ended with an 8 min elongation step followed by a hold at 4°C. TAF12b was subcloned from the Arabidopsis

Biological Resource Center (ABRC) clone U11077 because PCR product was not obtained from the cDNA preparation. All PCR reactions were carried out using an

Eppendorf Mastercycler Personal thermocycler (Brinkmann).
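The primer 3’ end rule and the elongation-time rule described above can be expressed compactly. The following Python sketch is only an illustration under stated assumptions: the function names and the example primer sequences are hypothetical, the ~15 s allowance is treated as exactly 15 s, and melting temperature estimation is not attempted.

```python
def three_prime_rule(primer: str) -> bool:
    """True if the primer ends in G or C preceded by A or T (3' end rule)."""
    p = primer.upper()
    return len(p) >= 2 and p[-1] in "GC" and p[-2] in "AT"


def elongation_seconds(template_bp: int, s_per_kb: float = 60.0,
                       pad_s: float = 15.0) -> float:
    """Elongation time using 1 min/kb plus a fixed allowance (assumed 15 s)."""
    return template_bp / 1000.0 * s_per_kb + pad_s


# Hypothetical primer sequences, for illustration only
print(three_prime_rule("ATGGCTGAAGTTAG"))  # ends in ...AG
print(three_prime_rule("ATGGCTGAAGTTGG"))  # ends in ...GG

# A 2,000 bp template as a round-number example
print(elongation_seconds(2000), "s")
```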

The Gateway system (Invitrogen, USA) was selected for rapid cloning and subcloning of cDNAs through recombination into appropriately modified vectors. As such, PCR products were cloned into pENTR/D-Topo (Invitrogen, USA) according to the manufacturer’s instructions. The directional Topo cloning reaction products were transformed into chemically competent One Shot TOP10 Escherichia coli cells

(Invitrogen, USA). TAF1, TAF2 and TAF4 were cloned by BP reactions (Invitrogen,

USA) into pDONR207 (Invitrogen, USA) from PCR using primers with attB sites. The


resultant colonies were screened by PCR using a combination of coding region (CDS)-

specific and vector-specific primers. Positive clones were verified by sequencing at the

University of Florida Microbiology and Cell Science Sequencing Core or Macrogen, Inc.

(Seoul, Korea).

Yeast two-hybrid vectors pGAD-T7 and pGBK-T7 (Clontech, USA) were converted to be Gateway compatible by the insertion of the Gateway reading frame A cassette by blunt-end ligation into the digested SmaI sites. The constructs were verified by DNA sequencing by Macrogen, Inc. Verified pENTR/D-Topo constructs were recombined (Gateway LR and BP reactions, Invitrogen) successively through pGAD-

T7Gateway, pDONR207, and pGBK-T7Gateway. These plasmid clones were

transformed into electrocompetent MACH1 E. coli (Invitrogen, USA).

Clones in pGAD-T7Gateway and pGBK-T7Gateway were confirmed by digestion

with BamHI. Clones in pDONR207 were confirmed by digestion with BamHI and

EcoRV. The Gateway recombination system is dependent on the proper function of a

negative selection gene (ccdB), which is normally replaced by the DNA fragment being

cloned. However, if the ccdB gene becomes mutated, the un-recombined vector could

grow on the selective media. If this is the case, BamHI digestion of pGAD-T7Gateway

and pGBK-T7Gateway results in the production of a 600 bp and a 700 bp band (in

addition to a ~8 kb vector band). In a similar case with pDONR207, BamHI and EcoRV

digestion results in the formation of 3.9 kb and 2.0 kb bands.

The resultant yeast two-hybrid bait (contains Gal4 DNA binding domain) and prey

(contains Gal4 activation domain) constructs were transformed into MaV204K (MATa,

trp1–901, leu2–3, 112, his3∆200, ade2–101∆::kanMX, gal4∆, gal80∆, SPAL10::URA3,


UASGAL1::HIS3, GAL1::lacZ; a kind gift of Dr. T. Ito, Kanazawa University, Japan) (Ito

et al., 2000) and AH109 (MATa, trp1-901, leu2-3, 112, ura3-52, his3-200, gal4∆,

gal80∆, LYS2::GAL1UAS-GAL1TATA-HIS3, MEL1 GAL2UAS-GAL2TATA-ADE2,

URA3::MEL1UAS-MEL1TATA-lacZ; Clontech) yeast strains, respectively. All

transformations of vector plasmids were conducted using Frozen-EZ Yeast

Transformation II (Zymo Research, USA). Bait constructs in MaV204K were plated on

SD –Trp, +0.1% 5’ fluoroorotic acid (5’FOA) to check for spurious activation of the

SPAL10::URA3 reporter. In the event that the bait protein contains an activation domain, the URA3 gene would be activated by the recruitment of the bait protein to the SPAL10 promoter that contains the Gal4 upstream activation site (UASGal4). The URA3 gene

product catalyzes the formation of a toxic product from 5’FOA, preventing yeast growth; lack of growth therefore indicates an unsuitable bait construct for yeast two-hybrid analysis.

Western blots were performed to examine bait and prey protein expression levels.

Protein was harvested and prepared using a modification of the method developed by

Horvath and Riezman (1994). Bait and prey transformants were grown in their

appropriate SD medium to an OD600 between 0.65 and 0.9. Culture volumes were

linearly adjusted to normalize harvested cultures to 1.5 mL of culture at 0.7 OD600.

Cultures were harvested by centrifugation at 16,000 x g for 1 min, washed with 1 mL of deionized H2O, and collected again by centrifugation. The resulting pellet was

resuspended in 100 µl of reducing 2x Laemmli loading buffer (Laemmli, 1970)

containing 5 mM EDTA, and 2x Halt Protease Inhibitor Cocktail (Pierce, USA), placed

in a boiling water bath for 5 min and separated on a 10% SDS-PAGE gel. The separated

proteins were transferred to Immobilon-P PVDF membranes (Millipore, USA) in a BIO-RAD TRANS-BLOT SD semi-dry transfer cell at 75 mA per minigel for 35 min. Blots

were blocked in Tris-buffered saline (pH 7.4) containing 0.2% Tween-20 (TTBS) and 3%

non-fat dried milk for 1 hour. Blots containing bait proteins were probed overnight with

anti-Myc-tag 9B11 monoclonal antibody (Cell Signaling Technology, USA) diluted

1:800 in TTBS 1% non-fat dried milk. Blots containing prey proteins were probed

overnight with anti-hemagglutinin-epitope HA.11 monoclonal antibody (Covance, USA) diluted 1:500 in TTBS 1% non-fat dried milk. All blots were probed with goat anti-mouse horseradish peroxidase (HRP) conjugated Immunopure Antibody (Pierce) as a

secondary antibody for 1 h. Four washes of 5 min each were performed in TTBS

following each incubation with antibody. HRP activity was displayed after incubation of

blots with ECL+ chemiluminescent substrate (Amersham, UK) using X-ray film (RPI,

USA).
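The linear adjustment of harvest volume described above can be written as a one-line calculation. The Python sketch below is one reading of that normalization, assuming the harvested volume is scaled inversely with the measured OD600 so that every sample matches 1.5 mL of culture at 0.7 OD600; the function name and the range check are illustrative assumptions.

```python
def harvest_volume_ml(od600: float, target_ml: float = 1.5,
                      target_od: float = 0.7) -> float:
    """Culture volume equivalent to target_ml of culture at target_od."""
    # Cultures were grown to an OD600 between 0.65 and 0.9 before harvest.
    if not 0.65 <= od600 <= 0.9:
        raise ValueError("culture outside the 0.65-0.9 OD600 harvest window")
    return target_ml * target_od / od600


for od in (0.65, 0.70, 0.875):
    print(f"OD600 {od:.3f}: harvest {harvest_volume_ml(od):.2f} mL")
```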

Bait/MaV204K and prey/AH109 transformants were then mated with one another and selected on SD -Trp, -Leu (to test for mating efficiency) and SD -Trp, -Leu, -His, -Ade (to test for interaction). Yeast matings were performed similarly to the protocol in the Yeast Protocols Handbook (Clontech). Several fresh, large (2-3 mm diameter) colonies were resuspended in 1 mL YPD media and vortexed vigorously for 30 s to disrupt clumps of cells. Into sterile 96-well plates containing 180 µL of YPD in each well, 10 µL each of bait/MaV204K and prey/AH109 suspensions were added. The 96-well plates were incubated at 30°C overnight with shaking at 200 rpm.

Using a 30 µL 8-channel pipette (Matrix Technologies, USA), 10 µL of each overnight mating culture was removed from each well and serially diluted 10-fold three successive times in sterile 96-well plates containing 90 µL of YPD in each well. Each


dilution was mixed by three 30 µL aspiration/blow-out cycles before removing an aliquot

for the next dilution. From each dilution, 3 µL of cell suspension was removed and spotted in a grid onto both an SD -Trp, -Leu and an SD -Trp, -Leu, -His, -Ade screening plate (25 cm x 25 cm each). The plates were incubated at 30°C and spots were monitored for growth

over 14 days.
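The serial dilution scheme above (10 µL carried into 90 µL of YPD, three successive times) gives cumulative dilutions that can be checked with exact fractions. The Python sketch below is illustrative only; the function name and default arguments are assumptions chosen to mirror the volumes in the text.

```python
from fractions import Fraction


def dilution_series(transfer_ul: int = 10, diluent_ul: int = 90,
                    steps: int = 3) -> list:
    """Cumulative fraction of the starting culture after each dilution step."""
    per_step = Fraction(transfer_ul, transfer_ul + diluent_ul)
    series, current = [], Fraction(1)
    for _ in range(steps):
        current *= per_step
        series.append(current)
    return series


for step, fraction in enumerate(dilution_series(), start=1):
    print(f"dilution {step}: 1/{fraction.denominator} of the mating culture")
```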

β-galactosidase assays were performed to obtain semi-quantitative data regarding the strength of individual interactions. The activation of the GAL1::lacZ reporter is

driven by the reconstituted Gal4 protein in a manner which correlates with the strength of

interaction and activation potential of the two fusion proteins. β-galactosidase assays of

colonies exhibiting growth were performed using CPRG substrate (Roche, Germany)

according to the Clontech Yeast Protocols Handbook. All values were expressed as

Miller units (Miller, 1972, 1992). The normalized activity (NAct) used to score a β-galactosidase test as positive was determined by the following equation:

NAct = AVGt – SDt – 1.1 × AVGc – 1.1 × SDc

In this equation, AVGt is the average of the test activities, SDt is the standard deviation of the test activities, AVGc is the average of the activities of appropriate bait negative controls, and SDc is the standard deviation of the activities of appropriate bait negative controls. If the NAct was determined to be greater than zero, the β-galactosidase assay was considered positive; that is, the average test activity minus its standard deviation had to exceed 110% of the sum of the average negative control activity and its standard deviation. If the NAct was equal to or greater than one, then the interaction was deemed strong.
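The NAct calculation and its thresholds can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis actually used: the function names are hypothetical, the replicate values are invented for the example, and the sample standard deviation is assumed because the text does not state which form was applied.

```python
from statistics import mean, stdev


def normalized_activity(test_units, control_units, margin=1.1):
    """NAct = AVGt - SDt - margin*AVGc - margin*SDc (Miller units)."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return (mean(test_units) - stdev(test_units)
            - margin * mean(control_units)
            - margin * stdev(control_units))


def score(nact):
    """Map an NAct value onto the categories used in the text."""
    if nact >= 1:
        return "strong"
    if nact > 0:
        return "positive"
    return "negative"


# Hypothetical replicate activities, for illustration only
test = [10.0, 12.0, 14.0]
control = [1.0, 2.0, 3.0]
nact = normalized_activity(test, control)
print(round(nact, 2), score(nact))
```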


Results

The TAF7, TAF12, and TAF15 bait constructs were spurious activators, as defined by the lack of growth on 5’FOA-containing plates. These bait constructs were thus excluded from further studies. It was also found that the TAF12 prey construct caused the activation of the reporter genes, and thus TAF12 was cloned as N-terminal (aa 1-200), middle (aa 201-394), and C-terminal (aa 395-539) fragments into the pENTR/D-Topo vector and subcloned into the bait and prey vectors as detailed above (Table 3-2). Histograms of the percentage of matings that formed colonies for each bait and prey construct were created (Figures 3-1 and 3-2). These figures exclude the full-length TAF12 to avoid redundancy (TAF12 is represented as the separate peptide fragments consisting of amino acids 1-200, 201-394, and 395-538). The bait proteins that interacted with the Gal4 activation domain (Gal4 AD) alone (TAF1 #6, N-terminal one-third of TAF1; TAF12 amino acids 1-200; TAF12 amino acids 395-538; and TAF15b) formed colonies in over 70% of their matings. These baits were the only constructs to have colony frequencies above 70%, confirming the assertion that only baits that were not observed to interact with the Gal4 AD were suitable for analysis.

Immunoblots to detect bait and prey proteins (Figures 3-3 and 3-4, respectively)

demonstrated a large variability in protein expression levels. The majority of bait

proteins were at steady state levels that were below the level of detection. However, interaction data suggest that these proteins are present in the yeast cells. The results of the targeted two-hybrid screens are presented in Table 3-4 and depicted pictorially in Figure 3-6. Of the interactions in Table 3-4, 552 were tested reciprocally, meaning that bait X was tested with prey Y and bait Y was tested with prey X. Only TAF1 #8

(middle region; MR) was lacking in interactions with other TFIID components, but it


does interact with the small subunit of TFIIF (Chapter 4). The results of the β-

galactosidase assays are shown in Figure 3-5. The β-galactosidase normalized activities

were utilized to verify or exclude protein-protein interactions based on colony growth.

NAct values below zero were considered negative interactions, while NAct values above one were considered evidence of strong interactions.

The criterion for a positive interaction was that greater than 50% of the interaction evidence must be positive. Thus, if an interaction is not supported by growth of diploid yeast containing reciprocated constructs, it must be supported by a NAct value greater than zero to be considered positive. If an interaction is supported by growth of diploid

yeast containing reciprocated constructs, only one of the two tested β-galactosidase

activities must have a NAct greater than zero for the interaction to be considered positive.

Strong interactions (NAct ≥ 1) are depicted in Figure 3-7.
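The decision rule above can be sketched as a small Python function. This is one reading of the criteria, not the procedure actually used: it assumes each orientation of a protein pair is recorded as a (grew, NAct) tuple, that NAct is None when no β-galactosidase assay was run, and that a pair is positive when at least one orientation both grew and gave NAct > 0. All names and example values are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (diploid growth observed?, NAct value or None if not assayed)
Orientation = Tuple[bool, Optional[float]]


def pair_is_positive(orientations: Sequence[Orientation]) -> bool:
    """True if any tested orientation grew and had NAct greater than zero."""
    return any(grew and nact is not None and nact > 0
               for grew, nact in orientations)


# Hypothetical examples
print(pair_is_positive([(True, 0.4)]))                # single orientation, verified
print(pair_is_positive([(True, -0.2), (True, 1.3)]))  # reciprocated, one NAct > 0
print(pair_is_positive([(True, -0.2)]))               # growth not verified by NAct
```

Under this reading, a combination that grew but had a negative NAct can still belong to a positive pair if its reciprocal construct was verified.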

Of the 720 total combinations tested, which do not include negative controls, 102

or 14.2% were positive. Thirty or 4.2% of the total combinations grew in yeast, but were

determined to be negative based on NAct values. However, of these, five were

reciprocated interactions that were verified by the NAct values of their reciprocal

constructs in the β-galactosidase assays. The total number of protein-protein interactions

that were tested regardless of bait or prey conformations was 444 (the non-redundant

interaction set), leaving 276 protein-protein interactions that were reciprocally tested. Of

the 444 non-redundant interactions tested, 72 (16.2%) formed colonies that were verified

and 25 (5.6%) produced colonies that were not verified by NAct values.
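The counts and percentages reported above are mutually consistent, as the following short Python check illustrates. The helper name is hypothetical; the numbers are those given in the text.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_matings = 720   # all bait/prey combinations tested
non_redundant = 444   # protein pairs regardless of bait/prey orientation

print(percent(102, total_matings))    # positive combinations
print(percent(30, total_matings))     # grew, but negative by NAct
print(percent(72, non_redundant))     # verified non-redundant interactions
print(percent(25, non_redundant))     # colonies not verified by NAct
print(total_matings - non_redundant)  # pairs tested in both orientations
```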


Discussion

Using homology-based searches of the A. thaliana genomic sequence database, 23

loci encoding putative TFIID subunits have been identified. Of these, 30 putative TFIID

subunit coding-sequences (including 4 splice variants of TAF6b and fragments of TAF1

and TAF12) have been cloned into the MATCHMAKER III yeast two-hybrid system

(Clontech). Interestingly, full-length TAF12 was found to act as an activator of the

reporter genes in both the bait and prey constructs (data not shown), and has therefore been cloned as three separate sub-fractions (amino acids 1-200, 201-394, and 395-539).

Surprisingly, none of these TAF12 sub-fractions interferes or acts as a spurious activator in bait or prey form (all baits passed the 5’FOA test). However, the amino acid 1-200 and 395-538 fragments interact with all prey constructs tested, indicating an interaction with the Gal4 AD.

The majority of interactions identified in this study have been described for homologs in other systems. For instance, TAF10 was shown to interact with

TAF4/TAF4b, TAF6b (splicing versions 2 and 3), TAF8, TAF9, TAF10 (dimer formation), TAF11, TAF12b, and TAF13. All of these interactions have been described

previously for homologs in other systems (Chapter 1).

Unique interactions have also been described such as TAF5 and TAF8

homodimers, and interactions of TAF8 with TAF13 and TAF14. TAF8 and TAF10 are a

HFD binding pair and interact strongly in this study. Both proteins independently form dimers, suggesting formation of an α2β2 tetramer. Consistent with their tight interaction

are a number of shared interactions with other TAFs (TAF4, TAF4b, TAF12b, and

TAF13).


As with any system a number of false positives and false negatives can occur.

False positives were minimized with the use of β-galactosidase activity measurements and analysis of bait constructs (5’FOA tests and positive interaction frequency analyses).

Of the 444 non-redundant interactions tested to date (via 720 matings), 16.2% (72) have resulted in confirmed interactions. This rate is in line with observations of Yatherajam et al. (2003) in yeast TFIID and suggests that molecular bridging of TBP-TBP, TBP-TAF, and TAF-TAF interactions is not being mediated by yeast GTF components in this assay.

If molecular bridging was problematic, interactions between two proteins that have a known common interactor would be expected. For example, TAF4 interacts strongly with both TAF1 #9 (C-terminal one-third) and TAF12 395-538. If molecular bridging of

TAF1 #9 with TAF12 395-538 occurred through yeast TAF4, then a TAF1#9-TAF12

395-538 interaction would have been observed.
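The bridging argument amounts to looking for "open triads": two proteins that share a common partner but do not interact with each other. The Python sketch below illustrates that check on the single example given above; the function name and the edge list are illustrative, and the edge list is not the full interaction data set.

```python
from itertools import combinations


def open_triads(edges):
    """Return (x, hub, y) where x and y both bind hub but not each other."""
    edge_set = {frozenset(e) for e in edges}
    partners = {}
    for a, b in edges:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    triads = []
    for hub, neighbours in partners.items():
        for x, y in combinations(sorted(neighbours), 2):
            if frozenset((x, y)) not in edge_set:
                triads.append((x, hub, y))
    return triads


# Only the example from the text: TAF4 interacts with both fragments,
# but the two fragments do not interact with each other.
observed = [("TAF4", "TAF1 #9"), ("TAF4", "TAF12 395-538")]
print(open_triads(observed))
```

If bridging through a shared partner were common, few such open triads would be expected, because the two outer proteins would score as interacting.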

TBP1 and TBP2 notably lacked interaction with any TAFs besides TAF1 #6 (N-terminus). This interaction could potentially be mediated by the TAND domain of TAF1, which in other systems interacts with the concave surface of TBP. TBP1 and TBP2 prey constructs interacted with both the TAF1 bromodomain and TAF10 in yeast growth studies.

However, β-galactosidase NAct values did not verify these interactions. TBP1 and TBP2 could potentially dampen the Gal4 AD fusion activity by intermolecularly or intramolecularly masking the activation domain, leading to false negatives in both yeast growth studies and β-galactosidase assays. Yatherajam et al. (2003) observed that yeast

TBP only interacted with TAF7 in yeast two-hybrid assays. Arabidopsis TAF7 failed the

5’FOA test as a bait protein, thus only the prey protein construct was analyzed further.


Although this prevented testing a bait-TAF7/prey-TBP interaction, bait-TBP did interact with TAF1 #6 indicating viability in some cases.

AtTAF2, as in the study by Yatherajam et al. (2003), interacted with only one other

TFIID component. In this study, TAF2 interacted with the N-terminus of TAF1, although in the yeast study, it interacted with TAF4. TAF2 in metazoans is known to

have sequence-specific DNA interactions with the Initiator element. Neither yeast nor

plants have been shown to contain an initiator consensus sequence in their promoters.

The role of TAF2 in these organisms is therefore unknown, and yeast two-hybrid assays

show limited structural interactions with other TFIID components.

Despite the pitfalls listed above, a number of plant interactions unique among

eukaryotes have been elucidated. The strongest unique interaction appears to be TAF1

#9-TAF1 #9 (a dimer of the C-terminal one-third of TAF1). No previous study has

suggested that TAF1 may form dimers. This interaction may be an artifact of incorrect

protein-folding due to the examination of a protein fragment, or expression in a

heterologous system.

Formation of a TAF1 dimer in Arabidopsis is an attractive model. Both

Arabidopsis TAF1 and TAF1b encode only one bromodomain. Bromodomains are

known to bind acetylated lysine residues on histone tails; however, single bromodomains

have approximately 70-fold lower affinity for acetylated lysines than do double

bromodomains (Dhalluin et al., 1999; Jacobson et al., 2000). Human and Drosophila

TAF1 proteins both contain two tandem bromodomains, and while yeast TAF1 has no

bromodomains, a novel TFIID member in yeast encodes two bromodomains

(Matangkasombut et al., 2000). The Arabidopsis genome does not appear to encode such

a novel bromodomain-containing protein; therefore, the presence of a TAF1 dimer could compensate for the single bromodomain.

Arabidopsis is the first organism to be reported to encode two TAF1 homologs

(Pandey et al., 2002). In silico analyses suggest that TAF1b does not encode an N-terminal domain (TAND; W.B. Gurley and E. Czarnecka-Verner, unpublished data). The

TAND domains have been shown to be auto-inhibitors of TFIID binding to the TATA-box (Kokubo et al., 1993b), and there are no known examples of TAF1 proteins lacking the TAND domains outside of Arabidopsis. A TAF1-TAF1b heterodimer would leave

TFIID with two bromodomains, and a single complement of TAND inhibitory domains.

This is one possible model to explain the lack of the TAND domains in TAF1b and single bromodomains in both TAF1 and TAF1b.

Three other novel strong interactions in Arabidopsis TFIID are the TAF12 1-200 with TAF13, TAF12b-TAF15b, and TAF14-TAF15b pairs. The novel finding that

TAF12 homologs and TAF13 interact in Arabidopsis may suggest a unique structure of this TFIID complex. TAF13 is a histone-fold-containing protein, as are the TAF12 homologs. However, the HFDs of the TAF12 homologs are in the C-terminus, not the N-terminus that interacts strongly with TAF13. Although TAF13 does interact with the HFD-containing amino acid 395-538 fragment of TAF12, the much stronger interaction with the amino acid 1-200 fragment suggests that the primary interaction is not through the HFDs of these proteins.

The strong TAF12b-TAF15b interaction is also unique. Neither TAF12 nor

TAF12b were found to interact with TAF15, nor did TAF12 interact with TAF15b. Like the TAF12-TAF13 interaction, this may also suggest a novel structure of plant TFIID.


Although both TAF15 and TAF15b were observed to interact with TAF4 and TAF4b

(histone-fold binding partners for TAF12/TAF12b), the selectivity of TAF12b and

TAF15b for each other could have two possible ramifications. The histone-fold core

TAFs (TAF4, TAF6, TAF9, and TAF12) are suggested to be in TFIID as an octamer structure similar to the nucleosome, although there are discrepancies between this model

and the data. One such discrepancy is the known dimerization of the H2B-like TAF12

proteins since the two histone H2B polypeptides lie on opposite sides of the nucleosome.

The likely presence of two TAF12-like proteins in Arabidopsis TFIID offers three possible

configurations. TAF12-TAF12, TAF12-TAF12b, and TAF12b-TAF12b dimers are all

consistent with the interaction data presented here. Therefore, TAF15b could be

interacting with 0, 1, or 2 copies of TAF12b.
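The three dimer configurations named above, and the number of TAF12b copies each would present to TAF15b, can be enumerated in a few lines. This is only an illustrative sketch; the variable names are assumptions introduced here and are not part of the original analysis.

```python
from itertools import combinations_with_replacement

# The two TAF12-like proteins discussed in the text.
homologs = ["TAF12", "TAF12b"]

# Unordered pairs with repetition give the three possible dimers.
dimers = list(combinations_with_replacement(homologs, 2))

# Number of TAF12b copies available to TAF15b in each configuration.
taf12b_copies = {"-".join(pair): pair.count("TAF12b") for pair in dimers}

for name, copies in taf12b_copies.items():
    print(f"{name}: {copies} TAF12b")
```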

Furthermore, TAF15b may or may not be present in an Arabidopsis TFIID

complex, depending on the presence of TAF12b. However, TAF15b does have a number

of other weak interactions with other TAFs which may cooperatively incorporate this

protein into the complex. Interestingly, the protein designated as TAF15 appears to be less tightly connected to TFIID in general. This protein was only shown to interact

weakly with both TAF4 and TAF4b. However, analysis of interactions with several

TFIID components was restricted by the activating nature of several constructs.

Potentially, TAF15 may interact strongly with TAF12 (either the N-terminus or the C-

terminus), interactions that could not be tested in this system. Even if these TAF15

proteins are not present in TFIID, they may still be recruited to the PIC through an

interaction with PolII as is seen for their human homologs (Bertolotti et al., 1998).


TAF15b also interacts strongly with TAF14. Although no organism has previously been reported to contain both TAF14 and TAF15, Arabidopsis contains two homologs of each. TAF14 and TAF15 both interact with a number of the same partners including

TAF4b, TAF5, and TAF12b. Their common interactions and strong pairing suggest that

TAF14 and TAF15b are likely to be localized to a lobe of TFIID containing TAF4(b),

TAF5, and TAF12(b). The presence of such a lobe is consistent with the results presented here and by Yatherajam et al. (2003).

TAF9 is expected to interact with TAF6-like proteins (a Histone H3/H4-like interaction) (Hisatake et al., 1995), and this was upheld in our studies, with the exception of the TAF6b-3 cDNA. It is interesting to note that TAF6b-3 contains an altered histone-fold structure (N-terminus) due to splicing differences, and this is the only TAF6-like protein shown to interact with TAF11 (containing a histone H3-like fold).

Another interesting interaction is that of TAF7 with the embedded ubiquitin domain unique to the plant TAF1 genes. Personal communication (M. Horikoshi) and sequence analysis of TAF7 genes have suggested that this TAF may have a VWA-related AAA ATPase activity. This similarity with proteasome subunits could potentially be part of a chromatin or factor remodeling activity within TFIID targeting ubiquitylated proteins.

Despite a number of studies of the structure of TFIID, X-ray crystallographic analysis of this complex has been elusive. Two studies have characterized the gross molecular structure of TFIID (Andel et al., 1999; Brand et al., 1999), and the HFD TAFs have been mapped on this structure (Leurent et al., 2002). However, absolute stoichiometries have not been determined, and the positioning of each component within

the structure is unknown after more than a decade of study. This work represents the first detailed analysis of the TFIID complex from any plant species. While no TAF3 homolog has been observed in plants (possibly due to poor sequence conservation), at least one homolog has been identified for each of the other 15 confirmed TFIID components. In all, Arabidopsis appears to have 23 genes encoding TBP or TAF homologs. Of these, 21 were examined in this work. A total of 72 binary interactions were identified, including 26 novel protein interactions. In addition, Arabidopsis is the first organism reported to encode both TAF14 and TAF15.


Table 3-1. Primers for amplification of TBP and TAF-like cDNAs and cloning into pENTR/D-Topo or pDONR207 vectors.

Gene      Primer         Sequence
TBP1      Upper Primer   caccATGACTGATCAAGGATTGGAAGGGAGTAATC
TBP1      Lower Primer   TTGCTGTATCTTTCTGAATTCCGAGAGCAC
TBP2      Upper Primer   caccATGGCTGATCAAGGAACGGAAGGGAG
TBP2      Lower Primer   TTGCTGGACCTTCCTGAATTCTCTAAGAAC
TAF2      Upper Primer   ggggacaagtttgtacaaaaaagcaggcttaATGGCCAAGGCTCGAAAGCCGAAG
TAF2      Lower Primer   ggggaccactttgtacaagaaagctgggttTGAGTTGTTGAACGCTTTGCTTTTCAGTTTGATTC
TAF4      Upper Primer   ggggacaagtttgtacaaaaaagcaggcttaATGGATCTCTCCATTGTCAAGCTCCTC
TAF4      Lower Primer   ggggaccactttgtacaagaaagctgggttAACATCCGAGCAGATTCTATTGTATACGCGATAC
TAF4b     Upper Primer   caccATGGATCCTTCAATTTTCAAGCTCCTTGAAG
TAF4b     Lower Primer   TTGAATTAATCGATACATCAGAGTGGATTTGGAC
TAF5      Upper Primer   caccATGGATCCAGAGCAAATCAACGAGTTCGTC
TAF5      Lower Primer   ATATCTGATACCATTGTTTTGATCAGTTTGCGGGT
TAF6      Upper Primer   ggggacaagtttgtacaaaaaagcaggcttaATGAGCATTGTACCTAAGGAAACGGTTGAG
TAF6      Lower Primer   ggggaccactttgtacaagaaagctgggttGAGGAATACTGACATCTCTGTAGAAGGGATAAAG
TAF6b-1   Upper Primer   caccATGGTGACGAAAGAATCCATTGAAGTGATAGCTC
TAF6b-1   Lower Primer   CAAGAAGAAACTGAGCTCATGTGTG
TAF6b-2   Upper Primer   caccATGGTGACGAAAGAATCCATTGAAGTGATAGCTC
TAF6b-2   Lower Primer   CAAGAAGAAACTGAGCTCATGTGTG
TAF6b-3   Upper Primer   caccATGGTGACGAAAGAATCCATTGAAGTGATAGCTC
TAF6b-3   Lower Primer   CAAGAAGAAACTGAGCTCATGTGTG
TAF6b-4   Upper Primer   caccATGGTGACGAAAGAATCCATTGAAGTGATAGCTC
TAF6b-4   Lower Primer   AAGCCCACTCCGTGACTTTGTCAAAGTAAATC
TAF7      Upper Primer   caccATGGAAGAACAGTTCATACTTAGGGTTC
TAF7      Lower Primer   CATTGAATCATCAGAATCTTCTGATTCACTTCTCTC
TAF8      Upper Primer   caccATGAACACAGAGAGAGCTCAAGAAGGTGATAG
TAF8      Lower Primer   CAACTGATTGAGGTCTACTGGGTTCTCCATAC
TAF9      Upper Primer   caccATGGCAGGAGAAGGTGAAGAAGATGTACCTAGAGATGCTAAG
TAF9      Lower Primer   TTTGGGTCGTCTAGAGAGTGGGAAAGAGACCCTTTGAGGA
TAF10     Upper Primer   caccATGAATCACGGCCAACAATCTGGTGAGGC
TAF10     Lower Primer   TTCGTCCCTTGTTGCAGGGTCCATTCCAGTCGA
TAF11     Upper Primer   caccATGAAGCATTCAAAGGATCCGTTTGAAGCAGCGA
TAF11     Lower Primer   GCGGAAAAGGCGTGGAACTGATCTTTTAGGCAC
TAF11b    Upper Primer   caccATGGCCTTTAACGCAAGGTCTTGTTGTTTTGCTAG
TAF11b    Lower Primer   GCGAAAAAGCCGTTGAACTGATCTTTGAG
TAF12     Upper Primer   caccATGGATCAGCCACGGCAAAGCTCGA
TAF12     Lower Primer   GTGATTGAAAGTTGTAGAGCCCATGGGA
TAF12b    Upper Primer   caccATGGCGGAACCGATTCCCTCATCGTC
TAF12b    Lower Primer   GTATCGTGTCATGTGTTGTAATATGTGAGGACCGGATG
TAF13     Upper Primer   caccATGAGTAACACACCAGCAGCG
TAF13     Lower Primer   ATCAACGAGTTCCTTTTCGTCGACATC
TAF14     Upper Primer   caccATGGAGTCGGATATCGAGATTTTGTCTGAAG
TAF14     Lower Primer   GAACAAGAATGCACCTGGAGGAGGCAG
TAF14b    Upper Primer   ggggacaagtttgtacaaaaaagcaggcttaATGACGAACAGCTCGTCATCGAAGAAACAAGCTC
TAF14b    Lower Primer   ggggaccactttgtacaagaaagctgggttCAGGTCTGATCCTGTTTTAACGGTCTGATTC
TAF15     Upper Primer   caccATGGCTGGATATCCTACTAATGGATCAGTCTAC
TAF15     Lower Primer   GTTACGGTACCTGCTTCCACGTTCGCGAC
TAF15b    Upper Primer   caccATGGCTGGGATGTACAATCAAGACGGCGGCGGAG
TAF15b    Lower Primer   GATCAGACACAGACATCTCTGGTCAAAGGTAGGAGCAC

Note: Lower case sequences are not homologous to the gene of interest, but are required for cloning into the appropriate vector. The sequence “cacc” on the upper primer is for directional cloning into pENTR/D-Topo, the attB1 “ggggacaagtttgtacaaaaaagcaggctta” and attB2 “ggggaccactttgtacaagaaagctgggtt” sequences are for directional BP cloning into pDONR207.
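The adapter convention described in this note can be illustrated with a short sketch that separates the lower-case cloning adapter from the upper-case gene-specific portion of a primer and reports the vector it targets. The function name `split_primer` and the lookup table are hypothetical helpers introduced here for illustration; only the adapter sequences and the example primers come from Table 3-1.

```python
# Adapter sequences quoted in the note to Table 3-1.
ADAPTERS = {
    "cacc": "pENTR/D-Topo (directional Topo cloning)",
    "ggggacaagtttgtacaaaaaagcaggctta": "pDONR207 (attB1, BP cloning)",
    "ggggaccactttgtacaagaaagctgggtt": "pDONR207 (attB2, BP cloning)",
}


def split_primer(primer):
    """Split a primer into (adapter, gene-specific sequence, target vector).

    The adapter is the leading run of lower-case letters; the remainder is
    treated as homologous to the gene of interest.
    """
    n = 0
    while n < len(primer) and primer[n].islower():
        n += 1
    adapter, gene_specific = primer[:n], primer[n:]
    vector = ADAPTERS.get(adapter, "no adapter recognized")
    return adapter, gene_specific, vector


# TBP1 upper primer and TAF2 lower primer from Table 3-1.
print(split_primer("caccATGACTGATCAAGGATTGGAAGGGAGTAATC"))
print(split_primer(
    "ggggaccactttgtacaagaaagctgggttTGAGTTGTTGAACGCTTTGCTTTTCAGTTTGATTC"))
```

Lower primers used for Topo cloning carry no adapter, so the sketch reports them as unrecognized rather than guessing a vector.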


Table 3-2. Primers for cloning of TAF12 N-terminal, middle, and C-terminal fragments. Lower case sequences are not homologous to the gene of interest, but are required for cloning into the pENTR/D-Topo vector.

Gene            Primer         Sequence
TAF12 1-200     Upper Primer   caccATGGATCAGCCACGGCAAAGCTCGA
TAF12 1-200     Lower Primer   CTGAGTTCCTTGCATCATTCTAACCTGAG
TAF12 201-394   Upper Primer   caccGGAATTGGGATGATGGGAACACTTG
TAF12 201-394   Lower Primer   CGGCTCGGTCTCTGCAGAAACTG
TAF12 395-539   Upper Primer   caccTCTGATGATCGTATCCTGGGGAAACGAAGCATC
TAF12 395-539   Lower Primer   GTGATTGAAAGTTGTAGAGCCCATGGGA


Table 3-3. Arabidopsis thaliana TFIID subunit cDNA GenBank accession numbers.

Gene       Locus       Accession   CDS Size (bp)   Mw (kDa)   pI
TBP1       At3g13445   AY463625    603             22.4       10.21
TBP2       At1g55520   AY463626    603             22.4       10.31
TAF1       At1g32750   AF510669    5,760           217.2      5.55
TAF1b      At3g19040   N/A         5,103           192.1      7.65
TAF2       At1g73960   AY457045    4,113           153.5      6.18
TAF4       At5g43130   AY457043    2,163           80.6       9.34
TAF4b      At1g27720   AY457044    1,854           68.9       9.84
TAF5       At5g25150   AY463620    2,010           74.4       6.65
TAF6       At1g04950   AY463621    1,584           58.9       8.73
TAF6b1     At1g54360   AY463630    1,515           56.5       8.83
TAF6b2     At1g54360   AY463631    1,494           55.7       8.84
TAF6b3     At1g54360   AY463632    1,431           53.1       8.56
TAF6b4     At1g54360   AY463633    588             22.6       9.62
TAF7       At1g55300   AY463622    612             22.5       4.11
TAF8       At4g34340   AY463623    1,062           39.5       4.96
TAF9       At1g54140   AY463624    552             20.6       4.67
TAF10      At4g31720   AY463628    405             14.9       5.50
TAF11      At4g20280   AY463612    633             23.7       5.39
TAF11b     At1g20000   N/A         615             23.4       9.59
TAF12      At3g10070   AY463613    1,620           57.7       10.42
TAF12b     At1g17440   AY463614    2,052           74.8       10.46
TAF13      At1g02680   AY463615    384             14.3       5.81
TAF14      At2g18000   AY463616    609             22.8       6.16
TAF14b     At5g45600   AY463617    807             30.2       7.03
TAF15      At1g50300   AY463618    1,119           41.3       7.98
TAF15b-1   At5g58470   AY463619    1,164           38.9       7.93
TAF15b-2   At5g58470   N/A         1,269           42.3       8.73

[Histogram. Y-axis: Frequency (0 to 16). X-axis: Percent Growth, in bins of 10% from 0-10 to 90-100.]

Figure 3-1. Histogram of percent of matings, per bait construct, that yielded colony growth. Bars marked in red represent bait constructs that were excluded from the analysis and considered spurious bait activators.

[Histogram. Y-axis: Frequency (0 to 16). X-axis: Percent Growth, in bins of 10% from 0-10 to 90-100.]

Figure 3-2. Histogram of percent of matings, per prey construct, that yielded colony growth.

[Immunoblot. Lanes: pGBK-T7, TBP1, TBP2, TAF1 UBD, TAF1 BD, TAF1 #8 (MR), TAF1 #9 (CD), TAF2, TAF4, TAF4b, TAF5, TAF6, TAF6b-1, TAF6b-2, TAF6b-3, TAF6b-4, TAF8, TAF9, TAF10, TAF11, TAF12 201-394, TAF12b, TAF13, TAF14, TAF14b. Molecular mass markers (kDa): 120, 100, 80, 60, 50, 40, 30.]

Figure 3-3. Immunoblots of TFIID components expressed as bait fusion proteins in MaV204K. Proteins were expressed as Gal4 DNA binding domain fusion proteins with an internal Myc-tag from a Gateway converted pGBK-T7 vector. Blots were probed with anti-Myc 9B11 antibody. The lane labeled pGBK-T7 is the non-Gateway modified empty vector control. Ubiquitin domain, UBD; bromodomain, BD; middle region, MR; C-terminal domain, CD.

[Immunoblot. Lanes: pGAD-T7, TBP1, TBP2, TAF1 UBD, TAF1 BD, TAF1 #6 (ND), TAF1 #8 (MR), TAF1 #9 (CD), TAF2, TAF4, TAF4b, TAF5, TAF6, TAF6b-1, TAF6b-2, TAF6b-3, TAF6b-4, TAF7, TAF8, TAF9, TAF10, TAF11, TAF12 1-200, TAF12 201-394, TAF12 395-538, TAF12b, TAF13, TAF14, TAF14b, TAF15, TAF15b. Molecular mass markers (kDa): 120, 100, 80, 60, 50, 40, 30.]

Figure 3-4. Immunoblots of TFIID components expressed as prey fusion proteins in AH109. Proteins were expressed as Gal4 AD fusion proteins with an internal HA-tag from a Gateway converted pGAD-T7 vector. Blots were probed with HA.11 antibody. The lane labeled pGAD-T7 is the non-Gateway modified empty vector control. Ubiquitin domain, UBD; bromodomain, BD; N-terminal domain, ND; middle region, MR; C-terminal domain, CD.

Table 3-4. A yeast two-hybrid targeted protein-protein interaction matrix between subunits of the Arabidopsis thaliana TFIID complex.

TAF12 201-394 TAF1 #4UbD TAF1 #5BD pBGK-T7 TAF6b-1 TAF6b-2 TAF6b-3 TAF6b-4 TAF1 #8 TAF1 #9 TAF12b TAF14b TAF4b TAF10 TAF11 TAF13 TAF14 TBP1 TBP2 TAF2 TAF4 TAF5 TAF6 TAF8 TAF9 LAM 53

pGAD-T7 8/1+ TBP1 13/1+ 13/1+ TBP2 14/1+ 12/4+ TAF1 #4 UbD TAF1 #5 BD TAF1 #6 3/2+ 3/2+ 5/2+ TAF1 #8 11/1+ TAF1 #9 9/1+ 5/2+ 3/2+ 11/1+ 8/1+ TAF2 TAF4 5/1+ 6/1+ 10/1+ TAF4b 5/1+ 2/3+ 4/2+ 3/4+ 3/3+ 3/3+ 4/2+ TAF5 14/1+ 5/1+ 5/4+ 9/3+ 12/1+ 8/4+ 6/3+ 14/1+ 7/4+ 6/3+ TAF6 7/2+ TAF6b-1 12/1+ 7/4+ 12/1+ TAF6b-2 6/1+ 13/1+ 10/4+ 8/2+ 11/1+ TAF6b-3 8/1+ 10/1+ 8/1+ 10/2+ 10/1+ TAF6b-4 12/2+ TAF7 2/3+ 4/2+ TAF8 3/3+ 2/3+ 4/2+ 3/1+ 4/3+ 2/4+ 3/3+ 4/3+ TAF9 3/2+ 5/4+ 7/3+ 6/3+ 6/4+ 12/1+ TAF10 14/1+ 3/2+ 2/4+ 2/4+ 6/1+ 2/3+ 2/4+ 8/3+ TAF11 10/1+ 10/1+ 10/1+ 6/2+ 3/4+ TAF12 1-200 TAF12 201-394 13/1+ 13/1+ 8/2+ TAF12 395-538 13/1+ 2/3+ 2/3+ 5/4+ 3/3+ 6/1+ 4/4+ 3/3+ 2/4+ 6/4+ 4/3+ TAF12b 3/2+ 2/3+ 9/1+ 2/3+ 2/3+ 3/3+ 8/3+ 3/2+ TAF13 5/3+ 6/3+ 14/1+ 5/3+ TAF14 8/1+ 3/2+ 8/1+ 4/2+ 10/1+ 6/3+ TAF14b 10/2+ 3/2+ 10/1+ 3/2+ 4/1+ 4/2+ TAF15 6/1+ 3/3+ 7/2+ TAF15b 3/3+ 3/4+ 7/3+ 4/4+ 2/4+

Note: Data describe the average number of days after spotting (DAS) at which colonies formed and how many of the five spots formed colonies (i.e., 6/3+ indicates that three spots in the serial-dilution series developed colonies in an average of 6 DAS). Boxes highlighted in orange and green are reciprocated growth interactions; bold boxes are tests for dimer formation. Green and blue interactions were validated by β-Gal assays, with darker boxes being stronger interactions. Red and orange interactions were negated by β-Gal NAct values. Prey constructs highlighted in blue are constructs that have not been shown as bait constructs (either because the bait proteins interact with the Gal4 AD, the bait proteins did not grow in the 5’FOA test, or they have not yet been tested as bait constructs).
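The growth notation used in Table 3-4 (for example, 6/3+) can be decoded mechanically. The following is a minimal sketch; the function name `parse_growth_score` is a hypothetical helper introduced here and was not part of the original analysis.

```python
import re


def parse_growth_score(cell):
    """Parse a Table 3-4 entry such as '6/3+'.

    Returns (average days after spotting, number of spots with colonies).
    """
    match = re.fullmatch(r"(\d+)/(\d)\+", cell.strip())
    if match is None:
        raise ValueError(f"unrecognized growth score: {cell!r}")
    return int(match.group(1)), int(match.group(2))


days, spots = parse_growth_score("6/3+")
print(f"{spots} of 5 spots grew in an average of {days} DAS")
```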

[Bar chart. Y-axis: β-galactosidase units (1 µmol CPRG x min-1 x cell-1), 0.0 to 9.0. X-axis: Bait/Prey construct pairs.]

Figure 3-5. Colorimetric assays of the β-galactosidase reporter levels in yeast diploids containing both bait and prey plasmids. Assays were performed using the CPRG liquid culture assay protocol in the Yeast Protocols Handbook (Clontech). Green bars indicate negative controls of empty vector and two control baits with Gal4-AD. Yellow bars indicate the negative control for each bait construct with the empty prey vector. Blue bars are interactions yielding β-galactosidase activities with NAct values above zero; red bars indicate β-galactosidase activities with NAct values below zero.

[Bar chart. Y-axis: β-galactosidase units (1 µmol CPRG x min-1 x cell-1), 0.0 to 8.0. X-axis: Bait/Prey construct pairs.]

Figure 3-5 continued.

[Interaction network diagram. Nodes: TBP1, TBP2, TAF1, TAF1b, TAF2, TAF4, TAF4b, TAF5, TAF6, TAF6b-1, TAF6b-2, TAF6b-3, TAF6b-4, TAF7, TAF8, TAF9, TAF10, TAF11, TAF11b, TAF12, TAF12b, TAF13, TAF14, TAF14b, TAF15, TAF15b. Annotations: "Possible pseudo-gene"; "Low conservation in histone-fold domain".]

Figure 3-6. Protein-protein interactions of Arabidopsis thaliana TFIID subunits as determined by yeast two-hybrid and β-galactosidase confirmations. Dashed black lines are interactions found in this study that have been demonstrated to occur with homologs from Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae. Solid green lines are novel interactions demonstrated only in this study. Striped figures could not be tested as baits.

[Interaction network diagram. Nodes: TBP1, TBP2, TAF1, TAF1b, TAF2, TAF4, TAF4b, TAF5, TAF6, TAF6b-1, TAF6b-2, TAF6b-3, TAF6b-4, TAF7, TAF8, TAF9, TAF10, TAF11, TAF11b, TAF12, TAF12b, TAF13, TAF14, TAF14b, TAF15, TAF15b. Annotations: "Possible pseudo-gene"; "Low conservation in histone-fold domain".]

Figure 3-7. Protein-protein interactions of Arabidopsis thaliana TFIID subunits as determined by yeast two-hybrid and β-galactosidase confirmations. Dashed black lines are interactions found in this study that have been demonstrated to occur with homologs from Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae. Solid green lines are novel interactions demonstrated only in this study. Striped figures could not be tested as baits.

CHAPTER 4 BINARY PROTEIN-PROTEIN INTERACTIONS OF Arabidopsis TFIIA, TFIIB, TFIID, TFIIE, AND TFIIF

Introduction

Relatively little research has been conducted on GTFs of the plant kingdom despite

a large number of genes present in the fully sequenced Arabidopsis thaliana genome that

are related to the GTFs of metazoans and yeast. A. thaliana TFIIA large subunit 1

(TFIIA-L1) has been shown to interact with TFIIA small subunit (TFIIA-S) and the

reconstituted recombinant TFIIA-L1/TFIIA-S complex is able to bind a TBP2/CaMV

35S promoter complex in vitro (Li et al., 1999). Furthermore, TBP2 has been shown to

interact with TFIIB1 (Pan et al., 2000).

Given the relative paucity of information regarding GTFs in plants, our

objectives were to identify and clone GTFs from a model plant system (Arabidopsis) and

to develop a rapid and reliable assay of GTF protein-protein interactions (TFIIH and

PolII have been excluded due to their multisubunit composition and greater roles in

transcription elongation than in initiation). No comprehensive analysis of GTF

interactions from any species has been conducted previously (with the exception of the

protein interaction maps of Arabidopsis and yeast TFIID) (Yatherajam et al., 2003) and

in fact there has been little or no publication on plant TFIIE or TFIIF complexes.

In this study, 878 binary protein-protein interactions between Arabidopsis TFIIA,

TFIIB, TFIID, TFIIE, and TFIIF components were examined (with the exception of intra-TFIID interactions). Of these potential interactions, 118 (13.4%) were positive. The

total number of protein-protein interactions that were tested regardless of bait or prey orientation was 664 (the non-redundant interaction set), leaving 214 protein-protein interactions that were reciprocally tested. Of the 664 non-redundant interactions, 104

(15.7%) were positive. Seventy-seven of these interactions are novel, having not been found to occur between homologs. The work shown here significantly adds to the knowledge base of GTF-GTF binary protein interactions and lends credence to the functionality of each GTF in the multiple gene families. With the plethora of different

GTFs in Arabidopsis, it seems likely that differential expression of GTFs targeted by different transcription factors is a key mechanism of control in plants.
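The interaction counts reported in this Introduction are internally consistent, as a short arithmetic check shows. The variable names below are illustrative; the numbers are those stated in the text.

```python
# Counts stated in the Introduction.
tested_total = 878           # all bait/prey tests, both orientations
positive_total = 118
tested_nonredundant = 664    # each protein pair counted once
positive_nonredundant = 104

# Tests in the full set that repeat a pair in the reverse orientation.
reciprocal = tested_total - tested_nonredundant

pct_total = round(100 * positive_total / tested_total, 1)
pct_nonredundant = round(100 * positive_nonredundant / tested_nonredundant, 1)

print(reciprocal, pct_total, pct_nonredundant)
```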

Materials and Methods

Experiments with TFIIA, TFIIB, TFIIE, and TFIIF were performed similarly to those with TFIID subunits (Chapter 3). First strand cDNA synthesis was performed as in

Chapter 3. Primers compatible with the pENTR/D-Topo or pDONR207 vectors

(Invitrogen) were designed to amplify the coding sequences (CDS) of the identified proteins (Table 4-1). Primers were designed and prepared, and PCR products were made, as in Chapter 3. The TFIIA-L2 entry clone was produced by PCR and pENTR/D-Topo cloning from the Arabidopsis Biological Resource Center (ABRC) clone. TFIIEβ1 was also cloned in this way from ABRC clone C103330. PCR products were reacted with pENTR/D-Topo per the manufacturer's instructions. The directional Topo cloning reaction products were transformed into One Shot TOP10 E. coli cells (Invitrogen, USA).

The resultant colonies were screened by PCR using a CDS specific and vector specific primer. Positive clones were verified by sequencing at the University of Florida

Microbiology and Cell Science Sequencing Core or Macrogen, Inc. (Seoul, Korea).


Upon screening and sequencing numerous TFIIEα2 clones, it became apparent that this

gene was toxic in the forward orientation in pENTR/D-Topo (clones were always found

in the reverse orientation). Therefore, this gene was cloned in two fragments into

pENTR/D-Topo (TFIIEα2 1-215 and TFIIEα2 200-475; Table 4-2 for primer

sequences).

Verified pENTR/D-Topo constructs were recombined (Gateway LR and BP

reactions, Invitrogen) successively through pGAD-T7Gateway, pDONR207, and pGBK-

T7Gateway and propagated in E. coli MACH1 cells (Invitrogen). Clones in pGAD-

T7Gateway and pGBK-T7Gateway were confirmed by digestion with BamHI as discussed in Chapter 3. A positive pGBK-T7Gateway clone of TFIIEβ1 was not obtained after numerous attempts, suggesting that this construct is toxic to E. coli.

Clones in pDONR207 were confirmed by digestion with BamHI and EcoRV as discussed in Chapter 3. The resultant yeast two-hybrid bait and prey constructs were transformed into MaV204K (Ito et al., 2000) and AH109 (MATa, trp1-901, leu2-3, 112, ura3-52, his3-200, gal4∆; gal80∆, LYS2::GAL1UAS-GAL1TATA –HIS3, GAL2UAS-GAL2TATA-ADE2,

URA3::MEL1UAS-MEL1TATA-lacZ, MEL1; Clontech) Saccharomyces cerevisiae strains,

respectively, as in Chapter 3. Bait constructs in MaV204K were plated on SD –Trp,

+0.1% 5’FOA to check for the potential to activate the SPAL10::URA3 reporter.

Expression of the URA3 protein resulted in no growth and indicated spurious activation

by the bait fusion protein. Suitable bait/MaV204K (non-activators) and prey/AH109

transformants were then mated with one another and selected on SD -Trp, -Leu (to test

for mating efficiency) and SD -Trp, -Leu, -His, -Ade (to test for interaction). Double

selection on plates lacking His and Ade served as a stringent screen for interactions.


Yeast matings were performed similarly to the protocol in the Yeast Protocols Handbook

(Clontech), as outlined in Chapter 3.

β-galactosidase assays were performed to obtain semi-quantitative data regarding the strength of individual interactions. β-galactosidase assays with the resulting

interactions were performed using CPRG substrate (Roche, Germany) according to the

Clontech Yeast Protocols Handbook. Normalized activities (NAct) were determined and

evaluated as in Chapter 3 with the exception of those involving the TFIIFβ2 prey protein.

The normalization value for interactions involving the TFIIFβ2 prey protein was

determined by the following equation:

NAct = AVGt – SDt – 1.1 x [AVG(AVGc1 + AVGc2) – 1.1 x AVG(SDc1 – SDc2)]

In this equation, AVGt is the average of the test

activities, SDt is the standard deviation of the test activities, AVGc1 is the average activity

of the appropriate bait negative control, SDc1 is the standard deviation of the activity of

the appropriate bait negative control, AVGc2 is the average activity of the empty vector

bait interacting with the TFIIFβ2 prey, and SDc2 is the standard deviation of the activity

of the empty vector bait interacting with the TFIIFβ2 prey. Immunoblots were performed

as described in Chapter 2.
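The normalization for the TFIIFβ2 prey can be written as a small function. This is a sketch of the equation exactly as printed above, under one loudly stated assumption: AVG(x + y) and AVG(x – y) are read as the two-term expressions divided by two. The text does not spell out this reading, and the function and argument names are introduced here for illustration only.

```python
def nact_tfiif_beta2(avg_t, sd_t, avg_c1, sd_c1, avg_c2, sd_c2):
    """Normalized activity for interactions involving the TFIIFbeta2 prey.

    avg_t, sd_t    : mean and standard deviation of the test activities
    avg_c1, sd_c1  : mean and SD of the bait negative control
    avg_c2, sd_c2  : mean and SD of empty-vector bait with the TFIIFbeta2 prey

    ASSUMPTION: AVG(a + b) is read as (a + b) / 2 and AVG(a - b) as
    (a - b) / 2; the sign inside the second AVG follows the printed equation.
    """
    mean_control = (avg_c1 + avg_c2) / 2
    mean_sd_term = (sd_c1 - sd_c2) / 2
    return avg_t - sd_t - 1.1 * (mean_control - 1.1 * mean_sd_term)


# Made-up activities, for illustration only (not data from this study).
example = nact_tfiif_beta2(avg_t=5.0, sd_t=0.5,
                           avg_c1=1.0, sd_c1=0.2,
                           avg_c2=2.0, sd_c2=0.1)
print(round(example, 4))
```

A value above zero would count as supporting evidence for an interaction, and a value of one or more as a strong interaction, following the thresholds given in the Results.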

Results

As expected, the TFIIA-L1 and TFIIA-L2 bait constructs did not grow on 5’FOA-containing plates, since TFIIA-L1 was shown to be an activator when artificially recruited to a yeast promoter (Li et al., 1999). After screening, it became apparent that

several bait proteins were interacting with the Gal4 activation domain (Gal4AD) due to

growth on SD -Trp, -Leu, -His, -Ade medium in all prey combinations. The Gal4AD-interacting bait clones TFIIB2, TFIIB4, TFIIB5, and TFIIEα2 200-475 have been eliminated from the interaction analysis for this reason. Histograms of the percentage of matings

that formed colonies for each bait and prey construct were created (Figures 4-1 and 4-2). These figures exclude the full-length TAF12 to avoid redundancy (as in Chapter 2).

The bait proteins that interacted with the Gal4 AD formed colonies with a frequency of

over 70% of their matings and were the only bait constructs with frequencies in this

range. This supported the assertion that the baits not observed to interact with the Gal4 AD were suitable for use as bait proteins.
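The frequency criterion described above (baits forming colonies in more than 70% of their matings) can be sketched as a simple filter. The function names, the 70% default cutoff as a parameter, and the example bait labels and outcomes are all hypothetical and are not data from this study.

```python
def growth_percentage(outcomes):
    """Percent of matings for one bait that yielded colony growth."""
    return 100.0 * sum(outcomes) / len(outcomes)


def flag_gal4ad_interactors(matings, cutoff=70.0):
    """Return baits whose growth frequency exceeds the cutoff.

    matings maps a bait name to a list of booleans, one per prey mating.
    """
    return sorted(bait for bait, outcomes in matings.items()
                  if growth_percentage(outcomes) > cutoff)


# Hypothetical example: baitA grows with 9 of 10 preys, baitB with 2 of 10.
example = {
    "baitA": [True] * 9 + [False],
    "baitB": [True] * 2 + [False] * 8,
}
print(flag_gal4ad_interactors(example))
```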

Immunoblots to detect bait and prey proteins (Figures 4-3 and 4-4, respectively)

demonstrated a large variability in protein expression levels. Bait and prey proteins used

in Chapter 3 are included in this study; the results of the immunoblots for these proteins

are in Figures 3-3 and 3-4. The majority of bait proteins were at steady-state levels that were below the level of detection. However, interaction data based on growth suggest that these proteins are present in the yeast cells. The results of the targeted two-hybrid screens are presented in Table 4-4. These interactions are depicted for TFIIA, TFIIB,

TFIID, TFIIE, TFIIF and as a summary in Figures 4-6 through 4-11, respectively.

The results of the β-galactosidase assays are shown in Figure 4-5. The β-galactosidase normalized activities were utilized to verify or exclude protein-protein interactions based on colony growth. NAct values below zero were not considered to indicate interactions, while NAct values above one were considered evidence of strong interactions. For interactions to be deemed positive, greater than 50% of interaction evidence was required to be positive. Thus, if an interaction is not supported by growth of diploid yeast containing reciprocated constructs, it must be supported by a NAct value greater than zero in order to be considered positive. If an interaction is supported by

growth of diploid yeast containing reciprocated constructs, only one of the two tested β-galactosidase activities must have a NAct greater than zero for the interaction to be considered positive. Strong interactions (NAct ≥ 1) are depicted in Figure 4-12.
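The greater-than-50% rule described above can be expressed as a majority vote over the available evidence. This is a minimal sketch; the function name and the representation of evidence as lists are assumptions made here for illustration.

```python
def is_positive(growth_results, nact_values):
    """Apply the majority rule for calling an interaction positive.

    growth_results : one boolean per tested orientation (colony growth)
    nact_values    : NAct of each beta-galactosidase assay performed
    An NAct greater than zero counts as positive evidence; the interaction
    is positive only if more than 50% of all evidence is positive.
    """
    evidence = list(growth_results) + [nact > 0 for nact in nact_values]
    return sum(evidence) > len(evidence) / 2


# Growth in one orientation only: the single NAct must exceed zero.
print(is_positive([True], [0.3]), is_positive([True], [-0.2]))
# Reciprocated growth: one of the two NAct values above zero is enough.
print(is_positive([True, True], [0.4, -0.1]))
```

Under this reading, both cases spelled out in the text follow from the same rule: two of two pieces of evidence are needed without reciprocated growth, and three of four with it.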

A total of 878 partially reciprocated binary interactions were tested, which included most possible interactions of TFIIA, TFIIB, TFIID, TFIIE, and TFIIF. Intra-TFIID interactions were excluded from this study, but are shown in Chapter 3. For a variety of reasons, bait constructs of TFIIA-L1, TFIIA-L2, TFIIB2, TFIIB4, TFIIB5, TAF1 #6,

TAF7, TAF12 1-200, TAF12 395-538, TAF15, TAF15b, TFIIEα2 200-475, and TFIIEβ1 were not included in this study. Therefore, interactions between these subunits could not be tested. This leaves a minimum of 169 untested interactions between these five complexes.

Discussion

Using homology-based searches of the A. thaliana genomic sequence database, 18 loci encoding putative TFIIA, TFIIB, TFIIE, and TFIIF subunits have been identified, in addition to the 23 loci discussed in Chapter 3 for TFIID. Of these, 47 putative GTF subunit coding sequences (including 4 splice variants of TAF6b and fragments of TAF1, TAF12, and TFIIEα2) have been cloned into the MATCHMAKER III yeast two-hybrid system (Clontech). Thirteen of these proteins led to transcriptional activation of the yeast reporter promoters in artificial recruitment assays when expressed as baits, either with or without co-expression of the Gal4 AD. Interactions among these 13 (TFIIA-L1, TFIIA-L2, TFIIB2, TFIIB4, TFIIB5, TAF1 #6, TAF7, TAF12 1-200, TAF12 395-538, TAF15, TAF15b, TFIIEα2 200-475, and TFIIEβ1) have therefore not been tested.


Development of another protein-protein interaction assay system will be necessary to test

interactions of these proteins.

TFIIA in Arabidopsis is potentially represented by three different complexes, which would result from heterodimer formation of TFIIA-S with TFIIA-L1,

TFIIA-L2, or TFIIA-L3. Because of the high degree of similarity between TFIIA-L1 and

TFIIA-L2, it is predicted that these two proteins have redundant functions. However,

TFIIA-L3 is significantly diverged in sequence from the other plant TFIIA-L proteins

(Chapter 2).

The small subunit of TFIIA displays a number of novel interactions, which have

not been found in the literature. These include interactions with TFIIB3, TFIIB6,

TAF4b, TAF8, TAF10, TAF12 1-200, TAF12b, TAF13, TAF14, and TFIIFβ2. Only

TFIIA-L3 interacts with a TFIIB homolog (TFIIB6); although in yeast, TFIIA-L interacts

with TFIIB in two-hybrid experiments. Since there are essentially two diverged versions

of TFIIA-L in Arabidopsis, some of the functions (interactions) of TFIIA may have been

evolutionarily transferred to TFIIA-S, which is present in every TFIIA complex.

TFIIB is represented by six different proteins in Arabidopsis. TFIIB1 and TFIIB2

are very similar in sequence and are suggested to play a canonical TFIIB role in formation of the PIC. Interestingly, TFIIB1 and TFIIB2 have very dissimilar yeast two-hybrid interactions. TFIIB1 interacts with TAF4, TAF4b, TAF8, TAF10, TAF12b,

TAF13, TFIIB5, TFIIEα1, and forms a homooligomer (possibly a dimer). TFIIB2 was only shown to interact with TAF10. This discrepancy in interactions was unexpected for two close homologs. When expressed as a bait construct, TFIIB2 was found to interact

125

with the Gal4 AD. This interaction might dampen the transcriptional response of the

TFIB2 prey construct leading to false negatives in this system.

TFIIB3 and TFIIB6 group closely in a phylogenetic analysis of the TFIIB-family of

proteins. This is remarkable because TFIIB6 lacks the second direct repeat region found

in TFIIB proteins, while TFIIB3 does not. Nonetheless, multiple interactions with other

proteins are in common between TFIIB3 and TFIIB6: TFIIA-S, TFIIB5, TAF12 395-538,

TAF12b, and TFIIFβ2. While neither of these proteins has been tested for interaction with TBP at the TATA-element (the gold standard for TFIIB function), their multiple protein-protein interactions with other GTF proteins strongly suggest a role in the PIC.

TFIIB5/pBrp has recently been shown to associate with the plastid outer envelope and presumably is involved in a signaling pathway from the plastid to the nucleus, triggering a transcriptional response (Lagrange et al., 2003). This protein is a bona fide TFIIB, because it interacts with TBP bound to the TATA-element in electrophoretic mobility shift assays. While the exact function of TFIIB5/pBrp is still unknown, the data presented here demonstrate that it interacts with many other proteins in TFIID, TFIIE, and TFIIF as well as with several of its TFIIB homologs. The protein interaction information presented here, along with the trafficking of this protein from the plastid to the nucleus under conditions of proteasome/signalosome dysfunction (Lagrange et al., 2003), strongly suggests a signal transduction role resulting in direct manipulation of the central proteins regulating transcription.

Arabidopsis TFIID appears to be composed of at least 15 different protein subunits,

some of which are present in multiple copies. There is no evidence of a TAF3-like

protein in plants other than a minimal similarity with TAF8. Interactions of TFIID

subunits with each other are explored in detail within Chapter 3.

Interestingly, TAF8 and TAF10 (which both dimerize and interact very strongly with one another) have a very similar pattern of interactions with the proteins of other GTFs. Both TAF8 and TAF10 were shown to interact with TFIIA-S, TFIIB1, TFIIB5/pBrp, TFIIEα1, and TFIIFβ2. These data suggest that TAF8 and TAF10 (perhaps as an α2β2 structure) mediate interactions between TFIID and TFIIA, TFIIB, TFIIE, and TFIIF. Some TAF-GTF interactions of TAF8 and TAF10 are not shared. These include TAF8-TFIIEβ2, TAF10-TFIIB2, TAF10-TFIIB3, and TAF10-TFIIB6; however, all of these interactions are consistent with the shared TAF8 and TAF10 interactions. The TAF8-TFIIEβ2 interaction is congruous with a TAF8-TAF10 heterotetramer interacting with the TFIIEα1-TFIIEβ2 heterotetramer. Similarly, the TAF10-TFIIB2 and TAF10-TFIIB6 interactions are consistent with a TAF8-TAF10 heterotetramer interacting with TFIIB proteins. None of these interactions of TAF8 or TAF10 with other GTFs have been reported previously in any organism. These data suggest a previously unknown role of a TAF8-TAF10 heterotetramer in PIC nucleation, at least within plants.
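The overlap between the TAF8 and TAF10 partner lists can be made explicit with a short set comparison. The following is only an illustrative sketch: the partner sets are transcribed from the paragraph above (non-TFIID partners only), and the variable names and the use of a Jaccard index as an overlap measure are assumptions introduced here, not part of the original analysis.

```python
# Non-TFIID GTF partners of TAF8 and TAF10, transcribed from the text above.
taf8_partners = {
    "TFIIA-S", "TFIIB1", "TFIIB5/pBrp", "TFIIEα1", "TFIIFβ2",  # shared
    "TFIIEβ2",                                                  # TAF8 only
}
taf10_partners = {
    "TFIIA-S", "TFIIB1", "TFIIB5/pBrp", "TFIIEα1", "TFIIFβ2",  # shared
    "TFIIB2", "TFIIB3", "TFIIB6",                               # TAF10 only
}

shared = taf8_partners & taf10_partners
only_taf8 = taf8_partners - taf10_partners
only_taf10 = taf10_partners - taf8_partners

# Jaccard index: shared partners divided by all distinct partners
# (an illustrative overlap measure, not one used in the study).
jaccard = len(shared) / len(taf8_partners | taf10_partners)

print("shared:", sorted(shared))
print("TAF8 only:", sorted(only_taf8))
print("TAF10 only:", sorted(only_taf10))
print(f"Jaccard overlap: {jaccard:.2f}")
```

Five of the nine distinct partners are shared, which is the pattern the paragraph describes qualitatively.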

Arabidopsis TFIIE proteins are encoded by five genes (three encoding TFIIEα subunits and two encoding TFIIEβ subunits). TFIIEα1 was shown to interact strongly with TFIIEβ2, although TFIIEα2 fragments did not interact with TFIIEβ proteins in this study. This may be caused by improper folding of TFIIEα2 domains when expressed in a fragmented form. TFIIEα3 was not examined in this study because it was not amplified under the RT-PCR conditions utilized. This gene is not represented by EST sequences and may therefore be a non-expressed pseudogene or an under-represented mRNA/cDNA. Interestingly, TFIIEβ1 did not interact with any other proteins in this study. This could be due to a two-amino-acid truncation of the C-terminus or another artifact of the protein expression system. However, this protein was readily expressed in the prey form, as evidenced by Figure 4-2.

TFIIF is a heterotetrameric complex of two TFIIFα and two TFIIFβ molecules (Orphanides et al., 1996). The yeast two-hybrid data presented here suggest an αββα composition for plant TFIIF, since the TFIIFβ2 protein was shown to dimerize while TFIIFα apparently did not. TFIIFβ1, like TFIIEα3, was not examined in this study because an RT-PCR product was not obtained. This gene is also not represented by ESTs, and given its divergence in primary structure from other TFIIFβ proteins in plants, it is considered a possible pseudogene (Chapter 2, Figure 2-12). In yeast, a third factor, TAF14, interacts as part of TFIIF (Henry et al., 1994). The data presented here demonstrate a connection of TAF14 and TAF14b with TFIIF; however, these interactions were between TAF14(b) and TFIIFβ2, not TFIIFα as in yeast. Since the TAF14(b)-TFIIF connection differs from that in yeast, support for TAF14 or TAF14b acting as TFIIF subunits is tenuous without direct evidence of their localization to an isolated TFIIF complex.

Strikingly, TFIIFβ2 interacts with many other GTF subunits (24 of 47), while TFIIFα only interacts with TFIIFβ2. TFIIFα did not interact with any subfragment of TAF1, although such an interaction is known to occur in other systems, where TAF1 acetylates TFIIFα. Interestingly, TFIIFβ2 interacts with both TAF1 #8 (middle region) and TAF1 UBD (an internally coded ubiquitin moiety plus over one hundred amino acids on either side, which is located in the middle region construct). The HAT/FAT domain of TAF1 is located in the TAF1 #8 construct, and is partially represented in the TAF1 UBD construct. This suggests that in Arabidopsis either TFIIFβ2 is acetylated in the place of TFIIFα, or TFIIFβ2 is a major stabilizer of the TAF1-TFIIFα interaction.

TFIIFβ2 interacted with four of six TFIIB homologs, as is expected since this interaction is a major connection point for TFIIB and PolII. Of the TFIIBs, only TFIIB2 and TFIIB4 did not interact with TFIIFβ2. In general, these two TFIIB homologs had the fewest interactions of the family, although both were detectably expressed as prey proteins. Both TFIIB2 and TFIIB4 contain putative zinc-binding domains that are implicated in interactions with TFIIFβ homologs (Buratowski and Zhou, 1993). Since false negatives are often a problem in any protein interaction study, definitive conclusions cannot be drawn with respect to these failed interactions. However, a lack of interactions with TFIIFβ2 and nearly all other GTFs tested here calls into question the veracity of TFIIB2 and TFIIB4 as functional TFIIB homologs.

Of the 118 interactions identified in this study, 86 were novel. A significant

portion of these are likely to be due to the lack of a previously performed systematic

study to specifically test interactions among TFIIA, TFIIB, TFIID, TFIIE, and TFIIF

from any organism. However, with the large number of novel plant GTF homologs

identified by homology-based searches, various specializations are to be expected. The

data provided by this study can lead to specific, testable hypotheses as to variable PIC

conformations. Further detailed analyses will be necessary to unravel the meanings

behind these varied binary interactions.
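For orientation, the two counts quoted in this summary (86 novel interactions out of 118, and TFIIFβ2 contacting 24 of the 47 tested constructs) can be expressed as percentages. This is a minimal arithmetic sketch; the helper name `percent` is introduced here for illustration only.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

novel_share = percent(86, 118)     # novel interactions among all identified
tfiif_beta2_share = percent(24, 47)  # constructs contacted by TFIIFβ2

print(f"novel interactions: {novel_share:.1f}%")
print(f"TFIIFβ2 partners:   {tfiif_beta2_share:.1f}%")
```

Roughly three quarters of the reported interactions are therefore novel, and TFIIFβ2 alone contacts about half of the constructs tested.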

Table 4-1. Primers for amplification of cDNAs of Arabidopsis homologs of TFIIA, TFIIB, TFIIE, and TFIIF for cloning into the pENTR/D-Topo vector.

Gene       Primer         Sequence
TFIIA-L1   Upper Primer   caccATGGGTACAACAACGACAACAAGCGCTGTGTATATCCATG
TFIIA-L1   Lower Primer   GAAGTCAAACTCGCCTGCTGCTTTGTTGAAGAGAATGTCCTTATC
TFIIA-L2   Upper Primer   caccATGGGTACAACAACGACAACAAGCGCTGTG
TFIIA-L2   Lower Primer   GAAGTCGAACTCGCCTGTTGCTTTGTTGAAGAGAATG
TFIIA-L3   Upper Primer   caccATGGTGTTATCAACGAGCGATACGAGTAGCTCTTACAACTATG
TFIIA-L3   Lower Primer   GAAGTTGAAATCTCCTGTTGCCTGTGAG
TFIIA-S    Upper Primer   caccATGGCGACGTTTGAGCTGTACAGGAGATCGACGATC
TFIIA-S    Lower Primer   CTGTGTGAGCAGCTTGGAATCACATGCCACTATCTTC
TFIIB1     Upper Primer   caccATGTCGGATGCGTATTGTACGGATTG
TFIIB1     Lower Primer   AGGACTTGACAGGTTTTTCAGATCCTCTTCCTTTGCATACCAAC
TFIIB2     Upper Primer   caccATGAGTGACGCGTTTTGTTCGGACTGTAAGAGGCACACGGA
TFIIB2     Lower Primer   AGGGCTTTGAAGGTTCTTGAGATCTTCTTCTTTAGCGTACCAAGCT
TFIIB3     Upper Primer   caccATGGAAGAAGAGACCTGCTTGGACTG
TFIIB3     Lower Primer   TACTGAAAATTTTGCAGAATCCCAGGACGTGATG
TFIIB4     Upper Primer   caccATGACGATGAAGTGGGGTCACAGTTGCAGGAGATGTAAG
TFIIB4     Lower Primer   AGGAGCTCCAAGGTTTTTCAGGTCATTTGCATTGGCAAACCAC
TFIIB5     Upper Primer   caccATGAAGTGTCCGTACTGTTCATC
TFIIB5     Lower Primer   GAAGTCTCCATGGGGATTATCAGCATTC
TFIIB6     Upper Primer   caccATGAAAGAAGACGGAATTTGCTTGGAGTGCAAGAGGCCAAC
TFIIB6     Lower Primer   AATAGTACCGAAAGAATCTCCAAGAAGCTTCACCGCTTTG
TFIIEα1    Upper Primer   caccATGGAAAAATCAGGCCCGGTGCAGAAAGCCGTTGTTCTC
TFIIEα1    Lower Primer   GCCTTCTTCCCAATCGACGTCGTCTTCCTCTTCTTCTTC
TFIIEα2    Upper Primer   caccATGGACAAATCAATCACGGTGGTGCGGAAAACCGTTGTG
TFIIEα2    Lower Primer   GCCTTCTTCCCAGTCGATGTCGTCATCTTCGTCTCCATCT
TFIIEα3    Upper Primer   caccATGGTGAAGCTTGTAGCGAAAAC
TFIIEα3    Lower Primer   GCATTCTTGCCAATCGACATCGTTTTCGTC
TFIIEβ1    Upper Primer   caccATGGCTTTGCGGGAGCAGCTTG
TFIIEβ1    Lower Primer   ACTCTGGAAGAGCTCGAGCATATGGGAATTG
TFIIEβ2    Upper Primer   caccATGGCTCTAAAGGAACAGCTAG
TFIIEβ2    Lower Primer   GTTCCGGGAACTGCTGCCGTTAAG

Table 4-1 Continued.

Gene       Primer         Sequence
TFIIFα     Upper Primer   caccATGTCGAACTGTTTGCAATTGAATACGTCTTGTGTTGGTTGCGGATCAC
TFIIFα     Lower Primer   AGCAAGCGGAGTAACATTATCTCTCAAAACAACAACAAACTTTTCAGAAC
TFIIFβ1    Upper Primer   caccATGGAAGATGTAAAGGTGGAAATGAAGGTAAG
TFIIFβ1    Lower Primer   TTCCTGAGTGGCTTTCTTATATTCAGGCTTCAG
TFIIFβ2    Upper Primer   caccATGGAAGATATTCATAATCTCGATATAGAG
TFIIFβ2    Lower Primer   CTGCCCACCTGTATCATCTTCAGCAG

Note: The lower case sequence “cacc” on the upper primers is for directional cloning into pENTR/D-Topo.
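The convention in the note (a lower-case "cacc" overhang followed by the gene-specific sequence) can be checked mechanically. The sketch below uses three upper primers copied from Table 4-1; the function name, the returned fields, and the GC-fraction summary are illustrative assumptions and were not part of the original primer design procedure.

```python
TOPO_TAG = "cacc"  # 5' overhang described in the table note

# Three upper primers copied from Table 4-1.
upper_primers = {
    "TFIIB1": "caccATGTCGGATGCGTATTGTACGGATTG",
    "TFIIB5": "caccATGAAGTGTCCGTACTGTTCATC",
    "TFIIEβ2": "caccATGGCTCTAAAGGAACAGCTAG",
}

def describe_upper_primer(seq: str) -> dict:
    """Split a primer into tag and gene-specific part and summarize it."""
    has_tag = seq.startswith(TOPO_TAG)
    gene_part = seq[len(TOPO_TAG):] if has_tag else seq
    gc = sum(base in "GC" for base in gene_part.upper()) / len(gene_part)
    return {
        "has_tag": has_tag,
        "starts_with_ATG": gene_part.upper().startswith("ATG"),
        "gene_specific_length": len(gene_part),
        "gc_fraction": round(gc, 2),
    }

for gene, seq in upper_primers.items():
    print(gene, describe_upper_primer(seq))
```

The same check could be run over the fragment primers in Table 4-2, where the TFIIEα2 200-475 upper primer carries the tag but is not expected to begin at the native start codon.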

Table 4-2. Primers for cloning of TFIIEα2 N-terminal and C-terminal fragments. Lower case sequences are not homologous to the gene of interest, but are required for cloning into the pENTR/D-Topo vector.

Gene              Primer         Sequence
TFIIEα2 1-215     Upper Primer   caccATGGACAAATCAATCACGGTGGTGCGGAAAACCGTTGTG
TFIIEα2 1-215     Lower Primer   TCTATCTACTACTTCTTCGGAAATTAGCTTGTTACATTC
TFIIEα2 200-475   Upper Primer   caccGTTATGGAATGTAACAAGCTAATTTCCGAAGAAG
TFIIEα2 200-475   Lower Primer   GCCTTCTTCCCAGTCGATGTCGTCATCTTCGTCTCCATCT

Table 4-3. Arabidopsis thaliana TFIIA, TFIIB, TFIIE, and TFIIF component cDNA GenBank accession numbers.

Gene              Locus       Accession   CDS Size (bp)   Mw (kDa)   pI
TFIIA-S           At4g24440   AY463599    321             12.1       5.61
TFIIA-L1          At1g07480   AY463627    1,128           41.3       3.98
TFIIA-L2          At1g07470   AY463597    1,128           41.2       4.02
TFIIA-L3          At5g59230   AY463598    561             20.9       3.94
TFIIB1            At2g41630   AY463600    939             34.3       6.77
TFIIB2            At3g10330   AY463601    939             34.2       6.66
TFIIB3            At3g29380   AY463629    1,011           37.7       6.32
TFIIB4            At3g57370   AY463602    1,083           39.7       7.76
TFIIB5            At4g36650   AY463603    1,512           55.7       6.14
TFIIB6            At4g10680   AY463604    549             19.9       8.89
TFIIEα1           At1g03280   AY463605    1,440           54.1       4.75
TFIIEα2           At4g20340   N/A         1,428           54.6       4.95
TFIIEα2 1-215     At4g20340   AY463606    645             25.5       9.04
TFIIEα2 200-475   At4g20340   AY463607    825             30.9       4.32
TFIIEα3           At4g20810   N/A         1,251           47.8       4.72
TFIIEβ1           At4g21010   AY463610    828             31.5       10.23
TFIIEβ2           At4g20330   AY463608    861             32.4       10.04
TFIIFα            At4g12610   AY463611    1,950           72.3       5.22
TFIIFβ1           At3g52270   N/A         1,095           42.1       7.70
TFIIFβ2           At1g75510   AY463609    761             29.7       6.92
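A quick consistency check on the CDS sizes in Table 4-3 is to confirm that each length is a whole number of codons. The sketch below does this for a subset of rows. The function name is hypothetical, and whether the listed sizes include a stop codon is not stated in the table, so the check is limited to divisibility by three. An entry that fails the check is not necessarily wrong and would need to be compared against its GenBank record.

```python
# CDS sizes (bp) copied from a subset of rows in Table 4-3.
cds_bp = {
    "TFIIA-S": 321,
    "TFIIB1": 939,
    "TFIIEα2": 1428,
    "TFIIEα2 1-215": 645,
    "TFIIFα": 1950,
    "TFIIFβ2": 761,
}

def codon_check(length_bp: int) -> tuple[int, bool]:
    """Return (number of complete codons, whether the length is a multiple of 3)."""
    codons, remainder = divmod(length_bp, 3)
    return codons, remainder == 0

# Collect entries whose length is not a whole number of codons.
flagged = [gene for gene, bp in cds_bp.items() if not codon_check(bp)[1]]

for gene, bp in cds_bp.items():
    codons, in_frame = codon_check(bp)
    print(f"{gene}: {bp} bp, {codons} codons, multiple of 3: {in_frame}")
print("entries to re-check against GenBank:", flagged)
```

For example, the 645 bp listed for the TFIIEα2 1-215 fragment corresponds to exactly 215 codons, matching the fragment name.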

[Figure 4-1: histogram. x-axis: Percent Growth (bins of 10 from 0-10 to 90-100); y-axis: Frequency (0 to 20).]

Figure 4-1. Histogram of percent of matings, per bait construct, that yielded colony growth. Bars marked in red represent bait constructs that were excluded from the analysis and considered spurious bait activators.

[Figure 4-2: histogram. x-axis: Percent Growth (bins of 10 from 0-10 to 90-100); y-axis: Frequency (0 to 16).]

Figure 4-2. Histogram of percent of matings, per prey construct, that yielded colony growth.

[Figure 4-3: immunoblot image. Lanes: pGBK-T7, TFIIA-L3, TFIIA-S, TFIIB1, TFIIB3, TFIIB6, TFIIEα1, TFIIEα2 1-215, TFIIEβ2, TFIIFα, TFIIFβ2. Size markers: 120, 100, 80, 60, 50, 40, and 30 kDa.]

Figure 4-3. Immunoblots of TFIIA, TFIIB, TFIIE, and TFIIF components expressed as bait fusion proteins in MaV204K. Proteins were expressed as Gal4 DNA binding domain fusion proteins with an internal Myc-tag from a Gateway converted pGBK-T7 vector. Blots were probed with anti-Myc 9B11 antibody. The lane labeled pGBK-T7 is the non-Gateway modified empty vector control.

[Figure 4-4: immunoblot image. Lanes: pGAD-T7, TFIIA-L1, TFIIA-L2, TFIIA-L3, TFIIA-S, TFIIB1, TFIIB2, TFIIB3, TFIIB4, TFIIB5, TFIIB6, TFIIEα1, TFIIEα2 1-215, TFIIEα2 200-475, TFIIEβ1, TFIIEβ2, TFIIFα, TFIIFβ2. Size markers: 120, 100, 80, 60, 50, 40, and 30 kDa.]

Figure 4-4. Immunoblots of TFIIA, TFIIB, TFIIE, and TFIIF components expressed as prey fusion proteins in AH109. Proteins were expressed as Gal4 AD fusion proteins with an internal HA-tag from a Gateway converted pGAD-T7 vector. Blots were probed with HA.11 antibody. The lane labeled pGAD-T7 is the non-Gateway modified empty vector control.

Table 4-4. A yeast two-hybrid targeted protein-protein interaction matrix between components of Arabidopsis thaliana TFIIA, TFIIB, TFIIE, and TFIIF with subunits of the TFIID complex.

Note: Each cell gives the average number of days after spotting (DAS) at which colonies appeared and how many of the five spots in the serial-dilution series formed colonies (i.e., 6/3+ indicates that three spots in the serial-dilution series developed colonies in an average of 6 DAS). Boxes highlighted in orange and green are reciprocated growth interactions; bold boxes are tests for dimer formation. Green and blue interactions were validated by β-Gal assays, with darker boxes being stronger interactions. Red and orange interactions were negated by β-Gal NAct values. Prey constructs highlighted in blue are constructs that have not been shown as bait constructs (either because the bait proteins interacted with the Gal4 AD, the bait proteins did not grow in the 5-FOA test, or they have not yet been tested as bait constructs).
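The cell notation described in the note (for example "6/3+") can be read programmatically as a pair of values. The following sketch is an assumption-laden illustration: the type and function names are hypothetical, the trailing "+" is treated as an optional marker because the note does not define it separately, and the spot count is restricted to 0 through 5 because the note mentions five spots.

```python
import re
from typing import NamedTuple

class SpotScore(NamedTuple):
    mean_das: float  # average days after spotting until colonies appeared
    spots: int       # number of the five dilution spots that formed colonies

# "<days>/<spots>" with an optional trailing "+", e.g. "6/3+".
_SCORE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*/\s*([0-5])\+?\s*$")

def parse_score(cell: str) -> SpotScore:
    """Parse one matrix cell such as '6/3+' into (mean DAS, spots with growth)."""
    match = _SCORE.match(cell)
    if match is None:
        raise ValueError(f"unrecognized score: {cell!r}")
    return SpotScore(float(match.group(1)), int(match.group(2)))

print(parse_score("6/3+"))
```

Parsed this way, cells can be ranked by growth strength, with more spots and fewer days indicating a stronger growth phenotype.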

[Figure 4-5: bar chart. x-axis: Bait/Prey construct pairs; y-axis: β-galactosidase units (1 µmol CPRG x min-1 x cell-1), 0.0 to 6.0.]

Figure 4-5. Colorimetric assays of the β-galactosidase reporter levels in yeast diploids containing both bait and prey plasmids. Assays were performed using the CPRG liquid culture assay protocol in the Yeast Protocols Handbook (Clontech). Green bars indicate negative controls of empty vector and two control baits with Gal4-AD. Yellow bars indicate the negative control for each bait construct with the empty prey vector. Blue bars are interactions yielding β-Galactosidase activities with NAct values above zero; red bars indicate β-Galactosidase activities with NAct values below zero.

[Figure 4-5 continued: bar chart. x-axis: Bait/Prey construct pairs; y-axis: β-galactosidase units (1 µmol CPRG x min-1 x cell-1), 0.0 to 5.0.]

Figure 4-5 continued.

[Figure 4-6: interaction network diagram of the TFIIA, TFIIB, TFIID (TBP and TAF), TFIIE, and TFIIF proteins; nodes carry the annotations "Possible pseudogene" and "Low conservation in histone-fold domain" where applicable.]

Figure 4-6. Protein-protein interactions of Arabidopsis thaliana TFIIA subunits with components of TFIIB, TFIID, TFIIE, TFIIF, and other TFIIA components as determined by yeast two-hybrid and β-galactosidase confirmations. Dashed lines are interactions found in this study that have been demonstrated to occur with homologs from Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae. Solid lines are novel interactions demonstrated only in this study. Striped figures could not be tested as baits.

[Figure 4-7: interaction network diagram of the TFIIA, TFIIB, TFIID (TBP and TAF), TFIIE, and TFIIF proteins; nodes carry the annotations "Possible pseudogene" and "Low conservation in histone-fold domain" where applicable.]

Figure 4-7. Protein-protein interactions of Arabidopsis thaliana TFIIB homologs with components of TFIIA, TFIID, TFIIE, TFIIF, and other homologs of TFIIB as determined by yeast two-hybrid and β-galactosidase confirmations. Dashed black lines are interactions found in this study that have been demonstrated to occur with homologs from Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae. Solid green lines are novel interactions demonstrated only in this study. Striped figures could not be tested as baits.

[Figure 4-8: interaction network diagram of the TFIIA, TFIIB, TFIID (TBP and TAF), TFIIE, and TFIIF proteins; nodes carry the annotations "Possible pseudogene" and "Low conservation in histone-fold domain" where applicable.]

Figure 4-8. Protein-protein interactions of Arabidopsis thaliana TFIID components with components of TFIIA, TFIIB, TFIIE, and TFIIF as determined by yeast two-hybrid and β-galactosidase confirmations. Dashed black lines are interactions found in this study that have been demonstrated to occur with homologs from Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae. Solid green lines are novel interactions demonstrated only in this study. Striped figures could not be tested as baits.

[Figure 4-9: interaction network diagram of the TFIIA, TFIIB, TFIID (TBP and TAF), TFIIE, and TFIIF proteins; nodes carry the annotations "Possible pseudogene" and "Low conservation in histone-fold domain" where applicable.]

Figure 4-9. Protein-protein interactions of Arabidopsis thaliana TFIIE subunits with components of TFIIA, TFIIB, TFIID, TFIIF, and other TFIIE components as determined by yeast two-hybrid and β-galactosidase confirmations. Dashed black lines are interactions found in this study that have been demonstrated to occur with homologs from Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae. Solid green lines are novel interactions demonstrated only in this study. Striped figures could not be tested as baits.

[Figure 4-10: interaction network diagram of the TFIIA, TFIIB, TFIID (TBP and TAF), TFIIE, and TFIIF proteins; nodes carry the annotations "Possible pseudogene" and "Low conservation in histone-fold domain" where applicable.]

Figure 4-10. Protein-protein interactions of Arabidopsis thaliana TFIIF subunits with components of TFIIA, TFIIB, TFIID, TFIIE, and other TFIIF components as determined by yeast two-hybrid and β-galactosidase confirmations. Dashed black lines are interactions found in this study that have been demonstrated to occur with homologs from Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae. Solid green lines are novel interactions demonstrated only in this study. Striped figures could not be tested as baits.

[Figure 4-11: interaction network diagram of the TFIIA, TFIIB, TFIID (TBP and TAF), TFIIE, and TFIIF proteins; nodes carry the annotations "Possible pseudogene" and "Low conservation in histone-fold domain" where applicable.]

Figure 4-11. Protein-protein interactions of Arabidopsis thaliana TFIIA, TFIIB, TFIIE, and TFIIF with each other and subunits of TFIID as determined by yeast two-hybrid and β-galactosidase confirmations. Dashed black lines are interactions found in this study that have been demonstrated to occur with homologs from humans, fruit flies, and/or yeast. Solid green lines are novel interactions demonstrated only in this study. Striped figures could not be tested as baits.

[Figure 4-12: interaction network diagram of the TFIIA, TFIIB, TFIID (TBP and TAF), TFIIE, and TFIIF proteins; nodes carry the annotations "Possible pseudogene" and "Low conservation in histone-fold domain" where applicable.]

Figure 4-12. Strong protein-protein interactions of Arabidopsis thaliana TFIIA, TFIIB, TFIIE, and TFIIF with each other and subunits of TFIID as determined by yeast two-hybrid and β-galactosidase confirmations. Dashed black lines are interactions found in this study that have been demonstrated to occur with homologs from humans, fruit flies, and/or yeast. Solid green lines are novel interactions from this study. Striped figures could not be tested as baits.

CHAPTER 5 DISCUSSION

Using homology-based searches of the Arabidopsis thaliana genomic sequence database, 41 loci encoding putative TFIIA, TFIIB, TFIID, TFIIE, and TFIIF subunits have been identified. TFIIA, TFIIB, TFIID, TFIIE, and TFIIF are encoded by four, six, twenty-three, five, and three of these loci, respectively. In total, 38 cDNAs corresponding to these genes have been cloned. Their protein-protein interactions and phylogenetic relationships have been analyzed and are discussed below.

TFIIA Large and Small Subunits

TFIIA is composed of either two subunits (L and S) in fungi and plants or three subunits (α, β, and γ) in metazoans, where α and β are derived from post-translational cleavage of a protein homologous to the fungal and plant TFIIA-L subunit (Li et al., 1999). In A. thaliana, TFIIA is encoded by four genes. Three of these encode large subunit homologs and one encodes a small subunit homolog. Poplar TFIIA also appears to be encoded by four genes (two encoding large subunit proteins and two encoding small subunit proteins). Judging by their high degree of identity and their close juxtaposition in chromosomal location, AtTFIIA-L1 and AtTFIIA-L2 appear to have resulted from a recent gene duplication (Fig. 2-2). Because of this similarity, TFIIA-L1 and TFIIA-L2 are predicted to have redundant functions. The TFIIA-S family is conserved throughout the length of the protein, while the TFIIA-L family is conserved mainly at the N-terminal and C-terminal ends. The TFIIA-L sequence conservation pattern is consistent with the observation that human and fruit fly TFIIA is composed of three subunits, the two largest of which are derived from proteolytic cleavage of the TFIIAα/β (TFIIA-L) pre-protein. This suggests that the middle region of TFIIA-L proteins may function as a flexible linker.

AtTFIIA-L3 appears to have arisen from a more ancient gene duplication and has

significantly diverged from other TFIIA-L genes (Figure 2-2). The AtTFIIA-L3 protein

is approximately half the size of its two Arabidopsis homologs, although it has

maintained a similar pI and appears to be competent for assembly of the TFIIA complex

based on yeast two-hybrid interactions (Chapter 4). One hypothesis is that AtTFIIA-L3 represents an ancestral form of the TFIIA-L protein family due to its phylogenetic clustering with fungal and metazoan sequences.

Arabidopsis and poplar both encode TFIIA proteins that are highly diverged from other plant proteins (TFIIA-L3 and TFIIA-S2, respectively). These proteins may have evolved specialized functions within their respective organisms; however, they do not seem to be conserved within other plant genomes that have been sequenced. This suggests that these proteins are potentially evolutionary experiments in progress or may be ancestral forms of these proteins that have been retained in their respective species.

The small subunit of Arabidopsis TFIIA displays a number of novel interactions, which have not been previously identified. These include interactions with TFIIB3,

TFIIB6, TAF4b, TAF8, TAF10, TAF12 1-200, TAF12b, TAF13, TAF14, and TFIIFβ2.

Only TFIIA-L3 interacts with a TFIIB homolog (TFIIB6), although in yeast TFIIA-L interacts with TFIIB in two-hybrid experiments. Since essentially two diverged versions

of TFIIA-L (TFIIA-L1/TFIIA-L2 and TFIIA-L3) exist in Arabidopsis, some of the functions (interactions) of TFIIA may have evolutionarily been selected to occur in

TFIIA-S, which is present in every TFIIA complex.

TFIIB Family

Full-length cDNAs of over 30 plant homologs of TFIIB have been identified. The TFIIB protein family has undergone a myriad of duplications and differentiations, including one (the Class C TFIIB-related proteins) that has evolved a functional interaction with the defining plant organelle, the plastid (Lagrange et al., 2003). Six distinct phylogenetic TFIIB groups are apparent in Arabidopsis and Populus (if one accounts for the BRFs). Clear homologs of DNA-dependent RNA polymerase III (PolIII) associated TFIIB-related factors (BRFs) from plants were excluded from the phylogenetic analysis, with the exception of the Arabidopsis proteins, which were used as an outgroup.

Plant TFIIB homologs have a number of conserved motifs. These include a lysine residue in the second direct repeat, which has been shown to be autoacetylated in human and yeast TFIIBs (Choi et al., 2003). Choi et al. (2003) reported that the presence of the acetyllysine group increases TFIIB affinity for TFIIF, implying a role in transcriptional initiation. It is likely that an activity involved in this critical process will be conserved not only in metazoans and fungi, but also in plants. Equally significant is the absence of this lysine in several of the plant TFIIBs, suggesting plant-specific specialization among members of the TFIIB family. Arabidopsis TFIIB1, TFIIB2, and TFIIB3 all contain this lysine residue, while TFIIB4, TFIIB5, and TFIIB6 do not. Interestingly, TFIIB2 and TFIIB4 do not interact with TFIIFβ2 in yeast two-hybrid assays while the other TFIIB homologs do. Because TFIIB2 retains the lysine yet fails to interact, whereas TFIIB5 and TFIIB6 lack it yet do interact, the acetyllysine cannot be the only factor involved in the TFIIB-TFIIF interaction.
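The relationship between the conserved lysine and the TFIIFβ2 interaction can be laid out as a small two-by-two tally. The sketch below encodes only what the paragraph above states for the six Arabidopsis TFIIB homologs; the dictionary and variable names are illustrative, and with six proteins the tally is descriptive rather than a statistical test.

```python
from collections import Counter

# Presence of the conserved lysine in the second direct repeat (from the text).
has_lysine = {
    "TFIIB1": True, "TFIIB2": True, "TFIIB3": True,
    "TFIIB4": False, "TFIIB5": False, "TFIIB6": False,
}
# Yeast two-hybrid interaction with TFIIFβ2 (from the text).
binds_tfiif_beta2 = {
    "TFIIB1": True, "TFIIB2": False, "TFIIB3": True,
    "TFIIB4": False, "TFIIB5": True, "TFIIB6": True,
}

# Tally proteins by (lysine present, interaction observed).
tally = Counter(
    (has_lysine[protein], binds_tfiif_beta2[protein]) for protein in has_lysine
)

for (lysine, binds), count in sorted(tally.items(), reverse=True):
    print(f"lysine={lysine!s:5} interacts={binds!s:5} n={count}")
```

Both discordant cells are occupied (lysine without interaction, and interaction without lysine), which is the basis for concluding that the lysine alone does not determine the interaction.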

The putative zinc-ribbon domain at the N-terminus has been conserved in most

TFIIB-family members. Significantly, poplar TFIIB3, poplar TFIIB8, poplar TFIIB9,

Lycopersicon esculentum TFIIB AF273333, and Sulfolobus solfataricus TFB

AAK40772.1 are all missing the conserved cysteine and/or histidine residues essential to this N-terminal motif. The absence of the zinc ribbon from the TFB of at least one archaeal species (Sulfolobus solfataricus) suggests that this motif may not be required for TFIIB function in all cases.

Another conserved feature, the imperfect direct repeats (Nikolov et al., 1995), is found in most plant TFIIB homologs (Appendix B3). Although not closely related, AtTFIIB6 and the predicted amino acid sequence of poplar TFIIB4 are both lacking the second direct repeat region, which interacts directly with PolII in animals (Ha et al., 1993). Therefore, it is suggested that these proteins may be deficient in this PolII interaction and, if they are functional, could possibly play a role as negative regulators. Vitis vinifera TFIIB TC9302, as well as amino acid predictions from poplar TFIIB5 and TFIIB6, are notably lacking both direct repeats, suggesting that these proteins, if expressed, are not functional TFIIB homologs since they would likely be deficient in TBP and PolII interactions (Ha et al., 1993).

In addition to the TFIIB-family proteins in Class A and Class B (BRFs, which were not analyzed extensively in this study), a clear conservation has been shown for the plastid-envelope associated TFIIB-like proteins (Class C). The canonical member of the

Class C group (TFIIB5/pBrp) was discovered by Lagrange et al. (2003) to bind the outer

envelope of the plastid, suggesting a function in signal transduction from the plastid to

the nucleus. This plant-specific TFIIB is not detectable in the nucleus of wild type plants

(Lagrange et al., 2003). The characterized protein AtTFIIB5/pBrp has two closely

related homologs (Lycopersicon esculentum Accession AAG01118, and poplar

TFIIB7/pBrp) in addition to the partial cDNAs from Spinacia oleracea and Zea mays reported by Lagrange et al. (2003). This suggests that this protein is widely distributed in plants and has a conserved activity that is critical to plant cell functions. TFIIB5/pBrp is also a bona fide TFIIB, because it interacts with TBP bound to the TATA-element in EMSA experiments. While the exact function of TFIIB5/pBrp is still unknown, the data presented here demonstrate that it interacts with many other proteins in TFIID, TFIIE, and TFIIF as well as with several of its TFIIB homologs. These interactions, along with the trafficking of this protein from the plastid to the nucleus under the conditions of proteasome/signalosome dysfunction (Lagrange et al., 2003), strongly suggest a signal transduction role resulting in direct manipulation of the central proteins regulating transcription.

There is weak bootstrap support for two additional conserved classes of TFIIB-like proteins in plants. The Class D group contains Arabidopsis TFIIB4 and Poplar TFIIB8. Class E contains Arabidopsis TFIIB3 and TFIIB6, as well as Poplar TFIIB2. The functions of these proteins are unknown; however, Arabidopsis TFIIB3 and TFIIB6 have similar interactions with other GTFs (Chapter 4).

TFIIB is represented by six different proteins in Arabidopsis. TFIIB1 and TFIIB2

are both closely related Class A proteins. Interestingly, TFIIB1 and TFIIB2 have very

dissimilar yeast two-hybrid interactions. TFIIB1 interacts with TAF4, TAF4b, TAF8,


TAF10, TAF12b, TAF13, TFIIB5, TFIIEα1, and forms a homooligomer (possibly a

dimer). However, TFIIB2 only interacts with TAF10. This discrepancy in interactions

was unexpected for two close homologs, and is difficult to explain. When expressed as a

bait construct, TFIIB2 was found to interact with the Gal4 AD; this interaction might dampen the transcriptional response of the TFIIB2 prey construct, leading to false

negatives.
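The contrast between the two Class A paralogs can be made concrete by treating each protein's yeast two-hybrid partners as a set. The following is a minimal sketch in Python; the partner lists are transcribed from the paragraph above, while the variable names and the use of the Jaccard index as a summary statistic are illustrative assumptions and were not part of the original analysis.

```python
# Yeast two-hybrid partners as reported in the text above.
# "TFIIB1" inside its own set records the homooligomer.
tfiib1_partners = {
    "TAF4", "TAF4b", "TAF8", "TAF10", "TAF12b", "TAF13",
    "TFIIB5", "TFIIEα1", "TFIIB1",
}
tfiib2_partners = {"TAF10"}


def jaccard(a: set, b: set) -> float:
    """Shared partners divided by all partners seen for either protein."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


shared = tfiib1_partners & tfiib2_partners
only_tfiib1 = tfiib1_partners - tfiib2_partners

print("shared partners:", sorted(shared))
print("TFIIB1-only partners:", sorted(only_tfiib1))
print(f"Jaccard index: {jaccard(tfiib1_partners, tfiib2_partners):.3f}")
```

A low index for two close homologs is what makes the possibility of false negatives for the TFIIB2 constructs worth considering before drawing functional conclusions.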

TFIIB3 and TFIIB6 group closely in a phylogenetic analysis of the TFIIB-family of

proteins. However, TFIIB6 lacks the second direct repeat region found in TFIIB proteins, whereas TFIIB3 retains it. This suggests that these two proteins may have vastly different roles in transcription, with TFIIB6 perhaps playing a role as a repressor of

transcription. Alternatively, TFIIB6 may be lacking this second repeat domain through a

functionally deleterious mutation.

TFIID Components

TBP is widely regarded as being the rate-limiting factor of PIC formation (Klein, 1994; Collart, 1996; Chatterjee, 1995). Consistent with this critical role,

it is among the most highly conserved proteins of the GTFs through all the organisms

examined in this study, with 73.7% and 63.1% average similarity and identity,

respectively. Likewise, TBP is highly conserved in plants in general (84% average

identity). As in animals, many plants contain duplicate TBP genes; however,

unlike the case in animals, the plant proteins are highly similar and are likely redundant.

Plant TBPs are tightly clustered phylogenetically, although significantly diverged from

metazoan, fungal, protist, and archaeal TBPs and TBP-like proteins. As would be

expected, the two repeated structural domains of TBP are conserved in all the proteins in the


TBP-like family. Arabidopsis TBPs, like their counterpart in yeast, interact with very few proteins in yeast two-hybrid assays. TBP1 and TBP2 interact with the N-terminus of TAF1

and TBP1 interacts with TFIIFβ2. The paucity of TBP interactions may be due to a

dampening of the Gal4 AD activity by TBP itself.
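The average identity figures quoted above for TBP are summaries over pairwise comparisons of aligned sequences. As a minimal sketch of how such a figure can be computed, the Python below averages percent identity over all pairs in a small alignment. The three aligned strings are hypothetical toy sequences, and the convention of skipping any column that contains a gap is an assumption; alignment programs differ in how they choose the denominator, so this is not necessarily the calculation used in this study.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def average_pairwise_identity(aligned: list) -> float:
    """Mean percent identity over every unordered pair of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment, not taken from the TBP data set.
toy_alignment = ["MKT-AYI", "MKTQAYV", "MRT-AYI"]
print(f"{average_pairwise_identity(toy_alignment):.1f}% average identity")
```

Average similarity would follow the same pattern, with the equality test replaced by a lookup in a substitution matrix.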

Arabidopsis TFIID is composed of at least 15 different protein subunits, some of

which are present in multiple copies. In general, TBP-associated factors are more highly

variable than TBP. There are many cases of duplicate TAFs as well as TAF-like proteins

in fungi and animals (reviewed in Tora, 2002). One extreme example of this type of

duplication in Arabidopsis is provided by the TAF6 homologs. Two genetic loci encoding

homologs of TAF6 have been identified in Arabidopsis. One of these genes, TAF6b, is alternatively spliced into four distinct messages. The TAF genes that were phylogenetically analyzed were found to be clearly divergent along taxonomic lines.

This was demonstrated by TAF9, for which the monocot and dicot sequences clustered separately in the unrooted phylogram. However, poplar TAF9b is more closely related to the Chlamydomonas reinhardtii TAF9 than to the monocot or dicot TAF9 proteins. This

TAF9 homolog is perhaps a TAF9-like protein involved in other transcriptional complexes, similar to H. sapiens TAF9L (Chen and Manley, 2003). A second possibility is that poplar TAF9b may be a bona fide TAF9 that regulates a subset of genes in poplar, or is merely redundant. Finally, poplar TAF9b could represent an ancient form of the gene that has been maintained in this lineage.

TAF8 and TAF10 both dimerize and interact very strongly with one another.

These two proteins also have a very similar pattern of interactions with the proteins of other GTFs. Both TAF8 and TAF10 were shown to interact with TFIIA-S, TFIIB1,


TFIIB5/pBrp, TFIIEα1, and TFIIFβ2. These data suggest that TAF8 and TAF10

(perhaps as an α2β2 structure) mediate interactions between TFIID and TFIIA, TFIIB,

TFIIE, and TFIIF. Some non-shared interactions are consistent with the shared

interactions, such as the TAF8-TFIIEβ2 interaction, which supports the TAF8-TAF10

heterotetramer interacting with the TFIIEα1-TFIIEβ2 heterotetramer. To the best of my

knowledge, none of these interactions of TAF8 or TAF10 with other GTFs have been

reported previously in any organism. These data suggest a previously unknown role of a TAF8-TAF10 heterotetramer in PIC nucleation, at least within plants.

TFIIEα and TFIIEβ Subunits

Arabidopsis thaliana has genes encoding three homologs of TFIIEα and two of

TFIIEβ (Table 2-1). TFIIEα and TFIIEβ of H. sapiens are acidic (pI of 4.5) and basic (pI

of 9.5) proteins, respectively (Peterson et al., 1991). The acidic properties of TFIIEα

appear to be well conserved in Arabidopsis with pI values of 4.75, 4.95, and 4.72 for

Eα1, Eα2, and Eα3, respectively. Likewise, the basic pI values are conserved in

Arabidopsis TFIIEβ proteins (10.23 and 10.04 respectively for Eβ1 and Eβ2).

Four of these Arabidopsis TFIIE genes are clustered on chromosome 4. TFIIEα2 and TFIIEβ2 neighbor each other in a head-to-head inverted fashion, sharing a common promoter region. TFIIEα3 and TFIIEβ1 are in relatively close proximity, both in the

same orientation (18 genetic loci inserted between the genes, 83 kbp apart). The extreme

proximity (only 972 bp between start codons) of TFIIEα2 and TFIIEβ2 suggests that they are direct descendants of the ancestral genes in plants and have been duplicated to create

the other loci. This hypothesis is supported by phylogenetic data in the case of TFIIEβ2 in which the Arabidopsis protein (along with the gene product of TFIIEβ1) is clustered


separately from all other TFIIEβ proteins. However, the TFIIEα2 protein clusters with

the other Arabidopsis TFIIEα sequences, within the dicot grouping.

TFIIEα1 interacts strongly with TFIIEβ2, although surprisingly, TFIIEα2

fragments did not interact with TFIIEβ proteins. This may be caused by improper

folding of TFIIEα2 domains when expressed in yeast in a fragmented form. TFIIEα3

was not examined in this study because it was not amplified from the cDNA population.

TFIIEα3 is not represented by EST sequences and may therefore be a non-expressed pseudogene, or an under-represented mRNA/cDNA. Interestingly, TFIIEβ1 did not interact with any other proteins in this study, a result that may be caused by a two-amino-acid truncation of the C-terminus. However, this protein was readily expressed in the prey form as evidenced by Figure 4-2, suggesting that protein stability and folding were

not factors in the lack of interaction.

TFIIFα and TFIIFβ Subunits

TFIIF is a heterotetrameric complex of two TFIIFα and two TFIIFβ molecules

(Orphanides et al., 1996). TFIIFα is highly conserved throughout the length of the

primary structure through the animal, plant, and fungal kingdoms. The poplar genome

clearly encodes two TFIIFα genes; however, the genomic sequence of one of these could

not be completed due to lack of sequence in what appears to be a low-complexity intronic

region. In yeast two-hybrid assays, Arabidopsis TFIIFα only interacted with TFIIFβ2.

The yeast two-hybrid data presented here would suggest an αββα composition for plant

TFIIF, since the TFIIFβ2 protein was shown to dimerize while TFIIFα apparently did

not. This is an unusual conformation for this complex, which is considered to be an α2β2 heterotetramer (Flores et al., 1990). In yeast, a third factor, TAF14, interacts as part of TFIIF as a distinct complex (Henry et al., 1994). The data presented here demonstrate a connection of TAF14 and TAF14b with TFIIF; however, these interactions were between TAF14(b) and TFIIFβ2, not TFIIFα as in yeast. Since the TAF14(b)-TFIIF connection is different from that in yeast, support for TAF14 or TAF14b acting as

TFIIF subunits is weak.

Interestingly, TFIIFβ2 interacts with many other GTF subunits (25 of 47), while

TFIIFα did not interact with any subunits except TFIIFβ2. TFIIFα binds TAF1 in other systems, and is acetylated by the TAF1 FAT activity (Ruppert and Tjian, 1995).

Notably, TFIIFβ2 interacts with both TAF1 #8 (middle region) and TAF1 UBD (an internally coded ubiquitin moiety plus over one hundred amino acids on either side, which is located in the middle region construct). The HAT/FAT domain of TAF1 is located in the TAF1 #8 construct, and is partially represented in the TAF1 UBD construct. This suggests that in Arabidopsis either TFIIFβ2 is acetylated in the place of

TFIIFα, or that TFIIFβ2 is a stabilizer of the TAF1-TFIIFα interaction.

A major anchoring point of PolII to the GTFs is the binding of TFIIB through the tightly associated TFIIF complex. Consistent with this, TFIIFβ2 interacted with four of six TFIIB homologs. Of the TFIIBs, only TFIIB2 and TFIIB4 did not interact with

TFIIFβ2. These TFIIB homologs had relatively few interactions, despite detectable expression as prey proteins. Both TFIIB2 and TFIIB4 contain putative zinc-binding domains that are implicated in interactions with TFIIFβ homologs (Buratowski and Zhou,

1993). This lack of interactions between TFIIB2/TFIIB4 and TFIIFβ2 may indicate false negatives in the yeast two-hybrid assay. False negatives are a problem in any protein interaction study, and often lead to a lack of definitive conclusions with respect to these

failed interactions. However, a lack of interactions with TFIIFβ2 and nearly all other

GTFs tested here calls into question the status of TFIIB2 and TFIIB4 as functional

TFIIB-homologs.

Conclusion

Throughout this work, 11 plant GTF protein families have been phylogenetically analyzed, 39 Arabidopsis cDNAs for TFIIA, TFIIB, TFIID, TFIIE, and TFIIF protein homologs have been cloned, and 1598 potential protein-protein interactions have been tested by yeast two-hybrid analyses. Of the 1108 tested non-redundant interactions, 176

(15.9%) were positive. Of these, 112 (63.6%) were novel for any system (Figure

5-1), and 64 (36.4%) of the interactions from Arabidopsis were confirmations of interactions from human, Drosophila, or yeast (Figure 5-2). Figure 5-3 shows 52 interactions that have been described in human, Drosophila, and/or yeast that were not found for the Arabidopsis homologs. While it is somewhat surprising to be missing this many interactions in a comprehensive study of this type, there are several explanations.

Poor expression or misfolding of proteins in yeast may account for some of these missing binary interactions. A few may be missing because of the inability to test the interactions

(non-testable bait constructs). However, it is also likely that many of these “lacking” interactions are simply not found in the Arabidopsis PIC. Many changes have occurred in these protein homologs since the last common ancestor of plants, animals and fungi.

This degree of shuffling of protein-protein interactions in the PIC, since the last common ancestor of these eukaryotes, is not unreasonable.
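The proportions quoted in this section follow directly from the interaction counts. As a quick arithmetic check, the short Python sketch below recomputes them; the counts are taken from the text, while the variable names and the helper function are illustrative only.

```python
# Counts as stated in the Conclusion.
tested_nonredundant = 1108  # non-redundant interactions tested
positive = 176              # positive interactions
novel = 112                 # positives not reported in any other system
confirmed = 64              # positives confirming human, Drosophila, or yeast data


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100.0 * part / whole, 1)


# The novel and confirmed classes should partition the positives.
assert novel + confirmed == positive

print("positive / tested:    ", pct(positive, tested_nonredundant))
print("novel / positive:     ", pct(novel, positive))
print("confirmed / positive: ", pct(confirmed, positive))
```

The three printed values agree with the 15.9%, 63.6%, and 36.4% given above.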

[Figure 5-1: interaction network diagram whose nodes are the Arabidopsis TFIIA, TFIIB, TBP, TAF, TFIIE, and TFIIF proteins; the connecting lines are not recoverable from the extracted text. In-figure annotations: "Possible pseudogene" and "Low conservation in histone-fold domain".]

Figure 5-1. Protein-protein interactions among TFIIA, TFIIB, TFIIE, TFIID, and TFIIF that are unique to Arabidopsis thaliana as determined by yeast two-hybrid and β-galactosidase confirmations.

[Figure 5-2: interaction network diagram whose nodes are the Arabidopsis TFIIA, TFIIB, TBP, TAF, TFIIE, and TFIIF proteins; the connecting lines are not recoverable from the extracted text. In-figure annotations: "Possible pseudogene" and "Low conservation in histone-fold domain".]

Figure 5-2. Protein-protein interactions of Arabidopsis thaliana TFIIA, TFIIB, TFIID, TFIIE, and TFIIF that have been reported previously for Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae homologs. Striped proteins represent those that could not be tested as baits.

[Figure 5-3: interaction network diagram. Node labels: TFIIA-α&β/L, TFIIA-γ/S, TFIIB, TBP, TAF1 to TAF15, TFIIEα, TFIIEβ, TFIIFα, and TFIIFβ; the connecting lines are not recoverable from the extracted text.]

Figure 5-3. Interactions of Homo sapiens, Drosophila melanogaster, and/or Saccharomyces cerevisiae TFIIA, TFIIB, TFIID, TFIIE or TFIIF that were not confirmed for Arabidopsis thaliana homologs

APPENDIX A
NUCLEOTIDE AND AMINO ACID SEQUENCES OF GENERAL TRANSCRIPTION FACTORS

TFIIA Small Subunit Sequences

Arabidopsis thaliana TFIIA-S At4g24440 MATFELYRRSTIGMCLTETLDEMVQSGTLSPELAIQVLVQFDKSMTEALESQVKTKVSI KGHLHTYRFCDNVWTFILQDAMFKSDDRQENVSRVKIVACDSKLLTQ

Drosophila melanogaster TFIIA-S NP_524467 MSYQLYRNTTLGNTLQESLDELIQYGQITPGLAFKVLLQFDKSINNALNQRVKARVTFK AGKLNTYRFCDNVWTLMLNDVEFREVHEIVKVDKVKIVACDGKSGEF

Glycine max TFIIA-S TC148651 MATFELYRRSTIGMCLTETLDEMVQNGTLSPELAIQVLVQFDKSMTEALETQVKSKVSI KGHLHTYRFCDNVWTFILQDALFKNEDSQENVGRVKIVACDSKLLTQ

Homo sapiens TFIIA-γ NP_004483 MAYQLYRNTTLGNSLQESLDELIQSQQITPQLALQVLLQFDKAINAALAQRVRNRVNFR GSLNTYRFCDNVWTFVLNDVEFREVTELIKVDKVKIVACDGKNTGSNTTE

Hordeum vulgare TFIIA-S TC66396 MATFELYRRSTIGMCLTETLDEMVSSGTLSPELAIQVLVQFDKSMTEALENQVKSKVTV KGHLHTYRFCDNVWTFILTDAQFKNEEITEQVSKVKIVACDSKLLSQ

Lycopersicon esculentum TFIIA-S TC119445 MATFELYRRSTIGMCLTETLDEMVSNGILSPEHAIQVLVQFDKSMTEALETQVKSKVTI KGHLHTYRFCDNVWTFILQDAVFKSEECQETVNRVKIVACDSKLLTQ

Medicago truncatula TFIIA-S TC79554 MATFELYRRSTIGMCLTETLDEMVQNGTLSPEIAIQVLVQFDKSMTEALETQVKSKVSI KGHLHTYRFCDNVWTFILQDALFKNEDNQENVGRVKIVACDSKLLSQ

Mesembryanthemum crystallinum TFIIA-S TC5775 MATFELYRRSTIGMCLTETLDEMVQSGTLSPELAIQVLVQFDKSMTEALEAQVKTKVTI KGHLHTYRFCDNVWTFMLQDALFKSEECQENVSRVKIVACDSKLLTQ

Oryza sativa TFIIA-S AAK73129 MATFELYRRSTIGMCLTETLDEMVSSGTLSPELAIQVLVQFDKSMTEALENQVKSKVSI KGHLHTYRFCDNVWTFILTEASFKNEETTEQVGKVKIVACDSKLLSQ


Oryza sativa TFIIA-S2 MATFELYRRSTIGMCLTDTLDDMVSSGALSPELAIQVLVQFDKSMTSALEHQVKSKVTV KGHLHTYRFCDNVWTFILTDAIFKNEEITETINKVKIVACDSKLLETKEE

Pinus TFIIA-S TC16392 MATFELYRKSTIGTCLTETLDELVLNGTLSPEHAIQVLVQFDKSMAEALETQVKSKVTI KGHLHTYRFCDNVWTFLLQDAQFKGEDIHEQAGRVKIVACDSKILTQ

Populus trichocarpa TFIIA-S1 MATFELYRRSTIGMCLTETLDDMVQNGTLSPELAFQVLVQFDKSMTEALETKVKSKVTI KGHLHTYRFCDNVWTFILQDANFKNEDSQENVGRVKIVACDSKLLTQ

Populus trichocarpa TFIIA-S2 MSTNGNNPAPYFELYRRSSVGLALTDALDELIQSGHINPQLALTVLKQFDKSASQVLST QLRSKCLIKGHLSTYRLCDEVWTFLLRDSIYKLEGGEQVGPVKRVKIVACKGNAGASAP PA

Saccharomyces cerevisiae TFIIA-S TOA2p NP_012865 MAVPGYYELYRRSTIGNSLVDALDTLISDGRIEASLAMRVLETFDKVVAETLKDNTQSK LTVKGNLDTYGFCDDVWTFIVKNCQVTVEDSHRDASQNGSGDSQSVISVDKLRIVACNS KKSE

Solanum tuberosum TFIIA-S TC60470 MATFELYRRSTIGMCLTETLDEMVSNGILSPEHAIQVLVQFDKSMTEALETQVKSKVTI KGHLHTYRFCDNVWTFILQDAVFKSEECQETVNRVKIVACDSKLLTQ

Triticum aestivum TFIIA-S TC71252 MATFELYRRSTIGMCLTETLDEMVSSGTLSPELAIQVLVQFDKSMTEALENQVKSKVTV KGHLHTYRFCDNVWTFILTDAQFKNEETTEQVGKVKIVACDSKLLSQ

Triticum aestivum TFIIA-S TC71251 MATFELYRRSTIGMCLTETLDEMVSSGTLSPELAIQVLVQFDKSMTDALETQVKSKVTV KGHLHTYRFCDNVWTFILTDAQFKNEETTEQVGKVKIVACDSKLLSQ

Triticum aestivum TFIIA-S CA484144 MATFELYRRSTIGMCLTETLDEMVSSGTLSPELAIQVLVQFDKSMTDALENQVKSKVNI KGHLHTYRFCDNVWTFILTDASFKNEETTEQVGKVKIVACDSKLLGQ

Vitis vinifera TFIIA-S TC15540 MATFELYRRSTIGMCLTETLDEMVQNGTLSPELAIQVLVQFDKSMTEALESQVKSKVTI KGHLHTYRFCDNVWTFILQDALFKNEESQENVGRVKIVACDSKLLTQ

Zea mays TFIIA-S TC170582 MATFELYRRSTIGMCLTETLDEMVSNGTLSPELAIQVLVQFDKSMTDALENQVKSKVTV KGHLHTYRFCDNVWTFILTDASFKNEEATEQVGKVKIVACDSKLLGQ


Zea mays TFIIA-S TC173972 MATFELYRRSTIGTCLTETLDELVSSGAVSPELAIQVLVQFDKSMTEALEMQVKSKVSV KGHLHTYRFCDNVWTFILTDATFKSEEIQETLGRVKIVACDSKLLQPQHP
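Each record in this appendix is laid out as an organism and protein name, an optional accession, and then the amino acid sequence in space-separated blocks. For readers who want to load these records into alignment software, the following is a minimal Python sketch that converts one such record to FASTA. The function names and the heuristic (trailing tokens made only of the 20 standard one-letter residue codes are treated as sequence) are assumptions for illustration; a header whose last word happens to consist only of residue letters, or a record interrupted by a page break, would need manual handling.

```python
import re

# A token made only of the 20 standard one-letter residue codes is treated
# as a block of sequence. This is a heuristic, not a guarantee.
RESIDUE_CHUNK = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")


def split_record(line: str):
    """Split a flattened appendix record into (header, sequence)."""
    tokens = line.split()
    cut = len(tokens)
    # Walk backwards from the end while tokens look like sequence blocks.
    while cut > 0 and RESIDUE_CHUNK.fullmatch(tokens[cut - 1]):
        cut -= 1
    header = " ".join(tokens[:cut])
    sequence = "".join(tokens[cut:])
    if not header or not sequence:
        raise ValueError("could not separate header from sequence")
    return header, sequence


def to_fasta(line: str, width: int = 60) -> str:
    """Format one record as FASTA with fixed-width sequence rows."""
    header, sequence = split_record(line)
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(rows)


# The Arabidopsis TFIIA-S record from this appendix.
record = (
    "Arabidopsis thaliana TFIIA-S At4g24440 "
    "MATFELYRRSTIGMCLTETLDEMVQSGTLSPELAIQVLVQFDKSMTEALESQVKTKVSI "
    "KGHLHTYRFCDNVWTFILQDAMFKSDDRQENVSRVKIVACDSKLLTQ"
)
print(to_fasta(record))
```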

TFIIA Large Subunit Sequences

Arabidopsis thaliana TFIIA-L1 MGTTTTTSAVYIHVIEDVVNKVREEFINNGGPGESVLSELQGIWETKMMQAGVLNGPIE RSSAQKPTPGGPLTHDLNVPYEGTEEYETPTAEMLFPPTPLQTPLPTPLPGTADNSSMY NIPTGSSDYPTPGTENGVNIDVKARPSPYMPPPSPWTNPRLDVNVAYVDGRDEPERGNS NQQFTQDLFVPSSGKRKRDDSSGHYQNGGSIPQQDGAGDAIPEANFECDAFRITSIGDR KVPRDFFSSSSKIPQVDGPMPDPYDEMLSTPNIYSYQGPSEEFNEARTPAPNEIQTSTP VAVQNDIIEDDEELLNEDDDDDELDDLESGEDMNTQHLVLAQFDKVTRTKSRWKCSLKD GIMHINDKDILFNKAAGEFDF

Arabidopsis thaliana TFIIA-L2 MGTTTTTSAVYIHVIEDVVNKVREEFINNGGPGESVLSELQGIWETKMMQAGVLNGPIE RSSAQKPTPGGPLTHDLNVPYEGTEEYETPTAEMLFPPTPLQTPLPTPLPGTADNSSMY NIPTGSSDYPTPGTENGVNIDVKARPSPYMPPPSPWTNPRLDVNVAYVDGRDEPERGNS NQQFTQDLFVPSSGKRKRDDSSAHYQNGGSIPQQDGASDAIPEANFECAALRITYVGDR KIPRDLIGSSSKIPQVDGPMPDPYDEMLSTPNIYSYQGPNEEFNEARTPAPNEIQTSTP VAVPNDIIEDDEELLNEDDDDDELDDLESGEDMNTQHLVLAQFDKVTRTKSRWKCSLKD GIMHINDKDILFNKATGEFDF

Arabidopsis thaliana TFIIA-L3 MVLSTSDTSSSYNYVIDDVINKSRCDLVYNGELDESVLSQIQSMWKTKMIQAGAMSGTI ETSSASIPTTPVIVQTTLQTPDAIPLPEKKMSPKKESDGFYYIPQQDGARDEAIVDVDE NEEPLNEDDDDEEDDIDDDDMNIQHLVMCQFDKVKRSKNKWECKFNAGVMQINGKNVLF SQATGDFNF

Drosophila melanogaster TFIIA-alpha-beta NP_476996 MALCQTSVLKVYHAVIEDVITNVRDAFLDEGVDEQVLQEMKQVWRNKLLASKAVELSPD SGDGSHPPPIVANNPKAANAKAKKAAAATAVTSHQHIGGNSSMSSLVGLKSSAGMAAGS GIRNGLVPIKQEVNSQNPPPLHPTSAASMMQKQQQAASSGQGSIPIVATLDPNRIMPVN ITLPSPAGSASSESRVLTIQVPASALQENQLTQILTAHLISSIMSLPTTLASSVLQQHV NAALSSANHQKTLAAAKQLDGALDSSDEDESEESDDNIDNDDDDDLDKDDDEDAEHEDA AEEEPLNSEDDVTDEDSAEMFDTDNVIVCQYDKITRSRNKWKFYLKDGIMNMRGKDYVF QKSNGDAEW

Glycine max TFIIA-L TC192713 MAASTTSQVYIQVIDDVMNKVRDEFVNNGGPGDEVLKELQSIWESKMMQAGAIVGPIER SGAPKPTPGGPITPVHDLNMPYEGTEEYETPTAEMLFPPTPLQTPLQTPLPGTVDNSMY NIPTGPSDYPSAGNEPGANNEIKGGRPGPYMQPPPSPWTNQNQNQNQRAPLDVNVAYVE GRDEAERGASNQPLTQDFFMSSGKRKRDEIASQYNAGGYIPQQDGAGDAASQILEIEVY GGGMSIDAGHSTSKGKMPAQSDRPASQIPQLDGPIPYDDDVLSTPNIYNYGVFNEDYNI ANTPAPSEVPASTPAPIAQNEVDEEDDDDEPPLNENDDDDLDDLDQGEDQNTHHLVLAQ FDKVTRTKSRWKCTLKDGIMHINNKDILFNKATGEFDF


Homo sapiens TFIIA-alpha/beta-like factor NM_006872 MACLNPVPKLYRSVIEDVIEGVRNLFAEEGIEEQVLKDLKQLWETKVLQSKATEDFFRN SIQSPLFTLQLPHSLHQTLQSSTASLVIPAGRTLPSFTTAELGTSNSSANFTFPGYPIH VPAGVTLQTVSGHLYKVNVPIMVTETSGRAGILQHPIQQVFQQLGQPSVIQTSVPQLNP WSLQATTEKSQRIETVLQQPAILPSGPVDRKHLENATSDILVSPGNEHKIVPEALLCHQ ESSHYISLPGVVFSPQVSQTNSNVESVLSGSASMAQNLHDESLSTSPHGALHQHVTDIQ LHILKNRMYGCDSVKQPRNIEEPSNIPVSEKDSNSQVDLSIRVTDDDIGEIIQVDGSGD TSSNEEIGSTRDADENEFLGNIDGGDLKVPEEEADSISNEDSATNSSDNEDPQVNIVEE DPLNSGDDVSEQDVPDLFDTDNVIVCQYDKIHRSKNKWKFYLKDGVMCFGGRDYVFAKA IGDAEW

Homo sapiens TFIIA-alpha-beta NP_056943 MANSANTNTVPKLYRSVIEDVINDVRDIFLDDGVDEQVLMELKTLWENKLMQSRAVDGF HSEEQQLLLQVQQQHQPQQQQHHHHHHHQQAQPQQTVPQQAQTQQVLIPASQQATAPQV IVPDSKLIQHMNASNMSAAATAATLALPAGVTPVQQILTNSGQLLQVVRAANGAQYIFQ PQQSVVLQQQVIPQMQPGGVQAPVIQQVLAPLPGGISPQTGVIIQPQQILFTGNKTQVI PTTVAAPTPAQAQITATGQQQPQAQPAQTQAPLVLQVDGTGDTSSEEDEDEEEDYDDDE EEDKEKDGAEDGQVEEEPLNSEDDVSDEEGQELFDTENVVVCQYDKIHRSKNKWKFHLK DGIMNLNGRDYIFSKAIGDAEW

Hordeum vulgare TFIIA-L Barley1_09796 MASGNVSTVYISVIDDVVAKVREDFITYGVGDAVLNELQALWEMKMLHCGAISGNIDRN RAPPASAGGAPGAGATPPVHDLNVPYEATSEEYATPTADMLFPPTPLQTPIQTPLPGID TGMYNIPTGPSDYAPSPISDMRNGMGMNGSDPKTGRPSPYMQPPSPWMNQRPLGVDVNV AYEESREDPDRLMQPQPLTKDFLMMSSGKRKRDEYPGQLPSGSFVPQQDGCADQVAEFV GSKDNAQQVWNSILNKQESVTKTLSIKESTIPPVLPQRDGIQDDYNDQFFFPGVPTEDY NTPGESSEYRTPTPAIATPKPRNDMAGGDDDDDDDDEPPLNEDDDDDDEIDDLQDGDEE PNTQHLVLAQFDKVTRTKNRWKCTLKDGIMHLNGRDVLFNKASGEFDF

Oryza sativa TFIIA-L MASSNVSTVYISVIDDVISKVRDDFISYGVGDAVLNELQALWEMKMLHCGAISGTIDRS KAAPAPSAGTPGAGTTPPVHDLNVPYEATSEEYATPTADMLFPPTPLQTPIQTPLPGTD AGMYNIPTGPSDYAPSPISDVRNGMAMNGADPKTGRPSPYMPPPSPWMTQRPLGVDVNV AYVENREDPDRTGQPPQLTKDFLMMSSGKRKRDEYPGQLPSGSFVPQQDGSADQIVEFV VSKDNAQQLWSSIVNKQGTATKESSTKETIIAPTIPQRDGMDDYNDPFYFQGVPTEDYN TPGESSEYRAPTPAVGTPKPRNDVGDDDEPPLNEDDDDDDELDDLEQGEDEPNTQHLVL AQFDKVTRTKNRWKCTLKDGIMHLNGRDVLFNKVVNMIF

Populus trichocarpa TFIIA-L1 MASSATSTVYTEVIEDVIDKVRDEFINNGGPGETVLSELQGLWEKKLMQAGVLSGPIVR SSANKQLVPGGLTPVHDLNVPYEGTEEYETPTAEILFPPTPLQTPMQTPLPGSAQTPLP GNVQTPLPGNVPTPLPGSVDNSSMYNISTGSSSDYPTPVSDAGGSTDVKAGRPSHFMQS PSPLMHQRPPLDVNVGKSYFYAPRRVHGQKDFFMSSGKRKRGDFAPKYNNGGFIPQQDG AVDSASEVSQVSQGNNPHGRCDTITTKNREILARVSRSYVKIPQVDGPIPDPYDDMLST PNIYNYQGVANEDYNIASTPAPNDLQASTPAVVSQNDDVDDDDDEPLNEDDDDDEDLDG VDQGEELNTQHLILAQFDKVTRTKSRWKCTLKDGVMHINNRDILFNKATGEFEF


Saccharomyces cerevisiae TOA1p NP_014837 MSNAEASRVYEIIVESVVNEVREDFENAGIDEQTLQDLKNIWQKKLTETKVTTFSWDNQ FNEGNINGVQNDLNFNLATPGVNSSEFNIKEENTGNEGLILPNINSNNNIPHSGETNIN TNTVEATNNSGATLNTNTSGNTNADVTSQPKIEVKPEIELTINNANITTVENIDDESEK KDDEEKEEDVEKTRKEKEQIEQVKLQAKKEKRSALLDTDEVGSELDDSDDDYLISEGEE DGPDENLMLCLYDKVTRTKARWKCSLKDGVVTINRNDYTFQKAQVEAEWV

Solanum tuberosum TFIIA-L STtuc02-10-23.4519 MASSTTSNVYIHVIEDVISKVRDEFISNGGPGESILKELQALWEVKMMNAGAILGTIER NSAAKATPGGPITPVHDLNMPYEGNEEYETPTADILFPPTPLQTPLPGTAQTPLPGTVQ TPLPGTAQTPLPGTADSSMYNIPTGGTPFTPSDYSPLNDTGGATELKAGPGRPSPFMHP PSPWLNQRPPLDVNGAYVEGREEVGDRGGSQQPMTQDFFMNSAGKRKREDFPPQYHNGG YIPQQDGAADSIYDNLKSGEGSNIQLELVTVGPVQASAYRIPQFDGPIPDSYDDALSTP NIYYQGVVNEDYNIVNTPAPNDMQAPTPAPALQNDDIDDDDEPLNEDDDDDLDDVDQGE DLNTAHLVLAQFDKVTRTKSRWKCTLKDGIMHINNKDILFNKANGEFDF

Zea mays TFIIA-L TC183075 MASSNVSTVYISVIDDVISKVREDFITYGVGDAVLNELQALWEMKMLHCGAISGNIDRT KAAAASVGGTTGTTAPVHDLNVPYEATSEEYATPTADMLFPPTPLQTPIQTPLPGTDTA MYNIPTGPSDYAPSPISDIRNGMTINGADPKAGHPSPYMPPPSPWMNQRPLGVDVNVAY VEGREDPDRGVQPQPLTQDFLTMSSGKRKRDEYPGQLPSGSFVPQQDGSADQIVEFVVS KENANQHWSSIINKLETPTKTVTPVIPQCDGIQDDYNDQFFFPGVPTEDYNTPGESAEY RAPTPAVGTPKPRNDAGDDNDDDDDDEEPPLNEDDDDDDDLDDLEEGEDEPNTQHLVLA QFDKVTRTKNRWKCTLKDGIMHLNGRDVLFNKATGEFDF

TFIIB Family Sequences

Arabidopsis thaliana BRF1 At2g45100 MVWCKHCGKNVPGIRPYDAALSCDLCGRILENFNFSTEVTFVKNAAGQSQASGNILKSV QSGMSSSRERIIRKATDELMNLRDALGIGDDRDDVIVMASNFFRIALDHNFTKGRSKEL VFSSCLYLTCRQFKLAVLLIDFSSYLRVSVYDLGSVYLQLCDMLYITENHNYEKLVDPS IFIPRFSNMLLKGAHNNKLVLTATHIIASMKRDWMQTGRKPSGICGAALYTAALSHGIK CSKTDIVNIVHICEATLTKRLIEFGDTEAASLTADELSKTEREKETAALRSKRKPNFYK EGVVLCMHQDCKPVDYGLCESCYDEFMTVSGGLEGGSDPPAFQRAEKERMEEKASSEEN DKQVNLDGHSDESSTLSDVDDRESDRFTVSQLDCYFRTPEEVRLVKIFFDHENPGYDEK EAAKKAAGLNACNNASNIFEASKAAAAKSRKEKRQQRAEEEKNAPPPATGIEAVDSMVK RKKFRDINCDYLEELFDASVEKSPKRSKTETVMEKKKKEEHEIVENEQEEEDYAAPYEQ DEEDYAAPYEMNTDKKFYESEVEEEEDGYDFGLY

Arabidopsis thaliana BRF2 At3g09360 MVWCNHCVKNVPGIRPYDGALACNLCGRILENFHFSTEVTFVKNAAGQSQASGNIVRSV QSGITSSRERRFRIARDELMNLKDALGIGDERDDVIVIAAKFFEMAVEQNFTKGRRTEL VQASCLYLTCRELNIALLLIDFSSYLRVSVYELGSVYLQLCEMLYLVENRNYEKLVDPS IFMDRFSNSLLKGKNNKDVVATARDIIASMKRDWIQTGRKPSGICGAALYTAALSHGIK CSKTDIVNIVHICEATLTKRLIEFGDTDSGNLNVNELRERESHKRSFTMKPTSNKEAVL CMHQDSKPFGYGLCEDCYKDFINVSGGLVGGSNPPAFQRAEKERMEKAAREENEGGISS


LNHDEQLYHLRIYLGCVAEKGEKDKDGAEEHADTSDESDNFSDISDDEVNGYINNEEET HYKTITWTEMNKDYLEEQAAKEAALKAASEALKASNSNCPEDARKAFEAAKADAAKSRK EKQQKKAEEAKNAAPPATAVEAVRRTLDKKRLSSVINYDVLESLFDTSAPEKSPKRSKT ETDIEKKKEENKEMKSNEHENGENEDEDEEDEEEGNVESYDMKTDFQNGEKFYEEDEEE EEDGNDFGLY

Arabidopsis thaliana BRF3 At2g01280 MDQNFTKGRRAELVQSSCLYLACRDMKISLLFIDFSSYLRVSVYELGSVYLQLCEMLYL VQNKNYEELVDPSIFIPRFTNSLLKGAHAKAKDVANTAKNIISSMKRDWIQTGRKPSGI CGAAIYMAALSHGIMYSRADIAKVVHMCEATITKRLNEFANTEAGSLTLLVGRILLLIS EQRKREWKKQLEKKTREELAANCPEDARNLVEASKAAVANSRKEKRRKRAEEAKNAPPS ATATEAVCRTLERKIKIYSIF

Arabidopsis thaliana BRF4 At4g35540 MRCKRCNGSNFERDEDTGNSYCGGCGTLREYDNYEAQLGGIRGPQGTYIRVGTIGRGSV LDYKDKKIYEANNLIEETTERLNLGNKTEVIKSMISKLTDGEFGQGEWFPILIGACCYA VVREEGKGVLSMEEVAYEVGCDLHQLGPMIKRVVDHLDLELREFDLVGLFTKTVTNSPR LTDVDRDKKEKIIKQGTFLMNCALKWFLSTGRRPMPLVVAVLAFVVQVNGVKVKIDDLA KDASVSLTTCKTRYKELSEKLVKVAEEVGLPWAKDVTVKNVLKHSGTLFALMEAKSMKK RKQGTGKELVRTDGFCVEDLVMDCLSKESMYCYDDDARQDTMSRYFDVEGERQLSLCNY DDNISENQLSTKYNEFEDRVCGGTLAKRSQGSSQSMWQRRSVFGMVSTENWWKGKSELS KRLLLKDLLEKDVGLEALPPSYIKGCVAVERRREKIKAAKLRINAIQHPSDNVSEGALS LELEHSKKKRKKGSEIDWEDLVIQTLVLHNVNEEEIEKGHYKTLLDLHVFNSGEV

Arabidopsis thaliana TFIIB1 At2g41630 MSDAYCTDCKKETELVVDHSAGDTLCSECGLVLESHSIDETSEWRTFANESSNSDPNRV GGPTNPLLADSALTTVIAKPNGSSGDFLSSSLGRWQNRNSNSDRGLIQAFKTIATMSER LGLVATIKDRANELYKRLEDQKSSRGRNQDALYAACLYIACRQEDKPRTIKEICVIANG ATKKEIGRAKDYIVKTLGLEPGQSVDLGTIHAGDFMRRFCSNLAMSNHAVKAAQEAVQK SEEFDIRRSPISIAAVVIYIITQLSDDKKTLKDISHATGVAEGTIRNSYKDLYPHLSKI APSWYAKEEDLKNLSSP

Arabidopsis thaliana TFIIB2 At3g10330 MSDAFCSDCKRHTEVVFDHSAGDTVCSECGLVLESHSIDETSEWRTFANESGDNDPVRV GGPTNPLLADGGLTTVISKPNGSSGDFLSSSLGRWQNRGSNPDRGLIVAFKTIATMADR LGLVATIKDRANEIYKRVEDQKSSRGRNQDALLAACLYIACRQEDKPRTVKEICSVANG ATKKEIGRAKEYIVKQLGLETGQLVEMGTIHAGDFMRRFCSNLGMTNQTVKAAQESVQK SEEFDIRRSPISIAAAVIYIITQLSDEKKPLRDISVATGVAEGTIRNSYKDLYPHLSKI IPAWYAKEEDLKNLQSP

Arabidopsis thaliana TFIIB3 At3g29380 MEEETCLDCKRPTIMVVDHSSGDTICSECGLVLEAHIIEYSQEWRTFASDDNHSDRDPN RVGAATNPFLKSGDLVTIIEKPKETASSVLSKDDISTLFRAHNQVKNHEEDLIKQAFEE IQRMTDALDLDIVINSRACEIVSKYDGHANTKLRRGKKLNAICAASVSTACRELQLSRT LKEIAEVANGVDKKDIRKESLVIKRVLESHQTSVSASQAIINTGELVRRFCSKLDISQR EIMAIPEAVEKAENFDIRRNPKSVLAAIIFMISHISQTNRKPIREIGIVAEVVENTIKN SVKDMYPYALKIIPNWYACESDIIKRLDGVITSWDSAKFSV


Arabidopsis thaliana TFIIB4 At3g57370 MTMKWGHSCRRCKQINVVTDHVTRRTRCFGCGLEFKYRPIGDLSPVAENDTVRLPDPTN TLLSNTDLSIVTTEHKNGSFDDSLSLNLGNSSKPRLDPVSIATAKLMNGSSNDFLSLGT SQNSETITASSDEFLFSDLGHLQKFSFDPLSMASTKPNKALSIVSIEAISNGLKLPATI KGQANEIFKVVESYARGKERNVLFAACIYIACRDNDMTRTMREISRFANKASISDISET VGFIAEKLEINKNWYMSIETANFIKRFCSIFRLDKEAVEAALEAAESYDYMTNGRRAPV SVAAGIVYVIARLSYEKHLLKGLIEATGVAENTIKGTYGDLYPNLPTIVPTWFANANDL KNLGAP

Arabidopsis thaliana TFIIB5 At4g36650 MKCPYCSSAQGRCTTTSSGRSITECSSCGRVMEERQTQNHHLFHLRAQDTPLCLVTSDL QTAAQPSPEDEEDPFEPTGFITAFSTWSLEPSPIFARSSLSFSGHLAELERTLELASST SNSNSSTVVVDNLRAYMQIIDVASILGLDCDISEHAFQLFRDCCSATCLRNRSVEALAT ACLVQAIREAQEPRTLQEISIAANVQQKEIGKYIKILGEALQLSQPINSNSISVHMPRF CTLLQLNKSAQELATHIGEVVINKCFCTRRNPISISAAAIYLACQLEDKRKTQAEICKI TGLTEVTLRKVYKELLENWDDLLPSNYTPAVPPEKAFPTTTISTTRSTTPRAVDPPEPS FVEKDKPSAKPIETFDHTYQQPKGKEDKQPKFRQPWLFGTASVMNPAEMISEPAKPNAM DYEKQQLDKQQQQQLGDKETLPIYLRDHNPFPSNPSPSTGISTINWSFRPSVVPGSSSN LPVIHPPKLPPGYAEIRGSGSRNADNPHGDF

Arabidopsis thaliana TFIIB6 At4g10680 MKEDGICLECKRPTETVVNYKNGDTICIECGHVIENNIIDDLDGASTNPNLKSGHLPTI IFKLSGKSSSLASKLRRTQNEMIKNKQEEDVIKIAYAEIERMTEALGLTFGISNTACKI LSKLDKKNLRGGKSLRGLCAASVSRACRQVNIPKTLKEISAVANVDMKEINKAVKLLGD SFG

Citrus sinensis TFIIB CB292941 MTDAFCSDCKKHTEVVFDHSAGDTVCSECGLVLESHSIDETSEWRTFANESGDNDPVRV GGPTNPLLADGGLSTVIAKPNGASGEFLSSSLGRWQNRGSNPDRGLILAFKTIATMSDR LGLVATIKDRANEIYKKVEDQKSSRGRNQDALLAACLYIACRQEDKPRTVKEICSVANG ATKKEIGRAKEYIVKQLGLETGQSVEMGTIHAGDFMRRFCSNLGMNNQAVKAAQEAVQK SEEFDIRRSPISIAAAVIYIITQLSDDKKPLKDISVATGVAEGTIRNSYKDLYPHVSKI IPNWYAKEEDLKNLCSP

Drosophila melanogaster BRF AAF72065 MSTGLKCRNCGSNEIEEDNARGDRVCMNCGSVLEDSLIVSEVQFEEVGHGAAAIGQFVS AESSGGATNYGYGKFQVGSGTESREVTIKKAKKDITLLCQQLQLSQHYADTALNFFKMA LGRHLTRGRKSTHIYAACVYMTCRTEGTSHLLIDISDVQQICSYELGRTYLKLSHALCI NIPSLDPCLYIMRFANRLQLGAKTHEVSMTALRIVQRMKKDCMHSGRRPTGLCGAALLI AARMHDFSRTMLDVIGVVKIHESTLRKRLSEFAETPSGGLTLEEFMTVDLEREQDPPSF KAARKKDRERIKDMGEHELTELQKEIDAHLEKDLGKYSNSVYRQLTKGKGLSPLSSPST PNSSSEKDIELEESRQFIEQSNAEVIKELIAKNEDVKKSEPGGLVAGIEGLRPDIEAIC RVTQSDLEDVEKAKQPQEQELITDDLNDDELDQYVLTEEESVAKLEMWKNLNAEYLQEQ KERDERLAKEREEGKPERKKRKPRKKVIGPSSTAGEAIEKMLQEKKISSKINYEILKTL TDGMGGLTDDSPTTSADTKPSTLEELKHQPVIVEEGPVPSKSRGNRAAYDLPGPSRKRP


KLEVGLPVSQAADVEQPETKPAVVVEADDLDEDADDPDVEPEAEPEATLQDMLNTGGDD DEFGYGFDEEEEY

Drosophila melanogaster TFIIB NM_057540 MASTSRLDNNKVCCYAHPESPLIEDYRAGDMICSECGLVVGDRVIDVGSEWRTFSNEKS GVDPSRVGGPENPLLSGGDLSTIIGPGTGSASFDAFGAPKYQNRRTMSSSDRSLISAFK EISSMADRINLPKTIVDRANNLFKQVHDGKNLKGRSNDAKASACLYIACRQEGVPRTFK EICAVSKISKKEIGRCFKLTLKALETSVDLITTADFMCRFCANLDLPNMVQRAATHIAK KAVEMDIVPGRSPISVAAAAIYMASQASEHKRSQKEIGDIAGVADVTIRQSYKLMYPHA AKLFPEDFKFTTPIDQLPQM

Glycine max TFIIB U31097 MSDAFCSDCKRQTEVVFDHSAGDTVCSECGLVLESHSIDETSEWRTFANESGDNDPNRV GGPSNPLLTDGGLSTVIAKPNGGGGGEFLSSSLGRWQNRGSNPDRALIQAFKTIATMSD RLGLVATIKDRANEIYKRVEDQKSSRGRNQDALLAACLYIACRQEDKPRTVKEICSVAN GATKKEIGRAKEYIVKQLGLENGNAVEMGTIHAGDFMRRFCSNLCMNNQAVKAAQEAVQ KSEEFDIRRSPISIAAAVIYIITQLSDDKKPLKDISLATGVAEGTIRNSYKDLYPHVSK IIPNWYAKEEDLKNLCSP

Homo sapiens BRF NP_001510.2 MTGRVCRGCGGTDIELDAARGDAVCTACGSVLEDNIIVSEVQFVESSGGGSSAVGQFVS LDGAGKTPTLGGGFHVNLGKESRAQTLQNGRRHIHHLGNQLQLNQHCLDTAFNFFKMAV SRHLTRGRKMAHVIAACLYLVCRTEGTPHMLLDLSDLLQVNVYVLGKTFLLLARELCIN APAIDPCLYIPRFAHLLEFGEKNHEVSMTALRLLQRMKRDWMHTGRRPSGLCGAALLVA ARMHDFRRTVKEVISVVKVCESTLRKRLTEFEDTPTSQLTIDEFMKIDLEEECDPPSYT AGQRKLRMKQLEQVLSKKLEEVEGEISSYQDAIEIELENSRPKAKGGLASLAKDGSTED TASSLCGEEDTEDEELEAAASHLNKDLYRELLGGAPGSSEAAGSPEWGGRPPALGSLLD PLPTAASLGISDSIRECISSQSSDPKDASGDGELDLSGIDDLEIDRYILNESEARVKAE LWMRENAEYLREQREKEARIAKEKELGIYKEHKPKKSCKRREPIQASTAREAIEKMLEQ KKISSKINYSVLRGLSSAGGGSPHREDAQPEHSASARKLSRRRTPASRSGADPVTSVGK RLRPLVSTQPAKKVATGEALLPSSPTLGAEPARPQAVLVESGPVSYHADEEADEEEPDE EDGEPCVSALQMMGSNDYGCDGDEDDGY

Homo sapiens TFIIB NM_001514 MASTSRLDALPRVTCPNHPDAILVEDYRAGDMICPECGLVVGDRVIDVGSEWRTFSNDK ATKDPSRVGDSQNPLLSDGDLSTMIGKGTGAASFDEFGNSKYQNRRTMSSSDRAMMNAF KEITTMADRINLPRNIVDRTNNLFKQVYEQKSLKGRANDAIASACLYIACRQEGVPRTF KEICAVSRISKKEIGRCFKLILKALETSVDLITTGDFMSRFCSNLCLPKQVQMAATHIA RKAVELDLVPGRSPISVAAAAIYMASQASAEKRTQKEIGDIAGVADVTIRQSYRLIYPR APDLFPTDFKFDTPVDKLPQL

Lycopersicon esculentum TFIIB AF273333 MDRGKIPDLAARSNRIYLDLEDIIKENALPFLPAKSAVKFQAVCRDWRLQISAPLFAHK QSLSCNSTSGIFSQLNRGSPFLIPIDANSCGVPDPFLNFLPEPVDIKSSSNGLLCCRGR EGDKVYYICNPFTKQWKELPKSNAYHGSDPAIVLLFEPSLLNFVAEYKIICAFPSTDFD KATEFDIYYSREGCWKIAEEMCFGSRTIFPKSGIHVNGVVYWMTSKNILAFDLTKGRTQ LLESYGTRGFLGTFSGKLCKVDVSGDIISLNVLANTHSNTMQIGSQIKMWSEKEIVVLD


SEIVGDGAARNHTVLHVDSDIMVVLCGRRTCSYDFKSRLTKFLSSKVGILDRCFPYVNS LVSL

Lycopersicon esculentum TFIIB TC124975 MDTYCSDCKRNTEVVFDHAAGDTVCSECGLVLESRSIDETSEWRTFADESGGDDPNRVG GPVNPLLGDAALSTVISKGPNGSNGDGSLARLQNRGGDPDRAIVLAFKAIATMADRLSL VSTIRDRASEIYKRLEDQKCTRGRNLDALVAACIYIACRQEGKPRTVKEICSIANGASK KEIGRAKEFIVKQLKVEMGESMEMGTIHAGDYLRRFCSNLGMNHEEIKVVQETVQKAEE FDIRRSPISIAAAIIYMITQLSDSKKPVLRDISVATTVAEGTIKNAYKDLYPHASKIIP EWYVKDKDLKSLCSPKA

Lycopersicon esculentum TFIIB AAG01118 MRCPYCSAEQGRCTSSTSGRPITECTSCGRVVEERLTQSHHLFHTRAQDSPLCLATSDL PTLPISATNDDEDPFEPTGFITTFSTWSLEPYPVFAQSSISFAGHLAELERVLEMTSTS SSSSSSSVVVENLRAYLQIIDVASILRLDSDISDHAFQLFRDCSSATCLRNRSVEALAT AALVHAIREAQEPRTLQEISVAANLPQKEIGKYIKILGEALQLSQPINSNSISVHMPRF CTLLQLNKSAQELATHIGEVIINKCFCTRRNPISISAAAIYLACQLEDKRKTQAEICKV TGLTEVTLRKVYKELLENWDDLLPSSYKPVVPPEKAFPSATIATGRSSTPRVDIVEGTS SERDKPVKPVDSLDISPQIRGKEDSDSKDNINTTQLSWPPPFWKPQAPAEGGVKSATDK SQNATEEMEIDL

Mesembryanthemum crystallinum TFIIB TC5895 MSDAFCSDCKKCTEVVFDHSAGDTVCSECGLVLESHSIDETSEWRTFANESNDNDPVRV GGPTNPLLSDGGLSTVISKPNGTTGDYLSSSLGRWQNRGANPDRGLILAFKTIATMADR LGLVATIKDRASEIYKKVEDQKSSRGRNQDAILAACLYIACRQEDKPRTVKEICSVANG ASKKEIGRAKEYIVKQLELEMGKSVTIGTIHAADFLRRFCSNLGMNNQAMKAAQEAVQK SEEIDIRRSPISIAAAVIYIITQLSEEKKPLRDISLATGVAEGTIRNAYKDLYPHISKI IPVWYATEDDLKTSAAHKVKQ

Methanosarcina acetivorans TFB NP_615574.1 MVEVERVRYSDTLEREKIRAMIKARKEKQKEQSFENEKAVCPECGSRNLVHDYERAELV CGDCGLVIDADFVDEGPEWRAFDHDQRMKRSRVGAPMTYTIHDKGLSTMIDWRNRDSYG KSISSKNRAQLYRLRKWQRRIRVSNATERNLAFALSELDRMASALGLPRTVRETAAVVY RKAVDKNLIRGRSIEGVAAAALYAACRQCSVPRTLDEIEEVSRVSRKEIGRTYRFISRE LALKLMPTSPIDYVPRFCSGLNLKGEVQSKSVEILRQASEKELTSGRGPTGVAAAAIYI ASILCGERRTQREVADVAGVTEVTIRNRYKELAEELDIEIIL

Medicago truncatula TC86832 MSDAFCSDCKRATEVVFDHSAGDAVCSECGLVLESHSIDETSEWRTFANESGDNDPVRV GGPSNPLLTDGGLSTVIAKPNGASGDFLSSSLGRWQNRGSNPDRGLILAFKTIGTMAER LGLVPTIKDRANEIYKRVEDQKSSRGRNQDALLAACLYIACRQEDKPRTVKEICSIANG ATKKEIGRAKEYIVKQLGLENGGQSVEMGTIHAGDFMRRFCSNLGMNHQAVKAAQESVQ KSEEFDIRRSPISIAAAVIYIITQLSDDKKPLKDISVATGVAEGTIRNSYKDLYPHVSK IIPNWYAKEEDLKNLCSP

Oryza sativa TFIIB AF464908 MSDSFCPDCKKHTEVAFDHSAGDTVCTECGLVLEAHSVDETSEWRTFANESSDNDPVRV


GGPTNPLLTDGGLSTVIAKPNGAQGEFLSSSLGRWQNRGSNPDRSLILAFRTIANMADR LGLVATIKDRANEIYKKVEDLKSIRGRNQDAILAACLYIACRQEDRPRTVKEICSVANG ATKKEIGRAKEFIVKQLEVEMGQSMEMGTIHAGDFLRRFCSTLGMNNQAVKAAQEAVQR SEELDIRRSPISIAAAVIYMITQLSDDKKPLKDISLATGVAEGTIRNSYKDLYPYASRL IPNTYAKEEDLKNLCTP

Populus trichocarpa TFIIB6 MVKTKTNPLVSRAKQETSNKYVYQTYPKVIDLLTHSNPFLLSFSLPPHPSSHGSKPKNK QEMGDAFCSDCKRHTEVVFDHSAGDTVCSECGLVLESHSIDETSEWRTFANESGDNDPV RVGGPTNPLLTDGGLSTVIAKPNGASGEFLSSSLGRWQNRGSNPDRGLITAFKTIATMS DRWVREYKLLDVEFGGF

Populus trichocarpa TFIIB3 MNRTITNIKSASPSLSLTRKGLDAFLVWLFMRISPNVSFSFSGDAFFIILKDRANEIYK KVEDQKSSRGRNQDALLAACLYIACRQEDKPRTVKEICSVANGATKKEIGRAKEYIVKQ LGLEAGQSVEMGTIHAGDFMRRFCSNLGMSNHTVKAATEAVKTSEQFDIRRSPISIAAA VIYIITQLSDDKKPLRDISLATGVAEGTIRNSYKDLYPHVSKIIPAWYANEEDLKNLSS P

Populus trichocarpa TFIIB5 MEDSYCPDCKRLTEIVFDHSAGDTICSECGLILEAHSVDETSEWRTFSNESSDHDPNRV GGPLNPLLADGGLSTTISKTNGGSNELLSCSLGKWQSRGANPDRNRIQAFKSIAAMADR FFLFYFLWEKNDVCQLWIRLLAIVCRLGKCMLNSCWLWNFIRQHHGKIK

Populus trichocarpa TFIIB2 MARNGEIDDYRDYCKDCKANTYIVLDHCTGDTICSDCGLVLESCYIDEIAEWRTFNDDN NDKDPNRVGYNVNPLLSQGNLKTLISNNKGDHAIPRWQDGVSNSDRVLLQGFDIIEIIA NRLGLVRPIKDRAKEIFKKIEEQKTCVMRKRDSICAACLFISSRENKLPRTLNEISSVV YGVTKKEINKAVQSIKRHVELEDMGTLNPSELVRRFCSNLGMKNHAVKAVHEAVEKIQD VDIRRNPKSVLAAIIYTITQLSDEKKPLRDISLAADVAEGTIKKSFKDISPHVSRLVPK WYAREEDIRRIRIPRNCGAKQLN

Populus trichocarpa TFIIB1 MVKTKTNPLVSRAKQETSNKYVYQTYPKVIDLLTHSNPFLLSFSLPPHPSSHGSKPKNK QEMGDAFCSDCKRHTEVVFDHSAGDTVCSECGLVLESHSIDETSEWRTFANESGDNDPV RVGGPTNPLLTDGGLSTVIAKPNGASGEFLSSSLGRWQNRGSNPDRGLITAFKTIATMS DRLGLVATIKDRANEIYKKVEDQKSSRGRNQDALLAACLYIACRQEDKPRTVKEICSVA NGATKKEIGRAKEYIVKQLGLETGQSVEMGTIHAGDFMRRFCSNLGMSNHTVKAATEAV KTSEQFDIRRSPISIAAAVIYIITQLSDDKKPLRDISLATGVAEGTIRNSYKDLYPHVS KIIPSWYASEEDLKNLCSP

Populus trichocarpa TFIIB4 MGDAFCSDCKKHTEVVCDHSAGDTVCSECGLVLESHSIDETSEWRIFANESGDNDPVRV GGPTNPLLTDGGLSTVIAKPNGASGDFLSTSLGRWQNRGSNPDRGLILAFKTIATMSDR LGLVATIKVFILDVCTLLLPLMVSTVSLKHPLMNMALNLNYHVNKVCFLGISRVLPGNE PYILFHPDLSFSSILKMYRHILMKV

Populus trichocarpa TFIIB9 MSPALASSGASIELALYAGSKWFGSNRVFFPPPPPSLTALGCIGQAKKKWVSLSSPTWR QCVIKQSCELSKIRFLSPPFQETYTHPYQLRKKKKKTHQEKQNRKKYSANVLDHLTGDT ICIDCGLVLISYYVDEEPEWRTFGIEDNINEYDPNHLGSLSDPLLTHANLATTISKPAK GGTTAVAISKNWLINRQSNPDGDLIQGFEIIETMARRRNREAMPAACLFISCKENKLPR TLKETCSAASCNGGGGGGLTMKEACTIGGYDRRDHES

Populus trichocarpa TFIIB7/pBrp MKCPYCSATQGRCATTTTTNRCITECTSCGRVVEERQFHPHHLFHLRAQDTPLCLVTSD LPTLHHHHQNEEDPFEPTGFITSFSTWSLEPNPVSLRSSLSFSGHLAELERTIELSAST PASSSNVVVDNLRAYMQIIDVASILGLDCDISDHAFQLFRDCCSATCLRNRSVEALATA ALVQAIREAQEPRTLQVGTLVNGEEISIAANVPQKEIGKYIKILGEALQLSQPINSNSI SVHMPRFCTLLQLNKSAQELATHIGEVVINKCFCTRRNPISISAAAIYLACQLEDKRKT QAEICKVTGLTEVTLRKVYKELLENWGDLLPKNYTPAVPPEKAFPTTTITSGRSSAPKI DPVELVSSSSEKDKQLESKSNKPSELARGKEDAENNGNSRGIQPPPWQNFRQPWLQFVT SGVRMVGDTNQNLARVDINESQPRRQEFEEKADKQKMDKDPTASAWPNQLSSSPASGAS TISWPFRSPTLSGPSPIVQPPPKLTPGYAELKGIGSQNGGKTGNNSGDNK

Populus trichocarpa TFIIB8 MENNLMFVWRSPISIAAAVIYIIIQLSDDKKPLKDISVVTQVAEGTIKNSYKDLSPHLS QIIPSWFAKEEDIKNLHSKHTNLDEGINICLRLKEAPPHNNEQYTATFLLLFVTNELKR DGGGKVLLLLNLCMDRKILESGKQPSEAKLYALSVPTDTHRPTMLPE

Saccharomyces cerevisiae BRF NP_011762.1 MPVCKNCHGTEFERDLSNANNDLVCKACGVVSEDNPIVSEVTFGETSAGAAVVQGSFIG AGQSHAAFGGSSALESREATLNNARRKLRAVSYALHIPEYITDAAFQWYKLALANNFVQ GRRSQNVIASCLYVACRKEKTHHMLIDFSSRLQVSVYSIGATFLKMVKKLHITELPLAD PSLFIQHFAEKLDLADKKIKVVKDAVKLAQRMSKDWMFEGRRPAGIAGACILLACRMNN LRRTHTEIVAVSHVAEETLQQRLNEFKNTKAAKLSVQKFRENDVEDGEARPPSFVKNRK KERKIKDSLDKEEMFQTSEEALNKNPILTQVLGEQELSSKEVLFYLKQFSERRARVVER IKATNGIDGENIYHEGSENETRKRKLSEVSIQNEHVEGEDKETEGTEEKVKKVKTKTSE EKKENESGHFQDAIDGYSLETDPYCPRNLHLLPTTDTYLSKVSDDPDNLEDVDDEELNA HLLNEEASKLKERIWIGLNADFLLEQESKRLKQEADIATGNTSVKKKRTRRRNNTRSDE PTKTVDAAAAIGLMSDLQDKSGLHAALKAAEESGDFTTADSVKNMLQKASFSKKINYDA IDGLFR

Saccharomyces cerevisiae TFIIB_M81380 MMTRESIDKRAGRRGPNLNIVLTCPECKVYPPKIVERFSEGDVVCALCGLVLSDKLVDT RSEWRTFSNDDHNGDDPSRVGEASNPLLDGNNLSTRIGKGETTDMRFTKELNKAQGKNV MDKKDNEVQAAFAKITMLCDAAELPKIVKDCAKEAYKLCHDEKTLKGKSMESIMAASIL IGCRRAEVARTFKEIQSLIHVKTKEFGKTLNIMKNILRGKSEDGFLKIDTDNMSGAQNL TYIPRFCSHLGLPMQVTTSAEYTAKKCKEIKEIAGKSPITIAVVSIYLNILLFQIPITA AKVGQTLQVTEGTIKSGYKILYEHRDKLVDPQLIANGVVSLDNLPGVEKK

Solanum tuberosum TFIIB TC58701 MDTYCSDCKRNTEVVFDHAAGDTVCSECGLVLESRSIDETSEWRTFADESGGDDPNRVG GPVNPLLGDAALSTVISKGPNGSNGDGSLARLQNRGGDPDRAIVLAFKAIATMADRLSL VSTIRDRASEIYKRLEDQKCTRGRNLDALVAACIYIACRQEGKPRTVKEICSIANGASK KEIGRAKEFIVKQLKVEMGESMEMGTIHAGDYLRRFCSNLGMNHEEIKVVQETVQKAEE FDIRRSPISIAAAIIYMITQLSDSKKPVLRADISVATTVAEGTIKNAYKDLYPHASKII PEWYVKDKDLKNLCSPKA

Sulfolobus solfataricus TFB AAK40772.1 MLYLSEENKSVSTPCPPDKIIFDAERGEYICSETGEVLEDKIIDQGPEWRAFTPEEKEK RSRVGGPLNNTIHDRGLSTLIDWKDKDAMGRTLDPKRRLEALRWRKWQIRARIQSSIDR NLAQAMNELERIGNLLNLPKSVKDEAALIYRKAVEKGLVRGRSIESVVAAAIYAACRRM KLARTLDEIAQYTKANRKEVARCYRLLLRELDVSVPVSDPKDYVTRIANLLGLSGAVMK TAAEIIDKAKGSGLTAGKDPAGLAAAAIYIASLLHDERRTQKEIAQVAGVTEVTVRNRY KELTQELKISIPTQ

Triticum aestivum TFIIB TC68795 MGDSYCQDCKKHTEVAFDHSAGDTVCTECGLVLEAHSVDETSEWRTFANESNDNDPVRV GGPSNPLLTDGGLSTVIAKPNGAHGDFLSSSLGRWQNRGSNPDRSLILAFRTIANMADR LGLVATIKDRANEIYKKVEDLKSIRGRNQDAILAACLYIACRQEDRPRTVKEICSVANG ATKKEIGRAKEFIVKQLEVEMGQSMEMGTIHAGDFLRRFCSTLGMNNTAVKAAQEAVQR SEELDIRRSPISIAAAVIYMITQLSEDKKPLKDISLATGVAEGTIRNSYKDLYPYAARL IPNSYAKEEDLKNLCTP

Vitis vinifera TFIIB TC19782 MADAFCTDCKKNTEVVFDHSAGDTVCSECGLVLESHSIDETSEWRTFANESGDNDPVRV GGPSNPLLTDGGLSTVIAKPNGVSGDFLSSSLGRWQNRGSNPDRGLILAFKTIATMSDR LGLVATIKDRANEIYKKVEDQKSTRGRNQDALLAACLYIACRQEDKPRTVKEICSVANG ATKKEIGRAKEYIVKQLEAEKGQSVEMGTIHAGDFMRRFCSNLGMTNQVVKAAQEAVQK SEEFDIRRSPVSIAAAVIYIITQLSDEKKLLRDISIATGVAEGTIRNSYKDLYPHISRI IPSWYAKEEDLRNLCSP

TATA Binding Protein Sequences

Arabidopsis thaliana TBP1 At3g13445 MTDQGLEGSNPVDLSKHPSGIVPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEDFSKMAARKYARIVQKLGFPAKFKDFKIQNI VGSCDVKFPIRLEGLAYSHAAFSSYEPELFPGLIYRMKVPKIVLLIFVSGKIVITGAKM RDETYKAFENIYPVLSEFRKIQQ

Arabidopsis thaliana TBP2 At1g55520 MADQGTEGSQPVDLTKHPSGIVPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEHLSKLAARKYARIVQKLGFPAKFKDFKIQNI VGSCDVKFPIRLEGLAYSHSAFSSYEPELFPGLIYRMKLPKIVLLIFVSGKIVITGAKM REETYTAFENIYPVLREFRKVQQ

Cenarchaeum symbiosum TBP AAC62688 MLDPRTRPRVVNVVSTSDLVQRVSAKKMAAMPCCMYDEAVYGGRCGYIKTPGMQGRVTV FISGKMISVGARSVRASFGQLHEARLHLVRNGAAGDCKIRPVVRNIVATVDAGRNVPID RISSRMPGAVYDPGSFPGMILKGLDSCSFLVFASGKMVIAGAKSPDELRRSSFDLLTRL NNAGA

Chlamydomonas reinhardtii TBP TC24902 MMAAAEAPPATPQLSAADVEAEMAAHVSGIKPQLQNVVATVNLGTKLDLKEIAMHARNA EYNPKRFAAVIMRIREPKTTALIFASGKMVCTGAKSEDDSRTAARRYAKIVQKLGFPAT FKEFKIQNIVGSCDVKFPIRLEGLAYAHSLFASYEPELFPGLIYRMKQPKIVLLIFVSG KVVLTGAKTRGEIYQAYMNIYPTLIQYKKGDAVVPTLPNNLMGPPRALPAAKQGGQADV GEPQQAQEQDGAGPSGVRGAGASAGAAAVPAAGSGWHDAAAASGGDASAGPEPAAADTH APPPAAAHASAPGAGGYTQPEPPPAAALTAAVRRCKAGAQRRAELDARRMAGGSAGGGG GQCLVASAYHHVASVYHSSMHFGGMVLVGNGRRRVGVAXRQLDVTLKIDRSVYAVGR

Glycine max TBP TC146463 MADQGLEGSQPVDLQKHPSGIVPTLQNIVSTVNLDCKLDLKTIALQARNAEYNPKRFAA VIMRIRDPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFPAKFKDFKIQNI VGSCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMKQPKIVLLIFVSGKIVLTGAKV RDETYTAFENIYPVLTEFRKNQQ

Hordeum vulgare TBP TC78738 MAEAALEGSQPVDLSKHPSGIVPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFPAKFKDFKIQNI VASCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMRQPKIVLLIFVSGKIVLTGAKV REETYTAFENIYPVLTEFRKVQQ

Medicago truncatula TBP TC86874 MADQGLEGSQPVDLSKHPSGIVPTLQNIVSTVNLDCKLELKSIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEVQSKLAARKYARIIQKLGFPAKFKDFKIQNI VGSCDVKFPIRLEGLAYSHGAFSSVKYDTKLLLSISPGEFEEIMYHYYQSHMTLALFPS IIYLKSSILQKTGTSLETLVSKVCPMENKTDTNQS

Medicago truncatula TBP TC88717 MADQGLEGSQPVDLAKHPSGIVPTLQNIVSTVNLDTKLDLKAIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFNAKFKDFKIQNI VGSCDVKFPIRLEGLAYSHGAFSSVSYFIYSYLHTSSSLDVICIGISMAKRFRFFLKI

Mesembryanthemum crystallinum TBP TC7116 MAEQGLEGSQPVDPIKHPSGIVPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFPAKFKDFKIQNI VGSCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMKQPKIVLLIFVSGKIVLTGAKV REETYTAFENIYPVLTEFRKNQQ

Oryza sativa TBP TC116362 MAAEAAAALEGSEPVDLAKHPSGIIPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKR FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFPAKFKDFKI QNIVGSCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMKQPKIVLLIFVSGKIVLTG AKVRDETYTAFENIYPVLTEFRKVQQ

Populus trichocarpa TBP1 MAEQGGLEGSQPVDLSKHPSGIVPTLQNIVSTVNLDCKLELKQIALQARNAEYNPKRFA AVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFAAKFKDFKIQN IVGSCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMKQPKIVLLIFVSGKIVITGAK VREETYTAFENIYPVLAEFRKVQQWYTSQSLCPAL

Populus trichocarpa TBP2 MAEQGGLEGSQPVDLSKHPSGIVPILQNIVSTVNLDCRLDLKQIALQARNAEYNPKRFA AVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFAAKFKDFKIQN IVGSCDVKFPIRLEGLAYSHGAFSSYEPEIFPGLIYRMKQPKIVLLIFVSGKIVITGAK VRDETYTAFGNIYPVLTEFRKVQQW

Pyrococcus woesei TBP AAA73447 MVDMSKVKLRIENIVASVDLFAQLDLEKVLDLCPNSKYNPEEFPGIICHLDDPKVALLI FSSGKLVVTGAKSVQDIERAVAKLAQKLKSIGVKFKRAPQIDVQNMVFSGDIGREFNLD VVALTLPNCEYEPEQFPGVIYRVKEPKSVILLFSSGKIVCSGAKSEADAWEAVRKLLRE LDKYGLLEEEEEEL

Solanum tuberosum TBP TC74102 MADQGLEGSQPVDLTKHPSGIVPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFPAKFKDFKIQNI VGSCDVKFPIRLEGLAYAHGAFSSYEPELFPGLIYRMKQPKIVLLIFVSGKIVITGAKV RDETYTAFENIYPVLTEFRKNQQ

Sorghum bicolor TBP TC54739 MAEPGLEGSQPVDLSKHPSGIVPTLHFPVLGASKRANIVLNSWGFGGNYLVVILSPRVD FRNIVSTVNLDCKLDLKAIALQARNAEYNPKRFAAVIMRIREPKTTALIFASGKMYARI IQKLGFPAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMKQPK IVLLIFVSGKIVLTGAKVREETYTAFENIYPVLTEFRKVQQ

Triticum aestivum TBP TC72701 MAEATLEGSEPVDLSKHPSGIIPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFPAKFKDFKIQNI VGSCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMKQPKIVLLIFVSGKIVLTGAKV REETYTAFENIYPVLTEFRKVQQ

Triticum aestivum TBP TC88519 MAEAAALEGSEPVDLTKHPSGIIPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKRFA AVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFPAKFKDFKIQN IVASCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMRQPKIVLLIFVSGKIVLTGAK VREETYTAFENIYPVLTEFRKVQQ

Triticum aestivum TBP TC90291 MAAAAVDPMVLGLGTSGGASGSGVVGGGVGRAGGGGAVMEGAQPVDLARHPSGIVPVLQ NIVSTVNLDCRLDLKQIALQARNAEYNPKRFAAVIMRIRDPKTTALIFASGKMVCTGAK SEEHSKLAARKYARIVQKLGFPATFKDFKIQNIVASCDVKFPIRLEGLAYSHGAFSSYE PELFPGLIYRMKQPKIVLLVFVSGKIVLTGAKVRDEIYAAFENIYPVLTEYRKSQQ

Zea mays TBP TC182979 MAEPGLEGSQPVDLSKHPSGIVPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFPAKFKDFKIQNI VGSCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMKQPKIVLLIFVSGKIVLTGAKV REETYTAFENIYPVLSEFRKIQQ

Zea mays TBP TC171023 MAEPGLEDSQPVDLSKHPSGIVPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFPAKFKDFKIQNI VGSCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMKQPKIVLLIFVSGKIVLTGAKV REETYTAFENIYPVLAEFRKVQQ

Zea mays TBP X90652.1 MAEPRLEDSQPVDLSKHPSGIVPTLQNIVSTVNLDCKLDLKAIALQARNAEYNPKRFAA VIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGFPAKFKDFKIQNI VGSCDVKFPIRLEGLAYSHGAFSSYEPELFPGLIYRMKQPKIVLLIFVSGKIVLTGAKV REETYTAFENIYPVLAEFRKVQQWYVVLFYHVSIIVRS

Saccharomyces cerevisiae TBP NP_011075.1 MADEERLKEFKEANKIVFDPNTRQVWENQNRDGTKPATTFQSEEDIKRAAPESEKDTSA TSGIVPTLQNIVATVTLGCRLDLKTVALHARNAEYNPKRFAAVIMRIREPKTTALIFAS GKMVVTGAKSEDDSKLASRKYARIIQKIGFAAKFTDFKIQNIVGSCDVKFPIRLEGLAF SHGTFSSYEPELFPGLIYRMVKPKIVLLIFVSGKIVLTGAKQREEIYQAFEAIYPVLSE FRKM

Homo sapiens TBP_L1 NP_004856 MDADSDVALDILITNVVCVFRTRCHLNLRKIALEGANVIYKRDVGKVLMKLRKPRITAT IWSSGKIICTGATSEEEAKFGARRLARSLQKLGFQVIFTDFKVVNVLAVCNMPFEIRLP EFTKNNRPHASYEPELHPAVCYRIKSLRATLQIFSTGSITVTGPNVKAVATAVEQIYPF VFESRKEIL

Homo sapiens TBP NP_003185.1 MDQNNSLPPYAQGLASPQGAMTPGIPIFSPMMPYGTGLTPQPIQNTNSLSILEEQQRQQ QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQAVAAAAVQQSTSQQATQGTSGQA PQLFHSQTLTTAPLPGTTPLYPSPMTPMTPITPATPASESSGIVPQLQNIVSTVNLGCK LDLKTIALRARNAEYNPKRFAAVIMRIREPRTTALIFSSGKMVCTGAKSEEQSRLAARK YARVVQKLGFPAKFLDFKIQNMVGSCDVKFPIRLEGLVLTHQQFSSYEPELFPGLIYRM IKPRIVLLIFVSGKVVLTGAKVRAEIYEAFENIYPILKGFRKTT

Drosophila melanogaster TRF_Q27896 MQFHFKVADAERDRDNVAATSNAAANPHAALQPQQPVALVEPKDAQHEIRLQNIVATFS VNCELDLKAINSRTRNSEYSPKRFRGVIMRMHSPRCTALIFRTGKVICTGARNEIEADI GSRKFARILQKLGFPVKFMEYKLQNIVATVDLRFPIRLENLNHVHGQFSSYEPEMFPGL IYRMVKPRIVLLIFVNGKVVFTGAKSRKDIMDCLEAISPILLSFRKT

Drosophila melanogaster TBP NP_523805 MDQMLSPNFSIPSIGTPLHQMEADQQIVANPVYHPPAVSQPDSLMPAPGSSSVQHQQQQ QQSDASGGSGLFGHEPSLPLAHKQMQSYQPSASYQQQQQQQQLQSQAPGGGGSTPQSMM QPQTPQSMMAHMMPMSERSVGGSGAGGGGDALSNIHQTMGPSTPMTPATPGSADPGIVP QLQNIVSTVNLCCKLDLKKIALHARNAEYNPKRFAAVIMRIREPRTTALIFSSGKMVCT GAKSEDDSRLAARKYARIIQKLGFPAKFLDFKIQNMVGSCDVKFPIRLEGLVLTHCNFS SYEPELFPGLIYRMVRPRIVLLIFVSGKVVLTGAKVRQEIYDAFDKIFPILKKFKKQS

Drosophila melanogaster TRF2 AAD28784 MQNDMVSIPVANLNGGLKAASSGSGVGVVTPGGVVSSAVLANAPRVYLTPSSTFMTNRQ MAGVASTGRMSGQVVGGSSGTASTAGTVRHFSQFSKMQTAGGPSLQRKLANGDTIVLAT GSKNMFLTSSENKANLPTVASNGNGLITAKMDLLEEEVMQSITVIDDDDEEKKEVAEDE EESSNNAKPIDLHQPIADNEHELDIVINNVVCSFSVGCHLKLREIALQGSNVEYRRENG MVTMKLRHPYTTASIWSSGRITCTGATSESMAKVAARRYARCLGKLGFPTRFLNFRIVN VLGTCSMPWAIKIVNFSERHRENASYEPELHPGVTYKMRDPDPKATLKIFSTGSVTVTA ASVNHVESAIQHIYPLVFDFRKQRSAEELQHLRQKQRLQAGGDPHELEKNVLADNKTAS LDNIFVNTTAAHSKSSSNDQTSAPATILSSTVDSMPRLKQMVNYHQMMKQTQEERRHIM FNGEKANPASTSSAAAAPSTSSSSSSSGDNICANARRRATECWATKLQNKRPRYNDPGT TGTINAASSTASAATSSLASQATHLRNPLKTAALANARMLGAKVTTCTRNSIIVQQPQR IQMQQQQQQLQPQQQQTSFSPSEFDVDDLIEEEENNELDMPF

TAF6 Sequences

Arabidopsis thaliana TAF6 At1g04950 MSIVPKETVEVIAQSIGITNLLPEAALMLAPDVEYRVREIMQEAIKCMRHSKRTTLTAS DVDGALNLRNVEPIYGFASGGPFRFRKAIGHRDLFYTDDREVDFKDVIEAPLPKAPLDT EIVCHWLAIEGVQPAIPENAPLEVIRAPAETKIHEQKDGPLIDVRLPVKHVLSRELQLY FQKIAELAMSKSNPPLYKEALVSLASDSGLHPLVPYFTNFIADEVSNGLNDFRLLFNLM HIVRSLLQNPHIHIEPYLHQLMPSVVTCLVSRKLGNRFADNHWELRDFAANLVSLICKR YGTVYITLQSRLTRTLVNALLDPKKALTQHYGAIQGLAALGHTVVRLLILSNLEPYLSL LEPELNAEKQKNQMKIYEAWRVYGALLRAAGLCIHGRLKIFPPLPSPSPSFLHKGKGKG KIISTDPHKRKLSVDSSENQSPQKRLITMDGPDGVHSQDQSGSAPMQVDNPVENDNPPQ NSVQPSSSEQASDANESESRNGKVKESGRSRAITMKAILDQIWKDDLDSGRLLVKLHEL YGDRILPFIPSTEMSVFL

Arabidopsis thaliana TAF6b-1 At1g54360 MVTKESIEVIAQSIGLSTLSPDVSAALAPDVEYRVREVMQEAIKCMRHARRTTLMAHDV DSALHFRNLEPTSGSKSMRFKRAPENRDLYFFDDKDVELKNVIEAPLPNAPPDASVFSH WLAIDGIQPSIPQNSPLQAISDLKRSEYKDDGLAARQVLSKDLQIYFDKVTEWALTQSG STLFRQALASLEIDPGLHPLVPFFTSFIAEEIVKNMDNYPILLALMRLARSLLHNPHVH IEPYLHQLMPSIITCLIAKRLGRRSSDNHWDLRNFTASTVASTCKRFGHVYHNLLPRVT RSLLHTFLDPTKALPQHYGAIQGMVALGLNMVRFLVLPNLGPYLLLLLPEMGLEKQKEE AKRHGAWLVYGALMVAAGRCLYERLKTSETLLSPPTSSVWKTNGKLTSPRQSKRKASSD NLTHQPPLKKIAVGGIIQMSSTQMQMRGTTTVPQQSHTDADARHHNSPSTIAPKTSAAA GTDVDNYLFPLFEYFGESMLMFTPTHELSFFL

Arabidopsis thaliana TAF6b-2 At1g54360 MVTKESIEVIAQSIGLSTLSPDVSAALAPDVEYRVREVMQEAIKCMRHARRTTLMAHDV DSALHFRNLEPTSGSKSMRFKRAPENRDLYFFDDKDVELKNVIEAPLPNAPPDASVFSH WLAIDGIQPSIPQNSPLQAISDLKRSEYKDDGLAARQIYFDKVTEWALTQSGSTLFRQA LASLEIDPGLHPLVPFFTSFIAEEIVKNMDNYPILLALMRLARSLLHNPHVHIEPYLHQ LMPSIITCLIAKRLGRRSSDNHWDLRNFTASTVASTCKRFGHVYHNLLPRVTRSLLHTF LDPTKALPQHYGAIQGMVALGLNMVRFLVLPNLGPYLLLLLPEMGLEKQKEEAKRHGAW LVYGALMVAAGRCLYERLKTSETLLSPPTSSVWKTNGKLTSPRQSKRKASSDNLTHQPP LKKIAVGGIIQMSSTQMQMRGTTTVPQQSHTDADARHHNSPSTIAPKTSAAAGTDVDNY LFPLFEYFGESMLMFTPTHELSFFL

Arabidopsis thaliana TAF6b-4 At1g54360 MVTKESIEVIAQSIGLSTLSPDVSAALAPDVEYRVREVMQEAIKCMRHARRTTLMAHDV DSALHFRNLEVSSSSLLLLFHTVDPDFDFFLYSLPLAPKVCGSRELLRTEIYTSSMTKM SSSRMLSKLLYQMHLLMHLFSLIGWQLMVFNLPFHRILLSKPYLTLNDRNIRTMAWLLH RCFLRTFRFTLTKSRSGL

Arabidopsis thaliana TAF6b-3 At1g54360 MVTKESIEVIAQSIGLSTLSPDVSAALAPDVDSALHFRNLEPTSGSKSMRFKRAPENRD LYFFDDKDVELKNVIEAPLPNAPPDASVFSHWLAIDGIQPSIPQNSPLQAISDLKRSEY KDDGLAARQVLSKDLQIYFDKVTEWALTQSGSTLFRQALASLEIDPGLHPLVPFFTSFI AEEIVKNMDNYPILLALMRLARSLLHNPHVHIEPYLHQLMPSIITCLIAKRLGRRSSDN HWDLRNFTASTVASTCKRFGHVYHNLLPRVTRSLLHTFLDPTKALPQHYGAIQGMVALG LNMVRFLVLPNLGPYLLLLLPEMGLEKQKEEAKRHGAWLVYGALMVAAGRCLYERLKTS ETLLSPPTSSVWKTNGKLTSPRQSKRKASSDNLTHQPPLKKIAVGGIIQMSSTQMQMRG TTTVPQQSHTDADARHHNSPSTIAPKTSAAAGTDVDNYLFPLFEYFGESMLMFTPTHEL SFFL

Drosophila melanogaster TAF6 NP_524161 MSGKPSKPSSPSSSMLYGSSISAESMKVIAESIGVGSLSDDAAKELAEDVSIKLKRIVQ DAAKFMNHAKRQKLSVRDIDMSLKVRNVEPQYGFVAKDFIPFRFASGGGRELHFTEDKE IDLGEITSTNSVKIPLDLTLRSHWFVVEGVQPTVPENPPPLSKDSQLLDSVNPVIKMDQ GLNKDAAGKPTTGKIHKLKNVETIHVKQLATHELSVEQQLYYKEITEACVGSDEPRRGE ALQSLGSDPGLHEMLPRMCTFIAEGVKVNVVQNNLALLIYLMRMVRALLDNPSLFLEKY LHELIPSVMTCIVSKQLCMRPELDNHWALRDFASRLMAQICKNFNTLTNNLQTRVTRIF SKALQNDKTHLSSLYGSIAGLSELGGEVIKVFIIPRLKFISERIEPHLLGTSISNTDKT AAGHIRAMLQKCCPPILRQMRSAPDTAEDYKNDFGFLGPSLCQAVVKVRNAPASSIVTL SSNTINTAPITSAAQTATTIGRVSMPTTQRQGSPGVSSLPQIRAIQANQPAQKFVIVTQ NSPQQGQAKVVRRGSSPHSVVLSAASNAASASNSNSSSSGSLLAAAQRSSDNVCVIAGS EAPAVDGITVQSFRAS

Homo sapiens TAF6 NP_647476 MAEEKKLKLSNTVLPSESMKVVAESMGIAQIQEETCQLLTDEVSYRIKEIAQDALKFMH MGKRQKLTTSDIDYALKLKNVEPLYGFHAQEFIPFRFASGGGRELYFYEEKEVDLSDII NTPLPRVPLDVCLKAHWLSIEGCQPAIPENPPPAPKEQQKAEATEPLKSAKPGQEEDGP LKGKGQGATTADGKGKEKKAPPLLEGAPLRLKPRSIHELSVEQQLYYKEITEACVGSCE AKRAEALQSIATDPGLYQMLPRFSTFISEGVRVNVVQNNLALLIYLMRMVKALMDNPTL YLEKYVHELIPAVMTCIVSRQLCLRPDVDNHWALRDFAARLVAQICKHFSTTTNNIQSR ITKTFTKSWVDEKTPWTTRYGSIAGLAELGHDVIKTLILPRLQQEGERIRSVLDGPVLS NIDRIGADHVQSLLLKHCAPVLAKLRPPPDNQDAYRAEFGSLGPLLCSQVVKARAQAAL QAQQVNRTTLTITQPRPTLTLSQAPQPGPRTPGLLKVPGSIALPVQTLVSARAAAPPQP SPPPTKFIVMSSSSSAPSTQQVLSLSTSAPGSGSTTTSPVTTTVPSVQPIVKLVSTATT APPSTAPSGPGSVQKYIVVSLPPTGEGKGGPTSHPSPVPPPASSPSPLSGSALCGGKQE AGDSPPPAPGTPKANGSQPNSGSPQPAP

Homo sapiens TAF6L NP_006464 MSEREERRFVEIPRESVRLMAESTGLELSDEVAALLAEDVCYRLREATQNSSQFMKHTK RRKLTVEDFNRALRWSSVEAVCGYGSQEALPMRPAREGELYFPEDREVNLVELALATNI PKGCAETAVRVHVSYLDGKGNLAPQGSVPSAVSSLTDDLLKYYHQVTRAVLGDDPQLMK VALQDLQTNSKIGALLPYFVYVVSGVKSVSHDLEQLHRLLQVARSLFRNPHLCLGPYVR CLVGSVLYCVLEPLAASINPLNDHWTLRDGAALLLSHIFWTHGDLVSGLYQHILLSLQK ILADPVRPLCCHYGAVVGLHALGWKAVERVLYPHLSTYWTNLQAVLDDYSVSNAQVKAD GHKVYGAILVAVERLLKMKAQAAEPNRGGPGGRGCRRLDDLPWDSLLFQESSSGGGAEP SFGSGLPLPPGGAGPEDPSLSVTLADIYRELYAFFGDSLATRFGTGQPAPTAPRPPGDK KEPAAAPDSVRKMPQLTASAIVSPHGDESPRGSGGGGPASASGPAASESRPLPRVHRAR GAPRQQGPGTGTRDVFQKSRFAPRGAPHFRFIIAGRQAGRRCRGRLFQTAFPAPYGPSP ASRYVQKLPMIGRTSRPARRWALSDYSLYLPL

Hordeum vulgare TAF6 Barley1_10250 MSIVPKETIEVIAQSIGIPSLPADVSAALAPDVEYRLREIMQEAIKCMRHAKRTVLTAD DVDSALSLRNVEPVYGFASGDPLRFKRAVGHKDLFYIDDREVDFKEIIEAPLPKAPLDT AVVAHWLAIEGVQPAIPENPPIDAISAPTENKRTEQVKDDGLPVDIKLPVKHILSRELQ MYFDKIAELTMSRSSTPIFREALVSLSKDSGLHPLVPYFSYFIADEVTRSLADLPVLFA LMRVVQSLLRNPHIHIEPYLHQLMPSMITCIVAKRLGHRLSDNHWELRDFSANLVASVC RRYGHVYHNLQIRLTKTLVHAFLDPHKALTQHYGAVQGISALGPSAIRLLLLPNLQTYM QLLDPELQLEKQSNEMKRKEAWRVYGALLCAAGKCLYERLKLFPNLLCPSTRPLLRSNS RVATNNPNKRKSSTDLSASQPPLKKMASDVSMSPMGSAAPVAGNMAGSMDGFSAQLPNP GMMQASSSGQKVESMTAAGAIRRDQGSNHAQRVSAVLRQAWKEDQDAGHLLGSLHEVFG EAIFSFIQPPELSIFL

Oryza sativa TAF6 BAB92191 MSIVPKETIEVIGQSVGIANLPADVSAALAPDVEYRLREIMQEAIKCMRHAKRTVLTAD DVDSALSLRNVEPVYGFASGDPLRFKRAVGHKDLFYIDDREVDFKEIIEAPLPKAPLDT AVVAHWLAIEGVQPAIPENPPVDAIVAPTENKRTEHGKDDGLPVDIKLPVKHVLSRELQ MYFDKIAELTMSRSETSVFREALVSLSRDSGLHPLVPYFSYFIADEVTRSLGDLPVLFA LMRVVQSLLHNPHIHIEPYLHQLMPSIITCMVAKRLGHRLSDNHWELRDFSANLVGSVC RRFGHAYHNIQTRVTRTLVQGFLDPQKSLTQHYGAIQGISALGPSAIRLLLLPNLETYM QLLEPELQLDKQKNEMKRKEAWRVYGALLCAAGKCLYDRLKLFPNLLSPSTRPLLRSNK RVVTNNPNKRKSSTDLSTSQPPLKKMTTDGAMNSMTSAPMPGTMDGFSTQLPNPSMTQT SSSGQLVESTASGVIRRDQGSNHTQRVSTVLRLAWKEDQNAGHLLSSLYEVFGEAIFSF VQPPEISFFL

Populus trichocarpa TAF6 MSIVAKETIEVIAQSIGISNLSEDVALTLAPDVEFRMRQIMQEAIKCMRHSKRTRLTTD DVDGALNLTNVEPIYGFASGGALQFKRAIGHRDLFYVDDKDIDFKDVIEAPLPKAPLDT AVVCHWLAIEGVQPAIPENAPLEVIAPPSDGKISEQNDEFPVDIKLPVKHVLSRELQLY FDKITDLTVRRSDSVLFKEALVSLATDSGLHPLIPYFTYFIADEVARGLNDYSLLFALM RVVWSLLQNPHIHIEPYIIVNVLSFVFRIMSSIDEYKIKVQSLKLRRRWISCQLHQLMP SVVTCLVARKLGNRFADNHWELRDFTANLVAPICKRVHGWQHSALILCKHSLTEYVPRV SWSGCCRFGHVYNSLQTRLTKTLLNALLDPKRSLTQHYGAIQGLAALGPNVVRLLLLPN LKPYLQLLEPEMLLEKQKNEMKRHEAWHVYGALLCAAGQSIYDRLKMFPALMSHPACAV LRTNEKVVTKRPGDFYDFSFQKLYHLNATVCDVSMPMYLWVESNLFPLIENYQDKRKAS MEHMEQPPPKKIATDGPVDMQVEPIAPVPLGDSKTGLSTSSEHTPNYSEAGSRNQKDKG DSQAIKTSAILSQVWKDDLNSGHLLVSLFELFGESILSFIPSPEMSLFL

Populus trichocarpa TAF6b MSSSIVAKEAIEVIAQGIGITNLSPDVSLTLAPDVEYRLREIIQEAIKCMRHSRRTALT AHDVDTALILRNVEPIYGFGSGGDKVPLRFKRAAAAGHKDLYYIDDKDVNFKHVIEAPP PKPPLDTSLTSHWLAIEGVQPAIPENVPIEALGVISDGKKSDYKDDGLSIDVKLPVKDI LSRELQLYFEKVTELTARRSESAIFKQALVSLATDSGLHPLVPYFIQFIADEVSRNLNN FSLLLAVMRIARSLLQNPYIHIEPYLHQLMPSIITCLVAKRLGNRFSDNHWELRNFTAN LVASICKRFGHAYHNLQPRIIRTLVHAFLDPTKSLPQHYGSIQGLAALGPSVVRLLILP NLEPYLLLLEQEMLLEKQKNEIKRHEAWQRAAGLCMYDRLKMLPGLFIPPSRAIWKSNG RVMTAMPSMTCFNLSHWDTFIHASINPVTGYVYCLKIPVNACVEMGLYVGTSSFHYVHL TLYPACISCRSLLANQDKRKASTDNLMQQPLLKKIATDSAIGAMPMNSMPVEMQGAASG FPTAVGASSVSVSAISRQLSNENVPRREISGRGLKTSTVLAQAWKEDMDAGHLLASLFE LFSESMFSFTPKPELSFFL

Saccharomyces cerevisiae TAF6 NP_011403 MSTQQQSYTIWSPQDTVKDVAESLGLENINDDVLKALAMDVEYRILEIIEQAVKFKRHS KRDVLTTDDVSKALRVLNVEPLYGYYDGSEVNKAVSFSKVNTSGGQSVYYLDEEEVDFD RLINEPLPQVPRLPTFTTHWLAVEGVQPAIIQNPNLNDIRVSQPPFIRGAIVTALNDNS LQTPVTSTTASASVTDTGASQHLSNVKPGQNTEVKPLVKHVLSKELQIYFNKVISTLTA KSQADEAAQHMKQAALTSLRTDSGLHQLVPYFIQFIAEQITQNLSDLQLLTTILEMIYS LLSNTSIFLDPYIHSLMPSILTLLLAKKLGGSPKDDSPQEIHEFLERTNALRDFAASLL DYVLKKFPQAYKSLKPRVTRTLLKTFLDINRVFGTYYGCLKGVSVLEGESIRFFLGNLN NWARLVFNESGITLDNIEEHLNDDSNPTRTKFTKEETQILVDTVISALLVLKKDLPDLY EGKGEKVTDEDKEKLLERCGVTIGFHILKRDDAKELISAIFFGE

TAF9 Sequences

Arabidopsis thaliana TAF9 At1g54140 MAGEGEEDVPRDAKIVKSLLKSMGVEDYEPRVIHQFLELWYRYVVEVLTDAQVYSEHAS KPNIDCDDVKLAIQSKVNFSFSQPPPREVLLELAASRNKIPLPKSIAGPGVPLPPEQDT LLSPNYQLVIPKKSVSTEPEETEDDEEMTDPGQSSQEQQQQQQQTSDLPSQTPQRVSFP LSRRPK

Chlamydomonas reinhardtii TAF9 TC21330 MDAARGAGGAVSDGAQPQDVATMHALLRSMGVEEFEPRVVNQLMDFMYKYTTDVLLDAE VFSEHAGRQPGQVDASGVTMAIQSRTALYVQPPPQERVTELARQVNDTGTARPGHQAPA CRCRPRASR

Drosophila melanogaster TAF9 A49067 MSAEKSDKAKISAQIKHVPKDAQVIMSILKELNVQEYEPRVVNQLLEFTFRYVTCILDD AKVYANHARKKTIDLDDVRLATEVTLDKSFTGPLERHVLAKVADVRNSMPLPPIKPHCG LRLPPDRYCLTGVNYKLRATNQPKKMTKSAVEGRPLKTVVKPVSSANGPKRPHSVVAKQ QVVTIPKPVIKFTTTTTTKTVGSSGGSGGGGGQEVKSESTGAGGDLKMEVDSDAAAVGS IAGASGSGAGSASGGGGGGGSSGVGVAVKREREEEEFEFVTN

Gossypium arboreum TAF9 TC14563 MAEGEEDLPRDAKIVKSLLKSMGVEDYEPRVIHQFLELWYRYVVDVLTDAQVYSEHAGK QTIDCDDVKLAIQSKVNFSFSQPPPREVLLELARNRNKVPLPKAIPGPGIPLPPEQDTL ISTNYQLAIPKKQPAQAMEEMEEDEESVEPNSSQEHKTDAPHPTSQRVSFPLTKRSK

Homo sapiens TAF9 NP_003178 MESGKTASPKSMPKDAQMMAQILKDMGITEYEPRVINQMLEFAFRYVTTILDDAKIYSS HAKKATVDADDVRLAIQCRADQSFTSPPPRDFLLDIARQRNQTPLPLIKPYSGPRLPPD RYCLTAPNYRLKSLQKKASTSAGRITVPRLSVGSVTSRPSTPTLGTPTPQTMSVSTKVG TPMSLTGQRFTVQMPTSQSPAVKASIPATSAVQNVLINPSLIGSKNILITTNMMSSQNT ANESSNALKRKREDDDDDDDDDDDYDNL

Homo sapiens TAF9L NP_057059 MESGKMAPPKNAPRDALVMAQILKDMGITEYEPRVINQMLEFAFRYVTTILDDAKIYSS HAKKPNVDADDVRLAIQCRADQSFTSPPPRDFLLDIARQKNQTPLPLIKPYAGPRLPPD RYCLTAPNYRLKSLIKKGPNQGRLVPRLSVGAVSSKPTTPTIATPQTVSVPNKVATPMS VTSQRFTVQIPPSQSTPVKPVPATTAVQNVLINPSMIGPKNILITTNMVSSQNTANEAN PLKRKHEDDDDNDIM

Hordeum vulgare TAF9 TC68170 MDSGGVRPSLPSAAAAGGASVPDEPRDARVVRELLRSMGLGEGEYEPRVVGQFLDLAYR YVGDVLGDAQVYADHADKPQIDADDVRLAIQANVNFSFSQPPPREVLLELARSRNKIPL PKSIAPPGSIPLPPEQDTLLSENYQLLPALKPPTQTEEAEDDNEGADAIPANPSPSYSQ DQRGSEQHQPQSQSQRVSFQLNAVAAAAAKRPLVTTDQLNMG

Lycopersicon esculentum TAF9 TC128464 MAEGGEEDLPRDAKIVKTLLKSMGVDDYEPRVVHQFLELWYRYVVDVLTDAQVYSEHAR KASIDSDDIKLAIQSKVNFSFSQPPPREVLLELARNRNKIPLPKSIAGSGVPLPPEQDT LINPNYQLAIAKKQTNQPEETEEDEESADPNPAPSKNPTLSHEKTDLPQGTPQRVSFPL GAKRPR

Medicago truncatula TAF9 TC85341 MADNEEDSNMPRDAKIMQSLLKSMGVEEYEPRVINKFLELWYRYVVDVLTDAQVYSEHA GKPAIDVDDVKLAIQSQVNFSFSQPPPREVLLELAQNRNKIPLPKSIAGPGFPLPPDQD TLIAPNYQFAIPNKRSVEPMEETEDEEVPNADPNPSQEEKTDAEQNPHQRVSFPLPKRQ KD

Medicago truncatula TAF9 TC85342 MADNEEDSNMPRDAKIVQSLLKSMGVEEYEPRVINKFLELWYRYVVDVLTDAQVYSEHA GKPAIDVDDVKLAIQSQVNFSFSQPPPREVLLELAQNRNKIPLPKSIAGPGFPLSPDQD TLIAPNYQFAIPNKRSVEPMEETEDERSSQWPIPTHLKKRRQMRNKIPIKECHFPCLNP KGLI

Oryza sativa TAF9 BAC21319.1 MDTGADQAPPPPPPPPVAAASAAADEPRDLRVVREILHSLGLREGDYEEAAVHKLLLFA HRYAGDVLGEAKAYAGHAGRESLQADDVRLAIQARGMSSAAPPSREEMLDIAHKCNEIP IPKPCVPSGSISLPHYEDMLLNKKHIFVPRVEPTPHQIEETEDDYNDDGSNANVASPNS NYDQDLFGSISLPHYQDMLLNQNHLSVHRVEPAHDQLEKIKDDGSNDNADSSHSNYVQD SSGSVSLQHHQDMSLNQNHLFVHQVELTLDQIEEIKDDGSNDNVDSPNFNCVQDPSRSV SFPHYQVMPLNQNHLSFHQVEPMLDQVEEIKDDSSNDNVASLDSNCIQDPHYQDMLLNQ DHLSVRGVEPTLDQVEEIEDDCSSDNVASPDSNYDKEKNDSNKQKPSKKVSQLNTLVAA GKDKVDCSTELS

Oryza sativa TAF9 AAP12985 MDPGGLRPAPQSAAAAAAAAAAGAGAGASAADEPRDARVVRELLRSMGLSEGEYEPRVV HQFLDLAYRYVGDVLGDAQVYADHAGKPQLDADDVRLAIQSKVNFSFSQPPPRECSEFF HSDQDFRSRSLPSDNPLFFSMVLLEVARNRNKIPLPKSIAPPGSIPLPPEQDTLLSQNY QLLAPLKPPPQFEETEDDNAGANPTPTSNPSNPSPNNLQEQQQLPQHGQRVSFQLNAVA AAKRRGTMDQLNMG

Populus trichocarpa TAF9 MAEGEEDMPRDAKIVKSLLKSMGVEDYEPRVVHQFLELWYRYVVDVLTDAQVYSEHANK TAIDCDDVKLAIQSKVNFSFSQPPPREVLLELARNRNKIPLPKSIAGPGIPLPPEQDTL ISPNYQLAIPKKRTAQAIEETEEDEESADPNQSQEQKTDPPQLTPQRVSFPLTKRPNYR FQVMSSISCSSSMNSPDSSTLFTRLKFELCDMRIALI

Populus trichocarpa TAF9b MGEGTVPLEVQIRPKEMHLQAEFGFAAHWRYKEGDCKHSSFVLQVVEWARWVITWQCET MSKDRPSIGCDDSIKPPCTFPSHSDGCPYSYKPHCGQDGPIFIIMIENDKMSVQEFPAD STVMDLLERAGRASSRWSAYGFPVKEELRPRLNHRPVHDATCKLKMGDVVELTPAIPDK SLSDYREEIQRMYEHGSATVSSTAPAVSGTVGRRS

Saccharomyces cerevisiae TAF9 NP_013963 MNGGGKNVLNKNSVGSVSEVGPDSTQEETPRDVRLLHLLLASQSIHQYEDQVPLQLMDF AHRYTQGVLKDALVYNDYAGSGNSAGSGLGVEDIRLAIAARTQYQFKPTAPKELMLQLA AERNKKALPQVMGTWGVRLPPEKYCLTAKEWDLEDPKSM

Solanum tuberosum TAF9 TC67183 MAEGGEEDLPRDAKIVKTLLKSMGVDDYEPRVVHQFLELWYRYVVDVLMDAQVYSEHAG KASIDSDDIKLAIQSKVNFSFSQPPPREVLLELARNRNKIPLPKSIAGSGVPLPPEQDT LINPNYQLAIAKKQTSQPEETEEDEERADPNPAPSKNPSLSHEKTDVPQGTPQQVSFPL GAKRPR

Solanum tuberosum TAF9 TC67182 MAEGGEEDLPRDAKIDKTSLKSMGVDDYEPRVVQQFLELRNSYVVDVLTDAQVYSEHAG KTSIDSDDIKLAIQSKVNFSFSQPPPREVLLELARNRNKIPLPKSIAGSGVPLPPQQDT LINPNYQLAIAKKQTSQPEETEEDEESADPNPAPSKNPSLSHEKTDVPQGTPQRVSFPL GAKRPR

Triticum aestivum TAF9 TC70841 MDGGGGGGGRPALQPAAAGGGASGPDEPRDARVVRELLRSMGLGEGEYEPRVVHQFLDL AYRYAGDVLGDAQVYADHAGKPQLDADDVRLAIQAKVNFSFSQPPPREVLLELARSRNK IPLPKSIAPPGSIPLPPEQDTLLSQNYQLLPALKPPTQTEEAEDEEEGANADAANANPN SSQDQRGNEAXSSSLRARARAQGFFQA

Vitis vinifera TAF9 TC11580 MAGGDEDLPRDAKIVKSLLKSMGVDDYEPRVIHQFLELWYRYVVDVLTDAQVYSEHASK LAIDCDDVKLAIHFKVNFSFFQPPAREVLLELARNRNKIPLPKSIAGPGIPLPPEQDTL ISPNYQLAIPKKRTAQAVEETEEDEEGADPSHASQEGRTDLPQHTPQRVSFPIGAKRPR

Zea mays TAF9 TC182853 MDAGAARPSAPSTAAVAGASVADEPRDARVVRELLRSMGLREGEYEPRVVHQFLDLAYR YVGDVLGDAQVYADHAGKAQIDADDVRLAIQAKVNFSFSQPPPREVLLELARSRNRMPL PKSIAPPGSIPLPPEQDTLLAQNYQLLPPLKPPPQYEEIEDETEEPNPSNPANSNPSYS QDQSSKEQQQQHTPQHGQRVSFQLNAVAAAAAAAKRPRMAIDQLNMG

Zea mays TAF9 TC182854 MDAADARPSAPSAAAAAVAGASVADEPRDARVVRELLRSMGLGEGEYEPRVVHQFLDLA YRYVGDVLGDAQVYADHAGKAQIDADDVRLAIQAKVNFSFSQPPPREVLLELARSRNRM PLPKSIAPPGSIPLPPEQDTLLAQNYQLLPPLKPPPQYEENEDENEESNPSLTPNPANS NPTFSQDQRSNEQQHTPQHGQRVSFQLNAVAAAAAKRPRMTVDQLNIG

TAF10 Sequences

Arabidopsis thaliana TAF10 AAK29671 MNHGQQSGEAKHEDDAALTEFLASLMDYTPTIPDDLVEHYLAKSGFQCPDVRLIRLVAV ATQKFVADVASDALQHCKARPAPVVKDKKQQKDKRLVLTMEDLSKALREYGVNVKHPEY FADSPSTGMDPATRDE

Beta vulgaris TAF10 BVSVtuc03-04-08.1346 MNPQTSDGRHDDDAALSEFLASLMDYTPTIPDELVEHYLAKSGFQCPDVRLIRLVAVAT QKFISEVATDALQHCKARQSSVVKDKRDKLQKDKRLVLTMEDLSRALKEYGVNLKHQEY FADNPSTGMDPASRDE

Drosophila melanogaster TAF10 CAC08819 MASDGEDISVTPAESVTSATDTEEEDIDSPLMQSELHSDEEQPDVEEVPLTTEESEMDE LIKQLEDYSPTIPDALTMHILKTAGFCTVDPKIVRLVSVSAQKFISDIANDALQHCKTR TTNIQHSSGHSSSKDKKNPKDRKYTLAMEDLVPALADHGITMRKPQYFV

Drosophila melanogaster TAF10b AAL48842 MVGSNFGIIYHNSAGGASSHGQSSGGGGGGDRDRTTPSSHLSDFMSQLEDYTPLIPDAV TSHYLNMGGFQSDDKRIVRLISLAAQKYMSDIIDDALQHSKARTHMQTTNTPGGSKAKD RKFTLTMEDLQPALADYGINVRKVDYSQ

Glycine max TAF10 TC162515 MNQNPQSSDGRNDDDSALSDFLASLMDYTPTIPDELVEHYLAKSGFQCPDVRLTRLVAV ATQKFVAEVAGDALQHCKARQATIPKDKRDKQQKDKRLVLTMEDLSKALREYGVNLKHQ EYFADSPSTGMDPATREE

Glycine max TAF10 TC162516 MNQNPQSSEGRNDDDSALSDFLASLMDYTPTIPDELVEHYLAKSGFQCPDVRLTRLVAV ATQKFVAEVAGDALQHCKARQATIPKDKRDKQQKDKRLVLTMEDLSQALREYGANLTDQ EYFADSPSTVMDPATREE

Gossypium arboreum TAF10 BQ401852 MNHNPQSSDGKHDDDSALSDFLASLMDYAPTIPDELVEHYLAKSGFQCPDVRLIRLVAV ATQKFVAEVASDALQHCKARQAAVVKDKREKQQKDKRLILTMDDLSKSLREYGVNVKHQ EYFADSPSTGIDPASREE

Homo sapiens TAF10 Q12962 MSCSGSGADPEAAPASAASAPGPAPPVSAPAALPSSTAAENKASPAGTAGGPGAGAAAG GTGPLAARAGEPAERRGAAPVSAGGAAPPEGAISNGVYVLPSAANGDVKPVVSSTPLVD FLMQLEDYTPTIPDAVTGYYLNRAGFEASDPRIIRLISLAAQKFISDIANDALQHCKMK GTASGSSRSKSKDRKYTLTMEDLTPALSEYGINVKKPHYFT

Hordeum vulgare TAF10 Barley1_07779 MGSNNSGGAGGGGGMAPGTGAGGSDGRHDDEAVLTEFLSSLMDYNPTIPDELVEHYLGR SGFHCPDLRLTRLVAVAAQKFISDIASDSLQHCKARVAAPIKDNKSKQPKDRRLVLTMD DLSKALREHGVNLRHPEYFADSPSAGMAPSTRDE

Hordeum vulgare TAF10 TC68796 MMGSNNPGGAGGGMAPGMGAGGSDGRHDDEAVLTEFLSSLMDYNPMIPDELVEHYLGRS GFHXPDLRLTRLVAVATQKFISDVASDSLQHCKARVAAPIKDNKSKQPKDRRLVLTMDD LSKALREHGVNLKHPEYFADSPSAGMGHSTREE

Hordeum vulgare TAF10 HVtuc02-11-10.5382 MGSNNSGGAGGGGGMAPGTGAGGSDGRHDDEAVLTEFLSSLMDYNPTIPDELVEHYLGR SGFHCPDLRLTRLVAVAAQKFISDIASDSLQHCKARVAAPIKDNKSKQPKDRRLVLTMD DLSKALREHGVNLRHPEYFADSPSAGMGHSTREE

Lycopersicon esculentum TAF10 TC118341 MNQSQGQQTSEGRHEDDAVLADFLASLMDYTPTIPDELVEHYLGKSGFQCPDVRLIRLV AVATQKFIADVATDALQHCKARQSTIVKDKRDKQQKDKRLTLTMDDLSKSLREYGVNVK HQDYFADSPSAGLDPASREE

Oryza sativa TAF10 TC129171 MVPGGMGGGGPMGAAPPGGGGGGDGRHDDEAVLTEFLSSLMDYTPTIPDELVEHYLGRS GFYCPDLRLTRLVAVATQKFISDIASDSLQHCKARVAAPIKDNKSKQPKDRRLVLTMDD LSKALQEHGVNLKHPEYFADSPSAGMAPAAREE

Pinus TAF10 TC9616 MAESKQDDDAVLIEFLSSLMDYTPTIPDELAEYYLSKSGFQCPDVRIIRMVSIATQKFI AEIASDAFQLCKARQSAVNKEKRDKQQKDKSFVLTTEDLSMALREYGVNMKRQEYFADN PSAGTNPTSKDE

Populus trichocarpa TAF10 MNNTSSSNSQQQQQSSEARHDDDAVLTEFLASLMDYTPTIPDELVEHYLAKSGFQCPDV RLVRLVAVATQKFVADVATDALQQCKARPAPVVKDKRDKQQKEKRLILTMEDLSKALSE YGVNVKHQEYFADSPSTGMDPASREE

Saccharomyces cerevisiae TAF10 NP_010451 MDFEEDYDAEFDDNQEGQLETPFPSVAGADDGDNDNDDSVAENMKKKQKREAVVDDGSE NAFGIPEFTRKDKTLEEILEMMDSTPPIIPDAVIDYYLTKNGFNVADVRVKRLLALATQ KFVSDIAKDAYEYSRIRSSVAVSNANNSQARARQLLQGQQQPGVQQISQQQHQQNEKTT ASKVVLTVNDLSSAVAEYGLNIGRPDFYR

Triticum aestivum TAF10 TC64687 MMGSNNPGGAGGGGGMAPGTGAGGSDGRHDDEAVLTEFLSSLMDYNPTIPDELVEHYLG RSGFHCPDLRLTRLVAVAAQKFISDIASDSLQHCKARVAAPVKDNKSKQPKDRRLVLTM DDLSKALREHGGNLKHPEYFADSPSAGMPPSTREE

Triticum aestivum TAF10 TC64747 MMGSNNPGGAGGGGGGGMAPGTGGGGSDGRHDDEAVLTDFLSSLMDYNPTIPDELVEHY LGRSGFHCPDLRLTRLVAVAAQKFISDIASDSLQHCKARVAAPIKDNKSKQPKDRRLVL TMDDLSKALREHGVNLRHPEYFADSPSAGXPLKREE

Triticum aestivum TAF10 CA620043 MAPGMGAGSSDGRHDDEAVLTEFLSSLMDYNPMIPDELVEHYLGRSGFHCPDLRLTRLV AIATQKFISDVASDSLQHCKARVAAPIKDNKSKQPKDRRLVLTMDDLSKALREHGVNLK HPEYFADSPSARMGPSTREE

Zea mays TAF10 TC184169 MGTGVGGGGDGRHDDEAALTEFLSSLMDYTPTIPDELVEHYLGRSGFHCPDLRLTRLVA VATQKFLSDIASDSLQHCKARVAAPIKDNKSKQPKDRRLVLTMDDLSKALREHGVNLKH AEYFADSPSAGMAPSTREE

TAF11 Sequences

Arabidopsis thaliana TAF11 At4g20280 MKHSKDPFEAAIEEEQEESPPESPVGGGGGGDGSEDGRIEIDQTQDEDERPVDVRRPMK KAKTSVVVTEAKNKDKDEDDEEEEENMDVELTKYPTSSDPAKMAKMQTILSQFTEDQMS RYESFRRSALQRPQMKKLLIGVTGSQKIGMPMIIVACGIAKMFVGELVETARVVMAERK ESGPIRPCHIRESYRRLKLEGKVPKRSVPRLFR

Arabidopsis thaliana TAF11b At1g20000 MAFNARSCCFASSNERVTCNCNCLKDQPVPSVVGCATKKLAEFWSFKIQRYVIFVKVLL RMKHSKDPFEAAMEEQEESPVETEQTLEGDERAVKKCKTSVVAEAKNKDEVEFTKNITG ADPVTRANKMQKILSQFTEEQMSRYESFRRSGFKKSDMEKLVQRITGGPKMDDTMNIVV RGIAKMFVGDLVETARVVMRERKESGPIRPCHIRESYRRLKLQGKVPQRSVQRLFR

Drosophila melanogaster TAF11 NP_723484 MDEILFPTQQKSNSLSDGDDVDLKFFQSASGERKDSDTSDPGNDADRDGKDADGDNDNK NTDGDGDSGEPAHKKLKTKKELEEEERERMQVLVSNFTEEQLDRYEMYRRSAFPKAAVK RLMQTITGCSVSQNVVIAMSGIAKVFVGEVVEEALDVMEAQGESGALQPKFIREAVRRL RTKDRMPIGRYQQPYFRLN

Homo sapiens TAF11 NP_005634 MDDAHESPSDKGGETGESDETAAVPGDPGATDTDGIPEETDGDADVDLKEAAAEEGELE SQDVSDLTTVEREDSSLLNPAAKKLKIDTKEKKEKKQKVDEDEIQKMQILVSSFSEEQL NRYEMYRRSAFPKAAIKRLIQSITGTSVSQNVVIAMSGISKVFVGEVVEEALDVCEKWG EMPPLQPKHMREAVRRLKSKGQIPNSKHKKIIFF

Hordeum vulgare TAF11 TC81880 MKDPFEAAVEEQESPPDSPAPPEEGPATAVPHTIDEDYDGSAGAGGSRPPPPRPRPSAL AAPSTSAAPAAAKAKVRPQKEQDDDDDEEDPMEVDLDKLPSGTSDPDKLAKMNALLSQF TEDQMNRYESFRRSGFQKSNMKKLLASITGSQKISMPTTIVVSGIAKMFVGEVIETARI IMSERKDSGPIRPCHIREAYRRLKLEGKIPKRSVPRLFR

Medicago truncatula TAF11 TC80073 MAGGISFGIGLKRMKQSKDPFEAAFEESPPESPIETEPDPDASTENPNSTNSSLPQSTL THEEEHNHIKTPNSNNTITKHKDEEDDEEEDNMDVELAKFPTAGDPHKMAKMQAILSQF TEEQMSRYESFRRAGFQKANMKRLLTSITGTQKISIPITIAVSGIAKVFVGEVVETART IMKERKETGPIRPCHLREAHRRLKLEGKIFKRTTSRLFR

Oryza sativa TAF11 BAB90043 MATRIAQARKRQGRDRRSSTRTPLNRGQPDKATLLQLQPGLALQRSAQPKRGIIIGNDS GSLVRRASDEQPVEYSLSSPAKRKKKHDDICGSRRFIFTCMMYHTEYDSIYRFDSPVKI LAAATEKSAQKSGPRKRPTREAAHQARRRRAAAAMKDPFEAAVEEQESPPESPAANEED AAGAPEGYDGASGSRGPPLRLPPSRAAPSGSGGAAAAAARGKVVRVQKEQQEEEDDEED HMEVDLDKLPSGTSDPDKLAKMNAILSQFTEDQMNRYESFRRSGFQKSNMKKLLASITG SQKISLPTTIVVSGIAKMFVARIVMTERKDSGPQGNQSKQYVQAEVLRYY

Oryza sativa TAF11b TC124761 MKDPFEAAVEEQESPPESPAANEEDAAGAPEGYDGASGSRGPPLRLPPSRAAPSGSGGA AAAAARGKVVRVQKEQQEEEDDEEDHMEVDLDKLPSGTSDPDKLAKMNAILSQFTEDQM NRYESFRRSGFQKSNMKKLLASITGSQKISLPTTIVVSGIAKMFVGELVETARIVMTER KDSGPVRPCHIREAYRRLKLEGKIPRRTVPRLFR

Populus trichocarpa TAF11 MKQSKDPFEAAYVEQEESPPESPVAQDDYDTQASNAAAAADDSQGAVVGQDDDDLGGGG RNDFAHSSDHPSASRPMLGSARSKAKNKDDDEEEEEDNMDVELSKLASTADPDKMAKMQ FGNSRTEIFQELSSYVHSALHGRRASAPVHAYCKEYHAQTIASSIRPSGLKYFYNSLCK KG

Saccharomyces cerevisiae TAF11 NP_013697 MTEPQGPLDTIPKVNYPPILTIANYFSTKQMIDQVISEDQDYVTWKLQNLRTGGTSINN QLNKYPKYKYQKTRINQQDPDSINKVPENLIFPQDILQQQTQNSNYEDTNTNEDENEKL AQDEQFKLLVTNLDKDQTNRFEVFHRTSLNKTQVKKLASTVANQTISENIRVFLQAVGK IYAGEIIELAMIVKNKWLTSQMCIEFDKRTKIGYKLKKYLKKLTFSIIENQQYKQDYQS DSVPEDEPDFYFDDEEVDKRETTLGNSLLQSKSLQQSDHNSQDLKLQLIEQYNKLVLQF NKLDVSIEKYNNSPLLPEHIREAWRLYRLQSDTLPNAYWRTQGEGQGSMFR

Triticum aestivum TAF11 TC91943 MKDPFEAAVEEQDSPPDSPAPPEEDPATAVPHTAAEDYDGSAGAGGSRAPPPRPRPSAL AAPSTSVAPAAAKAKVRPHKEQDDDDDEEDPMEVDLDKLPSGTSDPDKLAKMNALLSQF TEDQMNRYESFRRSGFQKSNMKKLLASITGSQKISMPTTIVVSGIAKMFVGEVIETARI VMSERKDSGPIRPCHIREAYRRLKLEGKIPKRSVPRLFR

TFIIEα Sequences

Arabidopsis thaliana TFIIEα1 At1g03280 MEKSGPVQKAVVLQPFVKLVRLVARAFYDDYTTKSDNQQKSARSDNRGIAAVVLDALAR RQWVREEDLAKDLQLHAKQLRKIIRLFEEEKLIMRDHRKETAKGAKMYSAAVAATTDGR AEDKVKLHTHSYCCLDYAQICDVVRFRLHRMKKRLKDELEDKNTVQEYGCPNCQRKYNA LDALRLISMVDDSFHCENCNGELVVECNKLTSEEVVDGDDNARRRRRENLKNMLQKLEV QMKPLMDQLNRVKDLPIPEFGSFLAWEARAAMAARENGDLNPNDPLRSQGGYGSTPMPF LGETKVEVNLGDGNEDVKSKGGDSSLKVLPPWMIKEGMNLTEEQRGEMRQEAKVDGGAG AAAKLSDDKKSAIGNGDEKDLKDEYLKAYYAELMKQQELAARRNQQESAGEPTSGIQSG TVYSGRQVSMKAKREEDEDEDEEEVEWEEKAPVTANGNYKVDLNVEAEASGGEEEEEED DVDWEEG

Arabidopsis thaliana TFIIEα2 At4g20340 MDKSITVVRKTVVLEPFVKLVRLLVRIFYDNYTPESDNQQKSVKNVKGSAVIVLDALTR RQWVREEDLAKEVKRNAKELRKLIRHFEEQKFVMRYHRKETAKRAKMYSYAVGGTTDGR AEDNVKFHTHSYCCLDYAQIYDIVRYKLHRLKKKFKDELEDRNTVQEYGCPNCKRKYNA LDALRLISMEDDSFHCENCNGELVMECNKLISEEVVDRGDNARRRQREKVKVWLQDLEG ELKPLMELINRVKDLPFPAFEPFPAWEARAAKAARENGDFNPDDPSRSLGGYGSTPMPF LGETKVEVNLGEGNEDVTSTGGDSSLKMLPPWMIKQGMKLTEEQRGEMRQEANVDGEAA KLSDDKKSVMENGDDNKDLKDEYLKAYYAAIMEQQKLAAKLNEQESAGESTTTDIESAT TYSDRQVGMKSKREEEEEDVEWEEGASVAANGNYKVDLNVEAEEAEEKEDGDEDDDIDW EEG

Arabidopsis thaliana TFIIEα3 At4g20810 MVKLVAKTFYDNYTPKNNNQKKSAKNGSGGIAVLVLDALTRRQWVREEDLAKELKLNTK QLRTILRYFEEQQFIMRVHRKEKSSATTNGRGEDKVKVHMYSYCCLDYSQIYDVIRYKL HRMKKEFKDVLEDKDNVQEYGCPNCKRKIFFHCENCNGELVMECNKLTSEEVVVDGSDN PRSRRDHLKDLLQNMEVRLKPLMDHINRIKDLPVPSFESFPAWETRVAKAARENGDLNP DDTLRPQGGYGSTPMPFLGETEIEVNLGEENEDVKSDEVGDSSRRKLTPSWLIKKGMNL SDEQRGEIRHEAKADDGGSSMENGDDDRNLKDEYLKAYYAAILEEQELAEKLNQQESAG KVTTDIELATSSSDRQVGMKSKREEEEEEEASVAANGNYKVDLNVEAEEAEQDENDVDW QEC

Drosophila melanogaster TFIIEα NP_524026 MSSTSTAAANAAPAKTEVRYVTEVPSSLKQLARLVVRGFYSLEDALIIDMLVRNPCMKE DDIGELLRFEKKQLRARITTLRTDKFIQIRLKMETGPDGKAQKVNYYFINYKTFVNVVK YKLDLMRKRMETEERDATSRASFKCSSCSKTFTDLEADQLFDMATLEFRCTFCGSSVEE DSAAMPKKDSRLMLAHFNEQLQPLYDLLREVEGIKLAPEVLEPEPVDIDTIRGLNKPNA TRPDGMAWSGEATRNQGFAVEETRVDVTIGGDDTSDAVIERKSRPIWMTESTVITDTDA ADGAADAVQTASGSGHRNRKENEDIMSVLLQHEKQPGQKEPHMKGMRVGSSNANSSDSS DDEKDIENSKIPDVDFDNYINSDSAEEDDDVPTVLVAGRPHPLDQLDDNLIAQMTPQEK ENYIHVYQQHYSHIFE

Homo sapiens TFIIEα NP_005504 MADPDVLTEVPAALKRLAKYVIRGFYGIEHALALDILIRNSCVKEEDMLELLKFDRKQL RSVLNNLKGDKFIKCRMRVETAADGKTTRHNYYFINYRTLVNVVKYKLDHMRRRIETDE RDSTNRASFKCPVCSSTFTDLEANQLFDPMTGTFRCTFCHTEVEEDESAMPKKDARTLL ARFNEQIEPIYALLRETEDVNLAYEILEPEPTEIPALKQSKDHAATTAGAASLAGGHHR EAWATKGPSYEDLYTQNVVINMDDQEDLHRASLEGKSAKERPIWLRESTVQGAYGSEDM KEGGIDMDAFQEREEGHAGPDDNEEVMRALLIHEKKTSSAMAGSVGAAAPVTAANGDDS ESETSESDDDSPPRPAAVAVHKREEDEEEDDEFEEVADDPIVMVAGRPFSYSEVSQRPE LVAQMTPEEKEAYIAMGQRMFEDLFE

Hordeum vulgare TFIIEα TC90346 MGSLEPFNRLVRLTARAFYDDISIKGDTQAKTSRGDNRGMAVVVLDGLTRRQWVREEDL AKSLKLHSKQLRRVLRFFEEEKLVTRDHRKESAKGAKIYSAAAAAAGDGQPTKEGEEKV KLHTHSYCCLDYAQICDVVRYRIHRMKKTLKDELDSRNTVQHYICPNCKKRYSAFDALQ LISYTDEYFHCENCNGELLAESDKLSSEEMGDGDDNARKRRREKLNDMQQRIDEQLKPL QAQLKRVKDLPAPEFGSLQSWERLNLGAFAHGDSAAAEAARNAQGQYNGTPMPYLGDTK VDVELAGSGVKEEGAESGRDGTVLKVLPPWMVREGMNLTKEQRGESSNTSKGDEKSDVK DEKKQDSKEDEKSIQDEYLKAYYEAFKKKQEEEDAKRMQQEGQAFSSEIHSERQLGMKA KREDENVEDDGVEWEEEQPAGNASEEPYKFVDLNAEAPESGDEEDEIDWEEG

Methanosarcina acetivorans TFE NP_618742 MNTLVDLNDKVIRGYLISLVGEEGLRMIEEMPEGEVTDEEIAAKTGVLLNTVRRTLFIL YENKFAICRRERDSNSGWLTYLWHLDFSDVEHQLMREKKKLLRNLKTRLEFEENNVFYV CPQGCVRLLFDEATETEFLCPMCGEDLVYYDNSRFVSALKKRVDALSSV

Oryza sativa TFIIEα1 MGSMEPFNRLVRLAARAFYDDISMKGDNQPKTSRGDNRGMAVVVLDALTRRQWVREEDL AKALKLHSKQLRRILRFFEEEKLVTRDHRKESAKGAKIYSAAAAAAGDGQSITKEGEEK VKMHTHSYCCLDYAQICDVVRYRIHRMKKKLKDELDSRNTIQHYICPNCKKRYSAFDAL QLISYTDEYFHCENCNGELVAESDKLASEEMGDGDDNARKRRREKLKDMQQRIDEQLKP LQAQLNRVKDLPAPEFGSLQSWERANIGAFGTADPSAADSSRNPQGQYGTPMPYLGETK VEVALSGTGVKDEGAESGTNGNGLKVLPPWMIKQGMNLTKEQRGETSNSSNLDEKSEVK DEKKQDSKEDEKSIQDEYIKAYYEALRKRQDEEEAKRKIQQEGDTFASASHSERQVGMK SKREDDDEGVEWEEEQPAGNTAETYKLADLNVEAQESGDEEDEIDWEEG

Oryza sativa TFIIEα2 MVYDVVRYRIHRMRKKLKDGLDDRDTVQHYVCPNCKRRYSAFDALQLVSDMDDYFHCEH CKGELRPESEKLTLDEIVCGGGNAIKHTHDKLKDMQQRMEEQLKPLIAVLDRVKDLPFP SFMSLQDWERATIGASANGAVGSSQNSEGRYSSKPMPFLGETEVEVNFLGSTGAQEGVE SGMESIKPQHSWMNRKRTVLAGEHKEENNNTANLDQSSEAKSDKKQLSEEDEMKSIQEA YAKAYYEAIQKRQEDEGKRAIQEESLACISDQPFASDAQFERRLGAKSKRDDGGESGDD GIELKVRQSTGNIEEVYKFADLNVETQELVEKNCIPPAE

Oryza sativa TFIIEα3 MDTMEQLNRLVRMVARGFYEDVSLEEDQSKPNGSGSCGIVVVVLDALTRQQWVREEDLA RSLMIPFNRLRQITHFLEQQKLVRRYYRKEAIHDASISTASPSHVSHDAHLVPTNVAGK LKMIMQPYCCLHYGQVYDVTLYRIHEMKKKLKDELDGNYMIQNYVCPNCERRYSSLNAL DLVSHIDNNFHCKHCNEELSQDFGDLAWGGRGGDGDNARRDRHAKLKDFLQRMEHQMER LISQLNKVKDLDFPEFLALETWERNMREPAGGDDVSRPMLFLGEVMSHEHQKGSASCID ADEEIFEFRVQDARPIPSFVIRKDINHTEDKEEQL

Oryza sativa TFIIEα4 MSINERLVKCAAQLLYGNVGFKAGEVRIDCDENRGVVVMVLDALTRYQWVPDTHLAKSL KVQKKKLCLILEFLEKQMFVRRCEVKAKTGRNVSNTATTAGVSAIPRNEKVKSKHPKWY CCINYAKICSVVRYHIMQMEANLKSQLENTNTVDKYTCPNCGKSFSAFDVKDLVSCTDG NFYCESCKHELVACSEYGNYNEREGRSANLLDFLENMKEKLRPLKTKLDLLEDLPAPDF GSTPDFKGTYNISDWSRTSVPLPEPTNGDDSFSSPCAKDDESDAGVSELKILPSWLIRK GMKLKQAHLSNSSTVCGEGGTNIQEEYMKAYYEAIQKRQEDRIRHSGQSSVPGGPSVSS ERPMGVKRQKLCNDINNNALECQGEEPPGDTFRT

Populus trichocarpa TFIIEα2 MDMNTTISVEPFKRLVKLAARAFYDDVSTKGENQSKNNARGDNKGIAVVVLDALTRRLW VNEEGLAKDLKIHIKQLRRILRLFEEDKLLTRAHRKETAKVTKKPNAGGADSQRKFGSR EDDKNKLHTHSYCCLDYAQIYDVVRYRLHRMRKMIKDELENNNAVQQYICPICERRYNA LDALRLISLVDEDFHCENCDGVLVAESDKLAAQEGGDGDDNARKRRREKLKDMLQNMEC YFMVPNFDFESINCKNWPARFWLAKVQLKPLMDQLSRVKDLPIPEIGSLQAWQLHENAA GRATNGDPNSDDHFKYSQGPGYGGTPMPFLGETKVEVAFAGDESKENIKSETASTSLKV LPPWMIKQGMNLTKEQRGEVKQESKMDSSSTAVEFSDEKKSAKVNGDSIKEEYVKAYYA ALLEQQRQAEESAKQQQELSQTSMSNGLSESSSNRQVGMKSKREEGEGDDDVEWEEAPI EGKSNNWNLIALLSY

Populus trichocarpa TFIIEα1 MAEFGSKLVNKFEESPRGTTAFIKINEAHTEVKKELVKLAARAFYDDITTKGDNQPKTG RSDNRGIAVVVLDALTRRQWVREEDLAKELKLHSKQLRRTLRFFEEEKLVTRDHRKETA KAAKMHNAAVANTTDGHRTKEGDDKIKMHTHSYCCLDYAQIYDVVRYRLHRMRKKLKDE LEDKNTVQEYTCPNCGRRYNALDALRLMSLVDEYFHCENCDGELVAESDKLAAQEGGDG DDNARRRRREKLKDMLQKMEDASNLFLFKCYLLLMKACYRVIEEVLGRRFIFSMTGQIE MARVKDLPVPEFGSLQEWQIHASAAGRAANGDSSYNDPSRSSQGYGGTPMPFLGETKHR VEFNASKRCQLRHDQEKDSSTKGRVEVSFSGVEGKEDLKSETASTGLKVLPPWMIKQGM NLTKEQRGEVKQGSKMDDSSAAAEPPDDKKISIENDDKIKDEYVKAYYAALLQKQREAE ESAEKQQELLQTSISNGFSKSSSDRQVGMKSKREEDDEPDDDVEWEEAPIGGMSYLSME WDPLQSY

Saccharomyces cerevisiae TFIIEα NP_012897 MDRPIDDIVKNLLKFVVRGFYGGSFVLVLDAILFHSVLAEDDLKQLLSINKTELGPLIA RLRSDRLISIHKQREYPPNSKSVERVYYYVKYPHAIDAIKWKVHQVVQRLKDDLDKNSE PNGYMCPICLTKYTQLEAVQLLNFDRTEFLCSLCDEPLVEDDSGKKNKEKQDKLNRLMD QIQPIIDSLKKIDDSRIEENTFEIALARLIPPQNQSHAAYTYNPKKGSTMFRPGDSAPL PNLMGTALGNDSSRRAGANSQATLHINITTASDEVAQRELQERQAEEKRKQNAVPEWHK QSTIGKTALGRLDNEEEFDPVVTASAMDSINPDNEPAQETSYQNNRTLTEQEMEERENE KTLNDYYAALAKKQAKLNKEEEEEEEEEEDEEEEEEEEMEDVMDDNDETARENALEDEF EDVTDTAGTAKTESNTSNDVKQESINDKTEDAVNATATASGPSANAKPNDGDDDDDDDD DEMDIEFEDV

Solanum tuberosum TFIIEα TC67033 MSIEPFNRLVKLAARAFYDDITTKGDNQPKSGRSDNRGIAVVILDALTRRQWVREEDLA KDLKLHTKQLRRTLRFFEEEKLITRDHRKEGAKGAKVYNSAVAATVDGLQNGKEGDDKI KMHTHSYCCLDYAQIYDVVRYRLHRMKKKLRDELDNKNTVQEYICPNCGKRYTALDALR LISPVDEYFHCESCNEELVAESDKLASQGTTDGDDNDRRRRREKLEDMLHRVEAQLKPL MDQLARVKDLPAPEFGSLQAWEVRANAVARGANGDNANDSKSGQGLGFGGTPMPFVGET KVEVAFSGLEEKGDIKSEVSVTPMKVLPPWMIKEGMNLTKEQRGEVKQESNMEGTSTAA GLSDDKKSIGFEDVKNIQDEYIKAYYEALFKRQKEQEEATKMLPETSTTDGVYNTSTER QVGMKSKREEEDEGEDVEWEEAPPAGNTTTGNLKVDLNVQADASEDDNDEEDDIDWEEG

Sulfolobus solfataricus TFE NP_341815 MVNAEDLFINLAKSLLGDDVIDVLRILLDKGTEMTDEEIANQLNIKVNDVRKKLNLLEE QGFVSYRKTRDKDSGWFIYYWKPNIDQINEILLNRKRLILDKLKTRLEYEKNNTFFICP QDNSRYSFEEAFENEFKCLKCGSQLTYYDTDKIKSFLEQKIRQIEEEIDKETKLGANKN H

TFIIEβ Sequences

Arabidopsis thaliana TFIIEβ1 At4g21010 MALREQLDKFNKQQEKCQSTLSSISSSRTALSRSYVPAATTSQKPNVFRGKFSENTKQL QHITNIRNSAVGAQMKIVIDLLFKTRLAYTAEQINEACYVDMHNNKAVFDSLRKNPKVH YDGRRFSYKATHNIKDKKQLLSFVNKSDKVIDVSDLKDAYPNVMEDLKSLKSSGEIFWL LSNTDSKEGTVYRNNMEYPKIDDELKALFRDIIPSDMLEVEKELLKIGLKPATNIAERR AAEQLHGVSNKPKDKKKKKKEITNRTKLTNSHMLELFQS

Arabidopsis thaliana TFIIEβ2 At4g20330 MALKEQLDKFNKQQVKCQSTLSSIASSRERTSSSRQNVPLPAAITQKKPDAAPVKFSSD TERLQNINNIRKAPVGAQIKRVIDLLYERRLALTPEQINEWCHVDMHANKAVFDSLRKN PKAHYDGRRFSYKATHDVNDKNQLLSLVRKYLDGIAVVDLKDAYPNVMEDLKALSASGD IYLLSNSQEDIAYPNDFKCEIKVDDEFKALFRDINIPNDMLDVEKELLKIGLKPATNTA ERRAAAQTHGISNKPKDKKKKKQEISKRTKLTNAHLPELFQNLNGSSSRN

Drosophila melanogaster TFIIEβ NP_523923 MDPALLREREAFKKRAMATPTVEKKSKPDRPAPPPPSDDSRRKMRPPNAPRLDATTYKT MSGSSQYRFGVLAKIVKFMRTRHQDGDDHPLTIDEILDETNQLDIGQSVKNWLASEALH NNPKVEASPCGTKFSFKPVYKIKDGKTLMRLLKQHDLKGLGGILLDDVQESLPHCEKVL KNRSAEILFVVRPIDKKKILFYNDRTANFSVDDEFQKLWRSATVDAMDDAKIDEYLEKQ GIRSMQDHGLKKAIPKRKKAANKKRQFKKPRDNEHLADVLEVYEDNTLTLKGVNPT

Glycine max TFIIEβ TC192062 MTLQEKLDKFKKQQEKCQTTLSSIAASKAAATQKSAAHGSANGRNAAPAVKFSNDTERL QHINSIRKAPVGAQMKRVIDLLLETRQAFTPEQINGACYVDMKANKDVFENLRKNPKVN YDGQRFSYKSKYGLKDKTELLQLIRKYPEGLAVIDLKDAYPTVMEDLQAMKAAGQIWLL SNFDSQEDIAYPNDPKVHIKVDDDLKHLFRSIELPRDMIDIEKDLQKNGMKPATNTAQR RSAAQIQGISSKPKPKKKKSEISKRTKLTNAHLPELFQNLNSS

Helianthus annuus TFIIEβ TC9497 MGSLRESLNRFKQQQEKCQSTLTSIAAGSKTSNRTTTPAPRVAPAASTLAKNPVPAVKF SNDTERLQHINNVRKSPVGAQIKKVIDLLFESRQAFTAEQINEACYVDVKGNKAVFESL AKNPKVNYDGKRFSYKSKHNVRDQKELLRLIRTFAEGIAVADLKDAYPTVMEDLQALKA GRQIWLLSNFDSQEEIAYPNDPRVPIKVDDELKQLFRSIELPRDMLDIERDLQKNGMKP ATNTAKRRVDGSKWQYFE

Homo sapiens TFIIEβ NP_002086 MDPSLLRERELFKKRALSTPVVEKRSASSESSSSSSKKKKTKVEHGGSSGSKQNSDHSN GSFNLKALSGSSGYKFGVLAKIVNYMKTRHQRGDTHPLTLDEILDETQHLDIGLKQKQW LMTEALVNNPKIEVIDGKYAFKPKYNVRDKKALLRLLDQHDQRGLGGILLEDIEEALPN SQKAVKALGDQILFVNRPDKKKILFFNDKSCQFSVDEEFQKLWRSVTVDSMDEEKIEEY LKRQGISSMQESGPKKVAPIQRRKKPASQKKRRFKTHNEHLAGVLKDYSDITSSK

Hordeum vulgare TFIIEβ TC102892 MDLKDSLSRFKQQQERCQSSLASIAASQASTTKPKHRAQPINAQSAPARPAQPIKFSND TERLQHINSIRKSPIGAQIKLVIELLYKTRQAFTAEQINDETYVDINGNKAVFESLRNN LKVHYDGRRFSYKSKHDLEGKDQLLELIRCHQEGLAVVEVKDAYPSVLEDLQALKAAGE VWLLSNMDSQEDIVYPNDPKVKIKVDDDLKELFRGIELPRDMVDIEKDLQKNGMKPMTD TTKRRAAAQIHGVKPKAKPKKKQREITKRTKLTNAHLPELFQHLKS

Hordeum vulgare TFIIEβ TC89335 MALNERLSKFKQQQERCQTTLSSIAATQASTTKSHNAPRSRPANAPSAPAKQIQAIKFS NDTERLQHINSVRKSPVGAQIKLVIELLYKTRLAYTAEQINEATYVAINSNKAVFDSLT NNPKVQFDGKRFSYKSKHDLKGKDQLLHLIRRFPEGLPVVEVKDSYPTVLDDLQALKAS GDVWWLSSMDSQEDIVYPNDPKSKIKLDADLKQLYREIELPRDMIDIEKELLKNGHKPA TDTTKRRAAAQIHGQRPKPKAKKKQKEITKRTKLTNAHLPELFDLPR

Lycopersicon esculentum TFIIEβ TC116522 MASLQESLQRFKKQQEKCQAITSMAARAGPSKGAPPRPANAKPPAPAVKFSNDTERLQH INTIRKGPVGSQMKRVIDLLLETRQAFTPEQINEACYVDLIGNKPVFDSLRKNVKVYYD GNRFSYKSKHALKNKEQLLILIRKFPEGIAVIDLKDAYPTVMEDLQALKGAGQIWLLSN FDSQEDIAFPNDPRVPIKVDDDLKQLFRGIELPRDMLDIERDLQKNGMKPATNTAKRRA MAQVHGIAPKPKTKKKKHEISKRTKLTNAHLPELFKL

Medicago truncatula TFIIEβ TC77471 partial MALQGKLDRFKKQQEKCQSTLSSIAANKAVSASVPNALAPVKFSTDTERLQHINSIRKA PVGAQMKRVIDLLFETRQALTLEQINETCHVDMKANKDVFDNMRKNPKVRYDGERFSYK SKHALRDKKELLFLIRKFPEGIAVIDLKDSYPTVMEDLQALKGGREIWLLSNFDSQEDI AYPNDPKVPIKVDDDLKQLFRGIELPRDMIDIERDLQKNGMKPATNTAKRRSAAQMEGI SSKPKPKKKKNEITKRTKLTNAHLPE

Oryza sativa TFIIEβ AAM01137 MDLKDSLSKFKQQQERCQSSLASIAASTSKPKHRAQPVNAPSAPARPLQPIKFSNDTER LQHINSVRKSPIGAQIKLVIELLYKTRQAFTAEQINETTYVDIHGNKSVFDSLRNNPKV HYDGRRFSYKSKHDLKGKDQLLVLVRKYPEGLAVVEVKDAYPTVMEDLQALKAAGEVWL LSNMDSQEDIVYPNDPKAKIKVDDDLKQLFREMELPRDMVDIEKELQKNGIKPMTNTAK RRAAAQINGVQPKAKPKKKQREITRRTKLTNAHLPELFQNLNT

Oryza sativa TFIIEβ TC151474 MDLKDSLSKFKQQQERCQSSLASIAASTSKPKHRAQPVNAPSAPARPLQPIKFSNDTER LQHINSVRKSPIGAQIKLVIELLYKTRQAFTAEQINETTYVDIHGNKSVFDSLRNNPKV HYDGRRFSYKSKHDLKGKDQLLVLVRKYPEGLAVVEVKDAYPTVMEDLQALKAAGEVWL LSNMDSQEDIVYPNDPKAKIKVDDDLKQLFREMELPRDMVDIEKELQKNGIKPMTNTAK RRAAAQINGVQPKAKPKKKQREITRRTKLTNAHLPELFQNLNT

Populus trichocarpa TFIIEβ1 MALQEQLDRFKKQQEKCQSTLTSIAKSRPSKSSLTQKTVAVAPAPSTSARTPAPAVKFS NDTERLQHINSIRKAPAGAQIKRVIDLLLETRQAFTPEQINDHCYVDMNSNKAVFDSLR NNPKVHYDGKRFSYKSKHDLKDKSQLLVLIRKFPEGIAVIDLKDSYPSVMDDLQALKAV GQIWLLSNFDSQEDIAYPNDPRMVIKVDDDLKQLFRGIELPRDMLDIEKDLQKNGMKPA TNTAKRRAAAQVQGISTKQKAKKKKHEISKRTKLTNAHLPELFKNLGS

Saccharomyces cerevisiae TFIIEβ NP_012988 MSKNRDPLLANLNAFKSKVKSAPVIAPAKVGQKKTNDTVITIDGNTRKRTASERAQENT LNSAKNPVLVDIKKEAGSNSSNAISLDDDDDDEDFGSSPSKKVRPGSIAAAALQANQTD ISKSHDSSKLLWATEYIQKKGKPVLVNELLDYLSMKKDDKVIELLKKLDRIEFDPKKGT FKYLSTYDVHSPSELLKLLRSQVTFKGISCKDLKDGWPQCDETINQLEEDSKILVLRTK KDKTPRYVWYNSGGNLKCIDEEFVKMWENVQLPQFAELPRKLQDLGLKPASVDPATIKR QTKRVEVKKKRQRKGKITNTHMTGILKDYSHRV

Solanum tuberosum TFIIEβ TC60506 MASLQESLQRFKKQQEKCQAISSMAARAGPSKGAPPRPANAKPPAPAVKFSNDTERLQH INSIRKGPVGAQIKRVIDLLLETRQAFTPEQINEACYVDINGNKAVFDSLRNNLKVYYD GNRFSYKSKHALKNKEQLLILIRKFPEGIAVIDLKDAYPTVMEDLQALKGAGQIWLLSK FDSQEDIAFPNDPRVPIKVDDDLKQLFRSIELPRDMLDIERDLQKNGMKPATNTAKRRA MAQVHGIVPKPKTKKKKHEISKRTKLTNAHLPELFKL

Sorghum bicolor TFIIEβ TC59949 MDLKDSLSRFKQQQERCQSSLASIAASSSKPKHRAQPAHAPNVPARPSQPVKFSNDTER LQHINSIRKSPVGAQIKLVIELLYKTRQAFTAEQINDATYVDIHGNKAVFDSLRNNPKV SYDGRRFSYKSKHDLKGKDQLLVLIRKFPEGLAVVEVKDAYPNVLEDLQALKAAGEVWL LSNMDSQEDIVYPNDPKAKIKVDDDLKQLFREIELPRDMVDIEKELQRNGFKPMTNTAK RRAAAQINGVKPKAKPKKKQREITKRTKLTNAHLPELFQNLNT

Sorghum bicolor TFIIEβ TC67168 MALNDRLNKFKQQQERCQNTLSSIFASQTSISTSKHVPGIQPVNAPLAPIKPLHPIKFS NDTERLQHINSVRKSAVGVQIKLVVELLYKTRQSFTAKQVNEATYVDIHGNKAVSDSLR NNPKVLFDGTRFSYKPKHILTGRDELLGLIKEKECGLPVEDIKDAYPSVLEDLQALKAS GDVWWLSSTQSQEDMAYFNDPRYNITVDNDLKELFLKTELPRDMLDVEKEIKKSGEKPM TNTTKRRALAQILDAAPKTKTKGSKKKQRRLTGKSKGLTNIHMPELFDA

Triticum aestivum TFIIEβ TC110564 MDLKDSLSRFKQQQERCQSSLASIAASQASTTKPKHRAQPINAPSAPARPAQPIKFSND TERLQHINSIRKSPVGAQIKLVIELLYKTRQAFTAEQINDATYVDINANKAVFDSLRNN LKVQYDGRRFSYKSKHDLEGKDQLLDLIRCHQEGLAVVEVKDAYPSVLEDLQALKAAGE VWLLSNMDSQEDIVYPNDPKVKIKVDDDLKELFRGIELPRDMVDIEKELQKNGMKPMTD TTKRRAAAQIHGVKPKAKPKKKQREITKRTKLTNAHLPELFQHLKS

Triticum aestivum TFIIEβ TC129305 MALNERLSKFKQQQERCQTTLSSIAATQASTTKSHNAPRSRPANAPSAPAKQIQAIKFS NDTERLQHINSVRKSPVGAQIKLVIELLYKTRLAYTAEQINEATYVAINSNKAVFDSLT NNPKVQFDGKRFSYKSKHDLKGKDQLLHLIRRFPEGLPVVEVKDSYPTVLDDLQALKAS GDVWWLSSMDSQEDIVYPNDPKSKIKVDADLKQLYREIELPRDMIDIEKELLKNGHKPA TDTTKRRAAAQIHGQRPKPKAKKKQKEITKRTKLTNAHLPELFDLPR

Zea mays TFIIEβ TC209727 MDLKDSLSKFKQQQERCQSSLASIAASTSKPKHRAQPAHAPNVPARPSQPIKFSNDTER LQHINSIRKSPVGAQIKLVIELLYKTRQAFTAEQINEATYVDIHGNKAVFDSLRNNPKV SYDGRRFSYKSKHDLKGKDQLLVLIRKFPEGLAVVEVKDAYSNVLEDLQALKAAGEVWL LSNMDSQEDIVYPNDPKAKIKVDDDLKQLFREIELPRDMVDIEKELQKNGFKPMTNTAK RRAAAQINGVKPKAKPKKKQREITKRTKLTNAHLPELFQNLNT

TFIIFα Sequences

Arabidopsis thaliana TFIIFα At4g12610 MSNCLQLNTSCVGCGSQSDLYGSSCRHMTLCLKCGRTMAQNKSKCHECGTVVTRLIREY NVRAAAPTDKNYFIGRFVTGLPNFKKGSENKWSLRKDIPQGRQFTDAQREKLKNKPWIL EDETGQFQYQGHLEGSQSATYYLLVMQNKEFVAIPAGSWYNFNKVAQYKQLTLEEAEEK MKNRRKTADGYQRWMMKAANNGPALFGEVDNEKESGGTSGGGGRGRKKSSGGDEEEGNV SDRGDEDEEEEASRKSRLGLNRKSNDDDDEEGPRGGDLDMDDDDIEKGDDWEHEEIFTD DDEAVGNDPEEREDLLAPEIPAPPEIKQDEDDEENEEEEGGLSKSGKELKKLLGKANGL DESDEDDDDDSDDEEETNYGTVTNSKQKEAAKEEPVDNAPAKPAPSGPPRGTPPAKPSK GKRKLNDGDSKKPSSSVQKKVKTENDPKSSLKEERANTVSKSNTPTKAVKAEPASAPAS SSSAATGPVTEDEIRAVLMEKKQVTTQDLVSRFKARLKTKEDKNAFANILRKISKIQKN AGSQNFVVLREKCQPKPGKRESRVNKLNIRSNLQPRKMELVTEDEIRKVLMEKKQLTTL ELVMRFKERLTTTEDKDSFSHILKKIAKLQKNPGSEKFVVVLRDNVTPLASDLTRLSIS

Drosophila melanogaster TFIIFα NP_524246 MSSASKSTPSAASGSSTSAAAAAAASVASGSASSSANVQEFKIRVPKMPKKHHVMRFNA TLNVDFAQWRNVKLERENNMKEFRGMEEDQPKFGAGSEYNRDQREEARRKKFGIIARKY RPEAQPWILKVGGKTGKKFKGIREGGVGENAAFYVFTHAPDGAIEAYPLTEWYNFQPIQ RYKSLSAEEAEQEFGRRKKVMNYFSLMLRKRLRGDEEEEQDPEEAKLIKAATKKSKELK ITDMDEWIDSEDESDSEDEEDKKKKEQEDSDDGKAKGKGKKGADKKKKKRDVDDEAFEE SDDGDEEGREMDYDTSSSEDEPDPEAKVDKDMKGVAEEDALRKLLTSDEEEDDEKKSDE SDKEDADGEKKKKDKGKDEVSKDKKKKKPTKDDKKGKSNGSGDSSTDFSSDSTDSEDDL SNGPPKKKVVVKDKDKEKEKEKESAASSKVIASSSNANKSRSATPTLSTDASKRKMNSL PSDLTASDTSNSPTSTPAKRPKNEISTSLPTSFSGGKVEDYGITEEAVRRYLKRKPLTA TELLTKFKNKKTPVSSDRLVETMTKILKKINPVKHTIQGKMYLWIK

Homo sapiens TFIIFα/RAP74 NP_002087 MAALGPSSQNVTEYVVRVPKNTTKKYNIMAFNAADKVNFATWNQARLERDLSNKKIYQE EEMPESGAGSEFNRKLREEARRKKYGIVLKEFRPEDQPWLLRVNGKSGRKFKGIKKGGV TENTSYYIFTQCPDGAFEAFPVHNWYNFTPLARHRTLTAEEAEEEWERRNKVLNHFSIM QQRRLKDQDQDEDEEEKEKRGRRKASELRIHDLEDDLEMSSDASDASGEEGGRVPKAKK KAPLAKGGRKKKKKKGSDDEAFEDSDDGDFEGQEVDYMSDGSSSSQEEPESKAKAPQQE EGPKGVDEQSDSSEESEEEKPPEEDKEEEEEKKAPTPQEKKRRKDSSEESDSSEESDID SEASSAFFMAKKKTPPKRERKPSGGSSRGNSRPGTPSAEGGSTSSTLRAAASKLEQGKR VSEMPAAKRLRLDTGPQSLSGKSTPQPPSGKTTPNSGDVQVTEDAVRRYLTRKPMTTKD LLKKFQTKKTGLSSEQTVNVLAQILKRLNPERKMINDKMHFSLKE

Oryza sativa TFIIFα TC148835 MGSADLVLKAACEGCGSPSDLYGTSCKHTTLCSSCGKSMALSGARCLVCSAPITNLIRE YNVRANATTDKSFSIGRFVTGLPPFSKKKSAENKWSLHKEGLQGRQIPENMREKYNRKP WILEDETGQYQYQGQMEGSQSSTATYYLLMMHGKEFHAYPAGSWYNFSKIAQYKQLTLE EAEEKMNKRKTSATGYERWMMKAATNGPAAFGSDVKKLEPTNGTEKENARPKKGKNNEE GNNSDKGEEDEEEEAARKNRLALNKKSMDDDEEGGKDLDFDLDDEIEKGDDWEHEETFT DDDEAVDIDPEERADLAPEIPAPPEIKQDDEENEEEGGLSKSGKELKKLLGKAAGLNES DADEDDEDDDQEDESSPVLAPKQKDQPKDEPVDNSPAKPTPSGHARGTPPASKSKQKRK SGGGDDSKASGGAASKKAKVESDTKPSVAKDETPSSSKPASKATAASKTSANVSPVTED EIRTVLLAVAPVTTQDLVSRFKSRLRGPEDKNAFAEILKKISKIQKTNGHNYVVLRDDK K

Populus trichocarpa TFIIFα1 MSFDLLLKPSCSGCGSTTDLYGSNCKHMTLCLNCGKTMAENRGKCFDCGTTEYNVRAST SSDKNYFIGRFVTGLPSFSKKKNAENKWSLHKEGILGRQITDALREKFKNKPWLLEDET GQSQYQGHLEGSQSATYYLLMMTGKEFVAIPAGSWYNFNKVAHYKQLTLEEAEEKMKNR RKTADGYERWMMKAANNGAAAFGEVEKVDDKEGVSAGGRGGRRKASGDDDEGNVSDRGE EDEEEEAGRKSRLGLNKQGGDDDEEGPRGGDLDMDDDDIEKGDDWEHEEIFTDDDEAVA IDPEEREDLAPEVPAPPEIKQDEDDEDEENEEGGLSKSGKELKKLLGKANGLNESDVED DDDDEDMDDDISPVLAPKQKDVVPKEEAADISPAKPTPSGSAKGTPSTSKSAKGKRKLN GEDAKSSNGAPVKKVKTENEVKPAVKEESSPATKGTATPKVTPPSSKTGSTSGSTGPVT EEEIRAVLLQNGPVTTQDLVARFKSRLRTPECFTADYSLGLSVRLCMLLRICVVHDGIN TISGVWVAKFLHRTWGFYQFNKGWVGGTG

Saccharomyces cerevisiae TFIIF-L/α AAA61640 MSRRNPPGSRNGGGPTNASPFIKRDRMRRNFLRMRMGQNGSNSSSPGVPNGDNSRGSLV KKDDPEYAEEREKMLLQIGVEADAGRSNVKVKDEDPNEYNEFPLRAIPKEDLENMRTHL LKFQSKKKINPVTDFHLPVRLHRKDTRNLQFQLTRAEIVQRQKEISEYKKKAEQERSTP NSGGMNKSGTVSLNNTVKDGSQTPTVDSVTKDNTANGVNSSIPTVTGSSVPPASPTTVS AIESNGLSNGSTSAANGLDGNASTANLANGRPLVTKLEDAGPAEDPTKVGMVKYDGKEV TNEPEFEEGTMDPLADVAPDGGGRAKRGNLRRKTRQLKVLDENAKKLRFEEFYPWVMED FDGYNTWVGSYEAGNSDSYVLLSVEDDGSFTMIPADKVYKFTARNKYATLTIDEAEKRM DKKSGEVPRWLMKHLDNIGTTTTRYDRTRRKLKAVADQQAMDEDDRDDNSEVELDYDEE FADDEEAPIIDGNEQENKESEQRIKKEMLQANAMGLRDEEAPSENEEDELFGEKKIDED GERIKKALQKTELAALYSSDENEINPYLSESDIENKENESPVKKEEDSDTLSKSKRSSP KKQQKKATNAHVHKEPTLRVKSIKNCVIILKGDKKILKSFPEGEWNPQTTKAVDSSNNA SNTVPSPIKQEEGLNSTVAEREETPAPTITEKDIIEAIGDGKVNIKEFGKFIRRKYPGA ENKKLMFAIVKKLCRKVGNDHMELKKE

Triticum aestivum TFIIFα TC106270 MGSVDLVLKPACEGCGSTSDLYGTGCKHTTLCSSCGKSMALSRARCLVCSAPITNLIRE YNVRANASTDKAFSIGRFVTGLPPFSKKKNAENKWSLHKEGLQGRQLTDKMLEKYNRKP WILEDETGQYQFQGHMEGSQSATATYYLLMLHGKEFHAFPAGSWYNFSKVAQYKQLTLE EAEEKMNKRKTSATGYERWMMKAATNGPAAFGSDMMKLEPANDGEKESARHKKGKDNEE GNNSDKGEENEEEEAARKDRLGLSKRGMDDDEEGGKDLDFDLDDDIEKGDDWEHEETFT DDDEAVDIDPEERADLAPEIPAPPEIKQDDEENEEEGGLSKSGKELKKLLGRSSGQNES DADDDDEEDDQDDESSPVLAPKQTDQPKDEPVDNSPAKPTPSSGHARSTPPASKSKQKR KSGGDDAKASSGAASKKAKVESDTKTSSIKEETPSSSKPTPKASASSRSANVSPVTEDE IRTVLLAVAPVTTQDLVSRFKSRLRGPEDKNAFAEILKKISKIQKTNGHNYVVLREDKK

TFIIFβ Sequences

Arabidopsis thaliana TFIIFβ1 At3g52270 MEDVKVEMKVRKNENEALETGLAERSMLLMKAPSLVASSLQSHSFPDDPYRPDDPYRPD AKVILGVDPLAHEDEGTQLFRVSSNHSGKFHPLRNLLLHSLKFHGFGEMGFLSLEISSG PHFDHEIPCNLRIGLLSMNFLARHLNYEFLGVKHGNSFALQFVMELARADSGNMPRRYT LDMSKDFIPMNVFCESSDDFGSLGEEFSIGMFIYSPGKMSVEGKIKNKFDMRPHNENIE SYGRLCRERTNKYMGKNRQIQVIDNARGMHMRPMPGMIIPTAAPEKKKLTNRTSEMKRT RRDRREMEEVMFNLFERQSNWTLRLLIQETDQPEQFLKDLLKDLCIYNNKGSNQGTYEL KPEYKKATQE

Arabidopsis thaliana TFIIFβ2 At1g75510 MEDIHNLDIEKSDRSIWLMKCPVVVDKAWHKIAASSSSSFASSDSPPDMAKIVREVDPL RDDSPPEFKMYMVGAEYGNMPKCYALNMFTDFVPMGGFSDVNQGCAAAEGKVDHKFDMK PYGETIEEYARLCRERTSKAMVKNRQIQVIDNDRGVHMRPMPGMLGLVSSNSKEKRKPP PVKQTEVKRTRRDRGELEAIMFKLFEGQPNWTLKQLVQETDQPAQFLKEILNELCVYNK RGSNQGTYELKPEYKKSAEDDTGGQ

Drosophila melanogaster TFIIFβ NP_524305 MSKEDKEKTQIIDKDLDLSNAGRGVWLVKVPKYIAQKWEKAPTNMDVGKLRINKTPGQK AQVSLSLTPAVLALDPEEKIPTEHILDVSQVTKQTLGVFSHMAPSDGKENSTTSAAQPD NEKLYMEGRIVQKLECRPIADNCYMKLKLESIRKASEPQRRVQPIDKIVQNFKPVKDHA HNIEYRERKKAEGKKARDDKNAVMDMLFHAFEKHQYYNIKDLVKITNQPISYLKEILKD VCDYNMKNPHKNMWELKKEYRHYKTEEKKEEEHKSGSSDSE

Glycine max TFIIFβ TC178154 MDEENGYSGSISSNLETTKAERSVWLMKCPLVVAKSWQTHPPSQPLAKVVLSLDPLHPE EDDPSAVQFTMEMAGTEAVNMSKTYSLNMFKDFVPMCVFSETSQGGKVAMEGKVEHKFD MKPHGENIEEYGKLCRERTNKSMIKNRQIQVIDNDRGVLMRPMPGMIGLVSSNSKDKKK TQPVKQSDTKRTRRDRGELEDIMFKLFERQPNWALKQLVQETDQPAQFLKEILNELCVY NKRGANQGTYELKPEYKKSVEDTSAE

Homo sapiens TFIIFβ NP_004119 MAERGELDLTGAKQNTGVWLVKVPKYLSQQWAKASGRGEVGKLRIAKTQGRTEVSFTLN EDLANIHDIGGKPASVSAPREHPFVLQSVGGQTLTVFTESSSDKLSLEGIVVQRAECRP AASENYMRLKRLQIEESSKPVRLSQQLDKVVTTNYKPVANHQYNIEYERKKKEDGKRAR ADKQHVLDMLFSAFEKHQYYNLKDLVDITKQPVVYLKEILKEIGVQNVKGIHKNTWELK PEYRHYQGEEKSD

Hordeum vulgare TFIIFβ TC103743 MGDEAKYLETARADRSVWLMKCPPVVSQAWQGASASSGDANPNPVVAKVVLSLDPLSSA EPSLQFKMEMSQTSVASTCNLPKSYSLNMFKDFVPMCVFSETNQGKLSCEGKVEHKFDM EPHKDNLLNYAKLCRERTQKSMVKTRKVQVLDNDHGMSMRPMPGMVGLISSSSKEKRKP TPTKPSDVKRTRRDRRELENIIFKLFEKQPNWALKALVQETDQPEQFLKEILNDLCMYN KRGPNQGTHELKPEYKKSSEDAAGAP

Medicago truncatula TFIIFβ TC78885 MEDENSYGGSSGGSNLETSKAERSVWLMKCPVAVAKSWQNHPPSQPLSKVVFSIDPLLP EDDPAHLQFTMEMSGTEAVNMPKTYSLNMFKDFVPMCIFSETSEGDKVAMEGKVEHKFD MKPRHENMDDYGKLCRERTKKSMIKNRQVQIIADDRGTHMRPMPGMVGLVSSNFKDKKR TQPVKQTDTKRTRRDRGELEDIMFKLFERQPNWALKQLVQETDQPAQFLKEILNELCVY NKRGANQGTYELKPEYKKSVEDANAE

Oryza sativa TFIIFβ TC137623 MAEEAKNLETARADRSVWLMKCPTVVSRAWQEAATAAASSSSSSDAAAGANSNSNANPN PVVAKFKMEMAQTGNGNTPKSYSLNMFKDFVPMCVFSESNQGKLSCEGKVGHKFDMEPH SDNLVNYGKLCRERTQKSMIKNRKLMVLANDNGMSMRPLPGLVGLMSSGPKQKEKKPLP VKPSDMKRTRRDRRELENILFKLFERQPNWSLKNLMQETDQPEQFLKEILNDLCFYNKR GPNQGTHELKPEYKKSTEDADATAT

Populus trichocarpa TFIIFβ1 MDDEASNSSSGNNNNNNKNLTNDNNNKSPVLGGFLDASKAEKSVWLMKCPSIVSRFLRS QEHEVGDGDASSPPVAKVIVSVDPLKSNDDDNSATEDYPNALNFELSLVLFCLVFTLHD FFCSLWKWLGTGLGDGLKSYSMEMSKDLVDMSVFSESSQGKLSVEGRILNKFDVRPHSE NLENYRKICRERTKKYMVKSRQIKVIDNDTGSHMMPMPGMIISGLAVLSFFYIFVNDKK KLPIKASDMKRTRRDRREMEGIMFKLFEKQPNWTLKQLVQETDQPEQFVKDMLKDLCVY NNKGSNQGSYELKPEYKKSNEEPAPE

Populus trichocarpa TFIIFβ2 MEEDHSNGGNSSSSGNLETSKADKAVWLMKCPVVVAKSWKSHHTSSSDSAPLAKVVLSL DPLQSDDPSAIQFTMEMARTETGNVPKSYSLNMFKDFVPMGVFSETPQGRVSMEGKVEH KFDMKPHEENIEEYSKLCRDRTKKSMIKNRQIRVIDNDRGVHMRPMPGMVGLISSTSKD KKKTQPVKQSDVKRTRRDRGELEDIMFKLFERQPNWALKQLVQETDQPAQFLKEILNEL CVYNKRGTNQGTYELKPEYKKTAEDTGAD

Populus trichocarpa TFIIFβ3 MEEDNSSSSANLETSKADKSVWLMKCPVVVAKSWKTHTSPSSSDSAPLAKVVLSLDPLQ SDDPSALQFTMEMARTEAGNVPKSYSLNMFKDFVPMCVFSETPQGKVAMEGKVEHKFDM KPHEQNIEEYHKLCRERTKKSMVKIRQIQVINNDRGVHMRPMPGMVGLISSSSKDKKRP QPVKQSDVKRTRRDRGELEDIMFKLFERQPNWALKQLVQETDQPAQFLKEILNELCVYN KRGTNQGTYELKPEYKKTVEDTGAD

Saccharomyces cerevisiae TFIIFβ NP_011519 MSSGSAGAPALSNNSTNSVAKEKSGNISGDEYLSQEEEVFDGNDIENNETKVYEESLDL DLERSNRQVWLVRLPMFLAEKWRDRNNLHGQELGKIRINKDGSKITLLLNENDNDSIPH EYDLELTKKVVENEYVFTEQNLKKYQQRKKELEADPEKQRQAYLKKQEREEELKKKQQQ QKRRNNRKKFNHRVMTDRDGRDRYIPYVKTIPKKTAIVGTVCHECQVMPSMNDPNYHKI VEQRRNIVKLNNKERITTLDETVGVTMSHTGMSMRSDNSNFLKVGREKAKSNIKSIRMP KKEILDYLFKLFDEYDYWSLKGLKERTRQPEAHLKECLDKVATLVKKGPYAFKYTLRPE YKKLKEEERKATLGELADEQTGSAGDNAQGDAEADLEDEIEMEDVV

Triticum aestivum TFIIFβ TC122239 MGDEAKYLETARADRSVWLMKCPPVVSQAWQGASSSSGDANPNPVVAKVVLSLDPLSSA EPSLQFKMEMSQTSVASTCNLPKSYSLNMFKDFVPMCVFSETNQGKLSCEGKVEHKFDM EPHKDNLLNYAKLCRERTQKSMVKTRKVQVLDNDHGMSMRPMPGMVGLISSSSKEKRKP TPTKPSDVKRTRRDRRELENIIFKLFEKQPNWALKALVQETDQPEQFLKEILNDLCMYN KRGPNQGTHELKPEYKKSSEDAAGAP

Vitis vinifera TFIIFβ TC20528 MEEEQGNSSSSNLETGKAERSVWLMKCPLAVSKSWQSHSSSESQPVAKVVLSLDPLRSE DPSALEFTMEMTGTGAPNMPKSYSLNMFKDFVPMCVFSETNQGRVAMEGKVEHKFDMKP HNENIEEYGKLCRERTNKSMIKNRQIQVIDNDRGVHMRPMPGMVGLIASNSKDKKKTAP VKGSDMKRTRRDRGELEDIMFKLFERQPNWALKQLVQETDQPAQFLKEILNELCVYNKR GTNQGTYELKPEYKKSAEDTGAE
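
Each entry in the listing above is a single line holding a species name, a factor name, an optional accession or gene identifier, and then the sequence in space-separated chunks. The following is a minimal Python sketch of one way to split such a line into a label and a continuous sequence. The function name `parse_entry` and the chunk-detection rule are assumptions made for illustration, not part of the original analysis. The rule treats trailing tokens made only of the 20 standard one-letter amino acid codes as sequence, so it would misread a label whose last word is itself all-uppercase amino acid letters.

```python
import re

# A token counts as a sequence chunk if it contains only standard
# one-letter amino acid codes (assumption: no X, B, Z, or U residues).
AA_CHUNK = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]+$")


def parse_entry(line):
    """Split one catalogue line into (label, sequence).

    Tokens are scanned from the end of the line; the scan stops at the
    first token that is not a pure amino acid chunk, so accessions such
    as NP_618742 or At4g20340 stay in the label.
    """
    tokens = line.split()
    i = len(tokens)
    while i > 0 and AA_CHUNK.match(tokens[i - 1]):
        i -= 1
    label = " ".join(tokens[:i])
    sequence = "".join(tokens[i:])
    return label, sequence


# Shortened example built from the first residues of one entry above.
header, seq = parse_entry(
    "Methanosarcina acetivorans TFE NP_618742 MNTLVDLNDK VIRGYLISLV"
)
```

Scanning from the end of the line keeps a short all-uppercase factor name such as TFE in the label as long as an accession follows it.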

APPENDIX B
AMINO ACID MULTIPLE SEQUENCE ALIGNMENTS FOR CORE DOMAINS OF THE GENERAL TRANSCRIPTION FACTORS

TFIIA Small Subunit Alignment

CLUSTAL X (1.83) multiple sequence alignment

Homo_sapiens_TFIIA-gamma_NP_00 MAY------QLYRNTTLGNSLQESLDELIQSQQITPQLALQVLLQFD
Drosophila_melanogaster_TFIIA- MSY------QLYRNTTLGNTLQESLDELIQYGQITPGLAFKVLLQFD
Arabidopsis_thaliana_TFIIA-S_A MAT------FELYRRSTIGMCLTETLDEMVQSGTLSPELAIQVLVQFD
Mesembryanthemum_crystallinum_ MAT------FELYRRSTIGMCLTETLDEMVQSGTLSPELAIQVLVQFD
Medicago_truncatula_TFIIA-S_TC MAT------FELYRRSTIGMCLTETLDEMVQNGTLSPEIAIQVLVQFD
Glycine_max_TFIIA-S_TC148651 MAT------FELYRRSTIGMCLTETLDEMVQNGTLSPELAIQVLVQFD
Populus_balsamifera_TFIIA-S1 MAT------FELYRRSTIGMCLTETLDDMVQNGTLSPELAFQVLVQFD
Vitis_vinifera_TFIIA-S MAT------FELYRRSTIGMCLTETLDEMVQNGTLSPELAIQVLVQFD
Lycopersicon_esculentum_TFIIA- MAT------FELYRRSTIGMCLTETLDEMVSNGILSPEHAIQVLVQFD
Solanum_tuberosum_TFIIA-S_TC60 MAT------FELYRRSTIGMCLTETLDEMVSNGILSPEHAIQVLVQFD
Triticum_aestivum_TFIIA-S1_TC7 MAT------FELYRRSTIGMCLTETLDEMVSSGTLSPELAIQVLVQFD
Triticum_aestivum_TFIIA-S2_TC7 MAT------FELYRRSTIGMCLTETLDEMVSSGTLSPELAIQVLVQFD
Oryza_sativa_TFIIA-S_AAK73129 MAT------FELYRRSTIGMCLTETLDEMVSSGTLSPELAIQVLVQFD
Triticum_aestivum_TFIIA-S3_CA4 MAT------FELYRRSTIGMCLTETLDEMVSSGTLSPELAIQVLVQFD
Zea_mays_TFIIA-S1_TC170582 MAT------FELYRRSTIGMCLTETLDEMVSNGTLSPELAIQVLVQFD
Hordeum_vulgare_TFIIA-S_TC6639 MAT------FELYRRSTIGMCLTETLDEMVSSGTLSPELAIQVLVQFD
Oryza_sativa_TFIIA-S2 MAT------FELYRRSTIGMCLTDTLDDMVSSGALSPELAIQVLVQFD
Zea_mays_TFIIA-S2_TC173972 MAT------FELYRRSTIGTCLTETLDELVSSGAVSPELAIQVLVQFD
Pinus_TFIIA-S_TC16392 MAT------FELYRKSTIGTCLTETLDELVLNGTLSPEHAIQVLVQFD
Populus_trichocarpa_TFIIA-S2 MSTNGNNPAPYFELYRRSSVGLALTDALDELIQSGHINPQLALTVLKQFD
Saccharomyces_cerevisiae_TFIIA MAVPG-----YYELYRRSTIGNSLVDALDTLISDGRIEASLAMRVLETFD

Homo_sapiens_TFIIA-gamma_NP_00 KAINAALAQRVRNRVNFR-GSLNTYRFCDNVWTFVLNDVEFREV------
Drosophila_melanogaster_TFIIA- KSINNALNQRVKARVTFKAGKLNTYRFCDNVWTLMLNDVEFREV------
Arabidopsis_thaliana_TFIIA-S_A KSMTEALESQVKTKVSIK-GHLHTYRFCDNVWTFILQDAMFKSD------
Mesembryanthemum_crystallinum_ KSMTEALEAQVKTKVTIK-GHLHTYRFCDNVWTFMLQDALFKSE------
Medicago_truncatula_TFIIA-S_TC KSMTEALETQVKSKVSIK-GHLHTYRFCDNVWTFILQDALFKNE------
Glycine_max_TFIIA-S_TC148651 KSMTEALETQVKSKVSIK-GHLHTYRFCDNVWTFILQDALFKNE------
Populus_balsamifera_TFIIA-S1 KSMTEALETKVKSKVTIK-GHLHTYRFCDNVWTFILQDANFKNE------
Vitis_vinifera_TFIIA-S KSMTEALESQVKSKVTIK-GHLHTYRFCDNVWTFILQDALFKNE------
Lycopersicon_esculentum_TFIIA- KSMTEALETQVKSKVTIK-GHLHTYRFCDNVWTFILQDAVFKSE------
Solanum_tuberosum_TFIIA-S_TC60 KSMTEALETQVKSKVTIK-GHLHTYRFCDNVWTFILQDAVFKSE------
Triticum_aestivum_TFIIA-S1_TC7 KSMTEALENQVKSKVTVK-GHLHTYRFCDNVWTFILTDAQFKNE------
Triticum_aestivum_TFIIA-S2_TC7 KSMTDALETQVKSKVTVK-GHLHTYRFCDNVWTFILTDAQFKNE------
Oryza_sativa_TFIIA-S_AAK73129 KSMTEALENQVKSKVSIK-GHLHTYRFCDNVWTFILTEASFKNE------
Triticum_aestivum_TFIIA-S3_CA4 KSMTDALENQVKSKVNIK-GHLHTYRFCDNVWTFILTDASFKNE------
Zea_mays_TFIIA-S1_TC170582 KSMTDALENQVKSKVTVK-GHLHTYRFCDNVWTFILTDASFKNE------
Hordeum_vulgare_TFIIA-S_TC6639 KSMTEALENQVKSKVTVK-GHLHTYRFCDNVWTFILTDAQFKNE------
Oryza_sativa_TFIIA-S2 KSMTSALEHQVKSKVTVK-GHLHTYRFCDNVWTFILTDAIFKNE------
Zea_mays_TFIIA-S2_TC173972 KSMTEALEMQVKSKVSVK-GHLHTYRFCDNVWTFILTDATFKSE------
Pinus_TFIIA-S_TC16392 KSMAEALETQVKSKVTIK-GHLHTYRFCDNVWTFLLQDAQFKGE------
Populus_balsamifera_TFIIA-S2 KSASQVLSTQLRSKCLIK-GHLSTYRLCDEVWTFLLRDSIYKLEG-----
Saccharomyces_cerevisiae_TFIIA KVVAETLKDNTQSKLTVK-GNLDTYGFCDDVWTFIVKNCQVTVEDSHRDA

Homo_sapiens_TFIIA-gamma_NP_00 ------TELIKVDKVKIVACDGKNTGSNTTE
Drosophila_melanogaster_TFIIA- -----HEIVKVDKVKIVACDGK-SGEF---
Arabidopsis_thaliana_TFIIA-S_A ------DRQENVSRVKIVACDSKLLTQ----
Mesembryanthemum_crystallinum_ ------ECQENVSRVKIVACDSKLLTQ----
Medicago_truncatula_TFIIA-S_TC ------DNQENVGRVKIVACDSKLLSQ----
Glycine_max_TFIIA-S_TC148651 ------DSQENVGRVKIVACDSKLLTQ----
Populus_balsamifera_TFIIA-S1 ------DSQENVGRVKIVACDSKLLTQ----
Vitis_vinifera_TFIIA-S ------ESQENVGRVKIVACDSKLLTQ----
Lycopersicon_esculentum_TFIIA- -----ECQETVNRVKIVACDSKLLTQ----
Solanum_tuberosum_TFIIA-S_TC60 ------ECQETVNRVKIVACDSKLLTQ----
Triticum_aestivum_TFIIA-S1_TC7 ------ETTEQVGKVKIVACDSKLLSQ----
Triticum_aestivum_TFIIA-S2_TC7 ------ETTEQVGKVKIVACDSKLLSQ----
Oryza_sativa_TFIIA-S_AAK73129 ------ETTEQVGKVKIVACDSKLLSQ----
Triticum_aestivum_TFIIA-S3_CA4 ------ETTEQVGKVKIVACDSKLLGQ----
Zea_mays_TFIIA-S1_TC170582 ------EATEQVGKVKIVACDSKLLGQ----
Hordeum_vulgare_TFIIA-S_TC6639 ------EITEQVSKVKIVACDSKLLSQ----
Oryza_sativa_TFIIA-S2 ------EITETINKVKIVACDSKLLETKEE-
Zea_mays_TFIIA-S2_TC173972 ------EIQETLGRVKIVACDSKLLQPQHP-
Pinus_TFIIA-S_TC16392 ------DIHEQAGRVKIVACDSKILTQ----
Populus_balsamifera_TFIIA-S2 ------GEQVGPVKRVKIVACKGNAGASAPPA
Saccharomyces_cerevisiae_TFIIA SQNGSGDSQSVISVDKLRIVACNSKKSE-----
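
A CLUSTAL alignment such as the one above is interleaved: each block repeats the row labels and carries the next stretch of aligned columns. The following is a minimal Python sketch of how the blocks could be stitched back into one aligned string per label. It assumes a clean alignment file with one "label segment" pair per line and blank lines between blocks. The function names and the toy labels `seqA` and `seqB` are hypothetical. Gap runs in the printed listing may not have survived transcription, so the sketch is meant for the original alignment file rather than for this text.

```python
def read_interleaved(text):
    """Concatenate the per-block segments of each labelled row."""
    sequences = {}
    for line in text.splitlines():
        # Skip blank lines, the CLUSTAL banner, and conservation lines
        # (conservation lines begin with whitespace in CLUSTAL output).
        if not line.strip() or line[0].isspace() or line.startswith("CLUSTAL"):
            continue
        parts = line.split()
        if len(parts) != 2:
            continue  # not a "label segment" row
        label, segment = parts
        sequences[label] = sequences.get(label, "") + segment
    return sequences


def ungapped(aligned):
    """Remove gap characters to recover the unaligned residues."""
    return aligned.replace("-", "")


# Toy two-block alignment with hypothetical labels.
toy = """CLUSTAL X (1.83) multiple sequence alignment

seqA  MAT---FEL
seqB  MSTNGNFEL

seqA  KSM-
seqB  KSAS
"""
aligned = read_interleaved(toy)
```

Because dictionaries keep insertion order, the rows come back in the order of the first block, which is the order used in the listing.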

TFIIA Large Subunit Alignment

CLUSTAL X (1.83) multiple sequence alignment

Arabidopsis_thaliana_TFIIA-L1_ ---MGTTTTTSAVYIHVIEDVVNKVREEFINNGGPGESVLSELQGIWETK
Arabidopsis_thaliana_TFIIA-L2_ ---MGTTTTTSAVYIHVIEDVVNKVREEFINNGGPGESVLSELQGIWETK
Glycine_max_TFIIA-L_TC192713 ----MAASTTSQVYIQVIDDVMNKVRDEFVNNGGPGDEVLKELQSIWESK
Populus_balsamifera_TFIIA-L1 ----MASSATSTVYTEVIEDVIDKVRDEFINNGGPGETVLSELQGLWEKK
Solanum_tuberosum_TFIIA-L-STtu ----MASSTTSNVYIHVIEDVISKVRDEFISNGGPGESILKELQALWEVK
Hordeum_vulgare_TFIIA-L_Barley ----MASGNVSTVYISVIDDVVAKVREDFITYG-VGDAVLNELQALWEMK
Oryza_sativa_TFIIA-L ----MASSNVSTVYISVIDDVISKVRDDFISYG-VGDAVLNELQALWEMK
Zea_mays_TFIIA-L_TC183075 ----MASSNVSTVYISVIDDVISKVREDFITYG-VGDAVLNELQALWEMK
Arabidopsis_thaliana_TFIIA-L3_ --MVLSTSDTSSSYNYVIDDVINKSRCDLVYNGELDESVLSQIQSMWKTK
Homo_sapiens_TFIIA-alpha/beta- ---MACLNPVPKLYRSVIEDVIEGVRNLFAEEG-IEEQVLKDLKQLWETK
Homo_sapeins-TFIIA-alpha-beta_NP MANSANTNTVPKLYRSVIEDVINDVRDIFLDDG-VDEQVLMELKTLWENK
Drosohphila_melanogaster_TFIIA --MALCQTSVLKVYHAVIEDVITNVRDAFLDEG-VDEQVLQEMKQVWRNK
Saccharomyces_cerevisiae_TOA1p ----MSNAEASRVYEIIVESVVNEVREDFENAG-IDEQTLQDLKNIWQKK

Arabidopsis_thaliana_TFIIA-L1_ MMQAGVLNGPIERSSAQKPTP-----GGPLT--HDLNVPYEGT-EEYETP
Arabidopsis_thaliana_TFIIA-L2_ MMQAGVLNGPIERSSAQKPTP-----GGPLT--HDLNVPYEGT-EEYETP
Glycine_max_TFIIA-L_TC192713 MMQAGAIVGPIERSGAPKPTP-----GGPITPVHDLNMPYEGT-EEYETP
Populus_balsamifera_TFIIA-L1 LMQAGVLSGPIVRSSANKQLV-----PGGLTPVHDLNVPYEGT-EEYETP
Solanum_tuberosum_TFIIA-L-STtu MMNAGAILGTIERNSAAKATP-----GGPITPVHDLNMPYEGN-EEYETP
Hordeum_vulgare_TFIIA-L_Barley MLHCGAISGNIDRNRAPPASAGGAPGAGATPPVHDLNVPYEATSEEYATP
Oryza_sativa_TFIIA-L MLHCGAISGTIDRSKAAPAPSAGTPGAGTTPPVHDLNVPYEATSEEYATP
Zea_mays_TFIIA-L_TC183075 MLHCGAISGNIDRTKAAAASVGGTT--GTTAPVHDLNVPYEATSEEYATP
Arabidopsis_thaliana_TFIIA-L3_ MIQAGAMSGTIETSSAS------IP
Homo_sapiens_TFIIA-alpha/beta- VLQSKATEDFFR--NSIQSPLFTLQLPHSLHQTLQSSTASLVIPAGRTLP
Homo_sapeins-TFIIA-alpha-beta_NP LMQSRAVDGFHS--EEQQLLLQVQQQHQPQQQQHHHHHHHQQAQPQQTVP
Drosohphila_melanogaster_TFIIA LLASKAVELSPDSGDGSHPPPIVANNPKAANAKAKKAAAATAVTSHQHIG
Saccharomyces_cerevisiae_TOA1p LTETKVTTFSWDNQFNEGNIN------GVQNDLNFNLATPGVNSSEFNIK

Arabidopsis_thaliana_TFIIA-L1_ TAEMLFPPTPLQT------PLP------TPLPGTAD
Arabidopsis_thaliana_TFIIA-L2_ TAEMLFPPTPLQT------PLP------TPLPGTAD
Glycine_max_TFIIA-L_TC192713 TAEMLFPPTPLQT------PLQ------TPLPGTVD
Populus_balsamifera_TFIIA-L1 TAEILFPPTPLQTPMQTPLPGSAQTPLPGNVQTPLPG--NVPTPLPGSVD
Solanum_tuberosum_TFIIA-L-STtu TADILFPPTPLQT----PLPGTAQTPLPGTVQTPLPG--TAQTPLPGTAD
Hordeum_vulgare_TFIIA-L_Barley TADMLFPPTPLQT------PIQ------TPLPGIDT
Oryza_sativa_TFIIA-L TADMLFPPTPLQT------PIQ------TPLPGTDA
Zea_mays_TFIIA-L_TC183075 TADMLFPPTPLQT------PIQ------TPLPGTDT
Arabidopsis_thaliana_TFIIA-L3_ TTPVIVQTT-LQT------PDA------IPLPEKKM
Homo_sapiens_TFIIA-alpha/beta- SFTTAELGTSNSSANFTFPGYPIHVPAGVTLQTVSGHLYKVNVPIMVTET
Homo_sapeins-TFIIA-alpha-beta_NP QQAQTQQVLIPASQQATAP--QVIVPDSKLIQHMN----ASNMSAAATAA
Drosohphila_melanogaster_TFIIA GNSSMSSLVGLKSSAGMAAGSGIRNGLVPIKQEVN------SQNPPPLHP
Saccharomyces_cerevisiae_TOA1p EENTGNEGLILPN------INSNNNIP

Arabidopsis_thaliana_TFIIA-L1_ NSSMYNIPTG-----SSDYP-TPGTENG-----VNIDVK-A--RPSPYM-
Arabidopsis_thaliana_TFIIA-L2_ NSSMYNIPTG-----SSDYP-TPGTENG-----VNIDVK-A--RPSPYM-
Glycine_max_TFIIA-L_TC192713 N-SMYNIPTG-----PSDYP-SAGNEPG-----ANNEIKGG--RPGPYMQ
Populus_balsamifera_TFIIA-L1 NSSMYNISTG----SSSDYP-TPVSDAG-----GSTDVKAG--RPSHFM-
Solanum_tuberosum_TFIIA-L-STtu -SSMYNIPTGGTPFTPSDY--SPLNDTG-----GATELKAGPGRPSPFM-
Hordeum_vulgare_TFIIA-L_Barley --GMYNIPTG-----PSDYAPSPISDMRNGMGMNGSDPKTG--RPSPYM-
Oryza_sativa_TFIIA-L --GMYNIPTG-----PSDYAPSPISDVRNGMAMNGADPKTG--RPSPYM-
Zea_mays_TFIIA-L_TC183075 --AMYNIPTG-----PSDYAPSPISDIRNGMTINGADPKAG--HPSPYM-
Arabidopsis_thaliana_TFIIA-L3_ S------
Homo_sapiens_TFIIA-alpha/beta- SGRAGILQHPIQQVFQQLGQPSVIQTSVPQLNPWSLQATTEKSQRIETVL
Homo_sapeins-TFIIA-alpha-beta_NP TLALPAGVTPVQQILTNSGQLLQVVR------AANGAQYIFQPQQSVV
Drosohphila_melanogaster_TFIIA TSAASMMQKQQQAASSGQGSIPIVAT------LDPNRIM
Saccharomyces_cerevisiae_TOA1p HSGETNINTN------

Arabidopsis_thaliana_TFIIA-L1_ PPPSPWTNPR------LDVNVAYVDGRD-EPERGNSNQQFTQDLFVPS
Arabidopsis_thaliana_TFIIA-L2_ PPPSPWTNPR------LDVNVAYVDGRD-EPERGNSNQQFTQDLFVPS
Glycine_max_TFIIA-L_TC192713 PPPSPWTNQNQNQNQRAPLDVNVAYVEGRD-EAERGASNQPLTQDFFMSS
Populus_balsamifera_TFIIA-L1 QSPSPLMHQR------PPLDVN---VGKSYFYAPRRVHGQ---KDFFMSS
Solanum_tuberosum_TFIIA-L-STtu HPPSPWLNQR------PPLDVNGAYVEGREEVGDRGGSQQPMTQDFFMNS
Hordeum_vulgare_TFIIA-L_Barley QPPSPWMNQRP-----LGVDVNVAYEESRE-DPDRLMQPQPLTKDFLMMS
Oryza_sativa_TFIIA-L PPPSPWMTQRP-----LGVDVNVAYVENRE-DPDRTGQPPQLTKDFLMMS
Zea_mays_TFIIA-L_TC183075 PPPSPWMNQRP-----LGVDVNVAYVEGRE-DPDRGVQPQPLTQDFLTMS
Arabidopsis_thaliana_TFIIA-L3_ ------
Homo_sapiens_TFIIA-alpha/beta- QQPAILPSGPVDRKHLENATSDILVSPGNEHKIVPEALLCHQESSHYISL
Homo_sapeins-TFIIA-alpha-beta_NP LQQQVIP------QMQPGGVQAPVIQQVLAPLPG-GISPQ
Drosohphila_melanogaster_TFIIA PVNITLP------SPAGSASSESRVLTIQVPASALQEN
Saccharomyces_cerevisiae_TOA1p ------TVEATNNSGATLNTNTSGNTNADVTSQ

Arabidopsis_thaliana_TFIIA-L1_ SGKRKRDDSSGHYQNGGSIPQQDGAGDA---IPEANFECDAFR------
Arabidopsis_thaliana_TFIIA-L2_ SGKRKRDDSSAHYQNGGSIPQQDGASDA---IPEANFECAALR------
Glycine_max_TFIIA-L_TC192713 -GKRKRDEIASQYNAGGYIPQQDGAGDAASQILEIEVYGGGMS------
Populus_balsamifera_TFIIA-L1 -GKRKRGDFAPKYNNGGFIPQQDGAVDSASEVSQVSQGNNPHG------
Solanum_tuberosum_TFIIA-L-STtu AGKRKREDFPPQYHNGGYIPQQDGAADSIYDNLKSGEGSNIQL------
Hordeum_vulgare_TFIIA-L_Barley SGKRKRDEYPGQLPSGSFVPQQDGCADQVAEFVGSKDNAQQVWNS-----
Oryza_sativa_TFIIA-L SGKRKRDEYPGQLPSGSFVPQQDGSADQIVEFVVSKDNAQQLWSS-----
Zea_mays_TFIIA-L_TC183075 SGKRKRDEYPGQLPSGSFVPQQDGSADQIVEFVVSKENANQHWSS-----
Arabidopsis_thaliana_TFIIA-L3_ -PKKESDGFY------YIPQQDGARDEA------
Homo_sapiens_TFIIA-alpha/beta- PGVVFSPQVSQTNSNVESVLSGSASMAQNLHDESLSTSPHGALHQHVTDI
Homo_sapeins-TFIIA-alpha-beta_NP TGVIIQPQQILFTGNKTQVIPTTVAAPTPAQAQITATGQQ------
Drosohphila_melanogaster_TFIIA Q-----LTQILTAHLISSIMSLPTTLASSVLQQHVNAALS------
Saccharomyces_cerevisiae_TOA1p PKIEVKPEIELTINNANITTVENIDDESEKKDDEEKEE------

Arabidopsis_thaliana_TFIIA-L1_ ------ITS---IGDRKVPRD------FFSSSS
Arabidopsis_thaliana_TFIIA-L2_ ------ITY---VGDRKIPRD------LIGSSS
Glycine_max_TFIIA-L_TC192713 ------IDAGHSTSKGKMPAQ------SDRPAS
Populus_balsamifera_TFIIA-L1 ------RCDTITTKNREILAR------VSRSYV
Solanum_tuberosum_TFIIA-L-STtu ------ELVTVGP------VQASAY
Hordeum_vulgare_TFIIA-L_Barley ------ILNKQESVTKTLSIK------ESTIPP
Oryza_sativa_TFIIA-L ------IVNKQGTATKESSTK------ETIIAP
Zea_mays_TFIIA-L_TC183075 ------IINKLETPTKT------VTP
Arabidopsis_thaliana_TFIIA-L3_ ------
Homo_sapiens_TFIIA-alpha/beta- QLHILKNRMYGCDSVKQPRNIEEPSNIPVSEKDSNSQVDLSIRVTDDDIG
Homo_sapeins-TFIIA-alpha-beta_NP ------QPQAQPAQTQAP------
Drosohphila_melanogaster_TFIIA ------SANHQKTLAAAK------
Saccharomyces_cerevisiae_TOA1p ------

Arabidopsis_thaliana_TFIIA-L1_ KIPQVDGPMPDPYDEMLSTPNIYSYQGP-SEEFNEARTPAPNEIQTSTPV
Arabidopsis_thaliana_TFIIA-L2_ KIPQVDGPMPDPYDEMLSTPNIYSYQGP-NEEFNEARTPAPNEIQTSTPV
Glycine_max_TFIIA-L_TC192713 QIPQLDGPIPY-DDDVLSTPNIYNYGVF-NEDYNIANTPAPSEVPASTPA
Populus_balsamifera_TFIIA-L1 KIPQVDGPIPDPYDDMLSTPNIYNYQGVANEDYNIASTPAPNDLQASTPA
Solanum_tuberosum_TFIIA-L-STtu RIPQFDGPIPDSYDDALSTPNIY-YQGVVNEDYNIVNTPAPNDMQAPTPA
Hordeum_vulgare_TFIIA-L_Barley VLPQRDGIQDDYNDQFF------FPGVPTEDYNTPGESSEYRTPTPAIA
Oryza_sativa_TFIIA-L TIPQRDG-MDDYNDPFY------FQGVPTEDYNTPGESSEYRAPTPAVG
Zea_mays_TFIIA-L_TC183075 VIPQCDGIQDDYNDQFF------FPGVPTEDYNTPGESAEYRAPTPAVG
Arabidopsis_thaliana_TFIIA-L3_ -IVDVD------
Homo_sapiens_TFIIA-alpha/beta- EIIQVDGSGDTSSNEEIGSTRDADENEFLGNIDGGDLKVPEEEADSISNE
Homo_sapeins-TFIIA-alpha-beta_NP LVLQVDGTGDTSSEE------DEDE------EEDYDDDEEE
Drosohphila_melanogaster_TFIIA ---QLDGALDSSDED------ESEE------SDDNIDNDDDD
Saccharomyces_cerevisiae_TOA1p ---DVEKTRKE------KEQIEQVKLQ

Arabidopsis_thaliana_TFIIA-L1_ AVQN-DIIE------DDEELLNE-DDDDDELDDLESGEDM-NTQHLVL
Arabidopsis_thaliana_TFIIA-L2_ AVPN-DIIE------DDEELLNE-DDDDDELDDLESGEDM-NTQHLVL
Glycine_max_TFIIA-L_TC192713 PIAQ-NEVDE----EDDDDEPPLNE-NDDDD-LDDLDQGEDQ-NTHHLVL
Populus_balsamifera_TFIIA-L1 VVSQNDDVD------DDDDEPLNEDDDDDEDLDGVDQGEEL-NTQHLIL
Solanum_tuberosum_TFIIA-L-STtu PALQNDDID------DDD-EPLNEDDDD--DLDDVDQGEDL-NTAHLVL
Hordeum_vulgare_TFIIA-L_Barley TPKPRNDMAGGDDDDDDDDEPPLNEDDDDDDEIDDLQDGDEEPNTQHLVL
Oryza_sativa_TFIIA-L TPKPRNDVG------DDDEPPLNEDDDDDDELDDLEQGEDEPNTQHLVL
Zea_mays_TFIIA-L_TC183075 TPKPRNDAGDDNDDDDDDEEPPLNEDDDDDDDLDDLEEGEDEPNTQHLVL
Arabidopsis_thaliana_TFIIA-L3_ ------ENEEPLNE--DDDDEEDDID--DDDMNIQHLVM
Homo_sapiens_TFIIA-alpha/beta- DSATNSSDNEDPQVN-IVEEDPLNSGDDVSE-----QDVPDLFDTDNVIV
Homo_sapeins-TFIIA-alpha-beta_NP DKEKDGAEDGQ------VEEEPLNSEDDVSD-----EEGQELFDTENVVV
Drosohphila_melanogaster_TFIIA DLDKDDDEDAEHED--AAEEEPLNSEDDVTD-----EDSAEMFDTDNVIV
Saccharomyces_cerevisiae_TOA1p AKKEKRSAL------LDTDEVGSELDDSDDDYLISEGEEDGPDENLML

Arabidopsis_thaliana_TFIIA-L1_ AQFDKVTRTKSRWKCSLKDGIMHINDKDILFNKAAGEFDF-
Arabidopsis_thaliana_TFIIA-L2_ AQFDKVTRTKSRWKCSLKDGIMHINDKDILFNKATGEFDF-
Glycine_max_TFIIA-L_TC192713 AQFDKVTRTKSRWKCTLKDGIMHINNKDILFNKATGEFDF-
Populus_balsamifera_TFIIA-L1 AQFDKVTRTKSRWKCTLKDGVMHINNRDILFNKATGEFEF-
Solanum_tuberosum_TFIIA-L-STtu AQFDKVTRTKSRWKCTLKDGIMHINNKDILFNKANGEFDF-
Hordeum_vulgare_TFIIA-L_Barley AQFDKVTRTKNRWKCTLKDGIMHLNGRDVLFNKASGEFDF-
Oryza_sativa_TFIIA-L AQFDKVTRTKNRWKCTLKDGIMHLNGRDVLFNKVVNMIF--
Zea_mays_TFIIA-L_TC183075 AQFDKVTRTKNRWKCTLKDGIMHLNGRDVLFNKATGEFDF-
Arabidopsis_thaliana_TFIIA-L3_ CQFDKVKRSKNKWECKFNAGVMQINGKNVLFSQATGDFNF-
Homo_sapiens_TFIIA-alpha/beta- CQYDKIHRSKNKWKFYLKDGVMCFGGRDYVFAKAIGDAEW-
Homo_sapeins-TFIIA-alpha-beta_NP CQYDKIHRSKNKWKFHLKDGIMNLNGRDYIFSKAIGDAEW-
Drosohphila_melanogaster_TFIIA CQYDKITRSRNKWKFYLKDGIMNMRGKDYVFQKSNGDAEW-
Saccharomyces_cerevisiae_TOA1p CLYDKVTRTKARWKCSLKDGVVTINRNDYTFQKAQVEAEWV

TFIIB Family Alignment

Cysteine and histidine residues potentially involved in metal ion chelation are highlighted in yellow. A conserved lysine residue found to be autoacetylated in human and yeast TFIIB proteins is shown in green (Choi et al., 2003). The first core-TFIIB imperfect direct repeat is highlighted in grey, and the second direct repeat is shown in red font (Nikolov et al., 1995).
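The colour highlighting described in this legend does not survive in a plain-text copy of the alignment. As a minimal sketch (not part of the original analysis), the following Python function lists the cysteine and histidine residues in a single aligned row so that candidate chelating positions can be located by eye. The function name `chelation_candidates` is hypothetical, and the example fragment is the N-terminal portion of the Arabidopsis_thaliana_TFIIB2 row from the first alignment block, with its leading gap columns omitted. The reported positions count residues within that fragment only; they are not residue numbers in the full-length protein, and the code does not decide which of these residues actually coordinate a metal ion.

```python
def chelation_candidates(aligned_row, residues="CH"):
    """Return (position, residue) pairs for Cys/His in one aligned row.

    Positions are 1-based and count residues only: gap characters ('-')
    are skipped, so numbering follows the ungapped fragment.
    """
    hits = []
    position = 0
    for symbol in aligned_row:
        if symbol == "-":
            continue  # alignment gap, not a residue
        position += 1
        if symbol in residues:
            hits.append((position, symbol))
    return hits


# Fragment copied from the Arabidopsis_thaliana_TFIIB2 row (first block);
# leading gap columns omitted, so positions are fragment-relative.
tfiib2_fragment = "MSDAFCSDCKRHT-EVVFDHSAGDTVCS-----E"
print(chelation_candidates(tfiib2_fragment))
```

Passing `residues="K"` instead would list lysines in the same way, which may help when looking for the conserved lysine mentioned in the legend.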

CLUSTAL X (1.83) multiple sequence alignment

Arabidopsis_thaliana_BRF1_At2g ------MVWCKHCGKNVPG--IRPYDAALSCD-----L
Arabidopsis_thaliana_BRF2_At3g ------MVWCNHCVKNVPG--IRPYDGALACN-----L
Arabidopsis_thaliana_BRF3_At2g ------
Drosophila_melanogaster_BRF_AA ------MSTGLKCRNCGSNEIE--EDNARGDRVCM-----N
Homo_sapiens_BRF_NP_001510.2 ------MTGRVCRGCGGTDIE--LDAARGDAVCT-----A
Saccharomyces_cerevisiae_BRF_N ------MPVCKNCHGTEFERDLSNANNDLVCK-----A
Populus_balsamifera_TFIIB7/pBrp ------MKCPYCSATQGRC-ATTTTTNRCITECT-----S
Arabidopsis_thaliana_TFIIB5_At ------MKCPYCSSAQGRC-TTTSSG-RSITECS-----S
Lycopersicon_esculentum_AAG011 ------MRCPYCSAEQGRC-TSSTSG-RPITECT-----S
Populus_balsamifera_TFIIB6 ------MGDAFCSDCKRHT-EVVFDHSAGDTVCS-----E
Populus_balsamifera_TFIIB4 ------MGDAFCSDCKKHT-EVVCDHSAGDTVCS-----E
Populus_balsamifera_TFIIB5 ------MEDSYCPDCKRLT-EIVFDHSAGDTICS-----E
Oryza_sativa_TFIIB1_AF464908 ------MSDSFCPDCKKHT-EVAFDHSAGDTVCT-----E
Triticum_aestivum_TC68795 ------MGDSYCQDCKKHT-EVAFDHSAGDTVCT-----E
Mesembryanthemum_crystallinum_ ------MSDAFCSDCKKCT-EVVFDHSAGDTVCS-----E
Vitis_vinifera_TFIIB_TC19782 ------MADAFCTDCKKNT-EVVFDHSAGDTVCS-----E
Populus_balsamifera_TFIIB1 ------MGDAFCSDCKRHT-EVVFDHSAGDTVCS-----E
Citrus_sinensis_TFIIB_CB292941 ------MTDAFCSDCKKHT-EVVFDHSAGDTVCS-----E
Glycine_max_TFIIB_U31097 ------MSDAFCSDCKRQT-EVVFDHSAGDTVCS-----E
Medicago_truncatula_TC86832 ------MSDAFCSDCKRAT-EVVFDHSAGDAVCS-----E
Arabidopsis_thaliana_TFIIB2_At ------MSDAFCSDCKRHT-EVVFDHSAGDTVCS-----E
Arabidopsis_thaliana_TFIIB1_At ------MSDAYCTDCKKET-ELVVDHSAGDTLCS-----E
Lycopersicon_esculentum_TFIIB_T ------MDTYCSDCKRNT-EVVFDHAAGDTVCS-----E
Solanum_tuberosum_TFIIB1_TC587 ------MDTYCSDCKRNT-EVVFDHAAGDTVCS-----E
Populus_balsamifera_TFIIB3 ------MNRTITNIKSA-----S
Populus_balsamifera_TFIIB8 ------
Arabidopsis_thaliana_TFIIB4_At ------CGLEFKYRPIGDLSPVAENDT-VRLPDPTNTLLSNT-----D
Populus_balsamifera_TFIIB2 ------MARNGEIDDYRDYCKDCKANT-YIVLDHCTGDTICS-----D
Oryza_sativa_TFIIB2_AAN59779 ------MCSVAGNEQCYCPECHRTT-VVVVDHATGDTICT-----E
Drosophila_melanogaster_TFIIB_ ------MASTSRLDNN-KVCCYAHPES-PLIEDYRAGDMICS-----E
Homo_sapiens_TFIIB_NM_001514 ------MASTSRLDALPRVTCPNHPDA-ILVEDYRAGDMICP-----E
Arabidopsis_thaliana_TFIIB3_At ------MEEETCLDCKRPT-IMVVDHSSGDTICS-----E
Arabidopsis_thaliana_TFIIB6_At ------MKEDGICLECKRPT-ETVVNYKNGDTICI-----E
Saccharomyces_cerevisiae_TFIIB ------NIVLTCPECKVYPPKIVERFSEGDVVCA-----L
Methanosarcina_acetivorans_TFB ------FENEKAVCPECGSRN--LVHDYERAELVCG-----D
Sulfolobus_solfataricus_TFB_AA ------MLYLSEENKSVSTPCPPDK--IIFDAERGEYICS-----E
Populus_balsamifera_TFIIB9 ------MSPALASSGASI-----E
Arabidopsis_thaliana_BRF4_At4g MRCKRCNGSNFERDEDTGNSYCGGCGTLREYDNYEAQLGGIRGPQGTYIR
Lycopersicon_esculentum_TFIIB_AF ------YLDLEDIIKENALPFLPAKSAVKFQAVCR-----D

Arabidopsis_thaliana_BRF1_At2g CGRILENFNFSTEVTFVKNAAGQSQ---ASGNILKSVQSGMS------
Arabidopsis_thaliana_BRF2_At3g CGRILENFHFSTEVTFVKNAAGQSQ---ASGNIVRSVQSGIT------
Arabidopsis_thaliana_BRF3_At2g ------
Drosophila_melanogaster_BRF_AA CGSVLEDSLIVSEVQFEEV-GHGAA---AIGQFVSAESSGGATNYGYG--
Homo_sapiens_BRF_NP_001510.2 CGSVLEDNIIVSEVQFVESSGGGSS---AVGQFVSLDGAGKTPTLG-G--
Saccharomyces_cerevisiae_BRF_N CGVVSEDNPIVSEVTFGETSAGAAV---VQGSFIGAGQSHAA------
Populus_balsamifera_TFIIB7/pBrp CGRVVEERQFHPHHLFHLRAQDTPL---CLVTSDLPTLHHHHQ-NEEDPF
Arabidopsis_thaliana_TFIIB5_At CGRVMEERQTQNHHLFHLRAQDTPL---CLVTSDLQTAAQPSPEDEEDPF
Lycopersicon_esculentum_AAG011 CGRVVEERLTQSHHLFHTRAQDSPL---CLATSDLPTLPISATNDDEDPF
Populus_balsamifera_TFIIB6 CGLVLESHSIDETSEWRTFANESG-----DNDPVRVGGPTNPLLTDGG--
Populus_balsamifera_TFIIB4 CGLVLESHSIDETSEWRIFANESG-----DNDPVRVGGPTNPLLTDGG--
Populus_balsamifera_TFIIB5 CGLILEAHSVDETSEWRTFSNESS-----DHDPNRVGGPLNPLLADGG--
Oryza_sativa_TFIIB1_AF464908 CGLVLEAHSVDETSEWRTFANESS-----DNDPVRVGGPTNPLLTDGG--
Triticum_aestivum_TC68795 CGLVLEAHSVDETSEWRTFANESN-----DNDPVRVGGPSNPLLTDGG--
Mesembryanthemum_crystallinum_ CGLVLESHSIDETSEWRTFANESN-----DNDPVRVGGPTNPLLSDGG--
Vitis_vinifera_TFIIB_TC19782 CGLVLESHSIDETSEWRTFANESG-----DNDPVRVGGPSNPLLTDGG--
Populus_balsamifera_TFIIB1 CGLVLESHSIDETSEWRTFANESG-----DNDPVRVGGPTNPLLTDGG--
Citrus_sinensis_TFIIB_CB292941 CGLVLESHSIDETSEWRTFANESG-----DNDPVRVGGPTNPLLADGG--
Glycine_max_TFIIB_U31097 CGLVLESHSIDETSEWRTFANESG-----DNDPNRVGGPSNPLLTDGG--
Medicago_truncatula_TC86832 CGLVLESHSIDETSEWRTFANESG-----DNDPVRVGGPSNPLLTDGG--
Arabidopsis_thaliana_TFIIB2_At CGLVLESHSIDETSEWRTFANESG-----DNDPVRVGGPTNPLLADGG--
Arabidopsis_thaliana_TFIIB1_At CGLVLESHSIDETSEWRTFANESS-----NSDPNRVGGPTNPLLADSA--
Lycopersicon_esculentum_TFIIB_T CGLVLESRSIDETSEWRTFADESG-----GDDPNRVGGPVNPLLGDAA--
Solanum_tuberosum_TFIIB1_TC587 CGLVLESRSIDETSEWRTFADESG-----GDDPNRVGGPVNPLLGDAA--
Populus_balsamifera_TFIIB3 PSLSLTRKGLDAFLVW-LFMRIS------PNVSFSFSGD--
Populus_balsamifera_TFIIB8 ------MENNLMFVWRS------PISIAAAVIY-----
Arabidopsis_thaliana_TFIIB4_At LSIVTTEHKNGSFDDSLSLNLGNS-----SKPRLDPVSIATAKLMNGSSN
Populus_balsamifera_TFIIB24 CGLVLESCYIDEIAEWRTFNDDNN-----DKDPNRVGYNVNPLLSQGN--
Oryza_sativa_TFIIB2_AAN59779 CALVLEERYIDETSEWRTFSDAGSG---EDRDPNRVGGCSDPFLSHAE--
Drosophila_melanogaster_TFIIB_ CGLVVGDRVIDVGSEWRTFSNEKS-----GVDPSRVGGPENPLLSGGD--
Homo_sapiens_TFIIB_NM_001514 CGLVVGDRVIDVGSEWRTFSNDKA-----TKDPSRVGDSQNPLLSDGD--
Arabidopsis_thaliana_TFIIB3_At CGLVLEAHIIEYSQEWRTFASDDNH---SDRDPNRVGAATNPFLKSGD--
Arabidopsis_thaliana_TFIIB6_At CGHVIENNIID------DLD----GASTNPNLKSGH--
Saccharomyces_cerevisiae_TFIIB CGLVLSDKLVDTRSEWRTFSNDDHN----GDDPSRVGEASNPLLDGNN--
Methanosarcina_acetivorans_TFB CGLVIDADFVDEGPEWRAFDHDQR------MKRSRVGAPMTYTIHDKG--
Sulfolobus_solfataricus_TFB_AA TGEVLEDKIIDQGPEWRAFTPEEK------EKRSRVGGPLNNTIHDRG--
Populus_balsamifera_TFIIB9 LALYAGSKWFGSNRVFFPPPPPSL------TALGCIGQAKKKWVSLSS--
Arabidopsis_thaliana_BRF4_At4g VGTIGRGSVLDYKDKKIYEANNLIE---ETTERLNLGNKTEVIKSMISKL
Lycopersicon_esculentum_TFIIB_AF WRLQISAPLFAHKQSLSCNSTSGIFSQLNRGSPFLIPIDANSCGVPDP--

Arabidopsis_thaliana_BRF1_At2g ------SS
Arabidopsis_thaliana_BRF2_At3g ------SS
Arabidopsis_thaliana_BRF3_At2g ------
Drosophila_melanogaster_BRF_AA ------KFQVGSGTES
Homo_sapiens_BRF_NP_001510.2 ------GFHVNLGKES
Saccharomyces_cerevisiae_BRF_N ------FGGSSALES
Populus_balsamifera_TFIIB7/pBrp EPTG--FITSFSTWSLEPNPVSLRSSLSFSGHLAELERTIELSASTPAS-
Arabidopsis_thaliana_TFIIB5_At EPTG--FITAFSTWSLEPSPIFARSSLSFSGHLAELERTLELASSTSNSN
Lycopersicon_esculentum_AAG011 EPTG--FITTFSTWSLEPYPVFAQSSISFAGHLAELERVLEMTSTSSSSS
Populus_balsamifera_TFIIB6 ------LSTVIAK-PNGASG-EFL------SSSLGRWQNRGSN
Populus_balsamifera_TFIIB4 ------LSTVIAK-PNGASG-DFL------STSLGRWQNRGSN
Populus_balsamifera_TFIIB5 ------LSTTISK-TNGGSN-ELL------SCSLGKWQSRGAN
Oryza_sativa_TFIIB1_AF464908 ------LSTVIAK-PNGAQG-EFL------SSSLGRWQNRGSN
Triticum_aestivum_TC68795 ------LSTVIAK-PNGAHG-DFL------SSSLGRWQNRGSN
Mesembryanthemum_crystallinum_ ------LSTVISK-PNGTTG-DYL------SSSLGRWQNRGAN
Vitis_vinifera_TFIIB_TC19782 ------LSTVIAK-PNGVSG-DFL------SSSLGRWQNRGSN
Populus_balsamifera_TFIIB1 ------LSTVIAK-PNGASG-EFL------SSSLGRWQNRGSN
Citrus_sinensis_TFIIB_CB292941 ------LSTVIAK-PNGASG-EFL------SSSLGRWQNRGSN
Glycine_max_TFIIB_U31097 ------LSTVIAK-PNGGGGGEFL------SSSLGRWQNRGSN
Medicago_truncatula_TC86832 ------LSTVIAK-PNGASG-DFL------SSSLGRWQNRGSN
Arabidopsis_thaliana_TFIIB2_At ------LTTVISK-PNGSSG-DFL------SSSLGRWQNRGSN
Arabidopsis_thaliana_TFIIB1_At ------LTTVIAK-PNGSSG-DFL------SSSLGRWQNRNSN
Lycopersicon_esculentum_TFIIB_T ------LSTVISKGPNGSNG------DGSLARLQNRGGD
Solanum_tuberosum_TFIIB1_TC587 ------LSTVISKGPNGSNG------DGSLARLQNRGGD
Populus_balsamifera_TFIIB3 ------AFFIILK------
Populus_balsamifera_TFIIB8 ------IIIQLSD------D
Arabidopsis_thaliana_TFIIB4_At DFLSLGTSQNSETITASSDEFLFSDLGH------LQKFSFDPLSMAS
Populus_balsamifera_TFIIB2 ------LKTLISN-NKG------DHAIPRWQDGVSN
Oryza_sativa_TFIIB2_AAN59779 ------LGTVVAPAKRQAKD---T------ASPPHVRVDSKSG
Drosophila_melanogaster_TFIIB_ ------LSTIIGP-GTGSASFDAF------GAPKYQNRRTMSS
Homo_sapiens_TFIIB_NM_001514 ------LSTMIGK-GTGAASFDEF------GNSKYQNRRTMSS
Arabidopsis_thaliana_TFIIB3_At ------LVTIIEKPKETASSVLSKDDI----STLFRAHNQVKN--H
Arabidopsis_thaliana_TFIIB6_At ------LPTIIFKLSGKSSSLASK------LRRTQNEMIKNKQ
Saccharomyces_cerevisiae_TFIIB ------LSTRIGKGETTDMRFTKELN------KAQGKNVMDK
Methanosarcina_acetivorans_TFB ------LSTMIDWRNRDSYGKSISSKNRAQLYRLRKWQRRIRVSNA
Sulfolobus_solfataricus_TFB_AA ------LSTLIDWKDKDAMGRTLDPKRRLEALRWRKWQIRARIQSS
Populus_balsamifera_TFIIB9 ------PTWRQCVIK
Arabidopsis_thaliana_BRF4_At4g TDGEFGQGEWFPILIGACCYAVVREEGKG----VLSMEEVAYEVGCDLHQ
Lycopersicon_esculentum_TFIIB_AF ------FLNFLPEPVDIKSSSNGLLCCR------GREGDKVYYICNP

Arabidopsis_thaliana_BRF1_At2g RERIIRKATDELMNLRDALGIGDDRDDVIVMASNFFRIALDHN------
Arabidopsis_thaliana_BRF2_At3g RERRFRIARDELMNLKDALGIGDERDDVIVIAAKFFEMAVEQN------
Arabidopsis_thaliana_BRF3_At2g ------MDQN------
Drosophila_melanogaster_BRF_AA REVTIKKAKKDITLLCQQLQLS---QHYADTALNFFKMALGRH------
Homo_sapiens_BRF_NP_001510.2 RAQTLQNGRRHIHHLGNQLQLN---QHCLDTAFNFFKMAVSRH------
Saccharomyces_cerevisiae_BRF_N REATLNNARRKLRAVSYALHIP---EYITDAAFQWYKLALANN------
Populus_balsamifera_TFIIB7/pBrp SSNVVVDNLRAYMQIIDVASILGLDCDISDHAFQLFRDCCSAT------
Arabidopsis_thaliana_TFIIB5_At SSTVVVDNLRAYMQIIDVASILGLDCDISEHAFQLFRDCCSAT------
Lycopersicon_esculentum_AAG011 SSSVVVENLRAYLQIIDVASILRLDSDISDHAFQLFRDCSSAT------
Populus_balsamifera_TFIIB6 PDRGLITAFKTIATMSDRW------
Populus_balsamifera_TFIIB4 PDRGLILAFKTIATMSDRLGLVATIKVFILDVCTLLL--PLM------
Populus_balsamifera_TFIIB5 PDRNRIQAFKSIAAMADRF------
Oryza_sativa_TFIIB1_AF464908 PDRSLILAFRTIANMADRLGLVATIKDRANEIYKKVE--DLK------
Triticum_aestivum_TC68795 PDRSLILAFRTIANMADRLGLVATIKDRANEIYKKVE--DLK------
Mesembryanthemum_crystallinum_ PDRGLILAFKTIATMADRLGLVATIKDRASEIYKKVE--DQK------
Vitis_vinifera_TFIIB_TC19782 PDRGLILAFKTIATMSDRLGLVATIKDRANEIYKKVE--DQK------
Populus_balsamifera_TFIIB1 PDRGLITAFKTIATMSDRLGLVATIKDRANEIYKKVE--DQK------
Citrus_sinensis_TFIIB_CB292941 PDRGLILAFKTIATMSDRLGLVATIKDRANEIYKKVE--DQK------
Glycine_max_TFIIB_U31097 PDRALIQAFKTIATMSDRLGLVATIKDRANEIYKRVE--DQK------
Medicago_truncatula_TC86832 PDRGLILAFKTIGTMAERLGLVPTIKDRANEIYKRVE--DQK------
Arabidopsis_thaliana_TFIIB2_At PDRGLIVAFKTIATMADRLGLVATIKDRANEIYKRVE--DQK------
Arabidopsis_thaliana_TFIIB1_At SDRGLIQAFKTIATMSERLGLVATIKDRANELYKRLE--DQK------
Lycopersicon_esculentum_TFIIB_T PDRAIVLAFKAIATMADRLSLVSTIRDRASEIYKRLE--DQK------
Solanum_tuberosum_TFIIB1_TC587 PDRAIVLAFKAIATMADRLSLVSTIRDRASEIYKRLE--DQK------
Populus_balsamifera_TFIIB3 -DR------ANEIYKKVE--DQK------
Populus_balsamifera_TFIIB8 KKPLKDISVVTQVAEG---TIKNSYKDLSPHLSQIIP--S------
Arabidopsis_thaliana_TFIIB4_At TKPNKALSIVSIEAISNGLKLPATIKGQANEIFKVVE--S------
Populus_balsamifera_TFIIB2 SDRVLLQGFDIIEIIANRLGLVRPIKDRAKEIFKKIE--EQK------
Oryza_sativa_TFIIB2_AAN59779 QDSSLAVAFRAISDMADRLQLVATIRDRAKELFKKME--EAKL------
Drosophila_melanogaster_TFIIB_ SDRSLISAFKEISSMADRINLPKTIVDRANNLFKQVH--DGK------
Homo_sapiens_TFIIB_NM_001514 SDRAMMNAFKEITTMADRINLPRNIVDRTNNLFKQVY--EQK------
Arabidopsis_thaliana_TFIIB3_At EEDLIKQAFEEIQRMTDALDLDIVINSRACEIVSKYDGHANTK------
Arabidopsis_thaliana_TFIIB6_At EEDVIKIAYAEIERMTEALGLTFGISNTACKILSKLD---KKN------
Saccharomyces_cerevisiae_TFIIB KDNEVQAAFAKITMLCDAAELPKIVKDCAKEAYKLCH---DEK------
Methanosarcina_acetivorans_TFB TERNLAFALSELDRMASALGLPRTVRETAAVVYRKAV---DKN------
Sulfolobus_solfataricus_TFB_AA IDRNLAQAMNELERIGNLLNLPKSVKDEAALIYRKAV---EKG------
Populus_balsamifera_TFIIB9 QSCELSKIRFLSPPFQETYTHPYQLRKKKKKTHQEKQ---NRK------
Arabidopsis_thaliana_BRF4_At4g LGPMIKRVVDHLDLELREFDLVGLFTKTVTNSPRLTDVDRDKKEKIIKQG
Lycopersicon_esculentum_TFIIB_AF FTKQWKELPKSNAYHGSDPAIVLLFEPSLLNFVAEYKIICAFP------

Arabidopsis_thaliana_BRF1_At2g ------FTKGRSKELVFSSCLYLTCRQFKLAVLLI------DF
Arabidopsis_thaliana_BRF2_At3g ------FTKGRRTELVQASCLYLTCRELNIALLLI------DF
Arabidopsis_thaliana_BRF3_At2g ------FTKGRRAELVQSSCLYLACRDMKISLLFI------DF
Drosophila_melanogaster_BRF_AA ------LTRGRKSTHIYAACVYMTCRTEGTSHLLI------DI
Homo_sapiens_BRF_NP_001510.2 ------LTRGRKMAHVIAACLYLVCRTEGTPHMLL------DL
Saccharomyces_cerevisiae_BRF_N ------FVQGRRSQNVIASCLYVACRKEKTHHMLI------DF
Populus_balsamifera_TFIIB7/pBrp ------CLRNRSVEALATAALVQAIREAQEPRTLQVGTLVNGEEI
Arabidopsis_thaliana_TFIIB5_At ------CLRNRSVEALATACLVQAIREAQEPRTLQ------EI
Lycopersicon_esculentum_AAG011 ------CLRNRSVEALATAALVHAIREAQEPRTLQ------EI
Populus_balsamifera_TFIIB6 ------VREYKLLD------V
Populus_balsamifera_TFIIB4 ------VSTVSLKHPLMNMALNLNYHVNK------V
Populus_balsamifera_TFIIB5 ------FLFYFLWEKND------V
Oryza_sativa_TFIIB1_AF464908 ------SIRGRNQDAILAACLYIACRQEDRPRTVK------EI
Triticum_aestivum_TC68795 ------SIRGRNQDAILAACLYIACRQEDRPRTVK------EI
Mesembryanthemum_crystallinum_ ------SSRGRNQDAILAACLYIACRQEDKPRTVK------EI
Vitis_vinifera_TFIIB_TC19782 ------STRGRNQDALLAACLYIACRQEDKPRTVK------EI
Populus_balsamifera_TFIIB1 ------SSRGRNQDALLAACLYIACRQEDKPRTVK------EI
Citrus_sinensis_TFIIB_CB292941 ------SSRGRNQDALLAACLYIACRQEDKPRTVK------EI
Glycine_max_TFIIB_U31097 ------SSRGRNQDALLAACLYIACRQEDKPRTVK------EI
Medicago_truncatula_TC86832 ------SSRGRNQDALLAACLYIACRQEDKPRTVK------EI
Arabidopsis_thaliana_TFIIB2_At ------SSRGRNQDALLAACLYIACRQEDKPRTVK------EI
Arabidopsis_thaliana_TFIIB1_At ------SSRGRNQDALYAACLYIACRQEDKPRTIK------EI
Lycopersicon_esculentum_TFIIB_T ------CTRGRNLDALVAACIYIACRQEGKPRTVK------EI
Solanum_tuberosum_TFIIB1_TC587 ------CTRGRNLDALVAACIYIACRQEGKPRTVK------EI
Populus_balsamifera_TFIIB3 ------SSRGRNQDALLAACLYIACRQEDKPRTVK------EI
Populus_balsamifera_TFIIB8 ------WFAKEE------DI
Arabidopsis_thaliana_TFIIB4_At ------YARGKERNVLFAACIYIACRDNDMTRTMR------EI
Populus_balsamifera_TFIIB2 ------TCVMRKRDSICAACLFISSRENKLPRTLN------EI
Oryza_sativa_TFIIB2_AAN59779 ------CARVRNRDAAYAACLHIACRNEGNPRTLK------EL
Drosophila_melanogaster_TFIIB_ ------NLKGRSNDAKASACLYIACRQEGVPRTFK------EI
Homo_sapiens_TFIIB_NM_001514 ------SLKGRANDAIASACLYIACRQEGVPRTFK------EI
Arabidopsis_thaliana_TFIIB3_At ------LRRGKKLNAICAASVSTACRELQLSRTLK------EI
Arabidopsis_thaliana_TFIIB6_At ------LRGGKSLRGLCAASVSRACRQVNIPKTLK------EI
Saccharomyces_cerevisiae_TFIIB ------TLKGKSMESIMAASILIGCRRAEVARTFK------EI
Methanosarcina_acetivorans_TFB ------LIRGRSIEGVAAAALYAACRQCSVPRTLD------EI
Sulfolobus_solfataricus_TFB_AA ------LVRGRSIESVVAAAIYAACRRMKLARTLD------EI
Populus_balsamifera_TFIIB9 ------KYSANVLDHLTGDTICIDCGLVLISYYVDEEPEWRTFGI
Arabidopsis_thaliana_BRF4_At4g TFLMNCALKWFLSTGRRPMPLVVAVLAFVVQVNGVKVKIDD---LAKDAS
Lycopersicon_esculentum_TFIIB_AF ------STDFDKATEFDIYYSREGCWKIAEEMCFG------

Arabidopsis_thaliana_BRF1_At2g SSYLRVSV--YDLGSVYLQLCDMLYITENH------NYEKLVDPSIFIP
Arabidopsis_thaliana_BRF2_At3g SSYLRVSV--YELGSVYLQLCEMLYLVENR------NYEKLVDPSIFMD
Arabidopsis_thaliana_BRF3_At2g SSYLRVSV--YELGSVYLQLCEMLYLVQNK------NYEELVDPSIFIP
Drosophila_melanogaster_BRF_AA SDVQQICS--YELGRTYLKLSHALCIN------IPSLDPCLYIM
Homo_sapiens_BRF_NP_001510.2 SDLLQVNV--YVLGKTFLLLARELCIN------APAIDPCLYIP
Saccharomyces_cerevisiae_BRF_N SSRLQVSV--YSIGATFLKMVKKLHITE------LPLADPSLFIQ
Populus_balsamifera_TFIIB7/pBrp SIAANVPQ--KEIGKYIKILGEALQLSQP------INSNSISVHMP
Arabidopsis_thaliana_TFIIB5_At SIAANVQQ--KEIGKYIKILGEALQLSQP------INSNSISVHMP
Lycopersicon_esculentum_AAG011 SVAANLPQ--KEIGKYIKILGEALQLSQP------INSNSISVHMP
Populus_balsamifera_TFIIB6 EFGG------F------
Populus_balsamifera_TFIIB4 CFLG------ISRVLPGN------EPYILFHPDLS-
Populus_balsamifera_TFIIB5 CQLW------IRLLAIVC------RLGKCMLNSCW-
Oryza_sativa_TFIIB1_AF464908 CSVANGAT-KKEIGRAKEFIVKQLEVEMG------QSMEMGTIHAGDFLR
Triticum_aestivum_TC68795 CSVANGAT-KKEIGRAKEFIVKQLEVEMG------QSMEMGTIHAGDFLR
Mesembryanthemum_crystallinum_ CSVANGAS-KKEIGRAKEYIVKQLELEMG------KSVTIGTIHAADFLR
Vitis_vinifera_TFIIB_TC19782 CSVANGAT-KKEIGRAKEYIVKQLEAEKG------QSVEMGTIHAGDFMR
Populus_balsamifera_TFIIB1 CSVANGAT-KKEIGRAKEYIVKQLGLETG------QSVEMGTIHAGDFMR
Citrus_sinensis_TFIIB_CB292941 CSVANGAT-KKEIGRAKEYIVKQLGLETG------QSVEMGTIHAGDFMR
Glycine_max_TFIIB_U31097 CSVANGAT-KKEIGRAKEYIVKQLGLENG------NAVEMGTIHAGDFMR
Medicago_truncatula_TC86832 CSIANGAT-KKEIGRAKEYIVKQLGLENGG-----QSVEMGTIHAGDFMR
Arabidopsis_thaliana_TFIIB2_At CSVANGAT-KKEIGRAKEYIVKQLGLETG------QLVEMGTIHAGDFMR
Arabidopsis_thaliana_TFIIB1_At CVIANGAT-KKEIGRAKDYIVKTLGLEPG------QSVDLGTIHAGDFMR
Lycopersicon_esculentum_TFIIB_T CSIANGAS-KKEIGRAKEFIVKQLKVEMG------ESMEMGTIHAGDYLR
Solanum_tuberosum_TFIIB1_TC587 CSIANGAS-KKEIGRAKEFIVKQLKVEMG------ESMEMGTIHAGDYLR
Populus_balsamifera_TFIIB3 CSVANGAT-KKEIGRAKEYIVKQLGLEAG------QSVEMGTIHAGDFMR
Populus_balsamifera_TFIIB8 KNLHSKHT-NLDEGINICLRLKEAPPHN------NEQYTA
Arabidopsis_thaliana_TFIIB4_At SRFANKAS-ISDISETVGFIAEKLEINKN------WYMSIETANFIK
Populus_balsamifera_TFIIB2 SSVVYGVT-KKEINKAVQSIKRHVELED------MGTLNPSELVR
Oryza_sativa_TFIIB2_AAN59779 ASVMRDCQDKKEIGRMERIIRRHLGEEAG------TAMEMGVVRAADYMS
Drosophila_melanogaster_TFIIB_ CAVSKISK--KEIGRCFKLTLKALETSVD------LITTADFMC
Homo_sapiens_TFIIB_NM_001514 CAVSRISK--KEIGRCFKLILKALETSVD------LITTGDFMS
Arabidopsis_thaliana_TFIIB3_At AEVANGVD-KKDIRKESLVIKRVLESHQTS-----VSASQAIINTGELVR
Arabidopsis_thaliana_TFIIB6_At SAVAN-VD-MKEINK---AVKLLGDSFG------
Saccharomyces_cerevisiae_TFIIB QSLIHVKT--KEFGKTLNIMKNILRGKSEDGFLKIDTDNMSGAQNLTYIP
Methanosarcina_acetivorans_TFB EEVSRVSR--KEIGRTYRFISRELALKLM------PTSPIDYVP
Sulfolobus_solfataricus_TFB_AA AQYTKANR--KEVARCYRLLLRELDVSVP------VSDPKDYVT
Populus_balsamifera_TFIIB9 EDNINEYD-PNHLGSLSDPLLTHANLATT------ISKPAKGGTTAV
Arabidopsis_thaliana_BRF4_At4g VSLTTCKTRYKELSEKLVKVAEEVGLPWAKD---VTVKNVLKHSGTLFAL
Lycopersicon_esculentum_TFIIB_AF SRTIFPKSGIHVNGVVYWMTSKNILAFDLTKG---RTQLLESYGTRGFLG

Arabidopsis_thaliana_BRF1_At2g RFSNMLLKGAHN--NKLVLTATHIIASMKRDWMQTG-RKPSGICGAALYT
Arabidopsis_thaliana_BRF2_At3g RFSNSLLKGKNN--KDVVATARDIIASMKRDWIQTG-RKPSGICGAALYT
Arabidopsis_thaliana_BRF3_At2g RFTNSLLKGAHAKAKDVANTAKNIISSMKRDWIQTG-RKPSGICGAAIYM
Drosophila_melanogaster_BRF_AA RFANRLQLGAKT--HEVSMTALRIVQRMKKDCMHSG-RRPTGLCGAALLI
Homo_sapiens_BRF_NP_001510.2 RFAHLLEFGEKN--HEVSMTALRLLQRMKRDWMHTG-RRPSGLCGAALLV
Saccharomyces_cerevisiae_BRF_N HFAEKLDLADKK--IKVVKDAVKLAQRMSKDWMFEG-RRPAGIAGACILL
Populus_balsamifera_TFIIB7/pBrp RFCTLLQLNKSA-----QELATHIGEVVINKCFCTR-RNPISISAAAIYL
Arabidopsis_thaliana_TFIIB5_At RFCTLLQLNKSA-----QELATHIGEVVINKCFCTR-RNPISISAAAIYL
Lycopersicon_esculentum_AAG011 RFCTLLQLNKSA-----QELATHIGEVIINKCFCTR-RNPISISAAAIYL
Populus_balsamifera_TFIIB6 ------
Populus_balsamifera_TFIIB4 -FSSILKMYRHI-----LMKV------
Populus_balsamifera_TFIIB5 -LWNFIRQHHGK-----IK------
Oryza_sativa_TFIIB1_AF464908 RFCSTLGMNNQA-----VKAAQEAVQR-SEELDIR--RSPISIAAAVIYM
Triticum_aestivum_TC68795 RFCSTLGMNNTA-----VKAAQEAVQR-SEELDIR--RSPISIAAAVIYM
Mesembryanthemum_crystallinum_ RFCSNLGMNNQA-----MKAAQEAVQK-SEEIDIR--RSPISIAAAVIYI
Vitis_vinifera_TFIIB_TC19782 RFCSNLGMTNQV-----VKAAQEAVQK-SEEFDIR--RSPVSIAAAVIYI
Populus_balsamifera_TFIIB1 RFCSNLGMSNHT-----VKAATEAVKT-SEQFDIR--RSPISIAAAVIYI
Citrus_sinensis_TFIIB_CB292941 RFCSNLGMNNQA-----VKAAQEAVQK-SEEFDIR--RSPISIAAAVIYI
Glycine_max_TFIIB_U31097 RFCSNLCMNNQA-----VKAAQEAVQK-SEEFDIR--RSPISIAAAVIYI
Medicago_truncatula_TC86832 RFCSNLGMNHQA-----VKAAQESVQK-SEEFDIR--RSPISIAAAVIYI
Arabidopsis_thaliana_TFIIB2_At RFCSNLGMTNQT-----VKAAQESVQK-SEEFDIR--RSPISIAAAVIYI
Arabidopsis_thaliana_TFIIB1_At RFCSNLAMSNHA-----VKAAQEAVQK-SEEFDIR--RSPISIAAVVIYI
Lycopersicon_esculentum_TFIIB_T RFCSNLGMNHEE-----IKVVQETVQK-AEEFDIR--RSPISIAAAIIYM
Solanum_tuberosum_TFIIB1_TC587 RFCSNLGMNHEE-----IKVVQETVQK-AEEFDIR--RSPISIAAAIIYM
Populus_balsamifera_TFIIB3 RFCSNLGMSNHT-----VKAATEAVKT-SEQFDIR--RSPISIAAAVIYI
Populus_balsamifera_TFIIB8 TFLLLFVTN------ELKRDG------GGKVLL
Arabidopsis_thaliana_TFIIB4_At RFCSIFRLDKEA-----VEAALEAAESYDYMTNGR--RAPVSVAAGIVYV
Populus_balsamifera_TFIIB2 RFCSNLGMKNHA-----VKAVHEAVEK-IQDVDIR--RNPKSVLAAIIYT
Oryza_sativa_TFIIB2_AAN59779 RFGSRLGMGKPE-----VREAQRAAQTLEDKLDVR--RNPESIAAAIIYM
Drosophila_melanogaster_TFIIB_ RFCANLDLPNMV-----QRAATHIAKK-AVEMDIVPGRSPISVAAAAIYM
Homo_sapiens_TFIIB_NM_001514 RFCSNLCLPKQV-----QMAATHIARK-AVELDLVPGRSPISVAAAAIYM
Arabidopsis_thaliana_TFIIB3_At RFCSKLDISQRE-----IMAIPEAVEK-AENFDIR--RNPKSVLAAIIFM
Arabidopsis_thaliana_TFIIB6_At ------
Saccharomyces_cerevisiae_TFIIB RFCSHLGLPMQV-----TTSAEYTAKKCKEIKEIAG-KSPITIAVVSIYL
Methanosarcina_acetivorans_TFB RFCSGLNLKGEV-----QSKSVEILRQ-ASEKELTSGRGPTGVAAAAIYI
Sulfolobus_solfataricus_TFB_AA RIANLLGLSGAV-----MKTAAEIIDK-AKGSGLTAGKDPAGLAAAAIYI
Populus_balsamifera_TFIIB9 AISKNWLINRQS------NPDGDLIQGFEIIETMARRRNREAMPAACLFI
Arabidopsis_thaliana_BRF4_At4g MEAKSMKKRKQGTGKELVRTDGFCVEDLVMDCLSKE-SMYCYDDDARQDT
Lycopersicon_esculentum_TFIIB_AF TFSGKLCKVDVSG----DIISLNVLANTHSNTMQIGSQIKMWSEKEIVVL

Arabidopsis_thaliana_BRF1_At2g AALSHGIKCSKTD--IVNIVHICEATL--TKRLIEFGDTEAASLTADELS
Arabidopsis_thaliana_BRF2_At3g AALSHGIKCSKTD--IVNIVHICEATL--TKRLIEFGDTDSGNLNVNELR
Arabidopsis_thaliana_BRF3_At2g AALSHGIMYSRAD--IAKVVHMCEATI--TKRLNEFANTEAGSLTLLVGR
Drosophila_melanogaster_BRF_AA AARMHDFSRTMLD--VIGVVKIHESTL--RKRLSEFAETPSGGLTLEEFM
Homo_sapiens_BRF_NP_001510.2 AARMHDFRRTVKE--VISVVKVCESTL--RKRLTEFEDTPTSQLTIDEFM
Saccharomyces_cerevisiae_BRF_N ACRMNNLRRTHTE--IVAVSHVAEETL--QQRLNEFKNTKAAKLSVQKFR
Populus_balsamifera_TFIIB7/pBrp ACQLEDKRKTQAE--ICKVTGLTEVTL--RKVYKELLENWGDLLPKNYTP
Arabidopsis_thaliana_TFIIB5_At ACQLEDKRKTQAE--ICKITGLTEVTL--RKVYKELLENWDDLLPSNYTP
Lycopersicon_esculentum_AAG011 ACQLEDKRKTQAE--ICKVTGLTEVTL--RKVYKELLENWDDLLPSSYKP
Populus_balsamifera_TFIIB6 ------
Populus_balsamifera_TFIIB4 ------
Populus_balsamifera_TFIIB5 ------
Oryza_sativa_TFIIB1_AF464908 ITQLSDDKKPLKD--ISLATGVAEGTI--RNSYKDLYPYASRLIPNTYAK
Triticum_aestivum_TC68795 ITQLSEDKKPLKD--ISLATGVAEGTI--RNSYKDLYPYAARLIPNSYAK
Mesembryanthemum_crystallinum_ ITQLSEEKKPLRD--ISLATGVAEGTI--RNAYKDLYPHISKIIPVWYAT
Vitis_vinifera_TFIIB_TC19782 ITQLSDEKKLLRD--ISIATGVAEGTI--RNSYKDLYPHISRIIPSWYAK
Populus_balsamifera_TFIIB1 ITQLSDDKKPLRD--ISLATGVAEGTI--RNSYKDLYPHVSKIIPSWYAS
Citrus_sinensis_TFIIB_CB292941 ITQLSDDKKPLKD--ISVATGVAEGTI--RNSYKDLYPHVSKIIPNWYAK
Glycine_max_TFIIB_U31097 ITQLSDDKKPLKD--ISLATGVAEGTI--RNSYKDLYPHVSKIIPNWYAK
Medicago_truncatula_TC86832 ITQLSDDKKPLKD--ISVATGVAEGTI--RNSYKDLYPHVSKIIPNWYAK
Arabidopsis_thaliana_TFIIB2_At ITQLSDEKKPLRD--ISVATGVAEGTI--RNSYKDLYPHLSKIIPAWYAK
Arabidopsis_thaliana_TFIIB1_At ITQLSDDKKTLKD--ISHATGVAEGTI--RNSYKDLYPHLSKIAPSWYAK
Lycopersicon_esculentum_TFIIB_T ITQLSDSKKPVLR-DISVATTVAEGTI--KNAYKDLYPHASKIIPEWYVK
Solanum_tuberosum_TFIIB1_TC587 ITQLSDSKKPVLRADISVATTVAEGTI--KNAYKDLYPHASKIIPEWYVK
Populus_balsamifera_TFIIB3 ITQLSDDKKPLRD--ISLATGVAEGTI--RNSYKDLYPHVSKIIPAWYAN
Populus_balsamifera_TFIIB8 LLNLCMDRKILE-----SGKQPSEAKL--YALSVPTDTHRPTMLPE----
Arabidopsis_thaliana_TFIIB4_At IARLSYEKHLLKG--LIEATGVAENTI--KGTYGDLYPNLPTIVPTWFAN
Populus_balsamifera_TFIIB2 ITQLSDEKKPLRD--ISLAADVAEGTI--KKSFKDISPHVSRLVPKWYAR
Oryza_sativa_TFIIB2_AAN59779 VVQRAGAQTSARD--VSKASGVAEATI--KEACKELSQHEELLFSS----
Drosophila_melanogaster_TFIIB_ ASQASEHKRSQKE--IGDIAGVADVTI--RQSYKLMYPHAAKLFPEDFKF
Homo_sapiens_TFIIB_NM_001514 ASQASAEKRTQKE--IGDIAGVADVTI--RQSYRLIYPRAPDLFPTDFKF
Arabidopsis_thaliana_TFIIB3_At ISHISQTNRKPIR-EIGIVAEVVENTI--KNSVKDMYPYALKIIPNWYAC
Arabidopsis_thaliana_TFIIB6_At ------
Saccharomyces_cerevisiae_TFIIB NILLFQIPITAAK--VGQTLQVTEGTI--KSGYKILYEHRDKLVDPQLIA
Methanosarcina_acetivorans_TFB ASILCGERRTQRE--VADVAGVTEVTI--RNRYKELAEELDIEIIL----
Sulfolobus_solfataricus_TFB_AA ASLLHDERRTQKE--IAQVAGVTEVTV--RNRYKELTQELKISIPTQ---
Populus_balsamifera_TFIIB9 SCKENKLPRTLKE------TCSAASCN--GGGGGGLTMKEACTIGGYDRR
Arabidopsis_thaliana_BRF4_At4g MSRYFDVEGERQLSLCNYDDNISENQL--STKYNEFEDRVCGGTLAKRSQ
Lycopersicon_esculentum_TFIIB_AF DSEIVGDGAARNHTVLHVDSDIMVVLCGRRTCSYDFKSRLTKFLSSKVGI

Arabidopsis_thaliana_BRF1_At2g KTEREKETAA-----
Arabidopsis_thaliana_BRF2_At3g ERESHKRSFT-----
Arabidopsis_thaliana_BRF3_At2g ILLLISEQR------
Drosophila_melanogaster_BRF_AA TVDLER-EQDP----
Homo_sapiens_BRF_NP_001510.2 KIDLEE-ECDP----
Saccharomyces_cerevisiae_BRF_N ENDVEDGEARP----
Populus_balsamifera_TFIIB7/pBrp AVPPEKAFPT-----
Arabidopsis_thaliana_TFIIB5_At AVPPEKAFPT-----
Lycopersicon_esculentum_AAG011 VVPPEKAFPS-----
Populus_balsamifera_TFIIB6 ------
Populus_balsamifera_TFIIB4 ------
Populus_balsamifera_TFIIB5 ------
Oryza_sativa_TFIIB1_AF464908 EEDLKNLCTP-----
Triticum_aestivum_TC68795 EEDLKNLCTP-----
Mesembryanthemum_crystallinum_ EDDLKTSAAH-----
Vitis_vinifera_TFIIB_TC19782 EEDLRNLCSP-----
Populus_balsamifera_TFIIB1 EEDLKNLCSP-----
Citrus_sinensis_TFIIB_CB292941 EEDLKNLCSP-----
Glycine_max_TFIIB_U31097 EEDLKNLCSP-----
Medicago_truncatula_TC86832 EEDLKNLCSP-----
Arabidopsis_thaliana_TFIIB2_At EEDLKNLQSP-----
Arabidopsis_thaliana_TFIIB1_At EEDLKNLSSP-----
Lycopersicon_esculentum_TFIIB_T DKDLKSLCSPKA---
Solanum_tuberosum_TFIIB1_TC587 DKDLKNLCSPKA---
Populus_balsamifera_TFIIB3 EEDLKNLSSP-----
Populus_balsamifera_TFIIB8 ------
Arabidopsis_thaliana_TFIIB4_At ANDLKNLGAP-----
Populus_balsamifera_TFIIB2 EEDIRRIRIP-----
Oryza_sativa_TFIIB2_AAN59779 ------
Drosophila_melanogaster_TFIIB_ TTPIDQLPQM-----
Homo_sapiens_TFIIB_NM_001514 DTPVDKLPQL-----
Arabidopsis_thaliana_TFIIB3_At ESDIIKRLDG-----
Arabidopsis_thaliana_TFIIB6_At ------
Saccharomyces_cerevisiae_TFIIB NGVVSLDNLPGVEKK
Methanosarcina_acetivorans_TFB ------
Sulfolobus_solfataricus_TFB_AA ------
Populus_balsamifera_TFIIB9 DHES------
Arabidopsis_thaliana_BRF4_At4g GSSQSMWQRR-----
Lycopersicon_esculentum_TFIIB_AF LDRCFPYVNSLVSL-

TBP Alignment

The N-terminal imperfect repeat is highlighted in yellow, and the C-terminal imperfect repeat is shaded in grey (Nikolov et al., 1995).

CLUSTAL X (1.83) multiple sequence alignment

Arabidopsis_thaliana_TBP1_At3g ------MTDQG------
Arabidopsis_thaliana_TBP2_At1g ------MADQG------
Medicago_truncatula_TBP_TC8687 ------MADQG------
Medicago_truncatula_TBP_TC8871 ------MADQG------
Zea_mays_TBP_TC171023 ------MAEPG------
Zea_mays_TBP_X90652.1 ------MAEPR------
Oryza_sativa_TBP_TC116362 ------MAAEAAAA------
Triticum_aestivum_TBP_TC88519 ------MAEAAA------
Triticum_aestivum_TBP_TC72701 ------MAEAT------
Hordeum_vulgare_TBP_TC78738 ------MAEAA------
Sorghum_bicolor_TBP_TC54739 ------MAEPG------
Zea_mays_TBP_TC182979 ------MAEPG------
Mesembryanthemum_crystallinum_ ------MAEQG------
Populus_trichocarpa_TBP_Contig1 ------MAEQGG------
Populus_trichocarpa_TBP_Contig2 ------MAEQGG------
Glycine_max_TBP_TC146463 ------MADQG------
Solanum_tuberosum_TBP_TC74102 ------MADQG------
Triticum_aestivum_TBP_TC90291 ------MAAAAVDPMVLGLGTSGGASGSGV
Chlamydomonas_reinhardtii_TBP_ ------MMAAAEAPPATP------
Saccharomyces_cerevisiae_TBP_N ------QSEED------
Homo_sapiens_TBP_NP_003185.1 ------QATQGTSGQAPQLFH--SQTLTTAP
Drosophila_melanogaster_TBP_NP ------MPMSERSVGGSGAGGG--GDALSNIH
Drosophila_melanogaster_TRF_Q2 ------MQFHFKVADAERDRDNVAATSNAAAN----
Homo_sapiens_TBP_L1_NP_004856 ------MD------
Drosophila_melanogaster_TRF2_A VASNGNGLITAKMDLLEEEVMQSITVIDDDDEEKKEVAED------

Arabidopsis_thaliana_TBP1_At3g ------LEGSNPVDLSKHPSGIVPTL------
Arabidopsis_thaliana_TBP2_At1g ------TEGSQPVDLTKHPSGIVPTL------
Medicago_truncatula_TBP_TC8687 ------LEGSQPVDLSKHPSGIVPTL------
Medicago_truncatula_TBP_TC8871 ------LEGSQPVDLAKHPSGIVPTL------
Zea_mays_TBP_TC171023 ------LEDSQPVDLSKHPSGIVPTL------
Zea_mays_TBP_X90652.1 ------LEDSQPVDLSKHPSGIVPTL------
Oryza_sativa_TBP_TC116362 ------LEGSEPVDLAKHPSGIIPTL------
Triticum_aestivum_TBP_TC88519 ------LEGSEPVDLTKHPSGIIPTL------
Triticum_aestivum_TBP_TC72701 ------LEGSEPVDLSKHPSGIIPTL------
Hordeum_vulgare_TBP_TC78738 ------LEGSQPVDLSKHPSGIVPTL------
Sorghum_bicolor_TBP_TC54739 ------LEGSQPVDLSKHPSGIVPTLHFPVLGASKRANIVLN
Zea_mays_TBP_TC182979 ------LEGSQPVDLSKHPSGIVPTL------
Mesembryanthemum_crystallinum_ ------LEGSQPVDPIKHPSGIVPTL------
Populus_trichocarpa_TBP_Contig1 ------LEGSQPVDLSKHPSGIVPTL------
Populus_trichocarpa_TBP_Contig2 ------LEGSQPVDLSKHPSGIVPIL------
Glycine_max_TBP_TC146463 ------LEGSQPVDLQKHPSGIVPTL------
Solanum_tuberosum_TBP_TC74102 ------LEGSQPVDLTKHPSGIVPTL------
Triticum_aestivum_TBP_TC90291 VGGGVGRAGGGGAVMEGAQPVDLARHPSGIVPVL------
Chlamydomonas_reinhardtii_TBP_ ------QLSAADVEAEMAAHVSGIKPQL------
Saccharomyces_cerevisiae_TBP_N ------IKRAAPESEKDTSATSGIVPTL------
Homo_sapiens_TBP_NP_003185.1 LPGTTPLYPSPMTPMTPITPATPASESSGIVPQL------
Drosophila_melanogaster_TBP_NP QT------MGPSTPMTPATPGSADPGIVPQL------
Drosophila_melanogaster_TRF_Q2 ------PHAALQPQQPVALVEPKDAQHEIR------
Homo_sapiens_TBP_L1_NP_004856 ------ADSDVALD------
Drosophila_melanogaster_TRF2_A ------EEESSNNAKPIDLHQPIADNEHELD------

Arabidopsis_thaliana_TBP1_At3g ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Arabidopsis_thaliana_TBP2_At1g ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Medicago_truncatula_TBP_TC8687 ------QNIVSTVNLDCKLELKSIALQARNAEYNPKR
Medicago_truncatula_TBP_TC8871 ------QNIVSTVNLDTKLDLKAIALQARNAEYNPKR
Zea_mays_TBP_TC171023 ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Zea_mays_TBP_X90652.1 ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Oryza_sativa_TBP_TC116362 ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Triticum_aestivum_TBP_TC88519 ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Triticum_aestivum_TBP_TC72701 ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Hordeum_vulgare_TBP_TC78738 ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Sorghum_bicolor_TBP_TC54739 SWGFGGNYLVVILSPRVDFRNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Zea_mays_TBP_TC182979 ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Mesembryanthemum_crystallinum_ ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Populus_trichocarpa_TBP_Contig1 ------QNIVSTVNLDCKLELKQIALQARNAEYNPKR
Populus_trichocarpa_TBP_Contig2 ------QNIVSTVNLDCRLDLKQIALQARNAEYNPKR
Glycine_max_TBP_TC146463 ------QNIVSTVNLDCKLDLKTIALQARNAEYNPKR
Solanum_tuberosum_TBP_TC74102 ------QNIVSTVNLDCKLDLKAIALQARNAEYNPKR
Triticum_aestivum_TBP_TC90291 ------QNIVSTVNLDCRLDLKQIALQARNAEYNPKR
Chlamydomonas_reinhardtii_TBP_ ------QNVVATVNLGTKLDLKEIAMHARNAEYNPKR
Saccharomyces_cerevisiae_TBP_N ------QNIVATVTLGCRLDLKTVALHARNAEYNPKR
Homo_sapiens_TBP_NP_003185.1 ------QNIVSTVNLGCKLDLKTIALRARNAEYNPKR
Drosophila_melanogaster_TBP_NP ------QNIVSTVNLCCKLDLKKIALHARNAEYNPKR
Drosophila_melanogaster_TRF_Q2 ------LQNIVATFSVNCELDLKAINSRTRNSEYSPKR
Homo_sapiens_TBP_L1_NP_004856 ------ILITNVVCVFRTRCHLNLRKIALEGANVIYK-RD
Drosophila_melanogaster_TRF2_A ------IVINNVVCSFSVGCHLKLREIALQGSNVEYR-RE

Arabidopsis_thaliana_TBP1_At3g FAAVIMRIREPKTTALIFASGKMVCTGAKSEDFSKMAARKYARIVQKLGF Arabidopsis_thaliana_TBP2_At1g FAAVIMRIREPKTTALIFASGKMVCTGAKSEHLSKLAARKYARIVQKLGF Medicago_truncatula_TBP_TC8687 FAAVIMRIREPKTTALIFASGKMVCTGAKSEVQSKLAARKYARIIQKLGF Medicago_truncatula_TBP_TC8871 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Zea_mays_TBP_TC171023 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Zea_mays_TBP_X90652.1 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Oryza_sativa_TBP_TC116362 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Triticum_aestivum_TBP_TC88519 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Triticum_aestivum_TBP_TC72701 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Hordeum_vulgare_TBP_TC78738 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF

Sorghum_bicolor_TBP_TC54739 FAAVIMRIREPKTTALIFASGKM------YARIIQKLGF Zea_mays_TBP_TC182979 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Mesembryanthemum_crystallinum_ FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Populus_trichocarpa_TBP_Contig1 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Populus_trichocarpa_TBP_Contig2 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Glycine_max_TBP_TC146463 FAAVIMRIRDPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Solanum_tuberosum_TBP_TC74102 FAAVIMRIREPKTTALIFASGKMVCTGAKSEQQSKLAARKYARIIQKLGF Triticum_aestivum_TBP_TC90291 FAAVIMRIRDPKTTALIFASGKMVCTGAKSEEHSKLAARKYARIVQKLGF Chlamydomonas_reinhardtii_TBP_ FAAVIMRIREPKTTALIFASGKMVCTGAKSEDDSRTAARRYAKIVQKLGF Saccharomyces_cerevisiae_TBP_N FAAVIMRIREPKTTALIFASGKMVVTGAKSEDDSKLASRKYARIIQKIGF Homo_sapiens_TBP_NP_003185.1 FAAVIMRIREPRTTALIFSSGKMVCTGAKSEEQSRLAARKYARVVQKLGF Drosophila_melanogaster_TBP_NP FAAVIMRIREPRTTALIFSSGKMVCTGAKSEDDSRLAARKYARIIQKLGF Drosophila_melanogaster_TRF_Q2 FRGVIMRMHSPRCTALIFRTGKVICTGARNEIEADIGSRKFARILQKLGF Homo_sapiens_TBP_L1_NP_004856 VGKVLMKLRKPRITATIWSSGKIICTGATSEEEAKFGARRLARSLQKLGF Drosophila_melanogaster_TRF2_A NGMVTMKLRHPYTTASIWSSGRITCTGATSESMAKVAARRYARCLGKLGF

Arabidopsis_thaliana_TBP1_At3g PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHAAFSSYEP------E Arabidopsis_thaliana_TBP2_At1g PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHSAFSSYEP------E Medicago_truncatula_TBP_TC8687 PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSVKYDTKLLLSISPG Medicago_truncatula_TBP_TC8871 NAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSVSY------Zea_mays_TBP_TC171023 PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEP------E Zea_mays_TBP_X90652.1 PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEP------E Oryza_sativa_TBP_TC116362 PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEP------E Triticum_aestivum_TBP_TC88519 PAKFKDFKIQNIVASCDVKFPIRLEGLAYSHGAFSSYEP------E Triticum_aestivum_TBP_TC72701 PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEP------E Hordeum_vulgare_TBP_TC78738 PAKFKDFKIQNIVASCDVKFPIRLEGLAYSHGAFSSYEP------E Sorghum_bicolor_TBP_TC54739 PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEP------E Zea_mays_TBP_TC182979 PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEP------E Mesembryanthemum_crystallinum_ PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEP------E Populus_trichocarpa_TBP_Contig1 AAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEP------E Populus_trichocarpa_TBP_Contig2 AAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEP------E Glycine_max_TBP_TC146463 PAKFKDFKIQNIVGSCDVKFPIRLEGLAYSHGAFSSYEP------E Solanum_tuberosum_TBP_TC74102 PAKFKDFKIQNIVGSCDVKFPIRLEGLAYAHGAFSSYEP------E Triticum_aestivum_TBP_TC90291 PATFKDFKIQNIVASCDVKFPIRLEGLAYSHGAFSSYEP------E Chlamydomonas_reinhardtii_TBP_ PATFKEFKIQNIVGSCDVKFPIRLEGLAYAHSLFASYEP------E Saccharomyces_cerevisiae_TBP_N AAKFTDFKIQNIVGSCDVKFPIRLEGLAFSHGTFSSYEP------E Homo_sapiens_TBP_NP_003185.1 PAKFLDFKIQNMVGSCDVKFPIRLEGLVLTHQQFSSYEP------E Drosophila_melanogaster_TBP_NP PAKFLDFKIQNMVGSCDVKFPIRLEGLVLTHCNFSSYEP------E Drosophila_melanogaster_TRF_Q2 PVKFMEYKLQNIVATVDLRFPIRLENLNHVHGQFSSYEP------E Homo_sapiens_TBP_L1_NP_004856 QVIFTDFKVVNVLAVCNMPFEIRLPEFTKNNRPHASYEP------E Drosophila_melanogaster_TRF2_A PTRFLNFRIVNVLGTCSMPWAIKIVNFSERHRENASYEP------E

Arabidopsis_thaliana_TBP1_At3g LFPGLIYRMKV--PKIVLLIFVSGKIVITGAKMRDETYKAFENIYPVLSE Arabidopsis_thaliana_TBP2_At1g LFPGLIYRMKL--PKIVLLIFVSGKIVITGAKMREETYTAFENIYPVLRE Medicago_truncatula_TBP_TC8687 EFEEIMYHYYQ--SHMTLALFPS--IIYLKSSILQKTGTSLETLVSKVCP Medicago_truncatula_TBP_TC8871 ----FIYSYLH--TSSSLD------VICIGISMAKRFRFFLKI------Zea_mays_TBP_TC171023 LFPGLIYRMKQ--PKIVLLIFVSGKIVLTGAKVREETYTAFENIYPVLAE Zea_mays_TBP_X90652.1 LFPGLIYRMKQ--PKIVLLIFVSGKIVLTGAKVREETYTAFENIYPVLAE Oryza_sativa_TBP_TC116362 LFPGLIYRMKQ--PKIVLLIFVSGKIVLTGAKVRDETYTAFENIYPVLTE Triticum_aestivum_TBP_TC88519 LFPGLIYRMRQ--PKIVLLIFVSGKIVLTGAKVREETYTAFENIYPVLTE Triticum_aestivum_TBP_TC72701 LFPGLIYRMKQ--PKIVLLIFVSGKIVLTGAKVREETYTAFENIYPVLTE Hordeum_vulgare_TBP_TC78738 LFPGLIYRMRQ--PKIVLLIFVSGKIVLTGAKVREETYTAFENIYPVLTE Sorghum_bicolor_TBP_TC54739 LFPGLIYRMKQ--PKIVLLIFVSGKIVLTGAKVREETYTAFENIYPVLTE Zea_mays_TBP_TC182979 LFPGLIYRMKQ--PKIVLLIFVSGKIVLTGAKVREETYTAFENIYPVLSE Mesembryanthemum_crystallinum_ LFPGLIYRMKQ--PKIVLLIFVSGKIVLTGAKVREETYTAFENIYPVLTE Populus_trichocarpa_TBP_Contig1 LFPGLIYRMKQ--PKIVLLIFVSGKIVITGAKVREETYTAFENIYPVLAE Populus_trichocarpa_TBP_Contig2 IFPGLIYRMKQ--PKIVLLIFVSGKIVITGAKVRDETYTAFGNIYPVLTE Glycine_max_TBP_TC146463 LFPGLIYRMKQ--PKIVLLIFVSGKIVLTGAKVRDETYTAFENIYPVLTE Solanum_tuberosum_TBP_TC74102 LFPGLIYRMKQ--PKIVLLIFVSGKIVITGAKVRDETYTAFENIYPVLTE Triticum_aestivum_TBP_TC90291 LFPGLIYRMKQ--PKIVLLVFVSGKIVLTGAKVRDEIYAAFENIYPVLTE Chlamydomonas_reinhardtii_TBP_ LFPGLIYRMKQ--PKIVLLIFVSGKVVLTGAKTRGEIYQAYMNIYPTLIQ Saccharomyces_cerevisiae_TBP_N LFPGLIYRMVK--PKIVLLIFVSGKIVLTGAKQREEIYQAFEAIYPVLSE Homo_sapiens_TBP_NP_003185.1 LFPGLIYRMIK--PRIVLLIFVSGKVVLTGAKVRAEIYEAFENIYPILKG Drosophila_melanogaster_TBP_NP LFPGLIYRMVR--PRIVLLIFVSGKVVLTGAKVRQEIYDAFDKIFPILKK Drosophila_melanogaster_TRF_Q2 MFPGLIYRMVK--PRIVLLIFVNGKVVFTGAKSRKDIMDCLEAISPILLS Homo_sapiens_TBP_L1_NP_004856 LHPAVCYRIKS--LRATLQIFSTGSITVTGPNVK-AVATAVEQIYPFVFE Drosophila_melanogaster_TRF2_A LHPGVTYKMRDPDPKATLKIFSTGSVTVTAASVN-HVESAIQHIYPLVFD

Arabidopsis_thaliana_TBP1_At3g FRKIQQ------Arabidopsis_thaliana_TBP2_At1g FRKVQQ------Medicago_truncatula_TBP_TC8687 MENKTDTNQS------

Medicago_truncatula_TBP_TC8871 ------Zea_mays_TBP_TC171023 FRKVQQ------Zea_mays_TBP_X90652.1 FRKVQQWYVVLFYHVSIIVRS Oryza_sativa_TBP_TC116362 FRKVQQ------Triticum_aestivum_TBP_TC88519 FRKVQQ------Triticum_aestivum_TBP_TC72701 FRKVQQ------Hordeum_vulgare_TBP_TC78738 FRKVQQ------Sorghum_bicolor_TBP_TC54739 FRKVQQ------Zea_mays_TBP_TC182979 FRKIQQ------Mesembryanthemum_crystallinum_ FRKNQQ------Populus_trichocarpa_TBP_Contig1 FRKVQQWYTSQSLCPAL---- Populus_trichocarpa_TBP_Contig2 FRKVQQW------Glycine_max_TBP_TC146463 FRKNQQ------Solanum_tuberosum_TBP_TC74102 FRKNQQ------Triticum_aestivum_TBP_TC90291 YRKSQQ------Chlamydomonas_reinhardtii_TBP_ YKKGDAVVPTLPN------Saccharomyces_cerevisiae_TBP_N FRKM------Homo_sapiens_TBP_NP_003185.1 FRKTT------Drosophila_melanogaster_TBP_NP FKKQS------Drosophila_melanogaster_TRF_Q2 FRKT------Homo_sapiens_TBP_L1_NP_004856 SRKEIL------Drosophila_melanogaster_TRF2_A FRKQRS------

TAF6 Alignment

CLUSTAL X (1.83) multiple sequence alignment

Drosophila_melanogaster_TAF6_N MSGKPSKPSSPSSSMLYGSSISAESMKVIAESIGVGSLSDDAAKELAEDV Homo_sapiens_TAF6_NP_647476 MAEE------KKLKLSNTVLPSESMKVVAESMGIAQIQEETCQLLTDEV Arabidopsis_thaliana_TAF6b_1 ------MVTKESIEVIAQSIGLSTLSPDVSAALAPDV Arabidopsis_thaliana_TAF6b_3 ------MVTKESIEVIAQSIGLSTLSPDVSAALAP-- Arabidopsis_thaliana_TAF6b_2 ------MVTKESIEVIAQSIGLSTLSPDVSAALAPDV Arabidopsis_thaliana_TAF6_At1g ------MSIVPKETVEVIAQSIGITNLLPEAALMLAPDV Populus_trichocarpa_TAF6_Contig1 ------MSIVAKETIEVIAQSIGISNLSEDVALTLAPDV Hordeum_vulgare_TAF6_Barley1_1 ------MSIVPKETIEVIAQSIGIPSLPADVSAALAPDV Oryza_sativa_TAF6_BAB92191 ------MSIVPKETIEVIGQSVGIANLPADVSAALAPDV Populus_trichocarpa_TAF6_Contig2 ------MSSSIVAKEAIEVIAQGIGITNLSPDVSLTLAPDV Arabidopsis_thaliana_TAF6b_4 ------MVTKESIEVIAQSIGLSTLSPDVSAALAPDV Saccharomyces_cerevisiae_TAF6_ ------MSTQQQSYTIWSPQDTVKDVAESLGLENINDDVLKALAMDV Homo_sapiens_TAF6L_NP_006464 ------MSEREERRFVEIPRESVRLMAESTGL-ELSDEVAALLAEDV

Drosophila_melanogaster_TAF6_N SIKLKRIVQDAAKFMNHAKRQKLSVRDIDMSLKVRNVEPQYGFVAKD--- Homo_sapiens_TAF6_NP_647476 SYRIKEIAQDALKFMHMGKRQKLTTSDIDYALKLKNVEPLYGFHAQE--- Arabidopsis_thaliana_TAF6b_1 EYRVREVMQEAIKCMRHARRTTLMAHDVDSALHFRNLEPTSGSKS----- Arabidopsis_thaliana_TAF6b_3 ------DVDSALHFRNLEPTSGSKS----- Arabidopsis_thaliana_TAF6b_2 EYRVREVMQEAIKCMRHARRTTLMAHDVDSALHFRNLEPTSGSKS----- Arabidopsis_thaliana_TAF6_At1g EYRVREIMQEAIKCMRHSKRTTLTASDVDGALNLRNVEPIYGFASGG--- Populus_trichocarpa_TAF6_Contig1 EFRMRQIMQEAIKCMRHSKRTRLTTDDVDGALNLTNVEPIYGFASGG--- Hordeum_vulgare_TAF6_Barley1_1 EYRLREIMQEAIKCMRHAKRTVLTADDVDSALSLRNVEPVYGFASGD--- Oryza_sativa_TAF6_BAB92191 EYRLREIMQEAIKCMRHAKRTVLTADDVDSALSLRNVEPVYGFASGD--- Populus_trichocarpa_TAF6_Contig2 EYRLREIIQEAIKCMRHSRRTALTAHDVDTALILRNVEPIYGFGSGGDK- Arabidopsis_thaliana_TAF6b_4 EYRVREVMQEAIKCMRHARRTTLMAHDVDSALHFRNLEVS------Saccharomyces_cerevisiae_TAF6_ EYRILEIIEQAVKFKRHSKRDVLTTDDVSKALRVLNVEPLYGYYDGSEVN Homo_sapiens_TAF6L_NP_006464 CYRLREATQNSSQFMKHTKRRKLTVEDFNRALRWSSVEAVCGYGSQE---

Drosophila_melanogaster_TAF6_N --FIPFRFASGGGRELHFTEDKEIDLGEITSTN-SVKIPLDLTLRSHWFV Homo_sapiens_TAF6_NP_647476 --FIPFRFASGGGRELYFYEEKEVDLSDIINTP-LPRVPLDVCLKAHWLS Arabidopsis_thaliana_TAF6b_1 --MRFKR--APENRDLYFFDDKDVELKNVIEAP-LPNAPPDASVFSHWLA Arabidopsis_thaliana_TAF6b_3 --MRFKR--APENRDLYFFDDKDVELKNVIEAP-LPNAPPDASVFSHWLA Arabidopsis_thaliana_TAF6b_2 --MRFKR--APENRDLYFFDDKDVELKNVIEAP-LPNAPPDASVFSHWLA Arabidopsis_thaliana_TAF6_At1g -PFRFRK--AIGHRDLFYTDDREVDFKDVIEAP-LPKAPLDTEIVCHWLA Populus_trichocarpa_TAF6_Contig1 -ALQFKR--AIGHRDLFYVDDKDIDFKDVIEAP-LPKAPLDTAVVCHWLA Hordeum_vulgare_TAF6_Barley1_1 -PLRFKR--AVGHKDLFYIDDREVDFKEIIEAP-LPKAPLDTAVVAHWLA Oryza_sativa_TAF6_BAB92191 -PLRFKR--AVGHKDLFYIDDREVDFKEIIEAP-LPKAPLDTAVVAHWLA Populus_trichocarpa_TAF6_Contig2 VPLRFKRAAAAGHKDLYYIDDKDVNFKHVIEAP-PPKPPLDTSLTSHWLA Arabidopsis_thaliana_TAF6b_4 ------SSSLLLLFHTVDPDFDF---FLYS-LPLAP------K Saccharomyces_cerevisiae_TAF6_ KAVSFSKVNTSGGQSVYYLDEEEVDFDRLINEP-LPQVPRLPTFTTHWLA Homo_sapiens_TAF6L_NP_006464 --ALPMR--PAREGELYFPEDREVNLVELALATNIPKGCAETAVRVHVSY

Drosophila_melanogaster_TAF6_N VEGVQPTVPENPPPLSKDSQLLDSVNPVIKMDQGLNKD------Homo_sapiens_TAF6_NP_647476 IEGCQPAIPENPPPAPKEQQKAEATEPLKSAKPGQEEDGPLKGKGQGATT Arabidopsis_thaliana_TAF6b_1 IDGIQPSIPQNSPLQAIS------Arabidopsis_thaliana_TAF6b_3 IDGIQPSIPQNSPLQAIS------Arabidopsis_thaliana_TAF6b_2 IDGIQPSIPQNSPLQAIS------Arabidopsis_thaliana_TAF6_At1g IEGVQPAIPENAPLEVIRAP------Populus_trichocarpa_TAF6_Contig1 IEGVQPAIPENAPLEVIAPP------Hordeum_vulgare_TAF6_Barley1_1 IEGVQPAIPENPPIDAISAP------Oryza_sativa_TAF6_BAB92191 IEGVQPAIPENPPVDAIVAP------Populus_trichocarpa_TAF6_Contig2 IEGVQPAIPENVPIEALGVI------Arabidopsis_thaliana_TAF6b_4 VCGSRELLRT------Saccharomyces_cerevisiae_TAF6_ VEGVQPAIIQNPNLNDIRVSQPPFIRGAIVTALNDNSLQTPVTSTTASAS Homo_sapiens_TAF6L_NP_006464 LDGKGNLAPQGSVPSAVSS------

Drosophila_melanogaster_TAF6_N AAGKPTTGKIHKLKNVETIHVKQLATHELSVEQQLYYKEIT-----EACV Homo_sapiens_TAF6_NP_647476 ADGKGKEKKAPPLLEGAPLRLKPRSIHELSVEQQLYYKEIT-----EACV Arabidopsis_thaliana_TAF6b_1 ----DLKRSEYK------DDGLAARQVLSKDLQIYFDKVT-----EWAL Arabidopsis_thaliana_TAF6b_3 ----DLKRSEYK------DDGLAARQVLSKDLQIYFDKVT-----EWAL Arabidopsis_thaliana_TAF6b_2 ----DLKRSEYK------DDGLAAR------QIYFDKVT-----EWAL Arabidopsis_thaliana_TAF6_At1g ---AETKIHEQ--KDGPLIDVRLPVKHVLSRELQLYFQKIA-----ELAM Populus_trichocarpa_TAF6_Contig1 ---SDGKISEQ--NDEFPVDIKLPVKHVLSRELQLYFDKIT-----DLTV Hordeum_vulgare_TAF6_Barley1_1 ---TENKRTEQVKDDGLPVDIKLPVKHILSRELQMYFDKIA-----ELTM Oryza_sativa_TAF6_BAB92191 ---TENKRTEHGKDDGLPVDIKLPVKHVLSRELQMYFDKIA-----ELTM Populus_trichocarpa_TAF6_Contig2 ---SDGKKSDYK-DDGLSIDVKLPVKDILSRELQLYFEKVT-----ELTA Arabidopsis_thaliana_TAF6b_4 ------EIYTSSMT-----KMSS Saccharomyces_cerevisiae_TAF6_ VTDTGASQHLSNVKPGQNTEVKPLVKHVLSKELQIYFNKVISTLTAKSQA Homo_sapiens_TAF6L_NP_006464 ------LTDDLLKYYHQVT------RAV

Drosophila_melanogaster_TAF6_N G-SDEPRRGEALQSLGSDPGLHEMLPRMCTFIAEGVKVNVVQNNLALLIY Homo_sapiens_TAF6_NP_647476 G-SCEAKRAEALQSIATDPGLYQMLPRFSTFISEGVRVNVVQNNLALLIY Arabidopsis_thaliana_TAF6b_1 TQSGSTLFRQALASLEIDPGLHPLVPFFTSFIAE--EIVKNMDNYPILLA Arabidopsis_thaliana_TAF6b_3 TQSGSTLFRQALASLEIDPGLHPLVPFFTSFIAE--EIVKNMDNYPILLA Arabidopsis_thaliana_TAF6b_2 TQSGSTLFRQALASLEIDPGLHPLVPFFTSFIAE--EIVKNMDNYPILLA Arabidopsis_thaliana_TAF6_At1g SKSNPPLYKEALVSLASDSGLHPLVPYFTNFIAD--EVSNGLNDFRLLFN Populus_trichocarpa_TAF6_Contig1 RRSDSVLFKEALVSLATDSGLHPLIPYFTYFIAD--EVARGLNDYSLLFA Hordeum_vulgare_TAF6_Barley1_1 SRSSTPIFREALVSLSKDSGLHPLVPYFSYFIAD--EVTRSLADLPVLFA Oryza_sativa_TAF6_BAB92191 SRSETSVFREALVSLSRDSGLHPLVPYFSYFIAD--EVTRSLGDLPVLFA Populus_trichocarpa_TAF6_Contig2 RRSESAIFKQALVSLATDSGLHPLVPYFIQFIAD--EVSRNLNNFSLLLA Arabidopsis_thaliana_TAF6b_4 SRMLSKLLYQ------MHLLMHLFS------LIGWQLMVF Saccharomyces_cerevisiae_TAF6_ DEAAQHMKQAALTSLRTDSGLHQLVPYFIQFIAE--QITQNLSDLQLLTT Homo_sapiens_TAF6L_NP_006464 LGDDPQLMKVALQDLQTNSKIGALLPYFVYVVSG---VKSVSHDLEQLHR

Drosophila_melanogaster_TAF6_N LMRMVRALLDNPSLFLEKY------Homo_sapiens_TAF6_NP_647476 LMRMVKALMDNPTLYLEKY------Arabidopsis_thaliana_TAF6b_1 LMRLARSLLHNPHVHIEPY------Arabidopsis_thaliana_TAF6b_3 LMRLARSLLHNPHVHIEPY------Arabidopsis_thaliana_TAF6b_2 LMRLARSLLHNPHVHIEPY------Arabidopsis_thaliana_TAF6_At1g LMHIVRSLLQNPHIHIEPY------Populus_trichocarpa_TAF6_Contig1 LMRVVWSLLQNPHIHIEPYIIVNVLSFVFRIMSSIDEYKIKVQSLKLRRR Hordeum_vulgare_TAF6_Barley1_1 LMRVVQSLLRNPHIHIEPY------Oryza_sativa_TAF6_BAB92191 LMRVVQSLLHNPHIHIEPY------Populus_trichocarpa_TAF6_Contig2 VMRIARSLLQNPYIHIEPY------Arabidopsis_thaliana_TAF6b_4 NLPFHRILLSKPYLTLNDRNIR------Saccharomyces_cerevisiae_TAF6_ ILEMIYSLLSNTSIFLDPY------Homo_sapiens_TAF6L_NP_006464 LLQVARSLFRNPHLCLGPY------

Drosophila_melanogaster_TAF6_N -----LHELIPSVMTCIVSKQLCMRP------ELDNHWALRDFASR Homo_sapiens_TAF6_NP_647476 -----VHELIPAVMTCIVSRQLCLRP------DVDNHWALRDFAAR Arabidopsis_thaliana_TAF6b_1 -----LHQLMPSIITCLIAKRLGRR------SSDNHWDLRNFTAS Arabidopsis_thaliana_TAF6b_3 -----LHQLMPSIITCLIAKRLGRR------SSDNHWDLRNFTAS Arabidopsis_thaliana_TAF6b_2 -----LHQLMPSIITCLIAKRLGRR------SSDNHWDLRNFTAS Arabidopsis_thaliana_TAF6_At1g -----LHQLMPSVVTCLVSRKLGNR------FADNHWELRDFAAN Populus_trichocarpa_TAF6_Contig1 WISCQLHQLMPSVVTCLVARKLGNR------FADNHWELRDFTAN Hordeum_vulgare_TAF6_Barley1_1 -----LHQLMPSMITCIVAKRLGHR------LSDNHWELRDFSAN Oryza_sativa_TAF6_BAB92191 -----LHQLMPSIITCMVAKRLGHR------LSDNHWELRDFSAN Populus_trichocarpa_TAF6_Contig2 -----LHQLMPSIITCLVAKRLGNR------FSDNHWELRNFTAN Arabidopsis_thaliana_TAF6b_4 ------Saccharomyces_cerevisiae_TAF6_ -----IHSLMPSILTLLLAKKLGGSPKDDSPQEIHEFLERTNALRDFAAS Homo_sapiens_TAF6L_NP_006464 -----VRCLVGSVLYCVLEPLAASIN------PLNDHWTLRDGAAL

Drosophila_melanogaster_TAF6_N LMAQICK------NFNTLTNNLQTRV Homo_sapiens_TAF6_NP_647476 LVAQICK------HFSTTTNNIQSRI Arabidopsis_thaliana_TAF6b_1 TVASTCK------RFGHVYHNLLPRV Arabidopsis_thaliana_TAF6b_3 TVASTCK------RFGHVYHNLLPRV Arabidopsis_thaliana_TAF6b_2 TVASTCK------RFGHVYHNLLPRV Arabidopsis_thaliana_TAF6_At1g LVSLICK------RYGTVYITLQSRL Populus_trichocarpa_TAF6_Contig1 LVAPICKRVHGWQHSALILCKHSLTEYVPRVSWSGCCRFGHVYNSLQTRL Hordeum_vulgare_TAF6_Barley1_1 LVASVCR------RYGHVYHNLQIRL Oryza_sativa_TAF6_BAB92191 LVGSVCR------RFGHAYHNIQTRV Populus_trichocarpa_TAF6_Contig2 LVASICK------RFGHAYHNLQPRI Arabidopsis_thaliana_TAF6b_4 ------TMAWLL Saccharomyces_cerevisiae_TAF6_ LLDYVLK------KFPQAYKSLKPRV Homo_sapiens_TAF6L_NP_006464 LLSHIFWT------HGDLVSGLYQHI

Drosophila_melanogaster_TAF6_N TRIFSKALQNDKTHLSSLYGSIAGLSELGGEVIKVFIIPRLKFISERIEP Homo_sapiens_TAF6_NP_647476 TKTFTKSWVDEKTPWTTRYGSIAGLAELGHDVIKTLILPRLQQEGERIRS Arabidopsis_thaliana_TAF6b_1 TRSLLHTFLDPTKALPQHYGAIQGMVALGLNMVRFLVLPNLGPYLLLLLP Arabidopsis_thaliana_TAF6b_3 TRSLLHTFLDPTKALPQHYGAIQGMVALGLNMVRFLVLPNLGPYLLLLLP Arabidopsis_thaliana_TAF6b_2 TRSLLHTFLDPTKALPQHYGAIQGMVALGLNMVRFLVLPNLGPYLLLLLP Arabidopsis_thaliana_TAF6_At1g TRTLVNALLDPKKALTQHYGAIQGLAALGHTVVRLLILSNLEPYLSLLEP Populus_trichocarpa_TAF6_Contig1 TKTLLNALLDPKRSLTQHYGAIQGLAALGPNVVRLLLLPNLKPYLQLLEP Hordeum_vulgare_TAF6_Barley1_1 TKTLVHAFLDPHKALTQHYGAVQGISALGPSAIRLLLLPNLQTYMQLLDP Oryza_sativa_TAF6_BAB92191 TRTLVQGFLDPQKSLTQHYGAIQGISALGPSAIRLLLLPNLETYMQLLEP Populus_trichocarpa_TAF6_Contig2 IRTLVHAFLDPTKSLPQHYGSIQGLAALGPSVVRLLILPNLEPYLLLLEQ Arabidopsis_thaliana_TAF6b_4 HRCFLRTFR------Saccharomyces_cerevisiae_TAF6_ TRTLLKTFLDINRVFGTYYGCLKGVSVLEGESIR-FFLGNLNNWARLVFN Homo_sapiens_TAF6L_NP_006464 LLSLQKILADPVRPLCCHYGAVVGLHALGWKAVERVLYPHLSTYWTNLQA

Drosophila_melanogaster_TAF6_N HLLGTSISNTDKTAAGHIRAMLQKCCPPILRQMRSAPDTAEDYKND---F Homo_sapiens_TAF6_NP_647476 VLDGPVLSNIDRIGADHVQSLLLKHCAPVLAKLRPPPDNQDAYRAE---F Arabidopsis_thaliana_TAF6b_1 EMGLEKQKEEAKRHGAWLVYGALMVAAGRCLYERLKTSETLLSPPT---S Arabidopsis_thaliana_TAF6b_3 EMGLEKQKEEAKRHGAWLVYGALMVAAGRCLYERLKTSETLLSPPT---S Arabidopsis_thaliana_TAF6b_2 EMGLEKQKEEAKRHGAWLVYGALMVAAGRCLYERLKTSETLLSPPT---S Arabidopsis_thaliana_TAF6_At1g ELNAEKQKNQMKIYEAWRVYGALLRAAGLCIHGRLKIFPPLPSPSP---S Populus_trichocarpa_TAF6_Contig1 EMLLEKQKNEMKRHEAWHVYGALLCAAGQSIYDRLKMFPALMSHPA---C Hordeum_vulgare_TAF6_Barley1_1 ELQLEKQSNEMKRKEAWRVYGALLCAAGKCLYERLKLFPNLLCPST---R Oryza_sativa_TAF6_BAB92191 ELQLDKQKNEMKRKEAWRVYGALLCAAGKCLYDRLKLFPNLLSPST---R Populus_trichocarpa_TAF6_Contig2 EMLLEKQKNEIKRHEAWQR------AAGLCMYDRLKMLPGLFIPPS---R Arabidopsis_thaliana_TAF6b_4 -FTLTKSRSGL------Saccharomyces_cerevisiae_TAF6_ ESGITLDNIEEHLNDDSNPTRTKFTKEETQILVDTVISALLVLKKD---L Homo_sapiens_TAF6L_NP_006464 VLDDYSVSNAQVKADGHKVYGAILVAVERLLKMKAQAAEPNRGGPGGRGC

Drosophila_melanogaster_TAF6_N GFLGPSLCQAVVKVR------Homo_sapiens_TAF6_NP_647476 GSLGPLLCSQVVKARAQA------Arabidopsis_thaliana_TAF6b_1 SVWKTN--GKLTSPRQ------Arabidopsis_thaliana_TAF6b_3 SVWKTN--GKLTSPRQ------Arabidopsis_thaliana_TAF6b_2 SVWKTN--GKLTSPRQ------Arabidopsis_thaliana_TAF6_At1g FLHKGKGKGKIISTDP------Populus_trichocarpa_TAF6_Contig1 AVLRTN--EKVVTKRPGDFYD------F Hordeum_vulgare_TAF6_Barley1_1 PLLRSN--SRVATNNP------Oryza_sativa_TAF6_BAB92191 PLLRSN--KRVVTNNP------Populus_trichocarpa_TAF6_Contig2 AIWKSN--GRVMTAMPSMTCFNLSHWDTFIHASINPVTGYVYCLKIPVNA Arabidopsis_thaliana_TAF6b_4 ------Saccharomyces_cerevisiae_TAF6_ PDLYEGKGEKVTDEDK------Homo_sapiens_TAF6L_NP_006464 RRLDDLPWDSLLFQESSSGGGAEPSFGSGLPLPPGGAGPEDPSLSVTLAD

Drosophila_melanogaster_TAF6_N ----NAPASSIVTLSSN------TINTAP------Homo_sapiens_TAF6_NP_647476 --ALQAQQVNRTTLTITQPRPTLTLSQAPQPGPRTPGLLKVPGSIALPVQ Arabidopsis_thaliana_TAF6b_1 ------SKRKASSDNLT------HQPPL------Arabidopsis_thaliana_TAF6b_3 ------SKRKASSDNLT------HQPPL------Arabidopsis_thaliana_TAF6b_2 ------SKRKASSDNLT------HQPPL------Arabidopsis_thaliana_TAF6_At1g ------HKRKLSVDSSE------NQSPQ------Populus_trichocarpa_TAF6_Contig1 SFQKLYHLNATVCDVSMPMYLWVESNLFPL------Hordeum_vulgare_TAF6_Barley1_1 ------NKRKSSTDLSA------SQPPL------Oryza_sativa_TAF6_BAB92191 ------NKRKSSTDLST------SQPPL------Populus_trichocarpa_TAF6_Contig2 CVEMGLYVGTSSFHYVHLTLYPACISCRSL------Arabidopsis_thaliana_TAF6b_4 ------Saccharomyces_cerevisiae_TAF6_ ------Homo_sapiens_TAF6L_NP_006464 IYRELYAFFGDSLATRFGTGQPAPTAPRPPGD------

Drosophila_melanogaster_TAF6_N ------ITSAAQTATTIGRVSMPTTQRQGSPGVSS Homo_sapiens_TAF6_NP_647476 TLVSARAAAPPQPSPPPTKFIVMSSSSSAPSTQQVLSLSTSAPGSGSTTT Arabidopsis_thaliana_TAF6b_1 ------KKIAVGG------IIQM Arabidopsis_thaliana_TAF6b_3 ------KKIAVGG------IIQM Arabidopsis_thaliana_TAF6b_2 ------KKIAVGG------IIQM Arabidopsis_thaliana_TAF6_At1g ------KRLITMDGPDGVHSQDQSGSAPMQVD Populus_trichocarpa_TAF6_Contig1 ------IENYQDKRKASMEHMEQPPPKKIATD Hordeum_vulgare_TAF6_Barley1_1 ------KKMASDVSMSPMGSAAPVAGNMAGSM Oryza_sativa_TAF6_BAB92191 ------KKMTTDGAMNSMTSAP-----MPGTM Populus_trichocarpa_TAF6_Contig2 ------LANQDKRKASTDNLMQQPLLKKIATD Arabidopsis_thaliana_TAF6b_4 ------Saccharomyces_cerevisiae_TAF6_ ------Homo_sapiens_TAF6L_NP_006464 ------KKEPAAAPDSVRKMPQLTASAIVSPHGD

Drosophila_melanogaster_TAF6_N LPQIRAIQANQPAQKFVIVTQNSP----QQGQAKVVR------RGSSP Homo_sapiens_TAF6_NP_647476 SPVTTTVPSVQPIVKLVSTATTAPPSTAPSGPGSVQKYIVVSLPPTGEGK Arabidopsis_thaliana_TAF6b_1 SSTQMQMRGTTTVPQ------QSHTDADARHH------NSP Arabidopsis_thaliana_TAF6b_3 SSTQMQMRGTTTVPQ------QSHTDADARHH------NSP Arabidopsis_thaliana_TAF6b_2 SSTQMQMRGTTTVPQ------QSHTDADARHH------NSP Arabidopsis_thaliana_TAF6_At1g NPVENDNPPQNSVQP------SSSEQASDANESESRNGK----VKESG Populus_trichocarpa_TAF6_Contig1 GPVDMQVEPIAPVPLGDSKTGLSTSSEHTPNYSEAGSRNQ------KDKG Hordeum_vulgare_TAF6_Barley1_1 DGFSAQLPNPGMMQA------SSSGQKVESMTAAGAIR------RDQG Oryza_sativa_TAF6_BAB92191 DGFSTQLPNPSMTQT------SSSGQLVES-TASGVIR------RDQG Populus_trichocarpa_TAF6_Contig2 SAIGAMPMNSMPVEMQGAASGFPTAVGASSVSVSAISRQLSNENVPRREI Arabidopsis_thaliana_TAF6b_4 ------Saccharomyces_cerevisiae_TAF6_ ------Homo_sapiens_TAF6L_NP_006464 ESPRGSGGGGPASASGPAASESRPLPRVHRARGAPRQQGPGTGTRDVFQK

Drosophila_melanogaster_TAF6_N HSVVLSAASNAASASNSNSSSSGSLLAAAQRSSDNVCVIAGSEAPAVDGI Homo_sapiens_TAF6_NP_647476 GGPTSHPSPVPPPASSPSPLSGSALCGGKQEAGDSPPPAPGTPKANGSQP Arabidopsis_thaliana_TAF6b_1 STIAPKTSAAAG------TDVDNYLFPLFEYFGESMLMFTPTHELSFFL- Arabidopsis_thaliana_TAF6b_3 STIAPKTSAAAG------TDVDNYLFPLFEYFGESMLMFTPTHELSFFL- Arabidopsis_thaliana_TAF6b_2 STIAPKTSAAAG------TDVDNYLFPLFEYFGESMLMFTPTHELSFFL- Arabidopsis_thaliana_TAF6_At1g RSRAITMKAILDQIWKDDLDSGRLLVKLHELYGDRILPFIPSTEMSVFL- Populus_trichocarpa_TAF6_Contig1 DSQAIKTSAILSQVWKDDLNSGHLLVSLFELFGESILSFIPSPEMSLFL- Hordeum_vulgare_TAF6_Barley1_1 SNHAQRVSAVLRQAWKEDQDAGHLLGSLHEVFGEAIFSFIQPPELSIFL- Oryza_sativa_TAF6_BAB92191 SNHTQRVSTVLRLAWKEDQNAGHLLSSLYEVFGEAIFSFVQPPEISFFL- Populus_trichocarpa_TAF6_Contig2 SGRGLKTSTVLAQAWKEDMDAGHLLASLFELFSESMFSFTPKPELSFFL- Arabidopsis_thaliana_TAF6b_4 ------Saccharomyces_cerevisiae_TAF6_ EKLLERCGVTIGFHILKRDDAKELISAIFFGE------Homo_sapiens_TAF6L_NP_006464 SRFAPRGAPHFRFIIAGRQAGRRCRGRLFQTAFPAPYGPSPASRYVQKLP

Drosophila_melanogaster_TAF6_N TVQSFRAS------Homo_sapiens_TAF6_NP_647476 NSGSPQPAP------Arabidopsis_thaliana_TAF6b_1 ------Arabidopsis_thaliana_TAF6b_3 ------Arabidopsis_thaliana_TAF6b_2 ------Arabidopsis_thaliana_TAF6_At1g ------Populus_trichocarpa_TAF6_Contig1 ------Hordeum_vulgare_TAF6_Barley1_1 ------Oryza_sativa_TAF6_BAB92191 ------Populus_trichocarpa_TAF6_Contig2 ------Arabidopsis_thaliana_TAF6b_4 ------Saccharomyces_cerevisiae_TAF6_ ------Homo_sapiens_TAF6L_NP_006464 MIGRTSRPARRWALSDYSLYLPL

TAF9 Alignment

CLUSTAL X (1.83) multiple sequence alignment

Gossypium__Cotton__TAF9_TC1456 MAEG------EEDLPRDAKIVKSLLKSMGVE Populus_balsamifera_TAF9 MAEG------EEDMPRDAKIVKSLLKSMGVE Vitis_vinifera_TAF9_TC11580 MAGG------DEDLPRDAKIVKSLLKSMGVD Solanum_tuberosum_TAF9_TC67183 MAEGG------EEDLPRDAKIVKTLLKSMGVD Solanum_tuberosum_TAF9_TC67182 MAEGG------EEDLPRDAKIDKTSLKSMGVD Lycopersicon_esculentum_TAF9_T MAEGG------EEDLPRDAKIVKTLLKSMGVD Arabidopsis thaliana TAF9 MAGEG------EEDVPRDAKIVKSLLKSMGVE M_truncatula_TAF9_TC85341 MADNEE------DSNMPRDAKIMQSLLKSMGVE

M_truncatula_TAF9_TC85342 MADNEE------DSNMPRDAKIVQSLLKSMGVE Zea_mays_TAF9_TC182853 MDAGAARPSAPS--TAAVA------GASVADEPRDARVVRELLRSMGLR Zea_mays_TAF9_TC182854 MDAADARPSAPSAAAAAVA------GASVADEPRDARVVRELLRSMGLG Hordeum_vulgare_TAF9_TC68170 MDSGGVRPSLPS--AAAAG------GASVPDEPRDARVVRELLRSMGLG Oryza_sative_TAF9_AAP12985 MDPGGLRPAPQSAAAAAAAAAAGAGAGASAADEPRDARVVRELLRSMGLS Triticum_aestivum_TAF9_TC70841 MDGGGGGGGRPALQPAAAGG------GASGPDEPRDARVVRELLRSMGLG Oryza_sativa_TAF9_BAC21319.1 MDTGADQAPPPPPPPPVAAAS------AAADEPRDLRVVREILHSLGLR Chlamydomonas_reinhardtii_TAF9 MDAARGAGGAVS------DGAQPQDVATMHALLRSMGVE Homo_sapiens_TAF9_NP_003178 MESGK------TASPKSMPKDAQMMAQILKDMGIT Homo_sapiens_TAF9L_NP_057059 MESGK------MAPPKNAPRDALVMAQILKDMGIT Drosophila_melanogaster_TAF9_A MSAEKSDKAKI------SAQIKHVPKDAQVIMSILKELNVQ Saccharomyces_cerevisiae_TAF9_ MNGGGKNVLNKNSVGSVSEVGP----DSTQEETPRDVRLLHLLLASQSIH Populus_balsamifera_TAF9b MGEGTVP------LEVQIRPKEMHLQAEFGFAAHWR

Gossypium__Cotton__TAF9_TC1456 D--YEPRVIHQFLELWYRYVVDVLTDAQVYSEHAGKQ-----TIDCDDVK Populus_balsamifera_TAF9 D--YEPRVVHQFLELWYRYVVDVLTDAQVYSEHANKT-----AIDCDDVK Vitis_vinifera_TAF9_TC11580 D--YEPRVIHQFLELWYRYVVDVLTDAQVYSEHASKL-----AIDCDDVK Solanum_tuberosum_TAF9_TC67183 D--YEPRVVHQFLELWYRYVVDVLMDAQVYSEHAGKA-----SIDSDDIK Solanum_tuberosum_TAF9_TC67182 D--YEPRVVQQFLELRNSYVVDVLTDAQVYSEHAGKT-----SIDSDDIK Lycopersicon_esculentum_TAF9_T D--YEPRVVHQFLELWYRYVVDVLTDAQVYSEHARKA-----SIDSDDIK Arabidopsis thaliana TAF9 D--YEPRVIHQFLELWYRYVVEVLTDAQVYSEHASKP-----NIDCDDVK M_truncatula_TAF9_TC85341 E--YEPRVINKFLELWYRYVVDVLTDAQVYSEHAGKP-----AIDVDDVK M_truncatula_TAF9_TC85342 E--YEPRVINKFLELWYRYVVDVLTDAQVYSEHAGKP-----AIDVDDVK Zea_mays_TAF9_TC182853 EGEYEPRVVHQFLDLAYRYVGDVLGDAQVYADHAGKA-----QIDADDVR Zea_mays_TAF9_TC182854 EGEYEPRVVHQFLDLAYRYVGDVLGDAQVYADHAGKA-----QIDADDVR Hordeum_vulgare_TAF9_TC68170 EGEYEPRVVGQFLDLAYRYVGDVLGDAQVYADHADKP-----QIDADDVR Oryza_sative_TAF9_AAP12985 EGEYEPRVVHQFLDLAYRYVGDVLGDAQVYADHAGKP-----QLDADDVR Triticum_aestivum_TAF9_TC70841 EGEYEPRVVHQFLDLAYRYAGDVLGDAQVYADHAGKP-----QLDADDVR Oryza_sativa_TAF9_BAC21319.1 EGDYEEAAVHKLLLFAHRYAGDVLGEAKAYAGHAGRE-----SLQADDVR Chlamydomonas_reinhardtii_TAF9 E--FEPRVVNQLMDFMYKYTTDVLLDAEVFSEHAGRQP---GQVDASGVT Homo_sapiens_TAF9_NP_003178 E--YEPRVINQMLEFAFRYVTTILDDAKIYSSHAKKA-----TVDADDVR Homo_sapiens_TAF9L_NP_057059 E--YEPRVINQMLEFAFRYVTTILDDAKIYSSHAKKP-----NVDADDVR Drosophila_melanogaster_TAF9_A E--YEPRVVNQLLEFTFRYVTCILDDAKVYANHARKK-----TIDLDDVR Saccharomyces_cerevisiae_TAF9_ Q--YEDQVPLQLMDFAHRYTQGVLKDALVYNDYAGSGNSAGSGLGVEDIR Populus_balsamifera_TAF9b YKEGDCKHSSFVLQVVEWARWVITWQCETMSKDRPSIG-CDDSIKPPCTF

Gossypium__Cotton__TAF9_TC1456 LAIQSKVNFSFSQPPPRE------VLLELA Populus_balsamifera_TAF9 LAIQSKVNFSFSQPPPRE------VLLELA Vitis_vinifera_TAF9_TC11580 LAIHFKVNFSFFQPPARE------VLLELA Solanum_tuberosum_TAF9_TC67183 LAIQSKVNFSFSQPPPRE------VLLELA Solanum_tuberosum_TAF9_TC67182 LAIQSKVNFSFSQPPPRE------VLLELA Lycopersicon_esculentum_TAF9_T LAIQSKVNFSFSQPPPRE------VLLELA Arabidopsis thaliana TAF9 LAIQSKVNFSFSQPPPRE------VLLELA M_truncatula_TAF9_TC85341 LAIQSQVNFSFSQPPPRE------VLLELA M_truncatula_TAF9_TC85342 LAIQSQVNFSFSQPPPRE------VLLELA Zea_mays_TAF9_TC182853 LAIQAKVNFSFSQPPPRE------VLLELA Zea_mays_TAF9_TC182854 LAIQAKVNFSFSQPPPRE------VLLELA Hordeum_vulgare_TAF9_TC68170 LAIQANVNFSFSQPPPRE------VLLELA Oryza_sative_TAF9_AAP12985 LAIQSKVNFSFSQPPPRECSEFFHSDQDFRSRSLPSDNPLFFSMVLLEVA Triticum_aestivum_TAF9_TC70841 LAIQAKVNFSFSQPPPRE------VLLELA Oryza_sativa_TAF9_BAC21319.1 LAIQARG-MSSAAPPSRE------EMLDIA Chlamydomonas_reinhardtii_TAF9 MAIQSRTALYVQPPPQER------VTELA Homo_sapiens_TAF9_NP_003178 LAIQCRADQSFTSPPPRD------FLLDIA Homo_sapiens_TAF9L_NP_057059 LAIQCRADQSFTSPPPRD------FLLDIA Drosophila_melanogaster_TAF9_A LATEVTLDKSFTGPLERH------VLAKVA Saccharomyces_cerevisiae_TAF9_ LAIAARTQYQFKPTAPKE------LMLQLA Populus_balsamifera_TAF9b PSHSDGCPYSYKPHCGQDG------PIFIIM

Gossypium__Cotton__TAF9_TC1456 RNRNKVPLPKAIPGPG-IPLPPEQDTLISTNYQLAIPKKQPAQAMEEMEE Populus_balsamifera_TAF9 RNRNKIPLPKSIAGPG-IPLPPEQDTLISPNYQLAIPKKRTAQAIEETEE Vitis_vinifera_TAF9_TC11580 RNRNKIPLPKSIAGPG-IPLPPEQDTLISPNYQLAIPKKRTAQAVEETEE Solanum_tuberosum_TAF9_TC67183 RNRNKIPLPKSIAGSG-VPLPPEQDTLINPNYQLAIAKKQTSQP-EETEE Solanum_tuberosum_TAF9_TC67182 RNRNKIPLPKSIAGSG-VPLPPQQDTLINPNYQLAIAKKQTSQP-EETEE Lycopersicon_esculentum_TAF9_T RNRNKIPLPKSIAGSG-VPLPPEQDTLINPNYQLAIAKKQTNQP-EETEE Arabidopsis thaliana TAF9 ASRNKIPLPKSIAGPG-VPLPPEQDTLLSPNYQLVIPKKSVSTEPEETED M_truncatula_TAF9_TC85341 QNRNKIPLPKSIAGPG-FPLPPDQDTLIAPNYQFAIPNKRSVEPMEETED M_truncatula_TAF9_TC85342 QNRNKIPLPKSIAGPG-FPLSPDQDTLIAPNYQFAIPNKRSVEPMEETED Zea_mays_TAF9_TC182853 RSRNRMPLPKSIAPPGSIPLPPEQDTLLAQNYQLLPPLKPPPQY-EEIED Zea_mays_TAF9_TC182854 RSRNRMPLPKSIAPPGSIPLPPEQDTLLAQNYQLLPPLKPPPQY-EENED Hordeum_vulgare_TAF9_TC68170 RSRNKIPLPKSIAPPGSIPLPPEQDTLLSENYQLLPALKPPTQT-EEAED Oryza_sative_TAF9_AAP12985 RNRNKIPLPKSIAPPGSIPLPPEQDTLLSQNYQLLAPLKPPPQF-EETED

Triticum_aestivum_TAF9_TC70841 RSRNKIPLPKSIAPPGSIPLPPEQDTLLSQNYQLLPALKPPTQT-EEAED Oryza_sativa_TAF9_BAC21319.1 HKCNEIPIPKPCVPSGSISLPHYEDMLLNKKHIFVPRVEPTPHQIEETED Chlamydomonas_reinhardtii_TAF9 RQVNDTGTARPGHQAPACRCRPRASR------Homo_sapiens_TAF9_NP_003178 RQRNQTPLPLIKPYSG-PRLPPDRYCLTAPNYRLKS----LQKKASTSAG Homo_sapiens_TAF9L_NP_057059 RQKNQTPLPLIKPYAG-PRLPPDRYCLTAPNYRLKS----LIKKG-PNQG Drosophila_melanogaster_TAF9_A DVRNSMPLPPIKPHCG-LRLPPDRYCLTGVNYKLRATNQPKKMTKSAVEG Saccharomyces_cerevisiae_TAF9_ AERNKKALPQVMGTWG-VRLPPEKYCLTAKEWDLEDPKSM------Populus_balsamifera_TAF9b IENDKMSVQEFPADSTVMDLLERAGRASSRWSAYGFPVKEELRPRLNHRP

Gossypium__Cotton__TAF9_TC1456 DEE------SVEP-NSSQEH------KTDAPHPTSQR Populus_balsamifera_TAF9 DEE------SADP-NQSQEQ------KTDPPQLTPQR Vitis_vinifera_TAF9_TC11580 DEE------GADPSHASQEG------RTDLPQHTPQR Solanum_tuberosum_TAF9_TC67183 DEE------RADPNPAPSKNPSLSHE------KTDVPQGTPQQ Solanum_tuberosum_TAF9_TC67182 DEE------SADPNPAPSKNPSLSHE------KTDVPQGTPQR Lycopersicon_esculentum_TAF9_T DEE------SADPNPAPSKNPTLSHE------KTDLPQGTPQR Arabidopsis thaliana TAF9 DEE------MTDPGQSSQEQQQQQQQ------TSDLPSQTPQR M_truncatula_TAF9_TC85341 EEVP------NADPNPSQEEK-----T------DAEQN--PHQR M_truncatula_TAF9_TC85342 ERS------SQWPIPTHLKK-----R------RQMRNKIPIKE Zea_mays_TAF9_TC182853 ETEEPNPS---NPANSNPSYSQDQSSKEQQ------QQHTPQHG-QR Zea_mays_TAF9_TC182854 ENEESNPSLTPNPANSNPTFSQDQRSNEQ------QHTPQHG-QR Hordeum_vulgare_TAF9_TC68170 DNEGADAI----PANPSPSYSQDQRGSEQ------HQPQSQSQR Oryza_sative_TAF9_AAP12985 DNAGANPTPTSNPSNPSPNNLQEQQ------QLPQHG-QR Triticum_aestivum_TAF9_TC70841 EEEGANAD----AANANPNSSQDQR------GNEA Oryza_sativa_TAF9_BAC21319.1 DYNDDGSNAN--VASPNSNYDQDLFGSISLPHYQDMLLNQNHLSVHRVEP Chlamydomonas_reinhardtii_TAF9 ------Homo_sapiens_TAF9_NP_003178 RITVPRLSVGSVTSRPSTPTLGTPTPQTMSVSTKVGTPMSLTGQRFTVQM Homo_sapiens_TAF9L_NP_057059 RL-VPRLSVGAVSSKPTTPTIATP--QTVSVPNKVATPMSVTSQRFTVQI Drosophila_melanogaster_TAF9_A RPLKTVVKPVSSANGPKRPHSVVAKQQVVTIPKPVIKFTTTTTTKTVGSS Saccharomyces_cerevisiae_TAF9_ ------Populus_balsamifera_TAF9b VHDATCKLKMGDVVELTPAIPDKSLSDYR------EEIQRMYEH

Gossypium__Cotton__TAF9_TC1456 VSFPL-TKRSK------Populus_balsamifera_TAF9 VSFPL-TKRPNYRFQVMSSISCSSSMNSPDSSTLFTRLKFELCDMRIALI Vitis_vinifera_TAF9_TC11580 VSFPIGAKRPR------Solanum_tuberosum_TAF9_TC67183 VSFPLGAKRPR------Solanum_tuberosum_TAF9_TC67182 VSFPLGAKRPR------Lycopersicon_esculentum_TAF9_T VSFPLGAKRPR------Arabidopsis thaliana TAF9 VSFPL-SRRPK------M_truncatula_TAF9_TC85341 VSFPLPKRQKD------M_truncatula_TAF9_TC85342 CHFPCLNPKGLI------Zea_mays_TAF9_TC182853 VSFQLNAVAAAAAAAKRPRMAIDQLNMG------Zea_mays_TAF9_TC182854 VSFQLNAVAAAA--AKRPRMTVDQLNIG------Hordeum_vulgare_TAF9_TC68170 VSFQLNAVAAAA--AKRPLVTTDQLNMG------Oryza_sative_TAF9_AAP12985 VSFQLNAVAAAK---RRG--TMDQLNMG------Triticum_aestivum_TAF9_TC70841 XSSSLRARARAQ------GFFQA------Oryza_sativa_TAF9_BAC21319.1 AHDQLEKIKDDGSNDNADSSHSNYVQDSSGSVSLQHHQDMSLNQNHLFVH Chlamydomonas_reinhardtii_TAF9 ------Homo_sapiens_TAF9_NP_003178 PTSQS---PAVKASIPATSAVQNVLINPSLIGSKNILITTNMMSSQNTAN Homo_sapiens_TAF9L_NP_057059 PPSQS---TPVKP-VPATTAVQNVLINPSMIGPKNILITTNMVSSQNTAN Drosophila_melanogaster_TAF9_A GGSGGGGGQEVKSESTGAGGDLKMEVDSDAAAVGSIAGASGSGAGSASGG Saccharomyces_cerevisiae_TAF9_ ------Populus_balsamifera_TAF9b GSATVSSTAPAVSGTVGRRS------

Gossypium__Cotton__TAF9_TC1456 ------Populus_balsamifera_TAF9 ------Vitis_vinifera_TAF9_TC11580 ------Solanum_tuberosum_TAF9_TC67183 ------Solanum_tuberosum_TAF9_TC67182 ------Lycopersicon_esculentum_TAF9_T ------Arabidopsis thaliana TAF9 ------M_truncatula_TAF9_TC85341 ------M_truncatula_TAF9_TC85342 ------Zea_mays_TAF9_TC182853 ------Zea_mays_TAF9_TC182854 ------Hordeum_vulgare_TAF9_TC68170 ------Oryza_sative_TAF9_AAP12985 ------Triticum_aestivum_TAF9_TC70841 ------Oryza_sativa_TAF9_BAC21319.1 QVELTLDQIEEIKDDGSNDNVDSPNFNCVQDPSRSVSFPHYQVMPLNQN Chlamydomonas_reinhardtii_TAF9 ------Homo_sapiens_TAF9_NP_003178 ESS------NALKRKREDDDDDDDDDDDYDNL------Homo_sapiens_TAF9L_NP_057059 EA------NPLKRKHEDDDDNDIM------

Drosophila_melanogaster_TAF9_A GGGGGSSGVGVAVKREREEEEFEFVTN------Saccharomyces_cerevisiae_TAF9_ ------Populus_balsamifera_TAF9b ------

TAF10 Alignment

CLUSTAL X (1.83) multiple sequence alignment

Homo_sapiens_TAF10_Q12962 ------NGDVKPVVSSTPLVDFLMQLED Drosophila_melanogaster_TAF10b MVGSNFGIIYHNSAGGASSHGQSSGGGGGGDRDRTTPSSHLSDFMSQLED Drosophila_melanogaster_TAF10_ -----TEEEDIDSPLMQSELHSDEEQPDVEEVPLTTEESEMDELIKQLED Hordeum_vulgare_TAF10_Barley1_ -----MGSNNSGGAGGGGG--MAPGTGAGGSDGRHDDEAVLTEFLSSLMD Hordeum_vulgare_TAF10c_HVtuc02 -----MGSNNSGGAGGGGG--MAPGTGAGGSDGRHDDEAVLTEFLSSLMD Triticum_aestivum_TAF10b_TC647 ----MMGSNNPGGAGGGGGGGMAPGTGGGGSDGRHDDEAVLTDFLSSLMD Triticum_aestivum_TAF10_TC6468 ----MMGSNNPGGAGGGGG--MAPGTGAGGSDGRHDDEAVLTEFLSSLMD Hordeum_vulgare_TAF10b_TC68796 ----MMGSNNPGGAG--GG--MAPGMGAGGSDGRHDDEAVLTEFLSSLMD Triticum_aestivum_TAF10c_CA620 ------MAPGMGAGSSDGRHDDEAVLTEFLSSLMD Oryza_sativa_TAF10_TC129171 ------MVPGGMGGGGPMGAAPPGGGGGGDGRHDDEAVLTEFLSSLMD Zea_mays_TAF10_TC184169 ------MG------TGVGGGGDGRHDDEAALTEFLSSLMD Arabidopsis_thaliana_TAF10_AAK ------MN------HGQQSGEAKHEDDAALTEFLASLMD Populus_trichocarpa_TAF10 ------MNNTSS--SNSQQQQQSSEARHDDDAVLTEFLASLMD Glycine_max_TAF10_TC162515 ------MNQ------NPQSSDGRNDDDSALSDFLASLMD Glycine_max_TAF10b_TC162516 ------MNQ------NPQSSEGRNDDDSALSDFLASLMD Gossypium_arboreum_TAF10_BQ401 ------MNH------NPQSSDGKHDDDSALSDFLASLMD Beta_vulgaris_TAF10_BVSVtuc03------MN------PQTSDGRHDDDAALSEFLASLMD Lycopersicon_esculentum_TAF10_ ------MNQS------QGQQTSEGRHEDDAVLADFLASLMD Pinus_TAF10_TC9616 ------MAESKQDDDAVLIEFLSSLMD Saccharomyces_cerevisiae_TAF10 ------GIPEFTRKDKTLEEILEMMDS

Homo_sapiens_TAF10_Q12962 YTPTIPDAVTGYYLNRAGFEASDPRIIRLISLAAQKFISDIANDALQHCK
Drosophila_melanogaster_TAF10b YTPLIPDAVTSHYLNMGGFQSDDKRIVRLISLAAQKYMSDIIDDALQHSK
Drosophila_melanogaster_TAF10_ YSPTIPDALTMHILKTAGFCTVDPKIVRLVSVSAQKFISDIANDALQHCK
Hordeum_vulgare_TAF10_Barley1_ YNPTIPDELVEHYLGRSGFHCPDLRLTRLVAVAAQKFISDIASDSLQHCK
Hordeum_vulgare_TAF10c_HVtuc02 YNPTIPDELVEHYLGRSGFHCPDLRLTRLVAVAAQKFISDIASDSLQHCK
Triticum_aestivum_TAF10b_TC647 YNPTIPDELVEHYLGRSGFHCPDLRLTRLVAVAAQKFISDIASDSLQHCK
Triticum_aestivum_TAF10_TC6468 YNPTIPDELVEHYLGRSGFHCPDLRLTRLVAVAAQKFISDIASDSLQHCK
Hordeum_vulgare_TAF10b_TC68796 YNPMIPDELVEHYLGRSGFHXPDLRLTRLVAVATQKFISDVASDSLQHCK
Triticum_aestivum_TAF10c_CA620 YNPMIPDELVEHYLGRSGFHCPDLRLTRLVAIATQKFISDVASDSLQHCK
Oryza_sativa_TAF10_TC129171 YTPTIPDELVEHYLGRSGFYCPDLRLTRLVAVATQKFISDIASDSLQHCK
Zea_mays_TAF10_TC184169 YTPTIPDELVEHYLGRSGFHCPDLRLTRLVAVATQKFLSDIASDSLQHCK
Arabidopsis_thaliana_TAF10_AAK YTPTIPDDLVEHYLAKSGFQCPDVRLIRLVAVATQKFVADVASDALQHCK
Populus_trichocarpa_TAF10 YTPTIPDELVEHYLAKSGFQCPDVRLVRLVAVATQKFVADVATDALQQCK
Glycine_max_TAF10_TC162515 YTPTIPDELVEHYLAKSGFQCPDVRLTRLVAVATQKFVAEVAGDALQHCK
Glycine_max_TAF10b_TC162516 YTPTIPDELVEHYLAKSGFQCPDVRLTRLVAVATQKFVAEVAGDALQHCK
Gossypium_arboreum_TAF10_BQ401 YAPTIPDELVEHYLAKSGFQCPDVRLIRLVAVATQKFVAEVASDALQHCK
Beta_vulgaris_TAF10_BVSVtuc03- YTPTIPDELVEHYLAKSGFQCPDVRLIRLVAVATQKFISEVATDALQHCK
Lycopersicon_esculentum_TAF10_ YTPTIPDELVEHYLGKSGFQCPDVRLIRLVAVATQKFIADVATDALQHCK
Pinus_TAF10_TC9616 YTPTIPDELAEYYLSKSGFQCPDVRIIRMVSIATQKFIAEIASDAFQLCK
Saccharomyces_cerevisiae_TAF10 TPPIIPDAVIDYYLTKNGFNVADVRVKRLLALATQKFVSDIAKDAYEYSR

Homo_sapiens_TAF10_Q12962 MKG----T------ASGSSRSK----SKDRKYTL
Drosophila_melanogaster_TAF10b ARTH-MQT------TNTPGGSK----AKDRKFTL
Drosophila_melanogaster_TAF10_ TRTTNIQH------SSGHSSSKDKKNPKDRKYTL
Hordeum_vulgare_TAF10_Barley1_ ARV------AAPIKDNKSKQPKDRRLVL
Hordeum_vulgare_TAF10c_HVtuc02 ARV------AAPIKDNKSKQPKDRRLVL
Triticum_aestivum_TAF10b_TC647 ARV------AAPIKDNKSKQPKDRRLVL
Triticum_aestivum_TAF10_TC6468 ARV------AAPVKDNKSKQPKDRRLVL
Hordeum_vulgare_TAF10b_TC68796 ARV------AAPIKDNKSKQPKDRRLVL
Triticum_aestivum_TAF10c_CA620 ARV------AAPIKDNKSKQPKDRRLVL
Oryza_sativa_TAF10_TC129171 ARV------AAPIKDNKSKQPKDRRLVL
Zea_mays_TAF10_TC184169 ARV------AAPIKDNKSKQPKDRRLVL
Arabidopsis_thaliana_TAF10_AAK ARP------APVVKDK--KQQKDKRLVL
Populus_trichocarpa_TAF10 ARP------APVVKDKRDKQQKEKRLIL
Glycine_max_TAF10_TC162515 ARQ------ATIPKDKRDKQQKDKRLVL
Glycine_max_TAF10b_TC162516 ARQ------ATIPKDKRDKQQKDKRLVL
Gossypium_arboreum_TAF10_BQ401 ARQ------AAVVKDKREKQQKDKRLIL
Beta_vulgaris_TAF10_BVSVtuc03- ARQ------SSVVKDKRDKLQKDKRLVL
Lycopersicon_esculentum_TAF10_ ARQ------STIVKDKRDKQQKDKRLTL
Pinus_TAF10_TC9616 ARQ------SAVNKEKRDKQQKDKSFVL
Saccharomyces_cerevisiae_TAF10 IRSSVAVSNANNSQARARQLLQGQQQPGVQQISQQQHQQNEKTTASKVVL

Homo_sapiens_TAF10_Q12962 TMEDLTPALSEYGINVKKPHYFT------
Drosophila_melanogaster_TAF10b TMEDLQPALADYGINVRKVDYSQ------
Drosophila_melanogaster_TAF10_ AMEDLVPALADHGITMRKPQYFV------
Hordeum_vulgare_TAF10_Barley1_ TMDDLSKALREHGVNLRHPEYFADSPSAGMAPSTRDE
Hordeum_vulgare_TAF10c_HVtuc02 TMDDLSKALREHGVNLRHPEYFADSPSAGMGHSTREE
Triticum_aestivum_TAF10b_TC647 TMDDLSKALREHGVNLRHPEYFADSPSAGX-PLKREE
Triticum_aestivum_TAF10_TC6468 TMDDLSKALREHGGNLKHPEYFADSPSAGMPPSTREE
Hordeum_vulgare_TAF10b_TC68796 TMDDLSKALREHGVNLKHPEYFADSPSAGMGHSTREE
Triticum_aestivum_TAF10c_CA620 TMDDLSKALREHGVNLKHPEYFADSPSARMGPSTREE
Oryza_sativa_TAF10_TC129171 TMDDLSKALQEHGVNLKHPEYFADSPSAGMAPAAREE
Zea_mays_TAF10_TC184169 TMDDLSKALREHGVNLKHAEYFADSPSAGMAPSTREE
Arabidopsis_thaliana_TAF10_AAK TMEDLSKALREYGVNVKHPEYFADSPSTGMDPATRDE
Populus_trichocarpa_TAF10 TMEDLSKALSEYGVNVKHQEYFADSPSTGMDPASREE
Glycine_max_TAF10_TC162515 TMEDLSKALREYGVNLKHQEYFADSPSTGMDPATREE
Glycine_max_TAF10b_TC162516 TMEDLSQALREYGANLTDQEYFADSPSTVMDPATREE
Gossypium_arboreum_TAF10_BQ401 TMDDLSKSLREYGVNVKHQEYFADSPSTGIDPASREE
Beta_vulgaris_TAF10_BVSVtuc03- TMEDLSRALKEYGVNLKHQEYFADNPSTGMDPASRDE
Lycopersicon_esculentum_TAF10_ TMDDLSKSLREYGVNVKHQDYFADSPSAGLDPASREE
Pinus_TAF10_TC9616 TTEDLSMALREYGVNMKRQEYFADNPSAGTNPTSKDE
Saccharomyces_cerevisiae_TAF10 TVNDLSSAVAEYGLNIGRPDFYR------

TAF11 Alignment

CLUSTAL X (1.83) multiple sequence alignment

Drosophila_melanogaster_TAF11_ ------MDEILFPTQQKSNSLSDGDDV-DLKFFQSASGERKDSDTSDPG
Homo_sapiens_TAF11_NP_005634 ------MDDAHESPSDKGGETGESDET-AAVPGDPGATD-TDGIPEETD
Arabidopsis_thaliana_TAF11_At4 ----MKHSKDPFEAAIEEEQEES------PPESPVGGGGGGDGSEDGR
Arabidopsis_thaliana_TAF11b_AA ----MAFNARSCCFASSNERVTCNCNCL-KDQPVPSVVGCATKKLAEFWS
Medicago_truncatula_TAF11_TC80 MAGGISFGIGLKRMKQSKDPFEAAFE---ESPPESPIETEPDPDASTENP
Hordeum_vulgare_TAF11_TC81880 ------MKDPFEAAVEEQ-ESPPDSPAPPEEGPATAVPHT
Triticum_aestivum_TAF11_TC9194 ------MKDPFEAAVEEQ-DSPPDSPAPPEEDPATAVPHT
Oryza_sativa_TAF11_BAB90043 TREAAHQARRRRAAAAMKDPFEAAVEEQ-ESPPESPAANEEDAAGAP---
Oryza_sativa_TAF11b_TC124761 ------MKDPFEAAVEEQ-ESPPESPAANEEDAAGAP---
Populus_trichocarpa_TAF11_Contig1 ------MKQSKDPFEAAYVEQEESPPESPVAQDDYDTQASNAA
Saccharomyces_cerevisiae_TAF11 --MTEPQGPLDTIPKVNYPPILTIANYFSTKQMIDQVISEDQDYVTWKLQ

Drosophila_melanogaster_TAF11_ NDAD------RDGKDADGDNDNKNTD------GD
Homo_sapiens_TAF11_NP_005634 GDAD------VDLKEAAAEEGELESQDVSDLTTVERED
Arabidopsis_thaliana_TAF11_At4 IEID------QTQDEDERPVDVRR--PMKKAKTSVVVTEAKNK
Arabidopsis_thaliana_TAF11b_AA FKIQRYVIFVKVLLRMKHSKDPFEAAMEEQEESPVETEQTLEGDERAVKK
Medicago_truncatula_TAF11_TC80 NSTN------SSLPQSTLTHEEEHNHIKTPNSN----NTITK
Hordeum_vulgare_TAF11_TC81880 IDEDYDGSAGAGGSR-PPPPRPRPSALAAPSTSAAPAAAKAK---VRPQK
Triticum_aestivum_TAF11_TC9194 AAEDYDGSAGAGGSR-APPPRPRPSALAAPSTSVAPAAAKAK---VRPHK
Oryza_sativa_TAF11_BAB90043 --EGYDG---ASGSR-GPPLR-LPPSRAAPSGSGGAAAAAARGKVVRVQK
Oryza_sativa_TAF11b_TC124761 --EGYDG---ASGSR-GPPLR-LPPSRAAPSGSGGAAAAAARGKVVRVQK
Populus_trichocarpa_TAF11_Contig1 AAAD------DSQGAVVGQDDDDLGGGGRND------FA
Saccharomyces_cerevisiae_TAF11 NLRTGG------TSINNQLNKYPKYKYQKTRINQQDPDSINKVPEN

Drosophila_melanogaster_TAF11_ GDSGEPAHKKLKT------KKELEEEERE---RMQVLVSNFTEEQLDR
Homo_sapiens_TAF11_NP_005634 SSLLNPAAKKLKIDTKEKKEKKQKVDEDEIQ---KMQILVSSFSEEQLNR
Arabidopsis_thaliana_TAF11_At4 DKDEDDEEEEENMDVELTKYPTS-SDPAKMA---KMQTILSQFTEDQMSR
Arabidopsis_thaliana_TAF11b_AA CKTSVVAEAKNKDEVEFTKNITG-ADPVTRAN--KMQKILSQFTEEQMSR
Medicago_truncatula_TAF11_TC80 HKDEEDDEEEDNMDVELAKFPTA-GDPHKMA---KMQAILSQFTEEQMSR
Hordeum_vulgare_TAF11_TC81880 EQ-DDDDDEEDPMEVDLDKLPSGTSDPDKLA---KMNALLSQFTEDQMNR
Triticum_aestivum_TAF11_TC9194 EQ-DDDDDEEDPMEVDLDKLPSGTSDPDKLA---KMNALLSQFTEDQMNR
Oryza_sativa_TAF11_BAB90043 EQQEEEDDEEDHMEVDLDKLPSGTSDPDKLA---KMNAILSQFTEDQMNR
Oryza_sativa_TAF11b_TC124761 EQQEEEDDEEDHMEVDLDKLPSGTSDPDKLA---KMNAILSQFTEDQMNR
Populus_trichocarpa_TAF11_Contig1 HSSDHPSASRPMLGSARSKAKNKDDDEEEEED--NMDVELSKLASTADPD
Saccharomyces_cerevisiae_TAF11 LIFPQDILQQQTQNSNYEDTNTNEDENEKLAQDEQFKLLVTNLDKDQTNR

Drosophila_melanogaster_TAF11_ YEMYRRSAFPKAAVKRLMQTITGCS-VSQNVVIAMSGIAKVFVGEVVEEA
Homo_sapiens_TAF11_NP_005634 YEMYRRSAFPKAAIKRLIQSITGTS-VSQNVVIAMSGISKVFVGEVVEEA
Arabidopsis_thaliana_TAF11_At4 YESFRRSALQRPQMKKLLIGVTGSQKIGMPMIIVACGIAKMFVGELVETA
Arabidopsis_thaliana_TAF11b_AA YESFRRSGFKKSDMEKLVQRITGGPKMDDTMNIVVRGIAKMFVGDLVETA
Medicago_truncatula_TAF11_TC80 YESFRRAGFQKANMKRLLTSITGTQKISIPITIAVSGIAKVFVGEVVETA
Hordeum_vulgare_TAF11_TC81880 YESFRRSGFQKSNMKKLLASITGSQKISMPTTIVVSGIAKMFVGEVIETA
Triticum_aestivum_TAF11_TC9194 YESFRRSGFQKSNMKKLLASITGSQKISMPTTIVVSGIAKMFVGEVIETA
Oryza_sativa_TAF11_BAB90043 YESFRRSGFQKSNMKKLLASITGSQKISLPTTIVVSGIAKMFV------A
Oryza_sativa_TAF11b_TC124761 YESFRRSGFQKSNMKKLLASITGSQKISLPTTIVVSGIAKMFVGELVETA
Populus_trichocarpa_TAF11_Contig1 KMAKMQFGNSRTEIFQELSSYVHSALHGRRASAPVHAYCKEYH------
Saccharomyces_cerevisiae_TAF11 FEVFHRTSLNKTQVKKLASTVANQT-ISENIRVFLQAVGKIYAGEIIELA

Drosophila_melanogaster_TAF11_ LDVMEAQGES------
Homo_sapiens_TAF11_NP_005634 LDVCEKWGEM------
Arabidopsis_thaliana_TAF11_At4 RVVMAERKES------
Arabidopsis_thaliana_TAF11b_AA RVVMRERKES------
Medicago_truncatula_TAF11_TC80 RTIMKERKET------
Hordeum_vulgare_TAF11_TC81880 RIIMSERKDS------
Triticum_aestivum_TAF11_TC9194 RIVMSERKDS------
Oryza_sativa_TAF11_BAB90043 RIVMTERKDS------
Oryza_sativa_TAF11b_TC124761 RIVMTERKDS------
Populus_trichocarpa_TAF11_Contig1 -----AQTIA------
Saccharomyces_cerevisiae_TAF11 MIVKNKWLTSQMCIEFDKRTKIGYKLKKYLKKLTFSIIENQQYKQDYQSD

Drosophila_melanogaster_TAF11_ ------
Homo_sapiens_TAF11_NP_005634 ------
Arabidopsis_thaliana_TAF11_At4 ------
Arabidopsis_thaliana_TAF11b_AA ------
Medicago_truncatula_TAF11_TC80 ------
Hordeum_vulgare_TAF11_TC81880 ------
Triticum_aestivum_TAF11_TC9194 ------
Oryza_sativa_TAF11_BAB90043 ------
Oryza_sativa_TAF11b_TC124761 ------
Populus_trichocarpa_TAF11_Contig1 ------
Saccharomyces_cerevisiae_TAF11 SVPEDEPDFYFDDEEVDKRETTLGNSLLQSKSLQQSDHNSQDLKLQLIEQ

Drosophila_melanogaster_TAF11_ ------GALQPKFIREAVRRLRTKDRMPIGRYQQPY
Homo_sapiens_TAF11_NP_005634 ------PPLQPKHMREAVRRLKSKGQIPNSKHKKII
Arabidopsis_thaliana_TAF11_At4 ------GPIRPCHIRESYRRLKLEGKVPKRSVPRLF
Arabidopsis_thaliana_TAF11b_AA ------GPIRPCHIRESYRRLKLQGKVPQRSVQRLF
Medicago_truncatula_TAF11_TC80 ------GPIRPCHLREAHRRLKLEGKIFKRTTSRLF
Hordeum_vulgare_TAF11_TC81880 ------GPIRPCHIREAYRRLKLEGKIPKRSVPRLF
Triticum_aestivum_TAF11_TC9194 ------GPIRPCHIREAYRRLKLEGKIPKRSVPRLF
Oryza_sativa_TAF11_BAB90043 ------GPQG----NQSKQYVQAE------VLRYY
Oryza_sativa_TAF11b_TC124761 ------GPVRPCHIREAYRRLKLEGKIPRRTVPRLF
Populus_trichocarpa_TAF11_Contig1 ------SSIRPSGLKYFYNSLCKKG------
Saccharomyces_cerevisiae_TAF11 YNKLVLQFNKLDVSIEKYNNSPLLPEHIREAWRLYRLQSDTLPNAYWRTQ

Drosophila_melanogaster_TAF11_ FRLN-----
Homo_sapiens_TAF11_NP_005634 FF------
Arabidopsis_thaliana_TAF11_At4 R------
Arabidopsis_thaliana_TAF11b_AA R------
Medicago_truncatula_TAF11_TC80 R------
Hordeum_vulgare_TAF11_TC81880 R------
Triticum_aestivum_TAF11_TC9194 R------
Oryza_sativa_TAF11_BAB90043 ------
Oryza_sativa_TAF11b_TC124761 R------
Populus_trichocarpa_TAF11_Contig1 ------
Saccharomyces_cerevisiae_TAF11 GEGQGSMFR

TFIIEα Alignment

CLUSTAL X (1.83) multiple sequence alignment

Drosophila_melanogaster_TFIIEa ------MSSTSTAAANAAPAKTEVRYVTEVPS
Homo_sapiens_TFIIE-alpha_NP_00 ------MADPDVLTEVPA
Arabidopsis_thaliana_TFIIEa2_A -MDKSITVVRKT------VVLEPFVKLVRLLVRIFYDNYTP
Arabidopsis_thaliana_TFIIEa3_A ------MVKLVAKTFYDNYTP
Arabidopsis_thaliana_TFIIEa1_A -MEKSG-PVQKA------VVLQPFVKLVRLVARAFYDDYTT
Populus_balsamifera_TFIIE-alpha 2 -MDMNTTIS------VEPFKRLVKLAARAFYDDVST
Populus_balsamifera_TFIIE-alpha 1 MAEFGSKLVNKFEESPRGTTAFIKINEAHTEVKKELVKLAARAFYDDITT
Solanum_tuberosum_TFIIEa_TC670 ---MS------IEPFNRLVKLAARAFYDDITT
Hordeum_vulgare_TFIIEa_TC90346 ------MGSLEPFNRLVRLTARAFYDDISI
Oryza_sativa_TFIIEa1 ------MGSMEPFNRLVRLAARAFYDDISM
Oryza_sativa_TFIIEa2 ------
Oryza_sativa_TFIIEa3 ------MDTMEQLNRLVRMVARGFYEDVSL
Oryza_sativa_TFIIEa4 ------MSINERLVKCAAQLLYGNVGF
Methanosarcina_acetivorans_TFE ------MNTLVD
Sulfolobus_solfataricus_TFE_NP ------MVN
Saccharomyces_cerevisiae_TFIIE ------MDRPIDD

Drosophila_melanogaster_TFIIEa SLKQLARLVVRGFYSLEDALIIDMLVRN-PCMKEDDIGELLRFEKKQLRA
Homo_sapiens_TFIIE-alpha_NP_00 ALKRLAKYVIRGFYGIEHALALDILIRN-SCVKEEDMLELLKFDRKQLRS
Arabidopsis_thaliana_TFIIEa2_A ESDNQQK-SVKN-VKGSAVIVLDALTRR-QWVREEDLAKEVKRNAKELRK
Arabidopsis_thaliana_TFIIEa3_A KNNNQKK-SAKNGSGGIAVLVLDALTRR-QWVREEDLAKELKLNTKQLRT
Arabidopsis_thaliana_TFIIEa1_A KSDNQQK-SARSDNRGIAAVVLDALARR-QWVREEDLAKDLQLHAKQLRK
Populus_balsamifera_TFIIE-alpha 2 KGENQSKNNARGDNKGIAVVVLDALTRR-LWVNEEGLAKDLKIHIKQLRR
Populus_balsamifera_TFIIE-alpha 1 KGDNQPK-TGRSDNRGIAVVVLDALTRR-QWVREEDLAKELKLHSKQLRR
Solanum_tuberosum_TFIIEa_TC670 KGDNQPK-SGRSDNRGIAVVILDALTRR-QWVREEDLAKDLKLHTKQLRR
Hordeum_vulgare_TFIIEa_TC90346 KGDTQAK-TSRGDNRGMAVVVLDGLTRR-QWVREEDLAKSLKLHSKQLRR
Oryza_sativa_TFIIEa1 KGDNQPK-TSRGDNRGMAVVVLDALTRR-QWVREEDLAKALKLHSKQLRR
Oryza_sativa_TFIIEa2 ------
Oryza_sativa_TFIIEa3 E-EDQSK-PNGSGSCGIVVVVLDALTRQ-QWVREEDLARSLMIPFNRLRQ
Oryza_sativa_TFIIEa4 KAGEVRI--DCDENRGVVVMVLDALTRY-QWVPDTHLAKSLKVQKKKLCL
Methanosarcina_acetivorans_TFE LNDKVIRGYLISLVGEEGLRMIEEMPEG--EVTDEEIAAKTGVLLNTVRR
Sulfolobus_solfataricus_TFE_NP AEDLFIN-LAKSLLGDDVIDVLRILLDKGTEMTDEEIANQLNIKVNDVRK
Saccharomyces_cerevisiae_TFIIE IVKNLLKFVVRGFYGGSFVLVLDAILFH-SVLAEDDLKQLLSINKTELGP

Drosophila_melanogaster_TFIIEa RITTLRTDKFIQIRLKMETGPDGKAQKVN------
Homo_sapiens_TFIIE-alpha_NP_00 VLNNLKGDKFIKCRMRVETAADGKTTRHN------
Arabidopsis_thaliana_TFIIEa2_A LIRHFEEQKFVMRYHRKETAKRAKMYSYA-VGGTTDGRA-----EDNVKF
Arabidopsis_thaliana_TFIIEa3_A ILRYFEEQQFIMRVHRKEKSS------ATTNGRG-----EDKVKV
Arabidopsis_thaliana_TFIIEa1_A IIRLFEEEKLIMRDHRKETAKGAKMYSAA-VAATTDGRA-----EDKVKL
Populus_balsamifera_TFIIE-alpha 2 ILRLFEEDKLLTRAHRKETAKVTKKPNAG-GADSQRKFG-SRE-DDKNKL
Populus_balsamifera_TFIIE-alpha 1 TLRFFEEEKLVTRDHRKETAKAAKMHNAA-VANTTDGHR-TKEGDDKIKM
Solanum_tuberosum_TFIIEa_TC670 TLRFFEEEKLITRDHRKEGAKGAKVYNSA-VAATVDGLQNGKEGDDKIKM
Hordeum_vulgare_TFIIEa_TC90346 VLRFFEEEKLVTRDHRKESAKGAKIYSAA-AAAAGDGQ-PTKEGEEKVKL
Oryza_sativa_TFIIEa1 ILRFFEEEKLVTRDHRKESAKGAKIYSAA-AAAAGDGQSITKEGEEKVKM
Oryza_sativa_TFIIEa2 ------
Oryza_sativa_TFIIEa3 ITHFLEQQKLVRRYYRKEAIHDASISTASPSHVSHDAHLVPTNVAGKLKM
Oryza_sativa_TFIIEa4 ILEFLEKQMFVRRCEVKAKTGRNVSNTATTAGVSAIPRN-----EKVKSK
Methanosarcina_acetivorans_TFE TLFILYENKFAICRRERDSNSGWLTYLWH------
Sulfolobus_solfataricus_TFE_NP KLNLLEEQGFVSYRKTRDKDSGWFIYYWK------
Saccharomyces_cerevisiae_TFIIE LIARLRSDRLISIHKQREYPPNSKSVERV------

Drosophila_melanogaster_TFIIEa ----YYFINYKTFVNVVKYKLDLMRKRMETEERDATSRASFKCSSCSKTF
Homo_sapiens_TFIIE-alpha_NP_00 ----YYFINYRTLVNVVKYKLDHMRRRIETDERDSTNRASFKCPVCSSTF
Arabidopsis_thaliana_TFIIEa2_A HTHSYCCLDYAQIYDIVRYKLHRLKKKFKDELEDRNTVQEYGCPNCKRKY
Arabidopsis_thaliana_TFIIEa3_A HMYSYCCLDYSQIYDVIRYKLHRMKKEFKDVLEDKDNVQEYGCPNCKRKI
Arabidopsis_thaliana_TFIIEa1_A HTHSYCCLDYAQICDVVRFRLHRMKKRLKDELEDKNTVQEYGCPNCQRKY
Populus_balsamifera_TFIIE-alpha 2 HTHSYCCLDYAQIYDVVRYRLHRMRKMIKDELENNNAVQQYICPICERRY
Populus_balsamifera_TFIIE-alpha 1 HTHSYCCLDYAQIYDVVRYRLHRMRKKLKDELEDKNTVQEYTCPNCGRRY
Solanum_tuberosum_TFIIEa_TC670 HTHSYCCLDYAQIYDVVRYRLHRMKKKLRDELDNKNTVQEYICPNCGKRY
Hordeum_vulgare_TFIIEa_TC90346 HTHSYCCLDYAQICDVVRYRIHRMKKTLKDELDSRNTVQHYICPNCKKRY
Oryza_sativa_TFIIEa1 HTHSYCCLDYAQICDVVRYRIHRMKKKLKDELDSRNTIQHYICPNCKKRY
Oryza_sativa_TFIIEa2 ------MVYDVVRYRIHRMRKKLKDGLDDRDTVQHYVCPNCKRRY
Oryza_sativa_TFIIEa3 IMQPYCCLHYGQVYDVTLYRIHEMKKKLKDELDGNYMIQNYVCPNCERRY
Oryza_sativa_TFIIEa4 HPKWYCCINYAKICSVVRYHIMQMEANLKSQLENTNTVDKYTCPNCGKSF
Methanosarcina_acetivorans_TFE ------LDFSDVEHQLMREKKKLLRNLKTRLEFEENNVFYVCPQGCVRL
Sulfolobus_solfataricus_TFE_NP ------PNIDQINEILLNRKRLILDKLKTRLEYEKNNTFFICPQDNSRY
Saccharomyces_cerevisiae_TFIIE ----YYYVKYPHAIDAIKWKVHQVVQRLKDDLDKNSEPNGYMCPICLTKY

Drosophila_melanogaster_TFIIEa TDLEADQLFDMATLEFRCTFCGSSVEEDSAAMPKKDSRLMLAHFN-EQLQ
Homo_sapiens_TFIIE-alpha_NP_00 TDLEANQLFDPMTGTFRCTFCHTEVEEDESAMPKKDARTLLARFN-EQIE
Arabidopsis_thaliana_TFIIEa2_A NALDALRLISMEDDSFHCENCNGELVMECNKLISEEVVDRGDNARRRQRE
Arabidopsis_thaliana_TFIIEa3_A ------FFHCENCNGELVMECNKLTSEEVVVDGSDNPRSRRD
Arabidopsis_thaliana_TFIIEa1_A NALDALRLISMVDDSFHCENCNGELVVECNKLTSEEVVDGDDNARRRRRE
Populus_balsamifera_TFIIE-alpha 2 NALDALRLISLVDEDFHCENCDGVLVAESDKLAAQEGGDGDDNARKRRRE
Populus_balsamifera_TFIIE-alpha 1 NALDALRLMSLVDEYFHCENCDGELVAESDKLAAQEGGDGDDNARRRRRE
Solanum_tuberosum_TFIIEa_TC670 TALDALRLISPVDEYFHCESCNEELVAESDKLASQGTTDGDDNDRRRRRE
Hordeum_vulgare_TFIIEa_TC90346 SAFDALQLISYTDEYFHCENCNGELLAESDKLSSEEMGDGDDNARKRRRE
Oryza_sativa_TFIIEa1 SAFDALQLISYTDEYFHCENCNGELVAESDKLASEEMGDGDDNARKRRRE
Oryza_sativa_TFIIEa2 SAFDALQLVSDMDDYFHCEHCKGELRPESEKLTLDEIVCGGGNAIKHTHD
Oryza_sativa_TFIIEa3 SSLNALDLVSHIDNNFHCKHCNEELSQDFGDLAWGGRGGDGDNARRDRHA
Oryza_sativa_TFIIEa4 SAFDVKDLVSCTDGNFYCESCKHELVACS------EYGNYNEREGRSA
Methanosarcina_acetivorans_TFE L------FDEATETEFLCPMCGEDLVYYD------NS
Sulfolobus_solfataricus_TFE_NP S------FEEAFENEFKCLKCGSQLTYYD------TD
Saccharomyces_cerevisiae_TFIIE TQLEAVQLLNFDRTEFLCSLCDEPLVEDDSGKKNKE-KQDKLNRLMDQIQ

Drosophila_melanogaster_TFIIEa PLYDLLREVEG------IKLAPEVLE-
Homo_sapiens_TFIIE-alpha_NP_00 PIYALLRETED------VNLAYEILE-
Arabidopsis_thaliana_TFIIEa2_A KVKVWLQDLEG------ELKPLMELIN-
Arabidopsis_thaliana_TFIIEa3_A HLKDLLQNMEV------RLKPLMDHIN-
Arabidopsis_thaliana_TFIIEa1_A NLKNMLQKLEV------QMKPLMDQLN-
Populus_balsamifera_TFIIE-alpha 2 KLKDMLQNMECYFMVPNFDFESINCKNWP--ARFWLAKVQLKPLMDQLS-
Populus_balsamifera_TFIIE-alpha 1 KLKDMLQKMEDASNLFLFKCYLLLMKACYRVIEEVLGRRFIFSMTGQIEM
Solanum_tuberosum_TFIIEa_TC670 KLEDMLHRVEA------QLKPLMDQLA-
Hordeum_vulgare_TFIIEa_TC90346 KLNDMQQRIDE------QLKPLQAQLK-
Oryza_sativa_TFIIEa1 KLKDMQQRIDE------QLKPLQAQLN-
Oryza_sativa_TFIIEa2 KLKDMQQRMEE------QLKPLIAVLD-
Oryza_sativa_TFIIEa3 KLKDFLQRMEH------QMERLISQLN-
Oryza_sativa_TFIIEa4 NLLDFLENMKE------KLRPLKTKLD-
Methanosarcina_acetivorans_TFE RFVSALKKRVD------ALSSV---
Sulfolobus_solfataricus_TFE_NP KIKSFLEQKIR------QIEEEID-
Saccharomyces_cerevisiae_TFIIE PIIDSLKKIDDSRIEEN------TFEIALARLIP

Drosophila_melanogaster_TFIIEa --PEPVDIDTIRGLNKPNATRPDGMAWSG----EATRNQGFAVEETRVD-
Homo_sapiens_TFIIE-alpha_NP_00 --PEPTEIPALKQSKDHAATTAGAASLAGGHHREAWATKGPSYEDLYTQN
Arabidopsis_thaliana_TFIIEa2_A -RVKDLPFPAFEPFPAWEARAAKAAR-ENGDFNPDDPSRSLG--GYGSTP
Arabidopsis_thaliana_TFIIEa3_A -RIKDLPVPSFESFPAWETRVAKAAR-ENGDLNPDDTLRPQG--GYGSTP
Arabidopsis_thaliana_TFIIEa1_A -RVKDLPIPEFGSFLAWEARAAMAAR-ENGDLNPNDPLRSQG--GYGSTP
Populus_balsamifera_TFIIE-alpha 2 -RVKDLPIPEIGSLQAWQLHENAAGRATNGDPNSDDHFKYSQGPGYGGTP
Populus_balsamifera_TFIIE-alpha 1 ARVKDLPVPEFGSLQEWQIHASAAGRAANGDSSYNDPSRSSQ--GYGGTP
Solanum_tuberosum_TFIIEa_TC670 -RVKDLPAPEFGSLQAWEVRANAVARGANGDNAND--SKSGQGLGFGGTP
Hordeum_vulgare_TFIIEa_TC90346 -RVKDLPAPEFGSLQSWER--LNLGAFAHGDSAAAEAARNAQ-GQYNGTP
Oryza_sativa_TFIIEa1 -RVKDLPAPEFGSLQSWER--ANIGAFGTADPSAADSSRNPQ-GQY-GTP
Oryza_sativa_TFIIEa2 -RVKDLPFPSFMSLQDWER--ATIGASANG---AVGSSQNSE-GRYSSKP
Oryza_sativa_TFIIEa3 -KVKDLDFPEFLALETWER---NMREPAGGD------DVSRP
Oryza_sativa_TFIIEa4 -LLEDLPAPDFGSTPDFKG------TYNISDWSRTS
Methanosarcina_acetivorans_TFE ------
Sulfolobus_solfataricus_TFE_NP ------KETKLGANKNH------
Saccharomyces_cerevisiae_TFIIE PQNQSHAAYTYNPKKGSTMFRPGDSAPLPNLMGTALGNDSSRRAGANSQA

Drosophila_melanogaster_TFIIEa VTIGGDDTSD------AVIERKS---RPIW
Homo_sapiens_TFIIE-alpha_NP_00 VVINMDDQEDL------HRASLEGKSAKERPIW
Arabidopsis_thaliana_TFIIEa2_A MPFLGETK------VEVNLG--EGNED-VT
Arabidopsis_thaliana_TFIIEa3_A MPFLGETE------IEVNLG--EENED-VK
Arabidopsis_thaliana_TFIIEa1_A MPFLGETK------VEVNLG--DGNED-VK
Populus_balsamifera_TFIIE-alpha 2 MPFLGETK------VEVAFAGDESKEN-IK
Populus_balsamifera_TFIIE-alpha 1 MPFLGETKHRVEFNASKRCQLRHDQEKDSSTKGRVEVSFSGVEGKED-LK
Solanum_tuberosum_TFIIEa_TC670 MPFVGETK------VEVAFSGLEEKGD-IK
Hordeum_vulgare_TFIIEa_TC90346 MPYLGDTK------VDVELAGSGVKEEGAE
Oryza_sativa_TFIIEa1 MPYLGETK------VEVALSGTGVKDEGAE
Oryza_sativa_TFIIEa2 MPFLGETE------VEVNFLGSTGAQEGVE
Oryza_sativa_TFIIEa3 MLFLGEVMS------HEHQKGSASCIDADE
Oryza_sativa_TFIIEa4 VPLPEPTNG------DDSFSSPCAKDD--E
Methanosarcina_acetivorans_TFE ------
Sulfolobus_solfataricus_TFE_NP ------
Saccharomyces_cerevisiae_TFIIE TLHINITTAS------DEVAQRELQERQAEEKRKQ

Drosophila_melanogaster_TFIIEa MTESTVITDTDAADG-----AADAVQTASGSGHRNRKENE------
Homo_sapiens_TFIIE-alpha_NP_00 LRESTVQGAYGSEDMKEGGIDMDAFQERE-EGHAGPDDNE------
Arabidopsis_thaliana_TFIIEa2_A S--TGGDSSLKMLPPWMIKQGMKLTEEQRGEMRQEANVDG---EAAKLSD
Arabidopsis_thaliana_TFIIEa3_A SDEVGDSSRRKLTPSWLIKKGMNLSDEQRGEIRHEAKAD------
Arabidopsis_thaliana_TFIIEa1_A S--KGGDSSLKVLPPWMIKEGMNLTEEQRGEMRQEAKVDGGAGAAAKLSD
Populus_balsamifera_TFIIE-alpha 2 S--ETASTSLKVLPPWMIKQGMNLTKEQRGEVKQESKMDSSSTAVEFSDE
Populus_balsamifera_TFIIE-alpha 1 S--ETASTGLKVLPPWMIKQGMNLTKEQRGEVKQGSKMDDSSAAAEPPDD
Solanum_tuberosum_TFIIEa_TC670 S--EVSVTPMKVLPPWMIKEGMNLTKEQRGEVKQESNMEGTSTAAGLSDD
Hordeum_vulgare_TFIIEa_TC90346 S--GRDGTVLKVLPPWMVREGMNLTKEQRGESSNTSKGDE---KSDVKDE
Oryza_sativa_TFIIEa1 S--GTNGNGLKVLPPWMIKQGMNLTKEQRGETSNSSNLDE---KSEVKDE
Oryza_sativa_TFIIEa2 S--GMES--IKPQHSWMNRKRTVLAGEHKEENNNTANLDQ---SSEAKSD
Oryza_sativa_TFIIEa3 EIFEFRVQDARPIPSFVIRKDINHTEDKEEQL------
Oryza_sativa_TFIIEa4 S--DAGVSELKILPSWLIRKGMKLKQAHLSNSSTVCGEGG------
Methanosarcina_acetivorans_TFE ------
Sulfolobus_solfataricus_TFE_NP ------
Saccharomyces_cerevisiae_TFIIE NAVPEWHKQSTIGKTALGRLDNEEEFDPVVTASAMDSINPDNEPAQETSY


Drosophila_melanogaster_TFIIEa ----DIMSVLLQHEKQPGQKEPHMKGMRVGSSNANSSDSSDDEKDIENSK
Homo_sapiens_TFIIE-alpha_NP_00 ----EVMRALLIHEKKTSSAMAGSVGAAAPVTAANGDDSESETSESDDDS
Arabidopsis_thaliana_TFIIEa2_A DKKSVMENGDDNKDLKDEYLKAYYAAIMEQQKLA-AKLNEQESAGESTTT
Arabidopsis_thaliana_TFIIEa3_A DGGSSMENGDDDRNLKDEYLKAYYAAILEEQELA-EKLNQQESAGK-VTT
Arabidopsis_thaliana_TFIIEa1_A DKKSAIGNGDE-KDLKDEYLKAYYAELMKQQELA-ARRNQQESAGE-PTS
Populus_balsamifera_TFIIE-alpha 2 KK-SAKVNGDS---IKEEYVKAYYAALLEQQRQA-EESAKQQQELSQTSM
Populus_balsamifera_TFIIE-alpha 1 KK-ISIENDDK---IKDEYVKAYYAALLQKQREA-EESAEKQQELLQTSI
Solanum_tuberosum_TFIIEa_TC670 KKSIGFEDVKN---IQDEYIKAYYEALFKRQKEQ-EEATK---MLPETST
Hordeum_vulgare_TFIIEa_TC90346 KK--QDSKEDE-KSIQDEYLKAYYEAFKKKQEEEDAKR-MQQEG------
Oryza_sativa_TFIIEa1 KK--QDSKEDE-KSIQDEYIKAYYEALRKRQDEEEAKRKIQQEG------
Oryza_sativa_TFIIEa2 KK--QLSEEDEMKSIQEAYAKAYYEAIQKRQEDE-GKRAIQEESLACISD
Oryza_sativa_TFIIEa3 ------
Oryza_sativa_TFIIEa4 ------TNIQEEYMKAYYEAIQKRQEDR------IRHSGQSS
Methanosarcina_acetivorans_TFE ------
Sulfolobus_solfataricus_TFE_NP ------
Saccharomyces_cerevisiae_TFIIE QNNRTLTEQEMEERENEKTLNDYYAALAKKQAKLNKEEEEEEEEEEDEEE

Drosophila_melanogaster_TFIIEa IP-DVDFDNYINSDSAEEDD------DVPTVLVAGRPHPLDQLDDN--LI
Homo_sapiens_TFIIE-alpha_NP_00 PPRPAAVAVHKREEDEEEDDEFEEVADDPIVMVAGRPFSYSEVSQRPELV
Arabidopsis_thaliana_TFIIEa2_A DIESATTYSDRQVGMKSKREE----EEEDVEWEEGASVAANGNY---KVD
Arabidopsis_thaliana_TFIIEa3_A DIELATSSSDRQVGMKSKREE----EEE-----EEASVAANGNY---KVD
Arabidopsis_thaliana_TFIIEa1_A GIQSGTVYSGRQVSMKAKREEDEDEDEEEVEWEEKAPVTANGNY---KVD
Populus_balsamifera_TFIIE-alpha 2 SNGLSESSSNRQVGMKSKREEG---EGDDDVEWEEAPIEGKSN------N
Populus_balsamifera_TFIIE-alpha 1 SNGFSKSSSDRQVGMKSKREEDD--EPDDDVEWEEAPIGGMSYL-----S
Solanum_tuberosum_TFIIEa_TC670 TDGVYNTSTERQVGMKSKREEED--EGED-VEWEEAPPAGNTTTGNLKVD
Hordeum_vulgare_TFIIEa_TC90346 QAFSSEIHSERQLGMKAKREDE-NVEDDGVEWEEEQPAGNASEEPYKFVD
Oryza_sativa_TFIIEa1 DTFASASHSERQVGMKSKRED----DDEGVEWEEEQPAGNTAET-YKLAD
Oryza_sativa_TFIIEa2 QPFASDAQFERRLGAKSKRDDGGESGDDGIELKVRQSTGN-IEEVYKFAD
Oryza_sativa_TFIIEa3 ------
Oryza_sativa_TFIIEa4 VPGGPSVSSERPMGVKRQKLCNDINNNALECQGEEPPGDTFRT------
Methanosarcina_acetivorans_TFE ------
Sulfolobus_solfataricus_TFE_NP ------
Saccharomyces_cerevisiae_TFIIE EEEEEMEDVMDDNDETARENALEDEFEDVTDTAGTAKTESNTSNDVKQES

Drosophila_melanogaster_TFIIEa AQMTPQEKENYIHVYQQHYSHIFE------
Homo_sapiens_TFIIE-alpha_NP_00 AQMTPEEKEAYIAMGQRMFEDLFE------
Arabidopsis_thaliana_TFIIEa2_A LNVEAEEAEEKEDGDEDDDIDWEEG------
Arabidopsis_thaliana_TFIIEa3_A LNVEAEEAEQDE-----NDVDWQEC------
Arabidopsis_thaliana_TFIIEa1_A LNVEAEASGGEEE-EEEDDVDWEEG------
Populus_balsamifera_TFIIE-alpha 2 WNLIALLSY------
Populus_balsamifera_TFIIE-alpha 1 MEWDPLQSY------
Solanum_tuberosum_TFIIEa_TC670 LNVQADASEDDND--EEDDIDWEEG------
Hordeum_vulgare_TFIIEa_TC90346 LNAEAPESGDEED-----EIDWEEG------
Oryza_sativa_TFIIEa1 LNVEAQESGDEED-----EIDWEEG------
Oryza_sativa_TFIIEa2 LNVETQELVEKN------CIPPAE------
Oryza_sativa_TFIIEa3 ------
Oryza_sativa_TFIIEa4 ------
Methanosarcina_acetivorans_TFE ------
Sulfolobus_solfataricus_TFE_NP ------
Saccharomyces_cerevisiae_TFIIE INDKTEDAVNATATASGPSANAKPNDGDDDDDDDDDEMDIEFEDV

TFIIEβ Alignment

CLUSTAL X (1.83) multiple sequence alignment

Arabidopsis_thaliana_TFIIEb1_A -----MALREQLDKFNKQQEKCQSTL------S
Arabidopsis_thaliana_TFIIEb2_A -----MALKEQLDKFNKQQVKCQSTL------S
Lycopersicon_esculentum_TFIIEb ----MASLQESLQRFKKQQEKCQAIT------S
Solanum_tuberosum_TFIIEb_TC605 ----MASLQESLQRFKKQQEKCQAIS------S
Populus_trichocarpa_TFIIEb_Contig1 -----MALQEQLDRFKKQQEKCQSTL------T
Glycine_max_TFIIEb_TC192062 -----MTLQEKLDKFKKQQEKCQTTL------S
Medicago_truncatula_TFIIEb_TC7 -----MALQGKLDRFKKQQEKCQSTL------S
Helianthus_annuus_TFIIEb_TC949 ----MGSLRESLNRFKQQQEKCQSTL------T
Hordeum_vulgare_TFIIEb_TC10289 -----MDLKDSLSRFKQQQERCQSSL------A
Triticum_aestivum_TFIIEb_TC110 -----MDLKDSLSRFKQQQERCQSSL------A
Oryza_sativa_TFIIE-beta_AAM011 -----MDLKDSLSKFKQQQERCQSSL------A
Oryza_sativa_TFIIEb_TC151474 -----MDLKDSLSKFKQQQERCQSSL------A
Sorghum_bicolor_TFIIEb_TC59949 -----MDLKDSLSRFKQQQERCQSSL------A
Zea_mays_TFIIEb_TC209727 -----MDLKDSLSKFKQQQERCQSSL------A
Hordeum_vulgare_TFIIEb_TC89335 -----MALNERLSKFKQQQERCQTTL------S
Triticum_aestivum_TFIIEb_TC129 -----MALNERLSKFKQQQERCQTTL------S
Sorghum_bicolor_TFIIEb_TC67168 -----MALNDRLNKFKQQQERCQNTL------S
Drosophila_melanogaster_TFIIE- ---MDPALLREREAFKKRAMATPTVE------K
Homo_sapiens_TFIIE-beta_NP_002 ---MDPSLLRERELFKKRALSTPVVE------K
Saccharomyces_cerevisiae_TFIIE MSKNRDPLLANLNAFKSKVKSAPVIAPAKVGQKKTNDTVITIDGNTRKRT

Arabidopsis_thaliana_TFIIEb1_A SISSSR--TALSRS---YVP--AATTSQKPNVFRGKFSENTKQLQHITNI
Arabidopsis_thaliana_TFIIEb2_A SIASSRERTSSSRQ---NVPLPAAITQKKPDAAPVKFSSDTERLQNINNI
Lycopersicon_esculentum_TFIIEb MAARAGPSKG------APPRPANAKPP-APAVKFSNDTERLQHINTI
Solanum_tuberosum_TFIIEb_TC605 MAARAGPSKG------APPRPANAKPP-APAVKFSNDTERLQHINSI
Populus_trichocarpa_TFIIEb_Contig1 SIAKSRPSKSSLTQKTVAVAPAPSTSARTP-APAVKFSNDTERLQHINSI
Glycine_max_TFIIEb_TC192062 SIAASKAAATQKS------AAHGSANGRNA-APAVKFSNDTERLQHINSI
Medicago_truncatula_TFIIEb_TC7 SIAANKAVS------ASVPNA-LAPVKFSTDTERLQHINSI
Helianthus_annuus_TFIIEb_TC949 SIAAGSKTSNRTTTPAPRVAPAASTLAKNP-VPAVKFSNDTERLQHINNV
Hordeum_vulgare_TFIIEb_TC10289 SIAASQASTTKPKHR--AQPINAQSAPARP-AQPIKFSNDTERLQHINSI
Triticum_aestivum_TFIIEb_TC110 SIAASQASTTKPKHR--AQPINAPSAPARP-AQPIKFSNDTERLQHINSI
Oryza_sativa_TFIIE-beta_AAM011 SIAAS---TSKPKHR--AQPVNAPSAPARP-LQPIKFSNDTERLQHINSV
Oryza_sativa_TFIIEb_TC151474 SIAAS---TSKPKHR--AQPVNAPSAPARP-LQPIKFSNDTERLQHINSV
Sorghum_bicolor_TFIIEb_TC59949 SIAAS---SSKPKHR--AQPAHAPNVPARP-SQPVKFSNDTERLQHINSI
Zea_mays_TFIIEb_TC209727 SIAAS---TSKPKHR--AQPAHAPNVPARP-SQPIKFSNDTERLQHINSI
Hordeum_vulgare_TFIIEb_TC89335 SIAATQASTTKSHNAPRSRPANAPSAPAKQ-IQAIKFSNDTERLQHINSV
Triticum_aestivum_TFIIEb_TC129 SIAATQASTTKSHNAPRSRPANAPSAPAKQ-IQAIKFSNDTERLQHINSV
Sorghum_bicolor_TFIIEb_TC67168 SIFASQTSISTSKHVPGIQPVNAPLAPIKP-LHPIKFSNDTERLQHINSV
Drosophila_melanogaster_TFIIE- KSKP-DRPAPPPPSDDSRRKMRPPNAPRLD-ATT------YKTMSGSSQY
Homo_sapiens_TFIIE-beta_NP_002 RSASSESSSSSSKKKKTKVEHGGSSGSKQN-SDHSNGSFNLKALSGSSGY
Saccharomyces_cerevisiae_TFIIE ASERAQENTLNSAKNPVLVDIKKEAGSNSSNAISLDDDDDDEDFGSSPSK

Arabidopsis_thaliana_TFIIEb1_A RNSAVGAQMKIVIDLLFK------TRLAYTAEQIN------EACYVDM
Arabidopsis_thaliana_TFIIEb2_A RKAPVGAQIKRVIDLLYE------RRLALTPEQIN------EWCHVDM
Lycopersicon_esculentum_TFIIEb RKGPVGSQMKRVIDLLLE------TRQAFTPEQIN------EACYVDL
Solanum_tuberosum_TFIIEb_TC605 RKGPVGAQIKRVIDLLLE------TRQAFTPEQIN------EACYVDI
Populus_trichocarpa_TFIIEb_Contig1 RKAPAGAQIKRVIDLLLE------TRQAFTPEQIN------DHCYVDM
Glycine_max_TFIIEb_TC192062 RKAPVGAQMKRVIDLLLE------TRQAFTPEQIN------GACYVDM
Medicago_truncatula_TFIIEb_TC7 RKAPVGAQMKRVIDLLFE------TRQALTLEQIN------ETCHVDM
Helianthus_annuus_TFIIEb_TC949 RKSPVGAQIKKVIDLLFE------SRQAFTAEQIN------EACYVDV
Hordeum_vulgare_TFIIEb_TC10289 RKSPIGAQIKLVIELLYK------TRQAFTAEQIN------DETYVDI
Triticum_aestivum_TFIIEb_TC110 RKSPVGAQIKLVIELLYK------TRQAFTAEQIN------DATYVDI
Oryza_sativa_TFIIE-beta_AAM011 RKSPIGAQIKLVIELLYK------TRQAFTAEQIN------ETTYVDI
Oryza_sativa_TFIIEb_TC151474 RKSPIGAQIKLVIELLYK------TRQAFTAEQIN------ETTYVDI
Sorghum_bicolor_TFIIEb_TC59949 RKSPVGAQIKLVIELLYK------TRQAFTAEQIN------DATYVDI
Zea_mays_TFIIEb_TC209727 RKSPVGAQIKLVIELLYK------TRQAFTAEQIN------EATYVDI
Hordeum_vulgare_TFIIEb_TC89335 RKSPVGAQIKLVIELLYK------TRLAYTAEQIN------EATYVAI
Triticum_aestivum_TFIIEb_TC129 RKSPVGAQIKLVIELLYK------TRLAYTAEQIN------EATYVAI
Sorghum_bicolor_TFIIEb_TC67168 RKSAVGVQIKLVVELLYK------TRQSFTAKQVN------EATYVDI
Drosophila_melanogaster_TFIIE- RFGVLAKIVKFMRTRHQDG-----DDHPLTIDEILD------ETNQLDI
Homo_sapiens_TFIIE-beta_NP_002 KFGVLAKIVNYMKTRHQRG-----DTHPLTLDEILD------ETQHLDI
Saccharomyces_cerevisiae_TFIIE KVRPGSIAAAALQANQTDISKSHDSSKLLWATEYIQKKGKPVLVNELLDY

Arabidopsis_thaliana_TFIIEb1_A HNNKA---VFDSLRKNPKVHYDGR--RFSYKATHNIKDKKQLLSFVN-KS
Arabidopsis_thaliana_TFIIEb2_A HANKA---VFDSLRKNPKAHYDGR--RFSYKATHDVNDKNQLLSLVR-KY
Lycopersicon_esculentum_TFIIEb IGNKP---VFDSLRKNVKVYYDGN--RFSYKSKHALKNKEQLLILIR-KF
Solanum_tuberosum_TFIIEb_TC605 NGNKA---VFDSLRNNLKVYYDGN--RFSYKSKHALKNKEQLLILIR-KF
Populus_trichocarpa_TFIIEb_Contig1 NSNKA---VFDSLRNNPKVHYDGK--RFSYKSKHDLKDKSQLLVLIR-KF
Glycine_max_TFIIEb_TC192062 KANKD---VFENLRKNPKVNYDGQ--RFSYKSKYGLKDKTELLQLIR-KY
Medicago_truncatula_TFIIEb_TC7 KANKD---VFDNMRKNPKVRYDGE--RFSYKSKHALRDKKELLFLIR-KF
Helianthus_annuus_TFIIEb_TC949 KGNKA---VFESLAKNPKVNYDGK--RFSYKSKHNVRDQKELLRLIR-TF
Hordeum_vulgare_TFIIEb_TC10289 NGNKA---VFESLRNNLKVHYDGR--RFSYKSKHDLEGKDQLLELIR-CH
Triticum_aestivum_TFIIEb_TC110 NANKA---VFDSLRNNLKVQYDGR--RFSYKSKHDLEGKDQLLDLIR-CH
Oryza_sativa_TFIIE-beta_AAM011 HGNKS---VFDSLRNNPKVHYDGR--RFSYKSKHDLKGKDQLLVLVR-KY
Oryza_sativa_TFIIEb_TC151474 HGNKS---VFDSLRNNPKVHYDGR--RFSYKSKHDLKGKDQLLVLVR-KY
Sorghum_bicolor_TFIIEb_TC59949 HGNKA---VFDSLRNNPKVSYDGR--RFSYKSKHDLKGKDQLLVLIR-KF
Zea_mays_TFIIEb_TC209727 HGNKA---VFDSLRNNPKVSYDGR--RFSYKSKHDLKGKDQLLVLIR-KF
Hordeum_vulgare_TFIIEb_TC89335 NSNKA---VFDSLTNNPKVQFDGK--RFSYKSKHDLKGKDQLLHLIR-RF
Triticum_aestivum_TFIIEb_TC129 NSNKA---VFDSLTNNPKVQFDGK--RFSYKSKHDLKGKDQLLHLIR-RF
Sorghum_bicolor_TFIIEb_TC67168 HGNKA---VSDSLRNNPKVLFDGT--RFSYKPKHILTGRDELLGLIK-EK
Drosophila_melanogaster_TFIIE- GQSVKNWLASEALHNNPKVEASPCGTKFSFKPVYKIKDGKTLMRLLK-QH
Homo_sapiens_TFIIE-beta_NP_002 GLKQKQWLMTEALVNNPKIEVIDG--KYAFKPKYNVRDKKALLRLLD-QH
Saccharomyces_cerevisiae_TFIIE LSMKKDDKVIELLKKLDRIEFDPKKGTFKYLSTYDVHSPSELLKLLRSQV


Arabidopsis_thaliana_TFIIEb1_A D-KVIDVSDLKDAYPNVMEDLKSLKSSGEIFWLLSNTDSKEGTVYRNNME
Arabidopsis_thaliana_TFIIEb2_A L-DGIAVVDLKDAYPNVMEDLKALSASGDIY-LLSN--SQEDIAYPNDFK
Lycopersicon_esculentum_TFIIEb P-EGIAVIDLKDAYPTVMEDLQALKGAGQIW-LLSNFDSQEDIAFPNDPR
Solanum_tuberosum_TFIIEb_TC605 P-EGIAVIDLKDAYPTVMEDLQALKGAGQIW-LLSKFDSQEDIAFPNDPR
Populus_trichocarpa_TFIIEb_Contig1 P-EGIAVIDLKDSYPSVMDDLQALKAVGQIW-LLSNFDSQEDIAYPNDPR
Glycine_max_TFIIEb_TC192062 P-EGLAVIDLKDAYPTVMEDLQAMKAAGQIW-LLSNFDSQEDIAYPNDPK
Medicago_truncatula_TFIIEb_TC7 P-EGIAVIDLKDSYPTVMEDLQALKGGREIW-LLSNFDSQEDIAYPNDPK
Helianthus_annuus_TFIIEb_TC949 A-EGIAVADLKDAYPTVMEDLQALKAGRQIW-LLSNFDSQEEIAYPNDPR
Hordeum_vulgare_TFIIEb_TC10289 Q-EGLAVVEVKDAYPSVLEDLQALKAAGEVW-LLSNMDSQEDIVYPNDPK
Triticum_aestivum_TFIIEb_TC110 Q-EGLAVVEVKDAYPSVLEDLQALKAAGEVW-LLSNMDSQEDIVYPNDPK
Oryza_sativa_TFIIE-beta_AAM011 P-EGLAVVEVKDAYPTVMEDLQALKAAGEVW-LLSNMDSQEDIVYPNDPK
Oryza_sativa_TFIIEb_TC151474 P-EGLAVVEVKDAYPTVMEDLQALKAAGEVW-LLSNMDSQEDIVYPNDPK
Sorghum_bicolor_TFIIEb_TC59949 P-EGLAVVEVKDAYPNVLEDLQALKAAGEVW-LLSNMDSQEDIVYPNDPK
Zea_mays_TFIIEb_TC209727 P-EGLAVVEVKDAYSNVLEDLQALKAAGEVW-LLSNMDSQEDIVYPNDPK
Hordeum_vulgare_TFIIEb_TC89335 P-EGLPVVEVKDSYPTVLDDLQALKASGDVW-WLSSMDSQEDIVYPNDPK
Triticum_aestivum_TFIIEb_TC129 P-EGLPVVEVKDSYPTVLDDLQALKASGDVW-WLSSMDSQEDIVYPNDPK
Sorghum_bicolor_TFIIEb_TC67168 E-CGLPVEDIKDAYPSVLEDLQALKASGDVW-WLSSTQSQEDMAYFNDPR
Drosophila_melanogaster_TFIIE- DLKGLGGILLDDVQESLPHCEKVLKNRSAEILFVVRPIDKKKILFYNDRT
Homo_sapiens_TFIIE-beta_NP_002 DQRGLGGILLEDIEEALPNSQKAVKALGDQILFVNRP-DKKKILFFNDKS
Saccharomyces_cerevisiae_TFIIE TFKGISCKDLKDGWPQCDETINQLEEDSKILVLRTKKDKTPRYVWYNSGG

Arabidopsis_thaliana_TFIIEb1_A YP-KIDDELKALFRDI-IPSDMLE-VEKELLKIGLKPATNIAERRAAEQL
Arabidopsis_thaliana_TFIIEb2_A CEIKVDDEFKALFRDINIPNDMLD-VEKELLKIGLKPATNTAERRAAAQT
Lycopersicon_esculentum_TFIIEb VPIKVDDDLKQLFRGIELPRDMLD-IERDLQKNGMKPATNTAKRRAMAQV
Solanum_tuberosum_TFIIEb_TC605 VPIKVDDDLKQLFRSIELPRDMLD-IERDLQKNGMKPATNTAKRRAMAQV
Populus_trichocarpa_TFIIEb_Contig1 MVIKVDDDLKQLFRGIELPRDMLD-IEKDLQKNGMKPATNTAKRRAAAQV
Glycine_max_TFIIEb_TC192062 VHIKVDDDLKHLFRSIELPRDMID-IEKDLQKNGMKPATNTAQRRSAAQI
Medicago_truncatula_TFIIEb_TC7 VPIKVDDDLKQLFRGIELPRDMID-IERDLQKNGMKPATNTAKRRSAAQM
Helianthus_annuus_TFIIEb_TC949 VPIKVDDELKQLFRSIELPRDMLD-IERDLQKNGMKPATNTAKRR----V
Hordeum_vulgare_TFIIEb_TC10289 VKIKVDDDLKELFRGIELPRDMVD-IEKDLQKNGMKPMTDTTKRRAAAQI
Triticum_aestivum_TFIIEb_TC110 VKIKVDDDLKELFRGIELPRDMVD-IEKELQKNGMKPMTDTTKRRAAAQI
Oryza_sativa_TFIIE-beta_AAM011 AKIKVDDDLKQLFREMELPRDMVD-IEKELQKNGIKPMTNTAKRRAAAQI
Oryza_sativa_TFIIEb_TC151474 AKIKVDDDLKQLFREMELPRDMVD-IEKELQKNGIKPMTNTAKRRAAAQI
Sorghum_bicolor_TFIIEb_TC59949 AKIKVDDDLKQLFREIELPRDMVD-IEKELQRNGFKPMTNTAKRRAAAQI
Zea_mays_TFIIEb_TC209727 AKIKVDDDLKQLFREIELPRDMVD-IEKELQKNGFKPMTNTAKRRAAAQI
Hordeum_vulgare_TFIIEb_TC89335 SKIKLDADLKQLYREIELPRDMID-IEKELLKNGHKPATDTTKRRAAAQI
Triticum_aestivum_TFIIEb_TC129 SKIKVDADLKQLYREIELPRDMID-IEKELLKNGHKPATDTTKRRAAAQI
Sorghum_bicolor_TFIIEb_TC67168 YNITVDNDLKELFLKTELPRDMLD-VEKEIKKSGEKPMTNTTKRRALAQI
Drosophila_melanogaster_TFIIE- ANFSVDDEFQKLWRSATVDAMDDAKIDEYLEKQGIRSMQDHGLKKAIPK-
Homo_sapiens_TFIIE-beta_NP_002 CQFSVDEEFQKLWRSVTVDSMDEEKIEEYLKRQGISSMQESGPKKVAPIQ
Saccharomyces_cerevisiae_TFIIE NLKCIDEEFVKMWENVQLPQFAEL--PRKLQDLGLKPASVDPATIKRQTK

Arabidopsis_thaliana_TFIIEb1_A HGVSNKPKDK--KKKKKEITNRTK-LTNSHMLELFQS------
Arabidopsis_thaliana_TFIIEb2_A HGISNKPKDK--KKKKQEISKRTK-LTNAHLPELFQNLNGSSSRN
Lycopersicon_esculentum_TFIIEb HGIAPKPKTK---KKKHEISKRTK-LTNAHLPELFKL------
Solanum_tuberosum_TFIIEb_TC605 HGIVPKPKTK---KKKHEISKRTK-LTNAHLPELFKL------
Populus_trichocarpa_TFIIEb_Contig1 QGISTKQKAK---KKKHEISKRTK-LTNAHLPELFKNLGS-----
Glycine_max_TFIIEb_TC192062 QGISSKPKPK---KKKSEISKRTK-LTNAHLPELFQNLNSS----
Medicago_truncatula_TFIIEb_TC7 EGISSKPKPK---KKKNEITKRTK-LTNAHLPE------
Helianthus_annuus_TFIIEb_TC949 DGS------K---WQYFE------
Hordeum_vulgare_TFIIEb_TC10289 HGVKPKAKPK---KKQREITKRTK-LTNAHLPELFQHLKS-----
Triticum_aestivum_TFIIEb_TC110 HGVKPKAKPK---KKQREITKRTK-LTNAHLPELFQHLKS-----
Oryza_sativa_TFIIE-beta_AAM011 NGVQPKAKPK---KKQREITRRTK-LTNAHLPELFQNLNT-----
Oryza_sativa_TFIIEb_TC151474 NGVQPKAKPK---KKQREITRRTK-LTNAHLPELFQNLNT-----
Sorghum_bicolor_TFIIEb_TC59949 NGVKPKAKPK---KKQREITKRTK-LTNAHLPELFQNLNT-----
Zea_mays_TFIIEb_TC209727 NGVKPKAKPK---KKQREITKRTK-LTNAHLPELFQNLNT-----
Hordeum_vulgare_TFIIEb_TC89335 HGQRPKPKAK---KKQKEITKRTK-LTNAHLPELFDLPR------
Triticum_aestivum_TFIIEb_TC129 HGQRPKPKAK---KKQKEITKRTK-LTNAHLPELFDLPR------
Sorghum_bicolor_TFIIEb_TC67168 LDAAPKTKTKGSKKKQRRLTGKSKGLTNIHMPELFDA------
Drosophila_melanogaster_TFIIE- -RKK-AANKKRQFKKPRDNEHLADVLEVYEDNTLTLKGVNPT---
Homo_sapiens_TFIIE-beta_NP_002 RRKKPASQKKRRFKT--HNEHLAGVLKDYSDITSSK------
Saccharomyces_cerevisiae_TFIIE RVEVKKKRQR---KGKITNTHMTGILKDYSHRV------


TFIIFα Alignment

Conserved C-terminal hydrophobic amino acids are highlighted in yellow.

CLUSTAL X (1.83) multiple sequence alignment

Drosophila_melanogaster_TFIIFa ------MSSASKSTPSAASGSSTSAAAAAAASVASGSAS
Homo_sapiens_TFIIF_RAP74_NP_00 ------MAALGP------
Oryza_sativa_TFIIFa_TC148835 ------MGSADLVLKAACEGCGSPSDLYGT
Triticum_aestivum_TFIIFa_TC106 ------MGSVDLVLKPACEGCGSTSDLYGT
Arabidopsis_thaliana_TFIIF-alp ------MSNCLQLNTSCVGCGSQSDLYGS
Populus_trichocarpa_TFIIF_alpha_C3 ------MSFDLLLKPSCSGCGSTTDLYGS
Saccharomyces_cerevisiae_TFIIF MSRRNPPGSRNGGGPTNASPFIKRDRMRRNFLRMRMGQNGSNSSSPGVPN

Drosophila_melanogaster_TFIIFa SSANVQEFKIRVPK-MPKKHHVMRFNATLNVDFAQWRNVKLERENNMK--
Homo_sapiens_TFIIF_RAP74_NP_00 SSQNVTEYVVRVPKNTTKKYNIMAFNAADKVNFATWNQARLERDLSNK--
Oryza_sativa_TFIIFa_TC148835 SCKHTTLCSSCGKSMALSGARCLVCSAPITNLIREYNVRANATTDKSF--
Triticum_aestivum_TFIIFa_TC106 GCKHTTLCSSCGKSMALSRARCLVCSAPITNLIREYNVRANASTDKAF--
Arabidopsis_thaliana_TFIIF-alp SCRHMTLCLKCGRTMAQNKSKCHECGTVVTRLIREYNVRAAAPTDKNY--
Populus_trichocarpa_TFIIF_alpha_C3 NCKHMTLCLNCGKTMAENRGKCFDCGT------TEYNVRASTSSDKNY--
Saccharomyces_cerevisiae_TFIIF GDNSRGSLVKKDDPEYAEEREKMLLQIGVEADAGRSNVKVKDEDPNEYNE

Drosophila_melanogaster_TFIIFa ------EFRGMEEDQPKF--GAGSEYNRDQREEARRKKFGIIARKYR
Homo_sapiens_TFIIF_RAP74_NP_00 ------KIY-QEEEMPES--GAGSEFNRKLREEARRKKYGIVLKEFR
Oryza_sativa_TFIIFa_TC148835 ------SIGRFVTGLPPFSKKKSAENKWSLHKEGLQGRQIPENMREK
Triticum_aestivum_TFIIFa_TC106 ------SIGRFVTGLPPFSKKKNAENKWSLHKEGLQGRQLTDKMLEK
Arabidopsis_thaliana_TFIIF-alp ------FIGRFVTGLPNF--KKGSENKWSLRKDIPQGRQFTDAQREK
Populus_trichocarpa_TFIIF_alpha_C3 ------FIGRFVTGLPSFSKKKNAENKWSLHKEGILGRQITDALREK
Saccharomyces_cerevisiae_TFIIF FPLRAIPKEDLENMRTHLLKFQSKKKINPVTDFHLPVRLHRKDTRNLQFQ

Drosophila_melanogaster_TFIIFa PEAQPWILKVGGKTGKKFKG------IREGGVG
Homo_sapiens_TFIIF_RAP74_NP_00 PEDQPWLLRVNGKSGRKFKG------IKKGGVT
Oryza_sativa_TFIIFa_TC148835 YNRKPWILEDETGQ-YQYQG------QMEGSQS
Triticum_aestivum_TFIIFa_TC106 YNRKPWILEDETGQ-YQFQG------HMEGSQS
Arabidopsis_thaliana_TFIIF-alp LKNKPWILEDETGQ-FQYQG------HLEGSQS
Populus_trichocarpa_TFIIF_alpha_C3 FKNKPWLLEDETGQ-SQYQG------HLEGSQS
Saccharomyces_cerevisiae_TFIIF LTRAEIVQRQKEISEYKKKAEQERSTPNSGGMNKSGTVSLNNTVKDGSQT

Drosophila_melanogaster_TFIIFa ENAAFYVFTHAPDGAIEAYPLTEWYNFQPIQRYKSLSAEEAEQEFGRRKK
Homo_sapiens_TFIIF_RAP74_NP_00 ENTSYYIFTQCPDGAFEAFPVHNWYNFTPLARHRTLTAEEAEEEWERRNK
Oryza_sativa_TFIIFa_TC148835 STATYYLLMMHGK-EFHAYPAGSWYNFSKIAQYKQLTLEEAEEKMNKRKT
Triticum_aestivum_TFIIFa_TC106 ATATYYLLMLHGK-EFHAFPAGSWYNFSKVAQYKQLTLEEAEEKMNKRKT
Arabidopsis_thaliana_TFIIF-alp --ATYYLLVMQNK-EFVAIPAGSWYNFNKVAQYKQLTLEEAEEKMKNRRK
Populus_trichocarpa_TFIIF_alpha_C3 --ATYYLLMMTGK-EFVAIPAGSWYNFNKVAHYKQLTLEEAEEKMKNRRK
Saccharomyces_cerevisiae_TFIIF PTVDSVTKDNTANGVNSSIPTVTGSSVPPASPTTVSAIESNGLSNGSTSA

Drosophila_melanogaster_TFIIFa VMNYFSLMLRKRLRGDEEEEQDPEEA--KLI--KAATKKSKELKITDMDE
Homo_sapiens_TFIIF_RAP74_NP_00 VLNHFSIMQQRRLK---DQDQDEDEE--EKE--KRGRRKASELRIHDLED
Oryza_sativa_TFIIFa_TC148835 SATGYERWMMKAATNGPAAFGSDVKK--LEP--TNGTEKENARPKKGKNN
Triticum_aestivum_TFIIFa_TC106 SATGYERWMMKAATNGPAAFGSDMMK--LEP--ANDGEKESARHKKGKDN
Arabidopsis_thaliana_TFIIF-alp TADGYQRWMMKAANNGPALFGEVDNE--KESGGTSGGGGRGRKKSSGGDE
Populus_trichocarpa_TFIIF_alpha_C3 TADGYERWMMKAANNGAAAFGEVEKV--DDKEGVSAGGRGGRRKASG-DD
Saccharomyces_cerevisiae_TFIIF ANGLDGNASTANLANGRPLVTKLEDAGPAEDPTKVGMVKYDGKEVTNEPE

Drosophila_melanogaster_TFIIFa WID-SEDESDSEDEEDKKKKEQEDSDDGKAKGKGKKGADKKKKKRDVDDE
Homo_sapiens_TFIIF_RAP74_NP_00 DLEMSSDASDASGEEGGR------VPKAKKKAPLAKGGRKKKKKKGSDDE
Oryza_sativa_TFIIFa_TC148835 EEGNNSDKGEEDEEEEAA------RKNRLALNKK
Triticum_aestivum_TFIIFa_TC106 EEGNNSDKGEENEEEEAA------RKDRLGLSKR
Arabidopsis_thaliana_TFIIF-alp EEGNVSDRGDEDEEEEAS------RKSRLGLNRK
Populus_trichocarpa_TFIIF_alpha_C3 DEGNVSDRGEEDEEEEAG------RKSRLGLNKQ
Saccharomyces_cerevisiae_TFIIF FEEGTMDPLADVAPDGGG------RAKRGNLRRKTRQLKVLDEN

Drosophila_melanogaster_TFIIFa AFEESDDGDEEGREMDYDT---SSSEDEPDPE------AKVDKDMKGVAE
Homo_sapiens_TFIIF_RAP74_NP_00 AFEDSDDGDFEGQEVDYMSDGSSSSQEEPESK------AKAPQQEEGPKG
Oryza_sativa_TFIIFa_TC148835 SMDDD-EEGGKDLDFDLDD-EIEKGDDWEHEE------TFTDDDEAVDID
Triticum_aestivum_TFIIFa_TC106 GMDDD-EEGGKDLDFDLDD-DIEKGDDWEHEE------TFTDDDEAVDID
Arabidopsis_thaliana_TFIIF-alp SNDDDDEEGPRGGDLDMDDDDIEKGDDWEHEE------IFTDDDEAVGND
Populus_trichocarpa_TFIIF_alpha_C3 GGDDD-EEGPRGGDLDMDDDDIEKGDDWEHEE------IFTDDDEAVAID
Saccharomyces_cerevisiae_TFIIF AKKLRFEEFYPWVMEDFDGYNTWVGSYEAGNSDSYVLLSVEDDGSFTMIP


Drosophila_melanogaster_TFIIFa EDALRKLLTSDEEEDDEKKSDESDKEDADGEKKKKDKGKDEVSKDKKKKK
Homo_sapiens_TFIIF_RAP74_NP_00 VDEQ----SDSSEESEEEKPPEEDKEEEE------EKKA
Oryza_sativa_TFIIFa_TC148835 PEERAD-LAPEIPAPPEIKQD--DEENEEE------GGLSKSG
Triticum_aestivum_TFIIFa_TC106 PEERAD-LAPEIPAPPEIKQD--DEENEEE------GGLSKSG
Arabidopsis_thaliana_TFIIF-alp PEEREDLLAPEIPAPPEIKQDEDDEENEEEE------GGLSKSG
Populus_trichocarpa_TFIIF_alpha_C3 PEERED-LAPEVPAPPEIKQDEDDEDEENEE------GGLSKSG
Saccharomyces_cerevisiae_TFIIF ADKVYKFTARNKYATLTIDEAEKRMDKKSGEVPRWLMKHLDNIGTTTTRY

Drosophila_melanogaster_TFIIFa PTKDDKKGKSNGSGDSSTDFSSDSTDSEDDLS---NGPPKKKVVVKDKDK
Homo_sapiens_TFIIF_RAP74_NP_00 PTPQEKKRRKDSSEES------DSSE-ESDID---SEASSAFFMAKKKTP
Oryza_sativa_TFIIFa_TC148835 KELKKLLGKAAGLNESDADEDDEDDDQEDESS-PVLAPKQKDQ-PKDEPV
Triticum_aestivum_TFIIFa_TC106 KELKKLLGRSSGQNESDADDDDEEDDQDDESS-PVLAPKQTDQ-PKDEPV
Arabidopsis_thaliana_TFIIF-alp KELKKLLGKANGLDESDEDDDDDSDDEEETNYGTVTNSKQKEA-AKEEPV
Populus_trichocarpa_TFIIF_alpha_C3 KELKKLLGKANGLNESDVEDDDDDEDMDDDIS-PVLAPKQKDVVPKEEAA
Saccharomyces_cerevisiae_TFIIF DRTRRKLKAVADQQAMDEDDRDDNSEVELDYDEEFADDEEAPIIDGNEQE

Drosophila_melanogaster_TFIIFa EKEKE---KESA--ASSKVIASS-SNANKSRSATPTLSTDASKRKMNSLP
Homo_sapiens_TFIIF_RAP74_NP_00 PKRER---KPSG--GSSRGNSRPGTPSAEGGSTSSTLRAAASKLEQGKRV
Oryza_sativa_TFIIFa_TC148835 DNSPA---KPTP-SGHARGTPPASK-SKQKRKSGGGDDSKASGGAASKKA
Triticum_aestivum_TFIIFa_TC106 DNSPA---KPTPSSGHARSTPPASK-SKQKRKSGG-DDAKASSGAASKKA
Arabidopsis_thaliana_TFIIF-alp DNAPA---KPAP-SGPPRGTPPAKP-SKGKRKLNDGDSKKPSS-SVQKKV
Populus_trichocarpa_TFIIF_alpha_C3 DISPA---KPTP-SGSAKGTPSTSKSAKGKRKLNG-EDAKSSNGAPVKKV
Saccharomyces_cerevisiae_TFIIF NKESEQRIKKEMLQANAMGLRDEEAPSENEEDELFGEKKIDEDGERIKKA

Drosophila_melanogaster_TFIIFa SDLTASDTSNSPTSTPAKRPKNEI-----STSLPTSFSGGKVEDYGITEE
Homo_sapiens_TFIIF_RAP74_NP_00 SEMPAAKRLRLDTGPQSLSGKS------TPQPPSGKTTPNSGDVQVTED
Oryza_sativa_TFIIFa_TC148835 KVESDTKPSVAKDETPSSSKP------ASKATAASKTSANVSPVTED
Triticum_aestivum_TFIIFa_TC106 KVESDTKTSSIKEETPSSSKP------TPKASASSR-SANVSPVTED
Arabidopsis_thaliana_TFIIF-alp KTENDPKSSLKEERANTVSKSNTPTKAVKAEPASAPASSSSAATGPVTED
Populus_trichocarpa_TFIIF_alpha_C3 KTENEVKPAVKEESSPATKGTATP----KVTPPSSKTGSTSGSTGPVTEE
Saccharomyces_cerevisiae_TFIIF LQKTELAALYSSDENEINPYLSESD---IENKENESPVKKEEDSDTLSKS

Drosophila_melanogaster_TFIIFa AVRRYLKRK-PLTATELLTKFKNKKTPVSS-----DRLVETMTKILKKIN
Homo_sapiens_TFIIF_RAP74_NP_00 AVRRYLTRK-PMTTKDLLKKFQTKKTGLSS-----EQTVNVLAQILKRLN
Oryza_sativa_TFIIFa_TC148835 EIRTVLLAVAPVTTQDLVSRFKSRLRGPE------DKNAFAEILKKIS
Triticum_aestivum_TFIIFa_TC106 EIRTVLLAVAPVTTQDLVSRFKSRLRGPE------DKNAFAEILKKIS
Arabidopsis_thaliana_TFIIF-alp EIRAVLMEKKQVTTQDLVSRFKARLKTKE------DKNAFANILRKIS
Populus_trichocarpa_TFIIF_alpha_C3 EIRAVLLQNGPVTTQDLVARFKSRLRTPECFTADYSLGLSVRLCMLLRIC
Saccharomyces_cerevisiae_TFIIF KRSSPKKQQKKATNAHVHKEPTLRVKSIKN----CVIILKGDKKILKSFP

Drosophila_melanogaster_TFIIFa PVKHTIQGKMYLWIK------
Homo_sapiens_TFIIF_RAP74_NP_00 PERKMINDKMHFSLKE------
Oryza_sativa_TFIIFa_TC148835 KIQKTNG-HNYVVLRDDKK------
Triticum_aestivum_TFIIFa_TC106 KIQKTNG-HNYVVLREDKK------
Arabidopsis_thaliana_TFIIF-alp KIQKNAGSQNFVVLREKCQPKPGKRESRVNKLNIRS
Populus_trichocarpa_TFIIF_alpha_C3 VVHDGINTISGVWVAKFLHRTWGFYQFNKGWVGGTG
Saccharomyces_cerevisiae_TFIIF EGEWNPQTTKAVDSSNNASNTVPSPIKQEEGLNSTV

TFIIFβ Alignment

CLUSTAL X (1.83) multiple sequence alignment

Glycine_max_TFIIFb_TC178154 MDEENGYS-GSIS------SNLETTKAERSVWLMKCP
Medicago_truncatula_TFIIFb_TC7 MEDENSYG-GSSGG------SNLETSKAERSVWLMKCP
Vitis_vinifera_TFIIFb_TC20528 MEEEQ----GNSSS------SNLETGKAERSVWLMKCP
Populus_balsamifera_TFIIF-beta 2 MEEDHSNGGNSSSS------GNLETSKADKAVWLMKCP
Populus_balsamifera_TFIIF-beta 3 MEED-----NSSSS------ANLETSKADKSVWLMKCP
Arabidopsis_thaliana_TFIIFb-2_ MEDIH------NLDIEKSDRSIWLMKCP
Hordeum_vulgare_TFIIFb_TC10374 MGDEA------KYLETARADRSVWLMKCP
Triticum_aestivum_TFIIFb_TC122 MGDEA------KYLETARADRSVWLMKCP
Oryza_sativa_TFIIFb_TC137623 MAEEA------KNLETARADRSVWLMKCP
Arabidopsis_thaliana_TFIIFb-1_ MEDVKVEMKVRKNEN------EALETGLAERSMLLMKAP
Populus_balsamifera_TFIIF-beta 1 MDDEASNSSSGNNNNNNKNLTNDNNNKSPVLGGFLDASKAEKSVWLMKCP
Drosophila_melanogaster_TFIIF_ MSKEDKEK------TQIIDKDLDLSNAGRGVWLVKVP
Homo_sapiens_TFIIFb_NP_004119 MAERG------ELDLTGAKQNTGVWLVKVP
Saccharomyces_cerevisiae_TFIIF LSQEEEVFDGNDIENNE------TKVYEESLDLDLERSNRQVWLVRLP

Glycine_max_TFIIFb_TC178154 LVVAKSWQTHPP------S--QPLAKVVLSLDPLHPEE------
Medicago_truncatula_TFIIFb_TC7 VAVAKSWQNHPP------S--QPLSKVVFSIDPLLPE------
Vitis_vinifera_TFIIFb_TC20528 LAVSKSWQSHSS------SESQPVAKVVLSLDPLRSE------
Populus_balsamifera_TFIIF-beta 2 VVVAKSWKSHHT------SSSDSAPLAKVVLSLDPLQSD------
Populus_balsamifera_TFIIF-beta 3 VVVAKSWKTHTSP------SSSDSAPLAKVVLSLDPLQSD------
Arabidopsis_thaliana_TFIIFb-2_ VVVDKAWHKIAASSSS-SFASSDSPPDMAKIVREVDPLRDD------
Hordeum_vulgare_TFIIFb_TC10374 PVVSQAWQGASASS-----GDANPNPVVAKVVLSLDPLSSA------
Triticum_aestivum_TFIIFb_TC122 PVVSQAWQGASSSS-----GDANPNPVVAKVVLSLDPLSSA------
Oryza_sativa_TFIIFb_TC137623 TVVSRAWQEAATAA-----ASSSSSSDAAAGANSNSNANPN------
Arabidopsis_thaliana_TFIIFb-1_ SLVASSLQSHSFPDDP--YRPDDPYRPDAKVILGVDPLAHEDEGTQLFRV
Populus_balsamifera_TFIIF-beta 1 SIVSRFLRSQEHEVG----DGDASSPPVAKVIVSVDPLKSNDDDNS----
Drosophila_melanogaster_TFIIF_ KYIAQKWEKAPTNMDVGKLRINKTPGQKAQVSLSLTPAVLAL------
Homo_sapiens_TFIIFb_NP_004119 KYLSQQWAKASGRGEVGKLRIAKTQG-RTEVSFTLN------
Saccharomyces_cerevisiae_TFIIF MFLAEKWRDRNNLHG-QELGKIRINKDGSKITLLLNENDNDS------

Glycine_max_TFIIFb_TC178154 ------
Medicago_truncatula_TFIIFb_TC7 ------
Vitis_vinifera_TFIIFb_TC20528 ------
Populus_balsamifera_TFIIF-beta 2 ------
Populus_balsamifera_TFIIF-beta 3 ------
Arabidopsis_thaliana_TFIIFb-2_ ------
Hordeum_vulgare_TFIIFb_TC10374 ------
Triticum_aestivum_TFIIFb_TC122 ------
Oryza_sativa_TFIIFb_TC137623 ------
Arabidopsis_thaliana_TFIIFb-1_ SSNHSGKFHPLRNLLLHSLKFHGFGEMGFLSLEISSGPHFDHEIPCNLRI
Populus_balsamifera_TFIIF-beta 1 ------
Drosophila_melanogaster_TFIIF_ ------
Homo_sapiens_TFIIFb_NP_004119 ------
Saccharomyces_cerevisiae_TFIIF ------

Glycine_max_TFIIFb_TC178154 ------DDPSAVQFTMEMA------
Medicago_truncatula_TFIIFb_TC7 ------DDPAHLQFTMEMS------
Vitis_vinifera_TFIIFb_TC20528 ------DPSALEFTMEMT------
Populus_balsamifera_TFIIF-beta 2 ------DPSAIQFTMEMA------
Populus_balsamifera_TFIIF-beta 3 ------DPSALQFTMEMA------
Arabidopsis_thaliana_TFIIFb-2_ ------SPPEFKMYMV------
Hordeum_vulgare_TFIIFb_TC10374 ------EPSLQFKMEMS------
Triticum_aestivum_TFIIFb_TC122 ------EPSLQFKMEMS------
Oryza_sativa_TFIIFb_TC137623 ------PVVAKFKMEMA------
Arabidopsis_thaliana_TFIIFb-1_ GLLSMNFLARHLNYEFLGVKHGNSFALQFVMELA------
Populus_balsamifera_TFIIF-beta 1 ------ATEDYPNALNFELSLVLFCLVFTLHDFFCSLW
Drosophila_melanogaster_TFIIF_ ------DPEEKIPTEHILD------
Homo_sapiens_TFIIFb_NP_004119 ------EDLANIHDIG------
Saccharomyces_cerevisiae_TFIIF ------IPHEYDLELTKKVVENEYVFTEQNLKKY

Glycine_max_TFIIFb_TC178154 ---GTEAV---NMSKTYSLNMFKDFVPMCVFSETSQGG------
Medicago_truncatula_TFIIFb_TC7 ---GTEAV---NMPKTYSLNMFKDFVPMCIFSETSEGD------
Vitis_vinifera_TFIIFb_TC20528 ---GTGAP---NMPKSYSLNMFKDFVPMCVFSETNQG------
Populus_balsamifera_TFIIF-beta 2 ---RTETG---NVPKSYSLNMFKDFVPMGVFSETPQG------
Populus_balsamifera_TFIIF-beta 3 ---RTEAG---NVPKSYSLNMFKDFVPMCVFSETPQG------
Arabidopsis_thaliana_TFIIFb-2_ ---GAEYG---NMPKCYALNMFTDFVPMGGFSDVNQG------
Hordeum_vulgare_TFIIFb_TC10374 ---QTSVASTCNLPKSYSLNMFKDFVPMCVFSETNQG------
Triticum_aestivum_TFIIFb_TC122 ---QTSVASTCNLPKSYSLNMFKDFVPMCVFSETNQG------
Oryza_sativa_TFIIFb_TC137623 ---QTGNG---NTPKSYSLNMFKDFVPMCVFSESNQG------
Arabidopsis_thaliana_TFIIFb-1_ ---RADSG---NMPRRYTLDMSKDFIPMNVFCESSDDFGSLGEE------
Populus_balsamifera_TFIIF-beta 1 KWLGTGLG---DGLKSYSMEMSKDLVDMSVFSESSQG------
Drosophila_melanogaster_TFIIF_ ---VSQVTKQTLGVFSHMAPSDGKENSTTSAAQPDNE------
Homo_sapiens_TFIIFb_NP_004119 ---GKPASVSAPREHPFVLQSVG-GQTLTVFTESSSD------
Saccharomyces_cerevisiae_TFIIF QQRKKELEADPEKQRQAYLKKQEREEELKKKQQQQKRRNNRKKFNHRVMT

Glycine_max_TFIIFb_TC178154 ------KVAMEGKVEHKFDMKPHGENIEEYGKLCRERTN
Medicago_truncatula_TFIIFb_TC7 ------KVAMEGKVEHKFDMKPRHENMDDYGKLCRERTK
Vitis_vinifera_TFIIFb_TC20528 ------RVAMEGKVEHKFDMKPHNENIEEYGKLCRERTN
Populus_balsamifera_TFIIF-beta 2 ------RVSMEGKVEHKFDMKPHEENIEEYSKLCRDRTK
Populus_balsamifera_TFIIF-beta 3 ------KVAMEGKVEHKFDMKPHEQNIEEYHKLCRERTK
Arabidopsis_thaliana_TFIIFb-2_ ------CAAAEGKVDHKFDMKPYGETIEEYARLCRERTS
Hordeum_vulgare_TFIIFb_TC10374 ------KLSCEGKVEHKFDMEPHKDNLLNYAKLCRERTQ
Triticum_aestivum_TFIIFb_TC122 ------KLSCEGKVEHKFDMEPHKDNLLNYAKLCRERTQ
Oryza_sativa_TFIIFb_TC137623 ------KLSCEGKVGHKFDMEPHSDNLVNYGKLCRERTQ
Arabidopsis_thaliana_TFIIFb-1_ ------FSIGMFIYSPGKMSVEGKIKNKFDMRPHNENIESYGRLCRERTN
Populus_balsamifera_TFIIF-beta 1 ------KLSVEGRILNKFDVRPHSENLENYRKICRERTK
Drosophila_melanogaster_TFIIF_ ------KLYMEGRIVQKLECRPIAD--NCYMKLKLESIR
Homo_sapiens_TFIIFb_NP_004119 ------KLSLEGIVVQRAECRPAAS--ENYMRLKRLQIE
Saccharomyces_cerevisiae_TFIIF DRDGRDRYIPYVKTIPKKTAIVGTVCHECQVMPSMNDPNYHKIVEQRRNI

Glycine_max_TFIIFb_TC178154 KSMIKNRQIQVIDNDRGVLMRPMPGMIG------LVSSNSKDK-KKTQP
Medicago_truncatula_TFIIFb_TC7 KSMIKNRQVQIIADDRGTHMRPMPGMVG------LVSSNFKDK-KRTQP
Vitis_vinifera_TFIIFb_TC20528 KSMIKNRQIQVIDNDRGVHMRPMPGMVG------LIASNSKDK-KKTAP
Populus_balsamifera_TFIIF-beta 2 KSMIKNRQIRVIDNDRGVHMRPMPGMVG------LISSTSKDK-KKTQP
Populus_balsamifera_TFIIF-beta 3 KSMVKIRQIQVINNDRGVHMRPMPGMVG------LISSSSKDK-KRPQP
Arabidopsis_thaliana_TFIIFb-2_ KAMVKNRQIQVIDNDRGVHMRPMPGMLG------LVSSNSKEK-RKPPP
Hordeum_vulgare_TFIIFb_TC10374 KSMVKTRKVQVLDNDHGMSMRPMPGMVG------LISSSSKEK-RKPTP
Triticum_aestivum_TFIIFb_TC122 KSMVKTRKVQVLDNDHGMSMRPMPGMVG------LISSSSKEK-RKPTP
Oryza_sativa_TFIIFb_TC137623 KSMIKNRKLMVLANDNGMSMRPLPGLVG------LMSSGPKQKEKKPLP
Arabidopsis_thaliana_TFIIFb-1_ KYMGKNRQIQVIDNARGMHMRPMPGMI------IPTAAPEK--KKLT
Populus_balsamifera_TFIIF-beta 1 KYMVKSRQIKVIDNDTGSHMMPMPGMIISGLAVLSFFYIFVNDK--KKLP
Drosophila_melanogaster_TFIIF_ KASEPQRRVQPIDKIVQN-FKPVKDHAH------NIEYRER------
Homo_sapiens_TFIIFb_NP_004119 ESSKPVRLSQQLDKVVTTNYKPVANHQY------NIEYERK------
Saccharomyces_cerevisiae_TFIIF VKLNNKERITTLDETVGVTMSHTGMSMR------SDNSNFLKVGRE

Glycine_max_TFIIFb_TC178154 VKQSDTKRTRRDRGELEDIMFKLFERQPNWALKQLVQETDQPAQFLKEIL
Medicago_truncatula_TFIIFb_TC7 VKQTDTKRTRRDRGELEDIMFKLFERQPNWALKQLVQETDQPAQFLKEIL
Vitis_vinifera_TFIIFb_TC20528 VKGSDMKRTRRDRGELEDIMFKLFERQPNWALKQLVQETDQPAQFLKEIL
Populus_balsamifera_TFIIF-beta 2 VKQSDVKRTRRDRGELEDIMFKLFERQPNWALKQLVQETDQPAQFLKEIL
Populus_balsamifera_TFIIF-beta 3 VKQSDVKRTRRDRGELEDIMFKLFERQPNWALKQLVQETDQPAQFLKEIL
Arabidopsis_thaliana_TFIIFb-2_ VKQTEVKRTRRDRGELEAIMFKLFEGQPNWTLKQLVQETDQPAQFLKEIL
Hordeum_vulgare_TFIIFb_TC10374 TKPSDVKRTRRDRRELENIIFKLFEKQPNWALKALVQETDQPEQFLKEIL
Triticum_aestivum_TFIIFb_TC122 TKPSDVKRTRRDRRELENIIFKLFEKQPNWALKALVQETDQPEQFLKEIL
Oryza_sativa_TFIIFb_TC137623 VKPSDMKRTRRDRRELENILFKLFERQPNWSLKNLMQETDQPEQFLKEIL
Arabidopsis_thaliana_TFIIFb-1_ NRTSEMKRTRRDRREMEEVMFNLFERQSNWTLRLLIQETDQPEQFLKDLL
Populus_balsamifera_TFIIF-beta 1 IKASDMKRTRRDRREMEGIMFKLFEKQPNWTLKQLVQETDQP------
Drosophila_melanogaster_TFIIF_ -KKAEGKKARDDKNAVMDMLFHAFEKHQYYNIKDLVKITNQPISYLKEIL
Homo_sapiens_TFIIFb_NP_004119 -KKEDGKRARADKQHVLDMLFSAFEKHQYYNLKDLVDITKQPVVYLKEIL
Saccharomyces_cerevisiae_TFIIF KAKSNIKSIRMPKKEILDYLFKLFDEYDYWSLKGLKERTRQPEAHLKECL

Glycine_max_TFIIFb_TC178154 NELCVYNKRGANQGTYELKPEYKKSVEDTSAE--
Medicago_truncatula_TFIIFb_TC7 NELCVYNKRGANQGTYELKPEYKKSVEDANAE--
Vitis_vinifera_TFIIFb_TC20528 NELCVYNKRGTNQGTYELKPEYKKSAEDTGAE--
Populus_balsamifera_TFIIF-beta 2 NELCVYNKRGTNQGTYELKPEYKKTAEDTGAD--
Populus_balsamifera_TFIIF-beta 3 NELCVYNKRGTNQGTYELKPEYKKTVEDTGAD--
Arabidopsis_thaliana_TFIIFb-2_ NELCVYNKRGSNQGTYELKPEYKKSAEDDTGGQ-
Hordeum_vulgare_TFIIFb_TC10374 NDLCMYNKRGPNQGTHELKPEYKKSSEDAAGAP-
Triticum_aestivum_TFIIFb_TC122 NDLCMYNKRGPNQGTHELKPEYKKSSEDAAGAP-
Oryza_sativa_TFIIFb_TC137623 NDLCFYNKRGPNQGTHELKPEYKKSTEDADATAT
Arabidopsis_thaliana_TFIIFb-1_ KDLCIYNNKGSNQGTYELKPEYKKATQE------
Populus_balsamifera_TFIIF-beta 1 -EVCFLTT------
Drosophila_melanogaster_TFIIF_ KDVCDYNMKNPHKNMWELKKEYRHYKTEEKKEEE
Homo_sapiens_TFIIFb_NP_004119 KEIGVQNVKGIHKNTWELKPEYRHYQGEEKSD--
Saccharomyces_cerevisiae_TFIIF DKVATLVKKGPYAFKYTLRPEYKKLKEEERKATL

LIST OF REFERENCES

Albright, S.R., and Tjian, R. (2000). TAFs revisited: more data reveal new twists and confirm old ideas. Gene 242, 1-13.

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. J Mol Biol 215, 403-410.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.

Andel, F., III, Ladurner, A.G., Inouye, C., Tjian, R., and Nogales, E. (1999). Three-dimensional structure of the human TFIID-IIA-IIB complex. Science 286, 2153-2156.

Apone, L.M., Virbasius, C.A., Holstege, F.C., Wang, J., Young, R.A., and Green, M.R. (1998). Broad, but not universal, transcriptional requirement for yTAFII17, a histone H3-like TAFII present in TFIID and SAGA. Mol Cell 2, 653-661.

Attwooll, C., Tariq, M., Harris, M., Coyne, J.D., Telford, N., and Varley, J.M. (1999). Identification of a novel fusion gene involving hTAFII68 and CHN from a t(9;17)(q22;q11.2) translocation in an extraskeletal myxoid chondrosarcoma. Oncogene 18, 7599-7601.

Auble, D.T., Hansen, K.E., Mueller, C.G., Lane, W.S., Thorner, J., and Hahn, S. (1994). Mot1, a global repressor of RNA polymerase II transcription, inhibits TBP binding to DNA by an ATP-dependent mechanism. Genes Dev 8, 1920-1934.

Austin, R.J., and Biggin, M.D. (1996). Purification of the Drosophila RNA polymerase II general transcription factors. Proc Natl Acad Sci U S A 93, 5788-5792.

Auty, R., Steen, H., Myers, L., Gygi, S., and Buratowski, S. (2003). The Purification and Initial Characterization of TFIID from Saccharomyces cerevisiae. In 22nd Summer Symposium in Molecular Biology: Chromatin Structure and Function (State College, PA), pp. 151.

Ayer, D.E. (1999). Histone deacetylases: transcriptional repression with SINers and NuRDs. Trends Cell Biol 9, 193-198.


Baldwin, D.A., and Gurley, W.B. (1996). Isolation and characterization of cDNAs encoding transcription factor IIB from Arabidopsis and soybean. Plant J 10, 561-568.

Barlev, N.A., Candau, R., Wang, L., Darpino, P., Silverman, N., and Berger, S.L. (1995). Characterization of physical interactions of the putative transcriptional adaptor, ADA2, with acidic activation domains and TATA-binding protein. J Biol Chem 270, 19337-19344.

Beckmann, H., Chen, J.L., O'Brien, T., and Tjian, R. (1995). Coactivator and promoter-selective properties of RNA polymerase I TAFs. Science 270, 1506-1509.

Bell, B., and Tora, L. (1999). Regulation of gene expression by multiple forms of TFIID and other novel TAFII-containing complexes. Exp Cell Res 246, 11-19.

Bell, S.D., Brinkman, A.B., van der Oost, J., and Jackson, S.P. (2001). The archaeal TFIIE alpha homologue facilitates transcription initiation by enhancing TATA-box recognition. EMBO Rep 2, 133-138.

Berger, S.L. (2003). Histone Covalent Modifications in Gene Regulation. In 22nd Summer Symposium in Molecular Biology: Chromatin Structure and Function (State College, PA), pp. 57.

Berk, A.J. (1999). Activation of RNA polymerase II transcription. Curr Opin Cell Biol 11, 330-335.

Bertolotti, A., Lutz, Y., Heard, D.J., Chambon, P., and Tora, L. (1996). hTAF(II)68, a novel RNA/ssDNA-binding protein with homology to the pro-oncoproteins TLS/FUS and EWS is associated with both TFIID and RNA polymerase II. EMBO J 15, 5022-5031.

Bertolotti, A., Melot, T., Acker, J., Vigneron, M., Delattre, O., and Tora, L. (1998). EWS, but not EWS-FLI-1, is associated with both TFIID and RNA polymerase II: interactions between two members of the TET family, EWS and hTAFII68, and subunits of TFIID and RNA polymerase II complexes. Mol Cell Biol 18, 1489-1497.

Birck, C., Poch, O., Romier, C., Ruff, M., Mengus, G., Lavigne, A.C., Davidson, I., and Moras, D. (1998). Human TAF(II)28 and TAF(II)18 interact through a histone fold encoded by atypical evolutionary conserved motifs also found in the SPT3 family. Cell 94, 239-249.

Brand, M., Leurent, C., Mallouh, V., Tora, L., and Schultz, P. (1999). Three-dimensional structures of the TAFII-containing complexes TFIID and TFTC. Science 286, 2151-2153.


Buratowski, S., Hahn, S., Guarente, L., and Sharp, P.A. (1989). Five intermediate complexes in transcription initiation by RNA polymerase II. Cell 56, 549-561.

Buratowski, S., and Zhou, H. (1993). Functional domains of transcription factor TFIIB. Proc Natl Acad Sci U S A 90, 5633-5637.

Burke, T.W., and Kadonaga, J.T. (1997). The downstream core promoter element, DPE, is conserved from Drosophila to humans and is recognized by TAFII60 of Drosophila. Genes Dev 11, 3020-3031.

Burley, S.K., and Roeder, R.G. (1998). TATA box mimicry by TFIID: autoinhibition of pol II transcription. Cell 94, 551-553.

Burley, S.K., and Roeder, R.G. (1996). Biochemistry and Structural Biology of Transcription Factor IID (TFIID). In Annual Review of Biochemistry, C.C. Richardson, J.N. Abelson, and C.R.H. Raetz, eds (Palo Alto, CA: Annual Reviews Inc.), pp. 769-799.

Cairns, B.R., Henry, N.L., and Kornberg, R.D. (1996). TFG/TAF30/ANC1, a component of the yeast SWI/SNF complex that is similar to the leukemogenic proteins ENL and AF-9. Mol Cell Biol 16, 3308-3316.

Campanella, J., Bitincka, L., and Smalley, J. (2003). MatGAT: An application that generates similarity/identity matrices using protein or DNA sequences. BMC Bioinformatics 4, 29.

Carmo-Fonseca, M., Mendes-Soares, L., and Campos, I. (2000). To be or not to be in the nucleolus. Nat Cell Biol 2, E107-112.

Chalkley, G.E., and Verrijzer, C.P. (1999). DNA binding site selection by RNA polymerase II TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator. EMBO J 18, 4835-4845.

Chang, M., and Jaehning, J.A. (1997). A multiplicity of mediators: alternative forms of transcription complexes communicate with transcriptional regulators. Nucleic Acids Res 25, 4861-4865.

Chasman, D.I., Flaherty, K.M., Sharp, P.A., and Kornberg, R.D. (1993). Crystal structure of yeast TATA-binding protein and model for interaction with DNA. Proc Natl Acad Sci U S A 90, 8174-8178.

Chatterjee, S., and Struhl, K. (1995). Connecting a promoter-bound protein to TBP bypasses the need for a transcriptional activation domain. Nature 374, 820-822.


Chen, Z., and Manley, J.L. (2000). Robust mRNA transcription in chicken DT40 cells depleted of TAF(II)31 suggests both functional degeneracy and evolutionary divergence. Mol Cell Biol 20, 5064-5076.

Chen, Z., and Manley, J.L. (2003). In vivo functional analysis of the histone 3-like TAF9 and a TAF9-related factor, TAF9L. J Biol Chem 278, 35172-35183.

Chicurel, M. (2002). Putting a name on it. Nature 419, 755, 757.

Choi, C.H., Hiromura, M., and Usheva, A. (2003). Transcription factor IIB acetylates itself to regulate transcription. Nature 424, 965-969.

Clos, J., Buttgereit, D., and Grummt, I. (1986). A purified transcription factor (TIF-IB) binds to essential sequences of the mouse rDNA promoter. Proc Natl Acad Sci U S A 83, 604-608.

Collart, M.A. (1996). The NOT, SPT3, and MOT1 genes functionally interact to regulate transcription at core promoters. Mol Cell Biol 16, 6668-6676.

Colombatti, A., Bonaldo, P., and Doliana, R. (1993). Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular matrix proteins. Matrix 13, 297-306.

Comai, L., Zomerdijk, J.C., Beckmann, H., Zhou, S., Admon, A., and Tjian, R. (1994). Reconstitution of transcription factor SL1: exclusive binding of TBP by SL1 or TFIID subunits. Science 266, 1966-1972.

Coulombe, B. (1999). DNA wrapping in transcription initiation by RNA polymerase II. Biochem Cell Biol 77, 257-264.

Coulombe, B., and Burton, Z.F. (1999). DNA bending and wrapping around RNA polymerase: a "revolutionary" model describing transcriptional mechanisms. Microbiol Mol Biol Rev 63, 457-478.

Cramer, P., Bushnell, D.A., Fu, J., Gnatt, A.L., Maier-Davis, B., Thompson, N.E., Burgess, R.R., Edwards, A.M., David, P.R., and Kornberg, R.D. (2000). Architecture of RNA polymerase II and implications for the transcription mechanism. Science 288, 640-649.

Czermin, B., Melfi, R., McCabe, D., Seitz, V., Imhof, A., and Pirrotta, V. (2002). Drosophila enhancer of Zeste/ESC complexes have a histone H3 methyltransferase activity that marks chromosomal Polycomb sites. Cell 111, 185-196.

Davie, J.R., and Murphy, L.C. (1990). Level of ubiquitinated histone H2B in chromatin is coupled to ongoing transcription. Biochemistry 29, 4752-4757.


Dhalluin, C., Carlson, J.E., Zeng, L., He, C., Aggarwal, A.K., and Zhou, M.M. (1999). Structure and ligand of a histone acetyltransferase bromodomain. Nature 399, 491-496.

Dikstein, R., Ruppert, S., and Tjian, R. (1996a). TAFII250 is a bipartite protein kinase that phosphorylates the base transcription factor RAP74. Cell 84, 781-790.

Dikstein, R., Zhou, S., and Tjian, R. (1996b). Human TAFII 105 is a cell type-specific TFIID subunit related to hTAFII130. Cell 87, 137-146.

Dubrovskaya, V., Lavigne, A.C., Davidson, I., Acker, J., Staub, A., and Tora, L. (1996). Distinct domains of hTAFII100 are required for functional interaction with transcription factor TFIIF beta (RAP30) and incorporation into the TFIID complex. EMBO J 15, 3702-3712.

Dvir, A., Conaway, J.W., and Conaway, R.C. (2001). Mechanism of transcription initiation and promoter escape by RNA polymerase II. Curr Opin Genet Dev 11, 209-214.

Dynlacht, B.D., Hoey, T., and Tjian, R. (1991). Isolation of coactivators associated with the TATA-binding protein that mediate transcriptional activation. Cell 66, 563-576.

Eisenmann, D.M., Arndt, K.M., Ricupero, S.L., Rooney, J.W., and Winston, F. (1992). SPT3 interacts with TFIID to allow normal transcription in Saccharomyces cerevisiae. Genes Dev 6, 1319-1331.

Eulgem, T., Rushton, P.J., Robatzek, S., and Somssich, I.E. (2000). The WRKY superfamily of plant transcription factors. Trends Plant Sci 5, 199-206.

Fang, S.M., and Burton, Z.F. (1996). RNA polymerase II-associated protein (RAP) 74 binds transcription factor (TF) IIB and blocks TFIIB-RAP30 binding. J Biol Chem 271, 11703-11709.

Flores, O., Ha, I., and Reinberg, D. (1990). Factors involved in specific transcription by mammalian RNA polymerase II. Purification and subunit composition of transcription factor IIF. J Biol Chem 265, 5629-5634.

Flores, O., Lu, H., Killeen, M., Greenblatt, J., Burton, Z.F., and Reinberg, D. (1991). The small subunit of transcription factor IIF recruits RNA polymerase II into the preinitiation complex. Proc Natl Acad Sci U S A 88, 9999-10003.

Flores, O., Lu, H., and Reinberg, D. (1992). Factors involved in specific transcription by mammalian RNA polymerase II. Identification and characterization of factor IIH. J Biol Chem 267, 2786-2793.


Flores, O., Maldonado, E., and Reinberg, D. (1989). Factors involved in specific transcription by mammalian RNA polymerase II. Factors IIE and IIF independently interact with RNA polymerase II. J Biol Chem 264, 8913-8921.

Fondell, J.D., Ge, H., and Roeder, R.G. (1996). Ligand induction of a transcriptionally active thyroid hormone receptor coactivator complex. Proc Natl Acad Sci U S A 93, 8329-8333.

Gangloff, Y.G., Pointud, J.C., Thuault, S., Carre, L., Romier, C., Muratoglu, S., Brand, M., Tora, L., Couderc, J.L., and Davidson, I. (2001a). The TFIID components human TAF(II)140 and Drosophila BIP2 (TAF(II)155) are novel metazoan homologues of yeast TAF(II)47 containing a histone fold and a PHD finger. Mol Cell Biol 21, 5109-5121.

Gangloff, Y.G., Sanders, S.L., Romier, C., Kirschner, D., Weil, P.A., Tora, L., and Davidson, I. (2001b). Histone folds mediate selective heterodimerization of yeast TAF(II)25 with TFIID components yTAF(II)47 and yTAF(II)65 and with SAGA component ySPT7. Mol Cell Biol 21, 1841-1853.

Gangloff, Y.G., Werten, S., Romier, C., Carre, L., Poch, O., Moras, D., and Davidson, I. (2000). The human TFIID components TAF(II)135 and TAF(II)20 and the yeast SAGA components ADA1 and TAF(II)68 heterodimerize to form histone-like pairs. Mol Cell Biol 20, 340-351.

Gasch, A., Hoffmann, A., Horikoshi, M., Roeder, R.G., and Chua, N.H. (1990). Arabidopsis thaliana contains two genes for TFIID. Nature 346, 390-394.

Gegonne, A., Weissman, J.D., and Singer, D.S. (2001). TAFII55 binding to TAFII250 inhibits its acetyltransferase activity. Proc Natl Acad Sci U S A 98, 12432-12437.

Geiger, J.H., Hahn, S., Lee, S., and Sigler, P.B. (1996). Crystal structure of the yeast TFIIA/TBP/DNA complex. Science 272, 830-836.

Gille, C., Goede, A., Schloetelburg, C., Preissner, R., Kloetzel, P.M., Gobel, U.B., and Frommel, C. (2003). A comprehensive view on proteasomal sequences: implications for the evolution of the proteasome. J Mol Biol 326, 1437-1448.


Giot, L., Bader, J.S., Brouwer, C., Chaudhuri, A., Kuang, B., Li, Y., Hao, Y.L., Ooi, C.E., Godwin, B., Vitols, E., Vijayadamodar, G., Pochart, P., Machineni, H., Welsh, M., Kong, Y., Zerhusen, B., Malcolm, R., Varrone, Z., Collis, A., Minto, M., Burgess, S., McDaniel, L., Stimpson, E., Spriggs, F., Williams, J., Neurath, K., Ioime, N., Agee, M., Voss, E., Furtak, K., Renzulli, R., Aanensen, N., Carrolla, S., Bickelhaupt, E., Lazovatsky, Y., DaSilva, A., Zhong, J., Stanyon, C.A., Finley Jr., R.L., White, K.P., Braverman, M., Jarvie, T., Gold, S., Leach, M., Knight, J., Shimkets, R.A., McKenna, M.P., Chant, J., and Rothberg, J.M. (2003). A Protein Interaction Map of Drosophila melanogaster. Science, 1090289.

Gonnet, G.H., Cohen, M.A., and Benner, S.A. (1994). Analysis of amino acid substitution during divergent evolution: the 400 by 400 dipeptide substitution matrix. Biochem Biophys Res Commun 199, 489-496.

Goodrich, J.A., Hoey, T., Thut, C.J., Admon, A., and Tjian, R. (1993). Drosophila TAFII40 interacts with both a VP16 activation domain and the basal transcription factor TFIIB. Cell 75, 519-530.

Grant, P.A., and Berger, S.L. (1999). Histone acetyltransferase complexes. Semin Cell Dev Biol 10, 169-177.

Grant, P.A., Schieltz, D., Pray-Grant, M.G., Steger, D.J., Reese, J.C., Yates, J.R., and Workman, J.L. (1998). A subset of TAF(II)s are integral components of the SAGA complex required for nucleosome acetylation and transcriptional stimulation. Cell 94, 45-53.

Green, M.R. (2000). TBP-associated factors (TAFIIs): multiple, selective transcriptional mediators in common complexes. Trends Biochem Sci 25, 59-63.

Groft, C.M., Uljon, S.N., Wang, R., and Werner, M.H. (1998). Structural homology between the Rap30 DNA-binding domain and linker histone H5: implications for preinitiation complex assembly. Proc Natl Acad Sci U S A 95, 9117-9122.

Ha, I., Roberts, S., Maldonado, E., Sun, X., Kim, L.U., Green, M., and Reinberg, D. (1993). Multiple functional domains of human transcription factor IIB: distinct interactions with two general transcription factors and RNA polymerase II. Genes Dev 7, 1021-1032.

Hansen, S.K., Takada, S., Jacobson, R.H., Lis, J.T., and Tjian, R. (1997). Transcription properties of a cell type-specific TATA-binding protein, TRF. Cell 91, 71-83.

Heard, D.J., Kiss, T., and Filipowicz, W. (1993). Both Arabidopsis TATA binding protein (TBP) isoforms are functionally identical in RNA polymerase II and III transcription in plant cells: evidence for gene-specific changes in DNA binding specificity of TBP. Embo J 12, 3519-3528.

Heim, M.A., Jakoby, M., Werber, M., Martin, C., Weisshaar, B., and Bailey, P.C. (2003). The Basic Helix-Loop-Helix Transcription Factor Family in Plants: A Genome-wide Study of Protein Structure and Functional Diversity. Mol Biol Evol.

Henry, N.L., Campbell, A.M., Feaver, W.J., Poon, D., Weil, P.A., and Kornberg, R.D. (1994). TFIIF-TAF-RNA polymerase II connection. Genes Dev 8, 2868-2878.

Hernandez-Hernandez, A., and Ferrus, A. (2001). Prodos Is a Conserved Transcriptional Regulator That Interacts with dTAF(II)16 in Drosophila melanogaster. Mol Cell Biol 21, 614-623.

Hisatake, K., Ohta, T., Takada, R., Guermah, M., Horikoshi, M., Nakatani, Y., and Roeder, R.G. (1995). Evolutionary conservation of human TATA-binding-polypeptide-associated factors TAFII31 and TAFII80 and interactions of TAFII80 with other TAFs and with general transcription factors. Proc Natl Acad Sci U S A 92, 8195-8199.

Hoey, T., Dynlacht, B.D., Peterson, M.G., Pugh, B.F., and Tjian, R. (1990). Isolation and characterization of the Drosophila gene encoding the TATA box binding protein, TFIID. Cell 61, 1179-1186.

Hoffmann, A., Chiang, C.M., Oelgeschlager, T., Xie, X., Burley, S.K., Nakatani, Y., and Roeder, R.G. (1996). A histone octamer-like structure within TFIID. Nature 380, 356-359.

Holstege, F.C., Jennings, E.G., Wyrick, J.J., Lee, T.I., Hengartner, C.J., Green, M.R., Golub, T.R., Lander, E.S., and Young, R.A. (1998). Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717-728.

Horvath, A., and Riezman, H. (1994). Rapid protein extraction from Saccharomyces cerevisiae. Yeast 10, 1305-1310.

Huang, X., and Madan, A. (1999). CAP3: A DNA Sequence Assembly Program. Genome Res 9, 868-877.

Huisinga, K.L., and Pugh, B.F. (2003). A Genome-wide Housekeeping Role for TFIID and Highly Regulated Stress-related Role for SAGA in Saccharomyces cerevisiae. In 22nd Summer Symposium in Molecular Biology: Chromatin Structure and Function (Pennsylvania State University, University Park, PA), pp. 101.

Imbalzano, A.N., Kwon, H., Green, M.R., and Kingston, R.E. (1994). Facilitated binding of TATA-binding protein to nucleosomal DNA. Nature 370, 481-485.

Imhof, A., Yang, X.J., Ogryzko, V.V., Nakatani, Y., Wolffe, A.P., and Ge, H. (1997). Acetylation of general transcription factors by histone acetyltransferases. Curr Biol 7, 689-692.

The Arabidopsis Genome Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815.

Inostroza, J., Flores, O., and Reinberg, D. (1991). Factors involved in specific transcription by mammalian RNA polymerase II. Purification and functional analysis of general transcription factor IIE. J Biol Chem 266, 9304-9308.

Ito, T., Chiba, T., Ozawa, R., Yoshida, M., Hattori, M., and Sakaki, Y. (2001). A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A 98, 4569-4574.

Ito, T., Tashiro, K., Muta, S., Ozawa, R., Chiba, T., Nishizawa, M., Yamamoto, K., Kuhara, S., and Sakaki, Y. (2000). Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad Sci U S A 97, 1143-1147.

Jacobson, R.H., Ladurner, A.G., King, D.S., and Tjian, R. (2000). Structure and function of a human TAFII250 double bromodomain module. Science 288, 1422-1425.

Jacq, X., Brou, C., Lutz, Y., Davidson, I., Chambon, P., and Tora, L. (1994). Human TAFII30 is present in a distinct TFIID complex and is required for transcriptional activation by the estrogen receptor. Cell 79, 107-117.

Jakoby, M., Weisshaar, B., Droge-Laser, W., Vicente-Carbajosa, J., Tiedemann, J., Kroj, T., and Parcy, F. (2002). bZIP transcription factors in Arabidopsis. Trends Plant Sci 7, 106-111.

Joazeiro, C.A., Kassavetis, G.A., and Geiduschek, E.P. (1994). Identical components of yeast transcription factor IIIB are required and sufficient for transcription of TATA box-containing and TATA-less genes. Mol Cell Biol 14, 2798-2808.

Kamada, K., De Angelis, J., Roeder, R.G., and Burley, S.K. (2001). Crystal structure of the C-terminal domain of the RAP74 subunit of human transcription factor IIF. Proc Natl Acad Sci U S A 98, 3115-3120.

Kambadur, R., Culotta, V., and Hamer, D. (1990). Cloned yeast and mammalian transcription factor TFIID gene products support basal but not activated metallothionein gene transcription. Proc Natl Acad Sci U S A 87, 9168-9172.

Killeen, M.T., and Greenblatt, J.F. (1992). The general transcription factor RAP30 binds to RNA polymerase II and prevents it from binding nonspecifically to DNA. Mol Cell Biol 12, 30-37.

Kim, J.L., and Burley, S.K. (1994). 1.9 Å resolution refined structure of TBP recognizing the minor groove of TATAAAAG. Nat Struct Biol 1, 638-653.

Kim, J.L., Nikolov, D.B., and Burley, S.K. (1993a). Co-crystal structure of TBP recognizing the minor groove of a TATA element. Nature 365, 520-527.

Kim, Y., Geiger, J.H., Hahn, S., and Sigler, P.B. (1993b). Crystal structure of a yeast TBP/TATA-box complex. Nature 365, 512-520.

Kimura, H., Tao, Y., Roeder, R.G., and Cook, P.R. (1999). Quantitation of RNA polymerase II and its transcription factors in an HeLa cell: little soluble holoenzyme but significant amounts of polymerases attached to the nuclear substructure. Mol Cell Biol 19, 5383-5392.

Kitajima, S., Chibazakura, T., Yonaha, M., and Yasukochi, Y. (1994). Regulation of the human general transcription initiation factor TFIIF by phosphorylation. J Biol Chem 269, 29970-29977.

Klebanow, E.R., Poon, D., Zhou, S., and Weil, P.A. (1996). Isolation and characterization of TAF25, an essential yeast gene that encodes an RNA polymerase II-specific TATA-binding protein-associated factor. J Biol Chem 271, 13706-13715.

Klein, C., and Struhl, K. (1994). Increased recruitment of TATA-binding protein to the promoter by transcriptional activation domains in vivo. Science 266, 280-282.

Klemm, R.D., Goodrich, J.A., Zhou, S., and Tjian, R. (1995). Molecular cloning and expression of the 32-kDa subunit of human TFIID reveals interactions with VP16 and TFIIB that mediate transcriptional activation. Proc Natl Acad Sci U S A 92, 5788-5792.

Kokubo, T., Gong, D.W., Roeder, R.G., Horikoshi, M., and Nakatani, Y. (1993a). The Drosophila 110-kDa transcription factor TFIID subunit directly interacts with the N-terminal region of the 230-kDa subunit. Proc Natl Acad Sci U S A 90, 5896-5900.

Kokubo, T., Gong, D.W., Wootton, J.C., Horikoshi, M., Roeder, R.G., and Nakatani, Y. (1994). Molecular cloning of Drosophila TFIID subunits. Nature 367, 484-487.

Kokubo, T., Gong, D.W., Yamashita, S., Horikoshi, M., Roeder, R.G., and Nakatani, Y. (1993b). Drosophila 230-kD TFIID subunit, a functional homolog of the human cell cycle gene product, negatively regulates DNA binding of the TATA box-binding subunit of TFIID. Genes Dev 7, 1033-1046.

Kokubo, T., Gong, D.W., Yamashita, S., Takada, R., Roeder, R.G., Horikoshi, M., and Nakatani, Y. (1993c). Molecular cloning, expression, and characterization of the Drosophila 85-kilodalton TFIID subunit. Mol Cell Biol 13, 7859-7863.

Kokubo, T., Swanson, M.J., Nishikawa, J.I., Hinnebusch, A.G., and Nakatani, Y. (1998). The yeast TAF145 inhibitory domain and TFIIA competitively bind to TATA-binding protein. Mol Cell Biol 18, 1003-1012.

Koleske, A.J., and Young, R.A. (1994). An RNA polymerase II holoenzyme responsive to activators. Nature 368, 466-469.

Koleske, A.J., and Young, R.A. (1995). The RNA polymerase II holoenzyme and its implications for gene regulation. Trends Biochem Sci 20, 113-116.

Komarnitsky, P.B., Michel, B., and Buratowski, S. (1999). TFIID-specific yeast TAF40 is essential for the majority of RNA polymerase II-mediated transcription in vivo. Genes Dev 13, 2484-2489.

Kotani, T., Banno, K., Ikura, M., Hinnebusch, A.G., Nakatani, Y., Kawaichi, M., and Kokubo, T. (2000). A role of transcriptional activators as antirepressors for the autoinhibitory activity of TATA box binding of transcription factor IID. Proc Natl Acad Sci U S A 97, 7178-7183.

Kraemer, S.M., Ranallo, R.T., Ogg, R.C., and Stargell, L.A. (2001). TFIIA interacts with TFIID via association with TATA-binding protein and TAF40. Mol Cell Biol 21, 1737-1746.

Laemmli, U.K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227, 680-685.

Lagrange, T., Hakimi, M.A., Pontier, D., Courtois, F., Alcaraz, J.P., Grunwald, D., Lam, E., and Lerbs-Mache, S. (2003). Transcription factor IIB (TFIIB)-related protein (pBrp), a plant-specific member of the TFIIB-related protein family. Mol Cell Biol 23, 3274-3286.

Lagrange, T., Kapanidis, A.N., Tang, H., Reinberg, D., and Ebright, R.H. (1998). New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor IIB. Genes Dev 12, 34-44.

Langelier, M.F., Forget, D., Rojas, A., Porlier, Y., Burton, Z.F., and Coulombe, B. (2001). Structural and functional interactions of transcription factor (TF) IIA with TFIIE and TFIIF in transcription initiation by RNA polymerase II. J Biol Chem 276, 38652-38657.

Lavigne, A.C., Mengus, G., May, M., Dubrovskaya, V., Tora, L., Chambon, P., and Davidson, I. (1996). Multiple interactions between hTAFII55 and other TFIID subunits. Requirements for the formation of stable ternary complexes between hTAFII55 and the TATA-binding protein. J Biol Chem 271, 19774-19780.

Lawit, S.J., and Czarnecka-Verner, E. (2002). Histone Deacetylase Complexes: Implications for Plants. Biotechnologia 3, 39-52.

Le Gourrierec, J., Li, Y.F., and Zhou, D.X. (1999). Transcriptional activation by Arabidopsis GT-1 may be through interaction with TFIIA-TBP-TATA complex. Plant J 18, 663-668.

Learned, R.M., Cordes, S., and Tjian, R. (1985). Purification and characterization of a transcription factor that confers promoter specificity to human RNA polymerase I. Mol Cell Biol 5, 1358-1369.

Lee, S., and Hahn, S. (1995). Model for binding of transcription factor TFIIB to the TBP-DNA complex. Nature 376, 609-612.

Lee, T.I., and Young, R.A. (1998). Regulation of gene expression by TBP-associated proteins. Genes Dev 12, 1398-1408.

Leurent, C., Sanders, S., Ruhlmann, C., Mallouh, V., Weil, P.A., Kirschner, D.B., Tora, L., and Schultz, P. (2002). Mapping histone fold TAFs within yeast TFIID. Embo J 21, 3424-3433.

Levine, M., and Tjian, R. (2003). Transcription regulation and animal diversity. Nature 424, 147-151.

Li, X.Y., Bhaumik, S.R., and Green, M.R. (2000). Distinct classes of yeast promoters revealed by differential TAF recruitment. Science 288, 1242-1244.

Li, Y.F., Dubois, F., and Zhou, D.X. (2001). Ectopic expression of TATA box-binding protein induces shoot proliferation in Arabidopsis. FEBS Lett 489, 187-191.

Li, Y.F., Le Gourrierec, J., Torki, M., Kim, Y.J., Guerineau, F., and Zhou, D.X. (1999). Characterization and functional analysis of Arabidopsis TFIIA reveal that the evolutionarily unconserved region of the large subunit has a transcription activation domain. Plant Mol Biol 39, 515-525.

Lin, C.W., Moorefield, B., Payne, J., Aprikian, P., Mitomo, K., and Reeder, R.H. (1996). A novel 66-kilodalton protein complexes with Rrn6, Rrn7, and TATA-binding protein to promote polymerase I transcription initiation in Saccharomyces cerevisiae. Mol Cell Biol 16, 6436-6443.

Liu, D., Ishima, R., Tong, K.I., Bagby, S., Kokubo, T., Muhandiram, D.R., Kay, L.E., Nakatani, Y., and Ikura, M. (1998). Solution structure of a TBP-TAF(II)230 complex: protein mimicry of the minor groove surface of the TATA box unwound by TBP. Cell 94, 573-583.

Luger, K., Mader, A.W., Richmond, R.K., Sargent, D.F., and Richmond, T.J. (1997). Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251-260.

Lukashin, A.V., and Borodovsky, M. (1998). GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res 26, 1107-1115.

Majoros, W.H., Pertea, M., Antonescu, C., and Salzberg, S.L. (2003). GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders. Nucleic Acids Res 31, 3601-3604.

Maldonado, E., Ha, I., Cortes, P., Weis, L., and Reinberg, D. (1990). Factors involved in specific transcription by mammalian RNA polymerase II: role of transcription factors IIA, IID, and IIB during formation of a transcription-competent complex. Mol Cell Biol 10, 6335-6347.

Malik, S., Lee, D.K., and Roeder, R.G. (1993). Potential RNA polymerase II-induced interactions of transcription factor TFIIB. Mol Cell Biol 13, 6253-6259.

Margottin, F., Dujardin, G., Gerard, M., Egly, J.M., Huet, J., and Sentenac, A. (1991). Participation of the TATA factor in transcription of the yeast U6 gene by RNA polymerase C. Science 251, 424-426.

Martinez, E., Chiang, C.M., Ge, H., and Roeder, R.G. (1994). TATA-binding protein-associated factor(s) in TFIID function through the initiator to direct basal transcription from a TATA-less class II promoter. Embo J 13, 3115-3126.

Matangkasombut, O., Buratowski, R.M., Swilling, N.W., and Buratowski, S. (2000). Bromodomain factor 1 corresponds to a missing piece of yeast TFIID. Genes Dev 14, 951-962.

Matsui, T., Segall, J., Weil, P.A., and Roeder, R.G. (1980). Multiple factors required for accurate initiation of transcription by purified RNA polymerase II. J Biol Chem 255, 11992-11996.

Maxon, M., and Tjian, R. (1994). Transcriptional Activity of Transcription Factor IIE is Dependent on Zinc Binding. Proc Natl Acad Sci U S A 91, 9529-9533.

Maxon, M.E., Goodrich, J.A., and Tjian, R. (1994). Transcription factor IIE binds preferentially to RNA polymerase IIa and recruits TFIIH: a model for promoter clearance. Genes Dev 8, 515-524.

McCracken, S., and Greenblatt, J. (1991). Related RNA polymerase-binding regions in human RAP30/74 and Escherichia coli sigma 70. Science 253, 900-902.

Mengus, G., May, M., Jacq, X., Staub, A., Tora, L., Chambon, P., and Davidson, I. (1995). Cloning and characterization of hTAFII18, hTAFII20 and hTAFII28: three subunits of the human transcription factor TFIID. Embo J 14, 1520-1531.

Michel, B., Komarnitsky, P., and Buratowski, S. (1998). Histone-like TAFs are essential for transcription in vivo. Mol Cell 2, 663-673.

Miller, J.H. (1972). Experiments in Molecular Genetics. (Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press).

Miller, J.H. (1992). A Laboratory Manual for Escherichia coli and Related Bacteria: A Short Course in Bacterial Genetics. (Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press).

Miller, T., Krogan, N.J., Dover, J., Erdjument-Bromage, H., Tempst, P., Johnston, M., Greenblatt, J.F., and Shilatifard, A. (2001). COMPASS: a complex of proteins associated with a trithorax-related SET domain protein. Proc Natl Acad Sci U S A 98, 12902-12907.

Mitsiou, D.J., and Stunnenberg, H.G. (2000). TAC, a TBP-sans-TAFs complex containing the unprocessed TFIIA alpha/beta precursor and the TFIIA gamma subunit. Mol Cell 6, 527-537.

Mittal, V., and Hernandez, N. (1997). Role for the amino-terminal region of human TBP in U6 snRNA transcription. Science 275, 1136-1140.

Mukumoto, F., Hirose, S., Imaseki, H., and Yamazaki, K. (1993). DNA sequence requirement of a TATA element-binding protein from Arabidopsis for transcription in vitro. Plant Mol Biol 23, 995-1003.

Murphy, S., Yoon, J.B., Gerster, T., and Roeder, R.G. (1992). Oct-1 and Oct-2 potentiate functional interactions of a transcription factor with the proximal sequence element of small nuclear RNA genes. Mol Cell Biol 12, 3247-3261.

Myers, L.C., and Kornberg, R.D. (2000). Mediator of transcriptional regulation. Annu Rev Biochem 69, 729-749.

Naar, A.M., Beaurang, P.A., Zhou, S., Abraham, S., Solomon, W., and Tjian, R. (1999). Composite co-activator ARC mediates chromatin-directed transcriptional activation. Nature 398, 828-832.

Nikolov, D.B., and Burley, S.K. (1994). 2.1 Å resolution refined structure of a TATA box-binding protein (TBP). Nat Struct Biol 1, 621-637.

Nikolov, D.B., Chen, H., Halay, E.D., Usheva, A.A., Hisatake, K., Lee, D.K., Roeder, R.G., and Burley, S.K. (1995). Crystal structure of a TFIIB-TBP-TATA-element ternary complex. Nature 377, 119-128.

Nikolov, D.B., Hu, S.H., Lin, J., Gasch, A., Hoffmann, A., Horikoshi, M., Chua, N.H., Roeder, R.G., and Burley, S.K. (1992). Crystal structure of TFIID TATA-box binding protein. Nature 360, 40-46.

O'Brien, T., and Tjian, R. (1998). Functional analysis of the human TAFII250 N-terminal kinase domain. Mol Cell 1, 905-911.

Oelgeschlager, T., Chiang, C.M., and Roeder, R.G. (1996). Topology and reorganization of a human TFIID-promoter complex. Nature 382, 735-738.

Ogryzko, V.V., Kotani, T., Zhang, X., Schlitz, R.L., Howard, T., Yang, X.J., Howard, B.H., Qin, J., and Nakatani, Y. (1998). Histone-like TAFs within the PCAF histone acetylase complex. Cell 94, 35-44.

Ohkuma, Y., Sumimoto, H., Hoffmann, A., Shimasaki, S., Horikoshi, M., and Roeder, R.G. (1991). Structural motifs and potential sigma homologies in the large subunit of human general transcription factor TFIIE. Nature 354, 398-401.

Ohkuma, Y., Sumimoto, H., Horikoshi, M., and Roeder, R.G. (1990). Factors involved in specific transcription by mammalian RNA polymerase II: purification and characterization of general transcription factor TFIIE. Proc Natl Acad Sci U S A 87, 9163-9167.

Okamoto, T., Yamamoto, S., Watanabe, Y., Ohta, T., Hanaoka, F., Roeder, R.G., and Ohkuma, Y. (1998). Analysis of the role of TFIIE in transcriptional regulation through structure-function studies of the TFIIE beta subunit. J Biol Chem 273, 19866-19876.

Okuda, M., Watanabe, Y., Okamura, H., Hanaoka, F., Ohkuma, Y., and Nishimura, Y. (2000). Structure of the central core domain of TFIIE beta with a novel double-stranded DNA-binding surface. Embo J 19, 1346-1356.

Orphanides, G., Lagrange, T., and Reinberg, D. (1996). The general transcription factors of RNA polymerase II. Genes Dev 10, 2657-2683.

Page, R.D. (1996). TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12, 357-358.

Pan, S., Czarnecka-Verner, E., and Gurley, W.B. (2000). Role of the TATA binding protein-transcription factor IIB interaction in supporting basal and activated transcription in plant cells. Plant Cell 12, 125-136.

Pan, S., Sehnke, P.C., Ferl, R.J., and Gurley, W.B. (1999). Specific interactions with TBP and TFIIB in vitro suggest that 14-3-3 proteins may participate in the regulation of transcription when part of a DNA binding complex. Plant Cell 11, 1591-1602.

Pandey, R., Muller, A., Napoli, C.A., Selinger, D.A., Pikaard, C.S., Richards, E.J., Bender, J., Mount, D.W., and Jorgensen, R.A. (2002). Analysis of histone acetyltransferase and histone deacetylase families of Arabidopsis thaliana suggests functional diversification of chromatin modification among multicellular eukaryotes. Nucleic Acids Res 30, 5036-5055.

Parvin, J.D., and Sharp, P.A. (1993). DNA topology and a minimal set of basal factors for transcription by RNA polymerase II. Cell 73, 533-540.

Peterson, M.G., Inostroza, J., Maxon, M.E., Flores, O., Admon, A., Reinberg, D., and Tjian, R. (1991). Structure and functional properties of human general transcription factor IIE. Nature 354, 369-373.

Peterson, M.G., Tanese, N., Pugh, B.F., and Tjian, R. (1990). Functional domains and upstream activation properties of cloned human TATA binding protein. Science 248, 1625-1630.

Pham, A.D., and Sauer, F. (2000). Ubiquitin-activating/conjugating activity of TAFII250, a mediator of activation of gene expression in Drosophila. Science 289, 2357-2360.

Pointud, J.-C., Mengus, G., Brancorsini, S., Monaco, L., Parvinen, M., Sassone-Corsi, P., and Davidson, I. (2003). The intracellular localisation of TAF7L, a paralogue of transcription factor TFIID subunit TAF7, is developmentally regulated during male germ-cell differentiation. J Cell Sci 116, 1847-1858.

Ptashne, M. (1988). How eukaryotic transcriptional activators work. Nature 335, 683-689.

Pugh, B.F. (2003). Short Talk: Coordination of TFIID and SAGA. In Keystone Symposia: The Enzymology of Chromatin and Transcription (Santa Fe, NM), pp. 16.

Pugh, B.F., and Tjian, R. (1990). Mechanism of transcriptional activation by Sp1: evidence for coactivators. Cell 61, 1187-1197.

Purnell, B.A., Emanuel, P.A., and Gilmour, D.S. (1994). TFIID sequence recognition of the initiator and sequences farther downstream in Drosophila class II genes. Genes Dev 8, 830-842.

Qadri, I., Maguire, H.F., and Siddiqui, A. (1995). Hepatitis B virus transactivator protein X interacts with the TATA-binding protein. Proc Natl Acad Sci U S A 92, 1003-1007.

Qureshi, S.A., Khoo, B., Baumann, P., and Jackson, S.P. (1995). Molecular cloning of the transcription factor TFIIB homolog from Sulfolobus shibatae. Proc Natl Acad Sci U S A 92, 6077-6081.

Rachez, C., and Freedman, L.P. (2001). Mediator complexes and transcription. Curr Opin Cell Biol 13, 274-280.

Rachez, C., Lemon, B.D., Suldan, Z., Bromleigh, V., Gamble, M., Naar, A.M., Erdjument-Bromage, H., Tempst, P., and Freedman, L.P. (1999). Ligand-dependent transcription activation by nuclear receptors requires the DRIP complex. Nature 398, 824-828.

Ranish, J.A., and Hahn, S. (1991). The yeast general transcription factor TFIIA is composed of two polypeptide subunits. J Biol Chem 266, 19320-19327.

Reese, J.C., Apone, L., Walker, S.S., Griffin, L.A., and Green, M.R. (1994). Yeast TAFIIs in a multisubunit complex required for activated transcription. Nature 371, 523-527.

Reese, J.C., Zhang, Z., and Kurpad, H. (2000). Identification of a yeast transcription factor IID subunit, TSG2/TAF48. J Biol Chem 275, 17391-17398.

Reindl, A., and Schoffl, F. (1998). Interaction between the Arabidopsis thaliana heat shock transcription factor HSF1 and the TATA binding protein TBP. FEBS Lett 436, 318-322.

Remboutsika, E., Jacq, X., and Tora, L. (2001). Chromatin is permissive to TBP-mediated transcription initiation. J Biol Chem Accepted Manuscript, 22.

Riechmann, J.L., Heard, J., Martin, G., Reuber, L., Jiang, C., Keddie, J., Adam, L., Pineda, O., Ratcliffe, O.J., Samaha, R.R., Creelman, R., Pilgrim, M., Broun, P., Zhang, J.Z., Ghandehari, D., Sherman, B.K., and Yu, G. (2000). Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290, 2105-2110.

Riechmann, J.L., and Ratcliffe, O.J. (2000). A genomic perspective on plant transcription factors. Curr Opin Plant Biol 3, 423-434.

Robert, F., Forget, D., Li, J., Greenblatt, J., and Coulombe, B. (1996). Localization of subunits of transcription factors IIE and IIF immediately upstream of the transcriptional initiation site of the adenovirus major late promoter. J Biol Chem 271, 8517-8520.

Roberts, S.M., and Winston, F. (1996). SPT20/ADA5 encodes a novel protein functionally related to the TATA-binding protein and important for transcription in Saccharomyces cerevisiae. Mol Cell Biol 16, 3206-3213.

Rossignol, M., Keriel, A., Staub, A., and Egly, J.M. (1999). Kinase activity and phosphorylation of the largest subunit of TFIIF transcription factor. J Biol Chem 274, 22387-22392.

Rowlands, T., Baumann, P., and Jackson, S.P. (1994). The TATA-binding protein: a general transcription factor in eukaryotes and archaebacteria. Science 264, 1326-1329.

Ruppert, S., and Tjian, R. (1995). Human TAFII250 interacts with RAP74: implications for RNA polymerase II initiation. Genes Dev 9, 2747-2755.

Ruppert, S., Wang, E.H., and Tjian, R. (1993). Cloning and expression of human TAFII250: a TBP-associated factor implicated in cell-cycle regulation. Nature 362, 175-179.

Ryu, S., Zhou, S., Ladurner, A.G., and Tjian, R. (1999). The transcriptional cofactor complex CRSP is required for activity of the enhancer-binding protein Sp1. Nature 397, 446-450.

Sadowski, C.L., Henry, R.W., Lobo, S.M., and Hernandez, N. (1993). Targeting TBP to a non-TATA box cis-regulatory element: a TBP-containing complex activates transcription from snRNA promoters through the PSE. Genes Dev 7, 1535-1548.

Saleh, A., Lang, V., Cook, R., and Brandl, C.J. (1997). Identification of native complexes containing the yeast coactivator/repressor proteins NGG1/ADA3 and ADA2. J Biol Chem 272, 5571-5578.

Samuels, M., Fire, A., and Sharp, P.A. (1982). Separation and characterization of factors mediating accurate transcription by RNA polymerase II. J Biol Chem 257, 14419-14427.

Sanders, S.L., Klebanow, E.R., and Weil, P.A. (1999). TAF25p, a non-histone-like subunit of TFIID and SAGA complexes, is essential for total mRNA gene transcription in vivo. J Biol Chem 274, 18847-18850.

Sanders, S.L., and Weil, P.A. (2000). Identification of two novel TAF subunits of the yeast Saccharomyces cerevisiae TFIID complex. J Biol Chem 275, 13895-13900.

Schlueter, S.D., Dong, Q., and Brendel, V. (2003). GeneSeqer@PlantGDB: Gene structure prediction in plant genomes. Nucleic Acids Res 31, 3597-3600.

Selleck, W., Howley, R., Fang, Q., Podolny, V., Fried, M.G., Buratowski, S., and Tan, S. (2001). A histone fold TAF octamer within the yeast TFIID transcriptional coactivator. Nat Struct Biol 8, 695-700.

Smale, S.T., and Kadonaga, J.T. (2003). The RNA polymerase II core promoter. Annu Rev Biochem 72, 449-479.

Solow, S., Salunek, M., Ryan, R., and Lieberman, P.M. (2001). TAF(II)250 phosphorylates human transcription factor IIA on serine residues important for TBP binding and transcription activity. J Biol Chem 276, 15886-15892.

Steffan, J.S., Keys, D.A., Dodd, J.A., and Nomura, M. (1996). The role of TBP in rDNA transcription by RNA polymerase I in Saccharomyces cerevisiae: TBP is required for upstream activation factor-dependent recruitment of core factor. Genes Dev 10, 2551-2563.

Sterner, D.E., Grant, P.A., Roberts, S.M., Duggan, L.J., Belotserkovskaya, R., Pacella, L.A., Winston, F., Workman, J.L., and Berger, S.L. (1999). Functional organization of the yeast SAGA complex: distinct components involved in structural integrity, nucleosome acetylation, and TATA-binding protein interaction. Mol Cell Biol 19, 86-98.

Stockinger, E.J., Mao, Y., Regier, M.K., Triezenberg, S.J., and Thomashow, M.F. (2001). Transcriptional adaptor and histone acetyltransferase proteins in Arabidopsis and their interactions with CBF1, a transcriptional activator involved in cold-regulated gene expression. Nucleic Acids Res 29, 1524-1533.

Struhl, K., and Moqtaderi, Z. (1998). The TAFs in the HAT. Cell 94, 1-4.

Sumimoto, H., Ohkuma, Y., Sinn, E., Kato, H., Shimasaki, S., Horikoshi, M., and Roeder, R.G. (1991). Conserved sequence motifs in the small subunit of human general transcription factor TFIIE. Nature 354, 401-404.

Sun, X., Ma, D., Sheldon, M., Yeung, K., and Reinberg, D. (1994). Reconstitution of human TFIIA activity from recombinant polypeptides: a role in TFIID-mediated transcription. Genes Dev 8, 2336-2348.

Swofford, D.L. (2003). PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods) (Sunderland, Massachusetts: Sinauer Associates).

Takada, R., Nakatani, Y., Hoffmann, A., Kokubo, T., Hasegawa, S., Roeder, R.G., and Horikoshi, M. (1992). Identification of human TFIID components and direct interaction between a 250-kDa polypeptide and the TATA box-binding protein (TFIID tau). Proc Natl Acad Sci U S A 89, 11809-11813.

Takeda, Y., Hirokawa, H., and Yamazaki, K. (1994). Bending of DNA in solution caused by a protein from Arabidopsis that binds to a TATA element. Biosci Biotechnol Biochem 58, 916-920.

Tamada, Y., Nakamori, K., Matsuda, K., Furumoto, T., and Izui, K. (2003). Characterization of TAF10, a general transcription factor, in plants. In 7th International Congress of Plant Molecular Biology (Barcelona, SP), pp. S05-22.

Tan, S., Conaway, R.C., and Conaway, J.W. (1995). Dissection of transcription factor TFIIF functional domains required for initiation and elongation. Proc Natl Acad Sci U S A 92, 6042-6046.

Tang, H., Sun, X., Reinberg, D., and Ebright, R.H. (1996). Protein-protein interactions in eukaryotic transcription initiation: structure of the preinitiation complex. Proc Natl Acad Sci U S A 93, 1119-1124.

Tao, Y., Guermah, M., Martinez, E., Oelgeschlager, T., Hasegawa, S., Takada, R., Yamamoto, T., Horikoshi, M., and Roeder, R.G. (1997). Specific interactions and potential functions of human TAFII100. J Biol Chem 272, 6714-6721.

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. (1997). The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25, 4876-4882.

Tjian, R., and Maniatis, T. (1994). Transcriptional activation: a complex puzzle with few easy pieces. Cell 77, 5-8.

Toldo, L.I. (1997). JaMBW 1.1: Java-based Molecular Biologists' Workbench. Comput Appl Biosci 13, 475-476.

Toledo-Ortiz, G., Huq, E., and Quail, P.H. (2003). The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell 15, 1749-1770.

Tora, L. (2002). A unified nomenclature for TATA box binding protein (TBP)-associated factors (TAFs) involved in RNA polymerase II transcription. Genes Dev 16, 673-675.

Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S., Knight, J.R., Lockshon, D., Narayan, V., Srinivasan, M., Pochart, P., Qureshi-Emili, A., Li, Y., Godwin, B., Conover, D., Kalbfleisch, T., Vijayadamodar, G., Yang, M., Johnston, M., Fields, S., and Rothberg, J.M. (2000). A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 403, 623-627.

Upadhyaya, A.B., Lee, S.H., and DeJong, J. (1999). Identification of a general transcription factor TFIIA alpha/beta homolog selectively expressed in testis. J Biol Chem 274, 18040-18048.

Verrijzer, C.P., Yokomori, K., Chen, J.L., and Tjian, R. (1994). Drosophila TAFII150: similarity to yeast gene TSM-1 and specific binding to core promoter DNA. Science 264, 933-941.

Vlachonasios, K.E., Thomashow, M.F., and Triezenberg, S.J. (2003). Disruption mutations of ADA2b and GCN5 transcriptional adaptor genes dramatically affect Arabidopsis growth, development, and gene expression. Plant Cell 15, 626-638.

Wade, P.A., and Jaehning, J.A. (1996). Transcriptional corepression in vitro: a Mot1p-associated form of TATA-binding protein is required for repression by Leu3p. Mol Cell Biol 16, 1641-1648.

Washburn, K.B., Davis, E.A., and Ackerman, S. (1997). Coactivators and TAFs of transcription activation in wheat. Plant Mol Biol 35, 1037-1043.

Wassarman, D.A., and Sauer, F. (2001). TAF(II)250: a transcription toolbox. J Cell Sci 114, 2895-2902.

Weinzierl, R.O., Dynlacht, B.D., and Tjian, R. (1993a). Largest subunit of Drosophila transcription factor IID directs assembly of a complex containing TBP and a coactivator. Nature 362, 511-517.

Weinzierl, R.O., Ruppert, S., Dynlacht, B.D., Tanese, N., and Tjian, R. (1993b). Cloning and expression of Drosophila TAFII60 and human TAFII70 reveal conserved interactions with other subunits of TFIID. Embo J 12, 5303-5309.

Werten, S., Mitschler, A., Romier, C., Gangloff, Y.G., Thuault, S., Davidson, I., and Moras, D. (2002). Crystal structure of a subcomplex of human transcription factor TFIID formed by TATA binding protein-associated factors hTAF4 (hTAF(II)135) and hTAF12 (hTAF(II)20). J Biol Chem 277, 45502-45509.

Wieczorek, E., Brand, M., Jacq, X., and Tora, L. (1998). Function of TAF(II)-containing complex without TBP in transcription by RNA polymerase II. Nature 393, 187-191.

Wolffe, A.P., and Guschin, D. (2000). Chromatin structural features and targets that regulate transcription. J Struct Biol 129, 102-122.

Workman, J.L., and Kingston, R.E. (1998). Alteration of nucleosome structure as a mechanism of transcriptional regulation. Annu Rev Biochem 67, 545-579.

Workman, J.L., Taylor, I.C., and Kingston, R.E. (1991). Activation domains of stably bound GAL4 derivatives alleviate repression of promoters by nucleosomes. Cell 64, 533-544.

Wullschleger, S.D., Jansson, S., and Taylor, G. (2002). Genomics and forest biology: Populus emerges as the perennial favorite. Plant Cell 14, 2651-2655.

Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.M., and Eisenberg, D. (2002). DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 30, 303-305.

Xiao, H., Tao, Y., and Roeder, R.G. (1999). The human homologue of Drosophila TRF-proximal protein is associated with an RNA polymerase II-SRB complex. J Biol Chem 274, 3937-3940.

Xie, X., Kokubo, T., Cohen, S.L., Mirza, U.A., Hoffmann, A., Chait, B.T., Roeder, R.G., Nakatani, Y., and Burley, S.K. (1996). Structural similarity between TAFs and the heterotetrameric core of the histone octamer. Nature 380, 316-322.

Yamashita, S., Hisatake, K., Kokubo, T., Doi, K., Roeder, R.G., Horikoshi, M., and Nakatani, Y. (1993). Transcription factor TFIIB sites important for interaction with promoter-bound TFIID. Science 261, 463-466.

Yatherajam, G., Zhang, L., Kraemer, S.M., and Stargell, L.A. (2003). Protein-protein interaction map for yeast TFIID. Nucleic Acids Res 31, 1252-1260.

Yokomori, K., Admon, A., Goodrich, J.A., Chen, J.L., and Tjian, R. (1993a). Drosophila TFIIA-L is processed into two subunits that are associated with the TBP/TAF complex. Genes Dev 7, 2235-2245.

Yokomori, K., Chen, J.L., Admon, A., Zhou, S., and Tjian, R. (1993b). Molecular cloning and characterization of dTAFII30 alpha and dTAFII30 beta: two small subunits of Drosophila TFIID. Genes Dev 7, 2587-2597.

Yokomori, K., Verrijzer, C.P., and Tjian, R. (1998). An interplay between TATA box-binding protein and transcription factors IIE and IIA modulates DNA binding and transcription. Proc Natl Acad Sci U S A 95, 6722-6727.

Yokomori, K., Zeidler, M.P., Chen, J.L., Verrijzer, C.P., Mlodzik, M., and Tjian, R. (1994). Drosophila TFIIA directs cooperative DNA binding with TBP and mediates transcriptional activation. Genes Dev 8, 2313-2323.

Yoon, J.B., and Roeder, R.G. (1996). Cloning of two proximal sequence element-binding transcription factor subunits (gamma and delta) that are required for transcription of small nuclear RNA genes by RNA polymerases II and III and interact with the TATA-binding protein. Mol Cell Biol 16, 1-9.

Zhou, Q., Lieberman, P.M., Boyer, T.G., and Berk, A.J. (1992). Holo-TFIID supports transcriptional stimulation by diverse activators and from a TATA-less promoter. Genes Dev 6, 1964-1974.

Zomerdijk, J.C., Beckmann, H., Comai, L., and Tjian, R. (1994). Assembly of transcriptionally active RNA polymerase I initiation factor SL1 from recombinant subunits. Science 266, 2015-2018.

BIOGRAPHICAL SKETCH

Shai Joshua Lawit was born October 5, 1976 in Daytona Beach, FL to Steven A. Lawit and Donna B. Lawit. He graduated magna cum laude from Spruce Creek High School, Port Orange, FL with an International Baccalaureate Diploma in 1995. He enrolled at the University of Florida in August 1995 with his new bride Kristel. In August 1998, he graduated first in his College of Agriculture class with Highest Honors. He received a Bachelor of Science degree majoring in microbiology and cell science, with a minor in Chemistry. Thereafter, he immediately enrolled in the Plant Molecular and Cellular Biology Doctor of Philosophy program.
