
ACTIVATORS AND REPRESSORS OF TRANSCRIPTION: USING BIOINFORMATICS APPROACHES TO ANALYZE AND GROUP HUMAN TRANSCRIPTION FACTORS

by

Ala Savitskaya

A Thesis Submitted to the Faculty of

The Charles E. Schmidt College of Science

in Partial Fulfillment of the Requirements for the Degree of

Master of Science

Florida Atlantic University

Boca Raton, Florida

May 2010

ACKNOWLEDGEMENTS

The author wishes to express her thanks to her advisor, Dr. Kasirajan Ayyanathan, for his continued support and the time and effort put into this project, and to her committee members, Dr. Zhongwei Li and Dr. Xing-Hai Zhang, for their time, knowledge, insight, and ideas on how to further improve her research. The author would also like to thank all members of Dr. Ayyanathan's laboratory as well as the technicians at Florida Atlantic University for their assistance and support. Special thanks go to the former undergraduate student of Dr. Ayyanathan's laboratory, Leo Wolfe, for sharing his Information Technology knowledge, and to the author's family for their support and encouragement.


ABSTRACT

Author: Ala Savitskaya

Title: Activators and Repressors of Transcription: Using Bioinformatics Approaches to Analyze and Group Human Transcription Factors

Institution: Florida Atlantic University

Thesis Advisor: Dr. Kasirajan Ayyanathan

Degree: Master of Science

Year: 2010

Transcription factors are macromolecules that are involved in transcriptional regulation by interacting with specific DNA regions, and they can cause activation or silencing of their target genes. Gene regulation by transcriptional control explains different biological processes such as development, function, and disease. Even though transcriptional control has been of great interest for molecular biology, much still remains unknown. This study was designed to generate the most current list of human transcription factor genes. Unique entries of transcription factor genes were collected and entered into a Microsoft Office Access 2007 database along with information about each gene. Microsoft Office Access 2007 tools were used to analyze and group the collected entries according to different properties such as activator or repressor record, or presence of certain domains. Furthermore, protein sequence alignments of members of different groups were performed, and phylogenetic trees were used to analyze the relationships between different members of each group. This work contributes to the existing knowledge of transcriptional regulation in humans.

ACTIVATORS AND REPRESSORS OF TRANSCRIPTION: USING BIOINFORMATICS APPROACHES TO ANALYZE AND GROUP HUMAN TRANSCRIPTION FACTORS

List of Tables

List of Figures

Introduction

Study Design and Data Analysis

Study Design

A Database Compilation of All Human Transcription Factors

Major DNA-binding Domains

Other DNA-binding Domains

DNA-binding Domains that have not Previously been Discussed in Human TF Studies

Discussion

Summary

Limitations

Future Studies

Tables and Figures

Works Cited

LIST OF TABLES

Table 1: ZF-C2H2 KRAB Transcription Factors

Table 2: ZF-C2H2 SCAN Transcription Factors

Table 3: ZF-C2H2 BTB Transcription Factors

Table 4: ZF-C2H2 without KRAB, SCAN, BTB Domains

Table 5: Homeobox Transcription Factors

Table 6: HLH Transcription Factors

Table 7: bZip Transcription Factors

Table 8: ZF-C4 Transcription Factors

Table 9: Forkhead Transcription Factors

Table 10: p53-like Transcription Factors

Table 11: HMG Transcription Factors

Table 12: ETS(a) and TIG(b) Transcription Factors

Table 13: POU(a), SAND(b), and IRF(c) Transcription Factors

Table 14: GATA(a), DM(b), HSF(c), and CP2(d) Transcription Factors

Table 15: RFX(a) and AP2(b) Transcription Factors

Table 16: MYB DNA-Binding Transcription Factors

Table 17: MH1(a), PAX(b), and ARID(c) Transcription Factors

Table 18: CUT Transcription Factors

LIST OF FIGURES

Figure 1: Transcription Factor Genes (All)

Figure 2: Transcription Factor Genes (Activators)

Figure 3: Transcription Factor Genes (Repressors)

Figure 4: Transcription Factor Genes (Activators and Repressors)

Figure 5: Phylogenetic Relationship Between Genes with KRAB Domain

Figure 6: Phylogenetic Relationship Between Genes with SCAN Domain

Figure 7: Phylogenetic Relationship Between Genes with BTB Domain

Figure 8: Phylogenetic Relationship Between Genes with C2H2 Domain but without KRAB, SCAN, or BTB Domains

Figure 9: Phylogenetic Relationship Between Genes with Homeobox Domain

Figure 10: Phylogenetic Relationship Between Genes with HLH Domain

Figure 11: Phylogenetic Relationship Between Genes with bZip Domain

Figure 12: Phylogenetic Relationship Between Genes with ZF C4 Domain

Figure 13: Phylogenetic Relationship Between Genes with Forkhead Domain

Figure 14: Phylogenetic Relationship Between p53-like Genes

Figure 15: Phylogenetic Relationship Between Genes with HMG Domain

Figure 16: Phylogenetic Relationship Between Genes with ETS(a) and TIG(b) Domains

Figure 17: Phylogenetic Relationship Between Genes with POU(a), SAND(b), and IRF(c) Domains

Figure 18: Phylogenetic Relationship Between Genes with GATA(a), DM(b), and HSF(c) Domains

Figure 19: Phylogenetic Relationship Between Genes with CP2(a), RFX(b), and AP2(c) Domains

Figure 20: Phylogenetic Relationship Between Genes with MYB(a) and MH1(b) Domains

Figure 21: Phylogenetic Relationship Between Genes with PAX(a), ARID(b), and CUT(c) Domains

Figure 22: Phylogenetic Relationship Among Genes with Documented Transcriptional Activation Function

Figure 23: Phylogenetic Relationship Among Genes with Documented Transcriptional Repression Function

INTRODUCTION

Transcription, the synthesis of RNA molecules from a DNA template, is the mechanism that leads to functional gene products, and gene regulation by transcriptional control explains different biological processes such as development, function, and disease. Transcription factors are macromolecules that are involved in transcriptional regulation by interacting with specific DNA regions, and they can cause activation or silencing of their target genes. Transcription factors play a key role in the mechanisms by which specific genes are expressed in a temporal and tissue-specific manner (Costoya, 2007). Multiple transcriptional mechanisms have already been elucidated. Indeed, precise transcriptional control is essential for proper development and homeostasis (Boyadjiev and Jabs, 2000), and many diseases are associated with a breakdown in the transcriptional regulatory system (Darnell, 2002; Jimenez-Sanchez et al., 2001; Lopez-Bigas et al., 2006). Various therapeutics that target transcriptional regulator molecules are already in use today (Bustin and McKay, 1994; Butt and Karathanasis, 1995; Ghosh and Papavassiliou, 2005; Gronemeyer et al., 2004; Moellering et al., 2009; Overington et al., 2006; Papavassiliou, 1998).

Even though transcriptional control has been of great interest for molecular biology, much still remains unknown: not only is information missing about the mode of action of some known transcription factors, but there is also still no common list of all existing transcription factors or a full repertoire of transcriptional regulators for certain organisms. Since transcription factors act by turning genes on where their expression is needed and, equally important, by turning genes off or keeping them silent where it is not needed, it would be important to completely understand the system of transcriptional regulation in specific tissues and in a whole organism. It would also be beneficial to know the place of each player in transcriptional control in order to better understand the course of embryonic development as well as the causes of deviations from homeostasis. Since much of the information about the process of transcriptional control in humans is still missing, the purpose of this study is to generate the most current list of human transcription factor genes and to group those members into activators or repressors of transcription according to the available information about each member. Once a full list of transcription factors is available, the members can be grouped into distinct categories, their family and evolutionary relationships can be analyzed, and the mode of action of the uncharacterized or poorly characterized members can be predicted from their degree of similarity to a well-described member.

Several studies in the past attempted to create a list of human transcription factors (Ashburner et al., 2000; Messina et al., 2004; Vaquerizas et al., 2009; Wilson et al., 2008). Those studies searched for sequences coding for protein domains that could be typical of transcription factors. Each of those studies reported a number of predicted transcription factors along with the confirmed transcription factors. The results of each study differed from the others: Messina et al. identified 1962 members (Messina et al., 2004), Wilson et al. predicted 1508 loci in human as transcription factors (Wilson et al., 2008), and Ashburner et al. and Vaquerizas et al. reported 1052 and 1391 transcription factor genes, respectively (Ashburner et al., 2000; Vaquerizas et al., 2009). Those studies, as previously discussed by Vaquerizas et al., may have obtained erroneous results for several reasons: most of them depended on identifying genes that are homologous to previously characterized regulators; there are technical limitations to sequence search methods; and algorithms can sometimes output false positive hits. Moreover, even among true positives, some DNA-binding domains also exist in non-transcription factor proteins, making these domains unreliable markers of sequence-specific DNA-binding functionality (Vaquerizas et al., 2009).

STUDY DESIGN AND DATA ANALYSIS

Study Design

The methods of this study included obtaining several lists of transcription factors from different sources, comparing those lists, deleting all duplicates, and manually analyzing each unique member for transcription factor, transcriptional activator, or transcriptional repressor activity using the National Center for Biotechnology Information (NCBI) database. Every member that had any record of transcription factor, transcriptional activator, or transcriptional repressor activity was entered into a Microsoft Office Access 2007 database. The four lists of transcription factors that were analyzed were as follows:

1) 901 members from GeneCards, using Gene Ontology (GO) ID 0003700 (www.genecards.org)

2) 2886 members from Transcription Factor Prediction Database (transcriptionfactor.org)

3) 2704 members from NCBI using keyword search “Transcription factor Homo sapiens” (www.ncbi.nlm.nih.gov)

4) 2068 members from NCBI using keyword search “Human Transcription Factor” (www.ncbi.nlm.nih.gov)

The entries for transcription factors (2886 total) from the Transcription Factor Prediction Database included isoforms; after the isoforms were deleted, the number decreased to about 1500. The NCBI lists included transcription factors from organisms other than human as well as genes other than transcription factors; therefore, the number of members in those lists also decreased significantly after each member was analyzed.
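The list comparison and duplicate removal described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in this study (which relied on Microsoft Office Access and manual review): the function name `merge_gene_lists` and the short example lists are hypothetical, and the manual check of each unique member against NCBI records is not automated here.

```python
def merge_gene_lists(*gene_lists):
    """Return the unique gene symbols across all lists, in first-seen order."""
    seen = set()
    unique = []
    for gene_list in gene_lists:
        for symbol in gene_list:
            key = symbol.strip().upper()  # normalise case and stray whitespace
            if key and key not in seen:
                seen.add(key)
                unique.append(key)
    return unique


# Hypothetical excerpts standing in for the four source lists (not real data)
genecards = ["TP53", "SOX9", "FOXK1"]
tf_prediction_db = ["SOX9", "ZNF91", "tp53"]
ncbi_search_1 = ["FOXK1", "STAT3"]
ncbi_search_2 = ["STAT3 ", "ZNF91", "ELF1"]

unique_genes = merge_gene_lists(genecards, tf_prediction_db, ncbi_search_1, ncbi_search_2)
print(unique_genes)
```

Normalising case and whitespace before comparison is a design assumption; matching on aliases would require an additional lookup that is not shown.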

A Database Compilation of All Human Transcription Factors

As mentioned above, each member of the four lists was entered into the NCBI “Gene” search, and, using gene ontology records and any available literature references, 1188 unique entries were obtained that had any record of transcription factor, transcriptional activator, or transcriptional repressor activity. Those members were entered into a table of a Microsoft Office Access 2007 database. Additionally, 423 members were found that had a record of DNA-binding and transcriptional regulation activity and can therefore be assumed to have transcription factor activity; these 423 members were entered into a separate table. Later, members from both tables (1611 unique entries in total) were treated equally when they were subjected to analysis.

The initial analysis included recording every entry’s aliases, nucleotide sequence, protein sequence, and any significant protein domains found using the Pfam search (pfam.sanger.ac.uk). Additionally, all the literature references for each entry were examined for evidence of transcriptional activator or repressor activity, and if any such record was found, the reference’s Uniform Resource Locator (URL) was entered into the database. As a result, each database entry included the official NCBI gene name, the aliases, nucleotide sequence, protein sequence, significant Pfam domains, and URLs for supporting literature. The entries were marked as repressor, activator, or both if any supporting information was obtained.
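The fields recorded for each database entry can be pictured as a simple record type. The sketch below is an assumption about how such an entry might be modelled in Python; the class name `TFEntry`, the field names, and the idea of keeping separate URL lists for activator and repressor evidence are all hypothetical and do not describe the actual Access table layout. The URLs in the example are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TFEntry:
    """One transcription factor gene with the information recorded for it."""
    gene_name: str                      # official NCBI gene name
    aliases: List[str] = field(default_factory=list)
    nucleotide_sequence: str = ""
    protein_sequence: str = ""
    pfam_domains: List[str] = field(default_factory=list)
    activator_urls: List[str] = field(default_factory=list)  # supporting literature
    repressor_urls: List[str] = field(default_factory=list)  # supporting literature

    def role(self) -> str:
        """Mark the entry as activator, repressor, both, or unmarked."""
        if self.activator_urls and self.repressor_urls:
            return "both"
        if self.activator_urls:
            return "activator"
        if self.repressor_urls:
            return "repressor"
        return "unmarked"


# Placeholder URLs, for illustration only
entry = TFEntry(
    gene_name="FOXK1",
    pfam_domains=["Forkhead"],
    activator_urls=["https://example.org/activator-reference"],
    repressor_urls=["https://example.org/repressor-reference"],
)
print(entry.gene_name, entry.role())
```

Deriving the marking from the presence of supporting URLs keeps the label and its evidence from drifting apart, which is one possible design choice among several.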

After information about each member had been gathered and analyzed, lists of members with different properties could be obtained using Microsoft Office Access tools. Out of 1611 transcription factors (represented as a list in Figure 1), there are 286 members with a record of transcriptional activator and 323 members with a record of transcriptional repressor (represented in Figures 2 and 3). There are 66 genes that have a record of both transcriptional activator and repressor, and they are listed in Figure 4.
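Grouping queries of this kind can be illustrated with an in-memory SQLite table used as a stand-in for the Access database. The table name `tf`, the column names, and the handful of example rows are assumptions made for this sketch; the activator and repressor flags are illustrative and do not reproduce the study's full records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tf (gene TEXT PRIMARY KEY, is_activator INTEGER, is_repressor INTEGER)"
)
# Illustrative rows only: (gene, activator record, repressor record)
rows = [
    ("TP53", 1, 1),
    ("STAT3", 1, 0),
    ("LEF1", 1, 0),
    ("TCF7L2", 0, 1),
    ("HEYL", 0, 0),
]
conn.executemany("INSERT INTO tf VALUES (?, ?, ?)", rows)

# Count entries with each kind of record
activators = conn.execute("SELECT COUNT(*) FROM tf WHERE is_activator = 1").fetchone()[0]
repressors = conn.execute("SELECT COUNT(*) FROM tf WHERE is_repressor = 1").fetchone()[0]

# List entries that carry both records
both_genes = [
    row[0]
    for row in conn.execute(
        "SELECT gene FROM tf WHERE is_activator = 1 AND is_repressor = 1 ORDER BY gene"
    )
]
print(activators, repressors, both_genes)
conn.close()
```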

Major DNA-binding Domains

Cys2His2 (C2H2) Zinc Finger

All 1611 genes were also grouped according to their DNA-binding domains. The largest DNA-binding domain group in humans is the Zinc Finger (ZF) C2H2 type. The C2H2 Kruppel-type zinc finger is a DNA-binding domain, approximately 25–30 amino acids long, characterized by two conserved cysteine and histidine residue pairs that coordinate a single zinc atom, which serves as a crucial determinant of zinc finger conformation (Kelly and Daniel, 2006). ZF C2H2 proteins often have other domains in their sequences, such as SNAG (Snail/GFI), BTB/POZ (Broad Complex, Tramtrack, Bric-a-brac/Pox virus Zinc finger), and KRAB (Kruppel Associated Box) (Ayyanathan et al., 2007; Huntley et al., 2006; Melnick et al., 2000; Stogios et al., 2005; Venter et al., 2001). Therefore, the ZF C2H2 genes were further grouped according to the other domains present in their protein sequences.
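A grouping of this kind can be sketched as follows. The gene names are hypothetical placeholders, the domain labels (for example "zf-C2H2") are assumed to be stored as plain strings, and the function `group_c2h2` is an illustration rather than the Access query used in this study. A gene carrying more than one accessory domain is deliberately placed in every matching group.

```python
ACCESSORY_DOMAINS = ("KRAB", "SCAN", "BTB")


def group_c2h2(domains_by_gene):
    """Group ZF C2H2 genes by the accessory domains found in their sequences."""
    groups = {name: [] for name in ACCESSORY_DOMAINS}
    groups["none"] = []
    for gene, domains in sorted(domains_by_gene.items()):
        if "zf-C2H2" not in domains:
            continue  # not a C2H2 zinc finger gene
        hits = [name for name in ACCESSORY_DOMAINS if name in domains]
        for name in hits:
            groups[name].append(gene)
        if not hits:
            groups["none"].append(gene)
    return groups


# Hypothetical domain annotations (placeholders, not real records)
annotations = {
    "GENE_A": {"zf-C2H2", "KRAB"},
    "GENE_B": {"zf-C2H2", "SCAN", "KRAB"},
    "GENE_C": {"zf-C2H2", "BTB"},
    "GENE_D": {"zf-C2H2"},
    "GENE_E": {"Homeobox"},
}
c2h2_groups = group_c2h2(annotations)
print(c2h2_groups)
```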

Zinc-finger proteins containing the Krüppel-associated box (KRAB-containing proteins) are the largest single family of transcriptional regulators in mammals. KRAB box appears to be specific (Venter et al., 2001). The KRAB domain is normally located near the amino terminus and consists of one or both of the KRAB A box and the KRAB B box. The two boxes of the KRAB domain are always encoded by individual exons separated by introns of variable sizes (Urrutia, 2003). This study collected 274 genes that have both KRAB and ZF C2H2 domains, and they are represented in Table 1. This number is close to the 290 members of KRAB ZF C2H2 recorded by Urrutia (Urrutia, 2003). Huntley et al. recorded 423 KRAB ZF C2H2 genes; however, they explain in their work that 66 of those 423 genes result in effector (KRAB)-only transcripts, and 79 of the 423 genes result in Zinc Finger-only transcripts (Huntley et al., 2006). Previous studies have shown that the KRAB domain functions as a strong transcriptional repressor domain (Friedman et al., 1996; Witzgall et al., 1994). Out of the 274 genes identified by this study, 54 members have a record of transcriptional repressor. KRAB C2H2 zinc finger proteins represent the largest superfamily of transcriptional repressors in the human genome. They interact with KRAB-associated protein-1 (KAP-1), which serves as a corepressor by recruiting histone deacetylase, histone methyltransferase, and heterochromatin proteins, thus creating a macromolecular repression complex at the target promoters (Ayyanathan et al., 2003; Friedman et al., 1996; Ryan et al., 1999; Schultz et al., 2002; Schultz et al., 2001). A phylogenetic tree was constructed using all genes with a KRAB domain identified by this work, and it is represented in Figure 5. It suggests that the most ancient members in this group are genes such as ZNF91 and ZNF383, and the most recently evolved members are PRDM7 and BAZ2A. Some members can be grouped into subfamilies based on their protein sequence homology. For example, one such group includes the following members: ZNF140, ZNF527, ZFP28, ZNF471, ZNF470, ZNF583, ZNF181, and ZNF302 (Huntley et al., 2006).

Another group of ZF C2H2 proteins contains a SCAN domain in their sequence. This study identified 50 SCAN ZF C2H2 genes, and they are represented in Table 2. This number is close to the 58 reported by another study (Sander et al., 2003). Some of those proteins also have a KRAB domain (22 out of the 50 identified members). Unlike KRAB domains, the SCAN domain may not have transcriptional activation or repression capabilities: findings using both mammalian and yeast two-hybrid systems demonstrate that the SCAN domain is an interaction motif (Schumacher et al., 2000; Williams et al., 1999). A phylogenetic tree was constructed using all genes with a SCAN domain identified by this work, and it is represented in Figure 6. The phylogenetic tree suggests that the most ancient member in this group is ZNF323, and the most recently evolved member is ZSCAN18. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: ZSCAN4, ZSCAN5A, ZSCAN5B, ZSCAN5C, PEG3, and ZIM2.

Another major domain present in zinc finger C2H2 transcription factors is the BTB/POZ (Broad Complex, Tramtrack, Bric-a-brac/Pox virus Zinc finger) domain. A variety of functional roles have been identified for this domain, including transcription repression (Ahmad et al., 2003; Melnick et al., 2000). This study identified 45 BTB/POZ Zinc Finger proteins, and they are represented in Table 3. This number is close to 43, the number of human BTB/POZ Zinc Finger proteins recorded by earlier reports (Stogios et al., 2005). A phylogenetic tree was constructed using all genes with a BTB domain identified by this work, and it is represented in Figure 7. The phylogenetic tree suggests that the most ancient member in this group is BCL6B, and the most recently evolved member is LZTR1. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: ZBTB12, ZBTB26, ZBTB6, ZBTB20, and ZBTB45 (Stogios et al., 2005).

Apart from the ZF C2H2 transcription factors collected by this study that have KRAB, SCAN, or BTB domains, there are 292 genes that do not contain those domains but do have the ZF C2H2 DNA-binding domain (Table 4). A phylogenetic tree was also constructed using all genes with a ZF C2H2 domain but without KRAB, SCAN, and BTB domains, and it is represented in Figure 8. Apart from the BTB/POZ and KRAB repressor domains, there is another domain with repressor function, SNAG, that has been found in ZF proteins (Ayyanathan et al., 2007). Even though a number of ZF C2H2 proteins have been reported to have this domain, it has not been characterized by Pfam (pfam.sanger.ac.uk) or NCBI (www.ncbi.nlm.nih.gov). The database created by this study was searched for the first 5 amino acids of the SNAG domain consensus sequence (MPRSF LVKKH FNASK KPNYS ELDTH TVIIS), and the following members were found: GFI, GFI1B, GSX1, GSX2, SCRT1, SCRT2, SNAI1, SNAI2, and SNAI3 (Ayyanathan et al., 2007; Nieto, 2002).
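A search for the first five residues of the SNAG consensus can be sketched as a plain substring test over stored protein sequences. The function name `find_snag_candidates` and the two example sequences are hypothetical; the sketch also assumes that a match anywhere in the sequence counts, whereas the original database search may have been restricted in ways not described here.

```python
SNAG_CONSENSUS = "MPRSFLVKKHFNASKKPNYSELDTHTVIIS"  # consensus from the text, spaces removed
SNAG_PREFIX = SNAG_CONSENSUS[:5]                  # first 5 amino acids: "MPRSF"


def find_snag_candidates(protein_sequences, prefix=SNAG_PREFIX):
    """Return the genes whose protein sequence contains the SNAG prefix."""
    return sorted(
        gene
        for gene, sequence in protein_sequences.items()
        if prefix in sequence.upper()
    )


# Hypothetical sequences for illustration (not real protein records)
example_sequences = {
    "PROTEIN_A": "MPRSFLVKKPSDPRRKPNYSEL",
    "PROTEIN_B": "MSEQGDLNQAIAEEGGTEQETA",
}
snag_hits = find_snag_candidates(example_sequences)
print(snag_hits)
```

A five-residue query is short, so hits found this way would still need to be checked against the literature, as was done for the members listed above.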

Homeodomain

Another major DNA-binding domain is the Homeodomain (also known as Homeobox). The Homeobox domain was first identified in a number of homeotic and segmentation proteins, but is now known to be well-conserved in many other animals, including vertebrates (Gehring, 1992). The Homeobox domain binds DNA through a helix-turn-helix (HTH) structure. That structure is characterized by two alpha helices, joined by a short turn, which make contacts with the DNA. The second helix binds to DNA via a number of hydrogen bonds and hydrophobic interactions, which occur between specific side chains and the exposed bases and thymine methyl groups within the major groove of the DNA (Gehring et al., 1990; Otting et al., 1990). There are 225 genes with a Homeodomain that were collected in the database and have a record of transcription factor activity, and they are represented in Table 5. There are 24 genes that have a record of transcriptional activator, and 28 genes that have a record of transcriptional repressor. There are 5 genes (NKX2-5, CDX1, CDX2, CUX1, HMBOX1, and HOXA9) in this group that have a record of both activator and repressor. All activators were analyzed for the presence of additional domains with a known activator function. Some proteins did contain such activator domains: genes CDX1 and CDX2 have a domain named Caudal act, which has been shown to act as an activator (Rings et al., 2001). Genes HOXA9, HOXB9, HOXC9, and HOXD9 have another domain with an activator function, Hox9 act (Prevot et al., 2000). HNF1-, another activator domain, is present in transcription factors HNF1A and HNF1B (Pastore et al., 1991). The other members of the 24 Homeobox domain activators of transcription do not have any significant domains with a known activator function; therefore, it remains unknown what causes their activator ability. It could be due to other domains in their sequences that have not yet been characterized and, as a result, do not appear in the output of the Pfam searches that were used as a method in this study. Also, variation in the Homeodomain itself could contribute to the activator or repressor activity of this domain. No domains with a known repressor function were observed in the collected Homeobox members. A phylogenetic tree was constructed using all genes with a Homeobox domain identified by this work, and it is represented in Figure 9. It suggests that the most ancient member in this group is ZFHX2, and the most recently evolved members are LASS3 and LASS5. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: NKX6-1, NKX6-2, NKX6-3, TLX1, TLX2, and TLX3.

Helix-Loop-Helix

The third largest DNA-binding domain in transcription factors is the HLH (helix-loop-helix) domain. Basic helix-loop-helix (bHLH) proteins are a group of factors that exert a determinative influence in a variety of developmental pathways. These transcription factors are characterized by a highly evolutionarily conserved bHLH domain that mediates specific dimerization (Littlewood and Evan, 1995). This study identified 105 members with an HLH domain, of which 31 and 28 genes have a record of activator and repressor, respectively (Table 6). There are 7 genes with an HLH domain that have a record of both activator and repressor. Except for the Myc N domain in the MYCN gene, no other known activator domains were observed in this group (Grandori and Eisenman, 1997). Out of the 28 genes with a record of repressor, 8 members contain a domain called Hairy Orange, which is known to have a repressor function (Dawson et al., 1995). Two other members, HEYL and HES2, which do not have any record of repressor or activator, also contain this domain; therefore, they can be predicted to function as transcriptional repressors. A phylogenetic tree was constructed using all genes with an HLH domain identified by this work, and it is represented in Figure 10. It suggests that the most ancient member in this group is TAL2, and the most recently evolved member is SOHLH2. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: ARNT2, ARNT, ARNTL, ARNTL2, CLOCK, and NPAS2.
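The prediction made above for HEYL and HES2 follows a simple rule: a gene that carries a domain with a known function but has no functional record of its own is flagged as a candidate for that function. The sketch below illustrates that rule; the function name `predict_by_domain`, the dictionary layout, the domain label "Hairy_orange", and the placeholder gene GENE_X are assumptions made for this example.

```python
def predict_by_domain(entries, domain):
    """Return genes that carry `domain` but have no activator/repressor record."""
    return sorted(
        gene
        for gene, info in entries.items()
        if domain in info["domains"] and not info["roles"]
    )


# Illustrative entries; GENE_X and GENE_Y are hypothetical placeholders
hlh_entries = {
    "HEYL": {"domains": {"HLH", "Hairy_orange"}, "roles": set()},
    "HES2": {"domains": {"HLH", "Hairy_orange"}, "roles": set()},
    "GENE_X": {"domains": {"HLH", "Hairy_orange"}, "roles": {"repressor"}},
    "GENE_Y": {"domains": {"HLH"}, "roles": set()},
}

predicted_repressors = predict_by_domain(hlh_entries, "Hairy_orange")
print(predicted_repressors)
```

Such a prediction is only a hypothesis based on domain content and would need experimental confirmation.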


Other DNA-binding Domains

Basic Zipper Domain (bZIP)

One part of the bZIP domain contains a region that mediates sequence-specific DNA binding properties, and the leucine zipper is required for the dimerization of two DNA binding regions. The DNA binding region comprises a number of basic amino acids such as arginine and lysine (Ellenberger et al., 1994). There are 55 transcription factor genes with a bZIP domain that were collected by this study, of which 23 and 14 genes have activator and repressor function, respectively (Table 7). There are 4 genes (ATF3, CREM, MAFF, and CEBPA) with a bZIP domain that have a record of both activator and repressor. CREM is known to have 21 isoforms, which could explain its diverse abilities in transcriptional regulation. A prominent repressor domain found in this group is BTB (in BACH1 and BACH2), which is mainly found in ZF C2H2 transcription factors. A phylogenetic tree was constructed using all genes with a bZIP domain identified by this work, and it is represented in Figure 11. It suggests that the most ancient member in this group is BATF3, and the most recently evolved member is NFIL3. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: CREB5, ATF6, ATF6B, CREBZF, and DDIT3.

ZF-C4

This domain is the DNA-binding domain of nuclear hormone receptors. It recognizes specific sequences and is connected by a linker region to a C-terminal ligand-binding domain. The DNA-binding domain can elicit either an activating or a repressing effect by binding to specific regions of the DNA known as hormone-response elements (Claessens and Gewirth, 2004; Moehren et al., 2004). There are 46 transcription factor genes with a ZF-C4 domain that were identified by this study, of which 16 and 12 genes have activator and repressor function, respectively (Table 8). There are 4 genes (PPARD, PPARG, ESR1, and NR3C1) with this domain that have a record of both activator and repressor. A phylogenetic tree was constructed using all genes with a ZF-C4 domain identified by this work, and it is represented in Figure 12. It suggests that the most ancient member in this group is NR2F6, and the most recently evolved member is NR0B1. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: NR2C1, NR2C2, NR2E1, and NR2E3.

Forkhead

This domain is also known as the “winged helix.” Although the domain is found in several different transcription factors, a common function is their involvement in early developmental decisions of cell fates during embryogenesis (Hacker et al., 1992). There are 49 transcription factor genes with a Forkhead domain that were identified by this study, of which 14 and 8 genes have activator and repressor function, respectively (Table 9). There is one gene (FOXK1) with a Forkhead domain that has a record of both activator and repressor. A phylogenetic tree was constructed using all genes with a Forkhead domain identified by this work, and it is represented in Figure 13. It suggests that the most ancient member in this group is FOXE3, and the most recently evolved member is FOXR2. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: FOXA1, FOXA2, FOXA3, FOXB1, and FOXB2.

P53-like


This clan contains a variety of DNA-binding domains that contain an immunoglobulin-like fold. It includes the following DNA-binding domains: P53, RHD, T-box, STAT, and Runt (Table 10).

P53: These transcription factors play diverse roles in the regulation of cellular functions.

TP53, TP63, and TP73 are the members of this group, and they have a record of both transcriptional activator and repressor.

Rel Homology Domain (RHD): The RHD is composed of two immunoglobulin-like beta-barrel subdomains that grip the DNA in the major groove. The N-terminal specificity domain resembles the core domain of the p53 transcription factor (Muller et al., 1995). There are 10 transcription factor genes with an RHD domain that were identified by this study, of which 3 and 2 genes have activator and repressor function, respectively.

T-box: Transcription factors of the T-box family are required both for early cell-fate decisions, such as those necessary for formation of the basic vertebrate body plan, and for differentiation and organogenesis (Wilson and Conlon, 2002). There are 17 transcription factor genes with T-box domain that were identified by this study, 2 genes with activator function, and another 2 genes with repressor function.

Signal Transducers and Activators of Transcription (STAT): These transcription factors are specifically activated to regulate gene transcription when cells encounter cytokines and growth factors; hence, they act as signal transducers in the cytoplasm and transcription activators in the nucleus (Kisseleva et al., 2002). There are 7 genes with a STAT domain that were collected by this study, all with a record of transcriptional activator.

A phylogenetic tree was constructed using all genes with p53-like domains identified by this work, and it is represented in Figure 14. It suggests that the most ancient member in this group is TBX10, and the most recently evolved member is STAT3.

High Mobility Group (HMG)

Some HMG proteins have no sequence specificity; they have a high affinity for bent or distorted DNA and appear to play important architectural roles in the assembly of nucleoprotein complexes in a variety of biological processes (Thomas and Travers, 2001). Other HMG proteins bind DNA in a sequence-specific fashion and contain a single HMG box (Xin et al., 2000). There are 34 transcription factor genes with an HMG domain that were identified by this study, of which 9 and 7 genes have activator and repressor function, respectively (Table 11). There are 2 genes (SOX9 and SRY) that have a record of both activator and repressor. Two HMG-containing transcription factors, LEF1 and TCF7L2, have a CTNNB1 binding domain. The LEF1 gene has a record of transcriptional activator, and the CTNNB1 binding domain has been shown to have activator function in other proteins (Roose et al., 1998). TCF7L2, however, has a record of transcriptional repressor. A phylogenetic tree was constructed using all genes with an HMG domain identified by this work, and it is represented in Figure 15. It suggests that the most ancient member in this group is SOX15, and the most recently evolved member is HMGN5. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group can include the following members: SOX1, SOX2, SOX3, SOX14, and SOX21.

Erythroblast Transformation Specific (ETS)

The ETS family of transcription factors is characterized by an evolutionarily conserved DNA-binding domain that regulates expression of a variety of viral and cellular genes by binding to a purine-rich GGAA/T core sequence in cooperation with other transcription factors and co-factors. Most ETS family proteins are nuclear targets for activation of the Ras-MAP kinase signaling pathway, and some of them affect proliferation of cells by regulating the immediate early response genes and other growth-related genes. Some of them also regulate apoptosis-related genes (Oikawa and Yamada, 2003). There are 28 transcription factor genes with an ETS domain that were identified by this study, of which 10 and 9 genes have activator and repressor function, respectively (Table 12a). There is one gene (ELF1) that has a record of both activator and repressor. A phylogenetic tree was constructed using all genes with an ETS domain identified by this work, and it is represented in Figure 16a. It suggests that the most ancient member in this group is FEV, and the most recently evolved member is ELK4. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group can include the following members: SPI1, SPIB, and SPIC.

Transcription factor ImmunoGlobin (TIG)

This domain is another DNA-binding domain found in transcription factors (Vaquerizas et al., 2009). There are 17 transcription factor genes with a TIG domain that were identified by this study, 6 and 2 genes with activator and repressor function respectively (Table 12b). A phylogenetic tree was constructed using all genes with a TIG domain identified by this work, and it is represented in Figure 16b. It suggests that the most ancient member in this group is EBF4, and the most recently evolved member is NFKB1. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: EBF1, EBF3, and EBF4.

POU


The 'POU' (acronym of Pit-1, Oct-1, Unc-86) family of transcription factors shares a common DNA-binding domain of approximately 160 residues. The importance of POU proteins as developmental regulators and tumor-promoting agents is due to linker flexibility, which allows them to adapt to a considerable variety of DNA targets (Alazard et al., 2007). There are 16 transcription factor genes with a POU domain that were identified by this study, 2 genes and 1 gene with a record of activator and repressor respectively (Table 13a). All 16 genes also contain a Homeobox domain. A phylogenetic tree was constructed using all genes with a POU domain identified by this work, and it is represented in Figure 17a. It suggests that the most ancient member in this group is POU3F1, and the most recently evolved member is POU5F2. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: POU4F1, POU4F2, and POU4F3.

SAND

The SAND domain is a conserved found in a number of nuclear proteins.

Those proteins are thought to play important roles in -dependent transcriptional regulation and are linked to many diseases (Bottomley et al., 2001). There are 6 transcription factor genes with SAND domain that were identified by this study, 2 genes with a record of activator (Table 13b). A phylogenetic tree was constructed using all genes with a SAND domain identified by this work, and it is represented in Figure 17b.

It suggests that the most ancient member in this group is SP140, and the most recently evolved member is GMEB1. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: GMEB1, GMEB2, and DEAF1.


IRF

Induction of type I interferon genes is mediated by the binding of interferon regulatory factor 1 (IRF-1) to a region known as the interferon consensus sequence, located upstream of the interferon genes (Weisz et al., 1992). There are 9 transcription factor genes with an IRF domain that were identified by this study, 6 and 2 genes with a record of activator and repressor respectively (Table 13c). There are 2 genes (IRF1 and IRF2) that have a record of both activator and repressor. A phylogenetic tree was constructed using all genes with an IRF domain identified by this work, and it is represented in Figure 17c. It suggests that the most ancient member in this group is IRF5, and the most recently evolved member is IRF2. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: IRF4, IRF8, and IRF9.

GATA

These transcription factors specifically bind the DNA sequence (A/T)GATA(A/G) (Yamamoto et al., 1990). There are 15 transcription factor genes with a GATA domain that were identified by this study, 6 and 7 genes with a record of activator and repressor respectively (Table 14a). There are 3 genes (MTA2, GATA1 and GATA2) that have a record of both activator and repressor. One of those genes contains a BAH domain, which is known to be involved in gene silencing (Callebaut et al., 1999). A phylogenetic tree was constructed using all genes with a GATA domain identified by this work, and it is represented in Figure 18a. It suggests that the most ancient members in this group are GATA1, GATA2, and GATA5, and the most recently evolved members are MTA1 and GATAD1. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: GATA1, GATA2, and GATA3.
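As an illustration of how a degenerate recognition sequence of this kind can be located in a DNA sequence, the following minimal Python sketch writes the (A/T)GATA(A/G) site as a regular expression. It is not part of the methods of this study: the function name `find_gata_sites` and the example sequence are hypothetical, and only the strand supplied is scanned.

```python
import re

# (A/T)GATA(A/G) written as a regular expression. The lookahead lets
# overlapping candidate sites be reported as well.
GATA_SITE = re.compile(r"(?=([AT]GATA[AG]))")


def find_gata_sites(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched hexamer) for each site on the given strand."""
    dna = dna.upper()
    return [(m.start(), m.group(1)) for m in GATA_SITE.finditer(dna)]


if __name__ == "__main__":
    # Hypothetical sequence, not taken from the thesis data.
    example = "ccTGATAAggAGATAGttGATAC"
    for start, site in find_gata_sites(example):
        print(start, site)
```

The final GATAC in the example is not reported because its last base is neither A nor G.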

DM

The DM domain has a pattern of conserved zinc-chelating residues C2H2C4. The DM domain has been found to bind palindromic DNA (Erdman and Burtis, 1993). There are 7 transcription factor genes with a DM domain that were identified by this study, and none of them have a record of activator or repressor. There is, however, another domain, DMA, present in two genes in this group (DMRT3 and DMRTA1) (Table 14b). The function of the DMA domain is unknown. A phylogenetic tree was constructed using all genes with a DM domain identified by this work, and it is represented in Figure 18b. It suggests that the most ancient member in this group is DMRTA2, and the most recently evolved member is DMRT2.

Heat shock factor (HSF)

HSF domain containing proteins have been found to be transcriptional activators of heat shock genes (Clos et al., 1990). There are 8 transcription factor genes with HSF domain that were identified by this study (Table 14c). Only two of them have a record of transcriptional activator. A phylogenetic tree was constructed using all genes with HSF domain identified by this work, and it is represented in Figure 18c. It suggests that the most recently evolved member is HSF4.

CP2

CP2 is another DNA-binding domain. There are 6 transcription factor genes with a CP2 domain that were identified by this study, with no record of activator or repressor activity (Table 14d). A phylogenetic tree was constructed using all genes with a CP2 domain identified by this work, and it is represented in Figure 19a. It suggests that the most ancient member in this group is TFCP2L1.

RFX

Proteins containing this domain bind DNA as a dimer (Reith et al., 1990). There are 7 transcription factor genes with RFX domain that were identified by this study, 2 genes with a record of activator, and 2 genes with a record of repressor (Table 15a). There is one gene (RFX3) that has a record of both activator and repressor. RFX1 transcription activation region was observed in 3 of those genes: RFX1, RFX2, and RFX3. A phylogenetic tree was constructed using all genes with RFX domain identified by this work, and it is represented in Figure 19b. It suggests that the most ancient members in this group are RFX1 and RFX3, and the most recently evolved member is RFX5.

AP2

This domain is another DNA-binding domain found in transcription factors. There are 5 transcription factor genes with AP2 domain that were collected by this study. There is one gene with a record of activator, and one gene (TFAP2B) with a record of both activator and repressor (Table 15b). A phylogenetic tree was constructed using all genes with AP2 domain identified by this work, and it is represented in Figure 19c. It suggests that the most ancient member in this group is TFAP2A, and the most recently evolved member is TFAP2D.

MYB DNA-binding

The retroviral oncogene v-myb and its cellular progenitor c-myb encode nuclear DNA-binding proteins (Biedenkapp et al., 1988). There are 22 transcription factor genes with a MYB DNA-binding domain that were identified by this study, 3 genes with a record of activator, and 2 genes with a record of repressor (Table 16). Seven of the genes with this domain also contain an ELM2 domain (Egl-27 and MTA1 homology 2). The ELM2 domain has been found in proteins involved in transcriptional repression (Ding et al., 2003). A phylogenetic tree was constructed using all genes with a MYB domain identified by this work, and it is represented in Figure 20a. It suggests that the most ancient member in this group is NCOR2, and the most recently evolved member is MYBL1.

MAD homology 1 (MH1)

The MH1 domain forms a compact globular fold that uses a highly conserved 11-residue hairpin to contact DNA in the major groove (Attisano and Lee-Hoeflich, 2001). There are 12 transcription factor genes with an MH1 domain that were identified by this study. There are 6 genes with an MH1 domain that have a record of activator (Table 17a). A phylogenetic tree was constructed using all genes with an MH1 domain identified by this work, and it is represented in Figure 20b. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: NFIA, NFIB, NFIC, and NFIX.

DNA-binding domains that have not previously been discussed in human TF studies

The following DNA-binding domains have not been discussed in the previous studies that attempted to generate a list of human transcription factors.

Transcriptional activators (TEA)

This domain is also known as the ATTS domain. There are 4 transcription factor genes with a TEA domain that were identified by this study (TEAD1, TEAD2, TEAD3, and TEAD4), and there are 2 genes with a record of activator.


Paired Box (PAX)

This domain is a conserved 124-amino-acid N-terminal domain that sometimes precedes a homeobox domain (Xu et al., 1999). There are 9 transcription factor genes with a PAX domain that were identified by this study, one gene with a record of activator, and one gene with a record of repressor (Table 17b). A phylogenetic tree was constructed using all genes with a PAX domain identified by this work, and it is represented in Figure 21a. It suggests that the most ancient member in this group is PAX6, and the most recently evolved member is PAX1. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: PAX2, PAX5, and PAX8.

ARID

The function of this domain is not well characterized; however, proteins containing this domain have been found to be involved in both positive and negative transcriptional regulation, and they are likely to be involved in the modification of chromatin structure (Kortschak et al., 2000). There are 9 transcription factor genes with an ARID domain that were identified by this study, 2 genes with a record of activator, and 6 genes with a record of repressor (Table 17c). A phylogenetic tree was constructed using all genes with an ARID domain identified by this work, and it is represented in Figure 21b. It suggests that the most ancient members in this group are ARID3A and ARID5A, and the most recently evolved members are ARID4A and KDM5B. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: KDM5A, KDM5B, and KDM5C.


Glia cell missing (GCM)

GCM motif-containing proteins have been shown to bind the nonpalindromic DNA octamer 5'-ATGCGGGT-3'. Such proteins have been identified in developmental processes (Schreiber et al., 1998). There are 2 transcription factor genes with this domain (GCM1 and GCM2) that were identified by this study.
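Because the octamer is nonpalindromic, a site can lie on either strand of a double-stranded sequence, so a search should also test the reverse complement. The sketch below illustrates this under stated assumptions: the helper names `reverse_complement` and `find_octamer` and the example sequence are hypothetical, and this is not the procedure used in this study.

```python
GCM_OCTAMER = "ATGCGGGT"  # 5'-ATGCGGGT-3', as quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_octamer(dna: str, motif: str = GCM_OCTAMER) -> list[tuple[int, str]]:
    """Return (0-based start on the supplied strand, strand sign) for each hit."""
    dna = dna.upper()
    hits = []
    # "+" hits match the motif itself; "-" hits match its reverse complement,
    # meaning the site lies on the opposite strand.
    for strand, pattern in (("+", motif.upper()), ("-", reverse_complement(motif))):
        start = dna.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence containing one site on each strand.
    print(find_octamer("GGATGCGGGTCCACCCGCATTT"))
```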

Helicase-SANT–associated (HSA)

This domain has been shown to bind to DNA, but it is poorly characterized (Doerks et al., 2002). There is one transcription factor gene (SMARCA4) with this domain that was identified by this study, and it has a record of transcriptional activator.

SLIDE

SLIDE domain binds DNA target sites in a similar fashion as Homeodomain (Grune et al., 2003). There is one transcription factor gene (SMARCA5) with this domain that was identified by this study, and it has a record of transcriptional activator.

SAP

The SAP (after SAF-A/B, Acinus and PIAS) motif is a putative DNA/RNA binding domain (Aravind and Koonin, 2000). There is one transcription factor gene (PIAS1) with this domain that was identified by this study, and it has a record of both transcriptional activator and repressor.

ZF-MIZ

Miz1 (PIAS2) is one of the proteins with this domain. It binds DNA in a sequence-specific manner, and it can function as a positive-acting transcription factor (Wu et al., 1997). There are 2 transcription factor genes (PIAS1 and PIAS2) with a ZF-MIZ domain that were identified by this study.

Cut


This domain is a DNA-binding motif, which can bind independently or in cooperation with the Homeodomain (Lannoy et al., 1998). There are 7 genes with a CUT domain that were identified by this study, one gene with a record of activator, and 2 genes with a record of repressor. There is one gene (CUX1) that has a record of both activator and repressor. There are 6 genes out of 7 with a CUT domain that also contain a Homeodomain (Table 18). A phylogenetic tree was constructed using all genes with a CUT domain identified by this work, and it is represented in Figure 21c. It suggests that the most ancient member in this group is ONECUT3, and the most recently evolved member is SATB1. Some members can be grouped into subfamilies based on their sequence homology. For example, one such group includes the following members: CUX1, CUX2, SATB1, and SATB2.

In addition to protein alignments according to DNA-binding domains, phylogenetic trees were also constructed for transcription factor encoding genes having a record of activator activity (Figure 22) and genes having a record of repressor activity (Figure 23). These phylograms of activators and repressors suggest that many genes with the same domain fall within the same part of the tree: for example, in the phylogram of activators, Forkhead domain containing genes such as FOXA1, FOXA2, FOXA3, etc., appear in one part of the tree, while STAT domain genes such as STAT1, STAT2, STAT3, etc., appear in a different part of the tree. The phylograms also suggest that some genes with a certain domain evolved together, while other genes with the same domain evolved independently: the phylogram of repressor proteins, for example, suggests that some genes with the same repressor domain (e.g., BTB), such as ZBTB12, ZBTB26, ZBTB6, etc., evolved together. On the other hand, other BTB domain genes, such as ZBTB10 and ZBTB41, appear in different parts of the tree, which suggests that these genes evolved independently from the first group.


DISCUSSION

Summary

This study generated a database of unique entries of genes with a record of transcription factor activity. All transcription factors were grouped according to their DNA-binding domain and according to their activator or repressor record, and the presence of other DNA-binding domains was then analyzed. For each gene, the database contains aliases, nucleotide sequence, protein sequence, and significant protein domains; therefore, in addition to the groupings used in this study, the entries can be grouped according to other properties using Microsoft Office Access tools such as simple search, filtering, and query. For example, all genes can be searched for the presence of a specific protein sequence that has not yet been characterized as a specific domain by major bioinformatics sources. Several domains other than DNA-binding domains were observed with activator or repressor function. Additionally, protein sequence alignments of members of different groups were performed in this work, and the resulting phylogenetic trees can be used to analyze the relationships between members of each group; moreover, functional properties of certain members can be predicted.
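The kind of grouping described above can be sketched outside of Microsoft Office Access as well. The following Python example is only an illustration: the `GeneRecord` layout and the function `summarize_by_domain` are hypothetical and do not reproduce the actual database schema. The five example records use the activator and repressor records reported in the HMG and ETS sections. As an assumption of the sketch, a gene with both records is counted under activator, repressor, and both.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical record layout; the real Access table may differ."""
    name: str
    domains: tuple[str, ...]
    activator: bool
    repressor: bool


# Records as reported in the HMG and ETS sections of this work.
RECORDS = [
    GeneRecord("SOX9", ("HMG",), True, True),
    GeneRecord("SRY", ("HMG",), True, True),
    GeneRecord("LEF1", ("HMG", "CTNNB1 binding"), True, False),
    GeneRecord("TCF7L2", ("HMG", "CTNNB1 binding"), False, True),
    GeneRecord("ELF1", ("ETS",), True, True),
]


def summarize_by_domain(records):
    """Count genes per domain, split by activator/repressor record."""
    summary = defaultdict(
        lambda: {"total": 0, "activator": 0, "repressor": 0, "both": 0}
    )
    for rec in records:
        for domain in rec.domains:
            row = summary[domain]
            row["total"] += 1
            row["activator"] += int(rec.activator)
            row["repressor"] += int(rec.repressor)
            row["both"] += int(rec.activator and rec.repressor)
    return dict(summary)


if __name__ == "__main__":
    for domain, counts in sorted(summarize_by_domain(RECORDS).items()):
        print(domain, counts)
```

A gene with several domains contributes to each of its domain groups, which mirrors how a gene such as LEF1 appears under both the HMG and the CTNNB1 binding domain in this sketch.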

Limitations

Even though a careful analysis of each entry was performed, there are limitations to the results of this study. As in other attempts to generate a list of human transcription factors, some genes are only homologues found in other species, and some of the transcription factors are only predicted. Activator, repressor, and transcription factor activity of some of those genes can only be confirmed by experimental evidence, even though the likelihood of their predicted activity is high. Furthermore, this study was limited to keyword search when reviewing the literature corresponding to each analyzed gene. Therefore, some important information may have been missed.

Future Studies

The database generated by this study can be used for future research on human transcription factors. Every entry can be analyzed once again, and each literature reference can be more thoroughly reviewed to further confirm or refute the activator or repressor activity of each gene. The existing database can be expanded by collecting additional information about each member, such as expression in specific tissues, so that all entries can be grouped as activators and repressors present in specific tissues. The phylogenetic trees can be used to search for poorly characterized transcription factors that are closely related to genes that are currently well characterized, and the function of the poorly characterized genes can be predicted based on that relationship. Using the information in this database, more protein domains with activator and repressor functions can be discovered by analyzing sequence and function similarities of different genes.
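The idea of predicting the function of a poorly characterized gene from its closest well-characterized relative can be sketched as a nearest-neighbor lookup. The example below is a simplification made under several assumptions: it uses percent identity over pre-aligned sequences as a stand-in for distance on a phylogenetic tree, and the names `TF_A` and `TF_B`, their sequences, and their labels are hypothetical toy data rather than entries from the database.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns, skipping columns that are gaps in both."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        matches += int(x == y)
    return 100.0 * matches / compared if compared else 0.0


def predict_from_nearest(query: str, annotated: dict[str, tuple[str, str]]):
    """Return (name, label, identity) of the annotated sequence closest to the query.

    `annotated` maps a gene name to (aligned sequence, functional label).
    """
    name, (seq, label) = max(
        annotated.items(), key=lambda item: percent_identity(query, item[1][0])
    )
    return name, label, percent_identity(query, seq)


if __name__ == "__main__":
    # Hypothetical aligned fragments and labels, for illustration only.
    annotated = {
        "TF_A": ("MKRPLNAFMV", "activator"),
        "TF_B": ("MSK-LQAYLV", "repressor"),
    }
    print(predict_from_nearest("MKRPMNAFMV", annotated))
```

Any prediction obtained this way is only a hypothesis, and it would still need to be confirmed by experimental evidence.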

The list of transcription factor genes generated by this study can be compared to other existing lists of transcription factors to search for members that were not recorded by this work. The database generated by this work can be converted into a website so that it is available to everyone. Furthermore, since transcription factors are continuously being characterized, the database can be periodically updated with new information.
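A comparison between two gene lists of this kind reduces to set operations on gene symbols. The following sketch assumes that both lists use the same symbols; the function name `compare_gene_lists` and the example lists are hypothetical, and aliases would need to be resolved before a real comparison.

```python
def compare_gene_lists(ours, other):
    """Compare two collections of gene symbols, ignoring case and stray whitespace."""
    def normalize(names):
        return {name.strip().upper() for name in names if name.strip()}

    a, b = normalize(ours), normalize(other)
    return {
        "shared": sorted(a & b),
        "only_in_ours": sorted(a - b),
        "only_in_other": sorted(b - a),
    }


if __name__ == "__main__":
    # Hypothetical lists; the gene symbols themselves appear in this work.
    ours = ["PAX2", "PAX5", "PAX8", "GCM1"]
    other = ["pax5", "PAX6", "GCM1 "]
    print(compare_gene_lists(ours, other))
```

Entries under `only_in_other` would be the candidates to review for inclusion in the database.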

TABLES AND FIGURES

Table 1. ZF-C2H2 KRAB Transcription Factors. This table represents the collection of genes with ZF C2H2 and KRAB domains. The table includes such information as protein domains as well as the presence of activator or repressor record. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor CASZ1 KRAB Zf-C2H2 HKR1 KRAB Zf-C2H2 RBAK KRAB Zf-C2H2  ZFP1 KRAB Zf-C2H2 ZFP112 KRAB Zf-C2H2 ZFP14 KRAB Zf-C2H2 ZFP28 KRAB KRAB Zf-C2H2 ZFP30 KRAB Zf-C2H2 ZFP37 KRAB Zf-C2H2 ZFP57 KRAB Zf-C2H2  ZFP82 KRAB Zf-C2H2 ZFP90 KRAB Zf-C2H2 ZIK1 KRAB Zf-C2H2 ZKSCAN1 SCAN KRAB Zf-C2H2 ZKSCAN2 SCAN KRAB Zf-C2H2 ZKSCAN3 SCAN KRAB Zf-C2H2 ZKSCAN4 SCAN KRAB Zf-C2H2 ZKSCAN5 SCAN KRAB Zf-C2H2 ZNF10 KRAB Zf-C2H2  ZNF100 KRAB Zf-C2H2 ZNF101 KRAB Zf-C2H2 ZNF114 KRAB Zf-C2H2 ZNF12 KRAB Zf-C2H2 ZNF132 KRAB Zf-C2H2 ZNF133 KRAB Zf-C2H2  ZNF135 KRAB Zf-C2H2 ZNF136 KRAB Zf-C2H2 ZNF14 KRAB Zf-C2H2  ZNF140 KRAB Zf-C2H2 ZNF141 KRAB Zf-C2H2 ZNF154 KRAB Zf-C2H2 ZNF155 KRAB Zf-C2H2 ZNF157 KRAB Zf-C2H2 ZNF160 KRAB Zf-C2H2 ZNF167 SCAN KRAB Zf-C2H2 ZNF169 KRAB Zf-C2H2 ZNF17 KRAB Zf-C2H2 ZNF175 KRAB Zf-C2H2  ZNF177 KRAB Zf-C2H2 ZNF18 SCAN KRAB Zf-C2H2 ZNF180 KRAB Zf-C2H2 28 Table 1 (continued). ZF-C2H2 KRAB Transcription Factors. 
Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor ZNF181 KRAB Zf-C2H2 ZNF182 KRAB Zf-C2H2 ZNF184 KRAB Zf-C2H2 ZNF189 KRAB Zf-C2H2 ZNF19 KRAB Zf-C2H2 ZNF192 SCAN KRAB Zf-C2H2 ZNF195 KRAB Zf-C2H2 ZNF197 SCAN KRAB  ZNF2 KRAB Zf-C2H2 ZNF20 KRAB Zf-C2H2 ZNF202 SCAN KRAB Zf-C2H2 ZNF205 KRAB Zf-C2H2 ZNF208 KRAB Zf-C2H2 ZNF211 KRAB Zf-C2H2 ZNF212 KRAB Zf-C2H2 ZNF213 SCAN KRAB Zf-C2H2 ZNF214 KRAB Zf-C2H2 ZNF215 SCAN KRAB Zf-C2H2 ZNF222 KRAB Zf-C2H2 ZNF223 KRAB Zf-C2H2 ZNF224 KRAB Zf-C2H2  ZNF225 KRAB Zf-C2H2 ZNF226 KRAB Zf-C2H2 ZNF227 KRAB Zf-C2H2 ZNF229 KRAB Zf-C2H2 ZNF230 KRAB Zf-C2H2 ZNF233 KRAB Zf-C2H2 ZNF234 KRAB Zf-C2H2 ZNF235 KRAB Zf-C2H2 ZNF248 KRAB Zf-C2H2 ZNF250 KRAB Zf-C2H2 ZNF253 KRAB Zf-C2H2  ZNF254 KRAB Zf-C2H2 ZNF256 KRAB Zf-C2H2  ZNF257 KRAB Zf-C2H2 ZNF26 KRAB Zf-C2H2 ZNF263 SCAN KRAB Zf-C2H2  ZNF264 KRAB Zf-C2H2 ZNF267 KRAB Zf-C2H2 ZNF268 KRAB Zf-C2H2  ZNF273 KRAB Zf-C2H2 ZNF274 KRAB SCAN KRAB Zf-C2H2 ZNF275 KRAB Zf-C2H2 ZNF282 KRAB Zf-C2H2  ZNF283 KRAB Zf-C2H2 ZNF284 KRAB Zf-C2H2 ZNF285A KRAB Zf-C2H2 ZNF286 KRAB Zf-C2H2 ZNF287 SCAN KRAB  ZNF3 KRAB Zf-C2H2 ZNF300 KRAB Zf-C2H2 ZNF302 KRAB Zf-C2H2 29 Table 1 (continued). ZF-C2H2 KRAB Transcription Factors. 
Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor ZNF311 KRAB Zf-C2H2 ZNF317 KRAB Zf-C2H2 ZNF320 KRAB Zf-C2H2 ZNF324 KRAB Zf-C2H2 ZNF331 KRAB Zf-C2H2 ZNF333 KRAB KRAB Zf-C2H2 ZNF334 KRAB Zf-C2H2 ZNF337 KRAB Zf-C2H2 ZNF33A KRAB Zf-C2H2 ZNF33B KRAB Zf-C2H2 ZNF34 KRAB Zf-C2H2 ZNF343 KRAB Zf-C2H2 ZNF350 KRAB Zf-C2H2  ZNF354A KRAB Zf-C2H2 ZNF354B KRAB Zf-C2H2 ZNF354C KRAB Zf-C2H2 ZNF37A KRAB Zf-C2H2 ZNF382 KRAB Zf-C2H2  ZNF383 KRAB Zf-C2H2 ZNF394 SCAN KRAB Zf-C2H2 ZNF398 KRAB Zf-C2H2 DUF3669  ZNF404 KRAB Zf-C2H2 ZNF41 KRAB Zf-C2H2 ZNF415 KRAB Zf-C2H2 ZNF417 KRAB Zf-C2H2 ZNF418 KRAB Zf-C2H2 ZNF419 KRAB Zf-C2H2 ZNF420 KRAB Zf-C2H2 ZNF425 KRAB Zf-C2H2 ZNF426 KRAB Zf-C2H2 ZNF429 KRAB Zf-C2H2 ZNF43 KRAB Zf-C2H2 ZNF430 KRAB Zf-C2H2 ZNF431 KRAB Zf-C2H2 ZNF432 KRAB Zf-C2H2 ZNF433 KRAB Zf-C2H2 ZNF434 KRAB Zf-C2H2 ZNF436 KRAB Zf-C2H2 ZNF439 KRAB Zf-C2H2 ZNF44 KRAB Zf-C2H2  ZNF440 KRAB Zf-C2H2 ZNF445 SCAN KRAB Zf-C2H2 ZNF446 SCAN KRAB Zf-C2H2 ZNF45 KRAB Zf-C2H2 ZNF454 KRAB Zf-C2H2 ZNF460 KRAB Zf-C2H2 ZNF461 KRAB Zf-C2H2  ZNF470 KRAB Zf-C2H2 ZNF471 KRAB Zf-C2H2 ZNF473 KRAB Zf-C2H2 ZNF479 KRAB Zf-C2H2 ZNF480 KRAB Zf-C2H2  30 Table 1 (continued). ZF-C2H2 KRAB Transcription Factors. 
Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor ZNF483 SCAN KRAB Zf-C2H2 ZNF485 KRAB Zf-C2H2 ZNF486 KRAB Zf-C2H2 ZNF490 KRAB Zf-C2H2 ZNF492 KRAB Zf-C2H2 ZNF496 SCAN KRAB Zf-C2H2 ZNF498 KRAB Zf-C2H2 ZNF500 SCAN KRAB Zf-C2H2 ZNF510 KRAB Zf-C2H2 ZNF514 KRAB Zf-C2H2 ZNF517 KRAB Zf-C2H2 ZNF519 KRAB Zf-C2H2 ZNF527 KRAB Zf-C2H2 ZNF528 KRAB Zf-C2H2 ZNF540 KRAB Zf-C2H2 ZNF543 KRAB Zf-C2H2 ZNF544 KRAB Zf-C2H2 ZNF546 KRAB Zf-C2H2 ZNF547 KRAB Zf-C2H2 ZNF548 KRAB Zf-C2H2 ZNF549 KRAB Zf-C2H2 ZNF550 KRAB Zf-C2H2 ZNF551 KRAB Zf-C2H2 ZNF552 KRAB Zf-C2H2 ZNF554 KRAB Zf-C2H2 ZNF555 KRAB Zf-C2H2 ZNF556 KRAB Zf-C2H2 ZNF557 KRAB Zf-C2H2 ZNF558 KRAB Zf-C2H2 ZNF559 KRAB Zf-C2H2 ZNF560 KRAB KRAB Zf-C2H2 ZNF561 KRAB Zf-C2H2 ZNF563 KRAB Zf-C2H2 ZNF564 KRAB Zf-C2H2 ZNF565 KRAB Zf-C2H2 ZNF566 KRAB Zf-C2H2 ZNF567 KRAB Zf-C2H2 ZNF568 KRAB Zf-C2H2 ZNF571 KRAB Zf-C2H2 ZNF573 KRAB Zf-C2H2 ZNF577 KRAB Zf-C2H2 ZNF582 KRAB Zf-C2H2 ZNF583 KRAB Zf-C2H2 ZNF587 KRAB Zf-C2H2 ZNF589 KRAB Zf-C2H2 ZNF589 KRAB Zf-C2H2 ZNF595 KRAB Zf-C2H2 ZNF596 KRAB Zf-C2H2 ZNF597 KRAB Zf-C2H2 ZNF605 KRAB Zf-C2H2 ZNF606 KRAB Zf-C2H2  ZNF607 KRAB Zf-C2H2

31 Table 1 (continued). ZF-C2H2 KRAB Transcription Factors. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor ZNF611 KRAB Zf-C2H2 ZNF614 KRAB Zf-C2H2 ZNF615 KRAB Zf-C2H2 ZNF616 KRAB Zf-C2H2 ZNF620 KRAB Zf-C2H2 ZNF621 KRAB Zf-C2H2 ZNF624 KRAB Zf-C2H2 ZNF626 KRAB Zf-C2H2 ZNF627 KRAB Zf-C2H2 ZNF641 KRAB Zf-C2H2 ZNF642 KRAB Zf-C2H2 ZNF649 KRAB Zf-C2H2  ZNF667 KRAB Zf-C2H2 ZNF669 KRAB Zf-C2H2 ZNF670 KRAB Zf-C2H2 ZNF671 KRAB Zf-C2H2 ZNF674 KRAB Zf-C2H2 ZNF675 KRAB Zf-C2H2  ZNF676 KRAB Zf-C2H2 ZNF677 KRAB Zf-C2H2 ZNF678 KRAB Zf-C2H2 ZNF679 KRAB Zf-C2H2 ZNF680 KRAB Zf-C2H2 ZNF682 KRAB Zf-C2H2 ZNF684 KRAB Zf-C2H2 ZNF688 KRAB Zf-C2H2 ZNF689 KRAB Zf-C2H2 ZNF699 KRAB Zf-C2H2 ZNF7 KRAB Zf-C2H2 ZNF700 KRAB Zf-C2H2 ZNF701 KRAB Zf-C2H2 ZNF705A KRAB Zf-C2H2 ZNF705D KRAB Zf-C2H2 ZNF707 KRAB Zf-C2H2 ZNF708 KRAB Zf-C2H2 ZNF709 KRAB Zf-C2H2 ZNF713 KRAB Zf-C2H2 ZNF714 KRAB Zf-C2H2 ZNF718 KRAB Zf-C2H2 ZNF726 KRAB Zf-C2H2 ZNF730 KRAB Zf-C2H2 ZNF737 KRAB Zf-C2H2 ZNF74 KRAB Zf-C2H2 ZNF746 DUF3669 KRAB Zf-C2H2 ZNF75A KRAB Zf-C2H2 ZNF75D SCAN KRAB Zf-C2H2 ZNF761 KRAB Zf-C2H2 ZNF763 KRAB Zf-C2H2 ZNF764 KRAB Zf-C2H2 ZNF765 KRAB Zf-C2H2 ZNF766 KRAB Zf-C2H2 ZNF77 KRAB Zf-C2H2

32 Table 1 (continued). ZF-C2H2 KRAB Transcription Factors. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor ZNF772 KRAB Zf-C2H2 ZNF773 KRAB Zf-C2H2 ZNF776 KRAB Zf-C2H2 ZNF777 KRAB Zf-C2H2 ZNF778 KRAB Zf-C2H2 ZNF785 KRAB Zf-C2H2 KRAB ZNF786 KRAB Zf-C2H2 ZNF789 KRAB Zf-C2H2 ZNF79 KRAB Zf-C2H2 ZNF790 KRAB Zf-C2H2 ZNF791 KRAB Zf-C2H2 ZNF8 KRAB Zf-C2H2  ZNF81 KRAB Zf-C2H2 ZNF813 KRAB Zf-C2H2 ZNF814 KRAB Zf-C2H2 ZNF816A KRAB Zf-C2H2 ZNF823 KRAB Zf-C2H2 ZNF84 KRAB Zf-C2H2 ZNF846 KRAB Zf-C2H2 ZNF85 KRAB Zf-C2H2 ZNF90 KRAB Zf-C2H2 ZNF91 KRAB Zf-C2H2  ZNF92 KRAB Zf-C2H2 ZNF93 KRAB Zf-C2H2 ZNF99 KRAB Zf-C2H2 ZSCAN18 SCAN KRAB Zf-C2H2

33 Table 2. ZF C2H2 SCAN Transcription Factors. This table represents the collection of genes with ZF C2H2, SCAN, and other protein domain information. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor MZF1 SCAN Zf-C2H2  PEG3 SCAN Zf-C2H2 ZIM2 SCAN Zf-C2H2 ZKSCAN1 SCAN KRAB Zf-C2H2 ZKSCAN2 SCAN KRAB Zf-C2H2 ZKSCAN3 SCAN KRAB Zf-C2H2 ZKSCAN4 SCAN KRAB Zf-C2H2 ZKSCAN5 SCAN KRAB Zf-C2H2 ZNF165 SCAN Zf-C2H2 ZNF167 SCAN KRAB Zf-C2H2 ZNF174 SCAN Zf-C2H2  ZNF18 SCAN KRAB Zf-C2H2 ZNF192 SCAN KRAB Zf-C2H2 ZNF193 SCAN Zf-C2H2 ZNF197 SCAN KRAB Zf-C2H2  ZNF202 SCAN KRAB Zf-C2H2 ZNF213 SCAN KRAB Zf-C2H2 ZNF215 SCAN KRAB Zf-C2H2 ZNF232 SCAN Zf-C2H2 ZNF24 SCAN Zf-C2H2  ZNF263 SCAN KRAB Zf-C2H2  ZNF274 KRAB SCAN KRAB Zf-C2H2 ZNF287 SCAN KRAB Zf-C2H2  ZNF323 SCAN Zf-C2H2 ZNF394 SCAN KRAB Zf-C2H2 ZNF396 SCAN Zf-C2H2 ZNF397 SCAN Zf-C2H2   ZNF397OS SCAN Zf-C2H2 ZNF444 SCAN Zf-C2H2 ZNF445 SCAN KRAB Zf-C2H2 ZNF446 SCAN KRAB Zf-C2H2 ZNF449 SCAN Zf-C2H2 ZNF483 SCAN KRAB Zf-C2H2 ZNF496 SCAN KRAB Zf-C2H2 ZNF500 SCAN KRAB Zf-C2H2 ZNF75D SCAN KRAB Zf-C2H2 ZSCAN1 SCAN Zf-C2H2 ZSCAN10 SCAN Zf-C2H2 ZSCAN12 SCAN Zf-C2H2 ZSCAN16 SCAN Zf-C2H2  ZSCAN18 SCAN KRAB Zf-C2H2 ZSCAN20 SCAN Zf-C2H2 ZSCAN21 SCAN Zf-C2H2  ZSCAN22 SCAN Zf-C2H2 ZSCAN23 SCAN Zf-C2H2 ZSCAN29 SCAN Zf-C2H2 ZSCAN4 SCAN Zf-C2H2 ZSCAN5A SCAN Zf-C2H2 ZSCAN5B SCAN Zf-C2H2 ZSCAN5C SCAN Zf-C2H2

34 Table 3. ZF-C2H2 BTB Transcription Factors. This table represents the collection of genes with ZF C2H2 and BTB domains. The table includes such information as protein domains as well as the presence of activator or repressor record. Gene Name Domain 1 Domain 2 Domain 3 Activator Repressor BCL6 BTB Zf-C2H2  BCL6B BTB Zf-C2H2  GZF1 BTB Zf-C2H2  HIC1 BTB Zf-C2H2 HIC2 BTB Zf-C2H2  MYNN BTB Zf-C2H2 PATZ1 BTB AT Zf-C2H2  ZBTB10 BTB Zf-C2H2 ZBTB11 BTB Zf-C2H2 ZBTB12 BTB Zf-C2H2 ZBTB16 BTB Zf-C2H2  ZBTB17 BTB Zf-C2H2 ZBTB2 BTB Zf-C2H2  ZBTB20 BTB Zf-C2H2 ZBTB22 BTB Zf-C2H2 ZBTB24 BTB AT hook Zf-C2H2 ZBTB25 BTB Zf-C2H2 ZBTB26 BTB Zf-C2H2 ZBTB3 BTB Zf-C2H2 ZBTB32 BTB Zf-C2H2 ZBTB33 BTB Zf-C2H2 ZBTB34 BTB Zf-C2H2 ZBTB38 BTB Zf-C2H2  ZBTB39 BTB Zf-C2H2 ZBTB4 BTB Zf-C2H2  ZBTB41 BTB Zf-C2H2 ZBTB43 BTB Zf-C2H2 ZBTB44 BTB Zf-C2H2 ZBTB45 BTB Zf-C2H2 ZBTB47 BTB Zf-C2H2 ZBTB48 BTB Zf-C2H2 ZBTB5 BTB Zf-C2H2  ZBTB6 BTB Zf-C2H2 ZBTB7A BTB Zf-C2H2  ZBTB7B BTB Zf-C2H2 ZBTB7C BTB Zf-C2H2 ZBTB8A BTB Zf-C2H2 ZBTB8B BTB Zf-C2H2 ZBTB9 BTB Zf-C2H2 ZFP161 BTB Zf-C2H2 ZNF131 BTB Zf-C2H2 ZNF238 BTB Zf-C2H2   ZNF295 BTB Zf-C2H2 ZNF509 BTB Zf-C2H2

35 Table 4. ZF-C2H2 without KRAB, SCAN, BTB Domains. This table represents the collection of genes with ZF C2H2 but without KRAB, SCAN, or BTB domains. The table includes such information as protein domains as well as the presence of activator or repressor record. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor ATF2 zf-C2H2 bZIP1  ATF7 zf-C2H2 bZIP 1 BCL11A zf-C2H2  BCL11B zf-C2H2  BCL11B zf-C2H2  BNC1 zf-C2H2 BNC2 zf-C2H2 CTCF zf-C2H2   CTCFL zf-C2H2  DPF2 zf-C2H2 PHD PHD Zf-C2H2   EGR1 zf-C2H2  EGR2 zf-C2H2 EGR3 zf-C2H2 EGR4 zf-C2H2  FEZF1 zf-C2H2  FIZ1 zf-C2H2  GFI1 zf-C2H2  GFI1B zf-C2H2  GLI1 zf-C2H2  GLI2 zf-C2H2  GLI3 zf-C2H2  GLIS1 zf-C2H2   GLIS2 zf-C2H2   GLIS3 zf-C2H2   HINFP zf-C2H2  HIVEP1 zf-C2H2 zf-C2H2_jaz  HIVEP2 zf-C2H2 HIVEP3 zf-C2H2  IKZF1 zf-C2H2  IKZF2 zf-C2H2  IKZF2 zf-C2H2  IKZF3 Zf-C2H2 IKZF4 zf-C2H2  IKZF5 Zf-C2H2 INSM1 zf-C2H2  INSM2 Zf-C2H2 KLF1 zf-C2H2  KLF10 zf-C2H2 KLF11 zf-C2H2  KLF12 zf-C2H2  KLF13 zf-C2H2  KLF14 zf-C2H2 KLF15 zf-C2H2  KLF16 zf-C2H2 

36 Table 4 (continued). ZF-C2H2 without KRAB, SCAN, BTB Domains. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor KLF17 zf-C2H2 KLF2 zf-C2H2  KLF3 zf-C2H2  zf-C2H2   KLF5 zf-C2H2  KLF6 zf-C2H2  KLF7 zf-C2H2  KLF8 zf-C2H2  KLF9 zf-C2H2   L3MBTL MBT zf-C2H2  L3MBTL4 MTB zf-C2H2 MAZ zf-C2H2  MDS1 zf-C2H2 MECOM zf-C2H2  MTF1 zf-C2H2  MYT1L zf-C2H2 NFXL1 zf-C2H2 OSR2 zf-C2H2 OVOL1 zf-C2H2 OVOL2 zf-C2H2 C1_1 PHF20 zf-C2H2 PHD PLAG1 zf-C2H2 PLAGL1 zf-C2H2  PLAGL2 zf-C2H2 PRDM1 SET zf-C2H2 PRDM10 zf-C2H2 PRDM12 zf-C2H2 PRDM13 SET zf-C2H2 PRDM14 SET zf-C2H2 PRDM15 zf-C2H2 PRDM16 zf-C2H2  PRDM2 SET zf-C2H2  PRDM4 zf-C2H2 PRDM5 SET zf-C2H2 PRDM8 zf-C2H2 REST Zf-C2H2  RLF zf-C2H2 RREB1 zf-C2H2   SALL1 zf-C2H2  SALL2 zf-C2H2 SALL3 zf-C2H2 SALL4 zf-C2H2 SCRT1 zf-C2H2  SCRT2 zf-C2H2  SLC2A4RG zf-C2H2 SNAI1 zf-C2H2  SNAI2 zf-C2H2  SNAI3 zf-C2H2 SP1 zf-C2H2  SP2 zf-C2H2 37 Table 4 (continued). ZF-C2H2 without KRAB, SCAN, BTB Domains. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor SP3 zf-C2H2   SP4 zf-C2H2 SP5 zf-C2H2 SP6 zf-C2H2 SP7 zf-C2H2 SP8 zf-C2H2 ST18 zf-C2H2  TRERF1 zf-C2H2 ELM2 Myb DNA- zf-C2H2 binding TSHZ1 zf-C2H2 TSHZ3 zf-C2H2 VEZF1 zf-C2H2 WT1 WT1 zf-C2H2  YY1 zf-C2H2   YY2 zf-C2H2   ZBTB37 zf-C2H2 ZBTB40 zf-C2H2 ZEB1 zf-C2H2  ZEB2 zf-C2H2  ZFAT zf-C2H2 ZFAT zf-C2H2 ZFHX2 zf-C2H2 Homeobox zf-C2H2 Homeobox ZFHX3 zf-C2H2 Homeobox  ZFHX4 zf-C2H2 Homeobox ZFP2 zf-C2H2 ZFP276 zf-AD zf-C2H2 ZFP3 zf-C2H2 ZFP41 zf-C2H2 ZFP42 zf-C2H2 ZFP62 zf-C2H2 ZFP64 zf-C2H2 ZFP91 zf-C2H2 ZFPM1 zf-C2H2 ZFPM2 zf-C2H2 ZFX zf-C2H2 ZFY Zfx zfy act zf-C2H2 ZIC1 zf-C2H2 ZIC2 zf-C2H2  ZIC3 zf-C2H2 ZIM3 zf-C2H2 ZNF107 zf-C2H2 ZNF117 zf-C2H2 ZNF121 zf-C2H2 ZNF124 zf-C2H2 ZNF134 zf-C2H2 ZNF138 zf-C2H2 ZNF142 zf-C2H2 ZNF143 zf-C2H2  ZNF146 zf-C2H2 ZNF148 zf-C2H2   ZNF16 zf-C2H2 38 Table 4 (continued). ZF-C2H2 without KRAB, SCAN, BTB Domains. 
Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor ZNF187 zf-C2H2 ZNF217 zf-C2H2  ZNF219 zf-C2H2  ZNF22 zf-C2H2 ZNF23 zf-C2H2 ZNF236 zf-C2H2 ZNF239 zf-C2H2 ZNF25 zf-C2H2 ZNF251 zf-C2H2 ZNF260 zf-C2H2 ZNF266 zf-C2H2 ZNF276 Zf-AD zf-C2H2 ZNF277 zf-C2H2 ZNF28 zf-C2H2 ZNF280B zf-C2H2 ZNF281 zf-c2h2   ZNF292 zf-C2H2 ZNF296 zf-C2H2 zf-C2H2 ZNF30 zf-C2H2 ZNF319 zf-C2H2 ZNF32 zf-C2H2 ZNF322A zf-C2H2 ZNF322B zf-C2H2 ZNF329 zf-C2H2 ZNF335 zf-C2H2 ZNF341 zf-C2H2 ZNF345 zf-C2H2 ZNF347 zf-C2H2 ZNF35 zf-C2H2 ZNF358 zf-C2H2 ZNF362 zf-C2H2 ZNF366 zf-C2H2 ZNF367 zf-C2H2 ZNF384 zf-C2H2  ZNF391 zf-C2H2 ZNF407 zf-C2H2 ZNF408 zf-C2H2 ZNF410 zf-C2H2 ZNF414 zf-C2H2 ZNF423 zf-C2H2   ZNF438 zf-C2H2  ZNF441 zf-C2H2 ZNF451 zf-C2H2 ZNF462 zf-C2H2 ZNF467 zf-C2H2 ZNF468 zf-C2H2 ZNF48 zf-C2H2 ZNF488 zf-C2H2  ZNF491 zf-C2H2 ZNF493 zf-C2H2 ZNF497 zf-C2H2

39 Table 4 (continued). ZF-C2H2 without KRAB, SCAN, BTB Domains. Gene Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor Name ZNF501 zf-C2H2 ZNF502 zf-C2H2 ZNF507 zf-C2H2 ZNF511 zf-C2H2 ZNF512 zf-C2H2 ZNF512B zf-C2H2 ZNF513 zf-C2H2 ZNF516 zf-C2H2 ZNF518A zf-C2H2 ZNF518B zf-C2H2 ZNF524 zf-C2H2 ZNF526 zf-C2H2 ZNF529 zf-C2H2 ZNF532 zf-C2H2 ZNF536 zf-C2H2 ZNF541 zf-C2H2 ELM2 Myb DNA-binding ZNF542 zf-C2H2 ZNF562 zf-C2H2 ZNF569 zf-C2H2 ZNF572 zf-C2H2 ZNF574 zf-C2H2 ZNF575 zf-C2H2 ZNF576 zf-C2H2 ZNF578 zf-C2H2 ZNF579 zf-C2H2 ZNF580 zf-C2H2 ZNF581 zf-C2H2 ZNF584 zf-C2H2 ZNF592 zf-C2H2 ZNF594 zf-C2H2 ZNF599 zf-C2H2 ZNF600 zf-C2H2 ZNF610 zf-C2H2 ZNF613 zf-C2H2 ZNF618 zf-C2H2 ZNF619 zf-C2H2 ZNF623 zf-C2H2 ZNF625 zf-C2H2 ZNF628 zf-C2H2 ZNF629 zf-C2H2 ZNF630 zf-C2H2 ZNF639 zf-C2H2  ZNF644 zf-C2H2 ZNF646 zf-C2H2 ZNF652 zf-C2H2 ZNF653 zf-C2H2 ZNF654 zf-C2H2 Zf-BED DUF3133 ZNF655 zf-C2H2 ZNF658 zf-C2H2 ZNF658B zf-C2H2 ZNF660 zf-C2H2 40 Table 4 (continued). ZF-C2H2 without KRAB, SCAN, BTB Domains. Gene Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor Name ZNF662 zf-C2H2 ZNF663 zf-C2H2 ZNF664 zf-C2H2 ZNF665 zf-C2H2 ZNF668 zf-C2H2 ZNF672 zf-C2H2 ZNF681 zf-C2H2 ZNF687 zf-C2H2 ZNF692 zf-C2H2 ZNF696 zf-C2H2 ZNF697 zf-C2H2 ZNF70 zf-C2H2 ZNF702P zf-C2H2 ZNF71 Zf-C2H2 ZNF710 zf-C2H2 ZNF711 Zfx_Zfy_act zf-C2H2 ZNF721 zf-C2H2 ZNF76 zf-C2H2  ZNF768 zf-C2H2 ZNF770 zf-C2H2 ZNF771 zf-C2H2 ZNF774 zf-C2H2 ZNF775 zf-C2H2 ZNF780A zf-C2H2 ZNF780B zf-C2H2 ZNF781 zf-C2H2 ZNF784 zf-C2H2 ZNF787 zf-C2H2 ZNF788 zf-C2H2 ZNF792 zf-C2H2 ZNF799 zf-C2H2 ZNF80 zf-C2H2 ZNF800 zf-C2H2 ZNF805 zf-C2H2 ZNF818P zf-C2H2 ZNF821 zf-C2H2 ZNF826 zf-C2H2 ZNF827 zf-C2H2 ZNF83 zf-C2H2 ZNF835 zf-C2H2 ZNF836 zf-C2H2 ZNF841 zf-C2H2 ZSCAN2 zf-C2H2 ZXDA zf-C2H2 ZXDC zf-C2H2 

41 Table 5. Homeobox Transcription Factors. This table represents the collection of genes with Homeobox domain. The table includes such information as protein domains as well as the presence of activator or repressor record. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor PDX1 Homeobox  POU1F1 Pou Homeobox  NKX2-1 Homeobox  NKX2-3 Homeobox  DMBX1 Homeobox  MSX1 Homeobox  NKX2-5 Homeobox   ZFHX3 zf-C2H2 Homeobox  LOC360030 Homeobox ADNP Homeobox ALX3 Homeobox ALX4 Homeobox OAR ARGFX Homeobox ARX Homeobox OAR BARHL1 Homeobox BARHL2 Homeobox BARX1 Homeobox BARX2 Homeobox BSX Homeobox CDX1 Caudal act Homeobox   CDX2 Claudal act Homeobox   CDX4 Caudal act Homeobox CRX Homeobox  CUX1 CUT Homeobox   DBX1 Homeobox DBX2 Homeobox DLX1 Homeobox DLX2 Homeobox DLX3 Homeobox DLX4 Homeobox  DLX5 Homeobox DLX6 Homeobox DRGX Homeobox OAR DUXA Homeobox Homeobox EMX1 Homeobox EMX2 Homeobox EN1 Homeobox ESX1 Homeobox  Homeobox  EVX2 homeobox HDX Homeobox HESX1 Homeobox  HHEX Homeobox  HLX Homeobox HMBOX1 Homeobox   HMX1 Homeobox  HMX2 Homeobox HMX3 Homeobox HNF1A HNF-1 N Homeobox 

42 Table 5 (continued). Homeobox Transcription Factors. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor HNF1B HNF-1_N Homeobox HNF-1B_C  HOPX Homeobox  HOXA1 Homeobox HOXA10 Homeobox HOXA11 Homeobox HOXA13 Homeobox HOXA2 Homeobox HOXA3 Homeobox HOXA4 Homeobox HOXA5 Homeobox HOXA6 homeobox HOXA7 Homeobox  HOXA9 Hox9_act Homeobox   HOXB1 Homeobox HOXB13 Homeobox HOXB2 Homeobox HOXB3 Homeobox HOXB4 Homeobox  HOXB5 Homeobox HOXB6 Homeobox HOXB7 Homeobox HOXB8 Homeobox HOXB9 Hox9 act Homeobox  HOXC10 Homeobox HOXC11 Homeobox HOXC13 Homeobox HOXC4 Homeobox HOXC5 Homeobox HOXC6 Homeobox HOXC8 Homeobox  HOXC9 Hox9 act Homeobox  HOXD1 Homeobox HOXD10 Homeobox HOXD11 Homeobox HOXD12 Homeobox HOXD13 Homeobox HOXD3 Homeobox HOXD4 Homeobox HOXD8 Homeobox HOXD9 Hox9 act Homeobox  IRX1 Homeobox IRX2 Homeobox IRX3 Homeobox IRX4 Homeobox IRX5 Homeobox IRX6 Homeobox ISL1 LIM Homeobox  ISL2 LIM LIM Homeobox ISX Homeobox LASS3 Homeobox LAG1 LASS5 Homeobox LAG1 LBX1 Homeobox 43 Table 5 (continued). Homeobox Transcription Factors. 
Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor LBX2 Homeobox LEUTX Homeobox LHX1 LIM LIM Homeobox LHX2 LIM LIM Homeobox LHX3 LIM LIM Homeobox LHX4 LIM LIM Homeobox LHX5 LIM Homeobox LHX6 LIM Homeobox LHX8 LIM Homeobox LHX9 LIM Homeobox LMX1A LIM LIM Homeobox LMX1B LIM LIM Homeobox MEIS1 Homeobox  MEIS2 Homeobox MEIS3 Homeobox MEOX1 Homeobox MEOX2 Homeobox MIXL1 Homeobox MKX Homeobox MSX2 Homeobox  NANOG Homeobox NANOGP1 Homeobox NANOGP8 Homeobox NKX2 Homeobox  NKX2-2 Homeobox NKX2-6 Homeobox NKX2-8 Homeobox NKX3-1 Homeobox NKX6-1 Homeobox NOBOX Homeobox NOTO Homeobox ONECUT1 CUT Homeobox  ONECUT2 CUT Homeobox ONECUT3 CUT Homeobox OTP Homeobox OAR OTX1 Homeobox TF Otx OTX2 Homeobox TF Otx PAX3 PAX Homeobox  PAX4 PAX Homeobox PAX6 PAX Homeobox PAX7 PAX Homeobox PBX1 PBC Homeobox PBX2 PBC Homeobox  PBX3 PBC Homeobox PBX4 PBC Homeobox PHOX2A Homeobox PITX1 Homeobox OAR PITX2 Homeobox OAR PKNOX2 Homeobox POU2F1 Pou Homeobox  POU2F2 Pou Homeobox POU2F3 Pou Homeobox 44 Table 5 (continued). Homeobox Transcription Factors. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor POU3F1 Pou Homeobox POU3F2 Pou Homeobox POU3F3 Pou Homeobox POU3F4 Pou Homeobox POU4F1 Pou Homeobox POU4F2 Pou Homeobox POU5F1 Pou Homeobox POU5F1B Pou Homeobox  POU6F1 Pou Homeobox POU6F2 Pou Homeobox PROP1 Homeobox  PRRX1 Homeobox PRRX2 Homeobox OAR RAX Homeobox OAR RAX2 Homeobox RHOXF2 Homeobox RHOXF2B Homeobox SATB1 CUT Homeobox  SATB2 CUT Homeobox SHOX Homeobox OAR SHOX2 Homeobox OAR SIX1 Homeobox SIX2 Homeobox SIX3 Homeobox  SIX4 Homeobox SIX5 Homeobox SIX6 Homeobox TGIF1 Homeobox TGIF2 Homeobox  TGIF2LX Homeobox TGIF2LY Homeobox TLX1 Homeobox TLX2 Homeobox TLX3 Homeobox TPRX1 Homeobox UNCX Homeobox VAX1 Homeobox VAX2 homeobox VENTX Homeobox  VSX1 Homeobox ZFHX2 zf-C2H2 Homeobox zf-C2H2 Homeobox ZFHX4 zf-C2H2 Homeobox ZHX1 Homeobox Homeobox Homeobox ZHX3 Homeobox ADNP2 Homeobox ALX1 Homeobox OAR  DPRX Homeobox DUX1 Homeobox DUX4 Homeobox  EN2 Homeobox GSC2 Homeobox

45 Table 5 (continued). Homeobox Transcription Factors. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor GSX1 Homeobox GSX2 Homeobox NKX1-1 Homeobox NKX1-2 Homeobox NKX2-4 Homeobox NKX3-2 Homeobox NKX6-2 Homeobox NKX6-3 Homeobox PITX3 Homeobox OAR POU5F2 Pou Homeobox RHOXF1 Homeobox SEBOX Homeobox VSX2 Homeobox OAR ZHX2 Homeobox  GBX1 Homeobox GBX2 Homeobox GSC Homeobox MNX1 Homeobox POU4F3 Pou Homeobox PHOX2B Homeobox PKNOX1 Homeobox

46 Table 6. HLH Transcription Factors. This table represents the collection of genes with HLH domain. The table includes such information as protein domains as well as the presence of activator or repressor record. Gene Name Domain 1 Domain 2 Domain 3 Activator Repressor MLXIPL HLH   BHLHE23 HLH  MITF HLH   MYOG Basic HLH  NEUROD1 HLH  SOHLH1 HLH  TCF12 HLH  FERD3L HLH  BHLHE22 HLH  BHLHE40 HLH  HAND1 HLH  TWIST2 HLH  AHR HLH PAS  ARNT2 HLH PAS  ARNTL HLH PAS  ARNTL2 HLH PAS ASCL1 HLH   ASCL2 HLH  ASCL3 HLH ASCL4 HLH ATOH1 HLH  ATOH8 HLH CLOCK HLH PAS  EPAS1 HLH PAS  FIGLA HLH HAND2 HLH  HELT HLH Hairy orange  HES1 HLH Hairy orange  HES5 HLH Hair orange  HES6 HLH Hairy orange  HES7 HLH Hairy orange  HEY1 HLH Hairy orange  HEY2 HLH Hairy orange  HEYL HLH Hairy orange HIF1A HLH PAS  HOXC12 HLH ID1 HLH  ID2 HLH  ID3 HLH ID4 HLH KIAA2018 HLH LYL1 HLH MAX HLH MESP2 HLH MGA T-box HLH MLX HLH MLXIP HLH MNT HLH   MSC HLH 

47 Table 6 (continued). HLH Transcription Factors. Gene Name Domain 1 Domain 2 Domain 3 Activator Repressor MXD1 HLH  MXD3 HLH  MXD4 HLH   MXI1 HLH  Myc N HLH   MYCL1 Myc N HLH MYCN Myc N HLH  MYF6 Basic HLH MYOD1 Basic HLH  NEUROD2 HLH  NEUROD6 HLH NEUROG1 HLH NEUROG2 HLH NEUROG3 HLH NHLH2 HLH NPAS1 HLH PAS NPAS2 HLH PAS NPAS3 HLH PAS OLIG1 HLH OLIG2 HLH OLIG3 HLH PTF1A HLH  SCXA HLH SIM1 HLH PAS SIM C SIM2 HLH PAS SIM C  SREBF1 HLH  SREBF2 HLH TAL1 HLH   TAL2 HLH TCF21 HLH TCF23 HLH TCF24 HLH TCF3 HLH  TCF4 HLH TCFL5 HLH TFAP4 HLH  TFE3 HLH  TFEB HLH TFEC HLH  TWIST1 HLH USF1 HLH USF2 HLH BHLHE41 HLH Hairy orange  TCF15 HLH ARNT HLH PAS MESP1 HLH MYF5 Basic HLH Myf5  NCOA1 HLH PAS DUF1518  NEUROD4 HLH BHLHA9 HLH BHLHA15 HLH HES2 HLH Hairy orange 48 Table 6 (continued). HLH Transcription Factors. Gene Name Domain 1 Domain 2 Domain 3 Activator Repressor MSGN1 HLH SOHLH2 HLH ATOH7 HLH NHLH1 HLH

49 Table 7. bZip Transcription Factors. This table represents the collection of genes with bZIP domain. The table includes such information as protein domains as well as the presence of activator or repressor record. Gene Domain Domain Domain Domain Domain Activator Repressor Name 1 2 3 4 5 ATF4 bZIP_1  MAFA bZIP Maf  ATF1 pKID bZIP 1  ATF2 Zf-C2H2 bZIP1  ATF3 bZIP 2   ATF5 bZIP_1  ATF6 bZIP_1  ATF7 zf-C2H2 bZIP 1  BACH1 BTB bZIP 1   BACH2 BTB bZIP 1   BATF bZIP_1   BATF2 bZIP 1  CEBPB bZIP_2  CEBPE bZIP 2  CEBPG bZIP_2  CREB1 pKID bZIP 1  CREB3 bZIP 1  CREB3L bZIP 1  1 CREB3L bZIP 1  2 CREB3L bZIP_1  3 CREB3L bZIP 1  4 CREB5 bZIP 1  CREBL2 bZIP 2  CREBZF bZIP 1   CREM pKID bZIP 1   DBP bZIP 2  DDIT3 bZIP_2  FOS bZIP 1  FOSB bZIP 2  FOSL1 bZIP 1  FOSL2 bZIP 1  HEXIM2 KdgM HAP1_N bZIP_2 DivIC Herpes_   BLRF2 HLF bZIP 2  JDP2 bZIP_2   JUN Jun bZIP 1  JUNB Jun bZIP 1  JUND Jun bZIP_1  MAF bZIP_Ma  MAFB bZIP Maf  

50 Table 7 (continued). bZip Transcription Factors. Gene Domain Domain Domain Domain Domain Activator Repressor Name 1 2 3 4 5 MAFF bZIP Maf   MAFG bZIP Maf MAFK bZIP Maf  NFE2 bZIP_1  NFE2L1 bZIP 1 NFE2L3 bZIP 1 NFIL3 bZIP 2 Vert IL3-  reg TF NRL bZIP Maf TEF bZIP_2 XBP1 bZIP 2 ATF6B bZIP_1  BATF3 bZIP 1  CEBPD bZIP_2 CEBPA bZIP_2   C5orf41 Daxx bZIP_1 NFE2L2 bZIP 1 

51 Table 8. ZF C4 Transcription Factors. This table represents the collection of genes with ZF C4 domain. The table includes such information as protein domains as well as the presence of activator or repressor record. Gene Name Domain 1 Domain 2 Domain 3 Activator Repressor PPARG zf-c4   ESRRG zf-c4 Hormone recep  NR4A1 zf-c4 Hormone recep  NR4A3 zf-C4  PPARA zf-C4 Hormone recep  RARB zf-C4 Hormone recep  NR1I2 zf-C4 Hormone recep  NR2C1 zf-C4 Hormone recep  PPARD zf-C4 Hormone recep   AR Androgen recep zf-c4 Hormone recep  ESR1 Oest recep zf-C4   ESR2 zf-c4 Hormone recep ERbeta_N  ESRRA Zf-C4 Hormone_recep ESRRB zf-c4 Hormone recep  HNF4A zf-c4 Hormone recep HNF4G zf-c4 Hormone recep  NR1D1 Zf-C4 Hormone_recep NR1H2 zf-c4 Hormone recep NR1H3 zf-C4 Hormone recep NR1H4 zf-C4 Hormone recep  NR1I3 zf-c4 Hormone recep NR2C2 zf-c4 Hormone recep NR2E1 zf-C4 Hormone recep  NR2E3 zf-C4 Hormone recep NR2F1 zf-c4 Hormone recep  NR2F2 zf-c4 Hormone recep NR2F6 zf-C4 Hormone recep NR3C1 GCR zf-C4 Hormone recep   NR3C2 zf-C4 NR4A2 zf-C4 Hormone recep NR5A1 zf-C4 Hormone_recep  NR5A2 zf-C4 Hormone_recep  NR6A1 zf-C4 Hormone recep PGR Rrog recep zf-c4 Hormone recep RARA zf-c4 Hormone recep  RARG zf-C4 Hormone recep RORA zf-C4 Hormone recep  RORB zf-c4 Hormone recep RORC zf-C4 Hormone recep RXRA zf-C4 Hormone recep  RXRB zf-C4 Hormone recep RXRG zf-C4 Hormone_recep THRA zf-c4 Hormone recep THRB zf-C4 Hormone recep VDR zf-C4 Hormone_recep NR1D2 zf-C4 Hormone_recep 

52 Table 9. Forkhead Transcription Factors. This table represents the collection of genes with Forkhead domain. The table includes such information as protein domains as well as the presence of activator or repressor record. Gene Name Domain 1 Domain 2 Activator Repressor FOXC1 Fork head  FOXH1 Fork head  FOXI1 Fork head  FOXC2 Fork head  FOXE1 Fork head  FOXK1 FHA Fork head   FOXN3 Fork head  FOXA1 Fork head  FOXA2 Fork head  FOXA3 Fork head  FOXB1 Fork Head FOXB2 Fork head FOXD1 Fork head FOXD3 Fork head FOXE3 Fork head FOXG1 Fork head  FOXI3 Fork head FOXD2 Fork head FOXD4 Fork head FOXD4L1 Fork head FOXD4L3 Fork head FOXD4L4 Fork Head FOXD4L5 Fork head FOXD4L6 Fork Head FOXF1 Fork head  FOXF2 Fork head  FOXI2 Fork head FOXJ1 Fork head FOXJ2 Fork head FOXJ3 Fork head FOXK2 FHA Fork head FOXL1 Fork head FOXL2 Fork Head FOXM1 Fork Head FOXN1 Fork head FOXN2 Fork head FOXN4 Fork head FOXO1 Fork head  FOXO3 Fork head  FOXO4 Fork head  FOXO6 Fork head FOXP1 Fork head  FOXP2 Fork Head  FOXP3 Fork Head  FOXP4 Fork head  FOXQ1 Fork head FOXR1 Fork head FOXR2 Fork head FOXS1 Fork head 

53 Table 10. p53-like Transcription Factors. This table represents the collection of genes with p53, RUNT, RHD, STAT, and T-box domains. Those domains comprise the p53- like clan of transcription factors. The table includes such information as protein domains as well as the presence of activator or repressor record. Gene Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor Name TP63 p53 p53 tetramer SAM 2   TP53 p53 p53 tetramer   TP73 p53 p53 tetramer SAM 2   NFATC1 RHD TIG  NFATC2 RHD TIG  NFAT5 RHD TIG NFATC3 RHD TIG NFATC4 RHD TIG NFKB1 RHD TIG Ank Death   NFKB2 RHD TIG Ank Death REL RHD TIG RELA RHD IPT/TIG  RELB RHD TIG RUNX1 Runt  RUNX2 Runt  RUNX3 Runt   HSF1 HSF DNA-bind HSF2 HSF_DNA-bind Vert_HS_TF HSF4 HSF_DNA-bind HSF5 HSF DNA-bind HSFX1 HSF_DNA-bind HSFX2 HSF_DNA-bind HSFY1 HSF_DNA-bind  HSFY2 HSF_DNA-bind  IKBKB Pkinase IKKbetaNE  MObind STAT1 STAT int STAT alpha STAT bind SH2  STAT2 STAT_int STAT_alpha STAT bind  STAT3 STAT int STAT alpha STAT bind SH2  STAT4 STAT int STAT alpha STAT bind SH2  STAT5A STAT int STAT alpha STAT bind SH2  STAT5B STAT int STAT alpha STAT bind SH2  STAT6 STAT int STAT alpha STAT bind SH2  NRF1 Nrf1_DNA- Nrf1_activ_b bind dg TBX20 T-box  EOMES T-Box  MGA T-box HLH

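Table 10 (continued). p53-like Transcription Factors.

Gene Name  Domain 1  Domain 2  Domain 3  Domain 4  Activator  Repressor
T          T-box     -         -         -         -          -
TBR1       T-box     -         -         -         -          -
TBX10      T-box     -         -         -         -          -
TBX15      T-box     -         -         -         -          -
TBX18      T-box     -         -         -         -          -
TBX19      T-box     -         -         -         -          -
TBX2       T-box     -         -         -         -          Yes
TBX21      T-box     -         -         -         -          -
TBX22      T-box     -         -         -         -          -
TBX3       T-box     -         -         -         -          Yes
TBX4       T-box     -         -         -         -          -
TBX5       T-box     -         -         -         -          -
TBX6       T-box     -         -         -         -          -
TBX1       T-box     -         -         -         -          -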

55 Table 11. HMG Transcription Factors. This table represents the collection of genes with HMG domain. The table includes such information as protein domains as well as the presence of activator or repressor record. Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor LEF1 CTNNB1_binding HMG_box  SOX9 Sox_N HMG_box   SRY HMG_box   HMG20A HMG_box HMG20B HMG_box HMGB2 HMG_box  HMGN5 HMG14_17  SOX1 HMG_box SOXp SOX13 HMG_box SOX14 HMG_box SOXp  SOX15 HMG_box SOX18 HMG_box DUF3547 HMG_box SOXp  SOX21 HMG_box SOXp SOX4 HMG_box SOX5 HMG_box SOX6 HMG_box  SOX7 HMG_box DUF3547 SOX8 Sox_N HMG_box TCF7L1 CTNNB1_binding HMG_box TCF7L2 CTNNB1_binding HMG_box  TFAM HMG_box BBX HMG_box DUF2028 SOX30 HMG_box TCF7 CTNNB1_binding HMG_box SOX10 Sox_N HMG_box  HBP1 AXH HMG_box  UBTF HMG_box  WHSC1 PWWP HMG_box PHD SET  SOX3 HMG_box SOXp SOX17 HMG_box DUF3547 SOX11 HMG_box  SOX12 HMG_box CIC HMG_box

56 Table 12. ETS(a) and TIG(b) Transcription Factors. This table represents the collection of genes with ETS domain (a) and TIG domain (b). The table includes such information as protein domains as well as the presence of activator or repressor record. a) ETS Transcription Factors Gene Name Domain 1 Domain 2 Activator Repressor ELF1 Ets   ELF2 Ets  ELF4 Ets  ETV4 ETS PEA3 N Ets  ELF3 SAM PNT Ets  EHF SAM PNT Ets  ELF5 SAM PNT Ets  ELK1 Ets  ELK3 ETS ELK4 Ets ERF Ets  ERG SAM PNT Ets ETS1 SAM PNT Ets ETS2 SAM PNT Ets ETV1 ETS PEA3 N Ets  ETV2 Ets  ETV3 Ets  ETV5 ETS_PEA3_N Ets  ETV6 SAM PNT Ets  ETV7 Ets  FEV Ets  FLI1 SAM PNT Ets  SPDEF SAM PNT Ets SPI1 Ets  SPIB Ets SPIC Ets ETV3L Ets GABPA SAM PNT Ets b) TIG Transcription Factors Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor NFATC1 RHD TIG  NFATC2 RHD TIG  CAMTA1 CG-1 TIG IQ  CAMTA2 CG-1 TIG IQ  NFAT5 RHD TIG NFATC3 RHD TIG NFATC4 RHD TIG NFKB1 RHD TIG Ank Death   NFKB2 RHD TIG Ank Death REL RHD TIG RELA RHD IPT/TIG  RELB RHD TIG RBPJ TIG RBPJL TIG EBF4 TIG EBF1 TIG 

Table 13. POU(a), SAND(b), and IRF(c) Transcription Factors. This table represents the collection of genes with a POU domain (a), a SAND domain (b), and an IRF domain (c). The table includes protein domains as well as the presence of an activator or repressor record.

a) POU Transcription Factors

Gene Name  Domain 1  Domain 2  Activator  Repressor
POU1F1     Pou       Homeobox  Yes        -
POU2F1     Pou       Homeobox  -          Yes
POU2F2     Pou       Homeobox  -          -
POU2F3     Pou       Homeobox  -          -
POU3F1     Pou       Homeobox  -          -
POU3F2     Pou       Homeobox  -          -
POU3F3     Pou       Homeobox  -          -
POU3F4     Pou       Homeobox  -          -
POU4F1     Pou       Homeobox  -          -
POU4F2     Pou       Homeobox  -          -
POU5F1     Pou       Homeobox  -          -
POU5F1B    Pou       Homeobox  Yes        -
POU6F1     Pou       Homeobox  -          -
POU6F2     Pou       Homeobox  -          -
POU5F2     Pou       Homeobox  -          -
POU4F3     Pou       Homeobox  -          -

b) SAND Transcription Factors

Gene Name  Domain 1  Domain 2  Domain 3     Domain 4     Activator  Repressor
SP110      Sp100     SAND      PHD          -            Yes        -
SP140      Sp100     SAND      PHD          Bromodomain  -          -
GMEB2      SAND      -         -            -            -          -
DEAF1      SAND      zf-MYND   -            -            Yes        -
GMEB1      SAND      -         -            -            -          -
SP100      Sp100     SAND      Bromodomain  -            -          -

c) IRF Transcription Factors

Gene Name  Domain 1  Activator  Repressor
IRF1       IRF       Yes        Yes
IRF2       IRF       Yes        Yes
IRF3       IRF       Yes        -
IRF4       IRF       Yes        -
IRF5       IRF       -          -
IRF6       IRF       -          -
IRF7       IRF       Yes        -
IRF8       IRF       Yes        -
IRF9       IRF       -          -

58 Table 14. GATA(a), DM(b), HSF(c), and CP2(d) Transcription Factors. This table represents the collection of genes with GATA domain (a), DM domain (b), HSF domain (c), and CP2 domain (d). The table includes such information as protein domains as well as the presence of activator or repressor record. a) GATA Transcription Factors Gene Name Domain 1 Domain 2 Domain 3 Domain 4 Activator Repressor MTA2 BAH ELM2 GATA   GATA4 GATA-N GATA  GATA5 GATA-N GATA  GATAD2A GATA  TRPS1 GATA  MTA1 BAH ELM2 Myb DNA-binding GATA MTA3 BAH ELM2 Myb DNA-binding GATA RERE ELM2 GATA Atrophin-1 ZGLP1 GATA  GATA1 GATA GATA   GATA2 GATA   GATA3 GATA GATA6 GATA-N GATA GATA  GATAD1 GATA GATAD2B GATA  b) DM Transcription Factors Gene Name Domain 1 Domain 2 Activator Repressor DMAP1 DMAP1   DMRT1 DM DMRT2 DM DMRT3 DM DMA DMRTA1 DM DMA DMRTB1 DM DMRTC2 DM DMRTA2 DM c) HSF Transcription Factors Gene Name Domain 1 Domain 2 Activator Repressor HSF1 HSF DNA-bind HSF2 HSF_DNA-bind Vert_HS_TF HSF4 HSF_DNA-bind HSF5 HSF DNA-bind HSFX1 HSF_DNA-bind HSFX2 HSF_DNA-bind HSFY1 HSF_DNA-bind  HSFY2 HSF_DNA-bind  d) CP2 Transcription Factors Gene Name Domain 1 Domain 2 Activator Repressor TFCP2 CP2 TFCP2L1 CP2 UBP1 CP2 SAM 1 GRHL1 CP2 GRHL2 CP2 GRHL3 CP2 59 Table 15. RFX(a) and AP2(b) Transcription Factors. This table represents the collection of genes with RFX domain (a) and AP2 domain (b). The table includes such information as protein domains as well as the presence of activator or repressor record. a) RFX Transcription Factors Gene Name Domain 1 Domain 2 Activator Repressor RFX3 RFX1 trans act RFX DNA binding   RFX1 RFX1_trans_act RFX_DNA_binding  RFX2 RFX1 trans act RFX DNA binding  RFX4 RFX DNA binding RFX5 RFX DNA binding  RFX6 RFX DNA binding RFX7 RFX_DNA_binding b) AP-2 Transcription Factors Gene Name Domain 1 Activator Repressor TFAP2A TF AP-2 TFAP2B TF AP-2   TFAP2C TF AP-2 TFAP2D TF AP-2  TFAP2E TF AP-2

Table 16. MYB DNA-Binding Transcription Factors. This table represents the collection of genes with a MYB DNA-binding domain. The table includes protein domains as well as the presence of an activator or repressor record.

Gene Name  Domain 1         Domain 2         Domain 3         Domain 4  Activator  Repressor
MYB        Myb_DNA-binding  Wos2             -                -         Yes        Yes
MYBL1      Myb_DNA-binding  Wos2             Cmyb_C           -         Yes        -
MTA1       BAH              ELM2             Myb_DNA-binding  GATA      -          -
MTA3       BAH              ELM2             Myb_DNA-binding  GATA      -          -
MYBL2      Myb_DNA-binding  -                -                -         -          -
NCOR1      Myb_DNA-binding  -                -                -         -          Yes
RCOR2      ELM2             Myb_DNA-binding  -                -         -          -
SMARCC1    SWIRM            Myb_DNA-binding  -                -         -          -
SMARCC2    SWIRM            Myb_DNA-binding  -                -         -          -
SNAPC4     Myb_DNA-binding  -                -                -         -          -
TADA2A     Myb_DNA-binding  SWIRM            -                -         -          -
TRERF1     zf-C2H2          ELM2             Myb_DNA-binding  zf-C2H2   -          -
DMTF1      Myb_DNA-binding  -                -                -         Yes        -
MYPOP      Myb_DNA-binding  -                -                -         -          -
ZNF541     zf-C2H2          ELM2             Myb_DNA-binding  -         -          -
CDC5L      Myb_DNA-binding  DUF3351          -                -         -          -
MYSM1      Myb_DNA-binding  SWIRM            Mov34            -         -          -
NCOR2      Myb_DNA-binding  -                -                -         -          -
RCOR1      ELM2             Myb_DNA-binding  -                -         -          -
RCOR3      ELM2             Myb_DNA-binding  -                -         -          -
ZZZ3       Myb_DNA-binding  ZZ               -                -         -          -
CCDC79     Myb_DNA-binding  -                -                -         -          -

61 Table 17. MH1(a), PAX(b), and ARID(c) Transcription Factors. This table represents the collection of genes with MH1 domain (a), PAX domain (b), and ARID domain (c). The table includes such information as protein domains as well as the presence of activator or repressor record. a) MH1 Transcription Factors Gene Name Domain 1 Domain 2 Domain 3 Activator Repressor SMAD1 MH1 MH2  SMAD4 MH1 MH2  SMAD5 MH1 MH2  SMAD3 MH1 MH2  SMAD6 MH1 MH2 SMAD7 MH1 MH2 SMAD9 MH1 MH2 NFIA NfI_DNAbd_pre-N MH1 CTF_NFI  NFIB NfI_DNAbd_pre-N MH1 CTF_NFI NFIC NfI_DNAbd_pre-N MH1 CTF_NFI NFIX NfI_DNAbd_pre-N MH1 CTF_NFI b) PAX Transcription Factors Gene Name Domain 1 Domain 2 Activator Repressor PAX8 PAX  PAX1 PAX PAX2 PAX Pax2_C PAX3 PAX Homeobox  PAX4 PAX Homeobox PAX5 PAX Pax2_C PAX6 PAX Homeobox PAX7 PAX Homeobox PAX9 PAX c) ARID Transcription Factors Gene Domain Domain Domain Domain Domain Activator Repressor Name 1 2 3 4 5 ARID1A ARID DUF35  18 ARID4A RBB1N ARID Tudor-  T knot ARID5A ARID  ARID5B ARID  JARID2 JmjN ARID JmjC zf-C5HC2  KDM5A JmjN ARID PHD  KDM5B JmjN ARID PHD  ARID3A ARID KDM5C JmjN ARID PHD JmjC zf-C5HC2 

Table 18. CUT Transcription Factors. This table represents the collection of genes with a CUT domain. The table includes protein domains as well as the presence of an activator or repressor record.

Gene Name  Domain 1  Domain 2  Domain 3  Activator  Repressor
CUX1       CUT       Homeobox  -         Yes        Yes
ONECUT1    CUT       Homeobox  -         Yes        -
ONECUT2    CUT       Homeobox  -         -          -
ONECUT3    CUT       Homeobox  -         -          -
SATB1      CUT       Homeobox  -         -          Yes
SATB2      CUT       Homeobox  -         -          -
CUX2       CUT       CUT       CUT       -          Yes

63 Figure 1. Transcription Factor Genes (All). This figure represents a list of collected genes with a record of transcription factor activity, including the genes with transcriptional activator and transcriptional repressor activity. AATF ATF6 BTF3L2 CNOT8 DMBX1 ADNP ATF6B BTF3L3 CREB1 DMRT1 ADNP2 ATF7 BUD31 CREB3 DMRT2 AEBP1 ATOH1 C11orf9 CREB3L1 DMRT3 AFF1 ATOH8 C21orf66 CREB3L2 DMRTA1 AFF3 ATXN1 C2orf3 CREB3L3 DMRTA2 AFF4 ATXN7 C5orf41 CREB3L4 DMRTB1 AHCTF1 BACH1 CALR CREB5 DMRTC2 AHR BACH2 CAMTA1 CREBBP DMTF1 AIRE BARHL1 CAMTA2 CREBL2 DNAJB6 AKNA BARHL2 CAND1 CREBZF DNM2 ALS2CR8 BARX1 CAND2 CREM DNMT3L ALX1 BARX2 CBFA2T2 CRX DPF2 ALX3 BATF CBFA2T3 CSDA DPRX ALX4 BATF2 CBFB CSRNP1 DRAP1 ANKRD1 BATF3 CBL CSRNP2 DRGX ANKRD30A BAZ1B CBX2 CSRNP3 DUX1 AR BBX CCRN4L CTBP2 DUX4 ARGFX BCL11A CD80 CTCF DUXA ARID1A BCL11B CD86 CTCFL ARID3A BCL2 CDH1 CTNNB1 ARID4A BCL3 CDKN2A CUX1 ARID5A BCL6 CDX1 CUX2 ARID5B BCL6B CDX2 CXXC1 ARNT BCLAF1 CDX4 DACH1 E2F6 ARNT2 BHLHE22 CEBPA DAXX E2F7 ARNTL BHLHE23 CEBPB DBP E2F8 ARNTL2 BHLHE40 CEBPD DBX1 E4F1 ARX BHLHE41 CEBPE DBX2 EBF1 ASCL1 BLZF1 CEBPG DDIT3 EBF4 ASCL2 BMP2 CEBPZ DDX20 EDF1 ASCL3 BNC1 CEP290 DEAF1 EED ASCL4 BPTF CHURC1 DLX1 EGR1 ASH1L BRCA2 CITED1 DLX2 EGR2 ATF1 BRD8 CITED2 DLX3 EGR3 ATF2 BRF1 CLOCK DLX4 EGR4 ATF3 BRPF1 CNBP DLX5 EHF ATF4 BSX CNOT1 DLX6 ELF1 ATF5 BTAF1 CNOT7 DMAP1 ELF2 64 Figure 1 (continued). Transcription Factor Genes (All). 
ELF3 FIZ1 FOXN3 GMEB2 HLX ELF4 FLI1 FOXN4 GPN1 HMBOX1 ELF5 FOS FOXO1 GRHL1 HMG20A ELK1 FOSB FOXO3 GRHL2 HMG20B ELK3 FOSL1 FOXO4 GRHL3 HMGA1 ELK4 FOSL2 FOXO6 GSC HMGB2 EMX1 FOXA1 FOXP1 GSC2 HMGN5 EMX2 FOXA2 FOXP2 GSX1 HMX1 EN1 FOXA3 FOXP3 GSX2 HMX2 EN2 FOXB1 FOXP4 GZF1 HMX3 ENO1 FOXB2 FOXQ1 HAND1 HNF1A EOMES FOXC1 FOXR1 HAND2 HNF1B EP300 FOXC2 FOXR2 HBP1 HNF4A EPAS1 FOXD1 FOXS1 HCFC1 HNF4G EPC1 FOXD2 FUBP1 HCLS1 HNRNPAB ERF FOXD3 FUBP3 HDAC1 HNRNPD ERG FOXD4 GABPA HDAC2 HOMEZ ESR1 FOXD4L1 GABPB1 HDAC4 HOPX ESR2 FOXD4L3 GAS7 HDX HOXA1 ESRRA FOXD4L4 GATA1 HELT HOXA10 ESRRB FOXD4L5 GATA2 HES1 HOXA11 ESRRG FOXD4L6 GATA3 HES5 HOXA13 ESX1 FOXE1 GATA4 HES6 HOXA2 ETS1 FOXE3 GATA5 HES7 HOXA3 ETS2 FOXF1 GATA6 HESX1 HOXA4 ETV1 FOXF2 GATAD1 HEXIM2 HOXA5 ETV2 FOXG1 GATAD2A HEY1 HOXA6 ETV3 FOXH1 GATAD2B HEY2 HOXA7 ETV3L FOXI1 GBX1 HEYL HOXA9 ETV4 FOXI2 GBX2 HHEX HOXB1 ETV5 FOXI3 GCM1 HIC1 HOXB13 ETV6 FOXJ1 GCM2 HIC2 HOXB2 ETV7 FOXJ2 GFI1 HIF1A HOXB3 EVX1 FOXJ3 GFI1B HIF3A HOXB4 EVX2 FOXK1 GLI1 HINFP HOXB5 EZH2 FOXK2 GLI2 HIRA HOXB6 FABP4 FOXL1 GLI3 HIVEP1 HOXB7 FERD3L FOXL2 GLIS1 HIVEP2 HOXB8 FEV FOXM1 GLIS2 HIVEP3 HOXB9 FEZF1 FOXN1 GLIS3 HLF HOXC10 FIGLA FOXN2 GMEB1 HLTF HOXC11 65 Figure 1 (continued). Transcription Factor Genes (All). 
HOXC12 IRF2 KLF5 MAX MSX1 HOXC13 IRF3 KLF6 MAZ MSX2 HOXC4 IRF4 KLF7 MBD1 MTA1 HOXC5 IRF5 KLF8 MBD2 MTA2 HOXC6 IRF6 KLF9 MDS1 MTA3 HOXC8 IRF7 L3MBTL MECOM MTF1 HOXC9 IRF8 L3MBTL3 MECP2 MXD1 HOXD1 IRF9 L3MBTL4 MED24 MXD3 HOXD10 IRX1 LASS2 MED30 MXD4 HOXD11 IRX2 LASS3 MED4 MXI1 HOXD12 IRX3 LASS4 MEF2A MYB HOXD13 IRX4 LASS5 MEF2B MYBL1 HOXD3 IRX5 LASS6 MEF2C MYBL2 HOXD4 IRX6 LBX1 MEF2D MYC HOXD8 ISL1 LBX2 MEIS1 MYCL1 HOXD9 ISL2 LCOR MEIS2 MYCN HR ISX LCORL MEIS3 MYF5 HSF1 JARID2 LEF1 MEIS3P1 MYF6 HSF2 JDP2 LEUTX MEIS3P2 MYNN HSF4 JUN LHX1 MEOX1 MYOD1 HSF5 JUNB LHX2 MEOX2 MYOG HSFX1 JUND LHX3 MESP1 MYPOP HSFX2 KDM1 LHX4 MESP2 MYST2 HSFY1 KDM4A LHX5 MGA MYST4 HSFY2 KDM5A LHX6 MITF MYT1 ID1 KDM5B LHX8 MIXL1 MYT1L ID2 KDM5C LHX9 MKX MZF1 ID3 KHDRBS1 LMX1A MLL NANOG ID4 KIAA2018 LMX1B MLL4 NANOGP1 IFI16 KLF1 LOC360030 MLLT1 NANOGP8 IKBKB KLF10 LRCH4 MLLT10 NAT14 IKZF1 KLF11 LRRFIP1 MLLT3 NCOA1 IKZF2 KLF12 LYL1 MLX NCOA2 IKZF3 KLF13 LZTR1 MLXIP NCOR1 IKZF4 KLF14 LZTS1 MLXIPL NEUROD1 IKZF5 KLF15 MAF MNT NEUROD2 ILF3 KLF16 MAFA MNX1 NEUROD4 ING2 KLF17 MAFB MRPL28 NEUROD6 INSM1 KLF2 MAFF MSC NEUROG1 IRAK1 KLF3 MAFG MSL3 NEUROG2 IRF1 KLF4 MAFK MSRB2 NEUROG3 66 Figure 1 (continued). Transcription Factor Genes (All). 
NFAT5 NOBOX OTP PLAGL1 RBAK NFATC1 NOTCH1 OTX1 PLAGL2 RBBP7 NFATC2 NOTO OTX2 POU1F1 RBPJ NFATC3 NPAS1 OVOL1 POU2F1 RBPJL NFATC4 NPAS2 PA2G4 POU2F2 RCAN1 NFE2 NPAS3 PATZ1 POU2F3 RCOR2 NFE2L1 NPAS4 PAX1 POU3F1 REL NFE2L2 NR0B1 PAX2 POU3F2 RELA NFE2L3 NR0B2 PAX3 POU3F3 RELB NFIA NR1D1 PAX4 POU3F4 RERE NFIB NR1D2 PAX5 POU4F1 REST NFIC NR1H2 PAX6 POU4F2 REXO4 NFIL3 NR1H3 PAX7 POU4F3 RFX1 NFIX NR1H4 PAX8 POU5F1 RFX2 NFKB1 NR1I2 PAX9 POU5F1B RFX3 NFKB2 NR1I3 PBX1 POU5F2 RFX4 NFRKB NR2C1 PBX2 POU6F1 RFX5 NFX1 NR2C2 PBX3 POU6F2 RFX6 NFXL1 NR2E1 PBX4 PPARA RFX7 NFYA NR2E3 PCGF2 PPARD RFXANK NFYB NR2F1 PCGF6 PPARG RFXAP NFYC NR2F2 PDX1 PRDM1 RHOXF1 NHLH2 NR2F6 PEG3 PRDM10 RHOXF2 NKRF NR3C1 PFDN1 PRDM16 RHOXF2B NKX1-1 NR3C2 PGBD1 PRDM2 RING1 NKX1-2 NR4A1 PGR PRDM4 RLF NKX2 NR4A2 PHB2 PRDM5 RNF4 NKX2-1 NR4A3 PHF1 PRDM7 RORA NKX2-2 NR5A1 PHF12 PREB RORB NKX2-3 NR5A2 PHF20 PROP1 RORC NKX2-4 NR6A1 PHF5A PROX1 RREB1 NKX2-5 NRF1 PHOX2A PRRX1 RSF1 NKX2-6 NRK PHOX2B PRRX2 RUNX1 NKX2-8 NRL PHTF1 PTF1A RUNX1T1 NKX3-1 OLIG1 PIAS1 PTTG1 RUNX2 NKX3-2 OLIG2 PITX1 PURB RUNX3 NKX6-1 OLIG3 PITX2 RARA RXRA NKX6-2 ONECUT1 PITX3 RARB RXRB NKX6-3 ONECUT2 PKNOX1 RARG RXRG NME2 ONECUT3 PKNOX2 RAX SALL1 NMRAL1 OSR2 PLAG1 RAX2 SALL2 67 Figure 1 (continued). Transcription Factor Genes (All). 
SALL4 SMARCC2 SPIB TCF15 TLX3 SATB1 SNAI1 SPIC TCF19 TP53 SATB2 SNAI2 SREBF1 TCF20 TP53BP1 SBNO2 SNAI3 SREBF2 TCF21 TP63 SCAND1 SNAPC2 SRF TCF23 TP73 SCAND2 SNAPC4 SRY TCF24 TPRX1 SCAND3 SNAPC5 ST18 TCF25 TRERF1 SCMH1 SOHLH1 STAT1 TCF3 TRIM22 SCML1 SOLH STAT2 TCF4 TRIM25 SCML2 SOX1 STAT3 TCF7 TRIM29 SCRT1 SOX10 STAT4 TCF7L1 TRIM66 SCRT2 SOX11 STAT5A TCF7L2 TRPS1 SCXA SOX12 STAT5B TCFL5 TSC22D1 SEBOX SOX13 STAT6 TEAD1 TSC22D2 SEC14L2 SOX14 SUPT3H TEAD2 TSC22D3 SERTAD1 SOX15 SUPT4H1 TEAD3 TSC22D4 SERTAD3 SOX17 SUPT6H TEAD4 TSHZ1 SETDB1 SOX18 T TEF TSHZ2 SHOX SOX2 TADA2A TFAM TSHZ3 SHOX2 SOX21 TAF13 TFAP2A TUB SIM1 SOX3 TAF1B TFAP2B TULP3 SIM2 SOX30 TAL1 TFAP2C TULP4 SIX1 SOX4 TAL2 TFAP2D TWIST1 SIX2 SOX5 TARDBP TFAP2E TWIST2 SIX3 SOX6 TBPL2 TFAP4 UBE2V1 SIX4 SOX7 TBR1 TFCP2 UBN1 SIX5 SOX8 TBX1 TFCP2L1 UBP1 SIX6 SOX9 TBX10 TFDP1 UBTF SLC26A3 SP1 TBX15 TFDP2 UHRF1 SLC2A4RG SP110 TBX18 TFDP3 UNCX SLC30A9 SP140 TBX19 TFE3 USF1 SMAD1 SP2 TBX2 TFEB USF2 SMAD3 SP3 TBX20 TFEC VAV1 SMAD4 SP4 TBX21 TGIF1 VAX1 SMAD5 SP5 TBX22 TGIF2 VAX2 SMAD6 SP6 TBX3 TGIF2LX VDR SMAD7 SP7 TBX4 TGIF2LY VENTX SMAD9 SP8 TBX5 THRA VEZF1 SMARCA4 SPDEF TBX6 THRB VPS72 SMARCA5 SPEN TCEAL1 TLX1 VSX1 SMARCC1 SPI1 TCF12 TLX2 VSX2 68 Figure 1 (continued). Transcription Factor Genes (All). 
WHSC1 ZKSCAN3 ZNF236 ZNF45 ZSCAN22 WT1 ZKSCAN4 ZNF238 ZNF461 ZSCAN23 XBP1 ZKSCAN5 ZNF239 ZNF480 ZSCAN29 YBX1 ZNF10 ZNF24 ZNF483 ZSCAN4 YEATS4 ZNF117 ZNF253 ZNF488 ZSCAN5A YWHAH ZNF131 ZNF256 ZNF492 ZSCAN5B YY1 ZNF132 ZNF263 ZNF496 ZSCAN5C YY2 ZNF133 ZNF268 ZNF498 ZXDA ZBED1 ZNF134 ZNF274 ZNF500 ZXDC ZBTB16 ZNF135 ZNF277 ZNF589 ZBTB17 ZNF138 ZNF281 ZNF606 ZBTB2 ZNF14 ZNF282 ZNF628 ZBTB25 ZNF140 ZNF287 ZNF639 ZBTB32 ZNF141 ZNF292 ZNF642 ZBTB33 ZNF142 ZNF3 ZNF649 ZBTB38 ZNF143 ZNF323 ZNF675 ZBTB4 ZNF148 ZNF33A ZNF69 ZBTB48 ZNF154 ZNF33B ZNF692 ZBTB5 ZNF155 ZNF35 ZNF70 ZBTB7A ZNF157 ZNF350 ZNF71 ZBTB7B ZNF165 ZNF354A ZNF75C ZEB1 ZNF167 ZNF367 ZNF75D ZEB2 ZNF169 ZNF37A ZNF76 ZFHX2 ZNF174 ZNF382 ZNF8 ZFHX3 ZNF175 ZNF384 ZNF80 ZFHX4 ZNF18 ZNF394 ZNF81 ZFP36L1 ZNF187 ZNF395 ZNF83 ZFP36L2 ZNF189 ZNF396 ZNF85 ZFP37 ZNF19 ZNF397 ZNF90 ZFP42 ZNF192 ZNF397OS ZNF91 ZFP57 ZNF193 ZNF398 ZNF92 ZFPM2 ZNF197 ZNF41 ZNF93 ZGLP1 ZNF202 ZNF418 ZRANB2 ZHX1 ZNF207 ZNF423 ZSCAN1 ZHX2 ZNF211 ZNF436 ZSCAN10 ZHX3 ZNF213 ZNF438 ZSCAN12 ZIC1 ZNF215 ZNF44 ZSCAN16 ZIC2 ZNF217 ZNF444 ZSCAN18 ZIC3 ZNF219 ZNF445 ZSCAN2 ZKSCAN1 ZNF224 ZNF446 ZSCAN20 ZKSCAN2 ZNF232 ZNF449 ZSCAN21 69 Figure 2. Transcription Factor Genes (Activators). This figure represents a list of collected genes with documented transcriptional activation function. 
AFF1 CITED1 ETV2 HMBOX1 MBD2 AFF3 CITED2 ETV4 HMGN5 MED24 AHR CLOCK ETV5 HNF1A MED30 AIRE CNOT7 FEZF1 HNF1B MED4 ALS2CR8 CREB1 FOSB HNF4G MITF ALX1 CREB3L1 FOSL1 HNRNPD MLXIPL AR CREB3L2 FOXA1 HOXA9 MNT ARID1A CREB3L4 FOXA2 HOXB9 MTA2 ARNTL CREBBP FOXA3 HOXC9 MTF1 ASCL1 CREM FOXC1 HOXD9 MXD4 ASCL2 CRX FOXC2 HSFY1 MYB ASH1L CTCF FOXF1 HSFY2 MYBL1 ATF2 CTCFL FOXF2 IKBKB MYC ATF3 CUX1 FOXH1 IKZF2 MYCN ATF4 CXXC1 FOXI1 ILF3 MYF5 ATF5 DEAF1 FOXK1 ING2 MYOD1 ATF6 DMAP1 FOXO1 IRAK1 MYOG ATF6B DMTF1 FOXO3 IRF1 MYST4 ATOH1 DNM2 FOXO4 IRF2 NAT14 BCL2 DUX4 FOXP3 IRF3 NCOA1 BHLHE23 E2F1 FUBP3 IRF4 NEUROD1 BMP2 E2F2 GATA1 IRF7 NEUROD2 BRCA2 E2F3 GATA2 IRF8 NFATC1 BRD8 E4F1 GATA4 ISL1 NFATC2 BRF1 EBF1 GATA5 JUN NFE2 BRPF1 EGR1 GATA6 JUND NFE2L2 CAMTA1 EGR4 GLI1 KDM5A NFIA CAMTA2 ELF1 GLI2 KLF1 NFKB1 CAND1 ELF2 GLIS1 KLF11 NKX2 CAND2 ELF4 GLIS2 KLF13 NKX2-1 CD80 ELF5 GLIS3 KLF2 NKX2-3 CD86 ELK1 GPN1 KLF4 NKX2-5 CDH1 EOMES HAND2 KLF5 NPAS4 CDX1 EP300 HDAC1 KLF6 NR1H4 CDX2 EPAS1 HDAC2 KLF7 NR3C1 CEBPA EPC1 HDAC4 KLF9 NR4A1 CEBPB ESR1 HIF1A LEF1 NR4A3 CEBPE ESR2 HIVEP3 MAFA NR5A1 CEP290 ESRRG HLF MAFF NR5A2 CHURC1 ETV1 HLTF MAZ ONECUT1 70 Figure 2 (continued). Transcription Factor Genes (Activators). PAX8 SPI1 ZNF492 PBX2 SREBF1 ZSCAN21 PDX1 SRY ZXDC PIAS1 STAT1 POU1F1 STAT2 POU5F1B STAT3 PPARA STAT4 PPARD STAT5A PPARG STAT5B PRDM2 STAT6 PREB TAL1 PROP1 TBX20 PTF1A TCF12 RARB TCF3 RFX2 TEAD1 RFX3 TEAD2 RFXANK TFAP2B RORA TFAP2D RREB1 TFAP4 RSF1 TFE3 RUNX1 TFEC RUNX2 TP53 RUNX3 TP53BP1 SEC14L2 TP63 SERTAD1 TP73 SERTAD3 UBE2V1 SMAD1 UBTF SMAD3 WT1 SMAD4 YWHAH SMAD5 YY1 SMARCA4 YY2 SMARCA5 ZNF143 SOHLH1 ZNF148 SOX10 ZNF238 SOX11 ZNF281 SOX2 ZNF287 SOX6 ZNF384 SOX9 ZNF397 SP1 ZNF398 SP110 ZNF423 SP3 ZNF480 71 Figure 3. Transcription Factor Genes (Repressors). This figure represents a list of collected genes with documented transcriptional repression function. 
AEBP1 DMBX1 GFI1 IRF1 MYST4 ARID4A DNAJB6 GFI1B IRF2 MZF1 ARID5A DNMT3L GLI3 JARID2 NCOA2 ARID5B DRAP1 GLIS1 JDP2 NCOR1 ARNT2 E2F1 GLIS2 KDM1 NFIL3 ASCL1 E2F6 GLIS3 KDM4A NFKB1 ATF3 E2F7 GZF1 KDM5B NFX1 ATXN1 E2F8 HAND1 KDM5C NKRF BACH1 E4F1 HBP1 KHDRBS1 NKX2-5 BACH2 EED HDAC2 KLF12 NMRAL1 BATF EHF HDAC4 KLF15 NR0B1 BATF3 ELF1 HELT KLF16 NR0B2 BCL11A ELF3 HES1 KLF3 NR1D2 BCL11B EPC1 HES5 KLF4 NR1I2 BCL3 ERF HES6 KLF8 NR2C1 BCL6 ESR1 HES7 KLF9 NR2E1 BCL6B ESRRB HESX1 L3MBTL NR2F1 BCLAF1 ESX1 HEXIM2 LRCH4 NR3C1 BHLHE22 ETV3 HEY1 LRRFIP1 PATZ1 BHLHE40 ETV6 HEY2 MAFB PAX3 BHLHE41 ETV7 HHEX MAFF PCGF2 BMP2 EVX1 HIC2 MAFK PCGF6 C2orf3 EZH2 HINFP MBD1 PHB2 CALR FABP4 HIRA MBD2 PHF1 CBX2 FERD3L HIVEP1 MECOM PHF12 CDX1 FEV HMBOX1 MECP2 PIAS1 CDX2 FIZ1 HMGB2 MEIS1 PLAGL1 CEBPA FLI1 HMX1 MITF POU2F1 CITED1 FOXE1 HOPX MLXIPL PPARD CNOT1 FOXG1 HOXA7 MNT PPARG CREBZF FOXK1 HOXA9 MSC PRDM16 CREM FOXN3 HOXB4 MSX1 RARA CTBP2 FOXP1 HOXC8 MSX2 RBAK CTCF FOXP2 ID1 MTA2 RBBP7 CUX1 FOXP4 ID2 MXD1 RELA CUX2 FOXS1 IFI16 MXD3 REST DAXX GATA1 IKZF1 MXD4 RFX1 DDX20 GATA2 IKZF4 MXI1 RFX3 DLX4 GATAD2A ILF3 MYB RFX5 DMAP1 GATAD2B INSM1 MYC RING1 72 Figure 3 (continued). Transcription Factor Genes (Repressors). RREB1 TP63 ZNF217 RSF1 TP73 ZNF219 RUNX1T1 TRIM66 ZNF224 RUNX3 TRPS1 ZNF238 RXRA TSC22D4 ZNF24 SALL1 TWIST2 ZNF253 SATB1 VENTX ZNF256 SBNO2 WHSC1 ZNF263 SCMH1 YBX1 ZNF268 SCRT1 YY1 ZNF281 SCRT2 YY2 ZNF282 SETDB1 ZBTB16 ZNF350 SIM2 ZBTB2 ZNF382 SIX3 ZBTB38 ZNF395 SNAI1 ZBTB4 ZNF397 SNAI2 ZBTB5 ZNF423 SOX14 ZBTB7A ZNF438 SOX9 ZEB1 ZNF44 SP3 ZEB2 ZNF461 SPEN ZFHX3 ZNF488 SRY ZFP57 ZNF589 ST18 ZGLP1 ZNF606 TAL1 ZHX2 ZNF639 TARDBP ZIC2 ZNF649 TBX2 ZNF10 ZNF675 TBX3 ZNF133 ZNF76 TCF25 ZNF14 ZNF8 TCF7L2 ZNF148 ZNF91 TFAP2B ZNF174 ZSCAN16 TGIF2 ZNF175 TP53 ZNF197
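Genes that carry both an activator record (Figure 2) and a repressor record (Figure 3) can be found by intersecting the two lists. The thesis performed its grouping with Microsoft Access tools; the Python sketch below is only an illustrative equivalent, not the method actually used. The function name `dual_function_genes` is hypothetical, and the two input lists are short subsets of Figures 2 and 3 chosen for the example.

```python
def dual_function_genes(activators, repressors):
    """Return, in alphabetical order, the genes present in both collections."""
    return sorted(set(activators) & set(repressors))


# Small illustrative subsets of the activator and repressor gene lists.
activators = ["AFF1", "ASCL1", "ATF3", "MYC", "TP53"]
repressors = ["AEBP1", "ASCL1", "ATF3", "MYC", "TP53", "ZEB1"]

print(dual_function_genes(activators, repressors))
# prints ['ASCL1', 'ATF3', 'MYC', 'TP53']
```

Using sets makes the result independent of the order in which the genes were collected and discards any accidental duplicate entries.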

Figure 4. Transcription Factor Genes (Activators and Repressors). This figure represents a list of collected genes with both activator and repressor function.

ASCL1 ATF3 BMP2 CDX1 CDX2 CEBPA CITED1 CREM CTCF CUX1 DMAP1 E2F1 E4F1 ELF1 EPC1 ESR1 FOXK1 GATA1 GATA2 GLIS1 GLIS2 GLIS3 HDAC2 HDAC4 HMBOX1 HOXA9 ILF3 IRF1 IRF2 KLF4 KLF9 MAFF MBD2 MITF MLXIPL MNT MTA2 MXD4 MYB MYC MYST4 NFKB1 NKX2-5 NR3C1 PIAS1 PPARD PPARG RFX3 RREB1 RSF1 RUNX3 SOX9 SP3 SRY TAL1 TFAP2B TP53 TP63 TP73 YY1 YY2 ZNF148 ZNF238 ZNF281 ZNF397 ZNF423

Figure 5. Phylogenetic Relationship Between Genes with KRAB Domain. Protein alignment was performed for transcription factors containing a KRAB domain in their sequence. The phylogenetic tree shows the family relationships between the genes as well as the evolutionary relationships among the included members.

Figure 6. Phylogenetic Relationship Between Genes with SCAN Domain. Protein alignment was performed for transcription factors containing a SCAN domain in their sequence. The phylogenetic tree shows the family relationships between the genes as well as the evolutionary relationships among the included members.

Figure 7. Phylogenetic Relationship Between Genes with BTB Domain. Protein alignment was performed for transcription factors containing a BTB domain in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

Figure 8. Phylogenetic Relationship Between Genes with ZF C2H2 Domain but without KRAB, SCAN, or BTB Domains. Protein alignment was performed for transcription factors containing a ZF C2H2 domain but no KRAB, SCAN, or BTB domains. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

Figure 9. Phylogenetic Relationship Between Genes with Homeobox Domain. Protein alignment was performed for transcription factors containing a Homeobox domain in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

Figure 10. Phylogenetic Relationship Between Genes with HLH Domain. Protein alignment was performed for transcription factors containing an HLH domain in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

Figure 11. Phylogenetic Relationship Between Genes with bZIP Domain. Protein alignment was performed for transcription factors containing a bZIP domain in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

Figure 12. Phylogenetic Relationship Between Genes with ZF C4 Domain. Protein alignment was performed for transcription factors containing a ZF C4 domain in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

Figure 13. Phylogenetic Relationship Between Genes with Forkhead Domain. Protein alignment was performed for transcription factors containing a Forkhead domain in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

Figure 14. Phylogenetic Relationship Between p53-like Genes. Protein alignment was performed for transcription factors containing p53-like domains such as p53, RUNT, RHD, STAT, and T-box in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

Figure 15. Phylogenetic Relationship Between Genes with HMG Domain. Protein alignment was performed for transcription factors containing an HMG domain in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

Figure 16. Phylogenetic Relationship Between Genes with ETS (a) and TIG (b) Domains. Protein alignment was performed for transcription factors containing an ETS domain (a) or a TIG domain (b) in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from these phylogenetic trees.

a) ETS domain genes

b) TIG domain genes

Figure 17. Phylogenetic Relationship Between Genes with POU (a), SAND (b), and IRF (c) Domains. Protein alignment was performed for transcription factors containing a POU domain (a), a SAND domain (b), or an IRF domain (c) in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from these phylogenetic trees.

a) POU domain genes

b) SAND domain genes

c) IRF domain genes

Figure 18. Phylogenetic Relationship Between Genes with GATA (a), DM (b), and HSF (c) Domains. Protein alignment was performed for transcription factors containing a GATA domain (a), a DM domain (b), or an HSF domain (c) in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from these phylogenetic trees.

a) GATA domain genes

b) DM domain genes

c) HSF domain genes

Figure 19. Phylogenetic Relationship Between Genes with CP2 (a), RFX (b), and AP2 (c) Domains. Protein alignment was performed for transcription factors containing a CP2 domain (a), an RFX domain (b), or an AP2 domain (c) in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from these phylogenetic trees.

a) CP2 domain genes

b) RFX domain genes

c) AP-2 domain genes

Figure 20. Phylogenetic Relationship Between Genes with MYB (a) and MH1 (b) Domains. Protein alignment was performed for transcription factors containing a MYB domain (a) or an MH1 domain (b) in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from these phylogenetic trees.

a) MYB domain genes

b) MH1 domain genes

Figure 21. Phylogenetic Relationship Between Genes with PAX (a), ARID (b), and CUT (c) Domains. Protein alignment was performed for transcription factors containing a PAX domain (a), an ARID domain (b), or a CUT domain (c) in their sequence. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from these phylogenetic trees.

a) PAX domain genes

b) ARID domain genes

c) CUT domain genes

Figure 22. Phylogenetic Relationship Among Genes with Documented Transcriptional Activation Function. Protein alignment was performed for transcription factors with documented transcriptional activation activity. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

Figure 23. Phylogenetic Relationship Among Genes with Documented Transcriptional Repression Function. Protein alignment was performed for transcription factors with documented transcriptional repression activity. Family relationships between the genes, as well as evolutionary relationships among the included members, can be obtained from this phylogenetic tree.

WORKS CITED

Ahmad, K. F., Melnick, A., Lax, S., Bouchard, D., Liu, J., Kiang, C. L., Mayer, S., Takahashi, S., Licht, J. D., and Prive, G. G. (2003). Mechanism of SMRT corepressor recruitment by the BCL6 BTB domain. Mol Cell 12, 1551-64.

Alazard, R., Mourey, L., Ebel, C., Konarev, P. V., Petoukhov, M. V., Svergun, D. I., and Erard, M. (2007). Fine-tuning of intrinsic N-Oct-3 POU domain allostery by regulatory DNA targets. Nucleic Acids Res 35, 4420-32.

Aravind, L., and Koonin, E. V. (2000). SAP - a putative DNA-binding motif involved in chromosomal organization. Trends Biochem Sci 25, 112-4.

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M., and Sherlock, G. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-9.

Attisano, L., and Lee-Hoeflich, S. T. (2001). The Smads. Genome Biol 2, REVIEWS3010.

Ayyanathan, K., Lechner, M. S., Bell, P., Maul, G. G., Schultz, D. C., Yamada, Y., Tanaka, K., Torigoe, K., and Rauscher, F. J., 3rd. (2003). Regulated recruitment of HP1 to a euchromatic gene induces mitotically heritable, epigenetic gene silencing: a mammalian cell culture model of gene variegation. Genes Dev 17, 1855-69.

Ayyanathan, K., Peng, H., Hou, Z., Fredericks, W. J., Goyal, R. K., Langer, E. M., Longmore, G. D., and Rauscher, F. J., 3rd. (2007). The Ajuba LIM domain protein is a corepressor for SNAG domain mediated repression and participates in nucleocytoplasmic shuttling. Cancer Res 67, 9097-106.

Biedenkapp, H., Borgmeyer, U., Sippel, A. E., and Klempnauer, K. H. (1988). Viral myb oncogene encodes a sequence-specific DNA-binding activity. Nature 335, 835-7.

Bottomley, M. J., Collard, M. W., Huggenvik, J. I., Liu, Z., Gibson, T. J., and Sattler, M. (2001). The SAND domain structure defines a novel DNA-binding fold in transcriptional regulation. Nat Struct Biol 8, 626-33.

Boyadjiev, S. A., and Jabs, E. W. (2000). Online Mendelian Inheritance in Man (OMIM) as a knowledgebase for human developmental disorders. Clin Genet 57, 253-66.

Bustin, S. A., and McKay, I. A. (1994). Transcription factors: targets for new designer drugs. Br J Biomed Sci 51, 147-57.

Butt, T. R., and Karathanasis, S. K. (1995). Transcription factors as drug targets: opportunities for therapeutic selectivity. Gene Expr 4, 319-36.

Callebaut, I., Courvalin, J. C., and Mornon, J. P. (1999). The BAH (bromo-adjacent homology) domain: a link between DNA methylation, replication and transcriptional regulation. FEBS Lett 446, 189-93.

Claessens, F., and Gewirth, D. T. (2004). DNA recognition by nuclear receptors. Essays Biochem 40, 59-72.

Clos, J., Westwood, J. T., Becker, P. B., Wilson, S., Lambert, K., and Wu, C. (1990). Molecular cloning and expression of a hexameric Drosophila heat shock factor subject to negative regulation. Cell 63, 1085-97.

Costoya, J. A. (2007). Functional analysis of the role of POK transcriptional repressors. Brief Funct Genomic Proteomic 6, 8-18.

Darnell, J. E., Jr. (2002). Transcription factors as targets for cancer therapy. Nat Rev Cancer 2, 740-9.

Dawson, S. R., Turner, D. L., Weintraub, H., and Parkhurst, S. M. (1995). Specificity for the hairy/enhancer of split basic helix-loop-helix (bHLH) proteins maps outside the bHLH domain and suggests two separable modes of transcriptional repression. Mol Cell Biol 15, 6923-31.

Ding, Z., Gillespie, L. L., and Paterno, G. D. (2003). Human MI-ER1 alpha and beta function as transcriptional repressors by recruitment of histone deacetylase 1 to their conserved ELM2 domain. Mol Cell Biol 23, 250-8.

Doerks, T., Copley, R. R., Schultz, J., Ponting, C. P., and Bork, P. (2002). Systematic identification of novel protein domain families associated with nuclear functions. Genome Res 12, 47-56.

Ellenberger, T., Fass, D., Arnaud, M., and Harrison, S. C. (1994). Crystal structure of transcription factor E47: E-box recognition by a basic region helix-loop-helix dimer. Genes Dev 8, 970-80.

Erdman, S. E., and Burtis, K. C. (1993). The Drosophila doublesex proteins share a novel zinc finger related DNA binding domain. Embo J 12, 527-35.

Friedman, J. R., Fredericks, W. J., Jensen, D. E., Speicher, D. W., Huang, X. P., Neilson, E. G., and Rauscher, F. J., 3rd. (1996). KAP-1, a novel corepressor for the highly conserved KRAB repression domain. Genes Dev 10, 2067-78.

Gehring, W. J. (1992). The homeobox in perspective. Trends Biochem Sci 17, 277-80.

Gehring, W. J., Muller, M., Affolter, M., Percival-Smith, A., Billeter, M., Qian, Y. Q., Otting, G., and Wuthrich, K. (1990). The structure of the homeodomain and its functional implications. Trends Genet 6, 323-9.

Ghosh, D., and Papavassiliou, A. G. (2005). Transcription factor therapeutics: long-shot or lodestone. Curr Med Chem 12, 691-701.

Grandori, C., and Eisenman, R. N. (1997). Myc target genes. Trends Biochem Sci 22, 177-81.

Gronemeyer, H., Gustafsson, J. A., and Laudet, V. (2004). Principles for modulation of the nuclear receptor superfamily. Nat Rev Drug Discov 3, 950-64.

Grune, T., Brzeski, J., Eberharter, A., Clapier, C. R., Corona, D. F., Becker, P. B., and Muller, C. W. (2003). Crystal structure and functional analysis of a nucleosome recognition module of the remodeling factor ISWI. Mol Cell 12, 449-60.

Hacker, U., Grossniklaus, U., Gehring, W. J., and Jackle, H. (1992). Developmentally regulated Drosophila gene family encoding the fork head domain. Proc Natl Acad Sci U S A 89, 8754-8.

Huntley, S., Baggott, D. M., Hamilton, A. T., Tran-Gyamfi, M., Yang, S., Kim, J., Gordon, L., Branscomb, E., and Stubbs, L. (2006). A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors. Genome Res 16, 669-77.

Jimenez-Sanchez, G., Childs, B., and Valle, D. (2001). Human disease genes. Nature 409, 853-5.

Kelly, K. F., and Daniel, J. M. (2006). POZ for effect--POZ-ZF transcription factors in cancer and development. Trends Cell Biol 16, 578-87.

Kisseleva, T., Bhattacharya, S., Braunstein, J., and Schindler, C. W. (2002). Signaling through the JAK/STAT pathway, recent advances and future challenges. Gene 285, 1-24.

Kortschak, R. D., Tucker, P. W., and Saint, R. (2000). ARID proteins come in from the desert. Trends Biochem Sci 25, 294-9.

Lannoy, V. J., Burglin, T. R., Rousseau, G. G., and Lemaigre, F. P. (1998). Isoforms of hepatocyte nuclear factor-6 differ in DNA-binding properties, contain a bifunctional homeodomain, and define the new ONECUT class of homeodomain proteins. J Biol Chem 273, 13552-62.

Littlewood, T. D., and Evan, G. I. (1995). Transcription factors 2: helix-loop-helix. Protein Profile 2, 621-702.

Lopez-Bigas, N., Blencowe, B. J., and Ouzounis, C. A. (2006). Highly consistent patterns for inherited human diseases at the molecular level. Bioinformatics 22, 269-77.

Melnick, A., Ahmad, K. F., Arai, S., Polinger, A., Ball, H., Borden, K. L., Carlile, G. W., Prive, G. G., and Licht, J. D. (2000). In-depth mutational analysis of the promyelocytic leukemia zinc finger BTB/POZ domain reveals motifs and residues required for biological and transcriptional functions. Mol Cell Biol 20, 6550-67.

Messina, D. N., Glasscock, J., Gish, W., and Lovett, M. (2004). An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression. Genome Res 14, 2041-7.

Moehren, U., Eckey, M., and Baniahmad, A. (2004). Gene repression by nuclear hormone receptors. Essays Biochem 40, 89-104.

Moellering, R. E., Cornejo, M., Davis, T. N., Del Bianco, C., Aster, J. C., Blacklow, S. C., Kung, A. L., Gilliland, D. G., Verdine, G. L., and Bradner, J. E. (2009). Direct inhibition of the NOTCH transcription factor complex. Nature 462, 182-8.

Muller, C. W., Rey, F. A., Sodeoka, M., Verdine, G. L., and Harrison, S. C. (1995). Structure of the NF-kappa B p50 homodimer bound to DNA. Nature 373, 311-7.

Nieto, M. A. (2002). The snail superfamily of zinc-finger transcription factors. Nat Rev Mol Cell Biol 3, 155-66.

Oikawa, T., and Yamada, T. (2003). Molecular biology of the Ets family of transcription factors. Gene 303, 11-34.

Otting, G., Qian, Y. Q., Billeter, M., Muller, M., Affolter, M., Gehring, W. J., and Wuthrich, K. (1990). Protein--DNA contacts in the structure of a homeodomain--DNA complex determined by nuclear magnetic resonance spectroscopy in solution. Embo J 9, 3085-92.

Overington, J. P., Al-Lazikani, B., and Hopkins, A. L. (2006). How many drug targets are there? Nat Rev Drug Discov 5, 993-6.

Papavassiliou, A. G. (1998). Transcription-factor-modulating agents: precision and selectivity in drug design. Mol Med Today 4, 358-66.

Pastore, A., De Francesco, R., Barbato, G., Castiglione Morelli, M. A., Motta, A., and Cortese, R. (1991). 1H resonance assignment and secondary structure determination of the dimerization domain of transcription factor LFB1. Biochemistry 30, 148-53.

Prevot, D., Voeltzel, T., Birot, A. M., Morel, A. P., Rostan, M. C., Magaud, J. P., and Corbo, L. (2000). The leukemia-associated protein Btg1 and the p53-regulated protein Btg2 interact with the homeoprotein Hoxb9 and enhance its transcriptional activation. J Biol Chem 275, 147-53.

Reith, W., Herrero-Sanchez, C., Kobr, M., Silacci, P., Berte, C., Barras, E., Fey, S., and Mach, B. (1990). MHC class II regulatory factor RFX has a novel DNA-binding domain and a functionally independent dimerization domain. Genes Dev 4, 1528-40.

Rings, E. H., Boudreau, F., Taylor, J. K., Moffett, J., Suh, E. R., and Traber, P. G. (2001). Phosphorylation of the serine 60 residue within the Cdx2 activation domain mediates its transactivation capacity. Gastroenterology 121, 1437-50.

Roose, J., Molenaar, M., Peterson, J., Hurenkamp, J., Brantjes, H., Moerer, P., van de Wetering, M., Destree, O., and Clevers, H. (1998). The Wnt effector XTcf-3 interacts with Groucho-related transcriptional repressors. Nature 395, 608-12.

Ryan, R. F., Schultz, D. C., Ayyanathan, K., Singh, P. B., Friedman, J. R., Fredericks, W. J., and Rauscher, F. J., 3rd. (1999). KAP-1 corepressor protein interacts and colocalizes with heterochromatic and euchromatic HP1 proteins: a potential role for Kruppel-associated box-zinc finger proteins in heterochromatin-mediated gene silencing. Mol Cell Biol 19, 4366-78.

Sander, T. L., Stringer, K. F., Maki, J. L., Szauter, P., Stone, J. R., and Collins, T. (2003). The SCAN domain defines a large family of zinc finger transcription factors. Gene 310, 29-38.

Schreiber, J., Enderich, J., and Wegner, M. (1998). Structural requirements for DNA binding of GCM proteins. Nucleic Acids Res 26, 2337-43.

Schultz, D. C., Ayyanathan, K., Negorev, D., Maul, G. G., and Rauscher, F. J., 3rd. (2002). SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes Dev 16, 919-32.

Schultz, D. C., Friedman, J. R., and Rauscher, F. J., 3rd. (2001). Targeting histone deacetylase complexes via KRAB-zinc finger proteins: the PHD and bromodomains of KAP-1 form a cooperative unit that recruits a novel isoform of the Mi-2alpha subunit of NuRD. Genes Dev 15, 428-43.

Schumacher, C., Wang, H., Honer, C., Ding, W., Koehn, J., Lawrence, Q., Coulis, C. M., Wang, L. L., Ballinger, D., Bowen, B. R., and Wagner, S. (2000). The SCAN domain mediates selective oligomerization. J Biol Chem 275, 17173-9.

Stogios, P. J., Downs, G. S., Jauhal, J. J., Nandra, S. K., and Prive, G. G. (2005). Sequence and structural analysis of BTB domain proteins. Genome Biol 6, R82.

Thomas, J. O., and Travers, A. A. (2001). HMG1 and 2, and related 'architectural' DNA-binding proteins. Trends Biochem Sci 26, 167-74.

Urrutia, R. (2003). KRAB-containing zinc-finger repressor proteins. Genome Biol 4, 231.

Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A., and Luscombe, N. M. (2009). A census of human transcription factors: function, expression and evolution. Nat Rev Genet 10, 252-63.

Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., Gocayne, J. D., Amanatides, P., Ballew, R. M., Huson, D. H., Wortman, J. R., Zhang, Q., Kodira, C. D., Zheng, X. H., Chen, L., Skupski, M., Subramanian, G., Thomas, P. D., Zhang, J., Gabor Miklos, G. L., Nelson, C., Broder, S., Clark, A. G., Nadeau, J., McKusick, V. A., Zinder, N., Levine, A. J., Roberts, R. J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu-Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Di Francesco, V., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A. E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T. J., Higgins, M. E., Ji, R. R., Ke, Z., Ketchum, K. A., Lai, Z., Lei, Y., Li, Z., Li, J., Liang, Y., Lin, X., Lu, F., Merkulov, G. V., Milshina, N., Moore, H. M., Naik, A. K., Narayan, V. A., Neelam, B., Nusskern, D., Rusch, D. B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z., Wang, A., Wang, X., Wang, J., Wei, M., Wides, R., Xiao, C., Yan, C., et al. (2001). The sequence of the human genome. Science 291, 1304-51.

Weisz, A., Marx, P., Sharf, R., Appella, E., Driggers, P. H., Ozato, K., and Levi, B. Z. (1992). Human interferon consensus sequence binding protein is a negative regulator of enhancer elements common to interferon-inducible genes. J Biol Chem 267, 25589-96.

Williams, A. J., Blacklow, S. C., and Collins, T. (1999). The zinc finger-associated SCAN box is a conserved oligomerization domain. Mol Cell Biol 19, 8526-35.

Wilson, D., Charoensawan, V., Kummerfeld, S. K., and Teichmann, S. A. (2008). DBD--taxonomically broad transcription factor predictions: new content and functionality. Nucleic Acids Res 36, D88-92.

Wilson, V., and Conlon, F. L. (2002). The T-box family. Genome Biol 3, REVIEWS3008.

Witzgall, R., O'Leary, E., Leaf, A., Onaldi, D., and Bonventre, J. V. (1994). The Kruppel-associated box-A (KRAB-A) domain of zinc finger proteins mediates transcriptional repression. Proc Natl Acad Sci U S A 91, 4514-8.

Wu, L., Wu, H., Ma, L., Sangiorgi, F., Wu, N., Bell, J. R., Lyons, G. E., and Maxson, R. (1997). Miz1, a novel zinc finger transcription factor that interacts with Msx2 and enhances its affinity for DNA. Mech Dev 65, 3-17.

Xin, H., Taudte, S., Kallenbach, N. R., Limbach, M. P., and Zitomer, R. S. (2000). DNA binding by single HMG box model proteins. Nucleic Acids Res 28, 4044-50.

Xu, H. E., Rould, M. A., Xu, W., Epstein, J. A., Maas, R. L., and Pabo, C. O. (1999). Crystal structure of the human Pax6 paired domain-DNA complex reveals specific roles for the linker region and carboxy-terminal subdomain in DNA binding. Genes Dev 13, 1263-75.

Yamamoto, M., Ko, L. J., Leonard, M. W., Beug, H., Orkin, S. H., and Engel, J. D. (1990). Activity and tissue-specific expression of the transcription factor NF-E1 multigene family. Genes Dev 4, 1650-62.