A COMPREHENSIVE STUDY OF MAMMALIAN SNAG TRANSCRIPTION FACTOR FAMILY MEMBERS

by

Cindy Chiang

A Dissertation Submitted to the Faculty of

The Charles E. Schmidt College of Science

in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

Florida Atlantic University

Boca Raton, FL

May 2012

Copyright by Cindy Chiang 2012

A COMPREHENSIVE STUDY OF MAMMALIAN SNAG TRANSCRIPTION FACTOR FAMILY MEMBERS

by

Cindy Chiang

This dissertation was prepared under the direction of the candidate's dissertation advisor, Dr. Kasirajan Ayyanathan, Department of Biological Sciences, and has been approved by the members of her supervisory committee. It was submitted to the faculty of the Charles E. Schmidt College of Science and was accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

SUPERVISORY COMMITTEE: Kasirajan Ayyanathan, Ph.D., Dissertation Advisor; Paul Kirchman, Ph.D.

Gary W. Perry, Ph.D., Dean, The Charles E. Schmidt College of Science

Barry T. Rosson, Ph.D., Dean, Graduate College

ACKNOWLEDGMENTS

The author is grateful to her advisor Dr. Ayyanathan for her acceptance into his lab as an undergraduate and the impact he had on her scientific growth as a graduate student. She would also like to thank her committee members Dr. Paul Kirchman, Dr. Ramaswamy Narayanan, and Dr. Herbert Weissbach for not only their guidance and insight on her project but also for their encouragement and cooperation during her studies. Thanks are also expressed to members of the labs at the Center for Molecular Biology and Biotechnology and all the DIS and graduate students she met along the way.

Finally, the author wishes to express her sincere thanks to her parents David and Dona Chiang, and the rest of her family for the constant encouragement to further her education, and to her friends for all their support during her time spent working on her degree.

ABSTRACT

Author: Cindy Chiang

Title: A Comprehensive Study of Mammalian SNAG Transcription Factor Family Members

Institution: Florida Atlantic University

Dissertation Advisor: Dr. Kasirajan Ayyanathan

Degree: Doctor of Philosophy

Year: 2012

Transcriptional regulation by the family of SNAG (Snail/Gfi-1) zinc fingers has been shown to play a role in various developmental states and diseases. These transcriptional repressors function in both DNA binding and protein binding, allowing for multiple interactions by a single family member.

This work aims to characterize the SNAG members Slug, Smuc, Snail, Scratch, Gfi-1, Gfi-1B, and IA-1 in terms of both DNA-protein and protein-protein interactions.

The specific DNA sequences to which the zinc finger regions bind were determined for each member, and a general consensus, TGCACCTGTCCGA, was developed for four of the members. Through these studies, we also reveal the binding affinities of E-box (CANNTG) sequences for the members, since this core is found in the binding sites of multiple members.

Additionally, protein-protein interactions of SNAG members to other biological molecules were investigated. The Slug domain and Scratch domain have unknown function, yet through yeast two-hybrid screening, we were able to determine protein interaction partners for them as well as for other full length SNAG members. These protein-interacting partners have suggested function as corepressors during transcriptional repression.

The comprehensive information determined from these studies allows for a better understanding of the functional relationship between SNAG-ZFPs and other . The collected data not only creates a new profile for each member investigated, but it also allows for further studies to be initiated from the results.

A COMPREHENSIVE STUDY OF MAMMALIAN SNAG TRANSCRIPTION FACTOR FAMILY MEMBERS

LIST OF FIGURES
LIST OF TABLES
CHAPTER 1. INTRODUCTION
  1.1 Transcription Factors and Transcriptional Regulation
  1.2 SNAG (Snail/Gfi-1) Domain Zinc Finger Proteins (ZFPs)
  1.3 SNAG-ZFPs and E-box
  1.4 SNAG Domain ZFPs in Development and Disease
  1.5 SNAG-ZFPs Interact with Other Proteins
  1.6 Hypothesis
CHAPTER 2. GLOBAL ANALYSIS OF SNAG-ZINC FINGER TRANSCRIPTION FACTOR TARGET GENES
  2.1 Identification of Unique DNA-Binding Specificities for the SNAG- Transcription Factor Family Members
  2.2 Bacterial Expression and Purification of GST-SNAG Fusion Proteins
  2.3 Zinc-Finger Array Binding Site Interactions
  2.4 Cloning of DNA-Protein Binding Site Interactions and DNA Sequencing
  2.5 Confirmation of Binding Site Selected Sequences
  2.6 Deriving a Consensus Sequence
  2.7 E-Box as a Core of the Consensus Sequences
  2.8 Gfi-1, Gfi-1B Hs, Gfi-1B Mm, and IA-1 Consensus Sequences are Novel and Vary More
  2.9 Competition EMSAs for Derived Consensus Sequences
  2.10 An Alternative Non-Isotopic Competition Dot Blot Assay
  2.11 Target Genes and Ontology
  2.12 Target Genes Span Multiple Functional and Biological Process Categories
  2.13 Summary of Results
CHAPTER 3. CHARACTERIZATION OF E-BOX
  3.1 Functionality of GST-SNAG Fusion Proteins
  3.2 SNAG-ZFPs Are Found to Vary in E-box Sequence Binding
  3.3 Nature of E-box Sequence Can Control the Strength of Protein Binding
  3.4 Nature of the Protein Can Dictate the Strength of Binding for the Same E-box Sequence
  3.5 Conservation of Zinc Fingers May Affect Binding
  3.6 Summary of Results
CHAPTER 4. MOLECULAR CHARACTERIZATION AND IDENTIFICATION OF INTERACTING PROTEIN PARTNERS FOR SNAG FAMILY
  4.1 Identification and Characterization of SNAG-ZFP Novel Protein-Protein Interactions
  4.2 GST-Pull-Down Assay
  4.3 Identification of a Putative Slug Domain-Interacting Protein
  4.4 Identification of Interacting Partners by Yeast Two-Hybrid Screening
  4.5 Cloning and Sequencing of pGBKT7-SNAG-ZFP Bait
  4.6 Yeast Transformation and Protein Expression
  4.7 Autoactivation and Toxicity Assays Followed by Yeast Mating
  4.8 Screening of Positively Interacting Proteins
  4.9 Sequencing and Identification of Positively Interacting Proteins
  4.10 Protein-Protein Interactors Involved in Many Functions
  4.11 Protein-Interactor Expression Is Spread Throughout Multiple Tissues
  4.12 Summary of Results
CHAPTER 5. IN VIVO VALIDATION OF BINDING SITE SELECTED TARGET GENES
  5.1 Luciferase Assay Scheme
  5.2 pGL3 and pQE30 Construction
  5.3 Completion of Luciferase Assay
  5.4 Summary of Results
CHAPTER 6. DISCUSSION AND FUTURE DIRECTION
  6.1 Discussion
  6.2 Future Direction
CHAPTER 7. MATERIALS AND METHODS
  7.1 Materials
  7.2 Bacterial Expression and Protein Purification of GST-SNAG Fusion Proteins
  7.3 Zinc-Finger Array Binding Site Interactions
  7.4 Cloning of DNA-Protein Binding Site Interactions
  7.5 DNA Sequencing
  7.6 Confirmation of Binding Site Selected Sequences
  7.7 Competition EMSAs for Derived Consensus Sequences
  7.8 Alternative Non-Isotopic Competition Dot Blot Assay
  7.9 Target Gene Analysis and
  7.10 Luciferase Assay
  7.11 E-Cadherin Promoter and E-box Characterization Assays
  7.12 E-Box Competition Assays with Increasing Protein or Increasing DNA
  7.13 GST-Pull-Down Binding Assay
  7.14 Cloning and Sequencing of pGBKT7-SNAG-ZFP Bait
  7.15 Yeast Transformation
  7.16 Yeast Protein Expression
  7.17 Autoactivation and Toxicity Assays and Yeast Mating
  7.18 Yeast Plasmid Preparation
  7.19 DNA Sequencing
APPENDIX
  Table A1. Slug Target Genes
  Table A2. Smuc Target Genes
  Table A3. Snail Long Target Genes
  Table A4. Snail Short Target Genes
  Table A5. Scratch Long Target Genes
  Table A6. Scratch Short Target Genes
  Table A7. Gfi-1 Target Genes
  Table A8. Gfi-1B Hs Target Genes
  Table A9. Gfi-1B Mm Target Genes
  Table A10. IA-1 Long Target Genes
  Table A11. IA-1 Short Target Genes
REFERENCES

LIST OF FIGURES

Figure 1.1. Transcriptional regulation by a SNAG-ZFP
Figure 1.2. Structure of a C2H2 zinc finger and DNA-binding domain
Figure 2.1. Alignment of SNAG Domain Transcription Factor Family Members
Figure 2.2. Diagrammatic representation of the GST-SNAG-ZF recombinant proteins
Figure 2.3. Purified GST-SNAG ZFPs used for binding site selection
Figure 2.4. EMSA of first binding site selection of SNAG-ZFPs with the randomized oligonucleotide library
Figure 2.5. Second round of binding site selection and PCR
Figure 2.6. Third round of binding site selection and PCR
Figure 2.7. Final round of binding site selection and PCR
Figure 2.8. EcoRI/BamHI digestion of pUC18 vector
Figure 2.9. Positive recombinant clones
Figure 2.10. RNase and PCI cleaned recombinant clones
Figure 2.11. Binding of DNA to specific and nonspecific proteins
Figure 2.12. Consensus sequence construction
Figure 2.13. Competition assays
Figure 2.14. Chemiluminescent EMSA assay
Figure 2.15. Labeling efficiency of biotinylated DNA
Figure 2.16. Dot blot competition assay for Gfi-1
Figure 2.17. Gene ontology grouping
Figure 3.1. Gel mobility shift assay of E-cadherin and GST-SNAG-ZFPs
Figure 3.2. Preliminary E-box binding study
Figure 3.3. Characterization of E-box
Figure 3.4. Differences in E-box dinucleotides dictate binding with the same ZFP
Figure 3.5. Different SNAG-ZFPs bind the same E-box sequence with different affinities
Figure 3.6. ClustalW alignment of zinc fingers
Figure 4.1. Diagrammatic representation of the GST-Slug domain and -Scratch domain
Figure 4.2. ClustalW alignment of Slug domains and Scratch domains between species
Figure 4.3. Slug and Scratch domains protein expression
Figure 4.4. Microarray analysis of Slug and Scratch ZFPs
Figure 4.5. Binding assays of the Slug domain and Scratch domain
Figure 4.6. Clontech yeast two-hybrid screening background
Figure 4.7. Plasmid preparation for yeast two-hybrid assay
Figure 4.8. PCR amplification of SNAG-ZFPs for use in yeast two-hybrid assay
Figure 4.9. Confirmation of SNAG-ZFPs and vector for yeast two-hybrid assay
Figure 4.10. Yeast recombinant clones as a result of ligation and transformation
Figure 4.11. Purified positive transformants
Figure 4.12. Confirmation of positive transformants by double digestion
Figure 4.13. Western blot of yeast protein
Figure 4.14. Autoactivation test
Figure 4.15. Positive Scratch-ZFP-protein interactions
Figure 4.16. Purified positively transformed clones
Figure 4.17. NdeI/EcoRI double digests of mini prep plasmid DNA
Figure 4.18. RNase/PCI/CI purification of positive yeast two-hybrid transformants
Figure 4.19. SAGE expression data of yeast two-hybrid selected protein interactors
Figure 5.1. Plasmid constructs for luciferase assay
Figure 5.2. Genomic DNA extraction from lung and RPE cell lines
Figure 5.3. PCR amplified target gene promoters
Figure 5.4. PCR amplified SNAG-ZFPs and β-actin
Figure 5.5. Positively cloned β-actin pGL3 constructs
Figure 5.6. Positive clones of target gene promoters in pGL3-β-actin constructs
Figure 5.7. Multimerization attempts for binding site selected sequences
Figure 5.8. Positive TAT and His/TAT inserts cloned into pQE30
Figure 5.9. pQE30 TAT SNAG-ZFP cloning scheme
Figure 5.10. Positively cloned pQE30 TAT constructs with Slug and Snail
Figure 5.11. Proposed model of repression of a target gene by Smuc

LIST OF TABLES

Table 1. Yeast two-hybrid screening results at a nucleotide level

Table 2. Selected target genes for luciferase assay

Table 3. Primers used to amplify zinc finger containing regions

Table 4. Competition assay oligonucleotides

Table 5. Primers of SNAG-ZFP and TAT for pQE30 construction

Table 6. Primers of target gene promoter regions & β-actin for pGL3

Table 7. E-box characterization oligonucleotides

Table 8. Oligonucleotides used for E-cadherin promoter and E-box dinucleotide assays

Table 9. Primers for SNAG-ZFPs used in yeast two-hybrid assay

CHAPTER 1. INTRODUCTION

1.1 Transcription Factors and Transcriptional Regulation

Transcription, the synthesis of an RNA transcript from a DNA template, is an important event in all aspects of biology. It leads to the expression of genes via an mRNA intermediate. Because of this, transcriptional regulation is an important field that requires in-depth research. By determining the interactions between two biological molecules and the mechanisms that cause genes to be expressed or repressed, a better understanding of their effects in a biological system may be reached. This includes obtaining a greater knowledge of how genes affect development or diseases.

Eukaryotic genomes contain several thousand genes. These genes are transcribed with the help of regulatory proteins known as transcription factors that can activate or silence the target genes when necessary. The human genome comprises approximately 20,500 polypeptide-coding genes (Clamp et al., 2007), of which approximately 1,400 encode transcription factors (Vaquerizas et al., 2009) that function in regulating the expression of the remaining genes. Transcriptional regulation mainly occurs at the initiation of transcription of the target gene, where transcription factors function either as activators or repressors (Okkema and Krause, 2005), turning genes on or off, respectively. Transcriptional repressors may function by directly interacting with components of the basal transcription machinery, remodeling chromatin in an ATP-dependent manner, or making site-specific modifications of the histones (Berger, 2007; Tsukiyama et al., 1995; Tsukiyama and Wu, 1995). A transcriptional repressor binding to a particular DNA sequence can inhibit transcription in several ways: it can block the function of an activator in a process termed active repression (Hanna-Rose and Hansen, 1996), or it can involve chromatin compaction that prevents formation of the RNA polymerase II holoenzyme complex at the promoter, which is needed to transcribe the DNA into RNA (Fondell et al., 1996; Goldmark et al., 2000). The repressors can inhibit the formation of the complex directly or by recruiting a co-factor to repress indirectly (Ayyanathan et al., 2003; Ayyanathan et al., 2007; Hanna-Rose and Hansen, 1996). In cases where transcriptional repressors are directly associated with causing a disease (for example, the involvement of the Snail transcription factor in epithelial to mesenchymal transition and cancer metastasis), blocking repression may ultimately inhibit the progression of the disease (Batlle et al., 2000; Cano et al., 2000). This is why understanding the interactions and pathways in which transcription factors are involved is so crucial.

1.2 SNAG (Snail/Gfi-1) Domain Zinc Finger Proteins (ZFPs)

The family under investigation is the SNAG-domain family, a group of around forty members that act as transcriptional repressors and have been implicated in disease states. The name SNAG comes from Snail/Gfi-1, two of the founding members of the family. The Snail superfamily of zinc finger transcription factors has been found to consist of two independent families, Snail and Scratch (Manzanares et al., 2001), although all belong to the broader SNAG family. The transcriptional repressors in this family all contain a SNAG domain, which was first discovered in Gfi-1 (Growth Factor Independence-1) (Grimes et al., 1996a; Zweidler-Mckay et al., 1996), and is a highly conserved 21-amino acid sequence that mediates transcriptional repression. Each of these transcription factors has the SNAG domain at the N-terminus and a variable number of C2H2 zinc fingers (four to six in number), which function as a sequence-specific DNA-binding domain, towards the C-terminus. These regions within the structures of the SNAG-domain family members allow for numerous interactions with DNA and/or protein, broadening their potential biological functions. This regulation is the focus of this study (Figure 1.1).

Figure 1.1. Transcriptional regulation by a SNAG-ZFP. SNAG-ZFPs can work as repressors by binding to a specific DNA sequence in the upstream regulatory region of a target gene, inhibiting activator function, recruiting cofactors, or blocking the RNA polymerase holoenzyme complex. This is accomplished by creating a tightly wound heterochromatin structure that prevents access to the proximal promoter region of a target gene.

Although the function of SNAG domain transcriptional repressors and the interactions involving these transcription factors have been studied, the molecular mechanisms by which they act remain largely unknown; recent research, however, has added information in this area. Most recently, the SNAG domain has been shown to act as a binding domain and to stabilize Snail1 (Lin et al., 2010). Ayyanathan and colleagues first determined that the SNAG domain recruits and functions with a corepressor called Ajuba, a multiple LIM domain protein. The Ajuba protein shuttles between the cytoplasm and the nucleus, and assembles repression complexes at target promoters (Ayyanathan et al., 2007). Hou et al. extended this observation by showing that protein arginine methyltransferase 5 is an effector recruited by Ajuba to mediate Snail-dependent repression (Hou et al., 2008). The SNAG repression domains in Gfi-1 and Gfi-1B have been found to interact with the REST corepressor, lysine specific demethylase (a histone demethylase), and HDACs 1 and 2, which are recruited to the target genes of these transcription factors. The same study found that a proline to alanine mutation in the SNAG domain inactivates Gfi-1 function (Saleque et al., 2007).

As with many transcription factors, phosphorylation controls much of the function of the Snail transcription factor, including its subcellular localization (Dominguez et al., 2003) as well as how its function is regulated. GSK-3β binds to and phosphorylates Snail at two motifs to regulate its function. Inhibition of GSK-3β results in the upregulation of Snail and the downregulation of E-cadherin in vivo (Zhou et al., 2004). Slug localizes to target promoters to regulate transcription, using both the zinc fingers and the SNAG domain to act in a repressive manner (Hemavathy et al., 2000a; Hemavathy et al., 2000b).

The majority of transcriptional repressors are zinc finger proteins (ZFPs), which contain sequence-specific DNA-binding zinc finger motifs. The binding of the zinc fingers to the upstream regulatory region in a DNA sequence-specific manner can selectively inhibit the transcription of the target gene. There are roughly 700 Cys2His2 (C2H2) zinc finger-type transcription factors (Tadepally et al., 2008), named for the specific zinc chelating domain, which consists of a pair of cysteine residues in the beta sheet and a pair of histidine residues in the alpha helix (Figure 1.2). These zinc fingers function as DNA-binding domains that allow for the interaction of the DNA's major groove with the divalent cation (Miller et al., 1985). The zinc ion provides stability to the binding domain, as its absence causes the domain to unfold (Klug and Schwabe, 1995).


Figure 1.2. Structure of a C2H2 zinc finger and DNA-binding domain. Zinc fingers are marked by two cysteine and two histidine residues that chelate a zinc ion that provides stability (left). A number of zinc finger repeats form a motif that acts as a DNA-binding domain.

The C2H2 family of transcriptional repressors is classified into subfamilies such as SNAG, BTB/POZ (Broad Complex, Tramtrack, Bric-a-brac/Pox virus Zinc finger), and KRAB (Kruppel Associated Box) based on the repressor domains they contain (Collins et al., 2001). There are around 40 members in the SNAG domain family, the most studied and well-known of which are Snail, Slug, and Gfi-1 (Nieto, 2002). Many transcription factors, regulatory proteins, and other proteins that interact with DNA contain these protein domains. The zinc fingers form a binding domain that interacts with the DNA.

As previously stated, the binding of zinc fingers to DNA creates a motif during the interaction that makes the structure more stable. Because the zinc finger motifs can direct specific binding to DNA (Nagai et al., 1988), they give insight into the protein's binding abilities. One of the most commonly bound sequences is the E-box, CANNTG. Previous studies revealed a consensus sequence with the highly conserved core CAGGTG for the Drosophila transcription factor SNAIL (Mauhin et al., 1993). Other binding sites have been identified for the ZFPs Gfi-1 (Duan and Horwitz, 2003), IA-1 (Breslin et al., 2003; Breslin et al., 2002), and ZBRK1 (Peng et al., 2002).
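As an illustration of the E-box pattern described above, the following minimal sketch scans a DNA sequence for CANNTG hexamers. It is not part of the methods used in this work; the function name `find_eboxes` is hypothetical, and the example input is the general consensus reported in the abstract.

```python
import re

# E-box motif CANNTG, where N is any nucleotide.
# The lookahead allows overlapping matches to be reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(seq):
    """Return (0-based position, hexamer) for every E-box found in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]

# General consensus reported in the abstract for four of the members.
consensus = "TGCACCTGTCCGA"
print(find_eboxes(consensus))  # [(2, 'CACCTG')]
```

The hexamer found here, CACCTG, is the reverse complement of the CAGGTG core reported for Drosophila SNAIL, so on double-stranded DNA the two describe the same site.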

These DNA-protein binding specificities can regulate transcription in a number of ways, and it is important to investigate these relationships to better understand how each SNAG-ZFP functions. Because each member has a variable number of zinc fingers that are dissimilar in sequence, each is believed to bind to a specific sequence, thereby binding different genes and regulating a unique panel of genes. Here, we investigate the sequences, the genes, and their function, creating a unique view of the genes that each SNAG-ZFP targets.

1.3 SNAG-ZFPs and E-box

Transcription factors such as Snail and Scratch bind to E-box elements of E-cadherin’s (CDH1) proximal promoter site (Cano et al., 2000; Giroldi et al., 1997). This E-box can regulate basic helix-loop-helix transcription factors as does Smuc (Kataoka et al., 2000), and can itself be regulated by other transcription factors, as is the case in IA-1 (Breslin et al., 2003). The E-box element found within the IA-1 promoter can be regulated by NeuroD1 and E47 heterodimers, both transcription factors critical to start transcription of the IA-1 gene (Breslin et al., 2003).


We have identified a longer and novel consensus sequence for each of the SNAG-ZFPs, some possessing an E-box. This differentiates one group of SNAG-ZFPs from another, and here we seek to separate one group of E-box binding factors from other unrelated ones and to establish significant differences between families of transcription factors based on their binding sites. Not all members bind to the same E-box similarly, so here we investigate the differences in E-box sequences. Because previous studies have shown their ability to bind to the E-box, we further examined the affinity of the sequence for members of the SNAG family. Through our binding assays, we found that the internal NN residues dictate how strongly the ZFPs bind. Using all 16 NN combinations, EMSA was performed to see the effect, if any, on the interaction.
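The 16 NN combinations mentioned above can be enumerated directly. The sketch below is illustrative only; the function names are hypothetical, and it does not reproduce the oligonucleotides actually used in the assays.

```python
from itertools import product

def ebox_variants():
    """All 16 E-box hexamers CANNTG, one per internal NN dinucleotide."""
    return ["CA" + a + b + "TG" for a, b in product("ACGT", repeat=2)]

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def distinct_duplex_sites():
    """E-box variants that remain distinct once both strands are considered."""
    return sorted({min(v, reverse_complement(v)) for v in ebox_variants()})

variants = ebox_variants()
print(len(variants))                 # 16
print(len(distinct_duplex_sites()))  # 10
```

Because the reverse complement of any CANNTG hexamer is itself a CANNTG hexamer, four of the 16 variants are palindromic and the remaining 12 form six reverse-complement pairs, which gives 10 distinct sites on double-stranded DNA.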

1.4 SNAG Domain ZFPs in Development and Disease

Zinc fingers have been implicated in numerous diseases (Ladomery and Dellaire, 2002), and more specifically, Snail, Slug, and Scratch are examples of transcription factors that contain the SNAG domain and have roles in development and disease (Nieto, 2002). These transcription factors have been found in a vast array of different tissues and are involved in the development of many areas. In Drosophila, mesoderm differentiation and development of the nervous system require a functional SNAG domain (Hemavathy et al., 2004). Gfi-1 and Gfi-1B play a role in haematopoiesis but also in inner ear development (Fiolka et al., 2006). Snail has been found in the early development of mouse and Drosophila (Nieto, 2002), and Scratch in the developing and mature adult brain (Marin and Nieto, 2006). Whilst Scratch is found in the brain, its related family member Snail is not detectable in the brain, but is instead highly expressed in the kidneys (Twigg and Wilkie, 1999). Slug is also expressed at low levels in the brain, but is highly expressed in the ovary and prostate (Hemavathy et al., 2000a; Hemavathy et al., 2000b).

ZFPs not only act developmentally; their function also extends to human disease. The Snail ZFP targets E-cadherin promoters, which results in the loss of E-cadherin, an important gene in cell adhesion, and ultimately leads to highly malignant and invasive human tumor progression (Batlle et al., 2000). E-cadherin plays a role in epithelial-mesenchymal transition (EMT), and its repression by Snail has been established in both murine and human cell lines and tumors (Cano et al., 2000). Slug, which is expressed in chick and Xenopus embryo EMT regions, has also been found to be an E-cadherin repressor (Savagner et al., 2005). Snail represses E-cadherin with the help of HDAC. When Snail is overexpressed, histones H3 and H4 are deacetylated at the E-cadherin promoter. Chromatin-modifying activities allow the SNAG domain to recruit HDAC1 and 2 and the corepressor mSin3A, forming a multimolecular complex that can repress E-cadherin (Peinado et al., 2004).

EMT plays a role in many processes of the body, both beneficial and detrimental. It is involved in homeostasis, repairing tissue and healing wounds, as well as organogenesis and morphogenesis in the development of an organism, but it also gives the migratory and invasive properties that characterize metastasis in cancer progression, and SNAG ZFPs appear to contribute to this phenotype (Thiery et al., 2009).

Many studies have shown that multiple members of the SNAG family are upregulated in breast cancer, including Snail, Slug, and Twist (Martin et al., 2005; Moody et al., 2005), and more specifically, Slug and Snail have been shown to be highly expressed in invasive human breast tumors (Blanco et al., 2002), acting as repressors of the genes occludin (OCLN) and claudin-1 (CLDN1), which encode integral membrane proteins present in tight junctions (Martinez-Estrada et al., 2006), or of the tumor suppressor gene BRCA2 (Tripathi et al., 2005).

Finding other genes repressed by these transcription factors is important for characterizing this group of repressors and potentially contributing to the knowledge of development and disease profiles. Due to their previously shown impact in a number of disorders, this study aims to characterize these ZFPs further, finding additional potential interactors and target genes to derive comprehensive information about the roles these SNAG-ZFPs play in development and disease.

Identification of target genes in vitro lays the groundwork for the necessary validation in vivo, which will greatly increase the significance of these findings at the human cellular level. In nearly every case where a SNAG-ZFP is shown to repress a gene, in vivo experiments are necessary to confirm these findings (Batlle et al., 2000; Martinez-Estrada et al., 2006; Tripathi et al., 2005). Here, a luciferase assay will be designed and attempted to further relate the importance of this project at a cellular level.

1.5 SNAG-ZFPs Interact with Other Proteins

Not only can SNAG-ZFPs function by binding to DNA sequences, but, like many other zinc finger proteins, they may be involved in protein binding (Brayer and Segal, 2008; Iuchi, 2001; Mackay and Crossley, 1998). Brayer and colleagues have suggested that zinc finger domains have greater potential for protein interactions than DNA binding (Brayer et al., 2008). Some zinc fingers have distinct regions in their structure, some that deal with DNA binding and others with protein interactions (Sun et al., 1996), like the POZ domain (Bardwell and Treisman, 1994), and as previously stated, the SNAG domain has been found to recruit a protein to form repression complexes (Ayyanathan et al., 2007). Just as LIM domains of other zinc fingers have been found to bind to protein kinase C (Kuroda et al., 1996), it is possible that not only the SNAG domain, but also the linker regions that join this domain with the zinc fingers, may play a role in protein interactions. The Slug and Scratch ZFPs also have conserved domains, which may mediate protein interactions or act in a direct manner, further expanding the biological function of zinc finger proteins.

1.6 Hypothesis

As previously stated, SNAG-ZFPs have been implicated in many developmental and disease states. Due to the conservation of the repressor domain, yet the variance in linker and zinc finger regions, we believe that these ZFPs may be involved in a number of different interactions. Many of these interactions may not have been previously elucidated, and by screening for them we can narrow down the targets of each SNAG-ZFP, making it easier to investigate each interaction.

The hypothesis is that numerous genes are regulated by SNAG-ZFPs, and that these will be functionally related. The family can be split into two groups, E-box binders and non-E-box binders, each with its own unique group of regulated genes. For the E-box binders, these genes may be linked to known target genes such as E-cadherin, playing a role in cell adhesion repression and progression to cancer. This group is expected to have a similar consensus sequence, as a core CANNTG should provide the basis of the DNA-binding site. Due to the similarity of binding sites, the regulated genes should be related as well.

Based on previous studies, the non-E-box binding SNAG-ZFPs Gfi-1, Gfi-1B, and IA-1 will regulate a set of different genes with roles in immunity, cell cycle, and insulin regulation. Gfi-1B in its human and murine counterparts should select for the same genes, as there is no expected divergence of function between the two organisms. Although studies have identified a number of genes regulated by these transcription factors, a global analysis of functionally related genes has not been performed, and we expect to be able to select a number of genes in this screening.

Additionally, these SNAG-ZFPs are expected to have protein-binding capabilities and are expected to participate in functionally related categories as well. Expression of these genes is widespread in various tissues, so interactions with proteins cannot be narrowed down to one particular area, but using various methods we should be able to select for other proteins with which SNAG-ZFPs can interact.

The main goals of this research are two-fold. The first is to identify sequence-specific DNA-binding sites of these SNAG transcription factors, further characterizing the way in which they function. By identifying the sequences to which the zinc fingers bind, as well as discovering the interactions in which they are involved, I will further contribute to the information known about these transcriptional repressors, which could be vital to decoding the mechanisms by which these factors act in various diseases. The second goal is to unravel the novel functions associated with the conserved Slug and Scratch domains and full-length SNAG-ZFPs by identifying and characterizing the interacting proteins via multiple methods, including GST-pull-down and yeast two-hybrid approaches.


CHAPTER 2. GLOBAL ANALYSIS OF SNAG-ZINC FINGER TRANSCRIPTION FACTOR TARGET GENES

Specific Aims:

1. Identify and confirm DNA-binding site consensus sequences for each SNAG-family member in the study

2. Identify genes possessing these DNA-binding sites in their promoter regions, thereby suggesting target gene potential

3. Create a target gene function profile for each member

2.1 Identification of Unique DNA-Binding Specificities for the SNAG Transcription Factor Family Members

The first aim of this project was to determine the DNA-binding sites of SNAG-ZFPs by building a consensus sequence for each member. The zinc fingers create a DNA-binding domain that controls the transcription of a gene by binding to its promoter region. The project started with building the GST-SNAG fusion protein constructs of each member, then purifying them for use in binding site selections, in which the specific sequences that interact with the SNAG proteins are collected. The multiple sequences obtained were used to create a longer, more specific consensus sequence than has previously been shown for each of the SNAG-ZFPs in question. Finally, the binding site selected sequences provided the information needed to create a comprehensive list of genes that are potentially targeted by the members. These genes were grouped by biological function to better understand the areas in which the ZFPs may be working.

2.2 Bacterial Expression and Purification of GST-SNAG Fusion Proteins

As discussed in the introduction, SNAG-ZFPs have a highly conserved amino acid sequence in the SNAG domain at each member’s N-terminus and a varying number of zinc fingers at the C-terminus (Figure 2.1). GST-SNAG fusion proteins were created by fusing the C-terminal zinc fingers to a glutathione S-transferase (GST) tag. A diagrammatic representation of the fusion proteins is shown in Figure 2.2.

Expression of the GST-SNAG recombinant constructs was induced to produce the appropriate ZFPs. Column chromatography was employed to separate fractions of purified proteins. When significant amounts of purified, homogeneous proteins were obtained after various rounds of chromatography, the proteins were further purified and loaded onto a polyacrylamide gel, standardized to 3 µg of protein (Figure 2.3).

Figure 2.1. Alignment of SNAG Domain Transcription Factor Family members. SNAG domain transcription factors have a highly conserved 21-amino acid sequence at the N-terminus and a chain of C2H2 zinc finger repeats at the C-terminus, which are connected by a linker region. Slug and Scratch have domains in the linker region, but the function of these is unknown.

Figure 2.2. Diagrammatic representation of the GST-SNAG-ZF recombinant proteins. DNA segments encoding the zinc fingers were PCR amplified using the full-length genes of the SNAG domain transcription factor family as templates. The zinc fingers were fused with a GST tag to generate recombinant fusion proteins.

Figure 2.3. Purified GST-SNAG ZFPs used for binding site selection. Elution samples from column purification were further purified using electroelution. The purified, homogeneous fusion proteins were used for binding to the E-cadherin promoter and to the randomized oligonucleotide library.

2.3 Zinc-Finger Array Binding Site Interactions

The initial binding of purified proteins to the radiolabeled oligonucleotide library is shown in Figure 2.4, indicating that a DNA-protein complex forms as a result of the binding. These complexes were eluted and the resultant DNA was PCR amplified in order to enrich the sequences collected (Figures 2.5-2.7). As expected, the DNA-protein complexes became stronger with each round, since the amount of DNA available increases with each round of enrichment. The population of possible 18N combinations is upwards of 68 billion, and this binding site selection allowed us not only to narrow down the potential sequences and select for the recognized sequences but also to amplify them.
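
The size of the randomized pool follows from the four possible bases at each of the 18 randomized positions. The short calculation below is only an illustrative sketch; the function name is ours and was not part of the original analysis.

```python
def library_complexity(random_positions: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences in a fully randomized region."""
    return alphabet_size ** random_positions


# Four bases at each of 18 positions: 4**18 distinct sequences.
n_sequences = library_complexity(18)
print(f"{n_sequences:,} possible 18N sequences")  # 68,719,476,736
```

This agrees with the figure of upwards of 68 billion quoted above.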

Figure 2.4. EMSA of first binding site selection of SNAG-ZFPs with the randomized oligonucleotide library. GST-SNAG-ZFPs were bound to a randomized 49-N oligonucleotide library flanked with known PCR primers and restriction sites. The bands indicated by the black arrowhead were cut and eluted.

Figure 2.5. Second round of binding site selection and PCR. A. Second binding site selection of SNAG-ZFPs. The black arrowhead shows the area of bands used for subsequent PCR. For some products, top and bottom bands were used. B. PCR products of previously bound and eluted enriched oligonucleotide library and SNAG-ZFPs run on a DNA-PAGE gel.

Figure 2.6. Third round of binding site selection and PCR. A. Third binding site selection of SNAG-ZFPs with enriched oligonucleotide library as shown by EMSA. The open arrowhead shows the area of bands used for subsequent PCR. B. PCR products of previously bound and eluted enriched oligonucleotide library and SNAG-ZFPs run on a DNA-PAGE gel.

Figure 2.7. Final round of binding site selection and PCR. A. Fourth and final binding site selection of SNAG-ZFPs with oligonucleotide library. EMSA of the binding between SNAG-ZFPs and enriched PCR product. B. Enriched PCR products of previously bound SNAG-ZFPs and bound oligonucleotide library run on a DNA-PAGE gel.

2.4 Cloning of DNA-Protein Binding Site Interactions and DNA Sequencing

The binding site selected DNA sequences were inserted into the universal cloning vector pUC18. After ligation of these sequences and transformation into competent Escherichia coli cells, at least twenty clones were selected, and the plasmids were prepared and digested with EcoRI and BamHI in order to identify the positive recombinant clones (Figure 2.8). A sample of the clones processed is shown in Figure 2.9; each band indicates a positive clone carrying a 49N sequence. Positive recombinant clones were selected and cleaned of RNA using RNase and phenol-chloroform extraction (PCI/CI), and the integrity and purity of the DNA were confirmed (Figure 2.10) before it was spotted on a 96-well plate for high-throughput cycle sequencing.

Figure 2.8. EcoRI/BamHI digestion of pUC18 vector. A. Two pUC18 vector samples were first digested with restriction enzymes, one with EcoRI and the other with BamHI. B. Each vector was then digested with the restriction enzyme not used in the first digestion, completing the double digestion. This was done to ensure the efficiency of the vector to be used for ligation.

Figure 2.9. Positive recombinant clones. The electroeluted PCR products were ligated into pUC18 and transformed into E. coli DH5α cells, which were grown, processed by plasmid mini-prep, and run on a DNA-PAGE gel. Occurrence of a band indicates the presence of a positive clone.

Figure 2.10. RNase and PCI cleaned recombinant clones. The plasmid mini-preps were RNase digested and purified in order to check the presence and integrity of the DNA. Samples were run on agarose gels.

2.5 Confirmation of Binding Site Selected Sequences

While samples were sent off for sequencing, we further tested the selected binding site sequences to confirm that they were truly bound by the SNAG-ZFP used to select them. Binding of the 18N DNA to its cognate protein resulted in obvious shifts for the Smuc protein on the autoradiograph shown in Figure 2.11A. These results were compared with gels run with the competing, nonspecific CG7938 protein, shown in Figure 2.11B. As expected, the bands produced where Smuc DNA and Smuc protein bound to each other were lost when Smuc DNA was combined with CG7938 protein, suggesting specificity of the binding site. This ensured that the selected sequences are not only true binders of the SNAG-ZFPs, but also specific for them and not for other proteins.

Figure 2.11. Binding of DNA to specific and nonspecific proteins. EMSAs of binding of the resultant positive clones A. with their cognate Smuc protein and B. with a competing non-specific protein, CG7938. Filled arrowheads indicate non-specific binding and open arrowheads show the specific binding of DNA to protein.

2.6 Deriving a Consensus Sequence

Each set of sequences for each SNAG-ZFP was analyzed to determine whether there were any consensus sequences. The sequences were identified from the 5’ BamHI restriction site GGATCCATTGCA and the 3’ EcoRI restriction site CTGTCCGAATC. The resultant sequences in between these restriction sites were collected and compared. In order to derive a consensus sequence for each SNAG-ZFP, the full 18N sequence as well as the surrounding nucleotides were analyzed both visually and computationally and aligned for the best-fit match of sequences. The purines adenine (A) and guanine (G) were considered similar nucleotides, as were the pyrimidines cytosine (C) and thymine (T). The percentage of each nucleotide selected at a specific position was calculated. A significant match was considered to be a base that occurred in nearly half of the sequences chosen. Aberrant sequences, such as empty vector (GGATCCCCGGCTACCAAGCTCGAATTC) or partial sequences, were not used. There were a number of replicates within each SNAG-ZFP set, and even between long and short constructs. These replicates, which came in pairs and sometimes triplets, indicated that these particular sequences were selected by the ZFP more often than some of the other binding sites.
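
The per-position tally described above can be sketched in a few lines of Python. This is an illustration rather than the procedure actually used: the function names, the 0.5 cutoff standing in for "nearly half", the 0.75 cutoff for calling a purine (R) or pyrimidine (Y), and the five toy clone sequences are all assumptions made for the example.

```python
from collections import Counter


def position_frequencies(aligned):
    """For each column of equal-length aligned sequences, map base -> fraction."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    freqs = []
    for column in zip(*aligned):
        counts = Counter(column)
        freqs.append({b: counts.get(b, 0) / len(aligned) for b in "ACGT"})
    return freqs


def consensus(aligned, threshold=0.5, class_threshold=0.75):
    """Call one symbol per column; both thresholds are illustrative assumptions."""
    symbols = []
    for col in position_frequencies(aligned):
        base, fraction = max(col.items(), key=lambda item: item[1])
        if fraction >= threshold:
            symbols.append(base)           # a single base dominates
        elif col["A"] + col["G"] >= class_threshold:
            symbols.append("R")            # purines treated as similar
        elif col["C"] + col["T"] >= class_threshold:
            symbols.append("Y")            # pyrimidines treated as similar
        else:
            symbols.append("N")            # no significant preference
    return "".join(symbols)


# Hypothetical aligned clones, invented for illustration only.
toy_clones = ["CACCTGT", "CACCTGC", "CAGCTGT", "CACCTGC", "CACGTGG"]
print(consensus(toy_clones))
```

Reporting the table returned by position_frequencies alongside the consensus string mirrors the percentage breakdown given for each construct.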

An example of how each consensus sequence was derived is shown in Figure 2.12A for the Slug ZFP. Also provided is a diagrammatic representation of the percentage occurrence of each nucleotide, based on the consensus obtained. All of the derived consensus sequences for the SNAG-ZFP constructs that were investigated are shown in Figure 2.12B.

Figure 2.12. Consensus sequence construction. A. Building a Slug Consensus. The obtained consensus for Slug is TGCACCTGYCCGA, as shown by the table of sequences from different clones and the percentage breakdown for each. The diagram below the table indicates the percentage occurrence of a particular nucleotide at a given position. B. Derived SNAG-ZFP Consensus Sequences. The core E-box binding site, conserved among the Slug, Smuc, Snail, and Scratch proteins, is highlighted in the consensus sequences of the SNAG-ZFPs.

2.7 E-Box as a Core of the Consensus Sequences

Using the known SNAG consensus sequence CANNTG as a core to build upon with flanking nucleotides, we sought both the variations in the surrounding nucleotides and a more precise representation of the N nucleotides within the E-box. First, only exact E-box hits were used; the remaining sequences were then analyzed to find variations of the CANNTG sequence, including extra or missing nucleotides or a different nucleotide in one position of the sequence. These sequences were aligned based on the position of the E-box, and the flanking nucleotides were added around the core to determine whether there was any common pattern. To illustrate, the breakdown of nucleotides at each position is shown below the Slug example in Figure 2.12A. Any nucleotide with a significant appearance was taken as part of the consensus.
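
The two-pass search described above (exact E-boxes first, then near matches) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, and the second pass covers only single-base substitutions at the four fixed positions, not the extra or missing nucleotides that were also considered.

```python
import re

# Lookahead so that overlapping CANNTG hexamers are all reported.
EXACT_EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def exact_eboxes(seq):
    """Return (start, hexamer) for every exact CANNTG match."""
    return [(m.start(), m.group(1)) for m in EXACT_EBOX.finditer(seq.upper())]


def near_eboxes(seq):
    """Return hexamers with exactly one substitution among the fixed C, A, T, G."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 5):
        hexamer = seq[start:start + 6]
        fixed = hexamer[0] + hexamer[1] + hexamer[4] + hexamer[5]
        mismatches = sum(a != b for a, b in zip(fixed, "CATG"))
        if mismatches == 1:
            hits.append((start, hexamer))
    return hits


# The Smuc consensus reported in this chapter contains one exact E-box.
print(exact_eboxes("TGCACCTGTCCGA"))
```

Aligning selected sequences on the start positions returned by exact_eboxes is one way to line up the flanking nucleotides around the core.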

The consensus for Slug was found to be TGCACCTGYCCGA (where Y is a pyrimidine), and the Smuc consensus sequence was TGCACCTGTCCGA. Snail long was GCACCTGTCCGA, and Snail short was CACCTGYCCG. Finally, the consensus for Scratch long was CATTGCACCTGTCCGA, and that for Scratch short was GCACCTGTCCGA.

Each of these six constructs possessed not only an E-box core, but the same exact E-box core of CACCTG. The inner dinucleotide pair was not as strongly represented; however, it still appeared over fifty percent of the time at those positions. This was expected, as the literature has shown that these members bind to this hexanucleotide sequence (see Chapter 3). These results suggest that Slug, Smuc, Snail, and Scratch act similarly by binding to the same sequence, although additional factors such as flanking nucleotides may allow them to target different genes for transcriptional repression.

2.8 Gfi-1, Gfi-1B Hs, Gfi-1B Mm, and IA-1 Consensus Sequences are Novel and Vary More

For the other five constructs, the following consensus sequences were obtained using the binding site selected DNA sequences: Gfi-1 – GGNCGGYCCGYGYGG, Gfi-1B Hs – GNGRCNCCGCCGGGG, Gfi-1B Mm – GYGYGGRNCGCGG, IA-1 long – CCGGCGGCYYYGGGCG, and IA-1 short – GGCGAGGCCGGGCGG. Compared with the six constructs previously discussed, these five showed more variation in their consensus sequences, and none matched one another as well as the first six did. However, we were able to obtain longer consensus sequences, as expected, since these ZFPs have more zinc fingers than the rest. A longer DNA-binding sequence does not necessarily mean that all of its nucleotides are targeted. The most important binding specificities are most likely those that occurred at the highest percentage at a particular location.

Previous research by Zweidler-McKay et al. in 1996, further tested by Jegalian and Wu in 2002 and Duan and Horwitz in 2003, revealed a consensus sequence of TAAATCAC(A/T)GCA or TAAA(T/G)CAC(A/T)GCA with an invariant core sequence of AA(T/G)C. This particular sequence was not found within our sequences, indicating that there may be multiple binding site sequences that Gfi-1 ZFPs are able to bind (Duan and Horwitz, 2003; Jegalian and Wu, 2002; Zweidler-McKay et al., 1996). Our consensus sequences for IA-1 only slightly resemble a previously derived sequence of T(G/T)(C/T)(C/T)(T/A)GGGG(G/T)C(G/A) (Breslin et al., 2002), as seen in the guanine-rich regions, where guanine appears more often than the other nucleotides. This guanine richness is true of the Gfi-1-, Gfi-1B-, and IA-1-selected consensus sequences. Another characteristic of this set of ZFPs is that they most likely bind to GC-rich regions of DNA.
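
The GC-richness of these five consensus sequences can be checked with a short calculation. The sketch below is illustrative only: the function name is ours, and skipping ambiguous positions (N, R, Y) rather than weighting them is an assumption made for simplicity. The sequences are those reported in Section 2.8.

```python
CONSENSUS = {
    "Gfi-1": "GGNCGGYCCGYGYGG",
    "Gfi-1B Hs": "GNGRCNCCGCCGGGG",
    "Gfi-1B Mm": "GYGYGGRNCGCGG",
    "IA-1 long": "CCGGCGGCYYYGGGCG",
    "IA-1 short": "GGCGAGGCCGGGCGG",
}


def gc_fraction(seq):
    """GC fraction over unambiguous positions only; N, R, and Y are skipped."""
    defined = [base for base in seq.upper() if base in "ACGT"]
    if not defined:
        raise ValueError("sequence has no unambiguous bases")
    return sum(base in "GC" for base in defined) / len(defined)


for name, seq in CONSENSUS.items():
    print(f"{name}: GC fraction {gc_fraction(seq):.2f}")
```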

Also, although no published literature states that IA-1 is an E-box binder, a few of its binding site selected sequences contained an E-box, CAAATG. This differs from the core sequence of the Slug, Smuc, Snail, and Scratch consensus sequences, and from the three E-box sequences within its promoter (Breslin et al., 2003). This suggests that IA-1 may be similar to other SNAG family members in that it has a binding affinity for an E-box, but distinct in that it does not bind the commonly occurring one.

2.9 Competition EMSAs for Derived Consensus Sequences

Oligonucleotides for the consensus sequence of each SNAG-ZFP were constructed and used in competition assays to confirm that the derived sequence was indeed a binding site. The oligonucleotides used included wild type sequences, along with sequences mutated at the highly specific nucleotide residues (100MUT) and at every position (MUT). As expected, the derived consensus sequences bound specifically to their cognate SNAG-ZFPs. Addition of unlabeled wild type oligonucleotide resulted in competition and less binding of the labeled oligonucleotide. However, addition of a nonspecific oligonucleotide mutated in the most highly conserved regions did not reduce binding, even when 20-40x as much nonspecific DNA was added (Figure 2.13). This was also the case for the oligonucleotides that were mutated at all positions: they were not recognized by the SNAG-ZFP, and the strong interaction between the protein and the labeled wild type DNA remained, further confirming each consensus sequence as a valid binding site. These competition assays were completed for SNAG-ZFP members Slug, Smuc, and both constructs of Snail and Scratch.

Figure 2.13. Competition assays. SNAG-ZFP binding was competed using oligonucleotides mutated in highly conserved regions of the consensus sequences. A. Full Scratch long EMSA gel, in which unlabeled WT oligonucleotide decreased binding between the protein and radiolabeled WT sequences. B. No change in binding was seen when a 100MUT oligonucleotide, or a MUT oligonucleotide (not shown), was added.

The cognate binding of the SNAG-ZFP to wild type sequences was occasionally of lower intensity than in the reactions to which 100MUT sequences were added. This suggests that the mutated oligonucleotides may actually strengthen binding of the wild type sequence to a given SNAG-ZFP. This is seen most prominently in the cases of Scratch long and short.

2.10 An Alternative Non-Isotopic Competition Dot Blot Assay

In addition to these radioactive means of verifying in vitro competition of binding site selected sequences, a novel method was derived to complete the competition assays for all SNAG-ZFPs. This method involved a dot blot apparatus and visualization of biotin-labeled DNA with a streptavidin conjugate. Trials were done using the Lightshift® Chemiluminescent EMSA Kit without success (Figure 2.14) so a new technique was conceived retaining the same concept.

By using a dot blot apparatus, we retain the rationale behind the chemiluminescent kit but eliminate the need for excess materials and for running and transferring a gel. Instead of running reactions with protein and DNA (WT or MUT combinations) on a polyacrylamide gel, we add them stepwise onto an Immobilon protein membrane, applying vacuum between each step. DNA sequences recognized by the protein are retained on the membrane, while the excess is drawn through by the vacuum. Each reaction consists of protein with biotin-labeled WT oligonucleotide (Figure 2.15). As the amount of unlabeled WT oligonucleotide in a reaction increases, the signal should become weaker, whereas the addition of unlabeled MUT oligonucleotide should not interrupt binding.

Figure 2.14. Chemiluminescent EMSA Assay. The competition assay was tested using the Lightshift Chemiluminescent EMSA Kit; however, no signal was seen in any experimental lane. Poorly labeled DNA in the experimental conditions was suspected. Either one DNA strand was biotin labeled and then annealed to the unlabeled complementary strand, or both strands were labeled and then annealed.

Labeling efficiency was assessed and showed no benefit of one approach over the other. Controls were also biotin-labeled to confirm that the supplies provided were functional.

Figure 2.15. Labeling efficiency of biotinylated DNA. To assess labeling efficiency, samples of biotin-labeled DNA were spotted in a dot blot apparatus at decreasing concentrations along with two supplied control DNA oligonucleotides. Labeling one strand versus both strands showed no significant difference.

Trials were attempted for Slug, Gfi-1, and Gfi-1B (previously untested via the radiolabeled method), and as seen in Figure 2.16 the desired effect was achieved. This novel method can be used as a more efficient way to test competition with a number of reactions for multiple proteins all at once.

Figure 2.16. Dot blot competition assay for Gfi-1. Using streptavidin-AP, a dot blot competition assay was performed for Gfi-1. Addition of WT oligonucleotide decreases binding while MUT has no effect. GST provided a control for no expected binding.

2.11 Target Genes and Gene Ontology

Once consensus sequences had been derived for each of the SNAG domain constructs analyzed, target genes were found via multiple sources. Primarily, two databases, TRANSFAC (http://www.gene-regulation.com) and the Eukaryotic Promoter Database (EPD) (http://www.epd.isb-sib.ch/), were used to find genes that contained the consensus sequences within their promoter region, usually upstream of the gene’s transcription start site. Not only perfect matches to the consensus sequence were searched, but also shorter pieces of the original sequence, nucleotide variants (for example, where multiple nucleotides were present in a given position), and the individually selected sequences obtained via binding site selection. By searching for the multiple variations of the consensus sequences and the purely derived sequences for each of the constructs, we were able to suggest genes targeted by these ZFPs. Tables were then constructed that included additional information about the promoter sequences and genes.
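
The kind of search described above, a degenerate consensus matched against a promoter in either orientation, can be sketched as follows. This is not the TRANSFAC or EPD interface; it assumes the promoter sequence is already in hand as a plain string, and the function names and the toy promoter are hypothetical.

```python
import re

# Only the ambiguity codes that appear in the consensus sequences of this chapter.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    """Expand ambiguity codes; the lookahead keeps overlapping matches."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(promoter, consensus):
    """Return (strand, start, site); start is always in plus-strand coordinates."""
    promoter = promoter.upper()
    pattern = consensus_to_regex(consensus)
    width = len(consensus)
    hits = [("+", m.start(), m.group(1)) for m in pattern.finditer(promoter)]
    minus = reverse_complement(promoter)
    hits += [("-", len(promoter) - m.start() - width, m.group(1))
             for m in pattern.finditer(minus)]
    return hits


# Hypothetical promoter fragment carrying the Slug consensus on the plus strand.
toy_promoter = "AAATGCACCTGTCCGATTT"
print(find_sites(toy_promoter, "TGCACCTGYCCGA"))
```

Shorter pieces of a consensus, or the individually selected sequences, can be passed to find_sites in the same way, and the strand value corresponds to the positive or negative orientation recorded in the tables.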

The comprehensive list of more than 1,600 total genes for all 11 constructs (see tables in APPENDIX) was used to select potential target genes for further testing. Complete information, including positive or negative orientation and a 60n strand of DNA that encompasses the consensus sequence, was included. This list of genes obtained through the target genes study holds a wealth of information. These genes could all be potential target genes of their respective ZFP, either playing a direct role or some auxiliary corepressor role. Further studies must be done to identify in what manner they work within a pathway and how each SNAG domain transcription factor comes into the picture.

2.12 Target Genes Span Multiple Functional and Biological Process Categories

Gene ontology analysis reveals much variation in function of the selected genes. For each SNAG-ZFP, grouping genes by biological process and molecular function reveals that many are involved in metabolism and biochemical processes, while a large portion are individual genes with their own function, unrelated to any of the other categories (Figure 2.17).

For Slug, the biggest functional category was localization, followed by RNA binding, proteolysis, and metabolism. Smuc had the target genes with the least gene ontology similarity. Nearly 75% of its genes had their own gene function, with the next most common grouping being organ development, followed by some specialized functions of binding and transport. As one of the most divergent members of the Snail family (Manzanares et al., 2004), Smuc could have evolved an alternative function relative to its ancestors and may repress a very specialized group of genes. It still possesses the ability to target E-box-containing promoter regions, but the additionally selected binding site sequences suggest that it can bind dissimilar sequences as well.

Snail target genes play a role in immune system development, transcription cofactor activity, mitosis, and other metabolic and biochemical processes. Analysis of Snail and Scratch binding sites resulted in overlapping functional groupings, including stress response, cell adhesion, translational initiation, and transcriptional activity, indicating that they work in a similar manner in regulating the same types of pathways. They belong to separate subgroups of C2H2 zinc finger proteins but are part of the same superfamily (Barrallo-Gimeno and Nieto, 2009), so there may be similar function between the two members.

Additional groups of interest for the Slug, Smuc, Snail, and Scratch members are immune response, system development, and cell cycle process, as Snail has been implicated in cell cycle disruption via maintenance of low levels of cyclin D (Grimes et al., 1996b; Vega et al., 2004) and p21WAF/CIP1 (Takahashi et al., 2004). Because these SNAG-ZFPs have similar derived consensus sequences, many of their target genes are the same. This will be useful for further testing, as many genes can be tested in parallel with different members.

Due to the more varied and longer consensus sequences for Gfi-1, Gfi-1B, and IA-1 ZFPs, a greater number of target genes were identified by our analysis. Gene ontology results reveal that many biological process groupings were the same for these ZFPs, including involvement in either apoptosis or cell proliferation. This corresponds with previous studies that show Gfi-1 represses Bax to inhibit T-cell death (Grimes et al., 1996b) and may repress gene expression of those involved with cell proliferation (Grimes et al., 1996a).

Additional targets have previously been screened for Gfi-1 binding sites via alternative methods (Duan and Horwitz, 2003). Our studies did not pick out the same genes, but many of the genes identified were of a similar function, such as cell cycle control and hematopoiesis. Gfi-1B was previously found to regulate Socs gene expression, which is involved in suppression of cytokine signaling (Jegalian and Wu, 2002). Given the presence of genes involved in cell proliferation, apoptosis, and cell cycle control, and the literature describing known interactions with genes in the same realm, these would be the best genes to focus on, as previous work has been done in these areas.

As its name suggests, IA-1 is involved in the differentiation of cells into those that are insulin-positive (Zhu et al., 2002), and many of the screened targets are insulin-related genes, receptors, growth factors, and insulin itself, making those genes of particular interest. Cell death, response to chemical stimuli or stressors, cell division, cell cycle regulation (possibly resulting in cell proliferation, another grouped category), and many protein-related functions such as binding, transport, folding, and proteolysis were represented as well. Using methods similar to ours (GST pull-down assay and yeast two-hybrid), Xie et al. identified Cbl-associated protein as a positive interactor with IA-1 (Xie et al., 2002). These approaches, in conjunction with the selected target genes screened for, will be useful in confirming the interactions.

2.13 Summary of Results

In the present study, novel consensus sequences have been elucidated for each of the 11 constructs investigated and used to determine potential genes targeted by each transcriptional repressor. Each member has a unique profile of target genes involved in a number of biological processes, though parts of each are also comparable. There are many implications to the findings from this screening, including characterizing target genes related to development and disease. Gene ontology analysis reveals that the biological process groupings are related to genes already known to be targets of the SNAG-ZFPs, such as cell cycle regulation for Gfi-1 or cell adhesion for Snail. This family of transcription factors is employed in a multitude of cellular functions and further studies are necessary to completely explore these findings.

Figure 2.17. Gene ontology grouping. The target genes identified for each SNAG-ZFP play roles in multiple biological processes and molecular functions. These include metabolism, biosynthesis, development, transcription and translation. Genes grouped as having unique function were individual genes with their own function, not categorized in any of the gene ontology categories.

The suggested target genes to investigate first include those with an E-box sequence in their promoter regions and cell adhesion molecules for Slug, Smuc, Snail, and Scratch; cell proliferation, apoptosis, and cell cycle control genes for Gfi-1 and Gfi-1B; insulin-related genes for IA-1; and any disease- or development-related genes for all SNAG-ZFPs investigated. The remaining genes might be interesting to examine in order to reveal the specialized function of each member. In saying that each of these ZFPs is similar but unique, we come across a paradox that makes it difficult to fully understand these regulatory molecules. Only when we can identify each and every SNAG-ZFP to DNA binding site interaction will we grasp the full picture of these transcriptional regulators.

CHAPTER 3. CHARACTERIZATION OF E-BOX

Specific Aims:

1. Test SNAG-member binding to all 16 possible E-box (CANNTG) combinations to determine strongest and weakest interactions

2. Assess binding affinity of SNAG-ZFPs to the E-box sequences by increasing amount of DNA available

3. Assess binding affinity of SNAG-ZFPs to the E-box sequences by increasing amount of protein available

3.1 Functionality of GST-SNAG Fusion Proteins

The GST-SNAG-ZFP fusion proteins were bound to the E-cadherin promoter to confirm functionality. Slug, Smuc, Snail, and Scratch are known binders of the E-cadherin promoter sequence, which contains multiple E-boxes, whereas Gfi-1 and Gfi-1B are not (Figure 3.1). This corresponds to the literature, in which E-cadherin is repressed by the ZFPs tested except for the GFI family members, which do not bind to E-box motifs (Nieto, 2002; Savagner et al., 2005).

Figure 3.1. Gel mobility shift assay of E-cadherin and GST-SNAG-ZFPs. To show functionality of the purified proteins, the GST-SNAG-ZFPs were bound to the E-cadherin promoter, a known binding site of Slug, Smuc, Snail, and Scratch. Bands of DNA-protein complexes were visualized for the proteins known to target E-cadherin, and no interaction was seen with those that do not bind to the promoter. Bands are indicated by the open arrowhead.

3.2 SNAG-ZFPs Are Found to Vary in E-box Sequence Binding

Knowing that these SNAG-ZFPs bind to the E-box, we combined them with the promoter regions of HLH and muscle creatine kinase (MCK) 1 and 2, genes known to contain E-box sequences and previously shown to be bound by Smuc (Kataoka et al., 2000). The pattern of DNA-protein binding showed that the SNAG-ZFPs bound preferentially to MCK2, followed by HLH, and barely interacted, if at all, with MCK1 (Figure 3.2). The E-box sequences for these promoter regions are CACCTG, CAGGTG, and CATGTG, respectively. The differences in the internal dinucleotide sequence led us to believe that these two bases alone may be enough to dictate how strongly the DNA and protein interact.

Figure 3.2. Preliminary E-box binding study. Comparison of the binding affinities of SNAG-ZFPs for the additional E-box-containing promoters HLH and muscle creatine kinase 1 and 2. The distinct pattern of strong binding to HLH and MCK2 but lesser, if any, binding to MCK1 led us to further investigate the differences in the E-box sequence and SNAG-ZFP binding capabilities.

After testing all 16 possible combinations of the internal E-box dinucleotides, as shown in Figure 3.3, we found that the sequences CAGGTG and CACCTG interact strongly with the SNAG-ZFPs, whereas other sequences such as CAACTG and CATATG do not. This suggests that these particular nucleotides play a role in binding with different kinds of E-box-binding proteins. Although it is not necessary for the internal nucleotides of the E-box to be GG or CC for binding to occur, these combinations strengthen contact between protein and DNA. The strongest interaction between a SNAG-ZFP and a target gene is therefore expected for one that has a GG or CC E-box dinucleotide. Flanking nucleotides may also be necessary to facilitate proper binding of a factor.
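
The 16 E-box variants tested here can be enumerated programmatically, as in the sketch below (function names are ours, for illustration). One property worth keeping in mind when reading the comparison is that some hexamers are reverse complements of one another; CACCTG and CAGGTG are such a pair, so on a double-stranded probe they differ in their orientation relative to the flanking sequence, which is consistent with the note above that flanking nucleotides may matter.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def all_eboxes():
    """All 16 CANNTG hexamers, one for each internal dinucleotide."""
    return ["CA" + "".join(pair) + "TG" for pair in product("ACGT", repeat=2)]


eboxes = all_eboxes()
# Hexamers that read the same on both strands.
palindromic = [e for e in eboxes if e == reverse_complement(e)]

print(len(eboxes), "E-box variants")
print("Reverse complement of CACCTG:", reverse_complement("CACCTG"))
print("Palindromic variants:", palindromic)
```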

Figure 3.3. Characterization of E-box. It is known that SNAG-ZFPs such as Slug, Smuc, Snail, and Scratch bind to the E-box site CANNTG. An E-box binding assay with variable internal nucleotide residues reveals that the internal NN nucleotides allow for stronger or weaker binding. When these nucleotides are GG or CC, binding is vastly increased, whereas others such as AC, TA, and GC show limited binding.

3.3 Nature of E-box Sequence Can Control the Strength of Protein Binding

We investigated whether the same SNAG-ZFP can bind differently to different E-box dinucleotide combinations. We tested the CACCTG and CAGGTG combinations with Snail-ZFP and observed that they were bound very differently, as seen in Figure 3.4. This confirms that DNA-protein interactions can vary depending on the nature of the SNAG-ZFP and the nature of the E-box sequences used in binding assays.

As shown in our preliminary E-box characterization assay (Figure 3.3), some dinucleotide combinations bind better than others, confirming expected results. When the bound fraction for the same protein, Snail, was compared quantitatively against the amount of unlabeled E-box sequence, as described for other transcription factors in various other studies (Jacobs et al., 1975; Swirnoff and Milbrandt, 1995; Zimmerman et al., 2008), a stronger competing effect was observed for CAGGTG than for CACCTG. As the amount of unlabeled E-box oligonucleotide is increased, binding complexes are reduced for CAGGTG. However, under the same conditions, CACCTG does not show such a drastic result: binding remains intact even though additional oligonucleotide is present to compete for the Snail ZFP.
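
A bound fraction of this kind is commonly computed from band intensities as bound / (bound + free). The sketch below illustrates that arithmetic only; it assumes densitometry readings are available for the shifted and free bands, the function names are hypothetical, and the intensities are made-up numbers rather than data from this study.

```python
def fraction_bound(bound_signal, free_signal):
    """Fraction of labeled probe in the shifted (protein-bound) band."""
    total = bound_signal + free_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return bound_signal / total


def relative_binding(fractions):
    """Normalize a competition series to the lane without competitor (first entry)."""
    reference = fractions[0]
    if reference <= 0:
        raise ValueError("reference lane shows no binding")
    return [value / reference for value in fractions]


# Made-up (bound, free) intensities for lanes with increasing unlabeled competitor.
lanes = [(800, 200), (400, 600), (100, 900)]
fractions = [fraction_bound(bound, free) for bound, free in lanes]
print(relative_binding(fractions))
```

Plotting the normalized values against the amount of unlabeled oligonucleotide gives one curve per E-box sequence, which is how a stronger or weaker competing effect can be compared.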

The preferential specificity for the GG combination indicates that Snail is more likely to bind this sequence than an E-box containing CC. These two E-box sequences were the most strongly bound of the sixteen combinations. Just as there was a great disparity between these two results, we saw a similar loss of binding for the less effective E-box sequences in the same assays (data not shown); those results are omitted because binding was limited and they gave little information.

Figure 3.4. Differences in E-box dinucleotides dictate binding with the same ZFP. Comparison of Snail-ZFP binding to different E-box combinations shows variation in binding intensities. Different dinucleotide combinations have differential effects when bound to the same SNAG-ZFP.

3.4 Nature of the Protein Can Dictate the Strength of Binding for the Same E-box Sequence

To determine whether different proteins differ in their binding affinities for the same E-box, bound fractions were compared for different SNAG-ZFPs. We tested the Smuc and Snail zinc finger proteins against the same E-box, CAAGTG. As expected, adding increasing quantities of protein to a constant amount of radiolabeled E-box oligonucleotide increased the DNA-protein interaction (Figure 3.5). However, the same E-box dinucleotide combination is bound more readily by some proteins than by others. In this experiment, the CAAGTG sequence was more readily bound by Smuc than by Snail (Figure 3.5), which indicates that Smuc has a higher affinity for the CAAGTG sequence than Snail does. While Smuc exhibited a dose-dependent increase in complex formation, Snail did not show this effect. As in the previous assays, there is a limit to the binding between SNAG-ZFP and DNA, and each member behaves quite differently when bound to different DNA sequences.

Figure 3.5. Different SNAG-ZFPs bind the same E-box sequence with different affinities. Increasing amounts of SNAG-ZFP have varied effects when bound to the same E-box sequence. Representative EMSAs are shown on the right for Smuc and Snail short CAAGTG binding. Graph shows variation in binding when different SNAG proteins are bound to the same E-box dinucleotide combination.

3.5 Conservation of Zinc Fingers May Affect Binding

To determine the cause of these differences in binding affinities, the zinc finger region sequences of the ZFPs were analyzed using ClustalW, an online bioinformatics tool that aligns amino acid sequences based on their best-fit matches. The results show that two of the zinc fingers of each protein are highly conserved (Figure 3.6). The conserved sequences are zinc finger three of Slug, Smuc, and Scratch with finger two of Snail, and finger four of Slug, Smuc, and Scratch with finger three of Snail. The remaining zinc fingers are also conserved, but more moderately than the two sets just mentioned. These other zinc fingers may play a role in binding the extra nucleotides that extend on either side of the E-box. Since zinc fingers bind three nucleotides each, this suggests that these two fingers bind the conserved six-nucleotide CACCTG sequence. The zinc finger to DNA recognition is a coded signature for each interaction (Choo and Klug, 1994). Site-directed mutagenesis could confirm that these two zinc fingers of the SNAG-ZFPs play a role in binding the E-box.
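The reasoning that two fingers suffice for the six-nucleotide E-box follows from the three-nucleotide contact per finger stated above. A minimal Python sketch of that arithmetic, with a hypothetical function name and assuming non-overlapping three-nucleotide subsites, is:

```python
def triplet_subsites(site, contact_length=3):
    """Split a binding site into consecutive subsites, one per zinc finger."""
    return [site[i:i + contact_length] for i in range(0, len(site), contact_length)]


subsites = triplet_subsites("CACCTG")
print(subsites, "fingers needed:", len(subsites))
```

Under this simplification, any additional fingers would be left to contact nucleotides flanking the E-box.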


Figure 3.6. ClustalW alignment of zinc fingers. Although most regions are conserved between the zinc fingers of the various SNAG-ZFPs, the most highly conserved are seen in the third and fourth groupings depicted. The second finger of Snail and the third fingers of Slug, Smuc, and Scratch show high conservation, as do the third finger of Snail and the fourth fingers of Slug, Smuc, and Scratch. Symbols: * perfect, : good, and . some similarity.

3.6 Summary of Results

The known E-box binders Slug, Smuc, Snail, and Scratch were found to bind preferentially to E-boxes with an internal dinucleotide sequence of CC or GG. It was also shown that different SNAG-ZFPs bind the same sequence with different affinities. The internal dinucleotide pair can direct binding as well as strengthen it: a SNAG-ZFP will bind numerous pairings, but some more readily than others.

Since the transcription factors also bind the same DNA sequence with different relative affinities, other residues in the zinc fingers may help to facilitate additional binding. Because zinc fingers bind three nucleotides each, we believe that the two highly conserved zinc fingers of each ZFP direct this DNA-protein interaction.

These results not only show how a few nucleotides can change the binding affinity of some DNA-protein interactions, but also provide information about where each SNAG-ZFP may bind. This new information, which has not been previously studied, can be used along with the binding site selection data to infer which genes each SNAG-ZFP may interact with.


CHAPTER 4. MOLECULAR CHARACTERIZATION AND IDENTIFICATION OF INTERACTING PROTEIN PARTNERS FOR SNAG FAMILY

Specific Aims:

1. Identify Slug domain and Scratch domain-interacting proteins via GST pull-down assay

2. Identify Slug domain, Scratch domain, and full-length SNAG-ZFP-interacting proteins via yeast two-hybrid assay

3. Determine potential cofactors involved in protein-protein interactions

4.1 Identification and Characterization of SNAG-ZFP Novel Protein-Protein Interactions

The second aim of this dissertation research is to determine the interactions that occur between cellular proteins and the SNAG-ZFPs. Efforts were made to identify novel protein partners that interact with the Slug and Scratch domains, which are shown in the diagram of the GST-tagged SNAG-ZFP constructs in Figure 4.1. These domains are highly conserved among the same proteins of different organisms, which suggests that they play a role in some necessary process; otherwise they would likely have been lost over time. ClustalW analysis of the Slug domain among lower and higher organisms, and of the paralogs Scratch 1 and 2 among a number of organisms, indicates this strong conservation (Figure 4.2).

Figure 4.1. Diagrammatic representation of the GST-Slug domain and GST-Scratch domain constructs. DNA segments corresponding to the indicated amino acids were PCR amplified using the full-length genes of the SNAG domain transcription factor family as templates. Domains were fused with a GST tag to generate recombinant fusion proteins.

The identification of proteins that interact with the SNAG-ZFPs was approached in multiple ways to obtain the best results. Binding assays were done for the Slug and Scratch domains, and the Clontech Matchmaker Gold Yeast Two-Hybrid System was employed for both the Slug and Scratch domains and the full-length forms of all ZFPs.


Figure 4.2. ClustalW alignment of Slug domains and Scratch domains between species. The Slug and Scratch domains are highly conserved among a host of organisms, suggesting some importance. They are found exclusively in the same proteins of the different organisms. The boxes around the amino acid sequences show the actual domain sequence, while the additional amino acids on either side show the flanking regions added in the constructs made for each domain.

4.2 GST-Pull-Down Assay

The GST-pull down assay was the first method employed to determine potential protein-protein interactions with the Slug and Scratch domains, using these proteins as bait for other proteins.

The Slug and Scratch domains were processed from bacterial culture to purified GST-tagged fusion protein through protein induction, protein preparation, and column chromatography. As was done previously for the full-length ZFPs in the DNA binding site selection assay, the domains were successfully induced (Figure 4.3A), the lysate was prepared in sodium lauryl sulfate (SLS) (Figure 4.3B), and the proteins were purified via column chromatography (Figure 4.3C and D).

A cell line was sought in which Slug and Scratch are well expressed, on the reasoning that the domain proteins are likely to participate in protein-protein interactions there. Using online microarray databases such as BioGPS (http://biogps.org) and the Cancer Genome Anatomy Project (http://cgap.nci.nih.gov), the expression of Slug and Scratch in multiple cell lines was assessed (Figure 4.4). The human embryonic kidney cell line HEK293T was identified and used for the following experiment. The resulting elutions of the binding assay between nuclear extract and SNAG-ZFPs were boiled, loaded on an SDS-PAGE gel, and silver stained to visualize the proteins (Figure 4.5).


Figure 4.3. Slug and Scratch domain protein expression. A. Protein induction demonstrating Slug and Scratch domain fusion proteins. GST-Slug and GST-Scratch fusion proteins were induced (I) and run alongside their uninduced (UI) counterparts. Broad range marker (M) and GST control are also shown. B. Purification of domain proteins by sodium lauryl sulfate. C. Purification of GST-Slug domain and D. GST-Scratch domain. Panels of induced cell culture (I), sonicated extract (SE), flowthrough of loaded material (FT), wash (W), and elutions (E1-E3) are shown on SDS-PAGE gels stained with Coomassie Blue.


Figure 4.4. Microarray analysis of Slug and Scratch ZFPs. CGAP database was used to assess expression of Slug and Scratch in various tissues. A cell line was sought in which both transcription factors were expressed at average or above average levels.

4.3 Identification of a Putative Slug Domain-Interacting Protein

The gels were analyzed by comparing the unbound protein domains with the domains bound to HEK293T nuclear extract. Any band visible in the binding lanes across the washes of varying salt concentration, but absent from the GST control, would be taken as a significant finding. For the interactions between the Slug and Scratch domains and the HEK293T nuclear extract, the results were difficult to interpret. Thorough visual analysis was done to identify the presence of unique protein bands.

For the Slug domain, a single band met these conditions and was considered a putative protein-protein interaction band; it is marked by a box in Figure 4.5. Similar bands were sought in the remainder of the gel and in the Scratch domain gel. Unfortunately, for the Scratch domain samples, comparison between samples showed no prominent or distinct bands in one set versus another. Due to the large number of proteins found in the cell line, it proved challenging to discern one independent band from another.

Figure 4.5. Binding assays of the Slug domain and Scratch domain. The domains were bound with HEK293T nuclear extract to identify protein-protein interactions. These binding assays were performed at various binding buffer concentrations, run on SDS-PAGE gels, and silver stained.

4.4 Identification of Interacting Partners by Yeast Two-Hybrid Screening

To gain a better understanding of the protein-protein interactions of the Slug and Scratch domains, we employed the yeast two-hybrid method as an alternative means to identify protein interactors. This method was also used to identify proteins interacting with the SNAG-ZFPs in their full-length form. In the Clontech yeast two-hybrid assay (Figure 4.6), a bait protein is fused to a Gal4 DNA-binding domain and a prey protein is fused to a Gal4 activation domain. When the bait and prey fusion proteins interact, four reporter genes are activated: AUR1-C (an antibiotic resistance gene), ADE2 and HIS3 (biochemical markers), and MEL1 (a colorimetric marker). This is a strictly controlled assay designed to decrease the likelihood of self-activation and to limit the number of falsely interacting proteins.

Figure 4.6. Clontech yeast two-hybrid screening background. The yeast two-hybrid method is under the control of four reporter genes, ensuring true protein interactors. (Figure courtesy Clontech.)

4.5 Cloning and Sequencing of pGBKT7-SNAG-ZFP Bait

Using the Clontech Matchmaker Gold kit, maxi-sized preps were generated for the provided plasmids and digested to confirm correct sizing (Figure 4.7). Once again, primers flanking the SNAG-ZFPs of interest were designed to obtain PCR products of the SNAG-ZFPs and domains (Figure 4.8). After the samples were processed by gene-cleaning, they were double digested using BamHI and EcoRI, electrophoresed on an agarose gel, and gene-cleaned for use in ligation and transformation (Figure 4.9).
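Before a BamHI/EcoRI double digest is used for cloning, it can be useful to confirm that the insert itself carries no internal recognition site for either enzyme. The Python sketch below shows one way to check this. It is an illustration rather than the procedure used in this work, and the function name and example sequence are hypothetical. The recognition sequences GGATCC (BamHI) and GAATTC (EcoRI) are the standard ones.

```python
import re

# Recognition sequences of the two enzymes used in the double digest
RECOGNITION_SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}


def find_sites(sequence):
    """Return the 0-based start positions of each recognition site."""
    sequence = sequence.upper()
    return {
        enzyme: [m.start() for m in re.finditer("(?=" + motif + ")", sequence)]
        for enzyme, motif in RECOGNITION_SITES.items()
    }


# Hypothetical sequence with one BamHI site and one EcoRI site
example = "AAGGATCCTTGAATTCAA"
print(find_sites(example))
```

An insert that returns empty lists for both enzymes would be expected to remain intact during the digest, with cutting confined to the sites introduced by the primers.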

Figure 4.7. Plasmid preparation for yeast two-hybrid assay. Maxi prep samples of Clontech provided plasmids for yeast two-hybrid assay (right) were digested to confirm that the transformed plasmids were of proper sizes. Samples were run on a 1% agarose gel.


Figure 4.8. PCR amplification of SNAG-ZFPs for use in yeast two-hybrid assay. Samples were run on a 1% agarose gel.

Figure 4.9. Confirmation of SNAG-ZFPs and vector for yeast two-hybrid assay. Double digestion of PCR amplified SNAG-ZFPs with EcoRI and BamHI to confirm the correct products were amplified. pGBKT7 vector was also digested. Samples were run on a 1% agarose gel.


4.6 Yeast Transformation and Protein Expression

SNAG-ZFP constructs were successfully ligated into pGBKT7 and transformed into competent yeast cells. Multiple colonies from each construct were mini-plasmid prepared and digested to check for inserts, indicating successful transformation. Samples were run on gels (Figure 4.10) to identify at least three positive clones for each construct. Positive DNA samples were then cleaned using RNase and PCI/CI and prepared at a midi scale (Figure 4.11). These were additionally digested to confirm the proper sizes of the preps (Figure 4.12). From the colonies containing the pGBKT7-bait plasmids, proteins were expressed, collected, and run on a Western blot to show positive protein expression in a yeast system (Figure 4.13).

Figure 4.10. Yeast recombinant clones as a result of ligation and transformation. Multiple colonies for each construct were mini plasmid prepared, digested, and run on a 1% agarose gel and a 10% DNA polyacrylamide gel for the smaller Slug and Scratch domains to analyze inserts.


Figure 4.11. Purified positive transformants. RNase/PCI/CI purified midis of positively transformed clones run on a 1% agarose gel.

Figure 4.12. Confirmation of positive transformants by double digestion. Midi preps were additionally digested to confirm they were of the correct insert sizes and run on a 1% agarose gel.

Figure 4.13. Western blot of yeast protein. A Western blot of two samples of full-length Snail (264 amino acids, ~29 kDa) protein expressed in yeast, indicated by the arrowhead.
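The stated size of approximately 29 kDa for a 264 amino acid protein is consistent with the common rule of thumb of roughly 110 Da per residue. The short Python sketch below shows the arithmetic. The average residue mass is an approximation assumed here, the function name is hypothetical, and the estimate ignores any tag or fusion partner carried by the expressed protein.

```python
# Approximate average mass of one amino acid residue, in daltons (rule of thumb)
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(residue_count):
    """Estimate protein mass in kDa from the number of residues."""
    return residue_count * AVERAGE_RESIDUE_MASS_DA / 1000.0


print(round(estimate_mass_kda(264), 1))
```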

4.7 Autoactivation and Toxicity Assays Followed by Yeast Mating

Several safeguards were taken to restrict false positive results, including an autoactivation assay, to ensure that the SNAG-ZFP does not autonomously activate the reporter genes, and a toxicity assay, to ensure that the proteins are not toxic when expressed in yeast. On an SD/-Trp plate, all constructs are expected to grow. On the same plate supplemented with X-α-gal, a chromogenic substrate for α-galactosidase, the MEL1 marker within the plasmid should react and cause a colorimetric change of the colonies to blue. Finally, on the most stringent plate, SD/-Trp/X-α-gal/aureobasidin A (SD/-Trp/X-α-gal/A), none of the colonies should grow, due to a lack of antibiotic resistance. The results of this assay, shown in Figure 4.14, indicated that Gfi-1 and Gfi-1B Hs were autoactivating.
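The decision rule of the autoactivation test can be stated compactly: a bait that grows on the permissive SD/-Trp plate and also on the most stringent plate is flagged as autoactivating. The Python sketch below encodes only that rule. The bait names and growth observations are hypothetical placeholders, not results from this study.

```python
PERMISSIVE_PLATE = "SD/-Trp"
STRINGENT_PLATE = "SD/-Trp/X-α-gal/A"


def is_autoactivating(growth):
    """Flag a bait that grows on the most stringent plate without prey.

    growth: dict mapping plate name to True (growth) or False (no growth).
    """
    if not growth[PERMISSIVE_PLATE]:
        # No growth on the permissive plate means the test is uninformative
        raise ValueError("no growth on SD/-Trp; check transformation or toxicity")
    return growth[STRINGENT_PLATE]


# Hypothetical observations for two bait constructs
observations = {
    "bait_1": {PERMISSIVE_PLATE: True, STRINGENT_PLATE: False},
    "bait_2": {PERMISSIVE_PLATE: True, STRINGENT_PLATE: True},
}
flagged = [name for name, growth in observations.items() if is_autoactivating(growth)]
print(flagged)
```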

Figure 4.14. Autoactivation test. Expected results are indicated below each plate. If SNAG-ZFP expressing yeast autoactivate the markers, they will grow on the most stringent of plates, SD/-Trp/X-α-gal/A, due to antibiotic resistance.


Gfi-1 and Gfi-1B Hs were not considered for further study in their present form. To remedy this, the ZFPs should be truncated and smaller segments tested. By using smaller fragments in the yeast two-hybrid assay and repeating the autoactivation test, we can determine which section of each SNAG-ZFP causes the autoactivation; alternatively, another kit with different markers may be necessary to investigate these SNAG-ZFPs. If toxic, the ZFPs expressed from the pGBKT7 bait plasmid should cause the yeast to grow more slowly than normal. This was not the case for the remaining SNAG-ZFPs.

4.8 Screening of Positively Interacting Proteins

After a human cDNA prey library was combined with the bait plasmid, yeast cells were mated, and a three-day incubation period yielded blue colonies on the plate, indicating an interaction between the bait and prey fusion proteins. The colonies were spotted onto a double dropout (DDO) plate to confirm growth as red colonies and, in parallel, onto a quadruple dropout (QDO)/X-α-gal/A plate, where growth as blue colonies ensures that all four markers are utilized. After the colonies had grown, those from the DDO plate were re-streaked onto a QDO/X-α-gal/A plate as a replica and a second confirmation of the results. The blue colonies from the replica plating were grown up and saved as glycerol stocks for further studies.

From this procedure, more than 40 potential Scratch-ZFP interactors were grown. This construct was not found to autoactivate or to be toxic, and replica plating revealed genuine interactions, as indicated by the growth of blue colonies on a QDO plate (Figure 4.15). Although fewer in number, additional interactors were also found for the Slug domain, the Scratch domain, and the full-length constructs of Snail and Smuc. Multiple matings and screenings were done for each member to yield these results.

Figure 4.15. Positive Scratch-ZFP-protein interactions. Replica plating shows that, on a quadruple dropout medium supplemented with X-α-gal and aureobasidin, blue colonies grow, indicating a protein-protein interaction.

4.9 Sequencing and Identification of Positively Interacting Proteins

The yeast plasmid preps (Figure 4.16) were double digested with NdeI and EcoRI to determine which cells contained positively transformed inserts (Figure 4.17). These positively checked transformants were RNase/PCI/CI purified (Figure 4.18), and all were sent for high-throughput cycle sequencing in a 96-well format. To determine which genes they were, the clones were sequenced from both the 5’ and 3’ ends in order to doubly confirm true interactors, and in some instances more than one clone of each independent interactor was sequenced.

Figure 4.16. Purified positively transformed clones. Positively selected clones were grown on a midi scale and plasmids were prepared using a glass bead method and PCI/CI purification. These were then used for electroporation into electrocompetent DH5α cells.


Figure 4.17. NdeI/EcoRI double digests of mini prep plasmid DNA. Electroporated cells with yeast plasmid DNA were grown on SOC agar, and colonies were checked for transformants.

Figure 4.18. RNase/PCI/CI purification of positive yeast two-hybrid transformants. Only positive transformants were selected and further purified to be sent for sequencing.


4.10 Protein-Protein Interactors Involved in Many Functions

Protein-protein interactions may be necessary for repression by SNAG-ZFPs, and via the yeast two-hybrid screening we were able to identify multiple potential interactors for the full-length Scratch, Smuc, and Snail ZFPs as well as the Slug and Scratch domains. After sequencing the positively interacting clones obtained from the yeast two-hybrid assay, the gene results derived from BLAST searches of the 5’ and 3’ sequences were compared in Table 1.

Table 1. Yeast two-hybrid screening results at a nucleotide level

Clone | T7 Gene Description | 3' AD Gene Description | In frame
Slug Domain cl 7-1 | Homo sapiens CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) small phosphatase like 2 (CTDSPL2), mRNA | Homo sapiens CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) small phosphatase like 2 (CTDSPL2), mRNA |
Slug Domain cl 7-3 | Homo sapiens CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) small phosphatase like 2 (CTDSPL2), mRNA | Homo sapiens CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) small phosphatase like 2 (CTDSPL2), mRNA |
Scratch Domain cl 1-1 | PREDICTED: Homo sapiens hypothetical LOC100507659 (LOC100507659), partial miscRNA | PREDICTED: Homo sapiens hypothetical LOC100507659 (LOC100507659), partial miscRNA |
Scratch Domain cl 1-3 | PREDICTED: Homo sapiens hypothetical LOC100507659 (LOC100507659), partial miscRNA | PREDICTED: Homo sapiens hypothetical LOC100507659 (LOC100507659), partial miscRNA |
Scratch Domain cl 2-1 | PREDICTED: Homo sapiens hypothetical LOC100507659 (LOC100507659), partial miscRNA | PREDICTED: Homo sapiens hypothetical LOC100507659 (LOC100507659), partial miscRNA |
Scratch Domain cl 2-3 | PREDICTED: Homo sapiens hypothetical LOC100507659 (LOC100507659), partial miscRNA | PREDICTED: Homo sapiens hypothetical LOC100507659 (LOC100507659), partial miscRNA |
Scratch cl 1-1 | Homo sapiens tripartite motif-containing 59 (TRIM59), mRNA | Homo sapiens tripartite motif-containing 59 (TRIM59), mRNA | *
Scratch cl 1-2 | Homo sapiens tripartite motif-containing 59 (TRIM59), mRNA | Homo sapiens tripartite motif-containing 59 (TRIM59), mRNA | *
Scratch cl 3-1 | Homo sapiens PIH1 domain containing 2 (PIH1D2), transcript variant 1, mRNA | Homo sapiens PIH1 domain containing 2 (PIH1D2), transcript variant 1, mRNA | *
Scratch cl 10-1 | Homo sapiens synergin, gamma (SYNRG), transcript variant 3, mRNA | Homo sapiens synergin, gamma (SYNRG), transcript variant 3, mRNA | *
Scratch cl 10-2 | Homo sapiens synergin, gamma (SYNRG), transcript variant 3, mRNA | Homo sapiens synergin, gamma (SYNRG), transcript variant 3, mRNA | *
Scratch cl 11a-1 | Homo sapiens BCL2/adenovirus E1B 19kDa interacting protein 2 (BNIP2), mRNA | Homo sapiens BCL2/adenovirus E1B 19kDa interacting protein 2 (BNIP2), mRNA |
Scratch cl 11b-2 | Homo sapiens vacuolar protein sorting 26 homolog A (S. pombe) (VPS26A), transcript variant 1, mRNA | Homo sapiens vacuolar protein sorting 26 homolog A (S. pombe) (VPS26A), transcript variant 1, mRNA |
Scratch cl 13a-1 | Homo sapiens nuclear transport factor 2-like export factor 2 (NXT2), mRNA | Homo sapiens nuclear transport factor 2-like export factor 2 (NXT2), mRNA |
Scratch cl 13b-1 | No significant similarity | Homo sapiens nuclear transport factor 2-like export factor 2 (NXT2), mRNA |
Scratch cl 14-1 | Homo sapiens synaptotagmin binding, cytoplasmic RNA interacting protein (SYNCRIP), transcript variant 1, mRNA | Homo sapiens synaptotagmin binding, cytoplasmic RNA interacting protein (SYNCRIP), transcript variant 5, mRNA |
Scratch cl 16-1 | Homo sapiens Na+/H+ exchanger domain containing 1 (NHEDC1), transcript variant 1, mRNA | Homo sapiens Na+/H+ exchanger domain containing 1 (NHEDC1), transcript variant 1, mRNA | *
Scratch cl 16-2 | Homo sapiens Na+/H+ exchanger domain containing 1 (NHEDC1), transcript variant 1, mRNA | Homo sapiens Na+/H+ exchanger domain containing 1 (NHEDC1), transcript variant 1, mRNA | *
Scratch cl 18-3 | Homo sapiens membrane-associated ring finger (C3HC4) 5 (MARCH5), mRNA | Homo sapiens membrane-associated ring finger (C3HC4) 5 (MARCH5), mRNA | *
Scratch cl 19-4 | Homo sapiens ATPase, Na+/K+ transporting, alpha 1 polypeptide (ATP1A1), transcript variant 1, mRNA | Homo sapiens ATPase, Na+/K+ transporting, alpha 1 polypeptide (ATP1A1), transcript variant 1, mRNA | *
Scratch cl 20-1 | Homo sapiens centrosomal protein 70kDa (CEP70), mRNA | Homo sapiens chromosome 3 genomic contig, GRCh37.p2 reference primary assembly | *
Scratch cl 22-1 | Homo sapiens chromosome 1 genomic contig, GRCh37.p2 reference primary assembly | Homo sapiens chromosome 1 genomic contig, GRCh37.p2 reference primary assembly |
Scratch cl 25-1 | Homo sapiens zinc finger, DHHC-type containing 2 (ZDHHC2), mRNA | Homo sapiens zinc finger, DHHC-type containing 2 (ZDHHC2), mRNA |
Scratch cl 27-1 | Homo sapiens chromosome 15 open reading frame 17 (C15orf17), mRNA | Homo sapiens chromosome 15 open reading frame 17 (C15orf17), mRNA |
Scratch cl 28-1 | Homo sapiens hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit (HADHB), nuclear gene encoding mitochondrial protein, mRNA | Homo sapiens hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit (HADHB), nuclear gene encoding mitochondrial protein, mRNA | *
Scratch cl 28-2 | Homo sapiens hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit (HADHB), nuclear gene encoding mitochondrial protein, mRNA | Homo sapiens hydroxyacyl-CoA dehydrogenase/3-ketoacyl-CoA thiolase/enoyl-CoA hydratase (trifunctional protein), beta subunit (HADHB), nuclear gene encoding mitochondrial protein, mRNA | *
Scratch cl 30-1 | No significant similarity | No significant similarity |
Scratch cl 31-3 | Homo sapiens SAM domain, SH3 domain and nuclear localization signals 1 (SAMSN1), mRNA | No significant similarity | *
Scratch cl 35-1 | Homo sapiens NADH dehydrogenase (ubiquinone) Fe-S protein 6, 13kDa (NADH-coenzyme Q reductase) (NDUFS6), nuclear gene encoding mitochondrial protein, mRNA | Homo sapiens NADH dehydrogenase (ubiquinone) Fe-S protein 6, 13kDa (NADH-coenzyme Q reductase) (NDUFS6), nuclear gene encoding mitochondrial protein, mRNA |
Scratch cl 36-1 | Homo sapiens myosin VA (heavy chain 12, myoxin) (MYO5A), transcript variant 1, mRNA | Homo sapiens myosin VA (heavy chain 12, myoxin) (MYO5A), transcript variant 1, mRNA |
Scratch cl 39-1 | Homo sapiens biorientation of chromosomes in cell division 1-like (BOD1L), mRNA | Homo sapiens biorientation of chromosomes in cell division 1-like (BOD1L), mRNA | *
Scratch cl 40-1 | Homo sapiens interleukin 6 signal transducer (gp130, oncostatin M receptor) (IL6ST), transcript variant 1 | Homo sapiens interleukin 6 signal transducer (gp130, oncostatin M receptor) (IL6ST), transcript variant 1 | *
Scratch cl 40-2 | Homo sapiens interleukin 6 signal transducer (gp130, oncostatin M receptor) (IL6ST), transcript variant 1 | No significant similarity | *
Smuc cl 21-1 | Homo sapiens lysine (K)-specific demethylase 3A (KDM3A), transcript variant 1, mRNA | Homo sapiens lysine (K)-specific demethylase 3A (KDM3A), transcript variant 1, mRNA | *
Smuc cl 21-2 | Homo sapiens lysine (K)-specific demethylase 3A (KDM3A), transcript variant 1, mRNA | Homo sapiens lysine (K)-specific demethylase 3A (KDM3A), transcript variant 2, mRNA | *
Smuc cl 10-1 | Homo sapiens Dmx-like 1 (DMXL1), mRNA | Homo sapiens Dmx-like 1 (DMXL1), mRNA | *
Snail cl 3-1 | Homo sapiens chaperonin containing TCP1, subunit 3 (gamma) (CCT3), transcript variant 1, mRNA | No significant similarity | *
Snail cl 3-2 | Homo sapiens chaperonin containing TCP1, subunit 3 (gamma) (CCT3), transcript variant 1, mRNA | No significant similarity | *
Snail cl 11-1 | Homo sapiens gamma-aminobutyric acid (GABA) A receptor, beta 3 (GABRB3), transcript variant 1, mRNA | Homo sapiens gamma-aminobutyric acid (GABA) A receptor, beta 3 (GABRB3), transcript variant 1, mRNA |
Snail cl 11-2 | Homo sapiens gamma-aminobutyric acid (GABA) A receptor, beta 3 (GABRB3), transcript variant 1, mRNA | Homo sapiens gamma-aminobutyric acid (GABA) A receptor, beta 3 (GABRB3), transcript variant 1, mRNA |

* in the right-hand column denotes a true positive protein interacting partner. Absence of * indicates an out-of-frame amino acid sequence.

In nearly all of the 40 positively interacting cDNA clones obtained from the yeast two-hybrid assay, sequencing identified an insert. Only one plasmid could not be sequenced. In five cases, sequencing from one direction was unsuccessful but the other direction provided the sequence. At the amino acid level, nearly half of the interactors were in frame. One of the misleading aspects of the two-hybrid method is that screened interactors may be identified at the nucleotide level even though the cDNA is not fused in frame. Those that were in frame are denoted by an asterisk in Table 1. Only those that were in frame will be investigated further.
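Whether a prey cDNA is fused in frame comes down to modular arithmetic on positions. As a minimal sketch, assume the reading frame of the activation domain is known within the sequencing read and that a BLAST alignment gives matching positions in the read and in the annotated mRNA. The insert is in frame when both positions fall at the same codon phase. All parameter names below are hypothetical, and the coordinates in the example are invented for illustration.

```python
def is_in_frame(read_codon_start, read_align_start, mrna_align_start, cds_start):
    """Check whether the vector reading frame matches the gene's coding frame.

    read_codon_start: index in the read of the first base of any codon in the
        activation-domain reading frame.
    read_align_start: index in the read where the alignment to the mRNA begins.
    mrna_align_start: index in the mRNA where that alignment begins.
    cds_start: index in the mRNA of the first base of the start codon.
    All indices are 0-based.
    """
    vector_phase = (read_align_start - read_codon_start) % 3
    gene_phase = (mrna_align_start - cds_start) % 3
    return vector_phase == gene_phase


# Invented coordinates for illustration only
print(is_in_frame(0, 12, 112, 100))  # phases match
print(is_in_frame(0, 12, 113, 100))  # shifted by one base
```

A matching phase is necessary but not sufficient: an insert that begins in the 5' untranslated region can still contain a stop codon upstream of the coding sequence, so the translated junction would also need to be inspected.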

None of the Slug domain- or Scratch domain-interacting proteins were found to be in frame. For the Slug domain, carboxy-terminal domain RNA polymerase II phosphatase was identified as an interacting protein at the nucleotide level; at the protein level, however, the translation was not in frame. For the Scratch domain, the same outcome was found, but for a hypothetical protein. This may be because the domain is of unknown function, so the proteins with which it interacts are not yet known. It would be interesting to see what the hypothetical protein is determined to be, as two separately selected interactors provided the same result. The only information known about this gene as of this writing is that it is on chromosome 4.

The Scratch full-length protein was the most successful SNAG-ZFP in the yeast two-hybrid assay. The most interesting screened interactors that were in frame are synergin, Na+/H+ exchanger domain, membrane-associated ring finger (C3HC4) 5, ATPase Na+/K+ transporting alpha 1, and interleukin 6 signal transducer. Scratch, being one of the longer SNAG-ZFPs investigated, may play a role in more protein-protein interactions. In two separate instances, the same SNAG-ZFP selected two similar genes involved in ion transport and/or exchange. This may be good evidence that these genes play some role with Scratch.

For Smuc, interestingly, lysine-specific demethylase 3A (KDM3A) was found to be a protein-protein interactor. This is also a zinc finger protein and plays a role in hormone-dependent transcriptional regulation with the DNA-binding factor ER71 (Knebel et al., 2006). This demethylase could similarly be acting as a cofactor with Smuc to mediate transcriptional repression. The transcription factor ER71 targets MMP-1, and KDM3A acts as a corepressor working in conjunction with ER71. Another matrix metalloproteinase, MMP-11, was selected in the binding site selection assay, further suggesting that these factors work in collaboration with each other.

A gene ontology analysis was done for the positively selected genes to determine whether there is any similarity in biological process or molecular function. Of the 18 independent genes selected for Scratch interaction, 15 were found in GOstat. The largest grouping for these genes was in the cytoplasm: ten were related to this cellular location, and eight of those ten were also associated with the organelle. The remaining genes were integral to membrane function. These results suggest that Scratch is involved in a host of different pathways and is not specialized to one major location in the cell or one particular function. That leads us to believe that the SNAG-ZFPs work in a broad manner while targeting specific genes.

Finally, for Snail, gamma-aminobutyric acid A receptor, which functions as part of a multi-subunit chloride channel receptor in the nervous system, was confirmed at a nucleotide level but not at the protein level. TCP1 was found to be in frame, but the 3' sequencing was not conclusive.

4.11 Protein-Interactor Expression Is Spread Throughout Multiple Tissues

To further investigate the role of these genes in protein-protein interactions with SNAG-ZFPs, expression data was obtained for each of the coding genes derived from the yeast two-hybrid assay.

Given the hypothesis that SNAG-ZFPs interact with proteins, we expect to see that the genes that they regulate should also be found in those same tissues the ZFPs are expressed in. For Slug, this is the case for several tissues. It is highly over-expressed in breast tissue, cancerous tissue, normal kidney, cancerous liver, and normal peritoneum, prostate, and uterus tissue.

Looking at the expression of CTDSPL2, we see that it is also highly expressed in just one of those, normal kidney. Perhaps the regulation of this gene by Slug is in effect down-regulating itself in the rest of these tissues.

As a whole, the group of proteins selected for Scratch showed no similarity in tissue expression. Only overexpression in thyroid tissue, cancerous or not, was seen, while the rest of the tissues showed no particular pattern. Scratch is not expressed at high levels in the thyroid, so again the presence of one protein is accompanied by the lack of the other. Here, the lack of Snail and the presence of proteins thought to interact with it suggest that there is a reciprocal relationship between these. If further investigation is to be done, all proteins should be tested to see whether tissue expression is important or not.


Figure 4.19. SAGE expression data of yeast two-hybrid selected protein interactors. Proteins selected for in the yeast two-hybrid screening show broad expression in a number of tissues; however, many are highly expressed in cancerous versus normal tissues, and more specifically, thyroid cancer.

4.12 Summary of Results

The two-hybrid method was useful in collecting information about potential protein interactors of SNAG-ZFPs, and this furthers the suggestion that SNAG-ZFPs, and C2H2 zinc fingers in general, can play a role in protein-mediated transcriptional regulation. The implications are again broad for the proteins with which they interact, just as they are for the potential genes repressed by SNAG-ZFPs, and this shows that this family of transcription factors is involved in a broad range of functions and pathways. As a group they may be playing a role in many processes that are distinct from each other, each providing a specialized role in transcriptional regulation. These results are a stepping stone to the next discovery of the SNAG-ZFP role in transcription, namely, mechanisms and activity in vivo.


CHAPTER 5. IN VIVO VALIDATION OF BINDING SITE SELECTED TARGET GENES

Specific Aims:

1. Construct a plan to confirm SNAG-ZFPs’ repressive effects on in vitro selected target genes

2. Assess repressive activity of genes in cell culture

5.1 Luciferase Assay Scheme

To validate findings of the binding site selection assay, a selection of the target genes screened in vitro is to be used in an in vivo luciferase assay.

From the list of target genes, one or two were chosen for each construct to test in the luciferase assay (Table 2). Selection of these genes was based on their possession of an E-box sequence or on the longest possible binding site sequence match pulled from the binding assay. This includes individually screened clones, so sequences may not match the built consensus sequences; testing them will also indicate whether individually selected sequences bind specifically.
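The first selection criterion, possession of an E-box, can be illustrated with a short sketch. It assumes the E-box is written as the pattern CANNTG (two unspecified internal nucleotides) and checks a handful of the binding sequences listed in Table 2. The dictionary and function names are illustrative only and are not part of the original analysis. Because the outer bases of CANNTG are self-complementary, scanning one strand is sufficient to detect the motif.

```python
import re

# E-box written as CANNTG, where N is any nucleotide (assumed pattern).
EBOX = re.compile(r"CA[ACGT]{2}TG")

# A few binding sequences copied from Table 2 (target gene -> sequence).
binding_sites = {
    "IAPP": "TGCTGCCTG",
    "CST4": "CACCTGCCT",
    "CEACAM5": "CACGTGATG",
    "ACTA1": "GGGCAACTG",
    "MMP11": "CACCTGTC",
    "PDGFB": "AGGCTGTC",
}

def has_ebox(seq):
    """Return True if the sequence contains a CANNTG motif."""
    return EBOX.search(seq.upper()) is not None

# Target genes whose selected binding sequence carries an E-box.
ebox_genes = sorted(g for g, s in binding_sites.items() if has_ebox(s))
print(ebox_genes)
```

Sequences without an E-box, such as those for IAPP and PDGFB, would fall under the second criterion (longest binding site match).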

The plasmids pGL3, as promoter-less luciferase reporter, and pQE30, as TAT fusion protein carrier, were chosen. To test both the derived binding site selected sequence and chosen target genes containing matched sites in their promoter region, two pGL3 constructs were designed. β-actin was chosen as a basic mammalian promoter. The pQE30 TAT plasmid was utilized to produce His-tagged fusion proteins. This plasmid has an inherent His6 tag, so in one construct the SNAG-ZFP and TAT protein were inserted downstream of it, while in the other construct the inherent His6 was removed and placed between the SNAG-ZFP and TAT. This is to reduce possible steric hindrance from the large His tag blocking the SNAG-ZFP.

Figure 5.1. Plasmid constructs for luciferase assay. Two constructs of each plasmid were designed. pGL3 is constructed with β-actin promoter. For pQE30, one construct consisted of TAT protein and the other His/TAT protein, while both had the SNAG-ZFP.

5.2 pGL3 and pQE30 Construction

pGL3 construction started with amplification of the promoter regions of the target genes and β-actin, using two cell lines for genomic DNA extraction: lung and a retinal pigment epithelium (RPE) cell line (Figure 5.2). Using the RPE genomic DNA as a template, promoters were amplified (Figure 5.3).


Table 2. Selected target genes for luciferase assay

Gene | Target Gene Name | Target Gene Description | Binding Sequence
Slug | IAPP | islet amyloid polypeptide | TGCTGCCTG
Slug | SDC1 | syndecan 1 | GGTGGGGGG
Smuc | MT1 | metallothionein I | TGCACACTG
Smuc | CST4 | cystatin S | CACCTGCCT
Snail_L | CEACAM5 | carcinoembryonic antigen-related cell adhesion molecule 1 | CACGTGATG
Snail_S | ADH2 | alcohol dehydrogenase 2 | GGGTGTGG
Scratch_L | ACTA1 | skeletal alpha-actin | GGGCAACTG
Scratch_L | APOE | apolipoprotein E | GCCCCACCT
Scratch_S | PDGFB | platelet-derived growth factor chain B | AGGCTGTC
E-box | MMP11 | matrix metalloproteinase 11 (stromelysin 3) | CACCTGTC
E-box | LGALS4 | lectin, galactoside-binding, soluble, 4 (galectin 4) | CACCTGTC
Gfi-1 | TMBS1 | thrombospondin 1 | GCCCGCGTGG
Gfi-1 | RAD51C | RAD51 homolog C (S. cerevisiae) | CCCGTGCGG
Gfi-1B_Hs | SERPINE | plasminogen activator inhibitor type 1 | GTGAGTGG
Gfi-1B_Hs | LTA | Tumor necrosis factor-alpha, TNF or TNFA gene | CTCCTCTCG
Gfi-1B_Mm | PDGFA | PDGF, A-chain (platelet-derived growth factor A-chain) | GGGCGCGGCGG
Gfi-1B_Mm | C-FOS | FBJ murine osteosarcoma viral oncogene homolog | GGCCGCGG
IA-1_L | SMARCB1 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily b, member 1 | CGGCGGCG
IA-1_S | COL1A1 | A1(1) COL (alpha1(I) collagen) | GGGGCCGGG


Figure 5.2. Genomic DNA extraction from lung and RPE cell lines. Genomic DNA was extracted from two cell lines and run on a 0.7% agarose gel after measuring DNA concentration with a spectrophotometer.

Figure 5.3. PCR amplified target gene promoters. PCR products were gene cleaned, then loaded on a 10% DNA-PAGE to assess proper gene extraction based on size.

SNAG-ZFPs were amplified using previously purchased clones (Figure 5.4) for construction of the pQE30 plasmid. TAT and His/TAT sequences to be inserted were made in the form of oligonucleotides.


Figure 5.4. PCR amplified SNAG-ZFPs and β-actin. SNAG-ZFPs were amplified from OBS clones for insertion into pQE30 TAT and His/TAT constructs.

The β-actin promoter was successfully cloned into pGL3 (Figure 5.5), followed by successful cloning of 14 out of 18 target gene promoter regions (Figure 5.6). These were grown at a midi-scale for addition to the cell culture.

The second pGL3 construct required a repeat of the wild type binding site consensus sequence oligonucleotide. Unfortunately, despite numerous attempts under various conditions (10 minutes at room temperature or one, three, or five hours at 4°C), the multimerization of annealed binding site selected sequences was unsuccessful (Figure 5.7), and this construct could not be obtained.


Figure 5.5. Positively cloned β-actin pGL3 constructs. The arrowhead shows release of a 267 bp fragment corresponding to the size of the β-actin promoter in positive clones 14 and 16.

Figure 5.6. Positive clones of target gene promoters in pGL3-β-actin constructs. Of 18 target gene promoter regions chosen (17 shown above), 14 had at least one positively transformed clone as shown by underlined and bold numbers.


Figure 5.7. Multimerization attempts for binding site selected sequences. Oligonucleotides were ligated for 10 minutes at room temperature, or one hour, three hours, or five hours at 4°C. “C” represents control oligonucleotide with no ligation.

For pQE30 TAT and His/TAT construction, the oligonucleotides were successfully cloned into the plasmid (Figure 5.8), and the pQE30 TAT plasmid was used for further cloning of SNAG-ZFP inserts (Figure 5.9). Despite multiple attempts at this cloning with various insert:vector ligation ratios, ligation conditions (temperature and time), and transformation conditions (cells, heat shock time and temperature, and medium), only two positively cloned inserts were obtained while the rest were empty (Figure 5.10). The positive inserts were grown at a midi-scale for protein induction and purification.


Figure 5.8. Positive TAT and His/TAT inserts cloned into pQE30. Oligonucleotides coding for TAT protein, which aids in nuclear localization, were successfully cloned and checked for positive inserts (underlined and bold numbers) on a 10% DNA-PAGE.

Figure 5.9. pQE30 TAT SNAG-ZFP cloning scheme. Plasmid and inserts were double digested sequentially (top panel) and gene cleaned, then run on a 0.7% agarose gel to compare concentration for subsequent ligation.


Figure 5.10. Positively cloned pQE30 TAT constructs with Slug and Snail. Plasmids were double digested to release the SNAG-ZFP inserts to confirm positive cloning (left), while the majority of chosen colonies from cloning attempts were empty (right).

5.3 Completion of Luciferase Assay

To complete the luciferase assay and obtain results that should validate the in vitro work, pQE30 TAT SNAG-ZFP constructs must be checked for inducible protein production, and then purified using the inherent His tag in the plasmid.

Human embryonic kidney cells such as HEK293T can be used to determine the SNAG-ZFPs’ ability to repress the target genes due to the presence of these transcription factors in this cell line. Once performed, the change in luminescence will reveal the SNAG-ZFPs’ ability to repress each target gene tested. The luciferase assay should validate findings obtained in vitro via the binding site selection assay.

5.4 Summary of Results

Validation of target genes in vivo has yet to be completed due to time constraints; however, the scheme has been conceived, plasmids constructed, and materials obtained to do so. Difficulties constructing the pQE30 TAT SNAG-ZFP construct consumed most of the time. Although digests of plasmid with restriction enzymes, either borrowed or new, showed good yield, the inserts proved difficult to clone into the plasmid. In any case, two SNAG-ZFPs were successfully inserted, and the project was suspended during the protein production phase.

To complete the assay, proteins must be induced and purified via the plasmid’s His affinity tag and nickel beads. These target gene plasmids are then to be transfected into the human cell line of choice. The proteins will be expressed in cells with the TAT fusion protein allowing it to localize to the nucleus. With the aid of a luciferase assay kit, repressive activity (or if incorrectly hypothesized, the activating nature) of these proteins on the target gene possessing the derived consensus sequences can be determined.

Additionally, the extra constructs in the devised scheme may be used as additional data to support this assay.

If completed, we expect that the target genes selected will indeed be repressed by the SNAG-ZFPs, knowing that this assay allows for control of the effects of one transcription factor’s regulation on one particular gene. Figure 5.11 presents a possible mechanism by which SNAG-ZFPs repress genes. In one proposed mechanism, the essential portion of the DNA, CACCTG, is bound while the rest of the sequence facilitates proper binding of the zinc fingers. This directed binding creates a heterochromatin form that prevents the RNA polymerase II holoenzyme complex from binding to the gene’s promoter, thereby repressing the gene’s transcription. The corepressor KDM3A may be recruited to facilitate transcriptional repression. Hopefully this can be further investigated to confirm these theories.

Figure 5.11. Proposed model of repression of a target gene by Smuc. In this proposed scheme, Smuc binds to the proximal promoter region of a target gene such as MMP-11, thereby repressing the gene’s transcription. Other molecules such as KDM3A may play a corepressor role in this repressive activity.


CHAPTER 6. DISCUSSION AND FUTURE DIRECTION

6.1 Discussion

The SNAG-domain family of transcription factors has been found to be a group of repressors that has its hand in numerous types of interactions and in a broad range of pathways. The DNA- and protein-binding capabilities enable the members to direct the transcriptional repression of numerous genes and play a role in quite a number of biological relationships. Those include immunological, disease, and cell cycle regulation just to name a few.

The screenings we have done to elucidate the functional capabilities of these ZFPs have narrowed the scope of genes to investigate when looking at the mechanisms that are employed by these transcription factors. Rather than guess which genes may be targeted, we have screened and collected a comprehensive list of genes possessing DNA sequences that are targeted by the SNAG-ZFPs. We have also identified additional genes that may work in cooperation with these ZFPs to regulate transcription. These genes were found to be widespread in various tissues, indicating that SNAG-ZFPs have a role in multiple pathways, functioning as broad-ranging transcription factors.

During these studies we have also investigated the nature of the E-box’s internal dinucleotide sequence, used bioinformatics to derive the significance of zinc fingers and expression of genes, and conceived novel methods for competition assays.

Although we suspected that E-box binding transcription factors would regulate the same group of genes, this turned out to be incorrect. It holds only in that the consensus sequence selected the same target genes, whereas individually selected binding sequences screened a number of unique genes for each SNAG-ZFP. This indicates that the binding properties of each family member are unique and work to regulate a distinct group of genes. A profile of the genes’ function reveals that each SNAG-ZFP, although part of the same family with similar characteristics, targets different genes, as shown by their individually selected binding site sequences and the target genes possessing them.

The yeast two-hybrid method was successful for pulling out possible protein interactors that aid in transcriptional activity and confirming that the transcription factors have protein-binding capabilities in addition to DNA binding by the zinc fingers.

As a whole, this research has identified a large number of biological targets for numerous members of the SNAG-ZFP family, contributing previously unavailable knowledge. There is much more to investigate, but these results provide a foundation for novel mechanisms to be further explored.


6.2 Future Direction

With the collective information obtained from these studies, much more can be investigated about the SNAG-ZFPs and their role in DNA and protein interactions. Although in vitro assays were performed, and results were confirmed via competition assays, further analysis of target genes at an in vivo level is still essential to prove the functional significance of these studies. As described, a luciferase assay should be performed to assess transcriptional regulation. The framework for this assay has been laid, and the concept in Chapter 5 may be carried out to complete this study. This will give a deeper look into the genes’ effects at a cellular level. Additional studies may be done using the same approach as presented in this paper with other members of the SNAG family, or even other zinc finger proteins from other families.

In terms of the protein-protein interactions, the information can be used to further confirm if the selected genes are indeed true interactors and what mechanisms are employed to regulate transcriptional repression. The yeast two-hybrid method was successful in screening multiple members on a grand scale and may be used for the other members. Adjustments will need to be made to the auto-activating ZFPs, and sections of each may be used to determine exactly where in the structure of the ZFPs these proteins are interacting with. Interactions of note include that between Smuc and KDM3A, and it would be interesting to explore the specific regions that are involved in the binding and regulatory activity caused by these two genes.


This complete analysis of multiple members of the SNAG family of transcription factors contributes new information on this broad group of transcription factors. The collective data may be used in future studies to further identify SNAG-ZFP mediated interactions. We have characterized DNA binding (including the nature of the E-box) as well as protein binding (through corepressor activity). Each interaction must be carefully explored to gain insight into each SNAG-ZFP’s exact role in biological processes.


CHAPTER 7. MATERIALS AND METHODS

7.1 Materials

Materials for binding site selection include but are not limited to full-length SNAG-ZFP clones (Open BioSystems), primers and oligonucleotides (Integrated DNA Technologies), all restriction enzymes and their buffers as well as Quick Ligase Buffer and ligase (New England Biolabs), and BL21 and DH5α competent E. coli cells (Invitrogen). PCR reactions were done using Taq Bead Hot Start Polymerase (Promega), DMSO, and mineral oil (Sigma-Aldrich). Qiagen Gene Clean Kit was used to extract DNA. Various reagents were purchased from Fisher BioReagents (ampicillin, kanamycin, urea, PCI/CI, proteinase inhibitors), EMD (lysozyme, glycerol), and Sigma-Aldrich (PMSF, SDS). Anti-myc and anti-mouse IgG antibodies were from Promega. The Clontech Matchmaker Gold kit supplied all materials for transformation (TE buffer, lithium acetate, Yeastmaker Carrier DNA, competent cells) and for the yeast two-hybrid assay, including cloning vectors, yeast strains, and yeast media. Biotin 3’ End Labeling Kit (#89818) and LightShift® Chemiluminescent EMSA Kit (#20148) with Chemiluminescent Nucleic Acid Detection Module were purchased from Thermo Scientific. Immobilon-P transfer membrane (Millipore) and Biodyne B membrane (Pall) were utilized for competition dot blot assays.


7.2 Bacterial Expression and Protein Purification of GST-SNAG Fusion Proteins

SNAG-domain family zinc finger constructs were PCR amplified using primers from Integrated DNA Technologies (Table 3) flanking just the zinc finger portion, with full-length transcription factor clones purchased from Open BioSystems as templates. Each PCR product was run on an agarose gel, gel purified, and digested with BamHI and SalI (New England Biolabs). The zinc finger regions, in amino acids, for each construct are as follows: Slug (SNAI2) (121-268 aa), Smuc (SNAI3) (152-292 aa), Snail (SNAI1) long (144-264 aa), Snail short (154-259 aa), Scratch (SCRT1) long (181-337 aa), Scratch short (191-327 aa), Gfi-1 (245-422 aa), Gfi-1B Hs (human) (153-330 aa), Gfi-1B Mm (mouse) (153-330 aa), insulinoma-associated 1 (IA-1/INSM1) long (290-521 aa), and IA-1 short (300-508 aa). Short constructs refer to just the zinc fingers, and long pertains to zinc fingers with additional flanking amino acids on either side to facilitate proper folding.
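As a quick check on construct sizes, the residue ranges above can be converted into lengths. The sketch below assumes the ranges are inclusive and that each residue corresponds to three nucleotides of coding sequence; primer-added clamps, restriction sites, and stop codons are ignored. The names used are illustrative.

```python
# Zinc finger regions (first and last residue), as listed in the text.
constructs = {
    "Slug": (121, 268),
    "Smuc": (152, 292),
    "Snail long": (144, 264),
    "Snail short": (154, 259),
    "Scratch long": (181, 337),
    "Scratch short": (191, 327),
    "Gfi-1": (245, 422),
    "Gfi-1B Hs": (153, 330),
    "Gfi-1B Mm": (153, 330),
    "IA-1 long": (290, 521),
    "IA-1 short": (300, 508),
}

def region_length(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1

for name, (first, last) in constructs.items():
    aa = region_length(first, last)
    # Three nucleotides per residue; primer additions are not counted.
    print(f"{name}: {aa} aa, {3 * aa} nt of coding sequence")
```

The difference between a long and a short construct gives the number of flanking residues retained, for example 15 residues for Snail.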

Recombinant constructs with a GST-tag in the pGEX-4T2 plasmid vector were used for bacterial expression in competent BL21 E. coli cells. Ligated samples were transformed into E. coli DH5α cells and grown on LB/amp agar plates. Colonies from these plates were grown in LB/amp for mini-plasmid DNA preps. The plasmids were checked for the presence of inserts by EcoRI and BamHI restriction endonuclease digestion and gel electrophoresis. Positive recombinant plasmids were then transformed into the E. coli BL21 host for protein expression. The E. coli was grown in LB/amp/kanamycin (kan) to an O.D.595 of 0.6 and induced at 30°C using 100 mM IPTG. A maxi-protein preparation was done on the 500 mL of induced culture. Gfi-1, Gfi-1B Hs and Mm, and IA-1 were grown in LB/amp/kan to an O.D.595 of 0.3 and then induced for 4 hours.

Table 3. Primers used to amplify zinc finger containing regions.

Name | Sequence
Slug 5’ BamHI | 5’-GTGGGATCCCATGCCATTGAAGCTGAAAAG-3’
Slug 3’ SalI | 5’-GTGGTCGACTCAGTGTGCTACACAGCAG-3’
Smuc 5’ BamHI | 5’-CACGGATCCTTTGAGTGCTTCCACTGCCAC-3’
Smuc 3’ SalI | 5’-CACGTCGACTCAGGGGCCCGGGCAGCCAGC-3’
Snail long 5’ BamHI | 5’-GTGGGATCCGAGGCCAAGGATCTCCAG-3’
Snail long 3’ SalI | 5’-GTGGTCGACGCGGGGACATCCTGAG-3’
Snail short 5’ BamHI | 5’-GTGGGATCCTTCAACTGCAAATACTGCA-3’
Snail short 3’ SalI | 5’-GTGGTCGACGCAGCCGGACTCTTGG-3’
Scratch long 5’ BamHI | 5’-GTGGGATCCGGGTCGGGAGCCACGG-3’
Scratch long 3’ SalI | 5’-GTGGTCGACGGTTGCGGGGCCGCTAG-3’
Scratch short 5’ BamHI | 5’-GTGGGATCCCACGCGTGCGGAGAGTG-3’
Scratch short 3’ SalI | 5’-GTGGTCGACGCAGGCTGACTCGTAGTG-3’
Gfi-1 5’ BamHI | 5’-GTGGGATCCTGCACCCGCCTGCTGCTGG-3’
Gfi-1 3’ SalI | 5’-GTGGTCGACTCATTTGAGCCCATGCTGCG-3’
Gfi-1B Hs 5’ BamHI | 5’-GTGGGATCCAGCCTCCGCTACTCCCCAGG-3’
Gfi-1B Hs 3’ SalI | 5’-GTGGTCGACTCACTTGAGATTGTGCTGG-3’
Gfi-1B Mm 5’ BamHI | 5’-GTGGGATCCCGCCTCCGCTACTCTCCAGG-3’
Gfi-1B Mm 3’ SalI | 5’-GTGGTCGACTCACTTGAGATTGTGTTGACTCT-3’
IA-1 long 5’ BamHI | 5’-GTGGGATCCCACAAGTGCTCGCGCATCG-3’
IA-1 long 3’ SalI | 5’-GTGGTCGACCTAGCAGGCCGGACGCACAG-3’
IA-1 short 5’ BamHI | 5’-GTGGGATCCTACCGCTGCCCAGAGTGCG-3’
IA-1 short 3’ SalI | 5’-GTGGTCGACTCTATTCTCAGACGGGTGG-3’
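The layout of the primers in Table 3 can be checked with a short sketch. It assumes each primer consists of a short 5’ clamp, the restriction site named in the primer label (GGATCC for BamHI, GTCGAC for SalI), and a gene-specific portion; the helper function and its name are illustrative and were not part of the original work. Only four of the primers are shown.

```python
BAMHI = "GGATCC"  # BamHI recognition sequence
SALI = "GTCGAC"   # SalI recognition sequence

# Four primers copied from Table 3.
primers = {
    "Slug 5' BamHI": "GTGGGATCCCATGCCATTGAAGCTGAAAAG",
    "Slug 3' SalI": "GTGGTCGACTCAGTGTGCTACACAGCAG",
    "Smuc 5' BamHI": "CACGGATCCTTTGAGTGCTTCCACTGCCAC",
    "Smuc 3' SalI": "CACGTCGACTCAGGGGCCCGGGCAGCCAGC",
}

def split_primer(seq, site):
    """Split a primer into (clamp, site, gene-specific part).

    Returns None if the restriction site is absent.
    """
    pos = seq.find(site)
    if pos < 0:
        return None
    return seq[:pos], site, seq[pos + len(site):]

for name, seq in primers.items():
    site = BAMHI if "BamHI" in name else SALI
    print(name, split_primer(seq, site))
```

For the two 3’ primers shown, the gene-specific part begins with TCA, the reverse complement of a TGA stop codon.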


To purify GST-SNAG-ZFPs, induced protein pellets were resuspended in PBS and lysozyme (1 mg/mL) and rotated at 4°C for 30 minutes. Samples were sonicated with six to eight one-minute pulses and centrifuged at 4°C at 10,000 rpm for 30 minutes, and then the supernatant was centrifuged at 4°C at 10,000 rpm for 20 minutes to obtain a clear soluble fraction, which was passed through a GSH sepharose affinity chromatography column. The beads were washed with 5 mL of 1x PBS, and the bound proteins were eluted off the column three times in 500 µL of elution buffer. Aliquots of all the samples were analyzed on SDS-PAGE gels to confirm the presence of and determine the amount of proteins present in each fraction.

All elutions were dialyzed using Spectra/por molecular porous membrane tubing (5 kDa cut-off) and a buffer of 1x PBS (100 mM PMSF), twice for four hours, then transferred into 1x PBS (10% glycerol, 100 mM PMSF) for over 12 hours to concentrate the protein and to remove GSH in the elutions. Protein concentration was measured using the Bradford reagent assay and a BSA standard.

7.3 Zinc-Finger Array Binding Site Interactions

Radiolabeled E-cadherin promoter (containing E-box) sequence was bound to the purified recombinant GST-ZFPs to determine if binding would take place as expected in an EMSA. DNA binding site selection experiments were carried out using recombinant ZFPs and a randomized oligonucleotide library, as described (Peng et al., 2002). The oligonucleotide (18N) library contained flanking sequences of a 5’ end BamHI site and a 3’ end EcoRI site. The full sequence is 5’-agacGGATCCattgca-NNNNNNNNNNNNNNNNNN-ctgtccGAATTCgga-3’ (total length 49 nucleotides), with the restriction sites shown in uppercase. The amplified oligonucleotide library was end-labeled with T4 polynucleotide kinase, its corresponding buffer, and 32P for one hour at 37°C and bound to the purified proteins in 2x NEBB-NaCl+DTT for 15 minutes at room temperature. The EMSA gel was run at 4°C, transferred onto a sheet of Whatman paper and dried, and exposed for autoradiography in a -80°C freezer. The gel bands with signals were excised from the filter paper and placed in elution buffer (0.5 mM NH4OAc, 1 mM EDTA) in a 37°C water bath overnight to elute the DNA-protein complexes. Purified DNA was subjected to PCR (94°C, 30 seconds; 48°C, 1 minute; 72°C, 20 seconds; 40 cycles with a 15 minute 72°C elongation period). A portion of the amplified PCR product was radiolabeled and bound to fresh protein. Four total cycles were carried out to obtain an enriched DNA population.
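The structure of the 18N library can be made concrete with a small sketch. It builds one library member from the flanking sequences given above and a random 18-nucleotide core, and reports the stated total length and the theoretical number of distinct cores (4 to the power 18). The random generator, seed, and function name are illustrative assumptions; the actual library was synthesized, not simulated.

```python
import random

LEFT = "agacGGATCCattgca"   # 5' flank, BamHI site in uppercase
RIGHT = "ctgtccGAATTCgga"   # 3' flank, EcoRI site in uppercase
N_RANDOM = 18               # randomized positions (18N)

def random_oligo(rng):
    """Return one library member with a random 18-nt core."""
    core = "".join(rng.choice("ACGT") for _ in range(N_RANDOM))
    return LEFT + core + RIGHT

total_length = len(LEFT) + N_RANDOM + len(RIGHT)
complexity = 4 ** N_RANDOM  # number of distinct 18-mers

rng = random.Random(0)      # fixed seed so the sketch is repeatable
oligo = random_oligo(rng)
print(total_length, complexity)
print(oligo)
```

The constant flanks are what allow every selected sequence to be re-amplified by PCR between rounds and later cloned through the BamHI and EcoRI sites.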

7.4 Cloning of DNA-Protein Binding Site Interactions

The enriched DNA samples were electrophoresed, and the gel slices containing the resultant PCR product bands were excised from the DNA-PAGE gel and subjected to electroelution in a dialysis bag. A 45 V charge was applied for 2.5 hours, and then the polarity was reversed for 30 seconds at 100 V. These electro-eluted samples were extensively PCI/CI purified and used for cloning. Inserts and pUC18 vector were double digested with EcoRI and BamHI for an hour each at 37°C. The plasmid vector and insert DNAs were ligated and transformed into DH5α cells. Samples were plated onto LB/amp and incubated at 37°C overnight. At least 20 colonies from each cloning sample were inoculated.

7.5 DNA Sequencing

The DNA templates for sequencing were prepared using mini-plasmid prep of positive recombinant clones and RNase and PCI/CI cleaning. Aliquots equivalent to 1 µg of DNA were spotted into a 96-well microtiter plate. The microtiter plates were sent to the ICBR Genomics Core at the University of Florida in Gainesville for high-throughput cycle sequencing.

7.6 Confirmation of Binding Site Selected Sequences

With the PCI/CI purified samples, further gel shifts were performed to confirm binding of these isolated DNA sequences to the SNAG-ZFP that was used to obtain the sequences. The DNA was kinased using 32P for 1 hour at 37°C and was then bound to the proteins for 15 minutes at room temperature, similar to the previous procedures presented. The purified protein of each SNAG-ZFP used in this experiment was normalized to 300 ng. Bands indicating binding between DNA and protein were seen in each case. The same was done with the DNA and an unrelated protein, Drosophila melanogaster Serendipity locus protein beta (CG7938), provided by Joseph Krystel.

7.7 Competition EMSAs for Derived Consensus Sequences

Competition assays for Slug, Smuc, Snail long and short, and Scratch long and short involved binding reactions consisting of 100 ng of purified protein, 200,000 cpm radiolabeled wild type oligonucleotide, and 5x NEBB-NaCl+DTT to a salt concentration of 75 mM. Reactions were then adjusted to accommodate 10x wt, 20x wt, 40x wt, 20x mut (mutant oligonucleotide), and 40x mut unlabeled oligonucleotides to get a competing effect. Using the same binding conditions, constructed E-box oligonucleotides with varying internal NN nucleotide residues were bound to Slug, Smuc, Snail long and short, and Scratch long and short purified proteins. Wild type oligonucleotides, completely mutated oligonucleotides (MUT), and oligonucleotides mutated only at the highly conserved nucleotides (100MUT) were obtained from Integrated DNA Technologies (Table 4).


Table 4. Competition assay oligonucleotides.

Name | Sequence
Slug WT 5’ | 5’-AATTCTGCACCTGTCCGAG-3’
Slug WT 3’ | 5’-AATTCTCGGACAGGTGCAG-3’
Snail long WT 5’ | 5’-AATTCGCACCTGTCCGAG-3’
Snail long WT 3’ | 5’-AATTCTCGGACAGGTGCG-3’
Scratch long WT 5’ | 5’-AATTCCATTGCACCTGTCCGAG-3’
Scratch long WT 3’ | 5’-AATTCTCGGACAGGTGCAATGG-3’

Slug MUT 5’ | 5’-AATTCTTATGGATATGAAG-3’
Slug MUT 3’ | 5’-AATTCTTCATATCCATAAG-3’
Snail long MUT 5’ | 5’-AATTCGACGGTTTCAGTG-3’
Snail long MUT 3’ | 5’-AATTCACTGAAACCGTCG-3’
Scratch long MUT 5’ | 5’-AATTCATGTGACGCGTTCAGAG-3’
Scratch long MUT 3’ | 5’-AATTCTCTGAACGCGTCACATG-3’

Slug 100MUT 5’ | 5’-AATTCTGAACCTTTCCGGG-3’
Slug 100MUT 3’ | 5’-AATTCCCGGAAAGGTTCAG-3’
Smuc 100MUT 5’ | 5’-AATTCTGACCCTTTCCGGG-3’
Smuc 100MUT 3’ | 5’-AATTCCCGGAAAGGGTCAG-3’
Snail long 100MUT 5’ | 5’-AATTCTACCCTTTCCGG-3’
Snail long 100MUT 3’ | 5’-AATTCCGGAAAGGGTAG-3’
Scratch long 100MUT 5’ | 5’-AATTCCATTGACCCGTTCCGGG-3’
Scratch long 100MUT 3’ | 5’-AATTCCCGGAACGGGTCAATGG-3’
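The pairing of the 5’ and 3’ oligonucleotides in Table 4 can be verified with a short sketch. It rests on one interpretation that the text does not state: that the leading AATT on each oligonucleotide remains single-stranded as a 5’ overhang after annealing, so that the remaining bases of the two strands are reverse complements. The function names are illustrative, and only the three Slug pairs are shown.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

OVERHANG = "AATT"  # assumed single-stranded 5' overhang on each oligonucleotide

# (5' oligonucleotide, 3' oligonucleotide) pairs copied from Table 4.
pairs = {
    "Slug WT": ("AATTCTGCACCTGTCCGAG", "AATTCTCGGACAGGTGCAG"),
    "Slug MUT": ("AATTCTTATGGATATGAAG", "AATTCTTCATATCCATAAG"),
    "Slug 100MUT": ("AATTCTGAACCTTTCCGGG", "AATTCCCGGAAAGGTTCAG"),
}

def anneals(top, bottom):
    """True if the oligos are complementary once the assumed overhang is set aside."""
    if not (top.startswith(OVERHANG) and bottom.startswith(OVERHANG)):
        return False
    return reverse_complement(top[len(OVERHANG):]) == bottom[len(OVERHANG):]

for name, (top, bottom) in pairs.items():
    # Report pairing and whether the E-box core CACCTG is present.
    print(name, anneals(top, bottom), "CACCTG" in top)
```

Under this assumption all three pairs anneal, and only the wild type oligonucleotide retains the CACCTG core, which is the property the competition assay depends on.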

7.8 Alternative Non-Isotopic Competition Dot Blot Assay

Following the instructions of the biotin 3’ end DNA labeling kit, 5 pmol of 5’ WT oligonucleotide was added to a reaction of 1X Terminal Deoxynucleotidyl Transferase (TdT) reaction buffer, 0.5 µM biotin-11-UTP, and 0.2 U/µL diluted TdT. Samples were incubated at 37°C for 2.5 hours. 100 mM EDTA was added to a final concentration of 0.01 µM to stop the reaction, followed by a chloroform:isoamyl alcohol extraction. Cleaned biotin-labeled 5’ sequences were then annealed to their 3’ counterparts at 95°C for 1 minute followed by gradual cooling to room temperature over a few hours.

PVDF Immobilon P membrane was soaked in methanol followed by 1x PBS for at least 10 minutes. This membrane was placed on a damp Whatman backing sheet and securely clamped in the dot blot apparatus. Reactions were made to contain 1x NEBB-NaCl+DTT, 0.2% PBS, 0.1 µg protein, and unlabeled annealed oligonucleotides at 10X WT, 20X WT, 40X WT, 20X MUT, or 40X MUT concentrations, where 0.1 pmol was the starting amount. Binding was for 15 minutes at room temperature, then the mixtures were added to the membrane for 20 minutes. After vacuuming the reactions through slowly, 50 µL of PBS/1% BSA was added for 20 minutes to block non-specific binding sites.

A biotin cocktail containing 0.1 pmol biotin-labeled oligonucleotide, 0.2% PBS, and 1X NEBB-NaCl+DTT was added to the membrane for 20 minutes then vacuumed through. Two 10 minute washes in 1X supplied wash buffer were done before removing the membrane from the apparatus. The membrane was auto crosslinked, then blocked in supplied blocking buffer for one hour, followed by one hour of streptavidin-HRP conjugate (1:1000) in blocking buffer. Four 10 minute 1X washes followed by 10 minutes in substrate equilibration buffer preceded visualization in luminol/enhancer solution and stable peroxide solution. (This was also performed using 1X TBS/0.2% Tween as the washing solution, 3% BSA to block, and streptavidin-AP (1:2000) as the conjugate. Visualization was done with NBT/BCIP solution. Times were not changed.)
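The competitor amounts implied by the fold-excess scheme can be worked out in a few lines. The sketch assumes that "0.1pm" denotes 0.1 pmol and that each fold value is a simple multiple of that starting amount; both are readings of the text rather than stated facts, and the names are illustrative.

```python
LABELED_PMOL = 0.1  # starting amount of oligonucleotide (assumed to be pmol)

# (competitor type, fold excess) combinations used in the dot blot.
competitors = [("WT", 10), ("WT", 20), ("WT", 40), ("MUT", 20), ("MUT", 40)]

def competitor_pmol(fold, labeled_pmol=LABELED_PMOL):
    """Amount of unlabeled competitor for a given fold excess."""
    return fold * labeled_pmol

for kind, fold in competitors:
    print(f"{fold}x {kind}: {competitor_pmol(fold):.1f} pmol unlabeled")
```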


Dot blots to assess the amount of DNA that was biotin-labeled were performed in a non-vacuuming dot blot apparatus. Two control oligonucleotides, EBNA DNA and a control oligonucleotide, supplied in the kits were labeled as described previously. Two different sets of experiments were done to assess labeling efficiency, and visualization was done using both streptavidin-HRP and streptavidin-AP methods.

7.9 Target Gene Analysis and Gene Ontology

Investigation of target genes containing consensus sequences within their promoter region was accomplished primarily via two online databases, the public version of TRANSFAC (http://www.gene-regulation.com) (Matys et al., 2003) and the Eukaryotic Promoter Database (http://www.epd.isb-sib.ch/) (Schmid et al., 2006). Exact matches, shorter pieces of the original sequence, nucleotide variants, and individual selected sequences were searched. The comprehensive list of more than 1,500 total target genes for all 11 constructs includes complete information on chromosome number, positive or negative orientation, transcription start site, and a 60-nucleotide strand of DNA that contains the consensus sequence. Each set of target genes was categorized based on statistical overrepresentation of groups defined by biological function, molecular function, or cellular process using GOstat (http://gostat.wehi.edu.au/) (Beissbarth and Speed, 2004).
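The idea behind an exact-match search with positive or negative orientation can be sketched as follows. This is not how TRANSFAC or the Eukaryotic Promoter Database perform their searches; it is a minimal stand-alone illustration that scans a sequence for a binding site on both strands. The promoter string is made up for the example and is not a real promoter; the function names are likewise illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(promoter, site):
    """Return (position, strand) for every exact match of site on either strand."""
    promoter, site = promoter.upper(), site.upper()
    hits = []
    for strand, query in (("+", site), ("-", reverse_complement(site))):
        start = promoter.find(query)
        while start != -1:
            hits.append((start, strand))
            start = promoter.find(query, start + 1)
    return sorted(hits)

# Made-up 60-nt sequence for illustration only; not a real promoter.
promoter = (
    "GATTACAGGCTT" + "CACCTGTC" + "AAGGCTTAACCGGTT"
    + "GACAGGTG" + "AATCGATCGGCTAAGCT"
)
print(find_sites(promoter, "CACCTGTC"))
```

Positions are zero-based offsets into the sequence as written; a "-" hit means the site lies on the complementary strand.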


7.10 Luciferase Assay

The plasmids pGL3 and pQE30 were chosen for the luciferase assay. In order to amplify the promoter regions of the target genes, two cell lines, lung cells and retinal pigment epithelium cells, were selected as sources of genomic DNA. Cells were grown, harvested, trypsinized, washed, and pelleted. To isolate the genomic DNA, pellets were resuspended in genomic DNA lysis buffer (50 mM Tris, pH 8; 10 mM EDTA; 10 mM NaCl; 1% SDS) and proteinase K (0.2 mg). Samples were heated at 55°C for four hours, with proteinase K supplemented every few hours. Samples were then moved to a 37°C water bath overnight. Genomic DNA samples were then PCI/CI purified and precipitated with 250 µM NaCl and 2.5x volume 100% ethanol. A second PCI/CI extraction was performed, and the DNA pellet was finally resuspended in TE buffer. To determine DNA concentration, the O.D.290s of diluted samples of each genomic DNA sample were measured, and amounts of 0.2, 0.4, and 0.8 µg were loaded on a 0.7% agarose gel. Primers for pQE30 SNAG-ZFP TAT and pGL3 target gene promoter region plus β-actin are listed in Tables 5 and 6.
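The step from an absorbance reading to the volumes loaded on the gel can be sketched as below. The text reports the wavelength as 290 nm; the sketch instead assumes the common practice of reading at 260 nm, where one absorbance unit corresponds to roughly 50 µg/mL of double-stranded DNA. That conversion factor, the example reading, and the dilution are assumptions for illustration and are not values from this work.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # common conversion for double-stranded DNA (assumed)

def dna_conc_ug_per_ul(a260, dilution_factor):
    """Concentration of the undiluted stock in ug/uL."""
    return a260 * dilution_factor * DSDNA_UG_PER_ML_PER_A260 / 1000.0

def volume_for_mass(mass_ug, conc_ug_per_ul):
    """Volume of stock (uL) that contains the requested mass of DNA."""
    return mass_ug / conc_ug_per_ul

# Hypothetical reading: A260 = 0.10 for a 1:100 dilution of the stock.
conc = dna_conc_ug_per_ul(0.10, 100)
for mass in (0.2, 0.4, 0.8):
    print(f"{mass} ug -> {volume_for_mass(mass, conc):.2f} uL")
```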


Table 5. Primers of SNAG-ZFP and TAT for pQE30 construction

Name | Sequence
Slug_BamHI_5' | gagGGATCCatgccgcgctccttcctggtcaag
Slug_SalI_3' | gagGTCGACgtgtgctacacagcagccagattcctcatg
Smuc_BamHI_5' | gagGGATCCatgccgcgctccttcctggtgaaaacg
Smuc_SalI_3' | gagGTCGACggggcccgggcagcagccag
Snail_BamHI_5' | gagGGATCCatgccgcgctctttcctcgtcaggaag
Snail_SalI_3' | gagGTCGACgcggggacatcctgagcagccgg
Scratch_BamHI_5' | gagGGATCCatgcccaggtccttcctggtcaagaagg
Scratch_SalI_3' | gagGTCGACggcctgcacagggctgagctgtgg
Gfi-1_BamHI_5' | gagGGATCCatgccgcgctcatttctcgtcaaaagcaag
Gfi-1_SalI_3' | gagGTCGACtttgagcccatgctgcgtctcccggtgc
Gfi-1B_BamHI_5' | gagGGATCCatgccacgctccttcctggtgaagagcaag
Gfi-1B_SalI_3' | gagGTCGACcttgagattgtgctggctctcgcgg
IA-1_BamHI_5' | gagGGATCCatgccccgcgggtttctggtgaagcg
IA-1_SalI_3' | gagGTCGACgcaggccggacgcacaggcacctg

TAT-F-SalI-pQE | TCGACggtcgtaagaaacgtagacagcgcagacgtcctcagA
TAT-R-HindIII-pQE | AGCTTctgaggacgtctgcgctgtctacgtttcttacgaccG

In pQE30, the TAT and His/TAT oligonucleotides were inserted at the SalI and HindIII restriction sites, while SNAG-ZFPs were inserted at BamHI and SalI. The β-actin promoter was inserted into pGL3 at XhoI and HindIII, while the target gene promoter regions were inserted at SmaI and XhoI. Insertion of the TAT and His/TAT oligonucleotides into pQE30, as well as β-actin into pGL3, was done first so that a midi-sized culture of these plasmids could be grown and used for subsequent cloning as needed. Ligation products were transformed into competent E. coli DH5α cells and plated onto LB/ampicillin at 37°C. Transformants were grown in LB/ampicillin and checked for positive inserts by digestion with the same restriction enzymes used to clone them into the plasmids, and the digests were run on 10% DNA-PAGE (target gene promoters) or 0.7% agarose (SNAG-ZFPs).
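The PCR primers in Tables 5 and 6 follow a common layout: a short 5' clamp (gag or ctc), the restriction site in upper case, and the gene-specific sequence in lower case. A minimal sketch of how that layout could be checked programmatically is given below. The function name is hypothetical, and the sketch applies only to the PCR primers, not to the TAT oligonucleotides, which carry pre-cut overhangs rather than complete sites.

```python
# Recognition sequences of the enzymes used for cloning in this chapter.
SITES = {
    "BamHI": "GGATCC",
    "SalI": "GTCGAC",
    "SmaI": "CCCGGG",
    "XhoI": "CTCGAG",
    "HindIII": "AAGCTT",
    "EcoRI": "GAATTC",
}

def split_primer(primer):
    """Split a primer into (clamp, enzyme name, gene-specific part).

    The site closest to the 5' end is taken as the cloning site, so a
    second site occurring inside the gene-specific part is ignored.
    """
    upper = primer.upper()
    hits = [(upper.find(site), name, site)
            for name, site in SITES.items() if site in upper]
    if not hits:
        raise ValueError("no known restriction site found in primer")
    start, name, site = min(hits)
    return primer[:start], name, primer[start + len(site):]

# Slug_BamHI_5' from Table 5 and MMP11_SmaI_5' from Table 6.
print(split_primer("gagGGATCCatgccgcgctccttcctggtcaag"))
print(split_primer("gagCCCGGGggatccgttgaggctctgaggggtggg"))
```

The MMP11 primer shows why the 5'-most site is chosen: its gene-specific portion happens to begin with ggatcc, which would otherwise be mistaken for a BamHI cloning site.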

Table 6. Primers of target gene promoter regions & β-actin for pGL3

Name                  Sequence
IAPP_SmaI_5'          gagCCCGGGttaatatttactgatgagttaatgtaataatgacccatccg
IAPP_XhoI_3'          gagCTCGAGtgtcattaatttccatccctatagatagaaagtacctcacag
SDC1_SmaI_5'          gagCCCGGGgggggcggggtcctgggg
SDC1_XhoI_3'          gagCTCGAGccttagccgttgcaaaaactgg
MT-1_SmaI_5'          gagCCCGGGatgttccacacgtcacatgggtcgtcctatccg
MT-1_XhoI_3'          gagCTCGAGgctccctggagcgccagtgtg
CST4_SmaI_5'          gagCCCGGGagtcaggggcagggcatggaggtg
CST4_XhoI_3'          gagCTCGAGcgaggcagggagcccagaccag
CEACAM5_SmaI_5'       gagCCCGGGaggggacagaggacacctgaataaag
CEACAM5_XhoI_3'       gagCTCGAGctagggcaggagtacttctcagcatcacg
ADH2_SmaI_5'          gagCCCGGGacccttttatctgttttgacagtctgggaataatccag
ADH2_XhoI_3'          gagCTCGAGacatccaattccaattccacacgtgatctatg
ACTA1_SmaI_5'         gagCCCGGGcacatgcacccaccggcgaacgcgg
ACTA1_XhoI_3'         gagCTCGAGgggccgggccgtatatggagtg
APOE_SmaI_5'          gagCCCGGGattactgggcgaggtgtcctcccttcctg
APOE_XhoI_3'          gagCTCGAGgcccccagtcacgaggtgggctg
PDGFB_SmaI_5'         ctcCCCGGGtcccccagctcccgcgtcc
PDGFB_XhoI_3'         ctcCTCGAGaagggagagtgcgagaggtgggtggagac
MMP11_SmaI_5'         gagCCCGGGggatccgttgaggctctgaggggtggg
MMP11_XhoI_3'         gagCTCGAGcccccacctttggaaggcgcatcccactg
LGALS4_SmaI_5'        gagCCCGGGtcacagttgctgggagaggcaggaatttg
LGALS4_XhoI_3'        gagCTCGAGtggcagagggagggtgagaccagg
TMBS1_SmaI_5'         gagCCCGGGggtggtctccccagccccg
TMBS1_XhoI_3'         gagCTCGAGcccgcccacgcagccttgg
RAD51C_SmaI_5'        gagCCCGGGaacggaatggtgcataagtgtgaaaatttacaagactg
RAD51C_XhoI_3'        gagCTCGAGtcagacgtaaagcggaaggggccgg
SERPINE_SmaI_5'       gagCCCGGGgagagggaggtgtcgagggggacccg
SERPINE_XhoI_3'       gagCTCGAGagatgtgggcaggaaatagatgaactcatgttccag
LTA_SmaI_5'           gagCCCGGGccgcttcctccagatgagctcatggg
LTA_XhoI_3'           gagCTCGAGatgtccctggggcgagaggaggg
PDGFA_SmaI_5'         gagCCCGGGcttccgaggtgcgggtcccagg
PDGFA_XhoI_3'         gagCTCGAGgcgccgccgccgcg
c-fos_SmaI_5'         ctcCCCGGGcgccttctctgcctttcccgcctc
c-fos_XhoI_3'         ctcCTCGAGggaacagctcgccggctgcagc
SMARCB1_SmaI_5'       gagCCCGGGggagaaagagaaattagtcgtggctcctttaag
SMARCB1_XhoI_3'       gagCTCGAGgacgctgacgcgcgcgccg
COL1A1_SmaI_5'        ctcCCCGGGaggggagatgtggggtggactccc
COL1A1_XhoI_3'        ctcCTCGAGaaccctgcccctcggagagggggagc
β-actin_XhoI_5'       ctcCTCGAGaccccaaggcggccaacgccaaaac
β-actin_HindIII_3'    ctcAAGCTTggtggcgcgtcgcgccgc

To multimerize WT oligonucleotides, 50 pmol of annealed DNA was incubated with quick ligase buffer (1X) and quick ligase for varying times and at varying temperatures. Trials were done for 10 minutes at room temperature (suggested ligation conditions) and for one, three, and five hours at 4°C. The entire sample was run on a 12% DNA-PAGE gel and visualized with ethidium bromide staining.

For protein production from the pQE30 SNAG-ZFP TAT constructs, 1µL of prepared plasmid was combined with 10µL of competent S9 cells (carrying the BR4 plasmid to prevent leaky expression) on ice for 30 minutes, followed by a heat shock of 30 seconds at 37°C. 1mL of LB was added to the cells for recovery at 225rpm and 37°C, followed by an additional 1mL of 2X LB/ampicillin/kanamycin for overnight growth. Cells were then split, with 400µL kept as a glycerol stock and equal amounts used as induced and uninduced cultures. Volume was brought up to 2mL with 1X LB/ampicillin/kanamycin, and the induced samples also received 2X 1M IPTG. Samples were grown at 37°C for three hours, then centrifuged at 7000rpm for two minutes. Pellets were resuspended in 1X PBS and an equal volume of 2X SDS loading buffer. Aliquots were loaded on a 10% SDS-PAGE gel and assessed for protein production.

7.11 E-Cadherin Promoter and E-box Characterization Assays

GST-tagged SNAG-ZFPs were purified to homogeneity using column chromatography. Constructed E-cadherin promoter (Table 7) and E-box oligonucleotides (Integrated DNA Technologies) with varying internal NN nucleotide residues (Table 8) were annealed in 10x NEB2 buffer (New England Biolabs) to a final concentration of 50 pmol/µL by heating at 95°C for five minutes and slowly cooling to room temperature (~25°C) over several hours. Oligonucleotides were end-labeled with T4 polynucleotide kinase, its corresponding 10x buffer (Fermentas), and 32P for one hour at 37°C and then purified via spin column (400µL Sephadex beads, 2000 rpm, 5 minutes).

Table 7. E-box characterization oligonucleotides

Name                   Sequence
E-cadherin E-box 5’    5’-GGAACTGCAAAGCACCTGTGAGCTTGCGG
E-cadherin E-box 3’    5’-CCGCAAGCTCACAGGTGCTTTGCAGTTCC
HLH 5’                 5’-AATTCGCAGGTGGATG
HLH 3’                 5’-AATTCATCCACCTGCG
MCK1 5’                5’-CAGACATGTGGCTGC
MCK1 3’                5’-GCAGCCACATGTCTG
MCK2 5’                5’-CCAACACCTGCTGCC
MCK2 3’                5’-GGCAGCAGGTGTTGG


Table 8. Oligonucleotides used for E-cadherin promoter and E-box dinucleotide assays.

Name              Primer Sequence
E-cad E-box 5’    5’-GGAACTGCAAAGCACCTGTGAGCTTGCGG-3’
E-cad E-box 3’    5’-CCGCAAGCTCACAGGTGCTTTGCAGTTCC-3’

Name              Primer Sequence
EBX-AA 5’         5’-CAAAGCAAATGTGAG-3’
EBX-AA 3’         5’-CTCACATTTGCTTTG-3’
EBX-AT 5’         5’-CAAAGCAATTGTGAG-3’
EBX-AT 3’         5’-CTCACAATTGCTTTG-3’
EBX-AG 5’         5’-CAAAGCAAGTGTGAG-3’
EBX-AG 3’         5’-CTCACACTTGCTTTG-3’
EBX-AC 5’         5’-CAAAGCAACTGTGAG-3’
EBX-AC 3’         5’-CTCACAGTTGCTTTG-3’
EBX-TA 5’         5’-CAAAGCATATGTGAG-3’
EBX-TA 3’         5’-CTCACATATGCTTTG-3’
EBX-TT 5’         5’-CAAAGCATTTGTGAG-3’
EBX-TT 3’         5’-CTCACAAATGCTTTG-3’
EBX-TG 5’         5’-CAAAGCATGTGTGAG-3’
EBX-TG 3’         5’-CTCACACATGCTTTG-3’
EBX-TC 5’         5’-CAAAGCATCTGTGAG-3’
EBX-TC 3’         5’-CTCACAGATGCTTTG-3’
EBX-GA 5’         5’-CAAAGCAGATGTGAG-3’
EBX-GA 3’         5’-CTCACATCTGCTTTG-3’
EBX-GT 5’         5’-CAAAGCAGTTGTGAG-3’
EBX-GT 3’         5’-CTCACAACTGCTTTG-3’
EBX-GG 5’         5’-CAAAGCAGGTGTGAG-3’
EBX-GG 3’         5’-CTCACACCTGCTTTG-3’
EBX-GC 5’         5’-CAAAGCAGCTGTGAG-3’
EBX-GC 3’         5’-CTCACAGCTGCTTTG-3’
EBX-CA 5’         5’-CAAAGCACATGTGAG-3’
EBX-CA 3’         5’-CTCACATGTGCTTTG-3’
EBX-CT 5’         5’-CAAAGCACTTGTGAG-3’
EBX-CT 3’         5’-CTCACAAGTGCTTTG-3’
EBX-CG 5’         5’-CAAAGCACGTGTGAG-3’
EBX-CG 3’         5’-CTCACACGTGCTTTG-3’
EBX-CC 5’         5’-CAAAGCACCTGTGAG-3’
EBX-CC 3’         5’-CTCACAGGTGCTTTG-3’
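The EBX oligonucleotides in Table 8 follow a regular pattern: a fixed CAAAGCA flank, the two variable NN residues, and a fixed TGTGAG flank, with each 3’ oligonucleotide being the reverse complement of its 5’ partner. As an illustrative sketch (the function names are hypothetical and were not part of the original workflow), the full set of 16 pairs can be regenerated as follows:

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def ebox_oligos(left="CAAAGCA", right="TGTGAG"):
    """Build all 16 NN variants as {name: (5' oligo, 3' oligo)}."""
    oligos = {}
    for n1, n2 in product("ATGC", repeat=2):
        top = f"{left}{n1}{n2}{right}"
        oligos[f"EBX-{n1}{n2}"] = (top, reverse_complement(top))
    return oligos

for name, (top, bottom) in ebox_oligos().items():
    print(f"{name}  5'-{top}-3'  5'-{bottom}-3'")
```

The EBX-CC pair corresponds to the CACCTG E-box core found in the E-cadherin promoter oligonucleotide, so it serves as the wild-type reference among the 16 variants.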


E-cadherin promoter oligonucleotide binding reactions consisted of 2x NEBB (nuclear extract binding buffer) without NaCl and 100 ng of purified protein. Binding reactions for the E-box assay consisted of 100 ng of purified protein, 200,000 cpm of radiolabeled E-box oligonucleotides, and 5x NEBB (-NaCl +DTT) (100mM HEPES pH 7.5, 2.5mM DTT, 50% glycerol, 25mM MgCl2, 0.25mM ZnSO4), adjusted to a salt concentration of 75mM and incubated for 20 minutes at room temperature. Electrophoretic Mobility Shift Assays (EMSAs) were run at 4°C on a 5% DNA polyacrylamide gel for 1.5 hours at 400V in 0.5x TBE buffer; gels were transferred onto Whatman paper, dried under vacuum at 80°C for 45 minutes, and processed for autoradiography at -80°C.

7.12 E-Box Competition Assays with Increasing Protein or Increasing DNA

Proteins and unlabeled oligonucleotides were titrated at various concentrations. For each data set, an equivalent amount of labeled DNA was added to each reaction, which was titrated with increasing amounts of unlabeled oligonucleotide. Unlabeled oligonucleotides from 5nM to 1280nM or 80nM to 102.4µM, in two-fold and ten-fold increments, were added to the reaction with a standard 100 ng of protein at room temperature for 20 minutes to allow the reaction to reach equilibrium, followed by addition of labeled oligonucleotide at an amount equal to the lowest concentration for another 20 minutes at room temperature to reach equilibrium again. Conversely, proteins were added in 10-fold excess from 78 pg to 780 pg and then in two-fold increments up to 100 ng, or from 6.25 ng to 400 ng in two-fold increments (depending on the conditions and round of experiments done). Reactions were made in 5x NEBB (-NaCl +DTT) to a concentration of 50mM NaCl, bound as above, and loaded on a 5% DNA polyacrylamide gel for EMSA.
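The titration ranges above can be laid out explicitly as geometric series. The sketch below is one reading that is consistent with the stated endpoints; it assumes that the 80nM to 102.4µM competitor series and the 78 pg protein series each begin with a single ten-fold step followed by two-fold steps, which the text does not spell out. The function name is hypothetical.

```python
def geometric_series(start, stop, factor=2.0):
    """Values start, start*factor, ... up to and including stop."""
    values = []
    value = start
    while value <= stop * (1 + 1e-9):  # small tolerance for rounding
        values.append(value)
        value *= factor
    return values

# Unlabeled competitor, in nM.
competitor_low = geometric_series(5, 1280)               # two-fold steps
competitor_high = [80] + geometric_series(800, 102400)   # assumed 10x, then 2x

# Protein, in ng (78 pg = 0.078 ng).
protein_wide = [0.078] + geometric_series(0.78, 100)     # assumed 10x, then 2x
protein_narrow = geometric_series(6.25, 400)             # two-fold steps

for label, series in [("competitor 5-1280 nM", competitor_low),
                      ("competitor 80 nM-102.4 uM", competitor_high),
                      ("protein 78 pg-100 ng", protein_wide),
                      ("protein 6.25-400 ng", protein_narrow)]:
    print(label, series)
```

Under this reading, the wide protein series ends at 99.84 ng, which matches the stated upper limit of 100 ng after rounding.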

7.13 GST-Pull-Down Binding Assay

Induced samples were obtained as previously described for the GST-SNAG-ZFPs. To begin the protein association studies, GST-Slug domain and GST-Scratch domain proteins were prepared from 50mL cell pellets of Slug (84-127 aa), Scratch short (107-126 aa), and Scratch long (102-131 aa). Pellets were treated with PBS (1x) and lysozyme, followed by the addition of phenylmethylsulphonylfluoride (PMSF) and an aprotinin, leupeptin, and pepstatin proteinase inhibitor cocktail, and then dithiothreitol (DTT) and Sarkosyl (sodium lauryl sarcosine, SLS). The samples were then disrupted by sonication to release the proteins.

HEK293T nuclear extract was prepared for use in the binding reactions with the Slug and Scratch domains. This preparation involved NLB, NEB, and the proteinase inhibitor cocktail. The domain proteins and nuclear extract were bound on G75 fine Sepharose beads, washed, and spun, and the supernatant was removed. This cycle was repeated with binding buffers of varying salt concentrations (100, 250, and 500 mM). Binding reactions were then centrifuged a final time at 14,000 rpm for two minutes, and the supernatant was mixed with SDS-PAGE gel loading buffer, cracked at 95°C, and loaded on an SDS-PAGE gel, which was silver stained.

7.14 Cloning and Sequencing of pGBKT7-SNAG-ZFP Bait

The DNA of the constructs to be transformed into the yeast vector pGBKT7 was PCR amplified using primers (5’ EcoRI and 3’ BamHI sites) from Integrated DNA Technologies (Table 9), along with the DNA of the cloning and control vectors supplied in the Clontech Matchmaker Gold kit. pGBKT7-DNA Binding Domain and pGBKT7-Activation Domain are cloning vectors, and pGBKT7-53, pGADT7-T, and pGBKT7-Lam are control vectors.

The DNA of these was gene cleaned using a Qiagen kit and digested with EcoRI and BamHI to release the insert of the desired construct. The inserts were gene cleaned again, ligated into the pGBKT7 vector, and transformed into competent E. coli DH5α cells, which were recovered for one hour in 900µL of LB; 200µL was plated onto LB/kan plates and incubated at 37°C overnight. Colonies were mini plasmid prepared and digested to check for the proper inserts. These digests were run on a 1% agarose gel or, for the Slug and Scratch domains, a 10% DNA polyacrylamide gel. At least three unique clones for each construct were obtained.

The DNA samples that showed a positive recombinant clone were subjected to RNase cleaning and PCI/CI purified. Positive clones were midi prepared, RNase and PCI/CI cleaned, and spotted into a 96-well microtiter plate for sequencing by the ICBR Genomics Core at UF.

Table 9. Primers for SNAG-ZFPs used in yeast two-hybrid assay

Name                 Sequence
Slug domain 5’       5’-CACGAATTCGGGCGAGTGAGTCCCCCTC-3’
Slug domain 3’       5’-GTGGGATCCCTTTTCAGCTTCAATGGCATGGG-3’
Slug 5’              5’-GTGGAATTCATGCCGCGCTCCTTCCTGG-3’
Slug 3’              5’-GTGGGATCCGTGTGCTACACAGCAGCCAG-3’
Smuc 5’              5’-GTGGAATTCATGCCGCGCTCCTTCCTGG-3’
Smuc 3’              5’-GTGGGATCCGGGGCCCGGGCAGCAG-3’
Snail 5’             5’-CACGAATTCATGCCGCGCTCTTTCCTCGTC-3’
Snail 3’             5’-CACGGATCCGCGGGGACATCCTGAGCAGC-3’
Scratch domain 5’    5’-GTGGAATTCATCAACGGCGACGCGGCGG-3’
Scratch domain 3’    5’-GTGGGATCCATTAGCAGCCTTACGGCGTGAG-3’
Scratch 5’           5’-GTGGAATTCATGCCCAGGTCCTTCCTGG-3’
Scratch 3’           5’-GTGGGATCCGGCTTGAACCGGGCTGAGCTG-3’
Gfi-1 5’             5’-CACGAATTCATGCCGCGCTCATTTCTCGTC-3’
Gfi-1 3’             5’-CACGGATCCTTTGAGCCCATGCTGCGTCTC-3’
Gfi-1B 5’            5’-GTGGAATTCATGCCACGCTCCTTCCTGGTG-3’
Gfi-1B 3’            5’-GTGGGATCCCTTGAGATTGTGCTGGCTCTCG-3’
IA-1 5’              5’-GTGGAATTCATGCCCCGCGGGTTTCTGG-3’
IA-1 3’              5’-GTGGGATCCGCAGGCCGGACGCACAGG-3’

7.15 Yeast Transformation

Competent yeast cells were prepared from the Yeast Two-Hybrid Matchmaker Gold yeast as stated in the protocol, using TE buffer and lithium acetate. The plasmid DNA was transformed into the cells with the supplied Yeastmaker Carrier DNA. All steps were followed as stated in the yeast transformation protocol, and cells were plated on SD/-Trp to determine transformation efficiency.

7.16 Yeast Protein Expression

Transformed yeast glycerol stocks were grown in SD/-Trp at 30°C and 225rpm for about 40 hours. Cultures were poured into 15mL conical tubes filled halfway with ice to chill the cells and then centrifuged at ~1400rpm for five minutes. The supernatant was decanted, the pellet was resuspended in an additional 5mL of ice water, and the culture was centrifuged again under the same conditions. The pellet was processed for protein extraction using cracking buffer stock solution (8M urea, 5% w/v SDS, 40mM Tris-HCl pH 6.8, 0.1mM EDTA, 0.4 mg/mL bromophenol blue), which was used to make a cracking buffer prewarmed to 60°C (0.1M DTT, ~4.4x PMSF, aprotinin (0.37mg/mL), leupeptin (0.03mM), and pepstatin (0.1 mg/mL)). Samples were heated to 70°C for ten minutes, vortexed vigorously for one minute, and placed on ice for one minute, for a total of ten cycles, adding PMSF and proteinase inhibitor cocktail every other cycle. Samples were centrifuged at 14,000 rpm for five minutes to pellet debris and unbroken cells, and the supernatants were combined and transferred to new tubes.

The supernatant was boiled for three minutes and run on a 12% SDS-polyacrylamide gel to ensure the transformed yeast cells were producing proteins. The completed gel was run for 4 hours at 250mA, and the membrane was rinsed in blocking solution (1x PBS, 0.2% Tween, 5% nonfat milk powder) for one hour. The membrane was then washed in a rinsing solution (1x PBS and 0.2% Tween) for five minutes. A primary antibody solution (1:3333 myc antibody, 5% BSA, and 1x PBS) coated the membrane for one hour, then four washes in the rinsing solution were done. The secondary antibody (1:10,000 antimouse conjugate, 1x PBS, 0.2% Tween20, and 1% nonfat dry milk) was followed by four additional rinses. The membrane was then soaked in 0.1M Tris Cl, pH 9.5 before a substrate (0.1M Tris Cl, pH 9.5, NBT, and BCIP) was added. The membrane was incubated until bands developed and kept in water with 20mM EDTA to save.

7.17 Autoactivation and Toxicity Assays and Yeast Mating

Each of the glycerol stocks made as a result of a transformation was plated on selective plates to ensure the bait did not autonomously activate the reporter genes without a prey protein. pGBKT7-bait was grown in 3mL of 2x YPDA at 225rpm and 30°C for 24 hours, and 10µL was transferred into 25mL SD/-Trp and grown to an O.D.600 of ~0.8. This culture was spun down at 1000rpm for 5 minutes, resuspended in 2mL of 2x YPDA, and added to a 1L flask containing 25mL of 2x YPDA/kan (50µg/mL) and 1mL of a normalized human cDNA library. The mixture was incubated for 20 hours at 30°C at 40rpm. The mating was centrifuged at 1000rpm for 10 minutes, decanted, rinsed in 25mL of 0.5x YPDA to resuspend the pellet, centrifuged at 1000rpm for 5 minutes, and decanted; finally, the pellet was resuspended in 5mL of 0.5x YPDA with 50µg/mL of kan. Ten DDO/-Trp/-Leu/X-α-gal/A plates were plated with 200µL of the mated mixture and spread with glass beads. The plates were incubated at 30°C for about three days. Colonies that were positive under all conditions were saved in glycerol stocks (20-25% final concentration of glycerol) for long-term storage.

7.18 Yeast Plasmid Preparation

Glycerol stocks were used to grow midi-amount cultures in 25mL YPDA broth at 30°C, 225rpm. Cultures were centrifuged for 5 minutes at 7000rpm, the supernatant was decanted, and the cell pellets were resuspended in 1 mL of lysis buffer (2% Triton X100, 1% SDS, 100mM NaCl, 1mM Na2EDTA). 0.2g of acid washed glass beads and PCI were added to the samples, and the tubes were vortexed for 2 minutes. Samples were PCI/CI purified and digested with RNase for 1 hour at 37°C. Final samples were again PCI/CI purified and dissolved in water to minimize salt content.

Electrocompetent DH5α cells were used for electroporation of the plasmids. To make electrocompetent cells for transformation, DH5α was streaked on an LB plate and grown overnight at 37°C. One colony from the plate was inoculated into 5mL of SOB medium and grown to an O.D. between 0.5 and 1.0. The culture was chilled on ice for 15 minutes and then centrifuged at 4960rpm for 15 minutes. The cell pellet was resuspended in ice cold water and centrifuged for 15 minutes, and this wash was repeated. The cell pellet was next resuspended in 20mL of ice cold water with 10% glycerol and centrifuged at 4960rpm, and this step was repeated; the pellet was finally resuspended in a 3mL volume of 10% glycerol. Aliquots were taken and frozen at -80°C immediately.

Plasmids were diluted 1/10 from the midi preps (to further reduce the salt concentration, which hinders electroporation). From this, a 40:1 ratio of cells to DNA was chilled and combined in an electroporation cuvette (0.2 cm gap). Mixtures were pulsed in the Bio-Rad Gene Pulser II and Pulse Controller Plus apparatus (25µF, 2.5kV, and 200 Ω). Cells were recovered in SOC media for 1 hour at 225rpm and 37°C and then plated on SOC/ampicillin overnight at 37°C. Colonies were then mini plasmid prepared and, to ensure that a plasmid had been taken up by the cells, subjected to restriction endonuclease digestion with NdeI and EcoRI. After one hour in a 37°C water bath, samples were run on a 0.7% agarose gel.
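The pulse settings above imply a nominal field strength and time constant, which can be worked out from the stated values. The sketch below assumes the simple relations E = V/d and τ = RC, and that the sample resistance is high enough (low salt) for the 200 Ω parallel resistor to dominate; the measured time constant on the instrument may differ. The function names are hypothetical.

```python
def field_strength_kv_per_cm(voltage_kv, gap_cm):
    """Nominal electric field across the cuvette gap, E = V / d."""
    return voltage_kv / gap_cm

def nominal_time_constant_ms(capacitance_uf, resistance_ohm):
    """Nominal RC time constant in milliseconds, tau = R * C."""
    return capacitance_uf * 1e-6 * resistance_ohm * 1e3

field = field_strength_kv_per_cm(2.5, 0.2)   # 2.5 kV across a 0.2 cm gap
tau = nominal_time_constant_ms(25, 200)      # 25 uF with 200 ohm
print(f"field strength: {field:.1f} kV/cm, time constant: {tau:.1f} ms")
```

With these settings the nominal field is 12.5 kV/cm and the nominal time constant is 5 ms. Residual salt lowers the sample resistance and shortens the actual pulse, which is consistent with the emphasis on diluting the plasmid preps before electroporation.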

7.19 DNA Sequencing

Positive clone mini preps were RNase purified and then sent to the ICBR Genomics Core at the University of Florida (Gainesville) for high-throughput cycle sequencing in a 96-well format and sequenced in duplicate. Each clone was sequenced with the T7 promoter primer 5’-TAATACGACTCACTATAGGG-3’ and the 3’ pGADT7 Activation Domain sequencing primer 5’-AGATGGTGCACGATGCACAG-3’. Sequences were then analyzed, and genes possessing those sequences were searched for using NCBI’s nucleotide BLAST function and then again with the blastx function, which translates the nucleotide sequence into an amino acid sequence. This was to ensure that the fusion proteins were in frame.
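The in-frame check can be illustrated with a small translation routine. This is only a sketch of the underlying idea, using the standard genetic code and the three forward frames; it is not how blastx itself is implemented (blastx searches all six frames against a protein database), and the function names and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: aa
               for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq, frame=0):
    """Translate a DNA sequence starting at offset `frame` (0, 1, or 2).
    Codons containing unknown bases are reported as 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def open_frames(seq):
    """Forward frames whose translation contains no stop codon."""
    return [frame for frame in range(3) if "*" not in translate(seq, frame)]

example = "ATGGCCTAA"  # hypothetical read, not from the screen
for frame in range(3):
    print(frame, translate(example, frame))
```

For a sequenced prey clone, the frame that continues from the activation domain coding sequence would be expected to appear among the frames without a premature stop codon; a stop immediately downstream of the junction would suggest an out-of-frame fusion.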


APPENDIX

Table A1. Slug Target Genes

Gene Description Ch 60n Sequence with Binding Site

2-hydroxyphytanoyl-CoA lyase. 3 - -5 gcctcttccttcccgttgtttaaggcagttggttgccctC -64 CTGTCCGtcagaggtgcagt Adhesion regulating molecule 1. 20 + -15 ggccgaacgcgggtttccggcggggcCCGAGG 44 AGgcgccgaggaggaagagcgagcccgga ADP-ribosylation factor 3, ARF3 12 - 50 gaagtatcccgggctgggGGTGGGGGtacgc -9 gene. cgcacagctccagtcgccgtcgcggcttc Amplified in osteosarcoma. 12 + -9 gggcgtgttgaggccGCTGCCTGgcttagggc 50 ggaaacagattctctgcataagaagggg Angiotensinogen. 1 - -234 tgtaactcgacccTGCACCGGctcactctgttca -293 gcagtgaaactctgcatcgatcacta APE (apurinic endonuclease) 14 + -10 cacgtggtgtcagacagaccaatCACGCGCAtt 49 cttcggccacgacaagcgcgcctctga APG12 autophagy 12-like (S. 5 - -188 gaaaagcacgcccacTGCACGCGCtcagtcg -247 cerevisiae). ctacttccgctctcgagtgtctccaagca Apolipoprotein A-I binding 1 + -49 gtgctggcccggcctcttcgggggcggggcgagcg 10 protein. ccgcacatgcgccggggccgggccg Apolipoprotein CIII. 11 + -51 gcctgctgccctggagatgatataaaacaggtcaga 8 accctcCTGCCTGTCtgctcagtt Aquaporin 3. 9 - 50 cgtgtctccagcgctcctataaagggagccaccagc -9 gctggaggccgcTGCTCGCTGcgc Arginase liver. 6 + -47 gttgtttattcaacccaagtataaatggaaaaaaaA 12 GATGCGCCctctgtcactgagggt ATPase, H+ transporting, 7 + -32 gGGTGGGGGttgaggccgacggggcgccgta 27 lysosomal 14kDa, V1 subunit F. cggcggaggcggggtttcagtggcttctg ATPase, H+ transporting, 14 - -21 gaaaatgggtgtcccTGCTGCCTcttagcaaca -80 lysosomal 34kDa, V1 subunit D. agaggggtcaagtgacacaaccagctg B-cell leukemia/lymphoma 2 18 - 744 ccgcccctccgcgccgCCTGCCCGcccgccc 685 proto-oncogene, BCL2 gene gccgcgctcccgcccgccgctctccgtgg Block of proliferation 1. 8 - 0 ctcctgcgcacgcggcccggTCGCTGTCggaa -59 gcggctgtgcgggtggcggccggcgcgc Carboxypeptidase B1 (tissue). 3 + -41 aagcaaagaacactcaggattataaaAGCAGA 18 TGagacctacccactagacctggtcaga caspase-1 11 - 103 ataaagacatgcatatgCATGCACAgtgagtatt 44 tcccaatacatgtacaggccctgcca Cathepsin L , transcript variant 1. 9 + 11 ccgaacccagACCCGAGGttttagaagcagag 70 tcaggcgaagctgggccagaaccgcgac Cathepsin L2. 
9 - -69 ggggaaggccgcctggaaacttaaatcccgaggc -128 gggcgaaccTGCACCAGaccgcggac CD11c (p150.95 leukocyte 16 + -149 tctgatgagagtgactccggttggggGGTGGGG -90 integrin alpha subunit) Gcgtgtgggaggccgagcctgtcctcg CD40 20 + -644 gcgtgagccaccgcgcccggccccactcttaataaa -585 TGCCTGTCtccaggtgctgggtgg CD63 antigen (melanoma 1 12 - -68 ccggggcggggccgcgcggcaggcggggcggga -127 antigen). gCCGGGGGGcgcagctagagagccccg CD74 antigen (invariant 5 - 31 ttcctctccagcaccgactttaagaggcgagCCGG -28 polypeptide of MHC, class II GGGGtcagggtcccagatgcacagg antigen-associated). (continued on next page)


(Table A1 continued) CDC42 effector protein (Rho 11 + -12 gggtccgcggtgcactctgtaagttcaccgccggtcg 47 GTPase binding) 2. ggtccggccgccgCGCTGTCCag Cellular retinoic acid binding 1 - 33 ggtataaaaGCTGTCCGcgcgggagcccagg -26 protein 2. ccagctttggggttgtccctggacttgtc Ceroid-lipofuscinosis, neuronal 2, 11 - 10 gaggggtagtggtggtggaatatagagctcatgtgat -49 late infantile (Jansky- ccgtCACATGACAGcagatccgc Bielschowsky disease). Chromosome 14 open reading 14 + 22 aacttcggagctgtcgcccgggttaccgggaggcgg 81 frame 100. agccgccgaGCTCGCTGTggcccg Claudin 4. 7 + 101 agccatataactgctcaACCTGTCCccgagag 160 agagtgccctggcagctgtcggctggaa Cleavage and polyadenylation 16 - 14 ggctccgggttcacttccggcgtgcctacgcctcctctt -45 specific factor 5, 25 kDa. gCGCTGTCCtgttaatggcgg c-myc 8 + 134 taaaagccggttttcggggctttatctaaCTCGCT 193 GTagtaattccagcgagaggcagag Complement component 1, s 12 + -44 cccctctgtttagatcagggaatttcagACATGCA 15 subcomponent. CACtcgggtagggaatcttatgaac COP9 constitutive 17 - 18 cgtcgccgctcgcgaggacctCAGGTGGAtcg -41 photomorphogenic homolog ccgcggcccctcctcccagagcggcagc subunit 3 (Arabidopsis). CS-B (placental lactogen, 17 - 42 tggccccatgcataaatgtacacagaaaCAGGT -17 chorionic somatomamotropin) GGGgtcaagcagggagagagaactggc CSE1 chromosome segregation 20 + -30 cagcgggccgctcgctcatgcgctctggcctcagG 29 1-like (yeast). CTCGCTGTCgcgccattttgccggg Cystatin S. 20 - 62 gggctgggctgccaaagcaggataaATGCACA 3 Cctgcctgctggtctgggctccctgcct Cytochrome c oxidase subunit 19 + 50 tctctcagcttccggctggtagtagttccgcttCCTG 109 VIb , nuclear gene encoding TCCGActgtggtgtctttgctga mitochondrial protein. D1A (dopamine D1A receptor) 5 - 128 gcggctgtcgcccgcaactCTGCCTGTCaagc 69 gaggaccgcccccagggcaggggagggg DEAD/H (Asp-Glu-Ala-Asp/His) 16 + 175 tagctttgggcggatagagggggCGCGCAAAg 234 box polypeptide 38. tattaagggacaataatggccgctttca DKFZP586F1524 protein. 
17 - -15 ACCAGGCGggccaagagggccgggacgccg -74 cgcggggcagtgtgggactggggcggaacc Dual specificity phosphatase 11 2 - -5 gccagcgtggctacgccatcacgaccggcggtgg -64 (RNA/RNP complex 1- CACATGCGcaatagcgagacgctgga interacting). Dystonia 1, torsion (autosomal 9 - -15 gtctggcggcTGCACCGGttcgcggtcggcgc -74 dominant; torsin A). gagaacaagcagggtggcgcgggtccgg EGFR (epidermal growth factor 7 + -205 ccggagactaggtcccgcgggggccacCGCTG -146 receptor) TCCaccgcctccggcggccgctggcctt Elongation of very long chain 1 - 22 cccgcccctccccctctggtgacagaaagtcggccc -37 fatty acids (FEN1/Elo2, AGCAGATGaggaagtggcaggcag SUR4/Elo3, 1 ast)-like 1. Eukaryotic translation initiation 16 + 486 gccttagagggctgtgatagccggtACCCACCA 545 factor 3, subunit 8, 110kDa. GGCacaagggatagaatcacaga Farnesyl-diphosphate 8 + 11 gggcggggcgtcgccgtactaggcctgcccCCTG 70 farnesyltransferase 1. TCCGgccagcccctcgaagcacctac Fatty acid desaturase 1. 11 - -11357 gaaaacccggcgcgCAGGCGGCtggctctgg -11416 gcgcgcgccagcaaatccactcctggagc Follicle stimulating hormone-beta, 11 + -49 agtTGCACATGattttgtataaaaggtgaactga 10 FSHB gene. gatttcattcagtctacagctcttgc Galactosidase, alpha. X - -39 gtccgcccctgaggttaatcttaaaagcccaggttac -98 ccgcggaaatttatGCTGTCCgg Glycyl-tRNA synthetase. 7 + 243 gtgcggcaGCACGCGCgccgcgtcgtttacgc 302 ggcgatttcatcatgctccgagccgggc Growth hormone-releasing factor, 20 - 4974 acgcttaggaaaatgaagagataaatgatgggaac 4915 GHRH or GHRF gene. gCCAGGCGGCtgccagagcaaacac Guanylate kinase 1. 1 + -42 cgggcggaggtgggccggtgcggcgctgtCACG 17 TAGGttcagtgggcggaagaggtggcc (continued on next page)


(Table A1 continued) H1 histone family, member 0. 22 + 92 ctaaatacccgGATGCGCCgcccaagcgcca 151 gacgcggagctgggaaaagggaggcagag H19 gene. 11 - 61 aattctggcgggccaccccagttagaaaaagcccg 0 ggctaggaCCGAGGAGcagggtgag H2A histone family, member Y. 5 - 49 tttcgGCCTGTCCGcagtttttaaaaaacgtgtgt -10 gatgataaggaatcactgtctacat Haptoglobin Hp1F. 16 + -35 gagaagtgagctagtggcagcataaaaagaccA 24 GCAGATGCcccacagcactgctcttcc haptoglobin Hp2. 16 + -35 gagaagtgagctagtggcagcataaaaagaccA 24 GCAGATGCcccacagcactgctcttcc haptoglobin-related protein. 16 + -49 gagaagtgagctagtggcagcataaaaagaccA 10 GCAGATGCcccacagcactgctcttcc Heat shock 70kDa protein 9B 5 - 85 agtcgagtatcctctggtCAGGCGGCgcgggc 26 (mortalin-2). ggcgcctcagcggaagagcgggcctctg Hematological and neurological 17 - -130 aaaaccctataaaggcgtcgatcggccggaCAG -189 expressed 1. GCGGCagcggcggctcctgcagcggtg Heterogeneous nuclear 5 + -8 ggcattataaagggcgccacgagtcggcattgtCA 51 ribonucleoprotein A/B. GGCGGCggcaccgcgcgggacggag Heterogeneous nuclear X + -30 agttgctcacgtgaccgagatctCACATGACgta 29 ribonucleoprotein H2 (H'). ggcgctcacgtgattactcgcttacg HLA-DRA (MHC class II human 6 + -107 ttgcaagaacccttcccctagcaaCAGATGCGt -48 leukocyte antigen DRalpha) catctcaaaatatttttctgattggcc Hypoxia-inducible factor 1, alpha 14 + 70 gcctccgcccttgcccgccccctgacgctgcctcagc 129 subunit (basic HLH TF). tcctcagtgcacagTGCTGCCTc IAPP (islet amyloid polypeptide); 12 + -122 agttaatgtaataatgacccatccgcttcTGCTGC -63 CTGtgaggtactttctatctatagg Insulin-like growth factor binding 7 - 49 cctgggccaccccggcttctatatagcggccggcgc -10 protein 3. gcccgggccgccCAGATGCGagca Insulin-like growth factor II, IGF2 11 - -2777 cccgcctccagagtgggggccaaggctgggcagg -2836 gene. cgGGTGGACGGccggacactggcccc Interleukin enhancer binding 1 - 63 caataggcataagcgatggggtaaggcgggagcc 4 factor 2, 45kDa. aatcGCACCGGGttttattgggacac Isocitrate dehydrogenase 3 X - -99 GCCTGTCCccgaaacttcgcaccccgtcgaac -158 (NAD+) gamma. 
tctcgcgagagcggtatctgcgtgtcgg KH domain containing, RNA 1 + -69 tgccggtggctggctgcGCACGCGCgccgcct -10 binding, signal transduction catttccggtgctctctctcgctgggtc associated 1. Kinesin 2 60/70kD. 14 + -26 tgtggggcggggcgaggcggggcgCCGGGG 33 GGagcaacactgagacgccattttcggcgg Kinesin family member 22. 16 + -31 gggcgggGCACCGGGGaattcgaatgggag 28 aggcgggcccaaggagggagtggaatggcc Lectin, galactoside-binding, 19 - -219 tcctccgcgttctcctcctccgctgccCACCTGTC -278 soluble, 4 ctgggtcattcctgcagcctgccct L-3-hydroxyacyl-Coenzyme A 4 + 20 gggacgccgggggcgcgcgggctgcagggccgC 79 dehydrogenase, short chain. GTAGGTCCccgcccccagagtctggct Likely ortholog of mouse 19 + 17394 tcccacaggccccgccccctcctcccaccctcgttca 17453 phospholipase D3. GCCTGTCCagacagaagctgggg lyn 8 + -66 gccctccgggctcaatatgcaaatccgaGCACC -7 AGGaagtagctgggacctctcggccga lyn 8 + -66 gccctccgggctcaatatgcaaatccgaGCACC -7 AGGaagtagctgggacctctcggccga Lysosomal acid phosphatase 2. 11 - -59 tgctTGCAGGTGccacccagcgggttccagctt -118 gtttgctgcatagattacaacggtgat Mannose-P-dolichol utilization 17 + 156 cGCACGCGCAAcgaaagtcaatggcggtctg 215 defect 1. gagagactggcggaagctagctttgcaat Membrane cofactor protein 1 + 6 gccacgccCACCTGTCCtgcagcactggatgc 65 (CD46, trophoblast-lymphocyte tttgtgagttggggattgttgcgtccca cross-reactive antigen). Mitochondrial ribosomal protein 1 + 6 cgttcccagcaggccctgcgcgcggcaacatggcg 65 L37. gggtcCAGGTGGAggtcttgaggct (continued on next page)


(Table A1 continued) Mitochondrial ribosomal protein 19 + -49 agagaggtctcattggctgaattccTGCACCGG 10 L4. ctcgtcggaggcgggacccaaagtagg Mitochondrial ribosomal protein 12 - -6 ggctgttgcggatggggcgtaggtgggcggtgcgcc -65 L51. cacaGCTGCCTGggtaaggcccaa MMP11 matrix metallopeptidase 22 + -424 AacttgcttgtgggtaggacCACCTGTCagagg -365 11 (stromelysin 3) tcagaggtcaggccaccaaggagaccc

Myo-inositol 1-phosphate 19 - 47 ggggaggggccgcagcgcttttctccggaggtcgcg -12 synthase A1. cgcccgagagcccgcGCTGTCCGc Myo-inositol 1-phosphate 19 - 47 ggggaggggccgcagcgcttttctccggaggtcgcg -12 synthase A1. cgcccgagagcccgcGCTGTCCGc Na/H exchanger NHE-1. 1 - 214 gcTGCTCGCTGgtgcctataagtgacagcgcc 155 gggctcagctaggcttcagtctgctgcg NADH dehydrogenase 22 - -76 tcgcccgcggtccGCACGCGCtgcttgcaaag -135 (ubiquinone) 1 alpha gggtggggttgtggagtggatgctttgg subcomplex, 6, 14kDa. Nipsnap homolog 1 (C. elegans). 22 - -23 gccaccagggcttgggggcggggccttccTGCA -82 ACCTttgcggctccaacatggctccgc NRAS-related gene. 1 - -18 TGCACGCGCccggagcccgcaactctcgcga -77 gagaagcgagatttattcctacgtaccgg Opsin green, GCP gene. X + -90 gggctgatcccactggccggtataaagcgccgtgac -31 cctcaggtgacGCACCAGGgccgg Opsin red, RCP gene. X + -57 gggctgatcccacaggccagtataaagcgccgtga 2 ccctcaggtGATGCGCCAgggccgg Peptidylprolyl isomerase B 15 - -118 ccaccctcttccggcctcaGCTGTCCGggctgct -177 (cyclophilin B). ttcgcctccgcctgtggatgctgcgc Peroxiredoxin 4. X + -84 gggggaggagctggtctcgcgaggccgcgccccg -25 gttcgcgcgggcgtcgTGCACGCGgt Phenylalanyl-tRNA synthetase 2 - 33 ggtcctacgcgcttcgctagggaagcccgggtcaG -26 beta-subunit. CTCGCTGcgcaggcgcagtgagttc Phosphomevalonate kinase. 1 - -38 actgttctaagtgagttcgGGTGGGGGagcttca -97 cgaggggaggctgctctgtgaaggaa Prion protein (p27-30) 20 + 259 gccccgcgtccctccccctcggccccgcgcgtcGC 318 (Creutzfeld-Jakob disease, CTGTCCtccgagccagtcgctgaca Gerstmann-Strausler-Scheinker syndrome). Protease, serine, 23. 11 + 44 gcgcggggggcggacccgccaGCTGCCTGc 103 gctgctcgccagcttgctcgcactcggctg Proteasome (prosome, 9 - 49 acgccggaagtgctcggctctttttgttTGCACCC -10 macropain) subunit, beta type, 7. Gcctccgacccggaactgctttctt Protective protein for beta- 20 + 334 agactgtcacgtggcgccggagttcacgtgactcgta 393 galactosidase (galactosialidosis). CACATGACttccagtccccgggc Protein (peptidyl-prolyl cis/trans 19 + -33 cgcgggcgaCAGGCGGCgcagctgaggcgg 26 isomerase) NIMA-interacting 1. 
agcaggcgctgcggcaggagggaagatggc Protein kinase C, alpha binding 22 + 103 caGTGGGGGGaaagccgggacttccgcgtctt 162 protein. gccggaagtgacgtgacaatcgcggcca RAE1 RNA export 1 homolog (S. 20 + 117 cgacgatttccgggcgacgcaggaagtggctccag 176 pombe). ggcGCACGCGCgttgtttccgcggt Regenerating islet-derived 1 beta 2 - 18 agttctctctcagaccctgatataaagctcctactctgt -41 (pancreatic stone protein, ctgACCTGACAAgccacctca pancreatic thread protein). Regulatory factor X, 4 (influences 12 + 101425 ctcctcccccacccacagagtgagttcCAGGTG 101484 HLA class II expression). GGaaggcagttatgacagttgagaagt Reserved. X + -2 catctccctgggagtcgcgcagagtggagtcaaag 57 gcaaccagTGCTCGCTGcggtctct Ribosomal protein L14. 3 + -20 ggggcggtgcgttcttctaCACATGCGcagggtt 39 gggcgggtcttcttccttctcgccta Ribosomal protein L15. 8 + -49 gctgAGGTGGGGGaggagcccaaaaggcat 10 tgtgggagtacagctctttcctttccgtct Ribosomal protein L7. 8 - 47 ggcttctctcgcgaagtctttaagtggacagtacgcA -12 TGCGCCAacttcctctttttccg (continued on next page)


(Table A1 continued) Ribosomal protein L8. 8 - 3 ggaagataaggccGCTCGCTGacgccgtgttt -56 cctctttcggccgcgctggtgaacaggt Ribosomal protein, large, P0. 12 - -55 tggctactttgttcgcattataaaagGCACGCGC -114 gggcgcgaggcccttctctcgccagg Ring-box 1. 22 + -52 AGGTGGGGGGagcgaagtgtgcgctgctgcg 7 caggcgcggtggtcggacgacagaccgtg RNA binding motif protein 3. X + -13 cgggACGCGCAAggggaacgttccgggacgtt 46 ctcgctacgtactctttatcaatcgtct S100 calcium binding protein A1. 1 + 16 gtctccacacacagctccagcagccacattTGCA 75 ACCTtggccatctgtccagaacctgc S-adenosylhomocysteine 20 - -5 tatgcaagtgcgaggaagatatttaaaggcgtcggc -64 hydrolase. gcCACGCGCAtatccctgctcggc Secreted protein, acidic, 5 - 8 ggtttcctgttTGCCTGTCtctaaacccctccacat -51 cysteine-rich (osteonectin). tcccgcggtccttcagactgcccgg Serine/threonine kinase 25 2 - -273 ggcaccgggaggaagctgccttggaagAGGTG -332 (STE20 homolog, yeast). GGGGcggcgacgggaggggcggcgagcc Sialyltransferase 4C (beta- 11 + 49837 gtgtctaggcaggagagtttgtgaagctgaccggAC 49896 galactoside alpha-2,3- ACCTGTggctcttatttcctaggt sialytransferase). Similar to S. cerevisiae RER1. 1 + 18 cgcgccatcttgggggccctggaggcggcgccgcg 77 gaggacggagcggaagTGCTCGCTG Small nuclear ribonucleoprotein 15 - 30 cggctgcgattggTGCTGCCTGgcggcgggg -29 polypeptide A'. cgcggggcacgctgggacgtctcgctggc SMT3 suppressor of mif two 3 17 - 16 tcggctactcgtgctcgcgcacgcagGGTGGGG -43 homolog 2 (yeast). GGagggagcgcacgacgtgcgcgcacc Solute carrier family 25 4 + -46 ctcgcgagagcccggcggggatataagggggagc 13 (mitochondrial carrier; adenine e tgcgggCCAGGCGGCggccccctagc translocator), member 4. Solute carrier family 26, member 7 - 39 ccactgtcttacatagcctatatatagacatccctatgc -20 3. atATGCACACagagcct Solute carrier family 35 (UDP- X - -263 gtgacgaacgcggaagtggtttttctgttgccgaggg -322 galactose transporter),member2. gacgggccggGCAGATGCcaaca Sorting nexin 6 , transcript variant 14 - 31 ccccgctcctggccgccggctccctccggcagcag -28 1. ggAGATGCGCGCCtgcgccggccct Syndecan 1. 
2 - -273 gagacctggcggagctgggGGTGGGGGGcc -332 agtttttgcaacggctaaggaagggcctgt T-cell- or lymphocyte-specific 1 + 22833 atccCAGGTGGGagggtggggctagggctca 22892 tyrosine kinase 1. ggggccgtgtgtgaatttacttgtagcct Tf (transferrin) 3 + -3325 tgctttcttacccacctcctgctggtcagcttttccagcttt -3266 ccTGCACGTAcacacaag Thioredoxin 2. 22 - 49 ctcttccgcctgcctgtacccggaagtgacgttatgta -10 cgtcccccCCCGAGGAagtgac Tissue specific transplantation 8 - 35 actccatttcccagagtgccccgccccagcctcccg -24 antigen P35B. gccccgcccCCCACCTGgctccgc TNFRSF6 (tumor necrosis factor 10 + 762 gaggagcggaactcctggacaagcCCTGACA 821 superfamily, member 6) Agccaagccaaaggtccgctccggcgcgg Transcription elongation factor A 20 + 6065 cgcgcgcggccgcggggccgagggtttgaACCG 6124 (SII), 2. GGGGtctgtcgtccgcggcggggctgc Translocase of inner X - 307 aaaaacagTGCCTGTCacagcacgcatgcaa 248 mitochondrial membrane 8 tgttatttgaatgaatctgaaagttgggc homolog A (yeast). Tropomyosin 1 (alpha). 15 + 63 gcccccttgggaaagtacatatctgggagaagCA 122 GGCGGCtccgcgctcgcactcccgct Tumor necrosis factor, alpha- 17 + 218 acgagcgcaggatgtgagctcacagcttgggactg 277 induced protein 1 (endothelial). ctgaggggCAGGCGGCtgcaggcta Ubiquitin-conjugating enzyme 21 - 5 ccggaagcagtccccggtgtcggggcaggagGC -54 E2G 2 (UBC7 homolog, yeast). ACGCGCgcggctgaggcgaggtcgctc uPA (urokinase-type 10 + -1942 aagtcatctgctctcagcaatcagCATGACAGc -1883 plasminogen activator). ctccagccaagtaatctggagtcatga Uridine phosphorylase. 7 + 443 cccccgggcagggcggggccgctcgcagactccat 502 atgagattcacctcGCAGGTGGttc (continued on next page)


Vacuolar protein sorting 29 (yeast).  12  -  46  tgtctccctctgctctcattggttgcggttaagtgggcgggtcgCCGAGGAGcctgagga  -13
Vacuolar protein sorting 35 (yeast).  16  -  -14  ggcttggaggggccgcagcgtCACATGACcgcgggaggctacgcgcggggcgggtgctgc  -73
Wilm's tumor, WT1 gene.  11  -  262  ggccaagaaggggAGGTGGGGGGagggttgtgccacaccggccagctgagagcgcgtgtt  203
XPA binding protein 1; putative ATP(GTP)-binding protein.  2  +  353  ggaagtttctctaccCATGCGGTgtctctatggtcgggtgggtggggccaggaggaagat  412

Table A2. Smuc Target Genes

Gene Description Ch 60n Sequence with Binding Site

2-hydroxyphytanoyl-CoA lyase. 3 - -5 gcctcttccttcccgttgtttaaggcagttggttgccctC -64 CTGTCCGtcagaggtgcagt Acid sphingomyelinase-like 1 + -45 tggcttccacttaagggtccggtatgCCTGCCTC 14 phosphodiesterase. Ctcgggccagcccagatcataccctg ADA (adenosine deaminase); 20 - -5396 ttttgactcacatggcagttggtGGTGGAGGgg -5455 Gene: G000183. aacaaaggagactgagtttcatcgaag AHA1, activator of heat shock 14 + 14 ggagggGGTGGAGGgaaaagcaagccagg 73 90kDa protein ATPase aggtgcttgcggccgcttctagtagtttcca homolog1. Aldo-keto reductase family 1, 10 + -11 aggggtttcctgcccattgtttttgtaatctctgaggag 48 member C3 (3-αhydroxysteroid aagcagcagcaaaCATTTGCT dehydrogenase, type II). ATPase, H+ transporting, X + 21 tgcggctgtcgaggccgctgaggcaGTGGAGG 80 lysosomal interacting protein 1. Ctgaggctatgatggcggccatggcgac Beta-2-microglobulin. 15 + -19 attggctgggcacgcgtttaatataaGTGGAGG 40 Cgtcgcgctggcgggcattcctgaagc beta-globin. 11 - 49 caggagccagggctgggcataaaagtcagggca -10 gagccatctattgcttaCATTTGCTtc Calcium binding atopy-related 10 - -13 aggagagtcacgtgagagtgggcggagggGGT -72 autoantigen 1. GGAGGtttgtctccgctgtttcatctct Calmodulin 3 (phosphorylase 19 + 62 gcgcgctgcgggcagtgagtGTGGAGGCgcg 121 kinase, delta). gacgcgcggcggagctggaactgctgcag CD63 antigen (melanoma 1 12 - -68 ccggggcggggccgcgcggCAGGCGGGgc -127 antigen). gggagccggggggcgcagctagagagccccg CDW52 antigen (CAMPATH-1 1 + -11 gcctcctCTGCCTCCtggttcaaaagcagctaa 48 antigen). accaaaagaagcctccagacagccctg CGA (glycoprotein hormone 6 - 36 ctgctggtataaaaGCAGGTGAggacttcatta -23 alpha-subunit) actgcagttactgagaactcataagac Chorionic gonadotropin- 6 - 48 ggtggaaacactctgctggtataaaaGCAGGT -11 Luteinizing hormone- Follicle GAggacttcattaactgcagttactgag stimulating hormone- Thyroid stimulating hormone -alpha. Claudin 4. 7 + 101 agccatataactgctcaACCTGTCCccgagag 160 agagtgccctggcagctgtcggctggaa c-myc 8 + -1666 gaattgttttctcttttggaggtGGTGGAGGgaga -1607 gaaaagtttacttaaaatgcctttg Complement component 1, s 12 + -44 cccctctgtttagatcagggaatttcagacaTGCA 15 subcomponent. 
CACTcgggtagggaatcttatgaac COP9 constitutive 17 - 18 cgtcgccgctcgcgaggacctCAGGTGGAtcg -41 photomorphogenic homolog ccgcggcccctcctcccagagcggcagc subunit 3 (Arabidopsis). CREBBP/EP300 inhibitory 15 + 26 ggccgcgcagcatctgtcTTGCTGGAagcttttt 85 protein 1. cctagaggttgagcggtttgcacaat Cyclin-dependent kinase 5. 7 - 29 ggggctggaggctCAGGTGCCgcctcctctgc -30 aacgccggggccagagtcttaaaaccga (continued on next page)


(Table A2 continued) Cystatin S (4). 20 - 62 gggctgggctgccaaagcaggataaatgcaCAC 3 CTGCCTgctggtctgggctccctgcct Cytochrome c oxidase subunit 19 + 50 tctctcagcttccggctggtagtagttccgcttCCTG 109 VIb , nuclear gene encoding TCCGActgtggtgtctttgctga mitochondrial protein. Cytochrome P450, subfamily IIIA 7 - 62 CTGCCTCCttctccagcacataaatctttcagca 3 (niphedipine oxidase), gcttggctgaagactgctgtgcaggg polypeptide 5. DKFZP586F1524 protein. 17 - -15 ACCAGGCGGGccaagagggccgggacgcc -74 gcgcggggcagtgtgggactggggcggaac EGF-containing fibulin-like 11 - -49 CAGGCGGGcgggcggggggcgcttcctgggg -108 extracellular matrix protein 2. ccgcgcgtccagggagctgtgccgtccgc EGFR (epidermal growth factor 7 + -55 cctcccgcCCTGCCTCCccgcgcctcggccc 4 receptor) gcgcgagctagacgtccgggcagcccccc Endomucin. 4 - -42 aaataagtaggaatgggcagtggctattcacattca -101 ctacaccttttcCATTTGCTaata Enolase 3, (beta, muscle) , 17 + -21 gggaccgagtggctcagggataaatgcgcagcctg 38 transcript variant 1. agagggggtgagctgACACTGTCCc Epidermal growth factor receptor, 7 + -64 gtccctcctcctcccgcCCTGCCTCCccgcgcc -5 EGFR or ERBB1 gene. tcggcccgcgcgagctagacgttcggg Epo (erythropoietin) 7 + 2967 gcagtgcagcaggtccaggtccgggaaacgagg 3026 GGTGGAGGgggctgggccctacgtgct Eukaryotic translation initiation 16 + 486 gccttagagggctgtgatagccggtagccaccCA 545 factor 3, subunit 8, 110kDa. CCAGGCacaagggatagaatcacaga Farnesyl-diphosphate 8 + 11 gggcggggcgtcgccgtactaggcctgcccCCT 70 farnesyltransferase 1. GTCCGgccagcccctcgaagcacctac Fucosidase, alpha-L- 1, tissue. 1 - 10 agccgcccgcgggCACCTGCgcgttaagagtg -49 ggccgcgtcgctgaggggtagcgatgcg Fumarate hydratase , nuclear 1 - 44 cggaacggtttctgacaaccggcGTGGAGGC -15 gene encoding mitochondrial gtggccacagccgcccagaaattctaccc protein. GLOB-B (beta-globin) 11 - 53620 tccccagcaggatgcttacagggcagatggcaaaa 53561 aaaaggagaagctgACCACCTGact Glycophorin C (Gerbich blood 2 + -2 cagagcccctcccctcggcccgcgcgggaggagt 57 group). gtgaccCAGGTGCCgcttcctctcgc Growth hormone-releasing 20 - 4974 acgcttaggaaaatgaagagataaatgatgggaac 4915 factor, GHRH or GHRF gene. 
gCCAGGCGGctgccagagcaaacac Guanine nucleotide binding 12 + 681 ggggccaggccaggccaggccagctcctctggca 740 protein (G protein), beta gcagagcctggGCAGGTGAcgggcgg polypeptide 3. Guanylate binding protein 2, 1 - 47 aattccttttatGGTGAATGagtcactgctttagttg -12 interferon-inducible. atactttgtttcatattagtgca H2A histone family, member Y. 5 - 698 tttcggCCTGTCCGcagtttttaaaaaacgtgtgt 639 gatgataaggaatcactgtctacat Insulin-like growth factor II, IGF2 11 - -2777 cccgcctccagagtgggggccaaggctgggcagg -2836 gene. cgGGTGGACGgccggacactggcccc Interferon-induced protein with 10 + 3 caccattggctgctgtttagctcccttatataACACT 62 tetratricopeptide repeats 1. GTCttggggtttaaacgtaactg Keratin 17. 17 - -824942 ctgcttgtattggcTTTGCTGGAagcgaagggg -825001 gactcctctctggaaatgagaaggtga Lectin, galactoside-binding, 19 - -219 tcctccgcgttctcctcctccgctgccCACCTGTC -278 soluble, 4 ctgggtcattcctgcagcctgccct Likely ortholog of mouse RNA pol 9 + -49 cggggacacgcggctcgcgcgctgtgggcGGT 10 I associated factor, 53 kD. GCCCGgcggggccacgccttttccggcc lyn 8 + -66 gccctccgggctcaatatgcaaatccgaGCACC -7 AGGaagtagctgggacctctcggccga Lysosomal acid phosphatase 2. 11 - -59 tgctTGCAGGTGCCacccagcgggttccagct -118 tgtttgctgcatagattacaacggtgat Major histocompatibility complex, 6 - 28550917 aaatcagtaacttcctccccataatttggaatgtgGG 28550858 class II, DR beta 3. TGGAGGgggatcatagttctccc (continued on next page)


(Table A2 continued) Melanoma antigen, family A, 3. X - -650625 tggcggcttgagattGGTGGAGGgaagtgggt -650684 ccaggctcggtgaggaggcaaggtgaga Membrane cofactor protein 1 + 6 gccacgccCACCTGTCCtgcagcactggatg 65 (CD46, trophoblast-lymphocyte ctttgtgagttggggattgttgcgtccca cross-reactive antigen). Metallothionein-1 8 + -69 tatccgagccagtcgtgccaaaggggcggtcccgc -10 tgTGCACACTGgcgctctcgagctc Mitochondrial ribosomal protein 11 - 29 ctCACCTGCCctagcctaacgtgaggacccttg -30 L11. acctctggcccaagatggtggcgccca Mitochondrial ribosomal protein 1 + 6 cgttcccagcaggccctgcgcgcggcaacatggcg 65 L37. gggtcCAGGTGGAGGtcttgaggct MLC (myosin light chain 1/3 - -26555 taccatgtgtgaaatgcctgtaactcaagtaacaGC -26614 locus) AGGTGCaaaataaagtagcaggcg MMP11 matrix metallopeptidase 22 + -424 aacttgcttgtgggtaggacCACCTGTCagagg -365 11 (stromelysin 3) tcagaggtcaggccaccaaggagaccc

N-acetylglucosamine kinase. 2 + 323 cggtgcccagacagctggagggaaggaggtgtC 382 AGGCGGGgagagacgcaaacggcggga Opsin green, GCP gene. X + -37126 gggctgatcccactggccggtataaagcgccgtga -37067 ccctcaggtgaCGCACCAGGgccgg Pepsinogen A. 11 + -50 ataaggcgggacccaacttgtatataaggggcagc 9 tcatgctgctgctcTGCACCTTcct Pepsinogen A. 11 + -50 ataaggcgggacccaacttgtatataaggggcagc 9 tcatgctgctgctcTGCACCTTcct Peptidyl-prolyl isomerase G 2 + -49 gtgacgcaaGCAGGTGAAcgctggccgatgc 10 (cyclophilin G). tgacaacccactcctgacgcgcttccggt Phenylalanine-tRNA synthetase- 19 - 16 ttccggtggctcagctccccccgcccaccctggccc -43 like. ggacacgctgaGCACACTGgaagg Phosphatidylserine synthase 1. 8 + -52 ctcgcgcctgtccttccctctgctcccagccTTTGC 7 TGGgcgccagacccggctttgccg Proteasome (macropain) 26S 14 + -22 acttccgcctctggcgggccgcagtGGTGGAG 37 subunit, ATPase, 1. Gaacttccggcagcggcagctcaagtgg Proteasome (macropain) 26S 2 + -66 cggCAGGCGGGgtggcgggcagcccctggg -7 subunit, non-ATPase, 1. cgggcggggtcctggcgagaagcgagccgg Serine proteinase inhibitor. 14 - -8 aactgggcactgtgcccagggcatgcaCTGCCT -67 CCacgcagcaaccctcagagtcctgag Sialyltransferase 4C (beta-gal 11 + 49837 gtgtctaggcaggagagtttgtgaagctgaccggA 49896 alpha-2,3-sialytransferase). CACCTGTggctcttatttcctaggt Solute carrier family 25 4 + -46 ctcgcgagagcccggcggggatataagggggagc 13 (mitochondrial carrier; adenine e tgcgggCCAGGCGGcggccccctagc translocator), member 4. Solute carrier family 25 X + -21 gcctccgacgcctgccgctccagctccggctccccc 38 (mitochondrial carrier; adenine tatataaatcggcCATTTGCTtcg translocator), member 5 Stromelysin, MMP3 or STMY1 11 - 47 caaacaaACACTGTCactctttaaaagctgcgc -12 gene. tcccgaggttggacctacaaggaggca Surfeit 4. 9 - 14621 gccgacgcggagcgaggccggccgccgggcactt 14562 cctGTGGAGGCcgcagcgggtgcggg TCR-alpha (T-cell receptor 14 + 935415 gccgaggggagcaggctgggccgttacaccaccc 935474 alpha); cccaaccGCAGGTGCagcaaggccaa Thymidylate synthetase. 
18 + 72 gcgccacttggcctgcctccgtcccgccgcgccactt 131 ggCCTGCCTCCgtcccccgcccg Transformer-2 alpha (htra-2 7 - -7 ttacgtcgtgctctgaggaggccaggagatttctggc -66 alpha). ggcgccggcgccatTTTGCTGGA Transmembrane 4 superfamily 3 - -3 ggcgggacattcccCCTGCCTCttcgcaccac -62 member 1. agccagagcctgccattaggaccaatga Uridine phosphorylase. 7 + 443 cccccgggcagggcggggccgctcgcagactcca 502 tatgagattcacctcGCAGGTGGttc Voltage-dependent anion 5 - 18 cgccccgcccttggcccagccgctcgctcggctccg -41 channel 1. ctccctggctcggctcCCTGCCTC V-ral simian leukemia viral 2 + 36 tgacaaatcGGTGGAGGacggctggggtccg 95 oncogene homolog B. gccccgggagggggcggggcgcgtttaag


Table A3. Snail Long Target Genes

Gene Description Ch 60n Sequence with Binding Site

2,3-bisphosphoglycerate mutase. 7 + 4 taggcagagccctaggacctgccCACGTCTGg 63 ttcaggggctagaaaagagcgtcgatgc 2-hydroxyphytanoyl-CoA lyase. 3 - -5 gcctcttccttcccgttgtttaaggcagttggttgccctC -64 CTGTCCGtcagaggtgcagt alpha-tubulin b alpha 1. 12 - -399 gcgaccgagggtctgggcgtcccggctgggcccc -460 GTGTCTGTgcgcacggtttcgctgat ATP synthase, H+ transporting, 11 + 168 gaaaagaggcggcctgaccttggaagTGGGAC 227 mitochondrial F0 complex, GGggtcctgcagcgggtccttccggcgg subunit g. ATPase, H+ transporting, 8 + 14 ggagcagcgcgcgccgcggtcagctgactgctggg 73 lysosomal 42kDa, V1 subunit C, ctgGCACGTGActtgttctgtgttc isoform 1. Basic leucine zipper and W2 7 + 16 ccgtgtgtctatgtcaatGTGTCTGTccttcactcc 75 domains 2. tccattgtctgccgccactgctgc Basic leucine zipper and W2 7 + 16 ccgtgtgtctatgtcaatgtGTCTGTCCttcactcc 75 domains 2. tccattgtctgccgccactgctgc BCR (breakpoint cluster region 22 + -780 gctcaagagaggccaggggacaaacacatggctc -721 protein) agcacacaggaaggcaggtgtgggta Carcinoembryonic antigen, CEA 19 + -36 tgaccCACGTGATGctgagaagtactcctgcc 23 gene. ctaggaagagactcagggcagagggagg CDC23 (cell division cycle 23, 5 - 29 gtGCAGATGGGCagccagtattatcgccgatg -30 yeast, homolog). attggctggagtggcccgagtcgggaaa CEA (carcinoembyonic antigen); 19 + -53 aataaagaccacacccatgaccCACGTGATG 6 ctgagaagtactcctgccctaggaagaga Cell division cycle associated 8. 1 + -40 tgttgTGGGACGGcggcttctcattcgtcagcctg 19 tgactgtggagtttgaattgggtgg CETP (cholesteryl ester transfer 16 + -395 acaggcaacagcaggcttcaagcttggggtCATT -336 protein) GTCGGgcaacagtatctggcaagaat Claudin 4. 7 + 101 agccatataactgctcaACCTGTCCccgagag 160 agagtgccctggcagctgtcggctggaa Crystallin, lambda 1. 13 - 33 ggtccgcgcgcgcagccaATGGGCGCggga -26 cccgccctccccggagcccagagctcgcag Cyclin C. 6 - -105 TTGTCGGCcggcgtgaaggagactagggggc -164 catcctcttcctttcgccgtcgccgccgc Cystatin S (4). 20 - 62 gggctgggctgccaaagcaggataaatgcaCAC 3 CTGCCTgctggtctgggctccctgcct Cytochrome c oxidase subunit 19 + 50 tctctcagcttccggctggtagtagttccgcttCCTG 109 VIb , nuclear gene encoding TCCGActgtggtgtctttgctga mitochondrial protein. 
Endothelial differentiation-related 9 - 23 tctctagcagctgccgctgagccgccggacggacg -36 factor 1. ctcgtcttcgcccgcCATGGCCGag Eukaryotic translation initiation 16 - 36 ctgttcgtTTGTCGGCgctaccaataaagttttag -23 factor 3, subunit 6 48kDa. tgagcacagactcccttttctttgg Farnesyl-diphosphate 8 + 11 gggcggggcgtcgccgtactaggcctgcccCCT 70 farnesyltransferase 1. GTCCGgccagcccctcgaagcacctac gamma-interferon, IFNG gene. 12 - 51 cctcaggagacttcaattaggtataaataccagcag -8 ccagaggaggtgcaGCACATTGTt General transcription factor IIIC, 9 + 276 gcgaggcctttgggagtactttgTGGGACGGac 335 polypeptide 5, 63kDa. cctggcgggccctgccagacgcacagg GPC (glycophorin C) 2 + -184 gtatccggtagtgGCAGATGGaaagagaaac -125 ggttagaaaagcgtggttcttgcgcagga H2A histone family, member Y. 5 - 49 tttcggCCTGTCCGcagtttttaaaaaacgtgtgt -10 gatgataaggaatcactgtctacat Heterogeneous nuclear X + -30 agttgctcacgtgaccgagatctcacatgacgtagg 29 ribonucleoprotein H2 (H'). cgctCACGTGATtactcgcttacg Hypothetical protein HSPC148. 11 - -7 gcttccggGTCGTAGGcgctagctctgggcgca -66 gaggtttctgggagccaagagtggtaa (continued on next page)


(Table A3 continued) IL-3 (interleukin-3) 5 + -167 tagaaagtcatggatgaataattACGTCTGTggt -108 tttctatggaggttccatgtcagata Insulin, INS gene. 11 - 49 gggAGATGGGCtctgagactataaagccagcg -10 ggggcccagcagccctcagccctccagg Interferon-related developmental 3 - 375 ctacacctcactaagcccatggagaggtcacagatg 316 regulator 2. ggccgtgcacggGCAGATGGGCCC Lectin, galactoside-binding, 19 - -219 tcctccgcgttctcctcctccgctgccCACCTGTC -278 soluble, 4 ctgggtcattcctgcagcctgccct Likely homolog of rat and mouse 17 + -25 gacccgccccgtccgcctccgtCACGTGATGg 34 retinoid-inducible serine ggcgccgggtccagcctgttgctgatgc carboxypeptidase. Melan-A. 9 + -93 aacccgtgactatCATGGGACtcaaaaccagg -34 aaaaaaaataagtcaaaacgattaagag Membrane cofactor protein 1 + 6 gccacgcCCACCTGTCCtgcagcactggatg 65 (CD46, trophoblast-lymphocyte ctttgtgagttggggattgttgcgtccca cross-reactive antigen). Mitochondrial ribosomal protein 12 + -26 cccgcgggcTGGGACGGcgcgcgcagcgct 33 L42. gtttcgtgggaccaggattgaaacaagatg MMP11 matrix metallopeptidase 22 + -424 AacttgcttgtgggtaggacCACCTGTCagagg -365 11 (stromelysin 3) tcagaggtcaggccaccaaggagaccc Nucleolar and coiled-body 10 + 184 ttcctgagtcgtgctgcgtcgaCAACGGTAgtga 243 phosphoprotein 1. cgcgtattgcctggaggatggcggac Polyamine-modulated factor 1. 1 + -17 gggaggccgggccagttagatttggaggttcaacttc 42 aACATGGCCGaagcaagtagcgc Polymerase (DNA directed), δ 2, 7 - 49 gggcggggaagccgcgcggggattagcgagttgc -10 regulatory subunit 50kDa. ggcGATGGGCGgggcaggcgcgcggg Proteasome 26S subunit, non- 17 + -28 ggcgggctttccgggtgtgtgtttccggcgtcggcgg 31 ATPase, 11. ccgcggccgGGGACGGTgtgaga Protein x 0001. 3 - 34 cctcggagtcttcggcaccgccctgtcccagcctcctt -25 tgcgggtaaacagACATGGCCG Pyruvate dehydrogenase 3 - 40 aggcggggcctgctCACGTCTGccggcccctc -19 (lipoamide) beta. tgttgtcgtttggcagcggatagaggac Ribosomal protein S28. 19 + -49 gggcggtgctagaacgtcctataaaggctctccccg 10 aaGCACGTGActcctctccgccag Sarcoma amplified sequence. 
12 + 0 aatccagaggcaggggaGGGACGGTgcagg 59 cgcagagtattgggtttggctggcctcgat Sjogren syndrome antigen B 2 + -48 cctgcccacccaggccgcaagagctgccGGGA 11 (autoantigen La). CGGTccccatcttcttggagcgctttag Small acidic protein. 11 + 19 aggggccGCAACGGTgacgactgtggcagag 78 aaggcccggaggggctctgcgttctgtag Small nuclear ribonucleoprotein 19 + 290 tagagcatcctggAAGTCGTAGtaaatctctcg 349 polypeptide A. agagttctctccgcacgcgggctggag Solute carrier family 2 (facilitated 1 - -259 agccaATGGCCGGggtcctataaacgctacgg -318 glucose transporter), member 1. tccgcgcgctctctggcaagaggcaaga TAF9 RNA polymerase II, TATA 5 - -444 gccgcccgccccttccggctcctccctgtagagcaa -503 box binding protein (TBP)- aggGCACGTGAgcgaggcgcccga associated factor, 32kDa. Thyroid autoantigen 70kDa (Ku 22 + -45 gCGTGATGAcgtagagggcgttgattgggacc 14 antigen). gagtacagggcccgcgcatgcgtggatt Transgelin. 11 + 0 tcaccacggcggcagccctttaaacccctcacccag 59 ccagcgccccatcctGTCTGTCCg Tumor susceptibility gene 101. 11 - -33 ggttgggggtgtgcgattgtgTGGGACGGTctg -92 gggcagcccagcagcggctgaccctct Zinc finger protein 134 (clone 19 + -260 ggacggaagctctctgggcgggacttccggtatcttc -201 pHZ-15). ctcgcggtggacatcTTGTCGGC


Table A4. Snail Short Target Genes

Gene Description Ch 60n Sequence with Binding Site

2-hydroxyphytanoyl-CoA lyase. 3 - -5 gcctcttccttcccgttgtttaaggcagttggttgccctC -64 CTGTCCGtcagaggtgcagt A disintegrin & metalloproteinase 1 + 16 ttctccgaggcgacctggccgccggccgctcctccg 75 domain 15 (metargidin). cgcgctgttccgCACTTGCTGccc ADH1 (alcohol dehydrogenase 1) 4 - 118 gacagtctgtgaataatctaatGGGTGTGGctta 59 aagacctagatcatgtgtggaactgg ADH2 (alcohol dehydrogenase 2) 4 - 78 GGGTGTGGcttaaagacatagatcacgtgtgg 19 aattggaattggatgttacacaagcaaa ADH3 (alcohol dehydrogenase 3) 4 - 93 tgagaagaatacgaCGGGTGTGGcttaaaaa 34 cctagatcacgtgtgtagttggaattggg ALDC (aldolase C) 17 - 59 ccccggacaccagtcctggggaggGGGTGTG 0 Gtcagggcggggcatgcaggccacgcccc alpha 1-acid glycoprotein 1. 9 + 26 agtgaccgcccatagtttattataaaggtgACTGC 85 ACCctgcagccaccagcactgcctg alpha 1-acid glycoprotein 2, 9 + 31 agtgaccgcccatagtttattataaaggtgACTGC 90 Orosomucoid 2. ACCctgcagccaccagcactgcctg alpha-actin skeletal muscle. 1 - 47 agggaatgcccgcgggctatataaaACCTGAG -12 CAgagggacaagcggccaccgcagcgga APG12 autophagy 12-like (S. 5 - -188 gaaaagcacgccCACTGCACgcgctcagtcg -247 cerevisiae). ctacttccgctctcgagtgtctccaagca apoB (apolipoprotein B) 2 - 2705 tcgccttgaaattcctctagtCAGGTGGCtttctaa 2646 tgggtacccagagccctatgacta Apoptosis antagonizing 17 + 62 gccccgccccctcccgcgcgcctcccggaAGTG 121 transcription factor. GCCGgtccagagctgtggggtggcctc ATPase, H+ transporting, X + 21 tgCGGCTGTCgaggccgctgaggcagtggag 80 lysosomal interacting protein 1. gctgaggctatgatggcggccatggcgac bbc3 (bcl-2 binding component 19 - -1413 gggcgcggcgcgCGCCTGCAagtcctgacttg -1472 3); tccgcggcgggcgggcggggccgtagcg BCR (breakpoint cluster region 22 + -752 tggctcagcacacaggaaggcaGGTGTGGGt -693 protein) attgagtgacctgagggagaaatccctat Block of proliferation 1. 8 - 0 ctcctgcgcacgcggcccggtcgctgtcggaagcg -61 gctgtgcgGGTGGCGGccggcgcgc Capping protein (actin filament) 1 - 65832 GGTGGCGGcggcccggcgcggggggaggg 65773 muscle Z-line, beta. gggtgctgacccggatgttcactcctgggca CD9 antigen (p24). 
12 + -38 cgccctaaggagtggcactttttaaaagtgcagccg 21 gagaccagcctacagcCGCCTGCA CDC23 (cell division cycle 23, 5 - 29 gtgcagatgggcagccagtattatcgccgatgattgg -30 yeast, homolog). ctgGAGTGGCCgagtcgggaaa Cellular retinoic acid binding 1 - 33 ggtataaaaGCTGTCCGcgcgggagcccagg -26 protein 2. ccagctttggggttgtccctggacttgtc Chromosome 3 open reading 3 - -115 agggggcgggctctgcagggaagtgcgtcagagg -174 frame 4. aggcgcggggagagtagGGTGCTGTg Claudin 4. 7 + 101 agccatataactgctcaACCTGTCCccgagag 160 agagtgccctggcagctgtcggctggaa CYP3A4 (Cytochrome P450 3A4) 7 - -41258 gataatctcagctgaatgaACTTGCTGaccctct -41317 gctttcccccagcctctcagtgccct Cystatin S (4). 20 - 62 gggctgggctgccaaagcaggataaatgcaCAC 3 CTGCCTgctggtctgggctccctgcct Cytochrome b-561. 17 - -8751 catctcggGGTGCTGTgcgctgctcctctgagct -8810 ctgctctttcttgcagcgtttgcctc Cytochrome c oxidase subunit 9 + 50 tctctcagcttccggctggtagtagttccgcttCCTG 109 VIb , nuclear gene encoding TCCGactgtggtgtctttgctga mitochondrial protein. DHFR (dihydrofolate reductase) 5 - -377 cggggcctcGCCTGCACaaatggggacgagg -436 ggggcggggcggccacaatttcgcgccaa Dihydrofolate reductase. 5 - -377 cggggcctCGCCTGCACaaatggggacgag -436 gggggcggggcggccacaatttcgcgccaa (continued on next page)


(Table A4 continued) Endothelin-A receptor, EDNRA or 9 + 143 tgcccttcagggcctggaagggggcggcagctttgt 202 ETRA gene. gctttttAGTGGCCGcgtcccagg Farnesyl-diphosphate 8 + 11 gggcggggcgtcgccgtactaggcctgcccCCT 70 farnesyltransferase 1. GTCCGgccagcccctcgaagcacctac Galactosidase, alpha. X - -39 gtccgcccctgaggttaatcttaaaagcccaggttac -98 ccgcggaaatttatGCTGTCCGg GDP dissociation inhibitor 1. X + 209 cggaggggtcgggcgacggccgacgcgccgccat 268 ctttggtccagtgcGGTGGCGGcggc GPC (glycophorin C) 2 + -124 aagttggagcaaagaagagttcagaagtgggCG -65 GGTGTGtgtttaaaaaaaaaaaagggg H2A histone family, member Y. 5 - 698 tttcggCCTGTCCGcagtttttaaaaaacgtgtgt 639 gatgataaggaatcactgtctacat Heat shock 70kD protein 8. 11 - 49 aaggttctaagatagggtataagaggcagGGTG -10 GCGGgcggaaaccggtctcattgaact High mobility group nucleosomal 6 + -25 tcccCGCCTGCAgccccggaagcgcatcctca 34 binding domain 4. ccttccctcgccttcctgttcctggcgg Hydroxyacyl-Coenzyme A X - 29 tcggccaatcaacgagcgcccgcgcccccatcccc -30 dehydrogenase, type II. atcccgtgGAGTGGCCGgcgacaag Hypothetical protein FLJ12525. X - 18 ctggtgcGGTGGCGGttcggagtgagggtagc -41 acgctgagctgaaggctgtgcggagcgg Hypothetical protein FLJ20422. 19 - 74 cgggatgctcgtcagaccagagtcgggctctaattg 15 gtcgagtttccgcagAGTGCCCGg Hypothetical protein HSPC148. 11 - -7 gcttccggGTCGTAGGCgctagctctgggcgc -66 agaggtttctgggagccaagagtggtaa Hypoxanthine X + 80 cctcttgctgcgcctccgcctcctcctctgctccgccac 139 phosphoribosyltransferase 1. cggcttcctcctCCTGAGCAg Insulin-like growth factor II, IGF2 11 - 8829 cccgcctcCAGAGTGGgggccaaggctgggc 8770 gene. aggcgggtggacggccggacactggcccc Intercellular adhesion molecule 1 19 + 230 gccccCAGGTGGCtagcgctataaaggatcac 289 (CD54), human rhinovirus gcgccccagtcgacgctgagctcctctg receptor. Lectin, galactoside-binding, 19 - -219 tcctccgcgttctcctcctccgctgccCACCTGTC -278 soluble, 4 ctgggtcattcctgcagcctgccct Mannose-6-phosphate receptor 12 - 38 gaggagcggttgcccagcggcctcttggcgcttcctg -21 (cation dependent). 
tttccggttcccCAGAGTGGggca MCM3 minichromosome 6 - -9 acctttGGTGGCGGgaagagttcggaagttttcg -68 maintenance deficient 3 (S. cgccggggtggagtcatcctgggaac cerevisiae). Melanocortin-2 receptor (ACTH 18 - 55 gtgttccggcccttcccggcccaaggtcCACTTG -4 receptor) CTtgcttttctctccgagctcattcc Membrane cofactor protein 1 + 6 gccacgccCACCTGTCCtgcagcactggatgc 65 (CD46, trophoblast-lymphocyte tttgtgagttggggattgttgcgtccca cross-reactive antigen). Metallothionein IF, MT1F gene. 16 + -4 cccggccccctcccctgactatcaaagcagcggC 55 CGGCTGTttgggtccaccacgccttc MMP11 matrix metallopeptidase 22 + -424 AacttgcttgtgggtaggacCACCTGTCagagg -365 11 (stromelysin 3) tcagaggtcaggccaccaaggagaccc Muskelin 1, intracellular mediator 7 + 217746 cctcccctcctcccgttcgctgccagcggtcGGTG 217805 containing kelch motifs. GCGGccgctacggtgctgacaagat Myo-inositol 1-phosphate 19 - 47 ggggaggggccgcagcgcttttctccggaggtcgc -12 synthase A1. gcgcccgagagcccgcGCTGTCCGc Myozenin 2. 4 + 77 atgctgtcccaggttcaaggataaaaaccatcaggc 136 ccCAAGTGCCatccatagtccatct NADH dehydrogenase 8 + -42 ggacctccccagcggaagcggaAGTGGCCG 17 (ubiquinone) 1 beta subcomplex, ccggcaactccgcccttccggctggccccg 9, 22kDa. Neighbor of A-kinase anchoring 19 - 15 aggcggggctCGGGTGTGtgtggaggggacc -44 protein 95. ctgtggttagcagcagctatcgcagcgtc Non-metastatic cells 2, protein 17 + 832 gctgggccctccgGGGTGTGGccaccccgcg 891 expressed in nuclear gene ctccgccctgcgcccctcctccgccgccg encoding mitochondrial protein. (continued on next page)


(Table A4 continued) Nuclear antigen Sp100. 2 + 15 tttctgagtcagtctgtcggccgacttcctgcttggggc 74 ctgggcagccaCACTGCACgc Opsin blue, BCP gene. 7 - 63 ggaggggttttgtggggtgggaggatcacctataag 4 aggactcagaggggGGTGTGGGGc Peptidylprolyl isomerase B 15 - -118 ccaccctcttccggcctcaGCTGTCCGggctgc -177 (cyclophilin B). tttcgcctccgcctgtggatgctgcgc Phosphoglycerate kinase 1. X + 37 GGTGTGGGgcggtagtgtgggccctgttcctgc 96 ccgcgcggtgttccgcattctgcaagc Pinin, desmosome associated 14 + -2 tcattggctgagcCCGGCTGTCagtcctttcgcg 57 protein. cctcggcggcgcggcatagcccggct Polymerase (RNA) II (DNA 7 - 25 ggtttgccccgccggagtaatccggaagaggcctctt -34 directed) polypeptide J, 13.3kDa. attagggctctGGTGGCGGcggtc Propionyl Coenzyme A 13 + -15 accccctCCGGCTGTgtgagaggtcagcaga 44 carboxylase, α polypeptide , ggggcggtctgcggggacaacaatggcgg nuclear gene encoding mitochondrial protein. Proteasome 2 + -66 cggcaggcggGGTGGCGGgcagcccctggg -7 (prosome,macropain) 26S cgggcggggtcctggcgagaagcgagccgg subunit, non-ATPase, 1. Proteasome (prosome, 17 + -28 ggcgggctttcCGGGTGTGTGtttccggcgtcg 31 macropain) 26S subunit, non- gcggccgcggccggggacggtgtgaga ATPase, 11. Protein kinase, interferon- 2 - -49 gcctgccccctcgctggagcaacgcaagcaggag -108 inducible double stranded RNA gcgggggagtcggaggAGGTGGCGGc dependent activator. RAB7, member RAS oncogene 3 + 8 tcggggcggcggcGGTGGCGGaagtgggag 67 family. cgggcctggagtcttggccataaagcctga Reserved. X + -2 catctccctgggagtcgcgCAGAGTGGagtca 57 aaggcaaccagtgctcgctgcggtctct Ribonuclease H2, large subunit. 19 + 1 cgccgagacccgctcctgcagtattagttcttgcagct 60 ggtGGTGGCGGctgaggcggca Ribosomal protein S7. 2 + 284 gCGCCTGCAgcccgggccgcgtaacgctgac 343 cgctgtgccttcagttctcccaggagaaa S100 calcium binding protein A9. 1 + -49 tgccccagtcaggagctgcctataaatgccgaGC 10 CTGCACagctctggcaaacactctgt Sialyltransferase 7D 9 - -45 gagctccgagcggcggatcgcgagcctcctgcgaa -104 galactosaminide alpha-2,6- ccccaGCCTGCACgcccggttagca sialyltransferase. SKI-interacting protein. 
14 - 51 tccttcctttccgctcctcatcatctggaaagacccgcc -8 cagcGGTGCTGTcgctcgcgc Small nuclear ribonucleoprotein 18 + -39 cggcccagcgcagcctcggcgtttccgccatgCG 20 D1 polypeptide 16kDa. CCTGCAgtgctccgcgcgctcttgac Small nuclear ribonucleoprotein 19 + 290 tagagcatcctggAAGTCGTAGtaaatctctcg 349 polypeptide A. agagttctctccgcacgcgggctggag Small nuclear ribonucleoprotein 6 + -54 gctggctgtgcgcgtcatttccgggcgtcacgtaacg 5 polypeptide C. GAGTGGCCaacggcctgcagagc SNRPN upstream reading frame. 15 + 32 gcgaagcctgccgctgctgcagcgagtctggcgCA 91 GAGTGGagcggccgccggagatgcc Solute carrier family 31 (copper 9 + 11 gcgggaaatcctcggcctcGGTGGCGGtggtg 70 transporters), member 1. gacacgtcgagccgggtagaagtggagg Splicing factor 3a, subunit 2, 19 + -2 ccggaagtgctttctttgcccgccgttcgccaaacga 57 66kDa. agtcgtggAGGTGGCGaaacgag Succinate dehydrogenase 1 + -44 cccccagccggcgcgcctccgccctcgGGTGG 15 complex, subunit C, integral CGGggccgcctggcgtcacttccgtcca membrane protein, 15kDa. Syntaxin 10. 19 - 92 gggGGTGGCGGcgcgttctcctcggttgggag 33 ggaaccagcccgcgaacccaggccggga Thiopurine S-methyltransferase. 6 - -20 gagaagtggcggaggtggaagcggaggcgtaccc -79 gcccctggggacgtcattGGTGGCGG TPI (triosephosphate isomerase). 12 + -46 cgctctatataagtgggcAGTGGCCGcgactgc 13 gcgcagacactgaccttcagcgcctcg (continued on next page)


Triosephosphate isomerase 1.  12  +  62  gttccacttcgcggcgctctatataagtgggcAGTGGCCGcgactgcgcgcagacactga  121
Tumor necrosis factor-beta, LTA or TNFB gene.  6  +  -80  acgctgccactgccgcttcctctataaagggACCTGAGCgtccgggcccaggggctccgc  -21
Ubiquitin-activating enzyme E1 (A1S9T and BN75 temperature sensitivity complementing).  X  +  -146  cgttgttactgccacaatgtccaccaggtGGTGCTGTtccacttctaaccgaatgaatcg  -87
VAMP (vesicle-associated membrane protein)-associated protein A, 33kDa.  18  +  44  tcacgtgggtcgccgaggctcgcaagtgcgcgtggccgtggcggctGGTGTGGGGttgag  103

Table A5. Scratch Long Target Genes

Gene Description Ch 60n Sequence with Binding Site [Endogenous retrovirus 3]. 7 - 6308 ccgcccctgttggTTGCATGTataaaagtcaag 6249 ccctgtcattgttcagggctcagcctt 2',3'-cyclic nucleotide 3' 17 + 16 ccccgggctatgtaaagcggccgggctcgggtcGT 75 phosphodiesterase. GCCACCgctggactcccgtgtccct 2-hydroxyphytanoyl-CoA lyase. 3 - -5 gcctcttccttcccgttgtttaaggcagttggttgccctC -64 CTGTCCGtcagaggtgcagt ADH1 (alcohol dehydrogenase 1) 4 - 92 tgtggcttaaagacctagatCATGTGTGgaact 33 ggaatcgggtgttattcaagcaaaaaa Adipose differentiation-related 9 - -34 ggaggggcggcgaggcggggtttatagcccgggc -93 protein gcccgcgGGCCCCACgctttgaccgg ADP-ribosylation factor 5. 7 + -61 agcctcctccTGCTGCTGctgcgccccatcccc -2 ccgcggccggccagttccagcccgcac ADP-ribosylation factor-like 4. 7 + 248 gtgggcgtggtcggagtctgagtccaaggtttctcgtg 307 cGGCAACTGccagagctgcttg Advanced glycosylation end 6 - 62 gctggagagagggtgcaGGCCCCACCtagg 3 product-specific receptor. gcggaggccacagcagggagaggggcagac Aldolase A. 16 + 10991 gttataggaggagtctggcccttgagtaccgggtacg 11050 caggggtgcctcaACCACACTcc alpha 1-acid glycoprotein 1. 9 + 26 agtgaCCGCCCATagtttattataaaggtgactg 85 caccctgcagccaccagcactgcctg alpha 1-acid glycoprotein 2, 9 + 31 agtgaCCGCCCATagtttattataaaggtgactg 90 Orosomucoid 2. caccctgcagccaccagcactgcctg alpha-actin skeletal muscle. 1 - 47 agggaatgcccgcgggctatataaaACCTGAG -12 CAGagggacaagcggccaccgcagcgga Annexin A2. 15 - 49 tagggaggatgtggcgggtataaaaGCCCCAC -10 Ccaggccagccggctctgctcagcattt Antigen identified by monoclonal X + 36 cttgcagggccgctcccctcacccgcccccttcgagt 95 antibodies 12E7, F21 and O13. ccccgggcttcGCCCCACCcggc Antithrombin III, AT3 gene. 
1 - 0 tctGCCCCACCctgtcctctggaacctctgcgag -59 atttagaggaaagaaccagttttcag apoB (apolipoprotein B) 2 - 815 aggttccagatgtctatgaggaACATGACGtgtc 756 ctgtccactactctgcttttcctcgt apoB (apolipoprotein B) 2 - 2075 aaacaccgtctctactaaaaatacaaaaattagctg 2016 gGCATAGTGgcactcacctgttat apoB (apolipoprotein B) 2 - -31 ccttctcgGTTGCTGCcgctgaggagcccgccc -90 agccagccagggccgcgaggccgaggc apoE (apolipoprotein E) 19 + -96 ggagaacaGCCCACCTcgtgactgggggctg -37 gcccagcccgccctatccctgggggaggg apoE (apolipoprotein E) 19 + -167 gtggggggtggtcaaaagacctctatGCCCCAC -108 CTccttcctccctctgccctgctgtgc B-cell receptor-associated protein X - -317 ccggtttccggccgcggtatgaggggcggggccgg -376 31 (BAP31). gGCTGCTGTgggagagttctgttgc BGP (biliary glycoprotein) 19 - 71 aggaaacagagcttcctggacaaaccccgcccca 12 GCACACATGAtcagacaaagctctgg (continued on next page)


(Table A5 continued) Calmodulin 1 (phosphorylase 14 + -46 gatacggcgcaccatatatatatcgcggggcgcag 13 kinase, delta). actcgcgctccggCAGTGGTGctgg Carcinoembryonic antigen- 19 - 43 ccccaGCACACATGAtcagacaaagctctgg -16 related cell adhesion molecule 1 gccccagggaggaggctcagcacagagag (biliary glycoprotein). CBF1 interacting corepressor. 2 - 32 gaagctgtcacctttgatctcggaagttcccccTTG -27 CTGCTGgaaacgcagttccggtta Cellular retinoic acid binding 1 - 33 ggtataaaaGCTGTCCGcgcgggagcccagg -26 protein 2. ccagctttggggttgtccctggacttgtc Ceroid-lipofuscinosis, neuronal 2, 11 - 10 gaggggtagtggtggtggaatatagagctcatgtgat -49 late infantile. ccgtCACATGACagcagatccgc Ceroid-lipofuscinosis, neuronal 2, 11 - 10 gaggggtAGTGGTGGtggaatatagagctcat -49 late infantile. gtgatccgtcacatgacagcagatccgc CGI-51 protein. 22 + 116 gcggacccgccttctgccctcagcagcagacgctC 175 TGTCCCGcccgggcagctctgcgag Chaperonin containing TCP1, 21 - 4 gcgtgaggcGGCCCCACgctgctttcccagaa -55 subunit 8 (theta). ggctgtgcgtgctcctcgcttcctccgc CHK1 checkpoint homolog (S. 11 + 799 cgtgacgccctcaagttttggcgggaaaagcgctgc 858 pombe). atttggattcctgCAGTGGTGGGC Claudin 4. 7 + 101 agccatataactgctcaACCTGTCCccgagag 160 agagtgccctggcagctgtcggctggaa c-myc 8 + -162 ccggctgagtctcctCCCCACCTtccccaccctc -103 cccaccctccccataagcgcccctcc Colony-stimulating factor-1, 1 + 186 accacatgccccccagtcctctcttaaaaggCTGT 245 CSF1 gene. GCCGagggctggccagtgaggctcg COP9 constitutive 4 + 66 actctggagGACCACACTcgttttctttttggctgc 125 photomorphogenic homolog cagaggcccccgcatccaccgctg subunit 4 (Arabidopsis). cycB1 (cyclin B1) 5 + -158 ccgagagcgcaggcgcagaggCAGACCACg -99 tgagagcctggccaggccttccggcctagc Cyclin-dependent kinase 4. 12 - -553 gcgctttttttttgggagtcccttTGTTGCTGCagg -612 ctcataccatcctaactctgtaag Cystatin S (4). 20 - 62 gggctgggctgccaaagcaggataaatgcaCAC 3 CTGCCtgctggtctgggctccctgcct Cytochrome c oxidase subunit 19 + 50 tctctcagcttccggctggtagtagttccgcttCCTG 109 VIb , nuclear gene encoding TCCGActgtggtgtctttgctga mitochondrial protein. 
DEAD/H (Asp-Glu-Ala-Asp/His) 14 - 39 ctccccagcgcgcacgagtacgtgcgtggaggtcc -20 box polypeptide 24. cgccccggaactgcaGTTGCTGCTG DnaJ (Hsp40) homolog, 16 - -41 ctgtctccctcggcCTGTGCCGccgccgacgcc -100 subfamily A, member 2. gcttgtgggcccgactccgctctgtct Dynactin 4. 16 + -56 cagtggGCCACCTGgagcggaagtggagga 3 gcggccggaagtagccggaatctctgaaag EGF-containing fibulin-like 11 - -49 caggcgggcgggcggggggcgcttcctggggccg -108 extracellular matrix protein 2. cgcgtccagggagCTGTGCCGtccgc Electron-transfer-flavoprotein, 15 - -14 tgaggcggcgccagttggccgggcacgggGCTG -73 alpha polypeptide (glutaric II). CTGTaaggccgaggttgcggcggaagc Enolase 3, (beta, muscle) , 17 + -21 gggaccgagtggctcagggataaatgcgcagcctg 38 transcript variant 1. agagggggtgagctgACACTGTCCc EpoR (erythropoietin receptor) 19 - -69 gtggacTGTGCCGGGggcgggggacggagg -128 ggcaggagccctgggctccccgtggcgggg Eukaryotic translation initiation Y + 3 gagtgcgcgtcagcagtttattagagagctctgtagc 62 factor 1A, Y chromosome. cagcctcttctgcGCACCCACCT Eukaryotic translation initiation 11 + -17 tattcatgaATGCTGTCatttccgcttccgcctcctt 42 factor 3, subunit 5 ε, 47kDa. ctttctcgacaagatggccacac Eukaryotic translation initiation 16 + 486 gccttagagggctgtgatagccggtagcCACCCA 545 factor 3, subunit 8, 110kDa. CCaggcacaagggatagaatcacaga Farnesyl-diphosphate 8 + 11 gggcggggcgtcgccgtactaggcctgcccCCTG 70 farnesyltransferase 1. TCCGgccagcccctcgaagcacctac Ferritin heavy chain. 11 - 21 gggcctgacgccgacgcggctataagAGACCA -38 CAagcgacccgcagggccagacgttctt (continued on next page)


(Table A5 continued) Finkel-Biskis-Reilly murine 11 - -15 cgCAGCCCACggtctgtactgacgcgccctcgc -74 sarcoma virus (FBR-MuSV) (fox ttcttcctctttctcgactccatcttc derived); ribosomal protein S30. Follicle stimulating hormone-beta, 11 + -49 agTTGCACATGattttgtataaaaggtgaactga 10 FSHB gene. gatttcattcagtctacagctcttgc Fucosidase, alpha-L- 1, tissue. 1 - 10 agccgcccgcggGCACCTGCgcgttaagagt -49 gggccgcgtcgctgaggggtagcgatgcg FXYD domain containing ion 19 + -30 ccgtctgggtccccgcccgctcgcgctcccctgGC 29 transport regulator 5. CACACCctccgcctggacgcagcag FXYD domain containing ion 11 - 36 agcctCACCCACCgccctgcccccagccccgc -23 transport regulator 6. cactcccaggctcctcgggactcggcgg Galactosidase, alpha. X - -39 gtccgcccctgaggttaatcttaaaagcccaggttac -98 ccgcggaaatttATGCTGTCCGg Geranylgeranyl diphosphate 1 + 83 aaaggggCGGGGCAACaaagcagtaggga 142 synthase 1. ggcggcaacgacgcctgcgcagtgtgaccgg GFAP (glial fibrillary acidic 17 - 1546 aatgactcaccttggcacagacacaatgttcggggt 1487 protein) ggGCACAGTGcctgcttcccgccg GLOB-B (beta-globin) 11 - 53788 gccaggcccctgtcggggtcagtGCCCCACCc 53729 ccgccttctggttctgtgtaaccttcta Glycerol-3-phosphate 12 + -59 ctGCACCCACgtgtggtggagttaaatgctccta 0 dehydrogenase 1 (soluble). gccggcagaggagctagggagtgtgg H2A histone family, member Y. 5 - 49 tttcggCCTGTCCGcagtttttaaaaaacgtgtgt -10 gatgataaggaatcactgtctacat H3 (histone 3, ST519 gene) 6 - 252 ccagtacaagacttgtcCACAGTGGatatagca 193 gctaaggggttaacaaaatgacgtcag Heterogeneous nuclear X + -30 agttgctcacgtgaccgagatctCACATGACGt 29 ribonucleoprotein H2 (H'). aggcgctcacgtgattactcgcttacg Histidyl-tRNA synthetase. 5 + 13 gcgacccagcgactcgatagccggaagtcatccttg -46 ctgaggctGGGGCAACcaccgcag Histone deacetylase 1. 1 + 84 cggaggaaagtctgttactactacgacggTGAGC 143 ACCgtcctcgcggcgggggcggggcc Histone H4-A1. 6 + -82 tcggtccgcCAACTGTCgtataaaggcgctgcc -23 tcaggtcagaggcctcacaaagcgttg HO (heme oxygenase) 22 + -165 ccaagggtcatatgactgctcctctccaccCCACA -106 CTGgcccggggcgggctgggcgcgg Hydroxysteroid (17-beta) 5 + 7 aaatcggcaagtcactgaccctcgtcccgcccccgc 66 dehydrogenase 4. 
cattccccgcctcctCCTGTCCCG Hypothetical protein X - -88 cctgttctcacGCCCCACCctcagacctagccg -147 DKFZp564K142 similar to gagcaaagtttcacttatagaagggag implantation-associated protein. Hypothetical protein FLJ14800. 12 - 85 agatgggggacggggcgggcctctaggagtgaag 26 gaaaggcggggacccggATGTGTGTg Hypothetical protein FLJ20244. 19 - 36 tacgtttcctcattgactggccaaccactatcatgtgatt -23 cgcccaccCACTGTGCCaac Hypoxanthine X + 80 cctcttgctgcgcctccgcctcctcctctgctccgccac 139 phosphoribosyltransferase 1 cggcttcctcctCCTGAGCAG (Lesch-Nyhan syndrome). Hypoxia-inducible factor 1, alpha 14 + 70 gcctccgcccttgcccgccccctgacgctgcctcagc 129 subunit (basic HLH TF). tcctcagTGCACAGTGctgcctc IgA(germline) 14 - 4329 ccaccacagccagaccacaggccagACATGA 4270 CGtggagccgcgcggccgtgtctgctggg IgA(germline) 14 - 4329 ccaccacagcCAGACCACAggccagacatga 4270 cgtggagccgcgcggccgtgtctgctggg Ig-kappa (immunoglobulin kappa 2 - -159627 cctgctccattccctgctgatTTGCATGTtcccag -159686 light chain) agcacaaccccctgttctgaagact IK cytokine, down-regulator of 5 + 11 tgtgggttgattctgaggTGCACTGTGggaaag 70 HLA II. agcttgtcgctgcggtgttgctgttgg IK cytokine, down-regulator of 5 + 11 tgtgggttgattctgaggtgcactgtgggaaagagctt 70 HLA II. gtcgctgcggTGTTGCTGttgg Interleukin-6, B-cell stimulating -152 gcacaatcttaataaggtttccaatcagccccacccg -93 factor-2, IL6 or IFNB2 gene ctCTGGCCCCACCctcaccctcc (continued on next page)


(Table A5 continued) IL-6 (interleukin-6) 7 + -176 tgccatgctaaaggacgtcaCATTGCACAatctt -117 aataaggtttccaatcagccccaccc Interferon regulatory factor 3. 19 - 31 TGCATGTGacgctcccagcatgcctctggccgg -28 aaacccaaaaaagggcataggggcggg Interferon-induced protein with 10 + 3 caccattggctgctgtttagctcccttatataACACT 62 tetratricopeptide repeats 1. GTCttggggtttaaacgtaactg Interferon-induced protein with 10 + 3 caccattgGCTGCTGTttagctcccttatataaca 62 tetratricopeptide repeats 1. ctgtcttggggtttaaacgtaactg Interferon-stimulated gene 15K. 1 + -19 acgtgtgtgcctcaggcttaataatagggccggTG 40 CTGCTGcggaagccggcggctgaga Interleukin 10 receptor, beta. 21 + -49 agcggggtgggagcccggtcgcccggcccctcccc 10 accccgccCCGCCCATctccgctgg Interleukin-4, B-cell stimulating 5 + 256 tcggtttcagcaattttaaatctatatatagagatatcttt 315 factor-1, IL4 gene. gtcagCATTGCATcgttag Interleukin-6, B-cell stimulating 7 + -110 acgtcaCATTGCACAatcttaataaggtttccaa -51 factor-2, IL6 or IFNB2 gene. tcagccccacccgctctggccccacc Interleukin-6, B-cell stimulating 7 + -110 acgtcacattgcacaatcttaataaggtttccaatcag -51 factor-2, IL6 or IFNB2 gene. ccccacccgctctGGCCCCACC Ionized calcium binding adapter 9 + -43 gggtGTGTCGGCcgcccgcgccgttcctccgc 16 molecule 2. ccctcggtcccgcggccacacgcagcta Isocitrate dehydrogenase 3 15 + -49 gggcgggctccgagcgtggggtgggcgcttgcgca 10 (NAD+) alpha. ctgccgctgcggcTGTTGCTGCgga Isocitrate dehydrogenase 3 X - -99 gCCTGTCCCcgaaacttcgcaccccgtcgaac -158 (NAD+) gamma. tctcgcgagagcggtatctgcgtgtcgg Kangai 1 (suppression of 11 + -2 ggggCCTGCCGAGtccgcggcgttccccggct 57 tumorigenicity 6, prostate; CD82 gcagccgggagggggccgaggagtgact antigen. Kynureninase (L-kynurenine 2 + -11 gtggagtCTGAGCAGttctttgaatttctcacccta 48 hydrolase). agatctggcctgtacattttcaag Lectin, galactoside-binding, 19 - -219 tcctccgcgttctcctcctccgctgccCACCTGTC -278 soluble, 4 ctgggtcattcctgcagcctgccct Legumain. 
14 - 9 gtcaccgcgGCACAGTGGcccttaagcgagg -50 agcggcggcgcccgcagcaatcacagcag Leukocyte immunoglobulin-like 19 - 72 ttcctgtgtggTTGCACATaacaaaaccccatga 13 receptor, subfamily A (with TM caagaaggatccagcctccgagtgtc domain), member 2. Leupaxin. 11 - -2261 atacaggaaccaaaaaggtttatagaggctgccttA -2320 TGCTGTCtcatccagttcctttgc L-histidine decarboxylase. 15 - 49 ctaaggtcaaagaaagaaccctttaaataaagggc -10 CCACACTGgctgccagggagtgcgc Likely homolog of rat and mouse 17 + -25 gacccgccccgtccgcctccgtcacgtgatggggcg 34 retinoid-inducible serine ccgggtccagccTGTTGCTGatgc carboxypeptidase. Likely ortholog of mouse another 14 + -73 acttCCTGTCCCGcgctcgagctgcttccggcg -14 partner for ARF 1. ggagccggaagacgctgtgtgtggacg LPL (lipoprotein lipase) 8 + -294 acaagctgggacgcaATGTGTGTccctctatcc -235 ctacattgactttgcgggggtggggat Lymphotoxin beta receptor 12 + -72 gctcccagcgctcCCTGTCCCcgccccgcggc -13 (TNFR superfamily, member 3). cagctcgctccactcccacttcctgagc Lysosomal acid phosphatase 2. 11 - -59 tgcttgcaGGTGCCACCcagcgggttccagctt -118 gtttgctgcatagattacaacggtgat Macrophage migration inhibitory 22 + -44 cgggggcggggcctggcgccggcggtggcgtcac 15 factor (glycosylation-inhibiting aaaaggcgggacCACAGTGGTGtccg factor). Major histocompatibility complex, 6 - 11 cgccgcggtcccagttctaaagtccccacGCACC -48 class I, B. CACCcggactcagagtctcctcagac Major histocompatibility complex, 6 - 63 gcgtctccgcagtcccggttctaaagtccccagtCA 4 class I, C. CCCACCcggactcacattctcccc Matrin 3. 5 + 19591 gccgctgccgccgcttctcgccagcgccGTTGCT 19650 GCgggggattgtgggagtctccgcgt (continued on next page)


(Table A5 continued) MCM2 minichromosome 3 + -8 cgtgaaccacttttcgcgcgaaacctggtTGTTGC 51 maintenance deficient 2, mitotin TGtagtggcggagaggatcgtggta (S. cerevisiae). Melan-A. 9 + -93 aacccgtgactatCATGGGACtcaaaaccagg -34 aaaaaaaataagtcaaaacgattaagag Melanophilin. 2 + -33 agcggggcgCAGCCCACgtggttcgggcggg 26 aggcgccgggacgtggccagttgcccgcc Membrane cofactor protein 1 + 6 gccacgccCACCTGTCCtgcagcactggatgc 65 (CD46, trophoblast-lymphocyte tttgtgagttggggattgttgcgtccca cross-reactive antigen). Milk fat globule-EGF factor 8 15 - -21 tgatttattccggtcccagaggagaaggcgccagaa -80 protein. ccccgcggggtCTGAGCAGcccag Mitochondrial ribosomal protein 11 - 29 ctCACCTGCCctagcctaacgtgaggacccttg -30 L11. acctctggcccaagatggtggcgccca Mitochondrial ribosomal protein 8 - -208 ggAACTGTCCtgcttgggtgttagcgtttcccgcc -267 L13. gggccagtaaggctgagtgacccgg Mitochondrial ribosomal protein 16 - -263 gccgggggcctggccaggctgcgactgactgcgag -322 L28. cCCCCACCTcccgccaggctcgcga MMP11 matrix metallopeptidase 22 + -424 AacttgcttgtgggtaggacCACCTGTCagagg -365 11 (stromelysin 3) tcagaggtcaggccaccaaggagaccc MMP-2 (matrix metalloproteinase 16 + -1646 aaagtcagagcacaCACCCACCagacaagc -1587 2) ctgaacttgtctgaagcccactgagaccca MMS19-like (MET18 homolog, S. 10 - -145 attggctcgggaggcgcatgcgcggagagcccgtct -204 cerevisiae). cgagccaCCGCCCATatcccctcc MT-IF (metallothionein IF) 16 + -123 tgcagcgagctgcgctcaggggaccttgcgcccgg -64 cccttctgcTGCACACAgcccaccc Myo-inositol 1-phosphate 19 - 47 ggggaggggccgcagcgcttttctccggaggtcgcg -12 synthase A1. cgcccgagagcccgcGCTGTCCGc Myozenin 2. 4 + 77 ATGCTGTCCcaggttcaaggataaaaaccatc 136 aggcccaagtgccatccatagtccatct NADH dehydrogenase 11 + 13 cggtccgtgcccttggggctccgtgtcCTGCTGT 72 (ubiquinone) Fe-S protein 3, Ctttccgtccgctgcctagtctgcat 30kDa) e Q reductase). NADP-dependent retinol 2 + 4722 ggagtgactcacagagcaaggagagaacctgagg 4781 dehydrogenase/reductase , attcctCACACATGtagtactcagag transcript variant C. Neurotensin. 12 + -51 acatcatcctcctgcttatatatataggggaatggcca 8 GAGCACCTctcatagttcactc Nuclear antigen Sp100. 
2 + 15 tttctgagtcagtctgtcggccgacttcctgcttggggc 74 ctgggcAGCCACACtgcacgc Nuclear antigen Sp100. 2 + 15 tttctgagtcagtctgtcggccgacttcctgcttggggc 74 ctgggcagCCACACTGcacgc Nuclear cap binding protein 3 - -18 gtttccgcttccgccatcgcggacgcagcctcGTG -77 subunit 2, 20kDa. CCGGGagtcgccgcattgtggtccg Nuclear factor (erythroid-derived 2 - -383 ggggcccgccggcgtgtagccgattaccgaGTG -442 2)-like 2. CCGGGgagcccggaggagccgccgacg Nuclear factor of kappa light 19 + -40 GTGGTGGGGaatttcccgcagggcggaagct 19 polypeptide gene enhancer in B- ccagaactcccggcaaagcccagctacag cells inhibitor, beta. Nuclear receptor binding protein. 2 + -29 gtgcggcaCCGCCCATcgttggtccgggcgcc 30 gcgagggcggggcccgcagttcggttgc p21 Waf1 (cip1: Cdk-inhibiting 6 + -2237 tcaggaacatgtcccaACATGTTGagctctggc -2178 protein 1) atagaagaggctggtggctattttgtc Papillary renal cell carcinoma 1 + 12 tcctgtattggtgaggcaaggaggaggcggagtgac 71 (translocation-associated). tcggcggccattagcTGTGTGTAg PBGD (porphobilinogen 11 + 2995 gctgcaGGCCCCACCcttcctgtggccaggctg 3054 deaminase) atgggccttatctctttacccacctgg PDGFB (platelet-derived growth 22 - 113 ccgggccagaagaggaaaggctgtctcCACCC 54 factor chain B) ACCTctcgcactctcccttctcctttat PEF protein w/a long N-terminal 1 - 46 gggcgggtttgagagcggaaaGCCCCACCcct -13 hydrophobic domain (peflin). tgcctgagtgtgacgtcagaatcaccat (continued on next page)


(Table A5 continued) Pepsinogen A. 11 + -50 ataaggcgggacccaacttgtatataaggggcagct 9 caTGCTGCTGctctgcaccttcct Peptidylprolyl isomerase B 15 - -118 ccaccctcttccggcctcaGCTGTCCGggctgct -177 (cyclophilin B). ttcgcctccgcctgtggatgctgcgc Peroxiredoxin 5. 11 + 38 cgccgCTGTGCCGctagcggtgccccgcctgc 97 tgcggtggcaccagccaggaggcggagt Peroxisomal D3,D2-enoyl-CoA 6 - 26 taggccccgcccccgagacccGGCCCCACCc -33 isomerase. ccgagcccccgcagccctagagccgccca PES2 (prostaglandin- 1 - 81 acgaaaaggcggaaagaaacagtcatttcgtCAC 22 endoperoxide synthase 2). ATGGGcttggttttcagtcttataaa PET112-like (yeast). 4 - 40 agcctcgcgGCATAGTGaggctgagtcacctg -19 accaagaccctggagttacaatggcggc Phosphoglycerate mutase 1 10 + -129 gCACACCTGgagggcggagagcggtccaga -70 (brain). gcgagtggaaagatttgggcgagaacttgc Phosphoribosyl pyrophosphate X + 37 gaCCCCACCTccgccgctttgggtaatttagagc 96 synthetase 1. cgcgcgccgggcgggaatgtaagatg Polymerase (RNA) II (DNA 8 + -27 ggagtaagagagagagaaagagagacaGCAC 32 directed) polypeptide K, 7.0kDa. TGTGCgtcatgtaccagcgccggaagttg prl (prolactin) 6 - -332 ttgatgtttgaaattatgggggtaatctcaATGACG -391 GAaatagatgaccaggaaaaggga Protamine P1. 16 - 45 ccaacggccccctggcatctataacaggccgcaga -14 gctggcccctgactcaCAGCCCACa Proteasome (prosome, 14 + -22 acttccgcctctggcgggccgCAGTGGTGGag 37 macropain) 26S subunit, ATPase, gaacttccggcagcggcagctcaagtgg 1. Proteasome (prosome, 7 - 24 ggggcggggaaatgaaaggaaatcggccacagtg -35 macropain) subunit, α type, 2. cGCATGTGTGcggctgtgctttggct Protective protein for beta- 20 + 334 agactgtcacgtggcgccggagttcacgtgactcgt 393 galactosidase (galactosialidosis). ACACATGACttccagtccccgggc Protein kinase, AMP-activated, 12 + -15 acggagtcgccggaagcggaagtcgctgaggggt 44 beta 1 non-catalytic subunit. ggtgaagcggttgggaaAGTGTCGGt Protein tyrosine phosphatase, 12 + 4747 tCCTGTCCCcgccctgccggctgccccaggcc 4806 non-receptor type 6 , variant 1. agtggagtggcagccccagaactgggac Protein x 0001. 
3 - 34 cctcggagtcttcggcaccgcCCTGTCCCagcc -25 tcctttgcgggtaaacagacatggccg Regulatory factor X-associated 19 + 739 aagttgctttCTGTCCCGgcagaggaagccag 798 ankyrin-containing protein. atcgctgagggtccggtctccagtttgc Replication protein A2, 32kDa. 1 - -100 TGAGCACCgattggctgaagcgagcaccccgg -159 gagctgactggctccgccattcgcggga Reticulon 4. 2 - 38132 ATTGCACTGcgtcagactgttccacacccaga 38073 agacgtcaggtgacttcagtcctgctgc Ribonuclease, RNase A family, 4. 14 + 333 accctCCACACCTcaccacgcccccatctccgt 392 ccgtgtacacacactcacacaaggacg Ribosomal protein S25. 11 - 32 acggtggtttttgggcccgtttCTGAGCAGcgctt -27 cctttttgtccgacatcttgacgag Ribosomal protein S3A. 4 + -17 tacaaaatttagccttttggtactCTGAGCAGcac 42 catggcggttgttaagaacaagtgc Ribosomal protein, large, P0 12 - -339 GTGTCGGCgtgaccaggcctgagctccctgtct -398 ctcctcagtgacatcgtctttaaaccc S100A2 1 - 1877 ggttggatttcagcaggatagagggcatggGCAT 1818 GTGTGggcacgttctgaacagagggg SA-ACT (skeletal alpha-actin) 1 - 278 cccctccggcGGGCAACTGggtcgggtcagg 219 aggggcaaacccgctagggagacactcca Scavenger receptor class B, 12 - -77 gcataaaaccactggcCACCTGCCGggctgct -136 member 1. cctgcgtgcgctgccgtcccggatccac Scavenger receptor class B, 12 - -77 gcataaaaccactgGCCACCTGccgggctgct -136 member 1. cctgcgtgcgctgccgtcccggatccac Selenium binding protein 1. 1 - 66 GCCCCACCcccagctggttgtataaattccctcc 7 cttcgctccttccccggaacagcggc Selenoprotein P, plasma, 1. 5 - 19 tgaggtaaacaacaggactataaatatcagagtgT -40 GCTGCTGTggctttgtggagctgcc (continued on next page)


(Table A5 continued) Serine proteinase inhibitor, clade 14 + 19 aatccagccagccctctgccctttctgagcccgaggg 78 A (α1 antiproteinase, antitrypsin), acTGCCACCTccactgtgtgcac member 5. Serine palmitoyltransferase, long 9 - 27 CTGAGCAGcgacgcgcacttttgggacgcgctt -32 chain base subunit 1. gtgacccgccttccggaaggaagcggc Serine proteinase inhibitor. 14 - -8 aactggGCACTGTGCCcagggcatgcactgc -67 ctccacgcagcaaccctcagagtcctgag Shwachman-Bodian-Diamond 7 - 47 cgctcacttttcccctcccggcttctgctCCACCTG -12 syndrome. Acgcctgcgcagtaagtaagcctg Shwachman-Bodian-Diamond 7 - 47 cgctcacttttcccctcccggcttctgcTCCACCT -12 syndrome. Gacgcctgcgcagtaagtaagcctg Sialyltransferase 4C (beta- 11 + 49837 gtgtctaggcaggagagtttgtgaagctgaccggAC 49896 galactoside alpha-2,3- ACCTGTggctcttatttcctaggt sialytransferase). Sirtuin silent mating type 19 - -101 gcagtcggtgacaggacagagcagtcggtgacgg -160 information regulation 2 homolog gaCACAGTGGTtggtgacgggacaga 2 (S. cerevisiae). Small nuclear ribonucleoprotein 1 + -58 agCCGCCCATGCcgccgcgtgaccttcacact 1 E. tccgcttccggttctttattccggaagt Solute carrier family 26, member 7 - 39 ccactgtcttacatagcctatatatagacatccctatgc -20 3. atatggaTGCACACAgagcct Sorbitol dehydrogenase. 15 + 30 tcccaGGCCCCACCttccatccagtgccctgga 89 ccctcggctgggtagcgccaccagagc SPB (lung surfactant protein B) 2 - 148 gtgttcctgctgggaaaaggtgggatcaAGCACC 89 TGgagggctcttcagagcaaagacaa Speckle-type POZ protein. 17 - -118 ctccgagtgTGTGTGTAtttgtgtatcggcggtcc -177 cgcaggtcccggatgttgcggacag Splicing factor, arginine/serine- 20 + -17 ccGCCCCACCggccgcccatatagcgaaagc 42 rich 6. ctggcgcgcgcgcgcgccattgtgtggct Splicing factor, arginine/serine- 20 + -17 ccgccccaccggCCGCCCATatagcgaaagc 42 rich 6. ctggcgcgcgcgcgcgccattgtgtggct SPTF-associated factor 65 2 - 76 cctgTTGCATAGcttgattgagaagaagcgcttc 17 gamma (STAF65(gamma)). cggcggtcttagatcactaatcaaca Stromelysin, MMP3 or STMY1 11 - 47 caaacaaACACTGTCactctttaaaagctgcgc -12 gene. tcccgaggttggacctacaaggaggca Suppressor of Ty 4 homolog 1 (S. 17 - 48 aagaggaagtctaagcgccggaAGTGGTGG -11 cerevisiae). 
Gcattctgggtaacgagctatttacttcct Surfactant protein 5K, SFTPC or 8 + 79 cccctctccctacggacacatataagaccctggtCA 138 SFTP2 gene. CACCTGggagaggaggagaggaga T-cell leukemia/lymphoma 1A. 14 - -39 ccggatataaagggtcGGCCCCACatcccagg -98 gaccagcgagcggccttgagaggctctg TCR-delta (T-cell receptor delta); 14 + 38093 agggaggtgagtgagcaaTGCATGTGgtttcc 38152 Gene: G000395. aaccgttaatgctagagttatcactttc TERF1 (TRF1)-interacting 14 - 7 tggctccctcttaccgcccttttcCGGGGCAAggg -52 nuclear factor 2. aagctagtagcggagccggaagtga Tissue specific transplantation 8 - 35 actccatttcccagagtgccccgccccagcctcccg -24 antigen P35B. gccccgccCCCCACCTGgctccgc Titin immunoglobulin domain 5 + -29 ctgataaaagtCAGTGGTGctataatactaagc 30 protein (myotilin). acagagccactagattagtctgtgagg Troponin I. 1 - 33 aGTTGCTGCTGgacacagttttcatagcctccc -26 ctcggctctgcccctcacagtctgcag Tumor necrosis factor-beta, LTA 6 + -80 acgctgccactgccgcttcctctataaagggACCT -21 or TNFB gene. GAGCgtccgggcccaggggctccgc Unr-interacting protein. 12 + 1 cgtcatttccggcgggtgctctgcgtcatttacgtcgtc 60 acttCCTGCCGAtgccggtgt uPA (urokinase-type 10 + -3667 gccaatctgcagagtgaggcacaggctggacaggt -3608 plasminogen activator) ggatCACCTGAGgtcaggggttcaa Uridine kinase-like 1. 20 - 50 ttgtagtccgcggtgggcaagagagggacccgcca -9 gctcccggcgGCCCATGCgcgccgc Vaccinia related kinase 1. 14 + -1 tactgcagggtgcgaaggggccggcgccgCTGC 58 CGAGttacgagtcggcgaaagcggcgg (continued on next page)


(Table A5 continued)
Vacuolar protein sorting 35 (yeast). | 16 | - | -14 | ggcttggaggggccgcagcgtCACATGACcgcgggaggctacgcgcggggcgggtgctgc | -73
Valosin-containing protein. | 9 | - | -153 | cgtcgctgccgctgccgctgccactgccacTGCCACCTcgcggatcaggagccagcgttg | -212
V-jun sarcoma virus 17 oncogene homolog (avian). | 1 | - | -48 | agctcgggctggataagggctcagagTTGCACTGagtgtggctgaagcagcgaggcggga | -107
Williams Beuren syndrome chromosome region 22. | 7 | + | -2 | atgacataaaaaccgggtgccggcaggcgccagtcgcaggtgTGCTGCTGaggcgtgaga | 57
Wilm's tumor, WT1 gene. | 11 | - | 262 | ggccaagaaggggaggtggggggagggttgtGCCACACCggccagctgagagcgcgtgtt | 203

Table A6. Scratch Short Target Genes

Gene Description | Ch | Strand | Start | 60-nt Sequence with Binding Site | End

2-hydroxyphytanoyl-CoA lyase. 3 - -5 gcctcttccttcccgttgtttaaggcagttggttgccctC -64 CTGTCCGtcagaggtgcagt 5-aminoimidazole-4-carboxamide 2 + 67 tccgccccctcgcttcctgagccgccacatcccggc 126 ribonucleotide formyltransferase/ agccctcctacctgcGCACGTGGT IMP cyclohydrolase. Adaptor-related protein complex 19 + 20 gaccggacataacggtccccgcgcggctccccga 79 1, mu 1 subunit. accggaagtggaggtgaGCTGTCGCg alcohol dehydrogenase 1B (class 4 - 59 agaTCACGTGTggaattggaattggatgttaca 0 I), beta polypeptide caagcaaacaaaataaatatctgtgca alcohol dehydrogenase 1C (class 4 - 84 tacgacgggtgtggcttaaaaacctagaTCACGT 25 I), gamma polypeptide GTgtagttggaattgggtgttatatg Aldehyde dehydrogenase 6 14 - -24 gagggcgaggcctggCAGCTGTAgtgcttctg -83 family, member A1 , nuclear gene ggcagtagaggcgcggggtgcggagcta encoding mitochondrial protein. apoAI (apolipoprotein AI) 11 - 127 ccttgaactcttaagttcCACATTGCcaggacca 68 gtgagcagcaacagggccggggctgg Apolipoprotein A. 1 - 50 gggGGTGGGTAaacagacaggtatatagccc -9 cttcctctccagccagggcaggcacagac Aquaporin 1 (channel-forming 7 + -49 tccccccgccccccggccctataaataggcccagc 10 integral protein, 28kDa). cCAGGCTGTggctcagctctcagag ATPase, H+ transporting, 16 - 27 gtgaggcgcttgcgcgtCACCTGACGcacttga -32 lysosomal 38kDa, V0 subunit d cagcccgctgaggacgcagcgtcagct isoform 1. ATPase, H+ transporting, X + 21 tgcGGCTGTCGaggccgctgaggcagtggag 80 lysosomal interacting protein 1. gctgaggctatgatggcggccatggcgac Calreticulin. 19 + -30 gcgGGTGGGTAtaaaagtgcaaggcgggcg 29 gcggcgtccgtccgtactgcagagccgctg Carbonic anhydrase III muscle, 8 + -24 ggCCATGCAgtgtgcgggggagctacataaaa 35 CA3 gene. gcgcgggctcgcgccgactctgcaccac Casein kinase 2, beta 6 + -91 ccctaagtcagcctcttcacccagtgagcacaAAA -32 polypeptide. CTGTAttgcccagactcccgggccc CD79B antigen (immunoglobulin- 17 - 8 gggACAGGCTGcagccggtgcagttacacgttt -51 associated beta). tcctccaaggagcctcggacgttgtca CDC28 protein kinase regulatory 9 + -52 atcggcggcgGGCTGTCGgattcaaatccgga 7 subunit 2. 
tcgttgggactgcggtcgttagtctccg Chromosome 14 open reading 14 + 22 aacttcggaGCTGTCGCccgggttaccgggag 81 frame 100. gcggagccgccgagctcgctgtggcccg Chymotrypsinogen B2. 16 - 22 cagggctataacggggaccggcagacaggcgtcc -37 tacacccctgccagcgGCACCATGgc Claudin 4. 7 + 101 agccatataactgcTCAACCTGTCCccgaga 160 gagagtgccctggcagctgtcggctggaa C-myc binding protein. 1 - 39 gccggtggtTTGCAAGTgcgcaggcgcgcgcc -20 cccttccgcgcatgagcagtgctgctcc CS-B (placental lactogen, 17 - 26 tgtacacagaaaCAGGTGGGgtcaagcaggg -33 chorionic somatomamotropin,) agagagaactggccagggtataaaaaggg (continued on next page)


(Table A6 continued) CSE1 chromosome segregation 20 + -30 cagcgggccgctcgctcatgcgctctggcctcagG 29 1-like (yeast). CTCGCTGTCgcgccattttgccggg Cyclin D1, CCND1 or PRAD1 or 11 + 3 acggactacaggggagttttgttgaagttgcaaagtc 62 BCL1 gene. ctggagcctccagagGGCTGTCG Cystatin S (4). 20 - 62 gggctgggctgccaaagcaggataaatgcaCAC 3 CTGCCTgctggtctgggctccctgcct Cytochrome c oxidase subunit 19 + 50 tctctcagcttccggctggtagtagttccgcttCCTG 109 VIb , nuclear gene encoding TCCGActgtggtgtctttgctga mitochondrial protein. DEAD/H (Asp-Glu-Ala-Asp/His) 2 + 6 ctgcgcacgtgcggccggaagggaagtaacgtca 65 box polypeptide 18 (Myc- gcctgagaactgagtAGCTGTACtgt regulated). E-1 enzyme. 4 + -12 caagggccgccccttttcctgccCACGTGGTCt 47 cgggctcctgccccgtcctgctcacga Eukaryotic translation initiation 14 + -1 cctcgcccgccccggaagtGCAAACTGTgtgg 58 factor 2B, subunit 2 beta, 39kDa. tctggcaggtgtggattccgccggtgaa Excision repair cross- 19 - -367 tggaggtcaaaggggcgtggcgttacagagcctcta -426 complementing rodent repair gcgctgggtgttggggACCTGACG deficiency, implementation group 1. Farnesyl-diphosphate 8 + 11 gggcggggcgtcgccgtactaggcctgcccCCT 70 farnesyltransferase 1. GTCCGgccagcccctcgaagcacctac gamma-interferon, IFNG gene. 12 - 51 cctcaggagacttcaattaggtataaataccagcag -8 ccagaggaggtgcaGCACATTGTt H2A histone family, member Y. 5 - 698 tttcggCCTGTCCGcagtttttaaaaaacgtgtgt 639 gatgataaggaatcactgtctacat Heat shock 10kDa protein 1 2 + 335 gcgcctgcgcgctgggtgatttttTCACGTGTcg 394 (chaperonin 10). ccagggccggactgcgagtctctttg Heat shock 70kDa protein 9B 5 - 12 aacccccgcccctcacccCACGTGGTtggagg -47 (mortalin-2). tttccagaagcgctgccgccaccgcatc Hexosaminidase A (alpha 15 - -149 ctCACCTGACcagggtctcacgtggccagccc 92 polypeptide). cctccgagaggggagaccagcgggccat Hsp70-interacting protein. 19 - -207 accacggcacgaagttgGGTGGGTACgcctg -266 gcacttccgggtgaaaccgtgcgtatttc IK cytokine, down-regulator of 5 + 11 tgtgggttgattctgaggtgcactgtgggaaagagctt 70 HLA II. 
gtcgctgcgGTGTTGCTgttgg IL-6 (interleukin-6) 7 + -176 tgccatgctaaaggacgtCACATTGCAcaatct -117 taataaggtttccaatcagccccaccc Insulin-like growth factor II, IGF2 11 - -12267 tgggaggagtcggctcacacataaaagctgaggca -12326 gene. ctgaccagcctGCAAACTGgacatt Interleukin-6, B-cell stimulating 7 + -110 acgtCACATTGCAcaatcttaataaggtttccaa -51 factor-2, IL6 or IFNB2 gene. tcagccccacccgctctggccccacc Isocitrate dehydrogenase 1 2 - 38 aggggacaaagccgggaagaggaaaagctcgg -21 (NADP+), soluble. acctaccctGTGGTCCCgggtttctgca MAD2 mitotic arrest deficient-like 4 - 18 aaaggctcgtcgcgcctgcgcaaaggACCTGA -41 1 (yeast). CGAcgtgctgcgtcgttacttttgaaac MDR1 (multidrug resistance gene 7 - -112199 cctggaaatTCAACCTGTttcgcagtttctcgag -112258 1) gaatcagcattcagtcaatccgggcc Melanophilin. 2 + -33 agcggggcgcagccCACGTGGTtcgggcgg 26 gaggcgccgggacgtggccagttgcccgcc Membrane cofactor protein 1 + 6 gccacgccCCACCTGTCCtgcagcactggat 65 (CD46, trophoblast-lymphocyte gctttgtgagttggggattgttgcgtccca cross-reactive antigen). MMP11 matrix metallopeptidase 22 + -424 AacttgcttgtgggtaggacCACCTGTCagagg -365 11 (stromelysin 3) tcagaggtcaggccaccaaggagaccc NADH dehydrogenase 9 - 5 ctcgtggttgggcagggcaatacgcctgCGTGTT -54 (ubiquinone) 1 alpha GCcggatgcgcatgcgcaggcgccgt subcomplex, 8, 19kDa. N-ethylmaleimide-sensitive factor 19 - -135 ggttggccagcgctagtCACGTGGTtcctggcc -194 attachment protein, alpha. gcaagcgacgcgcgccggtgcgacgtc N-Myelocytomatosis virus 29 2 + -60 gggtgtgtcagatttttcagttaataatatcccccgagc -1 oncogene, MYCN or NMYC. ttcaaagcgCAGGCTGTgaca (continued on next page) 135

(Table A6 continued)
PDGFB (platelet-derived growth factor chain B) | 22 | - | 108 | cagaagaggaaAGGCTGTCtccacccacctctcgcactctcccttctcctttataaaggc | 49
Peroxiredoxin 6. | 1 | + | -60 | ccccgcccctcgcgccttcgtttattcctccgcgcgctgggACAGGCTGcttcttcgcca | -1
PET112-like (yeast). | 4 | - | 40 | agcctcgcggcatagtgaggctgagtCACCTGACcaagaccctggagttacaatggcggc | -19
Protein C. | 2 | + | -51 | gtgctgagggccaagcaaatatttgtggttatggattaactcgaactcCAGGCTGTCatg | 8
Protein kinase, cAMP-dependent, regulatory, type I, alpha (tissue specific extinguisher 1). | 17 | + | -4 | cgctgattggctgcggccaggccgtttccggtggaGCTGTCGCctagccgctatcgcaga | 55
Regulatory factor X, 4 (influences HLA class II expression). | 12 | + | -28 | ctcctcccccacccacagagtgagttcCAGGTGGGaaggcagttatgacagttgagaagt | 31
Reticulocalbin 1, EF-hand calcium binding domain. | 11 | + | 151 | cctccgcgccggccccaaccCTGTCGCTgccgccgcgctccgagtccccattcccgagct | 210
Retinol dehydrogenase 11 (all-trans and 9-cis). | 14 | - | 15 | tccccggGCTGTCGCaatttgaattggggcgtgtctagaaagagaagccatagtcggcga | -44
Ribosomal protein L24. | 3 | - | 12 | cccatgattctctctttcttttcgccatcttttgtctttccgtggaGCTGTCGCcatgaa | -47
Seryl-tRNA synthetase. | 1 | + | -35 | gcgcctgcgcaggaagggcgggtcagcgcgccggcgcagtgcggcggtcACAGGCTGagt | 24
Shwachman-Bodian-Diamond syndrome. | 7 | - | 47 | cgctcacttttcccctcccggcttctgctcCACCTGACGcctgcgcagtaagtaagcctg | -12
SKI-interacting protein. | 14 | - | 51 | tccttcctttccgctcctcatcatctggaaagacccgcccagcggtGCTGTCGCTcgcgc | -8
T-cell- or lymphocyte-specific tyrosine kinase 1. | 1 | + | 22833 | atccCAGGTGGGagggtggggctagggctcaggggccgtgtgtgaatttacttgtagcct | 22892
Tryptophan 2,3-dioxygenase. | 4 | + | -48 | gatgcagggtaagcaggctacataaaaggCAGCTGTAgaacatctgggaaggtcaatgat | 11
Tryptophan rich basic protein. | 21 | + | 36 | gctgttgttGTGGTCCCcatggagctgccgtagcggacccagcacagccaggagcgtccg | 95
tubulin, beta 2A | 6 | - | -31 | gtccacgccgcgcaccgctccgagggccagcgccacccgctccgcagccgGCACCATGCg | -90
Uridine phosphorylase. | 7 | + | 443 | cccccgggcagggcggggccgctcgcagactccatatgagattcacctcGCAGGTGGttc | 502

Table A7. Gfi-1 Target Genes

Gene Description | Ch | Strand | Start | 60-nt Sequence with Binding Site | End

[X-linked gene downstream of X - 72 ccagcgcgcgcgcccggggcggCGGCGCGC 13 G6PD gene], GDX gene. Ggcggggggtggttggggtgcgcgccggcc Actin related protein 2/3 complex, 7 + -25 ccccgccccccgcccacgaggaagtggctgctgct 34 subunit 1B, 41kDa. ccGGCGCGGAGcccagagccggttc Actin related protein 2/3 complex, 2 + 139 tgcgtgggccagcgcgttcgtcttacccttccctccaC 198 subunit 2, 34kDa , transcript 2. CCCGGCCCctctccccttctcc Actin-related protein 10 homolog. 14 + -10 ggcccccgccgtcgccgacgcccgagcgccggag 49 CCCCGGCCCcgccccgcgagcgccg Adenosine deaminase, ADA 20 - 15 gcgggaggcggggcccggcccgttaagaagagc -44 gene. gtggcCGGCCGCGGccaccgctggcc Alcohol dehydrogenase 5 (class 4 - -19 cctccgtcGCTGCGCGGcccaccccggatgtc -78 III), chi polypeptide. agccccccgcgccgaccagaatccgtga Aldehyde dehydrogenase 7 5 - -144 tttcagcagctctcagggccttgggctcatcccgaGT -203 family, member A1. CCCGGGCtcagtatgtggcgcct Apolipoprotein B. 2 - 48 ctcttgcagcctgggcttcctataaatggggtgcgggc -11 gccGGCCGCGCAttcccaccgg Apoptosis antagonizing 17 + 62 gccccgccccctcccgcgcgcctcccggaagtggc 121 transcription factor. cggtccagagctGTGGGGTGGcctc Aquaporin 1 (channel-forming 7 + -49 tccccccgccCCCCGGCCCTataaataggcc 10 integral protein, 28kDa). cagcccaggctgtggctcagctctcagag (continued on next page)


(Table A7 continued) ATP synthase, H+ transporting, 21 - -702 gcattgcgatggcgggtaGGCGTGTGGgggc -761 mitochondrial F0 complex, ggagccagggccggaagtagagcggaggt subunit F6. ATPase, H+ transporting, 7 + -32 gggtgggggttgaggccgacggggcgccgtacgG 27 lysosomal 14kDa, V1 subunit F. CGGAGGCGgggtttcagtggcttctg ATPase, H+ transporting, 14 - -22 gaaaatgggtgTCCCTGCTGcctcttagcaac -81 lysosomal 34kDa, V1 subunit D. aagaggggtcaagtgacacaaccagctg bbc3 (bcl-2 binding component 3) 19 - -1408 gggggggcgCGGCGCGCGcctgcaagtcct -1467 gacttgtccgcggcgggcgggcggggccgt B-cell leukemia/lymphoma 2 18 - -635 ctcttctttctctgggggccGTGGGGTGGgagct -694 proto-oncogene, BCL2 gene. ggggcgagaggtgccgttggcccccg B-cell receptor-associated protein X - -318 ccggtttcCGGCCGCGGtatgaggggcgggg -377 31 (BAP31). ccggggctgctgtgggagagttctgttgc B-cell receptor-associated protein 7 + -49 cccctcccccgcccagccgcggcgtctgacgtcccg 10 BAP29 cgcgtcggCGGCCGCGGagcagcg B-cell receptor-associated protein 7 + -49 cccctcccccgcccagccgcggcgtcTGACGT 10 BAP29. CCCGcgcgtcggcggccgcggagcagcg BCL2/adenovirus E1B 19kDa 8 + -17 gcctgggcgggccctgacgtcaggggcaggggag 42 interacting protein 3-like. GGACGGCGCaggcgcagaaaagggg Beta-1,4 mannosyltransferase. 16 + -15 gCGGCTGCGCacagcgcgaggaagcgcggt 44 cacgtgactgctgcgggccagccaagatgg Beta-2-microglobulin. 15 + -19 attggctgggcacgcgtttaatataagtGGAGGC 40 GTCGcgctggcgggcattcctgaagc beta-actin. 7 - 50 tcgagCGGCCGCGGcggcgccctataaaacc -9 cagcggcgcgacgcgccaccaccgccgag BRCA2 and CDKN1A interacting 10 + -36 tgggctggtccggaaggtcaggcaaggggaagctg 23 protein. cgcagGCGCAGTGTGagcggcaaca Casein kinase 2, beta 6 + -91 ccctaagtcagcctcttcacccagtgagcacaaaac -32 polypeptide. tgtattgcccagacTCCCGGGCCc Catenin (cadherin-associated 5 + -47 gggGGCCGCGGGcggggggcgggccgggg 12 protein), alpha 1, 102kDa. 
gcgggggcgtggggcggcccatttcctcctc CD11c (p150.95 leukocyte 16 + -132 cggttggggggtgggGGCGTGTGGgaggcc -73 integrin alpha subunit) gagcctgtcctcggatcagttgcgtactct CD43 (leukosialin) 16 + 192 ggggcaggatggaggggtgGGTGGGGTGG 251 gtggagccagggcccacttcctttccccttg Cellular-Rat-derived Harvey 11 - -68 cccccgcccccgccccgccccggcctcggCCCC -127 murine sarcoma virus oncogene, GGCCCTggccccgggggcagtcgcgcc HRAS1 or HRAS gene. Cellular retinoic acid binding 1 - 33 ggtataaaagctGTCCGCGCGGgagcccag -26 protein 2 gccagctttggggttgtccctggacttgtc CGI-81 protein. 16 + 8 agggggaggtgccgaGGCTGCGCGccggct 67 gctcctccccacccccagcctttgccctga Chaperonin containing TCP1, 12 + -18 ggccccgccccccgaagtagGGCGTGTGGc 41 subunit 2 (beta). gtcacttccggcttccttcagtccgctggt Chaperonin containing TCP1, 7 + 32 ggCGGCGCGCGggcacgctgggggccggcc 91 subunit 6A (zeta 1). agacgggccgacttttccagaagacccgga Chemokine-like factor super 3 - -66 gcccaggaagtGACGGCCGCctcccggctac -125 family 6. cggggacttctggagtccgagaagtcaac Chloride channel, nucleotide- 11 - 15 gactaagttgttcttccggggtgactgcctcttccaggg -44 sensitive, 1A. CGGGCGGTGTGGtgcacgcat Chromosome 14 open reading 14 - 27 ccttcgtcgcggcctctagtgcactttcggctccttccc -32 frame 94. ctTCCCGGGCCtttcagcttg Chromosome 20 open reading 20 + -49 gcgcgcccgcgaccgacgtgcgcaggcgcccacg 10 frame 29. GGCCGCGCAGccgccattgctctcct Chromosome 20 open reading 20 - 18 gatttctcagcgcgcaggcGCAGTGGCGcgg -41 frame 31. attcccggaagaacccgcagcagctccca chromosome 5 open reading 5 - -219690 cccctcCCCCCCCCAatctgtctttctagcatgtt -219749 frame 13. gccctttttcaaccacatttgtgtt Chymotrypsin C (caldecrin). 1 + -48 cactcacctgCTCCTGCCTAtaagtgtgcccca 11 gcccatcccgatggtcagccagtcctg (continued on next page)


(Table A7 continued) Claudin 5 (transmembrane 22 - -882 gaatccgcccccgaaccttcaaagagggtaccccc -941 protein deleted in velocardiofacial cggcaggaGCTGGCAGAcccaggag syndrome). Cold inducible RNA binding 19 + 15 cggggcccggcggaagcgtatataaggccgggctc 74 protein. ggggacGCCCCCCCctcactcgcgc Colony-stimulating factor-1, 1 + 186 accacatgCCCCCCAGTcctctcttaaaaggct 245 CSF1 gene. gtgccgagggctggccagtgaggctcg Coronin, actin binding protein, 16 + -48 ccaaggggcgcggccaccccggaggcgggcGC 11 1A. ACGGCTGCttctcattcattgtcttgac Creatine kinase, mitochondrial 1 15 + 936 cctccctctcctccgcccgcccgcctgccactagctc 995 (ubiquitous). attgcgccTCTCCTGCagtctga Creatine kinase, muscle. 19 - 47 cacagcctaggtccccctatataaggcCACGGC -12 TGCtggcccttcctttgggtcagtgtc CREBBP/EP300 inhibitory 15 + 20 GGCCGCGCAGcatctgtcttgctggaagcttttt 79 protein 1. cctagaggttgagcggtttgcacaat DEAD (Asp-Glu-Ala-Asp) box 19 + -19 cgtgacgtcgtCGGCGCGCGccggaagcgcg 40 polypeptide 49. gatcacacgggcccctacaaggggcccct Death-associated protein 6. 6 - -45 tgacttccggtccgtagtggggcctgcGGTGGGA -104 GTGGgaaggaaggcggagggaaccat Delta-like 1 homolog 14 + -55 gtacgaaaagggCGGCGCGCGcggcggcg 4 (Drosophila). gcggcagctccccggcagcggcggtggagag DiGeorge syndrome critical 22 - 6 gggaggcggggcggggcggcgcGGCGCGG -53 region gene 2. AGccgagcaggacggccgccatcttgcgcgc DiGeorge syndrome critical 22 - 6 gggaggcggggcggggcggcgcggcgcggagc -53 region gene 2. cgagcagGACGGCCGCcatcttgcgcgc Diptheria toxin resistance protein 1 + -6 aacctccccggtagtcccacgtgtagcggagaaac 53 required for diphthamide 2 aGTAGTTAGGatggctgaaggggat DnaJ (Hsp40) homolog, 9 + 15 gacgcgcccgggttcggctacaaaagaggACGG 74 subfamily A, member 1. CTGCGgcgcgccgggcggaactttcca Dynactin 3 (p22). 9 - 44 ctcggattggctagccttcttatttggacTGACGTC -15 CCGccctctggctgtttcctggcc Endoglin (Osler-Rendu-Weber 9 - -62 tccctgggccggccgggctggatgagccaggagcT -121 syndrome 1). CCCTGCTGccggtcataccacagcc Endomucin. 
4 - 28 aaataagtaggaatgGGCAGTGGCtattcaca -31 ttcactacaccttttccatttgctaata Estrogen-responsive RNA, TFF1 21 - 48 taagcaaacagagcctgccctataaaatccggggC -11 or BCEI or PS2 gene. TCGGGCGGcctctcatccctgactc Fas (TNFRSF6)-associated via 11 + 152 agtgaatcaggcaccggagtgcaggtTCGGGG 211 death domain. GTGgaatccttgggccgctgggcaagcg Geranylgeranyl diphosphate 1 + 83 aaaggggcggggcaacaaagcagtagggaggc 142 synthase 1. ggcaacgacgcctGCGCAGTGTGaccgg Glioblastoma amplified 7 + -61 gggaggggtcgcgcggcggcggcgtcagcggcg -2 sequence. gcgccCGGGCGGTGggagccgaggcgc Glutaminyl-peptide 2 + 34 tggagaagagggaaggcgaaggacgcgcgtTC 93 cyclotransferase (glutaminyl CCGGGCtcgtgaccgccagcggcccggg cyclase). Glutathione peroxidase 3 5 + 121 cttgaaaggtggctgggagcgccggacacctcaga 180 (plasma). cGGACGGTGGccagggatcaggcag Glutathione S-transferase subunit 7 + -37 ttctctctgaGGCTGCGCGctaaggcggtgggc 22 13 homolog. ggtcccaggcaggcccagaagctgggc Glycerol-3-phosphate 12 + -59 ctgcacccacgtgtggtggagttaaatgctcctagcc 0 dehydrogenase 1 (soluble). ggcagaggagctaGGGAGTGTGg Glycogenin. 3 + -97 cgcggccacgtGACGGCCGCtataagagcgc -38 acggggcagacgctcggttccccgccgtg Glycyl-tRNA synthetase. 7 + 301 gCGGCGCGCGccgcttccgtcgccaccctctct 360 ggacagcccagggccgcaggctcatgc Glyoxylate reductase/ 9 + -61 cgacgtctctcccgcccactccagcctggccccgC -2 hydroxypyruvate reductase. CCCGGCCCagctacattcccgggcc Glyoxylate reductase/ 9 + -61 cgacgtctctcccgcccactccagcctggccccgcc -2 hydroxypyruvate reductase. ccggcccagctacatTCCCGGGCC (continued on next page)


(Table A7 continued)

guanylate cyclase 2C (heat stable enterotoxin receptor)  12  -  102  gtgcaaacacaaagtgaactttggtttaTCTCCTGCCataacgtagctgctaattactgg  43
Heat shock 60kDa protein 1 (chaperonin).  2  -  -360  cgcggtGCCGCGGGGcgggagtagaggcggagggaggggacacgggctcattgcggtgtg  -419
Heat shock protein 75.  16  -  -25  ccgtcccgccccttcccatcgtgtacggtcccgcgtGGCTGCGCGcggcgctctgggagt  -84
Heme oxygenase (decycling) 1.  22  +  -47  ccacgtgacccgccgagcataaatgtgacCGGCCGCGGctccggcagtcaacgcctgcct  12
Heterogeneous nuclear ribonucleoprotein A1.  12  +  -6  aaaaaagagagggcgaaggtagGCTGGCAGAtacgttcgtcagcttgctcctttctgccc  53
Homology group 168; Mammalian apolipoprotein CIII.  11  +  -51  gcctgctgccctggagatgatataaaacaggtcagaaccCTCCTGCCTgtctgctcagtt  8
HSPC142 protein.  19  +  -29  ttctgcagcccgctgaactacatttcccaGGAGGCGTCGgggctgcgccgtacaacttcc  30
Hypothetical protein FLJ10769.  13  +  -133  aaccggaaaacgcttccaatggctgtgtttccggcGACGGCGCGggggcagctgggaatc  -74
Hypothetical protein FLJ20274.  16  -  -11  atgaggcttcgaggccggctaggggAGGCCGTCGctcatatccgacgtcaccagttcgcg  -70
Insulin-like growth factor II, IGF2 gene.  11  -  8828  cccgcctccagagtgggggccaaggctgggcaggcgggtgGACGGCCGgacactggcccc  8769
Integral membrane protein 2B.  13  +  19  cagtctccGCCCTCGCGCgggagctgggaggctgcgagatccctaccgcagtagccgcct  78
Interferon regulatory factor 1.  5  -  35  agtgtttggattgctcggtggcgccgctgccCTGGCAGAGctcgccactccttagtcgag  -24
Interleukin 10 receptor, beta.  21  +  -49  agcGGGGTGGGAGcccggtcgcccggcccctccccaccccgccccgcccatctccgctgg  10
Isocitrate dehydrogenase 3 (NAD+) alpha.  15  +  -49  gggcgggctccgagcGTGGGGTGGgcgcttgcgcactgccgctgcggctgttgctgcgga  10
Kinesin family member 3A.  5  -  21  tggggcggggcgagctcggaatgCGGCTGCGCactgggcgctctcgcccgagaagccgca  -38
Lectin, galactoside-binding, soluble, 1 (galectin 1).  22  +  -19  ggctcacccggtccggtccagttaaaaGGGTGGGAGcgtccgggggcccatctctctcgg  40
Ligatin.  1  -  -39  ccagcggcaacggccacgaaGCTGCGCGGccctggtttccagccgggcccttttcgcggc  -98
Lipase A, lysosomal acid, cholesterol esterase (Wolman disease).  10  -  -80  caggccccctgcaggtcccctatccgcaCCCCGGCCCctgagagctggcactgcgactcg  -139
Lipocalin 7.  1  +  -19  tgcagagggCGGGCGGTacaaaaagcgccccgccccgcgctcctctcttgactttgagcg  40
McKusick-Kaufman syndrome, transcript variant 1.  20  -  28  tcgaccgcagagctgcgcgtgctccgtGCCCTCGCGCgacgcgaaggttgtcgggatccg  -31
Mel transforming oncogene (derived from cell line NK14)- RAB8 homolog.  19  +  173  gccagtggagaggcgctggccgcacttcCCGTCGGGGagagagtgtaatatggcgaagac  232
Metallothionein 1L.  16  +  64957  CCCGGCCCTcttcccggactataaagagagccgccggcttctgggctccaccacgctttt  65016
Mitochondrial ribosomal protein L16.  11  -  31  tggctgggcGGACGGCGCctttgaccgcaggaactcaacctcccccggaactcaaccccc  -28
Mitochondrial ribosomal protein L4.  19  +  233  gcgcctcgcgaggctccagtggccttgacctcccgcggcgtgggaGGCTGCGCGGcgatg  292
Mitochondrial ribosomal protein L42.  12  +  -26  cccgcgggctgGGACGGCGCGCGcagcgctgtttcgtgggaccaggattgaaacaagatg  33
Mitochondrial ribosomal protein S36.  5  +  -29  cagtccggtctccgcctccgTGACGTCCCGGGaagcaccgcccacagctgcccgggactc  30
MT-IF (metallothionein IF)  16  +  -104  gggaccttgcgCCCGGCCCTtctgctgcacacagcccacccaggacctcccgcagcgctg  -45
MT-IIA (metallothionein IIA)  16  +  -324  tcaggttcgagtacaggacaggagggaggggagctgtgcacacgGCGGAGGCGcacggcg  -265
N-acylsphingosine amidohydrolase (acid ceramidase) 1.  8  -  -844  cgggtcacgcggcggagggggcgtggcctgcCCCCGGCCCagccggctcttctttgcctc  -903

(continued on next page)


(Table A7 continued)

NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 1, 7.5kDa  X  +  31  ggcaccgccccgctcgcgggCGGCCGCGGGGcttgctgggaagagaggcgaagccaggtc  90
Nuclear distribution gene C homolog (A. nidulans).  1  +  -24  ggcagccGCGCGCGTGcgtgtttccggctccgctgcggaaggcggacgactagagtcgtt  35
Nucleoporin 88kDa.  17  -  44  tgGACGGCCGCGGattggctgtgctcagcggcgggctgagcaactggagtgaggggagca  -15
Opsin blue, BCP gene.  7  -  56  tttGTGGGGTGGgaggatcacctataagaggactcagaggagggtgtggggcatccatga  -3
Origin recognition complex, subunit 4-like (yeast).  2  -  -781  gggagaggggcGGGGTGGGAGaggtagtgcgtgcgcggggcatgccgggagtggttgtgt  -840
P21(CDKN1A)-activated kinase 4.  19  +  -15  gcggtgaggggggaaagccccggatgttcgttggggattcaacatggcggCGGGAGTGTc  44
Parkinson disease (autosomal recessive, early onset) 7.  1  +  -2  cgtgagtctgcgcagtgtggggctgagggaggccGGACGGCGCGCGTGcgtgctggcgtg  57
Parkinson disease (autosomal recessive, early onset) 7.  1  +  -2  cgtgagtctGCGCAGTGTGgggctgagggaggccggacggcgcgcgtgcgtgctggcgtg  57
PDGFB (platelet-derived growth factor chain B)  22  -  273  tgctgaggggcgGGACGGTGGGtcaccccctagttcttttttccccagggccagattcatg  214
Peptidylprolyl isomerase F (cyclophilin F).  10  +  -44  ccgggaacctgggcaagccaataaaggctgCGGCGCGCGgctgcgcgggactcggccttc  15
Peptidylprolyl isomerase F (cyclophilin F).  10  +  -44  ccgggaacctgggcaagccaataaaggctgcggcgcgCGGCTGCGCGGgactcggccttc  15
Peroxiredoxin 3, nuclear gene encoding mitochondrial protein.  10  -  28  ggcccgctcaccACCCCGTAGgccccgcccctgcgtctctgcccgccccgtggcgcccga  -31
Peroxiredoxin 6.  1  +  -60  ccccgcCCCTCGCGCcttcgtttattcctccgcgcgctgggacaggctgcttcttcgcca  -1
Pescadillo homolog 1, containing BRCT domain (zebrafish).  22  -  38  caaacagacagcGTGGGGTGGggagggtcctcggggtccttggcagggcacgtgcgggag  -21
Plasma retinol-binding protein.  10  -  52  cgcgaccccctccccccggcgctataaagcagcggggCGGCCGCGGcgcgctcgcctccc  -7
Polymerase (DNA-directed), delta 4.  11  -  35  gggGGAGGCGTCcctccctggaccccggcgacttcctctctcggtttgtctgggtcatct  -24
Polymerase (RNA) II (DNA directed) polypeptide C, 33kDa.  16  +  -17  tttgcggaagcCGCGGAGGCctcgctgactgacttccggtgttggcggtggcgccgcgca  42
Polymerase (RNA) II (DNA directed) polypeptide G.  11  +  -8  gttccgccctcctagtcgcaGCAAGCGCGgaactggggttgcggcgtctaagtgtttccg  51
Polymerase (RNA) II (DNA directed) polypeptide I, 14.5kDa.  19  -  -364  acctaggcctggcccctcccgcgacctgtagcgcggcggaGCAAGCGCGgaaggctggga  -423
Proteasome (prosome, macropain) 26S subunit, non-ATPase, 11.  17  +  -28  ggcgggctttccgggtgtgtgtttccggcgtcggCGGCCGCGGccggggacggtgtgaga  31
Proteasome (prosome, macropain) 26S subunit, non-ATPase, 14.  2  +  101  tccaattGGCCGTCGGGGaacggaagccgaagcaaactagtgaacccggaagtgcttcgc  160
Proteasome (prosome, macropain) 26S subunit, non-ATPase, 2.  1  -  49  tatatacatatatatattactatgtatccatgaaaattaaaaaataaagGGCAGTGGCag  -10
Proteasome (prosome, macropain) subunit, beta type, 7.  9  -  49  acgccggaagtgctcggctctttttgtttgcacccgcctcCGACCCGGAactgctttctt  -10
Rac GTPase activating protein 1  12  -  19  agagGGACGGCGCGggaggaataaatttctctgtgattggttggtgaaggttttcaaacc  -40
RAD51 homolog C (S. cerevisiae)  17  +  -105  gcaaagctgcaaggcccggagcCCCGTGCGGccaggccgcagagccggcccctcgagctc  -46
Ribosomal protein L12.  9  -  49  atagccgcgtttTCCTGCCTAtatctggcttgtccgcgcgatttccggcctctcggcttt  -10
Ribosomal protein L18a.  19  +  -49  cttccggtagtgaaggcctggtgaACGGCTGCGCGacagaggacacttccttttgcgggt  10
Ribosomal protein L39-like.  3  -  49  aatcaccgccctagcatccggggaaatcgcggtcttagcatcCGGCGCGCGgcggttgaa  -10

(continued on next page)


(Table A7 continued)

Ribosomal protein S5.  19  +  -49  acccggaagttttcttcccagttaaaagtgttggcccgCGGCGCGCGgcctcttcctgtc  10
RNA binding motif protein 5.  3  +  -30  gataagcgtgaggtactgtgggtaggagacggccgtcgGCGGAGGCGccattttgtgtag  29
RNA binding motif protein 5.  3  +  -30  gataagcgtgaggtactgtgggtaggagacGGCCGTCGGcggaggcgccattttgtgtag  29
RNA binding motif protein 5.  3  +  -30  gataagcgtgaggtactgtgggtaggAGACGGCCGtcggcggaggcgccattttgtgtag  29
S-adenosylhomocysteine hydrolase.  20  -  87  tatgcaagtgcgaggaagatatttaaaggcgtcggcgccacgcgcatATCCCTGCTcggc  28
Sepiapterin reductase (7,8-dihydrobiopterin.  2  +  -4  gcaccgcCTCCTGCCTggtctcgggtgccagcgccgccggcggagaacaggagcatggag  55
Serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 1.  6  -  62  tgacgcttccgcctgctataagagcagCGGCCCTCGgtgcctccttcctgacctcgcacc  3
Solute carrier family 35 (CMP-sialic acid transporter) member 1.  6  +  3  aggggcggtccccggtgtcctgcgcgggGGCGCGGAGGgggcgggcgtcagttccgcggg  62
S-phase kinase-associated protein 2 (p45).  5  +  -16  cggccgcgaagcagagcgggctgtagagccttgcgCGCGCAGTGgggatggaacgttgct  43
Splicing factor, arginine/serine-rich 1 (splicing factor 2, alternate splicing factor).  17  -  -70  cccatatccgtgcgccgagctgataaaggcgccattttggaggGGCCGCGGGagacgtgg  -129
Splicing factor, arginine/serine-rich 10 (transformer 2 homolog, Drosophila).  3  -  -100  ccgcgtcacatccggtagagttagagcccgtGCGGAGGCGgtgcggagcatttcggctct  -159
Splicing factor, arginine/serine-rich 8 (suppressor-of-white-apricot homolog, Drosophila).  12  +  -47  cgctcgccggaaatgcgggcTCTCCTGCCggaagctgcccctccaccattttgtggcccg  12
Superoxide dismutase 1, SOD1 gene  21  +  17  gcgaGGCGCGGAGGtctggcctataaagtagtcgcggagacggggtgctggtttgcgtcg  76
Suppression of tumorigenicity 13 (colon carcinoma) (Hsp70 interacting protein).  22  -  5  ccgttGGGTGGGAGgagccagcggccggggaggttctagtctgttctgtcttgcggcagc  -54
SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily b, member 1.  22  +  -32  tttgtttgagcggCGGCGCGCGcgtcagcgtcaacgccagcgcctgcgcactgagggcgg  27
Synaptosomal-associated protein, 23kDa, transcript 1.  15  +  -22  gcgcgggctCGGCGCGCGcaggcgcgactagggtgcagcgccaggtccggtgttggggtg  37
TGFalpha (transforming growth factor alpha)  2  -  -54  agaggcgctcggtcctccctccgccctcccgcgccgggggcaggcCCTGCCTAGtctgcg  -113
Thiopurine S-methyltransferase.  6  -  -20  gagaagtggcggaggtggaaGCGGAGGCGTacccgcccctggggacgtcattggtggcgg  -79
Threonyl-tRNA synthetase.  5  +  152  ggcggggttagggcgcctttcgattgcatcagctggtccagccgaggccaaGTCCCGGGC  211
Thrombospondin  15  +  -314  ttgctgatcaccccgaGCCCGCGTGGcgcaagagtacgagcgccgagcccgtgcgcgcca  -255
Thymidylate synthetase.  18  +  -24  cgccgagcaggaagaggcggagcgcggGACGGCCGCGGGaaaaggcgcgcggaaggggtc  35
TPI (triosephosphate isomerase)  12  +  -64  cggagttccacttcgcggcgctctatataagtgGGCAGTGGCcgcgactgcgcgcagaca  -5
Transcription elongation factor A (SII), 2.  20  +  6065  cgcgcgCGGCCGCGGGGccgagggtttgaaccgggggtctgtcgtccgcggcggggctgc  6124
Transducin (beta)-like 3.  16  +  -30  tcgcgcgcggtgacgccatcgcagcgcgcCGGGAGTGTGgcgttctgtgaagagttcggt  29
Transforming growth factor, beta-induced, 68kDa.  5  +  47  ctcccgctcgcagcttacttaacctggcccgggcgGCGGAGGCGctctcacttccctgga  106
Translocase of outer mitochondrial membrane 22 homolog (yeast).  22  +  -23  tccgccgttCGGCTGCGCtcacctcctttccgcttccggtgtcccctacagtcatggctg  36

(continued on next page)


(Table A7 continued)

Translocase of outer mitochondrial membrane 40 homolog (yeast).  19  +  12  gtggcggcggcggcagcgggttcggttgcgcgtggcgcacGGGGTGGGAGcggagcccag  71
Tubulin, beta, 4.  16  +  -41  agggcggggccgcggctataagagcgcgCGGCCGCGGccccgaccctcagcagccagcc  18
UDP-GlcNAc.  11  -  -27  cggctggcctcggctcgcctCGGCTGCGCtcggcaggctgcggtaaatccgggcttgcgg  -86
X-box binding protein 1.  22  -  3  ctgggcgctgggcggctgCGGCGCGCGgtgcgcggtgcgtagtctggagctatggtggtg  -56
Zinc finger protein 207.  17  +  -23  ctaccgcagtgcttgacgggaggcggagcggggaacgAGGCCGTCGGccattttgtgtct  36
Zinc finger protein 265.  1  -  20  gggacagccccgagttgccgaaagtCCCCGGCCCgttttcctcctgtgaagacatagctg  -39

Table A8. Gfi-1B Hs Target Genes

Gene Description Ch 60n Sequence with Binding Site

[X-linked gene downstream of G6PD gene], GDX gene.  X  -  72  ccagcgcgcgcgcccGGGGCGGCGGCGcgcggcggggggtggttggggtgcgcgccggcc  13
A2(1) COL (alpha2(1) collagen)  7  +  9  agtgtttccaaacttggaaagggcgggggAGGGCGGGAGgatgcggagggcggaggtatg  68
Acidic (leucine-rich) nuclear phosphoprotein 32 family, member B.  9  +  84  ggcCGCGGCCGGgggcggcggtagtgagcccagcgctgcgctccagcccccttttccctc  143
ADA (adenosine deaminase)  20  -  35  gggaacggcggcgggcgGGGCGGGAGgcggggcccggcccgttaagaagagcgtggccg  -24
Adipose differentiation-related protein.  9  -  -34  ggaGGGGCGGCGaggcggggtttatagcccgggcgcccgcgggccccacgctttgaccgg  -93
Adipose differentiation-related protein.  9  -  -34  ggaggGGCGGCGAGGcggggtttatagcccgggcgcccgcgggccccacgctttgaccgg  -93
ADP-ribosylation factor 5.  7  +  -61  agcctcctcctgctgctgctgcgccccatcccccCGCGGCCGGCcagttccagcccgcac  -2
ADP-ribosylation factor 6.  14  +  32  ggggaagtgaggcggtttcctcggcgccttttccggcagcGGCGGCGGCagaactgggag  91
Alcohol dehydrogenase 5 (class III), chi polypeptide.  4  -  -18  cctccgtcgctgcgcggCCCACCCCGgatgtcagccccccgcgccgaccagaatccgtga  -77
ALDC (aldolase C); Gene: G001111.  17  -  60  gccccggacaccagtcctggggagggggtgtggtCAGGGCGGGgcatgcaggccacgccc  1
Aldo-keto reductase family 7, member A2 (aflatoxin aldehyde reductase).  1  -  -9  aggCGCCGCTGCtatgctgagtgccgcgtctcgcgtagtctcccgcgccgccgtccactg  -68
Aldolase A.  16  +  12602  cgccgccccttccgaggctaaatcggctgcgtTCCTCTCGGaacgcgccgcagaaggggt  12661
Alpha-actinin-2-associated LIM protein  4  -  -8  tcacccccagcggggataaagcgcccccgcccgggtcggggccaGGACGCCGCCcggcg  -67
Alpha-actinin-2-associated LIM protein.  4  -  -2  gtgacgtcacccccagcggggataaagcgccCCCGCCCGGtcggggccaggacgccgcc  -61
Amyloid beta (A4) precursor-like protein 1.  19  +  67  gggagctcctgtcaccgctggggccgggccGGGCGGGAGtgcaggggacgtgagggcgca  126
Amyloid beta (A4) precursor-like protein 2.  11  +  24  gtcagttccggttggtgtaaaacccccGGGGCGGCGGCGaactggctttagatgcttctg  83
Apolipoprotein E.  19  +  -36  ggcGGGACAGGGggagccctataattggacaagtctgggatccttgagtcctactcagcc  23
ARP3 actin-related protein 3 homolog (yeast).  2  +  88  aggccgcggcggggCAGGGCGGGgactgcctgcctgcctgggttgcggaagtgatagccg  147
ATPase, Na+/K+ transporting, beta 3 polypeptide.  3  +  -34  agacgtcaggctcccgcggcccaaGCGGCCGCCcggcgcggcgcggcgcagtcggctcga  25

(continued on next page)


(Table A8 continued)

ATP-binding cassette, sub-family B (MDR/TAP), member 8.  7  +  -20  caccgccaaggcgcgagggggttgtcgggatggGGGCGGGAGccaacatagagccctcag  39
B-cell receptor-associated protein BAP29.  7  +  -49  cccctcccccgcccagccgcggcgtctgacgtcccgcgcgtCGGCGGCCGCggagcagcg  10
BCR (breakpoint cluster region protein)  22  +  181  ctggctgagcttagcgtccgaggaggcGGCGGCGGCGgcggcggcacggcggcggcggg  240
beta-actin.  7  -  51  cgagcggccGCGGCGGCGccctataaaacccagcggcgcgacgcgccaccaccgccgag  -8
Block of proliferation 1.  8  -  0  ctcctgcgcacgcggcccggtcgctgtcggaagcggctgtgcgggtgGCGGCCGGCgcgc  -59
Bridging integrator 1.  2  -  -170  ctCGCTCGCCCGtccggcgcacgctccgcctccgtcagttggctccgctgtcgggtgcgc  -229
Calponin 3, acidic.  1  -  -26  gggcggcctgtggggaaccgaggtgcgGGCGGCGAGcgaggcagccgggtgcttcgcag  -85
Calreticulin.  19  +  -30  gcgggtgggtataaaagtgcaaggcGGGCGGCGGCGtccgtccgtactgcagagccgctg  29
Capping protein (actin filament) muscle Z-line, beta.  1  -  65832  ggtGGCGGCGGCccggcgcggggggaggggggtgctgacccggatgttcactcctgggca  65773
Carboxypeptidase E.  4  +  66  agtgcagcccgtggagccgcggctttgcccGTCTCCTCTgggtggccccagtgcgcgggc  125
CD63 antigen (melanoma 1 antigen).  12  -  -68  ccggggcggggccgcgcggcaggcgGGGCGGGAGccggggggcgcagctagagagccccg  -127
CGI-51 protein.  22  +  76  gcggacccgccttctgccctcagcagcagacgctctgtCCCGCCCGGgcagctctgcgag  135
Chitinase 1 (chitotriosidase).  1  -  -9  GGGACAGGGtggccagataaaagcagagcaggacctggaaagctggtttgtatgggctgc  -68
Chloride channel, nucleotide-sensitive, 1A.  11  -  16  gactaagttgttcttccggggtgactgcctcttcCAGGGCGGGcggtgtggtgcacgcat  -43
Chloride channel, nucleotide-sensitive, 1A.  11  -  16  gactaagttgttcttccggggtgactgcctcttccagggcGGGCGGTGTggtgcacgcat  -43
Chromosome 9 open reading frame 19.  9  +  -59  gcatcagcccggtcctgggctctGGGGCGGCGctgggccgggcgagcgcagtgcagcgca  0
c-myc  8  +  0  acccccgagctgtgctgctcGCGGCCGCCaccgccgggccccggccgtccctggctcccc  59
Connective tissue growth factor.  6  -  43  cgcggCCGCCCGGAgcgtataaaagcctcgggccgcccgccccaaactcacacaacaact  -16
Connective tissue growth factor.  6  -  43  cGCGGCCGCCcggagcgtataaaagcctcgggccgcccgccccaaactcacacaacaact  -16
COP9 constitutive photomorphogenic homolog subunit 7A (Arabidopsis).  12  +  35  aacagctgccggaggtgacggagcggcggcCCCGCCCGGtgcgctggaggtcgaagcttc  94
Creatine kinase, mitochondrial 1 (ubiquitous).  15  +  936  cctccctctcctccGCCCGCCCGcctgccactagctcattgcgcctctcctgcagtctga  995
Creatine kinase, mitochondrial 2 (sarcomeric), nuclear gene encoding mitochondrial protein.  5  +  -45  ccgcCCCGCCAGGgaggatttccaggccggcccgaccagctcgccctgcatacacttctt  14
Creatine kinase, mitochondrial 2 (sarcomeric), nuclear gene encoding mitochondrial protein.  5  +  -45  ccgccCCGCCAGGGaggatttccaggccggcccgaccagctcgccctgcatacacttctt  14
Cysteine-rich protein 2.  14  +  -13  ccgggcaggcggggctgggcgcGGGCGGCGGCGgcccggaggagaacgggcggagggcgc  46
Cysteinyl-tRNA synthetase, transcript variant 2.  11  -  26  ggaagacgggcccggcgtggggcgcgacttccGGGGCGGCGGttgcatcagattctagga  -33
Cytochrome P-450 IA1, P450-C, methylcholanthrene-inducible.  15  -  43  ccacacgtacAAGCCCGCCtataaaggtggcagtgccttcaccctcaccctgaaggtgac  -16
Cytochrome P-450 IA1, P450-P1, 2,3,7,8-Tetrachlorodibenzo-p-dioxin-inducible.  15  -  43  gcgtggccacacgtacAAGCCCGCCtataaaggtggcagtgccttcaccctcaccctgaa  -16
DBH (dopamine beta hydroxylase).  9  +  -82  agacaaatgtgattacccgtgctgcctggACCCACCCCattcaggaccagggcataaatg  -23

(continued on next page)


(Table A8 continued)

DEAD/H box polypeptide 17, 72kDa, transcript variant 1.  22  -  17  tctccgccgtacgcagcgttaagttggagccgactcagCGGCGGCCGCCAttttgtgcag  -42
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 25.  11  +  82  gcaccgcctcgcgCCGGTGGTGgaagcacgtgctgggggcgggagcagcaatcgcagcca  141
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 25.  11  +  82  gcaccgcctcgcgccggtggtggaagcacgtgctggGGGCGGGAGcagcaatcgcagcca  141
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 5 RNA helicase  17  -  -28  cgtaggaggcggtccagactacaaaagcggctgccggaaaGCGGCCGGCAcctcattcat  -87
Delta-like 1 homolog (Drosophila).  14  +  -55  gtacgaaaagggcggcgcgcgcGGCGGCGGCGgcagctccccggcagcggcggtggagag  4
DiGeorge syndrome critical region gene 2.  22  -  7  ggaggcggggcggggcggcgcggcgcggagccgagcaggaCGGCCGCCAtcttgcgcgc  -52
DiGeorge syndrome critical region gene 2.  22  -  7  ggaggcggggcGGGGCGGCGcggcgcggagccgagcaggacggccgccatcttgcgcgc  -52
EBNA-2 co-activator (100kD).  7  +  -3  ggcacgcttgcgcggcgagtagaacgtgtGGCGGCGGCGgagatcgcgtctctttcgctc  56
Enhancer of rudimentary homolog (Drosophila).  14  -  44  tttcgacgacttccggcggCTCCTCTCGcgaggattggctgttagcggcgttgtagttaa  -15
ENOS interacting protein.  19  -  59  agtgggcggagcggagggggaaaatagtgctcctttcCGGTGTCGGGgcacagttgaaga  0
Equilibrative nucleoside transporter 3.  10  +  -43  ggggcggggcggggccgggggagGAGCCCGCCtgccgcctgccaagcccagtggtcctgg  16
Estrogen-related receptor beta like 1.  3  -  -118  ttccctggtgacggattgTCCGGTGGTcgcctggtaaccggtcgtggctgtactggcggc  -177
Etoposide-induced 2.4 mRNA.  11  +  26  ttcGCGCTCGCCCGacttcccagcggccccgtgcggcccgggcatgcccagtgcgggcgc  85
Eukaryotic translation initiation factor 5.  14  +  32  ggcggacgctggggccgagggtagcttgagcgcGGCGGCGGCGttgttcagtcagagcga  91
Fibrillarin.  19  -  16  gcgggactacggggtggcgtcacgcagcgCACGTCGCCgcgcgcctgcgctcttttccac  -43
Fibronectin.  2  -  55  ccagagGGGCGGGAGggccgtcccatataagcccggctcccgcgctccgacgcccgcgcc  -4
Follicular lymphoma variant translocation 1.  18  -  -21  gccccggcccgcaaacccaaacactccaggcGCCCGCCCGccgcgcgtgattctcgcctc  -80
Gardner-Rasheed feline sarcoma viral (v-fgr) oncogene homolog  1  -  -23  tcctcctccctgcccagcaGGGGCGGCGGtcagaggcgggcagcaccccagttctccccg  -82
GDP dissociation inhibitor 1.  X  +  209  cggaggggtcgggcgacggccgacgcgccgccatctttggtccagtgcggtGGCGGCGGC  268
Gelsolin (amyloidosis, Finnish type).  9  +  11  ccctgCCCACCCCGgccgcgcgcaccacaacgcccccgccccgccgcccggaaccagctg  70
Gelsolin (amyloidosis, Finnish type).  9  +  11  ccctgcccaccccggccgcgcgcaccacaacgcccccgccccgCCGCCCGGAaccagctg  70
Glioblastoma amplified sequence.  7  +  -62  gggaggggtcgcgcGGCGGCGGCGtcagcggcggcgcccgggcggtgggagccgaggcgc  -3
Glutamic-oxaloacetic transaminase 2, mitochondrial (aspartate aminotransferase 2).  16  -  -16  ccctgtccttaccttcagcaggagccggttccctgtgtgtgtgtcCGCTCGCCCtctgct  -75
Glutathione transferase omega 1.  10  +  -68  gAGGGCGGGAaggacgcgccacctacttcctgaatcccctgcaaaccccagaggagctcg  -9
Glyceraldehyde-3-phosphate dehydrogenase.  12  +  -23  gccgccgcgcccccggtttctataaatTGAGCCCGCagcctcccgcttcgctctctgctc  36
Glycine amidinotransferase (L-arginine:glycine amidinotransferase).  15  -  -189  ccggggcggagcgcggctacaaaaggcctcgggccccgcgcgcccgCCCACCCCGctccg  -248
Guanine nucleotide binding protein 11.  7  +  305  ggctgcagtcacatcctgcgcgggtGGGCGGCGGgccaggccttcagttgtttcgggacg  364
Heat shock 60kDa protein 1 (chaperonin).  2  -  -359  cgcggtgccgcgGGGCGGGAGtagaggcggagggaggggacacgggctcattgcggtgtg  -418
Heterogeneous nuclear ribonucleoprotein A/B.  5  +  -8  ggcattataaagggcgccacgagtcggcattgtcaGGCGGCGGCaccgcgcgggacggag  51

(continued on next page)


(Table A8 continued)

Heterogeneous nuclear ribonucleoprotein A3.  2  +  17  cggaaaacgcgtcacgtgacgactggccccgcctctTCCTCTCGGtcccatattgaactc  76
High-mobility group chromosomal protein-14.  21  -  48  ggggcggcccggccggcggggagggggagccCGCGGCCGGggacgcggggggaggaggag  -11
Histone H3.3.  1  +  -47  ggcGGGGCGGCGtgtgttgggggatagcctcggtgtcagccatctttcaattgtgttcgc  12
HLA-DRA (MHC class II human leukocyte antigen DRalpha)  6  +  -585  agtgttgagagtgttgaacctcagagttTCTCCTCTCattttctctaaatgagatacaat  -526
HNOEL-iso protein.  1  +  -16  agagagacccagagatcaggagagaaggcaccgccCCCACCCCGcctccaaagctaaccc  43
Hypothetical protein FLJ10276.  1  -  19  cttgagGGGGCGGTGccgtgtcggaagtcaaaggtcagtaaatagtggtgatgtcatgca  -40
Hypothetical protein FLJ12895.  19  -  -20149  tcgtagtccgcggtgaggcccagtgcgcaggcgcagtttgcCGGCCGCCAtcgcgcactg  -20208
Hypothetical protein FLJ14904.  1  +  -22  cgcaggcgcctacctctgttacttAGGGCGGGAGcccggcgagggcgccggtgctttgtt  37
Hypothetical protein FLJ20320.  1  +  31  cccgcgGGGGCGGTGgcgcgcggtcagctgacccggcgggccttgacccagaagctgggc  90
Hypothetical protein LOC57019.  16  -  -42  ctcCTCCTCTCGcgagaggcgcgaggcgtggagtcgacggctggagagaagccgggagcg  -101
Hypothetical protein similar to beta-transducin family.  17  -  49  gcCGGCGGCGAccccgcccgaccacggagacggcgctctgtctgccgggctctttctcct  -10
Insulin-like growth factor binding protein 3.  7  -  49  cctgggccaccccggcttctatataGCGGCCGGCgcgcccgggccgcccagatgcgagca  -10
Insulin-like growth factor binding protein 6.  12  +  -44  ttaaagggcccggcccctggccggcggctacttaagacagaGGGGCGGCGGCGggcagc  15
Insulin-like growth factor binding protein 7.  4  -  61  ggccgggcGGGGCGGCGGgtttaaggcgccggccggcccgacacggctcactcgcgccct  2
Insulin-like growth factor II, IGF2 gene.  11  -  8829  cccgcctccaGAGTGGGGGccaaggctgggcaggcgggtggacggccggacactggcccc  8770
Integrin-linked kinase.  11  +  -9  ctgcccaagagcgccacgggcggggcggggccggcggcgggctgcgggCGCGGCCGGacg  50
Interferon regulatory factor 1.  5  -  36  agtgtttggattgctcggtggCGCCGCTGCcctggcagagctcgccactccttagtcgag  -23
Interleukin 10 receptor, beta.  21  +  -49  agcggggtgggagcccggtcgcccggcccctcCCCACCCCGccccgcccatctccgctgg  10
Interleukin 2 receptor, ɣ (severe combined immunodeficiency).  X  -  71  acgtgtgggtggggaggggtaGTGGGTGAGggacccaggttcctgacacagacagactac  12
Isocitrate dehydrogenase 3 (NAD+) alpha.  15  +  -49  gggcgggctccgagcgtggggtgggcgcttgcgcactGCCGCTGCGgctgttgctgcgga  10
KDEL (Lys-Asp-Glu-Leu) ER protein retention 2  7  -  -7  cccgcgcgcgcgcgcgccggcagttcggccacgtccctggCCACGTCGCgggcgatctcg  -66
Legumain.  14  -  9  tcaccgcggcacagtggcccttaagcgaggaGCGGCGGCGcccgcagcaatcacagcag  -50
Likely ortholog of mouse synembryn.  11  +  221  cctcccccgcgcgctggcgcggggctttctgggcCAGGGCGGGgccggcgaactgcggcc  280
Lysophospholipase 3 (lysosomal phospholipase A2).  16  +  -20  gggatgcGGGCGGCGGggttaagcgcgtcgccaccgcccccgcctaggcgagagcccaga  39
Major histocompatibility complex, class I, B.  6  -  16  gcgtcgccggggtcccagttctaaagtccccacgcacccacccggactcaGAGTCTCCTC  -43
Major histocompatibility complex, class II, DM alpha.  6  -  25  tattggaaatgatctggcaaaacatgatctaaggccacCCTCTCGGGgagggagttgggg  -34
Malate dehydrogenase 1, NAD (soluble).  2  +  -58  agccttttctcgctaacacCGCTCGCCCtctccgagtcagttccgcggtagaggtgacct  1
Mannosyl (alpha-1,3-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase.  5  -  -7284  cggggaagggccacgttGCCCGCCCGGccgtccggccccggcgcgccgcagaaagggctg  -7343
Matrix metalloproteinase 11 (stromelysin 3).  22  +  -49  cgggggtggggcggaagctataaGGGGCGGCGGCccggagcggcccagcaagcccagcag  10
Melanophilin.  2  +  -33  agcggggcgcagcccacgtggttcGGGCGGGAGgcgccgggacgtggccagttgcccgcc  26

(continued on next page)


(Table A8 continued)

Metallothionein IF, MT1F gene.  16  +  -4  cccggccccctcccctgactatcaaagcaGCGGCCGGCtgtttgggtccaccacgccttc  55
Microsomal glutathione S-transferase 1 transcript 1b.  12  +  463  ctggagaGGGGCGGTGcctgcgtccggcccgcgcggccacagtccctgcattgcgcgcga  522
Mitochondrial ribosomal protein L28.  16  -  -263  gccgggggcctggccaggctgcgactgactgcgagcccccacctCCCGCCAGGctcgcga  -322
Molybdenum cofactor synthesis 2.  5  -  36  ggttacgcctgcgctccggGTGAGCCCGCgcctgcgcctttgcggccgtgattcggtccc  -23
Muskelin 1, intracellular mediator containing kelch motifs.  7  +  217746  cctcccctcctcccgttcgctgccagcggtcggtGGCGGCCGCtacggtgctgacaagat  217805
NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 1, 7.5kDa.  X  +  31  ggcaccgccccgctcgcgGGCGGCCGCggggcttgctgggaagagaggcgaagccaggtc  90
NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8 kDa.  16  -  45  gtcgcctgcgcgccgccggaaaggaaccctggtccggaGGCGGCGGCGcagtgcatcctg  -14
Niemann-Pick disease, type C2.  14  -  -28  gggcccgccccgacaggtttgtcttgtgaccgcgGGCGGCCGCtgcttctttcccgagct  -87
N-myc downstream regulated gene 1.  8  -  -3  ggggcggggccgcggcgcctataaagtcgccctCCGCCCGGAcgtaaacaaacctcgcct  -62
Nuclear factor I/C (CCAAT-binding transcription factor).  19  +  6916  cgggggggggggtTGGGGGGGGcgggggggtggtttggaaaaatgactcagtaagttcag  6975
Nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha.  14  -  44  aaatccccagccagcgtttatagggcgccGCGGCGGCGctgcagagcccacagcagtccg  -15
Ornithine decarboxylase antizyme 1.  19  +  -50  gcgtggcctgggcgcagcatctataaaggcGGGCGGCGGCagaggcgccattttgcgaac  9
Ovarian cancer overexpressed 1.  20  -  14  ggttcGGCGGCGGCatccggcctcgcacttccggtggggagattccggcctggagctccc  -45
PAI-1 (plasminogen activator inhibitor type 1)  7  +  -113  ggacccgctggctgttcagacggactcccagagccaGTGAGTGGGtggggctggaacatg  -54
Paraoxonase 2.  7  -  5  cgcagggactcggcctaggcggaggacggggcggagCGCGGCCGGCAccatcgagccggg  -54
PDGF, A-chain (platelet-derived growth factor A-chain)  7  -  0  acgcgcgccctgcgGAGCCCGCCcaactccggcgagccgggcctgcgcctactcctcctc  -59
PDGF, A-chain (platelet-derived growth factor A-chain)  7  -  90  ggggaggcggGGGGGGGGGgcgggggcgggggcgggggaggggcgcggcggcggcgctat  31
PDGF, A-chain (platelet-derived growth factor A-chain)  7  -  90  ggggaggcggggggggggggcgggggcgggggcgggggaggggcGCGGCGGCGgcgctat  31
Peptidylprolyl isomerase A (cyclophilin A).  7  +  -10  gggcggggccgaacgtggtataaaagGGGCGGGAGgccaggctcgtgccgttttgcagac  49
Peroxisomal biogenesis factor 11B.  1  +  -62  ccgttgggtcgcgcagtaggcgtgactaggggcgggaagtgGGGCGGGAGcagggccgcg  -3
PGK1 (phosphoglycerate kinase 1)  X  +  6  agcggccgggaaGGGGCGGTGcgggaggcggggtgtggggcggtagtgtgggccctgttc  65
PHD finger protein 7.  3  +  101  ttcccgaaggtcatagtccgcgtgACCCACCCCcgtttgcgcgctcgcagtcttcgccgg  160
Phosphofructokinase, platelet.  10  +  -61  cgcgggcGGGGCGGCGGttccgagtcaggcgcgcgcgggcagggtccccattgcctgctg  -2
Phosphogluconate dehydrogenase.  1  +  -8  gcgtgagcggccgcagtttctggagggaGCCGCTGCGggtctttccctcactcgtcctcc  51
Phospholipid transfer protein.  20  -  47  gaggccggctttataaaggcggctggaacaaccctGCCCGCCAGaccccgtcgcccggat  -12
Plasma retinol-binding protein.  10  -  52  cgcgaccccctccccccggcgctataaagcagcggggcggccgcggcGCGCTCGCCtccc  -7
Plasma retinol-binding protein.  10  -  52  cgcgaccccctccccccggcgctataaagcagcggGGCGGCCGCggcgcgctcgcctccc  -7
Procollagen-lysine, 2-oxoglutarate 5-dioxygenase 3.  7  -  -102  ccgaccctCCGCCAGGGgtcacttcccctgtccaggtttcagcttccacatgtgtcaagc  -161

(continued on next page)


(Table A8 continued)

Profilin 1.  17  -  49  cggggcGGGGGGGGGgaggagcaggaagtggcggtgcgagggctgctgcacagcgagcgg  -10
Profilin 2, transcript variant 2.  3  -  -31  taccgccgccgcCGCCGCTGCGcctgctgctcctcgccgtccgcgctgcagtgcgaaggg  -90
Proopiomelanocortin, adrenocorticotropic hormone/beta lipotropic hormone.  2  -  -58  ccaccaggagagctcggcaagtatataaggacagaggagcgcgggaccaaGCGGCGGCGA  -117
Prostaglandin E synthase 2.  9  -  -667  ctcggcgtcttcgtttcgcgcGCCCGCCCGcggcgccggcggagcgaacatggacccggc  -726
Proteasome (prosome, macropain) 26S subunit, ATPase, 6.  14  +  -21  ccttccCCCGCCCGGAcggccatggccattcccggcatcccctatgagagacggcttctc  38
Proteasome (prosome, macropain) 26S subunit, non-ATPase, 11.  17  +  -28  ggcgggctttccgggtgtgtgtttccggcgtCGGCGGCCGCggccggggacggtgtgaga  31
Proteasome (prosome, macropain) 26S subunit, non-ATPase, 11.  17  +  -28  ggcgggctttccgggtgtgtgtttccggcgtcggcggcCGCGGCCGGggacggtgtgaga  31
Protein C receptor, endothelial (EPCR).  20  +  70  gccCAGACTCCGcccctcccagacggtcctcacttctcttttccctagactgcagccagc  129
Protein kinase C, alpha binding protein.  22  +  103  cAGTGGGGGGaaagccgggacttccgcgtcttgccggaagtgacgtgacaatcgcggcca  162
Protein phosphatase 1, catalytic subunit, alpha isoform.  11  -  -39  ggagagccaggccggaaggaggctgccggAGGGCGGGAGgcaggagcgggccaggagctg  -98
Protein phosphatase 4 (formerly X), catalytic subunit.  16  +  -82  ttccgcggcggggccggaagtaggagcGGCGGCGGCGgcggcggcggcggtcgaaagcgg  -23
RAB7, member RAS oncogene family.  3  +  8  tcGGGGCGGCGGCGgtggcggaagtgggagcgggcctggagtcttggccataaagcctga  67
Ras suppressor protein 1.  10  -  -8  cgcctgcgcggggcgccgaGACAGGGCGtgttcgctgttcagtgccggtgttgcagggag  -67
Ras-GTPase-activating protein SH3-domain-binding protein.  5  +  -10  aagttaGGGTGAGTGGcagttatatagaccggcggcggagcacgcgtgtgtgcggacgca  49
Ribosomal protein L14.  3  +  -20  GGGGCGGTGcgttcttctacacatgcgcagggttgggcgggtcttcttccttctcgccta  39
Ribosomal protein L28.  19  +  -34  cggtattaaaggcgtcgcctcgcccctcgccttcctctttccgtctcaggtCGCCGCTGC  25
Ribosomal protein L39-like.  3  -  49  aatcaccgccctagcatccggggAAATCGCGGtcttagcatccggcgcgcggcggttgaa  -10
Ribosomal protein L8.  8  -  -237  gtacccgggccgcccggccgcgctaatcgtgaGTCGCCCCCaggacccgtcgccatgggc  -296
RNA binding motif protein 4.  11  +  -11  cccttctactcagagcactgctGCGGCCGCCgccattttagcgttttgtcagaagcgtcc  48
RNA binding protein S1, serine-rich domain.  16  -  38  GGGCGGCGGCGActtcccgctccttgactctgacgtcagagcggcgccggcctcggcggc  -21
S100 calcium binding protein A10 (annexin II ligand, calpactin I, light polypeptide (p11)).  1  -  -392  acgtactaaggaaggcgcacagcccgccGCGCTCGCCtctccgccccgcgtccagctcgc  -451
Serine proteinase inhibitor.  14  -  -1041  ttgcccctctggatccactgcttaaatacggacgaGGACAGGGCcctgtctcctcagctt  -1100
Serine/threonine kinase 25 (STE20 homolog, yeast).  2  -  -273  ggcaccgggaggaagctgccttggaagaggtgGGGGCGGCGacgggaggggcggcgagcc  -332
Serine/threonine kinase 25 (STE20 homolog, yeast).  2  -  -273  ggcaccgggaggaagctgccttggaagaggtgggggcggcgacgggaggGGCGGCGAGcc  -332
SH3 domain binding glutamic acid-rich protein like 3.  1  +  328  agcaggaaggaaaccgctcccgagcacGGCGGCGGCGtcgtctcccggcagtgcagctgc  387
Signal sequence receptor, λ (translocon-associated protein λ).  3  -  59  cgcgcgccgcgctgacgggtcgttagcggataCGGCGGCCGCaggggcggagtcaagggc  0
Small inducible cytokine B14 precursor (chemokine CXCL14), SCYB14 or NJAC gene.  5  -  -239  cgagggcaggagcggatttaaaagaggcCAGGGCGGGcggagggaggctgtggagagagc  -298

(continued on next page)


(Table A8 continued)

Small nuclear ribonucleoprotein 70kDa polypeptide (RNP antigen).  19  +  194  gcagtgaggacggcggagggatttgcggccgggACCCACCCCctgctccagtcgctatcg  253
SNRPN upstream reading frame.  15  +  131308  gcgaagcctgccgctgctgcagcgagtctggcgcagagtggaGCGGCCGCCggagatgcc  131367
Solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocator), member 4  4  +  -27  ctcgcgagagcccggcggggatataagggggagctgcgggccaGGCGGCGGCcccctagc  32
Solute carrier family 25 (mitochondrial deoxynucleotide carrier), member 19.  17  -  33  ggagctgcacggtgattggcccagGAGCCCGCCAGtttcgtccgagctcagtagagtttt  -26
Solute carrier family 31 (copper transporters), member 2.  9  +  5  gtgcggccagggtagctatcgcGGCGGCGGCGgcggcggcggttgaactgactcggagcg  64
Sorting nexin 3.  6  -  -228  gaccgctgcgcgcgagccccgtgtccccacggcgggcagcagcGGCGGCGGCGgcggctg  -287
Sorting nexin 3.  6  -  -228  gaccgctgcgcgcgagccccgtgtccccacggcgggcagcaGCGGCGGCGgcggcggctg  -287
Splicing factor 3b, subunit 1, 155kDa.  2  -  30  ctatttttctccgtGGCGGCGGCGAcgagcggaagttcttgggagcgccagttccgtctg  -29
SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily b, member 1.  22  +  -32  tttgtttgaGCGGCGGCGcgcgcgtcagcgtcaacgccagcgcctgcgcactgagggcgg  27
TBP-like 1.  6  +  12  tgctccccagcgtctagtctatttattgtcgcggggaagctGCGGCCGCCtcgcacccgg  71
Tf (transferrin)  3  +  -644  aaggtgtcccacaggaagcttgAGGGCGGGAagttttccagcccaggagcctgagctcag  -585
Thrombospondin 1.  15  +  -49  ccaggaatgcgagcgcccctttaaaagcgcgcggctcctccgccttgccaGCCGCTGCGc  10
Thymosin, beta 10  2  +  -32  cggggcggggccgcggcgtatataaggctaggcgGGGGCGCCGCtcttttgtttcttgctg  27
Tissue factor pathway inhibitor 2.  7  -  49  cggccggaCGCTCGCCCcgcataaagcgggcacccgggccgcctggagcagaaagccgcg  -10
tk (thymidine kinase)  17  -  -24  gagggggcgggctgcggccaaatctCCCGCCAGGtcagcggccgggcgctgattggcccc  -83
TNF receptor-associated factor 2, transcript variant 1.  9  +  -46  agcGGCGGCGGCGgcggcggcgttgggggcggtagctgggcgggcccttagttccgggcg  13
TPI (triosephosphate isomerase)  12  +  -154  tcgggcggcggccGGGGCGGCGGcaggagggcgggcggggggcagggctccgggggactg  -95
Transforming growth factor, beta-induced, 68kDa.  5  +  47  ctcccgctcgcagcttacttaacctggcccGGGCGGCGGaggcgctctcacttccctgga  106
Transforming, acidic coiled-coil containing protein 3.  4  +  -52  tcggcgtttgaaactccggcgcgccggcggccatcaagggctagaagcGCGGCGGCG  7
Translocase of outer mitochondrial membrane 40 homolog (yeast).  19  +  12  gtGGCGGCGGCGgcagcgggttcggttgcgcgtggcgcacggggtgggagcggagcccag  71
Transmembrane 4 superfamily member 13.  7  +  9  gcggggctcgggctcctgctccggctcagctgCGGCGGCCGCaggttccaaagcgggtcc  68
Transmembrane 4 superfamily member 2.  X  +  17  cgcGCCCGCCCGccgcctgccgccgccgccgccgccgccggagctctgtagtatggcatc  76
Transmembrane 4 superfamily member 6.  X  -  -32  ctcggggactccgcgtctcgctctctgtgttccaaTCGCCCGGTgcggtggtgcagggtc  -91
Trinucleotide repeat containing 5.  6  +  290  gggctgcggctgcgagaggagggcGGGCGGGAGgctagctgttgtcgtggttgctcggag  349
Tubulin, beta polypeptide.  6  -  54  ggCAGGGCGGGtccccgggtataaaagaccgagctgggggggcggcggcaggtctctgcg  -5
Tubulin, beta polypeptide.  6  -  54  ggcagggcgggtccccgggtataaaagaccgagctgggGGGGCGGCGGCaggtctctgcg  -5
Tumor necrosis factor-alpha, TNF or TNFA gene.  6  +  -61  cgccCTCCTCTCGccccagggacatataaaggcagttgttggcacacccagccagcagac  -2

(continued on next page)


(Table A8 continued)

Tumor-associated calcium signal transducer 1. | 2 | + | 114 | aggcggGGCCGCCAGGtcgggcaggtgtgcgctccgccccgccgcgcgcacagagcgcta | 173
Ubiquitin A-52 residue ribosomal protein fusion product 1. | 19 | + | 4 | cgtgcgcaagcgctttCGGCGGCGAttaggtggtttccggttccgctatcttctttttct | 63
Ubiquitin associated protein 1. | 9 | + | -16 | atgacgtcagacgcccaaaTGAGTGGGGcggtgaggggaaggaggagggaagtaggactt | 43
Ubiquitin associated protein 1. | 9 | + | -16 | atgacgtcagacgcccaaatgagtGGGGCGGTGaggggaaggaggagggaagtaggactt | 43
Ubiquitin-conjugating enzyme E2G 2 (UBC7 homolog, yeast). | 21 | - | 5 | ccggaagcagtcccCGGTGTCGGGgcaggaggcacgcgcgcggctgaggcgaggtcgctc | -54
Uridine phosphorylase. | 7 | + | 443 | cccccgggCAGGGCGGGgccgctcgcagactccatatgagattcacctcgcaggtggttc | 502
Vaccinia related kinase 1. | 14 | + | -1 | tactgcagggtgcgaaggggccggCGCCGCTGCcgagttacgagtcggcgaaagcggcgg | 58
Von Hippel-Lindau binding protein 1. | X | + | 138 | gtgcgaacaagccaatcacggaatccCGGCGGCCGgcggcccgggaggcagtcgcgcgct | 197
Von Hippel-Lindau binding protein 1. | X | + | 138 | gtgcgaacaagccaatcacggaatcccgGCGGCCGGCggcccgggaggcagtcgcgcgct | 197
Williams-Beuren syndrome chromosome region 1. | 7 | + | -46 | tattgcggggtccttcctcgctcaccctggtTCCTCTCGGagcggagacggcaaatggcg | 13

Table A9. Gfi-1B Mm Target Genes

Gene Description Ch 60n Sequence with Binding Site

[X-linked gene downstream of G6PD gene], GDX gene. | X | - | 72 | ccagcgcgcgcgcccggggcggcggcGCGCGGCGGggggtggttggggtgcgcgccggcc | 13
53K phosphoprotein, TP53 gene. | 17 | - | 17 | actccatttcctttgcttcctcCGGCAGGCGGattacttgcccttacttgtcatggcgac | 99
Acidic (leucine-rich) nuclear phosphoprotein 32 family, member A. | 15 | - | 13 | tccatggggccgcagatcccgcctccacggcgatcaggttagtGTGCGCCGCGggtgctg | -46
Activated RNA polymerase II transcription cofactor 4. | 5 | + | 9 | cgtgaccgcagccccaGCGCGGCGGggccggcgtctcctggctgccgtcacttccggttc | 68
Adhesion regulating molecule 1. | 20 | + | -15 | ggccgaacgcgggtttccggcggggccCGGCAGGCGccgaggaggaagagcgagcccgga | 44
alpha-actin cardiac. | 15 | - | -128 | ccctgtccatcagcgttctataaagcggccctcCTGGAGCCAgccacccagagcccgctg | -187
Amyloid beta (A4) precursor-like protein 2. | 11 | + | 24 | gtcagttccggttggtgtaaaacccccggggCGGCGGCGAactggctttagatgcttctg | 83
Aquaporin 3. | 9 | - | 50 | cgtgtctccagcgctcctataaagggagccaccaGCGCTGGAGgccgctgctcgctgcgc | -9
ATP synthase, H+ transporting, mitochondrial F0 complex, subunit c (subunit 9) isoform 3. | 2 | - | -3 | catccgggtgcTGCGGCGCGaataagagccggaccgcgcttgcgcattgagtcccactcc | -62
ATPase, Na+/K+ transporting, beta 3 polypeptide. | 3 | + | -34 | agacgtcaggctcccgcggcccaagcggccgcccGGCGCGGCGcggcgcagtcggctcga | 25
ATPase, Na+/K+ transporting, beta 3 polypeptide. | 3 | + | -34 | agacgtcaggctcccgcggcccaagcggccgcccggcGCGGCGCGGcgcagtcggctcga | 25
Block of proliferation 1. | 8 | - | 0 | ctcctgcgcacgcggcccggtcgctgtcggaagcggctGTGCGGGTGGcggccggcgcgc | -59
Calmodulin 3 (phosphorylase kinase, delta). | 19 | + | 62 | gcgcgctgcgggcagtgagtgtggaggcgcggacGCGCGGCGGagctggaactgctgcag | 121
Carboxypeptidase A3 (mast cell). | 3 | + | -9 | atcaagataaGGGCTGAGGcataaaactgccagagggtctcaaggcaggcaaagaagaac | 50
Catenin (cadherin-associated protein), alpha 1, 102kDa. | 5 | + | -47 | ggGGGCCGCGGgcggggggcgggccgggggcgggggcgtggggcggcccatttcctcctc | 12
Catenin, beta like 1. | 20 | + | -34 | tttgccctattggctctgcggctccgctcgccgcactttacggcagtgtgGCTGGAGCCg | 25

(continued on next page)


(Table A9 continued) CD68 antigen. 17 + 104 tcctcctttccaagagaGGGCTGAGGgagcag 163 ggttgagcaactggtgcagacagcctag Cellular-Abelson murine leukemia 9 + -573 gggcccgccttccgctgtcTGGGCCGCGaga -514 virus oncogene 7kb. gtccttcgtcccttacagccccgccccgg Chaperonin containing TCP1, 7 + 32 ggcggcgcgcgggcacgctGGGGGCCGGc 91 subunit 6A (zeta 1). cagacgggccgacttttccagaagacccgga Cisplatin resistance-associated 17 + -67 cgtgtcccacgggcgcacggcatgctgggaaggcg -8 overexpressed protein. tccGCGCGGCGGccattttgtcttg Cofilin 1 (non-muscle). 11 - -104 cccctcattgtgcggctcctactaaacggaAGGG -163 GCCGGgagaggccgcgttcagtcggg Complement component 1, q 17 - 9 gcggggctTCCGGCGGCGcctcaggtcgcg -50 subcomponent binding protein. gggcgcctaggcctgggttgtcctttgcat Component of oligomeric golgi 1 + -24 aggccggcttgcggtgccgtcgttcattggccGCG 35 complex 2. GCGCGGgcgctgccatgttggcgga COP9 constitutive 12 + 35 aacagctgccggaggtgacggagcggcggccccg 94 photomorphogenic homolog cccggtGCGCTGGAGgtcgaagcttc subunit 7A (Arabidopsis). Coronin, actin binding protein, 16 + -48 ccaagGGGCGCGGCcaccccggaggcggg 11 1A. cgcacggctgcttctcattcattgtcttgac Cyclin D3. 6 - -135 tccccacttcctgcccagctctggatcggcGGCGC -194 GGCGcggactttgtaaacacttcgc Cyclin D3. 6 - -135 tccccacttcctgcccagctctggatcgGCGGCG -194 CGGcgcggactttgtaaacacttcgc CYP24 (24-hydroxylase gene) 20 - 325 gggagggcggggaggcgcgttcgaaGCACAC 266 CCGGtgaactccgggcttcgcatgacttc Cystatin B (stefin B). 21 - -23 acgtgaccccagcgcctacttGGGCTGAGGa -82 gccgccgcgtcccctcgccgagtcccctc Cysteine-rich protein 2. 14 + -13 ccgGGCAGGCGGggctgggcgcgggcggc 46 ggcggcccggaggagaacgggcggagggcgc Cytochrome c oxidase subunit IV 16 + 56 cgggcggagtcttcctcgatcccgtggtgctccGCG 115 isoform 1 , nuclear gene GCGCGGccttgctctcttccggtc encoding mitochondrial protein. DBH (dopamine beta 9 + -163 agtgccaattagaggagggcagcaggctGAGT -104 hydroxylase) GCTTGGcctggggcgcaagcttgtggga DEAD/H (Asp-Glu-Ala-Asp/His) 2 + 186 aacgccgaagacCAGGGGCCGGaagcgcg 245 box polypeptide 1. 
cgccgccactgccacgccgtgtcagtcggga Delta-like 1 homolog 14 + -55 gtacgaaaagggcggcgcGCGCGGCGGcg 4 (Drosophila). gcggcagctccccggcagcggcggtggagag Description of the gene 12 - -68 ccggggcggggccgcgCGGCAGGCGGggc -127 gggagccggggggcgcagctagagagccccg DiGeorge syndrome critical 22 - 7 ggaggcggggcggggcGGCGCGGCGcgga -52 region gene 2. gccgagcaggacggccgccatcttgcgcgc DiGeorge syndrome critical 22 - 7 ggaggcggggcgggGCGGCGCGGcgcgga -52 region gene 2. gccgagcaggacggccgccatcttgcgcgc DnaJ (Hsp40) homolog, 9 + 15 gacgcgcccgggttcggctacaaaagaggacggc 74 subfamily A, member 1. TGCGGCGCGccgggcggaactttcca Dynactin 4. 16 + -61 agtccCAGTGGGCCacctggagcggaagtgg -2 aggagcggccggaagtagccggaatctct Enhancer of rudimentary 14 - 44 tttcgacgactTCCGGCGGCtcctctcgcgagg -15 homolog (Drosophila). attggctgttagcggcgttgtagttaa Enolase 1, (alpha). 1 - 14 ggtggggctcgccttagctaggcaggaagtcggcg -45 cggGCGGCGCGGacagtatctgtgg Eukaryotic translation initiation 17 + 4417 caACACCCAGGtgagggcagtcttgcttgaata 4476 factor 4A, isoform 1. gctaatgattcttgaaaaatagtaagt Eukaryotic translation initiation 14 + 32 ggcggacgctggggccgagggtagcttgaGCGC 91 factor 5. GGCGGcggcgttgttcagtcagagcga Ewing sarcoma breakpoint region 22 + 234 gcctgcgcaGTGCGGCGCctagagggaaag 293 1. cgagagggagacggacgttgagagaacgag FBJ murine osteosarcoma viral 14 + -442 ccgcctcccctcccccGGCCGCGGccccggtt -383 oncogene homolog ccccccctgcgctgcaccctcagagttg (continued on next page)


(Table A9 continued) Fucosidase, alpha-L- 1, tissue. 1 - 10 agccgcccgcgggcacctgcgcgttaagAGTGG -49 GCCGCGtcgctgaggggtagcgatgcg Galactose-4-epimerase, UDP-. 1 - -3 ggAGTGGGCCGtcagcacttaaagggcccgc -62 ggctcgggcgtaggaggcggtgcctctgc Glioblastoma amplified 7 + -62 gggaggggtcGCGCGGCGGcggcgtcagcg -3 sequence. gcggcgcccgggcggtgggagccgaggcgc Glutamate-ammonia ligase 1 - 24 ggcggccccGGGCCGCGGataaagggtgcg -35 (glutamine synthase). gggctgctggcggctctgcagagtcgagag Glycine C-acetyltransferase (2- 22 + -81 cgattggctGCTGGAGCCcggcccccggcca -22 amino-3-ketobutyrate coenzyme cgcgaccaggctgcgtccgcgatgcgcac A ligase) Golgi reassembly stacking 2 + 3 cggcgggctgtgcggcccggctcTCCGGCGG 62 protein 2, 55kDa. Cagcgagtgccacgtcccaagtgctacgc Guanine nucleotide binding 3 + -60 ccggcccgccccgccgtcggtGCGCGGCGGt -1 protein (G protein), alpha agggaaggcgcctcccgcagtcgctcgga inhibiting activity polypeptide 2. Guanylate kinase 1. 1 + -42 cgggcggaggtgggccgGTGCGGCGCtgtc 17 acgtaggttcagtgggcggaagaggtggcc Heat shock 70kDa protein 9B 5 - 85 agtcgagtatcctctggtcagGCGGCGCGGgc 26 (mortalin-2). ggcgcctcagcggaagagcgggcctctg HIV-1 Tat interactive protein, 11 + 185 agggcgctcggtcccggaagtgacgtctcccagA 244 60kDa. GGGGCCGGaagtggcagtggagggag Hypothetical protein FLJ11000. 7 + -34 agtgacatcagtcagaacaaatgtaccaaagttcag 25 agagctgtttaCTAGGCACGactg Hypothetical protein FLJ20274. 16 - -11 atgaggcttcgaggccggctaGGGGAGGCCg -70 tcgctcatatccgacgtcaccagttcgcg Hypothetical protein MGC2668. 9 + -1 cccggaaatttGCGGGCGCGccggaagttga 58 ggggagtttcctgcgagctcggcttcctc Hypothetical protein similar to 17 - 49 gCCGGCGGCGAccccgcccgaccacggag -10 beta-transducin family. acggcgctctgtctgccgggctctttctcct Inner membrane protein, 2 - -259 agtggcgtccgctgtgcttccggtgcgcCGGGCG -318 mitochondrial (mitofilin). CGGacgcgggcacgcacacacgcaag Insulin-like growth factor II, IGF2 11 - -10160 ccTGGGCCGCGGgctggcgcgactataaga -10219 gene. gccgggcgtgggcgcccgcagttcgcctgc Insulin-like growth factor II, IGF2 11 - 8828 cccgcctccagagtgggggccaaggctgGGCA 8769 gene. 
GGCGGgtggacggccggacactggcccc Integrin, beta 2 (antigen CD18 21 - -7914 acccgcgcctccaGCTGAGGTTtctagacgtg -7973 (p95), lymphocyte function- acccagggcagactggtagcaaagcccc associated antigen 1). Integrin-linked kinase. 11 + -9 ctgcccaagagcgccacgggcggggcggggccg 50 gcggcgggctGCGGGCGCGGCcggacg Integrin-linked kinase. 11 + -9 ctgcccaagagcgccacgggcggggcggggCC 50 GGCGGCGggctgcgggcgcggccggacg Mdm4, transformed 3T3 cell 1 + -39 tccGGAGGCCTCaccggaagccctcgtgtga 20 double minute 4, p53 binding ggccgtgtgggaggccggaagttgcggct protein (mouse). Mesoderm specific transcript 7 + 5849 ctctaaaagtcggtgcccactcgctccgcgctgccgc 5908 homolog (mouse). ggcaaccaGCACACCCCggcacc Microtubule-associated protein 1 16 + 92 acctgaccaggctgcGGGCTGAGGagataca 151 light chain 3 beta. agggaagtggctatcgccagagtcggatt Mitochondrial ribosomal protein 3 - 35 gttcattgtaaaacgcaccggaagtggGTCCGG -24 L3. CGGCtttctttccgtcgcagagagcat Mitochondrial ribosomal protein 3 - -21 gcatcggccggcgaccgtTCCGGCGGCcatt -80 L3. gcgaaaacttccccacggctactgcgtcc NADH dehydrogenase 1 + 3003 ccaagaaggaggcGCGCTGGAGttacttccg 3062 (ubiquinone) Fe-S protein 2, cccggttctccttcccgcagtctgcagcc 49kDa (NADH-coenzyme Q reductase). NADH dehydrogenase 1 + -53 gcttccggcgtcgcgctgcaaacgtctcgctgaggtc 6 (ubiquinone) Fe-S protein 5, atccttTACGGCAGGCGtccgcg 15kDa) e Q reductase). (continued on next page)


(Table A9 continued) Neighbor of COX4. 16 - 1 cttTCCGGCGGCGccgggcgagaagcgcga -58 ctggagtctggggctgcggcttcccagagg Neural precursor cell expressed, 14 - 23 gtggagccagacagtgacccggaagtagaagtgg -36 developmentally down-regulated cccttgcaggcAAGAGTGCTggaggg 8. Ninjurin 1. 9 - 4 ggcgcgcaGCTGGAGCCtgcggctgaggctc -55 gggcgcgctcaggcccggatcctggcggc N-myc downstream regulated 8 - -3 ggggcgGGGCCGCGGcgcctataaagtcgc -62 gene 1. cctccgcccggacgtaaacaaacctcgcct Parkinson disease (autosomal 1 + -2 cgtgagtctgcgcagtgtgGGGCTGAGGgag 57 recessive, early onset) 7. gccggacggcgcgcgtgcgtgctggcgtg PDGF, A-chain (platelet-derived 7 - 90 ggggaggcggggggggggggcgggggcggggg 31 growth factor A-chain) cgggggagGGGCGCGGCGGcggcgctat PDZ and LIM domain 1 (elfin). 10 - -72 tctgtccCCGCGGGTCgtcgcccgccacagcc -131 gcgccatgaccacccagcagatagacct Peptidylprolyl isomerase F + -44 ccgggaacctgggcaagccaataaaggcTGCG 15 (cyclophilin F). GCGCGcggctgcgcgggactcggccttc Pericentrin 1. 17 + 169 ggagtcttgtctcgcagccagctctgagcGGGAG 228 GCCTgagcgggaagcattggcgtccg Peter pan homolog (Drosophila). 19 + 36 aggcgccggaagtgagctgcgcagcgccggaag 95 cggcggacgcaGGAGGCCTCgtggagg Phosphogluconate 1 + -8 gcgtgagcggccgcagtttctggagggagccgctG 51 dehydrogenase. CGGGTCTTtccctcactcgtcctcc Pinin, desmosome associated 14 + -2 tcattggctgagcccggctgtcagtcctttcgcgcctc 57 protein. gGCGGCGCGGcatagcccggct Pituitary tumor-transforming 1. 5 + -52 aattgggccgcgagttgtggtttaaaccaggaGTG 7 CGCCGCGtccgttcaccgcggcctc Pituitary tumor-transfo-rming 1. 5 + -52 aatTGGGCCGCGagttgtggtttaaaccagga 7 gtgcgccgcgtccgttcaccgcggcctc Polymerase (RNA) II (DNA 16 + -17 tttgcggaagccgcGGAGGCCTCgctgactga 42 directed) polypeptide C, 33kDa. cttccggtgttggcggtggcgccgcgca Polymerase (RNA) II (DNA 19 - -364 acctaggcctggcccctcccgcgacctgtaGCGC -423 directed) polypeptide I, 14.5kDa. 
GGCGGagcaagcgcggaaggctggga Procollagen-proline, 2- 17 - -185 tccgtgtccgacatgcTGCGCCGCGctctgctg -244 oxoglutarate 4-dioxygenase, beta tgcctggccgtggccgccctggtgcgc polypeptide (protein disulfide isomerase; thyroid hormone binding protein p55). Proopiomelanocortin, 2 - -58 ccaccaggagagctcggcaagtatataaggacag -117 adrenocorticotropic hormone/ aggagcgcgggaccaagCGGCGGCGA beta lipotropic hormone. Proteasome (prosome, 2 + -71 ccgggCGGCAGGCGGggtggcgggcagcc -12 macropain) 26S subunit, non- cctgggcgggcggggtcctggcgagaagcga ATPase, 1. Proteasome (prosome, 14 + 22 cctaccgctttcgctttcccttcgcggtgcccactccac 81 macropain) activator subunit 1 tccttGTGCGGCGCtaggccc (PA28 alpha). Proteasome (prosome, 17 + -23 agccaatgaagagacagcagtgagagcggttgcg 36 macropain) subunit, beta type, 3. cagtgaaggcTAGACCCGGTttactg Protein disulfide isomerase- 2 - -31 accggactggcctggggcgggacgtgggcgcgg -90 related protein. GGGCGCGGCGtgcggcacgctgcaggg Protein kinase, interferon- 2 - -49 gcctgccccctCGCTGGAGCaacgcaagcag -108 inducible double stranded RNA gaggcgggggagtcggaggaggtggcggc dependent activator. Protein phosphatase 1, 12 - -309 gggcttctgtggtaggaagggaggtccgctcggccg -368 regulatory (inhibitor) subunit 12A. gGTGCGCCGCcccagtgctctgtg Protein phosphatase 1, 19 + -20 cacgcCGGGCGCGGtaggctataaaagccta 39 regulatory (inhibitor) subunit 15A. gtggccattgtgttcgttgctcttatcgg RAD23 homolog A (S. 19 + -12 gcattgcctgccccggaagtggtcggcgcgcGGC 47 cerevisiae). GCGGCGcgcctgggcgctaagatggc (continued on next page)


(Table A9 continued) RAD23 homolog A (S. 19 + -12 gcattgcctgccccggaagtggtcggcgcGCGG 47 cerevisiae). CGCGGcgcgcctgggcgctaagatggc Ras-GTPase-activating protein 5 + -10 aagttagggtgagtggcagttatatagaCCGGC 49 SH3-domain-binding protein. GGCGgagcacgcgtgtgtgcggacgca Reticulon 4. 2 - -40277 attgcactgcgtcagactgttcCACACCCAGaa -40336 gacgtcaggtgacttcagtcctgctgc Ribosomal protein L14. 3 + -20 ggggcggtgcgttcttctacacatgcgcagggttgg 39 GCGGGTCTTCttccttctcgccta Ribosomal protein L39-like. 3 - 49 aatcaccgccctagcatccggggaaatcgcggtctt -10 agcatccggcGCGCGGCGGttgaa Ribosomal protein, large, P0. 12 - -55 tggctactttgttcgcattataaaaggcacgcGCGG -114 GCGCGaggcccttctctcgccagg Ribosomal protein, large, P0. 12 - -55 tggctactttgttcgcattataaaAGGCACGCGc -114 gggcgcgaggcccttctctcgccagg Ring finger protein 4. 4 + -23 ggcgggcggccaatggggacatgatggggggcgg 36 agccGAGGCCTCCGaagcggaagtgg RNA binding motif protein 14. 11 + 15 gcgcctGCGCGGCGGggttctcggtcgccagc 74 cattcctgaggaggactgccggtcgttc RNA binding protein S1, serine- 16 - 43 gcgccgggCGGCGGCGActtcccgctccttga -16 rich domain. ctctgacgtcagagcggcgccggcctcg Serine/threonine kinase 25 2 - 25 ggtcccagagagccgtGCGCGGGCCGCGa -34 (STE20 homolog, yeast). cgccgagcaccgccctcgccgtcgcctccgg Sequestosome 1. 5 + 14468 gcggggcctccgcgttcgctacaaaagccGCGC 14527 GGCGGctgcgaccgggacggcccgttt Splicing factor 3b, subunit 1, 2 - 30 ctatttttctccgtggCGGCGGCGAcgagcgga -29 155kDa. agttcttgggagcgccagttccgtctg Splicing factor, arginine/serine- 17 - -70 cccatatcCGTGCGCCGagctgataaaggcg -129 rich 1 (splicing factor 2, alternate ccattttggaggggccgcgggagacgtgg splicing factor). Splicing factor, arginine/serine- 17 - -70 cccatatccgtgcgccgagctgataaaggcgccattt -129 rich 1 (splicing factor 2, alternate tggagGGGCCGCGGgagacgtgg splicing factor). Src homology 3 domain- 7 + 119 aacgggccagcgctgcaagaggcctacGTGCG 178 containing protein HIP-55. GGTGGtcaccgagaagtccccgaccgac STIP1 homology and U-Box 16 + 263 tggtcccgcccccggccggaagtTCCGGCGG 322 containing protein 1. 
CGgagctgggccgggcccgagcggatcgc T-cell- or lymphocyte-specific 1 + 22833 atcccaggtgggagggtggggctagggctCAGG 22892 tyrosine kinase 1. GGCCGtgtgtgaatttacttgtagcct theta-globin. 16 + 83 cgcgggacccctggccggtccgcgcaggcgcagc 142 ggggtcgcaGGGCGCGGCGGgttcca Thyrotropin-releasing hormone, 3 + 400 gagccgccgcctggcgcagatataagcggcggcc 459 TRH gene. catctgaagagggctCGGCAGGCGcc Transcription factor-like 4. 17 + -41 acctctcggtcgcggattgGCGGGCGCGGgt 18 cacgtgggcacgccacccgcttcctcgcc Transforming, acidic coiled-coil 4 + -52 tcggcgtttgaaactccggcgcgccggcggccatca 7 containing protein 3. agggctagaaGCGCGGCGGcggta Tubulin, beta, 3. 16 + -41 agggcgGGGCCGCGGctataagagcgcgcg 18 gccgcggtccccgaccctcagcagccagcc Tubulin, gamma 1 pseudogene. 7 + -32 gcgctcagcttaattaaaagtggatatctggggggct 27 GGCACGCGGcagcgttgcgggtg Tubulin-specific chaperone c. 6 - 13 gccaatcaccggacgcgttggtGGGAGGCCT -46 Cacggacagcgcgcccggaggaaggaaga Tumor necrosis factor, alpha- 17 + 218 acgagcgcaggatgtgagctcacagcttgggactg 277 induced protein 1 (endothelial). ctgaggGGCAGGCGGctgcaggcta Tumor necrosis factor-alpha, TNF 6 + -61 cgccctcctctcgccccagggacatataaaggcagt -2 or TNFA gene. tgttgGCACACCCAGccagcagac Ubiquitin A-52 residue ribosomal 19 + 4 cgtgcgcaagcgctttCGGCGGCGAttaggtg 63 protein fusion product 1. gtttccggttccgctatcttctttttct Ubiquitin fusion degradation 1- 22 - 18 tccggcacttccggtgagcctctggggcgtaccggct -41 like. tGGCGCGGCGGcagcggcagcgg (continued on next page)


(Table A9 continued)

Ubiquitin-conjugating enzyme E2G 2 (UBC7 homolog, yeast). | 21 | - | 5 | ccggaagcagtccccggtgtcggggcaggAGGCACGCGcgcggctgaggcgaggtcgctc | -54
Ubiquitin-conjugating enzyme E2N (UBC13 homolog, yeast). | 12 | - | -258 | cgaCGGGCGCGGCaacgtcccccggaagtggagcccgggacttccactcgtgcgtgaggc | -317
Urokinase (urine plasminogen activator). | 10 | + | -23 | gggcggcgccggggcgggccctgatatagagcaggcgCCGCGGGTCgcagcacagtgcgg | 36
Vaccinia related kinase 1. | 14 | + | -1 | tactgcagggtgcgaAGGGGCCGGcgccgctgccgagttacgagtcggcgaaagcggcgg | 58
Williams Beuren syndrome chromosome region 22. | 7 | + | -2 | atgacataaaaaccgggtgcCGGCAGGCGccagtcgcaggtgtgctgctgaggcgtgaga | 57
X-box binding protein 1. | 22 | - | 3 | ctgggcgctgggcggcTGCGGCGCGcggtgcgcggtgcgtagtctggagctatggtggtg | -56

Table A10. IA-1 Long Target Genes

Gene Description Ch 60n Sequence with Binding Site

[X-linked gene downstream of X - 72 cagcgcgcgcgcccggggCGGCGGCGCGc 13 G6PD gene], GDX gene. ggcggggggtggttggggtgcgcgccggcc [X-linked gene downstream of X - 72 cagcgcgcgcgcccggggcggcggcgcgcggC 13 G6PD gene], GDX gene. GGGGGGTGgttggggtgcgcgccggcc 6-phosphogluconolactonase. 19 + -7 aggcgtactagcgacggccgtagggagcgcttcctc 52 ctCCCCGCCGCCGccctcgccatg Acid sphingomyelinase-like 1 + -45 tggcttccacttaAGGGTCCGGtatgcctgcctc 14 phosphodiesterase. ctcgggccagcccagatcataccctg Activating transcription factor 6. 1 + -19 tcgtcagcgttacggagtattttgtccgcctgCCGCC 40 GCCGtcccagatattaatcacgg Adenosine deaminase, ADA 20 - 15 gcgggaggcggggcccggcccgttaagaagagcg -44 gene. tggCCGGCCGCGgccaccgctggccc ADH1 (alcohol dehydrogenase 1) 4 - 109 tgaataatctaatGGGTGTGGCttaaagaccta 50 gatcatgtgtggaactggaatcgggtg ADH2 (alcohol dehydrogenase 2) 4 - 89 aataatccagtGGGTGTGGCttaaagacatag 30 atcacgtgtggaattggaattggatgtt ADH3 (alcohol dehydrogenase 3) 4 - 84 tacgacGGGTGTGGCttaaaaacctagatcac 25 gtgtgtagttggaattgggtgttatatg Adiponectin receptor 1. 1 - -232 ggtgacgcggctgcggaggtgacgcgggaggtcgc -291 gcgccccttcCGGCGCGGGGagggc Adipose differentiation-related 9 - -34 ggaggggcggcgaggcggggtttatagcccgGG -93 protein. CGCCCGCgggccccacgctttgaccgg ADM (adrenomedullin) 11 + -93 cgggcagcccaggccccgCCCCGCCGCtccc -34 ccacccgtgcgcttataaagcacaggaac ADP-ribosylation factor 3, ARF3 12 - 50 gaagtatcccgggCTGGGGGTGggggtacgc -9 gene. cgcacagctccagtcgccgtcgcggcttc ADP-ribosylation factor 5. 7 + -61 agcctcctCCTGCTGCTGctgcgccccatcccc -2 ccgcggccggccagttccagcccgcac ADP-ribosylation factor 11 - -4 cttctgcccggagaggacgtcatttccgccgagtccct -63 interacting protein 2 (arfaptin 2). gaCCTGCTGCTaggatcgcgac ADP-ribosylation factor-like 2. 11 + 13 cgagcgtgatagccaacaGGAACCGGGagc 72 ggggtcccgggactgggaagaaacggcggc ADP-ribosyltransferase 3. 
4 + 63480 tactgCTGGGGGTGTttataaaaagaaaagca 63539 ttcctgtctcagctctcactgtcaacaa ALDC (aldolase C) 17 - 60 gccccggacaccagtcctggggAGGGGGTGT 1 GGtcagggcggggcatgcaggccacgccc Aldehyde dehydrogenase 3 17 - -91 acgggaaGCAGGGTCCttaaatacgtcccctct -150 family, memberA1. tggctcttgccgttccaggagccccag Aldehyde dehydrogenase 6 14 - -24 gagggcgaggcctggcagctgtagtgcttctgggca -83 family, member A1. gtagaGGCGCGGGGtgcggagcta APC (adenomatous polyposis 5 + 30144 cccgaggggtacGGGGCTAGGgctaggcag 30203 coli) gctgtgcggttgggcggggccctgtgcccc (continued on next page)


(Table A10 continued)

Apolipoprotein B. 2 - 48 ctcttgcagcctgggcttcctataaatggggtgcgggc -11 gCCGGCCGCGcattcccaccgg Asparagine-linked glycosylation 2 9 - -3 ctacgagcgcggagcttgcgcagaagACCCCC -62 homolog. ATCagggtgcggggtgcagttgcggctc ATP synthase, H+ transporting, 2 - -3 catCCGGGTGCTgcggcgcgaataagagccg -62 mitochondrial F0 complex, gaccgcgcttgcgcattgagtcccactcc subunit c (subunit 9) isoform 3. ATP synthase, H+ transporting, 17 + -48 aCCAATGGGGacgcggggatattacggccaat 11 mitochondrial F0 complex, gagaatggagaaggtccaggacacgtgg subunit c (subunit 9), isoform 1. ATP synthase, H+ transporting, 21 - -702 gcattgcgatggcgggtaGGCGTGTGGgggc -761 mitochondrial F0 complex, ggagccagggccggaagtagagcggaggt subunit F6 , nuclear gene encoding mitochondrial protein. ATPase, H+ transporting, 7 + -32 gggtgGGGGTTGAGGccgacggggcgccgt 27 lysosomal 14kDa, V1 subunit F. acggcggaggcggggtttcagtggcttctg ATPase, Na+/K+ transporting, 3 + -34 agacgtcaggctcccgcggcccaagcggccgccc 25 beta 3 polypeptide. ggcGCGGCGCGGcgcagtcggctcga B-cell receptor-associated protein X - -318 ccggttTCCGGCCGCGgtatgaggggcgggg -377 31 (BAP31). ccggggctgctgtgggagagttctgttgc B-cell receptor-associated protein X - -318 ccggtttccggccgcggtatgaggggcggggccgg -377 31 (BAP31). gGCTGCTGTGGGAGagttctgttgc Benzodiazapine receptor 22 + -28 aggggcgggGCCTGGCGGctgggaggggcg 31 (peripheral). gggcggatgcggggacagcggcctggctaa Beta-1,3-glucuronyltransferase 3 11 - 67 gcctgcggacaggagccggggtttggggcgggaac -2 (glucuronosyltransferase I). ccctcgtcccctgcaGACCCCGCCt beta-actin. 7 - 50 tcgagcggccgCGGCGGCGCcctataaaacc -9 cagcggcgcgacgcgccaccaccgccgag beta-actin. 7 - 50 tcgagcggccgcggcggcgccctataaaacccagc -9 ggcgcgacgcgccACCACCGCCGag Biglycan. X + -22 tcggcccgcctgcccagcctttagcctcccgcCCG 37 CCGCCTctgtctccctctctccaca Biglycan. X + -22 tcggcccgcctgcccagcctttagcctcccgCCCG 37 CCGCCtctgtctccctctctccaca Bleomycin hydrolase. 17 - -63 gcctcagcctCCCCGCCGCCGccgccgccg -122 ccgccgccgagccggtttcctttttccggc Calcium binding atopy-related 10 - -13 aggagagtcacgtgagagtgggCGGAGGGG -72 autoantigen 1. 
Gtggaggtttgtctccgctgtttcatctct Calponin 3, acidic. 1 - -26 ggggcggcctgtGGGGAACCGaggtgcgggc -85 ggcgagcgaggcagccgggtgcttcgcag Calponin 3, acidic. 1 - -26 ggggcggcctgtggggaaccgaggtgcgggcggc -85 gagcgaggcagCCGGGTGCTTCgcag Capping protein (actin filament) 1 - 65832 ggtggcggcggccCGGCGCGGGGGGagg 65773 muscle Z-line, beta. ggggtgctgacccggatgttcactcctgggca Carbonyl reductase 1. 21 + -12 gcgctcagcggCCGGGCGTGTaacccacgg 47 gtgcgcgcccacgaccgccagactcgagca Casein kinase 1, alpha 1. 5 - -4 tcccagagtgctctgcgccgtgaagaagcggctccc -63 ggggactGGGGGCATTttgtgttg Cathepsin H. 15 - -4 gctcgtgccgcTCCCCCCCCgcgctcccagttg -63 acgctctgggccgccacctccgcggac CD11c (p150.95 leukocyte 16 + -137 gactccggttggggggtggGGGCGTGTGGga -78 integrin alpha subunit) ggccgagcctgtcctcggatcagttgcgt CD83 antigen (activated B 6 + 91 agttcccgaacgcgcgggcataaaagggcagccG 150 lymphocytes, immunoglobulin GCGCCCGCgcgccacagctctgcagc superfamily). CDC10 cell division cycle 10 7 + 153 gctgcgctccgctggggctggtcgCGGAGGGG 212 homolog (S. cerevisiae). Gggaggggatgtcggtcagtgcgagatc CDC28 protein kinase regulatory 9 + -57 gtccaaTCGGCGGCGggctgtcggattcaaat 2 subunit 2. ccggatcgttgggactgcggtcgttagt Cellular-Abelson murine leukemia 9 + 121337 ggggcgggCCTGGCGGGcgccctctccgggc 121396 virus oncogene 6kb. cctttgttaacaggcgcgtcccggccagc (continued on next page)


(Table A10 continued) Cellular-Abelson murine leukemia 9 + -20000573 gggcccgccttccgctgtctgGGCCGCGAGagt - virus oncogene 7kb. ccttcgtcccttacagccccgccccgg 20000514 CGI-69 protein. 17 - 27 gcgggcGGGGCCTCGtggcgcggtgtagggg -32 cggggcccccttggctgctttcggcgtcg CGI-81 protein. 16 + 8 agggggaggtgccgaggctgcgcgccggctgctcc 67 tccCCACCCCCAgcctttgccctga Chaperonin containing TCP1, 12 + -18 ggccccgccccccgaagtaGGGCGTGTGGC 41 subunit 2 (beta). GTcacttccggcttccttcagtccgctggt Chaperonin containing TCP1, 1 - -92 tgagagtagcGGGTTGAGGTgtaagccctga -151 subunit 3 (gamma). ggaggcagcgttttctgggcttctgtctg Chaperonin containing TCP1, 7 + 27 atggcGGCGGCGCGcgggcacgctgggggc 86 subunit 6A (zeta 1). cggccagacgggccgacttttccagaagac Chloride channel, nucleotide- 11 - 16 gactaagttgttctTCCGGGGTGactgcctcttcc -43 sensitive, 1A. agggcgggcggtgtggtgcacgcat Chloride intracellular channel 1. 6 - -38 gatccagcccgggagaggaccgagctggaggagc -97 tgggtgtGGGGTGCGTTgggctggtg Chorionic gonadotropin-beta, 19 - 49 tggactgtggtgcaggaaagcctcaagtagaggaG -10 CGB gene. GGTTGAGGcttcagtccagcacctt Chromobox homolog 3 (HP1 7 + 221 cccctccccCCGGCGGCCCCGcgcgcagct 280 gamma homolog, Drosophila) , cccggctccctcccccttcggatgtggctt transcript variant 2 Chromosome 3 open reading 3 - -115 agggggcgggctctgcagggaagtgcgtcagagg -174 frame 4. aGGCGCGGGGagagtagggtgctgtg chromosome 5 open reading 5 - -219688 cccccTCCCCCCCCaatctgtctttctagcatgtt -219747 frame 13. gccctttttcaaccacatttgtgtt Citrate synthase. 12 - -26 cgggctgggcgCCGCCGCCGgttcgtctactctt -85 tccttcagccgcctcctttcaacctt c-jun 1 - -630 ggagtccgggcggccaagaCCCGCCGCCGg -689 ccggccactgcagggtccgcactgatccgc Cleft lip and palate associated 19 + -85 gcccggcGGGGCTAGGaaagcgtgaaatctc -26 transmembrane protein 1. gcgcgattgcgctgcgaagtcggggacgg c-myc 8 + -124 ccctccccataagcgcccctcccgggttcccaaagc -65 agaggGCGTGGGGGaaaagaaaaa c-myc 8 + 10 tgtgctgctcgcggccgCCACCGCCGggcccc 69 ggccgtccctggctcccctcctgcctcg Complement component 1, q 17 - 9 gcggggcttcCGGCGGCGCctcaggtcgcgg -50 subcomponent binding protein. 
ggcgcctaggcctgggttgtcctttgcat Component of oligomeric golgi 1 + -24 aggccggcttgcggtgccgtcgttcattggccGCG 35 complex 2. GCGCGGGcgctgccatgttggcgga Component of oligomeric golgi 1 + -24 aggccggcttgcggtgCCGTCGTTCattggccg 35 complex 2. cggcgcgggcgctgccatgttggcgga Component of oligomeric golgi 16 - 31 ggcgccaacgaagaGGGGGCCTCGcggga -28 complex 4. ccggaagtgccgaatggggaccaagatggcg Cyclin B2. 15 + -56 agaacagcgacccgtgcgcagggccgcCCAAT 3 GGGGcgcaagcgacgcggtatttgaatc Cyclin D3. 6 - -135 tccccacttcctgcccagctctggaTCGGCGGC -194 GCGGcgcggactttgtaaacacttcgc Cytochrome c oxidase subunit IV 16 + 56 cgggcggagtcttcctcgatcccgtggtgctccGCG 115 isoform 1 , nuclear gene GCGCGGccttgctctcttccggtc encoding mitochondrial protein. DC6 protein. 8 + -107 ggcagggtggcgaGACCCCGCCccggaaat -48 gcgtgttctagctttctgtgtgcttaggtg DEAD/H (Asp-Glu-Ala-Asp/His) 16 + -10 gtcacTGTGGCGCGccgcttccggtctgcagcc 49 box polypeptide 19 (DBP5 ttgtagtggggctggagcagagcctgc homolog, yeast). Delta-like 1 homolog 14 + -55 gtacgaaaagGGCGGCGCGcgcggcggcgg 4 (Drosophila). cggcagctccccggcagcggcggtggagag DiGeorge syndrome critical 22 - 7 gggaggcggggcggGGCGGCGCGGcgcgg -52 region gene 2. agccgagcaggacggccgccatcttgcgcgc Dihydrofolate reductase. 5 - -371 gggggcGGGGCCTCGcctgcacaaatgggg -430 acgaggggggcggggcggccacaatttcgc (continued on next page)


(Table A10 continued) Dihydrolipoamide dehydrogenase 7 + -30 ccgcgcgggccaatcgcgctgcTCCCGGGTG 29 (E3 component of pyruvate atgacgtaggctgcgcctgtgcatgcgca dehydrogenase complex). Diptheria toxin resistance protein 1 + -6 aacctcccCGGTAGTCCCacgtgtagcggag 53 required for diphthamide 2 (S. aaacagtagttaggatggctgaaggggat thesis-like 2 (S. cerevisiae). DnaJ (Hsp40) homolog, 16 - -41 ctgtctccctcggcctgtgCCGCCGCCGacgcc -100 subfamily A, member 2. gcttgtgggcccgactccgctctgtct DnaJ (Hsp40) homolog, 7 + -41 cgccgtgCCGCCGCCGcgtgacgcatttcctgtt 18 subfamily B, member 6 , tgttgttggagaaaggagagaaagga transcript variant 2. DnaJ (Hsp40) homolog, 19 - 56 ggaagcttccgccggacgggtatatagagtccggga -3 subfmaily B, member 1. ctggTCGGCGGCGgagccggggga EBNA-2 co-activator (100kD). 7 + -3 ggcacgcttgcgcggcgagtagaaCGTGTGGC 56 Ggcggcggagatcgcgtctctttcgctc EGFR (epidermal growth factor 7 + 39 acggtgtgagcgcccgacgcggccgaggCGGC 98 receptor) CGGAGtcccgagctagccccggcggccg EGFR (epidermal growth factor 7 + -82 tccctcctCCGCCGCCTggtccctcctcctcccg -23 receptor) ccctgcctccccgcgcctcggcccgc ElaC homolog 2 (E. coli). 17 - -5 tggatccgccatgcggagcGGCTAGGTGGT -64 Gcacgggaaacgcgggcgtaggtgaccggc Endothelin-B receptor, EDNRB or 13 - 46 gccccgccccactgcatattatttacccctcctggcca -13 ETRB gene. cGCGGGGGAAgaaaaacagctg Enolase 1, (alpha). 1 - 14 ggtggggctcgccttagctaggcaggaagtcggcgc -45 ggGCGGCGCGGacagtatctgtgg Enolase 1, (alpha). 1 - 14 ggtggggctcgccttagctaggcaggaagtcggcgc -45 gGGCGGCGCGGacagtatctgtgg Enolase 3, (beta, muscle) , 17 + -21 gggaccgagtggctcagggataaatgcgcagcctg 38 transcript variant 1. agagGGGGTGAGCtgacactgtccc ENOS interacting protein. 19 - 59 agtgggcggagcgGAGGGGGAAaatagtgct 0 cctttccggtgtcggggcacagttgaaga ENOS interacting protein. 19 - 59 agtgggcggagCGGAGGGGGaaaatagtgct 0 cctttccggtgtcggggcacagttgaaga Epidermal growth factor receptor, 7 + 31 ccccccgcacggtgtgagcgcccgacgcggccga 90 EGFR or ERBB1 gene. 
ggCGGCCGGAGtcccgagctagcccc Eukaryotic translation initiation 22 - -15 gacagcgccggtcgtgtttacggcGGCGCCCG -74 factor 3, subunit 7 ζ, 66/67kDa. Ctgcgcgcgcatgtttcctcttttcctg Eukaryotic translation initiation 22 - -15 gacagcgccggtcgtgtttaCGGCGGCGCccg -74 factor 3, subunit 7 ζ, 66/67kDa. ctgcgcgcgcatgtttcctcttttcctg Factor VIII, F8C gene. X - 49 cctgtggcTGCTTCCCActgataaaaaggaag -10 caatcctatcggttactgcttagtgctg Family with sequence similarity 14 - 30 gcgcgtccctactggatgGAGGGGGAAgtaac -29 14, member A. accccaagaacgctgtcatttcctgggc Fibroblast growth factor (acidic) 11 - -15 atgccgcgagcggaatctcggcgctcccggaagtg -74 intracellular binding protein. gcctgaaGGCGGCGCGccagtcccg Follicular lymphoma variant 18 - -21 gccccggcccgcaaacccaaacactccaGGCG -80 translocation 1. CCCGCccgccgcgcgtgattctcgcctc GDP dissociation inhibitor 1. X + 204 cgcggCGGAGGGGTcgggcgacggccgacg 263 cgccgccatctttggtccagtgcggtggcg Gelsolin (amyloidosis, Finnish 9 + 11 ccctgcccaccCCGGCCGCGcgcaccacaac 70 type). gcccccgccccgccgcccggaaccagctg Gelsolin (amyloidosis, Finnish 9 + 11 ccctgcccaccccggccgcgcgcaccacaacgccc 70 type). ccgCCCCGCCGCCcggaaccagctg General transcription factor IIIC, 9 + 276 gcgaggcctttgggagtactttgtgggacggacCCT 335 polypeptide 5, 63kDa. GGCGGGccctgccagacgcacagg Glioblastoma amplified 7 + -61 gggaggggtcgcgcggcggcggcgtcagCGGC -2 sequence. GGCGCccgggcggtgggagccgaggcgc Glucose-6-phosphate X - -668 agcccaGGCGCCCGCccccgcccccgccgat -727 dehydrogenase. taaatgggccggcggggctcagcccccgg Glutamic-oxaloacetic 10 - -100 gcccccgccGGCTAGGTGaaggtgagtgtctc -159 transaminase 1, soluble. ctccagtcgcaacggccagacctgacct (continued on next page)


(Table A10 continued) Glutaredoxin (thioltransferase). 5 - -98 gcctccagggaggttccttattaaataggagcCAAC -157 TGGCTGggtcggggctcaataccc Glutathione peroxidase 1. 3 - 16 gtgaCCCGCCGCCGgccagttaaaaggagg -43 cgcctgctggcctccccttacagtgcttgt Glutathione S-transferase pi. 11 + 188 ttataaggctcggaGGCCGCGAGGccttcgct 247 ggagtttcgccgccgcagtcttcgccac Glycine cleavage system protein 16 - -42 cctccCGGCGCGGGtggccgaggcgtagcgc -101 H (aminomethyl carrier). tgcgacccccgcacccctgcgaacatggc Glycine cleavage system protein 16 - -42 cctcccggcgcgggtggccgaggcgtagcgctgcg -101 H (aminomethyl carrier). accCCCGCACCCctgcgaacatggc GM-CSF2 5 + -2985 tgccacagCCCCATCGGagcccctgagtcagc -2926 (granulocyte/macrophage colony atggctggctatcggttgacactgttta stimulating factor) Golgi phosphoprotein 3 (coat- 5 - 0 atattggaaaggcgccgCCGCCGCCTccgcct -59 protein). tggagctcggggtgtttcggggactgcg Golgi phosphoprotein 3 (coat- 5 - 0 atattggaaaggcgCCGCCGCCGcctccgcctt -59 protein). ggagctcggggtgtttcggggactgcg Guanine nucleotide binding 1 - -371 acttagagacaaagttcggagccccgcCCCCGC -430 protein (G protein), gamma 5. CGCgcgccgctgagttgtctggccccg H2A histone family, member O. 1 - 20 tgaagcagcggcgttttcggcgactttcCCGATCG -39 CCaggcaggagtttctctcggtgac Heat shock 70kDa protein 9B 5 - 85 agtcgagtatcctctggtcaGGCGGCGCGGG 26 (mortalin-2). cggcgcctcagcggaagagcgggcctctg Heat shock 90kDa protein 1, 14 - -52656 cgctgggcgggcccgtcgctatataaggcaGGCG -52715 alpha. CGGGGGtggcgcgtcagttgcttcag Heme oxygenase (decycling) 1. 22 + -47 ccacgtgacccgccgagcataaatgtgaCCGGC 12 CGCGgctccggcagtcaacgcctgcct High mobility group chromosomal 1 + 38 ttctaaccGGTCCGGGGctcccagcgctataaa 97 protein-17. aactttataaaccccccggagcccgag High-mobility group chromosomal 21 - 48 ggggcggcccggccggcggggagggggagcccg -11 protein-14. cggccggggaCGCGGGGGGaggaggag Histidyl-tRNA synthetase. 5 - 13 gcgacccagcgactcgatagccggaagtcatccttg -46 ctgaggctggggCAACCACCGCag Histone H3.3. 1 + -47 ggcggggcgGCGTGTGTTgggggatagcctc 12 ggtgtcagccatctttcaattgtgttcgc HMG-box containing protein 1. 
7 + -61 ggggctcaacccccagGGGGTTGAGGggag -2 ggggaagacgaagcttgaaagacttggtaa HMG-box containing protein 1. 7 + -61 ggggctcaacccccagggggttgagggGAGGG -2 GGAAgacgaagcttgaaagacttggtaa HNF4A (hepatocyte nuclear 20 + 45489 aggcagtgggagggcggaggGCGGGGGCC 45548 factor 4-alpha) Ttcggggtgggcgcccagggtagggcaggtg Hydroxyprostaglandin 4 - 77 cgcgcgcgcgcgcaGGGGGGCATaaaagcc 18 dehydrogenase 15-(NAD). gcggccgcgcggagacgcggagctcgccca Hypothetical protein FLJ10769. 13 + -133 aaccggaaaacgcttccaatggctgtgtttccggcga -74 CGGCGCGGGGGcagctgggaatc Hypothetical protein FLJ11730. 1 - 40 ctggcggcgcagggaccgcccccaacctcgcgCC -19 GCCGCCGcccgcctcagcccaacatg Hypothetical protein similar to 17 - 49 gccggcggCGACCCCGCCcgaccacggag -10 beta-transducin family. acggcgctctgtctgccgggctctttctcct IL-1beta (prointerleukin-1beta) 2 - 2796 ctgtggagactgttacgtcaGGGGGCATTgccc 2737 catggctccaaaatttccctcgagcga Insulin, INS gene. 11 - 49 gggagatgggctctgagactataaagccaGCGG -10 GGGCCcagcagccctcagccctccagg Insulin-like growth factor II, IGF2 11 - cccgcctccagagtgggggccaaggctgggcaggc gene. gggtggACGGCCGGAcactggcccc Isocitrate dehydrogenase 1 2 - aggggacaaagccgggaagaggaaaagctcgga (NADP+), soluble. cctaccctgtgGTCCCGGGTttctgca Isocitrate dehydrogenase 3 15 + -49 gggcgggctccGAGCGTGGGGtgggcgcttg 10 (NAD+) alpha. cgcactgccgctgcggctgttgctgcgga Lactate dehydrogenase C. 11 + -12 cgaGCGGGGGAACgtgcgtgtctcgagtcgca 47 cggagggcaaccgtcgacgggcttagcg (continued on next page)



Lectin, galactoside-binding, 14 + 49 ccgggcggggctgggagtatttgaggctcggagcca 108 soluble, 3 (galectin 3). ccgccccgccGGCGCCCGCAgcac Lectin, galactoside-binding, 19 - -106 gccactcccttaccctgggtataagagccACCAC -165 soluble, 4 (galectin 4). CGCCtgccatccgccaccatctccca Legumain. 14 - 9 tcaccgcggcacagtggcccttaagcgaggagCG -50 GCGGCGCccgcagcaatcacagcag Legumain. 14 - 9 gtcaccgcggcacagtggcccttaagcgaggagcg -50 gcGGCGCCCGCAgcaatcacagcag Likely ortholog of mouse 11 + 221 cctcccccgcgcgctGGCGCGGGGctttctggg 280 synembryn. ccagggcggggccggcgaactgcggcc Lysophospholipase II. 1 + -7 ctcgaacggaagttccggcgggggcggccGAGG 52 GGGAAgagtgtgtctgcgggagaaaga Major histocompatibility complex, 6 - 69 gaaatcagtaacttactccctataacttggaatgtgggt 10 class II, DR beta 1. GGAGGGGTTcatagttctccc Microtubule-associated protein, 20 + 12 gcgtaacgagGGGGTGCGTgtgaggtcatcg 71 RP/EB family, member 1. cgcgggcgggcgggcggggtctggcggtt Midkine (neurite growth- 11 + 566 tggggagaggggGCGGGGGCCcatgtgacc 625 promoting factor 2). ggctcagaccggttctggagacaaaagggg Mitochondrial ribosomal protein 17 - 38 cgctaccccacaatccttagctctttccgTCTCCAC -21 L10. TCggcttccgtccattcttccggt Moesin. X + -21 cacataaaggagcgGGCGGCGCGacaggg 38 gcggctctttcctgggtggggtttgtgaagt MT-IIA (metallothionein IIA) 16 + -86 tgactcagcgcgGGGCGTGTGcaggcagcgc -27 ccggccggggcggggcttttgcactcgtc Myeloid leukemia factor 2. 12 - -360 tccgttggccgagggggccgtacggaggtggcaG -419 CTGTGGGAGgaggcggcgtggaaggc N-acetyltransferase 8 (camello 2 - 43 aaacattcaggagccaagcaatgaggcaggcagc -16 like). ctgctttccaGGGGTGAGCaagcccc N-acylsphingosine 8 - -843 cgggtcacgcggCGGAGGGGGcgtggcctgc -902 amidohydrolase (acid ccccggcccagccggctcttctttgcctc ceramidase) 1. NADH dehydrogenase 16 - 45 gtcgcctgcgcgccgccggaaaggaaccctggtcc -14 (ubiquinone) 1, alpha/beta ggaggCGGCGGCGCagtgcatcctg subcomplex, 1, 8kDa. NADH dehydrogenase 11 - -308 cacgtaccgggaaGCGTTTGGGcgctgcacc -367 (ubiquinone) 1, subcomplex gcgctgcgccgtgcccgtgagtccggcgc unknown, 2, 14.5kDa. 
NADH dehydrogenase 2 - 11 cgggctttcCGTTCTCCAggcccggctgacaga -48 (ubiquinone) Fe-S protein 1, gttagccgaggccgccatattgaataa 75kDa (NADH-coenzyme Q reductase). NADH dehydrogenase 18 + -21 ggggtcgggagcgcgcacgctgtgcgccctgggcg 38 (ubiquinone) flavoprotein 2, cgctcgggattctCGCCTGGCGcgg 24kDa. Neighbor of COX4. 16 - 1 ctttcCGGCGGCGCcgggcgagaagcgcgac -58 tggagtctggggctgcggcttcccagagg NOL1 (nucleolar protein 1, 12 - 52 cggccgactaggcgcaacaggaagaggcggGG -7 120kDa). CCGCGAGgctagcgctacccgtgcgcct Non-metastatic cells 2, (NM23B) 17 + 832 gctgggcccTCCGGGGTGtggccaccccgcg 891 expressed in , nuclear gene ctccgccctgcgcccctcctccgccgccg encoding mitochondrial protein. Non-metastatic cells 2, (NM23B) 17 + 832 gctgggccctccGGGGTGTGGCcaccccgcg 891 expressed in , nuclear gene ctccgccctgcgcccctcctccgccgccg encoding mitochondrial protein. Non-metastatic cells 2, (NM23B) 17 + 832 gctgggccctccggggtgtggccaccccgcgctccg 891 expressed in , nuclear gene ccctgcgcccctcctCCGCCGCCG encoding mitochondrial protein. Nuclear autoantigenic sperm 1 + -49 gggcgaagggtcctcgtatataaaagggccCCGG 10 protein (histone-binding) CCGCGcggggtctctaatctgccatt transcript 2. (continued on next page)



Nuclear factor I/C (CCAAT- 19 + 6916 cgggggggggggtTGGGGGGGGcgggggg 6975 binding transcription factor). gtggtttggaaaaatgactcagtaagttcag Nuclear factor of kappa light 14 - 44 aaatccccagccagcgtttatagggcgccgCGGC -15 polypeptide in B-cells inhibitor, α. GGCGCtgcagagcccacagcagtccg Nuclear receptor binding protein. 2 + -29 gtgcggcaccgcccatcgttggtccgggcGCCGC 30 GAGGgcggggcccgcagttcggttgc Ovarian cancer overexpressed 1. 20 - 14 ggtTCGGCGGCGgcatccggcctcgcacttcc -45 ggtggggagattccggcctggagctccc P53-induced protein PIGPC1. 6 - -55 cctccccgcggcctcttcgcttttgtggcGGCGCC -114 CGCgctcgcaggccactctctgctg Peptidase D. 19 - -57 cttaccccgccccatgcattggcacCCGGAGGG -116 Gctcagctgacgccgcacttcacgtga Peroxiredoxin 3 , nuclear gene 10 - 28 ggcccgctcaccaccccgtaggccccgcccctgcgt -31 encoding mitochondrial protein. ctctgcccgccccgtGGCGCCCGa Pescadillo homolog 1, containing 22 - 38 caaacagacAGCGTGGGGtggggagggtcct -21 BRCT domain (zebrafish). cggggtccttggcagggcacgtgcgggagagcgtg ggg Phorbol-12-myristate-13-acetate- 18 + -13 cagggaagttctcactggacaaaagcgtggtctctG 46 induced protein 1. GCGCGGGGatctcagagtttcccg Phosphofructokinase, platelet. 10 + -61 cgcgggcggggcggcggttccgagtcaggcgcgc -2 gcggGCAGGGTCCccattgcctgctg Pinin, desmosome associated 14 + -2 tcattggctgagcccggctgtcagtcctttcgcgccTC 57 protein. GGCGGCGCGGcatagcccggct Pituitary tumor-transforming 1. 5 + -52 aattgGGCCGCGAGttgtggtttaaaccaggag 7 tgcgccgcgtccgttcaccgcggcctc Polymerase (DNA directed), 9 - -354 gaacggaaacctCGCAGGGTCagaccgtagc -413 epsilon 3 (p17 subunit). gacgcgggaagtccggacgcagtagctcc Polymyositis/scleroderma 4 + -25 tcccatttttCCGCCGCCGcgccttgatgacgtaa 34 autoantigen 1, 75kDa. ttttcctgcgcctcggggcgagcag Profilin 1. 17 - 49 cggggcGGGGGGGGGGaggagcaggaagt -10 ggcggtgcgagggctgctgcacagcgagcgg Profilin 2 , transcript variant 2. 3 - -31 taccgccgccgccgccgctgcgCCTGCTGCTc -90 ctcgccgtccgcgctgcagtgcgaaggg Profilin 2 , transcript variant 2. 3 - -31 tACCGCCGCCgccgccgctgcgcctgctgctcc -90 tcgccgtccgcgctgcagtgcgaaggg Profilin 2 , transcript variant 2. 
3 - -31 taccgCCGCCGCCGccgctgcgcctgctgctcc -90 tcgccgtccgcgctgcagtgcgaaggg Progesterone receptor X + -12 cccgcctcctccccggctagtctttggCCGCCGC 47 membrane component 1. CGaaccccgcgcgccactcgctcgct Protease, serine, 23. 11 + 39 gggtgGCGCGGGGGGcggacccgccagctg 98 cctgcgctgctcgccagcttgctcgcactc Proteasome (prosome, 15 + -10 tcccgctccccccgccccaacccagcggttctgcgc 49 macropain) subunit, α type, 4. atGCGCGGGGGccatattagcagc Proteasome (prosome, 15 + -10 tcccgctccccccgccccaacccagcggttctgcgc 49 macropain) subunit, α type, 4. atgcGCGGGGGCCatattagcagc Protein disulfide isomerase- 2 - -31 accggactggcctggggcgggacgtgGGCGCG -90 related protein. GGGGcgcggcgtgcggcacgctgcaggg Protein phosphatase 1, catalytic 11 - -39 ggagagccaggccggaaggaggctGCCGGA -98 subunit, alpha isoform. GGGcgggaggcaggagcgggccaggagctg Protein phosphatase 2 (formerly 5 - -134 ccgccgcctcctgacgCCGGGCGTGacgtcac -193 2A), catalytic subunit, α isoform. cacgcccggcgggcgccattacagagag Protein phosphatase 2 (formerly 5 - -134 CCGCCGCCTcctgacgccgggcgtgacgtc -193 2A), catalytic subunit, αisoform. accacgcccggcgggcgccattacagagag Rac GTPase activating protein 1. 12 - -269981 agagggaCGGCGCGGGaggaataaatttct -270040 ctgtgattggttggtgaaggttttcaaacc RAD23 homolog A (S. 19 + -12 gcattgcctgccccggaagtggtcggcgcGCG 47 cerevisiae). GCGCGGcgcgcctgggcgctaagatggc Regulatory factor X-associated 19 + 739 aagttgctttctgtcccggcagaggaagccagatc 798 ankyrin-containing protein. gctgAGGGTCCGGtctccagtttgc Ribosomal protein L35. 9 - 50 ggccgcgCCCGCCGCCacccggcagtagtt -9 gcggaggtcagccccgcctacttcctcttt (continued on next page)



Ribosomal protein L8. 8 - -237 gtacccgggccgcCCGGCCGCGctaatcgt -296 gagtcgcccccaggacccgtcgccatgggc Ribosomal protein S15a. 16 - 41 tctcccGCTTCCCATaactccgtgcgctctccc -18 ctcccgtcctctttccgccatctttcc Ribosomal protein S19. 19 + 288 tctcgcgagaccctacgcccgacttgtgcgcccgg 347 gaaaccCCGTCGTTCcctttcccct Ring finger protein 4. 4 + -23 ggcgggcggCCAATGGGGacatgatgggg 36 ggcggagccgaggcctccgaagcggaagtgg SA-ACT (skeletal alpha-actin). 1 - 287 caccccatcccctccggcGGGCAACTGGgt 228 cgggtcaggaggggcaaacccgctagggag SCAN domain containing 1. 20 - 35 ccgcaagcggcttCCGGGTGCTcgcgcgcc -24 gacctggacgcagagaagccagagactttc Scavenger receptor class B, 4 - -276 tccctccttgcagttggatcCCTGGCGGGtgc -335 member 2. ggcccggcccggcccgtgagcggcgcac Selenium binding protein 1. 1 - 66 gccCCACCCCCAgctggttgtataaattccctc 7 ccttcgctccttccccggaacagcggc Selenoprotein P, plasma, 1. 5 - 19 tgaggtaaacaacaggactataaatatcagagtg -40 TGCTGCTGTGGctttgtggagctgcc Serine (or cysteine) proteinase 18 + -49 tgtggGAGGGGCAAagctgtataaaaccagt 10 inhibitor, clade B (ovalbumin), cattaccatgtctgaactgtaacaactct member 2. Sialidase 1 (lysosomal sialidase). 6 - -79 gctttaaAGGGTCCGGGtcagctgactcccg -138 actctgtggagtctagctgccagggtcgc Small acidic protein. 11 + 19 aggggccgcaacggtgacgactgtggcagagaa 78 ggcCCGGAGGGGctctgcgttctgtag Small nuclear ribonucleoprotein 15 - 30 ggctgcgattggtgctGCCTGGCGGcgggg -29 polypeptide A'. cgcggggcacgctgggacgtctcgctggc Small nuclear ribonucleoprotein 15 - 30 cggctgcgattggtgctgcctggcggcggGGCG -29 polypeptide A'. CGGGGcacgctgggacgtctcgctggc SMC2 structural maintenance of 9 + 283 gggaggcggGGCGCGGGGcgcgggggct 342 chromosomes 2-like 1 (yeast). gtagggagggggaccagtggcagagggacctt Solute carrier family 35 (CMP- 6 + 3 aggggcggtccccggtgtcctgcgcgggggcgC 62 sialic acid transporter),member1. GGAGGGGGcgggcgtcagttccgcggg Solute carrier family 35 (CMP- 6 + 3 aggggcggtccccggtgtcctGCGCGGGGG 62 sialic acid transporter),member1. cgcggagggggcgggcgtcagttccgcggg S-phase kinase-associated 5 + -21 agtgCGGCCGCGAagcagagcgggctgta 38 protein 2 (p45). 
gagccttgcgcgcgcagtggggatggaacg Src homology 3 domain- 7 + 119 aacgggccagcgctgcaagaggcctacgtgCG 178 containing protein HIP-55. GGTGGTCaccgagaagtccccgaccgac START domain containing 7. 2 - 22 gaaaaagtagggtgcgcgcgcgccgcGCGT -37 GTGGCagtcgcggaaggcgcgggagcttgc Sterol-C4-methyl oxidase-like. 4 + 2 cggctCCACCCCCAagccaggcgaggcag 61 gttccgaggttggaacacctggcgagtcctc Succinate dehydrogenase 1 + -44 cccccagccggcgcgcctccgccctcgggtggcg 15 complex, subunit C, integral gggcCGCCTGGCGtcacttccgtcca membrane protein, 15kDa. SURF-2 (Surfeit 2 promoter) 9 + -125 gcagccccagctgcaacgcagccACCGCCG -66 CCatcgcacccggccccgcgggcgcttccg SWI/SNF related, matrix 22 + -32 tttgtttgagCGGCGGCGCGcgcgtcagcgt 27 associated, actin dependent caacgccagcgcctgcgcactgagggcgg regulator of chromatin, subfamily b, member 1. SYN1 (synapsin 1) X - 216 tgccttcgccccCGCCTGGCGGcgcgcgcc 157 accgccgcctcagcactgaaggcgcgctga Syndecan 1. 2 - -273 gagacctggcggagCTGGGGGTGggggg -332 ccagtttttgcaacggctaaggaagggcctgt Syntaxin 10. 19 - 92 gggggtGGCGGCGCGttctcctcggttggga 33 gggaaccagcccgcgaacccaggccggga T-cell- or lymphocyte-specific 1 + 22833 atcccaggtgggagggTGGGGCTAGGgct 22892 tyrosine kinase 1. caggggccgtgtgtgaatttacttgtagcct (continued on next page)


(Table A10 continued) T-cell- or lymphocyte-specific 1 + -23 cggagccctccggaggaggcaggaagtcagggt 36 tyrosine kinase 2. gggacgtgGGCGCGGGGagacaggtgg Testis derived transcript (3 LIM 7 + -37 gccattggctgcccggccccctttgttcccGGGT 22 domains) , transcript variant 1. CCGGGccgcaggcccgctgcggcgga Thioredoxin 2. 22 - 49 ctcttccgcctgcctgtacccggaagtgacgttatgt -10 acGTCCCCCCCCgaggaagtgac Thioredoxin. 9 - 58 cacgCCGGGCGTGccagtttataaagggag -1 agagcaagcagcgagtcttgaagctctgtt Thymidine kinase. 17 - -99 gctcgtgattggccagcacgccgtggtttaaagcg -158 gtCGGCGCGGGaaccaggggcttac Thyrotropin-releasing hormone, 3 + 400 gagCCGCCGCCTggcgcagatataagcgg 459 TRH gene. cggcccatctgaagagggctcggcaggcgcc Thyrotropin-releasing hormone, 3 + 400 gagccgcCGCCTGGCGcagatataagcgg 459 TRH gene. cggcccatctgaagagggctcggcaggcgcc Topoisomerase III alpha, TOP3 17 - 49 gagttgtagttccaggtcgctgtggggcgcgCGC -10 or TOP3A gene. CTGGCGGaagtagctggagaaggcgg Transcription elongation factor A 20 + 6065 cgcgcgcggccgcggggccgagggtttGAAC 6124 (SII), 2. CGGGGGtctgtcgtccgcggcggggctgc Transducin (beta)-like 3. 16 + -30 tcgcgcgcggtgacgccatcgcagcgcgccggg 29 aGTGTGGCGTtctgtgaagagttcggt Translocase of inner 10 - -34 ttaacGGGAACCGGcgcccggaaggtcagc -93 mitochondrial membrane 23 gtgtgaagtaggcgctggcaacgcggggtt homolog (yeast). Transmembrane 4 superfamily X + 17 cgcgcccgcCCGCCGCCTgccgccgccgc 76 member 2. cgccgccgccggagctctgtagtatggcatc Transmembrane 4 superfamily X + 17 cgcgcccgCCCGCCGCCtgccgccgccgc 76 member 2. cgccgccgccggagctctgtagtatggcatc Tropomyosin 2 (beta). 9 - -82 cgcccggtccctccccgccttttaGGCGCCCG -141 Cgtggccgggacgtcccagtcccgctcc Ts translation elongation factor, 12 + -25 ggcccttccgccgcgccgtaggtttgttgcctcacc 34 mitochondrial. agtgcgcccGCCGGAGGGtgttta Tumor susceptibility gene 101. 11 - -33 ggtTGGGGGTGTGcgattgtgtgggacggtc -92 tggggcagcccagcagcggctgaccctct Tumor-associated calcium signal 2 + 114 aggcggggccgccaggtcgggcaggtgtgcgct 173 transducer 1. ccgCCCCGCCGCgcgcacagagcgcta Tyrosyl-tRNA synthetase. 
1 - -623 ttccggccgctgcCGGAGGGGTccaggccg -682 agtaagcggagcgccgagcccagctgatg Tyrosyl-tRNA synthetase. 1 - -623 ttccggccgctGCCGGAGGGGtccaggccg -682 agtaagcggagcgccgagcccagctgatg U2(RNU2) small nuclear RNA 21 - -5 ggtgacgtctcccgagggcgtcggcagggTCG -64 auxillary factor 1. GCGGCGtcggcagcagtgtcgacggcag Ubiquitin A-52 residue ribosomal 19 + 4 cgtgcgcaagcgcttTCGGCGGCGattaggt 63 protein fusion product 1. ggtttccggttccgctatcttctttttct Ubiquitin-conjugating enzyme 10 + -20 ccagaagcaaaagcggaaacaaaagagtctcg 39 E2D 1 (UBC4/5 homolog, yeast). ccggcgtcccCGCCCGCACactcgcgca Ubiquitin-conjugating enzyme 22 + 19 ctccagccgcccggCCGGCCGCGAtgcatt 78 E2L 3. ctggggaaggagcagcaccaaatccaagat UDP-Gal β 1,4-galacto- 1 - 19 cgccccgaggccccgccccatgacgcgaGAC -40 syltransferase, polypeptide 3. CCCGCCcccgcagcgcccgcttccaagat Uncoupling protein 2 11 - 37 ccgaggcttaagccgcgCCGCCGCCTgcg -22 (mitochondrial, proton carrier). cggagccccactgcgaagcccagctgcgcgc Uracil-DNA glycosylase , nuclear 12 + 526 gcggtggggcgggtCTGGCGGGGGcggg 585 gene encoding mitochondrial gcacctctgtgcagggttcccagtcaccgcga protein, transcript variant 1. VAMP (vesicle-associated 18 + 50 gggtcgccgaggctcgcaagtgcgcgtggccgtg 109 membrane protein)-associated gcggctggtgtGGGGTTGAGtcagtt protein A, 33kDa. Vitamin A responsive; 3 + 26 caactctccaactcagctcagctgatcggttgCC 85 cytoskeleton related. GCCGCCGccgccgccagattctggag Williams Beuren syndrome 7 - 18 gcgggcgccgagccGGGTGCGTTTtgcca -41 chromosome region 20A. cagagccgtaaaggcgcgcgggaacatgggg Zinc finger protein 259. 11 - 32 gagaggggtggGGCCGCGAGagcggcgg -27 aagtaggaagccgaggtctgaattgcgcgtgg 162

Table A11. IA-1 Short Target Genes

Gene Description Ch 60n Sequence with Binding Site A1 (1) COL (alpha1(1) collagen 17 - -279 ccctcctcctccccctctccattccaactcccaaattg -220 GGGGCCGGGgccaggcagctctg A disintegrin & metalloproteinase 1 + 16 ttctccgaggcgacctggccgccggccgctcctcc 75 domain 15 (metargidin). GCGCGCTGTtccgcacttgctgccc Acid sphingomyelinase-like 1 + -47 tggcttccacttaaGGGTCCGGTatgcctgcc 14 phosphodiesterase. tcctcgggccagcccagatcataccctg Activated RNA polymerase II 5 + 9 cgtgaccgcagccccagcgcggcgGGGCCG 68 transcription cofactor 4. GCGtctcctggctgccgtcacttccggttc Activating transcription factor 4 22 + -40 ccccatagagacgaagtctataaaGGGCCG 9 (tax-responsive enhancer GCGggcggccacggcagccatttctacttt element B67). Activating transcription factor 4 22 + -40 ccccatagagacgaagtctataaagggcCGGC 9 (tax-responsive enhancer GGGCGGCcacggcagccatttctacttt element B67). ADA (adenosine deaminase) 20 - 55 ccggcgagagggcgggccccgggaacggcggc -4 ggGCGGGGCGGgaggcggggcccggccc Adaptor-related protein complex 19 + 20 gaccggacataacGGTCCCCGCGCGGct 79 1, mu 1 subunit. ccccgaaccggaagtggaggtgagctgtcgcg Adipose differentiation-related 9 - -34 ggagGGGCGGCGAggcggggtttatagccc -93 protein. gggcgcccgcgggccccacgctttgaccgg ADP-ribosylation factor-like 1. 12 - 18 ggccaccgttcCCGGCGCGCagtcgcagct -41 gaccctcgctcccgcccccgcctggagtcc ADP-ribosylation factor-like 2. 11 + 13 cgagcgtgatagccaacaggaaccgggagCG 72 GGGTCCCgggactgggaagaaacggcggc ADP-ribosyltransferase (NAD+; 1 - 42 ccccgccccgtggacgcgggttCCGTGGGC -17 poly (ADP-ribose) polymerase). Gttcccgcggccaggcatcagcaatctatc Alcohol dehydrogenase 1C (class 4 - 35 ggtgttatatgagcaaacaaaataaatacctgtgC -24 I), gamma polypeptide. AACATACCtgctttatgcactcaag Aldolase A. 16 + 12602 cgccgccccttccgaggctaaatcGGCTGCG 12661 TTCctctcggaacgcgccgcagaaggggt Alpha-actinin-2-associated LIM 4 - -2 gtgacgtcacccccagcGGGGATAAAgcgc -61 protein. ccccgcccgggtcggggccaggacgccgcc Amyloid beta (A4) precursor-like 19 + 67 gggagctcctgtcaccgctGGGGCCGGGcc 126 protein 1. 
gggcgggagtgcaggggacgtgagggcgca Amyloid beta (A4) precursor-like 19 + 67 gggagctcctgtcaccgctggggccggGCCGG 126 protein 1. GCGGgagtgcaggggacgtgagggcgca Anillin, actin binding protein 7 + -10 atgctcgcggctcggcgctgaaattcaaatttgaac 49 (scraps homolog, Drosophila). ggctGCAGAGGCCGagtccgtcac Annexin A4. 2 + 190 gtaaaggggctccGGGCCGGGGgtcctggg 249 ctcagggaacggggagcgtggcccgggagc Annexin A5. 4 - 37 agccgggcaGGGCCGGGGtggggccgctg -22 gcgtttccgttgcttggatcagtctaggtgc Apolipoprotein A-I binding 1 + -49 gtgctggcccggcctcttcgggggcggggcgagc 10 protein. gccgcacatgcgccGGGGCCGGGccg Apolipoprotein B. 2 - 49 ctcttgcagcctgggcttcctataaatggggtgcgg -10 gcgcCGGCCGCGCattcccaccgg ARD1 homolog, N- X - 48 tccggccggtggccggcCCGGCGCGCacc -11 acetyltransferase (S. cerevisiae). gccccttccgccgtcgcccagcgagcccagc Arginase liver. 6 + -47 gttgtttattcaacccaagtATAAATGGAaaaa 12 aaagatgcgccctctgtcactgagggt AS (argininosuccinate 9 + 116 tgtgaacgctgagcggctccaggcgGGGGCC 175 synthetase) GGGcccgggggcggggtctgtggcgcgcg ATP synthase, H+ transporting, 10 + -20 actgaagaagagagcaaggtgggaggGGCG 39 mitochondrial F1 complex, CGCTGgggagcttcggcgcatgcgcgctga gamma polypeptide 1. ATPase, H+ transporting, 7 + -32 gggtgggggttgaGGCCGACGGGGcgcc 27 lysosomal 14kDa, V1 subunit F. gtacggcggaggcggggtttcagtggcttctg (continued on next page)


(Table A11 continued) ATP-binding cassette, sub-family 7 + -20 caccgccaaggcgcgagggggttgtcgggatgg 39 B (MDR/TAP), member 8. gggcgggaGCCAACATAgagccctcag bbc3 (bcl-2 binding component 3) 19 - -1405 taggggggggcgcggcgcgcgcctgcaagtcct -1464 gacttgtccgCGGCGGGCGggcggggc B-cell leukemia/lymphoma 2 18 - -636 ctcttctttctctGGGGGCCGTggggtgggag -693 proto-oncogene, BCL2 gene. ctggggcgagaggtgccgttggcccccg B-cell receptor-associated protein X - -324 ccggtttccggccgcggtatgaggggcGGGGC -376 31 (BAP31). CGGGGctgctgtgggagagttctgttgc B-cell receptor-associated protein X - -324 ccggtttccggccgcggtatgaggggcggGGC -376 31 (BAP31). CGGGGCtgctgtgggagagttctgttgc B-cell receptor-associated protein 7 + -49 cccctcccccgcccagccgcggcgtctgacgtccc 10 BAP29. gcgCGTCGGCGGccgcggagcagcg Benzodiazapine receptor 22 + -28 aggggcggggcctggcggctgggaggggcggg 31 (peripheral). gcGGATGCGGGGacagcggcctggctaa Benzodiazapine receptor 22 + -28 aggggcggggcctggcggctgggagggGCG 31 (peripheral). GGGCGGatgcggggacagcggcctggctaa Bifunctional apoptosis regulator. 16 + -48 acGGGGCCGGGGcacgcttcccccactctg 11 tcttgttacttccggtagcgaagcctctcc Block of proliferation 1. 8 - 0 ctcctgcgcacgcggcccggtcgctgtcggaagc -59 ggctgtgcgggtggcGGCCGGCGCGC Calponin 3, acidic. 1 - -26 ggggcggcctgtggggaaccgaggtGCGGG -85 CGGCGAgcgaggcagccgggtgcttcgcag Calreticulin. 19 + -30 gcgggtgggtataaaagtgcaaGGCGGGCG 29 GCGgcgtccgtccgtactgcagagccgctg Carbonic anhydrase II. 8 + 54 cacgaagttggcgggagcctataaaagcGGGC 115 CGGCGCGacccgcggacacacagtgcag Carbonic anhydrase III muscle, 8 + -24 ggccatgcaagtGTGCGGGGGagctacata 34 CA3 gene. aaagcgcgggctcgcgccgactctgcacca Carboxypeptidase A1 7 + -4 agtcctcggtcctgggagggtttaaaagccaGG 45 (pancreatic). GGGCCGTctcgacctcagtctgacctt Caspase 6, apoptosis-related 4 - 19 cggggccgccgaggaagggccgagggcGGG -40 cysteine protease. GCCGGGcccgggagcctgtggcttcaggaa Catenin (cadherin-associated 5 + -47 gggggccgcgggcggggggcGGGCCGGG 12 protein), alpha 1, 102kDa. Ggcgggggcgtggggcggcccatttcctcctc Cathepsin B , transcript variant 1. 
8 - -1 gggcGGGGCCGGGagggtacttagggccg -60 gggctggcccaggctacggcggctgcagggc Cathepsin B , transcript variant 1. 8 - -1 gggcggggccgggagggtacttagGGCCGG -60 GGCtggcccaggctacggcggctgcagggc Cathepsin C , transcript variant 1. 11 - 12 tttccGGGCCGGCGtagctatttcaaggcgcg -47 cgcctcgtggtggactcaccgctagccc caveolin 1 7 + 104 ccaaccgcgagcagaacaaacctttGGCGGG 163 CGGCcaggaggctccctcccagccaccgc CD34 (surface glycoprotein) 1 - 613 tccctagtggcttagacccagggctggagaGGG 554 GATAActggtgagaaggcatccaggga CD63 antigen (melanoma 1 12 - -68 ccggggcggggccgcgcggcagGCGGGGC -127 antigen). GGgagccggggggcgcagctagagagccccg CD63 antigen (melanoma 1 12 - -68 ccggggcggGGCCGCGCGGcaggcggg -127 antigen). gcgggagccggggggcgcagctagagagcccc g CDC42 effector protein (Rho 11 + 588 gggtccgcggtgcactctgtaagttcaccgCCG 47 GTPase binding) 2. GTCGGGtccggccgccgcgctgtccag CDH1 (E-Cadherin) 16 + -1992 gcctgtaatcccagcacTTTGGGAGGctgag -1933 gcaggtggatcatctgaggacaggagttc CDK5 regulatory subunit 17 + 10 gccacaacgccactggattggtggtaggGCGG 69 associated protein 3. GGCGGgccacagtctccagcctgaagcg CEA (carcinoembyonic antigen) 19 - -1374387 ggagggacaaaagaggcagaaatgaGAGG -1374328 GGAGGggacagaggacacctgaataaagac c c-fos; Gene: G000218. 14 + -342 tcaatccctccccccttaCACAGGATGtccata -283 ttaggacatctgcgtcagcaggtttcc (continued on next page)


(Table A11 continued) CGI-69 protein. 17 - 27 gcgggcggggcctcgtggcgCGGTGTAGG -32 ggcggggcccccttggctgctttcggcgtcg Chaperonin containing TCP1, 2 - 27 ccagcGCCGCGCGGcaaggaaggccccct -32 subunit 4 (delta). tctccgcctccgcctcctcccgacgccggcg Chaperonin containing TCP1, 7 + 32 ggcggcgcgcgggcacgctGGGGGCCGG 91 subunit 6A (zeta 1). ccagacgggccgacttttccagaagacccgga Chloride channel, nucleotide- 11 - 15 gactaagttgttcttccggggtgactgcctcttccag -44 sensitive, 1A. GGCGGGCGGtgtggtgcacgcat Chromobox homolog 3 (HP1 7 + 221 cccctccccccggcggCCCCGCGCGcagct 280 gamma homolog, Drosophila) cccggctccctcccccttcggatgtggctt Chromosome 20 open reading 20 + 17 cggtgcgccggaaGTGGCTGCGgatttcgc 76 frame 43. cggaaatcccggaagtgacagctttggggg Cleavage and polyadenylation 2 + -157 gaagcctcgcgagcggcgtcccccgaccggaag -98 specific factor 3, 73kDa. tcGAGGGGAGGAGtctagtgccgcggg Cofilin 1 (non-muscle). 11 - -104 cccctcattgtgcggctcctactaaacggaaGGG -163 GCCGGGagaggccgcgttcagtcggg CS-1 (chorionic 17 - 98 tgatcccagcatgtgtGGGAGGAGCttctaaa 39 somatomammotropin) ttatccactagcacaagcccgtcagtgg Cystatin S. 20 - 62 gggctgggctgccaaagcaGGATAAATGca 3 cacctgcctgctggtctgggctccctgcct Cysteine and glycine-rich protein 12 - 27 taaaaggcaCCGGCGCGCgcttcgcctggg -32 2. atctcggactccctggaccctccctccagc Cysteine-rich protein 2. 14 + -13 ccgggcaggcggggctgggcGCGGGCGG 46 CGgcggcccggaggagaacgggcggagggc gc Cytidine monophosphate N- 12 + 1 atCGGGCGGCGccgagctgaggtggtgag 60 acetylneuraminic acidsynthetase. ggactagctcccggatgtggagaagctgggg Cytochrome b-245, alpha 16 - 40 gGCGGGGCGGggttcggccgggagcgcag -19 polypeptide. gggcggcagtgcgcgcctagcagtgtcccag DHFR (dihydrofolate reductase) 5 - -383 tcgcctgcacaaatggggacgagggggGCGG -442 GGCGGccacaatttcgcgccaaacttgac DiGeorge syndrome critical 22 - 6 gggagGCGGGGCGGggcggcgcggcgcg -53 region gene 2. gagccgagcaggacggccgccatcttgcgcgc Dihydrofolate reductase. 5 - -371 gggggcggggcctcgcctgcacaaatagggacg -431 aggggGCGGGGCGGccacaatttcgcg DKFZP586F1524 protein. 
17 - -15 accaggcgggccaagagggccgggacGCCG -74 CGCGGggcagtgtgggactggggcggaacc DNA nucleotidylexotransferase, 10 + 59 agtctgctggtgagatgacatcaaaacccttcGT 118 DNTT or TDT gene. GTAGGAGggtggcagtctccctccct DnaJ (Hsp40) homolog, 9 + 15 gacgcgcccgggttcggctacaaaagaggacgg 74 subfamily A, member 1. ctgcggcgcGCCGGGCGGaactttcca Dynactin 4. 16 + -56 cagtgggccacctggagcggaagtGGAGGA -6 GCGGccggaagtagccggaatctctgaaag EGF-containing fibulin-like 11 - -49 caggcgggcgggcggggggcgcttcctggGGC -108 extracellular matrix protein 2. GGGCGGtccagggagctgtgccgtccgc EGFR (epidermal growth factor 7 + 17 cgcagcagcctccgcCCCCCGCACGgtgtg 76 receptor) agcgcccgacgcggccgaggcggccggagt Electron-transfer-flavoprotein, 19 - -26 tggtggggcttctggactgagccgctgagggtgcg -85 beta polypeptide. ggctgaccctgtaaGTGGCTGCGgc Enolase 1, (alpha). 1 - 14 ggtggggctcgccttagctaggcaggaagtcggc -45 GCGGGCGGCGcggacagtatctgtgg Enolase 3, (beta, muscle) , 17 + -21 gggaccgagtggctcaGGGATAAATGcgc 38 transcript variant 1. agcctgagagggggtgagctgacactgtccc Enoyl Coenzyme A hydratase, 10 - -13 tgaaaggctataggcgagggggcggctccgcG -72 short chain, 1, mitochondrial. GGGCCGGGcgaggagtccagagagccat Epidermal growth factor receptor, 7 + 31 cCCCCCGCACGgtgtgagcgcccgccgcg 91 EGFR or ERBB1 gene. ccgaggcggccggagtcccgagctagccccg Equilibrative nucleoside 10 + -43 ggggcggggcGGGGCCGGGGgaggagc 16 transporter 3. ccgcctgccgcctgccaagcccagtggtcctgg Equilibrative nucleoside 10 + -43 ggggcggggcggggccgGGGGAGGAGC 16 transporter 3. ccgcctgccgcctgccaagcccagtggtcctgg (continued on next page)


(Table A11 continued) Equilibrative nucleoside 10 + -43 gggGCGGGGCGGggccgggggaggagc 16 transporter 3. ccgcctgccgcctgccaagcccagtggtcctgg Eukaryotic translation elongation 2 + -28 gggaaggacaacttggacccgcagttccgccgg 31 factor 1 beta 2. aagtggccccagcctcGAGGCCGGGcg Fatty acid desaturase 1. 11 - -173 gaaaacCCGGCGCGCaggcggctggctct -232 gggcgcgcgccagcaaatccactcctggagc Ferritin light chain. 19 + -49 tccgaGGGCCGGCGCaccataaaagaag 10 ccgccctagccacgtcccctcgcagttcggcg FK506 binding protein 1A, 12kDa 20 - -37 agggctGGGCGTGAGggggcgtgcgcgtg -96 , transcript variant 12B. cgcaggcgacgcgccgaggtactaggcagag FN (fibronectin) 2 - 133 cggcgggcgggcgggcgggtggggtgggGCG 74 GGGCGGggacagcccggcgggtctctcct Follicle stimulating hormone-beta, 11 + -49 AGTTGCACAtgattttgtataaaaggtgaactg 10 FSHB gene. agatttcattcagtctacagctcttgc FXYD domain containing ion 19 + -30 ccgtctGGGTCCCCGCccgctcgcgctcccc 29 transport regulator 5. tggccacaccctccgcctggacgcagcag GDP dissociation inhibitor 1. x + 209 cggaggggtcgggcgaCGGCCGACGcgcc 268 gccatctttggtccagtgcggtggcggcggc Gelsolin (amyloidosis, Finnish 9 + 11 ccctgcccacccCGGCCGCGCGcaccaca 70 type). acgcccccgccccgccgcccggaaccagctg GH (growth hormone gene-1) 17 - 168 tgacaagccagggggcatgatcccagcatgtgtG 109 GGAGGAGCttctaaattatccattag Glucosamine-6-phosphate 5 - 21 cgccgcgtgacccgcagccacgtgaCCCGCG -38 isomerase. CGGggccgggccgccggcgcagtcgctgc Glucosamine-6-phosphate 5 - 21 cgccgcgtgacccgcagccacgtgacccgcgcG -38 isomerase. GGGCCGGGccgccggcgcagtcgctgc Glucose-6-phosphate x - -673 aggcgcccgcccccgcccccgccgattaaatGG -732 dehydrogenase. GCCGGCGgggctcagcccccggaaacg Glutathione synthetase. 20 - 52 ctgctctttaagcctaGCCGGGGCGgtcagc -7 gcaagcgcactgggtcgcatcgaggcccc Glycine amidinotransferase (L- 15 - -189 ccggggcggagcgcggctacaaaaggcctcgg -248 arginine:glycine gCCCCGCGCGcccgcccaccccgctccg amidinotransferase), Glycine cleavage system protein 16 - -42 cctcccggcgcgggtggccgaggcgtagcgctgc -101 H (aminomethyl carrier). 
GACCCCCGCACccctgcgaacatggc Glycophorin C (Gerbich blood 2 + -2 cagagcccctcccctcggCCCGCGCGGga 57 group). ggagtgtgacccaggtgccgcttcctctcgc Heat shock 60kDa protein 1 2 - -360 cgcggtgccGCGGGGCGGgagtagaggcg -419 (chaperonin). gagggaggggacacgggctcattgcggtgtg Heat shock 70kD protein 8. 11 - 49 aaggttctaagatagggtataagaggcagggtG -10 GCGGGCGGaaaccggtctcattgaact Heat shock 70kDa protein 5 9 - 10 CGACGGGGCTgggggagggtatataagcc -49 (glucose-regulated protein, gagtaggcgacggtgaggtcgacgccggcca 78kDa). Heat shock 70kDa protein 9B 5 - 85 agtcgagtatcctctggtcaggcggcGCGGGC 26 (mortalin-2). GGCGcctcagcggaagagcgggcctctg Heat shock protein 75. 16 - -24 ccgtcccgccccttcccatcgtgtacggtcccgCG -83 TGGCTGCGcgcggcgctctgggagt Hematological and neurological 17 - -130 aaaaccctataaaggcgtcGATCGGCCGga -189 expressed 1. caggcggcagcggcggctcctgcagcggtg Heterogeneous nuclear 9 - -387 gggcgcgacggcggggaggacgcgagaaggc -446 ribonucleoprotein K. gggGGAGGGGAGcctgcgctcgttttctg High mobility group nucleosomal 6 - -52 ggtcgaagaggccgggctacgtcgtgccctgcgc -52 binding domain 3. gtgagcagctgcagaGGCAGAGGCag High-mobility group chromosomal 21 - 48 ggggcggcccggccggcggggagggggagcc -11 protein-14. cgcggccggggacgcggGGGGAGGAGga g Histamine N-methyltransferase. 2 + 193 ggaagagaAGGAGGGGAacttaaaccttgc 252 ttcctgctctgtctttctcagaaaaccaaa Histone H3.3. 1 + -47 gGCGGGGCGGcgtgtgttgggggatagcct 12 cggtgtcagccatctttcaattgtgttcgc (continued on next page)


(Table A11 continued) HMG-box containing protein 1. 7 + -61 ggggctcaacccccagggggttGAGGGGAG -2 Ggggaagacgaagcttgaaagacttggtaa HNF4A (hepatocyte nuclear 20 + 45482 tgggaggaggcagtgggagggcggaggGCG 45541 factor 4-alpha) GGGGCCttcggggtgggcgcccagggtagg HNOEL-iso protein. 1 + -16 agagagacccagagatcaggagagaaggcacc 43 gccccCACCCCGCCtccaaagctaaccc Homology group 94; Mammalian 3 - 32 agcctgacgtcagagagagagtttaaaacagag -27 somatostatin ggagacggttgagaGCACACAAGccgc HTR1A (5-hydroxytryptamine 5 - 649 tcctcggggttggggaagtattAGGAGGGGA 590 (serotonin) receptor 1A) gggttagagtgggagggaaggagcctggc Hydroxyprostaglandin 4 - 77 cgcgcgcgcgcgcaggggggcataaaagccgC 18 dehydrogenase 15-(NAD). GGCCGCGCGGagacgcggagctcgccca Hypoxanthine X + -29 gccTGCGGGGCGtggcggggcgggcaga 30 phosphoribosyltransferase 1. gggcggggcctgcttctcctcagcttcaggcg IgA(germline) 14 - 4320 cagaccacaggccagacatgacgtggaGCCG 4261 CGCGGccgtgtctgctggggcaggaagtg Insulin, INS gene. 11 - 49 gggagatgggctctgagactataaagccaGCG -10 GGGGCCcagcagccctcagccctccagg Insulin-like growth factor binding 7 - 49 cctgggccaccccggcttctatatagcGGCCG -10 protein 3. GCGCGCccgggccgcccagatgcgagca Insulin-like growth factor binding 4 - 61 gGCCGGGCGGggcggcgggtttaaggcgc 2 protein 7. cggccggcccgacacggctcactcgcgccct Insulin-like growth factor binding 4 - 61 ggccggGCGGGGCGGcgggtttaaggcgc 2 protein 7. cggccggcccgacacggctcactcgcgccct Integrin-linked kinase. 11 + -9 ctgcccaagagcgccacggGCGGGGCGG 50 ggccggcggcgggctgcgggcgcggccggacg Integrin-linked kinase. 11 + -9 ctgcccaagagcgccacgggcggggcgGGG 50 CCGGCGgcgggctgcgggcgcggccggacg Interferon gamma receptor 1. 6 - -23 cgtaAGGCCGGGGCtggagggcagtgctg -82 ggctggtcccgcaggcgctcggggttggagc Interferon regulatory factor 3. 19 - 31 tgcatgtgacgctcccagcatgcctctggccggaa -28 acCCAAAAAAGggcataggggcggg Interleukin 10 receptor, beta. 
21 + -49 agcggggtgggagcccggtcgcccggcccctcc 10 cCACCCCGCCccgcccatctccgctgg IR (insulin receptor) 19 - 523 aggcggggaggcgggcggggcggggcGGG 464 GCCGGGcggcacctccctcccctgcaagctt IR (insulin receptor) 19 - 523 aggcggggaggcgggcggggcggggcgggG 464 CCGGGCGGcacctccctcccctgcaagctt IR (insulin receptor) 19 - 523 aggcggggaggcggGCGGGGCGGggcg 464 gggccgggcggcacctccctcccctgcaagctt KIAA0102 gene product. 11 + -25 ctgtcagctccagccaatcggcttcagacaagtcG 34 GAGGGGAGggagacgcagaggcgga L-3-hydroxyacyl-Coenzyme A 4 + 20 gggacgccgggggcgcgcgggctgcagggccg 79 dehydrogenase, short chain. cgtaGGTCCCCGCccccagagtctggct Lactate dehydrogenase C. 11 + -12 cGAGCGGGGGaacgtgcgtgtctcgagtcg 47 cacggagggcaaccgtcgacgggcttagcg Lectin, galactoside-binding, 22 + -19 ggctcacccGGTCCGGTCcagttaaaaggg 40 soluble, 1 (galectin 1). tgggagcgtccgggggcccatctctctcgg Lectin, galactoside-binding, 14 + 46 ggGCCGGGCGGggctgggagtatttgaggc 105 soluble, 3 (galectin 3). tcggagccaccgccccgccggcgcccgcag Legumain. 14 - 9 gtcaccgcggcacagtggcccttaagcGAGGA -50 GCGGcggcgcccgcagcaatcacagcag Likely ortholog of mouse RNA pol 9 + -49 cggggacacgcggctcGCGCGCTGTgggc 10 I associated factor, 53 kD. ggtgcccggcggggccacgccttttccggcc Likely ortholog of mouse 11 + 221 cctcCCCCGCGCGctggcgcggggctttctg 280 synembryn. ggccagggcggggccggcgaactgcggcc Likely ortholog of mouse 11 + 221 cctcccccgcgcgctggcgcggggctttctgggcc 280 synembryn. agggcgGGGCCGGCGaactgcggcc Lipocalin 7. 1 + -19 tgcagagGGCGGGCGGtacaaaaagcgcc 40 ccgccccgcgctcctctcttgactttgagcg (continued on next page)


(Table A11 continued) Lysophospholipase 3 (lysosomal 16 + -20 gGGATGCGGGcggcggggttaagcgcgtc 39 phospholipase A2). gccaccgcccccgcctaggcgagagcccaga Lysophospholipase 3 (lysosomal 16 + -20 gggatGCGGGCGGCGgggttaagcgcgtc 39 phospholipase A2). gccaccgcccccgcctaggcgagagcccaga Major histocompatibility complex, 6 - 16 gcgtcgCCGGGGTCCCagttctaaagtccc -43 class I, B. cacgcacccacccggactcagagtctcctc Mannose-6-phosphate receptor 12 - 44 tggtggGAGGAGCGGttgcccagcggcctctt -15 (cation dependent). ggcgcttcctgtttccggttcccagagt Mannosyl (alpha-1,3-)- 5 - -7284 cggggaagggccacgttgcccgcccggccgtcc -7343 glycoprotein beta-1,2-N- ggccCCGGCGCGCcgcagaaagggctg acetylglucosaminyltransferase. MDR1 (multidrug resistance gene 7 - -231335 ttggctgggcaggaacagcGCCGGGGCGtg -231394 1) ggctgagcacagccgcttcgctctctttgc Microsomal glutathione S- 12 + 463 ctggagaggggcggtgcctgcgtccggCCCGC 522 transferase 1 , transcript variant GCGGccacagtccctgcattgcgcgcga 1b. Microtubule-associated protein, 20 + 12 gcgtaacgagggggtgcgtgtgaggtcatcgcgc 71 RP/EB family, member 1. gGGCGGGCGGgcggggtctggcggtt Midkine (neurite growth- 11 + 566 tggggagaggggGCGGGGGCCcatgtgac 625 promoting factor 2). cggctcagaccggttctggagacaaaagggg Mitochondrial ribosomal protein 1 - -30 tttcctgccgccgacgCCGGGGTCCtccctca -89 L9. ggtctcattctgtgcctgtgaacatggc Moesin. X + -21 cacataaaggaGCGGGCGGCGcgacagg 38 ggcggctctttcctgggtggggtttgtgaagt Myeloid leukemia factor 2. 12 - -360 tccgttggccgaGGGGGCCGTacggaggtg -419 gcagctgtgggaggaggcggcgtggaaggc NADH dehydrogenase X + 31 ggcaccgccccgctcGCGGGCGGCcgcgg 90 (ubiquinone) 1 alpha ggcttgctgggaagagaggcgaagccaggtc subcomplex, 1, 7.5kDa. NADH dehydrogenase 1 + 3003 ccaagaaggaGGCGCGCTGgagttacttcc 3062 (ubiquinone) Fe-S protein 2, gcccggttctccttcccgcagtctgcagcc 49kDa (NADH-coenzyme Q reductase). Neuropilin 1. 10 - -285 aagggagaggaagccggagctaaatgACAG -344 GATGCaggcgacttgagacacaaaaagaga Niemann-Pick disease, type C2. 
14 - -28 gggcccgccccgacaggtttgtcttgtgaccGCG -87 GGCGGCcgctgcttctttcccgagct N-Myelocytomatosis virus 29 2 + -15 gcgcaggctgtgacagtcatctgtctggacgcgct 44 oncogene, MYCN or NMYC. gggtGGATGCGGGGggctcctggga N-Myelocytomatosis virus 29 2 + 140 acccgcgcagaatcgcctccggatcccctgcaG 199 oncogene, MYCN or NMYC. TCGGCGGGaggtaaggagcagggcttg Nuclear autoantigenic sperm 1 + -49 gggcgaagggtcctcgtatataaaagggcccCG 10 protein (histone-binding) , GCCGCGCGGggtctctaatctgccatt transcript variant 2. Nuclear factor I/C (CCAAT- 19 + 6916 cgggggggggggttgggggggGCGGGGG 6975 binding transcription factor). GGtggtttggaaaaatgactcagtaagttcag Origin recognition complex, 2 - -781 gggagaggggcggggtgggagaggtagtgcgtg -840 subunit 4-like (yeast). cgcggggCATGCCGGGagtggttgtgt Ornithine decarboxylase 19 + -50 gcgtggcctgggcgcagcatctataaaggcgggc 9 antizyme 1. ggcGGCAGAGGCgccattttgcgaac Ornithine decarboxylase 19 + -50 gcgtggcctgggcgcagcatctataaaGGCGG 9 antizyme 1. GCGGCGgcagaggcgccattttgcgaac PDGF, A-chain (platelet-derived 7 - -559481 gaatccggggagGCGGGGGGGggggggc -559481 growth factor A-chain) gggggcgggggcg PDZ and LIM domain 1 (elfin). 10 - -72 tctGTCCCCGCGggtcgtcgcccgccacagc -131 cgcgccatgaccacccagcagatagacct Peroxiredoxin 4. X + -95 tagggggggtagGGGGAGGAGCtggtctcg -36 cgaggccgcgccccggttcgcgcgggcgtc PFKM (muscle 12 + -71 tgccatatctgagctgagtacttagGGGGAGG -12 phosphofructokinase) AGgaagaggaggagaaaggcaagcagga (continued on next page)

Phosphofructokinase, liver. | 21 | + | -15 | cgcgcGCGGGGCGGggcggggacggcgacgcggcgcaggcggcgggagtgcgagctgggc | 44
Phosphofructokinase, platelet. | 10 | + | -61 | cgcgggCGGGGCGGcggttccgagtcaggcgcgcgcgggcagggtccccattgcctgctg | -2
Phosphogluconate dehydrogenase. | 1 | + | -8 | GCGTGAGCGGccgcagtttctggagggagccgctgcgggtctttccctcactcgtcctcc | 51
Phosphoglycerate kinase 1. | X | + | 37 | ggtgtggggcggtagtgtgggccctgttcctgCCCGCGCGGtgttccgcattctgcaagc | 96
Phosphomevalonate kinase. | 1 | - | -38 | actgttctaagtgagttcgggtgggggagcttcacGAGGGGAGGctgctctgtgaaggaa | -97
Phosphoribosyl pyrophosphate synthetase 1. | X | + | 37 | gaccccacctccgccgctttgggtaatttagagccgcgcGCCGGGCGGgaatgtaagatg | 96
Plasma retinol-binding protein. | 10 | - | 52 | cgcgaccccctccccccggcgctataaagcaGCGGGGCGGccgcggcgcgctcgcctccc | -7
Plasma retinol-binding protein. | 10 | - | 52 | cgcgaccccctccccccggcgctataaagcagcggggcggccgCGGCGCGCTcgcctccc | -7
Plastin 3 (T isoform). | X | + | -37 | GGAGGGGAGGgagcgctggctttagagccacagctgcaaagattccgaggtgcagaagtt | 22
Polo-like kinase (Drosophila). | 16 | + | -30 | cagcgccgcgtttgaattcGGGGAGGAGCGgagcggtgcggaggctctgctcggatcgag | 29
Polyamine-modulated factor 1. | 1 | + | -17 | ggGAGGCCGGGccagttagatttggaggttcaacttcaacatggccgaagcaagtagcgc | 42
Polymerase (DNA directed), delta 2, regulatory subunit 50kDa. | 7 | - | 49 | gggcggggaaGCCGCGCGGggattagcgagttgcggcgatgggcggggcaggcgcgcggg | -10
POU domain, class 2, transcription factor 1. | 1 | + | -69 | GGAGGGGAGccagagcgagggagggtttatcgaccgggcgattttggttaaaatattcaa | -10
Pregnancy-specific beta-1-glycoprotein 5. | 19 | - | 51 | agagAGGAGGGGAcagagaggtgtcctgggcctgaccccacccatgagcctgagaagtgc | -8
Pregnancy-specific beta-1-glycoprotein. | 19 | - | 43 | agAGGAGGGGAcagagaggtgtcctgggcctgaccctgcccatgagcttgagaattgctc | -16
Prion protein (p27-30) (Creutzfeld-Jakob disease,), | 20 | + | 259 | gccccgcgtccctccccctcggCCCCGCGCGtcgcctgtcctccgagccagtcgctgaca | 318
Profilin 1. | 17 | - | 49 | cgggGCGGGGGGGgggaggagcaggaagtggcggtgcgagggctgctgcacagcgagcgg | -10
Profilin 1. | 17 | - | 49 | cggggcggggggGGGGAGGAGCaggaagtggcggtgcgagggctgctgcacagcgagcgg | -10
Progesterone receptor membrane component 1. | X | + | -12 | cccgcctcctccccggctagtctttggccgccgccgaaCCCCGCGCGccactcgctcgct | 47
Proopiomelanocortin, adrenocorticotropic hormone/beta lipotropic hormone. | 2 | - | -58 | ccaccaggagagctcggcaagtatataaggacagAGGAGCGCGggaccaagcggcggcga | -117
Proteasome (prosome, macropain) 26S subunit, non-ATPase, 1. | 2 | + | -66 | cggcaggcggggtggcgggcagcccctgGGCGGGCGGggtcctggcgagaagcgagccgg | -7
Proteasome (prosome, macropain) 26S subunit, non-ATPase, 11. | 17 | + | -28 | ggcgggctttccgggtgtgtgtttccggCGTCGGCGGccgcggccggggacggtgtgaga | 31
Proteasome (prosome, macropain) 26S subunit, non-ATPase, 7 (Mov34 homolog). | 16 | + | -16 | ggcactctgggagcggaagaaggaGGCCGCGCGagggctgacgaaccggaagaagaggaa | 43
Proteasome (prosome, macropain) subunit, α type, 4. | 15 | + | -10 | tcccgctccccccgccccaacccagcggttctgcgcatgcGCGGGGGCCatattagcagc | 49
Proteasome (prosome, macropain) subunit, beta type, 1. | 6 | - | 27 | gatagAGAGGCCGGGagcgaacttcaggagaagccggaagtggcgtaacgtccggtcaag | -32
Protein phosphatase 1G (formerly 2C), magnesium-dependent, gamma isoform. | 2 | - | 11 | gcagGCCGGGCGGggtctggagcggcgccgtttccgcttccgctccctcacagctcccgt | -47
Protein phosphatase 2 (formerly 2A), catalytic subunit, α isoform. | 5 | - | -134 | ccgccgcctcctgacgccgggcgtgacgtcaccacgccCGGCGGGCGccattacagagag | -193
Protein phosphatase 4 (formerly X), catalytic subunit. | 16 | + | -82 | ttccgcggcggggccggaaGTAGGAGCGgcggcggcggcggcggcggcggtcgaaagcgg | -23

Putative S1 RNA binding domain protein. | 1 | + | -12 | GGCGCGCTGggtcttgtgggtggaaacgcgctggctgactggggtcggcgtttagttcag | 47
Rab acceptor 1 (prenylated). | 19 | - | 2 | ctgggtgGGGCCGGCGtcgggcggggccggcggggtcttcagggtaccgggctggttaca | -57
Regulatory factor X-associated ankyrin-containing protein. | 19 | + | 739 | aagttgctttctgtcccggcagaggaagccagatcgctgaGGGTCCGGTCtccagtttgc | 798
Ribosomal protein L13. | 16 | + | 212 | gcgcgcCCGGGGTCCggcctctcactcgctcccctctcgtccgcagccgcagggccgtag | 271
Ribosomal protein L13. | 16 | + | 212 | gcgcgcccGGGGTCCGGcctctcactcgctcccctctcgtccgcagccgcagggccgtag | 271
Ribosomal protein L15. | 3 | + | -49 | gctgaggtgGGGGAGGAGCccaaaaggcattgtgggagtacagctctttcctttccgtct | 10
Ribosomal protein L19. | 17 | + | -17 | tttgggctctccccttcgcagataatGGGAGGAGCcgggcccgagcgagctctttccttt | 42
Ribosomal protein L30. | 8 | - | -108 | gatcctaaaattcctgtcctgttctctgtctcttctaggttGGGGGCCGTcccgctccta | -167
Ribosomal protein L39-like. | 3 | - | 49 | aatcaccgccctagcatccggggaaatcgcggtcttagcatCCGGCGCGCggcggttgaa | -10
Ribosomal protein L8. | 8 | - | 3 | ggaagataaggccgctcgctgacgccgtgtttcctcttTCGGCCGCGCtggtgaacaggt | -56
Ribosomal protein L8. | 8 | - | -237 | gtacccgggccgccCGGCCGCGCtaatcgtgagtcgcccccaggacccgtcgccatgggc | -296
Ribosomal protein S24. | 10 | + | 54 | CCGGGGTCCttccgtgcgcgttgatatgattggccggcgaatcgtggttctcttttcctc | 113
Ribosomal protein S6. | 9 | - | 48 | cgcgagaactgaaagcgcctatgtgacctgcGCTAAGCGGaagttggccctcttttccgt | -11
Ribosomal protein S9. | 19 | + | 256 | caggcGCCGTTTGGagcccttacgctcacacttctctcccgcgcaggcgcagacggggaa | 315
Ring finger protein 4. | 4 | + | -23 | GGCGGGCGGCcaatggggacatgatggggggcggagccgaggcctccgaagcggaagtgg | 36
RNA binding motif protein 5. | 3 | + | -30 | gataagcgtgaggtactgtgggtaggagacggcCGTCGGCGGaggcgccattttgtgtag | 29
Scavenger receptor class B, member 2. | 4 | - | -276 | tccctccttgcagttggatccctggcgggtgcggcccggcccggccCGTGAGCGGcgcac | -335
Sequestosome 1. | 5 | + | 14468 | gcggggcctccgcgttcgctacaaaaGCCGCGCGGcggctgcgaccgggacggcccgttt | 14527
Serine/threonine kinase 16. | 2 | + | -18 | gttacGCCGGCGCGgactgatgatgtcagcactgcttccggtcggtggcgcttctctctg | 41
Serine/threonine kinase 16. | 2 | + | -18 | gttacgccggcgcggactgatgatgtcagcactgctTCCGGTCGGtggcgcttctctctg | 41
Serine/threonine kinase 25 (STE20 homolog, yeast). | 2 | - | -273 | ggcaccgggaggaagctgccttggaagaggtggGGGCGGCGAcgggaggggcggcgagcc | -332
Sjogren's syndrome/scleroderma autoantigen 1. | 11 | + | -62 | agctATGGAACGGgcggagccgcagcggagggcgctccgctttgacgtcacttcctgtga | -3
Small inducible cytokine B14 precursor (chemokine CXCL14). | 5 | - | -239 | cgagggcaggagcggatttaaaagaggccagGGCGGGCGGagggaggctgtggagagagc | -298
Small nuclear RNA, U1 small nuclear 1 | 1 | - | 49 | taaagagtgaggcgtatgaggctgtgtcggGGCAGAGGCCcaagatctcatacttacctg | -10
SMT3 suppressor of mif two 3 homolog 1 (yeast). | 21 | - | -20 | ggccaacgggtgcgccgggatttgGGGGATAAAgcgcggccccgcgcacagttgcggcgg | -79
Solute carrier family 2 (facilitated glucose transporter), member 1. | 1 | - | -259 | agccaatggCCGGGGTCCtataaacgctacggtccgcgcgctctctggcaagaggcaaga | -318
Sorting nexin 3. | 6 | - | -171 | cGGGGAGGAGCttcgcgtgcggggtgaacgcccgctctacgtgctcgttctcttcgcgac | -230
Splicing factor, arginine/serine-rich 2. | 17 | - | 50 | cattttgtgaggagcgatataaacgggcGCAGAGGCCGGctgcccgcccagttgttactc | -9
Stomatin. | 9 | - | 30 | ttggggGCGGGGCGGcaatctgggtcttgtgcctctggctcctcagggcattcccggcgg | -29
Succinate dehydrogenase complex, subunit C, integral membrane protein, 15kDa,. | 1 | + | -44 | cccccaGCCGGCGCGCctccgccctcgggtggcggggccgcctggcgtcacttccgtcca | 15

Superoxide dismutase 2, mitochondrial. | 6 | - | -31 | gccctgcTCCCCGCGCtttcttaaggcccgcgggcggcgcaggagcggcactcgtggctg | -90
Superoxide dismutase 2, mitochondrial. | 6 | - | -31 | gccctgctccccgcgctttcttaaggcccGCGGGCGGCGcaggagcggcactcgtggctg | -90
Suppression of tumorigenicity 13 (colon carcinoma) (Hsp70 interacting protein). | 22 | - | 5 | ccgttgggtGGGAGGAGCcagcggccggggaggttctagtctgttctgtcttgcggcagc | -54
TAF15 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 68kDa, transcript variant 2. | 17 | + | -49 | attggctgCCGGGGTCCCgcagtccgcctcagcccgccgcgccgccctcagtacagctcc | 10
TAR DNA binding protein. | 1 | + | -20 | aaccggtgggagaggacgccggtgggcggGGGGAGGAGgcggccctagcgccattttgtg | 39
Tax1 (human T-cell leukemia virus type I) binding protein 1. | 7 | + | -1 | gtgacgcaaggcctactgtcggctgGGAGGGGAGGtgtagccggtctttgggggtaggcg | 60
T cell receptor beta variable 12-1 pseudogene | 7 | - | -5911 | cagatgcattctgtGGGGATAAAatgtcacaaaattcatttctttgctcatgctcacaga | -5970
TGFalpha (transforming growth factor alpha) | 2 | - | -94 | caggccctgcctagtctgcgtctttttCCCCCGCACcgcggcgccgctccgccactcggg | -153
TPI (triosephosphate isomerase) | 12 | + | -155 | atcgggcggcggccggggcggcggcaggGGCGGGCGGggggcagggctccgggggact | -96
Transforming growth factor, beta-induced, 68kDa. | 5 | + | 47 | ctcccgctcgcagcttacttaacctggccCGGGCGGCGgaggcgctctcacttccctgga | 106
Transforming, acidic coiled-coil containing protein 3. | 4 | + | -52 | tcggcgtttgaaactCCGGCGCGCcggcggccatcaagggctagaagcgcggcggcggta | 7
Trans-golgi network protein 2. | 2 | - | -226 | gagaggggCCCCGCGCGcggatctcgcgagagcattagagggcggaagcgctatccgagc | -285
Translin. | 2 | + | 66 | gtcctCCGGCGCGCcgcgagcctcggaggaccctagcgacggtcgtggcgtaagaccggg | 125
Transmembrane 9 superfamily member 2. | 13 | + | -51 | tctGTTGCGGTCcgcttcggtttctgttgcgggacccggggtgtctcctagcgcaaccgg | 8
Trinucleotide repeat containing 5. | 6 | + | 290 | gggctgcggctgcgagaggagGGCGGGCGGgaggctagctgttgtcgtggttgctcggag | 349
TRK-fused gene. | 3 | + | 221 | ggcttgttgggtcagcgcgattGGCCGGGGCccgcgcgagcctgcgagcgaggtgcggcg | 280
Troponin C2, fast. | 20 | - | 48 | gGGAGGGGAGGgtgcccctacaaatcccgggggctagagcaggccaggtcatctttgggt | -11
Tubulin 2A, beta polypeptide. | 6 | - | 44 | ggcagggcGGGTCCCCGggtataaaagaccgagctgggggggcggcggcaggtctctgcg | -15
Tumor differentially expressed 1. | 20 | - | -19 | GGCTGCGTttccggcctgagaaaccgtcatgtttctggggagtcacctcagctggcagtt | -78
Tumor-associated calcium signal transducer 1. | 2 | + | 114 | aggcggggccgccaGGTCGGGCAGgtgtgcgctccgccccgccgcgcgcacagagcgcta | 173
Tyrosylprotein sulfotransferase 2. | 22 | - | 29 | cggccccgcCCCCGCGCGGgcacctcccccttccccggctggggcggctggagagccggg | -30
Ubiquitin c-terminal esterase L1 (ubiquitin thiolesterase). | 4 | + | -33 | acagtgcgtctGGCCGGCGCtttatagctgcagcctgggcggctccgctagctgtttttc | 26
Ubiquitin-conjugating enzyme E2 variant 2. | 8 | + | -53 | CGGCGCGCTcccggaagtgacgcgcgacggttcgtgcgtgcgtgcgggcggctgcgtcgg | 6
Ubiquitin-conjugating enzyme E2 variant 2. | 8 | + | -53 | cggcgcgctcccggaagtgacgcgcgacggttcgtgcgtgcgtGCGGGCGGCtgcgtcgg | 6
Ubiquitin-conjugating enzyme E2L 6. | 11 | - | -274 | ggggtGGGGTCCCCggggcggggcggggcgcgctgtgtcgcgggtcggagctcggtcctg | -333
Ubiquitin-conjugating enzyme E2L 6. | 11 | - | -274 | ggggtggggtccccgggGCGGGGCGGggcgcgctgtgtcgcgggtcggagctcggtcctg | -333
Ubiquitin-conjugating enzyme E2L 6. | 11 | - | -274 | ggggtggggtccccggggcggggcggGGCGCGCTGTGtcgcgggtcggagctcggtcctg | -333
Uncharacterized hypothalamus protein HT007. | 11 | + | -74 | gcaccacttcccgcccagggcgttctGGGTCCCCGCccaccggcaagtcacatgagccac | -15
Urokinase (urine plasminogen activator). | 10 | + | -23 | gggcggcGCCGGGGCGggccctgatatagagcaggcgccgcgggtcgcagcacagtcgga | 36

Vaccinia related kinase 1. | 14 | + | -1 | tactgcagggtgcgaagGGGCCGGCGCcgctgccgagttacgagtcggcgaaagcggcgg | 58
Vaccinia related kinase 3. | 19 | - | -130 | cgggcgcctcagattCAGGATGCGcgcgcagcgctctccgcccagcggaagttttcgctg | -189
Vacuolar protein sorting 29 (yeast). | 12 | - | 46 | tgtctccctctgctctcatTGGTTGCGGTtaagtgggcgggtcgccgaggagcctgagga | -13
Vacuolar protein sorting 35 (yeast). | 16 | - | -14 | ggcttggaggggccgcagcgtcacatgaccgcgggaggctacgcGCGGGGCGGgtgctgc | -73
V-fos FBJ murine osteosarcoma viral oncogene homolog. | 14 | + | -49 | ttcataaaacgcttgttataaaagcaGTGGCTGCGgcgcctcgtactccaaccgcatctg | 10
V-jun sarcoma virus 17 oncogene homolog (avian). | 1 | - | -48 | agctcgggctggataagggctcAGAGTTGCACtgagtgtggctgaagcagcgaggcggga | -107
Voltage-dependent anion channel 2. | 10 | + | -19 | cctgtctgggagaggacagggttgCGGCGGGCGGaacggtgtctccttcacttcgccctc | 40
V-ral simian leukemia viral oncogene homolog B (ras related; GTP binding protein). | 2 | + | 36 | tgacaaatcggtggaggacggctGGGGTCCGGccccgggagggggcggggcgcgtttaag | 95
V-rel reticuloendotheliosis viral oncogene homolog A, nuclear factor of kappa light polypeptide gene enhancer in B-cells 3, p65 (avian). | 11 | - | -16 | cggattccgggcagtgacgcgacggcgGGCCGCGCGGcgcatttccgcctctggcgaatg | -75
XPA binding protein 1; putative ATP(GTP)-binding protein. | 2 | + | 412 | ggaagtttctctacccatgcggtgtctctatGGTCGGGTGggtggggccaggaggaagat | 353

REFERENCES

Ayyanathan, K., Lechner, M. S., Bell, P., Maul, G. G., Schultz, D. C., Yamada,

Y., Tanaka, K., Torigoe, K., and Rauscher, F. J., 3rd. (2003). Regulated

recruitment of HP1 to a euchromatic gene induces mitotically heritable,

epigenetic gene silencing: a mammalian cell culture model of gene

variegation. Genes Dev 17, 1855-69.

Ayyanathan, K., Peng, H., Hou, Z., Fredericks, W. J., Goyal, R. K., Langer, E.

M., Longmore, G. D., and Rauscher, F. J., 3rd. (2007). The Ajuba LIM

domain protein is a corepressor for SNAG domain mediated repression

and participates in nucleocytoplasmic shuttling. Cancer Res 67, 9097-106.

Bardwell, V. J., and Treisman, R. (1994). The POZ domain: a conserved

protein-protein interaction motif. Genes Dev 8, 1664-77.

Barrallo-Gimeno, A., and Nieto, M. A. (2009). Evolutionary history of the

Snail/Scratch superfamily. Trends Genet 25, 248-52.

Batlle, E., Sancho, E., Franci, C., Dominguez, D., Monfar, M., Baulida, J., and

Garcia de Herreros, A. (2000). The transcription factor snail is a

repressor of E-cadherin gene expression in epithelial tumour cells. Nat

Cell Biol 2, 84-9.

Beissbarth, T., and Speed, T. P. (2004). GOstat: find statistically

overrepresented Gene Ontologies within a group of genes.

Bioinformatics 20, 1464-5.

Berger, S. L. (2007). The complex language of chromatin regulation during

transcription. Nature 447, 407-12.

Blanco, M. J., Moreno-Bueno, G., Sarrio, D., Locascio, A., Cano, A., Palacios,

J., and Nieto, M. A. (2002). Correlation of Snail expression with

histological grade and lymph node status in breast carcinomas.

Oncogene 21, 3241-6.

Brayer, K. J., Kulshreshtha, S., and Segal, D. J. (2008). The protein-binding

potential of C2H2 zinc finger domains. Cell Biochem Biophys 51, 9-19.

Brayer, K. J., and Segal, D. J. (2008). Keep your fingers off my DNA: protein-

protein interactions mediated by C2H2 zinc finger domains. Cell

Biochem Biophys 50, 111-31.

Breslin, M. B., Zhu, M., and Lan, M. S. (2003). NeuroD1/E47 regulates the E-

box element of a novel zinc finger transcription factor, IA-1, in

developing nervous system. J Biol Chem 278, 38991-7.

Breslin, M. B., Zhu, M., Notkins, A. L., and Lan, M. S. (2002). Neuroendocrine

differentiation factor, IA-1, is a transcriptional repressor and contains a

specific DNA-binding domain: identification of consensus IA-1 binding

sequence. Nucleic Acids Res 30, 1038-45.

Cano, A., Perez-Moreno, M. A., Rodrigo, I., Locascio, A., Blanco, M. J., del

Barrio, M. G., Portillo, F., and Nieto, M. A. (2000). The transcription

factor snail controls epithelial-mesenchymal transitions by repressing E-

cadherin expression. Nat Cell Biol 2, 76-83.

Chiang, C., and Ayyanathan, K. (in press 2012). Characterization of the E-box

Binding Affinity to SNAG-Zinc Finger Proteins. Mol Biol.

Chiang, C., and Ayyanathan, K. (manuscript in preparation). Global Analysis of

Mammalian SNAG-Zinc Finger Transcription Factor Target Genes.

Choo, Y., and Klug, A. (1994). Selection of DNA binding sites for zinc fingers

using rationally randomized DNA reveals coded interactions. Proc Natl

Acad Sci U S A 91, 11168-72.

Clamp, M., Fry, B., Kamal, M., Xie, X., Cuff, J., Lin, M. F., Kellis, M., Lindblad-

Toh, K., and Lander, E. S. (2007). Distinguishing protein-coding and

noncoding genes in the human genome. Proc Natl Acad Sci U S A 104,

19428-33.

Collins, T., Stone, J. R., and Williams, A. J. (2001). All in the family: the

BTB/POZ, KRAB, and SCAN domains. Mol Cell Biol 21, 3609-15.

Dominguez, D., Montserrat-Sentis, B., Virgos-Soler, A., Guaita, S., Grueso, J.,

Porta, M., Puig, I., Baulida, J., Franci, C., and Garcia de Herreros, A.

(2003). Phosphorylation regulates the subcellular location and activity

of the snail transcriptional repressor. Mol Cell Biol 23, 5078-89.

Duan, Z., and Horwitz, M. (2003). Targets of the transcriptional repressor

oncoprotein Gfi-1. Proc Natl Acad Sci U S A 100, 5932-7.

Fiolka, K., Hertzano, R., Vassen, L., Zeng, H., Hermesh, O., Avraham, K. B.,

Duhrsen, U., and Moroy, T. (2006). Gfi1 and Gfi1b act equivalently in

haematopoiesis, but have distinct, non-overlapping functions in inner

ear development. EMBO Rep 7, 326-33.

Fondell, J. D., Brunel, F., Hisatake, K., and Roeder, R. G. (1996). Unliganded

thyroid hormone receptor alpha can target TATA-binding protein for

transcriptional repression. Mol Cell Biol 16, 281-7.

Giroldi, L. A., Bringuier, P. P., de Weijert, M., Jansen, C., van Bokhoven, A.,

and Schalken, J. A. (1997). Role of E boxes in the repression of E-

cadherin expression. Biochem Biophys Res Commun 241, 453-8.

Goldmark, J. P., Fazzio, T. G., Estep, P. W., Church, G. M., and Tsukiyama,

T. (2000). The Isw2 chromatin remodeling complex represses early

meiotic genes upon recruitment by Ume6p. Cell 103, 423-33.

Grimes, H. L., Chan, T. O., Zweidler-McKay, P. A., Tong, B., and Tsichlis, P.

N. (1996a). The Gfi-1 proto-oncoprotein contains a novel transcriptional

repressor domain, SNAG, and inhibits G1 arrest induced by interleukin-

2 withdrawal. Mol Cell Biol 16, 6263-72.

Grimes, H. L., Gilks, C. B., Chan, T. O., Porter, S., and Tsichlis, P. N. (1996b).

The Gfi-1 protooncoprotein represses Bax expression and inhibits T-cell

death. Proc Natl Acad Sci U S A 93, 14569-73.

Hanna-Rose, W., and Hansen, U. (1996). Active repression mechanisms of

eukaryotic transcription repressors. Trends Genet 12, 229-34.

Hemavathy, K., Ashraf, S. I., and Ip, Y. T. (2000a). Snail/slug family of

repressors: slowly going into the fast lane of development and cancer.

Gene 257, 1-12.

Hemavathy, K., Guru, S. C., Harris, J., Chen, J. D., and Ip, Y. T. (2000b).

Human Slug is a repressor that localizes to sites of active transcription.

Mol Cell Biol 20, 5087-95.

Hemavathy, K., Hu, X., Ashraf, S. I., Small, S. J., and Ip, Y. T. (2004). The

repressor function of snail is required for Drosophila gastrulation and is

not replaceable by Escargot or Worniu. Dev Biol 269, 411-20.

Hou, Z., Peng, H., Ayyanathan, K., Yan, K. P., Langer, E. M., Longmore, G.

D., and Rauscher, F. J., 3rd. (2008). The LIM protein AJUBA recruits

protein arginine methyltransferase 5 to mediate SNAIL-dependent

transcriptional repression. Mol Cell Biol 28, 3198-207.

Iuchi, S. (2001). Three classes of C2H2 zinc finger proteins. Cell Mol Life Sci

58, 625-35.

Jacobs, S., Chang, K. J., and Cuatrecasas, P. (1975). Estimation of hormone

receptor affinity by competitive displacement of labeled ligand: effect of

concentration of receptor and of labeled ligand. Biochem Biophys Res

Commun 66, 687-92.

Jegalian, A. G., and Wu, H. (2002). Regulation of Socs gene expression by

the proto-oncoprotein GFI-1B: two routes for STAT5 target gene

induction by erythropoietin. J Biol Chem 277, 2345-52.

Kataoka, H., Murayama, T., Yokode, M., Mori, S., Sano, H., Ozaki, H., Yokota,

Y., Nishikawa, S., and Kita, T. (2000). A novel snail-related transcription

factor Smuc regulates basic helix-loop-helix transcription factor

activities via specific E-box motifs. Nucleic Acids Res 28, 626-33.

Klug, A., and Schwabe, J. W. (1995). Zinc fingers. FASEB J 9,

597-604.

Knebel, J., De Haro, L., and Janknecht, R. (2006). Repression of transcription

by TSGA/Jmjd1a, a novel interaction partner of the ETS protein ER71.

J Cell Biochem 99, 319-29.

Kuroda, S., Tokunaga, C., Kiyohara, Y., Higuchi, O., Konishi, H., Mizuno, K.,

Gill, G. N., and Kikkawa, U. (1996). Protein-protein interaction of zinc

finger LIM domains with protein kinase C. J Biol Chem 271, 31029-32.

Ladomery, M., and Dellaire, G. (2002). Multifunctional zinc finger proteins in

development and disease. Ann Hum Genet 66, 331-42.

Lin, Y., Wu, Y., Li, J., Dong, C., Ye, X., Chi, Y. I., Evers, B. M., and Zhou, B. P.

(2010). The SNAG domain of Snail1 functions as a molecular hook for

recruiting lysine-specific demethylase 1. EMBO J 29, 1803-16.

Mackay, J. P., and Crossley, M. (1998). Zinc fingers are sticking together.

Trends Biochem Sci 23, 1-4.

Manzanares, M., Blanco, M. J., and Nieto, M. A. (2004). Snail3 orthologues in

vertebrates: divergent members of the Snail zinc-finger gene family.

Dev Genes Evol 214, 47-53.

Manzanares, M., Locascio, A., and Nieto, M. A. (2001). The increasing

complexity of the Snail gene superfamily in metazoan evolution. Trends

Genet 17, 178-81.

Marin, F., and Nieto, M. A. (2006). The expression of Scratch genes in the

developing and adult brain. Dev Dyn 235, 2586-91.

Martin, T. A., Goyal, A., Watkins, G., and Jiang, W. G. (2005). Expression of

the transcription factors snail, slug, and twist and their clinical

significance in human breast cancer. Ann Surg Oncol 12, 488-96.

Martinez-Estrada, O. M., Culleres, A., Soriano, F. X., Peinado, H., Bolos, V.,

Martinez, F. O., Reina, M., Cano, A., Fabre, M., and Vilaro, S. (2006).

The transcription factors Slug and Snail act as repressors of Claudin-1

expression in epithelial cells. Biochem J 394, 449-57.

Matys, V., Fricke, E., Geffers, R., Gossling, E., Haubrock, M., Hehl, R.,

Hornischer, K., Karas, D., Kel, A. E., Kel-Margoulis, O. V., Kloos, D. U.,

Land, S., Lewicki-Potapov, B., Michael, H., Munch, R., Reuter, I.,

Rotert, S., Saxel, H., Scheer, M., Thiele, S., and Wingender, E. (2003).

TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic

Acids Res 31, 374-8.

Mauhin, V., Lutz, Y., Dennefeld, C., and Alberga, A. (1993). Definition of the

DNA-binding site repertoire for the Drosophila transcription factor

SNAIL. Nucleic Acids Res 21, 3951-7.

Miller, J., McLachlan, A. D., and Klug, A. (1985). Repetitive zinc-binding

domains in the protein transcription factor IIIA from Xenopus oocytes.

EMBO J 4, 1609-14.

Moody, S. E., Perez, D., Pan, T. C., Sarkisian, C. J., Portocarrero, C. P.,

Sterner, C. J., Notorfrancesco, K. L., Cardiff, R. D., and Chodosh, L. A.

(2005). The transcriptional repressor Snail promotes mammary tumor

recurrence. Cancer Cell 8, 197-209.

Nagai, K., Nakaseko, Y., Nasmyth, K., and Rhodes, D. (1988). Zinc-finger

motifs expressed in E. coli and folded in vitro direct specific binding to

DNA. Nature 332, 284-6.

Nieto, M. A. (2002). The snail superfamily of zinc-finger transcription factors.

Nat Rev Mol Cell Biol 3, 155-66.

Okkema, P. G., and Krause, M. (2005). Transcriptional regulation. WormBook,

1-40.

Peinado, H., Ballestar, E., Esteller, M., and Cano, A. (2004). Snail mediates E-

cadherin repression by the recruitment of the Sin3A/histone

deacetylase 1 (HDAC1)/HDAC2 complex. Mol Cell Biol 24, 306-19.

Peng, H., Zheng, L., Lee, W. H., Rux, J. J., and Rauscher, F. J., 3rd. (2002). A

common DNA-binding site for SZF1 and the BRCA1-associated zinc

finger protein, ZBRK1. Cancer Res 62, 3773-81.

Saleque, S., Kim, J., Rooke, H. M., and Orkin, S. H. (2007). Epigenetic

regulation of hematopoietic differentiation by Gfi-1 and Gfi-1b is

mediated by the cofactors CoREST and LSD1. Mol Cell 27, 562-72.

Savagner, P., Kusewitt, D. F., Carver, E. A., Magnino, F., Choi, C., Gridley, T.,

and Hudson, L. G. (2005). Developmental transcription factor slug is

required for effective re-epithelialization by adult keratinocytes. J Cell

Physiol 202, 858-66.

Schmid, C. D., Perier, R., Praz, V., and Bucher, P. (2006). EPD in its twentieth

year: towards complete promoter coverage of selected model

organisms. Nucleic Acids Res 34, D82-5.

Sun, L., Liu, A., and Georgopoulos, K. (1996). Zinc finger-mediated protein

interactions modulate Ikaros activity, a molecular control of lymphocyte

development. EMBO J 15, 5358-69.

Swirnoff, A. H., and Milbrandt, J. (1995). DNA-binding specificity of NGFI-A

and related zinc finger transcription factors. Mol Cell Biol 15, 2275-87.

Tadepally, H. D., Burger, G., and Aubry, M. (2008). Evolution of C2H2-zinc

finger genes and subfamilies in mammals: species-specific duplication

and loss of clusters, genes and effector domains. BMC Evol Biol 8, 176.

Takahashi, E., Funato, N., Higashihori, N., Hata, Y., Gridley, T., and

Nakamura, M. (2004). Snail regulates p21(WAF/CIP1) expression in

cooperation with E2A and Twist. Biochem Biophys Res Commun 325,

1136-44.

Thiery, J. P., Acloque, H., Huang, R. Y., and Nieto, M. A. (2009). Epithelial-

mesenchymal transitions in development and disease. Cell 139, 871-

90.

Tripathi, M. K., Misra, S., Khedkar, S. V., Hamilton, N., Irvin-Wilson, C.,

Sharan, C., Sealy, L., and Chaudhuri, G. (2005). Regulation of BRCA2

gene expression by the SLUG repressor protein in human breast cells.

J Biol Chem 280, 17163-71.

Tsukiyama, T., Daniel, C., Tamkun, J., and Wu, C. (1995). ISWI, a member of

the SWI2/SNF2 ATPase family, encodes the 140 kDa subunit of the

nucleosome remodeling factor. Cell 83, 1021-6.

Tsukiyama, T., and Wu, C. (1995). Purification and properties of an ATP-

dependent nucleosome remodeling factor. Cell 83, 1011-20.

Twigg, S. R., and Wilkie, A. O. (1999). Characterisation of the human snail

(SNAI1) gene and exclusion as a major disease gene in

craniosynostosis. Hum Genet 105, 320-6.

Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A., and Luscombe, N. M.

(2009). A census of human transcription factors: function, expression

and evolution. Nat Rev Genet 10, 252-63.

Vega, S., Morales, A. V., Ocana, O. H., Valdes, F., Fabregat, I., and Nieto, M.

A. (2004). Snail blocks the cell cycle and confers resistance to cell

death. Genes Dev 18, 1131-43.

Xie, J., Cai, T., Zhang, H., Lan, M. S., and Notkins, A. L. (2002). The zinc-

finger transcription factor INSM1 is expressed during embryo

development and interacts with the Cbl-associated protein. Genomics

80, 54-61.

Zhou, B. P., Deng, J., Xia, W., Xu, J., Li, Y. M., Gunduz, M., and Hung, M. C.

(2004). Dual regulation of Snail by GSK-3beta-mediated

phosphorylation in control of epithelial-mesenchymal transition. Nat Cell

Biol 6, 931-40.

Zhu, M., Breslin, M. B., and Lan, M. S. (2002). Expression of a novel zinc-

finger cDNA, IA-1, is associated with rat AR42J cells differentiation into

insulin-positive cells. Pancreas 24, 139-45.

Zimmerman, K. A., Fischer, K. P., Joyce, M. A., and Tyrrell, D. L. (2008). Zinc

finger proteins designed to specifically target duck hepatitis B virus

covalently closed circular DNA inhibit viral transcription in tissue culture.

J Virol 82, 8013-21.

Zweidler-McKay, P. A., Grimes, H. L., Flubacher, M. M., and Tsichlis, P. N.

(1996). Gfi-1 encodes a nuclear zinc finger protein that binds DNA and

functions as a transcriptional repressor. Mol Cell Biol 16, 4024-34.
