Characterizing the DNA-Binding Site Specificities of Cis2His2 Zinc Fingers

MQP-ID-DH-UM1

CHARACTERIZING THE DNA-BINDING SITE SPECIFICITIES OF CIS2HIS2 ZINC FINGERS

A Major Qualifying Project Report Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for the Degrees of Bachelor of Science in Biochemistry and Biology and Biotechnology

by Heather Bell

April 26, 2012

APPROVED:

Scot Wolfe, PhD, Gene Function and Exp., UMass Medical School, MAJOR ADVISOR
Destin Heilman, PhD, Biochemistry, WPI Project Advisor
David Adams, PhD, Biology and Biotech, WPI Project Advisor

ABSTRACT

The ability to modularly assemble Zinc Finger Proteins (ZFPs), as well as the wide variety of DNA sequences they can recognize, makes ZFPs an ideal framework for designing novel DNA-binding proteins. However, due to the complexity of the interactions between residues in the ZF recognition helix and the DNA-binding site, there is currently no comprehensive recognition code that would allow for the accurate prediction of ZFP DNA-binding motifs or the design of novel ZFPs for a desired target site. Through the analysis of the DNA-binding site specificities of 98 ZFP clones, determined through a bacterial one-hybrid selection system, a predictive model was created that can accurately predict the binding site motifs of novel ZFPs.

TABLE OF CONTENTS

Signature Page
Abstract
Table of Contents
Acknowledgements
Background
Project Purpose
Methods
Results
Discussion
Bibliography
Supplemental

ACKNOWLEDGEMENTS

I would like to thank Dr. Scot Wolfe and the University of Massachusetts Medical School for sponsoring this project and providing guidance along the way. I would also like to thank Ankit Gupta for his guidance in carrying out this project, as well as the Joung Lab and the Stormo Lab for their collaboration on the project. Lastly, I would like to thank Dr. Dave Adams and Dr. Destin Heilman, my WPI major advisors, for advising this project and helping to edit the final report.

PROJECT PURPOSE

The purpose of this project was to create a comprehensive ZFP predictive model that could be used both to accurately predict the DNA-binding sites of ZFPs and to aid in designing ZFPs for a desired target site. This predictive model was developed from Bacterial One-Hybrid (B1H) binding site selections on 116 engineered ZFP clones built on the structural framework of Zif268, and was created in collaboration with the laboratories of Keith Joung and Gary Stormo.

BACKGROUND

THE DISCOVERY OF ZINC FINGER PROTEINS

Through the work of Robert Roeder and Donald Brown on the 5S RNA genes of Xenopus laevis, it was discovered that the binding of a 40-kDa protein factor, called transcription factor IIIA (TFIIIA), is required for the initiation of transcription. Through further experiments, Roeder and Brown determined that TFIIIA interacts with a 50-nucleotide region within the gene, called the internal control region, making TFIIIA the first eukaryotic transcription factor to be described. Further studies of TFIIIA by Miller, McLachlan, and Klug (published in 1985) showed that the transcription factor contained a high concentration of zinc and could be broken down into nine tandemly repeating units of 30 amino acids, with each unit containing a similar pattern of cysteines and histidines. The remarkable repeating motif within TFIIIA was later termed a zinc finger (ZF).
The motif was given this name due to the presence of the zinc ion and the manner in which the finger grips the DNA (Klug, 2010). The ability to modularly assemble Zinc Finger Proteins (ZFPs), as well as the wide variety of DNA sequences they can recognize, makes ZFPs an ideal framework for designing novel DNA-binding proteins. Of the 30 amino acids in each ZF, 25 fold around a Zn ion to form an independent structural domain, or "the finger," and the remaining amino acids serve as a linker between consecutive fingers. In later studies by Neuhaus, it was shown that this linker is extremely flexible and can vary depending on the organism. The discovery of the flexibility of the linker also showed that tandemly linked ZFs are structurally independent of one another, and thus opened the door to a whole new world of DNA recognition and DNA-binding proteins (Klug, 2010).

Since the discovery of TFIIIA, three major classes of ZFs have been defined. The first class is characterized by the presence of a Cys6-Zn cluster motif and is found in metabolic regulators of fungi. The second class consists of Cys2Cys2 (or Cys4) ZFs with a conserved Zn-binding consensus of Cys-X2-Cys-X13-Cys-X2-Cys and is found mainly in nuclear steroid or hormone receptors. The third class contains the most common ZF, the Cys2His2 ZF or "classical" ZF domain, and can be found in a large number of regulatory proteins spanning almost all branches of the evolutionary tree (Papworth et al., 2006).

DNA BINDING OF CYS2HIS2 ZINC FINGER PROTEINS

Cys2His2 Zinc Fingers are the most common DNA-binding domain in metazoan genomes and occur naturally in animals, plants, and fungi. They are the best understood of the three major classes of ZFs and are therefore the most widely used for designing proteins with novel DNA-binding specificities (Wolfe et al., 2000).
Cys2His2 Zinc Fingers contain a conserved Cys-Cys…His-His pattern, with the zinc ion tetrahedrally coordinated between the two cysteines at one end of the β-strands and the two histidines in the C-terminal portion of the α-helix to stabilize the fold (figure 1). In addition to the conserved Cys-Cys and His-His pattern, each ZF also contains three conserved large hydrophobic residues, typically Tyr6, Phe17, and Leu23, which help to stabilize the finger structure. Therefore, the conserved sequence of Cys2His2 ZFs can be written as (Tyr, Phe)-X-Cys-X2-5-Cys-X3-(Tyr, Phe)-X5-Leu-X2-His-X3-5-His, with X representing any amino acid (Klug, 2010). Additionally, ZFs can be linked tandemly in a linear, polar fashion to recognize DNA (or RNA) sequences of different lengths as well as proteins (Wolfe et al., 2000). However, while each finger domain has a similar structural framework, it can achieve chemical distinctiveness through variations in the four key amino acid positions used in DNA binding (Klug, 2010).

Figure 1: The structure of a Zinc Finger from a two-dimensional NMR study of a two-finger peptide in solution (Klug, 2010).

In 1991, Pavletich and Pabo conducted an in-depth study of the crystal structure of a DNA oligonucleotide bound to the three-finger DNA-binding domain of the mouse transcription factor Zif268. From this study, Pavletich and Pabo showed that the primary sequence-specific contacts are made by the α-helix, which binds in the major groove of the DNA. This binding occurs through specific hydrogen-bond or hydrophobic interactions from amino acids in the recognition helix at positions -1, 3, and 6, with the three positions contacting three consecutive basepairs on one strand of the DNA. Later studies showed that the amino acid at position 2 of the recognition helix also plays an important role in DNA binding, contacting a single basepair on the DNA strand opposite the one contacted by positions -1, 3, and 6.
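The Cys2His2 consensus quoted earlier in this section, (Tyr, Phe)-X-Cys-X2-5-Cys-X3-(Tyr, Phe)-X5-Leu-X2-His-X3-5-His, translates directly into a regular expression over one-letter amino acid codes. The following is a minimal sketch of that translation and not a method used in this project. The names `C2H2_PATTERN`, `find_fingers`, `toy_finger`, and `toy_protein` are hypothetical, and the toy sequence is synthetic (alanine at every X position), chosen only so the match is easy to follow.

```python
import re

# Consensus from the text, written with one-letter codes:
# (Tyr, Phe)-X-Cys-X2-5-Cys-X3-(Tyr, Phe)-X5-Leu-X2-His-X3-5-His
C2H2_PATTERN = re.compile(r"[YF].C.{2,5}C.{3}[YF].{5}L.{2}H.{3,5}H")


def find_fingers(protein_seq):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group())
            for m in C2H2_PATTERN.finditer(protein_seq.upper())]


# Synthetic finger: every X position is filled with alanine (illustration only).
toy_finger = "Y" + "A" + "C" + "AA" + "C" + "AAA" + "F" + "AAAAA" + "L" + "AA" + "H" + "AAA" + "H"

# Two synthetic fingers joined by a TGEKP linker.
toy_protein = toy_finger + "TGEKP" + toy_finger

for start, finger in find_fingers(toy_protein):
    print(start, finger)
```

A strict pattern like this will miss natural fingers that deviate from the consensus at one of the conserved positions, so it is better suited to a first-pass scan than to a complete annotation.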
These structural studies also demonstrated the importance of cross-strand interactions between the α-helices of tandemly linked ZFs: the complement of the basepair contacted by position 2 of the first finger is contacted by position 6 of the second finger, and so on (figure 2) (Klug, 2010).

Figure 2: Schematic of the DNA binding of the mouse transcription factor Zif268. Contacts with the DNA are made from positions -1, 3, and 6 of the recognition helix on one strand of the DNA, with the residue at position 2 contacting the complementary strand. Positions -1, 3, and 6 contact three consecutive basepairs, while position 2 contacts the complement of the basepair contacted by position 6 of the next finger. The three-finger module binds with position 6 of finger three contacting the 5' end of the DNA and position -1 of finger one contacting the 3' end of the DNA (image taken from Klug, 2010).

The most common linker arrangement found in ZFs contains five residues between the final histidine of one finger and the first conserved aromatic residue of the next. Roughly half of the known ZFs contain a linker with the consensus sequence TGEKP. This particular linker is flexible in the free ZFP but becomes rigid upon binding of the ZFP to DNA. The docking of adjacent fingers in a ZFP is stabilized further by a contact involving the side chain of the residue at position 9 of the preceding finger's helix along with the backbone carbonyl or side chain at position -2 of the subsequent finger. This contact using position 9 has been observed to correlate with the use of a canonical linker such as TGEKP. Given this interfinger contact at position 9 and the highly conserved TGEKP linker, it can be inferred that the interfinger organization is important in DNA recognition as well (Wolfe et al., 2000).

The mode of DNA recognition can be considered a one-to-one interaction between individual amino acids of the recognition helix and individual DNA bases (Wolfe et al., 2000).
This, combined with the fact that ZFs function as independent modules, allows the design of ZFPs that recognize longer DNA sequences by combining several ZFs with different triplet specificities.
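The modular picture above can be sketched in a few lines of code. This is a simplified illustration rather than the predictive model developed in this project. It assumes the Zif268-style arrangement in which helix positions 6, 3, and -1 of each finger read one triplet 5' to 3', with the last finger at the 5' end of the site and finger 1 at the 3' end. It ignores the cross-strand contact from position 2, so it treats fingers as fully independent. The function names and the example triplets are hypothetical.

```python
# Helix positions that read a triplet, listed 5'->3' along the primary strand
# (assumed arrangement: position 6 reads the 5' base, position -1 the 3' base).
HELIX_POSITIONS_5_TO_3 = (6, 3, -1)


def assemble_target_site(finger_triplets):
    """Concatenate per-finger triplets into one site, written 5'->3'.

    finger_triplets[0] belongs to finger 1. Because the last finger sits at
    the 5' end of the site, the triplets are joined in reverse finger order.
    """
    for triplet in finger_triplets:
        if len(triplet) != 3 or set(triplet.upper()) - set("ACGT"):
            raise ValueError(f"not a DNA triplet: {triplet!r}")
    return "".join(t.upper() for t in reversed(finger_triplets))


def contact_map(finger_triplets):
    """List (site_index, base, finger_number, helix_position) for each base."""
    site = assemble_target_site(finger_triplets)
    n_fingers = len(finger_triplets)
    contacts = []
    for i, base in enumerate(site):
        finger = n_fingers - i // 3          # the 5'-most triplet is the last finger
        helix_pos = HELIX_POSITIONS_5_TO_3[i % 3]
        contacts.append((i, base, finger, helix_pos))
    return contacts


# Hypothetical triplet specificities for fingers 1, 2, and 3 (illustration only).
example_triplets = ["GAT", "TGC", "ACG"]
print(assemble_target_site(example_triplets))
for row in contact_map(example_triplets):
    print(row)
```

Because the position 2 contact overlaps the neighboring finger's triplet, a model built purely from independent triplets like this one is only a first approximation of real binding specificity.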
