Biomolecular Chemistry 4. from Amino Acids to Proteins

Biomolecular chemistry

4. From amino acids to proteins

Suggested reading: Sections 14.5 to 14.8 and Sections 2.1 to 2.4 of Mikkelsen and Cortón, Bioanalytical Chemistry

Primary Source Material
• Chapters 4 and 12 of Introduction to Genetic Analysis: Anthony J.F. Griffiths, Jeffrey H. Miller, David T. Suzuki, Richard C. Lewontin, William M. Gelbart (courtesy of the NCBI bookshelf).
• Chapters 4, 4 and 6 of Biochemistry: Berg, Jeremy M.; Tymoczko, John L.; and Stryer, Lubert (courtesy of the NCBI bookshelf).
• Chapters 3 and 7 of Molecular Cell Biology: Lodish, Harvey; Berk, Arnold; Zipursky, S. Lawrence; Matsudaira, Paul; Baltimore, David; Darnell, James E. (courtesy of the NCBI bookshelf).
• ExPASy: online course on Principles of Protein Structure
• Many figures and the descriptions for the figures are from the educational resources provided at the Protein Data Bank (http://www.pdb.org/)
• Most of these figures and accompanying legends have been written by David S. Goodsell of the Scripps Research Institute and are being used with permission. I highly recommend browsing the Molecule of the Month series at the PDB (http://www.pdb.org/pdb/101/motm_archive.do)

Where are we and how did we get here? (slide 78)

We are here!

• We are done with the Central Dogma, and now we move into the realms of protein structure and function. The Central Dogma only relates to the flow of genetic information, not to the function of biological macromolecules.

Proteins come in all shapes and sizes (slide 79)

http://www.rcsb.org/pdbstatic/education_discussion/molecule_of_the_month/poster_quickref.pdf

• Proteins are diverse and versatile ‘nano’ structures and machines
• Large number of potential combinations
  • There is a relatively large number of amino acids (a.a.) which you can use to construct a protein.
  • Includes 20 common a.a.’s plus numerous post-translational modifications.
  • A 200-amino-acid protein could have 20 to the 200th power (20^200) possible sequences.
• Structurally versatile
  • Polypeptide backbone can adopt a variety of conformations
  • Many conformers of side chains
  • Secondary structural elements can pack together in a wide variety of orientations
  • Various states of homo- and hetero-oligomerization
  • Proteins can bind prosthetic groups or cofactors (non-protein)
    • Heme
    • Metal ions
    • Flavins
• Structurally dynamic
  • Allosteric activation
  • Active and inactive forms

The structure of a protein is determined by the linear sequence of amino acids (1° structure) (slide 80)
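Here is a minimal Python sketch that gives a feel for how large that sequence space is. It assumes an alphabet of only the 20 common amino acids, ignores post-translational modifications, and counts 20^200 for a 200-residue chain. The function name is made up for illustration.

```python
import math


def sequence_count(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct linear sequences of a given length (exact integer)."""
    return alphabet_size ** length


n_200 = sequence_count(200)           # 20**200, computed exactly
n_digits = len(str(n_200))            # how many decimal digits that number has
log10_n = 200 * math.log10(20)        # the same magnitude via logarithms

print(n_digits)                       # 261
print(f"about 10^{log10_n:.1f}")      # about 10^260.2
```

In other words, 20^200 is roughly 1.6 × 10^260, which is a 261-digit number.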

Ribonuclease

An unfolded protein can be refolded in vitro. This demonstrates that the information needed to specify the tertiary structure is fully contained in the primary sequence.

http://www.users.csbsju.edu/~hjakubow/classes/rasmolchime/01ch331finproj/Rnase/templateprot.htm

• The classic work of Christian Anfinsen in the 1950s on the enzyme ribonuclease revealed the relation between the amino acid sequence of a protein and its conformation. For this work he was awarded the Nobel Prize in Chemistry in 1972. Anfinsen discovered that:
  • Ribonuclease is a single polypeptide chain consisting of 124 amino acid residues cross-linked by four disulfide bonds.
  • Agents such as urea or guanidinium chloride effectively disrupt the noncovalent bonds.
  • The disulfide bonds can be cleaved reversibly by reducing them with a reagent such as β-mercaptoethanol.
  • When ribonuclease was treated with β-mercaptoethanol in 8 M urea, the product was a fully reduced, randomly coiled polypeptide chain devoid of enzymatic activity. In other words, ribonuclease was denatured by this treatment.
• Anfinsen then made the critical observation that the denatured ribonuclease, freed of urea and β-mercaptoethanol by dialysis, slowly regained enzymatic activity. All the measured physical and chemical properties of the refolded enzyme were virtually identical with those of the native enzyme.
• These experiments showed that the information needed to specify the catalytically active structure of ribonuclease is contained in its amino acid sequence.

The 20 common amino acids (slide 81)

Ala showing L-stereochemistry http://www.neb.com/neb/sitemap/sitemap_5-1-10.html

• 20 different common amino acids, differing only in their side chains
• Note that stereochemistry at Cα has not been indicated in this figure.
• All natural a.a.’s are in the L-configuration (except glycine, which has two H atoms on Cα and is therefore achiral).
• A more general system of stereochemical designation is the R/S system. The L-configuration nearly always corresponds to S in the R/S system. The exception is L-cysteine, which is R.
• You might want to keep this sheet handy as a reference.
• I will often use the one-letter codes, and you should learn these.
  • Most are easy, but I find D, E, N and Q the trickiest to remember.
• Q: Do we need to memorize the structure and names of amino acids for the test?
• A: Yes. You should know the structure, name, 3-letter abbreviation, and 1-letter code of all the common amino acids.

Amino acid classification by property (slide 82)
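The one-letter codes come up constantly, so a small lookup table can help with practice. This minimal Python sketch converts a one-letter sequence, written N to C, into hyphenated three-letter codes. The dictionary and function names are only illustrative.

```python
# One-letter -> three-letter codes for the 20 common amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(seq: str) -> str:
    """Convert a one-letter sequence (N to C) to hyphenated three-letter codes."""
    try:
        return "-".join(ONE_TO_THREE[aa] for aa in seq.upper())
    except KeyError as err:
        raise ValueError(f"unknown one-letter code: {err.args[0]}") from None


print(to_three_letter("DENQ"))  # Asp-Glu-Asn-Gln
```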

http://www.imb-jena.de/image_library/GENERAL/aa/mut1.jpg http://www.imb-jena.de/image_library/GENERAL/aa/chemprop.jpg

• Various simple textbook classifications for a.a.’s, e.g.:
  • small, nucleophilic, hydrophobic, aromatic, acidic, amide, basic
  • aliphatic, non-polar, aromatic, polar, charged -ve, charged +ve
• However, no simple classification can properly capture the diversity of a.a. interactions and properties.
  • The same amino acid in different charge states can go from polar to nonpolar (H or K, for example).
  • Different portions of the same amino acid can have different properties (the aliphatic chain vs. the guanidinium group of arginine).
• Generally, aliphatic/hydrophobic residues are found inside proteins and polar/charged residues on the surface of proteins.
• Notes:
  • Cysteine is special because it is the best nucleophile, is the most easily oxidized, and can form disulphide bonds.
  • Proline has a tertiary, as opposed to a secondary, amide nitrogen and induces a bend in the polypeptide chain.
  • Threonine and isoleucine have chiral carbons in their side chains.

Free amino acids are almost always zwitterions (slide 83)

Commentary on the topic of zwitterions: http://bip.cnrs-mrs.fr/bip10/zwitter.htm

• Amino acids in solution at neutral pH exist as dipolar ions (zwitterions).
• In the dipolar form, the amino group is protonated (-NH3+) and the carboxyl group is deprotonated (-CO2-).
• Under almost any conceivable physiologically relevant conditions, the amino and carboxylate groups of a free amino acid will be in their charged states.
• This is also true of a polypeptide chain: the N-terminus and the C-terminus will be in their charged states.
• Possible exceptions:
  • Groups buried in the interior of proteins or lipid bilayers
  • Proteins in the stomach

pKa values of protein functional groups (slide 84)

• Seven of the 20 amino acids have readily ionizable side chains. These seven amino acids are able to donate or accept protons to facilitate reactions as well as to form ionic bonds.
• The above table gives equilibria and typical pKa values for ionization of the side chains of tyrosine, cysteine, arginine, lysine, histidine, and aspartic and glutamic acids in proteins.
• Two other groups in proteins, the terminal α-amino group and the terminal α-carboxyl group, can be ionized.
• You should know the approximate values for all of these ionizable groups. It is safe to say that all carboxylic acids in proteins have a pKa of about 3-4.
• Q: What is so special about histidine? It has a pKa of ~6, but did you mention that it does not react with anything much?
• A: Histidine is very good at donating and accepting protons at physiological pH. This is a very important part of many enzyme mechanisms. I may have mentioned that histidine is not such a good nucleophile. For enzyme mechanisms that involve a nucleophilic attack on the substrate, cysteine would be the best amino acid, followed by lysine.
• Q. Are the termini of proteins buried in lipid bilayers charged at one end or not at all? If only one end is charged, which one is it?
• A. The N-terminus is always positively charged and the C-terminus is always negatively charged under normal pH conditions (near neutral). Under some circumstances, such as when the N- or C-terminus is buried in a very hydrophobic environment, I suppose they could be uncharged. The pKa of an ionizable group is going to depend on its environment.
• Q. Proteins in the stomach are charged on their N-termini, am I right?
• A. I believe that the stomach is at very low pH, around 2-3. At such low pH, practically every group in proteins will be protonated. That pH is close to the pKa of the C-terminus, so the C-terminus might be only partially protonated.
• Q. Will the pKa values of AAs be given on the test?
• A. They won't be provided. You should know which residues are positively and negatively charged at neutral pH.

An oligopeptide (slide 85)
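The pKa values in the table above are linked to charge state by the Henderson-Hasselbalch equation. A group is half protonated when pH = pKa, and its fraction protonated is 1/(1 + 10^(pH - pKa)). The Python sketch below uses this to estimate the net charge of a short peptide. The pKa values in the dictionary are assumed typical textbook-style values, not measured ones, and real values shift with the local environment. His, Lys, Arg and the N-terminus are counted as +1 when protonated. The C-terminus, Asp, Glu, Cys and Tyr are counted as -1 when deprotonated.

```python
# Assumed typical pKa values (environment-dependent; for illustration only).
PKA = {
    "Nterm": 8.0, "Cterm": 3.1,
    "D": 4.1, "E": 4.1, "H": 6.0, "C": 8.3, "Y": 10.9, "K": 10.8, "R": 12.5,
}
BASIC = {"Nterm", "H", "K", "R"}  # +1 when protonated, 0 when deprotonated
# Everything else in PKA is acidic: 0 when protonated, -1 when deprotonated.


def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def net_charge(seq: str, ph: float) -> float:
    """Approximate net charge of a peptide (one-letter codes) at a given pH."""
    groups = ["Nterm", "Cterm"] + [aa for aa in seq.upper() if aa in PKA]
    charge = 0.0
    for group in groups:
        f = fraction_protonated(PKA[group], ph)
        charge += f if group in BASIC else -(1.0 - f)
    return charge


print(round(net_charge("K", 7.0), 2))  # 0.91
```

At pH 2, which is roughly stomach conditions, `net_charge("K", 2.0)` comes out close to +2. Both the N-terminus and the lysine side chain are protonated, and most of the C-termini are protonated as well.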

• Oligopeptide: a compound made up of the condensation of a small number (typically fewer than 20) of amino acids
• Polypeptide: a compound made up of the condensation of more than ~20 amino acids
• Each type of protein differs in its sequence and number of amino acids. It is the particular sequence of the various side chains that makes each protein distinct.
• The two ends of a polypeptide chain are chemically different. The end carrying the free amino group (NH3+, sometimes incorrectly written as NH2) is the amino, or N-terminus. The end carrying the free carboxyl group (CO2-, sometimes incorrectly written as CO2H) is the carboxyl, or C-terminus.
• The amino acid sequence of a protein is always presented in the N to C direction, reading from left to right. This corresponds to the 5’ to 3’ direction in which mRNA is read during translation.

The peptide bond is planar (slide 86)

• Linus Pauling and Robert Corey analyzed the geometry and dimensions of the peptide bonds in the crystal structures of molecules containing one or a few peptide bonds.
• This analysis led Pauling to correctly predict the existence and structure of the alpha helix and beta sheets (for which he was awarded the 1954 Nobel Prize in Chemistry).
• The take-home message is that the secondary structure elements of proteins can be predicted by looking at the structure of an individual amino acid. That is, an amino acid in an alpha-helical or beta-sheet conformation is also in a minimal-energy conformation, because its bonds are staggered and the peptide bond is planar.
• Note that the C-N bond of the peptide is about 10% shorter than the C-N bond in a typical amine. This is because the peptide bond has some double-bond character (about 40%) due to the resonance that occurs in amides.
• As a consequence of this resonance, all peptide bonds in protein structures are found to be almost planar. This rigidity of the peptide bond reduces the degrees of freedom of the polypeptide during folding.
• The planarity of the peptide bond is described using the angle ‘omega’ (ω). This is the dihedral angle about the peptide C-N bond, i.e. the angle between the Cα-carbonyl carbon bond of one residue and the N-Cα bond of the next, defined by the atoms Cα(i), C(i), N(i+1) and Cα(i+1).

The peptide bond is almost always trans (slide 87)
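ω, like φ and ψ, is a dihedral angle defined by four atoms. Below is a minimal pure-Python sketch of the standard calculation. It assumes you already have Cartesian coordinates for Cα(i), C(i), N(i+1) and Cα(i+1). The coordinates in the example are idealized, made-up points, not taken from a real structure.

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees; rotation is about the p1-p2 bond.

    For omega use p0=CA(i), p1=C(i), p2=N(i+1), p3=CA(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = [c / length for c in b1]          # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = [b0[k] - d0 * b1[k] for k in range(3)]
    w = [b2[k] - d2 * b1[k] for k in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Idealized points with the C-N bond along the z axis (not real coordinates).
ca_i, c_i, n_next = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(round(abs(dihedral(ca_i, c_i, n_next, (-1.0, 0.0, 1.0)))))  # 180 (trans)
print(round(dihedral(ca_i, c_i, n_next, (1.0, 0.0, 1.0))))        # 0 (cis)
```

The same function gives φ and ψ if you pass it the appropriate four backbone atoms.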

All amino acids except proline

Proline

image credit: http://www.imb-jena.de/IMAGE.html.

• The omega (ω) angle is almost always 180° (trans), though occasionally (extremely rarely) it is 0° (cis).
• Note that both the cis and trans forms are planar.
• Of the cis-peptide bonds found in proteins, almost all involve proline residues.
• The overall atom geometry in cis-proline is very similar to the trans-proline case. Energetically, the trans-proline structure is not markedly more favorable than its cis-proline counterpart, since much the same spatial conflicts are present in both cases.
• Approximately 1% of prolines in proteins are cis.
• A cis-peptide bond induces a very sharp kink in the polypeptide chain.
• Q. It is stated that "Approximately 1% of prolines in proteins are cis." Does it mean 99% of prolines in proteins are trans? So, trans-proline is still more favourable than cis-proline (Slide 87)? Also, do you mean that proline is the only amino acid that can exist in cis, while the 19 other amino acids cannot?
• A. Correct. 99% of all prolines are trans, and trans is more favourable than cis. The difference in energy between cis and trans is smaller than it is for any of the other amino acids, and this is why we occasionally see cis prolines. It is extremely rare to find any of the other 19 amino acids in a cis conformation.

Certain combinations of φ and ψ angles are preferred (slide 88)
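Given ω values, cis and trans peptide bonds can be labelled automatically. The Python sketch below is illustrative only. The ±30° tolerance and the (residue name, ω) input format are assumptions, not a standard. A "cis-proline" refers to the peptide bond that precedes the proline, i.e. the bond whose nitrogen belongs to proline. Each ω value is therefore paired with the residue that follows the bond.

```python
def classify_omega(omega_deg: float, tolerance: float = 30.0) -> str:
    """Label a peptide bond 'trans', 'cis' or 'non-planar' from omega (degrees)."""
    w = (omega_deg + 180.0) % 360.0 - 180.0      # wrap into [-180, 180)
    if abs(abs(w) - 180.0) <= tolerance:         # near +/-180 degrees
        return "trans"
    if abs(w) <= tolerance:                      # near 0 degrees
        return "cis"
    return "non-planar"


def cis_fraction(bonds, residue: str = "PRO") -> float:
    """Fraction of X-residue peptide bonds that are cis.

    bonds: list of (name of the residue whose N forms the bond, omega in degrees).
    """
    omegas = [om for name, om in bonds if name == residue]
    if not omegas:
        return 0.0
    n_cis = sum(1 for om in omegas if classify_omega(om) == "cis")
    return n_cis / len(omegas)


example = [("PRO", -178.0), ("PRO", 4.0), ("GLY", 179.0), ("PRO", 176.0)]
print(cis_fraction(example))  # 0.3333333333333333
```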

Scans downloaded from: http://www.nd.edu/~aseriann/cou.html

• A polypeptide can be thought of as a series of planar units (peptide bonds) joined by flexible hinges (Cα atoms).
• Each Cα atom has two rotatable bonds: the N-Cα bond (φ, phi) and the Cα-C bond (ψ, psi).
• Only certain combinations of φ and ψ angles are allowed, because other combinations cause steric clashes between atoms of adjacent residues.

The Ramachandran Plot (φ vs. ψ) (slide 89)

β-strand conformation

α-helical conformation

• A graph of φ angle vs. ψ angle, showing how often each combination occurs in proteins, is called a Ramachandran plot.
• There are actually only a few conformations that are strongly preferred, and these give rise to the common elements of secondary structure.

The Ramachandran Plot of a typical protein (as output by the program PROCHECK) (slide 90)

http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html http://www.cryst.bbk.ac.uk/PPS2/course/section3/rama.html

• The Ramachandran plot for a particular protein shows the phi-psi torsion angles for all residues in the structure.
• By looking at how well the angles match up with the expected distribution, the quality of a structure can be assessed.
• Glycine residues are separately identified by triangles, as these are not restricted to the regions of the plot appropriate to the other side-chain types.
• The coloring/shading on the plot represents the various levels of favorability: the darkest areas (here shown in red) correspond to the "core" regions representing the most favorable combinations of phi-psi values.
• A properly folded protein will have over 90% of its residues in these "core" regions.
• The percentage of residues in the "core" regions is one of the better guides to stereochemical quality for assessing experimental protein structures.
• An ideal Ramachandran plot can be generated computationally using known atomic radii and bond distances.

alpha-helices: an ‘island’ of preferred conformation (slide 91)
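The "percent of residues in core regions" check can be sketched in Python as below. This is only a toy. The rectangular α and β boxes are crude hypothetical boundaries chosen for illustration, not the empirically derived regions that PROCHECK uses, and the (name, φ, ψ) input format is assumed. Glycines are skipped because, as noted above, they are not restricted to the same regions.

```python
# Hypothetical rectangular "core" boxes: (phi_min, phi_max, psi_min, psi_max) in degrees.
CORE_BOXES = {
    "alpha": (-160.0, -20.0, -80.0, 0.0),
    "beta": (-180.0, -45.0, 90.0, 180.0),
}


def core_region(phi: float, psi: float):
    """Return the name of the box containing (phi, psi), or None."""
    for name, (phi_lo, phi_hi, psi_lo, psi_hi) in CORE_BOXES.items():
        if phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi:
            return name
    return None


def percent_core(residues) -> float:
    """Percentage of non-glycine residues whose (phi, psi) lies in a core box."""
    scored = [(phi, psi) for name, phi, psi in residues if name != "GLY"]
    if not scored:
        return 0.0
    inside = sum(1 for phi, psi in scored if core_region(phi, psi) is not None)
    return 100.0 * inside / len(scored)


toy = [("ALA", -60.0, -45.0), ("VAL", -120.0, 130.0), ("GLY", 80.0, 10.0), ("SER", 60.0, 60.0)]
score = percent_core(toy)
print(f"{score:.1f}% in core; passes 90% rule of thumb: {score > 90.0}")
```

For this made-up four-residue example, two of the three non-glycine residues fall in a box. That gives 66.7%, well below the 90% rule of thumb.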

http://www.cryst.bbk.ac.uk/PPS2/course/section3/rama.html

• As mentioned earlier, Pauling and Corey twisted models of polypeptides around to find ways of getting the backbone into regular conformations that would agree with experimental diffraction data (much like the way the structure of DNA was determined). The simplest and most elegant arrangement is a right-handed spiral conformation known as the 'alpha-helix'.
• The structure repeats itself every 5.4 Angstroms along the helix axis, i.e. we say that the alpha-helix has a pitch of 5.4 Angstroms. Alpha-helices have 3.6 amino acid residues per turn, i.e. a helix 36 amino acids long would form 10 turns. The separation of residues along the helix axis is 5.4/3.6 = 1.5 Angstroms, i.e. the alpha-helix has a rise per residue of 1.5 Angstroms.
• Every main-chain C=O and N-H group, apart from those in the first and last turn, is hydrogen-bonded to a peptide bond 4 residues away (O(i) to N(i+4)). This gives a very regular, stable arrangement.
• The peptide planes are roughly parallel with the helix axis, and the dipoles within the helix are aligned. That is, all C=O groups point in the same direction and all N-H groups point the other way. This alignment of C=O and N-H bonds gives the alpha-helix a permanent dipole, with a partial positive charge at the amino-terminus and a partial negative charge at the carboxy-terminus.
• Side chains point outward from the helix axis and are generally oriented towards its amino-terminal end.
• All the amino acids have negative phi and psi angles, typical values being -60 degrees and -50 degrees, respectively.

beta-strands: another ‘island’ of preferred conformation (slide 92)
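The helix numbers above are linked by simple arithmetic: pitch = residues per turn × rise per residue. This minimal Python sketch uses the idealized values quoted on this slide (3.6 residues per turn, 1.5 Angstrom rise). Real helices deviate slightly from these ideal values.

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5                            # Angstroms along the helix axis
PITCH = RESIDUES_PER_TURN * RISE_PER_RESIDUE      # Angstroms per turn (about 5.4)


def helix_turns(n_residues: int) -> float:
    """Number of turns in an ideal alpha-helix of n residues."""
    return n_residues / RESIDUES_PER_TURN


def helix_length(n_residues: int) -> float:
    """Approximate length along the helix axis in Angstroms."""
    return n_residues * RISE_PER_RESIDUE


def residues_for_length(length_angstrom: float) -> int:
    """Largest whole number of residues whose ideal helix fits within the length."""
    return int(length_angstrom // RISE_PER_RESIDUE)


print(round(PITCH, 2), round(helix_turns(36), 2), helix_length(36))  # 5.4 10.0 54.0
print(residues_for_length(45.0))                                      # 30
```

By the same arithmetic, the 45 Å upper length usually quoted for single α-helices corresponds to roughly 30 residues, or a little over 8 turns.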

http://www.cryst.bbk.ac.uk/PPS2/course/section3/rama.html

• In addition to the alpha helix, Pauling and Corey discovered another periodic structural motif, which they named the β-pleated sheet (β because it was the second structure that they elucidated, the α-helix being the first).
• The β-sheet differs markedly from the rodlike α-helix. A polypeptide chain in a β-sheet, called a β-strand, is almost fully extended rather than being tightly coiled as in the α-helix. A range of extended structures are sterically allowed. The side chains of adjacent amino acids point in opposite directions.
• A β-sheet is formed by linking two or more β-strands by hydrogen bonds. Adjacent chains in a β-sheet can run in opposite directions (antiparallel β-sheet) or in the same direction (parallel β-sheet).
• In the antiparallel arrangement, the NH group and the CO group of each amino acid are respectively hydrogen bonded to the CO group and the NH group of a partner on the adjacent chain.
• In the parallel arrangement, for each amino acid, the NH group is hydrogen bonded to the CO group of one amino acid on the adjacent strand, whereas the CO group is hydrogen bonded to the NH group on the amino acid two residues farther along the chain.
• Many strands, typically 4 or 5 but as many as 10 or more, can come together in β-sheets. Such β-sheets can be purely antiparallel, purely parallel, or mixed.
• β-sheets can be relatively flat, but most adopt a somewhat twisted shape.

Turns and loops connect strands and helices (slide 93)

http://www.cryst.bbk.ac.uk/PPS2/course/section3/rama.html

• Most proteins have compact, globular shapes, requiring reversals in the direction of their polypeptide chains. Many of these reversals are accomplished by reverse turns and hairpins.
• A reverse turn is a region of the polypeptide having a hydrogen bond from one main-chain carbonyl oxygen to the main-chain N-H group 3 residues along the chain (i.e. O(i) to N(i+3)). Helical regions are excluded from this definition, and turns between beta-strands form a special class of turn known as the beta-hairpin.
• Reverse turns are very abundant in globular proteins and generally occur at the surface of the molecule. It has been suggested that turn regions act as nucleation centers during protein folding.
• Reverse turns are divided into classes based on the phi and psi angles of the residues at positions i+1 and i+2. Types I and II, shown in the figure, are the most common reverse turns. The essential difference between them is the orientation of the peptide bond between residues (i+1) and (i+2). The torsion angles for residues (i+1) and (i+2) in the two types of turn lie in distinct regions of the Ramachandran plot.
• 2-residue beta-hairpin turns occur between two antiparallel beta-strands, as shown in the figure.
• The residues forming these two-residue turns have torsion angles in characteristic regions of the Ramachandran plot.
• For type I' turns, residue 2 is always glycine, whereas for type II' turns residue 1 is always glycine. This is because amino acids other than glycine would cause steric hindrance between the residue's side chain and the main chain.
• In other cases, more elaborate structures are responsible for chain reversals. These structures are called loops, or sometimes Ω loops (omega loops) to suggest their overall shape. Unlike α-helices and β-strands, loops do not have regular, periodic structures. Nonetheless, loop structures are often rigid and well defined. Turns and loops invariably lie on the surfaces of proteins and thus often participate in interactions between proteins and other molecules.
  • For example, a part of an antibody molecule has surface loops (shown in red) that mediate interactions with other molecules.
• Q. Does the reverse turn only exist among alpha-helices? And does the beta-hairpin only exist among beta-sheets?
• A. As the name implies, the beta hairpin is most commonly found as a connector between strands of an antiparallel beta sheet. The reverse turn is a bit more general and can be found in loops that connect both helices and strands.
• Q. What are the differences/relationships between reverse turns, beta-hairpin turns and omega loops?
• A. Reverse turns and beta turns do look very similar when you look at the structures on the slide. However, there are key differences in the conformations of the amino acids that define each of these two types of turns. I don't expect you to know the details of these differences. One thing you should remember is that beta turns are typically used to connect two strands of anti-parallel beta sheet. An omega loop is a larger structure that is supposed to look something like the omega character (Ω). That is, the ends are very close together, but the loop itself is large and extends out into space. The variable regions of an antibody can be described as omega loops.
• Q. In slide 93, type II looks similar to type I'. I know one is for reverse turn and the other is for beta hairpin. If I have a structure like that, how can I tell whether it's a beta-hairpin or a reverse turn? I know the feature for beta-hairpin is that residue 2 in type I' and residue 1 in type II' should be glycine. Is there any other feature for reverse turn? Do we need to know how to tell type I and type II reverse turns apart? Another question is the difference between reverse turns, beta-hairpins and loops. Do we tell them apart by the number of amino acids?
• A. There are key differences in the conformations of the residues in these turns, which is the basis by which they are classified. I don't expect you to know the details of these differences. One thing you should remember is that beta turns are typically used to connect two strands of anti-parallel beta sheet.

Proteins generally composed of α-helices and/or β-sheets connected by turns and loops (slide 94)
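The α-helix pattern (O(i) to N(i+4)) and the reverse-turn pattern (O(i) to N(i+3)) differ only in an index offset. The short Python sketch below lists the idealized backbone hydrogen-bond partners for a stretch of residues. It assumes a perfectly regular segment and cannot detect hydrogen bonds in a real structure. The function name is illustrative.

```python
def backbone_hbond_pairs(first: int, last: int, offset: int):
    """Idealized (C=O residue i, N-H residue i+offset) pairs within residues first..last.

    offset=4 gives the alpha-helix pattern; offset=3 gives a reverse turn.
    """
    return [(i, i + offset) for i in range(first, last - offset + 1)]


print(backbone_hbond_pairs(1, 8, 4))    # [(1, 5), (2, 6), (3, 7), (4, 8)]
print(backbone_hbond_pairs(10, 13, 3))  # [(10, 13)]
```

In the helix case, the first four N-H groups and the last four C=O groups of the segment have no partner within it.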

• The α-helical content of proteins ranges widely, from nearly none to almost 100%. For example, about 75% of the residues in ferritin, a protein that helps store iron, are in α-helices. Single α-helices are usually less than 45 Å long. However, two or more α-helices can entwine to form a very stable ‘coiled coil’ structure, which can have a length of 1000 Å (100 nm, or 0.1 μm) or more. Such α-helical ‘coiled coils’ are found in myosin and tropomyosin in muscle, in fibrin in blood clots, and in keratin in hair. The helical cables in these proteins serve a mechanical role in forming stiff bundles of fibers, as in porcupine quills. The cytoskeleton (internal scaffolding) of cells is rich in so-called intermediate filaments, which also are two-stranded α-helical coiled coils. Many proteins that span biological membranes also contain α-helices.
• The β-sheet is an important structural element in many proteins. For example, fatty acid-binding proteins, important for lipid metabolism, are built almost entirely from β-sheets.

Protein folding is largely driven by hydrophobic interactions (slide 95)

Myoglobin

Hydrophobic Hydrophilic

surface cross section

• Myoglobin, the oxygen carrier in muscle, is a single polypeptide chain of 153 amino acids. The capacity of myoglobin to bind oxygen depends on the presence of heme, a prosthetic (helper) group consisting of protoporphyrin IX and a central iron atom.
• The folding of the main chain of myoglobin, like that of most other proteins, is complex and devoid of symmetry. However, a unifying principle emerges from the distribution of side chains. The striking fact is that the interior consists almost entirely of nonpolar residues such as leucine, valine, methionine, and phenylalanine. Charged residues such as aspartate, glutamate, lysine, and arginine are absent from the inside of myoglobin. The only polar residues inside are two histidine residues, which play critical roles in binding iron and oxygen.
• The outside of myoglobin, on the other hand, consists of both polar and nonpolar residues. This contrasting distribution of polar and nonpolar residues reveals a key facet of protein architecture. In an aqueous environment, protein folding is driven by the strong tendency of hydrophobic residues to be excluded from water.
• The polypeptide chain therefore folds so that its hydrophobic side chains are buried and its polar, charged side chains are on the surface.
• The secret of burying a segment of main chain in a hydrophobic environment is pairing all the NH and CO groups by hydrogen bonding. This pairing is neatly accomplished in an α-helix or β-sheet.
• The ability to predict whether or not a given polypeptide sequence will fold into a given tertiary structure remains one of the ‘grand challenges’ of science.
• In nature, proteins fold either independently or with the help of other proteins known as chaperones.

Membrane proteins have grease on the outside (slide 96)

K+-channel

lipid bilayer

Three views of the same structure

• Some proteins that span biological membranes are “the exceptions that prove the rule” regarding the distribution of hydrophobic and hydrophilic amino acids throughout three-dimensional structures. For example, ion channels are covered on the outside largely with hydrophobic residues that interact with the neighbouring alkane chains of the lipid bilayer. The inner channel is quite polar, and there are many specific interactions with the ion being transported.
• David S. Goodsell: The Molecule of the Month appearing at the PDB
• Potassium ions move through this channel from inside the cell to the outside. The driving force for this movement is simply the concentration gradient. Cells concentrate potassium ions inside, and then these ions are released when the membrane depolarizes (for example, during transmission of signals through the nervous system). The selectivity filter is the part with the backbone carbonyls oriented towards the ion in the centre of the channel. Only potassium (not sodium) is perfectly coordinated by these carbonyl oxygen atoms, and so only potassium can pass through the channel. It is my understanding that potassium ions are normally surrounded by 8 water molecules, whereas sodium is normally surrounded by 6.
• The 2003 Nobel Prize in Chemistry was awarded for work in the area of channels:
  • Roderick MacKinnon pioneered x-ray crystallography of ion channels.
  • Peter Agre discovered water channels.
• Water channels facilitate the rapid transport of water across cell membranes in response to osmotic gradients. These channels are believed to be involved in many physiological processes that include renal water conservation, neuro-homeostasis, digestion, regulation of body temperature and reproduction. Members of the water channel superfamily have been found in a range of cell types from bacteria to humans.

Chaperone-assisted protein folding (slide 97)

http://www.users.csbsju.edu/~hjakubow/classes/rasmolchime/02ch331finproj/GroELES/templateprotGROEL2.htm

• Folding of proteins in vitro tends to be an inefficient process, with only a minority of unfolded molecules undergoing complete folding within a few minutes.
• In contrast, more than 95 percent of the proteins present in cells are in their native conformation.
• The explanation for the cell’s remarkable efficiency in promoting protein folding probably lies in chaperones, a family of proteins found in all organisms from bacteria to humans.
• There are two general families of chaperones. Molecular chaperones bind and stabilize unfolded or partially folded proteins, preventing them from aggregating and being degraded. Chaperonins directly facilitate the folding of their target proteins.
• Chaperonins are probably used for a specific and relatively small selection of proteins, whereas molecular chaperones are used for most, if not all, proteins.
• All chaperones have ATPase activity, and their ability to bind and stabilize their target proteins is specific and dependent on ATP hydrolysis.
• Molecular chaperones include the Hsp70 family of proteins. When bound to ATP, Hsp70 assumes an open form in which an exposed hydrophobic pocket transiently binds to exposed hydrophobic regions of the unfolded target protein. Hydrolysis of the bound ATP causes Hsp70 to assume a closed form that holds the target protein. Exchange of the resulting ADP for ATP then reopens Hsp70 and releases the target protein. Molecular chaperones are thought to bind all nascent polypeptide chains as they are being synthesized on ribosomes.

More on GroEL (slide 98)

Hydrophobic stripe

ATP-binding site

Large cavity

David S. Goodsell: The Molecule of the Month appearing at the PDB

• Proper folding of a small proportion of proteins (e.g., the cytoskeletal proteins actin and tubulin) requires additional assistance, which is provided by chaperonins.
• Shown on this slide is the bacterial chaperonin GroEL, which contains 14 identical subunits arranged in two stacked rings of seven (green). GroES is shown at the bottom in pink.
• The large GroEL-GroES complex is available in PDB entry 1aon. In this picture, three of the subunits in each GroEL ring have been removed to show the interior, leaving four subunits in each ring. On the two in back, the hydrophobic amino acids LEU, ILE, VAL, MET, PHE, TYR and TRP are coloured blue.
• Notice the stripe of hydrophobic amino acids around the entry at the top. This stripe interacts strongly with unfolded proteins, coaxing them into the upper cavity. Once the unfolded protein is bound, ATP and GroES bind to GroEL. This causes a conformational change that forces the protein into the larger lower cavity, which is much more hydrophilic than the upper cavity.
• Now that the protein is in a hydrophilic environment, it will be forced to fold in order to minimize the unfavourable interactions between its hydrophobic portions and its hydrophilic environment.
• After the protein has folded, ATP is hydrolyzed and GroES (the lid on the cavity) is released along with the newly folded protein.
• Q: When a chaperonin is used to help proteins fold, does GroES bind to GroEL on the large-cavity side or the hydrophobic-stripe side?
• A: I believe it can bind to both sides. Don't worry about the details.

Proteins often consist of multiple independent domains and have 4° structure (slide 99)

CD4
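Selecting residues by type, as was done to colour the GroEL hydrophobic stripe, is easy to sketch. The Python example below uses the seven residue types listed on the GroEL slide. The short residue list is a made-up fragment, not part of GroEL.

```python
# The residue types coloured blue in the GroEL figure described on the GroEL slide.
HYDROPHOBIC = {"LEU", "ILE", "VAL", "MET", "PHE", "TYR", "TRP"}


def hydrophobic_positions(residue_names):
    """1-based positions of residues whose three-letter name is in HYDROPHOBIC."""
    return [pos for pos, name in enumerate(residue_names, start=1)
            if name.upper() in HYDROPHOBIC]


chain = ["GLY", "LEU", "LYS", "PHE", "Trp", "ASP"]   # hypothetical fragment
hits = hydrophobic_positions(chain)
print(hits, f"{len(hits) / len(chain):.0%} hydrophobic")  # [2, 4, 5] 50% hydrophobic
```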

Cro

hemoglobin

Rhinovirus

http://web.mit.edu/esgbio/www/cb/virus/virus.html

Immunoglobulin (antibody)

• Some polypeptide chains fold into two or more compact regions that may be connected by a flexible segment of polypeptide chain, rather like pearls on a string.
• These compact globular units, called domains, range in size from about 30 to 400 amino acid residues.
• For example, the extracellular part of CD4 (shown at top) comprises four similar domains of approximately 100 amino acids each. CD4 is the cell-surface protein on certain cells of the immune system to which the human immunodeficiency virus (HIV) attaches itself. Often, proteins are found to have domains in common even if their overall tertiary structures are different.
• Antibodies (immunoglobulins) have a distinct domain structure in addition to quaternary structure. We will be taking a much closer look at antibody structure in the next section.
• Quaternary structure refers to the spatial arrangement of subunits and the nature of their interactions.
• The simplest sort of quaternary structure is a dimer, consisting of two identical subunits. This organization is present in the DNA-binding protein Cro found in a bacterial virus called λ.
• More complicated quaternary structures also are common. More than one type of subunit can be present, often in variable numbers. For example, human hemoglobin, the oxygen-carrying protein in blood, consists of two subunits of one type (designated α) and two subunits of another type (designated β). Thus, the hemoglobin molecule exists as an α2β2 tetramer.
• Viruses make the most of a limited amount of genetic information by forming coats that use the same kind of subunit repetitively in a symmetric array. The coat of rhinovirus, the virus that causes the common cold, includes 60 copies each of four subunits. The subunits come together to form a nearly spherical shell that encloses the viral genome.
• Q: It is mentioned that the coat of rhinovirus includes 60 copies each of four subunits, but in the picture I only see three coloured subunits. Why is that?
• A: There is a 4th protein that is inside and not visible from the outside.

Post-translational modifications of proteins

Ubiquitylation (Lys)

Courtesy of Spencer Alford

• Many proteins are covalently modified, through the attachment of groups other than amino acids, to augment their functions. Many proteins, especially those that are present on the surfaces of cells or are secreted, acquire carbohydrate units on specific asparagine residues. The addition of sugars makes the proteins more hydrophilic and able to participate in interactions with other proteins. Conversely, the addition of a fatty acid to an α-amino group or a cysteine sulfhydryl group produces a more hydrophobic protein that will be tightly associated with the membrane. • Proteins can also be reversibly modified to regulate their activity. Perhaps the single most important of all modifications is the phosphorylation and dephosphorylation of serine, threonine, and tyrosine residues. Regulation of protein activity by phosphorylation is the basis of intracellular signalling. The enzymes that catalyze the addition of phosphate groups (from ATP donors) are called kinases (why kinases?). Enzymes that remove phosphate groups are called phosphatases. • Histones (proteins that assist in the packaging of DNA into chromosomes as well as in gene regulation) are rapidly acetylated and deacetylated on specific lysine residues in vivo. More heavily acetylated histones are associated with genes that are being actively transcribed. A more permanent modification of lysines in histone proteins is methylation. • The attachment of ubiquitin, a protein comprising 76 amino acids, is a signal that a protein is to be destroyed, the ultimate means of regulation. • This slide shows only a few common examples; many additional post-translational modifications are known.
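To make the idea of residue-specific modification concrete, here is a minimal Python sketch that lists the positions in a sequence that a given modification could, in principle, target. The table `PTM_TARGET_RESIDUES`, the function `candidate_sites`, and the toy peptide are hypothetical illustrations, not an existing tool. The sketch looks at residue identity only; real modifying enzymes also read the surrounding sequence (for example, N-linked glycosylation usually requires an Asn-X-Ser/Thr motif), so the positions returned are a superset of the sites actually modified.

```python
from __future__ import annotations

# Hypothetical lookup table (simplified from the slide):
# modification -> one-letter codes of the residues it can target.
PTM_TARGET_RESIDUES = {
    "N-linked glycosylation": {"N"},     # asparagine
    "phosphorylation": {"S", "T", "Y"},  # serine, threonine, tyrosine
    "acetylation": {"K"},                # lysine (e.g. in histones)
    "methylation": {"K"},                # lysine (e.g. in histones)
    "ubiquitylation": {"K"},             # lysine
    "fatty acylation": {"C"},            # cysteine sulfhydryl
}


def candidate_sites(sequence: str, modification: str) -> list[int]:
    """Return 1-based positions of residues the modification could target.

    Only residue identity is checked; the sequence context that real
    enzymes read is ignored, so this can over-count the modified sites.
    """
    targets = PTM_TARGET_RESIDUES[modification]
    return [pos for pos, residue in enumerate(sequence.upper(), start=1)
            if residue in targets]


if __name__ == "__main__":
    peptide = "MKSTYK"  # hypothetical toy peptide
    print(candidate_sites(peptide, "phosphorylation"))  # [3, 4, 5]
    print(candidate_sites(peptide, "ubiquitylation"))   # [2, 6]
```

Adding sequence-context rules (such as a motif check for glycosylation) would be the natural next refinement of this sketch.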

Post-translational modifications are catalyzed by enzymes and often serve as binding sites

One of the main functions of PTMs is to provide a binding site for a protein partner with a suitable general interaction domain. In this way, the location of proteins inside of cells can be dynamically changed.

Nature Reviews Molecular Cell Biology 7, 473-483

Figure 1 | Example post-translational modification reactions and structures of protein-interaction-domain–ligand complexes. Various amino-acid side chains can be modified by, for example: phosphorylation (a); methylation (b); acetylation (c); ubiquitylation (d); and hydroxylation (e). The enzymes that are involved in adding and removing these post-translational modifications are shown on the reaction arrows. The structures on the far right show examples of protein-interaction domains (pale purple) in complex with their respective ligands (red). The structures were obtained from the Protein Data Bank (accession codes 1JYR, 1Q3L, 1E6I, 1Q0W and 1LM8 for parts a–e, respectively) and were manipulated using Pymol (Delano Scientific). GCN5, general control of amino-acid-synthesis protein-5; GRB2, growth-factor-receptor-bound protein-2; HAT, histone acetyltransferase; HDAC, histone deacetylase; HIF1α, hypoxia-inducible factor-1α; HP1, heterochromatin protein-1; Me–Lys, methylated lysine; OH–Pro, hydroxylated proline; Pi, inorganic phosphate; P–Tyr, phosphotyrosine; SAH, S-adenosylhomocysteine; SAM, S-adenosylmethionine; SH2, Src-homology-2; SHC, SH2-domain-containing transforming protein; Ub, ubiquitin; UIM, ubiquitin-interacting motif; VHLβ, von Hippel–Lindau protein-β; Vps27, vacuolar protein sorting protein-27.

Figure 1 panels (modified residue; adding / removing enzyme; example reader domain and ligand):
a Phosphorylation: tyrosine → phosphotyrosine; protein kinase (ATP → ADP) / phosphatase (releases Pi); GRB2 SH2 domain bound to SHC P–Tyr
b Methylation: lysine → ε-N-monomethyllysine; methyltransferase (SAM → SAH) / amine oxidase demethylase (O2 → H2O2, releases H2C=O); HP1 chromodomain bound to histone H3 Me–Lys
c Acetylation: lysine → ε-N-monoacetyllysine; HAT (acetyl-CoA → CoA) / HDAC; GCN5 bromodomain bound to histone H4 Ac–Lys
d Ubiquitylation: lysine → N-ubiquitinyllysine; ubiquitin ligase (E3), ubiquitin + ATP / deubiquitylating isopeptidase; Vps27 UIM bound to ubiquitin
e Hydroxylation: proline → hydroxyproline; prolyl hydroxylase (O2) / dehydroxylase?; VHLβ bound to HIF1α OH–Pro

Excerpt from the accompanying page (474 | JULY 2006 | VOLUME 7 www.nature.com/reviews/molcellbio):

[...] words and phrases, and has engendered the hypothesis that protein domains represent a basic syntactic unit of cellular organization⁵. In this article, we briefly discuss the common ways in which PTMs and interaction domains synergize to regulate cellular processes, and we provide specific examples that involve phosphorylation, methylation, acetylation, ubiquitylation and sumoylation (FIGS 1,2). This is not intended to be a comprehensive analysis, but rather aims to highlight the general strategies through which PTMs exert their effects.

PTM-dependent interactions: common strategies. In the following subsections we briefly discuss the common mechanisms by which PTM-dependent interactions regulate cellular processes.

PTM-induced interactions. Interaction domains often recognize short peptide motifs that are embedded in target proteins, but do not bind stably until the peptide has acquired an appropriate PTM (FIGS 1,2a). Such domains usually have a conserved binding pocket for the modified residue and a more variable surface that selectively engages the flanking amino acids, and thereby distinguishes between different peptide motifs with the same PTM⁶⁻⁹. Both the domains and the peptide motifs that they recognize are modular in design and can therefore, in principle, be incorporated into many different proteins.

Cooperative interactions and multi-site PTMs. PTM-dependent interactions can be cooperative, such that a signal is only generated after two or more sites on the same protein have been modified. This can be achieved in various ways. First, a doubly modified motif can be recognized in an obligatory fashion by two tandem interaction domains, as is the case for the recognition of two phosphotyrosine (pTyr)-containing motifs by the tandem Src-homology-2 (SH2) domains of the ZAP-70 (ζ-chain (T-cell receptor)-associated protein kinase 70 kDa) protein tyrosine kinase¹⁰ (FIG. 2b). This can increase both the affinity and the specificity of the interaction. Second, a single domain can possess two binding pockets for the modified residues. For example, the single SH2 domain of the APS protein (adaptor molecule containing pleckstrin-homology (PH) and SH2 domains protein) binds to two pTyr residues in the activation loop of the insulin receptor kinase; furthermore, two APS SH2 domains form a non-covalent dimer, which potentially stabilizes the activated receptor¹¹. Third, a domain with a single binding pocket can bind specifically to a protein that carries several modifications. For example, the WD40-repeat domain of the Saccharomyces cerevisiae protein Cdc4 (cell-division cycle-4) only binds to its target, Sic1 (substrate inhibitor of cyclin-dependent protein kinase-1), when the target has been phosphorylated during the G1 phase of the cell cycle on at least six Ser/Thr residues (FIG. 2c). As Cdc4 is the substrate-targeting subunit of an SCF (Skp1–Cul1–F-box) E3 ubiquitin ligase complex, phosphorylation of Sic1 leads to its polyubiquitylation and degradation by the proteasome. This, in turn, is necessary for the onset of DNA synthesis, because Sic1 [...]

Many proteins are assembled from multiple interaction domains with predictable functions

Combining various modular binding domains with a catalytic domain makes sophisticated, dynamic regulation of protein activity possible. Evolution has evidently shuffled and recycled these domains many times to make the proteins that control cell signalling. Examples:

http://pawsonlab.mshri.on.ca Pawson and Nash, Science 300, 445 (2003)

Modular proteins interact with each other in complex, yet rational, pathways (circuits)

“…mammalian cyclin E activates cyclin-dependent kinase-2 (CDK2) to promote the G1-to-S phase transition in the cell cycle. The phosphorylation of cyclin E on a Thr residue is required for its recognition by the WD40-repeat domain of the targeting subunit, CDC4 (cell-division cycle-4), of an SCF (SKP1–CUL1–F-box) E3 ubiquitin-ligase complex. This interaction leads to the addition of a Lys48-linked polyubiquitin chain to cyclin E, which results in its subsequent recruitment to the proteasome by a ubiquitin-interacting motif (UIM) and its destruction.”

CDK2 is an important kinase that needs to be activated in a cell-cycle-dependent fashion. Specifically, it needs to be active at the G1-to-S transition point. While turning on the activity of the kinase is important, so is turning it off. This is accomplished by targeting the cyclin E activator for destruction by the proteasome.

Nature Reviews Molecular Cell Biology 7, 473-483

Figure 3 panels: Cell-cycle regulation | Cell signalling (EGF, EGF receptor, Ras–MAPK signalling) | Endocytic trafficking

Figure 3 | Networks of modification-dependent interactions regulate cellular processes. Networks of phosphorylation (P) and ubiquitin (Ub)-dependent protein interactions regulate the cell cycle, as well as growth-factor-induced signalling and endocytic trafficking. Inducible post-translational modification (PTM)-dependent interactions are highlighted by red arrows. Sequential PTM-dependent interactions generate biological pathways, as can be seen in two examples. In the first example, mammalian cyclin E activates cyclin-dependent kinase-2 (CDK2) to promote the G1-to-S phase transition in the cell cycle. The phosphorylation of cyclin E on a Thr residue is required for its recognition by the WD40-repeat domain of the targeting subunit, CDC4 (cell-division cycle-4), of an SCF (SKP1–CUL1–F-box) E3 ubiquitin-ligase complex. This interaction leads to the addition of a Lys48-linked polyubiquitin chain to cyclin E, which results in its subsequent recruitment to the proteasome by a ubiquitin-interacting motif (UIM) and its destruction. In the second example, epidermal growth factor (EGF) receptor autophosphorylation produces phosphotyrosine (pTyr) sites that can recruit the Src-homology-2 (SH2) domain of the E3 ubiquitin ligase Cbl (Casitas B-lineage lymphoma proto-oncogene). Cbl monoubiquitylates the receptor to provide docking sites for UIM-containing proteins such as epsin and EPS15 (EGF-receptor-pathway substrate-15), which are involved in endocytic membrane trafficking.
The EPS15 UIM also undergoes an intramolecular interaction with a monoubiquitylated site, which regulates EPS15 activity (see the ‘closed’ and ‘open’ conformations). The binding of the phosphotyrosine-binding (PTB) domain of SHC (SH2-domain-containing transforming protein) to a pTyr site on the EGF receptor induces the phosphorylation of a Tyr residue in SHC. This pTyr, in turn, recruits the SH2 domain of GRB2 (growth-factor-receptor-bound protein-2) and activates downstream signals. The figure also shows the convergence of distinct interaction domains on the recognition of a single PTM (for example, SH2 and PTB domains for pTyr) and the modification of the same polypeptide by different types of PTM (for example, the multi-site phosphorylation and ubiquitylation of the EGF receptor). AP2, adaptor protein complex-2; E1, ubiquitin- activating enzyme; E2, ubiquitin-conjugating enzyme; MAPK, mitogen-activated protein kinase.

[...] residues in a manner that depends on ligand phosphorylation and the identity of the flanking amino acids²⁶ (FIG. 2a). Activated receptor tyrosine kinases (RTKs), such as the β-platelet-derived growth factor receptor or the epidermal growth factor receptor (EGFR), become phosphorylated at multiple Tyr sites, and each of these sites selectively binds the SH2 domain of one or more cytoplasmic signalling proteins, which, in turn, activate specific intracellular signalling pathways²⁷⁻²⁹ (FIGS 3,4a). Among the SH2-domain-containing proteins that are recruited to an autophosphorylated RTK such as EGFR is the Cbl (Casitas B-lineage lymphoma proto-oncogene) E3 ubiquitin ligase, which subsequently ubiquitylates the activated receptor on Lys residues and forms binding sites for proteins with ubiquitin-interacting motifs (UIMs) that are involved in receptor endocytosis¹⁵ (FIGS 2d,3).

These data show that an activated RTK is modified at multiple sites by phosphorylation and ubiquitylation to yield motifs that are specifically recognized by the interaction domains of cytoplasmic effectors and regulators (FIG. 4a). As is discussed in the following sections, this strategy of multi-site protein modification followed by the selective recruitment of effectors through PTM-dependent interaction domains is more widely used by key regulatory proteins. For example, this mechanism is used by histones to modify chromatin organization (FIG. 4b) and by RNA polymerase II to recruit transcriptional regulators (FIG. 4c), although in these cases the principal PTMs that are recognized by interaction domains are acetylated/methylated Lys residues or pSer residues, respectively, as opposed to pTyr residues for RTKs (FIG. 4a).
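The recruitment logic in Figures 1 and 3 and in the excerpt (a reader domain binds only once its target carries the matching PTM) can be sketched as a toy model in Python. The `READER_SPECIFICITY` table, the `Protein` class and `binding_domains` are hypothetical names chosen for illustration. Real recognition also depends on the flanking amino acids and on binding affinity, which this sketch ignores, and the WD40 entry is simplified to phospho-Thr following the cyclin E example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Simplified, hypothetical mapping: reader domain -> PTM it recognizes,
# following the examples in Figures 1 and 3 (SH2 and PTB both read pTyr).
READER_SPECIFICITY = {
    "SH2": "pTyr",
    "PTB": "pTyr",
    "chromodomain": "me-Lys",
    "bromodomain": "ac-Lys",
    "UIM": "Ub-Lys",
    "WD40": "pThr",  # CDC4 reading phospho-Thr on cyclin E (simplified)
}


@dataclass
class Protein:
    name: str
    domains: list[str] = field(default_factory=list)
    ptms: set[str] = field(default_factory=set)  # PTMs currently present


def binding_domains(reader: Protein, target: Protein) -> list[str]:
    """Domains of `reader` that recognize a PTM currently present on `target`."""
    return [d for d in reader.domains
            if READER_SPECIFICITY.get(d) in target.ptms]


if __name__ == "__main__":
    egfr = Protein("EGF receptor")
    cbl = Protein("Cbl", domains=["SH2"])
    epsin = Protein("epsin", domains=["UIM"])

    print(binding_domains(cbl, egfr))    # [] (no pTyr yet)
    egfr.ptms.add("pTyr")                # receptor autophosphorylation
    print(binding_domains(cbl, egfr))    # ['SH2'] (Cbl is recruited)
    print(binding_domains(epsin, egfr))  # [] (not yet ubiquitylated)
    egfr.ptms.add("Ub-Lys")              # Cbl ubiquitylates the receptor
    print(binding_domains(epsin, egfr))  # ['UIM']
```

Because the PTM set is mutable, adding or removing a modification switches binding on or off, which is the "dynamically changed" location of proteins described on the slide.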

NATURE REVIEWS | MOLECULAR CELL BIOLOGY VOLUME 7 | JULY 2006 | 477

Disulphide bonds are another type of post-translational modification

Ribonuclease (PDB 5RSA)

• Disulphide bonds generally form only under oxidizing conditions. They are very common in extracellular proteins such as those found in blood. • Disulphide bonds generally do not form in the cytoplasm of living cells because it is a reducing environment (~5 mM glutathione, a thiol). • A protein that has disulphide bonds will tend to be very stably folded relative to proteins without disulphide bonds, since a covalent link holds two sections of chain together (as opposed to only non-covalent interactions in a protein that lacks disulphide bonds). You would therefore expect a protein with disulphide bonds to be more stable at elevated temperatures than a protein without them. • To maintain the activity of a protein with free thiols in vitro, it is important to add reducing agents such as β-mercaptoethanol, TCEP, or DTT.

In vitro, reducing agents are necessary to maintain thiols in a reduced form

Figure: structures of common reducing agents. Dithiothreitol (DTT): the open dithiol (reduced) form and the cyclic disulphide (oxidized) form interconvert by oxidation and reduction. β-mercaptoethanol (2-mercaptoethanol). Tris(2-carboxyethyl)phosphine hydrochloride (TCEP·HCl).
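When preparing a buffer with one of the solid reducing agents above, the amount to weigh out follows from mass = concentration × volume × molar mass. The minimal Python sketch below computes molar masses from the molecular formulas (DTT is C4H10O2S2; TCEP·HCl is C9H15O6P·HCl) and the milligrams needed. The function names are hypothetical, and the 1 mM / 50 mL values are arbitrary examples, not a recommended working concentration. β-mercaptoethanol is left out because it is a liquid that is normally dispensed by volume.

```python
from __future__ import annotations

# Standard atomic weights in g/mol (rounded).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999,
                 "S": 32.06, "P": 30.974, "Cl": 35.45}

# Molecular formulas as element -> atom count.
FORMULAS = {
    "DTT": {"C": 4, "H": 10, "O": 2, "S": 2},                # C4H10O2S2
    "TCEP-HCl": {"C": 9, "H": 16, "Cl": 1, "O": 6, "P": 1},  # C9H15O6P . HCl
}


def molar_mass(formula: dict[str, int]) -> float:
    """Molar mass (g/mol) from an element -> count dictionary."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


def mass_to_weigh_mg(reagent: str, conc_mM: float, volume_mL: float) -> float:
    """Milligrams of solid reagent giving `conc_mM` in `volume_mL` of buffer."""
    moles = (conc_mM / 1000) * (volume_mL / 1000)        # (mol/L) * L
    return moles * molar_mass(FORMULAS[reagent]) * 1000  # g -> mg


if __name__ == "__main__":
    # Example values only (hypothetical): 1 mM final in 50 mL of buffer.
    for name in FORMULAS:
        print(f"{name}: {molar_mass(FORMULAS[name]):.2f} g/mol, "
              f"{mass_to_weigh_mg(name, 1.0, 50.0):.2f} mg")
```

For DTT this works out to roughly 7.7 mg for 1 mM in 50 mL; the same arithmetic scales linearly with concentration and volume.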

• Don’t use β-mercaptoethanol (at least not in this building) if you can help it. It stinks. • DTT and TCEP are much better choices. These two reducing agents are similar in effectiveness. TCEP can be slightly more expensive, but it benefits from not having a free thiol (a good nucleophile). This is advantageous if you are working with electrophiles in solution.

The chromophore of green fluorescent protein is a unique post-translational modification

Figure: Ser65–Tyr66–Gly67 → cyclization → oxidation & dehydration → mature chromophore

MSKGEELFTGVVPILVELDGDVNGQKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL
VTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFYKDDGNYKTRAEVKFEGDTLV
NRIELKGIDFKEDGNILGHKMEYNYNSHNVYIMADKPKNGIKVNFKIRHNIKDGSVQLAD
HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMILLEFVTAAGITHGMDELYK

Prasher et al. Gene (1992) 111: 229-233. Tsien, Annu. Rev. Biochem. (1998) 67: 509-544.
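Because the GFP chromophore is built from residues 65 to 67 of the polypeptide, it can be located directly in the GFP sequence above and changed by a point mutation. The minimal Python sketch below finds the Ser-Tyr-Gly tripeptide and applies a substitution written in the usual "Y66W" notation (wild-type residue, 1-based position, new residue); Tyr66→Trp is the substitution associated with cyan variants. The helper names are hypothetical, and the sketch only edits a string: practical cyan variants typically carry additional substitutions, and nothing here predicts fluorescence.

```python
# GFP sequence as given on the slide (238 residues), split into 60-residue lines.
GFP = (
    "MSKGEELFTGVVPILVELDGDVNGQKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTL"
    "VTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFYKDDGNYKTRAEVKFEGDTLV"
    "NRIELKGIDFKEDGNILGHKMEYNYNSHNVYIMADKPKNGIKVNFKIRHNIKDGSVQLAD"
    "HYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMILLEFVTAAGITHGMDELYK"
)


def find_motif(seq: str, motif: str) -> int:
    """Return the 1-based position of the first residue of `motif`, or -1."""
    index = seq.find(motif)
    return index + 1 if index >= 0 else -1


def point_mutant(seq: str, mutation: str) -> str:
    """Apply a substitution written like 'Y66W' (wild type, position, new)."""
    wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[position - 1] != wild_type:
        raise ValueError(f"residue {position} is {seq[position - 1]}, not {wild_type}")
    return seq[:position - 1] + new + seq[position:]


if __name__ == "__main__":
    start = find_motif(GFP, "SYG")
    print(len(GFP), start)                 # 238 65  (Ser65-Tyr66-Gly67)
    cyan_like = point_mutant(GFP, "Y66W")
    print(cyan_like[start - 1:start + 2])  # SWG
```

The wild-type check in `point_mutant` guards against off-by-one numbering errors, which are a common pitfall when translating between 1-based residue numbers and 0-based string indices.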

• Yet another type of post-translational modification is generated by chemical rearrangements of side chains and, sometimes, the peptide backbone. For example, the jellyfish Aequorea victoria produces a green fluorescent protein (GFP). Surprisingly, its chromophore is not a bound cofactor but rather a post-translationally modified sequence of amino acids. • The chromophore (it is also okay to call it the fluorophore) is formed by the protein-promoted rearrangement and oxidation of the sequence Ser-Tyr-Gly within the center of the protein. GFP is of tremendous utility to researchers as a marker within cells. The 2008 Nobel Prize in Chemistry was awarded to 3 researchers (Tsien, Chalfie, and Shimomura) who are considered pioneers in the development of GFP as a research tool. • The spontaneous formation of the Aequorea green fluorescent protein chromophore within the folded β-can protein structure must involve at least three key steps: cyclization of the main chain, loss of a molecule of water (dehydration), and oxidation by molecular oxygen. The exact order and mechanism of these steps are a matter of ongoing investigation. • Chromophore formation is spontaneous only within the context of the fluorescent protein β-can structure, where steric constraints force the peptide into a tight turn conformation (Branchini et al. 1998) and the side chains of highly conserved residues, such as glutamate 222 and arginine 96, are positioned to facilitate the reaction.

Fluorescent proteins engineered to fluoresce at different wavelengths

• Note that, because the GFP chromophore is generated from amino acids, we can change its structure by introducing mutations in the gene that change the amino acids in the chromophore-forming tripeptide (or its immediate surroundings). • For example, changing the tyrosine of the GFP chromophore to other aromatic amino acids produces new chromophore structures that fluoresce at different wavelengths from the wild-type protein.

Visualizing the central dogma in a live cell!

Figure labels: an immortalized human cancer cell known as U2OS; cyan fluorescence = Lac operator in nucleus and peroxisomes in cytoplasm; yellow fluorescence = RNA transcripts.

Gene for LacI - cyan FP; gene for MS2bp - yellow FP

Inserted gene: 256 x Lac operator | promoter | cyan FP (peroxisome-targeted) | 24 x MS2-repeats

This cell has 3 different genes artificially introduced into it: • LacI - cyan FP, the gene for LacI fused to the gene for a cyan fluorescent protein • MS2bp - yellow FP, the gene for MS2bp fused to the gene for yellow fluorescent protein • A peroxisome-targeted cyan FP followed by 24 copies of MS2 in the 3′ UTR. This gene is preceded by 256 copies of the Lac operator, which are not transcribed.
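The logic of this construct (only elements downstream of the promoter end up in the transcript) and the fluorophore bookkeeping discussed in the Q&A can be captured in a toy Python model. The `CONSTRUCT` list and both helper functions are hypothetical; the count assumes one fluorescent protein per binding site, the same simplification the lecture uses, so the ratio is only a rough brightness comparison between the DNA locus and a single transcript.

```python
# Hypothetical 5' -> 3' description of the inserted gene: (element, copies).
CONSTRUCT = [
    ("lac operator", 256),
    ("promoter", 1),
    ("cyan FP (peroxisome-targeted)", 1),
    ("MS2 hairpin", 24),
]


def transcribed_elements(construct):
    """Elements after the promoter; the upstream lac operator array is not transcribed."""
    names = [name for name, _ in construct]
    start = names.index("promoter") + 1
    return construct[start:]


def copies(elements, name):
    """Number of copies of a named element (0 if absent)."""
    return sum(n for element, n in elements if element == name)


if __name__ == "__main__":
    transcript = transcribed_elements(CONSTRUCT)
    print(transcript)
    # One fluorescent protein per binding site (the lecture's simplification):
    locus_cyan = copies(CONSTRUCT, "lac operator")      # LacI-cyan FP on the DNA locus
    per_rna_yellow = copies(transcript, "MS2 hairpin")  # MS2bp-yellow FP per transcript
    print(locus_cyan, per_rna_yellow, round(locus_cyan / per_rna_yellow, 1))  # 256 24 10.7
```

So a single transcript carries roughly a tenth as many fluorophores as the tagged gene locus, even though many transcripts are produced from that one locus.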

• Definitions: • U2OS cell line: a human cell line cultivated from the bone tissue of a fifteen-year-old human female suffering from osteosarcoma (the same cancer as Terry Fox had). • LacI: a bacterial protein (the lac repressor) that binds tightly to the section of DNA known as the Lac operator. • Lac operator: a DNA sequence that binds to LacI. • MS2: an RNA sequence that forms a hairpin structure. This RNA hairpin binds tightly to the viral protein that I have called MS2bp. • MS2bp (MS2-binding protein): a viral coat protein that binds tightly to the MS2 RNA sequence. • cyan FP: the cyan fluorescent variant of the green fluorescent protein • yellow FP: the yellow fluorescent variant of the green fluorescent protein • peroxisomes: small organelles in the cytoplasm of eukaryotic cells. They have a role in destroying peroxides in the cell. • This is the work of the labs of David L. Spector and Robert H. Singer. • Q. This is about the topic of visualizing the central dogma in class today. In the example of the sequence having 256 copies of the Lac operator and 24 copies of MS2, the figure shows the cyan and yellow fluorescence separately. Shouldn't more cyan fluorescence be seen, since the sequence has 256 copies of the Lac operator and only 24 copies of MS2? But the figure actually shows more yellow spots than cyan spots in the nucleus. Could you please explain the difference? • A. All of the copies of the Lac operator DNA sequence are located in the same place, and there is only one copy of this 256-copy array in the cell (i.e., there is only one copy of the genome). There are many, many RNA transcripts created by transcription of this DNA sequence. As we saw earlier in the course, one DNA sequence can be read by many RNA polymerases over and over again to produce many mRNA molecules. Each of these transcripts carries the 24 copies of MS2 and thus shows up as a yellow fluorescent spot. You would expect each yellow spot (corresponding to one transcript) to be substantially dimmer than the cyan spot in the nucleus: each transcript has 24 fluorophores attached to it, while the DNA sequence has 256 fluorophores attached to it. • Q. Visualizing the central dogma in a living cell: there are three arrows pointing to the nucleus. One is for LacI, one is for a DNA sequence, and another is for MS2bp. Do LacI and MS2bp also stand for DNA sequences? They would be transcribed and translated to cyan FP and yellow FP, respectively. The translated cyan FP would bind to the Lac operator and the translated yellow FP would bind to the MS2 RNA sequence. Is that right? • A. Correct. The cell has 3 different genes introduced into its genome. One of these is the gene for LacI-cyan FP, and it would be transcribed and translated to form the LacI-cyan FP protein. Likewise, the gene for MS2bp-yellow FP would be transcribed and translated to form the MS2bp-yellow FP protein. You are correct that these proteins then bind to the Lac operator (in DNA) and the MS2 RNA sequence (in RNA), respectively.

Visualizing the central dogma in a live cell! (example 1)

cyan fluorescence = Lac operator in nucleus and peroxisomes in cytoplasm; yellow fluorescence = RNA transcripts

• Imaging gene expression in single living cells. Nat Rev Mol Cell Biol 5(10):855-862 (2004 October) • Visualizing gene expression in living cells. The movie shows a cell with a stably integrated gene that also contains 256 lac operator repeats. This gene transcribes an RNA that contains both a coding sequence for the cyan fluorescent protein (CFP) (with a peroxisome-targeting sequence) and a stretch of MS2 stem-loops. At the beginning of the movie, the gene locus is visible as a result of tagging of the DNA with a CFP–lac-repressor protein. Once transcription is induced from this gene, the locus becomes structurally open and decondenses. The RNAs produced from the gene are tagged with yellow fluorescent protein (YFP)–MS2 and can be seen accumulating at the transcription site. The RNA is translated in the cytoplasm and, at later times post-induction, CFP-labelled peroxisomes are detected. The cell was imaged every 2.5 min for a total of 4 hr and 22.5 min. • http://singerlab.aecom.yu.edu/supplements/natrevmcb_v5p855/movies03.htm • Q: The lac operator is connected to the DNA sequence of the cyan FP, and when transcription happens, cyan FP will be generated and the lac operator will not be transcribed? And the lac operator will not be replicated in the nucleus? • A: Transcription starts from the promoter, so only the elements that come after the promoter, specifically the cyan FP (peroxisome-targeted) and the 24 x MS2 repeats, will be transcribed. The 256 lac operator copies come before the promoter, so they are not transcribed. The 256 lac operator copies are there only to serve as binding sites for LacI-cyan FP, so that the location of the inserted DNA sequence can be visualized by fluorescence imaging. All of the DNA will be replicated when the cell divides, but otherwise there will be just one copy of the DNA in the cell. • Q: The MS2 is an RNA sequence and it is connected to the DNA sequence of the yellow FP?
What is the function of the MS2 here, and can it be replicated? • A: 'MS2' is an RNA sequence that forms a hairpin structure. 'MS2 binding protein' (MS2bp) is a protein that binds to the MS2 hairpin. The cell contains the gene for MS2bp fused to yellow FP. When this fusion protein is made in the cell, it will stick to the RNA molecules that contain the 24 x MS2 repeats and allow the RNA molecules to be visualized by fluorescence imaging. • Q: Is one lac operator connected to one cyan FP? • A: The lac operator isn't connected to anything. It is just a DNA sequence that is not transcribed. Lac repressor (LacI) is a protein that binds to the lac operator. The cell contains the gene for LacI fused to cyan FP. When this protein is made, 256 copies of it will stick to the 256 repeats of the lac operator in the DNA. • Q: Will the content of visualizing the central dogma in a live cell in lecture 4 be included in the exam? • A: Everything that was covered in class and/or is in assigned reading could be included in the exam.

Visualizing the central dogma in a live cell! (example 2)

cyan fluorescence = Lac operator in nucleus and peroxisomes in cytoplasm; yellow fluorescence = RNA transcripts


• Dynamics of Single mRNPs in Nuclei of Living Cells. Science 304(5678):1797-1800 (2004 June 18) • 1. Detection of open gene locus by CFP-lac repressor. • 2. Detection of cytoplasmic CFP-peroxisomes (different plane than 1). • 3. Detection of YFP-MS2 nuclear mRNPs and YFP-MS2 accumulation at the transcription site. • 4. Different threshold of same cell showing cytoplasmic YFP-MS2 mRNPs. • An important conclusion from this work is that RNA transcripts diffuse freely inside the nucleus. This is different from the cytoplasm, where they are actively transported. • http://singerlab.aecom.yu.edu/supplements/science_v304p1797/movies.htm • Q: In the example of "visualizing the central dogma in a living cell", you mentioned that we tag the MS2 RNA with the YFP. My question is: where does the YFP come from? I found no DNA sequence that encodes the YFP. • A: The cell also contains a gene encoding MS2bp-YFP. • Q. I have a question about cyan FP. You mentioned that it is peroxisome-targeted; what do you mean by that? Do you mean that it will interact with the peroxisome? What will happen to it? Its coding sequence is still in the transcribed region and should appear in the RNA transcript, so do we have both cyan and yellow color? • A. By genetically fusing a protein (including FPs) to specific peptide sequences, it can be targeted to different compartments of the cell. For example, there are also specific sequences that send proteins to the nucleus and the mitochondria. In the case of peroxisomes, a simple SKL tripeptide causes a protein to be targeted to this compartment. A peroxisome is a membrane-enclosed organelle, and the protein will accumulate inside of it. Adding a targeting sequence does not affect the color of the FP. These sequences are typically added to the N- or C-terminal tails of the protein. • Q. If LacI-cyan FP binds only to the Lac operator DNA sequences, why does the whole nucleus appear cyan? • A.
There is an excess of LacI-cyan FP relative to the number of binding sites. It is this unbound protein that is causing the whole nucleus to appear cyan.
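The peroxisome-targeting answer above describes fusing a short peptide to the end of a protein. The minimal Python sketch below shows that sequence-level step for the SKL tripeptide, which works at the extreme C-terminus (it is often called PTS1, peroxisomal targeting signal 1). The function names are hypothetical, and the check looks only at the final three residues; it ignores every other targeting signal and says nothing about whether import actually succeeds in a cell.

```python
# Sketch: add a C-terminal SKL tripeptide to a protein sequence.
# Function names are hypothetical illustrations.

PEROXISOME_TAG = "SKL"


def add_c_terminal_tag(protein: str, tag: str = PEROXISOME_TAG) -> str:
    """Append `tag` at the C-terminus unless the sequence already ends with it."""
    return protein if protein.endswith(tag) else protein + tag


def ends_with_tag(protein: str, tag: str = PEROXISOME_TAG) -> bool:
    """Check only the extreme C-terminus; other targeting signals are ignored."""
    return protein.endswith(tag)


if __name__ == "__main__":
    fp = "MSKGEELFTG"  # first 10 residues of the GFP sequence shown earlier
    tagged = add_c_terminal_tag(fp)
    print(tagged, ends_with_tag(tagged))  # MSKGEELFTGSKL True
```

Appending rather than inserting the tag reflects the design point that this particular signal must sit at the very end of the chain; an SKL buried inside the sequence would not be detected by `ends_with_tag`.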