Paper No. : 04 Genetic engineering and recombinant DNA technology Module : 24 Understanding the architecture of Proteins, protein to proteome
Principal Investigator: Dr Vibha Dhawan, Distinguished Fellow and Sr. Director, The Energy and Resources Institute (TERI), New Delhi
Co-Principal Investigator: Prof S K Jain, Professor of Medical Biochemistry, Jamia Hamdard University, New Delhi
Paper Coordinator: Dr Mohan Chandra Joshi, Assistant Professor, Jamia Millia Islamia, New Delhi
Content Writer: Dr. Pravir Kumar, Professor, Department of Biotechnology, Delhi Technological University
Content Reviewer: Dr Rohini Muthuswami, Associate Professor, Jawaharlal Nehru University
Genetic engineering and Recombinant DNA technology Biotechnology Understanding the architecture of Proteins, protein to proteome
Description of Module
Subject Name: Biotechnology
Paper Name: Genetic engineering and Recombinant DNA technology
Module Name/Title: Understanding the architecture of Proteins, protein to proteome
Module Id: 24
Pre-requisites
Objectives
Keywords
e-Content
Introduction: Amino acids and protein
Amino acids are compounds made up of carbon, oxygen, hydrogen and nitrogen (and, in some cases, sulphur). They are the monomers of proteins and consist of a carboxyl group, an amino group, a hydrogen atom and a variable side chain, all bonded to the α-carbon. In an α-amino acid, both the amino and carboxylate groups are bonded to the α-carbon. The various α-amino acids differ in the side chain (R-group) attached to their α-carbon. The common structure of an amino acid is:
Figure 1: Common structure of an amino acid.
This structure is common to all of the α-amino acids except one (proline). The R-group, or side chain, attached to the α-carbon is different in each amino acid. In the simplest case, the R-group is a hydrogen atom and the amino acid is glycine.
Figure 2: Structure of glycine and lysine.
In α-amino acids, both the carboxyl and amino groups are bonded to the same carbon atom. However, many naturally occurring amino acids that are not found in proteins have structures that differ from those of the α-amino acids. In these compounds the amino group is attached to a carbon atom other than the α-carbon, and they are called β, γ, δ or ε amino acids depending on the carbon atom to which the amino group is attached.
Standard and non-standard amino acids
More than 300 amino acids are present in cells; however, only 22 amino acids are incorporated into proteins by the ribosome during protein biosynthesis. These are known as standard or proteinogenic amino acids. Some amino acids are more abundant in proteins than others. Four amino acids (serine, lysine, leucine and glutamic acid) are the most abundant residues in a typical protein, whereas tryptophan and methionine are rare. Standard L-α-amino acids are specified by simple three-letter codons. α-Amino acids can be classified by the properties of their side chains, in particular their polarity, or tendency to interact with H2O at physiological pH. The polarity of the side chain varies widely, from non-polar and hydrophobic to highly polar and hydrophilic.
1) Amino acids with non-polar side chains
Among the standard amino acids, nine contain non-polar side chains: alanine, glycine, leucine, valine, proline, isoleucine, phenylalanine, methionine and tryptophan. Proline differs from the other members because its side chain is bonded to both the nitrogen and the α-carbon atoms. Tryptophan and phenylalanine have aromatic side chains: the side chain of tryptophan contains an indole ring, whereas that of phenylalanine contains a phenyl ring.
2) Amino acids with uncharged polar side chains
Six amino acids contain uncharged polar side chains: threonine, cysteine, serine, asparagine, tyrosine and glutamine. Three of them (threonine, serine and tyrosine) have a hydroxyl group in the side chain. Cysteine is similar to serine but contains a thiol, or sulfhydryl, group in place of the OH group.
3) Amino acids with charged polar side chains
Positively charged R group: arginine and lysine have side chains that carry positively charged groups at neutral pH. Lysine has an amino group, whilst arginine contains a guanidinium group. Histidine contains an imidazole group, an aromatic ring that can carry a positive charge near neutral pH.
Negatively charged R group: glutamate and aspartate have acidic side chains that carry negatively charged groups at physiological pH.
Figure 3: Amino acids with polar and non-polar side chains.
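The side-chain classification described above can be written down as a small lookup table. The following Python sketch is only illustrative: the names `SIDE_CHAIN_CLASS`, `CLASS_OF`, `classify` and `composition` are hypothetical, and the grouping simply follows this module (histidine is listed with the positively charged group, as in the text).

```python
# Side-chain classes as grouped in this module (20 amino acids, three-letter names).
SIDE_CHAIN_CLASS = {
    "non-polar": ["Ala", "Gly", "Leu", "Val", "Pro", "Ile", "Phe", "Met", "Trp"],
    "uncharged polar": ["Thr", "Cys", "Ser", "Asn", "Tyr", "Gln"],
    "positively charged": ["Arg", "Lys", "His"],
    "negatively charged": ["Glu", "Asp"],
}

# Invert the table so that a residue name maps to its class.
CLASS_OF = {aa: cls for cls, aas in SIDE_CHAIN_CLASS.items() for aa in aas}


def classify(residues):
    """Return the side-chain class of each residue in a list of three-letter names."""
    return [CLASS_OF[r] for r in residues]


def composition(residues):
    """Count how many residues fall into each side-chain class."""
    counts = {cls: 0 for cls in SIDE_CHAIN_CLASS}
    for r in residues:
        counts[CLASS_OF[r]] += 1
    return counts


if __name__ == "__main__":
    peptide = ["Gly", "Ser", "Lys", "Asp", "Leu"]  # hypothetical example peptide
    print(classify(peptide))
    print(composition(peptide))
```

A table of this kind is a simplification: polarity is a continuum, and other textbooks place some residues (for example glycine, cysteine or tyrosine) in different groups.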
The structures of the 20 standard amino acids, both hydrophilic and hydrophobic, are shown in the figures below.
Figure: Structures of hydrophilic amino acids.
Figure 4: Structures of hydrophobic amino acids.
Non-standard amino acids
Apart from the 22 standard amino acids, all other amino acids are not ribosomally incorporated into proteins and are known as non-standard amino acids. Selenocysteine is introduced during protein biosynthesis rather than created through post-synthetic modification. It contains selenium in place of the sulphur of its structural analogue, cysteine. Because selenocysteine is incorporated into polypeptides during translation, it is referred to as the 21st amino acid; it is specified by the UGA codon, which normally acts as a stop codon. Similarly, pyrrolysine, which is similar to lysine and is found in some archaeal and bacterial proteins, is considered to be the 22nd amino acid. It is coded by the UAG codon, which is also normally a stop codon.
Figure 5: Structures of Nonstandard amino acids.
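The reuse of the stop codons UGA and UAG for selenocysteine and pyrrolysine can be illustrated with a toy translation routine. This is a minimal sketch under stated assumptions: the function `translate` and its `recode` parameter are hypothetical, the one-letter symbols U (selenocysteine) and O (pyrrolysine) are the conventional ones, and recoding is applied to every occurrence of the codon, whereas in the cell it depends on additional sequence context and dedicated machinery.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, recode=None):
    """Translate an mRNA reading frame into one-letter amino acid codes.

    `recode` optionally maps codons to a different residue, e.g.
    {"UGA": "U"} to read UGA as selenocysteine instead of stop.
    Translation ends at the first codon that still means stop.
    """
    recode = recode or {}
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        aa = recode.get(codon, CODON_TABLE[codon])
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    message = "AUGUGAAAAUAG"  # hypothetical mRNA: AUG UGA AAA UAG
    print(translate(message))                               # UGA read as stop
    print(translate(message, recode={"UGA": "U"}))          # UGA read as selenocysteine
    print(translate(message, recode={"UGA": "U", "UAG": "O"}))
```

With the default table the example message stops at UGA; with recoding, the same message yields a longer product that contains the non-canonical residues.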
Protein structural organization
Proteins are unbranched polymers constructed from the 22 standard α-amino acids. They have four levels of structural organization. Primary structure, the amino acid sequence, is specified by genetic information. As the polypeptide chain folds, it forms certain localized arrangements of adjacent amino acids that constitute secondary structure. The overall 3-D shape that a polypeptide assumes is known as tertiary structure. Finally, proteins that contain two or more polypeptide subunits are said to have quaternary structure.
Figure 6: Structural organization of protein.
1) Primary structure
The primary structure of a polypeptide is its amino acid sequence. The amino acids are connected via peptide bonds. The primary structure of a polypeptide determines its higher levels of structural organization.
2) Secondary structure
The most common types of secondary structure are the α-helix and the β-pleated sheet. Both patterns are stabilized by H-bonds between the carbonyl and N-H groups in the polypeptide backbone. The α-helix is a rigid, rod-like structure that forms when a polypeptide chain twists into a helical conformation. The screw sense of an α-helix can be right-handed or left-handed; however, right-handed helices are energetically more favourable. An α-helix is usually 10-15 amino acid residues long. Intrachain H-bonds form between the N-H group of each amino acid and the carbonyl group of the amino acid four residues away. β-pleated sheets form when two or more polypeptide chain segments line up side by side. Each individual segment is referred to as a β-strand.
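The "four residues away" rule for intrachain H-bonds in an α-helix can be enumerated directly. The sketch below is illustrative only: `helix_hbonds` is a hypothetical helper, residues are numbered from 1, and it assumes the usual convention that the carbonyl group of residue i pairs with the N-H group of residue i + 4.

```python
def helix_hbonds(n_residues):
    """List backbone H-bond pairs (i, i + 4) for an ideal alpha-helix.

    Each pair joins the carbonyl group of residue i to the N-H group
    of residue i + 4; residues are numbered starting from 1.
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


if __name__ == "__main__":
    # A 12-residue helix, within the typical 10-15 residue range.
    for donor_acceptor in helix_hbonds(12):
        print(donor_acceptor)
```

An ideal helix of n residues therefore contains n - 4 such bonds, which is why the first four N-H groups and the last four carbonyl groups of a helix have no intrahelical partner.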
3) Tertiary structure
The term tertiary structure refers to the unique 3-D conformation that a globular protein assumes as a result of the interactions between the side chains of the residues in its primary structure. All the information needed to fold a protein into its native tertiary structure is contained within the primary structure of the peptide chain itself. The following types of covalent and non-covalent interactions stabilize the tertiary structure of a protein:
Non-covalent interactions: H-bonds, electrostatic interactions, hydrophobic interactions and van der Waals forces.
Covalent bonds: The most prominent covalent bonds in tertiary structure are the disulfide bonds present in several extracellular proteins. Cysteine has a thiol group and is unique among the 20 amino acids in that it often forms a disulfide bond with another cysteine residue through the oxidation of their thiol groups.
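Because a disulfide bond always joins two cysteine residues, the cysteines in a sequence set an upper limit on how many such bonds a chain can form. The following sketch, with hypothetical helper names and a made-up example sequence, lists the cysteine positions, the possible pairings and that upper limit. It says nothing about which pairs actually form, since that depends on the folded structure and the redox environment.

```python
from itertools import combinations


def cysteine_positions(seq):
    """Return 1-based positions of cysteine (C) in a one-letter sequence."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "C"]


def candidate_disulfides(seq):
    """Return every possible pair of cysteine positions."""
    return list(combinations(cysteine_positions(seq), 2))


def max_disulfides(seq):
    """Upper bound on intrachain disulfide bonds: each bond uses two cysteines."""
    return len(cysteine_positions(seq)) // 2


if __name__ == "__main__":
    seq = "ACDCEC"  # hypothetical sequence with three cysteines
    print(cysteine_positions(seq))
    print(candidate_disulfides(seq))
    print(max_disulfides(seq))
```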
4) Quaternary structure
Numerous proteins, such as haemoglobin, are made up of two or more polypeptide chains; each polypeptide chain is known as a subunit. The subunits of a multimeric protein may be identical or quite different. Polypeptide subunits assemble to form the quaternary structure, which is stabilized by non-covalent interactions, including hydrophobic interactions, electrostatic interactions and H-bonds, as well as by covalent cross-links.
Further, protein folding and conformation depend on the following factors: peptide bond rotation and R-group orientation; weak non-covalent interactions; protein shape and the sequence of amino acids; conformation and free energy; and denaturation and renaturation of proteins (factors: heat, pH, chemicals and time).
Post-translational modification
Acetylation
Acetylation is a reaction that introduces an acetyl functional group into a chemical compound, namely the substitution of an acetyl group for an active hydrogen atom. Conversely, deacetylation is the removal of an acetyl group. A reaction in which the hydrogen atom of a hydroxyl group (OH) is replaced by an acetyl group (CH3CO) yields a specific ester, the acetate, so that the hydroxyl group becomes an acetoxy group. Acetic anhydride is commonly used as an acetylating agent that reacts with free hydroxyl groups. For example, it is used in the synthesis of aspirin, heroin and THC-O-acetate.
Phosphorylation
Phosphorylation is the addition of a phosphoryl group (PO3) to a molecule. In biology, phosphorylation and its counterpart, dephosphorylation, are critical for many cellular processes. A large fraction of proteins are temporarily phosphorylated, as are many sugars, lipids and other molecules. Phosphorylation is especially important for protein function, as this modification activates (or deactivates) almost half of the enzymes, thereby regulating their function. Protein phosphorylation is considered the most abundant post-translational modification in eukaryotes.
Hydroxylation
Hydroxylation is a chemical process that introduces a hydroxyl (OH) group into an organic compound. In biochemistry, hydroxylation reactions are often facilitated by enzymes called hydroxylases. Hydroxylation is also crucial for the oxidative degradation of organic compounds in air. It is pivotal in detoxification as well, because it converts lipophilic compounds into hydrophilic products that are more readily removed by the kidneys or liver and excreted. Some drugs, including steroids, are activated or deactivated by hydroxylation.
DNA methylation
DNA methylation is an important biochemical process in which methyl groups are added to the DNA molecule. Methylation can alter the activity of a DNA segment without changing its sequence. When it is located in a gene promoter, it typically acts to repress gene transcription. It plays a decisive role in normal development and is associated with numerous key processes, including X-chromosome inactivation, genomic imprinting, carcinogenesis, repression of transposable elements and aging.
Carboxylation
In biochemistry, carboxylation is a post-translational modification of glutamate residues to γ-carboxyglutamate in proteins. It occurs primarily in proteins involved in the blood clotting cascade, and these proteins require the modification in order to function. Carboxylation occurs in the liver and is performed by γ-glutamyl carboxylase. The carboxylase requires vitamin K as a cofactor and performs the reaction in a processive manner. γ-Carboxyglutamate binds calcium, which is essential for the activity of these proteins. For example, in prothrombin, calcium binding allows the protein to associate with the plasma membrane in platelets, bringing it into close proximity with the proteins that cleave prothrombin to active thrombin after injury.
Glycosylation
Glycosylation is a form of co-translational and post-translational modification. In biochemistry, glycosylation refers in particular to the enzymatic cascade that attaches glycans to proteins, lipids or other organic molecules. This enzymatic cascade generates one of the fundamental biopolymers found in cells. In this process, a carbohydrate, i.e. a glycosyl donor, is attached to a hydroxyl or other functional group of a glycosyl acceptor. The majority of proteins synthesized in the rough endoplasmic reticulum (RER) undergo glycosylation. It is an enzyme-directed, site-specific process. Glycosylation is also present in the cytoplasm and nucleus as the O-GlcNAc modification. Five classes of glycans are produced:
1) N-linked glycans, bonded to the nitrogen of asparagine side chains. N-linked glycosylation requires the participation of a special lipid called dolichol phosphate.
2) O-linked glycans, attached to the hydroxyl oxygen of serine, tyrosine, threonine and hydroxyproline side chains, or to oxygens on lipids such as ceramide.
3) Phospho-glycans, attached through the phosphate of a phospho-serine.
4) C-linked glycans, a rare form of glycosylation in which a sugar is added to a carbon on a tryptophan side chain.
5) Glypiation, the addition of a GPI anchor that links proteins to lipids through glycan linkages.
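Since N-linked glycans are attached to asparagine, candidate sites can be located by scanning a sequence. The sketch below assumes the widely used consensus sequon Asn-X-Ser/Thr, where X is any residue except proline; this motif is not stated in this module and is brought in only for illustration. The names `SEQUON` and `n_glycosylation_sites` are hypothetical, and a matching motif only marks a potential site, not one that is necessarily glycosylated.

```python
import re

# Asparagine followed by any residue except proline, then serine or threonine.
# The lookahead lets overlapping motifs be reported individually.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sites(seq):
    """Return 1-based positions of asparagines inside an N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    seq = "MNGTANPSNAS"  # hypothetical sequence
    print(n_glycosylation_sites(seq))
```

In the example, the asparagine followed by proline is skipped, while the asparagines in N-G-T and N-A-S are reported.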
Ubiquitination
The addition of ubiquitin to a substrate protein is called ubiquitination or, less frequently, ubiquitylation. Ubiquitination affects proteins in many ways: it can mark them for degradation by the proteasome, alter their cellular location and activity, and promote or prevent protein interactions. It involves three main steps, activation, conjugation and ligation, which are performed by ubiquitin-activating enzymes (E1), ubiquitin-conjugating enzymes (E2) and ubiquitin ligases (E3), respectively. The outcome of this sequential cascade is the attachment of ubiquitin to lysine residues on the protein substrate via an isopeptide bond, to cysteine residues through a thioester bond, to serine and threonine residues through an ester bond, or to the N-terminus of the protein via a peptide bond.
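The relationship between the acceptor residue and the type of bond formed with ubiquitin can be summarized as a lookup table. The sketch below is only an illustration of that mapping: `LINKAGE` and `possible_ubiquitin_linkages` are hypothetical names, the example sequence is made up, and the function lists chemically possible attachment points rather than predicting which residues are modified in a real protein.

```python
# Bond formed with ubiquitin for each acceptor residue (one-letter codes),
# as described in the text above.
LINKAGE = {
    "K": "isopeptide bond",  # lysine
    "C": "thioester bond",   # cysteine
    "S": "ester bond",       # serine
    "T": "ester bond",       # threonine
}


def possible_ubiquitin_linkages(seq):
    """List (position, residue, bond type) for every possible acceptor.

    Positions are 1-based. The N-terminal residue is always included,
    because its free amino group can form a peptide bond with ubiquitin.
    """
    sites = [(1, seq[0], "peptide bond (N-terminus)")] if seq else []
    sites += [
        (i, aa, LINKAGE[aa])
        for i, aa in enumerate(seq, start=1)
        if aa in LINKAGE
    ]
    return sites


if __name__ == "__main__":
    for site in possible_ubiquitin_linkages("MKAC"):  # hypothetical sequence
        print(site)
```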