Bioinformatics Exercises: Bovine Lactate Dehydrogenase (LDH)

CH/BI 421/621/527 F15 Bioinformatics Worksheet for LDH

BACKGROUND: Often primary structure (amino acid sequence) is the first piece of experimental information a biochemist wants to have about a protein s/he is interested in studying, since it can be used to make several predictions about the properties and possible behavior of the protein, such as:

• Protein molecular weight, by adding up the masses of the individual amino acid residues.
• Isoelectric point. The isoelectric point is the pH at which the protein has no net charge. Because of ionizable functional groups on amino acids, protein charge changes as a function of pH depending on whether or not these groups are protonated. By knowing the sequence, we know how many of each ionizable group our protein contains. If we know the pH range where these groups become protonated or deprotonated, we can estimate the charge of the whole protein as a function of pH. This will be discussed in more detail below.
• Molar extinction coefficient. Tryptophan and tyrosine residues and, to a much lesser extent, disulfide-bonded cysteines (cystines) absorb ultraviolet light at 280 nm. By knowing how many of these amino acids are found in our protein’s sequence, we can calculate how much we expect a solution of our protein to absorb 280 nm light as a function of its concentration. I say “expect” instead of “determine” because the amount of light absorbed by these amino acids depends on their local environment within the protein, especially on whether they are on the surface and exposed to the solution or buried inside the protein.
• Sequence similarity to other proteins, which suggests homology to proteins of known function and/or structure.
• Other structural predictions based on sequence:
  o Disulfide bonds. If your protein is from a bacterial cytoplasm it will generally not have disulfide bonds in its native conformation because the intracellular environment is reducing. However, if you are studying a eukaryotic protein, disulfide bonds (two cysteine residues bound together via an oxidation reaction) may play a key role in the protein’s folded structure.
  o Secondary structure. Based largely on databases of experimentally determined protein three-dimensional structures, some sequences and particular amino acid residues are more or less likely to form particular types of protein secondary structure hydrogen-bonding networks.
  o Stability with respect to proteolytic digestion. All cells contain proteases. When cells or tissues are disrupted to isolate proteins, these previously compartmentalized protein-cutting enzymes are now in the solution with your protein target. Proteases have different specificities in terms of protein sequence, and some sequences are particularly “yummy” (likely to be cleaved by these enzymes).
  o Hydrophobicity. Once the sequence is known, you can look for the location of all of the hydrophobic amino acid side chains. If you find a long (~23 residues) linear stretch of sequence containing only hydrophobic amino acids, this may suggest a region of the protein that spans a lipid membrane.
  o Potential post-translational modification sites for glycosylation, biotinylation, binding metal cofactors, etc.

All of this is very useful information, but much of it is a prediction and may not be true of the biologically relevant folded protein (the “native” structure). Protein sequence cannot yet predict tertiary structure or association with other subunits (quaternary structure). Even the secondary structure prediction tools are often inaccurate. Sequence does not tell you about the overall shape of the protein or the characteristics of its surface, such as its charge distribution or whether or not it has hydrophobic patches.
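Two of the estimates listed above, the extinction coefficient at 280 nm and the net charge as a function of pH, reduce to simple arithmetic over residue counts. The following is a minimal sketch of that arithmetic, not the method any particular tool uses. The per-residue absorptivities (5500 for Trp, 1490 for Tyr, 125 for each cystine, in M⁻¹ cm⁻¹) are the commonly quoted values for this kind of estimate; the pKa values are approximate textbook numbers chosen here as assumptions, and the function names and the test sequence are made up for illustration.

```python
from collections import Counter

# Commonly quoted molar absorptivities at 280 nm, in M^-1 cm^-1.
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125

# Approximate textbook pKa values (assumptions; published pKa sets differ,
# and real values shift with the local environment in the folded protein).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "N_term": 9.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_term": 2.0}


def extinction_coefficient_280(seq, oxidized=True):
    """Estimate the molar extinction coefficient at 280 nm from residue counts.

    With oxidized=True every pair of cysteines is assumed to form a cystine;
    with oxidized=False cysteines are assumed not to contribute.
    """
    counts = Counter(seq.upper())
    cystines = counts["C"] // 2 if oxidized else 0
    return counts["W"] * EXT_TRP + counts["Y"] * EXT_TYR + cystines * EXT_CYSTINE


def net_charge(seq, ph):
    """Estimate net charge at a given pH with the Henderson-Hasselbalch equation."""
    counts = Counter(seq.upper())
    counts["N_term"] = 1  # one free amino terminus per chain
    counts["C_term"] = 1  # one free carboxyl terminus per chain
    # Basic groups carry +1 when protonated.
    positive = sum(n_pos(counts[g], ph, pka) for g, pka in PKA_POSITIVE.items())
    # Acidic groups carry -1 when deprotonated.
    negative = sum(n_neg(counts[g], ph, pka) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def n_pos(n, ph, pka):
    """Number of protonated (charged) basic groups out of n."""
    return n / (1 + 10 ** (ph - pka))


def n_neg(n, ph, pka):
    """Number of deprotonated (charged) acidic groups out of n."""
    return n / (1 + 10 ** (pka - ph))


def estimate_pi(seq, tol=0.001):
    """Find the pH where the estimated net charge crosses zero, by bisection."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI lies at higher pH
        else:
            high = mid  # negative or zero: the pI lies at lower pH
    return (low + high) / 2


if __name__ == "__main__":
    toy = "MKWYYCCDE"  # made-up peptide for illustration, not an LDH sequence
    print(extinction_coefficient_280(toy), round(estimate_pi(toy), 2))
```

Estimates like these describe the sequence as a whole. They say nothing about how the charged or hydrophobic side chains are distributed over the surface of the folded protein.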
These surface characteristics are important both for the biological function of the protein and for determining how other molecules may interact with it, and they currently still need to be determined experimentally.

In the past three weeks you have purified LDH from bovine heart or skeletal tissue, and soon you will be conducting biochemical and biophysical experiments to characterize it both structurally and functionally. This bioinformatics worksheet, coupled with the molecular visualization worksheet you will complete during week 6 in the lab, will help you to take advantage of what can be learned about proteins using bioinformatics tools and structural information from proteins whose structures have been experimentally determined and deposited in the Protein Data Bank (PDB).

Please take a moment to bookmark these sites on your computer now:
- PIR - Protein Information Resource, http://pir.georgetown.edu/
- ExPASy Proteomics tools, http://www.expasy.org/
- Protein Data Bank, http://www.rcsb.org/

Start with the PIR site: http://pir.georgetown.edu/

Q1. Find the amino acid sequence of LDH from Organism: bovine (Bos taurus)
i. Go to the PIR site and find the “Text Search” box on the lower right.
ii. Click on the arrow next to the text box to open up a screen where you can enter multiple text keywords. Enter LDH and bovine.
iii. Choose the entry with Protein Name: L-lactate dehydrogenase A chain
iv. Record the Protein Accession (AC) #:
v. Click on the UniProtKB/Swiss-Prot option under the AC number to access information on this protein. Take some time to familiarize yourself with the kinds of information this file contains.
vi. Scroll down to the “Sequence” section of the file and click the blue button labeled FASTA to download the protein sequence in the “FASTA” format. In this format the first line starts with the character “>” followed by descriptive text; programs that analyze the sequence treat this line as a label rather than as sequence data. This line is followed by the single-letter amino acid sequence of the protein.
vii. Copy/paste the sequence here:

Now go to the ExPASy Bioinformatics Resource Portal: http://www.expasy.org

Q2. Under “Categories” on the left choose “Proteomics” and then “Protein characterization and function” (http://www.expasy.org/proteomics/protein_characterisation_and_function). Under the “Tools” column on the right find “ProtParam” (http://web.expasy.org/protparam/). This program uses the primary structure of your protein to determine:
a. Molecular weight
b. Isoelectric point
c. Amino acid composition, including sums of residues that will always be negatively charged in a physiologically relevant pH range (aspartate and glutamate) and those that will always be positively charged (arginine and lysine)
d. Atomic composition
e. Molar extinction coefficient (M⁻¹ cm⁻¹) at 280 nm
f. Half-life and instability index (predicted susceptibility to proteolysis)
g. Aliphatic index and hydropathicity (based on relative amounts of polar and nonpolar residues)

i. Enter either the LDH accession number you retrieved from the PIR site (Q1) or the LDH sequence you retrieved from the UniProt site (Q1) into the corresponding text box, then click on “compute parameters”.
ii. The first screen will allow you to select different parts of the protein for analysis. Choose the sequence corresponding to “Chain” (this is the mature form of the protein).
iii. Fill out the following information. For the extinction coefficient, use the oxidized form of the protein (disulfide-bonded cystines).
Protein ID and Accession # | # of amino acids | MW (g/mol) | pI | # neg (Asp and Glu) | # pos (Arg and Lys) | Aromatic amino acids (Tyr / Phe / Trp) | Extinction coefficient (M⁻¹ cm⁻¹)
                           |                  |            |    |                     |                     |                                        |

Q3. Go to the “Protein Structure” option under Proteomics categories and select Jpred from the tools menu (http://www.compbio.dundee.ac.uk/jpred/).
i. Paste the LDH sequence (retrieved in Q1 above) into the appropriate field and click Advanced options.
ii. Select the “single sequence” input type, check the box to skip searching PDB, enter your email address, and give the name LDH_your initials. Then click the button “make prediction”. It will let you know if there’s an experimental crystal structure for your protein that gives far more accurate structural information than the prediction tool, but for demonstration purposes go ahead and click on “continue” to generate the predicted secondary structures based on primary structure. It will take a few minutes for the computer to do the computation.
iii. Click on “Simple HTML” when it finishes.
iv. Cut and paste the relevant parts of the sequence to fill out the table below, which lists each stretch of beta strand (E) or alpha helix (H).

Sequence | Secondary Structure
         |

Q4. Finally, let’s look into the database to see what other proteins are similar to the sequence of bovine LDH you retrieved in Q1.
i. Go to the ExPASy Proteomics page and select “Similarity Search/alignment” from the “Categories” on the left and then the “BLAST-Uniprot” tool from the “Tools” column. http://www.uniprot.org/blast/
ii. Paste the amino acid sequence for bovine LDH from Q1 into the search box and click “Run Blast”.
iii. Based on the results you get, which organism has the closest sequence (ranked by score and by identity) in the database to the bovine LDH sequence you retrieved in Q1?
iv. Find the match for “Homo sapiens” (L-lactate dehydrogenase A chain – Homo sapiens) and fill in the following information:
Protein AC# (located to the left of the name):
% identity (to the right of the match hit display, after ordering by “identity” at the top):
v. Click on this entry’s accession number and retrieve the sequence of the 1st isoform in FASTA format (as you did for Q1 above).
vi. Copy/paste the sequence here:

Q5. Go back to the Similarity Search/alignment page and select ClustalW-PBIL from the tools (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_clustalw.html)
i. Type: >bovine LDH
   Paste the bovine LDH sequence from Q1
   Type: >human LDH
   Paste the human LDH sequence from Q4
ii.
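The two-sequence input typed in Q5 step i is simply two FASTA records, one after the other, each with its own “>” label line as described in Q1. As a rough sketch of that layout, the Python below builds such input from (name, sequence) pairs and reads it back. The function names `to_fasta` and `parse_fasta` are made up for this illustration, and the short sequences are placeholders rather than real LDH sequences; substitute the sequences you retrieved in Q1 and Q4.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)  # label line: ">" followed by descriptive text
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a list of (label, sequence) pairs."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new label line closes the previous record, if there was one.
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:], []
        elif label is not None:
            chunks.append(line)  # sequence lines are joined back together
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Placeholder sequences for illustration only, NOT real LDH sequences.
    alignment_input = to_fasta(
        [("bovine LDH", "ACDEFGHIKLMN"), ("human LDH", "ACDEFGHIRLMN")]
    )
    print(alignment_input)
    print(parse_fasta(alignment_input))
```

Keeping each label on its own line directly above its sequence is what lets the alignment program tell the two sequences apart.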
