Introduction to Bioinformatics (Master ChemoInformatique)
• Roland Stote • Institut de Génétique et de Biologie Moléculaire et Cellulaire • Biocomputing Group • 03.90.244.730 • [email protected]
Biological Function at the Molecular Level
• 3.1 × 10^9 letters in the DNA code in every one of the 100 × 10^12 cells in the human body.
• Humans have between 30,000 and 40,000 genes.
• There is approximately 2 m of DNA in each cell, packed into the nucleus (where it is wound around nucleosomes).
• If all the DNA in the human body were put end-to-end, it would reach to the Sun and back more than 600 times.
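The Sun-and-back figure can be checked with a few lines of arithmetic. This is a rough sketch: the cell count and the DNA length per cell come from the bullets above, while the Earth-Sun distance (1 astronomical unit, about 1.496 × 10^11 m) is a value added here and is not on the slide.

```python
# Figures quoted on the slide
CELLS_IN_BODY = 100e12       # 100 x 10^12 cells
DNA_PER_CELL_M = 2.0         # about 2 m of DNA per cell

# Assumption (not on the slide): mean Earth-Sun distance, 1 astronomical unit
EARTH_SUN_DISTANCE_M = 1.496e11


def sun_round_trips(cells=CELLS_IN_BODY,
                    dna_per_cell_m=DNA_PER_CELL_M,
                    distance_m=EARTH_SUN_DISTANCE_M):
    """Number of Earth-Sun round trips covered by all the DNA in the body."""
    total_length_m = cells * dna_per_cell_m
    return total_length_m / (2.0 * distance_m)


print(f"Round trips to the Sun: {sun_round_trips():.0f}")
```

With these inputs the estimate lands in the 600 to 700 range, consistent with "more than 600 times".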
What is Bioinformatics?
• Bioinformatics is the study of information contained within biological, chemical or medical systems through the use of computers.
• Bioinformatics methods are used in a wide variety of fields including basic science, biotechnology, medicine, pharmaceutical development and public health, plus others.
• Bioinformatics is continually evolving; new approaches and tools are being developed that allow the researcher to more accurately and efficiently acquire, analyze and present the large amounts of data that are generated in today's research environment.
What is Bioinformatics?
• Development and application of computerized methods for the study of biological information and data (generation of databases)
• Analysis and interpretation of these data (software tools)
• Developing algorithms for text string comparison (sequence alignment and keyword searches)
• Developing algorithms for pattern matching (data mining, cluster analysis)
• Algorithms for geometry analysis (docking, visualisation)
• Physical simulations and model building (molecular dynamics, molecular mechanics, homology modeling)
Bioinformatics is situated at the interface of multiple domains of research.
Objectives of this module (20 hours)
1. Introduction to protein and DNA sequence and structure.
2. Present different biological databases (sequence and structure) and their associated query tools.
3. Find information on a protein from its sequence.
4. Visualize, analyze and find information on a protein from its 3-D structure, using the visualization program VMD.
5. Presentation of the basics of molecular modeling applied to biological molecules. An introduction to energy minimization and molecular dynamics.
An introduction to DNA and protein structure: relationship to function
Roland Stote
Sources of supplementary information
Introduction à la structure des protéines – Branden & Tooze – Ed DeBoeck Université
Proteins: structures and molecular properties – Thomas E. Creighton – W.H. Freeman
On the web: – http://www.expasy.ch/swissmod/course/course-index.htm – http://www.cryst.bbk.ac.uk/PPS2/course/index.html
Structure of nucleic acids
Roland Stote
Based on the course by Annick Dejaegere at ESBS
Nucleic acids are composed of:
• Phosphates
• Sugars
• Bases
Bases: pyrimidines
• Cytosine: 2-oxy-4-aminopyrimidine
• Uracil: 2,4-dioxypyrimidine
• Thymine: 5-methyluracil
[Figure: structures of cytosine, uracil and thymine, with the pyrimidine ring atoms numbered 1 to 6]
Bases: purines
• Adenine: 6-aminopurine
• Guanine: 2-amino-6-oxypurine
[Figure: structures of adenine and guanine, with the purine ring atoms numbered 1 to 9]
Hydrogen bonds
• A: interaction with an acceptor
• D: interaction with a donor
[Figure: hydrogen-bonding sites labelled A and D on cytosine, uracil, adenine and guanine]
In-plane interactions: hydrogen bonds
Base pairs
Possible base-pair assemblies (http://www.imb-jena.de/ImgLibDoc/):
• 10 purine-pyrimidine
• 11 purine-purine
• 7 pyrimidine-pyrimidine
Vertical interactions
• Vertical stacking
• Hydrophobic effect
• Electrostatic interactions between the bases
A and B helices
The grooves
Z-DNA
• G-C sequences
• Alternating syn and anti conformations
DNA A, B, Z
B-DNA: low ionic strength; native conformation of chromatin
A-DNA: high ionic strength or in the presence of alcohol
A-RNA: native conformation of RNA
Z-DNA: alternating poly(dG-dC) sequences at high ionic strength
Nucleosome
Molecular Biology
DNA: a string over the four-letter alphabet of nucleotides A, C, G, T
Usually double-stranded, with complementary anti-parallel strands:
5' ATCGCCTTATTCAT 3'
3' TAGCGGAATAAGTA 5'
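The base-pairing rule behind the two strands above (A with T, C with G) is easy to express in code. The following is a minimal sketch using only Python string methods; the function names are illustrative and not part of the course material.

```python
# Watson-Crick pairing: A<->T and C<->G
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Base-by-base complement, written in the same left-to-right order."""
    return strand.upper().translate(_COMPLEMENT)


def reverse_complement(strand):
    """Complementary strand read in its own 5'->3' direction."""
    return complement(strand)[::-1]


top = "ATCGCCTTATTCAT"                      # 5'->3' strand from the slide
print("5'", top, "3'")
print("3'", complement(top), "5'")          # lower strand as drawn on the slide
print("5'", reverse_complement(top), "3'")  # same strand, read 5'->3'
```

Because the strands are anti-parallel, sequence databases conventionally report the second strand as the reverse complement, read 5' to 3'.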
Genetic Code
A gene is a specific sequence of nucleotide bases that carries the information required for constructing proteins, which provide the structural components of cells and tissues as well as enzymes for essential biochemical reactions. The human genome is estimated to comprise more than 30,000 genes.
The Genetic Code describes the translation of genes into protein
Genetic Code
TTT F Phe      TCT S Ser      TAT Y Tyr      TGT C Cys
TTC F Phe      TCC S Ser      TAC Y Tyr      TGC C Cys
TTA L Leu      TCA S Ser      TAA * Ter      TGA * Ter
TTG L Leu i    TCG S Ser      TAG * Ter      TGG W Trp

CTT L Leu      CCT P Pro      CAT H His      CGT R Arg
CTC L Leu      CCC P Pro      CAC H His      CGC R Arg
CTA L Leu      CCA P Pro      CAA Q Gln      CGA R Arg
CTG L Leu i    CCG P Pro      CAG Q Gln      CGG R Arg

ATT I Ile      ACT T Thr      AAT N Asn      AGT S Ser
ATC I Ile      ACC T Thr      AAC N Asn      AGC S Ser
ATA I Ile      ACA T Thr      AAA K Lys      AGA R Arg
ATG M Met i    ACG T Thr      AAG K Lys      AGG R Arg

GTT V Val      GCT A Ala      GAT D Asp      GGT G Gly
GTC V Val      GCC A Ala      GAC D Asp      GGC G Gly
GTA V Val      GCA A Ala      GAA E Glu      GGA G Gly
GTG V Val      GCG A Ala      GAG E Glu      GGG G Gly

(i: codon that can also act as an initiation codon; *: termination codon)
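Reading a coding sequence with the genetic code table amounts to a dictionary lookup, three bases at a time. Below is a minimal sketch of that lookup. The table is built in the same T, C, A, G order as the slide; the `translate` function and the example sequence are illustrative and not taken from the course.

```python
# Standard genetic code, one letter per codon, in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a coding DNA sequence (5'->3', first reading frame).

    Returns one-letter amino acid codes and stops at the first stop codon.
    """
    protein = []
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[start:start + 3].upper()]
        if residue == "*":          # termination codon (Ter)
            break
        protein.append(residue)
    return "".join(protein)


# One possible coding sequence for Phe-Ser-Glu-Lys, followed by a stop codon
print(translate("TTTTCTGAAAAATAA"))
```

Because the code is degenerate, several different DNA sequences give the same peptide, so the sequence used here is only one of many that encode F-S-E-K.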
Biology at the Molecular Level
Proteins are essentially biological polymers
• The N-terminus ends with an amino group
• Peptide bond
• Amino acid
• The C-terminus ends with a carboxyl group
A peptide: Phe-Ser-Glu-Lys (F-S-E-K)
General form of amino acids
H2N-Cα(H)(R)-COOH
The alpha carbon is asymmetric: two stereoisomers are possible, D or L.
Chirality in proteins
D form L form
Interactions between amino acids and their environment.
Electrostatic interactions: $E(r) = A/r$
[Figure: pairs of unlike and like charges separated by a distance r]
van der Waals interactions: $E(r) = B/r^{12} - C/r^{6}$
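The two energy expressions on this slide, E(r) = A/r for electrostatics and E(r) = B/r^12 - C/r^6 for van der Waals interactions, can be evaluated directly. The sketch below treats A, B and C as generic constants: their values and units depend on the force field and are not given in the slides, so the numbers used here are placeholders.

```python
def electrostatic_energy(r, a):
    """E(r) = A / r. A > 0 for like charges (repulsive), A < 0 for unlike charges."""
    return a / r


def van_der_waals_energy(r, b, c):
    """E(r) = B / r^12 - C / r^6 (short-range repulsion minus dispersion attraction)."""
    return b / r**12 - c / r**6


def van_der_waals_minimum(b, c):
    """Distance of lowest energy: dE/dr = 0 gives r_min = (2B / C)^(1/6)."""
    return (2.0 * b / c) ** (1.0 / 6.0)


# Placeholder constants chosen only to illustrate the shape of the curve
B, C = 1.0, 2.0
r_min = van_der_waals_minimum(B, C)
print("r_min =", r_min)
print("E(r_min) =", van_der_waals_energy(r_min, B, C))
for r in (0.9, 1.0, 1.5, 3.0):
    print(r, van_der_waals_energy(r, B, C), electrostatic_energy(r, a=-1.0))
```

The loop shows the qualitative difference between the two terms: the van der Waals energy rises steeply below r_min and decays quickly beyond it, whereas the 1/r electrostatic term is still significant at long range.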
Interactions with solvent
Measured by:
- solubility
- chromatography
- surface tension
Hydrophobic interactions
Hydrophilic interactions
The hydrogen bond
D(δ-)-H(δ+) ···· A(δ-)   (D: donor, A: acceptor)
• Dual character: electrostatic and covalent
• An acceptor can be shared
• Geometry: linear to within ±20°; H···O distance d = 1.7 Å to 2.0 Å
[Figure: N-H···O hydrogen bonds with the distance d marked]
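The geometric criteria quoted for the hydrogen bond (an H···acceptor distance of 1.7 to 2.0 Å and a roughly linear arrangement, within about 20°) can be turned into a simple test on atomic coordinates. This is a minimal sketch: reading "linear ±20°" as a D-H···A angle of at least 160° is an interpretation made here, and the coordinates in the example are invented for illustration.

```python
import math


def hydrogen_bond_geometry(donor, hydrogen, acceptor):
    """Return (H...A distance in Å, D-H...A angle in degrees) from xyz coordinates."""
    distance = math.dist(hydrogen, acceptor)
    h_to_d = [d - h for d, h in zip(donor, hydrogen)]
    h_to_a = [a - h for a, h in zip(acceptor, hydrogen)]
    cos_angle = sum(x * y for x, y in zip(h_to_d, h_to_a)) / (
        math.hypot(*h_to_d) * math.hypot(*h_to_a)
    )
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return distance, math.degrees(math.acos(cos_angle))


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     d_min=1.7, d_max=2.0, max_deviation_deg=20.0):
    """Apply the distance and linearity criteria quoted on the slide."""
    distance, angle = hydrogen_bond_geometry(donor, hydrogen, acceptor)
    return d_min <= distance <= d_max and angle >= 180.0 - max_deviation_deg


# Invented coordinates (Å): N at the origin, H 1.0 Å away, O 1.9 Å beyond H
print(is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
```

Structure-analysis programs use their own cut-offs, often defined on the donor-acceptor distance instead, so the thresholds are exposed as parameters.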
Classification of amino acids
[Figure: the 20 amino acids (R, D, Q, E, K, H, N, T, C, S, Y, W, G, M, P, A, V, F, I, L) arranged in overlapping groups labelled charged, amphipathic, small hydrophilic, small hydrophobic and bulky hydrophobic]
The peptide bond
[Figure: the two resonance forms of the peptide bond linking Cα(i) and Cα(i+1): C=O with C-N, and C-O(-) with C=N(+)]
Cis-trans isomerization
[Figure: peptide bond between residues i and i+1 in the trans (ω = 180°) and cis (ω = 0°) configurations]
Four Levels of Structure Determine the Shape of Proteins
Primary structure: the linear arrangement (sequence) of amino acids and the location of covalent (mostly disulfide) bonds within a polypeptide chain. Determined by the genetic code.
Secondary structure: local folding of a polypeptide chain into regular structures including the α helix, β sheet, and U-shaped turns and loops.
Tertiary structure: overall three-dimensional form of a polypeptide chain, which is stabilized by multiple non-covalent interactions between side chains.
Quaternary structure: the number and relative positions of the polypeptide chains in multisubunit proteins. Not all proteins have a quaternary structure.
Primary Structure of a protein: determined by the nucleotide sequence of its gene
Bovine Insulin: the first sequenced protein
• In 1953, Frederick Sanger determined the amino acid sequence of insulin, a protein hormone.
• This work is a landmark in biochemistry because it showed for the first time that a protein has a precisely defined amino acid sequence.
• It demonstrated that insulin consists only of amino acids linked by peptide bonds between α-amino and α-carboxyl groups.
• The complete amino acid sequences of more than 100,000 proteins are now known.
• Each protein has a unique, precisely defined amino acid sequence.
Amino acid substitution in proteins from different species
Conservative: substitution of an amino acid by another amino acid of similar polarity (Val for Ile in position 10 of insulin).
Non-conservative: substitution of an amino acid by another of different polarity (in sickle cell anemia, the replacement of glutamic acid by valine at position 6 of the hemoglobin β chain induces precipitation of hemoglobin in red blood cells).
Invariant residues: amino acids found at the same position in different species (critical for the structure or function of the protein).
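The distinction between conservative and non-conservative substitutions can be illustrated with a small lookup table. The grouping of residues into polarity classes below is one common textbook scheme chosen for this sketch, not the classification used in the course, and real analyses usually rely on substitution matrices instead.

```python
# Assumed polarity classes (one common grouping; other schemes exist)
_CLASSES = {
    "nonpolar": "AVLIMFWPG",
    "polar uncharged": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
POLARITY_CLASS = {
    residue: name for name, residues in _CLASSES.items() for residue in residues
}


def classify_substitution(original, replacement):
    """Label a single-residue change using one-letter amino acid codes."""
    if original == replacement:
        return "identical"
    if POLARITY_CLASS[original] == POLARITY_CLASS[replacement]:
        return "conservative"
    return "non-conservative"


print(classify_substitution("I", "V"))  # Ile -> Val, the insulin example
print(classify_substitution("E", "V"))  # Glu -> Val, the sickle cell example
```

Under this grouping the insulin example is classed as conservative and the sickle cell mutation as non-conservative, in line with the text.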
Protein conformation: many (but not all) proteins fold into a stable conformation, otherwise known as the native conformation
A chain of more than about 50 amino acids is called a protein; a shorter chain is known as a peptide.
Secondary structure of proteins
The 3D structure is defined by the orientation of successive peptide planes
There are two degrees of freedom for each amino acid (phi and psi)
Conformation of the polypeptide chain: phi
Conformation of the polypeptide chain: psi
The Ramachandran plot
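Each point on a Ramachandran plot is a (phi, psi) pair, and both angles are dihedral angles defined by four consecutive backbone atoms: C(i-1), N(i), Cα(i), C(i) for phi and N(i), Cα(i), C(i), N(i+1) for psi. The sketch below computes a signed dihedral angle from four points in plain Python; the coordinates in the example are artificial and chosen only to make the result easy to check.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (cis = 0, trans = ±180)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)               # unit vector along the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))  # part of b0 perpendicular to b1
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))  # part of b2 perpendicular to b1
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Artificial points: the central bond lies along the z axis
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis arrangement
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # quarter turn
```

Applying `dihedral` to the backbone atoms of every residue of a structure and plotting psi against phi gives the Ramachandran plot.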
The alpha helix
Helix Parameters
• Helix pitch: 5.4 Å
• 3.6 residues per turn
• Hydrogen bonds: O(i) to N(i+4)
• phi = -60° / psi = -50°
Axial view:
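Two useful quantities follow directly from the alpha helix parameters given above (a pitch of 5.4 Å and 3.6 residues per turn): the rise along the axis per residue and the rotation per residue. The short sketch below derives them and estimates the length of an ideal helix; the function name and the 20-residue example are illustrative.

```python
HELIX_PITCH_A = 5.4          # Å advanced along the axis per full turn
RESIDUES_PER_TURN = 3.6

RISE_PER_RESIDUE_A = HELIX_PITCH_A / RESIDUES_PER_TURN     # about 1.5 Å
ROTATION_PER_RESIDUE_DEG = 360.0 / RESIDUES_PER_TURN       # about 100 degrees


def helix_length(n_residues):
    """Approximate axial length (Å) of an ideal alpha helix of n residues."""
    return n_residues * RISE_PER_RESIDUE_A


print(f"rise per residue: {RISE_PER_RESIDUE_A:.2f} Å")
print(f"rotation per residue: {ROTATION_PER_RESIDUE_DEG:.0f} degrees")
print(f"20-residue helix: about {helix_length(20):.0f} Å long")
```

The rotation of about 100° per residue explains why residues i, i+3 or i+4, and i+7 lie on roughly the same face of the helix.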
The 3_10 helix
• 3 residues per turn
• Hydrogen bonds: O(i) to N(i+3)
• phi = -50° / psi = -25°
The beta strand
Assembly of beta strands into sheets
Parallel and anti-parallel beta sheets
The beta sheet is not flat but curved
The beta turns
Triple helix of collagen
• Limited to the tropocollagen molecule
• 3 left-handed helices wound together to give a right-handed superhelix
• Stable superhelix: glycines (small R group) are located on the central axis of the triple helix
• One interchain H-bond for each triplet of amino acids, between the NH of Gly and the CO of X (or proline) in the adjacent chain
Side chain conformations
Definition of dihedral angles along the side chain
Most frequently observed rotamers
Non covalent interactions involved in the shape of proteins
Tertiary structure: the overall shape of a protein
or a telephone cord!!!
The secondary structure of a telephone cord
A telephone cord, specifically the coil of a telephone cord, can be used as an analogy to the alpha helix secondary structure of a protein.
The tertiary structure of a telephone cord
The tertiary structure of a protein refers to the way the secondary structure folds back upon itself or twists around to form a three-dimensional structure. The secondary coil structure is still there, but the tertiary tangle has been superimposed on it.
Secondary structure motifs
Motifs: assembly (simple) of secondary structural elements
Helix-turn-helix
β Hair-pin
β-α-β motif
Greek key motif
Fold Classification
α β α/β α+β
Tertiary structure: the overall shape of a protein
Full three dimensional organization of a protein
The three-dimensional structure of a protein kinase
The role of side chains in the shape of proteins: where is water?
Hydrophilic
Hydrophobic
TERTIARY STRUCTURE
R-group interactions result in the 3D structures of globular proteins
Types of interactions: hydrogen bonds, ionic bonds (salt bridges), hydrophobic interactions and disulfide bonds
Hydrophilic R groups are found on the surface, while hydrophobic R groups are buried inside the molecule
Wide variety of tertiary (3°) structures, owing to the large variation in protein sizes and amino acid sequences
After X-ray crystallographic studies of hen lysozyme (Phillips, 1966) and papain (Drenth et al., 1968), and limited proteolysis studies of immunoglobulins (Porter, 1973; Edelman, 1973), Donald B. Wetlaufer defined domains as stable units of protein structure that could fold autonomously.
A protein domain
• is a part of a protein that can evolve, function, and exist independently of the rest of the protein chain.
• Each domain forms a compact three-dimensional structure and often can be independently stable and folded.
• Many proteins consist of several structural domains.
• One domain may appear in a variety of evolutionarily related proteins.
• Domains vary in length from about 25 to 500 amino acids.
• Examples: zinc fingers (stabilized by metal ions), the calcium-binding EF hand domain.
Protein domains
• Cytochrome b562: a single domain protein involved in electron transport in mitochondria
• The NAD*-binding domain of the enzyme lactic dehydrogenase
• The variable domain of an immunoglobulin
*nicotinamide adenine dinucleotide
The Src protein
Quaternary structure:
If a protein is formed as a complex of more than one protein chain, the complete structure is designated as its quaternary structure:
• Generally formed by non-covalent interactions between subunits
• Either as homo- or hetero-multimers
Primary structure
Secondary structure
Tertiary structure
Quaternary structure
Function of peptides and proteins
STRUCTURE - FUNCTION RELATIONSHIPS
In general, all globular proteins have distinctive 3D structures that are specialized for their particular functions.
Shape and function
Membrane transport proteins
Mechanical support - skin and bone are strengthened by the protein collagen.
Abnormal collagen synthesis or structure causes dysfunction of:
• cardiovascular organs
• bone
• skin
• joints
• eyes
Refer to Devlin Clinical correlation 3.4 p121
Transport and storage - small molecules are often carried by proteins in the physiological setting (for example, the protein hemoglobin is responsible for the transport of oxygen to tissues). Many drug molecules are partially bound to serum albumins in the plasma.
The binding of oxygen is affected by molecules such as carbon monoxide (CO) (for example from tobacco smoking, cars and furnaces).
CO competes with oxygen at the heme binding site. Hemoglobin's binding affinity for CO is 200 times greater than its affinity for oxygen, meaning that small amounts of CO dramatically reduce hemoglobin's ability to transport oxygen. When hemoglobin combines with CO, it forms a very bright red compound called carboxyhemoglobin.
When inspired air contains CO levels as low as 0.02%, headache and nausea occur; if the CO concentration is increased to 0.1%, unconsciousness will follow. In heavy smokers, up to 20% of the oxygen-active sites can be blocked by CO.
[Figure: three-dimensional structure of hemoglobin. The four subunits are shown in red and yellow, and the heme groups in green.]
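The effect of a 200-fold higher affinity for CO can be made concrete with a simple competitive equilibrium, often called the Haldane relation: [HbCO]/[HbO2] = M × pCO/pO2, with M = 200 taken from the text. The sketch below is an idealized estimate under assumptions that are not in the slides: inspired air contains 21% oxygen, equilibrium is fully reached, and every heme site carries either O2 or CO.

```python
AFFINITY_RATIO = 200.0   # from the text: affinity for CO is 200 times that for O2


def carboxyhemoglobin_fraction(co_percent, o2_percent=21.0, m=AFFINITY_RATIO):
    """Equilibrium fraction of heme sites occupied by CO.

    Assumes [HbCO]/[HbO2] = m * pCO / pO2 and that every site binds O2 or CO.
    """
    ratio = m * co_percent / o2_percent
    return ratio / (1.0 + ratio)


for co in (0.02, 0.1):
    fraction = carboxyhemoglobin_fraction(co)
    print(f"{co}% CO in air: about {100 * fraction:.0f}% of sites blocked")
```

These are upper bounds reached only after prolonged exposure, but they show how a CO concentration hundreds of times lower than that of oxygen can still occupy a large share of the binding sites.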
Cell adhesion and signaling: Integrins are cell adhesion molecules that couple the cytoskeleton to the extracellular matrix
Inside
signal
outside
Integrin Topology
http://www.multimedia.mcb.harvard.edu/
The relationship between shape and function of proteins:
The shape of proteins:
• Occurs spontaneously
• Native conformation
• Determined by different levels of structure
Disease and protein folding:
Disease
Example: Neurodegenerative diseases
An X-ray diffraction image for the protein myoglobin.
The first protein crystal structure was that of sperm whale myoglobin, determined by Sir John Cowdery Kendrew and co-workers in 1958. Kendrew and Max Perutz shared the 1962 Nobel Prize in Chemistry for their structural studies of globular proteins.
Protein NMR is a field of structural biology that applies nuclear magnetic resonance spectroscopy to the investigation of proteins.
The field was pioneered by, among others, Richard Ernst (Nobel Prize 1991) and Kurt Wüthrich (Nobel Prize 2002).
[Figure: Pacific Northwest National Laboratory's high magnetic field (800 MHz) NMR spectrometer being loaded with a sample.]
[Figure: The NMR sample is prepared in a thin-walled glass tube.]
Protein NMR is performed on aqueous samples of highly purified protein.
The sample consists of between 300 and 600 microlitres with a protein concentration in the range 0.1 to 3 millimolar.
The source of the protein can be either natural or produced in an expression system using recombinant DNA techniques through genetic engineering.
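The sample volumes and concentrations quoted above translate into an amount of protein that has to be produced and purified. The sketch below does the unit conversion, reading the concentration range as millimolar (mM). The 15 kDa molecular weight in the example is a hypothetical value chosen for illustration, not one from the slides.

```python
def protein_amount_nmol(volume_ul, concentration_mm):
    """Amount of protein in nanomoles.

    µL x mmol/L = (1e-6 L) x (1e-3 mol/L) = 1e-9 mol, i.e. nanomoles.
    """
    return volume_ul * concentration_mm


def protein_mass_mg(volume_ul, concentration_mm, molecular_weight_da):
    """Mass of protein in milligrams for a given molecular weight (g/mol)."""
    nanograms = protein_amount_nmol(volume_ul, concentration_mm) * molecular_weight_da
    return nanograms / 1e6


# Hypothetical 15 kDa protein in a 500 µL sample at 1 mM
print(protein_amount_nmol(500, 1.0), "nmol")
print(protein_mass_mg(500, 1.0, 15_000), "mg")
```

Milligram quantities of pure protein are therefore typical for an NMR sample, which is one reason recombinant expression systems are so widely used.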
Acknowledgements
Marie-Véronique Clement Annick Dejaegere Bruno Kiefer