Introduction to Bioinformatics (Master Chemoinformatique)
Roland Stote • Institut de Génétique et de Biologie Moléculaire et Cellulaire • Biocomputing Group • 03.90.244.730 • [email protected]

Biological Function at the Molecular Level
• There are 3.1 × 10^9 letters in the DNA code in every one of the 100 × 10^12 cells in the human body.
• Humans have between 30,000 and 40,000 genes.
• There is approximately 2 m of DNA in each cell, packed into the nucleus.
• If all the DNA in the human body were put end-to-end, it would reach to the Sun and back more than 600 times.

What is Bioinformatics?
• Bioinformatics is the study of information contained within biological, chemical or medical systems through the use of computers.
• Bioinformatics methods are used in a wide variety of fields, including basic science, biotechnology, medicine, pharmaceutical development and public health.
• Bioinformatics is continually evolving; new approaches and tools are being developed that allow the researcher to acquire, analyze and present more accurately and efficiently the large amounts of data generated in today's research environment.

What is Bioinformatics?
• Development and application of computerized methods for the study of biological information and data (generation of databases)
• Analysis and interpretation of these data (software tools):
  - Developing algorithms for text string comparison (sequence alignment and keyword searches)
  - Developing algorithms for pattern matching (data mining, cluster analysis)
  - Algorithms for geometry analysis (docking, visualisation)
  - Physical simulations and model building (molecular dynamics, molecular mechanics, homology modeling)
Bioinformatics is situated at the interface of multiple domains of research.

Objectives of this module (20 hours)
1. Introduction to protein and DNA sequence and structure.
2. Present different biological databases (sequence and structure) and their associated query tools.
3. Find information on a protein from its sequence.
4. Visualize, analyze and find information on a protein from its 3-D structure, using the visualization program VMD.
5. Present the basics of molecular modeling applied to biological molecules: an introduction to energy minimization and molecular dynamics.

An introduction to DNA and protein structure: relationship to function
Roland Stote

Sources of supplementary information
• Introduction à la structure des protéines – Branden & Tooze – Ed. DeBoeck Université
• Proteins: structures and molecular properties – Thomas E. Creighton – W.H. Freeman
On the web:
• http://www.expasy.ch/swissmod/course/course-index.htm
• http://www.cryst.bbk.ac.uk/PPS2/course/index.html

Structure of nucleic acids
Roland Stote
Based on the course by Annick Dejaegere at ESBS

Nucleic acids are made of:
• Phosphates
• Sugars
• Bases

Bases: pyrimidines
[Figure: numbered pyrimidine ring and structures of the three bases]
• Cytosine: 2-oxy-4-amino-pyrimidine
• Uracil: 2,4-dioxy-pyrimidine
• Thymine: 5-methyluracil

Bases: purines
[Figure: numbered purine ring and structures of the two bases]
• Adenine: 6-aminopurine
• Guanine: 2-amino-6-oxy-purine

Hydrogen bonds
[Figure: donor (D) and acceptor (A) sites on cytosine, uracil, adenine and guanine]
• A: interaction with an acceptor
• D: interaction with a donor

In-plane interactions
• Hydrogen bonds
• Base pairs: 10 possible purine-pyrimidine pairings, 11 purine-purine and 7 pyrimidine-pyrimidine
http://www.imb-jena.de/ImgLibDoc/

Vertical interactions
• Vertical stacking
• Hydrophobic effect
• Electrostatic interactions between bases

A and B helices
[Figure: A and B helices]

A and B helices: the grooves
[Figure: the grooves]

Z-DNA
• G-C sequences
• Alternating syn and anti conformations

A-, B- and Z-DNA
• B-DNA: low ionic strength; native conformation of chromatin
• A-DNA: high ionic strength or in the presence of alcohol
• A-RNA: native conformation of RNA
• Z-DNA: alternating poly(dG-dC) sequences at high ionic strength

Nucleosome
[Figure: nucleosome]

Molecular Biology: DNA
• A string over a four-letter alphabet of nucleotides: A, C, G, T
• Usually double stranded, with complementary anti-parallel strands:

5' ATCGCCTTATTCAT 3'
3' TAGCGGAATAAGTA 5'

Genetic Code
A gene is a specific sequence of nucleotide bases that carries the information required for constructing proteins. Proteins provide the structural components of cells and tissues, as well as the enzymes for essential biochemical reactions. The human genome is estimated to comprise more than 30,000 genes. The genetic code describes the translation of genes into protein.

Genetic Code
(i marks a possible initiation codon; * Ter marks a termination codon)

TTT F Phe      TCT S Ser      TAT Y Tyr      TGT C Cys
TTC F Phe      TCC S Ser      TAC Y Tyr      TGC C Cys
TTA L Leu      TCA S Ser      TAA * Ter      TGA * Ter
TTG L Leu i    TCG S Ser      TAG * Ter      TGG W Trp
CTT L Leu      CCT P Pro      CAT H His      CGT R Arg
CTC L Leu      CCC P Pro      CAC H His      CGC R Arg
CTA L Leu      CCA P Pro      CAA Q Gln      CGA R Arg
CTG L Leu i    CCG P Pro      CAG Q Gln      CGG R Arg
ATT I Ile      ACT T Thr      AAT N Asn      AGT S Ser
ATC I Ile      ACC T Thr      AAC N Asn      AGC S Ser
ATA I Ile      ACA T Thr      AAA K Lys      AGA R Arg
ATG M Met i    ACG T Thr      AAG K Lys      AGG R Arg
GTT V Val      GCT A Ala      GAT D Asp      GGT G Gly
GTC V Val      GCC A Ala      GAC D Asp      GGC G Gly
GTA V Val      GCA A Ala      GAA E Glu      GGA G Gly
GTG V Val      GCG A Ala      GAG E Glu      GGG G Gly

Biology at the Molecular Level
Proteins are essentially biological polymers.
[Figure: a peptide chain with its labelled parts]
• The N-terminus ends with an amino group
• Peptide bond
• Amino acid
• The C-terminus ends with a carboxyl group
• A peptide: Phe-Ser-Glu-Lys (F-S-E-K)

General form of amino acids
H2N-CH(R)-COOH, where the central carbon is the alpha carbon (Cα)
The alpha carbon is asymmetric: 2 stereoisomers are possible, D or L.

Chirality in proteins
[Figure: D form and L form]

Interactions between amino acids and their environment
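Two of the interactions in this section are given as distance-dependent energy terms: an electrostatic term E(r) = A/r and a van der Waals term E(r) = B/r^12 - C/r^6. The sketch below is a minimal illustration of those two expressions, not part of the original slides. The function names and the numerical values of A, B and C are hypothetical, chosen only to make the arithmetic easy to follow. Setting dE/dr = 0 for the van der Waals term gives r_min = (2B/C)^(1/6) and E(r_min) = -C^2/(4B).

```python
def coulomb_energy(r, a):
    """Electrostatic term E(r) = A / r (A > 0 for like charges, A < 0 for opposite charges)."""
    return a / r


def van_der_waals_energy(r, b, c):
    """van der Waals term E(r) = B / r**12 - C / r**6 (repulsion minus attraction)."""
    return b / r**12 - c / r**6


def van_der_waals_minimum(b, c):
    """Return (r_min, E_min), obtained by solving dE/dr = 0 for the term above."""
    r_min = (2.0 * b / c) ** (1.0 / 6.0)
    e_min = -c * c / (4.0 * b)
    return r_min, e_min


if __name__ == "__main__":
    # Illustrative, unitless parameters (assumptions, not values from the course).
    b, c = 1.0, 2.0
    r_min, e_min = van_der_waals_minimum(b, c)
    print("van der Waals minimum at r =", r_min, "with E =", e_min)
    for r in (0.9, 1.0, 1.5, 3.0):
        print(r, van_der_waals_energy(r, b, c), coulomb_energy(r, -1.0))
```

Printing both terms at increasing r shows the qualitative difference: the van der Waals term decays much faster with distance than the 1/r electrostatic term.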
• Electrostatic interactions: E(r) = A/r
• van der Waals interactions: E(r) = B/r^12 - C/r^6
• Interactions with solvent, measured by:
  - solubility
  - chromatography
  - surface tension
• Hydrophobic interactions
• Hydrophilic interactions

The hydrogen bond
D(δ-)-H(δ+) ···· A(δ-), where D is the donor and A is the acceptor
• Double character: electrostatic and covalent
• An acceptor can be shared
• Geometry: linear (±20°), with an H···O distance d = 1.7 Å to 2.0 Å

Classification of amino acids
[Figure: diagram grouping the 20 amino acids (R, D, Q, E, K, H, N, T, C, S, Y, W, G, M, P, A, V, F, I, L) into charged, amphipathic, small hydrophilic, small hydrophobic and bulky hydrophobic classes]

The peptide bond
[Figure: the two resonance forms of the peptide bond linking residues i and i+1]

Cis-trans isomerization
[Figure: trans and cis forms of the peptide bond between residues i and i+1]
• ω = 180°: trans
• ω = 0°: cis

Four Levels of Structure Determine the Shape of Proteins
• Primary structure: the linear arrangement (sequence) of amino acids and the location of covalent (mostly disulfide) bonds within a polypeptide chain. It is determined by the genetic code.
• Secondary structure: local folding of a polypeptide chain into regular structures, including the α helix, the β sheet, and U-shaped turns and loops.
• Tertiary structure: the overall three-dimensional form of a polypeptide chain, which is stabilized by multiple non-covalent interactions between side chains.
• Quaternary structure: the number and relative positions of the polypeptide chains in multisubunit proteins. Not all proteins have a quaternary structure.

Primary structure of a protein: determined by the nucleotide sequence of its gene

Bovine insulin: the first sequenced protein
• In 1953, Frederick Sanger determined the amino acid sequence of insulin, a protein hormone.
• This work is a landmark in biochemistry because it showed for the first time that a protein has a precisely defined amino acid sequence.
• It demonstrated that insulin consists only of amino acids linked by peptide bonds between α-amino and α-carboxyl groups.
• The complete amino acid sequences of more than 100,000 proteins are now known.
• Each protein has a unique, precisely defined amino acid sequence.

Amino acid substitution in proteins from different species
• Conservative: substitution of an amino acid by another amino acid of similar polarity (Val for Ile at position 10 of insulin).
• Non-conservative: replacement of an amino acid by another of different polarity (in sickle cell anemia, the glutamic acid at position 6 of hemoglobin is replaced by a valine, which induces precipitation of hemoglobin in red blood cells).
• Invariant residues: amino acids found at the same position in different species (critical for the structure or function of the protein).

Protein conformation
Many (but not all) proteins fold into a stable conformation, known as the native conformation. A chain of more than 50 amino acids is called a protein; a shorter chain is known as a peptide.

Secondary structure of proteins
The 3D structure is defined by the orientation of successive peptide planes. There are two degrees of freedom for each amino acid (phi and psi).

Conformation of the polypeptide chain: phi
[Figure: the phi angle]

Conformation of the polypeptide chain: psi
[Figure: the psi angle]

The Ramachandran plot
[Figure: Ramachandran plot]

The alpha helix
Helix parameters:
• Helix pitch: 5.4 Å
• 3.6 residues per turn
• Hydrogen bonds from O(i) to N(i+4)
• phi = -60°, psi = -50°
[Figure: axial view]

The 3_10 helix
• 3 residues per turn
• Hydrogen bonds from O(i) to N(i+3)
• phi = -50°, psi = -25°

The beta strand
[Figure: beta strand]

Assembly of beta strands into sheets
• Parallel and anti-parallel beta sheets
• The beta sheet is not flat but curved

The beta turns
[Figure: beta turns]

Triple helix of collagen
• Limited to the tropocollagen molecule
• 3 left-handed helices wound together to give a right-handed superhelix
• Stable superhelix: the glycines (small R group) are located on the central axis of the triple helix
• One interchain H-bond for each triplet of amino acids, between the NH of Gly and the CO of X (or proline) in the adjacent chain

Side chain conformations
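Side chain conformations are described by dihedral (torsion) angles, in the same way that phi and psi describe the backbone. The following is a minimal sketch of how a dihedral angle can be computed from the positions of four consecutive atoms. It is not taken from the course material: the function names are hypothetical, atoms are assumed to be plain (x, y, z) tuples, and the test coordinates are made up rather than read from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_angle(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for atoms p0-p1-p2-p3.

    The angle is 0 when p0 and p3 eclipse each other (cis) and
    180 when they point in opposite directions (trans).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond p1 -> p2.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: the last atom is rotated about the z axis.
    print(dihedral_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

The same function applies to backbone angles (phi, psi, omega) and to side chain angles; only the choice of the four atoms changes.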
[Figure: definition of dihedral angles along the side chain]
[Figure: most frequently observed rotamers]

Non-covalent interactions involved in the shape of proteins

Tertiary structure: the overall shape of a protein, or of a telephone cord

The secondary structure of a telephone cord
The coil of a telephone cord can be used as an analogy for the alpha helix secondary structure of a protein.
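The coil analogy can be made concrete with the alpha helix parameters quoted in these notes: a pitch of 5.4 Å and 3.6 residues per turn. Dividing one by the other gives a rise of 1.5 Å per residue, and 360°/3.6 gives a rotation of 100° per residue. The sketch below places one point per residue on an ideal helix. It is an illustration only: the function names are hypothetical, and the helix radius of 2.3 Å is an assumed value for the Cα positions that does not appear in the notes.

```python
import math

PITCH = 5.4              # Å per turn (from the notes)
RESIDUES_PER_TURN = 3.6  # from the notes
RADIUS = 2.3             # Å, assumed Cα radius (not from the notes)


def rise_and_twist(pitch=PITCH, residues_per_turn=RESIDUES_PER_TURN):
    """Return (rise per residue in Å, rotation per residue in degrees)."""
    return pitch / residues_per_turn, 360.0 / residues_per_turn


def ideal_helix(n_residues, pitch=PITCH,
                residues_per_turn=RESIDUES_PER_TURN, radius=RADIUS):
    """Return one (x, y, z) point per residue along an ideal helix about the z axis."""
    rise, twist = rise_and_twist(pitch, residues_per_turn)
    coords = []
    for i in range(n_residues):
        angle = math.radians(i * twist)
        coords.append((radius * math.cos(angle),
                       radius * math.sin(angle),
                       i * rise))
    return coords


if __name__ == "__main__":
    rise, twist = rise_and_twist()
    print("rise per residue (Å):", round(rise, 3))
    print("rotation per residue (degrees):", round(twist, 3))
    for x, y, z in ideal_helix(5):
        print(round(x, 2), round(y, 2), round(z, 2))
```

Because 3.6 is not an integer, a residue sits directly above residue 0 only after 18 residues, that is, after 5 full turns (27 Å along the axis).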