Introduction to Bioinformatics (Master ChemoInformatique)

• Roland Stote • Institut de Génétique et de Biologie Moléculaire et Cellulaire • Biocomputing Group • 03.90.244.730 • [email protected]

Biological Function at the Molecular Level

• 3.1 × 10^9 letters in the DNA code in every one of the 100 × 10^12 cells in the human body.

• Humans have between 30,000 and 40,000 genes.

• There is approximately 2 m of DNA in each cell, packed into the nucleus.

• If all the DNA in the human body were put end-to-end, it would reach to the Sun and back more than 600 times
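The end-to-end figure can be checked with a short calculation. The sketch below uses the two numbers given on this slide (2 m of DNA per cell, 100 × 10^12 cells); the mean Earth-Sun distance of 1.496 × 10^11 m is an assumption that is not part of the slide.

```python
# Back-of-the-envelope check of the "to the Sun and back" claim.
DNA_PER_CELL_M = 2.0              # metres of DNA per cell (from the slide)
CELLS_IN_BODY = 100e12            # number of cells in the body (from the slide)
EARTH_SUN_DISTANCE_M = 1.496e11   # assumed: mean Earth-Sun distance in metres


def sun_round_trips(dna_per_cell_m=DNA_PER_CELL_M,
                    cells=CELLS_IN_BODY,
                    earth_sun_m=EARTH_SUN_DISTANCE_M):
    """Number of Earth-Sun round trips covered by all the DNA laid end to end."""
    total_length_m = dna_per_cell_m * cells
    return total_length_m / (2.0 * earth_sun_m)


print(f"{sun_round_trips():.0f} round trips")  # about 668
```

With these inputs the total length is 2 × 10^14 m, which is consistent with "more than 600 times".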

What is Bioinformatics?

• Bioinformatics is the study of information contained within biological, chemical or medical systems through the use of computers.

• Bioinformatics methods are used in a wide variety of fields including basic science, biotechnology, medicine, pharmaceutical development and public health, plus others.

• Bioinformatics is continually evolving; new approaches and tools are being developed that allow the researcher to more accurately and efficiently acquire, analyze and present the large amounts of data that are generated in today's research environment.

What is Bioinformatics?

• Development and application of computerized methods for the study of biological information and data (generation of databases)
• Analysis and interpretation of these data (software tools)

• Developing algorithms for text string comparison (and keyword searches)
• Developing algorithms for pattern matching (data mining, cluster analysis)
• Algorithms for geometry analysis (docking, visualisation)
• Physical simulations and model building (molecular dynamics, molecular mechanics, homology modeling)

Bioinformatics is situated at the interface of multiple domains of research.

Objectives of this module (20 hours)

1. Introduction to protein and DNA sequence and structure.

2. Present different biological databases (sequence and structure) and their associated query tools.

3. Find information on a protein from its sequence.

4. Visualize, analyze and find information on a protein from its 3-D structure - use of the visualization program VMD.

5. Presentation of the basics of molecular modeling applied to biological molecules. An introduction of energy minimization and molecular dynamics.

An introduction to DNA and Relationship to function

Roland Stote


Sources of supplementary information

• Introduction à la structure des protéines, Branden & Tooze, Ed. DeBoeck Université
• Proteins: Structures and Molecular Properties, Thomas E. Creighton, W.H. Freeman
• On the web:
  - http://www.expasy.ch/swissmod/course/course-index.htm
  - http://www.cryst.bbk.ac.uk/PPS2/course/index.html

Structure of nucleic acids

Roland Stote

Based on the course by Annick Dejaegere at ESBS

Nucleic acids are made of:

• Phosphates
• Sugars
• Bases

Bases: Pyrimidines

[Figure: pyrimidine ring with atoms numbered 1 to 6, and the structures of cytosine, uracil and thymine]

• Cytosine: 2-oxy-4-aminopyrimidine
• Uracil: 2,4-dioxypyrimidine
• Thymine: 5-methyluracil

Bases: Purines

[Figure: purine ring system with atoms numbered 1 to 9, and the structures of adenine and guanine]

• Adenine: 6-aminopurine
• Guanine: 2-amino-6-oxypurine

Hydrogen bonds

[Figure: hydrogen-bonding sites, labelled A and D, on cytosine, uracil, adenine and guanine]

• A: interaction with an acceptor
• D: interaction with a donor

In-plane interactions: hydrogen bonds

Base pairs

• 10 possible purine-pyrimidine pairings
• 11 purine-purine
• 7 pyrimidine-pyrimidine

http://www.imb-jena.de/ImgLibDoc/

Vertical interactions

• Vertical stacking
• Hydrophobic effect
• Electrostatic interactions between the bases

A and B helices

The grooves

Z-DNA

• G-C sequences
• Alternating syn and anti conformations

A-, B- and Z-DNA

• B-DNA: low ionic strength; native conformation of chromatin
• A-DNA: high ionic strength or in the presence of alcohol
• A-RNA: native conformation of RNA
• Z-DNA: alternating poly(dG-dC) sequences at high ionic strength

Nucleosome

Molecular Biology

• DNA: a string over a four-letter alphabet of nucleotides A, C, G, T
• Usually double-stranded, with complementary antiparallel strands:

5' ATCGCCTTATTCAT 3'
3' TAGCGGAATAAGTA 5'
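Base complementarity is easy to express as a string operation. The following is a minimal sketch; the function names are illustrative and not taken from any particular library. It computes the complementary strand, and the reverse complement, which is the same strand read in its own 5' to 3' direction.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the complementary bases, position by position."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the complementary strand written 5' to 3'."""
    return complement(strand)[::-1]


top = "ATCGCCTTATTCAT"               # 5' -> 3', the example strand from the slide
print(complement(top))               # bottom strand, read 3' -> 5'
print(reverse_complement(top))       # bottom strand, read 5' -> 3'
```

Applied to the strand shown on the slide, `complement` reproduces the second line of the example (TAGCGGAATAAGTA).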

Genetic Code

• A gene is a specific sequence of nucleotide bases that carries the information required for constructing proteins, which provide the structural components of cells and tissues as well as enzymes for essential biochemical reactions.
• The human genome is estimated to comprise more than 30,000 genes.
• The Genetic Code describes the translation of genes into protein.

Genetic Code

TTT F Phe     TCT S Ser     TAT Y Tyr     TGT C Cys
TTC F Phe     TCC S Ser     TAC Y Tyr     TGC C Cys
TTA L Leu     TCA S Ser     TAA * Ter     TGA * Ter
TTG L Leu i   TCG S Ser     TAG * Ter     TGG W Trp

CTT L Leu     CCT P Pro     CAT H His     CGT R Arg
CTC L Leu     CCC P Pro     CAC H His     CGC R Arg
CTA L Leu     CCA P Pro     CAA Q Gln     CGA R Arg
CTG L Leu i   CCG P Pro     CAG Q Gln     CGG R Arg

ATT I Ile     ACT T Thr     AAT N Asn     AGT S Ser
ATC I Ile     ACC T Thr     AAC N Asn     AGC S Ser
ATA I Ile     ACA T Thr     AAA K Lys     AGA R Arg
ATG M Met i   ACG T Thr     AAG K Lys     AGG R Arg

GTT V Val     GCT A Ala     GAT D Asp     GGT G Gly
GTC V Val     GCC A Ala     GAC D Asp     GGC G Gly
GTA V Val     GCA A Ala     GAA E Glu     GGA G Gly
GTG V Val     GCG A Ala     GAG E Glu     GGG G Gly

(i: initiation codon; * Ter: termination codon)
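The codon table above can be used directly as a lookup table. The sketch below builds the 64-entry table in the same T, C, A, G order and translates a DNA coding sequence codon by codon until a stop codon is reached. It is a simplified illustration: it assumes the reading frame starts at the first base, and the input sequence is made up for the example.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for the 64 codons, in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":      # TAA, TAG or TGA
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical sequence: ATG TTT TCA GAA AAA TAA
print(translate("ATGTTTTCAGAAAAATAA"))  # MFSEK
```

A real gene finder would also have to locate the start codon and choose among the possible reading frames; that is deliberately left out here.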

Biology at the Molecular Level

Proteins are essentially biological polymers

[Figure labels: amino acid; peptide bond]

• The N-terminus ends with an amino group.
• The C-terminus ends with a carboxyl group.

A peptide: Phe-Ser-Glu-Lys (F-S-E-K)

General form of amino acids

        R
        |
H2N  -  Cα  -  COOH
        |
        H

The alpha carbon is asymmetric. Two stereoisomers are possible, D or L.

Chirality in proteins

D form L form

Interactions between amino acids and their environment

Electrostatic interactions

$E(r) = \frac{A}{r}$

[Figure: pairs of charges (+ -, and + +) separated by a distance r]

van der Waals interactions

$E(r) = \frac{B}{r^{12}} - \frac{C}{r^{6}}$
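The two energy expressions can be evaluated numerically. The sketch below implements E(r) = A/r and E(r) = B/r^12 - C/r^6 exactly as written on the slide. The constants A, B and C are left as free parameters, and the numbers used in the example are arbitrary rather than a real force field. In the electrostatic term, A is proportional to the product of the two charges, so it is negative for opposite charges (attraction) and positive for like charges (repulsion). Setting dE/dr = 0 for the van der Waals term gives a minimum at r = (2B/C)^(1/6) with depth -C²/(4B).

```python
def electrostatic_energy(r, a):
    """E(r) = A / r; A < 0 is attractive, A > 0 is repulsive."""
    return a / r


def van_der_waals_energy(r, b, c):
    """E(r) = B / r**12 - C / r**6 (repulsive core, attractive tail)."""
    return b / r**12 - c / r**6


def van_der_waals_minimum(b, c):
    """Distance and energy of the minimum, from dE/dr = 0."""
    r_min = (2.0 * b / c) ** (1.0 / 6.0)
    return r_min, van_der_waals_energy(r_min, b, c)


# Arbitrary illustrative constants, in arbitrary units.
B, C = 1.0, 2.0
r_min, e_min = van_der_waals_minimum(B, C)
print(r_min, e_min)
for r in (0.9, 1.0, 1.5, 3.0):
    print(r, van_der_waals_energy(r, B, C))
```

The loop shows the characteristic shape: steeply repulsive below the minimum, weakly attractive above it, and tending to zero at large r.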

Interactions with solvent

Measured by:

• solubility
• chromatography
• surface tension

Hydrophobic interactions

Hydrophilic interactions

The hydrogen bond

D(δ-) - H(δ+) .... A(δ-)
D: donor, A: acceptor

Double character:
• electrostatic
• covalent

An acceptor can be shared.

Geometry (N-H .... O): linear to within +/- 20 deg, d = 1.7 Å - 2.0 Å

Classification of amino acids

[Figure: diagram grouping the 20 amino acids (one-letter codes) into charged, small hydrophilic, amphipathic, small hydrophobic and bulky hydrophobic classes]

The peptide bond

[Figure: the two resonance forms of the peptide bond linking the Cα atoms of residues i and i+1]

CIS-TRANS Isomerization

[Figure: trans and cis arrangements of the Cα atoms of residues i and i+1 about the peptide bond]

• TRANS: ω = 180°
• CIS: ω = 0°

Four Levels of Structure Determine the Shape of Proteins

Primary structure: the linear arrangement (sequence) of amino acids and the location of covalent (mostly disulfide) bonds within a polypeptide chain. Determined by the genetic code.

Secondary structure: local folding of a polypeptide chain into regular structures including the α helix, β sheet, and U-shaped turns and loops.

Tertiary structure: overall three-dimensional form of a polypeptide chain, which is stabilized by multiple non-covalent interactions between side chains.

Quaternary structure: the number and relative positions of the polypeptide chains in multisubunit proteins. Not all proteins have a quaternary structure.

Primary Structure of a protein: determined by the nucleotide sequence of its gene

Bovine insulin: the first sequenced protein

• In 1953, Frederick Sanger determined the sequence of insulin, a protein hormone.

• This work is a landmark in biochemistry because it showed for the first time that a protein has a precisely defined amino acid sequence.

• It demonstrated that insulin consists only of amino acids linked by peptide bonds between α-amino and α-carboxyl groups.

• The complete amino acid sequences of more than 100,000 proteins are now known.

• Each protein has a unique, precisely defined amino acid sequence.

Amino acid substitution in proteins from different species

Conservative: substitution of an amino acid by another amino acid of similar polarity (for example, Val for Ile at position 10 of insulin).

Non-conservative: substitution of an amino acid by another of different polarity (for example, in sickle cell anemia the glutamic acid at position 6 of hemoglobin is replaced by a valine, which induces precipitation of hemoglobin in red blood cells).

Invariant residues: amino acids found at the same position in different species (critical for the structure or function of the protein).
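The distinction between conservative and non-conservative substitutions can be sketched as a lookup on polarity classes. The grouping below (nonpolar, polar uncharged, acidic, basic) is one common textbook classification and is an assumption here. Other groupings exist and would classify some pairs differently.

```python
# Assumed coarse polarity classes (one-letter amino acid codes).
_CLASSES = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
POLARITY_CLASS = {
    residue: name for name, residues in _CLASSES.items() for residue in residues
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same assumed polarity class."""
    return POLARITY_CLASS[original.upper()] == POLARITY_CLASS[replacement.upper()]


print(is_conservative("I", "V"))  # Ile -> Val, the insulin example
print(is_conservative("E", "V"))  # Glu -> Val, the sickle cell example
```

Under this grouping the insulin example is classified as conservative and the sickle cell substitution as non-conservative, in line with the definitions above.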

Protein conformation: many (but not all) proteins fold into a stable conformation, otherwise known as the native conformation

A chain of more than about 50 amino acids is called a protein; shorter chains are known as peptides.

Secondary structure of proteins

The 3D structure is defined by the orientation of successive peptide planes

There are two degrees of freedom for each amino acid (phi and psi)
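Phi and psi are dihedral (torsion) angles, each defined by four consecutive backbone atoms: phi by C(i-1), N(i), Cα(i), C(i) and psi by N(i), Cα(i), C(i), N(i+1). The sketch below computes a dihedral angle from four Cartesian points using only the standard library. The coordinates in the example are artificial test points, not real atomic coordinates, and the function name is illustrative.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Artificial points: cis (0 degrees), trans (180 degrees) and a quarter turn.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing phi and psi for every residue of a structure and plotting one against the other gives the Ramachandran plot.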

Conformation of the polypeptide chain

[Figure: the phi angle]

[Figure: the psi angle]

The Ramachandran plot

The alpha helix

Helix Parameters

• Helix pitch: 5.4 Å
• 3.6 residues per turn
• Hydrogen bonds: O(i) to N(i+4)
• phi = -60° / psi = -50°
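Two further quantities follow from the helix parameters above (a pitch of 5.4 Å and 3.6 residues per turn): the rise along the axis per residue and the rotation per residue. The sketch below works them out and estimates the length of an ideal helix. The 20-residue example is hypothetical.

```python
HELIX_PITCH_ANGSTROM = 5.4   # advance along the axis per full turn (from the slide)
RESIDUES_PER_TURN = 3.6      # from the slide


def rise_per_residue(pitch=HELIX_PITCH_ANGSTROM, residues_per_turn=RESIDUES_PER_TURN):
    """Advance along the helix axis per residue, in angstroms."""
    return pitch / residues_per_turn


def rotation_per_residue(residues_per_turn=RESIDUES_PER_TURN):
    """Rotation about the helix axis per residue, in degrees."""
    return 360.0 / residues_per_turn


def helix_length(n_residues):
    """Approximate length of an ideal alpha helix, in angstroms."""
    return n_residues * rise_per_residue()


print(rise_per_residue())      # about 1.5 angstroms per residue
print(rotation_per_residue())  # about 100 degrees per residue
print(helix_length(20))        # about 30 angstroms for a hypothetical 20-residue helix
```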

Axial view:

The 3_10 helix

• 3 residues per turn
• Hydrogen bonds: O(i) to N(i+3)
• phi = -50° / psi = -25°

The beta strand

Assembly of beta strands into sheets

Parallel and anti-parallel beta sheets

The beta sheet is not flat but curved

The beta turns

Triple helix of Collagen

• Limited to the tropocollagen molecule
• 3 left-handed helices wound together to give a right-handed superhelix
• Stable superhelix: glycines (small R group) located on the central axis of the triple helix
• One interchain H-bond for each triplet of amino acids, between the NH of Gly and the CO of X (or Proline) in the adjacent chain

Side chain conformations

Definition of dihedral angles along the side chain

Most frequently observed rotamers

Non covalent interactions involved in the shape of proteins

Tertiary structure: the overall shape of a protein

or a telephone cord!!!

The secondary structure of a telephone cord

A telephone cord, specifically the coil of a telephone cord, can be used as an analogy to the alpha helix secondary structure of a protein.

The tertiary structure of a telephone cord

The tertiary structure of a protein refers to the way the secondary structure folds back upon itself or twists around to form a three-dimensional structure. The secondary coil structure is still there, but the tertiary tangle has been superimposed on it.

Secondary structure motifs

Motifs: assembly (simple) of secondary structural elements

Helix-turn-helix

β Hair-pin

Secondary structure motifs

β-α-β motif

Greek key motif

Fold Classification

α β α/β α+β

Tertiary structure: the overall shape of a protein

Full three dimensional organization of a protein

The three-dimensional structure of a protein kinase

The role of side chains in the shape of proteins

Where is water?

Hydrophilic

Hydrophobic

TERTIARY STRUCTURE

• R-group interactions result in the 3D structures of globular proteins.
• Types of interactions: hydrogen bonds, ionic bonds (salt linkages), hydrophobic interactions and disulphide bonds.
• Hydrophilic R groups lie on the surface, while hydrophobic R groups are buried inside the molecule.
• Wide variety of tertiary (3°) structures, owing to the large variation in protein sizes and amino acid sequences.

After X-ray crystallographic studies of hen lysozyme (Phillips, 1966) and papain (Drenth et al., 1968), and limited proteolysis studies of immunoglobulins (Porter, 1973; Edelman, 1973), Donald B. Wetlaufer defined domains as stable units of protein structure that could fold autonomously.

A protein domain:

• is a part of a protein that can evolve, function, and exist independently of the rest of the protein chain.

• Each domain forms a compact three-dimensional structure and often can be independently stable and folded.

• Many proteins consist of several structural domains.

• One domain may appear in a variety of evolutionarily related proteins.

• Domains vary in length from about 25 amino acids up to 500 amino acids.

• Examples: zinc fingers (stabilized by metal ions), calcium-binding EF hands.

Protein domains

• The NAD*-binding domain of the enzyme lactic dehydrogenase
• Cytochrome b562: a single-domain protein involved in electron transport in mitochondria
• The variable domain of an immunoglobulin

*nicotinamide adenine dinucleotide

The Src protein

Quaternary structure:

If a protein is formed as a complex of more than one protein chain, the complete structure is designated its quaternary structure:

• Generally formed by non-covalent interactions between subunits

• Either as homo- or hetero-multimers

Primary structure

Secondary structure

Tertiary structure

Quaternary structure

Function of and proteins

STRUCTURE - FUNCTION RELATIONSHIPS

In general, all globular proteins have distinctive 3D structures that are specialized for their particular functions.

Shape and function

Membrane transport proteins

Mechanical support - skin and bone are strengthened by the protein collagen.

Abnormal collagen synthesis or structure causes dysfunction of:

• cardiovascular organs
• bone
• skin
• joints
• eyes

Refer to Devlin Clinical correlation 3.4 p121

Transport and storage - small molecules are often carried by proteins in the physiological setting (for example, the protein hemoglobin is responsible for the transport of oxygen to tissues). Many drug molecules are partially bound to serum albumin in the plasma.

The binding of oxygen is affected by molecules such as carbon monoxide (CO) (for example from tobacco smoking, cars and furnaces).

CO competes with oxygen at the heme binding site. Hemoglobin's binding affinity for CO is 200 times greater than its affinity for oxygen, meaning that small amounts of CO dramatically reduce hemoglobin's ability to transport oxygen. When hemoglobin combines with CO, it forms a very bright red compound called carboxyhemoglobin.

When inspired air contains CO levels as low as 0.02%, headache and nausea occur; if the CO concentration is increased to 0.1%, unconsciousness will follow. In heavy smokers, up to 20% of the oxygen-active sites can be blocked by CO.

[Figure: 3-dimensional structure of hemoglobin. The four subunits are shown in red and yellow, and the heme groups in green.]
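The effect of the 200-fold affinity difference can be illustrated with a simple competitive-binding estimate. The sketch below is a simplification that rests on several assumptions not stated on the slide: CO and O2 compete for the same sites at equilibrium, every site is occupied by one or the other, the ratio of CO-bound to O2-bound sites equals the affinity ratio multiplied by the ratio of the gas concentrations, and inspired air contains 21% oxygen.

```python
def co_bound_fraction(co_percent, o2_percent=21.0, relative_affinity=200.0):
    """Equilibrium fraction of heme sites occupied by CO (simplified model).

    Assumes [HbCO] / [HbO2] = relative_affinity * (CO / O2) and that every
    site carries either CO or O2.
    """
    ratio = relative_affinity * co_percent / o2_percent
    return ratio / (1.0 + ratio)


for co in (0.02, 0.1):   # CO levels quoted on the slide, in percent
    print(f"{co}% CO -> {100 * co_bound_fraction(co):.0f}% of sites bound to CO")
```

In this model, 0.02% CO already ties up about 16% of the binding sites at equilibrium, and 0.1% CO about half of them, which gives a feel for why such low concentrations are dangerous.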

Cell adhesion and signaling: integrins are molecules that couple the cytoskeleton to the extracellular matrix.

[Figure: signaling between the inside and the outside of the cell]

Integrin Topology

http://www.multimedia.mcb.harvard.edu/

The relationship between shape and function of proteins:

The Shape of proteins:

• Occurs spontaneously (native conformation)
• Determined by different levels of structure

Disease and :

Disease

Example: Neurodegenerative diseases

An X-ray diffraction image for the protein myoglobin.

The first protein crystal structure was of sperm whale myoglobin, as determined by Max Perutz and Sir John Cowdery Kendrew in 1958, which led to a Nobel Prize in Chemistry.

Protein NMR is a field of structural biology that applies nuclear magnetic resonance spectroscopy to the investigation of proteins.

The field was pioneered by, among others, Richard Ernst (Nobel Prize 1991) and Kurt Wüthrich (Nobel Prize 2002).

[Figure: Pacific Northwest National Laboratory's high magnetic field (800 MHz) NMR spectrometer being loaded with a sample.]

[Figure: The NMR sample is prepared in a thin-walled glass tube.]

Protein NMR is performed on aqueous samples of highly purified protein.

The sample consists of between 300 and 600 microlitres with a protein concentration in the range 0.1 to 3 millimolar.
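These figures translate into the amount of purified protein that has to be prepared. The sketch below reads the concentration range as 0.1 to 3 millimolar (mM) and converts volume and concentration into nanomoles and milligrams. The molar mass of 20,000 g/mol is a hypothetical value chosen only for illustration.

```python
def protein_amount_nmol(volume_ul, concentration_mm):
    """Amount of protein in nanomoles.

    microlitres x millimolar = 1e-6 L x 1e-3 mol/L = 1e-9 mol = 1 nmol.
    """
    return volume_ul * concentration_mm


def protein_mass_mg(volume_ul, concentration_mm, molar_mass_g_per_mol):
    """Mass of protein in milligrams (nmol x g/mol = ng; 1 ng = 1e-6 mg)."""
    return protein_amount_nmol(volume_ul, concentration_mm) * molar_mass_g_per_mol * 1e-6


MOLAR_MASS = 20_000  # hypothetical 20 kDa protein

# Smallest and largest samples in the ranges quoted above.
print(protein_amount_nmol(300, 0.1), protein_mass_mg(300, 0.1, MOLAR_MASS))
print(protein_amount_nmol(600, 3.0), protein_mass_mg(600, 3.0, MOLAR_MASS))
```

For the hypothetical 20 kDa protein this spans roughly 0.6 mg to 36 mg, which is why an efficient expression and purification system matters.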

The source of the protein can be either natural or produced in an expression system using recombinant DNA techniques through genetic engineering.

Acknowledgements

Marie-Véronique Clement Annick Dejaegere Bruno Kiefer
