
Computational and Molecular Biology

Dannie Durand, Fall 2003, Lecture 1

Outline

A whirlwind review of molecular biology
An overview of computational molecular biology
New problems in genomics

Genes Encode Proteins

A gene is a DNA sequence: GTGCACCTGACTCCTGAG...
A protein is an amino acid sequence: V H L T P E...
A protein folds into a 3D structure.

DNA forms a double-stranded helix

Bases: A, T, G, C

DNA replication

…aggaggcctcgcctctcccagcatgggctggggctcctgtcccccactgtgtgtgcctggggcctggccaggactcccagtga…

Protein Synthesis: RNA transcription

DNA → RNA

RNA transcription

double-stranded DNA sequence:
GTGCACCTGACTCCTGAG...
CACGTGGACTGAGGACTC...

open DNA helix:
...GTGTCGCTGACTCCTTGGCTACCCGAG...
...CACAGCGACTGAGGAACCGATGGGCTC...

RNA transcript:
...CUGACUCCUUGGCUACCCGAG...

RNA

• Adenine, Guanine, Cytosine, Uracil (AGCU)
• Single stranded
• Secondary structure
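The base pairing on the transcription slides can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names `complement` and `transcribe` are made up here, and strand direction (5' to 3') is ignored, as it is in the slide figures.

```python
# Watson-Crick partners for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(dna: str) -> str:
    """Return the base-by-base complement of a DNA strand (same left-to-right order)."""
    return "".join(DNA_COMPLEMENT[base] for base in dna.upper())


def transcribe(template: str) -> str:
    """Return the RNA built against a DNA template strand.

    Each template base is paired with its complement, and uracil (U)
    takes the place of thymine (T) in the RNA.
    """
    return complement(template).replace("T", "U")


coding = "GTGCACCTGACTCCTGAG"   # top strand from the slide
template = complement(coding)
print(template)                  # CACGTGGACTGAGGACTC
print(transcribe(template))      # GUGCACCUGACUCCUGAG
```

The RNA transcript has the same letters as the top strand with U in place of T, which is why the messenger RNA on the later translation slides reads GUGCACCUGACUCCUGAG.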

RNA Secondary Structure

CCGUGAACGUGUACCGGAUUUUUAUUCCC...

(Figure labels: secondary structure, tertiary structure)

Protein Synthesis: translation

RNA → Protein

Protein Translation

messenger RNA: ...GUGCACCUGACUCCUGAG...
amino acid sequence: V H L T P E...

(Figure labels: ribosome, messenger RNA, transfer RNA, amino acid sequence, tertiary protein structure)
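The messenger RNA fragment on the translation slides, GUGCACCUGACUCCUGAG, is read three bases (one codon) at a time. The sketch below uses the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative, and the reading frame is assumed to start at the first base of the fragment.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed for codons in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("GUGCACCUGACUCCUGAG"))  # VHLTPE
```

The result matches the amino acid sequence V H L T P E shown on the slides.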

A gene is a location on a chromosome that encodes a protein

…aggaggcctcgcctctcccagcatgggctggggctcctgtcccccactgtgtgtgcctggggcctggccaggactcccagtga…
(Figure label: protein coding sequence)

Gene Regulation

If I have the same set of genes in every cell, how come my liver cells look so different from my skin cells?

Only a small number of genes are being translated into protein at any one time.

Translating Genes into Proteins: Bacteria

DNA: promoter, gene (with RNA polymerase)
transcription
messenger RNA:
translation
amino acid sequence:

Gene Regulation Controls When a Gene Is Transcribed

DNA: promoter, gene (with RNA polymerase)

(Figure: DNA with promoter and gene, shown with tryptophan, the repressor, and RNA polymerase)

Translating Genes into Proteins: Multicellular organisms

DNA: exon1 exon2 exon3 exon4
transcription
RNA transcript:
RNA splicing
mRNA:
translation
amino acid sequence

Alternative splicing

DNA: exon1 exon2 exon3 exon4 exon5 exon6
mRNA: exon1 exon2 exon3 exon4 exon5 exon6
Alternate splice forms:
female: exon1 exon2 exon3 exon4
male: exon1 exon2 exon3 exon5 exon6

Gene Regulation

In single cell organisms, gene regulation orchestrates
– Responses to changing environment
– Cell cycle

In multicellular organisms, gene regulation orchestrates
– Tissue type differentiation
– Development from embryo to adulthood
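The alternate splice forms on the alternative splicing slide (female: exons 1 to 4; male: exons 1, 2, 3, 5 and 6) amount to choosing which exons of one transcript are joined. A minimal sketch follows; the exon sequences are invented placeholders, since the slide names the exons but gives no sequences, and `splice` is a hypothetical helper.

```python
def splice(exons: dict[str, str], keep: list[str]) -> str:
    """Join the chosen exons, in the given order, into one mRNA string."""
    return "".join(exons[name] for name in keep)


# Placeholder exon sequences, invented only to make the example runnable.
exons = {
    "exon1": "AUGGCU", "exon2": "GAUCCA", "exon3": "UUGGCA",
    "exon4": "CCUUAA", "exon5": "GGAGUC", "exon6": "ACGUGA",
}

female = splice(exons, ["exon1", "exon2", "exon3", "exon4"])
male = splice(exons, ["exon1", "exon2", "exon3", "exon5", "exon6"])
print(female)
print(male)
```

The two forms share exons 1 to 3 and differ only in the tail, so one gene yields two different messenger RNAs.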

Outline

A whirlwind review of molecular biology
An overview of computational molecular biology
– Sequence comparison
– Reconstruct evolutionary history
– Gene prediction
– Predict structure from sequence
  • RNA
  • Proteins
New problems in genomics

The Origins of

(Timeline figure with decade markers 1970, 1980, 1990; events listed in the order they appear)

Computing: ARPANET; first royal email; USENET newsgroups; TCP/IP; Internet; World Wide Web, Gopher; NCSA Mosaic; Pizza Hut goes online
Biology: Sanger-Coulson and Maxam-Gilbert sequencing; Gilbert, Sanger win Nobel Prize; Congress establishes GenBank; Human Genome Project begins; GenBank goes online

Growth of sequence data during the '90s

(Figure: Collins et al., Science, Oct 1998; National Center for Biotechnology Information)

Why sequence data is so powerful: Sequences are related!

(Figure: a tree of related sequences)

early mammal:
…atgccaggactcccagtga…

internal nodes:
…atgcaaggagtcccagagc…
…atgcgaggtctcccagtgt…

human, rat, mouse:
…atgcaaggagtcgcagagc…
…atgcgaggtctcgtagtgt…
…atgggaggtctcccagtgt…

Sequence Comparison

Pairwise alignment (global and local):
…atgcaaggagtcccagagcctgagctgactacgt…
…atgcaag_cgtcccagtgccagaactccctacgt…
…atgcgaggtctcccagtgtctgaactgactaagt…
…acc_gtggtctccgagtggctgaactgac_aaca…

Global multiple alignment:
mkwvtfisll v..frrda.h ksevahrfkd lgeenfkalv…
mkwvtfisll v..frrea.h kseiahrfnd vgeehfiglv…
~~wvtfisll v..frrdt.y kseiahrfkd lgeqyfkglv…
mkwvtlisfi lqrfardaeh kseiahrynd lkeetfkava…

Local multiple alignment:
…mkwvtfisll flfssaysrg v..frrda.h ksevahrf
…mkwvtfisll flfssaysrg v..frrea.h kseiahrf
…~~wvtfisll flfssaysrg v..frrdt.y kseiahrf
…mkwvtlisfi flfssatsrn lqrfardaeh kseiahr

Applications
• Database searching
• RNA structure
• Evolutionary tree reconstruction
• Gene finding
• ….
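The difference between global and local pairwise alignment can be made concrete with the classic dynamic-programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local). The sketch below returns only the best score, not the alignment itself. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions, not values from the lecture, and `alignment_score` is an illustrative name.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Best alignment score of a and b by dynamic programming.

    local=False: global alignment (Needleman-Wunsch), both sequences end to end.
    local=True:  local alignment (Smith-Waterman), best-scoring pair of substrings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for gaps at the start of either sequence.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# Two fragments from the slides, with the gap characters removed before aligning.
x = "atgcaaggagtcccagagcctgagctgactacgt"
y = "atgcaag_cgtcccagtgccagaactccctacgt".replace("_", "")
print(alignment_score(x, y))
print(alignment_score(x, y, local=True))
```

The only differences between the two modes are the first row and column of the table, the floor at zero, and where the final score is read from.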

Sequence similarity => functional similarity

BLAST

(Figure: two panels of protein sequence)

…LWDEFNQLGTE
MIVTKAGRRMFP
TFQVKLFGMDPM
ADYMLLMDFVPV
DDKRYRYAFHS…

…LWDPTFQVEFNQLG…
…TMFPTFEMIVTKAG…
…RRMFPTFQVPTFQV…
…KLFMFPTFGEMDPM…
…ADYMMCFPTFLLMD…
…FVPVDDKPTSFQVR…
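BLAST-style database search starts from short words shared by the query and a database sequence, then extends those seeds into longer alignments. The sketch below covers only the exact-word seeding step. It is a simplification rather than the BLAST algorithm itself, which also scores similar (not just identical) words and extends the hits; `seed_hits` and the word length `k = 3` are assumptions made for illustration.

```python
from collections import defaultdict


def seed_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where the same k-letter word occurs."""
    # Index every k-letter word of the subject by its start position.
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    # Look up each word of the query in that index.
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


print(seed_hits("PTFQV", "LWDPTFQVEFNQLG"))  # [(0, 3), (1, 4), (2, 5)]
```

Indexing the words once is what makes the search fast: each query word is a dictionary lookup rather than a scan of the whole sequence.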

Reconstructing Evolutionary History

(Figure: a tree relating the sequences below)

early mammal:
…atgccaggactcccagtga…

internal nodes:
…atgcaaggagtcccagagc…
…atgcgaggtctcccagtgt…

gorilla, human, chimpanzee:
…atgcaaggagtcgcagagc…
…atgcgaggtctcgtagtgt…
…atgggaggtctcccagtgt…
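A common first step in tree reconstruction is to measure how different each pair of present-day sequences is and to group the closest pair first. The sketch below counts mismatches (Hamming distance) between the three leaf sequences on the slide. It assumes the fragments are already aligned and equally long, uses neutral labels because the pairing of species names to sequences is not certain from the slide text, and is not a full tree-building method.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# The three present-day sequences from the slide, in the order they appear.
leaves = {
    "leaf1": "atgcaaggagtcgcagagc",
    "leaf2": "atgcgaggtctcgtagtgt",
    "leaf3": "atgggaggtctcccagtgt",
}

distances = {
    (p, q): hamming(leaves[p], leaves[q]) for p, q in combinations(leaves, 2)
}
closest = min(distances, key=distances.get)
print(distances)
print(closest)  # ('leaf2', 'leaf3')
```

The second and third sequences differ at the fewest positions, which is consistent with a tree in which they share a more recent ancestor.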

Gene Recognition Problem

…aggaggcctataacgcctctcccagcatgggctggggctcctgtcccccactgtggcctggtactggccaggactcccagtga…

?

human
…aggaggactataacgcctctcccagcatgggctggggctcctgtcccccactgtggcctggtactgcgccaggactcgtagtga…

Mouse
…cgacgccatataaattagtaatgtactatgggctggggcgtgatacgtacactgtggcctggtagctatgcagcacgtgtctagtga…
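One of the simplest signals a gene finder can look for is an open reading frame (ORF): a start codon followed, in the same frame, by a run of codons ending at a stop codon. The sketch below scans the three forward frames only. It is a deliberately naive heuristic, not the comparative human/mouse approach the slides suggest, and it ignores exons, introns and the reverse strand. The name `find_orfs` and the `min_codons` threshold are assumptions.

```python
STOP_CODONS = {"taa", "tag", "tga"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of open reading frames on the given strand.

    An ORF here starts at 'atg' and runs in-frame to the first stop codon;
    end is the index just past the stop codon.
    """
    dna = dna.lower()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "atg":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count the codons before the stop, including the start codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


print(find_orfs("ccatgaaacccgggtaacc"))  # [(2, 17)]

# The human fragment from the slide; results depend on min_codons.
human = ("aggaggactataacgcctctcccagcatgggctggggctcctgtcccccactgtgg"
         "cctggtactgcgccaggactcgtagtga")
print(find_orfs(human, min_codons=1))
```

Short fragments like the ones on the slide rarely contain a convincing ORF on their own, which is part of why comparing related genomes helps.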

RNA Secondary Structure

CCGUGUACGUGUAACGGAUCUUUAUUCCC...
CCGUGAACGUAUACUGGAGUUUUAGUCCG...
GCGUCACCGUGUUCCGGAUUAUGAUUCCC
CCGAGAACGUCCACGGGAUUUCUGUUCCA
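A classic simplification of RNA secondary structure prediction is to find the folding with the most nested base pairs (Nussinov-style dynamic programming). The sketch below is that simplification, not an energy-based predictor. Allowing G-U wobble pairs and requiring at least three unpaired bases in a hairpin loop (`min_loop = 3`) are assumptions made here, and `max_base_pairs` is an illustrative name.

```python
# Allowed pairs: Watson-Crick pairs plus the G-U wobble pair (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style dynamic programming).

    min_loop is the minimum number of unpaired bases between paired positions.
    """
    n = len(rna)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k between i and j - min_loop - 1.
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


print(max_base_pairs("GGGAAACCC"))  # 3
print(max_base_pairs("CCGUGUACGUGUAACGGAUCUUUAUUCCC"))
```

The table is filled from short subsequences to long ones, so every sub-problem a cell depends on has already been solved.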

Structure Determination is Hard

• Nuclear Magnetic Resonance
• X-ray Diffraction

Predicting Protein Structure

• Tertiary structure:
  – Detailed physical models
  – Find the configuration of minimum energy
• Secondary structure:
  – e.g., Which amino acids participate in a helix?
• Secondary structure motifs:
  – e.g., Is this a coiled-coil protein?
• Threading:
  – Estimate protein structure from a related protein with known structure and similar sequence.

Complete physical model of tertiary structure: Too hard!

Protein Secondary Structure

Secondary structure: Does this amino acid participate in an alpha-helix?

(Figure labels: alpha helix, beta sheet)

Is this a coiled-coil protein?

Threading

Structure?

…vkltpegtr_wgghpldekflske…
…vhltpettrgwgghmldekeiske…

Estimate protein structure from a related protein with known structure and similar sequence.
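Whether a protein of known structure is a reasonable template depends first on how similar the two sequences are. One rough check is percent identity over the aligned columns, sketched below for the two fragments on the threading slide. This measures sequence similarity only and is not a threading method. `percent_identity` is an illustrative name, and treating `_` as the gap character follows the slide.

```python
def percent_identity(a: str, b: str, gap: str = "_") -> float:
    """Percent of aligned, gap-free columns where the two residues match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns) if columns else 0.0


seq_a = "vkltpegtr_wgghpldekflske"
seq_b = "vhltpettrgwgghmldekeiske"
print(round(percent_identity(seq_a, seq_b), 1))  # 78.3
```

Here 18 of the 23 gap-free columns match.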

• A world-wide consortium to determine novel protein structures.
• Which proteins are likely to be novel?

(Figure: map of topics)

• Pairwise sequence alignment (global and local)
• Multiple sequence alignment (global, local)
• Substitution matrices
• Database searching: BLAST
• Sequence statistics
• Gene finding
• Evolutionary tree reconstruction
• Protein structure prediction
• RNA structure prediction