Computational Genomics and Molecular Biology
Dannie Durand, Fall 2003, Lecture 1

Outline
A whirlwind review of molecular biology
An overview of computational molecular biology
New problems in genomics
Genes Encode Proteins
A gene is a DNA sequence: GTGCACCTGACTCCTGAG...
A protein is an amino acid sequence: V H L T P E...
A protein folds into a 3D structure

DNA forms a double stranded helix
A T G C
DNA replication
…aggaggcctcgcctctcccagcatgggctggggctcctgtcccccactgtgtgtgcctggggcctggccaggactcccagtga…
DNA
Protein Synthesis
DNA
RNA transcription
RNA

Protein Synthesis: Transcription
DNA
RNA transcription
[Figure: RNA transcription. A double-stranded DNA sequence (GTGCACCTGACTCCTGAG... paired with CACGTGGACTGAGGACTC...) and an open DNA helix (...GTGTCGCTGACTCCTTGGCTACCCGAG... paired with ...CACAGCGACTGAGGAACCGATGGGCTC...).]

RNA transcription
[Figure: the RNA transcript ...CUGACUCCUUGGCUACCCGAG... shown alongside the DNA strand ...GACTGAGGAACCGATGGGCTC... of the open helix.]

RNA
• Adenine, Guanine, Cytosine, Uracil (AGCU)
• Single stranded
• Secondary structure
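The base pairing in the transcription slides can be checked in a few lines of Python. This is a minimal sketch, not part of the lecture: the function names are made up for illustration, and the transcript is written with the same letters as the non-template DNA strand, with U in place of T.

```python
# Watson-Crick complement for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right
    order in which the two strands are drawn on the slide."""
    return "".join(DNA_COMPLEMENT[base] for base in strand.upper())


def transcribe(coding_strand: str) -> str:
    """Return the RNA transcript: same sequence as the coding
    (non-template) strand, with uracil (U) replacing thymine (T)."""
    return coding_strand.upper().replace("T", "U")


coding = "GTGCACCTGACTCCTGAG"
print(complement_strand(coding))  # CACGTGGACTGAGGACTC
print(transcribe(coding))         # GUGCACCUGACUCCUGAG
```

The second printed string is the messenger RNA that appears on the translation slides.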
RNA Secondary Structure
CCGUGAACGUGUACCGGAUUUUUAUUCCC...

Protein Synthesis: Translation
translation
RNA
Protein Translation
[Figure, several frames: a ribosome on the messenger RNA ...GUGCACCUGACUCCUGAG..., transfer RNA (CUC shown), and the growing amino acid sequence V H L T P E. Other labels: secondary structure, tertiary structure, tertiary protein structure.]
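The codon-by-codon reading shown in the translation slides can be sketched as a table lookup. The code below is an illustration, not the lecture's material: it assumes the standard genetic code, reads from the first base instead of searching for a start codon, and uses made-up names (`CODON_TABLE`, `translate`).

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons enumerated
# in UCAG order at each of the three positions ("*" marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string three bases at a time, starting at
    position 0 and stopping at the first stop codon (assumption: the
    reading frame is already known)."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GUGCACCUGACUCCUGAG"))  # VHLTPE
```

The result matches the amino acid sequence V H L T P E on the slides.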
Gene Regulation
If I have the same set of genes in every cell, how come my liver cells look so different from my skin cells?
Only a small number of genes are being translated into protein at any one time.

…aggaggcctcgcctctcccagcatgggctggggctcctgtcccccactgtgtgtgcctggggcctggccaggactcccagtga…
[Figure label: protein coding sequence]
A gene is a location on a chromosome that encodes a protein
Translating Genes into Proteins: Bacteria
[Figure: DNA (promoter, gene) → transcription → messenger RNA → translation → amino acid sequence]

Gene Regulation Controls When a Gene Is Transcribed
[Figure: DNA with promoter and gene; RNA polymerase]
Gene Regulation Controls When a Gene Is Transcribed
[Figure, two frames: DNA with promoter and gene; RNA polymerase, repressor, tryptophan]
Gene Regulation Controls When a Gene Is Transcribed
[Figure: DNA with promoter and gene; RNA polymerase, repressor, tryptophan]

Translating Genes into Proteins: Multicellular organisms
[Figure: DNA (exon1, exon2, exon3, exon4, separated by introns) → transcription → RNA transcript → RNA splicing → mRNA → translation → amino acid sequence]
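RNA splicing, as drawn in the slide above, keeps the exons and discards the introns between them. A minimal sketch, assuming exon positions are already known; the transcript and coordinates below are hypothetical and chosen only for illustration.

```python
def splice(rna_transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon intervals (0-based, end-exclusive) in order and
    drop everything between them (the introns)."""
    return "".join(rna_transcript[start:end] for start, end in exons)


# Hypothetical transcript: uppercase = exon, lowercase = intron.
rna_transcript = "AUGGCCguaagucuagGAUUACguccagUGA"
exons = [(0, 6), (16, 22), (28, 31)]

print(splice(rna_transcript, exons))  # AUGGCCGAUUACUGA
```

Passing a different subset of exon intervals to the same function produces a different mRNA from the same transcript.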
Alternative splicing
DNA: exon1 exon2 exon3 exon4 exon5 exon6
mRNA: exon1 exon2 exon3 exon4 exon5 exon6
Alternate splice forms:
female: exon1 exon2 exon3 exon4
male: exon1 exon2 exon3 exon5 exon6

Gene Regulation
In single cell organisms, gene regulation orchestrates
– Responses to changing environment
– Cell cycle
In multicellular organisms, gene regulation orchestrates
– Tissue type differentiation
– Development from embryo to adulthood
Outline
A whirlwind review of molecular biology
An overview of computational molecular biology
– Sequence comparison
– Reconstruct evolutionary history
– Gene prediction
– Predict structure from sequence
• RNA
• Proteins
New problems in genomics

The Origins of Computational Biology
[Timeline, in the order shown on the slide]
ARPANET
1970
Sanger-Coulson sequencing
First royal email
Maxam-Gilbert sequencing
USENET newsgroups
1980: Gilbert, Sanger win Nobel Prize
TCP/IP
Congress establishes GenBank
Internet
1990
Human Genome Project begins
World Wide Web, Gopher
NCSA Mosaic
GenBank goes online. Pizza Hut goes online.
Growth of sequence data during the ’90s
[Figure: growth of sequence data]
Collins et al, Science, Oct 1998
National Center for Biotechnology Information
Why sequence data is so powerful: Sequences are related!
[Figure: tree of related sequences]
early mammal: …atgccaggactcccagtga…
intermediate sequences: …atgcaaggagtcccagagc… …atgcgaggtctcccagtgt…
human, rat, mouse: …atgcaaggagtcgcagagc… …atgcgaggtctcgtagtgt… …atgggaggtctcccagtgt…

Sequence Comparison
Alignment types: global pairwise alignment, local pairwise alignment, global multiple alignment, local multiple alignment

[Figure: example DNA alignments]
…atgcaaggagtcccagagcctgagctgactacgt…
…atgcaag_cgtcccagtgccagaactccctacgt…
…atgcgaggtctcccagtgtctgaactgactaagt…
…acc_gtggtctccgagtggctgaactgac_aaca…

[Figure: example protein alignments]
mkwvtfisll v..frrda.h ksevahrfkd lgeenfkalv…
mkwvtfisll v..frrea.h kseiahrfnd vgeehfiglv…
~~wvtfisll v..frrdt.y kseiahrfkd lgeqyfkglv…
mkwvtlisfi lqrfardaeh kseiahrynd lkeetfkava…

…mkwvtfisll flfssaysrg v..frrda.h ksevahrf
…mkwvtfisll flfssaysrg v..frrea.h kseiahrf
…~~wvtfisll flfssaysrg v..frrdt.y kseiahrf
…mkwvtlisfi flfssatsrn lqrfardaeh kseiahr

Applications
• Database searching
• RNA structure
• Evolutionary tree reconstruction
• Gene finding
• Sequence assembly….
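The slides name global and local pairwise alignment without giving an algorithm. As a hedged illustration, the sketch below scores both with the standard dynamic-programming recurrences (Needleman-Wunsch style for global, Smith-Waterman style for local). The scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions, not taken from the lecture.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best score for aligning all of a against all of b."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_alignment_score(a: str, b: str,
                          match: int = 1, mismatch: int = -1,
                          gap: int = -1) -> int:
    """Best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(global_alignment_score("ACGT", "AGT"))             # 2
print(local_alignment_score("TTTACGTAAA", "GGGACGTCCC"))  # 4
```

The global score must account for every position in both sequences, while the local score rewards only the best-matching region (here the shared ACGT).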
Sequence similarity => functional similarity
BLAST
[Figure: two blocks of protein sequence]
…LWDPTFQVEFNQLG…
…TMFPTFEMIVTKAG…
…RRMFPTFQVPTFQV…
…KLFMFPTFGEMDPM…
…ADYMMCFPTFLLMD…
…FVPVDDKPTSFQVR…

…LWDEFNQLGTE
MIVTKAGRRMFP
TFQVKLFGMDPM
ADYMLLMDFVPV
DDKRYRYAFHS…
Reconstructing Evolutionary History
[Figure: evolutionary tree]
early mammal: …atgccaggactcccagtga…
intermediate sequences: …atgcaaggagtcccagagc… …atgcgaggtctcccagtgt…
gorilla, human, chimpanzee: …atgcaaggagtcgcagagc… …atgcgaggtctcgtagtgt… …atgggaggtctcccagtgt…
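One simple starting point for reconstructing a tree from sequences like these is a table of pairwise distances. The sketch below counts mismatches (Hamming distance) between the three present-day fragments shown on the slide. It is an illustration only: the labels `seq_a`, `seq_b`, `seq_c` are placeholders (the slide's species-to-sequence mapping is not assumed), and counting mismatches is a stand-in for whatever distance measure the course uses.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# The three present-day fragments from the slide, with placeholder labels.
sequences = {
    "seq_a": "atgcaaggagtcgcagagc",
    "seq_b": "atgcgaggtctcgtagtgt",
    "seq_c": "atgggaggtctcccagtgt",
}

distances = {
    (p, q): hamming(sequences[p], sequences[q])
    for p, q in combinations(sequences, 2)
}
for pair, d in distances.items():
    print(pair, d)

# The least-different pair would be joined first by a distance-based method.
closest = min(distances, key=distances.get)
print("closest pair:", closest)
```

For these fragments the distances are 6, 7 and 3, so `seq_b` and `seq_c` form the closest pair.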
Gene Recognition Problem
human
…aggaggcctataacgcctctcccagcatgggctggggctcctgtcccccactgtggcctggtactggccaggactcccagtga…
?

Gene Recognition Problem
human
…aggaggactataacgcctctcccagcatgggctggggctcctgtcccccactgtggcctggtactgcgccaggactcgtagtga…
mouse
…cgacgccatataaattagtaatgtactatgggctggggcgtgatacgtacactgtggcctggtagctatgcagcacgtgtctagtga…
Gene Recognition Problem
human
…aggaggactataacgcctctcccagcatgggctggggctcctgtcccccactgtggcctggtactgcgccaggactcgtagtga…
mouse
…cgacgccatataaattagtaatgtactatgggctggggcgtgatacgtacactgtggcctggtagctatgcagcacgtgtctagtga…
RNA Secondary Structure
CCGUGUACGUGUAACGGAUCUUUAUUCCC...
CCGUGAACGUAUACUGGAGUUUUAGUCCG...
GCGUCACCGUGUUCCGGAUUAUGAUUCCC
CCGAGAACGUCCACGGGAUUUCUGUUCCA
RNA Secondary Structure
CCGUGUACGUGUAACGGAUCUUUAUUCCC...
CCGUGAACGUAUACUGGAGUUUUAGUCCG...
GCGUCACCGUGUUCCGGAUUAUGAUUCCC
CCGAGAACGUCCACGGGAUUUCUGUUCCA
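The slides pose RNA secondary structure prediction without naming a method. One classic simplification, sketched here as an illustration and not as the lecture's approach, is to maximize the number of nested base pairs with dynamic programming (Nussinov-style). The allowed pairs (including G-U wobble) and the minimum hairpin loop of 3 unpaired bases are assumptions.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested (non-crossing) base pairs in rna, with
    at least min_loop unpaired bases enclosed by any pair."""
    n = len(rna)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the substring rna[i..j] inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some base k to its left
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))  # 3
```

Counting pairs ignores the energetics of real folding; it only shows the shape of the recurrence.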
Structure Determination is Hard
[Figure labels: Nuclear Magnetic Resonance, X-ray Diffraction]

Predicting Protein Structure
• Tertiary structure:
– Detailed physical models
– Find the configuration of minimum energy
• Secondary structure:
– e.g., Which amino acids participate in a helix?
• Secondary structure motifs:
– e.g., Is this a coiled-coil protein?
• Threading:
– Estimate protein structure from a related protein with known structure and similar sequence.
Predicting Protein Structure
Complete physical model of tertiary structure
Predicting Protein Structure
Complete physical model of tertiary structure
Too hard!
Secondary structure: Does this amino acid participate in an alpha-helix?

Protein Secondary Structure
[Figure: alpha helix, beta sheet]
Is this a coiled-coil protein?
Threading
Structure?
…vkltpegtr_wgghpldekflske…
…vhltpettrgwgghmldekeiske…
Estimate protein structure from a related protein with known structure and similar sequence.
Structural Genomics
• A world-wide consortium to determine novel protein structures.
• Which proteins are likely to be novel?
[Figure: map of course topics]
Pairwise sequence alignment (global and local)
Multiple sequence alignment
global, local
Substitution matrices
Database searching
BLAST
Sequence statistics
Gene Finding
Evolutionary tree reconstruction
Protein structure prediction
RNA structure prediction