Introduction to Bioinformatics
Dr. rer. nat. Jing Gong
Cancer Research Center, School of Medicine, Shandong University
2011.9.14
Chapter 1 Introduction
About me
• Dr. rer. nat. Jing Gong
• Bachelor Degree in Marine Biology at the China Ocean University (former Qingdao Ocean University)
• Bachelor, Master & Doctoral Degree in Bioinformatics at the Ludwig Maximilians Universität München, Germany
• Affiliation: Cancer Research Center of SDU
• Tel: 0531-88380202
• Email: [email protected]
• Office: Dianjing Building, Rm.106, Baotuquan Campus
About this course
• Schedule: 2011/9/14 - 2011/10/12, Wed. 14:00 - 18:00
• Location: 8#, first floor, west, Computer Pool
• Homepage: http://1.51.212.243/bioinfo.html
• Table of Contents
  Chapter 1 : Introduction
  Chapter 2 : Databases
  Chapter 3 : Alignment
  Chapter 4 : Structure
  Chapter 5 : Tree
Literature:
1. Bioinformatics - An Introduction, 2nd Edition, Jeremy Ramsden, 2009, Springer.
2. Bioinformatics For Dummies, 2nd Edition, Jean-Michel Claverie, Cedric Notredame, 2007, Wiley.
Information Page (Chapter 1, 2011/9/14)
Dr. rer. nat. Jing Gong
Affiliation: Cancer Research Center of SDU
Tel: 0531-88380202
Email: [email protected]
Office: Dianjing Building, Rm.106, Baotuquan Campus
Schedule: 2011/9/14 - 2011/10/12, Wed. 14:00 - 18:00
Place: 8#, first floor, west, Computer Pool
Course Homepage: http://1.51.212.243/bioinfo.html
Pubmed: http://www.ncbi.nlm.nih.gov/entrez/
ExPASy: http://expasy.org/
NCBI: http://www.ncbi.nlm.nih.gov/
PIR: http://pir.georgetown.edu
Vocabulary List (Chapter 1, 2011/9/14)
FASTA: FASTA (pronounced FAST-Aye) stands for FAST-ALL, reflecting the fact that it can be used for a fast protein comparison or a fast nucleotide comparison. This program ……
BLAST: Basic Local Alignment Search Tool. A sequence comparison algorithm optimized for speed, used to search sequence databases for optimal local alignments. It uses a substitution matrix and the query sequence ……
Alignment: The result of a comparison of two or more gene or protein sequences in order to determine their degree of base or amino acid similarity. Sequence alignment is used to determine ……
What is Bioinformatics?
biophysics biohazards biometrics biomathematics
biochemistry bioterrorism
biopotato bioinformatics
What is Bioinformatics? Interdisciplinary
• a biology/medical researcher, just like you
• a professional in the pharmaceutical industry
• a policeman worrying about DNA testing
• a computer scientist developing bio-databases
• a consumer concerned about GMOs (Genetically Modified Organisms)
• ……
What is Bioinformatics? Definition:
• Bioinformatics – the science of collecting and analyzing complex biological data such as genetic codes. [Oxford Dictionary]
• Bioinformatics – the computational branch of molecular biology. [Bioinformatics for Dummies]
• Bioinformatics – the application of computer science and information technology to the field of biology and medicine. [Wikipedia]
• Bioinformatics – the science of how information is generated, transmitted, received, and interpreted in biological systems, i.e. the application of information science to biology. [Bioinformatics-An Introduction]
A formal definition?
History of Bioinformatics
In 1809, French biologist Jean Baptiste Lamarck published “Philosophie Zoologique”. Lamarck stressed two main themes in his biological work:
1. The environment gives rise to changes in animals, i.e. changes through use and disuse.
2. Life is structured in an orderly manner, and the many different parts of all bodies make the organic movements of animals possible.
Examples: “blind as a mole”, “show your teeth”, “birds have no teeth?”
Jean Baptiste Lamarck (1744-1829)
History of Bioinformatics
In 1859, English naturalist Charles Darwin published “On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life”.
Charles Darwin (1809-1882)
History of Bioinformatics
In 1866, Austrian scientist Gregor Mendel demonstrated that the inheritance of certain traits in pea plants follows particular patterns, now referred to as the laws of “Mendelian Inheritance”.
Gregor J. Mendel (1822-1884)
History of Bioinformatics
In 1869, Swiss physician and biologist Friedrich Miescher isolated DNA from white blood cells at Felix Hoppe-Seyler's laboratory at the University of Tübingen, Germany.
Nuclei → Nuclein → Nucleic acid → DNA
Friedrich Miescher (1844-1895)
History of Bioinformatics
Thomas Hunt Morgan was an American geneticist famous for his experimental research with the fruit fly, by which he established the chromosome theory of heredity. He showed that genes are linked in a series on chromosomes and are responsible for identifiable, hereditary traits. Morgan’s work played a key role in establishing the field of genetics. He received the Nobel Prize in Physiology or Medicine in 1933.
Thomas H. Morgan (1866-1945), Nobel Prize 1933
History of Bioinformatics
In 1944, American physician and medical researcher Oswald Avery and his co-workers Colin MacLeod and Maclyn McCarty demonstrated that DNA is the material of which genes and chromosomes are made.
In his experiment he destroyed the lipids, ribonucleic acids, carbohydrates, and proteins. Transformation still occurred after this. Next he destroyed the deoxyribonucleic acid. Transformation did not occur.
Oswald Avery (1877-1955)
Colin MacLeod (1909-1972)
Maclyn McCarty (1911-2005)
History of Bioinformatics
In 1950, American biochemist Erwin Chargaff noticed a pattern in the amounts of the four bases: adenine (A) , thymine (T) , cytosine (C) , guanine (G). He discovered that the amounts of adenine (A) and thymine (T) in DNA were roughly the same, as were the amounts of cytosine (C) and guanine (G). This later became known as Chargaff's rule.
%A = %T and %G = %C
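Chargaff's rule can be checked numerically. The sketch below is a minimal illustration, not part of the original slides: it builds a double-stranded molecule from one strand plus its complementary strand (an assumption of ideal A-T and G-C pairing) and reports the percentage of each base. The function name and the example sequence are chosen only for illustration.

```python
from collections import Counter

# Complement of each base under ideal Watson-Crick pairing (assumption).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def duplex_base_percentages(strand: str) -> dict:
    """Percentage of A, C, G, T in the duplex formed by `strand`
    and its complementary strand."""
    strand = strand.upper()
    partner = strand.translate(COMPLEMENT)  # base-by-base complement
    counts = Counter(strand + partner)
    total = sum(counts[base] for base in "ACGT")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


pct = duplex_base_percentages("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG")
for base in "ACGT":
    print(base, round(pct[base], 1))
```

In the duplex, %A equals %T and %G equals %C by construction; on a single strand alone the two percentages generally differ.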
Erwin Chargaff (1905-2002)
History of Bioinformatics
In 1953, James D. Watson and Francis Crick proposed the first correct double-helix model of DNA structure in the journal Nature. Their double-helix model drew heavily on X-ray diffraction data from Rosalind Franklin and Maurice Wilkins, including an image taken in 1952.
James Watson (1928-), Nobel Prize 1962
Francis Crick (1916-2004), Nobel Prize 1962
Maurice Wilkins (1916-2004), Nobel Prize 1962
Rosalind Franklin (1920-1958)
History of Bioinformatics
The sequence of 77 nucleotides of a yeast alanine tRNA was determined by the American biochemist Robert W. Holley in 1965. Holley was awarded the 1968 Nobel Prize in Physiology or Medicine for describing the structure of this tRNA, linking DNA and protein synthesis.
Robert W. Holley (1922-1993), Nobel Prize 1968
History of Bioinformatics
In 1977, Frederick Sanger and colleagues introduced the “dideoxy” chain-termination method for sequencing DNA molecules, also known as the “Sanger method”. For this work, he shared the 1980 Nobel Prize in Chemistry with Walter Gilbert.
The key principle of the Sanger method was the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators.
Frederick Sanger (1918-), Nobel Prize 1980
Walter Gilbert (1932-), Nobel Prize 1980
History of Bioinformatics
Read protein sequence directly in the DNA sequence!
The central dogma of molecular biology was first articulated by Francis Crick in 1958 and re-stated in a Nature paper published in 1970.
Francis Crick (1916-2004)
History of Bioinformatics
Marshall Warren Nirenberg shared a Nobel Prize in Physiology or Medicine in 1968 with Har Gobind Khorana and Robert W. Holley for "breaking the genetic code" and describing how it operates in protein synthesis.
Marshall Warren Nirenberg (1927-2010), Nobel Prize 1968
Har Gobind Khorana (1922-), Nobel Prize 1968
Robert W. Holley (1922-1993), Nobel Prize 1968
English Courses for Graduate Students
History of Bioinformatics
Amino acids are the building blocks of protein. Protein is a nutrient needed by the human body for growth and maintenance.
Amino acids are made of carbon, hydrogen, oxygen, nitrogen, and sulfur atoms.
A protein = C1200H2400O600N300S100
History of Bioinformatics
insulin = (30 glycines + 44 alanines + 5 tyrosines + 14 glutamines + . . .)
A given type of protein always contains the same number of total amino acids in the same proportion.
Amino acids are linked together as a chain.
The first amino acid sequence of a protein, insulin, was determined in 1951 by Dr. Sanger.
Frederick Sanger (1918-), Nobel Prize 1958
insulin = MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
#    1-letter  3-letter  Name
1    A         Ala       Alanine
2    R         Arg       Arginine
3    N         Asn       Asparagine
4    D         Asp       Aspartic acid
5    C         Cys       Cysteine
6    Q         Gln       Glutamine
7    E         Glu       Glutamic acid
8    G         Gly       Glycine
9    H         His       Histidine
10   I         Ile       Isoleucine
11   L         Leu       Leucine
12   K         Lys       Lysine
13   M         Met       Methionine
14   F         Phe       Phenylalanine
15   P         Pro       Proline
16   S         Ser       Serine
17   T         Thr       Threonine
18   W         Trp       Tryptophan
19   Y         Tyr       Tyrosine
20   V         Val       Valine
What Bioinformatics Can Do for You?
✓ Analyzing Protein Sequences
Protein Sequence: MAVLD
The first 3D structure of a protein was determined in 1958 by Drs. Kendrew and Perutz, using the complicated technique of X-ray crystallography.
Max Ferdinand Perutz (1914-2002), Nobel Prize 1962
John Cowdery Kendrew (1917-1997), Nobel Prize 1962
History of Bioinformatics
• In 1956, the Symposium on Information Theory in Biology was held (Gatlinburg, USA).
• In 1979, GenBank was established at Los Alamos National Laboratory (USA).
• In 1982, the nucleotide sequence database of the European Molecular Biology Laboratory (EMBL) was created (Europe).
• In 1986, the DNA Data Bank of Japan (DDBJ) began data bank activities at NIG (Japan).
• In the early 1990s, the International Nucleotide Sequence Database Collaboration (INSDC) was founded as a cooperation of GenBank/EMBL/DDBJ.
• In 1987, the Chinese-American scientist LIN Hua-an coined the word “bioinformatics”. He first used “compbio”, then “bioinformatique”, and then “bio-informatics”. At that time, email titles did not support the hyphen symbol, and thus “bioinformatics” was born.
• Since at least the late 1980s, the term “bioinformatics” has been primarily used in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing.
History of Bioinformatics
History of Bioinformatics
Publicly funded project: James D. Watson & Francis Collins; began 1990, $3 billion; 90% in 2000, 99% in 2001, finished 2003; freely available
Privately funded project: Craig Venter; began 1998, $300 million; 90% in 2000, 99% in 2001, finished 2003; patented
President Clinton (2000)
History of Bioinformatics
[Figure: map with Shenzhen, Shanghai, Beijing; Illumina HiSeq 2000 × 137; AB SOLiD™ 4.0 System × 27]
What Bioinformatics Can Do for You?
✓ Analyzing DNAs
✓ Analyzing RNAs
✓ Analyzing Proteins
✓ Others: Pathway, Bioimaging, etc.
✓ Analyzing DNAs
1. Read the DNA sequence: ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG
2. Decompose it into successive triplets: ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G . . .
3. Translate each triplet into the corresponding amino acid: M E V F K A P P I G I STOP
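The three steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than the course's own tool: it assumes the standard genetic code, written in the compact 64-letter form with bases ordered T, C, A, G, and the function name `translate` is chosen here for convenience.

```python
from itertools import product

# Standard genetic code (assumption): codons enumerated with bases in the
# order T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Read `dna` triplet by triplet from its first base and return the
    one-letter protein sequence up to (not including) the first stop."""
    dna = dna.upper().replace(" ", "")
    protein = []
    # Step 2: successive triplets; a trailing 1-2 base remainder is ignored.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # Step 3: look up the triplet
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"))  # MEVFKAPPIGI
```

The lookup raises a `KeyError` for any character other than A, C, G, or T; real sequence data with ambiguity codes would need extra handling.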
✓ Analyzing DNAs
[Figure: DNA (ATGGAAGTATTTAA……), Protein (M E V F K A P …), Database]
✓ Analyzing RNAs
In the context of bioinformatics, there are only two important differences between RNA and DNA:
✓ RNA differs from DNA by one nucleotide: uracil (U) takes the place of thymine (T).
✓ RNA comes as a single strand.
✓ Analyzing RNAs
Even though RNA molecules consist of single strands of nucleotides, their natural urge for pairing with complementary sequences is still there.
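The pairing tendency can be illustrated with a small sketch. It assumes only the canonical A-U and G-C pairs (G-U wobble pairs are ignored), and the toy sequence is hypothetical: the code rewrites a DNA sequence in the RNA alphabet and counts how many bases at one end of the strand could pair with the other end to form a stem.

```python
# Canonical RNA base pairs only (assumption: no G-U wobble pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def transcribe(dna: str) -> str:
    """Write a DNA sequence in the RNA alphabet (T becomes U)."""
    return dna.upper().replace("T", "U")


def stem_length(rna: str) -> int:
    """Count consecutive bases at the 5' end that can pair with bases at
    the 3' end, reading inwards; the two halves never overlap."""
    n = 0
    while 2 * (n + 1) <= len(rna) and (rna[n], rna[-1 - n]) in PAIRS:
        n += 1
    return n


rna = transcribe("GGGCATTCGCCC")  # hypothetical toy sequence
print(rna, stem_length(rna))      # GGGCAUUCGCCC 4
```

Here the first four bases (GGGC) pair with the last four (GCCC read backwards), leaving AUUC unpaired in the middle. Real secondary-structure prediction uses energy models and is far more involved than this end-to-end check.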
All transfer RNAs (tRNAs) assemble themselves into a shape like a cloverleaf. Hairpin shapes are the basic elements of RNA secondary structure; they’re made up of loops (the unpaired C-U) and stems (the paired regions).
✓ Analyzing Proteins
The first 3D structure of a protein was determined in 1958 using X-ray crystallography.
Protein Structure Determination:
Experimental Methods
• X-ray Crystallography
• Nuclear Magnetic Resonance (NMR)
Computational Methods
• De novo method, Homology Modeling, Threading, and ensemble method.
✓ Analyzing Proteins
[Figure: Sequence, Structure, Function; visualization tools: VMD, Maestro, PyMOL]
✓ Analyzing Protein Sequences
Drug Design:
• Virtual Screening
• Docking
Virtual screening involves the rapid in silico assessment of large libraries of chemical structures in order to identify those structures which are most likely to bind to a drug target, typically a protein receptor or enzyme.
✓ Analyzing Protein Sequences
Molecular dynamics (MD) is a computer simulation of physical movements of atoms and molecules.
Supercomputer: 500-aa protein, 1 ns (10^-9 s), 120 cores: 5 hours
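To get a feel for these numbers, the short calculation below extrapolates the quoted benchmark (1 ns in 5 hours on 120 cores) to longer simulations. It is a back-of-the-envelope sketch that assumes cost grows linearly with simulated time on the same machine; the 100 ns target is a hypothetical example, not a figure from the lecture.

```python
def md_cost(ns_target: float, ns_ref: float = 1.0,
            hours_ref: float = 5.0, cores: int = 120):
    """Wall-clock hours and core-hours for `ns_target` nanoseconds,
    scaled linearly from the reference run (assumption)."""
    hours = hours_ref * ns_target / ns_ref
    return hours, hours * cores


ns_per_day = 24.0 / 5.0 * 1.0          # throughput of the reference run
hours, core_hours = md_cost(100.0)     # hypothetical 100 ns simulation
print(ns_per_day, hours, core_hours)   # 4.8 500.0 60000.0
```

At this rate a 100 ns trajectory would occupy the 120 cores for roughly three weeks, which is why such simulations are run on supercomputers.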
✓ Analyzing Protein Sequences
Bavaria Supercomputing Centre
• Linux Cluster: 2007, 753 nodes, 5646 cores, 43 TFlop/s
• HLRB II: 2007, 9728 cores, 62 TFlop/s
• SuperMUC: 2012, 140000 cores, 3 PFlop/s
Tianhe-1 (天河一号): 2.5 PFlop/s, No.1 in the world
✓ Others: Pathway, Bioimaging, etc.
[Figure: statistic graph, CT, magnetic resonance]
How Most People Use Bioinformatics?
✓ Retrieving Protein Sequences
✓ Becoming an Instant Expert with PubMed
✓ Retrieve a 3D protein structure
✓ Retrieving DNA Sequences
✓ Making a Multiple Protein Sequence Alignment with ClustalW
✓ Using BLAST to Compare Your Protein Sequence
✓ Becoming an Instant Expert with PubMed
[Cartoon: a specialist in bioinformatics examines a gene sequence. “Great! It’s dUTPase.” “But, what’s dUTPase?”]
✓ Becoming an Instant Expert with PubMed
http://www.ncbi.nlm.nih.gov/entrez/
[Screenshots: PubMed searches for “dUTPase”, by author name, and by author name + topic]
Internal structure of a database record: The information is spread out over separate sections, called fields.
Fields: PubMed ID, Publication Date, Title, Page, Authors, Laboratory address, Abstract
✓ Becoming an Instant Expert with PubMed
• Search “Down” in field “Author [AU]”
• Search “Down” in field “Title [TI]”
• Search “Down” in field “Laboratory address [AD]”
• Search “Down” everywhere
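The effect of restricting a search to one field can be sketched with a toy in-memory "database". Everything here is hypothetical (the `Record` class, the three records, and the simple substring match); it does not reflect how PubMed is implemented, only why the same word returns different hits in different fields.

```python
from dataclasses import dataclass, asdict


@dataclass
class Record:
    """A hypothetical database record split into named fields."""
    pmid: str
    title: str
    authors: str
    address: str
    abstract: str


def search(records, term, field=None):
    """Return the IDs of records containing `term` in the given field,
    or in any field when `field` is None (case-insensitive substring)."""
    term = term.lower()
    hits = []
    for record in records:
        fields = asdict(record)
        values = [fields[field]] if field else fields.values()
        if any(term in value.lower() for value in values):
            hits.append(record.pmid)
    return hits


# Invented records for illustration only.
records = [
    Record("1", "Down syndrome and ageing", "Smith J", "Paris, France",
           "A study of trisomy 21."),
    Record("2", "Protein folding kinetics", "Down T", "London, UK",
           "Folding rates of small proteins."),
    Record("3", "Gene regulation in yeast", "Lee K",
           "Down Street, Cambridge, UK", "Promoter analysis."),
]

print(search(records, "Down", "authors"))  # ['2']
print(search(records, "Down", "title"))    # ['1']
print(search(records, "Down", "address"))  # ['3']
print(search(records, "Down"))             # ['1', '2', '3']
```

Searching everywhere mixes the disease, the author, and the street; the field restriction is what separates them.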
✓ Becoming an Instant Expert with PubMed
Using fields to find experts near you:
Search term: Beijing
Tel: 86-10-6275-5002, Fax: 86-10-6276-2292, New Life Science Building, Peking University, Summer Palace Road No. 5, Beijing, P. R. China 100871
✓ Becoming an Instant Expert with PubMed
http://www.ncbi.nlm.nih.gov/entrez/
Searching PubMed using limits (search term: dUTPase)
✓ Becoming an Instant Expert with PubMed
A few more tips about PubMed:
How to get the most out of your query:
• quoted queries (for example, “down syndrome”)
• logical connectors: AND, OR, NOT (for example, dUTPase[TI] OR pyrophosphatase[TI] NOT Smith[AU])
• initials to proper names (for example, “Abergel C”)
• PubMed Identifier (the number in the PMID field)
• deselection of the Limit box when starting a new search
• Related Articles link
What PubMed does not provide:
• Names ranking beyond the 10th place in the author list for older papers (before 1995).
• Papers recorded before 1965.
• Abstracts for most references recorded before 1976.
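The query tips above can also be applied programmatically. The sketch below only assembles a query string and a search URL; it does not contact any server. It assumes the NCBI E-utilities `esearch.fcgi` endpoint with its `db`, `term`, and `retmax` parameters, and the helper name `tag` is invented for this example.

```python
from typing import Optional
from urllib.parse import urlencode, urlparse, parse_qs

# NCBI E-utilities search endpoint (assumed here; check the current docs).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tag(term: str, field: Optional[str] = None) -> str:
    """Quote multi-word terms and append a field tag such as [TI] or [AU]."""
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{field}]" if field else term


# The Boolean example from the slide: title terms combined, one author excluded.
query = (f"{tag('dUTPase', 'TI')} OR {tag('pyrophosphatase', 'TI')} "
         f"NOT {tag('Smith', 'AU')}")
url = ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "retmax": 20})

print(query)
print(url)
```

Keeping query construction separate from the network request makes it easy to inspect the exact string before sending it.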
✓ Retrieving Protein Sequences
http://expasy.org/
✓ acquire some preliminary information about a particular function that you’re interested in: dUTPase.
• find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).
✓ Retrieving Protein Sequences
http://expasy.org/
[Screenshots: ExPASy search for “dUTPase coli” (Prof. Amos Bairoch); result list; download formats: Tab, FASTA, Excel]
“Cross-references” point to data collections other than UniProtKB.
✓ Retrieving Protein Sequences
http://expasy.org/
“Sequences” provides you with the actual amino acid sequence of the protein.
Save this sequence (right click) on your Desktop as “P06968.fasta”.
What is FASTA? (Does it have anything to do with PASTA?)
FASTA is the name of a popular sequence alignment and database scanning program created by W.R. Pearson and D.J. Lipman in 1988. Its legacy is the FASTA format, which is now ubiquitous in bioinformatics.
The sequence in FASTA format:
>P06968 My_Sequence_Name
ARCGTCRGCKINTANDRGCKINTAND
CKINTANDARCGTCRGCKINTANDRG
CKINTAND
The line starting with > (the definition line) contains a unique identifier followed by an optional short definition. The lines that follow it contain the DNA or protein sequence (in one-letter code) until the next > symbol indicates the beginning of a new sequence.
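The format rules just described are simple enough to parse by hand. The following minimal reader is a sketch under those rules only (it is not the FASTA program itself, and libraries such as Biopython offer more robust parsers); the function name `parse_fasta` is chosen for this example.

```python
def parse_fasta(text: str) -> dict:
    """Map each identifier to a (definition, sequence) pair.

    A line starting with ">" opens a new record; all following lines,
    up to the next ">", are joined into that record's sequence.
    """
    records = {}
    identifier = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1) or [""]
            identifier = parts[0]
            definition = parts[1] if len(parts) > 1 else ""
            records[identifier] = (definition, "")
        elif identifier is not None:
            definition, sequence = records[identifier]
            records[identifier] = (definition, sequence + line)
    return records


example = """>P06968 My_Sequence_Name
ARCGTCRGCKINTANDRGCKINTAND
CKINTANDARCGTCRGCKINTANDRG
CKINTAND
"""
parsed = parse_fasta(example)
definition, sequence = parsed["P06968"]
print(definition, len(sequence))  # My_Sequence_Name 60
```

A file with several records simply yields several dictionary entries, one per definition line.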
✓ Retrieving DNA Sequences
✓ acquire some preliminary information about a particular function that you’re interested in: dUTPase.
✓ find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).
✓ retrieve DNA sequence relevant to dUTPase protein of E. coli.
✓ Retrieving DNA Sequences
http://expasy.org/
Search term: P06968
✓ Retrieving DNA Sequences
From UniProtKB: P06968, jump to the corresponding DNA entry:
1. Summary Section
2. Reference Section
3. Features Section: promoter elements, ribosome binding sites (RBS), protein coding segments (CDS), …… The ORF (CDS) feature gives the range of dUTPase and the ORF translation.
4. Sequence Section
✓ Using BLAST to Compare Sequence
✓ acquire some preliminary information about a particular function that you’re interested in: dUTPase.
✓ find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).
✓ retrieve DNA sequence relevant to dUTPase protein of E. coli.
• perform a BLAST search
✓ Using BLAST to Compare Sequence
What is BLAST?
BLAST (Basic Local Alignment Search Tool) – A sequence comparison algorithm optimized for speed, used to search sequence databases for optimal local alignments to a query.
• BLASTn – searches a DNA sequence against a DNA databank.
• BLASTp – compares a protein sequence against the protein database of your choice.
• BLASTx – translates a nucleic acid sequence in all six reading frames and compares all of these against the protein database of your choice.
• BLAST? – ……
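The "six reading frames" used by BLASTx are three offsets on the given strand plus three on its reverse complement. The sketch below illustrates that idea only; it is not BLAST's implementation. It assumes the standard genetic code and an input containing only A, C, G, and T, and the frame labels (+1 to +3, -1 to -3) follow a common convention.

```python
from itertools import product

# Standard genetic code (assumption), bases ordered T, C, A, G; "*" = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(dna: str, offset: int) -> str:
    """Translate every complete triplet starting at `offset`,
    keeping stop codons as '*'."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Translations of the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(dna, offset)
        frames[f"-{offset + 1}"] = translate_frame(reverse, offset)
    return frames


frames = six_frames("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG")
for name in sorted(frames):
    print(name, frames[name])
```

Only one of the six translations usually corresponds to a real protein; comparing all of them against a protein database lets the search find it without knowing the frame in advance.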
✓ Using BLAST to Compare Sequence
http://www.ncbi.nlm.nih.gov/
1. Open “P06968.fasta” on your Desktop (http://1.51.212.243/P06968.fasta) and paste the sequence into the query box.
2. Give the job a name.
E-value (from 0 upward; the smaller, the better): a value close to 1 or above is a warning that the conclusion you might draw from the alignments is NOT reliable.
Each hit offers one link to see the corresponding database entry and another to see the alignment between your query sequence and the matching sequence of the protein that corresponds to this score.
✓ Using BLAST to Compare Sequence
http://www.ncbi.nlm.nih.gov/
What is Alignment? Alignment is the result of a comparison of two or more gene or protein sequences in order to determine their degree of base or amino acid similarity.
Pairwise Alignment
Multiple Alignment
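A "degree of similarity" can be made concrete once two sequences have been aligned. The sketch below computes percent identity for two rows of an existing alignment, with "-" marking gaps. The two rows are hypothetical, and the choice of denominator (all alignment columns) is one of several conventions in use, so treat the number as illustrative.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of alignment columns in which both rows carry the
    same residue; gap columns count as mismatches."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Drop columns that are gaps in both rows (possible in rows taken
    # from a multiple alignment).
    columns = [(a, b) for a, b in zip(row_a, row_b)
               if not (a == "-" and b == "-")]
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical pairwise alignment: one gap and one substitution.
row1 = "MEVFKAPPIG"
row2 = "MEV-KAPAIG"
print(percent_identity(row1, row2))  # 80.0
```

Producing the alignment itself (deciding where the gaps go) is the harder problem, handled by tools such as BLAST for pairs and ClustalW for multiple sequences.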
✓ Making a Multiple Sequence Alignment
✓ acquire some preliminary information about a particular function that you’re interested in: dUTPase.
✓ find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).
✓ retrieve DNA sequence relevant to dUTPase protein of E. coli.
✓ perform a BLAST search
• perform a multiple alignment
✓ Making a Multiple Sequence Alignment
Multiple alignments are used to:
• Identify sequence positions where specific amino acids really matter for the structural integrity or the function of a given protein
• Define specific sequence signatures for protein families
• Classify sequences and build evolutionary trees
✓ Making a Multiple Sequence Alignment
http://pir.georgetown.edu
Get sequences under: http://1.51.212.243/multi.fasta
Select all, copy, then paste.
Symbols under the alignment: * identical, : similar, . related, (blank) different
Conserved region
✓ Retrieve a protein structure
✓ acquire some preliminary information about a particular function that you’re interested in: dUTPase.
✓ find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).
✓ retrieve DNA sequence relevant to dUTPase protein of E. coli.
✓ perform a BLAST search
✓ perform a multiple alignment
• retrieve a protein structure
✓ Retrieve a protein structure
dUTPase: DNA sequence, protein sequence, 3D structure
✓ Retrieve a protein structure
Using fields to find experts near you:
Search term: Beijing
Tel: 86-10-6275-5002, Fax: 86-10-6276-2292, New Life Science Building, Peking University, Summer Palace Road No. 5, Beijing, P. R. China 100871
Search term: Su XD dUTPase
✓ Retrieve a protein structure
Jmol mouse controls (pressing left button):
Action                    Mouse
Rotate View               Left Click and Drag
Zoom                      Shift + Left Click, drag mouse up or down / roll mouse middle button
Select/Deselect Residue   Left Click
Jmol Menu                 Right-Click
Backbone by chain