
Bioinformatics and Biocomputing

Byoung-Tak Zhang
Center for Bioinformation Technology (CBIT) & Biointelligence Laboratory
School of Computer Science and Engineering
Seoul National University

[email protected] http://bi.snu.ac.kr/ or http://cbit.snu.ac.kr/

Outline

• Bioinformation Technology (BIT)
• DNA Chip Data Mining: IT for BT
• DNA Computing: BT for IT
• DNA Computing with DNA Chips
• Outlook

Human Genome Project

Goals
• Identify the approximately 40,000 genes in human DNA
• Determine the sequences of the 3 billion bases that make up human DNA
• Store this information in databases
• Develop tools for data analysis
• Address the ethical, legal and social issues that arise from genome research

Genome health implications
• A new disease encyclopedia
• New genetic fingerprints
• New diagnostics
• New treatments

Bioinformation Technology: Bioinformatics vs. Biocomputing

[Fig.] Bioinformatics and biocomputing as the two links between IT and BT.

Bioinformatics

What is Bioinformatics?

Bio (molecular biology) + Informatics (computer science)

Bioinformatics – solving problems arising from biology using methodology from computer science.

• Bioinformatics vs. Computational Biology
• Bioinformatik (in German): biology-based computer science as well as bioinformatics (in the English sense)

Molecular Biology: Flow of Information

DNA → RNA → Protein → Function

DNA (Gene) → RNA → Protein

[Fig.] Structure and expression of a gene. Figure labels: control statement, TATA start, termination stop, control statement, gene, ribosome binding, transcription (RNA polymerase), 5' UTR, mRNA, 3' UTR, translation (ribosome), protein.

Nucleotide and Protein Sequence

DNA (Nucleotide) Sequence

    SQ sequence 1344 BP; 291 A; C; 401 G; 278 T; 0 other

Protein (Amino Acid) Sequence

    CG2B_MARGL  Length: 388  April 2, 1997 14:55  Type: P  Check: 9613  ..

      1  MLNGENVDSR IMGKVATRAS SKGVKSTLGT RGALENISNV
         ARNNLQAGAK KELVKAKRGM TKSKATSSLQ SVMGLNVEPM
         EKAKPQSPEP MDMSEINSAL EAFSQNLLEG VEDIDKNDFD
         NPQLCSEFVN DIYQYMRKLE REFKVRTDYM TIQEITERMR
         SILIDWLVQV HLRFHLLQET LFLTIQILDR YLEVQPVSKN
         KLQLVGVTSM LIAAKYEEMY PPEIGDFVYI TDNAYTKAQI
         RSMECNILRR LDFSLGKPLC IHFLRRNSKA GGVDGQKHTM
         AKYLMELTLP EYAFVPYDPS EIAAAALCLS SKILEPDMEW
         GTTLVHYSAY SEDHLMPIVQ KMALVLKNAP TAKFQAVRKK
         YSSAKFMNVS TISALTSSTV MDLADQMC

Some Facts

• 10^14 cells in the human body.
• 3 × 10^9 letters in the DNA in every cell in your body.
• DNA differs between humans by 0.2% (1 in 500 bases).
• Human DNA is 98% identical to that of chimpanzees.
• 97% of DNA in the human genome has no known function.

Topics in Bioinformatics

Sequence analysis
  - Sequence alignment
  - Structure and function prediction
  - Gene finding

Structure analysis
  - Protein structure comparison
  - Protein structure prediction
  - RNA structure modeling

Expression analysis
  - Gene expression analysis
  - Gene clustering

Pathway analysis
  - Metabolic pathway
  - Regulatory networks

Extension of Bioinformatics Concept

• Genomics
  - Functional genomics
  - Structural genomics
• Proteomics: large-scale analysis of the proteins of an organism
• Pharmacogenomics: developing new drugs that will target a particular disease
• Microarray: DNA chip, protein chip

Applications of Bioinformatics

• Drug design
• Identification of genetic risk factors
• Gene therapy
• Genetic modification of food crops and animals
• Biological warfare, crime etc.

• Personal Medicine?
• E-Doctor?

Bioinformatics as

[Fig.] Bioinformatics at the center of related IT areas. Figure labels: Database (GenBank, SWISS-PROT); Information Retrieval (biomedical text analysis); Hardware (supercomputing); Algorithm (sequence alignment); Agent (information filtering, monitoring agent); Machine Learning (clustering, rule discovery, pattern recognition).

Background of Bioinformatics

• Biological information infrastructure
  - Biological systems
  - Analysis software tools
  - Communication networks for biological research
• Massive biological databases
  - DNA/RNA sequences
  - Protein sequences
  - Genetic map linkage data
  - Biochemical reactions and pathways
• Need to integrate these resources to model biological reality and exploit the biological knowledge that is being gathered.

Areas and Workflow of Bioinformatics

[Fig.] Areas and workflow: DNA sequence data and microarrays (biochips) feed structural genomics, functional genomics, proteomics and pharmacogenomics, all resting on the infrastructure of bioinformatics.

DNA Chip Data Mining: IT for BT

cDNA Microarray

[Fig.] cDNA microarray workflow. Figure labels: cDNA clones (probes); PCR product amplification and purification; printing (0.1 nl/spot); mRNA target; hybridize target to microarray; excitation (laser 1, laser 2), emission and scanning; overlay images and normalize; analysis.

The Complete Microarray Bioinformatics Solution

• Databases
• Data Management
• Cluster Analysis
• Statistical Analysis
• Data Mining
• Image Processing
• Automation

DNA Chip Applications

• Gene discovery: gene/mutated gene
  - Growth, behavior, homeostasis …
• Disease diagnosis
  - Cancer classification
• Drug discovery: Pharmacogenomics
• Toxicological research: Toxicogenomics

Disease Diagnosis: Cancer Classification with DNA Microarray

- cDNA microarray data of 6567 gene expression levels [Khan ’01].

- Filter genes that are correlated with the cancer classification using principal component analysis (PCA) and artificial neural networks (ANNs).

- Hierarchical clustering of the DNA chip samples based on the filtered 96 genes.

- Disease diagnosis based on DNA chip.

[Fig.] Flowchart of the experimental procedure.

Disease Diagnosis: Hierarchical Clustering Based on Gene Expression Levels

- Hierarchical clustering of cancer by 96 gene expression levels.

- The relation between gene expression and cancer category.

- Four cancer diagnostic categories

[Fig.] The dendrogram of four cancer clusters and gene expression levels (row: genes, column: samples).

AI Methods for DNA Chip Data Analysis

• Classification and prediction
  - ANNs, support vector machines, etc.
  - Disease diagnosis
• Cluster analysis
  - Hierarchical clustering, probabilistic clustering, etc.
  - Functional genomics
• Genetic network analysis
  - Differential models, relevance networks, Bayesian networks, etc.
  - Functional genomics, drug design, etc.
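As an illustration of the hierarchical clustering mentioned above, the following is a minimal sketch of average-linkage agglomerative clustering with a correlation-based distance. The distance measure, the linkage rule and the toy expression values are assumptions chosen for illustration; the slides do not state which variant was used.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Distance between profiles is 1 - Pearson correlation (an assumption);
    distance between clusters is the mean pairwise profile distance.
    """
    def dist(i, j):
        return 1.0 - pearson(profiles[i], profiles[j])

    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist(i, j) for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: rows are samples, columns are genes.
samples = [
    [2.0, 1.8, 0.1, 0.2],
    [1.9, 2.1, 0.0, 0.3],
    [0.1, 0.2, 2.2, 1.9],
    [0.2, 0.0, 2.0, 2.1],
]
groups = average_linkage(samples, 2)
print(groups)
```

Recording the order of the merges, rather than only the final groups, would give the dendrogram shown in the figure.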


Cluster Analysis

[Gene Cluster 1]

[Gene Cluster 2]

[Gene Cluster 3]

[DNA microarray dataset]

[Gene Cluster 4]

Methods for Cluster Analysis

• Hierarchical clustering [Eisen ’98]
• Self-organizing maps [Tamayo ’99]
• Bayesian clustering [Barash ’01]
• Probabilistic clustering using latent variables [Shin ’00]
• Non-negative matrix factorization [Shin ’00]
• Generative topographic mapping [Shin ’00]

Clustering of Cell Cycle-regulated Genes in S. cerevisiae (the Yeast)

• Identify cell cycle-regulated genes by cluster analysis.
  - 104 genes are already known to be cell cycle-regulated.
  - Known genes are clustered into 6 clusters.
• Cluster 104 known genes and other genes together.
• The same cluster → similar functional categories.

[Fig.] 104 known gene expression levels according to the cell cycle (row: time step, column: gene).

Probabilistic Clustering Using Latent Variables

g_i: ith gene
z_k: kth cluster
t_j: jth time step
p(g_i | z_k): generating probability of the ith gene given the kth cluster
v_k = p(t | z_k): prototype of the kth cluster

Cluster membership of a gene:
$$p(g_i \in z_k) = p(z_k \mid g_i) = \frac{p(g_i \mid z_k)\, p(z_k)}{p(g_i)}$$

Similarity between an expression profile and a prototype:
$$\mathrm{similarity}(x_i, v_k) = \sum_j x_{ij} v_{kj}$$

Objective function (*), maximized by EM:
$$f(g, t, z) = \sum_i \sum_j g_{ij} \sum_k \log\bigl(p(z_k)\, p(g_i \mid z_k)\, p(t_j \mid z_k)\bigr)$$
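The EM procedure for a latent-variable model of this kind can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the standard mixture form p(g_i, t_j) = Σ_k p(z_k) p(g_i | z_k) p(t_j | z_k), treats the expression matrix entries g_ij as strictly positive weights, and runs on hypothetical toy data. The function name `em_latent_clustering` and all parameter names are illustrative.

```python
import math
import random


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def em_latent_clustering(G, K, iters=50, seed=0):
    """EM for p(g_i, t_j) = sum_k p(z_k) p(g_i|z_k) p(t_j|z_k).

    G[i][j] is the (positive) expression weight of gene i at time step j.
    Returns the posteriors p(z_k | g_i) and the log-likelihood per iteration.
    """
    rng = random.Random(seed)
    n_genes, n_steps = len(G), len(G[0])
    pz = [1.0 / K] * K
    pg = [normalize([rng.random() + 0.1 for _ in range(n_genes)]) for _ in range(K)]
    pt = [normalize([rng.random() + 0.1 for _ in range(n_steps)]) for _ in range(K)]
    history = []
    for _ in range(iters):
        new_z = [0.0] * K
        new_g = [[0.0] * n_genes for _ in range(K)]
        new_t = [[0.0] * n_steps for _ in range(K)]
        loglik = 0.0
        for i in range(n_genes):
            for j in range(n_steps):
                # E-step: responsibility of each cluster for the pair (i, j).
                joint = [pz[k] * pg[k][i] * pt[k][j] for k in range(K)]
                total = sum(joint)
                loglik += G[i][j] * math.log(total)
                for k in range(K):
                    w = G[i][j] * joint[k] / total
                    new_z[k] += w
                    new_g[k][i] += w
                    new_t[k][j] += w
        history.append(loglik)
        # M-step: re-estimate all distributions from the weighted counts.
        pz = normalize(new_z)
        pg = [normalize(row) for row in new_g]
        pt = [normalize(row) for row in new_t]
    # Posterior p(z_k | g_i) is proportional to p(g_i | z_k) p(z_k).
    posterior = [normalize([pg[k][i] * pz[k] for k in range(K)]) for i in range(n_genes)]
    return posterior, history


# Hypothetical toy matrix: genes 0-1 peak early, genes 2-3 peak late.
G = [[9, 8, 1, 1], [8, 9, 1, 2], [1, 1, 9, 8], [2, 1, 8, 9]]
posterior, history = em_latent_clustering(G, K=2)
```

Each gene can then be assigned to the cluster with the largest posterior, and the rows of p(t | z_k) play the role of the cluster prototypes v_k.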


Experimental Result: Identify Cell Cycle-Regulated Genes

• Clustering result

[Table] Clustering result with α-factor arrest data. Genes with a high probability of being cell cycle-regulated were found in 4 clusters.

Experimental Result: Prototype Expression Levels of Found Clusters

• The genes in the same cluster show similar expression patterns during the cell cycle.
• The genes with similar expression patterns are likely to have correlated functions.

[Fig.] Prototype expression levels of genes found to be cell cycle-regulated (4 clusters).

Clustering Using Non-negative Matrix Factorization (NMF)

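• NMF (non-negative matrix factorization): $G \approx WH$
• NMF as a latent variable model with hidden variables $h_1, \dots, h_r$ and visible variables $g_1, \dots, g_n$:
$$(G)_{i\mu} \approx (WH)_{i\mu} = \sum_{a=1}^{r} W_{ia} H_{a\mu}, \qquad \langle g \rangle = Wh$$
  - G: gene expression data matrix
  - W: basis matrix (prototypes)
  - H: encoding matrix (in low dimension)
  - Non-negativity constraints: $G_{i\mu}, W_{ia}, H_{a\mu} \ge 0$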
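A factorization G ≈ WH with non-negative factors can be computed, for example, with multiplicative updates for the squared error. The sketch below assumes that update rule (the slide does not say which algorithm was used) and a small hypothetical matrix; `nmf` and its parameters are illustrative names.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(G, W, H):
    WH = matmul(W, H)
    return sum((g - x) ** 2 for row_g, row_x in zip(G, WH) for g, x in zip(row_g, row_x))


def nmf(G, r, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix G into W (n x r) and H (r x m)."""
    rng = random.Random(seed)
    n, m = len(G), len(G[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    errors = [squared_error(G, W, H)]
    for _ in range(iters):
        # H <- H * (W^T G) / (W^T W H), element-wise.
        Wt = transpose(W)
        num, den = matmul(Wt, G), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (G H^T) / (W H H^T), element-wise.
        Ht = transpose(H)
        num, den = matmul(G, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
        errors.append(squared_error(G, W, H))
    return W, H, errors


# Hypothetical rank-2 expression matrix (rows: genes, columns: conditions).
G = [[1, 2, 3], [2, 4, 6], [3, 1, 0], [9, 3, 0]]
W, H, errors = nmf(G, r=2)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what lets the columns of W be read as prototype expression patterns.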

Experimental Result: Five Clusters Found by NMF

• 5 prototype expression levels during the cell cycle.

[Fig.] Five prototype expression levels (x-axis: time step in cell cycle; y-axis: expression level).

Clustering Using Generative Topographic Mapping (GTM)

• GTM: a nonlinear, parametric mapping y(x;W) from a latent space to a data space.

[Fig.] A grid in the latent space (x1, x2) is mapped by y(x;W) into the data space (t1, t2, t3). Figure labels: grid, generation, visualization.

Experimental Result: Clusters Found by GTM

• Three cell cycle-regulated clusters found by GTM

Cluster  Center                 No. of train data /   Correct no. / test data   Overall mean expression
                                no. in cluster        of known genes            levels (Cln/b)
S/G2                            5/                    1/2                       (.148 .184 -.367 -.044)
S        (0.111 -0.333)         5/5                   5/5 (100%)                (1.075 1.482 -.233 -.375)
M/G1     c1 (0.111 0.333)       13/7                  1/6                       (-.171 -.573 .091 .311)
         c2 (-0.111 -0.111)     /2                    0/6
         c3 (0.323 0.1)         /2                    0/6
G2/M     c1 (0.111 0.333)       10/5                  0/5                       (-.616 -1.01 1.832 1.596)
         c2 (0.111 0.111)       /3                    3/5 (80%)
G1       c1 (-0.111 0.333)      35/18                 10/16 (62%)               (.894 .907 -.766 -.479)
         c2 (-0.111 0.111)      /7                    0/16

Experimental Result: Comparison with other methods

• Comparison of prototype expression levels

Cluster    No. of selected   Mean expression          No. of selected     Mean expression
           genes (GTM)       levels by GTM            genes by Spellman   levels by Spellman
S/G2       92                (.13 -.06 -.1 .01)       121                 (.13 .05 -.16 .03)
S          25                (.84 .81 -.42 -.33)      71                  (.46 .47 -.43 -.18)
M/G1  c1   120               (.82 .65 -.65 -.38)      113                 (-.21 -.61 -.04 .07)
      c2   34                (-.04 -.37 -.01 -.11)
      c3   10                (.32 .29 -.3 .05)
G2/M  c1   33                (-.59 -.96 1.34 1.29)    195                 (-.32 -.62 .49 .54)
      c2   60                (.08 -.30 .51 .57)
G1    c1   122               (.92 .74 -.62 -.33)      300                 (.66 .49 -.55 -.33)
      c2   74                (.79 .82 -.48 -.34)
           (total = 570)                              (total = 800)

Genetic Network Analysis

- Discover the complex regulatory interaction among genes.

- Disease diagnosis, pharmacogenomics and toxicogenomics

- Boolean networks

- Differential equations

- Relevance networks [Butte ’97]

- Bayesian networks [Friedman ’00] [Hwang ’00]

[Fig.] Basin of attraction of 12-gene Boolean genetic network model [Somogyi ’96].
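The notion of a basin of attraction can be made concrete with a small Boolean network. The sketch below uses a hypothetical 3-gene network whose update rules are invented for illustration (they are not taken from the 12-gene model in the figure); it enumerates every state, follows it to its attractor, and groups states by the attractor they reach.

```python
from itertools import product


def step(state, rules):
    """Synchronously update every gene from the current state."""
    return tuple(int(rule(state)) for rule in rules)


def attractor_of(state, rules):
    """Follow the trajectory until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, rules)
    cycle = seen[seen.index(state):]
    start = cycle.index(min(cycle))  # rotate to a canonical starting state
    return tuple(cycle[start:] + cycle[:start])


def basins(rules, n_genes):
    """Map each attractor to the list of states that flow into it."""
    result = {}
    for state in product((0, 1), repeat=n_genes):
        result.setdefault(attractor_of(state, rules), []).append(state)
    return result


# Hypothetical rules: gene 0 copies gene 1, gene 1 copies gene 0,
# gene 2 is switched on only when genes 0 and 1 are both on.
rules = [
    lambda s: s[1],
    lambda s: s[0],
    lambda s: s[0] and s[1],
]
network_basins = basins(rules, 3)
for attractor, states in network_basins.items():
    print(attractor, "<-", states)
```

Exhaustive enumeration is only feasible for small networks, since the state space doubles with each added gene.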


Bayesian Networks

• Represent the joint probability distribution among random variables efficiently using the concept of conditional independence.
• An edge denotes a possible causal relationship between nodes.

[Fig.] Example Bayesian network over A, B, C, D and E with edges A → C, B → C, B → D and C → E.

• A, C and D are independent given B.
• C asserts dependency between A and B.
• A, B and E are independent given C.

By the chain rule:
$$P(A,B,C,D,E) = P(A)\,P(B \mid A)\,P(C \mid A,B)\,P(D \mid A,B,C)\,P(E \mid A,B,C,D)$$
By the example Bayes net:
$$P(A,B,C,D,E) = P(A)\,P(B)\,P(C \mid A,B)\,P(D \mid B)\,P(E \mid C)$$

Bayesian Networks Learning

• Dependence analysis [Margaritis ’00]
  - Mutual information and χ² test
• Score-based search
$$p(D, S) = p(S)\, p(D \mid S) = p(S) \cdot \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}$$
  - D: data, S: Bayesian network structure
  - NP-hard problem
  - Greedy search
  - Heuristics to find good massive network structures quickly (local to global search)
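The score used in score-based search can be evaluated directly from counts. The following is a minimal sketch of the logarithm of the Bayesian Dirichlet term p(D | S), assuming a uniform hyperparameter α_ijk = `alpha` for every node i, parent configuration j and value k, with α_ij equal to the sum over k, and omitting the structure prior p(S). The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import lgamma


def log_bd_family(data, child, parents, arity, alpha=1.0):
    """Log BD contribution of one node given its parent set.

    data: list of tuples, one value per variable; arity[v] is the number
    of values variable v can take (values are 0 .. arity[v] - 1).
    """
    r = arity[child]
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # Parent configurations that never occur contribute log(1) = 0.
    observed_configs = {config for config, _ in counts}
    score = 0.0
    for config in observed_configs:
        n_ijk = [counts[(config, k)] for k in range(r)]
        a_ij = alpha * r
        score += lgamma(a_ij) - lgamma(a_ij + sum(n_ijk))
        score += sum(lgamma(alpha + n) - lgamma(alpha) for n in n_ijk)
    return score


def log_bd(data, structure, arity, alpha=1.0):
    """Sum the family scores; structure maps each node to a tuple of parents."""
    return sum(log_bd_family(data, child, parents, arity, alpha)
               for child, parents in structure.items())


# Hypothetical data in which variable 1 always equals variable 0.
data = [(0, 0)] * 4 + [(1, 1)] * 4
arity = [2, 2]
score_dependent = log_bd(data, {0: (), 1: (0,)}, arity)
score_independent = log_bd(data, {0: (), 1: ()}, arity)
print(score_dependent, score_independent)
```

A greedy search would repeatedly add, remove or reverse single edges and keep the change that most increases this score; working in log space with `lgamma` avoids overflow of the Gamma function for realistic sample sizes.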


The Small Bayesian Network for Classification of Cancer

• The Bayesian network was learned by full search using the BD (Bayesian Dirichlet) score with an uninformative prior [Heckerman ’95] from the DNA microarray data for cancer classification (http://waldo.wi.mit.edu/MPR/).

[Fig.] Learned network over the nodes Zyxin, Leukemia class, LTC4S, C-myb and MB-1.

[Table] Comparison of the classification performance with other methods [Hwang ’00].

               Training error   Test error
Bayes nets     0/38             2/34
Neural trees   0/38             1/34
RBF networks   0/38             1.3/34

Large-Scale Bayesian Network with 1171 Genes

- Genetic networks for understanding the regulatory interaction among genes and their derivatives

- Pharmacogenomics and Toxicogenomics

[Fig.] The Bayesian network structure constructed from DNA microarray data for cancer classification (partial view).

DNA Computing: BT for IT

DNA Computing: BioMolecules as Computer

[Fig.] Binary data (011001101010001) versus a DNA sequence (ATGCTCGAAGCT).

Why DNA Computing?

• 6.022 × 10^23 molecules / mole
• Immense, brute-force search of all possibilities
  - Desktop: 10^9 operations / sec
  - Supercomputer: 10^12 operations / sec
  - 1 µmol of DNA: 10^26 reactions
• Favorable energetics: Gibbs free energy ΔG = −8 kcal mol^-1
• 1 J for 2 × 10^19 operations
• Storage capacity: 1 bit per cubic nanometer

Flow of DNA Computing

[Fig.] Solving the HPP (Hamiltonian path problem) on a graph with nodes 0 to 6.
• Encoding: Node 0: ACG, Node 1: CGA, Node 2: GCA, Node 3: TAA, Node 4: ATG, Node 5: TGC, Node 6: CGT
• Ligation: node sequences are joined into strands for many candidate paths (e.g. ACGCGAGCATAAATGTGCACGCGT)
• Gel electrophoresis, PCR (polymerase chain reaction) and affinity column: candidate strands are filtered
• Decoding: the solution strand ACGCGAGCATAAATGTGCCGT reads as the path 0 → 1 → 2 → 3 → 4 → 5 → 6
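The generate-and-filter flow can be imitated in software. The sketch below is a simplified in silico analogue, not a model of the wet-lab protocol: it reuses the node codes from the slide, but the edge list is hypothetical (the graph's edges are not recoverable from the figure), ligation is replaced by exhaustive enumeration of walks, and each filter is a plain string test.

```python
# Node codes as given on the slide.
CODES = {0: "ACG", 1: "CGA", 2: "GCA", 3: "TAA", 4: "ATG", 5: "TGC", 6: "CGT"}
DECODE = {code: node for node, code in CODES.items()}

# Hypothetical directed edges, chosen only for illustration.
EDGES = [(0, 1), (0, 3), (1, 2), (2, 3), (3, 1), (3, 4), (4, 1), (4, 5), (5, 2), (5, 6)]


def ligate(max_nodes):
    """Stand-in for ligation: every walk of up to max_nodes nodes, as a strand."""
    walks = [[v] for v in CODES]
    pool = list(walks)
    for _ in range(max_nodes - 1):
        walks = [w + [b] for w in walks for a, b in EDGES if a == w[-1]]
        pool.extend(walks)
    return ["".join(CODES[v] for v in w) for w in pool]


def chunks(strand):
    """Split a strand into its 3-base node codes."""
    return [strand[i:i + 3] for i in range(0, len(strand), 3)]


pool = ligate(len(CODES))
# "PCR": keep strands that start at node 0 and end at node 6.
amplified = [s for s in pool if s.startswith(CODES[0]) and s.endswith(CODES[6])]
# "Gel electrophoresis": keep strands whose length equals 7 nodes.
right_length = [s for s in amplified if len(s) == 3 * len(CODES)]
# "Affinity column": keep strands that contain every node code.
survivors = [s for s in right_length if set(chunks(s)) == set(CODES.values())]
# "Decoding": translate the surviving strands back into node paths.
paths = [[DECODE[c] for c in chunks(s)] for s in survivors]
print(survivors, paths)
```

With this particular edge list the only surviving strand is the one shown as the solution on the slide; a different edge list could leave several survivors or none.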

Biointelligence on a Chip?

[Fig.] The Biointelligence Chip at the meeting point of information technology, biotechnology and electronics. Figure labels: Bioinformation Technology, Biological Computer, Molecular Computing.
• Computing models: the limit of conventional computing models
• Devices: the limit of silicon semiconductor technology

Intelligent Biomolecular Information Processing

[Fig.] Theoretical models of a biomolecular processor: inputs → controller → reaction chamber (calculating) → output. Figure labels: GFP, cytochrome c, bio-memory, bio-processor, biocomputing.

Evolvable Biomolecular Hardware

• Sequence programmable and evolvable molecular systems have been constructed as cell-free chemical systems using biomolecules such as DNA and proteins.

DNA Computers vs. Conventional Computers

DNA-based computers                               Microchip-based computers
slow at individual operations                     fast at individual operations
can do billions of operations simultaneously      can do substantially fewer operations simultaneously
can provide huge memory in small space            smaller memory
setting up a problem may involve                  setting up only requires keyboard input
  considerable preparations
DNA is sensitive to chemical deterioration        electronic data are vulnerable but can be backed up easily

Molecular Operators for DNA Computing

• Hybridization: complementary pairing of two single-stranded polynucleotides

    5'-AGCATCCA-3'  +  3'-TCGTAGGT-5'   →   5'-AGCATCCA-3'
                                            3'-TCGTAGGT-5'

• Ligation: attaching sticky ends to a blunt-ended molecule

    [Fig.] ATGCATGC + TGAC → ATGCATGCTGAC (the attached TGAC forms the sticky end)

Research Groups

• MIT, Caltech, Princeton University, Bell Labs
• EMCC (European Molecular Computing Consortium), composed of national groups from 11 European countries
• BioMIP Institute (BioMolecular Information Processing) at the German National Research Center for Information Technology (GMD)
• Molecular Computer Project (MCP) in Japan
• Leiden Center for Natural Computation (LCNC)

Applications of Biomolecular Computing

• Massively parallel problem solving
• Combinatorial optimization
• Molecular nano-memory with fast associative search
• AI problem solving
• Medical diagnosis
• Cryptography
• Drug discovery
• Further impact in biology and medicine:
  - Wet biological databases
  - Processing of DNA labeled with digital data
  - Sequence comparison
  - Fingerprinting

NACST (Nucleotide Acid Computing Simulation Toolkit)

[Fig.] NACST architecture: GUI; DNA Sequence Generator; DNA Sequence Optimizer (genetic algorithm); NACST Engine with a Controller and the Ligation, PCR, Electrophoresis, Affinity Column and Enzyme units.

NACST

[Fig.] NACST inputs and outputs.

Combinatorial Problem Solver

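TSP (Traveling Salesman Problem)

[Fig.] A weighted graph with nodes 0 to 6 and the tour 0 → 1 → 2 → 3 → 4 → 5 → 6 → 0.

Representations
• Vertex i is encoded by two parts, P_iA and P_iB: P1A = AGCT, P1B = TAGG; P2A = ATGG, P2B = CATG; P3A = CGAT, P3B = CGAA
• Edge i → j is encoded as P_iB, W_i→j, P_jA, with the vertex parts complemented so that the edge can hybridize to both vertices:
  - P1B W1→3 P3A: ATCC GCCT GCTA
  - P1B W1→2 P2A: ATCC ATCA TACC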
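The relation between vertex and edge sequences can be checked with a few lines of code. The sketch below rests on one reading of the figure, which is an assumption: each vertex i has parts P_iA and P_iB, and an edge strand is the base-wise complement of P_iB, followed by a weight sequence, followed by the base-wise complement of P_jA. The sequence values are those printed on the slide; the dictionary and function names are illustrative.

```python
# Base-wise complement (the figure draws the edge strand aligned under the vertices).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (P_iA, P_iB) for vertices 1 to 3, as read from the slide.
VERTEX = {1: ("AGCT", "TAGG"), 2: ("ATGG", "CATG"), 3: ("CGAT", "CGAA")}

# Weight sequences W_i->j, as read from the slide.
WEIGHT = {(1, 3): "GCCT", (1, 2): "ATCA"}


def edge_strand(i, j):
    """Edge i -> j: complement of P_iB, then the weight, then complement of P_jA."""
    p_ib = VERTEX[i][1]
    p_ja = VERTEX[j][0]
    return p_ib.translate(COMPLEMENT) + WEIGHT[(i, j)] + p_ja.translate(COMPLEMENT)


print(edge_strand(1, 3))
print(edge_strand(1, 2))
```

Under this reading the weight sequence sits between the two vertex-binding regions, so its properties (for example its G-C content) can be varied without changing which vertices the edge connects.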

Combinatorial Problem Solver

• Weight representation
  1. Molecules with high G-C content tend to hybridize easily.
  2. Molecules with high G-C content tend to be denatured at higher temperature.
  3. Molecules with a larger population in the tube have a higher probability of hybridizing.

• Methods (figure labels): Hybridization/Ligation, PCR/Gel electrophoresis, Affinity chromatography, PCR/Gel electrophoresis, Temperature Gradient Gel Electrophoresis, Graduate PCR

Experimental Results for 4-TSP

[Fig.] Gel images of the ligation result and the final PCR result (140 bp), with a 50 bp marker and the oligomer mixture.

• Hybridization (37 °C)
• Ligation (16 °C, 15 hr)
• PCR (36 cycles)
• Gel electrophoresis (10% polyacrylamide gel)

Molecular Theorem Prover

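• Resolution refutation method
• Problem under consideration: given P ∧ Q → R, S ∧ T → Q, S, T and P, is R true?
• Turn each A → B into ¬A ∨ B and add ¬R: ¬P ∨ ¬Q ∨ R, ¬S ∨ ¬T ∨ Q, S, T, P, ¬R
• Resolution steps:
  - ¬P ∨ ¬Q ∨ R and P give ¬Q ∨ R
  - ¬S ∨ ¬T ∨ Q and S give ¬T ∨ Q
  - ¬T ∨ Q and T give Q
  - ¬Q ∨ R and Q give R
  - R and ¬R give nil
• The empty clause (nil) is derived, so R is true!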
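The refutation can be reproduced in software with a small propositional resolution procedure. This is a minimal sketch that saturates the clause set by brute force; it says nothing about how the molecular implementation carries out the same inference. Literals are strings, and "~" marks negation (a notation chosen here for convenience).

```python
from itertools import combinations


def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    out = []
    for literal in c1:
        if negate(literal) in c2:
            out.append(frozenset((c1 - {literal}) | (c2 - {negate(literal)})))
    return out


def refutes(clauses):
    """True if the empty clause is derivable, i.e. the clauses are unsatisfiable."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for resolvent in resolvents(c1, c2):
                if not resolvent:
                    return True
                new.add(resolvent)
        if new <= known:  # nothing new can be derived
            return False
        known |= new


# Knowledge base from the slide in clause form.
kb = [frozenset(c) for c in (
    {"~P", "~Q", "R"},   # P and Q imply R
    {"~S", "~T", "Q"},   # S and T imply Q
    {"S"}, {"T"}, {"P"},
)]
r_is_proved = refutes(kb + [frozenset({"~R"})])
print(r_is_proved)
```

Adding the negated goal ¬R and deriving the empty clause is what establishes R; running `refutes(kb)` on the knowledge base alone returns False, since the facts by themselves are consistent.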

Molecular Theorem Prover (Abstract Implementation)

[Fig.] Implementation 1 and Implementation 2: the clauses are encoded as strands of literals (¬S ¬T Q; ¬Q ¬P R; S; T; P; ¬R), and complementary literals on different strands pair with each other.

Molecular Theorem Prover (Experiments for Method 1)

I. Mixture: 100 pmol each → total 20 µl
II. Denaturation (95 °C, 10 min)
III. Annealing: 95 °C for 1 min → 15 °C, 1 °C down/min
IV. Polyacrylamide gel electrophoresis (PAGE, 20%)
V. Detection of solution: 75 bp ds DNA

[Fig.] Gel image with lanes 1 to 6 (mixture and reaction) and a 20 bp DNA marker (Takara); the 20 bp and 200 bp bands are indicated.

Solving Logic Problems by Molecular Computing

• Satisfiability Problem
  - Find Boolean values for variables that make the given formula true.
  - Example: (x1 ∨ x3 ∨ x4) ∧ (x4) ∧ (x2 ∨ x3)
• 3-SAT Problem
  - Every NP problem can be seen as the search for a solution that simultaneously satisfies a number of logical clauses, each composed of three variables.
  - Examples: (x1 or x2 or x3) AND (x4 or x5 or x6); (x1 or x2 or x3) AND (x1 or x2 or x3)

DNA Computing with DNA Chips

DNA Chips for DNA Computing

I. Make: oligomer synthesis
II. Attach (Immobilized): 5’HS-C6-T15-CCTTvvvvvvvvTTCG-3’
III. Mark: hybridization
IV. Destroy: enzyme reaction (e.g. EcoRI)
V. Unmark
VI. Readout: N cycles, PCR

Variable Sequences and the Encoding Scheme

Three-dimensional Plot and Histogram of the Fluorescence

• S3: w=0, x=0, y=1, z=1
• S7: w=0, x=1, y=1, z=1
• S8: w=1, x=0, y=0, z=0
• S9: w=1, x=0, y=0, z=1

• y=1: (w ∨ x ∨ y)
• z=1: (w ∨ ¬y ∨ z)
• x=0 or y=1: (¬x ∨ y)
• w=0: (¬w ∨ ¬y)

• Four spots with high fluorescence intensity correspond to the four expected solutions.
• DNA sequences are identified in the readout step via addressed array hybridization.
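The four expected solutions can be checked by exhaustive search over all 16 assignments. The sketch assumes the formula (w ∨ x ∨ y) ∧ (w ∨ ¬y ∨ z) ∧ (¬x ∨ y) ∧ (¬w ∨ ¬y); the negations are a reconstruction inferred from the annotations next to each clause (for example, a clause satisfied by w=0 must contain ¬w) and from the four listed solutions, not something stated outright on the slide.

```python
from itertools import product

# Each clause is a list of (variable, value that satisfies the literal).
CLAUSES = [
    [("w", 1), ("x", 1), ("y", 1)],   # w or x or y
    [("w", 1), ("y", 0), ("z", 1)],   # w or not-y or z
    [("x", 0), ("y", 1)],             # not-x or y
    [("w", 0), ("y", 0)],             # not-w or not-y
]


def satisfies(assignment, clauses):
    """True if every clause has at least one satisfied literal."""
    return all(any(assignment[var] == val for var, val in clause) for clause in clauses)


solutions = [
    bits for bits in product((0, 1), repeat=4)
    if satisfies(dict(zip("wxyz", bits)), CLAUSES)
]
# Read each assignment (w, x, y, z) as a 4-bit number to get the strand index.
labels = ["S%d" % int("".join(map(str, bits)), 2) for bits in solutions]
print(solutions, labels)
```

The surface-based computation performs the same elimination physically: strands that violate a clause are destroyed, and the survivors are read out as the high-intensity spots.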

Outlook

• IT is growing in importance for the advancement of BT.
  - Bioinformatics
  - DNA Microarray Data Mining
• IT can benefit much from BT.
  - Biocomputing and Biochips
  - DNA Computing (with DNA Chips)
• Bioinformation technology (BIT) is essential as a next-generation information technology.
  - In Silico Biology vs. In Vivo Computing

References

• [Barash ’01] Barash, Y. and Friedman, N., Context-specific Bayesian clustering for gene expression data, Proc. of RECOMB’01, 2001.
• [Butte ’97] Butte, A.J. et al., Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks, Proc. Natl Acad. Sci. USA, 94, 1997.
• [Eisen ’98] Eisen, M.B. et al., Cluster analysis and display of genome-wide expression patterns, Proc. Natl Acad. Sci. USA, 95, 1998.
• [Friedman ’00] Friedman, N. et al., Using Bayesian networks to analyze expression data, Proc. of RECOMB’00, 2000.
• [Heckerman ’95] Heckerman, D. et al., Learning Bayesian networks: the combination of knowledge and statistical data, Machine Learning, 20(3), 1995.
• [Hwang ’00] Hwang, K.-B. et al., Applying machine learning techniques to analysis of gene expression data: cancer diagnosis, CAMDA’00, 2000.
• [Khan ’01] Khan, J. et al., Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nature Medicine, 7(6), 2001.
• [Margaritis ’00] Margaritis, D. and Thrun, S., Bayesian network induction via local neighborhoods, Proc. of NIPS’00, 2000.
• [Shin ’00] Shin, H.-J. et al., Probabilistic models for clustering cell cycle-regulated genes in the yeast, CAMDA’00, 2000.
• [Somogyi ’96] Somogyi, R. and Sniegoski, C.A., Modeling the complexity of genetic networks: understanding multigenic and pleiotropic regulation, Complexity, 1(6), 1996.
• [Tamayo ’99] Tamayo, P. et al., Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation, Proc. Natl Acad. Sci. USA, 96, 1999.

Web Resources: Bioinformatics

• ANGIS - The Australian National Genomic Information Service: http://morgan.angis.su.oz.au/
• Australian National University (ANU) Bioinformatics: http://life.anu.edu.au/
• BioMolecular Engineering Research Center (BMERC): http://bmerc-www.bu.edu/
• Brutlag bioinformatics group: http://motif.stanford.edu/
• Columbia University Bioinformatics Center (CUBIC): http://cubic.bioc.columbia.edu/
• European Bioinformatics Institute (EBI): http://www.ebi.ac.uk/
• European Molecular Biology Laboratory (EMBL): http://www.embl-heidelberg.de/
• Genetic Information Research Institute: http://www.girinst.org/
• GMD-SCAI: http://www.gmd.de/SCAI/scai_home.html
• Harvard Biological Laboratories: http://golgi.harvard.edu/
• Laurence H. Baker Center for Bioinformatics and Biological Statistics: http://www.bioinformatics.iastate.edu/
• NASA Center for Bioinformatics: http://biocomp.arc.nasa.gov/
• NCSA Computational Biology: http://www.ncsa.uiuc.edu/Apps/CB/
• Stockholm Bioinformatics Center: http://www.sbc.su.se/
• USC Computational Biology: http://www-hto.usc.edu/
• W. M. Keck Center for Computational Biology: http://www-bioc.rice.edu/

Web Resources: Biocomputing

• European Molecular Computing Consortium (EMCC): http://www.csc.liv.ac.uk/~emcc/
• BioMolecular Information Processing (BioMip): http://www.gmd.de/BIOMIP
• Leiden Center for Natural Computation (LCNC): http://www.wi.leidenuniv.nl/~lcnc/
• Biomolecular Computation (BMC): http://bmc.cs.duke.edu/
• DNA Computing and Informatics at Surfaces: http://www.corninfo.chem.wisc.edu/writings/DNAcomputing.html
• SNU Molecular Evolutionary Computing (MEC) Project: http://scai.snu.ac.kr/Research/

Web Resources: Biochips

• DNA Microarray (Genome Chip): http://www.gene-chips.com/
• Large-Scale Gene Expression and Microarray Link and Resources: http://industry.ebi.ac.uk/~alan/MicroArray/
• The Microarray Centre at The Ontario Cancer Institute: http://www.oci.utoronto.ca/services/microarray/
• Lab-on-a-Chip resources: http://www.lab-on-a-chip.com/
• Mailing List: [email protected]

Books: Bioinformatics

• Cynthia Gibas and Per Jambeck, Developing Bioinformatics Computer Skills, O'Reilly, 2001.
• Peter Clote and Rolf Backofen, Computational Molecular Biology: An Introduction, John Wiley & Sons, Inc., 2000.
• Arun Jagota, Data Analysis and Classification for Bioinformatics, 2000.
• Hooman H. Rashidi and Lukas K. Buehler, Bioinformatics Basics: Applications in Biological Science and Medicine, 1999.
• Pierre Baldi and Soren Brunak, Bioinformatics: The Machine Learning Approach, MIT Press, 1998.
• Andreas Baxevanis and B. F. Francis Ouellette, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, John Wiley & Sons, Inc., 1998.

Books: Biocomputing

• Cristian S. Calude and Gheorghe Paun, Computing with Cells and Atoms: An Introduction to Quantum, DNA and Membrane Computing, Taylor & Francis, 2001.
• Gheorghe Paun, Ed., Computing With Bio-Molecules: Theory and Experiments, Springer, 1999.
• Gheorghe Paun, Grzegorz Rozenberg and Arto Salomaa, DNA Computing: New Computing Paradigms, Springer, 1998.
• C. S. Calude, J. Casti and M. J. Dinneen, Unconventional Models of Computation, Springer, 1998.
• Tino Gramss, Stefan Bornholdt, Michael Gross, Melanie Mitchell and Thomas Pellizzari, Non-Standard Computation: Molecular Computation - Cellular Automata - Evolutionary Algorithms - Quantum Computers, Wiley-VCH, 1997.

More information at http://cbit.snu.ac.kr/ http://bi.snu.ac.kr/
