Biclustering Algorithms Are Used

Roadmap • Clustering (just a very small overview!!) – Hierarchical – Partitional • Biclustering • Biclustering Gene Expression Time Series 1 Sara C. Madeira, 24/03/2011 Clustering 2 Sara C. Madeira, 24/03/2011 Clustering 3 Sara C. Madeira, 24/03/2011 Clustering 4 Sara C. Madeira, 24/03/2011 Clustering 5 Sara C. Madeira, 24/03/2011 Clustering • Distances used when clustering expression data are related to • Absolute differences (Euclidean distance, …) • Trends (Pearson Correlation, …) • Homogeneity and Separation Principles should be preserved!! • Homogeneity: Genes/conditions within a cluster are close/ correlated to each other • Separation: Genes/conditions in different clusters are further apart from each other/uncorrelated to each other clustering is not an easy task! 6 Sara C. Madeira, 24/03/2011 Clustering Techniques • Agglomerative Start with every gene/condition in its own cluster, and iteratively join clusters together. • Divisive Start with one cluster and iteratively divide it into smaller clusters. • Hierarchical Organize elements into a tree, leaves represent genes and the length of the pathes between leaves represents the distances between genes/conditions. Similar genes/conditions lie within the same subtrees. • Partitional Partitions the genes/conditions into a specified number of groups. 7 Sara C. Madeira, 24/03/2011 Hierarchical Clustering 8 Sara C. Madeira, 24/03/2011 Hierarchical Clustering 9 Sara C. Madeira, 24/03/2011 Hierarchical Clustering 10 Sara C. Madeira, 24/03/2011 Partitional Clustering 11 Sara C. Madeira, 24/03/2011 Clustering in Babelomics • Algorithms: UGMA, k-Means, SOTA (Dopazo and Carazo, 1997; Herrero et al., 2001) • Webpage: http://babelomics.bioinfo.cipf.es/ • Tutorial: http://bioinfo.cipf.es/babelomicstutorial/clustering/ 12 Sara C. Madeira, 24/03/2011 Roadmap • Clustering • Biclustering – Why Biclustering and not just Clustering? – Bicluster Types and Structure – Algorithms • Biclustering Gene Expression Time Series 13 Sara C. Madeira, 24/03/2011 What is Biclustering? • Simultaneous Clustering of both rows and columns of a data matrix. – Biclustering - Identifies groups of genes with similar/coherent expression patterns under a specific subset of the conditions. – Clustering - Identifies groups of genes/conditions that show similar activity patterns under all the set of conditions/all the set of genes under analysis. • |R| by |C| data matrix A = (R,C) Cond 1 … Cond j … Cond.m – R={r1,..., r|R|} = Set of |R| rows. Gene 1 … … … … … – C={y1,..., y|C|} = Set of |C| columns. … … … … … … – aij=relation between row i and column j. Gene i … … a … … • Gene expression matrices ij … … … … … … – R = Set of Genes – C = Set of Conditions. Gene n … … … … … – aij = expression level of gene i under condition j (quantity of mRNA). 14 Sara C. Madeira, 24/03/2011 Biclustering vs Clustering C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 G1 a11 a12 a13 a14 a15 a16 a17 a18 a19 a110 G2 a21 a22 a23 a24 a25 a26 a27 a28 a29 a210 G3 a31 a32 a33 a34 a35 a36 a37 a38 a39 a310 G4 a41 a42 a43 a44 a45 a46 a47 a48 a49 a410 G5 a51 a52 a53 a54 a55 a56 a57 a58 a59 a510 G6 a61 a62 a63 a64 a65 a66 a67 a68 a69 a610 R = {G1, G2, G3, G4, G5, G6} Cluster of Conditions (R,J) C= {C1, C2, C3, C4, C5, C6, C7, C8, C9, C10} (R,{C4, C5, C6}) I = {G2, G3, G4} Cluster of Genes (I,C) Bicluster (I,J) J = {C4, C5, C6} ({G2, G3, G4},C) ({G2, G3, G4}, {C4, C5, C6}) 15 Sara C. Madeira, 24/03/2011 Example – Yeast Cell Cycle 16 Sara C. Madeira, 24/03/2011 Example – Yeast Cell Cycle 17 Sara C. Madeira, 24/03/2011 Example – Yeast Cell Cycle GENE CLUSTER 18 Sara C. Madeira, 24/03/2011 Example – Yeast Cell Cycle BICLUSTER 19 Sara C. Madeira, 24/03/2011 Example – Yeast Cell Cycle BICLUSTER 20 Sara C. Madeira, 24/03/2011 Why Biclustering and not just Clustering? • When Clustering algorithms are used – Each gene in a given gene cluster is defined using all the conditions. – Each condition in a condition cluster is characterized by the activity of all the genes. Global Model • When Biclustering algorithms are used – Each gene in a bicluster is selected using only a subset of the conditions – Each condition in a bicluster is selected using only a subset of the genes. Local Model 21 Sara C. Madeira, 24/03/2011 Why Biclustering and not just Clustering? • Unlike Clustering – Biclustering identifies groups of genes that show similar activity patterns under a specific subset of the experimental conditions. • Biclustering is the key technique to use when 1. Only a small set of the genes participates in a cellular process of interest. 2. An interesting cellular process is active only in a subset of the conditions. 3. A single gene may participate in multiple pathways that may or not be co- active under all conditions. 22 Sara C. Madeira, 24/03/2011 Roadmap • Clustering • Biclustering – Why Biclustering and not just Clustering? – Bicluster Types and Structure – Algorithms • Biclustering Gene Expression Time Series 23 Sara C. Madeira, 24/03/2011 Bicluster Types 1. Biclusters with constant values. 2. Biclusters with constant values on rows or columns. 3. Biclusters with coherent values. 4. Biclusters with coherent evolutions. 24 Sara C. Madeira, 24/03/2011 Constant Values • Perfect constant bicluster sub-matrix (I,J) where all values 1.0 1.0 1.0 1.0 within the bicluster are equal for all i∈ I and j∈ J: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 aij= µ 1.0 1.0 1.0 1.0 25 Sara C. Madeira, 24/03/2011 Constant Values on Rows or Columns • Perfect bicluster with constant • Perfect bicluster with constant rows columns – submatrix (I,J) where all the – submatrix (I,J) where all the values within the bicluster can values within the bicluster can be obtained using: be obtained using: aij= µ + αi aij= µ + βj aij= µ x αi aij= µ x βj where µ is the typical value within where µ is the typical value within the the bicluster and α is the i bicluster and βj is the adjustment for adjustment for row i∈ I. column j∈ J. This adjustment can be obtained either in an additive or multiplicative way. 26 Sara C. Madeira, 24/03/2011 Constant Values on Rows or Columns 1.0 1.0 1.0 1.0 1.0 2.0 3.0 4.0 2.0 2.0 2.0 2.0 1.0 2.0 3.0 4.0 3.0 3.0 3.0 3.0 1.0 2.0 3.0 4.0 4.0 4.0 4.0 4.0 1.0 2.0 3.0 4.0 Constant Rows Constant Columns 27 Sara C. Madeira, 24/03/2011 Coherent Values • Perfect bicluster with additive/multiplicative model – a subset of rows and a subset of columns, whose values aij are predicted using: aij= µ + αi + βj aij= µ x αi x βj where µ is the typical value within the bicluster, αi is the adjustment for row i∈ I and βj is the adjustment for row j∈ J. These adjustments can be obtained either in an additive or multiplicative way. 28 Sara C. Madeira, 24/03/2011 Coherent Values 1.0 2.0 5.0 0.0 1.0 2.0 0.5 1.5 2.0 3.0 6.0 1.0 2.0 4.0 1.0 3.0 4.0 5.0 8.0 3.0 4.0 8.0 2.0 6.0 5.0 6.0 9.0 4.0 3.0 6.0 1.5 4.5 Additive Model Multiplicative Model 29 Sara C. Madeira, 24/03/2011 Coherent Values • The “the plaid models” (Lazzeroni and Owen) consider a generalization of the additive model: general additive model. • For every element aij – The general additive model represents a sum of models. – Each model represents the contribution of the bicluster Bk to the value of aij in case i∈ I and j∈ J. 30 Sara C. Madeira, 24/03/2011 Coherent Values General Additive Model • θijk specifies the contribution of each bicluster k and can be one of the following expressions representing different types of biclusters: • K is the number of biclusters. – µk Constant Biclusters • ρik and κjk are binary values that – µk + αik Biclusters with represent memberships: constant rows – ρik is the membership of row i in the – µk + βjk Biclusters with bicluster k. constant columns – κjk is the membership of column j in – µk + αik + βjk Biclusters with the bicluster k. additive model General Multiplicative Model can also be assumed! 31 Sara C. Madeira, 24/03/2011 Coherent Values General Additive Model 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 Constant Values 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 3.0 3.0 2.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0 3.0 3.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 Constant Biclusters 2.0 2.0 2.0 2.0 32 Sara C. Madeira, 24/03/2011 Coherent Values General Additive Model 1.0 1.0 1.0 1.0 1.0 2.0 3.0 4.0 2.0 2.0 2.0 2.0 1.0 2.0 3.0 4.0 3.0 3.0 8.0 8.0 5.0 5.0 1.0 2.0 8.0 10 7.0 8.0 4.0 4.0 10 10 6.0 6.0 1.0 2.0 8.0 10 7.0 8.0 7.0 7.0 7.0 7.0 5.0 6.0 7.0 8.0 8.0 8.0 8.0 8.0 5.0 6.0 7.0 8.0 Constant Rows Constant Columns 33 Sara C.

Load more