Functional Genomics Algorithms and Tools

Gad Getz, Stefano Monti, Michael Reich
{gadgetz,smonti,mreich}@broad.mit.edu
http://www.broad.mit.edu/~smonti/aws
Broad Institute of MIT & Harvard
October 18-20, 2006, Cambridge, MA

Workshop Format
• Morning lectures:
  – Principles of statistics, machine learning and pattern recognition.
  – Their application to the analysis of gene expression data.
• Afternoon hands-on sessions:
  – Practice sessions with GenePattern.
  – Application of the concepts presented in the lectures.

User Profile
• Knowledge of basic mathematical concepts assumed (square root, log, function, …).
• No (or little) previous analysis experience required.
• Basic familiarity with microarrays and expression analysis terms.
• Mixed audience: nobody satisfied.

Outline of the course

Lectures
• Day 1:
  – Introduction: Functional Genomics
  – GenePattern mini-tutorial
  – FG Pipeline:
    • Data Acquisition
    • Preprocessing & Visualization
• Day 2:
  – Supervised Analysis
    • Differential analysis/GSEA
    • Class Prediction/Classification
    • Validation
• Day 3:
  – [Survival analysis]
  – Unsupervised Analysis
    • Clustering, Bi-clustering
  – Annotation

Hands-on sessions
• Day 1:
  – Preprocessing
  – Data visualization/Dimensionality reduction:
    • HeatMaps
    • PCA, NMF, MDS
• Day 2:
  – Differential Analysis/Annotation
  – Classification:
    • Model building/selection
    • Evaluation
• Day 3:
  – Clustering:
    • HC, NMF, CC, Bi-clustering
    • GO Annotation
  – Final Project

The use of high-throughput gene expression micro-arrays and computational tools for molecular profiling.

Functional genomics definition
• The use of systematic approaches to answer questions for the majority of genes in a genome, including:
  – When is a gene expressed?
  – With which other genes does it interact?
  – What phenotype results if a gene is switched on, switched off, or mutated?
Functional genomics aspires to answer such questions systematically for all genes in a genome, in contrast to conventional approaches that do so for one gene at a time.

Paradigm for Functional Genomics
Biological States/Phenotypes
• Tumor vs. Normal
• Chemical treatment vs. untreated
• Remission vs. Refractory Disease
• Successful vs. unsuccessful Treatment
• RNAi
• Time courses
Readouts
• DNA: Polymorphism, Mutation, Loss of Heterozygosity
• RNA: Expression levels
• Protein: Relative abundance, Modification, Activity
Analysis (Statistics, Machine Learning, Pattern Recognition)
• Statistical inference
• Classification/Prediction
• Clustering
• Feature extraction / projection
• Pattern discovery
• Network extraction
Understanding
• What pathways are affected by a disease?
• What pathways are modulated by a specific drug?
• What signatures predict tumor type or patient outcome?
• What genes confer susceptibility to disease?

High-throughput assay technologies
• DNA
  – Readouts: Polymorphism, Copy number variation, Loss of Heterozygosity
  – Technologies: SNP arrays, CGH arrays, sequencing
• RNA
  – Readouts: Expression levels
  – Technologies: Microarrays, SAGE, …
• Protein
  – Readouts: Relative abundance, Modification, Activity
  – Technologies: Mass Spectrometry, ChIP2chip, …

mRNA micro-array
• Measures the “activity” of 10Ks of genes at once.
• Read out organized in a high-dimensional numerical matrix (genes × samples).
• DNA → (transcription) → mRNA → (translation) → Proteins → Traits: Diseases, Physiology, Metabolism, Drug Resistance.
• The matrix is the input to computational analysis.

Number of articles
[Figure: bar chart of the number of articles per year (1996 to 2006) for the PubMed query “microarray” in title/abstract, rising to 4908‡ in the last year shown. ‡Extrapolated from first 5 months.]

The functional genomics pipeline
• Experimental design: affects outcome, data analysis
• Data acquisition: microarray processing
• Data preprocessing: scaling/normalization/filtering
• Data analysis/Hypothesis generation
  – Supervised Analysis: Differential analysis, Classification, …
  – Unsupervised Analysis: Clustering, Bi-clustering, …
• Validation/Annotation
  – Enrichment analysis: GO annotation, GSEA, …
  – “In silico” testing: Cross validation, train/test, etc.
  – “In vitro” testing: Back to the lab

Introductory references
1. Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer-Verlag, 2001.
2. Nature Genetics supplements: The Chipping Forecast I [Nat Genet, 21(1s), 1999], II [Nat Genet, 32(4s), 2002], III [Nat Genet, 37(6s), 2005].
3. Allison, D. B., Cui, X., Page, G. P., and Sabripour, M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet, 7: 55-65, 2006.
4. Hoffman, E. P., Awad, T., et al. Expression Profiling - Best Practices for Data Generation and Interpretation in Clinical Trials. Nat Rev Genet, 5: 229-237, 2004.
5. Larkin, J. E., Frank, B. C., et al. Independence and reproducibility across microarray platforms. Nat Meth, 2: 337-344, 2005.
6. Irizarry, R. A., Warren, D., et al. Multiple-laboratory comparison of microarray platforms. Nat Meth, 2: 345-350, 2005.
7.
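
Returning to the data preprocessing step of the pipeline (scaling/normalization/filtering): the following is a minimal pure-Python sketch of what such a step might look like on a genes × samples matrix. It is not the workshop's or GenePattern's actual procedure. The function name `preprocess`, the thresholds (floor 20, ceiling 16000, 3-fold minimum variation), and the toy matrix are all illustrative assumptions.

```python
import math
from statistics import mean, stdev


def preprocess(matrix, floor=20.0, ceiling=16000.0, min_fold=3.0):
    """Clip, filter, log-transform and standardize a genes x samples matrix.

    All default thresholds are illustrative assumptions, not values taken
    from the slides. Each row needs at least two samples.
    """
    kept, processed = [], []
    for gene_index, row in enumerate(matrix):
        # Scaling: clip raw intensities into [floor, ceiling].
        clipped = [min(max(value, floor), ceiling) for value in row]
        # Filtering: drop genes whose max/min fold change is too small.
        if max(clipped) / min(clipped) < min_fold:
            continue
        # Normalization: log2-transform, then center and scale each gene.
        logged = [math.log2(value) for value in clipped]
        mu, sigma = mean(logged), stdev(logged)
        if sigma == 0:
            continue  # constant row; nothing to standardize
        processed.append([(value - mu) / sigma for value in logged])
        kept.append(gene_index)
    return kept, processed


# Toy matrix: rows are genes, columns are samples (hypothetical values).
expression = [
    [100.0, 120.0, 110.0, 105.0],  # nearly flat across samples
    [50.0, 400.0, 60.0, 800.0],    # varies strongly
    [5.0, 10.0, 8.0, 12.0],        # entirely below the floor
]
kept, processed = preprocess(expression)
print(kept)  # prints [1]: only the second gene passes the variation filter
```

Filtering before standardizing is a deliberate choice in this sketch: standardizing a nearly flat gene would inflate measurement noise to the same scale as genuinely varying genes.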
