Combined Supervised and Unsupervised Learning in Genomic Data Mining Jack Y

Purdue University
Purdue e-Pubs
ECE Technical Reports, Electrical and Computer Engineering
1-1-2003

Combined Supervised and Unsupervised Learning in Genomic Data Mining
Jack Y. Yang
Okan K. Ersoy

Follow this and additional works at: http://docs.lib.purdue.edu/ecetr

Yang, Jack Y. and Ersoy, Okan K., "Combined Supervised and Unsupervised Learning in Genomic Data Mining" (2003). ECE Technical Reports. Paper 152. http://docs.lib.purdue.edu/ecetr/152

This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information.

TR-ECE 03-10
School of Electrical and Computer Engineering
465 Northwestern Avenue
Purdue University
West Lafayette, IN 47907-2035

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

1. INTRODUCTION
  1.1 MOTIVATION
  1.2 ADVANCEMENT OF CURRENT RESEARCH
  1.3 A SNAPSHOT OF OUR RESEARCH

2. BIOPHYSICAL ASPECTS OF BIOINFORMATICS
  2.1 ROLES OF PHYSICS IN BIOMEDICAL SCIENCES
    2.1.1 X-ray crystallography
    2.1.2 The Overhauser effects: Combined ESR with NMR
    2.1.3 DNA microarray technology
  2.2 ROLES OF BIOINFORMATICS
    2.2.1 Protein Modeling
    2.2.2 Evolution, phylogenetic systematics and phylogenetic trees

3. DATA MINING AND COMPUTATIONAL INTELLIGENCE
  3.1 ORIGINATIONS OF BIOINFORMATICS AND ITS RELATIONSHIP TO COMPUTER ENGINEERING
  3.2 STATISTICAL LEARNING, DATA MINING AND COMPUTATIONAL INTELLIGENCE
  3.3 UNSUPERVISED LEARNING
    3.3.1 K-means clustering algorithm
    3.3.2 Self-organizing maps (SOMs)
      3.3.2.1 Update neuron’s rule from energy function
      3.3.2.2 Implementation of the SOM in optional first stage of UST
    3.3.3 Unsupervised decision trees
  3.4 SUPERVISED LEARNING
    3.4.1 Support vector machines
    3.4.2 Decision trees
    3.4.3 Neural networks
    3.4.4 Ersoy’s parallel, self-organizing, hierarchical neural networks
    3.4.5 Nearest neighbor classifiers
  3.5 RESEARCH STRATEGY

4. THE UST ALGORITHM: A NEW WAY OF COMBINING SUPERVISED AND UNSUPERVISED LEARNING
  4.1 STRUCTURE OF THE UST
    4.1.1 The optional first stage: self-organizing maps (SOMs)
    4.1.2 Maximum Contrast Tree
  4.2 CONSTRUCTING THE MAXIMUM CONTRAST TREE
    4.2.1 Method I: MCT
    4.2.2 Method II: Balanced MCT
    4.2.3 The fixed point MCT
  4.3 FAST IMPLEMENTATION ALGORITHM
  4.4 THE ENERGY FUNCTION
  4.5 OVERVIEW OF UST RESULTS

5. DATA GENERATION
  5.1 REASONS FOR GENERATING OUR OWN DATA
  5.2 METHOD IN GENERATING PROTEIN PHYLOGENETIC PROFILES
  5.3 OBTAINING PROTEIN FUNCTIONAL LABELS

6. RESULTS OF UST ON BIOMEDICAL ASPECTS
  6.1 IDENTIFYING FUNCTIONAL RELATED PROTEINS
  6.2 IDENTIFYING PROTEIN (GENE) FUNCTIONAL COMPLEX WITHOUT SEQUENCE HOMOLOGY
  6.3 A SCENARIO OF EVOLUTIONARY PATHWAYS
  6.4 PREDICTING FUNCTIONS OF UNKNOWN PROTEINS

7. NEW CLASSIFIER TO HANDLE MULTIPLE-LABELED INSTANCES
  7.1 THEORETICAL PROPERTIES OF THE NEAREST NEIGHBOR CLASSIFIER
  7.2 PROBABILITY THAT TWO PHYLOGENETIC PROFILES MATCH DUE TO CHANCE
  7.3 NEW INSIGHT ON IMPROVING NEAREST NEIGHBOR CLASSIFIERS
  7.3 UST BASED MULTIPLE-LABELED INSTANCE CLASSIFIER (MLIC)
  7.4 MULTIPLE FUNCTIONALITIES OF MLIC
  7.5 USING THE MAXIMUM CONTRAST TREE AS A CLASSIFIER

8. COMPARATIVE EXPERIMENTAL RESULTS
  8.1 ACCOMMODATING TO COMPLEX DATA BY UST
  8.2 CONSTRUCTING A LIBRARY OF YEAST PROTEIN PHYLOGENETIC PROFILES
  8.3 COMPARATIVE RESULTS
