Automatic Structure Discovery for Large Source Code

By Sarge Rogatch
Universiteit van Amsterdam, Master Thesis Artificial Intelligence, July 2010

Acknowledgements

I would like to acknowledge the researchers and developers who are not even aware of this project but whose findings have played a very significant role:

Soot developers: Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and others.
TreeViz developer: Werner Randelshofer.
H3 Layout author and H3Viewer developer: Tamara Munzner.
Researchers of static call graph construction: Ondřej Lhoták, Vijay Sundaresan, David Bacon, Peter Sweeney.
Researchers of Reverse Architecting: Heidar Pirzadeh, Abdelwahab Hamou-Lhadj, Timothy Lethbridge, Luay Alawneh.
Researchers of Min Cut related problems: Dan Gusfield, Andrew Goldberg, Maxim Babenko, Boris Cherkassky, Kostas Tsioutsiouliklis, Gary Flake, Robert Tarjan.

Contents

1 Abstract
2 Introduction
  2.1 Project Summary
  2.2 Global Context
  2.3 Relevance for Artificial Intelligence
  2.4 Problem Analysis
  2.5 Hypotheses
  2.6 Business Applications
  2.7 Thesis Outline
3 Literature and Tools Survey
  3.1 Source code analysis
    3.1.1 Soot
    3.1.2 Rascal
  3.2 Clustering
    3.2.1 Particularly Considered Methods
      3.2.1.1 Affinity Propagation
      3.2.1.2 Clique Percolation Method
      3.2.1.3 Based on Graph Cut
    3.2.2 Other Clustering Methods
      3.2.2.1 Network Structure Indices based
      3.2.2.2 Hierarchical clustering methods
4 Background
  4.1 Max Flow & Min Cut algorithm
    4.1.1 Goldberg’s implementation
  4.2 Min Cut Tree algorithm
    4.2.1 Gusfield algorithm
    4.2.2 Community heuristic
  4.3 Flake-Tarjan clustering
    4.3.1 Alpha-clustering
    4.3.2 Hierarchical version
  4.4 Call Graph extraction
  4.5 The Problem of Utility Artifacts
  4.6 Various Algorithms
5 Theory
  5.1 Normalization
    5.1.1 Directed Graph to Undirected
    5.1.2 Leverage
    5.1.3 An argument against fan-out analysis
    5.1.4 Lifting the Granularity
    5.1.5 An Alternative
  5.2 Merging Heterogeneous Dependencies
  5.3 Alpha-search
    5.3.1 Search Tree
    5.3.2 Prioritization
  5.4 Hierarchizing the Partitions
  5.5 Distributed Computation
  5.6 Perfect Dependency Structures
    5.6.1 Maximum Spanning Tree
    5.6.2 Root Selection Heuristic
6 Implementation and Specification
  6.1 Key Choices
    6.1.1 Reducing Real- to Integer-Weighted Flow Graph
    6.1.2 Results Presentation
  6.2 File formats
  6.3 Visualization
  6.4 Processing Pipeline
7 Evaluation
  7.1 Experiments
    7.1.1 Analyzed Software and Dimensions
  7.2 Interpretation of the Results
    7.2.1 Architectural Insights
    7.2.2 Class purpose from library neighbors
      7.2.2.1 Obvious from class name
      7.2.2.2 Hardly obvious from class name
      7.2.2.3 Not obvious from class name
      7.2.2.4 Class name seems to contradict the purpose
    7.2.3 Classes that act together