Constrained Clustering Approach to Aid in Remodularisation of Object-Oriented Software Systems

CHONG CHUN YONG

THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
UNIVERSITY OF MALAYA
KUALA LUMPUR
2016

UNIVERSITY OF MALAYA
ORIGINAL LITERARY WORK DECLARATION

Name of Candidate: Chong Chun Yong
Registration/Matric No: WHA130005
Name of Degree: Doctor of Philosophy
Title of Project Paper/Research Report/Dissertation/Thesis: Constrained Clustering Approach to Aid in Remodularisation of Object-Oriented Software Systems
Field of Study: Software Engineering

I do solemnly and sincerely declare that:
(1) I am the sole author/writer of this Work;
(2) This Work is original;
(3) Any use of any work in which copyright exists was done by way of fair dealing and for permitted purposes and any excerpt or extract from, or reference to or reproduction of any copyright work has been disclosed expressly and sufficiently and the title of the Work and its authorship have been acknowledged in this Work;
(4) I do not have any actual knowledge nor do I ought reasonably to know that the making of this work constitutes an infringement of any copyright work;
(5) I hereby assign all and every rights in the copyright to this Work to the University of Malaya (“UM”), who henceforth shall be owner of the copyright in this Work and that any reproduction or use in any form or by any means whatsoever is prohibited without the written consent of UM having been first had and obtained;
(6) I am fully aware that if in the course of making this Work I have infringed any copyright whether intentionally or otherwise, I may be subject to legal action or any other
action as may be determined by UM.

Candidate’s Signature                Date:

Subscribed and solemnly declared before,

Witness’s Signature                  Date:
Name:
Designation:

ABSTRACT

Effective execution of software maintenance requires knowledge of the detailed workings of the software. The structure of a software system, however, may not be clear to software maintainers because it is poorly designed or, worse, because there is no up-to-date documentation. To address this issue, researchers have proposed applying software clustering to help recover a high-level semantic representation of the software design by grouping sets of collaborating software components into meaningful subsystems. This high-level semantic representation helps bridge the dichotomy between the software design as perceived by maintainers and the actual code structure. However, software clustering is typically conducted in an unsupervised and rigid manner, where maintainers have no influence on the clustering results and only a single solution is produced for any given dataset. Even if maintainers possess additional information that could guide and improve the clustering results, traditional clustering algorithms have no way to take advantage of it. These practical concerns have led the researcher to propose integrating domain knowledge into traditional unsupervised clustering algorithms. The result, herein referred to as constrained clustering, is a semi-supervised clustering technique in which domain experts can express their opinions as explicit clustering constraints that restrict whether a pair of software components should or should not be clustered into the same subsystem. Apart from the explicit clustering constraints from domain experts, other sources of information to guide and improve clustering results can be derived implicitly from the source code itself.
To help maintainers effectively identify and interpret the implicit information hidden in the source code, this study proposes representing software as a weighted complex network and applying graph theory to understand and analyse the structure, behaviour, and complexity of the software components and their relationships. The results of the analysis can subsequently be converted into implicit clustering constraints. Hence, maintainers can use both the explicit and the implicit constraints to create a high-level semantic representation of the software design that is coherent and consistent with the actual code structure.

This thesis proposes a constrained clustering approach to aid in the remodularisation of poorly designed or poorly documented object-oriented software systems. The source code of an object-oriented software system is first converted into UML class diagrams. Next, information from the class diagrams is extracted to measure the strength of cohesion among related classes and their relationships, and is then transformed into a weighted complex network whose nodes and edges carry the measured weights. Graph theory metrics are subsequently applied to the constructed weighted complex network so that the structure, behaviour, and complexity of software components and their relationships can be analysed. The results are then converted into sets of clustering constraints. Guided by the explicit and implicit clustering constraints, sets of cohesive clusters are progressively derived to act as a high-level semantic representation of the software design.

This research follows an empirical research methodology, where the proposed approach is validated using 40 object-oriented open-source software systems written in Java.
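The idea of deriving clusters from a weighted class network under pairwise constraints can be illustrated with a small sketch. This is not the thesis's implementation: the class names, the edge weights, the use of average linkage, and the function name `constrained_clustering` are all assumptions chosen for illustration. It only shows how "should" (must-link) and "should not" (cannot-link) constraints can steer an otherwise unsupervised agglomerative clustering.

```python
from itertools import combinations


def constrained_clustering(weights, must_link, cannot_link, k):
    """Greedy agglomerative clustering on a weighted graph with pairwise constraints.

    weights: dict mapping frozenset({a, b}) -> cohesion strength (hypothetical values)
    must_link / cannot_link: iterables of (a, b) pairs
    k: stop merging once k clusters remain
    """
    nodes = {n for pair in weights for n in pair}
    nodes |= {n for pair in list(must_link) + list(cannot_link) for n in pair}
    clusters = [{n} for n in sorted(nodes)]

    def find(n):
        return next(c for c in clusters if n in c)

    def violates(a, b):
        # True if merging a and b would join a cannot-link pair.
        return any((x in a and y in b) or (x in b and y in a)
                   for x, y in cannot_link)

    def merge(a, b):
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)

    # Apply must-link constraints before any weight-driven merging.
    for x, y in must_link:
        a, b = find(x), find(y)
        if a is not b:
            if violates(a, b):
                raise ValueError(f"conflicting constraints for {x!r} and {y!r}")
            merge(a, b)

    def linkage(a, b):
        # Average edge weight between two clusters (missing edges count as 0).
        total = sum(weights.get(frozenset((x, y)), 0.0) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        candidates = [(linkage(a, b), i, j)
                      for (i, a), (j, b) in combinations(enumerate(clusters), 2)
                      if not violates(a, b)]
        if not candidates:
            break  # every remaining merge would break a cannot-link constraint
        _, i, j = max(candidates)
        merge(clusters[i], clusters[j])
    return clusters


# Hypothetical class-level network: edge weight = strength of the relationship.
weights = {
    frozenset(("Order", "Invoice")): 0.9,
    frozenset(("Order", "Customer")): 0.6,
    frozenset(("Invoice", "Customer")): 0.5,
    frozenset(("Logger", "FileUtil")): 0.8,
    frozenset(("Order", "Logger")): 0.95,
}
result = constrained_clustering(
    weights,
    must_link=[("Invoice", "Customer")],
    cannot_link=[("Order", "Logger")],
    k=2,
)
print(result)
```

In this toy network the strongest edge joins `Order` and `Logger`, so an unconstrained run places them in the same cluster. With the cannot-link constraint, `Order` instead ends up with `Invoice` and `Customer`, which is the kind of influence the approach gives to maintainers.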
Using MoJoFM, a well-established technique for measuring the similarity between clustering results, the proposed approach achieves an aggregated average accuracy of 80.33% when compared against the original package diagrams of the 40 software systems, considerably outperforming the conventional unconstrained clustering approach. The clustering results serve as supplementary information for software maintainers to aid in making critical decisions for re-engineering, maintaining and evolving software systems. Ultimately, this research helps reduce the cost of software maintenance through better comprehension of the recovered software design.

ABSTRAK

Effective software maintenance requires knowledge of how the software operates. However, the structure of the software may not be clear to software maintainers because the software is poorly designed or, worse, because there is no up-to-date documentation. To address this issue effectively, researchers have proposed applying software clustering to help recover a high-level semantic representation of the software design by grouping collaborating software components into meaningful subsystems. This high-level semantic representation serves to bridge the dichotomy between the software design as seen from the maintainers’ viewpoint and the actual code structure. However, software clustering is usually carried out in an unsupervised and rigid manner, where maintainers have no influence over the clustering results and only one solution is produced for any given dataset. Even if maintainers have additional information that could guide and improve the clustering results, traditional clustering algorithms have no way to take advantage of that information.

These practical concerns have led the researcher to the idea of integrating domain knowledge into traditional unsupervised clustering algorithms, herein referred to as constrained clustering, a semi-supervised clustering technique in which domain experts can give their opinions in the form of clustering constraints that restrict whether a pair of software components should or should not be clustered into the same subsystem. Apart from the explicit clustering constraints from domain experts, other sources of information to guide and improve the clustering results can be obtained implicitly from the software source code itself. To help maintainers effectively identify and interpret the implicit information hidden in the source code, this study proposes representing software using a weighted complex network in conjunction with graph theory to help understand and analyse the structure, behaviour, and complexity of software components and their relationships from the point of view of graph theory. The results of the analysis can then be converted into implicit clustering constraints. Hence, maintainers can use both the explicit and the implicit constraints to help create a high-level semantic representation of the software design that is coherent and consistent with the actual code structure.

This thesis proposes a constrained clustering method to aid in the remodularisation of object-oriented software systems that are poorly designed or undocumented. First, the source code of an object-oriented software system is converted into UML class diagrams. Next, information from the class diagrams is extracted to measure the strength of cohesion
