Metadata Foundations for the Life Cycle Management of Software Systems
Total Page:16
File Type:pdf, Size:1020Kb
Metadata Foundations for the Life Cycle Management of Software Systems David Paul Hyland-Wood A thesis submitted for the degree of Doctor of Philosophy at The University of Queensland in July 2008 School of Information Technology and Electrical Engineering Declaration by Author This thesis is composed of my original work, and contains no material previously published or written by another person except where due reference has been made in the text. I have clearly stated the contribution by others to jointly authored works that I have included in my thesis. I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, and any other original research work used or reported in my thesis. The content of my thesis is the result of work I have carried out since the commencement of my research higher degree candidature and does not include a substantial part of work that has been submitted to qualify for the award of any other degree or diploma in any university or other tertiary institution. I have clearly stated which parts of my thesis, if any, have been submitted to qualify for another award. I acknowledge that an electronic copy of my thesis must be lodged with the University Library and, subject to the General Award Rules of The University of Queensland, immediately made available for research and study in accordance with the Copyright Act 1968. I acknowledge that copyright of all material contained in my thesis resides with the copyright holder(s) of that material. Statement of Contributions to Jointly Authored Works Contained in the Thesis Hyland-Wood, D., Carrington, D. and Kaplan, S. (2008). Toward a Software Maintenance Methodology using Semantic Web Techniques and Paradigmatic Documentation Modeling, IET Software, Special Issue on Software Evolvability, (to appear). – Hyland-Wood was responsible for 95% of conception and 100% of design, drafting and writing. Carrington and Kaplan provided discussion and review of the material. Hyland-Wood, D., Carrington, D. and Kaplan, S. (2006) Toward a Software Maintenance Methodology using Semantic Web Techniques, Proc. of Second International IEEE Workshop on Software Evolvability 2006. – Hyland-Wood was responsible for 95% of conception and 100% of design, drafting and writing. Carrington and Kaplan provided discussion and review of the material. i Statement of Contributions by Others to the Thesis as a Whole No contributions by others. Statement of Parts of the Thesis Submitted to Qualify for the Award of Another Degree None. Published Works by the Author Incorporated into the Thesis Hyland-Wood, D. (2006). RESTful Software Development and Maintenance, Proc. of the Open Source Developers Conference (OSDC) 2006. - Partially incorporated as paragraphs in Chapters V and VII. Additional Published Works by the Author Relevant to the Thesis but not Forming Part of it Hyland-Wood, D. (2007). Managing Open Source Software Projects in Belbaly, N., Benbya, H. and Meissonier, R. (Eds.), Successful OSS Project Design and Implementation: Requirements, Tools, Social Designs, Reward Structures and Coordination Methods, (to appear). Wood, D. (2005) Scaling the Kowari Metastore, in Dean, M., et al. (Eds.): WISE 2005 Workshops, LNCS 3807, pp. 193-198, 2005. Wood, D., Gearon, P., Adams, T. (2005). Kowari: A Platform for Semantic Web Storage and Analysis, Proc. of XTech 2005. Wood, D. (2004). The Tucana Knowledge Server version 2.1, Proc. of XML 2004, http://www.idealliance.org/proceedings/xml04/. Wood, D. (2004). RDF Metadata in XHTML, Proc. of XML 2004, http://www.idealliance.org/proceedings/xml04/. ii Hyland-Wood, D., Carrington, D. and Kaplan, S. (2006). Scale-Free Nature of Java Software Package, Class and Method Collaboration Graphs (Tech. Rep. No. TR-MS1286), University of Maryland College Park, MIND Laboratory. Acknowledgments Many, many thanks go to my wife Bernadette for her unflagging support over the years of my study. Although her own interest in software is serious, it ends prior to the depths plumbed in this work; yet she offered meaningful advice and encouraged me to continue in spite of obvious incentives to the contrary. She proved her worth continuously as my life partner, my frequent business partner and my best friend. Thanks are also due to my advisors, Associate Professor David Carrington and Professor Simon Kaplan, for their thoughts, guidance and encouragement. Professor James Hendler , now at Rensselaer Polytechnic Institute, provided encouragement, guidance and funding during my tenure at the University of Maryland Information and Network Dynamics (MIND) Laboratory Laboratory. I credit Jim for encouraging me to think on the scale of the World Wide Web. He was also responsible for arranging a U.S. National Science Foundation grant (NSF ITR 04-012) that funded a significant portion of this work. The MIND Laboratory was funded by Fujitsu Laboratory of America College Park, NTT Corporation, Lockheed Martin Corporation and Northrop Grumman Corporation. Other MIND Laboratory researchers who deserve specific thanks include Christian Halaschek-Wiener and Vladimir Kolovski. My long-time collaborators Mr. Paul Gearon of Fedora Commons, Mr. Andrae Muys of Netymon Pty Ltd and Mr. Brian Sletten of Zepheira LLC, deserve thanks for their kind suggestions during the last several years, especially for the improvement of the ontology of software engineering concepts. Brian implemented the new PURL server discussed in Chapter V. David Feeney of Zepheira LLC implemented the second Active PURL prototype as a NetKernel module and refined implementation details with me. Conversations with Dr. Eric Miller of M.I.T. and Zepheira LLC have been invaluable in shaping my thoughts on the deep history and wide applicability of metadata. Dr. Peter Rodgers and Tony Butterfield of 1060 Research Ltd deserve thanks for building 1060 NetKernel and making pre-release versions of it available for my iii prototyping. Open Source software used in this work included Java, Perl, CVS, Subversion, JRDF, the SWOOP ontology editor, the Pellet OWL-DL reasoner, the Protégé ontology editor with OWL and OWLDoc plug-ins, the Mulgara Semantic Store, the Sesame RDF store, the Redfoot RDF library, 1060 NetKernel and the Redland RDF Store. Abstract Software maintenance is, often by far, the largest portion of the software lifecycle in terms of both cost and time. Yet, in spite of thirty years of study of the mechanisms and attributes of maintenance activities, there exist a number of significant open problems in the field: Software still becomes unmaintainable with time. Software maintenance failures result in significant economic costs because unmaintainable systems generally require wholesale replacement. Maintenance failures occur primarily because software systems and information about them diverge quickly in time. This divergence is typically a consequence of the loss of coupling between software components and system metadata that eventually results in an inability to understand or safely modify a system. Accurate documentation of software systems, long the proposed solution for maintenance problems, is rarely achieved and even more rarely maintained. Inaccurate documentation may exacerbate maintenance costs when it misdirects the understanding of maintainers. This thesis describes an approach for increasing the maintainability of software systems via the application and maintenance of structured metadata at the source code and project description levels. The application of specific metadata approaches to software maintenance issues is suggested for the reduction of economic costs, the facilitation of defect reduction and the assistance of developers in finding code-level relationships. The vast majority of system metadata (such as that describing code structure, encoded relationships, metrics and tests) required for maintainability is shown to be capable of automatic generation from existing sources, thus reducing needs for human input and likelihood of inaccuracy. Suggestions are made for dealing with metadata relating to human intention, such as requirements, in a structured and consistent manner. The history of metadata is traced to illustrate the way metadata has been applied to virtual, physical and conceptual resources, including software. Best practice approaches for applying metadata to virtual resources are discussed and compared to the ways in which metadata have iv historically been applied to software. This historical analysis is used to explain and justify a specific methodological approach and place it in context. Theories describing the evolution of software systems are in their infancy. In the absence of a clear understanding of how and why software systems evolve, a means to manage change is desperately needed. A methodology is proposed for capturing, describing and using system metadata, coupling it with information regarding software components, relating it to an ontology of software engineering concepts and maintaining it over time. Unlike some previous attempts to address the loss of coupling between software systems and their metadata, the described methodology is based on standard data representations and may be applied to existing software systems. The methodology is supportive of distributed development teams and the use of third-party components by its grounding of terms and mechanisms in the World Wide Web. Scaling the methodology to the size of the