A Requirements-Based Exploration of Open-Source Software Development Projects – Towards a Natural Language Processing Software Analysis Framework
Total Page:16
File Type:pdf, Size:1020Kb
Georgia State University ScholarWorks @ Georgia State University Computer Information Systems Dissertations Department of Computer Information Systems 8-7-2012 A Requirements-Based Exploration of Open-Source Software Development Projects – Towards a Natural Language Processing Software Analysis Framework Radu Vlas Georgia State University Follow this and additional works at: https://scholarworks.gsu.edu/cis_diss Recommended Citation Vlas, Radu, "A Requirements-Based Exploration of Open-Source Software Development Projects – Towards a Natural Language Processing Software Analysis Framework." Dissertation, Georgia State University, 2012. https://scholarworks.gsu.edu/cis_diss/48 This Dissertation is brought to you for free and open access by the Department of Computer Information Systems at ScholarWorks @ Georgia State University. It has been accepted for inclusion in Computer Information Systems Dissertations by an authorized administrator of ScholarWorks @ Georgia State University. For more information, please contact [email protected]. PERMISSION TO BORROW In presenting this dissertation as a partial fulfillment of the requirements for an advanced degree from Georgia State University, I agree that the Library of the University shall make it available for inspection and circulation in accordance with its regulations governing materials of this type. I agree that permission to quote from, to copy from, or publish this dissertation may be granted by the author or, in his/her absence, the professor under whose direction it was written or, in his absence, by the Dean of the Robinson College of Business. Such quoting, copying, or publishing must be solely for the scholarly purposes and does not involve potential financial gain. It is understood that any copying from or publication of this dissertation which involves potential gain will not be allowed without written permission of the author. RADU E. VLAS 1 NOTICE TO BORROWERS All dissertations deposited in the Georgia State University Library must be used only in accordance with the stipulations prescribed by the author in the preceding statement. The author of this dissertation is: RADU E. VLAS Computer Information Systems Department 35 Broad St., NW Georgia State University P.O. Box 4015 Atlanta, Georgia 30302-4015 The director of this dissertation is: WILLIAM N. ROBINSON Computer Information Systems Department 35 Broad St., NW Georgia State University P.O. Box 4015 Atlanta, GA 30302-4015 2 A REQUIREMENTS-BASED EXPLORATION OF OPEN-SOURCE SOFTWARE DEVELOPMENT PROJECTS – TOWARDS A NATURAL LANGUAGE PROCESSING SOFTWARE ANALYSIS FRAMEWORK BY RADU EDUARD VLAS A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree Of Doctor of Philosophy In the Robinson College of Business Of Georgia State University GEORGIA STATE UNIVERSITY ROBINSON COLLEGE OF BUSINESS 2012 3 Copyright by Radu Eduard Vlas 2012 4 ACCEPTANCE This dissertation was prepared under the direction of the RADU E. VLAS Dissertation Committee. It has been approved and accepted by all members of that committee, and it has been accepted in partial fulfillment of the requirements for the degree of Doctoral of Philosophy in Business Administration in the J. Mack Robinson College of Business of Georgia State University. H. Fenwick Huss, Dean DISSERTATION COMMITTEE Dr. William Robinson Dr. Balasubramaniam Ramesh Dr. Duane Truex Dr. Walt Scacchi 5 ABSTRACT A REQUIREMENTS-BASED EXPLORATION OF OPEN-SOURCE SOFTWARE DEVELOPMENT PROJECTS – TOWARDS A NATURAL LANGUAGE PROCESSING SOFTWARE ANALYSIS FRAMEWORK BY RADU EDUARD VLAS JULY 2012 Committee Chair: Dr. William N. Robinson Major Academic Unit: Computer Information Systems Open source projects do have requirements; they are, however, mostly informal, text descriptions found in requests, forums, and other correspondence. Understanding such requirements provides insight into the nature of open source projects. Unfortunately, manual analysis of natural language requirements is time-consuming, and for large projects, error-prone. Automated analysis of natural language requirements, even partial, will be of great benefit. Towards that end, I describe the design and validation of an automated natural language requirements classifier for open source software development projects. I compare two strategies for recognizing requirements in open forums of software features. The results suggest that classifying text at the forum post aggregation and sentence aggregation levels may be effective. Initial results suggest that it can reduce the effort required to analyze requirements of open source software development projects. Software development organizations and communities currently employ a large number of software development techniques and methodologies. This implied complexity is also enhanced by a wide range of software project types and development environments. The resulting lack of consistency in the software development domain leads to one important challenge that researchers encounter while exploring this area: specificity. This results in an increased difficulty of maintaining a consistent unit of measure or analysis approach while exploring a wide variety of software development projects and environments. The problem of specificity is more prominently exhibited in an area of software development characterized by a dynamic evolution, a unique development environment, and a relatively young history of research when compared to traditional software development: the open-source domain. While performing research on open source and the associated communities of developers, one can notice the same challenge of specificity being present in requirements engineering research as in the case of closed-source software development. Whether research is aimed at performing longitudinal or cross-sectional analyses, or attempts to link requirements to other aspects of software development projects and their management, specificity calls for a flexible analysis tool capable of adapting to the needs and specifics of the explored context. This dissertation covers the design, implementation, and evaluation of a model, a method, and a software tool comprising a flexible software development analysis framework. These design artifacts use a rule-based natural language processing approach and are built to meet the specifics of a requirements-based analysis of software development projects in the open-source domain. This research follows the principles of design science research as defined by Hevner et. al. and includes stages of problem awareness, suggestion, development, evaluation, and results and conclusion (Hevner et al. 2004; Vaishnavi and Kuechler 2007). The long-term goal of the research stream stemming from this dissertation is to propose a flexible, customizable, requirements-based natural language processing software analysis framework which can be adapted to meet the research needs of multiple different types of domains or different categories of analyses. 6 Table of Contents 1. Introduction .......................................................................................................................................... 9 1.1. The Importance of Requirements and of Requirements Traceability ........................................ 12 1.2. Usage Scenarios .......................................................................................................................... 16 1.3. NL Requirements Discovery ........................................................................................................ 17 1.4. NL Requirements Classification................................................................................................... 21 1.5. An Emergent Grammar and Perspective .................................................................................... 23 1.6. Research Method ........................................................................................................................ 27 2. Related Research ................................................................................................................................ 29 2.1. Requirements and Requirements Processes in Open-Source .................................................... 30 2.2. Pattern-Based Analysis of Requirements .................................................................................... 33 2.3. Requirements Discovery ............................................................................................................. 34 2.4. Requirements Classification ........................................................................................................ 35 2.5. Software Product Quality and Software Development Project Success ..................................... 37 3. The Grammar-Based Approach ........................................................................................................... 40 3.1. Classifier Design .......................................................................................................................... 40 Illustrative Text Tagging ...................................................................................................................... 41 Requirements Parsing Ontology ......................................................................................................... 42 3.2. Classifier Engineering .................................................................................................................. 47 Rule-Based Tagging ............................................................................................................................