Partially Automated In-Line Documentation (PAID): Design and Implementation of a Software Maintenance Tool
Total Page:16
File Type:pdf, Size:1020Kb
Partially Automated In-Line Documentation (PAID): Design and Implementation of a Software Maintenance Tool Jane E. Huffman Clifford G. Burgess Science Applications University of Southem International Carporation Mississippi Technical documentation refers to information on the ABSTRACT various phases of the software life cycle. It includes design specifications, performance specifications, Large scale budget and schedule overruns of software functional specifications, development information, etc. projects have sparked interest in software maintenance. This documentation is usually Written by the system The importance of accurate technical documentation to designers or programmers. It is found in-line in the the maintenance pmcess is presented. A possible solution source program, on-line,or in had copy form. It is often to the lack of adequate technical documentation is used as referem material for maintenance of a suggested -- a partially automated in-line documentation progradsystem. system. This system uses software merrics to determine where comments should be placed in source programs. Unfortumely, quite often no technical documentation The system can be used as a tool in software development is produced.. In addition, when documentation is and/or softwm maintenance. produced, it IS often poorly or incompletely Written, and may not be kept current. These factors contribute to the KEY WORDS: Software Engineering difficulty of maintaining the software at a later time. Software Maintenance Software Tools Software Metrics Documentation In-Line Documentation 2. RESEARCH RATIONALE This paper will show that undocumented (or poorly documented) source code can be documented with the 1. INTRODUCTION help of a partially automated in-line documentation system, PAID. This system seeks to improve software maintenance by facilitating the in-line documentation of Large budget and schedule overruns on many software programs. The system uses software metrics to help projects has generated a great deal of interest in software determine where comments should be placed in the maintenance. Maintenance accounts for approximately program. Specifically, the Index of Difficulty used by the 80% of the overall life cycle cost of a software product Maintainability Analysis Tool (MAT) [1],[2] has been [lo]. Also, personnel dissatisfaction with maintenance moditied for use by PAID. An extended discussion of work has reached epidemic Proportions. The national PAID follows. average length of employment for a computer programmer is approximately six months. Turnover in personnel results in high training costs and added 3. AUTOMATION OF IN-LINEDOCUME NTATION maintenance costs (due to the large number of programmers working on a system over its life cycle). If these costs can be r"zed,. the overall cost of software E!u.PQE can be reduced substantially. The importance of technical documentation to the A major factor affecting software maintenance is software maintenance process has been discussed, and the documentation. Software documentation is the collection need for commenting existing programs is apparent. The of documents that explain, describe, and define the ideal solution to this problem would be some system that purposes and uses of a particular software program or a automatically places comments in a program explaining system composed of multiple programs [8]. There are its operation. This would allow maintenance three categories of software documentation: user progammers to submit all of the old programs that need documentation, product documentation, and technical modifcation (but lack documentation) to this system and documentation. Technical documentation will be the within a short period of time have fully documented focus of this paper. programs. CH2615-3/88/0000/0060$01.00 Q 1988 EEE Authorized licensed use limited to: UNIVERSITY OF KENTUCKY. Downloaded on February 1, 2010 at 12:43 from IEEE Xplore. Restrictions apply. Method PAID utilizes a modified version of the Index of Difficulty, referred to as the Modified Index of Difficulty, PAID concentrates on the major functionality of this to decide where comments should be placed. The Index "super" documentation system - deciding where of Difficulty was selected as the foundation for PAID comments should be placed. This becomes difficult since because it is a measure of how difficult a program will be no national or international standards for documentation to maintain and it uses element weights and factors to exist. If standards did exist, however, the system could determine this complexity measure. This makes the just follow their precepts and guidelines, and place metric relatively easy to implement, easy to understand, comments as appropriate. and easy to modify. Since no standards exist, PAID implements the Using the Index of Difficulty weights as a foundation, following in-line documentation guidelines. In general, weights for Pascal source code were determind. This comments should be inserted: process involved using the authors' experience. peer input, and trial and error. A subset of the weighting 1) at the beginning of the program to give the name factors for the Modified Index of Difficulty are shown in of the author, title of the program, object of the table 1. program, and methods used by the program; PAID examines a line of the source code (subject 2) in the declaration section to explain variables, code) being evaluated. The line is scanned and token data types, record structures, etc. used in the types are determined. These token types each have a program; weight in the Modified Index of Difficulty table. A simple table "look up" procedure is used to retrieve the 3) at the beginning and end of each block; and token metric value. The weight for the token type is then added to the complexity of the line. 4) within the program andlor block body as deemed necessary (particularly in sections of code that If the line's complexity is sufficient to warrant are notoriously difficult to maintain, such as documentation (it exceeds a limit for complexity per line recursive routines). of code), PAID "looks ahead" and "looks behind" an appropriate number of lines of code (based on There are more specific documentation "guidelines" that complexity) to see if such documentation already exists. vary depending on the language used to implement a If it does not, the user is prompted to enter a comment. If program. the line complexity does not warrant documentation, it is added to the accumulated complexity (the complexity The first three guidelines can be fairly easily since an already existing comment was found or a new implemented by lexically scanning the subject program comment was inserted). If accumulated complexity and checking for the program heading, declaration exceeds a certain limit, documentation is necessary, and section, or beginnindend of a block. Guideline four PAID prompts the user to enter a comment. presents more of a problem -- determining that a segment of code needs documentation requires close examination PAID checks for complex Pascal structures, also. An of the code. PAID uses the concept of textual complexity example is the forwd declaration. This structure makes to make this determination. By implementing a measure a program difficult to understand, and therefore difficult of textual complexity, PAID chooses locations in the to maintain (particularly if the structure is not program where comments should be inserted. documented). If a forward declaration is encountered by PAID, it "looks ahead" and "looks behind for Software metrics are tools used to quantify software documentation. If documentation does not exist, PAID complexity. When deciding where to place in-line prompts the user to enter a comment. When prompting comments, textual complexity is of importance. Textual for a comment, PAID informs the user of the reason that complexity deals with the readability and documentation is necessary (e.g., line complexity, understandability of programs. There are two major accumulated complexity, forward declaration, etc.). metric systems used to quantify textual complexity: 1) the Berry-Meekings style metric; and 2) the Maintainability PAID checks for direct and indirect recursion. Direct Analysis Tools' (MAT) 111, 121 Index of Difficulty. The recursion is present when a block calls itself. An latter will be of interest in this paper. example of direct recursion implemented in Pascal is: MAT, developed by Science Applications Procedure T; International Corporation, is a static analysis tool for begin FORTRAN programs. It is used to locate programming T; discrepancies such as prusages, errors, possible errors, end; etc. This FORTRAN static analyzer reads, parses, and examines the source code of each program module one at This is often represented as: T --> T. Indirect recursion a time - i.e., the static analyzer examines each source exists if a block calls another block@), at least one of module individually and all of them as a whole [2]. Weights and factors assigned to program elements and attributes (Index of Difficulty) are used to examine the modules. 61 Authorized licensed use limited to: UNIVERSITY OF KENTUCKY. Downloaded on February 1, 2010 at 12:43 from IEEE Xplore. Restrictions apply. ~ which calls the original block. For example: Table 1. Modified Index of Difficulty We ihe ts procedure z; forward; Identifiers Yeight ProcedureL; besin -Y 1.50 z; File Variable 1 .00 end; Function 2.00 ProcedureB; Parameter 1.00 besin procedure 2.00 L; ~m 2.00 end; StIillg 2.00 procedurez; Type 7.00 begin Variable 0.85 B; constants end; This can be repsented Z --> --> --> Z. Boolean 0.00 as: B L Character 0.10 Integer 0.00 PAID evaluates the relationships between blocks in Real the subject program, It compms each identifier it 0.05 encounters to the current block name. If these match, Qperator Element direct recursion exists and the user is prompted to enter a comment (if a CQmment does not already exist). The And identXer is also compared to all blocks currently active 0.25 in the program that may have an indirectly recursive Not 1 .00 relationship to the current block. If indirect recursion is or 0.25 found.