An Approach in Software Sizing and Its Estimation Tool

Weider D. Yu    Chi Ho Yiu*
Computer Engineering Department, San Jose State University, San Jose, CA 95192-0180, USA
Email: [email protected]

*Chi Ho Yiu graduated from San Jose State University and currently works at IBM Santa Teresa Laboratories, San Jose, California.

Abstract: Software quality measurement is crucial to the success of software development. Metrics are widely used both as standards for measuring software quality and as guidelines for software development. Companies require their employees to follow a set of standards in order to maintain software reliability, quality, maintainability, completeness, accuracy, portability, consistency, testability and usability. In this paper, sample metrics data collected from a large software and hardware company is assessed, and the resulting metrics are used to derive a set of statistical methods that provide reasonable guidelines for estimating the expected lines of code (ELOC) from the expected number of modules.

Index Terms: Software Metrics, Quality, Lines of Code, Sizing Tool.

I. INTRODUCTION

One of the challenges managers face in building quality software is managing the time and resources available for development. Resource estimation is always a difficult question. By estimating the resources needed to develop a component, a manager can better control the team's resources and time so that overall progress stays under control. In fact, many projects overrun their expected budget and schedule.

As soon as the requirements for a project are defined, estimating the size of the project becomes very important: it helps ensure a high-quality outcome, gives managers better control of the available resources, and makes the overall system easier for engineers to maintain. For measuring the size of a project, lines of code (LOC) is a widely known metric in the industry.

To study LOC, its characteristics are listed below, followed by graphs that demonstrate the relationship of LOC to other software metrics [4], [6]. The analysis is based on software metrics data collected from several components written in the C++ programming language [9].

Characteristics of LOC (lines of code):

• Objectives: Measuring the number of non-blank, non-comment lines of source code counted by the analyzer.
• Input/Output:
  ♦ Input: Source code.
  ♦ Output: Number of lines of code.
• Output Usage:
  ♦ Intuitive guide to the scale of the project.
  ♦ Predictor of inadequate decomposition per module.
• Relationship to Quality Model Attributes:
  ♦ Accuracy. (Non-comment, non-blank lines of code).
  ♦ Structuredness. (LOC calculated on a per-module basis).
  ♦ Structuredness. (LOC calculated on a per-function basis).
• Thresholds: 24 lines for methods [1].
• Suggested Guideline: Review code for classes with large methods.

Figure 1. Lines of Code vs. Number of Modules.

Figure 2. Lines of Code vs. McCabe Complexity [8].

Figure 3. McCabe Complexity vs. Number of Modules.

Figure 1 shows that growth in the number of modules (NOM) does not correspond to growth in LOC: the growth rate of LOC varies across different ranges of NOM. An increase in the number of modules does not guarantee a corresponding increase in lines of code. Figure 2, on the other hand, shows that the growth rate of LOC against McCabe complexity (MVG) is more stable: an increase in McCabe complexity comes with an increase in lines of code.

From a management point of view, Figure 1 and Figure 2 indicate that the system is inconsistent in terms of LOC. An increase in LOC should result in increases in MVG and NOM. However, Figure 1 shows that an increase in LOC does not result in a corresponding increase in NOM, and Figure 3 further indicates that an increase in NOM does not result in a corresponding increase in MVG.

In this paper, several guidelines are established for calculating the expected lines of code for each component in a software system. The open source metrics measurement tool CCCC is used to calculate the necessary metrics attributes, and source data from various applications created by different companies was gathered for this research.

II. CALCULATING EXPECTED LINES OF CODE

Before the expected lines of code (ELOC) are calculated, each component is classified into a category, because every component has different characteristics. Calculating the ELOC of each component without such classification would yield inaccurate results. Figure 1 and Figure 2 both demonstrate relationships among LOC, NOM and MVG. Compared closely, however, they indicate that LOC vs. MVG provides the more stable relationship, so this relationship can be used to organize components into categories. Components can therefore be classified using the following equation:

LPM = LOC / MVG … (1)

In addition, because the calculation of ELOC involves the NOM attribute, it is useful to calculate the LOC per module for each component: some components tend to have high LOC per module and others low LOC per module, and this inconsistency causes the ELOC of each component to vary. LOC per module is therefore a good indicator for further classifying components:

LPC = LOC / NOM … (2)

By applying equation (1), the LOC/MVG of each component can be calculated, and the result can be used to organize components into categories.

In order to calculate ELOC, a range of expected lines of code per module is defined for different difficulty levels. Applying equations (1) and (2) to the metrics data collected from the various sources generates a set of metrics data. Looking at the LOC/NOM data, it is easy to observe that as the value of LPM increases, the value of LPC decreases. The conclusion drawn is that if a component is not complicated and developers are familiar with the logic they need to implement, they tend to write most of the code in a single module. In contrast, if a component is complicated and developers are not familiar with the logic, they prefer to break the component into several modules so that each module contains less code and is easier to implement.

Based on this conclusion, ranges are defined that group components into three major categories according to their LPC: Easy, Average and Complicated. Each category is further divided into three subcategories, giving nine levels: Easiest, Easy, Less Easy, Below Average, Average, Over Average, Less Complicated, Complicated and Very Complicated. Moreover, based on empirical data computed by Professor Weider Yu of San Jose State University through statistical processing of various source data, a value of lines of code per module is defined for each difficulty level. Using these ranges, we can calculate a range of ELOC values for each component across the difficulty levels.

LOC Range (LOCR)             LOC per Module
1. (E1) Easiest              200
2. (E2) Easy                 185
3. (E3) Less Easy            170
4. (A1) Below Average        155
5. (A2) Average              140
6. (A3) Over Average         125
7. (C1) Less Complicated     110
8. (C2) Complicated           95
9. (C3) Very Complicated      80

To estimate the lines of code of a component, the number of modules in the component is needed. Once the number of modules is known, the estimated lines of code are computed by multiplying the number of modules by each of the LOC ranges defined above, as given in equation (3). The accuracy of the ELOC compared to the actual LOC (ALOC) of the component is then calculated using equation (4).

ELOC = LOCR x NOM … (3)

Accuracy (%) = (1 - ((ELOC - ALOC) / ALOC)) x 100 … (4)

III. EXAMPLES OF CALCULATING ELOC:

The estimated lines of code of each component are calculated under nine difficulty levels: Easiest, Easy, Less Easy, Below Average, Average, Over Average, Less Complicated, Complicated and Very Complicated.

1. LIB 10:

LIB 10 has a total of 74 modules.

2. LIB 12:

LIB 12 has a total of 80 modules. The ELOCs of LIB 12 are calculated using formula (3), and the corresponding results are displayed in Figure 8. The accuracy of each ELOC compared to the actual LOC is calculated by applying formula (4), and the results are displayed in Figure 9. The following table displays the calculated ELOCs and the accuracy for LIB 12:

LOC Range   ELOC                Accuracy
E1          200 x 80 = 16000     44.8%
E2          14800                34.6%
E3          13600                23.7%
A1          12400                12.8%
A2          11200                 1.9%
A3          10000                -9.0%
C1           8800               -19.9%
C2           7600               -30.9%
C3           6400               -41.8%

3. LIB 8:

LIB 8 has a total of 66 modules. The ELOCs of LIB 8 are calculated using formula (3), and the corresponding results are displayed in Figure 10. The accuracy of each ELOC compared to the actual LOC is calculated by applying formula (4), and the results are displayed in Figure 11. The following table displays the calculated ELOCs and the accuracy for LIB 8:

LOC Range   ELOC                Accuracy
E1          200 x 66 = 13200     71.4%
E2          12210                58.5%
E3          11220                45.7%
A1          10230                32.8%
A2           9240                20.0%
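Equations (1) through (4) and the LOC range table can be sketched in Python as follows. This is a minimal illustration, not the authors' estimation tool: all function and variable names are assumptions introduced here. Note also that the percentages in the example tables are signed and cross zero near the actual LOC, which looks closer to the signed deviation (ELOC - ALOC) / ALOC x 100 than to equation (4) as printed. The sketch therefore provides both forms, and the second one is an interpretation rather than something the paper states.

```python
# Illustrative sketch only; names are hypothetical, not from the paper's tool.

# LOC per module for each difficulty level (LOCR), as listed in Section II.
LOCR = {
    "E1": 200, "E2": 185, "E3": 170,
    "A1": 155, "A2": 140, "A3": 125,
    "C1": 110, "C2": 95, "C3": 80,
}


def lpm(loc, mvg):
    """Equation (1): LOC divided by McCabe complexity (MVG)."""
    return loc / mvg


def lpc(loc, nom):
    """Equation (2): LOC divided by number of modules (NOM)."""
    return loc / nom


def eloc_table(nom):
    """Equation (3): ELOC = LOCR x NOM for every difficulty level."""
    return {level: per_module * nom for level, per_module in LOCR.items()}


def accuracy_eq4(eloc, aloc):
    """Equation (4) exactly as printed: (1 - (ELOC - ALOC) / ALOC) x 100."""
    return (1 - (eloc - aloc) / aloc) * 100


def signed_deviation(eloc, aloc):
    """Signed deviation of ELOC from ALOC in percent.

    Assumption: this is how the tabulated percentages appear to be
    computed; the paper itself only gives equation (4).
    """
    return (eloc - aloc) / aloc * 100


if __name__ == "__main__":
    # LIB 12 has 80 modules according to Section III.
    for level, eloc in eloc_table(80).items():
        print(level, eloc)
```

With the module count of 80 given for LIB 12, `eloc_table(80)` reproduces the ELOC column of the LIB 12 table (16000 for E1 down to 6400 for C3). The accuracy functions take the actual LOC as an argument because the excerpt does not state the actual LOC of any component.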
