Language-Independent Volume Measurement
Edwin Ouwehand
[email protected]
Summer 2018, 52 pages

Supervisor: Ana Oprescu
Organisation supervisor: Lodewijk Bergmans
Second reader: Ana Varbanescu
Host organisation: Software Improvement Group, http://www.sig.eu

Universiteit van Amsterdam
Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Master Software Engineering
http://www.software-engineering-amsterdam.nl

Contents

Abstract
1 Introduction
  1.1 Problem statement
  1.2 Research Questions
  1.3 Software Improvement Group
  1.4 Outline
2 Background
  2.1 Software Sizing
    2.1.1 Function Points
    2.1.2 Effort estimation
  2.2 Expressiveness of programming languages
  2.3 Kolmogorov Complexity
    2.3.1 Incomputability and Estimation
    2.3.2 Applications
  2.4 Data compression
    2.4.1 Compression Ratio
    2.4.2 Lossless & Lossy
    2.4.3 Archives
3 Research Method
  3.1 Methodology
  3.2 Data
  3.3 Counting Lines of Code
  3.4 The expressiveness spectrum
4 Measuring information content
  4.1 Compressor and Algorithm selection
    4.1.1 Comparing algorithms
  4.2 Archive selection
    4.2.1 Overhead
    4.2.2 Comparing archives
  4.3 Project size
  4.4 Discussion
  4.5 Conclusion
5 Expressiveness and information content
  5.1 Determining language expressiveness levels
  5.2 Validation
  5.3 Normalising LOC counts
  5.4 Discussion
  5.5 Conclusion
6 Quality and relative verbosity
  6.1 Experiment
  6.2 Discussion
  6.3 Conclusion
7 Related Work
  7.1 Normalised Compression Distance
  7.2 Calculating software productivity with compression
  7.3 Determining software complexity
8 Conclusion
  8.1 Future work
Acknowledgements
Bibliography
Appendices
A Cut-off Data
B Language Distributions

Abstract

The size of a software system is typically measured in lines of code. Lines of code are easy to obtain, but the number of lines needed to implement a given piece of functionality is strongly influenced by the programming style and the programming languages used. A new approach is to measure the 'information content' of source code as an estimate of the size of a software system. With this approach we have successfully determined the expressiveness of various programming languages, although we were not able to verify the results definitively. As a practical application, we propose a new way to normalise lines-of-code counts. Finally, we found no relation between various quality metrics and a verbose style.
Chapter 1

Introduction

Determining the size of a software system is done for various reasons. It is typically used to predict the amount of effort required to develop a system, as well as to estimate programming productivity or maintainability once the system has been developed. As explained by Galorath [GE06], estimates are only as good as the size projections they are based on.

In the physical world, size is a measure of volume or mass. In the software world, though, size is not as clearly defined. Metrics include counting characters, tokens, lines of code (LOC), classes and function points. A well-established way to determine the size of a project is to count the lines of code that have been produced. Studies suggest [HPM03] that LOC often correlates with other measures of effort or functionality, such as function points.

The programming language of choice and the style of the programmer play a large role in the relation between the size in LOC and the actual effort required to create the system. As a result, line counts of systems written in different languages are not comparable. Nevertheless, the number of lines of code is generally accepted as a sensible and practical measure of the size of a system, because it can be measured accurately, is fully automatable and is easily comprehensible. The number of lines required to express a certain amount of functionality in a language is an inherent property of that language and is typically referred to as the language gearing factor, language level or expressiveness.

In this study, we are interested in determining a size measure that helps us compare source code with regard to creation effort, namely the intellectual effort that goes into writing a number of lines of source code. We believe that the relevant size of a software system is proportional to the amount of information encapsulated in the code base.
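In practice, the information content of a string can be estimated with off-the-shelf compression: the compressed size is a standard upper-bound estimate of its Kolmogorov complexity (Chapter 2), and Chapter 4 compares concrete compressors. As a minimal sketch, assuming zlib as an illustrative choice of compressor (not necessarily the one selected in Chapter 4):

```python
import zlib

def information_content(source: str) -> int:
    """Estimate the information content of source code as its
    compressed size in bytes, a practical upper-bound estimate
    of Kolmogorov complexity (up to a constant)."""
    return len(zlib.compress(source.encode("utf-8"), level=9))

# Duplicated code adds little information: a snippet repeated
# twice compresses to far less than twice its compressed size.
snippet = "for i in range(10):\n    total += values[i] * weights[i]\n"
once = information_content(snippet)
twice = information_content(snippet * 2)
print(once, twice)
```

This already illustrates why the measure is attractive for our purposes: verbose or duplicated code inflates the LOC count much more than it inflates the compressed size.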
This follows the idea that the comprehension of program functionality is the comprehension of information and relationships [BH91], which in turn implies that the development process consists of translating knowledge about processes, activities, data structures, manipulations and calculations into source code. Certain programming languages allow the programmer to express this very concisely, whereas others do not. We argue that the bulk of the effort in developing software is spent thinking and reasoning about these concepts, and that only a small portion is spent translating the result into source code. Our approach is therefore based on the idea that the information content of a system is a reflection of its functional size.

By measuring the size of source code in this way, we can derive a table containing expressiveness levels of various languages: a language level table. This would allow future LOC measurements of projects to be normalised, which we expect to be a better indication of the intellectual effort that went into creating a project than traditional methods.

Lastly, we consider the qualitative aspect of unnecessarily verbose or duplicated code (a relatively high line count for low functionality). Inexperienced developers often resort to code duplication, which is more bug-prone and costly to maintain, yet results in a higher LOC count. Therefore, we also investigate the relation between a system's verbosity relative to other systems in the same language and its other qualitative attributes.

1.1 Problem statement

Consider two applications that provide the exact same functionality (screens, reports, databases). One application is written in Java and the other in Python. The number of lines required for the Java implementation is expected to be higher, because Java is a more verbose language. We can observe this effect even at the smallest level.
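This verbosity gap can be quantified with a naive line count over the two 'Hello World' programs. A minimal sketch, assuming the simplest possible counting rule (blank lines excluded, comments not handled; real LOC tools use per-language rules):

```python
# Naive LOC counter: counts non-blank lines only. Ignoring
# comments is a simplifying assumption of this sketch.
def count_loc(source: str) -> int:
    return sum(1 for line in source.splitlines() if line.strip())

java_hello = """public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }
}"""

python_hello = """#!/usr/bin/env python

print "Hello, world!\""""

print(count_loc(java_hello), count_loc(python_hello))  # 5 2
```

Even for these toy programs the Java version needs more than twice the lines of the Python version.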
For example, listings 1.1 and 1.2 show 'Hello World' coded in Java (5 LOC) and Python (2 LOC) respectively. Similarly, the LOC counts of two functionally identical Java applications could differ vastly based on code conventions and the stylistic choices of the programmer. An experienced developer may be able to implement the same functionality with far less code.

    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println("Hello, World");
        }
    }

Listing 1.1: Hello World coded in Java

    #!/usr/bin/env python

    print "Hello, world!"

Listing 1.2: Hello World coded in Python

These limitations have led to the inception of backfiring: the conversion of lines of code to function points [GE06, Jon95] based on historical data. Examples of this include the SPR Programming Languages Table and the QSM Function Points Languages Table, which describe backfiring ratios for various languages. With these ratios, LOC counts can essentially be normalised for the language used. Sadly, the benchmarking process for these tables is far from ideal. First, they are based on function points, which are counted from documentation rather than from the actual system. Though the counting process is standardised [Sse12] and can be automated, there often remains a mismatch between the actual software and its documentation. Second, software is often developed in more than one language: a variety of languages may be employed depending on the complexity and the requirements. This means that functionality cannot be attributed directly to a particular language, as it is interwoven throughout the system. We have reason to believe that these tables are indeed (to an extent) flawed. We can observe large differences between