Automatic Algorithmic Complexity Determination Using Dynamic Program Analysis

Istvan Gergely Czibula, Zsuzsanna Oneț-Marian and Robert-Francisc Vida
Department of Computer Science, Babeș-Bolyai University, M. Kogalniceanu Street, Cluj-Napoca, Romania

Keywords: Algorithmic Complexity, Dynamic Program Analysis.

Abstract: Algorithm complexity is an important concept in computer science concerned with the efficiency of algorithms. Understanding and improving the performance of a software system is a major concern through the lifetime of the system, especially in the maintenance and evolution phase of any software. Identifying certain performance related issues before they actually affect the deployed system is desirable, and possible if developers know the algorithmic complexity of the methods from the software system. In many software projects, information related to algorithmic complexity is missing, thus it is hard for a developer to reason about the performance of the system for different input data sizes. The goal of this paper is to propose a novel method for automatically determining algorithmic complexity based on runtime measurements. We evaluate the proposed approach on synthetic data and actual runtime measurements of several algorithms in order to assess its potential and weaknesses.

1 INTRODUCTION

The performance of a software application is one of the most important aspects for any real life software. After the functional requirements are satisfied, software developers try to predict and improve the performance of the software in order to meet user expectations. Performance related activities include modification of the software in order to reduce the amount of internal storage used by the application, increase the speed by replacing algorithms or components, and improve the system reliability and robustness (Chapin et al., 2001).

Simulation, profiling and measurements are performed in order to assess the performance of the system during maintenance (McCall et al., 1985), but using just measurements performed on a developer machine can be misleading and may not provide sufficient insight into the performance of the deployed system on possibly different real life data loads. Profiling is a valuable tool but, as argued in (W. Kernighan and J. Van Wyk, 1998), no result should ever be taken at face value.

Analysis of an algorithm, introduced by (Knuth, 1998), is concerned with the study of the efficiency of algorithms. Using algorithm analysis one can compare several algorithms for the same problem, based on the efficiency profile of each algorithm, or can reason about the performance characteristics of a given algorithm for increasing sizes of the input data. In essence, studying efficiency means to predict the resources needed for executing a given algorithm for various inputs.

1.1 Motivation

While in the case of library functions, especially standard library functions, complexity guarantees for the exposed methods exist, such information is generally omitted from the developer code and documentation. The main reason for this is the difficulty of deducing said information by the software developer. Analyzing even a simple algorithm may require a good understanding of combinatorics, probability theory and algebraic dexterity (Cormen et al., 2001). Automated tools, created based on the theoretical model presented in this paper, can overcome this difficulty.

Knowledge about algorithmic complexity can complement existing software engineering practices for evaluating and improving the efficiency of a software system. The main advantage of knowing the complexity of a method is that it gives an insight into the performance of an operation for large input data sizes.




Profiling and other measurement based techniques cannot predict the performance characteristics of a method for data loads other than the one under which the measurements were performed.

Knowledge about algorithmic complexity is also beneficial for mitigating security issues in software systems. There is a well documented class of low-bandwidth denial of service (DoS) attacks in the literature that exploit algorithmic deficiencies in software systems (Crosby and Wallach, 2003). The first line of defence against such attacks would be to properly identify the algorithmic complexity of the operations within the software system.

The main contribution of this paper is to propose and experimentally evaluate a deterministic approach to find the asymptotic algorithmic complexity of a method. To the best of our knowledge there is no other approach in the literature that automatically determines algorithmic method complexity based on runtime data analysis.

2 RELATED WORK

In this section we present a short overview of some existing approaches from the literature related to algorithmic complexity.

The first approaches, for example (Le Métayer, 1988), (Wegbreit, 1975) and (Rosendahl, 1989), were based on source code analysis and the computation of mathematical expressions describing the exact number of steps performed by the algorithm. While such methods can compute the exact complexity bound of an algorithm, they were defined for functional programming languages and recursive functions.

More recent approaches can be applied to other programming languages as well, but many of them focus only on determining the complexity of specific parts of the source code (usually loops). Such an approach is the hybrid (composed of static and dynamic analysis) method presented in (Demontiê et al., 2015) and the modular approach presented in (Brockschmidt et al., 2014).

Goldsmith et al. introduced in (Goldsmith et al., 2007) a model that explains the performance of a program as a feature of the input data. The focus is on presenting the performance profile (empirical computational complexity) of the entire program and not the identification of algorithmic complexity at the method level.

Benchmark is a library, written in C++, that supports benchmarking C++ code (Benchmark, 2016). Though it is not its main functionality, it also supports the empirical computation of asymptotic complexities. A comparison of our approach to the results provided by Benchmark is presented in Section 5.

While they do not focus on determining the complexity of the code directly, there are several approaches that try to find performance bugs (pieces of code that function correctly, but where functionality preserving changes can lead to substantial performance improvement) in the source code, for example: (Luo et al., 2017), (Olivo et al., 2015) and (Chen et al., 2018).

An approach using evolutionary search techniques for generating input data that triggers worst case complexity is presented in (Crosby and Wallach, 2003). Such approaches are complementary to our approach, which assumes that testing data already exists.

3 METHODOLOGY

The most used representation for algorithm analysis is the one proposed in (Knuth, 1998), the asymptotic representation, based on the Big O notation, a convenient way to deal with approximations introduced by Paul Bachmann in (Bachmann, 1894).

Definition 1. O(f(n)) denotes the set of g(n) such that there exist positive constants C and n0 with |g(n)| <= C * f(n) for all n >= n0.

If we denote by T(n) the number of steps performed by a given algorithm (n is the size of the input data), then the problem of identifying the algorithmic complexity becomes finding a function f(n) such that T(n) ∈ O(f(n)). The basic idea is to find a function f(n) that provides an asymptotic upper bound for the number of steps performed by the algorithm.

When analyzing algorithmic complexity, we do not differentiate between functions like f(n) = 2n^2 + 7 and f(n) = 8n^2 + 2n + 1; the complexity in both cases will be T(n) ∈ O(n^2). The result of algorithm analysis is an approximation indicating the order of growth of the running time with respect to the input data size (Cormen et al., 2001).
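As a worked illustration of Definition 1 (our own example; the constants are chosen for convenience, they are not from the original text): to see that 2n^2 + 7 ∈ O(n^2), pick C = 9 and n0 = 1. Then for all n >= 1,

    |2n^2 + 7| <= 2n^2 + 7n^2 = 9n^2 = C * n^2,

so the definition is satisfied with f(n) = n^2.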
While the complexity function can be any function, there is a set of well known and widely used functions in the literature used to communicate complexity guarantees. In this paper we use the functions from Table 1, but the set can be extended without losing the applicability of the proposed method. The considered functions represent complexity classes that appear frequently in real life software systems (Weiss, 2012).

In conclusion, the problem of identifying algorithmic complexity becomes selecting the most appropriate function f(n) from a predefined set, such that f(n) best describes the order of growth of the running time of the analyzed method for increasing input data sizes.


In this paper we introduce an automated approach for identifying the algorithmic complexity of a method in a software system, based on runtime measurements. The basic idea is to measure the execution time of the analyzed method for different input sizes and use these measurements to identify the best fitting complexity function. The proposed approach consists of three steps, described in the following sections.

3.1 Step 1 - Data Collection

Our approach requires multiple executions of the analyzed method with different input data and the recording of the elapsed time for each execution. For this we instrument the analyzed methods and implement code that executes the target methods for different inputs. Using Java profilers we were able to slightly modify the entry and exit points of the targeted methods in order to extract the information we needed, such as execution time and input parameters. It is important to note that, although these changes did impact the execution a bit, they did not alter the original flow of instructions in any way. This approach was chosen since the profiler we used was not only lightweight, adding little overhead to the original application, but also highly reusable, being compatible with any method we needed to analyze.

The result of the measurement step for a given method mtd in a software system is a list of pairs M_mtd = {(t_i, n_i)}_{i=1}^{m}, where m is the number of measurements performed for the method mtd, n_i is the size of the input data for the i-th measurement and t_i measures the elapsed time for running the analyzed method with an input data of size n_i.
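In the paper, data collection is implemented for Java methods through a lightweight profiler; the sketch below is only an illustrative Python analogue of the same idea (the names measured and RECORDS are our own, hypothetical choices, not part of the original tooling):

import time
from collections import defaultdict

# RECORDS maps a method name to its list of (elapsed_ns, input_size) pairs,
# i.e., an approximation of the measurement set M_mtd from Section 3.1.
RECORDS = defaultdict(list)

def measured(size_of):
    # Decorator that records (elapsed time, input size) for every call.
    def wrap(func):
        def inner(*args, **kwargs):
            start = time.perf_counter_ns()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter_ns() - start
            RECORDS[func.__name__].append((elapsed, size_of(*args, **kwargs)))
            return result
        return inner
    return wrap

@measured(size_of=lambda data: len(data))
def insertion_sort(data):
    # Target method under analysis; sorts the list in place.
    for i in range(1, len(data)):
        key, j = data[i], i - 1
        while j >= 0 and data[j] > key:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = key
    return data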
3.2 Step 2 - Data Fitting

Given a set of measurements M_mtd = {(t_i, n_i)}_{i=1}^{m} and a function f(n) from Table 1, we want to identify the best coefficients c1, c2, such that the function fits the actual measured data. For example, for f(n) = c1 * n * log2(n) + c2 we try to find c1, c2 such that t_i ≈ c1 * n_i * log2(n_i) + c2 for every measurement pair. For this purpose we use the non-linear least squares data fitting method (Hansen et al., 2012), where we need to solve:

    minimize_{c1,c2} f(c1, c2) = \sum_{i=1}^{m} (t_i - (c1 * n_i * log2(n_i) + c2))^2

For determining the best values for the coefficients c1 and c2, we used the scipy.optimize library from Python, more exactly the curve_fit function from this library (Scipy, 2019). Implementations of the curve fitting algorithm are available in Matlab and other programming languages and libraries as well.
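As an illustration of this step, a minimal SciPy sketch for the candidate f(n) = c1 * n * log2(n) + c2 (the measurement arrays below are made-up placeholders, not data from the paper):

import numpy as np
from scipy.optimize import curve_fit

def f6(n, c1, c2):
    # Candidate complexity function F6 from Table 1: c1 * n * log2(n) + c2.
    return c1 * n * np.log2(n) + c2

# Placeholder measurements (n_i, t_i); in practice they come from Step 1.
n_values = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
t_values = np.array([2.1e3, 3.2e4, 4.5e5, 5.9e6, 7.1e7])

# Non-linear least squares fit of the coefficients c1, c2.
(c1, c2), _ = curve_fit(f6, n_values, t_values)
print(f"best fit: t ~= {c1:.3f} * n*log2(n) + {c2:.3f}")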
In the definition of the Big O notation, the only requirement regarding the constant C is that it should be positive, but no upper limit is set, since the inequality from the definition should be true for all values of n >= n0. However, in our approach the value n (i.e., the size of the input data) is finite, it has a maximum value, which means that the values of the constants c1, c2 should also be restricted. If we consider the function f(n) = c1 * log2(n) + c2, and the maximum value of n (the maximum input size in the set M_mtd) is 100000, then log2(100000) is approximately 17. By allowing the value of c1 to be 17, we actually transform the function into log2(n)^2.

In order to avoid the problem presented above, we introduce an algorithm to automatically identify the upper bounds for the coefficients. We restrict the values of c1 and c2 to be in an interval [b_l, b_u]. The value of the lower bound, b_l, is set to 0.1 for every function. The value of the upper bound, b_u, is computed separately for every function f(n) from Table 1 and for every set of measurements, M_mtd. For computing the bounds, we search for the maximum value of n in M_mtd, denoted by n_max. We also consider the functions from Table 1 with c1 = 1 and c2 = 0, ordered increasingly by their value for n_max. For every function f(n), we consider the previous function, f_prev(n) (we added the constant function, f(n) = 1, as well, to have a previous function for every function). The value of the upper bound is computed as b_u = p * (f(n_max) - f_prev(n_max)), where p is a value between 0 and 1, denoting the percentage of the difference we want to consider. In all our experiments presented below, the value p = 0.05 was used. If the value b_u is less than 0.1 (the lower bound), we set b_u to 0.2. Table 2 contains the bounds computed for the functions from Table 1 for n_max equal to 10^4. Since we have not presented actual data sets yet, this value was taken just to provide an example for the bounds. In Table 2 only the upper bounds for the functions are given.
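The bound computation can be sketched as follows; with p = 0.05 and n_max = 10^4 this sketch reproduces the upper bounds listed in Table 2 (our own reconstruction of the described algorithm, not the authors' code):

import math

# The candidate functions from Table 1, with c1 = 1 and c2 = 0.
FUNCS = {
    "F1": lambda n: math.log2(math.log2(n)),
    "F2": lambda n: math.sqrt(math.log2(n)),
    "F3": lambda n: math.log2(n),
    "F4": lambda n: math.log2(n) ** 2,
    "F5": lambda n: n,
    "F6": lambda n: n * math.log2(n),
    "F7": lambda n: n ** 2,
    "F8": lambda n: n ** 2 * math.log2(n),
    "F9": lambda n: n ** 3,
    "F10": lambda n: n ** 4,
}

def upper_bounds(n_max, p=0.05, b_l=0.1):
    # Order the functions increasingly by their value at n_max and prepend
    # the constant function f(n) = 1 so every function has a predecessor.
    ordered = sorted(FUNCS.items(), key=lambda item: item[1](n_max))
    values = [1.0] + [f(n_max) for _, f in ordered]
    bounds = {}
    for i, (name, _) in enumerate(ordered):
        b_u = p * (values[i + 1] - values[i])
        bounds[name] = b_u if b_u >= b_l else 0.2  # fallback from the text
    return bounds

print(upper_bounds(10 ** 4))  # e.g. F5 -> ~491.172, F6 -> ~6143.856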
3.3 Step 3 - Select the Best Matching Complexity Function

For a set of measurements M_mtd = {(t_i, n_i)}_{i=1}^{m}, and for every function from Table 1, we computed at Step 2 the parameters c1, c2 such that they represent the best fit for the given function.

The aim of this step is to choose one single function that is the most accurate description of the measurement data and consequently identify the complexity class that the analyzed method belongs to.


In order to determine the best matching function, we compute the RMSE (root mean square error) for every function F from Table 1, considering the values for c1 and c2 from Step 2. We pick, as the complexity class best describing our data, the function F with the minimum RMSE.

Table 1: Complexity classes considered for the experimental evaluation.

Name  Function                     Complexity class
F1    c1 * log2(log2(n)) + c2      log2(log2(n))
F2    c1 * sqrt(log2(n)) + c2      sqrt(log2(n))
F3    c1 * log2(n) + c2            log2(n)
F4    c1 * log2(n)^2 + c2          log2(n)^2
F5    c1 * n + c2                  n
F6    c1 * n*log2(n) + c2          n*log2(n)
F7    c1 * n^2 + c2                n^2
F8    c1 * n^2*log2(n) + c2        n^2*log2(n)
F9    c1 * n^3 + c2                n^3
F10   c1 * n^4 + c2                n^4

Table 2: Examples of bounds for a data set with the maximum input data size 10^4.

Func.  Upper bound    Func.  Upper bound
F1     0.2            F6     6143.856
F2     0.132          F7     4.99 * 10^6
F3     0.478          F8     6.14 * 10^7
F4     8.164          F9     4.99 * 10^10
F5     491.172        F10    5 * 10^14

Table 3: Hidden functions used to generate data sets.

Name  Function                                  Complexity class
HF1   5n^2 + 20                                 n^2
HF2   3n^2 + 7n + 20                            n^2
HF3   sqrt(log2(n)) + 7                         sqrt(log2(n))
HF4   log2(n) + sqrt(log2(n)) + log2(log2(n))   log2(n)
HF5   4log2(n)^2 + 11log2(n) + 25               log2(n)^2
HF6   n^2*log2(n) + 12n^2                       n^2*log2(n)
HF7   9n + 15sqrt(n) + 7                        n
HF8   25n^3 + 2n^2 + 500n + 4                   n^3
HF9   n^2 + 500n*log2(n)                        n^2
HF10  20n*log2(n) + 100n + 3                    n*log2(n)

4 COMPUTATIONAL EXPERIMENTS

In order to assess the effectiveness of the proposed approach, we perform a series of experiments on synthetic data and on actual measurements for different algorithms. The experiments were chosen based on the literature review; existing approaches in the literature use similar test systems. In the first experiment we exemplify the potential of our approach using synthetic measurement data. The second experiment is performed on a small code base with various sorting methods. Similar experiments were performed in the literature in (Wegbreit, 1975) in order to evaluate the effectiveness of their proposed approaches. The scope of the last experiment is to illustrate the potential of the proposed approach beyond identifying the runtime complexity of a method.

4.1 Synthetic Data

The aim of the first experiment is to illustrate the proposed method, to verify its potential and to evaluate possible limitations. We generate multiple sets of synthetic measurement data based on different mathematical functions. The assumption is that the number of steps performed by a given method can be described with a function. The set of functions used for these experiments and the complexity classes they belong to are presented in Table 3. We used 10 different functions in order to simulate multiple possible running scenarios. From now on we will call these functions hidden functions, based on the idea that, while they describe the number of steps performed by an algorithm, in general they are not known.

We generate a separate data set for every hidden function. Each data set contains 50 pairs of values [n, f(n)], where n ranges from 100 to 10 million, evenly distributed in the mentioned interval. Intuitively, every function corresponds to a method in a software system and every generated pair corresponds to a measurement (execute the method and collect input data size and elapsed time).

Using the generated data, we try to predict the runtime complexity of every method; the input data for every experiment is one data set and the output is a function representing the complexity of the associated method.

The first step is to compute the coefficients for every considered complexity class from Table 1 based on the points generated for the hidden functions from Table 3.
For this we used the method described in Section 3.2. The next step is to compute the root mean squared error (RMSE) between the data and every considered complexity function.
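As an illustration, a condensed sketch of the fitting and selection for one synthetic data set (the hidden function HF1 = 5n^2 + 20 from Table 3; only three candidate shapes are shown for brevity):

import numpy as np
from scipy.optimize import curve_fit

# Three of the candidate shapes from Table 1, reduced set for brevity.
CANDIDATES = {
    "n": lambda n, c1, c2: c1 * n + c2,
    "n*log2(n)": lambda n, c1, c2: c1 * n * np.log2(n) + c2,
    "n^2": lambda n, c1, c2: c1 * n ** 2 + c2,
}

# Synthetic data set generated from the hidden function HF1 = 5n^2 + 20.
n = np.linspace(100, 10_000_000, 50)
t = 5 * n ** 2 + 20

best_name, best_rmse = None, float("inf")
for name, func in CANDIDATES.items():
    (c1, c2), _ = curve_fit(func, n, t, maxfev=10000)
    rmse = np.sqrt(np.mean((t - func(n, c1, c2)) ** 2))
    if rmse < best_rmse:
        best_name, best_rmse = name, rmse

print("selected complexity class:", best_name)  # expected: n^2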


The last step is to select the complexity function that best describes the analyzed data set; basically, we pick the one with the smallest RMSE.

We performed the experiment for every data set and our approach is able to correctly identify the complexity class for every considered hidden function from Table 3.

4.2 Sorting Methods Case Studies

The aim of this case study is to classify algorithms into different complexity classes based on measurement data. We measured (using the methodology from Section 3.1) the running time of various sorting methods in order to automatically determine their complexity. We chose this experiment because other approaches in the literature related to the topic of algorithmic complexity use similar examples as case studies.

The analyzed project includes various sorting methods: insertion sort, selection sort, merge sort, quicksort, bubble sort and heapsort, and code that executes those methods for different lists of numbers.

4.2.1 Collecting Measurements

The code will invoke each sorting method for arrays sorted in ascending order, in descending order and with elements in random order, varying the length of the array for each invocation. We have chosen these types of input array orderings because, for many sorting algorithms, they are connected to the best, worst and average case runtime complexity of the algorithm.

We generated the input arrays of numbers in ascending, descending and random order. As presented above, we have considered 6 different sorting algorithms, but for quicksort we considered two different implementations: one in which the first element of the array is always chosen as the pivot (this version will be called quicksort first), and one where the middle element is always chosen as the pivot (quicksort middle). The expected correct results for all sorting algorithms and the 3 input array orderings are presented in Table 4, where asc. means ascending, desc. means descending and rnd. means random.

Table 4: Correct complexity classes for the considered algorithms and input array orderings.

Algorithm        Asc.        Desc.       Rnd.
insertion sort   n           n^2         n^2
selection sort   n^2         n^2         n^2
merge sort       n*log2(n)   n*log2(n)   n*log2(n)
quicksort first  n^2         n^2         n*log2(n)
quicksort mid.   n*log2(n)   n*log2(n)   n*log2(n)
bubble sort      n           n^2         n^2
heapsort         n*log2(n)   n*log2(n)   n*log2(n)

We ran the experiments on two regular laptops with different hardware specifications. A first group of data sets was created on one laptop, denoted by L1, where for each of the above mentioned 7 sorting algorithms (considering both versions of quicksort) and for each of the three input array orderings (ascending, descending and random) we have created four data sets. This gives us a total of 84 L1 data sets.

A second group of data sets was created on a second laptop, denoted by L2, where for each of the sorting algorithms (for quicksort one single version was considered, quicksort first) and for each of the input array orderings, we have created one data set. This gives us a total of 18 L2 data sets.

In order to test the generality of our approach, we have created data sets containing measurements for C++ code as well. For these data sets, the measurements were performed using the Benchmark library (Benchmark, 2016). For each of the 7 sorting algorithms, and for each of the three input array orderings, we have created two data sets, a short one and a long one (the exact number of points is given below). For this group, we have a total of 42 data sets and we call it the BM data sets.

Each of the 144 data sets contains between 31 and 37 pairs of measurements representing the size of the array and the run-time in nanoseconds. The set of 37 different array sizes is S = {s ∈ N | s = i * 10^j, s <= 2 * 10^6, i ∈ N ∩ [1,9], j ∈ N ∩ [2,6]}. The exact number of measurements for every sorting algorithm and group is presented in Table 5. The data sets with less than 37 points do not contain measurements for the last points of the above list, because running the corresponding algorithms would have taken too much time (for example, in case of bubble sort in group L1, the last data point is for an array of length 500000).

Table 5: Number of measurements for data sets.

Group  Sorting method    Number of points
L1     insertion sort    37
L1     selection sort    37
L1     merge sort        37
L1     quicksort first   32
L1     quicksort middle  32
L1     bubble sort       32
L1     heapsort          37
L2     all data sets     36
BM     short data sets   31
BM     long data sets    37
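For illustration only (the paper's harness is written in Java, and the BM group uses the C++ Benchmark library), a Python sketch of the collection loop for one sorting function; the helper name collect is our own:

import random
import time

def collect(sort_func, sizes, ordering):
    # Collect (elapsed_ns, n) pairs for one sorting function and ordering.
    data_set = []
    for n in sizes:
        if ordering == "asc":
            arr = list(range(n))
        elif ordering == "desc":
            arr = list(range(n, 0, -1))
        else:  # "rnd"
            arr = [random.randint(0, n) for _ in range(n)]
        start = time.perf_counter_ns()
        sort_func(arr)
        data_set.append((time.perf_counter_ns() - start, n))
    return data_set

# Illustrative subset of the size set S above: i * 10^j.
sizes = [i * 10 ** j for j in range(2, 5) for i in range(1, 10)]
measurements = collect(sorted, sizes, "rnd")  # built-in sorted() as stand-in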


4.2.2 Determining the Complexity for Sorting Data

We have performed an experiment similar to the one presented in Section 4.1: for every data set we determined the complexity class which is the best match for the points from the data set. We have considered the 10 complexity classes from Table 1.

From the 144 data sets, our algorithm returned the correct complexity class (i.e., the one from Table 4) for 137 data sets, having an accuracy of 95%.

Considering these results, we can conclude that our approach can identify the complexity class based on runtime measurement data with good accuracy.

4.2.3 Data Set Size Reduction

The next step of this experiment is to verify whether the obtained accuracy is maintained even if we reduce the number of recorded measurement samples. From every data set we have randomly removed points: first we removed one point randomly and ran our approach to determine the complexity class to which the remaining points belong. Then, starting from the original data set, we removed two points and determined the complexity class for the remaining points. And so on, until only 5 points remained. We repeated each experiment 100 times and counted how many times the correct result was obtained, which gives the accuracy for the given data set.

For every sorting algorithm and input array ordering (ascending, descending, random), we computed the average accuracy over the 7 data sets (6 in case of quicksort middle) and two of these are presented in Figure 1. The accuracy is computed for data sets with the same number of points left.

Due to lack of space, Figure 1 does not contain all sorting algorithms. We did not include the results for quicksort middle, heapsort, selection sort and merge sort, because for these algorithms the accuracy is almost 100% in each case. We did not include bubble sort either, which is similar to insertion sort.

The two sorting algorithms from Figure 1 show lower accuracy values for some array orderings. In case of insertion sort, descending and random arrays have almost perfect accuracy (above 98% for every case), but for ascending arrays the accuracy is between 83% and 53%. The reason for these decreased values is that for the two ascending BM data sets our approach did not find the correct complexity class for the initial data sets, and it has an accuracy of 0 (for these two data sets) when we start removing points.

Quicksort first is another sorting algorithm for which our approach could not find the correct complexity class in every case, and this is visible in Figure 1 as well. For quicksort first, an incorrect result was returned for the ascending and descending arrays from the L2 group, and for these two data sets the accuracy is constantly 0 when we remove elements.

As expected, the quality and quantity of the measurements influence the accuracy of our approach, but the results are still promising and further work will be done in order to identify bottlenecks.

4.3 Map Case Study

The aim of this case study is to show the potential of the proposed approach to identify non-obvious performance related problems in real life software systems.

The Map (Dictionary) is a widely used data type described as an unsorted set of key-value elements where the keys are unique. For this discussion we will refer to the HashMap implementation available in the Java programming language.

Given an implementation similar to the one from Figure 2 (a HashMap data collection function; the listing is not reproduced in this text), an average software developer would assume an algorithmic runtime complexity of n for targetMethod, where n is the number of keys in the Map. For most use cases this assumption is true. However, we managed to write specially crafted code for which, based on runtime measurements, our approach returned the complexity classes n^2 and n*log2(n), which are worse than the developers' intuition.

While the results may look surprising or erroneous, they are in fact correct. In order to get a complexity class of n^2, we need to make sure that every key has the same hash code, so they will all be added into the same bucket. In this way, finding one key will have a complexity of n.

The n*log2(n) complexity is obtained when the keys are comparable, because in this situation the Java implementation of HashMap will switch the individual buckets to a balanced tree representation whenever the number of elements within exceeds a threshold. Assuming that all elements have the same hash code, searching for a key will have a complexity of log2(n). After a closer look into the HashMap documentation, specifically this change (OpenJDK, 2017), we find out about this lesser known behaviour.

Our approach can be used to develop automated tools that identify performance related problems and indicate possible security issues that can be exploited by malicious users or systems that try to exploit algorithmic deficiencies. While in this experiment we created special code to reproduce the performance issue, measurement data can be collected by other means (Section 3.1).


Figure 1: Average accuracy over the sorting data sets after the removal of points. (a) Insertion sort; (b) Quicksort first. [plots not reproduced]

5 DISCUSSION AND COMPARISON TO RELATED WORK

In Section 4.1 we illustrated experimentally that our approach is able to correctly identify the runtime complexity even from a small number of measurement samples.

Input and output blocking instructions, network communication, as well as multithreaded operations are not specifically handled by our approach, so at first glance one might see this as a problem. However, this is fine considering that our analysis would be performed in a test, where one could mock blocking instructions. Furthermore, even if it was not possible to eliminate these aspects from our analysis, we would still want to be able to measure the performance of the rest of the code.

The running time may not reflect the number of steps performed by the algorithm. For practical reasons the actual runtime is more important, and with the aid of other tools, such as profilers, we can identify the factors that influenced said execution durations.

The most similar approach to the one proposed in this paper is Benchmark (Benchmark, 2016), but it can only be used for computing the complexity of C++ code. However, since it is an open source library, we implemented their method in Python to compare it with our proposed approach. We have used two settings for the experiments: one in which all the complexity classes considered for our approach (the ones from Table 1) were considered, and one experiment in which only those six complexity classes were considered which are used in the Benchmark library as well (this included the constant complexity class, which is not used in our approach).

For the first setting, the Benchmark implementation found the correct complexity class for 55 out of 144 data sets, an accuracy of 38%. For the second setting, the results were better: 83 correctly classified data sets, resulting in an accuracy of 58%. The accuracy of our approach on these data sets was 95%, which is a lot better. Moreover, for the data sets for which our approach did not find the correct complexity class, neither did the Benchmark implementation.

However, Benchmark was developed for C++ code, so an explanation for the poor performance can be the large number of data sets (102 out of 144) containing measurements for Java code. In order to investigate this theory, we measured the accuracy separately for the C++ data sets (generated using Benchmark) and the Java data sets. The accuracies are presented in Table 6.

Table 6: Accuracies for C++ and Java data sets for our approach and the Benchmark implementations.

Method                            C++ data sets  Java data sets
Benchmark, 10 complexity classes  76%            23%
Benchmark, 6 complexity classes   81%            48%
Our approach                      90%            97%

From Table 6 we can see that, considering only the data sets generated for C++ code, the accuracy of the Benchmark implementation is approximately twice as high as for the Java data sets. This suggests that determining the complexity class for Java code is more complicated than determining it for C++ code, and more complex methods are needed for it.
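For context, the complexity fit in the Benchmark library can be approximated as a least squares fit of a single multiplicative coefficient per complexity class, with the smallest residual winning; the sketch below is our simplified reading of that idea, not the library's actual code:

import math

def benchmark_style_fit(ns, ts, shapes):
    # Fit t ~= coef * shape(n) for each candidate and keep the smallest RMS.
    best = None
    for name, g in shapes.items():
        gs = [g(n) for n in ns]
        coef = sum(t * v for t, v in zip(ts, gs)) / sum(v * v for v in gs)
        rms = math.sqrt(sum((t - coef * v) ** 2
                            for t, v in zip(ts, gs)) / len(ns))
        if best is None or rms < best[1]:
            best = (name, rms)
    return best[0]

shapes = {"n": lambda n: n,
          "n*log2(n)": lambda n: n * math.log2(n),
          "n^2": lambda n: n * n}
print(benchmark_style_fit([100, 1000, 10000], [1e4, 1e6, 1e8], shapes))
# -> n^2 for the quadratic-looking timings above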


Other approaches in the literature focus on a specific type of algorithm (such as (Zimmermann and Zimmermann, 1989), dealing with divide and conquer algorithms) or on specific parts of the code (such as loops in (Demontiê et al., 2015)). Our approach is more general: it is not constrained by the type of algorithm used in the analyzed method. Focusing on an entire method has the additional benefit that the instrumented code used to collect runtime data is less costly, since our only concern is the execution time of the method.

6 CONCLUSIONS AND FUTURE WORK

In this paper we have introduced an approach for automatically determining the algorithmic complexity of a method from a software system, using runtime measurements. The results of the experimental evaluation show the potential of our approach. Although the approach has a good accuracy, further improvement is possible by analyzing the particularities of the misclassified examples.

The next step would be automating this whole process and making it readily available to developers. The research and experiments described in this paper serve as solid groundwork for creating such a tool that allows for real time algorithm complexity verification. The need for this functionality becomes clear when one considers the benefits gained, such as the ability to easily identify potential performance issues or misconceptions developers might have when writing the code. The exact manner in which said tool might function still needs analysis and experimentation, but a possible form might be akin to unit tests that are executed within a continuous integration environment.

REFERENCES

Bachmann, P. (1894). Die Analytische Zahlentheorie.

Benchmark (2016). Benchmark library. https://github.com/google/benchmark.

Brockschmidt, M., Emmes, F., Falke, S., Fuhs, C., and Giesl, J. (2014). Alternating runtime and size complexity analysis of integer programs. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 140–155. Springer.

Chapin, N., Hale, J. E., Kham, K. M., Ramil, J. F., and Tan, W.-G. (2001). Types of software evolution and software maintenance. Journal of Software Maintenance, 13(1):3–30.

Chen, Z., Chen, B., Xiao, L., Wang, X., Chen, L., Liu, Y., and Xu, B. (2018). Speedoo: prioritizing performance optimization opportunities. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pages 811–821. IEEE.

Cormen, T. H., Stein, C., Rivest, R. L., and Leiserson, C. E. (2001). Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition.

Crosby, S. A. and Wallach, D. S. (2003). Denial of service via algorithmic complexity attacks. In Proceedings of the 12th Conference on USENIX Security Symposium - Volume 12, SSYM'03, pages 3–3, Berkeley, CA, USA. USENIX Association.

Demontiê, F., Cezar, J., Bigonha, M., Campos, F., and Magno Quintão Pereira, F. (2015). Automatic inference of loop complexity through polynomial interpolation. In Pardo, A. and Swierstra, S. D., editors, Programming Languages, pages 1–15, Cham. Springer International Publishing.

Goldsmith, S. F., Aiken, A. S., and Wilkerson, D. S. (2007). Measuring empirical computational complexity. In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ESEC-FSE '07, pages 395–404, New York, NY, USA. ACM.

Hansen, P., Pereyra, V., and Scherer, G. (2012). Least Squares Data Fitting with Applications. Johns Hopkins University Press.

Knuth, D. E. (1998). The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.). Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA.

Le Métayer, D. (1988). Ace: An automatic complexity evaluator. ACM Trans. Program. Lang. Syst., 10:248–266.

Luo, Q., Nair, A., Grechanik, M., and Poshyvanyk, D. (2017). Forepost: Finding performance problems automatically with feedback-directed learning software testing. Empirical Software Engineering, 22(1):6–56.

McCall, J. A., Herndon, M. A., Osborne, W. M., and States, U. (1985). Software maintenance management [microform] / James A. McCall and Mary A. Herndon, Wilma M. Osborne. U.S. Dept. of Commerce, National Bureau of Standards.

Olivo, O., Dillig, I., and Lin, C. (2015). Static detection of asymptotic performance bugs in collection traversals. In ACM SIGPLAN Notices, volume 50, pages 369–378. ACM.

OpenJDK (2017). HashMap implementation change. https://openjdk.java.net/jeps/180.

Rosendahl, M. (1989). Automatic complexity analysis. In FPCA, volume 89, pages 144–156. Citeseer.

Scipy (2019). Python scipy.optimize documentation. https://docs.scipy.org/doc/scipy/reference/optimize.html.

W. Kernighan, B. and J. Van Wyk, C. (1998). Timing trials, or, the trials of timing: Experiments with scripting and user-interface languages. Software: Practice and Experience, 28.

Wegbreit, B. (1975). Mechanical program analysis. Communications of the ACM, 18(9):528–539.

Weiss, M. A. (2012). Data Structures and Algorithm Analysis in Java. Pearson Education, Inc.

Zimmermann, P. and Zimmermann, W. (1989). The automatic complexity analysis of divide-and-conquer algorithms. Research Report RR-1149, INRIA. Projet EURECA.
