Retrofitting Memoization: An Exploratory Study
Joël Ledelay
University of Twente
PO Box 217, 7500 AE Enschede
the Netherlands
[email protected]

ABSTRACT
ActFact is a software development company which delivers Enterprise Resource Planning (ERP) as part of a Software as a Service (SaaS) product. As part of their effort to improve the scalability of their software stack, they wanted to investigate the possibilities of using memoization techniques to reduce unnecessary repetition of expensive function calls, such as database access and evaluation of expressions. This project aims to investigate different memoization techniques, in order to determine which of them best suits the company's needs. The comparative performance of these methods has been measured, as well as the implications that implementing each of these techniques would have on the rest of the system. Based on these metrics, a recommendation was made to employ method level memoization as outlined in this paper.

Keywords
Memoization, caching, optimization, Java, scalability

1. INTRODUCTION
1.1 System overview
ActFact [8] is a software development company which delivers Enterprise Resource Planning (ERP) software as part of a Software as a Service (SaaS) product. Their ERP software is also named ActFact. It consists of a core system, written in Java, with a number of extension modules. These modules are customized for each customer. One customer could require a specific financial management module, while another customer could instead want to run a web shop which is connected to the ERP software seamlessly.

New functionality can be added directly in the ActFact user interface (UI), by defining the structure (column names, types etc.) for database tables and windows which display and alter the data in these tables. This makes it simpler for non-experts to develop new modules, since less knowledge about databases (SQL queries, for instance) is required to create a functional extension to the system.

The server side of the ActFact UI implementation can, like most web applications, be thought of as a request pipeline. A request is received, processed and a response is created and shipped off.

Operations involving input and output (I/O), specifically database access, and those performing parsing and evaluation of expressions take a significant part of the time needed to process a typical request, and are thus a major contributor to the experienced UI performance.

In some cases, these operations are unnecessarily repeated while processing a particular request, in some rare cases up to hundreds or even thousands of times. These cases were discovered through code analysis.

When performance issues like these are corrected, it is not unusual for them to reappear as a result of a seemingly small, unrelated change to the UI code. Avoiding or mitigating these issues typically involves duplicated state, leading to a more complex architecture and code base, and more defects. These defects were reported by users, and the root cause was determined to be these ad hoc performance corrections.

1.2 Terminology and research goals
In this research, there are a number of terms which are used extensively. They are explained here.

A cache, in the context of this research, consists of a key-value map. This map contains method calls (method name and list of parameters) and maps them to method call results (the return value of the method). Retrieving data from this cache is faster than recomputing the result in the case of expensive function calls. Therefore, using the cache can increase system performance.

Memoization is a software engineering technique in which results of method calls are stored in a cache. Before running a memoized method, this cache is checked to see if the method has already been called with these arguments. If this is the case, the stored result is returned instead. If the result is not in the cache, the method is run, and the result is stored in the cache.

ActFact aims to improve scalability and reduce the code complexity of the ActFact software stack. These performance issues must be corrected if scalability is to be improved. Reducing code complexity and reducing the number of defects requires replacement of these ad hoc solutions by a more robust solution which is less complex for the developer.

In order to achieve this, ActFact would like to investigate the possibility of adding a memoization implementation on a request scope level. This possibility is investigated through analysis of three possible ways of implementing memoization, namely an ad hoc approach, a generalised class level approach and a generalised method level approach.

2. RESEARCH QUESTIONS
The possibilities for adding a memoization implementation are investigated on the basis of the following research question.

RQ1: How can unnecessary repetition of expensive function calls, such as database access or expression parsing, be avoided?

This research question can subsequently be split into multiple sub-questions:

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
34th Twente Student Conference on IT, Jan. 29th, 2021, Enschede, The Netherlands. Copyright 2021, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.
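The cache-then-compute flow described in the terminology of section 1.2 can be sketched in a few lines of Java. All names below are illustrative and not taken from the ActFact code base:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of memoization: a key-value cache mapping arguments
// to previously computed results. Illustrative names only.
public class MemoExample {
    private static final Map<Integer, Long> cache = new HashMap<>();
    static int computations = 0; // counts how often the real work runs

    // Stand-in for an expensive, side-effect-free computation.
    static long expensiveSquare(int n) {
        return cache.computeIfAbsent(n, k -> {
            computations++;          // only executed on a cache miss
            return (long) k * k;
        });
    }
}
```

Calling expensiveSquare(7) twice performs the computation once; the second call is answered from the cache.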

1

RQ1.1 Where does unnecessary repetition of expensive function calls occur within the ActFact system?
RQ1.2 How to ensure that correctness is not compromised?
RQ1.2.1 How can it be proven that the expression parsing will return the same result with similar (but not equal) inputs?
RQ1.3 How to ensure that minimal code changes are needed in the existing code?

These research questions were chosen to help identify areas which could benefit from memoization and different methods of implementing memoization. They also help identify performance metrics, such as execution time, and the implications for the rest of the system.

3. RELATED WORK
Memoization is a technique which has had significant research done into it in recent years. For instance, J. Mertz et al. [1] recently published a paper in which they provide an overview of the benefits and challenges of applying memoization. They identified three main tasks which are required to apply memoization, which have been used as a guideline for performing this research. These steps are choosing which computations to cache, implementing caching decisions as caching logic and defining a consistency strategy.

S. Wimmer, S. Hu et al. [7] developed a tool which automatically memoizes pure, recursive functions – that is, functions which have no side-effects – and provides proof of equivalence. Since this is only applicable to pure, recursive functions, this tool cannot be applied here, since the functions investigated are not free of side effects.

In recent years, research into approximate memoization techniques has been performed [5]. These techniques sacrifice a precise result for performance gains. While promising for many applications in which there may not be a single best result, such as a search result or displaying trending posts on social media, this is unfortunately not applicable to this project, since correctness is paramount here. This is the case because ActFact runs administrative software, in which returning an approximately correct result is simply not an option.

P. Prabhu et al. [2] developed a tool which is employed in the context of search algorithms. In search algorithms, in many cases, different solution paths are looped over. If memoization is applied, parallelized access to the memoization data structure is often the main performance bottleneck. This is solved through development of a tool which provides access to a partial view of the contents of this data structure. This way, parallelized access to this data structure is made possible, increasing performance. While this might be an interesting future step for optimization of the memoization algorithms investigated in this paper, implementing such an optimization is out of scope for this project.

Memoization is widely used across multiple languages. For instance, there is built-in support for memoization in Prolog, in the form of tabling [9]. Other functional languages, such as Haskell [10], offer functionality to memoize functions as well.

In Java, memoization can be done in multiple ways. For instance, one could apply ad hoc memoization, by memoizing each method separately [6]. Alternatively, the entire class may be memoized, but this could create significant overhead for classes which contain both methods with high execution times, which are called relatively infrequently, as well as methods with low execution times which are called often. Instead, a generic memoization technique can be applied [11], though this requires changes at every location the methods in this class are called.

While libraries exist for memoization in object-oriented languages [12], it is quite difficult to identify methods which can benefit from memoization, as these methods are often complex. Furthermore, analysing whether any side effects occur in these methods can be time-consuming and error-prone. Research has been done into tools which serve to aid developers in identifying which methods may benefit from memoization [3]. Identification of such methods is, however, only the first step in implementing memoization.

In an effort to aid the developer in detecting methods which may benefit from memoization and providing insight as to how to implement memoization for these methods, a tool was developed [4], which performs dynamic analysis and finds methods which might be overlooked by other analysis tools. This tool was not used, since the company already gave pointers towards which methods could benefit from memoization. These pointers were sufficient to investigate the possibilities of retrofitting memoization in the company's software architecture.

4. METHODOLOGY
A number of steps were identified through which this research is conducted. They are detailed in this section.

4.1 Researching the ActFact code base
4.1.1 Investigating repetition of expensive methods
It must be clear where unnecessary repetition of expensive function calls occurs within ActFact. This is needed for two reasons. If it is not known where unnecessary repetition of expensive function calls occurs, the implications of developing a solution cannot be investigated. Furthermore, a suitable solution cannot be developed if it's unclear where in the system it should be used.

As a means to find out where this repetition occurs, code analysis is performed on specific parts of the system, which were identified by the company as potential performance bottlenecks.

4.1.2 Proving correctness with similar inputs
Since expression parsing contributes significantly to the experienced performance, this section focuses on those methods which perform this expression parsing, namely evaluateLogic(Evaluatee source, String logic).

In many methods, not all fields and methods of all arguments are used. Because of this, differences in the arguments might not always cause a change in output. This means that if some fields of some arguments are changed, the output could be unchanged. If this is harnessed effectively, the stored cache keys don't need to store all fields of the key Object, but only the ones which affect the output of the method. By doing this, the keys in the cache can cover multiple different possible inputs, increasing the number of covered inputs with the same cache size, while decreasing the amount of space taken for a cache of the same size.

In ActFact, some of the user session state is stored in a class named Ctx. This class contains a list of contextual values, such as identifiers for which user is logged in, which window is currently opened and many more such values. This list does not have a set size and there are many optional values. Furthermore, these values may change relatively often.

2
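Mutable session state such as the Ctx values described above, which "may change relatively often", is exactly what makes naive caching risky. The sketch below reproduces that failure mode with illustrative names; it is not ActFact code:

```java
import java.util.HashMap;
import java.util.Map;

// A memoized method whose argument is mutated after its result is
// cached: the cache keeps serving the stale result.
public class StaleCacheDemo {
    static class Input {       // mutable argument; identity-based equals/hashCode
        int field;
        Input(int field) { this.field = field; }
    }

    private static final Map<Input, Integer> cache = new HashMap<>();

    // Memoized method M: its result depends on a mutable field of X.
    static int m(Input x) {
        return cache.computeIfAbsent(x, in -> in.field * 2);
    }
}
```

After m(x) caches the result for x, mutating x.field does not invalidate the entry, so m(x) keeps returning the old value.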

Many classes within ActFact contain a field with the Ctx type. Among those classes is the Evaluatee class. Code analysis will be performed on the evaluateLogic method in which expressions are parsed, with the goal of identifying whether cache keys for this method can be optimised.

4.2 Researching different methods
In this section, the basic requirements for implementing memoization are discussed, for each of the memoization techniques investigated. The memoization techniques investigated are ad hoc memoization, class level memoization and method level memoization, as shown in Figure 1. These techniques are further illustrated and explained in their respective sections.

In short, ad hoc memoization alters the calling code, inserting caching logic in the method(s) which are to be memoized.

Class level memoization creates a dynamic proxy, which works similarly to the target instance, with one difference: all method calls on that class are intercepted, and caching logic is added. For the caller, it is as though they are calling the methods on the original object, but all methods in the class are memoized.

Method level memoization is similar to class level memoization. It also creates a dynamic proxy which intercepts all method calls. However, it only applies caching logic to specific whitelisted methods, whereas the other methods are called as they normally would be.

Figure 1. Memoization techniques

4.2.1 Requirements
In a key-value cache, cache consistency is an important topic to consider. Consider the following example, illustrated in Figure 2.

Figure 2. Cache inconsistency. Expected result is b.

Method M, an expensive method, is being memoized. On one of the runs of M, it receives X as an argument. M(X) is run, and the result, a, is stored in the cache. Now, a change is made to X, such as changing a field of X. Now M(X) would return b if computed again. If M(X) is called now, a will be retrieved from the cache, though the correct answer would be b.

There are a few solutions to this problem. One option is to invalidate cache entries after a certain amount of time has passed. This is not an option, since cases can still happen where incorrect results are returned, which is not acceptable.

Another option is to invalidate cache entries when the stored arguments are altered. This requires monitoring all the locations where the stored Object may be altered. This is a viable option for classes in which the only way to alter a field is through setters. However, this requires the corresponding getters to return an immutable copy of the field, which is not the case in ActFact's code base.

A third option is to store immutable copies of the arguments which are to be cached. This requires creating a (deep) copy of the argument, which in itself is an expensive operation and is not supported by all classes. Furthermore, this requires the arguments' equals method to be suitable for memoization – in particular, for a function f(x) and two arguments x1 and x2, if f(x1) ≠ f(x2), then x1.equals(x2) should return false. An example of a case in ActFact where this is not satisfied is given below. Because of this, this option is not suitable.

In the case of evaluateLogic, one of the arguments, Evaluatee, has a subclass, PO, which has an equals method which is not suitable for memoization. This equals method does not compare the Ctx field, so if two nearly identical PO objects with differing Ctx fields are compared, they are considered equal, but the evaluateLogic method might return different results. Since this equals method cannot be replaced, another solution is needed.

The option that will be investigated is storing immutable (partial) copies of the key object. These copies should contain all relevant fields and implement an equals method which compares these relevant fields. An implementation for this is proposed.

4.2.2 Ad hoc memoization of n-argument methods

Figure 3. Ad hoc memoization

Ad hoc memoization of n-argument methods is the first technique to memoize method calls. The implication of this is that the calling code must be altered. Rather than running the method and returning the value, the cache is first checked to see

3
whether the input argument(s) are already in the cache, as shown in Figure 3. This means that on subsequent runs with the same input argument, the result does not have to be recomputed, but can be retrieved from the cache instead.

The implication of implementing memoization in this manner is that the calling code is altered, leading to more complex and difficult-to-maintain code.

4.2.3 Generic class level memoization

Figure 4. Class level memoization

Memoization is implemented on the class level by defining an interface which contains all methods of the class. At the place where the class is initialised, the class should be initialised through a dynamic proxy instead.

For the calling code, it looks as though they are calling methods on the target interface, but under the hood, the proxy intercepts all method calls on this interface, allowing addition of functionality such as memoization, as shown in Figure 4. By doing this, no other changes in calling code are required, and all caching code is stored in a centralized location.

A major downside to this technique is that all methods inside the target class are memoized, including the methods with side effects, such as setters.

4.2.4 Generic method level memoization
Since the generic method level memoization is heavily based on the class level memoization technique, it is very similar in behaviour, aside from one main difference. Rather than memoizing every method in the target class, only specific methods are memoized, which are inserted into a list of 'whitelisted' methods. If one of these methods is encountered, the cache is consulted, similar to the class level memoization. Otherwise, the method is called as normal. An illustration of this is shown in Figure 5.

Figure 5. Method level memoization

4.3 Testing environment
These memoization implementations are compared to one another by testing them in a testing environment.

The testing environment consists of a simple scenario with two methods. One method, expensiveMethod(int a), has a runtime of around 100 ms. This method is a model of methods within ActFact which have long execution times, such as I/O operations or methods which perform parsing and evaluation of expressions. The second method, cheapMethod(int a), has a short runtime, performing only a simple arithmetic operation. This method is meant to model methods which are encountered in ActFact which have short execution times.

In the test cases, a mix of methods with long and short execution times will be called, with varying input parameters. In the context of ActFact, it is expected that the majority of method calls have a short execution time, with relatively few expensive method calls. Because of this, the number of expensive method calls is much smaller than the number of cheap method calls in the tests.

Though these methods will not perfectly model the situation in ActFact, it is expected that they will serve to indicate comparative performance between the memoization techniques, and as such are a useful tool to analyse which memoization technique will best suit ActFact's needs.

Testing directly on the ActFact code base is avoided here, because otherwise, the requirements outlined in section 4.2 would have to be implemented in order to perform these tests. A JUnit [13] test class was written which tested the comparative performance of the three memoization techniques.
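The whitelist-and-delegate behaviour of method level memoization can be sketched with java.lang.reflect.Proxy. This is an illustrative sketch, not the MethodMemoizer used in the paper; for brevity the whitelist here is a set of method names rather than Method objects:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MethodMemoSketch {
    public interface Sample {
        long expensive(int a);   // whitelisted: should be memoized
        long counter();          // side effect: must not be memoized
    }

    public static class SampleImpl implements Sample {
        public int expensiveCalls = 0;
        private long count = 0;
        public long expensive(int a) { expensiveCalls++; return (long) a * a; }
        public long counter() { return ++count; }
    }

    // Wraps target in a dynamic proxy; only whitelisted methods consult the cache.
    @SuppressWarnings("unchecked")
    public static <T> T memoize(T target, Class<T> iface, Set<String> whitelist) {
        Map<List<Object>, Object> cache = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            if (!whitelist.contains(method.getName())) {
                return method.invoke(target, args);   // pass through untouched
            }
            List<Object> key = new ArrayList<>();
            key.add(method.getName());                // method name + arguments form the key
            if (args != null) Collections.addAll(key, args);
            if (!cache.containsKey(key)) {
                cache.put(key, method.invoke(target, args));
            }
            return cache.get(key);
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface }, handler);
    }
}
```

Non-whitelisted methods are intercepted but delegated unchanged, so side-effecting methods such as counter() keep their normal behaviour.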

4
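The immutable partial-copy keys proposed in section 4.2.1 (the Memoizable/MemoizedKey design detailed in section 5.2.1) are left unimplemented in this paper. One possible shape, offered purely as an assumption, is:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// One possible shape of the proposed Memoizable/MemoizedKey design.
// Entirely an assumption: the paper leaves the generic implementation open.
public class MemoKeySketch {
    public interface Memoizable {
        MemoizedKey toKey();   // immutable partial copy of the relevant fields
    }

    // Immutable key holding only the fields that affect the method's output.
    public static final class MemoizedKey {
        private final List<Object> relevantFields;

        public MemoizedKey(List<Object> relevantFields) {
            this.relevantFields =
                    Collections.unmodifiableList(new ArrayList<>(relevantFields));
        }

        @Override public boolean equals(Object o) {
            return o instanceof MemoizedKey
                    && relevantFields.equals(((MemoizedKey) o).relevantFields);
        }

        @Override public int hashCode() {
            return relevantFields.hashCode();
        }
    }
}
```

Two keys built from the same relevant fields compare equal even if the source objects differ in fields that do not affect the output – the property needed to treat near-identical Evaluatees as equal.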

5. RESULTS
5.1 Researching the ActFact code base
5.1.1 Investigating repetition of expensive methods
Identification of locations where unnecessary repetition occurs was done by contacting the supervisor from the company. They pointed towards a situation which is very common in ActFact, in which unnecessary repetition (dozens of times) of one method, evaluateLogic, was observed. Code analysis showed that this method could benefit from memoization. Furthermore, another method, parseLogic, was identified which could also benefit from memoization, potentially increasing the performance further. This analysis was done by inspecting the code of evaluateLogic and replicating the repetition by running through one of these scenarios.

5.1.2 Proving correctness with similar inputs
While analysing the code for the evaluateLogic method, it was discovered that the Ctx values contained within the Evaluatee do affect the output of the method, but only if these specific values were requested in the expression which is to be parsed. This means that the cache key can be abstracted to only contain specific Ctx values, decreasing the size of the cache entry and allowing multiple different Evaluatees with very similar Ctx fields to be treated as equal, if the differing Ctx values do not appear in the expression. However, distinguishing between Ctx values which are used and those that are not requires parsing the expression String. This is considered a relatively expensive operation.

5.2 Researching different methods
5.2.1 Requirements
The idea behind the solution proposed in this section is to have an immutable copy of the source object. The immutable copy should then be used as the cache's key. It should be possible to compare keys of the same type to each other.

An interface Memoizable was proposed, in which an abstract class MemoizedKey is contained. This interface should be implemented by the classes which are passed as arguments to memoizable methods. The interface should contain a single toKey method, which converts the source instance to an immutable instance with the MemoizedKey type.

The abstract class MemoizedKey should contain a method to compare keys. One of the main challenges here is constructing the MemoizedKey objects in such a way that comparison between two objects is possible in a generic manner. This is yet to be implemented in a generic, reusable manner, and is one of the challenges that has to be overcome in order to retrofit memoization into ActFact's software architecture.

In this research, it is assumed that all input is immutable, and can therefore be copied directly into the cache. The company wishes to use memoization on a request scope level. Therefore, it is unlikely that the arguments passed as parameters will be altered during a request.

The called method and its input parameters are combined into a List which is used as the cache's key. Since the Objects contained in this list are assumed to be immutable, the problem of cache consistency is nullified.

5.2.2 Ad hoc memoization
Implementation of the ad hoc memoization technique was done as follows. First of all, a cache needs to be created, which maps input arguments to output values.

private Map<List<Object>, Object> cache = new HashMap<>();

Then, the method which is to be memoized is altered, such that this cache is used.

private ReturnType expensiveMethod(int a, boolean b, SomeClass c) {
    ArrayList<Object> args = new ArrayList<>();
    Collections.addAll(args, a, b, c);
    return (ReturnType) cache.computeIfAbsent(args, list -> {
        // previous method content
        // ...
        return value;
    });
}

Inside the computeIfAbsent block, the input parameters passed along to the method are also available, so this wrapping code is the only thing needed to implement memoization in this case.

For memoization to be implemented for a second method with similar input types, a second cache should be added to the class. In principle, for methods with different input parameter types, the same cache could be re-used, although this exacerbates the problem of maintainability of the code.

5.2.3 Class level memoization
Class level memoization is implemented using a dynamic proxy. On a conceptual level, the proxy 'pretends' to be the target interface. When a method is called on this instance, the proxy intercepts the method call and is able to alter the functionality of this call, for instance by adding logging or time measurement code, or in the case of this project, memoization. This is illustrated in Figure 4.

Instantiation of these proxy objects is simple. Rather than instantiating the target class directly, it is wrapped in a memoize call and cast to an interface which contains all methods which are to be called on this object.

SampleInterface foo = (SampleInterface) ClassMemoizer.memoize(new SampleClass(1, "foo"));

Afterwards, this foo Object can be used similarly to an instance of SampleClass, since SampleInterface would have definitions of each of the methods in SampleClass. The major downside of this is that it would clutter up the code base with interfaces for each class requiring memoization.

ClassMemoizer.memoize returns an Object, which contains all methods available in SampleClass, but these methods cannot be invoked directly, since the returned type is Object. In order to invoke any method, the returned Object has to either be cast to an interface (as above), or the corresponding Method has to be found first, as such:

Object foo = ClassMemoizer.memoize(new SampleClass(1, "foo"));
Object value = null;
try {
    Method expensiveMethod = foo.getClass().getMethod("expensiveMethod", int.class);
    value = expensiveMethod.invoke(foo, 1);
} catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
    e.printStackTrace();
}

This is clearly not ideal, since anywhere an implemented method is called, a try/catch block is needed. It's much more concise and simple to cast the resulting Object to an interface which contains all methods implemented by SampleClass.

Another downside of this approach is that in this case, cheap methods or methods with side-effects are also cached, again resulting in potentially incorrect results. A solution is proposed

5
to this problem, which is presented in the following memoization method.

5.2.4 Method level memoization
Method level memoization is functionally identical to class level memoization, aside from one key difference. Rather than memoizing all methods within a class, only specific methods are selected to be memoized. When other, non-memoized methods are called, the method calls are intercepted, but then the original method is called anyway. For memoized methods, the caching logic is used, as shown in Figure 5.

First, the whitelist is created. In the example implementation, this is done by filtering the methods of the SampleInterface on some properties, such as method name and/or parameter/return types. Then, this array of whitelisted methods is passed along to the constructor.

Method[] whitelist = Arrays.stream(
        SampleInterface.class.getMethods()).filter(m -> {
    String name = m.getName();
    Class<?>[] types = m.getParameterTypes();
    return name.equals("aMethod")
            || name.equals("bMethod")
            && types[0].equals(int.class)
            && types[1].equals(String.class);
}).toArray(Method[]::new);
SampleInterface foo = (SampleInterface) MethodMemoizer.memoize(new SampleClass(1, "foo"), whitelist);

Ideally, it would be possible to annotate memoizable methods, such that the developer only needs to add an @Memoizable annotation in order to enable memoization of the method.

5.3 Test results
Each test consisted of 1,000 runs of the expensive method, and 1,000,000 runs of the cheap method. There was no significant difference between the runtime of 1,000 expensive runs with no runs of the cheap method and the runtime of 1,000 expensive runs with 1,000,000 runs of the cheap method, the differences being so small that they could not reliably be distinguished from the random variation of runtime between runs, despite there being 1,000 times as many runs of the cheap method.

Performance testing indicated that all three memoization techniques are very similar in performance, aside from a few cases. If the cache size is larger than the number of distinct inputs, all three techniques are near identical in performance. If the cache size is smaller than the number of distinct inputs, the ad hoc memoization has superior performance, because no limitation on cache size was implemented there. However, since the size of the ad hoc cache is not restricted and managed, this technique suffers from the drawback that it is less space efficient – it does not take into account memory limits, and its performance will decrease as the cache grows, since the time taken to look up a cache record increases with the size of the cache.

The class level memoization suffers from one major drawback when compared to ad hoc memoization and method level memoization, namely that it results in unexpected behaviour when presented with non-pure methods. One such example would be a method which reads and increments the value of a given field and then returns it. On each call of the method, a different result should be returned. When this method is cached, it results in changed program behaviour.

The performance of these three memoization techniques was tested by running a number of different runs, with differing cache sizes and input range for each run. Here, the input range is defined as "the size of the set from which input is randomly drawn". In practice, this meant that an integer was drawn from a uniformly distributed sequence from 0 (inclusive) to the upper bound (exclusive).

Figure 6: Performance comparison of different memoization techniques at a cache size of 100, with 1000 runs.

Figure 6 shows that the performance of each memoization technique is strongly dependent on how many possible distinct inputs are generated, as expected. Ad hoc memoization vastly outperforms the other memoization techniques when the maximum cache size is vastly different from the number of distinct inputs, since it does not limit the size of its cache like the other techniques do.

The class level and method level memoization techniques perform similarly. This is as expected, since they use the same technique, with the minor difference that the method level technique adds one if-statement which checks that the called method is in the whitelist.

Figure 7 shows that as the cache size approaches the number of test runs (1,000), the performance of the class level and method level memoization techniques approaches that of the ad hoc memoization. This shows that when configured correctly, class level and method level memoization do not suffer from performance penalties compared to an ad hoc approach, while they gain benefits in the form of space efficiency.

6

Figure 7: Performance comparison of different memoization techniques for differing cache sizes and input ranges

6. CONCLUSIONS

Based on the results of the requirements identified for each of the methods and the performance and correctness testing, it is clear that the class level memoization technique is not suited for use within ActFact systems. The classes in ActFact do not only contain pure methods, but also methods which are intended to have side-effects. This technique eliminates those side-effects, which means the program's behaviour would change if it were implemented.

For the ad hoc memoization technique, a different issue arises, namely maintainability. Since this method requires the developer to change the calling code to incorporate memoization, it increases code complexity, reducing the maintainability and readability of the code. While its performance is good, these negatives outweigh the benefits of implementing this method compared to the current situation in ActFact, where several different ad hoc performance solutions already exist. The main difference between the current situation and the situation where ad hoc memoization is applied is that there is a single solution, rather than all sorts of separate solutions. The benefits are too small to justify retrofitting this technique into the existing code base.

This leaves the method level memoization technique. This technique suffers from the drawback that it requires interfaces which contain all the methods usually called on the memoized object. In the current architecture of the system's application programming interface (API) layer, which is composed of base interfaces to which developers add functionality by creating classes that implement these interfaces, this would mean that an additional interface has to be added for each class, containing all the methods which developers added. Developers would then have to define their methods in multiple places, which is error-prone and undesirable, since it reduces the maintainability of the system. Furthermore, the objects which are to be memoized are currently not constructed in a central location, which means additional code would have to be written to manage the construction of memoized objects. This code would need to be present at every location where objects to be memoized are constructed, which further increases system complexity, reducing maintainability and readability of the code.

However, a new API which is meant to replace the PO layer is currently under development, and it addresses, among other things, the issues just described. The new API includes automatically generated interfaces for these developer-added features, which removes the need for additional interfaces to be added by hand. Furthermore, the new API also contains an object builder, which constructs instances of the different objects. Since this builder is the only place in the new API where objects are constructed, it is an ideal place to apply method level memoization. In order to do this, however, the process of constructing a whitelist of methods to memoize must be automated, since these builders are automatically generated by the new API. One possible way to do this is to annotate each memoizable method with an @Memoizable annotation, which can be parsed by the API generator and converted into the method whitelist for the method level memoization. Another possibility is to define which methods are memoizable through the ActFact UI: since most of the system is defined through database definitions, these definitions can be extended to include a list of memoizable methods.

Because method level memoization requires no changes in the code where business logic is developed, it has minimal impact on the complexity of the code which developers commonly work on.

7. DISCUSSION

7.1 Research questions

While a grounded recommendation is made as to which memoization technique, if any, to apply, some of the research questions have not been fully answered.

The question of where unnecessary repetition of (expensive) method calls occurs within the ActFact system was only partially answered. While some possible areas of exploration were identified (database access and expression parsing), due to time constraints the decision was made to focus on the main goal of this research, rather than performing a broad exploration of exactly which methods would benefit from memoization. The goal of this research was never to implement and integrate these techniques into the live system of ActFact within the allotted time.

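The whitelist automation proposed in the conclusions, parsing an @Memoizable annotation into the method whitelist, can be sketched with plain reflection. The annotation name comes from this paper; the Repository example class and its methods are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public final class WhitelistBuilder {

    // The marker annotation suggested in section 6: the API generator (or, as
    // here, plain reflection) turns it into the whitelist of memoizable methods.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Memoizable {}

    // Collect the names of all public @Memoizable methods of a class.
    public static Set<String> whitelistOf(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getMethods()) {
            if (m.isAnnotationPresent(Memoizable.class)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    // Hypothetical example: only the pure lookup method is marked as memoizable.
    public static class Repository {
        @Memoizable
        public String lookup(int id) { return "row-" + id; }

        public void save(String row) { /* side-effecting: must not be memoized */ }
    }

    public static void main(String[] args) {
        System.out.println(whitelistOf(Repository.class)); // prints [lookup]
    }
}
```

The alternative mentioned in the conclusions, a UI-defined list backed by a database table, would simply replace the reflection scan as the source of the same `Set<String>`.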
The areas which were identified already provided a few challenging points which had to be addressed.

Research question 1.2, regarding correctness, was similarly not answered fully. While correctness is paramount in the administrative software ActFact runs, developing a solution for ensuring correctness proved to be more complex than initially assumed. Addressing research question 1.2.1 similarly proved too complex to model and solve in the proof of concept version. Since the goal of this research was to explore the implications and possibilities of retrofitting memoization, this is no great setback. This research question was aimed at an optimisation of the basic caching strategy, and was therefore not absolutely required for the purpose of investigating the feasibility of retrofitting memoization into the existing system.

While the analysis described in section 4.1.2 yields interesting results, those results are highly specific to the method which is investigated. In a highly general solution, this is not easy to implement in a clean way. Since this project's intent was to explore the possibilities and feasibility of different solutions, further optimisations to the cache are out of scope for this project. However, these kinds of optimisations are an interesting focus for future research on this topic. In particular, dynamic analysis of optimisations like this could yield useful tools for optimising cache performance.

The question of how to ensure that minimal code changes are needed in the existing code has been mostly answered. Some additional changes will be required to ensure correctness, as explained in section 4.2.1, though for the most part, the changes needed will most likely be made in the API layer, which is not commonly worked on by developers. This means that the impact for the developers will still be minimal.

7.2 Research validity

In this research, a number of assumptions were made in order to make the project feasible to implement in the allotted time. These assumptions could affect the validity of this research. For instance, the assumptions made about the runtime of expensive methods are likely to be inaccurate: it was assumed that an expensive method runs for 100 ms. If expensive methods in ActFact run in far less time, this might affect the observed performance of the memoization techniques. A better indication of the observed performance of the current system could have helped in this regard, though this was not done due to time constraints: existing ad hoc solutions would first have had to be identified and removed to obtain a proper indication of the non-optimised performance of these expensive methods, which proved infeasible in the available time.

Other assumptions, such as the assumed number of runs of expensive vs. cheap methods, could similarly be made more accurate by profiling the existing system. As the assumed runtime of these methods is already inaccurate, though, rectifying this alone does not solve the problem of inaccurate performance measurements; the runtime assumption for the expensive method would have to be improved as well. These performance measurements were designed to verify that the techniques did indeed improve performance, and to give an indication of the effects of cache size and input range on performance. They were never intended to be a fully accurate representation of memoization performance on ActFact's systems.

Aside from assumptions, there was also a slight bias towards the method level memoization. The company which offered this project prefers general solutions over ad hoc solutions in most cases, stating to their developers that they prefer work which is done right over work which is done quickly.

There are possible improvements to be made over the simple ad hoc solution presented in this paper which could have made it more suitable for use within ActFact systems. In an ad hoc solution like this, the managing code could be extracted to a separate class, reducing the impact of this solution on code complexity, and other changes, such as bounding the cache size, could also have been applied. These additions and changes were not made, partially due to the company's view of ad hoc solutions and partially due to time constraints. Because of this choice, the ad hoc solution was presented as a much inferior solution, although the actual difference between the solutions in terms of suitability might be much smaller. Still, since the company prefers highly generalised solutions over ad hoc solutions, the method level memoization technique is expected to remain the more suitable choice.

7.3 Remaining challenges

There are still a number of challenges to be overcome before method level memoization can be applied to ActFact code. These challenges are outlined below.

First of all, correctness must be ensured. An interface similar to the one proposed in section 4.2.1 could be implemented. Alternatively, the (entire) cache could be invalidated when the state has changed sufficiently, since this cache is meant to be used at the request scope level: when a different request is handled, the previous contents of the cache are no longer valid. In order to prove that this preserves correctness, it must first be proven that within a request scope, the objects which are cached are not changed while they are in the cache.

Secondly, the challenge of integration remains. Since a new API is under development, the PO layer of ActFact is gradually going to be replaced by the new API. There are two options here. The first option is to integrate this memoization technique into both the old and the new API. This is, however, not ideal: implementing this technique in the old API would require changes to a large part of the PO implementation, which increases code complexity. Furthermore, it would require extra investment in something which is to be replaced in the foreseeable future. It seems more efficient to integrate deployment of this technique into the deployment of the new API. This has the benefit of not requiring much extra work compared to the changes already required to implement the new API, while adding additional benefits in terms of performance.

Another challenge related to the smooth integration of this technique is the (partial) automation of method selection and code generation. If this method is to be integrated into the new API, it would be ideal if the methods which are to be memoized could be annotated with an @Memoizable annotation and then automatically added to the whitelist of methods to memoize. An alternative would be to integrate the selection of these methods into the existing system, by defining a database table which stores references to the methods which can be memoized.

If these challenges are addressed, the method level memoization technique would be a beneficial addition to the ActFact architecture, and one which can be added with relative ease.
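The request-scope invalidation idea from section 7.3 can be sketched as follows. The ThreadLocal storage, the beginRequest hook, and all names are assumptions made for illustration; this is not ActFact's dispatcher.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of request-scoped memoization: the entire cache is invalidated when a
// new request starts, so entries can never leak across request boundaries.
public final class RequestScopedCache {

    private static final ThreadLocal<Map<Object, Object>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    // Hypothetical hook, to be called by the request dispatcher at the start of
    // every request: drops all cached results of the previous request.
    public static void beginRequest() {
        CACHE.get().clear();
    }

    // Memoize compute(key) within the current request scope.
    @SuppressWarnings("unchecked")
    public static <K, V> V memoized(K key, Function<K, V> compute) {
        return (V) CACHE.get().computeIfAbsent(key, k -> compute.apply((K) k));
    }

    public static void main(String[] args) {
        int[] realCalls = {0};
        beginRequest();
        memoized(42, id -> { realCalls[0]++; return "result-" + id; });
        memoized(42, id -> { realCalls[0]++; return "result-" + id; }); // hit
        beginRequest(); // new request: the whole cache is invalidated
        memoized(42, id -> { realCalls[0]++; return "result-" + id; }); // recomputed
        System.out.println(realCalls[0] + " real computation(s)"); // prints "2 real computation(s)"
    }
}
```

As noted above, proving this correct still requires showing that cached objects are not mutated within a single request scope; the sketch only addresses staleness across requests.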


8. REFERENCES
[1] Jhonny Mertz, Ingrid Nunes, Luca Della Toffola, Marija Selakovic, and Michael Pradel. 2020. Satisfying Increasing Performance Requirements with Caching at the Application Level. IEEE Softw. (October 2020). DOI:https://doi.org/10.1109/ms.2020.3033508
[2] Prakash Prabhu, Stephen R. Beard, Sotiris Apostolakis, Ayal Zaks, and David I. August. 2018. MemoDyn: Exploiting weakly consistent data structures for dynamic parallel memoization. In Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, Institute of Electrical and Electronics Engineers Inc., New York, NY, USA, 1–12. DOI:https://doi.org/10.1145/3243176.3243193
[3] Hugo Rito and João Cachopo. 2010. Memoization of methods using software transactional memory to track internal state dependencies. Proc. 8th Int. Conf. Princ. Pract. Program. Java, PPPJ 2010 (2010), 89–98. DOI:https://doi.org/10.1145/1852761.1852775
[4] Luca Della Toffola, Michael Pradel, and Thomas R. Gross. 2015. Performance problems you can fix: A dynamic analysis of memoization opportunities. Proc. Conf. Object-Oriented Program. Syst. Lang. Appl. OOPSLA (2015), 607–622. DOI:https://doi.org/10.1145/2814270.2814290
[5] Georgios Tziantzioulis, Nikos Hardavellas, and Simone Campanoni. 2018. Temporal Approximate Function Memoization. IEEE Micro 38, 4 (July 2018), 60–70. DOI:https://doi.org/10.1109/MM.2018.043191126
[6] Tom White. 2003. Memoization in Java Using Dynamic Proxy Classes - O'Reilly Media. Retrieved November 21, 2020 from https://web.archive.org/web/20150908034250/http://www.onjava.com/pub/a/onjava/2003/08/20/memoization.html?page=1
[7] Simon Wimmer, Shuwei Hu, and Tobias Nipkow. 2018. Verified Memoization and Dynamic Programming. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 579–596. DOI:https://doi.org/10.1007/978-3-319-94821-8_34
[8] ActFact Cloud Business Software | De ERP voor bedrijfsprocessen. Retrieved November 27, 2020 from https://www.actfact.com/
[9] SWI-Prolog -- Manual. Retrieved November 21, 2020 from https://www.swi-prolog.org/pldoc/man?section=tabling
[10] Memoization - HaskellWiki. Retrieved November 21, 2020 from https://wiki.haskell.org/Memoization
[11] Do it in Java 8: Automatic memoization - DZone Performance. Retrieved November 21, 2020 from https://dzone.com/articles/java-8-automatic-memoization
[12] com.google.common.cache (Guava: Google Core Libraries for Java 17.0 API). Retrieved November 21, 2020 from https://guava.dev/releases/17.0/api/docs/com/google/common/cache/package-summary.html
[13] JUnit 5. Retrieved November 28, 2020 from https://junit.org/junit5/
