
A Cross-Language Examination of Using Automatic Memoization

Benjamin Cassell
University of Waterloo
[email protected]

Abstract

This paper demonstrates the key abilities of varying languages to support functional recursive descent parsing through automatic memoization. It begins by summarizing the most pertinent works that have dealt with using automatic memoization for the purposes of top-down parsing. Next, this paper provides an in-depth examination of a variety of popular modern programming languages to determine their support for the easy implementation of an automatic memoization mechanism that is sufficiently powerful to aid recursive descent parsing. In particular, two primary types of solutions are examined throughout the various languages: macro-based solutions, and first-class citizen function-based solutions in combination with mutation.

1 Introduction

Memoization has been an important technique in various computationally-intensive environments for decades. Michie, in one of the earliest published works to examine the role of memoization in machine learning, described it as a method akin to the way that humans learn, and extolled the virtues of using memoization to improve performance in computationally-intensive applications [19]. Due to the ability of memoization to prevent unnecessary work on input that has been previously examined, it is a critical component in the successful reduction of computational complexity for a variety of parsing techniques. This review chooses to focus on the benefits of memoization for functional top-down backtracking parsers, which can be greatly improved in a simple, natural way with the intelligent definition of memoized functions [20].

Interestingly, recursive descent parsers gain more than just improved performance from memoization. In fact, recursive descent parsers, when combined with the proper techniques and sufficient memoization abilities, are proven to be able to parse left-recursive languages [18]. This surprising increase in ability provided by memoization is an incredibly potent argument in favour of the technique.

It is important to consider, however, that manual implementation of memoization in a complex parser can consume a considerable amount of time and effort. Regardless of the potential benefits, memoization that is tedious for the programmer and results in unreadable and messy code is far less useful. This alone often serves as a barrier to the use of memoization in fields that require rapid development or prototyping [13].

This is where automatic memoization becomes useful: it allows for all the benefits of memoization with minimal coding effort and loss of maintainability. For parsing, this is particularly important as the grammar of the parser becomes highly complicated or convoluted (for example, the notoriously complicated grammar for the C++ programming language). But this raises the question: is automatic memoization available to coders in most situations? More importantly, is the automatic memoization powerful enough to support context-free parsing (in other words, does it properly memoize recursive functions)?

The contributions of this paper are as follows:

• A brief summary of the automatic memoization techniques introduced by Norvig [20] and Johnson [18] in their respective works, and how they benefit parsing.

• An examination of multiple popular modern programming languages to determine the possibility (and simplicity) of implementing general-purpose automatic memoization in them, which in turn allows for the usage of the parsing techniques demonstrated by Norvig and Johnson.

• A sample implementation of a subset of Scheme in C#, and a working implementation of Norvig and

Johnson's memoized parsing algorithms built on top of it.

• A discussion of the feature differences between the languages that make automatic memoization possible or difficult.

The remainder of this work is structured as follows: Section 2 gives a brief overview of the Johnson and Norvig articles that serve as the basis for this review. Section 3 explores implementations of automatic memoization for use in parsing across a variety of popular modern languages, and section 4 discusses the findings of this paper about the availability of automatic memoization for context-free parsing. Section 5 discusses possible future work, and section 6 concludes the paper.

2 Background and Related Work

As mentioned in section 1, the concept of memoization in computer science is not a new one. The idea of memoizing the answers to well-known problems is ancient, and the application of memoization in computing has been studied since at least 1968, when Michie published an article in Nature [19] examining memoization's usefulness within the context of machine learning algorithms. Since then, a great body of work has focused on using memoization to improve everything from thread scheduling [10] and proof systems [5] to multimedia applications [4].

More recently however, the concept of automatic memoization has become increasingly important. Automatic memoization can be thought of as the implementation of a higher-level memoization paradigm, which allows a programmer to provide memoization benefits to any given function with minimal effort (usually by producing memoized versions of functions using a generic memoizer function, or by marking functions as memoizable using dedicated statements or other language properties).

Automatic memoization is highly valuable to parsing as it allows for rapid prototyping without losing high performance. It allows programmers to focus their efforts on optimizing important special cases [20], and it also prevents code bases from becoming bloated and unmaintainable [13, 14]. For recursive descent parsing in particular, automatic memoization techniques allow functions to continue resembling grammar production rules, even when memoized. This prevents grammar rules from becoming cluttered, and the relationship between a recursive function and its associated grammar rule from becoming blurred.

2.1 Parsing with Automatic Memoization

Although automatic memoization predates Peter Norvig's 1991 work (for example, with a sample produced by Abelson and Sussman six years earlier [3]), it was the first major work to examine automatic memoization within the scope of parsing context-free languages [20]. Norvig demonstrates the complexities associated with providing automatic memoization for functional recursive descent parsing: because grammar rules can be recursive, simply producing a lookup-function generator can result in improperly memoized recursive functions.

Norvig provides an implementation of automatically memoized parsing in Lisp, using the setf function to circumvent issues with recursive function calls. Norvig adds a generic memoizer call on top of functions representing production rules in a sample grammar. Calls to these rules are re-bound to the memoized versions, even in recursive (and mutually recursive) cases. Norvig additionally demonstrates the utility of making the automatic memoization function flexible enough to allow different types of hashing and equality testing on arguments, with a simple change to the hashing and equality algorithms changing the time complexity of a sample parser from quartic to cubic.

Norvig also shows that a similar implementation for automatic memoization would be effective for parsing in Scheme, either through the use of the set! function or through macros. Finally, Norvig demonstrates that other languages, for instance Pascal, could implement automatic memoization through constructs such as a memo keyword (given that the language's parser can be changed). This review contends that this method is too heavy-handed to be useful. As such, we concentrate on demonstrating which languages support automatic memoization through already-provided features and not through explicitly modifying the language core.

2.2 Extending Memoized Parsing

Most typical implementations of recursive descent parsing suffer from similar problems, and foremost among these is the inability of minimally-implemented top-down parsers to recognize left-recursive grammars. Given a left-recursive grammar, naive recursive descent parsers fail to terminate and, without proper supplementation, automatic memoization does nothing to change this. In his 1995 work, Johnson shows how to solve this problem using automatic memoization in combination with continuation-passing style (CPS) [18].

CPS is employed by Johnson to reverse the flow of computation in a top-down parser, resulting in values that travel downwards through the computation instead of being pulled upwards from the bottom. Additionally, Johnson memoizes on arguments to the parser and associates them with lists of continuations. This creates a system in which no grammar function is evaluated on

the same arguments more than once (subsequent evaluations are passed forwards to the continuations memoized in the lookup table). In the case of left recursion, this implies that the recursive cycle is broken after at most one recursive call, preventing non-termination.

Johnson provides a full implementation of both the automatic memoizer and CPS-based parser in Scheme, affirming Norvig's previous conjectures about its feasibility [20]. Johnson requires some manipulation to be able to define mutually-recursive production rules (due to what he refers to as a vacuous deficiency in the Scheme specification), and this highlights an important limitation on languages that do not allow properly mutually-recursive definitions: such languages need to support some combination of function predeclarations or first-class citizen functions with lambda expressions to directly allow for easy implementation of recursive descent parsers. Other more inconvenient methods may be available to languages without these features, for example defining a set of values up front and operating over interpretations of said values, but this defies some of the spirit of functional recursive descent parsing, which is to have functions that resemble the production rules of the grammar.

3 Exploration of Automatic Memoization

This section examines implementations of automatic memoization in various popular modern programming languages. It highlights the techniques used to achieve automatic memoization, whether or not it is sufficient for use in functional context-free parsing, and the benefits and difficulties of the implementation or language. For every language listed here, unless otherwise noted, a functioning code sample is included with the source code that accompanies this paper.

3.1 C#

The most comprehensive examination of automatic memoization performed by this paper was conducted using the C# programming language. This, and all other examinations of .NET languages (C++.NET and VB.NET), were conducted using the .NET 4.0 language SDK and run-time environment.

Automatic memoization in C# has been attempted by others before with success, for example by Wes Dyer in 2007 [11]. Like these previous implementations, the C# version of automatic memoization employed by this work uses the .NET Func templated type, which is a first-class representation of a function (albeit one with specifically typed arguments and a specifically typed return value). The Memoize function takes in a similarly templated Func object, as well as an object implementing the IDictionary interface to use as a memo (allowing for different kinds of caching algorithms, including custom ones). The function returns a lambda expression representing the memoized version of the input function. The implementation of this function can be seen in listing 1.

Of note is that the function must check whether or not the input key is null, as IDictionary types in .NET languages do not operate on null keys. Furthermore, this function only operates on functions that take a single input for ease of implementation. There are, however, several methods that can be used to extend the number of parameters accepted by this function. These methods are discussed in section 4.3. The this keyword qualifying the function fn is syntactic sugar, and defines Memoize as an extension method of the type Func. With this definition, all objects of type Func accepting one value and returning one value can invoke Memoize as if it were a method belonging to their class definition.

Having also defined a templated ConsCell data type representing two values joined by a cons function (as in Lisp or Scheme, based on an implementation by Dustin Campbell [9]), as well as the higher-order functions implemented by Norvig and Johnson in their works, a C# parsing function representing the production rule (S → NP VP) in Johnson's example grammar would resemble the sample in listing 2. The implementations of the higher-order alt and seq functions can be seen in listings 3 and 4. Note that the vacuous macro from the Johnson work is explicitly in-lined, as C# does not support macro expansion. Furthermore, as with Memoize, the seq function is defined as an extension method.

The full implementation of the Norvig parser can be seen in the C# source code that accompanies this work and is available upon request. It also includes a fully functioning example of Johnson's CPS-based parser using automatic memoization written in C#. It is not included in this paper due to space constraints.

3.2 Lisp, Scheme and Racket

As described in section 2, automatic memoization that supports context-free parsing is readily available in both Lisp and Scheme, as shown by Norvig [20] and Johnson [18]. Sample code is provided in Racket using Johnson's Scheme method (which is textually identical, as Racket is just a super-set of the older PLT Scheme). Interestingly, Lisp, Scheme and Racket all support macro expansion, meaning an alternative (but likely inferior) automatic memoization solution using macros that would be able to support context-free parsing is viable.

3.3 Visual Basic .NET

Because VB.NET is a completely Microsoft-defined language (unlike C++.NET), it is able to work natively with

static Func<T1, T2> Memoize<T1, T2>(this Func<T1, T2> fn,
                                    IDictionary<T1, T2> memo)
{
    return key =>
    {
        if (key == null)
            return fn(key);
        T2 value;
        if (memo.TryGetValue(key, out value))
            return value;
        value = fn(key);
        memo.Add(key, value);
        return value;
    };
}

Listing 1: Automatic Memoization in C#
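The usage pattern that makes a memoizer like listing 1 safe for recursive functions is the re-binding trick from section 2.1: the label through which recursive calls resolve is re-bound to the memoized wrapper. A minimal Python sketch of this idea (an illustration written for this summary, with hypothetical names; it is not from the paper's accompanying source code):

```python
# Generic memoizer: returns a wrapper that consults the memo first.
def memoize(fn, memo):
    def wrapper(key):
        if key not in memo:
            memo[key] = fn(key)
        return memo[key]
    return wrapper

def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)

# Re-binding the name "fib" is the crucial step: the recursive calls
# inside the body now resolve to the memoized wrapper, so every
# subproblem is computed only once.
fib = memoize(fib, {})

print(fib(50))  # 12586269025
```

Without the final re-binding, the wrapper would cache only top-level calls while the internal recursion bypassed the memo entirely, which is exactly the failure mode that Norvig's setf-based re-binding avoids.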

static Func<ConsCell<string>, ConsCell<ConsCell<string>>> S =
    Memoize<ConsCell<string>, ConsCell<ConsCell<string>>>(
        input => NP.Seq(VP)(input),
        new Dictionary<ConsCell<string>, ConsCell<ConsCell<string>>>());

Listing 2: Sample Grammar Rule (S → NP VP)

static Func<ConsCell<string>, ConsCell<ConsCell<string>>> Alt(
    this Func<ConsCell<string>, ConsCell<ConsCell<string>>> a,
    Func<ConsCell<string>, ConsCell<ConsCell<string>>> b)
{
    return input =>
    {
        return a(input).Union(b(input));
    };
}

Listing 3: Sample Higher Order alt Function

static Func<ConsCell<string>, ConsCell<ConsCell<string>>> Seq(
    this Func<ConsCell<string>, ConsCell<ConsCell<string>>> a,
    Func<ConsCell<string>, ConsCell<ConsCell<string>>> b)
{
    return input =>
    {
        return Reduce<ConsCell<ConsCell<string>>, ConsCell<ConsCell<string>>>(
            Union, null, b.Map(a(input)));
    };
}

Listing 4: Sample Higher Order seq Function
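The behaviour of alt and seq is easier to see in a compact Python sketch, treating a parser as a function from an input sequence to the set of unconsumed suffixes (an illustrative reconstruction of the combinators' semantics, not code from the paper; the mini-grammar below is hypothetical):

```python
# A parser maps a tuple of tokens to the set of possible remainders.
def word(w):
    def parse(inp):
        return {inp[1:]} if inp[:1] == (w,) else set()
    return parse

# alt: union of the results of both alternatives.
def alt(a, b):
    return lambda inp: a(inp) | b(inp)

# seq: run b on every remainder produced by a, collecting all results.
def seq(a, b):
    return lambda inp: {rest2 for rest1 in a(inp) for rest2 in b(rest1)}

NP = alt(word("john"), word("mary"))
VP = word("runs")
S = seq(NP, VP)  # S -> NP VP

# A complete parse leaves the empty remainder in the result set.
print(() in S(("john", "runs")))  # True
print(S(("runs", "john")))        # empty set: no parse
```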

most of the complex .NET types that C# has access to. This makes automatic memoization supporting context-free parsing very straight-forward to implement, by directly translating the C# version into VB.NET line by line. This translation can be seen in listing 5, as well as a sample application on the Fibonacci sequence. It has all the same associated benefits and drawbacks as the C# version, and can also be turned into an extension method if desired with a small addition.

3.4 C++.NET and C++11

Although it is possible to create a version of automatic memoization that can support context-free parsing in C++.NET using .NET constructs, it would be much more useful if there was a generic version that could also be used without any changes inside C++11. This code would be much more portable and thus applicable across a much greater variety of systems. Happily, such a compromise exists. The sample in listing 6 shows a C++ memoize function that properly handles the automatic memoization of recursive functions in both environments.

Like the C# implementation, the sample given here is templated and accepts a single-argument function, but could be easily modified using the methods in section 4.3 to accept functions with more generic amounts of arguments. It could also be modified to accept a collection-style object like the .NET automatic memoization implementations, instead of simply using the STL map object. Because C++ supports macro expansion through the C preprocessor, a more limited alternative solution could also be constructed using macros, in a manner similar to the solution demonstrated in section 3.8.

3.5 Python

In Python, automatic memoization that handles recursive functions can be provided in a variety of ways. One particularly clean way, as suggested in various online resources [17], is to create a memoized class which, when initialized, attaches an empty associative list and an input function to itself. When the object is invoked, it performs the appropriate memoized lookup. A sample of this implementation style, which is able to handle arbitrary parameters and types, is given in listing 7.

3.6 JavaScript

JavaScript is often an unfairly-maligned language among programmers. Implementations of automatic memoization that can handle context-free parsing in JavaScript are surprisingly simple, elegant and easy to understand. The implementation of memoize given in listing 8 is an example provided by Colin Ihrig [15]. It is functionally equivalent to the C# example, but as with the Python example can support functions with arbitrary numbers and types of arguments.

3.7 Java

Whereas most other languages examined in this work succeeded in some way or another, Java falls short, unfortunately. Although a solution for direct automatic memoization exists, as shown by Tom White's 2003 implementation [24], it does not properly memoize recursive functions. The technique employed by White is to create a memoization class, which has an invocation handler for other interfaces. The programmer subsequently creates an object which adheres to a desired interface, and memoizes it by passing it to the memoization class. Any function calls to interface-bound members of the returned object are then routed through the memoization class first.

Because only the first call to the object passes through the memoized layer, recursive calls and calls to other functions are not properly memoized. To make matters worse, White's solution is likely the best automatic memoization available for Java in its current release. Java does not treat functions as first-class citizens, does not offer lambda expressions, and does not natively support macro expansion. Because of these limitations, there is likely no easily accessible Java solution for true automatic memoization that is appropriate for use in context-free parsing. Of course, a manual solution is always possible, and probably not burdensome enough to warrant avoiding Java as a language if other circumstances support its usage.

3.8 C

Automatic memoization in C is tricky. While C does provide function pointers, they are little more than the memory addresses that functions live at post-compilation and provide little of use in terms of automatic memoization. What C does provide, on the other hand, is the powerful C preprocessor, which can be used to provide automatic memoization macros as demonstrated in listing 9. This is a very basic but functional example, and something more user-friendly and generic could almost certainly be created given more time and thought. For the sake of brevity, the hash table implementation is excluded from this document.

The memoize macro takes in several values, the first of which is a hash table initialization function. This function in turn accepts a pointer to the hash table (allowing it to be initialized on the stack or allocating new memory with malloc if the pointer is null), as well as the

Public Function Memoize(Of T1, T2)(fn As Func(Of T1, T2),
        memo As IDictionary(Of T1, T2)) As Func(Of T1, T2)
    Return (Function(key As T1) As T2
                If (IsNothing(key)) Then
                    Return fn(key)
                End If
                Dim value As T2
                If (memo.TryGetValue(key, value)) Then
                    Return value
                End If
                value = fn(key)
                memo.Add(key, value)
                Return value
            End Function)
End Function

Dim Fib As Func(Of UInteger, ULong) =
    Memoize(Function(n As UInteger) As ULong
                If (n < 2) Then
                    Return n
                End If
                Return Fib(n - 2) + Fib(n - 1)
            End Function, New Dictionary(Of UInteger, ULong)())

Listing 5: Automatic Memoization in VB.NET

template <typename ArgType, typename ReturnType>
function<ReturnType(ArgType)> memoize(function<ReturnType(ArgType)> fn)
{
    map<ArgType, ReturnType>* memo = new map<ArgType, ReturnType>();
    return ([=](ArgType key) -> ReturnType {
        if (memo->find(key) == memo->end())
            (*memo)[key] = fn(key);
        return (*memo)[key];
    });
}

function<unsigned long(unsigned)> fib;
fib = memoize<unsigned, unsigned long>([&](unsigned n) -> unsigned long {
    return ((n < 2) ? n : fib(n - 2) + fib(n - 1));
});

Listing 6: Automatic Memoization in C++

class Memoize:
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}

    def __call__(self, *args):
        if args not in self.memo:
            self.memo[args] = self.fn(*args)
        return self.memo[args]

def Fib(n):
    if n < 2:
        return n
    return Fib(n - 2) + Fib(n - 1)

Fib = Memoize(Fib)

Listing 7: Automatic Memoization in Python
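As a side note (an addition to this summary, not part of the original text), Python 3.2 and later also ship an automatic memoizer in the standard library: functools.lru_cache. Because decorating re-binds the function name at definition time, recursive calls are routed through the cache automatically, which is exactly the property required for parsing:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, like the class-based sample
def fib(n):
    if n < 2:
        return n
    return fib(n - 2) + fib(n - 1)

print(fib(100))  # 354224848179261915075
```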

function memoize(func) {
    var memo = {};
    var slice = Array.prototype.slice;
    return function() {
        var args = slice.call(arguments);
        if (args in memo)
            return memo[args];
        else
            return (memo[args] = func.apply(this, args));
    };
}

function fib(n) {
    if (n < 2)
        return n;
    return fib(n - 2) + fib(n - 1);
}

fib = memoize(fib);

Listing 8: Automatic Memoization in JavaScript
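One caveat of the sample above (an observation added here, foreshadowing section 4.4) is the cache key: `args in memo` implicitly stringifies the argument array, so distinct argument lists with the same string form share a cache entry. A Python sketch keyed on the tuple of arguments avoids that collision:

```python
def memoize(fn):
    memo = {}
    def wrapper(*args):
        # Tuples hash by value, so 1 and "1" get distinct cache entries,
        # unlike string-coerced keys where both would map to "1".
        if args not in memo:
            memo[args] = fn(*args)
        return memo[args]
    return wrapper

@memoize
def add(a, b):
    return a + b

print(add(1, 2))      # 3
print(add("1", "2"))  # 12 (string concatenation, cached separately)
```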

byte size and number of keys that will be stored in the hash table. The next arguments accepted by memoize are hash table functions to determine if a key exists in the hash table and to return a value based on its key. It further accepts the name of the key, the type of value being returned by the function, and the number of values the hash table should retain. When successful, the memoize macro statically allocates a hash table pointer to heap-initialized memory for the function being memoized. Calls to the function are then checked for hash table residence before the remainder of the function is executed.

Because the C implementation relies on macros, it also requires the use of an additional memo_return macro, which replaces any return statements that should have their values memoized. The memo_return macro accepts the function used to set values in the hash table, the name of the key in the function, and the expression whose value should be returned. This macro expands to the appropriate statements that hash and return the value of the expression. Although the memoize and memo_return macros sound complicated, in practice they are very easy to use and provide the full benefits of automatic memoization.

3.9 Haskell

Due to time constraints, this work has not performed an in-depth examination of the Haskell language for its compatibility with automatic memoization, and thus no accompanying source code is provided. That said, there is no reason to believe that Haskell should not be able to support automatic memoization which is appropriate for use in context-free parsing, as the Template Haskell language extension provides even more powerful and expressive macro support than the C preprocessor.

Other sources have claimed moderate success using Template Haskell to create automatically memoized functions in Haskell that support recursion [8], and although these results have not been independently confirmed by this work, the scope of the claims being made is entirely reasonable. In the cited example, Mike Burrell reports that a first pass at an automatic memoization macro that supports recursive functions works as intended, but consumes far too much memory. As such, further work would be needed to ensure that memory use does not become a concern when automatically memoizing functions for the purposes of context-free parsing in Haskell.

4 Discussion

Almost all of the languages examined in this work demonstrate similar abilities which allow for the types of declarations necessary to implement automatic memoization and recursive descent parsers with functions that directly parallel the production rules of a grammar. These language features, roughly summarized, are:

• Function predeclarations or lambda expressions, allowing for the definition of mutually-recursive functions representing grammar rules.

• First-class functions and the ability to re-bind a label to a newly generated function or value (mutation).

• If mutation or higher-order functions are lacking, sufficient preprocessor power to be able to add memoization boilerplate code wrapping the contents of a function.

Despite most of the discussed languages sharing some or all of these features, there are a few language-specific properties that render several of the explored languages less attractive for usage in recursive descent parsing.

4.1 Tail Call Optimization

One particularly pressing problem is the issue of tail and sibling call optimization. A successful parse using a recursive descent parser can require a significant amount of recursive calls to complete (CPS style, for example, can potentially recurse without unwinding any stack frames until final completion), and if an environment is incapable of properly providing tail call optimization this can drastically limit the usability of recursive descent parsers implemented in that environment. That said, such optimizations may also harm a memoized application if they eliminate tail recursion altogether: if a tail-recursive call is improperly optimized out (for example, turned into a modified loop), this could prevent an application from taking advantage of memoized results.

Most of the languages examined in this work have either poor or no support for tail call optimization. While languages like Scheme and Haskell provide explicit, well-defined and predictable tail call optimization, many languages like Java do not. For directorial, historical or other reasons, some languages do not support tail call optimization and never will without a significant change in development focus and principles. Python's creator, as an example, is known to be particularly resistant to the idea of adding tail call optimization to Python's language specification [22, 23].

Potentially even worse than knowing that tail call optimization will not happen in a memoization-friendly way is being unsure of whether or not it will. Although many implementations of C allow for some form of tail or sibling call optimization, in gcc (for example) this decision

#define MAX_FIB 16384 // The maximum size to use for the hash table

#define memoize(init_fn, contains_fn, get_fn, key_name, ret_type, num_memo_vals) \
    static hashtable* memo = NULL;                                                \
    ret_type memo_ret;                                                            \
    if (memo == NULL)                                                             \
        memo = init_fn(NULL, sizeof(ret_type), num_memo_vals);                    \
    if (contains_fn(memo, key_name))                                              \
        return get_fn(memo, key_name);

#define memo_return(set_fn, key_name, ret_expr) \
    memo_ret = ret_expr;                        \
    set_fn(memo, key_name, memo_ret);           \
    return memo_ret;

unsigned long fib(unsigned int n)
{
    memoize(hashtable_init, hashtable_contains, hashtable_get, n,
            unsigned long, MAX_FIB);
    if (n < 2)
        return n;
    memo_return(hashtable_set, n, fib(n - 2) + fib(n - 1));
}

Listing 9: Automatic Memoization in C

is left to the compiler (at levels of optimization O2 and higher) [1, 2], which can result in highly variable optimization decisions.

Func<int, int> Fib = null;
Fib = (n => n > 1 ?
    Fib(n - 1) + Fib(n - 2) : n);

Listing 10: Stack Overflow in C#

Func<int, int> Fib = null;
Fib = (n => n > 1 ?
    Fib(n - 2) + Fib(n - 1) : n);

Listing 11: Insignificant Change, Significant Gains

Managed environments suffer as well. All .NET languages compile into an intermediate language called MSIL, which is just-in-time (JIT) compiled by the .NET run-time environment. None of these MSIL compilations include the .tail instruction that induces tail call optimization [7]. Worse still, the JIT compiler for .NET programs may or may not dynamically add .tail instructions at program run-time, ensuring that programmers do not know whether or not their programs will overflow the stack. The code sample in listing 10 is a very simple C# lambda expression that can cause stack overflows even when memoized. Here, assuming that the JIT compiler will perform the appropriate optimization leads to failure cases.

The sample code in listing 10 was run for about 10,655 recurses before it crashed. This number was not static, and slightly different values were obtained with varying stability based on the performance of the JIT compiler. This sample further highlights an important aspect of programming recursive functions in languages that do not provide adequate tail call optimization: simply reversing the order that the recursive calls to Fib are made in (shown in listing 11) reduces the use of the call stack by half for equivalent inputs. It can be generalized from this that recursively programmed functions should be intelligently catered towards early and shallow termination when possible, depending on environmental limitations.

Despite the grim prognosis for tail call optimization in a large subset of languages, some other languages are improving. JavaScript, which is becoming substantially more popular thanks to the adoption of HTML5, traditionally did not support tail call optimization. With the introduction of the newest JavaScript standard however, this is reported to be changing [6]. Even with a complete lack of tail call optimization, it is still possible to provide the illusion of optimized tail recursion through the use of trampolining in languages that support higher-order functions [16]. Other techniques can also allow

for the illusion of trampolining in languages that do not cleanly support first-class citizen functions (for example, with function pointers, setjmp and longjmp in C).

4.2 Macro Support

Lack of macro support can limit the development of simple automatic memoization, or at least make it more difficult. Currently, VB.NET does not support macro expansion, and neither does C#, by design choice [12]. Regardless, both languages support automatic memoization through their implementations of first-class functions and lambda expressions. Other languages, like C and Haskell, have a much easier time supporting automatic memoization through the use of macro expansion than through any other means.

As mentioned in section 3.7, Java currently lacks support for functions as first-class citizens. Because it also lacks macro expansion capabilities, Java is incapable of easily providing true automatic memoization support, and is perhaps, as a result, an inappropriate language choice for recursive descent parsing. This is due to change with the release of Java 8, however, which is expected to introduce both functions as first-class citizens and lambda expressions [21].

One potential solution for macro-less languages is to write macros using a text-based macro implementation from another language, passing any necessary source code through the external macro handler before compiling or interpreting. This, however, is clunky and inconvenient - a poor substitute for language-native support of paradigms enabling automatic memoization.

4.3 Variadic Argument Support

Although variadic argument support - the ability for functions in a language to accept and work on variable amounts and types of parameters - is not required for enabling context-free parsing through automatic memoization, it is highly convenient and prevents the programmer from having to define multiple memoization functions for varying numbers and types of arguments. VB.NET, C++.NET, C++11 and C#, the languages whose examples in this paper did not use variadic arguments, did not omit them for lack of capability. All of these languages support some form of variadic arguments, as does Java. C# and VB.NET both have the ability to operate over parameter lists (params and ParamArray respectively), and accepting a parameter list of type object would allow a user to pass in arbitrary numbers of arguments of any type. C++.NET supports similar functionality through templated array lists in function definitions. Lastly, C++11 supports a generic Args... type, which is nearly identical to the .NET parameter lists in practice.

An alternative strategy for a language that does not support variadic arguments can come in different forms: In an object-oriented language, a wrapper class containing variable amounts of elements can be used as the sole parameter to functions (for example, the tuple type available in C++ and all .NET languages). In a language where all types inherit from a common object type, collections containing the object type can be used to emulate this as well. Templating the memoization function is also an effective strategy for accepting generic argument types. Finally, function overloading can be used to create multiple memoization functions accepting different amounts of parameters up to a sensible finite limit. These overloaded functions can simulate true variadic functions for most useful purposes.

4.4 Limitations of Automatic Memoization

It should be noted that there are several important limitations to automatic memoization. First and foremost, automatic memoization is only useful if the results of a function are deterministic based on input, or failing this, if results that are simply "good enough" are sufficient. Automatic memoization of procedures that can return varying values (for example, based on user input) is most likely not useful. See section 5 for a discussion of how automatic memoization can be taken advantage of in a system where values are acceptable if they are simply "good enough".

Automatic memoization can also incur penalties in terms of both computation time and space if used ineffectively. A poor caching strategy can result in inefficient lookups and slow performance. Mistakes in how items are cached can result in degenerate behaviour (in some languages, for example, mistakenly indexing on the addresses of objects instead of a stringified representation of an arbitrary object will cause all lookups to result in misses). Furthermore, automatic memoization can actually slow down fast-running functions and should not be applied without careful consideration.

Finally, unless there is a way to flush an automatically memoized function's cache, it can continue to grow and consume space indefinitely. This is easy enough to circumvent by forcing the programmer to include an externally referenced cache object when memoizing a function, but that also implies that the burden of knowing how and when to flush the cache is on the programmer. A programmer must fully examine whether or not it is useful to spend memory memoizing results: Ignoring the closed form of the algorithm, a Fib function that will only ever be used to list terms in successive order should not be memoized, as it can be implemented using constant space. On the other hand, if the Fib function is to be accessed arbitrarily, memoization will provide significant wins.
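To make these points concrete, the ideas above (variadic arguments, an externally referenced cache that the caller is responsible for flushing, and the Fib example) can be sketched in Python, one of the capable languages examined in this work. The memoize helper and its cache attribute below are illustrative, not the paper's C# implementation:

```python
def memoize(func, cache=None):
    """Memoize a variadic function. The caller may supply an external
    dict-like cache so that it can be inspected, shared, or flushed."""
    if cache is None:
        cache = {}

    def memoized(*args):
        # The argument tuple is the cache key, so arguments must be
        # hashable values; keying on object identity instead would
        # cause the degenerate all-misses behaviour described above.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    memoized.cache = cache  # expose the cache for external flushing
    return memoized


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


fib = memoize(fib)   # rebinding the name routes recursive calls through the cache
print(fib(30))       # 832040, computed in linear rather than exponential time
fib.cache.clear()    # the burden of deciding when to flush falls on the caller
```

Rebinding the name fib is what lets the recursive calls themselves hit the cache, which is the "memoize recursive functions properly" requirement for recursive descent parsing.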

5 Future Work

This research has opened up several interesting possibilities for me in future work. The first does not stem directly from automatic memoization for context-free parsing, but rather from the mini-recreation of Scheme in C#. The recreation used for the purposes of this paper was very ad hoc and limited. It was also not as well-engineered as it could have been. Having learned some important lessons during the implementation of this project, I would like to re-examine creating an elegant and intelligent solution that better represents Scheme inside C# and that simplifies ad-hoc development and prototyping.

A second possibility for future work, which is directly related to context-free parsing, is adding automatic parallelization to the Norvig and Johnson parsers. With the provided C# implementation, it should be fairly straightforward to parallelize certain aspects of parsing. In particular, the alt function could spawn two threads which simultaneously handle both parsing paths. A thread that fails or succeeds could use futures or raise events to inform other thread groups of acquired information. In the case of constructing unions (as is required for both seq and alt), the unions could be built in parallel by adding results to thread-safe collections.

The .NET environment makes integrating parallelization into automatically memoized context-free parsers especially easy: The implementation of automatic memoization provided by this work in C# already allows for a generic object adhering to the interface IDictionary to be used for memoization. Because .NET provides a thread-safe dictionary type (ConcurrentDictionary) that implements this interface, making the memoization thread-safe, and thus parallelization-compatible, is as easy as passing an instance of ConcurrentDictionary in when memoizing functions. Automatically parallelizing the parsing techniques in this implementation would be a large step towards making it as powerful and useful as similar parallelized parsing libraries available in languages like Racket.

One final possibility for future work is examining the applications of automatic memoization in cloud and internet-based computing. As previously mentioned in section 4, when results that are simply "good enough" are considered acceptable, programmers gain lenience on the functions and procedures to which they may apply memoization. One could imagine a cloud-based ad serving system in which a programmer could memoize ad information based on consumer preferences. In such a system, the advertisement data could be stored along with datestamps, and the automatic memoization function could be modified to take the staleness of the returned information into account when serving it. If the cached information is considered sufficiently fresh or otherwise "acceptable", it would be returned. Otherwise, the cached copy would be updated to a value fetched from a remote location.

6 Conclusion

Recursive descent parsers are straightforward, effective, and have the added benefit of being defined such that their functions closely resemble the grammar rules that they represent. Unfortunately, typical recursive descent parsers can be very slow, and in naive implementations they do not terminate on left-recursive grammars. Automatic memoization can fix both of these issues, drastically improving performance and providing termination on left-recursive grammars using Johnson's 1995 CPS techniques.

It has been shown in this work that a significant number of modern languages support adequate facilities to provide automatic memoization for context-free parsing (primarily, the ability to memoize recursive functions properly). These languages include C, C#, JavaScript, VB.NET, Racket, Python, and all modern versions of C++. Although Haskell was not directly examined, there is sufficient evidence to reasonably expect it to behave similarly. These languages achieve this automatic memoization in two primary ways: either through lambda expressions and first-class citizen functions combined with mutation, or through the use of macro expansion.

Unfortunately, some languages, even though they may support some form of automatic memoization, cannot currently support automatic memoization that handles recursive descent parsing. Java is one such language, although it should be updated in the near future to have full support for recursive automatic memoization. Furthermore, there are scenarios where automatic memoization can be harmful if used carelessly, and as such it should be approached with careful consideration.

Automatic memoization is a fascinating topic, with a wide array of applications in and beyond context-free parsing. It is heartening to see that the kinds of techniques it requires are becoming a regular focus in most programming languages. With potential applications from this work in terms of cloud-based programming as well as the automatic parallelization of context-free parsing, one hopes that automatic memoization continues to be increasingly considered as an appropriate and effective method for improving the performance and power of algorithms and applications.
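The staleness-aware caching proposed for the cloud ad-serving scenario in the future work discussion can be sketched as follows. This is a minimal illustration in Python, not part of the paper's implementation; the memoize_with_ttl helper, the fetch_ad stand-in, and the 60-second freshness threshold are all hypothetical:

```python
import time


def memoize_with_ttl(func, max_age_seconds):
    """Memoize func, but treat cached entries older than max_age_seconds
    as stale and recompute them. A sketch of serving cached results only
    while they are considered "good enough"."""
    cache = {}  # args -> (timestamp, value)

    def memoized(*args):
        now = time.time()
        if args in cache:
            stored_at, value = cache[args]
            if now - stored_at <= max_age_seconds:
                return value        # cached copy is still fresh enough
        value = func(*args)         # stale or missing: fetch anew
        cache[args] = (now, value)
        return value

    return memoized


# Hypothetical stand-in for a remote ad-server lookup.
def fetch_ad(consumer_id):
    return "ad-for-%s" % consumer_id


get_ad = memoize_with_ttl(fetch_ad, max_age_seconds=60.0)
print(get_ad("alice"))  # computed and cached
print(get_ad("alice"))  # served from the cache while fresh
```

In a real system the staleness threshold would be tuned per data source, trading freshness of advertisement data against the cost of remote fetches.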

7 Acknowledgements

I would like to thank Professor Prabhakar Ragde in addition to my supervisor Tim Brecht, as well as my other colleagues and friends at the University of Waterloo. I would additionally like to thank all the authors whose code samples and insights acted as the base for this work. Without their contributions this would not have been possible.

References

[1] GCC: Options that control optimization. http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html.

[2] Tail recursion. http://c2.com/cgi/wiki?TailRecursion, January 2012.

[3] Abelson, H., and Sussman, G. J. Structure and Interpretation of Computer Programs. MIT Press, Cambridge, Massachusetts, 1985.

[4] Alvarez, C., Corbal, J., and Valero, M. Fuzzy memoization for floating-point multimedia applications. IEEE Transactions on Computers 54, 7 (2005), 922–927.

[5] Beame, P., Impagliazzo, R., Pitassi, T., and Segerlind, N. Memoization and DPLL: Formula caching proof systems. In Proceedings of 18th IEEE Annual Conference on Computational Complexity (July 2003).

[6] Benvie, B. JavaScript (ES6) has proper tail calls. http://bbenvie.com/articles/2013-01-06/JavaScript-ES6-Has-Tail-Call-Optimization, January 2013.

[7] Broman, D. Enter, leave, tailcall hooks part 2: Tall tales of tail calls. http://blogs.msdn.com/b/davbr/archive/2007/06/20/enter-leave-tailcall-hooks-part-2-tall-tales-of-tail-calls.aspx, June 2007.

[8] Burrell, M. Memoizing Haskell. http://www.onjava.com/pub/a/onjava/2003/08/20/memoization.html, April 2007.

[9] Campbell, D. Building data out of thin air. http://diditwith.net/2008/01/01/BuildingDataOutOfThinAir.aspx, January 2008.

[10] Cui, H., Wu, J., Che Tsai, C., and Yang, J. Stable deterministic multithreading through schedule memoization. In Proceedings of OSDI 2010 (October 2010).

[11] Dyer, W. Function memoization. http://blogs.msdn.com/b/wesdyer/archive/2007/01/26/function-memoization.aspx, January 2007.

[12] Gunnerson, E. Why doesn't C# support #define macros? http://blogs.msdn.com/b/csharpfaq/archive/2004/03/09/86979.aspx, March 2004.

[13] Hall, M., and Mayfield, J. Improving the performance of AI software: Payoffs and pitfalls in using automatic memoization. In Proceedings of Sixth International Symposium on Artificial Intelligence (Monterrey, Mexico, September 1993).

[14] Hall, M., and McNamee, J. P. Improving software performance with automatic memoization. Johns Hopkins APL Technical Digest 18, 2 (1997), 254–260.

[15] Ihrig, C. Implementing memoization in JavaScript. http://www.sitepoint.com/implementing-memoization-in-javascript/, August 2012.

[16] Jack, S. Bouncing on your tail. http://blog.functionalfun.net/2008/04/bouncing-on-your-tail.html, April 2008.

[17] Jason. Re: What is memoization and how can I use it in Python? http://stackoverflow.com/a/1988826, July 2010.

[18] Johnson, M. Memoization in top-down parsing. Computational Linguistics 21, 3 (1995), 405–417.

[19] Michie, D. Memo functions and machine learning. Nature 218 (1968), 19–22.

[20] Norvig, P. Techniques for automatic memoization with applications to context-free parsing. Computational Linguistics 17, 1 (1991), 91–98.

[21] Oracle. Lambda expressions. http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html, October 2013.

[22] van Rossum, G. Final words on tail calls. http://neopythonic.blogspot.com.au/2009/04/final-words-on-tail-calls.html, April 2009.

[23] van Rossum, G. Tail recursion elimination. http://neopythonic.blogspot.com.au/2009/04/tail-recursion-elimination.html, April 2009.

[24] White, T. Memoization in Java using dynamic proxy classes. http://www.onjava.com/pub/a/onjava/2003/08/20/memoization.html, August 2003.
