Realizing C++11 Lambda Expression in Open64

Javed Absar, AMD, India, Richmond Road, Bangalore, +91 9901880710
Anitha Boyapati, AMD, India, Richmond Road, Bangalore, +91 7795334034
Dibyendu Das, AMD, India, Richmond Road, Bangalore, +91 9448537014
ABSTRACT
C++11 is the latest edition of the C++ programming language standard by ISO. It replaces C++03. It includes a number of core language extensions, probably the most interesting of which is the inclusion of lambda-expressions. The onus is now upon compiler writers – gcc, llvm, Open64 and others – to incorporate this extension into their existing compiler infrastructure so that programmers can benefit from this powerful language feature. Incorporating lambda-expressions is not a straightforward extension for a compiler. It needs a good understanding of the C++11 standard and of the many intricate uses and misuses of this language feature in programs. In this paper, we analyze lambda expressions from a language-feature perspective, the value they provide to programmers and how Open64 could support them.

Categories and Subject Descriptors
D.3.4 [Programming Languages]: Processors – compilers.

General Terms
Algorithms, Design, Languages and Theory.

Keywords
lambda expression, anonymous function, C++11, C++0x, closure, higher order function, Open64, WHIRL, compiler.

1. INTRODUCTION
C++11 [1] was approved by the ISO in August 2011 as the new standard for the C++ programming language, replacing C++03. The name C++11 is derived from the convention of naming language versions based on the year of publication.

C++11 includes several additions to the core language. In the design of the C++11 standard, the committee applied some directives to help guide their decisions. One of the directives was “to prefer changes that can evolve programming technique”. The inclusion of anonymous functions or lambda functions is a direct result of that objective.

An anonymous function is a function which is defined and invoked without being bound to an identifier. Anonymous functions have their origin in the works of Alonzo Church on λ-calculus [2][3]. They are a convenient way to pass functions as arguments to higher-order functions. Higher order functions are functions that take one or more functions as input or output a function as return value [4]. Their major use is to abstract common behaviour into one place [5].

Higher order functions are closely related to first-class functions. The distinction between the two is that a higher-order function describes a mathematical concept, while a first-class object is a computer science term that describes a programming language entity that has no restriction on its use. First-class functions can therefore appear anywhere in the program where other first-class entities such as numbers can appear, including as arguments to other functions and as the return value.

In mathematics, higher order functions are also known as operators. For example, the definite integral in calculus is an operator that, given a function f of a real variable x and an interval [a, b] on the real line, returns the area under its curve. Later we will illustrate the implementation of this operator using the anonymous-function construct of C++11.

Anonymous function and lambda-expression are sometimes used in a mixed and confusing way, more so in programming language contexts than in mathematical expositions. In mathematics, lambda-expression is a notational convention in support of lambda calculus. A lambda expression in programming is an expression that specifies an anonymous function object [6]. Since anonymous functions in programming languages can have statements (control, assignment or expression statements) in the function body, the mathematical purity is lost to some extent.

The first programming language to adopt anonymous functions was Lisp (1958). Traditionally anonymous functions have found good use in functional languages and languages that treat functions as first-class objects, such as Haskell, Scheme, ML and Lisp [7][8]. In the current era of multi-paradigm languages, many imperative, procedural and object-oriented languages have added anonymous classes and anonymous functions to their repertoire of language features – C#, Clojure, Java, JavaScript, PHP, amongst others. C++ added anonymous functions in its C++11 edition.

Support for anonymous functions in C# (.NET) [9] has improved with new versions of the compiler. In C# 1.0, one would create an instance of a delegate by explicitly initializing it with a method that was defined elsewhere in the code. In C# 2.0, a delegate could be initialized with inline code, called an anonymous method. In C# 3.0, a delegate could be initialized with a lambda expression, which is more expressive and concise. E.g.
in 2.0 the parameter type had to be defined twice (during declaration of the delegate and during initialization) and this was done away with in 3.0.

C++11 has leapfrogged this by removing most redundancies seen in earlier versions of C# and gives a spartan expression of anonymous function definition and invocation.

A discussion of anonymous functions requires an understanding of the concept of closure. A closure is a function together with a referencing environment for the non-local variables of the function. Or to quote the ISO/IEC sub-committee technical report [9], “Closure is an anonymous function object that is created automatically by the compiler as a result of evaluating a lambda expression. Closure consists of the code of the body of the lambda function and the environment in which the lambda function is defined”. In practice, this means that external variables referred to in the body are stored (as reference or as copied-value) as member variables of the anonymous function object. Or that a pointer to the frame where the lambda function was created is stored in the function object.

The concept of closure was developed in the 1960s and featured in Scheme programming. C++11 supports closures in two default forms. One stores a copy of the variable, the other stores references to the original variables. Both provide functionality to override this default behaviour for individual variables.

A key limitation of the C++11 lambda feature, however, is that C++11 closures are monomorphic. That is, their type does not adapt to the context in which they are called. This is unlike C#.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PLDI’12. Copyright © 2012 ACM 1-59593-XXX-X/0X/000X…$5.00.

2. Organization of the Paper
In Sec 3, we present the advantages of using anonymous functions. Sec 4 discusses the formal syntax and semantics of C++11 lambda-expressions, illustrating key concepts with examples. Sec 5 gets into the details of implementing anonymous functions in the front-end and in Open64 [12]. Sec 6 forms the conclusion.

3. Lambda Expression – Motivation
In this section, we illustrate with examples the advantages of C++11 lambda functions over other forms of expression, such as function pointers of C and function objects of C++03.

3.1 Integral – Function Pointers
Consider the problem of writing a function to compute the definite integral over [a,b] of a mathematical function f : R→R. The basic approach is to divide the area under the curve of f into very narrow rectangles and sum the area of all the rectangles. The integrate function is then actually a higher-order function as it takes not a value but any continuous function as its input.

Now, if we limit ourselves to implementing the integral in C, we can use a function pointer as shown below.

    double integrate(double a, double b, double (*ptr2func)(double))
    {
        int i; double sum = 0, dt = (b-a)/N;   // N is number of segments
        for (i = 0; i < N; i++) sum += ptr2func(a+i*dt) * dt;
        return sum;
    }

Figure 1. Computing definite integral by partitioning the area under the curve into rectangles

The key limitation here is that the pointed-to function ptr2func has to be defined separately from the context in which integrate will be called. Suppose we need to compute the integral of f(x) = u/x+v. We define it as below and then pass a pointer to func_inverse each time we need to integrate f(x) = u/x+v.

    double func_inverse(double x) {
        return u/x+v;
    }

Different contexts from which integrate may be called may have different values of u and v. To overcome this problem, we are forced to set u and v as globals, a highly undesirable solution.

3.2 Integral – Function Object
In C++ prior to C++11, we can use a function object or functor to solve the same integral problem in a more elegant manner. A functor is a construct that allows an object to be called as if it were a function. We construct a class (e.g. CFoo) which overloads the function call operator ( ) member function. In C++ this is called a class type functor.

    class CFoo {
    public:
        double u, v;
        double operator()(double x) { return u/x+v; }
    };
    double integrate(double a, double b, CFoo f) {
        double sum = 0, dt = (b-a)/N;
        for (int i = 0; i < N; i++) sum += f(a+i*dt)*dt;…
        return sum;
    }
    int main() {
        CFoo f_inv_x;
        f_inv_x.u = …; f_inv_x.v = …; double a = …, b = …;
        double t = integrate(a, b, f_inv_x); …
    }

Figure 2 Computing definite integral using functor

The code above demonstrates how the idiom of functor is typically used. The functor approach gives more efficient local context, encapsulation and conciseness than the function pointer approach. Instead of setting some global values, we can now pass context by setting the functor’s member variables.

The function pointer points to a function which has simply two contexts – the argument passed to it and the global variables. The functor on the other hand can additionally refer to its own local context or object state (data members), which can be used to generate a set of functions using the current context.

The function object approach has the advantage of context and encapsulation but has much syntactic overhead. The syntactic requirement of defining a class with its member variables, function call operator, and constructor, and then constructing an object of that type is very verbose and thus not well-suited for creating a function object “on the fly” to be used only once. The semantics and syntax of lambda expressions via translation to function objects, as defined in C++11, is a more concise way of achieving the same and more.

3.3 Integral – Lambda Expression
The definition and use of a C++11 lambda expression for computing the integral is shown below (Figure 3).

    template …

    double t = integrate(a, b, [u,v](double x){ return u/x+v; });

This increases the readability of the code. If another integral, e.g. x², has to be computed, only one more line of code is required:

    double t = integrate(a, b, [u,v](double x){ return x*x; });

3.4 Idiomatic uses of C++11 Lambda Expression
It is envisioned that lambda-expressions in C++11 will help improve brevity of code in general and be particularly useful for STL based code. For example, if we want to add up the elements in a vector, we can do it as shown in Figure 4.

    int sum = 0;
    for_each(myvector.begin(), myvector.end(), [&sum](int i){ sum += i; });

Figure 4. Using lambda to sum a STL vector

In Figure 4, the variable sum is captured by reference. Therefore …
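The two idioms of this section, passing a lambda with captured context to an integrate routine and summing a vector through a by-reference capture, can be checked with a small sketch. It is illustrative only and is not the paper's Figure 3 listing: the template parameter F, the default segment count n, and the helper names sum_vector, sum_vector_by_value and integrate_inverse are all assumptions introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical templated integrate: F is any callable taking a double.
// n (number of rectangles) is an assumed default, not a value from the paper.
template <typename F>
double integrate(double a, double b, F f, int n = 100000) {
    double sum = 0.0;
    const double dt = (b - a) / n;
    for (int i = 0; i < n; ++i) sum += f(a + i * dt) * dt;
    return sum;
}

// Integrates f(x) = u/x + v over [a, b]; u and v come from the caller's
// local context through a by-value capture instead of global variables.
double integrate_inverse(double u, double v, double a, double b) {
    return integrate(a, b, [u, v](double x) { return u / x + v; });
}

// By-reference capture: the lambda updates the caller's sum directly.
int sum_vector(const std::vector<int>& v) {
    int sum = 0;
    std::for_each(v.begin(), v.end(), [&sum](int i) { sum += i; });
    return sum;
}

// By-value capture: the closure works on its own copy of sum
// (mutable allows modifying that copy), so the outer sum stays 0.
int sum_vector_by_value(const std::vector<int>& v) {
    int sum = 0;
    std::for_each(v.begin(), v.end(), [sum](int i) mutable { sum += i; });
    return sum;
}
```

Calling sum_vector on {1, 2, 3, 4} yields 10, whereas sum_vector_by_value returns 0 because only the closure's private copy is modified. That is the practical difference between [&sum] and [sum].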
Our last STL example is of std::sort shown in Figure 6.

    std::sort(myvector.begin(), myvector.end(), [](MyType& a, MyType& b){ return a < b; });

Figure 6. Using lambda with std::sort

Above is a call to the sort function, which expects a comparison function object that takes two values of the same type as those contained in the iterator's range and returns true if the first argument goes before the second argument.

Having seen three idiomatic STL uses of lambda, we give an example outside of STL. Figure 7 shows the famous Fibonacci using lambda expressions of C++11:

    std::function …

Class template std::function is a general-purpose polymorphic function wrapper type. Instances of std::function can store, copy and invoke any callable target – normal functions or lambda functions or other function objects. We create a function object fib, initialized from a lambda-expression, that takes an int as input and returns an int as output. This is the first time we introduce the trailing-return syntax of C++11 as well. As the reader may have guessed, -> int says the lambda returns an int.

Note that the implementation of fib uses std::function to create the first-class object fib. This fib is then captured and used inside its own lambda-expression to make a recursive call. So in a way, a C++11 lambda is not truly anonymous.

3.5 Currying with C++11 Lambda
Currying is a technique of transforming a function that takes multiple arguments into a chain of functions each with a single argument. It was discovered by Moses Schönfinkel and later re-discovered by Haskell Brooks Curry [11]. In theoretical computer science, currying provides a way to study functions with multiple arguments in very simple theoretical models, where functions only take a single argument. The example below illustrates how currying can be realized in C++11.

In the example in Figure 8, we compute x² + y² in two steps, each time using a lambda function that accepts only one argument.

    auto c1 = [](int x) { return [x](int y){ return x*x + y*y; }; };
    auto c2 = c1(5);   // c2 is eq. to [](int y){ return 5*5+y*y; }
    auto c3 = c2(3);   // c3 is eq. to value (25+3*3)
    assert(c3 == (5*5 + 3*3));

Figure 8. Example of currying using C++11 lambda

The lambda object c1 takes an int x argument and returns the function (x² + y²) where y is a free variable, and x is initialized to whatever value of x it is supplied with. For example, c2 is c1 with x instantiated to 5. In other words, c2 represents specifically the function (25 + y²).

Note that currying is possible in C++11 as a lambda object can return another lambda object.

4. Lambda Syntax and Semantics
Here we discuss the syntax and semantics of lambda-expressions relying on the C++11 standard specification [1].

4.1 C++11 Specification – Syntax
C++11 extends the previous definition of primary-expression with lambda-expression:

    lambda-expression:
        lambda-introducer lambda-declarator opt compound-statement
    lambda-introducer:
        [ lambda-capture opt ]

The lambda-introducer is the indicator that what is to follow must be interpreted as a lambda-expression definition. It can be unambiguously identified by the square bracket [ ] with optional content [ lambda-capture opt ]. lambda-capture is the list of variables the lambda-expression wants to capture, as part of its closure.

Next, lambda-declarator is the function parameter list. Note that the lambda-declarator is optional. If the lambda-expression does not need any argument then we can either write ‘( )’ or just leave it blank.

The detailed syntax for lambda-capture is given in Figure 9. From that, we deduce that lambda-capture could be [ ], [=], [&], [id], [&id], [&,id1, id2], [=,id1,this,&id2] amongst others. Not all of these, though, will be semantically acceptable.

    lambda-capture:
        capture-default | capture-list | capture-default, capture-list
    capture-default:
        & | =
    capture-list:
        capture | capture-list, capture
    capture:
        identifier | & identifier | this
    lambda-declarator:
        (parameter-declaration-clause) attribute-specifier opt mutable opt
            exception-specification opt trailing-return-type opt

Figure 9

The lambda-declarator is the argument list with an optional trailing return type, e.g. -> int signifies that the return type is int.

The compound statement is simply the lambda function body and is a sequence of zero or more statements within { }.

4.2 C++11 Specification – Semantics
The lambda-expression can refer to variables declared outside its body. Table 1 below clarifies the two modes of capturing externally defined variables – by reference or by value.

    [ ]   The lambda-expression cannot access any external variables in its body.
          e.g. [ ](int i){ return i+j; }   //error as j is external and not captured
    [&]   Any external variable is implicitly captured by reference if it is used in the lambda function.
          e.g. [&](int i){ j++; return i+j; }   //changes made to j are reflected upon return
    [=]   Any external variable is implicitly captured by value if it is used in the lambda function.
    …

We give some more examples below of lambda-expression formulation with implied semantic interpretation.

    [ ]{ }( );   //valid lambda-function which does nothing

A lambda object can be captured by another lambda object as the gcd implementation below shows:

    auto rem = [](int x, int y) { return x%y; };
    std::function …
    assert(gcd(21,49) == 7);

5. Lambda Expression – Implementation
Here we explain the conversion of lambda expressions to the compiler’s intermediate (e.g. WHIRL) format [12].

5.1 Parsing Lambda Expression
Table 2 shows some of the information gathered in the GCC front-end¹ during the parsing of a lambda-expression declaration [10].

    INFORMATION                    DESCRIPTION
    LAMBDA_EXPR                    Tree code for lambda-expression
    LAMBDA_EXPR_DEFAULT_CAPTURE    enum {NONE /*[ ]*/, BY_VALUE /*[=]*/, BY_REFERENCE /*[&]*/ }
    LAMBD_EXPR_CAPT                Each item is stored as a structure …

¹ GCC 4.5 supports lambda-expressions in experimental form. Current Open64 uses GCC 4.2 as front-end, which does not have support for C++11 features.

The evaluation of a lambda-expression results in a prvalue (pure rvalue) temporary. This temporary is the closure object. A closure object behaves like a function object.

The type of a lambda-expression (which is also the type of the closure object) is a unique, unnamed non-union class type called the closure type. This class type is not an aggregate. The closure type is declared in the smallest block scope, class scope, or namespace scope that contains the lambda expression.
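Two points made in this part of the paper can be exercised in a short sketch: a lambda object being captured by another lambda object, and each lambda-expression having its own unique closure type. The gcd listing is truncated in this copy, so the function below is one possible reconstruction using std::function, not the authors' code. The names gcd_via_lambdas, closure_types_are_distinct and closure_type_is_class are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// A lambda (rem) captured by value inside another lambda (gcd).
// gcd refers to itself through the std::function wrapper, captured by
// reference, because a closure has no name it could use for recursion.
int gcd_via_lambdas(int a, int b) {
    auto rem = [](int x, int y) { return x % y; };
    std::function<int(int, int)> gcd = [&gcd, rem](int x, int y) {
        return y == 0 ? x : gcd(y, rem(x, y));
    };
    return gcd(a, b);
}

// Two textually identical lambda-expressions still have distinct closure types.
bool closure_types_are_distinct() {
    auto f = [](int x) { return x + 1; };
    auto g = [](int x) { return x + 1; };
    return !std::is_same<decltype(f), decltype(g)>::value;
}

// The closure type is a class type and not a union.
bool closure_type_is_class() {
    auto f = [](int x) { return x + 1; };
    return std::is_class<decltype(f)>::value &&
           !std::is_union<decltype(f)>::value;
}
```

Because every closure type is unnamed and unique, a variable that must hold "some lambda with this signature" needs either auto, a template parameter, or a type-erasing wrapper such as std::function, which is the choice made in the sketch.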
The closure type for a lambda expression has a public inline function call operator whose parameters and return type are described by the parameter-declaration clause and trailing return-type. Any exception specification applies only to the corresponding function call operator.

If the lambda-expression does not include a trailing return-type, it is as if the trailing return-type denotes the following type:

• If the compound-statement is of the form
      { attribute-specifier opt return expression ; }
  then the type of the returned expression after lvalue-to-rvalue conversion and function-to-pointer conversion.
• otherwise, void.

It is important to note that the initialization of captured values occurs at the time of lambda object creation and not when the lambda function is invoked.

    class test{
    public:
        void print( std::function …

5.3 Translating Lambda Expression
We now explain the translation of lambda expressions by the front-end of the compiler to a form that can be integrated into existing, i.e. pre C++11, compiler infrastructure and its intermediate representation, or IR, such as WHIRL. Consider the example in Figure 11.

    class T {
    public:
        int i;
        void set(int v) { this->i = v; }
        int get() { return [this]() { return this->i; }(); }
    };
    int main() {
        T t;
        t.set(5);
        assert(t.get() == 5);
        return 1;
    }

Figure 11. Lambda example capturing this of its enclosure

In this example, there is a lambda expression in method get. The lambda captures the this pointer, which is the pointer to the enclosing object of type T. It could have captured individual members as well but does not in this case. The lambda then uses this to access data member i.

The first thing we show is the equivalent conversion of the lambda internally by the front-end to a struct with an overloaded function call operator. The names of internal types and variables are chosen to be expressive of their role. Figure 12 shows the resulting changes to the function body of get.

    class T{
    public:
        int i;
        ….
        get( ) {
            struct Lambda1{
            public:
                T* enclosureThisPointer;   //closure
                int operator( )( ){
                    return enclosureThisPointer->i;
                }
            }lam;
            lam.enclosureThisPointer = this;   //copy this of T to closure
            …

    int _ZN1T3getEv (struct T *const this){
        struct Lambda1 lam;                         //create λ closure
        lam.enclosureThisPointer = this;            //prepare closure
        return _ZZN1T3getEvEN7Lambda1clEv(&lam);    //call λ
    }
    int _ZZN1T3getEvEN7Lambda1clEv (Lambda1 *const closure)
    {
        return closure->enclosureThisPointer->i;
    }
    int main( ) {
        T t;
        _ZN1T3setEi (&t, 5);
        assert( _ZN1T3getEv (&t) == 5 );
        return 1;
    }

Figure 13. Conversion of method to a normal function with this pointer as an argument

There are a couple of points to note carefully about the conversion by the front-end. Firstly, note that the function _ZN1T3getEv, i.e. T::get, is passed a pointer to a structure of type T, which is the ‘this’ pointer of object t. This mechanism already exists today in WHIRL. _ZN1T3getEv declares and initializes the struct Lambda1. This structure contains the values and references that were captured by the lambda-expression definition. As the this-pointer of T was specified as a capture item, Lambda1 has an element enclosureThisPointer.

Function _ZZN1T3getEvEN7Lambda1clEv receives a pointer to the Lambda1 structure. We call it closure although technically a closure includes the function itself. The closure contains references and values of all captured variables. In _ZZN1T3getEvEN7Lambda1clEv, it is then used to gain access to variable i.

Figure 14 shows the WHIRL for the method get. At lines 6-7, we note the copying of the this pointer to lam.enclosureThisPointer. In lines 9-11, the Lambda1 *lam is passed as parameter to the lambda call _ZZN1T3getEvEN7Lambda1clEv.

    1. FUNC_ENTRY <1,53,_ZN1T3getEv>
    2. IDNAME 0 <2,4,this>
    3. BODY
    4. BLOCK {line: 1/8}
    5. PRAGMA 0 119 …

Figure 15 shows the WHIRL for the lambda function itself. Note that the pointer variable ‘this’ corresponds to ‘const closure’ used in Figure 13.

    1. FUNC_ENTRY <1,57,_ZZN1T3getEvEN7Lambda1clEv>
    2. IDNAME 0 <2,4,this>
    3. BODY
    4. BLOCK
    5. U8U8LDID 0 <2,4, this> T<71,anon_ptr.,8>
    6. U8U8ILOADT<69,Lambda1,8> T<71,anon_ptr.,8> …

A lambda expression does not incur additional overhead as such compared to the function object creation method employed generally in C++03. Inlining will be key to an efficient implementation, especially since, as envisioned, large parts of lambda expression usage will be for small functions. Additionally, efficient copy propagation should remove redundant copies made for the closure during lambda to function-object creation. It is likely that when several members of the enclosing object are referenced, efficiency-minded programmers may capture ‘this’. So the copy propagation has to just optimize the ‘this’ copy.

6. CONCLUSION
We presented a bit of the general philosophy of lambda expressions and some motivation of why programmers may be excited to start using this new C++11 language feature. Also, we showed some idiomatic uses of lambda expressions. We covered the syntax and semantics using the C++11 standard as reference. Then we covered how a lambda may be translated by the front-end of the compiler to a representation that is compatible with the existing intermediate-level representation in the compiler, such as WHIRL. We note that the WHIRL may not be severely impacted. We conclude, however, that generating efficient code for lambda expressions may be the real challenge behind its use.

7. REFERENCES
…
[5] Jan van Eijck and Christina Unger. Computational Semantics with Functional Programming. Cambridge University Press.
[6] Andrew Troelsen. Pro C# and the .NET Platform. ISBN 978-1-4302-2549-2.
[7] Richard Bird. Pearls of Functional Algorithm Design. Cambridge University Press. Cambridge 2010.
[8] C. A. Hoare and Richard Bird. Introduction to Functional …
… http://gcc.gnu.org/ml/gcc-patches/2009-09/msg02165.html
[11] Strachey, Christopher. Fundamental Concepts in Programming Languages. 2000.
[12] Open64 Manuals. www.open64.net. http://sourceforge.net/projects/open64/files/open64/Documentation/