
Higher-order Functions
15-150: Principles of Functional Programming – Lecture 13

Giselle Reis

By now you might feel like you have a pretty good idea of what is going on in functional programming, but in reality we have used only a fragment of the language. In this lecture we see what more we can do and what gives the name functional to this paradigm. Let’s take a step back and look at ML’s type system: we have basic types (such as int, string, etc.), tuples of types (t * t’) and functions from a type to a type (t -> t’). In a grammar style (where α is a basic type):

τ ::= α | τ ∗ τ | τ → τ

What types allowed by this grammar have we not used so far? Well, we could, for instance, have a function inside a tuple. Or even a function within a function, couldn’t we? The following are completely valid types:

int * (int -> int)
int -> (int -> int)
(int -> int) -> int

The first one is a pair in which the first element is an integer and the second one is a function from integers to integers. The second one is a function from integers to functions (which have type int -> int). The third type is a function from functions to integers. The last two types are examples of higher-order functions¹, i.e., a function which:

• receives a function as a parameter; or

• returns a function.
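As a small preview in SML, here is one value of each of the three types above. This is only a sketch: the names p, adder and atZero are made up for illustration.

```sml
(* int * (int -> int): a pair holding an integer and a function *)
val p : int * (int -> int) = (3, fn x => x * 2)

(* int -> (int -> int): a function that returns a function *)
val adder : int -> (int -> int) = fn n => fn x => x + n

(* (int -> int) -> int: a function that receives a function *)
val atZero : (int -> int) -> int = fn f => f 0

val six = (#2 p) (#1 p)        (* apply the function in the pair to its integer *)
val seven = (adder 2) 5        (* adder 2 is itself a function *)
val four = atZero (adder 4)    (* a function passed as an argument *)
```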

Functions can be used like any other value. They are first-class citizens. Maybe this seems strange at first, but I am sure you have used higher-order functions before without noticing it. For example, callback functions are usually passed as a parameter to other functions. In math, the derivative is nothing but a function which operates on functions (and so is the integral). In fact, let’s look at the case of derivatives. There are two ways to compute the derivative of a function f at a point x. One is via the definition:

$$f'(x) = \lim_{\delta \to 0} \frac{f(x + \delta) - f(x)}{\delta}$$

But soon after learning that, you also learn some rules of computation which avoid having to compute this limit all the time (e.g. if $f(x) = x^n$ then $f'(x) = n x^{n-1}$). To use the rules, though, we need to know what the function looks like, whereas the definition works regardless of the structure of f. In other words, the definition treats f as a black box. This is how we shall treat functions inside higher-order functions in ML.

Now let’s look at some concrete examples². Take the function that increments a number by a δ:

inc : N × N → N
inc(δ, x) = x + δ

It’s a pretty simple function.
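Returning to the derivative for a moment: its limit definition only ever applies f, which is exactly what treating f as a black box means. The following is a rough SML sketch of that idea. The names derivative, square and square' are assumptions for illustration, and the limit is replaced by a small fixed delta, so the result is only an approximation.

```sml
(* Approximate f'(x) by the difference quotient for a small, fixed delta.
   f is only ever applied, never inspected. *)
fun derivative (delta : real) (f : real -> real) : real -> real =
  fn x => (f (x + delta) - f x) / delta

fun square (x : real) : real = x * x

(* A new function, built from square; square' 3.0 is close to 6.0 *)
val square' = derivative 1.0E~6 square
```

Note that derivative both receives a function and returns one.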

¹ Analogously, the first type is that of a higher-order tuple.
² We stay in math for now. We will see the SML syntax for all this soon, but for now concentrate on the concepts.

Now let’s implement a function that, given δ, generates the inc function for that δ. So that will be a function that returns a function, and its type is:

mkInc : N → (N → N)

There is not much secret to this definition, really. There are two ways in which this can be done. One way is to define the returned function with a name and return this name:

mkInc(δ) = f where f(x) = x + δ

With all you know about SML so far, you could already implement such a function. The other way is to keep the function nameless, i.e., use an anonymous function. Does that ring a bell? Anonymous functions are declared using λ notation:

mkInc(δ) = λx.x + δ
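Both ways of defining mkInc can be sketched in SML as follows. Here int stands in for N (so negative numbers are not excluded), and the name mkIncNamed is made up to tell the two versions apart.

```sml
(* Way 1: give the returned function a local name, then return that name. *)
fun mkIncNamed (delta : int) : int -> int =
  let
    fun f x = x + delta
  in
    f
  end

(* Way 2: return an anonymous function; "fn x => ..." plays the role of
   "lambda x. ..." *)
fun mkInc (delta : int) : int -> int = fn x => x + delta
```

The two versions behave identically; only the way the returned function is written differs.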

Remark 1. The use of λ for anonymous function follows the notation used in the λ-calculus, a very famous model of computation on which functional programming is based. In this model there are only three constructs (one being anonymous functions) and it can be used to simulate Turing machines (meaning it is Turing-complete). λ-calculus was proposed by Alonzo Church in 1936.

Here are some examples of functions using λ notation and their named version (using f):

Function                       λ notation       Named version
Identity                       λx.x             f(x) = x
g ◦ h                          λx.g(h(x))       f(x) = g(h(x))
First of a pair (uncurried)    λ(x, y).x        f(x, y) = x
First of a pair (curried)      λx.λy.x          f x y = x
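For comparison, the same four functions might be written in SML with fn as shown below. All names are chosen for illustration; SML already provides composition as the infix operator o, so compose only mirrors the table.

```sml
val identity = fn x => x                    (* lambda x. x *)
fun compose (g, h) = fn x => g (h x)        (* lambda x. g(h(x)) *)
val fstUncurried = fn (x, y) => x           (* lambda (x, y). x *)
val fstCurried = fn x => fn y => x          (* lambda x. lambda y. x *)

(* First double, then add one: 2 * 5 + 1 *)
val eleven = compose (fn x => x + 1, fn x => x * 2) 5
```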

You are free to use any notation you want. Once you get used to the λ-notation, it might become less clumsy than giving names to all the functions. Let’s use our mkInc function to build a function that increments numbers by 1:

incOne = mkInc 1

Since incOne has type N → N, we can apply it to a number:

incOne 15 = 16

In fact, we can completely bypass the creation of incOne:

(mkInc 1) 15 = 16
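In SML these two applications could look like the sketch below, again with int standing in for N. The names a and b are only there to hold the results.

```sml
fun mkInc (delta : int) : int -> int = fn x => x + delta

val incOne : int -> int = mkInc 1    (* fix delta = 1, get a function back *)
val a = incOne 15                    (* 16 *)
val b = (mkInc 1) 15                 (* 16, without naming the intermediate function *)
```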

Remember the type of mkInc : N → (N → N). It takes one integer and returns a function; that is what is happening in mkInc 1. This function has type N → N, so we go ahead and apply it to 15 to get the final value: the integer 16. Interestingly, we could have done the same with inc, but we would have to give it the pair (1, 15) as argument. These two functions do the same thing, but they have different types:

inc : N × N → N
mkInc : N → (N → N)

The second type is said to be curried³. It turns out we can always transform a function that takes a tuple as its argument (i.e., an uncurried function) into a curried one. This is called currying. The opposite is also possible: we can uncurry a function. Notice though that in this case a partial application of the function is no longer possible. We could create incOne with mkInc, but if we are going to use inc, all arguments must be there.
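Currying and uncurrying can themselves be written as higher-order functions. The SML sketch below covers two-argument functions only; the names curry, uncurry and inc' are assumptions, and int again stands in for N.

```sml
(* Turn a function on pairs into one that takes its arguments one at a time. *)
fun curry f = fn x => fn y => f (x, y)

(* The opposite direction: collect both arguments into a pair. *)
fun uncurry g = fn (x, y) => g x y

fun inc (delta : int, x : int) : int = x + delta

val mkInc = curry inc       (* int -> int -> int *)
val incOne = mkInc 1        (* partial application is possible again *)
val inc' = uncurry mkInc    (* int * int -> int, behaves like inc *)
```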

³ After Haskell Curry, the guy who also gives the name to the language Haskell.

Currying and uncurrying change the type of the function, so this does not mean we can use mkInc (1, 15), for example. This does not typecheck. The function arrow associates to the right, and function application associates to the left, so we can drop the parentheses and write N → N → N for the type and mkInc 1 15 for the application.

To think: Why does it make sense for function application to be left-associative and its type to be right-associative?

Let us look at a more interesting example. We will use one of our favorite data structures for that: trees. Suppose we have the following tree where all leaves have implicit empty children (i.e., leaves are node(empty, x, empty)). It stores countries and their areas in ×1000 km².

                     (“India”,3287)
                    /              \
        (“Egypt”,1000)              (“Nepal”,147)
         /          \                /          \
(“Brazil”,8515)  (“Fiji”,18)  (“Jordan”,92)  (“Qatar”,11)
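One possible SML encoding of this tree is sketched below. The datatype, its constructor names Empty and Node, and the helper leaf are assumptions. So is the placement of the four leaves, which are read off level by level (this placement keeps the names in alphabetical order from left to right).

```sml
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree

(* A leaf is a node whose two children are empty. *)
fun leaf x = Node (Empty, x, Empty)

(* Country name paired with its area in units of 1000 km^2. *)
val countries : (string * int) tree =
  Node (Node (leaf ("Brazil", 8515), ("Egypt", 1000), leaf ("Fiji", 18)),
        ("India", 3287),
        Node (leaf ("Jordan", 92), ("Nepal", 147), leaf ("Qatar", 11)))
```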

Suppose we want to omit the area and keep only the name of the country. You can certainly define a function that does that:

name (empty) = empty
name (node(tl, (n, a), tr)) = node(name(tl), n, name(tr))

Now you need to change the areas to ×10⁶ km². That is also a simple function:

areaNorm (empty) = empty
areaNorm (node(tl, (n, a), tr)) = node(areaNorm(tl), (n, a/1000), areaNorm(tr))

These two functions follow the same pattern. Can we use our abstraction power and what we learned so far about higher-order functions to avoid having to type the same thing again and again? The two functions traverse the tree the same way; the only thing that changes is the operation done on the data in the nodes. We can thus parametrize the function by this operation (which will be a function itself).
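In SML, this parametrization might be sketched as follows. The datatype and the name treeMap are assumptions (treeMap is used so as not to shadow SML's built-in list map), and areas are taken to be reals so that dividing by 1000 keeps the fractional part.

```sml
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree

(* The shared traversal; f is the operation applied to the data in each node. *)
fun treeMap f Empty = Empty
  | treeMap f (Node (tl, x, tr)) = Node (treeMap f tl, f x, treeMap f tr)

(* Both functions reuse the traversal and differ only in f. They take the
   tree t explicitly so that SML's value restriction does not get in the
   way of their polymorphic types. *)
fun name t = treeMap (fn (n, _) => n) t
fun areaNorm t = treeMap (fn (n, a) => (n, a / 1000.0)) t
```

Since treeMap never looks inside the node data, it works for trees holding any type of value, not only pairs.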

map f empty = empty
map f (node(tl, x, tr)) = node(map f tl, f(x), map f tr)

This operation is called map. Observe that we do not even require that the data within the nodes is a pair. It could be anything. Using map, we can redefine name and areaNorm in one line each:

name = map (λ(n, a).n)
areaNorm = map (λ(n, a).(n, a/1000))

We might also need functions that compute some value depending on the content of the tree, instead of reconstructing it with other data. For example, we might want to find the areas of the biggest and smallest countries. The functions that compute these values are the following:

biggest empty = 0
biggest (node(tl, (n, a), tr)) = max(biggest tl, a, biggest tr)
smallest empty = 20000
smallest (node(tl, (n, a), tr)) = min(smallest tl, a, smallest tr)

We can again use higher-order functions to simplify these definitions, but we need something more. Notice that not only the function used (max/min) is different, but also the value in the base case. These two elements are the parameters for the reduce function:

reduce f e empty = e
reduce f e (node(tl, x, tr)) = f(reduce f e tl, x, reduce f e tr)

The functions biggest and smallest become:

biggest = reduce (λ(x, (n, a), y).max(x, a, y)) 0
smallest = reduce (λ(x, (n, a), y).min(x, a, y)) 20000

Notice how max and min are used inside a λ function which takes the needed element from the tuple.

The functions map and reduce can also be defined for lists. The only difference is that reduce is called fold, and there are two different ways to fold. Map is straightforward:

map f [] = []
map f (x :: l) = f(x) :: (map f l)

Folding can be done using fold-right (foldr) or fold-left (foldl). The difference is in the order in which the function f is applied to the elements of the list. Let L = [x1, x2, x3].

foldr f e [] = e
foldr f e (x :: l) = f(x, foldr f e l)

foldr f e L will compute f(x1, f(x2, f(x3, e))).

foldl f e [] = e
foldl f e (x :: l) = foldl f (f(x, e)) l

foldl f e L will compute f(x3, f(x2, f(x1, e))). If the operation performed by f is both associative and commutative (such as addition or max), these behave the same way.
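The two folds, and the tree reduce from earlier, can be sketched in SML as below. The names myFoldr, myFoldl and treeReduce are assumptions, chosen so that SML's built-in foldr and foldl are not shadowed. The tree datatype and its constructors are likewise assumed, and areas are integers here. Addition gives the same result in both directions, while consing with :: makes the order of application visible.

```sml
fun myFoldr f e [] = e
  | myFoldr f e (x :: l) = f (x, myFoldr f e l)

fun myFoldl f e [] = e
  | myFoldl f e (x :: l) = myFoldl f (f (x, e)) l

val xs = [1, 2, 3]

(* Addition: both folds give 6. *)
val sumR = myFoldr (op +) 0 xs
val sumL = myFoldl (op +) 0 xs

(* Consing: foldr rebuilds the list, foldl reverses it. *)
val copied = myFoldr (op ::) [] xs      (* [1, 2, 3] *)
val reversed = myFoldl (op ::) [] xs    (* [3, 2, 1] *)

(* The tree reduce: f combines the results of both subtrees with the
   node data, and e is the value for the empty tree. *)
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree

fun treeReduce f e Empty = e
  | treeReduce f e (Node (tl, x, tr)) =
      f (treeReduce f e tl, x, treeReduce f e tr)

(* Largest area in a tree of (name, area) pairs; 0 for the empty tree. *)
fun biggest t =
  treeReduce (fn (x, (_, a), y) => Int.max (x, Int.max (a, y))) 0 t
```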
