Willow Power: Optimizing Pricing Trees

Michael Curran

A new type of recombining derivative pricing tree is presented as an alternative to standard binomial and trinomial trees. This tree, called the “Willow” tree, expands as the square root of time and therefore avoids the unnecessary computations that are performed in the “wings” of standard binomial and trinomial trees. It is shown that arranging the tree in accordance with the process being modelled gives dramatic improvements in computational efficiency.

The purpose of this paper is to introduce an regions. In Figure 1, the space between the alternative to standard binomial and trinomial Willow and trinomial envelopes is relatively a trees, called the “Willow” tree because of its high probability region, which is neglected by characteristic shape. The motivation for this the trinomial tree. In Figure 2, the trinomial research is the following. Consider the pricing tree wastes nodes outside of the probabilisti- of a European stock using a binomial cally relevant region. This results because the tree with 100 time steps. Using this model, the Willow tree, in concert with Brownian motion, nodes at maturity are spread uniformly across opens relatively fast initially and more slowly 20 standard deviations of the theoretical distri- later on. bution of the underlying. The range of nodes in a standard tree spreads out faster, namely, lin- early in time, than does the (normal) marginal probability distribution, which expands as the square root of time. As a result, many very low probability events are included in the back- wards induction algorithm. This inefficiency can be mitigated by “pruning” the tree, but this solves only part of the problem, and in an awk- ward way that is difficult to generalize to multi- ple factors. A more elegant and convenient remedy for this situation is presented that offers other important benefits, including radi- cally accelerated convergence, particularly for multi-factor models. Figure 1: Outer envelopes of Willow and The Willow tree provides better coverage of trinomial trees for short-time horizon high probability regions of the process space and gives less waste in the low probability

ALGO RESEARCH QUARTERLY 15 VOL. 4, NO. 4 DECEMBER 2001 Willow power

of barriers and “smiles,” are described in the third section, which is followed by some brief concluding remarks. The basic model

This section details the mathematical develop- ment of the Willow tree. To model Brownian motion efficiently, the tree implements a discrete approximation to the normal distribution at each time point. Normal distributions are rele- vant to pricing derivatives since they play an integral role in diffusion processes, which, in Figure 2: Outer envelopes of Willow and turn, are the building blocks of the large major- trinomial trees for long-time horizon ity of derivative pricing models. The basic struc- ture of the tree is constructed using this As with all derivatives pricing trees, the Willow principle as a guideline, with the transitions tree is built in order to approximate Brownian linking successive time points together coming motion well, using a fairly small number of spa- later. tial and time steps. It is used primarily for pricing American and Bermudan instruments, which is Consider dividing the support of the normal dis- straightforward in a recombining lattice or tree. tribution into n intervals and assigning a single A good reference for trinomial trees, as applied value in each interval to represent the corre- to a variety of instrument types, is Hull (2000). sponding stratum of the distribution. Let z1, z2, ... z be the representative normal variates with Numerical comparisons show that the Willow n corresponding probabilities q , q , ... q . For tree radically outperforms standard trees and 1 2 n –1 i – 5 simplifies the implementation of important ------example, we can let zi = N (see Curran aspects of pricing trees that are required for n practical applications. In particular, the length of 1 1994) and qi = --- for i = 1, ..., n. Note that the time steps can be chosen arbitrarily. This is in n contrast to trinomial trees, which have an inher- symmetry of the normal distribution will result ent upper bound on the amount of time (or qua- in the variates having a mean value of zero. dratic variation) that can be modelled at each Let {X : k=1,2,3,... } be a time-inhomoge- step. (This bound is achieved by setting the k probability of the central node equal to zero. In neous Markov chain with state space order to take a larger step, a change in the topol- {1,2,...,n}. We will say that the process {:Y t ≥ 0} has the value t z when X is in ogy of the trinomial tree is required.) Further- tk k k i k more, it is emphasized that, due to the small k number of nodes in the spatial dimension, the ≡ state i, where tk  hj , for some hj> 0, speed advantage of the Willow tree increases j = 1 with the number of factors. j =1,...,k. Then, subject to certain conditions The first section of the paper discusses the con- on the transition probabilities of the embedded chain X , Y converges to Brownian motion as struction of a basic Willow tree. The second sec- k tk →∞ →0 tion provides numerical results, demonstrating n and hj for all j. Note that we must that the Willow tree outperforms trinomial trees take a double limit since the time and space in pricing a (one-factor) and a (two- steps do not have any mutual dependence in this factor) American . Various extensions model, as they do in standard trees. Observe also of the basic model, including the incorporation that this construction obviates the burdensome

ALGO RESEARCH QUARTERLY 16 DECEMBER 2001 Willow power practice of pruning very low probability levels of we consider this aggregated node as being a sin- the Brownian motion from the tree, so that, gle node to avoid unnecessary computations.) while technically this approach is an explicit finite difference method, it is far more efficient To begin the process, the first step in the tree than standard implementations of the PDE must be from the root node to all n nodes at the approach. first time point, with the move to node i having probability qi. By construction, this initial transi- k tion has the required mean and variance. For Let pij denote the transition probability from node i to node j at time step k. The transition subsequent transitions, we insure satisfaction of probabilities must be parametrized by t and h in the above constraints as follows. order to achieve convergence to Brownian Equation 1 can be written as: motion. This parametrization must satisfy the usual requirements for discrete-time models, n k ∀ , consisting of the following three conditions:  pij tk + hk + 1 zj = tkzi ik j = 1 • the process must constitute a martingale or, equivalently, EY[]Y = Y ∀ k (1) tk + hk + 1 tk tk

.n α /k ∀ , • the variance of the process must be equal to 1 + k / pij zj = zi ik, (5) the length of the time step j = 1 0

[] ∀ Var Yt + h Yt = hk + 1 k (2) h k k + 1 k α ≡ k + 1 where k ------. The (time) dependence of the tk • the transition probabilities from each node probabilities on k can thus be expressed in terms must sum to one α of the parameter k.

n k ∀ , . (3) The variance condition (Equation 2) can be  pij = 1 ik j = 1 written as:

Finally, we also impose the restriction: .n /k []2 []2 / pij tk + hk + 1 zj – tkzi n j = 1 0 k  q p = q ∀ jk, . (4) i ij j = h ∀ ik, . i = 1 k + 1

This condition inductively guarantees that the Dividing by tk gives unconditional probability of each state j at time n step k+1 is qj, given that this is the case at time k 2 2 p []1 + α z – z = α ∀ ik, . (6) step k. This means that we are simultaneously  ij k j i k j = 1 imposing that the stationary distribution of Xk be as prescribed, and that Xk be initialized in Thus, at each time step, the (linear) conditions this stationary distribution. (Observe that the on the transition probabilities are fully deter- α root node can be regarded as containing all mined by a single scalar parameter k, greatly n nodes at value 0z for all i, so that the root diminishing the storage requirements when cal- i culating the resulting transition probabilities. need not be taken as a special case of the embedded chain. Nevertheless, when pricing, Assuming n is greater than four (which it always will be!), the system of equations consisting of

ALGO RESEARCH QUARTERLY 17 DECEMBER 2001 Willow power the normalization constraints (Equations 3 and sibly some other discrete set of values corre- 4) and the mean and variance conditions sponding to alternative step sizes, and solve (Equations 5 and 6, respectively) is under-deter- Equation 7 for each value. These computations mined. A unique solution to this system can be are done once, off-line, and the results are obtained by solving the following linear program stored. When we encounter a situation during for each time step k (hereafter, to improve read- pricing where we need a solution for a value of α ability, we suppress the time dependence (i.e., that has not been computed, the step size can be the subscript k)). adjusted slightly, so that the value of α is one for which Equation 7 has been solved, or the solu- tion can be interpolated from known values. To n n α 3 assure that the interpolated probabilities yield Minimize   pij 1 + zj – zi i = 1 j = 1 the correct drift in Equation 5, we interpolate 1 Subject to: linearly based on ------. That is, if p(1) and p(2) 1 + α n ∀ 2  pij = 1 i are the n -vectors of transition probabilities cor- j = 1 α α responding to 1 and 2, respectively, then the n probabilities p(3) for α < α < α are given by q p = q ∀ j 1 3 2  i ij j (7) i = 1 . /n 1 + α p z = z ∀ i / ij j i .1 1 0 ------– ------j = 1 /α α /1 + 1 + ()1 ()3 = ------3 2- p n p / 2 2 1 1 p []1 + α z – z = α ∀ i /------– ------ ij j i 1 + α 1 + α 0 j = 1 1 2 (8) ≥ ∀ .1 1 pij 0 i, j. ------– ------/α α /1 + 1 + ()2 + ------1 3- p . /1 1 /------– ------α α 0 1 + 1 1 + 2 In order to insure sensible (on the order of the standard deviation) and unique increments of Note that the non-negativity and normalization the process, Equation 7 minimizes the expecta- constraints also remain satisfied by Equation 8. tion of the absolute value of the third power of The error in the variance of the increments can all of the possible moves. Note that three is the be kept innocuously small through a judicious smallest non-trivial integer appropriate for this choice of the values of α for which the linear purpose. If two were used, the objective would program (Equation 7) is solved. simply reflect the variance already imposed in the constraints, while powers higher than three General observations were found to give the same transition probabili- We now point out some of the properties of the ties. This objective function keeps the range of basic Willow model. For illustration purposes, possible moves, and consequently higher-order examples of 11-node Willow trees with equal errors during backward induction, small. and unequal time steps are shown in Figures 3 Observe that a tree with time steps of constant and 4, respectively (lines indicate non-zero tran- size h would require solving Equation 7 for sition probabilities). α = 1, 1/2, 1/3, ... . We can select these, or pos-

ALGO RESEARCH QUARTERLY 18 DECEMBER 2001 Willow power

Figure 3: Sample Willow tree with Figure 4: Sample Willow tree with n = 11 and three equal time steps n = 11 and three unequal time steps

As noted earlier, most tree models spread out in all of the constraints. This solution structure a linear pattern. Observe also that the incre- insures a minimum of computation during for- ments of the process in the standard binomial ward recursion, typically used in calibration rou- tree, for example, are symmetrically distributed. tines, and backward recursion, used in pricing The Willow tree, on the other hand, manages to routines (see Hull (2000) for details on forward squeeze the nodes together as time proceeds, rel- and backward recursion). Note that a non-linear ative to the standard trees. It does this by giving objective function would spoil the parsimonious nodes at the tails of the marginal distributions a branching structure, because then there is no skewed distribution over the next time step. guarantee (in fact, it is unlikely) that the num- Instead of having a particular distribution of the ber of branches will be equal to the increments of the process, as in the driftless bino- minimum 4n. mial tree, the Willow tree imposes a particular distribution of the process level at each point in Also, observe that modelling a term structure of σ time (via Equation 4), leaving the distribution of instantaneous , say (t), can be the increments to be determined by the linear achieved through a time change; simply set τ() program (Equation 7). Since the Willow tree zi t to be the value of node i at time t and use does not involve pruning, probability is con- τ()τth+ – ()t α = ------to parametrize the t → t + h served. Therefore, accuracy can be maintained τ()t using trees of substantially less width. t 2 transition probabilities, where τ()t = * σ()s ds . Due to the form of basic feasible solutions of lin- s=0 ear programs, the number of non-zero pij for a Schmidt (1997) shows how a model of Brownian α particular will be equal to the dimensionality motion can be used as the raw material in con- of the basis matrix of the linear program structing models for a variety of other diffusion (Equation 7). So while, on average, each node processes, such as geometric Brownian motion has about four possible transitions at each step and Ornstein-Uhlenbeck processes. The (since there are four constraints per node), a approach described here solves the difficulty he specific node may have more or less than four mentions of modelling an Ornstein-Uhlenbeck transitions. Linear programming solutions are process as an exponentially time-changed minimal—at least in the sense relevant here—in Brownian motion, since it makes it possible to that as few of the variables (transition probabili- take time steps of arbitrary duration. ties) as possible are nonzero, while still satisfying

ALGO RESEARCH QUARTERLY 19 DECEMBER 2001 Willow power

For example, suppose x is an Ornstein-Uhlen- Brownian motions (this can be implemented beck process with mean reversion parameter λ using techniques such as Cholesky and volatility σ(t): decomposition or spectral decomposition (see Press et al. 1992)) dx = –λxdt + σ()t dW , • discrete dividends can be handled either by where W(t) is a Brownian motion. We can solve dropping the level of the tree or by this stochastic differential equation in terms of a modelling the spot price without dividends different Brownian motion W()t : as a geometric Brownian motion (or whatever process is deemed appropriate), t and adding the present value of future –λ()ts– xt()= *e σ()s dw dividends as needed. o Finally, observe that utilizing the Willow tree for .t generating Monte Carlo paths would not confer ˆ /–2λ()ts– []σ()2 = W /*e s ds . advantages beyond doing the same in standard 0 o binomial or trinomial trees. This is because the Monte Carlo process, almost by definition, does We then observe x(t) by taking the Brownian not often explore the low probability sections of motion value from the Willow tree at the time the tree. However, if the Willow tree is used as a ˆ indicated by the argument of W (⋅). In general, to calibration tool to calibrate a model to market form other processes via the one-factor Willow traded instruments, which can be priced using tree, a simple function is applied to the Brown- backward induction, this calibration enjoys the ian node value. For example, geometric Brown- superior efficiency of the Willow tree. ian motion is modelled using an exponential function: Numerical results

2 A typical convergence graph for the price of a St()= S()0 exp []()µσ– /2 t + σWt() , one-factor swaption is shown in Figure 5. Note where S(t) is the stock price at time t, µ is the how the convergence of the Willow tree is far drift rate, σ is the volatility and W(t) is the Wil- smoother than that of a trinomial tree. This low tree level at time t. graph, however, does not give an indication of the relationship between accuracy and computa- In contrast, using standard trees to model a tion time in the two trees. Ornstein-Uhlenbeck process requires adjusting the topology of the tree (which is awkward) or drastically increasing the density of time points as time proceeds (which is computationally expensive).

The basic Willow model easily accommodates a variety of additional features, such as the following:

• level-dependent volatility can be incorporated as in Nelson and Ramaswamy (1990) or Schmidt (1997)

• the extension to multiple factors is Figure 5: Convergence of Willow and straightforward using standard trinomial tree-swaption prices decomposition techniques of “crossing”

ALGO RESEARCH QUARTERLY 20 DECEMBER 2001 Willow power

Figure 6: Accuracy vs. computational time for Figure 7: Accuracy vs. computational time the price of an American put option for the delta of an American put option

To examine how accuracy relates to computa- It is apparent that, as the required accuracy tion time, we use two-factor versions of Willow increases, the advantage of using the Willow and trinomial (i.e., nine branches per node) tree becomes more pronounced. This effect is trees to price an American option to exchange even larger for models with more factors, one share of stock 2 for one share of stock 1. because the Willow tree makes far more efficient That is, if S1 and S2 are the prices of the two use of nodes in the spatial dimensions than does stocks, then the option pays (S1−S2) if the price the trinomial tree. In this example, for instance, of the first stock is higher than that of the sec- only 30 nodes are used for each spatial dimen- ond, and zero otherwise. We define “max error” sion at every discrete time point. Accordingly, to be the largest error observed among all trees the memory requirements are also far less for with at least as many time steps as the tree with multiple-factor implementations of the Willow a given computation time. Specifically, if err(l) is tree than for trinomial trees. the error obtained by a tree with l time steps, then the max error for a tree with k time steps is Extensions of the basic model max{err(l)| l ≥ k}. This definition of error helps Willow tree implementations have been devel- smooth out the results, particularly for the trino- oped for multiple-factor equity models, such as a mial tree, and facilitates a comparison in terms two-factor model, as well as of accuracy versus computation time. multi-factor normal and lognormal short-rate The Willow and trinomial estimates for price models and smile/skew models where volatility is and delta (with respect to S2) are compared in a function of both time and the level of the Figures 6 and 7, respectively. The reported underlying. In such implementations, the Wil- computation times are for a 400 MHz PC. In low tree is extended beyond the class of models moving from right to left in Figures 6 and 7, the outlined above in which the levels of the under- number of time steps for each model is lying variables are functions of a set of (possibly increased, which increases computation time time-changed) Brownian motions. In this and decreases error. For reference purposes, extended set of models, the drift rates and vola- accurate values are 7.968 for the price, and tility rates are allowed to be both time and state 0.459 for the delta. dependent according to the specifications of the underlying processes.

ALGO RESEARCH QUARTERLY 21 DECEMBER 2001 Willow power

Figure 8: Sample- surface Figure 9: Typical convergence behavior generated by Willow smile model of Willow barrier model

Figure 8 shows a sample-implied volatility sur- Barriers in the Willow tree are handled in a face generated by the Willow smile/skew model unique and flexible way. The tree is constructed that allows volatility to be state, as well as time, of a set of “compound” time steps, each consist- dependent as in the model of Dupire (1994). ing of several sub-time steps. The probability of Such models are used when the Black-Scholes a knock-in or knock-out over a compound time assumption of geometric Brownian motion does step is determined using a Brownian bridge com- not hold, but the modeller wishes to avoid intro- putation. (A Brownian bridge is a Brownian ducing extra factors into the model. In the smile/ motion conditioned on fixed, initial and final skew version of the Willow tree, the placement time points.) This computation is performed of the nodes is adjusted parametrically in order over several sub-time steps so that the process to induce a state-dependent volatility. The more closely resembles a Brownian motion, model is parametrized by a smile parameter, a making the use of the Brownian bridge analyti- skew parameter and a parameter that governs cal results more appropriate. Figure 9 shows how how fast the smile/skew fades with time. While convergence is enhanced by applying the parsimonious, this parametrization allows for a Brownian bridge computation over more and wide variety of surfaces, and fits data from mar- more sub-time steps. kets extremely well. Adopting a parametric approach yields other advantages such as stabil- This approach to barrier options allows for time- ity and fast calibration to both European and dependent barriers as well as time- and state- American options. dependent volatility. This method gives far greater modelling flexibility than the standard Note that we need not rerun the linear programs method of arranging a row of nodes to fall on in order to create the Willow smile model. The the barrier. adjustments required to the node placement and transition probabilities can be done in real-time. Concluding remarks

Another example of such an extension is the This paper has introduced an alternative to two-factor convertible bond model, where the standard binomial and trinomial trees. The Wil- drift of the stock price is equal to the prevailing low tree is constructed in a way that places short rate, and the corresponding adjustment is nodes (and thus computation) in regions that applied to the stock price drift at each node in are most important for general pricing problems. the tree to keep the model arbitrage free. The Willow tree has been shown to be an

ALGO RESEARCH QUARTERLY 22 DECEMBER 2001 Willow power extremely efficient improvement over such stan- References dard trees, and one that also provides the flexi- bility for a wide variety of processes and models. Curran, M., 1994, “Strata Gems,” Risk, 7(3): 70–71, March. The particular advantages of the Willow tree are Dupire, B., 1994, “Pricing with a Smile,” Risk, vastly improved convergence of derivative prices 7(1): 18–20, January. and , as well as superior stability. The Wil- low tree also gives complete freedom in choosing Hull, J., 2000, Options, Futures, and Other time steps, which simplifies the implementation Derivatives, Fourth edition, Toronto, ON: of pricing models. Variations of the basic Willow Prentice-Hall International, Inc. tree model admit state-dependent volatilities in Nelson, D., and K. Ramaswamy, 1990, “Simple single- and multifactor models, and efficient and binomial processes as diffusion flexible computation of all types of barrier approximations in financial models,” Review of options. Financial Studies, 3(3): 393–430. Acknowledgements Press, W., S. Teukolsky, W. Vetterling and B. Flannery, 1992, Numerical Recipes in C, The author wishes to thank Hans Beumé and Second edition, Cambridge, Cambridge Mark Davis for helpful conversations, and Frank University Press. Mao and Zhengyun Hu for superb programming Schmidt, W., 1997, “On a general class of one- and analytical support. factor models for the term structure of interest rates,” Finance and Stochastics, 1(1): 3–24.

ALGO RESEARCH QUARTERLY 23 DECEMBER 2001 ALGO RESEARCH QUARTERLY 24 DECEMBER 2001