<<

OPERATIONS RESEARCH informs Vol. 58, No. 1, January–February 2010, pp. 16–29 ® issn 0030-364X eissn 1526-5463 10 5801 0016 doi 10.1287/opre.1090.0729 © 2010 INFORMS

Dynamic with a Prior on Market Response

Vivek F. Farias Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, [email protected] Benjamin Van Roy Stanford University, Stanford, California 94305, [email protected]

We study a problem of faced by a vendor with limited inventory, uncertain about demand, and aiming to maximize expected discounted revenue over an infinite time horizon. The vendor learns from purchase data, so his strategy must take into account the impact of on both revenue and future observations. We focus on a model in which customers arrive according to a Poisson process of uncertain rate, each with an independent, identically distributed reservation price. Upon arrival, a customer purchases a unit of inventory if and only if his reservation price equals or exceeds the vendor’s prevailing price. We propose a simple heuristic approach to pricing in this context, which we refer to as decay balancing. Computational results demonstrate that decay balancing offers significant revenue gains over recently studied certainty equivalent and greedy heuristics. We also establish that changes in inventory and uncertainty in the arrival rate bear appropriate directional impacts on decay balancing in contrast to these alternatives, and we derive worst-case bounds on performance loss. We extend the three aforementioned heuristics to address a model involving multiple customer segments and stores, and provide experimental results demonstrating similar relative merits in this context. Subject classifications: dynamic pricing; demand learning; revenue management. Area of review: Revenue Management. History: Received March 2007; revisions received November 2007, July 2008, January 2009; accepted March 2009. Published online in Articles in Advance November 19, 2009.

1. Introduction Our focus is on an extension of this model in which the Consider a vendor of winter apparel. New items are stocked arrival rate is uncertain and the vendor learns from sales in the autumn and sold over several months. Because of data. Incorporating such uncertainty is undoubtedly impor- significant manufacturing lead times and fixed costs, items tant in many industries that practice revenue management. are not restocked over this period. Evolving fashion trends For instance, in the winter fashion apparel example, there generate great uncertainty in the number of customers who might be great uncertainty in how the market will respond to the product at the beginning of a sales season; the vendor to this article and distributed this copy aswill a courtesy toconsider the author(s). purchasing these items. To optimize revenue, the vendor should adjust prices over time. But how should must take into account how price influences both revenue these prices be set as time passes and units are sold? This and future observations from which he can learn. is representative of problems faced by many vendors of In this setting, it is important to understand how uncer- seasonal, fashion, and perishable goods. tainty should influence price. However, uncertainty in the copyright There is a substantial literature on for arrival rate makes the analysis challenging. Optimal pricing such a vendor (see Talluri and van Ryzin 2004 and refer- strategies can be characterized by a Hamilton-Jacobi- holds ences therein). Gallego and van Ryzin (1994), in particular, Bellman (HJB) equation, but this approach is typically not formulated an elegant model in which the vendor starts analytically tractable (at least for the models we consider). with a finite number of identical indivisible units of inven- Furthermore, for arrival rate distributions of interest, grid- tory. Customers arrive according to a Poisson process, with based numerical methods require discommoding computa- INFORMS Additional information, including rights and permission policies, is available atindependent, http://journals.informs.org/. identically distributed reservation prices. In tional resources and generate strategies that are difficult to the case of exponentially distributed reservation prices, the interpret. As such, researchers have designed and analyzed optimal pricing strategy is easily derived. The analysis of heuristic approaches. Gallego and van Ryzin (1994) can be used to derive pricing Aviv and Pazgal (2005a) studied a certainty equivalent strategies that optimize expected revenue over a finite hori- heuristic for exponentially distributed reservation prices, zon and is easily extended to the optimization of discounted which at each point in time computes the conditional expected revenue over an infinite horizon. Resulting strate- expectation of the arrival rate, conditioned on observed gies provide insight into how prices should depend on the sales data, and prices as though the arrival rate is equal arrival rate, expected reservation price, and the length of to this expectation. Araman and Caldentey (2009) recently the horizon or discount rate. proposed a more sophisticated heuristic that takes arrival

16 Farias and Van Roy: Dynamic Pricing with a Prior on Market Response Operations Research 58(1), pp. 16–29, © 2010 INFORMS 17

rate uncertainty into account when pricing. The idea is to demonstrating relative merits analogous to our base case of use a strategy that is greedy with respect to a particular a single branch and single-customer class. approximate value function. In this paper, we propose and We establish bounds on performance loss incurred by analyze decay balancing, a new heuristic approach which decay balancing relative to an optimal policy. These bounds makes use of the same approximate value function as the indicate a certain degree of robustness. For example, when greedy approach of Araman and Caldentey (2009). customer reservation prices are exponentially distributed Several idiosyncrasies distinguish the models studied and their arrival rate is Gamma distributed, we establish in Aviv and Pazgal (2005a) and Araman and Caldentey that decay balancing always garners at least one third of (2009). The former models uncertainty in the arrival rate the maximum expected discounted revenue. Allowing for in terms of a Gamma distribution, whereas the latter uses a dependence on arrival rate uncertainty and/or restricting a two-point distribution. The former considers maximiza- attention to specific classes of reservation price distribu- tion of expected revenue over a finite horizon, whereas the tions leads to substantially stronger bounds. It is worth latter considers expected discounted revenue over an infi- noting that no performance loss bounds (uniform or oth- nite horizon. To elucidate relationships among the three erwise) have been established for the certainty equivalent heuristic strategies, we study them in the context of a com- and greedy heuristics. Furthermore, computational results mon model. In particular, we take the arrival rate to be demonstrate that our bounds are not satisfied by the greedy distributed according to a finite mixture of Gamma distribu- heuristic. tions. This is a very general class of priors and can closely Aside from Aviv and Pazgal (2005a) and Araman and approximate any bounded continuous density. We take the Caldentey (2009), there is a significant literature on dynamic objective to be maximization of expected discounted rev- pricing while learning about demand. A recent paper in this enue over an infinite horizon. It is worth noting that in regard is Aviv and Pazgal (2005b) which considers, in a the case of exponentially distributed reservation prices, discrete time setting, a partially observable Markov mod- such a model is equivalent to one without discounting but ulated demand model. As we will discuss further in §6, a where expected reservation prices diminish exponentially special case of the heuristic they develop is closely related to decay balancing. Lin (2006) considers a model identical over time. This might make it an appropriate model for to Aviv and Pazgal (2005a) and develops heuristics which certain seasonal, fashion, or perishable products. Our mod- are motivated by the behavior of a seller who knows the eling choices were made to provide a simple yet fairly gen- arrival rate and anticipates all arriving customers. Bertsimas eral context for our study. We expect that our results can and Perakis (2006) develop several algorithms for a discrete, be extended to other classes of models such as those with finite time-horizon problem where demand is an unknown finite time horizons, although this is left for future work. linear function of price plus Gaussian noise. This allows for The certainty equivalent heuristic is natural and simple least-squares-based estimation. Lobo and Boyd (2003) study to implement. It does not, however, take uncertainty in a model similar to Bertsimas and Perakis (2006) and pro- the arrival rate into account. While the greedy heuristic pose a “price-dithering” heuristic that involves the solution does attempt to do this, we demonstrate through compu- of a semidefinite convex program. All the aforementioned tational experiments that the performance of this approach work is experimental; no performance guarantees are pro- to this article and distributed this copy ascan a courtesy to degrade the author(s). severely at high levels of uncertainty in arrival vided for the heuristics proposed. Cope (2006) studies a rate. While being only slightly more complex to implement Bayesian approach to pricing where inventory levels are than the certainty equivalent heuristic and typically less so unimportant (this is motivated by sales of online services) than greedy pricing, we demonstrate that decay balancing and there is uncertainty in the distribution of reservation offers significant performance gains over these heuristics copyright price. His work uses a very general prior distribution—a especially in scenarios where the seller begins with a high Dirichlet mixture—on reservation price. Modeling this type degree of uncertainty in arrival rate. From a more qualita- of uncertainty within a framework where inventory levels holds tive perspective, uncertainty in the arrival rate and changes do matter represents an interesting direction for future work. in inventory bear appropriate directional impacts on decay In contrast with the above work, Burnetas and Smith (1998) balancing prices: uncertainty in the arrival rate increases and Kleinberg and Leighton (2004) consider nonparametric price, while a decrease in inventory increases price. In con- approaches to pricing with uncertainty in demand. How- INFORMS Additional information, including rights and permission policies, is available attrast, http://journals.informs.org/. uncertainty in the arrival rate has no impact on cer- ever, those models again do not account for inventory levels. tainty equivalent prices, while greedy prices can increase Recently, Besbes and Zeevi (2006) presented a nonpara- or decrease with inventory. metric algorithm for “blind” pricing of multiple products In addition to our base model, we consider a generaliza- that use multiple resources, similar to the model considered tion that involves a vendor with multiple branches that can in Gallego and van Ryzin (1997). Their algorithm requires offer different prices and tend to attract different classes essentially no knowledge of the demand function and is of customers. The branches share and learn from each oth- optimal in a certain asymptotic regime. However, the algo- ers’ data, and they price to maximize aggregate expected rithm requires trying each alternative among a multidimen- discounted revenue. We extend the three heuristics to the sional grid in the space of price vectors and, therefore, could context of this model and present computational results take a long time to adequately learn about demand. Farias and Van Roy: Dynamic Pricing with a Prior on Market Response 18 Operations Research 58(1), pp. 16–29, © 2010 INFORMS

The remainder of this paper is organized as follows. Let tk denote the time of the kth purchase and nt = In §2, we formulate our model and cast our pricing prob- tktk  t denote the number of purchases made by cus- lem as one of stochastic optimal control. Section 3 develops tomers arriving at or before time t. The vendor’s expected the HJB equation for the optimal pricing problem in the revenue, discounted at a rate of >0, is given by contexts of known and unknown arrival rates. Section 4  −t first introduces existing heuristics for the problem and then E e pt dnt  introduces a new heuristic—decay balancing—which is the t=0

focus of this paper. This section also discusses structural Let 0 = inft xt = 0 be the time at which the final unit of properties of the decay balancing policy. Section 5 presents inventory is sold. For t  0, nt follows a Poisson process  a computational study that compares decay balancing to with intensity F pt. Consequently (see Theorem III.T9 certainty equivalent and greedy pricing heuristics as well as in Bremaud 1981), one may show that a clairvoyant algorithm. Section 6 is devoted to a theoreti-  0 −t −t  cal performance analysis of the decay balancing heuristic. E e pt dnt = E e ptF ptdt  We provide worst-case performance guarantees that depend t=0 t=0 on initial uncertainty in market response. When the arrival We now describe the vendor’s optimization problem. We rate is Gamma distributed and reservation prices are expo- first consider the case where the vendor knows  and later nentially distributed, we prove a uniform performance guar- allow for arrival rate uncertainty. In the case with known antee for our heuristic. Section 7 discusses an extension of arrival rate, we consider pricing policies  that are mea- our heuristic to a multidimensional version of the problem surable real-valued functions of the inventory level. The which involves a vendor with multiple stores and several price is irrelevant when there is no inventory, and as a customer segments with different demand characteristics. convention, we will require that  0 =. We denote . A vendor who employs pricing This section presents computational results that are quali- the set of policies by  policy  ∈  sets price according to p =  x , where tatively similar to those in §5. Section 8 concludes.  t t xt = x0 − nt, and receives expected discounted revenue 0  −t  2. Problem Formulation J x = Ex e ptF ptdt  t=0 We consider a problem faced by a vendor who begins with where the subscripts of the expectation indicate that x = x x identical indivisible units of a product and dynamically 0 0 and p =  x . The optimal discounted revenue is given adjusts price p over time t ∈ 0 . Customers arrive t t t by J ∗ x = sup J  x, and a policy  is said to be according to a Poisson process with rate . As a conven-  ∈  optimal if J ∗ = J  . tion, we will assume that the arrival process is right contin-   Suppose now that the arrival rate  is not known, but uous with left limits. Each customer’s reservation price is rather, the vendor starts with a prior on  that is a finite an independent random variable with cumulative distribu- mixture of Gamma distributions. A Kth order mixture of tion F ·. A customer purchases a unit of the product if it is K this type is parameterized by vectors a0, b0 ∈ + and a vec- to this article and distributed this copy asavailable a courtesy to the author(s). at the time of his arrival at a price no greater than K tor of K weights w0 ∈ + that sum to unity. The density, g, his reservation price; otherwise, the customer permanently for such a prior is given by leaves the system. a For convenience, we introduce the notation F p = 1 − b 0ka0k−1e−b0k g  = w 0k  F pfor the tail probability. We place the following restric- 0k copyright k ! a0k tions on F ·: where ! denotes the Gamma function: ! x =  holds Assumption 1. 1. F · has a differentiable density f · sx−1e−s ds. The expectation and variance are E# = s=0 w a /b  $ and Var# = w a a +1/ with support +. k 0k 0k 0k 0 k 0k 0k 0k 2 2 2. F has a nondecreasing hazard rate. That is, p = b0k − $0. Any prior on  with a continuous, bounded f p/F p is nondecreasing in p. density can be approximated to an arbitrary accuracy within such a family (see Dalal and Hall 1983). Moreover, INFORMS Additional information, including rights and permission policies, is available at http://journals.informs.org/. Assumption 1 is satisfied by many relevant distribu- as we describe below, posteriors on  continue to remain tions including the exponential, extreme value, logistic, and within this family rendering such a model parsimonious as Weibull, to name a few. We introduce this assumption to well as relatively tractable. facilitate use of first-order optimality conditions to charac- The vendor revises his beliefs about  as sales are terize solutions of various optimization problems that will observed. In particular, at time t, the vendor obtains a pos- arise in our discussion. It is possible to extend our results terior that is a kth order mixture of Gamma distributions to reservation price distributions that do not satisfy this with parameters assumption, although that would require additional techni- t  cal work. atk = a0k + nt and btk = b0k + F p d =0 Farias and Van Roy: Dynamic Pricing with a Prior on Market Response Operations Research 58(1), pp. 16–29, © 2010 INFORMS 19

∗ and weights that evolve according to It is easy to show that J is the unique solution to the HJB equation H J = 0. The first-order optimality condition a /b − w a /b   tk tk k tk tk tk for prices yields an optimal policy of the form dwtk = wtk dnt k wtk atk/btk ∗ ∗ ∗ ∗  x = 1/  x + J x − J x − 1 ∗ + atk/btk − wtk atk/btk dt  for x>0. By Assumption 1 and the fact that J x  k ∗ J x − 1, the above equation always has a solution in +. ∗ Note that the vendor does not observe all customer Given that J satisfies the HJB equation, we have ∗ arrivals but only those that result in sales. Furthermore, J x lowering price results in more frequent sales and therefore  sup F p p + J ∗ x − 1 − J ∗ x if x>0 more accurate estimation of the demand rate.    p0 We consider pricing policies  that are measurable = (1)  real-valued functions of the inventory level and arrival 0 otherwise rate distribution parameters. As a convention, we require Assumption 1 guarantees that sup F p p − c is a that  0abw = for all arrival rate distribution p0 decreasing function of c on  . This allows one to compute parameters a, b, and w. We denote the domain by + J ∗ x given J ∗ x − 1 via bisection search. This offers an  =  × k × k × k and the set of policies by . Let   + + + efficient algorithm that computes J ∗ 0 J ∗ 1J∗ x z = x a b w . A vendor who employs pricing policy    t t t t t in x iterations. As a specific concrete example, consider the  ∈  sets price according to p =  z  and receives t t case where reservation prices are exponentially distributed expected discounted revenue with mean r>0. We have  0  1  −t   ∗ ∗ J z = Ez  e ptF ptdt  r exp J x−1−J x−1 if x>0 t=0 J ∗ x= r    where the subscripts of the expectation indicate that z0 = z 0 otherwise and p =  z . Note that, unlike the case with known t t It follows that arrival rate,  is a random variable in this expectation. ∗ ∗ −1 ∗ The optimal discounted revenue is given by J z = J x = rW e / exp J x − 1/r (2)  sup∈ J z, and a policy  is said to be optimal if for x>0, where W · is the Lambert W-function (the ∗  ∗ J = J . We will use the notation J for the optimal value inverse of xex). function when we wish to emphasize the dependence on . We note that a derivation of the optimal policy for the case of a known arrival rate can also be found in Araman 3. Optimal Pricing and Caldentey (2009), among other sources. An optimal pricing policy can be derived from the value The value function for the case of a known arrival rate function J ∗. The value function in turn solves the HJB will be used in the design of our heuristic for the case equation. Unfortunately, direct solution of the HJB equa- with arrival rate uncertainty. We establish here properties to this article and distributed this copy as a courtesy to the author(s). ∗ tion, either analytically or computationally, does not appear of this value function J and its associated optimal pol- icy ∗, which we will later use. We will make the following to be a feasible task and one must resort to heuristic poli-  cies. With an end to deriving such heuristic policies, we assumption to simplify our analysis. ∗ characterize optimal solutions to problems with known and Assumption 2. J x is a differentiable function of  on copyright unknown arrival rates and discuss some of their properties. + for all x ∈ . Note that this assumption is satisfied for the case of holds 3.1. The Case of a Known Arrival Rate exponential reservation prices. The following comparative ∗ We begin with the case of a known arrival rate. For each statics for  are proved in the online appendix.  x0+1   0 and  ∈ , define an operator H   → An electronic companion to this paper is available as part x0+1 by of the online version that can be found at http://or.journal. INFORMS Additional information, including rights and permission policies, is available at http://journals.informs.org/. informs.org/.   H J x= F  x  x+ J x− 1 − J x− J x ∗ Lemma 1.  x is decreasing in x (on ) and nonde- Recall that  0 =. In this case, we interpret creasing in  (on +). F   0 0 as a limit, and Assumption 1 (which ensures For a fixed inventory level, it is natural to expect decreas- a finite, unique static revenue maximizing price) implies ing returns to increases in the arrival rate ; the follow-  that H J 0 =−J 0. Furthermore, we define the ing lemma, proved in the online appendix, formalizes this dynamic programming operator intuition. Lemma 2. For all x ∈ , J ∗ x is an increasing, concave H J x= sup H  J x    function of  on  . ∈ + Farias and Van Roy: Dynamic Pricing with a Prior on Market Response 20 Operations Research 58(1), pp. 16–29, © 2010 INFORMS

3.2. The Case of an Unknown Arrival Rate for x>0. The existence of a unique solution to this equa- ˜ tion is guaranteed by Assumption 1. As derived in the Let Sx˜ a˜ b˜ =  x a b w ∈ S a + x =˜a +˜x b  b w  0 1 w = 1 denote the set of states that might be preceding section, this is an optimal policy for the case ˜ where the arrival rate is known and equal to $ z, which is visited starting at a state with x0 =˜x, a0 =˜a, b0 = b. Let  denote the set of functions J  →  such that the expectation of the arrival rate given a prior distribution ˜ with parameters a, b, and w. The certainty equivalent pol- supz∈ J z <  for all x˜ and b>0 and that have x˜ a˜ b˜ icy is computationally attractive because J ∗ is easily com- bounded derivatives with respect to the third and fourth  arguments. We define $ z to be the expectation for the puted numerically (and in some cases, even analytically) as discussed in the previous section. As one would expect, prior on arrival rate in state z, so that $ z = k wkak/bk. For each policy  ∈ , we define an operator prices generated by this heuristic increase as the inventory x decreases. However, arrival rate uncertainty bears no influ- H  J z= F  z $ z  z + J z  − J z ence on price—the price only depends on the arrival rate distribution through its expectation $ z. Hence, this pric- + DJ z − J z ing policy is unlikely to appropriately address information acquisition. where z ∈ x˜ a˜ b˜ , z = xabw and z = x − 1 a + 1bw . Here, w is defined according to w = k 4.2. Greedy Pricing wkak/ bk$ z, and D is a differential operator given by We now present another heuristic which was recently pro- d d DJ z = w $ z − a /b  J z+ J z posed by Araman and Caldentey (2009) and does account k k k dw db k k k for arrival rate uncertainty. To do so, we first introduce the We now define the dynamic programming operator H notion of a greedy policy. A policy  is said to be greedy  according to with respect to a function J if H J = HJ . The first-order necessary condition for optimality and Assumption 1 imply HJ z = sup H  J  z (3) that the greedy price is given by the solution to  1 1 + Using standard dynamic programming arguments, one  z= + J z− J z  − DJ z can show that the value function J ∗ solves the HJB equation  z $ z for z = xabw with x>0 and z = x −1a+1bw  HJ z = 0 (4) with wk = wkak/ bk$. Furthermore, a policy  is optimal if and only if H  J ∗ = 0. Perhaps the simplest approximation one might consider ∗ ∗ This equation provides a prescription for efficient compu- to J z is J$ z x, the value for a problem with known tation of an optimal policy given the value function. Unfor- arrival rate $ z. One troubling aspect of this approx- tunately, there is no known analytical solution to the HJB imation is that it ignores the variance (as also higher equation when the arrival rate is unknown, even for special moments) of the arrival rate. The alternative approximation cases such as a Gamma or two-point prior with expo- proposed by Araman and Caldentey (2009) takes variance to this article and distributed this copy as a courtesy to the author(s). nential reservation prices. Furthermore, grid-based numeri- into account. In particular, their heuristic employs a greedy cal solution methods require discommoding computational policy with respect to the approximate value function J˜ resources and generate strategies that are difficult to inter- which takes the form

pret. As such, simple effective heuristics are desirable. ∗ copyright ˜ J z= EJ x# 4. Heuristics where the expectation is taken over the random variable , holds This section presents three heuristic pricing policies. The which is drawn from a Gamma mixture with parameters first two have been considered in prior literature and the a, b, and w. J z˜ can be thought of as the expected optimal third is one we propose and analyze in this paper. value if  is to be observed at the next time instant. It is interesting to note that this approximation is in the same INFORMS Additional information, including rights and permission policies, is available at4.1. http://journals.informs.org/. Certainty Equivalent spirit as the “full information” approximation considered in Aviv and Pazgal (2005b). The greedy price, however, Aviv and Pazgal (2005a) studied a certainty equivalent is distinct from the full information price considered there. heuristic that at each point in time computes the conditional Because it can only help to know the value of , J ∗ x  expectation of the arrival rate, conditioned on observed  EJ ∗ z #. Taking expectations of both sides of this ineq- sales data, and prices as though the arrival rate is equal to uality, we see that J˜ is an upper bound on J ∗. The approx- this expectation. In our context, the price function for such ∗ ∗ imation J$ z x is a looser upper bound on J z (which a heuristic uniquely solves ∗ follows from concavity of J in ). Consequently, we have 1 the following result, whose proof can be found in the online  z = + J ∗ x − J ∗ x − 1 ce $ z $ z appendix. ce z Farias and Van Roy: Dynamic Pricing with a Prior on Market Response Operations Research 58(1), pp. 16–29, © 2010 INFORMS 21

Lemma 3. For all z ∈  , >0, decreasing in p. Consequently, holding a, b, and w fixed, as x decreases, J ∗ z decreases, and therefore ∗ z increases.  ∗ ∗ ∗ ∗ F p p $ z ∗ ∗ J z  J z˜  J x   Furthermore, because J z  J$ z x, we see that for a $ z  fixed inventory level x and expected arrival rate $ z, the where p∗ is the static revenue maximizing price. optimal price in the presence of uncertainty is higher than in the case where the arrival rate is known exactly. The greedy price in state z is thus the solution to Like greedy pricing, the decay balancing heuristic relies on an approximate value function. We will use the same 1 1 + ˜ ˜ ˜ approximation J˜. But instead of following a greedy policy gp z = + J z− J z − DJ z gp z $ z with respect to J˜, the decay balancing approach chooses a policy  that satisfies the balance condition for z = xabw with x>0 and z = x −1a+1bw  db with w = w a / b $ z. k k k k F  z We have observed through computational experiments db $ z = J˜ z (5)  z (see §6) that when reservation prices are exponentially dis- db tributed and the vendor begins with a Gamma prior with with the decay rate approximated using J z˜ . The fol- scalar parameters a and b, greedy prices can increase or lowing lemma guarantees that the above balance equation decrease with the inventory level x, keeping a and b fixed. always has a unique solution so that our heuristic is well This is clearly not optimal behavior. defined. The result is a straightforward consequence of Assumption 1 and the fact that F p ∗/  p∗$ z  4.3. Decay Balancing J z˜  J ∗ z = F  ∗ z/  ∗ z$ z, where p∗ is We now describe decay balancing, a new heuristic that the static revenue maximizing price. will be the primary subject of the remainder of the paper. Lemma 4. For all z ∈  , there is a unique p  0 such that To motivate the heuristic, we start by deriving an alterna-  ˜ tive characterization of the optimal pricing policy. The HJB F p/ p$ z = J z. Equation (4) yields Unlike certainty equivalent and greedy pricing, uncer- tainty in the arrival rate and changes in inventory level have maxF p $ z p+J ∗ z −J ∗ z+ DJ ∗ z=J ∗ z the correct directional impact on decay balancing prices. p0 Holding a, b, and w fixed, as x decreases, J z˜ decreases,

for all z = xabw and z = x − 1a+ 1bw  with and therefore db z increases. Holding x and the expected ˜ ∗ x>0 and wk = wkak/ bk$ z. This equation can be arrival rate $ z fixed, J z J$ z x, so that the decay viewed as a balance condition. The right-hand side repre- balance price with uncertainty in arrival rate is higher than sents the rate at which value decays over time; if the price when the arrival rate is known with certainty. were set to infinity so that no sales could take place for a time increment dt but an optimal policy is used thereafter, 5. Computational Study ∗ ∗

to this article and distributed this copy asthe a courtesy to current the author(s). value would become J z−J zdt. The left- hand side represents the rate at which value is generated This section will present computational results that high- from both sales and learning. The equation requires these light the performance of the decay balancing heuristic. We two rates to balance so that the net value is conserved. consider both Gamma and Gamma mixture priors, and Note that the first-order optimality condition implies that we restrict attention to exponentially distributed reservation copyright if J z  − J z+ 1/$ z DJ  z < 0 (which must neces- prices. Furthermore, we consider only problem instances −1 sarily hold for J = J ∗), where  = e and r = 1; as we will discuss in §6, this is not restrictive (see Lemmas 5 and 10). holds F p ∗ $ z=maxF p $ z p+J z −J z+ DJ  z Performance Relative to a Clairvoyant Algorithm. p∗ p0 Consider a “clairvoyant” algorithm that has access to the

∗ realization of  at t = 0 and subsequently uses the pricing if p attains the maximum in the right-hand side. Interest- ∗ INFORMS Additional information, including rights and permission policies, is available at http://journals.informs.org/. policy  . The expected revenue garnered by such a pric- ingly, the maximum depends on J only through p∗. Hence,  ing policy upon starting in state z is simply EJ ∗ x# = the balance equation can alternatively be written in the fol-  J z˜ which, by Lemma 3, is an upper bound on J ∗ z. lowing simpler form: Our first experiment measures the average revenue earned F  ∗ z using decay balancing with that earned using such a clair- $ z = J ∗ z ∗ z voyant algorithm. The results are summarized in Table 1. We consider two cases In the first,  is drawn from a which implicitly characterizes ∗. Gamma distribution with shape parameter a = 004 and This alternative characterization of ∗ makes obvious scale parameter b = 0001, which corresponds to a mean two properties of optimal prices. Note that F p/ p is of 40 and a coefficient of variation of 5. In the second, Farias and Van Roy: Dynamic Pricing with a Prior on Market Response 22 Operations Research 58(1), pp. 16–29, © 2010 INFORMS

Table 1. Performance vs. a clairvoyant algorithm. Figure 1. Performance gain over the certainty equiva- lent and greedy pricing heuristics at various x0 = 1 x0 = 2 x0 = 5 x0 = 10 x0 = 20 x0 = 40 Inventory level (%) (%) (%) (%) (%) (%) inventory levels for Gamma prior.

Performance −13 −10 −6 −37 −2 −05 Performance gain over certainty equivalent heuristic gain (Gamma) 40 Performance gain −149 −123 −67 −43 −27 −24 30 (Gamma mixture) 20

Gain (%) 10  is drawn from a two-point Gamma mixture with parame- 0 ters a = 001023 007161# , b = 000102 000102# , w = 1 2 3 4 5 6 7 8 9 10 0505# , which correspond to a mean of 40 and a coef- Initial inventory level ficient of variation of 5. These parameter values are repre- Performance gain over greedy pricing sentative of a high level of uncertainty in . As is seen in 300 Table 1, the performance of the decay balancing heuristic is surprisingly close to that of the clairvoyant algorithm for 200 both prior distributions. 100 Gain (%) Performance Relative to Certainty Equivalent and 0 Greedy Pricing Heuristics. We finally turn to studying 1 2 3 4 5 6 7 8 9 10 the performance gains offered by decay balancing relative Initial inventory level to the certainty equivalent and greedy policies. 1. Dependence on initial inventory: The gains offered 2. Dependence on initial uncertainty: The performance by decay balancing relative to the certainty equivalent and gains offered by the decay balancing heuristic are higher greedy heuristic are pronounced at lower initial inventory at higher initial levels of uncertainty. We present a lower levels, i.e., in regimes where judiciously managing inven- bound on the maximal performance gain over the greedy tory is crucial. We consider relative performance gains for and certainty equivalent heuristics for various coefficients inventory levels between 1 and 10. We consider two initial of variation of an initial Gamma prior on . See Figure 3, priors for all heuristics: a Gamma prior with a = 004 and wherein the data point for each coefficient of variation c b = 0001 (corresponding to a mean of 40 and a coefficient corresponds to an experiment with a = 1/c2, b = 0001 of variation of 5) and a Gamma mixture prior with parame- (which corresponds to a mean of 1000/c2), and an inven- ters a = 001033 009297# , b = 000103 000103# , w = tory level of 1 and 2 for the certainty equivalent and greedy 0505# (corresponding to a mean of 50 and a coeffi- pricing heuristics, respectively. These experiments indicate cient of variation of 447). Figures 1 and 2 indicate that that the potential gain from using decay balancing increases we offer a substantial gain in performance over the cer- with increasing uncertainty in , and that in fact the gain tainty equivalent and greedy pricing heuristics, especially to this article and distributed this copy as a courtesy to the author(s). over certainty equivalence can be as much as a factor of 13 at lower initial inventory levels. The greedy policy per- and over the greedy policy can be as much as a factor of 4. forms particularly poorly. In addition, as discussed ear- 3. Value of “active” learning: We examine the “learn- lier, that policy exhibits qualitative behavior that is clearly ing gains” offered by decay balancing relative to the cer- suboptimal: For a problem with a Gamma prior, mean tainty equivalent and greedy heuristics. In particular, we copyright reservation price 1, and discount factor e−1, we com- do this by comparing the gains offered by all three pricing

pute gp 1 01 01 =126<gp 4 01 01 =161> heuristics relative to a naive no-learning heuristic that, at holds gp 10 01 01 =125 so that, all other factors remain- every point in time, prices assuming that the arrival rate ing the same, prices might increase or decrease with an  is equal to the initial prior mean. We observe that these increase in inventory level. Our gain in performance falls at gains are higher at lower initial levels of uncertainty. As higher initial inventory levels. This is not surprising; intu- before, we consider a Gamma prior with mean 40 and INFORMS Additional information, including rights and permission policies, is available atitively, http://journals.informs.org/. the control problem at hand is simpler there because coefficient of variation 5 and measure performance for a we are essentially allowed to sacrifice a few units of inven- range of starting inventory levels for each of the three tory early on so as to learn quickly without incurring much heuristics as also the no-learning heuristic. We define the of a penalty. learning gain relative to the certainty equivalent heuristic

Table 2. Relative learning gains offered by decay balancing.

Inventory x0 = 1 x0 = 2 x0 = 3 x0 = 4 x0 = 5 x0 = 6 x0 = 7 x0 = 8 x0 = 9 x0 = 10

ce 109% 26.2% 27.7% 16.8% 13.1% 10.0% 5.7% 4.4% Farias and Van Roy: Dynamic Pricing with a Prior on Market Response Operations Research 58(1), pp. 16–29, © 2010 INFORMS 23

Figure 2. Performance gain over the certainty equiva- high levels of uncertainty in market response, the decay bal- lent and greedy pricing heuristics at various ancing heuristic offers a level of performance not far from inventory levels for Gamma mixture prior. that of a clairvoyant algorithm. Moreover, in all our com- putational experiments, the decay balancing heuristic dom- Performance gain over certainty equivalent heuristic 30 inates both the certainty equivalent and the greedy pricing heuristics; this is especially so at high levels of uncertainty 20 in the initial prior and at high “load factors,” i.e., scenarios where judicious inventory management is important. One 10 Gain (%) final issue is robustness—in its quest to intelligently lever-

0 age uncertainty in market response, the greedy heuristic 1 2 3 4 5 6 7 8 9 10 experiences a drastic loss in performance. With this issue in Initial inventory level mind, the following section provides a performance analy- Performance gain over greedy pricing sis that rules out the drastic performance decay experienced 150 with greedy pricing and sheds some light on the critical 100 determinants of performance for our pricing problem.

Gain (%) 50 6. Bounds on Performance Loss 0 1 2 3 4 5 6 7 8 9 10 For the decay balancing price to be a good approximation Initial inventory level to the optimal price at a particular state, one requires only a good approximation to the value function at that state (and     + as 100× J db − J nl / J ce − J nl  −100% and that rel- not its derivatives). This section characterizes the quality ative to the greedy pricing heuristic is defined similarly. of our approximation to J ∗ and uses such a characteriza- Table 2 summarizes these relative learning gains. Because tion to ultimately bound the performance loss incurred by the greedy pricing policy consistently underperforms the decay balancing relative to optimal pricing. Our analysis no-learning policy, we only report learning gains relative to will focus primarily on the case of a Gamma prior and the certainty equivalent heuristic. In addition to noting that exponential reservation prices (although we will also pro- the decay balancing heuristic offers consistent gains rela- vide performance guarantees for other types of reservation tive to the no-learning heuristic, the data reported suggest price distributions). A rigorous proof of the existence and that the decay balancing heuristic likely captures a substan- tial portion of the gains to be had from using a learning uniqueness of a solution to the HJB equation for this case scheme relative to a naive heuristic that does not learn. may be found in the online appendix. We will show that To summarize our computational experience with in this case, decay balancing captures at least 333% of the Gamma and Gamma mixture priors, we observe that even at expected revenue earned by the optimal algorithm for all to this article and distributed this copy as a courtesy to the author(s).

Figure 3. Lower bound on maximal performance gain over the certainty equivalent heuristic (left) and greedy heuristic (right) for various coefficients of variation.

copyright Maximal performance gain over certainty Maximal performance gain over greedy equivalent heuristic pricing heuristic 45 350

holds Estimated gain 40 98% confidence upper bound 300 35 98% confidence lower bound

30 250 INFORMS Additional information, including rights and permission policies, is available at http://journals.informs.org/. 25 200 20

Gain (%) 150 Gain (%) 15

10 100 5 50 0

–5 0 0 1 2 3 4 5 6 0 1 2 3 4 5 6 Coefficient of variation of initial prior Coefficient of variation of initial prior Farias and Van Roy: Dynamic Pricing with a Prior on Market Response 24 Operations Research 58(1), pp. 16–29, © 2010 INFORMS

choices of x0 > 1, a0 > 0, b0 > 0, >0, and r>0 when for exponential reservation prices, establishes the uniform reservation prices are exponentially distributed with mean bound 1/3  J db z/J ∗ z  1. r>0. Such a bound is an indicator of robustness across We begin our proof with a simple dynamic programming all parameter regimes. Decay balancing is the first heuristic result that we will have several opportunities to use. The for problems of this type for which a uniform performance proof is a consequence of Dynkin’s formula and can be guarantee is available. found in the online appendix. Before we launch into the proof of our performance Lemma 6. Let J ∈  satisfy J 0ab = 0, let  = bound, we present an overview of the analysis. Because inft J z  = 0, and let z ∈  . Then, our analysis will focus on a Gamma prior, we will sup- t 0 x˜ a˜ b˜ press the state variable w in our notation, and a and b will  −t   be understood to be scalars. Without loss of generality, we E e H J ztdt = J z0 − J z0 will restrict attention to problems with  = e−1; in partic- 0 ular, the value function exhibits the following invariance Let J  →  be bounded and satisfy J 0 = 0, let  =  where the notation J makes the dependence of the value inft J x  = 0, and let x ∈ . Then, function on  explicit. t 0  Lemma 5. Let   →  be an arbitrary policy and −t   + E e H J xtdt = J x0 − J x0 0 let    → + be defined according to  xab =  x a b/. Then, for all z ∈  , >0, J  z = J   1 xab, and, in particular, J ∗ z = J ∗ 1 xab 6.1. Decay Balancing vs. Optimal Prices As discussed in the preceding outline, we will establish Via the above lemma, we see that any performance a lower bound on J ∗ z/J z˜ to establish a lower bound bound established for a heuristic assuming a discount fac- on  z/∗ z. Let J nl z be the expected revenue gar- −1 db tor e applies to other discount factors >0 as well. nered by a pricing scheme that does not learn, upon starting In particular, given a heuristic policy  designed for dis- in state z. Delaying a precise description of this scheme −1 e −1 ∗e−1 count factor e and satisfying J z  -J z for for just a moment, we will have J nl z  J ∗ z  J z˜ 

some z ∈  , the above lemma tells us that the policy  J ∗ x. It follows that J nl z/J ∗ x  J ∗ z/J z˜ , so that −1 a/b a/b defined according to  xab= x a b/e  satisfies a lower bound on J nl z/J ∗ x is also a lower bound on   −1 ∗ −1 a/b J xae b/  -J xae b/. J ∗ z/J z˜ . We will focus on developing a lower bound on As a natural first step, we attempt to find upper J nl z/J ∗ x. and lower bounds on  z/∗ z, the ratio of the a/b db Upon starting in state z, the “no-learning” scheme decay balancing price in a particular state to the opti- assumes that  = a/b = $ and does not update this estimate mal price in that state. We are able to show that 1  over time. Assuming we begin with a prior of mean $, such ∗ ˜ J z/J z  1/. a, where . · is a certain decreasing a scheme would use a pricing policy given implicitly by function. By then specializing attention to specific reser- vation price distributions, this suffices to establish that nl ∗ ∗ ∗ ∗ to this article and distributed this copy as a courtesy to the author(s).  z =  x = 1/  x + J x − J x − 1 (6) ∗ $ $ $ $ 1/f . a  db z/ z  1, where f is some increas- ing, nonnegative function dependent upon the reservation nl Using the definition of H  and the fact that H J ∗ = 0, price distribution under consideration.  $ $ some simplification yields For example, in the case of exponential reservation price copyright distributions, f x= 1 + log x. nl H  J ∗ x = /$ − 1J ∗ x By considering a certain system under which revenue is  $ $ holds higher than the optimal revenue, we then use the bound The following two results are then essentially immediate above and a dynamic programming argument to show that consequences of Lemma 6; proofs can be found in the 1/f . a  J db z/J ∗ z  1, where J db z denotes the online appendix. expected revenue earned by the decay balancing heuristic nl ∗ INFORMS Additional information, including rights and permission policies, is available atstarting http://journals.informs.org/. in state z.Ifzis a state reached after i sales, then Lemma 7. If <$, J x  /$J$ x for all x ∈ . a = a + i>i, so that the above bound guarantees that 0 Lemma 8. If   $, J nl x  J ∗ x for all x ∈ . the decay balancing heuristic will demonstrate performance  $ that is within a factor of f . i of optimal moving forward Armed with these two results, we can establish a lower after i sales. nl ∗ bound on J z/Ja/b x. Our general performance bound can be strengthened to a uniform bound in the special case of exponential reserva- Theorem 1. For all z ∈  , tion prices. In particular, a coupling argument that uses a J nl z ! a+ 1 − ! a+ 1a+ a! a a refinement of the general bound above along with an anal-  ≡ 1/. a J ∗ x a! a ysis of the maximal loss in revenue up to the first sale a/b Farias and Van Roy: Dynamic Pricing with a Prior on Market Response Operations Research 58(1), pp. 16–29, © 2010 INFORMS 25

Proof. Setting $ = a/b,wehave trolled by db. The following result should be intuitive given our construction of the upper-bounding system and the fact nl J nl z = E J  x# ∗   that because db z   z, the probability that a customer  E 1 /$J ∗ x + 1 J ∗ x# arriving in state z chooses to purchase is higher in the sys-  <$ $ $ $ tem controlled by the decay balancing policy. The proof uses ! a+ 1 − ! a+ 1a+ a! a a = J ∗ x a dynamic programming argument and may be found in the a! a $ online appendix. where the inequality follows from the two preceding Lemma 9. For all z ∈  , and reservation price distribu- lemmas and the equality by direct integration of the tions satisfying Assumptions 1 and 2, Gamma ab density. ! · · is the incomplete Gamma function and is given by ! xy=  sx−1e−s ds.  J ub z  J ∗ z y The decay balance equation allows one to use the above Now observe that because . a is decreasing in a,we ˜ bound on the quality of our approximation J to com- have from Corollary 1 that pute bounds on the decay balance price relative to the optimal price at a given state. In particular, Corollary 1 1 Rdb z   1 establishes such bounds for exponential and logit reserva- f . a Rub z tion price distributions; see the online appendix for bounds one might establish for general classes of reservation price where f x= 1 + log x for exponential reservation price distributions. distributions and f x= 1 + log x/127 for Logit distri- butions. Taking expectations, and employing Lemma 9, we Corollary 1. For all z ∈  , and exponential reservation then immediately have the following theorem. price distributions with parameter r, Theorem 2. For all z ∈  , and exponential reservation 1  z  db  1 price distributions with parameter r, 1 + log . a ∗ z 1 J db z For all z ∈  , and logit reservation price distributions with   1 1 + log . a J ∗ z parameter r, 127  z while for logit reservation price distributions with param-  db  1 eter r, 127 + log . a ∗ z 127 J db z 6.2. An Upper Bound on Performance Loss   1 127 + log . a J ∗ z We next establish a lower bound on J db z/J ∗ z that will See Figure 4 for an illustration of these bounds. depend√ on the coefficient of variation of the prior on , 1/ a. It is worth pausing to reflect on the bound we have just to this article and distributed this copy as a courtesyLet to the author(s). established. Our performance bound does not depend on x,

−1 db −e tk R z = e  z −  db tk Figure 4. Lower bound on decay balancing ktk performance. copyright be the revenue under the decay balancing policy for a par- ticular sample path of the sales process, starting in state z, 1.0 holds and define

−1 0.8

ub −e tk ∗ ( z )

R z = e  z −  * tk

ktk )/ J ( z

db 0.6 INFORMS Additional information, including rights and permission policies, is available atThis http://journals.informs.org/. describes a system whose state evolution is identical to that under the decay balancing policy but whose revenues on a sale correspond to those that would be earned if the 0.4 price set prior to the sale was that of the optimal pricing algorithm. db db ub Of course, J z = EzR z#. Define J z = Lower bound on J 0.2 ub EzR z#, where the expectation is over tk and assumes Exponential that an arriving consumer at time tk makes a purchase with Logit probability F  z . That is, the expectation is under- 0 db tk 0 1 2 3 4 5 6 stood to be according to the dynamics of the system con- a Farias and Van Roy: Dynamic Pricing with a Prior on Market Response 26 Operations Research 58(1), pp. 16–29, © 2010 INFORMS

b, , or the parameters of the reservation price distribution. The result above is essentially a consequence of the fact that it is never the case that the ∗ controlled system sells Because the coefficient of variation√ of a Gamma prior with parameters a and b is given by 1/ a, the bound illustrates its first item before the db system, and moreover, that how decay balancing performance approaches optimal per- conditioning on , and the information available in both formance as the coefficient of variation of the initial prior is systems up to −, yields a posterior with shape parame- db decreased; below coefficients of variation of 05, the bound ter a + 1 and scale parameter b . We will also find the guarantees performance levels that are at least within 80% following technical lemma useful. of optimal. Nonetheless, the bound can be arbitrarily poor Lemma 12. For x>1, a>1, b>0, J ∗ xab  at high coefficients of variation. Next, we further specialize 205J ∗ x − 1ab. our attention to exponential reservation price distributions The result above is intuitive. It would follow, for and present a uniform performance guarantee for that case. example, from decreasing returns to an additional unit of inventory. Unfortunately, we are not able to show such a 6.3. A Uniform Performance Bound for “decreasing returns” property directly and a certain cou- Exponential Reservation Prices pling argument is used to prove the lemma (see the online We now consider the case of exponentially distributed appendix). We are now poised to prove a uniform (over prices. In particular, we have F p = exp −p/r, where x>1) performance bound for our pricing scheme. r>0. In light of the following lemma (where the notation Theorem 3. For all z ∈  with x>1, J   r makes the dependence of the value function on  J db z and r explicit), we can assume without loss of generality  1/3 J ∗ z that the mean reservation price, r,is1. Proof. In Lemma 11, we showed Lemma 10. Let   →  be an arbitrary policy and let −1 ∗ + J ∗ z  Ee−e  e−  −db∗ + J ∗ x − 1a+ 1bdb#    →  be defined according to  z = 1/r z.  + ∗   r  1 −  −db ∗ db Then, for all z ∈  , >0, r>0, J z = rJ z, + 1 − e J xa + 1b # ∗r ∗1 and, in particular, J z = rJ z. Now,

−1 ∗ Our proof of a uniform performance bound will use −e  −  −db ∗ ∗ db e e  +J x−1a+1b # Theorem 2 along with a coupling argument to bound per- ∗ + 1−e−  −dbJ ∗ xa+1bdb formance loss up to the time of the first sale. Begin by  −1 ∗ considering the following coupling (a superscript “db” on −e  −  −db ∗ ∗ db e e  +J xa+1b  a variable indicates that the variable is relevant to a sys- −1 ∗ −e  −  −db ∗ ∗ db e e  +205J x−1a+1b  tem controlled by the db policy). For an arbitrary pol- db  −1 icy  · ∈ , the sales processes n and n are coupled −e  db db t t e db +205 1+log. a+1J x−1a+1b  in the following sense: Denote by tk the points of the −1 −e  db db Poisson process corresponding to customer arrivals (not e 205 1+log. a+1 db +J x−1a+1b 

sales) to both systems. Assume that  −   − . Then, a ∗ to this article and distributed this copy as a courtesy to the author(s). dbt t where the first inequality is because J is nondecreasing  k k jump in n at time tk can occur if and only if a jump in x. The second inequality follows from Lemma 12. The db ∗ occurs in n at time tk. Furthermore, conditioned on a third inequality follows from the fact that   db  1 ∗ db  ∗ − −db jump in n at tk, the jump in n occurs with prob- so that  e  dbe and from Theorem 2. Finally, ability exp − t− − dbt− . The situation is reversed if taking expectations of both sides, we get copyright k k −1 ∗ db − >t− . Let  denote the time of the first sale for the −e  −  −db ∗ ∗ db tk k Ee e  +J x−1a+1b #  system, i.e.,  = inft ndb = 1. In the context of this  db t ∗ −  −db ∗ db holds ∗ + 1−e J xa+1b # coupling, consider the optimal (i.e.,  ) and db controlled  systems. Denoting 205 1+log. a+1

−1  ·Ee−e   +J db x−1a+1bdb# ∗ −t ∗  ∗ db  J z = E e  ztF  zt dt z0 = z  INFORMS Additional information, including rights and permission policies, is available at http://journals.informs.org/. t=0 205 1+log. 1J db z we then have the following lemma. Thus, J db z 1 Lemma 11. For all z ∈  ,   1/3  J ∗ z 205 1 + log . 1 ∗ −e−1 − ∗−  ∗ ∗ db Perhaps the most crucial point borne out in the perfor- J z  e e db  + J x − 1a+ 1b #  mance analysis we have just presented is that the decay ∗ −  −db ∗ db + 1 − e J xa + 1b   balancing heuristic is robust; our analysis precludes the drastic performance breakdown observed for the greedy ∗ ∗ ∗ db where  =  xab  and db = db xab . pricing heuristic in our computational experiments. Farias and Van Roy: Dynamic Pricing with a Prior on Market Response Operations Research 58(1), pp. 16–29, © 2010 INFORMS 27

7. Multiple Stores and Consumer satisfying Jˆ∗ 0ab= 0, and that the corresponding opti-

Segments mal policy for xi > 0 is given by We now explore extensions of decay balancing to a model  a /b  with multiple stores and consumer segments. We do not ∗ z = r + Jˆ∗ z − ij j j Jˆ∗ x − e a+ e b i  i j attempt to extend our performance analysis to this more j i general model but instead present numerical experiments, 1 d − Jˆ∗ z (7) the goal being to show that decay balancing demonstrates  db the same qualitative behavior as in the one-store—one- i j j customer segment case we have studied to this point. Now, assuming that the j s are known perfectly a priori, More formally, we consider a model with N stores and it is easy to see that the control problem decomposes across M consumer segments. Each store is endowed with an stores. In particular, the optimal strategy simply involves initial inventory x0i for i ∈ 1N. Customers from ∗ store i using as its pricing policy pti =  xti, where class j, for j ∈ 1M, arrive according to a Poisson i i = j ij j . Consequently, a certainty equivalent policy process of rate j , where j is a Gamma distributed ran- ∗ would use the pricing policy CE zi =  xi. dom variable with shape parameter a and scale parameter i 0j We can also consider as an approximation to Jˆ∗ the fol- b0j. An arriving segment j customer considers visiting a single store and will consider store i with probability  . lowing upper bound (which is in the spirit of the upper ij bound we derived in §4): Consequently, each store i sees a Poisson stream of cus- tomers having rate   . We assume without loss of j ij j generality that  = 1. We assume that customers in J z¯ = E J ∗ x   i ij i i each segment have exponential reservation price distribu- i tions with mean r, and moreover, that upon a purchase the The analogous greedy pricing policy  is then given by store has a mechanism in place to identify what segment gp ¯ ˆ∗ the purchasing customer belongs to. (7) upon substituting J · for J · in that expression. N Motivated by the decay balancing policy derived for the Let pt ∈  , t ∈ 0  represent the process of prices charged at the stores over time. Let nj represent the total single-store case, we consider using the following pricing ti policy at each store: number of type j customers served at store i up to time t, and let nj = nj . The parameter vectors a and b are then t i ti r  updated according to  z = r log i  db i ∗ EJ xi# t i −pi/r atj = a0j + ntj and btj = b0j + e d =0 i The above pricing equation assumes that, moving forward, each store will operate as a separate entity. Nonetheless, Our state at time t is now zt = xtatbt. As before, we will consider prices generated by policies  that are the heuristic incorporates joint learning across stores and measurable, nonnegative vector-valued functions of state, continues to account for the level of uncertainty in mar- ket size in the pricing process. Furthermore, the structural to this article and distributed this copy asso a courtesy that to the author(s). pt =  zt  0. Letting  denote the set of all such policies, our objective will be to identify a policy ∗ ∈  properties discussed in the single-store case are retained. that maximizes i Figure 5. Performance relative to a clairvoyant ˆ −pti/r J z = Ez  tie dt  algorithm. copyright i 0 i j Performance gain over clairvoyant algorithm where  = inft j nti = x0i and i = j ij aj /bj . holds We define the operator ij aj /bj    − zi/r –2 H J z= ie  zi + i j i –4 INFORMS Additional information, including rights and permission policies, is available at http://journals.informs.org/. –6 · J x− eia+ ej b− J z –8 d Gain (%) − zi/r + e J z − J z –10 j dbj –12 where ek is the vector that is 1 in the kth coordinate and 0 8 ˆ∗ ˆ∗ 8 in other coordinates. One might show that J = J is the 4 Store 2 inventory 4 unique solution to 2 2 sup H  J z= 0 ∀z 1 1 ∈ Store 1 inventory Farias and Van Roy: Dynamic Pricing with a Prior on Market Response 28 Operations Research 58(1), pp. 16–29, © 2010 INFORMS

Figure 6. Performance relative to the certainty equivalent heuristic (left) and greedy pricing heuristic (right).

Performance gain over certainty equivalent heuristic Performance gain over greedy pricing

900 10 700

5 500 300 Gain (%)

Gain (%) 0 100

0 –5 8 8 1 4 4 8 2 Store 2 inventory 4 Store 1 inventory4 2 2 2 8 1 1 1 Store 1 inventory Store 2 inventory

The joint learning under this heuristic does not, however, achieves near-optimal performance even on problems with account for inventory levels across stores. high levels of uncertainty in market response. Pricing poli- We now present computational results for the three cies generated by decay balancing have the appealing struc- heuristics. Our experiments will use the following model tural property that, all other factors remaining the same,

parameters. We take N = 2, M = 2 and assume that ij = the price in the presence of uncertainty in market response 1/2 for all i, j, and further that we begin with prior param- is higher than the corresponding price with no uncertainty.

eters a1 = a2 = 004 and b1 = b2 = 0001 (which cor- This is reasonable from an operational perspective and is in responds to a mean of 40 and a coefficient of variation fact a property possessed by the optimal policy. Our analy- of 5). As usual,  = e−1, r = 1. Our first set of results sis also demonstrated a uniform performance guarantee for (Figure 5) compares the decay balancing heuristic’s perfor- decay balancing when reservation prices are exponentially mance against that of a clairvoyant algorithm which as in distributed, which is an indicator of robustness. §5 has perfect a priori knowledge of .AsintheN = 1, Two heuristics proposed for problems of this nature prior M = 1 case, our performance is quite close to that of the to our work were the certainty equivalent heuristic (by clairvoyant algorithm. Figure 6 compares decay balanc- Aviv and Pazgal 2005a) and the greedy pricing heuris- ing performance to the certainty equivalent heuristic and tic (by Araman and Caldentey 2009). Our computational the greedy heuristic. Figure 6 is indicative of performance results suggest that decay balancing offers significant per- that is qualitatively similar to that observed for the N = 1, formance advantages over these heuristics. These advan- M = 1 case; there is a significant gain over certainty equiv- tages are especially clear at high levels of uncertainty in to this article and distributed this copy as a courtesy to the author(s). alence at lower inventory levels, but this gain shrinks as market response, which is arguably the regime of greatest inventory level increases. The performance of the greedy interest. Decay balancing relies only on a good approxi- heuristic is particularly dismal; one explanation for which mation to the value of an optimal policy at a given state. is that d/db J z¯ is a potentially poor approximation j j This is in contrast with greedy pricing that requires a good copyright ˜∗ to j d/dbj J z. approximation not only to value but further to derivatives of value with respect to the scale parameter. At the same holds 8. Discussion and Conclusions time, uncertainty in arrival rate and changes in inventory The dynamic pricing model proposed by Gallego and van levels bear the appropriate directional impact on decay bal- Ryzin (1994) is central to a large body of the revenue ancing prices: uncertainty in the arrival rate calls for higher management literature. This work considered an important prices than in corresponding situations with no uncertainty, INFORMS Additional information, including rights and permission policies, is available atextension http://journals.informs.org/. to that model. In particular, we considered incor- while a decrease in inventory calls for an increase in prices. porating uncertainty in the customer arrival rate or “market In contrast, uncertainty in the arrival rate has no impact on response,” which is without doubt important in many indus- certainty equivalent prices, while greedy prices can increase tries that practice revenue management. or decrease with decreasing inventory. We proposed and analyzed a simple new heuristic for this Our computational study and performance analysis were problem: decay balancing. Decay balancing is computation- focused on exponentially distributed reservation prices and ally efficient and leverages the solution to problems with no gamma priors, but we expect favorable performance for uncertainty in market response. Our computational experi- other distributions as well. In particular, the analysis of ments (which focused on Gamma priors and exponentially Theorem 1 can be extended to Gamma mixture priors yield- distributed reservation prices) suggest that decay balancing ing encouraging estimates on the quality of approximation Farias and Van Roy: Dynamic Pricing with a Prior on Market Response Operations Research 58(1), pp. 16–29, © 2010 INFORMS 29

provided by J˜. Because the decay balancing price at state acknowledges support from the INFORMS MSOM society z is likely to be a good approximation to the optimal price in the form of a best student paper award for a prelimi- at z if J z˜ is a good approximation to J ∗ z, this suggests nary version of this work. This research was supported in that decay balancing is likely to do a good job of approx- part by the National Science Foundation through grant IIS- imating the optimal price for general reservation price dis- 0428868. tributions and priors on arrival rate, which in turn should lead to superior performance. References There is ample room for further work in the general area of pricing with uncertainty in market response and Araman, V. F., R. Caldentey. 2009. Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5) 1169–1188. other factors that impact demand. One direction is con- Aviv, Y., A. Pazgal. 2005a. Pricing of short life-cycle products through sidering more complex models. We considered applying active learning. Working paper, Washington University, St. Louis. the decay balancing heuristic to a problem with multiple Aviv, Y., A. Pazgal. 2005b. A partially observed Markov decision process for dynamic pricing. Management Sci. 51(9) 1400–1416. stores and consumer segments; our computational results Bertsimas, D., G. Perakis. 2006. Dynamic pricing: A learning approach. there are promising. There are other models one might D. Hearn, S. Lawphongpanich, eds. Mathematical and Computational hope to consider; for example, the multiproduct model pro- Models for Congestion Charging. Springer, Berlin, 45–79. Besbes, O., A. Zeevi. 2006. Blind nonparametric revenue management: posed by Gallego and van Ryzin (1997). Another poten- Asymptotic optimality of a joint learning and pricing method. tial direction is exploring new approximations to the value Submitted. function beyond the approximation considered here and Bremaud, P. 1981. Point Processes and Queues: Martingale Dynamics, applying such approximations with either the greedy pric- 1st ed. Springer-Verlag, Berlin. Burnetas, A. N., C. E. Smith. 1998. Adaptive ordering and pricing for ing or decay balancing heuristics. Finally, it would be inter- perishable products. Oper. Res. 48(3) 436–443. esting to extend the approaches in this paper to problems Cope, E. 2006. Bayesian strategies for dynamic pricing in e-commerce. with uncertainty in other factors that impact demand, such Naval Res. Logist. 54(3) 265–281. Dalal, S. R., W. J. Hall. 1983. Approximating priors by mixtures of natural as price elasticity. conjugate priors. J. Roy. Statist. Soc. B 45(2) 278–286. Gallego, G., G. van Ryzin. 1994. Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Sci. 40(8) 9. Electronic Companion 999–1020. Gallego, G., G. van Ryzin. 1997. A multiproduct dynamic pricing problem An electronic companion to this paper is available as part and its applications to network yield management. Oper. Res. 45(1) of the online version that can be found at http://or.journal. 24–41. informs.org/. Kleinberg, R., F. T. Leighton. 2004. The value of knowing a demand curve: Bounds on regret for on-line posted-price auctions. Proc. 44th IEEE Sympos. Foundations Comput. Sci. FOCS 2003, 594–605. Lin, K. Y. 2006. Dynamic pricing with real-time demand learning. Eur. J. Acknowledgments Oper. Res. 174(1) 522–538. The first author thanks Mike Harrison, Paat Rusmevichien- Lobo, M. S., S. Boyd. 2003. Pricing and learning with uncertain demand. Submitted. tong, and Gabriel Weintraub for useful discussions, sugges- Talluri, K. T., G. J. van Ryzin. 2004. The Theory and Practice of Revenue tions, and pointers to the literature. The first author also Management. Springer Science+Business Media, Berlin. to this article and distributed this copy as a courtesy to the author(s). copyright holds INFORMS Additional information, including rights and permission policies, is available at http://journals.informs.org/.