
Revenue Management and Learning in Systems of Reusable Resources

by Zachary Davis Owen

B.S., Cornell University (2011)

Submitted to the Sloan School of Management in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Operations Research at the Massachusetts Institute of Technology, June 2018.

© Massachusetts Institute of Technology 2018. All rights reserved.

Author: Sloan School of Management, May 18, 2018

Certified by: David Simchi-Levi, Professor of Engineering Systems, Professor of Civil and Environmental Engineering, Thesis Supervisor

Accepted by: Dimitris Bertsimas, Boeing Professor of Operations Research, Co-director, Operations Research Center

Submitted to the Sloan School of Management on May 18, 2018, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Operations Research

Abstract

Many problems in revenue management and operations management more generally can be framed as problems of resource allocation. This thesis focuses on developing policies and guarantees for resource allocation problems with reusable resources and on learning models for personalized resource allocation.

First, we address the problem of pricing and assortment optimization for reusable resources under time-homogeneous demand. We demonstrate that a simple randomized policy achieves at least one half of the optimal revenue in both the pricing and assortment settings. Further, when prices are fixed a priori, we develop a method to compute the optimal randomized state-independent assortment policy. The performance of our policies is evaluated in numerical experiments based on arrival rate and parking time data from a municipal parking system. Though our algorithms perform well, our computational results suggest that dynamic pricing strategies are of limited value in the face of a consistent demand stream.

Motivated in part by the computational results of the previous section, in the second section we consider the problem of pricing and assortment optimization for reusable resources under time-varying demand. We develop a time-discretization strategy that yields a constant-factor performance guarantee relative to the optimal continuous-time policy. Additionally, we develop heuristic methods that implement a bid-price strategy between available resources based on pre-computed statistics and that are computable in real-time. These methods effectively account for the future value of resources, which in turn depends on the future patterns of demand. We validate our methods on arrival patterns derived from real arrival rate patterns in a parking context.

In the third part, we consider the problem of learning contextual pricing policies more generally. We propose a framework for making personalized pricing decisions based on a multinomial logit model with features based on customer attributes, item attributes, and their interactions. We demonstrate that our modeling procedure is coherent, and in the well-specified setting we demonstrate finite-sample bounds on the performance of our strategy based on the size of the training data.

Thesis Supervisor: David Simchi-Levi
Title: Professor of Engineering Systems, Professor of Civil and Environmental Engineering

Acknowledgments

I would like to thank my advisor, David Simchi-Levi, for his great support over the last five years. He has been a steady guiding hand through the many ups and downs of a graduate research career. I would also like to acknowledge the guidance of my thesis committee members, Stephen Graves and John Tsitsiklis. They are both true legends in our field and provided helpful feedback and advice at each stage of the thesis-writing process. In addition, I would like to thank the faculty and staff of the Operations Research Center for putting together one of the finest research communities in the world. Their hard work in managing this program is fundamental to the success of all the students here. I will always be proud to be a graduate of such an iconic program.

I would also like to thank the members of David Simchi-Levi's research group, both past and present, for providing useful feedback in the office as well as in our weekly seminars. I'd like to give a special acknowledgement to Clark, Michael, and Louis. The support and direction we all shared during our Friday check-ins was invaluable to me in this final stretch. I also appreciate Peter, He, Kris, Mila, and Will as friends and models of productive researchers.

My time at MIT would have been much less joyful if not for the amazing people I have met from the ORC and around the institute. In particular, I am grateful to my roommates Andrew, Arthur, Virgile, and Mathieu for all the late night movies, whiteboard conversations, gathering-hosting, and for dealing gracefully with all my quirks. I'd also like to thank my ORC row-mates Max Biggs, Max Burq, Rim, and Emily for being great friends and conversation partners, sometimes to the detriment of our personal productivity. I'd also like to thank the other members of the crew Andrew V.B., Anna, Cécile, Charlie, Colin, Elisabeth, Joey, Ludovica, Mariapaola, Hugh, Scott, Stefano, Sébastien, Zach Saunders, and Zebulon. From our weekly gatherings to Farkasy Football to Las Vegas, New Orleans, Cannon, Costa Rica, and beyond, our shared adventures will always be a highlight of my time here. I would also like to acknowledge my fellow students and friends Alexander R., Daniel C., Daniel S., Eli, Evan, Jean, Jehangir, Kevin, Ilias, Martin, Nishanth, Rajan, Velibor, and Will H. We made it through this and grew as people together. Each of you has inspired me in your own unique way.

I owe a great debt of gratitude to my undergraduate research advisor, Peter Frazier, for his belief and investment in me during such a formative time. I am also grateful for my lasting friendships with Andy, Colleen, Erica, Kerry, Paul, and Peter. We have come a long way from our time at Cornell, and our intermittent gatherings have been a welcome respite from academic life. I hope our friendship continues to grow as our lives evolve. I would also like to thank my teammates at Armoire for their patience and support while I was writing this thesis. I am happy to call each of you friends as well as colleagues. Further, I would like to acknowledge the staff of the Martin Trust Center for the inspiring summer I spent in the Delta V program and for generously sharing their much-needed coffee.

I am exceedingly grateful to my parents, David and Naomi Owen. My father David has always served as an educational and entrepreneurial inspiration, and I aspire to someday acquire some small fraction of his positive, can-do attitude. My mother Naomi has always been the one person who understands me best, and I truly appreciate her unwavering and selfless support. My sister Anna and brother Samuel have also been wonderful siblings, and I hope our bond only strengthens as we grow further into adulthood. Finally, thank you Miriam for being a caring partner and for making this last year one I can remember fondly.

Contents

1 Introduction
  1.1 Overview of Thesis
  1.2 Literature Review
    1.2.1 Pricing and Assortment Optimization
    1.2.2 Reusable Resources
    1.2.3 Online Matching

2 Revenue Management for Reusable Resources under Time-Homogeneous Demand
  2.1 Model Formulation
  2.2 Assortment Policies
    2.2.1 Deterministic Linear Programming Upper Bound
    2.2.2 State-Independent Performance Guarantees
    2.2.3 Optimal State-Independent Assortment Policy
    2.2.4 Transient Revenue Loss Guarantees
    2.2.5 Dynamic Assortment Policies
  2.3 Computational Case Study - Assortment Only
  2.4 Pricing and Assortment Decisions
    2.4.1 The Opportunistic Pricing Case
    2.4.2 The Fair-pricing Case
    2.4.3 Fair-pricing with Single-item Assortments
    2.4.4 Computational Strategies for Assortments with 푀 > 1
    2.4.5 Fair Dynamic Pricing Policies
  2.5 Computational Case Study - Pricing and Assortment
  2.6 Conclusion

3 Revenue Management for Reusable Resources under Time-Varying Demand
  3.1 Assortment Model Formulation
    3.1.1 System State and Policies
  3.2 Assortment Policies
    3.2.1 Linear Programming Upper Bound
    3.2.2 Randomized Algorithm Guarantees
    3.2.3 Dynamic Policies
  3.3 Pricing Under Time-varying Demand
    3.3.1 The Opportunistic Pricing Case
    3.3.2 The Fair Pricing Case
    3.3.3 Pricing Policy Performance
    3.3.4 Dynamic Policies
  3.4 Computational Case Study
    3.4.1 Assortment Experiments
    3.4.2 Pricing Experiments
  3.5 Conclusion

4 Statistical Learning Guarantees for Personalized Pricing
  4.1 Literature Review
  4.2 A General Model
    4.2.1 Customized Pricing Model
  4.3 Algorithm
  4.4 Theory: Well-specified Model Setting
    4.4.1 Notation and Preliminaries
    4.4.2 Theoretical Results for Customized Pricing
  4.5 Extensions of Theory
    4.5.1 Misspecified Model Setting
    4.5.2 High-Dimensional Setting
  4.6 Extensions and Future Work

5 Concluding Remarks
  5.1 Summary
  5.2 Future Directions

A Technical and Experimental Results for Chapter 2
  A.1 Proofs
    A.1.1 Proof of Proposition 1
    A.1.2 Proof of Proposition 2
    A.1.3 Proof of Lemma 4
    A.1.4 Proof of Proposition 4
    A.1.5 Proof of Proposition 5
  A.2 Computational Experiment Model-Fitting Methodology
  A.3 Computational Experiment Model Sensitivity

B Technical Results, Experimental Results, and Extensions for Chapter 3
  B.1 Proofs
    B.1.1 Proof of Proposition 6
    B.1.2 Proof of Lemma 5
    B.1.3 Proof of Proposition 7
    B.1.4 Proof of Proposition 8
  B.2 Extension to the Infinite Time Horizon Setting
  B.3 Tables of Computational Experiment Results
    B.3.1 Assortment Results
    B.3.2 Assortment Results
    B.3.3 Pricing Results

C Technical Results for Chapter 4
  C.1 Proofs
    C.1.1 Proof of Lemma 6
    C.1.2 Proof of Lemma 7

List of Figures

2-1 Absolute and relative performance of state-independent policies in steady-state versus the arrival rate scaling factor.
2-2 Absolute and relative performance of state-dependent policies in steady-state versus the arrival rate scaling factor.
2-3 Absolute and relative performance of dynamic fair-pricing and offer policies in steady-state versus the arrival rate scaling factor.
2-4 Comparison of performance of dual-based dynamic fair-pricing policies versus fixed-price policies in steady-state versus the arrival rate scaling factor.

3-1 Absolute performance versus arrival scaling with LP-based upper bound.
3-2 Policy performance versus arrival scaling for the price by meter assortment scenario.
3-3 Policy performance versus arrival scaling for the price by customer type assortment scenario.
3-4 Policy performance versus arrival scaling for the pricing scenario.

List of Tables

2.1 Expected hitting times, E[푇*], in units of service times

A.1 Steady State Revenue Rate Sensitivity, 훽 = 0.4
A.2 Steady State Revenue Rate Sensitivity, 훽 = 1.0
A.3 Steady State Revenue Rate Sensitivity, 훽 = 1.6

B.1 Assortment Experiment Price by Resource T=20
B.2 Assortment Experiment Price by Resource T=40
B.3 Assortment Experiment Price by Resource T=80
B.4 Assortment Experiment Price by Customer T=20
B.5 Assortment Experiment Price by Customer T=40
B.6 Assortment Experiment Price by Customer T=80
B.7 Pricing Experiment T=20
B.8 Pricing Experiment T=40
B.9 Pricing Experiment T=80

Chapter 1

Introduction

Many problems in revenue management and operations management more generally can be characterized as challenges of probabilistic resource allocation in which the operator seeks to make the most efficient use of their limited resources. This is the case in essentially every industry, from effectively pricing and rationing vehicles or parking spaces in the world of transportation to allocating beds or appointment time in a healthcare setting. In all of these settings the challenges faced by the operator derive from two primary factors: the limited quantity of resources available and the uncertainty inherent in the dynamics of the underlying system. The two primary sources of uncertainty we consider are randomness in demand and randomness in the evolution of the state of the resources. The former element owes to a number of factors, such as the operator's uncertainty about the overall size of demand and the unobservable factors affecting the way in which each customer selects between the resources offered by the operator. In this thesis we develop theoretical tools and practical techniques for managing these challenges in the face of limited resources, with a focus on the challenges that are introduced when the resources in question are reusable.

Two commonly used resource allocation tools in the revenue management literature are pricing and assortment optimization. Pricing is the canonical tool in both economics and revenue management used to allocate a limited quantity of resources to a larger pool of potential demand. The simplest pricing mechanism is simply to set a single price for each resource based on expectations of future demand. Two more sophisticated pricing mechanisms we consider here are dynamic and contextual dynamic pricing. A dynamic pricing strategy gives the operator the flexibility to adjust pricing in response to evolving global conditions such as the current level of inventory and refined estimates of the expected future demand. As businesses capture ever more data about their customers and transactions, there has been growing interest in contextual pricing which, in addition to the global considerations above, allows the operator to set their prices based on transaction-level details. These transaction features could include variables such as time of day, the customer's zip code, and any other relevant information that could help to inform the operator's pricing strategy. Here we develop techniques for contextual dynamic pricing both in general and more specifically in the setting of reusable resources.

Another important operational tool for resource allocation is assortment optimization. By modeling the process by which individual customers select between various offered choices the operator is able to intelligently select the initial assortment to optimize revenue. The assortment optimization framework is applicable in both brick and mortar settings as well as in modern e-commerce applications. The latter settings are especially interesting due to the ability of online retailers to display a set of products personalized to each user's tastes. This thesis also develops techniques for contextual dynamic assortment optimization in the setting of reusable resources.

In general resource allocation problems, the operator possesses a predefined quantity of resources and must make pricing and assortment decisions in the face of a stream of incoming demand. By employing a well-designed strategy, the operator seeks to optimize their revenue stream to meet their business objectives. In much of the current literature on resource allocation problems in revenue management, the prevailing assumption is that the resources available to the operator are consumable. That is, at least for the selling period under consideration, each customer purchase results in the loss of the underlying resource.

In this standard inventory depletion setting, the primary challenge lies in accounting for the randomness in future demand, given the current inventory level. If future demand is heavy relative to the remaining resource level, one might offer a smaller assortment of items at higher prices to extract more value from the remaining inventory. On the other hand, if future demand is light, a manager would likely be better off offering a wider assortment at lower prices to generate marginal revenue. Many papers in the literature, such as Maglaras and Meissner (2006) and Gallego et al. (2016), for example, have developed effective pricing and assortment strategies to mitigate the risk inherent in random future demand.

However, in many practically relevant settings, the resources available to the operator are renewable rather than consumable. This is the case, for example, in the management of a network of parking facilities, a car-sharing or bicycle-sharing network, or a cluster of cloud computing resources. In such settings, a purchasing customer makes use of the resource for the duration of their service, but upon completion the resource is restored to the operator for reuse. In many such settings, the length of time the resource is to be utilized by each customer is unknown to the operator at the time of sale and therefore must be modeled as random. This introduces additional complexity, since in addition to the risk of future demand, the random future state of the resources must be accounted for when making pricing and assortment decisions.

1.1 Overview of Thesis

This thesis is divided into three parts, with each addressing a distinct aspect of the general resource allocation problem.

In Chapter 2, we address the problem of pricing and assortment optimization for reusable resources under time-homogeneous demand. This problem isolates the complexity introduced by stochastic service times when the demand rate is constant over time. We demonstrate that a simple randomized policy achieves at least one half of the optimal revenue in both the pricing and assortment settings. Our computational results suggest that dynamic pricing is of limited value in the face of a consistent demand stream. Motivated by this, we develop a method to compute the optimal randomized state-independent assortment policy when prices are fixed a priori. The performance of our policies is evaluated in numerical experiments based on arrival rate and parking time data from the municipal parking system of Islington, a borough of London.

Motivated in part by the computational results of the previous section, in Chapter 3 we consider the problem of dynamic pricing and assortment optimization for reusable resources under time-varying demand. We develop a novel time-discretization strategy that yields a constant-factor performance guarantee relative to the optimal, continuous-time policy. Additionally, we develop heuristic methods that implement a bid-price strategy between available resources based on pre-computed statistics and that are computable in real-time. These methods effectively account for the future value of resources, which in turn depends on the future patterns of demand. We validate our methods on arrival patterns derived from real arrival rate patterns in Islington.

In Chapter 4, we consider the problem of learning contextual pricing policies more generally. We propose a framework for personalized pricing decisions based on a multinomial logit model with features based on customer attributes, item attributes, and their interactions. We demonstrate that our modeling procedure is coherent, and in the well-specified setting we demonstrate finite-sample bounds on the performance of our strategy based on the size of the training data.

Finally, we conclude in Chapter 5 by providing summary remarks and proposing interesting directions for future work. The more technical proofs for each chapter are provided in the associated appendices.

In the following section, we introduce ideas and references that are broadly applicable throughout this thesis.

1.2 Literature Review

We first introduce references related to the general problems of pricing, assortment optimization, and the dynamic variations thereof present in the revenue management literature. Subsequently, we introduce references related to reusable resources in operations management. Finally, we present related work from the literature on online matching.

1.2.1 Pricing and Assortment Optimization

With the rise of e-commerce, pricing, and especially the dynamic pricing it facilitates, has emerged as a popular research area within revenue management. In particular, dynamic pricing with demand learning has recently been a popular theme in the revenue management literature; a survey of early work in this field can be found in Aviv et al. (2012). In these settings it is often assumed that the demand function of the entire population is unknown, but it is possible to obtain information about the structure of demand through price experimentation. A common modeling assumption (as used in Broder and Rusmevichientong (2012) and Keskin and Zeevi (2014), for example) is that the true market demand function is specified by a parametric choice model. Using price experimentation, these works develop pricing policies that work to minimize regret in comparison to a clairvoyant who knows the full demand model. Another approach to learning and dynamic pricing was recently explored in Bertsimas and Vayanos (2014). The authors formulate a robust optimization problem that captures the exploration-exploitation trade-off in dynamic pricing with unknown demand and provide a tractable approximation.

Beginning with the seminal work of van Ryzin and Mahajan (1999), much work in the revenue management literature has focused on using customer choice to balance demand for limited resources. Researchers draw a distinction between the static assortment optimization problem and the dynamic assortment optimization problem. In the former case, given the choice model of customer behavior, the operator must decide how to structure their assortment, taking care to balance the tradeoff between increasing market share and internal cannibalization of high-revenue products. In their pioneering work, Talluri and van Ryzin (2004a) show that under the multinomial logit (MNL) choice model, the static assortment optimization problem can be solved efficiently by considering only revenue-ordered assortments. Similarly, efficient algorithms have been developed to solve the static assortment optimization problem under a number of other demand models. These include cases in which demand is specified by the Markov chain choice model (Blanchet et al. 2013), the nested logit model when the nests are not too dissimilar (Davis et al. 2014), and the consider-then-choose model as proposed in Aouad et al. (2015). Other work has considered static assortment optimization under various business constraints. For example, Rusmevichientong et al. (2010b) and Gallego and Topaloglu (2014) demonstrate algorithms to solve the capacitated assortment optimization problem under the MNL and nested logit models, respectively. Much of our work does not rely on the assumption of a specific choice model; however, in Chapters 2 and 3 we assume that the operator has access to a subroutine that is able to repeatedly solve or approximate instances of the static assortment optimization problem under the relevant choice model.

On the other hand, Liu and Van Ryzin (2008) as well as Bront et al. (2009) analyze the assortment problem in the dynamic network revenue management setting, in which inventory is taken into consideration. They propose a choice-based deterministic linear program (CDLP) that determines the set of efficient assortments and demonstrate their asymptotic optimality. In the case of consumable inventory, it is difficult to establish how these efficient sets can be translated into a policy directly, and so both papers propose the use of dual-based heuristics to derive practically effective assortment policies. Our technique for establishing upper bounds on the achievable revenue resembles this approach; however, our analysis takes into account the time-varying nature of demand as well as the stochastic duration of resource utilization incurred by each purchasing customer.

Another segment of the literature on dynamic assortment optimization focuses on scenarios in which customer types can be distinguished from one another. In such settings personalized assortment decisions can be used as a tool for capacity rationing to dynamically balance demand between items depending on the levels of inventory remaining. Bernstein et al. (2015) analyze this setting from the perspective of dynamic programming, obtaining structural results on the optimal threshold policy in the case of two products and related heuristics. Golrezaei et al. (2014) use an approach based on the theory of online bipartite matching to develop an inventory balancing policy that achieves a competitive ratio of (1 − 1/푒) in their setting. While customer types play a key role in our formulations, the methods of analysis applied in these settings do not carry over directly to the case of reusable resources.

1.2.2 Reusable Resources

The literature on reusable resources, while of more recent interest in revenue management, has a long history in queuing theory and telecommunications. Loss systems represent the fundamental abstraction we use in our analysis of systems of reusable resources. A loss system is a queuing model consisting of a number of servers in which a customer who arrives when every server is occupied is blocked and lost permanently to the system. The problem of optimizing the utilization of reusable resources in a loss system has been considered in the literature; however, most examples focus on pricing and admission control and do not exploit customer choice behavior explicitly. For example, Paschalidis and Tsitsiklis (2000) consider a model of a service provider managing a single resource with limited capacity with different product classes representing different levels and durations of utilization. They characterize structural properties of the optimal dynamic pricing policy and propose a linear programming-based upper bound on the optimal performance. They propose fixed-price static policies and show that these are optimal in the fluid regime of many small users. Paschalidis and Liu (2002) extend this analysis to the case of multiple resources and consider potential demand substitution effects between resources in response to the posted prices.

More recently such problems have become of interest in the operations management community. Iyengar and Sigman (2004) consider a similar problem to that considered by Paschalidis and Tsitsiklis (2000) but utilize a penalty function that targets a previously computed desirable state. Levi and Radovanovic (2010) and Levi and Radovanovic (2007) consider a similar case from the perspective of admission control, in which arriving customers request a particular set of resources at a given price and their request must be accepted or rejected by the operator in real-time. They demonstrate that static linear programming-based policies achieve at least one half of the revenue achieved by an optimal dynamic policy and demonstrate that as the capacity of the system grows large the loss of the static policy with respect to the optimal policy decreases at a quantifiable rate. Chen et al. (2016) propose an analogous linear programming-based policy for the case in which customers seek to reserve capacity in advance, presenting additional technical challenges. In another stream of papers, Savin et al. (2005) consider the problem of when to ration capacity to lower-value customers within a rental system. They characterize when fully pooling capacity is optimal and analyze the optimal fleet size under various parameter regimes. Gans and Savin (2007) extend this analysis to consider the role of pricing in a similar scenario with two customer classes. All of the works above rely on the assumption of time-homogeneous demand rates, as we consider in Chapter 2; however, none of the above explore the use of customer choice as a tool for balancing demand between multiple resources.

More recently, work has been done to relax this time-homogeneous assumption. Lei and Jasin (2016) studied the problem of pricing reusable resources under the assumption of deterministic service times. They derive a re-solving heuristic policy that adjusts pricing in response to demand as it is realized. They prove that this policy is asymptotically optimal in a number of scenarios. Our work differs from these settings in that we explicitly model the randomness in service times and further consider the impact of assortment decisions on customer choice, taking advantage of demand flexibility to balance resource utilization.

1.2.3 Online Matching

Our work is also related to the literature on online matching and online resource allocation. This field is extremely active in the computer science community, and so we focus here on applications in operations and refer the reader to Mehta (2013) for a detailed survey of recent results. Wang et al. (2015) consider an online bipartite matching problem in which customers of various types must be irrevocably assigned to a resource or rejected upon arrival. They analyze such a system using the notion of competitive ratio used in the analysis of online algorithms. They develop two algorithms which translate the solution to an offline linear program into an online resource allocation policy and demonstrate that both achieve the optimal competitive ratio in their setting. Also similar to our work is that of Gallego et al. (2016), who extend the style of analysis of Wang et al. (2015) to take into account the impact of customer choice. They employ a choice-based deterministic linear programming formulation, as analyzed in Liu and Van Ryzin (2008), to derive the upper bound with which they compete and also propose a solution methodology based on column generation. They propose the Optimized Primal Routing algorithm, which offers dynamic assortments based on the marginal value of inventory, as derived from the time dynamics of the dynamic programming value function. In our setting, the randomness in the customer service times renders the notion of competitive ratio as typically used in the analysis of online algorithms less useful, since in our case the offline algorithm would have access to the customer service times. For this reason, we focus here on comparing the performance of our proposed algorithms to optimal dynamic policies in an average-case or steady-state condition. We also consider ways in which pricing can be used as a further tool for revenue management in this context.

Chapter 2

Revenue Management for Reusable Resources under Time-Homogeneous Demand

In this chapter, we consider the problem of price and assortment optimization in systems of reusable resources under time-homogeneous demand. The use of customer choice models and assortment optimization as a tool for balancing demand for partially substitutable products has long been an important theme in the field of revenue management and operations management more broadly. In particular, the role of assortment optimization as a method for inventory control has been examined closely by a number of recent papers. To our awareness, in the current literature, when inventory is taken into consideration, the prevailing assumption is that the resources available to the operator are consumable. That is, at least for the selling period under consideration, each customer purchase results in the loss of the respective resource for the remainder of the period. However, in many practically relevant settings the resources available to the operator are renewable rather than consumable.

Consider the problem faced by an operator of a system of parking facilities. Potential customers seeking to park within the system enter their ultimate destination into a mobile application, and the operator must decide in real-time which assortment of lots to offer them and at which prices. Another example of the reusable paradigm is cloud computing, in which the operator must set the prices of their various server resources taking into account the available capacity and expectations of future demand. In both settings, if the customer elects to purchase, they consume a unit of capacity at the chosen resource for a possibly random period of time, after which the resource is returned to the operator and is made available to offer again to subsequently arriving customers. In making these decisions the operator must consider a number of factors including the customer's preferences for various facilities in the system and price sensitivity, the current availability of the resources, the immediate revenue to be gained, and the impact of this decision on future revenues. In this chapter, we develop assortment and pricing strategies that effectively balance these considerations in continuous time under random service times.

We model such a network of resources as a loss system in which customers who elect to purchase a resource that is fully utilized are lost to the system. Customers have heterogeneous preferences for the various system resources, captured by their observable customer type and a known associated choice model, which must be accounted for by the intelligent operator. In principle, problems of this type may be solved to optimality using dynamic programming. However, due to the intractability of the underlying dynamic program, we focus on policies that are able to attain a constant-factor guarantee with respect to the optimal dynamic policy. We draw a distinction between settings in which the system operator is able to observe system utilization in real-time, allowing for dynamic policies, and those in which the operator does not have access to such information and therefore must develop utilization-independent policies.

Our model is general in that it captures many of the fundamental tradeoffs faced in other lines of business such as the market for heavy equipment rentals, vehicle-sharing systems, and clothing rental services, for example. Much of the current literature on pricing and assortment optimization does not carry over directly to our current setting, as the policies proposed in Golrezaei et al. (2014) and Liu and Van Ryzin (2008), for example, are specifically tailored to the case of consumable inventory.

Our contributions in this chapter are summarized as follows.

∙ We extend the theory of the revenue management of reusable resources, as presented in Paschalidis and Liu (2002) and Levi and Radovanovic (2010), to take advantage of customer choice behavior and derive a static (state-independent) policy for this setting with a matching guarantee of performance in steady state. In addition, we propose a more intuitive policy, taking advantage of dynamic substitution, and show that in many cases it performs at least as well.

∙ When the resource prices are fixed exogenously, we develop a method for determining the optimal static policy. This policy achieves the maximum performance available to an operator who is unable or unwilling to continually monitor the state of their resources.

∙ We show that our steady-state results are practically relevant by using the theory of hitting times in continuous time Markov chains to show that in the case of exponential service times, the potential expected revenue loss at a given resource in comparison to the steady-state guarantee is small and grows at a limited rate with the capacity.

∙ We also consider the closely related problem of pricing in a system of substitutable reusable resources. We assume that prices are selected from a finite set due to business constraints and consider two cases, which we term opportunistic pricing and fair-pricing. In the setting of opportunistic pricing the operator is free to charge different customer types different prices for the same resource at the same time, while in the fair-pricing case we restrict the operator to fix its prices before observing the type of an arriving customer. We show that in the opportunistic scenario the methods proposed in our discussion of the assortment problem can be extended to apply to the pricing problem in a number of cases of practical interest. This yields similarly effective static policies achieving strong theoretical guarantees in this case. On the other hand, in the fair-pricing scenario we show that the static capacitated problem is NP-hard even without the burden of inventory constraints. Nevertheless, when assortments are constrained to be of unit size, we demonstrate that the problem is theoretically approximable and computationally tractable in practice for problems of reasonable size.

∙ We validate each of the above methods using computational experiments based on parking bay utilization data from Islington, a borough of London. These experiments demonstrate the effectiveness of our proposed policies in the state-independent setting. When the current state of the resources is available, our results demonstrate that the greedy policy is effective under light demand, but this myopic approach becomes weaker in moderately loaded systems.

Taken together, our results demonstrate the usefulness of intelligent policies for the management of systems of reusable resources under steady demand rates.

2.1 Model Formulation

In our setting, the platform operator has a set of 푁 distinct resources (items) 풩 indexed by 푖 ∈ {1, . . . , 푁} and seeks to manage their utilization in continuous time. The operator's potential customers belong to one of 퐾 types, with each having potentially idiosyncratic preferences for the available resources. Customers of each type 푘 ∈ 풦 = {1, . . . , 퐾} arrive continuously according to a Poisson process with arrival rate 휆푘. Each resource 푖 has capacity 퐶푖 ∈ Z+ which limits the number of customers who can make use of that resource simultaneously, and we will use 퐶 to denote the 푁-vector of such capacities. For each resource 푖 ∈ 풩, there is an associated finite set of 퐿 candidate prices 풫푖 = {푝푖1, . . . , 푝푖퐿} from which the operator is able to select. We use the matrix 푅 ∈ R^{푁×퐾} to denote a pricing specification, with entry 푟푖푘 ∈ 풫푖 denoting the decision to offer resource 푖 to customers of type 푘 at price 푟푖푘. We let 풫 denote the space of such price configurations.

The resources we consider are substitutable and the various customer types may have heterogeneous preferences which the operator must consider in making assortment decisions. In particular, each customer type 푘 is associated with a choice model, ℳ푘, that formally specifies the decision-making process of each customer type. The choice model determines P_{ℳ푘}(푖; 푆, 푅), the likelihood that a customer of type 푘 purchases product 푖 when offered assortment 푆 with prices specified by 푅. For brevity, we suppress explicit dependence on ℳ푘 by using the shorthand notation 푃푖푘(푆, 푅) = P_{ℳ푘}(푖; 푆, 푅). To focus on the issue of policy optimization, in this chapter we take the system parameters and the choice models ℳ = (ℳ1, ℳ2, . . . , ℳ퐾) as fixed and given.

Upon the arrival of a customer of type 푘, the operator chooses a set of resources to offer 푆 ⊆ 풩 and a pricing specification 푅 ∈ 풫. Subsequently, the customer elects to purchase item 푖 ∈ 푆 with known probability 푃푖푘(푆, 푅) or she chooses the outside option with probability 푃0푘(푆, 푅) = 1 − ∑_{푖∈푆} 푃푖푘(푆, 푅). We further assume that the operator has the ability to reject a customer by offering the empty assortment 푆0 = ∅ ⊂ 풩. For example, if each customer segment 푘 chooses according to a multinomial logit model with a base utility 푢푖푘 for obtaining resource 푖 and constant price sensitivity parameter 훽푘, then the choice probabilities are determined by segment- and item-specific weights 푤푖푘 = exp(푢푖푘 − 훽푘푟푖푘), and the choice probabilities can be computed as 푃푖푘(푆, 푅) = 푤푖푘 / (1 + ∑_{푗∈푆} 푤푗푘), where the weights are normalized so that the outside option has unit weight. If the customer selects a resource that is currently being utilized at capacity or chooses the outside option, then they exit the system with no further effects; otherwise they are matched with their chosen resource and a service event is initiated.
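To make the choice model concrete, the following sketch computes these multinomial logit purchase probabilities for a single customer type; the utilities, price sensitivity, and prices shown are hypothetical placeholders rather than values used in our experiments.

```python
import math

def mnl_choice_probabilities(S, u, beta, r):
    """Multinomial logit purchase probabilities for one customer type.

    S    : list of offered resource indices (the assortment)
    u    : dict mapping resource index -> base utility u_ik
    beta : price sensitivity beta_k of this customer type
    r    : dict mapping resource index -> offered price r_ik
    Returns (probs, p0) where probs[i] = P_ik(S, R) and p0 is the
    no-purchase probability; the outside option has unit weight.
    """
    weights = {i: math.exp(u[i] - beta * r[i]) for i in S}
    denom = 1.0 + sum(weights.values())
    probs = {i: w / denom for i, w in weights.items()}
    return probs, 1.0 / denom

# Hypothetical example: three parking lots offered to one customer type.
probs, p0 = mnl_choice_probabilities(
    S=[0, 1, 2],
    u={0: 1.2, 1: 0.8, 2: 0.5},
    beta=0.3,
    r={0: 4.0, 1: 3.0, 2: 2.0},
)
print(probs, p0)  # purchase probabilities and p0 sum to one
```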

If an arriving customer of type 푘 elects to purchase a given resource 푖 at price level 푙, this initiates a service event that lasts for a random length of time with mean (휇푖푘)^{-1}, possibly depending on both the customer type and the resource. We note that unless otherwise specified the service length may be given by any distribution with finite first moment. In any case we refer to 휇푖푘 as the service rate of resource 푖 for customers of type 푘. During each service period the customer uses one unit of capacity of resource 푖 and the operator earns revenue continuously at the specified rate 푟푖푘 = 푝푖푙.

In the context of managing a parking system, each resource would correspond to one of the operator's parking facilities. The capacity of the resource would then be the number of individual spaces and the revenue rate would be the associated hourly fee charged to each customer. In this setting, the type of a customer could indicate the general area of their final destination, so that differences between types reflect broad preferences to park near landmarks such as the city center or at the airport, for example. Within each type, the heterogeneous customer behavior captured by the choice probabilities would be a reflection of unmodeled fine-grain preferences.

Under this system specification, the operator's goal is to select a pricing and personalized assortment policy that generates the optimal or near-optimal steady-state revenue rate. The notion of such a policy will be formalized shortly, but in brief, it consists of a mapping from the current state of the system and the type of an arriving customer to a personalized assortment which will be offered to customers of that type and the associated prices. We will formally define and discuss policies in the context of assortment decisions in section 2.2 and proceed to additionally consider pricing decisions in section 2.4.

We pause to note that in specific contexts it is likely that the operator's decisions are further constrained. For example, assortment decisions could be restricted by shelf space, screen space, or other business rules. To accommodate such assortment constraints we let 풮 denote the set of feasible assortments. In the event that the set of feasible assortments depends on the customer type, the sets of feasible assortments can be further specified using 풮푘 for customers of type 푘. For ease of notation we will typically assume that 풮 is valid for customers of each type; however, the extension to such type-dependent assortments is straightforward. Additionally, pricing flexibility may be limited due to laws restricting price discrimination. We consider the presence of such pricing restrictions throughout this chapter in both the context of assortment-only and joint pricing and assortment decisions.

Under these rules, the state of the system evolves continuously over time. For any time 푥 we let 푊(푥) ∈ Z+^{푁×퐾×퐿} be the time-푥 resource utilization matrix, with entry 푊푖푙푘(푥) giving the current number of type 푘 customers being served by resource 푖 at price level 푙. We further use 푊푖(푥) = ∑_{푘=1}^{퐾} ∑_{푙=1}^{퐿} 푊푖푙푘(푥) to denote the total utilization of resource 푖 at time 푥. Due to the capacity limitations, a new customer of type 푘 can be matched with resource 푖 only if the resulting utilization would not exceed the corresponding capacity limit, so that 푊푖(푥) < 퐶푖. In the special case of exponentially distributed service times, due to memorylessness, we are able to succinctly describe the state of the system using the current utilization 푊(푥). On the other hand, for general service time distributions, the state must also include information about the time each customer's service was initiated. We use 푊¯(푥) to denote the augmented state which specifies the state entirely in this manner.

Taking into account the system parameters, the demand specification ℳ, and the possible utilization states of the system, the operator seeks to develop a policy that enables them to make effective pricing and assortment decisions. In the next section, we begin by examining the special case in which prices are fixed exogenously and the operator need only decide which assortment of resources to offer to each arriving customer type. We proceed to consider the case of joint assortment and pricing decisions in section 2.4.

2.2 Assortment Policies

In this section, we take the revenue rates 푟푖푘 as exogenously fixed and focus on developing assortment strategies that enable the operator to effectively balance demand between resources. Here we simplify our notation by omitting the pricing decision where applicable, so that 푃푖푘(푆) denotes the probability that a customer of type 푘 purchases resource 푖 from set 푆 given the fixed prices. We also omit the price level subscript in the utilization states 푊푖푘(푥).

An assortment policy is a (possibly randomized) mapping from the current state of the system 푊¯(푥) and the type of an arriving customer 푘 to a distribution over subsets of products 푆푘 ∈ 풮푘 to be offered. Formally, an assortment policy for arrivals of type 푘 is given by the mapping

휋푘(푊¯ (푥)) : 푁 × 퐾 → Δ(풮). (2.1)

Here we use the operator Δ(풮) to denote the space of probability distributions over a set 풮. Further, we use 휋(푊¯(푥)) = (휋1(푊¯(푥)), . . . , 휋퐾(푊¯(푥))) to denote the overall policy. We will call a policy 휋 admissible if, for each combination of customer type 푘 and system state 푊¯(푥), every set with positive support under the resulting distribution over offer sets 휋푘(푊¯(푥)) contains only items with capacity available to serve the customer if selected, that is, 푊푖(푥) < 퐶푖 for all 푖 ∈ 푆. We further define the class of state-independent assortment policies as those which do not depend on the current state of the system 푊¯(푥). Such policies may be inadmissible and suffer needless lost sales; however, they are applicable in cases where real-time utilization information is unavailable or prohibitively expensive, and they serve as a useful baseline for more sophisticated strategies. We will sometimes refer to policies that do utilize such real-time utilization data as dynamic policies.

We pause to note that under any fixed policy 휋 the state of the system evolves as a Markov process with transition probabilities modulated by the choice of assortment policy. Such a fixed policy thus induces a unique stationary distribution over the utilization matrix 푊(푥). Through the type-푘 assortment policy 휋푘 this then induces a stationary distribution 훼푘^휋(푆) over offer sets, which specifies the long-run proportion of customers of type 푘 who are offered assortment 푆. Then by Little's Law the long-run expected number of customers of type 푘 being served by resource 푖 is given by (휆푘/휇푖푘) ∑_{푆∈풮} 훼푘^휋(푆) 푃푖푘(푆).

In the case of continuous revenue accrual, under policy 휋 the long-term average revenue is determined by the long-run expected utilization of each resource by each customer type, given by

$$ J^{\pi} = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi}\!\left[ \int_0^T \sum_{k=1}^{K} \sum_{i=1}^{N} r_{ik} W_{ik}(x)\, dx \right] = \sum_{k=1}^{K} \sum_{i=1}^{N} r_{ik} \lim_{T \to \infty} \frac{1}{T} \int_0^T \mathbb{E}_{\pi}[W_{ik}(x)]\, dx, $$

where the expectation is taken with respect to the random evolution of the system under the assortment policy 휋. To maximize their revenue it is clear that the operator must select a policy that balances the need to maintain high utilization of profitable resources with the need to ensure sufficient choice is available so that new arrivals into the system with selective preferences are able to find an acceptable resource to use. Let Π denote the set of valid assortment policies. In order to maximize revenue the operator is interested in calculating the optimal assortment policy 휋*, which is the solution to the optimization problem,

$$ J^{*} = \max_{\pi \in \Pi} J^{\pi}. \tag{2.2} $$

In principle this problem can be solved using dynamic programming, but since even in the simplest setting of exponential service times both the number of states and the number of controls (assortments) are exponential in the number of resources such an approach is intractable for all but the smallest instances.
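Even when the dynamic program is out of reach, the revenue rate J^π of any candidate policy can still be estimated by simulation. The sketch below is illustrative only, not the experimental setup used in this thesis: it assumes a state-independent policy, exponential service times, and hypothetical helper functions `offer` and `choose` supplied by the caller.

```python
import random

def simulate_revenue_rate(offer, choose, lam, mu, r, C, T=10_000.0, seed=0):
    """Monte Carlo estimate of the long-run revenue rate under a
    state-independent policy with Poisson arrivals and exponential services.

    offer(k)     : returns the assortment offered to an arriving type-k customer
    choose(k, S) : returns the chosen resource index, or None for no purchase
    lam, mu, r   : arrival rates lam[k], service rates mu[(i, k)], revenues r[(i, k)]
    C            : capacities C[i]
    """
    rng = random.Random(seed)
    total_rate = sum(lam.values())
    t, revenue = 0.0, 0.0
    departures = []                      # (departure_time, resource) of customers in service
    while True:
        t += rng.expovariate(total_rate)             # next arrival epoch
        if t > T:
            break
        departures = [(d, j) for (d, j) in departures if d > t]   # drop finished services
        k = rng.choices(list(lam), weights=list(lam.values()))[0]  # arriving type
        i = choose(k, offer(k))
        if i is None:
            continue                                  # no purchase
        if sum(1 for (_, j) in departures if j == i) >= C[i]:
            continue                                  # blocked: customer is lost
        dur = rng.expovariate(mu[(i, k)])
        revenue += r[(i, k)] * min(dur, T - t)        # revenue accrues during service
        departures.append((t + dur, i))
    return revenue / T
```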

2.2.1 Deterministic Linear Programming Upper Bound

To obtain an upper bound on the optimal revenue (2.2) we use the well-known linear programming-based deterministic approximation, in which the resource capacity constraints need only hold in expectation. Similar techniques have been used profitably throughout the literature. For example, in the case of consumable resources, Liu and Van Ryzin (2008) and Bront et al. (2009) introduce and solve the choice-based deterministic linear program (CDLP). However, due to the consumable nature of inventory in their setting, it is not straightforward to implement the resulting policy. On the other hand, in the case of reusable resources, as considered in Levi and Radovanovic (2010) and Lei and Jasin (2016), implementing the resulting policy is straightforward; however, the role of customer choice is not explored. In contrast to these settings, our proposed policy explicitly considers and makes use of customer choice behavior to balance demand for multiple resources. This choice behavior, combined with the larger action space of assortments we consider, introduces additional complexity in solving the associated linear program. Our formulation is similar to that presented in Gallego et al. (2016) with the addition of customer-dependent service times and revenue rates.

To formulate the deterministic linear program we introduce decision variables 훼푘(푆) which, for each type 푘 ∈ 풦 and 푆 ∈ 풮, can be interpreted as the fraction of time to offer assortment 푆 to arriving customers of type 푘. Then the solution to the following linear program determines an allocation that upper bounds the expected long-run system revenue of any admissible policy,

$$
\begin{aligned}
J^{LP} = \max_{\alpha_1,\ldots,\alpha_K}\ & \sum_{k=1}^{K} \sum_{S \in \mathcal{S}} \sum_{i=1}^{N} r_{ik} \frac{\lambda_k}{\mu_{ik}} P_{ik}(S)\, \alpha_k(S) \\
\text{s.t.}\ & \sum_{k=1}^{K} \frac{\lambda_k}{\mu_{ik}} \sum_{S \in \mathcal{S}} P_{ik}(S)\, \alpha_k(S) \le C_i \quad \forall i \in \mathcal{N} \\
& \sum_{S \in \mathcal{S}} \alpha_k(S) \le 1 \quad \forall k \in \mathcal{K} \\
& \alpha_k(S) \ge 0 \quad \forall S \in \mathcal{S},\ k \in \mathcal{K}.
\end{aligned} \tag{2.3}
$$

We let 훼^{LP} = {훼_1^{LP}, . . . , 훼_K^{LP}} denote the solution to this problem.

Under the optimal dynamic policy 휋* the Markov process governing the state of the system has a unique stationary distribution over the utilization of resources. Thus, applying the type-푘 policy 휋푘* yields 훼푘*(푆), the stationary distribution over the assortments offered to customers of type 푘, and together {훼1*, . . . , 훼퐾*} yields a feasible solution to problem (2.3). Indeed, since 휋* is an admissible policy it follows that

$$ \sum_{k=1}^{K} \frac{\lambda_k}{\mu_{ik}} \sum_{S \in \mathcal{S}} P_{ik}(S)\, \alpha_k^{*}(S) = \mathbb{E}_{\pi^*}[W_i(x)] \le \sup_{t \ge 0} W_i(x) \le C_i $$

for each item 푖. The last two constraints are automatically satisfied since a stationary distribution must be a probability distribution. Since 훼* is a feasible solution, we must have that 퐽^{LP} represents an upper bound on 퐽*. We state this result as a lemma below.

Lemma 1. 퐽* ≤ 퐽^{LP}.
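As an illustration of problem (2.3), the following sketch builds and solves the deterministic LP for a toy instance by fully enumerating assortments. The arrival rates, service rates, revenues, and MNL weights are hypothetical, and the open-source PuLP modeler is used purely for convenience; it is not the tooling used in our experiments.

```python
import itertools
import pulp  # assumed available; any LP solver interface would do

# Toy instance: N = 2 resources, K = 1 customer type, hypothetical data.
N, K = 2, 1
lam = {0: 2.0}                      # arrival rate of each type
mu = {(0, 0): 1.0, (1, 0): 0.5}     # service rates mu_ik
r = {(0, 0): 4.0, (1, 0): 6.0}      # revenue rates r_ik
C = {0: 1, 1: 1}                    # capacities
w = {(0, 0): 1.5, (1, 0): 0.8}      # MNL weights w_ik (outside option weight 1)

def P(i, k, S):
    """MNL purchase probability of resource i for type k under assortment S."""
    return w[(i, k)] / (1.0 + sum(w[(j, k)] for j in S)) if i in S else 0.0

# Enumerate all assortments (tractable only for tiny N).
assortments = [S for m in range(N + 1)
               for S in itertools.combinations(range(N), m)]

prob = pulp.LpProblem("DLP_2_3", pulp.LpMaximize)
alpha = {(k, j): pulp.LpVariable(f"alpha_{k}_{j}", lowBound=0)
         for k in range(K) for j in range(len(assortments))}

# Objective: expected steady-state revenue rate.
prob += pulp.lpSum(r[(i, k)] * lam[k] / mu[(i, k)] * P(i, k, S) * alpha[(k, j)]
                   for k in range(K) for j, S in enumerate(assortments) for i in S)
# Expected-utilization capacity constraint for each resource.
for i in range(N):
    prob += pulp.lpSum(lam[k] / mu[(i, k)] * P(i, k, S) * alpha[(k, j)]
                       for k in range(K) for j, S in enumerate(assortments) if i in S) <= C[i]
# Offer-frequency constraint for each customer type.
for k in range(K):
    prob += pulp.lpSum(alpha[(k, j)] for j in range(len(assortments))) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print("J_LP =", pulp.value(prob.objective))
```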

Although problem (2.3) is conceptually straightforward, in practice the number of possible assortments that can be offered to each customer class introduces computational challenges, since there are 퐾2^푁 variables and selection probabilities must be generated and stored for each such set. To reduce the computational burden we utilize a column generation procedure for computing the deterministic upper bound. Our strategy here is similar to that proposed in Gallego et al. (2016), except that in our setting we must take into account the service rate of each customer type at each resource and any differential class-based pricing.

Since each variable corresponds to a customer type and assortment combination, we solve a reduced version of problem (2.3) with a subset of assortments 풞^푘 for each customer type 푘 and compute the solution only for the corresponding variables. This results in a reduced problem of the form,

$$
\begin{aligned}
J^{RED}(\{\mathcal{C}^1,\ldots,\mathcal{C}^K\}) = \max_{\alpha_1,\ldots,\alpha_K}\ & \sum_{k=1}^{K} \sum_{S \in \mathcal{C}^k} \sum_{i=1}^{N} r_{ik} \frac{\lambda_k}{\mu_{ik}} P_{ik}(S)\, \alpha_k(S) \\
\text{s.t.}\ & \sum_{k=1}^{K} \sum_{S \in \mathcal{C}^k} \frac{\lambda_k}{\mu_{ik}} P_{ik}(S)\, \alpha_k(S) \le C_i \quad \forall i \in \mathcal{N} \\
& \sum_{S \in \mathcal{C}^k} \alpha_k(S) \le 1 \quad \forall k \in \mathcal{K} \\
& \alpha_k(S) \ge 0 \quad \forall S \in \mathcal{C}^k,\ \forall k = 1,\ldots,K.
\end{aligned} \tag{2.4}
$$

After solving the reduced problem to optimality we check the optimality of our current solution to the full problem (2.3) by searching for a violated constraint in its dual problem given by,

$$
\begin{aligned}
\min_{\gamma,\sigma}\ & \sum_{i=1}^{N} C_i \gamma_i + \sum_{k=1}^{K} \sigma_k \\
\text{s.t.}\ & \sum_{i=1}^{N} \frac{\lambda_k}{\mu_{ik}} \gamma_i P_{ik}(S) + \sigma_k \ge \sum_{i=1}^{N} r_{ik} \frac{\lambda_k}{\mu_{ik}} P_{ik}(S) \quad \forall k \in \mathcal{K},\ \forall S \in \mathcal{S} \\
& \gamma_i \ge 0 \quad \forall i \in \mathcal{N} \\
& \sigma_k \ge 0 \quad \forall k = 1,\ldots,K.
\end{aligned} \tag{2.5}
$$

Here the dual variables 훾 and 휎 are associated with the capacity constraints and the offering time constraints, respectively.

Let 훾^{RED} and 휎^{RED} be the resulting values of the dual variables after solving the reduced problem (2.4) to optimality. By standard duality theory in linear optimization, finding a variable to add to the primal problem is equivalent to finding a violated constraint in the dual program (2.5) with variables fixed to 훾^{RED} and 휎^{RED}. Therefore, for each customer type 푘 we seek to solve the following column generation subproblem,

$$ \max_{S \in \mathcal{S}}\ \sum_{i=1}^{N} r_{ik} \frac{\lambda_k}{\mu_{ik}} P_{ik}(S) - \sum_{i=1}^{N} \gamma_i^{RED} \frac{\lambda_k}{\mu_{ik}} P_{ik}(S) - \sigma_k^{RED}. \tag{2.6} $$

If the resulting value is non-positive for each customer type 푘, then the solution to the reduced primal problem is indeed optimal and we can terminate the procedure. Otherwise, we add at least one set 푆 for which a positive value was obtained in (2.6) to the set of candidate columns 풞^푘 for the corresponding customer type and repeat this procedure with the updated set of candidate columns.

We note that solving the column generation subproblem corresponds exactly to the static assortment optimization problem with the dual-adjusted revenues 푟˜푖푘 = (푟푖푘 − 훾푖)/휇푖푘. Thus the column generation problem is computationally tractable whenever the assortment problem can be solved efficiently. For the special case in which demand from each customer type is described by a multinomial logit model, it suffices to check the dual-adjusted revenue-ordered assortments for each customer type, as shown in Talluri and van Ryzin (2004a). A number of other models are also known to admit tractable assortment optimization.
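For instance, under an MNL model the subproblem (2.6) can be addressed by scanning assortments ordered by the dual-adjusted revenues, as in the following sketch; the weights and dual values shown are hypothetical placeholders for a single customer type.

```python
def best_assortment_mnl(weights, r_adj, sigma_red, lam_k):
    """Search revenue-ordered assortments for the column generation subproblem.

    weights   : dict resource -> MNL weight w_ik for this customer type
    r_adj     : dict resource -> dual-adjusted revenue (r_ik - gamma_i) / mu_ik
    sigma_red : dual value sigma_k^RED for this customer type
    lam_k     : arrival rate lambda_k
    Returns (best_value, best_set); a positive value identifies a column to add.
    """
    order = sorted(weights, key=lambda i: r_adj[i], reverse=True)
    best_value, best_set = -sigma_red, frozenset()   # empty assortment baseline
    for m in range(1, len(order) + 1):
        S = order[:m]
        denom = 1.0 + sum(weights[i] for i in S)
        value = lam_k * sum(r_adj[i] * weights[i] for i in S) / denom - sigma_red
        if value > best_value:
            best_value, best_set = value, frozenset(S)
    return best_value, best_set

# Hypothetical dual-adjusted data for one customer type.
value, S = best_assortment_mnl(
    weights={0: 1.2, 1: 0.7, 2: 0.4},
    r_adj={0: 3.5, 1: 2.0, 2: 4.5},
    sigma_red=1.0,
    lam_k=2.0,
)
print(value, sorted(S))
```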

In the next section we will demonstrate that the resulting solution determines a policy with substantial performance guarantees. We would like to highlight this as one of the important themes of this chapter: despite the enormous size of the state space and the intractability of computing the full optimal dynamic control policy, we are able to obtain provably good static policies so long as we are able to solve a single-period static assortment problem, either optimally or approximately.

2.2.2 State-Independent Performance Guarantees

The solution to problem (2.3) provides a guide as to the frequency with which to offer the selected sets. Although it does not give dynamic guidance as to which sets should be offered under any given state of the system, the resulting policy is provably effective. To demonstrate this we define the type-based assortment selection policy (TASP), which selects sets to offer to type 푘 customers in proportion to the probabilities 훼푘^{LP}.

Specifically, under the TASP, when a customer of type 푘 arrives to the system, an assortment 푆 is selected from 풮 with probability given by 훼푘^{LP}(푆) and is offered to the customer regardless of the current availability of the products in 푆. The customer then selects an item 푖 from 푆 with probability 푃푖푘(푆) as defined by their choice model. If the selected item 푖 currently has available capacity then the assignment is made and a service event begins. On the other hand, if resource 푖 has insufficient remaining capacity, if the offered set was the empty set 푆0, or if the no-purchase option is chosen, then the customer exits the system with no further effect on its operation. In the event that no products in the selected set 푆 are available, we assume that 푆0 is offered.

We note that although the TASP, as a state-independent policy, is not admissible as previously defined, by dropping customers who would violate the capacity constraint if accepted we can apply tools from the literature on loss systems to analyze our problem. In fact, we can derive an admissible policy simply by offering the set that results from following the TASP as described, but removing the unavailable products from the selected set 푆. We will refer to this policy as the available assortment selection policy (AASP), and in what follows we will discuss sufficient conditions under which the AASP retains the performance guarantees associated with the TASP.
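A minimal sketch of the offering logic of the TASP (and its admissible AASP variant) might look as follows, assuming the LP solution has been computed and stored as a dictionary of probabilities; the helper names and data here are illustrative only.

```python
import random

def offer_assortment_tasp(k, alpha_lp, utilization, capacity, aasp=False):
    """Sample an offer set for an arriving type-k customer.

    alpha_lp[k] : dict mapping assortment (frozenset of resources) -> probability
                  from the LP solution, summing to at most one
    utilization : dict resource -> current number in service
    capacity    : dict resource -> capacity C_i
    aasp        : if True, drop currently full resources (admissible variant)
    """
    sets = list(alpha_lp[k].keys())
    probs = [alpha_lp[k][S] for S in sets]
    leftover = max(0.0, 1.0 - sum(probs))          # residual mass -> empty set
    S = random.choices(sets + [frozenset()], weights=probs + [leftover])[0]
    if aasp:
        S = frozenset(i for i in S if utilization[i] < capacity[i])
    return S

# Hypothetical LP solution for one customer type over two resources.
alpha_lp = {0: {frozenset({0, 1}): 0.6, frozenset({1}): 0.3}}
S = offer_assortment_tasp(0, alpha_lp, utilization={0: 1, 1: 0},
                          capacity={0: 1, 1: 2}, aasp=True)
print(sorted(S))
```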

Under the TASP, the arrival rate of customers seeking to consume each resource is independent of the current state of the system. This allows us to analyze its performance with respect to each resource individually, and so for now we consider the performance of the TASP with respect to a single resource $i$. Using the law of total probability, we note that under the TASP the arrival rate of customers seeking resource $i$ is given by $\sum_{k=1}^{K} \lambda_k \sum_{S\in\mathcal{S}} \alpha_k^{LP}(S) P_{ik}(S)$, and the service rate for each arriving customer of type $k$ is $\mu_{ik}$. Thus, if the capacity constraint were removed after implementation, the operator would earn revenue derived from resource $i$ at a rate given by
$$J_i^{LP} = \sum_{k=1}^{K} r_{ik}\, \frac{\lambda_k}{\mu_{ik}} \sum_{S\in\mathcal{S}} \alpha_k^{LP}(S)\, P_{ik}(S)$$
per unit time. However, under the TASP, the capacity constraint causes some of these arrivals to fail. Let $B_i$ denote the probability that an arriving customer under the TASP finds their chosen product unavailable. In the queuing and loss-systems literature this probability is referred to as the blocking probability. Thus, under the TASP the long-run revenue rate at resource $i$ is given by $J_i^{TASP} = (1 - B_i) J_i^{LP}$.

The blocking probabilities can be computed using the Erlang-B formula for loss systems. Consider a generic parallel queuing system with $C$ servers, Poisson arrivals at rate $\lambda$, and service rate $\mu$. Then the traffic intensity is given by $\rho = \lambda/\mu$, and the long-run blocking probability, or probability that an arriving customer finds all servers busy, is given in closed form by
$$B(\rho, C) = \frac{\rho^C / C!}{\sum_{\ell=0}^{C} \rho^{\ell} / \ell!}. \qquad (2.7)$$
For a fixed traffic intensity, the blocking probability is decreasing and convex in $C$, and for fixed capacity $C$, $B(\rho, C)$ is increasing in $\rho$. Under the TASP, the capacity constraints in formulation (2.3) effectively upper bound the traffic intensity of resource $i$ by $C_i$. Therefore, for each resource $i$ the long-run blocking probability is bounded above by
$$B_i \le B(C_i, C_i) = \frac{C_i^{C_i} / C_i!}{\sum_{\ell=0}^{C_i} C_i^{\ell} / \ell!}.$$

We note that $B(C_i, C_i)$ is decreasing in the capacity $C_i$ and that in the case of unit capacity $B(1, 1) = \tfrac{1}{2}$. Lemma 1, proved in Levi and Radovanovic (2010), shows that $B_i$ decays at a rate of at least $O(1/\sqrt{C_i})$ as $C_i$ is increased, demonstrating asymptotic optimality in the capacity of the system.
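The Erlang-B formula (2.7) is easy to evaluate with the standard recursion shown below (a well-known identity, not specific to this thesis); the final loop prints the worst-case bound $B(C, C)$ for a few capacities, which equals $1/2$ at $C = 1$ and shrinks as capacity grows.

```python
def erlang_b(rho: float, C: int) -> float:
    """Blocking probability B(rho, C) via the standard Erlang-B recursion:
    B(rho, 0) = 1 and B(rho, c) = rho*B(rho, c-1) / (c + rho*B(rho, c-1))."""
    b = 1.0
    for c in range(1, C + 1):
        b = rho * b / (c + rho * b)
    return b

if __name__ == "__main__":
    for C in (1, 5, 20, 100):
        # Worst-case bound B(C_i, C_i) used in the text.
        print(C, erlang_b(C, C))
```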

Then, applying Lemma 1, the revenue rate of the overall system under the TASP is bounded from below by
$$J^{TASP} = \sum_{i=1}^{N} J_i^{TASP} = \sum_{i=1}^{N} (1 - B_i)\, J_i^{LP} \;\ge\; \min_{i=1,\dots,N} (1 - B_i)\, J^{LP} \;\ge\; \min_{i=1,\dots,N} (1 - B_i)\, J^{*}.$$
As mentioned previously, under the TASP the maximum blocking rate never exceeds $\tfrac{1}{2}$, and so the TASP always achieves at least one half of the optimal revenue rate and grows increasingly competitive as the minimum capacity grows larger. We note that this bound also gives a proof of the asymptotic optimality of the TASP since, as the system is scaled in arrival rate and capacity by the scalar $\theta$, the blocking probability converges to zero as $\theta$ grows large. Thus we have shown the perhaps surprising result that a significant fraction of the optimal revenue can be obtained by a state-independent policy, which never considers the current utilization of the system. We summarize this result in the following lemma.

Lemma 2. The steady-state revenue generated under the TASP is bounded below as

$$J^{TASP} \ge \min_{i=1,\dots,N} \bigl(1 - B(C_i, C_i)\bigr)\, J^{*}.$$
In particular, $J^{TASP} \ge \tfrac{1}{2} J^{*}$. The TASP is also asymptotically optimal in the sense that $\lim_{C_i\to\infty} B(C_i, C_i) = 0$ independently of other problem parameters.

Although the TASP policy is simple to implement and provably effective, it is practically best suited for use cases in which real-time monitoring is not possible or prohibitively expensive. This is the case, for example, for the on-street parking systems in most cities. Even the most forward-looking real-time municipal parking monitoring system in the United States, the SFPark pilot system, was not re-established after the first generation of sensors exceeded their useful life. On the other hand, in cases where such current utilization information is available, the structure of the TASP as stated amounts to presenting customers with possibly unavailable choices, only to inform them of this after they have made up their mind. To mitigate these situations in practice we are motivated to modify the TASP so that customers are guaranteed to have access to the product they select.

Formally, we define the available assortment selection policy (AASP) as follows. When a customer of type $k$ arrives, a set $S$ is selected according to the probabilities $\alpha_k^{LP}(S)$ as under the TASP. However, the AASP then modifies the selected set to include only products with available capacity; specifically, the operator will offer the set $\tilde S(x) = \{j \in S : W_j(x) < C_j\}$. If $\tilde S(x) = \emptyset$, then the customer is informed that there are no suitable products available, and in this case they exit the system with no further effect.

With two further assumptions we are able to demonstrate that the AASP performs at least as well as the TASP. The first is that the revenue rate $r_{ik}$ depends only on the selected resource, that is, for each resource $i$, $r_{ik} = r_{ik'}$ for all customer types $k, k' \in [K]$. In addition, although the TASP does not depend on the specification of the choice model, the AASP relies on a mild assumption on the structure of customer choice. Specifically, we will assume that the choice model satisfies the following regularity property: $P_{ik}(S) \le P_{ik}(S \setminus j)$ for all $j \in S$ such that $j \ne i$. In words, this assumption means that the likelihood of a customer of type $k$ choosing product $i$ only increases when there are fewer choices offered. This property is referred to in the literature as weak rationality (Jagabathula 2014) or the regularity axiom (Berbeglia and Joret 2015). This intuitive property holds for many choice models of interest such as the multinomial logit (MNL), the nested logit model with dissimilarity parameters between 0 and 1, the mixed MNL model, and others (see Pixton and Simchi-Levi (2016) for further examples).

Proposition 1. So long as 푟푖푘 = 푟푖 for all resources 푖 ∈ 풩 and customer types 푘 ∈ 풦 and weak rationality holds, we have

$$J^{AASP} \ge J^{TASP} \ge \min_{i=1,\dots,N} (1 - B_i)\, J^{*}.$$

This proposition validates the reasonable intuition that, in the absence of price discrimination, the AASP retains the strong performance guarantees associated with the TASP. Further, the AASP provides practical benefits by avoiding offers that are disruptive to potential customers.

2.2.3 Optimal State-Independent Assortment Policy

In the previous subsection we analyzed the performance of the TASP in managing the load of a capacitated system of reusable resources. Although the TASP is an effective state-independent policy and is guaranteed to achieve at least one half of the revenue attributable to the optimal state-dependent dynamic policy, it is likely that ignoring the true blocking probabilities when designing such a policy is suboptimal. In fact, we observe that the resource constraints in formulation (2.3) impose a reasonable, but entirely arbitrary, restriction on the average utilization of each resource. In this subsection we relax this restriction in order to develop optimal state-independent policies for our setting. Here we consider a method for computing the optimal state-independent policy in the case when price discrimination is disallowed, so that the revenue of resource $i$ is fixed at $r_i$. We term this policy the State-Independent Optimal Policy (SIOP). In such a setting, this policy performs at least as well as the TASP and thus inherits the associated performance guarantees on the resulting steady-state expected revenue as well as asymptotic optimality. Further, since the linear program used to compute the TASP provides an upper bound on the expected revenue of the optimal dynamic policy, it is possible to obtain stronger guarantees on a case-by-case basis.

To compute the optimal static policy we propose the use of a constrained concave maximization procedure. As in the case of the TASP optimization problem, there are an exponential number of possible assortments available to offer; however, we propose an assortment generation scheme, similar in spirit to the column generation used to solve the TASP, to alleviate this computational burden by keeping the number of assortments considered relatively small. Since a state-independent policy, by definition, must make its assortment selection

in ignorance of the current system utilization, it follows that any such policy for customers of type $k$ can be formulated as a distribution $\alpha_k(S)$ over $S \in \mathcal{S}$. Taken together, this results in the allocated utilization of resource $i$ under the policy specified by $\alpha$,
$$\tilde\rho_i(\alpha_1, \dots, \alpha_K) = \sum_{k=1}^{K} \sum_{S\in\mathcal{S}} \frac{\lambda_k}{\mu_{ik}}\, P_{ik}(S)\, \alpha_k(S).$$
From this allocated utilization, we are interested in the effective utilization of resource $i$, $g_i : \mathbb{R} \to \mathbb{R}$, under allocated utilization $\rho$, given by
$$g_i(\rho) = \bigl(1 - B(\rho, C_i)\bigr)\, \rho, \qquad (2.8)$$
which by Little's Law represents the long-term time-average number of customers occupying resource $i$ after accounting for blocking. The first term expresses the steady-state blocking probability as given by the Erlang-B formula (2.7) when the allocated utilization to resource $i$ is $\rho$. With slight abuse of notation, we define the effective utilization function for resource $i$, $g_i : (\Delta^{|\mathcal{S}|})^K \to \mathbb{R}$, under the allocation variables $\alpha = (\alpha_1, \dots, \alpha_K)$ directly by
$$g_i(\alpha_1, \dots, \alpha_K) = \left(1 - B\!\left(\sum_{k=1}^{K} \sum_{S\in\mathcal{S}} \frac{\lambda_k}{\mu_{ik}} P_{ik}(S)\alpha_k(S),\; C_i\right)\right) \left(\sum_{k=1}^{K} \sum_{S\in\mathcal{S}} \frac{\lambda_k}{\mu_{ik}} P_{ik}(S)\alpha_k(S)\right). \qquad (2.9)$$
Using this function, we can express the steady-state revenue rate associated with resource $i$ simply as
$$f_i(\alpha_1, \dots, \alpha_K) = r_i\, g_i(\alpha_1, \dots, \alpha_K). \qquad (2.10)$$

We then define the global revenue function $f : (\Delta^{|\mathcal{S}|})^K \to \mathbb{R}$ as the sum of the resource-based revenue functions,
$$f(\alpha_1, \dots, \alpha_K) = \sum_{i=1}^{N} f_i(\alpha_1, \dots, \alpha_K). \qquad (2.11)$$

Therefore, to maximize revenue we are faced with the following nonlinear maximization problem over the $K$ probability simplices specified by $\alpha_1, \dots, \alpha_K$,
$$
\begin{aligned}
J^{SIOP} = \max_{\alpha_1,\dots,\alpha_K}\;\; & f(\alpha_1, \dots, \alpha_K) \\
\text{s.t.}\;\; & \sum_{S\in\mathcal{S}} \alpha_k(S) \le 1 && \forall k \in \mathcal{K} \qquad (2.12)\\
& \alpha_k(S) \ge 0 && \forall S \in \mathcal{S},\; k \in \mathcal{K}.
\end{aligned}
$$

We now show that problem (2.12) can be solved efficiently since the objective (2.11) is concave in the allocation variables $\alpha$, as formalized in the following proposition.

Proposition 2. The global steady-state revenue function 푓 is concave in the allocation variables 훼.

This concavity result implies that, as long as the number of resources and customer types is relatively small, problem (2.12) can be solved to the necessary precision using standard nonlinear optimization techniques such as the projected gradient descent algorithm. However, just as in the case of the TASP optimization problem (2.3), the large number of possible assortments presents computational challenges for an operator of even a moderate number of resources. To mitigate this challenge we propose a new column generation technique adapted to this problem that enables us to solve for the optimal static policy in larger systems. In what follows we assume access to a constrained convex optimization oracle that is able to solve constrained convex optimization problems in which the feasible region is defined by a closed convex set, which in our case is the Cartesian product of $K$ unit simplices over the elements of $\mathcal{S}$. In practice such problems can be solved to arbitrary precision using an iterative procedure such as projected gradient descent.
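As a small illustration of how the objective (2.9)-(2.11) can be assembled and maximized numerically, the sketch below builds $f(\alpha)$ for a toy instance and hands it to a general-purpose solver (SciPy's SLSQP) in place of projected gradient descent. The instance data, MNL utilities, and helper names are all invented for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance: N resources, K customer types, a tiny universe of assortments.
N, K = 2, 2
C = np.array([2, 3])                      # capacities C_i
r = np.array([4.0, 3.0])                  # revenue rates r_i
lam = np.array([1.5, 2.0])                # arrival rates lambda_k
mu = np.array([[1.0, 1.2], [0.8, 1.0]])   # service rates mu_ik
assortments = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]
u = np.array([[0.5, 1.0], [1.5, 0.3]])    # assumed MNL mean utilities u_ik

def P(i, k, S):
    """MNL purchase probability of item i for type k from assortment S."""
    if i not in S:
        return 0.0
    denom = 1.0 + sum(np.exp(u[j, k]) for j in S)
    return np.exp(u[i, k]) / denom

def erlang_b(rho, C_i):
    b = 1.0
    for c in range(1, C_i + 1):
        b = rho * b / (c + rho * b)
    return b

def objective(alpha_flat):
    """Negative of the global revenue function f(alpha) from (2.11)."""
    alpha = alpha_flat.reshape(K, len(assortments))
    total = 0.0
    for i in range(N):
        rho = sum(lam[k] / mu[i, k] * P(i, k, S) * alpha[k, s]
                  for k in range(K) for s, S in enumerate(assortments))
        total += r[i] * (1.0 - erlang_b(rho, C[i])) * rho   # r_i * g_i(alpha)
    return -total

x0 = np.full(K * len(assortments), 1.0 / len(assortments))
cons = [{"type": "ineq",
         "fun": lambda a, k=k: 1.0 - a.reshape(K, -1)[k].sum()} for k in range(K)]
res = minimize(objective, x0, method="SLSQP",
               bounds=[(0.0, 1.0)] * x0.size, constraints=cons)
print("J_SIOP approx:", -res.fun)
```

For a realistic number of assortments, one would combine such a solver with the assortment-generation scheme described next rather than enumerating all of $\mathcal{S}$.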

As in subsection 2.2.1, we begin with a restricted master problem, analogous to our original problem of interest (2.12), except that in this problem, for each customer type $k$ we optimize over a subset of assortments $\mathcal{C}^k \subset \mathcal{S}$, by solving
$$
\begin{aligned}
J^{SIOP}(\mathcal{C}^1, \dots, \mathcal{C}^K) = \max_{\alpha_1,\dots,\alpha_K}\;\; & \sum_{i=1}^{N} r_i \left(1 - B\!\left(\sum_{k=1}^{K}\sum_{S\in\mathcal{C}^k} \frac{\lambda_k}{\mu_{ik}} P_{ik}(S)\alpha_k(S),\; C_i\right)\right) \sum_{k=1}^{K}\sum_{S\in\mathcal{C}^k} \frac{\lambda_k}{\mu_{ik}} P_{ik}(S)\alpha_k(S) \\
\text{s.t.}\;\; & \sum_{S\in\mathcal{C}^k} \alpha_k(S) \le 1 && \forall k \in \mathcal{K} \qquad (2.13)\\
& \alpha_k(S) \ge 0 && \forall S \in \mathcal{C}^k,\; k \in \mathcal{K}.
\end{aligned}
$$
We then search for augmenting assortments systematically using the following steps.

1. Initialization: For each customer type $k \in \mathcal{K}$, begin with an arbitrary set of assortments $\mathcal{C}^k \subset \mathcal{S}$. Set the iteration counter $t = 1$.

2. Solve the optimization problem $J^{SIOP}(\mathcal{C}^1, \dots, \mathcal{C}^K)$ over these subsets to obtain an optimal allocation $\alpha^t = (\alpha_1^t, \dots, \alpha_K^t)$ for the restricted problem.

3. For each customer class $k$ compute $v_k^t = \max_{S \in \mathcal{C}^k} \nabla_{\alpha_k(S)} f(\alpha^t)$.

4. For each resource $i$, compute the effective utilization revenue gradient coefficient, $r_i^t = r_i \, \frac{d}{d\rho} g_i\bigl(\tilde\rho_i(\alpha^t)\bigr)$.

5. For each customer type $k$ solve the static assortment problem $\max_{S \in \mathcal{S}} \sum_{i \in S} r_i^t P_{ik}(S)$ and let $S_k^t$ denote each such maximizer.

6. If $\max_{S \in \mathcal{S}} \sum_{i \in S} r_i^t P_{ik}(S) \le v_k^t$ for all customer types $k \in \mathcal{K}$, then the current solution $\alpha^t$ is optimal and the procedure terminates. Otherwise, add each such violating set $S_k^t$ to the corresponding subset $\mathcal{C}^k$ and return to step 2 above, incrementing the counter $t$.

In the third step we are computing the marginal contribution of the assortments we offer to type $k$ customers. By first-order optimality conditions for concave functions, these marginal contributions are equal for all sets used in the solution to the restricted problem; that is, $\frac{d}{d\alpha_k(S)} f(\alpha^t) = v_k^t$ for all $S$ such that $\alpha_k^t(S) > 0$. Thus, to find an augmenting assortment for a customer of type $k$ we seek an assortment with marginal value greater than $v_k^t$. In the fourth step we compute the return on allocated utilization to resource $i$ at the current solution, which represents the increase in revenue associated with an infinitesimally small increase in the allocated utilization. We then use these as inputs to the static assortment optimization problems of step 5 to find an augmenting assortment. In practice, one can increase the speed of convergence by adding columns in batches, for instance by adding columns for multiple customer types simultaneously. Let $J^{SIOP}$ denote the steady-state revenue rate achieved by the state-independent optimal policy as found using the algorithm above. Then, by applying the result of Lemma 2 and the optimality of $J^{SIOP}$, we obtain the following result.

Proposition 3. The steady-state revenue generated under the SIOP is bounded below as

$$J^{SIOP} \ge J^{TASP} \ge \min_{i=1,\dots,N} \bigl(1 - B(C_i, C_i)\bigr)\, J^{*}.$$
In particular, the SIOP inherits the lower bound $J^{SIOP} \ge \tfrac{1}{2} J^{*}$ and the asymptotic optimality, $\lim_{C_i\to\infty} B(C_i, C_i) = 0$, from the TASP.

By considering simple examples of systems whose resources have equal revenue rates and total arrival rates less than the total capacity, it is easy to demonstrate that, in general, the dominance of the SIOP in Proposition 3 is non-strict. However, in overloaded systems, or those with substantive heterogeneity in revenue rates between resources, the SIOP can allocate customer traffic more efficiently and better balance revenue-generating opportunities against blocking probabilities. Unfortunately, when revenue rates may differ between customers for the same resource, it is possible to demonstrate counterexamples to the concavity result of Proposition 2, and therefore the procedure above may fail to find the globally optimal state-independent policy.

In the event that the operator has the ability to monitor the status of their resources in real time, we may also define a more intelligent version of the SIOP that proceeds in analogy to the AASP as defined in section 2.2.2. The Available State-Independent Optimal Policy (ASIOP) proceeds in the same manner as the SIOP; however, upon generating a set to offer as in the SIOP, prior to offering this assortment to the arriving customer, all resources that are currently fully utilized are removed from the assortment. If the resulting assortment is empty then the customer is rejected and lost to the system. An analogous result to Proposition 1 also applies in this instance. As we demonstrate in the computational results in section 2.3, this flexibility is valuable in practice.

2.2.4 Transient Revenue Loss Guarantees

In the previous sections we have shown that the TASP and the SIOP are provably effective in the steady-state regime. In this section, we characterize the time to stationarity as a random variable and demonstrate that this convergence occurs quickly: the expected revenue lost during the initial phase, until steady state is reached, is bounded by a term of order $O(C \ln C)$. For this analysis we use the theory of continuous-time Markov chains (CTMCs), and so we must restrict our attention to the case of exponentially distributed service times. Note that this representation also requires that the service times for each resource be equal for customers of each type. With these further assumptions, under both the TASP and SIOP, the state of each resource $i$ can be modeled as a continuous-time birth-death Markov chain over the set of states $\Omega_i = \{0, 1, \dots, C_i\}$; we let $W_i(x) \in \Omega_i$ denote the state at time $x$ and calculate the instantaneous revenue rate of the process as $r_i W_i(x)$.

For simplicity we assume that the resource begins unutilized, so that $W_i(0) = 0$. In practice, if we begin at any higher state, convergence to steady state is likely to occur more quickly. As before, the aggregate arrival rate to resource $i$ can be expressed as $\tilde\lambda_i = \sum_{k=1}^{K} \lambda_k \sum_{S\in\mathcal{S}} \alpha_k(S)\, P_{ik}(S)$, for the distribution $\alpha_k(S)$ corresponding to the policy at hand, and is constant over all states $j \in \Omega_i$ with $j < C_i$. Due to the customer rejection mechanism, the effective arrival rate of customers in state $W(x) = C_i$ is zero. Similarly, the departure rate at state $W(x) = 0$ is zero, and is equal to $j\mu_i$ for each $j \in \Omega_i$ with $j > 0$. This information can be summarized succinctly using the corresponding CTMC infinitesimal generator matrix $G^i \in \mathbb{R}^{(C_i+1)\times(C_i+1)}$. For convenience we index $G^i$ by the state space so that a zero index is used to reference transition rates to and from state zero. The entries of $G^i$ can be given explicitly in matrix form as
$$G^i = \begin{bmatrix}
-\tilde\lambda_i & \tilde\lambda_i & 0 & \cdots & 0 \\
\mu_i & -(\tilde\lambda_i + \mu_i) & \tilde\lambda_i & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & (C_i - 1)\mu_i & -(\tilde\lambda_i + (C_i - 1)\mu_i) & \tilde\lambda_i \\
0 & \cdots & 0 & C_i \mu_i & -C_i \mu_i
\end{bmatrix}.$$

We let $P_{ij}(t)$ denote the time-$t$ transition probability from state $i$ to state $j$. We let $\pi^i \in \Delta^{C_i+1}$ denote the stationary distribution of this chain, so that $\lim_{t\to\infty} P_{ij}(t) = \pi^i_j$. Since the continuous-time chains we consider consist of a single recurrent class with finite state space, such a stationary distribution is guaranteed to exist and is unique, as shown in, for instance, Levin et al. (2009).
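For concreteness, the sketch below (with made-up rates) assembles the birth-death generator $G^i$ shown above and solves $\pi G^i = 0$ for its stationary distribution; as a sanity check, the stationary mass at the full-capacity state should match the Erlang-B blocking probability for this loss system, up to numerical error.

```python
import numpy as np
from math import factorial

def birth_death_generator(lam: float, mu: float, C: int) -> np.ndarray:
    """Infinitesimal generator of the single-resource birth-death chain with
    constant arrival rate lam for states < C and departure rate j*mu in state j."""
    G = np.zeros((C + 1, C + 1))
    for j in range(C + 1):
        if j < C:
            G[j, j + 1] = lam          # birth: an admitted arrival
        if j > 0:
            G[j, j - 1] = j * mu       # death: a service completion
        G[j, j] = -G[j].sum()          # rows of a generator sum to zero
    return G

def stationary_distribution(G: np.ndarray) -> np.ndarray:
    """Solve pi @ G = 0 with the entries of pi summing to one."""
    n = G.shape[0]
    A = np.vstack([G.T, np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

lam, mu, C = 2.5, 1.0, 5                # hypothetical rates and capacity
pi = stationary_distribution(birth_death_generator(lam, mu, C))
rho = lam / mu
blocking = (rho**C / factorial(C)) / sum(rho**l / factorial(l) for l in range(C + 1))
print(pi[C], blocking)                  # the two values should agree
```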

At time $x$, we define the instantaneous revenue loss versus steady state by
$$\mathcal{L}^i(x) = \sum_{j=1}^{C_i} r_i\, j\, \pi^i_j - r_i W_i(x) = r_i \sum_{j=1}^{C_i} j\, \bigl(\pi^i_j - \mathbb{I}(W_i(x) = j)\bigr).$$
Then, for a single sample path, the total loss relative to steady state is found by integrating over $x$,
$$\mathcal{L}^i = \int_0^{\infty} \mathcal{L}^i(x)\, dx. \qquad (2.14)$$

To demonstrate the effectiveness of our policy we seek to bound this loss in expectation over sample paths, $\mathbb{E}[\mathcal{L}^i]$. For the remainder of this subsection we restrict our attention to a single resource. Without loss of generality we assume that the arrival rate $\tilde\lambda_i$ is expressed in units of service rates so that $\mu_i = 1$. In this case we will refer to $G^i$ as being service-rate normalized. We will make use of the eigenvalues of $-G^i$ to characterize the speed with which

$W_i(x)$ attains its highest state $C_i$. We then show that once state $C_i$ is reached, the future reward of $W_i(x)$ must be at least as high as the steady-state revenue, proving the proposition. We pause to note that a birth-death CTMC is reversible in time and therefore the corresponding generator matrix has real eigenvalues (Fill 2009). In particular, since the rows of $G^i$ sum to zero, the vector of ones in $\mathbb{R}^{C_i+1}$ is an eigenvector of $-G^i$ with a corresponding eigenvalue of zero. Then let $\nu_0 = 0$ and let $\nu_1, \dots, \nu_{C_i}$ denote the non-zero eigenvalues of $-G^i$.

We note that in our case $W_i(x)$ is restricted to instantaneous increases of unit size, so our chain is skip-free. Thus, to bound the revenue loss, we are able to make use of the result of Theorem 1.1 from Fill (2009), which states that the distribution of the hitting time of state $C_i$, $T^*$, for a chain of this type is given by the convolution of exponential distributions with rates $\nu_j$. This implies that $\mathbb{E}[T^*] = \sum_{j=1}^{C_i} \frac{1}{\nu_j}$ and that $\mathrm{Var}(T^*) = \sum_{j=1}^{C_i} \frac{1}{\nu_j^2}$. We will demonstrate that the expected time until hitting $C_i$ is directly proportional to the revenue loss in comparison to steady state. Therefore, to ensure that convergence occurs quickly, we are motivated to provide lower bounds on the eigenvalues $\nu_1, \dots, \nu_{C_i}$. The following lemma is referenced in van Doorn and Zeifman (2009) and is proved using the theory of orthogonal polynomials (Chihara 2011).

Lemma 3. If $G^i$ is service-rate normalized, the eigenvalues of $-G^i$ are bounded from below by $\nu_j \ge j$ for $j = 0, 1, \dots, C_i$.

We note that this lower bound on the eigenvalues is tight as $\tilde\lambda_i$ approaches zero. For larger values of the aggregate arrival rate, such as $\tilde\lambda_i / C_i \approx 1$, this lower bound tends to be loose. In Table 2.1 we present a sample of these expected hitting times for several capacities and traffic intensities. We further note that the resources most critical to overall system revenue are likely to have high levels of allocated aggregate traffic, and so importance to system revenue tends to correlate favorably with speed of convergence.

              $\tilde\lambda_i/C_i = 0.5$   $\tilde\lambda_i/C_i = 1.0$   $\tilde\lambda_i/C_i = 1.5$
$C_i = 5$            1.61                  1.02                  0.70
$C_i = 20$           2.87                  1.58                  0.92
$C_i = 100$          4.49                  2.31                  1.05

Table 2.1: Expected hitting times, $\mathbb{E}[T^*]$, in units of service times.

We begin by showing that the future reward gained by beginning in a higher utilization state is always higher on a pathwise basis. For a process $W(x)$ on a given sample path we define the reward function by

$$R_W(\tau) = \int_0^{\tau} r\, W(x)\, dx.$$
We further define the (possibly infinite) total future reward $R_W = \lim_{\tau\to\infty} R_W(\tau)$. The following lemma demonstrates that if we observe two processes on the same sample path, the future revenue is monotone in the current utilization state. The proof is provided in the appendix.

Lemma 4. For two such processes $W(x)$ and $Y(x)$ whose future evolution is subject to the same sample path, with $W(0) \le Y(0)$ we have $R_{W-Y} \le 0$.

With these lemmas in place we are able to state the bound on the revenue loss in the following proposition. The proof of this result may be found in the appendix.

Proposition 4. If $G^i$ is service-rate normalized, then the expected revenue loss before attaining steady state is bounded above by
$$\mathbb{E}[\mathcal{L}^i] \le r_i\bigl(C_i \ln(C_i + 1) + C_i\bigr).$$

This proposition demonstrates that although our revenue guarantees apply to the steady-state regime, this regime is reached quickly. This also creates confidence that, so long as changes in demand are reasonably slow in comparison to average service times, a series of static policies (often referred to as 'time-of-day' policies) can effectively deal with changing demand in the dynamic setting.

2.2.5 Dynamic Assortment Policies

The TASP and SIOP and their state-dependent variants are provably strong policies for demand management in this setting. However, when real-time utilization information is available, an operator may seek a policy that is more deterministic in nature and that responds to changing resource availability in a more proactive fashion. Due to the size of the global state space, computing the globally optimal dynamic policy is intractable in general, so in this section we explore heuristic policies for dynamic assortment optimization. Subsequently, in section 2.3 we demonstrate that these policies perform well in our scenario of practical interest. We begin by introducing the obvious greedy policy and discuss its shortcomings. To mitigate these we proceed by examining the relevant dynamic programming formulation, as examined in the case of consumable inventory in the work of Liu and Van Ryzin (2008) and Bront et al. (2009). Following these ideas it is possible to develop heuristic strategies that make use of the current state of the system as well as the marginal values of inventory derived from the solutions to equation (2.3).

At time $x$, the current utilization of the system is captured by the $N \times K$ matrix $W(x)$, whose entries represent the number of each customer type being served at each resource, and recall the notation $W_i(x) = \sum_{k=1}^{K} W_{ik}(x)$ to denote the total utilization of $i$ at time $x$. Then, upon the arrival of a customer of type $k$, the greedy policy maximizes the expected marginal gain by offering the assortment $S_k(x)$, representing the solution to the optimization problem

$$\max_{S\in\mathcal{S}} \sum_{i\in S : W_i(x) < C_i} \frac{r_{ik}}{\mu_{ik}}\, P_{ik}(S).$$

By maximizing incremental revenue upon each arrival, the greedy policy ensures that customers are always routed to high-value resources; however, this policy fails to consider the impact of future customer choice behavior on long-term revenue. For instance, the greedy algorithm may offer a resource with high expected revenue to all arriving customers indiscriminately, but this is suboptimal if there is a class of customers for which this resource is the only acceptable alternative. Similar issues are examined in Bernstein et al. (2015), where the authors demonstrate the optimality of a policy incorporating inventory rationing in their model. In some systems, such as those with light traffic relative to system capacity and in which revenue rates and service times do not differ greatly between resources and customer types, the greedy policy performs well. However, with increasing disparity in revenue or service rates between customer types, or in the case where some customer types are more selective than others, more sophisticated assortment policies can provide improved performance, as we demonstrate in section 2.3.

The more sophisticated heuristic policies we consider are motivated by the dynamic programming formulation of equation (2.2), which is most clearly presented under the assumption that the system evolves as a controlled continuous-time Markov chain. In the derivations that follow we assume that the service time for a customer of type $k$ at resource $i$ is distributed as an exponential random variable with rate $\mu_{ik}$. Under the Markovian assumption, the state of the system is specified fully by a utilization matrix $W$, and we use $e_i^k$ to represent the matrix with a value of one at position $(i, k)$ and zero elsewhere. Further, we are able to represent the value function by uniformizing the system against the maximum possible transition rate $\nu = \sum_{k=1}^{K} \lambda_k + \sum_{i=1}^{N} C_i \max_k(\mu_{ik})$. If the value of the optimal policy is given by $J^*$ and the relative value of state $W$ is given by $V^*(W)$, the dynamic programming formulation is given by
$$
\begin{aligned}
J^* + V^*(W) = \max_{\alpha_1,\dots,\alpha_K} \Bigg\{ & \sum_{k=1}^{K} \sum_{S\in\mathcal{S}} \sum_{i\in S : W_i < C_i} \frac{\lambda_k P_{ik}(S)\alpha_k(S)}{\nu} \left( \frac{r_{ik}}{\mu_{ik}} + V^*(W + e_i^k) \right) \\
& + \sum_{i=1}^{N} \sum_{k=1}^{K} \frac{\mu_{ik} W_i^k}{\nu}\, V^*(W - e_i^k) \qquad\qquad\qquad\qquad (2.15)\\
& + \left(1 - \sum_{k=1}^{K} \sum_{S\in\mathcal{S}} \sum_{i\in S : W_i < C_i} \frac{\lambda_k P_{ik}(S)\alpha_k(S)}{\nu} - \sum_{i=1}^{N}\sum_{k=1}^{K} \frac{\mu_{ik} W_i^k}{\nu}\right) V^*(W) \Bigg\}.
\end{aligned}
$$

Here, for ease of representation, we have replaced the revenue associated with each customer service event by its expected value $\frac{r_{ik}}{\mu_{ik}}$, which does not affect the expected value of a policy. We then isolate the decision problem at each state $W$ as
$$\max_{\alpha_1,\dots,\alpha_K} \left\{ \sum_{k=1}^{K} \sum_{S\in\mathcal{S}} \sum_{i\in S : W_i < C_i} \frac{\lambda_k P_{ik}(S)\alpha_k(S)}{\nu} \left( \frac{r_{ik}}{\mu_{ik}} - \bigl(V^*(W) - V^*(W + e_i^k)\bigr) \right) \right\}. \qquad (2.16)$$

The term $\bigl(V^*(W) - V^*(W + e_i^k)\bigr)$ in (2.16) can be viewed as the marginal value associated with having an extra unit of resource $i$ versus it being utilized by a customer of type $k$. In the asymptotic regime, as $\min_i(C_i)$ grows large, the system is not exposed to blocking and the TASP is optimal. Therefore, in this regime the probabilities $\alpha^{TASP}$ represent the solution to equation (2.16). The optimal dual variables $\gamma_i^{TASP}$ corresponding to the resource capacity constraints intuitively represent the marginal rate that the operator should be willing to pay for an additional unit of capacity at resource $i$. These dual values are especially useful because they account for both the value of a resource and the degree to which other resources in the network can serve as an effective substitute. This motivates the use of the variables $\gamma_i^{TASP}$ in approximating this marginal value as
$$V^*(W) - V^*(W + e_i^k) = \frac{\gamma_i^{TASP}}{\mu_{ik}}. \qquad (2.17)$$

The most basic heuristic then acts greedily with respect to this constant marginal cost, selecting an offer set $S_k(x)$ that maximizes value over this marginal value approximation. That is, we compute the assortment to offer each customer type $k$ at time $x$ using
$$S_k(x) = \arg\max_{S\in\mathcal{S}} \left\{ \sum_{i\in S : W_i(x) < C_i} \left( \frac{r_{ik} - \gamma_i^{TASP}}{\mu_{ik}} \right) P_{ik}(S) \right\}.$$

This reduces to a static assortment optimization problem with the revenue rate of each product reduced by $\gamma_i^{TASP}$. We term this policy the TASP Constant Marginal Value (TCMV) policy. Although simple, the TCMV policy has the advantage that it is easy to implement even when the service rates $\mu_{ik}$ are non-constant between customer types for the same resource, a circumstance in which the more sophisticated policies we discuss next are difficult to compute.

When the service time for each resource is constant for all customer types, that is, $\mu_{ik} = \mu_i$ for all $k$ for each $i$, it is also possible to implement a more intricate dynamic programming-based policy which computes marginal values of inventory that depend on the current utilization. Under this assumption and the CTMC model, the state space for each resource collapses to the $C_i + 1$ possible utilization levels. Following the heuristics suggested in Liu and Van Ryzin (2008) and Bront et al. (2009), we formulate a one-dimensional dynamic program for each resource independently, thereby obtaining a policy in which the marginal value of inventory dynamically adjusts based on current utilization. To obtain this formulation we focus on an individual resource $j$ and use a much smaller dynamic program to obtain marginal values, assuming that equation (2.17) holds for all customer types $k$ at resources $i \ne j$. That is, we approximate the overall value function by making use of a series of one-dimensional approximate value functions, $\tilde V_j(W_j)$. Under this approximation all resources $i \ne j$ are assumed to be always available.

We focus here on a deterministic policy, so the set to offer customer type $k$ given the state $W$ is computed using policy iteration with the optimization step taking the form
$$\max_{S_k\in\mathcal{S}} \left\{ \sum_{i\in S_k} P_{ik}(S_k) \left[ \left( \frac{r_{ik} - \gamma_i}{\mu_i} \right) \mathbb{I}(i \ne j) + \left( \frac{r_{jk}}{\mu_j} - \bigl(\tilde V_j(W_j) - \tilde V_j(W_j + 1)\bigr) \right) \mathbb{I}(i = j,\, W_j < C_j) \right] \right\}. \qquad (2.18)$$
When service times are not equal for each customer type, the state space of even the single-resource dynamic program grows intractably large. Attempting to extend these approaches to full generality is an interesting direction for future research.

2.3 Computational Case Study - Assortment Only

To test the effectiveness of our assortment policies in a realistic scenario, we constructed a numerical experiment based on parking data collected in the borough of Islington, London, by the PayByPhone payment system. In this scenario the borough operates a network of parking lots, and we are interested in quantifying the revenue generation and congestion-control potential of a dynamic assortment strategy. The borough of Islington is currently conducting a pilot program of the GoPark system, which recommends parking spaces to customers based on their residential status. One possible proposal is a mobile phone application that recommends to drivers a place to park based on their desired destination. Due to the limited screen space on a mobile phone and the need to minimize driver distraction, it is reasonable that drivers are offered a limited number of spaces in which to park.

In the Islington data set, each resource is a group of adjacent parking spaces which share the same pricing characteristics and is termed a meter. Associated with each meter $i$ is an observed arrival rate, $\lambda_i^{obs}$, of customers into the meter and the rate, $\mu_i$, at which present customers depart the meter. After processing the data as described in appendix A.2, we obtain a universe of $N = 287$ meters for our analysis. Under the current policy, each meter $i$ has one of nine possible fixed prices, $r_i \in \{1.2, 1.8, 2.0, 2.4, 3.0, 3.6, 4.0, 4.8, 5.0\}$, denominated in GBP. Also associated with each meter $i$ is a location $\ell_i$ in latitude and longitude, and we use the function $D(\ell_i, \ell_j)$ to express the distance between meter $i$ and meter $j$ in kilometers.

Our work relies on modeling the decision process of customers through a choice model. Since the data set does not contain information on customers who decline the opportunity to park due to the price, or on potential substitution behavior between meters, to conduct our analysis we must make a number of assumptions on the customer decision process. First, we assume that the universe of possible customer types can be adequately approximated by the customer traffic to each meter. Therefore we have that the number of customer types is $K = N = 287$, and we assume that customers of type $k$ are those associated with the $k$th meter. Thus, for lack of further information, for each customer type $k$, we take their preferred destination to be the location of the

$k$th resource, and therefore $r_k$ also denotes the price of the $k$th resource as fixed in the data set. In addition, we assume that the service time distribution is independent of the resource selected, and hence all customers of type $k$ remain in service for a duration that is exponentially distributed with rate parameter $\mu_k$ regardless of the resource they select.

For our purposes, we propose a multinomial logit (MNL) model to describe the likelihood of a customer of type $k$ electing to utilize meter $i$. The parameter $u_{ik}$ represents the mean utility customers of type $k$ derive from successfully utilizing resource $i$. The parameters $\beta$ and $\eta$ specify the sensitivity of customer utility to price and distance, respectively. The deterministic component of customer utility, or valuation, is then given by
$$U_{ik} = u_{ik} + \beta\, \frac{r_i - r_k}{\mu_k} + \eta\, D(\ell_i, \ell_k). \qquad (2.19)$$
In accordance with the specification of the MNL model, we assume that the heterogeneity of customer valuations within each type for each alternative is distributed as standard Gumbel random variables, $\epsilon$. We also take the convention of assigning a base utility of zero to the no-purchase option; this leads to the expression
$$P_{ik}(S) = \frac{e^{U_{ik}}}{1 + \sum_{j\in S} e^{U_{jk}}}, \qquad (2.20)$$
which describes the probability that a customer of type $k$ purchases resource $i$ from the offered assortment $S$.

We seek to model the behavior of all potential customers arriving to the system; however, the arrival rates inferred from our data set consist only of customers who are using the PayByPhone system, and due to the unavailability of real-time usage data we are also unable to observe customer arrivals which are blocked from the system when their target meter is full. We have worked to correct our estimates of the true arrival rates $\lambda_k$ for these potential biases, as we explain in detail in the appendix.
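To make (2.19)-(2.20) concrete, the sketch below evaluates the purchase probabilities for a toy set of meters using the parameter values reported below ($\bar u = 2$, $\beta = -1$, $\eta = -5$); the prices, service rates, and distances are invented for illustration.

```python
import numpy as np

u_bar, beta, eta = 2.0, -1.0, -5.0       # utility parameters from the case study

def purchase_probs(S, k, r, mu, dist):
    """P_ik(S) from (2.19)-(2.20) for customer type k and offered set S.

    r[i]: price of meter i (GBP); mu[k]: service rate of type k;
    dist[i, k]: distance in km between meter i and type k's destination.
    """
    U = np.array([u_bar + beta * (r[i] - r[k]) / mu[k] + eta * dist[i, k] for i in S])
    expU = np.exp(U)
    return expU / (1.0 + expU.sum())      # remaining mass is the no-purchase option

# Hypothetical three-meter instance.
r = np.array([2.0, 2.4, 3.6])
mu = np.array([1.0, 0.8, 1.2])
dist = np.array([[0.0, 0.3, 0.6],
                 [0.3, 0.0, 0.4],
                 [0.6, 0.4, 0.0]])
print(purchase_probs(S=[0, 1, 2], k=0, r=r, mu=mu, dist=dist))
```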

For the purpose of generating our results in this section we used the system parameters $u_{ik} = \bar u = 2$, $\beta = -1$, $\eta = -5$. These values were selected for giving intuitively reasonable purchase probabilities. By interpreting $\beta$ as the decrease in utility per GBP of payment, we note that these values correspond to customers gaining 2 GBP of utility over the cost of their preferred meter, and that customers would be willing to pay 5 GBP to avoid needing to walk an additional kilometer. We also assume a capacity limit on the offered assortments of $M = 3$, which is reasonable in the context of a parking-assistance mobile phone application and prevents the MNL from offering large assortments of high-priced but remote parking facilities.

[Figure 2-1 (panels (a) and (b)): Absolute and relative performance of state-independent policies in steady-state versus the arrival rate scaling factor.]

Due to the computation time necessary to compute optimal capacitated assortments using the $O(N^2)$ algorithm of Rusmevichientong et al. (2010b), we have restricted our attention in this setting to the southwest quadrant of the data, which consists of $N = 57$ meters. We compare the performance of the following policies and heuristic methods against the linear-programming-based upper bound as calculated in computing the TASP.

∙ TASP - executed as described in section 2.2.2 without reference to system state

∙ AASP - executed as described in section 2.2.2 by removing blocked items

∙ SIOP - executed as described in section 2.2.3 without reference to system state

∙ ASIOP - executed as described in section 2.2.3 by removing blocked items

[Figure 2-2 (panels (a) and (b)): Absolute and relative performance of state-dependent policies in steady-state versus the arrival rate scaling factor.]

∙ Greedy - the state-dependent greedy heuristic as described in section 2.2.5

∙ TCMV - the state-dependent TASP Constant Marginal Value heuristic as described in section 2.2.5

∙ InvBal - the inventory balancing algorithm of Golrezaei et al. (2014), using the exponential penalty function

The steady-state performance of the TASP and SIOP, which act independently of the current state of the system, can be evaluated exactly using the Erlang-B formula (2.7), as in equation (2.11), for example. The absolute and relative performance of these state-independent algorithms are displayed in figure 2-1, where the relative performance is given by the fraction of the LP-based upper bound obtained.

The performance of the state-dependent policies must be evaluated using simulation due to the prohibitively large size of the state space. The performance of these policies in the same setting is displayed in figure 2-2. The results for each such evaluation are computed using a burn-in period of at least 10 units of time. This burn-in period is reasonable, as the minimum estimated service rate satisfies $\min_k(\mu_k) > 0.5$, and so, as suggested by the data reported in Table 2.1, this should be sufficient for the system to reach steady state in most cases. To elucidate the effect of system capacity on the performance of the algorithms, we compute the performance of each on a spectrum of systems derived as described, with the arrival rate of each class scaled by the same constant factor.

From these results we can make a number of observations. First, we note that, in the absence of real-time information about the state of the system, determining the optimal assortment is critical in achieving the highest possible fraction of the optimal revenue, especially in relatively lighter traffic. Second, these results emphasize the value of understanding the current state of the system before making assortment decisions. The policies that are able to adapt to current utilization states and avoid losing customers due to poor routing all tend to achieve much higher fractions of the optimal revenue than the state-independent policies. Third, the greedy algorithm actually performs quite well in these scenarios, especially in the case of lighter traffic. Under heavier traffic, when desirable resources can become scarce, the TCMV policy, which uses the dual values corresponding to the capacity constraint for each resource as its marginal price, becomes dominant. We see that, in general, the TCMV tends to perform comparably to the greedy strategy in lighter traffic and outperforms it as system congestion increases. The dual value works well because it accounts for a resource's scarcity as well as the degree to which other resources in the system can serve as effective substitutes for individual customer types.
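The state-dependent results above were obtained by simulation. A minimal sketch of such a simulator is shown below: it draws Poisson arrivals per customer type, applies a pluggable offer policy (here a trivial greedy single-item rule), samples MNL choices, and tracks revenue after a burn-in period. The instance data and the policy interface are illustrative, not the code used for the reported experiments.

```python
import heapq
import numpy as np

rng = np.random.default_rng(1)

# Toy instance: N resources, K = N customer types, exponential services.
N = 3
C = np.array([2, 2, 1]); r = np.array([3.0, 2.0, 4.0])
lam = np.array([1.0, 0.8, 1.2]); mu = np.array([1.0, 0.9, 1.1])
u = np.full((N, N), 1.0)                          # flat MNL utilities for the sketch

def greedy_offer(k, W):
    """Offer the single available resource with the largest expected revenue r_i/mu_k."""
    avail = [i for i in range(N) if W[i] < C[i]]
    return [max(avail, key=lambda i: r[i] / mu[k])] if avail else []

def simulate(policy, horizon=2000.0, burn_in=10.0):
    W = np.zeros(N, dtype=int)                    # current utilization of each resource
    events = []                                   # (time, kind, info) min-heap
    for k in range(N):                            # first arrival of each type
        heapq.heappush(events, (rng.exponential(1 / lam[k]), "arr", k))
    revenue = 0.0
    while events:
        t, kind, info = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "dep":
            W[info] -= 1                          # a service completion frees capacity
            continue
        k = info                                  # an arrival of type k
        heapq.heappush(events, (t + rng.exponential(1 / lam[k]), "arr", k))
        S = policy(k, W)
        weights = np.exp(u[S, k]) if S else np.array([])
        full = np.append(weights, 1.0)            # last entry: no-purchase option
        choice = rng.choice(len(full), p=full / full.sum())
        if S and choice < len(S):
            i = S[choice]
            W[i] += 1
            if t > burn_in:
                revenue += r[i] / mu[k]           # expected revenue of the service event
            heapq.heappush(events, (t + rng.exponential(1 / mu[k]), "dep", i))
    return revenue / (horizon - burn_in)

print("simulated revenue rate (greedy):", simulate(greedy_offer))
```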

2.4 Pricing and Assortment Decisions

In many cases of practical interest, in addition to the choice of assortment, an operator has the flexibility to adjust their pricing strategy to help match demand with available supply and to capture a larger share of the available consumer surplus. Here we consider two classes of problems representing varying degrees of pricing flexibility that may be encountered in various applications, which we term the opportunistic-pricing problem and the fair-pricing problem, respectively. In the opportunistic scenario the operator is free to engage in price discrimination between the various customer types, while in the fair-pricing setting such discrimination is not permitted. We first introduce common notation applicable to both scenarios. We then introduce each scenario in turn, along with methodologies that are applicable to solving the upper bounding and policy-guiding linear programs in each setting. Subsequently, we introduce the randomized policies of section 2.2 and discuss how the performance guarantees presented there can be extended to the pricing context.

Here, in contrast to the analysis presented in section 2.2, in addition to the assortment

decision $S_k \in \mathcal{S}$ for each customer type $k$, the seller must also make a pricing decision. Thus, in this case the operator's decision at each stage is specified by both the selected prices and assortments, denoted $X = (R, S)$, and we further define $\mathcal{X} = \mathcal{P} \times \mathcal{S}$ to be the space of such decisions. Taking into account the demand specification $\mathcal{M}$ and the current utilization of the system, at each point the operator must make a pricing and assortment decision $X \in \mathcal{X}$. We then use the shorthand notation $P_{ikl}(X)$ to denote the probability that a customer of type $k$ purchases $i$ at price level $l$ when presented with the price and assortment configuration specified by $X \in \mathcal{X}$. In many practical cases, the number of products it is possible to offer in a single assortment is constrained by shelf or screen space. If the number of products is limited to some $M \in \mathbb{Z}_+$, this is known as the $M$-capacitated assortment optimization problem. In this scenario, we restrict our pricing and assortment decision further to lie in the set $\mathcal{X}_M = \{X \in \mathcal{X} : |S| \le M\}$.

2.4.1 The Opportunistic Pricing Case

With this notation in hand, we begin by describing the opportunistic case. In this scenario the operator is free to charge customers of different classes different rates for the same resource. In particular, the operator is able to engage in dynamic price discrimination, taking into account both the dynamics of the resource system and the price sensitivity of the customer. Such price discrimination is known to be effective in increasing revenue (Talluri and van Ryzin 2004b) and can be applicable in domains where pricing is expected to be customized, such as is the case for insurance, or in markets for which pricing is opaque and customer-specific, as is often the case in business-to-business transactions. Here, by introducing virtual products representing the various price levels associated with each product, as in Gallego and Topaloglu (2014) for example, we can apply the methodology developed in section 2.2 to solve this problem. Specifically, as before, each column in the LP formulation now corresponds to an assortment decision for a given customer type $k$, but in this case each individual item is replaced with $L$ virtual items, representing that product at each of its $L$ prices. The operator then makes assortment decisions in the same manner described in section 2.2, with the additional restriction that each assortment $S$ must be valid, containing at most one virtual item corresponding to each item in $\mathcal{N}$. By extending the notion of our decision variables we obtain $\alpha_k(X)$ corresponding to configuration $X = (S, R)$. This variable represents the fraction of time customers of type $k$ are offered assortment $S$ with price levels set according to $R$. Using these variables, we formulate the linear programming upper bound in the opportunistic case as follows,

$$
\begin{aligned}
J^{LP-OP} = \max_{\alpha}\;\; & \sum_{X\in\mathcal{X}} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \frac{p_{il}}{\mu_{ik}}\, \lambda_k P_{ikl}(X)\, \alpha_k(X) \\
\text{s.t.}\;\; & \sum_{X\in\mathcal{X}} \sum_{k=1}^{K} \sum_{l=1}^{L} \frac{\lambda_k}{\mu_{ik}}\, P_{ikl}(X)\, \alpha_k(X) \le C_i && \forall i \in \mathcal{N} \qquad (2.21)\\
& \sum_{X\in\mathcal{X}} \alpha_k(X) \le 1 && \forall k \in \mathcal{K} \\
& \alpha_k(X) \ge 0 && \forall X \in \mathcal{X}_M,\; \forall k \in \mathcal{K}.
\end{aligned}
$$

In very small instances the opportunistic problem may be solved directly using the linear programming formulation. However, as before, the number of possible assortments, exacerbated by the introduction of the virtual items, is likely to make the full problem intractably large, necessitating the use of a column generation procedure. As was the case in section 2.2, to find an augmenting column it is sufficient to solve a single-period static pricing and assortment problem over valid assortments. Under some models of customer choice this problem admits an efficient exact algorithm. For example, Gallego and Topaloglu (2014) present a polynomial-time algorithm to solve this problem exactly when customer choice is specified by a nested logit model, of which the MNL model is a special case. In such cases the opportunistic problem can be solved just as efficiently as the assortment-only problem.
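To illustrate the virtual-product construction, the sketch below expands each item into $L$ price-level copies and enumerates the valid assortments (at most one copy per underlying item, at most $M$ items) for a tiny, invented instance.

```python
from itertools import combinations, product

N, L, M = 3, 2, 2
prices = {i: [2.0 + i, 3.0 + i] for i in range(N)}       # hypothetical price grids

# Virtual items: one (item, price level) pair per price in the grid.
virtual_items = [(i, l) for i in range(N) for l in range(L)]

def valid_assortments():
    """All assortments of virtual items with at most one price level per item
    and at most M items, i.e. the valid columns for the opportunistic case."""
    for m in range(1, M + 1):
        for items in combinations(range(N), m):           # which real items to offer
            for levels in product(range(L), repeat=m):     # one price level for each
                yield tuple((i, l) for i, l in zip(items, levels))

print(sum(1 for _ in valid_assortments()), "valid assortments")
print(next(iter(valid_assortments())))                     # e.g. ((0, 0),)
```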

2.4.2 The Fair-pricing Case

Despite its theoretical efficacy, price discrimination may face regulatory restrictions and may incur backlash from customers, especially in consumer-facing markets. Thus we also propose the fair-pricing problem, in which the operator's pricing strategy must not directly discriminate between customers of different classes. That is, for each resource $i$ and each pair of customer types $k$ and $k'$, the pricing rates must be equal, satisfying $r_{ik} = r_{ik'}$. In other words, to avoid class-based discrimination the operator must make their pricing decision prior to the realization of the type of the arriving customer. Here, we work with the restricted space of pricing configurations, $\mathcal{P}^{FP} = \mathcal{P}_1 \times \mathcal{P}_2 \times \dots \times \mathcal{P}_N$. In such a case the pricing decision $R \in \mathcal{P}^{FP}$ reduces to an $N$-vector of prices. This scenario is challenging in that it induces a strong constraint on the pricing decision that must hold for each assortment offered to distinct customer types. Formally, in the fair-pricing context we must ensure that each product is offered at a single price regardless of customer class; thus we require that any valid pricing and assortment decision belong to the set $\mathcal{X}^{FP} = \mathcal{P}^{FP} \times \mathcal{S}^K$. To ensure fair pricing, we highlight the fact that each decision now specifies a price level as well as an assortment for each customer type simultaneously. For convenience in our formulations, we will interpret the joint pricing and assortment decision as a 3-dimensional vector $X \in \{0,1\}^{N\times L\times K}$, with $x_{ikl} = 1$ indicating the decision to offer product $i$ at price level $l$ to customers of type $k$.

Thus we may specify $\mathcal{X}^{FP} = \{X \in \{0,1\}^{N\times L\times K} : \sum_{l=1}^{L} x_{ikl} \le 1\ \forall i, k;\;\; 1 - x_{ikl} \ge x_{ik'l'}\ \forall i, k, k', l \ne l'\}$. The first restriction ensures that the same product is not offered to the same customer type at various price levels. The second constraint prevents price discrimination. This restriction can be formulated more succinctly by introducing the auxiliary variables $Y \in \mathcal{Y} = \{Y \in \{0,1\}^{N\times L} : \sum_{l=1}^{L} y_{il} = 1\ \forall i \in \mathcal{N}\}$, with the interpretation that $y_{il} = 1$ indicates the decision to price product $i$ at price level $l$. To express the fair-pricing problem we may write $\mathcal{X}^{FP} = \{X \in \mathcal{X} : \exists Y \in \mathcal{Y},\; x_{ikl} \le y_{il}\ \forall i \in \mathcal{N}, l \in \mathcal{L}, k \in \mathcal{K}\}$, and we use the notation $Y(X)$ to denote the auxiliary variables associated with the pricing and assortment decision $X$. Then, to incorporate a potential $M$-capacity constraint, we introduce the set $\mathcal{X}_M = \{X \in \{0,1\}^{N\times L\times K} : \sum_{i=1}^{N}\sum_{l=1}^{L} x_{ikl} \le M\ \forall k\}$ and restrict the decision space to $\mathcal{X}_M^{FP} = \mathcal{X}_M \cap \mathcal{X}^{FP}$. With this notation in hand we are able to formulate the bounding linear program for the $M$-capacity constrained fair pricing and assortment problem by introducing the decision variables $\alpha(X)$. For each $X \in \mathcal{X}_M^{FP}$, $\alpha(X)$ can be interpreted as the fraction of time in which the operator makes the pricing and assortment decisions specified by $X$ to arriving customers. We solve the following linear program to determine an allocation that maximizes long-run expected revenue,

$$
\begin{aligned}
J^{LP-FP} = \max_{\alpha}\;\; & \sum_{X\in\mathcal{X}_M^{FP}} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} p_{il}\, \frac{\lambda_k}{\mu_{ik}}\, P_{ikl}(X)\, \alpha(X) \\
\text{s.t.}\;\; & \sum_{X\in\mathcal{X}_M^{FP}} \sum_{k=1}^{K} \sum_{l=1}^{L} \frac{\lambda_k}{\mu_{ik}}\, P_{ikl}(X)\, \alpha(X) \le C_i && \forall i \in \mathcal{N} \qquad (2.22)\\
& \sum_{X\in\mathcal{X}_M^{FP}} \alpha(X) \le 1 \\
& \alpha(X) \ge 0 && \forall X \in \mathcal{X}_M^{FP}.
\end{aligned}
$$

As before, the column generation problem associated with the master problem (2.22) reduces to the single-period static problem. However, the following proposition demonstrates that, in general, this column generation subproblem poses computational challenges. We refer to this subproblem, which is of independent interest, as the fair pricing and personalized assortment optimization problem (FPPAO). Intuitively, the FPPAO problem is concerned with selecting a price for each resource $i$ from the set of candidate prices $\mathcal{P}_i$ and subsequently deciding which assortment should be shown to each customer type. Given the data tuple $(\mathcal{S}, \mathcal{P}, \mathcal{M})$, the FPPAO problem is to solve the following optimization problem,
$$\max_{R\in\mathcal{P}^{FP},\; S_k\in\mathcal{S}:\, |S_k|\le M}\; \sum_{k=1}^{K} \sum_{i\in S_k} r_i\, \mathbb{P}_{\mathcal{M}_k}(i;\, S_k, R).$$

Proposition 5. The $M$-capacitated fair pricing and assortment optimization is NP-hard even when $\mathcal{M}_k$ is specified by a multinomial logit model for each customer type $k$ and there are only two prices per item.

The result of this proposition implies that the FPPAO problem is difficult to solve in full generality. This means that the column generation subproblem required to solve formulation (2.22) is also NP-hard, rendering the procedure intractable in general. Owing to this difficulty, we are motivated to seek out special cases for which it can be solved approximately. To this end, in the following subsection, we study the special case of single-item assortments.

2.4.3 Fair-pricing with Single-item Assortments

In this section we consider the joint pricing and product offer problem, in which the assortments offered to customers must consist of a single item. Formally, the pricing and offer problem can be formulated by restricting the decision space to $\mathcal{X}_1^{FP}$. This leads to the following formulation of the linear programming upper bound for the pricing and offer problem,
$$
\begin{aligned}
J_1^{FP} = \max_{\alpha}\;\; & \sum_{X\in\mathcal{X}_1^{FP}} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} p_{il}\, \frac{\lambda_k}{\mu_{ik}}\, P_{ikl}(X)\, \alpha(X) \\
\text{s.t.}\;\; & \sum_{X\in\mathcal{X}_1^{FP}} \sum_{k=1}^{K} \sum_{l=1}^{L} \frac{\lambda_k}{\mu_{ik}}\, P_{ikl}(X)\, \alpha(X) \le C_i && \forall i = 1, \dots, N \qquad (2.23)\\
& \sum_{X\in\mathcal{X}_1^{FP}} \alpha(X) \le 1 \\
& \alpha(X) \ge 0 && \forall X \in \mathcal{X}_1^{FP}.
\end{aligned}
$$

We let $\alpha_1^{FP}$ denote the solution to this problem. Even after restricting our attention to the simpler setting defined by $\mathcal{X}_1^{FP}$, the number of possible pricing and offer configurations is given by $2^{LNK}$, and so we again employ column generation. Upon solving the reduced problem we check the optimality of our current solution to the full problem (2.23) by searching for a violated constraint in the associated dual problem,

$$
\begin{aligned}
\min_{\gamma, \sigma}\;\; & \sum_{i=1}^{N} C_i \gamma_i + \sigma \\
\text{s.t.}\;\; & \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \frac{\lambda_k}{\mu_{ik}}\, P_{ikl}(X)\, \gamma_i + \sigma \;\ge\; \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} p_{il}\, \frac{\lambda_k}{\mu_{ik}}\, P_{ikl}(X) && \forall X \in \mathcal{X}_1^{FP} \qquad (2.24)\\
& \gamma_i \ge 0 \;\; \forall i \in \mathcal{N}, \qquad \sigma \ge 0.
\end{aligned}
$$
Here the dual variables $\gamma$ and $\sigma$ are associated with the capacity constraints and the offering-time constraint, respectively. Let $\gamma^{RED}$ and $\sigma^{RED}$ be the resulting values of the dual variables after solving the reduced problem to optimality. Using these, we seek to solve the column generation subproblem,

$$\max_{X\in\mathcal{X}_1^{FP}}\; \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} p_{il}\, \frac{\lambda_k}{\mu_{ik}}\, P_{ikl}(X) - \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \gamma_i^{RED}\, \frac{\lambda_k}{\mu_{ik}}\, P_{ikl}(X) - \sigma^{RED}. \qquad (2.25)$$

If the resulting solution is non-positive, then the solution to the reduced primal problem is indeed optimal and we can terminate the procedure. Otherwise we add a configuration $X$ for which a positive value was obtained in (2.25) to the set of candidate columns and repeat this procedure with the updated set of candidate columns. Although the subproblem (2.25) requires optimization over an exponential number of possible configurations, we show that it can be formulated as an integer program.

To formulate this integer program we interpret the components of $X$ as variables $x_{ikl} \in \{0,1\}$, which we refer to as assignments. Similarly, we interpret the components of the auxiliary variable $Y$ as the variables $y_{il} \in \{0,1\}$. We observe that in the case of the pricing and offer problem, for each customer type $k$, $P_{ikl}(X)$ depends only on the single value $x_{ikl}$ for which $x_{ikl} = 1$. For this item $i$ and price level $l$, $P_{ikl}(X)$ is a constant, since the probability of purchasing product $i$ does not depend on the other, unoffered products. We use $P_{ikl}$ to denote this constant probability. This leads to the following formulation for the column generation subproblem,
$$
\begin{aligned}
\max_{x, y}\;\; & \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \bigl(p_{il} - \gamma_i^{RED}\bigr)\, \frac{\lambda_k}{\mu_{ik}}\, P_{ikl}\, x_{ikl} \\
\text{s.t.}\;\; & \sum_{i=1}^{N} \sum_{l=1}^{L} x_{ikl} \le 1 && \forall\, k \in \mathcal{K} \qquad (2.26)\\
& x_{ikl} \le y_{il} && \forall\, i \in \mathcal{N},\; k \in \mathcal{K},\; l \in \mathcal{L} \\
& \sum_{l=1}^{L} y_{il} = 1 && \forall\, i \in \mathcal{N} \\
& x_{ikl} \in \{0,1\},\; y_{il} \in \{0,1\} && \forall\, i \in \mathcal{N},\; k \in \mathcal{K},\; l \in \mathcal{L}.
\end{aligned}
$$

The first constraint ensures that the configuration offers at most one product and price combination to each customer type. The second constraint ensures that a resource $i$ is only offered to customers at price level $l$ if this price level is selected using the auxiliary variable $y_{il}$. The third constraint ensures that exactly one price level is selected for each product. This formulation of the problem remains NP-hard; however, it can be shown that

its objective is submodular in the pricing variables $y_{il}$. Therefore, by applying the pipage rounding framework proposed by Calinescu et al. (2007), it may be efficiently approximated to within a factor of $(1 - \tfrac{1}{e})$. By Theorem 2 of Gallego et al. (2015), by obtaining a $(1 - \tfrac{1}{e})$-approximation to the column generation subproblem at each iteration, we are able to obtain a $(1 - \tfrac{1}{e})$-approximation to the master problem (2.23).

On the other hand, observe that once the pricing variables $y_{il}$ are fixed, the assignment variables $x_{ikl}$ can be set by simple maximization. Therefore, the integrality restriction on the assignment variables is not practically necessary, resulting in a problem with $NL$ integer variables. In our numerical experiments we have observed that for problems of moderate size, $N = K \approx 100$ and $L \approx 10$, modern mixed-integer optimization solvers such as Gurobi are able to solve problem (2.26) within seconds. By a similar argument to that presented in section 2.2, the objective value $J_1^{FP}$ provides an upper bound on the revenue of any dynamic policy that must fix prices before observing the customer type and is restricted to make only single-product offers.
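For intuition, the sketch below solves a tiny, randomly generated instance of (2.26) exactly by enumerating the price vector $y$ and then, as observed above, setting the assignments $x$ by simple maximization for each customer type; a practical implementation would instead pass the formulation to a MIP solver. All data here are invented.

```python
from itertools import product
import numpy as np

N, L, K = 3, 2, 4
rng = np.random.default_rng(0)
p = np.array([[2.0, 3.0], [1.5, 2.5], [4.0, 5.0]])   # p_il: candidate prices
gamma = np.array([0.5, 0.2, 1.0])                     # gamma_i^RED dual values
lam = rng.uniform(0.5, 1.5, size=K)
mu = rng.uniform(0.8, 1.2, size=(N, K))
P = rng.uniform(0.05, 0.4, size=(N, K, L))            # P_ikl purchase probabilities

best_val, best_y = -np.inf, None
for y in product(range(L), repeat=N):                 # one price level per item
    # With prices fixed, each customer type gets its best single offer
    # (or no offer when every adjusted value is negative).
    val = 0.0
    for k in range(K):
        gains = [(p[i, y[i]] - gamma[i]) * lam[k] / mu[i, k] * P[i, k, y[i]]
                 for i in range(N)]
        val += max(0.0, max(gains))
    if val > best_val:
        best_val, best_y = val, y

print("best price levels per item:", best_y, "objective:", round(best_val, 3))
```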

After solving problem (2.23) exactly or approximately, it is simple to derive an intuitive policy that also carries strong theoretical guarantees. Specifically, prior to the arrival of each customer, the operator randomly selects a pricing and offer decision according to the probabilities $\alpha_1^{FP}$. When a customer arrives and their type is revealed, the product to offer is selected based on their type at the pre-specified pricing level. Just as in the case of the TASP for the assortment-only problem, the revenue of this policy can be decomposed by resource, and the only reason that the optimal revenue is not attained is the possibility of blocking. Therefore the result of Lemma 2 also applies directly to this case, and so by implementing the policy as described the operator obtains at least $\tfrac{1}{2}$ of the optimal revenue, and this fraction increases to 1 as the minimum capacity in the system grows large. We term this policy the Fair Pricing and Offer (FPO) policy. We will demonstrate in section 2.5 that in realistic instances the performance of the FPO policy tends to exceed this worst-case bound.

2.4.4 Computational Strategies for Assortments with 푀 > 1

The techniques discussed in the previous section assume a solution to the linear programming formulation (2.22), which can pose computational challenges in the case of assortments of more than one item. On the other hand, in the special case of single-item assortments, the column generation subproblem can be approximated efficiently, and in practice the integer program for moderate-size problems is within the capability of modern optimization solvers such as Gurobi. However, in general, without further simplification, the column generation subproblem associated with (2.22) may be extremely large. For example, there are $LN$ integer pricing variables and $\sum_{j=1}^{M} \binom{N}{j} L^j$ possible assignments of fully-priced assortments per customer type in the $M$-capacitated pricing and assortment problem. Even in the case of the most simple choice models, for which computing the requisite number of purchase probability coefficients is possible, the scale of the problem is still likely to present challenges for modern mixed integer programming solvers.

In this section, we discuss strategies for reducing the size of the problem that apply to a practically relevant special case. In particular, our strategy relies on horizontal differentiation between the products. That is, customer preferences are specialized enough that the number of items that each customer type is predisposed to purchase is much smaller than $N$. This is realistic in businesses such as parking system management, for example, in which customers are likely to demonstrate a strong preference for a small number of lots close to their destination, or in cloud computing, in which customers may have vastly different computing needs, ranging from simple blog hosting to specialized GPU-based computation, where customers of either type are unlikely to be satisfied with options specialized for the other. Specifically, we will assume that, with respect to the operator's optimal pricing strategy, it is reasonable to assume that each customer type $k$ is willing to consider a subset of products $\mathcal{S}(k) \subset \mathcal{S}$ with $D = |\mathcal{S}(k)| \ll N$. For the purpose of optimization we take $D$ to be constant across customer types, which is easily seen to be without loss of generality. Under this assumption the number of fully-priced assortments we must consider per customer is reduced to $\sum_{j=1}^{M} \binom{D}{j} L^j$, which may result in a computationally tractable column generation subproblem. In cases where the resulting problem is still too large, as a further approximation, we suggest solving the corresponding version of the opportunistic pricing problem and using all prices between the highest and lowest price offered for each resource. Intuitively, the range of prices offered by the opportunistic strategy should be wider than that offered in the fair-pricing case, but may reduce the search space in the number of prices.
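A simple way to instantiate the consideration sets $\mathcal{S}(k)$ in the parking setting is to take, for each customer type, the $D$ meters nearest to its preferred destination; the sketch below does exactly that with made-up coordinates and Euclidean distances in place of the kilometre distance function $D(\ell_i, \ell_j)$.

```python
import numpy as np

def consideration_sets(locations, D):
    """S(k): indices of the D nearest meters to each customer type's destination.

    locations is an (N, 2) array of coordinates; each type k is identified with
    meter k, so row k of the result is that type's consideration set."""
    diff = locations[:, None, :] - locations[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.argsort(dist, axis=1)[:, :D]

locs = np.random.default_rng(0).uniform(size=(10, 2))   # made-up meter locations
print(consideration_sets(locs, D=3))
```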

2.4.5 Fair Dynamic Pricing Policies

In this section, we consider the setting when real-time utilization information is available to the operator and present policies for dynamic pricing with fairness considerations. In particular, each policy we discuss in this section selects the prices and resource assignments prior to observing the type of the customer. This ensures that no customer is directly price discriminated against, and in a hypothetical setting different customers checking the system at the same time would see identical prices for each resource that they are offered in common. In this section we fix an assortment

capacity of size 푀; however, as discussed at length in section 2.4.2, larger values of 푀 can present computational difficulties.

We first introduce a dynamic fair-pricing greedy strategy, the performance of which serves as a useful baseline. The basic idea is to set system prices greedily based on

current availability. Therefore, we introduce the notation $\mathcal{X}_M^{FP}(x)$ to denote the set of feasible pricing and assortment configurations at time 푥 based on the resources currently available. Then to determine the current pricing and assortment configuration we solve a variant of the relevant column generation subproblem using the feasible

set $\mathcal{X}_M^{FP}(x)$,
$$\max_{X \in \mathcal{X}_M^{FP}(x)} \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{l=1}^{L} \frac{p_{il}}{\mu_{ik}} \lambda_k P_{ikl}(X). \qquad (2.27)$$
The solution to this problem represents a locally greedy pricing strategy that sets prices so as to maximize the revenue rate in the current period. In the special case of single-item assortments this greedy strategy can be computed by solving the integer program (2.26) with the objective function replaced by,

$$\max_{S \subseteq A(x)} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \frac{p_{il}}{\mu_{ik}} \lambda_k P_{ikl} x_{ikl}, \qquad (2.28)$$

where $A(x)$ denotes the set of available resources at time 푥.

In heavy traffic scenarios the greedy policy can suffer from its myopic nature. In particular, the instantaneous revenue-maximizing price does not take the limited nature of the capacity into account and therefore tends to underprice resources. To mitigate this we propose a strategy that acts greedily after taking into account an estimate of the value of capacity of each resource. To estimate these marginal values we propose using the dual values 훾 obtained in solving (2.23) in the case when 푀 = 1 and (2.22) in general. Thus, to obtain this dual-adjusted pricing and assortment strategy we solve,

$$\max_{X \in \mathcal{X}_M^{FP}(x)} \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{l=1}^{L} \left( \frac{p_{il}}{\mu_{ik}} - \gamma_{ik} \right) \lambda_k P_{ikl}(X). \qquad (2.29)$$

As before, when 푀 = 1 this strategy can be computed by solving the column generation subproblem with the objective modified as,

$$\max_{S \subseteq A(x)} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \left( \frac{p_{il}}{\mu_{ik}} - \gamma_i \right) \lambda_k P_{ikl} x_{ikl}. \qquad (2.30)$$

Note that in practice one need only recompute the pricing problem when either a previously available resource becomes fully occupied or when a previously fully utilized resource becomes newly available.
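This observation suggests an event-driven implementation in which the dual-adjusted pricing problem is re-solved only when the availability set changes. The sketch below illustrates the triggering logic only; `resolve_prices` is a hypothetical placeholder standing in for whichever of the optimization problems above is used, and the event stream is synthetic.

```python
from typing import Callable, Dict, Iterable, Tuple

def run_event_loop(events: Iterable[Tuple[str, int]],
                   capacity: Dict[int, int],
                   resolve_prices: Callable[[frozenset], dict]) -> dict:
    """Re-solve the pricing/offer problem only when the set of available
    resources changes, i.e. when a resource fills up or frees a unit."""
    occupancy = {i: 0 for i in capacity}
    available = frozenset(i for i in capacity if occupancy[i] < capacity[i])
    plan = resolve_prices(available)
    for kind, resource in events:              # "arrival_admitted" or "departure"
        occupancy[resource] += 1 if kind == "arrival_admitted" else -1
        new_available = frozenset(i for i in capacity if occupancy[i] < capacity[i])
        if new_available != available:         # availability changed -> re-price
            available = new_available
            plan = resolve_prices(available)
    return plan

# toy usage with a placeholder solver
events = [("arrival_admitted", 0), ("arrival_admitted", 0), ("departure", 0)]
print(run_event_loop(events, capacity={0: 2, 1: 1},
                     resolve_prices=lambda avail: {"offer_only": sorted(avail)}))
```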

2.5 Computational Case Study - Pricing and Assortment

To test the effectiveness of our pricing policies in more realistic scenarios we extend the numerical experiments based on parking data collected in the borough of Islington, London, presented in section 2.3. In this scenario the borough operates a network of parking lots, and we are interested in quantifying the revenue generation and congestion-control potential of a dynamic pricing strategy. In addition to an assortment mechanism, the borough of Islington is also interested in exploring the feasibility and efficacy of a dynamic pricing strategy. Due to the limited screen space on a mobile phone and the need to minimize driver distraction, it is reasonable that drivers are offered a limited number of spaces in which to park. Therefore the fair pricing and offer problem as described in section 2.4.2 represents a reasonable framework for this practical use case. Although the computational strategies we suggest in section 2.4.4 do apply, for computational simplicity we focus on the case of single-item assortments. We adapt the numerical experiments considered in section 2.3 to allow for pricing decisions. In addition, we extend the utility model (2.19) presented in that section to explicitly account for the pricing decision. Recall that the parameter 푢푖푘 represents the mean utility that customers of type 푘 derive from successfully utilizing resource 푖 and that the parameters 훽 and 휂 specify the sensitivity of customer utility to price and

distance, respectively. Then the deterministic component of customer utility derived by customers of type 푘 at resource 푖 with price level 푙 is given by,

$$U_{ikl} = u_{ik} + \beta \frac{(p_{il} - r_k)}{\mu_k} + \eta D(\ell_i, \ell_k). \qquad (2.31)$$

In accordance with the specification of the MNL model, we assume that the heterogeneity of customer valuations within each type is distributed as a standard Gumbel random variable, 휖. We also take the convention of assigning zero utility to the no-purchase option, which leads to the expression,

$$P_{ik}(S, R) = \frac{e^{-U_{ikl}}}{1 + e^{-U_{ikl}}}, \qquad (2.32)$$

which describes the probability that a customer of type 푘 purchases resource 푖 when it is offered at price level 푙. The key assumptions of our model are highlighted below.

∙ The parking decisions of each customer type can be well-modeled using the logit model. In particular, the meters are sufficiently differentiated in the eyes of potential customers for the independence of irrelevant alternatives (IIA) property to be reasonable.

∙ Conditional on a purchase, the mean length of occupancy for a customer of type 푘 is 휇푘, independent of the pricing rate 푟푖푙 and the meter 푖.

∙ The utility of customers is linear in both price and distance.

∙ Customer arrival rates are constant over the period at hand.

∙ The set of possible prices is restricted to the 9 prices currently in use in Islington as detailed in section 2.3.

For the purpose of generating the results in this section we used the parameter values 훽 = −1 and 휂 = −5, together with the arrival rates 휆푘 estimated as previously detailed. As we explain in appendix A.3, although the absolute values of the expected steady-state revenue are sensitive to changes in

these parameters, the relative performance of the various pricing strategies appears to be reasonably stable. In this section we compare the performance of the following algorithms.

∙ TASP-PO - fair pricing and offer policy executed as described in section 2.4.3 without reference to system state

∙ Fair-Greedy - the state-dependent greedy heuristic as described in section 2.4.5 which solves for the myopic best pricing and offer policy at each step

∙ Fair-DBP - the state-dependent heuristic that makes pricing and offering decisions greedily after accounting for a dual-based approximation of the marginal value of each resource as described in section 2.4.5

∙ Fixed-Greedy - a heuristic that solves for a set of fixed prices and subsequently makes single-item assortment decisions greedily. The prices are determined by solving a mixed-integer optimization problem similar to problem (2.23) with the addition of integer variables that ensure that a single fixed price is selected for each resource.

∙ Fixed-DBP - a heuristic that solves for fixed prices in the same manner as for the Fixed-Greedy policy. However, in this policy assortment decisions incorporate the marginal values in the same manner as used in the Fair-DBP policy.

The theory presented in section 2.4.2 shows that by solving the optimization problem (2.23), in which we allow the prices of the resources to vary, we obtain both an effective dynamic pricing-offer policy as well as an upper bound on the performance of any similarly constrained dynamic pricing policy. Our results, summarized in figure 2-3, indicate that our policies live up to their theoretical guarantees. In particular, the TASP-PO policy, which does not make use of the current state of the resources, achieves over 75% of the optimal revenue under all scaling conditions. As was the case in our assortment-only experiments, when real-time utilization information is available, more dynamic policies provide a significant benefit. We observe that in low load scenarios, both Fair-Greedy and Fair-DBP perform well, each achieving greater than 90% of the revenue upper bound. However, under heavier load scenarios it is crucial to account for the value of a resource to customers who may arrive in the future. Indeed, when the arrival scaling factor is increased, the performance of the Fair-Greedy policy begins to degrade significantly, while the performance of Fair-DBP remains within 7% of the upper bound.

Figure 2-3: Absolute and relative performance of dynamic fair-pricing and offer policies in steady-state versus the arrival rate scaling factor.

Despite their demonstrated effectiveness, both dynamic pricing policies, Fair-Greedy and Fair-DBP, impose a significant computational burden in their practical implementation. In particular, they both require the operator to solve a mixed-integer optimization problem frequently throughout the selling period. Thus, we were motivated to compare the performance of our approaches to similar approaches that operate with fixed prices, the Fixed-Greedy and Fixed-DBP policies as described above. These policies make use of a heuristic mixed-integer optimization problem that is solved initially to determine prices and time-fraction allocations that would maximize revenue without consideration for the impact of blocking. We note that this upfront optimization problem would present computational challenges in systems with very large numbers of resources and customer types; however, the small scale of our current example enables us to compute them. The performance of these policies relative to Fair-DBP is presented in figure 2-4.

Figure 2-4: Comparison of performance of dual-based dynamic fair-pricing policies versus fixed-price policies in steady-state versus the arrival rate scaling factor.

From this figure, we observe that although the performance of Fair-DBP exceeds that of the fixed-price strategies in all instances, the magnitude of this outperformance is actually somewhat small. In particular, the outperformance of Fair-DBP is only about 1%-2% of the revenue upper bound. This relatively small shift in revenue may not provide significant justification for the implementation of a dynamic pricing policy in the time-homogeneous setting. This limited outperformance is intuitive, as in the time-homogeneous setting dynamic pricing is only useful as a tool for managing stochastic fluctuations. We would expect dynamic pricing to be much more useful when demand varies over time, so that prices may be changed to take into account expectations of future demand. In Chapter 3 we investigate the time-varying setting and demonstrate that dynamic pricing provides more significant benefits there.

2.6 Conclusion

In this chapter we studied the problems of assortment optimization and joint pricing and assortment optimization under time-homogeneous demand rates. Assuming known customer choice models, we derived a policy with a strong constant-factor steady-state performance guarantee. Further, in the practically relevant case of non-

price discrimination, we demonstrate that it is possible to compute the optimal state-independent policy. We also show that these guarantees are meaningful in this case, as convergence to steady-state occurs quickly. We further propose heuristic dynamic policies that are able to use current utilization information to improve performance. We extend our techniques and analysis to consider dynamic pricing and assortment optimization. When price discrimination is acceptable we show that many of the techniques developed for the assortment-only scenario carry over directly, so long as a single joint pricing and assortment problem can be solved efficiently. In the case where price discrimination is disallowed we develop the fair-pricing variant of our problem. The same approach also applies in this case; however, we demonstrate that the requisite column generation subproblem is NP-hard in general. In spite of this, we show that pricing in the special case of single-item assortments is both theoretically approximable and practically tractable. Finally, we validated our proposed techniques with computational experiments. In the assortment-only case we demonstrate that the SIOP indeed outperforms the TASP in the state-independent setting. When real-time utilization information is available we show that incorporating the value of capacity of each resource in a bid-price strategy improves on the performance of the greedy heuristic under moderate system loads. Our experimental results for pricing and assortment optimization yield similar conclusions. However, the performance of a dynamic pricing policy relative to an optimized fixed-price policy is perhaps disappointing. This reinforces the intuitive notion that dynamic pricing is of limited benefit under steady demand. Motivated by this, in the subsequent chapter we examine how the strategies presented in this chapter can be extended to the setting of time-varying demand.

Chapter 3

Revenue Management for Reusable Resources under Time-Varying Demand

Motivated by the results of the previous chapter, in this chapter we proceed to consider the problems of pricing and assortment optimization in systems of reusable resources under time-varying demand. Although the techniques presented previously provide a strong foundation for this problem, in many revenue management contexts the assumption of constant demand rates over time is inappropriate. This is the case, for example, in our motivating example of parking, in which we would expect arrival rates to increase in the commercial center during the day and to be slower early in the morning and late at night. In such a setting, an operator with knowledge of future demand patterns must account for them in making intelligent pricing and assortment decisions. Crucially, in this scenario, the greedy policies which performed well in the time-homogeneous setting may mistakenly offer a resource at a lower price before a large demand spike, wasting valuable capacity. In this chapter, we develop analytical techniques that enable an operator to account for future demand in making operational decisions. In particular, we propose a time-discretization strategy that enables us to compute both an upper bound on the optimal revenue as well as an actionable pricing and assortment

strategy. Our techniques and guarantees apply both under a finite time horizon and under an infinite time horizon when demand varies periodically. In the former case our notion of approximate optimality is with respect to many independent runs of the same system scenario, while in the infinite-horizon periodic case it is with respect to the optimal steady-state revenue under a consistent policy. Our primary contributions are summarized as follows.

∙ We introduce a policy and computational strategy for the allocation of reusable resources in continuous time which accounts for random service times, continuously time-varying demand rates, and customer choice, in both the setting of a finite time horizon and that of an infinite horizon in which customer arrival rates vary periodically. Our proposed policy achieves a constant-factor guarantee relative to the optimal dynamic policy in our setting and is asymptotically optimal up to an approximation factor. Our strategy is parametrized, allowing an operator to further refine their policy, and the associated performance guarantees, at the expense of additional computational effort.

∙ We further consider the problem of joint pricing and assortment selection in our setting. As in Chapter 2, we take the universe of possible prices as a discrete set, and we show that when price discrimination is feasible, many of the techniques and associated guarantees as presented in the assortment-only context carry over. On the other hand, when price discrimination is impossible or undesirable, as in Chapter 2, we formulate the fair pricing problem. As in the previous chapter we demonstrate that although computationally challenging in general, an approximation algorithm is tractable.

∙ When real-time utilization is available, we propose a novel heuristic bid-price strategy that effectively accounts for the future value of each resource. Our computational results demonstrate that accounting for this value is important when customer valuations are heterogeneous and is especially critical when the operator has the ability to implement dynamic pricing.

Taken together, our results suggest that pricing and assortment optimization can

serve as effective methods for revenue management and inventory control in the management of systems of reusable resources. We begin in section 3.1 by defining our model and the structure of our policies for the assortment-only scenario. In section 3.2, we present our main techniques for developing policies attaining constant-factor guarantees in our time-varying scenarios, along with our associated results. Section 3.3 extends the model and techniques of sections 3.1 and 3.2 to allow for dynamic pricing in addition to assortment decisions. In section 3.4, we examine the effectiveness of our proposed policies in a computational experiment based on parking data obtained from a corporate partner. We conclude our analysis and present possible directions for future inquiry in section 3.5.

3.1 Assortment Model Formulation

As in the previous chapter, the platform operator has a set of 푁 distinct resources (items) 풩 indexed by 푖 ∈ {1, . . . , 푁} and seeks to manage their utilization in continuous time. The operator’s potential customers belong to one of 퐾 types, each having potentially idiosyncratic preferences for the available resources. Our techniques may be applied under the assumption of a finite time horizon as well as in the case of an infinite time horizon when fluctuations in demand can be expressed as a periodic function. We cover the extension to the infinite horizon setting under periodic demand in appendix B.2. In both settings, we assume that time evolves continuously, indexed by 푥, and each customer type 푘 arrives to the system according to a non-

homogeneous Poisson process, with rate specified by the known function 휆푘(푥). In the finite time horizon setting, we assume that the time horizon is normalized to be of unit length. We let 풱 = [0, 1) denote the unit interval, which encompasses the entire time horizon or one periodic cycle in the finite and infinite time horizon settings, respectively. We use the notation 휈(푥) = mod(푥, 1) to denote the function which maps the system time to its place within the unit interval. We observe that in the finite time horizon setting we have 휈(푥) = 푥; however, for consistency of notation between our two settings we favor the use of 휈(푥) here. Each resource 푖 has capacity $C_i \in \mathbb{Z}_+$

which limits the number of customers who can make use of each simultaneously, and

we will use $C \in \mathbb{Z}_+^N$ to denote the vector of such capacities. When a customer arrives, the operator must select an assortment of products 푆 to offer from the universe of feasible subsets 풮 ⊆ 2풩. In general the set of feasible assortments 풮 may be the set of all subsets of 풩; however, in specific contexts it is possible that the operator’s actions are constrained. For example, assortment decisions could be restricted by shelf or screen space, and pricing flexibility may be limited due to laws restricting price discrimination. In the event that the set of feasible assortments depends on the customer type, the sets of feasible assortments can be further specified using 풮푘 for customers of type 푘. For ease of notation we will typically assume that 풮 is valid for customers of each type; however, the extension to such type-dependent assortments is straightforward.

As in the previous chapter, the resources we consider are substitutable and the various customer types may have heterogeneous preferences and price sensitivities which the operator should consider in developing their operating policy. In particular, each customer type 푘 is associated with a choice model, ℳ푘, that formally specifies the decision-making process of each customer type. As in the previous chapter we adopt the shorthand notation 푃푖푘(푆) = Pℳ푘 (푖; 푆) to specify the likelihood that a customer of type 푘 purchases product 푖 when offered assortment 푆. To focus on the issue of policy optimization, in this chapter, we take the system parameters and the choice models ℳ = (ℳ1, ℳ2,..., ℳ퐾 ) as fixed and known. However, as we will explain in the following section, full knowledge of the demand is sufficient but not required for our policies to achieve their associated performance guarantees. In particular, all that is needed are accurate estimates of average demand over particular slices of time. In general these choice probabilities may vary over time along with the demand rates themselves, without changing the form of our results or policy, however for notational clarity we take them as constant over time.

Upon the arrival of a type 푘 customer, if the operator offers a set of items 푆 ∈ 풮 then the customer elects to purchase item 푖 ∈ 푆 with known probability 푃푖푘(푆) or chooses the outside option with probability $P_{0k}(S) = 1 - \sum_{i \in S} P_{ik}(S)$. We further assume that the operator has the ability to reject a customer by offering the empty set 푆0 = ∅ ⊂ 풩. As a concrete example, suppose each customer segment 푘 chooses according to a multinomial logit model with segment- and resource-specific weights 푤푖푘. These weights could reflect, among other things, how closely resource 푖 matches the preferences of type 푘 and the price at which resource 푖 is offered to customers of type 푘. Then the choice probabilities can be computed as
$$P_{ik}(S) = \frac{w_{ik}}{1 + \sum_{j \in S} w_{jk}},$$
where the weights are normalized so that the outside option has unit weight. If the customer selects a resource that is currently being utilized at capacity or chooses the outside option, then they exit the system with no further effects; otherwise they are matched with their chosen resource 푖 and a service event is initiated.
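As a concrete illustration of the multinomial logit example above, the purchase probabilities $P_{ik}(S) = w_{ik}/(1 + \sum_{j \in S} w_{jk})$ and the corresponding no-purchase probability can be computed directly; the weights below are illustrative placeholders.

```python
import numpy as np

def mnl_choice_probabilities(weights_k: np.ndarray, offered: list) -> dict:
    """MNL choice probabilities for one customer type.
    weights_k[i] is the type's preference weight w_{ik}; the outside
    option is normalized to weight 1."""
    denom = 1.0 + sum(weights_k[i] for i in offered)
    probs = {i: weights_k[i] / denom for i in offered}
    probs["no_purchase"] = 1.0 / denom
    return probs

w_k = np.array([2.0, 0.5, 1.5])          # illustrative weights for one type
print(mnl_choice_probabilities(w_k, offered=[0, 2]))
```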

In the event that an arriving customer of type 푘 elects to purchase a given resource 푖, a service event is initiated which lasts for an exponentially distributed length of time with rate parameter 휇푖푘, possibly depending on both the customer type and the resource. The assumption of exponential service times is standard in the literature when dealing with random as opposed to fixed service times; see Savin et al. (2005) and Gans and Savin (2007), for example. We note that in the special case of time-homogeneous demand it is possible to achieve similar results in the presence of general service time distributions, as we demonstrated in the previous chapter. In any case, we refer to 휇푖푘 as the service rate of resource 푖 for customers of type 푘. During each service period the customer uses one unit of capacity of resource 푖 and the operator earns revenue continuously at a potentially type-specific rate 푟푖푘. We also assume the existence of the null resource, indexed by 0, with infinite capacity, $r_{0k} = 0$, $\mu_{0k} = 1$, and $P_{0k}(\{0\}) = 1$ for all 푘 ∈ 풦. Thus if a customer is offered only the null resource they are taken to be rejected. We note that in the finite horizon setting we assume that all customers in service at the end of the horizon are allowed to complete their service, and therefore in all cases the operator earns an expected revenue of 푟푖푘/휇푖푘 when a customer of type 푘 begins service at resource 푖. We first consider the case in which the prices are fixed exogenously; however, in section 3.3 we demonstrate how the techniques we present can be extended to guide dynamic pricing.

3.1.1 System State and Policies

In our model the state of the system can be captured by information concerning the current utilization of the resources as well as the present time. For any time 푥, we let $W(x) \in \mathbb{Z}_+^{N \times K}$ denote the time-푥 resource utilization matrix, in which entry 푊푖푘(푥) gives the number of type 푘 customers being served by resource 푖. Further, we let $W_i(x) = \sum_{k=1}^{K} W_{ik}(x)$ denote the total utilization of resource 푖 at time 푥. Due to the capacity limitations, a new customer of type 푘 can begin service at resource 푖 at time 푥 only if the resulting utilization would not exceed the corresponding capacity limit, so that 푊푖(푥) < 퐶푖. In the case of exponentially distributed service times, due to memorylessness, we are able to succinctly describe the utilization state of the system using only the current utilization 푊 (푥). Under time-varying demand the system time 푥 also plays a vital role in defining the state of the system, as it specifies the structure of future arrival patterns through the future evolution of the demand rate functions 휆푘(푥). Thus, in the case of exponential service times, the state of the system may be specified as the tuple (푥, 푊 (푥)). In the finite time horizon setting we assume that the system begins empty, so that 푊푖(0) = 0 for all resources 푖; however, this assumption can be relaxed in a straightforward manner without impacting our results.

Taking into account the system parameters, the choice model specification ℳ, and the state of the system, the operator seeks to develop a policy leading to effective assortment decisions. Formally, a policy for customers of type 푘 is a mapping from the state of the system to a distribution over assortments and prices that specifies the operator’s decision. Thus, in the finite time horizon setting, such a policy is given by the mapping,

$$\pi^k(\nu(x), W(x)) : [0, 1) \times \mathbb{Z}_+^N \to \Delta(\mathcal{S}). \qquad (3.1)$$

Here we use the operator Δ(풮) to denote the space of probability distributions over a set 풮. Under policy 휋 we use the subscript notation $\pi^k_S(\nu(x), W(x))$ to denote the probability of offering set 푆 to customers of type 푘 given the state of the system at time 푥. Then the overall policy is given by the concatenation of such policies over customer types, $\pi(\nu(x), W(x)) = (\pi^1(\nu(x), W(x)), \ldots, \pi^K(\nu(x), W(x)))$. As in previous work,

80 see for example Liu and Van Ryzin (2008), for the purpose of benchmarking the optimal policy we restrict our search space to so-called admissible policies, for which

$\pi^k_S(\nu(x), W(x)) = 0$ whenever the set 푆 contains a resource 푖 such that 푊푖(푥) = 퐶푖. This assumption of admissibility is not likely to be restrictive in general, as even weak notions of rationality in customer choice are sufficient for the optimal policy to lie in this class. We will use Π to denote the space of all admissible policies. Due to the continuous nature of time in our model we note that, in general, such policies may not have a finite representation.

For each time 휈 ∈ [0, 1), the policy 휋 ∈ Π induces a distribution $\Psi^\pi_\nu(w)$ over possible utilization states $w \in \mathcal{W} = \prod_{i=1}^{N} \{0, \ldots, C_i\}$ that encompasses the randomness in the arrival processes, the departure processes, and the potential randomness inherent in the policy 휋. By mapping these distributions over utilization states through the policy 휋, we obtain the aggregate average number of arrivals of customers of type 푘 into resource 푖 as,

$$\tilde{\lambda}^\pi_{ik} = \int_0^1 \lambda_k(\nu) \sum_{w \in \mathcal{W}} \sum_{S \in \mathcal{S}} \Psi^\pi_\nu(w)\, \pi^k_S(\nu, w)\, P_{ik}(S)\, d\nu. \qquad (3.2)$$

Using the quantities $\tilde{\lambda}^\pi_{ik}$, the expected revenue earned by policy 휋 from arrivals beginning service in 풱 can be expressed as,

$$J^\pi = \sum_{i=1}^{N} \sum_{k=1}^{K} \frac{r_{ik}}{\mu_{ik}} \tilde{\lambda}^\pi_{ik}. \qquad (3.3)$$

This represents the expected revenue over the time horizon under policy 휋. Our notion of optimality will be the policy with the highest expected revenue over the time horizon. We observe that the optimal revenue rate is bounded above by $\sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} C_i$. Then let $J^*$ denote the optimal expected revenue over the relevant space of policies,
$$J^* = \sup_{\pi \in \Pi} J^\pi. \qquad (3.4)$$
We pause to highlight the fact that $J^*$ is the optimal revenue achievable by the best dynamic strategy, including strategies which vary continuously over the interval 풱

with respect to the utilization state 풲. In principle, this problem could be solved using dynamic programming; however, since even in the simplest settings the continuous nature of time yields an uncountable state space, such an approach is intractable.

3.2 Assortment Policies

In this section, we focus on developing provably performant assortment offering policies under time-varying demand and exponentially distributed service times. To simplify the form of our results we assume that the service rates for each resource and customer type are equal, that is, 휇푖푘 = 휇 for all resources 푖 and customer types 푘. This assumption is not necessary, however, and our results can be generalized in a straightforward manner at the expense of more complicated expressions and bounds which account for potential disparities in service rates between customer types and resources. Here, we take the revenue rates 푟푖푘 as fixed exogenously and focus on developing assortment strategies that enable the operator to effectively balance demand between resources over time; we consider extensions to dynamic pricing in section 3.3. We provide the proofs of our propositions in the finite time horizon setting; however, our results remain true in the case of an infinite time horizon under periodically varying arrival rates, and we provide the suitably modified modeling rationale and proofs in appendix B.2. Our strategy works by approximating the continuously changing demand rates as piecewise constant functions. To achieve this, we split the interval 풱 into a set of 푇 disjoint segments 풯, which we index using 푡. For expositional simplicity we assume that each of these 푇 subintervals is of equal length 휃 = 1/푇 and ordered so that subinterval 푡 corresponds to the interval [(푡 − 1)휃, 푡휃). In order to translate between continuous time 푥 and the index of the subinterval we define 푡(푥) as the mapping from continuous time 푥 to the corresponding subinterval 푡 ∈ 풯. Customers of each type arrive continuously over time according to their respective non-homogeneous Poisson processes and upon arrival are offered a set of resources

푆 ⊆ 풮. Under our time discretization, the number of arrivals of type 푘 customers during subinterval 푡 is given by a Poisson random variable with parameter,

$$\bar{\lambda}_{kt} = \int_{(t-1)\theta}^{t\theta} \lambda_k(x)\, dx. \qquad (3.5)$$
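Given any estimate of the arrival-rate function 휆푘(푥), the subinterval means in (3.5) can be obtained by numerical integration. The sketch below uses an illustrative sinusoidal rate and 0-indexed subintervals; none of the numbers come from the thesis data.

```python
import numpy as np
from scipy.integrate import quad

def subinterval_means(rate_fn, T: int) -> np.ndarray:
    """Integrate a continuous arrival-rate function over each of the T
    equal-width subintervals of [0, 1), as in (3.5). Subintervals are 0-indexed."""
    theta = 1.0 / T
    return np.array([quad(rate_fn, t * theta, (t + 1) * theta)[0] for t in range(T)])

# Illustrative time-varying rate: a midday peak over a unit-length horizon.
rate = lambda x: 20.0 + 15.0 * np.sin(np.pi * x)
print(subinterval_means(rate, T=8))
```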

If an arriving customer elects to purchase a resource 푖 ∈ 푆, they occupy one unit of capacity for a period of time that is exponentially distributed with rate 휇 while paying

푟푖푘 for the duration of the service. To monitor the utilization of each resource at the

beginning of each subinterval we introduce the random variable 푄푖푡 = 푊푖((푡 − 1)휃) to denote the utilization of resource 푖 at the beginning of subinterval 푡, which we term the initial utilization of subinterval 푡. Due to the capacity restriction we must have

푄푖푡 ≤ 퐶푖 for all 푖 and 푡.

We now examine the transition dynamics of the system utilization 푄푖푡. To formally

specify the transition dynamics of 푄푖푡, it is useful to introduce some variables that will play an important role in our analysis. We note that these random variables, as well as the random variable representing the initial utilization, depend on the choice of policy 휋, but we leave this dependence implicit to keep our notation uncluttered.

Let 퐴푖푡 denote the random variable representing the number of customers arriving to resource 푖 and successfully beginning service during subinterval 푡. Likewise, let

퐷푖푡 denote the number of customer departures from resource 푖 during subinterval 푡. Then we observe that the evolution of capacity utilization on a sample path can be

captured by the recurrence relation, 푄푖(푡+1) = 푄푖푡 + 퐴푖푡 − 퐷푖푡. We also note that the

capacity constraint applied to 푄푖(푡+1) immediately implies the natural flow balance constraint 푄푖푡 + 퐴푖푡 ≤ 퐶푖 + 퐷푖푡.

Under a fixed policy 휋, the evolution of 푄푖푡, 퐴푖푡, and 퐷푖푡 has further structure. Specifically, as observed in section 3.1.1, the policies we consider induce distributions over utilization states at each time 휈 ∈ 풱. We are then able to use these distributions to characterize the average behavior of 푄푖푡, 퐴푖푡, and 퐷푖푡 within the 푇 subintervals. In particular, we use $\bar{Q}_{it}$, $\bar{A}_{it}$, and $\bar{D}_{it}$ to represent the means of these quantities under the induced distributions.

3.2.1 Linear Programming Upper Bound

To analyze the performance of any proposed policy we would like to develop an upper bound on the expected revenue achievable by any policy over the interval 풱. To this end we seek a lower bound on the effect of a customer arrival within a segment 푠 on each other subinterval 푡. Under Markovian departure dynamics, a customer is most likely to have departed by the end of the interval if they arrived at the beginning of the respective interval. By the CDF of the exponential distribution, such an arrival would remain in the system at the end of the interval with probability $e^{-\mu\theta}$. Exploiting the memorylessness property of exponential random variables, these departure probabilities may be chained together to yield an expression that lower bounds the effective capacity utilized during subinterval 푡 by a customer admitted to resource 푖 during segment 푠. Thus we define the future load of an arrival during interval 푠 on periodic interval 푡 by,

$$f_i(s, t) = \begin{cases} e^{-(t-s+1)\mu\theta} & \text{if } s \le t \\ 0 & \text{if } s > t. \end{cases} \qquad (3.6)$$

Under a policy 휋 we observe that there is a fixed probability $\alpha^\pi_{ks}(S)$ that a customer of type 푘 will be offered the assortment 푆 during subinterval 푠. These probabilities may be obtained by mapping the utilization state probabilities Ψ휈(푤) through the policy and integrating over 휈 ∈ [(푡 − 1)휃, 푡휃),
$$\alpha^\pi_{kt}(S) = \frac{1}{\bar{\lambda}_{kt}} \int_{(t-1)\theta}^{t\theta} \lambda_k(\nu) \sum_{w \in \mathcal{W}} \Psi_\nu(w)\, \pi^k_S(\nu, w)\, d\nu.$$

Under an admissible policy, each arrival to a resource is able to obtain service, and therefore we can express the average number of arrivals into resource 푖 of customers of type 푘 during subinterval 푡 in terms of the offering probabilities $\alpha^\pi_{kt}(S)$,
$$\bar{A}^\pi_{ikt} = \sum_{S \in \mathcal{S}} \bar{\lambda}_{kt} P_{ik}(S)\, \alpha^\pi_{kt}(S). \qquad (3.7)$$

The resulting expected revenue over the interval 풱 can then be expressed in terms of these averages as,
$$J^\pi = \sum_{t=1}^{T} \sum_{i=1}^{N} \sum_{k=1}^{K} \frac{r_{ik}}{\mu} \bar{A}^\pi_{ikt}. \qquad (3.8)$$
We will also make use of the overall number of customer arrivals to resource 푖 during interval 푡, given by
$$\bar{A}^\pi_{it} = \sum_{k=1}^{K} \bar{A}^\pi_{ikt}. \qquad (3.9)$$
In this manner, every admissible policy induces a set of offering probabilities that directly define the arrival rates as well as the resulting revenue, and this motivates us to introduce the corresponding decision variable 훼푘푠(푆) for each combination of customer type 푘 ∈ 풦, subinterval 푠 ∈ 풯, and assortment 푆 ∈ 풮. These variables may be interpreted as the expected fraction of time that assortment 푆 is offered to customers of type 푘 during segment 푠 under the specified policy. Using these decision variables we seek to optimize the expected revenue (3.8) over 풱. With our decision variables and the future load function $f_i(s, t)$, we formulate the following linear program. As we argue in proposition 6, the optimal value of this formulation upper bounds the performance of any admissible policy.

$$
\begin{aligned}
J^{LP} = \max_{\alpha} \quad & \sum_{s=1}^{T} \sum_{k=1}^{K} \sum_{S \in \mathcal{S}} \sum_{i=1}^{N} \frac{r_{ik}}{\mu} \bar{\lambda}_{ks} P_{ik}(S)\, \alpha_{ks}(S) \\
\text{s.t.} \quad & \sum_{s=1}^{T} \sum_{k=1}^{K} \sum_{S \in \mathcal{S}} \bar{\lambda}_{ks} P_{ik}(S) f_i(s, t)\, \alpha_{ks}(S) \le C_i \quad \forall i \in \mathcal{N}, \forall t \in \mathcal{T} \qquad (3.10)\\
& \sum_{S \in \mathcal{S}} \alpha_{ks}(S) \le 1 \quad \forall k \in \mathcal{K}, \forall s \in \mathcal{T} \\
& \alpha_{ks}(S) \ge 0 \quad \forall S \in \mathcal{S}, \forall k \in \mathcal{K}, \forall s \in \mathcal{T}
\end{aligned}
$$
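For very small instances, (3.10) can be assembled and solved directly by enumerating all assortments rather than generating columns; the sketch below does exactly this with scipy for a tiny synthetic instance. All numerical values are illustrative, MNL choice is assumed, and subintervals are 0-indexed; real instances would use the column generation procedure described next.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# --- tiny illustrative instance (all numbers are made up) ---------------------
N, K, T = 2, 2, 4                       # resources, customer types, subintervals
theta, mu = 1.0 / T, 4.0                # subinterval width and common service rate
C = np.array([1.0, 2.0])                # capacities C_i
r = np.array([[10.0, 8.0], [6.0, 12.0]])            # revenue rates r_{ik}
lam_bar = np.array([[5.0, 7.0, 9.0, 4.0],           # \bar{lambda}_{ks}
                    [3.0, 6.0, 8.0, 5.0]])
w = np.array([[1.5, 0.4], [0.6, 2.0]])               # MNL weights w_{ik}

def P(i, k, S):                          # MNL purchase probability P_{ik}(S)
    return w[i, k] / (1.0 + sum(w[j, k] for j in S)) if i in S else 0.0

def f(s, t):                             # future load f_i(s, t) from (3.6)
    return np.exp(-(t - s + 1) * mu * theta) if s <= t else 0.0

assortments = [S for m in range(1, N + 1) for S in itertools.combinations(range(N), m)]
cols = [(k, s, S) for k in range(K) for s in range(T) for S in assortments]

# objective: maximize sum_i (r_{ik}/mu) * lam_bar_{ks} * P_{ik}(S) * alpha_{ks}(S)
c = np.array([sum(r[i, k] / mu * lam_bar[k, s] * P(i, k, S) for i in range(N))
              for (k, s, S) in cols])

rows, b = [], []
for i in range(N):                       # capacity constraints, one per (i, t)
    for t in range(T):
        rows.append([lam_bar[k, s] * P(i, k, S) * f(s, t) for (k, s, S) in cols])
        b.append(C[i])
for k in range(K):                       # offering-time constraints, one per (k, s)
    for s in range(T):
        rows.append([1.0 if (k2 == k and s2 == s) else 0.0 for (k2, s2, _) in cols])
        b.append(1.0)

res = linprog(-c, A_ub=np.array(rows), b_ub=np.array(b), bounds=(0, None), method="highs")
print("J_LP =", -res.fun)
```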

Let $\alpha^{LP} = \{\alpha^{LP}_{ks}\}$ denote the solution to the linear program (3.10). We may then define the expected numbers of arrivals $A^{LP}_{ikt}$ and $A^{LP}_{it}$ just as specified in equations (3.7) and (3.9). We now demonstrate that the objective value of the solution to the linear program (3.10) represents an upper bound on the expected revenue of the optimal policy. The

proof is somewhat more involved than that used in the time-homogeneous setting and is given in appendix B.1.

Proposition 6. $J^* \le J^{LP}$.

Although problem (3.10) is conceptually straightforward, in practice the number of possible assortments that can be offered to each customer class within each subinterval introduces computational challenges. This is because there are $TK2^N$ variables, and selection probabilities must be generated and stored for each such set. To reduce the computational burden, we utilize a column generation procedure for computing the deterministic upper bound. Our strategy here is similar to that proposed in Gallego et al. (2016) and in the previous chapter, except that in our setting we must take into account the impact of admitting a customer on the utilization of a resource in future time periods. Since each variable corresponds to a unique combination of customer type, periodic interval, and assortment, we solve a reduced version of problem (3.10) with a subset of assortments 풞푘푠 for each customer type 푘 and subinterval 푠 and compute the solution only for the corresponding variables. This results in a reduced problem of the form,

$$
\begin{aligned}
J^{R}(\mathcal{C}) = \max_{\alpha} \quad & \sum_{s=1}^{T} \sum_{k=1}^{K} \sum_{S \in \mathcal{C}_{ks}} \sum_{i=1}^{N} \frac{r_{ik}}{\mu} \bar{\lambda}_{ks} P_{ik}(S)\, \alpha_{ks}(S) \\
\text{s.t.} \quad & \sum_{s=1}^{T} \sum_{k=1}^{K} \sum_{S \in \mathcal{C}_{ks}} \bar{\lambda}_{ks} P_{ik}(S) f_i(s, t)\, \alpha_{ks}(S) \le C_i \quad \forall i \in \mathcal{N}, \forall t \in \mathcal{T} \qquad (3.11)\\
& \sum_{S \in \mathcal{S}} \alpha_{ks}(S) \le 1 \quad \forall k \in \mathcal{K}, \forall s \in \mathcal{T} \\
& \alpha_{ks}(S) \ge 0 \quad \forall S \in \mathcal{S}, \forall k \in \mathcal{K}, \forall s \in \mathcal{T}.
\end{aligned}
$$

After solving the reduced problem to optimality, we check the optimality of our current solution to the full problem (3.10) by searching for a violated constraint in

its dual problem. In particular, let 훾푖푡 denote the dual variable corresponding to

the capacity constraint of resource 푖 in periodic subinterval 푡 and let 휎푘푡 denote the dual variable corresponding to the offering-time constraint for customers of type 푘 during subinterval 푡. We may then formulate the corresponding dual linear program

as follows,

$$
\begin{aligned}
\min_{\gamma, \sigma} \quad & \sum_{i=1}^{N} \sum_{t=1}^{T} C_i \gamma_{it} + \sum_{k=1}^{K} \sum_{t=1}^{T} \sigma_{kt} \\
\text{s.t.} \quad & \sum_{i=1}^{N} \sum_{t=1}^{T} \bar{\lambda}_{ks} P_{ik}(S) f_i(s, t)\, \gamma_{it} + \sigma_{ks} \ge \sum_{i=1}^{N} \frac{r_{ik}}{\mu} \bar{\lambda}_{ks} P_{ik}(S) \quad \forall k \in \mathcal{K}, \forall s \in \mathcal{T}, \forall S \in \mathcal{S} \\
& \gamma_{it} \ge 0 \quad \forall i \in \mathcal{N}, \forall t \in \mathcal{T} \\
& \sigma_{kt} \ge 0 \quad \forall k \in \mathcal{K}, \forall t \in \mathcal{T}. \qquad (3.12)
\end{aligned}
$$

Let $\gamma^R_{it}$ and $\sigma^R_{ks}$ be the resulting values of the dual variables after solving the reduced problem (3.11) to optimality. By standard duality theory in linear optimization, finding a variable to add to the primal problem is equivalent to finding a violated constraint in the dual program (3.12) with variables fixed to $\gamma^R_{it}$ and $\sigma^R_{ks}$. Therefore, for each customer type 푘 and subinterval 푠 we seek to solve the following column generation subproblem,

$$\max_{S \in \mathcal{S}} \; \sum_{i=1}^{N} \frac{r_{ik}}{\mu} \bar{\lambda}_{ks} P_{ik}(S) - \sum_{i=1}^{N} \sum_{t=1}^{T} \gamma_{it} \bar{\lambda}_{ks} f_i(s, t) P_{ik}(S) - \sigma_{ks}. \qquad (3.13)$$

If the resulting solution is non-positive for each customer type 푘 and segment 푠, then the solution to the reduced primal problem is indeed optimal and we can terminate the procedure. Otherwise we add at least one set 푆 for which a positive value was obtained in (3.13) to the set of candidate columns 풞푘푠 for the corresponding customer type and subinterval and repeat this procedure with the updated set of candidate columns.

We note that solving the column generation subproblem corresponds exactly to the static assortment optimization problem with the dual-adjusted revenues $\tilde{r}_{is} = \sum_{t=1}^{T} \left( \frac{r_{ik}}{\mu} - \gamma_{it} f_i(s, t) \right)$. Thus the column generation problem is computationally tractable whenever the assortment problem can be solved efficiently. For the special case in which demand from each customer type is described by a multinomial logit model, it suffices to check the dual-adjusted revenue-ordered assortments for

each customer type, as shown in Talluri and van Ryzin (2004a). A number of other models are also known to admit tractable assortment optimization. In the next section, we will demonstrate that the resulting solution determines a policy with substantial performance guarantees. We would like to highlight this as one of the important themes of this chapter: despite the enormous size of the state space and the intractability of computing the full optimal dynamic control policy, we are able to obtain provably effective policies so long as we are able to efficiently solve single-period static assortment problems.
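A minimal sketch of the revenue-ordered scan for the subproblem under MNL choice follows: given the dual-adjusted per-item values for one (푘, 푠) pair, only the assortments nested by decreasing adjusted value need to be checked, as in Talluri and van Ryzin (2004a). The inputs below are illustrative placeholders, and the returned value would still be compared against the dual offering-time value before adding the column.

```python
import numpy as np

def best_mnl_assortment(adjusted_rev: np.ndarray, weights: np.ndarray):
    """Maximize sum_{i in S} rho_i * w_i / (1 + sum_{j in S} w_j) over assortments S
    by scanning the assortments ordered by adjusted value, descending."""
    order = np.argsort(-adjusted_rev)
    best_val, best_S = 0.0, ()           # the empty assortment has value 0
    for m in range(1, len(order) + 1):
        S = order[:m]
        val = float(np.sum(adjusted_rev[S] * weights[S]) / (1.0 + np.sum(weights[S])))
        if val > best_val:
            best_val, best_S = val, tuple(int(i) for i in S)
    return best_S, best_val

# illustrative dual-adjusted values rho_i and MNL weights for one (k, s) pair
rho = np.array([4.0, 2.5, -1.0, 3.2])
w_k = np.array([1.0, 0.8, 2.0, 0.3])
print(best_mnl_assortment(rho, w_k))
```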

3.2.2 Randomized Algorithm Guarantees

We now propose a simple policy that provides a constant-factor revenue guarantee versus the expected cyclic revenue of an optimal dynamic policy, 퐽 *. We term this policy the time-dependent randomized (TDR) policy, which makes offering decisions to customers of each type depending on the system state only through the current subinterval. To implement this policy we solve the following related linear program,

$$
\begin{aligned}
J^{PG} = \max_{\alpha} \quad & \sum_{s=1}^{T} \sum_{k=1}^{K} \sum_{S \in \mathcal{S}} \sum_{i=1}^{N} \frac{r_{ik}}{\mu} \bar{\lambda}_{ks} P_{ik}(S)\, \alpha_{ks}(S) \\
\text{s.t.} \quad & \sum_{s=1}^{T} \sum_{k=1}^{K} \sum_{S \in \mathcal{S}} \bar{\lambda}_{ks} P_{ik}(S) f_i(s, t)\, \alpha_{ks}(S) \le e^{-2\mu\theta} C_i \quad \forall i \in \mathcal{N}, \forall t \in \mathcal{T} \\
& \sum_{S \in \mathcal{S}} \alpha_{ks}(S) \le 1 \quad \forall k \in \mathcal{K}, \forall s \in \mathcal{T} \\
& \alpha_{ks}(S) \ge 0 \quad \forall S \in \mathcal{S}, \forall k \in \mathcal{K}, \forall s \in \mathcal{T}, \qquad (3.14)
\end{aligned}
$$
which we term the policy-guiding linear program.

By solving (3.14) we obtain the resulting decision variables 훼푘푡(푆), corresponding to the fraction of time the assortment 푆 is offered to customers of type 푘 during segment 푡. The linear program (3.14) is essentially the same as formulation (3.10), but with the capacity of each resource scaled down by the multiplier $e^{-2\mu\theta}$. The following lemma shows that the introduction of this capacity buffer reduces the resulting objective value in comparison to $J^{LP}$ by at most the same multiplicative factor. The

proof is straightforward but is included in appendix B.1 for completeness.

Lemma 5. $J^{PG} \ge e^{-2\mu\theta} J^*$.

With the variables 훼푘푡(푆) in hand, upon the arrival of a customer of type 푘 during subinterval 푡 an assortment 푆 is selected independently at random according to the probability distribution specified by 훼푘푡. Since the TDR policy makes offering decisions without considering the current capacity utilization, customers who select a resource 푖 from the offered assortment 푆 may find that it is currently being utilized at capacity. As in the time-homogeneous setting, we say that such arrivals are blocked and exit the system, earning no reward for the operator. We observe that the TDR policy as stated is not admissible as defined in section 3.1.1 and therefore may make inefficient allocations to unavailable resources; however, despite this inefficiency, we will demonstrate that the policy achieves substantial performance guarantees relative to the optimal policy. Due to this issue, in our analysis of the TDR policy we must distinguish between the number of accepted arrivals to resource 푖 during interval 푡, given by 퐴푖푡, and the number of such arrivals assigned to resource 푖 by the randomized policy, 푍푖푡. Recall that the number of arrivals of customers of type 푘 within subinterval 푡 is a Poisson random variable with mean $\bar{\lambda}_{kt}$. Thus, by the splitting and merging properties of independent Poisson random variables, under the TDR policy the number of customer arrivals to each resource during each segment, 푍푖푡, is itself a Poisson random variable with mean
$$\tilde{\lambda}_{it} = \sum_{k=1}^{K} \sum_{S \in \mathcal{S}} \bar{\lambda}_{kt} P_{ik}(S)\, \alpha_{kt}(S).$$
While simple to implement, the TDR policy provably obtains a constant-factor guarantee relative to the optimal policy.
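Operationally, the TDR policy only requires sampling an assortment from the distribution 훼푘푡(·) at each arrival; a minimal sketch follows, where the probabilities are assumed to come from the policy-guiding LP (3.14) and the values shown are illustrative.

```python
import numpy as np

def tdr_offer(alpha_kt: dict, rng: np.random.Generator):
    """Sample an assortment for an arrival of type k in subinterval t.
    alpha_kt maps assortments (tuples of resource indices) to probabilities;
    any residual mass corresponds to offering the empty set."""
    assortments = list(alpha_kt.keys())
    probs = np.array([alpha_kt[S] for S in assortments])
    residual = max(1.0 - probs.sum(), 0.0)        # probability of offering nothing
    choices = assortments + [()]
    probs = np.append(probs, residual)
    return choices[rng.choice(len(choices), p=probs / probs.sum())]

rng = np.random.default_rng(0)
alpha_example = {(0,): 0.5, (0, 1): 0.3}          # illustrative LP output for one (k, t)
print(tdr_offer(alpha_example, rng))
```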

Proposition 7. The expected revenue of the TDR policy satisfies the following performance guarantee,

$$J^{TDR} \ge \min_i \left( \sum_{q=0}^{C_i - 1} \frac{C_i^q e^{-C_i}}{q!} \right) e^{-2\mu\theta} J^* \ge e^{-2\mu\theta - 1} J^*. \qquad (3.15)$$

In particular, as the approximation interval width 휃 becomes smaller, this performance ratio approaches $\frac{1}{e}$, and as $\min_i C_i$ grows large this performance ratio approaches $\frac{1}{2} e^{-2\mu\theta}$, both independently of other problem parameters.

This constant factor performance guarantee applies for any configuration of system parameters. In particular it holds when the system capacities are small, for example

when 퐶푖 = 1 for many or all resources. Such small capacity systems are especially vulnerable to stochastic fluctuations in demand around their average values which

is the reason for the presence of the additional factor of $\frac{1}{e}$ in the performance bound (3.15). In a heavy traffic system with larger capacities we may expect these stochastic fluctuations to decrease in magnitude relative to the size of the overall demand allocations. To confirm this intuition we introduce a system scaling parameter 휉 which defines a new instance of the problem in which the arrival rates and capacity

are scaled to $\lambda^{(\xi)}_k(x) = \xi \lambda_k(x)$ and $C^{(\xi)} = \xi C$, respectively. Under this 휉-scaled regime, we consider the scaled version of the policy-guiding linear program,

$$
\begin{aligned}
J^{PG-\xi} = \max_{\alpha} \quad & \sum_{s=1}^{T} \sum_{k=1}^{K} \sum_{S \in \mathcal{S}} \sum_{i=1}^{N} \frac{r_{ik}}{\mu} \bar{\lambda}^{(\xi)}_{ks} P_{ik}(S)\, \alpha_{ks}(S) \\
\text{s.t.} \quad & \sum_{s=1}^{T} \sum_{k=1}^{K} \sum_{S \in \mathcal{S}} \bar{\lambda}^{(\xi)}_{ks} P_{ik}(S) f_i(s, t)\, \alpha_{ks}(S) \le e^{-2\mu\theta} C^{(\xi)}_i \quad \forall i \in \mathcal{N}, \forall t \in \mathcal{T} \\
& \sum_{S \in \mathcal{S}} \alpha_{ks}(S) \le 1 \quad \forall k \in \mathcal{K}, \forall s \in \mathcal{T} \\
& \alpha_{ks}(S) \ge 0 \quad \forall S \in \mathcal{S}, \forall k \in \mathcal{K}, \forall s \in \mathcal{T}. \qquad (3.16)
\end{aligned}
$$
We observe that the scaled system linear program is similar to the policy-guiding LP formulation (3.14), with the constraint coefficients, the constraint vector, and the objective scaled by a factor of 휉. This scaling does not affect the solution, $\alpha^{PG-\xi} = \alpha^{PG}$, and the resulting objective value is simply scaled as $J^{PG-\xi} = \xi J^{PG}$. We now demonstrate that the TDR policy is asymptotically $e^{-2\mu\theta}$-optimal in the

scale parameter 휉, in that $\frac{1}{\xi} J^{TDR-\xi}$ approaches $e^{-2\mu\theta} J^*$ as 휉 grows large. This indicates that under heavy traffic, as the approximation window 휃 grows smaller, the performance of the TDR policy approaches the optimal revenue.

Proposition 8. The TDR policy is asymptotically $e^{-2\mu\theta}$-optimal in the system scale parameter 휉, in that
$$\lim_{\xi \to \infty} \frac{1}{\xi} J^{TDR-\xi} \ge e^{-2\mu\theta} J^*.$$
In particular, as the approximation interval width 휃 becomes smaller, the asymptotic performance ratio approaches 1.

We would like to highlight the fact that our bounds do not depend on exact knowledge of the true functional form of the arrival rate functions 휆푘(푥). Our performance guarantees apply equally well when the expected numbers of arrivals $\bar{\lambda}_{ks}$ can be accurately estimated within each discretization window. Therefore the assumption of exactly known arrival rates is sufficient, but not necessary, for our results to apply. This is especially important in practical applications in which the assumption of perfectly known demand rates may be unreasonable, but in which period-by-period arrival rates may be estimated.

3.2.3 Dynamic Policies

The theoretical guarantees of proposition 7 give a constant-factor expected performance guarantee for the TDR algorithm. However, as previously noted, the TDR policy itself does not make use of current utilization information in making allocation decisions. In cases such as municipal on-street parking in most cities this restriction is realistic, and therefore the TDR algorithm is an attractive choice. However, when real-time utilization information is available, an operator may seek a policy that is more deterministic in nature and that responds to changing resource availability in a more proactive fashion. Due to the size of the global state space, computing the globally optimal dynamic policy is intractable in general, so in this section we explore heuristic policies for dynamic assortment optimization. Subsequently, in section 3.4 we demonstrate that these policies perform well in our scenario of practical interest. We begin by introducing the obvious greedy policy and discuss its shortcomings. To mitigate these we propose an alternative bid-price style algorithm that makes use of real-time utilization data as well as an estimate of the marginal value of each resource.

At time 푥, the current utilization of the system is captured by the 푁 × 퐾 matrix 푊 (푥), whose entries represent the number of each customer type being served at each resource; recall the notation $W_i(x) = \sum_{k=1}^{K} W_{ik}(x)$ for the total utilization of 푖 at 푥. We use the notation 퐴(푥) = {푖 ∈ 풩 : 푊푖(푥) < 퐶푖} to denote the set of available resources at time 푥. Then upon the arrival of a customer of type 푘, the greedy policy maximizes the expected marginal gain by offering the assortment 푆푘(푥) representing the solution to the optimization problem,

$$\max_{S \subseteq A(x)} \sum_{i=1}^{N} \frac{r_{ik}}{\mu_{ik}} P_{ik}(S).$$

By maximizing incremental revenue upon each arrival, the greedy policy ensures that customers are always routed to high-value resources; however, this policy fails to consider the impact of future customer choice behavior on long-term revenue. For instance, the greedy algorithm may offer a resource with high expected revenue to all arriving customers indiscriminately, which is suboptimal if there is a class of customers for which this resource is the only acceptable alternative. Similar issues are examined in Bernstein et al. (2015), where the authors demonstrate the optimality of a policy incorporating inventory rationing in their model. In some systems, such as those with light traffic relative to system capacity and in which revenue rates and service times do not differ greatly between resources and customer types, the greedy policy performs well. However, with increasing disparity in revenue or service rates between customer types, or when some customer types are more selective than others, more sophisticated assortment policies can provide improved performance.

In such circumstances, in addition to the immediately generated revenue, it is important to account for the value a resource could have in the future. To do this, we propose the time-varying dual bid-price (TVDB) policy, which approximates the marginal value of each unit of resource using a quantity derived from the dual values of the capacity constraints of the formulation (3.10). Specifically, for a customer of type 푘 arriving at time 푥, we estimate a marginal value for the consumption of

each resource based on the dual values of the resource over each future checkpoint.

The single-customer dual values for each interval, $\gamma_{it} e^{-\mu_{ik}\theta}$, are then weighted by the probability that an admitted customer will remain in the system until the end of each respective interval, yielding a customer- and resource-dependent value,

$$\gamma_{ik}(x) = \sum_{t \ge t(x)} \left( 1 - F_{ik}(t\theta - x) \right) \gamma_{it}\, e^{-\mu_{ik}\theta}, \qquad (3.17)$$

where 퐹푖푘 denotes the cumulative distribution function of a departure time random variable with rate 휇푖푘. The factor $e^{-\mu_{ik}\theta}$ appears due to the fact that the dual value 훾푖푡 actually represents the value of allowing $e^{\mu_{ik}\theta}$ customers to arrive at the beginning of subinterval 푡. After computing this marginal value for each resource, the customer is offered the assortment 푆 resulting from the bid-price optimization problem,

$$\max_{S \subseteq A(x)} \sum_{i=1}^{N} \left( \frac{r_{ik}}{\mu_{ik}} - \gamma_{ik}(x) \right) P_{ik}(S). \qquad (3.18)$$

We observe that the optimization problem (3.18) is simply a single-period assortment optimization problem and is therefore tractable under a wide variety of customer choice models specifying the purchase probabilities 푃푖푘(푆). We note that if the adjusted expected revenue $\left( \frac{r_{ik}}{\mu_{ik}} - \gamma_{ik}(x) \right)$ is negative for every resource 푖 ∈ 풩, then the customer is offered the null resource and is in effect rejected. In section 3.4, we study the performance of this strategy in numerical experiments based on an instance derived from data obtained in a real municipal parking application.
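A sketch of the TVDB offer computation follows: the marginal value 훾푖푘(푥) in (3.17) is assembled from the capacity duals and exponential survival probabilities, and the adjusted problem (3.18) is then solved over the currently available resources, here by brute-force enumeration under MNL choice. All inputs are illustrative placeholders, and subintervals are 0-indexed in this sketch.

```python
import itertools
import math
import numpy as np

def marginal_value(x, i, k, gamma, mu, theta, T):
    """gamma_ik(x) from (3.17): future dual values weighted by the probability the
    customer is still in service at the end of each remaining subinterval."""
    t_now = min(int(x / theta), T - 1)            # 0-indexed current subinterval
    total = 0.0
    for t in range(t_now, T):
        remaining = (t + 1) * theta - x           # time until end of subinterval t
        survive = math.exp(-mu[i, k] * remaining) # 1 - F_ik(.)
        total += survive * gamma[i, t] * math.exp(-mu[i, k] * theta)
    return total

def tvdb_offer(x, k, available, gamma, mu, r, w, theta, T):
    """Solve (3.18) by enumerating assortments of available resources under MNL."""
    adj = {i: r[i, k] / mu[i, k] - marginal_value(x, i, k, gamma, mu, theta, T)
           for i in available}
    best_S, best_val = (), 0.0                    # empty offer = rejection, value 0
    for m in range(1, len(available) + 1):
        for S in itertools.combinations(available, m):
            denom = 1.0 + sum(w[i, k] for i in S)
            val = sum(adj[i] * w[i, k] for i in S) / denom
            if val > best_val:
                best_S, best_val = S, val
    return best_S

# illustrative instance: 2 resources, duals from the upper-bound LP, one arrival
T, theta = 4, 0.25
gamma = np.array([[0.5, 0.4, 0.3, 0.1], [0.2, 0.2, 0.1, 0.0]])   # gamma_{it}
mu = np.array([[4.0, 4.0], [4.0, 4.0]])
r = np.array([[10.0, 8.0], [6.0, 12.0]])
w = np.array([[1.5, 0.4], [0.6, 2.0]])
print(tvdb_offer(0.3, 0, [0, 1], gamma, mu, r, w, theta, T))
```

The greedy policy described earlier corresponds to the special case of this sketch with all marginal values set to zero.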

3.3 Pricing Under Time-varying Demand

In many cases of practical interest, in addition to the choice of assortment, an operator has the flexibility to adjust their pricing strategy to help match demand with available supply and to capture a larger share of the available consumer surplus. As in section 2.4, we consider the opportunistic-pricing problem and the fair-pricing

problem in turn. In the opportunistic scenario the operator is free to engage in price discrimination between the various customer types, while in the fair-pricing setting such discrimination is not permitted. We first review common notation applicable to both scenarios. We then introduce the methodologies that are applicable to solving the upper bounding and policy-guiding linear programs in each setting. Subsequently, we extend the randomized policies of section 3.2 and discuss how the performance guarantees presented there can be extended to the pricing context. As in section 2.4, in addition to the assortment decision 푆 ∈ 풮 for each customer type 푘, the seller must also make a pricing decision for each resource 푖 from the set of

퐿 candidate price levels 풫푖 = {푝푖1, . . . , 푝푖퐿}. Again, we use the matrix $R \in \mathbb{R}^{N \times K}$ to denote a pricing specification, with individual pricing decisions given by the elements

푟푖푘 ∈ 풫푖. We let 풫 denote the space of such pricing configurations. Thus, in this case the operator’s decision at each stage is specified by both the selected prices and assortments, denoted 푋 = (푅, 푆) and we further define 풳 = 풫 × 풮 to be the

space of such decisions. We then use the shorthand notation 푃푖푘푙(푋) to denote the probability that a customer of type 푘 purchases 푖 at price level 푙 when presented with the price and assortment configuration specified by 푋 ∈ 풳 . Taking into account the customer choice models ℳ and the current state of the system as given by the present utilization and the system time, the operator must make a pricing and assortment decision 푋 ∈ 풳 . As in section 2.4, if the number of products is limited to some

푀 ∈ Z+, then we restrict our pricing and assortment decision further to lie in the set

풳푀 = {푋 ∈ 풳 : |푆| ≤ 푀}.

3.3.1 The Opportunistic Pricing Case

With this notation in hand, the opportunistic case proceeds much as in section 2.4.1. In this scenario, the operator is free to charge customers of different classes different rates for the same resource. In particular, the operator is able to engage in dynamic price discrimination, taking into account the price sensitivity of the customer, the current state of their resources, and future demand patterns. In this circumstance we may again introduce virtual products that represent the various price levels

associated with each product, as in Gallego and Topaloglu (2014), and subsequently apply the methodology developed in section 3.2 to solve this problem.

Specifically, as before, each column in the LP formulation now corresponds to an assortment decision for a given customer type 푘, but in this case each individual item is replaced with 퐿 virtual items, representing the same product at each of its 퐿 prices. The operator then makes assortment decisions in the same manner as described in section 3.2, with the additional restriction that each assortment 푆 must be valid, containing at most one virtual item corresponding to each item in 풩 . By

extending the notion of our decision variables we obtain 훼푘푠(푋) corresponding to configuration 푋 = (푆, 푅). This variable represents the fraction of time customers of type 푘 are offered assortment 푆 with price levels set according to 푅 during subinterval 푠. Using these variables, we formulate the linear programming upper bound in the opportunistic case as follows,

$$
\begin{aligned}
J^{LP-OP} = \max_{\alpha} \quad & \sum_{s=1}^{T} \sum_{X \in \mathcal{X}} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \frac{p_{il}}{\mu} \bar{\lambda}_{ks} P_{ikl}(X)\, \alpha_{ks}(X) \\
\text{s.t.} \quad & \sum_{s=1}^{T} \sum_{X \in \mathcal{X}} \sum_{k=1}^{K} \sum_{l=1}^{L} \bar{\lambda}_{ks} P_{ikl}(X) f_i(s, t)\, \alpha_{ks}(X) \le C_i \quad \forall i \in \mathcal{N}, \forall t \in \mathcal{T} \\
& \sum_{X \in \mathcal{X}} \alpha_{ks}(X) \le 1 \quad \forall k \in \mathcal{K}, \forall s \in \mathcal{T} \\
& \alpha_{ks}(X) \ge 0 \quad \forall X \in \mathcal{X}_M, \forall k \in \mathcal{K}, \forall s \in \mathcal{T}. \qquad (3.19)
\end{aligned}
$$

In very small instances the opportunistic problem may be solved directly using the explicit linear programming formulation. However, as before, the number of possible assortments, exacerbated by the introduction of the virtual items, is likely to make the full problem intractably large, necessitating the use of a column generation procedure. As was the case in section 3.2, to find an augmenting column it is sufficient to solve a single-period static pricing and assortment problem over valid assortments. Under some models of customer choice this problem admits an efficient exact algorithm. For example, Gallego and Topaloglu (2014) present a polynomial-time algorithm to solve this problem exactly when customer choice is specified by a nested logit model, of

which the MNL model is a special case. In such cases the opportunistic problem can be solved with efficiency comparable to the assortment-only problem. As in section 3.2, for the purpose of developing randomized policies with provable guarantees we solve a modified policy-guiding formulation with capacity scaled by a factor of $e^{-2\mu\theta}$.

3.3.2 The Fair Pricing Case

Due to the impracticality of price discrimination in many settings, we also consider the fair-pricing problem, as first posed in section 2.4.2. In this problem variant the operator’s pricing strategy must not directly discriminate between customers of different classes. That is, for each resource 푖 and each pair of customer types 푘 and $k'$, the pricing rates must be equal, satisfying $r_{ik} = r_{ik'}$. One way to avoid class-based discrimination is to restrict the operator to make their pricing decision prior to the realization of the type of the arriving customer. Here, we again work with the restricted space of pricing configurations, $\mathcal{P}^{FP} = \mathcal{P}_1 \times \mathcal{P}_2 \times \ldots \times \mathcal{P}_N$. In such a case the pricing decision $R \in \mathcal{P}^{FP}$ reduces to an 푁-vector of prices. This scenario is challenging in that it introduces a strong constraint on the pricing decision that must hold for each assortment offered to distinct customer types. Formally, in the fair-pricing context we must ensure that each product is offered at a single price regardless of customer class; thus we require that any valid pricing and assortment decision belong to the set $\mathcal{X}^{FP} = \mathcal{P}^{FP} \times \mathcal{S}^K$. To ensure fair pricing, we highlight the fact that each decision now specifies a price level as well as an assortment for each customer type simultaneously. For convenience in our formulations, we will interpret the joint pricing and assortment decision as a 3-dimensional vector $X \in$

$\{0,1\}^{N \times K \times L}$, with $x_{ikl} = 1$ indicating the decision to offer product $i$ to customers of type $k$ at price level $l$. Thus we may specify the space of fair prices,

$$\mathcal{X}^{FP} = \left\{ X \in \{0,1\}^{N \times L \times K} :\; \sum_{l=1}^{L} x_{ikl} \le 1 \;\;\forall\, i, k, \qquad 1 - x_{ikl} \ge x_{ik'l'} \;\;\forall\, i, k, k', l \neq l' \right\}.$$

The first restriction ensures that the same product is not offered to the same customer

type at various price levels. The second constraint prevents price discrimination. This restriction can be formulated more succinctly by introducing the auxiliary variables $Y \in \mathcal{Y} = \{Y \in \{0,1\}^{N \times L} : \sum_{l=1}^{L} y_{il} = 1,\; \forall\, i \in \mathcal{N}\}$, with the interpretation that $y_{il} = 1$ indicates the decision to price product $i$ at price level $l$. To express the fair pricing

problem we may write $\mathcal{X}^{FP} = \{X \in \mathcal{X} : \exists Y \in \mathcal{Y},\; x_{ikl} \le y_{il},\; \forall\, i \in \mathcal{N}, \forall\, l \in \mathcal{L}, \forall\, k \in \mathcal{K}\}$ and we use the notation $Y(X)$ to denote the auxiliary variables associated with the pricing and assortment decision $X$. Then to incorporate a potential $M$-capacity constraint we introduce the set $\mathcal{X}_M = \{X \in \{0,1\}^{N \times L \times K} : \sum_{i=1}^{N}\sum_{l=1}^{L} x_{ikl} \le M,\; \forall\, k\}$ and restrict the decision space to $\mathcal{X}_M^{FP} = \mathcal{X}_M \cap \mathcal{X}^{FP}$. With this notation in hand we are able to formulate the bounding linear program for the $M$-capacity constrained fair

pricing and assortment problem by introducing the decision variables $\alpha_t(X)$ for each subinterval $t$. For each $X \in \mathcal{X}_M^{FP}$, $\alpha_t(X)$ can be interpreted as the fraction of time in which the operator makes pricing and assortment decisions specified by $X$ to arriving customers during periodic subinterval $t$. We solve the following linear program to determine an allocation that maximizes long-run expected cyclic revenue,

$$
\begin{aligned}
J^{LP\text{-}FP}(M) = \max_{\alpha}\;& \sum_{s=1}^{T} \sum_{X \in \mathcal{X}_M^{FP}} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \frac{p_{il}}{\mu}\, \bar{\lambda}_{ks} P_{ikl}(X)\, \alpha_{s}(X) \\
\text{s.t.}\;& \sum_{s=1}^{T} \sum_{X \in \mathcal{X}_M^{FP}} \sum_{k=1}^{K} \sum_{l=1}^{L} \bar{\lambda}_{ks} P_{ikl}(X) f_i(s,t)\, \alpha_{s}(X) \le C_i && \forall i \in \mathcal{N},\; \forall t \in \mathcal{T} \\
& \sum_{X \in \mathcal{X}_M^{FP}} \alpha_{s}(X) \le 1 && \forall s \in \mathcal{T} \\
& \alpha_{s}(X) \ge 0 && \forall X \in \mathcal{X}_M^{FP},\; \forall s \in \mathcal{T}. \qquad (3.20)
\end{aligned}
$$

We let $\alpha^{LP\text{-}FP}(M)$ denote the solution to this problem.

As in the previous section, the number of variables in this linear program necessitates the use of a column generation procedure. This procedure begins with a reduced problem on a restricted set of pricing and assortment configurations. Upon solving the reduced problem we check the optimality of our current solution to the full problem

(3.20) by searching for a violated constraint in the associated dual problem,

$$
\begin{aligned}
\min_{\gamma, \sigma}\;& \sum_{i=1}^{N} \sum_{t=1}^{T} C_i \gamma_{it} + \sum_{t=1}^{T} \sigma_t \\
\text{s.t.}\;& \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{l=1}^{L} \sum_{t=1}^{T} \bar{\lambda}_{ks} P_{ikl}(X) f_i(s,t)\, \gamma_{it} + \sigma_s \ge \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{l=1}^{L} \frac{p_{il}}{\mu}\, \bar{\lambda}_{ks} P_{ikl}(X) && \forall s \in \mathcal{T},\; X \in \mathcal{X}_M^{FP} \qquad (3.21) \\
& \gamma_{it} \ge 0 && \forall i \in \mathcal{N},\; \forall t \in \mathcal{T} \\
& \sigma_t \ge 0 && \forall t \in \mathcal{T}.
\end{aligned}
$$

Here the dual variables $\gamma$ and $\sigma$ are associated with the capacity constraints and the offering time constraints, respectively. Let $\gamma^R$ and $\sigma^R$ be the resulting values of the dual variables after solving the reduced problem to optimality. Using these we seek to solve the column generation subproblem,

$$\max_{X \in \mathcal{X}_M^{FP}} \;\sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{l=1}^{L} \frac{p_{il}}{\mu}\, \bar{\lambda}_{ks} P_{ikl}(X) \;-\; \sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{l=1}^{L} \sum_{t=1}^{T} \bar{\lambda}_{ks} P_{ikl}(X) f_i(s,t)\, \gamma_{it}^R \;-\; \sigma_s^R. \qquad (3.22)$$

If the resulting value is non-positive for every periodic subinterval $s$, then the solution to the reduced primal problem is indeed optimal and we can terminate the procedure. Otherwise we add a configuration $X$ for which a positive value was obtained in (3.22) to the set of candidate columns and repeat this procedure with the updated set of candidate columns. As before, the column generation problem associated with the master problem (3.20) reduces to the corresponding single-period static problem. However, the following proposition demonstrates that, in general, this column generation subproblem poses computational challenges. We refer to this subproblem, which is of independent interest, as the fair pricing and personalized assortment optimization (FPPAO) problem. Intuitively, the FPPAO problem is concerned with selecting a price for each resource $i$ from the set

of candidate prices 풫푖 and subsequently deciding which assortment should be shown to each customer type.

As in the previous chapter, we recognize that this subproblem is an instance of the fair pricing and personalized assortment optimization problem (FPPAO). Recall that given the data tuple $(\mathcal{S}, \mathcal{P}, \mathcal{M})$, the $M$-capacitated FPPAO problem is to solve the following optimization problem,

$$\max_{R \in \mathcal{P}^{FP},\; S_k \in \mathcal{S}:\, |S_k| \le M} \;\sum_{k=1}^{K} \sum_{i \in S_k} r_i\, \mathbb{P}_{\mathcal{M}_k}(i; S_k, R).$$

We have shown in section 2.4.2 that the M-capacitated FPPAO problem is NP-hard in general, and we repeat the statement below for convenience.

Proposition 9. The $M$-capacitated fair pricing and assortment optimization problem is NP-hard even when $\mathcal{M}_k$ is specified by a multinomial logit model for each customer type $k$ and there are only two prices per item.

Owing to the difficulty in solving this column generation subproblem in general, we recall the joint pricing and product offer problem as introduced in section 2.4.2. In this problem variant the assortments offered to customers must consist of a single item and each item may only be offered at a single price instantaneously. This problem can be viewed as a variant of the online matching problem, with variable prices, customer type specific price sensitivity, and resources that are available to be rematched after a random duration of service. Formally, the pricing and offer problem

can be formulated by restricting the decision space to $\mathcal{X}_1^{FP}$. Even though the subproblem (3.22) specialized to this case requires optimization over an exponential number of possible configurations, we show that it can be formulated as an integer program with further structure. To formulate this integer program we interpret the components of 푋 as variables

푥푖푘푙 ∈ {0, 1} which we refer to as assignments. Similarly, we interpret the components of the auxiliary variable 푌 as the variables 푦푖푙 ∈ {0, 1}. We observe that in the case of the fair pricing and offer problem, for each customer type 푘, 푃푖푘푙(푋) depends only on the single value 푥푖푘푙 for which 푥푖푘푙 = 1. For this item 푖 and price level 푙, 푃푖푘푙(푋) is a constant since the probability of purchasing product 푖 does not depend on the

other unoffered products. We use $P_{ikl}$ to denote this constant probability. This leads to the following formulation for the column generation subproblem,

$$
\begin{aligned}
\max_{x, y}\;& \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \left( \frac{p_{il}}{\mu} - \sum_{t=1}^{T} \gamma_{it}^R f_i(s,t) \right) \bar{\lambda}_{ks} P_{ikl}\, x_{ikl} \\
\text{s.t.}\;& \sum_{i=1}^{N} \sum_{l=1}^{L} x_{ikl} \le 1 && \forall\, k \in \mathcal{K} \\
& x_{ikl} \le y_{il} && \forall\, i \in \mathcal{N},\, k \in \mathcal{K},\, l \in \mathcal{L} \qquad (3.23) \\
& \sum_{l=1}^{L} y_{il} = 1 && \forall\, i \in \mathcal{N},\, k \in \mathcal{K} \\
& x_{ikl} \in \{0,1\},\; y_{il} \in \{0,1\} && \forall\, i \in \mathcal{N},\, k \in \mathcal{K},\, l \in \mathcal{L}.
\end{aligned}
$$

The first constraint ensures that the configuration offers at most one product and price combination to each customer type. The second constraint ensures that a resource $i$ is only offered to customers at price level $l$ if this price level is selected using the

auxiliary variable 푦푖푙. The third constraint ensures that exactly one price level is selected for each product.

This formulation of the problem remains NP-hard; however, it does possess special structure that aids in optimization. To see this we observe that once the pricing

variables 푦푖푙 are fixed, the assignment variables 푥푖푘푙 can be set by simple maximization. Then we may view the objective function of formulation (3.23) as a function over the elements of 푦 by,

$$g(y) = \max_{x :\; \sum_{i=1}^{N}\sum_{l=1}^{L} x_{ikl} \le 1,\; x_{ikl} \le y_{il}} \;\sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \left( \frac{p_{il}}{\mu} - \sum_{t=1}^{T} \gamma_{it}^R f_i(s,t) \right) \bar{\lambda}_{ks} P_{ikl}\, x_{ikl}.$$

We observe that $g(y)$ is submodular in the pricing decision variables $y_{il}$ as each additional price/item combination offered may only reduce the marginal value of subsequently added price/item pairs. We also note that the third constraint in (3.23) can be interpreted as a partition matroid constraint, as an independent set in this context may contain at most one price for each item. Therefore by applying the pipage rounding framework proposed by Calinescu et al. (2007) the optimal solution to (3.23) may

be efficiently approximated to within a factor of $(1 - \frac{1}{e})$. Further, by Theorem 2 of Gallego et al. (2016), by obtaining a $(1 - \frac{1}{e})$-approximation to the column generation subproblem through iteration we are able to obtain a $(1 - \frac{1}{e})$-approximation to the corresponding master problem.
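To make the structure above concrete, the following minimal Python sketch evaluates $g(y)$ from precomputed objective coefficients and applies a simple one-price-per-item greedy pass over the partition matroid. The array names are illustrative, and the greedy pass is only a heuristic; the $(1 - \frac{1}{e})$ guarantee discussed above relies on the pipage rounding framework of Calinescu et al. (2007), not on this heuristic.

```python
import numpy as np

def g(y, c):
    """Evaluate g(y): with prices fixed by y (y[i, l] = 1 if item i is priced at level l),
    each customer type k is assigned its best offered item/price pair, or nothing.
    c[k, i, l] is the precomputed coefficient
    (p_il / mu - sum_t gamma^R_it f_i(s, t)) * lambda_ks * P_ikl from (3.23)."""
    masked = np.where(y[None, :, :] == 1, c, -np.inf)      # only offered (i, l) pairs are eligible
    best = masked.reshape(c.shape[0], -1).max(axis=1)      # best pair per customer type
    return np.maximum(best, 0.0).sum()                     # assigning nothing is always feasible

def _with_price(y, i, l):
    z = y.copy()
    z[i, l] = 1
    return z

def greedy_prices(c):
    """Illustrative heuristic: choose one price level per item by marginal gain in g."""
    _, N, L = c.shape
    y = np.zeros((N, L), dtype=int)
    for i in range(N):
        gains = [g(_with_price(y, i, l), c) for l in range(L)]
        y[i, int(np.argmax(gains))] = 1
    return y
```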

3.3.3 Pricing Policy Performance

As in section 3.2 we begin by demonstrating that the solutions to the linear programming formulations (3.19) or (3.20) provide an upper bound on the expected cyclic revenue of the optimal periodic fixed policy in the pricing and assortment context. We use $J^{*\text{-}OP}$ and $J^{*\text{-}FP}(M)$ to denote these optimal expected revenue values in the opportunistic and fair pricing scenarios respectively. The following proposition demonstrates that the objective values of our linear programming formulations provide the desired upper bounds. The proof follows in the same manner as that of Proposition 6 and is therefore omitted.

Proposition 10. $J^{*\text{-}OP} \le J^{LP\text{-}OP}$ and $J^{*\text{-}FP}(M) \le J^{LP\text{-}FP}(M)$.

Just as in section 3.2, we can obtain policies with constant-factor performance guarantees by solving policy-guiding variants of problems (3.19) or (3.20) in which the capacity constraints are scaled down by a factor of $e^{-2\mu\theta}$. We term the objective values and solutions to these problems $J^{PG\text{-}OP}/\alpha^{PG\text{-}OP}$ and $J^{PG\text{-}FP}(M)/\alpha^{PG\text{-}FP}(M)$, respectively. To implement these time-dependent randomized policies, prior to the arrival of each customer in periodic subinterval $s$, the operator randomly selects a

pricing and offer decision according to the probabilities $\alpha_{ks}$. When a customer arrives and their type is revealed, the assortment to offer and associated prices are specified by the chosen configuration $X$. Let $J^{TDR\text{-}OP}$ and $J^{TDR\text{-}FP}(M)$ denote the performance of the TDR policy in the respective scenarios. We observe that the proof of proposition 7 is general with respect to the prices offered to customer types within the interval, therefore by suitably modifying the proof we obtain the following performance guarantees applicable to both pricing and assortment scenarios.

Proposition 11. When the column generation subproblem may be solved exactly, the expected steady-state cyclic revenue of the resulting TDR pricing and assortment policy satisfies the following performance guarantees,

$$J^{TDR\text{-}OP} \ge \min_i \left( \sum_{q=0}^{C_i - 1} \frac{C_i^q e^{-C_i}}{q!} \right) e^{-2\mu\theta} J^{*} \ge e^{-2\mu\theta - 1} J^{*\text{-}OP}. \qquad (3.24)$$

$$J^{TDR\text{-}FP}(M) \ge \min_i \left( \sum_{q=0}^{C_i - 1} \frac{C_i^q e^{-C_i}}{q!} \right) e^{-2\mu\theta} J^{*} \ge e^{-2\mu\theta - 1} J^{*\text{-}FP}(M). \qquad (3.25)$$

In particular, as the approximation interval width $\theta$ becomes smaller, this performance ratio approaches $\frac{1}{e}$ and as $\min_i C_i$ grows large this performance ratio approaches $\frac{1}{2} e^{-2\mu\theta}$, both independently of other problem parameters. In addition, by introducing the system scaling parameter $\xi$, these policies are asymptotically $e^{-2\mu\theta}$-optimal, in the same sense as that of proposition 8.

When the column generation subproblem can be efficiently approximated, as is the case in the $FP$-$1$ scenario, this guarantee follows through with an additional factor of $(1 - \frac{1}{e})$ due to the approximation of submodular maximization. We term the natural resulting randomized strategy the approximated pricing and offer (APO) policy.

Corollary 1. The APO policy satisfies the following performance guarantee and its associated column generation subproblem may be solved in polynomial time.

$$J^{APO\text{-}FP}(1) \ge \min_i \left( \sum_{q=0}^{C_i - 1} \frac{C_i^q e^{-C_i}}{q!} \right) (1 - e^{-1})\, e^{-2\mu\theta} J^{*} \ge (1 - e^{-1})\, e^{-2\mu\theta - 1} J^{*\text{-}FP}(1). \qquad (3.26)$$

3.3.4 Dynamic Policies

Just as in the case of the assortment strategies considered in section 3.2, the TDR policies considered in this section are randomized policies that do not make use of the current utilization state of the system. In this section, we introduce dynamic policies that avoid offering customers resources which are fully occupied. Here we focus on

non-price discriminating policies as they are more intricate due to the interdependence between prices for various resources and demand by associated customer types. When price discrimination is permissible, the corresponding policy is obtained in a manner similar to that presented in section 3.2.3.

As in section 3.2.3, we introduce the time-varying dual bid-price (TVDB) pricing and assortment policy. Upon the arrival of a customer of type $k$ at time $x$, we begin by computing an approximation for the value of capacity at each resource as specified in equation (3.17), repeated below for convenience,

$$\gamma_{ik}(x) = \sum_{t \ge t(x)} \left(1 - F_{ik}(t\theta - x)\right) \gamma_{it}\, e^{-\mu_{ik}\theta},$$

where the dual values $\gamma_{it}$ result from the solution of formulation (3.20).
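A minimal sketch of how the bid prices $\gamma_{ik}(x)$ might be computed from the precomputed duals is given below. It assumes exponential service times, so that $1 - F_{ik}(u) = e^{-\mu_{ik} u}$, and the indexing convention for $t(x)$ and the horizon length are illustrative assumptions rather than the thesis's exact implementation.

```python
import math

def bid_price(x, i, k, gamma_dual, mu, theta, T):
    """Approximate marginal value of capacity gamma_ik(x), following the display above.
    gamma_dual[i][t]: dual values from (3.20); mu[i][k]: service rate of type k at resource i,
    assumed exponential so that 1 - F_ik(u) = exp(-mu[i][k] * u);
    theta: subinterval width; T: number of subintervals in the cycle."""
    t_x = int(math.ceil(x / theta))                          # first subinterval boundary at or after x (assumed convention)
    value = 0.0
    for t in range(t_x, T):
        survival = math.exp(-mu[i][k] * (t * theta - x))     # probability the service extends past t * theta
        value += survival * gamma_dual[i][t] * math.exp(-mu[i][k] * theta)
    return value
```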

Then to determine the current pricing and assortment configuration we solve a

variant of the column generation subproblem (3.22) using $\gamma_{ik}(x)$ as time-dependent bid-prices,

$$\max_{X \in \mathcal{X}_M^{FP}(x)} \;\sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{l=1}^{L} \left( \frac{p_{il}}{\mu} - \gamma_{ik}(x) \right) \bar{\lambda}_{ks} P_{ikl}(X). \qquad (3.27)$$

Here we use the notation $\mathcal{X}_M^{FP}(x)$ to denote the current set of feasible pricing and assortment configurations. This specifies the set of such configurations under which no resource is offered if it is currently unavailable, that is, $x_{ikl} = 0$ if $i \notin A(x)$. Recall that $A(x)$ is the set of available resources at time $x$. In the special case of assortments of size one, the relevant problem can be expressed as in formulation (3.23) but with the objective modified to be,

$$\max_{S \subseteq A(x)} \;\sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{l=1}^{L} \left( \frac{p_{il}}{\mu} - \gamma_{i}(x) \right) \bar{\lambda}_{ks} P_{ikl}\, x_{ikl}. \qquad (3.28)$$

These policies have the advantage of being able to adjust prices in response to changing future demand levels as captured by the dynamic bid-price values $\gamma_{ik}(x)$.

We note that in this setting, the greedy policy is obtained by solving problem (3.27)

with 훾푖푘(푥) = 0 for all 푖, 푘, and 푥. As in the case of our dynamic assortment policies,

if the adjusted expected revenue $\left( \frac{p_{il}}{\mu_{ik}} - \gamma_{ik}(x) \right) < 0$ for all resources $i \in \mathcal{N}$ and price levels $l \in \mathcal{L}$, then the customer is offered the null resource and is effectively rejected. We will compare the effectiveness of these policies in section 3.4.
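The resulting decision rule for the single-item pricing and offer case can be sketched as follows: for an arriving customer of type $k$, scan the currently available resources and price levels, offer the pair with the largest adjusted expected revenue, and reject the customer when every adjusted value is negative. Setting the bid prices to zero recovers the greedy policy. The array names below are illustrative.

```python
def offer_decision(k, available, p, mu, lam_ks, P, gamma_x):
    """Pricing-and-offer decision for one arriving customer of type k, as in (3.28).
    available: currently free resources A(x); p[i][l]: candidate prices for resource i;
    mu[i]: service rate; P[i][k][l]: purchase probability; lam_ks: arrival rate of type k;
    gamma_x[i]: current bid price gamma_i(x) (all zeros gives the greedy policy)."""
    best, best_value = None, 0.0                              # the null offer has value 0
    for i in available:
        for l in range(len(p[i])):
            value = (p[i][l] / mu[i] - gamma_x[i]) * lam_ks * P[i][k][l]
            if value > best_value:
                best, best_value = (i, l), value
    return best                                               # None means the customer is rejected
```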

3.4 Computational Case Study

To examine the effectiveness of our policies in a realistic scenario, we construct a numerical experiment based on parking data collected in a major city by an electronic payment system. In our scenario, the local municipality operates a network of parking lots, and we are interested in quantifying the revenue generation and congestion-control potential of our assortment and pricing strategies. The borough is conducting a pilot program which includes a mobile phone application that recommends parking spaces to customers based on their residential status. One proposal under consideration is a mobile phone application that recommends drivers a place to park based on their desired destination. This setting therefore represents a realistic application of our proposed policies. In our data set, each resource is a group of adjacent parking spaces which share the same pricing characteristics and is termed a meter. Associated with each meter

푖 is a series of timestamps of customer arrivals into the meter and the rate, 휇푖, at which present customers depart the meter. Most meters require payment for 10 hours per day so the available transaction log is limited to these 10 hours. From

this data we estimate a constant arrival rate $\lambda_{ih}^{obs}$ for each resource $i$ and hour $h \in \{1, \ldots, 10\}$. Under the current policy, each meter $i$ has one of nine possible fixed prices, $r_i \in \{1.2, 1.8, 2.0, 2.4, 3.0, 3.6, 4.0, 4.8, 5.0\}$, denominated in the local currency.

Also associated with each meter 푖 is a location ℓ푖 in latitude and longitude and we use

the function 퐷(ℓ푖, ℓ푗) to express the haversine distance between meter 푖 and meter 푗 in kilometers. Since the data set does not contain counterfactual information we must make some assumptions on the customer decision process. First, we assume that the universe of possible customer types can be adequately approximated by the customer traffic

to each meter. Therefore we have that the number of customer types $K = N$ and we assume that customers of type $k$ are those associated with the $k$th meter. Thus, for lack of further information, for each customer type $k$, we take their preferred destination to be the location of the $k$th resource and therefore $r_k$ also denotes the price of the $k$th resource as fixed in the data set. In addition, we assume that the service time distribution is independent of the arriving customer type, and hence all customers remain in service at resource $i$ for a duration that is exponentially distributed with rate parameter $\mu_i$ regardless of their type. For our purposes, we propose a multinomial logit (MNL) model to describe the likelihood of a customer of type $k$ electing to utilize meter $i$. The parameter $u_{ik}$ represents the mean utility customers of type $k$ derive from successfully utilizing resource $i$. The parameters $\beta$ and $\eta$ specify the sensitivity of customer utility to price and distance respectively. The deterministic component of customer utility or valuation is then given by,

$$U_{ik} = u_{ik} + \beta\, \frac{(r_i - r_k)}{\mu_i} + \eta\, D(\ell_i, \ell_k), \qquad (3.29)$$

where the parameters $\beta$ and $\eta$ denote the disutility for spending beyond their original budget and for the additional travel incurred, respectively. We note that in most reasonable scenarios these parameter values are likely to be nonpositive. In accordance with the specification of the MNL model we assume that heterogeneity of customer valuations within each type for each alternative is distributed as standard Gumbel random variables, $\epsilon$. We also take the convention of assigning a base utility of zero to the no-purchase option; this leads to the expression,

$$P_{ik}(S) = \frac{e^{U_{ik}}}{1 + \sum_{j \in S} e^{U_{jk}}}, \qquad (3.30)$$

which describes the probability that a customer of type 푘 purchases resource 푖 from the offered assortment 푆. For the purpose of generating our experiments in this section, we used the system

parameters $u_{ik} = \bar{u} = 2$, $\beta = -1$, $\eta = -5$. These values were selected because they give intuitively reasonable purchase probabilities. By interpreting $\beta$ as the decrease in

utility per unit of currency paid, we note that these values correspond to customers gaining 2 units of currency of utility over the cost of their preferred meter and being willing to pay 5 units of currency to avoid needing to walk an additional kilometer. Since the output of the choice model is used in the optimization problem and therefore in the formulation of the policies, we expect that the particular type of choice model used will have minimal impact on the relative performance of the policies.
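A minimal sketch of the utility and choice computations in (3.29)–(3.30), using the parameter values quoted above, is given below; the array names and data layout are illustrative.

```python
import math

U_BAR, BETA, ETA = 2.0, -1.0, -5.0      # parameter values used in the experiments above

def utility(i, k, r, mu, dist):
    """Deterministic utility U_ik from (3.29): base utility plus price and distance disutilities.
    r[i]: price of meter i (r[k] is the price of customer k's preferred meter);
    mu[i]: departure rate at meter i; dist[i][k] = D(l_i, l_k) in kilometers."""
    return U_BAR + BETA * (r[i] - r[k]) / mu[i] + ETA * dist[i][k]

def mnl_choice_probs(S, k, r, mu, dist):
    """MNL purchase probabilities (3.30) for assortment S offered to a type-k customer;
    the no-purchase option is assigned utility zero."""
    exps = {i: math.exp(utility(i, k, r, mu, dist)) for i in S}
    denom = 1.0 + sum(exps.values())
    return {i: v / denom for i, v in exps.items()}
```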

In our test scenarios we compute the performance of each policy over a 10 hour period with arrival rates specified by a non-homogeneous Poisson process with arrival rates for each resource and hour given by $\lambda_{ih}^{obs}$. Each scenario begins with each resource initially unutilized so that $W_{ik} = 0$ for all $i$ and $k$. We assume that each service earns its expected value, equal to the offered price divided by the associated service rate $\mu_i$, and that each service earns its entire associated revenue, even if the customer remains in the system at the conclusion of the 10 hour window. We compare the performance of the following policies and evaluate them with respect to the linear programming-based upper bound in both the assortment-only and pricing and assortment scenarios.

∙ TDR - executed as described in sections 3.2.2 and 3.3.3 without reference to system state

∙ TDR-UB - an implementation of the TDR algorithm which uses the results of the LP-UB to guide assortment decisions rather than the policy-guiding linear program

∙ Greedy - the state-dependent greedy heuristic as described in sections 3.2.3 and 3.3.4

∙ TVDB - the time-varying dual bid-price policy as described in sections 3.2.3 and 3.3.4

We conduct two different experiments with slightly differing specifications to examine the performance of our proposed policies in the assortment-only and the pricing and assortment cases.

Figure 3-1: Absolute performance versus arrival scaling with LP-based upper bound. (a) Assortment-Only Pricing by Meter; (b) Assortment-Only Pricing by Customer Type.

In the following subsections, we specialize our procedure to these settings, report results, and discuss managerial implications. To determine how performance is affected by the level of traffic experienced by the system we test the performance of each policy on a grid of traffic levels. These multipliers are used to

scale up and down the base level of traffic $\lambda_{ih}^{obs}$ for all resources and hours, and these multipliers are used as the horizontal axis in figures 3-2 and 3-4. We also test the performance at a grid of values of $T \in \{20, 40, 80\}$ in order to determine how policy performance varies with the density of the approximation grid. For the purposes of our analysis we evaluate the performance of each policy relative to the linear programming upper bound obtained in the tightest approximation scenario of $T = 80$. This value attains the tightest upper bound on the revenue obtained by the optimal dynamic policy and so we report our results as fractions of this level.

3.4.1 Assortment Experiments

In the assortment setting, we examine two pricing scenarios. In the first, the prices are specified at the resource level and each customer pays rate $r_i$ to park at meter $i$ and therefore $U_{ik}$ is given as specified in (3.29). In the second scenario, prices are fixed at the customer level and therefore customers of type $k$ pay rate $r_k$ at each resource regardless

Figure 3-2: Policy performance versus arrival scaling for the price by meter assortment scenario.

of type. Thus, in this specification, the pricing term in (3.29) cancels out and the

base utility gained by customer type $k$ at meter $i$ is given by $U_{ik} = u_{ik} + \eta D(\ell_i, \ell_k)$. In both cases, we work with a universe of $N = K = 23$ resources constituting the southwest quadrant of available meters. For computational convenience, we do not institute a cap on the size of offered assortments so that under our MNL model, the optimal assortments may be computed quickly by optimizing over revenue-ordered assortments. We modify the upper bounding linear program (3.10) and the policy-guiding program (3.14) slightly to allow for service rates $\mu_i$ that vary by resource. We then solve these formulations using column generation as described in section 3.2.1 until the dual-implied maximum gain of an additional assortment as represented by (3.13) is less than 0.0001. Our results suggest that the difference in objective value between a fully optimized solution and that of the solution obtained by stopping early in this manner is insignificant. After solving both linear programs we implement each policy as described and plot the results in terms of absolute revenue in the first panel of figure 3-1 and the results relative to the tightest upper bound ($T = 80$) in figure 3-2. We report results in table form for the price by resource scenario in the appendix sections B.3.1 and B.3.2, respectively. In each case sufficient experiments are run so that the standard error of our estimates of the expected value of each policy is less than 0.27% of the relevant linear programming upper bound.

Figure 3-3: Policy performance versus arrival scaling for the price by customer type assortment scenario.

From the absolute results in figure 3-1, we observe that our arrival rate scaling covers a broad range of interesting behaviors. Lower values of the arrival rate scaling factor represent an underloaded system with rapid improvement in revenue as the rate of arrivals increases. On the other hand at high values, the system becomes limited by its capacity and the incremental gain achievable by the policies and the LP upper bound becomes smaller. These results emphasize the value of understanding the current state of the system before making assortment decisions. The two dynamic policies, greedy and TVDB, that are able to adapt to current utilization states and avoid losing customers due to poor routing tend to achieve higher fractions of the optimal revenue than the state-independent policies. When real-time state information is unavailable, we observe that TDR-UB, the variant of the TDR policy that employs $\alpha^{LP}$, the solution to formulation (3.10), tends to outperform the TDR policy as formally described. This is because, in this instance, the value of allocating customers to high value resources exceeds the value obtained by reduced blocking from the policy-guiding linear program. However, we also observe that the difference in performance shrinks as the approximation width is refined and the more conservative policy actually increases performance in the underloaded regime. We also observe that the relative performance of the TVDB policy and the greedy policy is dependent on the pricing structure. When prices are fixed at the resource level, as shown in figure 3-2, the two policies yield similar results, with the TVDB

algorithm yielding a 1-2% improvement over the greedy baseline. This lack of differentiation is due to the fact that the prices in our scenario do not vary by customer type. When all customers pay the same price for each resource, we would expect a greedy algorithm to perform well since filling up each resource is the primary concern and rationing items for different types only helps by ensuring that less picky customers are not wastefully offered a resource that is favored by more picky customer types. Using the arrival rates inferred from the data set and the customer choice process, this effect is not extremely strong in our setting.

On the other hand, when prices are fixed by customer type, the TVDB policy begins to demonstrate a significant gap over the greedy policy as shown in figure 3-3. Under this pricing scheme a myopic greedy strategy may inefficiently offer low valuation customers resources that would be better saved for future high valuation customers. The TVDB policy is able to account for this valuation using its dual-derived estimates of the marginal value of capacity. We also observe that the performance of the TVDB policy improves as the approximation windows are refined, indicating that more precise estimates of the marginal value of resource capacity yield benefits in practice.

3.4.2 Pricing Experiments

In our pricing experiments, to isolate the effect of pricing, we consider the fair pricing and offer problem as detailed in section 3.3.2. In this setting, prior to the arrival of a customer, the operator sets a price for each resource and designates an assignment of each customer type to a resource. The prices are selected from the set of nine prices present in the data set. Since the price is now a decision of the operator, we update the definition of the base utility in our MNL model to,

$$U_{ikl} = \bar{u} + \beta\, \frac{(p_{il} - r_k)}{\mu_i} + \eta\, D(\ell_i, \ell_k). \qquad (3.31)$$

Figure 3-4: Policy performance versus arrival scaling for the pricing scenario.

Therefore the probability of a customer of type $k$ purchasing resource $i$ at price level $l$ is given simply by,

$$P_{ikl} = \frac{e^{U_{ikl}}}{1 + e^{U_{ikl}}}. \qquad (3.32)$$

Then, in keeping with the pricing and offer setting, the system configurations at each time $x$ are determined by solving the appropriate variant of problem (3.28) under the greedy and TVDB policies. For this pricing scenario we use an abbreviated data set consisting of $N = K = 8$ resources selected from the southwest corner of the set considered in the assortment section. We report results in table form for the price by resource scenario in the appendix section B.3.3. In each case sufficient experiments are run so that the standard error of our estimates of the expected value of each policy is less than 0.19% of the relevant linear programming upper bound. As seen in figure 3-1, the absolute revenue results in the pricing scenario follow a similar pattern to those in the assortment case. Specifically the range of arrival rate multipliers is sufficient to represent both the underloaded and overloaded regimes. We also note that the TDR and TDR-UB policies display similar performance to the results presented in the previous subsection, in that the TDR policy outperforms in the very underloaded system and the performance gap overall tends to shrink as the approximation window is refined.

The revenue results relative to the tightest LP-based upper bound in the pricing scenario are presented in figure 3-4. Here, in contrast to the assortment results considered

previously, the greedy algorithm underperforms, especially as the system becomes heavily loaded. In such an overloaded regime, the greedy policy even underperforms the randomized policies. This is due to the greedy policy’s insistence on extracting the maximum value out of each customer irrespective of the value of the resources sold. The TVDB policy, in contrast, accounts for the value of resources and is able to raise prices for resources with a high marginal value of capacity. In our experiment, our proposed TVDB policy outperforms the alternative policies over the entire range of system load scenarios. This demonstrates that, especially in heavily utilized systems, accounting for the future patterns of demand is essential to the effective performance of a dynamic pricing strategy.

3.5 Conclusion

In this chapter we have considered strategies for assortment and pricing in the setting of reusable resources. In contrast to the more typical setting of consumable resources, this setting introduces the additional challenge of accounting for randomness in service times, and therefore performance guarantees applicable in the former setting do not apply. The main theoretical contribution of our work is to introduce and analyze a discretization scheme for the problem of managing a system of reusable resources. This is significant, because the dynamic programming formulation for the underlying problem of managing such a system in continuous time has an uncountable state space and is therefore intractable. We demonstrate that a variant of the usual randomized algorithm attains a strong constant-factor performance guarantee relative to the optimal time-adapted policy which is of potentially infinite complexity. Further, we show that this discretization scheme and associated guarantees are applicable under both a finite time horizon as well as an infinite time horizon with periodic demand. We show that this analysis is general enough to encompass the common revenue management problems of assortment optimization and joint pricing and assortment optimization. In particular, our theoretical performance guarantees for the randomized

algorithm apply in both settings. We demonstrate that non-discriminatory pricing is computationally challenging in general but show that in the case of pricing and single item assortments, the requisite subproblem is approximable in polynomial time.

In practical settings, the randomized algorithm we introduce provides a computationally tractable and provably performant method for allocating and pricing resources when the state of the resources cannot be directly observed. This is applicable in such settings as municipal parking in which sensors to monitor each individual space are impractical. When the state of the resources can be observed, our work provides a method for estimating the value of capacity dynamically over time. The precursors to these values may be computed once off-line and subsequently the value can be updated over time with simple arithmetic. Further, these estimates may be refined as much as needed at the expense of additional effort in the initial computation. We use these estimates in our proposed time-varying bid-price policy, which is able to account for the future value of each unit of capacity. Our computational results demonstrate that accounting for this value is especially critical when the operator has pricing flexibility.

Our computational results also shed light on when a greedy policy may be sufficient and when more sophisticated techniques should be employed. When prices are fixed at the resource level or customers do not differ much in their valuations, the greedy policy tends to perform well. However, when customer types have very different valuations and the operator simply accepts or rejects bids, the performance of a simple greedy policy begins to degrade. This degradation is especially acute when the operator has the flexibility to adjust prices.

This work is intended to introduce one potential methodology for the analysis of assortment and pricing strategies in systems of reusable resources. Admittedly, our work leaves open a number of avenues for future research. In particular our results rely on the assumption of an accurate demand forecast and relaxing this assumption to consider robust policies is one direction we wish to explore further. We would also like to examine the result of relaxing our assumption of Markovian service times and determine if similar bounds apply in more general settings. Current work indicates

that similar bounds apply for a broader class of service time distributions which includes the gamma distribution; however, generalizing further presents interesting challenges and opportunities. Finally, it would also be interesting to study the impact of re-solving in our setting, which could allow for even more responsive pricing and assortment policies than those we have presented.

Chapter 4

Statistical Learning Guarantees for Personalized Pricing

The increasing prominence of electronic commerce has given businesses an unprecedented ability to understand their customers as individuals and to tailor their services for them appropriately. This benefit is two-fold: customer profiles and data repositories often provide information that can be used to predict which products and services are most relevant to a customer, and the fluid nature of electronic services allows for this information to be used to optimize their experience in real-time, see Murthi and Sarkar (2003). For instance, Linden et al. (2003) document how Amazon.com has used personalization techniques to optimize the selection of products it recommends to users for many years, dramatically increasing click-through and conversion rates as compared to static sites. Other companies, such as Netflix, have implemented personalization through recommender systems as described by Amatriain (2013) to drive revenue indirectly by improving customer experience. To implement a personalization strategy it has been widely proposed to divide the customer base into distinct segments and to tailor the service to each type appropriately as in Gurvich et al. (2009) and Bernstein et al. (2011), for example. This segmentation of customers is often accomplished in practice by dividing them into archetypical categories based on broad, easily observable characteristics such as business versus leisure travelers in the case of the airline industry as discussed in Talluri

and van Ryzin (2004b). Such methods have the potential to increase revenue over systems in which customers are assumed to be homogeneous; however, in practice, customer characteristics often vary continuously and smoothly over the pool of customers, which makes it difficult or impossible to cleanly separate customers along the axis of interest.

In this chapter, we consider a statistical model and algorithm that use observed contextual information, rather than previously defined customer segments, to inform decisions. This approach represents a shift from thinking about personalization in terms of customer types towards personalized management decisions as a function of the unique relevant information available at the time of each decision. In the language of machine learning, we cast personalized decision making as a supervised learning problem, where past transactional data is used to discover the underlying relationship between contextual information and customer behavior, and predictions are made based on this relationship. This approach takes full advantage of available data in at least two ways. First, it allows the seller to consider more complex relationships between context and customer behavior than can be captured with just a few customer segments. Our approach also allows previous learning to be generalized easily, even to previously unseen types, allowing for customization even in the case of novel customer data.

As a demonstration of the advantages of this approach, consider again the example of business and leisure travelers in the airline industry. Within the business customer segment, there are large businesses with many assets and low price sensitivity, but also small firms that may not yet be financially established. Similarly, leisure travelers are often more price sensitive, but there are some wealthy travelers who may behave more like business customers. Transaction-specific contextual information is often rich enough to capture these “second-order" trends within the broad categories, which can then be leveraged to drive incremental revenue or to improve customer experience. With the granularity of information available today, it is likely that each new customer represents a unique pattern of information, and our personalization strategy should be flexible enough to both learn from and optimize for the full diversity of the customer

population. To implement such a strategy, we model demand or customer choice with a binomial or multinomial logit function, where the arguments to the logit function are assumed to be linear functions of both observed features and the offered price or prices. Since logistic models are very popular in practice for modeling categorical outputs, this approach leads to a practical, data-driven algorithm with good theoretical guarantees. Our main contributions are summarized as follows:

1. General Framework for Personalized Revenue Management: We propose a general framework capturing a wide range of revenue management problems in which personalized contextual information is available. Instead of requiring prior knowledge of customers’ segments or using any clustering procedures, our framework assumes a parametric form of the customers’ choice model which incorporates each of a customer’s attributes. This permits information on the effect of attributes to be pooled across the sample, without the need to estimate separately in every segment, which significantly reduces the required sample size. In the proposed framework, we allow for the estimation of customer choice across arbitrary types, even for customers with attribute values not before observed, as well as the case in which the number of attributes greatly exceeds the size of previously observed transaction data (the so-called high-dimensional setting). In contrast to popular statistical machine learning models in which data of interest consist of contextual information (or attributes) and outcomes, in revenue management problems, there is a third element that must be accounted for: the action of the seller. This action may be a price or set of prices offered by a firm, and it has an important effect on the observed outcome of the transaction. We provide ways to incorporate these actions into learning algorithms for customer behavior models.

2. Statistical Bound on the Revenue Gap: Given the model, we estimate the parameters, which relate the customers’ attributes and the action taken by the seller to the outcome, by regularized maximum likelihood estimation. In

contrast to statistical machine learning problems in which one cares about the generalization error, our goal is to provide an upper bound on the gap between the expected revenue of the proposed method and the oracle revenue, which holds with high probability. The oracle revenue is achieved by making the optimal decision with full knowledge of the system parameters governing consumer behavior. In contrast to the asymptotic bound, our finite-sample bound gives the explicit number of samples needed to achieve any given level of revenue loss as compared to the oracle decision maker. Such a bound characterizes the trade-off between the cost of information, given by the cost of the minimum number of samples to be collected, and the potential revenue loss. Further, our high-probability bound has an advantage over the usual expected loss bound since it is more robust against unexpected realizations of customers’ choice. We derive the desired bound in two steps: (1) we first provide the bound on the distance between the estimated parameter and true parameter holding with high probability; (2) we generalize the bound on estimation error into a bound on the revenue gap by exploiting properties of the revenue function. Although classical statistical theory has established the asymptotic normality of the estimated parameter as in Theorem 3.2.16 of (van der Vaart and Wellner 2000), it does not directly yield a finite sample bound on the estimation error in terms of the number of observations in the data set. We establish such a bound in this chapter, which demonstrates the rate at which the revenue gap shrinks to zero as the number of samples grows. We further extend our theoretical results to two other cases:

(a) One in which the model of customers’ behavior is mis-specified so that purchase probability does not in fact follow a logistic model. These results demonstrate the robustness of our method in the case when reality deviates from our modeling assumptions.

(b) Another in which the number of customer attributes is substantially larger than the number of previous observations, the high-dimensional scenario.

To illustrate the proposed framework for personalized revenue management, we consider the fundamental example of customized pricing. To facilitate the illustration of our framework without unnecessary complication, we only consider the most classical setups in the chapter, i.e., the single-stage pricing problem and multinomial logit model (MNL) based models such as those presented in Phillips (2005) or Talluri and van Ryzin (2004b). In customized pricing a seller aims to offer her product to each customer at a price that maximizes expected revenue, given information she has obtained about the transaction and the customer. Customized pricing, or price discrimination, is quite common in business to business transactions. Despite its limited application to general settings due to customer satisfaction and legal issues, it is predicted that the practice will spread and become more widely accepted as more data becomes available (see Golrezaei et al. (2014)).

4.1 Literature Review

Our work combines two recent themes within operations management: learning model parameters from historical data and making personalized decisions using contextual information. The primary application of personalization that we consider here, dynamic pricing, has been extensively studied over the past decade and there is a vast body of related work. Therefore, we provide a brief review of papers in the area which focus on learning and personalization beyond the more general references presented in section 1.2.1. At least in the field of operations research and management science, literature on personalized pricing is sparse. This is likely due to the fact that pure price discrimination is often thought to have limited application. However, there have been some

important contributions. Carvalho and Puterman (2005) studied a multi-stage pricing problem and assumed a logit model for demand as a function of the offered price, and they suggested that their model could be extended to include customer-specific attributes. Aydin and Ziya (2009) considered the case of customized pricing in which customers belong to either a high or low reservation price group and provide a signal to the seller that gives some information as to how likely they are to belong to the higher price group. Among other results, they develop conditions on the relationship between the signal and a customer’s probability of belonging to the higher price group under which the optimal price to offer is monotonic in the signal strength. Netessine et al. (2006) considered a form of personalized dynamic pricing in their treatment of cross-selling, in which the offer to each customer is customized, based on the other items that they are considering purchasing.

In many cases, models of price discrimination are actually cast as multi-product models, where the different price levels come with different qualifications and extras as in the airline industry. See Talluri and van Ryzin (2004a) and Belobaba (1989) for examples of this type. While in this chapter we focus on customized pricing for a single product based on customer attributes, which would be appropriate in the insurance industry and in business to business transactions, our conception of contextual information is general enough to encompass other formulations. For example our model can also be applied in the case of dynamic pricing of differing products based on the individual attributes of each product, the effects of which may be learned over time. While not precisely personalization, we note that a commonly made assumption in the literature is that the utility of each product is linear in the attributes of the product. See Vulcano et al. (2008) and Rusmevichientong et al. (2010a) for discussions of this assumption. Our model generalizes this notion by specifying mean utility as linear in attributes of each customer. This setup allows for product effects to be modeled as attributes common to all customers but differing between products.

The rest of the chapter will proceed as follows. We give our approach and models in Section 4.2. In Section 4.3 we present the algorithms for customized pricing. Sections 4.4 and 4.5 are devoted to proving revenue bounds for the two problems under various

assumptions, including a high-dimensional result.

4.2 A General Model

In this section we present a general modeling framework for data-driven revenue management problems that include decision-specific context information, or customer features (we will use the terms ‘feature’ and ‘attribute’ interchangeably). We then proceed to consider the application of customized pricing in detail.

A decision maker observes a vector of features $z \in \mathcal{Z} \subseteq \mathbb{R}^m$ that encodes information about the context of the specific decision at hand. Taking into account $z$, she chooses an action $a$ from a problem-specific action space $\mathcal{A}$. After the decision has been made, she observes an outcome $y$ from a finite set $\mathcal{Y}$ and gains a random reward from a finite set $\mathcal{R}$. The probability of outcome $y$ depends on both the feature

vector $z$ and the decision $a$. The reward $r_a(y)$ for outcome $y$ may also depend on the decision $a$. She would like to make the decision that maximizes her expected reward given the context $z$. The key to our modeling framework is a method for capturing the interaction between features, decision, and outcome using a logit model. This method gives conditional outcome probabilities $\mathbb{P}_z(y; a)$. Given these probabilities from the problem-specific logit model, we can write the expected reward/revenue as

$$f_z(a) = \sum_{y \in \mathcal{Y}} r_a(y)\, \mathbb{P}_z(y; a). \qquad (4.1)$$

The algorithm we present estimates the expected reward and then maximizes over all possible decisions. Before stating the algorithm in detail, we give our specific model for customized pricing.

4.2.1 Customized Pricing Model

The primary application of our model is in the case where a seller has a single product without inventory constraints and wishes to offer a price that will maximize her

121 revenue. In basic form, the single-product pricing problem consists of a set 풜 =

{푝1, . . . , 푝퐾 } of 퐾 distinct prices and a probability of purchase P(푦 = 1; 푝푘), 푘 = 1, . . . , 퐾. Here the outcome 푦 is a binary decision and is equal to one if the customer purchases the product and zero otherwise. Thus, the expected revenue function is

푓(푝) = 푝P(푦 = 1; 푝). Without any other information, the seller would maximize 푓(·) over 푝 ∈ 풜.

In the customized pricing problem, we assume that the seller has the ability to offer a different price 푝 ∈ 풜 to each customer after observing the customer’s associated feature vector 푧 ∈ 풵 ⊆ R푚, which supplies the context for the current pricing decision. We assume that 푧 contains a unit intercept term common to all customers that allows us to learn universal effects.

This structure allows us to define a personalized demand function P푧(푦 = 1; 푝), which is the probability that a customer with the feature vector 푧 purchases the product at price 푝. We model purchase probabilities P푧(푦 = 1; 푝) using a logit model with the true parameters 훽* ∈ R퐾−1 and 훾* ∈ R푚:

$$\log \frac{\mathbb{P}_z(y = 1; p, \beta^*, \gamma^*)}{1 - \mathbb{P}_z(y = 1; p, \beta^*, \gamma^*)} = \sum_{k=2}^{K} \beta_k^*\, \mathbb{I}(p = p^k) + \sum_{j=1}^{m} \gamma_j^*\, z_j, \qquad (4.2)$$

where $\mathbb{I}(A)$ is the indicator function which takes the value 1 when the event $A$ is true and 0 otherwise. We also note that since the price $p$ is in a discrete set $\mathcal{A}$, we represent the price factor $p$ as a group of dummy variables $(\mathbb{I}(p = p^2), \ldots, \mathbb{I}(p = p^K))$. As is common in linear models, to improve computational properties the effect of the highest price, $p^1$, is incorporated into the intercept instead of defining a separate effect $\beta_1^*$. For simplicity of notation, let $x = (\mathbb{I}(p = p^2), \ldots, \mathbb{I}(p = p^K), z) \in \mathbb{R}^{m+K-1}$ be the entire feature vector that contains both the seller’s decision and customer feature; let $\theta^* = (\beta^*, \gamma^*) \in \mathbb{R}^{K+m-1}$ be the entire true parameter vector; and define the total dimensionality $d = K + m - 1$. The logit model in (4.2) gives a specific form

122 for the personalized demand function:

$$\mathbb{P}_z(y = 1; p, \theta^*) = \frac{1}{1 + \exp(-\langle x, \theta^* \rangle)} = \frac{1}{1 + \exp\left(-\left(\sum_{k=2}^{K} \beta_k^*\, \mathbb{I}(p = p^k) + \sum_{j=1}^{m} \gamma_j^*\, z_j\right)\right)}, \qquad (4.3)$$

where $\mathbb{P}_z(y = 1; p, \theta^*)$ accomplishes the goal of linking customer attributes, seller decisions, and customer choice without defining any customer segmentation. To complete the model for customized pricing, the expected reward is given by

$$f_z(p, \theta^*) := p\, \mathbb{P}_z(y = 1; p, \theta^*). \qquad (4.4)$$

We pause to make explicit the connection between customized pricing notation and the general notation in (4.1). For customized pricing, each action $a$ is a price $p \in \mathcal{A}$. As mentioned above, outcomes $y \in \{0, 1\}$ correspond to purchase decisions, and reward $r_a(y) = r_p(y) = p$ if $y = 1$ and zero if $y = 0$. Using these mappings one can identify $\mathbb{P}_z(y = 1; p, \theta^*)$ and $f_z(p, \theta^*)$ as the problem-specific versions of $\mathbb{P}_z(y; a)$ and $f_z(a)$.
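The following minimal Python sketch implements the demand model (4.3) and the expected revenue (4.4) for a given feature vector and candidate price, using the dummy-variable encoding of (4.2); the variable names are illustrative.

```python
import numpy as np

def feature_vector(price_index, z, K):
    """Build x = (I(p = p^2), ..., I(p = p^K), z) as in (4.2); price index 0 corresponds
    to the highest price p^1, whose effect is absorbed into the intercept contained in z."""
    dummies = np.zeros(K - 1)
    if price_index > 0:
        dummies[price_index - 1] = 1.0
    return np.concatenate([dummies, z])

def purchase_prob(price_index, z, theta, K):
    """Logistic purchase probability P_z(y = 1; p, theta) from (4.3)."""
    x = feature_vector(price_index, z, K)
    return 1.0 / (1.0 + np.exp(-np.dot(x, theta)))

def expected_revenue(price_index, prices, z, theta):
    """Expected revenue f_z(p, theta) = p * P_z(y = 1; p, theta) from (4.4)."""
    K = len(prices)
    return prices[price_index] * purchase_prob(price_index, z, theta, K)
```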

Remark 4.2.1. Our model of the price effect via the parameters $\beta^* \in \mathbb{R}^{K-1}$ is simply for clarity of exposition and to highlight the model of the seller’s decision (i.e., price) as a feature. In practice, this model directly extends to modeling interaction effects between offered prices and other features (e.g., via introducing extra features $p z_j$). These interaction effects allow us to measure the change in price sensitivity given specific customer features and we have found such effects to be especially useful in our real transaction data. Further, our representation of the price from a finite set $\mathcal{A}$ as a vector of dummy variables is a common way of modeling a categorical feature in multifactor analysis-of-variance (Rao et al. 2008). It can also be easily adapted to a continuous price set by using a single price parameter $\beta$.

Our general model captures the relationship between features, decisions, and outcomes. Because we assume that past feature-decision-outcome data are observed and recorded, these models give rise to practical algorithms via maximum likelihood estimation for learning the underlying parameters, which facilitates personalized decision-making for new customers, as in the next section.

4.3 Algorithm

With the model in place, we now present the general Personalized Revenue Maximization (PRM) algorithm in Algorithm 1. We begin with a set of $n$ prior transaction records $\mathcal{T} = \{(z_1, a_1, y_1), \ldots, (z_n, a_n, y_n)\}$ from which the operator seeks to learn.

In particular, the $i$th record specifies $a_i$, the action taken, $y_i$, the outcome of that

decision, and $z_i$, the context for that decision. Such transaction records are contained in the database of essentially all large-scale modern businesses. Using the transaction likelihood as given in (4.3) we compute the negative log-likelihood function, $\ell_n(\mathcal{T}; \theta) = -\frac{1}{n} \sum_{i=1}^{n} \log\left(\mathbb{P}_{z_i}(y_i; a_i, \theta)\right)$. To avoid overfitting and damaging the generalization properties of this procedure we introduce the regularization constant $R$

and require that $\|\theta^*\|_1 \le R$ as mandated in (4.5). In practice, one can either tune this $R$ for better performance or simply fix a large enough number $R$. In addition to controlling over-fitting, this regularization is also useful to facilitate theoretical analysis and often leads to better empirical performance. In the high-dimensional setting as discussed in Section 4.5.2, where the dimension of the attribute vector $z$ is large

compared to the number of observed transactions, ℓ1-norm regularization is essential for both empirical performance and theoretical justification.

In step 1 of Algorithm 1, we minimize ℓ푛(풯 ; 휃) over 휃 under ℓ1-regularization to

get an estimate for 휃. For customized pricing, it is easy to see that ℓ푛(풯 ; 휃) is a convex function and ℓ1-regularization is a convex constraint on 휃. Therefore, any fast convex optimization procedure such as accelerated projected gradient descent or alternating direction method of multipliers can be applied to solve the problem (4.5). The reader is encouraged to reference F. Bach and Obozinski (2011) and S. Boyd and Eckstein

(2010) for further background on $\ell_1$-regularized convex optimization. Given the estimated parameters, in step 2 we calculate the estimated outcome probabilities $\mathbb{P}_z(y; a, \hat{\theta})$ for decision $a$ given features $z$, which is used to approximate the expected reward $f_z(a, \theta^*)$ in (4.1). In step 3, we construct the decision policy to

124 Algorithm 1 Personalized Revenue Maximization (PRM) Algorithm

Input: Data samples $\mathcal{T} = \{(z_1, a_1, y_1), \ldots, (z_n, a_n, y_n)\}$, regularization parameter $R$
Output: Decision policy $h : \mathcal{Z} \to \mathcal{A}$

1. Fit the regularized MLE on the observed data:

$$\hat{\theta} = \operatorname*{argmin}_{\|\theta\|_1 \le R} \left[ \ell_n(\mathcal{T}; \theta) = -\frac{1}{n} \sum_{i=1}^{n} \log\left(\mathbb{P}_{z_i}(y_i; a_i, \theta)\right) \right] \qquad (4.5)$$

2. Obtain the estimate of outcome probabilities $\mathbb{P}_z(y; a, \hat{\theta})$ for every $z \in \mathcal{Z}$ and $a \in \mathcal{A}$.

3. Construct the decision policy $h : \mathcal{Z} \to \mathcal{A}$ as $h(z) = \hat{a}$ where

$$\hat{a} = \operatorname*{argmax}_{a \in \mathcal{A}} f_z(a, \hat{\theta}). \qquad (4.6)$$

maximize the approximated expected reward. For customized pricing, the decision problem in step 3 can be simply solved via optimization over the finite price set 풜.
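A minimal sketch of Algorithm 1 for customized pricing is given below. It substitutes a penalized $\ell_1$ logistic regression (scikit-learn) for the constrained estimator in (4.5), with the penalty strength playing the role of $R$, and then prices a new customer by maximizing the estimated expected revenue over the finite price set; the data layout and helper names are assumptions rather than the thesis's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(price_idx, z, K):
    """Feature vector x = (price dummies for p^2..p^K, z), matching (4.2);
    z is assumed to already contain the unit intercept term."""
    d = np.zeros(K - 1)
    if price_idx > 0:
        d[price_idx - 1] = 1.0
    return np.concatenate([d, z])

def fit_prm(price_idx_hist, Z_hist, y_hist, K, C=1.0):
    """Step 1 of Algorithm 1, with the l1 constraint replaced by an l1 penalty
    whose strength C plays the role of the constraint radius R."""
    X = np.array([encode(p, z, K) for p, z in zip(price_idx_hist, Z_hist)])
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    return model.fit(X, np.asarray(y_hist))

def prm_price(model, z, prices):
    """Steps 2-3: estimate the purchase probability at each candidate price and
    offer the price with the largest estimated expected revenue p * P(y = 1 | p, z)."""
    K = len(prices)
    X_cand = np.array([encode(j, z, K) for j in range(K)])
    probs = model.predict_proba(X_cand)[:, 1]
    return prices[int(np.argmax(np.asarray(prices) * probs))]
```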

Remark 4.3.1. The algorithm is presented here in its full generality; however, it can be easily adapted to incorporate problem-specific information through the use of additional constraints. For instance in the case of customized pricing, it is natural in many cases to assume that a higher price implies lower demand given the same context $z$. Let the prices in the candidate set $\mathcal{A}$ be ordered such that $p^K < p^{K-1} < \cdots < p^1$. To impose such an assumption, one can simply add an additional constraint to the regularized maximum likelihood estimation procedure in (4.5), i.e., $\beta$ lies in

the isotonic cone $\{\beta \in \mathbb{R}^{K-1} \mid 0 \le \beta_2 \le \ldots \le \beta_K\}$. Such extra information can be practically useful in aiding the learning algorithm, especially when the number of observations is small.

4.4 Theory: Well-specified Model Setting

In this section, we develop our high-probability guarantees on the performance of Algorithm 1 (PRM) for personalized pricing. In the well-specified setting, we assume that there exist parameters $\theta^*$ such that for all observed data $i \in \{1, \ldots, n\}$,

$\mathbb{P}(y_i) = \mathbb{P}_{z_i}(y_i; a_i, \theta^*)$, which means the logit model is the correct underlying model of outcome probabilities. In Section 4.5 we will discuss how our results can be adapted to cases in which transaction data does not follow such a logit model. We use as our benchmark the oracle policy that knows this true $\theta^*$ and so is

able to select the action $a^*$ that maximizes expected revenue. Let $\hat{a}$ be the action recommended by Algorithm 1 in (4.6). Using properties of the maximum likelihood estimates, for any feature vector $z$, our goal is to find a bound on the optimality gap

$f_z(a^*, \theta^*) - f_z(\hat{a}, \theta^*)$ which holds with high probability. Before we prove our bound on the optimality gap in terms of the number of samples $n$, we first provide some necessary notation and preliminaries.

4.4.1 Notation and Preliminaries

We use $\|v\|_p$ to denote the $p$-norm of the vector $v$, given by $\|v\|_p = \left(\sum_i |v_i|^p\right)^{1/p}$ for $p > 0$ and $\|v\|_\infty = \max_i |v_i|$. Further, let $\lambda_{\max}(\Sigma)$ and $\lambda_{\min}(\Sigma)$ denote the largest and

smallest eigenvalues of the matrix Σ, respectively, and let ‖Σ‖표푝 be the operator norm of Σ. For a symmetric positive semi-definite matrix Σ, we have ‖Σ‖op = 휆max(Σ). As our analysis focuses on rates of convergence, we will use 푐, 퐶, and 퐶′ to denote universal constants. In our theoretical analysis, we assume that either the customers’ feature vectors are fixed, in the deterministic design setting, or follow a sub-Gaussian distribution in the randomized design setting. The sub-Gaussian assumption on the feature vec- tors is a common and natural assumption for since it captures a wide range of multivariate distributions. Examples include the multivariate Gaussian distribution, the multivariate Bernoulli distribution, the spherical distribution (for modelling normalized unit-norm feature vectors), and a uniform distribution on a convex set among many others. We briefly introduce sub-Gaussian random variables and vectors here and readers may refer to Vershynin (2012) for more details. Formally, a sub-Gaussian random variable 푋 is a random variable with mo- 1 √ ments that satisfy (E|푋|푝) 푝 ≤ 퐾 푝 for all 푝 ≥ 1 for some 퐾 > 0. The cor- responding sub-Gaussian norm 휓푋 = ‖푋‖휓2 is the smallest 퐾 for which the mo-

126 1 −1/2 푝 푝 ment condition holds, i.e., ‖푋‖휓2 = sup푝≥1 푝 (E|푋| ) . It can be proven that such a moment condition is equivalent to a more natural tail condition of 푋 that is similar to the super-exponential tail bound of a Gaussian random variable, i.e.,

(︀ 2 2 )︀ P(|푋| ≥ 푡) ≤ exp 1 − 푐푡 /‖푋‖휓 for some constant 푐 (see Lemma 5.5. in Vershynin (2012)). Gaussian and any bounded random variable (e.g., Bernoulli) are special cases of sub-Gaussian random variables. Given the definition of sub-Gaussian random vari-

able, a random vector 푥 ∈ R푑 is sub-Gaussian if its one-dimensional marginals ⟨푥, 푤⟩ are sub-Gaussian random variables for all 푤 ∈ R푑. The corresponding sub-Gaussian

norm is defined by by 휓푥 = sup‖푤‖2≤1‖⟨푥, 푤⟩‖휓2 .

4.4.2 Theoretical Results for Customized Pricing

In this section, we will give the detailed revenue bound for the case of the customized pricing problem. We emphasize that for now we are considering the regime in which the dimensionality 푑 remains fixed and 푑 < 푛. The high-dimensional case will be studied in Section 4.5.2.

We consider both deterministic (or fixed) design, where the input feature vectors $z_i$ are viewed as fixed quantities, and random design, where the inputs $z_i$ are randomly drawn from some distribution. Both of these design assumptions are popular in regression analysis (see Rao et al. (2008)). We formally state our assumptions before moving to the results.

Assumption 1. For both the deterministic and random design settings, we assume:

1. Conditional independence: the observed outcomes $\{y_i\}_{i=1}^n$ are independent given each $x_i$ (see the definition of $x_i$ in Section 4.2.1).

2. Bounded feature vectors: there exists a universal constant $B' > 0$ such that for any customer feature vector $z$, $|z_j| \le B'$ for all $j \in [m]$. This further implies that $|x_j| \le \max(B', 1) =: B$.

The assumption of conditional independence is ubiquitous in the statistical literature and is reasonable in our setting. Our boundedness assumption ensures that the transaction data used will not contain arbitrarily large elements, which could have an outsize effect on our learning procedure. The remainder of our assumptions differ between the fixed and random design settings.

Assumption 2 (Deterministic Design). There exists a constant $\rho$ such that $\lambda_{\min}(\Sigma_n) \ge \frac{\rho}{2} > 0$, where $\Sigma_n = \frac{1}{n}\sum_{i=1}^n x_i x_i^T$ is the sample Gram matrix.

Assumption 3 (Random Design). 1. The vectors $\{x_i\}_{i=1}^n$ are independent and identically distributed, following a sub-Gaussian distribution with sub-Gaussian norm $\psi$.

2. There exists a universal constant $\rho$ such that $\lambda_{\min}(\Sigma) > \rho > 0$, where $\Sigma = \mathbb{E}(xx^T)$.

We note that the positive lower bound on either the sample Gram matrix $\Sigma_n$ or the population second moment matrix $\Sigma$ can easily be satisfied when the sample size $n > d$. The sub-Gaussian assumption for random design in Assumption 3 is satisfied as long as customer feature vectors $z$ are sub-Gaussian with sub-Gaussian norm $\psi_z = \psi - 1$. Finally, companies sometimes use periods of price experimentation, in which prices are offered at random, for the purpose of learning demand. In this case, the i.i.d. assumption on the vectors $\{x_i\}_{i=1}^n$ is satisfied at least for the price information that the $x_i$ contain.

Estimation Error Bound

Our first step is to develop statistical bounds on the rate at which the estimated parameters $\hat\theta$ provided by Algorithm 1 converge to the true system parameters $\theta^*$. Under our Assumptions 1, 2, and 3 we demonstrate that $\|\hat\theta - \theta^*\|_2 \le C\sqrt{\frac{\log n}{n}}$ with high probability for some constant $C$. This result implies that, with high probability, the parameters we estimate converge to their true values at a rate proportional to $\frac{1}{\sqrt{n}}$. Subsequently, we will translate this convergence from parameter space to revenue space using properties of our revenue function. We state this result in the theorem below.

Theorem 1 (Parameter Convergence Rate). In the deterministic design setting under Assumptions 1 and 2, we have with probability at least $1 - \frac{1}{n}$,
\[
\|\hat\theta - \theta^*\|_2 \le c\,\frac{B\,(1+\exp(RB))^2}{\rho\exp(RB)} \sqrt{\frac{d\log(nd)}{n}}.
\]
In the randomized design setting under Assumptions 1 and 3, as long as $n \ge \frac{4C_{cp}(\psi)\log(n)d}{\min(\rho,1)^2}$ for some constant $C_{cp}(\psi)$ depending only on $\psi$, we have with probability at least $1 - \frac{1}{n} - 2\left(\frac{1}{n}\right)^d$,
\[
\|\hat\theta - \theta^*\|_2 \le c\,\frac{\psi\,(1+\exp(RB))^2}{\rho\exp(RB)} \sqrt{\frac{d\log(nd)}{n}},
\]
where $c$ is a universal constant.

We note that from now on we suppress the data argument $\mathcal{T}$ in the function $\ell_n$ for convenience. To prove Theorem 1, we first establish the strong convexity of the loss $\ell_n$ with strong convexity parameter $\eta > 0$. Let $\hat\Delta = \hat\theta - \theta^*$ denote the error in our parameter estimate with respect to $\theta^*$, and recall that the goal of Theorem 1 is to provide a finite-sample upper bound on $\|\hat\Delta\|_2$. The strong convexity of $\ell_n$ implies that
\[
\frac{\eta}{2}\|\hat\Delta\|_2^2 \le \ell_n(\theta^* + \hat\Delta) - \ell_n(\theta^*) - \langle\nabla\ell_n(\theta^*), \hat\Delta\rangle, \tag{4.7}
\]
where $\eta$ is known as the strong convexity parameter. Since $\hat\theta$ is the minimizer of $\ell_n$, we have $\ell_n(\hat\theta) - \ell_n(\theta^*) = \ell_n(\theta^* + \hat\Delta) - \ell_n(\theta^*) \le 0$. Together with (4.7) and using Hölder's inequality, this implies that
\[
\frac{\eta}{2}\|\hat\Delta\|_2^2 \le -\langle\nabla\ell_n(\theta^*), \hat\Delta\rangle \le \|\nabla\ell_n(\theta^*)\|_\infty\|\hat\Delta\|_1 \le \sqrt{d}\,\|\nabla\ell_n(\theta^*)\|_\infty\|\hat\Delta\|_2,
\]
which further implies that
\[
\|\hat\Delta\|_2 \le \frac{2\sqrt{d}}{\eta}\|\nabla\ell_n(\theta^*)\|_\infty. \tag{4.8}
\]

Intuitively, (4.8) tells us that when $\ell_n$ has sufficient curvature near $\theta^*$ (quantified by $\eta$), a small gradient must imply that the true parameter $\theta^*$ is near-optimal for the empirical log-likelihood function $\ell_n$. Therefore, to establish an upper bound on $\|\hat\Delta\|_2 = \|\hat\theta - \theta^*\|_2$ using (4.8), we only need to (1) establish an upper bound on $\|\nabla\ell_n(\theta^*)\|_\infty$, and (2) identify the strong convexity parameter $\eta$. These steps are accomplished in the following lemmas. We begin by showing that $\|\nabla\ell_n(\theta^*)\|_\infty$ can be upper bounded with high probability in both the deterministic and random design cases. The proof of the lemma is provided in Appendix C.1.

Lemma 6 (Gradient Bound). In the deterministic design setting under Assumptions 1 and 2, we have with probability at least $1 - \frac{1}{n}$ that $\|\nabla\ell_n(\theta^*)\|_\infty \le cB\sqrt{\frac{\log(nd)}{n}}$. In the randomized design setting under Assumptions 1 and 3, we have with probability at least $1 - \frac{1}{n}$,
\[
\|\nabla\ell_n(\theta^*)\|_\infty \le c\psi\sqrt{\frac{\log(nd)}{n}}.
\]
Next, we demonstrate that $\ell_n$ is strongly convex in the parameters at $\theta^*$ with strong convexity parameter $\eta$, and that this parameter is independent of the size of the data set. The proof of the lemma is provided in Appendix C.1.

Lemma 7 (Strong Convexity). In the deterministic design setting under Assumptions 1 and 2, we have that $\ell_n$ is strongly convex at the true parameter $\theta^*$ with
\[
\eta = \frac{\exp(RB)}{4(1+\exp(RB))^2}\cdot\rho. \tag{4.9}
\]
In the randomized design setting under Assumptions 1 and 3, as long as $n \ge \frac{4C_{cp}(\psi)\log(n)d}{\min(\rho,1)^2}$ for some constant $C_{cp}(\psi)$ depending only on $\psi$, $\ell_n$ is strongly convex at $\theta^*$ with strong convexity parameter given in (4.9), with probability at least $1 - 2\left(\frac{1}{n}\right)^d$. Using the lemmas proved above, we now provide the proof of Theorem 1 following the outline above.

Proof of Theorem 1. By plugging the upper bound on $\|\nabla\ell_n(\theta^*)\|_\infty$ from Lemma 6 and the strong convexity parameter $\eta$ in (4.9) into (4.8), we obtain the result of Theorem 1 in both the fixed and random design settings.

Revenue Bound

The results of the previous section tell us that the parameter estimates $\hat\theta$ given by Algorithm 1 converge to their true values $\theta^*$ at a rate of $\frac{1}{\sqrt{n}}$, with high probability. Crucially, this result holds for a finite number of samples $n$. However, in a management context the revenue lost in implementing the algorithm is the far more important quantity. In this section we translate our bound from parameter space to revenue space. For the purpose of this analysis we fix a transaction feature vector $z \in \mathcal{Z}$. Recall the best price $\hat p$ based on the estimated parameters (defined in (4.6)) and the oracle price,
\[
\hat p := \operatorname*{argmax}_{p\in\mathcal{A}} f_z(p, \hat\theta) \quad\text{and}\quad p^* := \operatorname*{argmax}_{p\in\mathcal{A}} f_z(p, \theta^*).
\]
We are interested in the gap between the revenue generated by the oracle price $p^*$ (the so-called oracle revenue) and that generated by the offered price $\hat p$ from Algorithm 1 when the customer's behavior is specified by the true parameters $\theta^*$, i.e., $f_z(p^*, \theta^*) - f_z(\hat p, \theta^*)$. The next theorem demonstrates that this revenue gap decreases at a quantifiable rate as the sample size $n$ is increased. We note that this revenue gap is an out-of-sample guarantee since such a bound holds for any new customer with feature vector $z$.

Theorem 2 (Revenue Convergence Rate). Under Assumptions 1, 2, and 3, we have with high probability that, as long as $n \ge \frac{4C_{cp}(\psi)\log(n)d}{\min(\rho,1)^2}$, for any feature vector $z$ the expected revenue gap can be bounded by
\[
f_z(p^*, \theta^*) - f_z(\hat p, \theta^*) \le \left(\max_{p\in\mathcal{A}} p\right)\frac{C'_{cp}(R, B, \psi)}{\rho}\sqrt{\frac{d^2\log(nd)}{n}},
\]
where $C'_{cp}(R, B, \psi)$ is a constant depending only on $R$, $B$, and $\psi$.

Proof of Theorem 2. In order to translate the parameter bound into a bound on revenue, we exploit some of the structural properties of $f_z$ as defined in (4.4). Here, we view $f_z$ as a function of its parameters $\theta = (\beta, \gamma)$. For any given price $p \in \mathcal{A}$ and feature vector $z$, we have
\begin{align*}
|f_z(p, \theta^*) - f_z(p, \hat\theta)| &= \left|\frac{p}{1+\exp\!\big(-\big(\beta_k^* + \sum_j \gamma_j^* z_j\big)\big)} - \frac{p}{1+\exp\!\big(-\big(\hat\beta_k + \sum_j \hat\gamma_j z_j\big)\big)}\right| \\
&\le \frac{p}{4}\,\big|\big\langle (1, z),\ (\beta_k^* - \hat\beta_k,\ \gamma^* - \hat\gamma)\big\rangle\big| \tag{4.10}\\
&\le \frac{p}{4}\,\|(1, z)\|_2\,\|(\beta_k^* - \hat\beta_k,\ \gamma^* - \hat\gamma)\|_2 \\
&\le \left(\max_{p\in\mathcal{A}} p\right)\frac{\sqrt{mB^2+1}}{4}\,\|\theta^* - \hat\theta\|_2,
\end{align*}
where $k$ is the index of the corresponding price in $\mathcal{A}$ and we define $\beta_k^*$ and $\hat\beta_k$ to be equal to zero when $k = 1$ and to be equal to the corresponding element of $\beta^*$ or $\hat\beta$, respectively, otherwise. We note that (4.10) follows from the fact that the derivative of the function $(1+\exp(-a))^{-1}$ is bounded by $\frac{1}{4}$ for any $a$. Thus, using the fact that $f_z(\hat p, \hat\theta) \ge f_z(p^*, \hat\theta)$,
\begin{align*}
f_z(p^*, \theta^*) - f_z(\hat p, \theta^*) &= f_z(p^*, \theta^*) - f_z(\hat p, \hat\theta) + f_z(\hat p, \hat\theta) - f_z(\hat p, \theta^*) \\
&\le f_z(p^*, \theta^*) - f_z(p^*, \hat\theta) + f_z(\hat p, \hat\theta) - f_z(\hat p, \theta^*) \\
&\le |f_z(p^*, \theta^*) - f_z(p^*, \hat\theta)| + |f_z(\hat p, \hat\theta) - f_z(\hat p, \theta^*)| \\
&\le \left(\max_{p\in\mathcal{A}} p\right)\frac{\sqrt{mB^2+1}}{2}\,\|\theta^* - \hat\theta\|_2. \tag{4.11}
\end{align*}

Having bounded the revenue gap by the parameter gap, we can apply Theorem 1 to get the result in the theorem statement.

As desired, we have bounded the optimality gap of the reward generated by the recommended price $\hat p$ as compared to the oracle price $p^*$. This bound decreases as $O\!\left(\frac{1}{\sqrt{n}}\right)$, up to logarithmic terms, and thus quantifies the trade-off between the value of data and potential lost revenue.

4.5 Extensions of Theory

The theory presented so far has assumed that our model was well-specified and that the feature vectors were low-dimensional. We now demonstrate that it is possible for both of these assumptions to be relaxed.

4.5.1 Misspecified Model Setting

In practice it is quite possible that consumer behavior cannot be specified using a logit model, in which case the underlying demand structure is typically unknown. If

we proceed to use the logit loss function $\ell_n$ (as we are unaware of the underlying model), a natural benchmark is the performance of the logit model that most closely reflects actual consumer behavior, which we term the oracle estimator. The oracle estimator is obtained by minimizing $\mathbb{E}(\ell_n(\theta))$, where the expectation is taken with respect to the true underlying model. This corresponds to the maximum likelihood estimated parameters of a logit model in the limit as the sample size grows to infinity. As we only have a finite number of samples in practice, we are interested in the revenue gap between the action chosen based on the estimated parameters from our finite sample and that based on the oracle estimator.

In order for the notion of an infinitely large training set sampled from an underlying distribution to be well-defined, we consider random rather than fixed design. Thus, we suppose that the data $\mathcal{T} = \{(z_1, a_1, y_1), \ldots, (z_n, a_n, y_n)\}$ is generated from some underlying random process. In accordance with our discussion above, we redefine

\[
\theta^* = \operatorname*{argmin}_{\theta}\,\mathbb{E}(\ell_n(\theta)). \tag{4.12}
\]

Thus in this section, the oracle estimator $\theta^*$ represents the estimated parameters under the expected negative log-likelihood (or the negative log-likelihood function based on an infinite number of samples from the underlying true distribution). This is in contrast to the interpretation of $\theta^*$ in Section 4.4 as the true underlying parameters, which are unavailable here because we do not make any modeling assumptions. We also note that since we consider the expectation of $\ell_n$ here, there is no need for the regularization/constraint on $\theta$ in (4.12), whose main purpose is to penalize model complexity when the number of samples is limited. Under this setup, we can still prove meaningful revenue bounds. In fact, the same results from Theorems 1 and 2 still hold with the new definition of $\theta^*$. As an example, we will prove here an analogous result to Theorem 2, the customized pricing revenue bound:

Theorem 3. In the misspecified setting with $\theta^*$ defined as in (4.12) and under Assumptions 1 and 3, we have with high probability that, as long as $n \ge \frac{4C_{\psi}\log(n)d}{\min(\rho,1)^2}$, the gap between the oracle estimator revenue and the revenue achieved with Algorithm 1 can be bounded as follows:
\[
f_z(p^*, \theta^*) - f_z(\hat p, \theta^*) \le \left(\max_{p\in\mathcal{A}} p\right)\frac{C'_{cp}(R, B, \psi)}{\rho}\sqrt{\frac{d^2\log(nd)}{n}},
\]
for all feature vectors $z$.

The proof of Theorem 3 is rather straightforward since the form of the likelihood

function $\ell_n$ has not changed from the well-specified to the misspecified setting. In addition, Assumptions 1 and 3 also remain unchanged, and so Lemma 7 still holds. However, the form of the underlying randomness in the data and the meaning of $\theta^*$ have changed, so Lemma 6 cannot be directly applied here. Examining the proof of Lemma 6, one can see that the proof (in particular, the concentration result) still holds so long as $\theta^*$ satisfies
\[
\mathbb{E}\big[\nabla\ell_n(\theta^*)\big] = 0. \tag{4.13}
\]
In the misspecified setting, because the oracle estimator $\theta^*$ is defined as an unconstrained minimum, we have $\nabla\mathbb{E}(\ell_n(\theta^*)) = 0$. Under certain mild regularity conditions on $\ell_n$ that hold for all our applications, the differentiation and integration can be interchanged, which means that (4.13) holds. Given that both lemmas hold, we have that Theorem 1 must hold for the oracle estimator, which further implies that the

revenue gap bound in Theorem 2 holds since the revenue function 푓푧 remains the same.

In the cases of both Theorem 2 and Theorem 3, we use an estimation error bound to quantify the rate at which the revenue gap shrinks to zero as the number of samples grows, namely a rate of $\frac{1}{\sqrt{n}}$. The difference between the two settings is in the interpretation of the oracle benchmark. In Theorem 2, the oracle policy knows the parameters of the true underlying distribution, whereas in Theorem 3 the oracle knows the parameters of the logit model which best reflects customer choice.

4.5.2 High-Dimensional Setting

In previous sections, our bounds are increasing functions of the number of features $d$. This is reasonable when $d$ remains fixed and $n$ grows large. However, as companies continue to collect more granular and larger quantities of data concerning their customers, there are applications where the number of customer features matches or even exceeds the number of data points. Examples of this could include highly granular GPS information or detailed lists of preference comparisons featuring irrational cycles that are not easily specified without introducing a combinatorial number of parameters. In such a high-dimensional setting, where $d$ is comparable to or larger than $n$, a bound of the form presented above is of no use.

In order to derive analogous bounds for the performance of our method in this setting, we must make assumptions about the sparsity of the true parameter $\theta^*$. Specifically, for some index set $S \subset \{1, \ldots, d\}$ with $|S| = s \ll d$, we assume that $\theta^* \in \Theta(S) := \{\theta \in \mathbb{R}^d : \theta_j = 0,\ \forall j \notin S\}$. In the well-specified case this means that a small portion of the available feature data can be used to precisely quantify the likelihood of purchase. A natural way to obtain an estimator of such a sparse vector is to use $\ell_1$-norm regularization, as we have considered in the low-dimensional case. The sparsity assumption allows us to prove theoretical revenue bounds analogous to those in the low-dimensional setting. Using a Lagrangian form of (4.5), we obtain an estimator given by
\[
\hat\theta_{\lambda_n} \in \operatorname*{argmin}_{\theta\in\mathbb{R}^d}\big\{\ell_n(\mathcal{T};\theta) + \lambda_n\|\theta\|_1\big\}. \tag{4.14}
\]

By standard duality theory (see Chapter 6 in Bertsekas et al. (2003) for example), (4.5) and (4.14) are equivalent in the sense that there is a one-to-one correspondence

between 휆푛 and 푅. The change in form here is made only for convenience in proving the high-dimensional results. We include here the high-dimensional version of our revenue bound for the well-specified random design customized pricing problem.

Theorem 4. As long as $n > 16\left(\frac{\kappa_2}{\kappa_1}\right)^2 s\log d$, the objective function error associated with $(\hat\beta, \hat\gamma)$, the solution to (4.14) with regularization parameter $\lambda_n = 4\psi\sqrt{\frac{\log d}{2n}}$, can be bounded as follows:
\[
f_z(p^*, \theta^*) - f_z(\hat p, \theta^*) \le \left(\max_{k\in[K]} p_k\right)\frac{12B\psi\sqrt{\log d}}{\kappa_1\sqrt{2n} - 4\kappa_2\sqrt{2s\log d}},
\]
for all bounded feature vectors $z$, with probability at least $1 - 2d\exp\!\left(-\frac{n}{4}\right) - c_1\exp(-c_2 n) - \frac{2}{d}$. Here $\kappa_1$, $\kappa_2$, $c_1$, and $c_2$ are positive constants depending only on $B$, $\psi$, $\rho$, and $\Sigma$.

The proofs are highly technical and follow from Negahban et al. (2012). Notice that although we cannot apply Theorem 2 in the high-dimensional setting, since $d$ grows faster than $n$ (and thus the bounds on the estimation error and revenue loss go to infinity as $n$ grows), Theorem 4 does apply. In particular, the dimension parameter $d$ only appears logarithmically. Since $s$ remains small as $d$ grows, by the common sparsity assumption, the bound shrinks to zero even when $d$ is exponential in $n^c$ with $c < 0.5$ (e.g., $\log(d) = o(\sqrt{n})$). We also note that in practice, we cannot directly rely on this guarantee because the regularization parameter $\lambda_n$ depends on the unknown quantity $\psi$. However, one can tune $\lambda_n$ by cross validation to achieve good performance.
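For concreteness, the following sketch (not part of the original analysis) illustrates the cross-validation tuning of $\lambda_n$, using scikit-learn's $\ell_1$-penalized logistic regression as a stand-in for (4.14); in scikit-learn the parameter C corresponds inversely to the penalty weight, and the variable names are hypothetical.

```python
# Sketch: tuning the l1 penalty of the purchase model by cross-validation, as a
# stand-in for choosing lambda_n in (4.14). X stacks the price dummies, customer
# features, and any interactions (the vector x in the text); y indicates purchase.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def fit_l1_cv(X, y, n_folds=5):
    model = LogisticRegressionCV(
        Cs=np.logspace(-3, 2, 20),   # grid over the (inverse) regularization strength
        penalty="l1",
        solver="saga",               # saga supports the l1 penalty
        cv=n_folds,
        scoring="neg_log_loss",      # pick the penalty by held-out log-likelihood
        max_iter=5000,
    )
    model.fit(X, y)
    return model
```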

4.6 Extensions and Future Work

We have developed a framework for modeling decision problems in which actions can be personalized by taking into account the information available to the decision maker. Assuming a logistic model to describe outcome probabilities in terms of the features and the seller's decision, we demonstrate that learning takes place reliably by establishing finite-sample, high-probability convergence guarantees for model parameters which hold regardless of the number of customer types, which can be potentially uncountable. These bounds apply between the fitted model and the true minimizer of constrained empirical risk, whether or not the model has been well-specified. The parameter convergence guarantees can then be extended to performance bounds in operational problems, as we show for the case of customized pricing.

Many of the papers that study dynamic pricing consider inventory constraints over multiple periods, whereas our results do not explicitly consider inventory and are inherently myopic due to the offline nature of logistic regression. Due to this offline nature, our method does not work to dynamically learn and optimize over time, and it would be interesting to examine how these results could be adapted to such a setting. An alternation scheme between exploration and exploitation phases, such as that suggested by Rusmevichientong et al. (2010a), would allow for asymptotic convergence, but the logistic regression would need to be recomputed at each step, which could be cumbersome even with a warm start as the size of the data set grows large. A potential remedy for this would be to learn from a finite window of past data. Such a scheme could account for changes in parameters over time, in which case asymptotic convergence becomes meaningless.

Beyond problems in revenue management, our approach is relevant in many other situations in which decisions resulting in discrete outcomes can benefit from taking into account explicit contextual information. One such example is online advertisement allocation, in which we would like to predict click-through rates and make the optimal advertisement selection taking into account information we have about each viewer. Another possible application is crowdsourcing, in which we would like to specialize our work schedule based on information we have gathered concerning our workers, the available tasks, and the interaction between their attributes. Finally, beyond the specific domain of operations management, we envision applications in personalized medicine in which the likelihood of success of a treatment or the probability of disease could be predicted and decisions optimized by taking into account information concerning each patient. We would like to explore these applications in

the future.

Chapter 5

Concluding Remarks

In the preceding chapters we proposed a number of tools for revenue management. In this chapter, we summarize our results and propose interesting directions for future work.

5.1 Summary

In Chapters 2 and 3 we developed algorithms and associated performance guarantees for price and assortment optimization in systems of reusable resources. Chapter 2 focused on the setting of time-homogeneous demand which allows us to isolate the complexity that random service times introduce into a revenue management setting. In both the settings of assortment-only and joint pricing and assortment optimization, we demonstrated that a simple randomized algorithm is able to achieve at least half of the revenue obtained by the optimal dynamic policy. Further in the case that prices are fixed for each resource, we demonstrate that it is possible to obtain the optimal static assortment policy using a non-linear column-generation technique. We also introduced fairness considerations and demonstrated that computing a policy that does not price discriminate can be computationally challenging. Despite this difficulty, we show that in the case of single item assortments it is possible to efficiently obtain a fair policy at the expense of an additional approximation factor in the performance guarantee. In addition, in both the assortment-only and joint pricing and assortment

settings we introduce heuristic dynamic algorithms that perform well when real-time utilization information is available.

One fundamental theme of this second chapter is that the ability to solve single-stage problems efficiently may be used to help find effective policies for managing large networks of reusable resources in continuous time. Our computational results also shed light on when a greedy strategy is effective and when an operator should invest in more sophisticated techniques. We find that in extremely light and extremely heavy traffic a greedy heuristic performs well. On the other hand, when the number of resources and demand are in relative balance, it becomes more important to employ a strategy that accounts for the complexity and diversity of customer tastes. These experiments also validated the intuitive notion that dynamic pricing is less effective under demand that is steady over time. Indeed, in our experimental setting the additional revenue generated by a dynamic pricing policy over that of a well-chosen fixed-price policy was less than 5% of the revenue of the optimal policy.

Motivated by this realization, in Chapter 3 we extended the fundamental ideas from the previous chapter to the setting of time-varying demand rates. This introduced additional complexity into our policies and solutions and required entirely different technical methodology. In particular, our models in this chapter are able to quantify the impact of selling a resource in the current time period on its availability in future periods. These estimates are then used in our dynamic algorithm, which makes assortment and pricing decisions based on time-varying bid prices for each resource. Our computational experiments demonstrate that these algorithms perform well under all load levels and maintain steady performance in instances where the performance of more myopic strategies degrades.

The results of this section again demonstrate the power of single-stage price and assortment optimization in solving complex continuous-time resource allocation problems in networks of reusable resources. Another useful facet of our strategy is that the solution methodology is invariant to the scale of the system. This means that our policies can be applied in systems with extremely large numbers of customers and individual resources so long as the customers and resources can be split into a much smaller number of types. Finally, our policies are asymptotically optimal, to a scalable approximation factor, and therefore present even stronger guarantees in large, busy networks.

In Chapter 4, we studied the problem of learning parameters for a logistic model in the context of personalized pricing. We demonstrated finite sample bounds on the performance of our algorithm that improve as the size of the transaction data it is trained on increases. These bounds apply between the fitted model and the true minimizer of the constrained expected risk, whether or not the model has been well-specified.

5.2 Future Directions

Our work leaves open many interesting avenues for further research in systems of reusable resources. In particular, our algorithms rely on an unbiased estimate of the true demand functions in each period. Relaxing this assumption could yield interesting work in a number of directions. First, the problem of learning while optimizing has been a topic of much research in the revenue management community, but to our knowledge none of these algorithms have yet been tailored to the reusable setting. Second, another popular approach to optimization under uncertainty is to follow the paradigm of robust optimization. In such a circumstance the operator would define a set of possible outcomes, and a strategy would be developed to optimize for the best worst-case outcome. The interaction between reusable resources and robust optimization could be an interesting frontier for future research to explore.

Another interesting avenue for future research could be around the trade-offs between fairness and achievable revenue in large systems with customer substitution behavior. It is possible that some system structures lend themselves well to using customer choice as a surrogate for price discrimination. Characterizing the features of systems that allow for or prevent this type of manipulation is another interesting question.

In this thesis we have worked with expected revenue as our optimization criterion. Although this is common in the revenue management literature, in the closely related field of online matching a different criterion, the competitive ratio, is often used to bound the performance gap between the proposed policy and the optimal policy under all possible demand realizations. Extending the ideas presented in this thesis to a scenario of online matching with reusable resources is another interesting direction for future research.

Appendix A

Technical and Experimental Results for Chapter 2

A.1 Proofs

A.1.1 Proof of Proposition 1

Proof. We begin by examining a single resource $i$. Since prices are equal for all customer types, we have that $J_i^{\pi} = \lim_{T\to\infty}\frac{1}{T}\,\mathbb{E}_{\pi}\!\left[\int_0^T r_i\sum_{k=1}^K W_{ik}(x)\,dx\right]$. By the Fubini-Tonelli theorem for non-negative integrands we have
\[
J_i^{\pi} = \lim_{T\to\infty}\frac{1}{T}\int_0^T r_i\sum_{k=1}^K \mathbb{E}_{\pi}\big[W_{ik}(x)\big]\,dx.
\]
Therefore, in order to demonstrate that the AASP performs no worse than the TASP, we seek to demonstrate that for each resource the expected steady-state utilization is no lower. Under the TASP as defined by $\alpha^{LP}$, the base rate of arrivals of customer type $k$ to resource $i$ is given by $\tilde\lambda_{ik} = \lambda_k\sum_{S\subseteq\mathcal{S}}\alpha_k^{LP}(S)P_{ik}(S)$, and the overall arrival rate is $\tilde\lambda_i = \sum_{k=1}^K\tilde\lambda_{ik}$. This leads to the customer-type traffic intensity $\tilde\rho_{ik} = \frac{\tilde\lambda_{ik}}{\mu_{ik}}$ and overall traffic intensity $\tilde\rho_i = \sum_{k=1}^K\tilde\rho_{ik}$. Given this traffic intensity, the Erlang-B formula (2.7) yields the steady-state rate of rejection, $B(\tilde\rho_i, C_i)$. Using this, the steady-state rate of customers of each type accepted into the system is given by $(1-B(\tilde\rho_i, C_i))\tilde\lambda_{ik}$. Further, by Little's Law and the PASTA property, the steady-state utilization of resource $i$ by customers of type $k$ is given by $\mathbb{E}_{\pi}[W_{ik}(x)] = (1-B(\tilde\rho_i, C_i))\tilde\rho_{ik}$, which leads to an overall utilization of $\mathbb{E}_{\pi}[W_i(x)] = (1-B(\tilde\rho_i, C_i))\tilde\rho_i$. Since $(1-B(\tilde\rho_i, C_i))\tilde\rho_i$ is increasing in $\tilde\rho_i$, which itself is increasing in each arrival rate $\tilde\lambda_{ik}$, it suffices to show that the base arrival rates are no lower under the AASP.

Under the AASP, each assortment $S$ containing $i$ as selected by the TASP is adjusted to $\bar S = \{j\in S : W_j < C_j\}$ prior to being presented to the customer. Then the probability that an arriving customer of type $k$ purchases product $i$ from $\bar S$ satisfies $P_{ik}(\bar S) \ge P_{ik}(S)$ by the weak rationality assumption. Thus we have that the steady-state traffic intensity of resource $i$ under the AASP, $\bar\rho_i = \sum_{k=1}^K\frac{\lambda_k}{\mu_{ik}}\sum_{S\subseteq\mathcal{S}}\alpha_k^{LP}(S)P_{ik}(\bar S)$, must exceed or match that of the TASP, and therefore,
\[
J^{AASP} = \sum_{i=1}^n r_i\big(1-B(\bar\rho_i, C_i)\big)\bar\rho_i \;\ge\; \sum_{i=1}^n r_i\big(1-B(\tilde\rho_i, C_i)\big)\tilde\rho_i \;=\; \sum_{i=1}^n J_i^{TASP} \;\ge\; \min_{i=1,\ldots,n}(1-B_i)\,J^*,
\]
as claimed.
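For reference, the following short sketch (illustrative only, with hypothetical function names) computes the Erlang-B blocking probability $B(\rho, C)$ via the standard recursion, together with the effective utilization $(1 - B(\rho, C))\rho$ used in the argument above.

```python
# Sketch: the Erlang-B blocking probability B(rho, C) via the standard recursion,
# and the resulting steady-state utilization (1 - B(rho, C)) * rho used in the proof.
def erlang_b(rho, C):
    """Blocking probability for offered traffic intensity rho and C servers."""
    b = 1.0
    for c in range(1, C + 1):
        b = rho * b / (c + rho * b)
    return b

def effective_utilization(rho, C):
    return (1.0 - erlang_b(rho, C)) * rho

if __name__ == "__main__":
    # Effective utilization is nondecreasing in the offered traffic, as used above.
    print([round(effective_utilization(r, 5), 3) for r in (1.0, 3.0, 5.0, 10.0)])
```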

A.1.2 Proof of Proposition 2

Proof. We begin by decomposing 푓 into the sum over resource-wise revenue functions

$f_i$ and by further examining the associated effective traffic functions $g_i$. By Corollary 1 of Harel (1990), taking $\mu = 1$ in the notation of that paper, we obtain that functions of the form $g(\rho) = \rho(1-B(\rho, c))$ are concave in $\rho$ for any fixed integral capacity $c > 0$. Therefore the effective traffic functions $g_i$ are each concave in the scalar argument $\rho$. Then for any $\alpha$ and $\alpha'$, represented as $K2^N$-vectors, and any $\tau\in[0,1]$, the effective utilization implied by the linear combination $\tau\alpha + (1-\tau)\alpha'$ is exactly $\tau\rho + (1-\tau)\rho'$. Therefore, by linearity of $\tilde\rho$ in $\alpha$, we have that

′ ′ ′ 푔푖(휏훼 + (1 − 휏)훼 ) = (휏휌˜푖(훼) + (1 − 휏)˜휌푖(훼 )) (1 − 퐵 (휏휌˜푖(훼) + (1 − 휏)˜휌푖(훼 )) , 퐶푖)

′ ′ =휌 ˜푖(휏훼 + (1 − 휏)훼 ) (1 − 퐵(˜휌푖(휏훼 + (1 − 휏)훼 ), 퐶푖)

′ ′ ≥ 휏휌˜푖(훼) (1 − 퐵(˜휌푖(훼), 퐶푖) + (1 − 휏)˜휌푖(훼 ) (1 − 퐵(˜휌푖(훼 ), 퐶푖)

′ = 휏푔푖(훼) + (1 − 휏)푔푖(훼 ),

demonstrating that the effective traffic functions $g_i$ are concave in the allocation variables. Thus, since our overall objective (2.11) is the sum of concave functions scaled by positive revenue factors, it is itself concave in $\alpha$, as claimed.
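The concavity property borrowed from Harel (1990) can also be spot-checked numerically. The sketch below (a numerical check, not a proof, with hypothetical names) verifies midpoint concavity of $g(\rho) = \rho(1 - B(\rho, c))$ on a grid.

```python
# Numerical spot-check (not a proof) that g(rho) = rho * (1 - B(rho, c)) is midpoint
# concave on a grid, illustrating the property cited from Harel (1990).
def erlang_b(rho, c):
    b = 1.0
    for k in range(1, c + 1):
        b = rho * b / (k + rho * b)
    return b

def g(rho, c):
    return rho * (1.0 - erlang_b(rho, c))

if __name__ == "__main__":
    c = 8
    grid = [0.1 * i for i in range(1, 200)]
    ok = all(g(0.5 * (r1 + r2), c) >= 0.5 * (g(r1, c) + g(r2, c)) - 1e-12
             for r1 in grid for r2 in grid)
    print("midpoint concavity holds on the grid:", ok)
```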

A.1.3 Proof of Lemma 4

Proof of Lemma 4. To demonstrate the value of beginning in a higher state, we follow the process $W(x)$ with $X(0) < C$ and the incremented process $Y(x)$ with $Y(0) = X(0) + 1$. We show that the total reward satisfies $R_W(\tau) \le R_Y(\tau)$ for all finite $\tau$, and in particular we show that there exists a time $\tau^*$ that is almost surely finite and for which $X(\tau^*) = Y(\tau^*)$. This implies that $R_{Y-X}(\tau^*) \ge 0$ and that $R_{Y-X} = R_{Y-X}(\tau) = R_{Y-X}(\tau^*)$ for all $\tau > \tau^*$.

transitions take place at a rate of 휆 + 퐶휇. Let 휉 = {휉1, 휉2,...} denote the infinite series of such transition intervals and let 휏 be the associated series of transition times ∑︀푘 defined by 휏푘 = 푖=1 휉푖 with 휏0 = 0. Each transition corresponds to a customer arrival or a potential customer departure. Arrival transition events occur with probability

휆 퐶휇 휆+퐶휇 and departure events occur with probability 휆+퐶휇 . Each departure event is further mapped with uniform probability to one of the 퐶 potential customers since we assume here that service rates are equal. We let 푗(푘) ∈ {1, . . . , 퐶} denote the

145 number of the potential customer departure. We then define the state transitions,

∙ If 휏푘 is an arrival event: 푋(휏푘) = min (푋(휏푘−1) + 1, 퐶)

∙ If 휏푘 is a departure event : 푋(휏푘) = 푋(휏푘−1) − I(푗(푘) ≤ 푋(휏푘−1))

With these definitions we then define the paired process 푆(푥) = (푊 (푥), 푌 (푥)) in which each component evolves as defined above. We observe that upon each transition the two processes either remain in a state such that 푌 (푥) = 푊 (푥) + 1. Once the processes become equal they evolve in the same manner for the remainder of the sample as they are faced with the same sequence of future events. The evolution of the paired process depends on its current state, for completeness we describe these dynamics below.

∙ If 푆(푥) = (0, 1) there are three cases. If an arrival occurs we have 푆(휏푘) = (1, 2).

If a departure occurs such that 푗(푘) = 1 then the system couples at 푆(휏푘) = (0, 0). If a departure occurs such that 푗(푘) > 1, then both processes remain in

the same state and 푆(휏푘) = (0, 1).

∙ If 푆(푥) = (푗, 푗 + 1) with 1 ≤ 푗 ≤ 퐶 − 2, then there are four cases. If an arrival

occurs we have 푆(휏푘) = (푗 + 1, 푗 + 2). If a departure occurs such that 푗(푘) ≤ 푗

then 푆(휏푘) = (푗 − 1, 푗). If a departure occurs such that 푗(푘) = 푗 + 1 then

the system couples at 휏푘. If a departure occurs with 푗(푘) > 푗 + 1, then both

processes remain in the same state and 푆(휏푘) = (푗, 푗 + 1).

∙ If 푆(푥) = (퐶 − 1, 퐶), then there are three cases. If an arrival occurs it is

blocked in the 푌 (푥) chain and the system couples at 휏푘 with 푆(휏푘) = (퐶, 퐶).

If a departure occurs with 푗(푘) < 퐶, then the state transitions to 푆(휏푘) = (퐶 − 2, 퐶 − 1). Otherwise if a departure occurs and 푗(푘) = 퐶 then the system

again couples at 휏푘 with 푆(휏푘) = (퐶 − 1, 퐶 − 1).

* Let 휏 denote the random time when the process couples. Since 푅푊 (휏0) = * 푅푌 (휏0) = 0 we observe that in each such transition 휏푘 ≤ 휏 , we maintain the in- * variant, 푅푊 (휏푘) = 푅푊 (휏푘−1) ≤ 푅푊 (휏푘−1) + 휉푘 = 푅푌 (휏푘) and 푅푊 −푌 (휏) = 푅푊 −푌 (휏 ) for all 휏 > 휏 *.

146 휇 Thus at each such transition with probability at least 휆+퐶휇 . Thus the number of transition events required to reach a coupling state is stochastically dominated by a geometric random variable with this probability of success. The sum of a geometric number of exponential random variables is itself exponential, thus 휏 * is stochastically

휇 * dominated by an exponential random variable with rate (휆+퐶휇)2 . It follows that 휏 is * almost surely finite and that 푅푊 −푌 = 푅푊 −푌 (휏 ). Since this argument holds for all beginning states 푋(0) < 퐶, it follows that this induces an ordering of future rewards

over the state space and so 푅푊 −푌 ≤ 0 for all processes such that 푋(0) ≤ 푌 (0), proving the lemma.

A.1.4 Proof of Proposition 4

Proof of Proposition 4. As before we let 푇 * denote the random hitting time for state 퐶 so that 푋(푇 *) = 퐶. We prove the proposition by demonstrating that upon reaching state 퐶 the total future revenue must exceed that attributable to steady state.

For any utilization level 푗 ∈ {1, . . . , 퐶} and 휏 > 0, we define the auxiliary processes

휏 휏 푌푗 (푥) to be a process satisfying 푌푗 (휏) = 푗. Due to the memorylessness propoerty we observe that upon hitting 푇 * we can equivalently represent the total future expected revenue of the process in steady state as a distribution over such auxiliary functions, weighted by their likelihood in steady state, beginning at 푇 *.

The total samplewise regret can then be broken up into two epochs by the hitting time 푇 *. We note that since 푇 * is distributed as the sum of a finite number of exponential random variables it is finite almost surely. We now bound the total loss

147 versus steady state for a given sample path by,

∫︁ ∞ ∫︁ 푇 * ∫︁ ∞ ℒ푖(푥) = ℒ푖(푥) + ℒ푖(푥) 푑푥 0 0 푇 * ∫︁ ∞ 퐶푖 * ∑︁ 푖 (︀ 푇 * )︀ ≤ 푟푖퐶푖푇 + 푟푖 휋푗 푌푗 (푥) − 푊 (푥) 푑푥 푇 * 푗=1

퐶푖 * ∑︁ 푖 = 푟푖퐶푖푇 + 휋 푅 푇 * 푗 푌푗 −푋 푗=1

* ≤ 푟푖퐶푖푇 .

The first equality follows from the fact that the maximum instantaneous lossis 푟푖퐶푖 and the characterization of the future steady state revenue as a distribution over

휏 the auxiliary processes 푌푗 (휏). The second inequality is a result of an application of lemma 4. Finally, taking expectation over sample paths yields the desired result.

[︂∫︁ ∞ ]︂ * E |ℒ푖(푥)| 푑푥 ≤ 푟푖퐶푖E[푇 ] 0 퐶 ∑︁푖 1 = 푟 퐶 푖 푖 휈 푗=1 푗

≤ 푟푖퐶푖퐻(퐶푖)

≤ 푟푖퐶푖 (ln(퐶푖 + 1) + 훾)

∑︀푐 1 Where 퐻(푐) = 푗=1 푗 denotes the 푐th harmonic number and 훾 denotes the Euler- Mascheroni constant satisfying 훾 < 0.578.

A.1.5 Proof of Proposition 5

We claim that the FPPAO problem is NP-hard. To see this we first formulate the fair pricing and personalized assortment decision problem (FPPAD). With the additional argument 푏, the FPPAD problem specified by (풮, 풫, ℳ, 푏) is to determine whether or not there exists a pricing and assortment decision (푅, 푆) with objective value of

at least 푏.

Lemma 8. The 푀-capacitated FPPAD problem is NP-complete even when ℳ푘 is specified by a multinomial logit model for each customer type 푘 and there are two prices per item.

Proof. We prove this by reduction from 3-SAT. An instance of 3-SAT is described

by a set of 푛 Boolean variables 푋 = {푥푖 : 푖 ∈ [푛]}, a derived set of literals 퐿 = 푋 ∪ {푥¯ : 푥 ∈ 푋} consisting of 푋 as well as its negations, and a set of 푚 clauses

$C = \{c_j : j \in [m]\}$ over the literals, where each clause consists of exactly 3 literals, so that $c_j = (\lambda_{j1}, \lambda_{j2}, \lambda_{j3})$ for some literals $\lambda_{j1}, \lambda_{j2}, \lambda_{j3} \in L$ corresponding to distinct variables in $X$. A truth assignment is a function $T : X \to \{0, 1\}$; if $T(x_i) = 1$ we say that $x_i$ is set to true under $T$, and false otherwise. A clause $c_j$ is satisfied under the truth assignment $T$ if at least one of its literals is true. Given the set of variables $X$ and the set of clauses $C$, the 3-SAT problem is to determine if there exists a truth assignment $T$ so that each of the $m$ clauses is satisfied. It is well known that 3-SAT is NP-complete (Cook 1971). Now we show that an arbitrary instance of 3-SAT can be decided using the FPPAD. To demonstrate this reduction we only require assortments of size 1, so that

$S_1 \in [N]$. In our FPPAD instance we create a product for each variable, so that $N = n$, and a customer type for each clause, so that $K = m$. Each product $i \in [N]$ has two possible prices, $p_{i1} = 1/\epsilon$ and $p_{i2} = 1$, for $\epsilon \in (0, 1)$ to be defined subsequently. The choice model $\mathcal{M}_k$ for customer type $k$ is specified by the corresponding clause $c_k$. Specifically, if $x_i \in c_k$ then $\mathbb{P}_{\mathcal{M}_k}(i; i, p_{i1}) = \mathbb{P}_{\mathcal{M}_k}(i; i, p_{i2}) = \epsilon$, and if $\bar x_i \in c_k$ then $\mathbb{P}_{\mathcal{M}_k}(i; i, p_{i1}) = \epsilon^2$ and $\mathbb{P}_{\mathcal{M}_k}(i; i, p_{i2}) = 1 - \epsilon$. For all other products $j$, $\mathbb{P}_{\mathcal{M}_k}(j; j, p_{j1}) = \mathbb{P}_{\mathcal{M}_k}(j; j, p_{j2}) = \epsilon^2$. By setting $\epsilon < 1/(3m+1)$, we ensure that the instance of 3-SAT is satisfiable if and only if there is a pricing and assignment solution with total revenue of at least $m(1-\epsilon)$.

To see this, suppose that 3-SAT$(X, C)$ is satisfiable with truth assignment $T$. Then

in terms of the construction above we choose the pricing decision $d = \{p_{i1}\mathbb{I}_{T(x_i)=1} + p_{i2}\mathbb{I}_{T(x_i)=0} : i \in [n]\}$.

푝푖2I푇 (푥푖)=0 : 푖 ∈ [푛]}. For each clause 푐푗 let 푥ℓ be an arbitrary variable corresponding

to one of the satisfied literals under $T$; then the desired assortment policy is $S_j = \{\ell\}$. Since each clause is satisfied, it follows that under this pricing and assortment policy each customer type is offered a product $\ell$ with expected revenue $p\,\mathbb{P}_{\mathcal{M}_k}(\ell; \ell, p)$ of at least $1-\epsilon$. Thus, the total value of the solution is at least $m(1-\epsilon)$. On the other hand, suppose this instance of 3-SAT is not satisfiable. In this case, for any truth assignment $T$, there is at least one clause $c_{j(T)}$ that is not satisfied. Since the maximum revenue that can be derived from each satisfied class is 1 and the maximum revenue from each unsatisfied class is $\epsilon$, it follows that the maximum revenue achievable in this case is $m - 1 + \epsilon$, and we have $m - 1 + \epsilon < m(1-\epsilon)$ so long as $\epsilon < \frac{1}{m+1}$. Hence, an instance of 3-SAT is satisfiable if and only if the associated fair pricing and personalized assortment instance attains a revenue of at least $m(1-\epsilon)$. This demonstrates that the FPPAD problem is NP-complete. Finally, we observe that as we only require assortments of unit size and the purchase probabilities lie in the interval $(0, 1)$, each choice model can easily be specified by an MNL model.

The previous lemma implies that the associated optimization problem is NP-hard, and in particular it is NP-hard in general to solve the column generation subproblem in the $M$-capacitated fair pricing upper-bounding linear program. Thus Proposition 5 is proven.

A.2 Computational Experiment Model-Fitting Methodology

We seek to model the behavior of all potential customers arriving to the system; however, the arrival rates inferred from our data set consist only of customers who are using the PayByPhone system rather than the cash-based payment system, who observe a meter with at least one open space, and who decide that their utility for parking at the meter exceeds the expected price. That is, for each meter $k$ we observe an arrival rate $\lambda_k^{\mathrm{obs}}$ that is less than the true arrival rate $\lambda_k$. To aid in estimating $\lambda_k$ we define a number of variables. We let $\xi$ denote the proportion of arriving customers using the PayByPhone system rather than paying in cash. We assume that the mean utility a customer gains from successfully parking is $\bar u$ units greater than the price paid at rate $r_k$ for each resource $k$. This utility is naturally unitless, but by comparing it to the price sensitivity parameter $\beta$ it can be understood in monetary terms. Using this utility assumption we are able to compute $\kappa = \frac{1}{1+\exp(-\bar u)}$, the proportion of customers arriving to an unblocked meter who elect to purchase and park. Using these variables, the relationship between the observed arrival rate and the true arrival rate can be expressed as

\[
\frac{\lambda_k^{\mathrm{obs}}}{\xi} = \kappa\,\lambda_k\left(1 - B\!\left(\frac{p\,\lambda_k}{\mu_k},\, C_k\right)\right). \tag{A.1}
\]

Due to the monotonicity of the right side of equation (A.1) in $\lambda_k$, we can use a root-finding algorithm to estimate the true arrival rate for any fixed values of $\xi$ and $\kappa$. For the purposes of this initial analysis we assume that $\bar u = 2.0$. This implies that, of the potential customers arriving to each meter, about 88% proceed with their purchase if their desired space is available. That is, only 12% of potential customers who would otherwise choose to park at a particular location forgo parking due to cost. We also take $\xi = 0.6$, meaning that 60% of all customers are using the PayByPhone system. Using these values and the supplied observed arrival and parking rates, a small fraction of the estimated true arrival rates are infinite. These are meters for which the estimated parking rate is high in comparison to the observed arrival rate. After excluding such meters we are left with 287 meters and customer types for our numerical experiments.
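The root-finding step can be illustrated with the following sketch, which is not part of the original methodology: it assumes, for concreteness, that the traffic intensity offered to meter $k$ is $\kappa\lambda_k/\mu_k$, and uses a bracketing root finder on (A.1); all names and the upper bracket are hypothetical, apart from the values $\bar u = 2.0$ and $\xi = 0.6$ quoted above.

```python
# Sketch of the root-finding step for (A.1). The traffic intensity offered to the
# meter is taken here to be kappa * lam / mu, an assumption made for concreteness.
import numpy as np
from scipy.optimize import brentq

def erlang_b(rho, C):
    b = 1.0
    for c in range(1, C + 1):
        b = rho * b / (c + rho * b)
    return b

def estimate_true_rate(lam_obs, mu, C, u_bar=2.0, xi=0.6, lam_max=1e3):
    kappa = 1.0 / (1.0 + np.exp(-u_bar))

    def gap(lam):
        rhs = kappa * lam * (1.0 - erlang_b(kappa * lam / mu, C))
        return rhs - lam_obs / xi

    if gap(lam_max) < 0.0:
        return np.inf          # saturated meter: no finite root, as discussed in the text
    return brentq(gap, 0.0, lam_max)
```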

A.3 Computational Experiment Model Sensitivity

To test the sensitivity of our analysis to the chosen parameters we tested a wide range of parameter settings for the arrival rates 휆푘, the distance sensitivity 휂, and the price sensitivity 훽. Due to the significant computation time needed to solve for

the linear programming-based policies in the full 287-resource case, for the purposes of this sensitivity analysis we restricted our attention to the 57 meters lying in the southwest quadrant of the borough. The chosen ranges of multipliers are broken down as follows.

∙ 휆푘: Scaled by values from 0.2 to 2.6 in increments of 0.2

∙ 휂: from 2.0 to 10.0 in increments of 2.0

∙ 훽: from 0.2 to 2.0 in increments of 0.2

For each combination of these multipliers we computed the corresponding FPO policy and Current Price policy, along with the actual performance and the upper bound associated with each. For brevity, we report a representative sample of the results in Tables A.1, A.2, and A.3. The first value in each table cell is the percentage of the upper bound that is obtained by the FPO algorithm, and the second value represents the increase in revenue generated by the dynamic pricing FPO policy over the corresponding Current Price policy. From these numerical results we observe that in all cases the proposed FPO policy attains at least 75% of the maximum achievable revenue. As expected, this fraction increases as the arrival rate decreases, since this mimics the effect of increasing resource capacity. We note that the FPO policy appears to provide the greatest increase in revenue rate over the current fixed prices when customer price sensitivity is small. As customers grow increasingly price sensitive, the FPO algorithm provides less and less additional revenue. We also observe that the FPO generates an increase in revenue in almost every scenario except those featuring anemic arrival rates in combination with high price sensitivity. In such cases the FPO is too aggressive in discounting and thereby maintains high traffic at many resources, causing increased blocking relative to the current pricing baseline.

            η = 2            η = 6            η = 10
0.4 × λ     85.8% (21.5%)    86.2% (24.7%)    86.1% (28.6%)
1.0 × λ     84.7% (33.1%)    85.1% (36.4%)    85.4% (34.1%)
1.6 × λ     78.9% (36.2%)    79.6% (40.3%)    80.7% (40.3%)

Table A.1: Steady State Revenue Rate Sensitivity, 훽 = 0.4

            η = 2            η = 6            η = 10
0.4 × λ     85.2% (-1.8%)    85.3% (0.5%)     85.5% (5.7%)
1.0 × λ     83.0% (6.3%)     83.4% (9.7%)     83.6% (8.7%)
1.6 × λ     76.5% (7.6%)     77.7% (12.3%)    78.4% (12.3%)

Table A.2: Steady State Revenue Rate Sensitivity, 훽 = 1.0

            η = 2            η = 6            η = 10
0.4 × λ     85.4% (-4.9%)    85.3% (-2.8%)    85.3% (0.3%)
1.0 × λ     82.4% (2.1%)     82.3% (5.4%)     82.5% (4.9%)
1.6 × λ     76.1% (3.7%)     75.8% (6.8%)     76.1% (3.7%)

Table A.3: Steady State Revenue Rate Sensitivity, 훽 = 1.6

Appendix B

Technical Results, Experimental Results, and Extensions for Chapter 3

B.1 Proofs

B.1.1 Proof of Proposition 6

Proof. To prove our claim we demonstrate that the $\alpha_{ks}$ variables associated with any admissible policy must satisfy the constraints of the linear programming formulation (3.10). The simplex constraints of (3.10) are immediately satisfied by any probability distribution, and thus we focus our attention here on the capacity utilization constraint. We first consider any resource $i$ and an arbitrary subinterval $t$. Recall that a policy in the finite horizon setting induces a distribution over the random variables describing the system dynamics during the $t$-th subinterval, $Q_{it}$, $A_{it}$, and $D_{it}$, with expectations $\bar Q_{it}$, $\bar A_{it}$, and $\bar D_{it}$. We note that the initial utilization of the segment plus the number of accepted arrivals within it can be no greater than the sum of the number of departures within the interval and the resource capacity $C_i$. Formally, any realization of $A_{it}$, $Q_{it}$, and $D_{it}$ must satisfy the inequality

\[
A_{it} + Q_{it} \le C_i + D_{it}.
\]

Since this inequality must be satisfied for each realization in subinterval $t$, it must also be satisfied by the associated expectations for subinterval $t$. Thus, $\bar A_{it}$, $\bar Q_{it}$, and $\bar D_{it}$ must satisfy
\[
\bar A_{it} + \bar Q_{it} \le C_i + \bar D_{it}. \tag{B.1}
\]

Further, the expected number of departures $\bar D_{it}$ can be bounded above by assuming that all arrivals occur at the beginning of the subinterval, so that we have $\bar D_{it} \le (\bar Q_{it} + \bar A_{it})(1 - \exp(-\mu\theta))$. Substituting this inequality into the capacity bound (B.1) and rearranging, we obtain
\[
e^{-\mu\theta}\bar A_{it} \le C_i - e^{-\mu\theta}\bar Q_{it}. \tag{B.2}
\]

Next, we note that the expected initial utilization for subinterval 푡 is related to the associated expectations from the previous subinterval by the recurrence relation,

\[
\bar Q_{it} = \bar Q_{i(t-1)} + \bar A_{i(t-1)} - \bar D_{i(t-1)}.
\]
Bounding the number of departures $\bar D_{i(t-1)}$ in an analogous manner, we obtain the inequality
\[
\bar A_{i(t-1)} \le e^{\mu\theta}\bar Q_{it} - \bar Q_{i(t-1)}. \tag{B.3}
\]

Using these results, we are able to bound the average utilization during subinterval

$t$ as follows.

\begin{align*}
\sum_{s=1}^{T}\sum_{k=1}^{K}\sum_{S\in\mathcal{S}} f_i(s,t)\,\bar\lambda_{ks} P_{ik}(S)\,\alpha_{ks}(S) &= \sum_{s=1}^{T} f_i(s,t)\,\bar A_{is} \tag{B.4}\\
&= \sum_{s=1}^{t} e^{-(t-s+1)\mu\theta}\,\bar A_{is} \tag{B.5}\\
&\le C_i - e^{-\mu\theta}\bar Q_{it} + \sum_{s=1}^{t-1} e^{-(t-s+1)\mu\theta}\,\bar A_{is} \tag{B.6}\\
&\le C_i - e^{-\mu\theta}\bar Q_{it} + \sum_{s=1}^{t-1} e^{-(t-s)\mu\theta}\big(\bar Q_{i(s+1)} - e^{-\mu\theta}\bar Q_{is}\big) \tag{B.7}\\
&= C_i - e^{-\mu\theta}\bar Q_{it} + e^{-\mu\theta}\bar Q_{it} - e^{-t\mu\theta}\bar Q_{i1} \tag{B.8}\\
&= C_i, \tag{B.9}
\end{align*}

where the equality in (B.5) follows from the definition of the future load in this setting, (B.6) and (B.7) follow from the bounds above due to the capacity inequality, (B.8) follows from the telescoping series, and (B.9) is due to the fact that the initial utilization $Q_{i1} = W_i(0) = 0$, as the system begins empty.

This inequality demonstrates that any admissible policy induces a set of offering probabilities $\alpha_{ks}$ that are feasible as decision variables for (3.10). We further observe that under an admissible policy the objective of (3.10) is precisely the expected revenue of the policy as defined in (3.8). Then, since $J^{LP}$ represents the maximum such expected revenue over the space of feasible variables, the proposition is proved.

B.1.2 Proof of Lemma 5

Proof. We observe that the scaled decision variables $e^{-2\mu\theta}\alpha^{LP}$ represent a feasible solution to the TDR linear program (3.14), and therefore the objective value of this linear program may be lower bounded as follows:

\begin{align*}
J^{PG} &= \sum_{s=1}^{T}\sum_{k=1}^{K}\sum_{S\in\mathcal{S}}\sum_{i=1}^{N} \frac{r_{ik}}{\mu}\,\bar\lambda_{ks} P_{ik}(S)\,\alpha_{ks}(S) \\
&\ge e^{-2\mu\theta}\sum_{s=1}^{T}\sum_{k=1}^{K}\sum_{S\in\mathcal{S}}\sum_{i=1}^{N} \frac{r_{ik}}{\mu}\,\bar\lambda_{ks} P_{ik}(S)\,\alpha^{LP}_{ks}(S) \\
&= e^{-2\mu\theta} J^{LP} \\
&\ge e^{-2\mu\theta} J^*,
\end{align*}

where the final inequality above follows from Proposition 6.

B.1.3 Proof of Proposition 7

Proof. We begin by focusing on a single resource $i$ and subinterval $t$. For clarity of notation, all policy-dependent random variables are assumed to be those under the TDR policy. Under the TDR policy, the number of customers of type $k$ selecting resource $i$ during subinterval $t$, $Z_{ikt}$, is a Poisson random variable with mean $\tilde\lambda_{ikt}$. Thereby the overall number of customers selecting resource $i$ during subinterval $t$, $Z_{it}$, is also Poisson with mean $\tilde\lambda_{it} = \sum_{k=1}^K\tilde\lambda_{ikt}$. We let $A_{ikt}$ denote the actual number of type $k$ customers who are able to begin service at resource $i$ during subinterval $t$, and we note that $A_{ikt} \le Z_{ikt}$ due to the capacity constraint. Within subinterval $t$ the operator earns an expected revenue given by $J_{it} = \mathbb{E}\big[\sum_{k=1}^K \frac{r_{ik}}{\mu} A_{ikt}\big]$. On the other hand, the linear programming upper bound in (3.14) assumes a revenue of $J^{PG}_{it} = \sum_{k=1}^K \frac{r_{ik}}{\mu}\tilde\lambda_{ikt}$ for this resource within this period. Our objective here is to bound the gap between these two terms. We will make use of the blended revenue rate
\[
\bar r_{it} = \sum_{k=1}^K \frac{r_{ik}\,\tilde\lambda_{ikt}}{\sum_{\ell=1}^K \tilde\lambda_{i\ell t}},
\]
which is the average revenue of an arriving customer in (3.14). Therefore, we have $J^{PG}_{it} = \frac{\bar r_{it}}{\mu}\tilde\lambda_{it}$.

The number of accepted arrivals 퐴푖푘푡 depends on the interplay between the initial

utilization $Q_{it}$ and the number of arrivals of each type, $Z_{ikt}$. We observe that the distribution of $Q_{it}$ depends on the complex interaction of previous arrivals and service times with the capacity constraint, as well as the fluctuations of the true demand rates. This motivates us to define an upper bound, $\tilde Q_{it}$, such that $\tilde Q_{it} \ge Q_{it}$ along each sample path. To define $\tilde Q_{it}$, we consider an alternative system with two primary modifications. The first is that in the alternate system, customer arrivals are not subject to the capacity limit $C_i$. In addition, we assume that all arrivals in the alternate system occur at the end of their respective subinterval. Thus, in this system, no arrivals occurring prior to the epoch at hand depart in the same period in which they arrive. Since, under the TDR policy, the number of raw arrivals $Z_{it}$ to each resource in each epoch is a Poisson random variable, the number of admitted customers from each prior epoch is itself a Poisson random variable that has been further thinned by the probability of departure. It then follows that the initial utilization in this system, $\tilde Q_{it}$, is the sum of independent Poisson random variables and thus is itself Poisson. Based on the definition of the future load function, the mean of $\tilde Q_{it}$ can be expressed as
\begin{align*}
\omega_{it} &= \sum_{s=1}^{t-1} e^{-(t-s-1)\mu\theta}\,\tilde\lambda_{is} \\
&= e^{2\mu\theta}\sum_{s=1}^{t-1} e^{-(t-s+1)\mu\theta}\,\tilde\lambda_{is} \\
&= e^{2\mu\theta}\left(\sum_{s=1}^{t}\Big[e^{-(t-s+1)\mu\theta}\,\tilde\lambda_{is}\Big] - e^{-\mu\theta}\tilde\lambda_{it}\right) \\
&= e^{2\mu\theta}\left(\left(\sum_{s=1}^{T} f_i(s,t)\,\tilde\lambda_{is}\right) - e^{-\mu\theta}\tilde\lambda_{it}\right).
\end{align*}

We then note that along each sample path, as defined by the times of customer arrivals and their respective service lengths, we have $\tilde Q_{it} \ge Q_{it}$. This construction is useful as it allows us to define the random variables $\tilde A_{ikt}$, which are analogous to $A_{ikt}$ except assuming that the initial utilization is given by $\tilde Q_{it}$ rather than $Q_{it}$. Therefore, we observe that on each sample path $\tilde A_{ikt} \le A_{ikt}$. Using these constructions, we can lower bound the expected revenue under the TDR policy by

\begin{align*}
\mathbb{E}\left[\sum_{k=1}^K \frac{r_{ik}}{\mu} A_{ikt}\right] &\ge \mathbb{E}\left[\sum_{k=1}^K \frac{r_{ik}}{\mu} \tilde A_{ikt}\right] \\
&= \sum_{q=0}^{C_i-1}\sum_{z=0}^{\infty} \mathbb{E}\left[\sum_{k=1}^K \frac{r_{ik}}{\mu} \tilde A_{ikt} \,\Big|\, \tilde Q_{it} = q,\, Z_{it} = z\right] \mathbb{P}(\tilde Q_{it} = q)\,\mathbb{P}(Z_{it} = z) \\
&\ge \sum_{q=0}^{C_i-1}\sum_{z=0}^{C_i-q} \mathbb{E}\left[\sum_{k=1}^K \frac{r_{ik}}{\mu} \tilde A_{ikt} \,\Big|\, \tilde Q_{it} = q,\, Z_{it} = z\right] \mathbb{P}(\tilde Q_{it} = q)\,\mathbb{P}(Z_{it} = z).
\end{align*}

In the last inequality above we have removed all terms in which there are more arrivals $Z_{it}$ than spaces remaining at the beginning of the subinterval, $C_i - \tilde Q_{it}$. In the remaining cases none of the arrivals $Z_{ikt}$ interfere with each other, and thus $\tilde A_{ikt} = Z_{ikt}$ and the expected revenue rate earned by each such arrival is equal to $\bar r_{it}$ as defined previously. Thus, in this case the revenue earned depends only on the distribution of the Poisson random variables $Z_{it}$ and $\tilde Q_{it}$ and their respective means $\tilde\lambda_{it}$ and $\omega_{it}$. We further note that the capacity constraint in (3.14) yields an upper bound on the sum of these two parameters,

\begin{align*}
\tilde\lambda_{it} + \omega_{it} &= \tilde\lambda_{it} + e^{2\mu\theta}\left(\left(\sum_{s=1}^{T} f_i(s,t)\,\tilde\lambda_{is}\right) - e^{-\mu\theta}\tilde\lambda_{it}\right) \\
&= e^{2\mu\theta}\left(\left(\sum_{s=1}^{T} f_i(s,t)\,\tilde\lambda_{is}\right) - e^{-\mu\theta}\tilde\lambda_{it} + e^{-2\mu\theta}\tilde\lambda_{it}\right) \\
&\le e^{2\mu\theta}\sum_{s=1}^{T} f_i(s,t)\,\tilde\lambda_{is} \\
&\le C_i.
\end{align*}

Applying these facts yields,

\begin{align*}
\mathbb{E}\left[\sum_{k=1}^K \frac{r_{ik}}{\mu} A_{ikt}\right] &\ge \sum_{q=0}^{C_i-1}\sum_{z=0}^{C_i-q} \frac{\bar r_{it}}{\mu}\, z\, \mathbb{P}(\tilde Q_{it} = q)\,\mathbb{P}(Z_{it} = z) \\
&\ge \sum_{q=0}^{C_i-1}\sum_{z=0}^{C_i-q} \frac{\bar r_{it}}{\mu}\, z\, \frac{(C_i - \tilde\lambda_{it})^q e^{-(C_i - \tilde\lambda_{it})}}{q!}\,\frac{\tilde\lambda_{it}^z e^{-\tilde\lambda_{it}}}{z!} \\
&= \frac{\bar r_{it}}{\mu}\,\tilde\lambda_{it} \sum_{q=0}^{C_i-1} \frac{C_i^q e^{-C_i}}{q!} \\
&\ge \frac{1}{e}\left(\frac{\bar r_{it}}{\mu}\,\tilde\lambda_{it}\right).
\end{align*}

The second inequality follows as $\omega_{it} \le C_i - \tilde\lambda_{it}$ and the expression above is decreasing in $\omega_{it}$. The final equality follows from simplification of terms. Applying this result to each resource and subinterval results in the desired lower bound on the overall performance of the TDR policy with respect to the optimal policy.

\begin{align*}
J^{TDR} &= \mathbb{E}\left[\sum_{i=1}^N\sum_{t=1}^T\sum_{k=1}^K \frac{r_{ik}}{\mu} A_{ikt}\right] \\
&\ge \sum_{i=1}^N \left(\sum_{q=0}^{C_i-1} \frac{C_i^q e^{-C_i}}{q!}\right) \sum_{t=1}^T \frac{\bar r_{it}}{\mu}\,\tilde\lambda_{it} \\
&\ge \min_i\left(\sum_{q=0}^{C_i-1} \frac{C_i^q e^{-C_i}}{q!}\right) J^{PG} \\
&\ge \min_i\left(\sum_{q=0}^{C_i-1} \frac{C_i^q e^{-C_i}}{q!}\right) e^{-2\mu\theta} J^* \\
&\ge e^{-2\mu\theta - 1} J^*.
\end{align*}
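The constant in the final step can also be checked directly: $\sum_{q=0}^{C_i-1} C_i^q e^{-C_i}/q!$ is the probability that a Poisson random variable with mean $C_i$ is at most $C_i - 1$, which is at least $1/e$, consistent with the last inequality above. The short sketch below (illustrative only, not part of the original proof) evaluates this quantity for several capacities.

```python
# Numerical check that sum_{q=0}^{C-1} C^q e^{-C} / q! = P(Poisson(C) <= C - 1) >= 1/e,
# the constant used in the last step of the proof.
import math

def poisson_cdf_below_mean(C):
    # P(Poisson(C) <= C - 1), computed with a running product for stability
    term = math.exp(-C)
    total = 0.0
    for q in range(C):
        total += term
        term *= C / (q + 1)
    return total

if __name__ == "__main__":
    values = {C: round(poisson_cdf_below_mean(C), 4) for C in (1, 2, 5, 20, 100)}
    print(values)   # the C = 1 case attains the bound 1/e ~ 0.3679
```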

B.1.4 Proof of Proposition 8

Proof. We begin the proof by examining a single resource $i$ and subinterval $t$. As in the proof of Proposition 7, we work with the modified system in which all prior arrivals occur at the end of their respective periodic intervals and the capacity constraint is not enforced for such prior arrivals. Under the TDR policy for problem (3.16) with scale parameter $\xi$, the number of customers of type $k$ selecting resource $i$ during subinterval $t$, $Z^{(\xi)}_{ikt}$, is a Poisson random variable with mean $\tilde\lambda^{(\xi)}_{ikt}$. Thereby the overall number of customers selecting resource $i$ during subinterval $t$, $Z^{(\xi)}_{it}$, is also Poisson with mean $\tilde\lambda^{(\xi)}_{it} = \sum_{k=1}^K \tilde\lambda^{(\xi)}_{ikt}$. We let $A^{(\xi)}_{ikt}$ denote the actual number of type $k$ customers who are able to begin service at resource $i$ during subinterval $t$, after accounting for loss due to the capacity constraint. Due to this blocking effect we have that $A^{(\xi)}_{ikt} \le Z^{(\xi)}_{ikt}$. Within subinterval $t$ the operator earns an expected revenue from resource $i$ given by $J^{TDR-\xi}_{it} = \mathbb{E}\big[\sum_{k=1}^K \frac{r_{ik}}{\mu} A^{(\xi)}_{ikt}\big]$. On the other hand, the linear programming upper bound in (3.14) assumes a revenue of $J^{PG-\xi}_{it} = \sum_{k=1}^K \frac{r_{ik}}{\mu}\tilde\lambda^{(\xi)}_{ikt} = \sum_{k=1}^K \frac{r_{ik}}{\mu}\xi\tilde\lambda_{ikt} = \xi J^{PG}_{it}$, due to the scaling of the system. We will make use of the blended revenue rate
\[
\bar r_{it} = \sum_{k=1}^K \frac{r_{ik}\,\tilde\lambda^{(\xi)}_{ikt}}{\sum_{\ell=1}^K \tilde\lambda^{(\xi)}_{i\ell t}},
\]
which is the average revenue of an arriving customer in (3.14). Therefore, we have $J^{PG-\xi}_{it} = \frac{\bar r_{it}}{\mu}\tilde\lambda^{(\xi)}_{it}$. As in the proof of Proposition 7, the initial utilization of resource $i$ at subinterval $t$ in the alternate system, $\tilde Q^{(\xi)}_{it}$, is also Poisson with mean $\omega^{(\xi)}_{it} = \xi\omega_{it}$, and the means of $Z^{(\xi)}_{it}$ and $\tilde Q^{(\xi)}_{it}$ are bounded with respect to the capacity constraint:
\[
\tilde\lambda^{(\xi)}_{it} + \omega^{(\xi)}_{it} = \xi\big(\tilde\lambda_{it} + \omega_{it}\big) \le \xi C_i = C^{(\xi)}_i,
\]

in both the finite and infinite time horizon settings.

By the same logic employed in the proof of Proposition 7 we conclude that $\tilde{A}^{(\xi)}_{ikt} \leq A^{(\xi)}_{ikt}$ on each sample path. Therefore, letting $\hat{r}_i = \max_k r_{ik}$, we can lower bound the normalized expected revenue under the TDR policy in the $\xi$-scaled regime by,

\[
\begin{aligned}
\frac{1}{\xi} J^{TDR-\xi}_{it} &= \frac{1}{\xi}\mathbb{E}\left[\sum_{k=1}^{K}\frac{r_{ik}}{\mu} A^{(\xi)}_{ikt}\right] \\
&\geq \frac{1}{\xi}\mathbb{E}\left[\sum_{k=1}^{K}\frac{r_{ik}}{\mu}\tilde{A}^{(\xi)}_{ikt}\right] \\
&\geq \frac{1}{\xi}\left(\mathbb{E}\left[\sum_{k=1}^{K}\frac{r_{ik}}{\mu} Z^{(\xi)}_{ikt}\right] - \frac{\hat{r}_i}{\mu}\mathbb{E}\left[(Z^{(\xi)}_{it}-\tilde{\lambda}^{(\xi)}_{it})^{+}\right] - \frac{\hat{r}_i}{\mu}\mathbb{E}\left[(\tilde{Q}^{(\xi)}_{it}-\omega^{(\xi)}_{it})^{+}\right]\right) \\
&= J^{PG}_{it} - \frac{1}{\xi}\left(\frac{\hat{r}_i}{\mu}\mathbb{E}\left[(Z^{(\xi)}_{it}-\tilde{\lambda}^{(\xi)}_{it})^{+}\right] + \frac{\hat{r}_i}{\mu}\mathbb{E}\left[(\tilde{Q}^{(\xi)}_{it}-\omega^{(\xi)}_{it})^{+}\right]\right).
\end{aligned}
\]

The second inequality above follows by allowing all excess current and prior arrivals to reduce the current-period revenue as much as possible. The final equality follows from the definition of $J^{PG}_{it}$.

As this inequality holds for all scale parameters 휉, it also holds in the limit,

\[
\begin{aligned}
\lim_{\xi\to\infty}\frac{1}{\xi} J^{TDR-\xi}_{it}
&\geq J^{PG}_{it} - \lim_{\xi\to\infty}\frac{1}{\xi}\left(\frac{\hat{r}_i}{\mu}\mathbb{E}\left[(Z^{(\xi)}_{it}-\tilde{\lambda}^{(\xi)}_{it})^{+}\right] + \frac{\hat{r}_i}{\mu}\mathbb{E}\left[(\tilde{Q}^{(\xi)}_{it}-\omega^{(\xi)}_{it})^{+}\right]\right) && \text{(B.10)} \\
&\geq J^{PG}_{it} - \lim_{\xi\to\infty}\frac{1}{\xi}\left(0.4\,\frac{\hat{r}_i}{\mu}\sqrt{\tilde{\lambda}^{(\xi)}_{it}} + 0.4\,\frac{\hat{r}_i}{\mu}\sqrt{\omega^{(\xi)}_{it}}\right) && \text{(B.11)} \\
&= J^{PG}_{it} - \lim_{\xi\to\infty}\frac{\sqrt{\xi}}{\xi}\left(0.4\,\frac{\hat{r}_i}{\mu}\sqrt{\tilde{\lambda}_{it}} + 0.4\,\frac{\hat{r}_i}{\mu}\sqrt{\omega_{it}}\right) && \text{(B.12)} \\
&= J^{PG}_{it}. && \text{(B.13)}
\end{aligned}
\]

The second inequality above follows from the fact that for all sufficiently large $\lambda$, a Poisson random variable $Z$ with mean $\lambda$ satisfies $\mathbb{E}[(Z-\lambda)^{+}] \leq 0.4\sqrt{\lambda}$.
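A small numerical check of this Poisson fact (illustrative only; the constant $0.4$ is from the text, and the means $\lambda$ below are arbitrary choices of "sufficiently large" values):

```python
import math

def poisson_pmf(k, lam):
    # P(Z = k) for Z ~ Poisson(lam), computed in log space to avoid overflow
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def expected_excess(lam):
    # E[(Z - lam)^+], truncating the sum far into the upper tail
    upper = int(lam + 12 * math.sqrt(lam)) + 10
    return sum((k - lam) * poisson_pmf(k, lam) for k in range(int(math.ceil(lam)), upper))

for lam in [25, 100, 400, 1600, 6400]:
    ratio = expected_excess(lam) / math.sqrt(lam)
    print(f"lambda = {lam:5d}: E[(Z - lambda)^+] / sqrt(lambda) = {ratio:.4f}")
    assert ratio <= 0.4   # matches the 0.4 * sqrt(lambda) bound quoted above
```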

Combining these results and substituting into equation (B.10) yields the desired asymptotic revenue guarantee,

\[
\begin{aligned}
\lim_{\xi\to\infty}\frac{1}{\xi} J^{TDR-\xi}
&= \sum_{i=1}^{N}\sum_{t=1}^{T}\lim_{\xi\to\infty}\frac{1}{\xi}\mathbb{E}\left[\sum_{k=1}^{K}\frac{r_{ik}}{\mu}A^{(\xi)}_{ikt}\right] \\
&\geq \sum_{i=1}^{N}\sum_{t=1}^{T} J^{PG}_{it} \\
&= J^{PG} \\
&\geq e^{-2\mu\theta} J^{*}.
\end{aligned}
\]

B.2 Extension to the Infinite Time Horizon Setting

Here we extend the analysis presented in the case of a finite time horizon to the infinite horizon setting under periodically varying demand. Specifically, we assume that the arrival rate functions $\lambda_k(x)$ are periodic with a common periodic cycle of unit length. We use $\nu(x) = \operatorname{mod}(x, 1)$ to denote the function which maps the system time $x$ to its position within the unit periodic cycle. As in the finite time horizon setting the operator seeks to develop a policy leading to effective assortment decisions under the given system parameters. Due to the periodic behavior of the arrival rate functions, it is natural to consider a class of policies that consider only the current utilization $W(x)$ and the time within the periodic cycle, $\nu(x)$. We term such policies periodic fixed policies; for a specific customer type $k$ they are defined just as in equation (3.1).

\[
\pi^{k}(\nu(x), W(x)) : [0, 1) \times \mathbb{Z}_{+}^{N} \to \Delta(\mathcal{S}).
\]

As in the finite time horizon setting, the overall policy is given by the concatenation of such policies over customer types, $\pi(\nu(x), W(x)) = (\pi^{1}(\nu(x), W(x)), \ldots, \pi^{K}(\nu(x), W(x)))$, and $\Pi$ denotes the space of all admissible periodic fixed policies. As in the finite time horizon setting, our strategy works by approximating the

periodically-varying demand functions using piecewise constant functions over the $T$ subintervals $\mathcal{T}$. Each subinterval is of length $\theta = 1/T$, and the subintervals are indexed in chronological order. The system evolves continuously and we track the number of periodic intervals elapsed

using the epoch counter $\tau \in \mathbb{Z}$, with epoch $\tau$ consisting of all times $x \in [\theta(\tau-1), \theta\tau)$. As before, we use $t(x)$ as the mapping from continuous time $x$ to the corresponding subinterval $t \in \mathcal{T}$. We further define $\tau(x) = \lceil x/\theta \rceil$ as the mapping from continuous time $x$ to the corresponding epoch $\tau$. We note that in the finite time horizon setting $\tau = t$. We also define the cyclic distance $d(s, t)$ as the number of subintervals by which interval $s$ precedes interval $t$, accounting for the periodic cycle where necessary,

\[
d(s,t) =
\begin{cases}
t - s & \text{if } s \leq t \\
T - s + t & \text{if } s > t.
\end{cases}
\]

We further define the helper function $\delta(t, z)$ as the periodic interval $s$ such that $d(s, t) = \operatorname{mod}(z, T)$. In words, this function represents the periodic interval that precedes $t$ by $z$ periodic epochs.
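For concreteness, a minimal sketch of these two index helpers in Python, assuming the subintervals are labeled $1, \ldots, T$ as in the text (the function names are illustrative and not taken from any implementation accompanying the thesis):

```python
def cyclic_distance(s: int, t: int, T: int) -> int:
    """d(s, t): the number of subintervals by which interval s precedes interval t."""
    return t - s if s <= t else T - s + t

def preceding_interval(t: int, z: int, T: int) -> int:
    """delta(t, z): the periodic interval s satisfying d(s, t) = z mod T."""
    s = (t - z) % T
    return s if s >= 1 else T   # residue 0 corresponds to the label T

# consistency check: delta inverts the cyclic distance
T = 8
for t in range(1, T + 1):
    for z in range(0, 3 * T):
        assert cyclic_distance(preceding_interval(t, z, T), t, T) == z % T
```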

We now observe that under a periodic fixed policy the evolution of the system utilization has further structure. Specifically, consider a periodic fixed policy $\pi$, an arbitrary base system time $\nu \in [0, 1)$, and the sequence of observations corresponding to the same position within the periodic interval, $\nu_z = \nu + z$ for $z \in \mathbb{Z}$. Under the periodic fixed policy $\pi$ the evolution of $W(\nu_z)$ between periodic cycles $z$ depends only on its current value. Thus, the system utilization on each slice $\nu$ evolves as a finite-state, discrete-time Markov chain, which is ergodic because each utilization state communicates with the empty utilization state. Therefore, in steady state, the utilization at each such slice $\nu \in [0, 1)$ has a unique stationary distribution $\Psi^{\pi}_{\nu}(w)$ over utilization states $w \in \mathcal{W}$. Thus we can define the expected number of arrivals of type $k$ to resource $i$ over a single periodic cycle, $\tilde{\lambda}^{\pi}_{ik}$, in exactly the same manner as in (3.2). Using this quantity, equation (3.3) yields the expected revenue attributable to customers arriving within one periodic cycle in steady state. Our notion of optimality will be the periodic fixed policy with the highest steady-state cyclical expected revenue.

To monitor the utilization of each resource at the beginning of each subinterval

we introduce the notation 푄푖(휏) = 푊푖((휏 − 1)휃) to denote the utilization of resource 푖 at the beginning of epoch 휏, which we term the initial utilization at epoch 휏. Due

to the capacity restriction we must have 푄푖(휏) ≤ 퐶푖 for all 푖 and 휏.

We now examine the transition dynamics of the system utilization $Q_i(\tau)$ between epochs $\tau$. To formally specify these dynamics it is useful to introduce some variables that will play an important role in our analysis. We note that these random variables, as well as the random variable representing the initial utilization, depend on the choice of policy $\pi$, but we leave this dependence implicit to keep our notation uncluttered. Let $A_i(\tau)$ denote the random variable representing the number of customers arriving to resource $i$ and successfully beginning service during epoch $\tau$. Likewise, let $D_i(\tau)$ denote the number of customer departures from resource $i$ during epoch $\tau$. Then we observe that the evolution of capacity utilization can be captured by the recurrence relation $Q_i(\tau + 1) = Q_i(\tau) + A_i(\tau) - D_i(\tau)$. We also note

that the capacity constraint applied to 푄푖(휏 + 1) immediately implies the natural

flow balance constraint $Q_i(\tau) + A_i(\tau) \leq C_i + D_i(\tau)$. We again use the notation $\bar{Q}_{it}$, $\bar{A}_{it}$, and $\bar{D}_{it}$ to represent the time averages of these random variables for periodic subinterval $t$ under a given policy. In the infinite time horizon setting under periodic demand, our definition of the future load is adapted to account for the impact of a customer arriving in each prior instance of period $s$ on the utilization at the end of the current interval $t$.

∞ −(푑(푠,푡)+1)휇휃 ∑︁ (︀ −푇 휇휃)︀푗 −(푑(푠,푡)+1)휇휃 (︀ −휇)︀−1 푓푖(푠, 푡) = 푒 푒 = 푒 1 − 푒 . (B.14) 푗=0

With this refined definition we are again able to formulate a linear program identical to problem (3.10) presented in Section 3.2. Our first task is to prove that this linear programming formulation represents an upper bound on the expected cyclical revenue of the optimal periodic fixed policy.

Lemma 9. $J^{*} \leq J^{LP}$

Proof. Proof of lemma 1 in the infinite time horizon setting. To prove our claim

we demonstrate that the 훼푘푠 variables associated with any admissible periodic fixed policy in steady state must satisfy the constraints of the cyclic linear programming formulation (3.10). The simplex constraints of (3.10) are immediately satisfied by any probability distribution and thus we focus our attention here on the capacity utilization constraint.

We first consider any resource $i$ and an arbitrary epoch $\tau$ corresponding to the periodic interval $t = t(\tau)$. Recall that a periodic fixed policy induces a stationary distribution over the random variables describing the system dynamics during the $t$-th periodic subinterval, $Q_{it}$, $A_{it}$, and $D_{it}$, yielding the time average means $\bar{Q}_{it}$, $\bar{A}_{it}$, and $\bar{D}_{it}$. Under an admissible policy, a set $S$ may only be offered at times $x$ such that $W_j(x) < C_j$ for all $j \in S$. Therefore the mean number of arrivals to resource $i$ during periodic subinterval $t$, $\bar{A}_{it}$, can be expressed in terms of the offering probabilities

$\alpha_{kt}(S)$ as,
\[
\bar{A}_{it} = \sum_{k=1}^{K}\sum_{S\in\mathcal{S}} \bar{\lambda}_{kt}\, P_{ik}(S)\, \alpha_{kt}(S).
\]
We further note that the initial utilization of the interval plus the number of accepted arrivals within it can be no greater than the sum of the number of departures within the

interval and the resource capacity 퐶푖. Formally, any realization of 퐴푖(휏), 푄푖(휏), and

퐷푖(휏) must satisfy the inequality,

퐴푖(휏) + 푄푖(휏) ≤ 퐶푖 + 퐷푖(휏).

Since this inequality must be satisfied for all realizations of the periodic interval $t(\tau)$, it must also be satisfied for the associated time averages over the periodic interval $t$. Thus, $\bar{A}_{it}$, $\bar{Q}_{it}$, and $\bar{D}_{it}$ must satisfy,

\[
\bar{A}_{it} + \bar{Q}_{it} \leq C_i + \bar{D}_{it}. \tag{B.15}
\]

Further, the time average number of departures $\bar{D}_{it}$ can be bounded above by assuming that all arrivals occur at the beginning of the interval, so that we have $\bar{D}_{it} \leq (\bar{Q}_{it} + \bar{A}_{it})(1 - \exp(-\mu\theta))$. Substituting this inequality into the capacity bound (B.15) and rearranging we obtain,

\[
e^{-\mu\theta}\bar{A}_{it} \leq C_i - e^{-\mu\theta}\bar{Q}_{it}. \tag{B.16}
\]

Next, we note that the time average initial utilization for periodic interval $t$ is related to the associated time averages from the previous periodic interval by the recurrence relation,
\[
\bar{Q}_{it} = \bar{Q}_{i\delta(t,1)} + \bar{A}_{i\delta(t,1)} - \bar{D}_{i\delta(t,1)}.
\]

Bounding the number of departures, $\bar{D}_{i\delta(t,1)}$, in an analogous manner we obtain the inequality,
\[
\bar{A}_{i\delta(t,1)} \leq e^{\mu\theta}\bar{Q}_{it} - \bar{Q}_{i\delta(t,1)}. \tag{B.17}
\]

Using these results, we are able to bound the average utilization during periodic interval 푡 as follows.

\[
\begin{aligned}
\sum_{s=1}^{T}\sum_{k=1}^{K}\sum_{S\in\mathcal{S}} f_i(s,t)\,\bar{\lambda}_{ks} P_{ik}(S)\,\alpha_{ks}(S)
&= \sum_{s=1}^{T} f_i(s,t)\,\bar{A}_{is} && \text{(B.18)} \\
&= \sum_{l=0}^{\infty} e^{-(l+1)\mu\theta}\,\bar{A}_{i\delta(t,l)} && \text{(B.19)} \\
&\leq C_i - e^{-\mu\theta}\bar{Q}_{it} + \sum_{l=1}^{\infty} e^{-(l+1)\mu\theta}\,\bar{A}_{i\delta(t,l)} && \text{(B.20)} \\
&\leq C_i - e^{-\mu\theta}\bar{Q}_{it} + \sum_{l=1}^{\infty} e^{-l\mu\theta}\left(\bar{Q}_{i\delta(t,l-1)} - e^{-\mu\theta}\bar{Q}_{i\delta(t,l)}\right) && \text{(B.21)} \\
&= C_i - \limsup_{l\to\infty}\left(e^{-(l+1)\mu\theta}\,\bar{Q}_{i\delta(t,l)}\right) && \text{(B.22)} \\
&= C_i. && \text{(B.23)}
\end{aligned}
\]

Here (B.19) follows from the definition of the future load, (B.20) and (B.21) follow from the bounds above due to the capacity inequality, (B.22) follows from the telescoping series, and (B.23) is due to the fact that each initial utilization is bounded by the largest capacity $\max_i C_i$. This inequality demonstrates that any admissible periodic fixed policy corresponds

to a set of decision variables $\alpha_{ks}$ that are feasible for (3.10). Therefore, since $J^{LP}$ represents the maximum cyclic expected revenue over the space of feasible variables, the lemma is proved.

In the infinite time horizon setting we follow the TDR policy as defined in Section 3.2 by solving the policy-guiding linear program (3.14) using the refined future load function (B.14). Then, during periodic epoch $\tau$, the set to offer is selected according to the probability distribution defined by the decision variables $\alpha^{LP}$ corresponding to the periodic subinterval $t(\tau)$.
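A minimal sketch of this offering step is given below, assuming the LP solution has already been computed and stored as a mapping from (customer type, subinterval) to a distribution over assortments; the data layout and names are illustrative assumptions, not the thesis implementation:

```python
import math
import random

def tdr_offer(k, x, alpha, T):
    """Sample the assortment offered to a type-k customer arriving at system time x.

    alpha[(k, t)] is assumed to be a dict mapping each candidate assortment
    (a frozenset of resources) to its LP probability alpha_{kt}(S).
    """
    theta = 1.0 / T
    t = math.ceil((x % 1.0) / theta)   # periodic subinterval t(x) in 1..T
    t = max(t, 1)
    sets, probs = zip(*alpha[(k, t)].items())
    return random.choices(sets, weights=probs)[0]

# toy usage: one resource, two candidate assortments for customer type 1
alpha = {(1, t): {frozenset({"lot_A"}): 0.7, frozenset(): 0.3} for t in range(1, 25)}
print(tdr_offer(k=1, x=3.62, alpha=alpha, T=24))
```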

Theorem 5. The expected revenue over a single cycle of the TDR policy in steady state satisfies the following performance guarantee,

\[
J \geq \min_i\left(\sum_{q=0}^{C_i-1}\frac{C_i^{q} e^{-C_i}}{q!}\right) e^{-2\mu\theta} J^{*} \geq e^{-2\mu\theta-1} J^{*}. \tag{B.24}
\]

In particular, as the approximation interval width $\theta$ becomes smaller, this performance ratio approaches $\frac{1}{e}$, and as $\min_i C_i$ grows large this performance ratio approaches $\frac{1}{2}e^{-2\mu\theta}$, both independently of other problem parameters.

Proof. The argument mirrors the proof of Proposition 7, adapted to the infinite time horizon setting. We begin by focusing on a single resource $i$ and periodic interval $t$. For clarity of notation, all arrival and initial utilization random variables are assumed to be those under the TDR policy.

Under the TDR policy, $Z_{ikt}$, the number of customers of type $k$ selecting resource $i$ during periodic interval $t$, is a Poisson random variable with mean $\tilde{\lambda}_{ikt}$. Thereby the overall number of customers selecting resource $i$ during period $t$, $Z_{it}$, is also Poisson with mean $\tilde{\lambda}_{it} = \sum_{k=1}^{K}\tilde{\lambda}_{ikt}$. We let $A_{ikt}$ denote the actual number of type $k$ customers who are able to begin service at resource $i$ during subinterval $t$, and we note that $A_{ikt} \leq Z_{ikt}$ due to the capacity constraint. Within subinterval $t$ the operator earns an expected revenue given by $J_{it} = \mathbb{E}\left[\sum_{k=1}^{K}\frac{r_{ik}}{\mu} A_{ikt}\right]$. On the other hand, the linear programming upper bound in (3.14) assumes a revenue of $J^{PG}_{it} = \sum_{k=1}^{K}\frac{r_{ik}}{\mu}\tilde{\lambda}_{ikt}$

for this resource within this period. Our objective here is to bound the gap between these two terms. We will make use of the blended revenue rate,

\[
\bar{r}_{it} = \sum_{k=1}^{K}\frac{r_{ik}\,\tilde{\lambda}_{ikt}}{\sum_{\ell=1}^{K}\tilde{\lambda}_{i\ell t}},
\]

which is the average revenue of an arriving customer in (3.14). Therefore we have $J^{PG}_{it} = \frac{\bar{r}_{it}}{\mu}\tilde{\lambda}_{it}$.

The number of accepted arrivals $A_{ikt}$ depends on the interplay between the initial utilization $Q_{it}$ and the number of arrivals of each type $Z_{ikt}$. We observe that the distribution of $Q_{it}$ depends on the complex interaction of previous arrivals and service times with the capacity constraint, as well as the fluctuations of the true demand rates. This motivates us to define an upper bound, $\tilde{Q}_{it}$, such that $\tilde{Q}_{it} \geq Q_{it}$ along each sample path. To define $\tilde{Q}_{it}$, we consider an alternative system with two primary modifications. The first is that in the alternate system, customer arrivals are not subject to the capacity limit $C_i$. In addition, we assume that all arrivals in the alternate system occur at the end of their periodic interval. Thus, in this system, no arrivals occurring prior to the epoch at hand depart in the same period in which they arrive. Since, under the TDR policy, the number of raw arrivals $Z_{it}$ to each resource in each epoch is a Poisson random variable, the number of admitted customers from each prior epoch is itself a Poisson random variable that has been further split by the probability of departure. It follows then that, in the alternate system, $\tilde{Q}_{it}$ is the sum of independent Poisson random variables and is therefore itself Poisson; we denote its mean by $\omega_{it}$. Based on the definition of the future load function, this mean can be expressed as,

\[
\begin{aligned}
\omega_{it} &= \sum_{\ell=1}^{\infty} e^{-\mu(\ell-1)\theta}\,\tilde{\lambda}_{i\delta(t,\ell)} \\
&= e^{2\mu\theta}\sum_{\ell=1}^{\infty} e^{-\mu(\ell+1)\theta}\,\tilde{\lambda}_{i\delta(t,\ell)} \\
&= e^{2\mu\theta}\left(\sum_{s=1}^{T}\left[e^{-(d(s,t)+1)\mu\theta}\left(\sum_{j=0}^{\infty}\left(e^{-T\mu\theta}\right)^{j}\right)\right]\tilde{\lambda}_{is} - e^{-\mu\theta}\tilde{\lambda}_{it}\right) \\
&= e^{2\mu\theta}\left(\left(\sum_{s=1}^{T} f_i(s,t)\tilde{\lambda}_{is}\right) - e^{-\mu\theta}\tilde{\lambda}_{it}\right).
\end{aligned}
\]

We then note that along each sample path, as defined by the times of customer arrivals and their respective service lengths, we have $\tilde{Q}_{it} \geq Q_{it}$. This construction is useful as it allows us to define the random variables $\tilde{A}_{ikt}$, which are analogous to $A_{ikt}$ except assuming that the initial utilization is given by $\tilde{Q}_{it}$. Therefore, we observe that on each sample path $\tilde{A}_{ikt} \leq A_{ikt}$. Using these constructions, we can lower bound the expected revenue under the TDR policy by,

\[
\begin{aligned}
\mathbb{E}\left[\sum_{k=1}^{K}\frac{r_{ik}}{\mu} A_{ikt}\right]
&\geq \mathbb{E}\left[\sum_{k=1}^{K}\frac{r_{ik}}{\mu}\tilde{A}_{ikt}\right] \\
&= \sum_{q=0}^{C_i-1}\sum_{z=0}^{\infty}\mathbb{E}\left[\sum_{k=1}^{K}\frac{r_{ik}}{\mu}\tilde{A}_{ikt}\,\Big|\,\tilde{Q}_{it}=q,\, Z_{it}=z\right]\mathbb{P}(\tilde{Q}_{it}=q)\,\mathbb{P}(Z_{it}=z) \\
&\geq \sum_{q=0}^{C_i-1}\sum_{z=0}^{C_i-q}\mathbb{E}\left[\sum_{k=1}^{K}\frac{r_{ik}}{\mu}\tilde{A}_{ikt}\,\Big|\,\tilde{Q}_{it}=q,\, Z_{it}=z\right]\mathbb{P}(\tilde{Q}_{it}=q)\,\mathbb{P}(Z_{it}=z).
\end{aligned}
\]

In the last inequality above we have removed all terms in which there are more arrivals $Z_{it}$ than spaces remaining at the beginning of the subinterval, $C_i - \tilde{Q}_{it}$.

In the remaining cases none of the arrivals $Z_{ikt}$ interfere with each other and thus $\tilde{A}_{ikt} = Z_{ikt}$, and the expected revenue rate earned by each such arrival is equal to $\bar{r}_{it}$ as defined above. Thus, in this case the revenue earned depends only on the distribution of the Poisson random variables $Z_{it}$ and $\tilde{Q}_{it}$ and their respective means $\tilde{\lambda}_{it}$ and $\omega_{it}$. We further note that the capacity constraint in (3.14) yields an upper bound on the sum of these two parameters,

\[
\begin{aligned}
\tilde{\lambda}_{it} + \omega_{it} &= \tilde{\lambda}_{it} + e^{2\mu\theta}\left(\left(\sum_{s=1}^{T} f_i(s,t)\tilde{\lambda}_{is}\right) - e^{-\mu\theta}\tilde{\lambda}_{it}\right) \\
&= e^{2\mu\theta}\left(\left(\sum_{s=1}^{T} f_i(s,t)\tilde{\lambda}_{is}\right) - e^{-\mu\theta}\tilde{\lambda}_{it} + e^{-2\mu\theta}\tilde{\lambda}_{it}\right) \\
&\leq e^{2\mu\theta}\sum_{s=1}^{T} f_i(s,t)\tilde{\lambda}_{is} \\
&\leq C_i.
\end{aligned}
\]

Applying these facts yields,

\[
\begin{aligned}
\mathbb{E}\left[\sum_{k=1}^{K}\frac{r_{ik}}{\mu} A_{ikt}\right]
&\geq \sum_{q=0}^{C_i-1}\sum_{z=0}^{C_i-q} \frac{\bar{r}_{it}}{\mu}\, z\, \mathbb{P}(\tilde{Q}_{it}=q)\, \mathbb{P}(Z_{it}=z) \\
&\geq \sum_{q=0}^{C_i-1}\sum_{z=0}^{C_i-q} \frac{\bar{r}_{it}}{\mu}\, z\, \frac{(C_i-\tilde{\lambda}_{it})^{q} e^{-(C_i-\tilde{\lambda}_{it})}}{q!}\, \frac{\tilde{\lambda}_{it}^{z} e^{-\tilde{\lambda}_{it}}}{z!} \\
&= \frac{\bar{r}_{it}}{\mu}\,\tilde{\lambda}_{it} \sum_{q=0}^{C_i-1}\frac{C_i^{q} e^{-C_i}}{q!} \\
&\geq \frac{1}{e}\left(\frac{\bar{r}_{it}}{\mu}\right)\tilde{\lambda}_{it}.
\end{aligned}
\]

The second inequality follows as $\omega_{it} \leq C_i - \tilde{\lambda}_{it}$ and the expression above is decreasing in $\omega_{it}$. The final equality follows from simplification of terms.

Applying this result to each resource and periodic time interval results in the desired lower bound on the overall performance of the TDR policy with respect to

the optimal periodic fixed policy.

\[
\begin{aligned}
J &= \mathbb{E}\left[\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{k=1}^{K}\frac{r_{ik}}{\mu}A_{ikt}\right] \\
&\geq \sum_{i=1}^{N}\left(\sum_{q=0}^{C_i-1}\frac{C_i^{q} e^{-C_i}}{q!}\right)\sum_{t=1}^{T}\frac{\bar{r}_{it}}{\mu}\tilde{\lambda}_{it} \\
&\geq \min_i\left(\sum_{q=0}^{C_i-1}\frac{C_i^{q} e^{-C_i}}{q!}\right) J^{PG} \\
&\geq \min_i\left(\sum_{q=0}^{C_i-1}\frac{C_i^{q} e^{-C_i}}{q!}\right) e^{-2\mu\theta}J^{*} \\
&\geq e^{-2\mu\theta-1}J^{*}.
\end{aligned}
\]

The proof of the asymptotic optimality of the TDR policy in a heavy traffic system is identical between the finite time horizon and infinite time horizon under periodic demand and is therefore omitted.

B.3 Tables of Computational Experiment Results

The results of our numerical experiments in the assortment and pricing scenarios are presented in the sections below.

B.3.1 Assortment Results: Price by Resource

Table B.1: Assortment Experiment Price by Resource T=20

Lambda Scale LP-UB TDR TDR-UB GREEDY TVDB

0.50 2038.8 1811.8 (88.9%) 1707.0 (83.7%) 1934.6 (94.9%) 1959.8 (96.1%)
0.75 2920.1 2353.6 (80.6%) 2284.8 (78.2%) 2696.2 (92.3%) 2732.7 (93.6%)
1.00 3687.4 2607.2 (70.7%) 2831.8 (76.8%) 3302.0 (89.5%) 3357.4 (91.1%)
1.25 4241.9 2771.8 (65.3%) 3276.7 (77.2%) 3753.4 (88.5%) 3827.8 (90.2%)
1.50 4617.8 2924.9 (63.3%) 3597.7 (77.9%) 4085.3 (88.5%) 4168.1 (90.3%)
1.75 4881.6 3026.9 (62.0%) 3812.1 (78.1%) 4338.2 (88.9%) 4412.8 (90.4%)
2.00 5076.8 3133.8 (61.7%) 3955.4 (77.9%) 4528.2 (89.2%) 4586.9 (90.4%)
2.25 5207.3 3227.1 (62.0%) 4067.4 (78.1%) 4675.8 (89.8%) 4729.2 (90.8%)
2.50 5303.0 3253.7 (61.4%) 4143.2 (78.1%) 4790.0 (90.3%) 4823.1 (91.0%)

Each configuration was simulated at least 2,000 times. All standard errors are less than 0.10% of the associated LP UB.

Table B.2: Assortment Experiment Price by Resource T=40

Lambda Scale LP-UB TDR TDR-UB GREEDY TVDB

0.50 2038.8 1772.8 (87.0%) 1712.6 (84.0%) 1935.4 (94.9%) 1962.5 (96.3%)
0.75 2920.1 2439.2 (83.5%) 2326.8 (79.7%) 2692.8 (92.2%) 2736.2 (93.7%)
1.00 3687.4 2878.4 (78.1%) 2878.7 (78.1%) 3301.0 (89.5%) 3369.8 (91.4%)
1.25 4241.9 3121.3 (73.6%) 3298.9 (77.8%) 3755.4 (88.5%) 3834.2 (90.4%)
1.50 4617.8 3289.3 (71.2%) 3581.6 (77.6%) 4088.6 (88.5%) 4176.4 (90.4%)
1.75 4881.6 3411.8 (69.9%) 3764.1 (77.1%) 4334.0 (88.8%) 4408.9 (90.3%)
2.00 5076.8 3513.1 (69.2%) 3901.4 (76.8%) 4529.1 (89.2%) 4592.0 (90.5%)
2.25 5207.3 3636.4 (69.8%) 3992.4 (76.7%) 4674.4 (89.8%) 4720.4 (90.6%)
2.50 5303.0 3701.7 (69.8%) 4077.3 (76.9%) 4783.6 (90.2%) 4827.4 (91.0%)

Each configuration was simulated at least 2,000 times. All standard errors are less than 0.10% of the associated LP UB.

Table B.3: Assortment Experiment Price by Resource T=80

Lambda Scale LP-UB TDR TDR-UB GREEDY TVDB

0.50 2038.8 1752.6 (86.0%) 1720.6 (84.4%) 1939.0 (95.1%) 1959.1 (96.1%)
0.75 2920.1 2419.0 (82.8%) 2351.9 (80.5%) 2693.4 (92.2%) 2741.7 (93.9%)
1.00 3687.4 2925.5 (79.3%) 2898.5 (78.6%) 3302.9 (89.6%) 3371.8 (91.4%)
1.25 4241.9 3241.9 (76.4%) 3303.2 (77.9%) 3753.9 (88.5%) 3834.2 (90.4%)
1.50 4617.8 3439.3 (74.5%) 3565.5 (77.2%) 4086.0 (88.5%) 4164.7 (90.2%)
1.75 4881.6 3576.1 (73.3%) 3740.0 (76.6%) 4337.9 (88.9%) 4409.4 (90.3%)
2.00 5076.8 3687.7 (72.6%) 3866.3 (76.2%) 4525.8 (89.1%) 4581.7 (90.2%)
2.25 5207.3 3788.4 (72.8%) 3962.7 (76.1%) 4671.4 (89.7%) 4719.7 (90.6%)
2.50 5303.0 3872.6 (73.0%) 4039.6 (76.2%) 4783.1 (90.2%) 4820.5 (90.9%)

Each configuration was simulated at least 2,000 times. All standard errors are less than 0.10% of the associated LP UB.

B.3.2 Assortment Results: Price by Customer

Table B.4: Assortment Experiment Price by Customer T=20

Lambda Scale LP-UB TDR TDR-UB GREEDY TVDB

0.50 2155.8 1916.9 (88.9%) 1796.2 (83.3%) 2014.0 (93.4%) 2065.5 (95.8%)
0.75 3087.3 2529.6 (81.9%) 2431.5 (78.8%) 2807.1 (90.9%) 2883.7 (93.4%)
1.00 3893.2 2906.4 (74.7%) 3024.8 (77.7%) 3439.0 (88.3%) 3552.7 (91.3%)
1.25 4517.5 3126.6 (69.2%) 3474.5 (76.9%) 3914.7 (86.7%) 4033.4 (89.3%)
1.50 4989.5 3306.7 (66.3%) 3836.3 (76.9%) 4262.0 (85.4%) 4405.1 (88.3%)
1.75 5359.4 3432.0 (64.0%) 4125.2 (77.0%) 4507.9 (84.1%) 4685.7 (87.4%)
2.00 5649.8 3505.8 (62.1%) 4335.5 (76.7%) 4706.6 (83.3%) 4895.1 (86.6%)
2.25 5871.6 3566.6 (60.7%) 4511.5 (76.8%) 4840.2 (82.4%) 5091.0 (86.7%)
2.50 6048.4 3616.5 (59.8%) 4654.7 (77.0%) 4968.5 (82.1%) 5222.3 (86.3%)

Each configuration was simulated at least 300 times. All standard errors are less than 0.27% of the associated LP UB.

Table B.5: Assortment Experiment Price by Customer T=40

Lambda Scale LP-UB TDR TDR-UB GREEDY TVDB

0.50 2155.8 1877.7 (87.1%) 1812.0 (84.1%) 2015.8 (93.5%) 2063.6 (95.7%)
0.75 3087.3 2597.6 (84.1%) 2462.7 (79.8%) 2818.9 (91.3%) 2891.0 (93.6%)
1.00 3893.2 3082.1 (79.2%) 3057.5 (78.5%) 3443.4 (88.4%) 3565.9 (91.6%)
1.25 4517.5 3434.3 (76.0%) 3512.9 (77.8%) 3921.0 (86.8%) 4058.6 (89.8%)
1.50 4989.5 3692.0 (74.0%) 3859.4 (77.4%) 4260.7 (85.4%) 4436.0 (88.9%)
1.75 5359.4 3871.0 (72.2%) 4108.2 (76.7%) 4524.1 (84.4%) 4732.0 (88.3%)
2.00 5649.8 3992.7 (70.7%) 4325.4 (76.6%) 4714.2 (83.4%) 4963.5 (87.9%)
2.25 5871.6 4099.4 (69.8%) 4488.1 (76.4%) 4861.0 (82.8%) 5153.9 (87.8%)
2.50 6048.4 4176.7 (69.1%) 4616.5 (76.3%) 4959.7 (82.0%) 5293.4 (87.5%)

Each configuration was simulated at least 300 times. All standard errors are less than 0.27% of the associated LP UB.

Table B.6: Assortment Experiment Price by Customer T=80

Lambda Scale LP-UB TDR TDR-UB GREEDY TVDB

0.50 2155.8 1853.9 (86.0%) 1825.4 (84.7%) 2015.5 (93.5%) 2069.2 (96.0%)
0.75 3087.3 2566.2 (83.1%) 2487.6 (80.6%) 2814.0 (91.1%) 2893.5 (93.7%)
1.00 3893.2 3119.2 (80.1%) 3077.1 (79.0%) 3450.6 (88.6%) 3569.5 (91.7%)
1.25 4517.5 3511.0 (77.7%) 3516.1 (77.8%) 3908.5 (86.5%) 4083.8 (90.4%)
1.50 4989.5 3797.2 (76.1%) 3865.5 (77.5%) 4257.7 (85.3%) 4453.9 (89.3%)
1.75 5359.4 4007.1 (74.8%) 4111.6 (76.7%) 4514.4 (84.2%) 4750.4 (88.6%)
2.00 5649.8 4163.0 (73.7%) 4313.1 (76.3%) 4709.0 (83.3%) 4982.1 (88.2%)
2.25 5871.6 4293.5 (73.1%) 4461.3 (76.0%) 4844.5 (82.5%) 5187.2 (88.3%)
2.50 6048.4 4400.1 (72.7%) 4579.9 (75.7%) 4975.0 (82.3%) 5325.4 (88.0%)

Each configuration was simulated at least 300 times. All standard errors are less than 0.27% of the associated LP UB.

B.3.3 Pricing Results

Table B.7: Pricing Experiment T=20

Lambda Scale LP-UB TDR TDR-UB GREEDY TVDB

0.50 891.8 778.0 (87.2%) 639.0 (71.7%) 823.1 (92.3%) 831.6 (93.3%)
0.75 1278.4 1057.3 (82.7%) 942.2 (73.7%) 1145.1 (89.6%) 1182.4 (92.5%)
1.00 1636.0 1210.1 (74.0%) 1228.0 (75.1%) 1399.5 (85.5%) 1467.6 (89.7%)
1.25 1953.6 1283.1 (65.7%) 1419.1 (72.6%) 1537.8 (78.7%) 1628.1 (83.3%)
1.50 2195.9 1324.1 (60.3%) 1571.2 (71.6%) 1605.8 (73.1%) 1757.6 (80.0%)
1.75 2369.2 1350.7 (57.0%) 1706.3 (72.0%) 1644.3 (69.4%) 1878.3 (79.3%)
2.00 2485.3 1367.7 (55.0%) 1820.7 (73.3%) 1659.6 (66.8%) 1992.9 (80.2%)
2.25 2562.0 1386.7 (54.1%) 1896.8 (74.0%) 1672.5 (65.3%) 2099.1 (81.9%)
2.50 2615.0 1345.0 (51.4%) 1956.3 (74.8%) 1682.3 (64.3%) 2188.1 (83.7%)

Each configuration was simulated at least 1,000 times. All standard errors are less than 0.19% of the associated LP UB.

Table B.8: Pricing Experiment T=40

Lambda Scale LP-UB TDR TDR-UB GREEDY TVDB

0.50 891.8 742.0 (83.2%) 649.7 (72.9%) 823.3 (92.3%) 832.4 (93.3%)
0.75 1278.4 1071.7 (83.8%) 985.5 (77.1%) 1145.1 (89.6%) 1189.8 (93.1%)
1.00 1636.0 1310.7 (80.1%) 1254.8 (76.7%) 1399.8 (85.6%) 1474.5 (90.1%)
1.25 1953.6 1452.9 (74.4%) 1441.9 (73.8%) 1539.2 (78.8%) 1655.3 (84.7%)
1.50 2195.9 1545.0 (70.4%) 1611.2 (73.4%) 1605.2 (73.1%) 1812.6 (82.5%)
1.75 2369.2 1595.2 (67.3%) 1729.8 (73.0%) 1638.3 (69.1%) 1950.1 (82.3%)
2.00 2485.3 1632.2 (65.7%) 1820.3 (73.2%) 1657.2 (66.7%) 2076.5 (83.6%)
2.25 2562.0 1661.4 (64.8%) 1886.5 (73.6%) 1669.5 (65.2%) 2165.9 (84.5%)
2.50 2615.0 1672.4 (64.0%) 1930.0 (73.8%) 1675.1 (64.1%) 2236.3 (85.5%)

Each configuration was simulated at least 1,000 times. All standard errors are less than 0.19% of the associated LP UB.

Table B.9: Pricing Experiment T=80

Lambda Scale LP-UB TDR TDR-UB GREEDY TVDB

0.50 891.8 707.7 (79.4%) 660.7 (74.1%) 820.5 (92.0%) 830.4 (93.1%)
0.75 1278.4 1049.3 (82.1%) 1003.2 (78.5%) 1145.8 (89.6%) 1190.0 (93.1%)
1.00 1636.0 1297.9 (79.3%) 1262.4 (77.2%) 1397.5 (85.4%) 1478.7 (90.4%)
1.25 1953.6 1485.4 (76.0%) 1456.3 (74.5%) 1539.0 (78.8%) 1671.3 (85.5%)
1.50 2195.9 1609.6 (73.3%) 1615.5 (73.6%) 1605.4 (73.1%) 1841.2 (83.8%)
1.75 2369.2 1684.3 (71.1%) 1730.9 (73.1%) 1637.3 (69.1%) 1977.9 (83.5%)
2.00 2485.3 1735.5 (69.8%) 1817.8 (73.1%) 1658.5 (66.7%) 2085.8 (83.9%)
2.25 2562.0 1770.6 (69.1%) 1872.1 (73.1%) 1671.1 (65.2%) 2174.9 (84.9%)
2.50 2615.0 1795.4 (68.7%) 1909.5 (73.0%) 1673.7 (64.0%) 2241.9 (85.7%)

Each configuration was simulated at least 1,000 times. All standard errors are less than 0.19% of the associated LP UB.

Appendix C

Technical Results for Chapter 4

C.1 Proofs

C.1.1 Proof of Lemma 6

Proof. We note that the $j$-th component of $\nabla\ell_n(\theta^*)$ takes the form $[\nabla\ell_n(\theta^*)]_j = \frac{1}{n}\sum_{i=1}^{n} W_{ij}$, where $W_{ij} = \left(\frac{e^{\langle x_i, \theta^*\rangle}}{1+e^{\langle x_i, \theta^*\rangle}} - y_i\right) x_{ij}$. Conditioned on $x_i$, $W_{ij}$ is a zero-mean bounded random variable with $|W_{ij}| \leq |x_{ij}| \leq B$. For the fixed design setting, applying Hoeffding's inequality, we have $\Pr\left(|[\nabla\ell_n(\theta^*)]_j| \geq t\right) \leq 2\exp\left(-\frac{nt^{2}}{2B^{2}}\right)$. By a union bound, we have

\[
\Pr\left(\|\nabla\ell_n(\theta^*)\|_{\infty} \geq t\right) \leq 2\exp\left(\log(d) - \frac{nt^{2}}{2B^{2}}\right). \tag{C.1}
\]

By setting $t = B\sqrt{\frac{2\log(2nd)}{n}}$, we make the R.H.S. of (C.1) equal to $\frac{1}{n}$, which gives the desired result.
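For completeness, the substitution can be verified in one line (a worked step added here, not part of the original text):
\[
2\exp\left(\log(d) - \frac{n t^{2}}{2B^{2}}\right) = 2\exp\big(\log(d) - \log(2nd)\big) = \frac{2d}{2nd} = \frac{1}{n}.
\]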

In the randomized design setting, 푊푖푗 is a centered sub-Gaussian random variable with the norm bounded above by 휓. To see this, observe that

\[
\mathbb{E}\exp\left(tW_{ij}\right) \leq \mathbb{E}\exp\left(|tW_{ij}|\right) \leq \mathbb{E}\exp\left(|tx_{ij}|\right) \leq \mathbb{E}\exp\left(|t|\langle \operatorname{sign}(x_{ij})e_j, x_i\rangle\right) \leq \exp\left(ct^{2}\psi^{2}\right)
\]

for some constant 푐. Applying Hoeffding’s inequality this implies that,

\[
\Pr\left(|[\nabla\ell_n(\theta^*)]_j| \geq t\right) \leq 2\exp\left(-\frac{c_1 n t^{2}}{\psi^{2}}\right).
\]

By a union bound, we then have that,

\[
\Pr\left(\|\nabla\ell_n(\theta^*)\|_{\infty} \geq t\right) \leq 2\exp\left(\log(d) - \frac{c_1 n t^{2}}{\psi^{2}}\right). \tag{C.2}
\]

By setting $t = \psi\sqrt{\frac{\log(2nd)}{c_1 n}}$, we make the R.H.S. of (C.2) equal to $\frac{1}{n}$, which gives the desired result.
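The analogous worked step (added here) for this choice of $t$:
\[
2\exp\left(\log(d) - \frac{c_1 n t^{2}}{\psi^{2}}\right) = 2\exp\big(\log(d) - \log(2nd)\big) = \frac{2d}{2nd} = \frac{1}{n}.
\]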

C.1.2 Proof of Lemma 7

Proof. The Taylor expansion of $\ell_n$ implies that for some $\alpha \in (0, 1)$ we have

\[
\ell_n(\theta^* + \hat{\Delta}) - \ell_n(\theta^*) - \langle\nabla\ell_n(\theta^*), \hat{\Delta}\rangle = \frac{1}{2n}\sum_{i=1}^{n}\frac{\exp(x_i^{T}\theta^* + \alpha x_i^{T}\hat{\Delta})}{(1+\exp(x_i^{T}\theta^* + \alpha x_i^{T}\hat{\Delta}))^{2}}\,\hat{\Delta}^{T} x_i x_i^{T}\hat{\Delta}. \tag{C.3}
\]

Further,

\[
|x_i^{T}\theta^* + \alpha x_i^{T}\hat{\Delta}| \leq \|x_i\|_{\infty}\|\theta^* + \alpha\hat{\Delta}\|_{1} \leq \|x_i\|_{\infty}\left(\alpha\|\theta^*\|_{1} + (1-\alpha)\|\hat{\theta}\|_{1}\right) \leq RB.
\]

Since the function $\frac{\exp(a)}{(1+\exp(a))^{2}} \leq \frac{1}{4}$ is an even function and monotonically decreasing as $|a|$ increases, we have

\[
\frac{\exp(x_i^{T}\theta^* + \alpha x_i^{T}\hat{\Delta})}{(1+\exp(x_i^{T}\theta^* + \alpha x_i^{T}\hat{\Delta}))^{2}} \geq \frac{\exp(RB)}{(1+\exp(RB))^{2}}, \tag{C.4}
\]
which further implies that,

\[
\begin{aligned}
\frac{1}{2n}\sum_{i=1}^{n}\frac{\exp(x_i^{T}\theta^* + \alpha x_i^{T}\hat{\Delta})}{(1+\exp(x_i^{T}\theta^* + \alpha x_i^{T}\hat{\Delta}))^{2}}\,\hat{\Delta}^{T} x_i x_i^{T}\hat{\Delta}
&\geq \frac{1}{2}\,\frac{\exp(RB)}{(1+\exp(RB))^{2}}\,\hat{\Delta}^{T}\left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i^{T}\right)\hat{\Delta} \\
&\geq \frac{1}{2}\,\frac{\exp(RB)}{(1+\exp(RB))^{2}}\,\lambda_{\min}(\Sigma_n)\,\|\hat{\Delta}\|_{2}^{2}.
\end{aligned}
\]
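The curvature facts used above are easy to check numerically; the snippet below (illustrative, with arbitrary values of $R$ and $B$) verifies that $g(a) = \exp(a)/(1+\exp(a))^{2}$ is even, bounded by $1/4$, and minimized over $|a| \leq RB$ at the endpoint:

```python
import math

def g(a):
    # logistic curvature term exp(a) / (1 + exp(a))^2
    return math.exp(a) / (1.0 + math.exp(a)) ** 2

R, B = 2.0, 3.0
grid = [i / 100.0 for i in range(0, int(100 * R * B) + 1)]
assert all(abs(g(a) - g(-a)) < 1e-12 for a in grid)        # even in a
assert all(g(a) <= 0.25 + 1e-12 for a in grid)             # bounded by 1/4
assert all(g(a) >= g(R * B) - 1e-12 for a in grid)         # minimized at |a| = R*B
print(g(R * B), 1.0 / (4.0 * math.cosh(R * B / 2.0) ** 2)) # two equal forms
```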

In the deterministic setting, by the assumption that $\lambda_{\min}(\Sigma_n) \geq \frac{\rho}{2}$, we obtain the desired result in (4.9).

In the randomized setting, by Corollary 5.50 from Vershynin (2012), for any $\epsilon \in (0, 1)$ and $t \geq 1$, when $n \geq C_{cp}(\psi)\left(\frac{t}{\epsilon}\right)^{2} d$ (where $C_{cp}(\psi)$ is a constant which depends only on $\psi$), we have that with probability at least $1 - 2\exp(-t^{2}d)$, $\|\Sigma_n - \Sigma\|_{op} \leq \epsilon$. By setting $\epsilon = \frac{1}{2}\min(\rho, 1)$ and $t = \sqrt{\log(n)}$, we have with probability at least $1 - 2\left(\frac{1}{n}\right)^{d}$, $\|\Sigma_n - \Sigma\|_{op} \leq \frac{1}{2}\rho$, provided that $n \geq \frac{4 C_{cp}(\psi)\log(n)\, d}{\min(\rho, 1)^{2}}$.

By Weyl's theorem (see Theorem 4.3.1 in Horn and Johnson (2012), for example), we have $|\lambda_{\min}(\Sigma_n) - \lambda_{\min}(\Sigma)| \leq \|\Sigma_n - \Sigma\|_{op} \leq \frac{1}{2}\rho$, which further implies that $\lambda_{\min}(\Sigma_n) \geq \lambda_{\min}(\Sigma) - \frac{1}{2}\rho \geq \frac{1}{2}\rho$. This completes the proof.
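The Weyl-type step, $|\lambda_{\min}(\Sigma_n) - \lambda_{\min}(\Sigma)| \leq \|\Sigma_n - \Sigma\|_{op}$, can also be spot-checked numerically (an illustration added here; the random matrices below simply stand in for $\Sigma_n$ and $\Sigma$):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    d = 5
    A = rng.normal(size=(d, d)); A = A @ A.T          # symmetric PSD, stands in for Sigma_n
    E = rng.normal(size=(d, d)); E = (E + E.T) / 2    # symmetric perturbation
    B = A + E                                         # stands in for Sigma
    gap = abs(np.linalg.eigvalsh(A)[0] - np.linalg.eigvalsh(B)[0])
    assert gap <= np.linalg.norm(E, ord=2) + 1e-10    # |lmin(A) - lmin(B)| <= ||A - B||_op
```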

Bibliography

Amatriain, X. 2013. Big & personal: data and models behind netflix recommendations. Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications. ACM, 1–6.
Aouad, Ali, Vivek F Farias, Retsef Levi. 2015. Assortment optimization under consider-then-choose choice models. Available at SSRN 2618823.
Aviv, Y., G. Vulcano, Ö. Özer, R. Phillips. 2012. Dynamic list pricing. The Oxford Handbook of Pricing Management. Oxford University Press, Oxford 522–584.
Aydin, G., S. Ziya. 2009. Technical note-personalized dynamic pricing of limited inventories. Operations Research 57(6) 1523–1531.
Belobaba, P. 1989. Application of a probabilistic decision model to airline seat inventory control. Operations Research 37 183–197.
Berbeglia, Gerardo, Gwenaël Joret. 2015. Assortment optimisation under a general discrete choice model: A tight analysis of revenue-ordered assortments. Available at SSRN 2620165.
Bernstein, F., A. G. Kök, L. Xie. 2011. Dynamic assortment customization with limited inventories. Working Paper.
Bernstein, Fernando, A Gürhan Kök, Lei Xie. 2015. Dynamic assortment customization with limited inventories. Manufacturing & Service Operations Management 17(4) 538–553.
Bertsekas, D., A. Nedić, A. Ozdaglar. 2003. Convex Analysis and Optimization. Athena Scientific.
Bertsimas, D., P. Vayanos. 2014. Data-driven learning in dynamic pricing using adaptive optimization. Submitted to Operations Research.
Blanchet, Jose H, Guillermo Gallego, Vineet Goyal. 2013. A markov chain approximation to choice modeling. EC. Citeseer, 103–104.
Broder, J., P. Rusmevichientong. 2012. Dynamic pricing under a general parametric choice model. Operations Research 60(4) 965–980.
Bront, Juan José Miranda, Isabel Méndez-Díaz, Gustavo Vulcano. 2009. A column generation algorithm for choice-based network revenue management. Operations Research 57(3) 769–784.
Calinescu, Gruia, Chandra Chekuri, Martin Pál, Jan Vondrák. 2007. Maximizing a submodular set function subject to a matroid constraint. International Conference on Integer Programming and Combinatorial Optimization. Springer, 182–196.
Carvalho, A., M. Puterman. 2005. Learning and pricing in an environment with binomial demands. Journal of Revenue and Pricing Management 3(4) 320–336.

Chen, Yiwei, Retsef Levi, Cong Shi. 2016. Revenue management of reusable resources with advanced reservations. Production and Operations Management.
Chihara, Theodore S. 2011. An introduction to orthogonal polynomials. Courier Corporation.
Cook, Stephen A. 1971. The complexity of theorem-proving procedures. Proceedings of the third annual ACM symposium on Theory of computing. ACM, 151–158.
Davis, James M, Guillermo Gallego, Huseyin Topaloglu. 2014. Assortment optimization under variants of the nested logit model. Operations Research 62(2) 250–273.
F. Bach, J. Mairal, R. Jenatton, G. Obozinski. 2011. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning 4(1) 1–106.
Fill, James Allen. 2009. On hitting times and fastest strong stationary times for skip-free and more general chains. Journal of Theoretical Probability 22(3) 587–600.
Gallego, Guillermo, Anran Li, Van-Anh Truong, Xinshang Wang. 2015. Online resource allocation with customer choice. arXiv preprint arXiv:1511.01837.
Gallego, Guillermo, Anran Li, Van-Anh Truong, Xinshang Wang. 2016. Online personalized resource allocation with customer choice.
Gallego, Guillermo, Huseyin Topaloglu. 2014. Constrained assortment optimization for the nested logit model. Management Science 60(10) 2583–2601.
Gans, Noah, Sergei Savin. 2007. Pricing and capacity rationing for rentals with uncertain durations. Management Science 53(3) 390–407.
Golrezaei, N., H. Nazerzadeh, P. Rusmevichientong. 2014. Real-time optimization of personalized assortments. Management Science 60(6) 1532–1551.
Gurvich, I., M. Armony, C. Maglaras. 2009. Cross-selling in a call center with a heterogeneous customer population. Operations Research 57(2) 299–313.
Harel, Arie. 1990. Convexity properties of the erlang loss formula. Operations Research 38(3) 499–505.
Horn, R., C. Johnson. 2012. Matrix Analysis. 2nd ed. Cambridge University Press.
Iyengar, Garud, Karl Sigman. 2004. Exponential penalty function control of loss networks. Annals of Applied Probability 1698–1740.
Jagabathula, Srikanth. 2014. Assortment optimization under general choice. Available at SSRN 2512831.
Keskin, N. B., A. Zeevi. 2014. Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research 62(5) 1142–1167.
Lei, Yanzhe, Stefanus Jasin. 2016. Real-time dynamic pricing for revenue management with reusable resources and deterministic service time requirements. Available at SSRN 2816718.
Levi, Retsef, Ana Radovanovic. 2007. Provably near-optimal lp-based policies for revenue management in systems with reusable resources. Technical report 4702-08 8.
Levi, Retsef, Ana Radovanovic. 2010. Provably near-optimal lp-based policies for revenue management in systems with reusable resources. Operations Research 58(2) 503–507.
Levin, David Asher, Yuval Peres, Elizabeth Lee Wilmer. 2009. Markov chains and mixing times. American Mathematical Soc.
Linden, G., B. Smith, J. York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. Internet Computing, IEEE 7(1) 76–80.

Liu, Qian, Garrett Van Ryzin. 2008. On the choice-based linear programming model for network revenue management. Manufacturing & Service Operations Management 10(2) 288–310.
Maglaras, Constantinos, Joern Meissner. 2006. Dynamic pricing strategies for multiproduct revenue management problems. Manufacturing & Service Operations Management 8(2) 136–148.
Mehta, Aranyak. 2013. Online matching and ad allocation. Foundations and Trends in Theoretical Computer Science 8(4) 265–368. doi:10.1561/0400000057. URL http://dx.doi.org/10.1561/0400000057.
Murthi, B., S. Sarkar. 2003. The role of the management sciences in research on personalization. Management Science 49(10) 1344–1362.
Negahban, S., P. Ravikumar, M. Wainwright, B. Yu. 2012. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science 27(4) 538–557.
Netessine, S., S. Savin, W. Xiao. 2006. Revenue management through dynamic cross selling in e-commerce retailing. Operations Research 54(5) 893–913.
Paschalidis, Ioannis, Yong Liu. 2002. Pricing in multiservice loss networks: static pricing, asymptotic optimality, and demand substitution effects. IEEE/ACM Transactions on Networking (TON) 10(3) 425–438.
Paschalidis, Ioannis, John N Tsitsiklis. 2000. Congestion-dependent pricing of network services. IEEE/ACM Transactions on Networking 8(2) 171–184.
Phillips, R. 2005. Pricing and Revenue Optimization. Stanford Business Books.
Pixton, Clark, David Simchi-Levi. 2016. Branch-and-bound algorithms for assortment optimization under weakly rational choice.
Rao, C., H. Toutenburg, Shalabh, C. Heumann. 2008. Linear Models and Generalizations: Least Squares and Alternatives. Springer-Verlag.
Rusmevichientong, P., Z. Shen, D. Shmoys. 2010a. Dynamic assortment optimization with a multinomial logit choice model and a capacity constraint. Operations Research 58(6) 1666–1680.
Rusmevichientong, Paat, Zuo-Jun Max Shen, David B Shmoys. 2010b. Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Operations Research 58(6) 1666–1680.
S. Boyd, E. Chu, B. Peleato, N. Parikh, J. Eckstein. 2010. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1).
Savin, Sergei V, Morris A Cohen, Noah Gans, Ziv Katalan. 2005. Capacity management in rental businesses with two customer bases. Operations Research 53(4) 617–631.
Talluri, K., G. van Ryzin. 2004a. Revenue management under a general discrete choice model of consumer behavior. Management Science 50(1) 15–33.
Talluri, K., G. van Ryzin. 2004b. The Theory and Practice of Revenue Management. Kluwer Academic Publishers.
van der Vaart, A., J. Wellner. 2000. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer.
van Doorn, Erik A, Alexander I Zeifman. 2009. On the speed of convergence to stationarity of the erlang loss system. Queueing Systems 63(1-4) 241–252.

van Ryzin, G., S. Mahajan. 1999. On the relationship between inventory costs and variety benefits in assortments. Management Science 45(11) 1496–1509.
Vershynin, R. 2012. Compressed Sensing, Theory and Application, chap. 5. Cambridge University Press, 210–268.
Vulcano, G., G. van Ryzin, R. Ratliff. 2008. Estimating primary demand for substitutable products from sales transaction data. Operations Research 60(2) 313–334.
Wang, Xinshang, V Truong, D Bank. 2015. Online advance admission scheduling for services, with customer preferences. Tech. rep., Working paper.
