Censored Exploration and the Dark Pool Problem

Kuzman Ganchev, Michael Kearns, Yuriy Nevmyvaka, Jennifer Wortman Vaughan
Computer and Information Science, University of Pennsylvania

Abstract

We introduce and analyze a natural algorithm for multi-venue exploration from censored data, which is motivated by the Dark Pool Problem of modern quantitative finance. We prove that our algorithm converges in polynomial time to a near-optimal allocation policy; prior results for similar problems in stochastic inventory control guaranteed only asymptotic convergence and examined variants in which each venue could be treated independently. Our analysis bears a strong resemblance to that of efficient exploration/exploitation schemes in the reinforcement learning literature. We describe an extensive experimental evaluation of our algorithm on the Dark Pool Problem using real trading data.

1 Introduction

We analyze a framework and algorithm for the problem of multi-venue exploration from censored data. Consider a setting in which, at each time period, we have some volume of $V$ units (possibly varying with time) of an abstract good. Our goal is to "sell" or "consume" as many of these units as possible at each step, and there are $K$ abstract "venues" in which this selling or consumption may occur. We can divide our $V$ units in any way we like across the venues in service of this goal. Our interest in this paper is in how to efficiently learn a near-optimal allocation policy over time, under stochastic assumptions on the venues.

This setting belongs to a broad class of problems known in the operations research literature as perishable inventory problems (see Related Work below). In the Dark Pool Problem (discussed extensively in Section 5), at each time step a trader must buy or sell up to $V$ shares of a given stock on behalf of a client,¹ and does so by distributing or allocating them over multiple distinct exchanges (venues) known as dark pools. Dark pools are a recent type of exchange in which relatively little information is provided about the current outstanding orders (Wikipedia, 2009; Bogoslaw, 2007). The trader would like to execute as many of the $V$ shares as possible. If $v_i$ shares are allocated to dark pool $i$, and all of them are executed, the trader learns only that the liquidity available at exchange $i$ was at least $v_i$, not the actual larger number that could have executed there; this important aspect of our framework is known as censoring in the statistics literature.

In this work we make the natural and common assumption that the maximum amount of consumption available in venue $i$ at each time step (e.g., the total liquidity available in the example above) is drawn according to a fixed but unknown distribution $P_i$. Formally speaking, this means that when $v_i$ units are submitted to venue $i$, a value $s_i$ is drawn randomly from $P_i$ and the observed (and possibly censored) amount of consumption is $\min\{s_i, v_i\}$.

A learning algorithm receives a sequence of volumes $V^1, V^2, \ldots$ and must decide how to distribute the $V^t$ units across the venues at each time step $t$. Our goal is to efficiently (in time polynomial in the "complexity" of the $P_i$ and other parameters) learn a near-optimal allocation policy. There is a distinct between-venue exploration component to this problem, since the "right" number of shares to submit to venue $i$ may depend on both $V^t$ and the distributions for the other venues, and the only mechanism by which we can discover the distributions is by submitting allocations. If we routinely submit too-small volumes to a venue, we receive censored observations and underutilize the venue; if we submit too-large volumes, we receive uncensored (or direct) observations but have excess inventory.

¹In our setting it is important that we view $V$ as given exogenously by the client and not under the trader's control, which distinguishes our setting somewhat from prior works; see Related Work.
Our main theoretical contribution is a provably polynomial-time algorithm for learning a near-optimal policy for any unknown venue distributions $P_i$. This algorithm takes a particularly natural and appealing form, in which allocation and distribution reestimation are repeatedly alternated. More precisely, at each time step we maintain distributional estimates $\hat{P}_i$; pretending that these estimates are in fact exactly correct, we allocate the current volume $V$ accordingly. These allocations generate observed consumptions in each venue, which in turn are used to update or reestimate the $\hat{P}_i$. We show that when the $\hat{P}_i$ are "optimistic tail modifications" of the classical Kaplan-Meier maximum likelihood estimator for censored data, this estimate-allocate loop has provably efficient between-venue exploration behavior that yields the desired result. Venues with smaller available volumes (relative to the overall volume $V^t$ and the other venues) are gradually given smaller allocations in the estimate-allocate loop, whereas venues with repeated censored observations are gradually given larger allocations, eventually settling on a near-optimal overall allocation distribution.

Our main theoretical contribution is thus the development and analysis of a multiple-venue, polynomial-time, near-optimal allocation learning algorithm, while our main experimental contribution is the application of this algorithm to the Dark Pool Problem.

2 Preliminaries

We consider the following problem. At each time step $t$, a learner is presented with a quantity or volume $V^t \in \{1, \ldots, V\}$ of units, where $V^t$ is sampled from an unknown distribution $Q$. The learner must decide on an allocation $v^t$ of these shares to a set of $K$ known venues, with $v_i^t \in \{0, \ldots, V^t\}$ for each $i \in \{1, \ldots, K\}$, and $\sum_{i=1}^{K} v_i^t = V^t$. The learner is then told the number of units $r_i^t$ consumed at each venue $i$. Here $r_i^t = \min\{s_i^t, v_i^t\}$, where $s_i^t$ is the maximum consumption level of venue $i$ at time $t$, which is sampled independently from a fixed but unknown distribution $P_i$. If $r_i^t = v_i^t$, we say that the algorithm receives a censored observation, because it is possible to infer only that $r_i^t \le s_i^t$. If $r_i^t < v_i^t$, the algorithm receives a direct (uncensored) observation, since it can infer that $s_i^t = r_i^t$ exactly.
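To make the observation model concrete, here is a minimal Python sketch of a single venue (an illustration of ours, not code from the paper; the class name Venue and its interface are assumptions for exposition):

import random

class Venue:
    """A venue whose maximum consumption s is drawn i.i.d. from a fixed
    but (to the learner) unknown distribution P_i over {0, 1, ..., S_max}."""
    def __init__(self, probs):
        self.probs = probs  # probs[s] = P_i(s); entries must sum to 1

    def submit(self, v):
        """Submit v units: draw s ~ P_i and consume r = min(s, v).
        The observation is censored exactly when r == v (we learn only
        that s >= v); otherwise it is direct and reveals s = r."""
        s = random.choices(range(len(self.probs)), weights=self.probs)[0]
        r = min(s, v)
        return r, (r == v)

For example, Venue([0.5, 0.3, 0.2]).submit(1) returns either (0, False), a direct observation that the demand was zero, or (1, True), a censored observation revealing only that the demand was at least one.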

Proof: Using the fact that tail probabilities must satisfy $\hat{T}_i(s) \ge \hat{T}_i(s')$ for all $s \le s'$, it is easy to verify that by greedily adding units to the venues in decreasing order of $\hat{T}_i(s)$, Algorithm 1 returns
\[
\operatorname*{argmax}_{v} \sum_{i=1}^{K} \sum_{s=1}^{v_i} \hat{T}_i(s) \quad \text{s.t.} \quad \sum_{i=1}^{K} v_i = V.
\]
It remains to show that the expression above is equivalent to the expected number of units consumed. We do this algebraically. For an arbitrary venue $i$,
\begin{align*}
E_{s \sim \hat{P}_i}[\min(s, v_i)] &= \sum_{s=1}^{\infty} \hat{P}_i(s) \min(s, v_i) = \sum_{s=1}^{v_i - 1} s \hat{P}_i(s) + v_i \hat{T}_i(v_i) \\
&= \sum_{s=1}^{v_i - 2} s \hat{P}_i(s) + (v_i - 1) \hat{T}_i(v_i - 1) + \hat{T}_i(v_i) \\
&= \hat{T}_i(1) + \cdots + \hat{T}_i(v_i - 1) + \hat{T}_i(v_i) = \sum_{s=1}^{v_i} \hat{T}_i(s).
\end{align*}
The last two lines follow from the observation that for any $s$, $\hat{P}_i(s-1) + \hat{T}_i(s) = \hat{T}_i(s-1)$, and so $(s-1)\hat{P}_i(s-1) + s\hat{T}_i(s) = (s-1)\hat{T}_i(s-1) + \hat{T}_i(s)$. Thus $\sum_{i=1}^{K} \sum_{s=1}^{v_i} \hat{T}_i(s) = \sum_{i=1}^{K} E_{s \sim \hat{P}_i}[\min(s, v_i)]$, which is the expected number of units consumed.

Algorithm 2: Main algorithm.
  Input: Volume sequence $V^1, V^2, V^3, \ldots$
  Arbitrarily initialize $\hat{T}_i^1$ for each $i$;
  for $t \leftarrow 1, 2, 3, \ldots$ do
    % Allocation Step:
    $v^t \leftarrow$ Greedy$(V^t, \hat{T}_1^t, \ldots, \hat{T}_K^t)$;
    for $i \in \{1, \ldots, K\}$ do
      Submit $v_i^t$ units to venue $i$;
      Let $r_i^t$ be the number of shares sold;
    end
    % Reestimation Step:
    For each $i$, $\hat{T}_i^{t+1} \leftarrow$ OptimisticKM$(\{(v_i^\tau, r_i^\tau)\}_{\tau=1}^{t})$;
  end

The only undetermined part of Algorithm 2 is the subroutine OptimisticKM, which specifies how we estimate $\hat{T}_i^t$ from the observed data. The most natural choice would be the maximum likelihood estimator on the data. This estimator is well known in the statistics literature as the Kaplan-Meier estimator. In the following section, we describe Kaplan-Meier and derive a new convergence result that suits our particular needs. This result in turn lets us define an "optimistic tail modification" of Kaplan-Meier that becomes our choice for OptimisticKM.
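Before proceeding, the Greedy subroutine used in the allocation step of Algorithm 2 is easy to illustrate in code. The following Python sketch is ours; it assumes, as in the earlier sketch, that tail estimates are stored as lists with T_hat[i][0] = 1:

def greedy_allocation(V, T_hat):
    """Hand out the V units one at a time, each to the venue whose next
    unit has the largest estimated tail probability T_hat[i][v_i + 1].
    Because each T_hat[i] is non-increasing in s, this maximizes
    sum_i sum_{s=1}^{v_i} T_hat[i][s] subject to sum_i v_i = V."""
    K = len(T_hat)
    alloc = [0] * K
    for _ in range(V):
        def value(i):  # estimated value of the next unit at venue i
            s = alloc[i] + 1
            return T_hat[i][s] if s < len(T_hat[i]) else 0.0
        alloc[max(range(K), key=value)] += 1
    return alloc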
4 Censored Exploration Algorithm

In this section we present our main theoretical result, which is a polynomial-time, near-optimal algorithm for multi-venue exploration from censored data. We first provide an overview of the algorithm and its analysis before diving into the technical details. As mentioned in the Introduction, the analysis bears a strong resemblance to the "known and unknown state" exploration-exploitation arguments common in the $E^3$ and RMAX algorithms for reinforcement learning (Kearns and Singh, 2002; Brafman and Tennenholtz, 2003).

The analysis of Algorithm 2, which is developed in detail over the next few sections, proceeds as follows:

Step 1: We first review the Kaplan-Meier maximum likelihood estimator for censored data and provide a new finite sample convergence bound for this estimator. This bound allows us to define a "cut-off point" for each venue $i$ such that the Kaplan-Meier estimate of the tail probability $T_i(s)$ for every value of $s$ up to the cut-off point is guaranteed to be close to the true tail probability. We then define a slightly modified version of the Kaplan-Meier estimates in which the tail probability of the next unit above the cut-off is modified in an optimistic manner. We show that in conjunction with the greedy allocation algorithm, this minor modification leads to increased exploration, since the next unit beyond the cut-off point always looks at least as good as the cut-off point itself.

Step 2: We next prove our main Exploitation Lemma (Lemma 4). This lemma shows that at any time step, if the number of units allocated to each venue by the greedy algorithm is strictly below the cut-off point for that venue — which can be thought of as being in a known state in the parlance of $E^3$ and RMAX — then the allocation is provably $\epsilon$-optimal.

Step 3: We then prove our main Exploration Lemma (Lemma 5), which shows that on any time step at which the allocation made by the greedy algorithm is not $\epsilon$-optimal, it is possible to lower bound the probability that the algorithm explores. Thus, as with $E^3$ and RMAX, any time we are not in a known state and thus cannot ensure optimal allocation, we are instead assured of exploring.

Step 4: Finally, we show that on any sufficiently long sequence of time steps (where "sufficiently long" is polynomial in $K$, $V$, $1/\epsilon$, and $\ln(1/\delta)$, where $\delta$ is a standard confidence parameter), it must be the case that either the algorithm achieves an $\epsilon$-optimal solution on at least a $1 - \epsilon$ fraction of the sequence, or the algorithm has explored sufficiently often to learn accurate estimates of the tail distributions out to $V$ units on every venue. In either case, we can show that with probability $1 - \delta$, at the end of the sequence, the current algorithm achieves an $\epsilon$-optimal solution with probability at least $1 - \epsilon$.

4.1 Convergence of Kaplan-Meier Estimators

We begin by describing the standard Kaplan-Meier maximum likelihood estimator for censored data (Kaplan and Meier, 1958; Peterson, 1983), restricting our attention to a single venue $i$. Let $z_{i,s}$ be the true probability that the demand in this venue is exactly $s$ units given that the demand is at least $s$ units. Using the fact that $1 - z_{i,s}$ is the conditional probability of a demand of at least $s+1$ given a demand of at least $s$, it is easy to verify that for any $s > 0$, $T_i(s) = \prod_{s'=0}^{s-1} (1 - z_{i,s'})$. At a high level, we can think of Kaplan-Meier as first computing an estimate of $z_{i,s}$ for each $s$ and then using these estimates to compute an estimate of $T_i(s)$.

Let $I$ be an indicator function taking on the value 1 if its input is true and 0 otherwise. Let $D_{i,s}^t = \sum_{\tau=1}^{t} I[r_i^\tau = s, v_i^\tau > s]$ be the number of direct observations of $s$ units up to time $t$, and let $N_{i,s}^t = \sum_{\tau=1}^{t} I[r_i^\tau \ge s, v_i^\tau > s]$ be the number of (direct or censored) observations of at least $s$ units on time steps at which more than $s$ units were requested. The quantity $N_{i,s}^t$ is then the number of times there was an opportunity for a direct observation of $s$ units, whether or not one occurred.

We can then naturally define $\hat{z}_{i,s}^t = D_{i,s}^t / N_{i,s}^t$, with $\hat{z}_{i,s}^t = 0$ if $N_{i,s}^t = 0$. This quantity is simply the empirical probability of a direct observation of $s$ units given that a direct observation of $s$ units was possible. The Kaplan-Meier estimator of the tail probability for any $s > 0$ after $t$ time steps can then be expressed as
\[
\hat{T}_i^t(s) = \prod_{s'=0}^{s-1} \left(1 - \hat{z}_{i,s'}^t\right), \tag{1}
\]
with $\hat{T}_i^t(0) = T_i(0) = 1$ for all $t$.

Previous work has established convergence rates for the Kaplan-Meier estimator to the true underlying distribution in the case that the submission sequence $v_i^1, \ldots, v_i^t$ is i.i.d. (see, for example, Foldes and Rejto (1981)), and asymptotic convergence for non-i.i.d. settings (Huh et al., 2009). We are not in the i.i.d. case, since the submitted volumes at one venue are a function of the entire history of allocations and executions across all venues. In the following theorem we give a new finite sample convergence bound applicable to our setting.

Theorem 2. Let $\hat{T}_i^t$ be the Kaplan-Meier estimate of $T_i$ as given in Equation 1. For any $\delta > 0$, with probability at least $1 - \delta$, for every $s \in \{1, \ldots, V\}$,
\[
\left| T_i(s) - \hat{T}_i^t(s) \right| \le s \sqrt{2 \ln(2V/\delta) / N_{i,s-1}^t}.
\]

The proof depends on the following lemma, whose proof is omitted.

Lemma 1. For each $i \in \{1, \ldots, \ell\}$, let $x_i$ and $y_i$ be real numbers in $[0,1]$, with $|x_i - y_i| \le \epsilon_i$ for some $\epsilon_i > 0$. Then $\left| \prod_{i=1}^{\ell} x_i - \prod_{i=1}^{\ell} y_i \right| \le \sum_{i=1}^{\ell} \epsilon_i$.

Proof of Theorem 2: We will show that $\hat{z}_{i,s}^t$ converges to $z_{i,s}$, and that this implies that the Kaplan-Meier tail probability estimator converges to $T_i(s)$. Consider a fixed value of $s$. Let $t_n$ be the index $\tau$ of the $n$th time step at which $r_i^\tau \ge s$ and $v_i^\tau > s$. By definition, there are $N_{i,s}^t$ such time steps in total. For each $n \in \{1, \ldots, N_{i,s}^t\}$, let
\[
X_n = \sum_{\ell=1}^{n} \left( z_{i,s} - I[r_i^{t_\ell} = s] \right).
\]
It is easy to see that the sequence $X_1, \ldots, X_{N_{i,s}^t}$ forms a martingale; for each $n$, we have $|X_n - X_{n+1}| \le 1$ and $E[X_{n+1} \mid X_n] = X_n$. By Azuma's inequality (see, for example, Alon and Spencer (2000)), for any $\gamma$,
\[
\Pr\left( \left| X_{N_{i,s}^t} \right| \ge \gamma \right) \le 2 e^{-\gamma^2 / (2 N_{i,s}^t)}.
\]
Noting that $X_{N_{i,s}^t} = N_{i,s}^t (z_{i,s} - \hat{z}_{i,s}^t)$ and setting $\gamma = \epsilon_s N_{i,s}^t$ gives us that $\Pr(|z_{i,s} - \hat{z}_{i,s}^t| \ge \epsilon_s) \le 2 e^{-\epsilon_s^2 N_{i,s}^t / 2}$. Setting this equal to $\delta / V$ and applying a union bound gives us that with probability $1 - \delta$, for every $s \in \{0, \ldots, V-1\}$, $|z_{i,s} - \hat{z}_{i,s}^t| \le \sqrt{2 \ln(2V/\delta) / N_{i,s}^t}$.

Assume this holds for all $s$. Then it follows from Lemma 1 that for any particular $s \in \{1, \ldots, V\}$,
\[
\left| T_i(s) - \hat{T}_i^t(s) \right| \le \sum_{s'=0}^{s-1} \sqrt{\frac{2 \ln(2V/\delta)}{N_{i,s'}^t}} \le s \sqrt{\frac{2 \ln(2V/\delta)}{N_{i,s-1}^t}}.
\]

4.2 Modifying Kaplan-Meier

In Algorithm 3 we describe the minor modification of Kaplan-Meier necessary for our analysis. As described above, for each venue $i$ we compute a cut-off $c_i^t$ based on the convergence bound of Section 4.1. For all $s \le c_i^t$, $\hat{T}_i^t(s)$ is precisely the Kaplan-Meier estimate as in Equation 1. However, to promote exploration, we optimistically set the value of $\hat{T}_i^t(c_i^t + 1)$ so that the unit just beyond the cut-off looks at least as good as the cut-off itself. Without this modification, it could be the case that the greedy allocation persistently stops exactly at a venue's cut-off (we never gain information about the value of reallocating a unit from another venue, since we do not have an accurate estimate of the tail probability for this unit), and yet no exploration is taking place. By optimistically modifying the tail probability $\hat{T}_i^t(c_i^t + 1)$ for each venue, we ensure that no venue remains unexplored simply because the algorithm unluckily observes a low demand a small number of times.

Algorithm 3: Subroutine OptimisticKM for computing modified Kaplan-Meier estimators. For all $s$, let $D_{i,s}^t$ and $N_{i,s}^t$ be defined as above, and assume that $\epsilon > 0$ and $\delta > 0$ are fixed parameters.
  Input: Observed data $\{(v_i^\tau, r_i^\tau)\}_{\tau=1}^{t}$ for venue $i$
  Output: Modified Kaplan-Meier estimators for venue $i$
  % Calculate the cut-off:
  $c_i^t \leftarrow \max\{s : s = 0 \text{ or } N_{i,s-1}^t \ge 128 (sV/\epsilon)^2 \ln(2V/\delta)\}$;
  % Compute Kaplan-Meier tail probabilities:
  $\hat{T}_i^t(0) = 1$;
  for $s = 1$ to $V$ do
    $\hat{T}_i^t(s) \leftarrow \prod_{s'=0}^{s-1} \left(1 - D_{i,s'}^t / N_{i,s'}^t\right)$;
  end
  % Make the optimistic modification:
  if $c_i^t < V$ then $\hat{T}_i^t(c_i^t + 1) \leftarrow \hat{T}_i^t(c_i^t)$;
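The subroutine is easy to express in code. The following Python sketch is our rendering of Algorithm 3 under the conventions of the earlier sketches (the cut-off constant is the one quoted in the algorithm; treating 0/0 as 0 mirrors the convention for the empirical estimates):

import math

def optimistic_km(history, V, eps, delta):
    """OptimisticKM sketch for one venue.  `history` is the list of pairs
    (v_tau, r_tau) observed so far.  Returns T_hat[0..V], where T_hat[s]
    estimates Pr[demand >= s] and the unit just above the cut-off is made
    to look as good as the cut-off itself."""
    # D[s]: direct observations of exactly s; N[s]: opportunities for one
    D = [sum(1 for v, r in history if r == s and v > s) for s in range(V + 1)]
    N = [sum(1 for v, r in history if r >= s and v > s) for s in range(V + 1)]

    # Cut-off: largest c with c == 0 or N[c-1] >= 128 (cV/eps)^2 ln(2V/delta).
    # N[s] is non-increasing while the threshold grows, so scan upward.
    c = 0
    while c < V and N[c] >= 128 * ((c + 1) * V / eps) ** 2 * math.log(2 * V / delta):
        c += 1

    # Kaplan-Meier tail probabilities (Equation 1), with 0/0 treated as 0
    T_hat = [1.0] * (V + 1)
    for s in range(1, V + 1):
        z = D[s - 1] / N[s - 1] if N[s - 1] > 0 else 0.0
        T_hat[s] = T_hat[s - 1] * (1.0 - z)

    # Optimistic modification just above the cut-off
    if c < V:
        T_hat[c + 1] = T_hat[c]
    return T_hat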
We now formalize the idea of $c_i^t$ as a "cut-off point" up to which the Kaplan-Meier estimates are accurate. In the results that follow, we think of $\epsilon > 0$ and $\delta > 0$ as fixed parameters of the algorithm.²

Lemma 2. With probability at least $1 - \delta$, for all $s \le c_i^t$, $|T_i(s) - \hat{T}_i^t(s)| \le \epsilon/(8V)$.

Proof: Notice that $N_{i,s}^t$ is monotonic in $s$, with $N_{i,s}^t \ge N_{i,s'}^t$ whenever $s \le s'$. Thus by definition of $c_i^t$, for every $s \le c_i^t$ we have $N_{i,s-1}^t \ge 128(sV/\epsilon)^2 \ln(2V/\delta)$, and plugging this into the bound of Theorem 2 gives $|T_i(s) - \hat{T}_i^t(s)| \le s\sqrt{2\ln(2V/\delta)/N_{i,s-1}^t} \le \epsilon/(8V)$.

Lemma 3. Assume that at time $t$ the high probability event in Lemma 2 holds. For any venue $i$ with $\hat{T}_i^t(c_i^t) \le \epsilon/(4V)$ and any $s > c_i^t$, $|T_i(s) - \hat{T}_i^t(s)| \le \epsilon/(2V)$.

Proof: Suppose this is the case. Since tail probability estimates are non-increasing, for $s > c_i^t$ it must be the case that $\hat{T}_i^t(s) \le \hat{T}_i^t(c_i^t) \le \epsilon/(4V)$. If the high probability event in Lemma 2 holds, then $T_i(s) \le T_i(c_i^t) \le \hat{T}_i^t(c_i^t) + \epsilon/(8V) \le \epsilon/(2V)$. Since both $T_i(s)$ and $\hat{T}_i^t(s)$ are constrained to lie between 0 and $\epsilon/(2V)$, it must be the case that $|T_i(s) - \hat{T}_i^t(s)| \le \epsilon/(2V)$.

²In particular, $\epsilon$ corresponds to the value specified in Theorem 3, and $\delta$ corresponds roughly to that $\delta$ divided by the polynomial upper bound on the number of time steps.

4.3 Exploitation and Exploration Lemmas

With these two lemmas in place, we are ready to state our main Exploitation Lemma (Step 2), which formalizes the idea that once a sufficient amount of exploration has occurred, the allocation output by the greedy algorithm will be $\epsilon$-optimal. The proof of this lemma is where the requirement that $\hat{T}_i^t(c_i^t + 1)$ be set optimistically becomes important. In particular, because of the optimistic setting of $\hat{T}_i^t(c_i^t + 1)$, we know that if the greedy policy allocates exactly $c_i^t$ units to a venue $i$, it could not gain too much by reallocating additional units from another venue to venue $i$ instead. In this sense, we create a "buffer" above each cut-off, guaranteeing that it is not necessary to continue exploring as long as one of the two conditions in the lemma statement is met for each venue.

Lemma 4 (Exploitation Lemma). Assume that at time $t$, the high probability event in Lemma 2 holds. If for each venue $i$, either (1) $v_i^t \le c_i^t$, or (2) $\hat{T}_i^t(c_i^t) \le \epsilon/(4V)$, then the difference between the expected number of units consumed under allocation $v^t$ and the expected number of units consumed under the optimal allocation is at most $\epsilon$.

Proof: Let $a_1, \ldots, a_K$ be any optimal allocation of the $V^t$ units. Since both $a$ and $v^t$ are allocations of $V^t$ units, it must be the case that $\sum_{i : a_i > v_i^t} (a_i - v_i^t) = \sum_{i : v_i^t > a_i} (v_i^t - a_i)$. We can thus define an arbitrary one-to-one mapping between the units that were allocated to different venues by the algorithm and the optimal allocation. Consider any such pair in this mapping. Let $i$ be the venue to which the unit was allocated by the algorithm, and let $n$ be the index of the unit in that venue. Similarly, let $j$ be the venue to which the unit was allocated by the optimal allocation, and let $m$ be the index of the unit in that venue.

If Condition (1) in the lemma statement holds for venue $i$, then by Lemma 2, since $n \le v_i^t \le c_i^t$, $\hat{T}_i^t(n) \le T_i(n) + \epsilon/(8V) \le T_i(n) + \epsilon/(2V)$. On the other hand, if Condition (2) holds, then by Lemma 3, it is still the case that $\hat{T}_i^t(n) \le T_i(n) + \epsilon/(2V)$.

Now, if $v_j^t < c_j^t$, then by Lemma 2, $T_j(v_j^t + 1) \le \hat{T}_j^t(v_j^t + 1) + \epsilon/(2V)$. If $v_j^t = c_j^t$, then the optimistic setting of $\hat{T}_j^t(c_j^t + 1)$ guarantees the same inequality, since the unit just above the cut-off is valued at least as highly as the cut-off itself by the algorithm. Finally, if $v_j^t > c_j^t$, then it must be the case that Condition (2) in the lemma statement holds for venue $j$, and by Lemma 3 it is still the case that $T_j(v_j^t + 1) \le \hat{T}_j^t(v_j^t + 1) + \epsilon/(2V)$.

Thus in all three cases, since $m > v_j^t$, we have that $T_j(m) \le T_j(v_j^t + 1) \le \hat{T}_j^t(v_j^t + 1) + \epsilon/(2V)$. Since the greedy algorithm chose to send unit $n$ to venue $i$ instead of sending an additional unit to venue $j$, it must be the case that $\hat{T}_i^t(n) \ge \hat{T}_j^t(v_j^t + 1)$, and thus $T_i(n) \ge T_j(m) - \epsilon/V$.

Since there are at most $V$ pairs in the matching, and each contributes at most $\epsilon/V$ to the difference in expected units consumed between the optimal allocation and the algorithm's, the difference is at most $\epsilon$.

Note that this bound is tight in the sense that it is possible to construct examples where the difference in expected units consumed is as large as $\epsilon$.

Finally, Lemma 5 presents the main Exploration Lemma (Step 3), which states that on any time step at which the allocation is not $\epsilon$-optimal, the probability of a "useful" observation is lower bounded by $\epsilon/(8V)$.

Lemma 5 (Exploration Lemma). Assume that at time $t$, the high probability event in Lemma 2 holds. If the allocation is not $\epsilon$-optimal, then for some venue $i$, with probability at least $\epsilon/(8V)$, $N_{i,c_i^t}^{t+1} = N_{i,c_i^t}^{t} + 1$.

Proof: Suppose the allocation is not $\epsilon$-optimal at time $t$. By Lemma 4, it must be the case that there exists some venue $i$ for which $v_i^t > c_i^t$ and $\hat{T}_i^t(c_i^t) > \epsilon/(4V)$. Since $v_i^t > c_i^t$, it will be the case that $N_{i,c_i^t}^{t+1} = N_{i,c_i^t}^{t} + 1$ as long as $r_i^t \ge c_i^t$. Since $\hat{T}_i^t(c_i^t) > \epsilon/(4V)$, Lemma 2 implies that $T_i(c_i^t) \ge \epsilon/(8V)$, so $r_i^t \ge c_i^t$ with probability at least $\epsilon/(8V)$.
4.4 Putting It All Together

With the exploitation and exploration lemmas in place, we are finally ready to state our main theorem. The full proof is omitted due to lack of space, but we sketch the main ideas below.

Theorem 3 (Main Theorem). For any $\epsilon > 0$ and $\delta > 0$, with probability $1 - \delta$ (over the randomness of draws from $Q$ and $\{P_i\}$), after running for a time polynomial in $K$, $V$, $1/\epsilon$, and $\ln(1/\delta)$, Algorithm 2 makes an $\epsilon$-optimal allocation on each subsequent time step with probability at least $1 - \epsilon$.

Consider any sufficiently long sequence of $R$ time steps. If the algorithm chose $\epsilon$-optimal allocations on at least a fraction $(1 - \epsilon)$ of the $R$ time steps, then we can argue that the algorithm will continue to be $\epsilon$-optimal on at least a fraction $(1 - \epsilon)$ of future time steps; this is because the algorithm gets better on average over time as more observations are made.

On the other hand, if the algorithm chose sub-optimal allocations on at least an $\epsilon$ fraction of the $R$ time steps, then by Lemma 5, the algorithm must have incremented $N_{i,c_i^t}^{t}$ for some venue $i$ and current cut-off $c_i^t$ approximately $\epsilon^2 R/(8V)$ times. By definition of the $c_i^t$, it can never be the case that $N_{i,c_i^t}^{t}$ was incremented too many times for any fixed values of $i$ and $c_i^t$ (where "too many" is a polynomial in $V$, $1/\epsilon$, and $\ln(1/\delta)$); otherwise the cut-off would have increased. Since there are only $K$ venues and $V$ possible cut-off values to consider in each venue, the total number of increments can be no more than $KV$ times this polynomial, another polynomial in $V$, $1/\epsilon$, $\ln(1/\delta)$, and now $K$. If $R$ is sufficiently large (but still polynomial in all of the desired quantities) and approximately $\epsilon^2 R/(8V)$ increments were made, we can argue that every venue must have been fully explored, in which case, again, future allocations will be $\epsilon$-optimal.
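For concreteness, the sketches above assemble into the estimate-allocate loop of Algorithm 2. The distributions, horizon, and parameter values in this toy run are arbitrary choices of ours, not settings from the paper:

# Toy run of Algorithm 2 on simulated venues, reusing the Venue,
# greedy_allocation, and optimistic_km sketches defined earlier.
V, K = 10, 3
venues = [Venue([0.5] + [0.05] * 10) for _ in range(K)]   # arbitrary P_i on {0..10}
history = [[] for _ in range(K)]
T_hat = [[1.0] * (V + 1) for _ in range(K)]               # arbitrary initialization

for t in range(500):
    alloc = greedy_allocation(V, T_hat)                    # allocation step
    for i in range(K):
        r, _ = venues[i].submit(alloc[i])
        history[i].append((alloc[i], r))
    T_hat = [optimistic_km(history[i], V, eps=0.5, delta=0.1)
             for i in range(K)]                            # reestimation step

Note that the conservative constant in the cut-off rule means the cut-offs grow slowly in a short toy run like this; the optimistic modification is what keeps every venue receiving allocations in the meantime.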
We conclude our theory contributions with a few brief remarks. Our optimistic tail modifications of the Kaplan-Meier estimators are relatively mild.³ In many circumstances we actually expect that the estimate-allocate loop with unmodified Kaplan-Meier would work well, and we investigate a parametric version of this learning algorithm in the experiments described below.

It also seems quite likely that variants of our algorithm and analysis could be developed for alternative objective functions, such as minimizing the number of steps of repeated reallocation required to execute all shares (for which it is possible to create examples where myopic greedy allocation at each step is suboptimal), or maximizing the probability of executing all $V$ shares on a single step.

³The same results can be proved for more invasive modifications, such as pushing all mass above the cut-offs to the maximum possible volume, which would result in even more aggressive exploration.

5 Application: Dark Pool Problem

We now describe the application that provided the original inspiration to develop our framework and algorithm. As mentioned in the Introduction, dark pools are a particular and relatively recent type of exchange for listed stocks. While the precise details are beyond the scope of this paper, the main challenge in executing large-volume trades in traditional ("light") exchanges is that it is difficult to "conceal" such trades, and their revelation generally results in adverse impact on price (e.g., the presence of a large-volume buyer causes the price to rise against that buyer). If the volume is sufficiently large, this difficulty of concealment remains even if one attempts to break the trade up slowly over time. Dark pools arose exactly to address the problems faced by large traders, and tend to emphasize trade and order concealment over price optimization (Wikipedia, 2009; Bogoslaw, 2007).

In a typical dark pool, buyers and sellers submit orders that simply specify the total volume of shares they wish to buy or sell, with the price of the transaction determined exogenously by "the market".⁴ Upon submitting an order to (say) buy $v$ shares, a trader is put in a queue of buyers awaiting transaction, and there is a similar queue of sell orders. Matching between buyers and sellers occurs in sequential arrival of orders, similar to a light exchange. Unlike a light exchange, however, no information is provided to traders about how many parties or shares might be available in the pool at any given moment. Thus in a given time period, a submission of $v$ shares results only in a (possibly censored) report of how many shares up to $v$ were executed.

⁴Typically the midpoint between the bid and ask in the light exchanges. This is a slight oversimplification but accurate for our purposes.

While presenting their own trading challenges, dark pools have become tremendously popular exchanges, responsible for executing 10-20% of the overall US equity volume. In fact, they have been so successful that there are now approximately 40+ dark pools for the U.S. equity market alone, leading traders and brokerages to face exactly the censored multi-venue exploration problem we have been studying: How should one optimally distribute a large trade over the many independent dark pools? We now describe the application of our algorithm to this problem, basing it on actual data from four active dark pools.

5.1 Summary of Dark Pool Data

Our data set is from the internal dark pool order flow for a major U.S. broker-dealer. Each (possibly censored) observation in this data is exactly of the form discussed throughout the paper — a triple consisting of the dark pool name, the number of shares sent to that pool, and the number of shares subsequently executed within a time interval. It is important to highlight some limitations of the data. First, note that the data set conflates the policy the brokerage used for allocation across the dark pools with the liquidity available in the pools themselves. For our data set, the policy in force was very similar to the bandit-style approach we discuss below. Second, the "parent" orders determining the overall volumes to be allocated across the pools were determined by the brokerage's trading needs, and are similarly out of our control.

The data set contains submissions and executions for four active dark pools: BIDS Trading, Automated Trading Desk, D.E. Shaw, and NYFIX, each for a dozen relatively actively-traded stocks,⁵ thus yielding 48 distinct stock-pool data sets.

⁵Tickers represented are AIG, ALO, CMI, CVX, FRE, HAL, JPM, MER, MIR, NOV, XOM, and NRG.
The average daily In a typical dark pool, buyers and sellers submit or- trading volume of these stocks across all exchanges ders that simply specify the total volume of shares they (light and dark) ranges from 1 to 60 million shares, wish to buy or sell, with the price of the transaction de- with a median volume of 15 million shares. Energy, termined exogenously by “the market”. 4 Upon sub- Financials, Consumer, Industrials, and Utilities indus- tries are represented. Our data set spans 30 trading 3Thesameresultscanbeprovedformoreinvasivemod- ifications, such as pushing all mass above the cut-offs to the light exchanges. This is a slight oversimplification but the maximum possible volume, which would result in even accurate for our purposes. more aggressive exploration. 5Tickers represented are AIG, ALO, CMI, CVX, FRE, 4Typically the midpoint between the bid and ask in HAL,JPM,MER,MIR,NOV,XOM,andNRG. days; for every stock-pool pair we have on average Model Train Loss Test Loss Wins 1,200 orders (from 600 to 2,000), which corresponds Nonparametric 0.454 0.872 3 to 1.3 million shares (from 0.5 million to 3 million). ZB + Uniform 0.499 0.508 12 Individual order sizes range from 100 to 50,000 shares, ZB + Power Law 0.467 0.484 28 with 1,000 shares being the median. 16% of orders are ZB + Poisson 0.576 0.661 0 filled at least partially (meaning that fully 84% result ZB + Exponential 0.883 0.953 5 in no shares executed), 9% of the total submitted vol- Table 1: Average per-sample log-loss (negative log like- ume was executed, and 11% of all observations were lihood) for the different venue distribution models. The censored. “Wins” column counts the number of stock-venue pairs where a given model beats the other four on the test data. 5.2 Parametric Models for Dark Pools

While the theory and algorithm we have developed for censored exploration permit a general (nonparametric) form for the venue distributions $P_i$, in any application it is reasonable to ask whether the data permits a simple parametric form for these distributions, with the attendant computational and sample complexity benefits. For our dark pool data the answer to this question is affirmative.

We experimented with a variety of common parametric forms for the distributions. For each such form, the basic methodology was the same. For each of the $4 \times 12 = 48$ venue-stock pairs, the data for that pair was split evenly into a training set and a test set. The training data was used to select the maximum likelihood model from the parametric class. Note that we can no longer directly apply the nonparametric Kaplan-Meier estimator — within each model class, we must directly maximize the likelihood on the censored training data. This is a relatively straightforward and efficient computation for each of the model classes we investigated. The test set was of course used to measure the generalization performance of each maximum likelihood model.

Our investigation revealed that the best models maintained a separate parameter for the probability of 0 shares being available (that is, $P_i(0)$ is explicitly estimated) — a "Zero Bin" or ZB parameter. This is due to the fact that the vast majority of submissions (84%) to dark pools result in no shares being executed. We then examined various parametric forms for the nonzero portions of the venue distributions,⁶ including uniform (which of course requires no additional parameters), and Poisson, exponential, and power law forms (each of which requires a single additional parameter).

The generalization results strongly favor the power law form, in which the probability of $s$ shares being available is proportional to $1/s^\beta$ for real $\beta$ — a so-called heavy-tailed distribution when $\beta > 0$. Nonparametric models trained with Kaplan-Meier are best on the training data but overfit badly due to their complexity relative to the sparse data, while the other parametric forms cannot accommodate the heavy tails of the data. The comparative results are summarized in Table 1. Based on this comparison, for our dark pool study we investigate a variant of our main algorithm, in which the estimate-allocate loop has an estimation step using maximum likelihood estimation within the ZB + Power Law model, and allocations are done greedily on these same models.

In terms of the estimated ZB + Power Law parameters themselves, we note that for all 48 stock-pool pairs the Zero Bin parameter accounted for most of the distribution (between a fraction 0.67 and 0.96), which is not surprising considering the aforementioned preponderance of entirely unfilled orders in the data. The vast majority of the 48 exponents fell between $\beta = 0.25$ and $\beta = 1.3$ — rather long tails indeed — but it is noteworthy that for one of the four dark pools, 7 of the 12 estimated exponents were actually negative, yielding a model that predicts higher probabilities for larger volumes. This is likely an artifact of our size- and time-limited data set, but is not entirely unrealistic and results in some interesting behavior in the simulations below.

⁶These forms were applied up to the largest volume submitted in the data sets, then normalized.
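As an illustration of the computation involved, the following Python sketch (ours; the grid search and parameter ranges are arbitrary stand-ins for whatever numerical optimizer one prefers) evaluates and maximizes the censored log-likelihood of the Zero Bin + Power Law model:

import math

def zb_power_law_loglik(z, beta, data, s_max):
    """Censored log-likelihood of the Zero Bin + Power Law model:
    P(0) = z and P(s) = (1 - z) * s**(-beta) / norm for 1 <= s <= s_max
    (the form is normalized up to s_max, as in footnote 6).  A direct
    observation (r < v) contributes log P(r); a censored one (r == v > 0)
    contributes log Pr[demand >= v]."""
    norm = sum(s ** -beta for s in range(1, s_max + 1))
    def p(s):
        return z if s == 0 else (1 - z) * s ** -beta / norm
    def tail(v):  # Pr[demand >= v], assuming 1 <= v <= s_max
        return (1 - z) * sum(s ** -beta for s in range(v, s_max + 1)) / norm
    ll = 0.0
    for v, r in data:
        if v == 0:
            continue  # a zero-share submission carries no information
        ll += math.log(tail(v)) if r == v else math.log(p(r))
    return ll

def fit_zb_power_law(data, s_max):
    """Crude maximum-likelihood fit by grid search over (z, beta); negative
    exponents are allowed, as observed for one of the four pools."""
    grid = [(z / 20, b / 10) for z in range(1, 20) for b in range(-10, 31)]
    return max(grid, key=lambda zb: zb_power_law_loglik(zb[0], zb[1], data, s_max))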
5.3 Data-Based Simulation Results

As in any control problem, the dark pool data in our possession is unfortunately insufficient to evaluate and compare different allocation algorithms. This is because of the aforementioned fact that the volumes submitted to each venue were fixed by the specific policy that generated the data, and we cannot explore alternative choices — if our algorithm chooses to submit 1000 shares to some venue, but in the data only 500 shares were submitted, we simply cannot infer the outcome of our desired submission.

We thus instead use the raw data to derive a simulator with which we can evaluate different approaches. In light of the modeling results of Section 5.2, the simulator for stock $S$ was constructed as follows. For each dark pool $i$, we used all of the data for $i$ and stock $S$ to estimate the maximum likelihood Zero Bin + Power Law distribution. (Note that there is no need for a training-test split here, as we have already separately validated the choice of distributional model.) This results in a set of four venue distribution models $P_i$ that form the simulator for stock $S$. This simulator accepts allocation vectors $(v_1, v_2, v_3, v_4)$ indicating how many shares some algorithm wishes to submit to each venue, draws a "true liquidity" value $s_i$ from $P_i$ for each $i$, and returns the vector $(r_1, r_2, r_3, r_4)$, where $r_i = \min(v_i, s_i)$ is the possibly censored number of shares filled in venue $i$.

Across all 12 stocks, we compared the performance of four different allocation algorithms:

• The ideal allocation, which is given the true parameters of the ZB + Power Law distributions used by the simulator, and allocates shares optimally (greedily) with respect to these distributions.

• The uniform allocation, which always divides any order equally among all four venues.

• Our learning algorithm, which implements the repeated allocate-reestimate loop, using the maximum likelihood ZB + Power Law model for the reestimation step.

• A simple bandit-style algorithm, which begins with equal weights assigned to all four venues. Allocation to a venue that results in any nonzero number of shares being executed causes that venue's weight to be multiplied by a factor $\alpha > 1$.⁷ Allocations are always proportional to the current weight vector (see the sketch after this list).

Some remarks on these algorithms are in order. First, note that the first two allocation methods (ideal and uniform) are non-adaptive and are meant to serve as baselines — one of them the best performance we could hope for (ideal), and the other the most naive allocation possible (uniform). Second, note that our algorithm has a distinct advantage in the sense that it is using the correct parametric form, the same being used by the simulator itself. Thus our evaluation of this algorithm is certainly optimistic compared to what should be expected "in practice". Finally, note that the bandit algorithm is the crudest type of weight-based allocation scheme of the type that abounds in the no-regret literature (Cesa-Bianchi and Lugosi, 2006); we are effectively forcing our problem into a 0/1 loss setting corresponding to "no shares" and "some shares" being executed. Certainly more sophisticated bandit-style approaches can and should be examined.

⁷For the experiments we optimized $\alpha$ over all stock-pool pairs and thus used $\alpha = 1.05$.
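The bandit baseline in the last bullet is simple enough to state precisely. Here is our minimal rendering in Python (how fractional shares are rounded is an arbitrary detail of ours):

def bandit_allocate(V, weights):
    """Allocate V shares in proportion to the current weights, giving any
    rounding remainder to the heaviest venue."""
    total = sum(weights)
    alloc = [int(V * w / total) for w in weights]
    alloc[max(range(len(weights)), key=lambda i: weights[i])] += V - sum(alloc)
    return alloc

def bandit_update(weights, executed, alpha=1.05):
    """Multiply a venue's weight by alpha > 1 whenever it executed any
    nonzero number of shares (alpha = 1.05, as in footnote 7)."""
    return [w * alpha if r > 0 else w for w, r in zip(weights, executed)]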
Each algorithm was run in simulation for some number of episodes. Each episode consisted of the allocation of a fixed number $V$ of shares — thus the same number of shares is repeatedly allocated by the algorithm, though of course this allocation will change over time for the two adaptive algorithms as they learn. Each episode of simulation results in some fraction of the $V$ shares being executed. Two values of $V$ were investigated — a smaller and therefore easier value $V = 1000$, and the larger and therefore more difficult $V = 8000$.

We begin by showing full learning curves over 2000 episodes with $V = 8000$ for a couple of representative stocks in Figure 1.

Figure 1: Sample learning curves (fraction executed versus episodes). For the stock AIG (left panel), the bandits algorithm (labeled blue curve) beats uniform allocation (dashed horizontal line) but appears to asymptote short of ideal allocation (solid horizontal line). For the stock NRG (right panel), the bandits algorithm actually deteriorates with more episodes, underperforming both uniform and ideal allocation. For both these stocks (and for the other 10 in our data set), our algorithm (labeled red curve) performs nearly optimally.

Here the average performance of the two non-adaptive allocation schemes (ideal and uniform) is represented as horizontal lines, while learning curves are given for the adaptive schemes. Due to the high variance of the heavy-tailed venue distributions used by the simulator, a single trial of 2000 episodes is extremely noisy, so we both average over 400 trials for each algorithm and smooth the resulting averaged learning curve with a standard exponentially decaying temporal moving average.
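The smoothing referred to above is a standard construction; a minimal Python version (the decay constant here is an arbitrary choice of ours) is:

def ema_smooth(curve, decay=0.95):
    """Exponentially decaying temporal moving average of a learning curve."""
    out, m = [], curve[0]
    for x in curve:
        m = decay * m + (1 - decay) * x
        out.append(m)
    return out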
The average com- learning learning pletion rate across all stocks for the large (small) or- 0.1 4 der sequences is 10.0% (13.1%) for uniform and 13.6% 0.05 2 0.05 0.1 0.15 0.2 0.25 0.3 2 4 6 8 10 12 (19.4%) for optimal allocations. Our algorithm per- uniform uniform forms almost as well as optimal — 13.5% (18.7%) — 12 and much better than bandits at 11.9% (17.2%). 0.3 10 The right column of Figure 2 is quite similar, except 0.25 8 now we measure performance not by the fraction of V 0.2 0.15 6 shares filled in one step, but the natural alternative learning learning of order half-life — the number of steps of repeated 0.1 4 resubmission of any remaining shares to get the total 0.05 2 V/ 0.05 0.1 0.15 0.2 0.25 0.3 2 4 6 8 10 12 number executed above 2. Despite the fact that our bandit bandit algorithm is not designed to optimize this criterion and Figure 2: Comparison of our learning algorithm to the that our theory does not directly apply to it, we see three baselines (ideal, uniform, and bandits). In each plot, the same broad story on this metric as well — our al- the performance of the learning algorithm is plotted on the gorithm competes with ideal, dominates uniform allo- y axis, and the performance of one of the baselines on the x cation and beats the bandit approach on large orders. axis. Left column: Evaluated by the fraction of submitted The average order half-life for large (small) orders is shares executed in a single time step. Here higher values are better, and points above the diagonal are wins for our 7.2 (5.3) for uniform allocation and 5.9 (4.4) for the algorithm. Right: Evaluated by order half-life. Here lower greedy algorithm on the true distributions. Our algo- values are better, and points below the diagonal are wins rithm requires on average 6.0 (4.9) steps, while bandits for our algorithm. Each point corresponds to a single stock uses 7.0 (4.4) to trade the large (small) orders. and order size; small orders (red plus signs) are 1000 shares, large orders (blue squares) are 8000 shares.

Acknowledgments

We are grateful to Curtis Pfeiffer and Andrew Westhead for valuable conversations, and to Bobby Kleinberg for introducing us to the literature on the newsvendor problem.

References

N. Alon and J. Spencer. The Probabilistic Method, 2nd Edition. Wiley, New York, 2000.

D. Bogoslaw. Big traders dive into dark pools. Business Week article, available at http://www.businessweek.com/investor/content/oct2007/pi2007102_394204.htm, 2007.

R. Brafman and M. Tennenholtz. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. JMLR, 3:213-231, 2003.

N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

A. Foldes and L. Rejto. Strong uniform consistency for nonparametric survival curve estimators from randomly censored data. The Annals of Statistics, 9(1):122-129, 1981.

W. T. Huh, R. Levi, P. Rusmevichientong, and J. Orlin. Adaptive data-driven inventory control policies based on Kaplan-Meier estimator. Preprint available at http://legacy.orie.cornell.edu/~paatrus/psfiles/km-myopic.pdf, 2009.

E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. JASA, 53:457-481, 1958.

M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. MLJ, 49:209-232, 2002.

A. V. Peterson. Kaplan-Meier estimator. In Encyclopedia of Statistical Sciences. Wiley, 1983.

Wikipedia. Dark pools of liquidity. http://en.wikipedia.org/wiki/Dark_pools_of_liquidity, 2009.