Censored Exploration and the Dark Pool Problem
Kuzman Ganchev, Michael Kearns, Yuriy Nevmyvaka, Jennifer Wortman Vaughan Computer and Information Science, University of Pennsylvania
Abstract to V shares of a given stock on behalf of a client 1,and does so by distributing or allocating them over multi- ple distinct exchanges (venues) known as dark pools. We introduce and analyze a natural algo- Dark pools are a recent type of stock exchange in which rithm for multi-venue exploration from cen- relatively little information is provided about the cur- sored data, which is motivated by the Dark rent outstanding orders (Wikipedia, 2009, Bogoslaw, Pool Problem of modern quantitative finance. 2007). The trader would like to execute as many of the We prove that our algorithm converges in V shares as possible. If vi shares are allocated to dark polynomial time to a near-optimal alloca- pool i, and all of them are executed, the trader learns tion policy; prior results for similar prob- only that the liquidity available at exchange i was at lems in stochastic inventory control guaran- least vi, not the actual larger number that could have teed only asymptotic convergence and exam- executed there; this important aspect of our frame- ined variants in which each venue could be work is known as censoring in the statistics literature. treated independently. Our analysis bears a strong resemblance to that of efficient explo- In this work we make the natural and common as- ration/exploitation schemes in the reinforce- sumption that the maximum amount of consumption i ment learning literature. We describe an ex- available in venue at each time step (e.g., the to- tensive experimental evaluation of our algo- tal liquidity available in the example above) is drawn P rithm on the Dark Pool Problem using real according to a fixed but unknown distribution i.For- v trading data. mally speaking, this means that when i units are sub- mitted to venue i,avaluesi is drawn randomly from Pi and the observed (and possibly censored) amount of consumption is min{si,vi}. 1 Introduction A learning algorithm receives a sequence of volumes V 1,V2,... and must decide how to distribute the V t We analyze a framework and algorithm for the problem units across the venues at each time step t. Our goal is of multi-venue exploration from censored data.Con- to efficiently (in time polynomial in the “complexity” sider a setting in which at each time period, we have of the Pi and other parameters) learn a near-optimal some volume of V units (possibly varying with time) of allocation policy. There is a distinct between-venue ex- an abstract good. Our goal is to “sell” or “consume” ploration component to this problem, since the “right” as many of these units as possible at each step, and number of shares to submit to venue i may depend on there are K abstract “venues” in which this selling or t both V and the distributions for the other venues, and consumption may occur. We can divide our V units the only mechanism by which we can discover the dis- in any way we like across the venues in service of this tributions is by submitting allocations. If we routinely goal. Our interest in this paper is in how to efficiently submit too-small volumes to a venue, we receive cen- learn a near-optimal allocation policy over time, under sored observations and are underutilizing the venue; stochastic assumptions on the venues. if we submit too-large volumes we receive uncensored This setting belongs to a broad class of problems (or direct) observations but have excess inventory. known in the operations research literature as perish- 1In our setting it is important that we view V as given able inventory problems (see Related Work below). In exogenously by the client and not under the trader’s con- the Dark Pool Problem (discussed extensively in Sec- trol, which distinguishes our setting somewhat from prior tion 5), at each time step a trader must buy or sell up works; see Related Work. Our main theoretical contribution is a provably Our main theoretical contribution is thus the devel- polynomial-time algorithm for learning a near-optimal opment and analysis of a multiple venue, polynomial policy for any unknown venue distributions Pi.This time, near-optimal allocation learning algorithm, while algorithm takes a particularly natural and appealing our main experimental contribution is the application form, in which allocation and distribution reestimation of this algorithm to the Dark Pool Problem. are repeatedly alternated. More precisely, at each time step we maintain distributional estimates Pˆi; pretend- 2 Preliminaries ing that these estimates are in fact exactly correct, we V allocate the current volume accordingly. These allo- We consider the following problem. At each time cations generate observed consumptions in each venue, t, a learner is presented with a quantity or volume Pˆ which in turn are used to update or reestimate the i. V t ∈{1, ··· ,V} of units, where V t is sampled from Q We show that when the Pˆi are “optimistic tail mod- an unknown distribution . The learner must decide t ifications” of the classical Kaplan-Meier maximum on an allocation v of these shares to a set of K known vt ∈{ , ··· ,Vt} i ∈{ , ··· ,K} likelihood estimator for censored data, this estimate- venues,with i 0 for each 1 , K t t allocate loop has provably efficient between-venue ex- and i=1 vi = V . The learner is then told the t ploration behavior that yields the desired result. number of units ri consumed at each venue i.Here t t t t Venues with smaller available volumes (relative to the ri =min{si,vi },wheresi is the maximum consump- overall volume V t and the other venues) are gradu- tion level of venue i at time t, which is sampled inde- ally given smaller allocations in the estimate-allocate pendently from a fixed but unknown distribution Pi. t t loop, whereas venues with repeated censored observa- If ri = vi , we say that the algorithm receives a cen- tions are gradually given larger allocations, eventually sored observation because it is possible to infer only t t t t settling on a near-optimal overall allocation distribu- that ri ≤ si.Ifri