Super-Replication of the Best Pairs Trade in Hindsight

Alex Garivaltis∗ March 18, 2019

Abstract

This paper derives a robust on-line equity trading algorithm that achieves the greatest possible percentage of the final wealth of the best pairs rebalancing rule in hindsight. A pairs rebalancing rule chooses some pair of stocks in the market and then perpetually executes rebalancing trades so as to maintain a target fraction of wealth in each of the two. After each discrete market fluctuation, a pairs rebalancing rule will sell a precise amount of the outperforming stock and put the proceeds into the underperforming stock. Under typical conditions, in hindsight one can find pairs rebalancing rules that would have spectacularly beaten the market. Our trading strategy, which extends Ordentlich and Cover’s (1998) “max-min universal portfolio,” guarantees to achieve an acceptable percentage of the hindsight-optimized wealth, a percentage which tends to zero at a slow (polynomial) rate. This means that on a long enough investment horizon, the trader can enforce a compound-annual growth rate that is arbitrarily close to that of the best pairs rebalancing rule in hindsight. The strategy will “beat the market asymptotically” if there turns out to exist a pairs rebalancing rule that grows capital at a higher asymptotic rate than the market index. The advantages of our algorithm over the Ordentlich and Cover (1998) strategy are twofold. First, their strategy is impossible to compute in practice. Second, in considering the more modest benchmark (instead of the best all-stock rebalancing rule in hindsight), we reduce the “cost of universality” and achieve a higher learning rate.

arXiv:1810.02444v3 [q-fin.PM] 14 Mar 2019

Keywords: Super-replication, Pairs trading, Correlation options, Constant-rebalanced portfolios, Universal portfolios, Kelly criterion, Robust procedures, Minimax

JEL Classiﬁcation: C44, D81, D83, G11, G12, G13

∗Assistant Professor of Economics, Northern Illinois University, 514 Zulauf Hall, DeKalb IL 60115. E-mail: [email protected]. ORCID: 0000-0003-0944-8517.

1 Introduction

1.1 Literature Review

The theory of asymptotic portfolio growth was initiated by Kelly (1956), who considered repeated bets on horse races with odds that diverge from the true win probabilities. Kelly set forth the natural goal of optimizing the asymptotic growth rate of one’s capital. This implies that one should act each period so as to maximize the expected log of his capital. By the Law of Large Numbers, the realized per-period continuously-compounded growth rate converges to the expected growth rate. The Kelly rule was used by Beat the Dealer author Edward O. Thorp (1966) to properly size his bets at the Nevada blackjack tables. For example, imagine a situation where you have a 50.5% chance of winning the next (even-money) hand. What percentage of your net worth should you bet? The classical mean-variance (Markowitz 1952) theory has no answer, except to say that it depends on your particular appetite for risk. For instance, the extreme choices of betting 0 percent or betting 100 percent are both undominated in the mean-variance plane. The Kelly criterion gives a much more satisfactory answer: bet 50.5% − 49.5% = 1% of your wealth. This achieves the (optimum) capital growth rate of 0.005% per hand played in this (favorable) situation. By the rule of 72, you would expect to double your wealth after approximately 72/0.005 = 14,400 hands. Thus, it became clear to many people that the log-optimal portfolio theory should replace mean-variance as the dominant decision criterion. Breiman (1961) proved that the Kelly rule outperforms any “essentially different strategy” by an exponential factor, and it has the shortest mean waiting time to reach a distant wealth goal. Thorp’s (2017) autobiography discusses his use of log-optimal portfolios in his money management career on Wall Street. Cover’s (1987) survey and his information theory textbook (2006) are excellent primers of the theory of asymptotic growth.
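The blackjack arithmetic above can be checked with a short Python sketch. It assumes an even-money bet (win or lose exactly the stake); the function names are illustrative and do not come from the paper.

```python
import math

def kelly_fraction(p_win: float) -> float:
    """Kelly bet, as a fraction of wealth, for an even-money wager (assumed payoff)."""
    return max(0.0, 2.0 * p_win - 1.0)

def log_growth(p_win: float, fraction: float) -> float:
    """Expected log-growth of wealth per hand when betting `fraction` of wealth."""
    return p_win * math.log(1.0 + fraction) + (1.0 - p_win) * math.log(1.0 - fraction)

p = 0.505
f_star = kelly_fraction(p)                    # about 1% of wealth
g_star = log_growth(p, f_star)                # about 0.005% per hand
hands_to_double = math.log(2.0) / g_star      # exact doubling time at that rate

print(f"Kelly fraction: {f_star:.4f}")
print(f"growth per hand: {100 * g_star:.5f}%")
print(f"hands to double: {hands_to_double:.0f}")
```

The exact doubling time, log 2 divided by the growth rate, comes out somewhat below the rule-of-72 figure quoted in the text, since the rule of 72 rounds log 2 ≈ 0.693 up to 0.72.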
Cover and Gluss (1986) were the first to exhibit an on-line trading algorithm that could achieve the Kelly growth rate even when starting in total ignorance of the return process. Assuming finitely-supported returns, they applied Blackwell’s (1956) approachability theorem to get a trading strategy that grows wealth at the same asymptotic rate as the best rebalancing rule (or fixed-fraction betting scheme) in hindsight. Thus began a whole host of so-called “universal trading strategies” that, under mild conditions, “beat the market asymptotically” for highly arbitrary (e.g. nonstationary or serially correlated) return processes. Cover (1991) gave the first simple and intuitive universal portfolio, at the same time removing the restriction to finitely-supported returns. Jamshidian (1992) transplanted Cover’s (1991) idea into a continuous-time market with several correlated stocks whose Itô processes have unknown, time-varying parameters that satisfy some asymptotic stability conditions. Cover and Ordentlich (1996) gave the “universal portfolio with side information,” along with more perspicuous proofs of the main (1991) regret bounds. For example, Thorp’s infamous “count” in blackjack is a canonical source of side information. Ordentlich and Cover (1998) super-replicated the final wealth of the best rebalancing rule in hindsight at time 0, although they did not use the terminology of financial derivatives so thoroughly. It seems that their paper was not inspired so much by derivative pricing as it was by Shtarkov’s (1987) “universal source code” in information theory. Properly interpreted, the universal source code amounts to a robust scheme for betting on repeated horse races with unknown (and perhaps nonstationary) win probabilities. More recently, Iyengar (2005) has studied universal investment for discrete-time markets with two assets and proportional transaction costs. Stoltz and Lugosi (2005) extended the game-theoretic notion of internal regret to the case of on-line portfolio selection problems. DeMarzo, Kremer, and Mansour (2006) used discrete-time on-line trading algorithms to derive no-arbitrage bounds for the prices of derivative securities. Györfi, Lugosi, and Udina (2006) gave universal procedures that find and exploit hidden complicated dependences of asset prices on the past evolution of the market. Kozat and Singer (2011) investigated semiconstant rebalanced portfolios that may (to avoid transaction costs) opt out of rebalancing altogether in selected investment periods.

1.2 Contribution

This paper offers a workaround for two practical problems encountered by would-be practitioners of Ordentlich and Cover’s (1998) max-min universal portfolio. First, for markets with many assets, the practitioner must wait a tremendously long time for his bankroll to “pull away” from the market averages. Second, the on-line portfolio weights are impossible to calculate in practice, since the Ordentlich-Cover algorithm requires large-scale computation of multilinear forms. The cleverest methods of computation either exhaust the computer’s memory or else they require eons of CPU time. Ordentlich and Cover’s (1998) max-min universal portfolio is only viable for markets with two or three stocks, at best. Naturally, we want procedures that work for a market with, say, 500 assets. Accordingly, we take up the more modest goal of performing well (at the end of the investment horizon) relative to the best pairs rebalancing rule in hindsight. Our notion of a “pairs rebalancing rule” allows for the degenerate possibility of keeping 100% of wealth in either of the two stocks. Thus, the best pairs rebalancing rule in hindsight will do at least as well as the best performing stock in the market. Our use of a less aggressive benchmark leads to a computable trading strategy that “learns” more quickly, although in the long run its “understanding” of market dynamics will be somewhat less subtle than that of the original Ordentlich and Cover (1998) strategy.

1.3 Motivating Example

To motivate the paper, we use a continuous-time version of “Shannon’s Demon” (Poundstone 2010) to illustrate the fact that the possibility of “beating the market asymptotically” is no contradiction to the random walk model of stock prices. For simplicity, consider two stocks i ∈ {1, 2} that follow independent geometric Brownian motions. Suppose that the price processes Si(t) evolve according to

$$\frac{dS_i(t)}{S_i(t)} = \frac{\sigma^2}{2}\,dt + \sigma\,dW_i(t), \tag{1}$$

where $W_1(t), W_2(t)$ are independent unit Brownian motions. In Shannon’s original lecture (Poundstone 2010), at each (discrete) time step the stock price either doubled or got cut in half, each with equal probability. To match this tradition, we put $\sigma = \log 2 \approx 0.69$. We have

$$\frac{S_i(t)}{S_i(0)} = e^{\sigma W_i(t)} = 2^{W_i(t)}. \tag{2}$$

Note that $\lim_{t\to\infty} \log\{S_i(t)/S_i(0)\}/t = 0$. This means that the stocks themselves have zero asymptotic growth; they trade “sideways.” Now, consider a gambler who continuously maintains half his wealth in each stock. This is not a buy-and-hold strategy: the rebalancing rule dictates that he sell some shares of whichever stock performed better over $[t, t + dt]$. He puts the proceeds into the underperforming stock. If the trader starts with a dollar, his wealth $V(t)$ evolves according to

$$\frac{dV(t)}{V(t)} = \frac{1}{2} \cdot \frac{dS_1(t)}{S_1(t)} + \frac{1}{2} \cdot \frac{dS_2(t)}{S_2(t)} = \frac{\sigma^2}{2}\,dt + \frac{\sigma}{2}\,[dW_1(t) + dW_2(t)]. \tag{3}$$

Applying Itô’s Lemma for functions of several diffusion processes (Wilmott 2001), we get

$$V(t) = \exp\left\{\frac{\sigma^2 t}{4} + \frac{\sigma}{2}\,[W_1(t) + W_2(t)]\right\}. \tag{4}$$

We thus have $\lim_{t\to\infty} \log[V(t)]/t = \sigma^2/4 \approx 12\%$. From two dead-money substrates, the gambler has manufactured continuous growth at a rate of 12% per unit time, leaving the market portfolio in the dust. Notice that this growth is merely the result of “volatility harvesting” (Poundstone 2010) or “volatility pumping” (Luenberger 1998). Note that the gambler has not attempted to guess which stock will outperform over the interval $[t, t + dt]$; rather, he just rebalances his portfolio after the fact. A sample path for Shannon’s Demon has been simulated in Figure 1. For a pair of correlated stocks, the dynamics will be substantially the same, albeit with a lower growth rate. It is an axiom of capital growth theory that one should seek out a pair of volatile, uncorrelated stocks. But which ones? For the Dow Jones (30) stocks, there are $\binom{30}{2} = 435$ pairs to choose from. For the S&P 500, there are 124,750. For a badly chosen pair $\{i, j\}$ of stocks, the gambler may very well beat stocks $i$ and $j$ but still underperform the market as a whole. The best pair $\{i^*, j^*\}$ will only be apparent in hindsight.
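The volatility-harvesting effect can also be seen in a discrete-time analogue, sketched below in Python. The sketch assumes (this is not the paper's continuous-time model) that each of two independent stocks doubles or halves with probability 1/2 per period, and it computes expected log-growth exactly by enumerating the four equally likely outcomes.

```python
import math
from itertools import product

# Assumption: discrete analogue of the example, two independent stocks that
# each double or halve with probability 1/2 in every period.
OUTCOMES = (2.0, 0.5)

def expected_log_growth(b: float) -> float:
    """Expected per-period log-growth when the fraction b is rebalanced into stock 1."""
    total = 0.0
    for x1, x2 in product(OUTCOMES, repeat=2):    # four equally likely outcomes
        total += 0.25 * math.log(b * x1 + (1.0 - b) * x2)
    return total

buy_and_hold = expected_log_growth(1.0)       # all wealth in stock 1: zero growth
demon = expected_log_growth(0.5)              # 50/50 rebalancing: (1/2) log 1.25
continuous_limit = math.log(2.0) ** 2 / 4.0   # sigma^2 / 4 with sigma = log 2

dow_pairs = math.comb(30, 2)                  # number of pairs among 30 stocks
sp500_pairs = math.comb(500, 2)               # number of pairs among 500 stocks

print(buy_and_hold, demon, continuous_limit, dow_pairs, sp500_pairs)
```

In this discrete setting the rebalanced portfolio grows at roughly 11% per period, close to the continuous-time rate $\sigma^2/4$, while either stock held alone has zero expected log-growth.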

2 Definitions and Notation

We consider a financial market with $m$ assets, called $j \in \{1, ..., m\}$. For convenience, we will refer to these assets merely as “stocks,” although they can be any sort of financial products whatsoever (cash, bonds, lottery tickets, Arrow securities, insurance contracts, real estate, etc). The stocks are traded in $T$ discrete sessions, called $t \in \{1, ..., T\}$. We let $x_{tj} \geq 0$ be the gross return of a \$1 investment (or “bet”) on stock $j$ in session $t$. $x_t := (x_{t1}, ..., x_{tm})$ is the gross-return vector in session $t$. We will require only that $x_t \in \mathbb{R}^m_+ - \{0\}$. This means that at least one of the assets must have a strictly positive gross return. In the sequel, we will adhere to the individual sequence approach to investment that was pioneered by Cover and Ordentlich (1996). This means that we will assume no particular dynamics for $(x_t)_{t=1}^T$; the analysis will be completely model-independent. However, to make some concrete sense of what follows, the reader may want to keep the following examples in mind.

Figure 1: Shannon’s Demon in Continuous Time

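Example 1. Log-normal random walk: put $x_{tj} := e^{\nu_j + \sigma_j \epsilon_{tj}}$, where $\epsilon_{tj} \sim N(0, 1)$, $\mathrm{Corr}(\epsilon_{tj}, \epsilon_{tk}) = \rho_{jk}$, and the vectors $\epsilon_t := (\epsilon_{t1}, ..., \epsilon_{tm})$ are independent across time.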

Example 2. Kelly (1956) horse race: assume that $m$ horses run $T$ races, $1 \leq t \leq T$. Prior to race $t$, a bookie sets the (gross) odds at $O_{tj}$ on horse $j$. This means that a \$1 bet on the winning horse yields a gross payoff of $O_{tj}$ dollars. Any money bet on the other horses is lost. If horse $j_t$ is the winner of race $t$, then the gross-return vector is $x_t = O_{tj_t} e_{j_t} = (0, ..., O_{tj_t}, ..., 0)$, where $e_1, ..., e_m$ are the unit basis vectors for $\mathbb{R}^m$.

We will consider constant rebalancing rules (or constant-rebalanced portfolios) called $b \in \mathbb{R}^m$, where $b_j \geq 0$ and $\sum_{j=1}^m b_j = 1$. At the start of trading session (race) $t$, the gambler distributes his wealth among the $m$ stocks, putting the fraction $b_j$ of wealth into stock $j$. To adhere to a constant rebalancing rule, the gambler must generally trade every period. For, at the end of trading session $t$, the gambler now has the fraction

$$\frac{b_j x_{tj}}{\langle b, x_t \rangle} \tag{5}$$

of wealth in stock $j$, where $\langle b, x_t \rangle$ is the inner product $\sum_{j=1}^m b_j x_{tj}$. If stock $j$ outperformed the portfolio in session $t$, then the trader must sell some shares of stock $j$ to restore the balance. Likewise, he must buy additional shares of stock $j$ if it underperformed the portfolio as a whole. He can refrain from adjusting his holdings in stock $j$ only if $x_{tj} = \langle b, x_t \rangle$, i.e. only when stock $j$’s performance is identical to that of the portfolio as a whole. After $T$ plays, the wealth of the constant-rebalanced portfolio $b$ is

$$W_b(x_1, ..., x_T) := \langle b, x_1 \rangle \langle b, x_2 \rangle \cdots \langle b, x_T \rangle, \tag{6}$$

where we have assumed that the gambler starts with \$1 and keeps reinvesting all his capital. The final wealth is just the product of the growth factors $\langle b, x_t \rangle$ from each trading session. Note that the degenerate rebalancing rule $b = e_j$ amounts to buying and holding stock $j$, as it keeps 100% of wealth in stock $j$. For the Kelly horse race, the rebalancing rule $b$ amounts to a fixed-fraction betting scheme that bets the fixed fraction $b_j$ of wealth on horse $j$ in every race. In the sequel, we will let $x^t := (x_1, ..., x_t)$ be the return history after $t$ trading sessions, with transition law $x^{t+1} = (x^t, x_{t+1})$. For the Kelly horse race, this amounts to the win history $j^t := (j_1, ..., j_t) \in \{1, ..., m\}^t$.
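Formulas (5) and (6) translate directly into code. The following Python sketch uses illustrative function names and a made-up two-session return history; stocks are indexed from 0 in the code.

```python
from math import prod
from typing import List, Sequence

def inner(b: Sequence[float], x: Sequence[float]) -> float:
    """Inner product <b, x>: the portfolio's gross return in one session."""
    return sum(bj * xj for bj, xj in zip(b, x))

def crp_wealth(b: Sequence[float], returns: Sequence[Sequence[float]]) -> float:
    """Equation (6): final wealth of $1 under the constant rebalancing rule b."""
    return prod(inner(b, x) for x in returns)

def drifted_weights(b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Equation (5): fractions of wealth held at the end of a session, before rebalancing."""
    growth = inner(b, x)
    return [bj * xj / growth for bj, xj in zip(b, x)]

# Hypothetical history: stock 0 doubles then halves, stock 1 does the opposite.
history = [(2.0, 0.5), (0.5, 2.0)]
print(crp_wealth((0.5, 0.5), history))           # rebalanced 50/50
print(crp_wealth((1.0, 0.0), history))           # buy and hold stock 0
print(drifted_weights((0.5, 0.5), history[0]))   # weights the trader must rebalance away from
```

Both stocks end where they started, yet the 50/50 rule grows wealth because it sells the outperformer after each session.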

Definition 1. Given the return history $x^t$, the best rebalancing rule in hindsight is the rebalancing rule $b^*(x^t)$ that would have yielded the most final wealth:

$$b^*(x^t) := \arg\max_{b \in \Delta_m} \langle b, x_1 \rangle \langle b, x_2 \rangle \cdots \langle b, x_t \rangle, \tag{7}$$

where $\Delta_m := \left\{ b \in \mathbb{R}^m_+ : \sum_{j=1}^m b_j = 1 \right\}$ is the portfolio simplex.

The number $D(x_1, ..., x_T) := \max_{b \in \Delta_m} \prod_{t=1}^T \langle b, x_t \rangle$ can be regarded as the payoff of a path-dependent financial derivative (“Cover’s Derivative”) of the $m$ underlying stocks. In the terminology of the exotic option literature (cf. Wilmott 1998), this would be called a “correlation” or “rainbow” option. In the continuous-time context of several correlated stocks in geometric Brownian motion, Cover’s Derivative has been priced and replicated by the author (Garivaltis 2018), under the assumption of continuous rebalancing and levered hindsight optimization. By contrast, the present paper deals with discrete-time, unlevered rebalancing, and super-replication under total model uncertainty. In this extreme generality, there is no way to guarantee the solvency of leveraged rebalancing rules. Thus, neither the hindsight-optimization nor the super-replicating strategy will be permitted to use leverage.

Definition 2. A trading strategy must specify a portfolio vector $\theta_t = (\theta_{t1}, ..., \theta_{tm})$ as a function of the history $x^{t-1}$. In session $t$, the strategy bets the fraction $\theta_{tj} = \theta_{tj}(x^{t-1})$ of wealth on stock $j$, where $\sum_{j=1}^m \theta_{tj}(x^{t-1}) = 1$ and $\theta_{tj}(x^{t-1}) \geq 0$. For simplicity, we write $\theta(x^{t-1}) = (\theta_{t1}(x^{t-1}), ..., \theta_{tm}(x^{t-1}))$. We write $h^0$ for the empty history, and $\theta(h^0)$ for the initial portfolio vector.

Note that every trading strategy $\theta(\bullet)$ induces a betting scheme for the Kelly horse race. After observing the win history $j^{t-1} = (j_1, ..., j_{t-1})$, $\theta(\bullet)$ prescribes that one should bet the fraction $\theta_{tj}(O_{1j_1} e_{j_1}, ..., O_{t-1,j_{t-1}} e_{j_{t-1}})$ of wealth on horse $j$ in race $t$, where $O_{rj_r}$ was the gross odds on the winner of race $r$. The betting scheme induced by the Ordentlich and Cover (1998) universal portfolio is known in information theory as the universal source code (Shtarkov 1987).

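Definition 3. The final wealth function $W_\theta$ induced by the trading strategy $\theta(\bullet)$ is defined by

$$W_\theta(x_1, ..., x_T) := \langle \theta(h^0), x_1 \rangle \langle \theta(x_1), x_2 \rangle \cdots \langle \theta(x_1, ..., x_{T-1}), x_T \rangle = \prod_{t=1}^T \langle \theta(x^{t-1}), x_t \rangle. \tag{8}$$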

Definition 4. A super-hedge (or super-replicating strategy) for a derivative payoff $D(\bullet)$ is a pair $(p, \theta)$, where $\theta(\bullet)$ is a trading strategy and $p$ is an initial deposit of money, such that

$$p \cdot W_\theta(x_1, ..., x_T) \geq D(x_1, ..., x_T) \tag{9}$$

for all $x_1, ..., x_T \in \mathbb{R}^m_+ - \{0\}$.

In the words of an undated memo by Eric Benhamou of Goldman Sachs, “A super-hedge is defined as a portfolio that will generate greater or equal cash-flows in any outcome. A super-hedge guarantees to make no loss as the super-hedge more than offsets the derivative security.” The concept is due to Bensaid, Lesne, and Scheinkman (1992). Note that in our context, we demand that the final wealth $p \cdot W_\theta$ dominate the derivative payoff literally everywhere, and not merely with probability 1.

Definition 5. The super-hedging price (or super-replicating cost) $p^*$ of $D(\bullet)$ is the minimum initial deposit needed to super-replicate $D(\bullet)$, i.e. $p^*[D] := \inf \{ p \geq 0 : (p, \theta) \text{ is a super-hedge for some } \theta \}$.

Under this terminology, the cost of super-replicating the final wealth of the best rebalancing rule in hindsight (Ordentlich and Cover 1998) is

$$p(T, m) := \sum_{n_1 + \cdots + n_m = T} \binom{T}{n_1, ..., n_m} (n_1/T)^{n_1} \cdots (n_m/T)^{n_m}, \tag{10}$$

where the sum is taken over all solutions of the equation n1 + ··· + nm = T in non-negative integers.
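For small $T$ and $m$, formula (10) can be evaluated by brute force. The Python sketch below enumerates every solution of $n_1 + \cdots + n_m = T$; the helper names are illustrative, and the enumeration is only practical for small inputs since the number of terms grows quickly with $m$.

```python
import math
from typing import Iterator, Tuple

def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """Yield all tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def superhedge_cost(T: int, m: int) -> float:
    """p(T, m) from equation (10), by direct enumeration."""
    cost = 0.0
    for counts in compositions(T, m):
        multinomial = math.factorial(T)
        for n in counts:
            multinomial //= math.factorial(n)   # exact at every step
        term = float(multinomial)
        for n in counts:
            term *= (n / T) ** n                # Python evaluates 0.0 ** 0 as 1.0
        cost += term
    return cost

for T in (1, 2, 5, 10):
    print(T, superhedge_cost(T, 2), superhedge_cost(T, 3))
```

The convention $0^0 := 1$ used later in the paper is what Python's `**` operator returns, so zero counts need no special handling here.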

Definition 6. A pairs rebalancing rule is a rebalancing rule $b \in \mathbb{R}^m_+$ whose support has at most two stocks, i.e.

$$\#\mathrm{supp}(b) = \#\{j : b_j > 0\} \leq 2. \tag{11}$$

More generally, an $s$-stock rebalancing rule is defined by the condition

$$\#\mathrm{supp}(b) \leq s. \tag{12}$$

Definition 7. The best pairs rebalancing rule in hindsight is the pairs rebalancing rule that would have yielded the greatest final wealth, given x1, ..., xT . The final wealth of the best pairs rebalancing rule in hindsight is

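$$D^{(2)}(x_1, ..., x_T) := \max_{\#\mathrm{supp}(c) \leq 2} \langle c, x_1 \rangle \cdots \langle c, x_T \rangle = \max_{(i,j):\, 1 \leq i < j \leq m} \; \max_{0 \leq b \leq 1} \prod_{t=1}^T \{ b x_{ti} + (1 - b) x_{tj} \}. \tag{13}$$

In general, $D^{(s)}(x_1, ..., x_T)$ will denote the final wealth of the best $s$-stock rebalancing rule in hindsight. Ordentlich and Cover (1998) corresponds to the special case $s = m$.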

For the Kelly horse race, a pairs rebalancing rule is a ﬁxed-fraction betting scheme that, each race, bets all its money on the same two horses {i, j} in the same ﬁxed proportions (b, 1 − b). Thus, if three or more distinct horses wind up winning over the T races, every pairs rebalancing rule will eventually go bankrupt, just as soon as a horse other than i or j wins a race.
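Definition 7 can be made concrete with a small hindsight search. The Python sketch below is an approximation, not the paper's procedure: it scans every pair and tries the fractions $b$ on a uniform grid, so it only finds the optimum up to the grid resolution (the `grid` parameter is an assumption of this sketch). Stocks are indexed from 0.

```python
from itertools import combinations
from math import prod
from typing import Optional, Sequence, Tuple

def pair_wealth(b: float, i: int, j: int, returns: Sequence[Sequence[float]]) -> float:
    """Final wealth of $1 when the fraction b is rebalanced into stock i and 1 - b into stock j."""
    return prod(b * x[i] + (1.0 - b) * x[j] for x in returns)

def best_pairs_rule(
    returns: Sequence[Sequence[float]], grid: int = 1000
) -> Tuple[float, Optional[Tuple[int, int]], float]:
    """Approximate D^(2): search all pairs i < j and all b in {0, 1/grid, ..., 1}."""
    m = len(returns[0])
    best_wealth, best_pair, best_b = -1.0, None, 0.0
    for i, j in combinations(range(m), 2):
        for k in range(grid + 1):
            b = k / grid
            wealth = pair_wealth(b, i, j, returns)
            if wealth > best_wealth:
                best_wealth, best_pair, best_b = wealth, (i, j), b
    return best_wealth, best_pair, best_b

# Hypothetical history: stocks 0 and 1 alternate doubling and halving; stock 2 is flat.
history = [(2.0, 0.5, 1.0), (0.5, 2.0, 1.0)]
print(best_pairs_rule(history))
```

Because the endpoints $b = 0$ and $b = 1$ are on the grid, the search always does at least as well as the best single stock, matching the remark in Section 1.2.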

3 Super-Replication

Lemma 1. For any trading strategy $\theta(\bullet)$, we have

$$\sum_{(j_1, ..., j_T) \in \{1, ..., m\}^T} W_\theta(e_{j_1}, ..., e_{j_T}) = 1. \tag{14}$$

Proof. In the definition of a final wealth function, start by substituting $x_T := e_{j_T}$ and summing both sides over all $j_T = 1, ..., m$. Since the coordinates of any portfolio vector sum to 1, we get $\sum_{j_T=1}^m W_\theta(x^{T-1}, e_{j_T}) = \sum_{j_T=1}^m W_\theta(x^{T-1}) \langle \theta(x^{T-1}), e_{j_T} \rangle = W_\theta(x^{T-1})$, where $W_\theta(x^{T-1})$ is the final wealth function for the first $T - 1$ investment periods. Next, substitute $x_{T-1} := e_{j_{T-1}}$ and sum over all $j_{T-1} = 1, ..., m$. We get $W_\theta(x^{T-2})$, and so on down the line. After summing over the indices $j_T, j_{T-1}, ..., j_2$, we finally substitute $x_1 := e_{j_1}$ and get $\sum_{j_1=1}^m W_\theta(e_{j_1}) = \sum_{j_1=1}^m \langle \theta(h^0), e_{j_1} \rangle = 1$, which is the desired result.

Proposition 1. For any derivative $D(\bullet)$, we have the bound

$$p^*[D] \geq \sum_{(j_1, ..., j_T) \in \{1, ..., m\}^T} D(e_{j_1}, ..., e_{j_T}). \tag{15}$$

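Proof. In the definition of super-hedging, we substitute $x_1 = e_{j_1}, x_2 = e_{j_2}, \cdots, x_T = e_{j_T}$ and sum the inequality over all possible indices $j_1, ..., j_T$. By Lemma 1, the left-hand side of the resulting inequality is equal to $p$. Thus every super-hedge $(p, \theta)$ costs at least the right-hand side of (15), and hence so does the infimum $p^*[D]$.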

Proposition 1 says that if $(p, \theta)$ dominates $D(\bullet)$ everywhere, then in particular it dominates the derivative payoff for the special case of the Kelly horse race. However, we have ignored the odds $O_{tj_t}$, and just substituted $x_t = e_{j_t}$. This amounts to an artificial situation where a one dollar bet on the winning horse gets you your dollar back; otherwise the money is lost.

Proposition 2. For unit basis vectors $x_t := e_{j_t}$, Cover’s Derivative has the value

$$D^{(m)}(e_{j_1}, ..., e_{j_T}) = (n_1/T)^{n_1} (n_2/T)^{n_2} \cdots (n_m/T)^{n_m}, \tag{16}$$

where $n_i := \#\{t : j_t = i\}$ is the number of races won by horse $i$.

Here we have used the tacit convention that “$0^0 := 1$” for the situation where there are horses $i$ such that $n_i = 0$.

Proof. We have to solve a standard Cobb-Douglas optimization problem over the simplex:

$$\max_{b \in \Delta_m} b_1^{n_1} \cdots b_m^{n_m}. \tag{17}$$

If $n_i = 0$, then $b_i^* = 0$, i.e. in hindsight no money should have been bet on horse $i$. For all other horses we have $b_i^* > 0$, and Lagrange’s multipliers give the solution $b_i^* = n_i / \sum_{i=1}^m n_i = n_i/T$.

Corollary 1. For unit basis vectors $x_t = e_{j_t}$, we have $D^{(2)} = 0$ if at least three distinct horses ever won a race. Otherwise, if $j_t \in \{i, j\}$ for all $t$, then $D^{(2)} = (n_i/T)^{n_i} (1 - n_i/T)^{T - n_i}$.

Corollary 2. $p^*[D^{(2)}] \geq \binom{m}{2} \sum_{n=0}^T \binom{T}{n} (n/T)^n (1 - n/T)^{T-n} = \binom{m}{2} p(T, 2)$.

Proof. We proceed to sum $D^{(2)}$ over all horse race sequences that have at most two winning horses $i, j$. To this end, we let $k^T := (k_1, k_2, ..., k_T)$, where $k_t \in \{i, j\}$ denotes the winner of race $t$. For each $i = 1, ..., m$, we let $n_i(k^T)$ denote the number of races won by horse $i$ in the sequence $k^T$.

$$\sum_{i<j} \sum_{k^T \in \{i,j\}^T} [n_i(k^T)/T]^{n_i(k^T)} [n_j(k^T)/T]^{n_j(k^T)} = \binom{m}{2} \sum_{n=0}^T \binom{T}{n} (n/T)^n (1 - n/T)^{T-n}. \tag{18}$$

In other words, among the $2^T$ histories that have only horses $i, j$ as winners, there are $\binom{T}{n}$ histories for which horse $i$ wins $n$ times, and for all such histories we have $D^{(2)} = (n/T)^n (1 - n/T)^{T-n}$.

Thus, we have found that the cost of achieving the best pairs rebalancing rule in hindsight is at least $\binom{m}{2} p(T, 2)$. To prove the equality $p^*[D^{(2)}] = \binom{m}{2} p(T, 2)$, we need only exhibit a super-hedge that costs $\binom{m}{2} p(T, 2)$.

Theorem 1. The (minimum) cost of achieving the best pairs rebalancing rule in hindsight is $\binom{m}{2} p(T, 2)$. To achieve the super-replication, one can proceed as follows: for every pair $\{i, j\}$ of stocks with $i < j$, purchase a minimum cost super-hedge at $t = 0$ for the final wealth of the best $\{i, j\}$ rebalancing rule in hindsight. This amounts to depositing $p(T, 2)$ dollars into the Ordentlich and Cover (1998) strategy over stocks $i$ and $j$, for an aggregate deposit of $\binom{m}{2} p(T, 2)$.

This simple strategy, which is the best possible, leads to a final wealth of at least $\sum_{i<j} \max_{0 \leq b \leq 1} \prod_{t=1}^T \{ b x_{ti} + (1 - b) x_{tj} \}$. One of the terms $(i^*, j^*)$ of this sum will correspond to the best pair of stocks in hindsight, so the sum will dominate the final wealth of the best pairs rebalancing rule in hindsight. However, in the worst-case scenario of the Kelly market, the trader’s wealth will be exactly equal to that of the best pairs rebalancing rule in hindsight.

Theorem 2. After $T$ periods, the excess per-period continuously-compounded growth rate of the best pairs rebalancing rule in hindsight over and above that of the super-hedging trader is at most

$$\frac{\log \binom{m}{2} + \log p(T, 2)}{T}, \tag{19}$$

which tends to 0 as $T \to \infty$. Thus, the trader compounds his money at the same asymptotic rate as the best pairs rebalancing rule in hindsight.

Proof. The trader takes his initial dollar and purchases $1/[\binom{m}{2} p(T, 2)]$ super-hedges of the final wealth of the best pairs rebalancing rule in hindsight. From the definition of super-hedge, we have

$$\binom{m}{2} p(T, 2) W_\theta(x_1, ..., x_T) \geq D^{(2)}(x_1, ..., x_T). \tag{20}$$

Taking logs, we have the uniform bound

$$\frac{\log D^{(2)} - \log W_\theta}{T} \leq \frac{\log \binom{m}{2} + \log p(T, 2)}{T}. \tag{21}$$

The fact that $\lim_{T \to \infty} \frac{1}{T} \log p(T, 2) = 0$ follows from the upper bound $p(T, 2) \leq 2\sqrt{T + 1}$ (Ordentlich and Cover 1998, Lemma 3).
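To get a feel for how quickly the bound (19) shrinks, one can tabulate it for a few horizons. The Python sketch below evaluates $p(T, 2)$ directly from its defining sum; the function names are illustrative, and the direct evaluation is only intended for horizons up to roughly 1000, beyond which the binomial coefficients no longer fit in a float.

```python
import math

def p2(T: int) -> float:
    """p(T, 2): sum over n of C(T, n) (n/T)^n (1 - n/T)^(T - n)."""
    return sum(
        math.comb(T, n) * (n / T) ** n * (1.0 - n / T) ** (T - n)
        for n in range(T + 1)
    )

def excess_growth_bound(m: int, T: int) -> float:
    """Right-hand side of (19): worst-case per-period growth-rate shortfall."""
    return (math.log(math.comb(m, 2)) + math.log(p2(T))) / T

m = 30  # e.g. the Dow Jones 30
for T in (10, 100, 1000):
    print(T, round(p2(T), 3), round(2.0 * math.sqrt(T + 1), 3),
          round(excess_growth_bound(m, T), 5))
```

The second and third printed columns compare $p(T, 2)$ with the upper bound $2\sqrt{T+1}$ cited in the proof.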

It remains to write explicit formulas for the super-replicating strategy. To this end, let $W_{ij}(x^t)$ be the wealth, after $x^t$, that has accrued to a \$1 deposit into the Ordentlich-Cover (1998) strategy applied to the specific pair $\{i, j\}$ of stocks, with $i < j$. Alternatively, one can use the sequential-minimax universal portfolio (Garivaltis 2018) applied to stocks $\{i, j\}$. On account of the fact that we have made an initial deposit of $p(T, 2)$ dollars into each distinct pairs strategy, our aggregate wealth after $x^t$ will be $p(T, 2) \sum_{i<j} W_{ij}(x^t)$. Let $b_{ij}(x^t)$ denote the fraction of wealth held by this strategy in stock $i$ after $x^t$, so that the fraction $1 - b_{ij}(x^t)$ is held in stock $j$. The total wealth held in stock $k$ is then

$$\sum_{i=1}^{k-1} p(T, 2) W_{ik}(x^t) [1 - b_{ik}(x^t)] + \sum_{i=k+1}^{m} p(T, 2) W_{ki}(x^t) b_{ki}(x^t). \tag{22}$$

Thus, the total fraction of wealth to bet on stock $k$ in session $t + 1$ (after return history $x^t$) is

$$\theta_{t+1,k}(x^t) = \frac{\sum_{i=1}^{k-1} W_{ik}(x^t) [1 - b_{ik}(x^t)] + \sum_{i=k+1}^{m} W_{ki}(x^t) b_{ki}(x^t)}{\sum_{i<j} W_{ij}(x^t)}. \tag{23}$$

This expression accounts for the total wealth held by the $m - 1$ pairs strategies $(i, k)$ and $(k, i)$ that have stock $k$ in the portfolio, as a fraction of the aggregate wealth held by all $\binom{m}{2}$ strategies. The practitioner is required to keep track of the wealths and portfolio vectors of $\binom{m}{2}$ separate pairs strategies.
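The aggregation step in (23) is simple bookkeeping once the pair strategies are available. The Python sketch below takes the current wealths $W_{ij}$ and first-stock fractions $b_{ij}$ as given inputs (it does not implement the two-stock Ordentlich-Cover strategy itself) and returns the combined portfolio vector. The dictionary layout and the function name are assumptions of this sketch, and stocks are indexed from 0.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]

def aggregate_weights(
    m: int,
    pair_wealth: Dict[Pair, float],
    pair_fraction: Dict[Pair, float],
) -> List[float]:
    """Equation (23): combine the pair strategies into one portfolio vector.

    pair_wealth[(i, j)]   is W_ij(x^t), for i < j.
    pair_fraction[(i, j)] is b_ij(x^t), the fraction strategy (i, j) holds in stock i.
    The common factor p(T, 2) cancels, so it does not appear here.
    """
    held = [0.0] * m
    for i, j in combinations(range(m), 2):
        wealth, b = pair_wealth[(i, j)], pair_fraction[(i, j)]
        held[i] += wealth * b            # stock i is the first member of the pair
        held[j] += wealth * (1.0 - b)    # stock j is the second member
    total = sum(pair_wealth[pair] for pair in combinations(range(m), 2))
    return [h / total for h in held]

# Hypothetical state for m = 3 stocks.
W = {(0, 1): 2.0, (0, 2): 1.0, (1, 2): 1.0}
b = {(0, 1): 0.75, (0, 2): 0.5, (1, 2): 0.5}
print(aggregate_weights(3, W, b))
```

Pairs whose strategy has accumulated more wealth automatically receive more influence over the aggregate portfolio, which is how the combined strategy "learns" the best pair.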

3.1 Generalized Max-Min Game

Ordentlich and Cover (1998) considered a two-person zero-sum trading game between the trader (Player 1) and nature (Player 2). The trader picks an entire trading algorithm $\theta(\bullet)$ while nature simultaneously picks the returns $(x_1, ..., x_T)$ of all stocks in all periods. They used the payoff kernel

$$(\theta(\bullet), x_1, ..., x_T) \mapsto \frac{W_\theta(x_1, ..., x_T)}{D^{(m)}(x_1, ..., x_T)}, \tag{24}$$

which is the ratio of the trader’s final wealth to that of the best (full support) rebalancing rule in hindsight. In this subsection we solve the generalized game with payoff kernel

$$(\theta(\bullet), x_1, ..., x_T) \mapsto \frac{W_\theta(x_1, ..., x_T)}{D^{(s)}(x_1, ..., x_T)}. \tag{25}$$

Theorem 3. In pure strategies, the lower value of the game is $1/[\binom{m}{s} p(T, s)]$ and the upper value is 1. Thus, there is no pure strategy Nash equilibrium. The trader’s maximin strategy is to play a minimum-cost super-hedge for $D^{(s)}$. Nature’s minimax strategy is to pick (any) particular stock $j^*$ and have it be the best performing stock in all periods, i.e. $x_{tj^*} \geq x_{tj}$ for all $t, j$.

Proof. Let $\theta(\bullet)$ be a minimum-cost super-hedge for $D^{(s)}$. From the definition of super-hedging, we have the uniform bound

$$\binom{m}{s} p(T, s) W_\theta(x_1, ..., x_T) \geq D^{(s)}(x_1, ..., x_T). \tag{26}$$

Thus, the trading strategy $\theta(\bullet)$ guarantees that the payoff is at least $1/[\binom{m}{s} p(T, s)]$. This is the best possible guarantee. For, suppose that a trading strategy $\psi(\bullet)$ guarantees that $W_\psi / D^{(s)} \geq g$ for all $x_1, ..., x_T$. Then, since $(1/g) W_\psi \geq D^{(s)}$, the strategy $\psi(\bullet)$ is a super-hedge for $D^{(s)}$, with an initial deposit of $1/g$ dollars. Since the cheapest possible super-hedge costs $\binom{m}{s} p(T, s)$, we must have $1/g \geq \binom{m}{s} p(T, s)$, so that $g \leq 1/[\binom{m}{s} p(T, s)]$. This shows that $1/[\binom{m}{s} p(T, s)]$ is the highest possible payoff the trader can guarantee.

To show that the upper value is 1, assume that nature chooses a specific return path $(x_1, ..., x_T)$ with the property that a certain stock $j^*$ is the best performer in all periods. Then the best $s$-stock rebalancing rule in hindsight is a degenerate rebalancing rule that keeps 100% of its wealth in stock $j^*$ at all times (and 0% in the other $s - 1$ stocks). This also happens to be the best trading strategy of any kind that could be played against the specific path $(x_1, ..., x_T)$. Thus, this specific path guarantees that $W_\theta / D^{(s)} \leq 1$ for all $\theta(\bullet)$.

Theorem 4. To fill the duality gap, nature randomizes over Kelly horse race sequences $x_t = e_{j_t}$ that have at most $s$ distinct winners ($\#\{j_1, ..., j_T\} \leq s$). It plays the particular sequence $(e_{j_1}, ..., e_{j_T})$ with probability $D^{(s)}(e_{j_1}, ..., e_{j_T}) / [\binom{m}{s} p(T, s)]$. The value of the game is $1/[\binom{m}{s} p(T, s)]$. In the mixed-strategy Nash equilibrium, Player 1 does not randomize; he continues to play a minimum-cost super-hedge $\theta(\bullet)$.

Proof. First note that these are legitimate probabilities, since they are non-negative and sum to 1. For a specific Kelly sequence $(e_{j_t})_{t=1}^T$ that has at most $s$ distinct winners, the payoff is $W_\theta(e_{j_1}, ..., e_{j_T}) / D^{(s)}(e_{j_1}, ..., e_{j_T})$. Multiplying these payoffs by their probabilities and summing over all such sequences, we obtain an expected payoff of

$$\frac{\sum_{\#\{j_1, ..., j_T\} \leq s} W_\theta(e_{j_1}, ..., e_{j_T})}{\binom{m}{s} p(T, s)}. \tag{27}$$

By Lemma 1, the numerator is at most 1. Thus, nature’s mixed strategy has guaranteed that the expected payoff is at most $1/[\binom{m}{s} p(T, s)]$, regardless of $\theta(\bullet)$. This proves the theorem.

4 Conclusion

This paper generalized Ordentlich and Cover’s beautiful (1998) result, that the cost of super-replicating the best full support rebalancing rule in hindsight is $p(T, m) = \sum_{n_1 + \cdots + n_m = T} \binom{T}{n_1, ..., n_m} (n_1/T)^{n_1} \cdots (n_m/T)^{n_m}$. We obtained the fact that the cost of super-replicating the best $s$-stock rebalancing rule in hindsight is $\binom{m}{s} p(T, s)$.

For any significant number of stocks (say, the Dow Jones 30), the full support universal portfolio is impossible to compute in practice. However, it is very easy to super-replicate the best pairs trade in hindsight: one need only calculate and account for $\binom{m}{2}$ 2-stock universal portfolios. The minimum-cost super-hedge amounts to buying $\binom{m}{2}$ super-hedges for $p(T, 2)$ dollars each, one for each pair of stocks $i < j$. If the realized volatility of stock prices turns out to be low, then this strategy will easily have enough final wealth with which to dominate the best pairs rebalancing rule in hindsight. However, if the realized volatility of the stock market turns out to be extremely high, then the final wealth of this strategy will not be much more than that of the best pairs rebalancing rule in hindsight. In the limiting case of the Kelly horse race market, the strategy will have a final wealth that is exactly equal to the derivative payoff. In practice, the trader will have to pick a tolerance $\epsilon$, and calculate the shortest horizon $T(\epsilon)$ on which he can guarantee to achieve a compound growth rate that is within $\epsilon$ of that of the best pairs rebalancing rule in hindsight. $T(\epsilon)$ is the smallest solution $T$ of the inequality

$$\frac{\log \binom{m}{2} + \log p(T, 2)}{T} < \epsilon. \tag{28}$$

With this horizon in hand, the trader takes his initial dollar and purchases $1/[\binom{m}{2} p(T, 2)]$ super-hedges, yielding a final wealth of at least $D^{(2)} / [\binom{m}{2} p(T, 2)]$, where $D^{(2)}$ is the wealth of the best pairs rebalancing rule in hindsight. If the realized returns $(x_t)_{t=1}^\infty$ are such that the best pairs rebalancing rule in hindsight sustains a higher asymptotic growth rate than the best performing stock in the market, then the trader will beat the market asymptotically. Put more concisely, we hope that

$$\liminf_{T \to \infty} \frac{1}{T} \left\{ \log D^{(2)}(x_1, ..., x_T) - \log \max_{1 \leq j \leq m} \prod_{t=1}^T x_{tj} \right\} > 0. \tag{29}$$

However, the trader’s asymptotic growth rate will be somewhat lower than that achieved by the full-support universal portfolio.
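The horizon $T(\epsilon)$ from inequality (28) can be found by a direct scan over $T$. The Python sketch below computes $\log p(T, 2)$ in log space (via `math.lgamma`) so that long horizons do not overflow; the function names, the `max_T` cap, and the example tolerance are assumptions of this sketch rather than choices made in the paper.

```python
import math

def log_p2(T: int) -> float:
    """log p(T, 2), with every term of the defining sum computed in log space."""
    def n_log_share(n: int) -> float:
        # n * log(n / T), using the convention 0 * log 0 = 0
        return 0.0 if n == 0 else n * math.log(n / T)

    log_terms = [
        math.lgamma(T + 1) - math.lgamma(n + 1) - math.lgamma(T - n + 1)
        + n_log_share(n) + n_log_share(T - n)
        for n in range(T + 1)
    ]
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(v - peak) for v in log_terms))

def growth_gap(m: int, T: int) -> float:
    """Left-hand side of inequality (28)."""
    return (math.log(math.comb(m, 2)) + log_p2(T)) / T

def shortest_horizon(m: int, eps: float, max_T: int = 5000) -> int:
    """Smallest T <= max_T satisfying (28), found by linear scan."""
    for T in range(1, max_T + 1):
        if growth_gap(m, T) < eps:
            return T
    raise ValueError("no horizon up to max_T meets the tolerance")

T_star = shortest_horizon(m=30, eps=0.05)
print(T_star, growth_gap(30, T_star))
```

The scan costs on the order of $T^2$ operations, which is fine for moderate tolerances; very small $\epsilon$ would call for a smarter search or a closed-form approximation of $p(T, 2)$.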

References

[1] Benhamou, E. Dynamic Replication. Undated Memo.

[2] Bensaid, B., Lesne, J.P. and Scheinkman, J., 1992. Derivative Asset Pricing With Transaction Costs. Mathematical Finance, 2 (2), pp.63-86.

[3] Blackwell, D., 1956. An Analog of the Minimax Theorem for Vector Payoffs. Pacific Journal of Mathematics, 6 (1), pp.1-8.

[4] Breiman, L., 1961. Optimal Gambling Systems for Favorable Games. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. The Regents of the University of California.

[5] Cover, T.M., 1987. Log Optimal Portfolios. Chapter in Gambling Research: Gambling and Risk Taking, Seventh International Conference (Vol. 4).

[6] Cover, T.M., 1991. Universal Portfolios. Mathematical Finance, 1 (1), pp.1-29.

[7] Cover, T.M. and Gluss, D.H., 1986. Empirical Bayes Stock Market Portfolios. Advances in Applied Mathematics, 7 (2), pp.170-181.

[8] Cover, T.M. and Ordentlich, E., 1996. Universal Portfolios With Side Information. IEEE Transactions on Information Theory, 42 (2), pp.348-363.

[9] Cover, T.M. and Thomas, J.A., 2006. Elements of Information Theory. John Wiley & Sons.

[10] DeMarzo, P., Kremer, I. and Mansour, Y., 2006. Online Trading Algorithms and Robust Option Pricing. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing (pp. 477-486). ACM.

[11] Garivaltis, A. 2018. Exact Replication of the Best Rebalancing Rule in Hindsight. Working Paper.

[12] Garivaltis, A. 2018. How to Beat the Market Asymptotically (by Superhedging a Lookback Derivative). Working Paper.

[13] Györfi, L., Lugosi, G. and Udina, F., 2006. Nonparametric Kernel-Based Sequential Investment Strategies. Mathematical Finance 16 (2), pp.337-357.

[14] Iyengar, G., 2005. Universal Investment in Markets with Transaction Costs. Mathematical Finance 15 (2), pp.359-371.

[15] Jamshidian, F., 1992. Asymptotically Optimal Portfolios. Mathematical Finance, 2 (2), pp.131-150.

[16] Kelly, J., 1956. A New Interpretation of Information Rate. Bell Sys. Tech. Journal, 35, pp.917-926.

[17] Kozat, S.S. and Singer, A.C., 2011. Universal Semiconstant Rebalanced Portfolios. Mathematical Finance 21 (2), pp.293-311.

[18] Luenberger, D.G., 1998. Investment Science. Oxford University Press.

[19] Markowitz, H., 1952. Portfolio Selection. The Journal of Finance, 7 (1), pp.77-91.

[20] Ordentlich, E. and Cover, T.M., 1998. The Cost of Achieving the Best Portfolio in Hindsight. Mathematics of Operations Research, 23 (4), pp.960-982.

[21] Poundstone, W., 2010. Fortune’s Formula: The Untold Story of the Scientific Betting System that Beat the Casinos and Wall Street. Hill and Wang.

[22] Shtar’kov, Y.M., 1987. Universal Sequential Coding of Single Messages. Problemy Peredachi Informatsii, 23 (3), pp.3-17.

[23] Stoltz, G. and Lugosi, G., 2005. Internal Regret in On-Line Portfolio Selection. Machine Learning, 59 (1-2), pp.125-159.

[24] Thorp, E.O., 1966. Beat the Dealer. Random House.

[25] Thorp, E.O., 2017. A Man for All Markets: From Las Vegas to Wall Street, how I Beat the Dealer and the Market. Random House Incorporated.

[26] Wilmott, P., 1998. Derivatives: the Theory and Practice of Financial Engineering. John Wiley & Sons.

[27] Wilmott, P., 2001. Paul Wilmott Introduces Quantitative Finance. John Wiley & Sons.