Shuffling Decks With Repeated Card Values

by Mark A. Conger

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Mathematics) in The University of Michigan 2007

Doctoral Committee:

Assistant Professor Divakar Viswanath, Chair
Professor Sergey Fomin
Professor John K. Riles
Professor Bruce Sagan, Michigan State University

© Mark A. Conger
All Rights Reserved
2007

ACKNOWLEDGEMENTS

This thesis has been more than 16 years in the making, so there are a lot of people to thank.

Thanks to my friends and roommates Steve Blair and Paul Greiner, who managed to get me through my first year at Michigan (1990–91). As Steve said to me once, years later, "Together, we made a pretty good graduate student."

Thanks to Prof. Larry Wright at Williams for teaching me about Radix Sort. And thanks to everyone who shared their punch cards and punch card stories with me: John Remmers, Ken Josenhans, Thom Sterling, and Larry Mohr among others.

Thanks to Profs. Bill Lenhart at Williams and Phil Hanlon at Michigan, who taught me basic enumerative combinatorics, especially how to switch summation signs. Prof. Ollie Beaver taught me how to use confidence intervals, and Martin Hildebrand taught me the Central Limit Theorem. Pam Walpole and Pete Morasca taught me many things I use every day.

Most of the programs used to test hypotheses and generate Monte Carlo data were written in C/C++. The thesis was typeset in LaTeX using the PSTricks package, with a number of figures written directly in PostScript. Other figures, as well as the glossary and notation pages, were generated from scripts by Perl programs. Certain tables were created by an XSLT stylesheet. MAPLE programs were used when large rational results were required. So thank you to Brian Kernighan, Dennis Ritchie, Bjarne Stroustrup, Donald Knuth, Leslie Lamport, Timothy Van Zandt, Adobe Systems, Larry Wall, the World Wide Web Consortium, and Maplesoft for those languages.

Thanks to Profs. Mark Skandera and Boris Pittel for explanations of the Neggers-Stanley conjecture. Prof. Herbert Wilf encouraged the work in Chapter IV. Prof. Jeff Lagarias found me a copy of [22].

Thanks to all my friends with Ph.D.'s who inspired me by example during my time in the wilderness. They include Paul Greiner, Rick Mohr, Thom Sterling, Jill Baker, Ray Bingham, Lucy Hadden, Will Brockman, Ming Ye, Xueqing Tan, Ivan Yourshaw, Ken Hodges, Su Fang Ng, John Remmers, Steven Lybrand, Wil Lybrand, Jan Wolter, and Larry Mohr.

Thanks to Profs. Keith Riles and Roberto Merlin, who let me sit in on their physics classes in 1999. If they had been less encouraging, I might have gone back to programming for a living.

Thanks to Prof. Al Taylor, who encouraged me to get back into the math program, and has acted as my unofficial protocol adviser throughout.

Thanks to all the staff in the graduate math office at Michigan, especially Warren Noone, Tara McQueen, Christine Betz Bolang, Bert Ortiz, and Jennifer Wagner. Overcoming bureaucracy has always been a challenge for me, and it helped enormously to have advocates (and often surrogates) for dealing with the powers that be. The department also generously paid my tuition for the Winter 2006 semester.

Thanks to Jayne London for fostering all the programs for graduate students over at Rackham, and for talking to me about options.

Many thanks to my good friend and teaching partner Jason Howald, who was always encouraging and helpful when I was stuck, and who gave me the greatest gift one mathematician can give another: he listened to me talk about my problem, thought about it, and worked with me to find a different approach.

Thanks to Profs. and Jason Fulman for conversations and tips on directions to take, and to Jim Reeds and Ed Gilbert for explaining the history to me.

Thanks to Prof. Bruce Sagan, who helped me with mathematical as well as political advice on a number of occasions.

Sergi Elizalde talked with me about EDπ(1) (Chapter IV, Theorem 4.3), and wrote a proof a few days after I did.

Thanks to all my office mates: Kamran Kashef, Ken Keppen, Jared Maruskin, Jiarui Fei, Dave Constantine, and Janis Stipins, for putting up with my clutter (and my primitive origami).

Thanks to Teresa Hunt for years of encouragement and for several pearls of wisdom and tricks for getting things done.

Thanks to Nito for keeping me company.

Thanks to my study partners of the past few years: Cornelia Yuen, Paul Greiner, Rob Houck, and Chris Bichler.

Many thanks to my stepfather, Wil Lybrand, who gave me the invaluable advice to treat the Ph.D. as a union card, not a life's work.

Prof. Sergey Fomin taught me everything I know about Young diagrams and symmetric functions, and he has always been generous with his time and his support.

My advisor Divakar Viswanath, of course, was more involved with this work than anyone, and deserves the most credit for its success. We began work on the topic of card shuffling in 2002, when he gave me a copy of [3] to read as part of a class on Markov Chains. He suggested thinking about problems that the authors had not considered, and that was the source of the question about decks with repeated cards.

Throughout the development of the ideas in this thesis he has usually been several steps ahead of me.

My Mother, Drew Conger, has been unbelievably patient waiting for me to finish my degree. I learned not only patience but almost everything else I value from her.

Carol Mohr has endured the most in service of this project. She has also given the most support, and for that I am forever grateful.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS ...... ii
LIST OF FIGURES ...... x
LIST OF TABLES ...... xiii
LIST OF APPENDICES ...... xiv
GLOSSARY ...... xv
NOTATION ...... xx

CHAPTER

I. Introduction ...... 1
1.1 Previous Results ...... 2
1.2 Repeated Cards ...... 6
1.3 New Results ...... 8
1.3.1 Dealing is Equivalent to Fixing the Target Deck ...... 8
1.3.2 Transition Probabilities and Descent Polynomials ...... 9
1.3.3 Approximating Transition Probabilities ...... 9
1.3.4 Bridge ...... 11
1.3.5 Other Results from First-Order Approximations ...... 12
1.3.6 Monte Carlo Simulations ...... 13
1.3.7 Calculation of Descent Polynomials for Certain Decks ...... 13
1.3.8 The Joint Distribution of des(π) and π(1) ...... 14

II. Preliminaries ...... 15
2.1 Permutations and Decks ...... 15
2.2 Repeated Values ...... 16
2.3 Mixing Problems ...... 18
2.4 Riffle Shuffling ...... 19
2.5 a-shuffles ...... 22
2.6 Inverse Shuffles and Repeated Shuffles ...... 23
2.7 Counting Shuffles Which Produce a Permutation ...... 26
2.8 Descent Polynomials and Shuffle Series ...... 29
2.9 Distance from Uniform ...... 30
2.10 Variation Distance ...... 33
2.11 Dealing Cards is Equivalent to Fixing the Target Deck ...... 36
2.12 How Good is the GSR Model? ...... 38

III. Probability Calculations for Some Simple Decks ...... 41
3.1 The Simplest Deck ...... 43
3.2 One Red Card on Top ...... 44
3.3 One Red Card on the Bottom ...... 45
3.4 Any Position to Top ...... 47
3.5 Any Position to Bottom ...... 49
3.6 Any Position to Any Position ...... 50
3.7 One Red Card, One Green Card ...... 52
3.8 Source Deck R^m B^n ...... 53
3.9 Target Deck R^m B^n ...... 56
3.10 Source Deck 1^{n1} 2^{n2} ··· m^{nm} ...... 57
3.11 Target Deck 1^{n1} 2^{n2} ··· m^{nm} ...... 60
3.12 Target Decks Containing Blocks ...... 61

IV. The Joint Distribution of π(1) and des(π) in S_n ...... 65
4.1 Introduction and Main Results ...... 65
4.2 Basic Properties ...... 68
4.3 Recurrences ...... 70
4.4 Formulas and Moments ...... 71
4.5 Application Using Stein's Method ...... 77
4.6 Generating Functions ...... 79
4.7 General Behavior ...... 85
4.8 Behavior if d ≪ n ...... 88
4.9 If Both Ends Are Fixed ...... 89
4.10 Remarks ...... 92

V. Estimating Variation Distance ...... 93
5.1 The Basic Questions of Card Shuffling ...... 93
5.2 Calculating Variation Distance Exactly ...... 95
5.3 Expansion of the Transition Probability as a Power Series in a^{-1} ...... 96
5.4 Approximating the Transition Probability and Variation Distance for Large Shuffles ...... 99
5.5 All-Distinct Decks ...... 102
5.6 Calculation of κ1 when the Source Deck is Fixed ...... 105
5.6.1 Two Card Types ...... 105
5.6.2 For Which Collections of Cards Can κ1 be Zero? ...... 108
5.6.3 Guessing How Large κ1 Can Be ...... 111
5.6.4 Finding κ1 for General Source Decks ...... 115
5.7 Calculation of κ̄1 when the Target Deck is Fixed ...... 118
5.7.1 For Which Collections of Cards Can κ̄1 be Zero? ...... 119
5.7.2 Finding κ̄1 for General Target Decks ...... 122
5.7.3 Euchre ...... 123
5.7.4 Straight Poker ...... 124
5.7.5 Ordered Deals ...... 126
5.7.6 Bridge ...... 129
5.7.7 Estimation of κ̄1 for Large Decks ...... 132
5.8 Bounding the Error in the First-Order Estimate ...... 133
5.9 Ways to Understand the Transition Between Two Decks ...... 136

VI. Monte Carlo Estimates ...... 143
6.1 Approximating Variation Distance Given Descent Polynomials ...... 144
6.2 Approximating Transition Probabilities ...... 147
6.3 Approximating κ1 and κ̄1 ...... 154

APPENDICES ...... 158

BIBLIOGRAPHY ...... 210

LIST OF FIGURES

Figure

2.1 The four permutations which take BABA to AABB. ...... 17
2.2 The correspondence between the shuffle of Table 2.1 and a sequence of 0's and 1's. ...... 21
2.3 From left to right, a 3-shuffle of ABCDEFGH; from right to left, an inverse 3-shuffle of DAGEFBHC. ...... 24
2.4 A radix sort of a set of 8 punch cards. ...... 25
2.5 The advantages for player B in the Ace-King game (in black) and in Doyle's game (in red). ...... 33
2.6 The variation distance from uniform of a distinct 52-card deck after an a-shuffle. ...... 36
3.1 The graphs of π, ρπ, πρ, and ρπρ, showing the way ρ changes ascents to descents and vice-versa. ...... 46
3.2 The two cases considered in Section 3.4. ...... 49
3.3 The shuffle described in Section 3.6. ...... 51
3.4 The second "red-green" shuffle described in Section 3.7. ...... 52
3.5 The shuffle of the source deck R^m B^n, as described in Section 3.8. ...... 54
3.6 The Young matrix A for the deck RBBRBRRBBBRRB. ...... 55
3.7 A shuffle to the target deck R^m B^n. ...... 56
3.8 Shuffling a sorted deck, as described in Section 3.10. ...... 58
3.9 A shuffle to the target deck 1^{n1} 2^{n2} ··· m^{nm}. ...... 61
4.1 A north-east lattice path from (0, 0) to (m + n − ℓ, ℓ). ...... 75
4.2 The real zeroes of a Neggers-Stanley descent polynomial. ...... 84
4.3 The unimodality of ⟨n d⟩_k for n = 6 and n = 7. ...... 87
4.4 The "rollback" procedure for finding ⟨n d⟩_k^ℓ. ...... 91
5.1 The variation distance from uniform, and first-order approximation, of a distinct 52-card deck after an a-shuffle. ...... 104
5.2 Representing a deck as a walk on a graph. ...... 106
5.3 The deck D′ = 112212112212 represented as a north-east path from (0, 0) to (6, 6). ...... 107
5.4 The effect of cutting a deck at position m. ...... 127
5.5 The effect of cutting a euchre deck at position m. ...... 127
5.6 Three styles of bridge dealing, represented by lattice paths. ...... 131
5.7 A summary of κ̄1 examples, with normal estimates. ...... 133
5.8 The bound E(n, a) on the error in the first-order estimate of variation distance, for n = 52, versus the actual error for the case of 52 distinct cards. ...... 136
6.1 A Monte Carlo estimate of the variation distance from uniform, and a first-order approximation, of a euchre deck after an a-shuffle. ...... 146
6.2 A Monte Carlo estimate of the variation distance from uniform, and first-order approximation, of an ordered bridge deck after an a-shuffle. ...... 147
6.3 A double Monte Carlo estimate of the variation distance from uniform of a cyclic bridge deck (D′_cyc = (NESW)^13). ...... 152
6.4 A double Monte Carlo estimate of the variation distance from uniform of a back-and-forth bridge deck (D′_bf = (NESWWSEN)^6 NESW). ...... 153
6.5 Monte Carlo estimates for three methods of dealing bridge. ...... 153
6.6 The distances of Figure 6.5 graphed on a log-log scale. ...... 154
6.7 A Monte Carlo estimate of the variation distance from uniform of a sorted go-fish deck (D = A^4 2^4 ··· K^4). ...... 156
C.1 Shuffling a sorted deck. ...... 178
C.2 The matrix B_1 if the subsequence of 1's and 2's in D′ is 1221211222112. ...... 179
D.1 Riemann sums corresponding to probabilities in the Ace-King game. ...... 186

LIST OF TABLES

Table

1.1 Variation distances from uniform after an a-shuffle for a deck of distinct cards and 3 methods of dealing bridge. ...... 12
2.1 The steps in a typical shuffle of the deck ABCDEFGH. ...... 20
5.1 Ordering a deck so that κ1 = 0. ...... 110
6.1 The distribution of des over a sample of M = 10^10 permutations from T(D_1, D′_cyc), where D_1 is as in Equation (6.3). ...... 149

LIST OF APPENDICES

Appendix

A. Tables of Eulerian and Refined Eulerian Numbers ...... 159
B. Excerpts from Calcul des Probabilités (1912) by Henri Poincaré ...... 163
B.1 Introduction, pp. 13–15 ...... 163
B.2 Chapter XVI, "Diverse Questions", pp. 301–313 ...... 165
C. An Algorithm for Computing the Descent Polynomial when the Source Deck is Sorted ...... 177
D. Probabilities for the Ace-King Game ...... 183
E. The Variance of Nc1 When the Source Deck is Fixed ...... 188
F. The Variance of Nc1 When the Target Deck is Fixed ...... 194
G. Moments of the Descent Distribution over T(D, D′) ...... 200
G.1 First Moment ...... 203
G.2 Second Moment ...... 204
G.2.1 Consecutive Variables ...... 204
G.2.2 Nonconsecutive Variables ...... 206

GLOSSARY

arrow diagram  A figure describing the action of a permutation on a deck. The source deck appears on one side of the diagram, the target deck on the other, and arrows are drawn between corresponding cards. (p. 16)

ascent  When applied to a permutation π, a position k such that π(k) < π(k + 1). (See descent.) (p. 27)

a-shuffle  A method of shuffling cards in which a deck is cut into a packets, according to a multinomial distribution on the packet sizes, then riffled together. There are a^n possible a-shuffles of a deck of n cards, and they are all equally probable under the GSR model. (p. 22)

blackjack  A game played with ordinary playing cards. The suit of cards is ignored, and there are 10 categories of rank: the aces, the twos, . . ., the nines, and the rest. Blackjack may be played with any number of ordinary decks mixed together, but when played with 52 cards, we may think of the deck as an ordering of A^4 2^4 ··· 9^4 T^16. (p. 114)

bottom card  The card in position n of an n-card deck. (p. 15)

bridge  A game played with an ordinary deck of 52 cards, all of which are considered distinct. Each of the four players (commonly referred to as north, east, south, and west) receives a 13-card hand. The order in which a player receives his cards is irrelevant, so we may consider a bridge deal to be some element of 𝒪(N^13 E^13 S^13 W^13). (p. 129)

card  An element in a deck, which has a value and a position in the deck. (p. 15)

cutoff  In a mixing problem, the amount of mixing after which the variation distance to uniform begins to drop quickly. (p. 94)

deal  A partitioning of a collection of cards into sets with prescribed sizes. (p. 37)

deck  A sequence of cards. May be thought of as a function from {1, 2, . . . , n} into an arbitrary set of values. (p. 15)

descent  When applied to a permutation π, a position k such that π(k) > π(k + 1). If the permutation is treated as a function and graphed in R², with line segments drawn between points, descents are the positions where the graph has negative slope. (p. 27)

descent polynomial  The ordinary generating function for the number of permutations in a set R ⊂ S_n which have d descents. R is usually the transition set between two chosen decks. (p. 29)

digraph  Two consecutive cards; we say a deck D has a u-v digraph at i if D(i) = u and D(i + 1) = v. (p. 99)

embedding  Given two decks s and t, an embedding is an injection φ from the positions of s to the positions of t such that t ∘ φ = s. In other words, an embedding is a subdeck of t which matches s in order and content. (p. 201)

euchre  A game played with 4 players and a deck of 24 distinct cards. Each player receives 5 cards. The remaining 4 cards are the "kitty." The top card of the kitty is face up and may enter play, but the other 3 cards are hidden and may not. The dealer deals two or three cards at a time, so the deal sequence is 111223334411222334445666. (p. 123)

go-fish  A children's card game played with a normal 52-card deck, in which the rank of cards is important but their suit is not. So we may think of a go-fish deck as an ordering of A^4 2^4 ··· K^4. (p. 113)

graph of a permutation  The graph obtained from a permutation π ∈ S_n by drawing a sequence of line segments from (1, π(1)) to (2, π(2)), from (2, π(2)) to (3, π(3)), . . ., and from (n − 1, π(n − 1)) to (n, π(n)). (p. 45)

GSR model  The Gilbert-Shannon-Reeds model of card shuffling. It can be described either by assuming that a deck is cut according to a binomial distribution, and then cards are dropped with probability proportional to packet size, or by stating that all 2^n shuffles of an n-card deck are equally likely. See also a-shuffle. (p. 19)

ordered deck  The deck e = 1, 2, 3, . . . , n. (p. 16)

pair  Two cards that appear as a subsequence of a deck; we say a deck D has a u-v pair at (i, j) if i < j, D(i) = u, and D(j) = v. (p. 99)

palindrome  A deck that is the same backwards as forwards. (p. 140)

permutation of n letters  A bijection from {1, 2, . . . , n} to itself. (p. 15)

rising sequence  A maximal sequence of consecutive numbers that appears as a subsequence of some deck. When applied to a permutation, the deck in question is the result of applying the permutation to the ordered deck. (p. 26)

shuffle series  The ordinary generating function of the number of a-shuffles that produce a permutation in R, for some R ⊂ S_n. R is usually the transition set between two chosen decks. (p. 29)

source deck  The deck being acted upon. (p. 17)

stabilizer  When a group G acts on a set X, the stabilizer of some element x in X is the subgroup of elements of G which leave x fixed. (p. 17)

stable sort  Sorting a list of items by some particular attribute in such a way that if two items have the same attribute value, they will be in the same relative order after the sort as they were before. (p. 23)

straight poker  A game played with any number of players, and a regular deck of cards. Each player receives 5 cards. All cards are distinct, but the order in which a player receives his cards is not significant. (p. 124)

symmetric group  The group of all permutations of {1, 2, . . . , n}. (p. 15)

target deck  The result of an action on a deck. (p. 17)

top card  The card in position 1 of a deck. (p. 15)

twin digraph  A digraph of the form uu. (p. 109)

uniform distribution  Given a finite event space ℰ = {E_1, E_2, . . . , E_N}, a probability distribution on ℰ is said to be uniform if P(E_i) = 1/N for all i. (p. 18)

variation distance  A measure of the distance between two probability distributions defined on the same event space. The variation distance between distributions P and Q is equal to the maximum of |P(W) − Q(W)| as W ranges over all sets of events. (p. 34)

winding number  The number of rising sequences of a permutation. (p. 27)

Young matrix  A matrix corresponding to a deck made up of two kinds of cards. The (i, j) entry of the matrix is x if the ith card of the first type appears after the jth card of the second type, and 1 otherwise. The x's in the matrix form a Young shape, anchored to the lower-left corner of the matrix. (p. 55)

NOTATION

P(U)  The probability of the event U.

E(X)  The expectation of the random variable X.

#U  The size of the set U.

[A]  The truth value of a logical statement A: [A] is 0 if A is false and 1 if A is true. (p. 49)

x^(m̄)  The rising factorial x(x + 1)(x + 2) ··· (x + m − 1). (p. 74)

x^(m̲)  The falling factorial x(x − 1)(x − 2) ··· (x − m + 1). (p. 74)

S_n  The symmetric group of all permutations of n letters. (p. 15)

ρ  The reversal permutation: ρ ∈ S_n is such that ρ(i) = n + 1 − i for all i. (p. 45)

⟨n d⟩  (Eulerian number) The number of permutations in S_n that have d descents. (p. 43)

⟨n d⟩_k  (Refined Eulerian number) The number of permutations π ∈ S_n with des(π) = d and π(1) = k. (p. 44)

⟨n d⟩_k^ℓ  The number of permutations π ∈ S_n with des(π) = d, π(1) = k, and π(n) = ℓ. These numbers can be written in terms of the ⟨n d⟩_k. (p. 52)

a_n(x)  The ordinary generating function for the Eulerian numbers: Σ_d ⟨n d⟩ x^d. a_0(x) is defined to be 1. (p. 43)

g_{n,k}(x)  The ordinary generating function for the refined Eulerian numbers: Σ_d ⟨n d⟩_k x^d. (p. 45)

G_n  The row vector (g_{n,1}(x), g_{n,2}(x), . . . , g_{n,n}(x)). (p. 55)

G̃_n  The row vector (g_{n,n}(x), g_{n,n−1}(x), . . . , g_{n,1}(x)). (p. 55)

h_{n,k,ℓ}(x)  The ordinary generating function for the descents of permutations with both ends fixed: Σ_d ⟨n d⟩_k^ℓ x^d. h_{n,k,ℓ} can be written in terms of g_{n,j} where j ≡ k − ℓ (mod n). (p. 52)

H_n  The n × n matrix whose (i, j) entry is h_{n,i,j}. (p. 59)

e  The ordered deck: 1, 2, 3, . . . , n. (p. 16)

T(D, D′)  The set of permutations which, when applied to deck D, result in D′. (p. 17)

W(D, u, v)  The number of u-v digraphs in the deck D minus the number of v-u digraphs in D. (p. 99)

Z(D, u, v)  The number of u-v pairs in the deck D minus the number of v-u pairs in D. (p. 99)

||P − Q||  The variation distance between the probability distributions P and Q. (p. 34)

κ1(D)  The coefficient of a^{-1} in the expansion of the variation distance from uniform of the source deck D after an a-shuffle. (p. 102)

κ̄1(D′)  The coefficient of a^{-1} in the expansion of the variation distance from uniform of the target deck D′ after an a-shuffle. Fixing the target deck corresponds to a method of dealing cards. (p. 102)

[n k]  (Unsigned Stirling number of the first kind) The number of permutations of n letters which have exactly k cycles. (p. 140)

{n k}  (Stirling number of the second kind) The number of ways to partition n things into k nonempty subsets. (p. 140)

ℋ(n, a)  The set of a-shuffles of n cards. (p. 23)

des(π)  The number of descents of the permutation π. (p. 27)

asc(π)  The number of ascents of the permutation π. (p. 27)

P_a(π)  The probability of obtaining the permutation π as the result of an a-shuffle. (p. 28)

𝒟_R(x)  The descent polynomial of R. (p. 29)

𝒮_R(x)  The shuffle series of R. (p. 29)

𝒟(D, D′; x)  The descent polynomial of T(D, D′). (p. 30)

𝒮(D, D′; x)  The shuffle series of T(D, D′). (p. 30)

P_a(D → D′)  The probability that an a-shuffle of the deck D results in the deck D′. (p. 33)

𝒪(D)  The orbit of a deck D when it is acted on by the symmetric group; i.e., all rearrangements of D. (p. 95)

D_bj  The blackjack deck (A23456789)^4 T^16. (p. 117)

D′_euchre  The euchre deck 111223334411222334445666. (p. 123)

D′_ord  The ordered bridge deck N^13 E^13 S^13 W^13. (p. 129)

D′_cyc  The cyclic bridge deck (NESW)^13. (p. 130)

D′_bf  The back-and-forth bridge deck (NESWWSEN)^6 NESW. (p. 131)

CHAPTER I

Introduction

This work is about the mathematics of card shuffling, specifically riffle shuffling, when the cards in the deck are not necessarily distinct, or when they are distinct but will be dealt into hands after the shuffle. The Gilbert-Shannon-Reeds model of riffle shuffling, along with variation distance from uniform as a measure of randomness, are used throughout.

Playing cards date back to the 12th century, and in Europe, according to Epstein [19, p. 158], Gutenberg printed playing cards the same year he printed his first Bible. Since then a number of people have approached the problem of understanding how shuffling mixes a deck of cards. There are a variety of ways to shuffle, and several ways to measure randomness, and research has included mathematical modeling as well as accumulation of statistics.

We will take for granted that the goal of shuffling is to make it difficult to guess anything about the order of the cards. In other words, a deck has been thoroughly shuffled if every possible ordering of it has become equally likely. The difficulty is that most shuffling methods employed by humans are not thorough; instead they leave some bias toward certain orderings and away from others. The goal of a mathematician studying card shuffling is therefore to quantify and predict the bias.

1.1 Previous Results

Henri Poincaré devoted a section of his 1912 book Calcul des Probabilités [36], quoted in Appendix B, to card shuffling. His basic assumption is that the shuffler employs some simple method of shuffling, which places cards into new positions based on their current positions, but does not depend for its action on the order of the deck. In other words, the shuffler selects a permutation from some fixed distribution on the set of all permutations, and applies it to the deck. (The distribution is what is meant by a "method of shuffling.") Nearly all subsequent authors have made the same assumption.

Poincaré is able to show that any shuffling method which meets certain mild criteria, if applied repeatedly to a deck, will eventually result in a well-mixed deck; that is, with enough shuffles the bias can be made arbitrarily small. At the same time Markov [34] was creating the more general theory of Markov Chains, and he often used card shuffling as an example. The verdict of history seems to be that Markov justly deserves credit for the theory named after him, but that Poincaré anticipated some of Markov's ideas in his work on card shuffling. Most subsequent work on shuffling has approached the problem as a Markov chain.

In 1955 Ed Gilbert [22, 23], working with Claude Shannon on the new science of information theory at Bell Labs, considered the ordering of a deck as a piece of information, and shuffling as "information destruction." So to Gilbert the bias of a shuffle is the information it leaves behind. In concordance with Poincaré he shows that for any reasonable method of shuffling, the residual information decays to zero as the number of shuffles rises.

Gilbert goes on to model several particular shuffles, including riffle shuffles, in which the deck is cut in half and the two halves interleaved in some fashion. He quotes a theorem of Shannon that characterizes which permutations are reachable with a certain number of shuffles, and then gives lower and upper bounds on the residual information based on the assumptions that all reachable permutations are equally likely, and that all possible shuffles are equally likely. The latter assumption has become the basis for the model employed in this work.

In 1981 Jim Reeds [37, 38] and David Aldous were thinking about generating random permutations by sorting n samples of a uniform random variable. If the variable takes on values of 0 and 1, the sorting permutation, applied to a deck of cards, will pull out those cards corresponding to 0's and move them to the top, preserving their order. Reeds realized that this was the inverse of a riffle shuffle, and that it was similar to the way a radix sorter (Section 2.6) would sort punch cards if it chose bins randomly instead of reading the cards. Repeated shuffles correspond to sorting multiple times, which is how the radix sorter sorts multi-digit fields. The assumption of uniformity on the bits is equivalent to picking shuffles uniformly, which is what Gilbert and Shannon had considered; thus the premise is known as the Gilbert-Shannon-Reeds model.

In 1983 Aldous [1], following Reeds, used card shuffling as an example of a random walk on a group. Aldous was interested in the general problem of how fast Markov chains approach their stationary distributions, and he related a number of ideas that are important to us. He measures the nonrandomness of a distribution P over a finite event space Ω by its variation distance from uniform, which is defined to be

||P − U|| := (1/2) Σ_{ω ∈ Ω} |P(ω) − U(ω)|,

where U(ω) = 1/#Ω is the uniform probability. (In Sections 2.9 and 2.10 we make an argument for why variation distance from uniform is a good measure of randomness for card shuffling.) Aldous observes that certain "rapidly mixing" Markov chains approach uniform very suddenly; thus, a graph of variation distance versus time displays the "waterfall" shape which the reader will see repeatedly in the present work (for example, in Figure 2.6). He suggests that the best way to understand the behavior of such chains is to pick a small value ε and find the first time the variation distance from uniform drops below ε. That value has come to be known as the "cutoff time".

Aldous shows that GSR shuffling is a rapidly mixing chain, and (drawing from Reeds' work) that the time when each card has been assigned a different bit string is a "strong uniform time." In other words, at that time, the initial ordering of the deck has become irrelevant, and the deck is thoroughly mixed. That allows Aldous to prove that the cutoff time for shuffling an n-card deck is about (3/2) log2(n) shuffles when n is large.

In 1986 Aldous and Persi Diaconis [2] gave the GSR model its name, and described it in three equivalent ways, one of which is given in Section 2.4. Again following Reeds, they observe that the time when each card has been assigned a different bit string is similar to the famous "birthday problem" in probability. After 11 shuffles of a 52-card deck, the probability that any two cards have the same string (i.e., the probability that some pair of cards has always been in the same half of the deck for each shuffle) drops below 1/2. In 1988 Diaconis [13] added a fourth description of the GSR model, and also reported some experiments he and Reeds performed to test how well the model represents the way people really shuffle (see Section 2.12).

In 1992, Diaconis and Dave Bayer [3] wrote the most famous paper to date on card shuffling. They generalized the normal idea of a shuffle to include a-shuffles, which is the way a person with a hands would shuffle a deck (see Section 2.5). That allows the problem to be approached without the use of Markov chains. Most importantly, they derived an explicit formula for the probability of a permutation π after an a-shuffle of an n-card deck:

P_a(π) = (1/a^n) · C(a + n − des(π) − 1, n),

where des(π) is the number of descents in π and C(·, ·) denotes a binomial coefficient. Using the formula, one can, for the first time, calculate precisely the variation distance from uniform of any deck of distinct cards after any number of GSR shuffles (see Section 2.10 and Figure 2.6). For a 52-card deck, the variation distance drops below 1/2 after 7 shuffles, which is why the New York Times article about the result had the headline "In Shuffling Cards, 7 Is Winning Number" [28]. Seven shuffles has been the standard for randomness for many card players ever since.
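The "seven shuffles" figure is reproducible with a short exact computation, sketched below in Python (our illustration; the function names are invented). Since the Bayer-Diaconis probability depends only on des(π), one can sum over permutations grouped by descent count, using the Eulerian-number recurrence, to get the exact variation distance of an n-card distinct deck from uniform after k GSR shuffles (an a-shuffle with a = 2^k).

```python
from fractions import Fraction
from math import comb, factorial

def eulerian_numbers(n):
    """A[d] = number of permutations of n letters with d descents,
    via the recurrence A(n, d) = (d+1) A(n-1, d) + (n-d) A(n-1, d-1)."""
    A = [1]
    for m in range(2, n + 1):
        A = [(d + 1) * (A[d] if d < len(A) else 0)
             + (m - d) * (A[d - 1] if d >= 1 else 0)
             for d in range(m)]
    return A

def variation_distance(n, a):
    """Exact variation distance from uniform of a deck of n distinct
    cards after an a-shuffle, summing |P_a(pi) - 1/n!| over the
    Eulerian classes of permutations with the same descent count."""
    A = eulerian_numbers(n)
    uniform = Fraction(1, factorial(n))
    total = Fraction(0)
    for d in range(n):
        p = Fraction(comb(a + n - d - 1, n), a ** n)  # P_a(pi) when des(pi) = d
        total += A[d] * abs(p - uniform)
    return total / 2

print(float(variation_distance(52, 2 ** 6)))  # 6 shuffles: about 0.614
print(float(variation_distance(52, 2 ** 7)))  # 7 shuffles: about 0.334
```

The exact rational arithmetic (`Fraction`) avoids any floating-point doubt about where the distance crosses 1/2: it does so between the sixth and seventh shuffle, matching the headline.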

This work picks up where Bayer and Diaconis left off, but we would be remiss not to point out that many other people have thought about card shuffling, from both mathematical and statistical points of view. In 1940 Borel and Chéron [5] devoted an entire chapter of their book on the mathematics of bridge to shuffling. In 1958 Kosambi and Rao [29] did experiments with a 25-card deck in order to show that certain ESP research, which used such a deck to test psychic ability, had not taken into account the effect of poor shuffling. In 1973 Berger [4] showed there was a measurable difference between bridge tournament hands picked by computers and those dealt by hand, and he argues that the discrepancy is due to poor shuffling. In 1977 Epstein [19] investigated real shufflers by recording the sound of shuffles, and he suggests a different model than Gilbert, Shannon and Reeds (see Section 2.12).

Since 1992, a number of authors have used Bayer and Diaconis' results and generalized them in various ways (see [14, 15] for references); we mention only a few here. Diaconis, Pitman, and McGrath [17] calculated the distribution of a number of quantities after an a-shuffle. Ciucu [8] found the most likely card to be in a particular position after any number of GSR shuffles. Trefethen & Trefethen [49] and Stark, Ganesh, and O'Connell [44] revived Gilbert and Shannon's idea of measuring randomness as information loss, with some surprising results.

1.2 Repeated Cards

The problem of shuffling a deck with repeated cards is mentioned in [29], and the fact

that dealing a deck into hands affects the randomness created by shuffling appears

in [5], [22], and [35]. The new idea in the present work (and in [10] and [11]) is to

apply the results of Aldous, Diaconis, Reeds, et al. to the case of decks in which not

all cards are distinct. This turns out to be dual to the problem of decks which are

dealt into hands (see Section 2.11), and so the two problems are treated in tandem

throughout. Note that identifying cards has a natural application to card games

such as go-fish or blackjack, which are played with an ordinary deck but ignore the

suit of cards; so, for instance, the ace of spades is essentially identical to the ace of

hearts.

The generalization complicates the situation in the following ways:

• Decks (ordered sequences of cards) and transformations between decks can no longer be identified with permutations. Instead, for each pair of decks there is a set of permutations which transform the first into the second (and a different set which goes the other way). The transformation sets are easy to describe, but it is difficult to find their probability after shuffling.

• The initial order of a deck, and not just its composition, affects how fast the distribution approaches uniform.

• A deck can no longer be described by a single number (its size). That makes it hard to imagine a simple asymptotic result such as "The cutoff is near (3/2) log₂ n shuffles" which would apply to all conceivable decks.

In this work we fully embrace Bayer and Diaconis’ generalization of repeated GSR shuffles to a-shuffles. The good news then is that the formula for probability after an a-shuffle is still available to us, as long as we cast the transformation between decks in terms of permutations. Aldous, Bayer, and Diaconis’ method of measuring randomness—variation distance from uniform—is defined for any probability distri- bution, so it generalizes easily to the case of decks with repeated cards.

The other good news is that we have a richer set of questions we can ask than simply

“How many times should one shuffle,” the precise answer to which will always be somewhat debatable, since it depends on how much unfairness one is willing to tolerate. For instance:

• Which ordering of a deck will shuffle the slowest?

• Which method of dealing produces the most random set of hands?

• Where should one cut the deck between shuffling and dealing?

1.3 New Results

1.3.1 Dealing is Equivalent to Fixing the Target Deck

Let P_a(D → D′) be the probability of obtaining the deck D′ by a-shuffling the deck D. So after D is a-shuffled, the variation distance between the new distribution of decks and the uniform is

$$\|P_a - U\| = \frac{1}{2} \sum_{D' \in \mathcal{O}(D)} \big|P_a(D \to D') - U(D')\big|$$

where O(D) is the set of all reorderings of D. On the other hand, if a deck of distinct cards is shuffled and then dealt to k players, we may encode the method of dealing as a sequence D′ of letters from the alphabet {1, 2, …, k}; for instance, dealing cyclically around the table corresponds to the string

$$D' = (12\cdots k)(12\cdots k)\cdots(12\cdots k).$$

The uniform distribution in this case would make all partitions of the deck into hands

equally likely, and in Section 2.11 we show that the variation distance from uniform

after a-shuffling and then dealing with method D′ is

$$\|P_a - U\| = \frac{1}{2} \sum_{D \in \mathcal{O}(D')} \big|P_a(D \to D') - U(D)\big|.$$

In other words, dealing corresponds to fixing the target (ending) deck, in the same

way that identifying cards corresponds to fixing the source (starting) deck. (This idea

is due to Viswanath, and appears first in [10].) Note that shuffling is an asymmetric

process, so the two sums above are different. Thus we say that the problem of dealing

is “dual” to the problem of identifying cards, but not identical.

1.3.2 Transition Probabilities and Descent Polynomials

In order to describe the behavior of the distribution after repeated shuffles, we need

to be able to calculate the sums above for all values of a. Thus for either case

(fixed source deck or fixed target deck) it is necessary to compute P_a(D → D′)

descents it has, we could compute that probability if we knew how many descents

each permutation that takes D to D′ has. That is, if we know the coefficients of the

“descent polynomial”

$$\mathcal{D}(D, D'; x) = \sum_{\pi : \pi D = D'} x^{\operatorname{des}(\pi)}$$

we can calculate P_a(D → D′) for any a.
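For decks small enough to enumerate S_n directly, descent polynomials and the resulting transition probabilities can be computed by brute force. A sketch (the helper names are ours):

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def descent_poly(D, Dp):
    # b[d] = #{pi : pi applied to deck D gives deck Dp, and des(pi) = d}
    n, b = len(D), [0] * len(D)
    for pi in permutations(range(n)):       # pi sends position k to pi[k]
        out = [None] * n
        for k in range(n):
            out[pi[k]] = D[k]
        if ''.join(out) == Dp:
            b[sum(pi[k] > pi[k + 1] for k in range(n - 1))] += 1
    return b

def transition_prob(D, Dp, a):
    # P_a(D -> Dp) = (1/a^n) sum_d b_d C(n + a - 1 - d, n)
    n = len(D)
    return Fraction(sum(bd * comb(n + a - 1 - d, n)
                        for d, bd in enumerate(descent_poly(D, Dp))), a ** n)

print(descent_poly("BABA", "AABB"))
```

Summing transition_prob over all reorderings of the source deck gives exactly 1, a quick sanity check on the formula.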

Thus the first step in calculating variation distances seems to be a method for finding

descent polynomials. Unfortunately, this turns out to be an intractable problem in

general. In [11] Viswanath demonstrates a class of decks for which computing descent

polynomials is #P-complete, meaning that it is equivalent to a large class of counting

problems which are believed not to have polynomial-time solutions. So it seems likely

that calculating variation distances exactly is also #P-complete in general.

1.3.3 Approximating Transition Probabilities

The new idea in Chapter V is to write P_a(D → D′) as a polynomial in a⁻¹, thus allowing us to estimate the transition probability for large values of a. We show that

$$P_a(D \to D') = \sum_{k=0}^{n-1} c_k(D, D')\, a^{-k}$$

where c₀(D, D′) is the uniform probability and

$$c_1(D, D') = \sum_{u<v} \frac{n\, W(D,u,v)\, Z(D',u,v)}{2 N n_u n_v}.$$

n_w is the number of cards with value w, and W(D,u,v) and Z(D′,u,v) are simple, easily computable functions on decks: W(D,u,v) is the number of u-v digraphs in D minus the number of v-u digraphs, and Z(D′,u,v) is the number of u-v pairs in D′ minus the number of v-u pairs. See Section 5.4 for definitions of digraphs and pairs.

Thus we have a first-order estimate for the variation distance from uniform of a deck D after an a-shuffle:

$$\|P_a - U\| = \kappa_1(D)\, a^{-1} + O(a^{-2})$$

where

$$\kappa_1(D) = \tfrac{1}{2}\, E\, |N c_1(D, D')|$$

and the expectation is taken under the assumption that D′ is a random variable

uniformly distributed on the orderings of D. Similarly, when the target deck D′ is fixed, we have

$$\|P_a - U\| = \kappa_1(D')\, a^{-1} + O(a^{-2})$$

where

$$\kappa_1(D') = \tfrac{1}{2}\, E\, |N c_1(D, D')|$$

and D is uniform on O(D′). The coefficients κ₁(D) and κ₁(D′), when they are nonzero, tell the long-term behavior of the variation distance from uniform. So arguably they answer the questions "How hard is the deck D to shuffle?" and "How good is the dealing method D′?"

1.3.4 Bridge

Consider the case of bridge (Section 5.7.6). Bridge is a game with four players (North,

East, South, and West) who each receive 13 cards from a deck of 52 distinct cards.

If D′ is some method of dealing bridge, then

$$\kappa_1(D') = \frac{1}{13}\, E\, \Big|\sum_{u<v} W(D,u,v)\, Z(D',u,v)\Big|$$

Consider two common methods of dealing:

1. D′_ord = N¹³E¹³S¹³W¹³, that is, dealing the top 13 cards to North, the next 13 to East, etc.

2. D′_cyc = (NESW)¹³, dealing cyclically around the table.

There are 169 ways to find an N above an E in D′_ord, and no ways to find an E above an N, so Z(D′_ord, N, E) = 169. Likewise Z(D′_ord, u, v) = 169 for all u < v, as long as we give the values the inherent order N < E < S < W. (The order is arbitrary and only for convenience; any order will give the same value for κ₁.) On the other hand, one can check that there are 91 N-E pairs and 78 E-N pairs in D′_cyc, and likewise for the other values, so Z(D′_cyc, u, v) = 13 for all u < v. It follows that

$$\kappa_1(D'_{cyc}) = \frac{1}{13}\, \kappa_1(D'_{ord}).$$

One can interpret that as saying that if one shuffles enough, then cyclic dealing works 13 times as well as cutting the deck into hands. Or, to put it another way, switching from cutting the deck into hands to cyclic dealing is as effective in the long run as doing an extra log₂ 13 ≈ 3.7 GSR shuffles.

Deck                 Method        a=16     32       64       128      256      512      1024
52 Distinct          Exact         1.0000   0.9237   0.6135   0.3341   0.1672   0.0854   0.0429
  123···(52)         44.0571a⁻¹    2.7536   1.3768   0.6884   0.3442   0.1721   0.0860   0.0430
Ordered bridge       Monte Carlo   0.9902   0.7477   0.4230   0.2183   0.1104   0.0550   0.0274
  N¹³E¹³S¹³W¹³       27.9095a⁻¹    1.7443   0.8722   0.4361   0.2180   0.1090   0.0545   0.0273
Cyclic bridge        Monte Carlo   0.2349   0.0735   0.0346   0.0169   0.0084   0.0042   0.0021
  (NESW)¹³           2.1469a⁻¹     0.1342   0.0671   0.0335   0.0168   0.0084   0.0042   0.0021
Back-Forth bridge    Monte Carlo   0.3118   0.0260   0.0073   0.0022   0.0008   0.0003   0.0002
  (NESWWSEN)⁶(NESW)  0.1651a⁻¹     0.0103   0.0052   0.0026   0.0013   0.0006   0.0003   0.0002

Table 1.1: Variation distances from uniform after an a-shuffle for a deck of distinct cards and 3 methods of dealing bridge. The first-order approximations are shown for each deck; the coefficients of a⁻¹, which are called κ₁ in the text, are rational numbers which will be computed exactly in Chapter V. Also shown are an exact computation for the distinct case, and the results of Monte Carlo simulations for the bridge methods. See Chapter VI for the Monte Carlo parameters.

The natural question at this point is, "Is cyclic dealing the best we can do?" The answer, surprisingly, is no. We show that for the dealing method

$$D'_{bf} = (NESWWSEN)^6(NESW)$$

the value of Z(D′_bf, u, v) is 1 for all u < v, and thus dealing back-and-forth around the table is 13 times as effective as cyclic dealing, if one shuffles enough. Or, switching from cyclic dealing to back-and-forth dealing is worth another 3.7 GSR shuffles.
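The values of Z quoted above are easy to check directly. In the sketch below (the function name is ours), a u-v pair is taken to be any occurrence of a u strictly above a v in the dealing string, which reproduces the counts 169, 13, and 1:

```python
from itertools import combinations

def Z(deal, u, v):
    # number of u-v pairs (u above v) minus number of v-u pairs
    z = 0
    for i, top in enumerate(deal):
        for below in deal[i + 1:]:
            if (top, below) == (u, v):
                z += 1
            elif (top, below) == (v, u):
                z -= 1
    return z

ord_deal = "N" * 13 + "E" * 13 + "S" * 13 + "W" * 13
cyc_deal = "NESW" * 13
bf_deal = "NESWWSEN" * 6 + "NESW"

for name, deal in [("ordered", ord_deal), ("cyclic", cyc_deal), ("back-forth", bf_deal)]:
    print(name, [Z(deal, u, v) for u, v in combinations("NESW", 2)])
```

The three dealing strings give Z = 169, 13, and 1 respectively for every u < v, matching the successive factors of 13 in κ₁.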

Table 1.1 contains a summary of the first-order approximations for bridge, along with results from Monte Carlo simulations (see below).

1.3.5 Other Results from First-Order Approximations

In Section 5.6.4 we discuss a general algorithm for computing κ₁(D), and in Section 5.7.2 we do the same for κ₁(D′). Some results are

1. When playing straight poker with 4 players and dealing cyclically, cutting the

deck after card 16 will produce the most random set of hands.

2. A deck containing only two types of cards will have κ₁ = 0, and therefore will shuffle very fast, if the first and last cards are the same (but not if they are different).

3. The hardest-to-shuffle go-fish deck appears to be (A23456789TJQK)⁴, and its

value of κ1 is about half that of a deck of 52 distinct cards. Thus, a shuffler

needs one more GSR shuffle to mix a deck of 52 distinct cards than to mix a

go-fish deck.

1.3.6 Monte Carlo Simulations

It is natural to ask how good the first order estimate is for variation distance. We present a crude bound on the error in Section 5.8, and we also attempt to justify the estimates using Monte Carlo simulations in Chapter VI. Included are two very large supercomputer simulations for cyclic and back-and-forth bridge; they confirm that back-and-forth dealing is an improvement if one shuffles at least 5 times. In general we find that the first-order estimate usually becomes good at about the time of the cutoff.

1.3.7 Calculation of Descent Polynomials for Certain Decks

Useful in those Monte Carlo simulations are some results from Chapter III, where we compute descent polynomials for certain special pairs of decks. The intent of the chapter is to find the broadest possible classifications which will admit fast computation, knowing as we do that the general problem is #P-complete. The most inclusive methods presented calculate transitions when one of the two decks is sorted, and, a bit more slowly, when the target deck contains large blocks of cards of the same value. Thus we can always compute the transition probability to or from at least one ordering of any deck.

1.3.8 The Joint Distribution of des(π) and π(1)

The calculations in Chapter III lead to a general question about the joint distribution of des(π) and π(1) as π ranges over all permutations of n letters. That distribution

is explored in Chapter IV, where we find a formula for the number of permutations

which begin with k and have d descents. That is used to show that when π is chosen

uniformly from among those permutations with d descents, the expected value of

π(1) is d + 1. Chapter IV will appear in publication as [9].

CHAPTER II

Preliminaries

2.1 Permutations and Decks

A permutation of n letters is a bijection π : {1, 2, …, n} → {1, 2, …, n}. One way to display a permutation is with two-line notation, for instance

$$\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 1 & 5 & 3 & 6 & 2 \end{pmatrix}$$

which means π(1) = 4, π(2) = 1, and so on. S_n will denote the symmetric group of all permutations of n letters. If π, σ ∈ S_n then πσ is the composition π ∘ σ. That is, πσ(k) = π(σ(k)) for all k.

A deck D of size n is a sequence of n cards. Each card in a deck has a position

and a value. The positions are distinct integers between 1 and n. The values can

be anything, and two different cards may have the same value. We may think of a

deck as simply a function D from {1, 2, …, n} into some set of values, where the

“D = ACABAB” we mean that D is a deck of 6 cards, and that the card in position 1

has value A, the card in position 2 has value C, and so on. The card in position 1 is

the top card and the card in position n is the bottom card.


A permutation π can be applied to a deck D of the same size to produce a new deck πD: π takes the card at position k and puts it in position π(k), for all k. So if D′ = πD, then D′(π(k)) = D(k) for all k. If we think of a deck as a function, that means D′ ∘ π = D, or

$$(2.1)\quad \pi D = D \circ \pi^{-1}.$$

We will sometimes associate a permutation with the deck πe, where e is the ordered deck 1, 2, 3, …, n. (That is, e(k) = k for all k.) From Equation (2.1),

$$(2.2)\quad \pi e = \pi^{-1}(1), \pi^{-1}(2), \ldots, \pi^{-1}(n).$$

We will also sometimes associate π with the sequence π(1), π(2),...,π(n). Which

sequence is intended will be clear from context.

2.2 Repeated Values

Let D be a deck of n cards, and let D′ be a reordering of D. If all the cards in D have distinct values, then there is exactly one permutation in S_n which, when applied to D, produces D′. For instance, if

D = C, A, B, D   and   D′ = D, C, B, A

then a permutation π which takes D to D′ must send the C in position 1 of D to the C in position 2 of D′, so π(1) must be 2. Likewise π(2) must be 4 so that the A card moves correctly, and π(3) = 3, π(4) = 1 to accommodate the B and D cards. So π in two-line notation must be

$$\pi = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 3 & 1 \end{pmatrix}.$$

We can represent the decks and the permutation together with the following arrow diagram.


Figure 2.1: The four permutations which take BABA to AABB.

C A B D
D C B A

Now suppose D and D′ contain multiple cards with the same values. For instance, we might have

D = B, A, B, A   and   D′ = A, A, B, B.

Now if π takes D to D′, what do we know about π(1)? Not as much as before. π(1) is the position that the B at the top of D will go to after the deck is permuted, so it must be the position of a B in D′. That is, π(1) ∈ {3, 4}. Since the third card in D is also a B, we must have π(3) ∈ {3, 4} too, and similarly the A's must go to A's, which means π(2), π(4) ∈ {1, 2}. But that's all we can say. Any permutation which satisfies those criteria will take D to D′. There are 4 such, shown in Figure 2.1.

In general, let T(D, D′) be the set of permutations which take D to D′. In this situation we refer to D as the source deck and D′ as the target deck. Suppose the values of cards in the decks are v₁, v₂, …, v_k and each deck contains n_i cards with value v_i, for i = 1, 2, …, k. Then there are n_i! ways to draw arrows between the v_i's in D and the v_i's in D′. So the total number of permutations which take D to D′, i.e. the size of T(D, D′), is

$$n_1!\, n_2! \cdots n_k!$$

For any deck D, T(D, D) is the stabilizer of D. T(D, D′) is a coset of both T(D, D) and T(D′, D′): if π is any element of T(D, D′), then

$$(2.3)\quad T(D, D') = \pi\, T(D, D) = T(D', D')\, \pi.$$

Unfortunately, stabilizers are not in general normal subgroups of the symmetric group, and few results from group theory seem to be helpful in answering our basic questions.
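The coset structure, and the count n₁!n₂!⋯n_k!, can be verified by enumeration for small decks. A sketch (names ours), using the composition convention πσ(k) = π(σ(k)) from Section 2.1:

```python
from itertools import permutations
from math import factorial

def T(D, Dp):
    # the set of permutations (as tuples, position k -> pi[k]) taking deck D to deck Dp
    n, out = len(D), set()
    for pi in permutations(range(n)):
        new = [None] * n
        for k in range(n):
            new[pi[k]] = D[k]
        if ''.join(new) == Dp:
            out.add(pi)
    return out

def compose(p, q):
    # (pq)(k) = p(q(k))
    return tuple(p[q[k]] for k in range(len(p)))

D, Dp = "BABA", "AABB"
coset = T(D, Dp)
assert len(coset) == factorial(2) * factorial(2)      # n_A! n_B! = 4
pi = next(iter(coset))
assert coset == {compose(pi, s) for s in T(D, D)}      # pi T(D, D)
assert coset == {compose(s, pi) for s in T(Dp, Dp)}    # T(Dp, Dp) pi
```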

2.3 Mixing Problems

Imagine being the groundskeeper of a stadium. A truck delivers a large pile of dirt to one corner of the field, and your job is to spread the dirt over the field so that its depth is the same everywhere. You have at your disposal a mechanical rake, capable of raking the whole field at once and moving dirt around in some fashion. You would like to decide how many times you need to rake the field, so that it will be close to uniformly covered when you are done. To make mathematical sense of this problem, one needs three things: a model for how the rake moves dirt, a measure for how close the dirt is to “flat” at any given time, and an idea of “how flat is flat enough”—that is, a critical value of flatness which we declare to be our target.

The groundskeeper’s problem is a metaphor for mixing-time problems. The field is some event space, and the dirt is probability, so the position of the dirt on the field represents a probability distribution. The rake is some randomization procedure, which spreads probability around. If it is a good mixing method, it will eventually bring the distribution close to the uniform distribution (i.e., all events will be approximately equally likely).

In our case the event space will be the set of orderings of a deck of cards, and the

randomization procedure will be riffle shuffling, defined below. We will establish our

measure of “flatness” in Sections 2.9 and 2.10. We leave it up to the reader to choose

the critical value.

2.4 Riffle Shuffling

Riffle shuffling may be described by the following procedure, sometimes called the

Gilbert-Shannon-Reeds or GSR model. Take a deck of cards and partition it into two packets by placing a “cut” somewhere in the deck. The position of the cut should be chosen according to the binomial distribution; that is, the probability that the

cut is placed after the kth card of an n-card deck is $\frac{1}{2^n}\binom{n}{k}$. (Note this allows for the cut to be placed before the top card or after the bottom card, but since the binomial distribution is bell-shaped, the cut is very likely to be near the middle of the deck.)

After making a cut, the shuffler takes those cards above the cut, still in their original order, into his left hand, and the rest into his right hand. He then repeats the following procedure, until all cards have been dropped: if the left hand contains A cards and the right hand contains B cards, he drops a card from the bottom of the left packet with probability A/(A+B) and from the bottom of the right packet with probability B/(A+B). (The idea being, a shuffler is more likely to drop from a large packet than from a small one.) Each card dropped falls on top of those already dropped,

and the result is a new deck, which is a reordering of the original. Table 2.1 shows

a typical shuffle of the deck ABCDEFGH.

Any shuffle of n cards can be represented uniquely by a sequence of n 0’s and 1’s.

The number of 0’s is the card after which to make the cut (equal to the number of 20

cards in the shuffler's left hand when shuffling begins), and each digit tells whether to drop a card from the left or from the right (0 for left, 1 for right). For instance, the shuffle in Table 2.1 can be represented by 10011101, as shown in Figure 2.2.

Left packet   Right packet   On table    Action                 Probability
ABC           DEFGH                      Cut after third card   (1/2⁸)·C(8,3)
ABC           DEFG           H          Drop from right        5/8
AB            DEFG           CH         Drop from left         3/7
A             DEFG           BCH        Drop from left         2/6
A             DEF            GBCH       Drop from right        4/5
A             DE             FGBCH      Drop from right        3/4
A             D              EFGBCH     Drop from right        2/3
              D              AEFGBCH    Drop from left         1/2
                             DAEFGBCH   Drop from right        1/1

Table 2.1: The steps in a typical shuffle of the deck ABCDEFGH. It results in the reordering DAEFGBCH.

Left  { A        D 1
      { B        A 0
      { C        E 1
Right { D        F 1
      { E        G 1
      { F        B 0
      { G        C 0
      { H        H 1

(Read the right-hand digits bottom-to-top to get a recipe for executing the shuffle.)

Figure 2.2: The correspondence between the shuffle of Table 2.1 and a sequence of 0's and 1's. The number of 0's is the position of the cut, and each digit indicates whether a card from the left (0) or right (1) hand should be dropped at each stage.
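The sequence description translates directly into code. In this sketch (the function name is ours), the cut is the number of 0's, each digit names the hand that drops next from the bottom of its packet, and the final deck is the drop order reversed, since each dropped card lands on top of the pile:

```python
def riffle(deck, seq):
    # execute the 2-shuffle named by a 0/1 string: 0 = drop from left, 1 = from right
    cut = seq.count('0')
    left, right = list(deck[:cut]), list(deck[cut:])
    pile = []
    for digit in seq:
        pile.append(left.pop() if digit == '0' else right.pop())  # drop from packet bottom
    return ''.join(reversed(pile))

print(riffle("ABCDEFGH", "10011101"))
```

This reproduces the shuffle of Table 2.1: the sequence 10011101 carries ABCDEFGH to DAEFGBCH.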

Clearly, then, there are 2ⁿ ways to riffle shuffle a deck of n cards. The probability of getting the shuffle in our example can be obtained by multiplying together the probabilities shown in Table 2.1 for each step, since the choices involved are independent:

$$P(10011101) = \frac{1}{2^8}\binom{8}{3} \cdot \frac{5}{8}\cdot\frac{3}{7}\cdot\frac{2}{6}\cdot\frac{4}{5}\cdot\frac{3}{4}\cdot\frac{2}{3}\cdot\frac{1}{2}\cdot\frac{1}{1} = \frac{1}{2^8}.$$

In fact, this was not an accident. The denominators of the "drop probabilities" are the total number of cards remaining to be dropped, so they begin with 8 and decrease by 1 at each step. The numerators of the "left-drop" probabilities are the number of cards remaining in the left hand, so they begin with 3 and decrease by 1 each time a card is dropped from the left. The other numerators correspond to "right-drops", so they begin with 5 and go down. So the product of all the drop probabilities is

$$\frac{(3\cdot 2\cdot 1)(5\cdot 4\cdot 3\cdot 2\cdot 1)}{8\cdot 7\cdot 6\cdot 5\cdot 4\cdot 3\cdot 2\cdot 1} = \frac{3!\, 5!}{8!}$$

which means the probability of the shuffle is

$$\frac{1}{2^8}\binom{8}{3}\frac{3!\, 5!}{8!} = \frac{1}{2^8}.$$

It is straightforward to see that under the GSR model, the result will be 2⁻ⁿ for any shuffle of n cards. This is the chief mathematical virtue of the GSR model, and the reason it is called the "maximum entropy" model.

2.5 a-shuffles

Riffle shuffles, as described so far, can be generalized to a-shuffles, by either of the following equivalent methods:

1. Partition a deck of n cards into a packets according to a multinomial distribution. That is, the probability that packet i has n_i cards for i = 0, 1, …, a−1 is

$$\frac{1}{a^n} \cdot \frac{n!}{n_0!\, n_1! \cdots n_{a-1}!}$$

Then proceed as in Table 2.1, dropping a card from the ith packet with probability equal to the size of the packet over the number of cards not yet dropped.

2. Write down a sequence of n numbers, each of which is between 0 and a−1. Let n_i be the number of i's, for i = 0, 1, …, a−1. Remove the first n₀ cards of the deck and call them the "0-packet", remove the next n₁ cards and call them the "1-packet", and so on. Now proceed to use the sequence as instructions for combining the packets, as in Figure 2.2.

We will use the notation H(n, a) to represent the set of a-shuffles of n cards. Each shuffle determines some permutation of n letters, but there may be some permutations that cannot be obtained through a-shuffling, and others that can be obtained in multiple ways. For example, complete reversal of a deck is impossible if a < n, yet the two 2-shuffles

11110000   and   11100000

both produce the identity permutation by cutting the deck, dropping all the cards formerly on the bottom, then dropping all the cards formerly on the top. Note they are different shuffles, since they have different cuts. The GSR model assigns equal probability to all aⁿ elements of H(n, a), so it will give different probabilities to different permutations.

2.6 Inverse Shuffles and Repeated Shuffles

The inverse of an a-shuffle is easier to describe than the a-shuffle itself. To perform an inverse a-shuffle on a deck of cards, simply write a number from 0 to a−1 on the back of each card, then perform a stable sort. That is, move the 0's to the

top of the pack, preserving their relative order, then move the 1’s below them, again

preserving relative order, and so on. Figure 2.3 shows a 3-shuffle and its inverse.

Stable-sorting of computer punch cards on a single decimal digit could be—and

was—done mechanically. Sorting machines contained 10 bins, numbered 0 through

9, and would iterate through a deck of cards and place those with a 0 in a particular

column in the 0-bin, those numbered 1 in the 1-bin, etc. Thus they could perform

any “10-sort”, that is, the inverse of any 10-shuffle.

0 A        D 1
0 B        A 0
0 C        G 2
1 D        E 1
1 E        F 1
1 F        B 0
2 G        H 2
2 H        C 0

Figure 2.3: From left to right, a 3-shuffle of ABCDEFGH; from right to left, an inverse 3-shuffle of DAGEFBHC. Inverse shuffles are stable sorts.
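An inverse a-shuffle is just a stable sort on the digits, which is nearly a one-liner. A sketch (the function name is ours), checked against the 3-shuffle of Figure 2.3:

```python
def inverse_shuffle(deck, digits):
    # stable sort: all cards labeled 0 first (in their current order), then the 1's, etc.
    return ''.join(card for d in sorted(set(digits))
                   for card, label in zip(deck, digits) if label == d)

# undo the 3-shuffle of Figure 2.3: DAGEFBHC carries the labels 1,0,2,1,1,0,2,0
print(inverse_shuffle("DAGEFBHC", [1, 0, 2, 1, 1, 0, 2, 0]))
```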

The reason a 10-sorter was a useful machine was that one could sort a stack of cards according to a multi-digit field by successive sorts on digits, least significant first.

The process was known as “radix sort”[27]. For example, consider a set of cards containing two fields, name and age.

Bob 47
Carol 39
Cathy 50
Jenny 27
Joe 43
Steve 41
Twila 37
Zoe 31

Suppose the cards are currently ordered by name, and we would like to sort them by age. What we should do is stable-sort the cards first by the right (ones) digit, then by the left (tens) digit. Figure 2.4 shows the process.

Since each sort preserves the previous order when it has no cause to make a change, the net result is to put the stack in lexicographic order according to the whole field.

Clearly the radix of the digits of the field is not important, and in fact each digit might have a different radix.
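Since a stable sort is all that is needed at each stage, the radix sort of the example can be replayed directly with Python's stable sorted (data from the example above; variable names ours):

```python
cards = ["Bob 47", "Carol 39", "Cathy 50", "Jenny 27",
         "Joe 43", "Steve 41", "Twila 37", "Zoe 31"]

def age(card):
    return int(card.split()[1])

# least significant digit first: sort by ones digit, then (stably) by tens digit
by_ones = sorted(cards, key=lambda c: age(c) % 10)
by_age = sorted(by_ones, key=lambda c: age(c) // 10)

print(by_ones)
print(by_age)
```

The intermediate stack matches the middle stage of Figure 2.4, and the final stack is ordered by age.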

Now suppose we have an ab-shuffle z, which corresponds to a sequence (z₁, z₂, …, z_n) of numbers between 0 and ab−1. We can write each z_i as a two-digit mixed-radix

By name     After ones sort   After tens sort
Bob 47      Cathy 50          Jenny 27
Carol 39    Steve 41          Zoe 31
Cathy 50    Zoe 31            Twila 37
Jenny 27    Joe 43            Carol 39
Joe 43      Bob 47            Steve 41
Steve 41    Jenny 27          Joe 43
Twila 37    Twila 37          Bob 47
Zoe 31      Carol 39          Cathy 50

Figure 2.4: A radix sort of a set of 8 punch cards. The cards are initially ordered by the alphanumeric "name" field, and can be sorted by the two digit "age" field by the two-stage process shown. The first stage sorts by the ones digit of age, and the second by the tens digit. Since each sort is stable, the effect is to sort by the whole field.

number (x_i, y_i), where 0 ≤ x_i < a and 0 ≤ y_i < b, by setting

$$x_i = \lfloor z_i / b \rfloor$$
$$y_i = z_i - b x_i.$$

Then from the radix sort we know that z⁻¹ (an ab-sort) is the composition of a b-sort and an a-sort, which means that z is the composition of an a-shuffle and a b-shuffle. So there is a surjection

$$(2.4)\quad \varphi : H(n, a) \times H(n, b) \to H(n, ab)$$

such that an a-shuffle x followed by a b-shuffle y is equivalent to the ab-shuffle φ(x, y).

Since the domain and range both have size (ab)ⁿ, φ is a bijection. Thus choosing x and y independently and uniformly has the same effect as choosing φ(x, y) uniformly, which is to say, executing a random a-shuffle followed by a random b-shuffle has the same effect as executing a random ab-shuffle. In particular, that means m 2-shuffles are equivalent to one 2^m-shuffle. So we need only understand a-shuffles to understand the effect of repeated shuffles.
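The bijection can be confirmed by exhaustive enumeration for a small deck. The sketch below (executor convention as in Figure 2.2; names ours) compares the distribution of two successive 2-shuffles of a 4-card deck with that of a single 4-shuffle:

```python
from collections import Counter
from itertools import product

def a_shuffle(deck, seq, a):
    # packet i = the i-th block of (count of digit i) cards; each digit in seq,
    # in order, drops a card from the bottom of the named packet onto the pile
    counts = [seq.count(i) for i in range(a)]
    packets, start = [], 0
    for c in counts:
        packets.append(list(deck[start:start + c]))
        start += c
    pile = [packets[z].pop() for z in seq]
    return ''.join(reversed(pile))

deck = "ABCD"
twice = Counter(a_shuffle(a_shuffle(deck, x, 2), y, 2)
                for x in product(range(2), repeat=4)
                for y in product(range(2), repeat=4))
once = Counter(a_shuffle(deck, z, 4) for z in product(range(4), repeat=4))
print(twice == once)   # the 16*16 pairs and the 4^4 sequences agree deck by deck
```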

The idea of thinking of inverse shuffles as sorts appears in [1] and [2], in which Aldous

and Diaconis credit the idea to Jim Reeds. Those works also include the notion of

stringing together 2-shuffles to get a 2ⁿ-shuffle. The idea of generalizing 2-shuffles to a-shuffles, and the fact that an a-shuffle followed by a b-shuffle is an ab-shuffle,

appears in [3].

2.7 Counting Shuffles Which Produce a Permutation

Let π be an element of Sn. We would like to enumerate the a-shuffles which resolve

to π, so that we can calculate the probability of obtaining π by a-shuffling the deck.

Once a deck D of distinct cards has been partitioned, there is at most one way to

riffle the cards together to get πD, because the order of the cards in πD dictates

which card must be dropped at each step. So we need only count the number of

ways to partition D in such a way that it is possible to obtain πD by riffling.

A partition of a deck of n cards can be described either as a set of non-negative packet

sizes s₀, s₁, …, s_{a−1} which sum to n, or as a list of cut positions c₁, c₂, …, c_{a−1}, where

$$c_k = \sum_{i < k} s_i.$$

(Each c_k marks a cut between a pair of cards, and we can also place cuts before the top card or after the bottom card.)

A rising sequence ([22],[3]) of a permutation π is a maximal sequence of consecutive

numbers i +1, i +2,...,i + k that appears as a subsequence of πe. So for instance

if

$$\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 4 & 5 & 3 & 7 & 8 & 1 & 6 \end{pmatrix}$$

then πe = 71423856. Here 1,2,3 is a rising sequence of π, because the numbers 1, 2, and 3 appear in order in πe (1,2,3 is a subsequence) but 1, 2, 3, and 4 do not (so 1,2,3 is maximal). Likewise 4,5,6 and 7,8 are rising sequences. So π, in this example, has 3 rising sequences.

In [3] Bayer and Diaconis refer to the number of rising sequences of a permutation as its winding number. Imagine a cul-de-sac in a new housing development, with n houses arranged in a circle. Suppose it is the developer’s job to deliver doormats to each of the houses, and that each house’s mat is personalized; perhaps the name of the family in the house is embroidered on the mat. The mats are stacked in some order in the back of the developer’s truck, and he plans to deliver them by driving around the circle until he gets to the house whose mat is on top of the stack, throwing the mat on its fated doorstep, then repeating the process until all the mats are gone.

In this story, the winding number of the initial order of the mats is the number of trips the developer makes around the circle.

If k appears in a certain rising sequence of π, then k + 1 will be in the same rising sequence if and only if k + 1 appears after k in πe. The positions of k and k +1 are π(k) and π(k + 1), respectively, so k + 1 begins a new rising sequence if and only if π(k) > π(k + 1). In that case we say π has a descent at k. Since descents

correspond to breaks between rising sequences,

$$(2.5)\quad \#\{\text{rising sequences of } \pi\} = \#\{\text{descents of } \pi\} + 1.$$

We will denote the number of descents of π by des(π). If π(k) < π(k + 1) we say

that π has an ascent at k, and we will denote the number of ascents of π by asc(π).
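Equation (2.5) is easy to verify numerically for the example above (variable names ours):

```python
pi = [2, 4, 5, 3, 7, 8, 1, 6]              # one-line images pi(1), ..., pi(8)
n = len(pi)

pe = [0] * n                                # pi*e puts value k at position pi(k)
for k, v in enumerate(pi, start=1):
    pe[v - 1] = k
assert pe == [7, 1, 4, 2, 3, 8, 5, 6]       # i.e. 71423856, as in the text

pos = {v: i for i, v in enumerate(pe)}      # position of each value in pi*e
rising = 1 + sum(pos[v + 1] < pos[v] for v in range(1, n))   # breaks + 1
des = sum(pi[k] > pi[k + 1] for k in range(n - 1))
print(rising, des)                           # 3 rising sequences, 2 descents
```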

Lemma 2.1. If P is a partition of a deck of size n, and π ∈ S_n, then P can be riffled to get π if and only if P has a cut wherever π has a descent.

Proof. P can be riffled to get π if and only if every packet of P appears in order in

πe. That means each packet of P is contained in a rising sequence of π, which is to

say, no packet crosses a boundary between rising sequences. Since there are descents

between the rising sequences and cuts between the packets, there must be a cut at

every descent. 

And now we can count the number of shuffles that produce a particular permutation.

Corollary 2.2. If π ∈ S_n has d descents, then there are $\binom{n+a-1-d}{n}$ a-shuffles which resolve to π.

Proof. We need to count the number of partitions that contain all d descents of π among their a−1 cuts. That is the same as counting the number of ways to arrange a−1 identical balls in n+1 boxes, subject to the restriction that a certain d boxes must contain at least one ball. One way to construct each such arrangement is to place a−1−d balls in the boxes, then place the remaining d balls in the required locations. Placing k balls in m boxes is equivalent to writing a sequence of k "stars" and m−1 "bars", and there are $\binom{m-1+k}{m-1}$ ways to do that. Setting m = n+1 and k = a−1−d completes the proof. □

Finally, then, since each shuffle occurs with probability a⁻ⁿ,

$$(2.6)\quad P_a(\pi) := \frac{1}{a^n}\binom{n + a - 1 - \operatorname{des}(\pi)}{n}$$

is the probability of obtaining the permutation π as the result of an a-shuffle. Equation (2.6) appeared first in [3].
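Corollary 2.2 and Equation (2.6) can be checked exhaustively for small n and a. The sketch below (executor convention as in Figure 2.2; names ours) a-shuffles the ordered deck e in every possible way and compares the resulting counts with the binomial formula:

```python
from collections import Counter
from itertools import product
from math import comb

def a_shuffle(deck, seq, a):
    # packet i = i-th block of (count of digit i) cards; each digit in seq drops
    # a card from the bottom of the named packet onto the pile
    counts = [seq.count(i) for i in range(a)]
    packets, start = [], 0
    for c in counts:
        packets.append(list(deck[start:start + c]))
        start += c
    return tuple(reversed([packets[z].pop() for z in seq]))

n, a = 4, 3
e = tuple(range(1, n + 1))
freq = Counter(a_shuffle(e, seq, a) for seq in product(range(a), repeat=n))

for result, count in freq.items():
    pi = [result.index(k) for k in e]       # pi(k) = final position of card k (0-indexed)
    d = sum(pi[k] > pi[k + 1] for k in range(n - 1))
    assert count == comb(n + a - 1 - d, n)  # Corollary 2.2
```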

2.8 Descent Polynomials and Shuffle Series

For π ∈ S_n, define the ordinary generating function

$$(2.7)\quad \mathcal{S}_\pi(x) := \sum_{a \ge 1} \#\{a\text{-shuffles which produce } \pi\}\, x^{a-1} = \sum_{a \ge 1} \binom{n + a - 1 - \operatorname{des}(\pi)}{n} x^{a-1}.$$

Substituting k = a − 1 − des(π),

$$(2.8)\quad \mathcal{S}_\pi(x) = x^{\operatorname{des}(\pi)} \sum_{k \ge 0} \binom{n+k}{n} x^k.$$

The coefficient in the sum is the number of ways to put k identical balls into n+1 distinct boxes, thus

$$(2.9)\quad \mathcal{S}_\pi(x) = x^{\operatorname{des}(\pi)} \big(1 + x + x^2 + \cdots\big)^{n+1} = \frac{x^{\operatorname{des}(\pi)}}{(1-x)^{n+1}}.$$

Now let R ⊂ S_n be some set of permutations, and define

$$(2.10)\quad \mathcal{S}_R(x) := \sum_{a \ge 1} \#\{a\text{-shuffles which produce a permutation in } R\}\, x^{a-1}$$

$$(2.11)\quad \mathcal{D}_R(x) := \sum_{\pi \in R} x^{\operatorname{des}(\pi)} = \sum_d \#\{\pi \in R \text{ with } d \text{ descents}\}\, x^d$$

to be the shuffle series and descent polynomial of R, respectively. Then

$$(2.12)\quad \mathcal{S}_R(x) = \sum_{\pi \in R} \mathcal{S}_\pi(x) = \sum_{\pi \in R} \frac{x^{\operatorname{des}(\pi)}}{(1-x)^{n+1}} = \frac{\mathcal{D}_R(x)}{(1-x)^{n+1}}.$$

If

$$\mathcal{D}_R(x) = b_0 + b_1 x + \cdots + b_{n-1} x^{n-1}$$
$$\mathcal{S}_R(x) = c_1 + c_2 x + c_3 x^2 + \cdots$$

then the probability of obtaining a permutation in R after an a-shuffle is

c 1 n + a 1 d (2.13) P (R)= a = b − − a an an d n d a 1   ≤X− 30

which is a finite sum, calculable in polynomial time if we know the $b_d$'s. Note we can go the other way as well: since $\mathcal{D}_R(x) = \mathcal{S}_R(x)(1-x)^{n+1}$,
$$(2.14)\qquad b_d = \sum_{a \ge 1} c_a \cdot \left(\text{coefficient of } x^{d-(a-1)} \text{ in } (1-x)^{n+1}\right) = \sum_{a \ge 1} (-1)^{d-a+1} \binom{n+1}{d-a+1} c_a$$
which is also a finite sum, calculable in polynomial time if we know the $c_a$'s. So calculating the probabilities of a set of permutations after an $a$-shuffle, for all $a$, is computationally equivalent to calculating the descent polynomial of the set.
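The two finite sums can be checked against each other in a few lines. In this sketch (Python; the function names are ours) we take $R = S_3$, whose descent polynomial has coefficients $b = (1, 4, 1)$, compute the $c_a$ by Equation (2.13), and then recover the $b_d$ from the $c_a$ by Equation (2.14).

```python
from math import comb

def c_from_b(b, n, a):
    """Equation (2.13): c_a = sum_{d <= a-1} b_d * C(n+a-1-d, n)."""
    return sum(b[d] * comb(n + a - 1 - d, n) for d in range(min(a, len(b))))

def b_from_c(c, n, d):
    """Equation (2.14): b_d = sum_a (-1)^(d-a+1) * C(n+1, d-a+1) * c_a.
    Terms vanish once d-a+1 < 0, so the sum is finite."""
    return sum((-1) ** (d - a + 1) * comb(n + 1, d - a + 1) * c[a]
               for a in range(1, d + 2))

n = 3
b = [1, 4, 1]                         # descent polynomial of R = S_3
c = {a: c_from_b(b, n, a) for a in range(1, 6)}
recovered = [b_from_c(c, n, d) for d in range(3)]
```

Since $R = S_3$ here, $c_a$ should be all $a^3$ shuffles, and the round trip should return the Eulerian coefficients exactly.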

In particular, we will focus on finding
$$(2.15)\qquad \mathcal{D}(D, D'; x) := \mathcal{D}_{T(D,D')}(x)$$
$$(2.16)\qquad \mathcal{S}(D, D'; x) := \mathcal{S}_{T(D,D')}(x)$$
as a means to understand the transition from deck $D$ to deck $D'$ after any size shuffle.

2.9 Distance from Uniform

Once we understand how shuffling acts on a deck, we still need to decide how much shuffling is “enough”. To do so, we need to define what “enough” means. Ultimately, the definition should be rooted in what we plan to do with the deck once it is shuffled, i.e., what game we will be playing.

Suppose we are playing a simple game, with some number of players, that involves no choices or strategy—so the entire course of the game is determined by the initial order of the cards. War, straight poker (if we ignore betting), and many forms of solitaire are examples of such games. From the perspective of a particular player, some decks are "winners" and the rest are "losers".

Let W be the set of winners for some player A. If U represents the uniform dis- tribution, then U(W ) is the probability that player A wins, given that the deck is perfectly mixed. If, on the other hand, P represents the distribution of decks af- ter some shuffle, then one measure of how good the shuffle is is the advantage or disadvantage it gives to player A; that is, the quantity

$$|P(W) - U(W)|.$$
For example, consider a simple game called "Ace-King", between two players whom we will call B (for "black") and R (for "red"). A regular deck is dealt face-up, one card at a time. B wins if the ace of spades comes up before the king of hearts, and

loses otherwise. Clearly this is a fair game if the deck is perfectly mixed—that is,

if $W$ is the set of decks for which B wins, then $U(W) = \frac{1}{2}$. But suppose instead that the ace of spades is initially on top of the deck, and the king of hearts on the

bottom, then the deck is given an a-shuffle, after which we play the game. One might

guess, correctly, that B has an advantage if the deck is not well shuffled, since the ace

will tend to stay near the top of the deck and the king near the bottom. Using the

methods of Chapter III and Chapter IV, the probability that B wins is calculated in

Appendix D, and the result is

$$P_a(W) = (n+1) \sum_{k=1}^{a} \left(\frac{k}{a}\right)^n - n \sum_{k=1}^{a} \left(\frac{k}{a}\right)^{n-1}$$
where $n = 52$ is the size of the deck. B's advantage is $P_a(W) - U(W) = P_a(W) - \frac{1}{2}$; it is graphed in Figure 2.5 alongside the result of the next example.
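The formula can be evaluated exactly, and for small decks it can be checked against a brute-force enumeration of all $a^n$ shuffles. In the sketch below (Python; the function names are ours, and the packet-digit encoding of a forward $a$-shuffle is a standard equivalent of the GSR model), `ace_king_prob` evaluates the displayed sum with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def ace_king_prob(n, a):
    """P_a(W) = (n+1)*sum_k (k/a)^n - n*sum_k (k/a)^(n-1), k = 1..a."""
    s_hi = sum(Fraction(k, a) ** n for k in range(1, a + 1))
    s_lo = sum(Fraction(k, a) ** (n - 1) for k in range(1, a + 1))
    return (n + 1) * s_hi - n * s_lo

def a_shuffle(deck, digits):
    """Forward a-shuffle: digits[i] is the packet that final position i came from."""
    counts = [digits.count(j) for j in range(max(digits) + 1)]
    pos = [sum(counts[:j]) for j in range(len(counts))]
    out = []
    for d in digits:
        out.append(deck[pos[d]])
        pos[d] += 1
    return out

def brute_force(n, a):
    """Exact P(ace above king) over all a^n shuffles; ace on top, king on bottom."""
    wins = 0
    for digits in product(range(a), repeat=n):
        shuffled = a_shuffle(list(range(n)), digits)
        wins += shuffled.index(0) < shuffled.index(n - 1)
    return Fraction(wins, a ** n)
```

With no shuffling ($a = 1$) B always wins; as $a$ grows the advantage decays toward 0.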

Peter Doyle, as reported in [33] and [32], proposed the following extension of the

Ace-King game (somewhat simplified here). Suppose an ordinary deck is initially ordered

$$A\spadesuit, 2\spadesuit, \ldots, K\spadesuit,\ A\clubsuit, 2\clubsuit, \ldots, K\clubsuit,\ A\diamondsuit, 2\diamondsuit, \ldots, K\diamondsuit,\ A\heartsuit, 2\heartsuit, \ldots, K\heartsuit$$

and is given an a-shuffle. We go through the deck from top to bottom. When the

ace of spades is found, place it in a pile in front of player B. If, subsequently, the

2 of spades is found, place it on top of the ace. If the 3 of spades appears after the two, place it on top, and so on. So B’s stack always matches an initial segment of the unshuffled deck.

Simultaneously construct a stack of red cards ($\diamondsuit$'s and $\heartsuit$'s) in front of player R, in the same manner, except that the red stack must be in reverse order; the first card

down must be the king of hearts, then the queen of hearts, etc. If a card that belongs

in neither pile comes up, place it on the bottom of the deck, so that it may come up

again later.

B wins if the last black card (the king of clubs) is extracted before the last red card

(the ace of diamonds). Rising sequences among the black cards work in B’s favor,

and descents against him; the opposite is true for R and the red cards. So we expect

B to have a large advantage for small values of a. A computer simulation was run,

with one million trials for each value of a from 1 to 1024. The results are shown in

Figure 2.5.
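That simulation is easy to reproduce. Below is a sketch of one trial (Python; the encoding and function names are ours, and the digit method for generating a GSR $a$-shuffle is a standard equivalent of the model): the deck is $a$-shuffled, then scanned repeatedly with unplaceable cards cycled to the bottom, and B wins if the king of clubs is extracted before the ace of diamonds.

```python
import random
from collections import deque

def a_shuffle(deck, a, rng):
    """GSR a-shuffle via uniform packet digits."""
    digits = [rng.randrange(a) for _ in deck]
    counts = [digits.count(j) for j in range(a)]
    pos = [sum(counts[:j]) for j in range(a)]
    out = []
    for d in digits:
        out.append(deck[pos[d]])
        pos[d] += 1
    return out

def doyle_winner(shuffled):
    """Cards 0..25 are the black cards in pickup order (A spades .. K clubs);
    the red cards are picked up in reverse order 51, 50, ..., 26
    (K hearts down to A diamonds).  Returns 'B' or 'R'."""
    deck = deque(shuffled)
    need_black, need_red = 0, 51
    while True:
        card = deck.popleft()
        if card == need_black:
            need_black += 1
            if need_black == 26:
                return 'B'
        elif card == need_red:
            need_red -= 1
            if need_red == 25:
                return 'R'
        else:
            deck.append(card)          # cycle to the bottom, to come up again

def b_win_rate(a, trials, seed=1):
    rng = random.Random(seed)
    wins = sum(doyle_winner(a_shuffle(list(range(52)), a, rng)) == 'B'
               for _ in range(trials))
    return wins / trials
```

Running `b_win_rate` for $a = 2, 4, 8, \ldots$ with enough trials should reproduce the red curve of Figure 2.5 up to Monte Carlo noise.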

In order to decide how many times to shuffle before playing either of the games

discussed in this section, we should decide what an acceptable advantage is for player

B, then shuffle enough to bring the advantage down to that level. Whatever level

we pick, more shuffling is required for Doyle's game than for the Ace-King game.

Figure 2.5: The advantages for player B in the Ace-King Game (in black) and in Doyle's game (in red). We assume a 52-card deck that is initially ordered and then given an $a$-shuffle. "Advantage" means $P_a(\text{B wins}) - \frac{1}{2}$. $a$ is graphed on a logarithmic scale.

2.10 Variation Distance

The reason we got different answers for the Ace-King game and for Doyle's game is that the set $W$ of winning decks is different for the two games, and $P_a(W_{\text{Ace-King}})$ approaches $U(W_{\text{Ace-King}})$ faster than $P_a(W_{\text{Doyle}})$ approaches $U(W_{\text{Doyle}})$. It is natural

to ask, then, what set W approaches its uniform probability the slowest—in other

words, what game is the hardest to shuffle for.

Note: we defined $P_a(\pi)$ in Equation (2.6) to be the probability of obtaining the

permutation π as the result of an a-shuffle. Here we extend that notation: let

$P_a(D \to D')$ be the probability that an $a$-shuffle of the deck $D$ results in the deck

$D'$. That is,

$$P_a(D \to D') := \sum_{\pi \in T(D,D')} P_a(\pi).$$
When the source deck $D$ has been fixed and $W$ is some set of decks, $P_a(D \to W)$ refers to the probability that an $a$-shuffle of $D$ results in one of the decks in $W$:

$$P_a(D \to W) := \sum_{D' \in W} P_a(D \to D').$$

We abbreviated that as "$P_a(W)$" in the previous paragraph. However, at other times we will fix the target deck $D'$, and for that case we define

$$P_a(W \to D') := \sum_{D \in W} P_a(D \to D').$$

For now, suppose we fix a source deck $D$ and an $\epsilon > 0$ and make our goal "to shuffle enough so that the advantage or disadvantage of any one player in any simple game is less than $\epsilon$." So we want to make $a$ large enough that the quantity
$$(2.17)\qquad \max_W |P_a(D \to W) - U(W)|$$
is less than $\epsilon$, where $W$ is allowed to range over all sets of orderings of the deck.

The quantity in Equation (2.17) is known as the variation distance between $P_a$ and $U$, and will be denoted $\|P_a - U\|$. Variation distance is sometimes called "total variation", "total variation distance", or "half the $L^1$ norm". Note that variation distance depends implicitly on the composition of the deck we are shuffling and its initial order, facts which are hidden by the notation $\|P_a - U\|$.

Fix $a$, and let $f(D') := P_a(D \to D') - U(D')$. Call a deck "favored" if it is more likely after an $a$-shuffle than it would be under the uniform distribution; that is, $D'$ is favored if $f(D') > 0$. $D'$ is "disfavored" if $f(D') < 0$. Then if $W$ is some set of decks, $f(W) = \sum_{D' \in W} f(D')$ can be increased either by appending favored decks to $W$ or by deleting disfavored decks. Likewise $f(W)$ can be decreased by appending disfavored decks or deleting favored decks. "Neutral" decks, which have $f(D') = 0$, do not affect $f(W)$ one way or the other. It follows that $|f(W)|$ is maximized either when $W$ is the set $W^+$ of all favored decks or when $W$ is the set $W^-$ of all disfavored decks. Those two sets produce the same distance, because

$$1 = \sum_{D'} P_a(D \to D') = \sum_{D'} U(D')$$
and so
$$0 = \sum_{D'} f(D') = \sum_{D' \in W^+} f(D') + \sum_{D' \in W^-} f(D') = f(W^+) + f(W^-),$$
so $f(W^+) = -f(W^-)$. Therefore the variation distance after an $a$-shuffle is $f(W^+)$. But we do not need to

find $W^+$ to compute the variation distance, because

$$\sum_{D'} |f(D')| = |f(W^+)| + |f(W^-)| = 2 f(W^+)$$
and therefore when the source deck $D$ is fixed,

$$(2.18)\qquad \|P_a - U\| = \frac{1}{2} \sum_{D'} |P_a(D \to D') - U(D')|.$$
We will most often use Equation (2.18) as our starting point for calculating variation distance when the source deck is fixed.

We can now calculate the variation distance from uniform of a deck of $n$ distinct cards which are given an $a$-shuffle, a result which appeared first in [3]. If $D$ is any starting order, the transformation set $T(D,D')$ contains only one permutation for each $D'$. So $P_a(D \to D') = P_a(\pi)$ where $\pi$ is the unique permutation which takes $D$ to $D'$, and Equation (2.6) tells us that $P_a(\pi)$ depends only on the number of descents in $\pi$. Therefore

$$\|P_a - U\| = \frac{1}{2} \sum_{D'} |P_a(D \to D') - U(D')| = \frac{1}{2} \sum_{\pi \in S_n} \left|P_a(\pi) - \frac{1}{n!}\right| = \frac{1}{2} \sum_d \left\langle{n \atop d}\right\rangle \left| \frac{1}{a^n}\binom{n+a-1-d}{n} - \frac{1}{n!} \right|$$

Figure 2.6: The variation distance from uniform of a distinct 52-card deck after an $a$-shuffle. The distance falls below $\frac{1}{2}$ somewhere between $a = 64$ and $a = 128$, that is, between 6 and 7 2-shuffles.

where $\left\langle{n \atop d}\right\rangle$ is the number of permutations in $S_n$ which have $d$ descents. Chapter IV will give a well-known recurrence relation which makes $\left\langle{n \atop d}\right\rangle$ easy to calculate for small $n$, so we can compute $\|P_a - U\|$ efficiently. Figure 2.6 shows the variation distance for $n = 52$ and $a$ between 1 and 1024. All of the variation distances we have studied have the same basic "waterfall" shape seen in Figure 2.5 and Figure 2.6.
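The computation behind Figure 2.6 takes only a few lines. The sketch below (Python; names are ours) builds the Eulerian numbers from the standard recurrence $\langle{n \atop d}\rangle = (d+1)\langle{n-1 \atop d}\rangle + (n-d)\langle{n-1 \atop d-1}\rangle$, presumably the recurrence Chapter IV has in mind, and evaluates the displayed sum exactly with rational arithmetic.

```python
from fractions import Fraction
from math import comb, factorial

def eulerian_row(n):
    """Eulerian numbers <n, 0..n-1> via <n,d> = (d+1)<n-1,d> + (n-d)<n-1,d-1>."""
    row = [1]
    for m in range(2, n + 1):
        row = [(d + 1) * (row[d] if d < len(row) else 0)
               + (m - d) * (row[d - 1] if d >= 1 else 0)
               for d in range(m)]
    return row

def variation_distance(n, a):
    """||P_a - U|| for an a-shuffle of n distinct cards (the displayed formula)."""
    row = eulerian_row(n)
    u = Fraction(1, factorial(n))
    return sum(row[d] * abs(Fraction(comb(n + a - 1 - d, n), a ** n) - u)
               for d in range(n)) / 2
```

For $a = 1$ the distance is $1 - \frac{1}{n!}$ (only the identity is possible), and for $n = 52$ the distance crosses $\frac{1}{2}$ between $a = 64$ and $a = 128$, as the figure shows.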

2.11 Dealing Cards is Equivalent to Fixing the Target Deck

Suppose now that all the cards in the deck are distinct, but we are playing a game with "hands". That is, the deck is shuffled and then dealt to several players in some manner. Each player receives some set of cards from the deck, and we assume that the order in which a player receives her cards is irrelevant to how the game will play out. Bridge and straight poker are examples of such a game. If there are cards that do not get dealt (as in the case of straight poker), we may group them together and call them another hand. (If, however, undealt cards may be dealt later, as in the case of draw poker, each undealt card must be a hand by itself.) So without loss of generality assume all cards are dealt. We will refer to a particular partitioning of the cards into hands as a deal.

Our goal is still the same: to shuffle enough that the advantage or disadvantage to any one player is small. But now certain orderings of the deck at the time of dealing are equivalent, because they result in the same set of hands. For example, in bridge the players are referred to as North, East, South, and West, and the dealer usually deals cards to them in order, cyclically. That is, the dealing process is

$$(2.19)\qquad D_0 := \text{NESWNESWNESWNESWNESWNESWNESWNESWNESWNESWNESWNESWNESW}$$
where 'N' means "deal a card to North", 'E' means "deal a card to East", etc. In other words, $D_0$ is a function from $\{1, 2, \ldots, 52\}$ to $\{N, E, S, W\}$ which instructs the dealer where he should deal the $i$th card. If, after shuffling but before dealing, the dealer were to swap the first and fifth cards in the deck, the subsequent game would be unaffected, because those cards become part of the same hand, and the order in which they arrive is immaterial.

Assume that the cards are initially in some known order $e = e(1), e(2), \ldots, e(52)$, and the dealer gives them an $a$-shuffle before dealing. We may describe a deal as a sequence of 13 N's, 13 E's, 13 S's, and 13 W's in some order; for instance, the sequence

$$(2.20)\qquad D := \text{NSEENNWEWSSWESWNNNEESSSSSESWWNNSENWSEWSWWWEENEWNNNWE}$$
refers to the deal in which the North player gets cards $e(1), e(5), e(6), \ldots, e(50)$, the East player gets cards $e(3), e(4), e(8), \ldots, e(52)$, and so on. So $D$, like $D_0$, is a function from $\{1, 2, \ldots, 52\}$ to $\{N, E, S, W\}$, and $D(i)$ is the player who receives card $e(i)$.

Suppose that when the dealer shuffles the cards he produces a permutation π, and also that the resulting deal is the D defined in Equation (2.20). In order for the

North player to receive card e(1), π must send the first card to one of the positions where an N appears in the deal sequence D0; that is, D0(π(1)) must be N. Likewise,

South will receive card e(2) if and only if π(2) is one of the positions where an S

appears in D0, so it must be the case that D0(π(2)) = S. In general, then, π produces

$D$ if and only if $D_0(\pi(i)) = D(i)$ for all positions $i$, which means $D_0 \circ \pi = D$, or $D_0 = D \circ \pi^{-1}$.

Now suppose we think of D and D0 as decks with cards of type N, E, S, and W. Then

by Equation (2.1) $D \circ \pi^{-1}$ is the result of $\pi$ acting on $D$; so $\pi$ produces $D$ if and only

if $\pi D = D_0$, and therefore the probability of getting deal $D$ is $P_a(D \to D_0)$. So the variation distance is

$$(2.21)\qquad \max_W |P_a(W \to D_0) - U(W)|$$

where W ranges over all sets of orderings of D0. As in the last section, the W which

maximizes the quantity in Equation (2.21) is the set of all “favored” orderings, so

after some simplifications we can say that when the deal D0 is fixed then

$$(2.22)\qquad \|P_a - U\| = \frac{1}{2} \sum_{D} |P_a(D \to D_0) - U(D)|.$$
In other words, dealing is almost the same as declaring certain cards to be identical, except that it fixes the target deck in a shuffle instead of the source deck.
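The equivalence between dealing and fixing the target deck can be brute-forced on a toy example: a 4-card deck dealt alternately to two players X and Y, so that $D_0 = XYXY$. The sketch below (Python; the encoding and names are ours) computes the distribution of deals two ways: by enumerating all $a^n$ equally likely shuffles of the distinct deck directly, and by summing $P_a(\pi)$ from Equation (2.6) over the permutations $\pi$ with $D_0 \circ \pi = D$.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb
from collections import Counter

D0 = "XYXY"        # dealing sequence: position j of the shuffled deck goes to D0[j]
n, a = 4, 2

def a_shuffle(deck, digits):
    """Forward a-shuffle: digits[i] is the packet that final position i came from."""
    counts = [digits.count(j) for j in range(max(digits) + 1)]
    pos = [sum(counts[:j]) for j in range(len(counts))]
    out = []
    for d in digits:
        out.append(deck[pos[d]])
        pos[d] += 1
    return out

def descents(pi):
    return sum(pi[i] > pi[i + 1] for i in range(n - 1))

# Way 1: enumerate all a^n equally likely shuffles of the distinct deck 0..3.
direct = Counter()
for digits in product(range(a), repeat=n):
    shuffled = a_shuffle(list(range(n)), digits)
    deal = tuple(D0[shuffled.index(card)] for card in range(n))   # player of card i
    direct[deal] += Fraction(1, a ** n)

# Way 2: sum P_a(pi) over the permutations producing each deal, D(i) = D0(pi(i)).
by_formula = Counter()
for pi in permutations(range(n)):
    deal = tuple(D0[pi[i]] for i in range(n))
    by_formula[deal] += Fraction(comb(n + a - 1 - descents(pi), n), a ** n)
```

The two histograms agree exactly, which is the content of Section 2.11 in miniature.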

2.12 How Good is the GSR Model?

The GSR model assumes that all possible shuffles are equally likely. That symmetry is the root of the elegant mathematics surrounding the theory of card shuffling on

which this work is based. From a scientific standpoint, one should consider how well

the GSR model represents the way people actually shuffle cards.

Diaconis and Reeds performed a simple experiment to test the model: each man

shuffled about 100 52-card decks and recorded the results of each shuffle. In [13],

they report the distribution of the “packet size” statistic for each shuffler. In this

context a packet is a maximal sequence of cards dropped from the same hand. So

for instance the shuffle $100001101 = 1\,0^4\,1^2\,0\,1$ has packets of size 1, 4, 2, 1, and 1.

Diaconis is a very "neat" shuffler, and as a result his packets are of size 1 about 80% of the time. Reeds' packets are of size 1 about 62% of the time. Under the GSR model, $\frac{27}{53} \approx 51\%$ of the packets should have size 1, so it appears that both Diaconis and Reeds are "neater" than the model. Preliminary tests by this author suggest

that packet size distribution varies greatly among human shufflers, and we hope to

generate more data on human shuffling for a future work.
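The $\frac{27}{53}$ figure can be recovered by linearity of expectation: under the GSR model a 2-shuffle of a 52-card deck corresponds to a uniform word of 52 fair bits, packets are maximal runs, and $\frac{27}{53}$ arises as the ratio of the expected number of size-1 runs to the expected number of runs. A sketch (Python; names are ours, with an exhaustive check for small $n$ and the closed forms for $n = 52$):

```python
from fractions import Fraction
from itertools import product

def run_lengths(word):
    """Lengths of maximal runs of equal symbols."""
    runs, count = [], 1
    for i in range(1, len(word)):
        if word[i] == word[i - 1]:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

def expected_ratio_brute(n):
    """E[# size-1 runs] / E[# runs] over all 2^n equally likely bit words."""
    singles = total = 0
    for word in product((0, 1), repeat=n):
        runs = run_lengths(word)
        singles += sum(r == 1 for r in runs)
        total += len(runs)
    return Fraction(singles, total)

def expected_ratio_closed(n):
    """Closed forms: E[# runs] = 1 + (n-1)/2, E[# size-1 runs] = 1 + (n-2)/4."""
    return (1 + Fraction(n - 2, 4)) / (1 + Fraction(n - 1, 2))
```

Note this is a ratio of expectations (the fraction of all packets over many shuffles), not the expectation of the per-shuffle fraction.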

Other mathematical models for riffle shuffling may be found in [18], [48], and [19].

The first two consider only special subsets of the set of riffle shuffles as we have

defined them. Epstein [19] studied real shufflers by recording the sound produced by

shuffles. He suggests that for shufflers above a certain level of proficiency, packet size

has approximately a geometric distribution; that is, the probability that a packet

has size η is about

$$P(\eta) = \alpha(1-\alpha)^{\eta-1}$$

and for expert shufflers he suggests $\alpha = \frac{8}{9}$. This leads to a non-uniform distribution on shuffles.

A mathematical model can be judged useful by its accuracy in describing a real-world phenomenon. Alternately, success can result if the model allows us to learn something about the real world that was not otherwise apparent. We hope this latter case applies to the GSR model. For instance, in Chapter V we will show that one of the consequences of the model is that there is a better way to deal a bridge hand than the ordinary cyclic method. We hope in a future work to test this conclusion on real shuffle data, to see if information gleaned from the model translates back to actual human shuffling.

CHAPTER III

Probability Calculations for Some Simple Decks

Let $D$ be a deck of $n$ cards, $D'$ a rearrangement of $D$. In Section 2.8 we defined the descent polynomial and shuffle series
$$\mathcal{D}(D, D'; x) := \sum_{\pi : \pi D = D'} x^{\mathrm{des}\,\pi}$$
$$\mathcal{S}(D, D'; x) := \sum_{a \ge 1} \#\{a\text{-shuffles which take } D \text{ to } D'\}\, x^{a-1}$$
and showed that computing one was equivalent to computing the other, since
$$\mathcal{D}(D, D'; x) = (1-x)^{n+1}\, \mathcal{S}(D, D'; x).$$

So if we know either $\mathcal{D}(D,D';x)$ or $\mathcal{S}(D,D';x)$ then we may fairly say that we completely understand the transition from $D$ to $D'$ under the GSR model, since the coefficient of $x^{a-1}$ in $\mathcal{S}(D,D';x)$, when divided by $a^n$, is the probability of obtaining $D'$ after $a$-shuffling $D$. So the first step toward understanding shuffles of decks with repeated cards would seem to be finding a general method for computing $\mathcal{D}(D,D';x)$ and $\mathcal{S}(D,D';x)$.

Unfortunately, Viswanath showed in [11] that, for general values of $D$ and $D'$, computing $\mathcal{D}(D,D';x)$ (and therefore $\mathcal{S}(D,D';x)$) belongs to a class of counting problems known as #P, and is in fact #P-complete, meaning every other problem in #P can be reduced to finding descent polynomials in time that is polynomial in the size of the problem. There are many other #P-complete problems, some of considerable interest to industry. As with NP-complete problems, it is generally believed that there is no polynomial-time algorithm to solve any of them. Kozen [30] contains a full explanation of #P-completeness.

However we can compute $\mathcal{D}(D,D';x)$ and $\mathcal{S}(D,D';x)$ efficiently in certain special cases, and it is our goal in this chapter to do that. Our computations will bear fruit in Chapter VI, where we use them to approximate variation distances.

In the space of all possible decks of size $n$, there are two extremes that might be considered "simple": a deck of $n$ distinct cards, and a deck of $n$ cards which are all the same. In the all-distinct case, $T(D,D')$ consists of a single permutation $\pi$, so $\mathcal{D}(D,D';x)$ is simply $x^{\mathrm{des}(\pi)}$. Decks which are close to the all-distinct case (i.e., decks with very little redundancy) have similarly small transition sets, which makes it possible to compute descent polynomials simply by counting.

We begin this chapter from the other extreme (single-valued decks). From there the development is bipartite: we assume a format for the source deck and find $\mathcal{D}$ and $\mathcal{S}$ for general target decks; then we assume the same format for the target deck and allow the source to vary. It becomes evident how asymmetric the problem of shuffling is: the answers are very different depending on which deck is fixed.

The formats we consider are: $RB^{n-1}$ (one red card on top of $n-1$ black ones), $B^{n-1}R$ (one red card under $n-1$ black ones), $RB^{n-2}G$ (one red card over $n-2$ black cards over a green card), $R^m B^n$, and the general ordered deck $1^{n_1} 2^{n_2} \cdots k^{n_k}$. The final case includes all the previous ones, and represents one ordering of any collection of cards.

It still falls far short of encompassing all decks, however; for instance, the methods for ordered decks are no help in understanding the transition from a cyclic deck like $(A23456789TJQK)^4$ to an arbitrary rearrangement, or vice-versa.

In Section 3.12 we present one final method that is applicable to the general case.

It will yield an improvement over simple counting of permutations when the target deck consists of large blocks of cards of the same value.

3.1 The Simplest Deck

First consider a deck of $n$ cards that are all identical; we represent this symbolically by
$$(3.1)\qquad D = 1^n.$$

That is, $D$ is a sequence of $n$ cards labeled 1. Every rearrangement of $D$ is also $D$, so all $a^n$ $a$-shuffles of $D$ produce $D$; therefore
$$(3.2)\qquad \mathcal{S}(D, D; x) = \sum_{a \ge 1} a^n x^{a-1}.$$

$$(3.3)\qquad \left\langle{n \atop d}\right\rangle = \text{the number of permutations in } S_n \text{ that have } d \text{ descents}$$

n (D,D; x)= xdes(π) = xd. D d π Sn d   X∈ X We will refer to this function as an(x). By Equation (2.12) we have the well-known identity

$$(3.4)\qquad a_n(x) := \sum_d \left\langle{n \atop d}\right\rangle x^d = (1-x)^{n+1} \sum_{a \ge 1} a^n x^{a-1}.$$

The $\left\langle{n \atop d}\right\rangle$ are known as the Eulerian Numbers, because Euler first considered them in [20]. They have been studied extensively; see for example [24], [7], [47]. Let $a_0(x) := 1$ so as to agree with the right-hand side of Equation (3.4).

3.2 One Red Card on Top

Let $D_1 = RB^{n-1}$. If we identify 'R' with 'red' and 'B' with 'black', then $D_1$ is a deck made up of one red card on top of $n-1$ black cards. Let $D_k$ be the rearrangement of $D_1$ in which the red card is the $k$th card in the deck ($1 \le k \le n$). Then $\pi \in S_n$ will take $D_1$ to $D_k$ if and only if $\pi(1) = k$. Therefore

$$(3.5)\qquad \mathcal{D}(D_1, D_k; x) = \sum_d \left\langle{n \atop d}\right\rangle_k x^d$$
where $\left\langle{n \atop d}\right\rangle_k$ is the number of permutations $\pi \in S_n$ that have $\pi(1) = k$ and $\mathrm{des}(\pi) = d$. The $\left\langle{n \atop d}\right\rangle_k$ will be studied extensively in Chapter IV, but for now we can approach this problem by computing the shuffle series $\mathcal{S}(D_1, D_k; x)$ directly.

The number of $a$-shuffles that take $D_1$ to $D_k$ is the same as the number of inverse $a$-shuffles that take $D_k$ to $D_1$, or the number of sequences

$$u_1, u_2, \ldots, u_n, \qquad \text{with } 0 \le u_i \le a-1 \text{ for all } i,$$
for which a stable sort moves $u_k$ to the top. A stable sort will move $u_k$ to the top if none of the $u_i$ are smaller than $u_k$ and none of $u_1, u_2, \ldots, u_{k-1}$ are equal to $u_k$.

Suppose we fix the value of $u_k$ to be $i$; then we need

$$u_1, u_2, \ldots, u_{k-1} \in \{i+1, i+2, \ldots, a-1\} \qquad ((a-i-1)^{k-1} \text{ choices})$$
$$u_{k+1}, u_{k+2}, \ldots, u_n \in \{i, i+1, \ldots, a-1\} \qquad ((a-i)^{n-k} \text{ choices})$$

so the total number of $a$-shuffles that take $D_1$ to $D_k$ is
$$\sum_{i=0}^{a-1} (a-i-1)^{k-1} (a-i)^{n-k}.$$
Substituting $j = a-1-i$ and summing over $a$,
$$\mathcal{S}(D_1, D_k; x) = \sum_{a \ge 1} \left( \sum_{j=0}^{a-1} j^{k-1}(j+1)^{n-k} \right) x^{a-1} = \sum_{j \ge 0} j^{k-1}(j+1)^{n-k} \sum_{a \ge j+1} x^{a-1} = (1-x)^{-1} \sum_{j \ge 0} j^{k-1}(j+1)^{n-k} x^j$$
where we agree to interpret $0^0$ as 1. We can now use Equation (2.12) to deduce the descent polynomial, which we will call $g_{n,k}$:

$$(3.6)\qquad g_{n,k}(x) := \mathcal{D}(D_1, D_k; x) = \sum_d \left\langle{n \atop d}\right\rangle_k x^d = (1-x)^n \sum_{j \ge 0} j^{k-1}(j+1)^{n-k} x^j.$$
We will derive that again in Chapter IV by different means.
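Equation (3.6) can be sanity-checked by brute force for small $n$. The sketch below (Python; helper names are ours) expands $(1-x)^n \sum_j j^{k-1}(j+1)^{n-k} x^j$ up to degree $n-1$ and compares it with a direct count of permutations with $\pi(1) = k$, classified by descents.

```python
from itertools import permutations
from math import comb

def descents(pi):
    return sum(pi[i] > pi[i + 1] for i in range(len(pi) - 1))

def g_poly(n, k):
    """Coefficients of g_{n,k}(x) = (1-x)^n * sum_j j^(k-1) (j+1)^(n-k) x^j,
    truncated to degree n-1 (the descent polynomial has degree < n)."""
    series = [j ** (k - 1) * (j + 1) ** (n - k) for j in range(n)]   # 0**0 == 1
    return [sum((-1) ** i * comb(n, i) * series[d - i] for i in range(d + 1))
            for d in range(n)]

def g_brute(n, k):
    """Count permutations of {1..n} with pi(1) = k, by number of descents."""
    counts = [0] * n
    for pi in permutations(range(1, n + 1)):
        if pi[0] == k:
            counts[descents(pi)] += 1
    return counts

checked = all(g_poly(n, k) == g_brute(n, k)
              for n in (2, 3, 4, 5) for k in range(1, n + 1))
```

Summing the $g_{n,k}$ over $k$ recovers the Eulerian numbers, as it must.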

3.3 One Red Card on the Bottom

Now consider the transformation $D_n \to D_k$; that is, the process of moving a red card from the bottom of the deck to position $k$. We could solve this problem in much the

same way as the last, but instead we take a shortcut.

Let $\rho \in S_n$ be the "reversal" permutation. That is,

$$(3.7)\qquad \rho(i) := n+1-i, \qquad 1 \le i \le n.$$

Applying ρ to a deck reverses it.

If $\pi$ is any permutation in $S_n$, we can define the graph of $\pi$ by drawing a sequence

of line segments from $(1, \pi(1))$ to $(2, \pi(2))$, from $(2, \pi(2))$ to $(3, \pi(3))$, ..., and from

Figure 3.1: The graphs of $\pi$, $\rho\pi$, $\pi\rho$, and $\rho\pi\rho$, showing the way $\rho$ changes ascents to descents and vice-versa. Here $\pi = \left(\begin{smallmatrix}12345678\\86375142\end{smallmatrix}\right)$ and $\rho$ is the reversal permutation. Applying $\rho$ on the left flips the graph upside down; applying it on the right flips the graph left-to-right. Either flip changes negative slopes to positive ones, and vice-versa. The descents of $\pi$ and their reflections are drawn in black. They become the ascents of $\rho\pi$ and $\pi\rho$, and go back to being descents in $\rho\pi\rho$.

$(n-1, \pi(n-1))$ to $(n, \pi(n))$. (See Figure 3.1.) A line segment with negative slope represents a descent, and one with positive slope represents an ascent.

Since $\rho\pi(i) = n+1-\pi(i)$, the graph of $\rho\pi$ is the graph of $\pi$ flipped upside down. The flip changes ascents to descents, and vice-versa; therefore $\mathrm{des}(\rho\pi) = n-1-\mathrm{des}(\pi)$. On the other hand, $\pi\rho(i) = \pi(n+1-i)$, so the graph of $\pi\rho$ is the graph of $\pi$ flipped left to right. This flip also exchanges ascents for descents, so $\mathrm{des}(\pi\rho)$ is also $n-1-\mathrm{des}(\pi)$. Finally, the graph of $\rho\pi\rho$ is the graph of $\pi$ flipped both ways, and so
$$\mathrm{des}(\rho\pi\rho) = n-1-\mathrm{des}(\pi\rho) = n-1-(n-1-\mathrm{des}(\pi)) = \mathrm{des}(\pi).$$
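These three identities are easy to confirm by machine over a small symmetric group. A sketch (Python, with 0-indexed permutations; names are ours):

```python
from itertools import permutations

n = 5
rho = tuple(n - 1 - i for i in range(n))          # reversal, 0-indexed

def compose(p, q):
    """(p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(p)))

def des(p):
    return sum(p[i] > p[i + 1] for i in range(len(p) - 1))

ok = all(des(compose(rho, p)) == n - 1 - des(p)
         and des(compose(p, rho)) == n - 1 - des(p)
         and des(compose(rho, compose(p, rho))) == des(p)
         for p in permutations(range(n)))
```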

Now suppose $D$ is any deck of $n$ cards, $D'$ is a rearrangement of $D$, and we know all about the set $T(D,D')$ of permutations that take $D$ to $D'$. If $\pi \in T(D,D')$, then $\rho\pi\rho \in T(\rho D, \rho D')$, because
$$(\rho\pi\rho)(\rho D) = \rho\pi\rho^2 D = \rho\pi D = \rho D'$$
since $\rho^2$ is the identity. The map $\pi \mapsto \rho\pi\rho$ is one-to-one by general group theory, so since $\#T(D,D') = \#T(\rho D, \rho D')$ (see Section 2.2), it is a bijection. Even better, it is a bijection that preserves des, so

$$(3.8)\qquad \mathcal{D}(D, D'; x) = \mathcal{D}(\rho D, \rho D'; x).$$

Applying Equation (3.8) to the problem of $D_n \to D_k$, we see that
$$\mathcal{D}(D_n, D_k; x) = \mathcal{D}(\rho D_n, \rho D_k; x) = \mathcal{D}(D_1, D_{n+1-k}; x).$$

Therefore

$$(3.9)\qquad \mathcal{D}(D_n, D_k; x) = g_{n,n+1-k}(x) = (1-x)^n \sum_{j \ge 0} j^{n-k}(j+1)^{k-1} x^j.$$

3.4 Any Position to Top

Now consider the transition from $D_\ell$ to $D_1$. In this case it is more straightforward to compute the descent polynomial directly.

The transition from $D_1$ to itself was covered by Equation (3.6), and can also be seen as a case of the very simplest problem, where all the cards are the same: $\pi D_1 = D_1$ means that $\pi(1) = 1$, so to draw an arrow diagram between $D_1$ and itself, we must connect the top cards with an arrow; then we can draw arrows any way we like between the other cards. Number the arrows $1, 2, \ldots, n$ according to their starting positions in the source deck. The first arrow cannot cross the second, so all descents are among arrows $2, 3, \ldots, n$, which represent an arbitrary permutation from $S_{n-1}$. Therefore
$$(3.10)\qquad \mathcal{D}(D_1, D_1; x) = \sum_d \left\langle{n-1 \atop d}\right\rangle x^d = a_{n-1}(x).$$
Now suppose $\ell > 1$. To draw an arrow diagram of a permutation $\pi$ which takes $D_\ell$ to $D_1$, we first draw a "red" arrow from position $\ell$ on the left to position 1 on the right, then fill in the other "black" arrows in any fashion. The $\ell$th arrow cannot cross the $(\ell+1)$st because $\pi(\ell+1) > 1 = \pi(\ell)$. On the other hand, the $(\ell-1)$st arrow must cross the $\ell$th, because $\pi(\ell-1) > 1 = \pi(\ell)$. In other words, the descents of $\pi$ consist of

1. the descents among the first $\ell-1$ arrows,

2. the descents among the final $n-\ell$ arrows, and

3. a descent at $\ell-1$.

Let $A = \{1, 2, \ldots, \ell-1\}$ and $B = \{\ell+1, \ell+2, \ldots, n\}$ be sets of positions in the source deck. We can divide the task of drawing the black arrows into three subtasks:

1. Partition the set $\{2, 3, \ldots, n\}$ of positions of black cards in the target deck into disjoint sets $A'$ and $B'$, with $\#A' = \ell-1$ and $\#B' = n-\ell$.

2. Draw arrows from $A$ to $A'$.

3. Draw arrows from $B$ to $B'$.

There are $\binom{n-1}{\ell-1}$ ways to perform step 1. Performing step 2 amounts to choosing a permutation $\pi_A \in S_{\ell-1}$, then drawing arrows from $i$ to the $\pi_A(i)$th largest position in $A'$, for $i = 1, 2, \ldots, \ell-1$. Arrow crossings will correspond to descents of $\pi_A$.

Figure 3.2: The two cases considered in Section 3.4. When the red card is initially on top, the permutation among the black cards is arbitrary, and there are no additional descents. Otherwise, if the red card is not initially on top, there must be a descent between the red card and the card above it, since the arrows from those positions must cross. Likewise there can be no descent between the red card and the one below it, because those arrows cannot cross.

Likewise performing step 3 amounts to choosing a permutation $\pi_B \in S_{n-\ell}$. So, for $\ell > 1$,
$$\mathcal{D}(D_\ell, D_1; x) = \binom{n-1}{\ell-1} \sum_{\pi_A \in S_{\ell-1}} \sum_{\pi_B \in S_{n-\ell}} x^{\mathrm{des}(\pi_A) + \mathrm{des}(\pi_B) + 1} = \binom{n-1}{\ell-1}\, x\, a_{\ell-1}(x)\, a_{n-\ell}(x)$$
where $a_n$ is as in Equation (3.4) for $n > 0$, and $a_0(x) \equiv 1$. We can put this together with Equation (3.10) to write

$$(3.11)\qquad \mathcal{D}(D_\ell, D_1; x) = \binom{n-1}{\ell-1}\, x^{[\ell > 1]}\, a_{\ell-1}(x)\, a_{n-\ell}(x)$$
for any $\ell$, where
$$(3.12)\qquad [A] = \begin{cases} 0 & \text{if statement } A \text{ is false} \\ 1 & \text{if statement } A \text{ is true.} \end{cases}$$
Knuth refers to this as Iverson notation in [26], and traces its origin to [25].
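Equation (3.11) can be verified directly for small decks. The sketch below (Python; names are ours) counts permutations with $\pi(\ell) = 1$ by descents and compares with $\binom{n-1}{\ell-1} x^{[\ell>1]} a_{\ell-1}(x) a_{n-\ell}(x)$, computing the Eulerian polynomials $a_m$ by brute force as well.

```python
from itertools import permutations
from math import comb

def des(p):
    return sum(p[i] > p[i + 1] for i in range(len(p) - 1))

def eulerian_poly(m):
    """Coefficients of a_m(x); a_0(x) = 1."""
    if m == 0:
        return [1]
    coeffs = [0] * m
    for p in permutations(range(m)):
        coeffs[des(p)] += 1
    return coeffs

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def formula(n, ell):
    """RHS of (3.11): C(n-1, ell-1) * x^[ell>1] * a_{ell-1}(x) * a_{n-ell}(x)."""
    poly = poly_mul(eulerian_poly(ell - 1), eulerian_poly(n - ell))
    poly = ([0] + poly) if ell > 1 else poly
    return [comb(n - 1, ell - 1) * c for c in poly]

def brute(n, ell):
    counts = [0] * n
    for p in permutations(range(1, n + 1)):
        if p[ell - 1] == 1:
            counts[des(p)] += 1
    return counts

def padded(p, n):
    return p + [0] * (n - len(p))

ok = all(padded(formula(n, ell), n) == brute(n, ell)
         for n in (3, 4, 5) for ell in range(1, n + 1))
```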

3.5 Any Position to Bottom

It is now easy to describe the transition from $D_\ell$ to $D_n$, since

$$\mathcal{D}(D_\ell, D_n; x) = \mathcal{D}(\rho D_\ell, \rho D_n; x) = \mathcal{D}(D_{n+1-\ell}, D_1; x).$$

Plugging into Equation (3.11) and simplifying,

$$(3.13)\qquad \mathcal{D}(D_\ell, D_n; x) = \binom{n-1}{\ell-1}\, x^{[\ell < n]}\, a_{\ell-1}(x)\, a_{n-\ell}(x).$$

3.6 Any Position to Any Position

Now we calculate the general descent polynomial $\mathcal{D}(D_\ell, D_k; x)$.

Suppose we do an (a + b + 1)-shuffle of D`, and that the red card is in packet a.

Imagine the source deck is a family ordered by age, and each packet is a generation.

Then from the perspective of the red card, packets $0, 1, \ldots, a-1$ contain "ancestors", packet $a$ contains "siblings" (older ones above $\ell$, younger ones below), and packets $a+1, a+2, \ldots, a+b$ contain "descendants". Let $s$ be the number of ancestors and $t$ be the number of descendants. Subject to those conditions, here are the steps in constructing a shuffle from $D_\ell$ to $D_k$:

1. Choose $\ell-1-s$ positions above $k$ in the target deck to be the destinations of the older siblings ($\binom{k-1}{\ell-1-s}$ choices).

2. Choose $n-\ell-t$ positions below $k$ in the target deck to be the destinations of the younger siblings ($\binom{n-k}{n-\ell-t}$ choices).

3. Decide which of the remaining $s+t$ positions in the target deck are for ancestors, and which for descendants ($\binom{s+t}{s}$ choices).

4. Define an $a$-shuffle of the ancestors ($a^s$ choices).

Figure 3.3: The shuffle described in Section 3.6. The source deck is divided into $a+b+1$ packets, with the red card in position $\ell$ and packet $a$. The boundaries of packet $a$ are drawn with horizontal lines. For ease of exposition, the cards in packets $0, 1, \ldots, a-1$ are called "ancestors" in the text, those in packets $a+1, a+2, \ldots, a+b$ are called "descendants", and those in packet $a$ are "siblings". Note that packet $a$ cannot change its order during the transition.

5. Define a $b$-shuffle of the descendants ($b^t$ choices).

So the shuffle series is

$$(3.14)\qquad \mathcal{S}(D_\ell, D_k; x) = \sum_{a,b,s,t \ge 0} \binom{k-1}{\ell-1-s} \binom{n-k}{n-\ell-t} \binom{s+t}{s}\, a^s b^t x^{a+b}.$$
If $k = \ell$, the sum above becomes

$$(3.15)\qquad \sum_{a,b,s,t \ge 0} \binom{p}{s} \binom{q}{t} \binom{s+t}{s}\, a^s b^t x^{a+b}$$
where $p = k-1$, the number of black cards above the red one, and $q = n-k$, the number of black cards below the red one, in both the source and target decks. It is a very beautiful sum, but has no obvious simplification.
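Equation (3.14) can be checked against a direct enumeration of inverse shuffles. In the sketch below (Python; names are ours), the number of $A$-shuffles taking $D_\ell$ to $D_k$ is counted two ways: as the coefficient of $x^{A-1}$ in (3.14) (so $a+b = A-1$), and by enumerating all digit sequences $u \in \{0, \ldots, A-1\}^n$ for the target deck and checking where a stable sort sends the red card.

```python
from itertools import product
from math import comb

def C(n, k):
    """Binomial coefficient that is 0 outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

def coeff_314(n, ell, k, A):
    """Coefficient of x^(A-1) in Equation (3.14)."""
    total = 0
    for a in range(A):
        b = A - 1 - a
        for s in range(ell):
            for t in range(n - ell + 1):
                total += (C(k - 1, ell - 1 - s) * C(n - k, n - ell - t)
                          * C(s + t, s) * a ** s * b ** t)   # 0**0 == 1
    return total

def brute(n, ell, k, A):
    """Digit sequences on D_k whose stable sort moves the red card to position ell."""
    count = 0
    for u in product(range(A), repeat=n):
        red = u[k - 1]
        final = (sum(d < red for d in u)
                 + sum(d == red for d in u[:k - 1]) + 1)
        count += (final == ell)
    return count

ok = all(coeff_314(4, ell, k, A) == brute(4, ell, k, A)
         for ell in range(1, 5) for k in range(1, 5) for A in (1, 2, 3))
```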

The case presented in this section is treated in depth by Ciucu in [8]. He computes the probability that the card in position $\ell$ is in position $k$ after a 2-shuffle, and treats the resulting matrix as a Markov chain. Then he can answer the question of which card is most likely to be in each position.

Figure 3.4: The second "red-green" shuffle described in Section 3.7. The red and green cards normally partition the black cards of $D_{k,\ell}$ into three packets of sizes $u$, $v$, and $w$. Since the red card is destined for the top of the target deck, there will be a descent above it (unless $k = 1$) and not below it. Likewise there will be a descent below the green card (unless $\ell = n$) and not above it.

3.7 One Red Card, One Green Card

Now consider a different generalization. Let $D_{k,\ell}$ be a deck with one red card in position $k$, one green card in position $\ell$, and $n-2$ black cards in the other positions. Then
$$\mathcal{D}(D_{1,n}, D_{k,\ell}; x) = \sum_d \left\langle{n \atop d}\right\rangle_k^\ell x^d$$
where $\left\langle{n \atop d}\right\rangle_k^\ell$ is the number of permutations $\pi \in S_n$ with $\mathrm{des}(\pi) = d$, $\pi(1) = k$, and $\pi(n) = \ell$. In Theorem 4.10 we will show that

$$\left\langle{n \atop d}\right\rangle_k^\ell = \begin{cases} \left\langle{n-1 \atop d}\right\rangle_{n+k-\ell} & \text{if } k < \ell \\[4pt] \left\langle{n-1 \atop d-1}\right\rangle_{k-\ell} & \text{if } k > \ell. \end{cases}$$
So this case reduces to one we have already seen, namely:

$$(3.16)\qquad \mathcal{D}(D_{1,n}, D_{k,\ell}; x) = \begin{cases} g_{n-1,n+k-\ell}(x) & \text{if } k < \ell \\ x\, g_{n-1,k-\ell}(x) & \text{if } k > \ell. \end{cases}$$

We will refer to the function in Equation (3.16) as $h_{n,k,\ell}(x)$.

In the other direction we have the transition from $D_{k,\ell}$ to $D_{1,n}$. The two colored cards in the source deck separate the black cards into three (possibly empty) packets; call the packets $U$, $V$, and $W$, and let their sizes be $u$, $v$, and $w$ respectively. (So $u = \min\{k,\ell\} - 1$, $v = \max\{k,\ell\} - \min\{k,\ell\} - 1$, and $w = n - \max\{k,\ell\}$. See Figure 3.4.)

To construct a permutation which takes $D_{k,\ell}$ to $D_{1,n}$, we need to

1. Designate which of the target positions $\{2, 3, \ldots, n-1\}$ will hold cards from $U$, which will hold cards from $V$, and which will hold cards from $W$ ($\frac{(n-2)!}{u!\,v!\,w!}$ choices).

2. Draw arrows from U, V , and W to their destination sets in any fashion.

3. Draw an arrow from position k to position 1 for the red card, and one from

position $\ell$ to position $n$ for the green card.

The complete permutation $\pi$ has all the descents of the permutations defined in step 2. In addition, there will be a descent before $k$ if $k > 1$ and a descent after $\ell$ if $\ell < n$. (However if $k = \ell + 1$, these are the same descent.) Therefore

$$\mathcal{D}(D_{k,\ell}, D_{1,n}; x) = \frac{(n-2)!}{u!\,v!\,w!}\, x^{\theta}\, a_u(x)\, a_v(x)\, a_w(x) \tag{3.17}$$

where

$$\theta = [k > 1] + [\ell < n] - [k = \ell + 1].$$
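Equation (3.17) is easy to check by machine. The following Python sketch (the helper names are mine, not the thesis's) computes both sides for small decks: `descent_poly` enumerates permutations by brute force, while `red_green_formula` evaluates the right-hand side of Equation (3.17). Polynomials are coefficient lists, so `[0, 0, 6]` means $6x^2$.

```python
from itertools import permutations
from math import factorial

def poly_mul(p, q):
    # Multiply two polynomials given as coefficient lists.
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def eulerian_poly(n):
    # a_n(x): coefficient of x^d counts permutations in S_n with d descents; a_0 = 1.
    if n == 0:
        return [1]
    coeffs = [0] * n
    for pi in permutations(range(n)):
        coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

def descent_poly(src, tgt):
    # Brute-force D(src, tgt; x): sum of x^des(pi) over all pi with tgt[pi(i)] = src[i].
    n = len(src)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(tgt[pi[i]] == src[i] for i in range(n)):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def red_green_formula(n, k, l):
    # Equation (3.17): D(D_{k,l}, D_{1,n}; x) = (n-2)!/(u! v! w!) x^theta a_u a_v a_w.
    u = min(k, l) - 1
    v = max(k, l) - min(k, l) - 1
    w = n - max(k, l)
    theta = (k > 1) + (l < n) - (k == l + 1)
    c = factorial(n - 2) // (factorial(u) * factorial(v) * factorial(w))
    p = poly_mul(poly_mul(eulerian_poly(u), eulerian_poly(v)), eulerian_poly(w))
    return [0] * theta + [c * coef for coef in p]
```

For $n = 5$, $k = 2$, $\ell = 4$ the source deck is BRBGB and the target is RBBBG; both routes give $6x^2$, i.e. `[0, 0, 6]`.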

3.8 Source deck $R^m B^n$

Now let $D = R^m B^n$. That is, $D$ consists of $m$ red cards on top of $n$ black cards. Let $D'$ be some rearrangement of $D$. We wish to calculate $\mathcal{D}(D, D'; x)$.

Let $r_1, r_2, \ldots, r_m$ be the positions of the red cards in $D'$, and let $b_1, b_2, \ldots, b_n$ be the positions of the black cards. We can construct a permutation that takes $D$ to $D'$

[Figure 3.5: The shuffle of the source deck $R^m B^n$, as described in Section 3.8. The positions of the red cards in the target deck are labeled $r_1, r_2, \ldots, r_m$ and those of the black cards $b_1, b_2, \ldots, b_n$. $r_\ell$ is chosen as the ending position of the last red card from the source deck, and $b_k$ the ending position of the first black card.]

with the following procedure:

1. Choose a destination $r_\ell$ for the last red card in $D$.

2. Choose a permutation $\pi_R \in S_m$ with $\pi_R(m) = \ell$, and for each $i \in \{1, 2, \ldots, m\}$, send the $i$th red card in $D$ to position $r_{\pi_R(i)}$ in $D'$.

3. Choose a destination $b_k$ for the first black card in $D$.

4. Choose a permutation $\pi_B \in S_n$ with $\pi_B(1) = k$, and for each $j \in \{1, 2, \ldots, n\}$, send the $j$th black card in $D$ to position $b_{\pi_B(j)}$ in $D'$.

Call the complete permutation $\pi$. $\pi$ will have a descent at position $m$ if $r_\ell > b_k$, and all other descents of $\pi$ come from the descents of $\pi_R$ and $\pi_B$. So

$$\mathcal{D}(D,D';x) = \sum_{\ell=1}^{m} \sum_{\substack{\pi_R \in S_m \\ \pi_R(m)=\ell}} \sum_{k=1}^{n} \sum_{\substack{\pi_B \in S_n \\ \pi_B(1)=k}} x^{[r_\ell > b_k] + \operatorname{des}(\pi_R) + \operatorname{des}(\pi_B)}.$$

Rearranging summation signs and applying our results from Section 3.2 and Section 3.3, we have

$$\mathcal{D}(D,D';x) = \sum_{\ell=1}^{m} \sum_{k=1}^{n} x^{[r_\ell > b_k]}\, g_{m,\,m+1-\ell}(x)\, g_{n,k}(x). \tag{3.18}$$

$$A = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ x & x & 1 & 1 & 1 & 1 & 1 \\ x & x & x & 1 & 1 & 1 & 1 \\ x & x & x & 1 & 1 & 1 & 1 \\ x & x & x & x & x & x & 1 \\ x & x & x & x & x & x & 1 \end{pmatrix}$$

Figure 3.6: The Young matrix $A$ for the deck RBBRBRRBBBRRB. The $(i,j)$ entry is $x$ if the $i$th R occurs after the $j$th B, and 1 otherwise. The $x$'s form a Young shape, anchored in the lower left corner. It is possible to recover the deck from the matrix by tracing the boundary between $x$'s and 1's; a vertical segment corresponds to an R and a horizontal segment to a B.

Let

$$G_r := \left( g_{r,1}(x),\ g_{r,2}(x),\ \ldots,\ g_{r,r}(x) \right), \tag{3.19}$$

$$\widetilde{G}_r := \left( g_{r,r}(x),\ g_{r,r-1}(x),\ \ldots,\ g_{r,1}(x) \right), \tag{3.20}$$

and let $A$ be an $m \times n$ matrix with

$$A_{ij} = x^{[r_i > b_j]} = \begin{cases} x & \text{if } r_i > b_j \\ 1 & \text{if } r_i < b_j. \end{cases}$$

Then

$$\mathcal{D}(D,D';x) = \sum_{\ell=1}^{m} \sum_{k=1}^{n} (\widetilde{G}_m)_\ell\, A_{\ell,k}\, (G_n)_k = \widetilde{G}_m A\, G_n^{T}. \tag{3.21}$$

If we assume that $r_1 < r_2 < \cdots < r_m$ and $b_1 < b_2 < \cdots < b_n$, then $A_{ij} = x$ implies

$$r_m > r_{m-1} > \cdots > r_i > b_j > b_{j-1} > \cdots > b_1,$$

which means that $r_p > b_q$ for any $(p,q)$ with $p \ge i$ and $q \le j$. In other words, if $A_{ij} = x$, then all entries in $A$ below and to the left of $(i,j)$ are also $x$. So we may refer to $A$ as a Young matrix in the sense that the $x$'s in $A$ form a Young shape, drawn in the French style (i.e., anchored to the lower left corner). See Figure 3.6.

One may recover the deck $D'$ from $A$ by tracing the boundary between 1's and $x$'s from the upper left corner of $A$ to the lower right corner of $A$. The boundary is a sequence of $m+n$ unit line segments; a vertical segment corresponds to a red card in $D'$, and a horizontal segment to a black card.

[Figure 3.7: A shuffle to the target deck $R^m B^n$. There will be a descent wherever the source deck changes from black to red, because all the black cards in the target deck are below all the red cards. Likewise there are no descents where the source deck changes from red to black.]
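Equation (3.21) can be checked numerically. In this Python sketch (helper names are mine, not the thesis's), `g(n, k)` builds the polynomials $g_{n,k}(x)$ by brute force, and `source_rmbn` evaluates the double sum of Equation (3.18), i.e. $\widetilde{G}_m A\, G_n^T$, for a given two-color target deck. The result agrees with direct enumeration over all permutations taking $R^m B^n$ to the target.

```python
from itertools import permutations

def poly_mul(p, q):
    # Multiply two polynomials given as coefficient lists.
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_add(p, q):
    # Add two polynomials given as coefficient lists.
    r = [0] * max(len(p), len(q))
    for i, a in enumerate(p):
        r[i] += a
    for i, a in enumerate(q):
        r[i] += a
    return r

def g(n, k):
    # g_{n,k}(x): descent polynomial of permutations pi in S_n with pi(1) = k.
    coeffs = [0] * n
    for pi in permutations(range(1, n + 1)):
        if pi[0] == k:
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

def descent_poly(src, tgt):
    # Brute-force D(src, tgt; x), for checking.
    n = len(src)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(tgt[pi[i]] == src[i] for i in range(n)):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def source_rmbn(target):
    # Equations (3.18)/(3.21): D(R^m B^n, target; x) = sum x^[r_l > b_k] g_{m,m+1-l} g_{n,k}.
    r = [i + 1 for i, c in enumerate(target) if c == "R"]
    b = [i + 1 for i, c in enumerate(target) if c == "B"]
    m, n = len(r), len(b)
    total = [0]
    for l in range(1, m + 1):
        for k in range(1, n + 1):
            term = poly_mul(g(m, m + 1 - l), g(n, k))
            if r[l - 1] > b[k - 1]:   # the Young matrix entry A_{l,k} is x
                term = [0] + term
            total = poly_add(total, term)
    while len(total) > 1 and total[-1] == 0:
        total.pop()
    return total
```

For example, `source_rmbn("RBRB")` gives `[0, 3, 1]`, i.e. $3x + x^2$.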

3.9 Target deck $R^m B^n$

Now let $D' := R^m B^n$, with $D$ some reordering, and consider the transition from $D$ to $D'$. $D$ can be viewed as a sequence of monochromatic blocks of cards, with the odd blocks one color and the even ones the other. Without loss of generality we may say that

$$D = R^{m_1} B^{n_1} R^{m_2} B^{n_2} \cdots R^{m_k} B^{n_k}$$

for some $k$, where $m_1$ and $n_k$ may be 0, but we insist that the other exponents be positive.

We can construct a permutation from $D$ to $D'$ with the following procedure:

1. Partition the positions of red cards in $D'$ into those that will receive the first red block in $D$, those that will receive the second block, etc. ($\frac{m!}{m_1!\,m_2! \cdots m_k!}$ choices).

2. Do likewise for the black cards ($\frac{n!}{n_1!\,n_2! \cdots n_k!}$ choices).

3. Define a permutation for each block into its destination set.

Suppose $k > 1$ and $\pi$ is generated as above. Since the $(m_1+n_1)$th card of $D$ is black, $\pi(m_1+n_1) \in \{m+1, m+2, \ldots, m+n\}$, and since the next card in $D$ is red, $\pi(m_1+n_1+1) \in \{1, 2, \ldots, m\}$. Therefore $\pi$ has a descent at $m_1+n_1$, and likewise at each of the $k-1$ positions where $D$ changes from black to red. By the same logic there are no descents where $D$ changes from red to black. So all other descents of $\pi$ come from the permutations defined in step 3, each of which is unconstrained. Therefore

$$\mathcal{D}(D,D';x) = m!\, n!\, x^{k-1} \prod_{i=1}^{k} \frac{a_{m_i}(x)}{m_i!} \cdot \frac{a_{n_i}(x)}{n_i!}. \tag{3.22}$$
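Equation (3.22) can likewise be verified mechanically. In this sketch (names are mine, not the thesis's), `target_rmbn` parses the source deck into blocks, padding in an empty $m_1$ or $n_k$ when needed, and evaluates the formula.

```python
from itertools import groupby, permutations
from math import factorial

def poly_mul(p, q):
    # Multiply two polynomials given as coefficient lists.
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def eulerian_poly(n):
    # a_n(x); a_0 = 1.
    if n == 0:
        return [1]
    coeffs = [0] * n
    for pi in permutations(range(n)):
        coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

def descent_poly(src, tgt):
    # Brute-force D(src, tgt; x), for checking.
    n = len(src)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(tgt[pi[i]] == src[i] for i in range(n)):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def target_rmbn(source):
    # Equation (3.22): D(source, R^m B^n; x) = m! n! x^(k-1) prod a_{m_i}/m_i! * a_{n_i}/n_i!.
    runs = [(c, len(list(grp))) for c, grp in groupby(source)]
    m = sum(1 for c in source if c == "R")
    n = len(source) - m
    # Pad so the runs read R^{m_1} B^{n_1} ... R^{m_k} B^{n_k} with m_1, n_k possibly 0.
    if runs[0][0] == "B":
        runs.insert(0, ("R", 0))
    if runs[-1][0] == "R":
        runs.append(("B", 0))
    k = len(runs) // 2
    num = [0] * (k - 1) + [factorial(m) * factorial(n)]
    denom = 1
    for _, size in runs:
        num = poly_mul(num, eulerian_poly(size))
        denom *= factorial(size)
    return [coef // denom for coef in num]
```

For example, `target_rmbn("RBRB")` returns `[0, 4]`, i.e. $4x$.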

3.10 Source Deck $1^{n_1} 2^{n_2} \cdots m^{n_m}$

Now consider $D = 1^{n_1} 2^{n_2} \cdots m^{n_m}$, with $D'$ some rearrangement of $D$. Let $p_v(i)$ be the position of the $i$th card with value $v$ in $D'$. To construct a permutation which takes $D$ to $D'$, we should

1. Pick $\ell_1 \in \{1, 2, \ldots, n_1\}$ and $\pi_1 \in S_{n_1}$ with $\pi_1(n_1) = \ell_1$, and send the $r$th 1 in $D$ to position $p_1(\pi_1(r))$ in $D'$, for $1 \le r \le n_1$.

2. Pick $k_2, \ell_2 \in \{1, 2, \ldots, n_2\}$. If $n_2 = 1$, we must have $k_2 = \ell_2$; otherwise, if $n_2 > 1$, $k_2$ and $\ell_2$ should be chosen to be distinct. Then pick a permutation $\pi_2 \in S_{n_2}$ with $\pi_2(1) = k_2$ and $\pi_2(n_2) = \ell_2$, and send the $r$th 2 in $D$ to position $p_2(\pi_2(r))$ in $D'$.

[Figure 3.8: Shuffling a sorted deck, as described in Section 3.10. We fix $k_v \in \{1, 2, \ldots, n_v\}$ to be the relative destination of the first $v$ in $D$. That is, the first $v$ in $D$ will go to the $k_v$th $v$ in $D'$. $\ell_v$ is the relative destination of the last $v$ in $D$, and we similarly fix $k_u$ and $\ell_u$ for the other card values. (Exception: we don't fix the destination of the first 1 or the last $m$.)]

3. Pick

$$(k_3, \ell_3, \pi_3),\ (k_4, \ell_4, \pi_4),\ \ldots,\ (k_{m-1}, \ell_{m-1}, \pi_{m-1})$$

in the manner of step 2.

4. Pick $k_m \in \{1, 2, \ldots, n_m\}$ and $\pi_m \in S_{n_m}$ with $\pi_m(1) = k_m$, and send the $r$th $m$ in $D$ to position $p_m(\pi_m(r))$.

The descents of the resulting permutation $\pi$ will be those of $\pi_1, \pi_2, \ldots, \pi_m$ together with any "inter-value" descents, which result when, for some value $v$, the arrow from the last card labeled $v$ in $D$ crosses the arrow from the first card labeled $v+1$; i.e., when $p_v(\ell_v) > p_{v+1}(k_{v+1})$. So we have

$$\mathcal{D}(D,D';x) = \sum_{\substack{k_2,k_3,\ldots,k_m \\ \ell_1,\ell_2,\ldots,\ell_{m-1}}} f_1(x)\, \phi_{1,2}(x)\, f_2(x)\, \phi_{2,3}(x)\, f_3(x) \cdots f_{m-1}(x)\, \phi_{m-1,m}(x)\, f_m(x) \tag{3.23}$$

where $f_v(x)$ is the generating function for the descents of $\pi_v$, and

$$\phi_{u,v}(x) := x^{[p_u(\ell_u) > p_v(k_v)]}.$$

Let $A_{u,v}$ be the $n_u \times n_v$ Young matrix (like the one in Figure 3.6) which results from ignoring all cards except those with values $u$ or $v$. In other words,

$$(A_{u,v})_{ij} = x^{[p_u(i) > p_v(j)]}.$$

Then $\phi_{u,v}(x)$ is the (row $\ell_u$, column $k_v$) entry of $A_{u,v}$.

From Section 3.3 we know that $f_1(x) = g_{n_1,\,n_1+1-\ell_1}(x)$, which is the $\ell_1$th entry in the vector $\widetilde{G}_{n_1}$ defined in Section 3.8. Likewise, from Section 3.2, $f_m(x) = g_{n_m,k_m}(x)$, which is the $k_m$th entry in $G_{n_m}$. If $1 < v < m$, $f_v(x)$ is determined in the manner of Section 3.7; that is,

$$f_v(x) = h_{n_v,k_v,\ell_v}(x)$$

where $h_{n,k,\ell}(x)$ is the function defined in Equation (3.16). So let $H_n$ be an $n \times n$ matrix with

$$(H_n)_{ij} = h_{n,i,j}(x) = \begin{cases} g_{n-1,\,n+i-j}(x) & \text{if } i < j \\ x\, g_{n-1,\,i-j}(x) & \text{if } i > j \\ 0 & \text{if } i = j \end{cases}$$

for $n > 1$, and $H_1 = [1]$. $H_n$ is a Toeplitz matrix, since $h_{n,i,j}$ only depends on $i-j$; for example

$$H_4 = \begin{pmatrix} 0 & g_{3,3} & g_{3,2} & g_{3,1} \\ xg_{3,1} & 0 & g_{3,3} & g_{3,2} \\ xg_{3,2} & xg_{3,1} & 0 & g_{3,3} \\ xg_{3,3} & xg_{3,2} & xg_{3,1} & 0 \end{pmatrix} = \begin{pmatrix} 0 & x+x^2 & 2x & 1+x \\ x+x^2 & 0 & x+x^2 & 2x \\ 2x^2 & x+x^2 & 0 & x+x^2 \\ x^2+x^3 & 2x^2 & x+x^2 & 0 \end{pmatrix}. \tag{3.24}$$

Now we can write the sum in Equation (3.23) as

$$\sum_{\substack{k_2,k_3,\ldots,k_m \\ \ell_1,\ell_2,\ldots,\ell_{m-1}}} (\widetilde{G}_{n_1})_{\ell_1} (A_{1,2})_{\ell_1,k_2} (H_{n_2})_{k_2,\ell_2} (A_{2,3})_{\ell_2,k_3} \cdots (A_{m-2,m-1})_{\ell_{m-2},k_{m-1}} (H_{n_{m-1}})_{k_{m-1},\ell_{m-1}} (A_{m-1,m})_{\ell_{m-1},k_m} (G_{n_m})_{k_m}.$$

In other words,

$$\mathcal{D}(D,D';x) = \widetilde{G}_{n_1} A_{1,2} H_{n_2} A_{2,3} \cdots A_{m-2,m-1} H_{n_{m-1}} A_{m-1,m}\, G_{n_m}^{T}. \tag{3.25}$$

Note the $G$'s and $H$'s are fixed by the cards in $D$. Only the $A$'s depend on the order of $D'$.

This product can be shortened, at the expense of some complexity. The procedure for calculating $\mathcal{D}(D,D';x)$ is distilled into an algorithm in Appendix C.
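The matrix product in Equation (3.25) is straightforward to implement with polynomial-valued matrices. This Python sketch (names are mine; the thesis's actual algorithm is in Appendix C) builds $\widetilde{G}$, $H$, and the Young matrices $A_{u,v}$ and multiplies them out, checking the result against brute force for a small deck.

```python
from itertools import permutations

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_add(p, q):
    r = [0] * max(len(p), len(q))
    for i, a in enumerate(p):
        r[i] += a
    for i, a in enumerate(q):
        r[i] += a
    return r

def g(n, k):
    # g_{n,k}(x): descent polynomial of permutations pi in S_n with pi(1) = k.
    coeffs = [0] * n
    for pi in permutations(range(1, n + 1)):
        if pi[0] == k:
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

def descent_poly(src, tgt):
    # Brute-force D(src, tgt; x), for checking.
    n = len(src)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(tgt[pi[i]] == src[i] for i in range(n)):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def H(n):
    # The Toeplitz matrix of Equation (3.24); H_1 = [1].
    if n == 1:
        return [[[1]]]
    M = [[[0] for _ in range(n)] for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i < j:
                M[i - 1][j - 1] = g(n - 1, n + i - j)
            elif i > j:
                M[i - 1][j - 1] = [0] + g(n - 1, i - j)
    return M

def mat_mul(A, B):
    # Multiply matrices whose entries are polynomials (coefficient lists).
    C = [[[0] for _ in range(len(B[0]))] for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            acc = [0]
            for t in range(len(B)):
                acc = poly_add(acc, poly_mul(A[i][t], B[t][j]))
            C[i][j] = acc
    return C

def sorted_source_poly(target):
    # Equation (3.25): D(1^{n_1}...m^{n_m}, target; x), where `target` is a sequence
    # of integer card values 1..m (assumes m >= 2).
    m = max(target)
    counts = [sum(1 for c in target if c == v) for v in range(1, m + 1)]
    pos = {v: [i + 1 for i, c in enumerate(target) if c == v] for v in range(1, m + 1)}
    def A(u, v):
        return [[[0, 1] if pos[u][i] > pos[v][j] else [1]
                 for j in range(len(pos[v]))] for i in range(len(pos[u]))]
    n1, nm = counts[0], counts[-1]
    M = [[g(n1, n1 + 1 - l) for l in range(1, n1 + 1)]]      # row vector G~_{n_1}
    for v in range(2, m + 1):
        M = mat_mul(M, A(v - 1, v))
        if v < m:
            M = mat_mul(M, H(counts[v - 1]))
    M = mat_mul(M, [[g(nm, k)] for k in range(1, nm + 1)])   # times G_{n_m}^T
    result = M[0][0]
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result
```

For the sorted source 1223 and the target 2123 this gives $2x$.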

3.11 Target Deck $1^{n_1} 2^{n_2} \cdots m^{n_m}$

Now let $D' = 1^{n_1} 2^{n_2} \cdots m^{n_m}$, $D$ some rearrangement, and consider the transition from $D$ to $D'$. This case is very similar to the one we considered in Section 3.9. We may write $D$ as a sequence of monochromatic blocks:

$$D = v_1^{k_1} v_2^{k_2} \cdots v_r^{k_r}$$

where $v_i \ne v_{i+1}$. Let $\hat{D}$ be the deck $v_1 v_2 \cdots v_r$, and let $B_v = \hat{D}^{-1}(v)$. That is, $B_v$ is the set of blocks that have value $v$. Then to construct a permutation from $D$ to $D'$, we should

we should

1. Identify which of the positions of 1's in $D'$ will receive the first block of 1's in $D$, which will receive the second, etc. ($n_1! / \prod_{i \in B_1} k_i!$ choices).

2. Do likewise for values $2, 3, \ldots, m$. There are

$$\frac{n_1! n_2! \cdots n_m!}{k_1! k_2! \cdots k_r!}$$

ways to perform steps 1 and 2.

3. Choose a permutation from each block to its destination set.

Descents in the complete permutation $\pi$ will come from those internal to the block permutations, and from the descents of $\hat{D}$. That is, if $\hat{D}(i) > \hat{D}(i+1)$ then all of

[Figure 3.9: A shuffle to the target deck $1^{n_1} 2^{n_2} \cdots m^{n_m}$. There will be a descent wherever the source deck changes from a high numbered card to a lower one (such as from 3 to 2), but not when the change is from a low number to a higher one (such as from 1 to 3).]

the cards in the ith block of D will end up below all the cards in the (i + 1)st block.

In particular, the arrow from the last card of the $i$th block will cross the arrow from the first card of the $(i+1)$st block, so $\pi$ will have a descent between the blocks. By the same logic, if $\hat{D}(i) < \hat{D}(i+1)$ there will be no descent between the blocks. So let

$$d_0 = \#\left\{\, i \in \{1, 2, \ldots, r-1\} : \hat{D}(i) > \hat{D}(i+1) \,\right\}.$$

The number of inter-value descents is always $d_0$, so we have

$$\mathcal{D}(D,D';x) = \frac{n_1! n_2! \cdots n_m!}{k_1! k_2! \cdots k_r!}\, x^{d_0}\, a_{k_1}(x)\, a_{k_2}(x) \cdots a_{k_r}(x). \tag{3.26}$$
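Equation (3.26) also admits a short machine check (helper names are mine, not the thesis's):

```python
from itertools import groupby, permutations
from math import factorial

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def eulerian_poly(n):
    if n == 0:
        return [1]
    coeffs = [0] * n
    for pi in permutations(range(n)):
        coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

def descent_poly(src, tgt):
    # Brute-force D(src, tgt; x), for checking.
    n = len(src)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(tgt[pi[i]] == src[i] for i in range(n)):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def target_sorted(source):
    # Equation (3.26): descent polynomial for the transition from `source` to the
    # sorted deck 1^{n_1} 2^{n_2} ... m^{n_m}.
    runs = [(v, len(list(grp))) for v, grp in groupby(source)]
    d0 = sum(a > b for (a, _), (b, _) in zip(runs, runs[1:]))  # descents of D-hat
    scale = 1
    for v in set(source):
        scale *= factorial(sum(1 for c in source if c == v))   # n_1! n_2! ... n_m!
    denom = 1
    num = [0] * d0 + [1]
    for _, size in runs:
        num = poly_mul(num, eulerian_poly(size))               # a_{k_1} ... a_{k_r}
        denom *= factorial(size)                               # k_1! ... k_r!
    return [coef * scale // denom for coef in num]
```

For example, the source 2121 with sorted target 1122 gives $4x^2$.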

3.12 Target Decks Containing Blocks

The fallback method for computing $\mathcal{D}(D,D';x)$ is to iterate through all the permutations which take $D$ to $D'$, and record the number of descents in each. Of course this can take an enormous amount of time in general, which is why we have attempted in this chapter

to find improved methods when the decks are of special types. Here we extend the

method of Section 3.11 to general target decks and obtain good results when the

target deck is “blocky”, i.e., contains blocks of cards of the same value.

Finding a permutation from $D$ to $D'$ means assigning a valid destination to each position in the source deck. The idea here is to assign positions in the source deck to blocks in the target deck, and thus group together a set of permutations. Suppose $D' = 111222111222$. Then $D'$ contains 4 blocks, which we will number 1 through 4, top to bottom. If $D = 122112222111$ then an assignment of positions in the source deck to blocks might be $P = 144332224113$, which can be read as "The first card in $D$ goes to block 1 in $D'$, the second to block 4, the third to block 4, . . ., the last to block 3." This assignment to blocks represents $3!^4 = 1296$ permutations, since there are 3 cards in each block and therefore $3!$ ways to resolve the assignments to each block into positions.

Let $R$ be the set of permutations represented by the block assignment $P$. We can find the descent polynomial $\mathcal{D}_R(x)$ for those permutations by scanning through $P$. Where $P$ has an ascent ($P(i) < P(i+1)$), every permutation in $R$ must also have an ascent, since position $i$ goes to a lower-numbered block, and therefore a lower-numbered position, than does $i+1$. Likewise descents in $P$ force descents in every $\pi \in R$.

Consider the 222 which appears at positions 6, 7, and 8 in $P$. It forces the permutations in $R$ to send $\{6, 7, 8\}$ into $\{4, 5, 6\}$ (the positions of block 2), but it does not constrain the order in which they are assigned. Since $a_3(x) = 1 + 4x + x^2$, we know that $\frac{1}{6}$ of the permutations in $R$ will have no descents among positions 6, 7, and 8, $\frac{4}{6}$ will have one descent, and $\frac{1}{6}$ will have two descents. The number of such descents is independent of all the other choices we make when resolving $P$ into a permutation. So in general if $D' = v_1^{\ell_1} v_2^{\ell_2} \cdots v_s^{\ell_s}$ and $P = b_1^{m_1} b_2^{m_2} \cdots b_t^{m_t}$, with $v_b \ne v_{b+1}$ and $b_j \ne b_{j+1}$, then

$$\mathcal{D}_R(x) = (\ell_1! \ell_2! \cdots \ell_s!)\, x^{\operatorname{des}(P)} \prod_{j=1}^{t} \frac{a_{m_j}(x)}{m_j!}.$$

If we sum up $\mathcal{D}_R(x)$ over all choices for the block assignment $P$, we will have $\mathcal{D}(D,D';x)$.

Here is a recursive algorithm to do that. We assume that the set of card values is $\{1, 2, \ldots, K\}$. For convenience we assume certain variables are global; they are capitalized. Uncapitalized variables are local. We use $i$ to represent a position in the deck, $b$ to represent the number of a block in the target deck, and $v$ to represent a card value. A block object is a structure containing two fields: value and count.

FindDesPoly(D, D′)
1. PreprocessTarget(D′)
2. PreprocessSource(D)
3. return MapBlock(1)

PreprocessTarget(D′)
1. N ← Length(D′)
2. P ← an array of N integers
3. Blocks ← an empty list of Block objects
4. MapSize ← 1
5. v ← D′(1)
6. m ← 1
7. for i ← 2 to N
8.   if (D′(i) = v)
9.     m ← m + 1
10.    MapSize ← MapSize × m
11.  else
12.    Append a new block with (value = v, count = m) to Blocks
13.    v ← D′(i)
14.    m ← 1
15. Append a new block with (value = v, count = m) to Blocks

PreprocessSource(D)
1. Pos ← array of K sets, all initially empty
2. for i ← 1 to Length(D)
3.   v ← D(i)
4.   Pos[v] ← Pos[v] ∪ {i}

MapBlock(b)
1. if (b > #Blocks)
2.   return CalcPoly()
3. else
4.   despoly ← 0
5.   v ← Blocks[b].value
6.   m ← Blocks[b].count
7.   for each s ⊆ Pos[v] with #s = m
8.     for each i ∈ s
9.       P[i] ← b
10.    Pos[v] ← Pos[v] \ s
11.    despoly ← despoly + MapBlock(b + 1)
12.    Pos[v] ← Pos[v] ∪ s
13.  return despoly

CalcPoly()
1. despoly ← MapSize
2. b ← P(1)
3. m ← 1
4. for i ← 2 to N
5.   if (P(i) = b)
6.     m ← m + 1
7.     despoly ← despoly / m
8.   else
9.     despoly ← despoly × a_m(x)
10.    if (P(i) < b)
11.      despoly ← despoly × x
12.    b ← P(i)
13.    m ← 1
14. despoly ← despoly × a_m(x)
15. return despoly
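The pseudocode above translates almost line for line into Python. In this sketch (my code, not from the thesis), exact `Fraction` arithmetic plays the role of CalcPoly's incremental divisions, and the result is compared against brute-force iteration.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_add(p, q):
    r = [0] * max(len(p), len(q))
    for i, a in enumerate(p):
        r[i] += a
    for i, a in enumerate(q):
        r[i] += a
    return r

def eulerian_poly(n):
    if n == 0:
        return [1]
    coeffs = [0] * n
    for pi in permutations(range(n)):
        coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

def descent_poly(src, tgt):
    # Brute-force fallback, for checking.
    n = len(src)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(tgt[pi[i]] == src[i] for i in range(n)):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def find_des_poly(src, tgt):
    # Descent polynomial D(src, tgt; x) via block assignments, as in Section 3.12.
    blocks = []                              # [value, count] per maximal block of tgt
    for c in tgt:
        if blocks and blocks[-1][0] == c:
            blocks[-1][1] += 1
        else:
            blocks.append([c, 1])
    map_size = 1
    for _, cnt in blocks:
        map_size *= factorial(cnt)           # MapSize = product of block-size factorials
    pos = {}
    for i, c in enumerate(src):
        pos.setdefault(c, set()).add(i)      # remaining source positions per value
    P = [0] * len(src)

    def calc_poly():
        # D_R(x) = MapSize * x^des(P) * prod over runs of P of a_m(x)/m!.
        runs, des, i = [], 0, 0
        while i < len(P):
            j = i
            while j + 1 < len(P) and P[j + 1] == P[j]:
                j += 1
            runs.append(j - i + 1)
            if j + 1 < len(P) and P[j + 1] < P[j]:
                des += 1
            i = j + 1
        poly = [Fraction(0)] * des + [Fraction(map_size)]
        for m in runs:
            poly = [c / factorial(m) for c in poly_mul(poly, eulerian_poly(m))]
        return poly

    def map_block(b):
        if b == len(blocks):
            return calc_poly()
        v, m = blocks[b]
        result = [Fraction(0)]
        for s in combinations(sorted(pos[v]), m):
            for i in s:
                P[i] = b
            pos[v] -= set(s)
            result = poly_add(result, map_block(b + 1))
            pos[v] |= set(s)
        return result

    result = map_block(0)
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return [int(c) for c in result]
```

On the 12-card example above (source 122112222111, target 111222111222) this explores only $\binom{6}{3}^2 = 400$ block assignments instead of $6! \cdot 6! = 518{,}400$ permutations.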

Once the target deck has been pre-processed, the procedure can be repeated for any number of source decks. In Chapter VI we will use this method to find descent polynomials for euchre, where the target deck is fixed as 111223334411222334445666 and the source deck allowed to vary. Using the method from this section cuts down processing time for a euchre deck by a factor of about 9000 over iteration through all the permutations.

CHAPTER IV

The Joint Distribution of $\pi(1)$ and $\operatorname{des}(\pi)$ in $S_n$

4.1 Introduction and Main Results

In this chapter we will often identify a permutation $\pi \in S_n$ with the sequence $\pi(1), \pi(2), \ldots, \pi(n)$. So for instance if $\pi(1) = k$ and $\pi(n) = \ell$, we say that $\pi$ "begins with" $k$ and "ends with" $\ell$. In Chapter III we introduced the Eulerian Numbers:

$$\left\langle {n \atop d} \right\rangle := \#\{\pi \in S_n : \operatorname{des}(\pi) = d\}, \tag{4.1}$$

which have been widely studied; see, for example, [24, p. 267] and [7]. We also defined the refined Eulerian numbers:

$$\left\langle {n \atop d} \right\rangle_k := \#\{\pi \in S_n : \operatorname{des}(\pi) = d \text{ and } \pi(1) = k\}. \tag{4.2}$$

This chapter is an investigation of the numbers defined in Equation (4.2). We derive a formula in terms of binomial coefficients:

Theorem 4.1 If $1 \le k \le n$,

$$\left\langle {n \atop d} \right\rangle_k = \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j}\, j^{k-1} (j+1)^{n-k}$$

where $0^0$ is interpreted as 1,

which is similar to a well-known formula for the Eulerian numbers. We use the

formula to understand how the two statistics des(π) and π(1) interact.

If we are constructing a permutation with d descents from left to right, and d is small,

a conservative strategy would seem to be to start with a low number, since starting

with a high number means we will use up one of our descents near the beginning

of the permutation. So in other words, we expect that if d is small then there are

more permutations with d descents starting with low numbers than starting with

high numbers. Similarly, if d is close to n, our intuition is that starting with a high

number leaves us more possibilities later on. This intuition turns into a result that

is surprisingly simple to state:

Theorem 4.3 If π is chosen uniformly from among those permutations of n that

have d descents, the expected value of π(1) is d +1 and the expected value of π(n) is

$n - d$.

Theorem 4.7, stated in Section 4.7, asserts that, as expected, the sequence

$$\left\langle {n \atop d} \right\rangle_1, \left\langle {n \atop d} \right\rangle_2, \ldots, \left\langle {n \atop d} \right\rangle_n$$

is weakly decreasing when $d$ is small and weakly increasing when $d$ is large. Consequently that sequence is an interpolation between its endpoints, which are two Eulerian numbers: $\left\langle {n-1 \atop d} \right\rangle$ and $\left\langle {n-1 \atop d-1} \right\rangle$. Experimental evidence (see Section 4.10) suggests that it is a good interpolation, at least when $d$ is close to $(n-1)/2$, in the sense that a normal approximation to the Eulerian numbers also seems to provide a good

approximation to the refined Eulerian numbers. However, the normal approximation

is good for neither set when d is small or d is close to n. Theorem 4.7 shows that in

those cases the distribution of $\pi(1)$ is approximately geometric.

The application which led directly to this work is presented in Section 4.5. Fulman

shows in [21] that certain statistics on permutations, including the descent statistic,

are approximately normally distributed. (Note that descents were known to be ap-

proximately normal before Fulman’s work; see [12] for references.) The main tool he

uses is Stein’s method, due to Charles Stein [45]. The thrust behind the method is

to introduce a little extra randomness to a given random variable to get a new one.

If certain symmetries are present, the result is an “exchangeable pair” of random

variables, meaning, essentially, that the Markov process which takes one to the other

is reversible. Then Stein’s theorems (and more recent refinements of them) can be

applied to bound the distance between the original variable’s distribution and the

standard normal distribution.

Fulman uses a “random to end” operation to add randomness to permutations. That

is, he starts with a uniformly distributed permutation π and sets

$$\pi' = (I, I+1, \ldots, n)\,\pi$$

where $I$ is selected uniformly from $\{1, 2, \ldots, n\}$. While $(\pi, \pi')$ is not an exchangeable pair, it turns out that $(\operatorname{des}(\pi), \operatorname{des}(\pi'))$ is, and this leads to a central limit theorem

for descents, and for a whole class of statistics.

We tried a different method of adding randomness to π, namely, composing π with a

uniformly selected transposition. That calculation (which is presented in Section 4.5)

led directly to Theorem 4.3.

The Neggers-Stanley Conjecture, now disproved in general ([6, 46]), was that the

generating function for descents among the linear extensions of a fixed finite poset

has only real zeroes. Since a function with positive coefficients can have no positive

zeroes, any combinatorial generating function with all real zeroes can be written in

the form

$$a(x+c_1)(x+c_2)\cdots(x+c_n)$$

for non-negative constants a, c1,c2,...,cn. The implication, then, is that if D is the

number of descents in a uniformly selected linear extension of a poset for which the

Neggers-Stanley conjecture is true, then D can be written as a sum of independent

Bernoulli variables.

In Section 4.6 we present several generating functions for the refined Eulerian num-

bers. The set of permutations of $n$ which begin with $k$ is the same as the set of linear extensions of the poset defined on $\{1, 2, \ldots, n\}$ by $k < a$ for all $a$ other than $k$.

we show that it does indeed have only real zeroes. We go on to show that several

similar posets also satisfy the conjecture. (All of the posets considered were known

to satisfy the conjecture by theorems of Simion [41] and Wagner [50].)

4.2 Basic Properties

If $\pi(1) = 1$, then $\pi(1)$ is certainly less than $\pi(2)$, so all descents are among the final $n-1$ numbers. And if $\pi(1) = n$, there is certain to be a descent between $\pi(1)$ and $\pi(2)$. So we know some boundary values:

$$\left\langle {n \atop d} \right\rangle_1 = \left\langle {n-1 \atop d} \right\rangle \quad\text{and}\quad \left\langle {n \atop d} \right\rangle_n = \left\langle {n-1 \atop d-1} \right\rangle \tag{4.3}$$

for $n > 1$. Also, it is immediate that

$$\sum_d \left\langle {n \atop d} \right\rangle_k = (n-1)! \tag{4.4}$$

$$\sum_k \left\langle {n \atop d} \right\rangle_k = \left\langle {n \atop d} \right\rangle. \tag{4.5}$$

Let $\rho \in S_n$ be the reversal permutation, as defined in Equation (3.7): $\rho(i) = n+1-i$. Then $\rho\pi$ is the same as $\pi$ but with $i$ replaced by $n+1-i$ everywhere. As a result, $\rho\pi$ has a descent wherever $\pi$ has an ascent, and an ascent wherever $\pi$ has a descent.

So $\operatorname{des}(\rho\pi) = n - 1 - \operatorname{des}(\pi)$. Since $\pi \mapsto \rho\pi$ is a bijection from $S_n$ to itself, it follows that

$$\left\langle {n \atop d} \right\rangle = \left\langle {n \atop n-1-d} \right\rangle. \tag{4.6}$$

Note we could have obtained the same result from the map $\pi \mapsto \pi\rho$, since reversing $\pi$ changes ascents to descents and also reflects their positions about the center. Let

$$\left\langle {n \atop d} \right\rangle^k := \#\{\pi \in S_n : \operatorname{des}(\pi) = d \text{ and } \pi(n) = k\}. \tag{4.7}$$

Both transformations yield symmetric identities for the refined Eulerian numbers.

If

$$\pi(1) = k \quad\text{and}\quad \operatorname{des}(\pi) = d$$

then

$$\rho\pi(1) = n+1-k \quad\text{and}\quad \operatorname{des}(\rho\pi) = n-1-d,$$
$$\pi\rho(n) = k \quad\text{and}\quad \operatorname{des}(\pi\rho) = n-1-d,$$
$$\rho\pi\rho(n) = n+1-k \quad\text{and}\quad \operatorname{des}(\rho\pi\rho) = d,$$

from which it follows that

$$\left\langle {n \atop d} \right\rangle_k = \left\langle {n \atop n-1-d} \right\rangle_{n+1-k} = \left\langle {n \atop n-1-d} \right\rangle^{k} = \left\langle {n \atop d} \right\rangle^{n+1-k}. \tag{4.8}$$

4.3 Recurrences

Assume $n > 1$. Let

$$T_k := \{\pi \in S_n : \pi(1) = k \text{ and } \operatorname{des}(\pi) = d\}$$

$$T_{k,\ell} := \{\pi \in S_n : \pi(1) = k,\ \pi(2) = \ell, \text{ and } \operatorname{des}(\pi) = d\}$$

and let $\pi \in T_{k,\ell}$. If $\ell < k$, there is a descent between $\pi(1)$ and $\pi(2)$, so there must be $d-1$ descents in the tail $\pi(2), \pi(3), \ldots, \pi(n)$. Viewed as a permutation of $n-1$ values, the tail begins with $\ell$, which is the $\ell$th smallest value in the tail, so

$$\#T_{k,\ell} = \left\langle {n-1 \atop d-1} \right\rangle_{\ell}$$

when $\ell < k$. If $\ell > k$, there is no descent between $\pi(1)$ and $\pi(2)$, so there must be $d$ descents in the tail. This time $\ell$ is the $(\ell-1)$st smallest value in the tail, so

$$\#T_{k,\ell} = \left\langle {n-1 \atop d} \right\rangle_{\ell-1}$$

when $\ell > k$. Of course $T_k$ is the disjoint union of the $T_{k,\ell}$, so

$$\left\langle {n \atop d} \right\rangle_k = \#T_k = \sum_{\ell} \#T_{k,\ell} = \sum_{\ell < k} \left\langle {n-1 \atop d-1} \right\rangle_{\ell} + \sum_{\ell > k} \left\langle {n-1 \atop d} \right\rangle_{\ell-1}$$

or, more succinctly,

$$\left\langle {n \atop d} \right\rangle_k = \sum_{\ell=1}^{n-1} \left\langle {n-1 \atop d-[\ell<k]} \right\rangle_{\ell}. \tag{4.9}$$

Equation (4.9) holds for $n > 1$ and $1 \le k \le n$; the refined Eulerian number is 0 when $d < 0$, $d \ge n$, $k < 1$, or $k > n$.

Now suppose $1 \le k \le n-1$ and $\pi \in S_n$ begins with $k$. Swapping $k$ with $k+1$ in the sequence $\pi(1), \pi(2), \ldots, \pi(n)$ preserves descents for most $\pi$; the only exception is when $\pi(2) = k+1$, in which case a new descent is created. If we eliminate that case, the swap map is a bijection from $T_k \setminus T_{k,k+1}$ to $T_{k+1} \setminus T_{k+1,k}$, as those sets are defined above. Substituting sizes for sets, we have

$$\left\langle {n \atop d} \right\rangle_k - \left\langle {n-1 \atop d} \right\rangle_k = \left\langle {n \atop d} \right\rangle_{k+1} - \left\langle {n-1 \atop d-1} \right\rangle_k. \tag{4.10}$$

Equation (4.10) is valid as long as $k \ne 0$ and $k \ne n$. (If $k < 0$ or $k > n$, all terms are 0.)

A well-known recurrence for $\left\langle {n \atop d} \right\rangle$ comes from considering what happens when you insert $n$ into an element of $S_{n-1}$:

$$\left\langle {n \atop d} \right\rangle = (n-d) \left\langle {n-1 \atop d-1} \right\rangle + (d+1) \left\langle {n-1 \atop d} \right\rangle. \tag{4.11}$$

We can get a similar recurrence for the refined Eulerian numbers by considering what happens when you insert $n$ into an element of $S_{n-1}$ which begins with $k$:

$$\left\langle {n \atop d} \right\rangle_k = (n-d-1) \left\langle {n-1 \atop d-1} \right\rangle_k + (d+1) \left\langle {n-1 \atop d} \right\rangle_k. \tag{4.12}$$

In other words, one way to get an element of $S_n$ which begins with $k$ and has $d$ descents is to take an element of $S_{n-1}$ which begins with $k$ and has $d$ descents, and insert $n$ at a descent or at the end ($d+1$ choices). The other way is to start with an element of $S_{n-1}$ which begins with $k$ and has $d-1$ descents, and insert $n$ at an ascent ($n-d-1$ choices). Equation (4.12) fails when $k = n$, since a permutation of $S_{n-1}$ cannot begin with $n$. It is valid for all other values of $k$.
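Both recurrences are easy to confirm by enumeration. The sketch below (names are mine, not the thesis's) checks Equation (4.12) for all $k \ne n$ at small $n$.

```python
from itertools import permutations

def refined(n, d, k):
    # <n d>_k by direct count; 0 outside the valid ranges.
    if n < 1 or d < 0 or k < 1 or k > n:
        return 0
    return sum(
        1
        for pi in permutations(range(1, n + 1))
        if pi[0] == k and sum(pi[i] > pi[i + 1] for i in range(n - 1)) == d
    )

def check_recurrence_412(n):
    # Equation (4.12): <n d>_k = (n-d-1) <n-1 d-1>_k + (d+1) <n-1 d>_k, for k != n.
    return all(
        refined(n, d, k)
        == (n - d - 1) * refined(n - 1, d - 1, k) + (d + 1) * refined(n - 1, d, k)
        for d in range(n)
        for k in range(1, n)
    )
```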

4.4 Formulas and Moments

There is an explicit formula for the Eulerian numbers in terms of binomial coefficients:

$$\left\langle {n \atop d} \right\rangle = \sum_{j \ge 0} (-1)^{d-j} \binom{n+1}{d-j} (j+1)^n. \tag{4.13}$$

See, for example, [24, p. 269]. [Aside: Equation (4.13) follows from Equation (4.11), which means that it is valid for all values of $d$, even if $d < 0$ or $d \ge n$.] So we have

$$\left\langle {n \atop d} \right\rangle_1 = \left\langle {n-1 \atop d} \right\rangle = \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j} (j+1)^{n-1} \tag{4.14}$$

$$\left\langle {n \atop d} \right\rangle_n = \left\langle {n-1 \atop d-1} \right\rangle = \sum_{j \ge 0} (-1)^{d-1-j} \binom{n}{d-1-j} (j+1)^{n-1} = \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j}\, j^{n-1}. \tag{4.15}$$

These suggest a formula for $\left\langle {n \atop d} \right\rangle_k$:

Theorem 4.1 If $1 \le k \le n$,

$$\left\langle {n \atop d} \right\rangle_k = \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j}\, j^{k-1} (j+1)^{n-k} \tag{4.16}$$

where $0^0$ is interpreted as 1.

Proof. Let $G_i$ count the number of ways to place balls numbered $1, 2, \ldots, n$ into bins numbered $1, 2, \ldots, i$, subject to the restriction that the lowest numbered ball in bin 1 is ball $k$. Then the balls whose labels are less than $k$ have $i-1$ possible destinations, and those with labels greater than $k$ have $i$ possible destinations. So

$$G_i = (i-1)^{k-1}\, i^{n-k}.$$

Each arrangement of balls in bins corresponds to a permutation with $i-1$ or fewer descents, by writing the numbers of the balls in bin 1 in increasing order, followed by the numbers of the balls in bin 2, etc. How many times does $G_i$ count a particular permutation $\pi$? To find a ball/bin arrangement which represents $\pi$, we need to write out $\pi(1), \pi(2), \ldots, \pi(n)$ and then place $i-1$ "cuts" between the numbers. If $\pi$ has $d$ descents, then $d$ of the cuts must be placed where the descents occur, but the other $i-1-d$ may be placed anywhere except before $\pi(1)$, which must be $k$ and must be in the first bin. A standard stars and bars argument tells us that there are $\binom{(n-1)+(i-1-d)}{i-1-d}$ ways to place the extra cuts, so that is the number of times $\pi$ is counted by $G_i$. Substituting $j = d+1$ yields

$$G_i = \sum_{j \le i} \binom{n-1+i-j}{i-j} \left\langle {n \atop j-1} \right\rangle_k. \tag{4.17}$$

So if we let $D_j := \left\langle {n \atop j-1} \right\rangle_k$, we have $G = MD$, where $G = (G_1, G_2, \ldots)^T$, $D = (D_1, D_2, \ldots)^T$, and $M$ is a lower-triangular matrix with

$$M_{ij} = \binom{n-1+i-j}{i-j}, \qquad 1 \le j \le i.$$

To invert $M$, we note that there is a homomorphism from the ring of formal power series onto the ring of lower-triangular Toeplitz matrices, namely

$$(a_0 + a_1 x + a_2 x^2 + \cdots) \mapsto \begin{pmatrix} a_0 & & & \\ a_1 & a_0 & & \\ a_2 & a_1 & a_0 & \\ \vdots & \ddots & \ddots & \ddots \end{pmatrix}.$$

Under this map, $M$ is the matrix for

$$m(x) = \sum_{r \ge 0} \binom{n-1+r}{r} x^r.$$

The coefficient of $x^r$ is the number of ways to put $r$ identical balls into $n$ boxes, hence

$$m(x) = \left( 1 + x + x^2 + \cdots \right)^n = \frac{1}{(1-x)^n}.$$

$M^{-1}$ must represent the polynomial

$$\frac{1}{m(x)} = (1-x)^n = \sum_{r \ge 0} (-1)^r \binom{n}{r} x^r$$

so

$$(M^{-1})_{ij} = (-1)^{i-j} \binom{n}{i-j}.$$

Finally, $D = M^{-1}G$, so

$$\left\langle {n \atop d} \right\rangle_k = D_{d+1} = \sum_{j \ge 0} (M^{-1})_{d+1,\,j+1}\, G_{j+1} = \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j}\, j^{k-1} (j+1)^{n-k}$$

as desired. □

Remark 1: Finding all the ball/bin arrangements which produce a particular permutation is similar to finding all shuffles of a deck of cards which produce that permutation, as in Section 2.7.

Remark 2: Note that the proof never assumed that $d$ was less than $n$, and Equation (4.16) is clearly true if $d < 0$. So the theorem is true for all integer values of $d$.

Remark 3: We can rewrite Equation (4.17) as

$$\sum_{j \le i} \binom{n-1+i-j}{i-j} \left\langle {n \atop j-1} \right\rangle_k = (i-1)^{k-1}\, i^{n-k} \tag{4.18}$$

which is a $k$-analog of the Worpitzky identity [24].
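Theorem 4.1 is easy to test exhaustively for small $n$ (code and helper names are mine, not from the thesis):

```python
from itertools import permutations
from math import comb

def refined_eulerian(n, d, k):
    # <n d>_k by direct enumeration.
    return sum(
        1
        for pi in permutations(range(1, n + 1))
        if pi[0] == k and sum(pi[i] > pi[i + 1] for i in range(n - 1)) == d
    )

def refined_eulerian_formula(n, d, k):
    # Theorem 4.1; note Python already evaluates 0**0 as 1.
    return sum(
        (-1) ** (d - j) * comb(n, d - j) * j ** (k - 1) * (j + 1) ** (n - k)
        for j in range(d + 1)
    )
```

For example, both routes give $\left\langle {4 \atop 1} \right\rangle_2 = 4$.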

From Equation (4.16) we can deduce a formula for the $m$th "rising moment" of $\pi(1)$ when $\operatorname{des}(\pi)$ is fixed. Assume $\pi$ is chosen uniformly from $S_n$, and let

$$\mu_m := E^{\operatorname{des}(\pi)=d}\, \pi(1)^{\overline{m}} \tag{4.19}$$

where $x^{\overline{m}} = x(x+1)(x+2)\cdots(x+m-1)$.

Lemma 4.2

$$\left\langle {n \atop d} \right\rangle \mu_m = m! \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j} \sum_{\ell=0}^{n-1} j^\ell \binom{m+n}{\ell}. \tag{4.20}$$

[Figure 4.1: A north-east lattice path from $(0,0)$ to $(m+n-\ell, \ell)$. (All edges are either north or east.)]

Proof. From Equation (4.16),

$$\left\langle {n \atop d} \right\rangle \mu_m = \sum_{k=1}^{n} k^{\overline{m}} \left\langle {n \atop d} \right\rangle_k = \sum_{k=1}^{n} \frac{(k+m-1)!}{(k-1)!} \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j}\, j^{k-1} (j+1)^{n-k}$$

$$= m! \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j} \sum_{r=0}^{n-1} \binom{r+m}{r}\, j^r (j+1)^{n-1-r}$$

(the last by setting $r = k-1$). But $(j+1)^{n-1-r} = \sum_{s=0}^{n-1-r} \binom{n-1-r}{s} j^s$. So let $\ell = r+s$ and we have

$$\left\langle {n \atop d} \right\rangle \mu_m = m! \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j} \sum_{\ell=0}^{n-1} j^\ell \sum_{r=0}^{\ell} \binom{r+m}{r} \binom{n-1-r}{\ell-r}. \tag{4.21}$$

Let $\phi$ be a north/east lattice path from $(0,0)$ to $(m+n-\ell, \ell)$ (see Figure 4.1). The number of such paths is $\binom{m+n}{\ell}$. If $r$ is the height at which $\phi$ crosses the line $x = m + \frac{1}{2}$, then $\phi$ consists of a path from $(0,0)$ to $(m,r)$, a horizontal segment, and a path from $(m+1, r)$ to $(m+n-\ell, \ell)$. Counting the possibilities for the parts yields the identity

$$\sum_{r=0}^{\ell} \binom{r+m}{r} \binom{n-1-r}{\ell-r} = \binom{m+n}{\ell}. \tag{4.22}$$

Substituting Equation (4.22) into Equation (4.21) yields the desired result. □

Note that the last sum in Equation (4.20) is a truncated binomial expansion of

$(j+1)^{m+n}$.

Theorem 4.3 If π is chosen uniformly from among those permutations of n that have d descents, the expected value of π(1) is d +1 and the expected value of π(n) is

$n - d$.

Proof. The expected value of $\pi(1)$ is $\mu_1$, and

$$\left\langle {n \atop d} \right\rangle \mu_1 = \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j} \sum_{\ell=0}^{n-1} j^\ell \binom{n+1}{\ell}$$

$$= \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j} \left( (j+1)^{n+1} - j^{n+1} - (n+1) j^n \right)$$

$$= \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j} (j+1)^{n+1} - \sum_{i \ge 0} (-1)^{d-i} \binom{n}{d-i} (n+1+i)\, i^n.$$

The term for $i = 0$ is 0, so let $j = i-1$ and combine:

$$\left\langle {n \atop d} \right\rangle \mu_1 = \sum_{j \ge 0} (-1)^{d-j} (j+1)^n \left[ \binom{n}{d-j} (j+1) + \binom{n}{d-j-1} (n+j+2) \right].$$

The quantity in brackets simplifies to $(d+1)\binom{n+1}{d-j}$, so

$$\left\langle {n \atop d} \right\rangle \mu_1 = (d+1) \sum_{j \ge 0} (-1)^{d-j} \binom{n+1}{d-j} (j+1)^n = (d+1) \left\langle {n \atop d} \right\rangle.$$

Therefore

$$\mu_1 = E^{\operatorname{des}(\pi)=d}\, \pi(1) = d+1.$$

For the second part,

$$E^{\operatorname{des}(\pi)=d}\, \pi(n) = \frac{1}{\left\langle {n \atop d} \right\rangle} \sum_k k \left\langle {n \atop d} \right\rangle^k = \frac{1}{\left\langle {n \atop d} \right\rangle} \sum_k k \left\langle {n \atop n-1-d} \right\rangle_k = E^{\operatorname{des}(\pi)=n-1-d}\, \pi(1) = n-d.$$

□

Remark: It is possible to prove Theorem 4.3 by induction using the recurrences in

Equation (4.11) and Equation (4.12).
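Theorem 4.3 can also be confirmed exactly for small $n$ (code mine, using rational arithmetic so the means are exact):

```python
from fractions import Fraction
from itertools import permutations

def endpoint_means(n):
    # For each descent count d, the exact averages of pi(1) and pi(n) over all
    # permutations of {1..n} with d descents.
    stats = {}
    for pi in permutations(range(1, n + 1)):
        d = sum(pi[i] > pi[i + 1] for i in range(n - 1))
        first, last, count = stats.get(d, (0, 0, 0))
        stats[d] = (first + pi[0], last + pi[-1], count + 1)
    return {d: (Fraction(f, c), Fraction(l, c)) for d, (f, l, c) in stats.items()}
```

For every $n$ tried, the means come out to exactly $d+1$ and $n-d$.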

4.5 Application Using Stein’s Method

Charles Stein developed a method for showing that the distribution of a random vari-

able W which meets certain criteria is approximately standard normal. His technique

has come to be known as Stein’s method; see [45] or [16] for more explanation.

In its most straightforward form, Stein’s method requires finding a “companion”

random variable W ∗ such that (W, W ∗) is an exchangeable pair, meaning that

$$P(W = w, W^* = w^*) = P(W = w^*, W^* = w) \tag{4.23}$$

for all values of $w$ and $w^*$. If we can find such a $W^*$ and if, in addition, there is a $\lambda$ between 0 and 1 such that

$$E^{W} W^* = (1-\lambda) W \tag{4.24}$$

(that is, the expected value of $W^*$ when $W$ is fixed at some value is $1-\lambda$ times that value), then we may apply Stein's method.

We are interested in showing that if π is chosen uniformly from Sn, then the random

variable D = des(π) is approximately normal. This has been proven before, and

in more generality; see [21] for references. Our goal here is to demonstrate the set-

up for Stein’s method—that is, finding a companion variable and showing that it

satisfies Equation (4.23) and Equation (4.24). From there, applying the method

would proceed as in [21]. 78

Often the companion variable in Stein’s method is defined by adding a little bit

of randomness to the variable we are interested in. In this case, let τ be selected

uniformly from among the transpositions in Sn, independently of π. Then τπ is

uniformly distributed over $S_n$, and for any $u, v \in S_n$,

$$P(\pi = u, \tau\pi = v) = P(\pi = u, \tau = vu^{-1}) = P(\pi = u)\, P(\tau = vu^{-1})$$

$$P(\pi = v, \tau\pi = u) = P(\pi = v, \tau = uv^{-1}) = P(\pi = v)\, P(\tau = uv^{-1}).$$

Both right-hand sides are $(n!)^{-1}\binom{n}{2}^{-1}$ if $vu^{-1}$ is a transposition and 0 otherwise, so $(\pi, \tau\pi)$ is an exchangeable pair. □

Let $D^* := \operatorname{des}(\tau\pi)$. Since $(\pi, \tau\pi)$ is an exchangeable pair, $(F(\pi), F(\tau\pi))$ is exchangeable for any function $F$. So $(D, D^*)$ is exchangeable. For $1 \le i \le n-1$ let

$$D_i = [\pi(i) > \pi(i+1)] \quad\text{and}\quad D_i^* = [\tau\pi(i) > \tau\pi(i+1)]$$

be Bernoulli random variables; then $D = \sum_{i=1}^{n-1} D_i$ and $D^* = \sum_{i=1}^{n-1} D_i^*$.

Fix $\pi$ and $i$ and let $a = \pi(i)$, $b = \pi(i+1)$. If $a < b$, the only ways for $\tau\pi(i)$ to be

bigger than $\tau\pi(i+1)$ are if $\tau$ swaps $a$ with something bigger than $b$ ($n-b$ ways), if $\tau$ swaps $b$ with something smaller than $a$ ($a-1$ ways), or if $\tau$ swaps $a$ with $b$. So

$$E^{D_i=0}(D_i^* - D_i) = P(D_i^* = 1 \mid D_i = 0) = \frac{n + \pi(i) - \pi(i+1)}{\binom{n}{2}}$$

and similarly if $a > b$,

$$E^{D_i=1}(D_i^* - D_i) = -P(D_i^* = 0 \mid D_i = 1) = -\frac{n + \pi(i+1) - \pi(i)}{\binom{n}{2}}.$$

So in general

$$E^{\pi}(D_i^* - D_i) = \frac{\pi(i) - \pi(i+1)}{\binom{n}{2}} + \frac{2(1 - 2D_i)}{n-1}.$$

Summing now over i causes the π(i) terms to telescope:

$$E^\pi(D^* - D) = \sum_{i=1}^{n-1} E^\pi(D_i^* - D_i) = \frac{\pi(1) - \pi(n)}{\binom{n}{2}} + 2 - \frac{4D}{n-1}$$

which allows us to apply Theorem 4.3:

$$E^D(D^* - D) = E^D E^\pi(D^* - D) = \frac{E^D\pi(1) - E^D\pi(n)}{\binom{n}{2}} + 2 - \frac{4D}{n-1} = \frac{2}{n(n-1)}\bigl((D+1) - (n-D)\bigr) + 2 - \frac{4D}{n-1} = \frac{2(n-1) - 4D}{n}.$$

The mean and variance of $\mathrm{des}(\pi)$ are $\mu := (n-1)/2$ and $\sigma^2 := (n+1)/12$ respectively, so the variables

$$W := \frac{\mathrm{des}(\pi) - \mu}{\sigma} \qquad\text{and}\qquad W^* := \frac{\mathrm{des}(\tau\pi) - \mu}{\sigma}$$

have mean 0 and variance 1. Then (W, W ∗) is an exchangeable pair and

$$E^{W=w}(W^* - W) = E^{D=\sigma w+\mu}\left(\frac{D^* - \mu}{\sigma} - \frac{D - \mu}{\sigma}\right) = \frac{1}{\sigma}E^{D=\sigma w+\mu}(D^* - D)$$

which is to say

$$E^W(W^* - W) = \frac{2(n-1) - 4(\sigma W + \mu)}{\sigma n} = -\frac{4}{n}W.$$

So if W ∗ is obtained using the “random transposition” method described here,

(W, W ∗) will be an exchangeable pair satisfying Equation (4.24) with λ =4/n. One

can now proceed with Stein’s method and show that W is close to being a standard

normal random variable.
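The identity $E^D(D^* - D) = (2(n-1) - 4D)/n$ derived above is easy to check by exhaustive enumeration for a small $n$. The following is an illustrative sketch (not from the thesis), averaging over all $(\pi, \tau)$ pairs in $S_5$ with exact rational arithmetic:

```python
from fractions import Fraction
from itertools import permutations, combinations

def des(p):
    return sum(p[i] > p[i+1] for i in range(len(p)-1))

# tau*pi acts by relabeling values: swap the values a and b in pi's one-line form
def swap_values(p, a, b):
    return tuple(b if x == a else a if x == b else x for x in p)

n = 5
totals, counts = {}, {}
for p in permutations(range(1, n+1)):
    d = des(p)
    for a, b in combinations(range(1, n+1), 2):
        totals[d] = totals.get(d, 0) + des(swap_values(p, a, b)) - d
        counts[d] = counts.get(d, 0) + 1

for d in sorted(totals):
    assert Fraction(totals[d], counts[d]) == Fraction(2*(n-1) - 4*d, n)
print("E(D* - D | D = d) = (2(n-1) - 4d)/n holds for n =", n)
```

The conditional expectation is computed by grouping all $n!\binom{n}{2}$ pairs by the value of $D = \mathrm{des}(\pi)$, exactly as in the derivation.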

4.6 Generating Functions

Recall from Equation (3.4) that

$$(4.25)\qquad a_n(x) := \sum_d \left\langle{n\atop d}\right\rangle x^d = (1-x)^{n+1}\sum_{j\ge 0}(j+1)^n x^j$$

and we defined a0(x) = 1 so as to agree with the right-hand side of Equation (4.25).
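Equation (4.25) can be verified numerically: extracting the coefficient of $x^d$ on the right-hand side gives the classical explicit formula for the Eulerian numbers, which can be compared against a brute-force descent count. An illustrative sketch (not from the thesis):

```python
from itertools import permutations
from math import comb

def eulerian_bruteforce(n):
    counts = [0]*n
    for p in permutations(range(n)):
        counts[sum(p[i] > p[i+1] for i in range(n-1))] += 1
    return counts

# coefficient of x^d in (1-x)^(n+1) * sum_j (j+1)^n x^j  (Equation 4.25)
def eulerian_formula(n, d):
    return sum((-1)**j * comb(n+1, j) * (d+1-j)**n for j in range(d+1))

for n in range(1, 8):
    assert eulerian_bruteforce(n) == [eulerian_formula(n, d) for d in range(n)]
print("Equation (4.25) coefficients check out for n = 1..7")
```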

We can define one generating function for all the Eulerian numbers:

$$A(x,z) := \sum_{n\ge 0} a_n(x)z^n/n! = \sum_{n\ge 0}(1-x)^{n+1}\sum_{j\ge 0}(j+1)^n x^j z^n/n!$$
$$= (1-x)\sum_{j\ge 0}x^j\sum_{n\ge 0}\bigl((j+1)(1-x)z\bigr)^n/n! = (1-x)\sum_{j\ge 0}x^j e^{(j+1)(1-x)z}$$
$$= (1-x)e^{(1-x)z}\sum_{j\ge 0}\bigl(xe^{(1-x)z}\bigr)^j = \frac{(1-x)e^{(1-x)z}}{1 - xe^{(1-x)z}} = \frac{x-1}{x - e^{(x-1)z}}.$$

There are various ways to define generating functions for the $\left\langle{n\atop d}\right\rangle_k$, depending on which variables are kept constant.

Theorem 4.4

$$(4.26)\qquad \sum_{n,d,k}\left\langle{n\atop d}\right\rangle_k x^d y^k z^n/n! = \frac{1}{\theta}\int_\theta^{\theta^y}\frac{dt}{x - t^{1-1/y}}$$

where $\theta = \exp\left\{\dfrac{1-x}{y^{-1}-1}\,z\right\}$.

Proof. Let B(x, y, z) be the left-hand side of Equation (4.26). Note the sum is over

all integers n, d, and k. So

$$(4.27)\qquad (y^{-1}-1)\frac{\partial B}{\partial z} + (1-x)B = \sum_{n,d,k}\left[\left\langle{n+1\atop d}\right\rangle_{k+1} - \left\langle{n+1\atop d}\right\rangle_k + \left\langle{n\atop d}\right\rangle_k - \left\langle{n\atop d-1}\right\rangle_k\right]x^d y^k z^n/n!$$

Let $S(n,d,k)$ be the bracketed quantity. It is clearly 0 if $n < 0$, and if $n = 0$,

$$S(0,d,k) = \begin{cases}1 & \text{if } d = 0,\ k = 0\\ -1 & \text{if } d = 0,\ k = 1\\ 0 & \text{otherwise}\end{cases}$$

so $n = 0$ contributes $1 - y$ to the sum on the right-hand side of Equation (4.27). If $n \ge 1$, then by Equation (4.10), $S(n,d,k)$ is 0 unless $k = 0$ or $k = n+1$, in which

case

$$S(n,d,0) = \left\langle{n+1\atop d}\right\rangle_1 = \left\langle{n\atop d}\right\rangle \qquad\text{and}\qquad S(n,d,n+1) = -\left\langle{n+1\atop d}\right\rangle_{n+1} = -\left\langle{n\atop d-1}\right\rangle.$$

Therefore

$$(y^{-1}-1)\frac{\partial B}{\partial z} + (1-x)B = 1 - y + \sum_{n\ge 1,\,d}\left\langle{n\atop d}\right\rangle x^d z^n/n! - \sum_{n\ge 1,\,d}\left\langle{n\atop d-1}\right\rangle x^d y^{n+1} z^n/n!$$
$$= 1 - y + (A(x,z) - 1) - xy(A(x,yz) - 1)$$
$$= A(x,z) - xyA(x,yz) + (x-1)y$$
$$= (1-x)\left(\frac{ye^{-(1-x)yz}}{x - e^{-(1-x)yz}} - \frac{1}{x - e^{-(1-x)z}}\right).$$

Let $\alpha = \frac{1-x}{y^{-1}-1}$. Then $\theta$, as defined in the theorem, is $e^{\alpha z}$. Dividing by $y^{-1}-1$ and multiplying through by $\theta$ gives

$$\theta\frac{\partial B}{\partial z} + \alpha\theta B = \alpha\theta\left(\frac{ye^{-(1-x)yz}}{x - e^{-(1-x)yz}} - \frac{1}{x - e^{-(1-x)z}}\right)$$

$$\frac{\partial}{\partial z}(\theta B) = \alpha\left(\frac{y\theta^y}{x - \theta^{y-1}} - \frac{\theta}{x - \theta^{1-1/y}}\right).$$

$$\frac{\partial}{\partial z}\int_\theta^{\theta^y}\frac{dt}{x - t^{1-1/y}} = \frac{\partial\theta^y}{\partial z}\left[\frac{1}{x - (\theta^y)^{1-1/y}}\right] - \frac{\partial\theta}{\partial z}\left[\frac{1}{x - \theta^{1-1/y}}\right] = \frac{\alpha y\theta^y}{x - \theta^{y-1}} - \frac{\alpha\theta}{x - \theta^{1-1/y}} = \frac{\partial}{\partial z}(\theta B).$$

Since $\theta B$ and the integral have the same derivative with respect to $z$, and they both vanish when $z = 0$, they are equal. □

Here are three more generating functions. They can all be found by plugging in

Equation (4.16) and switching summation signs.

$$(4.28)\qquad \sum_d \left\langle{n\atop d}\right\rangle_k x^d = (1-x)^n\sum_{j\ge 0}j^{k-1}(j+1)^{n-k}x^j$$

$$(4.29)\qquad \sum_k \left\langle{n\atop d}\right\rangle_k y^k = y\sum_{j\ge 0}(-1)^{d-j}\binom{n}{d-j}\frac{(j+1)^n - (jy)^n}{j+1-jy}$$

$$(4.30)\qquad \sum_{d,k}\left\langle{n\atop d}\right\rangle_k x^d y^k = (1-x)^n\,y\sum_{j\ge 0}\frac{(j+1)^n - (jy)^n}{j+1-jy}x^j$$

Note that Equation (4.28) is the function we called $g_{n,k}(x)$ in Chapter III.
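Equation (4.28) is easy to test by machine for small $n$: count permutations by descents and first entry, and compare with the coefficients of the right-hand side. An illustrative sketch (not from the thesis; it relies on Python's convention $0^0 = 1$ for the $k = 1$, $j = 0$ term):

```python
from itertools import permutations
from math import comb

def refined_eulerian(n):
    # E[d][k] = number of pi in S_n with d descents and pi(1) = k
    E = [[0]*(n+1) for _ in range(n)]
    for p in permutations(range(1, n+1)):
        E[sum(p[i] > p[i+1] for i in range(n-1))][p[0]] += 1
    return E

# coefficient of x^d in (1-x)^n * sum_j j^(k-1) (j+1)^(n-k) x^j  (Equation 4.28)
def formula(n, d, k):
    return sum((-1)**(d-j) * comb(n, d-j) * j**(k-1) * (j+1)**(n-k)
               for j in range(d+1))

for n in range(2, 8):
    E = refined_eulerian(n)
    for d in range(n):
        for k in range(1, n+1):
            assert E[d][k] == formula(n, d, k), (n, d, k)
print("Equation (4.28) verified for n = 2..7")
```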

We can now prove a special case of the Neggers-Stanley conjecture. Let P be a

poset of n elements with labels 1, 2,...,n. A linear extension of P is an ordering of

$1, 2, \ldots, n$ which preserves the ordering of $P$; that is, a $\pi \in S_n$ such that if $i <_P j$ then $i$ appears before $j$ in the list $\pi(1), \pi(2), \ldots, \pi(n)$. If $\mathcal{L}(P)$ denotes the set of linear extensions of $P$, then Neggers and Stanley [43, p. 311] conjectured that

for any poset, every zero of the descent polynomial $\mathcal{D}_{\mathcal{L}(P)}(x)$ is real.

The conjecture has been shown to be false in general [6, 46]. But we can prove it is

true in a certain special case.

Theorem 4.5 If $P_{n,k}$ is the poset whose Hasse diagram places $k$ below all of the other elements $1, 2, \ldots, k-1, k+1, k+2, \ldots, n$ (which are incomparable to each other), then $\mathcal{D}_{\mathcal{L}(P_{n,k})}(x)$ has only distinct real roots.

Proof. For $u, v \ge 0$ let

$$c_{u,v} := \sum_d \left\langle{u+v+1\atop d}\right\rangle_{u+1}x^d = \sum_{\substack{\pi\in S_{u+v+1}\\ \pi(1) = u+1}}x^{\mathrm{des}(\pi)}.$$

Then setting $u = k-1$, $v = n-k$ yields the polynomial in question. If $v = 0$, $c_{u,v}$ counts the reversal permutation $\rho$, which has $(u+v+1) - 1 = u$ descents. Otherwise, if $v > 0$, $c_{u,v}$ doesn't count $\rho$ but it does count the permutation

$$u+1,\ u,\ u-1,\ \ldots,\ 1,\ u+v+1,\ u+v,\ \ldots,\ u+2$$

which has $u+v-1$ descents. So

$$\deg(c_{u,v}) = \begin{cases}u & \text{if } v = 0\\ u+v-1 & \text{if } v > 0.\end{cases}$$

Similarly, if u = 0, cu,v counts the identity permutation, which has no descents.

Otherwise it doesn’t count the identity but it does count

$$u+1,\ u+2,\ \ldots,\ u+v+1,\ 1,\ 2,\ \ldots,\ u$$

which has 1 descent. So $x \nmid c_{0,v}(x)$, and if $u > 0$, $x \mid c_{u,v}(x)$ but $x^2 \nmid c_{u,v}(x)$. Now let

$$h_{u,v} := \frac{c_{u,v}}{(1-x)^{u+v+1}}.$$

Note that $c_{u,v}(1) = \#\{\pi \in S_{u+v+1} : \pi(1) = u+1\} = (u+v)!$, so $c_{u,v}$ does not have a zero at $x = 1$. Therefore $h_{u,v}$ has exactly the same zeroes as $c_{u,v}$, plus a pole at $x = 1$. By Equation (4.28),

$$h_{u,v}(x) = \sum_{j\ge 0}j^u(j+1)^v x^j.$$

If $D$ represents differentiation with respect to $x$, we have

$$(xD)h_{u,v}(x) = h_{u+1,v}(x) \qquad\text{and}\qquad (Dx)h_{u,v}(x) = h_{u,v+1}(x)$$

$h_{0,0}(x) = (1-x)^{-1}$
$\downarrow Dx$
$h_{0,1}(x) = (1-x)^{-2}$
$\downarrow Dx$
$h_{0,2}(x) = (1-x)^{-3}(1+x)$
$\downarrow Dx$
$h_{0,3}(x) = (1-x)^{-4}(1+4x+x^2)$
$\downarrow xD$
$h_{1,3}(x) = (1-x)^{-5}(8x+14x^2+2x^3)$
$\downarrow xD$
$h_{2,3}(x) = (1-x)^{-6}(8x+60x^2+48x^3+4x^4)$
$\downarrow xD$
$h_{3,3}(x) = (1-x)^{-7}(8x+160x^2+384x^3+160x^4+8x^5)$

Figure 4.2: The real zeroes of a Neggers-Stanley descent polynomial. This is the construction of $h_{3,3}(x)$ as described in the proof of Theorem 4.5. The zeroes of each function are plotted on the right, using an inverse tangent scale. Since each function is generated from the previous one by applying either the $Dx$ or the $xD$ operator, Rolle's Theorem guarantees that the zeroes must interleave. By a counting argument, all the zeroes of each function must be real.

and so

$$h_{0,v}(x) = (Dx)^v h_{0,0}(x) \qquad\text{and}\qquad h_{u,v}(x) = (xD)^u h_{0,v}(x).$$

$h_{0,0}(x) = (1-x)^{-1}$ and $h_{0,1}(x) = (1-x)^{-2}$ both have no zeroes. Suppose $v \ge 1$ and $h_{0,v}$ has only distinct real zeroes. Since $\deg(c_{0,v}) = v-1$ and $x\nmid c_{0,v}(x)$, $xc_{0,v}(x)$ and $xh_{0,v}(x)$ have $v$ distinct real zeroes. By Rolle's Theorem, $(Dx)h_{0,v}$ must have $v-1$

distinct zeroes interlaced between those of $xh_{0,v}(x)$. Furthermore, the denominator

of $xh_{0,v}(x)$ has degree $v+1$, so $xh_{0,v}(x)$ approaches 0 as $x \to -\infty$. Therefore its graph must turn back toward the $x$-axis somewhere to the left of its leftmost zero, at which

place there must be another zero of $(Dx)h_{0,v}$. So we have found $v$ real zeroes of

$h_{0,v+1}$, and that accounts for all its zeroes.

Applying the $xD$ operator goes similarly. Given that $h_{u,v}$ has $d$ distinct real zeroes,

by Rolle's Theorem $Dh_{u,v}(x)$ has $d-1$ interlaced zeroes. Since the numerator of

$h_{u,v}$ has degree smaller than the denominator, $h_{u,v}$ must turn back toward the axis

to the left of its leftmost zero, which accounts for one more zero of $Dh_{u,v}$. Finally,

$(xD)h_{u,v}$ has one more zero at 0 (which is distinct from the others since $x^2\nmid h_{u,v}$ and

therefore $x\nmid Dh_{u,v}$). So we have found $d+1$ real zeroes of $h_{u+1,v}$, and that accounts

for all of the zeroes. 
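The numerator polynomials listed in Figure 4.2 can be reproduced directly from the definition of $c_{u,v}$ as a sum over permutations; the following brute-force sketch (illustrative, not from the thesis) confirms them:

```python
from itertools import permutations

def c_poly(u, v):
    # c_{u,v}(x) = sum of x^{des(pi)} over pi in S_{u+v+1} with pi(1) = u+1
    n = u + v + 1
    coeffs = [0]*n
    for p in permutations(range(1, n+1)):
        if p[0] == u + 1:
            coeffs[sum(p[i] > p[i+1] for i in range(n-1))] += 1
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

# the numerator polynomials shown in Figure 4.2 (coefficients by ascending degree)
assert c_poly(0, 3) == [1, 4, 1]
assert c_poly(1, 3) == [0, 8, 14, 2]
assert c_poly(2, 3) == [0, 8, 60, 48, 4]
assert c_poly(3, 3) == [0, 8, 160, 384, 160, 8]
print("Figure 4.2 numerators confirmed by brute force")
```

Note that $c_{u,v}(1) = (u+v)!$ shows up here too: each coefficient list sums to the appropriate factorial.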

Corollary 4.6 The same can be said for the poset whose Hasse diagram places $k$ above all of the other elements $1, 2, \ldots, k-1, k+1, k+2, \ldots, n$.

Proof. The result of turning a poset upside-down is to reverse all its linear extensions, which changes ascents to descents and vice-versa. So if $F(x)$ is the descent polynomial of the original poset, the descent polynomial of the new poset is $x^{n-1}F(x^{-1})$. So the roots of the new polynomial are the inverses of the roots of the original. □

4.7 General Behavior

We can say in general how the sequence $\left\langle{n\atop d}\right\rangle_n, \left\langle{n\atop d}\right\rangle_{n-1}, \ldots, \left\langle{n\atop d}\right\rangle_1$ behaves.

The set of numbers $\left\langle{n\atop d}\right\rangle_k$, for $n$ fixed, is very nearly unimodal if arranged appropriately.

Theorem 4.7 Fix $n$ and $d$. Then

(i) If $d = 0$, then $0 = \left\langle{n\atop d}\right\rangle_n = \cdots = \left\langle{n\atop d}\right\rangle_2 < \left\langle{n\atop d}\right\rangle_1 = 1$

(ii) If $1 \le d \le (n-3)/2$, then $\left\langle{n\atop d}\right\rangle_n < \left\langle{n\atop d}\right\rangle_{n-1} < \cdots < \left\langle{n\atop d}\right\rangle_1$

(iii) If $n$ is even and $d = (n-2)/2$, then $\left\langle{n\atop d}\right\rangle_n < \cdots < \left\langle{n\atop d}\right\rangle_2 = \left\langle{n\atop d}\right\rangle_1$

(iv) If $n$ is odd and $d = (n-1)/2$, then $\left\langle{n\atop d}\right\rangle_n < \cdots < \left\langle{n\atop d}\right\rangle_{(n+1)/2} > \cdots > \left\langle{n\atop d}\right\rangle_1$

(v) If $n$ is even and $d = n/2$, then $\left\langle{n\atop d}\right\rangle_n = \left\langle{n\atop d}\right\rangle_{n-1} > \cdots > \left\langle{n\atop d}\right\rangle_1$

(vi) If $(n+1)/2 \le d \le n-2$, then $\left\langle{n\atop d}\right\rangle_n > \left\langle{n\atop d}\right\rangle_{n-1} > \cdots > \left\langle{n\atop d}\right\rangle_1$

(vii) If $d = n-1$, then $1 = \left\langle{n\atop d}\right\rangle_n > \left\langle{n\atop d}\right\rangle_{n-1} = \cdots = \left\langle{n\atop d}\right\rangle_1 = 0$.

Proof. (i) follows from the fact that the identity is the only permutation with 0 descents. (v), (vi), and (vii) follow from (iii), (ii), and (i) respectively because

$$\left\langle{n\atop d}\right\rangle_k = \left\langle{n\atop n-1-d}\right\rangle_{n+1-k}.$$

Let $f_n(x) = \left\langle{n\atop \lfloor x/n\rfloor + 1}\right\rangle_{n\lfloor x/n\rfloor + n - x}$, which means that $f_n(nd - k) = \left\langle{n\atop d}\right\rangle_k$ if $0 \le d \le n-1$ and $1 \le k \le n$. Figure 4.3 shows the graphs of $f_6(x)$ and $f_7(x)$. Each monochromatic section is a sequence of the form $\left\langle{n\atop d}\right\rangle_n, \left\langle{n\atop d}\right\rangle_{n-1}, \ldots, \left\langle{n\atop d}\right\rangle_1$. Note the graphs plateau where one sequence meets the next. Since $\left\langle{n\atop d}\right\rangle_1 = \left\langle{n-1\atop d}\right\rangle = \left\langle{n\atop d+1}\right\rangle_n$, each sequence begins where the previous one ends. The content of the theorem is that $f_n$ is basically unimodal. That is, the sequences on the left increase, those on the right decrease, and those in the middle behave according to (iii) through (v).

The theorem is true for small $n$ by inspection. By Equation (4.9),

$$\left\langle{n+1\atop d}\right\rangle_k = \sum_{\ell=1}^{k-1}\left\langle{n\atop d-1}\right\rangle_\ell + \sum_{\ell=k}^{n}\left\langle{n\atop d}\right\rangle_\ell = \sum_{\ell=1}^{k-1}f_n(n(d-1) - \ell) + \sum_{\ell=k}^{n}f_n(nd - \ell).$$

Let $i = \ell + n - k$ in the first sum and $\ell - k$ in the second and we have

$$\left\langle{n+1\atop d}\right\rangle_k = \sum_{i=n-k+1}^{n-1}f_n(nd - k - i) + \sum_{i=0}^{n-k}f_n(nd - k - i) = \sum_{i=0}^{n-1}f_n(nd - k - i).$$

So imagine a caterpillar of length $n$ crawling on the graph of $y = f_n(x)$, as shown in the top graph of Figure 4.3. If his head is at $x$-position $nd - k$, the equation above

[Figure 4.3 shows the graphs of $f_6(x)$ and $f_7(x)$.]

Figure 4.3: The unimodality of $\left\langle{n\atop d}\right\rangle_k$ for $n = 6$ and $n = 7$. The graphs of $f_6(x)$ and $f_7(x)$ are shown, where $f_n(nd-k) = \left\langle{n\atop d}\right\rangle_k$, as defined in Theorem 4.7.


says that the sum of the heights of his segments (or his total potential energy) is $\left\langle{n+1\atop d}\right\rangle_k$. If he were to take a step forward, his total energy would be $\left\langle{n+1\atop d}\right\rangle_{k-1}$. That would be an increase in energy if the new height of his head is higher than the current height of his tail. The theorem now follows easily by induction. □

4.8 Behavior if $d \ll n$

If d is much less than n, and π is selected at random from those permutations of n

letters which have d descents, then the distribution of π(1) approaches a geometric

distribution uniformly, in the following sense.

Theorem 4.8 Fix an integer d > 0. Suppose πn is chosen uniformly from those

permutations of $n$ letters which have $d$ descents. Then for any $\epsilon > 0$ there is an $N$

such that

$$(4.31)\qquad \left|\frac{P(\pi_n(1) = k)}{(1-p)p^{k-1}} - 1\right| < \epsilon$$

for all integers $n$ and $k$ with $n \ge N$ and $1 \le k \le n$, where $p = \frac{d}{d+1}$.

Proof. For $0 \le j \le d$, let $P_j(n) = (-1)^{d-j}\binom{n}{d-j}$. Then by Equation (4.16) and Equation (4.13),

$$(4.32)\qquad \left\langle{n\atop d}\right\rangle_k = d^{k-1}(d+1)^{n-k}\sum_{0\le j\le d}P_j(n)\left(\frac{j}{d}\right)^{k-1}\left(\frac{j+1}{d+1}\right)^{n-k}$$

$$(4.33)\qquad \left\langle{n\atop d}\right\rangle = (d+1)^n\sum_{0\le j\le d}P_j(n+1)\left(\frac{j+1}{d+1}\right)^n.$$

Since $(1-p)p^{k-1} = d^{k-1}/(d+1)^k$, the left-hand side of Equation (4.31) is

$$\left|\frac{\sum_{0\le j\le d}P_j(n)\left(\frac{j}{d}\right)^{k-1}\left(\frac{j+1}{d+1}\right)^{n-k}}{\sum_{0\le j\le d}P_j(n+1)\left(\frac{j+1}{d+1}\right)^n} - 1\right|$$

and since $P_d(n) = \binom{n}{0} = 1$, the last term of both sums is 1. Therefore we have

$$\left|\frac{\sum_{0\le j<d}P_j(n)\left(\frac{j}{d}\right)^{k-1}\left(\frac{j+1}{d+1}\right)^{n-k} - \sum_{0\le j<d}P_j(n+1)\left(\frac{j+1}{d+1}\right)^n}{1 + \sum_{0\le j<d}P_j(n+1)\left(\frac{j+1}{d+1}\right)^n}\right|$$

Now each term in each sum is a polynomial in $n$ times a decaying exponential in $n$. So both sums go to 0 as $n$ goes to infinity. □

Corollary 4.9 The variation distance between the distribution of $\pi_n(1)$ and the geometric distribution with parameter $p = \frac{d}{d+1}$ approaches 0 as $n$ approaches infinity.

4.9 If Both Ends Are Fixed

We might now ask about the number of permutations with d descents whose first

and last positions are fixed. Let

$$\left\langle{n\atop d}\right\rangle_k^\ell := \#\{\pi\in S_n : \mathrm{des}(\pi) = d,\ \pi(1) = k,\ \text{and}\ \pi(n) = \ell\}.$$

Theorem 4.10 Suppose $1 \le k < k+m \le n$. Then

$$\left\langle{n\atop d}\right\rangle_k^{k+m} = \left\langle{n-1\atop d}\right\rangle^m \qquad\text{and}\qquad \left\langle{n\atop d}\right\rangle_{k+m}^{k} = \left\langle{n-1\atop d-1}\right\rangle_m$$

where a lone superscript (resp. subscript) fixes only the last (resp. first) position.

Proof. Let $\psi \in S_n$ be the $n$-cycle $(n, n-1, \ldots, 2, 1)$. Then for any $\pi \in S_n$,

$$\psi\pi(i) = \begin{cases}\pi(i) - 1 & \text{if } \pi(i) > 1\\ n & \text{if } \pi(i) = 1.\end{cases}$$

(Imagine a device like a car odometer, with a window and $n$ wheels, on each of which are painted the numbers 1 through $n$. $\pi$ can be represented by turning the $i$th wheel until $\pi(i)$ shows through the window, for all $i$. If one then rolls all the wheels backward a notch, $\psi\pi$ shows through the window. For this reason we will refer to the transformation $\pi \mapsto \psi\pi$ as a rollback.)

If $1 \le i < n$, let $D_i(\pi) = [\pi(i) > \pi(i+1)]$. The pair $\pi(i), \pi(i+1)$ has one of four types:

Type A: $1 < \pi(i) < \pi(i+1)$: $D_i(\pi) = 0$, $D_i(\psi\pi) = 0$, $D_i(\psi\pi) - D_i(\pi) = 0$
Type B: $1 < \pi(i+1) < \pi(i)$: $D_i(\pi) = 1$, $D_i(\psi\pi) = 1$, $D_i(\psi\pi) - D_i(\pi) = 0$
Type C: $1 = \pi(i) < \pi(i+1)$: $D_i(\pi) = 0$, $D_i(\psi\pi) = 1$, $D_i(\psi\pi) - D_i(\pi) = 1$
Type D: $1 = \pi(i+1) < \pi(i)$: $D_i(\pi) = 1$, $D_i(\psi\pi) = 0$, $D_i(\psi\pi) - D_i(\pi) = -1$

Most pairs are of type A or B. $\pi$ will have one pair of type C unless $\pi(n) = 1$, and one pair of type D unless $\pi(1) = 1$. Therefore

$$\mathrm{des}(\psi\pi) - \mathrm{des}(\pi) = \sum_{i=1}^{n-1}D_i(\psi\pi) - D_i(\pi) = \begin{cases}1 & \text{if } \pi(1) = 1\\ -1 & \text{if } \pi(n) = 1\\ 0 & \text{otherwise.}\end{cases}$$

Let

$$P_a^b := \{\pi\in S_n : \pi(1) = a \text{ and } \pi(n) = b\}$$
$$Q^b := \{\pi\in S_{n-1} : \pi(n-1) = b\}.$$

Consider the following sequence of bijections:

$$P_k^{k+m} \xrightarrow{\ \text{rollback}\ } P_{k-1}^{k-1+m} \xrightarrow{\ \text{rollback}\ } \cdots \xrightarrow{\ \text{rollback}\ } P_1^{1+m} \xrightarrow{\ \text{rollback}\ } P_n^{m} \xrightarrow{\ \text{shorten}\ } Q^m$$

where "shortening" a permutation means removing $n$. (See Figure 4.4 for an example.) The first $k-1$ rollbacks all preserve des, and the final one increments des. But the shortening decrements it again, since it removes $n$ from the front of the permutation. Therefore the net effect, across the whole sequence, is to preserve des.

So $\left\langle{n\atop d}\right\rangle_k^{k+m} = \left\langle{n-1\atop d}\right\rangle^m$ for all $d$.


[Figure 4.4 shows two worked examples of the rollback bijections for $n = 9$, tracking the number of descents at each step.]

Figure 4.4: The "rollback" procedure for finding $\left\langle{n\atop d}\right\rangle_k^\ell$. Here we see examples of the actions of the bijections described in Theorem 4.10, for $n = 9$. Vertical lines show the positions of descents. If $\pi(1) < \pi(n)$, as at the top left, then the permutation is "rolled back" until $n$ appears at the front, and then $n$ is removed. In each of the rollbacks but the last, one of the internal bars moves one position to the right, to accommodate a 1 changing to an $n$, but the total number of descents stays the same. Only when the number in the first position changes from 1 to $n$ do we gain a descent, but it vanishes again when we remove $n$ in the last step. The procedure is similar when $\pi(1) > \pi(n)$, as on the right, but the last rollback eliminates a descent, and removing $n$ leaves the number of descents unchanged.

The second part of the theorem follows from the bijective sequence

$$P_{k+m}^{k} \xrightarrow{\ \text{rollback}\ } P_{k-1+m}^{k-1} \xrightarrow{\ \text{rollback}\ } \cdots \xrightarrow{\ \text{rollback}\ } P_{1+m}^{1} \xrightarrow{\ \text{rollback}\ } P_m^{n} \xrightarrow{\ \text{shorten}\ } Q_m$$

where $Q_a := \{\pi\in S_{n-1} : \pi(1) = a\}$. Here the final rollback decrements des, and the shortening leaves it unchanged. So $\left\langle{n\atop d}\right\rangle_{k+m}^{k} = \left\langle{n-1\atop d-1}\right\rangle_m$. □
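Both identities of Theorem 4.10 can be confirmed by brute force for a small deck size; an illustrative sketch (not from the thesis):

```python
from itertools import permutations

def des(p):
    return sum(p[i] > p[i+1] for i in range(len(p)-1))

def both_ends(n, d, k, ell):
    # <n over d>^ell_k: d descents, first entry k, last entry ell
    return sum(1 for p in permutations(range(1, n+1))
               if p[0] == k and p[-1] == ell and des(p) == d)

def last_only(n, d, ell):
    return sum(1 for p in permutations(range(1, n+1))
               if p[-1] == ell and des(p) == d)

def first_only(n, d, k):
    return sum(1 for p in permutations(range(1, n+1))
               if p[0] == k and des(p) == d)

n = 6
for k in range(1, n):
    for m in range(1, n - k + 1):
        for d in range(n):
            assert both_ends(n, d, k, k+m) == last_only(n-1, d, m)
            assert both_ends(n, d, k+m, k) == first_only(n-1, d-1, m)
print("Theorem 4.10 verified for n =", n)
```

The $d = 0$ case of the second identity works out automatically: both sides are 0, since `des(p) == -1` never holds.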

Corollary 4.11 If $1 \le k, \ell \le n$ and $P_{n,k}^\ell$ is the poset on $\{1, 2, \ldots, n\}$ defined by $k <_P a <_P \ell$ for all $a$ other than $k$ and $\ell$, then the descent polynomial of $\mathcal{L}(P_{n,k}^\ell)$ has only distinct real zeroes.

Proof. If $\ell = k+m$, then the polynomial in question is

$$\sum_d \left\langle{n\atop d}\right\rangle_k^\ell x^d = \sum_d \left\langle{n-1\atop d}\right\rangle^m x^d$$

which was shown to have real distinct zeroes in Corollary 4.6. As in that corollary,

it follows immediately that turning the poset upside-down inverts the roots of the

polynomial, leaving them real. 

4.10 Remarks

In Section 4.5 we noted that if π is uniformly distributed over Sn then the distribution

of D = des(π) is approximately normal. Thus the normal density function

$$\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{1}{2}\left(\frac{d-\mu}{\sigma}\right)^2\right\}$$

with $\mu = \frac{n-1}{2}$ and $\sigma = \sqrt{\frac{n+1}{12}}$ is a good approximation for $\frac{1}{n!}\left\langle{n\atop d}\right\rangle$ when $d$ is close to $\mu$. However, it can be off by orders of magnitude when $d$ is very small or very large.

Theorem 4.7 shows that the sequence $\left\langle{n\atop d}\right\rangle_1, \left\langle{n\atop d}\right\rangle_2, \ldots, \left\langle{n\atop d}\right\rangle_n$ is an interpolation between $\left\langle{n-1\atop d}\right\rangle$ and $\left\langle{n-1\atop d-1}\right\rangle$, so it seems a reasonable hypothesis that if $d$ is close to $\frac{n-1}{2}$, then $\left\langle{n\atop d}\right\rangle_k$ is well approximated by

$$(n-1)!\,\sqrt{\frac{6}{\pi n}}\,\exp\left\{-\frac{1}{2}\cdot\frac{\left(d + \frac{n-k}{n-1} - \frac{n}{2}\right)^2}{n/12}\right\}.$$

Experimental evidence for $n \le 200$ suggests that this is in fact the case. So while the distribution of $\pi(1)$ given $\mathrm{des}(\pi)$ is by no means normal, it does seem to behave like a segment of the normal curve when $d$ is near $\frac{n-1}{2}$.

More generally, there may be some underlying curve which the Eulerian numbers, properly normalized, can be said to approach as $n$ grows large. It will look like a bell curve, but not be exactly normal, since the normal approximation is not very good when $d \ll n$. If so, it seems likely that the refined Eulerian numbers presented in this chapter can be said to approach points on the same curve.

CHAPTER V

Estimating Variation Distance

5.1 The Basic Questions of Card Shuffling

In the beginning, the question mathematicians asked about card shuffling was:

If one shuffles enough, will the distribution of decks approach the uniform distribution?

Poincaré considers that question in [36], an excerpt of which is translated into English in Appendix B. He finds that any reasonable method of shuffling cards would, indeed, cause the distribution to approach uniform eventually. Gilbert gives an alternative proof involving entropy in [22].

Later in the twentieth century, models were developed for particular methods of shuffling, and the question became

How fast does the distribution approach uniform, for a particular kind of shuffling?

The question became popularized (in [28] and [33], for example) as

How many times should one shuffle a deck of cards?

Aldous [1] and Diaconis [13] note that a graph of variation distance from uniform in many mixing problems displays the “waterfall” shape of Figure 2.6. Diaconis refers


to the place where the variation distance begins to drop precipitously as the cutoff.

As a result, we have a new question:

Where is the cutoff for card shuffling?

Bayer and Diaconis show, in [3], that under the GSR model the cutoff for riffle shuffling a deck of n distinct cards occurs at

$$a = n^{3/2}\,2^{O(n^{-1/4})}.$$

Finding the cutoff is arguably equivalent to answering the question

When does variation distance drop below $\frac12$?

and it is easy to see from Figure 2.6 that the answer, for 52 distinct cards, is between

64 and 128, and therefore seven 2-shuffles are enough to get the variation distance

below $\frac12$. (Note: Aldous uses $\frac{1}{2e}$ in place of $\frac12$ to define the cutoff in [1].)

But note that it is not reasonable to equate the question “Where is the cutoff?” with the question “How many times should one shuffle?”, as demonstrated by Doyle’s game

(Section 2.9). After 7 shuffles of an all-distinct 52-card deck, the variation distance from uniform is about .334, which means that there is some set of permutations whose probability is .334 higher than it would be if the cards were perfectly mixed.

In practical terms, that is a very large amount of unfairness due to poor shuffling.

The probability of the black player winning in Doyle’s game after 7 shuffles is about

.807, or .307 higher than it would be had the deck been perfectly mixed, so Doyle’s game is close to being the worst-case scenario. Clearly 7 shuffles is not enough if one plans to play Doyle’s game.

So the goal, as stated in Section 2.10, should be to shuffle enough so that every set of decks has a probability within $\epsilon$ of its uniform probability, where $\epsilon$ is considerably smaller than $\frac12$. As a result we need to shuffle some distance past the cutoff, which suggests the following question, which we will explore in this chapter:

How does a deck’s variation distance from uniform after an a-shuffle behave when a is very large?

It turns out we can answer this last question even if we do not know where the cutoff is.

5.2 Calculating Variation Distance Exactly

In order to describe the behavior of variation distance from uniform after $a$-shuffling a deck $D$, we need to evaluate the expression

$$(5.1)\qquad \|P_a - U\| = \frac12\sum_{D'\in\mathcal{O}(D)}|P_a(D\to D') - U(D')|$$

where $\mathcal{O}(D)$ is the set of all rearrangements of $D$. In Chapter III we showed how to calculate the descent polynomial $\mathcal{D}(D,D';x)$ and shuffle series $\mathcal{S}(D,D';x)$ when one of the decks is of some simple form. From there it is straightforward to calculate $P_a(D\to D')$, for any $a$. The most general achievement of Chapter III was the calculation of $\mathcal{D}(D,D';x)$ when either $D$ or $D'$ is sorted. So in theory we have enough information to find $\|P_a - U\|$ when either the deck is sorted and then shuffled, or when a deck of distinct cards is shuffled and then cut into hands (the simplest kind of dealing). The only obstacle is that the number of possible orderings of a deck may be very large, so it might not be reasonable to count through all of them. We can get a good approximation using a Monte Carlo method, however, as we will show in

Chapter VI.

On the other hand, most decks are not amenable to the methods of Chapter III, and the problem of finding the descent polynomial (and thus the transition probabilities)

between two decks has been shown to be #P-complete in general [11]. So it is unlikely

that there is an efficient way to calculate the variation distance from uniform directly

from Equation (5.1) when the starting or ending deck is not a special type. Diaconis

[3] was able to do it for decks of n distinct cards (Section 2.10) because the transition

probabilities fell into n easily enumerated equivalence classes. So one approach to the

general problem is, after fixing, say, a source deck $D$, to divide the target decks into

equivalence classes according to their descent polynomials with respect to D. If the number of equivalence classes is small, and we can calculate the descent polynomials for each class, then it is possible to find the variation distance from uniform.

We tried that approach in [10] and found some equivalence relations for decks con- taining two types of cards. However, the number of equivalence classes grows faster than in the all-distinct case, and it is not clear how the equivalence relations gener- alize to decks with more types of cards. And we still have the problem that descent polynomials in general are hard to calculate.

5.3 Expansion of the Transition Probability as a Power Series in $a^{-1}$

Shifting our focus to the long term (i.e., large a) behavior of the variation distance

allows a different approach, to which the following theorem is the key:

Theorem 5.1 Let $D$ be a deck of $n$ cards, with $D'$ a reordering of $D$. Suppose $\mathcal{D}(D,D';x) = \sum_d b_d x^d$. Then

$$(5.2)\qquad P_a(D\to D') = \sum_{k=0}^{n-1}c_k(D,D')a^{-k}$$

where

$$(5.3)\qquad c_k(D,D') = \frac{1}{n!}\sum_d b_d\, e_k(-d,\ 1-d,\ 2-d,\ \ldots,\ n-1-d)$$

and $e_k$ is the $k$th elementary symmetric function:

$$e_k(\alpha_i \mid i\in I) = \sum_{\substack{S\subset I\\ \#S = k}}\ \prod_{i\in S}\alpha_i.$$

Proof. $b_d$ is the number of permutations with $d$ descents that take $D$ to $D'$, and each such permutation has probability $a^{-n}\binom{a+n-d-1}{n}$ (Equation (2.6)). Therefore

$$P_a(D\to D') = \frac{1}{a^n}\sum_d b_d\binom{a+n-d-1}{n} = \frac{1}{a^n}\sum_d b_d\frac{(a-d)^{\overline{n}}}{n!} = \frac{1}{n!}\sum_d b_d\frac{(a-d)^{\overline{n}}}{a^n}$$

where $x^{\overline{m}} = x(x+1)(x+2)\cdots(x+m-1)$, as defined in Section 4.4. Now since

$$\frac{(a-d)^{\overline{n}}}{a^n} = \frac{a-d}{a}\cdot\frac{a+1-d}{a}\cdot\frac{a+2-d}{a}\cdots\frac{a+n-1-d}{a} = \left(1 + \frac{-d}{a}\right)\left(1 + \frac{1-d}{a}\right)\left(1 + \frac{2-d}{a}\right)\cdots\left(1 + \frac{n-1-d}{a}\right)$$

and

$$\prod_i(1 + u_i x) = \sum_k e_k(u_1, u_2, \ldots)x^k$$

we have

$$\frac{(a-d)^{\overline{n}}}{a^n} = \sum_k e_k(-d,\ 1-d,\ \ldots,\ n-1-d)\,a^{-k}$$

and therefore

$$(5.4)\qquad P_a(D\to D') = \frac{1}{n!}\sum_d\sum_k b_d\, e_k(-d,\ 1-d,\ \ldots,\ n-1-d)\,a^{-k}.$$

Of course $e_k(-d, 1-d, \ldots, n-1-d)$ is zero when $k < 0$ or $k > n$. In addition, if $0 \le d \le n-1$ (which is required for $b_d$ to be nonzero) then 0 appears somewhere in the list $-d, 1-d, \ldots, n-1-d$. Hence the product of all $n$ numbers in the list is 0, which means $e_n(-d, 1-d, \ldots, n-1-d)$ is 0. So we may limit $k$ to $\{0, 1, \ldots, n-1\}$ in Equation (5.4). Switching summation signs completes the proof. □
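The per-descent identity underlying the proof, $\binom{a+n-d-1}{n}/a^n = \frac{1}{n!}\sum_k e_k(-d, \ldots, n-1-d)\,a^{-k}$, can be checked exactly with rational arithmetic; an illustrative sketch (not from the thesis):

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial, prod

n = 5
for d in range(n):
    u = [i - d for i in range(n)]          # the list -d, 1-d, ..., n-1-d
    for a in (2, 3, 10):
        exact = Fraction(comb(a + n - d - 1, n), a**n)
        # e_k as a sum over k-subsets; e_n vanishes since 0 is in the list
        series = sum(Fraction(sum(prod(c) for c in combinations(u, k)), a**k)
                     for k in range(n)) / factorial(n)
        assert exact == series, (d, a)
print("Theorem 5.1's expansion verified term by term for n =", n)
```

Since the identity holds for each $d$ separately, it holds for any descent polynomial $\sum_d b_d x^d$ by linearity.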

Let $D$ and $D'$ be as in the theorem. Then $c_0(D,D')$ is

$$\frac{1}{n!}\sum_d b_d\, e_0(-d, 1-d, \ldots, n-1-d) = \frac{1}{n!}\sum_d b_d$$

since $e_0$ is 1 regardless of its arguments. Moreover $b_d$ is the number of permutations with $d$ descents that take $D$ to $D'$, so $\sum_d b_d$ is the total number of permutations that take $D$ to $D'$, that is, the size of $T(D,D')$. Let

$$S = \#T(D,D')$$

$$N = \#\mathcal{O}(D).$$

Recall from Section 2.2 that $T(D,D')$ is a coset of the stabilizer of $D$, so $S$ is the size

of each such coset, and of the stabilizer. N is the number of cosets, and therefore

SN = #Sn = n!

So we have

$$c_0 = \frac{1}{n!}\sum_d b_d = \frac{1}{SN}\cdot S = \frac{1}{N}.$$

That is comforting: as $a$ gets large, the higher order terms of Equation (5.2) will die off, and the probability of getting $D'$ will approach $c_0$. Since $N$ is the number of rearrangements of $D$, $\frac1N$ is the uniform probability of $D'$. So the fact that $c_0$ is $\frac1N$ means that in the long run, each ordering of the deck is equally likely.

That means that $c_1$ (if it is nonzero) will give us a good approximation for a deck's deviation from uniform when $a$ is large; that is,

$$P_a(D\to D') - U(D') = \frac{c_1(D,D')}{a} + O\left(\frac{1}{a^2}\right).$$

So we should try to find c1.

5.4 Approximating the Transition Probability and Variation Distance for Large Shuffles

Let u and v be card values, and let i and j be positions in a deck. We say that a deck

D has a u-v digraph at i if D(i)= u and D(i +1) = v. We say that D has a u-v pair at (i, j) if i < j, D(i) = u, and D(j) = v. The distinction between digraphs and pairs is akin to that between descents and inversions in a permutation. Let

$$W(D,u,v) := \#\{u\text{-}v\text{ digraphs in }D\} - \#\{v\text{-}u\text{ digraphs in }D\}$$
$$Z(D,u,v) := \#\{u\text{-}v\text{ pairs in }D\} - \#\{v\text{-}u\text{ pairs in }D\}$$

For example, the deck $D = \texttt{ABAAABABB}$ has 3 A-B digraphs and 2 B-A digraphs, so $W(D,\mathrm{A},\mathrm{B}) = 3 - 2 = 1$. $D$ has 15 A-B pairs and 5 B-A pairs, so $Z(D,\mathrm{A},\mathrm{B}) = 15 - 5 = 10$. Note that both $W$ and $Z$ are antisymmetric in $u$ and $v$: $W(D,u,v) = -W(D,v,u)$ and $Z(D,u,v) = -Z(D,v,u)$.

Theorem 5.2 Let $D$, $D'$, $n$, $N$, and $c_1$ be as in the last section. Assume the values of the cards in $D$ have some defined order. Then

$$c_1(D,D') = \frac{n}{2N}\sum_{u<v}\frac{W(D,u,v)Z(D',u,v)}{n_u n_v}$$

where $n_u$ is the number of cards of value $u$.

Proof. From Theorem 5.1 we have

$$c_1 = \frac{1}{n!}\sum_d b_d\, e_1(-d,\ 1-d,\ 2-d,\ \ldots,\ n-1-d) = \frac{1}{n!}\sum_d b_d\sum_{i=0}^{n-1}(i-d) = \frac{1}{NS}\sum_d b_d\left(\frac{n(n-1)}{2} - nd\right)$$
$$= \frac{n}{2N}\cdot\frac1S\sum_d b_d(n-1-2d) = \frac{n}{2N}E\bigl(n-1-2\,\mathrm{des}(\pi)\bigm|\pi D = D'\bigr).$$

Of course $\mathrm{asc}(\pi) + \mathrm{des}(\pi) = n-1$ for all $\pi\in S_n$, so we can write that as

$$c_1 = \frac{n}{2N}E\bigl(\mathrm{asc}(\pi) - \mathrm{des}(\pi)\bigm|\pi D = D'\bigr).$$

Let

$$\omega_k(\pi) := \begin{cases}1 & \text{if } \pi(k) < \pi(k+1)\\ -1 & \text{if } \pi(k) > \pi(k+1)\end{cases}$$

for $1\le k\le n-1$. Then $\mathrm{asc}(\pi) - \mathrm{des}(\pi) = \sum_k\omega_k(\pi)$, and

$$(5.5)\qquad c_1 = \frac{n}{2N}\sum_k E\bigl(\omega_k(\pi)\bigm|\pi D = D'\bigr)$$

since expectation is linear.

Fix $k$ and suppose first that $D(k) = D(k+1)$. Let $\tau\in S_n$ be the transposition $(k, k+1)$. Obviously $\tau$ has no effect on $D$: $\tau D = D$. If $\pi\in T(D,D')$ then

$$D' = \pi D = \pi\tau D$$

so $\pi\tau$ is also in $T(D,D')$, which means that $\pi\mapsto\pi\tau$ takes $T(D,D')$ into itself. The

to swap π(k) with π(k + 1), so it sends permutations which have an ascent at k to

those which have a descent at k, and vice-versa. That means half the permutations

in $T(D,D')$ have an ascent at $k$ and half have a descent; therefore $E(\omega_k(\pi)\mid\pi D = D') = 0$. So when $D(k) = D(k+1)$, $k$ contributes nothing to the right-hand side of Equation (5.5).

Now let $u := D(k)$ and $v := D(k+1)$, and suppose $u\ne v$. There are $n_u$ cards labelled $u$ in $D'$, and therefore $n_u$ possibilities for $\pi(k)$ if $\pi$ is to take $D$ to $D'$. Likewise there are $n_v$ possibilities for $\pi(k+1)$, and each valid choice for $(\pi(k), \pi(k+1))$ is equally likely if $\pi$ is chosen uniformly from $T(D,D')$. So

$$P(\omega_k(\pi) = 1\mid\pi D = D') = \frac{\#\{u\text{-}v\text{ pairs in }D'\}}{n_u n_v}$$
$$P(\omega_k(\pi) = -1\mid\pi D = D') = \frac{\#\{v\text{-}u\text{ pairs in }D'\}}{n_u n_v}$$

and therefore

$$E(\omega_k(\pi)\mid\pi D = D') = \frac{Z(D',u,v)}{n_u n_v}.$$

So each $u$-$v$ digraph in $D$ contributes $\frac{Z(D',u,v)}{n_u n_v}$ to the right-hand side of Equation (5.5), and together the $u$-$v$ and $v$-$u$ digraphs in $D$ contribute

$$\#\{u\text{-}v\text{ digraphs in }D\}\,\frac{Z(D',u,v)}{n_u n_v} + \#\{v\text{-}u\text{ digraphs in }D\}\,\frac{Z(D',v,u)}{n_u n_v}.$$

Since $Z$ is antisymmetric in $u$ and $v$, that is

$$W(D,u,v)\,\frac{Z(D',u,v)}{n_u n_v}$$

and summing over all $u < v$ completes the proof. □

Note that summing over u < v is just a convenient way to say that we consider all pairs of values, but in one particular order only. The value of c1 is unaffected if we

change the order on the values.

Now that we can approximate the transition probability when a is large, we can

approximate the variation distance from uniform. If the source deck D is fixed,

$$\|P_a - U\| = \frac12\sum_{D'\in\mathcal{O}(D)}|P_a(D\to D') - U(D')| = \frac12\sum_{D'\in\mathcal{O}(D)}\left|\frac{c_1(D,D')}{a} + O\left(\frac{1}{a^2}\right)\right| = \kappa_1(D)a^{-1} + O(a^{-2})$$

where

$$(5.6)\qquad \kappa_1(D) := \frac12\sum_{D'\in\mathcal{O}(D)}|c_1(D,D')| = \frac{n}{4N}\sum_{D'\in\mathcal{O}(D)}\left|\sum_{u<v}\frac{W(D,u,v)Z(D',u,v)}{n_u n_v}\right|$$

Similarly, if the target deck $D'$ is fixed,

$$\|P_a - U\| = \frac12\sum_{D\in\mathcal{O}(D')}|P_a(D\to D') - U(D)| = \kappa_1(D')a^{-1} + O(a^{-2})$$

where

$$(5.7)\qquad \kappa_1(D') := \frac12\sum_{D\in\mathcal{O}(D')}|c_1(D,D')| = \frac{n}{4N}\sum_{D\in\mathcal{O}(D')}\left|\sum_{u<v}\frac{W(D,u,v)Z(D',u,v)}{n_u n_v}\right|$$

with the sum now taken over the source decks $D$ on $\mathcal{O}(D')$.

5.5 All-Distinct Decks

Suppose the source deck consists of n distinct cards. Since the names and order we

give card values are arbitrary, without loss of generality we may assume we have the ordered deck $e = 123\ldots n$. Then

$$W(e,u,v) = \begin{cases}1 & \text{if } v = u+1\\ 0 & \text{otherwise}\end{cases}$$

since all of the digraphs in $e$ are of the form $(u, u+1)$. So since $n_w = 1$ for all values $w$,

$$\kappa_1(e) = \frac n4\, E\left|\sum_{u=1}^{n-1}Z(D',u,u+1)\right|$$

where $D'$ is chosen uniformly from $\mathcal{O}(D)$. All the cards are distinct, so for each $D'$ there is a unique permutation $\pi$ with $D' = \pi e$. $\pi e$ contains a $u$-$(u+1)$ pair if $\pi(u) < \pi(u+1)$, and a $(u+1)$-$u$ pair otherwise. So

$$Z(\pi e, u, u+1) = \begin{cases}1 & \text{if } \pi \text{ has an ascent at } u\\ -1 & \text{if } \pi \text{ has a descent at } u\end{cases}$$

$$\sum_u Z(\pi e, u, u+1) = \sum_u \omega_u(\pi) = \mathrm{asc}(\pi) - \mathrm{des}(\pi)$$

and therefore

$$\kappa_1(e) = \frac n4\, E\,|\mathrm{asc}(\pi) - \mathrm{des}(\pi)|$$

where $\pi$ is uniform on $S_n$. We know the distribution of descents on $S_n$ very well; it is given by the Eulerian numbers. Since $\mathrm{asc}(\pi) - \mathrm{des}(\pi) = n - 1 - 2\,\mathrm{des}(\pi)$, we have

$$\kappa_1(e) = \frac{n}{4\cdot n!}\sum_d\left\langle{n\atop d}\right\rangle|n-1-2d|.$$

From there we can calculate $\kappa_1(e)$ exactly for any size $n$. For instance, when $n = 52$ we find that $\kappa_1(e)$ is

$$\frac{146020943891326775423340146124729913263177343486982212261189487693}{3314356310443124530393681659122442758682178888925184000000000000}$$

which is approximately 44.06. So after giving a deck of 52 distinct cards an $a$-shuffle, where $a$ is large, the variation distance from uniform will be approximately $44.06/a$.

We can compare that with the exact results we got for the same deck in Section 2.10, to see how big $a$ has to be to make the approximation a good one. Figure 5.1 shows the result.

[Figure 5.1 plots variation distance (vertical axis, from 0 to 1, with $\frac12$ marked) against $a$ on a logarithmic scale from 2 to 1024.]

Figure 5.1: The variation distance from uniform, and first-order approximation, of a distinct 52-card deck after an $a$-shuffle. The actual variation distance is graphed in black, and the first order approximation $\kappa_1/a$, with $\kappa_1 = 44.05710497$, is graphed in red. The approximation becomes quite good near the cutoff value.

Suppose $\pi$ is chosen uniformly from $S_n$. We can approximate $\kappa_1(e)$ using the fact that $\mathrm{des}(\pi)$, and therefore $\mathrm{asc}(\pi) - \mathrm{des}(\pi) = n-1-2\,\mathrm{des}(\pi)$, is close to being normally distributed when $n$ is large. $\mathrm{des}(\pi)$ has mean $(n-1)/2$ and variance $(n+1)/12$ [39, p. 216], so

$$E(n-1-2\,\mathrm{des}(\pi)) = n-1-2E(\mathrm{des}(\pi)) = 0$$
$$\mathrm{Var}(n-1-2\,\mathrm{des}(\pi)) = 4\,\mathrm{Var}(\mathrm{des}(\pi)) = \frac{n+1}{3}$$

and therefore

$$X := \frac{\mathrm{asc}(\pi) - \mathrm{des}(\pi)}{\sqrt{\frac{n+1}{3}}}$$

is approximately a standard normal random variable. So

$$\kappa_1(e) = \frac n4\, E\,|\mathrm{asc}(\pi) - \mathrm{des}(\pi)| = \frac n4\sqrt{\frac{n+1}{3}}\,E|X|$$

and

$$E|X| \approx \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}|x|e^{-x^2/2}\,dx = \frac{2}{\sqrt{2\pi}}\int_0^{\infty}xe^{-x^2/2}\,dx = \sqrt{\frac2\pi}\left[-e^{-x^2/2}\right]_0^{\infty} = \sqrt{\frac2\pi}.$$

Therefore

$$\kappa_1(e) \approx n\sqrt{\frac{n+1}{24\pi}}$$

when $n$ is large. This result agrees with the asymptotics on p. 309 of Bayer and

Diaconis [3]. For n = 52, the approximation is 43.60, which has a relative error of

1.04%.
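Both the exact value $\kappa_1(e) = \frac{n}{4\cdot n!}\sum_d\left\langle{n\atop d}\right\rangle|n-1-2d|$ and the asymptotic approximation $n\sqrt{(n+1)/(24\pi)}$ are easy to compute; an illustrative sketch (not from the thesis):

```python
from fractions import Fraction
from math import comb, factorial, pi, sqrt

def eulerian(n, d):
    # explicit formula for the Eulerian number <n over d>
    return sum((-1)**j * comb(n+1, j) * (d+1-j)**n for j in range(d+1))

def kappa1_exact(n):
    total = sum(eulerian(n, d) * abs(n - 1 - 2*d) for d in range(n))
    return Fraction(n * total, 4 * factorial(n))

n = 52
exact = float(kappa1_exact(n))
approx = n * sqrt((n + 1) / (24 * pi))
print(round(exact, 4), round(approx, 2))   # ≈ 44.0571 and 43.6
```

The exact rational value matches the large fraction quoted above, and the relative error of the asymptotic approximation at $n = 52$ is about 1%.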

5.6 Calculation of κ1 when the Source Deck is Fixed

If we a-shuffle a deck D we know that its variation distance from uniform is

\[ ||P_a - U|| = \kappa_1 a^{-1} + O(a^{-2}) \]
where

\[ \kappa_1 = \kappa_1(D) = \frac{n}{4N} \sum_{D' \in \mathcal{O}(D)} \left| \sum_{u<v} \frac{W(D,u,v)\, Z(D',u,v)}{n_u n_v} \right|. \]

5.6.1 Two Card Types

Suppose first that D contains only two card types.

Theorem 5.3 If D ∈ O(1^{n_1} 2^{n_2}) then
\[ \kappa_1(D) = \begin{cases} 0 & \text{if } D(1) = D(n) \\[4pt] \dfrac{(n_1-1)!\,(n_2-1)!}{2\,(n-1)!} \displaystyle\sum_k p(n_1,n_2,k)\left|k - \frac{n_1 n_2}{2}\right| & \text{if } D(1) \neq D(n) \end{cases} \]


Figure 5.2: Representing a deck as a walk on a graph. A source deck D with two types of cards can be represented as a walk on the directed graph shown here. The starting position is the top card of the deck, and the ending position is the bottom card. Each digraph corresponds to traversing an edge. If we start and end at the same position (i.e., if D(1) = D(n)), then we will have walked the 1-2 edge the same number of times as the 2-1 edge, and as a result W (D, 1, 2), and hence κ1(D), will be 0.

where n = n_1 + n_2 and p(n_1, n_2, k) is the number of Young shapes of size k which fit inside an n_1 × n_2 box.

Proof. We have
\[ \kappa_1(D) = \frac{n}{4N} \sum_{D' \in \mathcal{O}(D)} \left| \frac{W(D,1,2)\, Z(D',1,2)}{n_1 n_2} \right| = \frac{n}{4 n_1 n_2}\, |W(D,1,2)|\; E|Z(D',1,2)| \]
where D′ is uniformly distributed on O(D). We can think of D as a walk on a graph with two vertices (Figure 5.2): start at D(1), move to D(2), then to D(3), etc.,

finishing at D(n). Traversal of the edge from 1 to 2 is a 1-2 digraph, and traversal

from 2 to 1 is a 2-1 digraph. If the path begins at D(1) = 1 and ends at D(n) = 2, it must have made one more 1-2 passage than 2-1 passage, so W(D,1,2) = 1. If the path begins at 2 and ends at 1 (D(1) = 2, D(n) = 1), then it made one more trip from 2 to 1 than the other direction, so W(D,1,2) = −1. Otherwise, if D begins and ends with cards of the same value, then it must contain the same number of 1-2 digraphs as 2-1 digraphs, so W(D,1,2) = 0.

So |W(D,1,2)| = [D(1) ≠ D(n)], which means that κ1 is 0 when D(1) = D(n), and otherwise
\[ \kappa_1 = \frac{n}{4 n_1 n_2}\, E|Z(D',1,2)|. \]

Figure 5.3: The deck D′ = 112212112212 represented as a north-east path from (0, 0) to (6, 6). A square in the Young shape λ to the northwest of the path corresponds to a 1-2 pair of D′ by finding the 1 in the square’s column and the 2 in its row. Likewise a square in the inverted, complementary shape λ̄ to the southeast corresponds to a 2-1 pair in D′.

D′ can be described as a north-east walk from (0, 0) to (n_1, n_2), where a 1 represents a step eastward and a 2 represents a step northward (see Figure 5.3). Each 1-2 pair in D′ corresponds to a box inside the Young shape λ northwest of the path, and each 2-1 pair corresponds to a box in the complementary shape to the southeast.

Therefore

\[ Z(D',1,2) = |\lambda| - (n_1 n_2 - |\lambda|) = 2|\lambda| - n_1 n_2 \]
so
\[ \kappa_1 = \frac{n}{4 n_1 n_2}\, E\bigl|2|\lambda| - n_1 n_2\bigr| \]
where λ is chosen uniformly from the Young shapes that fit inside an n_1 × n_2 box. The total number of such shapes is the number of rearrangements of D, \(\binom{n}{n_1}\), so
\[ P(|\lambda| = k) = \frac{p(n_1,n_2,k)}{\binom{n}{n_1}} \]
and simplification completes the proof. □

The generating function for the p(n_1, n_2, k) is a q-binomial coefficient:
\[ \sum_k p(n_1,n_2,k)\, q^k = \binom{n}{n_1}_q = \frac{(n)!_q}{(n_1)!_q\, (n_2)!_q} \]
where (m)_q = 1 + q + ··· + q^{m−1} and (m)!_q = ∏_{i=1}^{m} (i)_q. (See [42, pp. 25–31].) The q-binomial coefficients obey a simple recurrence, making it feasible to calculate κ1

exactly for any reasonably-sized deck of two types of cards.

In the case of a deck D consisting of 26 red cards and 26 black cards, we can make κ1 vanish (that is, make ||P_a − U|| = O(a^{−2})) by ensuring that the top and bottom cards are the same color before we start shuffling. Otherwise, we will have
\[ \kappa_1 = \frac{25!\,25!}{2\cdot 51!} \sum_k p(26,26,k)\,|k-338| = \frac{1355227989329177}{805867616040669} \approx 1.6817. \]
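The calculation just described can be sketched as follows (Python; the helper names are ours, not the dissertation's). It builds the p(n_1, n_2, k) from the q-Pascal recurrence [m choose j]_q = [m−1 choose j−1]_q + q^j [m−1 choose j]_q and then applies Theorem 5.3 in the equivalent form κ1 = n/(4 n_1 n_2 N) Σ_k p(n_1,n_2,k)|2k − n_1 n_2|:

```python
from fractions import Fraction

def gaussian_binomial(n, k):
    """Coefficient list of the q-binomial [n choose k]_q; entry j is p(n-k, k, j)."""
    # row[j] holds the coefficients of [m choose j]_q as m grows
    row = [[1]]
    for m in range(1, n + 1):
        new = [[1]]
        for j in range(1, m + 1):
            a = row[j - 1]                       # [m-1 choose j-1]_q
            b = row[j] if j < len(row) else []   # [m-1 choose j]_q
            shifted = [0] * j + b                # q^j * [m-1 choose j]_q
            size = max(len(a), len(shifted))
            new.append([(a[i] if i < len(a) else 0)
                        + (shifted[i] if i < len(shifted) else 0)
                        for i in range(size)])
        row = new
    poly = row[k]
    while len(poly) > 1 and poly[-1] == 0:       # trim trailing zeros
        poly.pop()
    return poly

def kappa1_two_types(n1, n2):
    """Theorem 5.3 in the case D(1) != D(n)."""
    p = gaussian_binomial(n1 + n2, n1)           # p[j] = shapes of size j in an n1 x n2 box
    total = sum(c * abs(2 * j - n1 * n2) for j, c in enumerate(p))
    N = sum(p)                                   # = binomial(n1+n2, n1)
    return Fraction((n1 + n2) * total, 4 * n1 * n2 * N)
```

For the red/black deck, kappa1_two_types(26, 26) agrees with the value ≈ 1.6817 above.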

5.6.2 For Which Collections of Cards Can κ1 be Zero?

We will write W(D) ≡ 0 to mean that W(D,u,v) = 0 for all choices of card types u and v.

Lemma 5.4 κ1(D) = 0 if and only if W(D) ≡ 0.

Proof. By Equation (5.6), κ1 will be zero if and only if c1(D,D′) = 0 for all rearrangements D′ of D. If W(D) ≡ 0 that is clearly the case, by Theorem 5.2.

Suppose on the other hand that W(D) ≢ 0, and D′ is some rearrangement of D. If c1(D,D′) ≠ 0, then we are done; otherwise, find a pair (i, j) of positions such that W(D,D′(i),D′(j)) ≠ 0 and |i − j| is as small as possible. Then swapping the cards in positions i and j of D′ produces a deck D″ such that c1(D,D″) ≠ 0. □

Now we can characterize those collections of cards which can be arranged into a deck

with κ1 = 0.

Theorem 5.5 Let D₀ be a deck of n cards which have k different values. There is a

deck D ∈ O(D₀) with κ1(D) = 0 if and only if n ≥ 2k − 1.

Proof. Suppose 1 ≤ n < 2k − 1, D is an arrangement of D₀ with W(D) ≡ 0, and k is the minimal number of card values which admits such a counterexample to the theorem. Note k must be at least 2. If there were two or more cards for each value, then n would be at least 2k. So there must be some value v for which there is only one card (i.e., n_v = 1). If v were the first card in D and it were followed by a card labelled u, then W(D,u,v) would be −1. So D cannot begin with v, and likewise

cannot end with v. Therefore v appears somewhere in the middle, surrounded by

two cards of the same type. That is,

\[ D = D_1\, u\, v\, u\, D_2 \]

for some (possibly empty) D1,D2 and some u. But note that

\[ \tilde{D} := D_1\, u\, D_2 \]

has the same digraphs as D, less one u-v digraph and one v-u digraph; therefore W(D̃) ≡ W(D) ≡ 0. But D̃ contains ñ = n − 2 cards and k̃ = k − 1 card types, and n < 2k − 1 implies that ñ < 2k̃ − 1. So D̃ is a counterexample with fewer card values, which contradicts our assumption that k was minimal.

Conversely, suppose that n ≥ 2k − 1. Let D₀ = 1^{n_1} 2^{n_2} ··· k^{n_k}, and assume without loss of generality that n_1 ≥ n_2 ≥ ··· ≥ n_k ≥ 1. Consider the following procedure for ordering D₀. A twin digraph is a digraph of the form uu.

1. Put the cards labelled 1 on the table
2. for v ← 2 to k
3.    Insert the cards labelled v inside some twin digraph

v    n_v    D_v                m_v
0    –      –                  1
1    4      1111               3
2    3      111[222]1          4
3    2      11[33]12221        4
4    1      113312[4]221       3
5    1      1[5]133124221      2
6    1      1513[6]3124221     1
7    1      1513631242[7]21    0

Table 5.1: Ordering a deck so that κ1 = 0. Here we demonstrate how to order the deck 1⁴2³3²4567 in such a way that it will have κ1 = 0, as described in Theorem 5.5. New additions are bracketed. n_v is the number of cards of value v, and m_v is the number of twin digraphs (digraphs of the form uu) in the deck after the v’s have been inserted, representing the capacity for future additions. m_v = m_{v−1} + n_v − 2, so values for which there are 3 or more cards expand capacity, 2-card values leave it unchanged, and singletons decrement it by 1.

See Table 5.1 for an example. Let Dv be the deck on the table after the cards labelled

v are put down. Inserting a packet of cards labelled v between two cards labelled u

creates one u-v digraph, one v-u digraph, and some number of v-v digraphs, which

together have no net effect on W ; therefore

\[ W(D_v) \equiv W(D_{v-1}) \equiv \cdots \equiv W(D_1) \equiv 0 \]

for all v. So the procedure will produce a deck with κ1 = 0 unless we are unable to

execute step 3 at some point.

Let mv be the number of twin digraphs in Dv. If v > 1, the insertion of the v’s

creates n_v − 1 new twin digraphs and eliminates one, so m_v = m_{v−1} + n_v − 2; if we

set m0 = 1, then that is true for v = 1 as well. So

\[ m_v = m_0 + \sum_{u=1}^{v} (n_u - 2) = (n_1 + n_2 + \cdots + n_v) + 1 - 2v. \]
Let s_v be the cumulative average (1/v)(n_1 + n_2 + ··· + n_v). Since the n_v are weakly decreasing with v, the s_v must be as well. So if v < k,
\[ s_v \ge s_k = \frac{n}{k} \ge \frac{2k-1}{k} = 2 - \frac{1}{k} > 2 - \frac{1}{v} \]
which means
\[ m_v = v s_v + 1 - 2v = v\left(s_v - 2 + \frac{1}{v}\right) > 0. \]
So every time we get to step 3, there will be a place available to put new cards. □
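The procedure in the proof is straightforward to simulate. In the sketch below (Python; names are ours) each packet is inserted at the first available twin digraph, so the resulting ordering may differ from the one in Table 5.1 while still having W(D) ≡ 0:

```python
def build_fast_ordering(counts):
    """counts[i] = number of cards of value i+1; assumed weakly decreasing
    with total n >= 2k - 1, as in Theorem 5.5 (otherwise step 3 may fail)."""
    deck = [1] * counts[0]
    for v, nv in enumerate(counts[1:], start=2):
        # step 3: insert the packet v^nv inside the first twin digraph uu
        i = next(j for j in range(len(deck) - 1) if deck[j] == deck[j + 1])
        deck[i + 1:i + 1] = [v] * nv
    return deck

def W(deck, u, v):
    """Net digraph count: #{u-v digraphs} - #{v-u digraphs}."""
    return sum((a, b) == (u, v) for a, b in zip(deck, deck[1:])) \
         - sum((a, b) == (v, u) for a, b in zip(deck, deck[1:]))

deck = build_fast_ordering([4, 3, 2, 1, 1, 1, 1])   # the card counts of Table 5.1
```

Every W(deck, u, v) is then 0, so by Lemma 5.4 any deck produced this way has κ1 = 0.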

So any deck with sufficient redundancy has an ordering whose variation distance decreases as a^{−2} when a-shuffled, instead of the usual a^{−1}. Such an ordering represents

a boon to the shuffler, who would like to decrease the variation distance below his

target as quickly as possible.

5.6.3 Guessing How Large κ1 Can Be

Perhaps a more practical question than “is there a fast ordering?” is, “How slow is

the slowest order?” In other words, shufflers rarely know the exact order of the deck

before they begin shuffling. However, they do know the composition of the deck, so

if they also knew which ordering of the deck had the highest value of κ1, they could

use that as a “worst case scenario” to generate a bound on how much shuffling is

needed.

We suggest here a heuristic for guessing which ordering of a deck is the hardest to

shuffle. We can write
\[ \kappa_1(D) = \frac{1}{2} \sum_{D' \in \mathcal{O}(D)} |c_1(D,D')| = \frac{1}{2}\, E|N c_1(D,D')| \]
where D′ is chosen uniformly from O(D), and N = #O(D). From Theorem 5.2 we have

\[ (5.8)\qquad N c_1(D,D') = \sum_{u<v} \alpha_{uv}\, Z(D',u,v), \qquad \text{where } \alpha_{uv} := \frac{n}{2\, n_u n_v}\, W(D,u,v). \]

Now note that the αuv are antisymmetric, and we can re-write Equation (5.8) as

\[ (5.9)\qquad N c_1(D,D') = \sum_{u<v} \alpha_{uv}\bigl( \#\{u\text{-}v \text{ pairs in } D'\} - \#\{v\text{-}u \text{ pairs in } D'\} \bigr) = \sum_{u,v} \#\{u\text{-}v \text{ pairs in } D'\}\, \alpha_{uv} = \sum_{i<j} \alpha_{D'(i),D'(j)}. \]

So Nc1 is the sum of the n(n−1)/2 identically distributed random variables X_{ij} = α_{D′(i),D′(j)}. Unfortunately they are not independent, so we cannot apply the central limit theorem. But recall that des(π), where π is chosen uniformly from Sn, could be written as

a similar sum, and that variable is approximately normal. So there is reason to hope

that the same may be true in the case of Nc1, and experimental evidence supports

this claim.

The mean value of Nc1 is 0, since the mean value of each Xij is 0. If Nc1 does indeed

turn out to be approximately a normal random variable, then it should be the case,

as in Section 5.5, that
\[ \kappa_1(D) = \frac{1}{2}\, E|N c_1(D,D')| \approx \frac{1}{2}\sqrt{\frac{2}{\pi}}\,\sqrt{\mathrm{Var}(N c_1)} \]
where D′, in the middle expression, is chosen uniformly from O(D). That means the

deck for which Var(Nc1) is the highest should be the one that is hardest to shuffle.

In Appendix E we show that

\[ (5.10)\qquad \mathrm{Var}(N c_1) = \frac{n^2}{12}\left[ \sum_{u<v} \frac{W(D,u,v)^2}{n_u n_v} + E(D) \right] \]
where
\[ (5.11)\qquad E(D) = \begin{cases} \dfrac{1}{n_{D(1)}} + \dfrac{1}{n_{D(n)}} & \text{if } D(1) \neq D(n) \\[4pt] 0 & \text{if } D(1) = D(n). \end{cases} \]

(Note: the E term can be eliminated by thinking of a deck as circular, by inserting

a card with a new value, say 0, between the bottom card and the top card. That

adds two digraphs to the deck, and if we treat the value 0 like any other value, the

effect is to absorb E(D) into the sum. Such a paradigm shift is beyond the scope of this work, unfortunately, since it is not clear how to define pairs, or descents, if the deck is circular. So for now we will continue to think of decks as sequences with a beginning and an end.)

Consider a standard card deck D as it is used in a game (such as go-fish) where the

rank of cards is significant, but their suit is not. Then we may consider all the aces

to be identical, and likewise all the twos, all the threes, etc. So D ∈ O(A^4 2^4 ··· K^4).

is 51, and the size of W(D,u,v) must be between −4 and 4 for each u, v, since there are at most 4 u-v digraphs and at most 4 v-u digraphs. The n_w are all 4, so to

maximize the variance we should maximize the sum of the squares of the W (D,u,v);

and we cannot do better than to make 12 of them have size 4 and 1 of them have size 3. That is achieved by the cyclic deck D = (A23456789TJQK)^4, which has
\[ W(D,A,2) = W(D,2,3) = \cdots = W(D,Q,K) = 4 \quad\text{and}\quad W(D,A,K) = -3. \]

So
\[ \mathrm{Var}(N c_1) = \frac{52^2}{12}\left[ 12\cdot\frac{4^2}{4\cdot 4} + \frac{(-3)^2}{4\cdot 4} + \frac{1}{4} + \frac{1}{4} \right] = \frac{35321}{12} \]
and therefore we expect
\[ \kappa_1(D) \approx \sqrt{\frac{\mathrm{Var}(N c_1)}{2\pi}} = \sqrt{\frac{35321}{24\pi}} \approx 21.6. \]
In the next chapter we will approach this same problem using a Monte Carlo simulation, and we can say with high confidence that κ1(D) is about 21.47. So while the

normal approximation is unproven, it appears to work in practice.
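Equation (5.10) and the resulting estimate are simple to check numerically. A sketch (Python; names are ours), applied to the cyclic go-fish deck:

```python
from fractions import Fraction
from math import pi, sqrt

def var_Nc1(deck):
    """Var(N c_1) via Equations (5.10)-(5.11)."""
    n = len(deck)
    counts = {v: deck.count(v) for v in set(deck)}
    W = {}
    for a, b in zip(deck, deck[1:]):        # tally net digraph counts per unordered pair
        if a != b:
            u, v = min(a, b), max(a, b)
            W[(u, v)] = W.get((u, v), 0) + (1 if (a, b) == (u, v) else -1)
    s = sum(Fraction(w * w, counts[u] * counts[v]) for (u, v), w in W.items())
    if deck[0] != deck[-1]:                 # the E(D) correction term (5.11)
        s += Fraction(1, counts[deck[0]]) + Fraction(1, counts[deck[-1]])
    return Fraction(n * n, 12) * s

gofish = "A23456789TJQK" * 4                # the cyclic go-fish deck
estimate = sqrt(float(var_Nc1(gofish)) / (2 * pi))
```

Here var_Nc1(gofish) gives 35321/12, and the estimate is about 21.6, in line with the Monte Carlo value 21.47 quoted above.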

The upshot, then, is that κ1 for the hardest-to-shuffle go-fish deck seems to be about half of κ1 for a 52-card all-distinct deck (which we found to be about 44 in Section 5.5). So for large a we expect the variation distance from uniform of a go-fish deck to be about half what it would be if all the cards were distinct. Which is to say, whatever our goal for randomness is, we need to give an all-distinct deck one more

2-shuffle than we give to a go-fish deck.

In general the problem of finding the order of a collection of cards that maximizes

Var(Nc1) is beyond the scope of the present work. Consider for example a blackjack

deck. Blackjack is a game played with ordinary playing cards. As with go-fish, suit

is ignored, and in blackjack it is also the case that 10’s, jacks, queens, and kings

are all equivalent. In casinos the game is generally played with several decks mixed

together, but here we assume there is only one; that means a blackjack deck is some

ordering of A^4 2^4 ··· 9^4 T^{16}.

Since the frequencies of the cards are not all the same, different types of digraphs

count differently in the sum in Equation (5.10). The blackjack deck with the highest

value of Var(Nc1) appears to be D = (A23456789)^4 T^{16}, though we will not attempt

to prove it here. The variance is

\[ \mathrm{Var}(N c_1) = \frac{52^2}{12}\left[ 8\cdot\frac{4^2}{4\cdot 4} + \frac{(-3)^2}{4\cdot 4} + \frac{1^2}{4\cdot 16} + \frac{1}{4} + \frac{1}{16} \right] = \frac{96161}{48} \]
so we expect
\[ \kappa_1(D) \approx \sqrt{\frac{\mathrm{Var}(N c_1)}{2\pi}} = \sqrt{\frac{96161}{96\pi}} \approx 17.9. \]
In the next section we will show how to calculate this value exactly.

5.6.4 Finding κ1 for General Source Decks

Here is an algorithm for calculating κ1 exactly for a general deck. Fix a deck D ∈ O(1^{n_1} 2^{n_2} ··· k^{n_k}), and let α_uv be as in Section 5.6.3. For the sake of brevity, define
\[ \varphi(D') := \sum_{u<v} \alpha_{uv}\, Z(D',u,v) = \sum_{i<j} \alpha_{D'(i),D'(j)}. \]
For a vector m of k integers, let e_v denote the v-th standard basis vector, so that
\[ m = m_1 e_1 + m_2 e_2 + \cdots + m_k e_k. \]

Let ||m|| = Σᵢ mᵢ. When all the mᵢ are nonnegative integers we will say that m is “valid”, and think of it as representing a collection of m_1 cards labelled 1, m_2 cards labelled 2, etc.; in that case, ||m|| is the total number of cards in the collection. We write m ≤ n to mean that m is a subcollection of n, i.e., that m and n are valid and m_v ≤ n_v for all v. And by abuse of notation we will also write
\[ \mathcal{O}(m) := \begin{cases} \mathcal{O}(1^{m_1} 2^{m_2} \cdots k^{m_k}) & \text{if } m \text{ is valid} \\ \emptyset & \text{otherwise.} \end{cases} \]
Define f_m to be the ordinary generating function

\[ f_m(t) := \sum_{D' \in \mathcal{O}(m)} t^{\varphi(D')} = \sum_{D' \in \mathcal{O}(m)} t^{\sum_{i<j} \alpha_{D'(i),D'(j)}}. \]
If ||m|| = 1, then m = e_v for some v, and
\[ f_{e_v}(t) = 1. \]

Otherwise, if ||m|| > 1, then a deck D′ ∈ O(m) which ends in v can be thought of as an element D̃ of O(m − e_v), followed by a v. So
\[ f_m(t) = \sum_v \sum_{\tilde D \in \mathcal{O}(m-e_v)} t^{\sum_{i<j} \alpha_{\tilde D(i),\tilde D(j)}} \prod_{\ell} t^{\alpha_{\tilde D(\ell),v}}. \]
Since D̃ contains m_u cards labelled u for each u ≠ v (and α_vv = 0), we can collect the exponents in the product to get
\[ (5.12)\qquad f_m(t) = \sum_v \left( \prod_u t^{m_u \alpha_{uv}} \right) f_{m-e_v}(t). \]
Once we have f_n, we know the distribution of φ on O(n) = O(D), so we can calculate

κ1(D). Let

\[ F_r := \{ f_m(t) : m \le n \text{ and } ||m|| = r \}. \]

Equation (5.12) tells us how to calculate the polynomials in F_r if we know the polynomials in F_{r−1}. To find f_n we will ultimately need to find f_m for all m ≤ n, but we may do so one layer at a time to save on storage space. In other words

1. Set each polynomial in F_1 to 1
2. for r ← 2 to n
3.    Calculate F_r from F_{r−1}
4.    Release storage space used by F_{r−1}
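For decks small enough that no layering is needed, recursion (5.12) can be transcribed directly. In the sketch below (Python; names are ours) the “polynomials” are dictionaries mapping exponents to coefficients, with the exponents kept as exact Fractions since the α_uv are generally not integers:

```python
from fractions import Fraction

def kappa1_source(deck):
    """kappa_1(D) for a fixed source deck D, via recursion (5.12)."""
    types = sorted(set(deck))
    n = len(deck)
    cnt = {v: deck.count(v) for v in types}

    def alpha(u, v):   # alpha_uv = n W(D,u,v) / (2 n_u n_v), antisymmetric
        w = sum((a, b) == (u, v) for a, b in zip(deck, deck[1:])) \
          - sum((a, b) == (v, u) for a, b in zip(deck, deck[1:]))
        return Fraction(n * w, 2 * cnt[u] * cnt[v])

    memo = {}
    def f(m):          # distribution of phi over O(m): {value: number of decks}
        if sum(m) == 0:
            return {Fraction(0): 1}
        if m not in memo:
            out = {}
            for vi, v in enumerate(types):
                if m[vi] == 0:
                    continue
                # appending v after a deck of composition m - e_v shifts phi
                # by sum_u m_u alpha_uv (using alpha_vv = 0)
                shift = sum(m[ui] * alpha(u, v) for ui, u in enumerate(types))
                rest = list(m); rest[vi] -= 1
                for e, c in f(tuple(rest)).items():
                    out[e + shift] = out.get(e + shift, 0) + c
            memo[m] = out
        return memo[m]

    dist = f(tuple(cnt[v] for v in types))
    N = sum(dist.values())
    return sum(abs(e) * c for e, c in dist.items()) / Fraction(2 * N)
```

On two-type decks this agrees with Theorem 5.3: kappa1_source("1122") is 1/2, while kappa1_source("1221"), whose top and bottom cards match, is 0.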

Suppose D ∈ O(A^4 2^4 ··· K^4) represents a normal deck of cards in a game where card

rank matters but suit does not, such as go-fish. Calculating κ1(D) in the manner

described above requires finding 5^{13} = 1,220,703,125 polynomials, though they need not all be stored at once. The size of F_r is the coefficient of x^r in

\[ (1 + x + \cdots + x^{n_1})(1 + x + \cdots + x^{n_2}) \cdots (1 + x + \cdots + x^{n_k}) \]

or (1 + x + x² + x³ + x⁴)^{13} in the case of our go-fish deck D. The coefficients are unimodal and symmetric about the center, so the largest pair is F_25 and F_26, whose sizes together add up to 186,908,632. Unfortunately, that amount of storage space is just out of reach of the technology currently available to the author.

On the other hand, if D ∈ O(A^4 2^4 ··· 9^4 T^{16}) is some blackjack deck, then the size of F_r is the coefficient of x^r in (1 + x + ··· + x⁴)⁹(1 + x + ··· + x^{16}). Once again the coefficients are unimodal and symmetric about 26, so the largest pair is F_25 and F_26, whose sizes total 3,725,154. That amount of storage is available, and the calculation was carried out for the deck

\[ D_{bj} = (A23456789)^4\, T^{16}. \]

(We conjectured in the last section that Dbj is the blackjack deck with the largest value of κ1, and therefore the hardest deck to shuffle.) In this case αuv is the row u, column v entry of

\[ \alpha = \frac{13}{32} \begin{pmatrix}
0 & 16 & 0 & 0 & 0 & 0 & 0 & 0 & -12 & 0 \\
-16 & 0 & 16 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -16 & 0 & 16 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -16 & 0 & 16 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -16 & 0 & 16 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -16 & 0 & 16 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -16 & 0 & 16 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -16 & 0 & 16 & 0 \\
12 & 0 & 0 & 0 & 0 & 0 & 0 & -16 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0
\end{pmatrix} \]
where the card values are given the inherent order A, 2, ..., 9, T. Letting s = t^{13/32} makes Equation (5.12) contain only integer powers of s, which speeds computation.

The result is

\[ \begin{aligned} f_n = {} & s^{-1920} + s^{-1918} + \cdots \\ & + 10825903695806145633110268710665552660771\, s^{-2} \\ & + 11060018158195592709052990843297020304044 \\ & + 10825903695806145633110268710665552660771\, s^{2} \\ & + \cdots + s^{1918} + s^{1920} \end{aligned} \]

from which we conclude that

\[ \kappa_1(D_{bj}) = \frac{7934389009159438485208644722674289791267933}{448997850648843417179198436737241000000000} \approx 17.671. \]

The calculation of fn for Dbj took several hundred hours of computer time.

5.7 Calculation of κ1 when the Target Deck is Fixed

Now we turn our attention to the case where the target deck is fixed and the source

deck is allowed to vary. In Section 2.11 we showed that this is equivalent to fixing a method

of dealing cards to players, in a game where the order in which players get their cards

is not significant (such as bridge). We know that for a given method of dealing D′,
\[ ||P_a - U|| = \kappa_1 a^{-1} + O(a^{-2}) \]
where
\[ (5.13)\qquad \kappa_1 = \kappa_1(D') = \frac{1}{2} \sum_{D \in \mathcal{O}(D')} |c_1(D,D')| = \frac{1}{2}\, E|N c_1(D,D')| \]
and D is chosen uniformly from O(D′), N = #O(D′).

5.7.1 For Which Collections of Cards Can κ1 be Zero?

In Section 5.6.2 we sought to find when a collection of cards could be ordered so that

when dealt, it would shuffle quickly. That is an interesting mathematical question,

but in practice, the shuffler has little control over the state of the deck before he

begins shuffling, unless he is willing to order it by hand. For decks with only two

types of cards, we found a simple trick for making shuffling go faster, but in general

ordering by hand to achieve a favorable starting deck probably takes longer than it

is worth (and will make other game players suspicious).

By contrast, the shuffler/dealer has complete control over how a deck is dealt after

it has been shuffled. So finding the best way to deal, given a recipe for how many

cards should go to each player, is eminently practical.

Let D₀ = 1^{n_1} 2^{n_2} ··· k^{n_k} be such a recipe, and suppose D′ ∈ O(D₀) is some dealing sequence. Let
\[ (5.14)\qquad \beta_{uv} := \frac{n}{2}\,\frac{Z(D',u,v)}{n_u n_v} \]
so that
\[ (5.15)\qquad N c_1(D,D') = \sum_{u<v} \beta_{uv}\, W(D,u,v) \]
for every rearrangement D of D′. That is,
\[ N c_1(D,D') = \sum_{u<v} \beta_{uv}\bigl( \#\{u\text{-}v \text{ digraphs in } D\} - \#\{v\text{-}u \text{ digraphs in } D\} \bigr) = \sum_i \beta_{D(i),D(i+1)}. \]

Lemma 5.6 The following are equivalent:

(i) κ1(D′) = 0

(ii) c1(D,D′) = 0 for all D ∈ O(D′)

(iii) Z(D′,u,v) = 0 for all u, v

(iv) β_uv = 0 for all u, v

Proof. Equation (5.13) makes it clear that (i) and (ii) are equivalent, and likewise

(iii) is the same as (iv) by Equation (5.14). (iv) implies (ii) by Equation (5.15), so it remains only to prove that (ii) implies (iv).

Suppose c1(D,D′) = 0 for all D ∈ O(D′). If D′ contains only one type of card, then (iv) is vacuously true. If k = 2 we have
\[ 0 = N c_1(1^{n_1} 2^{n_2}, D') = \beta_{12} \]

so all the β’s are 0. So suppose k ≥ 3, and let D ∈ O(D′) begin with 123. Then
\[ N c_1(D,D') = \sum_i \beta_{D(i),D(i+1)} = \beta_{12} + \beta_{23} + \sum_{i \ge 3} \beta_{D(i),D(i+1)}. \]
Let D̂ be the same as D, but with the first two cards transposed. Then
\[ N c_1(\hat D,D') = \sum_i \beta_{\hat D(i),\hat D(i+1)} = \beta_{21} + \beta_{13} + \sum_{i \ge 3} \beta_{D(i),D(i+1)}. \]

But c1(D,D′) = c1(D̂,D′) = 0 by assumption, so
\[ \beta_{12} + \beta_{23} = \beta_{21} + \beta_{13}. \]

Likewise, swapping the first two cards of decks that begin with 132 and 231 yields

\[ \beta_{13} + \beta_{32} = \beta_{31} + \beta_{12} \]
\[ \beta_{23} + \beta_{31} = \beta_{32} + \beta_{21}. \]

Since β_uv = −β_vu we can encode our results as
\[ (5.16)\qquad \begin{pmatrix} 2 & -1 & 1 \\ -1 & 2 & -1 \\ 1 & -1 & 2 \end{pmatrix} \begin{pmatrix} \beta_{12} \\ \beta_{13} \\ \beta_{23} \end{pmatrix} = 0. \]
The matrix in Equation (5.16) has determinant 4, so it is invertible, and therefore β12 = β13 = β23 = 0. Likewise all the other β’s vanish as well. □

Theorem 5.7 There is a D′ ∈ O(D₀) with κ1(D′) = 0 if and only if not more than one of the n_v is odd.

Proof. By the lemma, making κ1(D′) vanish is equivalent to making Z(D′,u,v) vanish for all u, v.

Suppose n_u and n_v are both odd. Then for any D′ ∈ O(D₀),
\[ \#\{u\text{-}v \text{ pairs in } D'\} + \#\{v\text{-}u \text{ pairs in } D'\} = n_u n_v. \]

So one of those sizes must be odd and one even, which means that their difference,

Z(D′,u,v), cannot be 0.

If, on the other hand, at most one of the n_v is odd, then it is easy to construct a D′ with κ1(D′) = 0. Assume without loss of generality that n_2, n_3, ..., n_k are all even.

Execute the following procedure:

1. Place the cards labelled 1 in a stack on the table
2. for v ← 2 to k
3.    Place half the cards labelled v on top of the deck
4.    Place the other half on the bottom of the deck

Then for any u < v there are n_v/2 cards of value v below each card labelled u, so n_u n_v/2 u-v pairs in total; likewise the same number of v-u pairs. So Z(D′,u,v) = Z(D′,v,u) = 0. □
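The construction in the proof can be sketched in a few lines (Python; names are ours):

```python
def split_deal(counts):
    """Steps 1-4 above: counts[0] cards of value 1 (may be odd); each later
    count must be even, half placed on top of the deck and half on the bottom."""
    deck = [1] * counts[0]
    for v, nv in enumerate(counts[1:], start=2):
        deck = [v] * (nv // 2) + deck + [v] * (nv // 2)
    return deck

def Z(deal, u, v):
    """Net pair count: #{u-v pairs} - #{v-u pairs}."""
    s = 0
    for i in range(len(deal)):
        for j in range(i + 1, len(deal)):
            if (deal[i], deal[j]) == (u, v):
                s += 1
            elif (deal[i], deal[j]) == (v, u):
                s -= 1
    return s

deal = split_deal([3, 2, 4, 2])    # card counts with at most one odd value
```

Every Z(deal, u, v) vanishes, so by Lemma 5.6 the resulting dealing sequence has κ1 = 0.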

5.7.2 Finding κ1 for General Target Decks

Here is an algorithm for finding κ1 in general. Fix a D′ ∈ O(1^{n_1} 2^{n_2} ··· k^{n_k}) and let β_uv be as in Section 5.7.1. For brevity define
\[ \theta(D) := \sum_{u<v} \beta_{uv}\, W(D,u,v) = \sum_i \beta_{D(i),D(i+1)}. \]

Let
\[ \mathcal{O}_v(m) := \{ D \in \mathcal{O}(m) : \text{the last card of } D \text{ is } v \} = \{ \tilde D v : \tilde D \in \mathcal{O}(m - e_v) \} \]
and
\[ g_{m,v}(t) := \sum_{D \in \mathcal{O}_v(m)} t^{\theta(D)}. \]

D v(m) ∈OX Then g (t) := g (t) tells us the full distribution of θ over (D0), which is what n v n,v O

we need to knowP to calculate κ1.

If ||m|| = 1 then m = e_u for some u, and g_{e_u,v} = [u = v]. Otherwise, if ||m|| > 1,
\[ g_{m,v}(t) = \sum_{\tilde D \in \mathcal{O}(m-e_v)} t^{\theta(\tilde D v)} = \sum_u \sum_{\tilde D \in \mathcal{O}_u(m-e_v)} t^{\theta(\tilde D v)} = \sum_u t^{\beta_{uv}} \sum_{\tilde D \in \mathcal{O}_u(m-e_v)} t^{\theta(\tilde D)}. \]
The last sum is recognizable as our generating function, so
\[ (5.17)\qquad g_{m,v}(t) = \sum_u t^{\beta_{uv}}\, g_{m-e_v,u}(t). \]
Equation (5.17) gives us a way to calculate g_{n,v} recursively. If necessary we can split

the task into “layers”, as we did in Section 5.6.4, though that is not necessary in the

examples that follow.
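For small recipes, a direct transcription of recursion (5.17) suffices (Python sketch; names are ours, and no layering is attempted):

```python
from fractions import Fraction

def kappa1_target(target):
    """kappa_1(D') for a fixed target deck (dealing sequence), via (5.17)."""
    types = sorted(set(target))
    n = len(target)
    cnt = {v: target.count(v) for v in types}

    def beta(u, v):   # beta_uv = n Z(D',u,v) / (2 n_u n_v), antisymmetric
        if u == v:
            return Fraction(0)
        z = 0
        for i in range(n):
            for j in range(i + 1, n):
                if (target[i], target[j]) == (u, v):
                    z += 1
                elif (target[i], target[j]) == (v, u):
                    z -= 1
        return Fraction(n * z, 2 * cnt[u] * cnt[v])

    memo = {}
    def g(m, v):      # theta-distribution over source decks of composition m ending in v
        vi = types.index(v)
        if m[vi] == 0:
            return {}
        if sum(m) == 1:
            return {Fraction(0): 1}
        if (m, v) not in memo:
            out = {}
            rest = list(m); rest[vi] -= 1
            rest = tuple(rest)
            for u in types:
                for e, c in g(rest, u).items():
                    key = e + beta(u, v)
                    out[key] = out.get(key, 0) + c
            memo[(m, v)] = out
        return memo[(m, v)]

    full = tuple(cnt[v] for v in types)
    dist = {}
    for v in types:
        for e, c in g(full, v).items():
            dist[e] = dist.get(e, 0) + c
    N = sum(dist.values())
    return sum(abs(e) * c for e, c in dist.items()) / Fraction(2 * N)
```

For example, kappa1_target("1122") gives 2/3, while kappa1_target("2112"), a recipe of the form produced by Theorem 5.7's construction, gives 0.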

5.7.3 Euchre

Euchre is a card game popular in the Midwestern United States. A number of versions exist. The one we will consider is played with 24 distinct cards (the 9’s,

10’s, jacks, queens, kings, and aces from an ordinary deck). Four players are each dealt five cards. The remaining four cards are stacked on the table and referred to as the “kitty”. The top card of the kitty (the “turn-up card”) is turned face up and may enter play; the other three cards are never seen. Thus there are six types of cards: four hands, the turn-up card, and the rest of the kitty.

Traditionally, a euchre dealer shuffles the cards and then deals three cards to the player on his left, two to his partner across the table, three to the player on his right, and two to himself. Then one more round of dealing, this time two to the left, three across, two to the right, and three to himself. The rest of the cards become the kitty.

So the target deck representing a euchre deal is

\[ D' = D'_{euchre} = 111223334411222334445666 \]
where types 1–4 are the players’ hands, and 5 and 6 are the kitty. The reader may check that Z(D′,u,v) is the row u, column v entry in the matrix

\[ Z(D') = \begin{pmatrix}
0 & 17 & 13 & 17 & 5 & 15 \\
-17 & 0 & 7 & 13 & 5 & 15 \\
-13 & -7 & 0 & 17 & 5 & 15 \\
-17 & -13 & -17 & 0 & 5 & 15 \\
-5 & -5 & -5 & -5 & 0 & 3 \\
-15 & -15 & -15 & -15 & -3 & 0
\end{pmatrix} \]

which makes β_uv the row u, column v entry in
\[ \beta = 12 \begin{pmatrix}
0 & \frac{17}{25} & \frac{13}{25} & \frac{17}{25} & 1 & 1 \\
-\frac{17}{25} & 0 & \frac{7}{25} & \frac{13}{25} & 1 & 1 \\
-\frac{13}{25} & -\frac{7}{25} & 0 & \frac{17}{25} & 1 & 1 \\
-\frac{17}{25} & -\frac{13}{25} & -\frac{17}{25} & 0 & 1 & 1 \\
-1 & -1 & -1 & -1 & 0 & 1 \\
-1 & -1 & -1 & -1 & -1 & 0
\end{pmatrix}. \]
The vector representing the card counts is n = (5, 5, 5, 5, 1, 3), so there are 6 × 6 × 6 × 6 × 2 × 4 = 10368 valid m ≤ n, and therefore we need to calculate 6 × 10368 = 62208 polynomials in order to find g_n(t). The result has 409 terms:

polynomials in order to find gn(t). The result has 409 terms:

2652/25 2556/25 504/5 gn(t)=18t− + 28t− + 720t− + . . .

12/25 12/25 + 4807642667453t− + 8529751549516 + 4807642667453t

+ . . . + 720t504/5 + 28t2556/25 + 18t2652/25.

Note: it must be the case that g_n(t) = g_n(t^{−1}), because θ(ρD) = −θ(D); that is, reversal is a bijection from O(D′) to itself which negates θ. Once we know the generating function for θ on O(D′), we can find its expected absolute value, and get
\[ \kappa_1(D'_{euchre}) = \frac{1}{2}\, E|\theta(D)| = \frac{5133780815153}{787071645360} \approx 6.523. \]

5.7.4 Straight Poker

Straight Poker is a game played with an ordinary deck of 52 cards, all of which are considered distinct. Each player receives a hand of 5 cards. The order in which a player’s cards arrive is immaterial; the player will generally rearrange his cards as soon as he receives them. The cards which are not dealt cannot come into play.

Suppose there are four players. Ordinarily the dealer will shuffle and then deal

cyclically: one card to the left, one card across, one card to the right, and one card

to himself. Then repeat the pattern four more times. Thus the normal dealing

sequence is

\[ D' = D'_{poker} = (1234)^5\, 5^{32} \]

which means
\[ \beta = 26 \begin{pmatrix}
0 & \frac15 & \frac15 & \frac15 & 1 \\
-\frac15 & 0 & \frac15 & \frac15 & 1 \\
-\frac15 & -\frac15 & 0 & \frac15 & 1 \\
-\frac15 & -\frac15 & -\frac15 & 0 & 1 \\
-1 & -1 & -1 & -1 & 0
\end{pmatrix}. \]
We have n = (5, 5, 5, 5, 32), so calculating κ1 requires finding 6^4 × 33 × 5 = 213,840 polynomials. Doing so, we find that

\[ \kappa_1(D'_{poker}) = \frac{1041539930128654272599}{123600572196960202344} \approx 8.427. \]

There is, however, one aspect of dealing a hand of poker that we have so far omitted.

After shuffling the cards, but before dealing them, the dealer normally hands the cards to the player on his left. That player has the option to cut the deck—that is, take some number m of cards off the top of the deck and put them on the bottom.

The effect is to rotate the deal sequence: the (m + 1)st card of the pre-cut deck now goes where the first card used to go, the (m + 2)nd card goes where the second card used to go, etc. So for instance, cutting three cards off the top of a poker deck after shuffling but before dealing is equivalent to changing the deal sequence to

\[ D' = \sigma^3 D'_{poker} = 5^3 (1234)^5\, 5^{29} \]
where σ is the cycle (1, 2, ..., 52).

The tradition of cutting the deck stems at least in part from the fact that players may not trust the dealer/shuffler to be fair. In such a situation, cutting foils an attempt

by the dealer to stack the top of the deck with cards favorable to himself. Game

theory dictates then that the player cutting the deck do so at a uniformly-distributed

position, so that the dealer has as little information as possible about how the deal

will play out.

On the other hand, in many friendly games the players agree implicitly to cooperate

to make the deal as random as possible. In that case, it is natural to ask what effect

the cut has on the variation distance from uniform. It is not difficult to exhaustively

try all 52 possible cuts of a poker deck, and check κ1 for each. The results are in

Figure 5.4. We see that the most helpful cut is after the 16th card, where

\[ \kappa_1(\sigma^{16} D'_{poker}) = \frac{523485619699747366033}{126685078454994859800} \approx 4.132. \]

So making the cut in the right place approximately halves κ1, which is to say, it is

worth one extra shuffle. Note that

\[ \sigma^{16} D'_{poker} = 5^{16} (1234)^5\, 5^{16}, \]

i.e., the unused cards are split evenly between the top and bottom of the deck, making

βu5 = 0 for all u.

Many other games include a cutting step between the shuffle and the deal, and the

same analysis can be carried out for them as well. In euchre, for example, it is most

effective to cut after the 13th or 15th card, as shown in Figure 5.5.

5.7.5 Ordered Deals

Before we continue we should consider the very simplest kind of dealing, “ordered

dealing,” which might also be called “cutting the deck into hands.” The dealer


Figure 5.4: The effect of cutting a poker deck at position m. The dealer in a poker game usually allows the player to his left the chance to cut the deck between shuffling and dealing. The effect is to rotate the deal sequence, which changes the variation distance from uniform. The graph shows κ1 as a function of the position of the cut. κ1 is smallest when the deck is cut at position 16, so cutting there will produce the fairest deal.


Figure 5.5: The effect of cutting a euchre deck at position m. Though less pronounced than in the case of poker, where most of the deck is undealt, cutting a euchre deck can make the deal slightly more random. Cutting at position 13 or position 15 reduces κ1 from 6.523 to 5.849.

simply takes the top n1 cards off the shuffled deck, and gives them to player 1, takes

the next n2 cards and gives them to player 2, etc. In other words, fix the target deck

as

\[ D' = D'_{ordered} = 1^{n_1} 2^{n_2} \cdots k^{n_k}. \]
We found descent polynomials for this case in Section 3.11.

If u < v, all of the cards with value u appear before all of the cards with value v in D′, so Z(D′,u,v) = n_u n_v. So
\[ \beta_{uv} = \frac{n}{2}\,\frac{Z(D',u,v)}{n_u n_v} = \frac{n}{2} \begin{cases} +1 & \text{if } u < v \\ 0 & \text{if } u = v \\ -1 & \text{if } u > v \end{cases} \]
which is to say that β_uv is the row u, column v entry of
\[ \beta = \frac{n}{2} \begin{pmatrix} 0 & 1 & 1 & \cdots & 1 \\ -1 & 0 & 1 & \cdots & 1 \\ -1 & -1 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & -1 & \cdots & 0 \end{pmatrix}. \]
That means that if D is some rearrangement of D′,
\[ N c_1(D,D') = \sum_{u<v} \beta_{uv}\, W(D,u,v) = \frac{n}{2} \sum_{u<v} W(D,u,v). \]

So
\[ N c_1(D,D') = \frac{n}{2} \sum_i \begin{cases} +1 & \text{if } D(i) < D(i+1) \\ 0 & \text{if } D(i) = D(i+1) \\ -1 & \text{if } D(i) > D(i+1) \end{cases} = \frac{n}{2}\bigl( \mathrm{asc}(D) - \mathrm{des}(D) \bigr) \]
where we define ascents and descents for decks in complete analogy to the way we defined them for permutations. So
\[ \kappa_1(D'_{ordered}) = \frac{n}{4}\, E|\mathrm{asc}(D) - \mathrm{des}(D)| \]
where D is selected uniformly from O(D′_ordered). The problem of finding the distribution of des(D) is known as Simon Newcomb’s problem, and solutions may be found

in [31, pp. 187ff] and [39, pp. 216ff]. However, solving Newcomb’s problem does not

immediately imply a comparable answer for asc des, because although asc and des − have the same distribution, they are neither independent nor completely dependent

(as was the case with permutations, when a similar expression arose in Section 5.5).

But we can still use Equation (5.17) to calculate κ1(D′_ordered).

5.7.6 Bridge

Bridge is a card game played with a full (52-card) deck of ordinary playing cards.

All cards are considered to be distinct. Every bridge game has four players, usually

referred to as north, east, south, and west. The entire deck is dealt out to begin the

game, so that each player receives 13 cards. Thus we may consider a bridge deal to

be some element of O(N^{13} E^{13} S^{13} W^{13}). Since there are 13 of each card type, for any

bridge dealing method D′ we will have
\[ \beta_{uv} = \frac{n}{2}\,\frac{Z(D',u,v)}{n_u n_v} = \frac{2}{13}\, Z(D',u,v) \]
so that when D is selected uniformly from O(D′),
\[ (5.18)\qquad \kappa_1(D') = \frac{1}{2}\, E\Bigl| \sum_{u<v} \beta_{uv}\, W(D,u,v) \Bigr| = \frac{1}{13}\, E\Bigl| \sum_{u<v} W(D,u,v)\, Z(D',u,v) \Bigr|. \]

The simplest way to deal a hand of bridge would be the ordered deal described in Section 5.7.5. We will refer to that as D′_ord := N^{13} E^{13} S^{13} W^{13}, and we have
\[ Z(D'_{ord}) = 169 \begin{pmatrix} 0 & 1 & 1 & 1 \\ -1 & 0 & 1 & 1 \\ -1 & -1 & 0 & 1 \\ -1 & -1 & -1 & 0 \end{pmatrix} \]
where we assume an implicit ordering of N < E < S < W, and interpret the entries in

the matrix accordingly. Using Equation (5.17) we can find
\[ \kappa_1(D'_{ord}) = \frac{93574839271687495932003418573}{3352796110343049552452340000} \approx 27.909. \]

The normal way that bridge players deal, however, is cyclically:

\[ D' = D'_{cyc} := (NESW)^{13}. \]

In that case the reader can check that
\[ Z(D'_{cyc}) = 13 \begin{pmatrix} 0 & 1 & 1 & 1 \\ -1 & 0 & 1 & 1 \\ -1 & -1 & 0 & 1 \\ -1 & -1 & -1 & 0 \end{pmatrix} \]
which is to say, Z(D′_cyc, u, v) = (1/13) Z(D′_ord, u, v) for all pairs of card types. It follows then from Equation (5.18) that
\[ \kappa_1(D'_{cyc}) = \frac{1}{13}\,\kappa_1(D'_{ord}) = \frac{7198064559360576610154109121}{3352796110343049552452340000} \approx 2.147. \]
So dealing cyclically works 13 times as well as simply cutting the deck into hands.

That is to say, in the long run, cyclic dealing is worth an extra log₂(13) ≈ 3.7 2-shuffles.
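The factor of 13 is easy to verify directly from the two dealing sequences (Python sketch; names are ours):

```python
def Z_matrix(deal, types):
    """Matrix of Z(deal, u, v) = #{u-v pairs} - #{v-u pairs} for each type pair."""
    def Z(u, v):
        if u == v:
            return 0
        s = 0
        for i in range(len(deal)):
            for j in range(i + 1, len(deal)):
                if (deal[i], deal[j]) == (u, v):
                    s += 1
                elif (deal[i], deal[j]) == (v, u):
                    s -= 1
        return s
    return [[Z(u, v) for v in types] for u in types]

players = "NESW"
ordered = "".join(p * 13 for p in players)   # N^13 E^13 S^13 W^13
cyclic = players * 13                        # (NESW)^13
Zo = Z_matrix(ordered, players)
Zc = Z_matrix(cyclic, players)
```

Every off-diagonal entry of Zo is ±169 while the corresponding entry of Zc is ±13, so Z(D′_cyc, u, v) = Z(D′_ord, u, v)/13 entrywise, as claimed.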

That raises the question: is cyclic dealing optimal? Can we do any better? We know

from Theorem 5.7 that since all four of the card counts are odd, it is impossible to

find a dealing scheme with κ1 = 0.

Even so, the answer is yes, we can do better than cyclic dealing. For any bridge

dealing order $D'$ we may visualize the N-E and E-N pairs of $D'$ by drawing a north-east path such as in Figure 5.3. That is, start at $(0,0)$ and traverse $D'$ from top to

bottom. When an N is encountered, draw a northward segment, and when an E is

encountered, draw an eastward segment. Then the size of the Young shape to the

northwest of the path is the number of E-N pairs, and the size of the complementary

shape to the southeast is the number of N-E pairs.

For ordered bridge dealing, all the N cards come before the E cards, so the path goes

north 13 units and then east 13 units. Thus the northwest shape is empty (i.e., there are no E-N pairs) and the southeast shape is full (there are 169 N-E pairs). The diagram for ordered dealing is shown on the left side of Figure 5.6.

Figure 5.6: Three styles of bridge dealing, represented by lattice paths. The grid on the left represents ordered dealing, the center is cyclic dealing, and the one on the right is back-and-forth dealing. Each grid shows the sequence of N and E cards as north and east line segments respectively. The size of the Young shape to the northwest of the path is the number of E-N pairs in the target deck, and the size of the complementary shape is the number of N-E pairs.

In the case of cyclic dealing, the N and E cards alternate, so the path is the one shown

in the middle of Figure 5.6. $Z(D'_{\text{cyc}}, N, E)$ is the unshaded area minus the shaded area,

which is 13. That is much better than the last case, but still higher than it needs to

be, because the path is always above the diagonal line which divides the grid in half;

to make $Z(D'_{\text{cyc}}, N, E)$ smaller we need to make the shaded and unshaded areas closer

to being equal, and that requires crossing back and forth across the diagonal.

Consider then the deal sequence

$$D'_{\text{bf}} = (NESWWSEN)^6\,NESW.$$

Its N-E path is shown on the right side of Figure 5.6, and the reader may check that

the shaded area is one unit smaller than the unshaded area, making $Z(D'_{\text{bf}}, N, E) = 1$.

The path is the same for any other pair of values too, so

$$Z(D'_{\text{bf}}) = \begin{pmatrix} 0 & 1 & 1 & 1\\ -1 & 0 & 1 & 1\\ -1 & -1 & 0 & 1\\ -1 & -1 & -1 & 0 \end{pmatrix} = \frac{1}{13}\,Z(D'_{\text{cyc}}) = \frac{1}{169}\,Z(D'_{\text{ord}})$$

which means

$$\kappa_1(D'_{\text{bf}}) = \frac{1}{13}\,\kappa_1(D'_{\text{cyc}}) = \frac{7198064559360576610154109121}{43586349434459644181880420000} \approx 0.165.$$

$D'_{\text{bf}}$ corresponds to "back-and-forth" dealing in the sense that the dealer makes one cycle around the table clockwise, then another cycle counterclockwise, then clockwise, counterclockwise, etc., until the cards have all been dealt. What we have shown is that back-and-forth dealing produces a deal which is, after a sufficiently large number of shuffles, 13 times as random as the same deck dealt cyclically. Or, switching from cyclic dealing to back-and-forth dealing is worth 3.7 extra shuffles in the long run.
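The three matrices above are easy to check numerically. The sketch below is hypothetical code (not from the thesis); it assumes only that $Z(D',u,v)$ is the number of $u$-before-$v$ pairs in the deal order minus the number of $v$-before-$u$ pairs, which is exactly the unshaded-minus-shaded area of the lattice-path description:

```python
def Z(deal, u, v):
    """Signed pair count: #(u before v) minus #(v before u) in the deal order."""
    z = seen_u = seen_v = 0
    for card in deal:
        if card == u:
            z -= seen_v          # each earlier v gives a v-before-u pair
            seen_u += 1
        elif card == v:
            z += seen_u          # each earlier u gives a u-before-v pair
            seen_v += 1
    return z

ordered    = "N"*13 + "E"*13 + "S"*13 + "W"*13
cyclic     = "NESW" * 13
back_forth = "NESWWSEN" * 6 + "NESW"

print([Z(d, "N", "E") for d in (ordered, cyclic, back_forth)])  # [169, 13, 1]
```

The same function applied to any other pair of card types gives the same three values, confirming that the three $Z$ matrices differ only by the factors 169, 13, and 1.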

5.7.7 Estimation of κ1 for large decks

We have had more success calculating $\kappa_1(D')$ exactly in this section than we did with

$\kappa_1(D)$ in the last section, where the examples we tried required more computation. But

it would still be helpful to have a way to estimate $\kappa_1(D')$ when the deck is too large to

apply Equation (5.17).

Recall that calculation of $\kappa_1(D')$ was reduced to finding half the expected absolute

value of

$$Nc_1(D,D') = \sum_i \beta_{D(i),D(i+1)}$$

when $D$ is selected uniformly from the orderings of $D'$. The terms of that sum are

identically distributed random variables, but unfortunately not quite independent.

So as in Section 5.6.3, we cannot apply the central limit theorem, but since the terms

are “mostly” independent, we have reason to suspect that the sum is approximately

a normal random variable.

In Appendix F we show that when $D'$ is fixed and $D$ is uniform on $\mathcal{O}(D')$, then

Game                                 D'                    κ1(D')   Approx
Euchre                               1322334212233243563    6.532    6.506
Straight Poker                       (1234)^5 5^32          8.427    8.344
Straight Poker, cut after 16 cards   5^16 (1234)^5 5^16     4.132    4.206
Ordered Bridge                       N^13 E^13 S^13 W^13   27.909   28.201
Cyclic Bridge                        (NESW)^13              2.147    2.169
Back-And-Forth Bridge                (NESWWSEN)^6 NESW      0.165    0.167
52 Hands, 1 Card Each                1·2·3···52            44.06    43.597

Figure 5.7: A summary of $\kappa_1$ examples, with normal estimates. The normal approximations are calculated using the method of Section 5.7.7.

$E(Nc_1) = 0$ and

$$\operatorname{Var}(Nc_1) = \frac{n}{2(n-1)}\left(\frac{n+1}{2}\sum_u\sum_v \frac{Z(D',u,v)^2}{n_u n_v} \;-\; \sum_u \frac{1}{n_u}\Bigl(\sum_v Z(D',u,v)\Bigr)^{\!2}\right).\tag{5.19}$$

If $Nc_1$ is approximately normal we expect that

$$\kappa_1(D') = \frac{1}{2}\,E\bigl|Nc_1(D,D')\bigr| \approx \sqrt{\frac{\operatorname{Var}(Nc_1)}{2\pi}}\tag{5.20}$$

so we have a way to guess at $\kappa_1(D')$ even if we cannot calculate it precisely. Lacking

a proof of approximate normality that would make this a rigorous guess, we present

instead in Table 5.7 the results of the examples in this section, and the corresponding

approximation given by Equations (5.19) and (5.20).
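The estimate of Equations (5.19) and (5.20) can be computed mechanically. The following sketch is hypothetical code, not the program used in the thesis; it assumes that $Z(D',u,v)$ is the signed $u$-before-$v$ pair count of the deal order, and it reproduces the three bridge rows of the table:

```python
from collections import Counter
from math import pi, sqrt

def Z(deal, u, v):
    """Signed pair count: #(u before v) minus #(v before u) in the deal order."""
    z = seen_u = seen_v = 0
    for card in deal:
        if card == u:
            z -= seen_v
            seen_u += 1
        elif card == v:
            z += seen_u
            seen_v += 1
    return z

def kappa1_normal_estimate(deal):
    """sqrt(Var(Nc1)/(2*pi)), with Var(Nc1) as in Equation (5.19)."""
    n = len(deal)
    counts = Counter(deal)
    types = sorted(counts)
    s2 = sum(Z(deal, u, v) ** 2 / (counts[u] * counts[v])
             for u in types for v in types if u != v)
    s1 = sum(sum(Z(deal, u, v) for v in types if v != u) ** 2 / counts[u]
             for u in types)
    var = n / (2 * (n - 1)) * ((n + 1) / 2 * s2 - s1)
    return sqrt(var / (2 * pi))

for deal in ("N"*13 + "E"*13 + "S"*13 + "W"*13, "NESW"*13, "NESWWSEN"*6 + "NESW"):
    print(round(kappa1_normal_estimate(deal), 3))   # 28.201, 2.169, 0.167
```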

5.8 Bounding the Error in the First-Order Estimate

In this section we find a crude bound on the difference between the variation distance

from uniform and the first-order estimate. The bound will depend only on the size

of the deck (n) and the size of the shuffle (a), not on the deck's composition or order.

Fix a deck $D$ of $n$ cards, and fix $a$. Let $N := \#\mathcal{O}(D)$, $S := \#T(D,D)$, and

$$p_d := \frac{1}{a^n}\binom{a+n-d-1}{n}.$$

Then after $D$ is given an $a$-shuffle,

$$\|P_a - U\| = \frac{1}{2}\sum_{D'\in\mathcal{O}(D)}\bigl|P_a(D\to D') - U(D')\bigr| = \frac{1}{2}\sum_{D'\in\mathcal{O}(D)}\,\Bigl|\sum_{\pi\in T(D,D')}\Bigl(p_{\operatorname{des}(\pi)} - \frac{1}{n!}\Bigr)\Bigr|.\tag{5.21}$$

We can expand $p_d$ as

$$p_d = \frac{1}{a^n\,n!}\,(a-d)(a-d+1)\cdots(a-d+n-1) = \frac{1}{n!}\Bigl(1-\frac{d}{a}\Bigr)\Bigl(1-\frac{d-1}{a}\Bigr)\cdots\Bigl(1-\frac{d-(n-1)}{a}\Bigr) = \frac{1}{n!}\bigl(1 + r_d\,a^{-1} + s_d\,a^{-2} + O(a^{-3})\bigr)\tag{5.22}$$

where

$$r_d := e_1(-d,\,1-d,\,\ldots,\,n-1-d) = n\Bigl(\frac{n-1}{2}-d\Bigr)$$
$$s_d := e_2(-d,\,1-d,\,\ldots,\,n-1-d) = \binom{n}{2}\left(\Bigl(\frac{n-1}{2}-d\Bigr)^{\!2} - \frac{n+1}{12}\right).$$

Now recall from the proof of Theorem 5.2 that

$$c_1(D,D') = \frac{n}{2N}\,E\bigl(n-1-2\operatorname{des}(\pi)\mid \pi D = D'\bigr) = \frac{n}{NS}\sum_{\pi\in T(D,D')}\Bigl(\frac{n-1}{2}-\operatorname{des}(\pi)\Bigr).$$

Of course $NS = n!$, which means

$$\frac{\kappa_1(D)}{a} = \frac{1}{2a}\sum_{D'\in\mathcal{O}(D)}|c_1(D,D')| = \frac{1}{2}\sum_{D'\in\mathcal{O}(D)}\,\Bigl|\sum_{\pi\in T(D,D')}\frac{r_{\operatorname{des}(\pi)}}{a\,n!}\Bigr|.\tag{5.23}$$

(In other words, taking the first-order approximation of variation distance is equivalent to substituting the first two terms of Equation (5.22) for $p_d$ in Equation (5.21).)

Combining Equation (5.21) and Equation (5.23) with the facts that $|\sum a_i| \le \sum |a_i|$ and $\bigl||A|-|B|\bigr| \le |A-B|$, we have

$$\Bigl|\,\|P_a - U\| - \frac{\kappa_1}{a}\Bigr| = \frac{1}{2}\left|\,\sum_{D'\in\mathcal{O}(D)}\left(\Bigl|\sum_{\pi\in T(D,D')}\Bigl(p_{\operatorname{des}(\pi)}-\frac{1}{n!}\Bigr)\Bigr| - \Bigl|\sum_{\pi\in T(D,D')}\frac{r_{\operatorname{des}(\pi)}}{a\,n!}\Bigr|\right)\right|$$
$$\le \frac{1}{2}\sum_{D'\in\mathcal{O}(D)}\left|\,\Bigl|\sum_{\pi\in T(D,D')}\Bigl(p_{\operatorname{des}(\pi)}-\frac{1}{n!}\Bigr)\Bigr| - \Bigl|\sum_{\pi\in T(D,D')}\frac{r_{\operatorname{des}(\pi)}}{a\,n!}\Bigr|\,\right|$$
$$\le \frac{1}{2}\sum_{D'\in\mathcal{O}(D)}\,\Bigl|\sum_{\pi\in T(D,D')}\Bigl(p_{\operatorname{des}(\pi)}-\frac{1}{n!}-\frac{r_{\operatorname{des}(\pi)}}{a\,n!}\Bigr)\Bigr|$$
$$\le \frac{1}{2}\sum_{D'\in\mathcal{O}(D)}\sum_{\pi\in T(D,D')}\Bigl|p_{\operatorname{des}(\pi)}-\frac{1}{n!}-\frac{r_{\operatorname{des}(\pi)}}{a\,n!}\Bigr|.$$

But now the two sums together count all $\pi \in S_n$, so we have bounded the error in the first-order estimate by

$$E(n,a) := \frac{1}{2}\sum_d \left\langle{n\atop d}\right\rangle\,\Bigl|p_d - \frac{1}{n!} - \frac{r_d}{a\,n!}\Bigr|.\tag{5.24}$$

A graph of $E(52,a)$ for $1 \le a \le 1024$ is shown in Figure 5.8, on a log-log scale, along with the actual error in the first-order estimate for the case of 52 distinct cards. We see that the bound is not sharp for large $a$.
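Since Equation (5.24) is a single sum over descent counts, a curve like the one in Figure 5.8 can be evaluated exactly in a few lines. The sketch below is hypothetical code, using the Eulerian recurrence of Equation (4.11) from Appendix A and exact rational arithmetic:

```python
from fractions import Fraction
from math import comb, factorial

def eulerian(n):
    """Row n of the Eulerian triangle, via the recurrence of Equation (4.11)."""
    row = [1]
    for m in range(2, n + 1):
        row = [(m - d) * (row[d - 1] if d >= 1 else 0)
               + (d + 1) * (row[d] if d < m - 1 else 0)
               for d in range(m)]
    return row

def error_bound(n, a):
    """E(n,a) of Equation (5.24), computed in exact rational arithmetic."""
    eul = eulerian(n)
    total = Fraction(0)
    for d in range(n):
        p_d = Fraction(comb(a + n - d - 1, n), a**n)
        r_d = n * (Fraction(n - 1, 2) - d)
        total += eul[d] * abs(p_d - Fraction(1, factorial(n))
                              - r_d / (a * factorial(n)))
    return total / 2
```

As a sanity check, $E(2,a) = 0$ for every $a$, since for $n = 2$ the coefficients $s_d$ vanish and $p_d$ is exactly linear in $1/a$.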

From Equation (5.22) we have

$$E(n,a) = \frac{1}{2}\sum_d \left\langle{n\atop d}\right\rangle\,\frac{|s_d|}{a^2\,n!} + O(a^{-3}) = \frac{1}{2a^2}\,E\bigl|s_{\operatorname{des}(\pi)}\bigr| + O(a^{-3})$$

where $\pi$ is uniform on $S_n$. Let $X$ be the random variable $\operatorname{des}(\pi)$; then $X$ has mean $\mu = \frac{n-1}{2}$ and variance $\sigma^2 = \frac{n+1}{12}$, which means

$$s_X = \binom{n}{2}\bigl((X-\mu)^2 - \sigma^2\bigr) = \binom{n}{2}\,\sigma^2\left(\Bigl(\frac{X-\mu}{\sigma}\Bigr)^{\!2} - 1\right).$$

For reasonably large $n$, $X$ is approximately normally distributed, which means $\frac{X-\mu}{\sigma}$ is approximately a standard normal random variable. Therefore

$$E|s_X| \approx \binom{n}{2}\,\sigma^2\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\bigl|t^2-1\bigr|\,e^{-t^2/2}\,dt.$$


Figure 5.8: The bound E(n,a) on the error in the first-order estimate of variation distance, for n = 52, versus the actual error for the case of 52 distinct cards. The bound is in black, the actual error in red.

The integral is

$$\int_{-\infty}^{\infty}\bigl|t^2-1\bigr|\,e^{-t^2/2}\,dt = 2\int_0^1 (1-t^2)e^{-t^2/2}\,dt - 2\int_1^{\infty}(1-t^2)e^{-t^2/2}\,dt = \Bigl[2te^{-t^2/2}\Bigr]_0^1 - \Bigl[2te^{-t^2/2}\Bigr]_1^{\infty} = 4e^{-1/2}$$

so we expect

$$E(n,a) \approx \frac{n^3-n}{12\sqrt{2\pi e}}\,a^{-2}$$

for large $a$.

5.9 Ways to understand the transition between two decks

We have several ways to understand the transition from a deck $D$ to a deck $D'$:

1. The descent polynomial:

$$\mathcal{D}(D,D';x) = \sum_{\pi D = D'} x^{\operatorname{des}(\pi)}$$

2. The shuffle series:

$$\mathcal{S}(D,D';x) = \sum_{a\ge 1}\#(a\text{-shuffles which take $D$ to $D'$})\,x^{a-1}$$

3. The coefficients $c_k(D,D')$, which are such that for all $a \ge 1$,

$$P_a(D\to D') = \sum_k c_k\,a^{-k}$$

And we will shortly add another:

4. The moments of the descent distribution:

$$\mu_k(D,D') := E\Bigl(\frac{n-1}{2}-\operatorname{des}(\pi)\Bigr)^{\!k}$$

where $\pi$ is chosen uniformly from $T(D,D')$.

Equation (2.12) shows how to calculate $\mathcal{S}(D,D';x)$ from $\mathcal{D}(D,D';x)$ and vice-versa.

Theorem 5.1 shows how to get the $c_k$ from $\mathcal{D}(D,D';x)$.

Theorem 5.8 If $a_n(x)$ is the Eulerian polynomial given in Equation (3.4), then

$$\mathcal{D}(D,D';x) = \sum_k c_k(D,D')\,(1-x)^k\,a_{n-k}(x)$$

where $n$ is the number of cards in $D$ and $D'$.

Proof. For any $a$ we have

$$\sum_k c_k\,a^{-k} = P_a(D\to D') = \frac{\#(a\text{-shuffles that take $D$ to $D'$})}{a^n}$$

and therefore

$$\mathcal{S}(D,D';x) = \sum_{a\ge 1}\#(a\text{-shuffles that take $D$ to $D'$})\,x^{a-1} = \sum_{a\ge 1}\Bigl(a^n\sum_k c_k\,a^{-k}\Bigr)x^{a-1} = \sum_k c_k\sum_{a\ge 1}a^{n-k}x^{a-1}.$$

We recognize the inner sum from Equation (3.2) as $\mathcal{S}(1^{n-k}, 1^{n-k}; x)$, and recall that $\mathcal{D}(1^m, 1^m; x) = a_m(x)$. So

$$\mathcal{D}(D,D';x) = (1-x)^{n+1}\,\mathcal{S}(D,D';x) = (1-x)^{n+1}\sum_k c_k\,(1-x)^{-(n-k+1)}\,a_{n-k}(x) = \sum_k c_k\,(1-x)^k\,a_{n-k}(x)$$

as desired. □

Theorem 5.2 gives us a way to calculate $c_1(D,D')$. The following is a generalization which will allow calculation of higher-order coefficients. We assume that $D$ and $D'$ are fixed, and we abbreviate $c_k(D,D')$ by $c_k$ and $\mu_k(D,D')$ by $\mu_k$.

Theorem 5.9 Let $c := (c_0, c_1, \ldots, c_n)^T$ and $\mu := (\mu_0, \mu_1, \ldots, \mu_n)^T$. Then

$$Nc = A_n\mu\tag{5.25}$$

where $N = \#\mathcal{O}(D)$ and $A_n = (a_{ij})_{i,j\in\{0,1,\ldots,n\}}$ with

$$a_{ij} = \binom{n-i+j}{j}\,e_{i-j}\Bigl(-\frac{n-1}{2},\;1-\frac{n-1}{2},\;2-\frac{n-1}{2},\;\ldots,\;\frac{n-1}{2}\Bigr).$$

Proof. If asked to find a term of $e_i(a_1+u,\,a_2+u,\,\ldots,\,a_n+u)$, we could pick a $k \le i$, pick $k$ of the $a$'s, and then from the remaining $n-k$ binomials, pick $i-k$ to contribute a $u$. Thus

$$e_i(a_1+u,\ldots,a_n+u) = \sum_{k=0}^{i} e_k(a_1,\ldots,a_n)\binom{n-k}{i-k}u^{i-k} = \sum_{j=0}^{i}\binom{n-i+j}{j}\,e_{i-j}(a_1,\ldots,a_n)\,u^{j}$$

(the last by substituting $j = i-k$). From Theorem 5.1 we have

$$Nc_i = \frac{N}{n!}\sum_d b_d\,e_i(-d,\,1-d,\,\ldots,\,n-1-d) = E\,e_i(-\operatorname{des}(\pi),\,1-\operatorname{des}(\pi),\,\ldots,\,n-1-\operatorname{des}(\pi))$$

where $\pi$ is chosen uniformly from $T(D,D')$. Let $u := \frac{n-1}{2}-\operatorname{des}(\pi)$. Then

$$Nc_i = E\,e_i\Bigl(u-\frac{n-1}{2},\;u-\frac{n-3}{2},\;\ldots,\;u+\frac{n-1}{2}\Bigr) = E\sum_{j=0}^{i}\binom{n-i+j}{j}\,e_{i-j}\Bigl(-\frac{n-1}{2},\,-\frac{n-3}{2},\,\ldots,\,\frac{n-1}{2}\Bigr)\,u^{j} = \sum_j a_{ij}\,\mu_j$$

as desired. □

Note that $A_n$ depends only on $n$, and that it is lower-triangular. Moreover, $e_m = e_m\bigl(-\frac{n-1}{2},\,-\frac{n-3}{2},\,\ldots,\,\frac{n-1}{2}\bigr)$ is the coefficient of $x^m$ in

$$f_n(x) := \Bigl(1-\frac{n-1}{2}x\Bigr)\Bigl(1-\frac{n-3}{2}x\Bigr)\cdots\Bigl(1+\frac{n-1}{2}x\Bigr) = \left(1-\Bigl(\frac{n-1}{2}\Bigr)^{\!2}x^2\right)\left(1-\Bigl(\frac{n-3}{2}\Bigr)^{\!2}x^2\right)\cdots\tag{5.26}$$

which is an even function of $x$. It follows that $a_{ij} = 0$ when $i-j$ is odd, so $A_n$ is a "checkerboard" matrix:

$$A_n = \begin{pmatrix}
\binom{n}{0} & & & & & \\
0 & \binom{n}{1} & & & & \\
\binom{n-2}{0}e_2 & 0 & \binom{n}{2} & & & \\
0 & \binom{n-2}{1}e_2 & 0 & \binom{n}{3} & & \\
\binom{n-4}{0}e_4 & 0 & \binom{n-2}{2}e_2 & 0 & \binom{n}{4} & \\
\vdots & \ddots & \ddots & \ddots & \ddots & \ddots \\
\cdots & \binom{n-4}{n-4}e_4 & 0 & \binom{n-2}{n-2}e_2 & 0 & \binom{n}{n}
\end{pmatrix}.$$

Aside: Equation (5.26) provides an efficient way to calculate the $a_{ij}$, but they can

also be written in terms of unsigned Stirling numbers of the first kind, for which we

adopt the notation $\left[{n\atop k}\right]$, after [24]. Since

$$\left[{n\atop k}\right] = e_{n-k}(0,\,1,\,\ldots,\,n-1)$$

we can write

$$e_k\Bigl(-\frac{n-1}{2},\,-\frac{n-3}{2},\,\ldots,\,\frac{n-1}{2}\Bigr) = \sum_{m=0}^{k}\left[{n\atop n-k+m}\right]\binom{n-k+m}{m}\Bigl(-\frac{n-1}{2}\Bigr)^{\!m}$$

and therefore

$$a_{ij} = \sum_{m=0}^{i-j}\left[{n\atop n-i+j+m}\right]\binom{n-i+j+m}{n-i,\;j,\;m}\Bigl(-\frac{n-1}{2}\Bigr)^{\!m}.$$

The fact that $A_n$ is lower-triangular means that we need not know all of the $\mu$'s to compute the first few $c$'s: if we know $\mu_0, \mu_1, \ldots, \mu_k$ then we can calculate $c_0, c_1, \ldots, c_k$.

$\mu_0$ is always 1, and Theorem 5.2 can be recast to say that

$$\mu_1 = \frac{1}{2}\sum_{u<v}\frac{W(D,u,v)\,Z(D',u,v)}{n_u n_v}.\tag{5.27}$$

A similar (if longer) calculation gives $\mu_2$. With some more work we could undoubtedly find $\mu_3$, but

since calculating all the $\mu_k$ is equivalent to knowing the descent polynomial, and that

in general is a #P-complete problem, we expect the difficulty in finding $\mu_k$ to be at

least exponential in $k$.

A deck is a palindrome if it reads the same backwards as forwards ($\rho D = D$). The

fact that $A_n$ is "checkered" allows the following observation:

Theorem 5.10 If either $D$ or $D'$ is a palindrome, then $c_k(D,D') = 0$ for all odd $k$.

Proof. Let $n$ be the size of $D$ and let

$$T_d := \{\pi \in S_n : \pi D = D' \text{ and } \operatorname{des}(\pi) = d\}.$$

If $D$ is a palindrome and $\pi \in T_d$ then

$$D' = \pi D = \pi(\rho D) = (\pi\rho)D$$

and we know that $\pi\rho$ has $n-1-d$ descents (see Section 3.3), so $\pi \mapsto \pi\rho$ takes $T_d$

into $T_{n-1-d}$. Likewise the same map takes $T_{n-1-d}$ into $T_d$, so it is a bijection, which

means $\#T_d = \#T_{n-1-d}$.

If $D'$ is the palindrome then the relevant bijection is $\pi \mapsto \rho\pi$, and the result is the

same. So if $\pi$ is uniform on $T(D,D')$ then the distribution of $\operatorname{des}(\pi)$ is symmetric

about $\frac{n-1}{2}$, which means the distribution of $u = \frac{n-1}{2}-\operatorname{des}(\pi)$ is symmetric about 0.

So all the odd moments of $u$ ($\mu_1, \mu_3, \mu_5, \ldots$) vanish. But since $A_n$ is "checkered", $c_k$

is a linear combination of $\mu_k, \mu_{k-2}, \mu_{k-4}, \ldots$, so $c_k = 0$ for all odd $k$. □

Before leaving this topic, consider the case where the source and target decks are

both $1^n$. Any permutation applied to $1^n$ gives back $1^n$, so $T(1^n, 1^n) = S_n$ and

$P_a(1^n \to 1^n) = 1$ for all $a$. So $c_0 = 1$ and $c_1 = c_2 = \cdots = c_n = 0$, which means

$$(1, 0, \ldots, 0)^T = A_n\lambda\tag{5.28}$$

where $\lambda = (\lambda_0, \lambda_1, \ldots, \lambda_n)^T$ and

$$\lambda_k = E\Bigl(\frac{n-1}{2}-\operatorname{des}(\pi)\Bigr)^{\!k}$$

given that $\pi$ is chosen uniformly from $S_n$. The odd $\lambda$'s are 0 by symmetry. Riordan

[39, p. 216] gives a formula for the binomial moments of $\operatorname{des}(\pi)$ when $\pi$ is uniform on $S_n$:

$$E\binom{\operatorname{des}(\pi)}{k} = \frac{1}{n^{\underline{k}}}\genfrac\{\}{0pt}{}{n}{n-k}$$

where $\genfrac\{\}{0pt}{}{n}{n-k}$ is a Stirling number of the second kind, i.e., the number of ways to partition a set of $n$ distinct objects into $n-k$ nonempty subsets, and $n^{\underline{k}} = n(n-1)\cdots(n-k+1)$. From there one can find a formula for each $\lambda_k$; for instance, $\lambda_0 = 1$, $\lambda_2 = \frac{1}{12}(n+1)$, $\lambda_4 = \frac{1}{240}(n+1)(5n+3)$, etc.
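Riordan's formula and the value of $\lambda_2$ can be verified by brute force for small $n$. The following is a hypothetical check (not from the thesis), enumerating $S_6$ directly:

```python
from itertools import permutations
from math import comb
from fractions import Fraction

def stirling2(n, k):
    """Stirling numbers of the second kind, S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def descents(p):
    return sum(1 for i in range(len(p) - 1) if p[i] > p[i + 1])

n = 6
perms = list(permutations(range(n)))
# Riordan's binomial moments: E C(des(pi), k) = S(n, n-k) / (n(n-1)...(n-k+1))
for k in range(1, 4):
    lhs = Fraction(sum(comb(descents(p), k) for p in perms), len(perms))
    falling = 1
    for j in range(k):
        falling *= n - j
    assert lhs == Fraction(stirling2(n, n - k), falling)
# lambda_2 = E((n-1)/2 - des(pi))^2 = (n+1)/12
lam2 = sum((Fraction(n - 1, 2) - descents(p)) ** 2 for p in perms) / len(perms)
assert lam2 == Fraction(n + 1, 12)
print("verified for n =", n)
```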

Now given any pair of decks, subtract Equation (5.28) from Equation (5.25) to get

$$N(0, c_1, c_2, \ldots, c_n)^T = A_n(\mu - \lambda).$$

Since $\mu_0 = \lambda_0 = 1$ no matter what the decks are, we can eliminate the 0th rows and columns to get

$$N(c_1, c_2, \ldots, c_n)^T = \tilde{A}_n\,(\mu_1-\lambda_1,\;\mu_2-\lambda_2,\;\ldots,\;\mu_n-\lambda_n)^T\tag{5.29}$$

where $\tilde{A}_n$ is the same as $A_n$, but with indices from 1 to $n$. We can understand

Equation (5.29) as saying that the closer the moments of a particular transition are to the "universal" moments (taken over $S_n$), the smaller the $c_i$ will be, and the faster

the probability will go to uniform.

CHAPTER VI

Monte Carlo Estimates

Throughout this work we have been restricted by the vast size of the computations involved.

Even a modest deck can generate large numbers; for instance, if $N$ is the number of orderings of a deck, and $S$ is the size of its stabilizer:

Game          Deck                  S                          N
All-distinct  123···52              1                          52! ≈ 8 × 10^67
Go-fish       (A23···K)^4           (4!)^13 ≈ 8.8 × 10^17      9.2 × 10^49
Euchre        1322334212233243563   5!^4 × 3! ≈ 1.2 × 10^9     5.0 × 10^14
Poker         (1234)^5 5^32         5!^4 × 32! ≈ 5.4 × 10^43   1.5 × 10^24
Bridge        (NESW)^13             13!^4 ≈ 1.5 × 10^39        5.4 × 10^28

Symmetry and previous results about Eulerian numbers allowed Bayer and Diaconis

to compute the variation distance of an all-distinct deck after an $a$-shuffle. Among

the other examples, the only number small enough to compute directly is the size of the

stabilizer in euchre; it is not reasonable to compute all the transition probabilities

for any of the decks. Lacking the same kind of symmetry as we had in the all-distinct

case, that is what would be required to compute variation distances exactly.

In Chapter V we deduced the long-term (large $a$) behavior of variation distances in

most of the cases above. We can also approximate the short-term (small $a$) behavior in some cases, using Monte Carlo simulations.

6.1 Approximating Variation Distance Given Descent Polynomials

Recall from Section 2.10 that when a deck D is given an a-shuffle, the variation distance from uniform can be written as

$$\|P_a - U\| = \sum_{D'\in W^-}\bigl(U(D') - P_a(D\to D')\bigr)\tag{6.1}$$

where $W^-$ is the set of "disfavored" decks, whose probability is less than the uniform probability. Let

$$x^+ := \begin{cases} x & \text{if } x > 0\\ 0 & \text{otherwise.}\end{cases}$$

Then we can write Equation (6.1) as

$$\|P_a - U\| = \sum_{D'\in\mathcal{O}(D)}\Bigl(\frac{1}{N} - P_a(D\to D')\Bigr)^{\!+}$$

(where $N = \#\mathcal{O}(D)$), which allows us to avoid worrying about what $W^-$ actually is.

Suppose $D'_1$ is a random variable, uniformly distributed on $\mathcal{O}(D)$, and let $X_1 := \bigl(\frac{1}{N} - P_a(D\to D'_1)\bigr)^+$. Then

$$EX_1 = \sum_{D'\in\mathcal{O}(D)} P(D'_1 = D')\Bigl(\frac{1}{N} - P_a(D\to D')\Bigr)^{\!+} = \frac{\|P_a - U\|}{N}$$

and since $0 \le X_1 \le \frac{1}{N}$, $\operatorname{Var}(X_1) \le \frac{1}{N^2}$. Now let $D'_2, D'_3, \ldots, D'_K$ be

independent with the same distribution as $D'_1$, and let

$$X_i := \Bigl(\frac{1}{N} - P_a(D\to D'_i)\Bigr)^{\!+}, \qquad Y_K := \frac{N}{K}\,(X_1 + X_2 + \cdots + X_K).$$

Then

$$EY_K = \|P_a - U\|$$
$$\operatorname{Var}(Y_K) = \Bigl(\frac{N}{K}\Bigr)^{\!2}\sum_i \operatorname{Var}(X_i) \le \frac{1}{K}.$$

The central limit theorem tells us to expect $Y_K$ to be approximately normally distributed when $K$ is large, which means for $\epsilon > 0$

$$P\Bigl(|Y_K - EY_K| < \frac{\epsilon}{\sqrt{K}}\Bigr) \ge P\Bigl(|Y_K - EY_K| < \epsilon\sqrt{\operatorname{Var}(Y_K)}\Bigr) \approx \frac{1}{\sqrt{2\pi}}\int_{-\epsilon}^{\epsilon} e^{-x^2/2}\,dx.\tag{6.2}$$

(A more rigorous treatment of the error in terms of large deviations can be found in [11].) If $\epsilon = 1.96$ then the right-hand side of Equation (6.2) is .95, so we can be fairly confident that $Y_K$ is within $1.96/\sqrt{K}$ of the variation distance. (Note: we could use the sample variance to come up with a better guess at the error, and we will do something like that in the next section. For now we are content to know that it is reasonably small.)
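For a deck small enough to enumerate every permutation, the whole scheme can be illustrated end to end. The sketch below is a hypothetical toy example, not the thesis's program (which computed transition probabilities via descent polynomials rather than brute force); it computes exact transition probabilities for a 5-card deck and compares $Y_K$ to the exact variation distance:

```python
import random
from itertools import permutations
from math import comb, factorial

def transition_prob(D, Dp, a):
    """Exact P_a(D -> D') for a tiny deck: sum the a-shuffle weight
    C(n + a - 1 - des(w), n) / a^n over all words w with D o w = D'."""
    n = len(D)
    total = 0
    for w in permutations(range(n)):
        if tuple(D[j] for j in w) == tuple(Dp):
            des = sum(1 for i in range(n - 1) if w[i] > w[i + 1])
            total += comb(n + a - 1 - des, n)
    return total / a**n

def monte_carlo_vd(D, a, K, rng):
    """The Y_K estimator: N/K times the sum of (1/N - P_a(D -> D'_i))^+
    over K uniformly random orderings D'_i of D."""
    n = len(D)
    N = factorial(n)
    for card in set(D):
        N //= factorial(list(D).count(card))
    total = 0.0
    for _ in range(K):
        Dp = list(D)
        rng.shuffle(Dp)
        total += max(0.0, 1 / N - transition_prob(D, Dp, a))
    return N * total / K

D = (1, 1, 2, 2, 3)                       # hypothetical 5-card deck, N = 30
est = monte_carlo_vd(D, a=4, K=2000, rng=random.Random(0))
exact = sum(max(0.0, 1 / 30 - transition_prob(D, Dp, 4))
            for Dp in set(permutations(D)))
```

With $K = 2000$ the bound $\operatorname{Var}(Y_K) \le 1/K$ makes the estimate accurate to a few hundredths, matching the $1.96/\sqrt{K}$ confidence radius above.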

The exact same analysis applies if the target deck is fixed instead of the source deck.

For example, we can use the method of Section 3.12 to find descent polynomials for

euchre decks in reasonable time. A sample $D_1, D_2, \ldots, D_K \in \mathcal{O}(D'_{\text{euchre}})$ was generated, for $K = 100{,}000$, and for each such $D_i$ the descent polynomial $\mathcal{D}(D_i, D'_{\text{euchre}}; x)$ was recorded. That allows us to compute the initial segment of the shuffle series for each,

for $a = 1$ up to 1024.


Figure 6.1: A Monte Carlo estimate of the variation distance from uniform, and a first-order approximation, of a euchre deck after an a-shuffle. The Monte Carlo result is graphed in black. It was made by finding exact transition probabilities from 100,000 euchre decks to the deal sequence 1322334212233243563, for all values of a between 1 and 1024, and applying the method of Section 6.1. The first-order approximation κ1/a, with κ1 = 6.523, is graphed in red.

Then for each $a$ we can compute $Y_K$ to get an approximation of the variation distance.

We are fairly confident that the estimate will be within at most $1.96/\sqrt{100000} \approx .0062$ of the actual variation distance. The results are shown in Figure 6.1, along with the first-order estimate we found in Section 5.7.3.

Likewise we can use the method of Section 3.11 to compute descent polynomials

for ordered bridge (target deck $D'_{\text{ord}} = N^{13}E^{13}S^{13}W^{13}$). Computing ordered bridge

probabilities is somewhat faster than euchre probabilities, so the results of 1,000,000

trials are in Figure 6.2.

We could also estimate the variation distance for a sorted go-fish deck at this point,

using the method of Section 3.10. But we will save that for Section 6.3.


Figure 6.2: A Monte Carlo estimate of the variation distance from uniform, and first-order approximation, of an ordered bridge deck after an a-shuffle. The Monte Carlo result is graphed in black. It was made by finding the exact transition probabilities to 1,000,000 decks. The first-order approximation κ1/a, with κ1 = 27.909, is graphed in red.

6.2 Approximating Transition Probabilities

Unfortunately, as we know, it is not always feasible to compute transition probabilities exactly, and in general the computation is a #P-complete problem. For some of the cases we are most interested in (cyclic bridge, back-and-forth bridge, and cyclic go-fish) we have no fast methods to get exact probabilities. In these cases we would like to have a method for approximating transition probabilities.

Fix $D$ and $D'$ and suppose $\mathcal{D}(D,D';x)$ (which is unknown to us) is $\sum_d b_d x^d$. Let $\pi_1$ be a random variable which is uniformly distributed on $T(D,D')$, and let $X_d(\pi_1) := [\operatorname{des}(\pi_1) = d]$. Then

$$EX_d(\pi_1) = P(\operatorname{des}(\pi_1) = d) = \frac{b_d}{S}$$

where $S = \#T(D,D')$. Now let $\pi_2, \pi_3, \ldots, \pi_M$ be independent and identically distributed to $\pi_1$. Then

$$\beta_d := \sum_{i=1}^{M} X_d(\pi_i)$$

has mean $\frac{M}{S}\,b_d$, which means $\frac{S}{M}\,\beta_d$ can be used as an estimate for $b_d$. So approximating variation distance using this method involves two levels of Monte Carlo, and therefore two parameters: $K$, the number of decks sampled, and $M$, the number of permutations sampled per deck.

In order to approximate the variation distance of a cyclic bridge deck after an $a$-shuffle, we chose $K = 1000$ and $M = 10^{10}$. We selected $D_1, D_2, \ldots, D_K$ uniformly from $\mathcal{O}(D'_{\text{cyc}})$, and each $D_i$ was passed to a program running on one of the nodes of the TeraGrid supercomputer cluster, at either the NCSA or San Diego site. The program chose permutations $\pi_1, \pi_2, \ldots, \pi_M$ uniformly from $T(D_i, D'_{\text{cyc}})$ and recorded $\beta_d$ for $d = 0, 1, \ldots, 51$. The output for

$$D_1 = \text{NESSNNWSWEEWSEWNNNSSEEEEESEWWNNESNWESWEWWWSSNSWNNNWS}\tag{6.3}$$

is shown in Table 6.1. It took approximately 9 hours to process each $D_i$, or about

9000 hours total.

Note first that the distribution is basically bell-shaped, with a peak near the mean value of 25.42 (calculated using Equation (5.27)). Viswanath and the author showed in [11] that des is in general approximately normally distributed over $T(D,D')$, for any pair of decks.

Note also that none of the permutations in the sample of $10^{10}$ had fewer than 13 descents. That does not imply that there are no such permutations in $T(D, D'_{\text{cyc}})$; in fact, a trial with a larger value of $M$ did find some permutations with 12 descents.

But permutations with fewer than 13 descents are very rare in the transition set, and

 d  βd           d  βd           d  βd           d  βd
 0  0           13  9           26  1845956914  39  0
 1  0           14  242         27  1440932389  40  0
 2  0           15  3897        28  892376254   41  0
 3  0           16  47156       29  437309298   42  0
 4  0           17  426941      30  168950238   43  0
 5  0           18  2923324     31  51184205    44  0
 6  0           19  15380496    32  12081313    45  0
 7  0           20  62583781    33  2203414     46  0
 8  0           21  198745151   34  306478      47  0
 9  0           22  495578675   35  32263       48  0
10  0           23  975025069   36  2500        49  0
11  0           24  1519174384  37  140         50  0
12  0           25  1878775460  38  9           51  0

Table 6.1: The distribution of des over a sample of $M = 10^{10}$ permutations from $T(D_1, D'_{\text{cyc}})$, where $D_1$ is as in Equation (6.3). The fact that $\beta_d = 0$ for $d < 13$ means only that permutations with fewer than 13 descents are rare in $T(D_1, D'_{\text{cyc}})$, not that they do not exist. So we are unable to guess the real values of $b_d$ for small $d$.

we never happened upon one in our sample of size $10^{10}$. Unfortunately, we have no

basis to say how rare they are, so we cannot make a reasonable guess at $b_0, b_1, \ldots, b_{12}$.

Our confidence in the estimate of $b_d$ which results from $\beta_d$ grows with the size of $\beta_d$,

in the following sense. Since $X_d(\pi_i)$ is a Bernoulli random variable with probability

$p := b_d/S$ of being 1, it has variance $p(1-p)$, which means that $\beta_d$ has variance $Mp(1-p)$.

If $X$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then for any $\epsilon > 0$

$$P\bigl((1-\epsilon)\mu < X < (1+\epsilon)\mu\bigr) = \frac{1}{\sqrt{2\pi}}\int_{-\epsilon\mu/\sigma}^{\epsilon\mu/\sigma} e^{-x^2/2}\,dx.$$

Choosing $\epsilon = 1.96\frac{\sigma}{\mu}$ makes the integral .95, so if we use a single sample of $X$ to guess

at $\mu$, we are 95% confident that our guess will have a relative error less than $1.96\frac{\sigma}{\mu}$.

The Central Limit Theorem tells us that $\frac{S}{M}\beta_d$ is approximately normal, and we know

$$\operatorname{Var}\Bigl(\frac{S}{M}\beta_d\Bigr) = \Bigl(\frac{S}{M}\Bigr)^{\!2}\operatorname{Var}(\beta_d) = \frac{S^2}{M}\,p(1-p).$$

So with high confidence we can say that the relative error in our estimate of $b_d$ is

less than

$$1.96\,\frac{\sqrt{\operatorname{Var}\bigl(\frac{S}{M}\beta_d\bigr)}}{b_d} = 1.96\,\frac{\sqrt{\frac{S^2}{M}p(1-p)}}{pS} = 1.96\sqrt{\frac{1-p}{pM}} < \frac{1.96}{\sqrt{pM}}.$$

βd is an estimate for pM, so we expect the relative error to be less than 1.96/√βd. In

Table 6.1, note that β16 through β35 have 5 or more digits, so we expect the relative

error in estimating b16 through b35 to be less than one percent.

In most Monte Carlo simulations, it is enough to estimate how probable common

events are, and we are free to ignore the rare events. Not so in our case, unfortunately.

From Equation (2.6) we know that the probability of obtaining a permutation $\pi$ from an $a$-shuffle is

$$P_a(\pi) = \frac{1}{a^n}\binom{n+a-1-\operatorname{des}(\pi)}{n}$$

which vanishes if $\operatorname{des}(\pi) \ge a$. So

$$P_a(D\to D'_{\text{cyc}}) = \frac{1}{a^n}\sum_{d=0}^{a-1}\binom{n+a-1-d}{n}\,b_d\tag{6.4}$$

which means that we cannot approximate the transition probability for small $a$ without approximations for the low-numbered $b_d$. Note that the coefficient of $b_d$ in Equation (6.4) is larger the smaller $d$ is, which means that a permutation with a low

number of descents contributes more to the sum than does its counterpart with a

high number of descents.
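With the $\beta_d$ counts of Table 6.1 in hand, Equation (6.4) is a direct computation. The sketch below is hypothetical code, not the thesis's program; it uses the fact that $S = \#T(D,D') = 13!^4$ for any pair of bridge decks (a permutation taking $D$ to $D'$ is determined up to rearranging identical cards):

```python
from math import comb, factorial
from fractions import Fraction

# beta_d counts from Table 6.1 (sample of M = 10**10 permutations)
beta = {13: 9, 14: 242, 15: 3897, 16: 47156, 17: 426941, 18: 2923324,
        19: 15380496, 20: 62583781, 21: 198745151, 22: 495578675,
        23: 975025069, 24: 1519174384, 25: 1878775460, 26: 1845956914,
        27: 1440932389, 28: 892376254, 29: 437309298, 30: 168950238,
        31: 51184205, 32: 12081313, 33: 2203414, 34: 306478, 35: 32263,
        36: 2500, 37: 140, 38: 9}

n, M = 52, 10**10
S = factorial(13)**4            # #T(D, D') = 13!^4 for any bridge pair
N = factorial(52) // S          # number of distinct bridge deals

def transition_estimate(a):
    """Equation (6.4) with b_d replaced by its estimate (S/M) * beta_d."""
    total = sum(comb(n + a - 1 - d, n) * Fraction(S * bd, M)
                for d, bd in beta.items() if d < a)
    return total / Fraction(a)**n

p = transition_estimate(1024)
print(float(p * N))   # close to 1: this deal is nearly uniform after a 1024-shuffle
```

For small $a$ the missing $b_0, \ldots, b_{12}$ matter, exactly as the text explains, so the estimate is only trustworthy for larger $a$.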

We may think of $S_{52}$ as a bag of stones, each representing a permutation. The stones

come in 52 types, with $\left\langle{52\atop d}\right\rangle$ stones of type $d$. The value (probability) of a stone is a decreasing function of its type. So low-numbered stones ($d < 13$) are "precious," in

that they are both rare and valuable; slightly higher-numbered stones ($13 \le d < 16$) are "semi-precious" (less valuable and less rare). Stones numbered from 16 to 35 are

“common,” and higher numbered stones are “semi-obscure” and “obscure” (rare but

not valuable).

In this analogy, a might be called “sophistication.” When a is small, stones with type lower than a are extremely valuable, and stones with larger type are worthless.

The larger a gets, the less important a stone’s type is in determining its value, and

as a approaches infinity the values approach equality.

$T(D, D'_{\text{cyc}})$ is a scoop of stones taken out of $S_n$. We would like to gauge the total

value of the scoop (i.e., $P_a(D\to D'_{\text{cyc}})$). To do so, we sample stones from the scoop and record a histogram of their values. This gives us a good idea of the distribution

and total value of the common stones in the scoop, and a bit of an idea about the

semi-precious and semi-obscure, but no information about the precious and obscure,

other than that they are rare in the scoop (as they are in the full bag).

So the question of whether our estimate is a good one hinges on the total value of

the precious stones in the scoop. If that value is small relative to the total value

of the common stones, then we are in good shape. If our scoop is close to being

representative of the full bag, then for $a \ge 32$, 95% or more of the total value of the scoop will be in the common stones. But it is not obvious that our scoop

should be representative, since it was not chosen at random. In any case, for $a \le 52$ our estimate will more likely be low than high, since we will be undercounting the

precious stones. Using low estimates for $P_a(D\to D'_{\text{cyc}})$ results in possibly higher values of $\bigl(\frac{1}{N} - P_a(D\to D'_{\text{cyc}})\bigr)^{+}$, and therefore a high estimate for $\|P_a - U\|$. So we can say with some confidence that our estimates are upper bounds for the actual

variation distances.


Figure 6.3: A double Monte Carlo estimate of the variation distance from uniform of a cyclic bridge deck ($D'_{\text{cyc}} = (NESW)^{13}$). The Monte Carlo result is graphed in black, and the first-order approximation κ1/a, with κ1 = 2.147, is graphed in red.

The estimated results for cyclic bridge, along with the first-order estimate from

Section 5.7.6, are shown in Figure 6.3. Since the estimates for a < 16 are not

meaningful, we forgo our normal scale and only show larger values of $a$. The estimates for $16 \le a < 32$ should be regarded with some suspicion, but the accuracy grows thereafter.

A separate supercomputer simulation was run for back-and-forth bridge, with $K = 1000$ decks and $M = 10^{9}$ samples per deck. The results are shown in Figure 6.4. As

with cyclic bridge, the accuracy of the estimates for a< 32 is suspect.

Estimates for all three bridge-dealing methods are shown in Figure 6.5. Since the

scale makes the values hard to see for large a, the same estimates are shown in

Figure 6.6 graphed on a log-log scale. Each graph appears linear because the variation

distance approaches κ1/a for large a, and the graphs appear equally spaced because

the κ1's are a factor of 13 apart.


Figure 6.4: A double Monte Carlo estimate of the variation distance from uniform of a back-and-forth bridge deck ($D'_{\text{bf}} = (NESWWSEN)^6(NESW)$). The Monte Carlo result is graphed in black, and the first-order approximation κ1/a, with κ1 = 0.165, is graphed in red.


Figure 6.5: Monte Carlo estimates for three methods of dealing bridge. Ordered bridge is shown in red, cyclic bridge in green, and back-and-forth bridge in blue.


Figure 6.6: The distances of Figure 6.5 graphed on a log-log scale. Ordered bridge is shown in red, cyclic bridge in green, and back-and-forth bridge in blue.

6.3 Approximating $\kappa_1(D)$ and $\kappa_1(D')$

We have one more use for Monte Carlo simulations, and that is to estimate values

for $\kappa_1(D)$ and $\kappa_1(D')$ when decks are too large for the algorithms in Subsections 5.6.4 and

5.7.2. Recall from Equations 5.6 and 5.7 that

$$\kappa_1(D) = \tfrac{1}{2}\,E\bigl|Nc_1(D,D')\bigr| \quad\text{where $D'$ is uniform on $\mathcal{O}(D)$}$$
$$\kappa_1(D') = \tfrac{1}{2}\,E\bigl|Nc_1(D,D')\bigr| \quad\text{where $D$ is uniform on $\mathcal{O}(D')$}$$

and by Theorem 5.2

$$Nc_1(D,D') = \frac{n}{2}\sum_{u<v}\frac{W(D,u,v)\,Z(D',u,v)}{n_u n_v}.$$

Since each $\kappa_1$ is half the expected absolute value of a random variable,

we can approximate them by taking a large sample from the distribution of that

variable, and finding the sample mean.

Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$ which are unknown to us.

Suppose we are given $\epsilon, c > 0$ and asked to find an estimate $\overline{\mu}$ such that

$$P(|\overline{\mu} - \mu| \le \epsilon) \ge c.$$

(For instance, if $c = .95$ and $\epsilon = .01$, that means we want to be 95% confident that

$\overline{\mu}$ is within .01 of the actual mean of $X$.) Let $K$ be some large number, and let

$$\overline{\mu} := \frac{1}{K}\sum_{i=1}^{K} X_i$$

where the $X_i$ are independent and identically distributed to $X$. Then by the Central

Limit Theorem, $\overline{\mu}$ is approximately normally distributed, and

$$\operatorname{Var}(\overline{\mu}) = \Bigl(\frac{1}{K}\Bigr)^{\!2}\sum_{i=1}^{K}\operatorname{Var}(X_i) = \frac{\sigma^2}{K}.$$

Therefore

$$P(|\overline{\mu}-\mu| \le \epsilon) = P\left(\frac{|\overline{\mu}-\mu|}{\sqrt{\sigma^2/K}} \le \frac{\epsilon}{\sqrt{\sigma^2/K}}\right) \approx 2\,\Phi\Bigl(\frac{\epsilon\sqrt{K}}{\sigma}\Bigr) - 1\tag{6.5}$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt.$$

The right-hand side of Equation (6.5) will be at least $c$ when $K \ge \bigl(\frac{\sigma}{\epsilon}\,\Phi^{-1}\bigl(\frac{c+1}{2}\bigr)\bigr)^2$. Of course we do not know the true variance, but assuming the sample size is large we are justified in approximating it with the sample variance:

$$\sigma^2 \approx \overline{\sigma}^2 := \frac{1}{K-1}\left(\sum_{i=1}^{K}X_i^2 - K\overline{\mu}^2\right) = \frac{1}{K-1}\left(\sum_{i=1}^{K}X_i^2 - \frac{1}{K}\Bigl(\sum_{i=1}^{K}X_i\Bigr)^{\!2}\right).$$

So we just need to keep sampling the distribution until

$$K \ge \Bigl(\frac{\overline{\sigma}}{\epsilon}\,\Phi^{-1}\Bigl(\frac{c+1}{2}\Bigr)\Bigr)^{\!2}.\tag{6.6}$$

For example, consider the sorted go-fish deck $D = A^4 2^4 \cdots K^4$. When asked to estimate $\kappa_1(D)$ to within .001 with confidence 99%, our program found target decks


Figure 6.7: A Monte Carlo estimate of the variation distance from uniform of a sorted go-fish deck ($D = A^4 2^4 \cdots K^4$). The Monte Carlo estimate, which resulted from sampling the transition probability to 1,000,000 decks, is graphed in black. The first-order approximation κ1/a, with κ1 = 6.653, is graphed in red.

$D'_1, D'_2, \ldots \in \mathcal{O}(D)$ by permuting $D$ randomly. It found $\phi_i = \frac{1}{2}|Nc_1(D,D'_i)|$ for each, and kept track of $\sum_{i=1}^K \phi_i$ and $\sum_{i=1}^K \phi_i^2$ as $K$ advanced. That allowed calculation of $\overline{\mu}$ and $\overline{\sigma}$ at any point. When $K$ reached 173,590,000 we had $\overline{\mu} = 6.652914$ and

Inequality (6.6) was satisfied, so we can say with 99% confidence that $\mu$ is within

.001 of 6.652914. We are able to calculate exact transition probabilities for $D$ using

the method of Section 3.10, so Figure 6.7 contains both the Monte Carlo and $\kappa_1$

estimates.
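The sequential scheme just described can be sketched in a few lines. The code below is hypothetical (`estimate_mean` is an invented name, and the toy target distribution is ours, not the thesis's); `NormalDist().inv_cdf` plays the role of $\Phi^{-1}$ in Inequality (6.6):

```python
import random
from statistics import NormalDist

def estimate_mean(sample, eps, conf, batch=1000):
    """Draw values from sample() until Inequality (6.6) holds, i.e. until
    K >= (sigma_bar/eps * Phi^{-1}((conf+1)/2))**2; return the sample mean."""
    z = NormalDist().inv_cdf((conf + 1) / 2)
    s = s2 = 0.0
    k = 0
    while True:
        for _ in range(batch):
            x = sample()
            s += x
            s2 += x * x
        k += batch
        var = (s2 - s * s / k) / (k - 1)   # running sample variance
        if k >= (z / eps) ** 2 * var:
            return s / k

rng = random.Random(1)
# toy check: the mean of |Z| for Z standard normal is sqrt(2/pi) = 0.7979...
mu = estimate_mean(lambda: abs(rng.gauss(0.0, 1.0)), eps=0.01, conf=0.95)
```

To estimate $\kappa_1(D)$ one would instead let `sample()` draw a uniform ordering $D'$ of $D$ and return $\frac{1}{2}|Nc_1(D,D')|$, exactly as the program above did.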

In Section 5.6.3 we wanted to calculate the largest possible value of $\kappa_1$ for a go-fish

deck, which we guessed to be $\kappa_1((A23456789TJQK)^4)$, so as to have an upper bound

on how fast a deck goes to uniform. Unfortunately we were stymied by the size of

the computation, and unable to get an exact answer.

Finding a Monte Carlo approximation is much simpler. The program described

above was given the cyclic go-fish deck and parameters  = .001, c = .99. It ran 157

until K = 1,822,610,000, at which time Inequality (6.6) was satisfied, and µ was

21.474320.

APPENDICES

APPENDIX A

Tables of Eulerian and Refined Eulerian Numbers

The Eulerian numbers

\left\langle{n \atop d}\right\rangle = \#\{\pi \in S_n : \mathrm{des}(\pi) = d\}

may be calculated efficiently using the recurrence

\left\langle{n \atop d}\right\rangle = (n-d)\left\langle{n-1 \atop d-1}\right\rangle + (d+1)\left\langle{n-1 \atop d}\right\rangle

(Equation (4.11)). Here they are, for 1 \le n \le 8:

        d=0   d=1    d=2     d=3     d=4    d=5   d=6  d=7
n=1       1
n=2       1     1
n=3       1     4      1
n=4       1    11     11       1
n=5       1    26     66      26       1
n=6       1    57    302     302      57      1
n=7       1   120   1191    2416    1191    120     1
n=8       1   247   4293   15619   15619   4293   247     1
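The recurrence (4.11) translates directly into code. A short sketch (in Python, rather than the C/C++ used elsewhere in the thesis) that rebuilds the rows of the table above:

```python
def eulerian_row(n):
    """Eulerian numbers <n over d> for d = 0..n-1, via Equation (4.11):
       <n,d> = (n-d)<n-1,d-1> + (d+1)<n-1,d>."""
    row = [1]                      # the row for n = 1
    for m in range(2, n + 1):
        prev = row
        row = []
        for d in range(m):
            left = prev[d - 1] if 0 <= d - 1 < len(prev) else 0
            right = prev[d] if d < len(prev) else 0
            row.append((m - d) * left + (d + 1) * right)
    return row
```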

The refined Eulerian numbers

\left\langle{n \atop d}\right\rangle_k = \#\{\pi \in S_n : \mathrm{des}(\pi) = d \text{ and } \pi(1) = k\}

may be calculated by means of Equation (4.12):

\left\langle{n \atop d}\right\rangle_k = (n-d-1)\left\langle{n-1 \atop d-1}\right\rangle_k + (d+1)\left\langle{n-1 \atop d}\right\rangle_k.

Tables of \left\langle{n \atop d}\right\rangle_k for 1 \le n \le 8 appear below. Examining the nth table, we find that:

• The left and right columns are the Eulerian numbers for n−1, since

\left\langle{n \atop d}\right\rangle_1 = \left\langle{n-1 \atop d}\right\rangle = \left\langle{n \atop d+1}\right\rangle_n.

• Each column sums to (n−1)!, since there are (n−1)! permutations in S_n which begin with any particular k.

• The dth row sums to \left\langle{n \atop d}\right\rangle, since there are \left\langle{n \atop d}\right\rangle permutations in S_n with d descents.

• Each table, like a good crossword puzzle, is symmetric about its center. This follows from Equation (4.8):

\left\langle{n \atop d}\right\rangle_k = \left\langle{n \atop n-1-d}\right\rangle_{n+1-k}.

• The rows of each table are weakly decreasing for d < \frac{n-1}{2}, weakly increasing for d > \frac{n-1}{2}, and unimodal if d = \frac{n-1}{2}, in accordance with Theorem 4.7.

A table for \left\langle{n \atop d}\right\rangle^k can be constructed by flipping the table for \left\langle{n \atop d}\right\rangle_k left-to-right or top-to-bottom, by Equation (4.8):

\left\langle{n \atop d}\right\rangle^k = \left\langle{n \atop d}\right\rangle_{n+1-k} = \left\langle{n \atop n-1-d}\right\rangle_k.

n = 1        k=1   Total
d=0            1       1
Total          1       1

n = 2        k=1   k=2   Total
d=0            1     0       1
d=1            0     1       1
Total          1     1       2

n = 3        k=1   k=2   k=3   Total
d=0            1     0     0       1
d=1            1     2     1       4
d=2            0     0     1       1
Total          2     2     2       6

n = 4        k=1   k=2   k=3   k=4   Total
d=0            1     0     0     0       1
d=1            4     4     2     1      11
d=2            1     2     4     4      11
d=3            0     0     0     1       1
Total          6     6     6     6      24

n = 5        k=1   k=2   k=3   k=4   k=5   Total
d=0            1     0     0     0     0       1
d=1           11     8     4     2     1      26
d=2           11    14    16    14    11      66
d=3            1     2     4     8    11      26
d=4            0     0     0     0     1       1
Total         24    24    24    24    24     120

n = 6        k=1   k=2   k=3   k=4   k=5   k=6   Total
d=0            1     0     0     0     0     0       1
d=1           26    16     8     4     2     1      57
d=2           66    66    60    48    36    26     302
d=3           26    36    48    60    66    66     302
d=4            1     2     4     8    16    26      57
d=5            0     0     0     0     0     1       1
Total        120   120   120   120   120   120     720

n = 7        k=1   k=2   k=3   k=4   k=5   k=6   k=7   Total
d=0            1     0     0     0     0     0     0       1
d=1           57    32    16     8     4     2     1     120
d=2          302   262   212   160   116    82    57    1191
d=3          302   342   372   384   372   342   302    2416
d=4           57    82   116   160   212   262   302    1191
d=5            1     2     4     8    16    32    57     120
d=6            0     0     0     0     0     0     1       1
Total        720   720   720   720   720   720   720    5040

n = 8        k=1    k=2    k=3    k=4    k=5    k=6    k=7    k=8   Total
d=0             1      0      0      0      0      0      0      0       1
d=1           120     64     32     16      8      4      2      1     247
d=2          1191    946    716    520    368    256    176    120    4293
d=3          2416   2416   2336   2176   1952   1696   1436   1191   15619
d=4          1191   1436   1696   1952   2176   2336   2416   2416   15619
d=5           120    176    256    368    520    716    946   1191    4293
d=6             1      2      4      8     16     32     64    120     247
d=7             0      0      0      0      0      0      0      1       1
Total        5040   5040   5040   5040   5040   5040   5040   5040   40320
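For small n the tables above can be regenerated by brute force, which also provides a check on Equation (4.12) and the symmetry (4.8). A sketch:

```python
from itertools import permutations

def refined_eulerian(n):
    """T[d][k-1] = #{pi in S_n : des(pi) = d and pi(1) = k}, by direct enumeration."""
    T = [[0] * n for _ in range(n)]
    for pi in permutations(range(1, n + 1)):
        des = sum(1 for i in range(n - 1) if pi[i] > pi[i + 1])
        T[des][pi[0] - 1] += 1
    return T
```

For n = 6 this reproduces the table above: the row d = 2 is 66, 66, 60, 48, 36, 26, each column sums to 5! = 120, and the table is symmetric about its center.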

APPENDIX B

Excerpts from Calcul des Probabilit´es (1912) by Henri Poincar´e

B.1 Introduction, pp. 13–15

Let us pass to an entirely different example, where especially the complexity of the causes intervenes; I suppose that a player shuffles a deck of cards. Each shuffle changes the order of the cards, and can change them several ways. To simplify the example, let us suppose there are only three cards. The cards which, before the shuffle, occupied positions 1 2 3 will, after the shuffle, occupy the positions

123, 231, 312, 321, 132, 213.

Each one of these six re-orderings is possible, and they have probabilities

p1,p2,p3,p4,p5,p6 respectively. The sum of these six numbers is equal to 1, but that is all we know; these six probabilities depend on the practices of the player, whom we do not know.

With the second and subsequent shuffles, that will start again and under the same conditions; I want to say that p4, for example, always represents the probability that 164

the three cards which, after the nth shuffle and before the (n+1)st, occupy positions

1 2 3, will occupy positions 3 2 1 after the (n + 1)st shuffle. And that remains true, whatever the number n, since the practices of the player, his way of shuffling, remain

the same.

But if the number of the shuffles is very large, the cards which, before the first shuffle,

occupied positions 1 2 3, will be able, after the last shuffle, to occupy the positions

123, 231, 312, 321, 132, 213,

and the probability of these six orderings will be appreciably the same and equal

to 1/6; and that will be true, whatever the unknown numbers p_1, \ldots, p_6. The great number of the shuffles, i.e. the complexity of the causes, produced the uniformity.

That would apply without change if there were more than three cards; but, even

with three cards, the demonstration would be complicated; I will be satisfied here to

give it only for two cards.1 We have nothing any more but two orderings

12, 21

with the probabilities p_1 and p_2 = 1 - p_1. Let us suppose n shuffles, and suppose that I gain 1 franc if the cards are finally in the initial order, and that I lose 1 franc

if they are finally inverted. Then, my expectation will be

(p_1 - p_2)^n.

The difference p_1 - p_2 is certainly smaller than 1; so that, if n is very large, my

expectation will be null; we do not need to know p1 and p2 to know that the play is

equitable.

1See a more complete calculation in the Chapter entitled: Diverse Questions. 165

There would be an exception, however, if one of the numbers p1 and p2 were equal to 1 and the other equal to 0. That would not work any more, because our initial assumptions would be too simplistic.

B.2 Chapter XVI, “Diverse Questions”, pp. 301–313

225. Card Shuffling. In the introduction I considered some problems relating to a player who shuffles a deck of cards. Why, when the deck is shuffled long enough, do we admit that all permutations of the cards, i.e. all the orders in which the cards can be arranged, must be equally probable? It is what we will examine more closely.

Let q be the number of the cards; and let Si be an arbitrary permutation, i.e. the operation which consists in making pass to position α the card which before the permutation occupied position β; α being a given function of β. The total number of the possible permutations is q!

There will be a certain order of the cards which we will regard as normal, and which we will indicate by S0; and we will represent by Si the order of the cards when, originally arranged in the normal order, they undergo permutation Si. Thus S0 will represent at the same time the normal order and the identity permutation, that which does not alter the order of the cards. That posed, two consecutive permutations Si and Sj will be equivalent to a single permutation Sk, which I will express by the relation

(B.1) SiSj = Sk.

A set of permutations forms a group when the product of any two permutations in 166

the set belongs to the set. Thus

S0,S1,...,Sr

are various permutations of a group G and corresponding orders of the cards. Let

us suppose that we know that the order of the play belongs to this group, and that

the various permutations of the group have as respective probabilities

p0,p1,...,pr

so that

p_0 + p_1 + \cdots + p_r = 1.

We can symbolically represent this law of probability by a complex number. It is

known that one inaugurated complex numbers of the form

X = x_0 e_0 + x_1 e_1 + \cdots + x_r e_r,

where the x are ordinary quantities and the e complex units. The operations on these complex numbers are done according to the ordinary rules of calculation, with the difference that multiplication, which remains distributive and associative, might not be commutative. One defines a system of complex numbers, by giving oneself the rule of multiplication, i.e. by defining the product eiej of two unspecified complex

units.

We will define a system of complex numbers corresponding to our group G; with

each permutation Si of this group we will associate a complex unit ei; and if, as in

Equation (B.1), SiSj = Sk, we will agree that the product eiej is equal to ek. This

rule is acceptable, since it is associative. 167

We will be able then to represent symbolically the law of probability under consid-

eration by the complex number

P = p_0 e_0 + p_1 e_1 + \cdots + p_r e_r.

226. The player who shuffles the cards acts so that with each shuffle, there is a

probability pi that the cards are permuted according to Si. This law of probability,

which is unknown to us, is represented symbolically by P = \sum p_i e_i. If one left the normal order, the probability that after a shuffle one has the order S_i will be p_i,

so that the probability distribution of the various orders will still be represented

symbolically by P . If instead of leaving the normal order S0 we had started from

an unspecified order Sj, the probability distribution would have been represented by

ejP . If before the shuffle the probability distribution were represented by complex

number Q, it will be QP after the shuffle. If thus we leave the normal order and

shuffle n times, the probability distribution will be represented finally by the complex

number P n.

What we want to show is that if n is very large one will have appreciably

P^n = \frac{1}{r+1}(e_0 + e_1 + \cdots + e_r),

i.e. all the possible orders will be equally probable. And this result will be independent of P, i.e. of the unknown law of probability of the unknown practices of the

player.
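Poincaré's claim — that P^n tends to the uniform element of the group algebra whatever the law P — can be checked numerically for three cards. The sketch below (Python; the six probabilities are an arbitrary assumed law with all p_i > 0) convolves the law of one shuffle with itself repeatedly:

```python
from itertools import permutations

perms = list(permutations(range(3)))          # the six orders of three cards

def compose(s, t):
    """(s t)(i) = s(t(i)): apply t, then s."""
    return tuple(s[t[i]] for i in range(3))

def convolve(p, q):
    """Law of two successive shuffles with laws p and q."""
    r = {s: 0.0 for s in perms}
    for s in perms:
        for t in perms:
            r[compose(s, t)] += p[s] * q[t]
    return r

# An arbitrary (assumed) shuffling law with every p_i positive.
P = dict(zip(perms, [0.3, 0.2, 0.1, 0.15, 0.1, 0.15]))
law = P
for _ in range(40):                           # 41 shuffles in total
    law = convolve(law, P)
# Every order now has probability very close to 1/6.
```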

227. M. Cartan introduced into the theory of complex numbers the concept of the

characteristic equation. Let A be a given complex number, X an unknown complex

number, and ω an unknown ordinary number; let us consider the equation

(B.2)\qquad AX = \omega X.

The two sides are complex numbers, and by equating the coefficients of e0, e1,...,er, there will be r + 1 equations in the r + 1 coefficients xi of the unknown complex number X; these equations are linear on the one hand compared to xi, and on the other hand compared to ω and with the r + 1 coefficients ai of A. We thus have r +1 linear equations compared to r+1 unknown xi. Let us write that the determinant, ∆ of these equations is null; we will have an algebraic equation of order r +1 which will determine ω. According to the theorem of linear substitutions, to each simple root of this equation ∆ = 0 will correspond a complex number X satisfying Equation (B.2).

With a double root there will be two complex numbers X and X1, such that

(B.2 bis)\qquad AX = \omega X, \qquad AX_1 = \omega X_1 + \epsilon_1 X.

With a triple root, there will be three numbers X,X1,X2, such that

(B.2 ter)\qquad AX = \omega X, \qquad AX_1 = \omega X_1 + \epsilon_1 X, \qquad AX_2 = \omega X_2 + \epsilon_2 X_1,

and so on; the \epsilon are ordinary constant numbers which one can suppose equal to 0 or 1. Let us notice that if \epsilon_1 = \epsilon_2 = 0, one will have

A(\lambda X + \lambda_1 X_1 + \lambda_2 X_2) = \omega(\lambda X + \lambda_1 X_1 + \lambda_2 X_2)

whatever the constants λ,λ1,λ2 (which are, of course, ordinary numbers).

In the case where the complex numbers form a group there are some simplifications.

If X is an unspecified complex number and

e_i X = Y = \sum_j y_j e_j,

then it is seen immediately that the y_i are the x_i arranged in a different order. It is the same if one poses X e_i = Y; one has besides

e_0 X = X e_0 = X,

so that Equation (B.2) can be written

(A - \omega e_0) X = 0.

228. I form the characteristic equation of the number P

228. I form the characteristic equation of the number P:

PX = \omega X,

and I propose to show first that it has a root equal to 1, and then that all the other roots are no more than 1 in absolute value.

That is to say, if

PX = \sum_i y_i e_i,

one will have

y_i = \sum p_k x_h,

the indices i, k and h being bound by the relation e_k e_h = e_i, so that, if I write

y_i = \sum_h p_{h,i} x_h,

the p_{h,i} will be the p_h in another order. Then Equation (B.2) will give us

(B.3)\qquad \sum_h p_{h,i} x_h = \omega x_i.

We will be able to satisfy Equation (B.3) by taking all the x_i to be equal, from where we will deduce

\sum_h p_{h,i} = \omega.

But

\sum_h p_{h,i} = \sum_h p_h = 1,

hence ω = 1, which shows that there is a root equal to 1. With regard to the other

roots, Equation (B.3) will give us

(B.4)\qquad \sum_h |p_{h,i}\,x_h| \ge |\omega x_i|,

or, since all the p are real and positive,

\sum_h p_{h,i}\,|x_h| \ge |\omega|\,|x_i|.

Adding all these inequalities gives

\sum_i \sum_h p_{h,i}\,|x_h| \ge |\omega| \sum_i |x_i|.

But

\sum_i p_{h,i} = 1.

Thus

\sum_i \sum_h p_{h,i}\,|x_h| = \sum_h |x_h|,

so

\sum_h |x_h| \ge |\omega| \sum_i |x_i|,

and therefore

|\omega| \le 1.

Thus no root can be larger than 1 in absolute value.

C.Q.F.D.

229. Suppose ω is a root different from 1; let us consider a complex number pertaining

to this ω root, i.e., according to the phraseology adopted by M. Cartan, one of the

numbers X,X1,..., such that

PX = \omega X, \qquad PX_1 = \omega X_1 + \epsilon_1 X.

I say that, for all these complex numbers,

\sum_i x_i = 0.

Indeed, let us replace in all our complex numbers all the complex units e_i by the ordinary unit; the equalities which can exist between these complex numbers will remain. If one has

P = \sum p_i e_i, \qquad X = \sum x_i e_i, \qquad X_1 = \sum x_i^1 e_i,

these complex numbers after substitution will become respectively

\sum p_i = 1, \qquad \sum x_i, \qquad \sum x_i^1,

and our equalities will become

\sum x_i = \omega \sum x_i, \qquad \sum x_i^1 = \omega \sum x_i^1 + \epsilon_1 \sum x_i,

and consequently, if \omega is not equal to 1,

\sum x_i = 0, \qquad \sum x_i^1 = 0.

230. We said that there is a root equal to 1; it remains to be known whether there can be several roots whose modulus is equal to 1, or if 1 is not a multiple root. So that the inequality (B.4) is reduced to an equality, it is necessary that all the xi have the same modulus, and, as these xi are given only by their ratios, we can suppose that they are all real and positive. As the p are all real and positive, it will be the same for ω, i.e. we will have

ω =1.

If we let xj be the largest of all xi we will have

\sum_h p_{h,j}\,x_h \le \sum_h p_{h,j}\,x_j = x_j,

this inequality reducing to an equality only if every p_{h,j} with x_h < x_j is null. But Equation (B.3) gives us

\sum_h p_{h,j}\,x_h = \omega x_j = x_j.

We must thus conclude that

p_{h,j} = 0 \quad \text{if} \quad x_h < x_j,

which shows us that if none of the pi and consequently none of the phj are null, then all the xi will be equal.

We will say that the permutation Si belongs to category C if xi is equal to xj, i.e. to the largest component of x.

I will say in addition that Sk belongs to the set E if, whenever Sj belongs to category

C, the same can be said of

S_h = S_k^{-1} S_j.

The condition necessary so that p_k = p_{h,j} can be other than null is that S_k belongs to E.

I say now that the E constitutes a subgroup of G. If Sk and Se belong to E, then

S_j cannot belong to C without the same being true of S_k^{-1} S_j and S_e^{-1} S_j. But then it will have to be the same for

S_k^{-1}(S_e^{-1} S_j) = (S_e S_k)^{-1} S_j,

which means that S_e S_k will also belong to E.

Thus the circumstance which occupies us will only occur if all the pk are null, except those which correspond to the permutations of a certain subgroup, i.e. if it is known 173

in advance that the player by shuffling the cards will never carry out anything but

permutations belonging to this subgroup.

If one leaves aside this single exception, the equation

PX = X

cannot be satisfied unless all the xi are equal.

231. A possibility would still remain; one could suppose that there is a complex

number X1 such as

(B.5)\qquad PX_1 = X_1 + \epsilon_1 X, \qquad \epsilon_1 \ne 0,

X = e_0 + e_1 + \cdots + e_r,

and that the root 1 is multiple. But from Equation (B.5) one deduces

P^n X_1 = X_1 + n\epsilon_1 X.

The coefficients of X_1 and \epsilon_1 X do not depend on n; those of P^n depend on n, but they

remain real and positive, and their sum remains equal to 1, since P n symbolically

represents the law of the probabilities after n shuffles.

The coefficients of P^n X_1 thus remain bounded. On the contrary, those of X_1 + n\epsilon_1 X

are polynomials of the first degree in n; they cannot thus be bounded; Equation (B.5)

is thus impossible.

If we take again the equation of the form

PX = ωX,

we saw that ω is given by an equation of order r +1; and that to a root of order

of multiplicity µ, pertain µ distinct complex numbers. The sum of the orders of

multiplicity being r +1, there will be r +1 complex numbers belonging to the various roots, and they will be linearly independent. An unspecified complex number can thus be regarded as a linear combination of those which belong to the various roots.

In the case which occupies us, only one number belongs to the root 1, it is

e_0 + e_1 + \cdots + e_r;

all the others belong to roots < 1 in absolute value, and the sum of the coefficients of each one of these complex numbers is null. It results from there that any complex number, such that the sum of its coefficients is null, can be looked at as a linear combination of the complex numbers which belong to the roots < 1 in absolute value.

232. Let us consider now a root \omega such that |\omega| < 1; we will have, if this root is multiple,

PX = \omega X, \qquad PX_1 = \omega X_1 + \epsilon_1 X, \qquad PX_2 = \omega X_2 + \epsilon_2 X_1,

and we will easily deduce

P^n X = \omega^n X, \qquad P^n X_1 = \omega^n X_1 + n\omega^{n-1}\epsilon_1 X,

P^n X_2 = \omega^n X_2 + n\omega^{n-1}\epsilon_2 X_1 + \frac{n(n-1)}{2}\,\omega^{n-2}\epsilon_1\epsilon_2 X.

Since \omega^n, n\omega^{n-1}, \frac{n(n-1)}{2}\omega^{n-2}, \ldots tend towards zero when n grows indefinitely, one sees that, for a complex number X pertaining to a root < 1 in absolute value, one has

\lim P^n X = 0 \qquad (n = \infty).

But any complex number is a combination of those which belong to the roots < 1, provided that the sum of its coefficients is equal to zero. 175

One will thus have

\lim P^n X = 0

whenever the sum of the coefficients of X is null.

If on the contrary X belongs to root 1, i.e. if all its coefficients are equal, one will have

P^n X = X.

If X is an unspecified complex number, we will be able to pose

X = S X_0 + X', where S is the sum of the coefficients of X,

X_0 = \frac{1}{r+1}(e_0 + e_1 + \cdots + e_r),

and the sum of the coefficients of X' is null. One will have then

\lim P^n X = S X_0.

Let us notice now that

X_0 X = X X_0 = S X_0.

One will have indeed

X = \sum_i x_i e_i, \qquad X_0 = \sum_j \frac{e_j}{r+1}, \qquad X_0 X = \sum_{k,j} \frac{e_k}{r+1}\,x_{k,j},

where one posed x_{k,j} = x_i by admitting e_j e_i = e_k, or

X_0 X = \sum_k \frac{e_k}{r+1} \left( \sum_j x_{k,j} \right).

But the x_{k,j} which appear under the \sum_j are the same as the x_i in a different order, i.e.

\sum_j x_{k,j} = \sum_i x_i = S,

from where finally

X_0 X = S X_0,

and

\lim (P^n - X_0) X = 0.

But X is an unspecified complex number; we can thus take X = e_0, from where

(P^n - X_0) X = (P^n - X_0) e_0 = P^n - X_0.

Therefore

\lim P^n = X_0,

which says that the limit of all probabilities, i.e. all the coefficients of the complex

number P n which represents the law of probability symbolically, are equal. Which

is what we proposed to show.

I will return to some of the works where complex numbers and their relationship

with the groups are treated. I will quote first the work of M. Frobenius published

in Sitzungsberichte of the Academy of Berlin of 1896 to 1901, and then a report

of M. Cartan On the bilinear groups and the systems of complex numbers (Annals

of the Faculty of Toulouse, t. XII). I considered the question, and I in particular

endeavoured to bring closer the results presented by these two eminent scientists in

very different forms at the time of a study On the algebraic integration of the linear

equations, inserted in the Journal of Liouville, 5e series, t. IX.

Translated using AltaVista’s “Babel Fish” translator, with help from Carol Mohr,

Paul Greiner, and Alejandro Uribe.

APPENDIX C

An Algorithm for Computing the Descent Polynomial when the Source Deck is Sorted

The descent polynomial for the transition from the deck

D = 1^{n_1} 2^{n_2} \cdots m^{n_m}

to a rearrangement D' can be computed efficiently using the algorithm below. In Section 3.10, we showed that

(C.1)\qquad \mathcal{D}(D,D';x) = \widetilde{G}_{n_1} A_{1,2} H_{n_2} A_{2,3} \cdots A_{m-2,m-1} H_{n_{m-1}} A_{m-1,m} G_{n_m}^T

where A_{u,v} is an n_u \times n_v matrix with

(A_{u,v})_{i,j} = x^{[\,i\text{th } u \text{ is below } j\text{th } v \text{ in } D'\,]},

G_n = (g_{n,1}(x), g_{n,2}(x), \ldots, g_{n,n}(x)),

\widetilde{G}_n = (g_{n,n}(x), g_{n,n-1}(x), \ldots, g_{n,1}(x)),

and H_n is an n \times n matrix with H_1 = [1] and

(H_n)_{i,j} = h_{n,i,j}(x) = \begin{cases} g_{n-1,\,n+i-j}(x) & \text{if } i < j \\ x\,g_{n-1,\,i-j}(x) & \text{if } i > j \\ 0 & \text{if } i = j \end{cases}

Figure C.1: Shuffling a sorted deck. When the destinations of the first v and first v+1 are fixed as shown, the (i, j) entry of B_v is the generating function for descents among the cards in C_v, which includes all the v's in D as well as the first v+1.

for n > 1. Note that

g_{n,n+1-\ell}(x) = \sum_d \left\langle{n \atop d}\right\rangle_{n+1-\ell} x^d = \sum_k h_{n,k,\ell}(x),

so \widetilde{G}_n = J_n H_n, where

J_n = (1, 1, \ldots, 1) \quad (n \text{ ones}).

So we can write Equation (C.1) as

(C.2)\qquad \mathcal{D}(D,D';x) = J_{n_1} \left( \prod_{v=1}^{m-1} H_{n_v} A_{v,v+1} \right) G_{n_m}^T.

Suppose 1 \le v < m. Let n := n_v and B_v := H_{n_v} A_{v,v+1}. We can simplify B_v directly using Equation (4.9), but we can also do so with a combinatorial argument.

Let C_v be a set of cards consisting of the v's in D together with the first v+1. B_v represents the descents among the cards in C_v. In other words, the (i, j) entry of B_v is the generating function for descents among the arrows originating from the v's and the first v+1 in D, if we fix the destination of the first v as the ith v in D', and the destination of the first v+1 as the jth v+1 in D'.

Figure C.2: The matrix B_1 if the subsequence of 1's and 2's in D' is 1221211222112:

g_{6,6}    g_{6,6}    g_{6,5}    g_{6,3}    g_{6,3}    g_{6,3}    g_{6,1}
xg_{6,1}   xg_{6,1}   g_{6,6}    g_{6,4}    g_{6,4}    g_{6,4}    g_{6,2}
xg_{6,2}   xg_{6,2}   xg_{6,1}   g_{6,5}    g_{6,5}    g_{6,5}    g_{6,3}
xg_{6,3}   xg_{6,3}   xg_{6,2}   g_{6,6}    g_{6,6}    g_{6,6}    g_{6,4}
xg_{6,4}   xg_{6,4}   xg_{6,3}   xg_{6,1}   xg_{6,1}   xg_{6,1}   g_{6,5}
xg_{6,5}   xg_{6,5}   xg_{6,4}   xg_{6,2}   xg_{6,2}   xg_{6,2}   g_{6,6}

The jth column is a subsequence of (g_{n,1}, g_{n,2}, \ldots, g_{n,n}, xg_{n,1}, xg_{n,2}, \ldots, xg_{n,n}), where n = 6 is the number of 1's, and the number of entries from the first half of the sequence is equal to the number of 1's before the jth 2. So the boundary between the g's and the xg's is described by drawing a line (as in Figure 3.6) from the upper-left corner to the lower-right corner, with a vertical segment for a 1 and a horizontal segment for a 2.

Let r_j be the number of v's above the jth v+1 in D'. Then

(B_v)_{i,j} = h_{n+1,\; i+[i>r_j],\; r_j+1}(x) = \begin{cases} g_{n,\,n+i-r_j}(x) & \text{if } i \le r_j \\ x\,g_{n,\,i-r_j}(x) & \text{if } i > r_j. \end{cases}

So the jth column of B_v is the subsequence of

(C.3)\qquad g_{n,1}(x), g_{n,2}(x), \ldots, g_{n,n}(x), xg_{n,1}(x), xg_{n,2}(x), \ldots, xg_{n,n}(x)

which starts at offset n - r_j from the left.

In the algorithm below we assume for ease of exposition that

n_1 = n_2 = \cdots = n_m = n,

though it is not hard to generalize to distinct n_v. By Equation (C.2) we need to find

(C.4)\qquad J_n B_1 B_2 \cdots B_{m-1} G_n^T.

The columns of the B_v will all be subsequences of Equation (C.3), so first we need to compute g_{n,k} for all k. We know that

g_{n,1}(x) = a_{n-1}(x) = \sum_k g_{n-1,k}(x)

and, for 1 < k \le n,

g_{n,k}(x) = g_{n,k-1}(x) + (x-1)\,g_{n-1,k-1}(x)

as a consequence of Equation (4.10). So we can compute the g's like this:

Compute-G(n)
 1. s ← G(1) ← 1
 2. for m ← 2 to n
 3.     u ← G(1)
 4.     G(1) ← t ← s
 5.     for k ← 2 to m
 6.         t ← t + (x−1)u
 7.         u ← G(k)
 8.         G(k) ← t
 9.         s ← s + t
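Compute-G updates the table in place; with explicit polynomial coefficient lists the same recurrences read more directly. A sketch (in Python, as an illustration rather than the thesis's implementation; a coefficient list is indexed by degree):

```python
def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def compute_g(n):
    """g[k] = coefficients of g_{n,k}(x), via
       g_{m,1} = a_{m-1} = sum_k g_{m-1,k} and g_{m,k} = g_{m,k-1} + (x-1) g_{m-1,k-1}."""
    g = {1: [1]}                                  # g_{1,1} = 1
    for m in range(2, n + 1):
        prev, g = g, {}
        s = [0]
        for k in range(1, m):
            s = poly_add(s, prev[k])
        g[1] = s
        for k in range(2, m + 1):
            g[k] = poly_add(g[k - 1], poly_mul([-1, 1], prev[k - 1]))
    return g
```

Here compute_g(5)[1] is a_4(x) = 1 + 11x + 11x^2 + x^3, and compute_g(5)[3] = 4x + 16x^2 + 4x^3, matching the k = 1 and k = 3 columns of the n = 5 table in Appendix A.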

We presume here that we want to fix the source deck and calculate the descent

polynomial for a number of target decks. So we should isolate the processing that

requires knowledge of the order of D'.

With knowledge only of m and n, we can set up some data structures to speed

later processing. In the algorithms below, variables whose names are capitalized are

presumed to be global; that is, they may be accessed in other routines. The new

operator allocates memory for an array of objects of the given type.

Set-Sorted-Source-Deck(m, n)
 1. N ← n
 2. M ← m
 3. G ← new polynomial(2N)
 4. R ← new integer((M−1) × N)
 5. Count ← new integer(M)
 6. Acc1 ← new polynomial(N)
 7. Acc2 ← new polynomial(N)
 8. Compute-G(N)
 9. for k ← 1 to N
10.     G(N+k) ← xG(k)

Acc1 and Acc2 are accumulator arrays that the Sorted-Descent-Polynomial routine below will use for scratch space. The 2-dimensional array R will be such that R(v, j) is the number of cards labeled v above the jth card labelled v+1 in D' (these numbers were called r_j earlier). Here is how we compute them:

Count-Cards(D')
 1. for v ← 1 to M
 2.     Count(v) ← 0
 3. for i ← 1 to MN
 4.     v ← D'(i)
 5.     j ← Count(v) ← Count(v) + 1
 6.     if (v > 1)
 7.         R(v−1, j) ← Count(v−1)

And now we can compute the descent polynomial:

Sorted-Descent-Polynomial(D')
 1. Count-Cards(D')
 2. cur ← Acc1, prev ← Acc2
 3. for k ← 1 to N
 4.     cur(k) ← 1
 5. for v ← 1 to M−1
 6.     temp ← cur, cur ← prev, prev ← temp
 7.     for j ← 1 to N
 8.         offset ← N − R(v, j)
 9.         cur(j) ← 0
10.         for k ← 1 to N
11.             cur(j) ← cur(j) + prev(k)·G(k + offset)
12. f ← 0
13. for k ← 1 to N
14.     f ← f + cur(k)·G(k)
15. return f

Here we multiply the matrices in Equation (C.4) from left to right. Since J_N is an N-dimensional row vector, and each B_v is an N × N matrix, the result will be an N-dimensional row vector at each stage. In steps 7–11, prev = J_N B_1 B_2 \cdots B_{v-1} and cur is being computed to be prev B_v. In the final stage (steps 12–15), cur = J_N B_1 B_2 \cdots B_{M-1} and f is being computed to be cur G_N^T.

Sorted-Descent-Polynomial contains (M−1)N^2 + N polynomial multiplications in steps 11 and 14. The setup routines Set-Sorted-Source-Deck and Compute-G contain N(N+1)/2 polynomial multiplications between them.
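The routines above can be rendered end to end in a few dozen lines. The sketch below (Python; an illustration, not the thesis's implementation) handles the equal-multiplicity case n_1 = \cdots = n_m = n. As a global check, summing the result over every rearrangement D' of 1^2 2^2 3^2 must give the Eulerian polynomial 1 + 57x + 302x^2 + 302x^3 + 57x^4 + x^5 of S_6, since each permutation carries D to exactly one D'.

```python
from itertools import permutations

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def compute_g(n):
    """Coefficient lists of g_{n,k}(x) via the recurrences above."""
    g = {1: [1]}
    for m in range(2, n + 1):
        prev, g = g, {}
        s = [0]
        for k in range(1, m):
            s = poly_add(s, prev[k])
        g[1] = s
        for k in range(2, m + 1):
            g[k] = poly_add(g[k - 1], poly_mul([-1, 1], prev[k - 1]))
    return g

def sorted_descent_polynomial(Dp, m, n):
    """Descent polynomial for D = 1^n 2^n ... m^n -> Dp, via Equation (C.4)."""
    g = compute_g(n)
    # G(1..N) = g_{n,k};  G(N+1..2N) = x g_{n,k}  (the sequence (C.3))
    G = [None] + [g[k] for k in range(1, n + 1)] \
               + [poly_mul([0, 1], g[k]) for k in range(1, n + 1)]
    # R[v][j-1] = number of v's above the j-th (v+1) in Dp  (Count-Cards)
    count = [0] * (m + 1)
    R = {v: [] for v in range(1, m)}
    for card in Dp:
        count[card] += 1
        if card > 1:
            R[card - 1].append(count[card - 1])
    cur = [None] + [[1]] * n                      # the row vector J_n
    for v in range(1, m):                         # multiply by B_v
        prev, cur = cur, [None] * (n + 1)
        for j in range(1, n + 1):
            off = n - R[v][j - 1]                 # column j starts at offset n - r_j
            acc = [0]
            for k in range(1, n + 1):
                acc = poly_add(acc, poly_mul(prev[k], G[k + off]))
            cur[j] = acc
    f = [0]
    for k in range(1, n + 1):                     # finally multiply by G_n^T
        f = poly_add(f, poly_mul(cur[k], G[k]))
    return f
```

Evaluating any single result at x = 1 gives (n!)^m, the number of permutations taking D to that particular D'.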

APPENDIX D

Probabilities for the Ace-King Game

The Ace-King game is a simple two-player game that was described in Section 2.9.

We begin with the ace of spades on top of the deck and the king of hearts on the bottom, give the deck an a-shuffle, then deal out the deck from the top. Player B

(for black) wins if the ace of spades comes up before the king of hearts, and loses otherwise.

Suppose that besides the ace and the king the deck includes n − 2 other cards, which we may consider to be indistinguishable. (For a standard deck, n − 2 = 50.) We would like to compute the probability that player B wins.

Let U_{n,k,\ell} \subset S_n be the set of permutations which take the ace to position k and the king to position \ell. In Section 3.7 we began using the notation h_{n,k,\ell} for the descent polynomial of U_{n,k,\ell}, and we showed that

h_{n,k,\ell}(x) = \begin{cases} g_{n-1,\,n+k-\ell}(x) & \text{if } k < \ell \\ x\,g_{n-1,\,k-\ell}(x) & \text{if } k > \ell \end{cases}

where g_{n,k}(x) = \sum_d \left\langle{n \atop d}\right\rangle_k x^d. Let W_n be the set of permutations which, when applied to a deck with the ace on top and the king on the bottom, result in a win for B.

Then

W_n = \bigcup_{k<\ell} U_{n,k,\ell}

so

\mathcal{D}_{W_n}(x) = \sum_{k<\ell} h_{n,k,\ell}(x) = \sum_{k<\ell} g_{n-1,\,n+k-\ell}(x) = \sum_{u=1}^{n-1} u\,g_{n-1,u}(x)
 = \sum_{u=1}^{n-1} u \sum_d \left\langle{n-1 \atop d}\right\rangle_u x^d = \sum_d x^d \sum_u u \left\langle{n-1 \atop d}\right\rangle_u.

The inner sum is familiar to us from Theorem 4.3—if we divide it by \left\langle{n-1 \atop d}\right\rangle, we get the expected value of \pi(1) when \pi is chosen uniformly from among those permutations of n−1 which have d descents. Theorem 4.3 tells us that is d+1, so

\mathcal{D}_{W_n}(x) = \sum_d x^d\,(d+1) \left\langle{n-1 \atop d}\right\rangle = \frac{d}{dx}\bigl(x\,a_{n-1}(x)\bigr)

where a_{n-1}(x) is \sum_d \left\langle{n-1 \atop d}\right\rangle x^d. From Section 3.1 we know that

a_{n-1}(x) = (1-x)^n \sum_{a\ge1} a^{n-1} x^{a-1}

so

\mathcal{D}_{W_n}(x) = \frac{d}{dx}\left[(1-x)^n \sum_{a\ge1} a^{n-1} x^a\right]
 = (1-x)^n \sum_{a\ge1} a^n x^{a-1} - n(1-x)^{n-1} \sum_{a\ge1} a^{n-1} x^a

which means that

\mathcal{S}_{W_n}(x) = (1-x)^{-(n+1)}\,\mathcal{D}_{W_n}(x)
 = (1-x)^{-1} \sum_{a\ge1} a^n x^{a-1} - n(1-x)^{-2} \sum_{a\ge1} a^{n-1} x^a.

The first term is

(1 + x + x^2 + x^3 + \cdots)(1^n + 2^n x + 3^n x^2 + 4^n x^3 + \cdots)

which can be written as

\sum_{a\ge1} x^{a-1} \sum_{k=1}^{a} k^n.

The second term is

n(1 + 2x + 3x^2 + 4x^3 + \cdots)(1^{n-1} x + 2^{n-1} x^2 + 3^{n-1} x^3 + \cdots)

which is the same as

n \sum_{a\ge1} x^{a-1} \sum_{k=1}^{a} (a-k)\,k^{n-1}.

So the coefficient of x^{a-1} in \mathcal{S}_{W_n}(x) is

\sum_{k=1}^{a} k^n - n\sum_{k=1}^{a} (a-k)k^{n-1} = (n+1)\sum_{k=1}^{a} k^n - an\sum_{k=1}^{a} k^{n-1}.

Dividing by a^n gives the probability that B wins the game:

(D.1)\qquad P_a(W_n) = (n+1)\sum_{k=1}^{a} \left(\frac{k}{a}\right)^n - n\sum_{k=1}^{a} \left(\frac{k}{a}\right)^{n-1}.

Let

R_{n,a} := n \sum_{k=1}^{a} \left(\frac{k}{a}\right)^{n-1}.

Then P_a(W_n) = R_{n+1,a} - R_{n,a}. Note that

\frac{1}{a}\,R_{n,a} = \frac{1}{a} \sum_{k=1}^{a} n\left(\frac{k}{a}\right)^{n-1}

and

\frac{1}{a}(R_{n,a} - n) = \frac{1}{a} \sum_{k=0}^{a-1} n\left(\frac{k}{a}\right)^{n-1}

are the right- and left-hand Riemann sums of \int_0^1 nx^{n-1}\,dx when approximated by a subintervals (see Figure D.1). The integral is 1, so

\frac{1}{a}(R_{n,a} - n) < 1 < \frac{1}{a}\,R_{n,a}

and therefore

a < R_{n,a} < a + n.

Figure D.1: Riemann sums corresponding to probabilities in the Ace-King game. The graph shows y = nx^{n-1}, and the shaded rectangles form the left-hand Riemann sum approximation of \int_0^1 nx^{n-1}\,dx, that is, \frac{1}{a}\sum_{k=0}^{a-1} n(k/a)^{n-1} = \frac{1}{a}(R_{n,a} - n). The shaded and unshaded rectangles together form the right-hand sum, which is \frac{1}{a}R_{n,a}. This allows us to bound the value of R_{n,a}.

So the Rn,a are reasonably sized sums of positive numbers, which means there is no chance of catastrophic cancellation if we compute Rn+1,a and Rn,a separately, and then subtract.
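Both Equation (D.1) and the numerical claims below are easy to check. A sketch (Python, with exact rationals; the thesis's own large rational computations were done in MAPLE):

```python
from fractions import Fraction

def p_black_wins(n, a):
    """P_a(W_n) by Equation (D.1): (n+1) sum (k/a)^n - n sum (k/a)^(n-1), k = 1..a."""
    s_hi = sum(Fraction(k, a) ** n for k in range(1, a + 1))
    s_lo = sum(Fraction(k, a) ** (n - 1) for k in range(1, a + 1))
    return (n + 1) * s_hi - n * s_lo
```

For a two-card deck and a single riffle (n = 2, a = 2) this gives 3/4, the GSR probability that the deck is left unchanged; and for n = 52 the margin P_a(W_n) − 1/2 stays within .001 of n/(6a) once a is in the range described below.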

On the other hand, we know that

\sum_{k=1}^{a} k^m = \frac{1}{m+1}\,a^{m+1} + \frac{1}{2}\,a^m + \frac{1}{m+1} \sum_{j=2}^{m} \binom{m+1}{j} B_j\,a^{m+1-j}

where B_j is the jth Bernoulli number (see [24, pp. 283–290]). So

R_{n,a} = \frac{n}{a^{n-1}} \sum_{k=1}^{a} k^{n-1}
 = \frac{n}{a^{n-1}} \left[ \frac{1}{n}\,a^n + \frac{1}{2}\,a^{n-1} + \frac{1}{n} \sum_{j=2}^{n-1} \binom{n}{j} B_j\,a^{n-j} \right]
 = a + \frac{n}{2} + \sum_{j=2}^{n-1} \binom{n}{j} B_j\,a^{1-j}

and likewise

R_{n+1,a} = a + \frac{n+1}{2} + \sum_{j=2}^{n} \binom{n+1}{j} B_j\,a^{1-j}.

So

P_a(W_n) = \frac{1}{2} + \sum_{j=2}^{n} \left[ \binom{n+1}{j} - \binom{n}{j} \right] B_j\,a^{1-j} + B_n\,a^{1-n}

which means the margin enjoyed by player B is

(D.2)\qquad P_a(W_n) - \frac{1}{2} = \sum_{j=2}^{n} \binom{n}{j-1} B_j\,a^{1-j} + B_n\,a^{1-n}.

Equation (D.2) shows clearly that B's advantage will become negligible as a grows large, since all the powers of a are negative. Unfortunately, the Bernoulli numbers grow very large and alternate in sign, so Equation (D.2) is not a good way to calculate B's advantage when a is small. However the leading term is

\binom{n}{1} B_2\,a^{-1} = \frac{n}{6a}

so we can expect the margin to behave like n/(6a) for large enough values of a. For n = 52, the difference between the actual margin and n/(6a) is below .01 for a \ge 42, and below .001 for a \ge 91. The relative error

\frac{\frac{n}{6a} - \left(P_a(W_n) - \frac{1}{2}\right)}{P_a(W_n) - \frac{1}{2}}

drops below 1% for a \ge 93.

APPENDIX E

The Variance of Nc1 When the Source Deck is Fixed

In Section 5.4 we defined

\kappa_1(D) = \frac{1}{2} \sum_{D' \in \mathcal{O}(D)} |c_1(D,D')| = \frac{1}{2}\,E\,|Nc_1(D,D')|

where D' is chosen uniformly from \mathcal{O}(D), and N = \#\mathcal{O}(D). The expected absolute value of the random variable Nc_1(D,D') is in general nontrivial to calculate, but we can find its mean and variance.

Let D_0 := 1^{n_1} 2^{n_2} \cdots k^{n_k}, with n := n_1 + n_2 + \cdots + n_k. For D' \in \mathcal{O}(D_0) define

φ(D') := \sum_{u<v} α_{uv}\,Z(D',u,v),

where the α_{uv} are arbitrary constants. For convenience, we extend the definition antisymmetrically: let α_{uu} := 0 and α_{uv} := -α_{vu} when u > v.

Theorem E.1 If D' is chosen uniformly from \mathcal{O}(D_0) then

E\,φ(D') = 0

\operatorname{Var} φ(D') = \frac{1}{3} \sum_{u<v} n_u n_v (n_u + n_v + 1)\,α_{uv}^2 + \frac{2}{3} \sum_{u<v<w} n_u n_v n_w \left( α_{uv}α_{uw} + α_{vu}α_{vw} + α_{wu}α_{wv} \right)

Proof. Let φ_{ij}(D') := α_{D'(i),D'(j)}. Then

φ(D') = \sum_{u<v} \left( \#\{u\text{-}v\text{ pairs in }D'\} - \#\{v\text{-}u\text{ pairs in }D'\} \right) α_{uv}
 = \sum_{u<v} \left( \#\{u\text{-}v\text{ pairs in }D'\}\,α_{uv} + \#\{v\text{-}u\text{ pairs in }D'\}\,α_{vu} \right)
 = \sum_{u,v} \#\{u\text{-}v\text{ pairs in }D'\}\,α_{uv} = \sum_{i<j} φ_{ij}(D').

Let A_{uv} be the event that D'(i) = u and D'(j) = v. Then

E\,φ_{ij} = \sum_{u,v} P(A_{uv})\,α_{uv} = \sum_{u<v} \left( P(A_{uv}) - P(A_{vu}) \right) α_{uv}.

Since P(A_{uv}) = P(A_{vu}) = n_u n_v/(n(n-1)), we have E\,φ_{ij} = 0. Therefore E\,φ = \sum E\,φ_{ij} = 0. So

(E.1)\qquad \operatorname{Var}(φ) = E(φ^2) = \sum_{i<j}\sum_{k<\ell} E(φ_{ij}φ_{k\ell}).

Let P := \{i,j\} \cup \{k,\ell\}. First suppose \#P = 4, i.e. i, j, k, and \ell are all distinct. Then

E(φ_{ij}φ_{k\ell}) = \sum_{u,v} P(A_{uv})\,α_{uv}\,E(φ_{k\ell} \mid A_{uv}).

But the inner expectation is the expected value of φ_{k\ell} when the deck ranges over all orderings of D_0 with one card of value u and one card of value v removed, and that is 0. So E(φ_{ij}φ_{k\ell}) = 0 when \#P = 4.

If \#P = 2, then i < j and k < \ell imply that i = k and j = \ell, so

E(φ_{ij}φ_{k\ell}) = E(φ_{ij}^2) = \sum_{u,v} P(A_{uv})\,α_{uv}^2 = 2\sum_{u<v} P(A_{uv})\,α_{uv}^2 = 2\sum_{u<v} \frac{n_u n_v}{n(n-1)}\,α_{uv}^2.

That leaves the case where \#P = 3. There are \binom{n}{3} ways to choose such a P, and they all yield the same expectations, so

\sum_{\#P=3} E(φ_{ij}φ_{k\ell}) = \binom{n}{3}\,E\!\left( \sum φ_{ij}φ_{k\ell} \right),

the inner sum running over the six ways the elements of P can occupy positions 1, 2, 3:

position 1   position 2   position 3   φ_{ij}φ_{k\ell}
i, k         j            \ell         φ_{12}φ_{13}
i, k         \ell         j            φ_{13}φ_{12}
i            j, k         \ell         φ_{12}φ_{23}
k            i, \ell      j            φ_{23}φ_{12}
i            k            j, \ell      φ_{13}φ_{23}
k            i            j, \ell      φ_{23}φ_{13}

So the cases where \#P = 3 contribute

(E.3)\qquad 2\binom{n}{3}\,E(φ_{12}φ_{13} + φ_{12}φ_{23} + φ_{13}φ_{23})

to Equation (E.1). To calculate that expectation, we should consider all possibilities

for the first 3 cards of D'. If all three cards have value u, then φ_{12} = φ_{13} = φ_{23} = α_{uu} = 0, so this case contributes nothing to the expectation. Suppose on the other hand that exactly two values u and v, with u < v, appear among the first three cards. There are 6 ways that can happen:

first 3 cards   φ_{12}     φ_{13}     φ_{23}     φ_{12}φ_{13} + φ_{12}φ_{23} + φ_{13}φ_{23}   probability
uuv             0          α_{uv}     α_{uv}     α_{uv}^2                                      n_u(n_u−1)n_v / (n(n−1)(n−2))
uvu             α_{uv}     0          α_{vu}     −α_{uv}^2                                     n_u(n_u−1)n_v / (n(n−1)(n−2))
vuu             α_{vu}     α_{vu}     0          α_{uv}^2                                      n_u(n_u−1)n_v / (n(n−1)(n−2))
uvv             α_{uv}     α_{uv}     0          α_{uv}^2                                      n_u n_v(n_v−1) / (n(n−1)(n−2))
vuv             α_{vu}     0          α_{uv}     −α_{uv}^2                                     n_u n_v(n_v−1) / (n(n−1)(n−2))
vvu             0          α_{vu}     α_{vu}     α_{uv}^2                                      n_u n_v(n_v−1) / (n(n−1)(n−2))

So for each choice of u and v these cases contribute

\frac{n_u(n_u-1)n_v}{n(n-1)(n-2)}\left(α_{uv}^2 - α_{uv}^2 + α_{uv}^2\right) + \frac{n_u n_v(n_v-1)}{n(n-1)(n-2)}\left(α_{uv}^2 - α_{uv}^2 + α_{uv}^2\right)

or

\frac{n_u n_v (n_u + n_v - 2)}{n(n-1)(n-2)}\,α_{uv}^2

to the expectation in Equation (E.3).

Finally, suppose three distinct values u, v, and w, with u < v < w, appear in the first three positions. We have 6 more cases:

first 3 cards   φ_{12}   φ_{13}   φ_{23}   φ_{12}φ_{13} + φ_{12}φ_{23} + φ_{13}φ_{23}      probability
uvw             α_{uv}   α_{uw}   α_{vw}   α_{uv}α_{uw} - α_{vu}α_{vw} + α_{wu}α_{wv}      n_u n_v n_w / (n(n-1)(n-2))
uwv             α_{uw}   α_{uv}   α_{wv}   α_{uv}α_{uw} + α_{vu}α_{vw} - α_{wu}α_{wv}      (same for all six rows)
vuw             α_{vu}   α_{vw}   α_{uw}   -α_{uv}α_{uw} + α_{vu}α_{vw} + α_{wu}α_{wv}
vwu             α_{vw}   α_{vu}   α_{wu}   α_{uv}α_{uw} + α_{vu}α_{vw} - α_{wu}α_{wv}
wuv             α_{wu}   α_{wv}   α_{uv}   -α_{uv}α_{uw} + α_{vu}α_{vw} + α_{wu}α_{wv}
wvu             α_{wv}   α_{wu}   α_{vu}   α_{uv}α_{uw} - α_{vu}α_{vw} + α_{wu}α_{wv}

so for each choice of u, v, w these cases contribute

2 \frac{n_u n_v n_w}{n(n-1)(n-2)} (α_{uv}α_{uw} + α_{vu}α_{vw} + α_{wu}α_{wv})

to the expectation in Equation (E.3). So

\sum_{\#P=3} E(φ_{ij}φ_{k\ell}) = 2\binom{n}{3} \Bigl[ \sum_{u<v} \frac{n_u n_v(n_u+n_v-2)}{n(n-1)(n-2)} α_{uv}^2 + \sum_{u<v<w} \frac{2 n_u n_v n_w}{n(n-1)(n-2)} (α_{uv}α_{uw} + α_{vu}α_{vw} + α_{wu}α_{wv}) \Bigr]
= \frac{1}{3} \sum_{u<v} n_u n_v(n_u+n_v-2) α_{uv}^2 + \frac{2}{3} \sum_{u<v<w} n_u n_v n_w (α_{uv}α_{uw} + α_{vu}α_{vw} + α_{wu}α_{wv}).

Putting it all together, we have

Var φ = \sum_{u<v} n_u n_v α_{uv}^2 + \frac{1}{3} \sum_{u<v} n_u n_v(n_u+n_v-2) α_{uv}^2 + \frac{2}{3} \sum_{u<v<w} n_u n_v n_w (α_{uv}α_{uw} + α_{vu}α_{vw} + α_{wu}α_{wv}).

The first two sums combine to

\frac{1}{3} \sum_{u<v} n_u n_v(n_u+n_v+1) α_{uv}^2 = \frac{1}{3} \sum_{u<v} n_u n_v α_{uv}^2 + \frac{1}{3} \sum_{u \ne v} n_u^2 n_v α_{uv}^2,

and since \sum_v n_v (\sum_u n_u α_{uv})^2 = \sum_{u \ne v} n_u^2 n_v α_{uv}^2 + 2 \sum_{u<v<w} n_u n_v n_w (α_{uv}α_{uw} + α_{vu}α_{vw} + α_{wu}α_{wv}), this is the formula in the theorem. □

To calculate Var(Nc_1(D,D_0)) when D is fixed, we let

α_{uv} = \frac{n}{2} \frac{W(D,u,v)}{n_u n_v}

so by the theorem,

Var(Nc_1) = \frac{n^2}{12} \Bigl[ \sum_{u<v} \frac{W(D,u,v)^2}{n_u n_v} + \sum_v \frac{1}{n_v} \Bigl( \sum_u W(D,u,v) \Bigr)^{\!2} \Bigr].

Now

\sum_u W(D,u,v) = \sum_u (\#\{u\text{-}v \text{ digraphs in } D\} - \#\{v\text{-}u \text{ digraphs in } D\}) = \#\{\text{digraphs ending in } v\} - \#\{\text{digraphs beginning with } v\}.

Every card in D, except the last, begins a digraph, and every card but the first ends a digraph. So the number of digraphs ending in v will be n_v unless D(1) = v, in which case it will be n_v - 1. Similarly, the number of digraphs beginning with v will be n_v - 1 if D(n) = v, and n_v otherwise. So

\sum_u W(D,u,v) = (n_v - [D(1) = v]) - (n_v - [D(n) = v]) = [D(n) = v] - [D(1) = v]

and the square of that sum is 1 if D begins with v or ends with v, but not both, and

0 otherwise. So

Var(Nc_1) = \frac{n^2}{12} \Bigl[ \sum_{u<v} \frac{W(D,u,v)^2}{n_u n_v} + E(D) \Bigr]

where E(D) := 1/n_{D(1)} + 1/n_{D(n)} if D(1) \ne D(n), and E(D) := 0 otherwise. □

APPENDIX F

The Variance of Nc1 When the Target Deck is Fixed

In Section 5.4 we defined

κ_1(D_0) = \frac{1}{2} \sum_{D \in \mathcal{O}(D_0)} |c_1(D,D_0)| = \frac{1}{2} E|N c_1(D,D_0)|

where D is chosen uniformly from \mathcal{O}(D_0), and N = \#\mathcal{O}(D_0). The expected absolute

value of the random variable Nc1(D,D0) is in general nontrivial to calculate, but we

can find its mean and variance.

Let D_0 := 1^{n_1} 2^{n_2} \cdots k^{n_k}, with n := n_1 + n_2 + \cdots + n_k. For D \in \mathcal{O}(D_0) define

θ(D) := \sum_{u<v} β_{uv} W(D,u,v).

For convenience, we extend the definition antisymmetrically: let β_{uu} := 0 and β_{uv} := -β_{vu} when u > v.

Theorem F.1 If D is chosen uniformly from \mathcal{O}(D_0) then

Eθ(D) = 0

Var θ(D) = \frac{2}{n(n-1)} \Bigl[ \sum_{u<v} n_u n_v β_{uv}^2 + \sum_{u<v<w} n_u n_v n_w (β_{uv} + β_{vw} + β_{wu})^2 \Bigr].

Proof. Let θ_i(D) := β_{D(i),D(i+1)} for 1 \le i \le n-1. Then

θ(D) = \sum_{u<v} (\#\{u\text{-}v \text{ digraphs in } D\} - \#\{v\text{-}u \text{ digraphs in } D\}) β_{uv} = \sum_{u,v} \#\{u\text{-}v \text{ digraphs in } D\} β_{uv} = \sum_{i=1}^{n-1} θ_i(D).

Let A_{uv} be the event that D(i) = u and D(i+1) = v. Then

Eθ_i = \sum_{u,v} P(A_{uv}) β_{uv} = \sum_{u<v} (P(A_{uv}) - P(A_{vu})) β_{uv} = 0,

since P(A_{uv}) = P(A_{vu}) = n_u n_v/(n(n-1)). Therefore Eθ = 0, and Var θ = E(θ^2) = \sum_i \sum_j E(θ_i θ_j). If j - i > 1, the positions i, i+1, j, j+1 are distinct. So

E(θ_i θ_j) = \sum_{u,v} P(A_{uv}) β_{uv} E(θ_j \mid A_{uv}).

The inner expectation is the mean value of θ_j when the deck ranges over all orderings of D_0 with one card labelled u and one card labelled v removed, and that mean is 0, as above. So E(θ_i θ_j) = 0 when j - i > 1. Therefore

(F.1) Var θ = \sum_{i=1}^{n-1} E(θ_i^2) + 2 \sum_{i=1}^{n-2} E(θ_i θ_{i+1}) = (n-1) E(θ_1^2) + 2(n-2) E(θ_1 θ_2).

Now

(F.2) E(θ_1^2) = \sum_{u,v} P(D(1)=u, D(2)=v) β_{uv}^2 = 2 \sum_{u<v} P(D(1)=u, D(2)=v) β_{uv}^2 = 2 \sum_{u<v} \frac{n_u n_v}{n(n-1)} β_{uv}^2.

To calculate the expected value of θ1θ2, we need to consider all possible ways to place cards in the first three positions of the deck.

If D(1) = D(2) = D(3) = u, then θ1 = θ2 = βuu = 0, so the case where only one value appears among the first 3 cards contributes nothing to Eθ1θ2. Suppose two values, u and v, with u < v, appear in the first 3 positions. Then we have 6 cases:

first 3 cards of D   θ_1      θ_2      θ_1θ_2       probability
uuv                  0        β_{uv}   0            n_u(n_u-1)n_v / (n(n-1)(n-2))
uvu                  β_{uv}   β_{vu}   -β_{uv}^2    n_u(n_u-1)n_v / (n(n-1)(n-2))
vuu                  β_{vu}   0        0            n_u(n_u-1)n_v / (n(n-1)(n-2))
uvv                  β_{uv}   0        0            n_u n_v(n_v-1) / (n(n-1)(n-2))
vuv                  β_{vu}   β_{uv}   -β_{uv}^2    n_u n_v(n_v-1) / (n(n-1)(n-2))
vvu                  0        β_{vu}   0            n_u n_v(n_v-1) / (n(n-1)(n-2))

The contribution of these cases to Eθ_1θ_2 is

(F.3) \sum_{u<v} \Bigl( \frac{n_u(n_u-1)n_v}{n(n-1)(n-2)} + \frac{n_u n_v(n_v-1)}{n(n-1)(n-2)} \Bigr) (-β_{uv}^2).

Now suppose that the first three cards are all distinct. We have 6 more cases:

first 3 cards of D   θ_1      θ_2      θ_1θ_2           probability
uvw                  β_{uv}   β_{vw}   β_{uv}β_{vw}     n_u n_v n_w / (n(n-1)(n-2))
uwv                  β_{uw}   β_{wv}   β_{vw}β_{wu}     (same for all six rows)
vuw                  β_{vu}   β_{uw}   β_{uv}β_{wu}
vwu                  β_{vw}   β_{wu}   β_{vw}β_{wu}
wuv                  β_{wu}   β_{uv}   β_{uv}β_{wu}
wvu                  β_{wv}   β_{vu}   β_{uv}β_{vw}

Together the cases where the first three cards are distinct contribute

(F.4) \sum_{u<v<w} \frac{n_u n_v n_w}{n(n-1)(n-2)} (2β_{uv}β_{vw} + 2β_{vw}β_{wu} + 2β_{wu}β_{uv})

to Eθ_1θ_2. Substituting Equations (F.2), (F.3), and (F.4) into Equation (F.1), we get

(F.5) Var θ = \frac{2}{n} \sum_{u<v} n_u n_v β_{uv}^2 - \frac{2}{n(n-1)} \sum_{u<v} n_u n_v (n_u+n_v-2) β_{uv}^2 + \frac{4}{n(n-1)} \sum_{u<v<w} n_u n_v n_w (β_{uv}β_{vw} + β_{vw}β_{wu} + β_{wu}β_{uv}).

The first two sums combine to

(F.6) \frac{2}{n(n-1)} \sum_{u<v} n_u n_v ((n-1) - (n_u+n_v-2)) β_{uv}^2.

Since n_u n_v (n - n_u - n_v) = \sum_{w \ne u,v} n_u n_v n_w, this is

\frac{2}{n(n-1)} \Bigl[ \sum_{u<v} n_u n_v β_{uv}^2 + \sum_{u<v<w} n_u n_v n_w (β_{uv}^2 + β_{vw}^2 + β_{wu}^2) \Bigr]

and substituting back into Equation (F.5) gives the desired result. □

Suppose that

β_{uv} = sgn(v - u) = \begin{cases} +1 & \text{if } u < v \\ 0 & \text{if } u = v \\ -1 & \text{if } u > v. \end{cases}

Then θ_i = sgn(D(i+1) - D(i)), which is to say θ_i is 1 if D has an "ascent" at i, -1 if D has a "descent" at i, and 0 otherwise, where ascents and descents for decks are defined analogously to the way they were defined for permutations. So θ = asc(D) - des(D), and that means

Var(asc - des) = \frac{2}{n(n-1)} \Bigl[ \sum_{u<v} n_u n_v (1)^2 + \sum_{u<v<w} n_u n_v n_w (1+1-1)^2 \Bigr] = \frac{2}{n(n-1)} \bigl( e_2(n_1,\ldots,n_k) + e_3(n_1,\ldots,n_k) \bigr)

where e_r is the r-th elementary symmetric function.
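This special case is easy to verify by brute force. The Python sketch below (hypothetical helper names; exact rationals) enumerates all orderings of a small deck and compares the enumerated variance of asc - des with 2(e_2 + e_3)/(n(n-1)).

```python
from itertools import combinations, permutations
from fractions import Fraction

def asc_minus_des(D):
    # +1 for each ascent D(i) < D(i+1), -1 for each descent D(i) > D(i+1)
    return sum((D[i] < D[i + 1]) - (D[i] > D[i + 1]) for i in range(len(D) - 1))

def var_asc_des_exhaustive(counts):
    # variance over all distinct orderings of the multiset deck
    deck = [v for v, c in counts.items() for _ in range(c)]
    vals = [asc_minus_des(D) for D in set(permutations(deck))]
    N = len(vals)
    mean = Fraction(sum(vals), N)
    return Fraction(sum(x * x for x in vals), N) - mean ** 2

def var_asc_des_formula(counts):
    # 2 (e_2 + e_3) / (n (n-1)) in the multiplicities n_1, ..., n_k
    ns = list(counts.values())
    n = sum(ns)
    e2 = sum(a * b for a, b in combinations(ns, 2))
    e3 = sum(a * b * c for a, b, c in combinations(ns, 3))
    return Fraction(2 * (e2 + e3), n * (n - 1))
```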

Before we calculate Var(Nc_1), note that we could have simplified the result in Theorem F.1 differently. The third sum in Equation (F.5) is the same as

(F.8) -\frac{2}{n(n-1)} \sum_{\substack{u,v,w \\ \text{distinct}}} n_u n_v n_w β_{uv} β_{wv}

and the expression in Equation (F.6) can be written as

\frac{2}{n(n-1)} \Bigl[ (n+1) \sum_{u<v} n_u n_v β_{uv}^2 - \sum_{u<v} n_u n_v (n_u+n_v) β_{uv}^2 \Bigr].

Moreover \sum_{u<v} n_u n_v (n_u+n_v) β_{uv}^2 = \sum_{u \ne v} n_u^2 n_v β_{uv}^2, and

(F.9) \sum_{u \ne v} n_u^2 n_v β_{uv}^2 = \sum_{\substack{u,v,w \\ u=w \ne v}} n_u n_v n_w β_{uv} β_{wv}.

Since β_{uv}β_{wv} vanishes when u = v or w = v, together Equation (F.8) and Equation (F.9) cover all choices for u, v, and w which make β_{uv}β_{wv} nonzero. Therefore

(F.10) Var θ = \frac{2}{n(n-1)} \Bigl[ (n+1) \sum_{u<v} n_u n_v β_{uv}^2 - \sum_{u,v,w} n_u n_v n_w β_{uv} β_{wv} \Bigr].

From Theorem 5.2 we know that

Nc_1(D,D_0) = \frac{n}{2} \sum_{u<v} \frac{W(D,u,v) Z(D_0,u,v)}{n_u n_v}.

So to find Var(Nc_1) when D_0 is fixed we should set

β_{uv} = \frac{n}{2} \frac{Z(D_0,u,v)}{n_u n_v}.

Plugging that into Equation (F.10), we have

(F.11) Var(Nc_1) = \frac{n}{2(n-1)} \Bigl[ (n+1) \sum_{u<v} \frac{Z(D_0,u,v)^2}{n_u n_v} - \sum_v \frac{1}{n_v} \Bigl( \sum_u Z(D_0,u,v) \Bigr)^{\!2} \Bigr].

APPENDIX G

Moments of the Descent Distribution over T(D, D_0)

In Section 5.9 we showed that the coefficients c_k(D,D_0), which can be used to estimate the transition probability from D to D_0 after an a-shuffle, can be written in terms of the moments

μ_k = E\Bigl[ \Bigl( \frac{n-1}{2} - des(π) \Bigr)^{\!k} \Bigm| πD = D_0 \Bigr].

In this appendix we develop algorithms for computing m_1 and m_2, where

m_k := E[ des(π)^k \mid πD = D_0 ].

That allows calculation of μ_1 and μ_2, since

μ_k = E \sum_{i=0}^{k} \binom{k}{i} \Bigl( \frac{n-1}{2} \Bigr)^{\!k-i} (-des(π))^i = \sum_{i=0}^{k} \binom{k}{i} \Bigl( \frac{n-1}{2} \Bigr)^{\!k-i} (-1)^i m_i.

Let D, D_0 \in \mathcal{O}(1^{n_1} 2^{n_2} \cdots k^{n_k}), with n = n_1 + n_2 + \cdots + n_k. Assume from now on that π is a permutation selected uniformly from T(D,D_0). For 1 \le i < n let

X_i = \begin{cases} 1 & \text{if } π(i) > π(i+1) \\ 0 & \text{otherwise} \end{cases}

so that des(π) = X_1 + X_2 + \cdots + X_{n-1}. Let

A(v, m) := \#\{ \ell < m : D_0(\ell) = v \}

B(v, m) := \#\{ \ell > m : D_0(\ell) = v \}.

So given D_0, A(v,m) is the number of cards "above" position m (i.e., nearer the top of the deck) which have value v, and B(v,m) is the number "below" position m which have value v.

Given two strings

s = s(1), s(2), \ldots, s(m) and t = t(1), t(2), \ldots, t(n)

of cards, an embedding of s into t is an injection

φ : \{1, 2, \ldots, m\} \to \{1, 2, \ldots, n\}

such that t \circ φ = s. In other words, an embedding is a subdeck of t which matches s.

Let I(s) be the total number of embeddings of s into D_0, and let R(s) be the number of those embeddings which reverse the order of the cards in s. For example, if

D_0 = 121323 and s = 321

then there are 8 embeddings of s into D_0. Writing an embedding as the position triple (φ(1), φ(2), φ(3)), they are

(4,2,1), (4,2,3), (4,5,1), (4,5,3), (6,2,1), (6,2,3), (6,5,1), (6,5,3).

Four of those embeddings, namely (4,2,1), (6,2,1), (6,5,1), and (6,5,3), reverse the original relative order of the cards in s, so I(s) = 8 and R(s) = 4.

If s and t are two strings of letters, then by I(s ∼ t) we mean the number of ways to embed both s and t into D_0 such that there is no overlap. So for instance if D_0 is as above then there are 8 embeddings of 12 ∼ 32. Writing each as a pair of position pairs, one for s = 12 and one for t = 32, they are

((1,2),(4,5)), ((1,2),(6,5)), ((1,5),(4,2)), ((1,5),(6,2)), ((3,2),(4,5)), ((3,2),(6,5)), ((3,5),(4,2)), ((3,5),(6,2)).

By R(s ∼ t) we mean the number of ways to embed s and t into D_0 such that both s and t are reversed. In the above example, ((3,2),(6,5)) is the only embedding which satisfies this condition, so R(12 ∼ 32) = 1.
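Both examples can be reproduced by brute force. In the Python sketch below (hypothetical helper names, 0-indexed positions), an embedding is an injection of positions with matching card values, a reversing embedding lists its positions in strictly decreasing order, and a joint embedding of two strings must use disjoint positions.

```python
from itertools import permutations

def embeddings(s, D0):
    # injections phi : positions of s -> positions of D0 with D0[phi[i]] == s[i]
    n, m = len(D0), len(s)
    return [phi for phi in permutations(range(n), m)
            if all(D0[phi[i]] == s[i] for i in range(m))]

def reversing(embs):
    # embeddings whose positions are strictly decreasing
    return [phi for phi in embs if all(a > b for a, b in zip(phi, phi[1:]))]

def joint_embeddings(s, t, D0):
    # non-overlapping pairs of embeddings of s and t
    return [(p, q) for p in embeddings(s, D0) for q in embeddings(t, D0)
            if set(p).isdisjoint(q)]
```

Running this on D_0 = 121323 reproduces I(321) = 8, R(321) = 4, I(12 ∼ 32) = 8, and R(12 ∼ 32) = 1.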

Note that in general I(s) depends only on the values of the characters in s, not on their order. If s \in \mathcal{O}(1^{m_1} 2^{m_2} \cdots k^{m_k}) then

I(s) = \prod_v n_v^{\underline{m_v}}

where n^{\underline{r}} is the falling factorial power

n^{\underline{r}} = n(n-1)(n-2) \cdots (n-r+1).

Here is a simple algorithm for computing I(s), suitable when the length of s is small.

Count-Embeddings(s)
1. I ← 1
2. for i ← 1 to Length(s)
3.     v ← s(i)
4.     I ← I × n_v
5.     n_v ← n_v − 1
6. for i ← 1 to Length(s)
7.     v ← s(i)
8.     n_v ← n_v + 1
9. return I
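A direct transcription of Count-Embeddings in Python (a sketch; here the multiplicities n_v are recomputed from D_0 rather than kept as global state, so the second loop that restores them is unnecessary):

```python
from collections import Counter

def count_embeddings(s, D0):
    # I(s) = product of falling factorial powers n_v^(m_v),
    # computed by decrementing the count of each value as it is used
    n = Counter(D0)
    I = 1
    for v in s:
        I *= n[v]
        n[v] -= 1
    return I
```

For instance, count_embeddings("321", "121323") returns 8, matching the example above.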

G.1 First Moment

We have m_1 = E des(π) = \sum_i EX_i, and

EX_i = P(X_i = 1)(1) + P(X_i = 0)(0) = P(X_i = 1) = P(π(i) > π(i+1)).

Let u = D(i), v = D(i+1). We know that π(i) can be any position at which D_0 has a u, with equal probability, and likewise π(i+1) is uniformly distributed over D_0^{-1}(v). If u = v, it follows that P(X_i = 1) = 1/2. Otherwise, π(i) and π(i+1) are independent random variables, so (π(i), π(i+1)) is uniformly distributed over the set D_0^{-1}(u) \times D_0^{-1}(v). Therefore

EX_i = P(π(i) > π(i+1)) = \frac{\#\{v\text{-}u \text{ pairs in } D_0\}}{n_u n_v} = \frac{1}{n_u n_v} \sum_{k \in D_0^{-1}(u)} A(v,k).

If we preprocess D_0, we can compute and save the values of n_v, D_0^{-1}(v), and A(v,k) for all k and all v in O(n^2) time. Afterward we can compute EX_i in O(n) time, and hence compute m_1 in O(n^2) time.

Here are some algorithms to do that. PR is a global array which accumulates results so they need not be calculated twice. It is assumed to be set to all negative values at program runtime.

Pair-Expectation(i)
1. u ← D(i), v ← D(i+1)
2. if (u = v)
3.     return 1/2
4. return Count-Pair-Reversals(u, v)/(n_u n_v)

Count-Pair-Reversals(u, v)
1. if (PR(u, v) < 0)
2.     PR(u, v) ← 0
3.     for each k ∈ D_0^{-1}(u)
4.         PR(u, v) ← PR(u, v) + A(v, k)
5. return PR(u, v)
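As a check, the first-moment formula can be compared against a brute-force average over T(D, D_0). In this Python sketch (hypothetical helper names, 0-indexed positions, exact rationals), T(D, D_0) is enumerated as the set of bijections π with D_0(π(i)) = D(i):

```python
from fractions import Fraction
from itertools import permutations

def descents(pi):
    return sum(1 for i in range(len(pi) - 1) if pi[i] > pi[i + 1])

def m1_brute(D, D0):
    # average of des(pi) over all pi in T(D, D0)
    n = len(D)
    T = [pi for pi in permutations(range(n))
         if all(D0[pi[i]] == D[i] for i in range(n))]
    return Fraction(sum(descents(pi) for pi in T), len(T))

def m1_formula(D, D0):
    # m_1 = sum_i E X_i, with E X_i = 1/2 when D(i) = D(i+1), and otherwise
    # E X_i = (1/(n_u n_v)) * sum over positions k of u in D0 of A(v, k)
    n = len(D)
    pos = {v: [k for k in range(n) if D0[k] == v] for v in set(D0)}
    A = lambda v, m: sum(1 for l in range(m) if D0[l] == v)  # v's above position m
    total = Fraction(0)
    for i in range(n - 1):
        u, v = D[i], D[i + 1]
        if u == v:
            total += Fraction(1, 2)
        else:
            total += Fraction(sum(A(v, k) for k in pos[u]),
                              len(pos[u]) * len(pos[v]))
    return total
```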

G.2 Second Moment

For the second moment, we have

m_2 = E (X_1 + X_2 + \cdots + X_{n-1})^2 = E \Bigl( \sum_i X_i^2 + 2 \sum_{i<j} X_i X_j \Bigr).

Since X_i^2 = X_i, that gives

m_2 = m_1 + 2 \sum_{i<j} E(X_i X_j).

Since each X_i only takes the values 0 or 1, the same can be said of X_iX_j. Therefore

E(X_iX_j) = P(X_iX_j = 0)(0) + P(X_iX_j = 1)(1)
= P(X_iX_j = 1) = P(X_i = 1 \text{ and } X_j = 1)
= P(π(i) > π(i+1) \text{ and } π(j) > π(j+1)).

G.2.1 Consecutive Variables

First, suppose j = i + 1. Then

E(X_iX_j) = P(π(i) > π(i+1) > π(i+2)).

Let u = D(i), v = D(i+1), w = D(i+2). If u = v = w, the answer is 1/6, since all orderings of the three are equally likely. Otherwise, use the fact that E(X_iX_{i+1}) = R(uvw)/I(uvw).

R(uvw) can be computed by enumerating all the ways to place a v into D_0, then place a u below that position and a w above it. If the v is placed at position k, there are A(w,k) places to put the w above it and B(u,k) ways to place the u below it. Therefore

R(uvw) = \sum_{k \in D_0^{-1}(v)} A(w,k) B(u,k).

Here are some algorithms for computing E(X_iX_{i+1}):

Consecutive-Expectation(i)
1. u ← D(i), v ← D(i+1), w ← D(i+2)
2. return Count-Triple-Reversals(u, v, w)/Count-Embeddings(uvw)

Count-Triple-Reversals(u, v, w)
1. if (TR(u, v, w) ≥ 0)
2.     return TR(u, v, w)
3. m ← n_u
4. R ← 0
5. if (u = v)
6.     if (v = w)
7.         return Count-Embeddings(uvw)/6
8.     m ← m − 1
9. for each i ∈ D_0^{-1}(v)
10.     R ← R + A(w, i)(m − A(u, i))
11. TR(u, v, w) ← R
12. return R

Here TR, like PR, is a global array which is initially filled with negative numbers.

It collects the results of calls to Count-Triple-Reversals so they need not be

computed twice.

Note that we can compute B(u,i) from A(u,i), because

A(u,i) + B(u,i) = \#(D_0^{-1}(u) \setminus \{i\}) = \begin{cases} n_u & \text{if } D_0(i) \ne u \\ n_u - 1 & \text{if } D_0(i) = u \end{cases}

and in the algorithm i \in D_0^{-1}(v), so D_0(i) = v, and therefore

A(u,i) + B(u,i) = \begin{cases} n_u & \text{if } u \ne v \\ n_u - 1 & \text{if } u = v \end{cases} = m.
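The identity R(uvw) = Σ_{k∈D_0^{-1}(v)} A(w,k)B(u,k) is easy to test against direct enumeration. In this Python sketch (hypothetical helper names, 0-indexed), B is computed directly rather than as m − A, which also handles repeated values among u, v, w:

```python
def A(D0, x, k):
    # x's strictly above position k
    return sum(1 for l in range(k) if D0[l] == x)

def B(D0, x, k):
    # x's strictly below position k
    return sum(1 for l in range(k + 1, len(D0)) if D0[l] == x)

def triple_reversals(u, v, w, D0):
    # R(uvw): place the v, then a w above it and a u below it
    return sum(A(D0, w, k) * B(D0, u, k)
               for k in range(len(D0)) if D0[k] == v)

def triple_reversals_brute(u, v, w, D0):
    # embeddings of uvw with positions in strictly decreasing order
    n = len(D0)
    return sum(1 for pu in range(n) for pv in range(n) for pw in range(n)
               if pu > pv > pw and (D0[pu], D0[pv], D0[pw]) == (u, v, w))
```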

G.2.2 Nonconsecutive Variables

Now suppose that j > i + 1. Then the two pairs (i, i+1) and (j, j+1) are disjoint. Let

u = D(i), v = D(i+1), w = D(j), x = D(j+1).

Suppose u = v. Then given any values for π(j) and π(j+1), exactly half of the ways to choose π(i) and π(i+1) will be such that π(i) > π(i+1). So

E(X_iX_j) = P(π(i) > π(i+1) \text{ and } π(j) > π(j+1)) = P(π(j) > π(j+1))/2 = EX_j/2.

Likewise if w = x, E(X_iX_j) = EX_i/2.

So assume u ≠ v and w ≠ x. There are 15 ways to partition the set \{u,v,w,x\} into equivalence classes, and in 8 of them either u = v or w = x, or both. That leaves 7 cases to consider:

uw|vx, ux|vw, uw|v|x, ux|v|w, vw|u|x, vx|u|w, u|v|w|x

where the bars separate equivalence classes, so "uw|vx" means "u = w and v = x but u ≠ v".

u|v|w|x. If all the values are distinct, then the placement of u and v has no effect on the placement of w and x, from which it follows that X_i and X_j are independent. So in this case

E(X_iX_j) = P(X_i = 1 \text{ and } X_j = 1) = P(X_i = 1)P(X_j = 1) = EX_iEX_j.

ux|v|w. Here u = x, so E(X_iX_j) = R(uv ∼ wu)/I(uv ∼ wu) where u, v, and w are distinct values. I(uv ∼ wu) = n_u(n_u-1) n_v n_w. To compute R(uv ∼ wu), first count all ways to place uv and wu into D_0 such that both pairs are reversed, then subtract off the "illegal" maps in which there is overlap. There are R(uv) ways to reverse uv, and we know how to compute that; we did it when we found the first moment:

R(uv) = \sum_{i \in D_0^{-1}(u)} A(v,i).

There are R(wu) ways to reverse wu, and we can likewise calculate that in O(n) time. Two placements can overlap only if the u's go to the same place, and the number of ways to accomplish that is the number of ways to pick a u in D_0, pick a v above it, and pick a w below it. But that's R(wuv). So

R(uv ∼ wu) = R(uv)R(wu) − R(wuv)

which means we can calculate E(X_iX_j) in O(n) time.

vw|u|x. Here v = w, so E(X_iX_j) = R(uv ∼ vx)/I(uv ∼ vx). This is really the same problem as the last if you change the letters, since the relative order of the pairs does not matter. Therefore I(uv ∼ vx) = n_u n_v(n_v-1) n_x and

R(uv ∼ vx) = R(uv)R(vx) − R(uvx).

uw|v|x. Here u = w, so E(X_iX_j) = R(uv ∼ ux)/I(uv ∼ ux). In all of these cases, it is obvious how to calculate I in constant time, so we will concentrate on R. As before, there are R(uv)R(ux) ways to make assignments which reverse the pairs. The number of "bad" assignments is the number of ways to pick a u, a v, and an x in D_0 in such a way that the u is below both the other two. That is the number of ways to reverse uvx plus the number of ways to reverse uxv. In other words,

R(uv ∼ ux) = R(uv)R(ux) − R(uvx) − R(uxv).

vx|u|w. Here v = x, so E(X_iX_j) = R(uv ∼ wv)/I(uv ∼ wv). This is like the previous case, but the bad assignments have a v on the top and a u and a w below it in some order. So

R(uv ∼ wv) = R(uv)R(wv) − R(uwv) − R(wuv).

ux|vw. Here u = x and v = w, so E(X_iX_j) = R(uv ∼ vu)/I(uv ∼ vu). There are two ways in which a pair of assignments can overlap: either the v's can go to the same place with one u on either side, or the u's can go to the same place with a v on either side. There cannot be two overlaps, because then one of the pairs would not be reversed. So

R(uv ∼ vu) = R(uv)R(vu) − R(uvu) − R(vuv).

uw|vx. Here u = w and v = x, so E(X_iX_j) = R(uv ∼ uv)/I(uv ∼ uv). In this case the u's can overlap, or the v's can overlap, or both. If the u's overlap and the v's do not, there are three destinations, one in D_0^{-1}(u) and two in D_0^{-1}(v). Both the v's should be above the u, so the number of destination sets is R(uvv). There are two ways to map to each destination set, since it does not matter which v is on top. Therefore there are 2R(uvv) bad assignments in which the u's overlap and the v's do not. Likewise if the v's overlap but the u's do not. There are R(uv) ways to make both u's and v's overlap. So

R(uv ∼ uv) = R(uv)^2 − 2R(uvv) − 2R(uuv) − R(uv).

Here is an algorithm for computing E(X_iX_j) where j > i + 1:

Nonconsecutive-Expectation(i, j)
1. u ← D(i), v ← D(i+1)
2. w ← D(j), x ← D(j+1)
3. if (u = v)
4.     return Pair-Expectation(j)/2
5. if (w = x)
6.     return Pair-Expectation(i)/2
7. R ← Count-Pair-Reversals(u, v) × Count-Pair-Reversals(w, x)
8. if (u = w or v = w)
9.     R ← R − Count-Triple-Reversals(u, v, x)
10. if (u = w)
11.     R ← R − Count-Triple-Reversals(u, x, v)
12. if (u = x or v = x)
13.     R ← R − Count-Triple-Reversals(w, u, v)
14. if (v = x)
15.     R ← R − Count-Triple-Reversals(u, w, v)
16. if (u = w and v = x)
17.     R ← R − Count-Pair-Reversals(u, v)
18. return R / Count-Embeddings(uvwx)
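These inclusion-exclusion identities can be spot-checked by brute force. The Python sketch below (hypothetical helper names, 0-indexed) tests the case ux|v|w, i.e. R(uv ∼ wu) = R(uv)R(wu) − R(wuv), on the running example deck:

```python
def positions(D0, x):
    return [k for k in range(len(D0)) if D0[k] == x]

def R2(a, b, D0):
    # R(ab): placements of the pair ab with the a below the b
    return sum(1 for pa in positions(D0, a) for pb in positions(D0, b)
               if pa != pb and pa > pb)

def R3(a, b, c, D0):
    # R(abc): placements of abc with positions in strictly decreasing order
    return sum(1 for pa in positions(D0, a) for pb in positions(D0, b)
               for pc in positions(D0, c)
               if len({pa, pb, pc}) == 3 and pa > pb > pc)

def R_uv_wu_brute(u, v, w, D0):
    # place uv reversed and wu reversed with no overlap
    return sum(1 for pu1 in positions(D0, u) for pv in positions(D0, v)
               for pw in positions(D0, w) for pu2 in positions(D0, u)
               if len({pu1, pv, pw, pu2}) == 4 and pu1 > pv and pw > pu2)
```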

ABSTRACT

Shuffling Decks With Repeated Card Values

by

Mark A. Conger

Chair: Divakar Viswanath

This thesis considers the effect of riffle shuffling on decks of cards, allowing for some cards to be indistinguishable from other cards. The dual problem of dealing a game with hands, such as bridge or poker, is also considered. The Gilbert-Shannon-Reeds model of card shuffling is used, along with variation distance for measuring how close to uniform a deck has become.

A method is found for approximating the variation distance from uniform when the size of a shuffle is large. This leads to a number of results for specific card games.

In particular, the normal cyclic way that bridge is dealt is not optimal: changing to back-and-forth dealing can add as much randomness to the game as performing 3.7 more shuffles. Also: one fewer shuffle is needed to mix a go-fish deck (in which suit is irrelevant) than to mix a deck of 52 distinct cards; shuffling a deck with two types of cards is greatly sped up if the top and bottom cards of the deck initially have the same value; and a poker deck is best cut in such a way that the cards to be played come from the middle of the shuffled pack.

Several Monte Carlo methods are also discussed, for use in estimating values that

are beyond the means of current technology to calculate exactly. The results of two

large supercomputer simulations for bridge dealing are reported.

Among the other results are methods for computing the transition probability between decks when one of them has special characteristics (in particular, when one of them is sorted, or when the target deck consists of blocks of cards with the same value). This leads to an investigation of the joint distribution of two statistics on the symmetric group, des(π) and π(1), and a generalization of the Eulerian numbers.

That results in a formula for the number of permutations with a given number of descents and a given initial value, and a proof that the expected value of π(1), when

π is chosen uniformly from those permutations with d descents, is d + 1.