Shuffling Decks With Repeated Card Values
by Mark A. Conger
A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Mathematics) in The University of Michigan 2007
Doctoral Committee:
Assistant Professor Divakar Viswanath, Chair
Professor Sergey Fomin
Professor Keith Riles
Professor Bruce Sagan, Michigan State University

© Mark A. Conger
All Rights Reserved 2007

ACKNOWLEDGEMENTS
This thesis has been more than 16 years in the making, so there are a lot of people
to thank.
Thanks to my friends and roommates Steve Blair and Paul Greiner, who managed
to get me through my first year at Michigan (1990-91). As Steve said to me once,
years later, “Together, we made a pretty good graduate student.”
Thanks to Prof. Larry Wright at Williams for teaching me about Radix Sort. And
thanks to everyone who shared their punch cards and punch card stories with me:
John Remmers, Ken Josenhans, Thom Sterling, and Larry Mohr among others.
Thanks to Profs. Bill Lenhart at Williams and Phil Hanlon at Michigan who taught
me basic enumerative combinatorics, especially how to switch summation signs. Prof.
Ollie Beaver taught me how to use confidence intervals, and Martin Hildebrand
taught me the Central Limit Theorem. Pam Walpole and Pete Morasca taught me
many things I use every day.
Most of the programs used to test hypotheses and generate Monte Carlo data were
written in C/C++. The thesis was typeset in LaTeX using the PSTricks package, with a number of figures written directly in PostScript. Other figures, as well as the glossary and notation pages, were generated from scripts by Perl programs. Certain tables were created by an XSLT stylesheet. MAPLE programs were used when large rational results were required. So thank you to Brian Kernighan, Dennis Ritchie, Bjarne Stroustrup, Donald Knuth, Leslie Lamport, Timothy Van Zandt, Adobe Systems, Larry Wall, the World Wide Web Consortium, and Maplesoft for those languages.
Thanks to Profs. Mark Skandera and Boris Pittel for explanations of the Neggers-
Stanley conjecture. Prof. Herbert Wilf encouraged the work in Chapter IV. Prof.
Jeff Lagarias found me a copy of [22].
Thanks to all my friends with Ph.D.’s who inspired me by example during my time in the wilderness. They include Paul Greiner, Rick Mohr, Thom Sterling, Jill Baker,
Ray Bingham, Lucy Hadden, Will Brockman, Ming Ye, Xueqing Tan, Ivan Yourshaw,
Ken Hodges, Su Fang Ng, John Remmers, Steven Lybrand, Wil Lybrand, Jan Wolter, and Larry Mohr.
Thanks to Profs. Keith Riles and Roberto Merlin, who let me sit in on their physics classes in 1999. If they had been less encouraging, I might have gone back to programming for a living.
Thanks to Prof. Al Taylor, who encouraged me to get back into the math program, and has acted as my unofficial protocol adviser throughout.
Thanks to all the staff in the graduate math office at Michigan, especially Warren
Noone, Tara McQueen, Christine Betz Bolang, Bert Ortiz, and Jennifer Wagner.
Overcoming bureaucracy has always been a challenge for me, and it helped enormously to have advocates (and often surrogates) for dealing with the powers that be. The department also generously paid my tuition for the Winter 2006 semester.
Thanks to Jayne London for fostering all the programs for graduate students over at
Rackham, and for talking to me about options.
Many thanks to my good friend and teaching partner Jason Howald, who was always encouraging and helpful when I was stuck, and who gave me the greatest gift one mathematician can give another: he listened to me talk about my problem, thought about it, and worked with me to find a different approach.
Thanks to Profs. Persi Diaconis and Jason Fulman for conversations and tips on directions to take, and to Jim Reeds and Ed Gilbert for explaining the history to me.
Thanks to Prof. Bruce Sagan, who helped me with mathematical as well as political advice on a number of occasions.
Sergi Elizalde talked with me about EDπ(1) (Chapter IV, Theorem 4.3), and wrote a proof a few days after I did.
Thanks to all my office mates: Kamran Kashef, Ken Keppen, Jared Maruskin, Jiarui
Fei, Dave Constantine, and Janis Stipins, for putting up with my clutter (and my primitive origami).
Thanks to Teresa Hunt for years of encouragement and for several pearls of wisdom and tricks for getting things done.
Thanks to Nito for keeping me company.
Thanks to my study partners of the past few years: Cornelia Yuen, Paul Greiner,
Rob Houck, and Chris Bichler.
Many thanks to my stepfather, Wil Lybrand, who gave me the invaluable advice to treat the Ph.D. as a union card, not a life's work.
Prof. Sergey Fomin taught me everything I know about Young diagrams and symmetric functions, and he has always been generous with his time and his support.
My advisor Divakar Viswanath, of course, was more involved with this work than anyone, and deserves the most credit for its success. We began work on the topic of card shuffling in 2002, when he gave me a copy of [3] to read as part of a class on Markov Chains. He suggested thinking about problems that the authors had not considered, and that was the source of the question about decks with repeated cards.
Throughout the development of the ideas in this thesis he has usually been several steps ahead of me.
My Mother, Drew Conger, has been unbelievably patient waiting for me to finish my degree. I learned not only patience but almost everything else I value from her.
Carol Mohr has endured the most in service of this project. She has also given the most support, and for that I am forever grateful.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS ...... ii
LIST OF FIGURES ...... x
LIST OF TABLES ...... xiii
LIST OF APPENDICES ...... xiv
GLOSSARY ...... xv
NOTATION ...... xx
CHAPTER
I. Introduction ...... 1
1.1 Previous Results ...... 2
1.2 Repeated Cards ...... 6
1.3 New Results ...... 8
1.3.1 Dealing is Equivalent to Fixing the Target Deck ...... 8
1.3.2 Transition Probabilities and Descent Polynomials ...... 9
1.3.3 Approximating Transition Probabilities ...... 9
1.3.4 Bridge ...... 11
1.3.5 Other Results from First-Order Approximations ...... 12
1.3.6 Monte Carlo Simulations ...... 13
1.3.7 Calculation of Descent Polynomials for Certain Decks ...... 13
1.3.8 The Joint Distribution of des(π) and π(1) ...... 14
II. Preliminaries ...... 15
2.1 Permutations and Decks ...... 15
2.2 Repeated Values ...... 16
2.3 Mixing Problems ...... 18
2.4 Riffle Shuffling ...... 19
2.5 a-shuffles ...... 22
2.6 Inverse Shuffles and Repeated Shuffles ...... 23
2.7 Counting Shuffles Which Produce a Permutation ...... 26
2.8 Descent Polynomials and Shuffle Series ...... 29
2.9 Distance from Uniform ...... 30
2.10 Variation Distance ...... 33
2.11 Dealing Cards is Equivalent to Fixing the Target Deck ...... 36
2.12 How Good is the GSR Model? ...... 38
III. Probability Calculations for Some Simple Decks ...... 41
3.1 The Simplest Deck ...... 43
3.2 One Red Card on Top ...... 44
3.3 One Red Card on the Bottom ...... 45
3.4 Any Position to Top ...... 47
3.5 Any Position to Bottom ...... 49
3.6 Any Position to Any Position ...... 50
3.7 One Red Card, One Green Card ...... 52
3.8 Source Deck R^m B^n ...... 53
3.9 Target Deck R^m B^n ...... 56
3.10 Source Deck 1^{n_1} 2^{n_2} ... m^{n_m} ...... 57
3.11 Target Deck 1^{n_1} 2^{n_2} ... m^{n_m} ...... 60
3.12 Target Decks Containing Blocks ...... 61
IV. The Joint Distribution of π(1) and des(π) in S_n ...... 65
4.1 Introduction and Main Results ...... 65
4.2 Basic Properties ...... 68
4.3 Recurrences ...... 70
4.4 Formulas and Moments ...... 71
4.5 Application Using Stein's Method ...... 77
4.6 Generating Functions ...... 79
4.7 General Behavior ...... 85
4.8 Behavior if d n ...... 88
4.9 If Both Ends Are Fixed ...... 89
4.10 Remarks ...... 92
V. Estimating Variation Distance ...... 93
5.1 The Basic Questions of Card Shuffling ...... 93
5.2 Calculating Variation Distance Exactly ...... 95
5.3 Expansion of the Transition Probability as a Power Series in a^{−1} ...... 96
5.4 Approximating the Transition Probability and Variation Distance for Large Shuffles ...... 99
5.5 All-Distinct Decks ...... 102
5.6 Calculation of κ1 when the Source Deck is Fixed ...... 105
5.6.1 Two Card Types ...... 105
5.6.2 For Which Collections of Cards Can κ1 be Zero? ...... 108
5.6.3 Guessing How Large κ1 Can Be ...... 111
5.6.4 Finding κ1 for General Source Decks ...... 115
5.7 Calculation of κ̄1 when the Target Deck is Fixed ...... 118
5.7.1 For Which Collections of Cards Can κ̄1 be Zero? ...... 119
5.7.2 Finding κ̄1 for General Target Decks ...... 122
5.7.3 Euchre ...... 123
5.7.4 Straight Poker ...... 124
5.7.5 Ordered Deals ...... 126
5.7.6 Bridge ...... 129
5.7.7 Estimation of κ̄1 for large decks ...... 132
5.8 Bounding the Error in the First-Order Estimate ...... 133
5.9 Ways to understand the transition between two decks ...... 136
VI. Monte Carlo Estimates ...... 143
6.1 Approximating Variation Distance Given Descent Polynomials ...... 144
6.2 Approximating Transition Probabilities ...... 147
6.3 Approximating κ1 and κ̄1 ...... 154
APPENDICES ...... 158
BIBLIOGRAPHY ...... 210
LIST OF FIGURES
Figure
2.1 The four permutations which take BABA to AABB...... 17
2.2 The correspondence between the shuffle of Table 2.1 and a sequence of 0’s and 1’s. 21
2.3 From left to right, a 3-shuffle of ABCDEFGH; from right to left, an inverse 3-shuffle of DAGEFBHC...... 24
2.4 A radix sort of a set of 8 punch cards...... 25
2.5 The advantages for player B in the Ace-King Game (in black) and in Doyle's game (in red)...... 33
2.6 The variation distance from uniform of a distinct 52-card deck after an a-shuffle...... 36
3.1 The graphs of π, ρπ, πρ, and ρπρ, showing the way ρ changes ascents to descents and vice-versa...... 46
3.2 The two cases considered in Section 3.4...... 49
3.3 The shuffle described in Section 3.6...... 51
3.4 The second "red-green" shuffle described in Section 3.7...... 52
3.5 The shuffle of the source deck R^m B^n, as described in Section 3.8...... 54
3.6 The Young matrix A for the deck RBBRBRRBBBRRB...... 55
3.7 A shuffle to the target deck R^m B^n...... 56
3.8 Shuffling a sorted deck, as described in Section 3.10...... 58
3.9 A shuffle to the target deck 1^{n_1} 2^{n_2} ... m^{n_m}...... 61
4.1 A north-east lattice path from (0, 0) to (m + n − ℓ, ℓ)...... 75
4.2 The real zeroes of a Neggers-Stanley descent polynomial...... 84
4.3 The unimodality of ⟨n, d⟩_k for n = 6 and n = 7...... 87
4.4 The "rollback" procedure for finding ⟨n, d⟩_k^ℓ...... 91
5.1 The variation distance from uniform, and first-order approximation, of a distinct 52-card deck after an a-shuffle...... 104
5.2 Representing a deck as a walk on a graph...... 106
5.3 The deck D′ = 112212112212 represented as a north-east path from (0, 0) to (6, 6)...... 107
5.4 The effect of cutting a poker deck at position m...... 127
5.5 The effect of cutting a euchre deck at position m...... 127
5.6 Three styles of bridge dealing, represented by lattice paths...... 131
5.7 A summary of κ̄1 examples, with normal estimates...... 133
5.8 The bound E(n,a) on the error in the first-order estimate of variation distance, for n = 52, versus the actual error for the case of 52 distinct cards...... 136
6.1 A Monte Carlo estimate of the variation distance from uniform, and a first-order approximation, of a euchre deck after an a-shuffle...... 146
6.2 A Monte Carlo estimate of the variation distance from uniform, and first-order approximation, of an ordered bridge deck after an a-shuffle...... 147
6.3 A double Monte Carlo estimate of the variation distance from uniform of a cyclic bridge deck (D′_cyc = (NESW)^13)...... 152
6.4 A double Monte Carlo estimate of the variation distance from uniform of a back-and-forth bridge deck (D′_bf = (NESWWSEN)^6 (NESW))...... 153
6.5 Monte Carlo estimates for three methods of dealing bridge...... 153
6.6 The distances of Figure 6.5 graphed on a log-log scale...... 154
6.7 A Monte Carlo estimate of the variation distance from uniform of a sorted go-fish deck (D = A^4 2^4 ⋯ K^4)...... 156
C.1 Shuffling a sorted deck...... 178
C.2 The matrix B1 if the subsequence of 1's and 2's in D′ is 1221211222112...... 179
D.1 Riemann sums corresponding to probabilities in the Ace-King game...... 186
LIST OF TABLES
Table
1.1 Variation distances from uniform after an a-shuffle for a deck of distinct cards and 3 methods of dealing bridge...... 12
2.1 The steps in a typical shuffle of the deck ABCDEFGH...... 20
5.1 Ordering a deck so that κ1 = 0...... 110
6.1 The distribution of des over a sample of M = 10^10 permutations from T(D1, D′_cyc), where D1 is as in Equation (6.3)...... 149
LIST OF APPENDICES
Appendix
A. Tables of Eulerian and Refined Eulerian Numbers ...... 159
B. Excerpts from Calcul des Probabilités (1912) by Henri Poincaré ...... 163
B.1 Introduction, pp. 13–15 ...... 163
B.2 Chapter XVI, "Diverse Questions", pp. 301–313 ...... 165
C. An Algorithm for Computing the Descent Polynomial when the Source Deck is Sorted ...... 177
D. Probabilities for the Ace-King Game ...... 183
E. The Variance of N_{c1} When the Source Deck is Fixed ...... 188
F. The Variance of N_{c1} When the Target Deck is Fixed ...... 194
G. Moments of the Descent Distribution over T(D, D′) ...... 200
G.1 First Moment ...... 203
G.2 Second Moment ...... 204
G.2.1 Consecutive Variables ...... 204
G.2.2 Nonconsecutive Variables ...... 206

GLOSSARY
arrow diagram A figure describing the action of a permutation on a deck. The source deck appears on one side of the diagram, the target deck on the other, and arrows are drawn between corresponding cards. (p. 16)

ascent When applied to a permutation π, a position k such that π(k) < π(k + 1). (See descent.) (p. 27)

a-shuffle A method of shuffling cards in which a deck is cut into a packets, according to a multinomial distribution on the packet sizes, then riffled together. There are a^n possible a-shuffles of a deck of n cards, and they are all equally probable under the GSR model. (p. 22)

blackjack A casino game played with ordinary playing cards. The suit of cards is ignored, and there are 10 categories of rank: the aces, the twos, ..., the nines, and the rest. Blackjack may be played with any number of ordinary decks mixed together, but when played with 52 cards, we may think of the deck as an ordering of A^4 2^4 ⋯ 9^4 T^16. (p. 114)

bottom card The card in position n of an n-card deck. (p. 15)

bridge A game played with an ordinary deck of 52 cards, all of which are considered distinct. Each of the four players (commonly referred to as north, east, south, and west) receives a 13-card hand. The order in which a player receives his cards is irrelevant, so we may consider a bridge deal to be some element of 𝒪(N^13 E^13 S^13 W^13). (p. 129)

card An element in a deck, which has a value and a position in the deck. (p. 15)

cutoff In a mixing problem, the amount of mixing after which the variation distance to uniform begins to drop quickly. (p. 94)

deal A partitioning of a collection of cards into sets with prescribed sizes. (p. 37)

deck A sequence of cards. May be thought of as a function from {1, 2, ..., n} into an arbitrary set of values. (p. 15)

descent When applied to a permutation π, a position k such that π(k) > π(k + 1). If the permutation is treated as a function and graphed in R^2, with line segments drawn between points, descents are the positions where the graph has negative slope. (p. 27)

descent polynomial The ordinary generating function for the number of permutations in a set R ⊂ S_n which have d descents. R is usually the transition set between two chosen decks. (p. 29)

digraph Two consecutive cards; we say a deck D has a u-v digraph at i if D(i) = u and D(i + 1) = v. (p. 99)

embedding Given two decks s and t, an embedding is an injection φ from the positions of s to the positions of t such that t ∘ φ = s. In other words, an embedding is a subdeck of t which matches s in order and content. (p. 201)

euchre A card game played with 4 players and a deck of 24 distinct cards. Each player receives 5 cards. The remaining 4 cards are the "kitty." The top card of the kitty is face up and may enter play, but the other 3 cards are hidden and may not. The dealer deals two or three cards at a time, so the deal sequence is 111223334411222334445666. (p. 123)

go-fish A children's card game played with a normal 52-card deck, in which the rank of cards is important but their suit is not. So we may think of a go-fish deck as an ordering of A^4 2^4 ⋯ K^4. (p. 113)

graph of a permutation The graph obtained from a permutation π ∈ S_n by drawing a sequence of line segments from (1, π(1)) to (2, π(2)), from (2, π(2)) to (3, π(3)), ..., and from (n − 1, π(n − 1)) to (n, π(n)). (p. 45)

GSR model The Gilbert-Shannon-Reeds model of card shuffling. It can be described either by assuming that a deck is cut according to a binomial distribution, and then cards are dropped with probability proportional to packet size, or by stating that all 2^n shuffles of an n-card deck are equally likely. See also a-shuffle. (p. 19)

ordered deck The deck e = 1, 2, 3, ..., n. (p. 16)

pair Two cards that appear as a subsequence of a deck; we say a deck D has a u-v pair at (i, j) if i < j, D(i) = u, and D(j) = v. (p. 99)

palindrome A deck that is the same backwards as forwards. (p. 140)

permutation of n letters A bijection from {1, 2, ..., n} to itself. (p. 15)

rising sequence A maximal sequence of consecutive numbers that appears as a subsequence of some deck. When applied to a permutation, the deck in question is the result of applying the permutation to the ordered deck. (p. 26)

shuffle series The ordinary generating function of the number of a-shuffles that produce a permutation in R, for some R ⊂ S_n. R is usually the transition set between two chosen decks. (p. 29)

source deck The deck being acted upon. (p. 17)

stabilizer When a group G acts on a set X, the stabilizer of some element x in X is the subgroup of elements of G which leave x fixed. (p. 17)

stable sort Sorting a list of items by some particular attribute in such a way that if two items have the same attribute value, they will be in the same relative order after the sort as they were before. (p. 23)

straight poker A game played with any number of players, and a regular deck of cards. Each player receives 5 cards. All cards are distinct, but the order in which a player receives his cards is not significant. (p. 124)

symmetric group The group of all permutations of {1, 2, ..., n}. (p. 15)

target deck The result of an action on a deck. (p. 17)

top card The card in position 1 of a deck. (p. 15)

twin digraph A digraph of the form uu. (p. 109)

uniform distribution Given a finite event space ℰ = {E_1, E_2, ..., E_N}, a probability distribution on ℰ is said to be uniform if P(E_i) = 1/N for all i. (p. 18)

variation distance A measure of the distance between two probability distributions defined on the same event space. The variation distance between distributions P and Q is equal to the maximum of |P(W) − Q(W)| as W ranges over all sets of events. (p. 34)

winding number The number of rising sequences of a permutation. (p. 27)

Young matrix A matrix corresponding to a deck made up of two kinds of cards. The (i, j) entry of the matrix is x if the ith card of the first type appears after the jth card of the second type, and 1 otherwise. The x's in the matrix form a Young shape, anchored to the lower-left corner of the matrix. (p. 55)
NOTATION
P(U) The probability of the event U.
E(X) The expectation of the random variable X.
#U The size of the set U.
[A] The truth value of a logical statement A: [A] is 0 if A is false and 1 if A is true. (p. 49)

x^{m̄} (rising factorial) x(x + 1)(x + 2) ⋯ (x + m − 1) (p. 74)

x^{m̲} (falling factorial) x(x − 1)(x − 2) ⋯ (x − m + 1) (p. 74)

S_n The symmetric group of all permutations of n letters. (p. 15)

ρ The reversal permutation: ρ ∈ S_n is such that ρ(i) = n + 1 − i for all i. (p. 45)

⟨n, d⟩ (Eulerian number) The number of permutations in S_n that have d descents. (p. 43)

⟨n, d⟩_k (Refined Eulerian number) The number of permutations π ∈ S_n with des(π) = d and π(1) = k. (p. 44)

⟨n, d⟩_k^ℓ The number of permutations π ∈ S_n with des(π) = d, π(1) = k, and π(n) = ℓ. These numbers can be written in terms of the ⟨n, d⟩_k. (p. 52)

a_n(x) The ordinary generating function for the Eulerian numbers: Σ_d ⟨n, d⟩ x^d. a_0(x) is defined to be 1. (p. 43)

g_{n,k}(x) The ordinary generating function for the refined Eulerian numbers: Σ_d ⟨n, d⟩_k x^d. (p. 45)

G_n The row vector (g_{n,1}(x), g_{n,2}(x), ..., g_{n,n}(x)). (p. 55)

Ḡ_n The row vector (g_{n,n}(x), g_{n,n−1}(x), ..., g_{n,1}(x)). (p. 55)

h_{n,k,ℓ}(x) The ordinary generating function for the descents of permutations with both ends fixed: Σ_d ⟨n, d⟩_k^ℓ x^d. h_{n,k,ℓ} can be written in terms of g_{n,j} where j = k − ℓ (mod n). (p. 52)

H_n The n × n matrix whose (i, j) entry is h_{n,i,j}. (p. 59)

e The ordered deck: 1, 2, 3, ..., n. (p. 16)

T(D, D′) The set of permutations which, when applied to deck D, result in D′. (p. 17)

W(D, u, v) The number of u-v digraphs in the deck D minus the number of v-u digraphs in D. (p. 99)

Z(D, u, v) The number of u-v pairs in the deck D minus the number of v-u pairs in D. (p. 99)

||P − Q|| The variation distance between the probability distributions P and Q. (p. 34)

κ1(D) The coefficient of a^{−1} in the expansion of the variation distance from uniform of the source deck D after an a-shuffle. (p. 102)

κ̄1(D′) The coefficient of a^{−1} in the expansion of the variation distance from uniform of the target deck D′ after an a-shuffle. Fixing the target deck corresponds to a method of dealing cards. (p. 102)

[n, k] (Unsigned Stirling number of the first kind) The number of permutations of n letters which have exactly k cycles. (p. 140)

{n, k} (Stirling number of the second kind) The number of ways to partition n things into k nonempty subsets. (p. 140)

ℋ(n, a) The set of a-shuffles of n cards. (p. 23)

des(π) The number of descents of the permutation π. (p. 27)

asc(π) The number of ascents of the permutation π. (p. 27)

P_a(π) The probability of obtaining the permutation π as the result of an a-shuffle. (p. 28)

𝒟_R(x) The descent polynomial of R. (p. 29)

𝒮_R(x) The shuffle series of R. (p. 29)

𝒟(D, D′; x) The descent polynomial of T(D, D′). (p. 30)

𝒮(D, D′; x) The shuffle series of T(D, D′). (p. 30)

P_a(D → D′) The probability that an a-shuffle of the deck D results in the deck D′. (p. 33)

𝒪(D) The orbit of a deck D when it is acted on by the symmetric group; i.e., all rearrangements of D. (p. 95)

D_bj The blackjack deck (A23456789)^4 T^16. (p. 117)

D′_euchre The euchre deck 111223334411222334445666. (p. 123)

D′_ord The ordered bridge deck N^13 E^13 S^13 W^13. (p. 129)

D′_cyc The cyclic bridge deck (NESW)^13. (p. 130)

D′_bf The back-and-forth bridge deck (NESWWSEN)^6 NESW. (p. 131)
CHAPTER I
Introduction
This work is about the mathematics of card shuffling—specifically, riffle shuffling—when the cards in the deck are not necessarily distinct, or when they are distinct but will be dealt into hands after the shuffle. The Gilbert-Shannon-Reeds model of riffle shuffling, along with variation distance from uniform to measure randomness, is used throughout.
Playing cards in China date back to the 12th century, and in Europe, according to Epstein [19, p. 158], Gutenberg printed playing cards the same year he printed his first Bible. Since then a number of people have approached the problem of understanding how shuffling mixes a deck of cards. There are a variety of ways to shuffle, and several ways to measure randomness, and research has included mathematical modeling as well as accumulation of statistics.
We will take for granted that the goal of shuffling is to make it difficult to guess anything about the order of the cards. In other words, a deck has been thoroughly shuffled if every possible ordering of it has become equally likely. The difficulty is that most shuffling methods employed by humans are not thorough; instead they leave some bias toward certain orderings and away from others. The goal of a mathematician studying card shuffling is therefore to quantify and predict the bias.
1.1 Previous Results
Henri Poincaré devoted a section of his 1912 book Calcul des Probabilités [36], quoted in Appendix B, to card shuffling. His basic assumption is that the shuffler employs some simple method of shuffling, which places cards into new positions based on their current positions, but does not depend for its action on the order of the deck.
In other words, the shuffler selects a permutation from some fixed distribution on the set of all permutations, and applies it to the deck. (The distribution is what is meant by a “method of shuffling.”) Nearly all subsequent authors have made the same assumption.
Poincaré is able to show that any shuffling method which meets certain mild criteria, if applied repeatedly to a deck, will eventually result in a well-mixed deck—that is, with enough shuffles the bias can be made arbitrarily small. At the same time
Markov [34] was creating the more general theory of Markov Chains, and he often used card shuffling as an example. The verdict of history seems to be that Markov justly deserves credit for the theory named after him, but that Poincaré anticipated some of Markov's ideas in his work on card shuffling. Most subsequent work on shuffling has approached the problem as a Markov chain.
In 1955 Ed Gilbert [22, 23], working with Claude Shannon on the new science of information theory at Bell Labs, considered the ordering of a deck as a piece of information, and shuffling as "information destruction." So to Gilbert the bias of a shuffle is the information it leaves behind. In concordance with Poincaré he shows
that for any reasonable method of shuffling, the residual information decays to zero
as the number of shuffles rises.
Gilbert goes on to model several particular shuffles, including riffle shuffles, in which
the deck is cut in half and the two halves interleaved in some fashion. He quotes
a theorem of Shannon that characterizes which permutations are reachable with a
certain number of shuffles, and then gives lower and upper bounds on the residual
information based on the assumptions that all reachable permutations are equally
likely, and that all possible shuffles are equally likely. The latter assumption has
become the basis for the model employed in this work.
In 1981 Jim Reeds [37, 38] and David Aldous were thinking about generating random
permutations by sorting n samples of a uniform random variable. If the variable takes on values of 0 and 1, the sorting permutation, applied to a deck of cards, will pull out those cards corresponding to 0’s and move them to the top, preserving their order.
Reeds realized that this was the inverse of a riffle shuffle, and that it was similar to the way a radix sorter (Section 2.6) would sort punch cards if it chose bins randomly instead of reading the cards. Repeated shuffles correspond to sorting multiple times, which is how the radix sorter sorts multi-digit fields. The assumption of uniformity on the bits is equivalent to picking shuffles uniformly, which is what Gilbert and
Shannon had considered; thus the premise is known as the Gilbert-Shannon-Reeds model.
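Reeds' observation can be made concrete in a few lines. The sketch below is illustrative Python, not code from this thesis (the function name is mine): each card receives a uniform random bit, and a stable sort by bit pulls the 0-cards to the top in their original order, which is exactly an inverse riffle shuffle.

```python
import random

def inverse_riffle(deck, bits=None):
    """One inverse riffle shuffle (Reeds): give each card a uniform
    random bit, then stable-sort by bit, so the 0-cards move to the
    top while each group keeps its original relative order."""
    if bits is None:
        bits = [random.randint(0, 1) for _ in deck]
    # Python's sort is stable: ties (equal bits) keep their original order.
    return [card for _, card in sorted(zip(bits, deck), key=lambda t: t[0])]
```

Repeating this with fresh bits corresponds to the radix sorter processing one digit per pass; an inverse a-shuffle would use digits 0, ..., a−1 instead of bits.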
In 1983 Aldous [1], following Reeds, used card shuffling as an example of a random walk on a group. Aldous was interested in the general problem of how fast Markov chains approach their stationary distributions, and he related a number of ideas that are important to us. He measures the nonrandomness of a distribution over a finite
event space Ω by variation distance from uniform, which is defined to be

||P − U|| := (1/2) Σ_{ω ∈ Ω} |P(ω) − U(ω)|,

where U(ω) = 1/#Ω is the uniform probability. (In Sections 2.9 and 2.10 we make an argument for why variation distance from uniform is a good measure of randomness for card shuffling.) Aldous observes that certain "rapidly mixing" Markov chains approach uniform very suddenly; thus, a graph of variation distance versus time displays the "waterfall" shape which the reader will see repeatedly in the present work (for example, in Figure 2.6). He suggests that the best way to understand the behavior of such chains is to pick a small value ε and find the first time the variation distance from uniform drops below ε. That value has come to be known as the "cutoff time".
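The definition translates directly into code. The helper below is a hypothetical illustration (not notation from the text), computing the distance between two distributions given as dictionaries mapping outcomes to probabilities:

```python
def variation_distance(P, U):
    """||P - U|| = (1/2) * sum over outcomes w of |P(w) - U(w)|."""
    outcomes = set(P) | set(U)
    return sum(abs(P.get(w, 0) - U.get(w, 0)) for w in outcomes) / 2
```

For example, a fair coin compared against the uniform distribution on four outcomes is at distance 1/2: half the uniform mass sits on outcomes the coin can never produce.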
Aldous shows that GSR shuffling is a rapidly mixing chain, and (drawing from Reeds’ work) that the time when each card has been assigned a different bit string is a
“strong uniform time.” In other words, at that time, the initial ordering of the deck has become irrelevant, and the deck is thoroughly mixed. That allows Aldous to
prove that the cutoff time for shuffling an n-card deck is about (3/2) log₂(n) shuffles when n is large.
In 1986 Aldous and Persi Diaconis [2] gave the GSR model its name, and described it in three equivalent ways, one of which is given in Section 2.4. Again following
Reeds, they observe that the time when each card has been assigned a different bit string is similar to the famous “birthday problem” in probability. After 11 shuffles of a 52-card deck, the probability that any two cards have the same string (i.e., the probability that some pair of cards has always been in the same half of the deck for
each shuffle) drops below 1/2. In 1988 Diaconis [13] added a fourth description of the
GSR model, and also reported some experiments he and Reeds performed to test
how well the model represents the way people really shuffle (see Section 2.12).
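The birthday-problem calculation behind that "11 shuffles" figure can be checked directly: after k shuffles each card carries one of 2^k possible bit strings, so the chance that all 52 strings differ is the usual falling product. A sketch under that assumption (my function name), using exact rational arithmetic:

```python
from fractions import Fraction

def collision_probability(n, k):
    """Probability that, after k GSR shuffles of an n-card deck, some
    pair of cards has received the same k-bit string (a birthday
    problem with 2**k equally likely strings)."""
    strings = 2 ** k
    p_distinct = Fraction(1)
    for i in range(n):
        p_distinct *= Fraction(strings - i, strings)
    return 1 - p_distinct
```

For n = 52 the collision probability first drops below 1/2 at k = 11, matching the statement above.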
In 1992, Diaconis and Dave Bayer [3] wrote the most famous paper to date on card
shuffling. They generalized the normal idea of a shuffle to include a-shuffles, which is the way a person with a hands would shuffle a deck (see Section 2.5). That allows the problem to be approached without the use of Markov chains. Most importantly they derived an explicit formula for the probability of a permutation after an a-shuffle of an n-card deck:

P_a(π) = (1/a^n) · C(a + n − des(π) − 1, n),

where des(π) is the number of descents in π and C(·, ·) denotes a binomial coefficient. Using the formula, one can, for the first time, calculate precisely the variation distance from uniform of any deck of distinct cards after any number of GSR shuffles (see Section 2.10 and Figure 2.6). For a 52-card deck, the variation distance drops below 1/2 after 7 shuffles, which is why the New York Times article about the result had the headline "In Shuffling Cards, 7
Is Winning Number” [28]. Seven shuffles has been the standard for randomness for many card players ever since.
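That computation is reproducible with exact arithmetic: group the n! permutations by descent count using the Eulerian numbers, then sum the Bayer-Diaconis formula over descent classes. A sketch (function names are mine, using the standard Eulerian recurrence):

```python
from fractions import Fraction
from math import comb, factorial

def eulerian(n):
    """Eulerian numbers A[d] = number of permutations in S_n with d
    descents, via A(n, d) = (d+1) A(n-1, d) + (n-d) A(n-1, d-1)."""
    A = [1]
    for m in range(2, n + 1):
        A = [(d + 1) * (A[d] if d < len(A) else 0)
             + (m - d) * (A[d - 1] if d > 0 else 0)
             for d in range(m)]
    return A

def variation_distance_after_a_shuffle(n, a):
    """Exact ||P_a - U|| for a deck of n distinct cards, summing
    P_a(pi) = C(a + n - des(pi) - 1, n) / a**n over descent classes."""
    u = Fraction(1, factorial(n))
    total = Fraction(0)
    for d, count in enumerate(eulerian(n)):
        p_a = Fraction(comb(a + n - d - 1, n), a ** n)
        total += count * abs(p_a - u)
    return total / 2
```

Since k GSR shuffles compose to a single 2^k-shuffle, `variation_distance_after_a_shuffle(52, 2**7)` gives the post-7-shuffle distance of about 0.334, the first value below 1/2.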
This work picks up where Bayer and Diaconis left off, but we would be remiss not to point out that many other people have thought about card shuffling, from both mathematical and statistical points of view. In 1940 Borel and Chéron [5] devoted an entire chapter of their book on the mathematics of bridge to shuffling. In 1958
Kosambi and Rao [29] did experiments with a 25-card deck in order to show that certain ESP research, which used such a deck to test psychic ability, had not taken into account the effect of poor shuffling. In 1973 Berger [4] showed there was a measurable difference between bridge tournament hands picked by computers and
those dealt by hand, and he argues that the discrepancy is due to poor shuffling. In
1977 Epstein [19] investigated real shufflers by recording the sound of shuffles, and
he suggests a different model than Gilbert, Shannon and Reeds (see Section 2.12).
Since 1992, a number of authors have used Bayer and Diaconis' results and generalized them in various ways (see [14, 15] for references); we mention only a few
here. Diaconis, Pitman, and McGrath [17] calculated the distribution of a number
of quantities after an a-shuffle. Ciucu [8] found the most likely card to be in a particular position after any number of GSR shuffles. Trefethen & Trefethen [49] and
Stark, Ganesh, and O’Connell [44] revived Gilbert and Shannon’s idea of measuring
randomness as information loss, with some surprising results.
1.2 Repeated Cards
The problem of shuffling a deck with repeated cards is mentioned in [29], and the fact
that dealing a deck into hands affects the randomness created by shuffling appears
in [5], [22], and [35]. The new idea in the present work (and in [10] and [11]) is to
apply the results of Aldous, Diaconis, Reeds, et al. to the case of decks in which not
all cards are distinct. This turns out to be dual to the problem of decks which are
dealt into hands (see Section 2.11), and so the two problems are treated in tandem
throughout. Note that identifying cards has a natural application to card games
such as go-fish or blackjack, which are played with an ordinary deck but ignore the
suit of cards; so, for instance, the ace of spades is essentially identical to the ace of
hearts.
The generalization complicates the situation in the following ways:

• Decks (ordered sequences of cards) and transformations between decks can no longer be identified with permutations. Instead, for each pair of decks there is a set of permutations which transform the first into the second (and a different set which goes the other way). The transformation sets are easy to describe, but it is difficult to find their probability after shuffling.

• The initial order of a deck, and not just its composition, affects how fast the distribution approaches uniform.

• A deck can no longer be described by a single number (its size). That makes it hard to imagine a simple asymptotic result such as "The cutoff is near $\frac{3}{2}\log_2 n$ shuffles" which would apply to all conceivable decks.
In this work we fully embrace Bayer and Diaconis' generalization of repeated GSR shuffles to a-shuffles. The good news then is that the formula for probability after an a-shuffle is still available to us, as long as we cast the transformation between decks in terms of permutations. Aldous, Bayer, and Diaconis' method of measuring randomness, variation distance from uniform, is defined for any probability distribution, so it generalizes easily to the case of decks with repeated cards.
The other good news is that we have a richer set of questions we can ask than simply "How many times should one shuffle?", the precise answer to which will always be somewhat debatable, since it depends on how much unfairness one is willing to tolerate. For instance:

• Which ordering of a deck will shuffle the slowest?

• Which method of dealing produces the most random set of hands?

• Where should one cut the deck between shuffling and dealing?
1.3 New Results
1.3.1 Dealing is Equivalent to Fixing the Target Deck
Let $P_a(D \to D')$ be the probability of obtaining the deck $D'$ by $a$-shuffling the deck $D$. After $D$ is $a$-shuffled, the variation distance between the new distribution of decks and the uniform is

\[ \|P_a - U\| = \frac{1}{2} \sum_{D' \in \mathcal{O}(D)} \left| P_a(D \to D') - U(D') \right| \]

where $\mathcal{O}(D)$ is the set of all reorderings of $D$. On the other hand, if a deck of distinct cards is shuffled and then dealt to $k$ players, we may encode the method of dealing as a sequence $D'$ of letters from the alphabet $\{1, 2, \ldots, k\}$; for instance, dealing cyclically around the table corresponds to the string

\[ D' = (12\cdots k)(12\cdots k)\cdots(12\cdots k). \]

The uniform distribution in this case would make all partitions of the deck into hands equally likely, and in Section 2.11 we show that the variation distance from uniform after $a$-shuffling and then dealing with method $D'$ is

\[ \|P_a - U\| = \frac{1}{2} \sum_{D \in \mathcal{O}(D')} \left| P_a(D \to D') - U(D) \right|. \]

In other words, dealing corresponds to fixing the target (ending) deck, in the same way that identifying cards corresponds to fixing the source (starting) deck. (This idea is due to Viswanath, and appears first in [10].) Note that shuffling is an asymmetric process, so the two sums above are different. Thus we say that the problem of dealing is "dual" to the problem of identifying cards, but not identical.
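For decks small enough to enumerate, the fixed-source distance above can be computed directly from the a-shuffle probabilities of permutations. A brute-force Python sketch, not from the thesis; the three-card deck AAB and a = 2 are illustrative choices:

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def des(p):
    return sum(p[i] > p[i + 1] for i in range(len(p) - 1))

def perm_prob(p, a):
    """Probability of permutation p under an a-shuffle (Bayer-Diaconis formula)."""
    n = len(p)
    return Fraction(comb(n + a - 1 - des(p), n), a ** n)

def apply_perm(p, deck):
    """pi sends the card at position k to position p[k]."""
    out = [None] * len(deck)
    for k, card in enumerate(deck):
        out[p[k]] = card
    return tuple(out)

def transition(D, Dp, a):
    """P_a(D -> D'): total probability of the permutations taking D to D'."""
    return sum(perm_prob(p, a) for p in permutations(range(len(D)))
               if apply_perm(p, D) == Dp)

def variation_from_uniform(D, a):
    orderings = set(permutations(D))
    u = Fraction(1, len(orderings))
    return sum(abs(transition(D, Dp, a) - u) for Dp in orderings) / 2

print(variation_from_uniform(("A", "A", "B"), 2))   # 7/24
```

After a single GSR shuffle the deck AAB is still 7/24 away from uniform in variation distance; the transition probabilities to AAB, ABA, BAA are 5/8, 1/4, 1/8.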
1.3.2 Transition Probabilities and Descent Polynomials
In order to describe the behavior of the distribution after repeated shuffles, we need to be able to calculate the sums above for all values of $a$. Thus for either case (fixed source deck or fixed target deck) it is necessary to compute $P_a(D \to D')$ for all $a$. Since the probability of a permutation is determined by the number of descents it has, we could compute that probability if we knew how many descents each permutation that takes $D$ to $D'$ has. That is, if we know the coefficients of the "descent polynomial"

\[ \mathcal{D}(D, D'; x) = \sum_{\pi : \pi D = D'} x^{\operatorname{des}(\pi)} \]

we can calculate $P_a(D \to D')$ for any $a$.
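For small decks the descent polynomial can be found by brute force over the symmetric group. A Python sketch (illustrative only, not the thesis's algorithm):

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb

def des(p):
    return sum(p[i] > p[i + 1] for i in range(len(p) - 1))

def apply_perm(p, deck):
    out = [None] * len(deck)
    for k, card in enumerate(deck):
        out[p[k]] = card
    return tuple(out)

def descent_polynomial(D, Dp):
    """Coefficients {d: #permutations pi with pi D = D' and des(pi) = d}."""
    return Counter(des(p) for p in permutations(range(len(D)))
                   if apply_perm(p, D) == Dp)

def transition_from_poly(poly, n, a):
    """P_a(D -> D') assembled from the descent polynomial coefficients."""
    return sum(b * Fraction(comb(n + a - 1 - d, n), a ** n)
               for d, b in poly.items())

poly = descent_polynomial(("A", "A", "B"), ("B", "A", "A"))
print(dict(poly))   # {1: 1, 2: 1}, i.e. the polynomial x + x^2
```

The two permutations taking AAB to BAA have one and two descents respectively, so the descent polynomial is $x + x^2$, and the transition probability for any $a$ follows immediately.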
Thus the first step in calculating variation distances seems to be finding a method for computing
descent polynomials. Unfortunately, this turns out to be an intractable problem in
general. In [11] Viswanath demonstrates a class of decks for which computing descent
polynomials is #P-complete, meaning that it is equivalent to a large class of counting
problems which are believed not to have polynomial-time solutions. So it seems likely
that calculating variation distances exactly is also #P-complete in general.
1.3.3 Approximating Transition Probabilities
The new idea in Chapter V is to write $P_a(D \to D')$ as a polynomial in $a^{-1}$, thus allowing us to estimate the transition probability for large values of $a$. We show that

\[ P_a(D \to D') = \sum_{k=0}^{n-1} c_k(D, D')\, a^{-k} \]

where $c_0(D,D')$ is the uniform probability and

\[ c_1(D, D') = \frac{n}{2N} \sum_{u < v} \frac{W(D,u,v)\,Z(D',u,v)}{n_u n_v}. \]

Here $N$ is the number of distinct orderings of the deck, $n_w$ is the number of cards with value $w$, and $W(D,u,v)$ and $Z(D',u,v)$ are simple, easily computable functions on decks: $W(D,u,v)$ is the number of u-v digraphs in $D$ minus the number of v-u digraphs, and $Z(D',u,v)$ is the number of u-v pairs in $D'$ minus the number of v-u pairs. See Section 5.4 for definitions of digraphs and pairs. Thus we have a first-order estimate for the variation distance from uniform of a deck $D$ after an $a$-shuffle:

\[ \|P_a - U\| = \kappa_1(D)\, a^{-1} + O(a^{-2}) \]

where

\[ \kappa_1(D) = \tfrac{1}{2}\, E\,|N c_1(D, D')| \]

and the expectation is taken under the assumption that $D'$ is a random variable uniformly distributed on the orderings of $D$. Similarly, when the target deck $D'$ is fixed, we have

\[ \|P_a - U\| = \kappa_1(D')\, a^{-1} + O(a^{-2}) \]

where

\[ \kappa_1(D') = \tfrac{1}{2}\, E\,|N c_1(D, D')| \]

and $D$ is uniform on $\mathcal{O}(D')$. The coefficients $\kappa_1(D)$ and $\kappa_1(D')$, when they are nonzero, tell the long-term behavior of the variation distance from uniform. So arguably they answer the questions "How hard is the deck $D$ to shuffle?" and "How good is the dealing method $D'$?"

1.3.4 Bridge

Consider the case of bridge (Section 5.7.6). Bridge is a game with four players (North, East, South, and West) who each receive 13 cards from a deck of 52 distinct cards. If $D'$ is some method of dealing bridge, then

\[ \kappa_1(D') = \frac{1}{13}\, E\,\Big|\sum_{u<v} W(D,u,v)\,Z(D',u,v)\Big| \]

where $D$ is uniformly distributed on the orderings of $D'$. We will compare two methods of dealing:

1. $D'_{\mathrm{ord}} = \mathrm{N}^{13}\mathrm{E}^{13}\mathrm{S}^{13}\mathrm{W}^{13}$, that is, dealing the top 13 cards to North, the next 13 to East, etc.

2. $D'_{\mathrm{cyc}} = (\mathrm{NESW})^{13}$, dealing cyclically around the table.

There are 169 ways to find an N above an E in $D'_{\mathrm{ord}}$, and no ways to find an E above an N, so $Z(D'_{\mathrm{ord}}, \mathrm{N}, \mathrm{E}) = 169$. Likewise $Z(D'_{\mathrm{ord}}, u, v) = 169$ for all $u < v$, as long as we give the values the inherent order N < E < S < W. (The order is arbitrary, and only for convenience; any order will give the same value for $\kappa_1$.) On the other hand one can check that there are 91 N-E pairs and 78 E-N pairs in $D'_{\mathrm{cyc}}$, and likewise for the other values, so $Z(D'_{\mathrm{cyc}}, u, v) = 13$ for all $u < v$.
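The pair counts just stated are easy to check by machine. A Python sketch (not from the thesis; it takes a u-v "pair" to mean u at position i and v at position j > i, the reading consistent with the counts 169, 91, and 78 above):

```python
def Z(s, u, v):
    """Number of u..v pairs (u at i, v at j > i) minus v..u pairs in the string s."""
    score = 0
    seen = {u: 0, v: 0}
    for ch in s:
        if ch == v:
            score += seen[u]
        if ch == u:
            score -= seen[v]
        if ch in seen:
            seen[ch] += 1
    return score

def count_pairs(s, u, v):
    """Number of positions i < j with s[i] == u and s[j] == v."""
    total = seen_u = 0
    for ch in s:
        if ch == u:
            seen_u += 1
        elif ch == v:
            total += seen_u
    return total

ordered = "N" * 13 + "E" * 13 + "S" * 13 + "W" * 13
cyclic  = "NESW" * 13

order = "NESW"   # the (arbitrary) inherent order N < E < S < W
pairs = [(u, v) for i, u in enumerate(order) for v in order[i + 1:]]
print([Z(ordered, u, v) for u, v in pairs])   # [169, 169, 169, 169, 169, 169]
print([Z(cyclic, u, v) for u, v in pairs])    # [13, 13, 13, 13, 13, 13]
print(count_pairs(cyclic, "N", "E"), count_pairs(cyclic, "E", "N"))   # 91 78
```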
It follows that

\[ \kappa_1(D'_{\mathrm{cyc}}) = \tfrac{1}{13}\, \kappa_1(D'_{\mathrm{ord}}). \]

One can interpret that as saying that if one shuffles enough, then cyclic dealing works 13 times as well as cutting the deck into hands. Or, to put it another way, switching from cutting the deck into hands to cyclic dealing is as effective in the long run as doing an extra $\log_2 13 \approx 3.7$ GSR shuffles.

Deck               Method          a = 16    32      64      128     256     512     1024
52 Distinct        Exact           1.0000  0.9237  0.6135  0.3341  0.1672  0.0854  0.0429
 123...(52)        44.0571 a^-1    2.7536  1.3768  0.6884  0.3442  0.1721  0.0860  0.0430
Ordered bridge     Monte Carlo     0.9902  0.7477  0.4230  0.2183  0.1104  0.0550  0.0274
 N^13E^13S^13W^13  27.9095 a^-1    1.7443  0.8722  0.4361  0.2180  0.1090  0.0545  0.0273
Cyclic bridge      Monte Carlo     0.2349  0.0735  0.0346  0.0169  0.0084  0.0042  0.0021
 (NESW)^13         2.1469 a^-1     0.1342  0.0671  0.0335  0.0168  0.0084  0.0042  0.0021
Back-Forth bridge  Monte Carlo     0.3118  0.0260  0.0073  0.0022  0.0008  0.0003  0.0002
 (NESWWSEN)^6(NESW)  0.1651 a^-1   0.0103  0.0052  0.0026  0.0013  0.0006  0.0003  0.0002

Table 1.1: Variation distances from uniform after an a-shuffle for a deck of distinct cards and 3 methods of dealing bridge. The first-order approximations are shown for each deck; the coefficients of a^-1, which are called $\kappa_1$ in the text, are rational numbers which will be computed exactly in Chapter V. Also shown are an exact computation for the distinct case, and the results of Monte Carlo simulations for the bridge methods. See Chapter VI for the Monte Carlo parameters.

The natural question at this point is, "Is cyclic dealing the best we can do?" The answer, surprisingly, is no. We show that for the dealing method

\[ D'_{\mathrm{bf}} = (\mathrm{NESWWSEN})^6(\mathrm{NESW}) \]

the value of $Z(D'_{\mathrm{bf}}, u, v)$ is 1 for all $u < v$, and thus dealing back-and-forth around the table is 13 times as effective as cyclic dealing, if one shuffles enough. Or, switching from cyclic dealing to back-and-forth dealing is worth another 3.7 GSR shuffles.
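The first-order coefficient $c_1$ can be checked against exact transition probabilities on a deck small enough to enumerate. A Python sketch (not from the thesis; it takes a u-v "digraph" to mean an adjacent u,v pair and a u-v "pair" to mean any pair of positions i < j, readings consistent with the bridge counts above, and it takes N to be the number of distinct orderings, so that $c_0 = 1/N$):

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def des(p):
    return sum(p[i] > p[i + 1] for i in range(len(p) - 1))

def apply_perm(p, deck):
    out = [None] * len(deck)
    for k, card in enumerate(deck):
        out[p[k]] = card
    return tuple(out)

def transition(D, Dp, a):
    """Exact P_a(D -> D'), summing the Bayer-Diaconis formula over T(D, D')."""
    n = len(D)
    return sum(Fraction(comb(n + a - 1 - des(p), n), a ** n)
               for p in permutations(range(n)) if apply_perm(p, D) == Dp)

def W(D, u, v):
    """u-v digraphs (adjacent pairs) minus v-u digraphs in D."""
    return sum((D[i], D[i + 1]) == (u, v) for i in range(len(D) - 1)) \
         - sum((D[i], D[i + 1]) == (v, u) for i in range(len(D) - 1))

def Z(Dp, u, v):
    """u-v pairs (i < j) minus v-u pairs in D'."""
    n = len(Dp)
    return sum((Dp[i], Dp[j]) == (u, v) for i in range(n) for j in range(i + 1, n)) \
         - sum((Dp[i], Dp[j]) == (v, u) for i in range(n) for j in range(i + 1, n))

def c1(D, Dp):
    n = len(D)
    values = sorted(set(D))
    n_of = {w: D.count(w) for w in values}
    N = factorial(n)
    for w in values:
        N //= factorial(n_of[w])           # N = number of distinct orderings
    total = sum(Fraction(W(D, u, v) * Z(Dp, u, v), n_of[u] * n_of[v])
                for i, u in enumerate(values) for v in values[i + 1:])
    return Fraction(n, 2 * N) * total

D, Dp = ("A", "A", "B"), ("B", "A", "A")
a = 1000
print(c1(D, Dp), float(a * (transition(D, Dp, a) - Fraction(1, 3))))
# -1/2 versus approximately -0.49983
```

For AAB to BAA the formula gives $c_1 = -1/2$, and $a(P_a - c_0)$ indeed approaches $-1/2$ as $a$ grows.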
Table 1.1 contains a summary of the first-order approximations for bridge, along with results from Monte Carlo simulations (see below).

1.3.5 Other Results from First-Order Approximations

In Section 5.6.4 we discuss a general algorithm for computing $\kappa_1(D)$, and in Section 5.7.2 we do the same for $\kappa_1(D')$. Some results are

1. When playing straight poker with 4 players and dealing cyclically, cutting the deck after card 16 will produce the most random set of hands.

2. A deck containing only two types of cards will have $\kappa_1 = 0$, and therefore will shuffle very fast, if the first and last cards are the same (but not if they are different).

3. The hardest-to-shuffle go-fish deck appears to be (A23456789TJQK)^4, and its value of $\kappa_1$ is about half that of a deck of 52 distinct cards. Thus a shuffler needs one more GSR shuffle to mix a deck of 52 distinct cards than to mix a go-fish deck.

1.3.6 Monte Carlo Simulations

It is natural to ask how good the first-order estimate for the variation distance is. We present a crude bound on the error in Section 5.8, and we also attempt to justify the estimates using Monte Carlo simulations in Chapter VI. Included are two very large supercomputer simulations for cyclic and back-and-forth bridge; they confirm that back-and-forth dealing is an improvement if one shuffles at least 5 times. In general we find that the first-order estimate usually becomes good at about the time of the cutoff.

1.3.7 Calculation of Descent Polynomials for Certain Decks

Useful in those Monte Carlo simulations are some results from Chapter III, where we compute descent polynomials for certain special pairs of decks. The intent of the chapter is to find the broadest possible classifications which will admit fast computation, knowing as we do that the general problem is #P-complete.
The most inclusive methods presented calculate transitions when one of the two decks is sorted, and, a bit more slowly, when the target deck contains large blocks of cards of the same value. Thus we can always compute the transition probability to or from at least one ordering of any deck.

1.3.8 The Joint Distribution of des(π) and π(1)

The calculations in Chapter III lead to a general question about the joint distribution of des(π) and π(1) as π ranges over all permutations of n letters. That distribution is explored in Chapter IV, where we find a formula for the number of permutations which begin with k and have d descents. That is used to show that when π is chosen uniformly from among those permutations with d descents, the expected value of π(1) is d + 1. Chapter IV will appear in publication as [9].

CHAPTER II

Preliminaries

2.1 Permutations and Decks

A permutation of n letters is a bijection $\pi : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$. One way to display a permutation is with two-line notation, for instance

\[ \pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 1 & 5 & 3 & 6 & 2 \end{pmatrix} \]

which means π(1) = 4, π(2) = 1, and so on. $S_n$ will denote the symmetric group of all permutations of n letters. If $\pi, \sigma \in S_n$ then $\pi\sigma$ is the composition $\pi \circ \sigma$. That is, $\pi\sigma(k) = \pi(\sigma(k))$, for all k.

A deck D of size n is a sequence of n cards. Each card in a deck has a position and a value. The positions are distinct integers between 1 and n. The values can be anything, and two different cards may have the same value. We may think of a deck as simply a function D from $\{1, 2, \ldots, n\}$ into some set of values, where the card in position k has value D(k). When we write "D = A, C, A, B, A, B" or simply "D = ACABAB" we mean that D is a deck of 6 cards, and that the card in position 1 has value A, the card in position 2 has value C, and so on. The card in position 1 is the top card and the card in position n is the bottom card.

A permutation π can be applied to a deck D of the same size to produce a new deck πD.
π takes the card at position k and puts it in position π(k), for all k. So if D' = πD, then D'(π(k)) = D(k) for all k. If we think of a deck as a function, that means $D' \circ \pi = D$, or

(2.1) \[ \pi D = D \circ \pi^{-1}. \]

We will sometimes associate a permutation with the deck πe, where e is the ordered deck 1, 2, 3, ..., n. (That is, e(k) = k for all k.) From Equation (2.1),

(2.2) \[ \pi e = \pi^{-1}(1), \pi^{-1}(2), \ldots, \pi^{-1}(n). \]

We will also sometimes associate π with the sequence π(1), π(2), ..., π(n). Which sequence is intended will be clear from context.

2.2 Repeated Values

Let D be a deck of n cards, and let D' be a reordering of D. If all the cards in D have distinct values, then there is exactly one permutation in $S_n$ which, when applied to D, produces D'. For instance, if D = C, A, B, D and D' = D, C, B, A then a permutation π which takes D to D' must send the C in position 1 of D to the C in position 2 of D', so π(1) must be 2. Likewise π(2) must be 4 so that the A card moves correctly, and π(3) = 3, π(4) = 1 to accommodate the B and D cards. So π in two-line notation must be

\[ \pi = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 3 & 1 \end{pmatrix}. \]

We can represent the decks and the permutation together with the following arrow diagram.

[Arrow diagram: C A B D, with arrows down to D C B A.]

[Figure 2.1: The four permutations which take BABA to AABB.]

Now suppose D and D' contain multiple cards with the same values. For instance, we might have D = B, A, B, A and D' = A, A, B, B. Now if π takes D to D', what do we know about π(1)? Not as much as before. π(1) is the position that the B at the top of D will go to after the deck is permuted, so it must be the position of a B in D'. That is, $\pi(1) \in \{3, 4\}$. Since the third card in D is also a B, we must have $\pi(3) \in \{3, 4\}$ too, and similarly the A's must go to A's, which means $\pi(2), \pi(4) \in \{1, 2\}$. But that's all we can say. Any permutation which satisfies those criteria will take D to D'. There are 4 such, shown in Figure 2.1.
In general, let T(D, D') be the set of permutations which take D to D'. In this situation we refer to D as the source deck and D' as the target deck. Suppose the values of cards in the decks are $v_1, v_2, \ldots, v_k$ and each deck contains $n_i$ cards with value $v_i$, for $i = 1, 2, \ldots, k$. Then there are $n_i!$ ways to draw arrows between the $v_i$'s in D and the $v_i$'s in D'. So the total number of permutations which take D to D', i.e. the size of T(D, D'), is

\[ n_1!\, n_2! \cdots n_k!. \]

For any deck D, T(D, D) is the stabilizer of D. T(D, D') is a coset of both T(D, D) and T(D', D'): if π is any element of T(D, D'), then

(2.3) \[ T(D, D') = \pi\, T(D, D) = T(D', D')\, \pi. \]

Unfortunately, stabilizers are not in general normal subgroups of the symmetric group, and few results from group theory seem to be helpful in answering our basic questions.

2.3 Mixing Problems

Imagine being the groundskeeper of a stadium. A truck delivers a large pile of dirt to one corner of the field, and your job is to spread the dirt over the field so that its depth is the same everywhere. You have at your disposal a mechanical rake, capable of raking the whole field at once and moving dirt around in some fashion. You would like to decide how many times you need to rake the field, so that it will be close to uniformly covered when you are done. To make mathematical sense of this problem, one needs three things: a model for how the rake moves dirt, a measure for how close the dirt is to "flat" at any given time, and an idea of "how flat is flat enough"; that is, a critical value of flatness which we declare to be our target.

The groundskeeper's problem is a metaphor for mixing-time problems. The field is some event space, and the dirt is probability, so the position of the dirt on the field represents a probability distribution. The rake is some randomization procedure, which spreads probability around.
If it is a good mixing method, it will eventually bring the distribution close to the uniform distribution (i.e., all events will be approximately equally likely). In our case the event space will be the set of orderings of a deck of cards, and the randomization procedure will be riffle shuffling, defined below. We will establish our measure of "flatness" in Sections 2.9 and 2.10. We leave it up to the reader to choose the critical value.

2.4 Riffle Shuffling

Riffle shuffling may be described by the following procedure, sometimes called the Gilbert-Shannon-Reeds or GSR model. Take a deck of cards and partition it into two packets by placing a "cut" somewhere in the deck. The position of the cut should be chosen according to the binomial distribution; that is, the probability that the cut is placed after the kth card of an n-card deck is $\frac{1}{2^n}\binom{n}{k}$. (Note this allows for the cut to be placed before the top card or after the bottom card, but since the binomial distribution is bell-shaped, the cut is very likely to be near the middle of the deck.)

After making a cut, the shuffler takes those cards above the cut, still in their original order, into his left hand, and the rest into his right hand. He then repeats the following procedure until all cards have been dropped: if the left hand contains A cards and the right hand contains B cards, he drops a card from the bottom of the left packet with probability $\frac{A}{A+B}$ and from the bottom of the right packet with probability $\frac{B}{A+B}$. (The idea being, a shuffler is more likely to drop from a large packet than from a small one.) Each card dropped falls on top of those already dropped, and the result is a new deck, which is a reordering of the original. Table 2.1 shows a typical shuffle of the deck ABCDEFGH.

Any shuffle of n cards can be represented uniquely by a sequence of n 0's and 1's.
The number of 0's is the card after which to make the cut (equal to the number of cards in the shuffler's left hand when shuffling begins), and each digit tells whether to drop a card from the left or from the right (0 for left, 1 for right). For instance, the shuffle in Table 2.1 can be represented by 10011101, as shown in Figure 2.2.

Packets          On Table     Action                  Probability
ABC | DEFGH                   Cut after third card    (1/2^8) C(8,3)
ABC   DEFG            H       Drop from right         5/8
AB    DEFG           CH       Drop from left          3/7
A     DEFG          BCH       Drop from left          2/6
A     DEF          GBCH       Drop from right         4/5
A     DE          FGBCH       Drop from right         3/4
A     D          EFGBCH       Drop from right         2/3
      D         AEFGBCH       Drop from left          1/2
               DAEFGBCH       Drop from right         1/1

Table 2.1: The steps in a typical shuffle of the deck ABCDEFGH. It results in the reordering DAEFGBCH.

[Figure 2.2: The correspondence between the shuffle of Table 2.1 and the sequence 10011101. The number of 0's is the position of the cut, and each digit indicates whether a card from the left (0) or right (1) hand should be dropped at each stage; read bottom-to-top, the digits give a recipe for executing the shuffle.]

Clearly, then, there are $2^n$ ways to riffle shuffle a deck of n cards. The probability of getting the shuffle in our example can be obtained by multiplying together the probabilities shown in Table 2.1 for each step, since the choices involved are independent:

\[ P(10011101) = \frac{1}{2^8}\binom{8}{3}\cdot\frac{5}{8}\cdot\frac{3}{7}\cdot\frac{2}{6}\cdot\frac{4}{5}\cdot\frac{3}{4}\cdot\frac{2}{3}\cdot\frac{1}{2}\cdot\frac{1}{1} = \frac{1}{2^8}. \]

In fact, this was not an accident. The denominators of the "drop probabilities" are the total number of cards remaining to be dropped, so they begin with 8 and decrease by 1 at each step. The numerators of the "left-drop" probabilities are the number of cards remaining in the left hand, so they begin with 3 and decrease by 1 each time a card is dropped from the left. The other numerators correspond to "right-drops", so they begin with 5 and go down.
So the product of all the drop probabilities is

\[ \frac{(3\cdot 2\cdot 1)(5\cdot 4\cdot 3\cdot 2\cdot 1)}{8\cdot 7\cdot 6\cdot 5\cdot 4\cdot 3\cdot 2\cdot 1} = \frac{3!\,5!}{8!} \]

which means the probability of the shuffle is

\[ \frac{1}{2^8}\binom{8}{3}\,\frac{3!\,5!}{8!} = \frac{1}{2^8}. \]

It is straightforward to see that under the GSR model, the probability will be $2^{-n}$ for any shuffle of n cards. This is the chief mathematical virtue of the GSR model, and the reason it is called the "maximum entropy" model.
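The cancellation above holds for every shuffle word, which can be checked exhaustively. A Python sketch (not from the thesis) that multiplies the binomial cut probability by the GSR drop probabilities:

```python
from fractions import Fraction
from itertools import product
from math import comb

def shuffle_word_probability(word):
    """Cut probability times the product of GSR drop probabilities
    for a binary shuffle word (0 = drop from left, 1 = drop from right)."""
    n = len(word)
    left = word.count(0)                  # cards cut into the left hand
    right = n - left
    p = Fraction(comb(n, left), 2 ** n)   # binomial choice of cut
    l, r = left, right
    for digit in reversed(word):          # drops happen bottom-to-top
        if digit == 0:
            p *= Fraction(l, l + r)
            l -= 1
        else:
            p *= Fraction(r, l + r)
            r -= 1
    return p

n = 8
assert all(shuffle_word_probability(w) == Fraction(1, 2 ** n)
           for w in product((0, 1), repeat=n))
print("every 8-card shuffle word has probability 1/2^8")
```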
2.6 Inverse Shuffles and Repeated Shuffles The inverse of an a-shuffle is easier to describe than the a-shuffle itself. To perform an inverse a-shuffle on a deck of cards, simply write a number from 0 to a 1 on − the back of each card, then perform a stable sort. That is, move the 0’s to the top of the pack, preserving their relative order, then move the 1’s below them, again preserving relative order, and so on. Figure 2.3 shows a 3-shuffle and its inverse. Stable-sorting of computer punch cards on a single decimal digit could be—and was—done mechanically. Sorting machines contained 10 bins, numbered 0 through 9, and would iterate through a deck of cards and place those with a 0 in a particular column in the 0-bin, those numbered 1 in the 1-bin, etc. Thus they could perform any “10-sort”, that is, the inverse of any 10-shuffle. 24 0A D1 0B A0 0C G2 1D E1 1E F1 1F B0 2G H2 2H C0 Figure 2.3: From left to right, a 3-shuffle of ABCDEFGH; from right to left, an inverse 3-shuffle of DAGEFBHC. Inverse shuffles are stable sorts. The reason a 10-sorter was a useful machine was that one could sort a stack of cards according to a multi-digit field by successive sorts on digits, least significant first. The process was known as “radix sort”[27]. For example, consider a set of cards containing two fields, name and age. Bob 47 Carol 39 Cathy 50 Jenny 27 Joe 43 Steve 41 Twila 37 Zoe 31 Suppose the cards are currently ordered by name, and we would like to sort them by age. What we should do is stable-sort the cards first by the right (ones) digit, then by the left (tens) digit. Figure 2.4 shows the process. Since each sort preserves the previous order when it has no cause to make a change, the net result is to put the stack in lexicographic order according to the whole field. Clearly the radix of the digits of the field is not important, and in fact each digit might have a different radix. 
Now suppose we have an ab-shuffle z, which corresponds to a sequence (z1, z2,...,zn) of numbers between 0 and ab 1. We can write each z as a two-digit mixed-radix − i 25 Jenny 27 Cathy 50 Bob 47 Zoe 31 Steve 41 Carol 39 Twila 37 Zoe 31 Cathy 50 Carol 39 Joe 43 Jenny 27 Steve 41 Bob 47 Joe 43 Joe 43 Jenny 27 Steve 41 Bob 47 Twila 37 Twila 37 Cathy 50 Carol 39 Zoe 31 Figure2.4: A radix sort of a set of 8 punch cards. The cards are initially ordered by the alphanu- meric “name” field, and can be sorted by the two digit “age” field by the two-stage process shown. The first stage sorts by the ones digit of age, and the second by the tens digit. Since each sort is stable, the effect is to sort by the whole field. number (x ,y ), where 0 x < a and 0 y < b, by setting i i ≤ i ≤ i x = z /b i b i c y = z bx . i i − i 1 Then from the radix sort we know that z− (an ab-sort) is the composition of a b-sort and an a-sort, which means that z is the composition of an a-shuffle and a b-shuffle. So there is a surjection φ (2.4) (n, a) (n, b) (n, ab) H × H → H such that an a-shuffle x followed by a b-shuffle y is equivalent to the ab-shuffle φ(x, y). Since the domain and range both have size (ab)n, φ is a bijection. Thus choosing x and y independently and uniformly has the same effect as choosing φ(x, y) uniformly, which is to say, executing a random a-shuffle followed by a random b-shuffle has the same effect as executing a random ab-shuffle. In particular, that means m 2-shuffles are equivalent to one 2m-shuffle. So we need only understand a-shuffles to understand the effect of repeated shuffles. 26 The idea of thinking of inverse shuffles as sorts appears in [1] and [2], in which Aldous and Diaconis credit the idea to Jim Reeds. Those works also include the notion of stringing together 2-shuffles to get a 2n shuffle. The idea of generalizing 2-shuffles to a-shuffles, and the fact that an a-shuffle followed by a b-shuffle is an ab-shuffle, appears in [3]. 
2.7 Counting Shuffles Which Produce a Permutation

Let π be an element of $S_n$. We would like to enumerate the a-shuffles which resolve to π, so that we can calculate the probability of obtaining π by a-shuffling the deck. Once a deck D of distinct cards has been partitioned, there is at most one way to riffle the cards together to get πD, because the order of the cards in πD dictates which card must be dropped at each step. So we need only count the number of ways to partition D in such a way that it is possible to obtain πD by riffling.

A partition of a deck of n cards can be described either as a set of non-negative packet sizes $s_0, s_1, \ldots, s_{a-1}$ which sum to n, or as a list of cut positions $c_1, c_2, \ldots, c_{a-1}$, where $c_k = \sum_{i<k} s_i$. (Note that more than one cut may be placed between a pair of cards, and we can also place cuts before the top card or after the bottom card.)

A rising sequence ([22],[3]) of a permutation π is a maximal sequence of consecutive numbers i+1, i+2, ..., i+k that appears as a subsequence of πe. So for instance if

\[ \pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 4 & 5 & 3 & 7 & 8 & 1 & 6 \end{pmatrix} \]

then πe = 71423856. 1, 2, 3 is a rising sequence of π, because the numbers 1, 2, and 3 appear in order in πe (1, 2, 3 is a subsequence) but 1, 2, 3, and 4 do not (so 1, 2, 3 is maximal). Likewise 4, 5, 6 and 7, 8 are rising sequences. So π, in this example, has 3 rising sequences. In [3] Bayer and Diaconis refer to the number of rising sequences of a permutation as its winding number.

Imagine a cul-de-sac in a new housing development, with n houses arranged in a circle. Suppose it is the developer's job to deliver doormats to each of the houses, and that each house's mat is personalized; perhaps the name of the family in the house is embroidered on the mat. The mats are stacked in some order in the back of the developer's truck, and he plans to deliver them by driving around the circle until he gets to the house whose mat is on top of the stack, throwing the mat on its fated doorstep, then repeating the process until all the mats are gone.
In this story, the winding number of the initial order of the mats is the number of trips the developer makes around the circle.

If k appears in a certain rising sequence of π, then k+1 will be in the same rising sequence if and only if k+1 appears after k in πe. The positions of k and k+1 are π(k) and π(k+1), respectively, so k+1 begins a new rising sequence if and only if π(k) > π(k+1). In that case we say π has a descent at k. Since descents correspond to breaks between rising sequences,

(2.5) \[ \#\{\text{rising sequences of } \pi\} = \#\{\text{descents of } \pi\} + 1. \]

We will denote the number of descents of π by des(π). If π(k) < π(k+1) we say that π has an ascent at k, and we will denote the number of ascents of π by asc(π).

Lemma 2.1. If P is a partition of a deck of size n, and $\pi \in S_n$, then P can be riffled to get π if and only if P has a cut wherever π has a descent.

Proof. P can be riffled to get π if and only if every packet of P appears in order in πe. That means each packet of P is contained in a rising sequence of π, which is to say, no packet crosses a boundary between rising sequences. Since there are descents between the rising sequences and cuts between the packets, there must be a cut at every descent.

And now we can count the number of shuffles that produce a particular permutation.

Corollary 2.2. If $\pi \in S_n$ has d descents, then there are $\binom{n+a-1-d}{n}$ a-shuffles which resolve to π.

Proof. We need to count the number of partitions that contain all d descents of π among their a−1 cuts. That is the same as counting the number of ways to arrange a−1 identical balls in n+1 boxes, subject to the restriction that a certain d boxes must contain at least one ball. One way to construct each such arrangement is to place a−1−d balls in the boxes, then place the remaining d balls in the required locations. Placing k balls in m boxes is equivalent to writing a sequence of k "stars" and m−1 "bars", and there are $\binom{m-1+k}{m-1}$ ways to do that.
Setting m = n+1 and k = a−1−d completes the proof.

Finally, then, since each shuffle occurs with probability $a^{-n}$,

(2.6) \[ P_a(\pi) := \frac{1}{a^n}\binom{n+a-1-\operatorname{des}(\pi)}{n} \]

is the probability of obtaining the permutation π as the result of an a-shuffle. Equation (2.6) appeared first in [3].

2.8 Descent Polynomials and Shuffle Series

For $\pi \in S_n$, define the ordinary generating function

(2.7) \[ \mathcal{S}_\pi(x) := \sum_{a \ge 1} \#\{\text{$a$-shuffles which produce } \pi\}\, x^{a-1} = \sum_{a \ge 1} \binom{n+a-1-\operatorname{des}(\pi)}{n}\, x^{a-1}. \]

Substituting k = a−1−des(π),

(2.8) \[ \mathcal{S}_\pi(x) = x^{\operatorname{des}(\pi)} \sum_{k \ge 0} \binom{n+k}{n}\, x^k. \]

The coefficient in the sum is the number of ways to put k identical balls into n+1 distinct boxes, thus

(2.9) \[ \mathcal{S}_\pi(x) = x^{\operatorname{des}(\pi)}\left(1 + x + x^2 + \cdots\right)^{n+1} = \frac{x^{\operatorname{des}(\pi)}}{(1-x)^{n+1}}. \]

Now let $R \subset S_n$ be some set of permutations, and define

(2.10) \[ \mathcal{S}_R(x) := \sum_{a \ge 1} \#\{\text{$a$-shuffles which produce a permutation in } R\}\, x^{a-1} \]

(2.11) \[ \mathcal{D}_R(x) := \sum_{\pi \in R} x^{\operatorname{des}(\pi)} = \sum_d \#\{\pi \in R \text{ with } d \text{ descents}\}\, x^d \]

to be the shuffle series and descent polynomial of R, respectively. Then

(2.12) \[ \mathcal{S}_R(x) = \sum_{\pi \in R} \mathcal{S}_\pi(x) = \sum_{\pi \in R} \frac{x^{\operatorname{des}(\pi)}}{(1-x)^{n+1}} = \frac{\mathcal{D}_R(x)}{(1-x)^{n+1}}. \]

If

\[ \mathcal{D}_R(x) = b_0 + b_1 x + \cdots + b_{n-1} x^{n-1} \]
\[ \mathcal{S}_R(x) = c_1 + c_2 x + c_3 x^2 + \cdots \]

then the probability of obtaining a permutation in R after an a-shuffle is

(2.13) \[ P_a(R) = \frac{c_a}{a^n} = \frac{1}{a^n} \sum_{d \le a-1} b_d \binom{n+a-1-d}{n} \]

which is a finite sum, calculable in polynomial time if we know the $b_d$'s. Note we can go the other way as well; since $\mathcal{D}_R(x) = \mathcal{S}_R(x)(1-x)^{n+1}$,

(2.14) \[ b_d = \sum_{a \ge 1} c_a \cdot \left(\text{coefficient of } x^{d-(a-1)} \text{ in } (1-x)^{n+1}\right) = \sum_{a \ge 1} (-1)^{d-a+1} \binom{n+1}{d-a+1} c_a \]

which is also a finite sum, calculable in polynomial time if we know the $c_a$'s. So calculating the probabilities of a set of permutations after an a-shuffle, for all a, is computationally equivalent to calculating the descent polynomial of the set.
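Corollary 2.2 can be confirmed by listing every a-shuffle of a small deck and counting how many resolve to each permutation. A Python sketch (not from the thesis; it reuses the digit-sequence a-shuffle from Section 2.5):

```python
from collections import Counter
from itertools import product
from math import comb

def a_shuffle(deck, digits, a):
    """Digit-sequence a-shuffle (digits read as bottom-to-top drops, as in Figure 2.2)."""
    counts = [digits.count(i) for i in range(a)]
    packets, idx = [], 0
    for c in counts:
        packets.append(list(deck[idx:idx + c]))
        idx += c
    dropped = [packets[d].pop() for d in digits]
    return tuple(reversed(dropped))

def descents_taking(source, result):
    """des(pi) for the unique pi with pi(source) = result (distinct cards):
    pi(k) is the position in `result` of the k-th card of `source`."""
    pi = [result.index(c) for c in source]
    return sum(pi[k] > pi[k + 1] for k in range(len(pi) - 1))

n, a = 4, 3
deck = tuple("ABCD")
counts = Counter(a_shuffle(deck, list(w), a) for w in product(range(a), repeat=n))
assert all(count == comb(n + a - 1 - descents_taking(deck, result), n)
           for result, count in counts.items())
print(len(counts), "reachable orderings out of 24")   # 23: reversal needs a >= 4
```

All $3^4 = 81$ shuffles are accounted for, each permutation with d descents is produced by exactly $\binom{n+a-1-d}{n}$ of them, and the one permutation with 3 descents (complete reversal) is unreachable, since a < n.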
In particular, we will focus on finding

(2.15)  $\mathcal{D}(D, D'; x) := \mathcal{D}_{T(D,D')}(x)$

(2.16)  $\mathcal{S}(D, D'; x) := \mathcal{S}_{T(D,D')}(x)$

as a means to understand the transition from deck $D$ to deck $D'$ after a shuffle of any size.

2.9 Distance from Uniform

Once we understand how shuffling acts on a deck, we still need to decide how much shuffling is "enough". To do so, we need to define what "enough" means. Ultimately, the definition should be rooted in what we plan to do with the deck once it is shuffled, i.e., what game we will be playing.

Suppose we are playing a simple game, with some number of players, that involves no choices or strategy, so that the entire course of the game is determined by the initial order of the cards. War, straight poker (if we ignore betting), and many forms of solitaire are examples of such games. From the perspective of a particular player, some decks are "winners" and the rest are "losers".

Let $W$ be the set of winners for some player $A$. If $U$ represents the uniform distribution, then $U(W)$ is the probability that player $A$ wins, given that the deck is perfectly mixed. If, on the other hand, $P$ represents the distribution of decks after some shuffle, then one measure of the quality of the shuffle is the advantage or disadvantage it gives to player $A$; that is, the quantity $|P(W) - U(W)|$.

For example, consider a simple game called "Ace-King", between two players whom we will call $B$ (for "black") and $R$ (for "red"). A regular deck is dealt face-up, one card at a time. $B$ wins if the ace of spades comes up before the king of hearts, and loses otherwise. Clearly this is a fair game if the deck is perfectly mixed; that is, if $W$ is the set of decks for which $B$ wins, then $U(W) = \frac{1}{2}$. But suppose instead that the ace of spades is initially on top of the deck, and the king of hearts on the bottom, then the deck is given an $a$-shuffle, after which we play the game.
One might guess, correctly, that $B$ has an advantage if the deck is not well shuffled, since the ace will tend to stay near the top of the deck and the king near the bottom. Using the methods of Chapter III and Chapter IV, the probability that $B$ wins is calculated in Appendix D, and the result is

$P_a(W) = (n+1) \sum_{k=1}^{a} \left(\frac{k}{a}\right)^{n} - n \sum_{k=1}^{a} \left(\frac{k}{a}\right)^{n-1}$

where $n = 52$ is the size of the deck. $B$'s advantage is $P_a(W) - U(W) = P_a(W) - \frac{1}{2}$; it is graphed in Figure 2.5 alongside the result of the next example.

Peter Doyle, as reported in [33] and [32], proposed the following extension of the Ace-King game (somewhat simplified here). Suppose an ordinary deck is initially ordered

$A\spadesuit, 2\spadesuit, \ldots, K\spadesuit,\; A\clubsuit, 2\clubsuit, \ldots, K\clubsuit,\; A\diamondsuit, 2\diamondsuit, \ldots, K\diamondsuit,\; A\heartsuit, 2\heartsuit, \ldots, K\heartsuit$

and is given an $a$-shuffle. We go through the deck from top to bottom. When the ace of spades is found, place it in a pile in front of player $B$. If, subsequently, the 2 of spades is found, place it on top of the ace. If the 3 of spades appears after the two, place it on top, and so on. So $B$'s stack always matches an initial segment of the unshuffled deck.

Simultaneously construct a stack of red cards ($\diamondsuit$'s and $\heartsuit$'s) in front of player $R$, in the same manner, except that the red stack must be in reverse order; the first card down must be the king of hearts, then the queen of hearts, etc. If a card that belongs in neither pile comes up, place it on the bottom of the deck, so that it may come up again later. $B$ wins if the last black card (the king of clubs) is extracted before the last red card (the ace of diamonds).

Rising sequences among the black cards work in $B$'s favor, and descents against him; the opposite is true for $R$ and the red cards. So we expect $B$ to have a large advantage for small values of $a$. A computer simulation was run, with one million trials for each value of $a$ from 1 to 1024. The results are shown in Figure 2.5.
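For small decks the Appendix D closed form can be compared with an exact enumeration of shuffles. In the sketch below (Python, exact rational arithmetic; the function names are ours), an $a$-shuffle is generated from a uniform packet word, in which position $t$ of the shuffled deck receives the next card of packet $w_t$:

```python
from fractions import Fraction
from itertools import product

def ace_king_win_probability(n, a):
    """Exact P(card 1 ends above card n) after an a-shuffle of 1..n, by
    enumerating the a^n equally likely packet words."""
    wins = 0
    for w in product(range(a), repeat=n):
        counts = [w.count(j) for j in range(a)]            # packet sizes (the cut)
        next_card = [sum(counts[:j]) + 1 for j in range(a)]
        shuffled = []
        for letter in w:                                   # riffle the packets
            shuffled.append(next_card[letter])
            next_card[letter] += 1
        wins += shuffled.index(1) < shuffled.index(n)
    return Fraction(wins, a ** n)

def ace_king_formula(n, a):
    """The closed form quoted from Appendix D."""
    return ((n + 1) * sum(Fraction(k, a) ** n for k in range(1, a + 1))
            - n * sum(Fraction(k, a) ** (n - 1) for k in range(1, a + 1)))

for n in (2, 3, 4, 5):
    for a in (1, 2, 3, 4):
        assert ace_king_win_probability(n, a) == ace_king_formula(n, a)
```

At $a = 1$ both sides give 1, as they must, since a 1-shuffle leaves the ace on top.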
In order to decide how many times to shuffle before playing either of the games discussed in this section, we should decide what an acceptable advantage is for player $B$, then shuffle enough to bring the advantage down to that level. Whatever level we pick, more shuffling is required for Doyle's game than for the Ace-King game.

Figure 2.5: The advantages for player $B$ in the Ace-King Game (in black) and in Doyle's game (in red). We assume a 52-card deck that is initially ordered and then given an $a$-shuffle. "Advantage" means $P_a(B \text{ wins}) - \frac{1}{2}$. $a$ is graphed on a logarithmic scale.

2.10 Variation Distance

The reason we got different answers for the Ace-King game and for Doyle's game is that the set $W$ of winning decks is different for the two games, and $P_a(W_{\text{Ace-King}})$ approaches $U(W_{\text{Ace-King}})$ faster than $P_a(W_{\text{Doyle}})$ approaches $U(W_{\text{Doyle}})$. It is natural to ask, then, what set $W$ approaches its uniform probability the slowest; in other words, what game is the hardest to shuffle for.

Note: we defined $P_a(\pi)$ in Equation (2.6) to be the probability of obtaining the permutation $\pi$ as the result of an $a$-shuffle. Here we extend that notation: let $P_a(D \to D')$ be the probability that an $a$-shuffle of the deck $D$ results in the deck $D'$. That is,

$P_a(D \to D') := \sum_{\pi \in T(D,D')} P_a(\pi).$

When the source deck $D$ has been fixed and $W$ is some set of decks, $P_a(D \to W)$ refers to the probability that an $a$-shuffle of $D$ results in one of the decks in $W$:

$P_a(D \to W) := \sum_{D' \in W} P_a(D \to D').$

We abbreviated that as "$P_a(W)$" in the previous paragraph. However, at other times we will fix the target deck $D'$, and for that case we define

$P_a(W \to D') := \sum_{D \in W} P_a(D \to D').$
For now, suppose we fix a source deck $D$ and an $\epsilon > 0$ and make our goal "to shuffle enough so that the advantage or disadvantage of any one player in any simple game is less than $\epsilon$." So we want to make $a$ large enough that the quantity

(2.17)  $\max_W \left| P_a(D \to W) - U(W) \right|$

is less than $\epsilon$, where $W$ is allowed to range over all sets of orderings of the deck.

The quantity in Equation (2.17) is known as the variation distance between $P_a$ and $U$, and will be denoted $\|P_a - U\|$. Variation distance is sometimes called "total variation", "total variation distance", or "half the $L^1$ norm". Note that variation distance depends implicitly on the composition of the deck we are shuffling and its initial order, facts which are hidden by the notation $\|P_a - U\|$.

Fix $a$, and let $f(D') := P_a(D \to D') - U(D')$. Call a deck "favored" if it is more likely after an $a$-shuffle than it would be under the uniform distribution; that is, $D'$ is favored if $f(D') > 0$. $D'$ is "disfavored" if $f(D') < 0$. Then if $W$ is some set of decks, $f(W) = \sum_{D' \in W} f(D')$ can be increased either by appending favored decks to $W$ or by deleting disfavored decks. Likewise $f(W)$ can be decreased by appending disfavored decks or deleting favored decks. "Neutral" decks, which have $f(D') = 0$, do not affect $f(W)$ one way or the other. It follows that $|f(W)|$ is maximized either when $W$ is the set $W^+$ of all favored decks or when $W$ is the set $W^-$ of all disfavored decks. Those two sets produce the same distance, because

$1 = \sum_{D'} P_a(D \to D') = \sum_{D'} U(D')$

and so

$0 = \sum_{D'} f(D') = \sum_{D' \in W^+} f(D') + \sum_{D' \in W^-} f(D') = f(W^+) + f(W^-),$

hence $f(W^+) = -f(W^-) = |f(W^-)|$. Therefore the variation distance after an $a$-shuffle is $f(W^+)$. But we do not need to find $W^+$ to compute the variation distance, because

$\sum_{D'} |f(D')| = \sum_{D' \in W^+} f(D') + \sum_{D' \in W^-} |f(D')| = f(W^+) + |f(W^-)| = 2 f(W^+)$

and therefore when the source deck $D$ is fixed,

(2.18)  $\|P_a - U\| = \frac{1}{2} \sum_{D'} \left| P_a(D \to D') - U(D') \right|.$
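Equation (2.18) can be evaluated exactly when all $n$ cards are distinct, since $P_a(D \to D')$ then depends only on the descents of a single permutation. The sketch below (Python; Eulerian numbers computed by the standard recurrence) reproduces the headline fact about Figure 2.6: for $n = 52$, the distance first falls below $\frac{1}{2}$ between $a = 64$ and $a = 128$.

```python
from fractions import Fraction
from math import comb, factorial

def eulerian(n):
    """Row n of the Eulerian numbers via the standard recurrence
    A(n, d) = (d+1) A(n-1, d) + (n-d) A(n-1, d-1)."""
    row = [1]
    for m in range(2, n + 1):
        row = [(d + 1) * (row[d] if d < len(row) else 0)
               + (m - d) * (row[d - 1] if d >= 1 else 0)
               for d in range(m)]
    return row

def variation_distance(n, a):
    """Equation (2.18) for a deck of n distinct cards after an a-shuffle."""
    return Fraction(1, 2) * sum(
        Ed * abs(Fraction(comb(n + a - 1 - d, n), a ** n)
                 - Fraction(1, factorial(n)))
        for d, Ed in enumerate(eulerian(n)))

assert sum(eulerian(52)) == factorial(52)       # row sums to 52!
# the distance drops below 1/2 between a = 64 and a = 128,
# i.e. between 6 and 7 ordinary 2-shuffles
assert variation_distance(52, 64) > Fraction(1, 2) > variation_distance(52, 128)
```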
We will most often use Equation (2.18) as our starting point for calculating variation distance when the source deck is fixed.

We can now calculate the variation distance from uniform of a deck of $n$ distinct cards which are given an $a$-shuffle, a result which appeared first in [3]. If $D$ is any starting order, the transformation set $T(D,D')$ contains only one permutation for each $D'$. So $P_a(D \to D') = P_a(\pi)$ where $\pi$ is the unique permutation which takes $D$ to $D'$, and Equation (2.6) tells us that $P_a(\pi)$ depends only on the number of descents in $\pi$. Therefore

$\|P_a - U\| = \frac{1}{2} \sum_{D'} \left| P_a(D \to D') - U(D') \right| = \frac{1}{2} \sum_{\pi \in S_n} \left| P_a(\pi) - \frac{1}{n!} \right| = \frac{1}{2} \sum_d \left\langle {n \atop d} \right\rangle \left| \binom{n+a-1-d}{n} \frac{1}{a^n} - \frac{1}{n!} \right|$

where $\left\langle {n \atop d} \right\rangle$ is the number of permutations in $S_n$ which have $d$ descents. Chapter IV will give a well-known recurrence relation which makes $\left\langle {n \atop d} \right\rangle$ easy to calculate for small $n$, so we can compute $\|P_a - U\|$ efficiently. Figure 2.6 shows the variation distance for $n = 52$ and $a$ between 1 and 1024.

Figure 2.6: The variation distance from uniform of a distinct 52-card deck after an $a$-shuffle. The distance falls below $\frac{1}{2}$ somewhere between $a = 64$ and $a = 128$, that is, between 6 and 7 2-shuffles.

All of the variation distances we have studied have the same basic "waterfall" shape seen in Figure 2.5 and Figure 2.6.

2.11 Dealing Cards is Equivalent to Fixing the Target Deck

Suppose now that all the cards in the deck are distinct, but we are playing a game with "hands". That is, the deck is shuffled and then dealt to several players in some manner. Each player receives some set of cards from the deck, and we assume that the order in which a player receives her cards is irrelevant to how the game will play out. Bridge and straight poker are examples of such games. If there are cards that do not get dealt (as in the case of straight poker), we may group them together and call them another hand.
(If, however, undealt cards may be dealt later, as in the case of draw poker, each undealt card must be a hand by itself.) So without loss of generality assume all cards are dealt. We will refer to a particular partitioning of the cards into hands as a deal.

Our goal is still the same: to shuffle enough that the advantage or disadvantage to any one player is small. But now certain orderings of the deck at the time of dealing are equivalent, because they result in the same set of hands.

For example, in bridge the players are referred to as North, East, South, and West, and the dealer usually deals cards to them in order, cyclically. That is, the dealing process is

(2.19)  $D' := \text{NESWNESWNESWNESWNESWNESWNESWNESWNESWNESWNESWNESWNESW}$

where 'N' means "deal a card to North", 'E' means "deal a card to East", etc. In other words, $D'$ is a function from $\{1, 2, \ldots, 52\}$ to $\{N, E, S, W\}$ which instructs the dealer where he should deal the $i$th card. If, after shuffling but before dealing, the dealer were to swap the first and fifth cards in the deck, the subsequent game would be unaffected, because those cards become part of the same hand, and the order in which they arrive is immaterial.

Assume that the cards are initially in some known order $e = e(1), e(2), \ldots, e(52)$, and the dealer gives them an $a$-shuffle before dealing. We may describe a deal as a sequence of 13 N's, 13 E's, 13 S's, and 13 W's in some order; for instance, the sequence

(2.20)  $D := \text{NSEENNWEWSSWESWNNNEESSSSSESWWNNSENWSEWSWWWEENEWNNNWE}$

refers to the deal in which the North player gets cards $e(1), e(5), e(6), \ldots, e(50)$, the East player gets cards $e(3), e(4), e(8), \ldots, e(52)$, and so on. So $D$, like $D'$, is a function from $\{1, 2, \ldots, 52\}$ to $\{N, E, S, W\}$, and $D(i)$ is the player who receives card $e(i)$.

Suppose that when the dealer shuffles the cards he produces a permutation $\pi$, and also that the resulting deal is the $D$ defined in Equation (2.20).
In order for the North player to receive card $e(1)$, $\pi$ must send the first card to one of the positions where an N appears in the deal sequence $D'$; that is, $D'(\pi(1))$ must be N. Likewise, South will receive card $e(2)$ if and only if $\pi(2)$ is one of the positions where an S appears in $D'$, so it must be the case that $D'(\pi(2)) = S$. In general, then, $\pi$ produces $D$ if and only if $D'(\pi(i)) = D(i)$ for all positions $i$, which means $D' \circ \pi = D$ or $D' = D \circ \pi^{-1}$.

Now suppose we think of $D$ and $D'$ as decks with cards of type N, E, S, and W. Then by Equation (2.1) $D \circ \pi^{-1}$ is the result of $\pi$ acting on $D$; so $\pi$ produces $D$ if and only if $\pi D = D'$, and therefore the probability of getting deal $D$ is $P_a(D \to D')$. So the variation distance is

(2.21)  $\max_W \left| P_a(W \to D') - U(W) \right|$

where $W$ ranges over all sets of orderings of $D'$. As in the last section, the $W$ which maximizes the quantity in Equation (2.21) is the set of all "favored" orderings, so after some simplifications we can say that when the deal $D'$ is fixed then

(2.22)  $\|P_a - U\| = \frac{1}{2} \sum_{D} \left| P_a(D \to D') - U(D) \right|.$

In other words, dealing is almost the same as declaring certain cards to be identical, except that it fixes the target deck in a shuffle instead of the source deck.

2.12 How Good is the GSR Model?

The GSR model assumes that all possible shuffles are equally likely. That symmetry is the root of the elegant mathematics surrounding the theory of card shuffling on which this work is based. From a scientific standpoint, one should consider how well the GSR model represents the way people actually shuffle cards.

Diaconis and Reeds performed a simple experiment to test the model: each man shuffled about 100 52-card decks and recorded the results of each shuffle. In [13], they report the distribution of the "packet size" statistic for each shuffler. In this context a packet is a maximal sequence of cards dropped from the same hand. So for instance the shuffle $100001101 = 1^1 0^4 1^2 0^1 1^1$ has packets of size 1, 4, 2, 1, and 1.
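The packet-size statistic of a GSR 2-shuffle can be computed exactly: the drop pattern is a uniform 0/1 word of length $n$, and the packets are its maximal runs. The sketch below (Python) enumerates all words for small $n$ and compares the ratio of expected size-1 packets to expected packets with the closed form $(n+2)/(2n+2)$; the closed form is a side calculation of ours, not a formula from the text, but at $n = 52$ it gives exactly the $27/53 \approx 51\%$ figure quoted for the GSR model.

```python
from fractions import Fraction
from itertools import product

def run_lengths(w):
    """Packet sizes of a 2-shuffle: lengths of maximal runs of the 0/1 word."""
    runs, cur = [], 1
    for x, y in zip(w, w[1:]):
        if x == y:
            cur += 1
        else:
            runs.append(cur)
            cur = 1
    runs.append(cur)
    return runs

def packet_size1_ratio(n):
    """E[# size-1 packets] / E[# packets] over the 2^n equally likely
    drop patterns, by exhaustive enumeration."""
    ones = total = 0
    for w in product((0, 1), repeat=n):
        runs = run_lengths(w)
        ones += sum(r == 1 for r in runs)
        total += len(runs)
    return Fraction(ones, total)

# exhaustive check of the closed form for small n, then its value at n = 52
for n in range(2, 12):
    assert packet_size1_ratio(n) == Fraction(n + 2, 2 * n + 2)
assert Fraction(52 + 2, 2 * 52 + 2) == Fraction(27, 53)
```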
Diaconis is a very "neat" shuffler, and as a result his packets are of size 1 about 80% of the time. Reeds' packets are of size 1 about 62% of the time. Under the GSR model, $\frac{27}{53} \approx 51\%$ of the packets should have size 1, so it appears that both Diaconis and Reeds are "neater" than the model. Preliminary tests by this author suggest that packet size distribution varies greatly among human shufflers, and we hope to generate more data on human shuffling for a future work.

Other mathematical models for riffle shuffling may be found in [18], [48], and [19]. The first two consider only special subsets of the set of riffle shuffles as we have defined them. Epstein [19] studied real shufflers by recording the sound produced by shuffles. He suggests that for shufflers above a certain level of proficiency, packet size has approximately a geometric distribution; that is, the probability that a packet has size $\eta$ is about

$P(\eta) = \alpha(1-\alpha)^{\eta-1}$

and for expert shufflers he suggests $\alpha = \frac{8}{9}$. This leads to a non-uniform distribution on shuffles.

A mathematical model can be judged useful by its accuracy in describing a real-world phenomenon. Alternately, success can result if the model allows us to learn something about the real world that was not otherwise apparent. We hope this latter case applies to the GSR model. For instance, in Chapter V we will show that one of the consequences of the model is that there is a better way to deal a bridge hand than the ordinary cyclic method. We hope in a future work to test this conclusion on real shuffle data, to see if information gleaned from the model translates back to actual human shuffling.

CHAPTER III

Probability Calculations for Some Simple Decks

Let $D$ be a deck of $n$ cards, $D'$ a rearrangement of $D$.
In Section 2.8 we defined the descent polynomial and shuffle series

$\mathcal{D}(D, D'; x) := \sum_{\pi : \pi D = D'} x^{\mathrm{des}\,\pi}$

$\mathcal{S}(D, D'; x) := \sum_{a \ge 1} \#\{a\text{-shuffles which take } D \text{ to } D'\}\, x^{a-1}$

and showed that computing one was equivalent to computing the other, since

$\mathcal{D}(D, D'; x) = (1-x)^{n+1}\, \mathcal{S}(D, D'; x).$

So if we know either $\mathcal{D}(D, D'; x)$ or $\mathcal{S}(D, D'; x)$ then we may fairly say that we completely understand the transition from $D$ to $D'$ under the GSR model, since the coefficient of $x^{a-1}$ in $\mathcal{S}(D, D'; x)$, when divided by $a^n$, is the probability of obtaining $D'$ after $a$-shuffling $D$. So the first step toward understanding shuffles of decks with repeated cards would seem to be finding a general method for computing $\mathcal{D}(D, D'; x)$ and $\mathcal{S}(D, D'; x)$.

Unfortunately, Viswanath showed in [11] that, for general values of $D$ and $D'$, computing $\mathcal{D}(D, D'; x)$ (and therefore $\mathcal{S}(D, D'; x)$) belongs to a class of counting problems known as #P, and is in fact #P-complete, meaning every other problem in #P can be reduced to finding descent polynomials in time that is polynomial in the size of the problem. There are many other #P-complete problems, some of considerable interest to industry. As with NP-complete problems, it is generally believed that there is no polynomial-time algorithm to solve any of them. Kozen [30] contains a full explanation of #P-completeness.

However we can compute $\mathcal{D}(D, D'; x)$ and $\mathcal{S}(D, D'; x)$ efficiently in certain special cases, and it is our goal in this chapter to do that. Our computations will bear fruit in Chapter VI, where we use them to approximate variation distances.

In the space of all possible decks of size $n$, there are two extremes that might be considered "simple": a deck of $n$ distinct cards, and a deck of $n$ cards which are all the same. In the all-distinct case, $T(D, D')$ consists of a single permutation $\pi$, so $\mathcal{D}(D, D'; x)$ is simply $x^{\mathrm{des}(\pi)}$.
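For small decks, the #P-hardness is no obstacle: $T(D, D')$ can simply be enumerated. A minimal sketch (Python; the helper is our own, not the thesis's code):

```python
from itertools import permutations

def descent_polynomial(D, Dp):
    """Coefficients of D(D, D'; x): enumerate T(D, D') and tally descents.
    A permutation pi takes deck D to deck D' when D'[pi[i]] == D[i] for
    every source position i (0-indexed)."""
    n = len(D)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(Dp[pi[i]] == D[i] for i in range(n)):
            coeffs[sum(pi[k] > pi[k + 1] for k in range(n - 1))] += 1
    return coeffs

# one red card moving from top to bottom of a 4-card deck
assert descent_polynomial("RBBB", "BBBR") == [0, 1, 4, 1]
# a single-valued deck: T(D, D) = S_4, so we recover the Eulerian numbers
assert descent_polynomial("AAAA", "AAAA") == [1, 11, 11, 1]
```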
Decks which are close to the all-distinct case (i.e., decks with very little redundancy) have similarly small transition sets, which makes it possible to compute descent polynomials simply by counting.

We begin this chapter from the other extreme (single-valued decks). From there the development is bipartite: we assume a format for the source deck and find $\mathcal{D}$ and $\mathcal{S}$ for general target decks; then we assume the same format for the target deck and allow the source to vary. It becomes evident how unsymmetric the problem of shuffling is: the answers are very different depending on which deck is fixed.

The formats we consider are: $RB^{n-1}$ (one red card on top of $n-1$ black ones), $B^{n-1}R$ (one red card under $n-1$ black ones), $RB^{n-2}G$ (one red card over $n-2$ black cards over a green card), $R^m B^n$, and the general ordered deck $1^{n_1} 2^{n_2} \cdots k^{n_k}$. The final case includes all the previous ones, and represents one ordering of any collection of cards. It still falls far short of encompassing all decks, however; for instance, the methods for ordered decks are no help in understanding the transition from a cyclic deck like $(A23456789TJQK)^4$ to an arbitrary rearrangement, or vice-versa. In Section 3.12 we present one final method that is applicable to the general case. It will yield an improvement over simple counting of permutations when the target deck consists of large blocks of cards of the same value.

3.1 The Simplest Deck

First consider a deck of $n$ cards that are all identical; we represent this symbolically by

(3.1)  $D = 1^n.$

That is, $D$ is a sequence of $n$ cards labeled 1. Every rearrangement of $D$ is also $D$, so all $a$-shuffles of $D$ produce $D$; therefore

(3.2)  $\mathcal{S}(D, D; x) = \sum_{a \ge 1} a^n x^{a-1}.$

On the other hand, if we adopt the standard notation

(3.3)  $\left\langle {n \atop d} \right\rangle = $ the number of permutations in $S_n$ that have $d$ descents

then, since all permutations of $D$ are $D$ (i.e., $T(D, D) = S_n$), we have

$\mathcal{D}(D, D; x) = \sum_{\pi \in S_n} x^{\mathrm{des}(\pi)} = \sum_d \left\langle {n \atop d} \right\rangle x^d.$

We will refer to this function as $a_n(x)$.
By Equation (2.12) we have the well-known identity

(3.4)  $a_n(x) := \sum_d \left\langle {n \atop d} \right\rangle x^d = (1-x)^{n+1} \sum_{a \ge 1} a^n x^{a-1}.$

The $\left\langle {n \atop d} \right\rangle$ are known as the Eulerian Numbers, because Euler first considered them in [20]. They have been studied extensively; see for example [24], [7], [47]. Let $a_0(x) := 1$ so as to agree with the right-hand side of Equation (3.4).

3.2 One Red Card on Top

Let $D_1 = RB^{n-1}$. If we identify 'R' with 'red' and 'B' with 'black', then $D_1$ is a deck made up of one red card on top of $n-1$ black cards. Let $D_k$ be the rearrangement of $D_1$ in which the red card is the $k$th card in the deck ($1 \le k \le n$). Then $\pi \in S_n$ will take $D_1$ to $D_k$ if and only if $\pi(1) = k$. Therefore

(3.5)  $\mathcal{D}(D_1, D_k; x) = \sum_d \left\langle {n \atop d} \right\rangle_k x^d$

where $\left\langle {n \atop d} \right\rangle_k$ is the number of permutations $\pi \in S_n$ that have $\pi(1) = k$ and $\mathrm{des}(\pi) = d$. The $\left\langle {n \atop d} \right\rangle_k$ will be studied extensively in Chapter IV, but for now we can approach this problem by computing the shuffle series $\mathcal{S}(D_1, D_k; x)$ directly.

The number of $a$-shuffles that take $D_1$ to $D_k$ is the same as the number of inverse $a$-shuffles that take $D_k$ to $D_1$, or the number of sequences $u_1, u_2, \ldots, u_n$, with $0 \le u_i \le a-1$ for all $i$, for which a stable sort moves $u_k$ to the top. A stable sort will move $u_k$ to the top if none of the $u_i$ are smaller than $u_k$ and none of $u_1, u_2, \ldots, u_{k-1}$ are equal to $u_k$. Suppose we fix the value of $u_k$ to be $i$; then we need

$u_1, u_2, \ldots, u_{k-1} \in \{i+1, i+2, \ldots, a-1\}$  ($(a-i-1)^{k-1}$ choices)
$u_{k+1}, u_{k+2}, \ldots, u_n \in \{i, i+1, \ldots, a-1\}$  ($(a-i)^{n-k}$ choices)

so the total number of $a$-shuffles that take $D_1$ to $D_k$ is

$\sum_{i=0}^{a-1} (a-i-1)^{k-1} (a-i)^{n-k}.$

Substituting $j = a-1-i$ and summing over $a$,

$\mathcal{S}(D_1, D_k; x) = \sum_{a \ge 1} \left( \sum_{j=0}^{a-1} j^{k-1} (j+1)^{n-k} \right) x^{a-1} = \sum_{j \ge 0} j^{k-1} (j+1)^{n-k} \sum_{a \ge j+1} x^{a-1} = (1-x)^{-1} \sum_{j \ge 0} j^{k-1} (j+1)^{n-k} x^j$

where we agree to interpret $0^0$ as 1.
We can now use Equation (2.12) to deduce the descent polynomial, which we will call $g_{n,k}$:

(3.6)  $g_{n,k}(x) := \mathcal{D}(D_1, D_k; x) = \sum_d \left\langle {n \atop d} \right\rangle_k x^d = (1-x)^n \sum_{j \ge 0} j^{k-1} (j+1)^{n-k} x^j.$

We will derive that again in Chapter IV by different means.

3.3 One Red Card on the Bottom

Now consider the transformation $D_n \to D_k$; that is, the process of moving a red card from the bottom of the deck to position $k$. We could solve this problem in much the same way as the last, but instead we take a shortcut.

Let $\rho \in S_n$ be the "reversal" permutation. That is,

(3.7)  $\rho(i) := n+1-i, \quad 1 \le i \le n.$

Applying $\rho$ to a deck reverses it.

If $\pi$ is any permutation in $S_n$, we can define the graph of $\pi$ by drawing a sequence of line segments from $(1, \pi(1))$ to $(2, \pi(2))$, from $(2, \pi(2))$ to $(3, \pi(3))$, ..., and from $(n-1, \pi(n-1))$ to $(n, \pi(n))$. (See Figure 3.1.) A line segment with negative slope represents a descent, and one with positive slope represents an ascent.

Figure 3.1: The graphs of $\pi$, $\rho\pi$, $\pi\rho$, and $\rho\pi\rho$, showing the way $\rho$ changes ascents to descents and vice-versa. Here $\pi = \left({12345678 \atop 86375142}\right)$ and $\rho$ is the reversal permutation. Applying $\rho$ on the left flips the graph upside down; applying it on the right flips the graph left-to-right. Either flip changes negative slopes to positive ones, and vice-versa. The descents of $\pi$ and their reflections are drawn in black. They become the ascents of $\rho\pi$ and $\pi\rho$, and go back to being descents in $\rho\pi\rho$.

Since $\rho\pi(i) = n+1-\pi(i)$, the graph of $\rho\pi$ is the graph of $\pi$ flipped upside down. The flip changes ascents to descents, and vice-versa; therefore $\mathrm{des}(\rho\pi) = n-1-\mathrm{des}(\pi)$. On the other hand, $\pi\rho(i) = \pi(n+1-i)$, so the graph of $\pi\rho$ is the graph of $\pi$ flipped left to right. This flip also exchanges ascents for descents, so $\mathrm{des}(\pi\rho)$ is also $n-1-\mathrm{des}(\pi)$. Finally, the graph of $\rho\pi\rho$ is the graph of $\pi$ flipped both ways, and so $\mathrm{des}(\rho\pi\rho) = n-1-\mathrm{des}(\pi\rho) = n-1-(n-1-\mathrm{des}(\pi)) = \mathrm{des}(\pi)$.
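The three descent identities can be verified mechanically for a small $n$ (Python sketch, permutations in one-line form):

```python
from itertools import permutations

def des(p):
    """Descents of a permutation in one-line form (values 1..n)."""
    return sum(p[i] > p[i + 1] for i in range(len(p) - 1))

n = 6
rho = tuple(range(n, 0, -1))            # reversal: rho(i) = n + 1 - i

def compose(p, q):
    """(p o q)(i) = p(q(i)) for one-line tuples over {1, ..., n}."""
    return tuple(p[q[i] - 1] for i in range(n))

for pi in permutations(range(1, n + 1)):
    assert des(compose(rho, pi)) == n - 1 - des(pi)        # flip upside down
    assert des(compose(pi, rho)) == n - 1 - des(pi)        # flip left-right
    assert des(compose(rho, compose(pi, rho))) == des(pi)  # both flips
```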
Now suppose $D$ is any deck of $n$ cards, $D'$ is a rearrangement of $D$, and we know all about the set $T(D, D')$ of permutations that take $D$ to $D'$. If $\pi \in T(D, D')$, then $\rho\pi\rho \in T(\rho D, \rho D')$, because

$(\rho\pi\rho)(\rho D) = \rho\pi\rho^2 D = \rho\pi D = \rho D'$

since $\rho^2$ is the identity. The map $\pi \mapsto \rho\pi\rho$ is one-to-one by general group theory, so since $\#T(D, D') = \#T(\rho D, \rho D')$ (see Section 2.2), it is a bijection. Even better, it is a bijection that preserves des, so

(3.8)  $\mathcal{D}(D, D'; x) = \mathcal{D}(\rho D, \rho D'; x).$

Applying Equation (3.8) to the problem of $D_n \to D_k$, we see that

$\mathcal{D}(D_n, D_k; x) = \mathcal{D}(\rho D_n, \rho D_k; x) = \mathcal{D}(D_1, D_{n+1-k}; x).$

Therefore

(3.9)  $\mathcal{D}(D_n, D_k; x) = g_{n,n+1-k}(x) = (1-x)^n \sum_{j \ge 0} j^{n-k} (j+1)^{k-1} x^j.$

3.4 Any Position to Top

Now consider the transition from $D_\ell$ to $D_1$. In this case it is more straightforward to compute the descent polynomial directly.

The transition from $D_1$ to itself was covered by Equation (3.6), and can also be seen as a case of the very simplest problem, where all the cards are the same: $\pi D_1 = D_1$ means that $\pi(1) = 1$, so to draw an arrow diagram between $D_1$ and itself, we must connect the top cards with an arrow; then we can draw arrows any way we like between the other cards. Number the arrows $1, 2, \ldots, n$ according to their starting positions in the source deck. The first arrow cannot cross the second, so all descents are among arrows $2, 3, \ldots, n$, which represent an arbitrary permutation from $S_{n-1}$. Therefore

(3.10)  $\mathcal{D}(D_1, D_1; x) = \sum_d \left\langle {n-1 \atop d} \right\rangle x^d = a_{n-1}(x).$

Now suppose $\ell > 1$. To draw an arrow diagram of a permutation $\pi$ which takes $D_\ell$ to $D_1$, we first draw a "red" arrow from position $\ell$ on the left to position 1 on the right, then fill in the other "black" arrows in any fashion. The $\ell$th arrow cannot cross the $(\ell+1)$st because $\pi(\ell+1) > 1 = \pi(\ell)$. On the other hand, the $(\ell-1)$st arrow must cross the $\ell$th, because $\pi(\ell-1) > 1 = \pi(\ell)$. In other words, the descents of $\pi$ consist of

1. the descents among the first $\ell-1$ arrows,
2. the descents among the final $n-\ell$ arrows, and
3. a descent at $\ell-1$.

Let $A = \{1, 2, \ldots, \ell-1\}$ and $B = \{\ell+1, \ell+2, \ldots, n\}$ be sets of positions in the source deck. We can divide the task of drawing the black arrows into three subtasks:

1. Partition the set $\{2, 3, \ldots, n\}$ of positions of black cards in the target deck into disjoint sets $A'$ and $B'$, with $\#A' = \ell-1$ and $\#B' = n-\ell$.
2. Draw arrows from $A$ to $A'$.
3. Draw arrows from $B$ to $B'$.

There are $\binom{n-1}{\ell-1}$ ways to perform step 1. Performing step 2 amounts to choosing a permutation $\pi_A \in S_{\ell-1}$, then drawing arrows from $i$ to the $\pi_A(i)$th largest position in $A'$, for $i = 1, 2, \ldots, \ell-1$. Arrow crossings will correspond to descents of $\pi_A$.

Figure 3.2: The two cases considered in Section 3.4. When the red card is initially on top, the permutation among the black cards is arbitrary, and there are no additional descents. Otherwise, if the red card is not initially on top, there must be a descent between the red card and the card above it, since the arrows from those positions must cross. Likewise there can be no descent between the red card and the one below it, because those arrows cannot cross.

Likewise performing step 3 amounts to choosing a permutation $\pi_B \in S_{n-\ell}$. So, for $\ell > 1$,

$\mathcal{D}(D_\ell, D_1; x) = \binom{n-1}{\ell-1} \sum_{\pi_A \in S_{\ell-1}} \sum_{\pi_B \in S_{n-\ell}} x^{\mathrm{des}(\pi_A) + \mathrm{des}(\pi_B) + 1} = \binom{n-1}{\ell-1}\, x \sum_{\pi_A \in S_{\ell-1}} x^{\mathrm{des}(\pi_A)} \sum_{\pi_B \in S_{n-\ell}} x^{\mathrm{des}(\pi_B)} = \binom{n-1}{\ell-1}\, x\, a_{\ell-1}(x)\, a_{n-\ell}(x)$

where $a_n$ is as in Equation (3.4) for $n > 0$, and $a_0(x) \equiv 1$. We can put this together with Equation (3.10) to write

(3.11)  $\mathcal{D}(D_\ell, D_1; x) = \binom{n-1}{\ell-1}\, x^{[\ell > 1]}\, a_{\ell-1}(x)\, a_{n-\ell}(x)$

for any $\ell$, where

(3.12)  $[A] = \begin{cases} 0 & \text{if statement } A \text{ is false} \\ 1 & \text{if statement } A \text{ is true.} \end{cases}$

Knuth refers to this as Iverson notation in [26], and traces its origin to [25].

3.5 Any Position to Bottom

It is now easy to describe the transition from $D_\ell$ to $D_n$, since

$\mathcal{D}(D_\ell, D_n; x) = \mathcal{D}(\rho D_\ell, \rho D_n; x) = \mathcal{D}(D_{n+1-\ell}, D_1; x).$
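Equation (3.11) can be confirmed against direct enumeration for small $n$ (Python sketch; `lhs` tallies permutations with $\pi(\ell) = 1$ by descents, `rhs` evaluates the right-hand side; both names are ours):

```python
from itertools import permutations
from math import comb

def eulerian_poly(n):
    """Coefficients of a_n(x); a_0(x) = 1."""
    if n == 0:
        return [1]
    coeffs = [0] * n
    for pi in permutations(range(n)):
        coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out

def lhs(n, l):
    """Brute force: permutations with pi(l) = 1, tallied by descents."""
    coeffs = [0] * n
    for pi in permutations(range(1, n + 1)):
        if pi[l - 1] == 1:
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

def rhs(n, l):
    """Equation (3.11): C(n-1, l-1) x^[l>1] a_{l-1}(x) a_{n-l}(x)."""
    p = poly_mul(eulerian_poly(l - 1), eulerian_poly(n - l))
    return [0] * (l > 1) + [comb(n - 1, l - 1) * c for c in p]

for n in range(2, 7):
    for l in range(1, n + 1):
        a, b = lhs(n, l), rhs(n, l)
        assert a[:len(b)] == b and not any(a[len(b):])
```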
Plugging into Equation (3.11) and simplifying,

(3.13)  $\mathcal{D}(D_\ell, D_n; x) = \binom{n-1}{\ell-1}\, x^{[\ell < n]}\, a_{\ell-1}(x)\, a_{n-\ell}(x).$

3.6 Any Position to Any Position

Now we calculate the general descent polynomial $\mathcal{D}(D_\ell, D_k; x)$.

Suppose we do an $(a+b+1)$-shuffle of $D_\ell$, and that the red card is in packet $a$. Imagine the source deck is a family ordered by age, and each packet is a generation. Then from the perspective of the red card, packets $0, 1, \ldots, a-1$ contain "ancestors", packet $a$ contains "siblings" (older ones above $\ell$, younger ones below), and packets $a+1, a+2, \ldots, a+b$ contain "descendants". Let $s$ be the number of ancestors and $t$ be the number of descendants. Subject to those conditions, here are the steps in constructing a shuffle from $D_\ell$ to $D_k$:

1. Choose $\ell-1-s$ positions above $k$ in the target deck to be the destinations of the older siblings ($\binom{k-1}{\ell-1-s}$ choices).
2. Choose $n-\ell-t$ positions below $k$ in the target deck to be the destinations of the younger siblings ($\binom{n-k}{n-\ell-t}$ choices).
3. Decide which of the remaining $s+t$ positions in the target deck are for ancestors, and which for descendants ($\binom{s+t}{s}$ choices).
4. Define an $a$-shuffle of the ancestors ($a^s$ choices).
5. Define a $b$-shuffle of the descendants ($b^t$ choices).

Figure 3.3: The shuffle described in Section 3.6. The source deck is divided into $a+b+1$ packets, with the red card in position $\ell$ and packet $a$. The boundaries of packet $a$ are drawn with horizontal lines. For ease of exposition, the cards in packets $0, 1, \ldots, a-1$ are called "ancestors" in the text, those in packets $a+1, a+2, \ldots, a+b$ are called "descendants", and those in packet $a$ are "siblings". Note that packet $a$ cannot change its order during the transition.
So the shuffle series is

(3.14)  $\mathcal{S}(D_\ell, D_k; x) = \sum_{a,b,s,t \ge 0} \binom{k-1}{\ell-1-s} \binom{n-k}{n-\ell-t} \binom{s+t}{s}\, a^s b^t x^{a+b}.$

If $k = \ell$, the sum above becomes

(3.15)  $\sum_{a,b,s,t \ge 0} \binom{p}{s} \binom{q}{t} \binom{s+t}{s}\, a^s b^t x^{a+b}$

where $p = k-1$, the number of black cards above the red one, and $q = n-k$, the number of black cards below the red one, in both the source and target decks. It is a very beautiful sum, but has no obvious simplification.

The case presented in this section is treated in depth by Ciucu in [8]. He computes the probability that the card in position $\ell$ is in position $k$ after a 2-shuffle, and treats the resulting matrix as a Markov chain. Then he can answer the question of which card is most likely to be in each position.

Figure 3.4: The second "red-green" shuffle described in Section 3.7. The red and green cards normally partition the black cards of $D_{k,\ell}$ into three packets of sizes $u$, $v$, and $w$. Since the red card is destined for the top of the target deck, there will be a descent above it (unless $k = 1$) and not below it. Likewise there will be a descent below the green card if $\ell < n$.

3.7 One Red Card, One Green Card

Now consider a different generalization. Let $D_{k,\ell}$ be a deck with one red card in position $k$, one green card in position $\ell$, and $n-2$ black cards in the other positions. Then

$\mathcal{D}(D_{1,n}, D_{k,\ell}; x) = \sum_d \left\langle {n \atop d} \right\rangle_k^\ell x^d$

where $\left\langle {n \atop d} \right\rangle_k^\ell$ is the number of permutations $\pi \in S_n$ with $\mathrm{des}(\pi) = d$, $\pi(1) = k$, and $\pi(n) = \ell$. In Theorem 4.10 we will show that

$\left\langle {n \atop d} \right\rangle_k^\ell = \begin{cases} \left\langle {n-1 \atop d} \right\rangle_{n+k-\ell} & \text{if } k < \ell \\[4pt] \left\langle {n-1 \atop d-1} \right\rangle_{k-\ell} & \text{if } k > \ell. \end{cases}$

So this case reduces to one we have already seen, namely:

(3.16)  $\mathcal{D}(D_{1,n}, D_{k,\ell}; x) = \begin{cases} g_{n-1,n+k-\ell}(x) & \text{if } k < \ell \\ x\, g_{n-1,k-\ell}(x) & \text{if } k > \ell. \end{cases}$

We will refer to the function in Equation (3.16) as $h_{n,k,\ell}(x)$.

In the other direction we have the transition from $D_{k,\ell}$ to $D_{1,n}$.
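Before treating that direction, note that the reduction packaged in Equation (3.16) is easy to test by brute force (Python sketch; `h` implements (3.16) using a brute-force $g_{n,k}$, and all names are ours):

```python
from itertools import permutations

def descents(pi):
    return sum(pi[i] > pi[i + 1] for i in range(len(pi) - 1))

def g(n, k):
    """g_{n,k}: permutations with pi(1) = k, tallied by descents."""
    out = [0] * n
    for pi in permutations(range(1, n + 1)):
        if pi[0] == k:
            out[descents(pi)] += 1
    return out

def h(n, k, l):
    """Equation (3.16): descent polynomial of D_{1,n} -> D_{k,l}."""
    if k < l:
        return g(n - 1, n + k - l) + [0]    # pad to degree n-1
    return [0] + g(n - 1, k - l)            # the factor of x shifts by one

def h_bruteforce(n, k, l):
    """Permutations with pi(1) = k and pi(n) = l, tallied by descents."""
    out = [0] * n
    for pi in permutations(range(1, n + 1)):
        if pi[0] == k and pi[-1] == l:
            out[descents(pi)] += 1
    return out

n = 6
for k in range(1, n + 1):
    for l in range(1, n + 1):
        if k != l:
            assert h(n, k, l) == h_bruteforce(n, k, l)
```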
The two colored cards in the source deck separate the black cards into three (possibly empty) packets; call the packets $U$, $V$, and $W$, and let their sizes be $u$, $v$, and $w$ respectively. (So $u = \min\{k,\ell\} - 1$, $v = \max\{k,\ell\} - \min\{k,\ell\} - 1$, and $w = n - \max\{k,\ell\}$. See Figure 3.4.)

To construct a permutation which takes $D_{k,\ell}$ to $D_{1,n}$, we need to

1. Designate which of the target positions $\{2, 3, \ldots, n-1\}$ will hold cards from $U$, which will hold cards from $V$, and which will hold cards from $W$ ($\frac{(n-2)!}{u!\,v!\,w!}$ choices).
2. Draw arrows from $U$, $V$, and $W$ to their destination sets in any fashion.
3. Draw an arrow from position $k$ to position 1 for the red card, and one from position $\ell$ to position $n$ for the green card.

The complete permutation $\pi$ has all the descents of the permutations defined in step 2. In addition, there will be a descent before $k$ if $k > 1$ and a descent after $\ell$ if $\ell < n$. (However if $k = \ell+1$, these are the same descent.) Therefore

(3.17)  $\mathcal{D}(D_{k,\ell}, D_{1,n}; x) = \frac{(n-2)!}{u!\,v!\,w!}\, x^\theta\, a_u(x)\, a_v(x)\, a_w(x)$

where $\theta = [k > 1] + [\ell < n] - [k = \ell+1]$.

3.8 Source deck $R^m B^n$

Now let $D = R^m B^n$. That is, $D$ consists of $m$ red cards on top of $n$ black cards. Let $D'$ be some rearrangement of $D$. We wish to calculate $\mathcal{D}(D, D'; x)$.

Let $r_1, r_2, \ldots, r_m$ be the positions of the red cards in $D'$, and let $b_1, b_2, \ldots, b_n$ be the positions of the black cards.

Figure 3.5: The shuffle of the source deck $R^m B^n$, as described in Section 3.8. The positions of the red cards in the target deck are labeled $r_1, r_2, \ldots, r_m$ and those of the black cards $b_1, b_2, \ldots, b_n$. $r_\ell$ is chosen as the ending position of the last red card from the source deck, and $b_k$ the ending position of the first black card.

We can construct a permutation that takes $D$ to $D'$ with the following procedure:

1. Choose a destination $r_\ell$ for the last red card in $D$.
Choose a permutation π_R ∈ S_m with π_R(m) = ℓ, and for each i ∈ {1, 2, …, m}, send the i-th red card in D to position r_{π_R(i)} in D′.

3. Choose a destination b_k for the first black card in D.

4. Choose a permutation π_B ∈ S_n with π_B(1) = k, and for each j ∈ {1, 2, …, n}, send the j-th black card in D to position b_{π_B(j)} in D′.

Call the complete permutation π. π will have a descent at position m if r_ℓ > b_k, and all other descents of π come from the descents of π_R and π_B. So

    D(D, D′; x) = Σ_{ℓ=1}^{m} Σ_{π_R ∈ S_m, π_R(m)=ℓ} Σ_{k=1}^{n} Σ_{π_B ∈ S_n, π_B(1)=k} x^{[r_ℓ > b_k] + des(π_R) + des(π_B)}.

Rearranging summation signs and applying our results from Section 3.2 and Section 3.3, we have

(3.18)    D(D, D′; x) = Σ_{ℓ=1}^{m} Σ_{k=1}^{n} x^{[r_ℓ > b_k]} g_{m, m+1−ℓ}(x) g_{n,k}(x).

Figure 3.6: The Young matrix A for the deck RBBRBRRBBBRRB. The (i, j) entry is x if the i-th R occurs after the j-th B, and 1 otherwise. The x's form a Young shape, anchored in the lower left corner. It is possible to recover the deck from the matrix by tracing the boundary between x's and 1's; a vertical segment corresponds to an R and a horizontal segment to a B.

    1 1 1 1 1 1 1
    x x 1 1 1 1 1
    x x x 1 1 1 1
    x x x 1 1 1 1
    x x x x x x 1
    x x x x x x 1

Let

(3.19)    G_r := (g_{r,1}(x), g_{r,2}(x), …, g_{r,r}(x)),

(3.20)    G̃_r := (g_{r,r}(x), g_{r,r−1}(x), …, g_{r,1}(x)),

and let A be an m × n matrix with

    A_{i,j} = x^{[r_i > b_j]} = x if r_i > b_j;  1 if r_i < b_j.

Then

(3.21)    D(D, D′; x) = Σ_{ℓ=1}^{m} Σ_{k=1}^{n} (G̃_m)_ℓ A_{ℓ,k} (G_n)_k = G̃_m A G_n^T.

If we assume that r_1 < r_2 < ⋯ < r_m and b_1 < b_2 < ⋯ < b_n, then r_i > b_j implies

    r_m > r_{m−1} > ⋯ > r_i > b_j > b_{j−1} > ⋯ > b_1,

which means that r_p > b_q for any (p, q) with p ≥ i and q ≤ j. In other words, if A_{ij} = x, then all entries in A below and to the left of (i, j) are also x. So we may refer to A as a Young matrix, in the sense that the x's in A form a Young shape, drawn in the French style (i.e., anchored to the lower left corner). See Figure 3.6.
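Equation (3.18) can be sanity-checked the same way. The illustrative Python below (helper names mine, not the thesis's) evaluates the double sum and compares it with direct enumeration for every rearrangement of R²B³:

```python
from itertools import permutations

def des_poly(src, tgt):
    # D(src, tgt; x) by direct enumeration: pi with tgt[pi(i)] = src[i]
    n = len(src)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(tgt[pi[i]] == src[i] for i in range(n)):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

def g(m, k):
    # g_{m,k}(x): descent polynomial of the pi in S_m with pi(1) = k
    coeffs = [0] * m
    for pi in permutations(range(1, m + 1)):
        if pi[0] == k:
            coeffs[sum(pi[i] > pi[i + 1] for i in range(m - 1))] += 1
    return coeffs

def pmul(a, b):
    # multiply two polynomials given as coefficient lists
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] += ai * bj
    return r

def rhs_318(m, n, tgt):
    # the right-hand side of Equation (3.18)
    r = [i + 1 for i, c in enumerate(tgt) if c == 'R']
    b = [i + 1 for i, c in enumerate(tgt) if c == 'B']
    total = [0] * (m + n)
    for l in range(1, m + 1):
        for k in range(1, n + 1):
            term = pmul(g(m, m + 1 - l), g(n, k))
            if r[l - 1] > b[k - 1]:
                term = [0] + term        # the x^{[r_l > b_k]} factor
            for d, c in enumerate(term):
                total[d] += c
    return total

m, n = 2, 3
src = 'R' * m + 'B' * n
for tgt in set(permutations(src)):
    assert des_poly(src, tgt) == rhs_318(m, n, tgt)
```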
One may recover the deck D′ from A by tracing the boundary between 1's and x's from the upper left corner of A to the lower right corner of A. The boundary is a sequence of m + n unit line segments: a vertical segment corresponds to a red card in D′, and a horizontal segment to a black card.

Figure 3.7: A shuffle to the target deck R^m B^n. There will be a descent wherever the source deck changes from black to red, because all the black cards in the target deck are below all the red cards. Likewise there are no descents where the source deck changes from red to black.

3.9 Target Deck R^m B^n

Now let D′ := R^m B^n, with D some reordering, and consider the transition from D to D′. D can be viewed as a sequence of monochromatic blocks of cards, with the odd blocks one color and the even ones the other. Without loss of generality we may say that

    D = R^{m_1} B^{n_1} R^{m_2} B^{n_2} ⋯ R^{m_k} B^{n_k}

for some k, where m_1 and n_k may be 0, but we insist that the other exponents be positive. We can construct a permutation from D to D′ with the following procedure:

1. Partition the positions of red cards in D′ into those that will receive the first red block in D, those that will receive the second block, etc. (m!/(m_1! m_2! ⋯ m_k!) choices).

2. Do likewise for the black cards (n!/(n_1! n_2! ⋯ n_k!) choices).

3. Define a permutation for each block into its destination set.

Suppose k > 1 and π is generated as above. Since the (m_1 + n_1)-th card of D is black, π(m_1 + n_1) ∈ {m+1, m+2, …, m+n}, and since the next card in D is red, π(m_1 + n_1 + 1) ∈ {1, 2, …, m}. Therefore π has a descent at m_1 + n_1, and likewise at each of the k − 1 positions where D changes from black to red. By the same logic there are no descents where D changes from red to black. So all other descents of π come from the permutations defined in step 3, each of which is unconstrained.
Therefore

(3.22)    D(D, D′; x) = m! n! x^{k−1} Π_{i=1}^{k} (a_{m_i}(x)/m_i!) (a_{n_i}(x)/n_i!).

3.10 Source Deck 1^{n_1} 2^{n_2} ⋯ m^{n_m}

Now consider D = 1^{n_1} 2^{n_2} ⋯ m^{n_m}, with D′ some rearrangement of D. Let p_v(i) be the position of the i-th card with value v in D′. To construct a permutation which takes D to D′, we should

1. Pick ℓ_1 ∈ {1, 2, …, n_1} and π_1 ∈ S_{n_1} with π_1(n_1) = ℓ_1, and send the r-th 1 in D to position p_1(π_1(r)) in D′, for 1 ≤ r ≤ n_1.

2. Pick k_2, ℓ_2 ∈ {1, 2, …, n_2}. If n_2 = 1, we must have k_2 = ℓ_2; otherwise, if n_2 > 1, k_2 and ℓ_2 should be chosen to be distinct. Then pick a permutation π_2 ∈ S_{n_2} with π_2(1) = k_2 and π_2(n_2) = ℓ_2, and send the r-th 2 in D to position p_2(π_2(r)) in D′.

Figure 3.8: Shuffling a sorted deck, as described in Section 3.10. We fix k_v ∈ {1, 2, …, n_v} to be the relative destination of the first v in D. That is, the first v in D will go to the k_v-th v in D′. ℓ_v is the relative destination of the last v in D, and we similarly fix k_u and ℓ_u for the other card values. (Exception: we don't fix the destination of the first 1 or the last m.)

3. Pick (k_3, ℓ_3, π_3), (k_4, ℓ_4, π_4), …, (k_{m−1}, ℓ_{m−1}, π_{m−1}) in the manner of step 2.

4. Pick k_m ∈ {1, 2, …, n_m} and π_m ∈ S_{n_m} with π_m(1) = k_m, and send the r-th m in D to position p_m(π_m(r)).

The descents of the resulting permutation π will be those of π_1, π_2, …, π_m together with any "inter-value" descents, which result when, for some value v, the arrow from the last card labeled v in D crosses the arrow from the first card labeled v+1; i.e., when p_v(ℓ_v) > p_{v+1}(k_{v+1}). So we have

(3.23)    D(D, D′; x) = Σ f_1(x) φ_{1,2}(x) f_2(x) φ_{2,3}(x) f_3(x) ⋯ f_{m−1}(x) φ_{m−1,m}(x) f_m(x),

the sum being over k_2, k_3, …, k_m and ℓ_1, ℓ_2, …, ℓ_{m−1}, where f_v(x) is the generating function for the descents of π_v, and

    φ_{u,v}(x) := x^{[p_u(ℓ_u) > p_v(k_v)]}.
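Equation (3.23) can be checked numerically as well. In the illustrative Python below (the recursion and names are mine), each f_v is computed by brute force over S_{n_v} with its endpoints pinned, the φ factors are read off from the positions p_v(i), and the result is compared with direct enumeration for D = 1²2²3²:

```python
from itertools import permutations

def des_poly(src, tgt):
    # direct enumeration of D(src, tgt; x), as a coefficient list
    n = len(src)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(tgt[pi[i]] == src[i] for i in range(n)):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

def f(nv, first=None, last=None):
    # descent polynomial of pi in S_nv, with pi(1) and/or pi(nv) optionally fixed
    coeffs = [0] * nv
    for pi in permutations(range(1, nv + 1)):
        if (first is None or pi[0] == first) and (last is None or pi[-1] == last):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(nv - 1))] += 1
    return coeffs

def pmul(a, b):
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] += ai * bj
    return r

def rhs_323(counts, tgt):
    # the sum in Equation (3.23) for D = 1^{n_1} 2^{n_2} ... m^{n_m}
    m = len(counts)
    pos = {v: [i + 1 for i, c in enumerate(tgt) if c == str(v)]
           for v in range(1, m + 1)}
    total = [0] * sum(counts)

    def expand(v, l_prev, poly):
        if v > m:
            for d, c in enumerate(poly):
                total[d] += c
            return
        nv = counts[v - 1]
        ks = [None] if v == 1 else range(1, nv + 1)   # k_1 is not fixed
        ls = [None] if v == m else range(1, nv + 1)   # l_m is not fixed
        for k in ks:
            for l in ls:
                fv = f(nv, first=k, last=l)
                if sum(fv) == 0:
                    continue
                term = pmul(poly, fv)
                if v > 1 and pos[v - 1][l_prev - 1] > pos[v][k - 1]:
                    term = [0] + term                 # inter-value descent
                expand(v + 1, l, term)

    expand(1, None, [1])
    return total

src = '112233'
for tgt in set(permutations(src)):
    assert des_poly(src, tgt) == rhs_323([2, 2, 2], tgt)
```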
Let A_{u,v} be the n_u × n_v Young matrix (like the one in Figure 3.6) which results from ignoring all cards except those with values u or v. In other words,

    (A_{u,v})_{ij} = x^{[p_u(i) > p_v(j)]}.

Then φ_{u,v}(x) is the (row ℓ_u, column k_v) entry of A_{u,v}.

From Section 3.3 we know that f_1(x) = g_{n_1, n_1+1−ℓ_1}(x), which is the ℓ_1-th entry in the vector G̃_{n_1} defined in Section 3.8. Likewise, from Section 3.2, f_m(x) = g_{n_m, k_m}(x), which is the k_m-th entry in G_{n_m}. If 1 < v < m, then

    f_v(x) = h_{n_v, k_v, ℓ_v}(x),

where h_{n,k,ℓ}(x) is the function defined in Equation (3.16). So let H_n be an n × n matrix with

    (H_n)_{ij} = h_{n,i,j}(x) = g_{n−1, n+i−j}(x)   if i < j
                             = x g_{n−1, i−j}(x)    if i > j
                             = 0                    if i = j

for n > 1, and H_1 = [1]. H_n is a Toeplitz matrix, since h_{n,i,j} only depends on i − j; for example

(3.24)    H_4 = [ 0          g_{3,3}    g_{3,2}    g_{3,1} ]   [ 0        x+x²    2x      1+x  ]
               [ x g_{3,1}  0          g_{3,3}    g_{3,2} ] = [ x+x²     0       x+x²    2x   ]
               [ x g_{3,2}  x g_{3,1}  0          g_{3,3} ]   [ 2x²      x+x²    0       x+x² ]
               [ x g_{3,3}  x g_{3,2}  x g_{3,1}  0       ]   [ x²+x³    2x²     x+x²    0    ]

Now we can write the sum in Equation (3.23) as

    Σ (G̃_{n_1})_{ℓ_1} (A_{1,2})_{ℓ_1,k_2} (H_{n_2})_{k_2,ℓ_2} (A_{2,3})_{ℓ_2,k_3} ⋯ (A_{m−2,m−1})_{ℓ_{m−2},k_{m−1}} (H_{n_{m−1}})_{k_{m−1},ℓ_{m−1}} (A_{m−1,m})_{ℓ_{m−1},k_m} (G_{n_m})_{k_m},

the sum being over k_2, …, k_m and ℓ_1, …, ℓ_{m−1}. In other words,

(3.25)    D(D, D′; x) = G̃_{n_1} A_{1,2} H_{n_2} A_{2,3} ⋯ A_{m−2,m−1} H_{n_{m−1}} A_{m−1,m} G_{n_m}^T.

Note the G's and H's are fixed by the cards in D. Only the A's depend on the order of D′. This product can be shortened, at the expense of some complexity. The procedure for calculating D(D, D′; x) is distilled into an algorithm in Appendix C.

3.11 Target Deck 1^{n_1} 2^{n_2} ⋯ m^{n_m}

Now let D′ = 1^{n_1} 2^{n_2} ⋯ m^{n_m}, with D some rearrangement, and consider the transition from D to D′. This case is very similar to the one we considered in Section 3.9. We may write D as a sequence of monochromatic blocks:

    D = v_1^{k_1} v_2^{k_2} ⋯ v_r^{k_r}

where v_i ≠ v_{i+1}. Let D̂ be the deck v_1 v_2 ⋯ v_r, and let B_v = D̂^{−1}(v). That is, B_v is the set of blocks that have value v. Then to construct a permutation from D to D′, we should
1. Identify which of the positions of 1's in D′ will receive the first block of 1's in D, which will receive the second, etc. (n_1!/Π_{i ∈ B_1} k_i! choices).

2. Do likewise for values 2, 3, …, m.

There are

    n_1! n_2! ⋯ n_m! / (k_1! k_2! ⋯ k_r!)

ways to perform steps 1 and 2.

3. Choose a permutation from each block to its destination set.

Descents in the complete permutation π will come from those internal to the block permutations, and from the descents of D̂. That is, if D̂(i) > D̂(i+1) then all of the cards in the i-th block of D will end up below all the cards in the (i+1)-st block. In particular, the arrow from the last card of the i-th block will cross the arrow from the first card of the (i+1)-st block, so π will have a descent between the blocks. By the same logic, if D̂(i) < D̂(i+1) there will be no descent between the blocks. So let

    d_0 = #{ i ∈ {1, 2, …, r−1} : D̂(i) > D̂(i+1) }.

Figure 3.9: A shuffle to the target deck 1^{n_1} 2^{n_2} ⋯ m^{n_m}. There will be a descent wherever the source deck changes from a high numbered card to a lower one (such as from 3 to 2), but not when the change is from a low number to a higher one (such as from 1 to 3).

The number of inter-value descents is always d_0, so we have

(3.26)    D(D, D′; x) = (n_1! n_2! ⋯ n_m!)/(k_1! k_2! ⋯ k_r!) · x^{d_0} · a_{k_1}(x) a_{k_2}(x) ⋯ a_{k_r}(x).

3.12 Target Decks Containing Blocks

The fallback method for computing D(D, D′; x) is to iterate through all the permutations which take D to D′, and record the number of descents in each. Of course this takes factorial time in general, which is why we have attempted in this chapter to find improved methods when the decks are of special types.
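The fallback method is only a few lines; here is an illustrative Python version (not the thesis's C/C++ program):

```python
from itertools import permutations

def des_poly(src, tgt):
    # Fallback method: iterate over every pi in S_n taking src to tgt
    # (i.e. tgt[pi(i)] = src[i]) and tally descents; returns the
    # coefficient list of D(src, tgt; x). Takes factorial time.
    n = len(src)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(tgt[pi[i]] == src[i] for i in range(n)):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

print(des_poly('2121', '1122'))   # [0, 0, 4, 0], i.e. 4x^2
```

Even a 12-card deck already means scanning 12! = 479001600 permutations, which is the motivation for the special-case methods of this chapter.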
Here we extend the method of Section 3.11 to general target decks, and obtain good results when the target deck is "blocky", i.e., contains blocks of cards of the same value.

Finding a permutation from D to D′ means assigning a valid destination to each position in the source deck. The idea here is to assign positions in the source deck to blocks in the target deck, and thus group together a set of permutations. Suppose D′ = 111222111222. Then D′ contains 4 blocks, which we will number 1 through 4, top to bottom. If D = 122112222111 then an assignment of positions in the source deck to blocks might be P = 144332224113, which can be read as "The first card in D goes to block 1 in D′, the second to block 4, the third to block 4, …, the last to block 3." This assignment to blocks represents 3!⁴ = 1296 permutations, since there are 3 cards in each block and therefore 3! ways to resolve the assignments to each block into positions.

Let R be the set of permutations represented by the block assignment P. We can find the descent polynomial D_R(x) for those permutations by scanning through P. Where P has an ascent (P(i) < P(i+1)), every permutation in R must also have an ascent, since position i goes to a lower-numbered block, and therefore a lower-numbered position, than does i+1. Likewise descents in P force descents in every π ∈ R.

Consider the 222 which appears at positions 6, 7, and 8 in P. It forces the permutations in R to send {6, 7, 8} into {4, 5, 6} (the positions of block 2), but it does not constrain the order in which they are assigned. Since a_3(x) = 1 + 4x + x², we know that 1/6 of the permutations in R will have no descents among positions 6, 7, and 8, 4/6 will have one descent, and 1/6 will have two descents. The number of such descents is independent of all the other choices we make when resolving P into a permutation.
So in general, if D′ = v_1^{ℓ_1} v_2^{ℓ_2} ⋯ v_s^{ℓ_s} and P = b_1^{m_1} b_2^{m_2} ⋯ b_t^{m_t}, with v_b ≠ v_{b+1} and b_j ≠ b_{j+1}, then

    D_R(x) = (ℓ_1! ℓ_2! ⋯ ℓ_s!) x^{des(P)} Π_{j=1}^{t} a_{m_j}(x)/m_j!.

If we sum up D_R(x) over all choices for the block assignment P, we will have D(D, D′; x).

Here is a recursive algorithm to do that. We assume that the set of card values is {1, 2, …, K}. For convenience we assume certain variables are global; they are capitalized. Uncapitalized variables are local. We use i to represent a position in the deck, b to represent the number of a block in the target deck, and v to represent a card value. A block object is a structure containing two fields: value and count.

FindDesPoly(D, D′)
1.  PreprocessTarget(D′)
2.  PreprocessSource(D)
3.  return MapBlock(1)

PreprocessTarget(D′)
1.  N ← Length(D′)
2.  P ← an array of N integers
3.  Blocks ← an empty list of Block objects
4.  MapSize ← 1
5.  v ← D′(1)
6.  m ← 1
7.  for i ← 2 to N
8.      if (D′(i) = v)
9.          m ← m + 1
10.         MapSize ← MapSize × m
11.     else
12.         Append a new block with (value = v, count = m) to Blocks
13.         v ← D′(i)
14.         m ← 1
15. Append a new block with (value = v, count = m) to Blocks

PreprocessSource(D)
1.  Pos ← array of K sets, all initially empty
2.  for i ← 1 to Length(D)
3.      v ← D(i)
4.      Pos[v] ← Pos[v] ∪ {i}

MapBlock(b)
1.  if (b > #Blocks)
2.      return CalcPoly()
3.  else
4.      despoly ← 0
5.      v ← Blocks[b].value
6.      m ← Blocks[b].count
7.      for each s ⊂ Pos[v] with #s = m
8.          for each i ∈ s
9.              P[i] ← b
10.         Pos[v] ← Pos[v] \ s
11.         despoly ← despoly + MapBlock(b + 1)
12.         Pos[v] ← Pos[v] ∪ s
13.     return despoly

CalcPoly()
1.  despoly ← MapSize
2.  b ← P(1)
3.  m ← 1
4.  for i ← 2 to N
5.      if (P(i) = b)
6.          m ← m + 1
7.          despoly ← despoly / m
8.      else
9.          despoly ← despoly × a_m(x)
10.         if (P(i) < b)
11.             despoly ← despoly × x
12.         b ← P(i)
13.         m ← 1
14. despoly ← despoly × a_m(x)
15. return despoly

Once the target deck has been pre-processed, the procedure can be repeated for any number of source decks.
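Here is an illustrative Python transcription of the algorithm (the names are mine; intermediate coefficients are exact rationals, since CalcPoly divides by run lengths). It assumes, as above, that the source deck is a rearrangement of the target deck, and it agrees with the fallback enumeration:

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

def eulerian(m):
    # coefficient list of the Eulerian polynomial a_m(x); a_0(x) = 1
    if m == 0:
        return [1]
    coeffs = [0] * m
    for pi in permutations(range(m)):
        coeffs[sum(pi[i] > pi[i + 1] for i in range(m - 1))] += 1
    return coeffs

def pmul(a, b):
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] += ai * bj
    return r

def find_des_poly(src, tgt):
    n = len(src)
    blocks = []                       # (value, count) runs of the target
    for v in tgt:
        if blocks and blocks[-1][0] == v:
            blocks[-1][1] += 1
        else:
            blocks.append([v, 1])
    pos = {}                          # positions of each value in the source
    for i, v in enumerate(src):
        pos.setdefault(v, set()).add(i)
    mapsize = 1                       # product of block-size factorials
    for _, c in blocks:
        mapsize *= factorial(c)
    P = [0] * n
    total = [Fraction(0)] * n

    def calc_poly():                  # D_R(x) for the current assignment P
        poly = [Fraction(mapsize)]
        i = 0
        while i < n:
            j = i
            while j + 1 < n and P[j + 1] == P[i]:
                j += 1
            m = j - i + 1             # length of this maximal run
            poly = pmul(poly, [Fraction(c, factorial(m)) for c in eulerian(m)])
            if j + 1 < n and P[j + 1] < P[i]:
                poly = [Fraction(0)] + poly    # descent between runs
            i = j + 1
        for d, c in enumerate(poly):
            total[d] += c

    def map_block(b):
        if b == len(blocks):
            calc_poly()
            return
        v, c = blocks[b]
        for s in combinations(sorted(pos[v]), c):
            for i in s:
                P[i] = b
            pos[v] -= set(s)
            map_block(b + 1)
            pos[v] |= set(s)

    map_block(0)
    return [int(c) for c in total]

def brute(src, tgt):                  # fallback method, for comparison
    n = len(src)
    coeffs = [0] * n
    for pi in permutations(range(n)):
        if all(tgt[pi[i]] == src[i] for i in range(n)):
            coeffs[sum(pi[i] > pi[i + 1] for i in range(n - 1))] += 1
    return coeffs

assert find_des_poly('121211', '112211') == brute('121211', '112211')
```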
In Chapter VI we will use this method to find descent polynomials for euchre, where the target deck is fixed as 111223334411222334445666 and the source deck allowed to vary. Using the method from this section cuts down processing time for a euchre deck by a factor of about 9000 over iteration through all the permutations.

CHAPTER IV

The Joint Distribution of π(1) and des(π) in S_n

4.1 Introduction and Main Results

In this chapter we will often identify a permutation π ∈ S_n with the sequence π(1), π(2), …, π(n). So for instance if π(1) = k and π(n) = ℓ, we say that π "begins with" k and "ends with" ℓ.

In Chapter III we introduced the Eulerian numbers:

(4.1)    ⟨n, d⟩ := #{π ∈ S_n : des(π) = d},

which have been widely studied; see, for example, [24, p. 267] and [7]. We also defined the refined Eulerian numbers:

(4.2)    ⟨n, d⟩_k := #{π ∈ S_n : des(π) = d and π(1) = k}.

This chapter is an investigation of the numbers defined in Equation (4.2). We derive a formula in terms of binomial coefficients:

Theorem 4.1  If 1 ≤ k ≤ n,

    ⟨n, d⟩_k = Σ_{j≥0} (−1)^{d−j} C(n, d−j) j^{k−1} (j+1)^{n−k},

where 0⁰ is interpreted as 1,

which is similar to a well-known formula for the Eulerian numbers.

We use the formula to understand how the two statistics des(π) and π(1) interact. If we are constructing a permutation with d descents from left to right, and d is small, a conservative strategy would seem to be to start with a low number, since starting with a high number means we will use up one of our descents near the beginning of the permutation. So in other words, we expect that if d is small then there are more permutations with d descents starting with low numbers than starting with high numbers. Similarly, if d is close to n, our intuition is that starting with a high number leaves us more possibilities later on.
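The formula of Theorem 4.1 is easy to check against direct enumeration; the illustrative Python below does so for n = 6. (Python's convention 0**0 == 1 matches the 0⁰ = 1 convention of the theorem.)

```python
from itertools import permutations
from math import comb

def refined_eulerian(n, d, k):
    # the right-hand side of Theorem 4.1
    return sum((-1) ** (d - j) * comb(n, d - j) * j ** (k - 1) * (j + 1) ** (n - k)
               for j in range(d + 1))

n = 6
counts = {}   # (d, k) -> number of pi in S_n with des(pi) = d and pi(1) = k
for pi in permutations(range(1, n + 1)):
    d = sum(pi[i] > pi[i + 1] for i in range(n - 1))
    counts[d, pi[0]] = counts.get((d, pi[0]), 0) + 1

for k in range(1, n + 1):
    for d in range(n):
        assert refined_eulerian(n, d, k) == counts.get((d, k), 0)
```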
This intuition turns into a result that is surprisingly simple to state:

Theorem 4.3  If π is chosen uniformly from among those permutations of n that have d descents, the expected value of π(1) is d + 1 and the expected value of π(n) is n − d.

Theorem 4.7, stated in Section 4.7, asserts that, as expected, the sequence

    ⟨n, d⟩_1, ⟨n, d⟩_2, …, ⟨n, d⟩_n

is weakly decreasing when d is small and weakly increasing when d is large. Consequently that sequence is an interpolation between its endpoints, which are two Eulerian numbers: ⟨n−1, d⟩ and ⟨n−1, d−1⟩. Experimental evidence (see Section 4.10) suggests that it is a good interpolation, at least when d is close to (n−1)/2, in the sense that a normal approximation to the Eulerian numbers also seems to provide a good approximation to the refined Eulerian numbers. However, the normal approximation is good for neither set when d is small or d is close to n. Theorem 4.7 shows that in those cases the distribution of π(1) is approximately geometric.

The application which led directly to this work is presented in Section 4.5. Fulman shows in [21] that certain statistics on permutations, including the descent statistic, are approximately normally distributed. (Note that descents were known to be approximately normal before Fulman's work; see [12] for references.) The main tool he uses is Stein's method, due to Charles Stein [45]. The thrust behind the method is to introduce a little extra randomness to a given random variable to get a new one. If certain symmetries are present, the result is an "exchangeable pair" of random variables, meaning, essentially, that the Markov process which takes one to the other is reversible. Then Stein's theorems (and more recent refinements of them) can be applied to bound the distance between the original variable's distribution and the standard normal distribution. Fulman uses a "random to end" operation to add randomness to permutations.
That is, he starts with a uniformly distributed permutation π and sets

    π′ = (I, I+1, …, n) π,

where I is selected uniformly from {1, 2, …, n}. While (π, π′) is not an exchangeable pair, it turns out that (des(π), des(π′)) is, and this leads to a central limit theorem for descents, and for a whole class of statistics. We tried a different method of adding randomness to π, namely, composing π with a uniformly selected transposition. That calculation (which is presented in Section 4.5) led directly to Theorem 4.3.

The Neggers-Stanley Conjecture, now disproved in general ([6, 46]), was that the generating function for descents among the linear extensions of a fixed finite poset has only real zeroes. Since a function with positive coefficients can have no positive zeroes, any combinatorial generating function with all real zeroes can be written in the form

    a(x + c_1)(x + c_2) ⋯ (x + c_n)

for non-negative constants a, c_1, c_2, …, c_n. The implication, then, is that if D is the number of descents in a uniformly selected linear extension of a poset for which the Neggers-Stanley conjecture is true, then D can be written as a sum of independent Bernoulli variables.

In Section 4.6 we present several generating functions for the refined Eulerian numbers. The set of permutations of n which begin with k is the same as the set of linear extensions of the poset defined on {1, 2, …, n} by k < a for all a other than k. So we can find the Neggers-Stanley generating function for this poset explicitly, and we show that it does indeed have only real zeroes. We go on to show that several similar posets also satisfy the conjecture. (All of the posets considered were known to satisfy the conjecture by theorems of Simion [41] and Wagner [50].)

4.2 Basic Properties

If π(1) = 1, then π(1) is certainly less than π(2), so all descents are among the final n − 1 numbers. And if π(1) = n, there is certain to be a descent between π(1) and π(2).
So we know some boundary values:

(4.3)    ⟨n, d⟩_1 = ⟨n−1, d⟩   and   ⟨n, d⟩_n = ⟨n−1, d−1⟩

for n > 1. Also, it is immediate that

(4.4)    Σ_d ⟨n, d⟩_k = (n−1)!

(4.5)    Σ_k ⟨n, d⟩_k = ⟨n, d⟩.

Let ρ ∈ S_n be the reversal permutation, as defined in Equation (3.7): ρ(i) = n+1−i. Then ρπ is the same as π but with i replaced by n+1−i everywhere. As a result, ρπ has a descent wherever π has an ascent, and an ascent wherever π has a descent. So des(ρπ) = n − 1 − des(π). Since π ↦ ρπ is a bijection from S_n to itself, it follows that

(4.6)    ⟨n, d⟩ = ⟨n, n−1−d⟩.

Note we could have obtained the same result from the map π ↦ πρ, since reversing π changes ascents to descents and also reflects their positions about the center. Let

(4.7)    ⟨n, d⟩^k := #{π ∈ S_n : des(π) = d and π(n) = k}.

Both transformations yield symmetric identities for the refined Eulerian numbers. If π(1) = k and des(π) = d then

    ρπ(1) = n+1−k and des(ρπ) = n−1−d,
    πρ(n) = k and des(πρ) = n−1−d,
    ρπρ(n) = n+1−k and des(ρπρ) = d,

from which it follows that

(4.8)    ⟨n, d⟩_k = ⟨n, n−1−d⟩_{n+1−k} = ⟨n, n−1−d⟩^k = ⟨n, d⟩^{n+1−k}.

4.3 Recurrences

Assume n > 1. Let

    T_k := {π ∈ S_n : π(1) = k and des(π) = d},
    T_{k,ℓ} := {π ∈ S_n : π(1) = k, π(2) = ℓ, and des(π) = d},

and let π ∈ T_{k,ℓ}. If ℓ < k, there is a descent at position 1, so there must be d − 1 descents in the tail π(2), …, π(n); since ℓ is the ℓ-th smallest of the values appearing in the tail,

    #T_{k,ℓ} = ⟨n−1, d−1⟩_ℓ   when ℓ < k.

If ℓ > k, there is an ascent at position 1, so there must be d descents in the tail. This time ℓ is the (ℓ−1)-st smallest value in the tail, so

    #T_{k,ℓ} = ⟨n−1, d⟩_{ℓ−1}   when ℓ > k.

Of course T_k is the disjoint union of the T_{k,ℓ}, so

(4.9)    ⟨n, d⟩_k = #T_k = Σ_ℓ #T_{k,ℓ} = Σ_{ℓ<k} ⟨n−1, d−1⟩_ℓ + Σ_{ℓ≥k} ⟨n−1, d⟩_ℓ.

Now suppose 1 ≤ k ≤ n−1 and π ∈ S_n begins with k. Swapping k with k+1 in the sequence π(1), π(2), …, π(n) preserves descents for most π; the only exception is when π(2) = k+1, in which case a new descent is created. If we eliminate that case, the swap map is a bijection from T_k \ T_{k,k+1} to T_{k+1} \ T_{k+1,k}, as those sets are defined above. Substituting sizes for sets, we have
(4.10)    ⟨n, d⟩_k − ⟨n−1, d⟩_k = ⟨n, d⟩_{k+1} − ⟨n−1, d−1⟩_k.

Equation (4.10) is valid as long as k ≠ 0 and k ≠ n. (If k < 0 or k > n, all terms are 0.)

A well-known recurrence for ⟨n, d⟩ comes from considering what happens when you insert n into an element of S_{n−1}:

(4.11)    ⟨n, d⟩ = (n−d) ⟨n−1, d−1⟩ + (d+1) ⟨n−1, d⟩.

We can get a similar recurrence for the refined Eulerian numbers by considering what happens when you insert n into an element of S_{n−1} which begins with k:

(4.12)    ⟨n, d⟩_k = (n−d−1) ⟨n−1, d−1⟩_k + (d+1) ⟨n−1, d⟩_k.

In other words, one way to get an element of S_n which begins with k and has d descents is to take an element of S_{n−1} which begins with k and has d descents, and insert n at a descent or at the end (d+1 choices). The other way is to start with an element of S_{n−1} which begins with k and has d−1 descents, and insert n at an ascent (n−d−1 choices). Equation (4.12) fails when k = n, since a permutation of S_{n−1} cannot begin with n. It is valid for all other values of k.

4.4 Formulas and Moments

There is an explicit formula for the Eulerian numbers in terms of binomial coefficients:

(4.13)    ⟨n, d⟩ = Σ_{j≥0} (−1)^{d−j} C(n+1, d−j) (j+1)^n.

See, for example, [24, p. 269]. [Aside: Equation (4.13) follows from Equation (4.11), which means that it is valid for all values of d, even if d < 0 or d ≥ n.] So we have

(4.14)    ⟨n, d⟩_1 = ⟨n−1, d⟩ = Σ_{j≥0} (−1)^{d−j} C(n, d−j) (j+1)^{n−1}

(4.15)    ⟨n, d⟩_n = ⟨n−1, d−1⟩ = Σ_{j≥0} (−1)^{d−1−j} C(n, d−1−j) (j+1)^{n−1}
                               = Σ_{j≥0} (−1)^{d−j} C(n, d−j) j^{n−1}.

These suggest a formula for ⟨n, d⟩_k:

Theorem 4.1  If 1 ≤ k ≤ n,

(4.16)    ⟨n, d⟩_k = Σ_{j≥0} (−1)^{d−j} C(n, d−j) j^{k−1} (j+1)^{n−k},

where 0⁰ is interpreted as 1.

Proof. Let G_i count the number of ways to place balls numbered 1, 2, …, n into bins numbered 1, 2, …, i, subject to the restriction that the lowest numbered ball in bin 1 is ball k. Then the balls whose labels are less than k have i−1 possible destinations, and those with labels greater than k have i possible destinations.
So

    G_i = (i−1)^{k−1} i^{n−k}.

Each arrangement of balls in bins corresponds to a permutation with i−1 or fewer descents, by writing the numbers of the balls in bin 1 in increasing order, followed by the numbers of the balls in bin 2, etc. How many times does G_i count a particular permutation π? To find a ball/bin arrangement which represents π, we need to write out π(1), π(2), …, π(n) and then place i−1 "cuts" between the numbers. If π has d descents, then d of the cuts must be placed where the descents occur, but the other i−1−d may be placed anywhere except before π(1), which must be k and must be in the first bin. A standard stars and bars argument tells us that there are C((n−1)+(i−1−d), i−1−d) ways to place the extra cuts, so that is the number of times π is counted by G_i. Substituting j = d+1 yields

(4.17)    G_i = Σ_{j≤i} C(n−1+i−j, i−j) ⟨n, j−1⟩_k.

So if we let D_j := ⟨n, j−1⟩_k, we have G = MD, where G = (G_1, G_2, …)^T, D = (D_1, D_2, …)^T, and M is a lower-triangular matrix with

    M_{ij} = C(n−1+i−j, i−j),   1 ≤ j ≤ i.

To invert M, we note that there is a homomorphism from the ring of formal power series onto the ring of lower-triangular Toeplitz matrices, namely

    a_0 + a_1 x + a_2 x² + ⋯  ↦  the matrix with constant diagonals a_0, a_1, a_2, … below the main diagonal.

Under this map, M is the matrix for

    m(x) = Σ_{r≥0} C(n−1+r, r) x^r.

The coefficient of x^r is the number of ways to put r identical balls into n boxes, hence

    m(x) = (1 + x + x² + ⋯)^n = 1/(1−x)^n.

M^{−1} must represent the polynomial

    1/m(x) = (1−x)^n = Σ_{r≥0} (−1)^r C(n, r) x^r,

so

    (M^{−1})_{ij} = (−1)^{i−j} C(n, i−j).

Finally, D = M^{−1} G, so

    ⟨n, d⟩_k = D_{d+1} = Σ_{j≥0} (M^{−1})_{d+1, j+1} G_{j+1} = Σ_{j≥0} (−1)^{d−j} C(n, d−j) j^{k−1} (j+1)^{n−k},

as desired.

Remark 1: Finding all the ball/bin arrangements which produce a particular permutation is similar to finding all shuffles of a deck of cards which produce that permutation, as in Section 2.7.
Remark 2: Note that the proof never assumed that d was less than n, and Equation (4.16) is clearly true if d < 0. So the theorem is true for all integer values of d.

Remark 3: We can rewrite Equation (4.17) as

(4.18)    Σ_j C(n−1+i−j, i−j) ⟨n, j−1⟩_k = (i−1)^{k−1} i^{n−k},

which is a k-analog of the Worpitzky identity [24].

From Equation (4.16) we can deduce a formula for the m-th "rising moment" of π(1) when des(π) is fixed. Assume π is chosen uniformly from S_n, and let

(4.19)    μ_m := E^{des(π)=d} [π(1)^{↑m}],   where x^{↑m} = x(x+1)(x+2)⋯(x+m−1).

Lemma 4.2

(4.20)    ⟨n, d⟩ μ_m = m! Σ_{j≥0} (−1)^{d−j} C(n, d−j) Σ_{ℓ=0}^{n−1} C(m+n, ℓ) j^ℓ.

Figure 4.1: A north-east lattice path from (0, 0) to (m+n−ℓ, ℓ). (All edges are either north or east.)

Proof. From Equation (4.16),

    ⟨n, d⟩ μ_m = Σ_{k=1}^{n} ((k+m−1)!/(k−1)!) ⟨n, d⟩_k
              = Σ_{k=1}^{n} ((k+m−1)!/(k−1)!) Σ_{j≥0} (−1)^{d−j} C(n, d−j) j^{k−1} (j+1)^{n−k}
              = m! Σ_{j≥0} (−1)^{d−j} C(n, d−j) Σ_{r=0}^{n−1} C(r+m, r) j^r (j+1)^{n−1−r}

(the last by setting r = k−1). But (j+1)^{n−1−r} = Σ_{s=0}^{n−1−r} C(n−1−r, s) j^s. So let ℓ = r+s and we have

(4.21)    ⟨n, d⟩ μ_m = m! Σ_{j≥0} (−1)^{d−j} C(n, d−j) Σ_{ℓ=0}^{n−1} j^ℓ Σ_{r=0}^{ℓ} C(r+m, r) C(n−1−r, ℓ−r).

Let φ be a north/east lattice path from (0, 0) to (m+n−ℓ, ℓ) (see Figure 4.1). The number of such paths is C(m+n, ℓ). If r is the height at which φ crosses the line x = m + 1/2, then φ consists of a path from (0, 0) to (m, r), a horizontal segment, and a path from (m+1, r) to (m+n−ℓ, ℓ). Counting the possibilities for the parts yields the identity

(4.22)    Σ_{r=0}^{ℓ} C(r+m, r) C(n−1−r, ℓ−r) = C(m+n, ℓ).
The expected value of π(1) is µ1, and n 1 − n d j n n +1 ` µ = ( 1) − j d 1 − d j ` j 0 `=0 X≥ − X d j n n+1 n+1 n = ( 1) − (j + 1) j (n + 1)j − d j − − j 0 − X≥ d j n n+1 d i n n = ( 1) − (j + 1) ( 1) − (n +1+ i)i . − d j − − d i j 0 i 0 X≥ − X≥ − The term for i = 0 is 0, so let j = i 1 and combine − d j n n n = ( 1) − (j + 1) (j +1)+ (n + j + 2) . − d j d j 1 j 0 X≥ − − − n+1 The quantity in brackets simplifies to (d + 1) d j , so − n d j n n +1 n µ =(d + 1) ( 1) − (j + 1) =(d + 1) . d 1 − d j d j 0 X≥ − Therefore des(π)=d µ1 = E π(1) = d +1. For the second part, 1 n k 1 n Edes(π)=dπ(n)= k = k n d n n 1 d d k n 1 d k k X − − X − − des( π)=n 1 d = E − − π(1) = n d. − 77 Remark: It is possible to prove Theorem 4.3 by induction using the recurrences in Equation (4.11) and Equation (4.12). 4.5 Application Using Stein’s Method Charles Stein developed a method for showing that the distribution of a random vari- able W which meets certain criteria is approximately standard normal. His technique has come to be known as Stein’s method; see [45] or [16] for more explanation. In its most straightforward form, Stein’s method requires finding a “companion” random variable W ∗ such that (W, W ∗) is an exchangeable pair, meaning that (4.23) P(W = w, W ∗ = w∗)= P(W = w∗, W ∗ = w) for all values of w and w∗. If we can find such a W ∗ and if, in addition, there is a λ between 0 and 1 such that W (4.24) E W ∗ = (1 λ)W − (that is, the expected value of W ∗ when W is fixed at some value is 1 λ times that − value), then we may apply Stein’s method. We are interested in showing that if π is chosen uniformly from Sn, then the random variable D = des(π) is approximately normal. This has been proven before, and in more generality; see [21] for references. Our goal here is to demonstrate the set- up for Stein’s method—that is, finding a companion variable and showing that it satisfies Equation (4.23) and Equation (4.24). From there, applying the method would proceed as in [21]. 
Often the companion variable in Stein's method is defined by adding a little bit of randomness to the variable we are interested in. In this case, let τ be selected uniformly from among the transpositions in S_n, independently of π. Then τπ is uniformly distributed over S_n, and for any u, v ∈ S_n,

    P(π = u, τπ = v) = P(π = u, τ = vu^{−1}) = P(π = u) P(τ = vu^{−1}),
    P(π = v, τπ = u) = P(π = v, τ = uv^{−1}) = P(π = v) P(τ = uv^{−1}).

Both right-hand sides are (n!)^{−1} C(n, 2)^{−1} if vu^{−1} is a transposition and 0 otherwise, so (π, τπ) is an exchangeable pair.

Let D* := des(τπ). Since (π, τπ) is an exchangeable pair, (F(π), F(τπ)) is exchangeable for any function F. So (D, D*) is exchangeable. For 1 ≤ i ≤ n−1 let

    D_i = [π(i) > π(i+1)]   and   D_i* = [τπ(i) > τπ(i+1)]

be Bernoulli random variables; then D = Σ_{i=1}^{n−1} D_i and D* = Σ_{i=1}^{n−1} D_i*.

Fix π and i and let a = π(i), b = π(i+1). If a < b, the only ways for τπ(i) to be bigger than τπ(i+1) are if τ swaps a with something bigger than b (n−b ways), if τ swaps b with something smaller than a (a−1 ways), or if τ swaps a with b. So

    E^{D_i=0}(D_i* − D_i) = P(D_i* = 1 | D_i = 0) = (n + π(i) − π(i+1))/C(n, 2),

and similarly if a > b,

    E^{D_i=1}(D_i* − D_i) = −P(D_i* = 0 | D_i = 1) = −(n + π(i+1) − π(i))/C(n, 2).

So in general

    E^{D_i}(D_i* − D_i) = (π(i) − π(i+1))/C(n, 2) + 2(1 − 2D_i)/(n−1).
− σn −n So if W ∗ is obtained using the “random transposition” method described here, (W, W ∗) will be an exchangeable pair satisfying Equation (4.24) with λ =4/n. One can now proceed with Stein’s method and show that W is close to being a standard normal random variable. 4.6 Generating Functions Recall from Equation (3.4) that n (4.25) a (x) := xd = (1 x)n+1 (j + 1)nxj. n d − d j 0 X X≥ 80 and we defined a0(x) = 1 so as to agree with the right-hand side of Equation (4.25). We can define one generating function for all the Eulerian numbers: A(x, z):= a (x)zn/n!= (1 x)n+1 (j + 1)nxjzn/n! n − n 0 n 0 j 0 X≥ X≥ X≥ j n j (j+1)(1 x)z = (1 x) x ((j + 1)(1 x)z) /n!=(1 x) x e − − − − j 0 n 0 j 0 X≥ X≥ X≥ (1 x)z (1 x)z (1 x)z j (1 x)e − = (1 x)e − xe − = − − 1 xe(1 x)z j 0 − − X≥ x 1 = − . x e(x 1)z − − n There are various ways to define generating functions for the d k, depending on which variables are kept constant. Theorem 4.4 θy n d k n 1 dt (4.26) x y z /n!= 1 1/y d k θ θ x t − n,d,kX Z − 1 x where θ = exp y−−1 1 z . − n o Proof. Let B(x, y, z) be the left-hand side of Equation (4.26). Note the sum is over all integers n, d, and k. So 1 ∂B (4.27) (y− 1) + (1 x)B = − ∂z − n +1 n +1 n n + xdykzn/n! d k+1 − d k d k − d 1 k n,d,kX − Let S(n,d,k) be the bracketed quantity. It is clearly 0 if n< 0, and if n = 0, 1 if d =0,k =0 S(0,d,k)= 1 if d =0,k =1 − 0 otherwise so n = 0 contributes 1 y to the sum on the right-hand side of Equation (4.27). If − n 1, then by Equation (4.10), S(n,d,k) is 0 unless k =0 or k = n + 1, in which ≥ 81 case n +1 n n +1 n S(n, d, 0) = = and S(n, d, n +1)= = . d d − d − d 1 1 n+1 − Therefore 1 ∂B n d n n d n+1 n (y− 1) + (1 x)B =1 y + x z /n! x y z /n! − ∂z − − d − d 1 n 1,d n 1,d X≥ X≥ − =1 y +(A(x, z) 1) xy(A(x, yz) 1) − − − − = A(x, z) xyA(x, yz)+(x 1)y − − ye (1 x)yz 1 = (1 x) − − . − x e (1 x)yz − x e (1 x)z − − − − − − 1 x αz 1 Let α = y−−1 1 . Then θ, as defined in the theorem, is e . 
Dividing by $y^{-1} - 1$ and multiplying through by $\theta$ gives
$$\theta\,\frac{\partial B}{\partial z} + \alpha\theta B = \alpha\theta \left[ \frac{y\,e^{-(1-x)yz}}{x - e^{-(1-x)yz}} - \frac{1}{x - e^{-(1-x)z}} \right]$$
which is to say that
$$\frac{\partial}{\partial z}(\theta B) = \alpha \left[ \frac{y\,\theta^y}{x - \theta^{y-1}} - \frac{\theta}{x - \theta^{1-1/y}} \right].$$
Differentiating the integral on the right-hand side of Equation (4.26),
$$\frac{\partial}{\partial z} \int_\theta^{\theta^y} \frac{dt}{x - t^{1-1/y}} = \frac{\partial \theta^y}{\partial z} \left[ \frac{1}{x - (\theta^y)^{1-1/y}} \right] - \frac{\partial \theta}{\partial z} \left[ \frac{1}{x - \theta^{1-1/y}} \right]$$
$$= \frac{\alpha y\,\theta^y}{x - \theta^{y-1}} - \frac{\alpha\theta}{x - \theta^{1-1/y}} = \frac{\partial}{\partial z}(\theta B).$$
Since $\theta B$ and the integral have the same derivative with respect to $z$, and they both vanish when $z = 0$, they are equal. $\square$

Here are three more generating functions. They can all be found by plugging in Equation (4.16) and switching summation signs.
$$\text{(4.28)}\qquad \sum_d \left\langle {n \atop d} \right\rangle_k x^d = (1-x)^n \sum_{j \ge 0} j^{k-1} (j+1)^{n-k} x^j$$
$$\text{(4.29)}\qquad \sum_k \left\langle {n \atop d} \right\rangle_k y^k = y \sum_{j \ge 0} (-1)^{d-j} \binom{n}{d-j} \frac{(j+1)^n - (jy)^n}{j+1-jy}$$
$$\text{(4.30)}\qquad \sum_{d,k} \left\langle {n \atop d} \right\rangle_k x^d y^k = (1-x)^n\, y \sum_{j \ge 0} \frac{(j+1)^n - (jy)^n}{j+1-jy}\, x^j$$
Note that Equation (4.28) is the function we called $g_{n,k}(x)$ in Chapter III.

We can now prove a special case of the Neggers-Stanley conjecture. Let $P$ be a poset of $n$ elements with labels $1, 2, \ldots, n$. A linear extension of $P$ is an ordering of $1, 2, \ldots, n$ which preserves the ordering of $P$; that is, a $\pi \in S_n$ such that if $i <_P j$ then $i$ appears before $j$ in the list $\pi(1), \pi(2), \ldots, \pi(n)$. If $\mathcal{L}(P)$ denotes the set of linear extensions of $P$, then Neggers and Stanley [43, p. 311] conjectured that for any poset, every zero of the descent polynomial $D_{\mathcal{L}(P)}(x)$ is real.

The conjecture has been shown to be false in general [6, 46]. But we can prove it is true in a certain special case.

Theorem 4.5 If $P_{n,k}$ is the poset whose Hasse diagram places $k$ below each of the pairwise-incomparable elements $1, 2, \ldots, k-1, k+1, k+2, \ldots, n$, then $D_{\mathcal{L}(P_{n,k})}(x)$ has only distinct real roots.

Proof. For $u, v \ge 0$ let
$$c_{u,v} := \sum_d \left\langle {u+v+1 \atop d} \right\rangle_{u+1} x^d = \sum_{\substack{\pi \in S_{u+v+1} \\ \pi(1) = u+1}} x^{\operatorname{des}(\pi)}.$$
Then setting $u = k-1$, $v = n-k$ yields the polynomial in question. If $v = 0$, $c_{u,v}$ counts the reversal permutation $\rho$, which has $(u+v+1) - 1 = u$ descents.
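Equation (4.28), read coefficientwise by expanding $(1-x)^n$, says that $\left\langle {n \atop d} \right\rangle_k = \sum_{j} (-1)^{d-j} \binom{n}{d-j}\, j^{k-1} (j+1)^{n-k}$, with the convention $0^0 = 1$. The Python sketch below (ours, not the thesis's) verifies this against a brute-force count for $n \le 6$.

```python
from itertools import permutations
from math import comb

def des(p):
    """Number of descents of the sequence p."""
    return sum(p[i] > p[i + 1] for i in range(len(p) - 1))

def refined_brute(n, d, k):
    """Count permutations of n letters with d descents and first letter k."""
    return sum(p[0] == k and des(p) == d
               for p in permutations(range(1, n + 1)))

def refined_formula(n, d, k):
    """Coefficient of x^d on the right side of Equation (4.28), 0^0 read as 1."""
    total = 0
    for j in range(d + 1):
        term = (j ** (k - 1) if k > 1 else 1) * (j + 1) ** (n - k)
        total += (-1) ** (d - j) * comb(n, d - j) * term
    return total

for n in range(1, 7):
    for d in range(n):
        for k in range(1, n + 1):
            assert refined_brute(n, d, k) == refined_formula(n, d, k)
```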
Otherwise, if $v > 0$, $c_{u,v}$ doesn't count $\rho$ but it does count the permutation
$$u+1,\ u,\ u-1,\ \ldots,\ 1,\ u+v+1,\ u+v,\ \ldots,\ u+2$$
which has $u+v-1$ descents. So
$$\deg(c_{u,v}) = \begin{cases} u & \text{if } v = 0 \\ u+v-1 & \text{if } v > 0. \end{cases}$$
Similarly, if $u = 0$, $c_{u,v}$ counts the identity permutation, which has no descents. Otherwise it doesn't count the identity but it does count
$$u+1,\ u+2,\ \ldots,\ u+v+1,\ 1,\ 2,\ \ldots,\ u$$
which has $1$ descent. So $x \nmid c_{0,v}(x)$ and if $u > 0$, $x \mid c_{u,v}(x)$ but $x^2 \nmid c_{u,v}(x)$. Now let
$$h_{u,v} := \frac{c_{u,v}}{(1-x)^{u+v+1}}.$$
Note that $c_{u,v}(1) = \#\{\pi \in S_{u+v+1} : \pi(1) = u+1\} = (u+v)!$, so $c_{u,v}$ does not have a zero at $x = 1$. Therefore $h_{u,v}$ has exactly the same zeroes as $c_{u,v}$, plus a pole at $x = 1$. By Equation (4.28),
$$h_{u,v}(x) = \sum_{j \ge 0} j^u (j+1)^v x^j.$$
If $D$ represents differentiation with respect to $x$, we have
$$(xD)\,h_{u,v}(x) = h_{u+1,v}(x) \qquad\text{and}\qquad (Dx)\,h_{u,v}(x) = h_{u,v+1}(x)$$

[Figure 4.2: The real zeroes of a Neggers-Stanley descent polynomial. This is the construction of $h_{3,3}(x)$ as described in the proof of Theorem 4.5, starting from $h_{0,0}(x) = (1-x)^{-1}$ and applying $Dx$ three times, then $xD$ three times:
$h_{0,1}(x) = (1-x)^{-2}$,
$h_{0,2}(x) = (1-x)^{-3}(1+x)$,
$h_{0,3}(x) = (1-x)^{-4}(1+4x+x^2)$,
$h_{1,3}(x) = (1-x)^{-5}(8x+14x^2+2x^3)$,
$h_{2,3}(x) = (1-x)^{-6}(8x+60x^2+48x^3+4x^4)$,
$h_{3,3}(x) = (1-x)^{-7}(8x+160x^2+384x^3+160x^4+8x^5)$.
The zeroes of each function are plotted on the right, using an inverse tangent scale. Since each function is generated from the previous one by applying either the $Dx$ or the $xD$ operator, Rolle's Theorem guarantees that the zeroes must interleave. By a counting argument, all the zeroes of each function must be real.]

and so
$$h_{0,v}(x) = (Dx)^v h_{0,0}(x) \qquad\text{and}\qquad h_{u,v}(x) = (xD)^u h_{0,v}(x).$$
$h_{0,0}(x) = (1-x)^{-1}$ and $h_{0,1}(x) = (1-x)^{-2}$ both have no zeroes. Suppose $v \ge 1$ and $h_{0,v}$ has only distinct real zeroes. Since $\deg(c_{0,v}) = v-1$ and $x \nmid c_{0,v}(x)$, $x c_{0,v}(x)$ and $x h_{0,v}(x)$ have $v$ distinct real zeroes. By Rolle's Theorem, $(Dx)h_{0,v}$ must have $v-1$ distinct zeroes interlaced between those of $x h_{0,v}(x)$.
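The series identity $h_{u,v}(x) = \sum_{j\ge0} j^u (j+1)^v x^j$ used in this proof can be checked by expanding $c_{u,v}(x)/(1-x)^{u+v+1}$ as a truncated power series. The following Python sketch (ours; the function names and the truncation order $N$ are arbitrary choices) does this for small $u$ and $v$, reading $0^0 = 1$.

```python
from itertools import permutations
from math import comb

N = 12  # truncation order for the power-series comparison

def des(p):
    """Number of descents of the sequence p."""
    return sum(p[i] > p[i + 1] for i in range(len(p) - 1))

def c_poly(u, v):
    """Coefficient list of c_{u,v}(x) = sum of x^des(pi) over pi in
    S_{u+v+1} with pi(1) = u+1."""
    n = u + v + 1
    coeffs = [0] * n
    for p in permutations(range(1, n + 1)):
        if p[0] == u + 1:
            coeffs[des(p)] += 1
    return coeffs

def h_series(u, v):
    """First N coefficients of c_{u,v}(x)/(1-x)^{u+v+1}, using the
    expansion (1-x)^{-m} = sum_j C(j+m-1, m-1) x^j."""
    m = u + v + 1
    c = c_poly(u, v)
    return [sum(c[i] * comb(j - i + m - 1, m - 1)
                for i in range(min(j, m - 1) + 1))
            for j in range(N)]

for u in range(3):
    for v in range(3):
        expect = [(j ** u if u else 1) * (j + 1) ** v for j in range(N)]
        assert h_series(u, v) == expect
```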
Furthermore, the denominator of $x h_{0,v}(x)$ has degree $v+1$, so $x h_{0,v}(x)$ approaches $0$ as $x \to \infty$. Therefore its graph must turn back toward the $x$-axis somewhere to the left of its leftmost zero, at which place there must be another zero of $(Dx)h_{0,v}$. So we have found $v$ real zeroes of $h_{0,v+1}$, and that accounts for all its zeroes.

Applying the $xD$ operator goes similarly. Given that $h_{u,v}$ has $d$ distinct real zeroes, by Rolle's Theorem $Dh_{u,v}(x)$ has $d-1$ interlaced zeroes. Since the numerator of $h_{u,v}$ has degree smaller than the denominator, $h_{u,v}$ must turn back toward the axis to the left of its leftmost zero, which accounts for one more zero of $Dh_{u,v}$. Finally, $(xD)h_{u,v}$ has one more zero at $0$ (which is distinct from the others since $x^2 \nmid h_{u,v}$ and therefore $x \nmid Dh_{u,v}$). So we have found $d+1$ real zeroes of $h_{u+1,v}$, and that accounts for all of the zeroes. $\square$

Corollary 4.6 The same can be said for the poset whose Hasse diagram places $k$ above each of the pairwise-incomparable elements $1, 2, \ldots, k-1, k+1, k+2, \ldots, n$.

Proof. The result of turning a poset upside-down is to reverse all its linear extensions, which changes ascents to descents and vice-versa. So if $F(x)$ is the descent polynomial of the original poset, the descent polynomial of the new poset is $x^{n-1} F(x^{-1})$. So the roots of the new polynomial are the inverses of the roots of the original. $\square$

4.7 General Behavior

We can say in general how the sequence $\left\langle {n \atop d} \right\rangle_n, \left\langle {n \atop d} \right\rangle_{n-1}, \ldots, \left\langle {n \atop d} \right\rangle_1$ behaves. The set of numbers $\left\langle {n \atop d} \right\rangle_k$, for $n$ fixed, is very nearly unimodal if arranged appropriately.

Theorem 4.7 Fix $n$ and $d$. Then

(i) If $d = 0$, $\quad 0 = \left\langle {n \atop d} \right\rangle_n = \cdots = \left\langle {n \atop d} \right\rangle_2 < \left\langle {n \atop d} \right\rangle_1 = 1$

(ii) If $1 \le d \le (n-3)/2$, $\quad \left\langle {n \atop d} \right\rangle_n < \left\langle {n \atop d} \right\rangle_{n-1} < \cdots < \left\langle {n \atop d} \right\rangle_1$

(iii) If $n$ is even and $d = (n-2)/2$, $\quad \left\langle {n \atop d} \right\rangle_n < \cdots < \left\langle {n \atop d} \right\rangle_2 = \left\langle {n \atop d} \right\rangle_1$

(iv) If $n$ is odd and $d = (n-1)/2$, $\quad \left\langle {n \atop d} \right\rangle_n < \cdots < \left\langle {n \atop d} \right\rangle_{(n+1)/2} > \cdots > \left\langle {n \atop d} \right\rangle_1$

(v) If $n$ is even and $d = n/2$, $\quad \left\langle {n \atop d} \right\rangle_n = \left\langle {n \atop d} \right\rangle_{n-1} > \cdots > \left\langle {n \atop d} \right\rangle_1$

(vi) If $(n+1)/2 \le d \le n-2$, $\quad \left\langle {n \atop d} \right\rangle_n > \left\langle {n \atop d} \right\rangle_{n-1} > \cdots > \left\langle {n \atop d} \right\rangle_1$

(vii) If $d = n-1$, $\quad 1 = \left\langle {n \atop d} \right\rangle_n > \left\langle {n \atop d} \right\rangle_{n-1} = \cdots = \left\langle {n \atop d} \right\rangle_1 = 0$.

Proof.
(i) follows from the fact that the identity is the only permutation with $0$ descents. (v), (vi), and (vii) follow from (iii), (ii), and (i) respectively because
$$\left\langle {n \atop d} \right\rangle_k = \left\langle {n \atop n-1-d} \right\rangle_{n+1-k}.$$
Let
$$f_n(x) = \left\langle {n \atop \lfloor x/n \rfloor + 1} \right\rangle_{n\lfloor x/n \rfloor + n - x},$$
which means that $f_n(nd-k) = \left\langle {n \atop d} \right\rangle_k$ if $0 \le d \le n-1$ and $1 \le k \le n$. Figure 4.3 shows the graphs of $f_6(x)$ and $f_7(x)$. Each monochromatic section is a sequence of the form $\left\langle {n \atop d} \right\rangle_n, \left\langle {n \atop d} \right\rangle_{n-1}, \ldots, \left\langle {n \atop d} \right\rangle_1$. Note the graphs plateau where one sequence meets the next. Since $\left\langle {n \atop d} \right\rangle_1 = \left\langle {n-1 \atop d} \right\rangle = \left\langle {n \atop d+1} \right\rangle_n$, each sequence begins where the previous one ends.

The content of the theorem is that $f_n$ is basically unimodal. That is, the sequences on the left increase, those on the right decrease, and those in the middle behave according to (iii) through (v). The theorem is true for small $n$ by inspection. By Equation (4.9),
$$\left\langle {n+1 \atop d} \right\rangle_k = \sum_{\ell=1}^{k-1} \left\langle {n \atop d-1} \right\rangle_\ell + \sum_{\ell=k}^{n} \left\langle {n \atop d} \right\rangle_\ell = \sum_{\ell=1}^{k-1} f_n\bigl(n(d-1)-\ell\bigr) + \sum_{\ell=k}^{n} f_n(nd-\ell).$$
Let $i = \ell + n - k$ in the first sum and $\ell - k$ in the second and we have
$$\left\langle {n+1 \atop d} \right\rangle_k = \sum_{i=n-k+1}^{n-1} f_n\bigl(n(d-1) - (i-n+k)\bigr) + \sum_{i=0}^{n-k} f_n\bigl(nd - (i+k)\bigr) = \sum_{i=0}^{n-1} f_n(nd-k-i).$$
So imagine a caterpillar of length $n$ crawling on the graph of $y = f_n(x)$, as shown in the top graph of Figure 4.3. If his head is at $x$-position $nd-k$, the equation above says that the sum of the heights of his segments (or his total potential energy) is $\left\langle {n+1 \atop d} \right\rangle_k$. If he were to take a step forward, his total energy would be $\left\langle {n+1 \atop d} \right\rangle_{k-1}$. That would be an increase in energy if the new height of his head is higher than the current height of his tail. The theorem now follows easily by induction.

[Figure 4.3: The unimodality of $\left\langle {n \atop d} \right\rangle_k$ for $n = 6$ and $n = 7$. The graphs of $f_6(x)$ and $f_7(x)$ are shown, where $f_n(nd-k) = \left\langle {n \atop d} \right\rangle_k$, as defined in Theorem 4.7.]
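Theorem 4.7 can also be confirmed by brute force for the small $n$ where enumeration is feasible. The Python sketch below (ours, not the thesis's) builds the table of $\left\langle {n \atop d} \right\rangle_k$ and checks each of the orderings (i) through (vii) for $3 \le n \le 7$.

```python
from itertools import permutations

def des(p):
    """Number of descents of the sequence p."""
    return sum(p[i] > p[i + 1] for i in range(len(p) - 1))

def refined_table(n):
    """table[d][k] = number of pi in S_n with d descents and pi(1) = k."""
    table = [[0] * (n + 1) for _ in range(n)]
    for p in permutations(range(1, n + 1)):
        table[des(p)][p[0]] += 1
    return table

for n in range(3, 8):
    t = refined_table(n)
    for d in range(n):
        # seq lists <n,d>_n, <n,d>_{n-1}, ..., <n,d>_1
        seq = [t[d][k] for k in range(n, 0, -1)]
        if d == 0:                           # case (i)
            assert seq == [0] * (n - 1) + [1]
        elif d == n - 1:                     # case (vii)
            assert seq == [1] + [0] * (n - 1)
        elif 2 * d <= n - 3:                 # case (ii): strictly increasing
            assert all(seq[i] < seq[i + 1] for i in range(n - 1))
        elif 2 * d >= n + 1:                 # case (vi): strictly decreasing
            assert all(seq[i] > seq[i + 1] for i in range(n - 1))
        elif n % 2 == 0 and 2 * d == n - 2:  # case (iii)
            assert all(seq[i] < seq[i + 1] for i in range(n - 2))
            assert seq[-2] == seq[-1]
        elif n % 2 == 0 and 2 * d == n:      # case (v)
            assert seq[0] == seq[1]
            assert all(seq[i] > seq[i + 1] for i in range(1, n - 1))
        else:                                # case (iv): n odd, d = (n-1)/2
            peak = (n - 1) // 2              # index of k = (n+1)/2 in seq
            assert all(seq[i] < seq[i + 1] for i in range(peak))
            assert all(seq[i] > seq[i + 1] for i in range(peak, n - 1))
```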
4.8 Behavior if $d \ll n$

If $d$ is much less than $n$, and $\pi$ is selected at random from those permutations of $n$ letters which have $d$ descents, then the distribution of $\pi(1)$ approaches a geometric distribution uniformly, in the following sense.

Theorem 4.8 Fix an integer $d > 0$. Suppose $\pi_n$ is chosen uniformly from those permutations of $n$ letters which have $d$ descents. Then for any $\epsilon > 0$ there is an $N$ such that
$$\text{(4.31)}\qquad \left| \frac{P(\pi_n(1) = k)}{(1-p)\,p^{k-1}} - 1 \right| < \epsilon$$
for all integers $n$ and $k$ with $n \ge N$ and $1 \le k \le n$, where $p = \dfrac{d}{d+1}$.

Proof. For $0 \le j \le d$, let $P_j(n) = (-1)^{d-j} \binom{n}{d-j}$. Then by Equation (4.16) and Equation (4.13),
$$\text{(4.32)}\qquad \left\langle {n \atop d} \right\rangle_k = d^{k-1} (d+1)^{n-k} \sum_{0 \le j \le d} P_j(n) \left(\frac{j}{d}\right)^{k-1} \left(\frac{j+1}{d+1}\right)^{n-k}$$
$$\text{(4.33)}\qquad \left\langle {n \atop d} \right\rangle = (d+1)^n \sum_{0 \le j \le d} P_j(n+1) \left(\frac{j+1}{d+1}\right)^n.$$
Since $(1-p)\,p^{k-1} = d^{k-1}/(d+1)^k$, the left-hand side of Equation (4.31) is
$$\left| \frac{\sum_{0 \le j \le d} P_j(n) \left(\frac{j}{d}\right)^{k-1} \left(\frac{j+1}{d+1}\right)^{n-k}}{\sum_{0 \le j \le d} P_j(n+1) \left(\frac{j+1}{d+1}\right)^n} - 1 \right|$$
and since $P_d(n) = \binom{n}{0} = 1$, the last term of both sums is $1$. Therefore we have