arXiv:1306.3039v1 [stat.ME] 13 Jun 2013

Statistical Science
2013, Vol. 28, No. 2, 269–281
DOI: 10.1214/12-STS404
© Institute of Mathematical Statistics, 2013

Another Conversation with Persi Diaconis

David Aldous

Abstract. Persi Diaconis was born in New York on January 31, 1945. Upon receiving a Ph.D. from Harvard in 1974 he was appointed Assistant Professor at Stanford. Following periods as Professor at Harvard (1987–1997) and Cornell (1996–1998), he has been Professor in the Departments of Mathematics and Statistics at Stanford since 1998. He is a member of the National Academy of Sciences, a past President of the IMS and has received honorary doctorates from Chicago and four other universities. The following conversation took place at his office and at Aldous’s home in early 2012.

Key words and phrases: Bayesian statistics, card shuffling, exchangeability, foundations of statistics, magic, Markov chain Monte Carlo, mixing times.

David Aldous is Professor, Statistics Department, University of California, Berkeley, 367 Evans Hall, Berkeley, California 94720, USA e-mail: [email protected]; URL: www.stat.berkeley.edu/users/˜aldous.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2013, Vol. 28, No. 2, 269–281. This reprint differs from the original in pagination and typographic detail.

1. MARKOV CHAINS, MIXING TIMES AND MONTE CARLO

Aldous: You were interviewed in October 1984 for a Statistical Science conversation article [7], so I won’t ask about your earlier life, both personal and academic, but try to pick up from that point. You and I were both involved, in the early 1980s, in the start of the topic now often labeled “Markov chains and mixing times” [34]. Can you tell us your recollections of the early days, and give some overview of how the whole topic has developed over the last 30 years?

Diaconis: That’s been the main focus of my work since the 1980s, and it started for me with an applied problem. I was working at Bell Labs and we were simulating optimal strategies in various games and needed a lot of random permutations. The standard way is to pick a random number between 1 and $n$ and switch it with 1, then pick a random number between 2 and $n$ and switch it with 2, etc. If you do that $n - 1$ times, it’s exactly random. We got the results and something looked wrong. There were many hours of CPU time and 2000 lines of code and we looked for days for a mistake. After three days we asked, “How did you generate the random permutations?” The lady said, “You said that fussy thing but I made it more random—I switched a random card with another random card.” I said, “You have to redo the simulations” and she said, “It’s crazy.” Then she went to her boss and he went to his boss and he came down and yelled at me. “You mathematicians are crazy—she did a hundred transpositions and that has to be enough with 52 cards.” So I really wanted to know the answer—how many does it take to mix the 52 cards? I came back to the West Coast and talked with people like you and Rick Durrett in the early 1980s, and we saw that if you did it for order $n^2$ times that would be enough . . .

Aldous: . . . by an easy coupling argument . . .

Diaconis: . . . but it wasn’t clear if that was the right answer. Eventually Mehrdad Shahshahani was here and we realized we could set it up as a Fourier analysis problem and carefully do the Fourier analysis on a noncommutative group and get the right answer [26] and it turned out to be $\frac{1}{2} n \log n$, which when $n = 52$ gives 103. That was for me the start of it. And in 1983 you wrote this article [1] on mixing times for Markov chains. Around the same time Jim Reeds had become interested in riffle shuffling. He had reinvented a model that Gilbert and Shannon had invented and had numerical results and ideas but couldn’t seem to push them through. You and I started to talk about it and invented stopping time arguments [3] that turned out to give good answers in some cases. Spurred on by these two examples I started to think hard about mixing times. Around the same time, what I now call “the Markov chain Monte Carlo revolution” [11] began with the paper by Geman and Geman [30]. So the topic of mixing times became about more than just card shuffling, it was also about how long should you run a simulation until it converges. I say now, as I said then, that if you take any application of MCMC in a real problem and ask if we theoreticians can give a sensible answer to a practitioner about “how long . . . ,” then we can’t. These are open research problems, every one of them. We have ideas, we have heuristics, but as math problems they are really open.

A fitting proof of that is the following: The Metropolis algorithm, Glauber dynamics, the Gibbs sampler and molecular dynamics were all invented to solve one problem, the problem of random placement of hard discs in a box. Take, say, two-dimensional discs of radius $\varepsilon$ in the unit square. You want them to be uniform random subject to nonoverlap. The Metropolis algorithm is that you pick a disc at random, you try to move it a little, if it’s possible to move then do it, if not then try another pick. Glauber dynamics is similar. But as far as I know—despite billions of steps of simulations over the last 60 years—nobody has ever sampled from anything close to the stationary distribution, in the interesting case of high disc density. There’s supposed to be a phase transition around 71%, but the algorithms have no hope of converging near that point and yet people get numbers from the simulations and talk about them. I think the same goes for statistical algorithms too—people who don’t want to think about it just run the simulation until something seems to have settled down. So I think the current state of the art is there’s a ton of research still to be done; everyone finds us theoreticians annoying prigs for asking what can you show rigorously. But it’s not just being annoying. In enough cases the algorithms really don’t converge, and people don’t seem to want to own up to that. In [21, 22] we tried pretty seriously to do the hard discs problem, but there’s still a very long way to go.

Aldous: Now there’s a distinction between “don’t converge” and “we can’t prove they do converge” . . . And there’s an argument that in practice one uses “black box” methods like MCMC in complicated situations where you don’t have any nice structure, whereas to do any theoretical analysis you need to assume some structure, so we (perhaps) are in a Catch-22 situation where one can do theory for MCMC only in situations where you wouldn’t actually use it. And then you have to rely on the heuristics that applied researchers have developed.

Diaconis: Well at least for the chemists I talk to, who study molecular dynamics, they haven’t converged, they’re in some kind of local minimum, and really dramatically new ideas are needed. I think a very interesting research question is to look at the zoo of diagnostic techniques that are available today, and look at the hundreds of examples of Markov chains about which we know everything. Take some of those examples and diagnostic techniques and try to see how they behave. That seems like a very reasonable thing to do. I’ve tried for 20 years to get a graduate student to do this, but somehow I can’t get anyone to sit down and do the work. I should have learned I need to do it myself. They’re hard math problems. The diagnostics can be pretty sophisticated; they’re not just second eigenvalue but involve sups and infs of complicated functionals. We do have a lot of machinery and they’re nice math problems so this project would be useful to help evaluate diagnostics. What’s annoying to me is how little that problem is recognized. If you go to a Statistics meeting, in talk after talk somebody runs the Gibbs sampler because that’s the standard thing to do, and they say they ran it 10,000 times and it seemed to be OK, and they just go on with what they’re doing. People don’t even try to prove the chain does what it’s supposed to do, that is, have the desired stationary distribution.

2. BAYESIAN STATISTICS

Aldous: Let me move on to Bayesian statistics, which has been a recurrent feature of your research. Maybe I should remind readers that 30 years ago this was completely unrelated to Markov chains, but now a major use of MCMC is for computing Bayes posteriors—but let’s leave MCMC for later. I’m not competent to ask good questions here, so let me just throw out two points and then I’ll sit back and listen.

(a) You have work, such as the 1986 paper [14] with David Freedman, that addresses foundational technical (rather than philosophical) issues in Bayesian statistics.

(b) There is a recent newsletter piece by Mike Jordan [32] summarizing comments by many leading Bayesian statisticians (including you) on open problems in Bayesian statistics.

So I guess I am asking for your thoughts on the history and current state of methodological/technical aspects of Bayesian statistics.

Diaconis: I came into Statistics late in life, becoming aware of the Bayesian position when I was a graduate student at Harvard. Art Dempster and Fred Mosteller were Bayesians—not everyone, Bill Cochran wouldn’t dream of doing anything Bayesian. I read de Finetti’s work and found it frustrating and fascinating, as I still do today, but it was inspiring and so I tried to make my own sense out of it. One of the things I noticed was that de Finetti’s theorem involves an infinite exchangeable sequence. I wondered whether there could be a finite version, saying the sequence is almost a mixture of IIDs. In fact, I wrote my first paper on that topic when I was still a graduate student [8]. That was when I came to meet the Berkeley people—David Freedman and Lester Dubins and David Blackwell, the latter two being Bayesians of various stripes—and, in fact, David Freedman’s thesis (published as [27]) was about de Finetti’s theorem for Markov chains. The story, in a nutshell, is that David was a precocious but difficult young man who wanted to do a thesis in Probability, and (the way David told it) he went into Feller’s office and Feller looked up, said, “prove de Finetti’s theorem for Markov chains,” looked back down, and David left. So he went and did it. In order to prove the theorem, he needed to assume the chain was stationary. When I met him at a Berkeley-Stanford joint colloquium, I said that I knew how to do it without stationarity. I could make a finite version of the theorem and it didn’t need anything like stationarity. He agreed to listen, and Lester did too. Lester was very dismissive, but David wasn’t, and that led to our work on finite versions of both the Markov and the IID cases [12, 13].

I’ve written far too many papers. I’ll try to distinguish the ones that people seem to like into Statistics or Probability or something in between. You presented me with a list of papers . . .

Aldous: your 30 most cited papers, according to Google Scholar,

Diaconis: . . . and about a third are Statistics and a third are Probability and a third are in between, like de Finetti’s theorem, which I was interested in for philosophical reasons, trying to make sense of the way model-building goes. I like de Finetti’s take, focusing on observables, and I’d like to understand just what you need to assume about a process, in terms of observables, in order for it to be a mixture of standard parametric families, a mixture of exponential or normals or some other thing. That led to a lot of work [15]. That era seems to have quieted way down—nowadays no one works on exchangeability particularly, though a few of us still dabble in it.

About a year ago, some of our chemists here came to me. They were working on a protein folding problem with the IBM Blue Gene project. They’re really doing protein folding—taking forty molecules and ten thousand water molecules and then doing the molecular dynamics to see how the protein folds by using the equations of physics. It’s a very high-dimensional system—one particle is represented by twelve numbers—and the chemists were coarse-graining and dividing this high-dimensional space into maybe five thousand boxes. Their hope is that within a box it will quickly get random—in the sense of invariant measure for a dynamical system—and that jumps from box to box can be modeled as some Markov chain. Refreshingly to me, they were Bayesians, so they wanted to put a prior on transition matrices and, because the laws of physics are reversible, they wanted the prior to live on reversible chains. I realized that some earlier work with Silke Rolles [24] exactly gave the conjugate prior for reversible Markov chains. I told them about it, they implemented it and they say it makes a big difference. There’s a marvelous graduate student here, Sergio Bacallado, he’s a chemist, and he’s written papers such as [5] in the Annals which extend our work on priors in more practical directions. There’s something very exciting here—our old work had horrible formulas involving quotients of Gamma functions and now someone is caring to get it right, and thinking it’s sensible. So that subject is quite alive and well today, although Sergio has taken it a lot further. One of the main problems for Markov chain theory is to make the mixing time theory for continuous-space chains. There really are technical difficulties for continuous spaces, and he’s managed to get around that.

Now in a larger view, it’s a very exciting time for Bayesian statistics. When I first learned about it, in the early 1970s, it was still Good and Savage, and people were still arguing about whether an egg in a fridge is rotten or not . . .

Aldous: . . . and the Bayesian lady tasting tea.

Diaconis: I remember going to my first Valencia meeting. One of the world’s leading Bayesians, John Pratt, a marvelous man, was analyzing some data, his wife’s estimates of upcoming gross receipts at a cinema where she worked in Cambridge, MA. He was doing regression, and at the end he did an ordinary least squares but nothing Bayesian [35]. I asked him why not Bayesian? He said it was too hard to figure out the priors and it wouldn’t have made any difference anyway. I was shocked and dumbfounded. That was 1983, but since then we can actually implement Bayesian methods. And we do. Now the judgement has to be put off—frequentist methods have had 200 years of people tinkering with them and we’re just starting to use Bayesian methods. I think it’s reasonable to let time settle down before deciding whether they are better or worse. There are lots and lots of groups doing Bayesian analysis.

One of the big tensions in Statistics, which is a mystery to me, is really big data sets. You can try to estimate huge numbers of parameters with very few data points. Now I understand sometimes there’s a story that seeks to justify that, but it makes me very, very nervous. If you try to think about being a Bayesian in that kind of problem, it can’t be that you have any idea about what priors you’re putting on, you’re completely making something up. It’s nothing other than a way of suggesting procedures. It might be useful, it might not be useful. There are a lot of people trying to do that, but it’s a completely different part of the world and I don’t have much feeling for it. It’s so taken over Statistics right at the moment that I feel compelled to put in the following sentence. There are huge data sets; there are also many, many small data sets. And that’s where the inferential subtleties matter. If you’re sick and you’re trying to think about a new procedure for your tooth and there are two available procedures, with 10 or 50 instances of each . . . what should you do? Statistics encounters lots of problems like this too. So it’s good to remember that while there are huge data sets and that’s very exciting, there are also lots of small data sets and there’s still room for the classical way of thinking about statistical problems.

Aldous: A cynical view is that there’s more money in the fields with big data sets.

Diaconis: Tsk tsk (laughs), you won’t get any argument from me.

3. TEACHING THE PHILOSOPHICAL FOUNDATIONS

Aldous: You teach an undergraduate course with Brian Skyrms on the philosophical foundations of Statistics. You describe its topic as “10 great ideas about Chance.” Now most readers of Statistical Science have surely never taken, let alone taught, such a course. Can you tell us about the course?

Diaconis: Philosophers and statisticians have thought for a very long time about what probabilistic statements mean and how to combine disparate sources of information to reach a conclusion. These are still important questions and not ones to which we know the answer. We begin our course with the first great idea, that probability can be measured—the emergence of equally likely cases, the first probability calculations. There is of course a discussion of frequentism and of various kinds of Bayesians. Indeed, I.J. Good once wrote an article entitled 46656 varieties of Bayesians where he states 11 “facets” like whether utility is emphasized or avoided, whether physical probabilities are denied or allowed, and so on. We try to explain some of the different kinds of Bayesians. Brian and I are both subjectivists—I am what I call a nonreligious Bayesian, that is, I find it useful and interesting and I don’t really care what you do. Some of the course is pointing out the shallowness of naive frequentism. Bayesians are happy to talk about frequencies, in that when you have a lot of data the data swamps the prior, and you will use the frequency in order to make your inferences. It’s not that Bayesians argue against frequencies, they’re happy to have a lot of data, and frequencies are forced on you by the mathematics. So we discuss and prove those things. We also explain von Mises collectives, which have morphed into the complexity approach to probability.

One of the things I find interesting that’s hard to make philosophy out of, is what I want to call the von Mises pragmatic approach. If you ask working statisticians what they think probability is, they say, well, you do something a lot of times, and it’s the proportion of times something happens. If you ask about the probability Obama will be re-elected, they will respond with a cloud of words. Or they’ll walk away or say it’s too difficult to talk about. What von Mises said is that any scientific area has practice and theory. He discusses geometry—there’s the mathematical notion of circles and straight lines, then there’s practical architecture and drawing. The theory can be used, but at some point you have to relate the theory to the real world. I think that sort of pragmatic approach to foundations is important. But von Mises never tells you how to do so. I ask this question for differential equations. If some guy writes down a differential equation, and there’s a picture of water whirling around in a vessel with blockages—what does that equation have to do with the whirling of the water? In order to answer that, many of us would say, “That’s what Statistics is about.” Whether theory fits data is a statistical question. So we can apply this to our own subject: does statistical theory fit the real world?

Anyway, we hope to turn the course into a book, after several years of iterations.

Aldous: What kind of students take the course?

Diaconis: About 70 students, undergraduates or graduates in Statistics or Philosophy, and just interested other people, even some faculty attend. It’s quite lively, there’s lots of discussion. We teach it once a week for three hours, which is exhausting for everyone concerned.

Trying to think about why we do what we do is important, but nobody talks about it. I tell the following two stories. One is about you, and one is about Brad Efron. At some stage you and I were talking, as we often do, and I said I was going to teach a course on the Philosophy of Probability. And you got quite irate, saying, “You’re just going to tell a bunch of words that won’t illuminate anything.” And my good friend Efron got similarly very angry. He said, “That’s just going to be that Bayesian garbage,” reached into his pocket, took out a handful of coins, threw them, and said, “Look: Head, Tail, Tail, . . . —that’s random.” So people hear “Philosophy” and take it in a religious way. To me, the question “is what you’re doing really about anything?” is worth discussing, and we’re just trying to talk about it.

If you want to know what the problems in Bayesian statistics are, ask a Bayesian. We know! It’s very hard to put meaningful priors on high-dimensional real problems. And the choices can really make a difference. I’m going to give one example of that, just for fun. Suppose you’re teaching an elementary Probability course. It’s the first day of term, you walk into class, you see there are 26 students in the class, so you decide to do the birthday problem. Here are two thoughts about the birthday problem. First, if it doesn’t work, then it’s a disastrous way to start a course. Second, the usual calculation assumes each day is equally likely. But my students are about the same age, and there are more births on weekdays than weekend-days—that’s about a 20% effect—and then there are smaller seasonal effects. So the uniformity might not be true for my class. We don’t really know what the probabilities are. So let me put a prior on $(p_1, p_2, \ldots, p_{365})$. If your prior is uniform on the simplex, then the key number of people (to have a 50% chance of some birthday coincidence) decreases from 23 to about 18. For the coupon collector’s problem, using a story that Feller suggested, the key number of people in a village (to have a 50% chance that every day is someone’s birthday) is about 2300. That’s under the uniform multinomial model. If instead you take the uniform prior on the simplex, then—it’s a slightly harder calculation to do—but if I remember, the key number increases to about 190,000. That’s a little surprising when you first hear it, but under the uniform prior some $p_i$ will be around $(1/365)^2$ so you need order $365^2$ people just to have a good chance of having that one day as a birthday.

Aldous: But isn’t this a good argument against the naive Bayesian idea of inventing priors that are mathematically simple but without any real-world reason?

Diaconis: Sure, and that was the point of the exercise. Bayesian statisticians should be thinking more carefully about their priors. Part of that is understanding the effect of different priors, and those are math problems. In the birthday problem, math showed the prior didn’t have too much effect, whereas for the coupon collector’s problem it had a huge effect. Susan Holmes and I wrote a paper called A Bayesian peek into Feller volume I [18] taking his elementary problems and making Bayesian versions of them. When does it make a difference and when not? It’s a paper I like a lot.

Aldous: A version of the nonuniform birthday problem I give in my own “probability and the real world” course [2] is to take $p_i = 1.5 \times \frac{1}{365}$ for half the days and $0.5 \times \frac{1}{365}$ for the other half. This makes surprisingly little difference—the key number decreases from 23 to 22. And to avoid the possible disaster of it failing with my students, I show the active roster of a baseball team (easily found online; each MLB team has a page in the same format) which conveniently has 25 players and their birth dates. The predicted chance of a birthday coincidence is about 57%. With 30 MLB teams one expects around 17 teams to have the coincidence; and one can readily check this prediction in class in a minute or so (print out the 30 pages and distribute among students).
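The birthday numbers quoted in this exchange are easy to check by hand or by machine. The sketch below is not from the conversation. It assumes 365 equally likely days for the classical calculation. For the Bayesian variant it uses the closed form P(no shared birthday among n people) = product over i < n of (365 - i)/(365 + i), which is what one gets by averaging over a uniform (Dirichlet(1, ..., 1)) prior on the simplex. The function names are illustrative only.

```python
from math import prod

DAYS = 365  # assumption: ignore leap days


def p_match_uniform(n, k=DAYS):
    """P(some two of n people share a birthday) when all k days are equally likely."""
    return 1.0 - prod((k - i) / k for i in range(n))


def p_match_simplex_prior(n, k=DAYS):
    """Same probability, averaged over a uniform prior on (p_1, ..., p_k).

    For p ~ Dirichlet(1, ..., 1), E[p_1 * ... * p_n] = (k-1)! / (k+n-1)!,
    and there are k! / (k-n)! ordered choices of n distinct days, so
    P(no match) = prod_{i=0}^{n-1} (k - i) / (k + i).
    """
    return 1.0 - prod((k - i) / (k + i) for i in range(n))


def key_number(p_match, target=0.5):
    """Smallest group size n for which p_match(n) reaches the target probability."""
    n = 1
    while p_match(n) < target:
        n += 1
    return n


if __name__ == "__main__":
    print("key number, equally likely days:", key_number(p_match_uniform))
    print("key number, uniform prior on simplex:", key_number(p_match_simplex_prior))
    p25 = p_match_uniform(25)
    print("P(match) on a 25-player roster: %.3f" % p25)
    print("expected number of 30 teams with a match: %.1f" % (30 * p25))
```

Under these assumptions the classical key number is 23 and the prior-averaged key number lands in the high teens, close to the "about 18" recalled in the conversation. The 25-player roster gives a match probability of roughly 57%, hence about 17 of 30 teams.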

4. BOOKS: ON MAGIC AND ON COINCIDENCES

Aldous: On a lighter note, I have found myself following in your footsteps in various aspects of academic life, a minor such aspect being “unfinished books.” The 1984 conversation refers to the book on coincidences you were writing with Mosteller, and there is a 1989 joint paper [23], but when can we expect to see the book?

Diaconis: Well, there were two books mentioned in that interview, and the other one, with Ron Graham on mathematics and magic, has recently been published [17]. So it took 27 years, but we did finish it. I’m starting to think about the coincidences book again. We’re sitting in my office and you see those folders up there . . .

Aldous: I see about 15 of those very wide old open-ended cardboard files . . .

Diaconis: . . . those folders have newspaper clippings collected by Fred Mosteller over 30 years, and every one has a few pages saying here’s a kind of coincidence we might study via a model, and here’s some back-of-an-envelope calculation. I give a lot of public talks, about 50 a year, and I had stopped giving the talk on coincidences, but I’ve now committed to giving the talk again in a few weeks. That’s how I trick myself and get back into thinking about the topic. So look for the book sometime in the next five years. I promised Fred (before he died in 2006) I would do it, and I’m going to gear up and do it.

Aldous: The colorful story of you running away from home at age 14 to do magic, then buying Feller and teaching yourself enough mathematics to understand it, was told in the 1984 interview, and has become well known in our community. But I’ve joked to students “if you meet the Queen of England, don’t slap her on the back; if you meet Persi Diaconis, don’t ask him to do a magic trick.” Now that you and Ron Graham have published the book on mathematics and magic [17], could you tell us a little about what’s in the book?

Diaconis: The reason I first got interested in mathematics was via magic. I had hoped to call the book Mathematics to Magic and Back, but the publisher vetoed that, saying people wouldn’t get the idea. Now it’s called Magical Mathematics: The Mathematical Ideas that Animate Great Magic Tricks, maybe a bit pompous. One of the things about mathematics and magic is that if some person says, “I know a card trick,” you wince inside, because they’re going to deal cards into piles on the table, and everyone’s going to fall asleep. How long until I can change the conversation? We’re interested in good magic tricks, which are performable and don’t look mathematical, but which have some math behind them. Some of the math turns out to be pretty interesting. Most of the tricks are ones we invented ourselves, which is why we don’t get strung up for revealing secrets; the magic community doesn’t like that, but we seem to get away with it. There’s not much probability in the book—there’s some material on riffle shuffles and that sort of thing—and some old tricks of Charles Jordan that we made mathematical sense of. To whet your appetite, there’s a chapter on the connection between riffle shuffles and the Mandelbrot set.

Aldous: Science has a notion of progress—one could take any scientific topic and write a nontechnical article on progress in that topic over the last 30 years. Is there an analog of progress in magic?

Diaconis: Here I’m a bit negative. The final chapters are about who are the current stars—who is inventing tricks that are new and really different? The people we describe are old or now departed. The younger people don’t seem to be inventing math-based tricks. But in the coming quarter I’ll be teaching a course on mathematics and magic here at Stanford, so I’m trying to cultivate young people myself. Magic is changing in many ways, and the main one is again negative. Because of Wikipedia and YouTube there are very few secrets any more. You could be watching a show and type the right words into your smartphone and get an explanation, and this won’t go away. It’s profoundly changing magic, likely not for the better.

Now I do have a positive hope—maybe this will encourage people to invent new and better tricks. Also . . . when I was a kid, I was once hanging around with my magic mentor Dai Vernon at a billiard parlor. Billiards is a very refined game, the gentleman’s version of pool. Now pool halls are notoriously rowdy, smoke-filled with gambling and drinking. This was a group of people, seated around two masters, playing three-cushion billiards. The crowd was silent aside from an occasional quiet ooh of appreciation. Vernon looked at me and said, “Wouldn’t it be wonderful if people watched magic that way.” If people would learn a bit more about magic and appreciate the skill and presentation, then maybe it would become like watching a classical violinist. Those are my dreams about how exposure might change magic for the better.

5. COLLABORATION WITH DAVID FREEDMAN

Aldous: We’ve already mentioned David Freedman, my long-time colleague at Berkeley, and perhaps your major collaborator, who sadly died in 2008. I regarded him as one of the handful of people in our business who are unique—there was nobody like David. I mentally pictured him as Mycroft Holmes (Sherlock’s smarter older brother, who appears briefly in several stories to give sage advice) and I recall you having some “bright light” image. Can you tell us some things about your collaborations and about David’s impact on the field?

Diaconis: I first met David at a Berkeley-Stanford joint colloquium barbeque at Tom Cover’s house. I had read his thesis when I was a graduate student, so I had something to say to him. He was a very crusty character. He had a kind of “gee shucks, I’m just a farm boy” outer style, but he was in fact the debating champion of Canada. He was an honest man, and there aren’t so many of them. He could be difficult. There’s an image—that I heard from Jim Pitman who maybe heard it from Lester Dubins—of David working on a problem: you’d ask him a question and he would berate you and say that’s stupid, but then he would get down and focus. And when he was focused it was like there was this very bright clear light on a narrow part of the problem, and then it would shift slightly over and focus on a next part. That was how he worked. He wasn’t a quick glib guy.

At some stage he decided that the main impact he could make in Statistics was what he called defensive statistics, which was trying to make an art and science out of critiquing knee-jerk modeling and the wild misuse of probability models. He was as effective as anyone ever has been at that. Was he actually effective? Maybe not in our business, but he has a following in some of the social sciences and that’s marvelous. He certainly made me very sensitized to the misuse of models.

Aldous: And me too.

Diaconis: Now it’s easy to just criticize modeling, but what should we do about it? I wrote a paper about my version of David’s argument which was called A place for philosophy? The rise of modeling in statistical science [9]. I tried to make a list of what we can do. David’s approach to what we should do was embodied in the last book he wrote [28]. He spent years writing out with infinite clarity about topics he had such scorn for. I had never quite understood why he put so much energy into expounding (e.g.) the Cox proportional hazards model or the mysteries of regression. Then he said to me, as if it were obvious, though it hadn’t occurred to me before: “If I say it really, really clearly, then people will see how crazy it is.”

David was a brilliant mathematician. I miss him daily, because we used to chat all the time. And I could ask him anything, from “where to eat” to fine points of nonmeasurable sets. This continued until a few years before his death. We had written 33 papers together, and I’m a shoot-from-the-hip guy in writing first drafts, and David was very careful, and very artful in his prose, and finally we got rather tired of each other, like an old married couple—we felt we had heard everything the other had to say. I found his constant negativity draining, and he found my constant enthusiasm draining. But we had been a pretty good pair for a long time.

Right now, Laurent Saloff-Coste and I [25] are trying to make a little theory of “who needs positivity?” What happens when you start convolving signed measures? Infinite products are often not well-defined. I’m sure there’s some technical way of fixing that. It’s the kind of thing where David would have said, “Let’s think about it,” and some nice math would have come out of it. Now, with David gone, I don’t know who to ask about such things, I don’t know who cares about measure theory any more.

Aldous: But we all figure you have 57 collaborators, so you always have somebody to call.

Diaconis: I do have a lot of collaborators, and that’s an absolute joy, though there’s a cost. You have to own up to how little you know, and not be afraid to make a fool of yourself.

6. MORE COLLABORATORS

Aldous: Because you have had a huge number of collaborators, we might apologize in advance to any who are not mentioned in this conversation. In the 1984 conversation you emphasized Martin Gardner and Fred Mosteller and Charles Stein and David Freedman as the people you had interacted with and been influenced by the most by that time. Are there others during your later career, not already mentioned, who you would like to talk about?

Diaconis: Well, there’s you, with exchangeability and card shuffling and mixing in MCMC, and statistics and probability in the real world. And Laurent Saloff-Coste, an analyst who I’ve converted to be somewhat of a probabilist. He was visiting Dan Stroock, and at that time was very far from probability, and we got into an argument, and he was right and I was wrong.

I’ve written a lot of papers with Ron Graham. He tried to hire me when I got my Ph.D. I remember knocking on his office door at Bell Labs, where he was running the math and statistics group. I opened the door and there was this man with a net attached to his waist belt and going up to the ceiling. He was practicing 7 ball juggling and the net caught dropped balls so he didn’t have to pick them up off the floor, and I thought, this guy’s great.

I’ve written papers with Susan Holmes, my wife, and that has its complexity. One of the most stressful things, for each of us, is to hear the other give a talk on our joint work. You sit there thinking, “No, no, no, that’s not the way to say it,” and you have to keep quiet. We’ve all had this experience with a graduate student, but when it’s your wife it’s radically worse. I’ve just finished writing a paper [16] with Susan and Jason Fulman that was based on a casino card shuffling machine that we were asked to analyze and could in fact analyze. This was done ten years ago and the machine didn’t work, so it wasn’t so polite to publish back then.

I don’t write so many papers with my graduate students—they should get the credit for their work—but one I have resumed working with is Jason Fulman.

[. . .]

can make the effort to go talk to them. Also, I say to pay attention to your cohort of students—some will become eminent in the future—and they always laugh.

Diaconis: Sometimes when interviewing postdocs, they think they can come to Stanford and have you work on their problem. Or they just want to work on their own thing by themselves. It’s a lot better to read some paper by the person you want to interact with, and say, “Can we talk about that?”, at least as a way of getting started. It’s a simple thing to do, but most people don’t do it.

8. OLD TOPICS NEVER DIE

Aldous: You recently sent me an email from country X saying that most of the people you talked to were our generation and still working on the same kind of topics that had established their careers. I’ve always liked the well-known quote from von Neumann [37]:

As a mathematical discipline travels far from its empirical source, or still more, if it is a second and third generation only indirectly inspired by ideas coming from “reality” . . . there is a grave danger that the subject will develop along the line of
I enjoy working with him because he starts least resistance, that the stream, so far with a natural algebraic bent, but I taught him from its source, will separate into a mul- to look at a formula and look for some probabil- titude of insignificant branches, and that ity story, and he’s great at it. I have also started the discipline will become a disorganized writing papers with Sourav Chatterjee. He’s moving mass of details and complexities. toward the probability-physics field, but I’m encour- Of course math naturally grows in a “one thing leads aging him to keep some connection with statistics. to another” way, but is there any test for when 7. NETWORKING enough has been done on a topic and it’s time to move on? Diaconis: I’m an extremely social statistician. That Diaconis: It’s a difficult question. Right now I’m is, it’s a lot of fun to go ask somebody something. doing some work in algebraic topology, a subject You need to be not too proud, to not be embarrassed with enormous depth, but many of the prominent about what you don’t know. If someone asks you a practitioners are involved in the minutiae of how question, and you don’t know the answer, then sug- the big machine works and don’t bother to solve gest someone else who might know—try to be help- real problems. They just think that if the machine ful. I do this all the time—asking and answering, is well enough developed, then it can solve any prob- helping other people and having them help you— lem that’s handed to it. I do think it’s important to but most people don’t. Learning social skills is un- try to focus on real world problems. A lot of my dervalued in the research community. There’s a joy motivation is MCMC, which is really used on real in having a community, in having people who know problems, and, as I said earlier, we don’t know how what you’re doing. to give theoretical analyses of MCMC on real prob- Aldous: As a related aspect of social skills, I tell in- lems. 
So what we do is problems with nice structure, coming graduate students that the faculty are friend- say, symmetry, and hope that will grow into some- ly but busy; they won’t come talk to you, but you thing useful. von Neumann’s quote is perfect—you ANOTHER CONVERSATION WITH PERSI DIACONIS 9

Fig. 1. Juliet Shaffer, Erich Lehmann, Persi Diaconis, 1997. make a small change in a solved problem, it’s still for five years with wonderful analysts. We wrote pa- not real, you can’t do it but one of your students pers [21, 22] in the best math journals. But our the- makes progress, and an area grows and gets a name. orems are basically useless as regards the real prob- It does happen that way. lem. Of course it’s easy to criticize. One way I try to But again . . . sometimes things done because they be constructive is take a classic like the original were beautiful as pure math, then 50 years later it’s Metropolis algorithm applied to hard discs in a box. just what somebody needed. A reasonable case in Can I prove anything about it? I worked very hard point is partial exchangeability for matrices, which

Fig. 2. Persi Diaconis, 2006. 10 D. ALDOUS

anyone else interested in. It’s about summability. A sequence of real numbers that doesn’t converge in the usual sense may be Abel or Cesaro summable. And there are theorems that say if a sequence is summable in scheme A, then it’s summable in scheme B. I noticed that any time there was such a known theorem, there was a probabilistic identity which said that the stronger method was an average of the weaker method. So is there a kind of meta-theorem that says this is always true? I once gave the Hardy Memorial Lecture at Cam- bridge and wrote a paper [10] titled G. H. Hardy and Probability ??? with the three question marks. Hardy notoriously didn’t have much regard for ap- plied math of any sort, and probability was particu- larly low on his list. He hated being remembered for the Hardy–Weinberg principle. I knew Paul Erd˝os well, and he said that Hardy and Littlewood were great mathematicians, but if they had had any knowl- edge of probability at all, then they would have been able to prove the law of the iterated loga- rithm. That they certainly had the techniques but because they just couldn’t think probabilistically their work on that particular problem was second- rate. Anyway, in the lecture I wove together such stories and my own open problems about Tauberian theory. Aldous: Outside academia you are perhaps best Fig. 3. Persi Diaconis, 2006. known for magic and for the “7 shuffles suffice” re- sult from your 1992 paper with Dave Bayer [6]. I’m David Freedman and I were working on in 1979, and sure that features in every other interview you’ve you independently came up with a proof. That was done, so I won’t ask again here. 
More recent work of an esoteric corner of probability, and soon the sub- yours that attracted popular interest was the 2007 ject went quiet for 20 years, but now it’s completely Dynamical bias in the coin toss paper [19], assert- re-emerged in contexts such as graph limits [20] and ing (by a mixture of Newtonian physics and exper- other parts of pure math [4]. People are looking back imental observation of the initial parameters when at the old papers and asking how did they do that. real people performed tosses) that there was about I just opened the Annals of Probability and there’s a 50.8% chance for a coin to land the same way an article on free probability versions of de Finetti’s up as tossed. I had two undergraduates actually do theorem. Is that probability, or some other area of the 40,000 tosses required to have a good chance of math? It’s very hard to know what will turn out to detecting this effect, but the results were ambigu- be useful. ous [33]. Have you or other people followed up on Aldous: An unconventional idea for a workshop your paper? would be to invite senior people to talk about one Diaconis: Aside from your students, there’s a phys- nonrecent idea of theirs which has not been devel- ics group at Boston [38] who carefully repeated our oped or followed up by others, but which (the speaker measurements of angular velocity etc., and a Polish thinks) should be. Following Hammersley [31], one group who have written a book [36] on the physics of might call these “ungerminated seedlings of re- gambling. They reproduced our analysis and added search.” Do you have any ideas in this category? bouncing and air resistance, which we neglected. Diaconis: There’s a problem that I worked on as Speaking of coin-tossing, every year we get a call part of my thesis but have never managed to get from ESPN and they want a two-minute spot on “is ANOTHER CONVERSATION WITH PERSI DIACONIS 11

Fig. 4. Philip Stark, Don Ylvisaker, Persi Diaconis, Larry Brown, Terry Speed and Ani Adhikari at the memorial for David Freedman, 2008. the coin toss in the Superbowl fair?” Of course the ing that some coins are biased when you spin them Superbowl coin is a big thick specially minted ob- on a flat surface, but for flipping in the air we can ject, and I don’t have much to say on that. I recently prove you can’t make it biased . . . got a letter from a German Gymnasium teacher who Aldous: . . . by conservation of angular momentum, tried to make a biased coin by making one side of which a high school physics teacher should know. balsa wood, and he couldn’t do it. I wrote back say- You may recall that two of our colleagues have a

Fig. 5. Persi Diaconis and Elizabeth Purdom, 2010. 12 D. ALDOUS paper titled You can load a die, but you can’t bias a weeks than three months. So our field is moving in coin [29]. that direction. Publication is less and less meaning- ful because of the arXiv. But as an author I find it 9. MODERN TIMES useful to imagine that some referee is going to read Aldous: In the 1984 conversation, when asked my paper. It makes me take care about the details about the future you were wise enough not to make and the exposition. Aldous: very specific predictions about particular topics, but Your answer in 1984 to “what does the fu- I do notice two points. You noted there was increas- ture hold for you?” was “just going crazy, working ing collaboration—“more and more 2- or 3-author hard, learning more math.” I think we can agree that papers”—and we’re all aware this trend has contin- prediction was correct. So let me ask the same ques- ued. The current (October 2011) Annals of Statistics tion again, and ask for your thoughts on the future has only 2 out of 17 articles being single-authored, of the field of Statistics, and ask for advice to some- whereas going back 30 years (September 1981) it one completing an undergraduate degree and con- was 10 out of 17. Incidently, the total length of the templating starting a Ph.D. program in Statistics. Diaconis: 17 articles increased from under 200 pages to al- Yes, I still like working hard and learn- most 500 pages, a perhaps less predictable effect. ing more. Over my career Statistics has changed Your second point, paraphrasing slightly, was “I’m so drastically it’s almost unrecognizable. Compa- glad Statistics is not that kind of high-pressure field nies like Target predicting what their individual cus- where you have to publish every two weeks.” But tomers will want or can be persuaded to want—this today we do have younger colleagues who publish kind of aggressive analysis of massive data sets. So fifteen papers per year. 
there’s a lot of new Statistics for someone like me We can probably all agree that increased collab- who’s classically trained. You have to find a part oration is A Good Thing, but what about the in- of it you want to learn. For example, I’m trying to creased number of papers and the implicit pressure think about large networks via general models for on young people to publish more than in our day? random graphs. And for a theoretical statistician, Diaconis: Right now I’m on the hiring committees looking at what applied people are doing and ask- for both the Math and Stat departments, and it’s ing, “Can I break it, can I do it better?” will always noticeable that even applicants straight out of grad give us plenty to do. school have 3–10 papers on their CV, many of them About what a youngster should do . . . for a start, in pretty good journals. How has that happened? you can’t learn too much about using computers. When I was at that stage I just had some techni- I lament that the academic statistics world doesn’t cal reports. So it’s just a cultural change. We per- know how to recognize and reward that skill appro- ceive an exponentially growing literature with just priately. There are people who are amazing hackers too many papers. People publish the most obscure and that’s an invaluable skill, but they don’t get things. But then the ability to search on the web the same credit as mathematically-focused people. allows us to keep track, and, as I said earlier, some- I don’t know why this is, but it should change. times the most obscure-looking paper turns out to Aldous: Presumably because of the traditional “re- contain just the right thing. And I should be the last search = papers in journals” equation—we’re so used one to criticize there being too many papers, because to assessing research contributions in that particular I’m now writing almost ten papers a year. I would way. 
Even though there are journals like Journal of hate to have to choose which ones I shouldn’t have Computational and Graphical Statistics, they maybe written. are perceived as less prestigious. In our field we still referee, or pretend to referee, Diaconis: Another piece of advice is to read clas- papers, and we all know it can take six months or a sic papers. If there’s a topic that interests you, look year to get through. I do some work with physicists back at what the people who invented it actually and physics is largely an unrefereed subject. Their wrote. It gives you a more concrete sense of why they logic is that if somebody publishes a wrong result, invented it and what it’s about, compared to read- the community becomes aware of it, and then that ing textbooks. Nowadays people don’t pay enough group gets a bad reputation. It’s not that no one attention to such things—instead it’s “Let’s try it looks at the paper at all; someone reads the abstract out and write a quick paper.” and scans the paper to check it looks reasonable. Statistics is as healthy as it’s ever been. One can Then it gets published, in time maybe closer to three see the prominence of machine learning, but they ANOTHER CONVERSATION WITH PERSI DIACONIS 13 are really just using ideas that were developed in Inst. Henri Poincar´e Probab. Stat. 23 397–423. Statistics twenty or fifty years ago. They are apply- MR0898502 ing them—that’s great—but we are inventing the [16] Diaconis, P., Fulman, J. and Holmes, S. (2012). ideas that will be applied in the next twenty or fifty Analysis of casino shelf shuffling machines. Ann. years. Statistics is a great field to be part of, and Appl. Probab. To appear. [17] Diaconis, P. and Graham, R. (2011). Magical Mathe- I’m still excited by it. matics: The Mathematical Ideas that Animate Great ACKNOWLEDGMENTS Magic Tricks. Princeton Univ. Press, Princeton, NJ. MR2858033 I thank Raazesh Sainudiin for proofreading a first [18] Diaconis, P. and Holmes, S. (2002). 
A Bayesian peek 64 draft. into Feller volume I. Sankhy¯aSer. A 820–841. MR1981513 [19] Diaconis, P., Holmes, S. and Montgomery, R. REFERENCES (2007). Dynamical bias in the coin toss. SIAM Rev. 49 211–235. MR2327054 [1] Aldous, D. (1983). Random walks on finite groups and [20] Diaconis, P. and Janson, S. (2008). Graph limits and rapidly mixing Markov chains. In Seminar on Prob- ability, XVII. Lecture Notes in Math. 986 243–297. exchangeable random graphs. Rend. Mat. Appl. (7) 28 Springer, Berlin. MR0770418 33–61. MR2463439 [2] Aldous, D. (2012). On chance and unpredictabil- [21] Diaconis, P. and Lebeau, G. (2009). Micro-local anal- 262 ity: 20 lectures on the links between mathemat- ysis for the Metropolis algorithm. Math. Z. 411– ical probability and the real world. Available at 447. MR2504885 Diaconis, P. Lebeau, G. Michel, L. http://www.stat.berkeley.edu/~aldous/. [22] , and (2011). [3] Aldous, D. and Diaconis, P. (1986). Shuffling cards Geometric analysis for the metropolis algorithm and stopping times. Amer. Math. Monthly 93 333– on Lipschitz domains. Invent. Math. 185 239–281. 348. MR0841111 MR2819161 [4] Aldous, D. J. (2010). Exchangeability and continuum [23] Diaconis, P. and Mosteller, F. (1989). Methods for limits of discrete random structures. In Proceedings studying coincidences. J. Amer. Statist. Assoc. 84 of the International Congress of Mathematicians. 853–861. MR1134485 Volume I 141–153. Hindustan Book Agency, New [24] Diaconis, P. and Rolles, S. W. W. (2006). Bayesian Delhi. MR2827888 analysis for reversible Markov chains. Ann. Statist. [5] Bacallado, S. (2011). Bayesian analysis of variable- 34 1270–1292. MR2278358 order, reversible Markov chains. Ann. Statist. 39 [25] Diaconis, P. and Saloff-Coste, L. (2012). Convolu- 838–864. MR2816340 tion powers of complex functions on Z. Unpublished [6] Bayer, D. and Diaconis, P. (1992). Trailing the dove- manuscript. tail shuffle to its lair. Ann. Appl. Probab. 2 294–313. [26] Diaconis, P. and Shahshahani, M. 
(1981). Generating MR1161056 a random permutation with random transpositions. [7] DeGroot, M. H. (1986). A conversation with Persi Di- Z. Wahrsch. Verw. Gebiete 57 159–179. MR0626813 1 aconis. Statist. Sci. 319–334. MR0858513 [27] Freedman, D. A. (1962). Mixtures of Markov processes. [8] Diaconis, P. (1977). Finite forms of de Finetti’s the- Ann. Math. Statist. 33 114–118. MR0137156 36 orem on exchangeability. Synthese 271–281. [28] Freedman, D. A. (2009). Statistical Models: Theory and MR0517222 Practice, revised ed. Cambridge Univ. Press, Cam- [9] Diaconis, P. (1998). A place for philosophy? The rise bridge. MR2489600 of modeling in statistical science. Quar. Appl. Math [29] Gelman, A. and Nolan, D. (2002). You can load a die, LVI 797–805. but you can’t bias a coin. Amer. Statist. 56 308–311. [10] Diaconis, P. (2002). G. H. Hardy and probability??? MR1963275 Bull. Lond. Math. Soc. 34 385–402. MR1897417 [30] Geman, S. and Geman, D. (1984). Stochastic re- [11] Diaconis, P. (2009). The Markov chain Monte Carlo revolution. Bull. Amer. Math. Soc. (N.S.) 46 179– laxation, Gibbs distributions, and the Bayesian 205. MR2476411 restoration of images. IEEE Trans. Pattern Anal. 6 [12] Diaconis, P. and Freedman, D. (1980). de Finetti’s Machine Intelligence 721–741. theorem for Markov chains. Ann. Probab. 8 115– [31] Hammersley, J. M. (1972). A few seedlings of research. 130. MR0556418 In Proceedings of the Sixth Berkeley Symposium on [13] Diaconis, P. and Freedman, D. (1980). Finite ex- Mathematical Statistics and Probability (Univ. Cali- changeable sequences. Ann. Probab. 8 745–764. fornia, Berkeley, Calif., 1970/1971), Vol. I: Theory MR0577313 of Statistics 345–394. Univ. California Press, Berke- [14] Diaconis, P. and Freedman, D. (1986). On the con- ley, CA. MR0405665 sistency of Bayes estimates. Ann. Statist. 14 1–67. [32] Jordan, M. I. (2011). What are the open problems in MR0829555 Bayesian statistics? ISBA Bulletin 8 1–4. [15] Diaconis, P. and Freedman, D. (1987). 
A dozen de [33] Ku, P., Larwood, J. and Aldous, D. (2009). 40,000 Finetti-style results in search of a theory. Ann. coin tosses yield ambiguous evidence for dynamical 14 D. ALDOUS

bias. Available at http://www.stat.berkeley.edu/ [36] Strzalko, J., Grabski, J., Stefanski, A., Per- ˜aldous/Real-World/coin tosses.html. likowski, P. and Kapitaniak, T. (2009). Dynam- [34] Levin, D. A., Peres, Y. and Wilmer, E. L. (2009). ics of Gambling: Origins of in Dynam- Markov Chains and Mixing Times. Amer. Math. ical Systems. Springer, . Soc., Providence, RI. MR2466937 [37] von Neumann, J. (1947). The mathematician. In The [35] Pratt, J. and Schlaifer, R. (1985). Repetitive as- Works of the Mind 180–196. Univ. Chicago Press, sessment of judgmental probability distributions: Chicago, IL. MR0021929 A case study. In Proc. Second Valencia Inter- [38] Yong, E. and Mahadevan, L. (2011). Probability and national Meeting on Bayesian Statistics 393–424. dynamics in the toss of a thick coin. Available at North-Holland, Amsterdam. arXiv:1008.4559.
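
A note on the coin-toss figures quoted in Section 8: the statement that 40,000 tosses give "a good chance of detecting" a 50.8% same-side probability can be checked with a back-of-envelope calculation. The sketch below is only an illustration and is not the analysis used in [19] or [33]. It assumes a one-sided z-test of a fair coin at the 5% level and a normal approximation to the binomial; the function names `normal_cdf` and `power_one_sided` are hypothetical helpers introduced here.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_one_sided(p_true: float, n: int, p_null: float = 0.5,
                    z_crit: float = 1.6448536269514722) -> float:
    """Approximate power of a one-sided z-test of H0: p = p_null vs p > p_null.

    z_crit is the upper 5% point of the standard normal (an assumed level).
    Uses the normal approximation to the binomial proportion.
    """
    se_null = math.sqrt(p_null * (1.0 - p_null) / n)   # spread under H0
    se_true = math.sqrt(p_true * (1.0 - p_true) / n)   # spread under the alternative
    threshold = p_null + z_crit * se_null              # reject H0 above this proportion
    return 1.0 - normal_cdf((threshold - p_true) / se_true)


n = 40_000        # number of tosses mentioned in the conversation
p = 0.508         # same-side probability asserted in [19]

se = math.sqrt(0.5 * 0.5 / n)   # standard error of the observed proportion for a fair coin
z = (p - 0.5) / se              # how many standard errors the claimed bias is from 0.5
power = power_one_sided(p, n)   # chance of rejecting "fair" if p really is 0.508

print(f"standard error: {se:.4f}")
print(f"effect size in standard errors: {z:.2f}")
print(f"approximate power: {power:.3f}")
```

Under these assumptions the standard error is 0.0025, so a bias of 0.008 sits a bit more than three standard errors from one half, and the approximate power comes out above 90%. With only a few thousand tosses the same bias would be well inside the noise, which is consistent with the remark that tens of thousands of tosses are needed.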