Studies in plausibility theory, with applications to physics

PIERO G. L. PORTAMANA

Doctoral thesis in photonics Stockholm, 2007 Porta Mana, P. G. L., Studies in plausibility theory, with applications to physics.

Akademisk avhandling som med tillstånd av Kungliga Tekniska Högskolan framlägges till offentlig granskning för avläggande av teknologie doktorsexamen i fotonik tisdagen den 12 juni 2007 klockan 10:00 i Sal D, Forum, Kungliga Tekniska Högskolan, Isafjordsgatan 39, Kista.

TRITA ICT/MAP AVH:Report 2007:6 ISSN 1653-7610 Kungliga Tekniska Högskolan ISRN KTH/ICT-MAP/AVH-2007:6-SE SE-100 44 Stockholm ISBN 978-91-7178-688-3 SWEDEN

© 2007 by Piero G. L. Porta Mana.

Typeset with LATEX 2ε on 9th May 2007 Publisher: Universitetsservice US AB ai miei tre tesori: mia mamma, mia sorella, e il mio amore

nonna Matilde Onnis Porta nonno Giovanni Porta morfar Lewis Nyberg farmor Alice Hult in memoriam

v

Abstract

The discipline usually called ‘probability theory’ can be seen as the theory which describes and sets standard norms to the way we reason about plausibility. From this point of view, this ‘plausibility theory’ is a province of logic, and the following infor- mal proportion subsists: plausibility theory deductive logic = . common notion of ‘plausibility’ common notion of ‘truth’ Some studies in plausibility theory are here offered. An alternative view and math- ematical formalism for the problem of induction (the prediction of uncertain events from similar, certain ones) is presented. It is also shown how from plausibility theory one can derive a mathematical framework, based on convex geometry, for the descrip- tion of the predictive properties of physical theories. Within this framework, problems like state assignment — for any physical theory — find simple and clear algorithms, numerical examples of which are given for three-level quantum systems. Plausibility theory also gives insights on various fashionable theorems, like Bell’s theorem, and various fashionable ‘paradoxes’, like Gibbs’ paradox.

Ringraziamenti

Mia madre Miriam, la mia “sorellina” Marianna, e il mio amore Louise ringrazio per prime, per la loro costante vicinanza, il loro costante amore, supporto, e perché hanno sempre creduto in me. Amore e supporto ho sempre ricevuto anche dal resto della mia famiglia: mio padre Beppe, zio Sergio, zia Pinuccia, Michela, Laura, Grazia, Marco, Agnese; e dalla mia fa- miglia “allargata”: Gisela e Bertil, Zacharias, Ursula e Joakim, Emilie, Nicolina, Lova, e il piccolo Loke, a cui non posso non aggiungere anche mormor Brita e morfar Lewis, che purtroppo non è piú con noi, e farmor Alice, anche lei purtroppo non piú con noi. A tutti voi: grazie! e: vi voglio un mare di bene! (E una carezza a Nerone, Mia, Leon, e Brigitte anche se non c’è piú.) E gli Amici con la A maiuscola, dove li lasciamo? anche loro hanno sempre dato calore e, non appena ce ne fosse bisogno, insostituibile sostegno. Anna, Mario, Filippo, Marcello, Margareta e Bengt, Anders e Aline, Elsa e Eric: grazie! vi voglio bene! Un ringraziamento particolare va a Daniel sensei, Tommy sensei, e Iryo sensei, per il loro costante insegnamento che trascende la sfera puramente fisica. Grazie! E ora passiamo alla fisica. Prima di tutto un cordiale grazie alla segretaria di diparti- mento Eva, che è sempre pronta ad auitare con squisita gentilezza e sempre con un sorriso. Tanti sorrisi ho sempre ricevuto anche dallo staff della biblioteca: Elisabeth Hammam, In- grid Talman, Tommy Westergren, e Elin Ekstedt, Thomas Hedbjörn, Daniel Larsson, Allan Lindqvist, Lina Lindstein, Yvonne Molin, Min Purroy Pei, Anders Robertsson. Loro de- vono essere ringraziati per una gran parte dei riferimenti bibliografici qui inclusi. Grazie a Richard, e grazie anche a Lisbeth, senza cui la caffetteria sarebbe un disastro. Grazie anche a tutte le persone che, senza essere pagate, rendono la pubblicazione elettronica possibile tramite LATEX 2ε con i vari packages, Emacs and AUCTEX, MiKTEX, and http:\arxiv.org. Grazie! E grazie anche a tutti i ricercatori che rendono il proprio lavoro gratuitamente disponibile in internet. Cari colleghi! Un ringraziamento particolare va a Anders M. e Åsa, che hanno sempre mostrato attivo interesse per le mie ricerche. Gli altri colleghi sono sempre stati gentili e pronti a dare una mano e ad ascoltare, anche se non c’è stato molto interesse reciproco nelle rispettive ricerche: Phil, Matthew, Maria, Iulia, Hoshang, Björn, Anders K., Rina, Sebastien, Davide, Isabel, Marec. Un ringraziamento particolare a Christian, Jonas, e alla professoressa Jaskorzynska per aver letto e aiutato a correggere la bozza della tesi, a Mauritz per aver fatto resuscitare lo schermo del computer in un momento critico, a Daniel

vii viii RINGRAZIAMENTI per informazioni post-dottorali. Un grazie particolare a coloro che, attivamente o passivamente, mi hanno fatto da hand- ledare (il termine svedese è il piú adatto): i professori Bengtsson, Cadoni — che a grandi intervalli spazio-temporali mi ha sempre dato consiglio e appoggio —, Langmann, e Lind- blad. Dulcis in fundo, il mio handledare Gunnar voglio ringraziare, tante altre cose a parte (i consigli, le critiche, gli insegnamenti) soprattutto per il dono piú importante: libertà. Libertà di fare ricerca sugli argomenti che piú mi hanno interessato, libertà di proporre e sviluppare progetti di ricerca, che ha sempre accettato; libertà anche come segno di fiducia nel mio lavoro. Handledare cosí se ne trovano pochi oggi! Grazie anche ai professori Owen, Linusson, Björner, Murdoch, Fuchs, Hardy, Gaifman, Noll, e Netocný per la loro disponibilità epistolare, per discussioni, e per referenze. Un ringraziamento cordiale a Peter Morgan e a Rob Spekkens per l’interesse mostrato nel mio lavoro e a Rob anche per i suoi amichevoli consigli riguardo alla situazione post-dottorato. Infine, sono sicuro che se nonna Tilde fosse ancora qui avrei da ringraziarla per una montagna di cose, oltre a quelle per cui la devo ancora ringraziare. Grazie nonna!

Anti-thank

There is one that I have to anti-thank and for whom I must express my innermost despise (and that of the rest of the world, I suppose). He has continuously caused problems, and he is the responsible for all the errors and the lacks of quality of this work (besides, of course, for the problems of world hunger and war). Who am I speaking about, if not PIPPO!!!! Contents

Ringraziamenti vii

Contents ix

List of public-made papers and other contributions xiii Papers included in this work ...... xiii Other contributions ...... xiv

I Prologue 1 Truesdell, Jaynes, and Hardy ...... 1 Purpose and contents of these notes ...... 3 Other remarks ...... 4

II Plausibility logic and induction 7 A Bayesian statistical test ...... 7 § 1 ...... 7 Deductive logic ...... 8 § 2 ...... 8 § 3 ...... 8 § 4 ...... 9 § 5 ...... 10 § 6 ...... 10 Plausibility logic ...... 11 § 7 ...... 11 § 8 ...... 12 § 9 ...... 13 ‘Plausibilities of plausibilities’, induction, and circumstances: a generalisation . 14 § 10 ...... 14 § 11 ...... 14 § 12 ...... 15 § 13 ...... 16 Miscellaneous remarks ...... 16 § 14 ...... 16

ix x CONTENTS

§ 15 ...... 18 § 16 ...... 18 § 17 ...... 21

III A mathematical framework for the plausibilistic properties of physical the- ories 25 What is a physical theory? ...... 25 § 18 ...... 25 Exemplifying interlude (which also introduces some notation) ...... 26 § 19 ...... 26 § 20 ...... 27 § 21 ...... 30 Three special kinds of propositions. Definition of ‘system’. Insights ...... 31 § 22 ...... 31 § 23 ...... 33 Introducing the vectors ...... 35 § 24 ...... 35 Structures of the vector sets and connexion with plausibilistic reasoning in physics 36 § 25 ...... 36 § 26 ...... 40 The vector framework in classical point mechanics ...... 41 § 27 ...... 41 § 28 ...... 43 § 29 ...... 43 § 30 ...... 44 The vector framework in quantum mechanics ...... 44 § 31 ...... 44 Miscellaneous remarks ...... 46 § 32 ...... 46 § 33 ...... 46 § 34 ...... 49 § 35 ...... 50 § 36 ...... 51 Historical notes and additional remarks ...... 52 § 37 ...... 52

IV A synthesis: induction and state assignment 55 Putting together the Laplace-Jaynes approach and the vector framework . . . . 55 § 38 ...... 55 § 39 ...... 56 § 40 ...... 57 Remarks ...... 58 § 41 ...... 58 § 42 ...... 59 xi

§ 43 ...... 59 § 44 ...... 60

V Epilogue 63 Summary and beginnings ...... 63 § 45 ...... 63 Future research directions ...... 64 Plausibilistic physics ...... 64 § 46 ...... 64 § 47 ...... 65 § 48 ...... 66 § 49 ...... 66 Rational continuum mechanics ...... 67 § 50 ...... 67 Quantum theory ...... 68 § 51 ...... 68 § 52 ...... 69 Teaching ...... 69 § 53 ...... 69 § 54 ...... 69

Bibliography 71

List of public-made papers and other contributions

(Note: In later papers I have decided to sign with my mother’s family name, ‘Porta’, as well as my father’s, ‘Mana’.)

Papers included in this work

Paper (A) Porta Mana, P. G. L., 2003, Consistency of the Shannon entropy in quantum experiments, eprint arXiv:quant-ph/0302049; also publ. in a peer-reviewed journal.

Paper (B) Porta Mana, P. G. L., 2004, Distinguishability of non-orthogonal density ma- trices does not imply violations of the second law, eprint arXiv:quant-ph/0408193.

Paper (C) Porta Mana, P. G. L., 2003, Why can states and measurement outcomes be represented as vectors?, eprint arXiv:quant-ph/0305117.

Paper (D) Porta Mana, P. G. L., 2004, Probability tables, eprint arXiv:quant-ph/0403084; also published in a conference proceed- ings volume.

Paper (E) Porta Mana, P. G. L., A. Månsson, and G. Björk, 2005, On distinguishability, orthogonality, and violations of the second law, eprint arXiv:quant-ph/0505229; submitted to a peer-reviewed journal.

Paper (F) Porta Mana, P. G. L., A. Månsson, and G. Björk, 2006, ‘Plausibilities of plau- sibilities’: an approach through circumstances, eprint arXiv:quant-ph/0607111.

xiii xiv LIST OF PUBLIC-MADE PAPERS AND OTHER CONTRIBUTIONS

Paper (G) Porta Mana, P. G. L., A. Månsson, and G. Björk, 2007, The Laplace-Jaynes approach to induction, eprint arXiv:physics/0703126, eprint PhilSci:00003235.

Contribution of the author in Papers (E)–(G): I was the main developer; made the literature search; wrote the papers.

Paper (H) Månsson, A., P. G. L. Porta Mana, and G. Björk, 2006, Numerical Bayesian state assignment for a three-level quantum system. I. Absolute-frequency data, and constant and Gaussian-like priors, eprint arXiv:quant-ph/0612105. Paper (I) Månsson, A., P. G. L. Porta Mana, and G. Björk, 2007, Numerical Bayesian state assignment for a three-level quantum system. II. Average-value data, and constant, Gaussian-like, and Slater priors, eprint arXiv:quant-ph/0701087.

Contribution of the author in Papers (H)–(I): I contributed to the work on three-level systems; jointly worked on the creation and running of the program codes; made the main part of the literature search; in the first paper I wrote the introductory and concluding sections and rewrote part of the rest. More generally, as regards Papers (F)–(I): I formulated the original idea of ‘circum- stance’ and proposed the project of developing this idea in the general framework presented in Papers (C) and (D).

(J) Björk, G., and P. G. L. Porta Mana, 2003, A size criterion for macroscopic superposi- tion states, eprint arXiv:quant-ph/0310193; also publ. in a peer-reviewed journal.

(K) Björk, G., and P. G. L. Porta Mana, 2004, Schrödinger-cat states: size classification based on evolution or dissipation, in Proceedings SPIE: Fluctuations and Noise in Photonics and Quantum Optics II, vol. 5478, pp. 335–343.

Contribution of the author in Papers (J)–(K): I gave feedback through discussions on physical and mathematical points; gave a minor help in the literature search; jointly wrote the introductory and concluding sections of the papers.

Other contributions

• Cadoni, M., and P. G. L. Porta Mana, 2000, Hamiltonians for a general dilaton gravity theory on a spacetime with a non-orthogonal, timelike or spacelike outer boundary, eprint arXiv:gr-qc/0011010; also publ. in a peer-reviewed journal. xv

• Porta Mana, P. G. L., and A. Månsson (presenter), 2004, Maximum-entropy meth- ods with Shannon’s entropy for a two-level quantum system, poster presented at the CNRS Summer school on ‘Quantum logic and communications’, Cargese, France.

• Porta Mana, P. G. L., and G. Björk (speaker), 2005, Uncertainty, information and entropy, invited talk at the ‘Ninth International Conference on Squeezed States and Uncertainty Relations’, Besançon, France.

I. Prologue

Aber wie wollte ich gerecht sein von Grund aus! Wie kann ich Jedem das Seine geben! Diess sei mir genug: ich gebe Jedem das Meine.

Zarathustra [560]

Truesdell, Jaynes, and Hardy

In the present notes I present part of the studies pursued to achieve the title of ‘philoso- phiae doctor’. Well, I have not really pursued them for the sake of the title (which like all titles is seldom a guarantee of anything, especially today), but rather spurred by curiosity about certain questions stemming for a passion (which, like all passions, is not a guarantee of success and quality) for physics, mathematics, logic, and philosophy — in a word, a passion for natural philosophy. My final undergraduate studies concerned general relativity, and the first thought was to continue my graduate studies in that beautiful field. But I still had a strong curiosity and antipathy towards quantum mechanics, which originated from the related undergrad- uate studies. So my graduate studies took a quantum-mechanical turn. The subject was, and still is, absolutely counterintuitive. Teachers say that it is the physical phenomena de- scribed that are counterintuitive. I believe that it is the model we have build of them that is counterintuitive, not the phenomena themselves; or I could wickedly say: it is the absence of physics in the model that is counterintuitive.1 As I say later in these pages, quantum mechanics looks like a Linnaean ‘catalogue’ and systematisation of interesting phenom- ena; but its phenomenological system lacks physics as Linné’s system lacked genetics, so to speak. A very important turn in my studies was the finding and reading of Jaynes’ Probability Theory: The Logic of Science [390]. I can still recall the wonder and enlightenment I felt reading the first chapters of that book. Probability theory had up to that point been for me just a difficult and obscure subject. Suddenly it was shown by Jaynes to be just a

1To this, one should also add the unscientific attitude of many physicists who indulge in speaking about ‘unsolvable mysteries’ or the like; such physicists would have made very good priests.

1 2 I. PROLOGUE province of logic, with few, simple, and clear principles, from which all the rest can be derived and on which all applications are based. It turned out that this point of view was quite old, at least as old as Leibniz’ studies, and that it had been and still is cultivated not only by physicists but also by logicians like Hailperin [312, 313]. From this new point of view, and reading other studies by Jaynes and by others (e.g. Caves, Fuchs, Schack) having similar views, quantum mechanics began to appear somewhat less incomprehensible. More precisely, some of its mathematical objects and structure acquired new meanings, quite different from those I had been taught. But their origin, the physical idea behind them, was and still is missing. Another turn, rapidly following the Jaynesian one, was the reading of Hardy’s Quan- tum theory from five reasonable axioms [316]. Hardy presented a mathematical structure completely isomorphic to that of quantum mechanics, but without imaginary numbers. I had not thought that something like that could ever be possible; but it is (and, a posteriori, one sees that this possibility is a quite trivial fact). More than that, Hardy started from a mathematical framework that is as much simple as it is general, and can be applied to classical as well as quantum mechanics, and provides a lot of insight into the mathemat- ical structures of these theories. My curiosity was drawn at first principally toward that framework. It appears to have been around at least from the fifties. The third turn, no less important than the other two, was the reading and studying of Truesdell’s work and studies in rational continuum thermomechanics. Also in this case I felt enlightenment and enthusiasm. Truesdell and the other workers in rational ther- momechanics — Coleman, Ericksen, Noll, Owen, Serrin to name just a few — present thermodynamics in the same way as rational mechanics, or analytical point mechanics, or continuum mechanics are usually presented. Not only that: they present continuum me- chanics (including electromagnetism) and thermodynamics as one whole beautiful subject. They dispelled many of the preconceptions I had learnt to parrot in thermodynamics: that temperature and entropy are ‘defined’ only in equilibrium and for ‘quasi-static’ processes, that the entropy function determines the equation of state, et similia. But Truesdell’s writ- ings — which are often literary gems and like Jaynes’ are colourful, provocative, personal, unlike the many dull, dry writings that follow some recommended ‘scientific style’ in or- der to achieve a semblance of objectivity — have given and give me much more than only thermodynamic insights. They are a guide and reference in scientific thinking; in understanding what physics and physical theories are; and in quality standards — which, unfortunately, I seldom meet albeit I strive to achieve them. What about quantum theory, after these studies? On the one side, I am no longer interested in it. ‘Classical’ physics is more beautiful; it is ‘classical’ also in the literary sense. On another side, I am still interested in quantum mechanics in the sense in which they say ‘know your enemy’. Its understanding can eventually lead to its resolution into ‘classical’ physics, a possibility for which I see no physical or conceptual obstacles. I discuss this in more detail at various places in these notes. 3

Purpose and contents of these notes

The public-made articles reprinted with these notes, a list of which is given on page xiii, are a part of the fruits of the studies inspired by the work of Jaynes, Truesdell, and Hardy. They concern plausibility theory and on the general features of the plausibilistic proper- ties of physical theories. They are minuscule notes and bricks that I hope may be used in the music and edifice of natural philosophy: to replace older and partly eroded bricks, to modulate between two or more arie. It is ennobling to contribute, however modestly, to the same sublime music and great edifice to which the geniuses of Aristotle, Euclid, New- ton, Leibniz, Euler, Laplace, Gauss, Cauchy, Hamilton, Maxwell, Boole, Gibbs, Poincaré, Hilbert, Einstein, Wittgenstein, Tarski, Truesdell and many others contributed foundations, themes, vaults, counterpoints, buttresses, stretti, pillars, rhythms. The chapters that follow are only meant as concrete or ligatures amongst the reprinted articles, to show their unity and fill some gaps. Thus, these notes are not meant to be a short introduction to plausibility theory, or plausibility logic, or statistical models, or classical or quantum mechanics, or other subjects touched in the discourse. Rather, knowledge of these subjects is to a great deal presupposed in order to read the articles presented. These notes may rather be, for some readers, an invitation to read and study the cited works on these subjects, so that these readers can build by themselves a context against which to read and understand the articles; for other readers, a reminder of those works and subjects. As invitation or as a reminder, they are accompanied by personal remarks on some of those works and on the subjects themselves. The presentation is not linear, since the subjects presented interconnect and touch one another in a variety of points and ways. This is also the reason for which I have chosen to divide the text in continuously numbered sections. ‘What do you mean by “plausibility theory”?’ some will ask. Plausibility theory is that province of logic which describes and formalises the way we reason, or we think we ought to reason, about plausibility. In precisely the same sense, well-known deductive logic describes and formalises the way we reason about truth. There are many mathemat- ical similarities between plausibility theory and the various probability theories that stroll and limp around. But its strength lies in its meaning and its interpretation; and on this account it is intimately related to Bayesian probability theory. Chapter II, containing a short introduction and summary of plausibility logic, with references and brief historical remarks, provides some context against which to read Papers (F) and (G), which repre- sent my small contributions to the problem of induction within plausibility theory. In this chapter I also give some additional remarks and addenda to those papers. ‘And what do you mean by “plausibilistic properties of physical theories”?’ others will ask. I mean those quantitative properties of the predictions, expressed as plausibility distributions, that a physical theory allows us to make, and above all the certainties or plausibilities of these predictions, independently from what these predictions are physically about. There is a mathematical framework which is particularly suited to this study. I call it the ‘vector’ or ‘convex’ framework since it is based on linear algebra and convex geometry plays an important part in it. It is intimately related to the theory of statistical models. Chapter III is a sort of introduction to this framework from a very general perspective and with some historical remarks, and thus provide some context against which to read 4 I. PROLOGUE

Papers (C) and (D). A panoramic of some advantages and insights provided by the vector framework is also given. Papers (B) and (E) are examples of such insights. Plausibility theory as discussed in chapter II is at the heart of the vector framework discussed in chapter III. Indeed, chapter IV shows that the problem of state assignment, as formulated in the vector framework, is just a case of a problem of induction in plausibility theory. This presents the ideas and the general formulae on which Papers (H) and (I) are based. Additional remarks are also given. The remaining part of my studies’ fruits, especially those on thermodynamics, statis- tical mechanics, and continuum mechanics, is not presented here (if not en passant in Pa- pers (B) and (E)) although it constitutes a good 50% of the total, because it consists more in personal insights than in original contributions, and also because of spatio-temporal lim- itations. In chapter 45, however, I sketchily present some ideas of mine for future studies, and it will be clear that many of these concern thermodynamics, statistical mechanics, and continuum mechanics. Since I think that the study of past literature, which often hides extremely beautiful gems, is essential to research and gives many insights, I have tried to provide as many bibliographic references as possible. The notation follows ISO [361] and ANSI/IEEE [357] standards.

Other remarks

Of course, under my post-graduate studies I have also formed some opinions and got a clearer personal view on research in general. These opinions and view surely emerge in the remarks presented on the following pages. As well as opinions I have also had some ‘revelations’, unfortunately mostly negative, like e.g. the fact that the ‘peer-reviewed’ literature of our times counts too many unworthy and erroneous papers. I do not mean papers which present theories that at a deeper analysis prove to be inconsistent, or that yield wrong predictions; for these are part of the crab march of science. I mean papers whose authors clearly do not know the subject they are working with, or do not have a minimum of logic, or the mathematics is completely erroneous at an elementary level. And I am not saying that such papers are the majority, but their number is for my own taste too high. The problem is not really with such papers themselves and with their content, for ‘quadratures of the circle’, equations like ‘2 + 2 = 5’, et similia have always been, and will always be, presented. The problem is that those papers were supposedly ‘peer-reviewed’. This simply means that ‘peer-reviewed’ publication is not a guarantee for quality. Perhaps it gives a guarantee for plausible quality; but from my personal experience as reader this plausibility does not reach, not for all but yet for many famous journals, a level which I can personally accept. A far greater problem with many famous ‘peer-reviewed’ journals is that they have often been and still are a vehicle for ostracism from the current ‘establishment’. New or provocative but mathematically and logically irreproachable ideas meet resistance in the publication phase; but the naturally understandable resistance should be met in the reading phase, not in the publication one. So we are in the perverse situation in which 5 ridiculous papers are ‘peer review’-accepted and published simply because they parrot what the Celebrity of the Moment (often only a scientific pygmy) has said, and beautiful papers are ‘peer review’-rejected simply because they say something provocative though ingenious. Jaynes’ papers are an example of the latter kind; he had to publish mostly in conference proceedings. Another ‘problem’ are the constraints that ‘peer-reviewed’ journals impose on the style of a paper. These constraints are completely understandable, since each journal is characterised by its style, has its aficionados, and must find formal devices to maintain an appearance of objectivity. And yet, my personal opinion is that with regard to style there are no authorities other than those one self chooses, and the Chicago Manual of Style, e.g., may well be a bible for someone and au pair with toilet paper for someone else. I am not willing to adapt the style of a paper to that of a particular journal only to be published therein. Fortunately today, thanks to the internet, all the problems above are no problems any- more. Any scholar can post his or her studies, written in a completely personal style, in scientific archives which are freely and publicly available — an essential requisite of sci- entific studies —, like arXiv.org (http://arxiv.org) and mp_arc (http://www.ma. utexas.edu/mp_arc/), and the ‘peer reviewers’ are the final readers, who may praise the paper or throw it away, and hopefully send criticisms and comments to the author in both cases. No ostracism can be excised in between: the responsibility of judging the paper lies entirely on the readers, not on ignote third parties — and I think this is a very important point in science. Much time is also gained in avoiding discussions with ‘referees’ who sometimes are even incompetent as regards the paper’s subject or have no reasons to crit- icise a paper other than that it shows weaknesses in some work of their own or of their friends and allies.2 In order to have opinions and criticisms about some study of mine I prefer to ask some colleague who I know to be competent and frank. It is for this reason that, as of 2007, I have for the moment decided not to submit any papers to ‘peer-reviewed’ journals any longer. I post them mainly to arXiv.org, and whoever wants to judge them must do so by reading them, not by looking where they are published. Another negative revelation has been the knowledge-tight separation between different physical disciplines. Too many quantum physicists, e.g., have no idea of the new results achieved in thermodynamics since the sixties (many will still tell you that temperature is defined only at equilibrium3 and that equations of state can be derived from the entropy function); and many quantum experimentalists are unaware of the theorists’ profound in- sights, some of which are seventy years old, into the mathematical structure of quantum mechanics. This separation can even become puerile hostility in the relationship between physics and mathematics, or physics and philosophy. Quite disappointing is the philosoph- ical naïveté of so many physicists, even of some otherwise very competent; grotesquely, it is usually accompanied by scorn for the philosophers’ activities.4

2I think that a useful thing would be if ‘peer-reviewed’ journals periodically published a list of rejected papers along with the names of the referees who rejected them and the grounds for rejection. 3They could as well say that ‘position’ is defined only at rest. 4I have read somewhere that Wheeler said ‘Philosophy is too important to leave to the philosophers’. I hope he was joking. Surely I should not leave philosophy to people talking about ‘it from bit’, ‘complementarity’, or other cheap and infantile ideas.

II. Plausibility logic and induction

A Bayesian statistical test

1. Suppose that a person called Ingo is in a certain room at a certain time and, because of various reasons that need not interest us now and that we denote by I, Ingo strongly believes or is almost sure that a person called Ambrose will come into the room within the next minute, a hypothetical fact that we denote by the proposition A. Someone asks Ingo what is the probability of A, and he says

P(A| I) = 0.001. (II.1)

A minute passes, and Ambrose has not come in, therefore Ingo and we agree that A is false. Now call eq. (II.1) a ‘model’, consider all the facts that I have told you, and answer the following test question: Do you think that eq. (II.1) is a good or a bad model? ‘Yes’: plausibility theory, and Bayesian probability theory in particular, is not for you. ‘No’: plausibility theory and Bayesian probability theory are probably for you. Or perhaps you have a third answer — the most appropriate one: ‘Wait a moment: first you must tell me what eq. (II.1) is a model for!’. The point in fact is that those who answer ‘Yes’ evidently conceive P(A| I) as a ‘model’ of how things are, in this case those denoted by A. Those who answer ‘No’ instead conceive P(A| I) as a ‘model’ of how the beliefs about A of the person who knows I are. In plausibility theory, P(A| I) quantifies a belief, not the ‘reality of a fact’ or ‘how often a facts has happened’ or ‘the tendency of a fact to happen’ or something similar. And an equation like (II.1) is a ‘model’ for a belief about A, given the knowledge I. Hence eq. (II.1) is, in this particular case, a really bad ‘model’, since it does not reflect nor quantify well at all what Ingo believed about A when he knew I (remember that he was almost sure that Ambrose would come). It does not matter, for the purposes of judging the model, if A is then observed or not, if Ambrose came or not.

7 8 II. PLAUSIBILITY LOGIC AND INDUCTION

Deductive logic

2. Physicists have generally a great respect for logic and formal logic1 and try never to break its rules, even if this sometimes happens anyway. If for instance a physicist, in a given context I, first asserts the proposition A — and this can be symbolically written as I |= A or T(A| I) = 1 — and then the proposition B, T(B| I) = 1, and later asserts that A and B are together false, T(A ∧ B| I) = 0, then this is sure to bring forth a lot of papers and comments attacking his work and perhaps even him personally as well. Because we all know that if you state I |= A, and you state I |= B, (II.2)

then you must state I |= A ∧ B Because thus is the way we reason about facts and hypotheses and their truth, and we are not willing to accept conclusions by a person who does not follow this way of reasoning. In this sense formal logic is normative: in it we have distilled, formalised, and in some cases simplified the way any rational person should reason about facts and hypotheses. An important characteristic of formal logic is that it is not directly concerned with the subject of the reasoning it formalises. If I state that all fairies are under ten centimetres tall, then assert that Uranium is a radioactive element, and finally assert that it is not true that all fairies are under ten centimetres tall and that Uranium is a radioactive element, you would certainly object to my conclusion. Not on the grounds that fairies do not exist, but on the grounds that I violate the conjunction rule above. In fact, I may even reason about statements that are empirically false, but if I respect the rules of logic my reasoning is completely self-consistent. In this case you can argue against the truth of my conclusions not because of my reasoning, but because the premises I used were empirically false. We see then that in logic we have some ‘initial’ propositions (the premises or axioms or hypotheses) and some ‘final’ propositions (the conclusions), constructed by means of various logical connectives like ‘and’ (∧), ‘or’ (∨), ‘not’ (¬), and others. Logic is only concerned about the relationships between the truth or falsity of the former with that of the latter; but it is not concerned on whether the former or the latter are actually true or false.

3. The form in which formal logic is constructed stems from the fact that the common notion of ‘truth’ can be taken to be, in many situations at least, dichotomic. This means that the truth or falsity of a proposition A can be represented by one of two values — ‘t’ and ‘f’, ‘>’ and ‘⊥’, ‘0’ and ‘1’, or whatever — associated to that proposition. So we could write something like T(A) = 1 in order to express that A is true. In this notation, however, some

1Many are the books and textbooks in logic. A list including same classics can count Mill [531], De Mor- gan [160], Johnson [399–401], Lewis and Langford [464], Tarski [689], Quine [611, 612], Strawson [675, 676] Copi and Cohen [135, 136] Suppes [682], Hamilton [315], Barwise and Etchemendy [51, 52] (see also [48–50]), van Dalen [149]. See also Adams [1, 2] and Hailperin [313], who present the connexions between formal logic and probability theory. §4. 9

T(A| I) T(B| I) T(¬A| I) T(A ∧ B| I) T(A ∨ B| I) 0 0 1 0 0 0 1 1 0 1 1 0 0 0 1 1 1 0 1 1 Table II.1: Truth tables for negation, conjunction, and disjunction. important details are left implicit. Perhaps the proposition A is empirically true, or perhaps we are entertaining its truth only hypothetically or conditionally on other hypotheses or on a model. In other words, the truth of a proposition is always best referred to a context. Denoting such a context by I, we can then write ‘T(A| I) = 1’, as in fact we did in § 2. This expression can be read ‘the truth-value of A, given (or: in the context) I, is “true” ’. In formal logic this is sometimes written ‘I |= A’. Note that, here and in the following, the discussion is restricted to propositional calculi; first- and higher-order calculi are not considered. Therefore, thanks to the completeness theorem of the propositional calculus, we need not care about the distinction between ‘I |= A’ and ‘I ` A’ (for this distinction see the refs in footnote 1). The relationships between the truth-values of the premises and those of the conclusions are usually given by truth tables like those of Table II.1. They can also be compactly written as the following set of equations:

T(¬A| I) = 1 − T(A| I), (II.3a) T(A ∧ B| I) = T(A| I) × T(B| I), (II.3b) T(A ∨ B| I) = T(A| I) + T(B| I) − T(A| I) × T(B| I). (II.3c)

Note that these equations rely on the interpretation of the symbols ‘0’ and ‘1’ as mathe- matical symbols and on their mathematical properties (the tables II.1, on the other hand, do not rely on this interpretation).

4. The relations between the truth-values of premises and conclusions expressed in table II.1 and eqs. (II.3) are quite standard. But we can present the reasonings that they express in a slightly different way, and modify the formal relations accordingly. Take relation (II.3b) for instance. Its form apparently indicates that the assertions of the truth or falsity of A and B are made independently; i.e., when we state B for example, we do not need to know what has been asserted about A, and vice versa. Indeed, the expression ‘T(B| I)’ e.g. bears no trace of the proposition A. This way of reasoning can be weakened. We can assert the truth of A and then, given this assertion, assert the truth or falsity of B as well. We can express this conditional truth-value2 by the formula ‘T(B| A∧I)’. It expresses the fact that the truth-value of B is referred not only to the context I, but to the truth of A as well. In the same way can we introduce the formula ‘T(A| B ∧ I)’. From this point of view

2Not to be confused with the ‘material conditional’, for which see again the refs in footnote 1. 10 II. PLAUSIBILITY LOGIC AND INDUCTION we have that if we assert A, and then, given A, we assert B, then we must necessarily also assert their conjunction A ∧ B. The truth table for the conjunction can then be modified in the two following equivalent ways:

T(A| I) T(B| A ∧ I) T(A ∧ B| I) T(B| I) T(A| B ∧ I) T(A ∧ B| I) 0 n.d. 0 0 n.d. 0 0 n.d. 0 0 n.d. 0 1 0 0 1 0 0 1 1 1 1 1 1 where ‘n.d.’ stands for ‘non-defined’: it does not make sense to assign, e.g., a truth-value to B given A, if A is not given. In terms of ‘conditional’ truth-values, the rules (II.3) can be rewritten as follows:

T(¬A| I) = 1 − T(A| I), (II.4a) T(A ∧ B| I) = T(A| I) × T(B| A ∧ I) (II.4b) = T(B| I) × T(A| B ∧ I), T(A ∨ B| I) = T(A| I) + T(B| I) − T(A| I) × T(B| A ∧ I) (II.4c) = T(A| I) + T(B| I) − T(B| I) × T(A| B ∧ I).

5. I must now hurry to make clear that what I have called ‘conditional truth-value’ is not a part of the logic usually taught in undergraduate courses; although Adams [1, 2] and Hailperin [313] introduce calculi which include this conditional truth-value. Something similar to it is also present in the formalisms of natural deduction or sequent calculus, for which see e.g. [46, 51, 52, 149, 261, 271, 355, 445, 583], although these formalisms concern the logical rules more from the syntactical point of view than from the semantical one [688, 690] apparently adopted here. There also seem to be approaches that try to take the context into consideration, for which see e.g. Barwise [48, 49], Gaifman [254]. The general questions related to conditional truth-values and the notion of context, together with the related studies, are quite exciting. In my opinion they are also likely to contribute to the current discussion and synthesis of the syntactic versus semantic approach to logic, for which read the innovative work and entertaining discussions by Girard [271, 272, 274, 275] (cf. also [408, 681]). The reason why I introduced a conditional truth-value is to make a smoother shift to inductive logic, also called probability logic.

6. To summarise: formal logic, in its basic form, • is a model — a simplification, schematisation, formalisation — of the way we reason about ‘truth‘; • simplifies and represents ‘truth’ as a dichotomic notion; • is concerned about the relationships amongst the truth-values of premises and con- clusions; • does not prescribe any truth-values for the premises. §7. 11

Note that if we intend ‘true’ in a more specialised sense, e.g. as ‘provable’ as the intuitionists do [90, 271], then the formalisation of the way we reason about it can be different and we can have, e.g., intuitionistic logic, which can be said to have three truth- values and wherein the law of excluded middle is not valid.

Plausibility logic

7. Suppose now that we want to build a model of — i.e., simplify, schematise, for- malise — the way we reason about plausibility. I say ‘plausibility’, and not ‘probability’, because the latter word is amongst scholars so impregnated with contrasting connotations and denotations — frequency, ‘propensity’, chance, etc. — as to render it unsuitable for our purposes. In fact, what interests me is the everyday meaning of ‘plausible’, the one present in utterances like ‘Yes, that’s likely!’. Forget about frequency, chance, indetermin- ism, propensity, and other similar concepts. For the purpose of building a model, a first characteristic of the notion of plausibility that we notice is that it comes in degrees, in contrast to truth which, at a basic level, has dichotomic characteristics. We say ‘very plausible’, ‘barely plausible’, ‘very unlikely’, and, at the extremes, ‘impossible’ and ‘certain’. Moreover, the last two expressions are often connected with (though they are not synonyms for) the expressions ‘false’ and ‘true’. This suggests two things. The first is that in our model we can represent plausibility by a continuum. Of course, in our everyday expressions about plausibility we do not have the resolution of a continuum; but this schematisation offers many mathematical advan- tages. The second thing is that two points of this continuum must somehow correspond to our representation of ‘true’ and ‘false’. Already at this point we see that our model for plausibility is likely to require a more complex mathematical apparatus than that for truth. A second characteristic noted about these plausibility degrees is that they are often ordered. We say ‘this is more plausible than that’, ‘that’s less likely’, and so on. This suggests that our modelling continuum be an ordered one, and to keep things simple we may take it as completely ordered. Hence a generic interval [a, b] like the ordered extended real line [−∞, +∞], or [0, 1] will suit our purposes. The lower and upper limit of the interval must correspond to the notions of ‘impossible’ and ‘certain’. We can therefore introduce the expression ‘P(A| I)’, taking value in a for now unspeci- fied interval, to represent the plausibility of the proposition A in the context I. Remember that in formal logic we can use beside the expression ‘T(A| I) = 1’ also ‘I |= A’; and beside ‘T(A| I) = 0’ also ‘I |= ¬A’. A notation similar to that employing ‘|=’ would be not possible in plausibility logic, since we have there a continuum of possible values [cf. 314, § 2]. (But see how Hailperin [312, 313] extends the meaning of the symbol ‘|=’.) Some authors, like Gaifman [253], Dubois and Prade [178], and perhaps de Finetti [234] (depending on how one reads his words), see the plausibility of a proposition as defined ‘on top’ [178, § 3.3] of its truth value. This point of view seems to me superfluous and is not the one I adopt. I rather see the plausibilistic and the truth-value structures to live beside each other. For we usually say ‘It is likely that she went there’, not ‘It is likely that it is true that she went there’; the second proposition is just a roundabout form of the first 12 II. PLAUSIBILITY LOGIC AND INDUCTION one.

8. Having found a way to represent the notion of plausibility, we must decide which rules connect the plausibility values of the premises to those of the conclusions formed by means of the logical connectives. Just as in the case of formal logic, the rules must reflect — although necessarily in a simplified and schematised way — what we consider as the ‘correct’ way of reasoning about plausibilities. In this first phase plausibility logic has therefore a descriptive rôle. An explicit analysis of how these rules should reflect our plau- sibilistic reasoning is apparently first given by Cox [138, 139] and then by Jaynes [371– 373, 390] and Tribus [706, 707]; cf. also Jeffreys [397, 398] and Pólya [598–600]. I say ‘explicit’ because this conception of the theory of probability is already held by Maxwell, by Laplace [451, p. CLIII]: la théorie des probabilités n’est, au fond, que le bon sens réduit au calcul ; elle fait apprécier avec exactitude ce que les esprits justes sentent par une sorte d’instinct, sans qu’ils puissent souvent s’en rendre compte and even by Leibniz and Jacob Bernoulli, as discussed in Hacking’s studies [309–311]; see also Hailperin’s [313] historical discussion. Once this analysis is done, the correct way of reasoning about plausibilities should be represented by a set of equations like (II.3) — we cannot express the rules by tables like II.1 because, to repeat myself, we are dealing here with an infinite set of degrees of plausibility. The ensuing set of equations depends on the interval we have chosen to represent the plausibilities; though all these sets are, of course, isomorphic to one another; i.e. they all represent the same reasoning. For example, if we make the convention that P(A| I) ∈ [0, 1] (clearly for general A and I), where ‘0’ represents impossibility and ‘1’ represents certainty, then the rules are

P(A| I) ∈ [0, 1], (II.5a) P(¬A| I) = 1 − P(A| I), (II.5b) P(A ∧ B| I) = P(A| I) × P(B| A ∧ I) (II.5c) = P(B| I) × P(A| B ∧ I), P(A ∨ B| I) = P(A| I) + P(B| I) − P(A| I) × P(B| A ∧ I) (II.5d) = P(A| I) + P(B| I) − P(B| I) × P(A| B ∧ I), formally identical with those for truth-values in terms of conditional truths. These are the rules which we all know. With the convention that P(A| I) ∈ [−∞, +∞] instead, where ‘−∞’ represents impossibility and ‘+∞’ represents certainty, the rules take the form

P(A| I) ∈ [−∞, +∞], (II.6a) P(¬A| I) = − P(A| I), (II.6b) h i P(A ∧ B| I) = P(A| I) + P(B| A ∧ I) − ln 1 + eP(A|I) + eP(B|A∧I) h i (II.6c) = P(B| I) + P(A| B ∧ I) − ln 1 + eP(B|I) + eP(A|B∧I) , §9. 13 the rule for the disjunction having a very complicated expression. You can find that ex- pression by yourself from the isomorphism relating the rule set (II.5) to the rule set (II.6); here it is: P(A| I) P(A| I) 7→ ln . (II.7) 1 − P(A| I) Other plausibility-representation intervals are possible, with corresponding rules; you can find some in Tribus [707], pp. 26–29. It is evident that the first set of rules is the simplest; it does not involve, e.g., transcendental functions, and has the remarkable advantage of making the plausibility of incompatible events additive. It is moreover the one we are quantitatively more accustomed to. The second set, on the other hand, would have the advantage that since the plausibility ranges between −∞ and +∞ we could not confuse or conflate plausibility with frequency — this confusion plagues us still today. In any case, the representation (II.5) is the standard one and I shall not discuss any other equivalent representation any further.

9. I shall not repeat all the arguments leading to the above rules; for this I refer to the studies by Cox, Jaynes, Tribus, and the others already cited; they are an enjoyable and illuminating reading. But the argument leading to the rule (II.5c) for the plausibility of the conjunction, A ∧ B, is particularly compelling and quite intuitive: I present it by counterposing it to other possible rules. So consider the question of relating the plausibility P(A ∧ B| I) to the plausibilities P(A| I), P(B| I), or possibly also P(B| A ∧ I) and P(A| B ∧ I). Consider a rule identical with (II.3b) for truth logic,

P(A ∧ B| I) = P(A| I) × P(B| I). (II.8)

Would this rule agree with common sense? At first sight it has some desired properties; for example, the plausibility of the conjunction is less than those of the conjuncts. But let us examine a concrete example. Take the proposition A B ‘The left eye of the next person you meet is blue’. In the context I of a person living, like I, in Sweden, we could say that the plausibility of A is 1/2. Also the proposition B B ‘The right eye of the next person you meet is brown’ has, in the same context, a plausibility around 1/2. Now according to the rule (II.8) the plausibility of the conjunction, viz. of the proposition A ∧ B ≡ ‘The next person you meet has a blue left eye and a brown right one’, is 1/4. But this is too much (in the context I); I personally should assign a plausibility of 1/100 or less. The rule (II.8) does not satisfy our way of assigning plausibilities. The same problem remains and gets possibly worse with similar rules like some proposed in artificial intelligence, e.g.

P(A ∧ B| I) = min[P(A| I), P(B| I)]. (II.9)

According to this rule the plausibility that the next person I meet have eyes of the two different colours would be 1/2. The standard rule (II.5c), instead, gives an answer in accordance with common sense. It requires in fact that we specify not the plausibility of B per se, but that of B given A. This plausibility I should state at 1/50, and then the plausibility of the conjunction is 1/100, which accords with my judgements. 14 II. PLAUSIBILITY LOGIC AND INDUCTION

Of course, plausibility logic describes, in a simplified way, the way in which we think an ideal rational person ought to reason. But in practise we often do not reason according to our own standards, in plausibility logic as well as in deductive logic. There are very interesting studies by Kahneman, Tversky, et al. [78, 409, 410, 721–724] on typical errors committed in plausibilistic reasoning. Example of common errors in deductive reasoning are given in the first chapters of Copi’s book [136].

‘Plausibilities of plausibilities’, induction, and circumstances: a generalisation

10. The theory that emerges from the rules (II.5) is mathematically identical with prob- ability theory as presented in standard textbooks, and subsumes all usual applications of probability theory. For this reason I do not think necessary to further discuss various the- orems like e.g. Bayes’, the theorem on total plausibility, or standard applications. For these I refer to the works by Jaynes [390], Jeffreys [397, 398], Bernardo and Smith [66], de Finetti [239, 240], Gregory [294], and to the works by Hailperin [313] and Adams [1, 2], which are very relevant yet virtually unknown to the physics community. The import of the plausibility-logical point of view is not so much on the purely math- ematical side, as on the vast new range of applications and insights that it opens. Two of these insights concern the interpretation of the parameters in various statistical models, especially the ‘generalised Bernoulli’ one; and the relation between plausibility and induc- tion. These subjects are discussed in Papers (F) and (G), the contents of which I shall hereafter assume to be known (i.e., this is an appropriate place to read those papers). My purpose here is to extend the Laplace-Jaynes approach to multiple kinds of measurements, so as to make clearer its connexion, to be presented in ch. IV, with the vector framework for the plausibilistic properties of physical systems introduced in the next chapter.

11. The extension of the Laplace-Jaynes approach, as presented in § 5 of Paper (G), to multiple kinds of measurements is straightforward. We introduce different kinds of measurement, denoted by Mk. Each measurement has a set of outcomes {Ri | i ∈ Λk}; we shall often omit the indication of the index set Λk, since no ambiguity should arise. Different instances of these measurements and of their outcomes are denoted by an index ‘(τ)’, as in the paper. Remember that the terms ‘measurement’ and ‘outcome’ have broader denotations than usual. See § 2 of the mentioned paper on this point.) The main question is to determine plausibilities of the form

P(R(τN+L) ∧ · · · ∧ R(τN+1)| M ∧ R(τN ) ∧ · · · ∧ R(τ1) ∧ I), (II.10) iN+L iN+1 iN i1 where the symbol ‘M’ has the same meaning as described in Paper (G), § 2, but for the fact that it can concern different kinds of measurement. The introduction of a set of circumstances proceed as usual; only properties (II) and (III) need to be amended, as follows: §12. 15

III. The amendment to the second property is clear and does not need any comments:

(τ) (τ) (τ) (τ) (τ) (τ) P(Ri | Mk ∧ D ∧ C j ∧ I) = P(Ri | Mk ∧ C j ∧ I) for all τ, k, i ∈ Λk, j, and all D representing a conjunction of measure- ments, measurement outcomes, and circumstances of instances different from τ. (II.11)

IV. Also the amendment to the third property is quite clear: (τ0) (τ0) (τ0) (τ00) (τ00) (τ00) 0 00 P(Ri | Mk ∧ C j ∧ I) = P(Ri | Mk ∧ C j ∧ I) C pi j for all τ , τ , (II.12) (τ) (τ) (τ) which mathematically expresses the fact that we consider the Mk , Ri , C j , for different τ and fixed k, i, j, as ‘instances’ of the ‘same’ measurement, the ‘same’ out- come, and the ‘same’ circumstance. Thanks to this property we can use an expression like ‘P(Ri| Mk ∧ C j ∧ I)’ unambiguously; it stands for (τ) (τ) (τ) P(Ri| Mk ∧ C j ∧ I) B P(R | M ∧ C ∧ I) for any τ, i k j (II.13) ≡ pi j with i ∈ Λk.

12. Having made the above alterations to the properties of the circumstances, the de- termination of the plausibilities (II.10) proceeds along the lines of § 5.2 of Paper (G). Given the plausibilities (τ) (τ) (τ) P(Ri| Mk ∧ C j ∧ I) ≡ pi j (B P(Ri | M ∧ C j ∧ I) for any τ), (II.14) V0 ( j0) (τ) P(C j| I) ≡ γ j (B P( jC j | I) ≡ P(C j | I) for any τ), (II.15)

(with i ∈ Λk), and given some data D consisting in some outcomes of various kinds of measurements,

(τN ) (τ1) D B R ∧ · · · ∧ R (with ia ∈ Λk , a = 1,..., N), (II.16) | i N {z i 1 } a Ri appears Ni times the plausibility assigned to any collection of L measurement outcomes, with frequencies (Li), is given by

P(R(τN+L) ∧ · · · ∧ R(τN+1)| M ∧ D ∧ I) = | i N + L {z i N + 1 } Ri appears Li times X Y  X Y  Li Li P(Ri| Mk ∧ C j ∧ I) P(C j| D ∧ I) ≡ pi j P(C j| D ∧ I), (II.17) j k,i∈Λk j k,i∈Λk with   Q Ni pi j P(C j| I) k,i∈Λk P(C j| D ∧ I) =   . (II.18) P Q Ni pi j P(C j| I) j k,i∈Λk 16 II. PLAUSIBILITY LOGIC AND INDUCTION

13. The next step of this generalisation is to define an equivalence relation amongst circumstances. The natural choice is, of course, Definition 1. Two preparation circumstances C0, C00 are said to be equivalent if they lead to identical plausibility distribution for each kind of measurement Mk:

0 00 0 00 C ∼ C ⇐⇒ P(Ri| Mk ∧ C ∧ I) = P(Ri| Mk ∧ C ∧ I) for all k and i ∈ Λk. (II.19) The propositions above should really all have an index ‘(τ)’, and the equivalence re- lation should be defined for each instance τ; but, thanks to the properties (I)–(IV) of the circumstances, the instantial index can be unambiguously left out. In Papers (F) and (G) we proceeded with disjoining equivalent circumstances together and with ‘plausibility-indexing’ the resulting disjunctions. I shall however follow a slightly different path in the present case, to be discussed in § 39.

Miscellaneous remarks

14. We have not discussed the meanings of the three logical connectives ‘¬’, ‘∧’, and ‘∨’; and I really do not plan to discuss them either, for they should be intuitively clear; appropriate references to logic texts are given in footnote 1. Yet, whilst every logician and mathematician — even if amateurs —, and most philosophers and physicists have a clear understanding of those meanings, there are some individuals that attribute to those symbols ‘non-standard’3 meanings, apparently out of nowhere, as testified by the literature. A non- standard meaning that I have a couple of times noted in the literature is that of conceiving the conjunction as having a sort of temporal meaning, which it should properly not have. E.g., some people interpret the expression ‘A ∧ B’ as implying that B is a statement about matters that temporally precede those of A. Others interpret it as saying that A and B concern simultaneous matters. All I can say in this regard is: this is not standard (not to say erroneous)! People employing ‘conjunction’ that way should better find another name and another symbol for it, since their idiosyncratic use clashes not only with old logical and mathematical conventions and usage, but also with engineering standards like ISO [361] and ANSI/IEEE [357] (which adopt the standard logical usage). Analogous idiosyncrasies I have noted with respect to the conditional symbol ‘|’, al- though in this case they are more common and can even form minor schools. Some people attribute to the conditional symbol ‘|’ a temporal or even a causal meaning, interpreting ‘A| B’ as saying that B comes before A or that B causes A. That this is not the case is easily seen by considering

A B ‘The sky is cloudy’, B B ‘It is raining’. (II.20a)

Given that it is raining, I think you agree with me that it is very likely that the sky is cloudy as well; in symbols, this is written

P(A| B ∧ I) = a, (II.20b)

3In this case the boundary between ‘non-standard’ and ‘erroneous’ is very fuzzy. §14. 17 where a is near 1. This expression makes complete sense. But in it the rain comes surely not before the cloudiness, nor is the rain a cause of the cloudiness (one could rather say the opposite). This very simple example shows that the conditional symbol ‘|’ expresses a log- ical, not a temporal or causal relation here. It expresses relevance in respect of knowledge. We have another example if we consider two persons, say husband and wife, only one of whom has, in the pocket, the only key to a particular door. Suppose the husband does not know whether the key is in his own pocket. But he comes to know (e.g., by a telephone call, or reading a note on a paper) that the wife — who can be miles away — does not have the key in her own pocket. This automatically implies for the husband that the key is in his pocket. Denoting by

A B ‘The husband has the key’, B B ‘The wife does not have the key’, (II.21a) and by I the whole context (which include the assumption that none lost the key), we can rightfully write P(A| B ∧ I) = 1. (II.21b) But again, there are no temporal or causal connections between the facts stated by A and B here (the wife’s not having the key is not the ‘cause’ of the husband’s having it). The ‘|’ expresses the relevance of the second piece of data, B, for the first, A. I do not understand why some people would like to limit the scope of the conditional symbol ‘|’ to temporal or causal connexions only, thus forbidding completely meaningful expressions like (II.20) and (II.21). Such relevance relations and judgements are routinely used, whether implicitly or explicitly, also in laboratory practise, where it is often the case that some detail of a set-up tells us something about some other detail, yet without being a ‘cause’ of the latter or ‘preceding’ it in time. And of capital importance is their use in communication theory, where a fundamental problem is to increase our knowledge about a sent signal or message, A, from knowledge of a received one, B. The plausibility P(A| B∧I) is here the fundamental quantity to be judged, and the temporal and causal arrows clearly go from A to B, not vice versa! Communication theory would simply be impossible if the meanings of the conditional and the conjunction symbols were restricted as those people use them. When such people, ignoring their own idiosyncrasies, read and comment works that use standard notation, some preposterous results may arise indeed. As in the case of Brukner and Zeilinger’s analysis of Shannon’s entropy in quantum experiments [94]. Shannon’s use of the conjunction and of the conditional symbol in his deservedly famous article [657; cf. also 658] is quite standard of course — otherwise he could not have built any theory at all on the problem of ‘reproducing at one point either exactly or approximately a message selected at another point’ [657, Introduction]. Concrete example: in § 12 he explicitly calculates and states that P(A| B ∧ I) = 0.99, where B B ‘A 0 is received’ and A B ‘A 0 was transmitted’. But Brukner and Zeilinger did apparently not notice this, and based their analysis on the tacit understanding that ‘A| B’ means that the measurement related to B temporally precedes that related to A. This led them, amongst other things, to change a whole experimental set-up, in which A preceded B, into a new one in which the order of measurement was reversed — all in order to calculate quantities involving ‘A| B’! It 18 II. PLAUSIBILITY LOGIC AND INDUCTION is as if, in order to calculate the plausibility of a particular input given the output of a communication channel, we needed to exchange the input with the output; with the only result of constructing — granted the possibility of that operation — a different channel. Simply meaningless! A recount and more detailed analysis of this and related questions is given in Paper (A).

15. We already saw that what plausibility logic does is to give us the plausibilities of some propositions, the conclusions, provided that we give those of other propositions, the assumptions. And as remarked this is completely analogous to what truth logic does. An aspect on which there has been and there still is much debate is the question of making the initial plausibility assignments. The various positions within the Bayesian community as regards this aspect can be roughly grouped into two groups. One group, the ‘objectivists’, maintain that the plausibility of whatever proposition A is determined by the context I, if the latter is completely specified. To this group seem to belong Jeffreys (but see [757]), Carnap, and Jaynes amongst others. The other group, the ‘subjectivists’, maintain that the plausibility of whatever proposition A is a matter of subjective judgement, and two persons sharing exactly the same context could nevertheless assign different plausibilities to the same proposition. To this group belong de Finetti and Savage amongst others. I must say that the statement that ‘two persons sharing exactly the same context can nevertheless assign different plausibilities’ appears a bit vacuous to me, since no two per- sons can ever share exactly the same context. I think that it is always possible to trace a difference in plausibility assignments back to different experiences, i.e., different contexts (I have done some informal experiments upon friends on this — try yourself: just keep asking ‘why?’ long enough). This would seem to strengthen the thesis of the objectivists. And yet, I think it also impossible to specify a context exactly (in those experiments, the answers to the whys always tend to be hazier and hazier). So the objectivist programme appears somewhat vacuous as well. In fact, I am not convinced that the methods that have been proposed to ‘mechanically’ assign a plausibility from some kinds of contexts really achieve their purpose. However, I do not mean that those methods are not useful; quite the contrary. My personal position simply is that the question of the initial plausibility as- signments does not concern, and cannot be answered by, plausibility theory — just like the question of the initial truth-value assignments does not lie within the jurisdiction of formal logic. Both assignment problems are certainly very important, but their settling resides at a ‘meta-theoretical’ level that I think cannot be fully caged into a mathematico-logical formalism. The assignment of initial plausibilities and initial truth-values is a matter of discussion and convention that lie outside any precise theory.

16. A deplorable habit of not few physicists, and not few mathematicians and logicians either, is that of neglecting the context in which a given plausibility assignment applies. This is very surprising as regards physicists, because they are surely the first to pay atten- tion to, and to specify as exactly as possible, the details and boundary conditions of their experiments. The problem is that this negligence leads to errors, seeming contradictions, and false statements. An egregious case is the statement, which I have read more than once, §16. 19 that ‘quantum mechanics violates the sum rule of probability theory’, i.e., rule (II.5d):

P(A ∨ B| I) = P(A| I) + P(B| I) − P(A ∧ B| I), (II.5d)r especially with regard to the double-slit experiment and similar ones. Let us see what plausibility theory says about this experiment. See also Szabó’s discussion [683, 684] (although he assumes a frequentist point of view). Consider the usual set-up with two slits, called ‘A’ and ‘B’, the source of particles, the detection screen, etc. Introduce the mutually exclusive propositions

IAB¯ B ‘Slit “A” is open, “B” is closed’, (II.22)

IAB¯ B ‘Slit “B” is open, “A” is closed’, (II.23)

IAB B ‘Both slits are open’, (II.24) and the propositions

A B ‘The particle passes through slit “A”’, (II.25) B B ‘The particle passes through slit “B” ’, (II.26)

Dx B ‘The particle is detected at the screen position x’. (II.27) Our invariable situation will be the following:

I B ‘A particle is sent towards the slits, and is detected somewhere in the screen’. (II.28) So we shall not consider the situation in which the particle does not arrive at the screen, e.g. because it did not pass through any slit. In our situation the particle must have passed through one of the slits, hence we have

P(A ∨ B| I) = 1, (II.29) and note that this holds in each of the cases IAB¯ , IAB¯ , IAB. We also assume that we are speaking of particles that do not break into parts, so that

P(A ∧ B| I) = 0; (II.30) i.e., the particle cannot pass through both slits. Now consider the case in which both slits are open, IAB. What is the plausibility that the particle is detected at x? By the rules of plausibility logic, it is

P(Dx| IAB ∧ I) = P[Dx| (A ∨ B) ∧ IAB ∧ I],

= P(Dx| A ∧ IAB ∧ I) P(A| IAB ∧ I) + P(Dx| B ∧ IAB ∧ I) P(B| IAB ∧ I), (II.31) i.e., the sum of the plausibility that the particle is detected at x given that it passes through ‘A’ times the plausibility that it passes through ‘A’, and the analogous product for ‘B’. What more does plausibility theory says? Nothing! 20 II. PLAUSIBILITY LOGIC AND INDUCTION

To have a numerical answer we must specify the functions

P(Dx| A ∧ IAB ∧ I) = fA(x), P(Dx| B ∧ IAB ∧ I) = fB(x), (II.32) and the quantities

P(A| IAB ∧ I) = pA, P(B| IAB ∧ I) = pB; (II.33) but that is not something that plausibility theory can do. Those plausibilities must be dic- tated by some physical theory. Until this has been done, plausibility theory leaves open an infinity of possibilities; in particular, also those dictated by quantum mechanics: plau- sibility theory does not exclude ‘interference effects’. So why do some people say that ‘quantum mechanics violates plausibility theory’? probably what they mean is that the plausibilities (II.32)–(II.33) assigned by quantum mechanics are different from those as- signed by some ‘classical’ physical theory. First conclusion: quantum mechanics does not violate the rules of plausibility theory; it only assigns plausibilities that differ from those of some classical theory. But both theories respect the plausibility rules. From the form of eq. (II.31) we also notice another fact. Since, from eqs. (II.29) and (II.30), P(A| IAB ∧ I) + P(B| IAB ∧ I) = 1, it follows that the plausibility for the particle to be detected at x can never be equal to the sum of P(Dx| A ∧ IAB ∧ I) (the plausibility that the particle be detected there given that it passes through ‘A’), and P(Dx| B ∧ IAB ∧ I) (similarly, but passes through ‘B’). At most, there can be a proportionality with factor of 1/2 when P(A| IAB ∧ I) = P(B| IAB ∧ I) = 1/2. But then, what about Feynman’s formulae [230, § 1-1; 231, ch. 1]

‘ P12 = P1 + P2 ’, (II.34a)

‘ P12 , P1 + P2 ’? (II.34b)

His notation with ‘P12’, ‘P1’, and ‘P2’, as is clear from his discussion, stands for

P12 B P(Dx| IAB ∧ I), (II.35)

P1 B P(Dx| IAB¯ ∧ I), (II.36)

P2 B P(Dx| IAB¯ ∧ I); (II.37) i.e., P12 is the plausibility that the particle be detected at some point given that both slits are open, P1 the plausibility given that slit ‘A’ is open and ‘B’ closed, and P2 the plausibility given that slit ‘B’ is open and ‘A’ closed. You note that these are plausibilities conditional on incompatible propositions; so the expressions P12 = P1 + P2 and P12 , P1 + P2 have nothing to do with the sum rule of plausibility theory, eq. (II.5d). Indeed, the following theorem of plausibility logic holds: Theorem II.1. Given that

P(Dx| A ∧ IAB ∧ I) = fA(x), (II.38)

P(Dx| B ∧ IAB ∧ I) = fB(x), (II.39) §17. 21

Figure II.1: Space-time diagram for the Einstein-Podolsky-Rosen thought-experiment

with fA(x), fB(x) ∈ [0, 1] fixed but arbitrary, the only constraints imposed by the rules of plausibility logic to P(Dx| IAB ∧ I) (II.40) are that it be non-negative and not greater than one — i.e., no constraints at all apart the usual ones for plausibilities.

The theorem can be proven by standard linear-programming methods [151, 152, 312, 313]. The conclusion is again: statements asserting violations of plausibility theory on the part of quantum mechanics are completely false. See also the discussion by Koopman [440] and Strauß [674].

17. Another egregious misapplication of plausibility theory lies at the core of one of the most celebrated theorems of our time: Bell’s theorem [54, 55, 126, 127]. This misap- plication (that does not concern fair sampling or other similar arguments) is an example of those discussed in § 14; it is first pointed out by Jaynes [385], and lucidly discussed by Morgan [536] (see also Kracklauer [441–443]). Consider the usual set-up of the Einstein-Podolsky-Rosen experiment [193] as mod- ified by Bohm. In each of two space-like separated regions α¯ and β¯ a spin measure- ment is performed on an ‘object’ — usually considered a particle of a pair, but from a field-theoretical viewpoint it should rather be a field — whose state depends on the set of ‘hidden’ variables λ of a region γ¯ given by the intersection amongst a space-like hy- persurface and the past light-cones of the regions α¯ and β¯ , as in Fig. II.1. In the region γ¯ the two ‘objects’ are prepared according to a fixed procedure. We know some details of the experimental set-ups in the two measurement regions, viz. the orientations a and b of the two measuring Stern-Gerlach apparatus. What we ask is the plausibility that the measurements yield outcome A˜ for the measurement in region α¯ and outcome B˜ for that in β¯ . What we assume is that there is some local and deterministic theory — possibly a 22 II. PLAUSIBILITY LOGIC AND INDUCTION

field theory — that describes all the fundamental microscopic ‘hidden’ variables involved in the preparation and the measurements. Note that a, b, A˜, B˜ are macroscopic parameters or quantities, i.e., they are related through some sort of ‘coarse graining’ (space or time averages [358, 420–424, 541–552], or Fourier transforms [625]) to the underlying funda- mental microscopic quantities in the respective regions; much like pressure in relation to the positions and momenta of the particle of a gas and of its environment. The variables λ, on the other hand, are amongst the fundamental microscopic, ‘hidden’ quantities, and are not obtained by coarse graining; this means, in particular, that they cannot be affected by ‘thermalisation’ or similar processes. The assumption of determinism implies that the values of the fundamental microscopic quantities in region α¯ , and hence also the values of all the derived ‘coarse-grained’ quan- tities therein, are completely determined (through some functions or functionals) by the values of the microscopic quantities in earlier space-time regions. The assumption of (rela- tivistic) locality implies that only the microscopic quantities in the past-light cone of region α¯ are necessary and sufficient for this determination. The same holds for region β¯ . See again Fig. II.1. In formulae, ˜ A = fA˜(λ,... ), a = fa(λ,... ), (II.41) ˜ B = fB˜ (λ,... ), b = fb(λ,... ), (II.42) where the dots stand for other microscopic variables in the non-intersecting parts of the two past-light cones. In more generic terms we can say that knowledge of λ is in general relevant to knowledge of each of a, b, A˜, B˜. Note that the equations above cannot in general be inverted to obtain the value of λ from those of a, b, A˜, B˜ (we should need all the microscopic quantities in the future light-cone of region γ¯ to do this); but knowledge of these can imply some restrictions in the possible values that λ may have. In formulae,4 −1 ˜ −1 ˜ −1 −1 λ ∈ fA˜ ({A}) ∩ fB˜ ({B}) ∩ fa ({a}) ∩ fb ({b}). (II.43) In other words, the knowledge of each of a, b, A˜, B˜ is in general relevant to the knowledge of λ. Note that the assumption of determinism can be weakened: we may assume that the theory only plausibilistically determines the values of a, b, A˜, B˜ from those of λ. But in respect of relevance, the points made in the previous paragraphs still hold. Let us put all this in form of propositions. Introduce the context I, which includes a description of all our assumptions and set-up, and the propositions a B ‘The apparatus in region α¯ is oriented along a’, (II.44) b B ‘The apparatus in region β¯ is oriented along b’, (II.45)

Λλ B ‘The ’hidden variables’ in γ¯ have values λ’, (II.46) A B ‘The result of the measurement in region α¯ is A˜’, (II.47) B B ‘The result of the measurement in region β¯ is B˜’. (II.48)

4I should really write something like ‘( f −1) ’ to indicate that I only consider the first argument of f for the A˜ 1 A˜ inverse mapping, etc.; but I do not want to encumber the notation. §17. 23

The plausibilities that interest us are

P(A ∧ B| a ∧ b ∧ I). (II.49)

This plausibility can be written as a marginalisation on the possible values of λ: Z

P(A ∧ B| a ∧ b ∧ I) = p(A ∧ B ∧ Λλ | a ∧ b ∧ I) dλ, (II.50) and using the product rule we obtain

P(A ∧ B| a ∧ b ∧ I) = Z

P(A| B ∧ Λλ ∧ a ∧ b ∧ I) P(B| Λλ ∧ a ∧ b ∧ I) p(Λλ | a ∧ b ∧ I) dλ. (II.51)

Up to now our analysis coincides with that of Bell; and no other assumptions than deter- minism and locality have been used. But at this point Bell makes the following additional assumptions: First, P(A| B ∧ Λλ ∧ a ∧ b ∧ I) = P(A| Λλ ∧ a ∧ I), (II.52) i.e., knowledge of the orientation and outcome in region β¯ , given that λ is known, is not relevant to knowledge of the outcome in α¯ . This is acceptable for the following reason. Knowledge of b and B˜ is relevant in that it can restrict the possible value of λ and of the other microscopic quantities in the non-intersecting part of past light-cone of β¯ . But λ is already known, so knowledge of its restrictions is superfluous; and the restrictions on the other microscopic quantities are, by locality, irrelevant for region α¯ , since they lie outside its past light-cone. Therefore b and B˜ cannot tell us anything new for a and A˜. Note, however, that had λ denoted only part of the microscopic quantities in γ¯ , then b and B˜ would have been relevant and Bell’s assumption (II.52) would not have held. Bell’s second assumption is that

P(B| Λλ ∧ a ∧ b ∧ I) = P(B| Λλ ∧ b ∧ I), (II.53) which is also acceptable for a reasoning analogous to that of the first assumption, provided that λ denotes all the microscopic quantities in γ¯ . Bell’s third assumption,

p(Λλ | a ∧ b ∧ I) = p(Λλ | I), (II.54) is however untenable. It says that knowledge of a and b is irrelevant for the knowledge of λ. But we have seen from eqs. (II.41), (II.42), and especially (II.43) that knowledge of a and b can impose restrictions on the values of λ and is therefore relevant to its knowledge. This is the meaning of leaving ‘a∧b’ in the context of ‘P(Λλ | a∧b∧ I)’. Bell’s assumption is unwarranted and, in general, unphysical. In general, it contradicts the assumption of de- terminism. Not only in the case of strict determinism, but also in the case of a plausibilistic relation between λ and a, b. 24 II. PLAUSIBILITY LOGIC AND INDUCTION

Many physicists suddenly turn into mystics when it comes to Bell’s last assumption above. They say: ‘But the propositions a and b concern the orientation of the apparatuses, which may be chosen by us; you mean then that we have no free will in our decisions?’ — Of course I do! or have they perhaps forgotten that our assumption was determinism? They must decide: do they want this theorem to concern local deterministic physical theories, or to concern a religious question? What will their next assumption invoke? the Holy Trinity? Here lies another, perhaps deeper problem: the fact that often we physicists take a philosophical pose, but our philosophical remarks are usually naïve and cheap, no better than those of a gymnasial student making the first reflections on human nature. Indeed, the question of ‘free will’ was definitively shown by Wittgenstein [748–750] to be simply void, only a Sprachspiel. So we cannot accept the last assumption, since in general it contradicts one of the two basic assumptions of the theorem. Once it is discarded, our sought plausibility takes the final form Z

P(A ∧ B| a ∧ b ∧ I) = P(A| Λλ ∧ a ∧ I) P(B| Λλ ∧ b ∧ I) p(Λλ | a ∧ b ∧ I) dλ, (II.55) and it can be shown that the inequalities obtainable from this decomposition have an up- per bound of 4 (the maximum possible), instead of the usual 2 presented in the litera- ture [536]. The conclusion is that (a) Bell’s theorem, as is usually stated and taught, proves nothing about general local deterministic ‘hidden-variable’ theories; (b) the various ex- periments [23, 24] that confirm (pace loopholes) a violation of Bell’s inequalities, have nothing to say as regards the exclusion of general local deterministic ‘hidden-variable’ theories, since they only show a violation of the bound of 2, not 4 as required in our deriva- tion. The latter bound, however, is the maximum possible and cannot be violated by any experiment (in the same sense in which no experiment can ever show a relative frequency greater than 1.) III. A mathematical framework for the plausibilistic properties of physical theories

What is a physical theory?

18. The official place that plausibility logic has in natural philosophy is very prominent today — I say ‘official’ because, at least at an intuitive and informal level, plausibility logic has always been an essential element in the mathematical study of nature. Only the idea of trying to list some research areas primarily based on plausibility logic seems meaningless: these areas are too many, and there are no well-defined boundaries between them. Wanting to make a sort of schema of the ways in which plausibility logic is used in a physical theory, we should need a general idea of what a physical theory is. Let me first quote this beautiful passage from Truesdell [716, Prologue]:

A theory is a mathematical model for an aspect of nature. One good theory extracts and exaggerates some facets of the truth. Another good theory may idealize other facets. A theory cannot duplicate nature, for if it did so in all respects, it would be isomorphic to nature itself and hence useless, a mere rep- etition of all the complexity which nature presents to us, that very complexity we frame theories to penetrate and set aside. If a theory were not simpler than the phenomena it was designed to model, it would serve no purpose. Like a portrait, it can represent only a part of the subject it pictures. This part it exaggerates, if only because it leaves out the rest. Its simplicity is its virtue, provided the aspect it portrays be that which we wish to study. If, on the other hand, our concern is an aspect of nature which a particular theory leaves out of account, then that theory is for us not wrong but simply irrelevant. For example, if we would analyse the stagnation of traffic in the streets, to take into account the behavior of the elementary particles that make up the engine, the body, the tires, and the driver of each automobile, however “fundamental” the physicists like to call those particles, would be useless even if it were not insuperably difficult. The quantum the- ory of individual particles is not wrong in studies of the deformation of large

25 26 III. THE VECTOR FRAMEWORK

samples of air; it is simply a model for something else, something irrelevant to matter in gross.

What I need for the present discussion is to try to identify some general features that all physical theories and models have in common. One of these features, which also char- acterises truth logic and plausibility logic, is the following: a physical theory or model is something which, given some — possibly hypothetical — knowledge about particular facts, like observations, set-ups, etc., provides us with — possibly hypothetical — knowl- edge about other related facts.1 We can call these knowledges ‘a priori’ and ‘a posteriori’; but I must hurry to remark that these adjectives do not refer to temporal characteristics of the objects or facts of knowledge. In fact, the ‘a priori’ knowledge may e.g. engage quantities at a time t0 and the ‘a posteriori’ one quantities at a time t1 with t1 < t0. I shall sometimes also use the terms ‘initial’ and ‘final’, but the same remark holds for them as well — even more. Chronologically more neutral terms, which I shall also use, are ‘in- put’ and ‘output’; they unfortunately suggest that some sort of ‘connecting algorithm’ is available, which is not always the case.

Exemplifying interlude (which also introduces some notation)

19. To make the very general characterisation of a physical theory given in § 18 more concrete, let me give some descriptive, but not yet mathematical, examples of ‘classical’ theories.2 I shall then make one of these examples mathematically explicit in § 20. In ‘classical’ theories we usually have a basic system of equations, which includes general laws — expressing e.g. principles of balance or conservation — and possibly some constitutive equations. By specifying additional particular constitutive equations as well as initial- and boundary-value data, we obtain a system of integro-differential equations with a unique solution. This usually describes the ‘motion’ of a particular set of quantities. Note that the term ‘motion’ can generally mean (e.g., in relativistic theories) the specification of the resulting quantities in given space-time regions. From the point of view of the general characterisation given in the previous section, the additional constitutive equations and the boundary conditions represent our ‘a priori’ knowledge, and the solution of the equations our ‘a posteriori’ knowledge. In this example

1Göran Lindblad amusingly remarked at a seminar of mine that the above characterisation could also be given of religion; and he is surely right. One could therefore add that the knowledge provided by a model is the distillate and often the generalisation of experience and observations, not an ecstatic revelation sent by the gods. Surely there are many other qualifications that ought to be given in trying to describe what a physical model is. But my humble purpose is only to try to motivate the mathematical framework to be introduced presently, not to start a discussion of philosophy of science. For such discussions I can but refer to greater and lesser masters like Duhem [181], Poincaré [596, 597], Truesdell [710, 715] (see also [711] and the introductions in [714, 716, 718, 719]), Bunge [102, 104] (see also [99, 100, 103]), and others. 2Unfortunately many physicists (especially quantum mechanicians) that occupy themselves also of ‘founda- tions’ seem to always have in mind the equation ‘classical (Galilean-relativistic) physics’ = ‘classical mechanics of point particles’; thus leaving out the greatest and most complex part of classical physics, the continuum elec- tromagnetothermomechanical theories [e.g., 74, 158, 159, 197, 203, 206, 207, 213, 214, 216, 217, 511, 636, 662, 712, 714, 718, 719, 736, 737], of which analytical mechanics is only a special case. §20. 27 both types of knowledge are precise and unique, or we could say certain. This is often the meaning of the adjective ‘deterministic’ applied to these theories. But the analysis can be generalised. If we specify only a part of the necessary con- stitutive equations or a part of the boundary values, then the solution is no longer unique but belongs to a more or less restricted class of possible solutions. This class represents in this case our a posteriori knowledge, even if it is not as detailed as that represented by a unique solution. The next step of this generalisation is to consider boundary data (or even constitutive equations) that are uncertain, introducing some plausibilities for them. The solutions will then also have a plausibilistic nature. It is easy to formulate concrete examples. The first and surely the most familiar to quantum physicists is that of a ‘classical’ system of point particles whose interaction is specified either by assigning a system of forces, or a Hamiltonian or Lagrangian function, or a more general evolution operator. The a priori knowledge usually pertains a particular value for the set of positions and velocities (and, in rare cases, even accelerations) of the particles at a given instant; the a posteriori knowledge pertain a whole and unique trajectory for the positions of the particles, for preceding and subsequent instants. But we can also specify a given time interval (positive or negative) amongst the a priori data, and in this case the a posteriori knowledge may simply concern the value of the set of positions, velocities, etc., after or before such a time interval. Thus this example also shows that there are no precise prescriptions as to what is ‘a priori’ and what is ‘a posteriori’ knowledge: this division depends on what we already know (‘a priori’) and what we ask (‘a posteriori’). The generalisation of this example leads to statistical mechanics [605]. If our a priori knowledge consists not in a precise value for the set of positions etc., but in a set of possible ones with assigned plausibilities, then the a posteriori knowledge will consist in a set of possible trajectories with particular plausibilities.3 An example from more general modern continuum theories is provided by a body on which the fields of stress, (free) energy, heating flux, and entropy (and possibly electro- magnetic fields) are defined and satisfy appropriate balance laws, as well as additional constitutive equations that characterise a particular material. The a priori knowledge gen- erally consists in the history (or equivalence classes of histories) of the deformation and temperature fields of the body. Required is the a posteriori knowledge of the motion of the other fields. Plausibilistic generalisations of this example are not so common, see e.g. Beran [65]; yet they are conceptually straightforward, even if mathematically demanding.

20. Let us rephrase the point-particle example of § 19 in mathematical notation. The positions, velocities or momenta, etc. of the particles can be called ‘phase coordinates’ and

3The plausibilities for the initial data are usually called — by many still today, unfortunately — an ‘ensem- ble’; a term that dates back to the days in which many physicists were still confused about what plausibility is, and needed to imagine an infinity of fictive copies of the physical situation under study. Today we do not need such fictions, and that term causes only confusion. We have only one physical situation — the one under our senses — and all we are doing is making plausibility judgements about some unknown details of its. Note that by ‘ensemble’ I do not mean a real collection of a finite number of systems or objects; I call this an ‘assembly’, a term proposed by Peres [586]. Unfortunately the literature witnesses also a confusion about ‘ensemble’ and ‘assembly’ [130, 131, 477, 478, 693]. 28 III. THE VECTOR FRAMEWORK denoted by z. Their space4 is Γ . The dynamics is specified by a mapping, or ‘evolution operator’,

U :(t0, t1) 7→ Ut0,t1 , Ut0,t1 : z0 7→ Ut0,t1 (z0) (III.1)

that maps the phase point z0 at time t0 to the point Ut0,t1 (z0) at time t1, and satisfies the usual Chapman-Kolmogorov law [125, 509]

Ut ,t ◦ Ut ,t = Ut ,t , 0 1 1 2 0 2 (III.2) Ut,t(z) = z.

We assume we have all needed smoothness. The evolution operator can represent a solution of the equations of motion ˙z(t) = v[z(t), t] for a time-dependent vector field v(z, t). The a priori knowledge then consists in a pair of values t0, z0, and the a posteriori knowledge consists in the motion t 7→ Ut0,t(z0) or, if a particular time t1 is specified (and remember that we could have t1 < t0), simply in the phase coordinate z1 = Ut0,t1 (z0). All this can be rephrased in a language appropriate to plausibility logic. We must keep in mind that the phase coordinates z at a time t represent the possible ‘outcomes’ (t) (t) 5 {Rz | z ∈ Γ } of a particular ‘measurement’ M performed at that time . What I call ‘measurement’ is not always an observation: sometimes it is an active selection. Our a (t0) (t0) priori knowledge Cc can be expressed by saying that the proposition Rz0 , given M , is true or certain:6

(t0) (t0) P(Rz0 | M ∧ Cc ∧ I) = 1, (III.3) or better, in terms of a plausibility distribution,

(t0) (t0) p(Rz | M ∧ Cc ∧ I) dz = δ(z − z0) dz. (III.4)

The knowledge implicit in the equations of motions (which is included in the proposition I) (t0) (t0) can be expressed in the same logical terms. The equations say that given the data Rz0 ∧M (t ) (irrespectively of Cc), and given that we perform a measurement M 1 at time t1, we are (t1) certain to obtain the outcome Rz1 such that z1 = Ut0,t1 (z0):

(t1) (t1) (t0) (t0) p(Rz | M ∧ Rz0 ∧ M ∧ I) dz = δ[z − Ut0,t1 (z0)] dz. (III.5)

These two pieces of knowledge together lead to our a posteriori knowledge, simply by the rules of plausibility theory (II.5) (more specifically, the theorem on total plausibility):

4I would only say ‘manifold’, since no additional structures are assumed for Γ in this example. 5Which also implies the tacit performance of a time measurement. 6Note that the derivations that follow, including the integrations and the products of deltas, are math- ematically rigorous, even if some details are left out. I am using Egorov’s theory of generalised func- tions [164, 185, 186] (see also [162, 163, 470, 568]), which means that the deltas are implicitly specified by appropriate sequences. §20. 29

given Cc, the plausibility that a measurement at time t1 yields an outcome within dz is

(t1) (t1) p(R | M ∧ Cc ∧ I) dz = z Z (t1) (t1) (t0) (t0) (t0) (t0) 0 p(Rz | M ∧ Rz0 ∧ M ∧ Cc ∧ I) p(Rz0 | M ∧ Cc ∧ I) dz dz = Γ Z 0 0 0 δ[z − Ut0,t1 (z )] δ(z − z0) dz dz = δ[z − Ut0,t1 (z0)] dz; (III.6) Γ in other words, we are certain of the outcome z1 B Ut0,t1 (z0). Up to now we could have comfortably done without plausibility theory: eqs. (III.4)– (III.6) have only restated in a cumbersome way what was basically already implicit in eqs. (III.1) and (III.2). But let us now consider the generalised case with a priori data Cu upon which knowledge of a specific z0 at time t0 is uncertain. We have a plausibility distribution (t0) (t0) p(Rz | M ∧ Cu ∧ I) dz = ft0 (z) dz (III.7) for some specific normalised positive generalised function ft0 . The knowledge provided by the equations of motion is unaltered,

(t1) (t1) (t0) (t0) p(Rz | M ∧ Rz0 ∧ M ∧ I) dz = δ[z − Ut0,t1 (z0)] dz. (III.5)r To derive our a posteriori knowledge, i.e. the answer to the question ‘if we perform a (t1) (t1) measurement M at time t1, with which plausibility can be obtain a result around Rz ?’, we really need this time the rules of plausibility theory:

(t1) (t1) p(R | M ∧ Cu ∧ I) dz = z Z (t1) (t1) (t0) (t0) (t0) (t0) 0 p(Rz | M ∧ Rz0 ∧ M ∧ Cu ∧ I) p(Rz0 | M ∧ Cu ∧ I) dz dz = Γ Z δ[z − U (z0)] f (z0) dz0 dz = f [U−1 (z)] dz. (III.8) t0,t1 t0 t0 t0,t1 Γ

(t1) (t1) That is, defining ft1 (z) dz B p(Rz | M ∧ Cu ∧ I) dz, our final knowledge is simply ex- pressed by the plausibility distribution h i (t ) (t ) p(R 1 | M(t1) ∧ C ∧ I) dz = p R 0 M(t0) ∧ C ∧ I dz, or z u U−1 (z) u t0,t1 (III.9) f (z) dz = f [U−1 (z)] dz. t1 t0 t0,t1 We recognise in this equation the most general form of Liouville’s equation, which in terms of the vector field v previously introduced is equivalent to the more common form

∂t ft(z) = −∇z · [˙z ft(z)] (III.10) [cf. 19, 20, 222, 223, 282, 474, 589, 613, 614, 720]. Formula (III.9) represents the answer to our question for a particular time t1; but can also be interpreted as a set of answers for different such times t1. It is in a sense somehow 30 III. THE VECTOR FRAMEWORK redundant; for instead of asking, for each t, ‘What is the plausibility of obtaining the out- (t) come Rz ?’, we can directly ask: ‘What is the plausibility of a particular phase trajectory?’. Denoting: a trajectory by ζ : t 7→ z = ζ(t), the ‘measurement’ of the trajectory (i.e., of the history of the system) by M, and the outcome consisting in a particular trajectory ζ by Rζ , the answer to our question above is

‘ p(Rζ | M ∧ Cu ∧ I) dζ = δ[ζ − Ut0,·(z0)] ft0 (z0) dζ ’, (III.11) which I have put within quotation marks since it requires much analytical and topological care. The measure p(... ) dζ (as well as the delta) is in fact defined over a trajectory space, i.e., integrations in respect of this measure are path integrals. It would seem that there is no much difference between the two questions above, the one about individual times and the one about trajectories. And indeed conceptually there is not. Yet, after the grounding work of Boltzmann, Maxwell, and Gibbs, it took so long time for statistical mechanics to proficuously attack non-equilibrium processes precisely because the point of view and the question asked had always been restricted to the possible phase coordinates of the system, instead of the possible motions of the system, — an approach that is clearly untenable as soon as the relation between trajectories and phase points at a given time ceases to be bijective. Today, especially after the work of Jaynes, statistical mechanics is based on ‘ensembles’ (plausibility distributions) defined not in phase space, but in path space. Much could be said on this extremely interesting subject; unfortunately, for reasons of space and time, I can only refer to the work of Jaynes [374, 377, 379, 383] and many others’ like Mori, McLennan, Lewis, Zwanzig, Hobson, Robertson, Zubarev and Kalashnikov, Grandy, Baker-Jarvis, Gallavotti, Maes, Dewar, et multi alii [27–30, 32, 34, 142, 161, 167–169, 174, 175, 184, 196, 220, 255, 256, 265, 285, 287–289, 343– 345, 368, 394, 458, 468, 469, 495–499, 517–519, 533, 537, 574, 618–620, 622–624, 639, 641, 666, 692, 733, 760, 763–765].

21. Before continuing, I should like to offer a remark concerning locutions like ‘tem- poral evolution of the plausibility distribution’ or ‘the equation of motion of the plausibil- ity distribution’ so often used in statistical mechanics. Our a priori knowledge (Cu ∧ I in the above equations) is always the same; and the plausibility distributions simply concern questions that regard different times. These plausibilities are assigned as soon as we per- form the calculations, and as soon as these are done our knowledge and our plausibilities do not change a iota, independently of how long the system has evolved. Thus saying that the plausibility ‘evolves in time’ is somehow inappropriate and misleading. An example may perhaps illustrate what I mean: If we now read and learn a timetable for a local bus, we know where the bus stops at particular times; but we know it all now — we do not acquire that knowledge ‘along the bus’ trip’. We know now that the buss will stop at Baker Street at 12 o’clock, we are not ‘going to know it’ at 12 o’clock. Our knowledge is not ‘evolving’. The sentence that ‘the plausibility evolves in time’ conceals a conception of plausibility as a sort of physical thing — which it is not. The same remark also holds for ‘wave-functions’ and other mathematical objects whose rôle is only that of encoding plausibilities (see ch. III). §22. 31

Three special kinds of propositions. Definition of ‘system’. Insights

22. The derivation of the general mathematical setting for statistical mechanics given in § 20 is not the one commonly found in textbooks, even though its imports are roughly the same. It was as near as possible to the point of view of plausibility logic, and the notation introduced there will be further used in the following discussion. Let us now proceed to the construction of a logico-mathematical framework which make allowance for the characterisation, given in § 18, of a physical theory. I introduce three kinds of propositions to be used in the plausibilistic description of a physical theory or model. They will generally be denoted by the symbols S¯ , M, and R. The propositions S¯ will be called states or preparations, and will be meant to express part of our a priori knowledge. The propositions M will be called measurements, and will represent additional a priori knowledge. Finally, the propositions R will be called outcomes and will represent those details about which we have some a posteriori knowledge. The difference between the ‘S¯ -’ and ‘M-propositions’ is that the latter delimit the scope of the a posteriori knowl- edge required; so to speak, they define and confine the particular ‘question’ we are asking. The propositions R are the possible ‘answers’ to such questions. When a given ‘system’ is considered, the set of possible states and the sets of possible measurements and outcomes are automatically circumscribed. Here I want to take the opposite point of view, which is much more profitable:

Definition 2. A set of states {S¯ j} and a set of measurements {Mk} with the relative outcomes {Ri} together define the physical ‘system’ under study.

(Cf. Bridgman’s ‘universe of operations’ [85].) This definition is profitable for at least two reasons. One is that some people sometimes indicate a system by indicating a body or collection of bodies, or sometimes a region of space. But a body (or a space region) may be studied in many different ways, and can correspond to different systems, depending on whether we are interested in its mechanical, or thermodynamical, or electromagnetic, etc., properties (cf.Jaynes [392, § 1.2]) — we cannot be interested in all of its properties at once (and moreover it can possess properties that have not yet been discovered). The other reason is that the laws governing a system are meaningful and correct only in respect of given sets of quantities, variables, and processes. If these sets are altered the laws may become meaningless or incorrect. There are at least three illustrious examples of discussions (which have now become tedious) about ‘paradoxes’ that originated simply because of carelessness in defining the system and in inspecting whether certain laws were really meant to apply to that kind of system:

• The ‘Gibbs paradox’ [266, pp. 166–167],7 one version of which being that the en- tropy of mixing of two gases varies discontinuously as the gases gets chemically ‘more similar’ and eventually identical.8 The point missed by the enthusiasts of this

7Gibbs did not use the word ‘paradox’. 8This is only one of the versions in which this ‘paradox’ is formulated. The versions are sometimes presented as equivalent but are in fact different — which contributes to the confusion. 32 III. THE VECTOR FRAMEWORK

version of the ‘paradox’ is that the system they consider provides for no such pro- cess of chemical change, so their very formulation of the ‘paradox’ is meaningless. In a system — a different system — which made allowance for such a process, there would then also be an additional related parameter; the entropy function and the en- tropy of mixing would depend on it and would therefore have different expressions (there would be a sort of entropy of convection), would satisfy different balance laws, and would be governed by different evolution equations. There are in fact thermo- dynamic theories and systems that describe such processes, with no ‘paradox’; see e.g. Faria Santos [225, 226].9 • A putative argument by von Neumann [554, § V.2] and Peres [586, § 9-4] stating that if two non-one-shot-distinguishable states of a quantum system could be distin- guished in one shot, a violation of the second law could follow. The point missed here is that, if we consider a given system and say that two of its possible states cannot be distinguished in one shot, then evidently the system by definition does not admit measurements or processes that can distinguish those states in one shot. Then why would we entertain such a non-admitted process? It is clear that logical contra- dictions then arise; a derivation of ‘physical’ consequences is then only a vacuous exercise (from contradictory assumptions any proposition can be derived) and has no physical meaning at all. Admitting such a kind of distinguishing process only means that you are considering a different system, in which those states are, by definition, one-shot distinguishable, and for which the entropy function and evolution equations have different forms. Analyses of von Neumann’s and Peres’ inconsistent argument, from partially different points of view, have been given in Paper (B) and Paper (E). In a paper in preparation I also show that von Neumann’s and Peres’ putative argu- ment is valid for classical mechanics as well; hence the discussion is not peculiar to quantum mechanics. • ‘Maxwell’s demon’ [514, pp. 338–339; 695]. In this case one imagines having an envelope with two samples, A and B, of a gas at the same temperature and separated by a diaphragm. In the diagram there is a small hole and ‘a being, who can see the individual molecules, opens and closes this hole, so as to allow only the swifter molecules to pass from A to B, and only the slower ones to pass from B to A. He will thus, without expenditure of work, raise the temperature of B and lower that of A, in contradiction to the second law of thermodynamics’ [514, ibid.]. The amount of discussion about this observation has been enormous, even with volumes specially dedicated to it; a very small sample being [64, 86, 87, 95, 119, 123, 182, 183, 417, 435, 566, 577, 590, 632, 661, 664, 696, 759]; cf. also the references for ‘Gibbs’ paradox’ above. Amongst these discussions, I find Earman and Norton’s

9In all versions of this ‘paradox’ (see preceding footnote) it has been shown that (a) there is no paradox, but only confusion and carelessness in formulating the argument; (b) quantum mechanics has nothing to do with the ‘resolution’ of the ‘paradox’. See the analyses by Gibbs himself [266, p. 166], Larmor [455, p. 92], Schrödinger [642], Bridgman [85, pp. 168–169], Grad [284], Boyer [75], van Kampen [412], Jaynes [388], who have to a greater or lesser degree clarified one or another version of the ‘paradox’ (though sometimes at the detriment of the clarity of other versions). See also [173, 188, 286, 434, 587]. §23. 33

[182, 183] the most lucid. But the simple point here is again that the system for which the second law is stated (which is characterised by measurements of the volumes, temperatures, and pressures of the two gas samples) does not admit any observations of individual molecules. If such kinds of measurement are allowed, we have then a different system, for which the entropy and heat functions either have different expressions and therefore satisfy a different form of the second law, or are not defined at all. I think this is the point made by Maxwell himself immediately after stating his example [514, p. 339]: This is only one of the instances in which conclusions which we have drawn from our experience of bodies consisting of an immense number of molecules may be found not to be applicable to the more delicate ob- servations and experiments which we may suppose made by one who can perceive and handle the individual molecules which we deal with only in large masses. I.e., different sets of measurements (‘more delicate observations and experiments’) define different systems — even if these systems concern the same physical body —, and different systems may satisfy different laws (‘conclusions . . . may be found not to be applicable’).

23. The propositions representing states, measurements, and outcomes introduced in the previous section are required to satisfy some properties which reflect some character- istics of those notions. All these properties can be stated in terms of plausibilities: (I) The states are mutually exclusive and, in the majority of problems, also exhaustive:

0 00 P(S¯ j0 ∧ S¯ j00 | I) = 0 if j , j , (III.12a) W  P P S¯ j| I = P(S¯ j| I) = 1. (III.12b) j j Note that the plausibilities of each state are unspecified, of course. (II) Also the measurements are mutually exclusive and, in the majority of problems, exhaustive as well:

0 00 P(Mk0 ∧ Mk00 | I) = 0 if k , k , (III.13a) W  P P Mk| I = P(Mk| I) = 1. (III.13b) k k Also in this case the plausibilities of each measurement are not specified.

(III) To each measurement Mk is associated a unique set of outcomes {Ri | i ∈ Λk} which are, given the measurement, mutually exclusive and exhaustive:

0 00 P(Ri0 ∧ Ri00 | Mk ∧ I) = 0 if i , i , (III.14a) W  P P Ri| Mk ∧ I = P(Ri| Mk ∧ I) = 1, (III.14b) i i 34 III. THE VECTOR FRAMEWORK

As already appears from these equations, we shall usually omit to indicate to which measurement a particular outcome Ri belongs (which would be otherwise indicated by ‘i ∈ Λk’), since it is usually clear which is the measurement intended.

Requirement (I) expresses the fact that we can prepare (or select) a system in a certain way or in another, but not in both. You perhaps would argue against this requirement as follows: ‘I can prepare, say, a rigid body so that it has a given position x (preparation S¯ 1), or so that it has a given moment of momentum L (preparation S¯ 2); but also in both ways, 0 so that it has position x and moment of momentum L (preparation S¯ 1 ∧ S¯ 2) . But if you rephrase your statement more carefully you see that it is not true. Ask yourself: in the 0 preparation S¯ 2, what is the position? Either it has some value x , or it is unknown. If it is unknown, you cannot have both S¯ 2 and S¯ 1, since in the latter the position is known, and it cannot, of course, be both unknown and known. If it is known to be x0, then you can have both preparations only if (1) the moment of momentum in S¯ 1 is also known, say with value 0 0 0 L , and (2) x = x and L = L . But this means that S¯ 1 and S¯ 2 are the same preparation, and then S¯ 1 ≡ S¯ 2 ≡ S¯ 1 ∧ S¯ 2. Requirement (II) expresses the fact that we can perform a particular measurement, or another one, but not both, in analogy with the requirement for the states. Against the present requirement, though, reasonable arguments can be levelled. For we can imagine e.g. a measurement M1 with three outcomes, and then another measurement M2 which is identical to the first but for the fact that two of the outcomes are ‘grouped together’ and considered as one — more precisely, we take their disjunction. The measurement M2 is called a ‘coarsening’ of M1 (cf. [333] and see § 24). What is in this case the status of the conjunction M1 ∧ M2? The formal position I assume here is that the specification of a measurement includes also a ‘description’ of the possible outcomes, including their number. So in our example the conjunction M1 ∧ M2 is impossible, because we can either perform a measurement with three outcomes, or one with two, but not both. This is only one choice, but I have noticed that it has many advantages. You have noticed that the requirement (III.14) for the outcomes is conditional on some measurement. It seems best not to fix any particular requirements when a measurement is not given. For example, in some situations one can require all outcomes, from all mea- surements, to be mutually exclusive,

0 00 P(Ri0 ∧ Ri00 | I) = 0 if i , i ; whilst in other situations it can be convenient to conceive that an outcome can belong 0 00 to more than one measurement: i ∈ Λk0 and i ∈ Λk00 with k , k (or more simply, Λk0 ∩ Λk00 , ∅). Such is the case, e.g., when a measurement can be considered as a coarsening of another one, in the sense explained above. In any case it is always best to state precisely which measurement one is speaking about. This point may seem so obvious as to be trivial. And yet, very many discussions and statements about quantum mechanics and plausibility theory reveal that it is — still today — not clear at all. Cf. the discussion on the double-slit experiment of § 16. §24. 35

Introducing the vectors

24. As already said, the propositions {S¯ j} are meant to represent the possible a priori data about a given system, and the propositions {Mk} the possible ‘questions’ we can ask, the propositions {Ri} being the possible ‘answers’. It is clear then that the theory (whose content and ‘laws’ we assume included in the proposition I) that concerns the system must allow us to assign the plausibilities

P(Ri| Mk ∧ S¯ j ∧ I) for all j, k, and i ∈ Λk. (III.15)

In other words, if we specify the initial data and ‘ask a (sensible) question’ the theory must allow us to assign plausibilities to the possible ‘answers’. The theory thus provides us with a sort of table like the following:

S¯ 1 S¯ 2 S¯ 3 S¯ 4 ... S¯ σ

R1 p11 p12 p13 p14 ... p1σ M1 R2 p21 p22 p23 p24 ... p2σ

R3 p31 p32 p33 p34 ... p3σ

M2 R4 p41 p42 p43 p44 ... p4σ (III.16) R5 p51 p52 p53 p54 ... p5σ ......

Rρ−2 pρ−2,1 pρ−2,2 pρ−2,3 pρ−2,4 ... pρ−2,σ

Mµ Rρ−1 pρ−1,1 pρ−1,2 pρ−1,3 pρ−1,4 ... pρ−1,σ

Rρ pρ,1 pρ,2 pρ,3 pρ,4 ... pρ,σ where pi j B P(Ri| Mk ∧ S¯ j ∧ I), and it has been assumed that the sets {S¯ j : j = 1,..., σ}, {Mk : k = 1,..., µ}, and {Ri : i = 1,..., ρ} are finite, and (only as an example) measurement M1 has two outcomes Ri with i ∈ Λ1 B {1, 2}, measurement M2 has three outcomes with Λ2 B {3, 4, 5}, etc. The table corresponds to a matrix (pi j). The idea now is to associate some kind of mathematical objects to the propositions S¯ j and to the pairs of propositions Ri, Mk with i ∈ Λk in such a way that, when we ask ‘if I specify the preparation S¯ i and ask the “question” Mk, what is the plausibility that the theory assigns to obtaining the “answer” Ri (i ∈ Λk)?’, we can obtain that plausibility by appropriately combining those mathematical objects. In communication-theoretical jargon, we want to encode the plausibilities (pi j) into pairs of mathematical objects. This idea is quite easy to realise: it turns out that those mathematical objects are just real-valued vectors, and that the way to combine them is to take their scalar product. I.e., we make the associations

S¯ j 7→ s j, (III.17)

Ri 7→ ri with i ∈ Λk for some Mk, (III.18) 36 III. THE VECTOR FRAMEWORK in such a way that

T P(Ri| Mk ∧ S¯ j ∧ I) ≡ pi j = ri s j, (III.19)

T where I write the scalar product ri · s j as the matrix product between the transpose ri of the column vector ri and the column vector s j. It is important to note that the mappings (III.17) and (III.18) are generally not bijective, although they are surjective by construction. The vectors s j and ri are called ‘preparation vectors’ and ‘outcome vectors’ respectively; col- lectively, we call them ‘proposition vectors’. With regard to the measurements Mk, we simply associate to them the corresponding sets of outcome vectors:

Mk 7→ mk B {ri | i ∈ Λk}. (III.20)

The explicit construction of this vector representation is given in Papers (C) and (D) together with examples, a general discussion, applications, and uncommented historical references. I shall not present the construction and the discussion again here, but rather assume that the main points of those papers be known (i.e., this is a good point to read those papers). I should like, however, to present some additional, partly historical, remarks. 0 In the following, the vectors associated to generic propositions S¯ x, S¯ , Ry, and similar 0 will be denoted by sx, s , ry, etc. in an obvious way.

Structures of the vector sets and connexion with plausibilistic reasoning in physics

25. To the sets of propositions {S¯ } and {R } are associated the sets of vectors {s } and j i  j {r }, and to the set of propositions {M } is associated the set of sets of vectors {r | i ∈ i k i Λk} k. As the various propositions generally concern some physical quantities (e.g., phase coordinates etc.), it is important to distinguish carefully between

1. the set or space of physical quantities,

2. the set of related propositions,

3. the set of proposition vectors.

These sets have in general different mathematical, logical, and geometrical structures. To the structures of the sets of propositions and proposition vectors we turn now our attention. The analysis of their structures is particularly important because it is very in- timately connected to the plausibilistic reasoning we make in our theory — e.g., with questions like ‘state assignment’ (or ‘reconstruction’ or ‘retrodiction’ or ‘estimation’) and ‘measurement assignment’. Remember once more that the ‘preparations’, ‘measurements’, and ‘outcomes’ — i.e. the propositions S¯ j etc. — concern actual or hypothetical facts, often stated in terms of physical quantities. Consider the following situations as regards these facts: §25. 37

Preparation ‘mixing’: In a given circumstance we may find ourselves in a condition of uncertainty C between two (or more) preparations S¯ 0, S¯ 00. This is quantitatively expressed by a plausibility distribution P(S¯ 0| C ∧ I) = α0, P(S¯ 00| C ∧ I) = α00, (III.21) with P(S¯ 0 ∨ S¯ 00| C ∧ I) = α0 + α00 = 1.

Given a measurement Mk, what is the plausibility distribution that we assign to its out- comes {Ri} in such a circumstance? From the rules of plausibility logic, more precisely from the theorem on total plausibility,

0 0 P(Ri| Mk ∧ C ∧ I) = P(Ri| Mk ∧ S¯ ∧ I) P(S¯ | C ∧ I) + 00 00 P(Ri| Mk ∧ S¯ ∧ I) P(S¯ | C ∧ I), (III.22) T 0 0 T 00 00 T 0 0 00 00 = ri s α + ri s α ≡ ri (α s + α s ), where we have made the assumption that C becomes irrelevant if S¯ 0 or S¯ 00 is known,10 used eq. (III.21), and introduced the vectors associated to the various propositions. The numer- ical value of the plausibility conditional on C is a weighted average of the plausibilities conditional on S¯ 0 and S¯ 00. The last line of the preceding equation shows that we can associate the vector x B P(S¯ 0| C ∧ I) s0 + P(S¯ 00| C ∧ I) s00 ≡ α0 s0 + α00 s00 (III.23) to the proposition C. This vector is a convex combination of the vectors associated to S¯ 0 and S¯ 00. All these results are straightforwardly generalised to conditions of uncertainty concerning more than two preparations. A condition of uncertainty between two or more preparations is often called a ‘mixture’ of those preparations. In the following, we shall call propositions like C circumstances, as we did in the Laplace-Jaynes approach to induc- tion; this is not a confounding nomenclature, since we shall see that the ‘circumstances’ of the Laplace-Jaynes approach and the ones introduced here are basically the same no- tion. When we want to be more specific, we may call those here presented ‘preparation circumstances’.11 Their associated vectors, like x, will be called ‘circumstance vectors’. There is a natural relation of equivalence amongst different circumstances, and amongst circumstances and preparations themselves: Definition 3. Two preparation circumstances C0, C00 are said to be equivalent if they lead 12 to identical plausibility distribution for each measurement Mk; or, equivalently, if their associated vectors are identical: 0 00 0 00 C ∼ C ⇐⇒ P(Ri| Mk ∧ C ∧ I) = P(Ri| Mk ∧ C ∧ I) for all k and i ∈ Λk, (III.24) ⇐⇒ x0 = x00.

10 0 0 00 I.e., P(Ri| Mk ∧ S¯ ∧ C ∧ I) = P(Ri| Mk ∧ S¯ ∧ I), and analogously with S¯ . 11I know that the terms ‘(preparation) circumstance’ and ‘measurement circumstance’, introduced infra, are ugly and liable to be confused with their non-technical homonyms. But I have not found better alternatives yet (cf. footnote 4). 12 T 0 T 00 Since ri x = ri x for a complete set of linearly independent ri. 38 III. THE VECTOR FRAMEWORK

Analogously, one can speak of the equivalence of a circumstance and a preparation, or of two preparations. It must be remarked that this equivalence relation is heavily dependent on the speci- fication of the measurement and outcome sets. Adding or subtracting a measurement to or from the set {Mk} — and thus considering a different system, cf. § 22 — may render two previously equivalent circumstances inequivalent or vice versa. Thus the equivalence is not an ‘intrinsic’ property of the preparations, or related to the characteristics of the preparations alone. Unfortunately the above remark is very often forgotten in quantum mechanics. There, also, two ‘preparation procedures’ can be ‘indistinguishable’, in which case they are repre- sented by the same density matrix.13 But this indistinguishability is only relative to some set of measurements, not ‘intrinsic’ to the procedures. (After all, if we are speaking of two procedures it is because we can distinguish them somehow.) Also the statement that the two procedures lead to the ‘same state’ must be qualified relatively to some measurement set. It may well happen that a new measurement be found that distinguishes ‘states’ that were previously thought to be the ‘same state’: just think about the discovery of isotopes. Statements asserting that ‘two states are indistinguishable in principle’ or something of the kind (and such statements abound in quantum mechanics), are simply vacuous, untestable, and usually of limited historical duration. Measurement ‘mixing’: In another circumstance a condition of uncertainty W may regard two (or more) measurements M0, M00:

P(M0| W ∧ I) = β 0, P(M00| W ∧ I) = β 00, (III.25) with P(M0 ∨ M00| W ∧ I) = β 0 + β 00 = 1.

In this circumstance W we can expect an outcome of M0 or M00, so that the set of possible 0 00 outcomes is the union of the outcomes of the two, Ri with i ∈ Λ ∪Λ . Given a preparation S¯ j, the plausibility distribution over this set of outcomes in this circumstance is, from the theorem on total plausibility and eq. (III.25),

0 0 P(Ri| W ∧ S¯ j ∧ I) = P(Ri| M ∧ S¯ j ∧ I) P(M | W ∧ I) + 00 00 P(Ri| M ∧ S¯ j ∧ I) P(M | W ∧ I),  (III.26)  0 T 0 β ri s j if i ∈ Λ , =   00 T 00 β ri s j if i ∈ Λ , where the assumption has been made that W becomes irrelevant if M0 or M00 is known.14 A condition of uncertainty between two or more measurements is often called a ‘mix- ture’ of those measurements. In the following we shall call propositions like W measure- ment circumstances. (Cf. also Holevo [349].)

13As explained in Papers (C) and (D), density matrices constitute only a different representation of the prepa- ration vectors of quantum-mechanical systems. 14 0 0 00 I.e., P(Ri| M ∧ W ∧ S¯ j ∧ I) = P(Ri| M ∧ S¯ j ∧ I), and analogously with M . §25. 39

Coarsening: There are also circumstances in which the outcomes {Ri | i ∈ Θ} of one or more measurements Mk, with Λk ⊆ Θ, cannot be observed; but we can observe other ‘events’ described by mutually exclusive and exhaustive propositions {Eı¯}, and we have (e.g., from some theory) some plausibilities for the latter given the former:

P(Eı¯| Ri ∧ I) = Qı¯i, P W  (III.27) with Qı¯i = P Eı¯| RiI = 1. ı¯ ı¯

The last line implies that the matrix Q ≡ (Qı¯i) is a stochastic matrix [603] (see also [13, 58, 145, 510]). The plausibility for one of the ‘events’ {Eı¯}, given a measurement Mk such that Λk ⊆ Θ and a preparation S¯ j, is then obtained by marginalisation over the outcomes {Ri | i ∈ Λk}: X P(Eı¯| Mk ∧ S¯ j ∧ I) = P(Eı¯| Ri ∧ I) P(Ri| Mk ∧ S¯ j ∧ I), i∈Λk (III.28) P T = Qı¯i ri s j, i∈Λk where we assume that the Mk are irrelevant if the Ri are known. A set of such ‘events’ is often called a ‘coarsening’ of the outcome set {Ri | i ∈ Λk}; in fact, from now on we shall call the propositions Eı¯ ‘coarsened outcomes’, or even simply ‘outcomes’, instead of ‘events’. (Cf. also Holevo [349].) A simple example of coarsening is the disjunction of two outcomes in a set of three: {E1, E2} B {R1 ∨ R2, R3}. In this case the stochastic matrix will contain the submatrix     Q11 Q12 Q13 1 1 0   =   . (III.29) Q21 Q22 Q23 0 0 1 It is quite natural to consider and combine a measurement circumstance together with a set of coarsened outcomes, which manifest a condition of uncertainty about the outcomes of those measurements. Denote this ‘combined’ measurement circumstance also by W. We can have e.g. a measurement circumstance W regarding some measurements Mk with plausibility distribution {β }, and also a coarsening {E } of their sets of outcomes with k S ı¯ stochastic matrix (Qı¯i), i ∈ k Λk. The plausibility for the coarsened outcome Eı¯, con- ditional on W and on a preparation S¯ j, is then (with the assumptions already discussed) given by X P(Eı¯| W ∧ S¯ j ∧ I) = P(Eı¯| Ri ∧ I) P(Mk| W ∧ I) P(Ri| Mk ∧ S¯ j ∧ I), k,i∈Λ Xk (III.30) T = Qı¯i βk ri s j.

k,i∈Λk To this measurement circumstance W we can therefore associate the set of vectors

m¯ B {r¯ı¯}, with X P (III.31) r¯ı¯ B P(Eı¯| Ri ∧ I) P(Mk| W ∧ I) ri, ≡ Qı¯i βk ri. k,i∈Λk k,i∈Λk 40 III. THE VECTOR FRAMEWORK

In analogy with the preparation circumstances, there is also a natural equivalence re- lation amongst ‘combined’ (in the sense above) measurement circumstances, as well as amongst these and the measurements themselves:

0 00 0 Definition 4. Two measurement circumstances W and W , with coarsened outcomes {Eı¯} 00 and {Eı¯ } having the same cardinality, are said to be equivalent if they lead to identical 15 plausibility distribution for each preparation S¯ j; or, equivalently, if their associated vec- tor sets are identical:

0 00 0 0 00 00 W ∼ W ⇐⇒ {P(E | W ∧ S¯ j ∧ I)} = {P(E | W ∧ S¯ j ∧ I)} for all j, ı¯ ı¯ (III.32) ⇐⇒ m¯ 0 = m¯ 00.

Analogously, one can speak of the equivalence of a measurement circumstance and a mea- surement, or of two measurements.

Note that the equivalence regards the plausibility distributions only: the sets of out- comes of the equivalent measurement circumstances may be from a physical point of view completely different. As in the case of the equivalence relation (III.24), also in this case it must be remarked that the present equivalence relation is dependent on the specification of the set of preparations. Any alteration to the set {S¯ i} — which alteration implies a change of system, cf. § 22 — may render two measurement circumstances, which were previously equivalent, inequivalent; or vice versa.

26. Preparations S¯ , measurements M, and outcomes R on one side, and circumstances C, W, and coarsened outcomes E on the other, live on different logical planes. The former are nearer to the notions, the concepts, and the primitives of the theory than the latter. The latter represents different kinds of states of knowledge we can have on those notions, concepts, and primitives. It is for this reason that when we study a physical system from a plausibilistic point of view, it is natural to shift from the sets {S¯ j}, {Mk}, {Ri}, to the sets of propositions like C, W, and E. This shift also has some mathematical consequences and advantages. We can imagine all possible kinds of circumstances C. Mathematically this means that in eqs. (III.21) and (III.23) we can entertain all possible plausibility distributions {α j}. The associated vectors x evidently form a convex set, viz. the convex hull [14, 89, 297, 520, 627, 726] conv{s j} of the set of preparation vectors. Note that the correspondence between circumstances and their vectors is not injective: as eq. (III.24) shows, equivalent circumstances have the same vector. Each equivalence class is then uniquely characterised by a vector x; membership of a circumstance C j to such equivalence class will be denoted by j ∈ x∼. From the same equation it also follows, by the rules of plausibility theory, that

 W   ∼ P Ri| Mk ∧ C j ∧ I = P(Ri| Mk ∧ C j ∧ I) for all j ∈ x , j∈x∼ (III.33) T = ri x.

15See footnote 12. §27. 41

Hence the same vector x can also be associated to the disjunction of all equivalent circum- stances — which is itself but another ‘circumstance’. It is therefore natural to consider the set of such disjunctions instead of the original set, since the members of each disjunction are for us the same for the purpose of assign- ing plausibilities to measurement outcomes. (This point is made more precise in § 39; cf. eq. (IV.5).) From the above equations it follows that this disjunction is uniquely charac- terised by such a vector, which can therefore be used as a unique index: W S x B C j. (III.34) j∈x∼

Such disjunctions will be called ‘x-indexed circumstances’, or more generically ‘plausibility- indexed circumstances’, in analogy with the Laplace-Jaynes approach (remember in fact that the rôle of the vector x is to partly encode the plausibilities P(R j| Mk ∧ S x ∧ I)). Since the vectors x belong to a convex space, the set {S x} is continuous. For a more careful discussion of this point, see Paper (F), § 4, and Paper (G), § 5.3. In a similar way can we imagine all possible kinds of measurement circumstances W; and the set of imaginable coarsened outcomes contains at least all possible disjunctions of outcomes Ri. Mathematically this means that in eqs. (III.25), (III.27), and (III.31) we can entertain all possible plausibility distributions {βk} and at least those stochastic matrices (Qı¯i) that have noughts or ones as elements. The associated vectors r¯ can then be shown to form a convex set as well, the convex hull conv{ri} of the set of outcome vectors. In fact, there is more than only a convex structure: this set is a double cone, as discussed in Paper (C), § III.C; cf. also the references given infra in § 37. The set of sets m¯ associated to uncertainty conditions on measurements has an even more complex structure, possibly with a partial order (cf. [13, 70, 145, 510, 538]). Measurement circumstances can also be grouped together into disjunctions of equiva- lent ones, according to the equivalence relation (III.32). The process is analogous to that for preparation circumstances. I shall not take this step, however; for preparations and measurements have different offices, and it is not yet clear to me whether a ‘plausibility- indexing’ of measurement circumstances and outcomes is as useful as for preparation cir- cumstances.

The vector framework in classical point mechanics

27. Up to now the discussion has been conducted on an abstract and general plane. But it is simple to apply the framework to concrete cases. A simple example concerning a toy system is given in § 2.1 of Paper (D). Here I want briefly present quite natural applications to systems of classical point mechanics and quantum mechanics;16 but note that they are not the only possible ones.

16The first case requires some mathematical care since the resulting convex spaces are not finite-dimensional; but for the general presentation given here I think that no particular mathematical (topological) comments are necessary. 42 III. THE VECTOR FRAMEWORK

With ‘a system of classical point mechanics’ I intend here specifically a closed17 system specified by some phase coordinates z, like the positions and momenta of a collection of particles, and by a set of measurements which can pertain e.g. the position coordinates, or energy, or other quantities like the intensity of the total electric field at a given point, et similia. The measurements may more generally depend on external parameters, and these may be unknown (for a simple example see Peres [585]), so that we have plausibility distributions for their outcomes. An evolution operator which may be time-dependent, or less generally a Hamiltonian (usually corresponding to the energy) or a Lagrangian is also given, but it will not interest us here. If we fix a time instant t0, it is natural to introduce a (continuous) set of states S¯ z, each stating that the system has been prepared or selected, at that instant, with particular phase coordinates z. Amongst the measurements considered for this kind of systems the basic one is that yielding the phase coordinates themselves; we denote it by Mph. Its outcomes Rz are the various phase-coordinate values, and the theory tells us that

ph 0 p(Rz| M ∧ S z0 ∧ Ic) dz = δ(z − z ) dz, (III.35) whose meaning I think needs no explanation. Another almost universally considered mea- en surement concerns the energy h, and we denote it by Mω . In the case of a closed system the energy value is determined by the phase coordinates z, and by other possible parameters as well as the time, which I denote collectively by ω:

h = H(z, ω). (III.36)

If ω is known and fixed the plausibility of obtaining the outcome h is given by

en 0 p(Rh| Mω ∧ S z0 ∧ Ic) dh = δ[h − H(z , ω)] dh. (III.37)

Analogous considerations holds for other kinds of measurements. The plausibilities in the equations above are those ‘given by the theory’, and are more concrete examples of the generic ones of eq. (III.15). Although, as already said, we have an uncountable infinity of states, we may roughly imagine how the construction of a plau- sibility table like (III.16) could proceed, as well as the derivation of the proposition vectors as in Papers (C) and (D). The various vectors are in this case infinite-dimensional and are realised as generalised functions18 over an appropriate space, their scalar product be- ing realised as the integral of the product of those generalised functions. The following associations hold in particular:

0 S¯ z0 is represented by sz0 B [ˆz 7→ δ(ˆz − z )], (III.38) 0 0 Rz0 is represented by rz0 B [ˆz 7→ δ(ˆz − z ) dz ], (III.39)

Rh is represented by rh B {ˆz 7→ δ[h − H(ˆz, ω)] dh}, (III.40)

17In the thermodynamical meaning of the term. 18Or rather, measures. §28. 43 in such a way that hR i ph 00 T 00 0 00 p(Rz00 | M ∧ S¯ z0 ∧ Ic) dz = rz00 sz0 ≡ δ(ˆz − z ) δ(ˆz − z ) dˆz dz , (III.41) nR o en ¯ T 0 p(Rh| Mω ∧ S z0 ∧ Ic) dh = rh sz0 ≡ δ[h − H(ˆz, ω)] δ(ˆz − z ) dˆz dh, (III.42) etc. We clearly recover eqs. (III.35) and (III.37). Note that, although there is a bijective correspondence between phase coordinates z and the preparation-vectors sz of the corresponding propositions S z, the two mathematical objects live in two different spaces, with different relevant properties. In the space of the z there may be a linear structure (e.g., when the z are positions and momenta), a symplectic one, etc. In the space of the sz the relevant structure is the convex one, and it turns out that this space consists in the extreme points of a simplex.19 From this property many other well-known properties follow, e.g. the fact that all states are distinguishable in one shot, and that every measurement is equivalent to some coarsening of Mph.

28. In the particular application of the vector framework to classical point mechanics described in the previous section, a preparation circumstance C represents a state of uncer- tainty regarding the phase coordinates or, better, the propositions S¯ z. Conditional on such circumstance we have a plausibility distribution

p(S¯ z| C ∧ Ic) dz = f (z) dz, (III.43) and, according to § 24, eq. (III.23), we can associate to the proposition C the vector (which is again a generalised function) R h R i x B p(S¯ z| C ∧ Ic) sz dz = ˆz 7→ f (z) δ(ˆz − z) dz ≡ f. (III.44) It is clear that a circumstance C is what we usually represent by a Liouville function.

29. It is interesting to note that, in the vector framework, Liouville functions can be seen from two different points of view. From the first, they are plausibility distributions over the phase coordinates (or better, over the propositions S z); and this is the usual point of view. From the second point of view, they are just vectors that, when combined with other vectors, yield a plausibility distribution; from this point of view their normalisation and positivity properties are not necessary. In fact, the vector representation (III.38)–(III.40) is only a particular realisation: we could have chosen functions different from deltas. A simple alternative would be, e.g., the associations

0 0 S¯ z0 is represented by sz0 B [ˆz 7→ −δ(ˆz − z )], (III.38 ) 0 0 0 Rz0 is represented by rz0 B [ˆz 7→ −δ(ˆz − z ) dz ], (III.39 ) etc.; in this case to the circumstance C would be associated the vector (function) − f , clearly non-positive. This second point of view is quite useful especially when we compare clas- sical mechanics with quantum mechanics, since it implies that Liouville functions and,

19Some qualifications would be necessary in this infinite-dimensional case. See e.g. [14, 426–432, 634, 739]. 44 III. THE VECTOR FRAMEWORK e.g., Wigner functions or other ‘pseudo-distributions’20 are homologous mathematical ob- jects, i.e. they play the same rôle within the respective theories; in particular, they are not plausibility distributions, but only mathematical objects that, combined with others, yield plausibility distributions.

30. It is perhaps useful to give an example of a measurement circumstance as well. Consider the energy function z 7→ H(z, ω) and suppose that the value of the parameter ω is unknown, with a plausibility distribution g(ω) dω. This situation constitutes a measure- en ment circumstance W, conditional on which the measurements Mω for different ω have plausibilities en p(Mω | W ∧ Ic) dω = g(ω) dω. (III.45)

Conditional on W and on a state S¯ z, the possible outcomes Rh have a plausibility distribu- tion nRR o p(Rh| W ∧ S¯ z ∧ Ic) dh = δ[h − H(ˆz, ω)] δ(ˆz − z) g(ω) dˆz dω dh (III.46) and to the outcomes we can associate the vectors  nR o  r¯h B ˆz 7→ δ[h − H(ˆz, ω)] g(ω) dω dh , (III.47) which formula is but a particular case of eq. (III.31).

The vector framework in quantum mechanics

31. With regard to quantum mechanics, the application of the vector framework is as straightforward as it was for classical point mechanics. Here I take as example a closed,21 finite-level system. Such a system is characterised by an infinite collection of preparation circumstances which includes a set of ‘special ones’, indexed by the rays φ of a complex Hilbert space of given dimension. The physical reasons as to why such a set should be picked up are still unknown — and also unsought within the main-stream research of trade science [709]. A collection of measurements is also given which pertain various quantities such as the components of intrinsic angular momentum, energy, et similia. The collection of preparations and measurements has the specific feature — whose reason is also unknown and unsought — that at most K preparations can be distinguished in one shot by some measurement; the system is therefore called a K-level system. An evolution operator or less generally a Hamiltonian is usually given as well, but it will not interest us here. Fixing a time instant t0, one can introduce the set of preparation circumstances {C j} and the set of measurements {Mk}, each with a set of outcomes {Ri | i ∈ Λk}. If we imagine to specify a plausibility table like (III.16) and to decompose it as usual, we arrive at sets of circumstance- and outcome-vectors x and ri that have particular convex struc- tures; in particular, the set of preparation vectors is convex-structurally isomorphic to the

20See references in § 31 infra. 21In the thermodynamical meaning of the term. §31. 45 set of positive-definite K-by-K complex Hermitian matrices with unit trace, and the set of outcome vectors is the largest convex set compatible with the structure of the preparation- vector set (in the sense of Paper (C), § III.C). These convex structures are quite compli- cated; an example for a three-level system is shown in Fig. 1 of Paper (H), and others are available upon request [604]. The circumstance- and outcome-vectors are realised in a variety of ways in quantum- mechanical studies and applications, depending on the purpose. The most common re- alisation is as particular Hermitian matrices: statistical operators [224] ρ and positive- operator-valued-measure elements [15, 22, 106–108, 110, 111, 113, 156, 157, 348, 351, 369, 444, 644] Ei, the scalar product corresponding to the trace of the product of these matrices:22 x =ˆ ρ, (III.48a)

ri =ˆ Ei, (III.48b) T ri x =ˆ tr(Eiρ). (III.48c) Another realisation very common in quantum optics is as Wigner functions [419, 741] W or other functions like Husimi’s Q or Glauber and Sudarshan’s P [264, 276–278, 342, 461, 680], defined on a particular parameter space; the scalar product is realised as the integral of the product of the functions: x =ˆ [y 7→ W(y)], (III.49a)

ri =ˆ [y 7→ Vi(y)], (III.49b) R T ri x =ˆ Vi(y) W(y) dy. (III.49c) There have been many discussions about the fact that such functions, often called ‘pseudo- probability distributions’, do not generally have the properties of a plausibility distribu- tion. But we see that this is in fact not necessary, since their rôle is not that of plausibility distributions, but of objects that, combined with similar objects, yield plausibility distribu- tions.23 Put it otherwise, the fact that e.g. a Wigner function may have negative values is no more troublesome than the fact that a statistical operator has complex entries: for neither is a plausibility distribution. I have already mentioned that the sets of circumstance- and outcome-vectors have par- ticular and complicated convex structures. Some consequent properties are well known. E.g., the circumstances represented by the extreme points of the preparation-vector set are not all distinguishable in one shot, as instead is the case for most systems of classical point mechanics. A noteworthy property of the set of measurements is that not all measurements can be obtained as coarsenings of a single one, as is the case for most classical systems; this is related to the ‘construction’ of some measurements by means of Najmark’s theo- rem [see e.g. 12, 346, 348, 351]. For detailed discussions of the convex and related prop- erties of these sets see e.g. the studies by Holevo [346, 348, 349, 351], Schroeck [644],

22The symbol ‘=ˆ ’ means ‘corresponds to’. Cf. the ISO [361] and ANSI/IEEE [357] standards. 23In some cases the objects to be combined with are sort of ‘identities’, and thus the combination is not apparent. This is the case e.g. for the Q and P pseudo-distributions. 46 III. THE VECTOR FRAMEWORK

Busch, Grabowski, and Lahti [106, 108–111], Beltrametti and Bugajski [56], and refer- ences therein.

Miscellaneous remarks

32. In the study of some now fashionable topics, like entanglement, cloning, and other communication-theory related ones, the vector framework appears to me the most appro- priate since it offers a clear geometric point of view, uncluttered by mathematical objects and notions — like eigenprojectors, Hermitian conjugates, kets, bras, and other Hilbert- space paraphernalia — that are not always necessary. This seems to have been noticed in the literature recently [40–42, 668]. Note, in fact, that the notions of distinguishability, measurement sharpness, mixing, coarsening, and many other related ones24 are all of a convex-geometrical nature, and are therefore most easily expressed and studied in convex- geometrical terms. Note that I am not saying that Hilbert-space concepts are unnecessary; they are necessary in the sense that the Hilbert-space structure determines the particular convex one. But when one e.g. wants to localise some points in this convex structure, or delimit particular regions by hyperplanes, or just choose some basis vectors, there is no need to invoke, say, SU-group generators and the like. The mathematics introduced in some works that I have seen in the literature seems more like a sort of exorcising ritual rather than an efficient mathematical apparatus set-up to achieve a given purpose. My statement, indeed, is that in quantum mechanics, the only scope of assigning a Hilbert-space structure is to compactly assign the convex one. And I hope that alternative ways of assigning the convex structure will be found some day; and that these will have some understandable physical content.

33. This leads me to another observation. In quantum theory it is the convex structure of the sets of circumstances and measurements that is given at the start (by postulating a complex Hilbert-space structure), and from this the plausibilities for the measurement outcomes, conditional on the circumstances, are derived. This is the opposite of what we do in classical physics, where instead we give physical laws — which concern notions, concepts, ideas that are distillations and idealisations of our experience — and based on these we assign plausibilities, from which the convex structures are finally derived. As already said, it is still unknown what are the reasons of the particular convex structures of quantum mechanics, and one of the greatest achievements in natural philosophy will be to derive them from humanly understandable principles. Until then quantum theory will only be a very successful black-box theory, a sort of modern Linnaean ‘Systema naturæ per spatia Hilberti’ of the microscopic fauna and flora. Note that by very nice group arguments one can derive the standard quantum-mechanical commutator structures, see e.g. Lévy-Leblond [463] and Holevo [348]; but these derivations assume at the start the particular, unexplained convex structure of the set of quantum-mechanical states.

24E.g., the topic of ‘cloning’, which centres around the notion of distinguishability. §33. 47

Some physicists are of the opinion that quantum theory needs no ‘interpretation’ (e.g., Fuchs and Peres [249]; see also the comments on their articles and their reply [679])25. Well, that depends on what one means by ‘interpretation’. I should say, e.g., that ‘ther- modynamics needs no interpretation’, since the primitives and the laws of this theory are distillates and idealisations of our daily experience and concepts. And yet, I would not say that it is useless to encumber thermodynamics ‘with hidden variables [. . . ] without any im- provement in its predictive power’; for otherwise I should be condemning the first studies in statistical mechanics, which added a lot of ‘hidden variables’ (positions and momenta of microscopic particles) without, at that time, any improvement to thermodynamics’ pre- dictions. I also think that the parallel Fuchs and Peres draw with Euclidean geometry misses the point. They imagine Euclid saying: ‘Geometry is an abstract formalism and all you can demand of it is internal consistency. However, you may seek material objects whose behavior mimics the theorems of geometry, and that involves interpretation’ [679]. But ge- ometry was created as an idealisation, systematisation, formalisation of some provinces of experience, not as an exercise in axiomatics. The primitives of point, straight line, surface, etc., and the associated postulates were all introduced to reflect and idealise some facts of experience. The same is true of the foundations of classical physics [219, 556, 719], based on primitives like body, distance, motion, force, temperature, and on the postulates that characterise these. But the primitives of quantum mechanics — eigenprojectors, statistical operators — which experiential facts do they idealise? I agree with Fuchs and Peres: they are not meant to idealise facts, they are just parts of ‘an algorithm for computing prob- abilities for the macroscopic events (“detector clicks”) that are the consequences of our experimental interventions’ [249]. I.e., again: quantum theory is a black-box theory, not a proper physical theory. Appleby [21], citing Bell [55], states the point in a direct way:

This does not mean that I find the Copenhagen interpretation satisfactory, as it stands now. Bell [55, pp. 173–174] argues that the Copenhagen interpre- tation is “unprofessionally vague and ambiguous”. I think he is right. I also think he is right to complain that quantum mechanics, when interpreted in tra- ditional Copenhagen terms, seems to be “exclusively concerned with ‘results of measurement’ and [seems to have] nothing to say about anything else”. I share Bell’s conviction that the aim of physics is to understand nature, and that counting detector “clicks” is not intrinsically any more interesting than counting beans. If prediction and control were my aim in life I would have become an engineer, not a physicist.

There is also a facetious side in the assertion (see again Fuchs and Peres’ quotation, supra) that quantum theory encodes probabilities for macroscopic events: this would mean that quantum theory is not meant to describe ‘microscopic’ phenomena!

25Although I paraphrase some statements from Fuchs and Peres’ article here, I do not really want to attribute them the meaning here intended to those authors. I find the statements at the beginning and at the end of their article contradictory; surely because I have not understood what they mean by ‘interpretation’. 48 III. THE VECTOR FRAMEWORK

One could argue: ‘but in the microscopic domain there cannot be, by the very meaning of “microscopic”, any experiential facts, since those phenomena are not immediately ac- cessible to your senses’. And that is true. But in fact the original programme and approach was to try to imagine or to represent to ourselves those microscopic phenomena by means of macroscopic concepts — we could say with Nietzsche [560]:

— Aber diess bedeute euch Wille zur Wahrheit, dass Alles verwandelt wer- de in Menschen-Denkbares, Menschen-Sichtbares, Menschen-Fühlbares! Eu- re eignen Sinne sollt ihr zu Ende denken!

And the beautiful statistical-mechanical and kinetic-theoretical studies of Maxwell, Boltz- mann, Gibbs, amongst others, are examples of such a programme. Many physicists today say that this programme is no longer feasible, and some even mention ‘proofs’ of such impossibility, e.g. Bell’s theorem. But even if there really was a theorem showing a con- tradiction of locality and determinism with experimental data: who cares? Fuchs and Peres [249] say that a non-local theory ‘would eventually have to encompass everything in the universe, including ourselves, and lead to bizarre self-referential logical paradoxes’. That is a bizarre statement: Newtonian mechanics is in principle non-local, but that has never hindered anybody to apply it for very concrete purposes, like calculating where a bomb should land, with no paradoxes at all. And non-local theories are studied and used in modern continuum mechanics [204, 205, 207–212, 214, 215, 218]. In any case, with regard to Bell’s theorem we have seen in § 17 that its import in respect of locality and determinism is naught. In fact, there is a different theorem, and a very simple one, whose content goes in a very different direction than that of Bell’s theorem: it states that any quantum system, in fact, any system whatever, can always be considered as a classical one in which some measurements are ‘forbidden’; see e.g. Holevo [348, § I.7]. Fortunately, the programme of Maxwell, Boltzmann, Gibbs has not been abandoned, and many studies [43–45, 76, 77, 91, 124, 141, 327, 328, 340, 375, 376, 378, 380, 381, 385–387, 389, 390, 393, 578, 628, 637, 678, 727–732] (see also [116–118]) — some more, some less interesting, some convincing, some unconvincing — are pursued today in its spirit. Moreover, thanks to the appearance of freely and publicly available sci- entific archives like arXiv.org (http://arxiv.org) and mp_arc (http://www.ma. utexas.edu/mp_arc/), these studies may finally appear freely, without the ostracism of ‘peer-reviewed’ periodicals.26 Representative of this ostracism, still present today, is the following remark by Boyer at the end of a study in which he apparently shows that the Aharonov-Bohm phase shift ‘may well arise from classical electromagnetic forces which are simply more subtle in the magnetic case since they involve relativistic effects of the order v2/c2’[77]:

I should also like to thank the Editor for his decision to accept for publi- cation my two papers dealing with the Aharonov-Bohm phase shift and re- lated classical electromagnetic theory, despite the existence of a (minority)

26Some periodicals are exceptions; in particular Foundations of Physics, to the editor of which, Alwyn van der Merwe, I pay my homages. §34. 49

opinion among the several referees that urged rejection on the grounds that, although my calculations might be correct, my conclusions were “certain” to be wrong and thus would “lead to unnecessary confusion” regarding the Aharonov-Bohm phase shift.

34. In the previous section I have mentioned a theorem by Holevo [348, § I.7], stating that any physical system whatever can always be considered as a classical one in which some measurements are ‘forbidden’. There is an analogous theorem stating that any phys- ical system whatever can always also be considered as a quantum one in which some measurements are ‘forbidden’. The proof, that I do not give here [605], is based on the fact that a simplex of any dimension, say D, can always be obtained as a projection of the state space of a D-level quantum system. Projecting a state space (viz., the set of preparation vectors), thus obtaining a new one, corresponds to declaring some measurements unfeasi- ble (i.e., to eliminating some vectors from the outcome-vector set). To understand this fact one may draw a parallel with the standard topology of Rn: you can derive it from the set of balls or from the set of hypercubes: it does not matter which since every ball can be inscribed in a hypercube and every hypercube in a ball.27 These theorems show that one should not give some, but not ‘too much’, physical meaning to the particular simplicial and ‘Hilbertian’ convex structures of classical and quantum systems. There are some reasons which make me believe that the Hilbertian convex structure of quantum theory is not fundamental, whereas the simplicial one of classical physics is the fundamental one. First, the horrible dimensional jump of the state space for systems of different dimen- 2 sionality: if a quantum system has a state space of dimension√ N (with N = D − 1 for some natural D), the next ‘larger’ quantum system has 2 N + 1 + 1 more dimensions!28 Compare this with a classical system, where this dimensional jump is simply 1 instead, independently of N. This has important and annoying consequences in the actual study of some systems. Consider e.g. the case in which we are studying a two-level quantum system which is a subsystem of a larger one. As long as we are interested in measurements and preparations of the subsystem only, we only need to work with 3 real independent variables (from the independent components of the statistical operator). But as soon as we consider a single measurement or preparation concerning the larger system (e.g., we want to keep track of a global quantity), we have to increase the number of variables, and the minimal increase allowed by the quantum mechanical formalism is by 5 real variables! It may well be the case that some of these are unimportant for us, but we have to drag them along anyway. Another way is to consider only the useful additional variables as extra parame- ters. Situations of this kind made it necessary to consider non-completely positive maps and their restricted domains [115, 245, 405–407, 579, 635, 654, 734]. On the other hand, in a situation analogous to the above, but for a classical system, the minimum necessary

27But this topological fact is not true in infinite dimensions. Analogously I wonder whether the reciprocal ‘inscribableness’ of quantum and classical systems holds for infinite-dimensional ones. 28This obviously follows from the fact that a D-level quantum system has a state space of dimension N = D2 − 1. Note also that N = K − 1 for K as defined in Papers (C) and (D). 50 III. THE VECTOR FRAMEWORK number of additional variables would just be 1. Classical theories are more flexible. Cf the remarks given in the earlier versions (available at http://arxiv.org) of Paper (B). Another reason, related to the first one, is the excessive number of variables that appear when we ‘compose’ two or more systems, an operation implemented by the tensor product in the quantum-mechanical formalism. E.g., composing two three-level systems, described by 8 variables each, we suddenly have to handle with 80 variables. In the classical case this number would be 64 instead. The additional variables and the extra correlations that accompany them make me believe that the tensor-product operation of quantum mechan- ics puts in, physically, some more ‘systems’ or more generally speaking some ‘additional phenomena’ than just the systems entering the tensor product. In any case, I think that we ought to consider alternative mathematical approaches to the relation between ‘subsys- tems’ and ‘supersystems’ than Cartesian or tensor products. I made some remarks on this in Paper (C) and a mathematical study is almost ready [605].

35. It should be noted hew in the vector framework here presented there is a clear dif- ference between preparations and measurement outcomes, and this difference is reflected in their mathematical representatives, the preparation vectors and outcome vectors. In fact, the convex spaces of these two kinds of vectors can be very different. In particular, they need not have the same number of extreme points. In view of this fact, classical and quantum systems are particular because their outcome- and preparation-vector spaces have not only the same cardinality, but can even be given the same linear, respectively Hilbertian structure. It is this peculiarity that allows us to asso- ciate to every ‘ket’ |φi a ‘bra’ hφ|, and vice versa, in quantum mechanics. Hence what I am saying is that in general physical theories a correspondence or ‘pairing’ analogous to the quantum-mechanical |φi ! hφ| does not exist. This leads me to two brief comments. The first is that I do not like the oft-heard sentence ‘the probability that a system prepared in the state |φi be found in the state |ψi is. . . ’. This sentence would be meaningless in other physical theories: in general we can only speak about the plausibility that a measurement yield this or that outcome; and it is not completely clear to me what ‘finding a system in a given state’ means. We can infer that it was prepared in a given state, which is a different statement. From this point of view the Kochen-Specker [437] and similar theorems lose their (meta-mathematical) meaning. Note also that the statement that a quantum system ‘is found’ in some state needs qualification: amongst which states? It usually understood that the states in which it can be found is a complete orthonormal set; but then I do not see why one could not ask for the plausibility (density) that the system be found in the state |ψi amongst all possibile states, orthogonal or not. This question would simply correspond to a positive-operator-valued measure with a continuum of results.29 The second comment is that we perhaps ought to search for some ‘physical meaning’ for the fact that classical and quantum systems have that particular isomorphism between the spaces of outcome- and preparation-vectors.

29An example for a two-level quantum system is the positive-operator-valued measure {|ΩihΩ| dΩ}, where Ω represent the coordinates of a surface element on the Bloch sphere. §36. 51

36. Some people regard the schematisation of what we do with and within physical theories into the propositions S¯ etc. as ‘operational’, as I also did once. But now I must confess that I do not see exactly what is that this adjective should put into relevance, or demarcate, or exclude. Are not all proper physical theories (i.e., excluding toy theories and mathematical divertissements like string theory) ‘operational’? Which theories are not? I should like to quote a passage from Truesdell’s review [708] of a book by Jammer: Thus, on page 120 [of Jammer’s book], “In contrast to a purely hypothetico- deductive theory, as for instance axiomatized geometry, where primitive no- tions (like ‘point,’ ‘straight line,’ and so forth) can be taken as implicitly de- fined by the set of axioms of the theory, in mechanics semantic rules or cor- relations with experience have to be considered and a definiendum, even if defined by an implicit definition, must ultimately be determinable in its quan- titative aspects through recourse to operational measurements.” This is simply nonsense. If a physicist says, “I take a sphere of one inch radius weighing one pound,” why is only the pound and not the sphere or the inch in need of “oper- ational” definition? Cannot the sort of person who derives comfort from “op- erational” definitions manufacture them for geometry, too? And when we are told, “Mach did not say what ‘mass’ really is but rather advanced an implicit definition of the concept relegating the quantitative determination to certain operational procedures,” are we really expected to find any meaning here, or is it just a smooth transition to the next chapter in a sociological essay? Surely all proper physical theories are created to mathematically frame and describe mat- ters of experience, and therefore can but be ‘operational’ in this sense. But surely they also involve abstractions and generalisations, otherwise they would not be theories but mere catalogues (and even catalogues imply a certain degree of abstraction), as Poincaré [596, ch. IX] said with other, famous words. I probably do not need to mention the fact that the very devising and set-up of every physical experiment is imbued with theory. All in all, the main problem with all this discussion on ‘operationalism’ seems to lie in the maniacal urge to paint as all white or all black something which has more colours than the rainbow. The propositions S¯ etc. are not meant to exclusively represent ‘laboratory instructions’ or the like. From this point of view the chosen names of ‘preparation’, ‘measurement’, and ‘outcome’ are particularly unfortunate; but I have not yet found more suitable alter- 30 natives. In the example of § 20, e.g., the proposition Cc represented the specification of a particular phase point, and the proposition I represented the specification of the evolu- tion operator amongst other things. If these specifications are ‘operational’ or not does not interest me, in view of the discussion above.

30 I often contemplate using the Latin terms praeparatio for the S¯ j, quaestio or mensuratio for the Mk, and prouentus for the Ri. 52 III. THE VECTOR FRAMEWORK

Historical notes and additional remarks

37. The basic ideas behind what I have here called, quite laconically, ‘vector frame- work’ have a relatively long history, though less than a century long. I shall try to give some references, but they will be very incomplete. I think one can recognise two and a half fundamental elements in the framework. The first is the introduction of the notions of preparation (or state), measurement, and outcome; or similar ones. They are the arguments of plausibilities. The second is the association of mathematical objects to those notions, objects that encode the plausibilities and are prin- cipally characterised by a convex structure. The ‘half’ element are the particular relations that we can introduce amongst preparations, measurements, and outcomes; specifically, what I have called ‘mixing’ and ‘coarsening’. I consider this a ‘half’ element because it is quite naturally derived from the first one. The natural notions of preparation, measurement, and outcome have basically always been present in physics; but they have often been directly identified with their mathemat- ical representatives, without the formal intermediary of logical propositions. Some may think that such intermediaries are in fact unnecessary; but I disagree. When we add (or subtract) a new proposition in a set of given ones there are usually no dramatical changes in this set, provided the new proposition is not inconsistent with the rest. We simply need to assign new plausibilities (or some become irrelevant, in the case of subtraction); the existing ones usually remain valid. On the other hand, the mathematical structure of the associated mathematical objects may have to be changed dramatically. In this resides part of the usefulness of the propositions as intermediaries. We have seen e.g. how gentle the ‘transition’ between a classical and a quantum system is from the propositional point of view: we have only taken away some propositions regarding some measurements. And yet the associated mathematical structure has changed from that of a phase space to that of a complex Hilbert space, and the convex structure from that of an infinite-dimensional simplex to that of a finite-dimensional non-simplicial convex body. The first studies known to me in which the notions of preparation etc. were introduced more or less explicitly as propositions are those by Strauß [674], and Foulis and Randall [246–248, 615–617], which also introduced the notion of a plausibility table; see also Ludwig, Dähn, and Stolz [146– 148, 484–492, 671, 672], Ekstein [194, 195], Hellwig and Kraus [337, 338], Giles [268– 270], Gudder [301–304], Lubkin [483], Holevo [348], Hardy [316]. In other studies those notions are rather connected to their mathematical representatives than to general propo- sitions; but in many cases it is indeed difficult to put a demarcation line. Therefore the studies I mention in the next paragraphs should also be taken into account. Amongst the first studies, known to me, of the mathematical representatives, especially from a convex-geometrical point of view, I can mention Mackey’s book [494], followed by numerous other studies amongst which of particular importance are those by Lud- wig, Dähn, and Stolz [146–148, 484–492, 671, 672] already mentioned, Gleason [279], Kochen and Specker [436, 437], Gudder [96, 298–304], Mielnik [526–530], Davies and Lewis [153–157], Hellwig and Kraus [337, 338], Holevo [346–349, 351], Ali, Emch, and Prugoveckiˇ [15–17], Bloore [71], Wright [751–754], Harriman [318–324], Ivanovic´ [362– 366], Bugajski, Lahti, Busch, Schroeck, Grabowski, Beltrametti [56, 57, 96, 97, 105– §37. 53

113, 447, 448, 643–647]; see also Jones [403, 404]. In many of these works the notions of state and measurement have narrower connotations than those presented here (some of these authors, e.g., speak about preparations of ‘beams’ of particles; although I suspect they are aware that their formalism has more general applications). Moreover, the no- tion of state has almost always been associated to a particular time instant, which is not necessarily the case in my presentation, cf. § 21. Almost all of the above studies were borne out of a desire to understand quantum theory. There is also another seam of studies, principally concerned with partially ordered sets and various kinds of lattices, that from the point of view of the framework presented here represent the structure induced in the set of measurements (not outcomes; cf. § 26) by the notions of measurement mixing and coarsening explained in § 24. The germs may be found in an article by Birkhoff and von Neumann [69]. Notably, their paper begins thus: One of the aspects of quantum theory which has attracted the most general attention, is the novelty of the logical notions which it presupposes. It asserts that even a complete mathematical description of a physical system S does not in general enable one to predict with certainty the result of an experiment on S, and that in particular one can never predict with certainty both the posi- tion and the momentum of S (Heisenberg’s Uncertainty Principle). It further asserts that most pairs of observations are incompatible, and cannot be made on S simultaneously (Principle of Non-commutativity of Observations).

Here one wonders what is so logically novel about these notions. They may be physically novel, or better, unusual; but surely not ‘logically’ novel. There is nothing in the axioms of logic that contradicts these notions. But the authors continue:

The object of the present paper is to discover what logical structure one may hope to find in physical theories which, like quantum mechanics, do not con- form to classical logic. But we have seen in the preceding sections that quantum mechanics is just a particular application of ‘classical’ logic and ‘classical’ plausibility theory, so it surely ‘conforms’ to both. We see here one of the seeds of that confusion of logic and probability/plausibility theory with physics that still plagues us today. It is remarkable that these two eminent mathematicians planted one of those obnoxious seeds. Fortunately, Strauß [674] (and later Koopman [440]), making analogous studies, recognised the point I have just made. Also Bodiou [72] has in his studies a clearer view than Birkhoff and von Neumann:

le principe des probabilités composées n’est pas vérifié par les pondérations conditionnées quantiques. On interprète, d’un point de vue classique cette non-vérification en considérant que le “conditionnement quantique” est, en réalité, un “changement de catégorie d’épreuves”. In other words, we have different contexts (or ‘sample spaces’ as many statisticians would say), one for each measurement; therefore the specification of the measurement M is an essential part of the framework. In our list Pool [601, 602] and Greechie [291, 292] follow 54 III. THE VECTOR FRAMEWORK next. This kind of studies then joins the seam on convex structures, so I may refer to the references of the previous paragraph. Studies related to the notions of measurement mixing and coarsening, though not from a lattice point of view, are those by Blackwell [70], and Morse and Sacksteder [538]. More recent studies, which I do not try to put into a category or another, are e.g. those in refs [4–11, 37–42, 58–63, 115, 128, 129, 290, 334–336, 416, 481, 482, 581, 582, 667, 668, 742–744, 754, 766]; many of them, although relatively relevant to the vector framework, are specifically concerned with quantum mechanics. Amongst all the above studies, those by Holevo [346–349, 351] deserve special men- tion; he systematised the whole framework within the general theory of statistical models and proved many important results that are still rediscovered today. Hardy’s work [316] also deserves special mention as it presented the basic ideas in the simplest possible math- ematical form. Finally, it is worth mentioning that many intimate connexions exist amongst the vector framework, the theory of statistical models [e.g., 66, 67, 259, 516], and system theory [411, 746, 756] (cf. also [411, 746, 747]). IV. A synthesis: induction and state assignment

Putting together the Laplace-Jaynes approach and the vector framework

38. You have probably noticed that some notation and terminology from ch. III had al- ready appeared in ch. II, especially in §§ 11–12. And you have probably thought, correctly, that that similarity was not accidental. Consider the following two situations:

(a) We have a problem of induction concerning some measurement instances {M(τ)} with kτ outcomes {R(τ)}; these regards a given physical system. We decide to adopt the Laplace- iτ (τ) Jaynes approach; we therefore need to specify the set of circumstances {C j } and the prior plausibilities

P(Ri| Mk ∧ C j ∧ I) for all i, k, and i ∈ Λk, (III.15)r

and this specification we do using the theory describing the physical system.

(b) We have a given physical system for which we have introduced sets of preparation cir- cumstances {C j}, measurements {Mk}, and outcomes {Ri}, with the associated vectors. In a given collection of measurements we know that the preparation circumstance was always ‘the same’, though we do not know which. Our problem is to assign plausi- bilities to the circumstances given the evidence of the outcome obtained. This is the typical situation that in plausibilistic physics we call ‘state assignment’ or ‘estimation’ or ‘retrodiction’.

It is clear that the two problems are faces of the same coin. We solve both at once by applying the formulae for the Laplace-Jaynes approach as given in § 5 of Paper (G) or in §§ 11–13 above; in these formulae the plausibilities (III.15), dictated by the physi- cal theory, will be encoded into the outcome- and preparation-vectors, as in eqs. (III.19) and (III.23): T P(Ri| Mk ∧ C j ∧ I) ≡ pi j = ri x j. (IV.1)

55 56 IV. A SYNTHESIS: INDUCTION AND STATE ASSIGNMENT

Given data D consisting in N outcomes of various kinds of measurements, with fre- quencies (Ni),

(τN ) (τ1) D B R ∧ · · · ∧ R (with ia ∈ Λk , a = 1,..., N), (II.16)r | i N {z i 1 } a Ri appears Ni times the plausibility of the circumstance C j is therefore given by eq. (II.18):   Q Ni pi j P(C j| I) k,i∈Λk P(C j| D ∧ I) =   , (II.18)r P Q Ni pi j P(C j| I) j k,i∈Λk which can also be rewritten, using eq. (4) of Paper (C) or (D), or equivalently eq. (III.19), as h i Q T N (rl x j) l P(C j| I) l P(C j| D ∧ I) = PhQ i . (IV.2) T N (rl x j) l P(C j| I) j l

The plausibility that, performing a new instance τN+1 of some measurement Mk, we obtain the outcome Ri is X (τN+1) (τN+1) P(Ri | Mk ∧ D ∧ I) = P(Ri| Mk ∧ C j ∧ I) P(C j| D ∧ I), j h i Q T N X (rl x j) l P(C j| I) T l = ri x j PhQ i T N j (rl x j) l P(C j| I) (IV.3) j l h i Q T N X (rl x j) l P(C j| I) T l ≡ ri x j PhQ i . T N j (rl x j) l P(C j| I) j l

The last equation shows that to the conjunction of data and prior knowledge D ∧ I we can associate an ‘effective’ circumstance-vector h i Q T N X (rl x j) l γ j l dD∧I B x j PhQ i . (IV.4) T N j (rl x j) l γ j j l

39. The above formulae are all expressed in terms of a set of preparation circumstances {C j}. We have seen that in both the convex framework and the Laplace-Jaynes approach §40. 57 there is a natural equivalence relation amongst circumstances. For the convex framework it was defined in § 25, eq. (III.24); for the Laplace-Jaynes approach it was defined in § 13, eq. (II.19). It is clear that the two definitions coincide. We also saw that in both frameworks it is quite natural to disjoin equivalent circum- stances together, since neither framework can lead to relative differences in the plausibil- ities of equivalent circumstances, other than those that were already present in the prior data. More precisely: for each pair of equivalent circumstances C0, C00, the ratio of their updated plausibilities cannot change and is equal to that of their prior plausibilities:

P(C0| D ∧ I) P(C0| I) C0 ∼ C00 =⇒ = , for whatever data D consisting of outcomes. P(C00| D ∧ I) P(C00| I) (IV.5) In the case of the Laplace-Jaynes approach we stopped short of disjoining equivalent circumstances and ‘plausibility-indexing’ them. The reason is that we can follow the steps taken in the convex framework instead, as described in § 26: we take disjunctions W S x B C j. (III.34)r j∈x∼ of circumstances having the same circumstance-vector x. As you remember, we called these disjunctions ‘x-indexed circumstances’, and the vectors x form a convex set. In terms of x-indexed circumstances, our state-assignment eqs. (IV.2) and (IV.3) take the form h i Q T N (rl x) l p(S x| I) dx l p(S x| D ∧ I) dx = R hQ i , (IV.6) T N (rl x) l p(S x| I) dx l

(τN+1) (τN+1) T P(R | M ∧ D ∧ I) = ri dD∧I, with (IV.7) i k Z

dD∧I B x p(S x| D ∧ I) dx. (IV.8)

40. The state-assignment formulae above are valid for any physical theory — or at least, for those theories whose plausibilistic properties can be formalised through the vec- tor framework; but we have seen that classical mechanics and quantum mechanics are counted amongst these. When we change to quantum-mechanical notation — because the change is only notational, nothing more — we obtain in fact the formulae of §§ 1 and 2 of Paper (H); in Paper (I), § 3, they have been generalised to data Df consisting of disjunc- tions of outcome collections: _ D B (R(τN ) ∧ · · · ∧ R(τ1)) (with i ∈ Λ , a = 1,..., N). (II.16) f iN i1 a ka r (ia)∈Ξ

In these papers the joint convex-Laplace-Jaynes framework was used in the state assign- ment for a three-level quantum system. 58 IV. A SYNTHESIS: INDUCTION AND STATE ASSIGNMENT

Remarks

41. I should like to add here some remarks concerning the state-assignment framework presented above. The first remark concerns a comparison of the Laplace-Jaynes approach in the case of multiple kinds of measurements, and of the approach based on (unrestricted) partial infinite exchangeability [237; 66, § 4.6.2]. Equations (IV.6) and (IV.7) above can be rewritten, when no reference is made to the vector framework, as Z (τN+1) (τN+1) P(Ri | Mk ∧ D ∧ I) = pl(x pi(x) p(S x| D ∧ I) dx, with (IV.9) h i Q N pl(x) l p(S x| I) dx l p(S x| D ∧ I) dx = R h i , (IV.10) Q N pl(x) l p(S x| I) dx l

pi(x) B P(Ri| Mk ∧ S x ∧ I). (IV.11) We realise that they are apparently not completely equivalent, in respect of their mathe- matical form, to those obtained from the theorem on partial infinite exchangeability. The difference consists in the fact that within the exchangeability approach we have

pi(x) ≡ xi, i.e., p(x) ≡ x, (IV.12) and moreover x ≡ p has a definite range, viz. the Cartesian product of the simplices asso- ciated to the plausibility distributions of the various measurements Mk:  X x ≡ p ∈ ∆k, ∆k B {(pi) | i ∈ Λk, pi > 0, pi = 1}; (IV.13) k i whereas in the Laplace-Jaynes approach the parameter x belongs to a generic convex space that depends on the meaning of the set of circumstances (e.g., the physical system from which they stem), and must thus be specified by us, with the sole restriction  p(x) ∈ X ⊆ ∆k. (IV.14) k This difference lies clearly in the fact that in the exchangeability approach the parameter x is ‘uninterpreted’, but not so in the Laplace-Jaynes approach, where x indexes propositions that have particular meanings and whose plausibilities may be dictated by some physical theory1. However, the exchangeability-based formulae can be made equivalent to the Laplace- Jaynes ones by appropriately restricting the support of the prior generating function to those values of x such that p(x) ≡ x ∈ X. This corresponds to a particular a priori plau- sibility judgement with regard to the possible infinite collections of outcomes that we can ever observe; a judgement that may stem from some physical theory. From this point of view there is hence no substantial difference between the two approaches (but see Pa- per (G) for a discussion of other differences in range of applicability and purposes).

1Which in the end means: by us, since it is we who, based on our experience, create and distill physical theories. §42. 59

42. The second remark, which has many connexions with the previous one, concerns a comparison of the framework for state assignment, as developed above, with that one presented by Caves, Fuchs, and Schack [121, 122, 250, 251], based on their ‘quantum de Finetti representation theorem’ [122, 354, 673]. In Paper (G) I lament the fact that the quantum de Finetti approach does not work with quantum mechanics on real and quater- nionic Hilbert space; but a general state-assignment framework should apply to any con- ceivable, i.e. self-consistent, physical theory, even one that apparently describes (yet) un- observed phenomena. The framework here presented does satisfy this requirement. Another criticisms can be levelled at the quantum de Finetti state-assignment approach2. One of its basic assumptions is that a collection of quantum systems (even ones localised at arbitrary space-time separations) can be handled as a quantum system in itself — with den- sity matrices describing its preparations etc. A similar assumption was also made by Balian and Balazs [33, 35] in a tentative to justify the maximum-von Neumann-entropy principle in quantum statistical mechanics. I personally see this assumption (which also seems to be the cause of the inapplicability of the approach to real and quaternionic quantum me- chanics) as completely unwarranted. We cannot take a collection of quantum systems and gratuitously state that they collectively behave as a quantum ‘super-system’: this would be a statement with important physical consequences and would surely need qualification. Imagine that we find a report on some quantum experiments made twenty years ago in a distant land, say 7 000 km from here. The report contains a very detailed description (including, e.g., the temperature and humidity of the laboratory et sim.) of a collection of experimental set-ups, each of which can be conceptually divided into a ‘preparation set- up’, which is always the same, and a ‘measurement set-up’ which is different for different experiments of the collection. The report also contains the outcome data obtained in these measurements. Now we can reproduce, by carefully following the reported instructions, the preparation set-up and perform new measurements on it and collect new outcome data. I should be willing, in this situation, to use the old and the new data together to assign a density matrix to the preparation set-up. And yet, I see no physical grounds for consider- ing the old and the new experiments as performed on a ‘quantum super-system’ extending twenty years in time and thousands of kilometres in space! The reason why in the example I am willing to use the old and new data is simply be- cause I judge the preparation scheme to be the same in all old and new experiments (in the precise sense of eqs. (I), (IV) of Paper (G) and (II.11), (II.12) above), and want therefore to assign to it a preparation-vector that encode the plausibilities for experiments performed following that scheme. This is precisely what we do in the Laplace-Jaynes approach, with- out that additional, and in this case also very suspect, ‘super-system’ assumption.3

43. The studies of the above-mentioned papers originate from the observation that it is always possible to analyse a given context at a deeper logical level. Syntactically and

2N.B.: I am now speaking about the quantum state-assignment approach, not the quantum de Finetti theorem; the latter is in fact a very interesting piece of mathematics. 3Moreover, as discussed in Paper (G), with the Laplace-Jaynes approach we do not need to entertain an infinite collection of additional fictive similar experiments either. 60 IV. A SYNTHESIS: INDUCTION AND STATE ASSIGNMENT semantically this analysis is done by introducing some set of propositions so chosen as to satisfy particular plausibilistic properties (see Paper (F), § 4; Paper (G), §§ 5.1, 5.3). Such propositions have been called ‘circumstances’ for want of a better term,4 and I have called the approach based on them the ‘Laplace-Jaynes approach’, since the main point of view is foreshadowed in Laplace [449] and the idea is briefly but explicitly expressed in Jaynes [391].5 What is important to stress is that the meaning of these circumstances is quite arbi- trary (so long as their plausibilities satisfy the mentioned properties). This means that an analysis into circumstances can be done from the points of view of different theories and even of different philosophies. Hence two (or more) different persons, with background 0 00 0 00 knowledges I and I , will choose in general different sets of circumstances {C j0 }, {C j00 }, 0 0 00 00 as well as different conditional plausibilities P(Ri| C j00 ∧ I ) and P(Ri| C j00 ∧ I ), for all the propositions Ri of common interest — call these ‘outcomes’. The plausibilities for the cir- 0 0 00 00 cumstances themselves, P(C j00 | I ) and P(C j00 | I ), will also be generally different of course. However, as soon as the two persons perform the ‘plausibility-indexing’ of their re- spective sets of circumstances (as described in Paper (F), § 4, and (G), § 5.3), obtaining 0 00 the new sets {S p} and {S p } with

0 0 00 00 P(Ri| S p ∧ I ) = P(Ri| S p ∧ I ) ≡ pi, (IV.15) they will find themselves in a sort of formal agreement on the plausibilities assigned to the outcomes Ri conditional on each p-indexed circumstance, as the above equation shows. 0 00 The agreement is only formal because the circumstances S p, S p , that have the same index p will have different meanings to the two persons. This difference will in fact be manifest 0 0 00 00 in the difference between the plausibilities P(S p| I ) and P(S p | I ). But also the difference in those plausibilities can be reduced. If there are multiple ‘in- stances’ — in the sense discussed in Paper (G) — of the outcomes Ri, then the collection of an enough large amount of data will lead, under certain assumptions, to the mutual convergence of those plausibilities. The two persons will then formally agree on all the 0 00 updated plausibilities involving the plausibility-indexed circumstances S p and S p , even though the meanings of the latter will still be different. Thus a difference in philosophy or interpretation does not lead to a difference in the mathematical formalism and in the predictions.

44. Some of the points of the previous section, like e.g. the convergence of the updated total plausibilities, are also valid for the approach to induction and to the interpretation of ‘plausibilities of plausibilities’ based on de Finetti’s theorem [e.g., 66, 170, 171, 232,

4I sometimes contemplate using the Latin term condicio. 5As regards Laplace, the above naming is not meant to reflect historical priority or attribution: I have very little historical knowledge on the subject. Laplace is usually presented as conceiving ‘physical’ causes for what I here denote by C and S . I think his ‘causes’ could be read in a more general sense (cf. the passage where he calls ‘cause’ the fraction of white to black tickets [449, § II]; this is surely not a ‘physical’ cause). I could have included the name of Bayes, who in his work [53] expresses partially similar ideas; but he seemed to me less explicit than Laplace. Others could have been mentioned as well, like Johnson [402], de Finetti [232, § 20], or Caves [120]. I later discovered that Mosleh and Bier [539] also propose and discuss basically the same interpretation. §44. 61

238–240, 329, 341; see also 384, 473]. This is also discussed in Paper (G), together with the reasons for which the Laplace-Jaynes approach can be preferable to that based on exchangeability. I think that the former approach is far more suitable to problems in physics, since all theories introduce physical concepts that play the rôle of ‘causes’, or more generally, of ‘circumstances’.

V. Epilogue

Summary and beginnings

45. There are many other topics, problems, conjectures, answers, ruminations about which I should have liked to talk. As said in the prologue, there was no time or space to talk about them. My hope for the present writing is that it have succeeded in showing its own unity, in showing that the discussion has explored a single unbroken territory, though from different directions. That territory is plausibility logic. We have seen how it allows us to formalise and quantify our conclusions about unknown facts from known similar ones; how it provides insights and clarifications in various physical problems; how it yields a mathematical struc- ture with which we can study the predictions of physical theories, including quantum me- chanics, prescinding these predictions from their physical contents. Did we find in this territory something unseen before? I cannot answer this question. I have seen many, many cases in which a study or a finding, both mine and of others, was apparently new to many researchers and surely to its finders; but a later fortuitous litera- ture discovery showed it to have been found or studied — and often much more deeply — many years earlier. One of the studies here presented that I have, at present, not seen done somewhen else is the mathematical characterisation of what I have called the ‘Laplace- Jaynes approach’ to induction and to statistical models; especially the introduction of a set of propositions having particular plausibilistic properties that fit this approach. In what I have called the ‘vector framework’, I have not seen remarked before that the propositions it is based upon need not refer to particular time instants, but can pertain more general situations that extend over time ranges or that are even atemporal. In general those propo- sitions can refer to the usual boundary- and initial-conditions of physical theories. Nor have I seen somewhere else the mechanical algorithm here presented to derive the specific structures of the framework for a given physical theory, given the probabilities specified by that theory, in the finite-dimensional case. Unseen for me was also the derivation through that framework of general state-assignment (and measurement-assignment) techniques as particular problems of induction, and in particular as applications of the Laplace-Jaynes approach. Finally, other passing remarks as those in § 22 and more elaborate ones as those in Papers (B) and (E) I have not previously seen either. Is there something to ‘conclude’ from these studies? I have not drawn any ‘conclu- sions’, apart from the fact that there should be more communication and less feeling of

63 64 V. EPILOGUE self-sufficiency amongst physicists, mathematicians, logicians, and philosophers; that as a rule of thumb we physicists are philosophical asses; that most fundamental claims about quantum mechanics are unwarranted but quantum mechanicians pretend not to notice (fear of seeing their funding cut?); that the majority of scientific activity today is what Trues- dell [710, lecture VI] calls ‘trade science’ and ‘religion science’. Am I too pessimistic? No. Am I exaggerating? Not really; but it is true that I am withholding some positive thoughts on purpose. I do not want to do that, because the message that one gets from todays’ journals, conferences, meetings is ‘All is well in science!’, ‘We’re doing just fine!’, ‘How intelligent and clever we are!’. Someone sometimes has to tell that in fact not everything is well, and some things are very bad indeed. Other more physical or mathematical conclusions can the readers draw for themselves. For me the studies here presented are beginnings for future studies and projects.

Future research directions

The Road goes ever on and on Down from the door where it began. Now far ahead the Road has gone, And I must follow, if I can, Pursuing it with eager feet, Until it joins some larger way Where many paths and errands meet. And whither then? I cannot say.

Bilbo Baggins [702]

Plausibilistic physics

46. The work that has been presented here is particularly suited to study so-called ‘inverse problems’ in physics. The first field that comes to mind is, of course, statistical mechanics, in which we try to get some knowledge about the microscopic description of a given class of phenomena from their macroscopic descriptions, and then the other way round. The foundations of statistical mechanics (and for definiteness I mean ‘classical’ statis- tical mechanics) are today based on the maximum-entropy principle or, more generally, on the maximum-calibre principle [377, 379], thanks to which the theory can consistently be applied to non-equilibrium phenomena. The basic idea of these principles is to assign a plausibility distribution to the preparation circumstances of a system (and ‘preparation’ can here mean more generally a description of a trajectory of the system in phase-space) such that (a) we get expectation values for particular coarse-grained (field) quantities that are equal to the measured values, and (b) the plausibility distribution maximise the Kullback- Leibler divergence in respect of a given ‘reference’ plausibility distribution. §47. 65

This procedure is quite successful and completely self-consistent. In non-dissipative systems in equilibrium the reference distribution is chosen by symmetry or invariance ar- guments (e.g., invariance with respect to the symplectic structure of the phase space, which generally means invariance under the action of the evolution operator). But in some situ- ations, in particular the most interesting ones that concern non-equilibrium phenomena, it is not always clear which reference distribution should be chosen. Examples are provided in the study of granular materials [31, 73, 80–84, 137, 150, 165, 179, 180, 227, 228, 258, 260, 280, 283, 353, 367, 414, 418, 438, 439, 457, 475, 476, 479, 480, 521, 522, 524, 525, 534, 535, 540, 561, 567, 570, 571, 575, 609, 610, 631, 633, 648, 669, 725, 745, 762]. The vector framework discussed in ch. III makes allowance for the application of the maximum-entropy principle to generic systems (in fact it leads to an application in quan- tum mechanics that apparently differs from that based on the von Neumann-entropy prin- ciple, as discussed in § 48 infra). But it also suggests alternative approaches. In particular, it suggests that the problem of assigning a plausibility distribution to the possible ‘micro- scopic’ circumstances in simply one of state assignment, of the kind discussed in ch. IV. As we have seen, in that approach we must assign a prior plausibility distribution over the possible circumstances; this seems to parallel the specification of a reference distribution in the maximum-entropy approach. The ‘state-assignment approach’ to statistical mechan- ics would appear, in a sense, to reflect the historical development of the various models and distributions that have been assigned during the development of statistical mechanics. Consider, simplifying a lot the real developments, that initially a (symplectic-invariant) dis- tribution over the whole phase-space was assigned, but then, comparing its predictions to actual measurement outcomes, it was found that a distribution assigning reduced weights to circumstances corresponding to particle permutations — the famous division by N! — yielded better predictions. This can be seen as a sort of historical ‘update’ of a prior distribution to a new one conditional on the observed results; although this update did not really followed Bayes’ theorem, since it was a ‘meta-theoretical’ update. Something similar is today happening in the study of granular materials [see refs above]. The state- assignment approach to statistical mechanics goes along the same line, but the update is really done according to Bayes’ theorem. Consistency should require that it lead to the same posterior distributions as the maximum-entropy principle. Does it? For which prior distributions? Can the maximum-entropy and -calibre principles perhaps be derived from the state-assignment one?

47. There are very interesting and general results, presented by Murdoch, Pitteri, Robin- son, et al. [541–552, 591, 593, 594, 625, 626], on the derivation of the most general laws for continuum media from microscopic particle models, which generalise previous ap- proaches by e.g. Kirkwood, Irving, et al. [358, 359, 420–425, 467–469] in the previous century (of course, there are much older studies, for which see e.g. Truesdell [713]; worthy of mention are those by Waterston [740]). Murdoch’s and Pitteri’s studies, however, stop short of including thermodynamic phenomena. On the other hand, twenty years ago Jaynes put forward a unifying statistical-mechanical framework [377, 379], a generalisation of the ‘maximum-entropy principle’, which is directly applicable to non-equilibrium phenom- 66 V. EPILOGUE ena. It is basically founded on the consideration of probability distributions on path space instead of phase space. It seems to me that Jaynes’ framework can be used to extend Mur- doch’s and Pitteri’s work to equilibrium and especially non-equilibrium thermodynamic phenomena (note that I mean something more general than the approaches based on ‘local equilibrium’ à la de Groot & Mazur [296]; see § 54 below), and I should like to study and develop this possibility. I should also like to apply Jaynes’ framework to the statistical-mechanical and thermo- dynamic study of granular materials [see refs in § 46], and show that from this framework one can derive all results concerning the various fluctuation-dissipation-related theorems that have been presented in the last few years [27–30, 142, 255, 343, 344, 368, 496, 498, 692]. Finally, I should like to put together some results on ‘H-theorems’, Markov proper- ties, and other insights by Vlad and Mackey [733], Crutchfield and Shalizi [143, 655, 656], Dewar [167–169], De Roeck, Maes, Netocný,ˇ et al. [161, 495, 497, 499], which are scat- tered in very different publications and apparently do not know of each other, and show that they explain many features and assumptions behind Jaynes’ framework. The use of probability logic in these investigations is essential, and the vector frame- work is also particularly suited to these investigations.

48. As already mentioned, when we apply the maximum-Shannon-entropy principle to a quantum system we obtain results that in general differ from those of the maximum- von Neumann-entropy principle. Cf. the discussion in Paper (I), § 4.3. More precisely, the results of the first method depend on the choice of prior distribution. Is there a prior distribution that leads to the same results of the von Neumann-entropy-based approach? Which is it? What is its significance?

49. The application of probabilistic and statistical ideas and frameworks need not be confined to microscopic theories. In 1925 Szilard [685] proposed a probabilistic theory for thermodynamics, as opposed to statistical mechanics; i.e., for a macroscopic theory with no assumptions about microscopic features. (Also Einstein [189] studied a similar approach earlier). His studies were resurrected during short periods by Lewis [465, 466], Mandelbrot [502–505], and Tisza, Manning, and Quay [698–701]. But I have not seen works pursuing these studies today, although Primas [608] and Paladin and Vulpiani [573] mention Szilard’s studies; nor have I seen them generalised to (macroscopic) mechanical theories, if not possibly in Beran’s book [65]. I think that the probabilistic, or statistical, approach to continuum mechanics is in many contexts well worth of consideration. The fact that different microscopic models lead, in some contexts, to the same macroscopic statistical features, means that we can do without those microscopic assumptions alto- gether (just like when in the study of the generic ‘vicinity’ of points we need not bring whole metrics; just a topology can do the job). Note that I am not depreciating the usual statistical-mechanical approach: in many cases we are in fact interested in microscopic features, and that approach allows us to form some plausible conclusions about those mi- croscopic features. What I am referring to are, rather, those situations in which we are not §50. 67 really interested in the microscopic details, but we introduce them only to get statistical macroscopic (or mesoscopic) theoretical consequences to test against experiments. Moreover, theories based on Szilard’s ideas can be useful as phenomenological ‘inter- mediaries’ in the study of the connexion between thermodynamics and statistical mechan- ics; diagrammatically,

thermodynamics ←→ ‘statistical thermodynamics’ ←→ statistical mechanics.

Such an ‘interface’ can be quite and especially useful when the phenomena in question are not just thermodynamical, but electromagneto-thermo-mechanical — and even in non- equilibrium.

Rational continuum mechanics

50. More work must be done concerning the foundations of electromagnetism in ple- no in connexion with non-linear mechanical and thermodynamic phenomena. Although particular ad hoc formulae and data abound, as well as different partial theoretical foun- dations from first principles, especially within rational continuum mechanics [132, 133, 206, 210, 356, 415, 703–705, 719], a general derivation from first, simple principles is still lacking. It is remarkable, e.g., that there still is no unanimous general expression for the stress and energy density for bodies sustaining electromagnetic fields simultaneously with other thermodynamical and mechanical effects. The remarks made in this regard by, e.g., Robinson [625] in his book in the seventies are more or less repeated today by Erick- sen [198–202]. (I wonder how it is possible that we are still in such a situation. Perhaps we have become too wont to ad hoc formulae?) Both Truesdell [710, Lecture IV] and DiCarlo [172] point out that the problem lies in the completely different conceptual and mathematical approaches to the mechanics of bodies and to the electrodynamics in bodies: the introduction of space-time as a relation amongst material points in the first, and as a sort pre-existing, absolute object in the second. As I see it, one should pursue the line of attack discussed by old but still vigorous and ingenious Ericksen [ibid.], based on the idea that the electromagnetic field can sustain forces; but this line could be combined with Noll’s [562–565; cf. 714, ch. 1] beautiful (and apparently not widely known) approach to classical Galilean-relativistic mechanics, in which inertial forces are considered as real forces (exerted by the rest of the masses of the universe — a point of view in line with general relativity) and the basic axiom is that the net force sustained by a material point, including the inertial ones, is always nought. In this approach momentum is just the expression of inertial forces; hence we have that the stress, that from control-volume or microscopic considerations [358, 421–424] is viewed as a combination of contact forces and of momentum transfer, can be again be simply con- sidered as an expression of forces only. With regard to energy density, I think we would follow Serrin’s predicament, motivated by his investigations in thermodynamics [650–653] (see also Owen [572]), to return to work and heat as fundamental quantities. This predica- ment should perhaps be followed also in general relativity: cf. the various studies that try to ‘localize’ the energy of the gravitational field [e.g., 93]. Serrin’s proposal accords also 68 V. EPILOGUE with the fact that in many systems entropy and energy are non-extensive (cf. Gurtin and Williams [305, 308]). Beside such a general approach from first principles, and partly as a point of reference therein, I should be curious to compile a sort of ‘table’ with the interpretations and expres- sions used by specialists for the magnetic field intensity H, the electric displacement D, and the polarisation and magnetisation P and M in diverse electromagnetothermomechani- cal provinces, from piezo-electricity and birefringence to ferromagnetism, pyro-electricity, and hysteresis. Moreover, from the mathematical point of view it would be advantageous to use the differential-form framework advocated by Hehl, Obukhov, et al. [295, 330– 332, 569] (see also [166, 471, 472, 738]), and much earlier even by Maxwell [515], which treats the electromagnetic quantities as differential forms of different degrees. Surely these investigations would also help in constructing a (classical) relativistic the- ory of materials supporting electromagnetothermomechanical effects, which today, as far as I know, is basically in the same status as it was when developed by Bressan [79] amongst others. I see some usefulness also as regards field quantisation in non-linear media (the ap- proaches that I have seen in the literature [e.g., 68, 176, 177, 257, 460, 584, 660] are all more or less ad hoc and apparently try to patch the linear-case approach in a way or another. A fresh new start is needed.)

Quantum theory

51. In ch. III I mentioned and briefly discussed the main open question about quantum theory: why that particular convex structure? We have seen that the vector framework helps in seeing and formulating clearly the question. It cannot, however, provide an answer by itself, since it does not contain physical principles or laws in itself, although it is very useful in displaying the plausibilistic consequences of such principles once they have been specified, and although it enables one to prove that quantum theory can be recovered from a classical theory. The next step in ‘clearing the quantum mysteries’ is to think about and introduce real physical models and theories. Not ‘toy’ theories, of which there are plenty today, but serious, elegant, humanly understandable and ‘picturable’ physical theories. Perhaps we should begin from where the situation was immediately after Planck’s presentation of his radiation law, using, however, the concepts and mathematical apparatus that in the mean- while have been developed in rational continuum electromagnetothermomechanics [197, 203, 206, 207, 213, 214, 216, 217, 306, 307, 511–513, 572, 636, 662, 712, 714, 718, 719]. Some interesting studies have already been pursued by Jaynes [375, 376, 378, 380, 381, 385–387, 389, 390] whose ‘neoclassical’ electromagnetic theory has given many predic- tions identical with those of quantum field theory (e.g., Jaynes [375] derives orbit quanti- sation condition for the hydrogen atom within neo-classical theory), but without the need of invoking quantisation (see also [43, 45, 98, 141, 325–328, 393, 532, 638, 678]). A connected and still unsolved fundamental problem is that of radiation reaction in classical physics [e.g., 98, 252, 523, 629, 630, 638]. The variety of phenomena that classical me- chanics can exhibit — e.g., think of ‘orbit quantisation’, as simply shown by the Rayleigh §52. 69 oscillator — gives much hope in the programme of deriving quantum theory from the full splendour of the classical theories.

52. Another research idea is partially related to Bell-like theorems: I have found some indications that, when we pass from a local deterministic classical field theory to a ‘coarse- grained’ one (be it classical or quantum) apt to describe coarser space-time scales, seeming ‘non-localities’ appear in the latter. This basically comes about from the fact that, when- ever we study a compact region of space-time, we always need to specify non-local bound- ary conditions (e.g., on a time- or space-like cylindrical boundary [92, 93, 114]) for the relevant partial differential equations of the ‘finer’ field theory. This apparent fact has of course important implications for Bell’s ‘hidden-variable’ theorem, which loses much of its importance (cf. [536, 606]).

Teaching

53. All courses in quantum mechanics that I have hitherto attended are still heavily based on the concepts and points of view of the works of Dirac and von Neumann. But the latest advances in quantum mechanics have shown that the theory can be initially pre- sented with more general — and simpler! — concepts and points of views, related to the vector framework discussed in this work. Amongst the advantages of this framework: (1) the basic concepts are common to classical and quantum physics, and therefore are less counterintuitive to the students; (2) the mathematics is, at a first stage, based solely on real vector spaces (Rn), and therefore even younger students may acquire a working knowledge of quantum theory; (3) the formalism used is especially suitable to the study of non-isolated quantum systems. Within this approach, a student may first learn to ‘build’ a state space and a measurement space that suit any particular class of phenomena of interest. The spaces thus constructed may be more or less similar (with a continuum of degrees) to those that characterise classical physics, and the student can freely explore the differences. Then the student learns that, amongst these spaces, there are some — the quantum ones — that describe well many classes of microscopic phenomena. In this way the student has met quantum physics without making conceptual and mathematical ‘jumps’ from classi- cal physics; and is able, moreover, to identify which features of the quantum theory are general features of any theory, which are generally non-classical, and which are peculiarly quantum.

54. Also most thermodynamic courses are based on formalisms and points of view that are a century old. The progress in principles [see e.g. 197, 572, 636, 662, 712] and in applications (Atkin and Craine [25, 26] review particular examples) that started around the sixties is virtually ignored. Today thermodynamics can be taught in a way completely parallel to classical mechanics: (1) it is a dynamical theory, with quantities — including temperature and entropy — that depend explicitly on time (and are, in general, field quan- tities); (2) it is governed by few (two, in the simplest cases) basic axioms, comparable to 70 V. EPILOGUE

Newton’s axioms in classical mechanics; (3) just like in classical mechanics, the specifi- cation of particular constitutive equations and of initial and boundary conditions leads to (differential) equations of motions with well-defined solutions. A student can thus study, e.g., not only the equation of state of an ideal gas, but also the behaviour of the thermal quantities of a non-ideal gas in rapid or free expansion. Bibliography

Note: arxiv eprints are located at http://arxiv.org/; mp_arc eprints are located at http://www.ma.utexas.edu/mp_arc/.

[1] Adams, E. W., 1975, The Logic of Conditionals: An Application of Probability to Deductive Logic (D. Reidel Publishing Company, Dordrecht). [2] Adams, E. W., 1998, A Primer of Probability Logic (CSLI Publications, Stanford). [3] Addison, J. W., L. Henkin, and A. Tarski (eds.), 1965, The Theory of Models: Proceedings of the 1963 International Symposium at Berkeley (North-Holland Publishing Company, Amster- dam). [4] Aerts, D., 1982, Description of many physical entities without the paradoxes encountered in quantum mechanics, Found. Phys. 12(12), 1131–1170, http://www.vub.ac.be/CLEA/ aerts/publications/1982.. [5] Aerts, D., 1993, Quantum structures due to fluctuations of the measurement situation, Int. J. Theor. Phys. 32(12), 2207–2220. [6] Aerts, D., and S. Aerts, 1997, The hidden-measurement formalism: quantum mechanics as a consequence of fluctuations on the measurement, in Ferrero and van der Merwe [229], pp. 1–6. [7] Aerts, D., S. Aerts, B. Coecke, B. D’Hooghe, T. Durt, and F. Valckenborgh, 1997, A model with varying fluctuations in the measurement context, in Ferrero and van der Merwe [229], pp. 7–9. [8] Aerts, D., and L. Gabora, 2005, A theory of concepts and their combinations I: The struc- ture of the sets of contexts and properties, Kybernetes 34(1/2), 167–191, eprint arXiv: quant-ph/0402207; see also [9]. [9] Aerts, D., and L. Gabora, 2005, A theory of concepts and their combinations II: A Hilbert space representation, Kybernetes 34(1/2), 192–221, eprint arXiv:quant-ph/0402205; see also [8]. [10] Aerts, S., 2006, Quantum and classical probability as Bayes-optimal observation, eprint arXiv:quant-ph/0601138. [11] Aerts, S., D. Aerts, and F. E. Schroeck, 2005, Necessity of combining mutually incompatible perspectives in the construction of a global view: Quantum probability and signal analysis, in Worldviews, Science and Us: Redemarcating Knowledge and Its Social and Ethical Impli- cations, edited by D. Aerts, B. D’Hooghe, and N. Note (World Scientific, Singapore), p. 203, eprint arXiv:quant-ph/0503112. [12] Akhiezer, N. I., and I. M. Glazman, 1961–1963/1993, Theory of Linear Operators in Hilbert Space (Dover Publications, New York), ‘Two volumes bound as one’; transl. by Merlynd Nestell; first publ. in Russian 1961–1963.

71 72 BIBLIOGRAPHY

[13] Alberti, P. M., and A. Uhlmann, 1982, Stochasticity and Partial Order: Doubly Stochastic Maps and Unitary Mixing (VEB Deutscher Verlag der Wissenschaften/D. Reidel Publishing Company, Berlin/Dordrecht). [14] Alfsen, E. M., 1971, Compact Convex Sets and Boundary Integrals (Springer-Verlag, Berlin). [15] Ali, S. T., and G. G. Emch, 1974, Fuzzy observables in quantum mechanics, J. Math. Phys. 15(2), 176–182, pOVMs. [16] Ali, S. T., and E. Prugovecki,ˇ 1977, Classical and quantum statistical mechanics in a common Liouville space, Physica A 89A(3), 501–521. [17] Ali, S. T., and E. Prugovecki,ˇ 1977, Systems of imprimitivity and representations of quantum mechanics on fuzzy phase spaces, J. Math. Phys. 18(2), 219–228. [18] Alicki, R., 1995, Comment on “Reduced dynamics need not be completely positive”, Phys. Rev. Lett. 75(16), 3020, see [579] and also [580]. [19] Andrey, L., 1985, The rate of entropy change in non-Hamiltonian systems, Phys. Lett. A 111(1–2), 45–46, see also [20, 282]. [20] Andrey, L., 1986, Note concerning the paper “The rate of entropy change in non-Hamiltonian systems”, Phys. Lett. A 114(4), 183–184, see [19] and also [282]. [21] Appleby, D. M., 2005, Facts, values and quanta, Found. Phys. 35(4), 627–668, eprint arXiv: quant-ph/0402015. [22] Arthurs, E., and J. L. Kelly, Jr., 1965, On the simultaneous measurement of a pair of conjugate observables, Bell Syst. Tech. J. 44(4), 725–729. [23] Aspect, A., J. Dalibard, and G. Roger, 1982, Experimental test of Bell’s inequalities using time- varying analyzers, Phys. Rev. Lett. 49(25), 1804–1807. [24] Aspect, A., P. Grangier, and G. Roger, 1982, Experimental realisation of Einstein-Podolsky- Rosen-Bohm gedankenexperiment: a new violation of Bell’s inequalities, Phys. Rev. Lett. 49(2), 91–94. [25] Atkin, R. J., and R. E. Craine, 1976, Continuum theories of mixtures: applications, J. Inst. Maths. Applics 17(2), 153–207, see also [26]. [26] Atkin, R. J., and R. E. Craine, 1976, Continuum theories of mixtures: basic theory and histor- ical development, Q. J. Mech. Appl. Math. 29(2), 209–244, see also [25]. [27] Baker-Jarvis, J., 1993, Dielectric and magnetic relaxation by a maximum-entropy method, Smart Mater. Struct. 2(2), 113–123. [28] Baker-Jarvis, J., 2005, Time-dependent entropy evolution in microscopic and macroscopic electromagnetic relaxation, Phys. Rev. E 72(6), 066613. [29] Baker-Jarvis, J., and P. Kabos, 2001, Dynamic constitutive relations for polarization and mag- netization, Phys. Rev. E 64(5), 056127. [30] Baker-Jarvis, J., P. Kabos, and C. L. Holloway, 2004, Nonequilibrium electromagnetics: Local and macroscopic fields and constitutive relationships, Phys. Rev. E 70(3), 036615. [31] Baldassarri, A., A. Barrat, G. D’Anna, V. Loreto, P. Mayor, and A. Puglisi, 2005, What is the temperature of a granular medium?, J. Phys.: Condens. Matter 17(24), S2405–S2428, eprint arXiv:cond-mat/0501488. [32] Balescu, R., 1975, Equilibrium and Nonequilibrium Statistical Mechanics (John Wiley & Sons, New York). [33] Balian, R., 1989, Justification of the maximum entropy criterion in quantum mechanics, in Skilling [663], pp. 123–129. [34] Balian, R., Y. Alhassid, and H. Reinhardt, 1986, Dissipation in many-body systems: a geo- metric approach based on information theory, Phys. Rep. 131(1/2), 1–146. [35] Balian, R., and N. L. Balazs, 1987, Equiprobability, inference, and entropy in quantum theory, 73

Ann. of Phys. 179(1), 97–144. [36] Balzer, W., D. A. Pearce, and H.-J. Schmidt (eds.), 1984, Reduction in Science: Structure, Examples, Philosophical Problems (D. Reidel Publishing Company, Dordrecht), papers pre- sented at the colloquium ‘Reduktion in der Wissenschaft: Struktur, Beispiele, philosophische Probleme’, held in Bielefeld, West Germany, July 18–21, 1983. [37] Barnett, S. M., D. T. Pegg, and J. Jeffers, 2000, Bayes’ theorem and quantum retrodiction, J. Mod. Opt. 47(11), 1779–1789, eprint arXiv:quant-ph/0106139. [38] Barnett, S. M., D. T. Pegg, J. Jeffers, and O. Jedrkiewicz, 2000, Atomic retrodiction, J. Phys. B 33(16), 3047–3065, eprint arXiv:quant-ph/0107019. [39] Barnum, H., 2003, Quantum information processing, operational quantum logic, convexity, and the foundations of physics, Stud. Hist. Philos. Mod. Phys. 34(3), 343–379, eprint arXiv: quant-ph/0304159. [40] Barnum, H., J. Barrett, M. Leifer, and A. Wilce, 2006, Cloning and broadcasting in generic probabilistic theories, eprint arXiv:quant-ph/0611295. [41] Barrett, J., 2005, Information processing in generalized probabilistic theories, eprint arXiv: quant-ph/0508211. [42] Barrett, J., and A. Kent, 2004, Non-contextuality, finite precision measurement and the Kochen-Specker theorem, Stud. Hist. Philos. Mod. Phys. 35(2), 151–176, eprint arXiv: quant-ph/0309017. [43] Barut, A. O., 1986, Quantum electrodynamics based on self-energy versus quantization of fields: Illustration by a simple model, Phys. Rev. A 34(4), 3502–3503. [44] Barut, A. O., and A. J. Bracken, 1981, Zitterbewegung and the internal geometry of the elec- tron, Phys. Rev. D 23(10), 2454–2463. [45] Barut, A. O., and J. P. Dowling, 1987, Quantum electrodynamics based on self-energy, without second quantization: The Lamb shift and long-range Casimir-Polder van der Waals forces near boundaries, Phys. Rev. A 36(6), 2550–2556. [46] Barwise, J., 1977, An introduction to first-order logic, in Barwise [47], chapter A.1, pp. 5–46. [47] Barwise, J. (ed.), 1977/1999, Handbook of Mathematical Logic (Elsevier, Amsterdam), first publ. 1977. [48] Barwise, J., 1985, The Situation in Logic — II: Conditionals and Conditional Information, Technical Report CSLI-85-21, Center for the Study of Language and Information, Stanford. [49] Barwise, J., 1989, The Situation in Logic (CSLI, Stanford). [50] Barwise, J., and J. Etchemendy, 1987, The Liar: An Essay on Truth and Circularity (Oxford University Press, New York). [51] Barwise, J., and J. Etchemendy, 1990/1993, The Language of First-Order Logic (CSLI Publi- cations, Stanford), third, revised and expanded edition, first publ. 1990. [52] Barwise, J., and J. Etchemendy, 1999/2003, Language, Proof and Logic (CSLI Publications, Stanford), written in collaboration with Gerard Allwein, Dave Barker-Plummer, Albert Liu; first publ. 1999. [53] Bayes, T., 1763, An essay towards solving a problem in the doctrine of chances, Phil. Trans. R. Soc. Lond. 53, 370–418, http://www.stat.ucla.edu/history/essay.pdf; with an introduction by Richard Price. [54] Bell, J. S., 1964, On the Einstein Podolsky Rosen paradox, Physics 1(3), 195–200, http: //puhep1.princeton.edu/~mcdonald/examples/EM/; repr. in [55], pp. 14–21. [55] Bell, J. S., 1987, Speakable and Unspeakable in Quantum Mechanics (Cambridge University Press, Cambridge). [56] Beltrametti, E. G., and S. Bugajski, 1993, Decomposability of mixed states into pure states 74 BIBLIOGRAPHY

and related properties, Int. J. Theor. Phys. 32(12), 2235–2244. [57] Beltrametti, E. G., and S. Bugajski, 1995, A classical extension of quantum mechanics, J. Phys. A 28(12), 3329–3343. [58] Bengtsson, I., 2004, The importance of being unistochastic, eprint arXiv: quant-ph/0403088. [59] Bengtsson, I., 2004, MUBs, polytopes, and finite geometries, eprint arXiv: quant-ph/0406174. [60] Bengtsson, I., 2005, Geometrical statistics — classical and quantum, eprint arXiv: quant-ph/0509017. [61] Bengtsson, I., and A. Ericsson, 2003, How to mix a density matrix, Phys. Rev. A 67, 012107, eprint arXiv:quant-ph/0206169. [62] Bengtsson, I., and A. Ericsson, 2004, Mutually unbiased bases and the complementarity poly- tope, eprint arXiv:quant-ph/0410120. [63] Bengtsson, I., A. Ericsson, W. T. Marek Kus,´ and K. Zyczkowski,˙ 2004, Birkhoff’s polytope and unistochastic matrices, N = 3 and N = 4, Commun. Math. Phys. 259(2), 307–324, eprint arXiv:math.CO/0402325. [64] Bennett, C. H., 2003, Notes on Landauer’s principle, reversible computation, and Maxwell’s demon, Stud. Hist. Philos. Mod. Phys. 34(3), 501–510, eprint arXiv:physics/0210005. [65] Beran, M. J., 1968, Statistical Continuum Theories (John Wiley and Sons, New York). [66] Bernardo, J.-M., and A. F. Smith, 1994, Bayesian Theory (John Wiley & Sons, Chichester). [67] Besag, J., P. J. Bickel, H. Brøns, D. A. S. Fraser, N. Reid, I. S. Helland, P. J. Huber, R. Kalman, S. Pincus, T. Tjur, and P. McCullagh, 2002, What is a statistical model?: Dis- cussion and Rejoinder, Ann. Stat. 30(5), 1267–1310, http://www.stat.uchicago.edu/ ~pmcc/publications.html; see McCullagh [516]. [68] Bhat, N. A. R., and J. E. Sipe, 2006, Hamiltonian treatment of the electromagnetic field in dispersive and absorptive structured media, Phys. Rev. A 73(6), 063808. [69] Birkhoff, G., and J. von Neumann, 1936, The logic of quantum mechanics, Ann. Math. 37(4), 823–843. [70] Blackwell, D., 1953, Equivalent comparisons of experiments, Ann. Math. Stat. 24(2), 265– 272. 1 [71] Bloore, F. J., 1976, Geometrical description of the convex sets of states for systems with spin- 2 and spin-1, J. Phys. A 9(12), 2059–2067. [72] Bodiou, G., 1957, Probabilité sur un treillis non modulaire : (Calcul quantique des probabi- lités), Publ. Inst. Stat. Univ. Paris (Ann. Inst. Stat. Univ. Paris) 6, 11–25. [73] Bonetto, F., G. Gallavotti, A. Giuliani, and F. Zamponi, 2006, Fluctuations relation and exter- nal thermostats: an application to granular materials, J. Stat. Mech. , P05009eprint arXiv: cond-mat/0601683. [74] Bowen, R. M., 1989/2004, Introduction to Continuum Mechanics for Engineers (http:// www1.mengr.tamu.edu/rbowen/), revised edition, first publ. 1989. [75] Boyer, T. H., 1970, Sharpening Bridgman’s resolution of the Gibbs paradox, Am. J. Phys. 38(6), 771–773. [76] Boyer, T. H., 2000, Classical electromagnetism and the Aharonov-Bohm phase shift, Found. Phys. 30(6), 907–932, electromagnetism, quantum theory, quantum field theory. [77] Boyer, T. H., 2000, Does the Aharonov-Bohm effect exist?, Found. Phys. 30(6), 893–905, electromagnetism, quantum theory, quantum field theory. [78] Brenner, L. A., D. J. Koehler, V. Liberman, and A. Tversky, 1996, Overconfidence in proba- bility and frequency judgments: A critical examination, Organ. Behav. Hum. Decis. Process. 75

65(3), 212–219. [79] Bressan, A., 1978, Relativistic Theories of Materials (Springer-Verlag, Berlin). [80] Brey, J. J., A. Domínguez, M. I. García de Soria, and P. Maynar, 2006, Mesoscopic theory of critical fluctuations in isolated granular gases, Phys. Rev. Lett. 96(15), 158002. [81] Brey, J. J., and Ruiz-Montero, 2004, Heat flux and upper boundary condition in an open fluidized granular gas, Europhys. Lett. 66(6), 805–811. [82] Brey, J. J., M. J. Ruiz-Montero, and F. Moreno, 2001, Hydrodynamics of an open vibrated granular system, Phys. Rev. E 63(6), 061305. [83] Brey, J. J., M. J. Ruiz-Montero, and F. Moreno, 2005, Energy partition and segregation for an intruder in a vibrated granular system under gravity, Phys. Rev. Lett. 95(9), 098001, eprint arXiv:cond-mat/0507206. [84] Brey, J. J., M. I. García de Soria, P. Maynar, and M. J. Ruiz-Montero, 2004, Energy fluctua- tions in the homogeneous cooling state of granular gases, Phys. Rev. E 70(1), 011302. [85] Bridgman, P. W., 1941/1943, The Nature of Thermodynamics (Harvard University Press, Cam- bridge, USA), first publ. 1941. [86] Brillouin, L., 1950, Can the rectifier become a thermodynamical demon?, Phys. Rev. 78(5), 627–628. [87] Brillouin, L., 1951, Maxwell’s demon cannot operate: Information and entropy. I, J. Appl. Phys. 22(3), 334–337, see also [88]. [88] Brillouin, L., 1951, Physical entropy and information. II, J. Appl. Phys. 22(3), 338–343, see also [87]. [89] Brøndsted, A., 1983, An Introduction to Convex Polytopes (Springer-Verlag, Berlin). [90] Brouwer, L. E. J., 1946/1981, Brouwer’s Cambridge lectures on intuitionism (Cambridge Uni- versity Press, Cambridge, USA), ed. by D. Van Dalen; lectures given 1946–1951. [91] Brovetto, P., V. Maxia, and M. Salis, 2005, Some remarks about mass and Zitterbewegung substructure of electron, eprint arXiv:quant-ph/0512047. [92] Brown, J. D., S. R. Lau, and J. W. York, 2002, Action and energy of the gravitational field, Ann. of Phys. 297(2), 175–218, eprint arXiv:gr-qc/0010024. [93] Brown, J. D., and J. W. York, Jr., 1993, Quasilocal energy and conserved charges derived from the gravitational action, Phys. Rev. D 47(4), 1407–1419, eprint arXiv:gr-qc/9209012. [94] Brukner, C.,ˇ and A. Zeilinger, 2001, Conceptual inadequacy of the Shannon information in quantum measurements, Phys. Rev. A 63, 022113, eprint arXiv:quant-ph/0006087. [95] Bub, J., 2001, Maxwell’s demon and the thermodynamics of computation, Stud. Hist. Philos. Mod. Phys. 32, 569–579, eprint arXiv:quant-ph/0203017. [96] Bugajski, S., S. P. Gudder, and S. Pulmannová, 2000, Convex effect algebras, state ordered effect algebras, and ordered linear spaces, Rep. Math. Phys. 45(3), 371–388. [97] Bugajski, S., and P. J. Lahti, 1980, Fundamental principles of quantum theory, Int. J. Theor. Phys. 19(7), 499–514, see also [448]. [98] Bullough, R. K., P. J. Caudrey, and A. S. F. Obada, 1974, Theory of radiation reaction and atom self-energies: all-order perturbation theory of the generalized non-relativistic Lamb shift, J. Phys. A 7(13), 1647–1663, see also [638]. [99] Bunge, M., 1961, Laws of physical laws, Am. J. Phys. 29(8), 518–529. [100] Bunge, M., 1963, A general black box theory, Philos. Sci. 30(4), 346–358. [101] Bunge, M. (ed.), 1967, Delaware Seminar in the Foundations of Physics (Springer-Verlag, Berlin). [102] Bunge, M., 1967, Foundations of Physics (Springer-Verlag, Berlin). [103] Bunge, M., 1967, Physical axiomatics, Rev. Mod. Phys. 39(2), 463–474. 76 BIBLIOGRAPHY

[104] Bunge, M., 1967, The structure and content of a physical theory, in Bunge [101], pp. 15–27. [105] Busch, P., 1985, Indeterminacy relations and simultaneous measurements in quantum theory, Int. J. Theor. Phys. 24(1), 63–92. [106] Busch, P., 1986, Unsharp reality and joint measurements for spin observables, Phys. Rev. D 33(8), 2253–2261. [107] Busch, P., 1987, Some realizable joint measurements of complementary observables, Found. Phys. 17(9), 905–937. [108] Busch, P., 1988, Surprising features of unsharp quantum measurements, Phys. Lett. A 130(6– 7), 323–329. [109] Busch, P., 1991, Informationally complete sets of physical quantities, Int. J. Theor. Phys. 30(9), 1217–1227. [110] Busch, P., M. Grabowski, and P. J. Lahti, 1989, Some remarks on effects, operations, and unsharp measurements, Found. Phys. Lett. 2(4), 331–345. [111] Busch, P., and P. J. Lahti, 1984, On various joint measurements of position and momentum observables in quantum theory, Phys. Rev. D 29(8), 1634–1646. [112] Busch, P., and P. J. Lahti, 1985, A note on quantum theory, complementarity, and uncertainty, Philos. Sci. 52(1), 64–77. [113] Busch, P., and F. E. Schroeck Jr., 1989, On the reality of spin and helicity, Found. Phys. 19(7), 807–872. [114] Cadoni, M., and P. G. L. Mana, 2000, Hamiltonians for a general dilaton gravity theory on a spacetime with a non-orthogonal, timelike or spacelike outer boundary, Class. Quantum Grav. 18(5), 779–792, eprint arXiv:gr-qc/0011010. [115] Carteret, H., D. R. Terno, and K. Zyczkowski,˙ 2005, Physical accessibility of non-completely positive maps, eprint arXiv:quant-ph/0512167. [116] Cartwright, N., 1974, Correlations without joint distributions in quantum mechanics, Found. Phys. 4(1), 127–136. [117] Cartwright, N., 1978, The only real probabilities in quantum mechanics, PSA: Proc. Bienn. Meet. Philos. Sci. Assoc. 1978-1, 54–59. [118] Cartwright, N., and M. Jones, 1991, How to hunt quantum causes, Erkenntnis 35(1–3), 205– 231. [119] Caves, C. M., 1990, Quantitative limits on the ability of a Maxwell demon to extract work from heat, Phys. Rev. Lett. 64(18), 2111, see also [123]. [120] Caves, C. M., 2000, Learning and the de Finetti representation, http://info.phys.unm. edu/~caves/reports/reports.html. [121] Caves, C. M., C. A. Fuchs, and R. Schack, 2002, Quantum probabilities as Bayesian proba- bilities, Phys. Rev. A 65, 022305, eprint arXiv:quant-ph/0106133. [122] Caves, C. M., C. A. Fuchs, and R. Schack, 2002, Unknown quantum states: the quantum de Finetti representation, J. Math. Phys. 43(9), 4537–4559, eprint arXiv: quant-ph/0104088. [123] Caves, C. M., W. G. Unruh, and W. H. Zurek, 1990, Comment on “Quantitative limits on the ability of a Maxwell demon to extract work from heat”, Phys. Rev. Lett. 65(11), 1387, see [119]. [124] Challinor, A. D., A. N. Lasenby, S. F. Gull, and C. J. L. Doran, 1996, A relativistic, causal account of a spin measurement, Phys. Lett. A 218(3–6), 128–138, http://www.mrao.cam. ac.uk/~clifford/publications/index.html. [125] Choquet-Bruhat, Y., C. De Witt-Morette, and M. Dillard-Bleick, 1977/1996, Analysis, Mani- folds and Physics. Part I: Basics (Elsevier, Amsterdam), revised edition, first publ. 1977. 77

[126] Clauser, J. F., M. A. Horne, A. Shimony, and R. A. Holt, 1969, Proposed experiment to test local hidden-variable theories, Phys. Rev. Lett. 23(15), 880–884. [127] Clauser, J. F., and A. Shimony, 1978, Bell’s theorem: experimental tests and implications, Rep. Prog. Phys. 41(12), 1881–1927. [128] Coecke, B., and K. Martin, 2003, Partiality in physics, eprint arXiv:quant-ph/0312044. [129] Coecke, B., D. J. Moore, and A. Wilce, 2000, Operational quantum logic: An overview, in Current Research in Operational Quantum Logic: Algebras, Categories, Languages, edited by B. Coecke, D. J. Moore, and A. Wilce (Kluwer Academic Publishers, Dordrecht), pp. 1–36, eprint arXiv:quant-ph/0008019. [130] Cohen, O., 1999, Counterfactual entanglement and nonlocal correlations in separable states, Phys. Rev. A 60(1), 80–84, eprint arXiv:quant-ph/9907109, see also [131, 693]. [131] Cohen, O., 2001, Reply to “Comment on ‘Counterfactual entanglement and nonlocal correla- tions in separable states’ ”, Phys. Rev. A 63(1), 016102, eprint arXiv:quant-ph/0008080, see [130, 693]. [132] Coleman, B. D., and E. H. Dill, 1971, On the thermodynamics of electromagnetic fields in materials with memory, Arch. Rational Mech. Anal. 41(2), 132–162. [133] Coleman, B. D., E. H. Dill, and R. A. Toupin, 1970, A phenomenological theory of streaming birefringence, Arch. Rational Mech. Anal. 39(5), 358–399. [134] Coleman, B. D., M. Feinberg, and J. Serrin (eds.), 1987, Analysis and Thermomechanics: A Collection of Papers Dedicated to W. Noll (Springer-Verlag, Berlin). [135] Copi, I. M., 1954/1979, Symbolic Logic (Macmillan Publishing, New York), fifth edition, first publ. 1954. [136] Copi, I. M., and C. Cohen, 1953/1990, Introduction to Logic (MacMillan Publishing Com- pany, New York), eighth edition, first publ. 1953. [137] Corwin, E. I., H. M. Jaeger, and S. R. Nagel, 2005, Structural signature of jamming in granu- lar media, Nature 435(7096), 1075–1078. [138] Cox, R. T., 1946, Probability, frequency, and reasonable expectation, Am. J. Phys. 14(1), 1–13. [139] Cox, R. T., 1961, The Algebra of Probable Inference (The Johns Hopkins Press, Baltimore). [140] Crisp, M. D., and E. T. Jaynes, 1969, Erratum: Radiative effects in semiclassical theory, Phys. Rev. 185(5), 2046, see [141]. [141] Crisp, M. D., and E. T. Jaynes, 1969, Radiative effects in semiclassical theory, Phys. Rev. 179(5), 1253–1261, http://bayes.wustl.edu/etj/node1.html, see also erratum [140]. [142] Crooks, G. E., 1999, Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences, Phys. Rev. E 60(3), 2721–2726, eprint arXiv: cond-mat/9901352. [143] Crutchfield, J. P., and C. R. Shalizi, 1999, Thermodynamic depth of causal states: Objec- tive complexity via minimal representations, Phys. Rev. E 59(1), 275–283, eprint arXiv: cond-mat/9808147. [144] Curien, P.-L., 2001, Preface to Locus Solum, Math. Struct. in Comp. Science 11(3), 299–300, see also [274]. [145] Dahl, G., 1999, Matrix majorization, Lin. Alg. Appl. 288, 53–73, http://heim.ifi.uio. no/~ftp/publications/research-reports/. [146] Dähn, G., 1968, Attempt of an axiomatic foundation of quantum mechanics and more general theories. IV, Commun. Math. Phys. 9(3), 192–211. [147] Dähn, G., 1972, The algebra generated by physical filters, Commun. Math. Phys. 28(2), 109– 122. 78 BIBLIOGRAPHY

[148] Dähn, G., 1972, Symmetry of the physical probability function implies modularity of the lattice of decision effects, Commun. Math. Phys. 28(2), 123–132. [149] van Dalen, D., 1991/1994, Logic and Structure (Springer-Verlag, Berlin), third, augmented edition, first publ. 1991. [150] Dalton, F., F. Farrelly, A. Petri, L. Pietronero, L. Pitolli, and G. Pontuale, 2005, Shear stress fluctuations in the granular liquid and solid phases, Phys. Rev. Lett. 95(13), 138001, eprint arXiv:cond-mat/0507600. [151] Dantzig, G. B., and M. N.-D. Thapa, 1997, Linear Programming. 1: Introduction (Springer- Verlag, New York). [152] Dantzig, G. B., and M. N.-D. Thapa, 2003, Linear Programming. 2: Theory and Extensions (Springer-Verlag, New York). [153] Davies, E. B., 1969, Quantum stochastic processes, Commun. Math. Phys. 15(4), 277–304. [154] Davies, E. B., 1970, Quantum stochastic processes II, Commun. Math. Phys. 19(2), 83–105. [155] Davies, E. B., 1971, Quantum stochastic processes III, Commun. Math. Phys. 22(1), 51–70. [156] Davies, E. B., 1976, Quantum Theory of Open Systems (Academic Press, London). [157] Davies, E. B., and J. T. Lewis, 1970, An operational approach to quantum probability, Com- mun. Math. Phys. 17(3), 239–260. [158] Day, W. A., 1972, The Thermodynamics of Simple Materials with Fading Memory (Springer- Verlag, Berlin). [159] Day, W. A., 1985, Heat Conduction Within Linear Thermoelasticity (Springer-Verlag, New York). [160] De Morgan, A., 1847/1926, Formal Logic (The Open Court Company, London), first publ. 1847. [161] De Roeck, W., C. Maes, and K. Netocný,ˇ 2006, H-theorems from macroscopic au- tonomous equations, J. Stat. Phys. 123(3), 571–584, eprint arXiv:cond-mat/0508089, eprint mp_arc:05-263. [162] Delcroix, A., M. F. Hasler, S. Pilipovic,´ and V. Valmorin, 2002, Algebras of generalized func- tions through sequence spaces algebras. Functoriality and associations, Int. J. Math. Sci. 1, 13–31, eprint arXiv:math.FA/0210249. [163] Delcroix, A., M. F. Hasler, S. Pilipovic,´ and V. Valmorin, 2004, Generalized function alge- bras as sequence space algebras, Proc. Am. Math. Soc. 132(7), 2031–2038, eprint arXiv: math.FA/0206039. [164] Demidov, A. S., 2001, Generalized Functions in Mathematical Physics: Main Ideas and Con- cepts (Nova Science Publishers, Huntington, USA), with an addition by Yu. V. Egorov. [165] Depken, M., and R. Stinchcombe, 2004, Fluctuation-dissipation relation and the Edwards entropy for a glassy granular compaction model, eprint arXiv:cond-mat/0404111. [166] Deschamps, G. A., 1981, Electromagnetics and differential forms, Proc. IEEE 69(6), 676–696. [167] Dewar, R. C., 2003, Information theory explanation of the fluctuation theorem, maximum entropy production and self-organized criticality in non-equilibrium stationary states, J. Phys. A 36(3), 631–641, eprint arXiv:cond-mat/0005382. [168] Dewar, R. C., 2005, Maximum entropy production and non-equilibrium statistical mechanics, in Kleidon and Lorenz [433], pp. 41–55. [169] Dewar, R. C., 2005, Maximum entropy production and the fluctuation theorem, J. Phys. A 38(21), L371–L381. [170] Diaconis, P., 1977, Finite forms of de Finetti’s theorem on exchangeability, Synthese 36(2), 271–281. [171] Diaconis, P., and D. Freedman, 1980, Finite exchangeable sequences, Ann. Prob. 8(4), 745– 79

764. [172] DiCarlo, A., 2004, G. Lamé vs. J. C. Maxwell: How to reconcile them?, in Schilders et al. [640], pp. 1–13. [173] Dieks, D., and V. van Dijk, 1988, Another look at the quantum mechanical entropy of mixing, Am. J. Phys. 56(5), 430–434. [174] Dougherty, J. P., 1989, Approaches to non-equilibrium statistical mechanics, in Skilling [663], pp. 131–136. [175] Dougherty, J. P., 1994, Foundations of non-equilibrium statistical mechanics, Phil. Trans. R. Soc. Lond. A 346(1680), 259–305. [176] Drummond, P. D., 1990, Electromagnetic quantization in dispersive inhomogeneous nonlin- ear dielectrics, Phys. Rev. A 42(11), 6845–6857. [177] Drummond, P. D., and M. Hillery, 1999, Quantum theory of dispersive electromagnetic modes, Phys. Rev. A 59(1), 691–707, eprint arXiv:quant-ph/9806045. [178] Dubois, D., and H. Prade, 2001, Possibility theory, probability theory and multiple-valued logics: A clarification, Ann. Math. Artif. Intell. 32(1–4), 35–66. [179] Dufty, J. W., and A. Baskaran, 2005, Hard sphere dynamics for normal and granular fluids, eprint arXiv:cond-mat/0503180. [180] Dufty, J. W., and J. J. Brey, 2004, Origins of hydrodynamics for a granular gas, eprint arXiv: cond-mat/0410133. [181] Duhem, P., 1906/1914, La Théorie Physique : son objet — sa structure (Marcel Rivière, Éditeur, Paris), second edition, http://virtualbooks.terra.com.br/freebook/fran/ la_theorie_physique.htm; first publ. 1906. [182] Earman, J., and J. D. Norton, 1998, Exorcist XIV: The wrath of Maxwell’s demon. Part I. From Maxwell to Szilard, Stud. Hist. Philos. Mod. Phys. 29(4), 435–471, see also [183]. [183] Earman, J., and J. D. Norton, 1999, Exorcist XIV: The wrath of Maxwell’s demon. Part II. From Szilard to Landauer and beyond, Stud. Hist. Philos. Mod. Phys. 30(1), 1–40, see also [182]. [184] Eckmann, J.-P., C.-A. Pillet, and L. Rey-Bellet, 1999, Non-equilibrium statistical mechanics of anharmonic chains coupled to two heat baths at different temperatures, Commun. Math. Phys. 201(3), 657–697, eprint arXiv:chao-dyn/9804001. [185] Egorov, Y. V., 1990, A contribution to the theory of generalized functions, Russ. Math. Surveys (Uspekhi Mat. Nauk) 45(5), 1–49. [186] Egorov, Y. V., 1990, Generalized functions and their applications, in Exner and Neidhardt [221], pp. 347–354. [187] Ehrenfest, P., 1959, Collected Scientific Papers (North-Holland Publishing Company, Ams- terdam), ed. by Martin J. Klein. [188] Ehrenfest, P., and V. Trkal, 1920, Deduction of the dissociation-equilibrium from the theory of quanta and a calculation of the chemical constant based on this, Proc. Acad. Sci. Amsterdam (Proc. of the Section of Sciences Koninklijke Nederlandse Akademie van Wetenschappen) 23, 162–183, repr. in [187], pp. 414–435. [189] Einstein, A., 1910, Theorie der Opaleszenz von homogenen Flüssigkeiten und Flüssigkeitsge- mischen in der Nähe des kritischen Zustandes, Ann. d. Phys. 33, 1275–1298, repr. in [192], transl. in [191]. [190] Einstein, A., 1993, The Collected Papers of Albert Einstein: Vol. 3: The Swiss Years: Writings, 1909–1911. (English translation) (Princeton University Press, Princeton), transl. by Anna Beck and Peter Havas. [191] Einstein, A., 1993, The theory of the opalescence of homogeneous fluids and liquid mixtures 80 BIBLIOGRAPHY

near the critical state, in Einstein [190], pp. 231–249. [192] Einstein, A., 2005, Theorie der Opaleszenz von homogenen Flüssigkeiten und Flüssigkeitsge- mischen in der Nähe des kritischen Zustandes, Ann. d. Phys. 14(S1), 368–391. [193] Einstein, A., B. Podolsky, and N. Rosen, 1935, Can quantum-mechanical description of phys- ical reality be considered complete?, Phys. Rev. 47, 777–780. [194] Ekstein, H., 1967, Presymmetry, Phys. Rev. 153(5), 1397–1402, see also [195]. [195] Ekstein, H., 1969, Presymmetry. II, Phys. Rev. 184(5), 1315–1337, see also [194]. [196] Emch, G. G., and G. L. Sewell, 1968, Nonequilibrium statistical mechanics of open systems, J. Math. Phys. 9(6), 946–958. [197] Ericksen, J. L., 1991/1998, Introduction to the Thermodynamics of Solids (Springer, New York), revised edition, first publ. 1991. [198] Ericksen, J. L., 2002, Electromagnetic fields in thermoelastic materials, Math. Mech. Solids 7(2), 165–189. [199] Ericksen, J. L., 2006, Electromagnetism in steadily rotating matter, Continuum Mech. Ther- modyn. 17(5), 361–371. [200] Ericksen, J. L., 2006, A modified theory of magnetic effects in elastic materials, Math. Mech. Solids 11(1), 23–47. [201] Ericksen, J. L., 2007, On formulating and assessing continuum theories of electromagnetic fields in elastic materials, J. Elasticity (‘online first’)(**), **. [202] Ericksen, J. L., 2007, Theory of elastic dielectrics revisited, Arch. Rational Mech. Anal. 183(2), 299–313. [203] Eringen, A. C., 1967/1980, Mechanics of Continua (Robert E. Krieger Publishing Company, Huntington, USA), second edition, first publ. 1967. [204] Eringen, A. C., 1972, On nonlocal fluid mechanics, Int. J. Engng. Sci. 10(6), 561–575. [205] Eringen, A. C., 1974, Theory of nonlocal thermoelasticity, Int. J. Engng. Sci. 12(12), 1063– 1077. [206] Eringen, A. C. (ed.), 1976, Continuum Physics. Vol. III — Mixtures and EM Field Theories (Academic Press, New York). [207] Eringen, A. C. (ed.), 1976, Continuum Physics. Vol. IV — Polar and Nonlocal Field Theories (Academic Press, New York). [208] Eringen, A. C., 1981, On nonlocal plasticity, Int. J. Engng. Sci. 19(12), 1461–1474. [209] Eringen, A. C., 1983, Theories of nonlocal plasticity, Int. J. Engng. Sci. 21(7), 741–751. [210] Eringen, A. C., 1984, Electrodynamics of memory-dependent nonlocal elastic continua, J. Math. Phys. 25(11), 3235–3249. [211] Eringen, A. C., 1991, Memory-dependent nonlocal electromagnetic elastic solids and super- conductivity, J. Math. Phys. 32(3), 787–796. [212] Eringen, A. C., 1992, Vistas of nonlocal continuum physics, Int. J. Engng. Sci. 30(10), 1551– 1565. [213] Eringen, A. C., 1999, Microcontinuum Field Theories: I. Foundations and Solids (Springer- Verlag, New York). [214] Eringen, A. C., 2002, Nonlocal Continuum Field Theories (Springer-Verlag, New York). [215] Eringen, A. C., 2006, Nonlocal continuum mechanics based on distributions, Int. J. Engng. Sci. 44(3–4), 141–147. [216] Eringen, A. C., and E. S. ¸Suhubi,1974, Elastodynamics. Vol. I: Finite Motions (Academic Press, New York). [217] Eringen, A. C., and E. S. ¸Suhubi,1975, Elastodynamics. Vol. II: Linear Theory (Academic Press, New York). 81

[218] Eringen, A. C., and D. G. B. Edelen, 1972, On nonlocal elasticity, Int. J. Engng. Sci. 10(3), 233–248. [219] Euler, L., 1912–1996, Leonhardi Euleri opera omnia. Series secunda: mechanica et astro- nomica (Teubner/Birkhäuser Verlag, Leipzig/Basel), 32 volumes-parts; ed. by H.-C. Im Hof et al. Cf. http://www.eulerarchive.org/. [220] Evans, D. J., and G. P. Morriss, 1990, Statistical Mechanics of Nonequilibrium Liquids (Aca- demic Press, London). [221] Exner, P., and H. Neidhardt (eds.), 1990, Order, Disorder and Chaos in Quantum Systems: Proceedings of a conference held at Dubna, USSR on October 17–21, 1989 (Birkhäuser Ver- lag, Basel). [222] Ezra, G. S., 2002, Geometric approach to response theory in non-Hamiltonian systems, J. Math. Chem. 32(4), 339–360. [223] Ezra, G. S., 2004, On the statistical mechanics of non-Hamiltonian systems: The generalized Liouville equation, entropy, and time-dependent metrics, J. Math. Chem. 35(1), 29–53. [224] Fano, U., 1957, Description of states in quantum mechanics by density matrix and operator techniques, Rev. Mod. Phys. 29(1), 74–93. [225] Faria Santos, S. H., 2001, Mixtures with continuous diversity: general theory and application to polymer solutions, Continuum Mech. Thermodyn. 13(2), 91–120. [226] Faria Santos, S. H., 2003, Mechanics and Thermodynamics of Mixtures with Continuous Di- versity: From Complex Media to Ice Sheets, Ph.D. thesis, Technische Universität Darmstadt, http://www.mechanik.tu-darmstadt.de/ag3/abstracts/2003/Faria03.html. [227] Feitosa, K., and N. Menon, 2002, Breakdown of energy equipartition in a 2D binary vibrated granular gas, Phys. Rev. Lett. 88(19), 198301, eprint arXiv:cond-mat/0111391. [228] Ferguson, A., and B. Chakraborty, 2006, Stress and large-scale spatial structures in dense, driven granular flows, Phys. Rev. E 73(1), 011303, eprint arXiv:cond-mat/0505496. [229] Ferrero, M., and A. van der Merwe (eds.), 1997, New Developments on Fundamental Prob- lems in Quantum Physics (Kluwer Academic Publishers, Dordrecht). [230] Feynman, R. P., and A. R. Hibbs, 1965, Quantum Mechanics and Path Integrals (McGraw-Hill Book Company, New York). [231] Feynman, R. P., R. B. Leighton, and M. Sands, 1965, The Feynman Lectures on Physics: Quantum Mechanics (Addison-Wesley Publishing Company, Reading, USA), eprint arXiv:. [232] de Finetti, B., 1931, Probabilismo, Logos 14, 163–219, transl. as [233]; see also [395]. [233] de Finetti, B., 1931/1989, Probabilism: A critical essay on the theory of probability and on the value of science, Erkenntnis 31(2–3), 169–223, transl. of [232] by Maria Concetta Di Maio, Maria Carla Galavotti, and Richard C. Jeffrey. [234] de Finetti, B., 1936, La logique de la probabilité, in Actes du Congrès International de Phi- losophie Scientifique : Sorbonne, Paris, 1935. IV. Induction et Probabilité (Hermann et Cie, Paris), pp. 31–39, transl. as [235]. [235] de Finetti, B., 1936/1995, The logic of probability, Phil. Stud. 77(1), 181–190, transl. by Brad Angell of [234]. [236] de Finetti, B., 1937, La prévision : ses lois logiques, ses sources subjectives, Ann. Inst. Henri Poincaré 7(1), 1–68, transl. as [238]. [237] de Finetti, B., 1938, Sur la condition d’équivalence partielle, transl. in [241]. [238] de Finetti, B., 1964, Foresight: Its logical laws, its subjective sources, in Kyburg and Smokler [446], pp. 93–158, transl. by Henry E. Kyburg, Jr. of [236]. [239] de Finetti, B., 1970/1990, Theory of Probability: A critical introductory treatment. Vol. 1 (John Wiley & Sons, New York), transl. by Antonio Machi and Adrian Smith; first publ. in 82 BIBLIOGRAPHY

Italian 1970. [240] de Finetti, B., 1970/1990, Theory of Probability: A critical introductory treatment. Vol. 2 (John Wiley & Sons, New York), transl. by Antonio Machi and Adrian Smith; first publ. in Italian 1970. [241] de Finetti, B., 1980, On the condition of partial exchangeability, in Jeffrey [396], pp. 193–205, transl. by P. Benacerraf and R. Jeffrey of [237]. [242] Flato, M., Z. Maric, A. Milojevic, D. Sternheimer, and J. P. Vigier (eds.), 1976, Quantum Me- chanics, Determinism, Causality, and Particles (D. Reidel Publishing Company, Dordrecht). [243] Flügge, S. (ed.), 1960, Handbuch der Physik: Band III/1: Prinzipien der klassischen Mechanik und Feldtheorie [Encyclopedia of Physics: Vol. III/1: Principles of Classical Mechanics and Field Theory] (Springer-Verlag, Berlin). [244] Flügge, S. (ed.), 1965, Handbuch der Physik: Band III/3: Die nicht-linearen Feldtheorien der Mechanik [Encyclopedia of Physics: Vol. III/3: The Non-Linear Field Theories of Mechanics] (Springer-Verlag, Berlin). [245] Fonseca Romero, K. M., P. Talkner, and P. Hänggi, 2004, Is the dynamics of open quantum systems always linear?, Phys. Rev. A 69(5), 052109, eprint arXiv:quant-ph/0311077. [246] Foulis, D. J., and S. P. Gudder, 2001, Observables, calibration, and effect algebras, Found. Phys. 31(11), 1515–1544. [247] Foulis, D. J., and C. H. Randall, 1972, Operational statistics. I. Basic concepts, J. Math. Phys. 13(11), 1667–1675. [248] Foulis, D. J., and C. H. Randall, 1978, Manuals, morphisms and quantum mechanics, in Marlow [508], pp. 105–126. [249] Fuchs, C. A., and A. Peres, 2000, Quantum theory needs no ‘interpretation’, Phys. Today 53(3), 70–71, see also comments and reply [679]. [250] Fuchs, C. A., and R. Schack, 2004, Unknown quantum states and operations, a Bayesian view, eprint arXiv:quant-ph/0404156. [251] Fuchs, C. A., R. Schack, and P. F. Scudo, 2004, De Finetti representation theorem for quantum-process tomography, Phys. Rev. A 69, 062305, eprint arXiv:quant-ph/0307198. [252] Fulton, T., and F. Rohrlich, 1960, Classical radiation from a uniformly accelerated charge, Ann. of Phys. 9(4), 499–517. [253] Gaifman, H., 1964, Concerning measures in first order calculi, Israel J. Math. 2, 1–18. [254] Gaifman, H., 2002, Vagueness, tolerance and contextual logic, http://www.columbia. edu/~hg17/. [255] Gallavotti, G., 2006, Fluctuation relation, fluctuation theorem, thermostats and entropy cre- ation in non equilibrium statistical physics, eprint arXiv:cond-mat/0612061. [256] Gallavotti, G., and E. G. D. Cohen, 1995, Dynamical ensembles in nonequilibrium statistical mechanics, Phys. Rev. Lett. 74(14), 2694–2697. [257] Garrison, J. C., and R. Y. Chiao, 2004, Canonical and kinetic forms of the electromagnetic momentum in an ad hoc quantization scheme for a dispersive dielectric, Phys. Rev. A 70(5), 053826, eprint arXiv:quant-ph/0406222. [258] Garzó, V., and J. W. Dufty, 2002, Hydrodynamics for a granular binary mixture at low density, Phys. Fluids 14(4), 1476–1490, eprint arXiv:cond-mat/0105395. [259] Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin, 1995/2004, Bayesian Data Analysis (Chapman & Hall/CRC, Boca Raton, USA), second edition, first publ. 1995. [260] de Gennes, P. G., 1999, Granular matter: a tentative view, Rev. Mod. Phys. 71(2), S374– S382. [261] Gentzen, G., 1935, Untersuchungen über das logische Schliessen, Math. Z. 39, 176–210, 83

405–431, transl. as [263]. [262] Gentzen, G., 1969, The Collected Papers of Gerhard Gentzen (North-Holland Publishing Company, Amsterdam), ed. and transl. by M. E. Szabo. [263] Gentzen, G., 1969, Investigations into logical deduction, in Gentzen [262], pp. 68–131, 312– 317 (notes). [264] Gerry, C. C., and P. L. Knight, 2005, Introductory Quantum Optics (Cambridge University Press, Cambridge). [265] Ghosh, K., K. A. Dill, M. M. Inamdar, E. Seitaridou, and R. Phillips, 2006, Teach- ing the principles of statistical dynamics, Am. J. Phys. 74(2), 123–133, eprint arXiv: cond-mat/0507388. [266] Gibbs, J. W., 1876–1878, On the equilibrium of heterogeneous substances, Trans. Connect. Acad. 3, 108–248, 343–524, repr. in [267], pp. 55–353. [267] Gibbs, J. W., 1906/1948, The Collected Works of J. Willard Gibbs. Vol. 1: Thermodynamics (Yale University Press, New Haven), http://gallica. bnf.fr/notice?N=FRBNF37277935, http://www.archive.org/details/ elementaryprinci00gibbrich; first publ. 1906. [268] Giles, R., 1970, Foundations for quantum mechanics, J. Math. Phys. 11(7), 2139–2160. [269] Giles, R., 1974, A non-classical logic for physics, Studia Logica 33(4), 397–415. [270] Giles, R., 1979, Formal languages and the foundations of physics, in Hooker [352], pp. 19–87. [271] Girard, J.-Y., 1989/2003, Proofs and Types (Cambridge University Press, Cambridge), http: //www.cs.man.ac.uk/~pt/stable/Proofs+Types.html; transl. and with appendices by Paul Taylor Yves Lafont; first publ. 1989. [272] Girard, J.-Y., 1999, On the meaning of logical rules I: syntax vs. semantics, in Computational Logic, edited by U. Berger and H. Schwichtenberg (Springer-Verlag, Berlin), pp. 215–272, http://iml.univ-mrs.fr/~girard/Articles.html; see also [273]. [273] Girard, J.-Y., 2000, On the meaning of logical rules II: multiplicatives and additives, in Foun- dations of Secure Computation: Proceedings of the 1999 Marktoberdorf Summer School, edited by F. L. Bauer and R. Steinbruggen (IOS Press, Amsterdam), pp. 183–212, http: //iml.univ-mrs.fr/~girard/Articles.html; see also [272]. [274] Girard, J.-Y., 2001, Locus solum: From the rules of logic to the logic of rules, Math. Struct. in Comp. Science 11(3), 301–506, http://iml.univ-mrs.fr/~girard/Articles.html; see also [144]. [275] Girard, J.-Y., 2003, From foundations to ludics, Bull. Symbolic Logic 9(2), 131–168, http: //iml.univ-mrs.fr/~girard/Articles.html. [276] Glauber, R. J., 1963, Coherent and incoherent states of the radiation field, Phys. Rev. 131(6), 2766–2788. [277] Glauber, R. J., 1963, Photon correlations, Phys. Rev. Lett. 10(3), 84–86. [278] Glauber, R. J., 1963, The quantum theory of optical coherence, Phys. Rev. 130(6), 2529–2539. [279] Gleason, A. M., 1957, Measures on the closed subspaces of a Hilbert space, J. Math. Mech. 6(6), 885–893. [280] Goddard, J. D., 2006, From granular matter to generalized continuum, in Mathematical Models for Granular Matter, edited by P. Mariano and ** (Springer, Berlin), p. **, http: //maeresearch.ucsd.edu/goddard/. [281] Goldman, A. I. (ed.), 1993, Readings in Philosophy and Cognitive Science (The MIT Press, Cambridge, USA). [282] González-Gascón, F., 1986, Note on paper of Andrey concerning non-Hamiltonian systems, Phys. Lett. A 114(2), 61–62, see [19] and also [20]. 84 BIBLIOGRAPHY

[283] Goodman, M. A., and S. C. Cowin, 1972, A continuum theory for granular materials, Arch. Rational Mech. Anal. 44(4), 249–266. [284] Grad, H., 1961, The many faces of entropy, Comm. Pure Appl. Math. 14, 323–354. [285] Grandy, W. T., Jr., 1980, Principle of maximum entropy and irreversible processes, Phys. Rep. 62(3), 175–266. [286] Grandy, W. T., Jr., 1981, Indistinguishability, symmetrisation and maximum entropy, Eur. J. Phys. 2(2), 86–90. [287] Grandy, W. T., Jr., 2004, Time evolution in macroscopic systems. I. Equations of motion, Found. Phys. 34(1), 1–20, eprint arXiv:cond-mat/0303290. [288] Grandy, W. T., Jr., 2004, Time evolution in macroscopic systems. II. The entropy, Found. Phys. 34(1), 21–57. [289] Grandy, W. T., Jr., 2004, Time evolution in macroscopic systems. III: Selected applications, Found. Phys. 34(5), 771–813. [290] Granström, H., 2006, Some remarks on the theorems of Gleason and Kochen-Specker, eprint arXiv:quant-ph/0612103. [291] Greechie, R. J., 1969, A particular non-atomistic orthomodular poset, Commun. Math. Phys. 14(4), 326–328. [292] Greechie, R. J., 1971, Orthomodular lattices admitting no states, J. Combin. Theory A 10(2), 119–132. [293] Green, M. S., 1960, Comment on a paper of Mori on time-correlation expressions for trans- port properties, Phys. Rev. 119(3), 829–830, see Mori [537]. [294] Gregory, P., 2005, Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica Support (Cambridge University Press, Cambridge). [295] Gronwald, F., and F. W. Hehl, 1997, Stress and hyperstress as fundamental concepts in con- tinuum mechanics and in relativistic field theory, eprint arXiv:gr-qc/9701054. [296] de Groot, S. R., and P. Mazur, 1962/1984, Non-Equilibrium Thermodynamics (Dover Publi- cations, New York), first publ. 1962. [297] Grünbaum, B., 1967/2003, Convex Polytopes (Springer-Verlag, New York), second edition, prep. by Volker Kaibel, Victor Klee, and Günter M. Ziegler; first publ. 1967. [298] Gudder, S., 1999, Observables and statistical maps, Found. Phys. 29(6), 877–897. [299] Gudder, S. P., 1965, Spectral methods for a generalized probability theory, Trans. Am. Math. Soc. 119(3), 428–442. [300] Gudder, S. P., 1969, Quantum probability spaces, Proc. Am. Math. Soc. 21(2), 296–302. [301] Gudder, S. P., 1973, Convex structures and operational quantum mechanics, Commun. Math. Phys. 29(3), 249–264. [302] Gudder, S. P., 1976, A generalized measure and probability theory for the physical sciences, in Harper and Hooker [317], pp. 121–141. [303] Gudder, S. P., 1984, An extension of classical measure theory, SIAM Rev. 26(1), 71–89. [304] Gudder, S. P., 1984, Probability manifolds, J. Math. Phys. 25(8), 2397–2401. [305] Gurtin, M. E., 1974, Modern continuum thermodynamics, in Nemat-Nasser [553], pp. 168– 213. [306] Gurtin, M. E., 1993, Thermomechanics of Evolving Phase Boundaries in the Plane (Oxford University Press), oxford edition. [307] Gurtin, M. E., 2000, Configurational Forces as Basic Concepts of Continuum Physics (Springer, New York). [308] Gurtin, M. E., and W. O. Williams, 1967, An axiomatic foundation for continuum thermody- namics, Arch. Rational Mech. Anal. 26, 83–117. 85

[309] Hacking, I., 1971, Equipossibility theories of probability, Brit. J. Phil. Sci. 22(4), 339–355. [310] Hacking, I., 1971, Jacques Bernoulli’s Art of Conjecturing, Brit. J. Phil. Sci. 22(3), 209–229. [311] Hacking, I., 1971, The Leibniz-Carnap program for inductive logic, J. Phil. 68(19), 597–610. [312] Hailperin, T., 1984, Probability logic, Notre Dame J. Formal Logic 25(3), 198–212. [313] Hailperin, T., 1996, Sentential Probability Logic: Origins, Development, Current Status, and Technical Applications (Associated University Presses, London). [314] Hailperin, T., 2006, Probability logic and combining evidence, Hist. Philos. Logic 27(3), 249– 269. [315] Hamilton, A. G., 1978/1993, Logic for Mathematicians (Cambridge University Press, Cam- bridge), first publ. 1978. [316] Hardy, L., 2001, Quantum theory from five reasonable axioms, eprint arXiv: quant-ph/0101012. [317] Harper, W. L., and C. A. Hooker (eds.), 1976, Foundations of Probability Theory, Statisti- cal Inference, and Statistical Theories of Science. Vol. III: Foundations and Philosophy of Statistical Theories in the Physical Sciences (D. Reidel Publishing Company, Dordrecht). [318] Harriman, J. E., 1978, Geometry of density matrices. I. Definitions, N matrices and 1 matrices, Phys. Rev. A 17(4), 1249–1256, see also [319–322]. [319] Harriman, J. E., 1978, Geometry of density matrices. II. Reduced density matrices and N rep- resentability, Phys. Rev. A 17(4), 1257–1268, see also [318, 320–322]. [320] Harriman, J. E., 1979, Geometry of density matrices. III. Spin components, Int. J. Quant. Chem. 15(6), 611–643, see also [318, 319, 321, 322]. [321] Harriman, J. E., 1983, Geometry of density matrices. IV. The relationship between density matrices and densities, Phys. Rev. A 27(2), 632–645, see also [318–320, 322]. [322] Harriman, J. E., 1984, Geometry of density matrices. V. Eigenstates, Phys. Rev. A 30(1), 19–29, see also [318–321]. [323] Harriman, J. E., 1986, Densities, operators, and basis sets, Phys. Rev. A 34(1), 29–39. [324] Harriman, J. E., 1994, A quantum state vector phase space representation, J. Chem. Phys. 100(5), 3651–3661. [325] Hassan, S. S., and R. K. Bullough, 1975, Theory of the dynamical Stark effect, J. Phys. B 8(9), L147–L152. [326] Hassan, S. S., T. M. Jarad, R. R. Puri, and R. K. Bullough, 2005, Field-dependent atomic relaxation in a squeezed vacuum, J. Opt. B 7(11), 329–340. [327] Hawton, M., 1991, Dispersion self-energy of the electron, Phys. Rev. A 43(9), 4588–4593. [328] Hawton, M., 1992, Dispersion forces, self-reaction, and electromagnetic fluctuations, Phys. Rev. A 46(11), 6846–6850. [329] Heath, D., and W. Sudderth, 1976, De Finetti’s theorem on exchangeable variables, American Statistician 30(4), 188–189. [330] Hehl, F. W., and Y. N. Obukhov, 2000, A gentle introduction to the foundations of classical electrodynamics: The meaning of the excitations (D, H) and the field strengths (E, B), eprint arXiv:physics/0005084. [331] Hehl, F. W., and Y. N. Obukhov, 2005, Dimensions and units in electrodynamics, Gen. Relat. Gravit. 37(4), 733–749, eprint arXiv:physics/0407022. [332] Hehl, F. W., Y. N. Obukhov, and G. F. Rubilar, 1999, Classical electrodynamics: A tutorial on its foundations, eprint arXiv:physics/9907046. [333] Heitjan, D. F., and D. B. Rubin, 1991, Ignorability and coarse data, Ann. Stat. 19(4), 2244– 2253. [334] Helland, I. S., 2003, Extended statistical modelling under symmetry: The link towards quan- 86 BIBLIOGRAPHY

tum mechanics, http://folk.uio.no/ingeh/publ.html. [335] Helland, I. S., 2003, Quantum theory as a statistical theory under symmetry and complemen- tarity, http://folk.uio.no/ingeh/publ.html. [336] Helland, I. S., 2005, Extended statistical modelling under symmetry: The link towards quan- tum mechanics, eprint arXiv:quant-ph/0503214. [337] Hellwig, K.-E., and K. Kraus, 1969, Pure operations and measurements, Commun. Math. Phys. 11(3), 214–220. [338] Hellwig, K.-E., and K. Kraus, 1970, Operations and measurements. II, Commun. Math. Phys. 16(2), 142–147. [339] Henkin, L., P. Suppes, and A. Tarski (eds.), 1959, The Axiomatic Method: With Special Refer- ence to Geometry and Physics (North-Holland Publishing Company, Amsterdam), ‘Proceed- ings of an International Symposium held at the University of California, Berkeley, December 26, 1957 – January 4, 1958’. [340] Hestenes, D., 1990, The Zitterbewegung interpretation of quantum mechanics, Found. Phys. 20(10), 1213–1232, http://modelingnts.la.asu.edu/pdf/ZBW_I_QM.pdf. [341] Hewitt, E., and L. J. Savage, 1955, Symmetric measures on Cartesian products, Trans. Am. Math. Soc. 80(2), 470–501. [342] Hillery, M., R. F. O’Connell, M. O. Scully, and E. P. Wigner, 1984, Distribution functions in physics: Fundamentals, Phys. Rep. 106(3), 121–167. [343] Hobson, A., 1966, Irreversibility and information in mechanical systems, J. Chem. Phys. 45(4), 1352–1357. [344] Hobson, A., 1966, Irreversibility in simple systems, Am. J. Phys. 34, 411–416. [345] Hobson, A., and D. N. Loomis, 1968, Exact classical nonequilibrium statistical-mechanical analysis of the finite ideal gas, Phys. Rev. 173(1), 285–295. [346] Holevo, A. S., 1973, Statistical decision theory for quantum systems, J. Multivariate Anal. 3(4), 337–394. [347] Holevo, A. S., 1976/1978, Investigations in the general theory of statistical decisions, Pro- ceedings of the Steklov Institute of Mathematics (Trudy Matematicheskogo Instituta imeni V. A. Steklova) 1978(3), transl. from the Russian by Lisa Rosenblatt; first publ. 1976. [348] Holevo, A. S., 1980/1982, Probabilistic and Statistical Aspects of Quantum Theory (North- Holland, Amsterdam), first publ. in Russian in 1980. [349] Holevo, A. S., 1985, Statistical definition of observable and the structure of statistical models, Rep. Math. Phys. 22(3), 385–407, see also [350]. [350] Holevo, A. S., 1985, Statistical definition of observable and the structure of statistical mod- els, Technical Report TRITA-TFY-85-02, Kungliga Tekniska Högskolan, Stockholm, see also [349]. [351] Holevo, A. S., 2001, Statistical Structure of Quantum Theory (Springer-Verlag, Berlin). [352] Hooker, C. A. (ed.), 1979, Physical Theory as Logico-Operational Structure (D. Reidel Pub- lishing Company, Dordrecht). [353] Hou, M., Z. Peng, R. Liu, K. Lu, and C. K. Chan, 2005, Dynamics of a projectile penetrating in granular systems, Phys. Rev. E 72(6), 062301. [354] Hudson, R. L., and G. R. Moody, 1976, Locally normal symmetric states and an analogue of de Finetti’s theorem, Probab. Theory Relat. Fields 33(4), 343–351. [355] Huth, M. R. A., and M. D. Ryan, 2000/2002, Logic in Computer Science: Modelling and reasoning about systems (Cambridge University Press, Cambridge), first publ. 2000. [356] Hutter, K., 1977, A thermodynamic theory of fluids and solids in electromagnetic fields, Arch. Rational Mech. Anal. 64(3), 193–298. 87

[357] IEEE, 1993, ANSI/IEEE Std 260.3-1993: American National Standard: Mathematical signs and symbols for use in physical sciences and technology, Institute of Electrical and Electronics Engineers, New York. [358] Irving, J. H., and J. G. Kirkwood, 1950, The statistical mechanical theory of trans- port processes. IV. The equations of hydrodynamics, J. Chem. Phys. 18(6), 817–829, see also [359, 421, 422, 424]. [359] Irving, J. H., and R. W. Zwanzig, 1951, The statistical mechanical theory of transport pro- cesses. V. Quantum hydrodynamics, J. Chem. Phys. 19(9), 1173–1180, see also [358, 421, 422, 424]. [360] Isham, C. J., R. Penrose, and D. W. Sciama (eds.), 1981, Quantum Gravity 2 (Oxford Univer- sity Press, Oxford). [361] ISO, 1993, Quantities and units, International Organization for , Geneva, third edition. [362] Ivanovic,´ I. D., 1981, Geometrical description of quantal state determination, J. Phys. A 14(12), 3241–3245. [363] Ivanovic,´ I. D., 1983, Formal state determination, J. Math. Phys. 24(5), 1199–1205. [364] Ivanovic,´ I. D., 1984, Representative state in incomplete quantal state determination, J. Phys. A 17(11), 2217–2223. [365] Ivanovic,´ I. D., 1987, How to differentiate between non-orthogonal states, Phys. Lett. A 123(6), 257–259. [366] Ivanovic,´ I. D., 1988, Mott scattering spin measurement as a generalised quantum measure- ment, J. Phys. A 21(8), 1857–1865. [367] Jaeger, H. M., S. R. Nagel, and R. P. Behringer, 1996, Granular solids, liquids, and gases, Rev. Mod. Phys. 68(4), 1259–1273. [368] Jarzynski, C., 2000, Hamiltonian derivation of a detailed fluctuation theorem, J. Stat. Phys. 98(1/2), 77–102, eprint arXiv:cond-mat/9908286. [369] Jauch, J. M., and C. Piron, 1967, Generalized localizability, Helv. Phys. Acta 40(5), 559–570. [370] Jaworski, W., 1987, Higher-order moments and the maximum entropy inference: the thermo- dynamic limit approach, J. Phys. A 20(4), 915–926, see also [573]. [371] Jaynes, E. T., 1954–1974, Probability Theory: With Applications in Science and Engineer- ing: A Series of Informal Lectures, http://bayes.wustl.edu/etj/science.pdf.html; lecture notes written 1954–1974; earlier version of [390]. [372] Jaynes, E. T., 1957, How does the brain do plausible reasoning?, http://bayes.wustl. edu/etj/node2.html (includes reviewer’s comments and Jaynes’ reply). [373] Jaynes, E. T., 1959, Probability Theory in Science and Engineering (Socony-Mobil Oil Com- pany, Dallas), http://bayes.wustl.edu/etj/node1.html; see also [371]. [374] Jaynes, E. T., 1967, Foundations of probability theory and statistical mechanics, in Bunge [101], pp. 77–101, http://bayes.wustl.edu/etj/node1.html. [375] Jaynes, E. T., 1973, Survey of the present status of neoclassical radiation theory, in Coherence and Quantum Optics, edited by L. Mandel and E. Wolf (Plenum Press, New York), pp. 35–81, http://bayes.wustl.edu/etj/node1.html. [376] Jaynes, E. T., 1978, Electrodynamics today, in Coherence and Quantum Optics IV, edited by L. Mandel and E. Wolf (Plenum Press, New York), pp. 495–509, a longer version available at http://bayes.wustl.edu/etj/node1.html. [377] Jaynes, E. T., 1980, The minimum entropy production principle, Annu. Rev. Phys. Chem. 31, 579, http://bayes.wustl.edu/etj/node1.html. [378] Jaynes, E. T., 1980, Quantum beats, in Foundations of Radiation Theory and Quantum 88 BIBLIOGRAPHY

Electrodynamics, edited by A. O. Barut (Plenum Press, New York), pp. 37–43, http: //bayes.wustl.edu/etj/node1.html. [379] Jaynes, E. T., 1985, Macroscopic prediction, in Complex Systems — Operational Approaches, edited by H. Haken (Springer-Verlag, Berlin), p. 254, http://bayes.wustl.edu/etj/ node1.html. [380] Jaynes, E. T., 1985, Some random observations, Synthese 63, 115, http://bayes.wustl. edu/etj/node1.html. [381] Jaynes, E. T., 1985, Where do we go from here?, in Smith and Grandy [665], p. 21, http: //bayes.wustl.edu/etj/node1.html. [382] Jaynes, E. T., 1986, Monkeys, kangaroos and N, in Maximum-Entropy and Bayesian Methods in Applied Statistics, edited by J. H. Justice (Cambridge University Press, Cambridge), p. 26, see also the revised and corrected version [391]. [383] Jaynes, E. T., 1986, Predictive statistical mechanics, in Frontiers of Nonequilibrium Statistical Physics, edited by G. T. Moore and M. O. Scully (Plenum Press, New York), p. 33, http: //bayes.wustl.edu/etj/node1.html. [384] Jaynes, E. T., 1986, Some applications and extensions of the de Finetti representation theorem, in Bayesian Inference and Decision Techniques with Applications: Essays in Honor of Bruno de Finetti, edited by P. K. Goel and A. Zellner (North-Holland, Amsterdam), p. 31, http: //bayes.wustl.edu/etj/node1.html. [385] Jaynes, E. T., 1989, Clearing up mysteries — the original goal, in Maximum Entropy and Bayesian Methods, edited by J. Skilling (Kluwer Academic Publishers, Dordrecht), pp. 1–27, http://bayes.wustl.edu/etj/node1.html. [386] Jaynes, E. T., 1990, Probability in quantum theory, in Zurek [761], p. 381, revised and ex- tended version available at http://bayes.wustl.edu/etj/node1.html. [387] Jaynes, E. T., 1991, Scattering of light by free electrons: as a test of quantum theory, in The Electron, edited by D. Hestenes and A. Weingartshofer (Kluwer Academic Publishers, Dordrecht), pp. 1–20, http://bayes.wustl.edu/etj/node1.html. [388] Jaynes, E. T., 1992, The Gibbs paradox, in Maximum-Entropy and Bayesian Methods, edited by C. R. Smith, G. J. Erickson, and P. O. Neudorfer (Kluwer Academic Publishers, Dordrecht), pp. 1–22, http://bayes.wustl.edu/etj/node1.html. [389] Jaynes, E. T., 1993, A backward look to the future, in Physics and Probability: Essays in Honor of Edwin T. Jaynes, edited by W. T. Grandy, Jr. and P. W. Milonni (Cambridge Univer- sity Press, Cambridge), pp. 261–275, http://bayes.wustl.edu/etj/node1.html. [390] Jaynes, E. T., 1994/2003, Probability Theory: The Logic of Science (Cambridge Uni- versity Press, Cambridge), ed. by G. Larry Bretthorst; http://omega.albany.edu: 8008/JaynesBook.html, http://omega.albany.edu:8008/JaynesBookPdf.html. First publ. 1994; earlier versions in [371, 373]. [391] Jaynes, E. T., 1996, Monkeys, kangaroos and N, http://bayes.wustl.edu/etj/node1. html; revised and corrected version of Jaynes [382]. (Errata: in equations (29)–(31), (33), (40), (44), (49) the commas should be replaced by gamma functions, and on p. 19 the value 0.915 should be replaced by 0.0915). [392] Jaynes, E. T., ca. 1965, Thermodynamics, http://bayes.wustl.edu/etj/node2.html. [393] Jaynes, E. T., and F. W. Cummings, 1963, Comparison of quantum and semiclassical radia- tion theory with application to the beam maser, Proc. IEEE 51(1), 89–109, http://bayes. wustl.edu/etj/node1.html. [394] Jaynes, E. T., and D. J. Scalapino, 1963, Irreversible statistical mechanics, http://bayes. wustl.edu/etj/node2.html. 89

[395] Jeffrey, R., 1989, Reading Probabilismo, Erkenntnis 31(2–3), 225–237, see [232]. [396] Jeffrey, R. C. (ed.), 1980, Studies in inductive logic and probability (University of California Press, Berkeley). [397] Jeffreys, H., 1931/1957, Scientific Inference (Cambridge University Press, Cambridge), sec- ond edition, first publ. 1931. [398] Jeffreys, H., 1939/1998, Theory of Probability (Oxford University Press, London), third edi- tion, first publ. 1939. [399] Johnson, W. E., 1921/1940, Logic. Part I (Cambridge University Press, Cambridge), first publ. 1921. [400] Johnson, W. E., 1922, Logic. Part II. Demonstrative Inference: Deductive and Inductive (Cambridge University Press, Cambridge). [401] Johnson, W. E., 1924, Logic. Part III. The Logical Foundations of Science (Cambridge Uni- versity Press, Cambridge). [402] Johnson, W. E., 1932, Probability: The deductive and inductive problems, Mind 41(164), 409–423, with some notes and an appendix by R. B. Braithwaite. [403] Jones, K. R. W., 1991, Principles of quantum inference, Ann. of Phys. 207(1), 140–170. [404] Jones, K. R. W., 1994, Fundamental limits upon the measurement of state vectors, Phys. Rev. A 50(5), 3682–3699. [405] Jordan, T. F., 2005, Affine maps of density matrices, Phys. Rev. A 71(3), 034101, eprint arXiv:quant-ph/0407203. [406] Jordan, T. F., A. Shaji, and E. C. G. Sudarshan, 2004, Dynamics of initially entangled open quantum systems, Phys. Rev. A 70(5), 052110, eprint arXiv:quant-ph/0407083. [407] Jordan, T. F., A. Shaji, and E. C. G. Sudarshan, 2006, Mapping the Schrödinger picture of open quantum dynamics, Phys. Rev. A 73(1), 012106, eprint arXiv:quant-ph/0505123. [408] Kahle, R., and P. Schroeder-Heister, 2006, Introduction: Proof-theoretic semantics, Synthese 148(3), 503–506, gentzen, logic. [409] Kahneman, D., and A. Tversky, 1973, On the psychology of prediction, Psychol. Rev. 80(4), 237–251. [410] Kahneman, D., and A. Tversky, 1979, Prospect theory: An analysis of decision under risk, Econometrica 47(2), 263–292. [411] Kalman, R. E., P. L. Falb, and M. A. Arbib, 1969, Topics in Mathematical System Theory (McGraw-Hill Book Company, New York). [412] van Kampen, N. G., 1984, The Gibbs paradox, in Parry [576], pp. 303–312. [413] Kass, R. E., 1982, A comment on “Is Jeffreys a ‘necessarist’?”, American Statistician 36(4), 390–391, see [757] and also [758]. [414] Kawarada, A., and H. Hayakawa, 2004, Non-Gaussian velocity distribution function in a vi- brating granular bed, eprint arXiv:cond-mat/0404633. [415] Kelly, P. D., 1964, A reacting continuum, Int. J. Engng. Sci. 2(2), 129–153. [416] Kent, A., 1999, Noncontextual hidden variables and physical measurements, Phys. Rev. Lett. 83(19), 3755–3757, eprint arXiv:quant-ph/9906006. [417] Kieu, T. D., 2004, The second law, Maxwell’s demon, and work derivable from quantum heat engines, Phys. Rev. Lett. 93(14), 140403, eprint arXiv:quant-ph/0508161. [418] Kim, K., J. K. Moon, J. J. Park, H. K. Kim, and H. K. Pak, 2005, Jamming transition in a highly dense granular system under vertical vibration, eprint arXiv:cond-mat/0501310. [419] Kirkwood, J. G., 1933, Quantum statistics of almost classical assemblies, Phys. Rev. 44(31– 37). [420] Kirkwood, J. G., 1935, Statistical mechanics of fluid mixtures, J. Chem. Phys. 3(5), 300–313. 90 BIBLIOGRAPHY

[421] Kirkwood, J. G., 1946, The statistical mechanical theory of transport processes: I. General theory, J. Chem. Phys. 14(3, 5), 180–201, 347, see also [358, 359, 422, 424]. [422] Kirkwood, J. G., 1947, The statistical mechanical theory of transport processes: II. Transport in gases, J. Chem. Phys. 15(1, 3), 72–76, 155, see also [358, 359, 421, 424]. [423] Kirkwood, J. G., and F. P. Buff, 1949, The statistical mechanical theory of surface tension, J. Chem. Phys. 17(3), 338–343. [424] Kirkwood, J. G., F. P. Buff, and M. S. Green, 1949, The statistical mechanical theory of transport processes: III. The coefficients of shear and bulk viscosity of liquids, J. Chem. Phys. 17(10), 988–994, see also [358, 359, 421, 422]. [425] Kirkwood, J. G., and B. Crawford, Jr., 1952, The macroscopic equations of transport, J. Phys. Chem. 56(9), 1048–1051. [426] Klee, V. L., jr., 1949, Dense convex sets, Duke Math. J. 16(2), 351–354. [427] Klee, V. L., jr., 1951, Convex sets in linear spaces, Duke Math. J. 18(2), 443–466, see also [428, 429]. [428] Klee, V. L., jr., 1951, Convex sets in linear spaces. II, Duke Math. J. 18(4), 875–883, see also [427, 429]. [429] Klee, V. L., jr., 1953, Convex sets in linear spaces. III, Duke Math. J. 20(1), 105–111, see also [427, 428]. [430] Klee, V. L., jr., 1955, A note on extreme points, Am. Math. Monthly 62(1), 30–32. [431] Klee, V. L., jr., 1957, Extremal structure of convex sets, Arch. d. Math. 8(3), 234–240, see also [432]. [432] Klee, V. L., jr., 1958, Extremal structure of convex sets. II, Math. Z. 69(1), 90–104, see also [431]. [433] Kleidon, A., and R. D. Lorenz (eds.), 2005, Non-equilibrium Thermodynamics and the Pro- duction of Entropy: Life, Earth, and Beyond (Springer, Berlin), with a Foreword by Hartmut Grassl. [434] Klein, M. J., 1958, Note on a problem concerning the Gibbs paradox, Am. J. Phys. 26(2), 80–81. [435] Klein, M. J., 1958, Order, organisation and entropy, Brit. J. Phil. Sci. 4(14), 158–160. [436] Kochen, S., and E. P. Specker, 1965, Logical structures arising in quantum theory, in Addison et al. [3], pp. 177–189. [437] Kochen, S., and E. P. Specker, 1967, The problem of hidden variables in quantum mechanics, J. Math. Mech. 17(1), 59–87. [438] Kohlstedt, K., A. Snezhko, M. V. Sapozhnikov, I. S. Aranson, J. S. Olafsen, and E. Ben-Naim, 2005, Velocity distributions of granular gases with drag and with long-range interactions, Phys. Rev. Lett. 95(6), 068001, eprint arXiv:cond-mat/0507113. [439] Kolb, E., C. Goldenberg, S. Inagaki, and E. Clément, 2006, Reorganization of a two- dimensional disordered granular medium due to a small local cyclic perturbation, J. Stat. Mech. , P07017. [440] Koopman, B. O., 1957, Quantum theory and the foundations of probability, in MacColl [493], pp. 97–102. [441] Kracklauer, A. F., 2000, “La ‘théorie’ de Bell, est-elle la plus grande méprise de l’histoire de la physique ?”, Ann. Fond. Louis de Broglie 25(2), 193–207. [442] Kracklauer, A. F., 2002, Exclusion of correlation in the theorem of Bell, eprint arXiv: quant-ph/0210121. [443] Kracklauer, A. F., 2002, One less quantum mystery, J. Opt. B 4(4), S469–S472. [444] Kraus, K., 1983, States, Effects, and Operations: Fundamental Notions of Quantum Theory 91

(Springer-Verlag, Berlin). [445] Kremer, M., 1988, Logic and meaning: The philosophical significance of the sequent calculus, Mind 97(385), 50–72. [446] Kyburg, H. E., Jr., and H. E. Smokler (eds.), 1964, Studies in Subjective Probability (John Wiley & Sons, New York). [447] Lahti, P. J., 1983, Hilbertian quantum theory as the theory of complementarity, Int. J. Theor. Phys. 22(10), 911–929. [448] Lahti, P. J., and S. Bugajski, 1985, Fundamental principles of quantum theory, II. From a convexity scheme to the DHB theory, Int. J. Theor. Phys. 24(11), 1051–1080, see also [97]. [449] Laplace, P. S., (Marquis de), 1774, Mémoire sur la probabilité des causes par les évènemens, Mémoires de mathématique et de physique, presentés à l’Académie Royale des Sciences, par divers savans et lûs dans ses assemblées 6, 621–656, repr. in [453], pp. 25–65; http: //gallica.bnf.fr/document?O=N077596; transl. as [454]. [450] Laplace, P. S., (Marquis de), 1812/1820, Théorie analytique des probabilités (Mme Ve Courcier, Paris), third edition, first publ. 1812; repr. in [452]; http://gallica.bnf.fr/ document?O=N077595. [451] Laplace, P. S., (Marquis de), 1814/1819, Essai philosophique sur les probabilités (Mme Ve Courcier, Paris), fourth edition, repr. as the Introduction of [450], pp. V–CLIII; http:// gallica.bnf.fr/document?O=N077595. [452] Laplace, P. S., (Marquis de), 1886, Œuvres complètes de Laplace. Tome septième : Théo- rie analytique des probabilités (Gauthier-Villars, Paris), ‘Publiées sous les auspices de l’Académie des sciences, par MM. les secrétaires perpétuels’; http://gallica.bnf.fr/ notice?N=FRBNF30739022. [453] Laplace, P. S., (Marquis de), 1891, Œuvres complètes de Laplace. Tome huitième : Mémoires extraits des recueils de l’Académie des sciences de Paris et de la classe des sciences ma- thématiques et physiques de l’Institut de France (Gauthier-Villars, Paris), ‘Publiées sous les auspices de l’Académie des sciences, par MM. les secrétaires perpétuels’; http://gallica. bnf.fr/notice?N=FRBNF30739022. [454] Laplace, P. S., (Marquis de), 1986, Memoir on the probability of the causes of events, Stat. Sci. 1(3), 364–378, transl. by S. M. Stigler of [449]; see also the translator’s introduction [670]. [455] Larmor, J., 1897, A dynamical theory of the electric and luminiferous medium. — Part III. Relations with material media, Phil. Trans. R. Soc. Lond. A 190, 205–300, 493, repr. in [456, pp. 11–131]. [456] Larmor, J., 1929, Mathematical and Physical Papers. Vol. II (Cambridge University Press, Cambridge). [457] Leconte, M., Y. Garrabos, E. Falcon, C. Lecoutre-Chabot, F. Palencia, P. Évesque, and D. Bey- sens, 2006, Microgravity experiments on vibrated granular gases in a dilute regime: non- classical statistics, J. Stat. Mech. , P07012. [458] Lefevere, R., 2006, On the local space-time structure of non-equilibrium steady states, eprint arXiv:math-ph/0609049. [459] Leff, H. S., and A. F. Rex (eds.), 1990/2003, Maxwell’s Demon 2: Entropy, Classical and Quantum Information, Computing (Institute of Physics Publishing, Bristol), second edition, first publ. 1990. [460] Lenac, Z., 2003, Quantum optics of dispersive dielectric media, Phys. Rev. A 68(6), 063815. [461] Leonhardt, U., and H. Paul, 1994, Is the Wigner function directly measurable in quantum optics?, Phys. Lett. A 193(2), 117–120. [462] Levine, R. D., and M. Tribus (eds.), 1979, The Maximum Entropy Formalism (The MIT Press, 92 BIBLIOGRAPHY

Cambridge, USA). [463] Lévy-Leblond, J.-M., 1974, The pedagogical role and epistemological significance of group theory in quantum mechanics, Nuovo Cimento 4(1), 99–143. [464] Lewis, C. I., and C. H. Langford, 1932, Symbolic Logic (The Century Company, New York). [465] Lewis, G. N., 1931, Generalized thermodynamics including the theory of fluctuations, J. Am. Chem. Soc. 53(7), 2578–2588. [466] Lewis, G. N., 1931, A more fundamental thermodynamics, Phys. Rev. 38, 376. [467] Lewis, R. M., 1960, Measure-theoretic foundations of statistical mechanics, Arch. Rational Mech. Anal. 5, 355–381. [468] Lewis, R. M., 1961, Solution of the equations of statistical mechanics, J. Math. Phys. 2(2), 222–231. [469] Lewis, R. M., 1967, A unifying principle in statistical mechanics, J. Math. Phys. 8(7), 1448– 1459. [470] Lighthill, M. J., 1958/1964, Introduction to Fourier Analysis and Generalised Functions (Cambridge University Press, London), first publ. 1958. [471] Lindell, I. V., and B. Jancewicz, 2000, Electromagnetic boundary conditions in differential- form formalism, Eur. J. Phys. 21(1), 83–89. [472] Lindell, I. V., and B. Jancewicz, 2000, Maxwell stress dyadic in differential-form formalism, IEE Proc. Sci. Meas. Tech. 147(1), 19–26. [473] Lindley, D. V., and L. D. Phillips, 1976, Inference for a Bernoulli process (a Bayesian view), American Statistician 30(3), 112–119. [474] Liouville, J., 1838, Sur la théorie de la Variation des constantes arbitraires, J. Math. Pure Appl. 3, 342–349. [475] Lois, G., A. Lemaître, and J. M. Carlson, 2005, The breakdown of kinetic theory in granular shear flows, eprint arXiv:cond-mat/0507286. [476] Lois, G., A. Lemaître, and J. M. Carlson, 2005, Numerical tests of constitutive laws for dense granular flows, Phys. Rev. E 72(5), 051303, eprint arXiv:cond-mat/0501535. [477] Long, G.-L., Y.-F. Zhou, J.-Q. Jin, and Y. Sun, 2004, Distinctness of ensembles having the same density matrix and the nature of liquid NMR quantum computing, eprint arXiv: quant-ph/0408079. [478] Long, G.-L., Y.-F. Zhou, J.-Q. Jin, Y. Sun, and H.-W. Lee, 2005, Density matrix in quantum mechanics and distinctness of ensembles of fixed particle number having the same compressed density matrix, eprint arXiv:quant-ph/0508207. [479] Longhi, E., N. Easwar, and N. Menon, 2002, Large force fluctuations in a flowing granular medium, Phys. Rev. Lett. 89(4), 045501, eprint arXiv:cond-mat/0203379. [480] Losert, W., D. G. W. Cooper, J. Delour, A. Kudrolli, and J. P. Gollub, 1999, Velocity statistics in excited granular media, Chaos 9(3), 682–690, eprint arXiv:cond-mat/9901203. [481] Loubenets, E. R., 2003, General framework for the probabilistic description of experiments, eprint arXiv:quant-ph/0312199. [482] Loubenets, E. R., 2003, General probabilistic framework of randomness, eprint arXiv: quant-ph/0305126. [483] Lubkin, E., 1976, Quantum logic, convexity, and a Necker-cube experiment, in Harper and Hooker [317], pp. 143–153. [484] Ludwig, G., 1954/1983, Foundations of quantum mechanics I (Springer-Verlag, New York), transl. by Carl A. Hein; first publ. in German 1954. [485] Ludwig, G., 1954/1985, Foundations of quantum mechanics II (Springer-Verlag, New York), transl. by Carl A. Hein; first publ. in German 1954. 93

[486] Ludwig, G., 1964, Versuch einer axiomatischen Grundlegung der Quantenmechanik und all- gemeinerer physikalischer Theorien, Z. Phys. 181(3), 233–260, see also [487, 488]. [487] Ludwig, G., 1967, Attempt of an axiomatic foundation of quantum mechanics and more gen- eral theories, II, Commun. Math. Phys. 4(5), 331–348, see also [486, 488]. [488] Ludwig, G., 1968, Attempt of an axiomatic foundation of quantum mechanics and more gen- eral theories. III, Commun. Math. Phys. 9(1), 1–12, see also [486, 487]. [489] Ludwig, G., 1978/1990, Die Grundstrukturen einer physikalischen Theorie (Springer-Verlag, Berlin), second, revised and enlarged edition, first publ. 1978. [490] Ludwig, G., 1984, Restriction and embedding, in Balzer et al. [36], pp. 17–31. [491] Ludwig, G., 1985, An Axiomatic Basis for Quantum Mechanics. Vol. 1: Derivation of Hilbert Space Structure (Springer-Verlag, Berlin), transl. from the German manuscript by Leo F. Boron. [492] Ludwig, G., 1987, An Axiomatic Basis for Quantum Mechanics. Vol. 2: Quantum Mechanics and Macrosystems (Springer-Verlag, Berlin), transl. from the German manuscript by Kurt Just. [493] MacColl, L. A. (ed.), 1957, Applied Probability (McGraw-Hill Book Company, New York). [494] Mackey, G. W., 1963, The Mathematical Foundations of Quantum Mechanics: A Lecture-Note Volume (W. A. Benjamin, New York). [495] Maes, C., 2001, Statistical mechanics of entropy production: Gibbsian hypothesis and local fluctuations, eprint arXiv:cond-mat/0106464. [496] Maes, C., 2004, Fluctuation relations and positivity of the entropy production in irreversible dynamical systems, Nonlinearity 17(4), 1305–1316. [497] Maes, C., F. Redig, and A. Van Moffaert, 2000, On the definition of entropy production, via examples, J. Math. Phys. 41(3), 1528–1554, eprint mp_arc:99-209. [498] Maes, C., F. Redig, and M. Verschuere, 2001, From global to local fluctuation theorems, eprint arXiv:cond-mat/0106639. [499] Maes, C., and M. H. van Wieren, 2006, Time-symmetric fluctuations in nonequilibrium sys- tems, eprint arXiv:cond-mat/0601299. [500] Mana, P. G. L., 2004, Non-orthogonality and violations of the second law of thermodynamics, eprint arXiv:quant-ph/0408193; see also [501]. [501] Mana, P. G. L., A. Månsson, and G. Björk, 2005, On distinguishability, orthogonality, and violations of the second law, eprint arXiv:quant-ph/0505229; rev. version of [500]. [502] Mandelbrot, B., 1956, An outline of a purely phenomenological theory of statistical thermo- dynamics — I: canonical ensembles, IEEE Trans. Inform. Theor. 2(3), 190–203. [503] Mandelbrot, B. B., 1962, The role of sufficiency and of estimation in thermodynamics, Ann. Math. Stat. 33(3), 1021–1038. [504] Mandelbrot, B. B., 1964, On the derivation of statistical thermodynamics from purely phe- nomenological principles, J. Math. Phys. 5(2), 164–171. [505] Mandelbrot, B. B., 1989, Temperature fluctuation: a well-defined and unavoidable notion, Phys. Today January, 71–73. [506] Månsson, A., P. G. L. Porta Mana, and G. Björk, 2006, Numerical Bayesian state assignment for a three-level quantum system. I. Absolute-frequency data; constant and Gaussian-like pri- ors, eprint arXiv:quant-ph/0612105; see also [507]. [507] Månsson, A., P. G. L. Porta Mana, and G. Björk, 2007, Numerical Bayesian state assignment for a quantum three-level system. II. Average-value data; constant, Gaussian-like, and Slater priors, eprint arXiv:quant-ph/0701087; see also [506]. [508] Marlow, A. R. (ed.), 1978, Mathematical Foundations of Quantum Theory (Academic Press, 94 BIBLIOGRAPHY

London). [509] Marsden, J. E., T. Ratiu, and R. Abraham, 1983/2002, Manifolds, Tensor Analysis, and Appli- cations (Springer-Verlag, New York), third edition, first publ. 1983. [510] Martínez Pería, F. D., P. G. Massey, and L. E. Silvestre, 2005, Weak matrix majorization, Lin. Alg. Appl. 403, 343–368. [511] Maugin, G. A., 1992, The thermomechanics of plasticity and fracture (Cambridge University Press, Cambridge). [512] Maugin, G. A., 1993, Material Inhomogeneities in Elasticity (Chapman & Hall, London). [513] Maugin, G. A., 1999, Nonlinear Waves in Elastic Crystals (Oxford University Press, Oxford). [514] Maxwell, J. C., 1871/1902, Theory of Heat (Longmans, Green, & Co., London), tenth edi- tion, ‘With corrections and additions (1891) by Lord Rayleigh’; http://www.archive. org/details/theoryofheat00maxwrich; first publ. 1871. [515] Maxwell, J. C., 1873/1954, A Treatise on Electricity and Magnetism. Vols. I and II (Dover Publications, New York), ‘Unabridged and unaltered republication of the third edition of 1891’, first publ. 1873. [516] McCullagh, P., 2002, What is a statistical model?, Ann. Stat. 30(5), 1225–1267, http:// www.stat.uchicago.edu/~pmcc/publications.html; see also the following discussion and rejoinder [67]. [517] McLennan, J. A., Jr., 1959, Statistical mechanics of the steady state, Phys. Rev. 115(6), 1405– 1409. [518] McLennan, J. A., Jr., 1963, Comment on the theory of transport coefficients, Progr. Theor. Phys. 30(3), 408–409, see also [537]. [519] McLennan, J. A., Jr., 1963, The formal statistical theory of transport processes, Adv. Chem. Phys. 5, 261–317. [520] McMullen, P., and G. C. Shephard, 1971, Convex Polytopes and the Upper Bound Conjecture (Cambridge University Press, Cambridge). [521] McNamara, S., and S. Luding, 1998, Energy flows in vibrated granular media, Phys. Rev. E 58(1), 813–822. [522] McNamara, S., and W. R. Young, 1992, Inelastic collapse and clumping in a one-dimensional granular medium, Phys. Fluids A 4(3), 496–504. [523] Medina, R., 2005, Radiation reaction of a classical quasi-rigid extended particle, eprint arXiv:physics/0508031. [524] van der Meer, D., and P. Reimann, 2004, Temperature anisotropy in a driven granular gas, eprint arXiv:cond-mat/0404174. [525] Melby, P., F. Vega Reyes, A. Prevost, R. Robertson, P. Kumar, D. A. Egolf, and J. S. Urbach, 2004, The dynamics of thin vibrated granular layers, eprint arXiv:cond-mat/0412640. [526] Mielnik, B., 1968, Geometry of quantum states, Commun. Math. Phys. 9(1), 55–80. [527] Mielnik, B., 1969, Theory of filters, Commun. Math. Phys. 15(1), 1–46. [528] Mielnik, B., 1974, Generalized quantum mechanics, Commun. Math. Phys. 37(3), 221–256, repr. in [352, pp. 115–152]. [529] Mielnik, B., 1976, Quantum logic: Is it necessarily orthocomplemented?, in Flato et al. [242], pp. 117–135. [530] Mielnik, B., 1981, Quantum theory without axioms, in Isham et al. [360], pp. 638–656. [531] Mill, J. S., 1843/1886, System of Logic, Ratiocinative and Inductive: Being a Connected View of the Principles of Evidence and the Methods of Scientific Investigation (Longmans, Green, and Co., London), eighth edition, first publ. 1843. [532] Milonni, P. W., J. R. Ackerhalt, and W. A. Smith, 1973, Interpretation of radiative corrections 95

in spontaneous emission, Phys. Rev. Lett. 31(15), 958–960. [533] Minlos, R., S. Rœlly, and H. Zessin, 2000, Gibbs states on space-time, Potential Analysis 13(4), 367–408. [534] Moon, S. J., M. D. Shattuck, and J. B. Swift, 2001, Velocity distributions and correla- tions in homogeneously heated granular media, Phys. Rev. E 64(3), 031303, eprint arXiv: cond-mat/0105322. [535] Moon, S. J., J. B. Swift, and H. L. Swinney, 2004, Steady-state velocity distributions of an oscillated granular gas, Phys. Rev. E 69(1), 011301, eprint arXiv:cond-mat/0309480. [536] Morgan, P., 2006, Bell inequalities for random fields, J. Phys. A 39(23), 7441–7455, see also eprint arXiv:cond-mat/0403692 and the previous versions thereof. [537] Mori, H., 1958, Statistical-mechanical theory of transport in fluids, Phys. Rev. 112(6), 1829– 1842, see also [293, 518]. [538] Morse, N., and R. Sacksteder, 1966, Statistical isomorphism, Ann. Math. Stat. 37(1), 203– 214. [539] Mosleh, A., and V. M. Bier, 1996, Uncertainty about probability: a reconciliation with the subjectivist viewpoint, IEEE Trans. Syst. Man Cybern. A 26(3), 303–310. [540] Mueth, D. M., H. M. Jaeger, and S. R. Nagel, 1998, Force distribution in a granular medium, Phys. Rev. E 57(3), 3164–3169, eprint arXiv:cond-mat/9902282. [541] Murdoch, A. I., 1985, A corpuscular approach to continuum mechanics: basic considerations, Arch. Rational Mech. Anal. 88(4), 291–321, repr. in [134, pp. 81–111]. [542] Murdoch, A. I., 1987, On the relationship between balance relations for generalised continua and molecular behaviour, Int. J. Engng. Sci. 25(7), 883–914. [543] Murdoch, A. I., 2000, On time-dependent material systems, Int. J. Engng. Sci. 38(4), 429–452. [544] Murdoch, A. I., 2003, On the microscopic interpretation of stress and couple stress, J. Elas- ticity 71(1–3), 105-131. [545] Murdoch, A. I., 2006, Some primitive concepts in continuum mechanics regarded in terms of objective space-time molecular averaging: The key rôle played by inertial observers, J. Elasticity 84(1), 69–97. [546] Murdoch, A. I., and D. Bedeaux, 1993, On the physical interpretation of fields in continuum mechanics, Int. J. Engng. Sci. 31(10), 1345–1373. [547] Murdoch, A. I., and D. Bedeaux, 1994, Continuum equations of balance via weighted aver- ages of microscopic quantities, Proc. Roy. Soc. London A 445(1923), 157–179. [548] Murdoch, A. I., and D. Bedeaux, 1996, A microscopic perspective on the physical founda- tions of continuum mechanics — Part I: Macroscopic states, reproducibility, and macroscopic statistics, at prescribed scales of length and time, Int. J. Engng. Sci. 34(10), 1111–1129, see also [549]. [549] Murdoch, A. I., and D. Bedeaux, 1997, A microscopic perspective on the physical foundations of continuum mechanics — II. A projection operator approach to the separation of reversible and irreversible contributions to macroscopic behaviour, Int. J. Engng. Sci. 35(10–11), 921– 949, see also [548]. [550] Murdoch, A. I., and D. Bedeaux, 2001, Characterization of microstates for confined systems and associated scale-dependent continuum fields via Fourier coefficients, J. Phys. A 34(33), 6495–6508. [551] Murdoch, A. I., and S. M. Hassanizadeh, 2002, Macroscale balance relations for bulk, in- terfacial and common line systems in multiphase flows through porous media on the basis of molecular considerations, Int. J. Multiphase Flow 28(7), 1091–1123. [552] Murdoch, A. I., and A. Morro, 1987, On the continuum theory of mixtures: Motivation from 96 BIBLIOGRAPHY

discrete considerations, Int. J. Engng. Sci. 25(1), 9–25. [553] Nemat-Nasser, S. (ed.), 1974, Mechanics Today. Vol. 1 (Pergamon Press, New York). [554] von Neumann, J., 1932/1996, Mathematische Grundlagen der Quantenmechanik (Springer- Verlag, Berlin), second edition, first publ. 1932; transl. as [555]. [555] von Neumann, J., 1955, Mathematical Foundations of Quantum Mechanics (Princeton Uni- versity Press, Princeton), transl. of [554] by Robert T. Beyer. [556] Newton, I., 1687/1726, Philosophiæ naturalis principia mathematica (Guil. & Joh. Innys, London), ‘Tertia aucta & emendata’ edition, http://www.archive.org/ details/principiareprint00newtuoft, http://www.archive.org/details/ principia00newtuoft, First publ. 1687; transl. in [557–559]. [557] Newton, I., 1687/1846, Newton’s Principia: The Mathematical Principles of Natural Philos- ophy, [. . . ] to which is added Newton’s system of the World (Daniel Adee, New York), first american, ‘carefully revised and corrected’ edition, transl. of [556] by Andrew Motte (1729). [558] Newton, I., 1687/1974, Sir Isaac Newton’s Mathematical Principles of Natural Philosophy and his system of the world. Vol. One: The Motion of Bodies (University of California Press, Berkeley), transl. of [556] by Andrew Motte, rev. and supplied with an historical and explana- tory appendix by Florian Cajori. [559] Newton, I., 1687/1974, Sir Isaac Newton’s Mathematical Principles of Natural Philosophy and his system of the world. Vol. Two: The System of the World (University of California Press, Berkeley), transl. of [556] by Andrew Motte, rev. and supplied with an historical and explanatory appendix by Florian Cajori. [560] Nietzsche, F., 1883/1999, Also sprach Zarathustra (de Gruyter, Berlin), Herausgegeben von Giorgio Colli und Mazzino Montinari; first publ. 1883–1885. [561] van Noije, T. P. C., and M. H. Ernst, 1998, Velocity distributions in homogeneous gran- ular fluids: the free and the heated case, Granular Matter 1(2), 57–64, eprint arXiv: cond-mat/9803042. [562] Noll, W., 1959, The foundations of classical mechanics in the light of recent advances in continuum mechanics, in Henkin et al. [339], pp. 266–281. [563] Noll, W., 1967, Space-time structures in classical mechanics, in Bunge [101], pp. 28–34. [564] Noll, W., 1972, A new mathematical theory of simple materials, Arch. Rational Mech. Anal. 48(1), 1–50. [565] Noll, W., 2004, Five Contributions to Natural Philosophy, http://www.math.cmu.edu/ ~wn0g/noll/. [566] Norton, J. D., 2005, Eaters of the lotus: Landauer’s principle and the return of Maxwell’s de- mon, Stud. Hist. Philos. Mod. Phys. 36(2), 375–411, http://www.pitt.edu/~jdnorton/ homepage/cv.html#Eaters. [567] Nowak, E. R., J. B. Knight, E. Ben-Naim, H. M. Jaeger, and S. R. Nagel, 1998, Density fluctuations in vibrated granular materials, Phys. Rev. E 57(2), 1971–1982. [568] Oberguggenberger, M., 2001, Generalized functions in nonlinear models — a survey, Nonlinear Analysis 47(8), 5029–5040, http://techmath.uibk.ac.at/mathematik/ publikationen/. [569] Obukhov, Y. N., and F. W. Hehl, 2003, Electromagnetic energy-momentum and forces in mat- ter, Phys. Lett. A 311(4–5), 277–284, eprint arXiv:physics/0303097. [570] Olafsen, J. S., and J. S. Urbach, 1998, Clustering, order, and collapse in a driven granular monolayer, Phys. Rev. Lett. 81(20), 4369–4372, eprint arXiv:cond-mat/9807148. [571] Olafsen, J. S., and J. S. Urbach, 1999, Velocity distributions and density fluctuations in a granular gas, Phys. Rev. E 60(3), R2468–R2471, eprint arXiv:cond-mat/9905173. 97

[572] Owen, D. R., 1984, A First Course in the Mathematical Foundations of Thermodynamics (Springer-Verlag, New York). [573] Paladin, G., and A. Vulpiani, 1988, The role of fluctuations in thermodynamics: a critical answer to Jaworski’s paper ‘Higher-order moments and the maximum entropy inference’, J. Phys. A 21(3), 843–847, see [370]. [574] Paltridge, G. W., 2005, Stumbling into the MEP racket: an historical perspective, in Kleidon and Lorenz [433], pp. 33–40. [575] Paolotti, D., A. Barrat, U. Marini Bettolo Marconi, and A. Puglisi, 2004, Thermal convec- tion in mono-disperse and bi-disperse granular gases: A simulation study, eprint arXiv: cond-mat/0404278. [576] Parry, W. E. (ed.), 1984, Essays in Theoretical Physics: In Honour of Dirk Ter Haar (Perga- mon Press, Oxford). [577] Partington, J. R., 1949, An Advanced Treatise on Physical Chemistry. Vol. I: Fundamental Principles. The Properties of Gases (Longmans, Green and Co, London). [578] Pearle, P. M., and A. Valentini, 2005, Generalizations of quantum mechanics, eprint arXiv: quant-ph/0506115. [579] Pechukas, P., 1994, Reduced dynamics need not be completely positive, Phys. Rev. Lett. 73(8), 1060–1062, see also [18, 580]. [580] Pechukas, P., 1995, Pechukas replies, Phys. Rev. Lett. 75(16), 3021, see also [18, 579]. [581] Pegg, D., S. Barnett, and J. Jeffers, 2002, Quantum theory of preparation and measurement, J. Mod. Opt. 49(5), 913–924, eprint arXiv:quant-ph/0207174. [582] Pegg, D. T., S. M. Barnett, and J. Jeffers, 2002, Quantum retrodiction in open systems, Phys. Rev. A 66, 022106, eprint arXiv:quant-ph/0208082. [583] Pelletier, F. J., 2000, A history of natural deduction and elementary logic textbooks, in Logical Consequence: Rival Approaches. Vol. 1, edited by J. Woods and B. Brown (Hermes Science Publications, Oxford), pp. 105–138, http://www.sfu.ca/~jeffpell/publications. html. [584] Pereira, S., and J. E. Sipe, 2002, Hamiltonian formulation of the nonlinear coupled mode equations, Phys. Rev. E 66(2), 026606. [585] Peres, A., 1984, Stability of quantum motion in chaotic and regular systems, Phys. Rev. A 30(4), 1610–1615. [586] Peres, A., 1995, Quantum Theory: Concepts and Methods (Kluwer Academic Publishers, Dordrecht). [587] Pešic,´ P. D., 1991, The principle of identicality and the foundations of quantum theory. I. The Gibbs paradox, Am. J. Phys. 59(11), 971–974, see also [588]. [588] Pešic,´ P. D., 1991, The principle of identicality and the foundations of quantum theory. II. The role of identicality in the formation of quantum theory, Am. J. Phys. 59(11), 975–979, see also [587]. [589] Pfirsch, D., 1968, De Theoremate conservandi quodam differentiali Theoriae ab Hamilton atque Jacobi doctae, Arch. Rational Mech. Anal. 29(5), 323–324. [590] Piechocinska, B., 2000, Information erasure, Phys. Rev. A 61(6), 062314, repr. in [459], pp. 169–177. [591] Pitteri, M., 1986, Continuum equations of balance in classical statistical mechanics, Arch. Rational Mech. Anal. 94(4), 291–305, see also corrigendum [592] and [593, 594]. [592] Pitteri, M., 1988, Correction of an error in my paper, Arch. Rational Mech. Anal. 100(3), 315–316, see [591]. [593] Pitteri, M., 1988, On the equations of balance for mixtures constructed by means of classical 98 BIBLIOGRAPHY

statistical mechanics, Meccanica 23(1), 3–10, see also [591, 592, 594]. [594] Pitteri, M., 1990, On a statistical-kinetic model for generalized continua, Arch. Rational Mech. Anal. 111(2), 99–120, see also [591–593]. [595] Poincaré, H., 1902/1952, Science and Hypothesis (Dover Publications, New York), transl. by W. J. G. of [596]; first publ. 1902. [596] Poincaré, H., 1902/1992, La science et l’hypothèse (Éditions de la Bohème, Rueil- Malmaison, France), http://gallica.bnf.fr/document?O=N026745; first publ. 1902; transl. as [595]. [597] Poincaré, H., 1913/1946, The Foundations of Science: Science and Hypothesis, The Value of Science, Science and Method (The Science Press, Lancaster, USA), transl. from the French by G. B. Halsted, first publ. 1913. [598] Pólya, G., 1945/1971, How To Solve It: A New Aspect of Mathematical Method (Princeton University Press, Princeton), second edition, first publ. 1945. [599] Pólya, G., 1954, Mathematics and Plausible Reasoning: Vol. I: Induction and Analogy in Mathematics (Princeton University Press, Princeton). [600] Pólya, G., 1954/1968, Mathematics and Plausible Reasoning: Vol. II: Patterns of Plausible Inference (Princeton University Press, Princeton), second edition, first publ. 1954. [601] Pool, J. C. T., 1968, Baer ∗-semigroups and the logic of quantum mechanics, Commun. Math. Phys. 9(2), 118–141. [602] Pool, J. C. T., 1968, Semimodularity and the logic of quantum mechanics, Commun. Math. Phys. 9(3), 212–228. [603] Poole, D. G., 1995, The stochastic group, Am. Math. Monthly 102(9), 798–801. [604] Porta Mana, P. G. L., 2006, Four-dimensional sections of the set of statistical operators for a three-level quantum system, in various coordinate systems, realised as animated pictures by means of Maple; available upon request. [605] Porta Mana, P. G. L., 2007, in preparation. [606] Porta Mana, P. G. L., 2007, Studies in plausibility theory, with applications to physics, Ph.D. thesis, Kungliga Tekniska Högskolan, Stockholm, http://web.it.kth.se/~mana/. [607] Primas, H., 1985, Kann Chemie auf Physik reduziert werden? Erster Teil: Das Molekulare Programm, Chemie in unserer Zeit 19(4), 109–119, see also [608]. [608] Primas, H., 1985, Kann Chemie auf Physik reduziert werden? Zweiter Teil: Die Chemie der Makrowelt, Chemie in unserer Zeit 19(5), 160–166, see also [607]. [609] Puglisi, A., V. Loreto, U. Marini Bettolo Marconi, and A. Vulpiani, 1999, Kinetic approach to granular gases, Phys. Rev. E 59(5), 5582–5595, eprint arXiv:cond-mat/9810059. [610] Puglisi, A., P. Visco, A. Barrat, E. Trizac, and F. van Wijland, 2005, Fluctuations of inter- nal energy flow in a vibrated granular gas, Phys. Rev. Lett. 95(11), 110202, eprint arXiv: cond-mat/0509105. [611] Quine, W. V. O., 1941/1966, Elementary Logic (Harvard University Press, Cambridge, USA), revised edition, first publ. 1941. [612] Quine, W. V. O., 1950/1982, Methods of Logic (Harvard University Press, Cambridge, USA), fourth edition, first publ. 1950. [613] Ramshaw, J. D., 1986, Remarks on entropy and irreversibility in non-Hamiltonian systems, Phys. Lett. A 116(3), 110–114. [614] Ramshaw, J. D., 2002, Remarks on non-Hamiltonian statistical mechanics, Europhys. Lett. 59(3), 319–. [615] Randall, C. H., and D. J. Foulis, 1970, An approach to empirical logic, Am. Math. Monthly 77(4), 363–374. 99

[616] Randall, C. H., and D. J. Foulis, 1973, Operational statistics. II. Manuals of operations and their logics, J. Math. Phys. 14(10), 1472–1480. [617] Randall, C. H., and D. J. Foulis, 1979, The operational approach to quantum mechanics, in Hooker [352], pp. 167–201. [618] Richardson, J. M., 1960, The hydrodynamical equations of a one component system derived from nonequilibrium statistical mechanics, J. Math. Anal. Appl. 1(1), 12–60. [619] Robertson, B., 1966, Equations of motion in nonequilibrium statistical mechanics, Phys. Rev. 144(1), 144. [620] Robertson, B., 1967, Equations of motion in nonequilibrium statistical mechanics. II. Energy transport, Phys. Rev. 160(1), 175–183, see also erratum [621]. [621] Robertson, B., 1968, [Erratum:] Equations of motion in nonequilibrium statistical mechanics. II. Energy transport, Phys. Rev. 166(1), 206, see [620]. [622] Robertson, B., 1979, Application of maximum entropy to nonequilibrium statistical mechan- ics, in Levine and Tribus [462], pp. 289–320. [623] Robertson, B., and W. C. Mitchell, 1971, Equations of motion in nonequilibrium statistical mechanics. III. Open systems, J. Math. Phys. 12(3), 563–568. [624] Robin, W. A., 1990, Nonequilibrium thermodynamics, J. Phys. A 23(11), 2065–2085. [625] Robinson, F. N. H., 1973, Macroscopic Electromagnetism (Pergamon Press, Oxford). [626] Robinson, F. N. H., 1975, Electromagnetic stress and momentum in matter, Phys. Rep. 16(6), 313–354. [627] Rockafellar, R. T., 1970, Convex Analysis (Princeton University Press, Princeton). [628] Rodrigues Jr., W. A., J. Vaz, E. Recami, and G. Salesi, 1998, About Zitterbewegung and electron structure, eprint arXiv:quant-ph/9803037. [629] Rohrlich, F., 2000, The self-force and radiation reaction, Am. J. Phys. 68(12), 1109–1112. [630] Rohrlich, F., 2002, Dynamics of a classical quasi-point charge, Phys. Lett. A 303(5–6), 307– 310. [631] Rosas, A., D. ben Avraham, and K. Lindenberg, 2005, Velocity distribution in a viscous gran- ular gas, Phys. Rev. E 71(3), 032301, eprint arXiv:cond-mat/0404405. [632] Rothstein, J., 1951, Information, measurement, and quantum mechanics, Science 114(2955), 171–175. [633] Rouyer, F., and N. Menon, 2000, Velocity fluctuations in a homogeneous 2D granular gas in steady state, Phys. Rev. Lett. 85(17), 3676–3679. [634] Roy, N. M., 1987, Extreme points of convex sets in infinite dimensional spaces, Am. Math. Monthly 94(5), 409–422. [635] Royer, A., 2005, Families of positivity preserving but not completely positive superoperators, Phys. Lett. A 336(4–5), 295–310. [636] Samohýl, I., 1987, Thermodynamics of Irreversible Processes in Fluid Mixtures (Approached by Rational Thermodynamics) (BSB B. G. Teubner Verlagsgesellschaft, Leipzig). [637] Santos, E., 1997, The program of local hidden variables, in Ferrero and van der Merwe [229], pp. 369–379. [638] Saunders, R., R. K. Bullough, and F. Ahmad, 1975, Theory of radiation reaction and atom self-energies: an operator reaction field, J. Phys. A 8(5), 759–772, see also [98]. [639] Scalapino, D. J., 1961, Irreversible statistical mechanics and the principle of maximum en- tropy, Ph.D. thesis, Stanford University. [640] Schilders, W. H. A., E. J. W. ter Maten, and S. H. M. J. Houben (eds.), 2004, Scientific Computing in Electrical Engineering: Proceedings of the SCEE-2002 Conference held in Eindhoven (Springer, Berlin). 100 BIBLIOGRAPHY

[641] Schlögl, F., 1980, Stochastic measures in nonequilibrium thermodynamics, Phys. Rep. 62(4), 267–380. [642] Schrödinger, E., 1921, Isotopie und Gibbssches Paradoxon, Z. Phys. 5(3), 163-166. [643] Schroeck Jr., F. E., 1981, A model of a quantum mechanical treatment of measurement with a physical interpretation, J. Math. Phys. 22(11), 2562–2572. [644] Schroeck Jr., F. E., 1982, On the stochastic measurement of incompatible spin components, Found. Phys. 12(5), 479–497. [645] Schroeck Jr., F. E., 1982, The transitions among classical mechanics, quantum mechanics, and stochastic quantum mechanics, Found. Phys. 12(9), 825–841. [646] Schroeck Jr., F. E., 1985, The dequantization programme for stochastic quantum mechanics, J. Math. Phys. 26(2), 306–310. [647] Schroeck Jr., F. E., 1989, Coexistence of observables, Int. J. Theor. Phys. 28(3), 247–262. [648] Schröter, M., S. Nägle, C. Radin, and H. L. Swinney, 2006, Phase transition in a static gran- ular system, eprint arXiv:cond-mat/0606459, eprint mp_arc:06-187. [649] Serrin, J. (ed.), 1986, New Perspectives in Thermodynamics (Springer-Verlag, Berlin). [650] Serrin, J., 1986, An outline of thermodynamical structure, in Serrin [649], pp. 3–32. [651] Serrin, J., 1991, The nature of thermodynamics, Atti Sem. Mat. Fis. Univ. Modena 39(2), 455–472. [652] Serrin, J., 1995, On the elementary thermodynamics of quasi-static systems and other remarks, in Thermoelastic Problems and the Thermodynamics of Continua: Presented at the 1995 Joint ASME Applied Mechanics and Materials Summer Meeting, Los Angeles, California, June 28– 30, 1995, edited by L. M. Brock (American Society of Mechanical Engineers, New York), pp. 53–62. [653] Serrin, J., 1996, The equations of continuum mechanics and the laws of thermodynamics, Meccanica 31(5), 547–563. [654] Shaji, A., and E. C. G. Sudarshan, 2005, Who’s afraid of not completely positive maps?, Phys. Lett. A 341(1–4), 48–54. [655] Shalizi, C. R., and J. P. Crutchfield, 2001, Computational mechanics: Pattern and prediction, structure and simplicity, J. Stat. Phys. 104(3–4), 817–879, eprint arXiv: cond-mat/9907176. [656] Shalizi, C. R., and C. Moore, 2003, What is a macrostate? Subjective observations and ob- jective dynamics, eprint arXiv:cond-mat/0303625. [657] Shannon, C. E., 1948, A mathematical theory of communication, Bell Syst. Tech. J. 27, 379–423, 623–656, http://cm.bell-labs.com/cm/ms/what/shannonday/paper. html, http://www.cparity.com/it/demo/external/shannon.pdf. [658] Shannon, C. E., 1949, Communication in the presence of noise, Proc. IRE 37(1), 10–21, repr. in [659]. [659] Shannon, C. E., 1998, Communication in the presence of noise, Proc. IEEE 86(2), 447–457, repr. of [658], with an introduction [755]. [660] Shen, Y. R., 1967, Quantum statistics of nonlinear optics, Phys. Rev. 155(3), 921–931. [661] Shizume, K., 1995, Heat generation required by information erasure, Phys. Rev. E 52(4), 3495–3499, repr. in [459], pp. 164–168. [662] Šilhavý, M., 1997, The Mechanics and Thermodynamics of Continuous Media (Springer- Verlag, Berlin). [663] Skilling, J. (ed.), 1989, Maximum Entropy and Bayesian Methods: Cambridge, England, 1988 (Kluwer Academic Publishers, Dordrecht). [664] Skordos, P. A., 1993, Compressible dynamics, time reversibility, Maxwell’s demon, and the 101

second law, Phys. Rev. E 48(2), 777–784, eprint arXiv:chao-dyn/9305006. [665] Smith, C. R., and W. T. Grandy, Jr. (eds.), 1985, Maximum-Entropy and Bayesian Methods in Inverse Problems (D. Reidel Publishing Company, Dordrecht). [666] Sommeria, J., 2005, Entropy production in turbulent mixing, in Kleidon and Lorenz [433], pp. 79–91. [667] Spekkens, R. W., 2005, Contextuality for preparations, transformations, and unsharp mea- surements, Phys. Rev. A 71(5), 052108, eprint arXiv:quant-ph/0406166. [668] Spekkens, R. W., 2007, Evidence for the epistemic view of quantum states: A toy theory, Phys. Rev. A 75(3), 032110, eprint arXiv:quant-ph/0401052. [669] Stepišnik, J., S. Lasic,ˇ I. Serša, A. Mohoric,ˇ and G. Planinšic,ˇ 2005, Dynamics of air- fluidized granular system measured by the modulated gradient spin-echo, eprint arXiv: physics/0505214. [670] Stigler, S. M., 1986, Laplace’s 1774 memoir on inverse probability, Stat. Sci. 1(3), 359–363, introduction to the transl. [454]. [671] Stolz, P., 1969, Attempt of an axiomatic foundation of quantum mechanics and more general theories V, Commun. Math. Phys. 11(4), 303–313. [672] Stolz, P., 1971, Attempt of an axiomatic foundation of quantum mechanics and more general theories VI, Commun. Math. Phys. 23(2), 117–126. [673] Størmer, E., 1969, Symmetric states of infinite tensor products of c∗-algebras, J. Funct. Anal. 3(1), 48–68. [674] Strauß, M., 1937, Mathematics as logical syntax — a method to formalize the language of a physical theory, Erkenntnis 7, 147–153. [675] Strawson, P. F., 1952/1971, Introduction to Logical Theory (Methuen & Co., London), first publ. 1952. [676] Strawson, P. F. (ed.), 1967, Philosophical Logic (Oxford University Press, London). [677] Stroud, C. R., Jr., and E. T. Jaynes, 1970, [Erratum:] Long-term solutions in semiclassical radiation theory, Phys. Rev. A 2(4), 1613–1614, see [678]. [678] Stroud, C. R., Jr., and E. T. Jaynes, 1970, Long-term solutions in semiclassical radiation theory, Phys. Rev. A 1(1), 106–121, see also erratum [677]. [679] Styer, D., S. Sobottka, W. Holladay, T. A. Brun, R. B. Griffiths, P. Harris, C. A. Fuchs, and A. Peres, 2000, Quantum theory — interpretation, formulation, inspiration and Fuchs and Peres reply, Phys. Today 53(9), 11–14, see [249]. [680] Sudarshan, E. C. G., 1963, Equivalence of semiclassical and quantum mechanical descrip- tions of statistical light beams, Phys. Rev. Lett. 10(7), 277–279. [681] Sundholm, G., 2006, Semantic values for natural deduction derivations, Synthese 148(3), 623–638, logic. [682] Suppes, P., 1957/1963, Introduction to Logic (D. Van Nostrand Company, Princeton, USA), first publ. 1957. [683] Szabó, L. E., 1996, Is there anything non-classical?, eprint arXiv:quant-ph/9606006 . [684] Szabó, L. E., 2001, Critical reflections on quantum probability theory, in John von Neu- mann and the Foundations of Quantum Physics, edited by M. Rédei and M. Stöltzner (Kluwer Academic Publishers), pp. 201–219, http://philosophy.elte.hu/leszabo/neumann/ neumann.htm. [685] Szilard, L., 1925, Über die Ausdehnung der phänomenologischen Thermodynamik auf die Schwankungserscheinungen, Z. Phys. 32, 753–788, transl. in [686]. [686] Szilard, L., 1925/1972, On the extension of phenomenological thermodynamics to fluctuation phenomena, in Szilard [687], pp. 70–102. 102 BIBLIOGRAPHY

[687] Szilard, L., 1972, The Collected Works of Leo Szilad. Vol. I: Scientific Papers (The MIT Press, London), ed. by Bernard T. Feld and Gertrud Weiss Szilard. [688] Tarski, A., 1936, O poje. ciu wynikania logicznego, Przegla.d Filozoficzny 39, 58–68, transl. as [691] with an introduction. [689] Tarski, A., 1936/1994, Introduction to Logic and to the Methodology of the Deductive Sciences (Oxford University Press, Oxford), fourth edition, ed. by Jan Tarski, transl. by Olaf Helmer; first publ. in Russian 1936. [690] Tarski, A., 1944, The semantic conception of truth: And the foundations of semantics, Philos- ophy and Phenomenological Research 4(3), 341–376. [691] Tarski, A., 2002, On the concept of following logically, Hist. Philos. Logic 23, 155–196, transl. of [688] by Magda Stroinska´ and David Hitchcock, with an introduction. [692] Teramoto, H., and S. ichi Sasa, 2005, Microscopic description of the equality between viola- tion of fluctuation-dissipation relation and energy dissipation, Phys. Rev. E 72(6), 060102(R), eprint arXiv:cond-mat/0509465. [693] Terno, D. R., 2001, Comment on “Counterfactual entanglement and nonlocal correlations in separable states”, Phys. Rev. A 63(1), 016101, eprint arXiv:quant-ph/9912002, see [130] and also [131]. [694] Thomson, W., (Lord Kelvin), 1874, Kinetic theory of the dissipation of energy, Nature 9, 441–444, see [695]. [695] Thomson, W., (Lord Kelvin), 1874/1875, The kinetic theory of the dissipation of energy, Proc. Roy. Soc. Edinb. 8, 325–334, read 1874; also publ. in [694]; repr. in [697, item 86, pp. 11-20] and [459, pp. 41–43]. [696] Thomson, W., (Lord Kelvin), 1879, The sorting demon of Maxwell, Roy. Institution Proc. 9, 113, abstract of Royal Institution Lecture; repr. in [697, item 87, pp. 21-23]. [697] Thomson, W., (Lord Kelvin), 1911, Mathematical and Physical Papers. Volume V: Thermody- namics, Cosmical and Geological Physics, Molecular and Crystalline Theory, Electrodynam- ics. Collected from Different Scientific Periodicals from May, 1841, to the Present Time. With Supplementary Articles Written for the Present Volume, and Hitherto Unpublished (Cam- bridge University Press, Cambridge), ‘Arranged and revised with brief annotations by Sir Joseph Larmor’; http://www.archive.org/details/mathematicalandp029217mbp. [698] Tisza, L., 1966, The fluctuation of paths, in Tisza [699], pp. 296–306. [699] Tisza, L., 1966/1977, Generalized Thermodynamics (The MIT Press, Cambridge, USA), first publ. 1966. [700] Tisza, L., and I. Manning, 1957, Fluctuations and irreversible thermodynamics, Phys. Rev. 105(6), 1695–1705, repr. in [699, pp. 307–317]. [701] Tisza, L., and P. M. Quay, 1963, The statistical thermodynamics of equilibrium, Ann. of Phys. 25(1), 48–90, repr. in [699, pp. 245–287]. [702] Tolkien, J. R. R., 1954/1999, The Lord of the Rings (Harper Collins Publishers, London), three volumes; first publ. 1954–55. [703] Toupin, R. A., 1956, The elastic dielectric, J. Rational Mech. Anal. 5(6), 849–915. [704] Toupin, R. A., 1963, A dynamical theory of elastic dielectrics, Int. J. Engng. Sci. 1(1), 101– 126. [705] Toupin, R. A., and R. S. Rivlin, 1961, Electro-magneto-optical effects, Arch. Rational Mech. Anal. 7(1), 434–448. [706] Tribus, M., 1961, Thermostatics and Thermodynamics: An Introduction to Energy, Informa- tion and States of Matter, with Engineering Applications (D. Van Nostrand Company, Prince- ton). 103

[707] Tribus, M., 1969, Rational Descriptions, Decisions and Designs (Pergamon Press, New York). [708] Truesdell, C. A., III, 1963, [Book review:] Max Jammer. Concepts of Mass in Classical and Modern Physics, Isis 54(2), 290–291. [709] Truesdell, C. A., III, 1966, Method and taste in natural philosophy, in Truesdell [710], pp. 83–108. [710] Truesdell, C. A., III, 1966, Six Lectures on Modern Natural Philosophy (Springer-Verlag, Berlin). [711] Truesdell, C. A., III, 1968, Essays in the History of Mechanics (Springer-Verlag, Berlin). [712] Truesdell, C. A., III, 1969/1984, Rational Thermodynamics (Springer-Verlag, New York), second edition, first publ. 1969. [713] Truesdell, C. A., III, 1975, Early kinetic theories of gases, Arch. Hist. Exact Sci. 15(1), 1–66. [714] Truesdell, C. A., III, 1977/1991, A First Course in Rational Continuum Mechanics. Vol. 1: General Concepts (Academic Press, New York), second edition, first publ. 1977. [715] Truesdell, C. A., III, 1984, An Idiot’s Fugitive Essays on Science: Methods, Criticism, Train- ing, Circumstances (Springer-Verlag, New York), second printing, revised and augmented. [716] Truesdell, C. A., III, and R. G. Muncaster, 1980, Fundamentals of Maxwell’s Kinetic Theory of a Simple Monatomic Gas: Treated as a Branch of Rational Mechanics (Academic Press, New York). [717] Truesdell, C. A., III, and W. Noll, 1965/1992, The Non-Linear Field Theories of Mechanics (Springer-Verlag, Berlin), second edition, first publ. 1965 in [244]; third ed. [718]. [718] Truesdell, C. A., III, and W. Noll, 1965/2004, The Non-Linear Field Theories of Mechanics (Springer-Verlag, Berlin), third edition, ed. by Stuart S. Antman; first publ. 1965 in [244]; second ed. [717]. [719] Truesdell, C. A., III, and R. A. Toupin, 1960, The Classical Field Theories, in Flügge [243], pp. 226–858, with an appendix on invariants by Jerald LaVerne Ericksen. [720] Tuckerman, M. E., C. J. Mundy, and G. J. Martyna, 1999, On the classical statistical mechan- ics of non-Hamiltonian systems, Europhys. Lett. 45(2), 149–155. [721] Tversky, A., and D. Kahneman, 1974, Judgment under uncertainty: Heuristics and biases, Science 185(4157), 1124–1131, repr. in [724]. [722] Tversky, A., and D. Kahneman, 1981, The framing of decisions and the psychology of choice, Science 211(4481), 453–458. [723] Tversky, A., and D. Kahneman, 1983, Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment, Psychol. Rev. 90(4), 293–315, repr. in [724]. [724] Tversky, A., and D. Kahneman, 1993, Probabilistic reasoning, in Goldman [281], pp. 43–68. [725] Urbach, J. S., and J. S. Olafsen, 1999, Experimental observations of non-equilibrium distri- butions and transitions in a 2D granular gas, eprint arXiv:cond-mat/9907373. [726] Valentine, F. A., 1964, Convex Sets (McGraw-Hill Book Company, New York). [727] Valentini, A., 2001, Hidden variables, statistical mechanics and the early universe, in Chance in Physics: Foundations and Perspectives, edited by J. Bricmont, D. Dürr, M. C. Galavotti, G. Ghirardi, F. Petruccione, and N. Zanghí (Springer-Verlag, Berlin), pp. 165–181, eprint arXiv:quant-ph/0104067. [728] Valentini, A., 2002, Signal-locality and subquantum information in deterministic hidden- variables theories, in Non-locality and Modality, edited by T. Placek and J. Butterfield (Kluwer Academic Publishers, Dordrecht), pp. 81–103, eprint arXiv:quant-ph/0104067. [729] Valentini, A., 2002, Signal-locality in hidden-variables theories, Phys. Lett. A 297(5-6), 273– 278, eprint arXiv:quant-ph/0106098. [730] Valentini, A., 2003, Universal signature of non-quantum systems, eprint arXiv: 104 BIBLIOGRAPHY

quant-ph/0309107. [731] Valentini, A., 2004, Black holes, information loss, and hidden variables, eprint arXiv: hep-th/0407032. [732] Valentini, A., and H. Westman, 2004, Dynamical origin of quantum probabilities, eprint arXiv:quant-ph/0403034. [733] Vlad, M. O., and M. C. Mackey, 1995, Maximum information entropy approach to non- markovian random jump processes with long memory: application to surprisal analysis in molecular dynamics, Physica A 215(3), 339–360. [734] Štelmachovic,ˇ P., and V. Bužek, 2001, Dynamics of open quantum systems initially entangled with environment: Beyond the Kraus representation, Phys. Rev. A 64, 062106, eprint arXiv: quant-ph/0108136, see also erratum [735]. [735] Štelmachovic,ˇ P., and V. Bužek, 2003, Erratum: Dynamics of open quantum systems initially entangled with environment: Beyond the Kraus representation, Phys. Rev. A 67, 029902, see [734]. [736] Wang, C.-C., 1979, Mathematical Principles of Mechanics and Electromagnetism. Part A: Analytical and Continuum Mechanics (Plenum Press, New York). [737] Wang, C.-C., 1979, Mathematical Principles of Mechanics and Electromagnetism. Part B: Electromagnetism and Gravitation (Plenum Press, New York). [738] Warnick, K. F., and R. H. S. D. V. Arnold, 1997, Teaching electromagnetic field theory us- ing differential forms, IEEE Trans. Educ. 40(1), 53–68, http://www.ece.byu.edu/forms/ forms_pub.html. [739] Warren, J., S. Schaefer, A. N. Hirani, and M. Desbrun, 2007, Barycentric coordinates for convex sets, Adv. Comput. Math. (‘online first’)(**), **. [740] Waterston, J. J., 1845/1892, On the physics of media that are composed of free and perfectly elastic molecules in a state of motion, Phil. Trans. R. Soc. Lond. A 183, 1–79, with an intro- duction by Lord Rayleigh; first written 1845, read 1846; see also Truesdell’s discussion [713]. [741] Wigner, E. P., 1932, On the quantum correction for thermodynamic equilibrium, Phys. Rev. 40, 749–759. [742] Wilce, A., 2003, Topological orthoalgebras, eprint arXiv:math.RA/0301072. [743] Wilce, A., 2004, Compact orthoalgebras, eprint arXiv:quant-ph/0405180. [744] Wilce, A., 2004, Topological test spaces, eprint arXiv:quant-ph/0405178. [745] Wildman, R. D., and D. J. Parker, 2002, Coexistence of two granular temperatures in binary vibrofluidized beds, Phys. Rev. Lett. 88(6), 064301. [746] Willems, J. C., 1972, Dissipative dynamical systems: Part I: General theory, Arch. Rational Mech. Anal. 45(5), 321–351, see also [747]. [747] Willems, J. C., 1972, Dissipative dynamical systems: Part II: Linear systems with quadratic supply rates, Arch. Rational Mech. Anal. 45(5), 352–393, see also [746]. [748] Wittgenstein, L., 1929/1998, Philosophical Remarks (Basil Blackwell, Oxford), ed. by Rush Rhees, transl. by Raymond Hargreaves and Roger White; written ca. 1929–30, first publ. in German 1964. [749] Wittgenstein, L., 1933/1993, Philosophical Grammar (Basil Blackwell, Oxford), ed. by Rush Rhees, transl. by Anthony Kenny; written ca. 1933–34, first publ. 1974. [750] Wittgenstein, L., 1945/2001, Philosophical Investigations [Philosophische Untersuchungen]: The German Text, with a Revised English Translation (Blackwell Publishing, Oxford), third edition, transl. by G. E. M. Anscombe, first publ. 1953 (written 1945–49). [751] Wright, R., 1977, The structure of projection-valued states: A generalization of Wigner’s theorem, Int. J. Theor. Phys. 16(8), 567–573. 105

[752] Wright, R., 1978, Spin manuals: Empirical logic talks quantum mechanics, in Marlow [508], pp. 177–254. [753] Wright, R., 1978, The state of the pentagon: A nonclassical example, in Marlow [508], pp. 255–274. [754] Wright, R., 2003, Statistical structures underlying quantum mechanics and social science, eprint arXiv:quant-ph/0307234. [755] Wyner, A. D., and S. Shamai, 1998, Introduction to “Communication in the presence of noise”, Proc. IEEE 86(2), 442–446, see [659]. [756] Zadeh, L. A., and C. A. Desoer, 1963, Linear System Theory: The State Space Approach (McGraw-Hill Book Company, New York). [757] Zellner, A., 1982, Is Jeffreys a “necessarist”?, American Statistician 36(1), 28–30, see also [413, 758]. [758] Zellner, A., 1982, Reply to a comment on “Is Jeffreys a ‘necessarist’?”, American Statistician 36(4), 392–393, see [413, 757]. [759] Zhang, K., and K. Zhang, 1992, Mechanical models of Maxwell’s demon with noninvariant phase volume, Phys. Rev. A 46(8), 4598–4605. [760] Zubarev, D. N., and V. P. Kalashnikov, 1970, Derivation of the nonequilibrium statistical operator from the extremum of the information entropy, Physica 46(4), 550–554. [761] Zurek, W. H. (ed.), 1990, Complexity, Entropy, and the Physics of Information (Addison- Wesley). [762] Zuriguel, I., A. Garcimartín, D. Maza, L. A. Pugnaloni, and J. M. Pastor, 2004, Jamming during the discharge of granular matter from a silo, eprint arXiv:cond-mat/0407761. [763] Zwanzig, R., 1961, Memory effects in irreversible thermodynamics, Phys. Rev. 124(4), 983– 992, see also [765]. [764] Zwanzig, R., 1965, Time-correlation functions and transport coefficients in statistical me- chanics, Annu. Rev. Phys. Chem. 16, 67–102. [765] Zwanzig, R., K. S. J. Nordholm, and W. C. Mitchell, 1972, Memory effects in irreversible thermodynamics: Corrected derivation of transport equations, Phys. Rev. A 5(6), 2680–2682, see [763]. [766] Zyczkowski,˙ K., and I. Bengtsson, 2004, On duality between quantum maps and quantum states, eprint arXiv:quant-ph/0401119.

Paper A

arXiv:quant-ph/0302049 v5 19 Jul 2004 ∗ as h uhr fRf 1]tctysitbtendi between shift tacitly [10] (di Ref. contexts of authors the cause entropy Shannon the of property show to and calculation explicit [10] Ref. in presented experiments the analyse of violation a properties. entropy’s represent Shannon really the [10] thought- Ref. interesting not in and given do concrete experiments a so the of how and or mainly nature, whether are explain philosophical) criticisms partly interesting (and These theoretical Timp- 13]. and [11] [12, Hall’s mainly son’s literature, the in appeared ready any) (if extent what to occur. or violations claimed sense the which in rea- therefore ascertain be to would sonable It properties, particular problems. its quantum-mechanical of in because entropy, Shannon the use ones. hold- classical while in experiments Shan- quantum ing in the violated of are properties entropy particular non two that conclude and ‘classical’,one, or thought- non-quantum-mechanical, a quantum-mechanical of statis- simple and the experiments two to of entropy outcomes Shannon the tical of application the analyse properties In uncertainty. intuitive to of correspond measure that a 9] symbols. from 8, prop- expected 7, or mathematical 6, 5, some messages 4, [1, possesses of erties entropy prob- set his a regard, a in this quantify present for to “choice” distribution it or introduced ability “uncertainty” Shannon of 3]. amount [2, the well branches as other science in applications of capacity, of important channel many definition found and operational has rate and and transmission precise like a concepts useful allows it where ory, rˆ central a played lcrncades [email protected] address: Electronic h ups ftepeetppri oepamtc ore- to pragmatic: more is paper present the of purpose The al- have [10] Ref. of conclusions the of criticisms Deep to wish we when limitations serious poses conclusion This esalseta h emn voain”apa nybe- only appear “violations” seeming the that see shall We Zeilinger and Brukner however, [10], paper recent a In has [1] entropy Shannon the 1948, in introduction its Since ff rn xeietlarneet) hsmisap- thus arrangements), experimental erent ASnmes 36.a 03.67.-a 03.65.Ta, numbers: PACS u teto utb adt oia n xeietlcnet.Ti atrmr ssont pl regardless apply to shown is settings, remark quantum last experiments. in This the violated of contexts. never nature experimental are classical and properties or logical its quantum to and the paid consistent of be fully must is entropy attention but Shannon the that shown is h ossec fteSannetoy hnapidt ucmso unu xeiet,i nlsd It analysed. is experiments, quantum of outcomes to applied when entropy, Shannon the of consistency The l nalbace fcmuiainthe- communication of branches all in ole that .INTRODUCTION I. hypeeti atn ilto fany of violation no fact in present they ossec fteSannetoyi unu experiments quantum in entropy Shannon the of Consistency ugiaTkik H¨ Tekniska Kungliga . ntttoe f¨ Institutionen gkln sfodgtn2,S-6 0Ksa Sweden Kista, 40 SE-164 22, Isafjordsgatan ogskolan, rMkolkrnkohInformationsteknik, och Mikroelektronik or ir .Lc Mana Luca G. Piero Dtd 6Jl 2004) July 16 (Dated: ff erent by hsieCtadhv oevle,t di a to values, some have and Cat Cheshire ‘ symbols the where context, side. q 3,a ti rte,mkssml osne eas the because di sense, have no they simply used: inconsistently makes are therein written, fact symbols mathematical is In various it addition. as of property (3), commutative Eq. the with do to little cm “2 fact in equations. is those property commutative in satisfied the clearly hence (2), Eq. for gously n hnwt h et( rest the with then and hs paiin lc lasosre rtol that only first observes always Alice apparitions these eeb cnet,adt aetels on lae,ltus let clearer, point last the example: make following the to consider and meant ‘context’, is by what explain here to order In formulae. Shannon’s plying et o is fal eseta h otn fE.()is (1) Eq. of content the that see we cm all, “2 of just really First No. for rect? violated is addition of property Cats”. commutative Cheshire “the or: ( The tail its one. with Mome appears very always a Cat and Cheshire one Frumious Frumious a Cats: Cheshire okn tEs 1 n 2,Aieconcludes: Alice (2), and (1) Eqs. at Looking that observe first can Alice rest it, its with appear always ( instead, Cat, Cheshire Mome The 2cm,andthenthat 0c;hence cm; 20 1 R esifra xml speetdi h ulse eso fti paper, this of version published the A in Rev. presented Phys. is example informal less A rt n hncmsteti ( tail the comes then and first, ) hscmsaotbcueAietctysit rmagiven a from shifts tacitly Alice because about comes This nteohrhn,wa lc en yE.()i just is (3) Eq. by means Alice what hand, other the On sAiescnlso bu h omttv rprycor- property commutative the about conclusion Alice’s Is two with acquainted is Alice Wonderland, another In + ∗ 3cm ff 69 rn auso h ih-ta nteleft-hand the on than right- the on values erent 618(2004). 062108 ,  + 0cm 30 3cm R “ “ “ = R T R T m hence cm, 3 + fisbd ha nldd,adin and included), (head body its of ) + + = + T R 0c” ota hseuto has equation this that so cm”, 20 3cm R T =  = ’and‘ R 0c ” cm 50 R 1 5cm” = + + T 2cm 0c,adte that then and cm, 30 T ,s ht hnmeeting when that, so ), R ” ee oteFrumious the to refer ’ . , . = m,adanalo- and cm”, 5 ff rn context, erent T )first, T T (3) (2) (1) = = 2 where the same symbols refer to the Mome Cheshire Cat and II. DEFINITIONS AND NOTATION have different values — but the commutative property of ad- dition is not meant to apply that way. Related to this is the We shall work with probability theory as extended logic [7, fact that Alice seems to attribute a temporal meaning to the 16, 17], so we denote by P(a |l) the probability of the proposi- order in which the summands appear in the additions (1)–(3), tion a given the prior knowledge l. This prior knowledge, also but we know that the order of terms in an addition has no tem- called (prior) data or prior information, represents the con- poral meaning. If Alice wishes to express temporal details, text the proposition refers to. It is useful to write explicitly she can do so by, e.g., appending subscripts to the symbols, the prior knowledge — a usage advocated by Keynes [18], like Tt , Rt , etc.; in any case she cannot burden the addition Jeffreys [17, 19], Koopman [20, 21, 22], Cox [7, 23, 24], operation with meanings that the latter does not have. Good [25], and Jaynes [16] — because the meaning and the Hence, Alice’s incorrect conclusion stems from her using, probability of a proposition always depend on a given context. inconsistently, the same symbols T and R for two different and Compare, e.g., P(“Tomorrow it will snow” | “We are in Stock- incompatible logical and experimental contexts. She could holm and it is winter”)andP(“Tomorrow it will snow” | “We 3 have used, e.g., TMome,t , RFrum,t , etc., instead, writing her are in Rome and it is summer”). Even more, the probabil- findings as ity of a proposition can be undefined in a given context (i.e., that proposition is meaningless in that context): consider, e.g., P(“My daughter’s name is Kristina” | “I have no children”).  +  =  +  TMome,t RMome,t RMome,t TMome,t Common usage tends to omit the context and writes P(a)in-  TFrum,t + RFrum,t = RFrum,t + TFrum,t , (4) stead of P(a |l), but this usage can in some cases — especially when the context is variable, i.e., several different contexts are where all symbols are consistently and informatively used and considered in a discussion — lead to ambiguities, as will be it is clear that no violations of any mathematical property oc- shown. cur. In particular, a proposition can represent an outcome of an experiment or observation that was, is, or will be, performed. In nuce, an analogous faux pas is made in Ref. [10], al- The context in this case consists of all the details of the exper- though less apparent, because there instead of the addition we imental arrangement and of the theoretical model of the latter have more complex mathematical functions — probabilities which are necessary to assign a probability to that outcome. and entropies —, and quantum experiments take the place of This is true for both “classical” and “quantum” experiments. the example’s Cheshire-Cats experiments. In the mentioned The importance of the context is also evident when consid- paper, the authors compare probabilities and Shannon en- ering a set of propositions tropies pertaining to different contexts (different experimental arrangements); thus their equations do not pertain to the prop- A =def {a : i = 1,...,n } (5) erties they meant to discuss. To my knowledge, this simple i a point has hitherto never been noticed in the literature, and will such that, in the context l, they are mutually exclusive and ex- be shown here by explicit calculation. haustive (i.e., one and only one of them is true). Indeed, it is We shall also investigate how the peculiar reasoning in clear that the propositions may be mutually exclusive and ex- Ref. [10] comes about, and find that it stems from an improper haustive in some context while not being such in another. For temporal interpretation of the logical symbols ‘|’and‘∧’. This example, the propositions {“A red ball is drawn at the first point has not been noticed in the literature either.2 draw”, “A wooden ball is drawn at the first draw”} are mutu- It is also clear that such kinds of seeming “violations”, aris- ally exclusive and exhaustive in the context “Balls are drawn ing through mathematical and logical misapplication, can then from an urn containing one red plastic ball and one blue easily be made to appear not only in quantum experiments, but in classical ones as well, and this will also be shown by simple examples. 3 The relevance of prior knowledge is nicely expressed in the following For these purposes, some notation and definitions for prob- brief dialogue, which I could not resist quoting, between Holmes and Wat- abilities and the Shannon entropy will first be introduced, son [26], taking place after the two listened to a client’s statement: where the rˆole of the context — e.g., the given experimen- “I think that I shall have a whisky and soda and a cigar tal arrangement — is emphasized. Then, after a discussion of after all this cross-questioning. I had formed my conclusions as to the case before our client came into the room.” the two questioned properties of the entropy, the experiments “My dear Holmes!” of Ref. [10] will be presented and re-analysed. Finally, two “[...]Mywholeexaminationservedtoturnmyconjecture “counter-experiments” will be presented. into a certainty. Circumstantial evidence is occasionally very convincing, as when you find a trout in the milk, to quote Thoreau’s example.” “But I have heard all that you have heard.” “Without, however, the knowledge of pre-existing cases 2 In fact, even Timpson [12, p. 15] seems to attach an improper temporal which serves me so well.” sense to the logical concept of ‘joint probability’. 3 wooden ball”, but are neither mutually exclusive nor exhaus- satisfies the properties tive in the context “Balls are drawn from an urn containing   P (a ∧ b ) ∧ (a  ∧ b  ) | l = 0fori  i or j  j , (12) one red wooden ball, one red plastic ball, and one blue i j i j plastic ball”, and make no sense in the context “Cards are P(ai ∧ b j | l) = 1. (13) drawn from a deck”. ij In the following, propositions which are mutually exclusive and exhaustive in a given context will be called alternatives, The probability distributions for the sets of alternatives A, and their set a set of alternatives.4 Such a set may represent B,andA ∧ B in the context l are related by the following stan- the possible, mutually exclusive outcomes in a given experi- dard probability rules (see e.g. Jaynes [16, Ch. 2], Jeffreys [17, ment. Ch. 1][19, Ch. 2], Cox [7, Ch. 1]) The probability distribution for a set of alternatives {ai : i = | = ∧ | , 1,...,na} in the context l will be denoted by P(ai l) P(ai b j l) j def (14a) P(A | l) = {P(ai | l): i = 1,...,na}, (6) P(b j | l) = P(ai ∧ b j | l) i and has, of course, the usual properties

 (marginal probabilities), and P(a ∧ a  | l) = 0fori  i , (7) i i P(ai | l) = 1, (8) P(ai ∧ b j | l) = P(ai | b j ∧ l) P(b j | l) i (14b) = P(b j | ai ∧ l) P(ai | l) where the conjunction [28, 29, 30] (sometimes also called ‘logical product’) of a and b is denoted by a ∧ b. (product rule), from which The Shannon entropy [1] is a function of the probability dis- tribution for a given set of alternatives Ainagivencontextl. P(ai ∧ b j | l) P(a | b ∧ l) = if P(b | l)  0, It is defined as i j P(b | l) j j (14c) def ∧ | H[P(A | l)] = −K P(a | l)lnP(a | l), K > 0, (9) P(ai b j l) i i P(b j | ai ∧ l) = if P(ai | l)  0 i P(ai | l) with the usual conventions for the units in which it is ex- (Bayes’ rule) follow. pressed and the positive constant K, and the limiting proce- All the probability distributions for the sets of alternatives dure when vanishing probabilities are present.5 The Shannon A, B,andA ∧ B have associated Shannon entropies. It is also entropy can be considered to quantify the amount of “uncer- possible to define the conditional entropy of the distribution tainty” associated with a probability distribution.6 for B relative to the distribution for A, as follows: Given two sets of alternatives A and B in the context l,the set of all conjunctions of the propositions {a } with the propo- def i H[P(B | A ∧ l)] = P(ai | l)H[P(B | ai ∧ l)] sitions {b }, j i def = −K P(ai | l) P(b j | ai ∧ l)lnP(b j | ai ∧ l). (15) A ∧ B = {ai ∧ b j : i = 1,...,na; j = 1,...,nb}, (10) i j is also a set of alternatives in l, as follows from the rules of | ∧ probability theory; it will be called a composite set.7 Thus the An analogous definition is given for H[P(A B l)]. composite probability distribution

∧ | =def { ∧ | = ,..., = ,..., } III. PROPERTIES OF THE SHANNON ENTROPY AND P(A B l) P(ai b j l): i 1 na; j 1 nb QUANTUM EXPERIMENTS (11) The various Shannon entropies presented in the previous section possess several mathematical properties [1, 4, 5, 6, 7, 8, 9] which have a fairly intuitive meaning when the Shan- 4 Another name for such a set could be ‘event’ (cf. Tribus [27]); Cox uses non entropy is interpreted as a measure of “uncertainty”. We the terms ‘(inductive) system’ [7] and (with a slightly more general mean- ing)‘question’ [24]. shall focus our attention on two of these properties which, it 5 Cox [7, 24] gives a generalisation of Shannon’s formula which is valid for is claimed [10], are violated in quantum experiments. a general set of not necessarily mutually exclusive propositions. Consider the probability distribution for a composite set of 6 A very small sample of the literature on Shannon’s entropy is represented alternatives A ∧ B in a context l. The entropy H[P(B | l)] and by the work of Kullback [31], R´enyi [32], Acz´el et al. [4, 8], Wehrl [5], | ∧ Shore et al. [33], Jaynes [16]; see also Forte [34]. the conditional entropy H[P(B A l)] satisfy 7 I prefer the term ‘composite’ to the term ‘joint’, as the latter seems to bring about improper temporal meanings. H[P(B | l)] H[P(B | A ∧ l)]. (16a) 4

This property (also called concavity since it arises from the and, in the right-hand sides, those relative to a different exper- concavity of the function −x ln x), intuitively states that the iment. This remark amounts simply to saying that the expres- uncertainty of the probability distribution for B in the context sion ‘x+y = y+x’ holds only if we give x and y the same value l decreases or remains the same, on average, when the context on both sides (cf. Alice’s example in the Introduction). The is “updated” because one of the {ai} is known to be true. An (somewhat pedantic) notation H[P(A | l)] and H[P(B | A ∧ l)], intuitive picture can be given as follows. Imagine that some- instead of the more common H(A)andH(B | A), is used here one performs an experiment (represented by l) consisting in to stress this point. two observations (represented by the sets A and B), and writes The third remark (cf. also Koopman [22]) is that an expres- the results of the experiment in an observation record, say, on sion like ‘B | A’ does not imply that “the observation repre- a piece of paper denoted by ‘l’, under the headings ‘A’and sented by A is performed before the one represented by B”; ‘B’. Now suppose that we know all the details of the experi- nor does ‘A ∧ B’ mean that the two observations are carried ment except for the outcomes: we have not yet taken a look at out “simultaneously”. The temporal order of the observa- the record and are uncertain about what result is written under tions is formally contained in the context l,andthe symbols ‘B’. If we now read the result written under ‘A’, our uncer- ‘|’ and ‘∧’ have a logical, not temporal, meaning.Onemust tainty about the result under ‘B’ will decrease or remain the be careful not to confuse logical concepts with mathematical same on average (i.e., in most, though not all, cases), since the objects or physical procedures, even when there may be some former result can give us some clues about the latter. kind of relationship amongst them; this kind of confusion is The second property (strong additivity [4]) reads strictly related to what Jaynes called “the mind projection fal- H[P(A ∧ B | l)] = H[P(A | l)] + H[P(B | A ∧ l)] lacy” [16, 35, 36]. Consider, e.g., the following situation: an urn contains a = H[P(B | l)] + H[P(A | B ∧ l)], (16b) red, a green, and a yellow ball, and we draw two balls with- and its intuitive meaning is that the uncertainty of the proba- out replacement (this is our context lu). Consider the propo- bility distribution for the composite set of alternatives A ∧ B sitions a =def “Red at the first draw”,andb =def “Green at the is given by the sum of the uncertainty of the probability distri- second draw”. Now, suppose that we also know that the sec- bution for A and the average uncertainty of the updated prob- ond draw yields ‘green’; what is the probability that the first ability distribution for B if one of the {ai} is known to be true, draw yields ‘red’? The answer, to be found in any probability- or vice versa. Using the intuitive picture already proposed, theory textbook, is P(a | b ∧ lu) = P(a ∧ b | lu)/P(b | lu) = we are initially uncertain about both outcomes written on the (1/3) × (1/2)/(1/3) = 1/2. We see that the expression ‘a | b’ record ‘l’. This uncertainty will first decrease as we read the does not mean that b precedes a (nor does it mean that b outcome for ‘A’, and then disappear completely when we read “causes” a): it just expresses a logical connexion. We can the outcome for ‘B’ (given that we do not forget what we have also ask: what is the probability of obtaining first ‘red’ and read under ‘A’). Equivalently, the total uncertainty will first then ‘green’? The answer is again standard: P(a∧b|lu) = 1/6. decrease and then disappear as we read first the outcome un- We see that the expression ‘a ∧ b’ does not mean that a and b der ‘B’ and then the one under ‘A’. happen simultaneously: it just expresses a logical connexion.8 Three remarks are appropriate here. The first is that the Note that Shannon himself, in his paper, does not attribute above are mathematical, not physical, properties: they are not any temporal meaning to the conditional or joint entropies, experimentally observed regularities, but rather follow math- but only the proper logical one. He uses, e.g. [1, §§11 and ematically, through the properties of basic arithmetical func- 12], the expressions H[P(X | Y ∧ l)] and H[P(X ∧ Y | l)] (which tions like the logarithm, once we put some numbers (the prob- in his notation are ‘H (x)’ and ‘H(x, y)’ [1, §6]) where ‘Y’ | | | ∧ y abilities P(a1 l), P(b2 l), P(b2 a1 l), etc.) into the formu- represents the received signal, while ‘X’ represents the signal lae (9) and (15). In particular, they can be neither proven nor source — i.e., where X actually precedes Y.9 Wemayalsore- disproved by experiment, but only correctly or incorrectly ap- mark that Shannon’s entropy formulae for channel character- plied. It is for this reason that one suspects that the seeming istics are not directly dependent on the actual physical details violations found in Ref. [10] are only caused by an incorrect of the system(s) constituting the channel, which can be classi- application of the formulae (16); we shall see that this is the case, because in the mentioned paper some probabilities are used in the right-hand sides of Eqs. (16), and (numerically) different probabilities in the left-hand sides. 8 We can change the example by considering, instead, the order in which This leads us to the second remark: the above properties three identically prepared radioactive nuclei, in three different laboratories, hold when all the probabilities in question refer to the same will decay. The analysis and the conclusion will nevertheless be the same. ‘Quantumness’ plays no rˆole here. context; otherwise, they are not guaranteed to hold. This is 9 From Shannon’s paper: “First there is the entropy H(x) of the source or of because if a specific set of alternatives, in two different con- theinputtothechannel[...]. Theentropyoftheoutputofthechannel, texts, has two different sets of probabilities, then it will yield, i.e., the received signal, will be denoted by H(y).[...]Thejointentropyof , in general, two different entropies as well. Hence we can- input and output will be H(x y). Finally there are two conditional entropies Hx(y)andHy(x), the entropy of the output when the input is known and not expect the formulae (16) to hold if we use, in the left- conversely. Among these quantities we have the relations H(x, y) = H(x) + hand sides, the probabilities relative to a given experiment, Hx(y) = H(y) + Hy(x)” [1, §11]. 5 cal, quantum, or even made of pegasi and unifauns. Only their present. For the set B, we have of course that, using our nota- statistical properties are directly relevant. tion, Hence, the conjunction a ∧b ≡ b ∧a (and B∧A ≡ A∧ B) i j j i | = , | = , is commutative even if the matrix product of two operators P(bout lone) 0 P(bnot-out lone) 1 (20) which can in some way be associated with these propositions so that the Shannon entropy is is not. We shall see that these remarks have indeed a bearing on H[P(B | lone)] = H(0, 1) = 0bit. (21) the analysis presented in Ref. [10], where the validity of the formulae (16) was analysed in two quantum experiments and At this point, asking for the probabilities for the set of alter- =def a classical one, and it was found that, seemingly,Eqs.(16)did natives A, we realise that the propositions aout “The photon def not hold in the quantum case, while they still held in the clas- comes out of the diagonal filter” and anot-out = “The photon sical one. These experiments will be now presented using first does not come out of the diagonal filter” make no sense here, their original notation of Ref. [10] (indicated by the presence since no diagonal filter is present; consequently, there exist no of quotation marks), which makes no reference to the context, entropies like H[P(A | lone)] or H[P(B | A ∧ lone)].Sointhis and then re-analysed using the expanded notation introduced experimental set-up it is not even meaningful to consider the above. property (16a). When we insert a diagonal filter before the horizontal one, we have a new, different experimental arrangement, which A. First quantum experiment will be denoted by ltwo. In this new set-up it does make sense to speak of both sets A and B.10 Quantum mechanics yields The first quantum experiment [10, Fig. 4] runs as fol- the following probabilities: lows. Suppose we send a vertically polarised photon through 1 1 def P(a | l ) = , P(a | l ) = , (22) a horizontal polarisation filter. The set of alternatives B = out two 2 not-out two 2 { , } | ∧ = 1 , bout bnot-out refers to the photon’s coming out of the filter, P(bout aout ltwo) 2 def (23) = 1 with bout “The photon comes out of the horizontal filter”, P(bnot-out | aout ∧ ltwo) = , def 2 bnot-out = “The photon does not come out of the horizontal P(bout | anot-out ∧ ltwo) = 0, filter”. Since we are sure about b , i.e., “P(b ) = 1”, (24) not-out not-out | ∧ = . we have that P(bnot-out anot-out ltwo) 1 “ H(B) = 0”. (17) whence, by the product rule, it follows that Then we insert a diagonal (45◦) filter before the horizontal | = 1 , | = 3 . P(bout ltwo) 4 P(bnot-out ltwo) 4 (25) one; the set of alternatives A refers to the photon’s coming out of the diagonal filter, with aout and anot-out defined analogously. At this point we notice that the probabilities for the set B, Now, if we know that the photon has come out of the diagonal Eqs. (25), differ numerically from those calculated in the pre- filter, we are no longer sure that it will not get through the vious set-up, Eqs. (20). This difference forces us to take note ff horizontal filter, and so the uncertainty on B is increased by of the di erence of the set-ups lone and ltwo; if we ignored this, knowledge of A: inconsistencies would arise already for the probabilities, even before computing any entropy. “ H(B | A) > 0”. (18) The Shannon entropy for the probability distribution for the set B appropriate in this context is readily calculated: Thus we find H[P(B | l )] = H 1 , 3 ≈ 0.81 bit, (26) “0= H(B) < H(B | A) ” (seemingly) (19) two 4 4 and property (16a) is seemingly violated. Only seemingly, though. The fact that a diagonal filter was 10 considered in the reasoning leading to Eq. (18) but not in the In the original formulation of the example [10], the authors denote “by A and B the properties of the photon to have polarization at +45◦ and hori- reasoning leading to Eq. (17), makes us doubt whether the zontal polarization, respectively”; so that A should perhaps be defined as comparison of the entropies “H(B)” and “H(B | A)” really cor- {“The photon has diagonal (45◦) polarisation”, “The photon has no diag- responds to Eq. (16a): these entropies, in fact, apparently refer onal (45◦) polarisation”},andB analogously. However, there are problems to two different experimental arrangements — i.e., to two dif- with these propositions. If the photon is absorbed by the diagonal filter, then it makes no sense to say that the photon has no diagonal polarisation, ferent contexts. It becomes evident that this is indeed the case since the photon is no longer present (note that this problem has nothing to if we proceed by calculating numerically and explicitly all the do with the non-existence of properties of a system before an observation: probabilities first, and then the entropies, keeping the context the point is that, if no photon is present, then it does not make sense to in view. speak about its properties anyway). For this reason the alternative proposi- tions aout, bnot-out, etc., have been used here; however, this has not affected Let us consider again the first experimental set-up, which the point of the experiment in Ref. [10] — namely, the seeming violation will be denoted by lone,whereonly one, horizontal, filter is of property (16a). 6

def and we see that, in fact, it differs from the one in the first spin up along a” and adown = “The particle comes out with set-up: thus, in the first quantum example of Ref. [10] the ex- spin down along a”. We have the following probabilities, in pression “H(B)” is inconsistently and ambiguously used. the notation of Ref. [10]: The conditional entropy relative to A is: = 2 α , = 2 α , “ P(aup) cos 2 P(adown) sin 2 ” (30) H[P(B | A ∧ ltwo)] = 0.5bit, (27) and a corresponding Shannon entropy which amounts to and, as a consequence, = 2 α , 2 α . “ H(A) H cos 2 sin 2 ” (31) 0.81 bit ≈ H[P(B | l )] H[P(B | A ∧ l )] = 0.5bit, (28) two two The particle then proceeds to a second Stern-Gerlach appara- in accord with the property (16a). tus aligned along the x axis; let us denote the corresponding =def { , } So no violations of the property (16a) are found here. Equa- set of alternatives by B bup bdown ,wherebup and bdown are tion (19), from Ref. [10], is simply incorrect, and the seeming defined analogously to aup and adown above. The conditional “violations” that arose from it disappear at once if we write probabilities for B relative to the outcomes for A are: that equation more correctly as 2 π α “ P(bup | aup) = cos − ”, 4 2 (32) H[P(B | l )] < H[P(B | A ∧ l )], (29) | = 2 π − α , one two “ P(bdown aup) sin 4 2 ” 2 π α where we see that the left-hand side refers to the set-up lone, “ P(bup | adown) = sin − ”, 4 2 (33) whereas the right-hand side refers to the different set-up ltwo, 2 π α “ P(bdown | adown) = cos − ”, so that the comparison is between entropies relative to dif- 4 2 ferent experiments, and the equation does not concern prop- and we can easily calculate the following Shannon conditional erty (16a). This fact escaped the attention of the authors of entropy: Ref. [10], partly because the contexts were not explicitly writ- 2 α 2 α 2 α ten, and partly because the probabilities were not explicitly “ H(B | A) = cos × H cos , sin 2 2 2 calculated (so that the authors did not notice that the probabil- + 2 α × 2 α , 2 α sin 2 H sin 2 cos 2 (34) ities for B had two numerically different sets of values). The = H cos2 α , sin2 α ”. authors also seem to interpret the expression “H(B)” as im- 2 2 plying that no observation must precede B, and the expression The sum of the entropies thus far calculated is | “H(B A)” as implying that the observation relative to B must A + | = 2 α , 2 α . be preceded by the one relative to . As already remarked, “ H(A) H(B A) 2 H cos 2 sin 2 ” (35) this needs not be the case. It should be noted that the experimental arrangements lone Now we suppose to exchange the two Stern-Gerlach appa- and ltwo are incompatible, also in the formal sense that their ratus, putting the one along x before the one along a.Wethen conjunction lone ∧ ltwo is false. We could erroneously see ltwo find the following probabilities for B: as a “more detailed” description of lone, equivalent, for exam- 1 1 “ P(bup) = , P(bdown) = ”, (36) ple, to the conjunction of lone and the proposition “Moreover, 2 2 a diagonal polarisation filter is present between the photon with the associated Shannon entropy source and the horizontal filter”. But this is not the case: lone states that nothing is present between the source and the hori- = 1 , 1 . “ H(B) H 2 2 ” (37) zontal filter; if it had been otherwise, and lone had left open the possibility that something unknown could be between source The conditional probabilities, given by quantum theory, for and filter (a linear or circular polarisation filter, or a mirror, the set A relative to the outcomes for B are: or an opaque screen, or something else), then we should have 2 π α “ P(aup | bup) = sin − ”, assigned a different state to the photon reaching the horizontal 4 2 (38) | = 2 π − α , filter, and the calculation of the probabilities would have been “ P(adown bup) cos 4 2 ” ff very di erent [37]. 2 π α “ P(aup | bdown) = cos − ”, 4 2 (39) | = 2 π − α , “ P(adown bdown) sin 4 2 ” B. Second quantum experiment and together with the probabilities (36) they lead to the condi- The second experiment [10, Fig. 5] is as follows. We send a tional entropy / spin-1 2 particle with spin up along the z axis through a Stern- | = 2 α × 2 α , 2 α “ H(A B) sin 2 H sin 2 cos 2 Gerlach apparatus aligned along the axis a that lies in the xz α α α α + cos2 × H cos2 , sin2 (40) plane and forms an angle with the z axis. Let us denote by 2 2 2 { , } =def = 2 α , 2 α . A the set aup adown with aup “The particle comes out with H cos 2 sin 2 ” 7

The sum of the entropies (37) and (40) now yields The formulae above show clearly that, as remarked in §II, it does make sense to consider the composite probability of the + | = 1 , 1 + 2 α , 2 α , “ H(B) H(A B) H 2 2 H cos 2 sin 2 ” (41) outcomes of the two temporally separated observations, as a composite probability needs have nothing to do with the fact α = π/ ff butthisisingeneral(e.g.for 4) di erent from the that the observations are performed “simultaneously” or not. sum (35), and thus we find that, in general, From the joint probabilities, we can compute all probabili- “ H(A) + H(B | A)  H(B) + H(A | B) ” (seemingly), (42) ties involved in this set-up. Applying the marginal probability rule (14a) and Bayes’ rule (14c) we find: in seeming contradiction with the property (16b). However, we notice that two different relative positions of the Stern-Gerlach apparatus were considered, in order to ar- | = + , | = + , rive at Eqs. (35) and (41) respectively. This makes the seem- P(aup m) pab pab¯ P(adown m) pab¯ pa¯b¯ (45) ing contradiction above just an artifact produced, again, by | = + , | = + , P(bup m) pab pab¯ P(bdown m) pab¯ pa¯b¯ (46) the comparison of Shannon entropies relative to two different p experimental arrangements, analogously to what happened in P(a | b ∧ m) = ab , up up p + p the experiment with photons previously discussed. Also in ab ab¯ (47) this case, this is shown by an explicit calculation of all the | ∧ = pab¯ , P(adown bup m) + probabilities relative to the experiments. pab pab¯ In the first set-up, which can be denoted by m,wesend | ∧ = pab¯ , / P(aup bdown m) + the spin-1 2 particle to the Stern-Gerlach apparatus oriented pab¯ pa¯b¯ along a (associated to the set of alternatives A), which is in (48) | ∧ = pa¯b¯ , turn placed before the one oriented along x (associated to the P(adown bdown m) + pab¯ pa¯b¯ set B). Basic quantum-mechanical rules yield the following pab probabilities (in our notation): P(bup | aup ∧ m) = , p + p ¯ ab ab (49) 2 α 2 α p ¯ P(aup | m) = cos , P(adown | m) = sin , (43a) | ∧ = ab , 2 2 P(bdown aup m) + pab pab¯ 2 π α P(bup | aup ∧ m) = cos − , 4 2 pab¯ 2 π α (43b) P(bup | adown ∧ m) = , P(bdown | aup ∧ m) = sin − , p + p ¯ 4 2 ab¯ a¯b (50) p ¯ | ∧ = 2 π − α , P(b | a ∧ m) = a¯b . P(bup adown m) sin down down + 4 2 (43c) pab¯ pa¯b¯ | ∧ = 2 π − α , P(bdown adown m) cos 4 2 ffi and these are su cient to calculate, by the product rule (14b), Here the probabilities (43) have been re-written in terms of the joint probabilities the joint probabilities. def 2 α 2 π α pab = P(aup ∧ bup | m) = cos · cos − , (44a) A look at Eqs. (47) and (48) shows that Bayes’ rule also 2 4 2 def 2 α 2 π α applies in “quantum experiments”, and provides a counter- p ¯ = P(a ∧ b | m) = cos · sin − , (44b) ab up down 2 4 2 | example to the incorrect statement that the expression ‘aup =def ∧ | = 2 α · 2 π − α , pab¯ P(adown bup m) sin sin (44c) bup’ would mean “bup precedes aup”. 2 4 2 =def ∧ | = 2 α · 2 π − α . pa¯b¯ P(adown bdown m) sin 2 cos 4 2 (44d) We can proceed to calculate the Shannon entropies

| = − + + + + + , H[P(A m)] K[(pab pab¯ )ln(pab pab¯ ) (pab¯ pa¯b¯ )ln(pab¯ pa¯b¯ )] (51) | = − + + + + + , H[P(B m)] K[(pab pab¯ )ln(pab pab¯ ) (pab¯ pa¯b¯ )ln(pab¯ pa¯b¯ )] (52) as well as the conditional entropies pab pab pab¯ pab¯ H[P(B | A ∧ m)] = K (p + p ¯ ) − ln − ln ab ab p + p p + p p + p p + p ab ab¯ ab ab¯ ab ab¯ ab ab¯ + + − pab¯ pab¯ − pa¯b¯ pa¯b¯ K (pab¯ pa¯b¯ ) + ln + + ln + pab¯ pa¯b¯ pab¯ pa¯b¯ pab¯ pa¯b¯ pab¯ pa¯b¯ = − − − − K [ pab ln pab pab¯ ln pab¯ pab¯ ln pab¯ pa¯b¯ ln pa¯b¯ + + + + + + , (pab pab¯ )ln(pab pab¯ ) (pab¯ pa¯b¯ )ln(pab¯ pa¯b¯ )] (53) 8 | ∧ = + − pab pab − pab¯ pab¯ H[P(A B m)] K (pab pab¯ ) + ln + + ln + pab pab¯ pab pab¯ pab pab¯ pab pab¯ + + − pab¯ pab¯ − pa¯b¯ pa¯b¯ K (pab¯ pa¯b¯ ) + ln + + ln + pab¯ pa¯b¯ pab¯ pa¯b¯ pab¯ pa¯b¯ pab¯ pa¯b¯ = − − − − K [ pab ln pab pab¯ ln pab¯ pab¯ ln pab¯ pa¯b¯ ln pa¯b¯ + + + + + + , (pab pab¯ )ln(pab pab¯ ) (pab¯ pa¯b¯ )ln(pab¯ pa¯b¯ )] (54) where the expressions have been simplified making use of the additivity property of the logarithm. Finally, from Eqs. (51) and (53), (52) and (54), we find

H[P(A | m)] + H[P(B | A ∧ m)] ≡ H[P(B | m)] + H[P(A | B ∧ m)] = − + + + K (pab ln pab pab¯ ln pab¯ pab¯ ln pab¯ pa¯b¯ ln pa¯b¯ ) (55) ≡ H[P(A ∧ B | m)],

whence we see that the property (16b) is satisfied (we find, Thus we have found no inconsistencies in this second ex- e.g., that H[P(A | m)] + H[P(B | A ∧ m)] ≡ H[P(B | m)] + periment either: Equation (42), from Ref. [10], is simply in- H[P(A | B ∧ m)] ≡ H[P(A ∧ B | m)] ≈ 1.20 bit, for α = π/4). correct, and can more correctly be written as We note that the way in which Eq. (55) has been found | + | ∧ does not depend on the numerical values of the probabilities H[P(A m)] H[P(B A m)] {pij}, but only on the additivity property of the logarithm; so  H[P(B | minv)] + H[P(A | B ∧ minv)], (58) the calculations above can be seen as a mathematical proof of or equivalently and more briefly as the property (16b) for the special case of sets with only two alternatives. H[P(A ∧ B | m)]  H[P(A ∧ B | minv)], (59) If we invert the positions of the two Stern-Gerlach appara- and its content is that the Shannon entropies in the experiment tus, placing the one oriented along x (associated to the set B) m are in general different from those in the different experi- before the one oriented along a (associated to the set A), we ment m . This does not surprise us, since the experiments ff inv then realise a new, di erent experimental arrangement, which have different set-ups. can be denoted by minv. The probabilities for the sets of al- The fact that Eq. (42) does not pertain to property (16b) ff ternatives A and B will thus di er from those in m:wehave is not noticed in Ref. [10], again because the contexts are indeed not kept explicit in the notation. Moreover, the expression “H(B|A)” is considered there as implying that the observation P(b | m ) = 1 , P(b | m ) = 1 , (56a) up inv 2 down inv 2 corresponding to A is performed before the one corresponding | ∧ = 2 π − α , to B:11 it is for this reason that, in order to calculate “H(B|A)”, P(aup bup minv) sin 4 2 (56b) the authors consider the experiment in which the observation P(a | b ∧ m ) = cos2 π − α , down up inv 4 2 corresponding to A is performed before the one correspond- 2 π α ing to B, but then, in order to calculate “H(A | B)”, they feel P(aup | bdown ∧ minv) = cos − , 4 2 (56c) compelled to change the order of the observations — with the P a | b ∧ m = 2 π − α . ( down down inv) sin 4 2 only effect of changing the whole problem and all probabili- ties instead! But, as already remarked, the conditional symbol It is clear that, from this point on, we can proceed as in the | analysis of the first set-up, obtaining ‘ ’ does not have that meaning. The point is that the temporal order of acquisition of knowledge about two physical events does not necessarily correspond to the temporal order in which H[P(A | m )] + H[P(B | A ∧ m )] inv inv these events occur. = H[P(B | minv)] + H[P(A | B ∧ minv)] = −K (p ln p + p ln p ab ab ab¯ ab¯ C. Classical experiment +   +   , (57) pab¯ ln pab¯ pa¯b¯ ln pa¯b¯ ) ≡ H[P(A ∧ B | minv)] Together with the two quantum experiments, the authors of Ref. [10] also present an example of a classical experiment {  } { ∧ where the pij are the values of the joint probabilities P(ai b j | minv)},different, in general, from the {pij}. In any case, property (16b) is again satisfied in the new context (we have, | + | ∧ ≡ | + e.g., H[P(A minv)] H[P(B A minv)] H[P(B minv)] 11 Cf. the discussion of the formula “H(A) + H(B | A) = H(B) + H(A | B)” in H[P(A|B∧minv)] ≡ H[P(A∧B|minv)] ≈ 1.60 bit, for α = π/4). Ref. [10, §III]. 9 in which, they claim, the property (16b) is not violated. It denote this new experiment by ninv. It is clear that n and ninv is useful to re-analyse this example as well, in order to show are really different experiments, for the following reason: in that, in fact, it is not an instance of confirmation of the prop- n, between the two draws, the second box contains black or erty (16b), because it, too, involves two different experimental white balls; while in ninv it contains plastic or wooden balls. arrangements. The probabilities for the set B this time are P(bplastic |ninv) = 3 | = 1 The idea [10, Fig. 3] is as follows. We fill a box with four 4 and P(bwood ninv) 4 , with an entropy balls of different colours (black and white) and compositions | = 3 , 1 ≈ . , (plastic and wood). There are two black plastic balls, one H[P(B ninv)] H 4 4 0 81 bit (65) white plastic ball, one white wooden ball. We shake the box, def while the conditional probabilities for A are draw a ball blindfold, and consider the set A = {ablack, awhite} 2 for the ball’s being black or white. If the ball is black, then P(ablack | bplastic ∧ ninv) = , we put all black balls in a new box, draw a new ball from this 3 (66) | ∧ = 1 , def P(awhite bplastic ninv) 3 box, and consider the set B = {bplastic, bwood} for this ball’s being plastic or wooden. We proceed analogously if the first if the first result was ‘plastic’, and drawn ball was white instead.12 Let us denote the set-up just described by n. P(ablack | bwood ∧ ninv) = 0, The probabilities of first drawing a black or a white ball are (67) P(awhite | bwood ∧ ninv) = 1, respectively P(ablack | n) = 1/2andP(awhite | n) = 1/2 and thus their Shannon entropy is if it was ‘wooden’. The conditional entropy is H[P(A | n)] = H 1 , 1 = 1bit. (60) | ∧ = 3 2 , 1 + 1 , 2 2 H[P(A B ninv)] 4 H 3 3 4 H(0 1) ≈ 3 × . + 1 × The conditional probabilities of the second drawn ball’s being 4 0 92 bit 3 0bit (68) plastic or wooden, given the outcome of the first observation, ≈ 0.69 bit, are and adding this time Eqs. (65) and (68), we find

P(bplastic | ablack ∧ n) = 1, P(bwood | ablack ∧ n) = 0, (61) H[P(B | ninv)] + H[P(A | B ∧ ninv)] = 1.5bit. (69) if the first result was ‘black’, and We see that the entropies (64) and (69) are equal,

P b | a ∧ n = 1 , P b | a ∧ n = 1 , ( plastic white ) 2 ( wood white ) 2 (62) H[P(A | ninv)] + H[P(B | A ∧ ninv)] = | + | ∧ , if it was ‘white’. From these probabilities the following Shan- H[P(B ninv)] H[P(A B ninv)] (70) non conditional entropy can be computed: which is equivalent to H[P(B | A ∧ n)] = 1 H(1, 0) + 1 H 1 , 1 ∧ | = ∧ | , 2 2 2 2 (63) H[P(A B n)] H[P(A B ninv)] (71) = 1 × 0bit+ 1 × 1bit= 0.5bit. 2 2 but it is clear that this is not the statement of property (16b), Combining Eqs. (60) and (63) we obtain because the right- and left-hand sides of this equation refer to two different experiments. The content of the equality above H[P(A | n)] + H[P(B | A ∧ n)] = 1.5bit. (64) is only that the Shannon entropies for the probability distribu- tions for the composite set of alternatives A ∧ B are equal in Now we suppose to make the observations in inverse order the two experiments n and ninv. instead. We shake the initial box, draw a ball, consider first the def set B = {bplastic, bwood} for the ball’s being plastic or wooden. Depending on the outcome we fill a new box either with the IV. THE ROLEˆ OF “QUANTUMNESS” plastic or the wooden balls, and draw a new ball; then we def consider the set A = {ablack, awhite} for the new ball. Let us We have shown thus far that the quantum experiments in Ref. [10] did not involve any violation of the Shannon en- tropy’s properties. However, we may still imagine someone raising the following argument: 12 In Brukner and Zeilinger’s original example, the black and white balls are put into two separate boxes after the first draw, and a ball is drawn from “Very well, the properties (16) are not vio- each box separately. But these two final draws are then related to two sepa- lated in any experiment. But one notices that the rate observations, not one, so that in total we have three observations in this experiment, and the formula (16b) is not even appropriate. The experiment equality for different contexts has thus been modified here, in order to preserve the two authors’ original intention. H[P(A ∧ B | n)] = H[P(A ∧ B | ninv)], (71)r 10

while holding in the classical experiment of The Shannon entropies are §III C, does not hold in general in the quantum § | = 1 , 2 ≈ . , one of III B, where it is found instead that H[P(A k)] H 3 3 0 92 bit (75) 1 2 1 1 ∧ |  ∧ | . H[P(B | A ∧ k)] = H(1, 0) + H , H[P(A B m)] H[P(A B minv)] (59)r 3 3 2 2 (76) = 1 × + 2 × ≈ . , 3 0bit 3 1bit 0 67 bit From this particular case, one can see that in clas- sical experiments the Shannon entropy remains and their sum is the same if the temporal order of observations is changed, whereas in quantum experiments the H[P(A | k)] + H[P(B | A ∧ k)] entropy changes together with the change in tem- ≡ H[P(B | k)] + H[P(A | B ∧ k)] poral order. This phenomenon is thus a pecu- (77) ≡ ∧ | ≈ . . liarity of the quantum nature of the experiments H[P(A B k)] 1 58 bit — a sort of ‘quantum-context-dependence’ of the Shannon entropy.” Now let us consider the set-up kinv, in which the observa- tions are made in reverse order, but with the same general pro- cedure. The probabilities are then: But this argument has no validity, of course. The Shan- non entropy is always “context dependent”, and this comes P(b | k ) = 2 , P(b | k ) = 1 , (78) from the fact that probabilities are always context dependent, plastic inv 3 wood inv 3 in both classical and quantum experiments. We can further | ∧ = 2 , P(ablack bplastic kinv) 3 (79) and illustrate this fact by means of two more experiments, P(a | b ∧ k ) = 1 , (80) which will also serve as counter-examples of the ones already white plastic inv 3 discussed. P(ablack | bwood ∧ kinv) = 0, (81)

P(awhite | bwood ∧ kinv) = 1. (82)

A. First counter-example These lead to the entropies | = 2 , 1 ≈ . , The first counter-example is a modification, based on the H[P(B kinv)] H 3 3 0 92 bit (83) examples presented by Kirkpatrick [38, 39] of the experiment H[P(A | B ∧ k )] = 2 H 2 , 1 + 1 H(0, 1) with the balls discussed in §III C.13 The balls are in addition inv 3 3 3 3 ≈ 2 × . + 1 × big or small as well now: there are one big black plastic ball, 3 0 92 bit 3 0bit (84) one small black plastic ball, one small white plastic ball, and ≈ 0.61 bit, one small white wooden ball. Initially, we prepare the box so that it contains only all small and the sum balls. Then we shake the box, draw a ball blindfold, and con- =def { , } sider the set A ablack awhite for the ball’s being black or H[P(B | kinv)] + H[P(A | B ∧ kinv)] white. If the drawn ball is black, then we prepare the box so ≡ H[P(A | kinv)] + H[P(B | A ∧ kinv)] that it contains only all black balls (also the big black one that (85) ≡ ∧ | ≈ . . was initially not in the box), draw a new ball from this box, H[P(A B kinv)] 1 53 bit and consider the set B =def {b , b } for this ball’s being plastic wood Comparing Eqs. (77) and (85) we find plastic or wooden. We proceed analogously if the first drawn ball was white.14 It is easy to see that, in the set-up just de- H[P(A ∧ B | k)]  H[P(A ∧ B | k )]. (86) scribed, denoted by k, we have the following probabilities: inv Hence, for this experiment, of a clearly classical nature, we P(a | k) = 1 , P(a | k) = 2 , (72) black 3 white 3 obtain different statistics and entropies depending on the order | ∧ = , | ∧ = , 15 P(bplastic ablack k) 1 P(bwood ablack k) 0 (73) in which we observe colour and composition. | ∧ = 1 , | ∧ = 1 . Note, in any case, that in each of the two set-ups — P(bplastic awhite k) 2 P(bwood awhite k) 2 (74) Eqs. (77) and (85) — the properties (16) are always satisfied, just as in the quantum experiments.

13 Cf. also the toy models discussed by Hardy [40] and Spekkens [41]. 14 The Reader should not, too simply, identify the “system” here with some specific group of balls. It is rather associated with a variable collection 15 We may note, incidentally, that it has long been known that the thermody- of balls, in analogy with a classical open system associated with variable namic entropy of a classical thermodynamic system in a non-equilibrium number (and species) of particles. state depends on its complete previous history [42]. 11

B. Second counter-example the properties of the Shannon entropy (Eqs. (16)), which refer to a single, well-defined experiment and hold unconditionally. The second counter-example is of a quantum-mechanical The peculiar results arising from the comparison of en- nature. It runs precisely like the experiment with spin-1/2par- tropies relative to different experimental contexts can thus ap- ticles discussed in §III B, except that now the particle has ini- pear or not appear in any kind of experiments, classical as well tially spin up, not along the z axis, but along the axis b that as quantum; this has been shown by means of two counter- bisects the angle ax (i.e., b lies in the xaz plane and forms an examples: a classical one, in which a change in the temporal angle β =def π/4−α/2 with both the x and a axes; remember that order of the experiment leads to a change in entropy values, α is the angle az). The analysis of this experiment proceeds and a genuinely quantum one, in which the same temporal completely along the lines of the re-analysis of §III B, if we change leads to no entropy changes. introduce the contexts q and qinv, and change Eqs. (43) with A conclusion is that Bohr’s dictum ought to be observed also in mathematical notation, and not only in the analysis of | = 2 β , | = 2 β , P(aup q) cos 2 P(adown q) sin 2 (87a) quantum phenomena, but in the analysis of classical phenom- ena as well, because probabilities — and, consequently, Shan- | ∧ = 2 β, P(bup aup q) cos non entropies — always depend on “the whole experimental 2 (87b) P(bdown | aup ∧ q) = sin β, arrangement” [44] taken into account.

2 P(bup | adown ∧ q) = sin β, (87c) | ∧ = 2 β, P(bdown adown q) cos Acknowledgements and Eqs. (56) with I thank Professor Gunnar Bj¨ork for continuous encourage- | = 2 β , | = 2 β , ment and advice, Professor Sandra Brunsberg for advice, and P(bup qinv) cos 2 P(bdown qinv) sin 2 (88a) Professor Kim Kirkpatrick for useful discussions. Finan- 2 P(aup | bup ∧ qinv) = sin β, cial support from the Foundation Angelo Della Riccia and (88b) 2 the Foundation BLANCEFLOR Boncompagni-Ludovisi n´ee P(adown | bup ∧ qinv) = cos β, Bildt is gratefully acknowledged. 2 P(aup | bdown ∧ qinv) = cos β, 2 (88c) P(adown | bdown ∧ qinv) = sin β.

It is obvious that this leads to the equalities P(ai ∧ b j | q) = ∧ | [1] C. E. Shannon, A mathematical theory of communica- P(ai b j qinv) and eventually to the equality tion, Bell Syst. Tech. J. 27, 379–423, 623–656 (1948), http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html. H[P(A ∧ B | q)] = H[P(A ∧ B | qinv)], (89) [2] M. Tribus, Thirty years of information theory,inLevineand Tribus [45] (1979), p. 1. exactly as it happened in the experiment with the balls of [3] A. Zellner, ed., Bayesian Analysis in Econometrics and Statis- §III C (for α = π/4, e.g., we have H[P(A ∧ B | q)] ≈ 0.83 bit). tics (North-Holland, Amsterdam, 1980). But here the experiment is a purely quantum one: we see in- [4] J. Acz´el and Z. Dar´oczy, On Measures of Information and Their deed that the observables do not commute here, the initial state Characterizations (Academic Press, New York, 1975). [5] A. Wehrl, General properties of entropy, Rev. Mod. Phys. 50, pure not diagonal is , and its density matrix is in either of the 221–260 (1978). observables’ bases. Compare this result with the discussion in [6] A. Wehrl, Remarks on A-entropy, Rep. Math. Phys. 12, 385 Ref. [43]. (1977). [7] R. T. Cox, The Algebra of Probable Inference (The Johns Hop- kins Press, Baltimore, 1961). V. CONCLUSIONS [8] J. Acz´el, Measuring information beyond communication the- ory: Some probably useful and some almost certainly useless generalizations, Inform. Process. Manag. 20, 383–395 (1984). We have shown that the properties of the Shannon entropy [9] W. Ochs, Basic properties of the generalized Boltzmann-Gibbs- are not violated in quantum experiments, contrary to the con- Shannon entropy, Rep. Math. Phys. 9, 135 (1976). clusions of Ref. [10]. In that paper, an idiosyncratic temporal [10] C.ˇ Brukner and A. Zeilinger, Conceptual inadequacy of the interpretation of the conditional symbol ‘|’ and of the conjunc- Shannon information in quantum measurements,Phys.Rev.A tion symbol ‘∧’ leads the authors to change experimental set- 63, 022113 (2001), quant-ph/0006087. ups in order to calculate various entropies. As a result, they [11] M. J. W. Hall, Comment on “Conceptual inadequacy of Shan- ff non information...” by C. Brukner and A. Zeilinger (2000), compare entropies relative to di erent experimental arrange- quant-ph/0007116. ments instead, and do not notice that the probabilities are also [12] C. G. Timpson, On a supposed conceptual inadequacy of the different, and so their results (Eqs. (19) and (42) in this paper), Shannon information in quantum mechanics, Stud. Hist. Phil. besides not being formally correctly written, do not pertain to Mod. Phys. 34, 441–468 (2003), quant-ph/0112178. 12

[13] C. G. Timpson, The applicability of Shannon infor- ciple of maximum entropy and the principle of minimum cross- mation in quantum mechanics and Zeilinger’s foun- entropy, IEEE Trans. Inform. Theor. IT-26, 26–37 (1980). dational principle, Philos. Sci. 70, 1233–1244 (2003), [34] B. Forte, Entropies with and without probabilities: Applica- http://philsci-archive.pitt.edu/archive/00001100/. tions to questionnaires, Inform. Process. Manag. 20, 397–405 [14] L. Carroll (Rev. Charles Lutwidge Dodgson), Alice’s Adven- (1984). tures in Wonderland (1865), Repr. in [46]. [35] E. T. Jaynes, Clearing up mysteries — the original goal,in [15] L. Carroll (Rev. Charles Lutwidge Dodgson), Through the Maximum Entropy and Bayesian Methods, edited by J. Skilling Looking-Glass, and What Alice Found There (1871), Repr. in (Kluwer Academic Publishers, Dordrecht, 1989), pp. 1–27, [46]. http://bayes.wustl.edu/etj/node1.html. [16] E. T. Jaynes, Probability Theory: The Logic of Science (Cam- [36] E. T. Jaynes, Probability theory as logic,inProceedings of the bridge University Press, Cambridge, 2003), edited by G. Larry Ninth Annual Workshop on Maximum Entropy and Bayesian Bretthorst. Methods, edited by P. Foug`ere (Kluwer Academic Publish- [17] H. Jeffreys, Theory of Probability (Oxford University Press, ers, Dordrecht, 1990), revised, corrected, and extended in London, 1998), 3rd ed., first publ. 1939. http://bayes.wustl.edu/etj/node1.html. [18] J. M. Keynes, A Treatise on Probability (MacMillan, London, [37] C. A. Fuchs, quantum foundations in the light of quantum in- 1921), 5th ed. formation (2001), quant-ph/0106166. [19] H. Jeffreys, Scientific Inference (Cambridge University Press, [38] K. A. Kirkpatrick, “Quantal” behavior in classical probability, Cambridge, 1957), 2nd ed., first publ. 1931. Found. Phys. Lett. 16, 199 (2003), quant-ph/0106072. [20] B. O. Koopman, The bases of probability, Bull. Am. Math. Soc. [39] K. A. Kirkpatrick, Classical three-box “paradox”, J. Phys. A: 46, 763–774 (1940), repr. in [47, pp. 159–172]. Math. Gen. 36, 4891–4900 (2003), quant-ph/0207124. [21] B. O. Koopman, The axioms and algebra of intuitive probabil- [40] L. Hardy, Disentangling nonlocality and teleportation (1999), ity, Ann. Math. 41, 269–292 (1940). quant-ph/9906123. [22] B. O. Koopman, Quantum theory and the foundations of prob- [41] R. W. Spekkens, In defense of the epistemic view of quantum ability, in MacColl [48] (1957), p. 97. states: a toy theory (2004), quant-ph/0401052. [23] R. T. Cox, Probability, frequency, and reasonable expectation, [42] C. A. Truesdell, III, Rational Thermodynamics (Springer- Am. J. Phys. 14, 1 (1946). Verlag, New York, 1984), 2nd ed., first publ. 1969. [24] R. T. Cox, Of inference and inquiry, an essay in inductive logic, [43] C.ˇ Brukner and A. Zeilinger, Quantum measurement and Shan- in Levine and Tribus [45] (1979), p. 119. non information, a reply to M. J. W. Hall (2000), quant- [25] I. J. Good, Probability and the Weighing of Evidence (Charles ph/0008091. Griffin, London, 1950). [44] N. Bohr, Discussion with Einstein on epistemological problems [26] A. C. Doyle, The noble bachelor,inThe Adventures of Sherlock in atomic physics, in Schilpp [50] (1949), pp. 199–241. Holmes (1891), reprinted in [49]. [45] R. D. Levine and M. Tribus, eds., The Maximum Entropy For- [27] M. Tribus, Information theory as the basis for thermostatics malism (The MIT Press, Cambridge (USA), 1979). and thermodynamics, J. Appl. Mech. 28, 1–8 (1961). [46] L. Carroll (Rev. Charles Lutwidge Dodgson), The Annotated [28] C. I. Lewis and C. H. Langford, Symbolic Logic (The Century Alice (Penguin Books, London, 1970), Ed. by Martin Gardner. Company, New York, 1932). [47] H. E. Kyburg, Jr. and H. E. Smokler, eds., Studies in Subjective [29] I. M. Copi, Introduction to Logic (MacMillan, New York, Probability (John Wiley & Sons, New York, 1964). 1959), first publ. 1953. [48] L. A. MacColl, ed., Applied Probability (McGraw-Hill Book [30] A. G. Hamilton, Logic for Mathematicians (Cambridge Univer- Company, New York, 1957). sity Press, Cambridge, 1993), first publ. 1978. [49] A. C. Doyle, Sherlock Holmes: The Complete Novels and Sto- [31] S. Kullback, Information Theory and Statistics (John Wiley & ries (Bantam, New York, 1986). Sons, New York, 1959). [50] P. A. Schilpp, ed., Albert Einstein: Philosopher-Scientist (Tu- [32] A. R´enyi, Probability Theory (North-Holland, Amsterdam, dor Publishing Company, New York, 1951), 2nd ed., first publ. 1970). 1949. [33] J. E. Shore and R. W. Johnson, Axiomatic derivation of the prin- Paper B

arXiv:quant-ph/0408193 v4 30 Jan 2005 usaiiy issues. guishability” faseigvoaino h eodlwo thermodynam- of law to second due the apparently of [14], ics violation Jaynes seeming by given a analysis, be of known will less probably This sim- and lucid, ple, a true. with [2] necessarily demonstration Peres’ comparing not state- by done second is Peres’) rephrased, (and once Neumann’s von ment, that show to and [2–13], then description mathematical physi- its a with and between phenomenon distinction “state” cal the clear term make which ambiguous terms the other substituting above, ments sae” ecudvoaetescn a fthermodynamics. of law second the violate could semi-permeable we two “states”, produce to non-orthogonal distinguish able unambiguously were which membranes we demonstration if a that gave also show by [2], to Peres, book lucid formula. given and insightful entropy con- his his “thermodynamic in statements derives same he the which two by on siderations” based these are Neumann of von demonstrations The † ∗ n h converse the and p 197): (p. Quantenmechanik .VNNUAN(N EE)O ORTHOGONALITY ON PERES) (AND NEUMANN VON 1. 1 eiae otemmr fAhrPeres Asher of memory the to Dedicated lcrncaddress: Electronic itnusaiiyo o-rhgnldniymtie osntipyvoain ftescn law second the of violations imply not does matrices density non-orthogonal of Distinguishability ze Zustände “zwei nl owdrpih i nam ie oce eiemalnWand semipermeablen solchen einer Hauptsatz”. Annahme zweiten dem die widerspricht so ort onal, sie wenn können, werden h ups fti ae sfis orprs h w state- two the rephrase to first is paper this of purpose The n§. fvnNeumann’s von of §V.2 In eodlw[fthermodynamics]. [of law second the contradicts membrane semi-permeable a such if orthogo- are nal; they if membrane semi-permeable a states two φ 1 , ψ r o rhgnl hnteasmto of assumption the then orthogonal, not are eette,adte oprn qatm hroyai nlssgvnb ee iha“lsia”one “classical” a with rep- Peres which by matrices given density analysis the thermodynamic and “quantum” preparations a Jaynes. points physical related by comparing These given (and between then facts. space distinction and experimental density-matrix the new them, particular hypothetical clear resent the the Neumann. making describe that von by to mean by adequate shown stated surely be are instead would not was would possibility as adopted a thermodynamics, space) such of Hilbert law hand, second other the the of On violation a imply necessarily not ASnmes 03.65.Ca,65.40.Gr,03.67.-a numbers: PACS φ h yohtclpsiiiyo itnusigpeaain ecie ynnotooa est arcsdoes matrices density non-orthogonal by described preparations distinguishing of possibility hypothetical The , φ [email protected] ψ N H EODLAW SECOND THE AND , 1 efidtefloigtopropositions two following the find we [1] [ ...]d u r c hs e m i p e r m e a b l eW ä n d eb e s t i m m tg e t r e n n t ψ [...]cancertainlybeseparatedby classical oa nttt fTcnlg KH,Iajrsaa 2 E144 tchl,Sweden Stockholm, 40 SE-164 22, Isafjordsgatan (KTH), Technology of Institute Royal ooa id,ad“sind and sind”, hogonal ahmtsh rnlgnder Grundlagen Mathematische eateto ireetoisadIfrainTcnlg (IMIT), Technology Information and Microelectronics of Department ie o-unu)“indistin- non-quantum) (i.e. aoaoyo unu lcrnc n unu pis(QEO), Optics Quantum and Electronics Quantum of Laboratory φ , ψ ih orthog- nicht ir .Lc Mana Luca G. Piero Dtd 0Jnay2005) January 30 (Dated: (2) (1) c.as 1–7) hc a la hscl experimental physical, clear a and has meaning, which [15–17]), also (cf. h oto aigsm eatcvoec ote,adin terms and distinct in two them, “state” the term of to favour too-many-faced violence the eliminate semantic me some let particular making of cost the cycle vanishes. thermodynamic change closed entropy a the in that assumption the only but entropy “quantum” or “classical” Indeed, about formulae. questions to nor mechan- statistical ics, and thermodynamics between relation the lows: general. more be us let moment the for but section, device next this the discuss in and reintroduce shall We preparations. cal to ability the con- sequences thermodynamic with endow a could needed that he device conceptual because only an membranes of of spoke Neumann Von idea general more with the membrane semi-permeable a of concept thermodynamic old the use to prefer I concern hereafter. densities, term here probability consider no we but matrices statistics the since [19]); Fano what (cf. for matrix term ‘density old called the been is has matrix’ ‘statistical instead: object cal Tosaitclmatrices statistical (Two n nyi tr( if only and e enwrprs o emn’ ttmnsaoe at above, statements Neumann’s von rephrase now me Let about questions with concerned directly not is paper The ecnepesvnNuansfis ttmn 1 sfol- as (1) statement first Neumann’s von express can We the being, time the for only but substitute, also me Let .EEEI FVNNUANSSTATEMENTS NEUMANN’S VON OF EXEGESIS 2. ontko faysc bevto procedure. observation such any of know not do we if matrices statistical non-orthogonal vice by and versa, matrices; the statistical orthogonal represent by latter mathematically with we preparations then physical certainty, two we which distinguish by procedure can observation an is there if † ttsia matrix statistical φψ ) opriua nrp oml ilb used be will formula entropy particular no = . hsr-ttmn ae la the clear makes re-statement This 0.) distinguish φ and bevto procedure observation paetysneWge [18] Wigner since apparently ’ hc eoe mathemati- a denotes which , pyia)preparation (physical) ψ r adt eotooa if orthogonal be to said are adtu eaae physi- separate) thus (and [2–11]. [2–13] (1’) ∗ , 2 difference between the physical phenomenon and the math- ‘distinguishability ⇒ orthogonality’: remember that if ‘A ⇒ ematical objects used to describe, or represent, it: we see B’ holds, then we cannot have both A true and B false [20]. that von Neumann’s original proposition, which looked math- But we know that from a false antecedent one can idly de- ematical in nature, disguises in fact a statement about physics duce any proposition whatever [20], hence the statements (2) methodology. Note also that from a logical point of view2 I and (2’) are devoid of real content. Speaking on a less ab- have completed von Neumann’s original proposition stractly logical level, the point is that if we can distinguish two preparations represented by non-orthogonal statistical matri- orthogonality ⇒ distinguishability ces, then we are evidently no longer following our prescrip- tion (1’) to mathematically represent preparation distinguish- into the two ability by means of matrix orthogonality; we are inconsistent. Any consequences we derive, like e.g. violations of the sec- ⇒ , distinguishability orthogonality ond law, are likely to be only artifacts of our inconsistency indistinguishability ⇒ non-orthogonality, rather than real phenomena. We must thus amend the partic- ular statistical matrices or the whole statistical-matrix space or equivalently used, since they have not been adequately chosen to describe the physical phenomenon in question or our experimental ca- distinguishability ⇔ orthogonality. pabilities.

(The reader should pay attention not to confuse the I shall in fact show explicitly that the violation of the second “causal” connexion, in a loose sense, with the logical con- law of statements (2) and (2’) is only an artifact, by analysing nexion which exist between distinguishability and orthogo- Peres’ demonstration [2] and comparing it with Jaynes’ al- nality. The causal connexion goes only in one direction: a ready mentioned analysis [14] of an analogous seeming viola- physicist uses orthogonal matrices because the corresponding tion of the second law of thermodynamics due to entirely clas- 4 preparations are distinguishable, but two preparations do not sical “indistinguishability” issues. In order to do this, let us become suddenly distinguishable just because she has repre- analyse the way in which von Neumann and Peres link quan- sented them on paper by two orthogonal matrices. The logical tum distinguishability and non-orthogonality with the second connexion goes instead both ways: if a physicist can distin- law of thermodynamics. guish two preparations, we can deduce that she will represent them by orthogonal matrices; and if we see that a physicist represents two preparations by orthogonal matrices, we can 3. “QUANTUM” IDEAL GASES, SEMI-PERMEABLE deduce that she knows of an observation procedure that can MEMBRANES, AND THERMODYNAMICS distinguish those preparations. The present author confused these two kind of connexions himself in the first drafts of this In order to analyse a proposition which relates distinguish- paper. This kind of confusion between physics and probability ability of quantum preparations with the second law of ther- is called “the mind-projection fallacy” by Jaynes [21–23].) modynamics, it is necessary to introduce a thermodynamic With regard to von Neumann’s (and Peres’) second propo- body possessing “quantum” characteristics, i.e., quantum de- sition (2), it can be rephrased as follows: grees of freedom.

if we could distinguish, by means of some obser- Let us first recall that (“classical”) ideal gases are defined vation procedure, two physical preparations rep- as homogeneous, uniform thermodynamic bodies characteris- resented by non-orthogonal statistical matrices, (2’) able by two thermodynamic variables: the volume V > 0and then we could violate the second law of thermo- the temperature T > 0, and for which the internal energy is a 5 dynamics. function of the variable T alone; this implies, via the first law of thermodynamics, that in any isothermal process the work This proposition makes as well a clear distinction between done by the gas, W, is always equal to the heat absorbed by physical phenomenon and mathematical description; how- the gas, Q: ever, it is logically intrinsically vain. Let us see why from a logical point of view first. The antecedent of the proposi- W = nRT ln(Vf/Vi) = Q (isothermal processes), (3) tion3 formally is ‘distinguishability ∧ non-orthogonality’, but this is false in view of the previous statement (1’), namely

4 Jaynes used his analysis to show that the definition and the quantification of the thermodynamic entropy depend on the particular thermodynamic 2 In the following I use the two logical symbols ‘⇒’ (‘implies’, ‘if . . . then’) variables that define the thermodynamic system: a long known fact, which for logical implication, and ‘∧’ (‘and’) for logical conjunction [20]. also Grad [24–26], who had different views on statistical mechanics than 3 In a proposition of the form ‘A ⇒ B’, A is called the antecedent (or impli- Jaynes’, stressed. cans,orprotasis)andB the consequent (or implicate,orapodosis)[20]. 5 See e.g. the very fine little book by Truesdell and Bharatha [27]. 3 where Vi and Vf are the volumes at the beginning and end of Two samples of these quantum ideal gases can be (more the process, R is the molar gas constant, and n is the (constant) or less effectively) separated by semi-permeable membranes, number of moles. This formula will be true throughout the analogous to those used with chemically different gases. The paper, as we shall only consider isothermal processes. microscopic idea [1, p. 196][2, p. 271] is, paraphrasing One often considers samples of such ideal gases in a cham- von Neumann, to construct many windows in the membrane, ber and inserts, moves, or removes impermeable or semi- each of which is made as follows: each particle of the gases permeable membranes6 at any position one pleases in order is detained there and an observation is performed on its quan- to subject the samples, independently of each other, to varia- tum degrees of freedom. Depending on the observation re- tions of e.g. volume or pressure.7 sult, the particle penetrates the window or is reflected, with unchanged momentum. This implies that the number of par- We must now face the question of how to introduce and ticles and hence the pressures or volumes of the gases on the mathematically represent quantum degrees of freedom in an two sides of the semi-permeable membrane will vary, and may ideal gas. Von Neumann [1, §V.2] used a hybrid classical- set the membrane in motion, producing work (e.g. by lifting quantum description, microscopically modelling a “quantum” a weight which loads the membrane). In more general terms, ideal gas as a quantity n of classical particles (for simplic- this membrane is a device which performs a physical oper- ity) possessing an “internal” quantum degree of freedom rep- ation, described by a given positive-operator-valued measure resented by a statistical matrix ρ living in an appropriate (POVM) and completely positive map (CPM) [41–44], on the statistical-matrix space; this space and the statistical matrix quantum degrees of freedom of the gases’ particles, and de- are always assumed to be the same for all the gas particles. pending on the result it acts on their translational degrees of He then treated two gas samples described by different statisti- freedom, e.g. separating them spatially. There arises thus a cal matrices as gases of somehow different chemical species.8 kind of mutual dependence between the quantum degrees of This conceptual device presents some problems, to be dis- freedom and the thermodynamic parameters (like the volume cussed more in detail elsewhere [38]; for example, the chemi- V) of a quantum ideal gas. cal species of a gas is not a thermodynamic variable, and even ff less a continuous one: chemical di erences cannot change Let us make an example. Imagine a chamber having vol- 9 continuously to zero. Intuitively, it would seem more appro- ume V and containing a mixture of a quantity n/2ofaz+-gas priate to somehow describe a quantum ideal gas by the vari- and n/2ofaz−-gas, i.e., of two quantum ideal gases whose , , ρ ables (V T ) instead, taking values on appropriate sets. quantum degrees of freedom are represented by the statistical However, we shall follow von Neumann and Peres instead matrices and speak of a ‘φ-gas’, or a ‘ψ-gas’, etc., where φ or ψ are the statistical matrices describing its quantum degrees of free- + def + + z = |z z | =ˆ 10 , (4) dom, just as if we were speaking of gases of different chemical 00 10 − def= | − −| = 00 , species (like e.g. ‘argon’ and ‘helium’). The thermodynamic z z z ˆ 01 (5) variables are only (V, T) for each such gas. in the usual spin-1/2 notation (Fig. 1, step a). These two statis- tical matrices represent quantum preparations that can be dis- tinguished with certainty, hence there exists a semi-permeable 6 Partington [28, §28] informs us that these were first used in thermodynam- membrane, of the kind described above, which is completely ics by Gibbs [29]. − 7 Although the conclusions drawn from such kind of reasonings are often opaque to (the particles of) the z -gas and completely trans- + valid, it must be said that the mathematical description adopted is want- parent to (the particles of) the z -gas. Analogously, there ing (Truesdell often denounced the fact that classical thermodynamics has exists another semi-permeable membrane with the opposite only very rarely been treated with the conceptual respect and mathematical properties, viz. it is completely opaque to (the particles of) the dignity which are accorded to other theories like rational mechanics or gen- + − eral relativity). The correct formalism to describe insertions and removals z -gas and completely transparent to (the particles of) the z - of membranes should involve field quantities (cf. Buchdahl [30, §§46, 75] gas. Such membranes implement the two-element POVM {z±} and see e.g. Truesdell [31, lectures 5, 6 and related appendices] or ref- with outcomes given by the CPMs {ρ → z±ρz±/ tr(z±ρz±)}: erences [32–36]), as indicated by the possibility of introducing as many + 11 membranes as we wish and hence to control smaller and smaller portions acting on the z -gas it yields of the gas. 8 The idea had been presented by Einstein eighteen years earlier [37], but it z+ → z+ with probability tr(z+ z+ z+) = 1, is important to point out that for Einstein the “internal quantum degree of + − − + − (6a) freedom” was just a “resonator” capable of assuming only discrete energies z → z with probability tr(z z z ) = 0, (i.e., it was not described by statistical matrices, and non-orthogonality issues were unknown); thus his application was less open to problems and critique than von Neumann’s. 9 An observation made by Partington [28, §II.28] in reference to Larmor [39, p. 275]. 11 Due to the large number of gas particles considered, the outcome proba- 10 φ Dieks and van Dijk [40] point rightly out that for a -gas one should more bilities are numerically equal, within small fluctuations negligible in the N φ = correctly consider the total statistical matrix i=1 ,whereN nNA (with present work, to the average fraction of gas correspondingly transmitted or NA Avogadro’s constant) is the total number of particles. reflected by the membranes. 4

(a) (b) to (the particles of) the other gas. Mathematically this is

|z+ z+| reflected in the non-existence of some two-outcome POVM 0.5 |z+ z+|+ < Q -0 {A1, A2} with the properties 0.5 |z− z−| |z− z−| + + z yields first outcome with probability tr(A1 z A1) = 1, + + z yields second outcome with probability tr(A2 z A2) = 0, Figure 1: Separation of completely distinguishable quantum (9a) gases and

− + + while acting on the z -gas it yields x yields first outcome with probability tr(A1 x A1) = 0, + + x yields second outcome with probability tr(A2 x A2) = 1. z− → z+ with probability tr(z+ z− z+) = 0, (6b) (9b) z− → z− with probability tr(z− z− z−) = 1 It can be shown [1, 2] that in this case the operation which (note that these are just von Neumann projections), i.e., the will require the maximum amount of work is represented by operation separates the two preparations which certainty. The the two-element POVM {α±},where only difference between the membranes is in which kind of √ √ ± ± ± 1 2 ± 2 ± 2 gas (particle) they let through. α def= |α α | =ˆ √ √ , (10) Now, imagine to insert these two membranes in the cham- 4 ± 22∓ 2 ber, very near to its top and bottom walls respectively. Push- √ − 1 |α± def= ± 2 | ± ±| ± , ing them isothermally toward the middle of the chamber, they 2 2 z x √ 1 √ 1 will separate the two gases and in the end we shall have the 1 2 + 2 − + − ≡ ± 2 ± 2 |z + 2 ∓ 2 |z , (11) z -gas completely above the first membrane and the z -gas 2 completely under the other in contact with the first (step b). whose outcomes are given by the CPMs (projections) ρ → In order to move these membranes and achieve this separation α±ρα±/ tr(α±ρα±). Note that the Hilbert-space vectors |α± we have to spend an amount of work equal to are the eigenvectors of the matrix λ,where / √ √ − × n V 2 ≈ . , 2 RT ln 0 693 nRT (7) 1 + 1 + 2 + 2 + 2 − 2 − 1 31 2 V λ def= z + x = α + α =ˆ , (12) 2 2 4 4 4 11 because each membrane must overcome the pressure exerted + − by the gas to which it is opaque. Since the process is isother- and they are orthogonal, so that tr(α α ) = 0. The action of mal, the quantity above is also the (positive) amount of heat this POVM on the statistical matrices of our gases is given by released by the gases. √ + → α+ α+ +α+ = + / The semi-permeable membranes can also be used to realise z with probability tr( z ) 2 2 4 the inverse process, i.e. the mixing of two initially separated ≈ 0.854, + − √ z -andz -gases. In this case Eq. (7) would be the amount z+ → α− with probability tr(α− z+α−) = 2 − 2 /4 of heat absorbed by the gases as well as the amount of work ≈ . , performed by them. 0 146 (13a) However, the quantum degrees of freedom of two gases may also be prepared in such a way that no observation proce- and dure can distinguish between them with certainty, and hence + + + + + there is no semi-permeable membrane which can separate x → α with probability tr(α x α ) ≈ 0.854, (13b) them completely; this has of course consequences for the x+ → α− with probability tr(α− x+α−) ≈ 0.146. amount of work that can be gained by using the membrane. Let us make another example. Imagine again the initial situa- The significance of the equations above is that the half/half + + tion above, but this time with a mixture of a quantity n/2ofa mixture of z -andx -gases can equivalently be treated as a + − z+-gas and n/2ofanx+-gas, with 0.854/0.146 mixture of an α -gas and an α -gas,asisnow shown. + def= | + +| = 1 11 . We can construct two semi-permeable membranes which x x x ˆ 2 11 (8) implement the above POVM and CPMs in such a way that The two statistical matrices z+ and x+ are non-orthogonal, they can be used to completely separate an α+-gas from an tr(z+ x+)  0, and correspond to quantum preparations that α−-gas, i.e., one membrane is totally transparent to the for- cannot be distinguished with certainty, and so there do not ex- mer and totally opaque to the latter, and vice versa for the ist semi-permeable membranes which are completely opaque other membrane. By inserting these membranes in the cham- to (the particles of) the one gas and completely transparent ber as in the preceding example, they will partially separate 5

(a) (b) T, is bounded above by the change in entropy:12

+ + 0.5 |z z |+ Q <-0 |α+ α+| Q/T ΔS (isothermal process). (16) 0.5 |x+ x+| 13 |α− α−| This assumes in our case for a cyclic process the form / =Δ . Figure 2: Separation of partially distinguishable quantum Q T 0 S (isothermal cyclic process) (17) gases

4. PERES’ DEMONSTRATION and transform our z+-andx+-gases leaving in the end an α+- − Peres’ demonstration [2, pp. 275–277] can be presented as gas above and an α -gas below the two (adjoined) membranes follows. The physicist Tatiana describes the internal quantum (Fig. 2). The gases will have same the pressure but occupy un- + + degrees of freedom of her ideal gases by means of a spin-1/2 equal volumes because the ratio for both z -andx -gases to + − statistical-matrix space (with the related set of POVMs). She be transformed into an α -gas and an α -gas is approximately starts (Fig. 3, step a) with two quantum ideal gases equally 0.854/0.146, as seen from Eqs. (13); this will also be the ratio divided into two compartments having volumes V/2 each and of the final volumes. The total amount of work necessary to separated by an impermeable membrane. In the upper com- perform this separation is partment there is a z+-gas, in the lower a x+-gas, where z+ and x+ are the statistical matrices defined in the previous 0.146V 0.854V − 0.146 nRT ln − 0.854 nRT ln ≈ section, Eqs. (4) and (8). Since they are non-orthogonal, V V tr(z+ x+)  0, there are no means to distinguish with certainty 0.416 nRT, (14) the two gases. This implies for Tatiana the non-existence of two semi-permeable membranes with the property of being, + where the first term is for the upper membrane (transparent the one, completely transparent to the z -gas and completely + to the α+-gas) and the second for the lower one (transparent opaque to the x -gas, and vice versa for the other, as discussed to the α−-gas). This is also the amount of heat released by in the previous section. the gases, and we see that it is less than the amount for the Enters a “wily inventor”, as Peres calls him [2, p. 275]; previous case, Eq. (7). Of course, in the reverse process, in let us call him Willard. He claims having produced two which we mix two initially separated α±-gases in the same such semi-permeable membranes, which can completely dis- amounts, the gases would absorb the same (positive) amount tinguish and separate the two gases. By means of them of heat. he reversibly mixes the two gases, obtaining work equal to Q = nRT ln 2 ≈ 0.693 nRT,cf.Eq.(7).Wehavenowasin- + We shall in a moment examine how Peres uses such semi- gle chamber of volume V filled with a half-half mixture of z - + permeable membranes to show that if we could distinguish and x -gases (step b). and separate two quantum-ideal-gas samples characterised by From Tatiana’s point of view, the situation is now the same non-orthogonal statistical matrices, then a violation of the sec- as that discussed in the last example of the previous section: ond law would follow. Although he explicitly adopts von Neu- the gas mixture is equivalent (step d) to a mixture of approx- + − mann’s entropy formula, his demonstration really only uses imately 0.854 parts of an α -gas and 0.146 parts of an α - + − the assumption that the thermodynamic entropy depends on gas, where α and α are the statistical matrices defined in the values of the parameters describing the phenomena in Eq. (11). Tatiana uses two semi-permeable membranes to ± + question at a particular instant but not on their history (as in- separate the two α -gases, the α -gas into a 0.854 fraction − stead is the case in more general thermodynamic and thermo- of the volume V,andtheα -gas into the remaining 0.146 mechanic processes [31]), so that in a cyclic process, in which fraction (so that they have the same pressure), and spends  we start from and end in a situation described by identical pa- work equal to −Q = −nRT(0.854 ln0.854+0.146 ln0.146) ≈ rameter values, the entropy change is naught: 0.416 nRT, cf. Eq. (14). Tatiana then performs two operations corresponding to uni- ΔS = 0 (cyclic process). (15) tary rotations of the statistical matrices associated to the two

We shall also make this same sole assumption, and since it does not require a specific formula for the entropy (such as 12 This is a special case of the Clausius-Duhem inequality [32, §258][31, von Neumann’s formula), we shall not make use of any par- lecture 2][45], which is more generally valid for inhomogeneous, non- ticular entropy expression. uniform bodies and for irreversible, non necessarily isothermal processes: r/T dm+ q/T dA s˙ dm,wherer, q, s, are respectively the massic Finally, let us recall that the second law of thermodynamics B ∂B B heating supply, the heating influx, the massic entropy of the body B having for isothermal processes says that the amount of heat Q ab- surface ∂B, and the dot represents the substantial derivative. sorbed by a thermodynamic body, divided by the temperature 13 See also Serrin’s [46] nice analysis of the second law for cyclic processes. 6 gases to a common one, say z+, so that the two compartments (a) (b) (c) ≡ (a) + now contain for her the same z -gas; she then eliminates the Ar Ar  membranesand reinserts another impermeable one to divide Q >-0 - ? Ar the gas into two compartments of equal volume (step e), and Ar Ar finally performs again an operation represented by a rotation ≡ z+ → x+ of the statistical matrix associated to the gas in the 6 lower chamber. In this way she has apparently re-established Figure 4: Classical gas experiment from Johann’s point of the original condition of the gases (step a), which have thus view undergone a cycle. These last operations are assumed to be performable without expenditure or gain of work, hence with- 14 out heat exchange either. 5. JAYNES’ DEMONSTRATION Tatiana summarises the results as follows: the total entropy change is naught because the initial and final conditions are We leave the quantum laboratory where Tatiana and Willard the same: ΔS = 0. The total heat absorbed by the gases are now arguing after their experiment, and enter an adjacent equals the work done by them and amounts to classical laboratory, where we shall look at Jaynes’ demon- stration [14]. The situation here is in many respects very sim- Q = Q + Q ≈ (0.693 − 0.416) nRT = 0.277 nRT > 0. (18) ilar to the previous, though it is completely “classical”. We have an ideal gas equally divided into two chambers of volumes V/2 each and separated by an impermeable mem- Hence, we have a violation of the second law (17) because for brane (Fig. 4, step a). For the scientist Johann the gas in the Tatiana’s gases two chambers is exactly the same, say “ideal argon”:15 for him it would thus be impossible, not to say meaningless, to Q/T > 0 =ΔS. (19) find a semi-permeable membrane that be transparent to the gas in the upper chamber and opaque to the gas in the lower one, and another membrane with the opposite properties. Tatiana accuses Willard of having violated the second law The scientist Marie states nevertheless that she has in fact by means of his strange semi-permeable membranes that “sep- two membranes with those very properties. She uses them arate non-orthogonal states”. to reversibly and isothermally mix the two halves of the gas, obtaining work equal to Q = nRT ln 2 ≈ 0.693 nRT (step b). Yet, from Johann’s point of view Marie has left things ex- (a) (b) (c) actly how they were: he just needs to reinsert the impermeable

| + +| membrane in the middle of the vessel and for him the situa- z z + + + + Q >-0 0.5 |z z |+ ≡- 0.85 |α α |+ tion is exactly the same as in the beginning: the gas is equally ? 0.5 |x+ x+| 0.15 |α− α−| |x+ x+| divided into two chambers (steps c, a). Johann’s conclusion is the following: The initial and final 6 ≡ −Q < Q conditions of the gas are the same and so the total entropy (f) ≡ (a) (e) (d) ? change vanishes: ΔS = 0. The work obtained equals the heat

|z+ z+| |z+ z+| absorbed by the gas,   |α+ α+|  + + + + Q = Q ≈ 0.693 nRT > 0. (20) |x x | |z z | |α− α−| The second law of thermodynamics states that this amount Figure 3: Quantum gas experiment from Tatiana’s point of of heat divided by the temperature cannot exceed the entropy view change,16 Q/T ΔS , but this is quite incompatible with Jo- hann’s conclusion that

Q/T > 0 =ΔS (21) 14 Note that von Neumann [1, pp. 194 and 197] and Peres [2, p. 275] assert that unitary rotations can be realised by processes involving no heat ex- (cf. Tatiana’s Eq. (19)). change, but work exchange is allowed and indeed sometimes necessary. However, in our present discussion we have assumed all processes to be isothermal and all gases ideal, and this implies that any isochoric exchange of work must be accompanied by an equivalent exchange of heat (see § 3); for this reason must Tatiana’s final isochoric unitary rotations be performed 15 Real argon, of course, behaves like an ideal gas only in certain ranges of with no energy exchange. This issue is related to the problematic way in temperature and volume. which the quantum and classical or thermodynamical descriptions are com- 16 Note that, strictly speaking, ideal gases (or mixtures thereof) cannot un- bined; namely, the statistical matrices are not thermodynamic variables au dergo irreversible processes (the usual example of “free expansion” evi- pair with the real numbers (V and T) describing the gas. dently assumes that the gas is not ideal during the expansion). 7

(a) (b) (c)  (a) law [Q/T ΔS ] only as long as all experi- 0.5 aAr aAr mental manipulations are confined to that cho-  . a b Q >-0 0 5 Ar - 0.5 Ar sen set. If someone, unknown to us, were to ! 0.5 bAr 0.5 aAr b vary a macrovariable Xn+1 outside that set, he Ar . b 0 5 Ar could produce what would appear to us as a vi- 6  (−Q Q ) olation of the second law, since our entropy func- tion S (X ,...,X ) might decrease spontaneously, Figure 5: Classical gas experiment from Marie’s point of 1 n while his S (X ,...,X , X + ) increases. view 1 n n 1 This is old wisdom; for example, Grad had explained thirty- one years earlier that [25, p. 325] (see also [24, 26]) Johann, however, is never dogmatic about his own knowl- edge of the experimental facts. Asking Marie whether she is the adoption of a new entropy is forced by the able to reproduce her “trick” at will or whether it was only discoveryof newinformation. [...] The exis- chance, and upon her answer that the separation is repro- tence of diffusion between oxygen and nitrogen ducible, he understands that where for him there was only one somewhere in a wind tunnel will usually be of ff gas there must actually be two di erent gases. This is indeed no interest. Therefore the aerodynamicist uses an the case: Marie explains that the two chambers initially con- entropy which does not recognise the separate ex- ff tained two di erent kinds of ideal-argon, of which Johann had istence of the two elements but only that of “air”. a b no knowledge: argon ‘a’( Ar) and argon ‘b’( Ar). Argon a is In other circumstances, the possibility of diffu- soluble in whafnium while argon b is not, but the latter is sol- sion between elements with a much smaller mass 17 uble in whifnium, a property not shared by the a variety. ratio (e.g., 238/235) may be considered quite rel- a b Marie’s separation of the two gases Ar and Ar was pos- evant. sible by means of two semi-permeable membranes made of whifnium and whafnium that take advantage of these different We can rephrase Grad’s and Jaynes’ remark shifting the em- properties. phasis to the distinction between the experimental situation We see (Marie’s point of view, Fig. 5) that the second law and the mathematics which describes it: The fact that some is not violated. Initially the two gases aAr and bAr were physicist can perform experimental operations which contra- completely separated in the vessel’s two chambers (step a). dict our mathematical description and which apparently lead After mixing and extracting work, the vessel contained an to violations of e.g. the second law, simply means that that equal mixture of aAr and bAr (step b). Upon Johann’s rein- physicist is able to control physical phenomena which are not sertion of the impermeable membrane the vessel is again di- contemplated by our mathematical description, and the sec- vided in two equal chambers, but each chamber contains a ond law is not necessarily violated in that physicist’s more mixture of aAr and bAr (step c), and this is different from appropriate mathematical description. the initial condition (step a): the cycle has not been com- pleted although it appeared so to Johann, and so the equation ΔS = 0 is not necessarily valid. To close the cycle one has 6. RE-ANALYSIS OF PERES’ DEMONSTRATION to use the semi-permeable membranes again to relegate the two gases to two separate chambers, and must thereby spend With the insight provided by Grad and Jaynes, we can re- an amount of work −Q at least equal to that previously ob- turn to the quantum laboratory and look with different eyes tained, −Q Q, and the second law (16) for the completed at what happened there. If Willard can reproducibly distin- cycle is satisfied: Q = Q + Q 0 =ΔS . guish and separate with certainty the two physical prepara- The simple conclusion, drawn by Jaynes [14, §3] in terms tions represented by Tatiana through the non-orthogonal sta- of entropy, is that tistical matrices z+ and x+, this can only have one meaning: these preparations have to be represented by orthogonal sta- it is necessary to decide at the outset of a prob- tistical matrices instead, at least in experimental situations in lem which macroscopic variables or degrees of / which Willard takes advantage of his instrumental capabili- freedom we shall measure and or control; and ties, his “tricks”. This is not in contradiction with Tatiana’s within the context of the thermodynamic sys- formalism: with the instruments and apparatus at her dis- tem thus defined, entropy will be some func- posal the two physical situations are not distinguishable with ,..., tion S (X1 Xn) of whatever variables we have certainty, and so the appropriate way for her of representing chosen. We can expect this to obey the second them was by non-orthogonal statistical matrices. But she can now share Willard’s instrumentation and knowledge and use an accordingly more adequate mathematical description of the physical facts. 17 Jaynes explains that ‘whifnium’, as well as ‘whafnium’, “is one of the rare superkalic elements; in fact, it is so rare that it has not yet been discov- We can imagine a possible explanation from Willard’s point ered” [14, §5]. of view (it is just a possible one, and even more drastic ones, 8 requiring abandonment of the quantum-mechanical formal- with ism, might be necessary in other instances). Willard explains √ − 1 that the internal quantum degrees of freedom of the gases |α˜ ± def= 2 ± 2 2 |z˜± ±|x˜± , (27) are best represented by a spin-3/2 statistical-matrix space, of √ − 1 which Tatiana used a subspace (more exactly, a projection) |α` ± def= 2 ± 2 2 |z`± ±|x`± , (28) because of her limited observational means; i.e., part of the statistical-matrix space was “traced out” because Tatiana used cf. Eq. (11), and the associated CPMs τ → only instrumentation represented by a portion of the total E±τE±/ tr(E±τE±). POVM space. For instance, denoting by {|z˜+ , |z˜− , |z`+ , |z`− } That separation led to a compartment, with volume 0.854V, the basis for the Hilbert space used by Willard, with z˜+ | containing the gas mixture z`+ = z˜− | z`− ≡0, Tatiana could not distinguish, amongst + + 1 1 others, the preparations corresponding to |z˜ and to |z` , both |α˜ + α˜ +| + |α` + α` +|, (29) | + 2 2 of which she represented √ as z , nor those corresponding √ to + def + − + def + − |x˜ = |z˜ + |z˜ / 2andto|x` = |z` + |z` / 2, which and the other compartment, with volume 0.146V, containing + she denoted as |x . The projection is thus the gas mixture |z˜+ →|z+ , |z˜− →|z− , |z`+ →|z+ , |z`− →|z− , 1 − − 1 − − (22) |α˜ α˜ | + |α` α` |. (30) 2 2 from which also follows Tatiana could not perceive that these were mixtures, because of her limited instrumentation. |x˜+ →|x+ , |x˜− →|x− , |x`+ →|x+ , |x`− →|x− , The following step corresponded to the rotations (23) |α+ α+| →| + +|, |α+ α+| →| + +| which makes it evident that Tatiana cannot distinguish prepa- ˜ ˜ z˜ z˜ ` ` z` z` (31) rations and experiments represented by the vectors with a tilde from the corresponding ones represented by accented vec- for gases in the upper compartment, and tors.18 − − + + − − + + Thus, the preparations which Tatiana represents by z+ and |α˜ α˜ | →|z˜ z˜ |, |α` α` | →|z` z` | (32) x+ because for her they were indistinguishable, are instead represented by Willard by for the gases in the lower compartment (remember that z˜+ | z`+ = 0). The successive elimination and reinsertion of 1000 0000 + def= | + +| = 0000 , + def= | + +| = 0000 , the impermeable membrane led to two compartments of equal z˜ z˜ z˜ ˆ 0000 x` x` x` ˆ 0010 (24) 0000 0000 volumes V/2 and equal content, viz. the equal mixture of | + +| | + +| which are clearly orthogonal, tr(˜z+ x`+) = 0, because for him z˜ z˜ -and z` z` -gases (step e). the two corresponding preparations are distinguishable. From The final rotation for the gas in the lower compartment, his point of view, the process went as follows. The two com- + + + + |z˜ →|x˜ , |z` →|x` (33) partments initially containedz ˜+-and`x+-gases (Fig. 6, step a). With his semi-permeable membranes he mixed the two distin- (remember that x˜+ | x`+ = 0), only led to two equal com- guishable gases with extraction of work (step b), so that the partments containing the mixtures of 1 |z˜+ z˜+| + 1 |z`+ z`+| and chamber eventually contained a τ-gas with 2 2 1 1 1000 τ = z˜+ + x`+ =ˆ 1 0000 . (25) (a) (b) (c) 2 0010 0.43 |α˜ + α˜ +|+ 2 2 0000 | + +| z˜ z˜ + + − − Q >-0 0.5 |z˜ z˜ |+ - 0.07 |α˜ α˜ |+ Tatiana’s subsequent separation by means of her semi- + + + + ! 0.5 |x` x` | 0.43 |α` α` |+ |x`+ x`+| permeable membranes (steps c, d), distinguishing the prepa- 0.07 |α` − α` −| rations corresponding to α+ and α−, is represented by Willard 6      by the POVM  (−Q > Q + Q ) −Q < Q (f)  (a) (e) (d) ? ± def ± ± ± ± = |α α | + |α α |, 0.5 |z˜+ z˜+|+ 0.5 |z˜+ z˜+|+ E ˜ ˜ ` ` + + ⎛ √ √ ⎞ . | + +| . | + +| 0.5 |α˜ α˜ |+ ± ± 0 5 z` z`  0 5 z` z`  ⎜ 2 √ 2 √20 0⎟ . |α+ α+| ⎜ ⎟ . | + +|+ . | + +|+ 0 5 ` ` = 1 ⎜ ± 22∓ 20√ 0√ ⎟ , 0 5 x˜ x˜ 0 5 z˜ z˜ ˆ ⎜ ⎟ (26) + + + + 4 ⎝ 002±√ 2 ± √2 ⎠ 0.5 |x` x` | 0.5 |z` z` | q 00± 22∓ 2 0.5 |α˜ − α˜ −|+ 0.5 |α` − α` −|

+ + + + + − Figure 6: Quantum gas experiment from Willard’s point of 18 An even more evident notation would be |z˜ ≡|z , z , |z` ≡|z , z , etc., but it might be misleading in other respects. view 9

1 | + +| + 1 | + +| 20 2 x˜ x˜ 2 x` x` gases respectively (step f). This is of They occur in a laboratory” [47]. (The present author is course different from the initial situation (step a); but for in fact guilty of unclarity about this important distinction in Tatiana, whose instrumentation was limited with respect to a previous paper [48].) The usual metonymic expression21 Willard’s, the initial and final conditions appeared identical. “to distinguish two statistical matrices” is certainly handy, but The second law is not violated, because the cycle has not must be used with a grain of salt: what we distinguish is in fact been completed, and the equation ΔS = 0 does thus not nec- two physical situations, facts, phenomena, or preparations; essarily hold. It is easy to see that in order to return to the ini- not two statistical matrices. The latter should mathematically tial condition an amount of work −Q 4 × (1/4)nRT ln 2 ≈ reflect what we can do with these preparations, e.g.whether 0.693 nRT has to be spent to separate the |z˜+ z˜+|-gas from the we can distinguish them, and be amended whenever new ex- |z`+ z`+|-gas, and analogously for the |x˜+ x˜+|-and|x`+ x`+|- perimental facts appear.22 gases. A final operation must then be performed correspond- ing to the rotations of the statistical matrices |z`+ z`+| and |x˜+ x˜+| to |z˜+ z˜+| and |x`+ x`+| respectively, and we have fi- nally reached again the initial condition (step a). The total Acknowledgements amount of heat absorbed by the gases, corresponding to the work performed on them would then be I thank Gunnar Björk and Anders Månsson for many dis- cussions regarding von Neumann’s demonstration, Gunnar    Q = Q + Q + Q (0.693 − 0.416 − 0.693)nRT = also for encouragement and for precious comments on the − 0.416nRT 0 =ΔS, (34) paper, Louise and Anna for encouragement, and Louise for invaluable odradek. and the second law, for the completed cycle, is satisfied (strictly so: we see that the whole process is irreversible, and it is easy to check that the only irreversible step was the sepa- α+ α− ration performed by Tatiana into -and -gases). [1] J. von Neumann, Mathematische Grundlagen der Quanten- mechanik (Springer-Verlag, Berlin, 1996), 2nd ed., first publ. 1932, transl. by Robert T. Beyer as Mathematical Foundations of Quantum Mechanics (Princeton University Press, Princeton, 7. CONCLUSION 1955). [2] A. Peres, Quantum Theory: Concepts and Methods (Kluwer The re-analysis, with Jaynes’ (and Grad’s) insight, of the Academic Publishers, Dordrecht, 1995). simple quantum experiment which seemed to violate the sec- [3] H. Ekstein, Presymmetry, Phys. Rev. 153, 1397–1402 (1967). ond law leads to the following almost trivial conclusion: if [4] H. Ekstein, Presymmetry. II,Phys.Rev.184, 1315–1337 the physicist Willard can reproducibly distinguish two physi- (1969). [5] R. Giles, Foundations for quantum statistics, J. Math. Phys. 9, cal preparations that the physicist Tatiana represents by non- 357–371 (1968). orthogonal statistical matrices, then no necessary violation of [6] R. Giles, Foundations for quantum mechanics, J. Math. Phys. the second law of thermodynamics is implied from Willard’s 11, 2139–2160 (1970). point of view. On the other hand it is certain that the partic- [7] R. Giles, Formal languages and the foundations of physics, ular statistical-matrix space adopted by Tatiana for the phe- in C. A. Hooker, ed., Physical Theory as Logico-Operational nomenon’s description is not (any longer) adequate, and has Structure (D. Reidel Publishing Company, Dordrecht, 1979), to be amended (in extreme cases a non-quantum-mechanical pp. 19–87. [8]D.J.FoulisandC.H.Randall,Operational statistics. I. Basic description might be necessary) in order to avoid inconsisten- concepts, J. Math. Phys. 13, 1667–1675 (1972). cies like e.g. seeming violations of the second law. Alterna- [9] C. H. Randall and D. J. Foulis, Operational statistics. II. Manu- tively, Tatiana can keep on using the unamended mathemat- als of operations and their logics, J. Math. Phys. 14, 1472–1480 ical description, but she must then renounce to treat with it (1973). situations involving the new experimental possibility and the related phenomena, in order to avoid inconsistencies. This conclusion emphasises the distinction, somewhat ob- 20 scured in von Neumann’s statements but often stressed by A similar explicit observation had already been made by Band and 19 Park [12]: “experimenters do not apprehend Hilbert vectors; they gather Peres [2, 11, 43, 47] and Jaynes [23] amongst others, be- numerical data” (their emphasis). tween physical phenomena and their mathematical descrip- 21 Metonymy is “a figure of speech consisting of the use of the name of one tion: “quantum phenomena do not occur in a Hilbert space. thing for that of another of which it is an attribute or with which it is asso- ciated (as ‘crown’ in ‘lands belonging to the crown’)” (Merriam-Webster Online Dictionary); it perfectly applies to our case. 22 We must not forget, however, that the mathematical formalism is often expanded with no or very little experimental input and successfully leads 19 E.g. Ekstein [3, 4], Giles [5–7], Foulis and Randall [8–10], Band and thus to new discoveries (think of e.g. general relativity, Dirac’s positron, or Park [12, 13]; cf. also References [15–17]. the Clausius-Duhem inequality). 10

[10] D. J. Foulis and C. H. Randall, Manuals, morphisms and quan- mans, Green and Co, London, 1949). tum mechanics,inA.R.Marlow,ed.,Mathematical Founda- [29] J. W. Gibbs, On the equilibrium of heterogeneous substances, tions of Quantum Theory (Academic Press, London, 1978), pp. Trans. Connect. Acad. 3, 108–248, 343–524 (1876–1878), repr. 105–126. in The Collected Works of J. Willard Gibbs. Vol. 1: Thermo- [11] A. Peres, What is a state vector?, Am. J. Phys. 52, 644–650 dynamics (Yale University Press, New Haven, 1948, first publ. (1984). 1906), pp. 55–353. [12] J. L. Park and W. Band, Mutually exclusive and exhaustive [30] H. A. Buchdahl, The Concepts of Classical Thermodynamics quantum states, Found. Phys. 6, 157–172 (1976). (Cambridge University Press, London, 1966). [13] W. Band and J. L. Park, New information-theoretic foundations [31] C. A. Truesdell, III, Rational Thermodynamics (Springer- for quantum statistics, Found. Phys. 6, 249–262 (1976). Verlag, New York, 1984), 2nd ed., first publ. 1969. [14] E. T. Jaynes, The Gibbs paradox,inMaximum-Entropy and [32] C. A. Truesdell, III and R. A. Toupin, The classical field the- Bayesian Methods,editedbyC.R.Smith,G.J.Erickson, ories, in S. Flügge, ed., Handbuch der Physik: Band III/1: and P. O. Neudorfer (Kluwer Academic Publishers, Dor- Prinzipien der klassischen Mechanik und Feldtheorie [Encyclo- drecht, 1992), pp. 1–22, http://bayes.wustl.edu/etj/ pedia of Physics: Vol. III/1: Principles of Classical Mechanics articles/gibbs.paradox.pdf. and Field Theory] (Springer-Verlag, Berlin, 1960), pp. 226– [15] L. E. Ballentine, The statistical interpretation of quantum me- 858. chanics, Rev. Mod. Phys. 42, 358 (1970). [33] R. M. Bowen, Thermochemistry of reacting materials,J.Chem. [16] L. Hardy, Quantum theory from five reasonable axioms (2001), Phys. 49, 1625–1637 (1968). arxiv.org/quant-ph/0101012. [34] C. A. Truesdell, III, Sulle basi della termodinamica delle mis- [17] P. G. L. Mana, Probability tables, in A. Y. Khrennikov, ed., cele, Atti Accad. Naz. Lincei: Rend. Sc. fis. mat. nat. 44, 381– Quantum Theory: Reconsideration of Foundations — 2 (Växjö 383 (1968). University Press, Växjö, Sweden, 2004), pp. 387–401, see also [35] I. Samohýl, Thermodynamics of Irreversible Processes in Fluid arxiv.org/quant-ph/0403084. Mixtures (Approached by Rational Thermodynamics) (BSB B. [18] E. P. Wigner, On the quantum correction for thermodynamic G. Teubner Verlagsgesellschaft, Leipzig, 1987). equilibrium, Phys. Rev. 40, 749–759 (1932). [36] I. Samohýl, Thermodynamics of reacting mixtures of any sym- [19] U. Fano, Description of states in quantum mechanics by density metry with heat conduction, diffusion and viscosity,Arch.Ra- matrix and operator techniques, Rev. Mod. Phys. 29, 74–93 tional Mech. Anal. 147, 1–45 (1999). (1957). [37] A. Einstein, Beiträge zur Quantentheorie, Ver. Deut. Phys. Ges. [20] I. M. Copi and C. Cohen, Introduction to Logic (MacMillan 16, 820–828 (1914), transl. by A. Engel and E. Schucking as Publishing Company, New York, 1990), eighth ed., first publ. Contributions to quantum theory,inThe Collected Papers of 1953. Albert Einstein: Vol. 6: The Berlin Years: Writings, 1914– I. M. Copi, Symbolic Logic (Macmillan Publishing, New York, 1917. (English translation) (Princeton University Press, Prince- 1979), 5th ed., first publ. 1954. ton, 1997), pp. 20–26. [21] E. T. Jaynes, Clearing up mysteries — the original goal,in [38] G. Björk, P. G. L. Mana, and A. Månsson (2005), in preparation. Maximum Entropy and Bayesian Methods, edited by J. Skilling [39] J. Larmor, A dynamical theory of the electric and luminiferous (Kluwer Academic Publishers, Dordrecht, 1989), pp. 1–27, medium. — Part III. Relations with material media,Phil.Trans. http://bayes.wustl.edu/etj/node1.html. R. Soc. Lond. A 190, 205–300, 493 (1897), repr. in J. Larmor, [22] E. T. Jaynes, Probability theory as logic,inProceedings of the Mathematical and Physical Papers. Vol. II (Cambridge Univer- Ninth Annual Workshop on Maximum Entropy and Bayesian sity Press, Cambridge, 1929), pp. 11–131. Methods, edited by P. Fougère (Kluwer Academic Publishers, [40] D. Dieks and V. van Dijk, Another look at the quantum mechan- Dordrecht, 1990), revised, corrected, and extended version at ical entropy of mixing, Am. J. Phys. 56, 430–434 (1988). http://bayes.wustl.edu/etj/node1.html. [41] E. B. Davies, Quantum Theory of Open Systems (Academic [23] E. T. Jaynes, Probability Theory: The Logic of Science Press, London, 1983). (Cambridge University Press, Cambridge, 2003), ed. by G. [42] K. Kraus, States, Effects, and Operations: Fundamental No- Larry Bretthorst, first publ. 1995 at http://bayes.wustl. tions of Quantum Theory (Springer-Verlag, Berlin, 1983). edu/etj (chapter 30, Maximum entropy: matrix formula- [43] A. Peres, Classical interventions in quantum systems. I. The tion, of the original version of the book, not included in measuring process,Phys.Rev.A61, 022116 (2000), arxiv. the printed one, is available at http://www.imit.kth.se/ org/quant-ph/9906023. ~mana/work/cc30e.pdf). [44] A. Peres, Classical interventions in quantum systems. II. Rel- [24] H. Grad, Statistical mechanics of dynamical systems with inte- ativistic invariance, Phys. Rev. A 61, 022117 (2000), arxiv. grals other than energy, J. Phys. Chem. 56, 1039–1048 (1952). org/quant-ph/9906034. [25] H. Grad, The many faces of entropy, Comm. Pure Appl. Math. [45] M. Šilhavý, The Mechanics and Thermodynamics of Continu- 14, 323–354 (1961). ous Media (Springer-Verlag, Berlin, 1997). [26] H. Grad, Levels of description in statistical mechanics and ther- [46] J. Serrin, Conceptual analysis of the second law of thermody- modynamics, in M. Bunge, ed., Delaware Seminar in the Foun- namics, Arch. Rational Mech. Anal. 70, 355–371 (1979). dations of Physics (Springer-Verlag, Berlin, 1967), pp. 49–76. [47] A. Peres, What’s wrong with these observables? (2002), [27] C. A. Truesdell, III and S. Bharatha, The Concepts and Logic of arxiv.org/quant-ph/0207020. Classical Thermodynamics as a Theory of Heat Engines: Rig- [48] P. G. L. Mana, Why can states and measurement outcomes orously Constructed upon the Foundation Laid by S. Carnot and be represented as vectors? (2003), arxiv.org/quant-ph/ F. Reech (Springer-Verlag, New York, 1977). 0305117. [28] J. R. Partington, An Advanced Treatise on Physical Chemistry. Vol. I: Fundamental Principles. The Properties of Gases (Long- Paper C

arXiv:quant-ph/0305117 v3 22 Sep 2004 r tde.Tefaeoki eea,ntrsrce to vectors restricted these not of general, is sets ta- framework the the The of storing, studied. properties or are Some organising, naturally of data. follows way ble’s space alternative vector real an system, a as quantum in vectors or out- as measurement classical comes and preparations generic of statisti- that, a representation containing the shown about table’, is data It ‘data cal experimental perspective. an different given slightly a from axioms. “reasonable” simple, latter quantum- some the of of characterises means outcomes by Hardy and only. states systems of mechanical description the to ∗ it l hti eti ipyavco fra numbers: real drawn of be can vector of conclusion a representation analogous the simply An for is state. the left is is this the that from all probabilities list, “redundant” quanti- the measured discarding different have ties”; relates theories which physical structure most some (infinite “since be to measure- over-complete, likely system. is all the and) probabilities on of for list performed” a probabilities such be However, “all possibly could of that list ments a by system’s acterised a that is [8], Peres vector real of language spaces. simple the in mechanics quantum ap- geometrically example, for pealing. their or, when simple true is particularly mathematics [6], is “behind” This physics Hardy the use- mechanics. better all quantum more understand certainly [5], to be but useful applications, very may Weigert practical are formalisms in these [1], 4], others than of Wigner ful Some [3, by [7]. Wootters others, Havel [2], among or Stapp given, more where complex differ, examples classical which the from formalisms less, through expressible lcrncades [email protected] address: Electronic thsln enkonta unu ehnc is mechanics quantum that known been long has It iia dai rpsdi h rsn ae,but paper, present the in proposed is idea similar A restricted not and general, very is framework a Such ad’ anie,wihwsarayepesdby expressed already was which idea, main Hardy’s of [6] formulation Hardy’s for case the certainly is This h a ttsadmaueetotoe erpeetda vectors? as represented be outcomes measurement and states can Why ASnmes 36.a 03.67.-a 03.65.Ta, numbers: PACS a.Sm rpriso h eutn eso hs etr r icse,a ela oeconnexions some natural as a well in as discussed, follows are space vectors vector these real formalism. of a quantum-mechanical sets the in resulting vectors the with of as properties outcomes Some measurement way. and states of sentation ti hw o,gvna“rbblt aatbe o unu rcasclsse,terepre- the system, classical or quantum a for table” data “probability a given how, shown is It .INTRODUCTION I. esrmn outcomes measurement ntttoe f¨ Institutionen state Hletsaebsdone; -Hilbert-space-based ,or preparation lcrm29 E144 it,Sweden Kista, 40 SE-164 229, Electrum oa nttt fTcnlg (KTH), Technology of Institute Royal rMkolkrnkohIfraintki (IMIT), Informationsteknik och Mikroelektronik or ,ischar- . Dtd 0My2003) May 20 (Dated: ir .L Mana L. G. Piero unu ehnc;cnein ihtelte r dis- are end. latter the in the cussed with connexions mechanics; quantum ubro ucmso l esrmnsis measurements all of outcomes of number o vr tt n o o vr ucm;tetable the outcome; every for row a ( and entry state every for table data ability following: number given the a imagine in different to system of this easy prepare is can physicist It A nature. unknown of I ihadffrn ubrof number different a with ann h outcome the taining { ueetadsaepeae;i a oklk h follow- the like look may it ing: prepared; state and table surement a down write can p physicist the- the some reasoning), necessarily, oretical not but possibly (and, experiments lyecuieadexhaustive and exclusive ally omo tagvnnme fmeasurements of number given a it on form 1 m r hscnawy eahee ygopn nsial asthe ways outcome suitable the necessary in if adding grouping “ and by measurement, the achieved of be outputs always can This ihtepoaiiisfreeyotoe o vr mea- every for outcome, every for probabilities the with 3 h al,wihcnb alda‘ a called be can which table, The or quantum, classical, be can which system, a Consider k other ,r I EOPSTO FADT TABLE DATA A OF DECOMPOSITION II. eed ntemaueeti usin h total The question. in measurement the on depends 4 ,r ,j i, ∗ ”. 5 } )( ftemeasurement the of preparations e.g. m m m 3 2 1 ,(4 ... r r r r r r r r‘aatbe o hr,hsacolumn a has short, for table’ ‘data or ’ L 6 5 4 3 2 1 , p p p p p p p ) steprobability the is 2)) s r L 61 51 41 31 21 11 1 i 1 ( ,or p r p p p p p p s 4 L 62 52 42 32 22 12 2 mn h osbeoutcomes possible the among , 2 outcomes states p p p p p p p s 1 L 63 53 43 33 23 13 3 ,weetest findices of sets the where ), 3 ... p m p p p p p p s L 64 54 44 34 24 14 4 { 2 4 s hntesse is system the when ) eprmna)prob- (experimental) . p ... p ... p ... p ... p ... p ... p ... s ... 1 { ,...,s r i : i LM 6 5 4 3 2 1 p M ∈ M M M M M M ij M I L ( m { } p Through . m k ,andper- 42 } k )ofob- } (mutu- ,each M 2 prepared in the state sj (s2). States and outcomes can The result is that, given a probability table for a sys- be listed and rearranged in any desired way in the table. tem, the states and the measurement outcomes for that Such a table would very likely have a large number of system can be represented as vectors, {sj} and {ri},in rows and columns, i.e.,thenumbersL and M are likely RK ,forsomeK, and the relative probabilities are given to be very large.2 by their scalar product. These vectors can be called state Suppose that the physicist wants to find a more com- vectors and outcome vectors. As already said, this kind pact, or simply different, way to write down and store the of representation has already been presented by Hardy [6] data in p. The table p is really just an L×M rectangular through a different line of reasoning. In particular, Hardy matrix, and as such it has a rank K, viz., the minimum supposes that it is possible to represent a state with a K- number of linearly independent rows or columns: dimensional vector4,withK  L, because “most physical theories have some structure which relates different mea- K def=rankp  min{L, M}. (1) sured quantities”; but the reasoning above shows that this possibility exists even before building up some the- It follows from linear algebra that p can be written as ory to describe the data. the product of an L × K matrix t and a K × M matrix The matrices t and u are not uniquely determined u3: from the decomposition (2), so that there is some free- dom in choosing their form. The fact that rank p = K, , p = tu (2) implies that there exists a square K ×K submatrix a,ob- tained from p by suppressing (L − K)rowsand(M − K) or columns, such that det a = 0. It is always possible to re- p11 ... p1j ... p1M ...... arrange the rows and the columns of the table p so that pi1 ... pij ... piM K ...... = such submatrix is the one formed by the first rows and p ... p ... p K L1 Lj LM columns. After this rearrangement (which, of course, t11 ... t1K does not imply any physical operation on the system), p ...... u11 ... u1j ... u1M ...... = ti1 ... tiK can be written in the following block form: ...... uK1 ... uKj ... uKM ⎛tL1 ⎞... tLK rT (3) ab ...1 p = with det a =0 , (5) ⎝ rT ⎠ s1 ... sj ... sM cd = ...i ( ) rT L where b, c,andd are of order K ×(M −K), (L−K)×K, In the last equation, the matrix t hasbeenwrittenasa and (L − K) × (M − K) respectively. block of row vectors rT,andthematrixu as a block of By writing also the matrices t and u in block form i column vectors si. In this decomposition, the element pij of p is then given by the matrix product of the row v , , T t = u = xy (6) vector ri with the column vector sj: w T K K×K L−K ×K pij = ri sj = ri · sj, ri, sj ∈ R , (4) where the orders of v, w, x,andy are ,( ) , K × K,andK × (M − K) respectively, we can rewrite where, in the last expression, the row ri and sj are con- Eq. (2) as sidered as vectors in RK , so that the matrix product is ⎧ equivalent to the scalar product. ⎪ a = vx ⎨⎪ ab v b = vy = xy , or (7) cd w ⎪ c = wx ⎩⎪ d = wy 2 Of course, one may wonder how often a physicist has to con- cretely deal with similar tables, or whether a similar table has with the solution5 ever been written down actually; yet, it cannot be denied that ⎧ this imaginary table conveys an idea of a part of that complex −1 ⎧ ⎪ v = ax − activity called “doing physics”. Its main purpose here is to give a ⎪ ⎪ ax 1 ⎨ −1 ⎨ completely operational background to the concepts presented. It w = cx t = −1 must also be remarked that it is not strictly necessary to speak or cx det x =0 , ⎪ −1 ⎪ about systems, states or preparations, and measurement out- ⎪ y = xa b ⎪ ⎪ ⎩ −1 ⎩ − u = xxa b comes: a similar table could be compiled by considering how d = ca 1b different aspects of a given phenomenon are (cor)related to each other in their various manifestations. The terms ‘system’, ‘state’, (8) ‘measurement’, and ‘outcome’ will nevertheless be used here for definiteness. 3 This is equivalent to the fact that a linear map p: RM → RL

K def p RM of rank =dim can be obtained as the composition 4 Hardy calls K the number of degrees of freedom of the system. p t ◦ u u RM → p RM 5 = of a surjective map : andaninjective The submatrix d of p is completely determined by the other map t: p RM → RL. submatrices a, b,andc because rank p = K. 3 where the square matrix two dimensions; or a tetrahedron, cube, prism, or gen- eral polyhedron in three, or a 600-cell in four, and so ... x =(s1 sK )(9)on [13]). Some of the states {s1,...,sM } are extreme points of this convex set (vertices of the polytope). Since is undetermined except for the condition of being non- every state vector can be written as a convex, hence lin- singular; this corresponds to the freedom of choosing K ear, combination of these extreme states, they must be K basis vectors in R as the representatives of the first atleastasnumerousasthebasisstates.ThenumberZ K 6 { ,..., } states s1 sK , which can then be called basis of extreme states (in quantum mechanics, they are called states. pure states) must then satisfy

Z  K. (11) III. THE SETS OF STATES AND OUTCOMES For the set of outcomes, the situation is slightly dif- The vectors by which states and outcomes are rep- ferent, and convex combination is not the only way in def resented belong to two subsets, S = {s ,...,sM } and which outcomes can be combined. Consider, as a con- 1 m R def { ,..., } RK crete example, the state sj and the measurements , = r1 rL respectively, of . It is interesting to m,andm with outcomes {r , r , r }, {r , r },and study some properties of these sets. Some of the following 1 2 3 4 5 {r6, r7} respectively, and the corresponding probabili- results have been obtained by Peres and Terno [9] in the ties; from them, the following additional measurements framework of quantum mechanics. The convex-related can be derived: properties of the sets are also well known [10, 11, 12, and references therein], but are often derived and expressed • The measurement which consists in performing through more elegant, and abstract, mathematics. m but considering only the set of two results In the following, the terms ‘vector’ and ‘point’ are used {(r1 or r3), r2}, with probabilities {(p1j +p3j),p2j} interchangeably. (a sort of coarse-graining).

• The measurement which consistes in perform- A. “Completion” of the sets ing m with probability λ,orm with prob- ability λ,orm with probability λ (obvi-    One can consider the convex hull of the set of states ously, λ + λ + λ = 1). The experimenter S: (who may not know which measurement will ac- tually be performed) expects thus one of the out- M M { , , , , , , } def comes r1 r2 r3 r4 r5 r6 r7 with probabilities Sc = λisi | si ∈ S, λi  0, λi =1 . (10)        {λ p1j,λp2j,λp3j,λ p4j,λ p5j,λ p6j,λ p7j} (of i=1 i=1 course, obtaining, e.g.,theresultr2 would imply that m was actually performed). An element of Sc like, e.g., λsj +(1− λ)sj ,with sj , sj ∈ S (and 0  λ  1), can be considered as a • Combinations of the two cases above. possible state, corresponding to a preparation in which sj or sj are chosen with probabilities λ or (1 − λ)re- From the examples just given, it is easy to see that, given spectively. This state should then be added to the table two outcomes ri and ri , not necessarily of the same p, with a respective column of probabilities. However, measurement, one can consider also the outcomes this column would be, for obvious reasons, just a linear combination of the columns under sj and sj ,withco-       λ ri + λ ri (0  λ + λ  1andλ ,λ  0), (12) efficients λ and (1 − λ); as a consequence, the rank of the table would be still K. For this reason, the set S and its and convex hull Sc can be used interchangeably in the con- siderations to follow7. Reasoning in terms of convexity,   one sees that Sc is just a convex (not necessarily regu- ri + ri (only if m = m ). (13) lar) polytope in a (K − 1)-dimensional (Euclidean) space (for example, a triangle, rhomboid, or general polygon in Note that Eq. (12) is not (always) a convex combination. The null vector (origin) o belongsthustothesetRc. These measurements and outcomes could be added to the table p as well, with their relative rows of probabil- 6 There is the alternative option of choosing the representatives ities; the rank of the table would nevertheless remain K of the first K outcomes; this corresponds to solving Eq. (7) in for the same reason given for the additional states. Thus, terms of v. R 7 This corresponds to “completing” the table p with this (infinite) also the set of outcomes can be ideally extended to a number of “additional” states; so M tends to infinity, but the set Rc by means of Eqs. (12) and (13). Again, R and Rc rank K remains constant. will be referred to interchangeably in the following. 4

B. The set of states lies in a hyperplane In particular, it is interesting to study how the boundary of S determines the region wherein R is constrained to  R Consider a state sj ∈ S and all the outcomes {ri : i ∈ lie (though may be a proper subset of this region).   p  p · Im } of a given measurement m , where the sets of indices Since 0 1 for every probability = r s, one has  Im depends on m . Since the outcomes are exhaustive and mutually exclusive, their probabilities must sum up to unity: 0  r · s  1. (19) The formula above, with s considered constant and r pij = (ri · sj)= ri · sj =1. (14) variable, defines a region in RK delimited by the parallel i∈I i∈I i∈I m m m hyperplanes r·s =0andr·s = 1. These hyperplanes are The equation above, when considered for the first K both perpendicular to the vector s, and pass respectively K states {s1,...,sK }, which form a basis for R , uniquely through the origin o and through the point n,sincen·s = m determines the vector sum of the outcomes of ,de- 1 by Eq. (16). There is one such region for every state  def s ∈ S i . The outcome vectors must be confined to lie in the noted by n = i ∈Im r , by its projections along the basis vectors. On the other hand, this happens for the intersection of all these regions. However, if an outcome m vector r lies inside the regions determined by two states outcomes of any measurement . Hence, the sum of the   outcome vectors {ri: i ∈ Im} of any measurement m is a s and s , it must also lie in that determined by any state λ  − λ  constant vector s +(1 )s which is their convex combination; i.e., if 0  λ  1, then n ≡ ri, for any m, (15)  i∈Im 0  r · s  1 ⇒  · λ  − λ   .  ·   0 r [ s +(1 )s ] 1 (20) such that 0 r s 1

T n s ≡ n · s =1. (16) This is easily proven observing that, since r · s, r · s, λ, and (1−λ) are non-negative, then λ r·s+(1−λ) r·s  0; The vector n (which is an extreme point of R )maybe     c and since r · s , r · s  1, then λ r · s +(1− λ) r · s  called the trivial-measurement vector, since it also repre- λ +(1− λ)=1. sents the outcome of the trivial measurement having only Hence, one needs to consider only the regions deter- a single (and therefore certain) outcome for any state. mined by the Z extreme states, which are the only ones The actual components of this vector are determined by not expressible as convex combinations of other states. the choice of the matrix x. In fact, Eq. (16) holds, in Now let {si1 ,...,siZ } be the extreme states. Consider particular, for every basis state {s1,...,sK },andtheset the Z hyperplanes of equations r · si =0,j =1,...,Z: of these K equations can be written, with the help of j they delimit a convex cone with vertex in the origin formula (9), in the following matrix form: o. Another convex cone, with vertex in n, is delim- K elements ited by the other Z hyperplanes of equations r · si =1, j T T j ,...,Z 9 n x = q , with q = 11... 1 . (17) =1 . The intersection of these two cones finally determines the maximal, K-dimensional, convex region Since x is non-singular, one finds which can be occupied by the set of outcomes R. nT = qTx−1, (18) so that x determines n, as asserted. D. ‘One-shot’ distinguishability Finally, the equation (16), where n is now considered constant, must be satisfied by every state s; this is the m equation, in vector form, of a (affine) hyperplane nor- Suppose that there exists a measurement that allows mal to n. Hence, the set of states S lies in a (K − 1)- one to tell with certainty which state, from a given set K D { ,..., } dimensional (affine) hyperplane in R . of states sj1 sjD , is actually prepared. These statescanthenbecalled‘one-shot distinguishable’. This means that the table p must contain a subtable of the C. The set of states determines the maximal form (obtained, if necessary, by re-listing states and out- possible extension of the set of outcomes comes):

The sets S and R can thus be viewed as compact, con- vex regions in RK . Their boundaries are interrelated8. 9 It can be shown that the cones are symmetric with respect to the point (center of symmetry) n/2: indeed, if r·s = 0 or 1 for some r and s,then[n/2+(n/2−r)]·s ≡ n·s−r·s =1−r·s = 1 or 0, where n/2+(n/2 − r) is the point symmetric to r with respect 8 They are dual [10]. to n/2. 5

 K   K si1 si2 ... siD of states S ∈ R of p , and the set of states S ∈ R  mri1 10... 0 of p . The map should preserve convex combinations of state vectors11; this implies that its most general form is ri2 01... 0 ...... that of an affine map, which can be written as r ...    iD 00 1 s → s = Fs + g, (23)

D × D   K or, in other words, the matrix p has a square where F is a K × K matrix and g ∈ R acolumn D × D submatrix equal to the identity matrix ED.This vector. implies that It turns out, however, that such a map is always ex- pressible as a linear transformation between the state rank p ≡ K  D. (21) S S S vectors in and . This is due to the fact that lies in an affine hyperplane of RK , and is simply proven by It can be proven that, if some of the states  considering Eq. (16) for the set S : {sj1 ,...,sjD } are not extreme, they can be replaced by T    extreme states having the same distinguishability prop- n s =1, s ∈ S , (24) erty [6, Sec. 6.12]. Moreover, if the {sj1 ,...,sjD } are one-shot distinguishable extreme states, then the con- where n is the trivial-measurement vector of R,and vex combination of any (D − 1) of them must lie on the then defining boundary of S; a simple proof of this fact for the case of three one-shot distinguishable extreme states is given C def= F + gnT, (25) in Appendix A. This implies that, if S contains D one- shot distinguishable states, there must exist a (D − 1)- which is a K × K matrix representing a linear trans- dimensional hyperplane such that its intersection with S formation. By Eq. (24) one finds is a (D − 1)-dimensional simplex. In particular, this is also true for the set of states Cs = Fs + gnTs = Fs + g, (26) in quantum mechanics. For example, the set of states of a three-level system, which has at most three dis- as asserted. tinguishable states, shows triangular two-dimensional Every such a map C between states induces a dual R ∈ RK  sections [14, Fig. 2]; for a four-level system, a three- map from the set, ,ofoutcomesofp to that, dimensional section of the set of states yields a tetra- R ∈ RK ,ofp: hedron, and so some two-dimensional sections must have T T T   T  triangular as well as trapezoidal shapes [15, Fig. 1]. r → r = r C, or r → r = C r . (27) The maximum number of one-shot distinguishable states for a given system with table p will be denoted The dual map must, for obvious reasons, map the trivial-  R by N;10 from Eqs. (11) and (21) it follows that measurement vector n of to the trivial-measurement vector n of R: Z  K  N. (22) CT n = n, (28) Note that, if K = N,thenZ = K ≡ N as well. This is because the N distinguishable states can be chosen to Which represents a constraint on the possible form of C. be extreme, and then they are the vertices of a (N − 1)- An analysis of the characteristics of the set of possi- dimensional simplex; but this must be all of S,sinceS ble maps between system states, analogous to that con- S R is (N − 1)-dimensional (for K = N) and convex. (In this ducted on the sets and , would be very interesting, case, R is a K-dimensional hypercube.) but it will not be pursued here. Such analysis could offer new perspectives (complementary to the ‘tensor-product based’ one) to study the relation system-subsystem12 and IV. MAPS BETWEEN SETS OF STATES the question of the complete positivity of superopera- tors [16, 17, 18, 19]. There is, of course, more than just one system; the transformations and relations among systems (like the relation system-subsystem) are extremely important. 11 This map, however, could in principle be partial, i.e., not defined A transformation or relation between a system with for all s ∈ S; this is because there may be, e.g., a transformation   table p and another with table p can be in some cases device that cannot take just any state of S as input.   expressed by means of a map S → S between the set 12 The fact that p is a subsystem of p canbeexpressedbysay- ing that there exists a surjective, non-injective map from S to S. Intuitively, this is because every preparation of a system is also a preparation for one of its subsystems, and a preparation of a subsystem may correspond to different preparations of the 10 Hardy calls N the dimension of the system. system it is part of. 6

V. HARDY’S AXIOMATICS AND QUANTUM they are not; for example, in the presence of superselec- MECHANICS tion rules, where, roughly speaking, some “portions” of the quantum mechanical set of states SQM are actually The formalism just presented is quite general, con- missing, i.e., are not observed (the same happens for the R taining the quantum mechanical one as a particular case quantum mechanical set of outcomes QM). (the relation with the trace rule is quickly shown in Ap- A related question is whether, given a generic prob- pendix B), and is essentially the same as Hardy’s, prior ability data table p, the set of states S and the set of to the introduction of his axioms. The effect of the lat- outcomes R derived from it can be “embedded” in some S R ter [6] is to characterise classical and quantum systems larger sets QM and QM satisfying the quantum me- by means of constraints on the set of states and that of chanical constraints. If this were always possible, then outcomes. quantum mechanics would be “always right”, just be- A classical system is characterised, in Hardy’s axiomat- cause every every experimental data table could then al- ics, by the fact that the number of one-shot distinguish- ways be described by quantum mechanical means. able states is maximal, i.e., N = K; as a consequence, the number of extreme states is Z = K = N as well, as shown in Sec. III D. VI. DISCUSSION A quantum mechanical system, instead, is charac- terised by the supposed existence of continuous reversible Using the conceptual tool of an imaginary ‘probability transformations between extreme states13. This implies data table’ associated to a system, it has been shown that the extreme vectors of a quantum mechanical set that the states and measurement outcomes of the system of states SQM form a continuum (Z =cardR); they can be represented as vectors in a real vector space. This satisfy also a quadratic equation which determines the representation simply follows from the decomposition of “shape” (modulo isomorphisms, in the convex sense) of the table, and in this context the rank of the latter, K, SQM. Moreover, the maximal number of one-shot dis- has a peculiar rˆole. This framework is very similar in tinguishable states is characterised by K = N 2.Some spirit to Hardy’s framework before the introduction of remarks can be made on these features. his axioms, and may thus elucidate some features of the A continuum of extreme states can never be observed latter. in practice [22, Chap. 9], but is a useful approxima- Some properties of the sets of states and outcomes, for tion or inductive generalisation which makes the pow- a generic classical or quantum mechanical system, have erful tools of analysis available for doing physics. The then been analysed in simple geometrical terms. same approximative or inductive step is indeed taken also Finally, some points have been discussed concerning in classical physics (think of classical phase space): in the characteristics of the sets of states of a quantum me- this case the distinguishable states, and hence the ex- chanical system as formalised by Hardy. In particular, it treme states as well, are supposed to form a continuum has been argued that the origin of many typical quantum (N = Z =cardR). From this point of view the contin- mechanical features lies not specifically in the continuity uum assumption of quantum mechanics is more econom- of the extreme states, or in the relation K = N 2 between ical than the one of classical physics. K and the number of ‘one-shot distinguishable’ states N, On the other hand, it is to be noted that most (if but simply in the fact that the number of extreme states not all) typical features of quantum mechanics arise not is greater than the number of ‘one-shot distinguishable’ from the continuity of the extreme states, but from the states. fact that there are more extreme states than distinguish- able ones, i.e.,fromZ>N(which implies K>N). This can be seen from the examples given by Kirk- Acknowledgments patrick [23, 24, 25] (see also Ref. 26). Note, however, that in those examples one finds that K = N 2,andZ, The author would like to thank Professor Gun- K, N are finite.14 nar Bj¨ork for advice and useful discussions, and Asa˚ It is reasonable to ask, from this point of view, to Ericsson for a useful discussion. what extent the above-mentioned quantum mechanical constraints on the sets of states and outcomes are ac- tually observed in practice. There are cases in which Appendix A

In Sec. III D it was stated that, if {sj1 ,...,sjD } are one-shot distinguishable extreme states, then the con- 13 Hardy adopts the old saying “natura in operationibus suis non vex combination of any (D − 1) of them must lie on the facit saltum” as expressed by Tissot [20] (see also von Linn´e [21]). boundary of S. A simple proof for three one-shot distin- 14 It would be interesting to develop, and study the properties of, { j , j , j } a general mathematical formalism having no constraints on the guishable extreme states s 1 s 2 s 3 is the following. sets S and R or on the numbers Z, K, N, and thus containing First, the condition for a point s ∈ S to belong to the the classical and the quantum-mechanical as special cases. boundary is that there exists at least one point s∗ ∈ S, 7 s∗ = s, such that, writing s as a convex combination s = Appendix B: THE TRACE RULE μs∗ +(1− μ)s∗∗ (0  μ  1) of s∗ and some other point μ s∗∗, implies that =0(i.e., the combination cannot be It is shown that the ‘scalar product formula’, Eq. (4), proper). Intuitively, this means that straight lines cannot includes also the ‘trace rule’ of quantum mechanics (cf. be drawn from the point s in just any direction, if they Hardy [6, Sec. 5]; also Weigert [5]). S are to remain inside . A state is usually represented in quantum mechanics Now, consider a point s given by a convex combination by a density matrix ρˆj, and a measurement outcome by of sj1 and sj2 : a positive-operator-valued measure element Aˆi;bothare Hermitian operators in a Hilbert space of dimension N. s ≡ λsj1 +(1− λ)sj2 , 0  λ  1. (A1) The probability of obtaining the outcome Aˆi for a given j since sj1 , sj2 ,andsj3 are one-shot distinguishable, there measurement on state ρˆ is given by the trace formula must exist a measurement outcome r such that pij =trAˆiρˆj. (B1) r · sj1 = r · sj2 =0, (A2)

r · sj3 =1, (A3) The Hermitian operators form a linear space of real di- mension K = N 2; one can choose N 2 linearly indepen- and Eqs. (A1), (A2) yield dent Hermitian operators {Bˆk} as a basis for this linear space. These can also be chosen (basically by Gram- · λ · − λ · . r s = r sj1 +(1 )r sj2 =0 (A4) Schmidt orthonormalisation) to satisfy

Now write s as a convex combination of the vector sj 3 tr BˆkBˆl = δkl. (B2) with some other vector s∗∗: Both ρˆj and Aˆi can be written as a linear combination of the basis states s = μsj3 +(1− μ)s∗∗, 0  μ  1. (A5) K K From Eqs. (A2), (A4), and (A3), one obtains l k ρˆj = sj Bˆl, Aˆi = ri Bˆk, (B3) l=1 k=1 0=r·s = μr·sj3 +(1−μ)r·s∗∗ = μ+(1−μ)r·s∗∗ (A6) s l r k which can be satisfied only if μ = 0 (which implies where the coefficients j and i are real. Using Eqs. (B3) and (B2) the trace formula becomes s∗∗ = s). Thus, the vector sj3 plays the role of s∗ in the condition given above, and so any s = λsj +(1− λ)sj 1 2 K K lies in the boundary of S. An analogous proof holds k l l l pij ˆi ˆj ri sj ˆk ˆl ri sj i · j, for any convex combination of any two of the vectors =trA ρ = tr B B = = r s k,l=1 l=1 {sj1 , sj2 , sj3 }, and this implies that the latter are ver- tices of a triangle which is part of the boundary of S. (B4) def K def K The generalisation to more than three vectors is where ri = ri1 ...ri and sj = sj1 ...sj are vec- straightforward. tors in RK .

[1] E. Wigner, Phys. Rev. 40, 749 (1932). tional Laboratory (2003), quant-ph/0304159. [2]H.P.Stapp,Phys.Rev.D3, 1303 (1971). [13] H. S. M. Coxeter, Regular Polytopes (Dover, New York, [3] W. K. Wootters, Found. Phys. 16, 391 (1986). 1948), 3rd ed. [4] W. K. Wootters, Annals of Physics 176, 1 (1987). [14] G. Kimura, The Bloch vector for N-level systems (2003), [5] S. Weigert, Phys. Rev. Lett. 84, 802 (2000), quant- quant-ph/0301152. ph/9903103. [15] L. Jak´obczyk and M. Siennicki, Phys. Lett. A 286, 383 [6] L. Hardy, Quantum theory from five reasonable axioms (2001). (2001), quant-ph/0101012. [16] P. Pechukas, Phys. Rev. Lett. 73, 1060 (1994). [7] T. F. Havel, The real density matrix (2003), quant- [17] P. Stelmachoviˇˇ candV.Buˇzek, Phys. Rev. A 64, 062106 ph/0302176. (2001), quant-ph/0108136. [8] A. Peres, Quantum Theory: Concepts and Methods, [18] A. Peres and D. R. Terno, Quantum information and vol. 72 of Fundamental Theories of Physics (Kluwer Aca- relativity theory (2002), quant-ph/0212023. demic Publishers, Dordrecht, 1995). [19] A. Peres and D. R. Terno, Quantum information and [9] A. Peres and D. R. Terno, J. Phys. A: Math. Gen. 31, special relativity (2002), quant-ph/0301065. L671 (1998). [20] J. Tissot, Discours de la vie et mort du g´eant Teuto- [10] P. Busch and P. J. Lahti, Phys. Rev. D 29, 1634 (1984). bochus, Lyon? (1613?); seemingly cited in E.´ Fournier, [11] P. Busch, Phys. Rev. D 33, 2253 (1986). L’Esprit des autres (Dentu, Paris, 1857), 3rd ed., and [12] H. Barnum, Tech. Rep. LA-UR 03-1199, Los Alamos Na- in W. F. H. King, Classical and foreign quotations: A 8

polyglot dictionary of historical and literary quotations, ability (2001), quant-ph/0106072. proverbs, and popular sayings (J. Whitacker and Sons, [24] K. A. Kirkpatrick, Classical three-box “paradox” (2002), London, 1904). quant-ph/0207124. [21] C. von Linn´e (Carolus Linnaeus), Philosophia botanica [25] K. A. Kirkpatrick, Hardy’s second axiom is insufficiently in qva explicantur fundamenta botanica cum definition- general (2003), quant-ph/0302158. ibus partium, exemplis terminorum, observationibus rari- [26] P. G. L. Mana, The properties of the Shannon entropy orum, adjectis figuris aeneis. (Grefing, Stockholm, 1751). are not violated in quantum measurements (2003), quant- [22] H. Jeffreys, Scientific Inference (Cambridge University ph/0302049. Press, London, 1931), 2nd ed. [23] K. A. Kirkpatrick, “Quantal” behavior in classical prob- Paper D

arXiv:quant-ph/0403084 v1 10 Mar 2004 fMenk[,4 ,6,Fui n adl ta.[,8 ,1,1] anm[2,adothers. and [12], Barnum 11], 10, 9, 8, [7, al. et Randall and Foulis 6], 5, 4, [3, Mielnik of a so h anie fa‘al eopsto’ n nteipiain fthe of implications paper the this Conference. on in this and in decomposition’, emphasis discussed ‘table The topics a various of details. for idea latter further main for the referred on is is in Reader issues the ‘foundational’ which current the of some at offer may looking fact mechanics. for curious quantum view This of represen- [1]. points Hardy’s outcomes’ for ‘measurement new which, as way and same ‘states’ a the that of in essentially find tation results, we is and way phenomena, preparations compact more mechanical to a associated quantum in be data a can table’s in vectors the organise real data or store table’s to this try we store to want proceed? also we quantum, we could classical, collected How that be data way. can Suppose statistical which compact of the else. nature table, the something of — or sort experiments of a group in a written, from have we that Suppose Introduction 1 oevr ayaaois icse nRf ,aet efudbtenti okadthose and work this between found be to are 2, Ref. in discussed analogies, many Moreover, n fdcmoigti al nacmatwy ed oasotu o Hardy’s for shortcut a to issues. leads foundational way, on system, compact classical perspectives a or new quantum in gives a table and for this data formalism, probabilistic decomposing of of table a and writing of idea The h da eepeetdaeasmayo hs eeoe nRf ,to 2, Ref. in developed those of summary a are presented here ideas The when above, described situation the given that, shown is it paper this In ntttoe f¨ Institutionen sfodgtn2,S-6 0Ksa Sweden Kista, 40 SE-164 22, Isafjordsgatan ugiaTkik H¨ Tekniska Kungliga rmkolkrnkohinformationsteknik, och mikroelektronik or mi:[email protected] Email: rbblt tables Probability ir .Lc Mana Luca G. Piero 1 asAlsvradl ed nMenschen- in Menschen- denken! werde Menschen-Sichtbares, F verwandelt Wahrheit, zur Denkbares, Alles Wille euch bedeute dass diess Aber — hbrs ueege in ol h uEnde zu ihr sollt Sinne eignen Eure uhlbares! ¨ ogskolan, a Nietzsche 2 Probability tables

Imagine that we are in a laboratory, performing experiments of various kinds to study some interesting phenomena; the purpose of the experiments is to statistically study the correlations among different kinds of these phenomena. In general, we try to reproduce a given phenomenon — either by controllably preparing it at will, or simply by waiting for its occurrence —, to observe which concomitant phenomena, or results, occur.

Some experiments present common features: for example, part of the preparation can be the same for some of them. We separate ideally each experiment into a preparation and an intervention; the latter also delimits the kind of results which can be obtained, which implies that if we are told a result, we know which intervention was made. We then consider a set of preparations and a set of interventions, with the clause that sensible experiments may be made by combining each of the preparations with each of the interventions (preparations or interventions which do not satisfy this condition are set aside for the moment).b

Thus, suppose that we have M different preparations {S1,...,S M },and a given number of possible interventions {M1, M2,...,Mk,...},eachwitha IM {R , R ,...,RI } different number k of results 1 2 Mk (mutually exclusive and c I exhaustive ), where the number Mk depends on the particular intervention Mk. The total number of results, counted from all interventions, is L.

Through repetitions of the experiments, or through theoretical assump- tions, or just by analogy with other experiments which we have already seen and which we judge similar to those that are now under study, we can write down a table p with the probabilities that we assign to every result, for every intervention and preparation. The table may look like the following:

bNote that a preparation does not need to temporally precede an intervention; indeed, these two notions are meant to have here only a logical, not temporal, meaning. For example, in quantum-mechanical experiments with post-selection, the preparation is effectively com- pleted after the intervention is made! cThis can always be achieved by grouping in suitable ways the results, and adding if necessary the result “other”.

2 S 1 S2 S 3 S 4 ... S M R1 p11 p12 p13 p14 ... p1M M1 R2 p21 p22 p23 p24 ... p2M R3 p31 p32 p33 p34 ... p3M M2 R4 p41 p42 p43 p44 ... p4M R5 p51 p52 p53 p54 ... p5M M3 R6 p61 p62 p63 p64 ... p6M ......

RL pL1 pL2 pL3 pL4 ... pLM

The table, which may be called a ‘probability table’, has a column for each preparation and a group of rows for each intervention, and these rows are the possible results of the intervention. The table entry pij is the probability of obtaining the result Ri for the intervention Mki and the preparation Sj.For example, the entry (4, 3) is the probability p43 the we assign to obtaining the result R4, among the possible results {R3, R4, R5}, for the intervention M2 and the preparation S3. Preparations and results can be listed and rearranged in any desired way in the table. Such a table would very likely have a large number of rows and columns, i.e., the numbers L and M are likely to be very large. Now, suppose that we seek a more compact way to write down and store the probability data collected in the table p. We note that the table is really just an L × M rectangular matrix, and as such it has a rank K, viz., the minimum number of linearly independent rows or columns:

def K =rankp  min{L, M}. (1)

It follows from linear algebra that p can be written as the product of an L × K matrix t and a K × M matrix u:d

p = tu, (2)

⎛ ⎞ or T p11 ... p1j ... p1M r ...... 1 pi1 ... pij ... piM ⎝ rT ⎠ s1 ... sj ... sM ...... = ...i ( )(3) pL1 ... pLj ... pLM T rL In the last equation, the matrix t has been written as a block of row vectors T ri ,andthematrixu as a block of column vectors si. In this decomposition, d p RM → RL K def p RM This is equivalent to the fact that a linear map : of rank =dim p t ◦ u u RM → p RM can be obtained as the composition = of a surjective map : and an injective map t: p RM → RL.

3 T the element pij of p is then given by the matrix product of the row vector ri with the column vector sj: T pij = ri sj = ri · sj , (4) K where, in the last expression, ri and sj are considered as vectors in R ,so that the matrix product is equivalent to the scalar product. It will be shown in a moment that the decomposition is always effective in reducing the number of data of the table. K The result is that we can associate vectors {sj} and {ri} in R ,forsome K, to the preparations and the intervention results for the table, and the relative probabilities are given by their scalar product:

pij ≡ ri · sj. (5) These vectors can be called preparation vectors and (intervention-)result vec- tors, or, in general, (table) vectors. It should be remarked immediately that we are not postulating any kind of physical property in Eq. (5); even less have we found one. We have only decided to represent and store a collection of numbers (which do have physical significance) in an alternative way. Note, in particular, that the numerical values of the vectors {sj} and {ri}, as well as their dimension K, depend on the whole collection of probabilities {pij }: if one of these is changed, then K and all the vectors will in general change. The meaning of the representation (5) shall be discussed in a moment, but let us study the decomposition in more detail first, in order to have a clearer idea of how the table vectors originate. The matrices t and u are not uniquely determined from the decomposi- tion (2), so that there is some freedom in choosing their form. The fact that rank p = K, implies that there exists a square K × K submatrix a, obtained from p by suppressing (L−K)rowsand(M −K) columns, such that det a =0. It is always possible to rearrange the rows and the columns of the table p so that this non-singular submatrix is the one formed by the first K rows and K columns. After this rearrangement, p can be written in the following block form: ab p = with det a =0 , (6) cd where b, c,andd are of order K×(M −K), (L−K)×K,and(L−K)×(M −K) respectively. By writing also the matrices t and u in block form v t = , u = xy , (7) w

4 where the orders of v, w, x,andy are K × K,(L − K) × K, K × K,and K × (M − K) respectively, we can rewrite the decomposition equation (2) as ab v = xy , (8) cd w or

a = vx, b = vy, c = wx, d = wy, (9) which has the solutione det x =0 , y = xa−1b, v = ax−1, w = cx−1, (10) which in terms of the result and preparation vectors is

−1 (s ... sK )=x, (sK ... sM )=xa b, (11a) 1 ⎛ ⎞ ⎛+1 ⎞ r1 rK+1 ⎝...⎠ = ax−1, ⎝ ... ⎠ = cx−1. (11b) rK rL

The square matrix x =(s1 ... sK ) is undetermined except for the condition of being non-singular; this corresponds to the freedom of choosing K basis vectors K f in R as those associated to the first K preparations S 1,...,SK ,whichcan then be called basis preparation-vectors. Note that for a given probability table p the set of basis preparation-vectors is in general non-unique, because p possesses in general many submatrices like a with rank K. We can now check whether the above decomposition effectively reduces the number of data to be stored. The table p has L × M entries. The matrices x, y, v, w (or equivalently the collection of vectors {sj} and {ri}) have in total K × (L + M) entries; however, we are free to choose a ‘canonical’ form for the non-singular K × K matrix x (e.g., the identity matrix) once and for all, for all probability tables (this corresponds to a standard choice of basis vectors in RK ). This implies that we are indeed left with only K × (L + M) − K2 numbers. It is then simple to verify that, from K  min{L, M},wehave K × (L + M) − K2  L × M. (12) Thus the decomposition of the probability table into two sets of vectors is indeed a more compact way to write down and store the probability data. eNote that d = ca−1b, i.e., the submatrix d of p is completely determined by the other submatrices a, b,andc, because rank p = K. fThere is the alternative option of choosing the vectors associated to the first K results; this corresponds to solving Eq. (9) in terms of v.

5 2.1 A numerical and graphical example Imagine that, with the various apparatus in our laboratory, we can make seven different preparations {S 1,...,S 7} and perform three different interventions {M1, M2, M3}, each having two results. The probabilities that we assign to the various results are given in the following table:

S 1 S2 S 3 S 4 S 5 S6 S 7 R 1 1 3 1 3 M 1 1 2 0 2 4 2 4 1 R 1 1 1 1 1 2 0 2 1 2 4 2 4 R 1 1 3 1 1 M 3 2 1 2 0 4 2 2 2 R 1 1 1 1 1 4 2 0 2 1 4 2 2 R5 1 1 1 1 1 1 1 M3 R6 0 0 0 0 0 0 0

The matrix corresponding to this probability table has rank K =3,and the first 3 × 3 submatrix a is indeed already non-singular, without the need of rearranging the rows or columns of the original table. Proceeding as in the decomposition equations (9) and (10) with the following choice for the matrix x: ⎛ ⎞ 11 1 x ≡ ⎝10−1⎠ , (13) 01 0 we finally obtain the following preparation and result vectors: , , , s1 =(1 1 0) 1 1 r1 =( , , 0), s =(1, 0, 1), 2 2 2 1 , − 1 , , r2 =(2 2 0) s3 =(1, −1, 0), 1 , , 1 , r3 =(2 0 2 ) s4 =(1, 0, −1), (14) 1 , , − 1 , 1 1 r4 =(2 0 2 ) s5 =(1, , ), 2 2 r =(1, 0, 0), s =(1, 0, 0), 5 6 , , , , 1 , , r6 = O =(0 0 0) s7 =(1 2 0) represented as points of R3 in Fig. 1. We notice that all preparation vectors lie on the same plane (x = 1); this is a general property of any probability table, which derives from the fact that the results of an intervention are mutually exclusive and exhaustive; this also implies that an intervention’s result-vectors always sum up to the same vector [2]: in this case, r1 + r2 = r3 + r4 = r5 + r6 =(1, 0, 0).

6 O z y r3 r 2 x r1 r4

s2 s3 s5 r5=s6 s7 s1 s4

Figure 1: Preparation and result vectors (and their convex hulls) in R3,from the example. O is the origin; x, y, z are the directions of the Cartesian axes.

3 Relation to quantum mechanics

Probability tables like the one illustrated above can be made, in particular,for phenomena concerning classical or quantum systems. Indeed, the resulting as- sociation of vectors to preparations and results is analogous to that introduced by Hardy [1], through a different line of reasoning, for ‘states’ and ‘measure- ment outcomes’ of classical and quantum systems.

For example, imagine how a table for a two-level quantum system would appear; consider for concreteness the polarisation of a single photon. We can prepare the photon with different polarisation directions and with different degrees of polarisation; hence the table has in the limit a continuum of columns, one for each preparation. Analogously, we can perform some interventions on the photon by placing various polarisation filters on its way, controlling if it is absorbed or not; in the limit, the table has also a continuum of rows, one for each intervention result. Some entries of the table would look like the following (the meaning of the symbols should be self-evident):

7 ... S 0◦ ... S 45◦ ...... ◦ R45 ... 1 ... 1 ... M ◦ out◦ 2 45 R45 ... 1 ...... abs 2 0 ...... ◦ R60 ... 1 ... 0.933 ... M ◦ out◦ 4 60 R60 ... 3 ...... abs 4 0 067 ......

Such a table has, notwithstanding the limiting infinite size, rank K =4, and so we can associate to every preparation and to every result a 4-dimensional vector. The preparation vectors, however, lie on a 3-dimensional (affine) hyper- plane, as in the numerical example previously discussed. It can be shown indeed [1, 2] that the resulting set of preparation vectors is equivalent to the standard Bloch-sphere for two-level quantum systems. This was just an example, but all quantum-mechanical concepts like den- sity matrix, positive-operator-valued measure, and completely positive map can be expressed in an equivalent table-vector formalism [1, 2] (e.g., the rela- tion with the trace rule is quickly shown in the appendix).

4 Discussion

The following discussion concentrates on the possible relations among the ideas hitherto presented and ideas presented by other authors in this Conference. The Reader is referred to Ref. 2 for a more general and detailed discussion.

4.1 On the probabilities in the table and ‘quantum logics’

In Sect. 2, we quickly introduced the preparations, interventions, and results {Sj}, {Mk}, {Ri}, and the probabilities {pij} which make up the table. Let us define them more clearly. Symbols like S i and Mk represent propositions which together describe an actual, well-defined procedure to set up an experiment, e.g. S i =‘The diode laser is put on the table is such and such position, with a beam splitter in front of it in such and such position. . . ’ etc.,g and Mk =‘Such and such vertical filter is placed in such and such place, and the detector is placed behind it...’ etc. The separation of the experiment’s description into the two propositions is not unique, and indeed more kinds of separations can be considered.h gAnother example: ‘Wait until such and such happens, then. . . ’ etc. hIn Ref. 2, this fact is used as the starting point to define the concept of transformation.

8 A symbol like Rj represents a proposition which describes the results of an experiment, e.g. Rj =‘The detector does not click’; it is then clear that it depends on the particular experiment being performed. The probability pij is consequently defined as

def pij = P (Ri| Mki ∧ S j ∧ Q), (15) where Q is a proposition representing the rest of the experimental details and our prior knowledge.i We are thus considering probability theory as extended logic [13, 14, 15, 16], an approach which will prove to be, in the following, powerful, flexible, and intuitive at once. Usually, many repetitions of the same experiment are made and the relative frequencies of the different results of the intervention are observed. In this case, given a judgement of total exchangeability of the experiment’s repetitions, the probability is practically equal to the observed frequency of the result, thanks to de Finetti’s representation theorem [17]. But the probability can also be assigned on grounds of similarity with other experiments, or just by theoretical assumptions. Note that the table is silent with regard to the probabilities for the prepa- rations or the interventions, P (S j| Q)andP (Mk| Q). In a given ‘situation’ Q, if we decide which preparation and intervention to perform, say S5 and M7, then these probabilities are P (S i| Q)=δi,5 and P (Mk| Q)=δk,7 of course (which amounts to saying that “we know what we’re doing”). If it is someone else who is deciding the particular preparation and intervention, then the prob- abilities must be assigned in some other way, e.g. by asking “which preparation are you making?”, or by using some other knowledge, and they will in general differ from 0 and 1. The difference between these probabilities and those of Eq. (15) is somehow analogous to the difference between initial conditions and equations of motion in classical mechanics: the theory concerns only the latter, while the former has to be specified on a case-by-case basis. The probabilities of the results of a given intervention and a given prepa- ration form a probability distribution, because the results were arranged so as to be mutually exclusive and exhaustive. This implies that, for two results Rk  and Rk of a given intervention Mk and for a given preparation S j,wealsohave the trivial identity

  P (Rk ∨ Rk| Mk ∧ S j ∧ Q)=P (Rk| Mk ∧ S j ∧ Q)+P (Rk| Mk ∧ S j ∧ Q),  =(rk + rk) · sj. (16) iA ‘subjective Bayesian’ can see Q as representing the ‘agent’.

9 But with probability theory as logic we can also evaluate, for a given prepa-   ration S j, the disjoint probability for the results R and R of two different   interventions M and M , just using the product and sum rules:

        P [R ∨ R | (M ∨ M ) ∧ S j ∧ Q]=P [(R ∧ M ) ∨ (R ∧ M )| S j ∧ Q],    = P (R | M ∧ S j ∧ Q) × P (M | S j ∧ Q)+    P (R | M ∧ S j ∧ Q) × P (M | S j ∧ Q),     =[r P (M | S j ∧ Q)+r P (M | S j ∧ Q)] · sj, (17)

  where it is assumed that P (M ∨ M | S j ∧ Q) = 1, i.e., we are sure that one or the other intervention was performed. The content of the formula above is intuitive: the occurrence of the result   R implies that the intervention M was performed, and so analogously for   the result R and the intervention M . Then the probability of getting the one or the other result depends in turn on the probability that the one or the other intervention was made, hence these probabilities appear in the last line of the above equation. However, as already said, the probabilities of these interventions are not contained in the table, but must be given on a case-by- case basis. As a result, the two disjoint probabilities in Eqs. (16) and (17) ‘behave’ dif- ferently, and the reason for this is intuitively clear. However, we can partially trace in this fact the source of much research and discussion on partially or- dered lattices and quantum logics for the set of intervention results [18, 19, 20]. Roughly speaking, the point is that, for a classical system, there is the theoret- ical possibility of joining all possible interventions (measurements) in a single “total intervention”: the table associated to a classical system can then be considered as having only one intervention; thus one needs never consider the case of Eq. (17). However, such “total intervention” is excluded in quantum mechanics, and one is thus forced to consider the case of Eq. (17). From the point of view of probability theory as logic, instead, there is no need for non-Boolean structures thanks to the possibility of changing and adapting, by means of Bayes’ theorem, the context (also called ‘prior knowl- edge’ [16] or ‘data’ [13]) of a probability, i.e., the proposition to the right of the conditional symbol ‘|’.j jIn contrast, ‘Kolmogorovian’ probability, with its scarce flexibility with respect to contexts, deals with these features with difficulty. Loubenets’ efforts [21, 22] are directed to amelio- rating this situation.

10 4.2 Unknown preparations and their ‘tomography’ On the other hand, we may have the following scenario: we have repeated instances of a given preparation, but we do not know which. By performing interventions on these instances and observing the results, we can estimate which preparation is being made. Introducing the proposition D representing the results thus obtained, we have

P (D| S j ∧ Q) P (S j| Q) P (S j| D ∧ Q)= . (18) j P (D| S j ∧ Q) P (S j| Q)

This is a standard “inverse-inference” result of Bayesian analysis [16, Ch. 4]. The probability P (D| S j ∧ Q) can be written in terms of scalar products of result and preparation vectors, but the probability distribution {P (Sj| Q)} depends on the prior knowledge that one has in each specific case. If we now want to predict which result will occur in a new intervention Mk,wehavethe following probability:

P (Ri| Mki ∧ D ∧ Q)= P (Ri| Mki ∧ S j ∧ Q) × P (S j| D ∧ Q), j (19) = ri · sj P (S j| D ∧ Q), j i.e., we can effectively associate the vector snew ≡ sj P (S j| D ∧ Q) (20) j to the unknown preparation. A further kind of scenario is this: we have a brand-new kind of preparation, i.e., a new phenomenon, still untested. It has no place in our probability table; yet we think that we could reserve a new column to it without making substantial changes to our table’s decomposition (by which we mean that the table’s rank K would not change).k Given this, in order to associate a vector to this new preparation we proceed as in the preceding scenario, performing interventions and observing results. The result is that Eqs. (18) and (20) also apply in this case. Note that there is no conflict between our talking about “unknown prepa- rations” and Fuchs’ criticism of the term ‘unknown quantum state’ [23, 24, 25]. kThis condition seems to be somehow related to Fuchs’ “accepting quantum mechanics” [23, Sec. 6].

11 A preparation, as we have seen, is a well-defined procedure (that can be shown or described to others, etc.) to set up a given physical situation; on the other hand, Fuchs’ meaning of ‘quantum state’ is ‘density matrix’ [23, 26], which corresponds (more or less) to the preparation vector instead.l Thus, consider the two sentences “It is unknown to me which kind of laser and of beam splitter are used in this experiment” and “I do not know with which probability the detector behind the vertical filter will click”: the latter sentence is nonsensical from a Bayesian point of view, because there are no ‘unknown’ degrees of be- lief; but the former sentence is unquestionably meaningful. Thus, even if the preparation is unknown to us, we can always associate a preparation vector to it.m

4.3 On the content of the table vectors and ‘quantum states of knowledge’ Some remarks have already been made in Sect. 2 after the derivation of the formula P R | M ∧ S ∧ Q ≡ · , ( i ki j ) ri sj (5bis) with regard to the fact that the vectors {sj} and {ri} do not have any physical meaning separately: a given preparation vector sj tells us nothing if we do not have an intervention-result vector ri; even less if we know nothing about the set of intervention results. It follows that the preparation vectors also lack any probabilistic meaning per se:theyarenot probabilities nor collections of probabilities — they are just mathematical objects which yield probabilities when combined in a given way with objects of similar kind. This, in particular, is true for density matrices, as we have seen that they are just a particular case of preparation vectors. It is slightly incorrect, as well, to say that probabilities are ‘encoded’ or ‘contained’ in the (quantum) preparation vectors: rather, they are parts of an encoding. From these considerations, the quantum state (the density matrix) appears to be part of a state of belief, and not the ‘whole’ state of belief [23, 28]. Perhaps the point is that Caves, Fuchs, and Schack’s notion of a ‘quantum’ state of belief implicitly assumes the existence and the particular structure of the whole set lSadly, the Janus-faced term ‘state’ is sometimes meant as ‘density matrix’ and sometimes as a sort of ‘physical state of affairs’, though the two notions are, of course, quite distinct. mIt would be interesting, indeed, to see more clearly the connexion between the “preparation tomography” illustrated above, and the ‘quantum Bayes rule’ and ‘quantum de Finetti the- orem’ of Caves, Fuchs, and Schack [23, 24, 25, 27, 28, 29, 30]; it seems reasonable to expect that these ‘quantum’ analogues of Bayesian formulae should have a counterpart for tables concerning general systems, not only quantum-mechanical ones; the failure of the ‘quantum de Finetti theorem’ for quantum mechanics on real Hilbert spaces is quite interesting from this point of view.

12 of quantum positive-operator-valued measures (i.e., the interventions).n This is an important difference from a ‘usual’ degree of belief P (A| X) which does not need to be combined with other mathematical objects to reveal its content.

4.4 Possible applications of the table formalism

It has already been remarked that the kind of vector representation arising from the table decomposition is essentially the same as Hardy’s [1]. The derivation presented here can be seen as a sort of shortcut for his derivation, but it implies something more. Hardy supposes that it is possible to represent a preparation by means of a K-dimensional vector with K  L because most physical the- ories have some structure which relates different measured quantities; but the reasoning behind the decomposition of Sect. 2 shows that this possibility exists even without a theory that describes the data (indeed, the question arises: has this possibility any physical meaning at all?) In any case, the idea of a ‘probability table’ and its decomposition has probably very little usefulness in experimental applications, but provides a very simple approach to study the mathematical and geometrical structures of classical and quantum theories, and offers a different way to look at their “foundational” and “interpretative” issues. This approach is even more general than other standard ones based, e.g., on C∗-algebras, or even Jordan-Banach-algebras [31], or on convex state-spaces.o Thus, with the idea of a ‘probability table’ we can very easily implement ‘toy theories’ or models like those of Spekkens [32] and Kirkpatrick [33] (cf. also the issue raised by Terno [34]), which can then be compared to classical or quantum mechanics using a unique, common formalism. nHowever, Caves, Fuchs, and Schack have the right to be not so pedantic about this point, because they assume at the start the absolute validity of quantum mechanics and of its mathematical structure (an assumption not made in the present work). oThe first two formalisms are included as particular limiting cases of tables with an uncount- ably infinite number of columns and rows. In contrast to the convex-state-space approach, the probability-table idea does not require that the set of preparations be necessarily convex (however, the convexity usually appears in a natural way: see e.g. Hardy’s discussion in Ref. 1, Sect. 6.5). Moreover, in contrast to all three mentioned frameworks, it does not as- sume that the set of preparations (states) be necessarily the whole set of normalised positive linear functionals of the set of results (POVM elements), or vice versa for the convex-state- space framework. For example, (the convex hull of) the set of preparations for the table of Sect. 2.1 is only part of the set of normalised positive linear functionals of (the convex hull of) the set of results (the latter would be a square circumscribed on the given one).

13 Acknowledgements The author would like to thank Gunnar Bj¨ork, Ingemar Bengtsson, Christopher Fuchs, Lucien Hardy, Asa˚ Ericsson, Anders M˚ansson, and Anna for advice, encouragement, and many useful discussions.

A The trace rule It is shown that the ‘scalar product formula’, Eq. (5), includes also the ‘trace rule’ of quantum mechanics (see also Ref. 1, Sect. 5). A preparation is usually associated in quantum mechanics to a density matrix ρˆj, and a intervention result to a positive-operator-valued-measure ele- ment Πˆ i; both are Hermitian operators in a Hilbert space of dimension N.The probability of obtaining the result Πˆ i foragiveninterventiononpreparation ρˆj is given by the trace formula

pij =trΠˆ iρˆj. (21)

The Hermitian operators form a linear space of real dimension K = N 2;one can choose K linearly independent Hermitian operators {Bˆk} as a basis for this linear space. These can also be chosen (basically by Gram-Schmidt or- thonormalisation) to satisfy

tr BˆkBˆl = δkl. (22)

Both ρˆj and Πˆ i can be written as a linear combination of the basis operators: K K l k ρˆj = sj Bˆl, Πˆ i = ri Bˆk, (23) l=1 k=1 l k where the coefficients sj and ri are real. Using Eqs. (23) and (22) the trace formula becomes K K k l l l pij =trΠˆ iρˆj = ri sj tr BˆkBˆl = ri sj = ri · sj, (24) k,l=1 l=1

def K def K K where ri =(ri1,...,ri )andsj =(sj1,...,sj ) are vectors in R , q.e.d.

References [1] L. Hardy, Quantum theory from five reasonable axioms (2001), quant-ph/0101012.

14 [2] P. G. L. Mana, in preparation.

[3] B. Mielnik, Theory of filters, Commun. Math. Phys. 15, 1 (1969).

[4] B. Mielnik, Quantum logic: Is it necessarily orthocomplemented?,in Quantum Mechanics, Determinism, Causality, and Particles,editedby M. Flato, Z. Maric, A. Milojevic, D. Sternheimer, and J. P. Vigier (D. Reidel Publishing Company, Dordrecht, 1976), p. 117.

[5] B. Mielnik, Quantum theory without axioms,inQuantum Gravity 2,edited by C. J. Isham, R. Penrose, and D. W. Sciama (Oxford University Press, Oxford, 1981), pp. 638–656.

[6] B. Mielnik, Generalized quantum mechanics, Commun. Math. Phys. 37, 221 (1974), reprinted in [35, pp. 115–152].

[7]D.J.FoulisandC.H.Randall,Operational statistics. I. Basic concepts, J. Math. Phys. 13, 1667 (1972).

[8] C. H. Randall and D. J. Foulis, Operational statistics. II. Manuals of operations and their logics,J.Math.Phys.14, 1472 (1973).

[9] D. J. Foulis and C. H. Randall, Manuals, morphisms and quantum me- chanics,inMathematical Foundations of Quantum Theory,editedbyA.R. Marlow (Academic Press, London, 1978), pp. 105–126.

[10] C. H. Randall and D. J. Foulis, The operational approach to quantum mechanics, in [35], pp. 167–201.

[11] D. J. Foulis and S. P. Gudder, Observables, calibration, and effect algebras, Found. Phys. 31, 1515 (2001).

[12] H. Barnum, Quantum information processing, operational quantum logic, convexity, and the foundations of physics, Tech. Rep. LA-UR 03-1199, Los Alamos National Laboratory (2003), quant-ph/0304159.

[13] H. Jeffreys, Theory of Probability (Oxford University Press, London, 1998), 3rd ed., first publ. 1939.

[14] R. T. Cox, Probability, frequency, and reasonable expectation,Am.J. Phys. 14, 1 (1946).

[15] R. T. Cox, The Algebra of Probable Inference (The Johns Hopkins Press, Baltimore, 1961).

15 [16] E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge Uni- versity Press, Cambridge, 2003), edited by G. Larry Bretthorst.

[17] B. de Finetti, Foresight: Its logical laws, its subjective sources,inStudies in Subjective Probability, edited by H. E. Kyburg, Jr. and H. E. Smokler (John Wiley & Sons, New York, 1964), pp. 93–158.

[18] J. M. Jauch, Foundations of Quantum Mechanics (Addison-Wesley Pub- lishing Company, Reading, USA, 1973), first publ. 1968.

[19] C. Piron, Foundations of Quantum Physics (W.A.Benjamin,Reading, USA, 1976).

[20] A. Wilce, Compactness and symmetry in quantum logic,inQuantum The- ory: Reconsideration of Foundations. 2,toappear.

[21] E. R. Loubenets, General probabilistic framework of randomness (2003), quant-ph/0305126.

[22] E. R. Loubenets, General framework for the probabilistic description of experiments (2003), quant-ph/0312199.

[23] C. A. Fuchs, Quantum mechanics as quantum information (and only a little more) (2002), quant-ph/0205039.

[24] S. J. van Enk and C. A. Fuchs, Quantum state of an ideal propagating laser field, Phys. Rev. Lett. 88, 027902 (2002), quant-ph/0104036.

[25] S. J. van Enk and C. A. Fuchs, Quantum state of a propagating laser field, Quant. Info. Comp. 2, 151 (2002), quant-ph/0111157.

[26] C. A. Fuchs, Quantum states: What the Hell Are They?, http://netlib.bell-labs.com/who/cafuchs/PhaseTransition.pdf.

[27] R. Schack, T. A. Brun, and C. M. Caves, Quantum Bayes rule,Phys.Rev. A 64, 014305 (2001), quant-ph/0008113.

[28] C. M. Caves, C. A. Fuchs, and R. Schack, Quantum probabil- ities as Bayesian probabilities,Phys.Rev.A65, 022305 (2002), quant-ph/0106133.

[29] C. M. Caves, C. A. Fuchs, and R. Schack, Unknown quantum states: the quantum de Finetti representation,J.Math.Phys.43, 4537 (2002), quant-ph/0104088.

16 [30] R. Schack, Unknown quantum operations,inQuantum Theory: Reconsid- eration of Foundations. 2,toappear. [31] H. Halvorson, A note on information theoretic characterizations of phys- ical theories (2003), quant-ph/0310101. [32] R. W. Spekkens, In defense of the epistemic view of quantum states: a toy theory (2004), quant-ph/0401052. [33] K. A. Kirkpatrick, “Quantal” behavior in classical probability, Found. Phys. Lett. 16, 199 (2003), quant-ph/0106072. [34] D. R. Terno, Inconsistency of quantum-classical dynamics, and what it implies (2004), quant-ph/0402092. [35] C. A. Hooker, ed., Physical Theory as Logico-Operational Structure (D. Reidel Publishing Company, Dordrecht, 1979).

17

Paper E

arXiv:quant-ph/0505229 v2 24 Aug 2005 n h converse the and of tsatemdnmcccewihvoae h eodlaw. second the violates which cycle thermodynamic cre- a assumptions ates this from and diaphragms, “sta semi-permeable quantum non- two the assumes of explicitly orthogonality he which thought-experiment in a gases’ of ‘quantum means with by (2) statement second the of be be must can they “states” orthogonal. then two diaphragms, if semi-permeable viz.: by one, separated first but the formula). statement, of second converse entropy the the prove rather his actually derived not did he he However, same which (the by considerations” considerations “thermodynamic involving lysis propositions: Quantenmechanik ∗ 1 lcrncades [email protected] address: Electronic .ITOUTO:VNNUAN(N EE)ON PERES) (AND NEUMANN VON INTRODUCTION: 0. addmzetnHauptsatz”. zweiten dem Wand widerspric so orthogonal, rntwre önn enseo sie wenn können, werden trennt Zustände “[Z]wei hs ttmnscnentemdnmc n h notions the and thermodynamics concern statements These twsPrs[]ta aeaseigydrc demonstration direct seemingly a gave that [3] Peres was It ana- an in above statement first the proved Neumann Von n§V2o o Neumann’s von of V.2 § In distinguishability tdb eiprebedahami hyare they if diaphragm orthogonal”; semi-permeable a by ated states “[T]wo h eodlw[fthermodynamics]”. [of law second contradicts the diaphragm semi-permeable a such of “[I]f RHGNLT N H EODLAW SECOND THE AND ORTHOGONALITY φ , 1 eetpee fkoldeaotagvnpyia iuto,aduigicmail est arcst describe to matrices density incompatible using and situation, physical given a it. about knowledge of pieces ferent ASnmes 03.65.Ca,65.40.Gr,03.67.-a numbers: PACS ocuin ocrigeg iltoso h eodlwo hroyais ntescn at eanalyse we part, dis- second the one-shot In that from instead, explained any thermodynamics. comes, is draw of assumptions it rigorously law contradictory and cannot second these discussed, one the when and of happens which defined what from violations are e.g. assumptions, concepts concerning contradictory these conclusion, are paper, orthogonality the implications. and of thermodynamic tinguishability part their first and the diaphragm, In semi-permeable orthogonality, distinguishability, one-shot ψ w ttmnsb o emn n huh-xeietb ee rmt icsino h oin of notions the on discussion a prompts Peres by thought-experiment a and Neumann von by statements Two r o rhgnl hnteassumption the then orthogonal, not are φ , 1 seas 2)w n h w following two the find we [2]) also (see [1] ndsigihblt,otooaiy n iltoso h eodlaw: second the of violations and orthogonality, distinguishability, On ψ and φ .. durch[...] semipermeable Wände bestimmt ge- , tdeAnheenrslhnsemipermeablen solchen einer Annahme die ht oa nttt fTcnlg KH,Iajrsaa 2 E144 tchl,Sweden Stockholm, 40 SE-164 22, Isafjordsgatan (KTH), Technology of Institute Royal ψ orthogonality otaitr supin,cnrsigpee fknowledge of pieces contrasting assumptions, contradictory .. can[...] certainly separ- be tooa id,ad“[S]ind and sind”, rthogonal ahmtsh rnlgnder Grundlagen Mathematische ir .Lc Mana, Luca G. Piero e” hi eaaiiyby separability their tes”, colo nomto n omncto Technology, Communication and Information of School . Dtd 4Ags 2005) August 24 (Dated: φ ∗ , ψ nesMnsn n unrBjörk Gunnar and Månsson, Anders nicht (1) (2) rms;ie,i a h oia om‘( form logical its and the in — has contradiction (2) it a statement contains i.e., second — premise; the Peres that by be demonstration will its part first conclusion the principal of of The relation the thermodynamics. discuss to to concepts occasion these prepara- an those provide will shot, This one tions. in distinguish, can measurement a which of procedure realisation particular a of just idea prepara- is the (“states”) two tions how separate can show or which we asserting membrane Second, semi-permeable without a well. one as the other deny the or denying assert cannot one thus, of counterpart mathematical the gonality oepeiey h huh-xeietinvolves thought-experiment the precisely, More otaito ntepeiei h supino distinguish- of non-distinguish assumption and the ability is premise the in contradiction matrix density distinguishability other the separation. assumption, the non-orthogonality performing the making one ers: inet.Freape h eodlwmyseigybe seemingly may law second the example, For signments. ecnevda iuto nwhich in situation a as conceived be of perspectives. main two reflecting into depth, parts, divided varying main is two in paper The it purposes. analyse paedagogical and with interesting; mainly discuss quite to want is we observers therefore of multiplicity and dynamics, htcnhpe hntetoosres aeand illustrates 18], di have also hence 17, observers’ knowledge, It of 16, two pieces 19]. the ferent 15, 18, when 14, 17, happen 13, 16, can con- [3, what 12, measurements observer and 11, preparations the of 10, by sets sidered 9, the on 8, as 7, well as 6, 5, [4, ation particular observer’s an depend- always on is preparation ent a describe to adopted matrix ity illust an provides This nomenon. di ff h otaito nPrs xeiet oee,cnalso can however, experiment, Peres’ in contradiction The Inthefirstpartweo thermo- distinguishability, orthogonality, of interplay The preparation erent is,w aecerth clear make we First, . est arcst nls the analyse to matrices density , oiieoeao-audmeasure positive-operator-valued two and n ftermteaia representatives: mathematical their of and , di esrmn procedure measurement ff rn bevr,hvn dif- having observers, erent ff radsuso ntetreconcepts three the on discussion a er blt ttesm time. same the at ability knowledge oeso distinguishability’; ‘one-shot aino h atta dens- a that fact the of ration t‘rhgnlt’i simply is ‘orthogonality’ at ff rn est-arxas- density-matrix erent two A same ∧¬ bu htprepar- that about cetssuetwo use scientists and A ) hsclphe- physical ⇒ ,and two “one-shot” B use observ- .The ’. ortho- dif- 2 violated for one of the observers. The fact that a mathem- distinguishability’ and ‘orthogonality’. In our definitions we atical description always depends on one’s particular know- follow Ekstein [36, 37], Giles [38, 39, 40], Foulis and Ran- ledge of a phenomenon, however, is not only true in quantum dall [41, 42, 43], Band and Park [44, 45], and Peres [3, 46, 47]; mechanics, but in classical physics as well. A particular case the reader is referred to these references for a deeper discus- within thermodynamics was shown by Jaynes’ [20] by means sion (see also [4, 5, 16, 17, 19, 48, 49, 50, 51, 52] and the of a thought-experiment which is very similar to Peres’ and in introductory remarks in Komar [53]). which the same seeming violation of the second law appears In quantum theory, a preparation procedure (or ‘prepara- according to one of the observers. For this reason, we shall tion’ for short) is “an experimental procedure that is com- juxtapose Peres’ demonstration to Jaynes’, hoping that they pletely specified, like a recipe in a good cookbook” ([3], p. will provide insight into each other. 12, cf. also p. 424). It is usually accompanied by a meas- We assume that our reader has a working knowledge of urement procedure, which can result in different outcomes, quantum mechanics and thermodynamics (we also provide appropriately labeled. The probabilities that we assign to the some footnotes on recent developments of the latter, today obtainment of these outcomes, for all measurement and pre- better called ‘thermomechanics’ [21, 22, 23], since they are paration procedures, are set as postulates in the theory and apparently largely unknown to the quantum-physics com- encoded in its mathematical objects, described below. Prepar- munity.) We also want to emphasise that this paper is not ation and measurement procedures are not mathematical ob- directly concerned with questions about the relation between jects. As an oversimplified example, the instructions for the thermodynamics and statistical mechanics, nor to questions set-up and triggering of a low-intensity laser constitute a pre- about “classical” or “quantum” entropy formulae. Since our paration procedure, and the instructions for the installation of discussions will regard the second law of thermodynamics, the a detector constitute a measurement procedure. reader probably expects that entropy will enter the scene; but A preparation procedure is mathematically represented by a his or her expectations will not be fulfilled. Peres’, von Neu- density matrix4 , whose mathematical properties we assume mann’s, and Jaynes’ demonstrations are based on cyclic pro- well known to the reader [3, 52, 55]. A measurement pro- cesses, which start from and end in a situation described by cedure is instead mathematically represented by a positive- 2 the same thermodynamic state. Assuming the entropy to be operator-valued measure (POVM) {Ei}, viz., a set of pos- a state variable, its change is naught in such processes, inde- itive (semi-definite) matrices, each associated to a particu- pendently of its mathematical expression. The second law of lar measurement outcome, which sum to the identity mat- thermodynamics assumes then the form3 rix [3, 52, 56, 57, 58]. These two mathematical objects encode our knowledge of the statistical properties of their respective Q 0 (cyclic processes), (3) preparation and measurement: when we perform an instance where Q is the total amount of heat absorbed by the thermo- of the measurement represented by {Ei} on an instance of the dynamic body in the process. This is the entropy-free form preparation represented by , the probability pi assigned to we are going to use in this paper. the obtainment of outcome i is given by the trace formula pi = tr(Ei). If we wish to specify not only the probabilities of the vari- 1. PART I: CONTRADICTORY PREMISES AND ous outcomes of a measurement, but also its effect on the pre- VIOLATIONS OF THE SECOND LAW paration, we associate to the measurement procedure a com- Alice felt even more indignant at this suggestion. pletely positive map (CPM) [52, 56, 57, 58, 59]. We shall “I mean,” she said, “that one can’t help growing not use the full formalism of completely positive maps here, older.” [35] however, but only the special case where the CPM is a set of projectors {Πi} and the effect on the density matrix  when result i is obtained is given by 1.1. Preparations and density matrices, measurements and positive-operator-valued measures Π Π  →  = i i . i (4) tr(ΠiΠi) We begin by informally recalling the definitions of ‘prepar- ation procedure’ and ‘density matrix’, ‘measurement proced- ure’ and ‘positive-operator-valued measure’, and ‘one-shot 1.2. One-shot distinguishability and orthogonality

Suppose a physicist has realised one instance of a prepar- ation procedure, choosing between two possible preparation 2 The concept of ‘state’ in thermomechanics may include not only the in- stantaneous values of several (field) variables, but even their histories, or suitably defined equivalence classes thereof. This concept has an in- teresting historical development. Cf. e.g. Truesdell et al. [24, Preface], Noll [25, 26], Willems [27], Coleman et al. [28, 29, 30, 31] Del Piero et 4 The original and more apt term was ‘statistical matrix’, which has unfortu- al. [32, 33], Šilhavý [23]. nately been changed to ‘density matrix’ apparently since Wigner [54]; cf. 3 See Serrin [34] for a keen analysis of the second law for cyclic processes. Fano [55]. 3 procedures.5 We do not know which of the two, but we can distinguishable is mathematically represented by the ortho- perform a measurement on the instance and record the out- gonality of their associated density matrices. Or, to put it an- come. Suppose there exists a measurement procedure such other way, orthogonality is the mathematical representation of that some (at least one) of its outcomes have vanishing prob- one-shot distinguishability. The converse of the mathematical abilities for the first preparation and non-vanishing probabilit- theorem is also true: if two density matrices are orthogonal, ies for the second, while the remaining (at least one) outcomes then there exists a POVM with the properties (5) above. But have vanishing probabilities for the second preparation and to conclude that the represented preparation procedures are non-vanishing probabilities for the first. This means that by one-shot distinguishable, we need first to assume that there performing a single instance of this measurement procedure physically exists a measurement procedure corresponding to we can deduce with certainty which preparation was made, by that POVM.7. looking at the outcome. In this case the two preparations are To prove this theorem (and its converse), thermodynamic said to be one-shot distinguishable (cf. [19]). The fact that arguments are not needed. It just follows mathematically some sets of preparation procedures are one-shot distinguish- from Eqns. (5), which represent a probabilistic property of a able while others are not is the basic reason why quantum measurement. So we begin to see that von Neumann’s state- mechanics is a probabilistic theory. ment (1), in which he seems to derive orthogonality from If two preparation procedures are one-shot distinguishable, separability by semi-permeable diaphragms, would not really it is then easy to separate them. By this we mean that if we need thermodynamic considerations. But we can make this are presented with many instances of these preparations, we more precise only after we have discussed the notion of a can for each instance make a measurement and tell the prepar- semi-permeable diaphragm, which appears in statement (1). ation, and thus separate the instances of the first preparation This will be done in a moment; before then, we want to from the instances of the second preparation. But the converse make some final remarks about distinguishability and about is also true: if we can separate with certainty the two groups linguistic details. of instances, it means that we can distinguish with certainty It may be the case that two preparation procedures cannot the preparation of each instance. be distinguished by means of a single measurement instance, Let us see how the one-shot-distinguishability prop- but can still be distinguished with arbitrary precision by study- erty is represented mathematically. The two one-shot- ing their statistical properties, i.e. by analysing the results of distinguishable preparation procedures of the above example an adequately large number of diverse measurements on sep- are represented by density matrices φ and ψ,andthemeasure- arate instances of these preparations.8 They are thus distin- ment by a POVM {Ak}. According to the definition of one- guishable, but not in one-shot. They are represented in this shot distinguishability given above, we must be able to write case by different, but non-orthogonal, density matrices. On this POVM as {Ak} = {Ei, F j}, with the other hand, two preparation procedures are represented by the same density matrix when they have exactly the same tr(φEi) = 0andtr(ψEi)  0forallEi, statistical properties (with respect to a given set of measure- (5) tr(ψF j) = 0andtr(φF j)  0forallF j. ment procedures) and so cannot be distinguished by a stat- istical analysis of measurement results, no matter how many This is the mathematical form of our definition of one-shot instances of them we prepare and how many measurements distinguishability. we make. An important consequence of the equations above is the fol- The whole discussion above is easily generalised to more lowing: the density matrices φ and ψ must be orthogonal, viz. than two preparation procedures. tr(φψ) = 0.6 We prove this simple theorem in the Appendix. Thus, the fact that two preparation procedures are one-shot 1.3. Interlude: linguistic details

Preparation procedures are often called ‘states’; however, 5 Note the difference between a preparation (or measurement) procedure, and the realisation of an instance thereof. The first is a set of instructions, the word ‘state’ is equally often used to mean the density mat- a description, the latter is a single actual realisation of those instructions. rix representing a preparation procedure. This double mean- This terminology may help avoiding the confusion which some authors ing is quite natural, because the concepts of preparation pro- (see e.g. [60, 61, 62]) still make between ‘ensemble’ and ‘assembly’, as cedure and density matrix are, as we have seen, strictly re- defined and conceptually distinguished by e.g. Peres [3, pp. 25, 59, 292] (see also Ballentine [48, p. 361]). The term ‘ensemble’ is used with so lated. They are nevertheless distinct, as shown e.g. by the fact different meanings in the literature (see e.g. Hughston et al. [63] for yet an- that different preparation procedures (concerning the same other, though self-consistent, usage), that we prefer to avoid it altogether. physical phenomenon) can sometimes be represented by the Moreover, Bayesian probability theory (see e.g. [10, 64]) makes the con- ceptual usefulness of this term questionable or at least obsolete. 6 This expression is the usual definition of orthogonality, i.e. vanishing scalar product, between vectors: in our case the density matrices are considered as vectors in a real vector space of Hermitean matrices, with tr(φψ)asthe 7 This is not a light assumption; cf. e.g. Peres [3], pp. 50 and 424. scalar product. 8 See e.g. [65, 66, 67, 68, 69, 70, 71, 72]; cf. also [13, 14]. 4 same density matrix. We wish to keep this distinction in this done by the gas, W: paper, and we shall thus stick to the two distinct terms, avoid- Q = W = NkT ln(V /V ) (isothermal processes), (6) ing the term ‘state’. f i The terms ‘(one-shot) distinguishable’ and ‘orthogonal’ are where Vi and Vf are the volumes at the beginning and end of also often interchanged in an analogous way; e.g., one says the process, k is Boltzmann’s constant, and N is the (constant) that “two preparations are orthogonal” (instead of “two pre- number of particles. This formula will be true throughout the parations are distinguishable”), or that “two density matrices paper, as we shall only consider isothermal processes. (Note are distinguishable” (instead of “two density matrices are that W and Q can assume both positive and negative values.) orthogonal”). Such metonymic expressions are of course One often considers several samples of such ideal gases in handy and acceptable, but we must not forget that ‘orthogonal a container and inserts, moves, or removes impermeable or preparations’ only means ‘one-shot distinguishable prepara- semi-permeable diaphragms10 at any position one pleases.11 tions’, so that if we say “these preparations are one-shot dis- What the presence of these diaphragms, both in practice and tinguishable and non-orthogonal” we are then contradicting in theory, really signifies in the case of ideal, non-interacting ourselves — not an experimental or physical contradiction, (e.g. non chemically reacting) gas samples, is that we can but a linguistic,orlogical, one. Just as it would be contra- control and monitor the variables of the ideal-gas samples, dictory to say that a classical force field is conservative and (V1, T1; V2, T2; ...) independently of each other, even when its integral along a closed path does not vanish; or that, in a some of the samples occupy identical regions of space sim- given reference frame, a point-like body is in motion and its ultaneously. Thus, the problem becomes equivalent to one position-vector is constant; or that a non-relativistic system is where all gas samples always occupy distinct regions of space, closed and its total mass is changing. even though they may be in mechanical or thermal contact.12 These remarks are pedantic, and many a reader will con- For “quantum” ideal gases, as will be seen in a moment, the sider them only linguistic nit-picking; but these readers are situation is not so simple. then invited to take again a look at von Neumann’s second We must first face the question of how to introduce and statement (2). Is everything in order there? We shall come mathematically represent quantum degrees of freedom in an back to this point later. ideal gas. Von Neumann [1, § V.2] used a hybrid classical- quantum description, microscopically modelling a quantum ideal gas as a large number N of classical particles possess- ing an “internal” quantum degree of freedom represented by 1.4. “Quantum” ideal gases and semi-permeable diaphragms a density matrix  living in an appropriate density-matrix space. This space and the density matrix are always assumed In order to analyse a statement which involves, besides to be the same for all the gas particles.13 He then treated two quantum concepts, also thermodynamic arguments and semi- gas samples described by different density matrices as gases permeable diaphragms, it is necessary to introduce a ther- of somehow different chemical species. This idea had been modynamic body possessing “quantum” characteristics, i.e., presented by Einstein [91] eighteen years earlier, but it is im- quantum degrees of freedom; an ‘ideal’ body will do very portant to point out that the “internal quantum degree of free- well. For this purpose, we shall first recall the basic rela- dom” was for Einstein just a “resonator” capable of assum- tionships between work and heat for classical ideal gases. ing only discrete energies. This degree of freedom had for We shall then follow von Neumann [1] and introduce the him discrete (the original meaning of the adjective ‘quantum’) quantum degrees of freedom as “internal” degrees of free- but otherwise statistically classical properties; it was not de- dom of the particles constituting the gases, and shall finally scribed by density matrices, and it did not provide non-ortho- discuss how the interaction between the quantum and thermo- gonality issues. Einstein’s idea was hence less open to prob- dynamic parameters is achieved by means of semi-permeable lems than von Neumann’s. diaphragms. Let us first recall that (classical) ideal gases are defined as homogeneous, uniform thermodynamic bodies character- isable by two thermodynamic variables: the volume V > 0 10 Partington [76, § 28] informs us that these were first used in thermodynam- > ics by Gibbs [79]. and the temperature T 0, and for which the internal energy 11 9 In the limit, this leads to a formalism involving field quantities (cf. Buch- is a function of the variable T alone; this implies, via the dahl [78, §§ 46, 75], and see e.g. Truesdell [21, lectures 5, 6 and related first law of thermodynamics, that in any isothermal process appendices] and [22, 80, 81, 82, 83]), as indicated by the possibility of the heat absorbed by the gas, Q, is always equal to the work introducing as many diaphragms as we wish and hence to control smaller and smaller portions of the gas samples. 12 In particular, the samples have separate entropies and must separately sat- isfy the second law [79]. When the gases do interact, we enter into the more complex, and still under development, thermomechanic theory of 9 See e.g. the excellent little book by Truesdell and Bharatha [73], or mixtures (see e.g. [21, 22, 82, 83, 84, 85, 86, 87, 88]; cf. also [89]). Planck [74, §§ 86–91, 232–236], Lewis and Randall [75, chap. VI], Part- 13 Dieks and van Dijk [90] point rightly out that one should then more cor- n  = ington [76, §§ II.54–57, IV.14, VIIA.21], Callen [77, §§ 3-4, 13-1–2, 16- rectly consider the total density matrix i=1 ,wheren NNA (with NA 10], Buchdahl [78, §§ 71, 82]; also Samohýl [22, §§ 5, 6]. Avogadro’s constant) is their total number. 5

In fact, von Neumann’s conceptual device presents some with two examples. problems. For example, the chemical species of a gas is not a thermodynamic variable, and even less a continuous one: chemical differences cannot change continuously to zero.14 It First example: one-shot distinguishable preparations would then seem more appropriate to describe a quantum ideal gas by the variables (V, T, ) instead, taking values on appro- Imagine a container having volume V and containing a mix- + − priate sets. That this would indeed be the only correct treat- ture of N/2 particles of a z -gas and N/2ofaz -gas; i.e., the ment can be seen from the fact that the thought experiments particles have quantum degrees of freedom represented by the considered by von Neumann and reproduced below always density matrices  involve some step in which the density matrix of a gas is + + +  z def= |z z | =ˆ 10 , (7) changed.Thisimpliesthat is something which we can and 00 need to control, and as such it should be included in the list of z− def= |z− z−| =ˆ 00 , (8) variables which define our thermodynamic system [93]. 01 However, in the following discussion we shall follow in the usual spin-1/2 notation (Fig. 1, a). These two dens- von Neumann and Peres instead and speak of a ‘φ-gas’, or a ity matrices represent quantum preparations that can be dis- ‘ψ-gas’, etc., where φ or ψ are the density matrices describing tinguished in one shot by an appropriate measurement, im- the internal quantum degrees of freedom of the gas particles, plemented by two semi-permeable diaphragms as described just as if we were speaking of gases of different chemical spe- above. The first is completely opaque to the particles of the − + cies (like e.g. ‘argon’ and ‘helium’). The thermodynamic vari- z -gas and completely transparent to those of the z -gas; the + ables are (V, T) for each such gas. other is completely opaque to the particles of the z -gas and − In this framework, semi-permeable diaphragms have dif- completely transparent to those of the z -gas. Mathematic- { ±} ferent meaning and function than they have in the classical ally they are represented by the two-element POVM z and { → ± ±/ ± ± } 15 framework. This becomes clear when we analyse how they the CPMs (projections) z z tr(z z ) , as follows. are modelled microscopically. The microscopic picture [1, For the first diaphragm: p. 196][3, p. 271] is, paraphrasing von Neumann, to construct + + + + z → z , let through, with probability tr(z z ) = 1, (9a) many “windows” in the diaphragm, each of which is made − → −, − − = as follows. Each particle of the gases is detained there and a z z reflected, with probability tr(z z ) 1 (9b) + − − + measurement is performed on its quantum degrees of freedom; z → z , with probability tr(z z ) = 0, (9c) depending on the measurement result, the particle penetrates z− → z+, with probability tr(z+ z−) = 0, (9d) the window or is reflected, and its density matrix is changed in an appropriate way. In other words, the diaphragm also and for the second: acts on the translational degrees of freedom, separating the + → +, + + = , ff ffi z z reflected, with probability tr(z z ) 1 (10a) particles having di erent preparations spatially, with an e - − − − − ciency which depends on the preparations and the implemen- z → z , let through, with probability tr(z z ) = 1 (10b) ted measurement. This implies that the number of particles z+ → z−, with probability tr(z− z+) = 0, (10c) and hence the pressures or volumes of the gases on the two z− → z+, with probability tr(z+ z−) = 0. (10d) sides of the semi-permeable diaphragm will vary, and may set the diaphragm in motion, producing work (e.g. by lifting a Now imagine we insert these two diaphragms in the con- weight which loads the diaphragm). There arises thus a kind tainer (Fig. 1, a), very near to its top and bottom walls re- of mutual dependence between the quantum degrees of free- spectively. We push them isothermally toward the middle of dom and the thermodynamic parameters (like the volume V) of the quantum ideal gases. We see that for quantum gases (a) (b) the semi-permeable diaphragms not only make the existence | + +| of separate thermodynamic variables for different gas samples + + z z 0.5 |z z |+ Q <-0 possible, as it happened for (non-interacting) classical gases, . | − −| 0 5 z z − − but also perform transformations of the (not thermodynamic- |z z | ally reckoned) quantum variable . Such a diaphragm is then simply a device which implement Figure 1: Separation of one-shot-distinguishable quantum a measurement procedure, and is mathematically described by gases. a given POVM and a CPM. Let us illustrate how semi-permeable diaphragms work

15 Due to the large number of gas particles considered, the outcome prob- abilities are numerically equal, within small fluctuations negligible in the 14 An observation made by Partington [76, § II.28] in a reference to Lar- present work, to the average fraction of gas correspondingly transmitted or mor [92, p. 275]. reflected by the diaphragms. 6 the container until they come in contact with each other, so (a) (b) that the container is divided in two chambers. By doing so + + we have separated the two gases, with the z+-gas in the upper 0.5 |z z |+ Q <-0 |α+ α+| + + chamber and the z−-gas in the lower one (Fig. 1, b). In order 0.5 |x x | to move these diaphragms and achieve this separation we have |α− α−| spent an amount of work equal to Figure 2: Separation of non-one-shot-distinguishable N V/2 quantum gases. −2 × kT ln ≈ 0.693 NkT, (11) 2 V α±α±/ α±α± because each diaphragm had to overcome the pressure exer- tr( ), with ted by the gas to which it is opaque. Since the process is √ √ ± ± ± 1 2 ± 2 ± 2 isothermal, the quantity above is also the (positive) amount of α def= |α α | = √ √ , ˆ ± ∓ (13) heat released by the gases. 4 22 2 √ − 1 The semi-permeable diaphragms can also be used to realise |α± def= 2 ± 2 2 |z± ±|x± ≡ the inverse process, i.e. the mixing of two initially separated √ 1 √ 1 (14) + − 1 z -andz -gases. In this case the expression (11) would be the ± 2 ± 2 2 |z+ + 2 ∓ 2 2 |z− , amount of heat absorbed by the gases as well as the amount 2 of work performed by them. i.e., the Hilbert-space vectors |α± are the eigenvectors of the matrix λ given by √ √ 1 + 1 + 2 + 2 + 2 − 2 − 1 31 Second example: non-one-shot distinguishable preparations λ def= z + x = α + α =ˆ , (15) 2 2 4 4 4 11

The quantum degrees of freedom of two gases may also be and they are orthogonal: tr(α+α−) = 0. The action of the first prepared in such a way that no measurement procedure can diaphragm is thus distinguish between them in one shot, and so there are no + + + + semi-permeable diaphragms which can separate them com- z → α , let through, with probability tr(α z ) ≈ 0.854, pletely. This has of course consequences for the amount of (16a) work that can be gained by using the diaphragms. Imagine z+ → α−, reflected, with probability tr(α− z+) ≈ 0.146, again the initial situation above, but this time with a mixture (16b) + + of N/2 particles of a z -gas and N/2ofanx -gas, with + + + + x → α , let through, with probability tr(α x ) ≈ 0.854, (16c) + def + + 1 11 x = |x x | =ˆ . (12) + − − + 2 11 x → α , reflected, with probability tr(α x ) ≈ 0.146, (16d) The two density matrices z+ and x+ are non-orthogonal, tr(z+ x+)  0, and this must represent the fact that the prepara- while for the second diaphragm: tion procedures they represent cannot be distinguished in one + + + + shot. This also means that there are no semi-permeable dia- z → α , reflected, with probability tr(α z ) ≈ 0.854, phragms which are completely opaque to the particles of the (17a) one gas and completely transparent to those of the other gas z+ → α−, let through, with probability tr(α− z+) ≈ 0.146, and vice versa. Mathematically this is reflected in the non- (17b) existence of a POVM with the properties (5). The absence of + + + + x → α , reflected, with probability tr(α x ) ≈ 0.854, such completely separating diaphragms implies that we can- (17c) not control their mixing in a reversible way. + → α−, α− + ≈ . . It can be shown [1, 3] that in this case the separating x let through, with probability tr( x ) 0 146 process requiring the minimum amount of work, and so the (17d) 16 maximum (negative) amount of heat absorbed by the gas, By inserting these diaphragms in the container as in the can be performed by two diaphragms represented by the transform {α±}  → preceding example, they will and separate into two two-element POVM and the CPMs (projections) chambers our z+-andx+-gases, leaving in the end an α+-gas in the upper chamber and an α−-gas in the lower one (Fig. 2). The gases will have the same pressure as the original ones + 16 but occupy unequal volumes, because the ratio for the z -and Readers interested in entropy questions should note that such process is + α+ α− irreversible and that there is no reversible process with the same initial and x -gases to be transformed into an -gas and an -gas was final thermodynamic states, as discussed by Dieks and van Dijk [90]. approximately 0.854/0.146, as seen from Eqns. (16); this will 7 also be the ratio of the final volumes. The total amount of matrix (or a Hilbert-space ray, which can be considered as a work we spent for this separation is particular case thereof). The first statement (1), 0.146 V 0.854 V − 0.146 NkT ln − 0.854 NkT ln ≈ φ ψ V V “Two [density matrices] , can certainly be separated by a semi-permeable diaphragm if they 0.416 NkT (18) are orthogonal”, α− (the first term is for the upper diaphragm, opaque to the - seems to be saying just this: the orthogonality of two dens- gas, and the second for the lower diaphragm, opaque to the ity matrices (represents (mathematically) the fact that their re- α+ -gas). This is also the amount of heat released by the gases. spective preparation procedures are one-shot distinguishable, / The significance of the equations above is that the half half i.e., can be separated by some device. The demonstration of + + . / . mixture of z -andx -gases can be treated as a 0 854 0 146 this statement does not really require thermodynamic argu- α+ α− mixture of an -gas and an -gas. In fact, the two semi- ments, as we have seen.17 permeable diaphragms could be used to completely separate The second statement was: an α+-gas from an α−-gas, whose preparations are one-shot distinguishable. “If [the density matrices] φ, ψ are not orthogonal, then the assumption of such a semi-permeable We shall see an application of the coupling between the diaphragm [which can separate them] contradicts thermomechanic and quantum degrees of freedom provided the second law of thermodynamics”. by semi-permeable diaphragms in the second part of the pa- per. For the moment, we want to stress the following two-way Let us now try to analyse it. The statement mix together math- connexion: the existence of a measurement which can distin- ematical concepts (density matrices, non-orthogonality) and guish two preparations in one shot implies the possibility of physical ones (semi-permeable diaphragm), so we may try to constructing a semi-permeable diaphragm which can distin- state it either in physical or in mathematical terms only. guish and separate the two preparations. Vice versa, if we had Saying that two density matrices are not orthogonal is the such a diaphragm we could use it as a measurement device to mathematical way of saying that their corresponding prepara- distinguish in one shot the two preparations, and so it would tion procedures are not one-shot distinguishable. So the state- imply their one-shot distinguishability. ment becomes: The notion of a semi-permeable diaphragm implies of course something more, viz., the possibility of a coup- If two preparations are not one-shot distinguish- ling between the quantum degrees of freedom to be distin- able, then the assumption of a semi-permeable guished and the translational degrees of freedom. It is this diaphragm which can separate them contradicts coupling, done with a hybrid quantum-classical framework the second law. (von Neumann [1] and the present paper) or completely within quantum mechanics (Peres [3]), that allows applications and But in the previous sections we have shown that assuming the consequences of thermodynamic character. However, the existence of a separating semi-permeable diaphragm is equi- two-way connexion pointed out above is independent of the valent by construction to assuming the existence of a meas- fact that the diaphragm may also have applications or con- urement which can distinguish the preparations in one shot. sequences of thermodynamic character. So what the second statement is saying is just the following: If two preparations are not one-shot distinguish- able, then the assumption that they are one-shot 1.5. Distinguishability, orthogonality, and the second law distinguishable contradicts the second law.

We can now summarise the observations and results that we So the statement assumes that two preparations are one- have gathered up to now: shot distinguishable and not one-shot distinguishable; or in The existence of a measurement which can distinguish two mathematical terms, that their density matrices are orthogonal preparation procedures in one shot is equivalent to the ex- and not orthogonal. It has the logical form ‘(A ∧¬A) ⇒ B’, istence of semi-permeable diaphragms which can separate and makes thus little sense, because it starts from contradict- the two preparations, and is also equivalent, by definition, ory premises. In particular, it cannot have any physical im- to saying that the two preparations are one-shot distinguish- plications for the second law of thermodynamics. We must able. Mathematically, the measurement and the diaphragms are then represented by a POVM satisfying Eqns. (5), or equi- valently by the orthogonality of the density matrices repres- 17 enting the two preparations. What von Neumann’s demonstration of the first statement really shows is something else: viz., that his quantum entropy expression is consistent with Let us now look again at the two statements (1) and (2). thermodynamics. This, however, does not concern us in the present paper, We understand that, by “state”, von Neumann meant a density and will perhaps be analysed elsewhere [93]. 8 in fact remember that, in a logical formal system, from con- dictory assumptions within the quantum-mechanical theory it- tradictory assumptions (‘A ∧¬A’) one can idly deduce any self, and so cannot be simultaneously used to study the con- proposition whatever (‘B’) as well as its negation (‘¬B’) (see sistency of this theory with another one (thermodynamics). e.g. [94, 95, 96]). Using an example given in § 1.3: we can discuss the consist- The same contradictory premises are also present in Peres’ ency of a particular microscopic force field with macroscopic demonstration of the second statement by means of a thought- thermodynamic laws; but it is vain to assume that the force experiment. This experiment will be discussed in detail in field is conservative and its integral along a closed path does the second part of this paper, for a different purpose. But we not vanish in order to derive a violation of the second law can anticipate its basic idea only in order to see where the — these assumptions are contradictory by definition and any- contradictory premises lie: thing can be vacuously derived from them. Peres’ thought-experiment, though, does not lose its value A container is divided into two chambers containing because of its contradictory premises. The reason lies in its two quantum-ideal-gas samples. These are represented subtle (and nice) presentation: not as an impersonal reason- by non-orthogonal density matrices. Two semi-permeable ing, but as an interplay between two observers (us and a “wily diaphragms are then used to completely separate the two inventor”, as Peres calls him [3, p. 275]). It is thus possible to samples, isothermally, just as in the first example of § 1.4. In re-interpret the contradictory premises as different, contrast- this way some heat is absorbed by the gases, or equivalently, ing pieces of knowledge of two observers. This will be now some work is done by them. The process is then completed so done in the second part of the paper. as to come back to the initial situation, and it is easily shown that a net amount of heat is absorbed by the gases (or a net amount of work done by them), violating the second law. 2. PART II: CONTRASTING PIECES OF KNOWLEDGE The contradictory premises are the following. It is as- AND VIOLATIONS OF THE SECOND LAW sumed, on the one hand, that the two initial preparations of the quantum-gas samples are not one-shot distinguishable, as “One can’t perhaps,” said Humpty Dumpty; “but two is reflected by the use of non-orthogonal density matrices; i.e., can. With proper assistance, you might have left off at that there does not exist a measurement procedure able to dis- seven.” [35] tinguish and separate them in one shot. On the other hand, it is assumed immediately thereafter that there is such a meas- urement procedure, as is reflected by the existence of the sep- 2.1. The dependence of a density matrix on the observer’s knowledge and on the set of preparations and measurements arating diaphragms which must necessarily implement it; in chosen particular, the quantum-gases have then to be represented by orthogonal density matrices. This is a contradiction of course, and it has nothing to do with thermodynamic, work extraction, The fact that a density matrix always depends on the par- or heat absorption. The thought-experiment cannot be contin- ticular knowledge about the properties of the preparation it ued if we do not state clearly which of the mutually exclusive represents, has been stressed amongst others by Kemble [4, p. 1021][5, pp. 1155–57], Jaynes [6, 7, 8, 9, 10], Caves et alternatives, one-shot distinguishability or not-one-shot dis- 19 tinguishability, i.e. orthogonality or non-orthogonality, is the al. [11, 12, 14, 15], and ourselves [16, 17, 18]. Discus- one meant to hold. sions and analyses of the compatibility of density-matrix as- signments by different observers have already been offered, Note that we are not saying that there are no relationships or e.g., by Brun et al. [13], and Caves et al. [14]. valid consistency considerations between quantum mechanics Note in particular that a density matrix can be associated to and thermodynamics. In fact, what many of von Neumann’s a preparation (and a POVM to a measurement) only after we and Peres’ analyses and thought-experiments really demon- have specified the whole sets of preparation and measurement strate is that the quantum-mechanical physical concepts and procedures we want or have to consider — indeed, it is quite principles are consistent with the thermodynamic ones, which appropriate to say that these sets define our ‘system’. This is a fundamental result. As Peres [3, p. 275] states it, “It might appear paradoxical to some, but is clearly reflected in thus appears that thermodynamics imposes severe constraints the fact that, in the description of a phenomenon, we always on the choice of fundamental axioms for quantum theory”.18 have to choose a particular Hilbert space before introducing What we are saying is that the existence of a semi-permeable any density matrix. This fact can be inferred from Peres’ [3] diaphragm that can separate two preparations, and the non- one-shot distinguishability of these preparations, are contra-

19 Caves, Fuchs, and Schack see the density matrix as a sort of “quantum” analogue of a probability, and develop a “quantum” probability theory in 18 We thank an anonymous reviewer for the American Journal of Physics for analogy with (“classical”) probability theory (with analogous theorems, pointing out this passage to us. In the same paragraph, Peres also mention like e.g. Bayes’ and de Finetti’s [11, 12, 15, 97, 98, 99, 100]). We should a petitio principii which, however, concerns the proof of the equivalence of instead prefer to derive the quantum formalism as a particular application the von Neumann and thermodynamic entropies. of probability theory [17, 18]. 9 and Hardy’s [19] work, and we have tried to make it mathem- For this reason, we shall analyse Peres’ thought-experiment atically explicit elsewhere [16, 17, 18]. with the following scheme. First, we present it from the point Thus, if we add a single measurement procedure — perhaps of view of the first observer, that we call Tatiana, which sees a newly discovered one — to the chosen set of measurement a seeming violation of the second law. Then we go over to procedures, we must in some cases change (numerically and Jaynes’ simpler thought-experiment, and see how the two ob- even dimensionally) the density matrices we had previously servers’ (Johann and Marie) contrasting descriptions are re- associated to the preparations.20 solved there. With the insight provided by Jaynes’ experi- ment, we turn back to Peres’, and re-analyse it from the point of view of the second observer, that we call Willard, see- ing that he had some additional knowledge and measurement 2.2. The value of Peres’ thought-experiment from a different perspective means with respect to Tatiana, and that for him the second law is not violated; on the other hand, Tatiana will have to change the dimensions and numerical values of the density matrices Peres’ thought-experiment is particularly suited to show the used in her description if she wants to make allowance for points above. In its presentation, two observers have con- Willard’s new measurement procedure.21 trasting pieces of knowledge about the preparations of the quantum ideal gases: the first observer thinks them not to be one-shot distinguishable, and represents them accordingly 2.3. Peres’ thought-experiment: Tatiana’s description by non-orthogonal density matrices; the second observer, in- stead, knows they are one-shot distinguishable and, in fact, he possesses actual devices — the semi-permeable diaphragms Peres’ demonstration [3, pp. 275–277] can be presented as — by which he separates them. This is reflected in the fact follows. We are in a quantum laboratory, where the physicist that the set of measurement procedures considered by the Tatiana is studying a container wherein two quantum-ideal- second observer contains (at least) a measurement (viz., the gas samples are confined to two chambers, having volume one which can distinguish in one shot the gas preparations) not V/2 each and separated by an impermeable diaphragm. From contained in the set of the first observer, simply because the some measurements made on a series of identically prepared first observer did not know about the existence of that meas- containers, she has chosen to describe the internal quantum urement. degrees of freedom of the ideal gases by means of the density- A similar, two-observer style of presentation was chosen by matrix space for a quantum two-level system. In her descrip- tion, the upper chamber contains a z+-gas, the lower an x+- Jaynes [20, § 6] for a thought-experiment which showed that + + two different observers may have different pieces of know- gas, where z and x are the density matrices defined in § 1.4, + +  ledge and so give different thermodynamic descriptions for Eqns. (7) and (12). They are non-orthogonal, tr(z x ) 0, the same physical situation. When they interact, strange res- because to Tatiana’s knowledge there are no means to distin- ults may arise; e.g., it may appear to one of them as if the other guish the two corresponding preparations in one shot, as she were violating the second law. From this point of view Jaynes’ could not separate the two gases completely with any semi- and Peres’ thought-experiments are very similar also in their permeable diaphragms known to her. results; the similarity is the more fascinating because Jaynes’ Enters a “wily inventor”; let us call him Willard. He claims experiment is completely within a classical, not quantum, con- having produced two such semi-permeable diaphragms, 22 text. which can completely separate the two gases. In fact, by Grad [108, p. 325] also remarked that not only two observ- means of them he reversibly mixes the two gases, obtaining  = ≈ . ers, but even more simply the same observer facing two dif- work equal to Q NkT ln 2 0 693 NkT (cf. Eqn. (11)). ferent applications, may use different pieces of knowledge and From Tatiana’s point of view this is quite surprising! descriptions to study the same physical situation. Now she has a single container of volume V filled with a half/half mixture of z+-andx+-gases (Fig. 3, b). She wants

20 This fact is related to the restrictions (“compatibility” or “positivity do- mains”) of the set of statistical matrices for the “reduced dynamics” of 21 In some situations it might perhaps be necessary to abandon the quantum- some open quantum systems [101, 102, 103, 104, 105, 106, 107]. These theoretical physical principles partially or altogether, in favour of other restrictions simply arise because one wants to describe a set of preparation more economical or aesthetically pleasing. Note that it is easily proven [18] procedures by means of a density-matrix space that is instead only meant (cf. [16, § V], Holevo [109, chap. I]) that there always exist appropriate (if for, and is only appropriate for, a particular smaller set. On the other hand, necessary, infinite-dimensional) quantum-mechanical density-matrix- and the reason why one wants to do this is that quantum mechanics has the fol- POVM-spaces by which one can represent sets of preparation and measure- lowing unfortunate property: if you want to add just one more preparation ment procedures having any statistical properties whatever. In this sense, or measurement procedure to the sets that interest you, you must be ready the quantum-theoretical mathematical formalism can never be “proven to “pay” for at least 2N +1 more dimensions for your matrix-spaces, where wrong” (but note that the same also holds for the classical statistical form- N2 − 1 is the dimension of the (normalised-Hermitean-matrix) spaces you alism [109, § I.7]), although it may be redundant (see preceding footnote). were using (i.e., you pay the difference between (N + 1)2 − 1andN2 − 1). 22 This being a scientific paper, we ideally (i.e., unrealistically) assume Wil- Compare this with the classical “cost” of only 1 more dimension. lard’s honesty and the absence of any fraud. 10

(a) (b) (c) completed because the initial and final situations are the same. |z+ z+| The total heat Q absorbed by the gases equals the work —  . | + +|+ . |α+ α+|+ Q >-0 0 5 z z ≡- 0 85 experimentally measured — done by them and amounts to ? 0.5 |x+ x+| 0.15 |α− α−| |x+ x+| Q = Q + Q ≈ (0.693 − 0.416) NkT = 0.277 NkT > 0. (19) 6 ≡ −Q < Q Hence, she sees a violation of the second law (3) because, ≡ ? (f) (a) (e) (d) from her point of view, |z+ z+| |z+ z+|   |α+ α+| Q > 0 in the cyclic process. (20) |x+ x+| |z+ z+| |α− α−| Tatiana accuses Willard of having violated the second law by means of his strange semi-permeable diaphragms that Figure 3: Quantum gas experiment from Tatiana’s point of “completely separate non-orthogonal density matrices”. view. Has the second law really been violated? Also from Wil- lard’s point of view? We do not answer these questions now; instead, we leave the quantum laboratory where Tatiana and to return to the initial situation, but she cannot use Willard’s Willard are now arguing after their experiment, and enter an puzzling means, of course. For her the situation is now the adjacent classical laboratory, where we shall look at Jaynes’ same as that discussed in the second example of § 1.4: the gas mixture is for her equivalent (Fig. 3, d) to a mixture of demonstration [20]. The situation there is in many respects + very similar to the previous one, though it is completely “clas- approximately 0.854 parts of an α -gas and 0.146 parts of an − + − sical”. α -gas, where α and α are the density matrices defined in Eqn. (14). Tatiana uses two semi-permeable diaphragms to α± α+ . separate the two -gases, the -gas into a 0 854 fraction of 2.4. Jaynes’ thought-experiment the volume V,andtheα−-gas into the remaining 0.146 frac- tion (so that they have the same pressure), and spends work In the classical laboratory, we have a container wherein equal to −Q = −NkT(0.854 ln 0.854 + 0.146 ln 0.146) ≈ an ideal-gas sample is equally divided into two chambers 0.416 NkT, cf. Eqn. (18). of volumes V/2 each and separated by an impermeable dia- Tatiana then performs two operations corresponding to phragm (Fig. 4, a). From the measurements made by the sci- unitary rotations which reversibly change the density matrices entist Johann, the gas in the two chambers is exactly the same, associated to the two gases into the same density matrix, say “ideal argon”.24 For Johann it would thus be impossible, not z+, so that the two chambers eventually contain for her the + to say meaningless, to find a semi-permeable diaphragm that same z -gas. She then eliminates the diaphragms and reinserts be transparent to the gas in the upper chamber and opaque another impermeable one in the middle to divide the gas into to the gas in the lower one, and another diaphragm with the two chambers of equal volume (Fig. 3, e) — which is, from opposite properties. her point of view, a reversible operation —, and finally per- The scientist Marie, also in the laboratory, states never- + → + forms again an operation represented by a rotation z x theless that she has in fact two diaphragms with those very of the density matrix associated to the gas in the lower cham- properties. She uses them to reversibly and isothermally mix ber. In this way she has apparently re-established the original the two halves of the gas, obtaining work equal to Q = condition of the gases (Fig. 3, a), which have thus undergone NkT ln 2 ≈ 0.693 NkT (Fig. 4, b), and leaving Johann stu- a cycle. The last operations were assumed to be performable pefied. without expenditure or gain of work, hence without heat ex- 23 In fact, from Johann’s point of view Marie has left things change either. exactly as they were: he just needs to reinsert the impermeable Tatiana summarises the results as follows: a cycle has been diaphragm in the middle of the container and for him the situ- ation is exactly the same as in the beginning: the one gas is equally divided into two equal chambers (Fig. 4, c a). Johann’s conclusion is the following: The initial and final 23 Von Neumann [1, pp. 194, 197] and Peres [3, p. 275] assert that unitary rotations can be realised without heat exchange, but work exchange is al- conditions of the gas are the same, so a cyclic process has been lowed and indeed sometimes necessary. However, in our present discus- completed. The work obtained — experimentally measured sion we have assumed all processes to be isothermal and all gases ideal, — equals the heat absorbed by the gas, which implies that any reversible isochoric work exchange (like the unit- ary rotations) would be accompanied by an equivalent reversible heat ex- Q = Q ≈ 0.693 NkT > 0, (21) change (see § 1.4), with a consequent undesired entropy change. This is why Tatiana’s final isochoric unitary rotations must be performed with no energy exchange. This issue is related to the problematic way in which the quantum and classical or thermodynamic descriptions are combined; namely, the density matrices are not thermodynamic variables au pair with 24 Real argon, of course, behaves like an ideal gas only in certain ranges of the real numbers (V and T) describing the gas; cf. § 1.4. temperature and volume. 11

(a) (b) (c) ≡ (a) (a) (b) (c)  (a) 0.5 aAr Ar Ar aAr   . a b Q >-0 - Q >-0 0 5 Ar - 0.5 Ar Ar ? ! 0.5 bAr 0.5 aAr Ar Ar bAr 0.5 bAr 6 ≡ 6  (−Q Q) Figure 4: Classical gas experiment from Johann’s point of Figure 5: Classical gas experiment from Marie’s point of view. view.

26 and so for Johann a property not shared by the a variety. Marie’s separation of the two gases aAr and bAr was possible by means of two Q > 0 in the cyclic process (22) semi-permeable diaphragms made of whifnium and whafnium that take advantage of these different properties. (Of course, (cf. Tatiana’s Eqn. (20)), which plainly contradicts the second Jaynes’ and our speaking of ‘argon a’, ‘argon b’, ‘whafnium’, law of thermodynamics (3). and ‘whifnium’ in this imaginary experiment is just a coloured way of stating that the two samples are of different gases. The We see that what happens here is completely analogous to reader can, if he or she so prefers, simply call them ‘gas A’ what has happened in the quantum laboratory: a physicist has, and ‘gas B’ and take into account their different behaviour from initial measurements, a particular description of a given with respect to the two semi-permeable diaphragms.) situation. Then a process takes place that contradicts the phys- According to Marie (Fig. 5), the second law is not viol- icist’s mathematical description, and the re-establishment of ated: Initially the two gases aAr and bAr were completely what was thought to be the initial situation yields an apparent separated in the container’s two chambers (Fig. 5, a). After experimental violation of the second law. her mixing and extracting work, the container contained an The reader has no doubt noticed that the facts were presen- equal mixture of aAr and bAr (Fig. 5, b). Upon Johann’s ted not only from Johann’s point of view, but also, so to speak, reinsertion of the impermeable diaphragm the container was taking side with him. In fact, we plainly ignored Marie’s again divided in two equal chambers, but each chamber con- experimental performance in our mathematical description. tained a half/half mixture of aAr and bAr(Fig.5,c),and But it should be clear that, since the two gas samples in the this was different from the initial condition (Fig. 5, a). Thus chambers behaved differently with respect to Marie’s semi- the cycle was not completed, although it appeared so to Jo- permeable diaphragms (one sample exerted pressure on one hann, and so the form of the second law for cyclic processes, of the diaphragms but not on the other, and vice versa for the Eqn. (3), cannot be applied. To close the cycle one has to other sample), then they must actually be samples of two dif- use the semi-permeable diaphragms again to relegate the two ferent gases,25 contrary to what Johann (and we) believed and gases to two separate chambers, and must thereby spend an described mathematically. amount of work −Q at least equal to that previously obtained,   Note that this fact does not completely contradicts Johann’s −Q Q , and the second law (3) for the completed cycle is   point of view. It simply means that the two samples behave satisfied: Q = Q + Q 0. exactly in the same manner with respect to Johann’s exper- The simple conclusion, stated in terms of entropy, that imental and measurement means, and so he was justified to Jaynes [20, § 3] draws from this demonstration, is that consider them as samples of the same gas. But now Johann it is necessary to decide at the outset of a prob- has experimental evidence that the two samples behave dif- lem which macroscopic variables or degrees of ferently in certain circumstances, and so, in order to avoid freedom we shall measure and/or control; and inconsistencies, they have to be considered as samples of dif- within the context of the thermodynamic sys- ferent gases, at least in all experimental situations in which tem thus defined, entropy will be some func- their difference in behaviour comes about (such as the mixing tion S (X ,...,X ) of whatever variables we have performed by Marie). 1 n chosen. We can expect this to obey the second So let us see how the whole process has taken place from law [Q/T ΔS ] only as long as all experi- Marie’s point of view. She explains that the gas samples ini- ff mental manipulations are confined to that chosen tially contained in the two chambers were two di erent kinds set. If someone, unknown to us, were to vary of ideal-argon, of which Johann had no knowledge: ‘argon a’ a macrovariable X + outside that set, he could (aAr) and ‘argon b’(bAr). Argon a is soluble in whafnium n 1 while argon b is not, but the latter is soluble in whifnium,

26 Jaynes [20, § 5] explains that ‘whifnium’, as well as ‘whafnium’, “is one of the rare superkalic elements; in fact, it is so rare that it has not yet been 25 Provided, of course, that this phenomenon is also reproducible. discovered”. 12

produce what would appear to us as a violation and so for her they were appropriately represented by non-or- of the second law, since our entropy function thogonal density matrices. But, in order to avoid inconsisten- S (X1,...,Xn) might decrease spontaneously [i.e., cies, the two preparation procedures have to be represented by without absorption of heat (Q = 0)], while his orthogonal density matrices in all experimental situations in- S (X1,...,Xn, Xn+1) increases. volving the new measurement capability — such as Willard’s mixing process. This is old wisdom: Grad [108, p. 325] (see also [110, 111]) Willard thus represents the two quantum ideal gases by or- explained thirty-one years earlier that thogonal density matrices, and his mathematical analysis of the thermodynamic process is different from Tatiana’s. Let us the adoption of a new entropy is forced by the follow, step by step, a possible explanation of the quantum-gas discoveryofnewinformation. [...] Theexist- process from his point of view. He explains that the internal ence of diffusion between oxygen and nitrogen quantum degrees of freedom of the gases are best represented somewhere in a wind tunnel will usually be of by the density-matrix space for a quantum four-level system no interest. Therefore the aerodynamicist uses an (it might be that the molecules of the gas where diatomic and entropy which does not recognise the separate ex- not mono-atomic as Tatiana believed), of which Tatiana only istence of the two elements but only that of “air”. used a subspace because of her limited measurement means In other circumstances, the possibility of diffu- (e.g., she probed the internal quantum degrees of freedom of sion between elements with a much smaller mass only one of the molecule’s atoms) . In other words, part of ratio (e.g., 238/235) may be considered quite rel- the density-matrix space used by Willard was “traced out” in evant. Tatiana’s description, because she had only access to measure- ment procedures represented by a portion of the total POVM But Jaynes’ and Grad’s remarks do not apply only to en- space. tropy; they have greater generality. We always choose some Denoting by {|z+ z+ , |z− z+ , |z+ z− , |z− z− } the orthonormal variables — with particular ranges, scales, and governing basis for the Hilbert space used by Willard, Tatiana could not equations — to describe a physical phenomenon. And such distinguish, amongst others, the preparations corresponding | + + | + − | + a choice always represents, and is dependent on, the particu- to z z and to z z , both of which she represented √ as z , | + + def= | + + +| − + / lar knowledge that we have about that phenomenon, or that we nor those corresponding to x z√ z z z z 2and think is sufficient to describe it in a given situation or applic- to |x+ z− def= |z+ z− + |z− z− / 2, which she represented as ation.27 This is true in particular for the quantum-mechanical |x+ . Tatiana’s projection is thus of the kind |φψ →|φ : density matrices, Hilbert spaces, and so on. |z+ z+ →|z+ , |z− z+ →|z− , (23) |z+ z− →|z+ , |z− z− →|z− , 2.5. Peres’ thought-experiment: Willard’s description from which also follows

With the insight provided by the analysis of the classical + + + − + − |x z →|x , |x z →|x , experiment and by Grad’s and Jaynes’ remarks, we can return (24) | + − →| + , | − − →| − . to the quantum laboratory and look with different eyes at what x z x x z x happened there. From Willard’s point of view, the process went as follows. Just as in the case of Johann and Marie, we must admit that He had also made some measurements on identically prepared in the presentation of the quantum experiment we took not containers, and according to his results, the container’s two only Tatiana’s point of view, but also her parts, disregarding chambers initially containedz ˜+-andˆx+-gases (Fig. 6, a), with Willard’s experimental evidence in our mathematical descrip- tion. In fact, Willard showed that the two quantum ideal gases 1000 + def= | + + + +| = 0000 , can be completely (and, it is assumed, reproducibly) separ- z˜ z z z z ˆ 0000 (25) 0000 ated; and as shown in § 1.5 this means that there must ex- 0000 + def= | + − + −| = 0000 . ist a measurement procedure by which the two corresponding xˆ x z x z ˆ 0010 (26) quantum preparations can be distinguished in one shot. This 0000 is not in complete contradiction with Tatiana’s initial descrip- These density matrices are orthogonal, tr(˜z+ xˆ+) = 0, be- her tion: with the measurement means at disposal, the two cause his measurement means allow him to distinguish the preparation procedures were not distinguishable in one shot, two corresponding preparations in one shot. This is precisely what Tatiana could not do, instead, and so she represented the two preparations by the non-orthogonal density matrices z+ = |z+ z+| and x+ = |z− z−|. 27 For an excellent discussion on different levels and scales of description in the particular case of thermodynamics, see Wood [112]; also Samohýl [22, Willard mixed the two separable gases with his semi- especially § 7] and Jaynes [113, § 1.2]. permeable diaphragms, obtaining work (Fig. 6, b), so that the 13

(a) (b) (c) for gases in the upper chamber, and 0.43 |α+ z+ α+ z+|+ |z+ z+ z+ z+|  . | + + + +|+ . |α− + α− +|+ Q >-0 0 5 z z z z - 0 07 z z − + − + + + + + + − + − + − + − |α α | →| |, ! . |x z x z | . |α z α z |+ z z z z z z (35) + − + − 0 5 0 43 |x z x z | − − − − − − − − + − + − 0.07 |α z α z | |α z α z | →|z z z z | (36)

6       (−Q > Q + Q ) −Q < Q for the gases in the lower chamber. The successive elimin- (f)  (a) (e) (d) ? 0.5 |z+ z+ z+ z+|+ 0.5 |z+ z+ z+ z+|+ ation and reinsertion of the impermeable diaphragm yielded + + + + + − + − + − + − . |α α |+ / 0.5 |z z z z | 0.5 |z z z z | 0 5 z z two chambers of equal volume V 2 and same content, viz. the   + − + − + + + + + − + − + + + + + + + + 0.5 |α z α z | | | | | 0.5 |x z x z |+ 0.5 |z z z z |+ mixture of gases described by z z z z and z z z z 0.5 |x+ z− x+ z−| 0.5 |z+ z− z+ z−| q (Fig. 6, e). 0.5 |α− z+ α− z+|+ Tatiana’s final rotation for the gas in the lower chamber, 0.5 |α− z− α− z−| |z+ z+ →|x+ z+ , |z+ z− →|x+ z− , (37) Figure 6: Quantum gas experiment from Willard’s point of only led to two equal chambers containing the mixtures view. 1 | + + + +| + 1 | + − + −| 1 | + + + +| + of 2 z z z z 2 z z z z and 2 x z x z 1 | + − + −| 2 x z x z gases respectively (Fig. 6, f). container eventually contained a τ-gas, where From her point of view, i.e., from what her measuring 1000 means could tell, this final situation was identical with the 1 1 + + τ = + + + = 1 0000 . initial one (Fig. 6, a), i.e. a z -gas in one chamber and an x - z˜ xˆ ˆ 2 0010 (27) 2 2 0000 gas in the other, and so she thought the thermodynamic cycle Tatiana’s subsequent separation by means of her semi- completed. But we see now that the final and initial situations permeable diaphragms (Fig. 6, c d) which can separate the were in fact different. Hence the second law was not violated, preparations corresponding to α+ and α−, is represented by because the cycle was not completed, and the form (3) for the Willard by the two-element POVM {E+, E−}, with second law does not apply. It is also easy to see that in order to return to the initial ± def ± + ± + ± − ± − E = |α z α z | + |α z α z |, condition an amount of work −Q 4 × (1/4)NkT ln 2 ≈ ⎛ √ √ ⎞ . | + + + +| ⎜ 2±√ 2 ± √20 0⎟ 0 693 NkT has to be spent to separate the z z z z - ⎜ ⎟ (28) + − + − = 1 ⎜ ± 22∓ 20√ 0√ ⎟ , gas from the |z z z z |-gas, and analogously for the ˆ 4 ⎝⎜ ± ± ⎠⎟ 002√ 2 √2 | + + + +| | + − + −| 00± 22∓ 2 x z x z -and x z x z -gases. A final operation must then be performed corresponding to the rotations of the dens- where ity matrices |z+ z− z+ z−| and |x+ z+ x+ z+| to |z+ z+ z+ z+| + − + − √ 1 and |x z x z | respectively, and we have finally reached ± + − ± + ± + |α z def= 2 ± 2 2 |z z ±|x z , (29) again the initial condition (Fig. 6, a). The total amount of √ − 1 heat absorbed by the gases in this completing process would ± − def 2 ± − ± − |α z = 2 ± 2 |z z ±|x z (30) then be (cf. Eqn. (14)), and the associated CPMs (projections) τ → =  +  +  . − . − . = E±τE±/ tr(E±τE±). Q Q Q Q (0 693 0 416 0 693)NkT That separation led to a chamber, with volume 0.854 V, − 0.416NkT 0, (38) containing the gas mixture and the second law, for the completed cycle, would be satisfied 1 1 |α+ z+ α+ z+| + |α+ z− α+ z−|, (31) (strictly so: we see that the whole process is irreversible, and 2 2 it is easy to check that the only irreversible step was Tatiana’s + − and another chamber, with volume 0.146 V, containing the gas transformation and separation into α -andα -gases). mixture The original conclusion has thus been reversed: no viola- 1 1 tion of the second law is found. |α− z+ α− z+| + |α− z− α− z−|. (32) 2 2 Note that Tatiana could not notice that these were mixtures, 3. CONCLUSIONS because of her limited instrumentation. Moreover, from Wil- lard’s point of view the separation brought about a transform- Preparation procedure and density matrix, with their re- ation of the gases’ quantum degrees of freedom. spective properties, are quite different concepts, but intimately The following step corresponded to the unitary rotations related because the latter is a mathematical representation of + + + + + + + + the former. In particular, the orthogonality of two density |α z α z | →|z z z z |, (33) matrices mathematically represents thefactthattheircorres- |α+ − α+ −| →| + − + −| z z z z z z (34) ponding preparation procedures can be distinguished in one 14 shot. The one-shot distinguishability of two preparation pro- Appendix: ORTHOGONALITY PROOF cedures is also equivalent to the possibility of separating them by appropriate semi-permeable diaphragms. This equivalence Suppose that two preparation procedures are represented by is important because it relates their statistical and gross ther- density matrices φ and ψ, and that we have a measurement modynamic properties. procedure represented by a POVM {Eμ, Fν} such that A consequence of this physical-mathematical relation is tr(φEμ) = 0andtr(ψEμ)  0forallEμ, that an observer cannot assume the non-orthogonality of two (A.1) density matrices and, at the same time, claim the one-shot dis- tr(ψFν) = 0andtr(φFν)  0forallFν. tinguishability of their corresponding preparation procedures, because this assumption is self-contradictory and can only These equations mathematically represent the condition of lead to vain conclusions. We have seen this kind of contra- one-shot distinguishability, as discussed in § 1.2. dictory assumption behind the second statement (2). We want to prove that from the above equations it follows that the two density matrices are orthogonal: However, two observers can represent the same prepara- tions by means of density matrices having different numerical tr(φψ) = 0. (A.2) values and properties (e.g., orthogonality), and even different dimensionality. This may happen either because the two ob- To prove this, let us consider the two-element POVM {E, F} servers have different pieces of knowledge about the prepara- given by: tions’ properties and the measurements available; or because def= , def= . each observer, having a different purpose or application, con- E Eμ F Fν (A.3) siders different sets of preparations and measurements to de- μ ν scribe the same physical phenomenon. In particular, proper- This POVM represents a “coarse graining” of the original ties like one-shot distinguishability — and thus also orthogon- measurement procedure, with some results grouped together. ality — always depend on the particular set of measurement It is easy to see that it satisfies procedures which an observer decides to consider (sometimes the whole known set at his or her disposal). tr(φE) = 0andtr(ψE) = 1  0, (A.4) We have seen an example of this point in Peres’ thought- tr(ψF) = 0andtr(φF) = 1  0. experiment: An observer does not know of any measurement procedure that can distinguish in one shot two particular pre- The first POVM element E can be written as a sum of one- {| |} parations; accordingly, she represents them by non-orthogonal dimensional orthogonal projectors i i (here and in the fol- density matrices. A second observer then shows the existence lowing the indices i, k, j,andl take values in appropriate and of such a measurement procedure, and so both observers must possibly distinct ranges): then use orthogonal density matrices to represent the prepar- = | |, < ations, at least when they want to operate with that measure- E ei i i with 0 ei 1foralli (A.5) ment. i In any case, the consistency between the physical phenom- (note that the sum is not necessarily a convex combination, ffi ena considered and their mathematical description is always even though its coe cients are positive and not greater than essential. unity). We can decompose the first density matrix φ in an analog- ous manner: φ = φk|k˜ k˜|, with 0 <φk 1forallk and φk = 1, k k Acknowledgements (A.6) where the projectors {|k˜ k˜|} are not necessarily parallel or or- thogonal to the {|i i|}. We thank an anonymous reviewer (see Footnote 18) for The assumption tr(φE) = 0 can be written as some constructive criticisms which helped to explain the ideas 2 of this paper more clearly. PGLM thanks Louise for en- tr(φE) = φkei| k˜ | i | = 0, (A.7) couragement, suggestions, and invaluable odradek, and Peter k,i Morgan for many useful comments. Special and affectionate φ thanks also to the staff of the Royal Institute of Technology’s which by the strict positivity of the k and ei implies Library, for their irreplaceable work and ever prompt support. ˜ | = , AM thanks Anders Karlsson for encouragement. k i 0 for all k and i,(A.8) We dedicate this paper to the memory of Asher Peres and i.e., the projectors — equivalently, the eigenvectors — of the to his clarifying work in quantum mechanics. matrix E are in fact all orthogonal to those of the matrix φ. 15

This means that the {|i i|} can be completed by the {|k˜ k˜|} (they are not those used in the previous proof, although we use and some additional projectors {| j j|} to form a complete set the same symbols for convenience). These operators form, as of orthonormal projectors — an orthonormal basis: is easily checked, a POVM {E, F}, which satisfy Eqns. (A.4). If we moreover assume that any POVM may be physically {| |, |˜ ˜|, |  |} . i i k k j j ik j (A.9) realised by some measurement procedure,28 then we have proven the converse statement: if two preparations are rep- Writing the identity matrix I in terms of the new projectors, resented by orthogonal density matrices, it means that they are one-shot distinguishable. q.e.d. I ≡ |i i| + |k˜ k˜|, + | j j|, (A.10) i k j we have that the second POVM element F must be given by [1] J. von Neumann, Mathematische Grundlagen der Quanten-   F = I − E = (1 − ei)|i i| + |k˜ k˜| + | j j |. (A.11) mechanik (Springer-Verlag, Berlin, 1996), 2nd ed., first publ. i k j 1932, transl. by Robert T. Beyer as Mathematical Found- ations of Quantum Mechanics (Princeton University Press, Let us now decompose the second density matrix ψ as Princeton, 1955). [2] J. von Neumann, Thermodynamik quantenmechanischer Ge- ψ = ψl|lˆ lˆ|, with 0 <ψl 1foralll and ψl = 1, samtheiten, Gött. Nach. pp. 273–291 (1927), repr. in John l l von Neumann: Collected Works. Vol. I: Logic, theory of sets (A.12) and quantum mechanics (Pergamon Press, Oxford, 1961), where the projectors {|lˆ lˆ|} do not necessarily belong to the ed. by A. H. Taub, pp. 236–254. complete set previously introduced. [3] A. Peres, Quantum Theory: Concepts and Methods (Kluwer Using Eqns. (A.12) and (A.11), we rewrite the assumption Academic Publishers, Dordrecht, 1995). that tr(ψF) = 0as [4]E.C.Kemble,Fluctuations, thermodynamic equilibrium and entropy, Phys. Rev. 56, 1013–1023 (1939). tr(ψF) = ψ (1 − e )| lˆ | i |2 + [5]E.C.Kemble,The quantum-mechanical basis of statistical l i mechanics,Phys.Rev.56, 1146–1164 (1939). l,i (A.13) [6] E. T. Jaynes, Where do we stand on maximum entropy?, 2  2 ψl| lˆ | k˜ | + ψl| lˆ | j | = 0. in R. D. Levine and M. Tribus, eds., The Maximum En- l,k l, j tropy Formalism (The MIT Press, Cambridge (USA), p. 15, http://bayes.wustl.edu/etj/node1.html. Noting that all the sum terms are non-negative, and that the [7] E. T. Jaynes, Inferential scattering (1993), coefficients ψl are strictly positive, we deduce in particular http://bayes.wustl.edu/etj/node1.html;“extens- that ively rewritten” version of Generalized scattering,inC.R. Smith and W. T. Grandy, Jr., eds., Maximum-Entropy and lˆ | k˜ = 0, for all l and k, (A.14) Bayesian Methods in Inverse Problems (D. Reidel Publishing Company, Dordrecht, 1985), p. 377. which means that the eigenvectors of the density matrix φ are [8]E.T.Jaynes,Clearing up mysteries — the original goal, all orthogonal to those of the matrix ψ. in Maximum Entropy and Bayesian Methods, edited by We have thus J. Skilling (Kluwer Academic Publishers, Dordrecht, 1989), pp. 1–27, http://bayes.wustl.edu/etj/node1.html. tr(φψ) = φ ψ | lˆ | k˜ |2 = 0, (A.15) [9]E.T.Jaynes,Probability in quantum theory,inW.H.Zurek, k l ed., Complexity, Entropy, and the Physics of Information k,l (Addison-Wesley, 1990), p. 381; revised and extended version q.e.d. at http://bayes.wustl.edu/etj/node1.html. The converse is also easy to demonstrate. Consider again [10] E. T. Jaynes, Probability Theory: The Logic of the definitions (A.6) and (A.12), and assume that these density Science (Cambridge University Press, Cambridge, matrices are orthogonal, i.e., Eqn. (A.15). It follows that their 2003), ed. by G. Larry Bretthorst, first publ. 1995 at http://bayes.wustl.edu/etj (chapter 30, Maximum eigenvectors are orthogonal, Eqn. (A.14), and may be used to-   entropy: matrix formulation, of the original version of gether with additional projectors {| j j |} to form a complete the book, not included in the printed one, is available at orthonormal set http://www.imit.kth.se/~mana/work/cc30e.pdf).   [11] C. A. Fuchs, Quantum foundations in the light of quantum {|k˜ k˜|, |lˆ lˆ|, | j j |}k,l, j. (A.16) information (2001), arxiv eprint quant-ph/0106166. [12] C. A. Fuchs and R. Schack, Unknown quantum states Now define the operators and operations, a Bayesian view (2004), arxiv eprint quant-ph/0404156. E def= |k˜ k˜|, (A.17) k F def= I − E =+ |lˆ lˆ| + | j j| (A.18) 28 l j See, however, Footnote 7. 16

[13] T. A. Brun, J. Finkelstein, and N. D. Mermin, How much state [35] L. Carroll, (Rev. Charles Lutwidge Dodgson), Through the assignments can differ,Phys.Rev.A65, 032315 (2002), arxiv Looking-Glass, and What Alice Found There (1871), repr. in eprint quant-ph/0109041. The Annotated Alice (Penguin Books, London, 1970), ed. by [14] C. M. Caves, C. A. Fuchs, and R. Schack, Conditions for Martin Gardner. compatibility of quantum-state assignments,Phys.Rev.A66, [36] H. Ekstein, Presymmetry, Phys. Rev. 153, 1397–1402 (1967). 062111 (2002), arxiv eprint quant-ph/0206110. [37] H. Ekstein, Presymmetry. II,Phys.Rev.184, 1315–1337 [15] C. M. Caves, C. A. Fuchs, and R. Schack, Quantum probabilit- (1969). ies as Bayesian probabilities, Phys. Rev. A 65, 022305 (2002), [38] R. Giles, Foundations for quantum statistics, J. Math. Phys. 9, arxiv eprint quant-ph/0106133. 357–371 (1968). [16] P. G. L. Mana, Why can states and measurement out- [39] R. Giles, Foundations for quantum mechanics, J. Math. Phys. comes be represented as vectors? (2003), arxiv eprint 11, 2139–2160 (1970). quant-ph/0305117. [40] R. Giles, Formal languages and the foundations of physics, [17] P. G. L. Mana, Probability tables, arxiv eprint in C. A. Hooker, ed., Physical Theory as Logico-Operational quant-ph/0403084, repr. in A. Y. Khrennikov, ed., Structure (D. Reidel Publishing Company, Dordrecht, 1979), Quantum Theory: Reconsideration of Foundations — 2 pp. 19–87. (Växjö University Press, Växjö, Sweden, 2004), pp. 387–401. [41] D. J. Foulis and C. H. Randall, Operational statistics. I. Basic [18] P. G. L. Mana, (2005), in preparation. concepts, J. Math. Phys. 13, 1667–1675 (1972). [19] L. Hardy, Quantum theory from five reasonable axioms [42] D. J. Foulis and C. H. Randall, Manuals, morphisms and (2001), arxiv eprint quant-ph/0101012. quantum mechanics, in Marlow [114] (1978), pp. 105–126. [20] E. T. Jaynes, The Gibbs paradox,inMaximum- [43] C. H. Randall and D. J. Foulis, Operational statistics. II. Entropy and Bayesian Methods,editedbyC.R. Manuals of operations and their logics, J. Math. Phys. 14, Smith, G. J. Erickson, and P. O. Neudorfer (Kluwer 1472–1480 (1973). Academic Publishers, Dordrecht, 1992), pp. 1–22, [44] J. L. Park and W. Band, Mutually exclusive and exhaustive http://bayes.wustl.edu/etj/articles/gibbs.paradox.pdf. quantum states, Found. Phys. 6, 157–172 (1976). [21] C. A. Truesdell, III, Rational Thermodynamics (Springer- [45] W. Band and J. L. Park, New information-theoretic founda- Verlag, New York, 1984), 2nd ed., first publ. 1969. tions for quantum statistics, Found. Phys. 6, 249–262 (1976). [22] I. Samohýl, Thermodynamics of Irreversible Processes in [46] A. Peres, What is a state vector?, Am. J. Phys. 52, 644–650 Fluid Mixtures (Approached by Rational Thermodynamics) (1984). (BSB B. G. Teubner Verlagsgesellschaft, Leipzig, 1987). [47] A. Peres, When is a quantum measurement?, Am. J. Phys. 54, [23] M. Šilhavý, The Mechanics and Thermodynamics of Continu- 688–692 (1986). ous Media (Springer-Verlag, Berlin, 1997). [48] L. E. Ballentine, The statistical interpretation of quantum [24] C. A. Truesdell, III and W. Noll, The Non-Linear Field The- mechanics, Rev. Mod. Phys. 42, 358 (1970). ories of Mechanics (Springer-Verlag, Berlin, 1992), 2nd ed., [49] H. P. Stapp, S -matrix interpretation of quantum theory, Phys. first publ. 1965. Rev. D 3, 1303 (1971). [25] W. Noll, A new mathematical theory of simple materials, [50] H. P. Stapp, The Copenhagen interpretation, Am. J. Phys. 40, Arch. Rational Mech. Anal. 48, 1–50 (1972). 1098 (1972). [26] W. Noll, 5 Contributions to Natural Philosophy (2004), [51] A. Peres, Pure states, mixtures and compounds,inMarlow http://www.math.cmu.edu/~nw0z/publications/04-CNA-018/018abs/018abs.html[114] (1978), pp. 357–363.. [27] J. C. Willems, Dissipative dynamical systems: Part I: Gen- [52] K. Kraus, States, Effects, and Operations: Fundamental No- eral theory, Arch. Rational Mech. Anal. 45, 321–351 (1972), tions of Quantum Theory (Springer-Verlag, Berlin, 1983). see also J. C. Willems, Dissipative dynamical systems: Part [53] A. Komar, Indeterminate character of the reduction of the II: Linear systems with quadratic supply rates, Arch. Rational wave packet in quantum theory, Phys. Rev. 126, 365 (1962). Mech. Anal. 45, 352–393 (1972). [54] E. P. Wigner, On the quantum correction for thermodynamic [28] B. D. Coleman and D. R. Owen, A mathematical foundation equilibrium, Phys. Rev. 40, 749–759 (1932). for thermodynamics, Arch. Rational Mech. Anal. 54, 1–104 [55] U. Fano, Description of states in quantum mechanics by dens- (1974). ity matrix and operator techniques, Rev. Mod. Phys. 29, 74–93 [29] B. D. Coleman and D. R. Owen, On thermodynamics and (1957). elastic-plastic materials, Arch. Rational Mech. Anal. 59, 25– [56] K. Kraus, General state changes in quantum theory, Ann. of 51 (1975). Phys. 64, 311–335 (1971). [30] B. D. Coleman and D. R. Owen, On the thermodynamics of [57] E. B. Davies, Quantum Theory of Open Systems (Academic semi-systems with restrictions on the accessibility of states, Press, London, 1983). Arch. Rational Mech. Anal. 66, 173–181 (1977). [58] A. Peres, Classical interventions in quantum systems. I. The [31] B. D. Coleman and D. R. Owen, On the thermodynamics of measuring process, Phys. Rev. A 61, 022116 (2000), arxiv elastic-plastic materials with temperature-dependent moduli eprint quant-ph/9906023. and yield stresses, Arch. Rational Mech. Anal. 70, 339–354 [59] M.-D. Choi, Completely positive linear maps on complex (1979). matrices, Lin. Alg. Appl. 10, 285–290 (1975). [32] G. Del Piero, A mathematical characterization of plastic phe- [60] G. Ghirardi, A. Rimini, and T. Weber, Is the density matrix de- nomena in the context of Noll’s new theory of simple materials, scription of statistical ensembles exhaustive?, Nuovo Cimento Eng. Fract. Mech. 21, 633–639 (1985). B 29, 135–158 (1975). [33] G. Del Piero and L. Deseri, On the concepts of state and free [61] O. Cohen, Counterfactual entanglement and nonlocal correl- energy in linear viscoelasticity, Arch. Rational Mech. Anal. ations in separable states,Phys.Rev.A60, 80–84 (1999), 138, 1–35 (1997). arxiv eprint quant-ph/9907109. Cohen’s confusion is ex- [34] J. Serrin, Conceptual analysis of the second law of thermody- plained and corrected by D. R. Terno, Comment on “Coun- namics, Arch. Rational Mech. Anal. 70, 355–371 (1979). terfactual entanglement and nonlocal correlations in separ- 17

able states”, Phys. Rev. A 63, 016101 (2001), arxiv eprint [82] C. A. Truesdell, III, Sulle basi della termodinamica delle mi- quant-ph/9912002. See also Cohen’s reply, Reply to “Com- scele, Atti Accad. Naz. Lincei: Rend. Sc. fis. mat. nat. 44, ment on ‘Counterfactual entanglement and nonlocal correla- 381–383 (1968). tions in separable states’ ”,Phys.Rev.A63, 016102 (2001), [83] I. Samohýl, Thermodynamics of reacting mixtures of any sym- arxiv eprint quant-ph/0008080. metry with heat conduction, diffusion and viscosity,Arch.Ra- [62] G.-L. Long, Y.-F. Zhou, J.-Q. Jin, and Y. Sun, Distinct- tional Mech. Anal. 147, 1–45 (1999). ness of ensembles having the same density matrix and the [84] I. Müller, A thermodynamic theory of mixtures of fluids,Arch. nature of liquid NMR quantum computing (2004), arxiv eprint Rational Mech. Anal. 28, 1–39 (1968). quant-ph/0408079. [85] R. M. Bowen and D. J. Garcia, On the thermodynamics of mix- [63] L. P. Hughston, R. Jozsa, and W. K. Wootters, A complete clas- tures with several temperatures, Int. J. Engng. Sci. 8, 63–83 sification of quantum ensembles having a given density matrix, (1970). Phys. Lett. A 183, 14–18 (1993). [86] R. J. Atkin and R. E. Craine, Continuum theories of mixtures: [64] A. Hobson, Concepts in Statistical Mechanics (Gordon and basic theory and historical development,Q.J.Mech.Appl. Breach Science Publishers, New York, 1971). Math. 29, 209–244 (1976). [65] W. K. Wootters, Statistical distance and Hilbert space, Phys. [87] R. J. Atkin and R. E. Craine, Continuum theories of mixtures: Rev. D 23, 357 (1981). applications, J. Inst. Maths. Applics 17, 153–207 (1976). [66] W. K. Wootters and B. D. Fields, Optimal state-determination [88] R. Fosdick and J. Patiño, On the Gibbsian thermostatics of by mutually unbiased measurements, Ann. of Phys. 191, 363– mixtures, Arch. Rational Mech. Anal. 93, 203–221 (1986), 381 (1989). repr. in B. D. Coleman, M. Feinberg, and J. Serrin, eds., Ana- [67] A. Peres and W. K. Wootters, Optimal detection of quantum lysis and Thermomechanics: A Collection of Papers Dedic- information,Phys.Rev.Lett.66, 1119 (1991). ated to W. Noll (Springer-Verlag, Berlin, 1987), pp. 435–453. [68] K. R. W. Jones, Principles of quantum inference, Ann. of Phys. [89] P. Ehrenfest and V. Trkal, Deduction of the dissociation- 207, 140–170 (1991). equilibrium from the theory of quanta and a calculation of the [69] K. R. W. Jones, Fundamental limits upon the measurement of chemical constant based on this, Proc. Acad. Sci. Amsterdam state vectors,Phys.Rev.A50, 3682–3699 (1994). (Proc. of the Section of Sciences Koninklijke Nederlandse [70] S. L. Braunstein and C. M. Caves, Statistical distance and the Akademie van Wetenschappen) 23, 162–183 (1920), repr. in geometry of quantum states,Phys.Rev.Lett.72, 3439 (1994). P. Ehrenfest, Collected Scientific Papers (North-Holland Pub- [71] P. B. Slater, Reformulation for arbitrary mixed states of Jones’ lishing Company, Amsterdam, 1959), ed. by Martin J. Klein, Bayes estimation of pure states, Physica A 214, 584–604 pp. 414–435. (1995). [90] D. Dieks and V. van Dijk, Another look at the quantum mech- [72] R. D. Gill and S. Massar, State estimation for large en- anical entropy of mixing, Am. J. Phys. 56, 430–434 (1988). sembles,Phys.Rev.A61, 042312 (2000), arxiv eprint [91] A. Einstein, Beiträge zur Quantentheorie, Ver. Deut. Phys. quant-ph/9902063. Ges. 16, 820–828 (1914), transl. as Contributions to quantum [73] C. A. Truesdell, III and S. Bharatha, The Concepts and Lo- theory in A. Einstein, The Collected Papers of Albert Ein- gic of Classical Thermodynamics as a Theory of Heat En- stein: Vol. 6: The Berlin Years: Writings, 1914–1917. (Eng- gines: Rigorously Constructed upon the Foundation Laid by lish translation) (Princeton University Press, Princeton, 1997), S. Carnot and F. Reech (Springer-Verlag, New York, 1977). transl. by A. Engel and E. Schucking, pp. 20–26. [74] M. Planck, Treatise on Thermodynamics (Dover Publications, [92] J. Larmor, A dynamical theory of the electric and luminifer- New York, 1945), 3rd ed., transl. from the seventh German ous medium. — Part III. Relations with material media,Phil. edition by A. Ogg, first publ. 1897. Trans. R. Soc. Lond. A 190, 205–300, 493 (1897), repr. in [75] G. N. Lewis and M. Randall, Thermodynamics and the Free J. Larmor, Mathematical and Physical Papers. Vol. II (Cam- Energy of Chemical Substances (McGraw-Hill Book Com- bridge University Press, Cambridge, 1929). pany, New York, 1923). [93] P. G. L. Mana, A. Månsson, and G. Björk (2005), in prepara- [76] J. R. Partington, An Advanced Treatise on Physical Chem- tion. istry. Vol. I: Fundamental Principles. The Properties of Gases [94] I. M. Copi and C. Cohen, Introduction to Logic (MacMillan (Longmans, Green and Co, London, 1949). Publishing Company, New York, 1990), eighth ed., first publ. [77] H. B. Callen, Thermodynamics and an Introduction to Ther- 1953. mostatistics (John Wiley & Sons, New York, 1985), 2nd ed., [95] I. M. Copi, Symbolic Logic (Macmillan Publishing, New York, first publ. 1960. 1979), 5th ed., first publ. 1954. [78] H. A. Buchdahl, The Concepts of Classical Thermodynamics [96] A. G. Hamilton, Logic for Mathematicians (Cambridge Uni- (Cambridge University Press, London, 1966). versity Press, Cambridge, 1993), first publ. 1978. [79] J. W. Gibbs, On the equilibrium of heterogeneous substances, [97]R.Schack,T.A.Brun,andC.M.Caves,Quantum Trans. Connect. Acad. 3, 108–248, 343–524 (1876–1878), Bayes rule,Phys.Rev.A64, 014305 (2001), arxiv eprint repr. in The Collected Works of J. Willard Gibbs. Vol. 1: Ther- quant-ph/0008113. modynamics (Yale University Press, New Haven, 1948), first [98] C. M. Caves, C. A. Fuchs, and R. Schack, Unknown quantum publ. 1906, pp. 55–353. states: the quantum de Finetti representation, J. Math. Phys. [80] C. A. Truesdell, III and R. A. Toupin, The Classical Field The- 43, 4537–4559 (2002), arxiv eprint quant-ph/0104088. ories, in S. Flügge, ed., Handbuch der Physik: Band III/1: [99] C. A. Fuchs, R. Schack, and P. F. Scudo, De Finetti represent- Prinzipien der klassischen Mechanik und Feldtheorie [Encyc- ation theorem for quantum-process tomography,Phys.Rev.A lopedia of Physics: Vol. III/1: Principles of Classical Mech- 69, 062305 (2004), arxiv eprint quant-ph/0307198. anics and Field Theory] (Springer-Verlag, Berlin, 1960), pp. [100] B. de Finetti, Foresight: Its logical laws, its subjective sources, 226–858. transl. with additional notes in H. E. Kyburg, Jr. and H. E. [81] R. M. Bowen, Thermochemistry of reacting materials,J. Smokler, eds., Studies in Subjective Probability (John Wiley Chem. Phys. 49, 1625–1637 (1968). & Sons, New York, 1964), pp. 93–158. 18

[101] P. Pechukas, Reduced dynamics need not be completely posit- eprint quant-ph/0505123. ive, Phys. Rev. Lett. 73, 1060–1062 (1994), see also R. Alicki, [107] T. F. Jordan, Affine maps of density matrices,Phys.Rev.A71, Comment on “Reduced dynamics need not be completely pos- 034101 (2005), arxiv eprint quant-ph/0407203. itive”, Phys. Rev. Lett. 75, 3020 (1995), and P. Pechukas, [108] H. Grad, The many faces of entropy, Comm. Pure Appl. Math. Pechukas replies,Phys.Rev.Lett.75, 3021 (1995). 14, 323–354 (1961). [102] P. Štelmachovicˇ and V. Bužek, Dynamics of open quantum [109] A. S. Holevo, Probabilistic and Statistical Aspects of Quantum systems initially entangled with environment: Beyond the Theory (North-Holland, Amsterdam, 1982), first publ. in Rus- Kraus representation, Phys. Rev. A 64, 062106 (2001), arxiv sian 1980. eprint quant-ph/0108136, see also Erratum: Dynamics of [110] H. Grad, Statistical mechanics of dynamical systems with in- open quantum systems initially entangled with environment: tegrals other than energy, J. Phys. Chem. 56, 1039–1048 Beyond the Kraus representation, Phys. Rev. A 67, 029902 (1952). (2003). [111] H. Grad, Levels of description in statistical mechanics and [103] H. Hayashi, G. Kimura, and Y. Ota, Kraus representation in thermodynamics, in M. Bunge, ed., Delaware Seminar in the the presence of initial correlations,Phys.Rev.A67, 062109 Foundations of Physics (Springer-Verlag, Berlin, 1967), pp. (2003), arxiv eprint quant-ph/0303061. 49–76. [104] K. M. Fonseca Romero, P. Talkner, and P. Hänggi, Is the dy- [112] L. C. Woods, The Thermodynamics of Fluid Systems (Oxford namics of open quantum systems always linear?, Phys. Rev. A University Press, London, 1975). 69, 052109 (2004), arxiv eprint quant-ph/0311077. [113] E. T. Jaynes, Thermodynamics (1965 ca.), [105] T. F. Jordan, A. Shaji, and E. C. G. Sudarshan, Dynamics of http://bayes.wustl.edu/etj/node2.html. initially entangled open quantum systems (2004), arxiv eprint [114] A. R. Marlow, ed., Mathematical Foundations of Quantum quant-ph/0407083. Theory (Academic Press, London, 1978). [106] T. F. Jordan, A. Shaji, and E. C. G. Sudarshan, The Schrödinger picture for open quantum dynamics (2005), arxiv (Note: ‘arxiv eprints’ are located at http://arxiv.org/.) Paper F

arXiv:quant-ph/0607111v3 29 Apr 2007 ntepeetfis td eo we study first present the In Introduction 0 rmwr.Ti on fve,wihw hn sbsclyLpaesbtue an uses but Laplace’s basically is logical think general we and which simple view, a develop of we point which of This for re-interpretation a methods, framework. provides inverse that also of This objects kinds models. two some statistical probabilities’, with of connexion in ‘probabilities appear and parameters probability-like of, ∗ Email: Fo pasblte fplausibiliti of ‘plausibilities “From ugiaTkik ösoa,Iajrsaa 2 E144 tchl,Sweden Stockholm, 40 SE-164 22, Isafjordsgatan Högskolan, Tekniska Kungliga ASnmes 02.50.Cw,02.50.Tt,01.70. numbers: PACS S ubr:03B48,62F15,60A05 numbers: MSC that — properties logical in additional useful some are satisfy that are and hypotheses, probability physical a e.g. hypo- like assigning or knowledge, reasonings real of viz., inferential pieces — usual ‘circumstances’ thetical of the classes that equivalence about is which reasonings a Jaynes, idea, of The and parameters probability-like Laplace the properties. about to logical traced additional be prob- some a can assigning satisfies in a that useful ‘circumstance’, is and that knowledge of ability of notion piece any the for stands through which term reinterpreted are distributions, prior nqeyindexed uniquely rbblt-ieprmtr pern nsm ttsia oes n their and models, statistical some in appearing parameters Probability-like [email protected] napoc hog circumstances through approach an .G .PraMana, Porta L. G. P. Pasblte fplausibilities’: of ‘Plausibilities ytepoaiiydsrbtoste edto. lead they distributions probability the by ff ra lentv on fve n rare-interpretation a or on, view of point alternative an er Dtd 9Arl2007) April 29 (Dated: en atIof I part Being ∗ Abstract 1 .Mnsn .Björk G. Månsson, A. s osaeasgmn methods” state-assignment to es’ ttsia oe a ecnevdas conceived be can model statistical + w on’mtogether. ’em join ’em, join can’t you If Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach idea presented in nuce in some work by Jaynes, is alternative to both that based on the distinction between ‘physical’ and ‘subjective’ probabilities, and that based on de Finetti’s theorem. This point of view and the ensuing inverse-method framework have applications in physical theories, as will be shown in following studies [1, 2]. Our study and results are based exclusively on probability theory; we do not use entropy notions, for example. For us, ‘probability’ means simply plausibility,and the following conceptual proportion holds:1

Plausibility calculus : Everyday notion of ‘plausibility’ = Logical calculus : Everyday notion of ‘truth’. We thus take the licence to adopt the term plausibility henceforth2 — with no need to define what it means, any more than it is usually done with ‘truth’.3 (Our study can in any case be easily ‘translated’ into degrees-of-belief, credence, or similar terms.) The notation P(A| I) [5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23] (cf. also [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37]) will denote the plausibility of the statement A in the context described by the proposition I.4 We shall also say that I ‘leads to’ or ‘yields’ a given plausibility of A, but no particular meaning is intended with these two verbs. Associated plausibility densities will be denoted by x → pA(x| I); the term ‘distribution’ will be used for ‘density’ sometimes. Other symbols and notations are used in accordance with ISO [44] and ANSI/IEEE [45] standards. In another study [46] we analyse the question of assigning plausibilities to un- known ‘events’ (e.g., measurement outcomes) from knowledge of ‘similar events’; a

1Does also Wittgenstein mean something of the kind when he writes ‘Probability theory is only concerned with the state of expectation in the sense in which logic is with thinking’ [3, § 237][4, p. 231]? See also Johnson [5, pp. 2–3]. 2Also used by Kordig [6]. 3Is truth objective or subjective? Can truth be ‘operationally’ defined? Can the truth of a proposi- tion be tested? — ‘Of course! to test the truth of ‘This hat is brown’ I only need to look at the hat!’. Well, provided e.g. that you are not dreaming or having hallucinations; and how do you test that? Going backwards, in the end you arrive at some proposition which you simply assume — unconsciously, by convention, by agreement, by caprice — and cannot ‘test’. ‘Subjectivity’ lurks no less in logic than in plausibility theory — and is no less uninteresting in plausibility theory than it is in logic. 4The context could also be called ‘condition’ or ‘situation’; Johnson calls it ‘supposal’ [5] ; Jeffreys simply ‘data’ [17]. In the notation above, there is a relation (in the sense in which there is a relation between ‘certain’ and ‘true’) between the expressions ‘P(A| I) = 1’ and ‘I |= A’ (especially when this is used as in situational logic; see e.g. [38, 39, 40, 41]). The latter could also be suggestively written ‘T(A| I) = 1’. The differences between the formalisms of logic and plausibility theory lie essentially in the fact that our everyday use of truth can be effectively (but not exclusively) modelled through a dichotomic set like {0, 1} (or {, ⊥} or {T, F}), whereas our everyday use of plausibility can be effectively modelled through an ordered continuum like [0, 1] (or [0, +∞]or[−∞, +∞], see e.g. Tribus [42, pp. 26–29], Jaynes [20, ch. 2], Cox [12]). This has important implications, like the fact that a purely syntactic approach to plausibility, in the guise of the logical calculus, is near to worthless [43]. The parallel between plausibility theory and truth-functional logic suggests also another point. We do not require of logic, when put to practical use, that it should also provide us with the initial truth-value assignments (the ‘assumptions’). Why should we have an analogous requirement on plausibility theory with regard to initial plausibility-value assignments instead?

2 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach problem which is connected to induction. The key point is the formalisation, within probability theory, of the notion of ‘similar event’. This we do through the framework and the interpretation presented in the first note. We do not use the idea of exchange- ability — and infinite exchangeability in particular — which is used in Bayesian theory for the same purpose; but there are known strong connexions and analogies with its mathematics and some of its results. In fact, we try to persuade the reader that our approach touches the core idea from which exchangeability also springs. In a third note [1, 2] it will be shown that the inferential point of view presented here and in [46] finds applications in physical theories, like classical and quantum mechanics, an example being state reconstruction [47, 48]. The framework pre- sented subsumes and re-interprets known techniques of quantum-state assignment (or ‘retrodiction’ or ‘reconstruction’) and tomography, and offers alternative approaches to analogous techniques in classical mechanics.

1 Statistical models and ‘probabilities of probabilities’

A statistical model is, roughly speaking, a plausibility distribution whose numerical values depend on parameters (for a critical discussion of more rigorous or useful definitions see [49, 50]). An example is the ubiquitous normal distribution   1 (x − μ)2 x → N(x| μ, 1/σ2)  √ exp − 2π σ 2σ2 whose parameters are the expectation μ and the variance σ. Another example, one in which we shall be especially interested in this paper, is the ‘generalised Bernoulli’ model i → Br(i| q), which gives the plausibility distribution for a set of m mutually exclusive and exhaustive propositions {R1,...,Rm}, hereafter called outcomes,

p(Ri| q) = Br(i| q)  qi, (1) depending on a set of parameters q  (q1,...,qm) which belong to a simplex of appropriate dimensionality:  ∈ Δ  { |  , = }. q (xi) xi 0 i xi 1 (2)

(This model is apparently called ‘discrete model’. Since this name is too anonymous and the model reduces to the Bernoulli one for m = 2, we opted for ‘generalised Bernoulli’ instead.) The parameters of a statistical model are sometimes regarded as ‘unknown’, and a plausibility distribution (more precisely, a density) for them is therefore introduced. This distribution, usually called ‘prior distribution’ or simply ‘prior’, is used in cal- culations for a variety of purposes; two in particular interest us here and in the fol- lowing papers. (1) A parameter-free plausibility distribution for the outcomes can be obtained integrating the product of the prior and the statistical model in respect of the parameters, i.e., by marginalisation. In the case of the Bernoulli model, e.g.,

3 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach introducing a prior q → f (q)(f being of course a normalised positive generalised function) one obtains the parameter-free distribution  

p(Ri) = Br(i| q) f (q)dq ≡ qi f (q)dq. (3) Δ Δ (2) In so called ‘inverse methods’ or ‘problems’ (cf. Dale [51, §§ 1.2, 1.3]), the prior is used in the formula of Bayes’ theorem to obtain an ‘updated’ plausibility distribution for the parameters, conditional on knowledge of some outcome. The resulting distribution is called ‘posterior (distribution)’. In the case of the Bernoulli model, e.g., from the prior q → f (q) and knowledge of the outcome Ri one obtains the posterior p(R | q) f (q) q → f (q| R ) = i . (4) i |    Δ p(Ri q ) f (q )dq

Such practices are at least as old as Bayes [52, 53, 54]. Related and unrelated historical information can be found e.g. in Dale’s book [51] and some nice essays by Hacking [55, 56, 57]; see also Jaynes’ discussion [20, ch. 18]. Old, though ap- parently not as old as Bayes, is also the question: how to interpret statistical-model parameters like q and their prior distributions? The problem is that the parameters (qi) look like plausibilities, since their values are identical to the plausibilities of the outcomes {Ri} as eq. (1) shows, and that the prior f looks therefore like a ‘plausibility (distribution) of a plausibility’ — a redundant notion. This question, combined with the related issues on the interpretation of ‘probability’, has led to many philosophical debates; see e.g., amongst the vast literature on this, [58, 59]. The importance of the interpretative question is not merely philosophical, however. Different interpre- tations can lead to different conceptual and mathematical approaches — and thus to different solutions — in the investigation of concrete problems. This is particularly true for elaborate statistical models, like those connected to physical theories. Two main interpretations appear to be in vogue. Many statisticians, logicians, and physicists, on the one hand, speak about ‘subjective’ and ‘physical’ probabilities (or ‘propensities’ [60]). For them the notion of a ‘probability of a probability’ poses no problems, since it means something like ‘the subjective probability of a propen- sity’. The very idea of ‘estimating a probability’ implies such kinds of interpretation; cf. e.g. Good [61], especially the title and chapter 2, Jamison [62], or Tintner [63]. For pious Bayesian or ‘de Finettian’ devotees, on the other hand, which con- ceive probability as ‘degree of belief’, the notion of a ‘degree of belief in a degree of belief’ is redundant or even meaningless. The Bayesian are notoriously rescued from philosophical headaches by de Finetti’s celebrated theorem and other similar ones [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78], by which parameters like q and functions like q → f (q) are introduced as mere mathematical devices — i.e., not plausibilities or degrees of belief! — that need not be directly interpreted. See Bernardo and Smith [76, ch. 4] for a neat presentation of this point of view. In- terpretative issues like the Bayesian’s are also shared by those who thinks in terms of ‘logical probabilities’ [79] or, like we, simply in terms of ‘plausibilities’.

4 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

Here we present, discuss, and formalise still another interpretation — let us call it the ‘circumstance interpretation’ for definiteness’ sake, for reasons that will be apparent in the next section — in which functions like q → f (q) do represent plau- sibility distributions, i.e., they are not mere mathematical devices, but the notion of ‘plausibility of a plausibility’ is nevertheless completely avoided. This interpretation combines two ideas by which Jaynes tried to make sense of ‘plausibility-like’ param- eters: one is very briefly formulated in [80, p. 11], and can possibly be read also in de Finetti [81, § 20]; the other — the idea of an ‘Ap distribution’ — appears in the var- ious versions of his book on probability theory [18, lect. 18][19, lect. 5][20, ch. 18]. A similar interpretation is also proposed and discussed by Mosleh and Bier [59]. Caves also discusses, and criticises, a similar idea [78, 82]. It really seems to us, however, that this interpretation is basically what Laplace had in mind [53], if we read his ‘causes’ more generally as ‘circumstances’. Instead of trying to summarise this interpretation in abstract general terms that would very likely only appear obscure at this point, we prefer to invite the reader to proceed to the simple and concrete example of the next section, just a coin toss away. The example will allow us to introduce the basic idea, along with some terminology. Then another, more elaborate example (§ 3) follows, to further expand the main idea. This is then abstracted and generalised (§ 4). Some important remarks are scattered throughout this note.

2 Interpreting plausibility-like parameters as ‘indexed circumstances’: introductory example

2.1 Context and circumstances

A coin has been tossed, the outcome unknown to us. We want to assign plausibilities to the outcomes ‘head’, Rh, and ‘tail’, Rt. The old recipe says to compute “le rapport du nombre des cas favorables à celui de tous les cas possibles” [54]. This is seldom of much help: Which are the cases? at which depth should the situation be analysed? And what if these cases are not equally plausible? But why not analyse the situation in terms of some set of ‘cases’ anyway? Some set, not the set. And their plausibilities can be assigned by some other means. We do not want the ultimate analysis, just an analysis. In our case, suppose that the knowledge of the situation, which constitutes the context Ico, says that either Cecily or Gwendolen or Jack or Algernon tossed the coin. Let us call these the four possible circumstances of the coin toss and denote them by CC, CG, CJ, CA. The context could thus be analysed as the conjunction Ico = Jco ∧ (CC ∨ CG ∨ CJ ∨ CA), for some ‘sub-context’ Jco. Each circumstance says also something more about the respective person, which helps us in assigning the conditional plausibilities:5

5Cf. Laplace’s Problème II [53].

5 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

CC: Cecily is a magician and skilled coin-tosser that always like to produce the outcome ‘head’. If we knew that she had tossed the coin, we would assign the distribution of plausibility P(Rh| CC ∧ Ico), P(Rt| CC ∧ Ico) = (1, 0) (5) for the outcomes.

CG: Gwendolen, on the other hand, has no such particular skills, so if it were her who had tossed the coin we would assign the plausibility distribution | ∧ = 1 , 1 , P(Ri CG Ico) 2 2 (6) with i = h, t here and in the following.

CJ: On Jack we know nothing whatsoever. He could be skilled or unskilled in coin-tossing, a trickster or an absolutely earnest person. If we knew he had tossed the coin we could but assign the distribution | ∧ = 1 , 1 . P(Ri CJ Ico) 2 2 (7)

CA: Finally, we know that Algernon had been carrying a double-headed coin, which he would exchange with the original one if asked to toss it. So we assign the plausibilities P(Ri| CA ∧ Ico) = (1, 0) (8) in case he had made the coin toss. Remark 1. It is clear that not all the circumstances above express ‘causes’ [53] or ‘mechanisms’ [18, lects. 16, 17][19, lect. 5][20, ch. 18][78] which ‘determine’ the respective plausibility distributions. It could be appropriate to say this of the circum- stance concerning Algernon; but the circumstance concerning Jack, e.g., can hardly be called a ‘cause’ or ‘mechanism’: it is only out of sheer ignorance that we assign, conditionally upon it, the distribution (1/2, 1/2). Here and in the following, ‘circum- stance’ will generally mean simply what its name denotes: ‘a possibly unessential or secondary condition, detail, part, state of affair, factor, accompaniment, or attribute, in respect of time, place, manner, agent, etc., that accompanies, surrounds, or possi- bly determines, modifies, or influences a fact or event’ (cf. [83]).

2.2 Grouping the circumstances in a special way

The crucial step now is the following. Suppose that these four circumstances interest us not for their intrinsic details, but only in connexion with the plausibility distribu- tions they lead to for the coin toss in the context Ico. In this regard, the circumstance ‘Cecily tossed the coin’6 and the circumstance ‘Algernon tossed the coin’ are for us

6Our knowledge about Cecily must also be understood as implicit in this sentence; otherwise we should write ‘Cecily, who is a magician etc., tossed the coin’. This also holds for the sentences that follow.

6 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach equivalent, since both lead to the plausibility distribution (1, 0), as shown by eqs. (5) and (8). Similarly, ‘Gwendolen tossed the coin’ and ‘Jack tossed the coin’ are also equivalent, both leading to (1/2, 1/2); cf. eqs. (6) and (7). We should like to have a set of circumstances such that different circumstances led to different distributions. The first thing that comes to mind is to take the set {CC ∨ CA, CG ∨ CJ} of the dis- junctions of equivalent circumstances, i.e. CC ∨CA ≡ ‘Cecily or Algernon tossed the coin’ and CG ∨ CJ ≡ ‘Gwendolen or Jack tossed the coin’. We must see, however, whether this ‘coarse-grained’ set really fulfils our wishes. A simple theorem of plausibility theory comes to help. It says that, in a given context, the plausibility of a statement A conditional on a disjunction of mutually exclusive propositions {B j} is given by a convex sum of the plausibilities conditional on the single propositions, as follows [20, ch. 2]:  P(B j| I) P[A| ( B ) ∧ I] = P(A| B ∧ I) ({B } mutually exclusive), (9) j j j P(B | I) j j l l { } the weights being proportional to the plausibilities of the B j . Note that the value of the plausibility conditional on the disjunction, P[A| ( jB j) ∧ I], generally depends on the values of the plausibilities of the {B j}, {P(B j| I)}. Thus, the latter plausibilities must in general be specified if we want to find the first, and that varies as these vary. However, we see that this dependence disappears when the plausibilities conditional on each single B j, {P(A| B j ∧ I)}, have all the same value (the right-hand side be- comes a convex sum of identical points). In this case also the plausibility conditional on the disjunction, P[A| ( jB j) ∧ I], will have that same value, irrespective of the 7 plausibilities of the {B j}:  if P(A| B j ∧ I) = q for all j, then P[A| ( jB j) ∧ I] = q,

regardless of the values of the {P(B j| I)}. (10)

Clearly this is just the case when A is either of our outcomes {Ri} and the {B j} are either pair of equivalent circumstances. In fact, the protasis of the last formula is just our previous definition of equivalence amongst circumstances. Hence, the plausibility distribution for the results conditional on the disjunction CC ∨ CA is the same as those conditional on the two disjuncts separately, P[Ri| (CC ∨ CA) ∧ Ico] = P(Ri| CC ∧ Ico) = P(Ri| CA ∧ Ico) = (1, 0), (11) and analogously for CG ∨ CJ: | ∨ ∧ = | ∧ = | ∧ = 1 , 1 . P[Ri (CG CJ) Ico] P(Ri CG Ico) P(Ri CJ Ico) 2 2 (12) This is true whatever the plausibilities of our four initial circumstances might be (in fact, we have not yet specified them!).

7Cases of vanishing plausibilities can be treated as appropriate limits. One can adopt the consistent convention that the product of an undefined plausibility (such as those with a contradictory context) times a defined and vanishing one also vanishes.

7 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

The coarse-grained set {CC ∨ CA, CG ∨ CJ} has thus, by construction, the special feature we looked for: different circumstances lead to different plausibility distribu- tions for the outcomes. The circumstances can therefore be uniquely indexed by the respectively assigned plausibility distributions, and we denote them accordingly:

 ∨ ,  ∨ , S (1,0) CC CA S 1 , 1 CG CJ (13) 2 2 and call them plausibility-indexed circumstances. With this indexing system, and denoting q  (qh, qt), the conditional plausibilities of the outcomes can be written

P(Ri| S q ∧ Ico) = qi. (14)

The last expression is in many ways similar to that defining the generalised Bernoulli model (1). Indeed, one of our main points is the following: plausibility-like parameters used as arguments of plausibilities can always be interpreted to stand for appropriate plausibility-indexed circumstances. In view of eq. (14), someone could interpret the symbol ‘S q’ as ‘The plausibility distribution for the {Ri} is q’ (similarly to the symbol ‘Ap’ introduced by Jaynes [18, lect. 18][19, lect. 5][20, ch. 18]). But such an interpretation is obviously wrong. Let us make this point clear. The symbol ‘S (1,0)’, e.g., stands for ‘Cecily or Algernon tossed the coin’, as eq. (13) shows; and this proposition does not concern plausi- bilities at all. It is true that this proposition is the only one leading us to assign the distribution (1, 0); but it is so just because of a trick, viz. the fact that we have grouped and indexed the initial circumstances in a particular way. Borrowing some terminology from logic, we can say that the correspondence between the proposition ‘Cecily or Algernon tossed the coin’ and the distribution (1, 0) is only a trick within the metalanguage of our theory [84, 85, 86, 87, 88]. Remark 2. The use of statements like ‘The plausibility of A is p’ or ‘Data are drawn from a distribution f ’ is universal. Of course, they can be simply interpreted as ‘Look, the context and the circumstance are such that the plausibility of A (the data) is p ( f )’, and this can be enough for our purposes: we may not need to know all the details of the context and the circumstance. But note that those statements are more precisely metastatements, statements about plausibility assignments. As in logic, the use of such kind of statements as arguments of plausibility formulae is preferably avoided. First, because such statements usually make poor contexts. Compare the statements ‘Either Jack, who is a skilled coin tosser with a predilection for ‘head’, or Algernon, who has a two-headed coin, tossed the coin’ with ‘The plausibility distri- bution for ‘head’ and ‘tail’ is (1, 0)’: the former gives some clues as to the grounds on which the distribution (1, 0) is assigned, whereas the latter says only that that dis- tribution is assigned.8 Second, because such statements used inside plausibility for-

8It reminds of Bachelierus’ oft quoted answer: “Mihi a docto Doctore/ Domandatur causam et rationem, quare/ Opium facit dormire?/ À quoi respondeo,/ Quia est in eo/ Virtus dormitiva./ Cujus est natura/ Sensus assoupire” [89, troisième intermède].

8 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach mulae may give rise to self-references, circularity, and thus known paradoxes (‘This proposition is false’) and other inconsistencies [88, 90].9

2.3 Analysis by marginalisation

Let us now introduce the plausibilities of the original circumstances in the context Ico. For concreteness we can assume them to be equally plausible: 1 P(C | I ) = P(C | I ) = P(C | I ) = P(C | I ) = . (15) C co G co J co A co 4 From these values and the definitions (13) we have by the sum rule the plausibilities of the plausibility-indexed circumstances:

| = 1, | = 1. P[S (1,0) Ico] P[S 1 , 1 Ico] (16) 2 2 2 2 These plausibilities can be used to write the distribution for the outcomes on context Ico by marginalisation over the circumstances. We can do this both with the initial set {C j} and with the set of plausibility-indexed set {S q}. With the first we obtain | = | ∧ | = 3 , 1 . P(Ri Ico) P(Ri C j Ico) P(C j Ico) 4 4 (17) j With the second set we must of course obtain, consistently, the same result; but the decomposition has a more suggestive (and possibly misleading!) form: P(Ri| Ico) = P(Ri| S q ∧ Ico) P(S q| Ico), q (18) = | = 3 , 1 . q P(S q Ico) 4 4 q

The index q assumes the two values {(1, 0), (1/2, 1/2)}, but we can let it range over the whole simplex Δ defined in (2), introducing a density function q → pS (q| Ico)in the usual way (explained later in § 4). In this case it is given by 1 p (q| I)  δ(q − 1) + δ q − 1 δ(q + q − 1), (19) S 2 h h 2 h t

9We find an example in an article by Friedman and Shimony [91]. They introduce a proposition which says that the expectation of a certain quantity has a given value (their eq. (4)). But such a proposition is a metastatement, because expectation is defined in terms of plausibility assignments (in contrast to average, which is defined in terms of measured frequencies [92]). The authors, however, do not notice this and proceed to use that proposition inside plausibilities, obtaining peculiar conclusions. Gage and Hestenes [93] apparently show that these conclusions are not inconsistent, although they do not notice the mix-up of language and metalanguage either. Cyranski [94] has a partially clearer view of the matter. Cf. remark 6. A metastatement inside a plausibility is used, although tentatively, also by Jaynes (his ‘Ap’) [18, lect. 18][19, lect. 5][20, ch. 18]; but our analysis shows that his ideas can be realised without this artifice.

9 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach a weighted sum of Dirac deltas10 with support on q = (1, 0) and (1/2, 1/2). The marginalisation (18) thus takes the form  P(Ri| Ico) = q pS (q| Ico)dq, (20) Δ which is similar to the formula (3) for the generalised Bernoulli model.

2.4 Updating the plausibility of the circumstances

If the outcome of the toss is, say, ‘head’, what do the plausibilities of the circum- stances become? In other words, what are the circumstances’ plausibilities in the context Rh ∧ Ico? The answer is obviously given by Bayes’ theorem:

P(Rh| S q ∧ Ico)P(S q| Ico) P(S q| Rh ∧ Ico) =  , (21) q P(Rh| S q ∧ Ico)P(S q | Ico) or, in terms of the density pS , |∧ | ∧ = qh pS (q Ico) = pS (q Rh Ico)    q pS (q |∧I )dq Δ h co   2 1 δ(q − 1) + δ q − 1 δ(q + q − 1). (22) 3 h 3 h 2 h t The plausibility of S (1,0), i.e., that Cecily or Algernon tossed the coin, has thus in- creased a little. Remark 3. Note that knowledge of the outcome can help to increase the plausibility of one of the plausibility-indexed circumstances {S q} at the expense of the others’, but can never do so within a set of equivalent circumstances like {CC, CA} or {CG, CJ}.

The last formula is a very simple instance of the answer to an inverse problem. Our point is, again, that the marginalisation over a plausibility-like parameter and the updating of its distribution can be interpreted as the same operations for a set of plausibility-indexed circumstances. From this standpoint, and as should be clear from a previous discussion and remarks, the plausibility P(S q| Ico) (and its density pS (q| Ico)) is not the plausibility of a plausibility, but simply the plausibility of a circumstance, the latter being indexed in a particular way.

3 Second Example: multiple measurements, particular convex structures of circumstances, updating

3.1 Context

The following example differs from the first in the number of measurements and circumstances considered. Consequences: the space of parameters has particular

10In Egorov’s sense [95, 96, 97]; see also [98, 99, 100, 101] and cf. [102, 103, 104].

10 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach convex structures, and the plausibilities of the outcomes of one measurement can be ‘updated’ upon knowledge of the outcome of the other. We have a box with two buttons, marked ‘Letter’ and ‘Number’, and a display. Push the ‘Letter’ button, and either ‘a’ or ‘b’ appears on the display; push ‘Number’, and ‘1’ or ‘2’ appears. We can push each button only once, and only one at a time. Call, improperly, measurement the act of pushing a button and reading the display; call ‘outcome’ what is then read on the display. Denote the ‘Letter’ measurement L { L, L} N by M and its outcomes by Ra Rb ; the ‘Number’ measurement by M and its { N, N} outcomes by R1 R2 . Given only the above knowledge, we should assign a plausibility distribution (1/2, 1/2) to the outcomes of each measurement. But we know in fact something more about the construction of the box: inside, besides some sort of machinery, there is a chest containing an even number, 2N, of balls. Each ball is marked either ‘a1’, ‘a2’, ‘b1’, or ‘b2’. When a button is pushed, the machinery draws one of the balls from the chest and sends, depending on the button, either the letter or the number printed on the drawn ball to the display; and then puts the ball back into the chest.11 We have also a very important piece of knowledge as to how the 2N balls were originally chosen and put into the chest: this initially contained 4N balls, marked ‘a1’, ‘a2’, ‘b1’, and ‘b2’ in equal proportions (i.e., N balls marked ‘a1’, N ‘a2’, etc.). From these, 2N balls where taken away, so only 2N remained in the chest. This is all we know; denote it (together with everyday knowledge concerning balls, buttons, boxes, etc.) by IN . From IN , some points are immediately clear. First, not all the 2N balls in the chest can be marked ‘a1’, nor all ‘a2’, etc., since the chest initially contained only N of each type. Second, if all the 2N balls have the ‘a’ mark, then N of them must necessarily be of the ‘a1’ kind and the other N must be of the ‘a2’ kind. Similarly for the marks ‘b’ and, exchanging the rôle of letters and numbers, ‘1’ and ‘2’.

3.2 Introducing a set of circumstances

Let us analyse the context IN into a set of mutually exclusive and exhaustive possible circumstances. Different choices are possible. One is to consider the possible sets of balls left in (or equivalently, taken away from) the chest. The number of circum- stances thus defined is given by the number of ways of choosing 2N objects from a collection of 4N distinct ones without regard to order — the binomial coefficient 4N 2N . Note that it matters which of the ‘a1’-marked balls are chosen, and likewise for the others. Our knowledge is symmetric in respect of these circumstances, hence they are assigned equal plausibilities.12

11This renders the temporal order of the measurements (if both are performed) irrelevant. That is why we are not making temporal considerations. (Note also that we do not need to suppose that the urn is shaken after the replacement of the ball: this would add nothing to our state of knowledge, since we do not know how the machine makes the replacement anyhow.) 12That is, they are assigned equal plausibilities not because ‘the balls are initially chosen at random’ or something of the kind, but because we just do not know how they have been chosen. In fact, they

11 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

Another choice is to consider as a circumstance the numbers of balls marked ‘a1’, ‘a2’, etc. left in the chest instead. Note the difference with the previous choice: this is a sort of ‘coarse graining’ thereof. For this reason the newly defined circumstances are not equally plausible. We settle for this second choice and denote a generic circumstance by Cαa1,βa2,γb1,δb2, meaning ‘α ‘a1’-marked balls, ..., and δ ‘b2’- marked balls are left in the chest’. The coefficients α, β, etc. must obviously sum up to 2N and each can range from 0 to N.

3.3 Plausibility-indexing the circumstances; their particular set

As in the previous example, suppose that we are not interested in the details of the circumstances above, but only in the plausibilities they lead us to assign to the out- comes of the two measurements ML and MN. We can group the circumstances into plausibility-indexed equivalence classes, as before. In the present case the equiv- alence must take into account two plausibility distributions, one for each measure- ment. Here is an example for N = 2. The two different circumstances C(2a1,0a2,1b1,1b2) and C(1a1,1a2,2b1,0b2) lead both to the same plausibility distribution (1/2, 1/2) for the ‘Letter’ measurement, and to the same distribution (3/4, 1/4) for the ‘Number’ mea- surement (as is clear by simply counting their ‘a’s, ‘b’s, ‘1’s, and ‘2’s). Moreover, only these two circumstances lead to the plausibility distributions above, as the reader can prove. By theorem (10), also their disjunction C(2a1,0a2,1b1,1b2) ∨ C(1a1,1a2,2b1,0b2) leads to the same distributions and can thus be denoted by

S ((1/2,1/2), (3/4,1/4))  C(2a1,0a2,1b1,1b2) ∨ C(1a1,1a2,2b1,0b2). (23)

This is one of the plausibility-indexed circumstances. Its plausibility is the sum of its disjuncts’ plausibilities, P[S ((1/2,1/2), (3/4,1/4))| IN ] = P(C(2a1,0a2,1b1,1b2)| IN ) + P(C(1a1,1a2,2b1,0b2)| IN ). In general, for any N, we have plausibility-indexed circumstances denoted by L N S (qL,qN), the parameters q and q corresponding to the plausibility distributions for the ‘Letter’ and the ‘Number’ measurements. The indexing is such that

k| k ∧ ∧ = k = , P(Ri M S (qL,qN) IN ) qi for k L N and all appropriate i. (24)

We leave to the reader the pleasure of proving that there is a total of N2 + (N + 1)2 plausibility-indexed circumstances, i.e., of distinct values for the parameters (qL, qN). L N They can be represented by points on the plane qa q1 as illustrated in fig. 1 for the cases N = 1, N = 4, and N = 16 respectively. It is not difficult to see (especially looking at the figure for N = 16) that as N →∞their set ΓN becomes dense in the convex set Γ∞ defined by

L N L N Γ∞  {(q , q ) |q ∞ + q ∞  1}, (25) can have been chosen according to a particular scheme; the point is that we do not know such scheme.

12 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

q1 q1 N1 N4 1 1

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

qa qa 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1

q1 N16 1

0.8

0.6

0.4

0.2

qa 0.2 0.4 0.6 0.8 1

L N = Figure 1: Possible values of the plausibilities qa and q1 for N 1, 4, and 16. They are in bijective correspondence with the plausibility-indexed circumstances. The grey region is the set of all possible pairs of plausibility distributions for two generic measurements.

where q∞ is the supremum norm q∞  maxi{qi}. Thus in the limit N →∞ we may effectively work with a continuum of plausibility-indexed circumstances in bijection with the points of this set. Denote this ‘limit context’ by I∞. We observe two interesting facts. The first is that Γ∞ is a proper subset of the set of all possible pairs of plausibility distributions for two generic measurements (the grey square region in the figure); the latter is the Cartesian product of two one- L N dimensional simplices, Δ1 × Δ1. We could have let the parameters (q , q ) range over the latter set; in this case, however, the contexts IN and I∞ would have led us to a vanishing plausibility density for those parameter values not belonging to Γ∞.

13 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

The second interesting fact is that neither Γ∞ nor the larger set Δ1 × Δ1 are simplices. It is so because we are considering two measurements (had we considered a single measurement with four outcomes, we would have dealt with a three-dimensional simplex instead).13 See also remark 7.

3.4 Analysis by marginalisation

We can write the plausibility distributions for the measurements as marginalisations { } over the plausibility-indexed circumstances S (qL,qN) , using the latter’s plausibilities { | } L, N  { }≡{ } P[S (qL,qN) IN ] . Denote for brevity (q q ) q¯, hence S (qL,qN) S q¯ .Then k| k ∧ = k| k ∧ ∧ | , P(Ri M IN ) P(Ri M S q¯ IN) P(S q¯ IN ) ∈Γ q¯N k = q P(S q¯ | IN ), q¯∈ΓN k = L, N (26) Also this formula, like (3) and (18), looks like a weighted sum of plausibilities, and P(S q¯ | IN ) looks like the plausibility of two plausibility distributions. But this is not the case, just as it was not in the example of the coin: the propositions {S q¯ } speak not about plausibilities but about possible preparations of the box and its contents; yet they are suitably indexed according to the plausibilities they lead us to assign to the measurements’ outcomes. The sums above can also be replaced by integrals over the set Γ∞,  k| k ∧ = k | , P(Ri M IN) q pS (¯q IN )d¯q (27) Γ∞ where the densityq ¯ → pS (¯q| I∞) is introduced just like in the example of the coin.

3.5 ‘Updating’ the plausibilities of the circumstances and the outcomes

Suppose that the ‘Letter’ button has been pushed and the outcome ‘a’ has appeared on the display. This knowledge places us in a new context, expressed by the proposition L ∧ L ∧ M Ra I. We ask: (1) What plausibilities | L ∧ L ∧ P(S q¯ Ra M IN) (28) { } do we assign to the plausibility-indexed circumstances S (qL,qN) in the new context? Furthermore, we still have the possibility of pushing the ‘Number’ button once. So we also ask: (2) What plausibilities N| N ∧ L ∧ L ∧ P(Ri M Ra M IN ) (29) 13You might ask: “Couldn’t we consider a single measurement with the four outcomes ‘a1’, ‘a2’, ‘b1’, ‘b2’ instead?”. The answer is: yes, we could have introduced a single fictive measurement with ML and MN arising as marginals. But what for? After all, the rules of this game do not make allowance for such a measurement.

14 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

{ N, N} do we now assign to the outcomes R1 R2 in case we push the ‘Number’ button? Let us answer the first question. We use the assumption (valid in the context IN ) that knowledge of the performance of any of the two measurements (but not of their outcomes!) is irrelevant for assigning plausibilities to the circumstances:

k L N P(S q¯ | M ∧ IN ) = P(S q¯ | M ∧ M ∧ IN) = P(S q¯ | IN )forall¯q, k. (30)

With this assumption and eq. (24), Bayes’ theorem yields a simplified form for the sought plausibilities (28):

qL P(S | I ) | L ∧ L ∧ =  a q¯ N . P(S q¯ Ra M IN)  (31)  L  | q¯ q a P(S q¯ IN ) This is also the answer to an inverse problem. Note, again, that it expresses the updated plausibility distribution, not ‘of the parameterq ¯’, but of propositions like ‘There are α ‘a1’-markedballs,...,andδ ‘b2’-markedballsleftinthechest;or...; or α ‘a1’-markedballs,...,andδ ‘b2’-marked balls left in the chest’. To answer the second question we use, beside assumption (30), the following fact, which holds in our context IN : If we want to determine the plausibility dis- tribution for one of the measurements, and we know which particular circumstance holds, then for us it is irrelevant to know whether the other measurement has been performed, or which outcome it has yielded. For example, if we are interested in the plausibilities of the outcomes of the ‘Number’ measurement, and we know that a particular circumstance S q¯ holds (e.g., that in the chest there are two ‘a1’-marked balls, one ‘a2’-marked ball, etc.; or one ‘a1’-marked ball, one ‘a2’-marked ball, etc.), then knowledge of the outcome of the mere performance of ‘Letter’ measurement is irrelevant. In formulae,

k| l ∧ l ∧ k ∧ ∧ = k| l ∧ k ∧ ∧ = P[Ri (R j M ) M S q¯ IN ] P(Ri M M S q¯ IN ) k| k ∧ ∧  . P(Ri M S q¯ IN)forallk, l k, and appropriate i, j (32) Analysing eq. (29) in terms of circumstances and using eqs. (31) and (32) we find  qN qL P(S | I ) N| N ∧ L ∧ L ∧ = q¯ i a q¯ N . P(Ri M Ra M IN )  (33)  L  | q¯ q a P(S q¯ IN ) In regard to the assumptions summarised in eqs. (30) and (32), cf. remark 9.

4 Generalisation and summary of principal formulae

The two examples should suffice to give an idea of the interpretation ofq ¯-like pa- rameters and of their plausibilities, and of the principal consequences of this inter- pretation. The reader could try to make similar analyses for the toy models by Kirk- patrick [105, 106, 107], Spekkens [108], or us [109, 110]. We shall now present the idea in general and abstract terms. Some additional remarks will also be given.

15 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

4.1 Experiments, outcomes, circumstances

In the general case we have a context I and a set of m measurements, represented by propositions Mk, k = 1,...,m. Each measurement Mk has mutually exclusive and { k} exhaustive outcomes represented by a set of propositions Ri . The number of out- comes can vary from measurement to measurement, so that i ranges over appropriate sets for different k. The index k is omitted when no confusion arises. Remark 4. The use of the terms ‘measurement’ and ‘outcome’ is only dictated by concreteness. The formalism and the discussion presented apply in fact to more gen- eral concepts. What we call ‘measurement’ could be only a casual observation, or simply a ‘state of affairs’ which can present itself in mutually exclusive and exhaus- tive ‘forms’ (the ‘outcomes’). The term ‘measurement’ shall hence be divested here of those connotations implying active planning and control, which are not relevant to our study. Moreover, a ‘measurement’ needs not be associated with a point or short interval in time or space. It can e.g. be a collection of observations; in this case its ‘outcomes’ are all possible combinations of results from these observations. Finally, note that the m measurements are generally different, i.e., they are not neces- sarily ‘repetitions’ of the ‘same’ measurement — a case that will be discussed in the second paper instead.

A set of circumstances {C j} is introduced; these represent a sort of more detailed, possible descriptions of the context I, and are mutually exclusive and exhaustive, i.e. we know that one and only one of them holds:

   P(C  ∧ C  | I) = 0forallj and j  j , (34) j j  | = . P( jC j I) 1 (35)

The plausibilities of the measurements’ outcomes conditional on the circum- stances, k P(Ri| M ∧ C j ∧ I)forallj, k, and appropriate i, (36) are assumed to be given.

Remark 5. The notion of ‘circumstance’, represented by propositions C j and later also S q, has been further explained in remark 1. An example of circumstance from § 2 is ‘Gwendolen tossed the coin’; other examples are ‘The temperature during the experiment was 25 °C’ and the more elaborated ‘We studied the density of monodis- perse spherical particles in a tall cylindrical tube as a series of external excitations, consisting of discrete, vertical shakes or ‘taps,’ were applied to the container’ [111]. As in the case of ‘measurement’, a circumstance needs not be related to a single point or short interval in space or time. For example, in assigning the plausibility that it will rain or has rained in a given place at a given time, a circumstance might consists in a specific history of worldwide meteorological conditions under the preceding two years. For reasons discussed in remark 2, we require that a circumstance be described or specified in concrete terms, and metastatements like ‘The samples are drawn from a distribution f ’ or ‘The plausibility of head is 1/3’ are excluded. Finally, the choice

16 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach of an appropriate set of circumstances, i.e., of the appropriate way and depth to anal- yse a particular problem (the context), can only be decided on an individual basis, of course.

4.2 Plausibility-indexing the circumstances

The circumstances are then grouped into equivalence classes. Two circumstances are equivalent if they lead to the same plausibility distributions for each measurement Mk: ⎧ ⎪ k k ⎨ P(Ri| M ∧ C j ∧ I) = P(Ri| M ∧ C j ∧ I) C j ∼ C j ⇐⇒ ⎪ (37) ⎩ for all k and appropriate i. By construction the equivalence classes are in injective correspondence with the possible numerical values of the plausibility distributions for the measurements, (q1, q2,...). Denote a generic such value byq ¯  (qk), its equivalence class byq ¯ ∼, ∼ ∼ and membership by C j ∈ q¯ or simply j ∈ q¯ . We take all disjunctions of equivalent circumstances S q¯  C j, (38) j∈q¯ ∼ and call these (in lack of a better name) plausibility-indexed circumstances, short- ened to ‘circumstances’ whenever no confusion is possible. Conditional on such a circumstance S q¯ , the plausibilities of the outcomes have numerical values identical to its indices: k| k ∧ ∧ = k, P(Ri M S q¯ I) qi (39) a formula that reminds of a generalised Bernoulli model (cf. eq. (1)). Our main belief, already stated in the coin example, is that plausibility-like pa- rameters used as arguments of plausibilities can always be interpreted to stand for some appropriate plausibility-indexed circumstances.14 The passage to plausibility-grouped circumstances can have two main motiva- tions. (1) We can be interested in the plausibilities the circumstances lead to, rather than in the latter’s intrinsic details. (2) We may want a set of circumstances with the property that knowledge of outcomes can increase the plausibility of only one circumstance. This is true for the set {S q¯ }, but not for the set {C j} in general. In fact, knowledge of new outcomes can never lead to a alteration of the ratios of the plausibilities of two or more equivalent circumstances. Cf. remark 3 and see [1, 2] and [46, § 5.3].

14What constitutes a circumstance is largely a matter of situation, purpose, and personal good taste as well. The formalism presented cannot think up the circumstances for us. In § 2 we spoke e.g. about different persons’ skills in coin-tossing; but other people could speak about different values of the coin’s ‘propensity’ to come up heads. Perhaps the reason why ‘de Finettians’ have always felt uneasy about plausibility-like parameters and their priors was that these mathematical objects leave room to ideas and concepts that are unnecessary or not in good taste (cf. Jaynes [20], ch. 3, ‘Logic versus propensity’). To keep off these ideas they partially denied priors their meaning as plausibilities (this has led, fortunately, to some very beautiful ideas and theorems [64]). We hope to have shown here and in the next paper that there is no need to adopt such extreme measures.

17 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

Remark 6. Suppose that to each outcome Ri of some measurement Mk is associated avalueri of some physical quantity, so that it makes sense to speak of the expected 15 value of this quantity in a generic context Mk ∧ J: r| Mk ∧ J  ri P(Ri| Mk ∧ J). (40a) i In our case, the formation of equivalence classes of circumstances can then be made with respect to expected values instead of plausibilities, i.e.,

k k k k C j ∼ C j ⇐⇒ r | M ∧ C j ∧ I = r | M ∧ C j ∧ I for all k. (40b) { } In this way we obtain a set of expectation-indexed circumstances S r . Note that two different circumstances in such a set (leading hence to different expectations) may lead to the same probability distributions for the outcomes; therefore this set is not to be confused with, and has not the same applications of, our {S q¯ }. Particularly interesting is the space Γ of the parametersq ¯. Since these correspond Γ to numerical values of plausibility distributions for the measurements, is in general Δ (k) a (possibly proper) subset of a Cartesian product of simplices k ,thesimplex Δ (k) corresponding to the plausibility distribution for the kth measurement. Remark 7. The features of the subset Γ will depend on the nature of the circum- stances {C j} (and thus of the {S q¯ }). In some cases it is simply postulated that some kinds of circumstances do not present themselves, and this will delimit the subset accordingly. We saw an instance of this in the box example of § 3, in which the set Γ was, for each N, a special proper subset ΓN of the Cartesian product of two two- dimensional simplices (the grey square region in the figures). There are examples of physical theories where we postulate (by induction from numerous observations) that the set of ‘circumstances in which a system is prepared’ — often called states — is somehow restricted. This also restricts the space of the mathematical objects representing these states to particular, non-simplicial (convex) sets. The most notable example is quantum theory, in which the set of statistical operators — the mathemat- ical objects representing the states — has very strange shapes [112, 113, 114, 115].16 The set of Gibbs distributions in classical statistical mechanics provides another ex- ample. Remark 8. The plausibility-indexed circumstances need not be parametrised by the values of the plausibility distributions (qk) ≡ q¯. Other parametrisations can be used as long as they are in bijective correspondence with theq ¯ one, and some may be more useful (cf. [116]). Usually, what is relevant is the convex structure of the set of parameters Γ , a point on which we shall return in the third paper.

15Which should not be confused with the average [92], defined in terms of observed frequencies. Cf. footnote 9. 16That is, if we represent this set so as to preserve its convex properties, which are the relevant ones (see the third note of this series).

18 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

4.3 Priors and analysis by marginalisation If the initial circumstances {C j} have the plausibility distribution P(C j| I) ,bythe sum rule the plausibility-indexed circumstances have distributions P(S q¯ | I) = P(C j| I) (41) j∈q¯ ∼ (see also remark 11). In terms of the plausibility-indexed circumstances, the plausibility distribution k for each measurement outcome Ri can be expressed in marginal form as k| k ∧ = k| k ∧ ∧ | , P(Ri M I) P(Ri M S q¯ I)P(S q¯ I) q¯∈Γ ≡ k | , qi P(S q¯ I) (42) q¯∈Γ ≡ k | , qi pS (¯q I)d¯q Γ

(cf. eq. (3)) whereq ¯ → pS (¯q| I) is an appropriate generalised function [98, 117, 118, 119] (see also [120, 121]). The sudden appearance of an integral can be justified (as customary) as follows:q ¯ becomes a continuous parameter whose range is some set (k) Γ such that conv Γ ⊆ Γ ⊆ k Δ (where conv Γ is the convex hull of Γ ), and we → | ω ⊆ Γ introduce a density functionq ¯ pS (¯q I) such that, for each (from a suitable σ | =   | 17 -field of subsets), ∫ω pS (¯q I)d¯q q¯ ∈ω∩Γ P(S q¯ I). Note that to obtain the marginal form above it is assumed that knowledge of the measurement performed (but not of its outcome!) is irrelevant for assigning the plausibilities to the circumstances (cf. eq. (30)):

k1 kn P(S q¯ | M ∧···∧M ∧ I) = P(S q¯ | I)forall¯q, n = 1,...m,and{kt}. (43)

4.4 Updating the plausibilities of circumstances and outcomes

k Upon knowledge of the outcomes {Rk1 ,...,R n } of any subset {Mk1 ,...,Mkn }, n  i1 in m, of measurements, the {kt} being all mutually different, the plausibilities of the circumstances are updated, with the assumption (43), according to

k1 kn q ···q P(S q¯ | I) k1 k1 kn kn i1 in P[S q¯ | (R ∧ M ) ∧···∧(R ∧ M ) ∧ I] =  , (44) i1 in k1 kn ∈Γ q ···q P(S q¯  | I) q¯ i1 in

17 No one forbids us to introduce additional impossible fictitious circumstances {C j } (which may involve, e.g., ‘centaurs, nectar, ambrosia, fairies’ [122]) constructed so as to ‘complete’ the set of ≡ k ∈ Γ plausibility-indexed circumstances, i.e., in such a way that for eachq ¯ (qi ) (note the bar!) there is {  } | k ∧ ∧ = k always an S q¯ — possibly defined in terms of the fictitious C j —forwhichP(Ri M S q¯ I) qi . This operation — which is, mark, not necessary — has no importance nor mathematical consequences because the fictitious circumstances are impossible, i.e., their plausibilities in the context I are naught, and thus terms containing them give no contribution in formulae like (42) or (47).

19 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

or, in terms of the density pS ,

k1 kn q ···q pS (¯q| I) k1 k1 kn kn i1 in pS [¯q| (R ∧ M ) ∧···∧(R ∧ M ) ∧ I] = . (45) i1 in k1 kn   q ···q pS (¯q | I)d¯q Γ i1 in These formulae are valid for n  2 only if we assume that, when a circumstance is known and we want to assign a plausibility distribution for a measurement, knowl- edge of performance of other measurements or of their outcomes is irrelevant (this is what Caves calls, in a slightly different context (see the second paper in this series), ‘learning through the parameter’ [78]):

k| ∧ k ∧ ∧ = k| k ∧ ∧ P(Ri E M S q¯ I) P(Ri M S q¯ I) for allq ¯,whereE is any conjunction of any number of mutually (46) different {Mkt } and any number of {Rkt } (each k  k). it t Under the assumptions (43) and (46), we also obtain, by marginalisation over { } k the S q¯ , the plausibility of an outcome Ri given knowledge of outcomes of other measurements {Mkt } different from Mk: k k1 ··· kn | Γ qi qi qi pS (¯q I)d¯q P[Rk| (Rk1 ∧ Mk1 ) ∧···∧(Rkn ∧ Mkn ) ∧ Mk ∧ I] = 1 n . (47) i i1 in k1 kn   q ···q pS (¯q | I)d¯q Γ i1 in Remark 9. We should always be careful in assuming and using the conditions sum- marised in eqs. (43) and (46), because they in many cases do not hold. An example would be provided by the example of the coin toss if we considered other tosses made by the same, unknown, person. In the circumstance in which Jack tosses the coin, eq. (46) would not hold because from the results of other tosses we would learn more about Jack’s skills in coin-tossing. In fact, even eq. (7) could cease to be valid for other tosses, and our set of circumstances would no longer be appropri- ate. We discuss similar matters in more detail in the second part of this study. In general, also the relations amongst the times or places at which measurements are performed can be relevant and thus require a careful analysis. Cf. the examples in refs. [105, 106, 107, 108, 109].

4.5 Further remarks

Remark 10. A very important point is that the analysis of the context in terms of ff {  } {  } {  } circumstances is far from unique (cf. footnote 14). Di erent sets C j , C j , C j , etc. of circumstances can be introduced to analyse the context, and from them cor- {  | ∈ Γ } {  | ∈ Γ } responding sets of plausibility-indexed circumstances S q¯ q¯ , S q¯ q¯ , {  | ∈ Γ } S q¯ q¯ , etc. can be constructed in the standard way. The circumstances of each set have to be mutually exclusive and exhaustive for the present formalism to hold, but they need not be exclusive with those of the other sets. For example, in

20 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

the case of the coin toss (§ 2) we could analyse the context Ico into another set of {  | ∈ − , } circumstances, say Cr r ] 1 1[ with   Cr ‘The mass-centre of the coin lies on the coin’s (oriented) axis a (48) fraction r/2 of the total width away from the coin centre’. The analysis and the construction of the plausibility-indexed circumstances would proceed exactly in the same way, apart from possibly different values of their plausi- bilities.18 ff {  } {  } Di erent sets C j , C j , . . . can also be combined into a single set with circum-   {    ∧ ∧···} stances C j j ... C j C j . These will be mutually exclusive and exhaustive by construction. Again, the corresponding plausibility-indexed set {S q¯ } will ensue in the usual way. Remark 11. In view of the preceding remark it is clear that we can find a meaning for a plausibility-parameter likeq ¯ in terms of a set of circumstances, but not the meaning, because that set is not unique. This also implies that different choices of priors for q¯ need not be contradictory, because they can arise as the plausibilities for different sets of circumstances. There are, however, some compatibility conditions that the plausibility distributions for two or more sets of plausibility-indexed circumstances must satisfy (here stated in terms of densities):  

k1 k1 q pS  (¯q| I)d¯q = q pS  (¯q| I)d¯q, i1 i1    Γ Γ

k1 k2 k1 k2 q q pS  (¯q| I)d¯q = q q pS  (¯q| I)d¯q, i1 i2 i1 i2   Γ Γ ...   1 2 m 1 2 m q q ···q p  (¯q| I)d¯q = q q ···q p  (¯q| I)d¯q, i1 i2 im S i1 i2 im S   Γ Γ

for all mutually different kt and appropriate it (49)

(i.e., some of their moments must be equal), where m is the number of measure- k ments. These conditions arise simply analysing the plausibilities P(Ri| M ∧ I)and (47) first by means of one set of circumstances, then by means of the other, equat- ing the expressions thus obtained, and applying property (39) (under the assump- tions (43) and (46)). Remark 12. The formalism lends naturally itself also to iteration. One can introduce ‘circumstances of circumstances’, etc., i.e. deeper and deeper levels of analysis for the context I. What mathematically comes about looks like a hierarchy of ‘plausi- bilities of plausibilities’, ‘plausibilities of plausibilities of plausibilities’, etc., which

18Note that the position of the mass-centre of a coin is not a very important factor in the assignment of a plausibility to heads or tails. See Jaynes’ discussion [20, ch. 10].

21 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

Good calls ‘probabilities of Type I, II, III’, etc. [61]. Of course, such a cornucopia of recursive analyses may be appropriate and useful in some cases, while in others may just lead to constipation. Remark 13. The interpretation here presented may also provide another point of view on theories of interval-valued probabilities (see e.g. [123, 124][75, esp. § 3.1] and cf. [61, § 2.2][125]), an in this sense completes or re-interprets studies by e.g. Jami- son [62], Levi [125, 126], Fishburn [127], Nau [128].

5 Conclusions

We often have the need to use statistical models with plausibility-like parameters, especially in classical and quantum mechanics, and must face the problems of choos- ing an suitable parameter space and a plausibility distribution on this space. These problems would sometimes be less difficult if the parameters could be given some interpretation. Some interpret the parameters as ‘propensities’ or ‘physical probabilities’. But these concepts do not make sense to us. De Finettians say that we should not interpret the parameters, but think in terms of infinitely exchangeable sequences instead; the parameters and their priors then arise as mathematical devices. But we do not like being forced to think in terms of infinite sequences, whose vast majority (∞) of elements must then necessarily be fictitious. And there are situations that can be repeated a finite number of times only. In addition to this, looking at concrete applications of statistical models it seems that behind the parameters we often have ‘at the back of our minds’ an idea of some possible hypotheses — ‘circumstances’ — that could hold in the context under study, e.g. a physical measurement. These circumstances could help us in the assignment of plausibilities. And they need not concern ‘causes’ or ‘propensities’; see remarks 1 and 5. At the same time, we are sometimes not interested in the intrinsic details of such circumstances, but only in the plausibilities that we eventually assign on their grounds. We have seen in this study that plausibility theory allows us, starting from any set {C j} of circumstances, to form another, ‘coarse-grained’ set {S q} with the property that its circumstances lead each one to a different plausibility distribution. The cir- cumstances of this set can then be uniquely indexed by the plausibility distributions they lead us to assign. This set, moreover, is invariant with respect to changes in the plausibilities of the initial and the coarse-grained sets of circumstances, {P(C j| I)} and {P(S q¯ | I)}. This suggests that plausibility-like parameters like q, when used as arguments of plausibility formulae, can always be interpreted to stand for some appropriately in- dexed circumstances like S q. With mathematical care, this may even hold for param- eters of continuous statistical models. Parameter priors like f (q| I) can consequently be interpreted as plausibilities of circumstances P(S q| I).

22 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

The study of how these priors are updated when repetitions of ‘similar’ measure- ments occur, and of particular applications to classical and quantum mechanics, are developed in the next two papers.

Acknowledgements

PM thanks Louise for continuous and invaluable encouragement, and the staff of the KTH Biblioteket for their ever prompt support. AM thanks Anders Karlsson for encouragement.

Bibliography

Note: arxiv eprints are located at http://arxiv.org/. [1] P. G. L. Porta Mana, Ph.D. thesis, Kungliga Tekniska Högskolan, Stockholm (2007), http://web.it.kth.se/~mana/. [2] P. G. L. Porta Mana (2007), in preparation. [3] L. Wittgenstein, Philosophical Remarks (Basil Blackwell, Oxford, 1929/1998), ed. by Rush Rhees, transl. by Raymond Hargreaves and Roger White; written ca. 1929–30, first publ. in German 1964. [4] L. Wittgenstein, Philosophical Grammar (Basil Blackwell, Oxford, 1933/1993), ed. by Rush Rhees, transl. by Anthony Kenny; written ca. 1933–34, first publ. 1974. [5] W. E. Johnson, Probability: The relations of proposal to supposal,Mind41(161), 1–16 (1932), with an introductory note by R. B. Braithwaite. [6] C. R. Kordig, Discovery and justification, Philos. Sci. 45(1), 110–117 (1978). [7] W. E. Johnson, Probability: Axioms,Mind41(163), 281–296 (1932). [8] W. E. Johnson, Probability: The deductive and inductive problems,Mind41(164), 409–423 (1932), with some notes and an appendix by R. B. Braithwaite. [9] B. O. Koopman, The bases of probability,Bull.Am.Math.Soc.46, 763–774 (1940), repr. in [129, pp. 159–172]. [10] B. O. Koopman, The axioms and algebra of intuitive probability, Ann. Math. 41(2), 269–292 (1940). [11] B. O. Koopman, Intuitive probabilities and sequences, Ann. Math. 42(1), 169–187 (1941). [12] R. T. Cox, Probability, frequency, and reasonable expectation, Am. J. Phys. 14(1), 1–13 (1946). [13] R. T. Cox, The Algebra of Probable Inference (The Johns Hopkins Press, Baltimore, 1961). [14] E. W. Adams, The Logic of Conditionals: An Application of Probability to Deductive Logic (D. Reidel Publishing Company, Dordrecht, 1975). [15] R. T. Cox, Of inference and inquiry, an essay in inductive logic, in Levine and Tribus [130] (1979), pp. 119–167. [16] H. Jeffreys, Scientific Inference (Cambridge University Press, Cambridge, 1931/1957), 2nd ed., first publ. 1931. [17] H. Jeffreys, Theory of Probability (Oxford University Press, London, 1939/1998), 3rd ed., first publ. 1939.

23 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

[18] E. T. Jaynes, Probability Theory: With Applications in Science and Engineering: A Se- ries of Informal Lectures (1954–1974), http://bayes.wustl.edu/etj/science. pdf.html; lecture notes written 1954–1974; earlier version of [20]. [19] E. T. Jaynes, Probability Theory in Science and Engineering (Socony-Mobil Oil Com- pany, Dallas, 1959), http://bayes.wustl.edu/etj/node1.html;seealso[18]. [20] E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge University Press, Cambridge, 1994/2003), ed. by G. Larry Bretthorst; http://omega.albany.edu: 8008/JaynesBook.html, http://omega.albany.edu:8008/JaynesBookPdf. html. First publ. 1994; earlier versions in [18, 19]. [21] T. Hailperin, Sentential Probability Logic: Origins, Development, Current Status, and Technical Applications (Associated University Presses, London, 1996). [22] E. W. Adams, A Primer of Probability Logic (CSLI Publications, Stanford, 1998). [23] P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica Support (Cambridge University Press, Cambridge, 2005). [24] R. Carnap, Logical Foundations of Probability (The University of Chicago Press, Chicago, 1950/1962), 2nd ed., first publ. 1950. [25] H. Gaifman, Concerning measures in first order calculi,IsraelJ.Math.2, 1–18 (1964). [26] D. Scott and P. Krauss, Assigning probabilities to logical formulas, in Hintikka and Suppes [131] (1966), pp. 219–264. [27] I. Hacking, Slightly more realistic personal probability, Philos. Sci. 34(4), 311–325 (1967). [28] H. Gaifman, Subjective probability, natural predicates and Hempel’s ravens, Erkennt- nis 14(2), 105–147 (1979). [29] H. Gaifman and M. Snir, Probabilities over rich languages, testing and randomness, J. Symbolic Logic 47(3), 495–548 (1982). [30] P. Maher, Inductive logic and the ravens paradox, Philos. Sci. 66(1), 50–70 (1999). [31] H. Gaifman, Reasoning with bounded resources and assigning probabilities to arithmetical statements, Synthese 140(1–2), 97–119 (2003/2004), http://www. columbia.edu/~hg17/; first publ. 2003. [32] A. Hájek, What conditional probability could not be, Synthese 137(3), 273–323 (2003). [33] P. Maher, Probability captures the logic of scientific confirmation, in Hitchcock [132] (2004), pp. 69–93. [34] B. Fitelson and J. Hawthorne, How Bayesian confirmation theory handles the paradox of the ravens,inProbability in Science, edited by E. Eells and J. Fetzer (Open Court, Chicago, 2005), p. **, http://fitelson.org/research.htm. [35] B. Fitelson, Inductive logic, in Sarkar and Pfeifer [133] (2005), p. **, http:// fitelson.org/research.htm. [36] B. Fitelson, A. Hájek, and N. Hall, Probability, in Sarkar and Pfeifer [133] (2005), p. **, http://fitelson.org/research.htm. [37] P. Maher, The concept of inductive probability, Erkenntnis **(**), ** (2006), http: //patrick.maher1.net/preprints.html. [38] J. Barwise, The situation in logic — II: Conditionals and conditional information, Tech. Rep. CSLI-85-21, Center for the Study of Language and Information, Stanford (1985). [39] J. Barwise, The Situation in Logic (CSLI, Stanford, 1989). [40] G. Restall, Notes on situation theory and channel theory (1996), http:// consequently.org/writing/; lecture notes. [41] H. Gaifman, Vagueness, tolerance and contextual logic (2002), http://www.

24 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

columbia.edu/~hg17/. [42] M. Tribus, Rational Descriptions, Decisions and Designs (Pergamon Press, New York, 1969). [43] H. Gaifman (2006), personal communication to PGLPM. [44] ISO, Quantities and units, International Organization for Standardization, Geneva, 3rd ed. (1993). [45] IEEE, ANSI/IEEE Std 260.3-1993: American National Standard: Mathematical signs and symbols for use in physical sciences and technology, Institute of Electrical and Electronics Engineers, New York (1993). [46] P. G. L. Porta Mana, A. Månsson, and G. Björk, The Laplace-Jaynes approach to induction. Being part II of “From ‘plausibilities of plausibilities’ to state-assignment methods” (2007), eprint arXiv:physics/0703126,eprintPhilSci:00003235. [47] A. Månsson, P. G. L. Porta Mana, and G. Björk, Numerical Bayesian state assignment for a three-level quantum system. I. Absolute-frequency data; constant and Gaussian- like priors (2006), eprint arXiv:quant-ph/0612105;seealso[48]. [48] A. Månsson, P. G. L. Porta Mana, and G. Björk, Numerical Bayesian state assignment for a quantum three-level system. II. Average-value data; constant, Gaussian-like, and Slater priors (2007), eprint arXiv:quant-ph/0701087; see also [47]. [49] P. McCullagh, What is a statistical model?, Ann. Stat. 30(5), 1225–1267 (2002), http://www.stat.uchicago.edu/~pmcc/publications.html; see also the fol- lowing discussion and rejoinder [50]. [50] J. Besag, P. J. Bickel, H. Brøns, D. A. S. Fraser, N. Reid, I. S. Helland, P. J. Huber, R. Kalman, S. Pincus, T. Tjur, et al., What is a statistical model?: Discussion and Re- joinder, Ann. Stat. 30(5), 1267–1310 (2002), http://www.stat.uchicago.edu/ ~pmcc/publications.html; see McCullagh [49]. [51] A. I. Dale, A History of Inverse Probability: From Thomas Bayes to Karl Pearson (Springer-Verlag, New York, 1991/1999), 2nd ed., first publ. 1991. [52] T. Bayes, An essay towards solving a problem in the doctrine of chances, Phil. Trans. R. Soc. Lond. 53, 370–418 (1763), http://www.stat.ucla.edu/history/ essay.pdf; with an introduction by Richard Price. [53] P. S. Laplace, (Marquis de), Mémoire sur la probabilité des causes par les évène- mens, Mémoires de mathématique et de physique, presentés à l’Académie Royale des Sciences, par divers savans et lûs dans ses assemblées 6, 621–656 (1774), repr. in [134], pp. 25–65; http://gallica.bnf.fr/document?O=N077596;transl. as [135]. [54] P. S. Laplace, (Marquis de), Théorie analytique des probabilités (Mme Ve Courcier, Paris, 1812/1820), 3rd ed., first publ. 1812; repr. in [136]; http://gallica.bnf. fr/document?O=N077595. [55] I. Hacking, The Leibniz-Carnap program for inductive logic, J. Phil. 68(19), 597–610 (1971). [56] I. Hacking, Jacques Bernoulli’s Art of Conjecturing, Brit. J. Phil. Sci. 22(3), 209–229 (1971). [57] I. Hacking, Equipossibility theories of probability, Brit. J. Phil. Sci. 22(4), 339–355 (1971). [58] J. Marschak, M. H. Degroot, J. Marschak, K. Borch, H. Chernoff, M. Groot, R. Dorf- man, W. Edwards, T. S. Ferguson, K. Miyasawa, et al., Personal probabilities of prob- abilities, Theory and Decision 6(2), 121–153 (1975). [59] A. Mosleh and V. M. Bier, Uncertainty about probability: a reconciliation with the subjectivist viewpoint, IEEE Trans. Syst. Man Cybern. A 26(3), 303–310 (1996).

25 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

[60] D. Gillies, Varieties of propensity, Brit. J. Phil. Sci. 51(4), 807–835 (2000). [61] I. J. Good, The Estimation of Probabilities: An Essay on Modern Bayesian Methods (The MIT Press, Cambridge, USA, 1965). [62] D. Jamison, Bayesian information usage, in Hintikka and Suppes [137] (1970), pp. 28–57. [63] G. Tintner, The theory of choice under subjective risk and uncertainty, Econometrica 9(3–4), 298–304 (1941). [64] B. de Finetti, La prévision : ses lois logiques, ses sources subjectives,Ann.Inst.Henri Poincaré 7(1), 1–68 (1937), transl. as [138]. [65] B. de Finetti, Sur la condition d’équivalence partielle (1938), transl. in [139]. [66] E. Hewitt and L. J. Savage, Symmetric measures on Cartesian products,Trans.Am. Math. Soc. 80(2), 470–501 (1955). [67] D. Heath and W. Sudderth, De Finetti’s theorem on exchangeable variables, American Statistician 30(4), 188–189 (1976). [68] D. V. Lindley and L. D. Phillips, Inference for a Bernoulli process (a Bayesian view), American Statistician 30(3), 112–119 (1976). [69] P. Diaconis, Finite forms of de Finetti’s theorem on exchangeability, Synthese 36(2), 271–281 (1977). [70] H. O. Georgii, ed., Canonical Gibbs Measures: Some Extensions of de Finetti’s Rep- resentation Theorem for Interacting Particle Systems (Springer-Verlag, Berlin, 1979). [71] G. Link, Representation theorems of the de Finetti type for (partially) symmetric prob- ability measures,inJeffrey [140] (1980), pp. 207–231. [72] P. Diaconis and D. Freedman, Finite exchangeable sequences, Ann. Prob. 8(4), 745– 764 (1980). [73] E. T. Jaynes, Some applications and extensions of the de Finetti representation the- orem,inBayesian Inference and Decision Techniques with Applications: Essays in Honor of Bruno de Finetti, edited by P. K. Goel and A. Zellner (North-Holland, Ams- terdam, 1986), p. 31, http://bayes.wustl.edu/etj/node1.html. [74] P. Diaconis and D. Freedman, A dozen de Finetti-style results in search of a theory, Ann. Inst. Henri Poincaré (B) 23(S2), 397–423 (1987). [75] F. Lad, J. M. Dickey, and M. A. Rahman, The fundamental theorem of prevision, Statistica 50(1), 19–38 (1990). [76] J.-M. Bernardo and A. F. Smith, Bayesian Theory (John Wiley & Sons, Chichester, 1994). [77] C. M. Caves, Exchangeable sequences and probabilities for probabilities (2000), http://info.phys.unm.edu/~caves/reports/reports.html. [78] C. M. Caves, Learning and the de Finetti representation (2000), http://info. phys.unm.edu/~caves/reports/reports.html. [79] J. M. Keynes, A Treatise on Probability (MacMillan, London, 1921), 5th ed. [80] E. T. Jaynes, Monkeys, kangaroos and N (1996), http://bayes.wustl.edu/etj/ node1.html; revised and corrected version of Jaynes [141]. (Errata: in equations (29)–(31), (33), (40), (44), (49) the commas should be replaced by gamma functions, andonp.19thevalue0.915 should be replaced by 0.0915). [81] B. de Finetti, Probabilismo, Logos 14, 163–219 (1931), transl. as [142]; see also [143]. [82] C. M. Caves, C. A. Fuchs, and R. Schack, Unknown quantum states: the quantum de Finetti representation, J. Math. Phys. 43(9), 4537–4559 (2002), eprint arXiv: quant-ph/0104088. [83] Random House, Random House Webster’s Unabridged Dictionary, Random House Reference Publishing, New York, CD-ROM v3.0 ed. (2003), with recorded pronunci-

26 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

ations and illustrations; based on the second printed edition. [84] I. M. Copi, Symbolic Logic (Macmillan Publishing, New York, 1954/1979), 5th ed., first publ. 1954. [85] A. Church, Introduction to Mathematical Logic. Vol. 1 (Princeton University Press, Princeton, 1956/1970), first publ. 1956. [86] P. J. FitzPatrick, To Gödel via Babel,Mind75(299), 332–350 (1966). [87] H.-D. Ebbinghaus, J. Flum, and W. Thomas, Mathematical Logic (Springer-Verlag, New York, 1978/1984), transl. by Ann S. Ferebee; first publ. 1978 as Einführung in die mathematische Logik. [88] D. R. Hofstadter, Gödel, Escher, Bach: an Eternal Golden Braid (Basic Books, New York, 1979/1999), first publ. 1979. [89] Molière (Jean-Baptiste Poquelin), Le malade imaginaire : comédie : mêlée de mu- sique et de danses (La Grange et Vivot, Paris, 1682), “Corrigée, sur l’original de l’auteur, de toutes les fausses additions et suppositions de scènes entières, faites dans les éditions précédentes. Représentée pour la première fois à Paris, sur le Théâtre de la salle du Palais-Royal le 10e février 1673 par la Troupe du Roi”; http://www.toutmoliere.net/oeuvres/maladeim/. [90] J. Barwise and J. Etchemendy, The Liar: An Essay on Truth and Circularity (Oxford University Press, New York, 1987). [91] K. Friedman and A. Shimony, Jaynes’s maximum entropy prescription and probability theory, J. Stat. Phys. 3(4), 381–384 (1971), see also [93, 144, 145]. [92] ISO, Guide to the Expression of Uncertainty in Measurement, International Organiza- tion for Standardization, Geneva (1993). [93] D. W. Gage and D. Hestenes, Comment on the paper “Jaynes’s maximum entropy prescription and probability theory”, J. Stat. Phys. 7(1), 89–90 (1973), see [91] and also [144, 145]. [94] J. F. Cyranski, Analysis of the maximum entropy principle “debate”, Found. Phys. 8(5–6), 493–506 (1978). [95] Y. V. Egorov, A contribution to the theory of generalized functions, Russ. Math. Sur- veys (Uspekhi Mat. Nauk) 45(5), 1–49 (1990). [96] Y. V. Egorov, Generalized functions and their applications, in Exner and Neidhardt [146] (1990), pp. 347–354. [97] A. S. Demidov, Generalized Functions in Mathematical Physics: Main Ideas and Concepts (Nova Science Publishers, Huntington, USA, 2001), with an addition by Yu. V. Egorov. [98] M. J. Lighthill, Introduction to Fourier Analysis and Generalised Functions (Cam- bridge University Press, London, 1958/1964), first publ. 1958. [99] A. Delcroix, M. F. Hasler, S. Pilipovic,´ and V. Valmorin, Algebras of generalized functions through sequence spaces algebras. Functoriality and associations,Int.J. Math. Sci. 1, 13–31 (2002), eprint arXiv:math.FA/0210249. [100] A. Delcroix, M. F. Hasler, S. Pilipovic,´ and V. Valmorin, Generalized function alge- bras as sequence space algebras, Proc. Am. Math. Soc. 132(7), 2031–2038 (2004), eprint arXiv:math.FA/0206039. [101] M. Oberguggenberger, Generalized functions in nonlinear models — a survey, Nonlinear Analysis 47(8), 5029–5040 (2001), http://techmath.uibk.ac.at/ mathematik/publikationen/. [102] C. Swartz, Introduction to Gauge Integrals (World Scientific Publishing, Singapore, 2001). [103] R. G. Bartle, A Modern Theory of Integration (American Mathematical Society, Prov-

27 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

idence, USA, 2001). [104] W. F. Pfeffer, The Riemann approach to integration: Local geometric theory (Cam- bridge University Press, Cambridge, 1993). [105] K. A. Kirkpatrick, Ambiguities in the derivation of retrodictive probability (2002), eprint arXiv:quant-ph/0207101. [106] K. A. Kirkpatrick, “Quantal” behavior in classical probability, Found. Phys. Lett. 16(3), 199–224 (2003), eprint arXiv:quant-ph/0106072. [107] K. A. Kirkpatrick, Classical three-box “paradox”, J. Phys. A 36(17), 4891–4900 (2003), see also the corrected version eprint arXiv:quant-ph/0207124. [108] R. W. Spekkens, Evidence for the epistemic view of quantum states: A toy theory, Phys. Rev. A 75(3), 032110 (2007), eprint arXiv:quant-ph/0401052. [109] P. G. L. Mana, Consistency of the Shannon entropy in quantum experiments, Phys. Rev. A 69, 062108 (2004), rev. version at eprint arXiv:quant-ph/0302049. [110] P. G. L. Mana, Probability tables, in Khrennikov [147] (2004), pp. 387–401, rev. version at eprint arXiv:quant-ph/0403084. [111] E. R. Nowak, J. B. Knight, E. Ben-Naim, H. M. Jaeger, and S. R. Nagel, Density fluctuations in vibrated granular materials,Phys.Rev.E57(2), 1971–1982 (1998). [112] L. Jakóbczyk and M. Siennicki, Geometry of Bloch vectors in two-qubit system, Phys. Lett. A 286, 383–390 (2001). [113] G. Kimura, The Bloch vector for N-level systems, Phys. Lett. A 314(5–6), 339–349 (2003), eprint arXiv:quant-ph/0301152. [114] G. Kimura and A. Kossakowski, The Bloch-vector space for N-level systems: the spherical-coordinate point of view, Open Sys. & Information Dyn. 12(3), 207–229 (2005), eprint arXiv:quant-ph/0408014. [115] P. G. L. Porta Mana, Four-dimensional sections of the set of statistical operators for a three-level quantum system, in various coordinate systems (2006), realised as ani- mated pictures by means of Maple; available upon request. [116] P. McCullagh, Conditional inference and Cauchy models, Biometrika 79(2), 247–259 (1992), http://www.stat.uchicago.edu/~pmcc/publications.html. [117] J. F. Colombeau, New Generalized Functions and Multiplication of Distributions (North-Holland, Amsterdam, 1984). [118] J. F. Colombeau, Elementary Introduction to New Generalized Functions (North- Holland, Amsterdam, 1985). [119] J. F. Colombeau, Multiplication of Distributions: A tool in mathematics, numerical engineering and theoretical physics (Springer-Verlag, Berlin, 1992). [120] R. G. Bartle, Return to the Riemann integral, Am. Math. Monthly 103(8), 625–632 (1996). [121] R. A. Gordon, The use of tagged partitions in elementary real analysis,Am.Math. Monthly 105(2), 107–117 (1998). [122] H. MacColl, Symbolic reasoning (VI),Mind14(53), 74–81 (1905), see also [148, 149, 150, 151, 152, 153, 154]. [123] F. V. Atkinson, J. D. Church, and B. Harris, Decision procedures for finite decision problems under complete ignorance, Ann. Math. Stat. 35(4), 1644–1655 (1964). [124] E. H. Kyburg, Jr., Bayesian and non-Bayesian evidential updating, Artif. Intell. 31(3), 271–293 (1987). [125] I. Levi, Information and ignorance, Inform. Process. Manag. 20(3), 355–362 (1984). [126] I. Levi, On indeterminate probabilities, J. Phil. 71(13), 391–418 (1974). [127] P. C. Fishburn, Ellsberg revisited: A new look at comparative probability, Ann. Stat. 11(4), 1047–1059 (1983).

28 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

[128] R. F. Nau, Indeterminate probabilities on finite sets, Ann. Stat. 20(4), 1737–1767 (1992), http://faculty.fuqua.duke.edu/~rnau/bio/. [129] H. E. Kyburg, Jr. and H. E. Smokler, eds., Studies in Subjective Probability (John Wiley & Sons, New York, 1964). [130] R. D. Levine and M. Tribus, eds., The Maximum Entropy Formalism (The MIT Press, Cambridge, USA, 1979). [131] J. Hintikka and P. Suppes, eds., Aspects of Inductive Logic (North-Holland, Amster- dam, 1966). [132] C. Hitchcock, ed., Contemporary Debates in Philosophy of Science (Blackwell Pub- lishing, Malden, USA, 2004). [133] S. Sarkar and J. Pfeifer, eds., The Philosophy of Science: An Encyclopedia (Routledge Press, Florence, USA, 2005), two volumes. [134] P. S. Laplace, (Marquis de), Œuvres complètes de Laplace. Tome huitième : Mémoires extraits des recueils de l’Académie des sciences de Paris et de la classe des sciences mathématiques et physiques de l’Institut de France (Gauthier-Villars, Paris, 1891), ‘Publiées sous les auspices de l’Académie des sciences, par MM. les secrétaires per- pétuels’; http://gallica.bnf.fr/notice?N=FRBNF30739022. [135] P. S. Laplace, (Marquis de), Memoir on the probability of the causes of events, Stat. Sci. 1(3), 364–378 (1986), transl. by S. M. Stigler of [53]; see also the translator’s introduction [155]. [136] P. S. Laplace, (Marquis de), Œuvres complètes de Laplace. Tome septième : Théorie analytique des probabilités (Gauthier-Villars, Paris, 1886), ‘Publiées sous les auspices de l’Académie des sciences, par MM. les secrétaires perpétuels’; http://gallica. bnf.fr/notice?N=FRBNF30739022. [137] J. Hintikka and P. Suppes, eds., Information and Inference (D. Reidel Publishing Com- pany, Dordrecht, 1970). [138] B. de Finetti, Foresight: Its logical laws, its subjective sources, in Kyburg and Smokler [129] (1964), pp. 93–158, transl. by Henry E. Kyburg, Jr. of [64]. [139] B. de Finetti, On the condition of partial exchangeability,inJeffrey [140] (1980), pp. 193–205, transl. by P. Benacerraf and R. Jeffrey of [65]. [140] R. C. Jeffrey, ed., Studies in inductive logic and probability (University of California Press, Berkeley, 1980). [141] E. T. Jaynes, Monkeys, kangaroos and N,inMaximum-Entropy and Bayesian Methods in Applied Statistics, edited by J. H. Justice (Cambridge University Press, Cambridge, 1986), p. 26, see also the revised and corrected version [80]. [142] B. de Finetti, Probabilism: A critical essay on the theory of probability and on the value of science, Erkenntnis 31(2–3), 169–223 (1931/1989), transl. of [81] by Maria Concetta Di Maio, Maria Carla Galavotti, and Richard C. Jeffrey. [143] R. Jeffrey, Reading Probabilismo, Erkenntnis 31(2–3), 225–237 (1989), see [81]. [144] M. Tribus and H. Motroni, Comment on the paper “Jaynes’s maximum entropy pre- scription and probability theory”, J. Stat. Phys. 4(2/3), 227–228 (1972), see [91] and also [93, 145]. [145] K. Friedman, Replies to Tribus and Motroni and to Gage and Hestenes,J.Stat.Phys. 9(3), 265–269 (1973), see [93, 144]. [146] P.Exner and H. Neidhardt, eds., Order, Disorder and Chaos in Quantum Systems: Pro- ceedings of a conference held at Dubna, USSR on October 17–21, 1989 (Birkhäuser Verlag, Basel, 1990). [147] A. Y. Khrennikov, ed., Quantum Theory: Reconsideration of Foundations — 2 (Växjö University Press, Växjö, Sweden, 2004).

29 Porta Mana,Månsson,Bjork¨ ‘Plausibilities of plausibilities’: the circumstance approach

[148] H. MacColl, Symbolical reasoning,Mind5(17), 45–60 (1880), see also [122, 149, 150, 151, 152, 153, 154]. [149] H. MacColl, Symbolic reasoning (II),Mind6(24), 493–510 (1897), see also [122, 148, 150, 151, 152, 153, 154]. [150] H. MacColl, Symbolic reasoning (III),Mind9(33), 75–84 (1900), see also [122, 148, 149, 151, 152, 153, 154]. [151] H. MacColl, Symbolic reasoning (IV),Mind11(43), 352–368 (1902), see also [122, 148, 149, 150, 152, 153, 154]. [152] H. MacColl, Symbolic reasoning (V),Mind12(47), 355–364 (1903), see also [122, 148, 149, 150, 151, 153, 154]. [153] H. MacColl, Symbolic reasoning (VII),Mind14(55), 390–397 (1905), see also [122, 148, 149, 150, 151, 152, 154]. [154] H. MacColl, Symbolic reasoning (VIII),Mind15(60), 504–518 (1906), see also [122, 148, 149, 150, 151, 152, 153]. [155] S. M. Stigler, Laplace’s 1774 memoir on inverse probability, Stat. Sci. 1(3), 359–363 (1986), introduction to the transl. [135].

30 Paper G

arXiv:physics/0703126v2 [physics.data-an] 29 Apr 2007 in npasblt hoy hog hsnto esalhr prahn esthan less no approach here applica- shall its we of notion and this ‘circumstance’ Through of theory. notion plausibility the in of tions exploration our here philosophia continue We atque notatio, personae, Dramatis 1 ∗ Email: Fo pasblte fplausibiliti of ‘plausibilities “From ugiaTkik ösoa,Iajrsaa 2 E144 tchl,Sweden Stockholm, 40 SE-164 22, Isafjordsgatan Högskolan, Tekniska Kungliga h alc-ansapoc oinduction to approach Laplace-Jaynes The htbsdo eFntisrpeetto hoe n ntento finfinite for- ‘unknown of those involve of apparently notion interpretation that the alternative mulae on an gives and it theorem particular, representation In exchangeability. Finetti’s de on based complem that a provides Bayesian meaning, fully and approach, form This in ‘circumstances’. into problem given a of context ASnmes 02.50.Cw,02.50.Tt,01.70. numbers: PACS also are Generalisations exchangeability. on based discussed. that espe- to discussed, comparison are in approach presented cially the of applications and advantages ous S ubr:03B48,60G09,60A05 numbers: MSC napoc oidcini rsne,bsdo h dao nlsn the analysing of idea the on based presented, is induction to approach An [email protected] .G .PraMana, Porta L. G. P. Dtd 9Arl2007) April 29 (Dated: en atI of II part Being Note,toheado ol nyb ahrcus oainlyadtedious and verbally. notationally clumsy rather it be arise; only not would could misconception the restate that easily so could everything We notation. our of doubling a a probabilities by hypotheses our parameters index probability”. to convenient a simply of is “probability It chosen a [...] introduce to to be way numerically no in equal to the ∗ .T J T. E. Abstract 1 .Mnsn .Björk G. Månsson, A. s osaeasgmn methods” state-assignment to es’ aynes probabilities’ n ri oecssa lentv to alternative an cases some in or ent + w sge ytoehptee;ti avoids this hypotheses; those by ssigned ff , omnmsocpin htti is this that misconception, common a oky,knaos n N and kangaroos, Monkeys, r‘rpniis.Vari- ‘propensities’. or 1,p 12 p. [1], Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction the question of induction, i.e., of the prediction of unobserved evens from knowl- edge of observed (similar) ones. The study can be read independently of the previous one [2], although the two elucidate each other. The following characters appear in this study, the first two having a walk-on part only: The ‘propensitor’: a scholar who conceives of a ‘physical disposition’ of some phenomena to occur with definite relative frequencies, and call this physical charac- teristic ‘propensity’. This scholar sometimes makes inferences about propensities by means of probability as degree of belief. The ‘frequentist’: a scholar who conceives probability as ‘limit relative fre- quency’ of a series of ‘trials’ (or something like that). Frequentists are often propen- sitors at heart. The ‘de Finettian’: a scholar who conceives probability as ‘degree of belief’, pedantically insisting on the subjective nature of all plausibility assignments, and through the notion of infinite exchangeability makes sense of the frequentist’s and propensitor’s voodooistic practises. The views and tools of this scholar [e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11] are supposed known to the reader. The ‘plausibilist’ or ‘probability logician’, who in this study will be your humble narrator: a scholar who conceives of probability as a formalisation and schematisa- tion of the everyday notions of ‘plausibility’ and ‘probability’ — just as truth in for- mal logic is such a formalisation and schematisation of the everyday notion of ‘truth’ — and does not dwell more than so in its meaning. The views of this scholar are a dis- tillate of the philosophies and views and/or the works of Laplace [12], Johnson [13], Jeffreys [14, 15, 16], Cox [17, 18], Jaynes [19, 20, 21], Tribus [22], de Finetti [6, 7], Adams [23, 24], Hailperin [25], and others [e.g., 11, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]. Plausibilists agree with de Finettians on basically all points. The only difference is in emphasis: although they recognise the ‘subjective’ character of initial plausibil- ity assignments — i.e., that these are matter of convention —, they also think that it is not such a big deal. It should in any case be no concern of plausibility theory, which is ‘impersonal in the same sense as in ordinary logic: that different people starting from the same [assignments] would get the same answers if they followed the rules’ (Jeffreys [16], reinterpreted). This will surely be self-evident when plausi- bility theory will reach its maturity, just as it is self-evident in formal logic today. In fact, the same subjective, conventional character is present — no more, no less — in formal logic as regards the initial truth assignments, those which are usually called ‘axioms’ or ‘postulates’. Albeit in formal logic there is perhaps less place for dis- agreement amongst people about initial truth assignments, the possible choice being there between ‘true’ and ‘false’, or ‘0’ and ‘1’ only — and not amongst a continuum of possibilities [0, 1] as in plausibility theory. The above remark is not meant to diminish the historical importance of de Fi- netti’s Nietzschean work in plausibility theory. Emphasis on subjectivity may still be necessary sometimes. In this study we shall also use terms like ‘judgement’ to this purpose.

2 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

The following notation will be used: P(A| B) denotes the plausibility of the proposition A conditional on B or, as we shall also say, given B or in the context B. The proposition B is supposed to express all knowledge (including beliefs), data, and ‘working hypotheses’ on the grounds of which we make the plausibility assignment for A. We are intransigent as regards the necessity and appropriateness of always specifying the context of a plausibility (as well as that of a truth!),1 a point also stressed by Jeffreys [14, 15], Cox [17, 18], Jaynes [19, 20, 21], Tribus [22], de Finetti [48, § 4], Hájek [49], and apparently Keynes [26]. A plausibility density for propositions Ax concerning (in a limit sense) a contin- uous parameter x is denoted by p(Ax| B), and is as usual implicitly defined by 

P(Ax∈Ξ | B) = p(Ax| B)dx for all appropriate Ξ;(1) Ξ in general, x → p(Ax| B) is a generalised function, here always intended in the sense of Egorov [50, 51, 52] (see also [53, 54, 55, 56]).2 The remaining notation follows ISO [68] and ANSI/IEEE [69] standards.

2 General setting of the question

The question that we are going to touch is, in very general terms, that of induction: In a given situation, given a collection of observations of particular events, assign the plausibility for another collection of unobserved events. Note that temporal distinc- tions are not relevant (we did not say ‘past’ or ‘future’ events) and we shall not make any. Of course, in stating this general question we have in mind ‘similar’ events.3 So let us call these events ‘instances of the same phenomenon’, following de Finetti’s appropriate terminology [3, 4, 48]. We also suppose to know that each such instance can ‘manifest itself’ in a constant (and known) number of mutually exclusive and exhaustive ‘forms’, and there is a clear similarity between the forms of each instance (that is one of the reasons we call the events ‘similar’). All this can be restated and made more concrete using a terminology that is nearer to physics; but we must keep in mind that the setting has not therefore become less general. We call the phenomenon a ‘measurement’,4 and its instances ‘measurement instances’. The forms will be called ‘(measurement) outcomes’. We can finally state our question thus: In a given situation, given the observed outcomes of some instances of a particular measurement, assign the plausibility for the unobserved outcomes of some other instances of that measurement.

1The context may also have a clarifying rôle in formal logic. Cf. e.g. the studies by Adams [23, 24, 41], Lewis [42, 43], Hailperin [25, 44], Barwise [45, 46], and Gaifman [47]. 2Alternatively, integrals as the above can be intended as generalised Riemann integrals [57, 58, 59] (see also [60, 61, 62, 63, 64, 65, 66, 67]). 3But ‘similar’ in which sense? The answer to this question cannot be given by plausibility theory, but can be formalised within it, as shown later. 4Even more appropriate, but too long, would be ‘measurement scheme’.

3 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

We represent the situation, measurements, etc. by propositions. The situation (τ) τ (τ) by I, the measurement instances by M , the outcomes of the th instance by Ri (different instances of the same outcome are those with different τ but identical i). Instances are thus generally denoted by an index (τ) with τ = 1, 2,...; its range may be infinite or finite, a detail that will be always specified as it will be very important in later discussions. Other propositions will be introduced and defined later. With this representation, our question above simply becomes the assignment of the plausibility τ τ τ τ τ τ P(R( N+L) ∧···∧R( N+1)| M( N+L) ∧···∧M( N+1) ∧ R( N ) ∧···∧R( 1) ∧ I)(2) iN+L iN+1 iN i1 for all possible distinct τa, distinct ia, N  0, and L > 0.

A more convenient notation. If you are wondering what those M(τ) are doing in the context of the plausibility, consider that the plausibility of observing a particu- lar outcome at the τ instance of a measurement is, in general, the plausibility of the outcome given that the measurement is made, times the plausibility that the mea- surement is made (if the latter is not made the outcome has nought plausibility by definition): P(R| I) = P(R| M ∧ I)P(M| I) + P(R|¬M ∧ I)P(¬M| I), = P(R| M ∧ I)P(M| I). But of course when we ask for the plausibility of an outcome we implicitly mean: given that the corresponding measurement is or will be made. We assume that knowl- edge of an outcome implies knowledge that the corresponding measurement has been   made — symbolically, P(M(τ )| R(τ ) ∧ I) = 1ifτ = τ — hence there is no need of specify in the context the measurements of those outcomes that are already in the context. But the necessity remains of explicitly writing in the context the measure- ments of the outcomes outside the context. This can sometimes be notationally very cumbersome, and therefore we introduce the symbol M with the following conven- tion: M, in the context of a plausibility, always stands for the conjunction of all measurement instances M(τ) corresponding to the outcomes on the left of the condi- tional symbol ‘|’. Thus, e.g., (7) ∧ (3)| M ∧ (2) ∧ ≡ (7) ∧ (3)| (7) ∧ (3) ∧ (2) ∧ . P(R5 R1 R8 I) P(R5 R1 M M R8 I) Note that M has not a constant value (it is in a sense a metavariable); we indicate this by the use of a different typeface (Euler Fraktur). With the new notational convention the plausibility (2) can be rewritten as τ τ τ τ P(R( N+L) ∧···∧R( N+1)| M ∧ R( N ) ∧···∧R( 1) ∧ I). (2) iN+L iN+1 iN i1 r

3 The approach through infinite exchangeability

The propensitor (and, roughly in the same way, the frequentist as well) approaches the question of assigning a value to (2) by supposing that the outcome instances (‘tri- als’) are ‘i.i.d.’: that they are independently ‘produced’ with constant but ‘unknown’

4 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction propensities. As N in (2) becomes large, the relative frequencies of the outcomes must tend to the numerical values of the propensities. These frequencies can then be used as ‘estimates’ of the propensities, and so the estimated propensity for outcome Ri at an additional measurement instance can be given. This summary surely appears laconic to readers that have superficial knowledge of this practise. But this does not matter: it is the approach of the de Finettian that interests us.5 Recall that for the de Finettian, and for the Bayesian in general, ‘probability’ is not a physical concept like ‘pressure’, but a logical (and subjective) one like ‘truth’. To mark this difference in concept we are using the term plausibility, which has a more logical and subjective sound. How would a de Finettian approach the problem above? More or less as follows:

[A fictive de Finettian speaking:] ‘Before I know of any measurement outcomes, I imagine all possible infinite collections of instances of the measurement M.Let us suppose that I judge two collections that have the same frequencies of outcomes to have also the same probability. The probability distribution that I assign to the infinite collections of outcomes is therefore symmetric with respect to exchanges of collections having the same frequencies. Such a distribution is called infinitely exchangeable. De Finetti’s representation theorem [4, 5, 6, 7, 8, 9, 10, 11, 71] (see also Johnson and Zabell [13, 72, 73]) says that any L-outcome marginal of such distribution, where the outcomes {Ri} appear with relative frequencies L¯ ≡ (Li), can be uniquely written in the following form:  τ τ ( L) ∧···∧ ( 1)| M ∧ = Li Γ | , P(Ri Ri I) qi (¯q I)d¯q (3) L 1 i R1 appears L1 times, etc. ≡ whereq ¯ (qi) are just parameters — not probabilities! — satisfying the same posi- tivity and normalisation conditions (qi  0, i qi = 1) as a probability distribution, andq ¯ → Γ (¯q| I) is a positive and normalised generalised function, which can be called the generating function of the representation [cf. 71]. Example: I can write the probability of a collection of three measurement instances with two outcomes R5 and one R8 as  τ τ τ ( 1) ∧ ( 2) ∧ ( 3)| M ∧ = 2 Γ | . P(R5 R5 R8 I) q5 q8 (¯q I)d¯q

In particular, the probability for the outcome i in the whatever measurement instance τ can be written as  (τ)| (τ) ∧ = Γ | . P(Ri M I) qi (¯q I)d¯q (4) A propensitor or a frequentist would say that the right-hand side of eq. (3) represents the fact that the outcome instances are “independent” (therefore “their probabilities

5There are authors who apparently keep a foot in both camps; see e.g. Lindley and Phillips’ arti- cle [70]

5 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

qi are simply multiplied to give the probability of their conjunction”), “identically distributed” (therefore “the probability distributions q is the same for all instances”), and moreover “their probability is unknown” (therefore “the expectation integral over all possible probability distributionsq ¯”). But de Finetti’s theorem shows that all these mathematical features are simply consequences of infinite exchangeability, and we do not need any “i.i.d.” terminology. ‘The generating functionq ¯ → Γ (¯q| I) is, by de Finetti’s theorem, equal to the limit ‘All possible collections of L outcomes with Γ (¯q| I)d¯q = lim P M ∧ I , (5) L→∞ frequencies in the range ]Lqi, L(qi + dqi)[’ i.e., Γ (¯q| I) is equal to the probability — assigned by me — that my imagined infinite collection of outcomes has limiting relative frequencies equal to (qi). My exchange- able probability assignment determines thus Γ uniquely. But de Finetti’s theorem also says the converse, viz., any positive and normalisable Γ uniquely determines an infinitely exchangeable probability distribution. This allows me to specify my probability assignment by giving the function Γ instead of the more cumbersome probability distribution for infinite collections of outcomes. ‘Let us suppose that I am now given a collection of N measurement outcomes, and in particular their absolute frequencies N¯ ≡ (Ni). Given this evidence, the proba- bility for a collection of further L unobserved measurement outcomes with frequen- cies (Li) is provided by the basic rules of probability theory: τ τ τ τ P(R( N+L) ∧···∧R( N+1)| M ∧ R( N ) ∧···∧R( 1) ∧ I) = iN+L iN+1 iN i1

Ri appears Li times Ri appears Ni times τ τ τ τ P(R( N+L) ∧···∧R( N+1) ∧ R( N ) ∧···∧R( 1)| M ∧ I) iN+L iN+1 iN i1 . τ τ (6) P(R( N ) ∧···∧R( 1)| M ∧ I) iN i1 Using de Finetti’s representation theorem again, this probability can also be written as τ τ τ τ R( N+L) ∧···∧R( N+1)| M ∧ R( N ) ∧···∧R( 1) ∧ I = P( i + i + i i ) N L N 1 N  1 τ τ Li Γ | ( N ) ∧···∧ ( 1) ∧ . qi (¯q Ri Ri I)d¯q (7) i N 1 τ τ The functionq ¯ → Γ (¯q| R( N ) ∧ ··· ∧ R( 1) ∧ I)isdifferent from the previous one iN i1 q¯ → Γ (¯q| I), but the two can be shown [e.g., 11] to be related by the remarkable formula Ni Γ | qi (¯q I) τ τ Γ | ( N ) ∧···∧ ( 1) ∧ = i , (¯q Ri Ri I) (8) N 1 Ni Γ | qi (¯q I)d¯q i which is formally identical with Bayes’ theorem if we define

(τ1) (τ2) (τL) Γ (R ∧ R ∧···∧R | q¯, I)  qi qi ···qi for all L, ia, and distinct τa,(9) i1 i2 iL 1 2 L

6 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction which includes in particular

Γ (τ)| ,  τ (Ri q¯ I) qi for all . (10) (Note that these are only formal definitions and not probability judgements, since the various Γ s are not probability distributions.) Thus I can not only specify my infinitely exchangeable distribution by Γ , but also update it by ‘updating’ Γ . Moreover, for N enough large this function has the limit τ τ Γ (¯q| R( N ) ∧···∧R( 1) ∧ I)  δ N¯ − q¯ as N →∞. (11) iN i1 N

Ri appears Ni times

Comparing with eq. (4), this means that

τ τ τ N ( N+1) (τN+1) ( N ) ( 1) i P(R | M ∧ R ∧···∧R ∧ I)  as N →∞ (with τ  τa), (12) i iN i1 N

Ri appears Ni times i.e., the probability I assign to an unobserved outcome gets very near to its observed relative frequency, as observations accumulate. This also means that two persons having different but compatible initial beliefs (i.e., different initial exchangeable dis- tributions having the same support) and sharing the same data, tend to converge to similar probability assignments.’

The point of view summarised above by the de Finettian is quite powerful. It allows the de Finettian to make sense, without the need of bringing along ugly or meaningless metaphysical concepts, of those mathematical expressions very often used by propensitors (and frequentists as well) that are formally identical to formu- lae (4) (‘expected propensity’ or ‘estimation of unknown probability’), (5) (‘proba- bility as limit frequency’), (9) (definition of ‘propensity’), (12) (‘probability equal to past frequency’). This point of view and the notion of exchangeability can moreover be generalised to more complex situations [5, 8, 9, 10, 71, 74, 75, 76][cf. also 77], leading to other powerful mathematical expressions and techniques.

4 Why a complementary approach?

We shall presently present and discuss another approach, not based on exchangeabil- ity or representation theorems, that can be used to give an answer to the question of induction, and to make sense of the propensitors’ and frequentists’ formulae and practise as well. Why did we seek an approach different from that based on infinite exchangeability? Here are some reasons: (a) One sometimes feels ‘uncertain’, so to speak, about one’s plausibility as- signment. One can also say that some plausibility assignments feel sometimes more ‘stable’ than others. This is lively exemplified by Jaynes [21, ch. 18]:

7 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

Suppose you have a penny and you are allowed to examine it carefully, convince yourself that it’s an honest coin; i.e. accurately round, with head and tail, and a center of gravity where it ought to be. Then, you’re asked to assign a probability that this coin will come up heads on the first toss. I’m sure you’ll say 1/2. Now, suppose you are asked to assign a probability to the proposition that there was once life on Mars. Well, I don’t know what your opinion is there, but on the basis of all the things that I have read on the subject, I would again say about 1/2 for the probability. But, even though I have assigned the same “external” probabilities to them, I have a very different “internal” state of knowledge about those propositions. To see this, imagine the effect of getting new information. Suppose we tossed the coin five times and it comes up tails every time. You ask me what’s my probability for heads on the next throw; I’ll still say 1/2. But if you tell me one more fact about Mars, I’m ready to change my probability assignment completely. There is something which makes my state of belief very stable in the case of the penny, but very unstable in the case of Mars. An example similar to the Martian one is that of a trickster’s coin, which always comes up heads or always tails (either because the coin is two-headed or two-tailed, or because of the tosser’s skills). Not knowing which way the tosses are biased, you assign 1/2 that the coin will come up heads on the first observed toss. But as soon as you see the outcome, your plausibility assignment for the next toss will collapse6 to 1 or 0. The ‘uncertainty’ in the plausibility assignments given to the penny and to the trickster coin’s toss could be qualitatively pictured as in Fig. 1. We think that this ‘uncertainty’ in the plausibility assignment — or better, as Jaynes calls it, this ‘difference in the internal state of knowledge’ with regard to the propositions involved, is a familiar and undeniable feeling. There is in fact a rich literature which tries to take it into account by exploring or even proposing alternative plausibility theories based on probability intervals or probabilities of probabilities (see e.g. [78, 79, 80, 81, 82; 77, esp. § 3.1; 83, 84] and cf. [85, § 2.2]). (b) A plausibility assignment, according to de Finetti’s approach, begins with a judgement of exchangeability. One is not concerned within the formalism itself with the motivation of such a judgement (de Finetti recognises that there usually is a motivation, but it is relegated to the informal meta-theoretical considerations). Yet such judgements are often grounded, especially in natural philosophy, on very important7 reasonings and motivations which one would like to analyse by means of plausibility theory.8

6Just like a quantum-mechanical wave-function. 7And, trivially, subjective. 8In a way there is a point in Good’s statement [85, ch. 3] ‘it seems to me that one would not accept the [exchangeability assumption] unless one already had the notion of physical probability and (approximate) statistical independence at the back of one’s mind’, although there is no need to bring ‘physical probabilities’ and ‘statistical independence’ along.

8 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

P‘heads’ 0 0.2 0.4 0.6 0.8 1

Figure 1: Qualitative illustration of the ‘uncertainty’ in the plausibility assignments for ‘heads’ in the case of a fair coin-toss (blue dashed curve, with maximum at 1/2) and the toss of a trickster’s coin (orange continuous curve, with maxima at 0 and 1). The illustration is made quantitative in § 6.2.

(c) When the maximum number of possible observations (measurements) is fi- nite and small it does not make sense to make a judgement of infinite exchangeability, not even as an approximation (cf. [9][11, § 4.7.1]). De Finettians can in this case use finite exchangeability [9, 10], but then they cannot make sense of the propensitor’s formulae like (4) or (9), which the propensitor, however, is still entitled to use even in this case. (d) Finally, another important reason is that the generalisation of de Finetti’s the- orem studied by Caves, Fuchs, and Schack [86, 87, 88, 89] fails for some physical sta- tistical models, viz. quantum mechanics on real and quaternionic Hilbert space. We consider this failure not as a sign that complex quantum theory is somehow ‘blessed’, but rather as a sign that either the theorem can be generalised in some other way that holds in any physical statistical model, or de Finetti’s approach can be substituted or complemented by another one whose generalisation applies to any physical statisti- cal model whatever. The point here is that de Finetti’s theorem — just like the whole of plausibility theory — belongs to the realm of logic, not physics, and thus should apply to any conceivable physical theory that be logically consistent, even one that does not describe actual phenomena. The alternative point of view to be presented meets all the points raised above: (a) It allows us to formalise within plausibility theory — i.e., without resorting to interval-valued plausibilities and the like — the intuitive notion of ‘uncertainty’ of a

9 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction plausibility assignment, and even to quantify it. (b) It stems directly from an analysis of the conditions in which judgements of exchangeability usually originate and apply. (c) It can be used to interpret the propensitor’s practise when the maximum number of possible observations is finite and small. (d) It readily generalises to any logically consistent physical statistical model, whether it apply to actual phenomena or not; in particular, within quantum theory it easily applies to such problems as quantum- state assignment, quantum-state ‘teleportation’, and others that involve the notion of ‘unknown quantum state’. This point of view, which we shall call the ‘Laplace-Jaynes approach’, stems from a re-reading and possibly a re-interpretation of Jaynes [1, esp. pp. 11, 12, 15] (cf. also [19, lect. 18][20, lect. 5][21, ch. 18]), Laplace [90], and de Finetti [3, § 20] (cf. also Caves [91]) and is strictly related to the ‘approach through circumstances’ presented in the previous paper [2]. Mosleh and Bier [92] also propose and discuss basically the same point of view. It cannot be too strongly emphasised that this point of view is fully in line with plausibility theory, Bayesian theory, and de Finetti’s point of view: Some plausibili- ties with particular properties will be introduced, based on particular judgements; but the acceptance or not of the latter is, as with judgements of exchangeability, always up to the individuals and their knowledge. We now present this ‘Laplace-Jaynes approach’ and show how it can be used for the question of induction in a manner parallel to the approach through infinite exchangeability. We shall then discuss how it meets points a, b, c above (not neces- sarily in that order). The discussion of the fourth point is left to the next study of this series (in preparation).

5 The Laplace-Jaynes approach

5.1 Introducing a set of circumstances

With the notation and the general settings of the introduction, let us recapitulate how we reason and proceed in the case of exchangeability. The situation I leads us to assign an infinitely exchangeable plausibility distribution to the collection of measurement outcomes. In other words, I is such that we consider two plausibilities (2) ∧ (4)| M ∧ (4) ∧ (2)| M ∧ like e.g. P(R5 R7 I)andP(R5 R7 I) as equal. In § 4, point (b), we remarked that the approach through exchangeability is not formally concerned about those details of the situation which lead us to see a sort of analogy ar similarity amongst different measurement and outcome instances and thence make a judgement of exchangeability. Rather, the whole point is that this similarity is expressed by, or reflected in, the exchangeable plausibility assignment. As de Finetti says, ‘Our reasoning will only bring in the events, that is to say, the trials, each taken individually; the analogy of the events does not enter into the chain of reasoning in its own right but only to the degree and in the sense that it can influ- ence in some way the judgment of an individual on the probabilities in question’ [4,

10 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction p. 120]. And also:

What are sometimes called repetitions or trials9 of the same event are for us distinct events. They have, in general, some common characteristics or symmetries which make it natural to attribute to them equal proba- bilities, but we do not admit any apriorireason which prevents us in principle from attributing to each of these trials [...]somedifferent and absolutely arbitrary probabilities [...].Inprinciplethereisnodifference for us between this case and the case of n events which are not analo- gous to each other; the analogy which suggests the name “trials of the same event” (we would say “of the same phenomenon”) is not at all es- sential, but, at the most, valuable because of the influence it can exert on our psychological judgment in the sense of making us attribute equal or very nearly equal probabilities to the different events.’ [4, p. 113, foot- note 1]

But although we agree on this terminology and its motivation, we do not agree on the unqualified and indiscriminate diminution of the importance of the similarity amongst measurement instances. Such similarity, especially in the natural sciences, often stems from or is traced back to similarities at deeper10 levels of analysis and observation. It is by this process that natural philosophy proceeds. So let us suppose that the considerations that lead us to a probability assignment for various measurement instances can be analysed — and formalised — at a deeper level.11 More precisely, we suppose to have identified for each measurement instance τ { (τ) = ,...} a set of diverse possible ‘circumstances’ C j : j 1 — all these sets having the same cardinality — with the following properties:

I. The first involves the plausibilities we assign to the circumstances:

(τ)| ∧ = (τ)| P(C j D I) P(C j I)forallj and all D representing conjunc- tions of measurements (but not of out- comes!) from the same or other instances, (I) (τ) ∧ (τ)| =   , P(C j C j I) 0ifj j   (τ)| = (τ)| = , P C j I P(C j I) 1 j j

{ (τ) = ,...} i.e., we judge the C j : j 1 to be mutually exclusive and exhaustive, or in other words we are certain that one of them holds, but we do not know which. Moreover, we judge any knowledge of measurement instances (but not

9And what we call here instances (Authors’ Note). 10We do not mean ‘microscopic’. 11Using, once more, words by de Finetti: ‘we enlarge the analysis to include our state of mind in relation to other events, from which it might or might not be independent and by which, consequently, it will or will not be modified if they occur, or if we learn of their occurrence’ [93, § 28].

11 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

of outcome instances) as irrelevant for assessing the plausibilities of these cir- cumstances, or in other words, simply knowing that a measurement is made does not give us clues as to the exact circumstance in which it is made.12

II. The second involves the plausibility distribution we assign to the outcomes, conditional on knowledge of a circumstance:

(τ)| (τ) ∧ ∧ (τ) ∧ = (τ)| (τ) ∧ (τ) ∧ P(Ri M D C j I) P(Ri M C j I) for all τ, i, j,andallD representing a conjunction of measurements, measurement outcomes, and circumstances of instances different from τ, (II)

(τ) which means that if we were certain about any circumstance C j for the in- stance τ we should judge knowledge of any data concerning other instances as irrelevant for the assessment of the plausibility of the outcome of instance τ. (τ) (The expression above is undefined if C j happens to be inconsistent with D, | M ∧ (τ) ∧ = i.e. if P(D C j I) 0; but this will not cause problems in the following analysis.)

III. The third property concerns the relationships amongst our plausibility assign- ments for the outcomes in different instances:

    (τ )| (τ) ∧ (τ ) ∧ = (τ )| (τ) ∧ (τ ) ∧  τ τ P(Ri M C j I) P(Ri M C j I) qij for all , , (III) which expresses the fact that we see a similarity between outcomes and circum- stances of different instances, like when we say ‘the same outcome’, ‘the same measurement’, or ‘the same circumstance’. It is thanks to this property that we can often drop the instance index ‘(τ)’ and make sense of expressions like ‘P(Ri| M ∧ C j ∧ I)’, which can stand generically for

(τ) (τ) (τ) P(Ri| M ∧ C j ∧ I)  P(R | M ∧ C ∧ I)foranyτ, i j (13) ≡ qij.

In the following, expressions like ‘P(Ri| M ∧ C j ∧ I)’ will be understood in the above sense.

IV. The fourth property strengthen the similarity amongst instances, and is the one that makes induction possible:

(τ) (τ)   | ∧ = δ   τ , τ P(C j C j I) j j for all . (IV) This means that we believe that if a particular circumstance holds in a particular instance, then the ‘same’ circumstance (in the sense of (III) and (IV)) holds in

12This assumption can be relaxed.

12 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

all other instances. This property implies (and, with the help of (I), is implied by) the following: (τ) (σ) for all τ and j,P(C | I) = P( C | I)  γ j (14) j σ j

for some γ j;and ⎧ ⎪γ = σ (σ) ⎨ j if jσ j for all , P( C | I) = ⎪ (15) σ jσ ⎩0 otherwise,

i.e., the plausibility that we assign to the τth instance of the jth circumstance is equal to that assigned to the collection consisting exclusively of ‘repetitions’ of the jth circumstance. Collections consisting of non-corresponding circum- stances have nought plausibility. This allows us to define |  (σ)| , P(C j I) P( σC j I) ≡ (τ)| τ, (16) P(C j I)forany

≡ γ j,

and (σ) C j  C . (17) σ j

Thanks to the definitions above the expression ‘P(C j| I)’ can be used to unam- biguously denote the plausibility that ‘the “same” circumstance ‘C j’ holds in all measurement instances’.

Note that in all the properties above the number of instances (i.e., the range of τ)can be either infinite or finite. Before further commenting the above properties, let us try to partially answer the question: what are these ‘circumstances’? The most general and precise answer is: they are whatever (propositions concerning) facts you like that make you assign plausibilities satisfying properties (I)–(IV). No more than this would really need be said. But we can emphatically add that the circumstances need not concern ‘mecha- nisms’, ‘causes’, ‘microscopic conditions’, or the like; and that they do not13 concern ‘unknown probabilities’.14 They concern details of the context that are unknown, but { (τ)} that (we judge) would have a great weight in our plausibility assignment for the Ri if we only knew them; so great a weight as to render (practically) unimportant any knowledge of the details of other measurement instances. As for the choice of such

13Cannot, in a logical sense; cf. Remark 2 in [2]. 14But the circumstances may concern ‘propensities’, if the latter are intended as sorts of (ugly) physical concepts. This possibility is due to the generality of plausibility theory which, like classical logic, does not forbid you to bring along and reason about unreal or even preposterous concepts and entities, like ‘nagas’ [94], ‘valier’ [95], ‘propensities’, and ‘wave-particles’, provided you do it in a self-consistent way.

13 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction details, we have complete freedom. A trivial example: the tosses of a coin which we have not examined, but which we know for sure to be two-headed or two-tailed. In this case the propositions ‘Coin is two-headed (at toss τ)’ and ‘Coin is two-tailed (at toss τ)’ form, for each τ, a set of circumstances that satisfy property (I). But we also know that the coin is the same at all tosses, which implies property (IV); hence we can omit the specification ‘(at toss τ)’ without confusion. If we knew that the coin was two-headed (at toss τ) we should assign unit plausibility to heads for all future or past tosses, i.e.,

P(‘Heads at toss τ’| M ∧ ‘Coin is two-headed (at toss τ)’ ∧ I) = 1forallτ, and we should consider knowledge of the outcomes of other tosses as irrelevant.15 An analogous discussion holds for the two-tails possibility. Thus properties (II) and (III) are also satisfied. This was an extreme example, for the plausibilities conditional on the circum- stances were nought or one. But it needs not be so: other circumstances could lead to less extreme conditional plausibility judgements. In general, remaining within the coin toss example, we could ask: By which method is the coin tossed? What are its physical characteristics (two-headedness, centre-of-mass position, elastic and rigid properties, etc.)? Upon what is it tossed? Who tosses it?16 Can there be any symme- tries in my state of knowledge in respect of the situation? What are the consequences of the toss or of the outcomes? — and a set of circumstances could be distilled from the possible answers to these and other questions. Note in particular, with regard to the last two questions, that a circumstance may be a judgement of symmetry or may also be a consequence, in some sense, of the outcomes. Such kinds of circum- stances are perfectly fine as long as they lead you to make a plausibility assignment on the outcomes and satisfy the properties (I)–(IV). This emphasises again the fact that a circumstance needs not be a sort of ‘mechanism’ or ‘cause’ of the measure- ments or of the outcomes. Note also that properties (I) and (IV) determine neither (τ)| (τ)| (τ) ∧ (τ) ∧ the plausibilities P(C j I) nor the P(Ri M C j I). These plausibilities are, a de Finettian would say, ‘fully subjective’. By properties (II) and (III) we can interpret and give a meaning to the locution ‘in- dependent and identically distributed events’. Such locution means only that we are entertaining some circumstances that, according to our judgement, render the con- ditional plausibilities of corresponding outcomes of different measurement instances equal, so that we need not specify the particular instance (‘identically distributed’); and render knowledge of outcomes of other instances irrelevant, so that we can spec- ify the conditional plausibility for an outcome independently of the knowledge of other outcomes (‘independent’). The locutions ‘unknown probability’ and ‘probabil- ity of a probability’ will also be interpreted in a moment (§ 5.3).

15Because we can but expect all tosses to give heads. Note that if a toss has given or will give tails, this signals a contradiction in our knowledge. I.e., some of the data we have (about the coin or about toss outcomes) have to be mendacious. But this is a problem that does not concern plausibility theory. Like logic, it can give sensible answers only if the premises we put in are not inconsistent. 16See Jaynes’ insightful and entertaining discussion [21, ch. 10] on these kinds of factors.

14 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

5.2 Approaching the problem of induction through a set of circumstances

How do we face the question of induction with these ‘circumstances’ and the as- sumptions that accompany them? We answer this question in two steps. { (τ)} Having introduced sets of circumstances C j , we must assign (subjectively, a de Finettian would say) for all i and j the plausibilities

| ∧ ∧ ≡  (τ)| (τ) ∧ (τ) ∧ τ P(Ri M C j I) qij ( P(Ri M C j I)forany ), (18) | ≡ γ  (σ)| ≡ (τ)| τ P(C j I) j ( P( σC j I) P(C j I)forany ), (19) in the sense of eqs. (13) and (16). It is then a simple consequence of the rules of plau- sibility theory, together with the properties and definitions of the previous sections, that the plausibility we assign to any collection of measurement outcomes is given by

τ τ P(R( L) ∧···∧R( 1)| M ∧ I) = iL i1

R1 appears L1 times, etc. | ∧ ∧ Li | ≡ Li | . P(Ri M C j I) P(C j I) qij P(C j I) (20) j i j i

Note how this plausibility assignment depends only on the frequencies (Li)ofthe outcomes: it is an (infinitely) exchangeable assignment — although exchangeability was not our starting assumption. Let us suppose that we are now given a collection of N measurement outcomes, and in particular their absolute frequencies N¯ ≡ (Ni). Given this evidence, the plausi- bility for a collection of outcomes, with frequencies (Li), of further L measurements is also derived by the basic rules of plausibility theory, from our initial plausibility assignments (18) and (19), using expression (20) and the properties of the circum- stances:

τ τ τ τ P(R( N+L) ∧···∧R( N+1)| M ∧ R( N ) ∧···∧R( 1) ∧ I) = iN+L iN+1 iN i1

Ri appears Li times Ri appears Ni times τ τ Li ( N ) ( 1) P(Ri| M ∧ C j ∧ I) P(C j| R ∧···∧R ∧ I) ≡ iN i1 j i Li (τN ) (τ1) q P(C j| R ∧···∧R ∧ I), (21) ij iN i1 j i with Ni | qij P(C j I) τ τ | ( N ) ∧···∧ ( 1) ∧ = i . P(C j Ri Ri I)  (22) N 1 Ni | qij P(C j I) j i

15 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

These formulae present many similarities to the de Finettian’s (7) and (8), and to the formally similar expressions used by ‘frequentists’ and ‘propensitors’. But it should be noted that, in the present formulae, qki  P(Ri| M ∧ C j ∧ I), P(C j| I), and (τN ) (τ1) P(C j| R ∧···∧R ∧I)areactual plausibilities, not just parameters or positive and iN i1 normalised generating functions. (‘And they are fully subjective!’, the de Finettian reiterates).

5.3 Plausibility-indexing the set of circumstances

The marked similarities can in fact be made into identities of form. To achieve this, { (τ)} we go back to the point where, after having introduced the circumstances C j ,we made the plausibility assignments | ∧ ∧ ≡  (τ)| (τ) ∧ (τ) ∧ τ P(Ri M C j I) qij ( P(Ri M C j I)forany ), (18)r | ≡ γ  (σ)| ≡ (τ)| τ P(C j I) j ( P( σC j I) P(C j I)forany ), (19)r for all i, j (and τ). Instead of proceeding as we did, we now follow the remark by Jaynes [1, p. 12], cited in the introductory epigraph: It is simply convenient to index our hypotheses by parameters [¯q] cho- sen to be numerically equal to the probabilities assigned by those hy- potheses; this avoids a doubling of our notation. We could easily restate everything so that the misconception could not arise; it would only be rather clumsy notationally and tedious verbally. What Jaynes calls ‘hypotheses’ we have here interpreted, more generically, as ‘cir- cumstances’. Let us now group together those that lead to the same plausibility distribution for the outcomes. That is, fix a τ, and form the equivalence classes of the equivalence relation (τ) ∼ (τ) ⇐⇒ (τ)| (τ) ∧ (τ) ∧ = (τ)| (τ) ∧ (τ) ∧ . C j C j for all i,P(Ri M C j I) P(Ri M C j I) (23) By the very method these classes are defined, each one can be uniquely identified by a particular set of values (qi) ≡ q¯ of the plausibility distribution for the outcomes. Note that owing to property (III) the value ofq ¯ does not depend on τ. Call therefore ∼ (τ) ∈ ∼ q¯ the class identified by a particularq ¯, and denote membership of C j by ‘ j q¯ ’ for short. Now let us take the disjunction of the circumstances in each classq ¯∼ and (τ) uniquely denote this disjunction by S q¯ :  (τ)  (τ). S q¯ C j (24) j∈q¯∼

(τ) It is easy to see that, owing again to property (I), each S q¯ yields a conditional distribution for the outcomes that is numerically equal toq ¯, i.e. numerically equal to (τ) ∼ all those yielded by the C j inq ¯ : (τ)| (τ) ∧ (τ) ∧ = τ. P(Ri M S q¯ I) qi for any (25)

16 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

{ (τ)} { (τ)} Moreover, the S q¯ also satisfy properties (I)–(IV) as the circumstances C j ,and their plausibilities are easily obtained from those of the latter:   (τ)| = (τ)| , | = | , P(S q¯ I) P(C j I) P(S q¯ I) P(C j I) (26) j∈q¯∼ j∈q¯∼  (σ) where S q¯ σ S q¯ in analogy with the definition (17). (τ) These S q¯ are thus a sort of ‘plausibility-indexed’ circumstances. We shall call (τ) them also ‘coarse-grained circumstances’ sometimes. The C j can then be called ‘fine-grained’ circumstances when a distinction is necessary. ∼ (τ) In general, we have a classq ¯ and a disjunction S q¯ for particular (vector) values ofq ¯ only (depending on the initial choice of circumstances and on the initial assign- (τ) ments (18)). But we can formally introduce propositions S q¯ , all equal to the false proposition (A ∧¬A), for the remaining valuesq ¯. These propositions hence have (τ)| ≡ (τ)| (τ) ∧ (τ) ∧ nought plausibilities, P(S q¯ I) 0. The plausibilities P(Ri M S q¯ I)are undefined, but they will appear multiplied by the former in all relevant formulae, and hence their product will vanish by convention. With this expedient we can consider { (τ) | ∈ Δ} the whole, continuous set S q¯ q¯ for all possible distributionsq ¯ which natu- Δ  { |  , = } rally belong to the simplex (qi) qi 0 i qi 1 . We can thus substitute ... | ...... | ... an integration ∫Δ p(S q¯ )d¯q for the summation q¯ P(S q¯ ). The density 17 p(S q¯ | ...) will be a generalised function. We can therefore analyse the context I through the plausibility-indexed circum- { (τ)} { (τ)} stances S q¯ , instead of the C j . The rationale behind this is that those circum- (τ) ∼ ff stances C j that belong to a given classq ¯ have all the same e ect on our judgement as regards the assignment and the update of the plausibilities of the outcomes; more- (τ)| ... ∧ over, the ratios of their updated plausibilities P(C j I) cannot change upon acquisition of new measurement outcomes and are equal to those of their prior plau- sibilities, see [106, 107] and [2, remark 3 and § 4.2]. It is therefore not unreasonable (τ) ff to handle equivalent C j class-wise. The special notation chosen for the di erent (τ) classes, ‘S q¯ ’, the index ‘¯q’ in particular, is to remind us on the grounds of what plausibility judgements (theq ¯) we grouped the original circumstances thus in the (τ) first place. But it is only a notation, nothing more. Each S q¯ is a disjunction of propositions like, e.g., ‘The coin is two-headed, or the tosses are made by a trickster withapredilectionforheads,or...’— itis not a statement about the plausibilities qi (nor about ‘propensities’). The effect of this ‘bookkeeping’ notation is, however, surprising for the form our induction formulae (20)–(22) take when expressed in terms of the plausibility- { (τ)} indexed circumstances S q¯ . The plausibility we assign to any collection of mea-

17We can avoid the expedient above if we like; then the integrals that follow must be understood in a measure-theoretic sense [96, 97, 98, 99, 100, 101, 102, 103] (cf. also [61, 104, 105]), with ‘p(S q¯| ...)d¯q’ standing for appropriate singular measures.

17 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction surement outcomes takes now, by eq. (25), the form

P(R(L) ∧···∧R(1)| M ∧ I) = iL i1 R1 appears L1 times, etc.  | ∧ ∧ Li | ≡ Li | , P(Ri M S q¯ I) p(S q¯ I)d¯q qi p(S q¯ I)d¯q (27) i i with value numerically equal to that of (20), and with the same remarks and conven- tions about the index τ as made in § 5.1. The expression above is formally identical to the de Finettian’s (3) with the correspondence p(S q¯ | I)  Γ (¯q| I). In particular, the plausibility for the outcome of any instance takes the form  (τ)| (τ) ∧ = | , P(Ri M I) qi p(S q¯ I)d¯q (28) formally identical to (4). The plausibility conditional on the observation of N outcomes takes the form

τ τ τ τ P(R( N+L) ∧···∧R( N+1)| M ∧ R( N ) ∧···∧R( 1) ∧ I) = iN+L iN+1 iN i1

Ri appears Li times Ri appears Ni times  τ τ | ∧ ∧ Li | ( N ) ∧···∧ ( 1) ∧ ≡ P(Ri M S q¯ I) p(S q¯ Ri Ri I)d¯q i N 1  τ τ Li | ( N ) ∧···∧ ( 1) ∧ qi p(S q¯ Ri Ri I)d¯q (29) i N 1 with Ni | qi p(S q¯ I)d¯q τ τ | ( N ) ∧···∧ ( 1) ∧ = i , p(S q¯ Ri Ri I)d¯q (30) N 1 Ni | qi p(S q¯ I)d¯q i again formally identical to the de Finettian’s (7) and (8). Apart from the congruence between their mathematical forms, the above formu- lae and those derived from exchangeability have very different meanings. Whereas in the exchangeability approach theq ¯ ≡ (qi) were just parameters, in the Laplace-Jaynes (τ)| (τ) ∧ approach they are (numerical values of) actual plausibilities,viz.theP(Ri M (τ) ∧ (τ)| (τ) ∧ (τ) ∧ C j I)ortheP(Ri M S q¯ I). Whereas in the exchangeability approach q¯ → Γ (¯q| ...) was only a generalised function, in the Laplace-Jaynes approach q¯ → p(S q¯| ...) is (the density of) an actual plausibility distribution — a distribu- tion, however, not over ‘probabilities’ or ‘propensities’, but rather over propositions like ‘The coin is two-headed, or the tosses are made by a trickster with a predilection forheads,or...’. Equations (29) and (30) (as well as (21) and (22)) show that the observation of measurement results has a double ‘updating’ effect in the circumstances approach. Not only are the plausibilities of unobserved results updated, but those of the cir- cumstances as well; in fact, the former updating happens through the latter. This is

18 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction related to what Caves calls ‘learning through a parameter’ [91], although there are no parameters here, only plausibilities. Also in the exchangeability approach, one could argue, is the generating function Γ (¯q| I) updated; but its updating only represents and reflects a change in our uncertainty or degree of belief about the outcomes,no more than that. In the Laplace-Jaynes approach, the updating of p(S q¯ | I) represents instead a further change in our uncertainty about events or phenomena other than the outcomes. We shall return to this in a moment. We finally also see how the Laplace-Jaynes approach interprets and makes sense of the notions of ‘unknown probability (or propensity)’ and ‘probability of a proba- bility’: what is unknown is not a probability, but which circumstance from a set of empirical ones holds; the second probability is therefore not about a probability, but about an empirical circumstance.

5.4 Limit for large number of observations

Let us suppose that the plausibility distribution p(S q¯ | I)d¯q does not vanish for any q¯. This implies that the set of ‘fine-grained’ circumstances {C j} is a continuum; with analytical and topological care this case should not present particular difficulties. That assumption being made, from eq. (30) we have that as the number N of observations increases, the updated distribution for the plausibility-indexed circum- stances asymptotically becomes (τN ) (τ1) N¯ p(S q¯| M ∧ R ∧···∧R ∧ I)  δ − q¯ as N →∞, (31) iN i1 N

Ri appears Ni times which is formally identical to the de Finettian’s (11). It also follows that

τ τ τ τ Ni P(R( N+1)| M( N+1) ∧ R( N ) ∧···∧R( 1) ∧ I)  as N →∞, (32) i iN i1 N Ri appears Ni times exactly as in (12). Also in the Laplace-Jaynes approach, then, the plausibilities as- signed to unobserved outcomes get nearer their observed relative frequencies as the number of observations gets larger (under the assumptions specified above). And persons sharing the same data and having compatible initial plausibility assignments tend to converge to similar plausibility assignments, as regards their respective sets of plausibility-indexed circumstances (and as regards unobserved measurement out- comes). Cf. the discussion in §§ 6.2 and 6.4, and see also Jaynes [21, ch. 18] (see also [19, lect. 18][20, lect. 5]). What happens if the distribution p(S q¯ | I)d¯q vanishes at, or in a neighbourhood of, the point (qi) = (Ni/N), as may be the case when the number of ‘fine-grained’ circumstances {C j} is finite? It happens that the updated distribution gets concen- trated at those S q¯ (and those C j) withq ¯ (respectively (qki)) nearest (Ni/N)andfor which the initial plausibility does not vanish. One should make the adjective ‘near- est’ topologically more precise. There are also interesting results at variance with the

19 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

asymptotic expression (12) when (Ni/N) cannot be obtained as a convex combination of thoseq ¯ (respectively (qki)) whose related plausibilities do not vanish. All this is left to another study.

6 Discussion

6.1 Relations between the circumstance and exchangeability approaches

The Laplace-Jaynes approach and the infinite-exchangeability one do not exclude each other and may be used simultaneously in many problems. In fact, if in a given problem we can introduce ‘circumstances’, and the number of measurement instances is potentially infinite, the resulting distributions for collections of outcomes are then infinitely exchangeable and all the results and representations based on ex- changeability also apply, beside those based on the circumstance representation. This leads to the following powerful proposition:

Proposition. In the Laplace-Jaynes approach, if the number of measurement in- stances is potentially infinite then

τ τ P(R( L) ∧···∧R( 1)| M ∧ D ∧ I) = iL i1 R1 appears L1 times, etc.   Li Γ | ∧ = Li | ∧ , qi (¯q D I)d¯q qi p(S q¯ D I)d¯q (33) i i and in particular Γ (¯q| D ∧ I)d¯q = p(S q¯ | D ∧ I)d¯q, (34) for any L, {iσ }, and any D representing a (possibly empty) conjunction of measure- ment outcomes not containing the instances τ1,...,τL. The equalities of the above Proposition may be read in two senses, whose mean- ing is the following: If in a given situation we have introduced circumstances {S q¯} (or, equivalently, {C j}) and assigned or calculated their plausibility density p(S q¯| D ∧ I), then we also know, automatically, the generating function Γ (¯q| D ∧ I) of de Finetti’s representation theorem, which we should have introduced had we taken an exchange- ability approach. Vice versa, if we approach the problem through exchangeability and assign an infinitely exchangeable distribution to the possible infinite collections of outcomes, we have automatically assigned also the plausibility density p(S q¯ | D∧I) for whatever set of plausibility-indexed circumstances {S q¯} we might be willing to introduce. The density of the Laplace-Jaynes approach and the generating function of the exchangeability one are, in both cases, numerically equal. What is remarkable in the second direction of the Proposition is that the plau- sibilities of the plausibility-indexed circumstances are completely determined from our exchangeable plausibility assignment, even when we have not yet specified what

20 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction those propositions are about! Although remarkable, this fact is a consequence of the particular way the plausibility-indexed circumstances are formed from the ‘fine- grained’ ones. Note, moreover, that the plausibilities of the ‘fine-grained’ circum- stances {C j} would not be completely determined in general, even though their values would be constrained by eq. (26). Some philosophical importance has the fact that p(S q¯| ...), numerically equal to Γ (¯q| ...), is an ‘actual’ plausibility distribution, whereas the latter is only a gener- ating function. It is practically impossible to specify most infinitely exchangeable plausibility distributions directly, and this is one of the main points and advantages of de Finetti’s theorem: a de Finettian can specify an infinitely exchangeable distri- bution by making a (often necessary) detour and specifying Γ instead. It is, however, a bit disconcerting the fact that one is almost always forced to specify the ‘as if’ in- stead of what according to de Finetti is ‘the actual plausibility’. The Laplace-Jaynes approach gives instead a direct meaning to the generating function Γ as a plausibility distribution, so that one needs not feel embarrassed to specify it. Can all situations which can be approached through infinite exchangeability also be approached from the Laplace-Jaynes point of view? Asking this means asking whether all situations for which we deem exchangeability to apply can be analysed into circumstances having the properties (I)–(IV). There is surely much place for discussion on the answer to this question, if by ‘circumstances’ we really mean, in some sense, ‘interesting circumstances’. One could also wonder whether on a formal level the answer could be ‘yes’. The reason is that the Laplace-Jaynes approach would seem to formally include the infinite-exchangeability one. To see how, make { } first a judgement of infinite exchangeability, and then introduce and define a set C f¯ of propositions stating that the limiting relative frequencies of an infinite collection of outcomes have certain values f¯. This should be possible since the existence of this limit under exchangeability is guaranteed by de Finetti’s theorem. Intuitively, (τ) such a set {C ¯}  {C } for any τ would seem to constitute (under a judgement of f f¯ infinite exchangeability) a set of circumstances that satisfy properties (I)–(IV). If this were true then, whenever infinite exchangeability applied, the Laplace-Jaynes approach would be at least formally viable. But intuition is not a reliable guide here (the definition of a proposition like C f¯, e.g., would apparently involve a disjunction over the set of permutations of natural numbers), and the above reasoning is not mathematically rigorous. We have not the mathematical knowledge necessary to rigorously analyse and answer this sort of conjecture, and since we find it in any case uninteresting, let us not discuss it any further. It can be added, however, that any criticism from de Finettians with regards to the infinities involved in the argument would be like sawing the branch upon which they are sitting themselves, since we do not see how one can effectively make a judgement of infinite exchangeability without { } first considering possibly problematic propositions like C f¯ .

21 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

6.2 ‘Unsure’ and ‘unstable’ plausibiliy assignments

The presence of circumstances in the analysis of the problem, and the consequent fact that the plausibility of the outcomes can be decomposed as in eq. (28), suggest an interpretation of those feelings of ‘uncertainty’ and ‘stability’ about a plausibility assignment mentioned in § 4, point (a). A person might feel ‘sure’ about an assign- ment 1/2toheadsinthesituationIs because of the following, perhaps unconscious, reasoning: ‘If I knew that the coin was two-headed or the toss method favoured heads, I’d assign 1 to heads. If on the contrary I knew that the coin was two-tailed or the toss method favoured tails, I’d assign 0 to heads. If I knew that the coin was a usual one and the tosser knew nothing about tossing methods, I’d assign 1/2to heads. But the coin, although I gave to it a rapid glance only, seems to me quite com- mon and symmetric; and I know that the tosser, a friend of mine, knows no tossing tricks. So I can safely exclude the first two possibilities, and I’m practically sure of the third. Yes, I’ll give 1/2 to heads’. In this reasoning, the person entertains (for each τ) a set of possibilities, like ‘The coin is two-headed’, ‘The toss method favours tails’, etc. From these a set of three plausibility-indexed circumstances S q¯ , with 18 q¯ ≡ (qheads, qtails), is distilled, corresponding to theq ¯ values (1, 0), (1/2, 1/2), and (0, 1). To these circumstances the person assigns, given the background knowledge Is, the plausibilities

P(S (1,0)| Is) ≈ 0,

P(S (1/2,1/2)| Is) ≈ 1, (35)

P(S (0,1)| Is) ≈ 0. Then, according to the rules of plausibility theory, the total plausibility assigned to heads is  P(‘heads’| Is) = P(‘heads’| S q¯ ∧ Is)P(S q¯| Is), q¯ (36) = 1 × P(S (1,0)| Is) + 1/2 × P(S (1/2,1/2)| Is) + 0 × P(S (0,1)| Is), ≈ 1 × 0 + 1/2 × 1 + 0 × 0 = 1/2.

The fact to notice here is that amongst all the circumstances, those that lead to a con- ditional plausibility for heads near to the total plausibility, viz. 1/2, have together far higher plausibility than the others. This fact can be interpreted as the source of the feelings of sureness and stability associated to that final plausibility assignment: the person has conceived a number of hypotheses, and the total plausibility assigned on the grounds of these is practically equal to the plausibilities assigned conditionally on the most plausible hypotheses only. Put it otherwise, the most plausible circum- stances are not discordant, all point more or less to the same plausibility assignment. Mathematically this is reflected in the fact that the distribution P(S q¯| Is) is concen- trated around the median.

18 Note, once more, that the S q¯ concern not plausibilities but statements like, in this case, ‘The coin is a usual one and the tosser knows nothing about tossing methods’, etc.

22 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

The reasoning of a person who feels ‘unsure’ about an assignment 1/2 to heads in the situation Iu might go instead as follows: ‘Had I known that the coin was a common one, I’d have assigned 1/2 to heads. But I have heard from reliable sources that this coin is not a common one, so I can safely exclude that possibility. If I knew that the coin was two-headed, I’d assign 1 to heads. If I knew that the it was two-tailed, I’d assign 0 to heads. But I’m completely unsure about these two possibil- ities!Ihavetogive1/2 to heads then’. Also in this reasoning the person introduces a collection of hypotheses and from these distills a set of three plausibility-indexed cir- cumstances similar to those introduced by the ‘sure’ person. It is in the assessment of the plausibilities P(S q¯ | I) that the two persons — owing to their different background knowledges Is, Iu —differ. In this case the assessment yields

P(S (1,0)| Iu) ≈ 1/2,

P(S (1/2,1/2)| Iu) ≈ 0, (37)

P(S (0,1)| Iu) ≈ 1/2, and therefore the unsure person assign to the outcome ‘heads’ the total plausibility  P(‘heads’| Iu) = P(‘heads’| S q¯ ∧ Iu)P(S q¯| Iu), q¯ (38) = 1 × P(S (1,0)| Iu) + 1/2 × P(S (1/2,1/2)| Iu) + 0 × P(S (0,1)| Iu), ≈ 1 × 1/2 + 1/2 × 0 + 0 × 1/2 = 1/2. The final assignment is the same as that of the sure person, but the most plausible circumstances for the unsure person would yield conditional plausibilities not near to the final value 1/2. The unsure person would have given 0 or 1 if a little bit more knowledge of the situation had been available.19 In other words, the most plausible circumstances are discordant and point to different plausibility assignments, and this can be interpreted as the source of the feelings of unsureness and instability. In mathematical terms, the distribution P(S q¯ | Iu) is not concentrated around the median. The two examples above are very simple, the number of possible plausibility- indexed circumstances being only three. With a deeper analysis this number could be very large, and the assignments (35) and (37) would be replaced by effective continuous distributions like e.g.

20 20 p(S q¯ | Is)d¯q ∝ (qheads) (qtails) δ(1 − qheads − qtails)d¯q, (39) −39/40 −39/40 p(S q¯ | Iu)d¯q ∝ (qheads) (qtails) δ(1 − qheads − qtails)d¯q, (40) whose graphs are those of Fig. 1, blue dashed curve for the first and orange continu- ous curve for the second. These are Dirichlet (or beta) distributions, known for their particular properties, for which see e.g. [1, 11, 72, 85, 108] (cf. also [70]). The degree of ‘sureness’ or ‘stability’ about a plausibility assignment can thus be apparently captured, formalised, and even quantified by the plausibility distribution

19The assignment will in fact collapse onto one of those two values as soon as one toss is observed, leading to nought-or-one updated plausibilities for the circumstances S (0,1) and S (1,0).

23 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

of a set of circumstances, P(S q¯ | I), and by its ‘width’ in particular, which can be de- fined and quantified in a different number of ways (entropy, standard deviation, etc.). Graphs such as those of Fig. 1 have then a legitimate and quantitative meaning within (single-valued) plausibility theory: they do not represent ‘probabilities of probabil- ities’; they represent plausibility distributions amongst different unknown empirical circumstances, grouped for simplicity according to the conditional plausibilities they lead to. This interpretation is also consonant with the fact that, according to the results of § 5.4, eq. (31), as the observed data accumulate the distribution P(S q¯| I)getsmore and more peaked around a particular value which gets also nearer the median, re- flecting the fact that accumulation of data tends to make us surer of our plausibility assignments. For further discussion and references, see Jaynes [21, ch. 18; cf. also 19, lect. 18; 20,lect.5]and[2].

6.3 Finite and small number of possible observations

Asmentionedin§4withregardtopoint(c),therearesituationsinwhichtheinfinite- exchangeability approach cannot be adopted: situations where we know that the num- ber of possible measurement instances is finite — and, in particular, small. In this case a de Finettian has three possible options as regards the application of exchange- ability. The first is to make a judgement of finite exchangeability and use the related representation theorem. It is known that the representation theorem for finite ex- changeability has a different form from that for infinite exchangeability, eq. (3). The plausibility for a collection of outcomes takes in the finite case the form of a mixture of hypergeometric distributions (‘urn samplings without replacement’). In the case of a finite but large number of possible observations this representation converges to the infinitely exchangeable one [9, 10][see also 109], so the de Finettian can make sense of the propensitor’s ‘unknown propensity’ reasoning and of the related formu- lae, which are formally identical with (3)–(8), as approximations. But in the case of a small maximum number of possible observations the discrepancy between the two representations is too large, and if the ‘propensitor’ obstinately uses the ‘unknown propensity’ formulae, the de Finettian will then not be able to ‘make sense’ of them, as instead was the case with infinite exchangeability. The second option is to use de Finetti’s theorem as generalised by Jaynes [71], i.e. extended to generating functions of any sign, but restricting the choice of the latter to positive ones only. But this restriction would lack meaningful motivation: What kind of plausibility judgement would the restriction reflect? — For we must remember that the specification of the generating function is for the de Finettian only a detour to specify the plausibility distribution of the outcomes, which is the only ‘actual’ one. The restriction to positive generating functions would only seem to reflect the third option, to which we now turn.

24 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

The third option is to enlarge the finite collection of measurement instances with an infinite number of fictive ones, and then make a judgement of infinite exchange- ability for the enlarged collection. With this artifice the formulae of the propensitor can be recovered, and seemingly with a meaningful motivation. It seems to us, how- ever, that the introduction of very many (∞) fictive measurement instances, apart from being unpleasant, is formally inconsistent. For the fact that one knows that the number of measurement instances can but be finite, say at most n, must be included in the context I;viz.,M(τ) ∧ I is false for τ > n. The plausibility of the outcomes of the fictive ‘additional’ (τ > n) measurements, which is conditional on I,isthen by definition nought, or undefined. One cannot even say ‘Let us suppose that ad- ditional measurements were possible’, because such a proposition conjoined with I would yield a false context and the plausibilities conditional on it would then be un- defined. An alternative would be to eliminate from the context I those ‘elements’ that bound the number of measurement instances, so that the plausibilities of the fictive additional measurements would neither be nought nor undefined in the new, muti- lated context. But mutilation of prior knowledge — granted its feasibility — can be dangerous. The Laplace-Jaynes approach, on the other hand, applies unaltered to the case of a finite, even small, maximum number of measurement instances. This is clear if we look again at properties (II)–(IV): they nowhere require the instance index τ to run to infinity, as cursively remarked directly after their enunciation. Hence, through the Laplace-Jaynes approach we can make straightforward sense of the propensitor’s procedure also in these finite situations.

6.4 When is the Laplace-Jaynes approach useful? Making allowance for the grounds behind exchangeability assignments: some examples

We have seen in § 6.1 that the two approaches are not mutually exclusive but can coexist and strongly ‘interact’. But in which sense are the two approaches comple- mentary? How ought we to decide which approach to choose? Why? There is no definite answer: as de Finetti warns [93, § 20]:

in certain enterprises it can be better to evaluate the probability of favour- able outcomes all in a block, to see at a glance whether the investment is secure or insecure, and in others it is better to reach this conclusion starting from an analysis of the individual factors that are in play. There is no conceptual difference between the two cases. Someone who wants to estimate the area of a rectangular field can with the same right esti- mate the area directly in hectares, or estimate the lengths of the sides and multiply them: reasons of convenience, practicality and custom will make one method preferable to the other, the one that seems to us more trustworthy in relation to our capacity to judge, or we can follow both methods, or try other ones, and consider all of them in fixing our opinion.

25 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

In assigning a plausibility to some measurement outcomes we may simply want, depending on ‘convenience, practicality and custom’, to assign plausibilities to all possible collections of similar outcomes (indirectly, by specifying the generating function of de Finetti’s theorem), update the plausibilities according to observations already made, and then marginalise for the instances of interest, as described in § 3. Or we may want to analyse the contexts of the measurements into more details and assign plausibilities to these, obtaining the final plausibilities for the outcomes of in- terest by the theorem on total probability, as described in § 5. Neither approach is more ‘objective’ than the other, for both must start from some set of (subjective, the de Finettian says) plausibilities that must be given without further analysis. In situations such as that of a ‘common’ coin toss or die throw, we (the authors) usually prefer in general the approach through exchangeability. Although such situ- ations could be analysed into more details and hypotheses about the way of tossing etc., we are not really interested in these and prefer to make an initial exchangeable plausibility assignment for future tosses, that we shall update according to observa- tions made — the number of which can be very large (i.e., infinite exchangeability can be applied, at least as an approximation). One could say that it is the plausibil- (τ)| ... ities ‘P(Ri )’ that are of interest, and we collect measurement data to be put in place of the dots in order to update those plausibilities. But there are examples, espe- cially in the study of natural phenomena, where it is the details — the circumstances — that interest us most and that we want to investigate. In these cases measure- ment outcomes are collected not only for the sake of prediction of other, unknown ones, but also in order to modify our uncertainty about the circumstances, i.e., to change our initial plausibility assignments about them. In other words, the plausibil- | ... (τ)| ... ities ‘P(C j )’ interest us as much as the ‘P(Ri )’, and measurement data are collected to update both. A powerful feature of the approach through plausibility-indexed circumstances is that you can ‘wait’ to think about and to explicitly define the circumstances of interest until you have collected a large amount of observations. The ‘procedure’ is as follows:

1. Imagine to have introduced numerous circumstances {C j}, to have assigned the plausibilities P(Ri| M ∧ C j ∧ I), and to have performed the ‘coarse-graining’ of the circumstances into the set {S q¯}.

2. Assume to have assigned plausibilities P(C j| I) such that the plausibility den- sity p(S q¯ | I) over the plausibility-indexed circumstances is enough smooth and non-vanishing at every point. How much is ‘enough’ is related to the size of the collected data as for the next step.

3. Collect a large amount of outcome observations D. What counts as ‘large’ is related to the smoothness assumed in the previous step for the plausibility density over the plausibility-indexed circumstances.

26 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

4. Update your distributions according to the observations D.Iff¯ ≡ ( fi) are the relative frequencies of the observed outcomes, the updated plausibility distri- bution over the plausibility-indexed circumstances would now be concentrated around the circumstance labelled by f¯, i.e., p(S q¯| D ∧ I)d¯q ≈ δ(¯q − f¯)d¯q; cf. § 5.4. This means that you would judge S f¯ to be the most plausible cir- cumstance given the evidence D. (But remember: the introduction of circum- stances is up to now only imagined.)

5. Now re-examine the context as it was before the observations, and really intro- duce a set of circumstances of interest, respecting the assumptions in steps 1 and 2. In this process, however, you only need to concentrate on those circum- stances conditional on which the outcomes of any measurement instance are near to f¯, i.e. those for which P(Ri| M ∧C j ∧I) ≈ fi. This restriction is sensible since, as for step 4 above, these are the circumstances that have now become most plausible in the light of the observed data.

This process reflects e.g. what we do when, having tossed a coin many times and seen that we nearly always obtain heads, we say ‘there’s something peculiar going on here, let’s see what it can be’, and begin to look for circumstances (like who tossed the coin, how the coin was tossed, on which surface it was tossed, how it was manufactured, etc.) which presumably lead to the observed peculiar behaviour. Examples in the same spirit, although more complex, abound in the natural sci- ences. Consider the following example from the study of granular materials [e.g., 110, 111]. These are made of a very large number of macroscopic (thermal agitation is unimportant) but small particles, that are hence considered point-like. These ma- terials can then be studied by the methods of statistical mechanics. Often of interest is the particle motion resulting from an externally applied force and the particles’ collisions. In the study by Rouyer and Menon [112] (to which we refer for details) we are interested in the measurement — our M — of the horizontal velocity com- ponent v of any particle. The possible measurement outcomes {Rv} are the possible different values of v.20 What is the plausibility distribution that we assign to the different outcomes? Depending on the circumstances, we should make different as- signments. For example, if we knew that the particle collisions were elastic, the particle density enough low, long-range interactions negligible, and a couple more of details — all of which we represent by CMB —, we should assign a Maxwell- | ∧ ∧ ∝ 2/ 2 Boltzmann, Gaussian distribution p(Rv M CMB I)dv exp(v v0)dv, with an appropriate constant v0. (In the present case we know the collisions to be inelastic, so that CMB is from the beginning known to be false; but let us disregard this to make the example more interesting.) If we knew [112] that the collisions were inelastic, knew that the density presented inhomogeneities, judged the particle density very relevant for the velocity distribution, and knew some other details — denote all them by CP —, we should then assign a particular non-Gaussian plausibility distribution,

20Hence, the set of outcomes is continuous here but, with appropriate mathematical care, this is not so important.

27 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction according to the kinetic-theoretical considerations of Puglisi et al. [113]. Finally, if we knew [112] that the collision were inelastic, that energy was injected in the sys- tem homogeneously in space, and some other details — call them CNE —, we should then assign another particular non-Gaussian plausibility distribution, according to the study by van Noije and Ernst [114]. (In the present case we know that energy is injected from the boundary, so that CNE is known to be false; again, we disregard this.) We do not consider other possible circumstances for the moment, but go on to collect a large amount of observations instead — call these D. The distribution of / = α / α α ≈ / frequencies thus observed goes like Nv N dv exp(v v1 )dv with 3 2, a distri- bution very similar to that assigned on the grounds of CP, and partially similar to that assigned conditional on CNE. By eq. (12) we are thus led, in both the exchangeability and the Laplace-Jaynes approach, to assign (N+1)| (N+1) ∧ ∧ ≈ α / α p(Rv M D I)dv exp(v v1 )dv (41) and to say that

α α Γ (‘exp(v /v )’| D ∧ I) = p(S α / α | D ∧ I) has a very large value. (42) 1 ‘exp(v v1 )’ Γ α / α You note that has ‘exp(v v1 )’ as argument, which is also the index of S .This is because v is a continuous variable, hence the vectorsq ¯ ≡ (qi) of our previous α / α discussions become functions Q(v) here. The function ‘exp(v v1 )’ thus represents a particular value of what is denoted by ‘¯q’in§§3and5. The analysis has up to now involved both the exchangeability and the Laplace- Jaynes approaches, and at this point the exchangeability approach basically termi- nates. But the interesting part of the Laplace-Jaynes approach, instead, begins. In fact, from eq. (42) we are led to assign a vanishing plausibility to the circumstance CMB and a small one to CNE (which we already knew to be false, however) and a non-negligible plausibility to the circumstance CP. This prompts other studies and experiments in order to update that plausibility so as to possibly decrease our uncer- tainty. Rouyer and Menon [112] actually do this, and eventually come to a vanishing plausibility for the circumstance CP. Other circumstances are still to be formulated and introduced. In the above typically physical example the ‘circumstances’ are indeed intended as ‘causes’ or ‘mechanisms’. But as we remarked in § 5.1, this needs not always be the case. A very interesting and curious example, for which we believe the cir- cumstances cannot be interpreted as ‘causes’ or ‘mechanisms’, is provided by the study of the Newcomb-Benford law [115, 116, 117, 118], here tersely described by Raimi [117, p. 521]:

It has been known for a long time that if an extensive collection of numerical data expressed in decimal form is classified according to first significant digit, without regard to position of decimal point, the nine resulting classes are not usually of equal size. Indeed, [...] for

28 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

the occurrence of a given first digit i (i = 1, 2,...,9),21 many observed + / tables give a frequency approximately equal to log10[(i 1) i]. Thus the initial digit 1 appears about .301 of the time, 2 somewhat less and so on, with 9 occurring as a first digit less than 5 percent of the time. (We do not admit 0 as a possible first digit.) This particular logarithmic distribution of first digits, while not uni- versal, is so common and yet so surprising at first glance that it has given rise to a varied literature, among the authors of which are mathe- maticians, statisticians, economists, engineers, physicists and amateurs.

This example can be analysed along the lines of the previous one. The ‘measurement’ M is the observation of the first digits in a given collection of numerical data, the pos- sible ‘outcomes’ being R1,...,R9. Having made a judgement of exchangeability as regards the outcomes of any number of observations,22 after many observations our plausibility assignment will be very near to the observed frequencies, in this case + / approximately log10[(i 1) i] . Here the approach through exchangeability stops. But the Laplace-Jaynes approach does not. According to eq. (31) the circumstances that have acquired highest plausibility are those which lead us to assign the ‘sur- + / prising’ distribution log10[(i 1) i] . The ‘varied literature’ which Raimi mentions consists almost exclusively in searches and studies of such circumstances. The sig- nificant point is that many proposed ones concern not ‘causes’ or ‘mechanisms’, but symmetries. A general and more thorough discussion of how the Laplace-Jaynes approach fits into the formalism of statistical physics will be given in our next study (in prepara- tion), together with an analysis of point (d) of § 4.

7 Generalisations

In the Laplace-Jaynes approach introduced in § 5.1 a set of possible circumstances is defined for each measurement instance, and there is a mutual correspondence of these sets, mathematically expressed in particular by eqs. (III) and (IV). These express the idea that there is a similarity amongst the measurement instances, for they can be analysed into the ‘same’ set of circumstances; and also the idea that the ‘same’, but unknown, circumstance holds in all instances. This approach can be generalised in different directions. A first generalisation is to introduce a set of circumstances {Cλ }, not for each measurement instance, but for all of them en bloc. I.e., each circumstance Cλ con- cerns all measurement instances. This idea can be easily illustrated if different mea- surement instances are characterised by different times: then each Cλ can represent a

21‘p’inRaimi’stext. 22Since the number of data is finite, infinite exchangeability can here be used only as an approxi- mation or an idealisation.

29 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction possible ‘history’ of the details of the instances.23 Modifying properties (I)–(III) and (·)  the subsequent sections by the formal substitution C j Cλ , and dropping prop- erty (IV), most part of the analysis and discussion presented in the previous sections holds unchanged, or with minor adaptations, for this generalisation. A second generalisation, with similarities with the preceding one, is made by dropping property (IV) only. This means that we do not assume the ‘same’ circum- σ C( ) stance to hold at every measurement instance. All conjunctions σ jσ , where the jσ can differ for different σ, can then have non-vanishing plausibilities (these con- junctions are obviously similar to the circumstances Cλ above; cf. footnote 23). With this generalisation formulae (20)–(22) do not hold; in their place we have

τ τ P(R( L) ∧···∧R( 1)| M ∧ I) = iL i1 τ τ τ τ τ τ σ ( 1)| ( 1) ∧ ( 1) ∧ ··· ( L)| ( L) ∧ ( L) ∧ ( )| ≡ P(R M C τ I) P(R M C τ I)P C σ I i1 j 1 iL j L σ j j1, j2,... ··· (σ)| ,  qi jτ qi jτ P C I (20 ) 1 1 L L σ jσ j1, j2,...

τ τ τ τ P(R( N+L) ∧···∧R( N+1)| M ∧ R( N ) ∧···∧R( 1) ∧ I) = iN+L iN+1 iN i1 L τ τ σ τ τ P(R( σ )| M(τσ ) ∧ C( σ ) ∧ I) P C( ) | R( N ) ∧···∧R( 1) ∧ I ≡ iσ jσ σ jτσ iN i1 , ,... σ=1 j1 j2 σ τ τ ··· ( )| ( N ) ∧···∧ ( 1) ∧ ,  qi jτ qi jτ P C R R I (21 ) 1 1 L L σ jσ iN i1 j1, j2,... and ··· (σ)| σ τ τ qi1 jτ qiN jτ P σ C jσ I P C( )| R( N ) ∧···∧R( 1) ∧ I =  1 N . (22) σ jσ iN i1 (σ) , ,... q τ ···q τ P σ C | I j1 j2 i1 j 1 iN j N jσ The plausibility distribution for the collection of outcomes is therefore generally not exchangeable (the frequentist and the propensitor would say that the events are not ‘identically distributed’, though independent). The plausibilistic framework of the last generalisation is used in non-equilibrium (τ) statistical mechanics [119, 120, 121, 122, 123]. A generic circumstance C j repre- (τ) sents a system’s being in a (‘microscopic’) state j at the time tτ ; Ri represents the obtainment of the ith outcome of a (‘macroscopic’) measurement M(τ) performed at (τ)| (τ) ∧ (τ) ∧ time tτ ; the plausibilities P(Ri M C j I) are given by the relevant physical

23Note, however, that different measurement instances do not need to be associated to different times in general, so the ‘history interpretation’ is just an example. A more general way to think of the { } τ { (τ)} circumstances Cλ is to introduce for each measurement instance a set of circumstances C j ,as in § 5.1, but without assuming any correspondence amongst the sets, not even the same cardinality. (τ) We then consider all possible conjunctions τ C jτ , and each such conjunction can be taken to be by definition one of the Cλ .

30 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction theory for all i, τ, j. We initially assign, usually by means of maximum-entropy or σ C( )| I maximum-calibre principles, an initial plausibility distribution P( σ jσ )forall σ C( ) possible state dynamics σ jσ that the system may follow. Then we update, from the history of the observed measurement outcomes, the plausibilities of the different dynamics and thence those of unobserved outcomes. Finally, a third generalisation, for which see [106, 107], is the introduction of multiple kinds of measurements instead of a single one M. The resulting representa- tion, which should be easy for the reader to derive, would correspond to de Finetti’s theorem for partial exchangeability [11, 124].

8 Conclusions

Beside exchangeability, there is another point of view from which the de Finettians and the plausibilists can approach the question of induction and thus also interpret and give meaning to the formulae of the propensitors and some of their locutions like ‘unknown probability’ or ‘i.i.d.’. This point of view, which we have named the ‘Laplace-Jaynes approach’, is based on an analysis of the particular situation of interest into possible ‘circumstances’. These ‘circumstances’ are propositions that concern details of an empirical nature; in particular, they are not and cannot be state- ments about probabilities. They do not necessarily involve notions of ‘cause’ or ‘mechanism’, but can concern symmetries of the situation under study, or conse- quences of the observed events. More generally, their choice is fully ‘subjective’, as are the plausibilities assigned to them. Their main required property is that when they are known they render irrelevant any observational data for the purpose of assigning a plausibility to unobserved events. The Laplace-Jaynes approach can coexist with that based on exchangeability, and can be a complement or an alternative to the latter. It is particularly suited to problems in natural philosophy, whose heart is the analysis of natural phenomena into relevant circumstances (concerning especially, but not exclusively, the notion of ‘cause’). It also allows — without resorting to alternative probability theories — an interpretation, formalisation, and quantification of feelings of ‘uncertainty’ or ‘instability’ about some plausibility assignments. Finally, this point of view is applicable in those situations in which the maximum number of possible observations is bounded (and, especially, small) and infinite exchangeability cannot therefore be applied. The Laplace-Jaynes approach is also straightforwardly generalised to the case of more generic sets of circumstances, the case of non-exchangeable plausibility as- signments (with applications in non-equilibrium statistical mechanics), and the case in which different kinds of measurements are present, usually approached by partial exchangeability.

31 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

Acknowledgements

PM non può ringraziare mai abbastanza Louise, Marianna, e Miriam per il loro con- tinuo sostegno e amore. Affectionate thanks also to the staff of the KTH Biblio- teket, especially the staff of the Forum Library — Elisabeth Hammam, Ingrid Tal- man, Tommy Westergren, Elin Ekstedt, Thomas Hedbjörn, Daniel Larsson, Allan Lindqvist, Lina Lindstein, Yvonne Molin, Min Purroy Pei, Anders Robertsson — for their patient and indefatigable work. Without them, research would hardly be possible. AM thanks Anders Karlsson for encouragement and advice.

Bibliography

Note: arxiv eprints are located at http://arxiv.org/. [1] E.T.Jaynes,Monkeys, kangaroos and N (1996), http://bayes.wustl.edu/etj/ node1.html; revised and corrected version of Jaynes [125]. (Errata: in equations (29)–(31), (33), (40), (44), (49) the commas should be replaced by gamma functions, andonp.19thevalue0.915 should be replaced by 0.0915). [2] P.G.L.PortaMana,A.Månsson,andG.Björk,‘Plausibilities of plausibilities’: an approach through circumstances. Being part I of “From ‘plausibilities of plausibili- ties’ to state-assignment methods” (2006), eprint arXiv:quant-ph/0607111. [3] B. de Finetti, Probabilismo, Logos 14, 163–219 (1931), transl. as [93]; see also [126]. [4] B. de Finetti, Foresight: Its logical laws, its subjective sources, in Kyburg and Smokler [127] (1964), pp. 93–158, transl. by Henry E. Kyburg, Jr. of [128]. [5] E. Hewitt and L. J. Savage, Symmetric measures on Cartesian products,Trans.Am. Math. Soc. 80(2), 470–501 (1955). [6] B. de Finetti, Theory of Probability: A critical introductory treatment. Vol. 1 (John Wiley & Sons, New York, 1970/1990), transl. by Antonio Machi and Adrian Smith; first publ. in Italian 1970. [7] B. de Finetti, Theory of Probability: A critical introductory treatment. Vol. 2 (John Wiley & Sons, New York, 1970/1990), transl. by Antonio Machi and Adrian Smith; first publ. in Italian 1970. [8] D. Heath and W. Sudderth, De Finetti’s theorem on exchangeable variables, American Statistician 30(4), 188–189 (1976). [9] P. Diaconis, Finite forms of de Finetti’s theorem on exchangeability, Synthese 36(2), 271–281 (1977). [10] P. Diaconis and D. Freedman, Finite exchangeable sequences, Ann. Prob. 8(4), 745– 764 (1980). [11] J.-M. Bernardo and A. F. Smith, Bayesian Theory (John Wiley & Sons, Chichester, 1994). [12] P. S. Laplace, (Marquis de), Théorie analytique des probabilités (Mme Ve Courcier, Paris, 1812/1820), 3rd ed., first publ. 1812; repr. in [129]; http://gallica.bnf. fr/document?O=N077595. [13] W. E. Johnson, Logic. Part III. The Logical Foundations of Science (Cambridge Uni- versity Press, Cambridge, 1924). [14] H. Jeffreys, Theory of Probability (Oxford University Press, London, 1939/1998), 3rd ed., first publ. 1939.

32 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

[15] H. Jeffreys, Scientific Inference (Cambridge University Press, Cambridge, 1931/1957), 2nd ed., first publ. 1931. [16] H. Jeffreys, The present position in probability theory, Brit. J. Phil. Sci. 5(20), 275– 289 (1955). [17] R. T. Cox, Probability, frequency, and reasonable expectation, Am. J. Phys. 14(1), 1–13 (1946). [18] R. T. Cox, The Algebra of Probable Inference (The Johns Hopkins Press, Baltimore, 1961). [19] E. T. Jaynes, Probability Theory: With Applications in Science and Engineering: A Se- ries of Informal Lectures (1954–1974), http://bayes.wustl.edu/etj/science. pdf.html; lecture notes written 1954–1974; earlier version of [21]. [20] E. T. Jaynes, Probability Theory in Science and Engineering (Socony-Mobil Oil Com- pany, Dallas, 1959), http://bayes.wustl.edu/etj/node1.html; see also [19]. [21] E. T. Jaynes, Probability Theory: The Logic of Science (Cambridge University Press, Cambridge, 1994/2003), ed. by G. Larry Bretthorst; http://omega.albany.edu: 8008/JaynesBook.html, http://omega.albany.edu:8008/JaynesBookPdf. html. First publ. 1994; earlier versions in [19, 20]. [22] M. Tribus, Rational Descriptions, Decisions and Designs (Pergamon Press, New York, 1969). [23] E. W. Adams, The Logic of Conditionals: An Application of Probability to Deductive Logic (D. Reidel Publishing Company, Dordrecht, 1975). [24] E. W. Adams, A Primer of Probability Logic (CSLI Publications, Stanford, 1998). [25] T. Hailperin, Sentential Probability Logic: Origins, Development, Current Status, and Technical Applications (Associated University Presses, London, 1996). [26] J. M. Keynes, A Treatise on Probability (MacMillan, London, 1921), 5th ed. [27] F. P. Ramsey, Truth and probability, in Ramsey [28] (London, 1926/1950), pp. 156– 198, repr. in [127, pp. 61–92], written 1926. [28] F. P. Ramsey, The Foundations of Mathematics and other Logical Essays (Routledge & Kegan Paul, London, 1931/1950), ed. by R. B. Braithwaite, first publ. 1931. [29] A. N. Kolmogorov, Foundations of the Theory of Probability (Chelsea Publishing Company, New York, 1933/1956), second English ed., transl. by Nathan Morrison, with an added bibliography by A. T. Bharucha-Reid first publ. in Russian 1933. [30] B. O. Koopman, The bases of probability,Bull.Am.Math.Soc.46, 763–774 (1940), repr. in [127, pp. 159–172]. [31] B. O. Koopman, The axioms and algebra of intuitive probability, Ann. Math. 41(2), 269–292 (1940). [32] B. O. Koopman, Intuitive probabilities and sequences, Ann. Math. 42(1), 169–187 (1941). [33] G. Pólya, Mathematics and Plausible Reasoning: Vol. I: Induction and Analogy in Mathematics (Princeton University Press, Princeton, 1954). [34] G. Pólya, Mathematics and Plausible Reasoning: Vol. II: Patterns of Plausible Infer- ence (Princeton University Press, Princeton, 1954/1968), 2nd ed., first publ. 1954. [35] J. Łos,´ On the axiomatic treatment of probability, Colloq. Math. 3, 125–137 (1955). [36] H. Gaifman, Concerning measures in first order calculi, Israel J. Math. 2, 1–18 (1964). [37] D. Scott and P. Krauss, Assigning probabilities to logical formulas, in Hintikka and Suppes [130] (1966), pp. 219–264. [38] H. Gaifman and M. Snir, Probabilities over rich languages, testing and randomness, J. Symbolic Logic 47(3), 495–548 (1982). [39] H. Gaifman, Reasoning with bounded resources and assigning probabilities to

33 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

arithmetical statements, Synthese 140(1–2), 97–119 (2003/2004), http://www. columbia.edu/~hg17/; first publ. 2003. [40] P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica Support (Cambridge University Press, Cambridge, 2005). [41] E. W. Adams, modus tollens revisited,Analysis48(3), 122–128 (1988). [42] D. Lewis, Probabilities of conditionals and conditional probabilities, Phil. Rev. 85(3), 297–315 (1976), see also [43]. [43] D. Lewis, Probabilities of conditionals and conditional probabilities II, Phil. Rev. 95(4), 581–589 (1986), see also [42]. [44] T. Hailperin, Probability logic, Notre Dame J. Formal Logic 25(3), 198–212 (1984). [45] J. Barwise, The Situation in Logic (CSLI, Stanford, 1989). [46] J. Barwise, The situation in logic — II: Conditionals and conditional information, Tech. Rep. CSLI-85-21, Center for the Study of Language and Information, Stanford (1985). [47] H. Gaifman, Vagueness, tolerance and contextual logic (2002), http://www. columbia.edu/~hg17/. [48] B. de Finetti, Probability and exchangeability from a subjective point of view,Int.Stat. Rev. 47(2), 129–135 (1979). [49] A. Hájek, What conditional probability could not be, Synthese 137(3), 273–323 (2003). [50] Y. V. Egorov, A contribution to the theory of generalized functions, Russ. Math. Sur- veys (Uspekhi Mat. Nauk) 45(5), 1–49 (1990). [51] Y. V. Egorov, Generalized functions and their applications, in Exner and Neidhardt [131] (1990), pp. 347–354. [52] A. S. Demidov, Generalized Functions in Mathematical Physics: Main Ideas and Concepts (Nova Science Publishers, Huntington, USA, 2001), with an addition by Yu. V. Egorov. [53] M. J. Lighthill, Introduction to Fourier Analysis and Generalised Functions (Cam- bridge University Press, London, 1958/1964), first publ. 1958. [54] A. Delcroix, M. F. Hasler, S. Pilipovic,´ and V. Valmorin, Algebras of generalized functions through sequence spaces algebras. Functoriality and associations,Int.J. Math. Sci. 1, 13–31 (2002), eprint arXiv:math.FA/0210249. [55] A. Delcroix, M. F. Hasler, S. Pilipovic,´ and V. Valmorin, Generalized function alge- bras as sequence space algebras, Proc. Am. Math. Soc. 132(7), 2031–2038 (2004), eprint arXiv:math.FA/0206039. [56] M. Oberguggenberger, Generalized functions in nonlinear models — a survey, Nonlinear Analysis 47(8), 5029–5040 (2001), http://techmath.uibk.ac.at/ mathematik/publikationen/. [57] C. Swartz, Introduction to Gauge Integrals (World Scientific Publishing, Singapore, 2001). [58] R. G. Bartle, A Modern Theory of Integration (American Mathematical Society, Prov- idence, USA, 2001). [59] W. F. Pfeffer, The Riemann approach to integration: Local geometric theory (Cam- bridge University Press, Cambridge, 1993). [60] R. G. Bartle, Return to the Riemann integral, Am. Math. Monthly 103(8), 625–632 (1996). [61] A. M. Bruckner, Creating differentiability and destroying derivatives,Am.Math. Monthly 85(7), 554–562 (1978). [62] R. M. McLeod, The Generalized Riemann Integral (The Mathematical Association of

34 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

America, 1980). [63] J. Mawhin, Generalized Riemann integrals and the divergence theorem for differen- tiable vector fields, in Butzer and Fehér [132] (1981), pp. 704–714. [64] W. F. Pfeffer, The divergence theorem,Trans.Am.Math.Soc.295(2), 665–685 (1986). [65] W. F. Pfeffer, The multidimensional fundamental theorem of calculus,J.Austral.Math. Soc. A 43(2), 143–170 (1987). [66] W. F. Pfeffer, A note on the generalized Riemann integral, Proc. Am. Math. Soc. 103(4), 1161–1166 (1988). [67] J. Lamoreaux and G. Armstrong, The fundamental theorem of calculus for gauge in- tegrals, Math. Mag. 71(3), 208–212 (1998). [68] ISO, Quantities and units, International Organization for Standardization, Geneva, 3rd ed. (1993). [69] IEEE, ANSI/IEEE Std 260.3-1993: American National Standard: Mathematical signs and symbols for use in physical sciences and technology, Institute of Electrical and Electronics Engineers, New York (1993). [70] D. V. Lindley and L. D. Phillips, Inference for a Bernoulli process (a Bayesian view), American Statistician 30(3), 112–119 (1976). [71] E. T. Jaynes, Some applications and extensions of the de Finetti representation the- orem,inBayesian Inference and Decision Techniques with Applications: Essays in Honor of Bruno de Finetti, edited by P. K. Goel and A. Zellner (North-Holland, Ams- terdam, 1986), p. 31, http://bayes.wustl.edu/etj/node1.html. [72] W. E. Johnson, Probability: The deductive and inductive problems,Mind41(164), 409–423 (1932), with some notes and an appendix by R. B. Braithwaite. [73] S. L. Zabell, W. E. Johnson’s “sufficientness” postulate, Ann. Stat. 10(4), 1090–1099 (1982). [74] B. de Finetti, On the condition of partial exchangeability,inJeffrey [133] (1980), pp. 193–205, transl. by P. Benacerraf and R. Jeffrey of [124]. [75] P. Diaconis and D. Freedman, A dozen de Finetti-style results in search of a theory, Ann. Inst. Henri Poincaré (B) 23(S2), 397–423 (1987). [76] H. O. Georgii, ed., Canonical Gibbs Measures: Some Extensions of de Finetti’s Rep- resentation Theorem for Interacting Particle Systems (Springer-Verlag, Berlin, 1979). [77] F. Lad, J. M. Dickey, and M. A. Rahman, The fundamental theorem of prevision, Statistica 50(1), 19–38 (1990). [78] F. V. Atkinson, J. D. Church, and B. Harris, Decision procedures for finite decision problems under complete ignorance, Ann. Math. Stat. 35(4), 1644–1655 (1964). [79] D. Jamison, Bayesian information usage, in Hintikka and Suppes [134] (1970), pp. 28–57. [80] I. Levi, On indeterminate probabilities, J. Phil. 71(13), 391–418 (1974). [81] I. Levi, Information and ignorance, Inform. Process. Manag. 20(3), 355–362 (1984). [82] E. H. Kyburg, Jr., Bayesian and non-Bayesian evidential updating, Artif. Intell. 31(3), 271–293 (1987). [83] P. C. Fishburn, Ellsberg revisited: A new look at comparative probability, Ann. Stat. 11(4), 1047–1059 (1983). [84] R. F. Nau, Indeterminate probabilities on finite sets, Ann. Stat. 20(4), 1737–1767 (1992), http://faculty.fuqua.duke.edu/~rnau/bio/. [85] I. J. Good, The Estimation of Probabilities: An Essay on Modern Bayesian Methods (The MIT Press, Cambridge, USA, 1965). [86] C. M. Caves, C. A. Fuchs, and R. Schack, Unknown quantum states: the quantum de Finetti representation, J. Math. Phys. 43(9), 4537–4559 (2002), eprint arXiv:

35 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

quant-ph/0104088. [87] C. M. Caves, C. A. Fuchs, and R. Schack, Quantum probabilities as Bayesian proba- bilities,Phys.Rev.A65, 022305 (2002), eprint arXiv:quant-ph/0106133. [88] C. A. Fuchs and R. Schack, Unknown quantum states and operations, a Bayesian view (2004), eprint arXiv:quant-ph/0404156. [89] C. A. Fuchs, R. Schack, and P. F. Scudo, De Finetti representation theorem for quantum-process tomography, Phys. Rev. A 69, 062305 (2004), eprint arXiv: quant-ph/0307198. [90] P. S. Laplace, (Marquis de), Mémoire sur la probabilité des causes par les évène- mens, Mémoires de mathématique et de physique, presentés à l’Académie Royale des Sciences, par divers savans et lûs dans ses assemblées 6, 621–656 (1774), repr. in [135], pp. 25–65; http://gallica.bnf.fr/document?O=N077596;transl. as [136]. [91] C. M. Caves, Learning and the de Finetti representation (2000), http://info. phys.unm.edu/~caves/reports/reports.html. [92] A. Mosleh and V. M. Bier, Uncertainty about probability: a reconciliation with the subjectivist viewpoint, IEEE Trans. Syst. Man Cybern. A 26(3), 303–310 (1996). [93] B. de Finetti, Probabilism: A critical essay on the theory of probability and on the value of science, Erkenntnis 31(2–3), 169–223 (1931/1989), transl. of [3] by Maria Concetta Di Maio, Maria Carla Galavotti, and Richard C. Jeffrey. [94] J. L. Borges and M. Guerrero, The Book of Imaginary Beings (Vintage, London, 1957/2002), revised and enlarged ed., transl. by Norman Thomas di Giovanni in col- laboration with Jorge Luis Borges; first publ. in Spanish 1957, second Spanish ed. 1967; http://www.uiowa.edu/borges/vakalo/zf/home.html. [95] J. R. R. Tolkien, The Silmarillion (Ballantine Books, New York, 1977/1979), first publ. 1977. [96] A. N. Kolmogorov and S. V. Fomin, Measure, Lebesgue Integrals, and Hilbert Space (Academic Press, New York, 1960/1962), transl. by Natascha Artin Brunswick and Alan Jeffrey; first publ. in Russian 1960. [97] W. Rudin, Real and Complex Analysis (McGraw-Hill, London, 1970). [98] C. Swartz, Measure, Integration, and Function Spaces (World Scientific Publishing, Singapore, 1994). [99] D. H. Fremlin, Measure Theory. Vol. 1: The Irreducible Minimum (Torres Fremlin, Colchester, England, 2000/2004), http://www.sx.ac.uk/maths/staff/ fremlin/mt.htm; first publ. 2000. [100] D. H. Fremlin, Measure Theory. Vol. 2: Broad Foundations (Torres Fremlin, Colch- ester, England, 2001/2003), http://www.sx.ac.uk/maths/staff/fremlin/mt. htm; first publ. 2001. [101] D. H. Fremlin, Measure Theory. Vol. 3: Measure Algebras (Torres Fremlin, Colch- ester, England, 2002/2004), http://www.sx.ac.uk/maths/staff/fremlin/mt. htm; first publ. 2002. [102] D. H. Fremlin, Measure Theory. Vol. 4: Topological Measure Spaces. Part I (Torres Fremlin, Colchester, England, 2003/2006), http://www.sx.ac.uk/maths/staff/ fremlin/mt.htm; first publ. 2003. [103] D. H. Fremlin, Measure Theory. Vol. 4: Topological Measure Spaces. Part II (Torres Fremlin, Colchester, England, 2003/2006), http://www.sx.ac.uk/maths/staff/ fremlin/mt.htm; first publ. 2003. [104] E. J. McShane, A unified theory of integration, Am. Math. Monthly 80(4), 349–359 (1973).

36 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

[105] P. Zakrzewski, Some set-theoretic aspects of measure theory, Cubo Matemática Educacional 3(2), 75–88 (2001), http://www.mimuw.edu.pl/~piotrzak/ publications.html. [106] P. G. L. Porta Mana, Ph.D. thesis, Kungliga Tekniska Högskolan, Stockholm (2007), http://web.it.kth.se/~mana/. [107] P. G. L. Porta Mana (2007), in preparation. [108] M. R. Novick, Multiparameter Bayesian indifference procedures,J.Roy.Stat.Soc.B 31(1), 29–64 (1969). [109] S. L. Zabell, The rule of succession, Erkenntnis 31(2–3), 283–321 (1989). [110] H. M. Jaeger, S. R. Nagel, and R. P. Behringer, Granular solids, liquids, and gases, Rev. Mod. Phys. 68(4), 1259–1273 (1996). [111] P. G. de Gennes, Granular matter: a tentative view, Rev. Mod. Phys. 71(2), S374– S382 (1999). [112] F. Rouyer and N. Menon, Velocity fluctuations in a homogeneous 2D granular gas in steady state,Phys.Rev.Lett.85(17), 3676–3679 (2000). [113] A. Puglisi, V. Loreto, U. Marini Bettolo Marconi, and A. Vulpiani, Kinetic ap- proach to granular gases,Phys.Rev.E59(5), 5582–5595 (1999), eprint arXiv: cond-mat/9810059. [114] T. P. C. van Noije and M. H. Ernst, Velocity distributions in homogeneous granular flu- ids: the free and the heated case, Granular Matter 1(2), 57–64 (1998), eprint arXiv: cond-mat/9803042. [115] S. Newcomb, Note on the frequency of use of the different digits in natural numbers, Am. J. Math. 4(1/4), 39–40 (1881), benford. [116] F. Benford, The law of anomalous numbers, Proc. Am. Philos. Soc. 78(4), 551–572 (1938). [117] R. A. Raimi, The first digit problem, Am. Math. Monthly 83(7), 521–538 (1976). [118] R. A. Raimi, The first digit phenomenon again,Proc.Am.Philos.Soc.129(2), 211– 219 (1985). [119] E. T. Jaynes, The minimum entropy production principle, Annu. Rev. Phys. Chem. 31, 579 (1980), http://bayes.wustl.edu/etj/node1.html. [120] E. T. Jaynes, Macroscopic prediction,inComplex Systems — Operational Ap- proaches, edited by H. Haken (Springer-Verlag, Berlin, 1985), p. 254, http:// bayes.wustl.edu/etj/node1.html. [121] R. C. Dewar, Information theory explanation of the fluctuation theorem, maximum entropy production and self-organized criticality in non-equilibrium stationary states, J. Phys. A 36(3), 631–641 (2003), eprint arXiv:cond-mat/0005382. [122] R. C. Dewar, Maximum entropy production and the fluctuation theorem, J. Phys. A 38(21), L371–L381 (2005). [123] R. C. Dewar, Maximum entropy production and non-equilibrium statistical mechanics, in Kleidon and Lorenz [137] (2005), pp. 41–55. [124] B. de Finetti, Sur la condition d’équivalence partielle (1938), transl. in [74]. [125] E. T. Jaynes, Monkeys, kangaroos and N,inMaximum-Entropy and Bayesian Methods in Applied Statistics, edited by J. H. Justice (Cambridge University Press, Cambridge, 1986), p. 26, see also the revised and corrected version [1]. [126] R. Jeffrey, Reading Probabilismo, Erkenntnis 31(2–3), 225–237 (1989), see [3]. [127] H. E. Kyburg, Jr. and H. E. Smokler, eds., Studies in Subjective Probability (John Wiley & Sons, New York, 1964). [128] B. de Finetti, La prévision : ses lois logiques, ses sources subjectives,Ann.Inst.Henri Poincaré 7(1), 1–68 (1937), transl. as [4].

37 Porta Mana,Månsson,Bjork¨ The Laplace-Jaynes approach to induction

[129] P. S. Laplace, (Marquis de), Œuvres complètes de Laplace. Tome septième : Théorie analytique des probabilités (Gauthier-Villars, Paris, 1886), ‘Publiées sous les auspices de l’Académie des sciences, par MM. les secrétaires perpétuels’; http://gallica. bnf.fr/notice?N=FRBNF30739022. [130] J. Hintikka and P. Suppes, eds., Aspects of Inductive Logic (North-Holland, Amster- dam, 1966). [131] P.Exner and H. Neidhardt, eds., Order, Disorder and Chaos in Quantum Systems: Pro- ceedings of a conference held at Dubna, USSR on October 17–21, 1989 (Birkhäuser Verlag, Basel, 1990). [132] P. L. Butzer and F. Fehér, eds., E. B. Christoffel: The Influence of His Work on Math- ematics and the Physical Sciences (Birkhäuser Verlag, Basel, 1981). [133] R. C. Jeffrey, ed., Studies in inductive logic and probability (University of California Press, Berkeley, 1980). [134] J. Hintikka and P. Suppes, eds., Information and Inference (D. Reidel Publishing Com- pany, Dordrecht, 1970). [135] P. S. Laplace, (Marquis de), Œuvres complètes de Laplace. Tome huitième : Mémoires extraits des recueils de l’Académie des sciences de Paris et de la classe des sciences mathématiques et physiques de l’Institut de France (Gauthier-Villars, Paris, 1891), ‘Publiées sous les auspices de l’Académie des sciences, par MM. les secrétaires per- pétuels’; http://gallica.bnf.fr/notice?N=FRBNF30739022. [136] P. S. Laplace, (Marquis de), Memoir on the probability of the causes of events, Stat. Sci. 1(3), 364–378 (1986), transl. by S. M. Stigler of [90]; see also the translator’s introduction [138]. [137] A. Kleidon and R. D. Lorenz, eds., Non-equilibrium Thermodynamics and the Produc- tion of Entropy: Life, Earth, and Beyond (Springer, Berlin, 2005), with a Foreword by Hartmut Grassl. [138] S. M. Stigler, Laplace’s 1774 memoir on inverse probability, Stat. Sci. 1(3), 359–363 (1986), introduction to the transl. [136].

38 Paper H

arXiv:quant-ph/0612105v2 29 Apr 2007 ensuidi h ieaue hi ups sto is purpose Their literature. the in studied been “ “estimation”, construction”, † ∗ fpoaiitcmtosi lal seta nti task, use this The in essential clearly outcomes. is measurement methods past probabilistic of or plau- the future deriving of statistical for a sibilities suitable into matrix”) meager, “density is former (or espe- the operator knowledge, which prior in cases and in data cially measurement of kinds ious a paetyb sdwt agrvreyo ro knowl- prior of former. variety the larger which latter, a than the with edge with used concerned 11, be are apparently 10, we can Here [9, 15]. methods 14, Bayesian 13, general 12, more imple- on are based There others ways. of maximum-relative-entropymethods on variety based mentations a in implemented are they 3 2 1 lcrncades [email protected] address: Electronic lcrncades [email protected] address: Electronic od sete h n ersne by represented one the either is holds hrteoerpeetdb tesaitcloperator) statistical (the by represented one the 8]. spin-1 unfair. ther 7, very a 6, sample 5, small for 4, any [3, E.g., render are to contributions as latest vast of and so number Early is the these as on literature achieved The only not larger. and is does larger case that gets This problem outcomes inverse reasoning. and well-posed plausible a measurements require have of we number Mathematically, the su tor. speaking, are roughly outcomes which, measurement in case e. cial (cf. tomography” “Quantum-state a h n ersne by represented you one knowledge the prior was of kind first the on | aeamaueetcrepnigt h oiieoeao-audmea- positive-operator-valued the to sure corresponding measurement a Make ersne by represented x ubro di of number A −

{| x z − + | ,adti di this and ”,

..Qatmsaeasgmn:ter... . theory. assignment: Quantum-state 1.1. z + | , | z ASnmes 03.67.-a,02.50.Cw,02.50.Tt,05.30.-d,02.60.-x numbers: PACS ml ausof values small aaadknso ro nweg scmue atyaayial,prl hog ueia nerto i eight (in c measurement integration data various numerical measurement through of The partly evidence analytically, computer. the partly on a computed on assigned is knowledge operator dimensions) prior statistical of The kinds and system. data quantum three-level a to ods iuto nwihoehsueu ro nweg bu h ieypeaaino h system. the considered. a of are reflecting preparation Slater, likely thus by the and studied operators; about operator, knowledge statistical statistical prior of pure useful set a the has on of one centred structure which distribution convex in Gaussian-like the situation a of by respect represented in other constant the distribution plausibility a by sented ia o emn rjciemaueet efre on performed measurements projective Neumann von tical − | z

− hsppro paper This nacmainpprtecs fmaueetdt ossigi vrg aus n nadtoa prior additional an and values, average in consisting data measurement of case the paper companion a In /

z ytm nweg ht“h tt hthlsi ei- is holds that state “the that knowledge system, 2 − ff z ueia aeinsaeasgmn o he-ee unu system quantum three-level a for assignment state Bayesian Numerical ff |} − rn qatmsaeasgmn”(r“re- (or assignment” “quantum-state erent .INTRODUCTION 1. n ups o bante‘ the obtain you suppose and , rnecnb sflyepotdi oesituations: some in exploited usefully be can erence | ,i di is ”, 3 .Aslt-rqec aa osatadGusa-iepriors Gaussian-like and constant data; Absolute-frequency I. Odsaitclmtos iemaxi- like methods, statistical (Old | z + ff ugiaTkik ösoa,Iajrsaa 2 E144 tchl,Sweden Stockholm, 40 SE-164 22, Isafjordsgatan Högskolan, Tekniska Kungliga ffi

ff rn rmkoldeta tesaethat state “the that knowledge from erent N in oyedauiu ttsia opera- statistical unique a yield to cient r xmlso oceenmrclapiain fBysa unu-tt-sinetmeth- quantum-state-assignment Bayesian of applications numerical concrete of examples ers z + swl stelarge- the as well as erdcin)tcnqe have techniques retrodiction”) | ,weescniinlo h second the on conditional whereas ”, .[,2)uulyrfr otespe- the to refers usually 2]) [1, g. | x hnko ht“h rgnlstate original “the that know then +

.Månsson, A. x + | rteoerpeetdby represented one the or z + eut Conditional result. ’ | z + N

ii r osdrd w id fpirkoldeaeue:oerepre- one used: are knowledge prior of kinds Two considered. are limit z encode ∗ + | .G .PraMana, Porta L. G. P. rteone the or Dtd 9Arl2007) April 29 (Dated: 1 2 var- and and niti bouefeuniso h ucmsof outcomes the of frequencies absolute in onsist bu h osbesaei hc h ytmi rprdbe prepared distribution is plausibility system ‘prior’ the a which by in expressed state possible the about e of set ρ statistical the calculate to used expression operator the in agree less however, mention. those Jones, special ones); and deserve Dukes, earlier and of Larson Holevo, rediscoveries Helstrom, just by are contributions jus- individual which do the of not of (some importance does relative unfortunately the list found dull to be tice a already Such can points 60, [66]. 59, central 58, Bloch some 57, in 56, 65]; 55, 64, 43, 54, 63, 53, 42, 52, 62, 8, 51, Horn- 61, 7, 50, 49, 6, and 48, [5, 27], 47, Malley others 46, many 26, 45, 38], 44, and 41], 25, [37, [40, Slater 24, Jones [39], 23, stein [36], Dukes 22, Ivanovi and [31], 21, Bloore Larson [20, 30], 29, Park [28, and Holevo Band [17, 19], related Helstrom [16], less Segal 18, by or works more the in of consist could sample studies A gradually. developed were ones.) Bayesian are the they of since cases either special here only considered not are likelihood, mum e a epeae r ersne ysaitcloperators statistical by represented are prepared be can tem knowledge r ie n§4.Lttemaueetdata measurement the Let 4). § in given are hro,and thereof, etdb the by sented g ( N hs e ednt by denote we set whose , u eaieetoy hs ehd hspoiels rdciepwrin power predictive less provide quan- thus the example. with methods this used these be entropy; to relative as operator viz. tum way, statistical same mixed” maximum-entropy the “completely quantum in encoded same in are the But knowledge prior before. of as kinds both much methods as just now know you ρ l aeinqatmsaeasgmn ehiusmr or more techniques assignment quantum-state Bayesian All techniques Bayesian the behind ideas fundamental The dnial rprdthr prepared identically )d N ρ † outcomes ρ weed (where n .Björk G. and D I ∧ h cniin’ r‘tts,i hc h sys- the which in ‘states’, or ‘conditions’, The . I g N noigtemaueetdata measurement the encoding luiiiydniy oetcncldetails technical more density; plausibility a oiieoeao-audmeasures positive-operator-valued ρ i 1 ,..., savlm lmn on element volume a is elvlsses Various systems. ee-level i k ,..., S e h ro knowledge prior the Let . i N of N esrmns repre- measurements, N 3,3,3,35], 34, 33, [32, c ´ iden- D D S p n h prior the and oss na in consist ( rasubset a or ρ { E | I μ ( k )d ) : ρ μ = = I 2

2 1,...,rk}, k = 1,...,N. Bayesian quantum-state assignment those in [73, 74, 75, 76]) convex region of d − 1 dimensions techniques yield a ‘posterior’ plausibility distribution of the (2d − 2 dimensions if only pure statistical operators are con- form sidered), and one has to choose between explicit integration limits but very complex integrands, or, vice versa, simpler in- p(D| ρ) p(ρ| I)dρ p(ρ| D ∧ I)dρ = , tegrands but implicitly defined integration regions. | ρ ρ| ρ p(D ) p( I)d Therefore explicit calculations have hitherto been confined E (k)ρ ρ ρ (1a) almost exclusively to two-level systems, which have the ob- k tr i g( )d = k . vious advantages of low-dimensionality and symmetry (the tr E (k)ρ g(ρ)dρ k ik set of statistical operators is the three-dimensional Bloch ball [77, 78]); in some cases these allow the derivation of ρ 6 and a statistical operator D∧I given by a sort of weighted analytical results. In studies by Jones [37], Larson and average,4 Dukes [36], Slater [41, 79, 80] (these are very interesting stud- ies; cf. also [81, 82]), and Bužek, Derka, et al. [5, 7, 54], the  ρ E (k)ρ ρ ρ k tr g( )d posterior distributions p(ρ| D)dρ and the ensuing statistical ρ  ρ ρ| ∧ ρ = ik . D∧I p( D I)d ρ (k) operator D∧I are explicitly calculated for measurement data k tr E ρ g(ρ)dρ ik D and priors I of various kinds. In some of these studies the (1b) integrations range over the whole set of statistical operators, These formulae may present differences of detail from author in others over the pure ones only. As regards higher-level sys- to author, reflecting — quite excitingly! — different philo- tems, the only numerical study known to us is that by Bužek sophical stands. For instance, the prior distribution (and there- et al. [7, 54] for a spin-3/2 system; however, they assume from fore the integration) is in general defined over the whole set the start that the aprioripossible statistical operators are con- of statistical operators; but a person who conceives only pure fined to a three-dimensional subset of the pure ones; this as- statistical operators as representing sort of “real, internal (mi- sumption simplifies the integration problem from 15 to 3 di- croscopic) states of the system” may restrict it to those only. mensions. A person who, on the other hand, thinks of the statistical op- erators themselves as encoding “states of knowledge” au pair with plausibility distributions, might see the prior distribution as a “plausibility of a plausibility”, and thus prefer to derive 1.3. More practice: the place of the present study the formula above through a quantum analogue of de Finetti’s theorem; in this case the derivation will involve a tensor prod- In this paper and its companion [83] we provide numeri- uct ρ ⊗···⊗ρ of multiple copies of the same statistical oper- cal examples of quantum-state assignment, via eqs. (1), for a ator.5 three-level system. The set of statistical operators of such a The formulae (1) (or special cases thereof) are proposed system, S3, is eight-dimensional — a high but still computa- and used in Larson and Dukes [36], Jones [37, 38], and e.g. tionally tractable number of dimensions — and has less sym- Slater[41],Derka,Buzek,etal.[5,7,54],andMana[67]. metries, in respect of its dimensionality, than that of a two- We arrived at these same formulae (as special cases of formu- level one: a two-level system is a ball in R3, but a three-level lae applicable to generic, not necessarily quantum-theoretical one is definitely not a ball in R8.7 Some three-dimensional systems) in a series of papers [68, 69, 70, 71] (see also [67, sections of this eight-dimensional set are given in fig. 1 (two- 72]) in which we studied and tried to solve the various philo- dimensional sections can be found in [74]; four-dimensional sophical issues to our satisfaction. ones are also available [76]); see also Bloore’s very interesting study [31]. We study data D and prior knowledge I of the following 1.2. ...andpractice kind:

In regard to the numerical computation of formulae (1) in actual or fictive state-assignment problems, with explicitly 6 given prior distributions and measurement data, the number The high symmetry, however, renders the results independent of the par- ticular choice of prior knowledge in some cases, e.g. when the prior is of studies is much smaller. The main problem is that for- spherically symmetric and the data consist on averages. mula (1b), when applied to a d-level system, generally in- 7 In group-theoretical terms, the “quantum” symmetries of the set of statisti- volves an integration over a complicated (see e.g. figs. 1 and cal operators of a three-level system are fewer than those it could have had as a eight-dimensional compact convex set. The former symmetries are in fact equivalent to the group U(3)/U(1) [84, 85, 86, 87][88, § 4], of 8 di- mensions, whereas the latter could have been as large the group SO(8), of 28 dimensions [89, 90, 91, 92]. Compare with the case of a two-level sys- 4 ρ Note that, as shown in § 2, the derivation of the formula for D∧I does not tem, whose symmetry group U(2)/U(1), of 3 dimensions, is isomorphic to require decision-theoretical concepts. the largest symmetry group that a three-dimensional compact convex body 5 We leave to the reader the entertaining task of identifying these various can have, SO(3). (We have only considered the connected part of these philosophical stances in the references already provided. groups; one should also take the semidirect product with Z2.) 3

Figure 1: Some three-dimensional sections of the eight-dimensional set S3 of the statistical operators for a three-level quantum system. The adopted coordinate system is explained in § 3. 4

• The measurement data D consistinasetofN outcomes particularise the latter to our study. In § 3 Kimura’s parametri- of N instances of the same measurement performed sation and the Bloch-vector set are introduced. The two prior on N identically prepared systems. The measure- distributions adopted are discussed in § 4. The calculation, ment is represented by the extreme positive-operator- by symmetry arguments and by numerical integration, of the valued measure (i.e., non-degenerate ‘von Neumann Bloch vectors and of the corresponding statistical operators measurement’) having three possible distinct out- is presented in § 5, for all data and priors. In § 6 we offer comes {‘1’, ‘2’, ‘3’} represented by the eigenprojectors some remarks on the incorporation into the formalism of un- {|1 1|, |2 2|, |3 3|}. The data D thus correspond to a certainties in the detection of outcomes. In § 7 we discuss the , ,  ¯ triple of absolute frequencies (N1 N2 N3) N, with form the assigned statistical operator takes in the limit of a Ni 0and i Ni = N. We consider various such triples very large number of measurements. Finally, the last section for small values of N, as well as for the limiting case of summarises and discusses the main points and results. very large N.

• Two different kinds of prior knowledge I are used. The 2. STATISTICAL-OPERATOR ASSIGNMENT first, Ico, is represented by a prior plausibility distribu- tion 2.1. General case

p(ρ| Ico)dρ = gco(ρ)dρ ∝ dρ, (2) This section provides a summary derivation of the formulae which is constant in respect of the convex structure of for statistical-operator assignment. For a more general deriva- the set of statistical operators, in the sense explained in tion of analogous formulae valid for any kind of system (clas- §§ 3 and 4. The second, Iga, is represented by a spheri- sical, quantum, or exotic), and for a discussion of some philo- cally symmetric, Gaussian-like prior distribution sophical points involved, we refer the reader to [68, 69, 70, 71] tr[(ρ −|2 2|)2] and also [67, 72]. p(ρ| I )dρ = g (ρ)dρ ∝ exp − dρ, (3) ga ga s2 There is a preparation scheme that produces quantum sys- tems always in the same ‘condition’ — the same ‘state’. We | | centred on the statistical operator 2 2 , one of the pro- do not know which this condition is, amongst a set of possi- jectors of the von Neumann measurement. This prior ble ones;8 although there may be some conditions in that set expresses some kind of knowledge that leads us to as- that are more plausible than others. Our knowledge I,inother sign higher plausibility to regions in the vicinity of words, is expressed by a plausibility distribution over these | | 2 2 . conditions. To each condition is associated a statistical oper- ρ To assign a statistical operator D∧I from these data and ator; this encodes the plausibility distributions that we assign priors means to assign eight independent real coefficients of for all possible quantum measurements, given that that partic- its matrix elements, or equivalently a vector of eight real pa- ular condition hold. Therefore we can and shall more simply rameters bijectively associated with them. These parameters, speak in terms of statistical operators instead of the respective according to eq. (1b), must be computed by the integration conditions. Note that this is, however, a metonymy, i.e. we of a function (actually two, the other being a normalisation are speaking about something (‘statistical operator’) although factor) defined over the set of all statistical operators. Hence it is something else but related to it (‘condition’) that we really the function itself and the integration region can be expressed mean. in terms of eight coordinates, corresponding to the parame- We thus have a plausibility distribution over some statistical ters. The coordinate system should be chosen in such a way operators. It can in full generality be written as that both the function and the integration limits have a not too ρ| ρ = ρ ρ, complex form. For these reasons we choose the parametri- p( I)d g( )d (4) sation studied in particular by Kimura [74]. In this case the defined over the whole set of statistical operators, denoted by vectors of real parameters associated to a statistical operator S. The function g is a normalised positive generalised func- is called a ‘Bloch vector’. tion.9 In this way the more general case is also accounted for In such a coordinate system, six of the eight parameters can in which the whole set of statistical operators S is involved: be calculated analytically and quite straightforwardly by sym- the case with a finite number of aprioripossible statistical metry arguments, for all absolute-frequency triples N¯ .The remaining two parameters have been numerically calculated for some triples N¯ by a computer using quasi-Monte Carlo integration methods, suitable for high-dimensional problems. 8 We intentionally use the vague term ‘condition’, since each researcher can Further symmetry arguments yield the parameters for the re- understand it in terms of his or her favourite physical picture (internal mi- maining triples. croscopic configurations, macroscopic procedures, pilot waves, propensi- All these points as well as the results are discussed in the ties, grounds for judgements of exchangeability, or whatnot). Quantum theory offers no concrete physical picture, only some constraints on how paperasfollows:In§2wequicklypresentthereasoninglead- such a picture should work; so each one can provide one’s favourite. ing to the statistical-operator-assignment formulae (1), and 9 See footnotes 15 and 16. 5 operators corresponds to a g equal to a sum of appropriately even all, of the measurements (and therefore their positive- weighted Dirac deltas.10 operator-valued measures) can be identical. The outcomes Our ‘prior’ knowledge I about the preparation can be rep- i1,...,ik,...,iN are or were obtained; this is our new data resented by a unique statistical operator: Suppose we are to D. The plausibility for this to occur, according to the prior give the plausibility of the μth outcome of an arbitrary mea- knowledge I, is given by a generalisation of expression (5): surement, represented by the positive-operator-valued mea-  N sure {E μ }, performed on a system produced according to the | ≡ E (1),...,E (N)| = E (k)| ρ ρ| ρ. p(D I) p i i I p i p( I)d preparation. Quantum mechanics dictates the plausibilities 1 N S k=1 k p(E μ | ρ) = tr(E μ ρ), and by the rules of plausibility theory (8) we assign, conditional on I,11   On the evidence of D we can update the prior plausibility dis- tribution p(ρ| I)dρ. By the rules of plausibility theory p(E μ | I) = p(E μ | ρ) p(ρ| I)dρ = tr(E μ ρ) g(ρ)dρ, S S p(D| ρ) p(ρ| I)dρ (5) p(ρ| D ∧ I)dρ = , | ρ ρ| ρ or more compactly, by linearity of the trace, S p(D ) p( I)d E (k)ρ ρ ρ (9) p(E μ | I) = tr E μ ∫ ρ g(ρ)dρ , k tr i g( )d (6) = k . = E ρ , (k) tr( μ I ) tr E ρ g(ρ)dρ S k ik ρ with the statistical operator I defined as The statistical operator encoding the joint knowledge D ∧ I   is thus, according to eq. (7) and using eq. (9), ρ  ρ ρ| ρ = ρ ρ ρ. I p( I)d g( )d (7)  (k) S S ρ k tr E ρ g(ρ)dρ ρ  ρ ρ| ∧ ρ = S ik . D∧I p( D I)d S (k) The prior knowledge I can thus be compactly represented by, k tr E ρ g(ρ)dρ ρ ρ S ik or “encoded in”, the statistical operator I .Notehow I ap- (10) pears naturally, without the need to invoke decision-theoretics arguments and concepts, like cost functions etc. Note also that ρ the association between I and I is by construction valid for 2.2. Three-level case generic knowledge I, be it “prior” or not. ρ The statistical operator I is a “disposable” object. As soon So far everything has been quite general. Let us now con- as we know the outcome of a measurement on a system pro- sider the particular cases studied in this paper. duced according to our preparation, the plausibility distribu- The preparation scheme concerns three-level quantum sys- ρ| ρ tion p( I)d should be updated on the evidence of this new tems; the corresponding set of statistical operators will be de- piece of data D, and thus we get a new statistical operator noted by S .TheN measurements considered here are all ρ 3 I∧D. And so on. It is a fundamental characteristic of plau- instances of the same measurement, namely a non-degenerate ff sibility theory that this update can indi erently be performed projection-valued measurement (often called ‘von Neumann with a piece of data at a time or all at once. (k) measurement’). Thus, for all k = 1,...,N, {E μ } = {E μ }  So suppose we come to know that N measurements, repre- {|1 1|, |2 2|, |3 3|}. The projectors |1 1|, |2 2|, |3 3| define {E (k) μ = sented by the N positive-operator-valued measures μ : an orthonormal basis in Hilbert space. All relevant operators ,..., } = ,..., 1 rk , k 1 N, are or have been performed on N will, quite naturally and advantageously, be expressed in this I systems for which our knowledge holds. Note that some, basis. We have for example that tr(E μ ρ) ≡ ρμμ,theμth di- agonal element of ρ. The data D consist in the set of outcomes {i1,...,iN } of the N measurements, where each ik is one of the three possible 10 The knowledge I and all inferential steps to follow concern a preparation scheme in general and not specifically this or that system only; just like outcomes ‘1’, ‘2’, or ‘3’. The formula (10) for the statistical tastings of cakes made according to a given unknown recipe increase our operator thus takes the form knowledge of the recipe, not only of the cakes. If one insists in seeing the knowledge I and the various inferences as referring to a given set of, say, ρ N ρ ρ ρ S = ik ik g( )d M systems only, then that knowledge is represented by a plausibility distri- 3 k 1 ρ ∧ = , (11) bution over the statistical operators of these M systems, i.e. over the Carte- D I N ρ ρ ρ M (1) (M) (1) (M) = ik ik g( )d sian product S , and has the form p(ρ ,...,ρ | I)dρ ···dρ = S3 k 1 g(ρ(1)) δ(ρ(2) − ρ(1)) ···δ(ρ(M) − ρ(1))dρ(1) ···dρ(M). Integrations are then also to be understood accordingly. Note moreover that if we con- with ik ∈{1, 2, 3} for all k. sider joint quantum measurements on all the systems together, then we are However, it is clear from the expressions in the integrals really dealing with one quantum system, not M. 11 above that the exact order of the sequence of ‘1’s, ‘2’s, and ‘3’s We do not explicitly write the prior knowledge I whenever the statistical , , operator appears on the conditional side of the plausibility; i.e., p(·| ρ)  is unimportant; only the absolute frequencies (N1 N2 N3)of p(·| ρ, I). appearance of these three possible outcomes matter (naturally, 6  Ni 0and iNi = N). We can thus rewrite the last equation a basis. The vector x ≡ (x j)ofcoefficients in equation (13) is as uniquely determined by ρ: ρ 3 ρNi ρ ρ S i=1 g( )d x = x (ρ) = tr(λ ρ). (14) ρ = 3 ii , j j j D∧I (12) 3 ρNi ρ ρ S i=1 ii g( )d 3 The operators {λ j}, being Hermitean, can also be regarded as observables and then the equation above says that the (xi) with the convention, here and in the following, that ρNi  1 ii are the corresponding expectation values in the state ρ: x j = whenever Ni = ρii = 0 (the reason is that the product origi- λ j ρ [106]. nally is, to wit, restricted to the terms with N > 0). i A systematic construction of generators of SU(d)which The discussion of the explicit form of the prior g(ρ)dρ is generalises the Pauli spin operators is known (see e.g. [74, deferred to § 4. We shall first introduce on S a suitable coor- 3 107]). In particular, for d = 2 they are the usual Pauli spin dinate system (x ,...,x ) ≡ x ∈ R8 so as to explicitly calcu- 1 8 operators, and for d = 3 they are the Gell-Mann matrices late the integrals. This is done in the next section. (see e.g. [108]). In the eigenbasis {|1 1|, |2 2|, |3 3|} of the von Neumann measurement {E μ } introduced in the previous 3. BLOCH VECTORS section these matrices assume the particular form ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎜ ⎟ ⎜ − ⎟ ⎜ ⎟ ⎜010⎟ ⎜0 i0⎟ ⎜10 0⎟ In order to calculate the integrals required in the state- λ = ⎜ ⎟ , λ = ⎜ ⎟ , λ = ⎜ ⎟ , 1 ⎝⎜100⎠⎟ 2 ⎝⎜i00⎠⎟ 3 ⎝⎜00 0⎠⎟ assignment formula (12) we put a suitable coordinate sys- − 8 000 000 00 1 tem on S3, so that they “translate” as integrals in R .In ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ff ⎜001⎟ ⎜00−i⎟ ⎜000⎟ di erential-geometrical terms, we choose a particular chart on ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ λ = ⎜000⎟ , λ = ⎜00 0⎟ , λ = ⎜001⎟ , S3 considered as a differentiable manifold [93, 94, 95, 96, 97, 4 ⎝⎜ ⎠⎟ 5 ⎝⎜ ⎠⎟ 6 ⎝⎜ ⎠⎟ 100 i00 010 98, 99]. ⎛ ⎞ ⎛ ⎞ There exists an ‘Euler angle’ parametrisation [100, 101, ⎜00 0⎟ ⎜100⎟ ⎜ ⎟ 1 ⎜ ⎟ S R8 λ = ⎜ − ⎟ , λ = √ ⎜ − ⎟ . 102, 103] which maps 3 onto a rectangular region of 7 ⎝⎜00 i⎠⎟ 8 ⎝⎜0 20⎠⎟ (modulo identification of some points). With this parametri- 0i 0 3 001 sation the integration limits of our integrals become advanta- (15a) geously independent, but the integrands (p(D| ρ) in particular) We see that our von Neumann measurement corresponds to acquire too complex a form. the observable For the latter reason we choose, instead, the parametrisation studied by Byrd, Slater and Khaneja [101, 104], Kimura [74] λ 3 ≡|1 1| + 0|2 2|−|3 3|, (15b) (see also [75]), and Bölükba¸sı and Dereli [105], amongst oth- ers. The functions to be integrated take simple polynomials or the measurement outcomes being associated with the particu- exponentials forms. The integration limits are no longer inde- lar values 1, 0, and −1. These eigenvalues, however, are of no pendent, though — in fact, they are given in an implicit form importance to us (they will be more relevant in the companion and will be accounted for by multiplying the integrands by a paper [83]). characteristic function. For a three-level system, and in the eigenbasis We follow Kimura’s study [74] here, departing from it on {|1 1|, |2 2|, |3 3|}, the operator ρ in (13) can thus be some definitions. All statistical operators of a d-level quantum written in matrix form as: system can be written in the following form [74] (see also [75, 101, 104]): ρ = ρ(x) = ⎛ ⎞ 1 + 1 + √1 1 − 1 − n ⎜ (x3 x8) (x1 ix2) (x4 ix5) ⎟ 1 1 ⎜ 3 2 3 2 2 ⎟ n ⎜ 1 1 √1 1 ⎟ ρ = ρ(x) = Id + x jλ j, (x1,...,xn) ≡ x ∈ Bn ⊂ R . ⎜ (x1 + ix2) − x8 (x6 − ix7) ⎟ . (16) d 2 ⎝⎜ 2 3 3 2 ⎠⎟ j=1 1 1 1 1 √1 (x4 + ix5) (x6 + ix7) + (−x3 + x8) (13) 2 2 3 2 3 This matrix is Hermitean and has unit trace, so the remaining where n ≡ d2 − 1 is the dimension of S,andB is a com- n condition for it to be a statistical operator is that it be positive pact convex subset of Rn. The operators {λ } satisfy (1) j semi-definite (non-negative eigenvalues). This is equivalent to λ = λ † λ = λ λ = δ j j ,(2)tr j 0, (3) tr( i j) 2 ij. Together with two conditions [74] for the coefficients x: with our definitions the identity operator Id they are generators of SU(d), and of the Gell-Mann matrices, the first is in respect of the Frobenius (Hilbert-Schmidt) inner product λ · λ  tr(λ λ ) [92] they also constitute a complete or-  4 i j i j x2 ≡ 8 x2 , (17a) thogonal basis for the vector space of Hermitean operators on k=1 i 3 a d-dimensional Hilbert space. In fact, eq. (13) is simply the √ decomposition of the Hermitean operator ρ in terms of such which limits x to be inside or on a ball of radius 2/ 3; the 7 second is √ 2 2 2 2 2 3 8 − 18x + 27x3 x + x − x − x − 6 3x + √ 1 2 6 7 8 2 + 2 + 2 − 2 + 2 + 2 + 2 + 9 3x8 2 x3 x4 x5 x1 x2 x6 x7 54(x1x4 x6 + x2 x4 x7 + x2 x5 x6 − x1 x5 x7) 0. (17b)

The set of all real vectors x satisfying conditions (17) is called the ‘Bloch-vector set’ B8 of the three-level system:

8 B8  {x ∈ R | (17) hold}. (18)

Figure 2: Schematic illustration of the relations amongst S3, B8,and Since there is a bijective correspondence between B8 and S3, C8. we can parametrise the set of all statistical operators S3 by the set of all Bloch vectors.12 S B Both 3 and 8 are convex sets [16, 31, 42, 43, 44, 109, we shall do is in fact the opposite: we define dρ to be the 110, 111, 112, 113, 114], and the maps volume element on S3 which in the coordinates x is simply dx.Indifferential-geometrical terms, dρ is the pull-back [95, S → B ρ → ρ 3 8 by x( ) (19) 96, 97, 98, 99] of dx induced by the map ρ → x: given by (14), and its inverse dρ → dx. (25)

It is worth noting that this choice of volume element is not B8 → S3 by x → ρ(x) (20) arbitrary, but rather quite natural. On any n-dimensional con- given by (13) or (16) are convex isomorphisms, i.e. they pre- vex set S we can define a volume element which is canonical serve convex combinations: in respect of S ’s convex structure, as follows. Consider any convex isomorphism c: S → B between S and some subset         x(α ρ + α ρ ) = α x(ρ ) + α x(ρ ), (21) B ⊂ Rn. Consider the volume element on B defined by ρ(α x + α x) = αρ(x) + αρ(x), (22) ω  dy/ ∫ dy, (26) B with α, α 0, α + α = 1. This fact will be relevant for the discussion of the prior distributions. where dy is the canonical volume element on Rn. The pull- ∗ It is useful to introduce the characteristic function x → back c (ω)ofω onto S then yields a volume element on the χB(x)ofthesetB8: latter. It is easy to see that the volume element thus induced ⎧ (1) does not depend on the particular isomorphism c (and set ⎨⎪ ∈ B , ffi χ  1ifx 8, i.e. if (17) hold B) chosen, since all such isomorphisms are related by a ne B(x) ⎩⎪ (23) → +  ∈ Rn 0ifx  B8, i.e. if (17) do not hold, coordinate changes (y Ay b, with det A 0, b ); (2) is invariant in respect of convex automorphisms of S ;(3) and to consider the smallest eight-dimensional rectangular re- assigns unit volume to S , as clear from eq. (26). These prop- 14 gion (or ‘orthotope’ [110]) C8 containing B8. As shown in the erties make this volume element canonical. appendix, C8 is Since the parametrisation S3 → B8 is a convex isomor- phism, we see that dρ as defined in (25) is the canonical vol- C  [−1, 1]7 × − √2 , √1 ⊃ B . (24) S 8 3 3 8 ume element of 3 in respect of its convex structure. S The relations amongst S , B ,andC are schematically illus- We can finally write any integral over 3 in coordinate form. 3 8 8 ρ → ρ trated in fig. 2. In fig. 1 we can see some three-dimensional If f ( ) is an integrable (possibly vector-valued) function over S3, its integral becomes sections (through the origin) of B8 — and thus of S3 as well, in the sense of their isomorphism.   We are almost ready to write the integrals of formula (12) f (ρ)dρ ≡ f [ρ(x)] χB(x)dx ≡ 8 S C in coordinate form, i.e. as integrals over R . It only remains 3  8   13 1 1 √1 to specify the volume element dρ in coordinate form. What 3 dx1 ··· dx7 dx8 f [ρ(x)] χB(x). (27) −1 −1 − √2 3

12 On some later occasions the terms ‘statistical operators’ and ‘Bloch vec- tors’ might be used interchangeably; but it should be clear from the context which one is really meant. 14 In measure-theoretic terms, we have the canonical measure B → 13 An odd volume form [115][95, § IV.B.1] (see also [96, 99]). Recall that a m[c(B)]/m[c(S )], where B is a set of the appropriate σ-field of S and m metric structure is not required, only a differentiable one. is the Lebesgue measure on Rn. 8

This form is especially suited to numerical integration by computer and we shall use it hereafter. We can thus rewrite ρ the state-assignment formula (12) for D∧I as: ρ 3 ρ Ni χ C (x) i=1 ii(x) B(x) g(x)dx ρ = 8 . D∧I (28) 3 N = ρii(x) i χB(x) g(x)dx C8 i 1 Expanding the ρ(x) inside the integrals using eq. (13) (equiv- alent to (16)) we further obtain

8 1 1 L j(N¯ , I) ρ ∧ = I + λ j, (29) D I 3 3 2 Z N¯ , I j=1 ( ) where  3 Ni L j(N¯ , I)  x j ρii(x) g(x) χB(x)dx, (30a) = C8 i 1

Figure 3: Graph of the constant prior’s marginal density (x3, x8) → for j = 1,...,8, and ∫ gco dx1 dx2 dx4 dx5 dx6 dx7. The triangle represents the boundary of  B8 in the Ox3 x8 plane (see § 5.2, and cf. figs. 5–9). 3 Ni Z(N¯ , I)  ρii(x) g(x) χB(x)dx. (30b) C i=1 8 The first kind of prior knowledge considered in our study, We shall omit the argument ‘(N¯ , I)’ from both L j and Z when Ico, has a constant density: it should be clear from the context. | = ∝ , It is now time to discuss the prior plausibility distributions p(x Ico)dx gco(x)dx dx (32) adopted in our study. the proportionality constant being given by the inverse of the volume of B8. This distribution hence corresponds to the canonical volume element (or the canonical measure) dis- 4. PRIOR KNOWLEDGE cussed in the previous section. Thus Ico expresses some- how “vague” prior knowledge (although we do not necessar- The prior knowledge I about the preparation is expressed ily maintain that it be “uninformative”). Fig. 3 shows the as a prior plausibility distribution p(ρ| I)dρ = g(ρ)dρ. marginal density of the coordinates x3 and x8 for this prior. The last expression can be interpreted, in measure-theoretic The state-assignment formula which makes use of this prior terms [116, 117][95, § I.D] [118] (cf. also [119, 120]), as assumes the simplified form ‘μ(dρ)’, where μ is a normalised measure; or it can be sim- ply interpreted, as we do here, as the product of a generalised ρ 3 ρ Ni χ C (x) i=1 ii(x) B(x)dx 15 ρ 16 ρ = 8 . function g and the volume element d . The two points of D∧I (33) co 3 N = ρii(x) i χB(x)dx view are not mutually exclusive of course, and these technical C8 i 1 matters are only relatively important since S3 and the distri- butions we consider are quite well-behaved objects (and the The second prior to be considered expresses somehow bet- I simple Riemann integral suffices for our purposes). ter knowledge ga of the possible preparation. In coordinate We shall specify the plausibility distributions on S giving form it is represented by the spherically symmetric Gaussian- 3 like distribution them directly in coordinate form on B8 (with an abuse of no- tation for g): p(x| I )dx = g (x)dx ∝ ga ga p(x| I)dx = g(x)dx  g[ρ(x)] dx. (31) tr [ρ(x) − ρ(ˆx)]2 (x − xˆ)2 exp − dx ≡ exp dx, (34) s2 2s2 with 15 We always use the term ‘generalised function’ in the sense of Egorov [121], √ whose theory is most general and nearest to the physicists’ ideas and prac- xˆ  (0, 0, 0, 0, 0, 0, 0, −2/ 3), i.e., ρ(ˆx) ≡|2 2|, tice. Cf. also Lighthill [122], Colombeau [123, 124, 125], and Oberguggen- 1 (35) berger [126, 127]. s = √ . 16 It is always preferable to write not only the plausibility density, but the 2 2 volume element as well. The combined expression is thus invariant under | | parameter changes; this also helps not to fall into some pitfalls such as Regions in proximity of 2 2 have greater plausibility, and those discussed by Soffer and Lynch [128]. the plausibility of other regions decreases as their “distance” 9

with the prior distribution gco(x)dx; and the triples

N = 1: (1, 0, 0), (0, 1, 0), (0, 0, 1);

with the Gaussian-like prior distribution gga(x)dx. A combination of symmetries of B8 and numerical integra- tionisusedtocomputeL j and Z.

5.1. Deduction of some Bloch-vector parameters for some data via symmetry arguments

The coefficients L j for j = 1, 2, 4, 5, 6, 7 can be shown to vanish by symmetry arguments. Let us show that L1 = 0in particular. Consider  3 Ni L1 ≡ x1 ρii(x) g(x) χB(x)dx. (37) C i=1 Figure 4: Graph of the Gaussian-like prior’s marginal density 8 , → (x3 x8) ∫ gga dx1 dx2 dx4 dx5 dx6 dx7. The triangle represents the The transformation boundary of B8 in the Ox3 x8 plane (see § 5.2, and cf. figs. 5–9).

x ≡ (x1, x2, x3, x4, x5, x6, x7, x8) → /  ≡ − , , , , , − , , − {tr[ρ(x)−ρ(ˆx)]2}1 2 =ˆ |x−xˆ| from |2 2| increases. The param- x ( x1 x2 x3 x4 x5 x6 x7 x8) (38) eter s may be called the ‘breadth’ of the Gaussian-like func- 17 maps the domain C8 bijectively onto itself, and the absolute tion. The marginal density of the coordinates x3 and x8 for this prior is shown in fig. 4. The state-assignment formula value of its Jacobian determinant is equal to unity. Under this transformation we have that with the prior knowledge Iga assumes the form  = − , − 2 x1 x1 (39a) ρ 3 ρ Ni (x xˆ) χ C (x) = ii(x) exp 2 B(x)dx  8 i 1 2s ρ = ρ = , , , ρ ∧ = . (36) ii(x ) ii(x) i 1 2 3 (39b) D Iga (x−xˆ)2 3 ρ Ni χ  = ii(x) exp 2 B(x)dx = = , , C8 i 1 2s g(x ) g(x) (for both g gco gga) (39c)  χB(x ) = χB(x). (39d) In the following the function g(x) will generically stand for gco(x)orgga(x). Applying the formula for the change of variables [130, 131] to (37), using the symmetries above, and renaming dummy integration variables we obtain 5. EXPLICIT CALCULATION OF THE ASSIGNED  STATISTICAL OPERATOR 3 Ni L1 = x1 ρii(x) g(x) χB(x)dx, C i=1 8 (40) We shall now calculate the statistical operator given by (29), 3 = − ρ Ni χ , which means calculating the L and Z as given in (30a) x1 ii(x) g(x) B(x)dx j C i=1 and (30), for the triples of absolute frequencies 8 ∴ L1 = 0. (41) N = 1: (1, 0, 0) and permutations thereof; Similarly one can show that L2, L4, L5, L6, L7 are all zero N = 2: (2, 0, 0), (1, 1, 0), and permutations; by changing the signs of the triplets (x2, x5, x7), (x1, x5, x6), N = 3: (3, 0, 0), (2, 1, 0), (1, 1, 1), and permutations; (x2, x4, x6), (x2, x4, x6), (x2, x5, x7), respectively. N = 4, 5, 6, 7: (0, N, 0), The assigned statistical operator hence corresponds to the Bloch vector (0, 0, L3/Z, 0, 0, 0, 0, L8/Z), for all triples of ab- solute frequencies N¯ and both kinds of prior knowledge. I.e. it has, in the eigenbasis {|1 1|, |2 2|, |3 3|}, the diagonal matrix 17 "Standard deviation" would be an improper name, e.g., since s has not all form the usual properties of a standard deviation. E.g., although the Hessian ⎛ √ ⎞ ⎜ 1 L3+L8/ 3 ⎟ determinant of the Gaussian-like density vanishes for |x − xˆ| = s, the total ⎜ + 00⎟ ⎜ 3 2Z ⎟ plausibility within a distance s fromx ˆ is 0.0047, not 0.00175 as would ⎜ 1 √L8 ⎟ ρ ∧ = ⎜ 0 − 0 ⎟ (42) be expected of an octavariate Gaussian distribution on R8 [129]. This is D I ⎜ 3 3Z √ ⎟ ⎝ − + / ⎠ simply due to the bounded ranges of the coordinates. 1 + L3 L8 3 003 2Z 10

(note that L3,8 and Z still depend on N¯ and I). 5.2. Numerical calculation for the remaining cases Two further changes of variables — both with unit Jacobian determinant and mapping B8 1-1 onto itself — can be used No other symmetry arguments seem available to derive L3, to reduce the calculations for some absolute-frequency triples L ,andZ for the remaining cases. In fact L , L are in general , , 8 3 8 (N1 N2 N3) to the calculation of other ones, with a reasoning non-zero (Z can never vanish, its integrand being positive and similar to that of the preceding section. never identically naught). It is very difficult — impossible per- The first is haps? — to calculate the corresponding integrals analytically because of the complicated shape of B8. Therefore we have x ≡ (x1, x2, x3, x4, x5, x6, x7, x8) →  ≡ , , − , , − , , , , resorted to numerical integration, using the quasi-Monte Carlo x (x6 x7 x3 x4 x5 x1 x2 x8) (43) integration algorithms provided by Mathematica 5.2.18 under which, in particular, The resulting Bloch vectors for the constant prior gco dx    = , , ρ (x ) = ρ (x), ρ (x ) = ρ (x), ρ (x ) = ρ (x). are shown for N 1 2 3 in figs. 5, 6, and 7 respectively. 11 33 33 11 22 22 = (44) We have included in fig. 5 the case N 0—i.e.,nodata — corresponding to the statistical operator ρ that encodes From eqs. (30) it follows that Ico the prior knowledge Ico. In fig. 8 we have plotted the Bloch L (N , N , N ) = −L (N , N , N ), (45a) 3 3 2 1 3 1 2 3 vectors corresponding to triples of the form (N1, N2, N3) = L8(N3, N2, N1) = L8(N1, N2, N3), (45b) (0, 0, N)forN = 1,...,7. = = Z(N3, N2, N1) = Z(N1, N2, N3), (45c) The cases N 0andN 1 for the Gaussian-like prior g dx are shown in fig. 9. The case N = 0 corresponds to the for both prior distributions g and g . ga co ga statistical operator ρ encoding the prior knowledge I . The second change of variables is an anti-clockwise rota- Iga ga tion of the plane (x , x ) by an angle 2π/3 accompanied by The large triangle in the figures is the two-dimensional sec- 3 8 B permutations of the other coordinates: tion of the set 8 along the plane Ox3 x8. It can, of course, also be considered as a section of the set of statistical op- S | | (x1, x2, x3, x4, x5, x6, x7, x8) → erators 3. This section contains the eigenprojectors 1 1 , √ √ |2 2|, |3 3|, which are the vertices of the triangle, as indi- x3 + 3x8 3x3 − x8 x7, x6, − , x2, x1, x4, x5, , (46) cated. The assigned statistical operators, for all data and pri- 2 2 ors considered in this study, also lie on this triangle since they under which, in particular, are mixtures of the eigenprojectors, as we found in § 5.1,    eq. (42). They are represented by points labelled with the ρ (x ) = ρ (x), ρ (x ) = ρ (x), ρ (x ) = ρ (x), 11 22 22 33 33 11 respective data triples. The points have planar coordinates (47) L (N¯ , I)/Z(N¯ , I), L (N¯ , I)/Z(N¯ , I) . leading to 3 8 The numerical-integration uncertainties 3 and 8,forL3/Z 1 / L3[(N2, N3, N1), Ico] = − L3[(N1, N2, N3), Ico] − and L8 Z respectively, specified in the figures’ legends, vary 2 √ from ±0.0025 for the triplets with N = 2to±0.015 for various 3 L [(N , N , N ), I ], other triplets. Numerical integration has also been performed 2 8 1 2 3 co for those quantities that can be determined analytically (§ 5.1) (48a) , , / , , √ — like L3(0 N 0) Z(0 N 0) e.g. —, and the numerical results agree, within the uncertainties, with the analytical ones. , , , = 3 , , , − L8[(N2 N3 N1) Ico] L3[(N1 N2 N3) Ico] ff 2 (48b) A trade-o between, on the one hand, calculation time and, 1 on the other, accuracy of the result was necessary. The ac- L [(N , N , N ), I ], 2 8 1 2 3 co curacy parameters to be inputted onto the integration routine Z[(N2, N3, N1), Ico] = Z[(N1, N2, N3), Ico]. (48c) were determined by previous rough numerical estimations of the results; in some cases an iterative process of this kind Note that the formulae from this transformation holds only for was adopted. The calculation of the statistical operator for a the constant prior gco. given triple of absolute frequencies N¯ took from three to one From (45) we see that, for both priors, L3 vanishes for all hundred minutes, depending on the accuracy required and the triples of the form (n, N − 2n, n) for some positive integer n complexity of the integrands. N/2, in particular for (0, N, 0) and (n, n, n). In the last case The statistical operators encoding the various kinds of data = L8 0 as well — though only for the constant prior gco —, as and prior knowledge are given in explicit form in table I. can be deduced from (45) and (48). Note that the uncertainties for the statistical operators should In the case of the prior knowledge Ico, it is easy to re- be written as 3λ 3/2 + 8λ 8/2 (cf. eq. (29)); however, we alise that, repeatedly applying the two transformations above, one can derive the values of L3, L8,andZ for all triples (N1, N2, N2) from the values for the triples with N2 N1 N3 only. 18 The programmes are available upon request. 11

I , N = 0 (no data): I N = co ⎛ ⎞ co, 5:⎛ ⎞ ⎜1/30 0⎟ ⎜0.215 ± 0.004a 00⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ a ⎟ ρ , = ⎜ 01/30⎟ ρ , = ⎜ 00.571 ± 0.009 0 ⎟ (000) Ico ⎝⎜ ⎠⎟ (050) Ico ⎝⎜ ⎠⎟ 001/3 000.215 ± 0.004a

NB: This statistical operator encodes the prior knowledge Ico cases (500) and (005) obtained by permutation

I , N = 1: I N = co ⎛ ⎞ co, 6:⎛ ⎞ ⎜0.300 ± 0.001a 00⎟ ⎜0.201 ± 0.004a 00⎟ ⎜ ⎟ ⎜ ⎟ ⎜ a ⎟ ⎜ a ⎟ ρ , = ⎜ 00.399 ± 0.003 0 ⎟ ρ , = ⎜ 00.598 ± 0.009 0 ⎟ (010) Ico ⎝⎜ ⎠⎟ (060) Ico ⎝⎜ ⎠⎟ 000.300 ± 0.001a 000.201 ± 0.004a cases (100) and (001) obtained by permutation cases (600) and (006) obtained by permutation

I , N = 2: I N = co ⎛ ⎞ co, 7:⎛ ⎞ ⎜0.2735 ± 0.0007a 00⎟ ⎜0.191 ± 0.004a 00⎟ ⎜ ⎟ ⎜ ⎟ ⎜ a ⎟ ⎜ a ⎟ ρ , = ⎜ 00.453 ± 0.001 0 ⎟ ρ , = ⎜ 00.619 ± 0.009 0 ⎟ (020) Ico ⎝⎜ ⎠⎟ (070) Ico ⎝⎜ ⎠⎟ . ± . a . ± . a ⎛ 0002735 0 0007 ⎞ 000191 0 004 ⎜0.3642 ± 0.0007a 00⎟ cases (700) and (007) obtained by permutation ⎜ ⎟ ⎜ a ⎟ ρ , = ⎜ 00.272 ± 0.001 0 ⎟ (101) Ico ⎝⎜ ⎠⎟ 000.3642 ± 0.0007a other cases obtained by permutation I , N = 0 (no data): ga ⎛ ⎞ ⎜0.195 ± 0.004a 00⎟ ⎜ ⎟ ⎜ a ⎟ ρ , = ⎜ 00.609 ± 0.009 0 ⎟ (000) Iga ⎝⎜ ⎠⎟ I N = a co, 3:⎛ ⎞ 000.195 ± 0.004 ⎜0.249 ± 0.001a 00⎟ ⎜ ⎟ NB: This statistical operator encodes the prior knowledge Iga ⎜ a ⎟ ρ , = ⎜ 00.502 ± 0.003 0 ⎟ (030) Ico ⎝⎜ ⎠⎟ . ± . a ⎛ 000249 0 001 ⎞ ⎜0.333 ± 0.004a 00⎟ = ⎜ ⎟ Iga, N 1: ρ = b ⎜ a ⎟ ⎛ ⎞ (021),I ⎜ 00.418 ± 0.003 0 ⎟ co ⎝⎜ ⎠⎟ ⎜0.180 ± 0.004a 00⎟ . ± . a ⎜ ⎟ ⎛ 000⎞ 249 0 004 ρ = ⎜ . ± . a ⎟ (010),Iga ⎜ 00640 0 009 0 ⎟ ⎜1/30 0⎟ ⎝ ⎠ ⎜ ⎟ . ± . a ρ = ⎜ ⎟ ⎛ 000180 0 004 ⎞ (111),I ⎜ 01/30⎟ co ⎝⎜ ⎠⎟ ⎜0.239 ± 0.006a 00⎟ / ⎜ ⎟ 0013 ⎜ a ⎟ ρ , = ⎜ 00.575 ± 0.009 0 ⎟ (001) Iga ⎝⎜ ⎠⎟ other cases obtained by permutation . ± . a ⎛ 000186 0 006 ⎞ ⎜0.186 ± 0.006a 00⎟ ⎜ ⎟ ρ = ⎜ . ± . a ⎟ (100),Iga ⎜ 00575 0 009 0 ⎟ I N = ⎝ ⎠ co, 4:⎛ ⎞ 000.239 ± 0.006a ⎜0.230 ± 0.004a 00⎟ ⎜ ⎟ ρ = ⎜ a ⎟ aNote that only two of the three uncertainties of the diagonal (040),I ⎜ 00.541 ± 0.009 0 ⎟ co ⎝⎜ ⎠⎟ elements are independent; see § 5.2. . ± . a 000230 0 004 bThis has been computed from the average of the cases (021) and cases (400) and (004) obtained by permutation (120) (appropriately permuted).

Table I: Statistical operators assigned for the various absolute-frequency data and priors considered in this study. Cf. figs. 5–9. 12 adopted a more compact notation in the table (see footnote a (N1, N2, N3) ≡ ( f1N, f2N, f3N)by there). The results for N = 2andN = 3 show an intriguing feature, v(N , N , N )  immediately apparent in figs. 6 and 7: the computed Bloch 1 2 3 , , ¯ , / ¯ , , , , , , ¯ , / ¯ , . vectors seem to maintain the convex structure of the respec- 0 0 L3(N Ico) Z(N Ico) 0 0 0 0 L8(N Ico) Z(N Ico) tive data. What we mean is the following. For given N,the (50) set of possible triples of absolute frequencies (N1, N2, N3)has a natural convex structure with the extreme points (N, 0, 0), These Bloch vectors (and hence the statistical operators) (0, N, 0), and (0, 0, N): seem, from figs. 6 and 7, to respect the same convex com- binations as their respective triples:

(N1, N2, N3) ≡ ( f1N, f2N, f3N) = , , ≈ , , + , , + , , . f1(N, 0, 0) + f2(0, N, 0) + f3(0, 0, N), (49) v( f1N f2N f3N) f1v(N 0 0) f2v(0 N 0) f3v(0 0 N) (51) where we have introduced the relative frequencies fi  In terms of the integrals (30) defining L3, L8, Z, and using (16) Ni/N. Denote the Bloch vector corresponding to the triple or (42), the seeming equation above becomes

f1N f2N f3N 1 x3 x√8 1 √x8 1 x3 x√8 x j + + − − + dx B8 3 2 2 3 3 3 3 2 2 3 ≈ f1N f2N f3N 1 + x3 + x√8 1 − √x8 1 − x3 + x√8 dx B8 3 2 2 3 3 3 3 2 2 3 N N N 1 + x3 + x√8 1 − √x8 1 − x3 + x√8 B x j dx B x j dx B x j dx 8 3 2 2 3 + 8 3 3 + 8 3 2 2 3 , = , , f1 N f2 N f3 N j 3 8 (52) 1 + x3 + x√8 dx 1 − √x8 dx 1 − x3 + x√8 dx B8 3 2 2 3 B8 3 3 B8 3 2 2 3

a remarkable expression. Does it hold exactly? We have not such events as e.g. the ‘null’, no-detection event; the num- tried to prove or disprove its analytical validity, but it surely ber of events need not be the same as the number of out- deserves further investigation. [Post scriptum: Slater, us- comes. The model formalised in the equation above suffices ing cylindrical algebraic decomposition [132, 133, 134] and in many cases. Other models could take into account, e.g. a parametrisation by Bloore [cf. 135], has confirmed that “non-local” or memory effects, so that the plausibility of an eq. (52) holds exactly. In fact, he has remarked that the some event could depend on a set of previous or simultaneous out- of the integrals, here numerically calculated, can be solved comes. We thus definitely enter the realm of communication analytically by his approach.] theory [136, 137, 138, 139, 140] (see also [17, 19]). Given the preparation represented by the statistical operator ρ, and the positive-operator-valued measure {E μ } represent- 6. TAKING ACCOUNT OF THE UNCERTAINTIES IN THE ing the measurement with outcomes {‘μ’}, the plausibility of DETECTION OF OUTCOMES registering the event ‘i’ in a measurement instance is, by the rules of plausibility theory,19 Uncertainties are normally to be found in one’s measure-   ment data, and need to be taken into account in the state- p(i| ρ) = μ p(i| μ) p(μ| ρ) = μ h(i| μ)tr(E μ ρ). (54) assignment procedure. For frequency data the uncertainty can This marginalisation could be carried over to the state- stem from a combination of “over-counting”, i.e. the regis- assignment formulae already discussed in § 2, and the for- tration (because of background noise e.g.) of some events as mulae thus obtained would take into account the outcome- outcomes when there are in fact none, and “under-counting”, registration uncertainties. i.e. the failure (because of detector limitations, e.g.) to register However, it is much simpler to introduce a new positive- some outcomes. operator-valued measure {Δ i} defined by Let us model the measurement-data uncertainty as follows,  for definiteness. We say that the plausibility of registering the Δ i  μ h(i| μ)E μ , (55) “event” ‘i’ when the outcome ‘μ’ is obtained is

P(‘i’| ‘μ’ ∧ I) = h(i| μ). (53) 19 It is assumed that knowledge of the state is redundant in the plausibility The event ‘i’ belongs to some given set that may include assignment of the event ‘i’ when the outcome is already known. 13 so that the plausibilities p(i| ρ) in eq. (54) can be written, by of measure zero). We shall see later what happens when such the linearity of the trace, a region shrinks to a single point, i.e. when the uncertainties becomes smaller and smaller. In [71] it is shown, using some | ρ = Δ ρ . p(i ) tr( i ) (56) theorems in Csiszár [142] and Csiszár and Shields [143], that In the state assignment we can simply use the new positive- ⎧ ⎨⎪ , ρ  Φ , operator-valued measure, which includes the outcome- ρ| ∧ ρ ∝ 0 if q( ) ∞ p( DN I)d ⎩⎪ registration uncertainties, in place of the old one. The last p(ρ| I)dρ, if q(ρ) ∈ Φ∞, procedure is also more in the spirit of quantum mechanics: it →∞. ρ + ρ as N (59) is analogous to the use of the statistical operator p1 1 p2 2 when we are unsure (with plausibilities p and p ) about 1 2 In other words: as the number of measurements becomes whether ρ or ρ holds. I.e., we can “mix” positive-operator- 1 2 large, the plausibility of the statistical operators that encode valued-measure elements just like we mix statistical oper- a plausibility distribution not equal to one of the measured ators. In fact, we could even mix, with a similar proce- frequencies vanishes, so that the whole plausibility gets con- dure, whole positive-operator-valued measures — a procedure centrated on the statistical operators encoding plausibility dis- which would represent the fact that there are uncertainties in tributions equal to the possible frequencies. This is an intu- the identification not only of the outcomes, but of the whole itively satisfying result. The data single out a set of statistical measurement procedure as well. See Peres’ partially related operators, and these are then given weight according to the discussion [141]. prior p(ρ| I)dρ,specifiedbyus. ∗ If Φ∞ degenerates into a single frequency value f ,theex- 7. LARGE-N LIMIT pression above becomes, as shown in [71],

∗ ∗ 7.1. General case p[ρ| ( f = f ) ∧ I]dρ ∝ p(ρ| I) δ[q(ρ) − f ]dρ, (60)

Let us briefly consider the case of data with very large N. which was also intuitively expected. We summarise some results obtained in [71]. Mathematically Note that if the prior density vanishes for such statistical we want to see what form the state-assignment formulae take operators as are singled out by the data, then the equations →∞ { }∞ above become meaningless (no normalisation is possible), re- in the limit N . Consider a sequence of data sets DN N=1. Each DN consists in some knowledge about the outcomes of N vealing a contradiction between the prior knowledge and the instances of the same measurement. The latter is represented measurement data. by the positive-operator-valued measure {E i}. The plausibility distribution for the outcomes, given the preparation ρ,is 7.2. Present case q(ρ)  qi(ρ) with qi(ρ) = tr(E iρ). (57)

Let us consider more precisely the general situation in In the case of our study, the derivation above shows that, as →∞ ≡ , ,  which each data set DN consists in the knowledge that the rel- N and the triple of relative frequencies f ( f1 f2 f3) , , / ∗ ative frequencies f ≡ ( fi)  (Ni/N) lie in a region ΦN (with (N1 N2 N3) N tends to some value f , the diagonal elements ρ non-empty interior and whose boundary has measure zero in of the assigned statistical operator DN ∧I tend to respect of the prior plausibility measure). Such kind of data | ρ ≡ ρ → ∗ →∞. arise when the registration of measurement outcomes is af- p(i DN ∧I ) ( DN ∧I )ii fi as N (61) fected by uncertainties and is moreover “coarse-grained” for practical purposes, so that not precise frequencies are obtained Combining this with the results of § 5.1 concerning the off- but rather a region — like ΦN — of possible ones. diagonal elements, we find that the assigned statistical opera- For each data set we then have a resulting posterior distri- tor has in the limit the form bution for the statistical operators, ⎛ ⎞ ∗ ⎜ f 00⎟ ⎜ 1 ⎟ p(ρ| D ∧ I)dρ = p[ρ| ( f ∈ Φ ) ∧ I]dρ = ρ = ⎜ ∗ ⎟ , (62) N N D∞∧I ⎜ 0 f2 0 ⎟ ⎝ ∗⎠ p( f ∈ ΦN | ρ) p(ρ| I)dρ 00f . (58) 3 ∈ Φ | ρ ρ| ρ S p( f N ) p( I)d for both studied priors. This is again an expected result. Only ρ  ∫ ρ ρ| ∧ ff and an associated statistical operator DN ∧I p( DN the diagonal elements of the statistical operator are a ected I)dρ. by the data, and as the data amount increases it overwhelms {Φ }∞ ff Assume that the sequence N N=1 of such frequency re- the prior information a ecting the diagonal elements. Both gions converges (in a topological sense specified in [71]) to a priors are moreover symmetric in respect of the off-diagonal region Φ∞ (also with non-empty interior and with boundary elements, that get thus a vanishing average. 14

8. DISCUSSION AND CONCLUSIONS the kind staff of the KTH Biblioteket, the Forum biblioteket in particular, for their irreplaceable work. Bayesian quantum-state assignment techniques have been Post scriptum: We cordially thank Paul B. Slater for point- studied for some time now but, as far as we know, never been ing out to us the method of cylindrical algebraic decomposi- applied to the whole set of statistical operators of systems tion, by which some of the integrals of this paper can be solved with more than two levels. And they have never been used analytically, and for other important remarks. for state assignment in real cases. In this study we have ap- plied such methods to a three-level system, showing that the Appendix: DETERMINATION OF C numerical implementation is possible and simple in principle. 8 This paper should therefore not only be of theoretical inter- est but also be of use to experimentalists involved in state es- Any hyperplane tangent to (supporting) a convex set must timation. The time required to obtain the numerical results touch the latter on at least an extreme point [109, 110, 111, was relatively short in this three-level case, which involved 113, 114, 153]. To determine the hyper-sides of the minimal C B an eight-dimensional integration. Application to higher-level hyper-box 8 containing 8 we need therefore consider only systems should also be feasible, if one considers that integrals the maximal points of the latter — i.e., the pure states. involving hundreds of dimensions are computed in financial, A generic ray of a three-dimensional complex Hilbert space particle-physics, and image-processing problems (see e.g. the can be written as (somewhat dated) refs. [144, 145, 146, 147, 148]). − β − γ |φ = a |1 + e i b |2 + e i c |3 , (A.1) Bayesian methods always take into account prior knowl- edge. We have given examples of state-assignment in the case with of “vague” prior knowledge, as well as in the case of a kind of somehow better knowledge assigning higher plausibility to 0 β, γ 2π, a, b, c 0, a2 + b2 + c2 = 1; (A.2) statistical operators in the vicinity of a given pure one. A com- parison of the resulting statistical operators for the same kind note that any two of the parameters a, b, c can be chosen in- of data is quickly obtained by looking at figs. 5 and 9 (or at the dependently in the range [0, 1]. The corresponding pure sta- respective statistical operators in table I). It is clear that when tistical operator is the available amount of data is small (as is the case in those ⎛ ⎞ figures, which concern data with no or only one measurement ⎜ a2 e−iβ ab e−iγ ac ⎟ ⎜ ⎟ outcome), prior knowledge is very relevant. Any practised |φ φ| = ⎜ iβ 2 −i(β−γ) ⎟ . ⎜e ab b e bc⎟ (A.3) experimentalist usually has some kinds of prior knowledge in ⎝ γ β−γ ⎠ ei ac ei( )bc c2 many experimental situations, which arise from past experi- ence with similar situations. With some practice in “trans- All pure states have this form, with the parameters in the lating” these kinds of prior knowledge into distribution func- ranges (A.2). Equating this expression with the one in terms tions, one could employ small amounts of data in the most of the Bloch-vector components (xi), eq. (16), we obtain after efficient way. some algebraic manipulation a parametric expression for the The generalisation of the present study to data involving Bloch vectors of the pure states: different kinds of measurement is straightforward. Of course, = β, = β, in the general case one has to numerically determine a greater x1 2ab cos x2 2ab sin 2 2 number of parameters (the L j) and therefore compute a greater x3 = a − b , x4 = 2ac cos γ, number of integrals. It would also be interesting to look at the (A.4) x = 2ac sin γ, x = bc cos(β − γ), results for other kinds of priors, in particular “special” priors 5 6 √ 2 like the Bures one [101, 149, 150, 151, 152]. We found a x7 = bc sin(β − γ), x8 = 3(b − 1/3). particular non-trivial numerical relation, eq. (52), between the results obtained for the constant prior; it would be interesting These parametric equations define the four-dimensional sub- B ff to know whether it holds exactly. set of the extreme points of 8. It takes little e ort to see that, as a, b, c, β,andγ vary in the ranges (A.2), each of the first In the next paper [83] we shall give examples of numeri- − , cal quantum-state assignment for data consisting in average seven coordinates above ranges√ in√ the interval [ 1 1] and the eighth in the interval [−2/ 3, 1/ 3]. The rectangular region values instead of absolute frequencies; and besides the two C priors considered here we shall employ another prior studied given by the Cartesian product of these intervals is thus 8 as q e d by Slater [41]. defined in eq. (24), . . .

Acknowledgements Note: arxiv eprints are located at http://arxiv.org/. AM thanks Professor Anders Karlsson for encouragement. [1] U. Leonhardt, Measuring the Quantum State of Light (Cam- PM thanks Louise for continuous and invaluable support, and bridge University Press, Cambridge, 1997). 15

[2]D.F.V.James,P.G.Kwiat,W.J.Munro,andA.G.White, [24] J. L. Park and W. Band, Rigorous information-theoretic Measurement of qubits,Phys.Rev.A64, 052312 (2001), arxiv derivation of quantum-statistical thermodyamics. I, Found. eprint quant-ph/0103121. Phys. 7(3/4), 233–244 (1977), see also [25]. [3]E.T.Jaynes,Information theory and statistical mechanics. II, [25] W. Band and J. L. Park, Rigorous information-theoretic Phys. Rev. 108(2), 171–190 (1957), http://bayes.wustl. derivation of quantum-statistical thermodyamics. II, Found. edu/etj/node1.html, see also [154]. Phys. 7(9/10), 705–721 (1977), see also [24]. [4] E. T. Jaynes, The minimum entropy production principle, [26] W. Band and J. L. Park, Quantum state determination: Quo- Annu. Rev. Phys. Chem. 31, 579 (1980), http://bayes. rum for a particle in one dimension, Am. J. Phys. 47(2), 188– wustl.edu/etj/node1.html. 191 (1979). [5] R. Derka, V. Bužek, G. Adam, and P. L. Knight, From [27] J. L. Park, W. Band, and W. Yourgrau, Simultaneous measure- quantum Bayesian inference to quantum tomography,Jemna ment, phase-space distributions, and quantum state determi- Mechanika a Optika 11/12, 341 (1996), arxiv eprint nation, Ann. d. Phys. 37(3), 189–199 (1980). quant-ph/9701029. [28] A. S. Holevo, Statistical decision theory for quantum systems, [6] V. Bužek, G. Drobný, G. Adam, R. Derka, and P. L. Knight, J. Multivariate Anal. 3(4), 337–394 (1973). Reconstruction of quantum states of spin systems via the [29] A. S. Holevo, Probabilistic and Statistical Aspects of Quantum Jaynes principle of maximum entropy,J.Mod.Opt.44(11/12), Theory (North-Holland, Amsterdam, 1980/1982), first publ. in 2607–2627 (1997), arxiv eprint quant-ph/9701038. Russian in 1980. [7] V. Bužek, R. Derka, G. Adam, and P. L. Knight, Recon- [30] A. S. Holevo, Statistical Structure of Quantum Theory struction of quantum states of spin systems: From quantum (Springer-Verlag, Berlin, 2001). Bayesian inference to quantum tomography, Ann. of Phys. [31] F. J. Bloore, Geometrical description of the convex sets of 1 266, 454–496 (1998). states for systems with spin- 2 and spin-1, J. Phys. A 9(12), [8] V. Bužek and G. Drobný, Quantum tomography via the max- 2059–2067 (1976). ent principle, J. Mod. Opt. 47(14/15), 2823–2839 (2000). [32] I. D. Ivanovic,´ Geometrical description of quantal state deter- [9] H. Jeffreys, Scientific Inference (Cambridge University Press, mination,J.Phys.A14(12), 3241–3245 (1981). Cambridge, 1931/1957), 2nd ed., first publ. 1931. [33] I. D. Ivanovic,´ Formal state determination, J. Math. Phys. [10] H. Jeffreys, Theory of Probability (Oxford University Press, 24(5), 1199–1205 (1983). London, 1939/1998), 3rd ed., first publ. 1939. [34] I. D. Ivanovic,´ Representative state in incomplete quantal state [11] E. T. Jaynes, Probability Theory: The Logic of Science determination, J. Phys. A 17(11), 2217–2223 (1984). (Cambridge University Press, Cambridge, 1994/2003), ed. [35] I. D. Ivanovic,´ How to differentiate between non-orthogonal by G. Larry Bretthorst; http://omega.albany.edu: states, Phys. Lett. A 123(6), 257–259 (1987). 8008/JaynesBook.html, http://omega.albany.edu: [36] E. G. Larson and P. R. Dukes, The evolution of our probability 8008/JaynesBookPdf.html. First publ. 1994; earlier image for the spin orientation of a spin-1/2–ensemble as mea- versions in [155, 156]. surements are made on several members of the ensemble — [12] B. de Finetti, Theory of Probability: A critical introductory connections with information theory and Bayesian statistics, treatment. Vol. 1 (John Wiley & Sons, New York, 1970/1990), in Grandy and Schick [157] (1991), pp. 181–189. transl. by Antonio Machi and Adrian Smith; first publ. in Ital- [37] K. R. W. Jones, Principles of quantum inference, Ann. of Phys. ian 1970. 207(1), 140–170 (1991). [13] J.-M. Bernardo and A. F. Smith, Bayesian Theory (John Wiley [38] K. R. W. Jones, Fundamental limits upon the measurement of & Sons, Chichester, 1994). state vectors,Phys.Rev.A50(5), 3682–3699 (1994). [14] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, [39] J. D. Malley and J. Hornstein, Quantum statistical inference, Bayesian Data Analysis (Chapman & Hall/CRC, Boca Raton, Stat. Sci. 8(4), 433–457 (1993). USA, 1995/2004), 2nd ed., first publ. 1995. [40] P. B. Slater, Fisher information, prior probabilities, and the [15] P. Gregory, Bayesian Logical Data Analysis for the Physical state determination of spin-1/2 and spin-1 systems,J.Math. Sciences: A Comparative Approach with Mathematica Sup- Phys. 34(5), 1794–1798 (1993). port (Cambridge University Press, Cambridge, 2005). [41] P. B. Slater, Reformulation for arbitrary mixed states of Jones’ [16] I. E. Segal, Postulates for general quantum mechanics, Ann. Bayes estimation of pure states, Physica A 214(4), 584–604 Math. 48(4), 930–948 (1947). (1995). [17] C. W. Helstrom, Detection theory and quantum mechanics, [42] G. W. Mackey, The Mathematical Foundations of Quantum Inform. and Contr. 10(3), 254–291 (1967). Mechanics: A Lecture-Note Volume (W. A. Benjamin, New [18] C. W. Helstrom, Estimation of a displacement parameter of a York, 1963). quantum system, Int. J. Theor. Phys. 11(6), 357–427 (1974). [43] B. Mielnik, Geometry of quantum states, Commun. Math. [19] C. W. Helstrom, Quantum Detection and Estimation Theory Phys. 9(1), 55–80 (1968). (Academic Press, New York, 1976). [44] B. Mielnik, Generalized quantum mechanics, Commun. Math. [20] J. L. Park and W. Band, A general theory of empirical state Phys. 37(3), 221–256 (1974), repr. in [158, pp. 115–152]. determination in quantum physics: Part I, Found. Phys. 1(3), [45] E. B. Davies, Information and quantum measurement, IEEE 211–226 (1971), see also [21]. Trans. Inform. Theor. IT-24(5), 596–599 (1978). [21] W. Band and J. L. Park, A general method of empirical state [46] J. E. Harriman, Geometry of density matrices. I. Definitions, determination in quantum physics: Part II, Found. Phys. 1(4), N matrices and 1 matrices, Phys. Rev. A 17(4), 1249–1256 339–357 (1971), see also [20]. (1978), see also [47, 48, 49, 50]. [22] W. Band and J. L. Park, New information-theoretic foun- [47] J. E. Harriman, Geometry of density matrices. II. Reduced dations for quantum statistics, Found. Phys. 6(3), 249–262 density matrices and N representability, Phys. Rev. A 17(4), (1976). 1257–1268 (1978), see also [46, 48, 49, 50]. [23] J. L. Park and W. Band, Mutually exclusive and exhaustive [48] J. E. Harriman, Geometry of density matrices. III. Spin com- quantum states, Found. Phys. 6(2), 157–172 (1976). ponents, Int. J. Quant. Chem. 15(6), 611–643 (1979), see 16

also [46, 47, 49, 50]. Jaynes approach to induction. Being part II of “From ‘plausi- [49] J. E. Harriman, Geometry of density matrices. IV. The rela- bilities of plausibilities’ to state-assignment methods” (2007), tionship between density matrices and densities,Phys.Rev.A arxiv eprint physics/0703126, philsci eprint 00003235. 27(2), 632–645 (1983), see also [46, 47, 48, 50]. [70] P. G. L. Porta Mana, Ph.D. thesis, Kungliga Tekniska Högsko- [50] J. E. Harriman, Geometry of density matrices. V. Eigenstates, lan, Stockholm (2007), http://web.it.kth.se/~mana/. Phys. Rev. A 30(1), 19–29 (1984), see also [46, 47, 48, 49]. [71] P. G. L. Porta Mana (2007), in preparation. [51] R. Balian and N. L. Balazs, Equiprobability, inference, and en- [72] P. G. L. Mana, Why can states and measurement out- tropy in quantum theory, Ann. of Phys. 179(1), 97–144 (1987). comes be represented as vectors? (2003), arxiv eprint [52] R. Balian, Justification of the maximum entropy criterion in quant-ph/0305117. quantum mechanics, in Skilling [159] (1989), pp. 123–129. [73] L. Jakóbczyk and M. Siennicki, Geometry of Bloch vectors in [53] R. Derka, V. Bužek, and A. K. Ekert, Universal algorithm for two-qubit system, Phys. Lett. A 286, 383–390 (2001). optimal estimation of quantum states from finite ensembles via [74] G. Kimura, The Bloch vector for N-level systems, Phys. Lett. A realizable generalized measurement, Phys. Rev. Lett. 80(8), 314(5–6), 339–349 (2003), arxiv eprint quant-ph/0301152. 1571–1575 (1998), arxiv eprint quant-ph/9707028. [75] G. Kimura and A. Kossakowski, The Bloch-vector space for [54] V. Bužek, G. Drobný, R. Derka, G. Adam, and H. Wiede- N-level systems: the spherical-coordinate point of view,Open mann, Quantum state reconstruction from incomplete data, Sys. & Information Dyn. 12(3), 207–229 (2005), arxiv eprint Chaos, Solitons & Fractals 10(6), 981–1074 (1999), arxiv quant-ph/0408014. eprint quant-ph/9805020. [76] P. G. L. Porta Mana, Four-dimensional sections of the set of [55] S. M. Barnett, D. T. Pegg, and J. Jeffers, Bayes’ theorem and statistical operators for a three-level quantum system, in var- quantum retrodiction, J. Mod. Opt. 47(11), 1779–1789 (2000), ious coordinate systems (2006), realised as animated pictures arxiv eprint quant-ph/0106139. by means of Maple; available upon request. [56] S. M. Barnett, D. T. Pegg, J. Jeffers, and O. Jedrkiewicz, [77] F. Bloch, Nuclear induction, Phys. Rev. 70(7–8), 460–474 Atomic retrodiction, J. Phys. B 33(16), 3047–3065 (2000), (1946). arxiv eprint quant-ph/0107019. [78] F. Bloch, W. W. Hansen, and M. Packard, The nuclear induc- [57] J. E. Harriman, Distance and entropy for density matrices,J. tion experiment, Phys. Rev. 70(7–8), 474–485 (1946). Chem. Phys. 115(20), 9223–9232 (2001). [79] P. B. Slater, Bayesian inference for complex and quaternionic [58] R. Schack, T. A. Brun, and C. M. Caves, Quantum two-level quantum systems, Physica A 223(1–2), 167–174 Bayes rule, Phys. Rev. A 64, 014305 (2001), arxiv eprint (1996). quant-ph/0008113. [80] P. B. Slater, Quantum Fisher-Bures information of two-level [59] C. M. Caves, C. A. Fuchs, and R. Schack, Unknown quantum systems and a three-level extension,J.Phys.A29(10), L271– states: the quantum de Finetti representation, J. Math. Phys. L275 (1996). 43(9), 4537–4559 (2002), arxiv eprint quant-ph/0104088. [81] P. B. Slater, Bayesian thermostatistical analyses of two- [60] D. T. Pegg, S. M. Barnett, and J. Jeffers, Quantum retrodiction level complex and quaternionic systems (1997), arxiv eprint in open systems,Phys.Rev.A66, 022106 (2002), arxiv eprint quant-ph/9710057. quant-ph/0208082. [82] P. B. Slater, Quantum statistical thermodynamics of two-level [61] S. J. van Enk and C. A. Fuchs, Quantum state of an ideal prop- systems (1997), arxiv eprint quant-ph/9706013. agating laser field,Phys.Rev.Lett.88, 027902 (2002), arxiv [83] A. Månsson, P. G. L. Porta Mana, and G. Björk, Numeri- eprint quant-ph/0104036. cal Bayesian state assignment for a quantum three-level sys- [62] S. J. van Enk and C. A. Fuchs, Quantum state of a propagating tem. II. Average-value data; constant, Gaussian-like, and laser field, Quant. Info. Comp. 2, 151–165 (2002), arxiv eprint Slater priors (2007), arxiv eprint quant-ph/0701087;see quant-ph/0111157. also [161]. [63] O. V. Man’ko and V. I. Man’ko, Probability representa- [84] I. Bengtsson (2006), personal communication. tion entropy for spin-state tomogram (2004), arxiv eprint [85] E. P. Wigner, Group Theory: and Its Application to the Quan- quant-ph/0401131. tum Mechanics of Atomic Spectra (Academic Press, New [64] F. Tanaka and F. Komaki, Bayesian predictive density opera- York, 1931/1959), expanded and improved ed., transl. by J. tors for exchangeable quantum-statistical models, Phys. Rev. J. Griffin; first publ. in German 1931. A 71(5), 052323 (2005). [86] R. V. Kadison, Transformations of states in operator theory [65] V. I. Man’ko, G. Marmo, A. Simoni, A. Stern, E. C. G. Sudar- and dynamics, Topology 3(2), 177–198 (1965). shan, and F. Ventriglia, On the meaning and interpretation of [87] W. Hunziker, A note on symmetry operations in quantum me- tomography in abstract Hilbert spaces, Phys. Lett. A 351(1– chanics, Helv. Phys. Acta 45(2), 233–236 (1972). 2), 1–12 (2006), arxiv eprint quant-ph/0510156. [88] J. C. Baez, The octonions, Bull. Am. Math. Soc. [66] F. Bloch, Fundamentals of Statistical Mechanics: Manuscript 39(2), 145–205 (2002), http://math.ucr.edu/home/ and Notes of Felix Bloch (Imperial College Press and World baez/octonions/, arxiv eprint math.RA/0105155; see also Scientific Publishing, London and Singapore, 1989/2000), errata [162]. prepared by John D. Walecka, first publ. 1989; based on lec- [89] R. Gilmore, Lie Groups, Lie Algebras, and Some of Their Ap- tures notes by Bloch dating from 1949. plications (John Wiley & Sons, New York, 1974). [67] P. G. L. Mana, Probability tables, in Khrennikov [160] (2004), [90] M. L. Curtis, Matrix Groups (Springer-Verlag, New York, pp. 387–401, rev. version at arxiv eprint quant-ph/0403084. 1979/1984), first publ. 1979. [68] P. G. L. Porta Mana, A. Månsson, and G. Björk, ‘Plau- [91] A. Baker, Matrix Groups: An Introduction to Lie Group The- sibilities of plausibilities’: an approach through circum- ory (Springer-Verlag, London, 2002). stances. Being part I of “From ‘plausibilities of plausi- [92] B. C. Hall, Lie Groups, Lie Algebras, and Representations: bilities’ to state-assignment methods” (2006), arxiv eprint An Elementary Introduction (Springer-Verlag, New York, quant-ph/0607111. 2003/2004), first publ. 2003. [69] P. G. L. Porta Mana, A. Månsson, and G. Björk, The Laplace- [93] S. Kobayashi and K. Nomizu, Foundations of Differential Ge- 17

ometry. Vol. I (Interscience Publishers, New York, 1963). [118] D. H. Fremlin, Measure Theory. Vol. 1: The Ir- [94] W. M. Boothby, An Introduction to Differentiable Manifolds reducible Minimum (Torres Fremlin, Colchester, Eng- and Riemannian Geometry (Academic Press, Orlando, USA, land, 2000/2004), http://www.sx.ac.uk/maths/staff/ 1975/1986), 2nd ed., first publ. 1975. fremlin/mt.htm; first publ. 2000. [95] Y. Choquet-Bruhat, C. De Witt-Morette, and M. Dillard- [119] A. N. Kolmogorov, Foundations of the Theory of Probability Bleick, Analysis, Manifolds and Physics. Part I: Basics (El- (Chelsea Publishing Company, New York, 1933/1956), second sevier, Amsterdam, 1977/1996), revised ed., first publ. 1977. English ed., transl. by Nathan Morrison, with an added bibli- [96] J. E. Marsden, T. Ratiu, and R. Abraham, Manifolds, Ten- ography by A. T. Bharucha-Reid first publ. in Russian 1933. sor Analysis, and Applications (Springer-Verlag, New York, [120] J. L. Doob, The development of rigor in mathematical prob- 1983/2002), 3rd ed., first publ. 1983. ability (1900–1950), Am. Math. Monthly 103(7), 586–595 [97] W. D. Curtis and F. R. Miller, Differential Manifolds and The- (1996). oretical Physics (Academic Press, Orlando, USA, 1985). [121] Y. V. Egorov, A contribution to the theory of generalized func- [98] S. Gallot, D. Hulin, and J. Lafontaine, Riemannian Geometry tions, Russ. Math. Surveys (Uspekhi Mat. Nauk) 45(5), 1–49 (Springer-Verlag, Berlin, 1987). (1990). [99] A. U. Kennington, Differential geometry reconstructed: a [122] M. J. Lighthill, Introduction to Fourier Analysis and Gen- unified systematic framework, http://www.topology.org/ eralised Functions (Cambridge University Press, London, tex/conc/dg.html (2001/2006). 1958/1964), first publ. 1958. [100] M. Byrd, Differential geometry on SU(3) with applications to [123] J. F. Colombeau, New Generalized Functions and Multiplica- three state systems, J. Math. Phys. 39(11), 6125–6136 (1998), tion of Distributions (North-Holland, Amsterdam, 1984). arxiv eprint math-ph/9807032; see also erratum [163]. [124] J. F. Colombeau, Elementary Introduction to New Generalized [101] M. S. Byrd and P. B. Slater, Bures measures over the spaces Functions (North-Holland, Amsterdam, 1985). of two- and three-dimensional density matrices, Phys. Lett. A [125] J. F. Colombeau, Multiplication of Distributions: A tool in 283(3–4), 152–156 (2001), arxiv eprint quant-ph/0004055. mathematics, numerical engineering and theoretical physics [102] T. Tilma and E. C. G. Sudarshan, Generalized Euler an- (Springer-Verlag, Berlin, 1992). gle parametrization for SU(N), J. Phys. A 35, 10467–10501 [126] M. Oberguggenberger, Multiplication of distributions and ap- (2002), arxiv eprint math-ph/0205016. plications to partial differential equations (Longman Scien- [103] T. Tilma and E. C. G. Sudarshan, Some applications for an Eu- tific & Technical, Harlow, England, 1992). ler angle parameterization of SU(N) and U(N) (2002), arxiv [127] M. Oberguggenberger, Generalized functions in nonlinear eprint quant-ph/0212075. models — a survey, Nonlinear Analysis 47(8), 5029–5040 [104] M. S. Byrd and N. Khaneja, Characterization of the posi- (2001), http://techmath.uibk.ac.at/mathematik/ tivity of the density matrix in terms of the coherence vector publikationen/. representation,Phys.Rev.A68, 062322 (2003), arxiv eprint [128] B. H. Soffer and D. K. Lynch, Some paradoxes, errors, and quant-ph/0302024. resolutions concerning the spectral optimization of human vi- [105] A. T. Bölükba¸sı and T. Dereli, On the SU(3) parametrization sion, Am. J. Phys. 67(11), 946–953 (1999). of qutrits, J. Phys. Conf. Ser. 36, 28–32 (2006), arxiv eprint [129] V. Chew, Confidence, prediction, and tolerance regions for the quant-ph/0511111. multivariate normal distribution,J.Am.Stat.Assoc.61(315), [106] A. Peres, Quantum Theory: Concepts and Methods (Kluwer 605–617 (1966). Academic Publishers, Dordrecht, 1995). [130] J. Schwartz, The formula for change in variables in a multiple [107] F. T. Hioe and J. H. Eberly, N-level coherence vector and integral, Am. Math. Monthly 61(2), 81–85 (1954). higher conservation laws in quantum optics and quantum me- [131] P. D. Lax, Change of variables in multiple integrals,Am. chanics, Phys. Rev. Lett. 47(12), 838–841 (1981). Math. Monthly 106(6), 497–501 (1999). [108] A. J. Macfarlane, A. Sudbery, and P. H. Weisz, On [132] D. S. Arnon, G. E. Collins, and S. McCallum, Cylindrical al- Gell-Mann’s λ-matrices, d- and f -tensors, octets, and gebraic decomposition I: The basic algorithm,SIAMJ.Com- parametrizations of SU(3), Commun. Math. Phys. 11(1), 77– put. 13(4), 865–877 (1984), see also [133]. 90 (1968). [133] D. S. Arnon, G. E. Collins, and S. McCallum, Cylindri- [109] F. A. Valentine, Convex Sets (McGraw-Hill Book Company, cal algebraic decomposition II: An adjacency algorithm for New York, 1964). the plane, SIAM J. Comput. 13(4), 878–889 (1984), see [110] B. Grünbaum, Convex Polytopes (Springer-Verlag, New York, also [132]. 1967/2003), 2nd ed., prep. by Volker Kaibel, Victor Klee, and [134] M. Jirstrand, Cylindrical algebraic decomposition — an in- Günter M. Ziegler; first publ. 1967 [164]. troduction, Tech. Rep. LiTH-ISY-R-1807, Linköping Uni- [111] R. T. Rockafellar, Convex Analysis (Princeton University versity, Linköping, Sweden (1995), http://www.control. Press, Princeton, 1970). isy.liu.se/publications/doc?id=164. [112] E. M. Alfsen, Compact Convex Sets and Boundary Integrals [135] P. B. Slater, Qubit-qubit and qubit-qutrit separability functions (Springer-Verlag, Berlin, 1971). and probabilities (2007), arxiv eprint quant-ph/0702134. [113] A. Brøndsted, An Introduction to Convex Polytopes (Springer- [136] C. E. Shannon, A mathematical theory of communica- Verlag, Berlin, 1983). tion, Bell Syst. Tech. J. 27, 379–423, 623–656 (1948), [114] R. Webster, Convexity (Oxford University Press, Oxford, http://cm.bell-labs.com/cm/ms/what/shannonday/ 1994). paper.html, http://www.cparity.com/it/demo/ [115] J. K. Moser, On the volume elements on a manifold,Trans. external/shannon.pdf. Am. Math. Soc. 120(2), 286–294 (1965). [137] C. E. Shannon, Communication in the presence of noise,Proc. [116] W. Rudin, Principles of Mathematical Analysis (McGraw- IRE 37(1), 10–21 (1949), repr. in [165]. Hill, New York, 1953/1976), 3rd ed., first publ. 1953. [138] D. Middleton, An Introduction to Statistical Communication [117] W. Rudin, Real and Complex Analysis (McGraw-Hill, Lon- Theory (McGraw-Hill Book Company, New York, 1960). don, 1970). [139] I. Csiszár and J. Körner, Information Theory: Coding Theo- 18

rems for Discrete Memoryless Systems (Academic Press, New bridge, 1971). York, 1981). [154] E. T. Jaynes, Information theory and statistical mechanics, [140] T. M. Cover and J. A. Thomas, Elements of Information The- Phys. Rev. 106(4), 620–630 (1957), http://bayes.wustl. ory (John Wiley & Sons, New York, 1991). edu/etj/node1.html, see also [3]. [141] A. Peres, What’s wrong with these observables?, Found. Phys. [155] E. T. Jaynes, Probability Theory: With Applications in Science 33(10), 1543–1547 (2003), arxiv eprint quant-ph/0207020. and Engineering: A Series of Informal Lectures (1954–1974), [142] I. Csiszár, Sanov property, generalized I-projection and a con- http://bayes.wustl.edu/etj/science.pdf.html; lec- ditional limit theorem, Ann. Prob. 12(3), 768–793 (1984). ture notes written 1954–1974; earlier version of [11]. [143] I. Csiszár and P. C. Shields, Information theory and statis- [156] E. T. Jaynes, Probability Theory in Science and Engineering tics: A tutorial, Foundations and Trends in Communications (Socony-Mobil Oil Company, Dallas, 1959), http://bayes. and Information Theory 1(4), 417–528 (2004), http://www. wustl.edu/etj/node1.html; see also [155]. renyi.hu/~csiszar/. [157] W. T. Grandy, Jr. and L. H. Schick, eds., Maximum Entropy [144] F. James, Monte Carlo theory and practice, Rep. Prog. Phys. and Bayesian Methods: Laramie, Wyoming, 1990 (Kluwer 43(9), 1145–1189 (1980). Academic Publishers, Dordrecht, 1991). [145] L. Stewart, Bayesian analysis using Monte Carlo integration [158] C. A. Hooker, ed., Physical Theory as Logico-Operational — a powerful methodology for handling some difficult prob- Structure (D. Reidel Publishing Company, Dordrecht, 1979). lems, The Statistician 32(1/2), 195–200 (1983). [159] J. Skilling, ed., Maximum Entropy and Bayesian Methods: [146] S. LaValle, K. J. Moroney, and S. A. Hutchinson, Methods for Cambridge, England, 1988 (Kluwer Academic Publishers, numerical integration of high-dimensional posteriordensities Dordrecht, 1989). with application to statistical image models, IEEE Trans. Im- [160] A. Y. Khrennikov, ed., Quantum Theory: Reconsideration of age Process. 6(12), 1659–1672 (1997). Foundations — 2 (Växjö University Press, Växjö, Sweden, [147] I. H. Sloan and H. Wo´zniakowski, When are quasi-Monte 2004). Carlo algorithms efficient for high dimensional integrals?,J. [161] A. Månsson, P. G. L. Porta Mana, and G. Björk, Numerical Complex. 14(1), 1–33 (1998). Bayesian state assignment for a three-level quantum system. [148] E. Novak, High dimensional integration, Adv. Comput. Math. I. Absolute-frequency data; constant and Gaussian-like priors 112(1), 1–2 (2000). (2006), arxiv eprint quant-ph/0612105; see also [83]. [149] P. B. Slater, Hall normalization constants for the Bures vol- [162] J. C. Baez, Errata for “The octonions”, Bull. Am. Math. Soc. umes of the n-state quantum systems,J.Phys.A32(47), 8231– 42(2), 213 (2004). 8246 (1999), arxiv eprint quant-ph/9904101. [163] M. Byrd, Erratum: “Differential geometry on SU(3) with ap- [150] P. B. Slater, Bures geometry of the three-level quantum sys- plications to three state systems”, J. Math. Phys. 41(2), 1026– tems, J. Geom. Phys. 39(3), 207–216 (2001), arxiv eprint 1030 (2000), see [100]. quant-ph/0008069; see also [151]. [164] B. Grünbaum, Convex Polytopes (John Wiley & Sons, Lon- [151] P. B. Slater, Bures geometry of the three-level quantum don, 1967), second ed. [110]. systems. II (2001), arxiv eprint math-ph/0102032;see [165] C. E. Shannon, Communication in the presence of noise,Proc. also [150]. IEEE 86(2), 447–457 (1998), repr. of [137], with an introduc- [152] H.-J. Sommers and K. Zyczkowski,˙ Bures volume of the set tion [166]. of mixed quantum states,J.Phys.A36(39), 10083–10100 [166] A. D. Wyner and S. Shamai, Introduction to “Communication (2003), arxiv eprint quant-ph/0304041. in the presence of noise”, Proc. IEEE 86(2), 442–446 (1998), [153] P. McMullen and G. C. Shephard, Convex Polytopes and the see [165]. Upper Bound Conjecture (Cambridge University Press, Cam- 19

x8 1133 0.5

100 001 000 x3 1 0.5 0.5 1 010

0.5

1

x1 x2 x4 x5 x6 x7 0 22

Figure 5: Bloch vectors of the assigned statistical operator for prior knowledge Ico and absolute-frequency triples with N = 0andN = 1, computed by numerical integration. The large triangle in the figures is the two-dimensional section of the set B8 along the plane Ox3 x8.The numerical-integration uncertainty in the x3 and x8 components is ±0.005. In the case of no data (N = 0), the statistical operator assigned on the basis of the prior knowledge Ico alone is the “completely mixed” one I3/3. Note that that all the components of all four points have been determined by numerical integration, even those that can be exactly determined by symmetry arguments. Within the given uncertainties, numerical computations yielded the exact results. 20

x8 11 33 0.5

200 101002

x3 1 0.5 110 011 0.5 1

020

0.5

1

x1 x2 x4 x5 x6 x7 0 22

Figure 6: Bloch vectors of the assigned statistical operator for prior knowledge Ico and absolute-frequency triples with N = 2, computed by numerical integration. The large triangle in the figures is the two-dimensional section of the set B8 along the plane Ox3 x8. The numerical- integration uncertainty in the x3 and x8 components is ±0.0025. Note that that all the components of all six points have been determined by numerical integration, even those that can be exactly determined by symmetry arguments. Within the given uncertainties, numerical computations yielded the exact results. 21

x8 1133 0.5

201 102 300 003 111 x3 1 0.5 210 012 0.5 1

120 021

030

0.5

1

x1 x2 x4 x5 x6 x7 0 22

Figure 7: Bloch vectors of the assigned statistical operator for prior knowledge Ico and absolute-frequency triples with N = 3, computed by numerical integration. The large triangle in the figures is the two-dimensional section of the set B8 along the plane Ox3 x8. The numerical- integration uncertainty in the x3 and x8 components is ±0.005. Note that that all the components of all ten points have been determined by numerical integration, even those that can be exactly determined by symmetry arguments. Within the given uncertainties, numerical computations yielded the exact results. 22

x8 1133 0.5

x3 1 0.5 010 0.5 1 020 030 040 050 060 0.5 070

1

x1 x2 x4 x5 x6 x7 0 22

Figure 8: Bloch vectors of the assigned statistical operator for prior knowledge Ico and absolute-frequency triples of the form (0, N, 0), with N = 1, 2, 3, 4, 5, 6, 7, computed by numerical integration. The large triangle in the figures is the two-dimensional section of the set B8 along the plane Ox3 x8. The numerical-integration uncertainty in the x3 and x8 components is ±0.015. Only the x8 component was determined by numerical integration; the x3 vanishes for symmetry reasons. 23

x8 1133 0.5

x3 1 0.5 0.5 1

100 001   0.5 000 010

1

x1 x2 x4 x5 x6 x7 0 22

Figure 9: Bloch vectors of the assigned statistical operator for prior knowledge Iga and absolute-frequency triples with N = 0andN = 1, B computed by numerical integration. The large triangle in the figures is the two-dimensional√ section of the set 8 along the plane Ox3 x8.The prior knowledge is represented by a Gaussian-like distribution of “breadth” s = 1/(2 (2) centred on the pure statistical operator |2 2|;see§4.√ The small circular arc is the locus of the Bloch vectors (on the plane) at a distance |x − xˆ| = s from the vectorx ˆ  (0, 0, 0, 0, 0, 0, 0, −2/ 3) corresponding to the statistical operator |2 2|. In the case of no data (N = 0), the statistical operator assigned on the basis of the prior knowledge Iga alone lies in between the completely mixed one and the pure one |2 2|.

Paper I

arXiv:quant-ph/0701087v2 16 Jan 2007 fanncntn,pirproba prior non-constant, a of aemaueetstain h esnfrdigti sthat is this doing the for for reason [2] The situation. method measurement entropy same maximum com- by Jaynes’ to obtained using those us with enables instead operators it statistical to since assigned interesting data our also pare of is kind It particular this complex frequencies. study more absolute us of example to mere an given than constitutes been data it have that kind, may is this it form, of this that in data one obvious studying the for than reason other A data. value average limit the in analytically com- is partly when knowledge and prior numerically and partly data puted value average an the out- and of coding measurement 0, form the 1, the particular to in in associated comes, be being values instead of will average data situation, measurement measurement same the the but consider will we Here tems. on performed ments of paper outcomes first the consiste In data considered. measurement the data measurement of type di the in main The in used paper. etc, this nomenclature, formulas, on theory, of background, references account the and complete discussions and explanations, detailed motivations, more pa- a the first for paper the first in the as to scenario ourselves repeating same avoid system. to the per, quantum consider three-level will we a Since to methods assignment quantum- Bayesian state of applications numerical concrete of ples ∗ lcrncades [email protected] address: Electronic nti ae ecniu u w-atsuy[]wt exam- with [1] study two-part our continue we paper this In N →∞ ueia aeinqatmsaeasgmn o he-ee unu system quantum three-level a for assignment quantum-state Bayesian Numerical N ASnmes 03.67.-a,02.50.Cw,02.50.Tt,05.30.-d,02.60.-x numbers: PACS iesos nacmue.Temaueetdt oss nteaeaeo ucm ausof values outcome of eight average (in the integration in numerical consist through data partly measurement analytically, data The partly measurement various computer. computed of a is evidence the on knowledge on dimensions) prior assigned of operator statistical kinds The and system. quantum three-level a to ttsia prtr bandwt h rttoknso rosaecmae ihteoeotie yJaynes’ by obtained assigned one The the with system. compared the are of situation. priors measurement preparation of same likely the kinds the for two reflecting about method first thus knowledge entropy the and maximum prior operator, with useful statistical the obtained pure operators; has operators a statistical one of statistical on set which centred the in distribution on represented measure Gaussian-like situation one natural a a the another by as operators; represented proposed statistical is been of has prior set which last Slater, the by of studied structure prior convex a the by of respect in constant distribution large- the o emn rjciemaueet efre on performed measurements projective Neumann von o osat n lofrtodi two for also and constant, a for , ntecmainpprtecs fmaueetdt ossigi bouefeunisi considered. is frequencies absolute in consisting data measurement of case the paper companion the In hsppro paper This dnia o emn rjciemeasure- projective Neumann von identical I vrg-au aawt osat asinlk,adaSae prior Slater a and Gaussian-like, a constant, a with data Average-value II. .INTRODUCTION 1. N N ff ii ilb osdrd he id fpirkoldeaeue:oerpeetdb plausibility a by represented one used: are knowledge prior of kinds Three considered. be will limit dnial rprdtrelvlsys- three-level prepared identically rnebtentetoppr lies papers two the between erence ugiaTkik ösoa,Iajrsaa 2 E144 tchl,Sweden Stockholm, 40 SE-164 22, Isafjordsgatan Högskolan, Tekniska Kungliga ff r xmlso oceenmrclapiain fBysa unu-tt sinetmethods assignment quantum-state Bayesian of applications numerical concrete of examples ers iiydsrbto,addi and distribution, bility − naslt rqece fthe of frequencies absolute in d .Tesaitcloeao en- operator statistical The 1. eteeoerfrtereader the refer therefore we .Månsson, A. ff rn kinds erent ∗ Dtd 4Jnay2007) January 14 (Dated: .G .PraMn,adG Björk G. and Mana, Porta L. G. P. ff erent N dnial rprdthree- prepared identically olwn kind: following itcloeao steoeotie yuigtemaximum the using sta- by same the obtained to method. one lead entropy would the that as prior operator the find tistical to be assignment, try quantum-state could so, Bayesian if method of and this case special not a or as whether seen investigate to want we nti ae esuydata study we paper this In p • • ( ρ h esrmn data measurement The ueetpromdon performed surement nfis ae,ie ro luiiiydistribution plausibility prior a i.e. paper, first in n[,§34;adashrclysmerc Gaussian- symmetric, spherically a explained distribution and prior sense like 3,4]; the § in [1, operators, of in statistical structure convex of the set of the respect in constant is which vnNuanmaueet)hvn he possi- eigenprojectors outcomes three the distinct having measurement’) ble Neumann ‘von extreme non-degenerate the (i.e., by measure represented positive-operator-valued is measurement The tems. w fthem, of Two he di Three large very of case values outcome associated their by labelled are eigenprojectors N | I ucm ausof values outcome ga )d { ρ 1 ff , p = 0 ( rn id fpirknowledge prior of kinds erent .TEPEETSTUDY PRESENT THE 2. ρ , g − | ga I 1 co I ( } co ρ epciey ecnie h limiting the consider We respectively. , )d ee ytm.I particular In systems. level )d and N ρ . ρ {| = 1 ∝ I

N ga g D exp 1 co { D ,arethesameasthosegiven ‘1’ N | ntne ftesm mea- same the of instances n ro knowledge prior and , ( ρ | 0 oss nteaeaeof average the in consist , dnial rprdsys- prepared identically )d

− ‘2’ 0 N tr[( ρ | , , identical ‘3’ |− ∝ ρ 1 s d − } 2 − ρ ρ , ˆ ersne by represented ) 1 2 |} ] ,wherethe I d r used. are ρ I , fthe of (1) (2) 2

centred on the statistical operator ρˆ . The latter prior ex- We will consider the general situation in which the data D presses some kind of knowledge that leads us to assign consists in the knowledge that the average valuem ¯ in N repe- a higher plausibility to regions in the vicinity of ρˆ .For titions of the measurement lies in a set Υ; this prior we consider two examples, when ρˆ = |1 1| = ∈ Υ . and ρˆ = |0 0|.1 D ˆ [¯m ] (5)

The third kind of prior knowledge, Isl, is represented by Such kind of data arise when the the measurements is af- the prior plausibility distribution fected by uncertainties and is moreover “coarse-grained” for practical purposes, so that not precise average values are ob- 2d+1 p(ρ| Isl)dρ = gsl(ρ)dρ ∝ (det ρ) dρ, (3) tained but rather a region of possible ones. On the evi- dence of D we can update the prior plausibility distribution the so called “Slater prior” for a d-level system, which g(ρ)dρ  p(ρ| I)dρ. By the rules of plausibility theory2 has been proposed as a candidate for being the appro- p(D| ρ) g(ρ)dρ priate measure on the set of statistical operators. [3] p(ρ| D ∧ I)dρ = , (6) | ρ ρ ρ S p(D ) g( )d Thepaperisorganisedasfollows:In§3wepresentthe reasoning leading to the statistical-operator-assignment for- where S is the set of all statistical operators. mulae in the case of average value data, for finite N and in The plausibility of obtaining a particular sequence of out- the limit when N →∞. We arrived at the same formulae (as comes is special cases of formulae applicable to generic, not necessar- r E ,...,E | ρ = E ρ Ni , ily quantum-theoretical systems) in a series of papers [4, 5, 6]. p i1 iN [tr i ] (7) i=1 In § 4 we present the particular case studied in this paper and give the statistical-operator-assignment formulae in this case, with the convention, here and in the following, that only fac- > introduce the Bloch vector parametrisation, present the cal- tors with Ni 0 are to be multiplied over, and where we have culations by symmetry arguments and by numerical integra- used that tr E iρ = p E i| ρ and (N1, .., Nr)  N¯ are the ab- tion, discuss the result and in some cases compare it with that solute frequencies of appearance of the r possible outcomes obtained by the maximum entropy method. Finally, the last (naturally, Ni 0and iNi = N). Since the exact order of section summarises and discusses the main points and results. the sequence of outcomes is unimportant and only the abso- lute frequencies of appearance N¯ matter, the plausibility of the absolute frequencies N¯ in N measurements is

3. STATISTICAL OPERATOR ASSIGNMENT r [tr E ρ ]Ni p N¯ | ρ = N! i . (8) N ! 3.1. General case i=1 i Nr ¯ Define N as the set of all absolute frequencies N,forfixedN Again we assume there is a preparation scheme that pro- and r. By the rules of plausibility theory we then have that duces quantum systems always in the same ‘condition’ — the same ‘state’ — where each condition is associated with a sta- p D| ρ = p D|N¯ ∧ ρ)p N¯ |ρ). (9) tistical operator. Suppose we come to know that N measure- ¯ ∈Nr N N ments, represented by the N positive-operator-valued mea- (k) sures {E μ : μ = 1,...,rk}, k = 1,...,N, are or have been Given that we know N¯ , we can with certainty tell if N¯ corre- performed on N systems for which our knowledge I holds. In sponds to an average value this paper we will be analysing the case when the data is an r average of a number of outcome values and it will therefore m¯ ≡ Nimi/N (10) be natural to limit ourselves to the situation when the N mea- i=1 surements are all instances of the same measurement. Thus, (k) Υ for all k = 1,...,N, {E μ } = {E μ }. that belongs to the set , and knowledge of the statistical op- ρ | ¯ ∧ ρ = Let us say that the outcomes i1,...,ik,...,iN are or were erator is here irrelevant. We thus have that p D N ) obtained. Since every outcome is associated to an outcome p m¯ ∈ Υ|N¯ ) = 1ifN¯ ∈ φΥ and 0 otherwise, where we have value mi, the average of all outcome values is defined φ  { ¯ ∈ Nr | / ∈ Υ}. N Υ N N iNimi N (11) ≡ / . m¯ mik N (4) k=1

2 We do not explicitly write the prior knowledge I whenever the statistical operator appears on the conditional side of the plausibility; i.e., p(·| ρ)  1 Note that the case ρˆ = |−1 −1| is equivalent to the case with ρˆ = |1 1|. p(·| ρ, I). 3

Using this together with equations (8) and (9) we obtain: Further it is also shown that if Υ∞ degenerates into a single average valuem ¯ , the expression above becomes3 r {E ρ} Ni |ρ = ¯ |ρ = [tr i ] . p(D ) p(N ) N! (12) p[ρ| m¯ ∧ I]dρ ∝ p(ρ| I) δ[ iqi(ρ) mi − m¯ ]dρ. (18) Ni! N¯ ∈φΥ N¯ ∈φΥ i=1 This is an intuitively satisfying result, since in the limit when →∞ Inserting this into equation (6) we finally obtain: N we would expect that it is only those statistical opera- tors ρ whose expectation value iqi(ρ) mi is equal to the mea- r N [tr{E i ρ}] i sured average valuem ¯ that could have been the case. The data g(ρ)dρ Ni! single out a set of statistical operators, and these are then given N¯ ∈φΥ i=1 p(ρ|D ∧ I)dρ = . (13) weight according to the prior p(ρ| I)dρ = g(ρ)dρ, specified r [tr{E ρ}]Ni i g(ρ)dρ by us. = Ni! By normalising the posterior plausibility distribution in N¯ ∈φΥ S i 1 equation (18), the assigned statistical operator in equation (14) ˜ is then given by We saw in the first paper that generic knowledge I can be rep- resented by or “encoded in” a unique statistical operator: ρ ρ δ ρ − ρ g( ) [ iqi( ) mi m¯ ]d ρ  ρ ρ| ˜ ρ. ρ = S . I˜ p( I)d (14) D∧I (19) S g(ρ) δ[ iqi(ρ) mi − m¯ ]dρ The statistical operator encoding the joint knowledge D ∧ I is S thus given by r N [tr{E iρ}] i 4. AN EXAMPLE OF STATE ASSIGNMENT FOR A ρ g(ρ)dρ THREE-LEVEL SYSTEM = Ni! N¯ ∈φΥ S i 1 ρ = . (15) D∧I r [tr{E ρ}]Ni 4.1. Three-level case i g(ρ)dρ N ¯ ∈φ = i! N Υ S i 1 We will now consider the particular case studied in this paper. The preparation scheme concerns three-level quan- tum systems; the corresponding set of statistical operators 3.2. Large-N limit will be denoted by S3. We are going to consider the case when the number of measurements N is very large and in Let us now summarise some results obtained in [6] for the limit goes to infinity. The N measurements are all in- the case of very large N. Consider the general situation stances of the same measurement, namely a non-degenerate D in which each data set N consists in the knowledge that projection-valued measurement (often called ‘von Neumann the relative frequencies f ≡ ( f ):= (N /N) lie in a region (k) i i measurement’). Thus, for all k = 1,...,N, {E } = {E μ }  Φ = { | ∈ Υ } Υ μ N f [ i fi mi] N ,where N is a region in which the {|1 1|, |0 0|, |−1 −1|}, where the projectors, labelled by the average values lie (being such that ΦN has a non-empty inte- particular outcome values (m1, m2, m3) = (1, 0, −1) we have rior and its boundary has measure zero in respect of the prior chosen to consider here, define an orthonormal basis in Hilbert plausibility measure). Mathematically we want to see what →∞ space. All relevant operators will, quite naturally and advan- form the state-assignment formulae take in the limit N . tageously, be expressed in this basis. We have for example Consider a sequence of data sets {D }∞ with corresponding N N=1 that qμ (ρ) = tr(E μ ρ) = ρμμ,theμth diagonal element of ρ sequences of regions {Υ }∞ and {Φ }∞ , and assume the N N=1 N N=1 in the chosen basis. As data we are given that the average of regions converges (in a topological sense specified in [6]) to the measurement outcome values ism ¯ (more precisely in the regions Υ∞ and Φ∞ (the latter also with non-empty interior sense that Φ∞ degenerates into a single average valuem ¯ ). and with boundary of measure zero), respectively. Given that the statistical operator is ρ, the plausibility dis- tribution for the outcomes is 4.2. Bloch vector parametrisation and symmetries q(ρ) ≡ q (ρ) with q (ρ)  tr(E ρ). (16) i i i We will be using the same parametrisation of the statistical In [6] it is shown that operators as in the companion paper, i.e. in terms of Bloch ⎧ ⎨⎪ , ρ  Φ , ρ| ∧ ρ ∝ 0 if q( ) ∞ p( DN I)d ⎩⎪ p(ρ| I)dρ, if q(ρ) ∈ Φ∞, 3 Note that we have here, with abuse of notation, written p[ρ| m¯ ∧ I] instead ∗ →∞. of the more correct form p[ρ| (¯m = m¯ ) ∧ I], to avoid introducing another as N (17) variablem ¯ ∗ for the average value data. 4 vectors x. For a three-level system the Bloch vector expansion where the dependence of the average value and prior knowl- of a statistical operator ρ(x) is given by: edge is indicated within brackets. The assigned statistical op- erator will then given by 1 1 ρ x = I + 8 x λ , ( ) 3 j=1 j j (20) 8 3 2 1 1 Li[¯m, I] ρ ∧ = I + λ . (29) m¯ I 3 3 2 Z[¯m, I] i where i=1

xi = tr{λ i ρ(x)}≡ λ i ρ(x). (21) One sees directly from equations (27) and (28) that L3[¯m, I]/Z[¯m, I] = m¯ (Z[¯m, I] can never vanish, its integrand The Gell-Mann operators λ i are Hermitian and can therefore being positive and never identically naught). be regarded as observables. Note that our von Neumann mea- For the same reasons already accounted for in the first paper ρ surement corresponds to the observable we will not try to determine m¯ ∧I exactly, but also here com- pute it with a combination of symmetry considerations of B8 λ ≡| | + | |−|− − |. 3 1 1 0 0 0 1 1 (22) and numerical integration. For all three kinds of prior knowl- edge considered in this paper the same symmetry arguments ρ Hence, given a statistical operator (x), the following holds used in the companion paper also holds here, so again we have for the expectation value of the outcome values {1, 0, −1} for that Li[¯m, I]/Z[¯m, I] = 0foralli  3, 8 and any average value this particular measurement: −1 ≤ m¯ ≤ 1. The assigned Bloch vector is thus given by , , , , , , , , / , ρ (0 0 m¯ 0 0 0 0 L8[¯m I] Z[¯m I]). This means that m¯ ∧I lies λ i ρ(x) = iqi(x) mi = ρ11(x) − ρ33(x) = x3. (23) in the (x3, x8)-plane and it has, in the chosen eigenbasis, the diagonal matrix form Equation (18) thus becomes ⎛ , ⎞ ⎜ 1 + m¯ + √L8[¯m I] 00⎟ p[x| m¯ ∧ I]dx ∝ g(x) δ(x − m¯ )dx, (24) ⎜ 3 2 2 3Z[¯m,I] ⎟ 3 ⎜ , ⎟ ρ = ⎜ 0 1 − √L8[¯m I] 0 ⎟ . m¯ ∧I ⎜ 3 3Z[¯m,I] ⎟ ⎝⎜ , ⎠⎟ and the assigned statistical operator in equation (19) assumes 001 − m¯ + √L8[¯m I] 3 2 2 3Z[¯m,I] the form (30)

ρ(x) g(x) δ(x3 − m¯ )dx B 4.3. Numerical integration, results and the maximum entropy ρ = 8 , m¯ ∧I (25) method g(x) δ(x3 − m¯ )dx 4 B8 We have used numerical integration to compute L [¯m, I]/Z[¯m, I] for different prior knowledge and dif- B 8 where 8 is the set of all three-level Bloch vectors. This can be ferent values ofm ¯ . The result for a constant prior density rewritten in a form especially suited for numerical integration is shown in figure 1, where the blue curve (with bars indi- by computer, which we shall use hereafter: cating the numerical-integration uncertainties) is the Bloch ρ ff vector corresponding to m¯ ∧Ico plotted for di erent values of ρ δ − χ = 5 (x)g(x) (x3 m¯ ) B8 (x)dx x3 m¯ . C It is interesting to compare ρ ∧ with the statistical opera- ρ = 8 , m¯ Ico m¯ ∧I (26) tor obtained by the maximum entropy method [2] for the mea- δ − χB g(x) (x3 m¯ ) 8 (x)dx surement situation we are considering here. Given the expec-

C8 tation value M of a Hermitian operator M, corresponding to an observable M, the maximum entropy method assigns the χB B ⊂ where 8 (x) is the characteristic function of the set 8 C  [−1, 1]7 × − √2 , √1 . Using the Bloch vector expansion 8 3 3 in equation (20) we see that by computing the following set of ρ 4 Using quasi Monte Carlo-integration in Mathematica 5.2 on a PC (Pen- integrals we have determined m¯ ∧I : tium 4 processor, 3 GHz). The computation times are given in figures 1 to 4, and for more details on the numerical integration we again refer the ,  δ − χ , reader to the companion paper [1]. Li[¯m I] xi g(x) (x3 m¯ ) B8 (x)dx (27) 5 Note that we have for all three kinds of priors considered in this pa- C8 per computed L8[¯m, I]/Z[¯m, I] only for non-negative values of x3 = m¯ , x , x , x , x , x , x , x , x → ∈{ , .., } since by using the symmetry operation ( 1 2 3 4 5 6 7 8) where i 1 8 ,and (x6, x7, −x3, x4, −x5, x1, x2, x8) one can show that L8[¯m, I]/Z[¯m, I]isin- variant under a sign change ofm ¯ . Further, we have not computed L8[¯m, I]/Z[¯m, I]for¯m√ = ±1, since it follows from [1, eq. 17] that Z[¯m, I]  g(x) δ(x3 − m¯ ) χB (x)dx, (28) 8 L8[¯m, I]/Z[¯m, I] = 1/ 3 is the only possibility in this case (which one

C8 also realises by looking at the figures). 5 statistical operator to the system that maximises the von Neu- 5. CONCLUSIONS mann entropy S  − tr{ρ ln ρ} and satisfies the constraint {ρ } = ¯ tr M M . Having obtained an average value M from This was the second paper in a two-part study where the many instances of the same measurement performed on iden- Bayesian quantum-state assignment methods has been applied = ¯ tically prepared systems, one conventionally sets M M. to a three-level system, showing that the numerical implemen- In our case the operator M would be identified as the Her- tation is possible and simple in principle. This paper should mitian operator λ 3 and M¯ asm ¯ . Hence the maximum entropy not only be of theoretical interest but also be of use to ex- method corresponds here to an assignment of the statistical perimentalists involved in state estimation. We have anal- operator that maximises S among all statistical operators sat- ysed the situation where we are given the average of outcome isfying λ 3 = m¯ , and this statistical operator is given by values from N repetitions of identical von Neumann projec- tive measurements performed on N identically prepared three- −μ λ (¯m) 3 level systems, when the number of repetitions N becomes very ρ  e , ME −μ λ (31) ff tr{e (¯m) 3 } large. From this measurement data together with di erent kinds of prior knowledge of the preparation, a statistical oper- where ator can be assigned to the system. By a combination of sym- metry arguments and numerical integration we computed the √ ff −m¯ + 4 − 3¯m2 assigned statistical operator for di erent average values and μ(¯m):= ln . (32) for a constant, and also for two examples of a non-constant, 2(m ¯ + 1) prior probability distribution. The results were also compared with that obtained by the max- This could be compared with the statistical operator ρ ob- m¯ ∧I imum entropy method. An interesting question is whether tained by instead using Bayesian quantum-state assignment, there exists a prior probability distribution that gives rise to and expressed in general form as in equation (25) it is seen to an assigned statistical operator which is in general identical instead be given by a weighted sum, with weight g(x)dx,of to the one given by the maximum entropy method, i.e. if the all statistical operators with λ = x = m¯ . 3 3 maximum entropy method could be seen as a special case of In the case of a constant prior one sees from figure 1 that Bayesian quantum-state assignment? In the case of a constant ρ ∧ is in general different from ρ (the red curve [without m¯ Ico ME and a “Slater prior” on the Bloch vector space of a three-level bars]). This means for instance that, if the maximum entropy system we saw that the assigned statistical operator did not method is a special case of Bayesian quantum-state assign- agree with the one given by the maximum entropy method. It ment, the statistical operator obtained by the former method would therefore be interesting to try other kinds of priors, in corresponds to a non-constant prior probability distribution particular “special” priors like the Bures one. B g(x)dx on 8 in the latter method. This conclusion in itself The generalisation of the present study to data involving dif- is perhaps not so surprising, but it raises an interesting ques- ferent kinds of measurement is straightforward. Of course, in B tion: Does there exist a (non-constant on 8) prior distribution the general case one has to numerically determine a greater g(ρ)dρ that one with Bayesian quantum-state assignment in number of parameters (the L j[¯m, I]) and therefore compute a general obtains the same assigned statistical operator as with greater number of integrals. the maximum entropy method? Post scriptum: During the preparation of this manuscript, A strong candidate is the “Bures prior” which has been pro- P. Slater kindly informed us that some of the integrals numer- posed as the natural measure on the set of all statistical opera- ically computed here and in the previous paper can in fact be tors (see e.g. [7, 8, 9, 10, 11]), but unfortunately it turns out to calculated analytically, using cylindrical algebraic decompo- be difficult to do numerical integrations on it due to its compli- sition [12, 13, 14, 15, 16] with a parametrisation introduced by cated functional form, so we have not computed the assigned Bloore [17]; cf. Slater [18]. This is true, e.g., for the integrals statistical operator in this case. Another interesting candidate involving the constant and Slater’s priors. By this method is the “Slater prior” [3], which have also been suggested to Slater has also proven the exact validity of eq. (52) of our pre- be the natural measure on the set of all statistical operators, vious paper [1]. We plan to use and discuss more extensively and the computed assigned statistical operator in this case is this method in later versions of these papers. shown in figure 2. One can see directly from the figure that although it is similar to the curve obtained by the maximum entropy method, we have found them to differ. The computed assigned statistical operators for the Gaussian- Acknowledgements like prior, centred on the projectors ρˆ = |1 1| and ρˆ = |0 0| with “breadth” s = 1/4, are shown in figures 3 and 4, respec- We cordially thank P. Slater for introducing us to cylindrical tively. Apart from being symmetric under a sign change ofm ¯ , algebraic decomposition and showing how it can be applied to as already have been noted in footnote 5, one can also show the integrals considered in our papers. AM thanks Professor that L8[¯m, Iga]/Z[¯m, Iga] does not depend on thex ˆ3-coordinate Anders Karlsson for encouragement. PM thanks Louise for of the statistical operator the prior is centred on. continuous and invaluable support, and the staff of the KTH 6

Biblioteket for their irreplaceable work. [8] P. B. Slater, Bures geometry of the three-level quantum sys- (Note: ‘arxiv eprints’ are located at http://arxiv.org/.) tems, J. Geom. Phys. 39(3), 207–216 (2001), arxiv eprint quant-ph/0008069; see also [9]. [9] P. B. Slater, Bures geometry of the three-level quantum systems. II (2001), arxiv eprint math-ph/0102032; see also [8]. [10] P. B. Slater, Hall normalization constants for the Bures volumes of the n-state quantum systems 32 [1] A. Månsson, P. G. L. Porta Mana, and G. Björk, Numeri- ,J.Phys.A (47), 8231–8246 quant-ph/9904101 cal Bayesian quantum-state assignment for a three-level quan- (1999), arxiv eprint . Applications of quantum and classical Fisher infor- tum system. I. Absolute-frequency data with a constant and a [11] P. B. Slater, mation to two-level complex and quaternionic and three-level Gaussian-like prior (2006), arxiv eprint quant-ph/0612105. complex systems 37 [2] E. T. Jaynes, Information theory and statistical me- , J. Math. Phys. (6), 2682–2693 (1996). Cylindrical alge- chanics. II, Phys. Rev. 108(2), 171–190 (1957), [12] D. S. Arnon, G. E. Collins, and S. McCallum, braic decomposition I: The basic algorithm http://bayes.wustl.edu/etj/node1.html, see also , SIAM J. Comput. 13 Information theory and statistical mechan- (4), 865–877 (1984), see also [19]. [13] J. H. Davenport, Y. Siret, and E. Tournier, Computer Algebra: ics, Phys. Rev. 106(4), 620–630 (1957), http://bayes.wustl.edu/etj/node1.html Systems and Algorithms for Algebraic Computation (Academic . / [3] P. B. Slater, Reformulation for arbitrary mixed states of Jones’ Press, London, 1987 1993), 2nd ed., transl. by A. Davenport Bayes estimation of pure states, Physica A 214(4), 584–604 and J. H. Davenport; first publ. in French 1987. Algorithmic Algebra (1995). [14] B. Mishra, (Springer-Verlag, New York, 1993). [4] P. G. L. Porta Mana, A. Månsson, and G. Björk, From “plausi- Cylindrical algebraic decomposition bilities of plausibilities” to state-assignment methods: I. “Plau- [15] M. Jirstrand, sibilities of plausibilities”: an approach through circumstances — an introduction, Tech. Rep. LiTH-ISY-R-1807, (2006), arxiv eprint quant-ph/0607111. Linköping University, Linköping, Sweden (1995), http://www.control.isy.liu.se/publications/doc?id=164 [5] P. G. L. Porta Mana, A. Månsson, and G. Björk, From “plausi- . bilities of plausibilities” to state-assignment methods: II. In- [16] C. W. Brown, Simple CAD construction and its applications,J. Symbolic Computation 31(5), 521–547 (2001). duction and a challenge to de Finetti’s theorem (2006), in [17] F. J. Bloore, Geometrical description of the convex sets of states preparation. for systems with spin- 1 and spin-1, J. Phys. A 9(12), 2059–2067 [6]P.G.L.PortaMana,A.Månsson,andG.Björk,From “plau- 2 sibilities of plausibilities” to state-assignment methods: III. In- (1976). terpretation of “state” and state-assignment methods (2006), in [18] P. B. Slater, Two-qubit separability probabilities and beta func- preparation. tions (2006), arxiv eprint quant-ph/0609006. [7] M. S. Byrd and P. B. Slater, Bures measures over the spaces [19] D. S. Arnon, G. E. Collins, and S. McCallum, Cylindrical alge- of two- and three-dimensional density matrices, Phys. Lett. A braic decomposition II: An adjacency algorithm for the plane, 283(3–4), 152–156 (2001), arxiv eprint quant-ph/0004055. SIAM J. Comput. 13(4), 878–889 (1984), see also [12]. 7

x8 1111 0.5

I 3 x3 Λ3 1 0.5 0.5 1

0.5

1 x1 x2 x4 x5 x6 x7 0 00

Figure 1: Bloch vectors of the assigned statistical operator for prior knowledge Ico, computed by numerical integration for different average valuesm ¯ ≡ x3 ≡ λ 3 (connected by the blue curve [with bars]). The red curve (without bars) is the statistical operator given by the maximum entropy method also as a function of the average valuem ¯ ≡ x3 ≡ λ 3 . The large triangle is the two-dimensional cross section of the set B8 along the plane Ox3 x8. The maximum numerical-integration uncertainty in the x8 component is ±0.01. Note that only the ten points for 0 ≤ m¯ ≤ 0.9 have been determined by numerical integration, since the nine points for −0.9 ≤ m¯ ≤−0.1 can√ be exactly determined from the former by symmetry arguments. The endpoints corresponding tom ¯ = ±1 were set manually, since x8 = 1/ 3 is the only possibility in this case. Within the given uncertainties, numerical computations yielded the exact results. The computation was done on a PC (Pentium 4 processor, 3 GHz) and the computation time was 15 min. 8

x8 1111 0.5

I 3 x3 Λ3 1 0.5 0.5 1

0.5

1 x1 x2 x4 x5 x6 x7 0 00

Figure 2: Bloch vectors of the assigned statistical operator for prior knowledge Isl, computed by numerical integration for different average valuesm ¯ ≡ x3 ≡ λ 3 (connected by the blue curve [with bars]). The red curve (without bars) is the statistical operator given by the maximum entropy method also as a function of the average valuem ¯ ≡ x3 ≡ λ 3 . The large triangle is the two-dimensional cross section of the set B8 along the plane Ox3 x8. The maximum numerical-integration uncertainty in the x8 component is ±0.02. Note that only the ten points for 0 ≤ m¯ ≤ 0.9 have been determined by numerical integration, since the nine points for −0.9 ≤ m¯ ≤−0.1 can√ be exactly determined from the former by symmetry arguments. The endpoints corresponding tom ¯ = ±1 were set manually, since x8 = 1/ 3 is the only possibility in this case. Within the given uncertainties, numerical computations yielded the exact results. The computation was done on a PC (Pentium 4 processor, 3 GHz) and the computation time was 250 min. 9

x8 1111 0.5

I 3 x3 Λ3 1 0.5 0.5 1

0.5

1 x1 x2 x4 x5 x6 x7 0 00

Figure 3: Bloch vectors of the assigned statistical operator for prior knowledge Iga, computed by numerical integration for different average valuesm ¯ ≡ x3 ≡ λ 3 (connected by the curve). The large triangle is the two-dimensional cross section of the set B8 along the plane Ox3 x8. The prior knowledge is represented by a Gaussian-like distribution of “breadth” s = 1/4 centred on the pure statistical operator |1 1|.The√ small circular arc is the locus of the Bloch vectors (on the plane) at a distance |x − xˆ| = s from the vectorx ˆ  (0, 0, 1, 0, 0, 0, 0, 1/ 3) corresponding to the statistical operator |1 1|. The numerical-integration uncertainty in the x8 component is ±0.016. Note that only the ten points for 0 ≤ m¯ ≤ 0.9 have been determined by numerical integration, since the nine points for −0.9 ≤ m¯ ≤−0.1 can√ be exactly determined from the former by symmetry arguments. The endpoints corresponding tom ¯ = ±1 were set manually, since x8 = 1/ 3 is the only possibility in this case. Within the given uncertainties, numerical computations yielded the exact results. The computation was done on a PC (Pentium 4 processor, 3 GHz) and the computation time was 30 min. 10

x8 1111 0.5

I 3 x3 Λ3 1 0.5 0.5 1

0.5

1 x1 x2 x4 x5 x6 x7 0 00

Figure 4: Bloch vectors of the assigned statistical operator for prior knowledge Iga, computed by numerical integration for different average valuesm ¯ ≡ x3 ≡ λ 3 (connected by the curve). The large triangle is the two-dimensional cross section of the set B8 along the plane Ox3 x8. The prior knowledge is represented by a Gaussian-like distribution of “breadth” s = 1/4 centred on the pure statistical operator |0 0|.The√ small circular arc is the locus of the Bloch vectors (on the plane) at a distance |x − xˆ| = s from the vectorx ˆ  (0, 0, 0, 0, 0, 0, 0, −2/ 3) corresponding to the statistical operator |0 0|. The numerical-integration uncertainty in the x8 component is ±0.02. Note that only the ten points for 0 ≤ m¯ ≤ 0.9 have been determined by numerical integration, since the nine points for −0.9 ≤ m¯ ≤−0.1 can√ be exactly determined from the former by symmetry arguments. The endpoints corresponding tom ¯ = ±1 were set manually, since x8 = 1/ 3 is the only possibility in this case. Within the given uncertainties, numerical computations yielded the exact results. The computation was done on a PC (Pentium 4 processor, 3 GHz) and the computation time was 425 min. Paper J

arXiv:quant-ph/0310193 v2 9 Jun 2004 bei h atraents peea,o vnvisible, even or ephemeral, Schr¨ adjective so in the not cat are the latter refer like unaccept- states the is the if but when objects, able tolerated ephemeral and be imperceptible can to This paradox- nature. or ical counterintuitive different and mysterious as- quite a then “coexistence” have sumes their can two characteristics, these the incompatible often since of and more “coexistence” and is of states, sort it original a However, re- as (re)presented statistical former. sadly precise the some with having lationships state different third, osiun ttssol o eegnttso h mea- the of two eigenstates the be that not should implies are states of this constituent positions collection Operationally, mean a their or uncorrelated. from deviation states include whose coherent examples particles spin and semiclassical, states, are states as coherent Such to mechan- states). referred quantum “dead” as often and state consists “alive” classical (the superposition as allow Sec- an ics the in states be pho- etc. two should a of, magnetometer the ruler, of a a each voltmeter, eye, state ondly, a the system’s film, the as such infer tographic “cat”). consist meter to (the classical possible or subsystems through be excited, quantized should of highly it number Ideally, be large should a of study under tem Schr¨ in points crucial ooiiaewe eapn h adjective of the concept append the seem we to peculiarities when the originate pe- of to these Some inter- study features. on to statistical discussion culiar efforts a been experimental has to which by and rise accompanied today gave still lasts This which issues arrange- pretative 3]. new experimental 2, of peculiar [1, existence ments the in suggests properties theory statistical new the that ticed † ∗ URL: lcrncades [email protected] address: Electronic lcrncades [email protected]; address: Electronic uepsto ftogvnsae s eysml,a simply, very is, states given two of superposition A ou,ti detv mle w hns hc r also are which things, two implies adjective this us, To onatrtebrho unu ehnc twsno- was it mechanics quantum of birth the after Soon http://www.ele.kth.se/QEO/ hsehne estvt,o itreec tlt” a hnb sda iecieinamong criterion size a as used be superposition then a may that utility”, fact the “interference on or based superpositions. sensitivity, app is interferometric enhanced measure in The This sensitivity proposed. greater is presents experiments, recent in realized ASnmes 36.f 36.a 42.50.Dv 03.65.Ta, 03.65.Vf, numbers: PACS macroscopic noeainlmauet uniyteszso oe“arsoi unu superpositions”, quantum “macroscopic some of sizes the quantify to measure operational An .INTRODUCTION I. superposition dne’ xml 3 eeenters here — [3] example odinger’s oa nttt fTcnlg KH,Eetu 2,S-6 0Ksa Sweden Kista, 40 SE-164 229, Electrum (KTH), Technology of Institute Royal dne’ xml.Frty h sys- the Firstly, example. odinger’s iecieinfrmcocpcsproiinstates superposition macroscopic for criterion size A . eateto ireetoisadIfrainTechnology, Information and Microelectronics of Department . unrBj¨ Gunnar macroscopic ork Dtd ue2004) June 9 (Dated: ∗ n ir .Lc Mana Luca G. Piero and iain hnissproe osiun states. constituent superposed its than lications ae ntesuyo h iiaiisbtenasuper- 2 a between form similarities the the of of study position the on based macro- the at the observed to a usually level. related is are scopic are there which These states that states. (only) stating basis of to transfor- set equivalent “preferred” unitary is discon- global which The under invariant mations, are. the not states is large reduced nectivity how possible all checking ef- of by is the entropies This speaking, precisely, roughly particles. more achieved, quantum-correlated particles or of of number idea superposition, number fective “effective” The a the in count them. involved to of is it many a behind with shares and nature measures, non-operational entanglement with D¨ affinity by some recently, and, 23] this for [22, [24]. proposed al. Leggett been have by measures Some purpose is. it scopic oprn ihrdchrnetmso nageetre- by entanglement or speaking, times roughly decoherence established, either is comparing Similarity similar. 2 acntn,bti sas h oreo iclisi es- is in realized and, difficulties hitherto “macroscopic”, of most superpositions the source the the of also which is timating is devices, it but superconducting heavy-fascinating, to from interferometry range techni- variety which molecule these The experiments, in studied challenging phenomena interpre- cally issue. the our this of of difference test of and a understanding recent as a seen and for be tation(s) 22 can “macro- Ref. possible, as (see as superpositions 21] creating scopic” 20, of quest 19, 12, the 18, 11, in 17, 10, review) 16, 9, 15, is 8, 14, 7, situation 13, [6, the achieved efforts results experimental that The in- noteworthy said 5]. these in [4, be further, paradoxical so to just not or fortunately may the expose, It not to is terpretations. paper it this but of ways, purpose different in superpositions scopic macroscopic below.) of shown choice as experimentalist’s therefore superpositions, the and not prepare are to they difficult highly are of eigenstates superpositions excited (Moreover, non- for atypical objects. and ephemeral unnatural is this as observable sured | − φ h esr rpsdb D¨ by proposed measure The has [23], ‘disconnectivity’ called measure, Leggett’s ieetitrrtto col aeteiseo macro- of issue the face schools interpretation Different 1 − / 2 | φ ( | + 0 |

⊗ n ,adasadr tt fteform the of state standard a and 0, =  + | 1

⊗ n † ,assigfrwhich for assessing ), − 1 / 2 ( | φ quantitatively + re l 2] nta,is instead, [24], al. et ur

⊗ N + | φ n − hyaemost are they ,howmacro-

⊗ N ,where ), ret ur 2 sources; notably these two criteria lead to the same re- peaked distribution in the eigenvalue spectrum of the sult. Also in this case the measure depends on a choice preferred observable, centered around a given eigenvalue. of a preferred set of basis states. This distribution has a breadth, which is usually implied A characteristic common to these two measures is that by the semiclassical nature of the state, and implies the they emphasize the rˆole of the number of particles, or latter’s usefulness in an interferometric application with modes, participating in the superposition; in a sense, the preferred observable. If the probability distribution they associate the adjective ‘macroscopic’ only with these of each state is a Dirac delta the state is typically no features. However, though the number of particles is longer classical, or macroscopic, or interferometrically certainly an important factor, one may asks whether it useful in our sense. A semiclassical state typically has is the only one that should be associated to the word an eigenvalue spectrum with a width proportional to the ‘macroscopic’. Above, we have stated that we do not square root of its excitation or constituent particles. believe so. To clarify and justify the reason for this, Third, one imagines to make the superposition state consider a standard two-arm interferometer, and the two evolve under the operator generated by the preferred ob- states in which we have ten photons (with frequency in servable. If the state is a superposition, this evolution the visible spectrum) in one arm or, respectively, in the will produce a rapid oscillation between the state and other. Such states could be distinguished by the naked an orthogonal state (i.e., it will give rise to interference). eye [25, 26], and hence should be rightly called ‘macro- The oscillation frequency is higher than for the single su- scopic’, even though they engage only two modes (but ten perposed semiclassical states (due to their breadths), and quanta) Their superposition should thus, in our opinion, it would not be faster if the state were just a mixture. be a ‘macroscopic superposition’ but the disconnectivity The crucial point here is that the larger is the eigenvalue of this state is only 2, and the disconnectivity is indepen- separation between the two states, the higher is this os- dent of the state’s excitation N. cillation frequency. Conversely, in an application based Indeed, besides disconnectivity, Leggett has pro- on such an interference, the oscillation frequency is an posed another, “auxiliary” measure, the ‘extensive dif- indication of the state’s sensitivity as a probe, or its util- ference’ [22], whose idea has something in common with ity in an interferometric application, with respect to the the preceding remark. Roughly speaking, it is the differ- utility of the semiclassical states. ence between the expectation values for the superposed Hence, to appreciate if and how much this oscillation states of a particular (“extensive”) observable, the latter is due to the macroscopic quantum superposition, one being that for which this difference is maximal; the ex- compares it with the frequency of the oscillation due to pectation values are of course expressed with respect to the spread of the wave-function of each of the two states some “atomic” unit. in the superposition, i.e., one compares the oscillation frequency of the superposed state with that of the single states (which is equivalent to that of the corresponding II. THE GENERAL IDEA: INTERFERENCE mixture). UTILITY Let us remark, together with Leggett [28], that the issue about the word ‘macroscopic’ presents many sub- In line with the last argument, we propose an oper- jective elements — which are unavoidable. The purpose ationally defined measure based on the properties of a of this paper is thus not to eliminate these elements, but preferred observable, rather than on the definition of an to try to frame them in a more operational context: they effective number of particles, or modes. The idea is based will enter in the choice of the ‘preferred observable’ dis- on the following points: cussed above, but such a choice can be better motivated First, one tries to identify a ‘preferred’ observable on, e.g., applicational grounds. Here also enter in the which is especially related to the experiment in question. interpretation of the adjective ‘macroscopic’. The fact that such an observable must indeed be present is strongly suggested by the existence of preferred basis states, as noted above: without such a set of states there III. MATHEMATICAL FORMULATION would be no point in emphasizing superpositions, since every state can always be written as some superposition, A macroscopic superposition, in its simplest form, con- and every pure superposition as a single state.Notethat sists of a (pure) state for which the probability distribu- the adjective ‘preferred’ is not meant here in some vague tion for the preferred observable is bi-modal. (One can sense of “preferred by nature”, or in a sense analogous also imagine cases where the distribution is multi-modal, to Zurek’s “pointer observable and states” [27]; instead, but this case will not be treated here.) For what follows, it is meant to denote the observable which is most useful the exact form of the distribution is inconsequential, as for a physicist in a particular (interferometric) applica- long as it is reasonably smooth. To put what was just said tion, and hence depend on the experimental and research in more concrete dress, consider a Young double slit ex- context. periment. The pertinent preferred operator in this case Second, one considers two classical, macroscopic states is the position operatorx ˆ. A particle incident on the that can be separately characterized by a more or less double slit will have a bimodal probability distribution 3

P (x)=| x|ψ |2 at the exit side of the slit. (Here, it is as- sumed that origin of the bimodal probability distribution is a bimodal probability amplitude distribution. That is, we have a pure superposition state and not a mixed state.) Hence, if we expand the superposition state in the (complete) set of eigenvectors {|A |A ∈ R} of the preferred operator Aˆ, we will get |ψ = A|ψ |A dA = f(A)|A dA. (3.1) R R

As mentioned above, we assume that |f(A)|2 deviates Figure 1: A bimodal probability distribution P (A)=|f(A)|2 appreciably from zero only within an interval around two with peaks of width ΔA ≡ w centered at the values A1 and A values of Aˆ that we will denote A1 and A2 (see Fig. 1), 2. hence it can be expressed as

2 |f(A)| = c(A − A1)+c(A − A2), (3.2) between A1 and A2, the evolution into an orthogonal where c(A) is a (positive and appropriately normalized) state due to this factor may be significantly faster than roughly even function with a single maximum at the ori- the evolution due to the width ΔA of the probability gin A =0. distribution peaks. Therefore, a measure of the system’s A binary, equal, superposition has the property that “interferometric macroscopality” M is the inverse ratio it evolves into an orthogonal, or almost orthogonal state, between the interaction needed to evolve the system into upon a relative phase-shift of π between its two com- an orthogonal state when it is in a superposition, and ponents. To achieve such a phase-shift, we evolve the when it is not. Thus, state (3.1) under the action of its preferred operator, i.e., under the unitary evolution operator exp(iθAˆ), where θsing |A1 − A2| the real parameter θ characterizes the interaction time M = ≈ . (3.6) θ ΔA or strength. The evolved state becomes sup iθAˆ iθA |ψ(θ) = e |ψ = e f(A)|A dA, (3.3) Being a ratio, M is dimensionless. Moreover, it is in- R dependent of the specifics of the measurement such as geometry, field strengths, etc., as all these experimen- and its inner product with the original state is θ tal parameters are built into the evolution parameter . That is, the measure compares the evolution under iden- iθAˆ iθA 2 ψ|e |ψ = e |f(A)| dA tical experimental conditions for a system and a binary R superposition of the very same system. Moreover, the iθA = e c(A − A1) dA measure is applicable to all bimodal superpositions, it is, R (3.4) i.e., easy to extend the analysis to observables with a dis- eiθAc A − A dA, crete eigenvalue spectrum as will be done in Sec. IV A, + ( 2) 1 R below. In contrast to disconnectivity, the measure does not favor (nor disfavor) states with many particles or iθA1 iθA2 iθA =(e + e ) e c(A) dA modes. The measure just tells us how much the evolu- R tion of the state will be accelerated by the means of the where we have used the bi-modal assumption (3.2). The binary superposition. modulus of this inner product can thus be written θ A − A | ψ|eiθAˆ|ψ | ( 2 1) eiθAc A dA . =2cos ( ) (3.5) 1 2 R The analysis just presented can of course be also formulated in Wigner-function formalism [29, 30, 31, 32, 33]; the presence If the function c(a) is smooth, then we can deduce of “semiclassical” states perhaps suggest that such a formalism that the integral in (3.5) will have its first minimum when would be more easily applied. However, it is easy to realise that θ ≈ θ ≡ π/ A A this is not the case: first, we would have the additional spurious sing Δ ,whereΔ is the width (e.g., FWHM) presence of generalised phase-space observables, not necessarily of c(A) (and hence the width each of the two peaks of related with the preferred observable nor with the superposition |f(A)|2). However, the right hand side of (3.5) also con- state; second, evolution should be computed through a gener- alised Liouville equation; third, the overlap between the various tains the factor cos[θ(A2 − A1)/2]. This interference fac- θ θ ≡ π/ A − A (evolved) states should be computed through integrations (in- tor becomes zero when = sup ( 2 1)and volving variables not directly related to the problem). Hence, appears only because the system is in a superposition Dirac’s formalism is more suited to the mathematical expression state. For large (“macroscopic”) values of the separation of the above ideas. 4

IV. EXAMPLES ~~ 2 N A. Superpositions of number states

Let us start by examining a class of states for which the measure M is not directly applicable (and neither is Δ n the measure proposed by D¨ur et al.). An example of such a state is the two-mode, bosonic state √ | ⊗|N |N ⊗| / . ( 0 + 0 ) 2 (4.1) -20 -10 0 10 20

Each of the terms in the superposition state is a two- Figure 2: Probability distributions for the preferred observ- mode number-state. able n =(ˆn1 − nˆ2)/2. The two (truncated) thick lines rep- Since both components of the superposition state lack resent the (Kronecker-delta-like)√ distribution for the state dispersion for the preferred observable they do not evolve (|0⊗|N+|N⊗|0)/ 2withN = 20, which becomes orthog- under the preferred evolution operator exp[iθ(ˆn1−nˆ2)/2], onal for θsing = π/N. The dashed distributions correspond to | ⊗|α |α⊗|  |α|2 |n| wheren ˆ1 (ˆn2)is the number operator operating on the coherent states 0 √and 0 with = =20 left (right) mode in the tensor product. One can ascribe and a dispersion√ Δn ≈ 20, which become almost orthogonal θ ≈ π/ N this “nonevolution” to the fact that the number states for sup . (Note that these distributions are discrete; are eigenstates to our preferred observable. Moreover, their envelopes have been used here for simplicity.) the number states are highly nonclassical (and conse- quently very fragile with respect to dissipation). Hence, √ The scaling N for the interferometric utility is to although the state is in a highly excited superposition when N is large, it is not in a superposition of two semi- be compared with the fragility of the state under de- coherence, which scales as N. This means, intuitively, classical states, and therefore, it is not in the spirit of Schr¨odinger’s example. In this case, we can still derive that the “cost” for preserving√ this kind of state increases faster (by a factor ≈ N) than the state’s sensitivity in an interferometric macroscopality by approximating each of the superposition constituent states with a semiclassi- interferometric application does. cal state (cf. Fig. 2). E.g., we can compare the evolution This state also is a good example for the strange be- of the superposition state with that of a coherent state havior of the disconnectivity as a measure of how macro- N scopic the superposition above is, for the disconnectivity having a photon number expectation value of .Since N the dispersion (ˆn − nˆ )2 of a coherent state is nˆ ,we is always unity for the state, irrespective of .Thesim- √ ple explanation is that we do not increase the “size” of find that for such a state we get Δn = N. (This result the state by increasing the number of modes, but instead is identical to that we would have obtained if we instead increase only the number of excitations. had considered a two-mode coherent state.) The state in Eq. (4.1) becomes orthogonal for θsing = π/N. Hence, the interferometric macroscopality of the state becomes B. Qubit states √ π/ N √ M ≈ ≈ N. π/N (4.2) A simple state that has been used as a model system to quantify the size of macroscopic superpositions is the N-qubit state (See Fig. 2.) This√ result makes intuitive sense. In inter- N ferometry, this improvement in utility denoted the 1 ⊗N ⊗N √ (|φ+ + |φ− ), (4.3) difference between the standard quantum limit and the 2 Heisenberg limit [34, 35, 36, 37, 38, 39]. The scaling would not change qualitatively if we con- where sidered a coherent state with mean photon number N/2, |φ ϕ ± / | / ϕ ± / |− / , nor would it change if we had considered a slightly ± =cos( 2) 1 2 +sin( 2) 1 2 (4.4) squeezed coherent state (with a constant squeezing pa- where, without loss of generality, we have taken the two rameter)2 qubit basis states to be spin 1/2 eigenstates along some axis. We note that the overlap | φ−|φ+ | = | cos |. 2 2 Hence, for 0 <  1, we have | φ−|φ+ | ≈ 1 − , and the qubit states have a large overlap. However, as 2 Imagining to have the technology to create squeezed states of any degree of squeezing, and to adapt the squeezing parameter | φ |⊗N |φ ⊗N | | N |, for any choice N, the scaling would become N 2/3 [40]; however, − + = cos (4.5) the comparison with squeezed states would not be in the spirit of the present paper’s ideas, as squeezed states are not semiclas- the many-qubit state has almost√ vanishing overlap as sical [29]. soon as N  1and >1/ N. To be more specific, 5 for = arccos(e−1/N ) ≈ 2/N ,theoverlapis1/e. superposition any longer. The disconnectivity’s invari- ⊗N Hence, the N√-qubit states |φ± are almost orthogo- ance with respect to clearly demonstrates the measure’s nal for >1/ N. exclusive dependence on the particle (or mode) number In order to calculate the interferometric macroscopal- and its ignorance of the state of the particles (modes). N ity of the -qubit state, we choose as preferred observ- Sˆ N σ able the total spin r = n=1 ˆn,r (in units of )along the axis r defined by the basis states. We compute the C. A superposition of coherent states mean ⊗N ⊗N Several authors have suggested to use a binary super- φ±| Sˆr|φ± = N cos(2ϕ ± )/2, (4.6) position of coherent states to make macroscopic super- positions. At least one experiment has been performed and the variance on such a state, that is of the form φ |⊗N Sˆ 2|φ ⊗N N − 2 ϕ ± / . ± Δ r ± = [1 cos (2 )] 4 (4.7) 1 |ψ = √ (|eiϕ|α| + |e−iϕ|α| ), (4.9) We see that in order to make the difference between the N ⊗N expectation values of the two superposed states |φ+ ⊗N where the normalization factor and |φ− as large as possible, we should choose ϕ = π/ 4, i.e., the preferred observable should be along an axis N =2{1+exp[−|α|2(1−cos 2ϕ)] cos(|α|2 sin 2ϕ)} (4.10) forming an angle of π/2 with the superposition state’s average spin, in a Bloch-sphere representation. For this and wherea ˆ|eiϕ|α| = |α|eiϕ|eiϕ|α| .Ifsuchastate’s | φ |Sˆ |φ − φ |Sˆ |φ | N| |≈N choice, + r + − r − = sin quadrature-amplitude distribution is measured, where and the width of each spin probability distribution (the the Hermitian quadrature amplitude operator is de- † square root√ of each distribution’s variance) is approx- fineda ˆ2 =(ˆa − aˆ )/2i, a bimodal distribution will be imately√ N/2. We immediately see that as soon as found, provided that the superposition “distance” D = >1/ N, we get a bimodal distribution for the su- 2|α| sin ϕ is sufficiently large. Hence, the preferred evolu- perposed state. This fact is supported by the condition tion operator is exp(iθaˆ2). Let us first see how a coherent derived previously for the two states to have a small over- state evolves towards orthogonality under this operator. lap. Thus for the state (4.3) we get The dispersion ofa ˆ2 of the coherent state is 1/2. Hence, √ we can expect the coherent state to evolve into an (al- N θ π M ≈ √ =2 N . (4.8) most) orthogonal after an interaction of =2 . N/2 In a more detailed calculation, we use the fact that

† † † This measure can be compared to the measure by D¨ur et eiθaˆ2 = e(−θaˆ +θaˆ)/2 = e−θaˆ /2eθa/ˆ 2e−[−θaˆ ,θaˆ]/8 al., where they conclude that the state’s macroscopic size (4.11) −θ2/ −θa†/ θa/ is N 2. However, their measure is based on the premise = e 8e ˆ 2e ˆ 2. − / ⊗n ⊗n that the macroscopic size of the state 2 1 2(|0 +|√1 ) is n. In contrast, we find that for this state M = n/ n = Since coherent states are eigenstates to the operatora ˆ,it √ ⊗n 2 ⊗n ⊗n 2 ⊗n is easy to compute n. Because 0| ΔSˆr |0 = 1| ΔSˆr |1 =0, we have used the method described in the previous sub- 2 α|eiθaˆ2 |α e−θ /8eiθIm{α}, section, comparing the evolution time of the superposi- = (4.12) tion state with that of a spin-coherent state with mean where Im{z} denotes the imaginary part of z and where excitation n. We see that our measure is proportional we have assumed that |α| = 0. We see that the mag- to the square root of the measure of D¨ur et al., when nitude of the overlap between a coherent state and an a comparison is applicable. The difference between the 2 evolved copy of itself decreases with time as e−θ /8.Note measures can be traced to the operational questions they |α| answer. In our case it is the state’s interference utility, in that the result is independent of . The evolved state the case of D¨ur et al., it is how many qubits a GHZ state never becomes fully orthogonal to the unevolved state, distilled from the macroscopic superposition contains. the overlap goes only√ asymptotically toward zero, but we can define θsing ≈ 2 2 as the typical “orthogonality” Considering again Eqs. (4.6) and (4.7), we see that a θ π choice of an observable forming a vanishing angle with time. (The crude calculation above yielded sing =2 , the superposition state’s average spin would have led to roughly a factor of two higher.) In this time the (ab- an interference utility M = 0: with respect to such ob- solute value of the) overlap has decreased from unity to 1/e. Applying the same evolution operator to the state servable, the state (4.3) would present very small — i.e., |ψ non-macroscopic — interference effects. ,above,onegets It is also worth noticing that the disconnectivity of the 2 2e−θ /8 state (4.3) is N, irrespective of ,aslongas =0.The ψ|eiθaˆ2 |ψ θ|α| ϕ = N [cos( sin ) disconnectivity√ vanishes only when = 0 exactly, even if (4.13) −|α|2 − ϕ for <1/ N the state does not represent a macroscopic + e (1 cos 2 ) cos(|α|2 sin 2ϕ)]. 6

a 2 the full utility of macroscopic superposition states in ap- plications, unless dissipation is kept to a minimum. 3

2 D. Molecule interferometry

1 α A series of experiments have been performed on diffrac-

ϕ tion of molecules with increasing atomic weight. The ef- ϕ 1 fort started in the 1930’s with interference experiments sin a 1 2 3 with H2 (weight 2 in atomic mass units), but recently 2α the field has used increasingly larger molecules going -1 from He2,Li2,Na2,K2 and I2 (with weights 8, 14, 46, 1

~ 78, and 254 atomic mass units, respectively) to substan- -2 tially heavier molecules such as C60,C70 to C60F48 (with weights 720, 840 and 1632 atomic mass units, respec- -3 tively). In one of the recent experiments of this type [10], a collimated beam of C60 molecules was diffracted by a free-standing SiNx grating consisting of 50 nm wide Figure 3: Wigner-function representation of the coherent slits with a 100 nm period D. A tree-peaked diffraction iϕ −iϕ states |e |α| and |e |α| (with |α| =3.1andϕ =0.5) on pattern was observed a distance L =1.25 m behind the a a the plane ˆ1ˆ2. The respective probability distributions for grating by means of a scanning photo-ionization stage a the preferred observable ˆ2 are plotted on the left of the lat- followed by a ion detection unit. The distance from the ter’s axis. The distance between the two distributions’ peaks π central peak and the nodes on each side of it was about is 2|α| sin ϕ, corresponding to a θ = for the super- sup 2|α| sin ϕ 12 μm. posed state to become orthogonal; the widths√ of the distribu- To analyze the experiment in terms of macroscopic su- tions is ∼ 1, corresponding to a θsing ≈ 2 2foreachsingle costituent state to become orthogonal. perpositions, we note that the molecular beam intensity is such that the molecules are diffracting one by one. Hence, we need not invoke quantum mechanics to ex- In this case, for reasonable large values of |α| and ϕ (when plain the diffraction pattern, but we can simply model the rightmost term of the equation above is negligible) the experiment in terms of first-order wave interference. the state becomes almost orthogonal under evolution, We note that for a single slit of width d, the molecule and this first happens when intensity in the direction θ from the slit normal direction π is proportional to θ = θsup = . (4.14) 2|α| sin ϕ d/2 d/2 I ∝ eikx sin(θ) ≈ eikxθ, (4.16) Hence, the superposition’s interferometric macroscopal- −d/2 −d/2 ity will be √ where x is the position coordinate across the slit, k is θ the molecule’s de Broglie wave vector, we have assumed M sing ≈ 2 2 ≈ |α| ϕ. = 2 sin (4.15) a constant molecular beam amplitude across the slit, and θsup π/2|α| sin ϕ diffraction close to the normal has been considered. We ((See Fig. 3.) In experiments performed by Brune et see that in a slit diffraction experiment, the position x is al. [41] on Rydberg states at microwave frequencies, a |α| the preferred observable and the parameter θ can be in- of 3.1 was achieved, and ϕ was varied between approxi- terpreted as the diffraction angle. We can express k in the mately 0.1 − 0.5 rad. Hence, an interferometric macro- molecular beam velocity v = 220 ms−1, and the molecu- scopality of ≈ 6.2sin0.5 ≈ 3 was achieved. However, at lar weight m = 720 atomic mass units as k =2πmv/h, the largest angles ϕ measured, the state had already de- where h is Planck’s constant. We find that orthogonality cohered substantially, so the superposition state was no occurs when θ = θsing = h/dmv. Experimentally, this longer a pure superposition. It is worth noticing that the means that the single slit diffraction pattern has its first decoherence, manifested in the decrease of the Ramsey node at this angle, or at a distance ≈ hL/dmv from the interference fringes measured in the experiment, scales diffraction peak center in the observation plane. In a as exp(−D2/2). The decoherence time is hence propor- thought single slit experiment with the same experimen- tional to the interferometric macroscopality squared of tal parameter as in the experiment referred to above, the the state. This is the same result we found for a rather node should have been found by the ionization detector different state in Sec. IV B. The result suggest that that about 63 μm from the diffraction pattern center. In the the found relation between the decoherence and the in- real experiment, the first node (indicating orthogonality) terferometric macroscopality of the state is quite general. was found 12 μm from the central diffraction peak. Since The result also indicates that it will be difficult to reap θsup scales proportionally to this distance, we can com- 7 pute the interferometric macroscopality of the superpo- tunnelling transition probability is monitored as a func- sition state as M =63/12 ≈ 5.2. This is perhaps smaller tion of the applied magnetic flux and the frequency of a than one would expect. microwave frequency pulse. Under certain conditions it In a later refinement [20], the molecular beam was ve- is possible to create an equal superposition of the flux- locity filtered around 110 ms−1. In this case the the oid states |0 and |1 . The odd and even superposition coherence length of the molecules increased significantly, states differ in energy by about 0.86 μeV (ΔE/kB ≈ 0.1 resulting in several diffraction fringes. However, the in- K, where kB is Boltzmann’s constant). terferometric macroscopality of the state was equal to To analyze the fluxoid superposition state, the pre- that in the earlier experiment. With a single slit, the ferred operator is the magnetic flux through the ring. first diffraction node would have been expected at 125 The difference in the (mean) magnetic flux of the two μm from the diffraction center. With the grating the bare fluxoid states |0 and |1 is deduced to be about first node is found at 24 μm from the center. Thus, φ0/4. In order to estimate the interferometric macrosco- M = 125/24 ≈ 5.2. pality of this state we need to estimate the flux inter- From the considerations above, we see that the path action needed to evolve a “classical” flux state into an towards larger interferometric macroscopality lies not in orthogonal one. Since the considered fluxoid states are employing molecules with larger mass, but only in mak- discrete, they nominally do not evolve under the action ing the relative slit separation greater. of the flux operator (only their overall phase evolves). To It is of course an impressive fact to produce a mat- make an estimate of how rapid the evolution of a “classi- ter wave consisting of relatively heavy molecules with a cal” flux state would be, we use the fact that the SQUID’s > 100nm transverse coherence length. However, a mat- potential wells are approximately harmonic, and that ter wave is not automatically the same as a macroscopic classically, the state would be confined to only one of superposition state in the sense of Schr¨odinger. the wells. We can then construct a coherent flux state confined to one of the wells. To estimate the state’s dispersion (width) when ex- E. SQUID interference pressed in the flux operator, we note that the poten- tial well level spacing of the SQUID ΔE in each po- Another physical system where macroscopic superpo- tential well is about is about 86 μeV (ΔE/kB ≈ 1K). φ2/ L ≈ . sition states have been created and measured, is super- The numerical values for the potential are 0 2 55 6 φ2/ Lk ≈ φ I / π ≈ . conducting interference devices (SQUIDs). These de- meV ( 0 2 B 645 K), and 0 c 2 6 5meV φ I / πk ≈ vices consist of a superconducting wire loop incorporat- ( 0 c 2 B 76 K). In a (mechanical) harmonic os- 2 ing Josephson junctions [13, 14]. In these junctions, the cillator, we have U = κx /2, an energy level spacing of magnetic flux through the ring is quantized in units of ω = κ/m, and a position dispersion of the flux quantum φ0 = h/2e,wheree is the unit charge. 1 1 The supercurrent in the loop can be controlled by apply- Δx ≈ = √ , (4.18) ing an external magnetic field. When the external field 2 ωm 2 κm magnetic flux φx through the ring is about one half a where κ is the spring constant and m is the oscillator flux quanta, the supercurrent in the ring can either flow mass. Taylor expanding the potential in (4.17) around in such a way that the induced magnetic flux cancels φx the point φx = 0 and using the analogy with the mechan- or so that it augments it. The corresponding “fluxoid” ical harmonic oscillator, we find that the flux dispersion states correspond to zero and one flux quanta respec- of the flux coherent state is tively. The SQUID potential U is given of the sum be- tween the magnetic energy of the ring and the Josephson φ0 1 − Δφ ≈ ≈ 7.6 · 10 3φ . (4.19) coupling energy of the junction(s): 2 1290 + 4π276 0 Hence, the state’s interferometric macroscopality is φ2 φ − φ 2 φ I 1 0 x 0 c U = − cos(2πφ/φ0) , (4.17) φ /4 2 L φ π M 0 ≈ . 0 = −3 33 (4.20) 7.6 · 10 φ0 where L is the the ring inductance and Ic is the junction This interferometric macroscopality is impressive, but critical current. By tuning the induced flux, a double-well significantly different from what one might naively guess, potential as a function of the flux φ that threads the ring considering that the flux difference between the states |0 can be created. To a first approximation, each well can and |1 corresponds to a local magnetic moment of about be approximated by a harmonic potential with (approx- 1010 μB [13]. imately) equidistantly spaced energy states. However, there is a finite barrier between the wells, and this bar- rier can be tuned with the help of the applied magnetic F. Quantum superpositions of a mirror field. For certain values of the external field, an excited level of each of the wells line up, and tunnelling between In the literature it has been proposed that emerging the states is possible. In one of the experiments [13], the technology will soon allow one to put a mirror into a su- 8 perposition of positions [17]. The idea is to suspend a V. CONCLUSIONS small (10 × 10 × 10 μm) mirror at the tip of a high Q- value cantilever. The mirror would form the end-mirror of a plano-concave, high-Q cavity. This cavity would In this paper, a measure has been presented to quan- form one arm of a Michelson interferometer. In the other tify how “macroscopic” some superpositions realized in arm there would be a rigid cavity with equal resonance different experiments are. The measure is based on the frequency and finesse. When a single photon is incident utility of the superposition state as a probe in interfer- on the Michelson interferometer input port, with proba- ence experiments, quantified by the difference in time or, bility 1/2, the photon would either be found in the rigid more generally, in interaction strength needed to make cavity or in the cavity with the cantilever suspended mir- a macroscopic superposition or, respectively, the sin- ror. The photon pressure would shift the position of the gle macroscopic states evolve into orthogonal states by suspended mirror, and relatively quickly, the position of means of a unitary transformation generated by a par- the cantilever and the photon state would be entangled. ticular preferred observable. The authors proceed to analyze how well this entangle- The proposed measure gives values for the “interfero- ment could be detected by detecting in what port the metric macroscopalities” of recent experiments that are photon exits the Michelson interferometer as a function perhaps smaller than expected; this is due to the fact of time. Quite intuitively, if the photon exerts pressure that the measure is not directly related to the number of on the mirror for a whole oscillation period of the mirror, particles, or excitations, involved in the experiment, but the work exerted by the photon on the mirror when the rather to the interference properties deriving from large two are moving codirectionally, is cancelled by the work separations of two superposed states as measured by the the mirror exerts on the photon mode when he mirror preferred observable. and photon move contradirectionally. Hence, there will Let us finally remark again that the issue about the be a revival in the photon interference visibility after an word ‘macroscopic’ is very subjective, but it is interesting interaction time equal to the mirror oscillation period. as well, and can provide some insight in the way we look Of course, dissipation (decoherence) will impede one to and use quantum and non-quantum mechanical concepts. observe this revival, but the authors deduce that in a cool enough environment, it is possible to find somewhat real- istic parameters allowing the quantum superposition to be detected through the photon’s visibility revival. The authors suggest that under these conditions, the separa- Acknowledgments tion between the two mirror positions in the superposi- tion correspond to the width (dispersion) of a coherent state wavepacket. This immediately lets us deduce that This work was inspired by a talk given by Dr. the interferometric macroscopality of the suggested ex- H. Takayanagi, NTT Basic Research Laboratories, at the periment is of the order unity, in spite of involving an QNANO ’02 workshop at Yokohama, Japan. The work astronomic number of atoms, ∼ 1014. was supported by the Swedish Research Council (VR).

[1] A. Einstein, B. Podolsky, and N. Rosen, Phys. Rev. 47, [12] Y. Nakamura, Y. A. Pashkin, and J. S. Tsai, Nature 398, 777 (1935). 786 (1999), cond-mat/9904003. [2]N.Bohr,Phys.Rev.48, 696 (1935). [13] J. R. Friedman, V. Patel, W. Chen, S. K. Tolpygo, and [3] E. Schr¨odinger, Naturwissenschaften 23, 807 (1935), J. E. Lukens, Nature 406, 43 (2000), cond-mat/0004293. transl. in [42]. [14] C. H. van der Wal, A. C. J. ter Haar, F. K. Wilhelm, [4] A. Peres and N. Rosen, Phys. Rev. 135, B1486 (1964). R. N. Schouten, C. J. P. M. Harmans, T. P. Orlando, [5] A. Peres, Phys. Rev. D 22, 879 (1980). S. Lloyd, and J. E. Mooij, Science 290 (2000). [6] D. D. Awschalom, M. A. McCord, and G. Grinstein, [15] P. Bonville and C. Gilles, Physica B 304, 237 (2001), Phys.Rev.Lett.65, 783 (1990). cond-mat/0007090. [7]D.D.Awschalom,J.F.Smyth,G.Grinstein,D.P.Di- [16] B. Julsgaard, A. Kozhekin, and E. S. Polzik, Nature 413, Vincenzo, and D. Loss, Phys. Rev. Lett. 68, 3092 (1992). 400 (2001), quant-ph/0106057. [8] J. I. Cirac, M. Lewenstein, K. Mølmer, and P. Zoller, [17] W. Marshall, C. Simon, R. Penrose, and Phys.Rev.A57, 1208 (1998), quant-ph/9706034. D. Bouwmeester, Towards quantum superpositions [9] J. Ruostekoski, M. J. Collett, R. Graham, and D. F. of a mirror (2002), quant-ph/0210001. Walls, Phys. Rev. A 57, 511 (1998), cond-mat/9708089. [18] A. Auffeves, P. Maioli, T. Meunier, S. Gleyzes, [10] M. Arndt, O. Nairz, J. Vos-Andreae, C. Keller, G. van der G. Nogues, M. Brune, J.-M. Raimond, and S. Haroche, Zouw, and A. Zeilinger, Nature 401 (1999). Entanglementofamesoscopicfieldwithanatom [11] J. C. E. Harris, J. E. Grimaldi, D. D. Awschalom, induced by photon graininess in a cavity (2003), D. Chiolero, and D. Loss, Phys. Rev. B 60, 3453 (1999), quant-ph/0307185. cond-mat/9904051. [19] M. J. Everitt, T. D. Clark, P. B. Stiffel, R. J. Prance, 9

H. Prance, A. Vourdas, and J. F. Ralph, Superconducting [31] L. Cohen, J. Math. Phys. 7, 781 (1966). analogues of quantum optical phenomena: Schr¨odinger [32] M. Hillery, R. F. O’Connell, M. O. Scully, and E. P. cat states and squeezing in a SQUID ring (2003), Wigner, Phys. Rep. 106, 121 (1984). quant-ph/0307175. [33] L. Cohen, Proc. IEEE 77, 941 (1989). [20] O. Nairz, M. Arndt, and A. Zeilinger, Am. J. Phys. 71, [34] C. M. Caves, Phys. Rev. Lett. 45, 75 (1980). 319 (2003). [35] C. M. Caves, Phys. Rev. D 23, 1693 (1981). [21] S. Ghosh, T. F. Rosenbaum, G. Aeppli, and S. N. Cop- [36] B. Yurke, S. L. McCall, and J. R. Klauder, Phys. Rev. A persmith, Nature 425, 48 (2003). 33, 4033 (1986). [22] A. J. Leggett, J. Phys.: Condens. Matter 14, R415 [37] P. Grangier, R. E. Slusher, B. Yurke, and A. LaPorta, (2002). Phys. Rev. Lett. 59, 2153 (1987). [23] A. J. Leggett, Progr. Theor. Phys. Suppl. 69, 80 (1980). [38] C. Fabre, J. B. Fouet, and A. Maˆıtre, Opt. Lett. 25,76 [24] W. D¨ur,C.Simon,andJ.I.Cirac,Phys.Rev.Lett.89, (2000). 210402 (2002), quant-ph/0205099. [39]N.Treps,U.Andersen,B.Buchler,P.K.Lam,A.Maˆıtre, [25] S. Hecht, J. Opt. Soc. Am. 32, 42 (1942). H.-A. Bachor, and C. Fabre, Phys. Rev. Lett. 88, 203601 [26] D. A. Baylor, T. D. Lamb, and K. W. Yau, J. Physiol. (2002), quant-ph/0204017. 288, 613 (1979). [40] R. Loudon and P. L. Knight, J. Mod. Opt. 34, 709 (1987). [27] W. H. Zurek, Rev. Mod. Phys. 75, 715 (2003), [41] M. Brune, E. Hagley, J. Dreyer, X. Maˆitre, A. Maali, quant-ph/0105127. C. Wunderlich, J.-M. Raimond, and S. Haroche, Phys. [28] A. J. Leggett, Phys. Scripta T102, 69 (2002). Rev. Lett. 77, 4887 (1996). [29] U. Leonhardt, Measuring the Quantum State of Light, [42] J. D. Trimmer, Proc. Am. Phil. Soc. 124, 323 (1980), Cambridge studies in modern optics (Cambridge Univer- http://www.tu-harburg.de/rzt/rzt/it/QM/cat.html. sity Press, Cambridge, 1997). [30] E. P. Wigner, Phys. Rev. 40, 749 (1932).

Paper K

Schr¨odinger-cat states – size classification based on evolution or dissipation

Gunnar Bj¨ork and Piero G. Luca Mana Department of Microelectronics and Information Technology Royal Institute of Technology (KTH) Electrum 229, SE-164 40 Kista, Sweden

ABSTRACT The issue of estimating how “macroscopic” a superposition state is, can be addressed by analysing the rapidity of the state’s evolution under a preferred observable, compared to that of the states forming the superposition. This fast evolution, which arises from the larger dispersion of the superposition state for the preferred operator, also represents a useful characteristic for interferometric applications. This approach can be compared to others in which a superposition’s macroscopality is estimated in terms of the fragility to dissipation. Keywords: Macroscopic superpositions, Schr¨odinger-cat states, Interference, Disconnectivity

1. INTRODUCTION There are many concepts that, although not unknown to classical physics, assume in quantum mechanics a fundamental rˆole. We may take for example those of ‘fluctuation’ and ‘noise’: a “classical” engineer must deal with noise in virtually any real application, from communication theory to thermodynamics; but he knows that he can make, in principle, all the fluctuations of his classical system as small as he wishes. A quantum engineer, on the other hand, knows that if she succeeds in eliminating the fluctuations for some observable of her quantum system, she must also expect increased fluctuations for some other observable. Related to ‘fluctuation’ is the well known concept of ‘superposition’, which also has a fundamental place in quantum physics. The peculiar statistical features of a superposition are in general counterintuitive from a classical point of view,∗ and have been the source of many a claimed paradox, not to mention philosophical headaches. The situation becomes dramatic and difficult to imagine when the superposition is “macroscopic”, the typical example being Schr¨odinger’s cat (3), where the poor cat is not in a definite “alive” or “dead” state. This is counterintuitive, the more so because the two states involved are macroscopically distinct, i.e., the one cannot be easily confused with the other in general. Apart from being counterintuitive, superposition states such as Schr¨odinger’s are also very difficult to realise, because they are very fragile to dissipation — whereas the superposed states (“alive” and “dead”) are very robust to the latter. Dissipation is indeed the main problem faced in all recent noteworthy experiments (4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19) (a recent review is given in Ref. 20), whose quest is to create superpositions as “macroscopic” as possible. Thus, the question ‘how macroscopic are such superpositions (how near are they to being a Schr¨odinger-cat state)?’ can be often related to the other question ‘how fragile are they to dissipation and decoherence phenomena?’. On the other hand, another very natural question for an engineer to ask is the following: what could we do with a macroscopic superposition (a “Schr¨odinger cat”)? The purpose of this work is to show that this question also has a bearing on the issue of the “macroscopality” of a superposition. First, we note that the relation between ‘dispersion’ and ‘superposition’ alluded to above is the following: if we have two states which, for a given observable, present almost vanishing fluctuations (i.e., they are almost eigenstates of that observable), then their superposition will in general present an unavoidable larger dispersion for the same observable. This phenomenon would in general be more dramatic for macroscopic superpositions. ∗See however Refs. (1; 2) Thus, the peculiar feature of the macroscopic superposition represented by Schr¨odinger’s cat is that it presents an unavoidable, and hence classically counterintuitive, dispersion for the observable “cat alive/cat dead”. However, a large dispersion for some observable implies in general also a faster and interferometrically useful evolution under the same observable. Thus, the very counterintuitive features of a Schr¨odinger cat would also represent an enhanced usefulness in interferometric applications with respect to that of a cat which is “only” alive or dead: the unavoidable dispersion for the “cat alive/cat dead” observable would indeed imply a faster evolution under the same observable than the classical “alive” or “dead” states would have. To make these ideas more precise and to give them a mathematical form, we shall first discuss in more detail the principal features that, in our view, characterise a Schr¨odinger-cat superposition, emphasising their connexion to dispersion and evolution; we shall then define the cat’s ‘interferometric utility’, which turns out to be a measure of its ‘macroscopality’ as well. This measure will be applied and discussed in three explicative examples, and then compared to other “size measures” based on dissipation and entanglement.

2. WHAT IS A SCHRODINGER-CAT¨ STATE? A Schr¨odinger-cat state is often defined as a superposition between two (or more) macroscopically distinguishable states. However, this definition is rather imprecise, as it leaves the questions about what is meant by “macro- scopic” and “distinguishable” open. In a discussion about these concepts, subjectivity will inevitably enter the picture, so we will give our subjective view here. We argue that the word macroscopic must imply two things. First, a macroscopic state should have a certain excitation rendering it easily detectable, maybe even visible, like the cat in Schr¨odinger’s example (3). For different states, the implication of this requirement on the excitation of the state differs. It is, e.g., not very difficult to detect photons in the visible range, even if they contain only a few quanta. (It has been demonstrated that with the naked eye, after accommodation to low light conditions, one can see bursts of as few as ten photons (21; 22; 23).) However, at microwave wavelengths, the thermal background radiation will make it more difficult to detect few-photon states, so at such frequencies, the excitation, measured in photon number, must be higher in order to reach a state that can be considered macroscopic. In order to classify how macroscopic a certain Schr¨odinger-cat state is, the usual way is to use a function that depends on the state’s excitation (or, on how many constituent particles the state has). This function should be monotonic with respect to the excitation, and hence it will serve as a quantifier of the Schr¨odinger-cattiness of the state. A problem is that ideally, the function should work for any (reasonable) Schr¨odinger-cat state. Therefore it must address general properties of such states. The other property we think must be satisfied for a state to qualify as a Schr¨odinger-cat state is that each of the states making up the superposition must be as classical as possible. At the heart of Schr¨odinger’s “paradox” lies the intermingling of classical concepts (a cat) and inherently quantum mechanical concepts (a superposition). Therefore, it is not sufficient that the constituent states in the superposition are highly excited. They should also be as classical as possible. This requirement implies that the states have fluctuations. In the foregoing we have talked about superpositions of states, without specifying what kind of states the superposition should be built from. Since any pure state can be written as a single state vector, the notion of superposition implies that there exists a preferred observable, and that the Schr¨odinger-cat state is not an eigenstate to this observable. In a double-slit experiment, the pertinent observable is the position across the slit. A state passing through the double slit is prepared in a superposition of two orthogonal position states, one corresponding to each slit, and each with a relatively narrow position probability dispersion. What can be observed at a distant screen is the interference pattern between these two states. To speak about a Schr¨odinger- cat state without disclosing the preferred observable is meaningless, because for at least some observable, the state is not in a superposition state. In general, the preferred observable is rather obvious, given that the preparation of the state is known. In fact, the usual procedure in preparing Schr¨odinger-cat states is first to decide on the preferred observable, and then try to devise an experimental procedure to prepare a suitable state.

3. FLUCTUATIONS, EVOLUTION AND DISTINGUISHABILITY In Sec. 2 above, we stated that macroscopic, semiclassical states typically have dispersion. This dispersion depends, on the fundamental level, on quantization, but may be described in terms of, e.g., shot noise. In many cases, the fluctuations are such that they correspond to the minimal fluctuations imposed by Heisenberg’s uncertainty relation that minimize the state energy. Relatively well known examples are the quadrature amplitude fluctuations of a coherent state or the spin fluctuations of a spin-coherent state. The fluctuations are also necessary to evolve the state. Suppose that an ensemble of identically prepared states has a certain dispersion of the measurement outcomes when it is measured by the observable Aˆ.Itis well known that the dispersion sets a limit to how fast the state may evolve under the action of the unitary operator exp(iθAˆ), where θ is a real parameter characterizing the strength of the interaction (24; 25). That is, how quickly the overlap between a state |ψ and the evolved state exp(iθAˆ)|ψ vanishes as a function of θ, depends on the dispersion (ψ|(ΔAˆ)2|ψ)1/2.If|ψ is an eigenstate of Aˆ, it is obvious that the state does not evolve since |ψ| exp(iθAˆ)|ψ| = 1 for all values of θ. Typically, the dispersion will cause the overlap between the original state and the evolved state approach zero. This will make the two states distinguishable with certainty. In the following, and this is the idea behind our measure, the interaction strength θ needed to make the evolved state orthogonal to the unevolved state will be used as a measure of the size of the interaction. To understand the key idea it is useful to make an analogy with interferometry. Suppose an interferometer is set up so that all the particles fed into the input port exit through one of the output ports. If we change the path-length between the interferometer arms, less particles will exit through this port (on the average), and instead these particles will exit through the other port (again, in the average sense). In the case when no particles exit through the first port, the initial and the modified (due to the arm-length change) states inside the interferometer must be orthogonal. It is desirable that an as small arm-length change as possible can accomplish the transfer of the initial state into an orthogonal state. The parameter θ quantifies how small arm-length change (or in general, interaction strength) is needed to evolve a state to an orthogonal state.

4. INTERFEROMETRIC UTILITY In the preparation of a superposition of two states having each an associated dispersion, in general the dispersion (measured with the preferred observable) of the superposition state will be greater than that of its constituent states. Hence, the superposition state will evolve faster under the action of the preferred observable than its constituent states. This is a fact one can capitalize on, as superposition states that evolve quickly are useful in interferometric applications, as discussed above. The key idea behind our proposal for a size criterion for Schr¨odinger-cat states is to compare the rate of evolution of the superposition state with that of its constituent states. In those cases when the Schr¨odinger-cat state is prepared from two eigenstates of the proper observable, we will compare the evolution of this superposition state with that of a semiclassical state with the same mean excitation. Therefore, assume that we have a state whose probability distribution associated with the measurement of the observable Aˆ is a bimodal distribution. For what follows, the exact form of the distribution is of no consequence, as long as it is reasonably smooth. (Here, it is assumed that origin of the bimodal probability distribution is a bimodal probability amplitude distribution. That is, we have a pure superposition state and not a mixed state.) If we expand the superposition state in the set of eigenvectors {|A|A ∈ R} of the preferred operator Aˆ, we will get   |ψ = A|ψ|AdA = f(A)|AdA. (1) R R We will assume that |f(A)|2 deviates appreciably from zero only within an interval around two values of Aˆ that 2 we will denote A1 and A2 (see Fig. 1). Therefore, we can express the probability distribution |f(A)| of the state

2 |f(A)| ≈ c(A − A1)+c(A − A2), (2) where c(x) is a (positive and appropriately normalized) roughly even function with a single maximum at the origin x =0. A binary, equal, superposition has the property that it evolves into an orthogonal, or almost orthogonal state, upon a relative phase-shift of π between its two components. To achieve such a phase-shift, we evolve the state (1) under the unitary evolution operator exp(iθAˆ). The evolved state becomes  ˆ |ψ(θ) = eiθA|ψ = eiθAf(A)|AdA, (3) R and its overlap with the original state is  ˆ ψ|eiθA|ψ = eiθA|f(A)|2 dA R  iθA iθA ≈ e c(A − A1) dA + e c(A − A2) dA, (4) R  R =(eiθA1 + eiθA2 ) eiθAc(A) dA R where we have used the bimodal assumption (2). The modulus of this inner product can thus be written     θ A − A  iθAˆ  ( 2 1) iθA  |ψ|e |ψ| =2cos e c(A) dA . (5) 2 R

If the function c(A) is smooth, then we can deduce that the integral in (5) will have its first minimum when θ ≈ θsing ≡ π/ΔA, where ΔA is the width (e.g., FWHM) of c(A) (and hence the width of each of the two peaks 2 of |f(A)| ). However, the right hand side of (5) also contains the factor cos[θ(A2 − A1)/2]. This interference factor becomes zero when θ = θsup ≡ π/(A2 − A1) and appears only because the system is in a superposition state. For large (“macroscopic”) values of the separation between A1 and A2, the evolution into an orthogonal state due to this factor may be significantly faster than the evolution due to the width ΔA of the probability distribution peaks. Therefore, a measure of the system’s “interference macroscopality” M is the inverse ratio between the interaction needed to evolve the system into an orthogonal state when it is in a superposition, and when it is not. Thus, θ |A − A | M = sing ≈ 1 2 . (6) θsup ΔA Being a ratio, M is dimensionless. Moreover, it is independent of the specifics of the measurement such as geometry, field strengths, etc., as all these experimental parameters are built into the evolution parameter θ. That is, the measure compares the evolution under identical experimental conditions for a system and a binary superposition of the very same system. Moreover, the measure is applicable to all bimodal superpositions, it is, i.e., easy to extend the analysis to observables with a discrete eigenvalue spectrum as will be done in Sec. 5, below.

2 Figure 1. A bimodal probability distribution P (A)=|f(A)| with peaks of width ΔA centered at the values A1 and A2. 5. THREE EXAMPLES To give some concrete examples of our measure, consider first a binary superposition of coherent states. At least one experiment has been performed on such a state (26), that is of the form 1 |ψ = √ (|eiϕ|α| + |e−iϕ|α|), (7) N where the normalization factor

N =2{1 + exp[−|α|2(1 − cos 2ϕ)] cos(|α|2 sin 2ϕ)} (8) and wherea ˆ|eiϕ|α| = |α|eiϕ|eiϕ|α|. If such a state’s quadrature-amplitude distribution is measured, where † the Hermitian quadrature amplitude operator is defineda ˆ2 =(ˆa − aˆ )/2i, a bimodal distribution will be found, provided that the superposition “distance” D =2|α| sin ϕ is sufficiently large. Hence, the preferred evolution operator is exp(iθaˆ2). Let us first see how a coherent state evolves towards orthogonality under this operator. We use the fact that

† † † 2 † eiθaˆ2 = e(−θaˆ +θaˆ)/2 = e−θaˆ /2eθa/ˆ 2e−[−θaˆ ,θaˆ]/8 = e−θ /8e−θaˆ /2eθa/ˆ 2. (9)

Since coherent states are eigenstates to the operatora ˆ, it is easy to compute

2 α|eiθaˆ2 |α = e−θ /8eiθIm{α}, (10) where Im{z} denotes the imaginary part of z and where we have assumed that |α| = 0. We see that the magnitude 2 of the overlap between a coherent state and an evolved copy of itself decreases with time as e−θ /8. Note that the result is independent of |α|. The evolved state never becomes fully√ orthogonal to the unevolved state, the overlap goes only asymptotically toward zero, but we can define θsing ≈ 2 2 as the typical “orthogonality” time. In this time the (absolute value of the) overlap has decreased from unity to 1/e. Applying the same evolution operator to the state |ψ, above, one gets

2 −θ /8   2e 2 ψ|eiθaˆ2 |ψ θ|α| ϕ e−|α| (1−cos 2ϕ) |α|2 ϕ . = N cos( sin )+ cos( sin 2 ) (11)

In this case, for reasonable large values of |α| and ϕ (when the rightmost term of the equation above is negligible) the state becomes almost orthogonal under evolution, and this first happens when π θ = θ = . (12) sup 2|α| sin ϕ

Hence, the superposition’s interferometric macroscopality will be √ θ 2 2 M = sing ≈ ≈ 2|α| sin ϕ. (13) θsup π/2|α| sin ϕ Note that the interferometric macroscopality coincides with the superposition distance D for coherent states. As a second example, consider the two-mode, bosonic state √ (|0⊗|N + |N⊗|0)/ 2. (14)

Each of the terms in the superposition state is a two-mode number-state and quite obviously, the preferred observable is the difference between the photon numbers in the two modes, that is Aˆ =(ˆn1 − nˆ2)/2, wheren ˆ1 (ˆn2) is the number operator operating on the left (right) mode in the tensor product. Since both components of the superposition state lack dispersion for the preferred observable they do not evolve under the preferred evolution operator exp[iθ(ˆn1 − nˆ2)/2]. The number states are also highly nonclassical (and consequently very fragile with respect to dissipation). Hence, although the state is in a highly excited superposition when N is large, it is not in a superposition of two semiclassical states, and therefore, we think it is not in the spirit of Schr¨odinger’s example. We can still derive an interferometric macroscopality by approximating each of the superposition constituent states with a suitable semiclassical state. Therefore, we compare the evolution of the superposition state with that of a coherent state with a photon number expectation value√ of N. Since the dispersion (ˆn −nˆ)2 of a coherent state is nˆ1/2, we find that for such a state we get Δn = N. (This result is more general than it may appear at first glance. Due to the law of large numbers, the dispersion of many classical systems scale as the square root of the excitation or the number of constituent particles.) The state in Eq. (14) becomes orthogonal for |θ| = π/N. Hence, the interferometric macroscopality of the state becomes √ π/ N √ M ≈ ≈ N. π/N (15) √ This result makes intuitively sense. In interferometry, this N improvement in utility denoted the difference between the standard quantum limit and the Heisenberg limit (27; 28; 29; 30; 31; 32). The scaling would not change qualitatively if we considered a coherent state with mean photon number N/2. As a third example, we consider some collection of particles with highly correlated motions around some mean position. Each particle k is, on the average, localized at some coordinater ¯k, but its position fluctuates 2 1/2 around this position. Say that the position dispersion for the particle is δ, that is ψk|(Δˆxk) |ψk = δ. Assume now a state with N particles, where all particles’ position fluctuations are perfectly correlated. Due to the perfect correlation, the dispersion of the center of mass operator k xˆk is Nδ. This means that the state will become orthogonal under the action of the center of mass observable for an interaction strength of −1 θsing =(Nδ) . Suppose that we now prepare a superposition of two such states, with the mean position of each particle displaced between the states by a distance Δx. We find that this states interferometric macroscopality becomes N x x M Δ Δ . = Nδ = δ (16) It is seen that even if the center of mass is displaced by a distance equal to the dispersion of each individual particle, the state’s interferometric macroscopality will remain approximately unity. This is a consequence of the large dispersion associated with each state in the superposition, and its scaling with respect to particle number. Unless Δx ≥ δ it will be difficult to resolve the two peaks in the position distribution due to the large fluctuations. Although such a state is highly nonclassical (very entangled), the state is not particularly more suitable for interferometric applications than each of its constituent states. We do not consider such a state a good example of a Schr¨odinger-cat state. The main reason is that the state is composed of a superposition of two highly entangled states, and not of two semiclassical states (indeed, the superposed states are not much more robust to dissipation then their superposition is). Therefore, the state seems much better suited for quantum information applications, and as such, it is better quantified with quantum information measures such as concurrence.

6. DISSIPATION AND ENTANGLEMENT BASED SIZE MEASURES We are certainly not the first to try to quantify the size of macroscopic superposition states. Measures to quantify the macroscopality have been proposed by, e.g., Leggett (33; 20) and, recently, by D¨ur et al. (34). Leggett’s measure, called ‘disconnectivity’ (33), has some affinity with entanglement measures, and shares a non-operational nature with many of them. The idea behind it is to count the effective number of quantum- correlated particles in the state. This is achieved, roughly speaking, by checking how small the entropies of all possible reduced states are. The disconnectivity is not invariant under global unitary transformations, which is equivalent to stating that there is a preferred observable as discussed in Sec. 2. A disadvantage with Leggett’s measure, as we see it, is its insensitivity to the state excitation. E.g., the state (14) above has the disconnectivity ln 2, irrespective of the excitation N. The number ln 2 stems from the fact that this is a two-mode state. Leggett’s measure seems to work fine for Fermionic states, where the only way to increase the state’s excitation is by increasing the number of particles, modes, occupied energy levels, etc. For bosonic states it sometimes gives counter intuitive results. Leggett’s measure is closely related to dissipation, because upon loss of one of the particles, the measure predicts how mixed the resulting state becomes. In general, this measure will give similar results as our measure. This can be understood in terms of distinguishability. If the distance |A1 − A2| between the two peaks in the dispersion function becomes large, observation of the pertinent value of a single particle of the state will allow us to infer which of the two probability distribution is most likely to have “caused” this result, and consequently the state will collapse towards this possibility. This will destroy the superposition. Likewise, if each of the two constituent states are highly entangled, such as in a GHZ-like superposition, the post-measurement state can be perfectly predicted from a measurement of a single particle. This is what happens for the state (14), above. Therefore, its disconnectivity is small. On the other hand, if the superposition distance |A1 − A2| is small compared to each state’s dispersion, then the two distributions will overlap, and it will be difficult to infer anything about the post-measurement state from the measurement. This will make the superposition more robust against dissipation, but will also make the state’s interferometric macroscopality small. The intimate connection between dissipation and size now becomes clear. Another measure proposed by D¨ur et al. (34), is based on the similarities between a superposition of the −1/2 ⊗N ⊗N −1/2 ⊗n ⊗n form 2 (|φ+ + |φ− ), where |φ−|φ+| = = 0, and a standard state of the form 2 (|0 + |1 ), assessing for which n they are most similar. Similarity is established, roughly speaking, by comparing either decoherence times or entanglement resources. These two criteria lead to the same result. Here, again, we see the close connection between utility and sensitivity to dissipation. This relation should not come as a surprise to anyone in the field, as dissipation has been a major obstacle for the development of quantum technology. The more useful states are, the more fragile to dissipation they tend to be.

7. CONCLUSIONS We have shown that the “macroscopality” of a superposition is intimately related to the latter’s utility in interferometric applications, and we have given a measure, called for this reason ‘interference macroscopality’, that can quantify this utility for binary superpositions. This relation comes about in virtue of the superposition’s greater dispersion for a particular preferred observable, and hence faster evolution under the action of the latter, in comparison to the dispersion and evolution of the superposed states. Three examples have been given that illustrate this point as well as the numerical application of our measure. Finally, we also have briefly discussed the relation between a superposition’s “macroscopality”, interference utility, and fragility to dissipation as captured by other “size measures” proposed in the literature.

ACKNOWLEDGMENTS This work was supported by the Swedish Research Council (VR).

References [1] K. A. Kirkpatrick, ““Quantal” behavior in classical probability”, Found. Phys. Lett. 16(3), p. 199, 2003. quant-ph/0106072.

[2] K. A. Kirkpatrick, “Classical three-box “paradox””, J. Phys. A: Math. Gen. 36(17), p. 4891, 2003. quant-ph/0207124.

[3] E. Schr¨odinger, “Die gegenw¨artige Situation in der Quantenmechanik”, Naturwissenschaften 23, pp. 807– 812, 823–828, 844–849, 1935. Translated in (35). [4] D. D. Awschalom, M. A. McCord, and G. Grinstein, “Observation of macroscopic spin phenomena in nanometer-scale magnets”, Phys. Rev. Lett. 65, p. 783, 1990. [5] D. D. Awschalom, J. F. Smyth, G. Grinstein, D. P. DiVincenzo, and D. Loss, “Macroscopic quantum tunneling in magnetic proteins”, Phys. Rev. Lett. 68, p. 3092, 1992. [6] J. I. Cirac, M. Lewenstein, K. Mølmer, and P. Zoller, “Quantum superposition states of Bose-Einstein condensates”, Phys. Rev. A 57(2), p. 1208, 1998. quant-ph/9706034. [7] J. Ruostekoski, M. J. Collett, R. Graham, and D. F. Walls, “Macroscopic superpositions of Bose-Einstein condensates”, Phys. Rev. A 57(1), p. 511, 1998. cond-mat/9708089. [8] M. Arndt, O. Nairz, J. Vos-Andreae, C. Keller, G. van der Zouw, and A. Zeilinger, “Wave-particle duality of C60 molecules”, Nature 401, October 1999. [9] J. C. E. Harris, J. E. Grimaldi, D. D. Awschalom, D. Chiolero, and D. Loss, “Excess spin and the dynamics of antiferromagnetic ferritin”, Phys. Rev. B 60, p. 3453, 1999. cond-mat/9904051. [10] Y. Nakamura, Y. A. Pashkin, and J. S. Tsai, “Coherent control of macroscopic quantum states in a single- Cooper-pair box”, Nature 398, pp. 786–788, April 1999. cond-mat/9904003. [11] J. R. Friedman, V. Patel, W. Chen, S. K. Tolpygo, and J. E. Lukens, “Quantum superposition of distinct macroscopic states”, Nature 406, pp. 43–46, July 2000. cond-mat/0004293. [12] C. H. van der Wal, A. C. J. ter Haar, F. K. Wilhelm, R. N. Schouten, C. J. P. M. Harmans, T. P. Orlando, S. Lloyd, and J. E. Mooij, “Quantum superposition of macroscopic persistent-current states”, Science 290, October 2000. [13] P. Bonville and C. Gilles, “Search for incoherent tunnel fluctuations of the magnetisation in nanoparticles of artificial ferritin”, Phys. B: Condens. Matter 304, p. 237, 2001. cond-mat/0007090. [14] B. Julsgaard, A. Kozhekin, and E. S. Polzik, “Experimental long-lived entanglement of two macroscopic objects”, Nature 413, pp. 400–403, Sept. 2001. quant-ph/0106057. [15] W. Marshall, C. Simon, R. Penrose, and D. Bouwmeester, “Towards quantum superpositions of a mirror”, 2002. quant-ph/0210001. [16] A. Auffeves, P. Maioli, T. Meunier, S. Gleyzes, G. Nogues, M. Brune, J.-M. Raimond, and S. Haroche, “Entanglement of a mesoscopic field with an atom induced by photon graininess in a cavity”, 2003. quant-ph/0307185. [17] M. J. Everitt, T. D. Clark, P. B. Stiffel, R. J. Prance, H. Prance, A. Vourdas, and J. F. Ralph, “Supercon- ducting analogues of quantum optical phenomena: Schr¨odinger cat states and squeezing in a SQUID ring”, 2003. quant-ph/0307175. [18] O. Nairz, M. Arndt, and A. Zeilinger, “Quantum interference experiments with large molecules”, Am. J. Phys. 71, p. 319, 2003. [19] S. Ghosh, T. F. Rosenbaum, G. Aeppli, and S. N. Coppersmith, “Entangled quantum state of magnetic dipoles”, Nature 425, p. 48, Sept. 2003. [20] A. J. Leggett, “Testing the limits of quantum mechanics: Motivation, state of play, prospects”, J. Phys.: Condens. Matter 14, p. R415, 2002. [21] S. Hecht, “The quantum relations of vision”, J. Opt. Soc. Am. 32, pp. 42–49, 1942. [22] A. Rose, “The sensitivity performance of the human eye on an absolute scale”, J. Opt. Soc. Am. 38(2), pp. 196–208, 1948. [23] D. A. Baylor, S. Lamb, and M. H. Yau J. Physiology (London) 288, p. 613, 1979. [24] L. Mandelstamm and I. Tamm J. Phys. (Moscow) 9, p. 249, 1945. [25] L. Vaidman, “Minimum time for the evolution to an orthogonal quantum state”, Am. J. Phys. 60(2), pp. 182–183, 1992. See also (36). [26] M. Brune, E. Hagley, J. Dreyer, X. Maˆıtre, A. Maali, C. Wunderlich, J.-M. Raimond, and S. Haroche, “Observing the progressive decoherence of the “meter” in a quantum measurement”, Phys. Rev. Lett. 77(24), p. 4887, 1996. [27] C. M. Caves, “Quantum-mechanical radiation-pressure fluctuations in an interferometer”, Phys. Rev. Lett. 45(2), pp. 75–79, 1980. [28] C. M. Caves, “Quantum-mechanical noise in an interferometer”, Phys. Rev. D 23(8), pp. 1693–1708, 1981. [29] B. Yurke, S. L. McCall, and J. R. Klauder, “SU(2) and SU(1, 1) interferometers”, Phys. Rev. A 33(6), pp. 4033–4054, 1986. [30] P. Grangier, R. E. Slusher, B. Yurke, and A. LaPorta, “Squeezed-light–enhanced polarization interfero- meter”, Phys. Rev. Lett. 59(19), pp. 2153–2156, 1987. [31] C. Fabre, J. B. Fouet, and A. Maˆıtre, “Quantum limits in the measurement of very small displacements in optical images”, Opt. Lett. 25(1), pp. 76–78, 2000. [32] N. Treps, U. Andersen, B. Buchler, P. K. Lam, A. Maˆıtre, H.-A. Bachor, and C. Fabre, “Surpassing the stand- ard quantum limit for optical imaging using nonclassical multimode light”, Phys. Rev. Lett. 88, p. 203601, 2002. quant-ph/0204017. [33] A. J. Leggett, “Macroscopic quantum systems and the quantum theory of measurement”, Progr. Theor. Phys. Suppl. 69, p. 80, 1980. [34] W. D¨ur, C. Simon, and J. I. Cirac, “Effective size of certain macroscopic quantum superpositions”, Phys. Rev. Lett. 89, p. 210402, 2002. quant-ph/0205099. [35] J. D. Trimmer, “The present situation in quantum mechanics: a translation of Schr¨odinger’s “cat paradox””, Proc. Am. Phil. Soc. 124, p. 323, 1980. http://www.tu-harburg.de/rzt/rzt/it/QM/cat.html. [36] J. Uffink, “The rate of evolution of a quantum state”, Am. J. Phys. 61(10), pp. 935–936, 1993. See (25).