APA Newsletters NEWSLETTER ON PHILOSOPHY AND COMPUTERS

Volume 09, Number 2 Spring 2010

FROM THE EDITOR, PETER BOLTUC

FROM THE CHAIR, MICHAEL BYRON

ARTICLES

Featured Article

TERRY HORGAN AND IRIS OVED “John Pollock’s Posthumous Article”

JOHN L. POLLOCK “Probabilities for AI”

DISCUSSION PAPERS

BERYS GAUT “Computer Art”

DOMINIC MCIVER LOPES “Remediation Revisited: Replies to Gaut, Matravers, and Tavinor”

LYNNE RUDDER BAKER “Shrinking Difference—Response to Replies”

ARTICLES

LORENZO MAGNANI “Building Artificial Mimetic Minds: How Hybrid Humans Make Up Distributed Cognitive Systems”

© 2010 by The American Philosophical Association ISSN 2155-9708

RUSS ABBOTT “Abstract Data Types and Constructive Emergence”

MARGARET CUONZO “Virtual Homes and Sherlock Holmes: On the Existence of Virtual (and Other Abstract) Entities”

CARTOON

RICCARDO MANZOTTI “The Fallacy of the Intermediate Entity”

INVITATIONS

SUSAN CASTRO “Twitter and Beyond the Blackboard: The Upcoming Sessions Organized by this Committee”

KAMILLA JOHANNSDOTTIR “Biologically Inspired Cognitive Architectures 2010”

APA NEWSLETTER ON
Philosophy and Computers

Piotr Bołtuć, Editor Spring 2010 Volume 09, Number 2

FROM THE EDITOR

Piotr Bołtuć
University of Illinois at Springfield

Due to the length of the featured article by the late J. Pollock, we were unable to include a number of papers already in the editor’s hand in the current issue. Among other works, in the next issue readers may expect an article by Susan Stuart, a former member of this committee, on enkinaesthesia, and another by Thomas Polger on distributed computing.

The highlights of the current issue of the Newsletter include a longer paper by the late John Pollock advancing the notion of probable probabilities, which comes with a helpful introduction by Terry Horgan and Iris Oved, and an article by Lorenzo Magnani on humans as distributed cognitive systems. Pollock’s article (from section 4 onwards) may not be easy to follow for those philosophers who didn’t care for math in college, though we do have a cartoon later in this newsletter.

Two discussions, on the status of computer art and the ontology of artifacts, come to a conclusion in the current issue. The two papers on computer art, by Berys Gaut and by Dominic McIver Lopes, close the bloc co-organized with the ASA. We are also glad to have the response by Lynne Baker to her commentators on the shrinking ontological difference between artifacts and natural kinds.

In his article on emergence and data types, Russ Abbott takes up the challenge of crossing the border between philosophy and computer science, which is always an ambitious and somewhat perilous path to take. We are also glad to publish an article by Margaret Cuonzo that uses Kripke’s theory to explain the status of online entities.

I find a philosophical cartoon by Riccardo Manzotti, putting forth the claim that we do not need images as an intermediate notion in epistemology, oddly interesting and unsettling at the same time. This is because the form is so persuasive that one needs a lot of effort to step back and view its thesis with a philosophical eye. It seems that the critical reflection philosophers undertake when reading a traditional article does not come easily when the form—such as a movie, a cartoon, or a novel—forces the message through, bypassing the vigilance of one’s intellect.

We finish with invitations to the two upcoming sessions organized by this committee and to the next BICA conference; more submissions of notes of this kind would be particularly welcome.

As always, I want to thank the members of the APA Committee on Philosophy and Computers for making it possible for me to edit this Newsletter, and the Chair of the Department of Philosophy as well as the Dean of Liberal Arts and Sciences at UIS for making it a little easier for me.

FROM THE CHAIR

Michael Byron
Kent State University

The Committee recently awarded the 2008 and 2009 Barwise Prizes.

The 2009 Barwise Prize went to Luciano Floridi of the University of Hertfordshire and the University of Oxford. Luciano gave an outstanding presentation at the Eastern Division meeting in December, in which he sketched his vision of what he calls the Fourth Revolution (learn about it on YouTube: http://www.youtube.com/watch?v=N2AE8zy6PFo).

The 2008 Barwise Prize went to Terry Bynum of Southern Connecticut State University. Due to scheduling issues, we could not award the Prize until the Central Division meeting this year. Terry also gave a terrific talk, in which he compared the views of Norbert Wiener with those of Floridi.

The recent Central Division meeting also saw a session organized by committee members Peter Boltuc and David Anderson entitled “Machines, Intentionality, Ethics, and Cognition,” which clearly covers it all! The speakers included David L. Anderson (Illinois State University), Keith Miller (University of Illinois–Springfield), Thomas W. Polger (University of Cincinnati), and Ricardo Sanz (Universitat Autònoma de Barcelona). Other information about this session is available here: http://bit.ly/93Sqd1.

The Committee also sponsored a session at the Pacific Division meeting in April. The session was entitled “Beyond Blackboard: Teaching Philosophy with Technology,” and speakers included committee members Renee Smith (Coastal Carolina University), H. E. Baber (University of San Diego), and Garrett Pendergraft (University of California–Riverside), as well as Gregory R. Mayes (California State University–Sacramento) and Mark Alfino (Gonzaga University).

Congratulations to our award winners, and I look forward to seeing you at our future sessions.

FEATURED ARTICLE

Commentary on John Pollock’s “Probabilities for AI”

Iris Oved
University of Arizona

Terry Horgan
University of Arizona

John L. Pollock died on September 29, 2009, at the age of 69. He leaves a legacy of over a dozen published books and over 100 published journal articles that span the fields of epistemology, philosophical logic, probability theory, and artificial intelligence. And there is more! John was working on four additional books connecting some of his earlier published work with new material, constructing an ever-growing full philosophical picture.

As Pollock enthusiasts, we would like to see how observations from this paper, “Probabilities for AI,” can be brought to bear on the design of current and future AI systems. We would also like to see how they bear on probability theory and on epistemic considerations about how reasoning with probabilities ought to be carried out in resource-bounded agents like ourselves. Although parts of the paper appear elsewhere,1 much of it is new and deserves the wider audience that John intended.

In this paper, Pollock shows how certain inferences about probabilities can be justifiably used, by human beings and AI systems, even though the inferences are not grounded in the probability calculus. He does this by showing that these probability inferences are sanctioned by certain general principles about probability which themselves have probability 1. Such principles can be used to derive conclusions that are, in Pollock’s terminology, defeasibly justified, rather than deductively justified. That is, the inferences are justified because they are based on principles that have probability 1; but, in Pollock’s treatment of probability, having probability 1 is compatible with having counter-instances.

The work in this paper builds on Pollock’s 1990 Nomic Probability theory, which included a series of proofs that warrant what he is here calling Classical Direct Inference. The notion of Classical Direct Inference relies on a distinction that Pollock makes between generic probabilities and singular probabilities (he calls them “indefinite” and “definite” probabilities elsewhere). Generic probability, which he denotes with the lower-case expression ‘prob’, applies to properties, which are expressible in a formal language via formulas containing free variables. The expression ‘prob’ is a variable-binding operator that binds those free variables. For instance, ‘probx(Fx | Gx)’, also sometimes written as ‘prob(F | G)’, denotes the conditional probability of something’s having the property F-ness given its having the property G-ness. Singular probability, which Pollock denotes with the capitalized expression ‘PROB’, applies to propositions, which are expressible in a formal language via formulas containing no free variables (i.e., sentences). Consider his example of “Classical Direct Inference”: from the Generic Probability probx(x has disease D | x tested positive on test T) = .7, one infers to a Singular Probability concerning a particular person, Bernard: PROB(Bernard has disease D | Bernard tested positive on test T). Inferences of this kind are commonplace in AI systems as well as in ordinary human reasoning and in scientific practice. But often they are not even recognized as inferences—which is especially apt to happen if one is not mindful of the important, insufficiently appreciated, and infrequently acknowledged distinction between generic and singular probabilities. In the present paper, Pollock moves beyond Classical Direct Inference to show that further inferences are defeasibly justified within his Nomic Probability framework.

One of the principles that Pollock discusses in the present paper is the well-known Principle of Statistical Independence. This principle warrants inferences from assumptions of the form probx(Ax | Cx) and probx(Bx | Cx) to conclusions of the form probx(Ax & Bx | Cx), where the value of the latter is simply the product of the values of the former two. He calls the class of inferences that rely on the Principle of Statistical Independence Nonclassical Direct Inference because they go from generic probabilities to other generic probabilities. Such inferences are commonplace in AI systems as well as in human scientific reasoning. The principle sometimes fails to hold, but it seems safe for an agent to assume that it holds when the agent lacks evidence connecting the two properties A and B. Pollock shows why it is safe to assume for any two properties that they are in fact independent, viz., the likelihood of their independence is 1.

Besides showing that agents are justified in assuming the principle, Pollock shows how to compute the defeat of specific inferences based on the principle, which is needed if the agent is to be able to use later evidence that connects the probabilities. Moreover, Pollock shows how sets of Statistical Independence assumptions can be logically inconsistent, and he shows how an agent can defeat the inferences in such circumstances as well.

Another important principle that Pollock shows to have probability 1 is a principle he calls Y-Independence. This principle allows for a class of inferences he calls Computational Inheritance, which is an extremely powerful class of inferences for an agent to have at its disposal. This class seems to be entirely overlooked by AI, probability theory, and the empirical sciences. Y-Independence is a principle that allows for the computation of the joint probability probx(Ax | Bx & Cx) from the probabilities probx(Ax | Bx) and probx(Ax | Cx). Pollock’s example computes the conjunctive generic probability probx(x has disease D | x tested positive on test T1 and x tested positive on test T2) from the two generic probabilities probx(x has disease D | x tested positive on test T1) = .7 and probx(x has disease D | x tested positive on test T2) = .75. He also shows how to compute defeat for inferences based on this principle. In the later sections of the paper, he goes on to show that further principles for computing probabilities from probabilities have probabilities of 1, and how to compute their defeat.

Here are some issues that arise in light of Pollock’s paper, and some questions involving these issues. We offer these in the hope that they will spark further discussion:

(1) Background theoretical justification vs. direct deployment. Which aspects of Pollock’s justification for these principles are to be carried out by the AI systems themselves, and which aspects are to motivate the building-in of the principles as assumptions? Their justification may suggest that they can be built in; however, in order to compute the defeat of inferences that are justified by these principles, do agents need to have access to the reasons that justify them?

(2) Explicit recognition of, and implicit reliance upon, the generic/singular distinction. To what extent is the distinction between generic and singular probabilities implicitly relied upon in other work in probability theory, or in AI, or in any of the other sciences? To what extent is this distinction explicitly recognized and acknowledged? What is the relationship between generic probabilities and laws of nature? Are the

laws of nature themselves generic probabilities, or do generic probabilities rely on laws of nature?

(3) Transferability to alternative conceptions of probability. To what extent do Pollock’s results depend specifically on his conception of probability as a (modal, infinitary) form of relative frequency? (He describes them as proportions, which he construes as relative frequencies across infinitely many nomically possible worlds.) Could his results be embraced by advocates of other notions of probability, including more subjectivist and epistemic notions like Bayesian probability? If not, does Objective Bayesianism make the right kinds of adjustments?

(4) Implementation and Augmentation. How can Pollock’s contributions—involving such matters as classical and nonclassical direct inference, computational inheritance, and defeaters for defeasible inferences—be implemented in AI? How would such implementation augment current systems? Does making the generic/singular distinction explicit open a doorway to building agents with more, and more powerful, cognitive processing? Are there further defeasible principles involving probable probabilities that Pollock’s framework might incorporate? Can Pollock’s results help augment systems in the Machine Learning branch of AI, and/or systems that store (rather than learn) degrees of correlation between phenomena, such as Bayesian networks—say, by allowing such systems to defeasibly infer additional new probabilities using the defeasible inferences that Pollock has justified?

We heartily recommend John Pollock’s paper “Probabilities for AI”2 to AI researchers, cognitive scientists, and philosophers interested in the use of probability in designing artificial agents or in modeling human cognition, and also to those interested in supplementing probability theory with defeasible second-order principles which themselves have probability 1. This important work deserves a wide audience.3

Endnotes
1. Some of this material appears in “Reasoning Defeasibly about Probabilities” (forthcoming) in Knowledge and Skepticism, edited by Michael O’Rourke and Joseph Campbell (Cambridge, MA: MIT Press). Some of the material was in “Probable Probabilities,” originally presented at the Pacific Division APA meeting in 2007.
2. His article is also available online at http://oscarhome.soc-sci.arizona.edu/ftp/PAPERS/Probabilities%20for%20AI.pdf.
3. We would like to acknowledge Martin Stone Davis for helping with the formatting of John Pollock’s article.

Probabilities for AI

John L. Pollock
University of Arizona

Abstract
Probability plays an essential role in many branches of AI, where it is typically assumed that we have a complete probability distribution when addressing a problem. But this is unrealistic for problems of real-world complexity. Statistical investigation gives us knowledge of some probabilities, but we generally want to know many others that are not directly revealed by our data. For instance, we may know prob(P/Q) (the probability of P given Q) and prob(P/R), but what we really want is prob(P/Q&R), and we may not have the data required to assess that directly. The probability calculus is of no help here. Given prob(P/Q) and prob(P/R), it is consistent with the probability calculus for prob(P/Q&R) to have any value between 0 and 1. Is there any way to make a reasonable estimate of the value of prob(P/Q&R)?

A related problem occurs when probability practitioners adopt undefended assumptions of statistical independence simply on the basis of not seeing any connection between two propositions. This is common practice, but its justification has eluded probability theorists, and researchers are typically apologetic about making such assumptions. Is there any way to defend the practice?

This paper shows that on a certain conception of probability—nomic probability—there are principles of “probable probabilities” that license inferences of the above sort. These are principles telling us that although certain inferences from probabilities to probabilities are not deductively valid, nevertheless the second-order probability of their yielding correct results is 1. This makes it defeasibly reasonable to make the inferences. Thus I argue that it is defeasibly reasonable to assume statistical independence when we have no information to the contrary. And I show that there is a function Y(r,s|a) such that if prob(P/Q) = r, prob(P/R) = s, and prob(P/U) = a (where U is our background knowledge), then it is defeasibly reasonable to expect that prob(P/Q&R) = Y(r,s|a). Numerous other defeasible inferences are licensed by similar principles of probable probabilities. This has the potential to greatly enhance the usefulness of probabilities in practical application.

1. Introduction
AI aims at multiple goals, and probability plays an essential role in most of them. One of the ultimate aspirations of AI is the construction of agents of human-level intelligence, capable of operating in environments of real-world complexity (in short, “generally intelligent agents,” or GIAs). For many years this problem was largely set aside as being too hard for existing AI technology, although there has been a recent resurgence of interest in GIAs. A more modest goal of AI is the construction of applications that can provide intelligent assistance to human agents. Either goal frequently requires AI systems to use and make inferences about probabilities, and as we will see, both goals encounter similar problems in their use of probabilities.

GIAs are faced with environments about which they have only limited knowledge. They must be able to expand their knowledge base, and use that knowledge to guide their activity. Just like human beings, they will have to be able to discover new regularities in the world, but these will not generally be exceptionless regularities. Much of that knowledge will be probabilistic. Their reasoning about how to act must then be based on this probabilistic knowledge.

On the other hand, at least some kinds of AI assistants may not be required to discover new generalizations. Perhaps the human operators can be relied upon to provide the requisite general knowledge of the world, and then the AI assistants will reason from there. However, for humans too, most of our general knowledge of the world is probabilistic. We know that if there are certain kinds of clouds, it will probably rain; if we turn the key in the ignition of our car, it will probably start; and so forth. Very little of our general knowledge is of exceptionless laws of nature.

In their reasoning about probabilities, both GIAs and AI-assisted humans will face a general epistemological problem that has not been adequately addressed in the AI literature. AI researchers often assume that when a problem is addressed by either a GIA or an AI-assisted human, they will come to the problem equipped with knowledge of a complete probability distribution. The first problem for this assumption is that in a sufficiently complex environment it would be impossible to store a complete probability distribution in an AI system. In general, given n simple propositions, it will take 2^n logically independent probabilities to specify a complete probability distribution. For a rather small number of simple propositions, this is a completely intractable number of logically independent probabilities. For example, given just 300 simple propositions, a grossly inadequate number for describing many real-life problems, there will be 2^300 logically independent probabilities. 2^300 is approximately equal to 10^90. To illustrate what an immense number this is, recent estimates of the number of elementary particles in the universe put it at 10^80–10^85. Thus to know the probabilities of all the constituents of a complete probability distribution, we would have to know 5–10 orders of magnitude more logically independent probabilities than there are elementary particles in the universe.

An obvious problem is that if an AI system had to store all of these probabilities explicitly, it would have to have more memory than there are elementary particles in the universe. Sometimes this problem can be alleviated by assuming that most of the propositions under consideration are statistically independent of each other. That enables us to store the probabilities in a Bayesian net, which only requires us to explicitly store probabilities where independence fails. It can reasonably be doubted that there will always be enough statistical independence for this problem to be solved using Bayesian nets. But let us set that aside and focus on the epistemological problem. To use Bayesian nets in this way, we have to know which propositions are statistically independent of each other. So the human agent, or the GIA, would still have to know the values of all the 2^n logically independent probabilities required for specifying a complete probability distribution. In other words, the use of Bayesian nets may alleviate the storage problem, but not the epistemological problem of knowing the values of the probabilities required for constructing a Bayesian net.

In applying probabilities to real-world problems, researchers typically fill in many of the gaps in their knowledge by simply assuming statistical independence when they have no information to the contrary. This strategy is often employed in the construction of Bayesian nets, but such assumptions are also made more generally. When they see no apparent connection between two kinds of events A and B, researchers assume that the probability of A occurring is independent of whether B occurs, i.e., prob(A&B) = prob(A)⋅prob(B). Such assumptions are “defeasible,” in the sense that they may be reasonable assumptions given what the researcher knows initially, but further knowledge could, at least in principle, make it clear that A and B are not really statistically independent.

Defeasible assumptions of statistical independence can go a long way towards filling the gaps in our knowledge of probability distributions. However, deciding which independence assumptions to make has usually been based on nothing but untutored intuition. AI researchers have lacked formal tools for choosing independence assumptions. The reason this is a problem is that different sets of seemingly reasonable independence assumptions are often inconsistent with each other. How do we decide which set of assumptions to adopt? Untutored intuition often fails us here, and the probability calculus is of no help. For example, consider a community with building codes that specify that only commercial buildings can be painted grey, and also specify that only commercial buildings can be multi-storey. Let A = painted grey, B = multi-storey, C = building in this community, and D = commercial building in this community. Suppose prob(A/C) = r, prob(B/C) = s, and prob(D/C) = d. It is tempting to assume that A and B are independent relative to C, and so prob(A&B/C) = r⋅s. But it is equally tempting to assume that A and B are independent relative to D. However, it is impossible for both of these independences to hold. (A&C) and (B&C) are both subproperties of (i.e., logically entail) D. It follows by the probability calculus that prob(A/D) = r/d and prob(B/D) = s/d. So if A and B are independent relative to D, prob(A&B/D) = (r⋅s)/d^2. However, it is also true that D entails C. It then follows from the probability calculus that prob(A&B/C) = (r⋅s)/d. Thus if d ≠ 1, A and B cannot be independent relative to both C and D. Once this conflict is discovered, intuition alone might leave us wondering which independence assumption we should make.

The preceding example illustrates that a blanket assumption of statistical independence for all cases in which such assumptions seem initially reasonable will often be inconsistent with the probability calculus. The following theorem of the probability calculus is another illustration of this phenomenon:

Theorem: If A, B, C each entail U and
(a) prob(C/B & A) = prob(C/A);
(b) prob(C/B & ~A) = prob(C/U & ~A);
(c) prob(C/A) ≠ prob(C/U); and
(d) prob(B/A) ≠ prob(B/U);
then prob(C/B) ≠ prob(C/U).

In other words, if (c) and (d) hold, then the pair of independence assumptions in (a) and (b) is inconsistent with the assumption that C is independent of B. The upshot is that defeasible assumptions of independence can help alleviate the epistemological problem, but we need a theory to guide us in making defeasible assumptions of statistical independence, because our untutored intuitions will often lead us into contradiction.

Of course, even with such assumptions of independence, there will be a vast number of useful probabilities we will not know. Discovering the values of interesting probabilities is a difficult epistemic task. In the sciences, researchers get journal publications out of the discovery of new probabilistic generalizations, and even in everyday life, we usually have to observe many repeated occurrences of events before we can estimate probabilities. This problem can be illustrated by the common need for “joint probabilities.” Consider a medical diagnosis problem. Think of Bernard, who has symptoms suggesting a particular disease, and tests positive on two unrelated tests for the disease. Suppose the probability of a person with those symptoms having the disease is .6. Suppose the probability of a person with those symptoms having the disease if they also test positive on the first test is .7, and the probability of their having the disease if they have those symptoms and test positive on the second test is .75. What is the joint probability of their having the disease if they have those symptoms and test positive on both tests? The probability calculus is of no help here. Given the preceding assumptions, it is consistent with the probability calculus for the joint probability to be anything from 0 to 1. Humans, on the other hand, when faced with a problem like this, expect the joint probability to be higher than the probability of having the disease given only that one tests positive on one of the tests. Such problems of predicting joint probabilities are ubiquitous in the real-world use of probabilities. Statistical investigation gives us knowledge of the component probabilities, but we frequently have no concrete data enabling us to estimate the joint probabilities, and it is often the joint probabilities that we need—not the component probabilities by themselves. A complete probability distribution would contain explicit knowledge of all the joint probabilities, but that is unrealistic. We rarely have the data required to make explicit statistical inferences about joint probabilities.

The upshot is that for sufficiently complex problems, we will typically fall far short of having a complete probability distribution. Our GIAs and AI assistants must accommodate this fact.
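The two numerical situations just described can be checked with a short script. The closed form used for Y below, Y(r,s|a) = rs(1−a) / (rs(1−a) + a(1−r)(1−s)), is not stated at this point in the paper; it is a reconstruction drawn from Pollock's related work on probable probabilities (it is equivalent to combining the two likelihood ratios against the base rate a), so treat it as an illustrative assumption rather than a quotation.

```python
def Y(r, s, a):
    """Estimate prob(P/Q&R) from prob(P/Q) = r, prob(P/R) = s, and the
    background probability prob(P/U) = a. This closed form (multiplying
    the two likelihood ratios against the base rate a) is an assumed
    reconstruction of Pollock's Y-function; the inference it licenses is
    defeasible, not deductively valid."""
    num = r * s * (1.0 - a)
    return num / (num + a * (1.0 - r) * (1.0 - s))

# Bernard's diagnosis: base rate .6; the two tests raise it to .7 and .75.
# The combined estimate exceeds either single-test probability, matching
# the intuition reported in the text.
joint = Y(0.7, 0.75, 0.6)
assert joint > 0.75

# Building-code example: prob(A/C) = r, prob(B/C) = s, prob(D/C) = d,
# with (A&C) and (B&C) each entailing D. Independence relative to C gives
# prob(A&B/C) = r*s, while independence relative to D forces
# prob(A&B/C) = (r*s)/d, so the two assumptions conflict whenever d != 1.
r, s, d = 0.2, 0.3, 0.5              # illustrative values with d != 1
via_C_independence = r * s           # independence relative to C
via_D_independence = (r / d) * (s / d) * d   # equals (r*s)/d
assert via_C_independence != via_D_independence
```

Running the script raises no assertion errors; setting d = 1 makes the two computations of prob(A&B/C) coincide, mirroring the text's observation that the conflict arises only when d ≠ 1.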

— 4 — — Philosophy and Computers — require knowledge of complete probability distributions. This be rational to have degrees of belief that do not conform to the paper explores one possibility for dealing with this problem. probability calculus. Thus a completely rational cognizer will It will be argued that, just as it often seems reasonable to have degrees of belief that conform to the probability calculus, make defeasible assumptions of statistical independence, it and these are the cognizer’s subjective probabilities. can also be reasonable to make other defeasible assumptions A standard objection to the Dutch Book Argument is that about probabilities that cannot be computed just by applying it is impossible for a real (resource-bounded) cognizer to have the probability calculus to probabilities we already know. coherent degrees of belief. The difficulty is that it follows from The core idea will be that there are inferences not licensed the probability calculus (and from the Dutch Book Argument) by the probability calculus which are nevertheless almost that a necessary truth has probability 1 and a necessarily false certain to produce correct results regarding unknown proposition has probability 0. I assume that a GIA or an AI- probabilities. In other words, the second-order probability assisted human cognizer is capable of reasoning about the of the conclusion (about unknown probabilities) being true propositions expressed by a first-order language. However, given that the premises (about known probabilities) are true by Church’s theorem, there is no algorithm for determining is extremely high. Among these inferences will be inferences whether such propositions are necessary truths or necessary about statistical independence, so this promises to resolve the falsehoods. 
aforementioned problem of selecting which assumptions of statistical independence to make. To justify these claims, and to make sense of the second-order probabilities involved, we must focus on what kind of probability we are talking about. Thus the next section briefly surveys the variety of kinds of probability discussed in the literature on the foundations of probability theory.

2. Varieties of Probability

2.1 Subjective Probability

Early approaches to probability theory tended to focus on objective probability, but in response to perceived difficulties for objective probability, subjective probability became the dominant variety of probability in the last half of the twentieth century, and retains that status today. The basic ideas underlying subjective probability were introduced first by Frank Ramsey (1926), but did not have much impact at the time. They were rediscovered by Leonard Savage (1954), and it is his work that caught on and led to the dominant role of subjective probability today. The basic idea is that cognizers have varying degrees of confidence in beliefs about different propositions, and these degrees of confidence should affect what bets they are willing to accept. "Degree of belief" is a technical term, defined as follows:

A cognizer S has degree of belief n/(n+r) in a proposition P iff S would accept any bet that P is true with odds better than r:n, and S would accept any bet that P is false with odds better than n:r.

Degree of belief is supposed to be a measure of the cognizer's degree of confidence. Subjectivists assume that a cognizer has a degree of belief in every proposition.

There is no guarantee that a cognizer's degrees of belief will conform to the probability calculus, but if they do they are said to be coherent. The Dutch Book Argument is standardly used to argue that a cognizer is being irrational if its degrees of belief are not coherent. This argument turns on the notion of a Dutch book, which is a combination of bets on which a person will suffer a collective loss no matter what happens. For instance, suppose you are betting on a coin toss and are willing to accept odds of 1:2 that the coin will land heads and are also willing to accept odds of 1:2 that the coin will land tails. I could then place two bets with you, betting 50 cents against the coin landing heads and also betting 50 cents against the coin landing tails, with the result that no matter what happens I will have to pay you 50 cents on one bet but you will have to pay me $1 on the other. In other words, you have a guaranteed loss—Dutch book can be made against you. The Dutch book argument (due originally to Ramsey 1926) consists of a mathematical proof that if an agent's degrees of belief do not conform to the probability calculus then Dutch book can be made against him. It is alleged that it is irrational to put oneself in such a position, so it cannot be rational to have incoherent degrees of belief. Coherence, however, requires that every necessary truth be assigned probability 1 and every necessary falsehood probability 0, and there is no decision procedure for logical necessity. Thus there is no computationally possible way to ensure that every necessary truth is assigned probability 1 and every necessary falsehood is assigned probability 0.

Faced with this argument, subjectivists generally retreat to the position that only ideal cognizers (unconstrained by limited memory or processing speed) have coherent degrees of belief. For ideal cognizers, subjective probabilities are then identified with their actual degrees of belief. The difficulty is that neither human beings nor AI agents are ideal cognizers. So this leaves subjective probability undefined for them. To get around this difficulty, subjectivists typically define the subjective probability of P for a non-ideal agent S to be the degree of belief S would have in P if S were ideally rational. But this is also problematic. Given a non-ideal agent S with incoherent degrees of belief, is there any reason to think there is a unique degree of belief S would have in a proposition P if S were ideally rational? This, of course, depends upon what constraints rationality imposes, but subjectivists typically claim that as long as an agent's degrees of belief are coherent, they cannot be criticized on grounds of rationality. In particular, subjectivists give no guidance as to how an incoherent set of degrees of belief should be altered to make it coherent. Lacking rules for converting incoherent degrees of belief into coherent degrees of belief, there is no such thing as the degree of belief S would have in P if S were ideally rational. S could have any degree of belief in P and still be rational so long as S's overall set of degrees of belief is coherent.

The upshot is that subjective probability only seems to make sense for ideal agents. However, AI does not deal in ideal agents. Both GIAs and AI-assisted humans have serious resource constraints, including bounded memory and processing speed. So it does not seem that subjective probability has a place in AI.

2.2 Objective Probability

If subjective probabilities are not useful for AI, it seems we should look to objective probabilities. Historically, there have been two general approaches to probability theory. What I will call generic probabilities² are general probabilities, relating properties or relations. The generic probability of an A being a B is not about any particular A, but rather about the property of being an A. In this respect, its logical form is the same as that of relative frequencies. I write generic probabilities using lower case "prob" and free variables: prob(Bx/Ax). For example, we can talk about the probability of an adult male of Slavic descent being lactose intolerant. This is not about any particular person—it expresses a relationship between the property of being an adult male of Slavic descent and the property of being lactose intolerant. Most forms of statistical inference or statistical induction are most naturally viewed as giving us information about generic probabilities.
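The coin-toss Dutch book described above is just arithmetic, and can be checked directly. A minimal sketch (the propositions and stakes mirror the example; the bookkeeping conventions are my own):

```python
# The agent accepts odds of 1:2 on "heads" and, separately, 1:2 on "tails"
# (degree of belief 2/3 in each -- an incoherent pair, since 2/3 + 2/3 > 1).
# We stake 50 cents against the agent's $1.00 on each bet.
OUR_STAKE, AGENT_STAKE = 0.50, 1.00

def agent_net(outcome):
    """Agent's total payoff: wins our stake on the true proposition,
    pays their own stake on the false one."""
    net = 0.0
    for proposition in ("heads", "tails"):
        net += OUR_STAKE if outcome == proposition else -AGENT_STAKE
    return net

for outcome in ("heads", "tails"):
    print(outcome, agent_net(outcome))  # the agent loses $0.50 either way
```

Whatever the coin does, the agent's net is the same guaranteed loss, which is all a Dutch book is.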

— 5 — — APA Newsletter, Spring 2010, Volume 09, Number 2 —

On the other hand, for many purposes we are more interested in probabilities that are about particular persons, or more generally, about specific matters of fact. For example, in deciding how to treat Herman, an adult male of Slavic descent, his doctor may want to know the probability that Herman is lactose intolerant. This illustrates the need for a kind of probability that attaches to propositions rather than relating properties and relations. These are sometimes called "single case probabilities," although that terminology is not very good because such probabilities can attach to propositions of any logical form. For example, we can ask how probable it is that there are no human beings over the age of 130. In the past, I called these "definite probabilities," but now I will refer to them as singular probabilities.

The distinction between singular and generic probabilities is often overlooked by contemporary probability theorists, perhaps because of the popularity of subjective probability (which has no obvious way to make sense of generic probabilities). But most objective approaches to probability tie probabilities to relative frequencies in some essential way, and the resulting probabilities have the same logical form as the relative frequencies. That is, they are generic probabilities.

The simplest theories identify generic probabilities with relative frequencies.³ However, it is often objected, fairly I think, that such "finite frequency theories" are at least sometimes inadequate because our probability judgments often diverge from relative frequencies. For example, we can talk about a coin being fair (and so the generic probability of a flip landing heads is 0.5) even when it is flipped only once and then destroyed (in which case the relative frequency is either 1 or 0). For understanding such generic probabilities, it has been suggested that we need a notion of probability that talks about possible instances of properties as well as actual instances. Theories of this sort are sometimes called "hypothetical frequency theories." C. S. Peirce was perhaps the first to make a suggestion of this sort. Similarly, the statistician R. A. Fisher, regarded by many as "the father of modern statistics," identified probabilities with ratios in a "hypothetical infinite population, of which the actual data is regarded as constituting a random sample" (1922, p. 311). Karl Popper (1956, 1957, and 1959) endorsed a theory along these lines and called the resulting probabilities propensities. Henry Kyburg (1974a) was the first to construct a precise version of this theory (although he did not endorse the theory), and it is to him that we owe the name "hypothetical frequency theories." Kyburg (1974a) also insisted that von Mises should be considered a hypothetical frequentist. More recent attempts to formulate precise versions of what might be regarded as hypothetical frequency theories are van Fraassen (1981), Bacchus (1990), Halpern (1990), Pollock (1983, 1984, 1990), and Bacchus et al. (1996). I will sketch my own proposal below.

I do not think that it should be supposed that there is just one sensible kind of generic probability. However, in my (1990) I suggested that there is a central kind of generic probability in terms of which a number of other kinds can be defined. This central kind of generic probability is what I called nomic probability. Nomic probabilities are supposed to be the subject matter of statistical laws of nature. Exceptionless general laws, like "All electrons are negatively charged," are not just about actual electrons, but also about all physically possible electrons. We can think of such a law as reporting that any physically possible electron would be negatively charged. This is an example of a nomic generalization. We can think of nomic probabilities as telling us instead that a certain proportion of physically possible objects of one sort will also have some other property. For example, we might have a law to the effect that the probability of a hadron being negatively charged is .5. We can think of this as telling us that half of all physically possible hadrons would be negatively charged.

After brief thought, most people find the distinction between singular and generic probabilities intuitively clear. However, this is a distinction that sometimes puzzles probability theorists, many of whom have been raised on an exclusive diet of singular probabilities. They are often tempted to confuse generic probabilities with probability distributions over random variables. Although historically most theories of objective probability were theories of generic probability, mathematical probability theory tends to focus exclusively on singular probabilities. When mathematicians talk about variables in connection with probability, they usually mean "random variables," which are not variables at all but functions assigning values to the different members of a population. Generic probabilities have single numbers as their values. Probability distributions over random variables are just what their name implies—distributions of singular probabilities rather than single numbers.

It has always been acknowledged that for practical decision-making we need singular probabilities rather than generic probabilities. For example, in deciding how to treat Herman, his doctor wants to know the probability of his being lactose intolerant, not the probability of Slavs in general being lactose intolerant. So theories that take generic probabilities as basic need a way of deriving singular probabilities from them. Theories of how to do this are theories of direct inference. Theories of objective generic probability propose that statistical inference gives us knowledge of generic probabilities, and then direct inference gives us knowledge of singular probabilities. Reichenbach (1949) pioneered the theory of direct inference. The basic idea is that if we want to know the singular probability PROB(Fa), we look for the narrowest reference property G such that we know the generic probability prob(Fx/Gx) and we know Ga, and then we identify PROB(Fa) with prob(Fx/Gx). For example, actuarial reasoning aimed at setting insurance rates proceeds in roughly this fashion. Kyburg (1974) was the first to attempt to provide firm logical foundations for direct inference. Pollock (1990) took that as its starting point and constructed a modified theory with a more epistemological orientation. The present paper builds upon some of the basic ideas of the latter.

What I will argue in this paper is that new mathematical results, coupled with ideas from the theory of nomic probability (Pollock 1990), provide the justification for a wide range of new principles supporting defeasible inferences about the expectable values of unknown probabilities. These principles include familiar-looking principles of statistical independence and direct inference, but they include many new principles as well. For example, among them is a heretofore unnoticed principle enabling us to defeasibly estimate the joint probability of Bernard having the disease when he tests positive on both tests. I believe that this broad collection of new defeasible inference schemes provides the solution to the problem of how probabilities can be truly useful even when we are ignorant about most of them.

3. Nomic Probability

Pollock (1990) developed a possible worlds semantics for objective generic probabilities,⁴ and I will take that as my starting point for the present theory of probable probabilities. I will just sketch the theory here. The proposal was that we can identify the nomic probability prob(Fx/Gx) with the proportion of physically possible G's that are F's. For this purpose, physically possible G's cannot be identified with possible objects that are G, because the same object can be a G at one possible world and fail to be a G at another possible world. Instead, a physically possible G is defined to be an ordered pair 〈w,x〉 such that w is a physically possible world (one compatible with all of the physical laws) and x has the property G at w. I assume that for any nomically possible property F (i.e., property consistent with the laws of nature), the set of physically possible F's will be infinite. This follows from there being infinitely many possible worlds in which there are F's. I also assume that properties are rather coarsely individuated, in the sense that nomically equivalent properties are identical. Equivalently, if F and G are properties, F = G iff the set of physically possible F's is identical to the set of physically possible G's.

For properties F and G, let us define the subproperty relation as follows:

F ⊑ G iff the set of physically possible F's is a subset of the set of physically possible G's, i.e., iff it is physically necessary (follows from true physical laws) that (∀x)(Fx → Gx).

We can think of the subproperty relation as a kind of nomic entailment relation (holding between properties rather than propositions). More generally, F and G can have any number of free variables, in which case F ⊑ G iff the universal closure of (F → G) is physically necessary.

Proportion functions are a generalization of the measure functions studied in measure theory; they are "relative measure functions." Given a suitable proportion function ρ, where F and G are the sets of physically possible F's and G's respectively, we could stipulate that:

probx(Fx/Gx) = ρ(F,G).⁵

However, it is unlikely that we can pick out the right proportion function without appealing to prob itself, so the postulate is simply that there is some proportion function related to prob as above. This is merely taken to tell us something about the formal properties of prob. Rather than axiomatizing prob directly, it turns out to be more convenient to adopt axioms for proportion functions. Pollock (1990) showed that, given the assumptions adopted there, ρ and prob are interdefinable, so the same empirical considerations that enable us to evaluate prob inductively also determine ρ. It is useful to axiomatize nomic probabilities indirectly by adopting axioms for proportions because the algebra of proportions is simpler than the algebra of probabilities.

It is convenient to be able to write proportions in the same logical form as probabilities, so where φ and θ are open formulas with free variable x, let ρx(φ/θ) = ρ({x|φ & θ}, {x|θ}). Note that probx and ρx are variable-binding operators, binding the variable x. When there is no danger of confusion, I will typically omit the subscript "x." To simplify expressions, I will often omit the variables, writing "prob(F/G)" for "prob(Fx/Gx)" when no confusion will result.

I will make three classes of assumptions about the proportion function. Let #X be the cardinality of a set X. If X is finite, I assume:

Finite Proportions:
For finite X, ρ(A,X) = #(A∩X)/#X.

However, for present purposes the proportion function is most useful in talking about proportions among infinite sets. The sets of physically possible F's and G's will invariably be infinite, if for no other reason than that there are infinitely many physically possible worlds in which there are F's and G's.

My second set of assumptions is that the standard axioms for conditional probabilities hold for proportions:

0 ≤ ρ(X,Y) ≤ 1;
If Y ⊆ X then ρ(X,Y) = 1;
If Z ≠ ∅ and X∩Y∩Z = ∅ then ρ(X∪Y,Z) = ρ(X,Z) + ρ(Y,Z);
If Z ≠ ∅ then ρ(X∩Y,Z) = ρ(X,Z) ⋅ ρ(Y,X∩Z).

These axioms automatically hold for relative frequencies among finite sets, so the assumption is just that they also hold for proportions among infinite sets.

Finally, I need four assumptions about proportions that go beyond merely imposing the standard axioms for the probability calculus. The four assumptions I will make are:

Universality:
If A ⊆ B, then ρ(B,A) = 1.

Finite Set Principle:
For any set B, N > 0, and open formula Φ,
ρX(Φ(X) / X ⊆ B & #X = N) = ρx1,...,xN(Φ({x1,...,xN}) / x1,...,xN are pairwise distinct & x1,...,xN ∈ B).

Projection Principle:
If 0 ≤ p,q ≤ 1 and (∀y)(Gy → ρx(Fx/Rxy) ∈ [p,q]), then ρx,y(Fx/Rxy & Gy) ∈ [p,q].

Crossproduct Principle:
If C and D are nonempty, ρ(A×B,C×D) = ρ(A,C) ⋅ ρ(B,D).

Note that these four principles are all theorems of elementary set theory when the sets in question are finite. For instance, the projection principle tells us that ρx(Fx/(∃y)(Rxy & Gy)) is a weighted average of the values of ρx(Fx/Rxy) for different values of y. The finite version of the Projection Principle is proven in the appendix. My assumption is simply that ρ continues to have these algebraic properties even when applied to infinite sets. I take it that this is a fairly conservative set of assumptions.

I often hear the objection that in affirming the Crossproduct Principle, I must be making a hidden assumption of statistical independence. However, that is to confuse proportions with probabilities. The Crossproduct Principle is about proportions—not probabilities. For finite sets, proportions are computed by simply counting members and computing ratios of cardinalities. It makes no sense to talk about statistical independence in this context. The crossproduct principle holds for finite sets for the simple reason that #(A×B) = (#A)⋅(#B). For infinite sets we cannot just count members any more, but the algebra is the same.

Pollock (1990) derived the entire epistemological theory of nomic probability from a single epistemological principle coupled with a mathematical theory that amounts to a calculus of nomic probabilities. The single epistemological principle is the statistical syllogism, which can be formulated as follows:

Statistical Syllogism:
If F is projectible with respect to G and r > 0.5, then ⌜Gc & prob(F/G) ≥ r⌝ is a defeasible reason for ⌜Fc⌝, the strength of the reason being a monotonic increasing function of r.⁶

I take it that the statistical syllogism is a very intuitive principle, and it is clear that we employ it constantly in our everyday reasoning. For example, suppose you read in the newspaper that the President is visiting Guatemala, and you believe what you read. What justifies your belief? No one believes that everything printed in the newspaper is true. What you believe is that certain kinds of reports published in certain kinds of newspapers tend to be true, and this report is of that kind. It is the statistical syllogism that justifies your belief.

The projectibility constraint in the statistical syllogism is the familiar projectibility constraint on inductive reasoning, first noted by Goodman (1955). One might wonder what it is doing in the statistical syllogism. But it was argued in Pollock (1990), on the strength of what were taken to be intuitively compelling examples, that the statistical syllogism must be so constrained. Furthermore, it was shown that without a projectibility constraint the statistical syllogism is self-defeating, because for any intuitively correct application of the statistical syllogism it is possible to construct a conflicting (but unintuitive) application to a contrary conclusion. This is the same problem that Goodman first noted in connection with induction. Pollock (1990) then went on to argue that the projectibility constraint on induction derives from that on the statistical syllogism. The projectibility constraint is important, but also problematic because no one has a good analysis of projectibility. I will not discuss it further here. I will just assume, without argument, that the second-order probabilities employed below in the theory of probable probabilities satisfy the projectibility constraint, and hence the statistical syllogism can be applied to them.

The statistical syllogism is a defeasible inference scheme, so it is subject to defeat. I believe that the only primitive (underived) principle of defeat required for the statistical syllogism is that of subproperty defeat:

Subproperty Defeat for the Statistical Syllogism:
If H is projectible with respect to G, then ⌜Hc & prob(F/G&H) < prob(F/G)⌝ is an undercutting defeater for the inference by the statistical syllogism from ⌜Gc & prob(F/G) ≥ r⌝ to ⌜Fc⌝.⁷

In other words, more specific information about c that lowers the probability of its being F constitutes a defeater.

4. Limit Theorems and Probable Probabilities

I propose to solve the epistemic problem of inadequate probability knowledge by justifying a large collection of defeasible inference schemes for reasoning about probabilities. The key to doing this lies in proving some limit theorems about the algebraic properties of proportions among finite sets, and proving some general theorems that relate those limit theorems to the algebraic properties of nomic probabilities.

4.1 Probable Proportions Theorem

Let us begin with a simple example. Suppose we have a set of 10,000,000 objects. I announce that I am going to select a subset, and ask you approximately how many members it will have. Most people will protest that there is no way to answer this question. It could have any number of members from 0 to 10,000,000. However, if you answer, "Approximately 5,000,000," you will almost certainly be right. This is because, although there are subsets of all sizes from 0 to 10,000,000, there are many more subsets whose sizes are approximately 5,000,000 than there are of any other size. In fact, 99% of the subsets have cardinalities differing from 5,000,000 by less than .08%. If we let "x ≈δ y" mean "the difference between x and y is less than or equal to δ," the general theorem is:

Finite Indifference Principle:
For every ε,δ > 0 there is an N such that if U is finite and #U > N then
ρX(ρ(X,U) ≈δ 0.5 / X ⊆ U) ≥ 1 – ε.
Proof: See appendix.

In other words, to any given degree of approximation, the proportion of subsets X of U which are such that ρ(X,U) is approximately equal to .5 goes to 1 as the size of U goes to infinity. To see why this is true, suppose #U = n. If r ≤ n, the number of r-membered subsets of U is C(n,r) = n!/(r!(n–r)!). It is illuminating to plot C(n,r) for variable r and various fixed values of n. See figure 1. This illustrates that the sizes of subsets of U will cluster around n/2, and they cluster more tightly as n increases. C(n,r) becomes "needle-like" in the limit. As we proceed, I will state a number of similar combinatorial theorems, and in each case they have similar intuitive explanations. The cardinalities of relevant sets are products of terms of the form C(n,r), and their distribution becomes needle-like in the limit.

Figure 1. C(n,r) for n=100, n=1000, and n=10000.

The Finite Indifference Principle is our first example of an instance of a general combinatorial limit theorem. To state the general theorem, we need the notion of a linear constraint. Linear constraints either state the values of certain proportions, e.g., stipulating that ρ(X,Y) = r, or they relate proportions using linear equations. For example, if we know that X = Y∪Z, that generates the linear constraint

ρ(X,U) = ρ(Y,U) + ρ(Z,U) – ρ(Y∩Z,U).

Our strategy will be to approximate the behavior of constraints applied to infinite domains by looking at their behavior in sufficiently large finite domains. Some linear constraints may be inconsistent with the probability calculus. We will want to rule those out of consideration, but we will need to rule out others as well. The difficulty is that there are sets of constraints that are satisfiable in infinite domains but not satisfiable in finite domains. For example, if r is an irrational number between 0 and 1, the constraint "ρ(X,Y) = r" is satisfiable in infinite domains but not in finite domains. Let us define:

LC is finitely unbounded iff for every positive integer K there is a positive integer N such that if #U = N then #{〈X1,...,Xn〉 | LC & X1,...,Xn ⊆ U} ≥ K.

For the purpose of approximating the behaviors of constraints in infinite domains by exploring their behavior in finite domains, I will confine my attention to finitely unbounded sets of linear constraints. If LC is finitely unbounded, it must be consistent with the probability calculus, but the converse is not true. I think that by appealing to limits, it should be possible to generalize the following results to all sets of linear constraints that are consistent with the probability calculus, but I will not pursue that here.

The key theorem we need is then:

Probable Proportions Theorem:
Let U,X1,…,Xn be a set of variables ranging over sets, and consider a finitely unbounded finite set LC of linear constraints on proportions between Boolean compounds of those variables. Then for any pair of relations P,Q whose variables are a subset of U,X1,…,Xn there is a unique real number r in [0,1] such that for every ε,δ > 0, there is an N such that if U is finite and #{〈X1,...,Xn〉 | LC & X1,...,Xn ⊆ U} ≥ N then
ρX1,...,Xn(ρ(P,Q) ≈δ r / LC & X1,...,Xn ⊆ U) ≥ 1 – ε.
Proof: See appendix.

Let us refer to this unique r as the limit solution for ρ(P/Q) given LC. For all of the choices of constraints we will consider, finite unboundedness will be obvious, so the limit solution will exist. This theorem, which establishes the existence of the limit solution under very general circumstances, underlies all of the principles developed in this paper. It is important to realize that it is just a combinatorial theorem about finite sets, and as such is a theorem of set theory. It does not depend on any of the assumptions we have made about proportions in infinite sets. Thus far the mathematics is not philosophically questionable.

What we will actually want are particular instances of this theorem for particular choices of LC and specific values of r. An example is the Finite Indifference Principle.
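The needle-like clustering of C(n,r) can be computed exactly for modest n. This quick check (the values of n and δ are arbitrary choices) shows the fraction of subsets whose proportion lies within δ of 0.5 climbing toward 1 as n grows:

```python
from math import comb  # exact binomial coefficients C(n, r)

def fraction_near_half(n, delta):
    """Exact fraction of the 2**n subsets X of an n-set with |#X/n - 0.5| <= delta."""
    good = sum(comb(n, r) for r in range(n + 1) if abs(r / n - 0.5) <= delta)
    return good / 2 ** n

for n in (100, 1000, 10000):
    print(n, fraction_near_half(n, 0.05))  # the fraction rises toward 1
```

This is the Finite Indifference Principle in miniature: for fixed δ, the proportion of "good" subsets can be pushed above any 1 – ε by taking n large enough.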

In general, LC generates a set of simultaneous equations, and the limit solution r can be determined by solving those equations. The simultaneous equations are the term-characterizations discussed in the appendix in the proof of the Probable Proportions Theorem. It turns out that these equations can be generated automatically and then solved automatically by a computer algebra program. To my surprise, neither Mathematica nor Maple has proven effective in solving these sets of equations, but I was able to write a special purpose LISP program that is fairly efficient. It computes the term-characterizations and solves them for the variable when that is possible. It can also be directed to produce a human-readable proof. If the equations constituting the term-characterizations do not have analytic solutions, they can still be solved numerically to compute the most probable values of the variables in specific cases. This software can be downloaded from http://oscarhome.soc-sci.arizona.edu/ftp/OSCAR-web-page/CODE/Code for probable probabilities.zip. I will refer to this as the probable probabilities software. The proofs of many of the theorems presented in this paper were generated using this software.

4.2 Limit Principle for Proportions

The Probable Proportions Theorem and its instances are mathematical theorems about finite sets. For example, the Finite Indifference Principle tells us that as N → ∞, if U is finite but contains at least N members, then the proportion of subsets X of U which are such that ρ(X,U) ≈δ 0.5 goes to 1. This suggests that the proportion is 1 when U is infinite:

If U is infinite then for every δ > 0, ρX(ρ(X,U) ≈δ 0.5 / X ⊆ U) = 1.

Given the rather simple assumptions I made about ρ in section three, we can derive such infinitary principles from the corresponding finite principles. We first prove in familiar ways:

Law of Large Numbers for Proportions:
If B is infinite and ρ(A/B) = p then for every ε,δ > 0, there is an N such that
ρX(ρ(A/X) ≈δ p / X ⊆ B & X is finite & #X ≥ N) ≥ 1 – ε.
Proof: See appendix.

Unlike laws of large numbers for probabilities, the Law of Large Numbers for Proportions does not require an assumption of statistical independence. This is because it is derived from the crossproduct principle, and as remarked in section three, no such assumption is required (or even intelligible) for the crossproduct principle.

The Law of Large Numbers for Proportions provides the link for moving from the behavior of linear constraints in finite sets to their behavior in infinite sets. It enables us to prove:

Limit Principle for Proportions:
Consider a finitely unbounded finite set LC of linear constraints on proportions between Boolean compounds of a list of variables U,X1,…,Xn. Let r be the limit solution for ρ(P/Q) given LC. Then for any infinite set U, for every δ > 0:
ρX1,...,Xn(ρ(P,Q) ≈δ r / LC & X1,...,Xn ⊆ U) = 1.
Proof: See appendix.

This is our crucial "bridge theorem" that enables us to move from combinatorial theorems about finite sets to principles about proportions in infinite sets. This theorem and the Probable Proportions Theorem constitute the central theorems of this paper. They will allow us to establish many more concrete theorems. Thus, for example, from the Finite Indifference Principle we can derive:

Infinitary Indifference Principle:
If U is infinite then for every δ > 0, ρX(ρ(X,U) ≈δ 0.5 / X ⊆ U) = 1.

4.3 Probable Probabilities

Nomic probabilities are proportions among physically possible objects. Recall that I have assumed that for any nomically possible property F (i.e., property consistent with the laws of nature), the set of physically possible F's is infinite. Thus the Limit Principle for Proportions implies an analogous principle for nomic probabilities:

Probable Probabilities Theorem:
Consider a finitely unbounded finite set LC of linear constraints on proportions between Boolean compounds of a list of variables U,X1,…,Xn. Let r be the limit solution for ρ(P/Q) given LC. Then for any infinite set U, for every δ > 0,
probX1,...,Xn(prob(P/Q) ≈δ r / LC & X1,...,Xn ⊑ U) = 1.
Proof: See appendix.

I sometimes hear the objection that in proving theorems like the Probable Probabilities Theorem I must be making a hidden assumption about uniform distributions. It is not clear what lies behind this objection. I gave the proof. Where is the gap supposed to be? Talk of uniform distributions makes no sense as applied to either proportions or generic probabilities. I suspect that those who raise this objection are confusing generic probabilities with probability distributions over random variables, as discussed in section two.

Instances of the Probable Proportions Theorem tell us the values of the limit solutions for sets of linear constraints, and hence allow us to derive instances of the consequent of the Probable Probabilities Theorem. I will call the latter "probable probabilities principles." For example, from the Finite Indifference Principle we get:

Probabilistic Indifference Principle:
For any nomically possible property G and for every δ > 0,
probX(prob(X/G) ≈δ 0.5 / X ⊑ G) = 1.⁸

4.4 Justifying Defeasible Inferences about Probabilities

Next note that we can apply the statistical syllogism to the second-order probability formulated in the probabilistic indifference principle. For every δ > 0, this gives us a defeasible reason for expecting that if F ⊑ G, then prob(F/G) ≈δ 0.5, and these conclusions jointly entail that prob(F/G) = 0.5. For any property F, (F&G) ⊑ G, and prob(F/G) = prob(F&G/G). Thus we are led to a defeasible inference scheme:

Indifference Principle:
For any properties F and G, if G is nomically possible then it is defeasibly reasonable to assume that prob(F/G) = 0.5.

The Indifference Principle is my first example of a principle of probable probabilities. We have a quadruple of principles that go together: (1) the Finite Indifference Principle, which is a theorem of combinatorial mathematics; (2) the Infinitary Indifference Principle, which follows from the finite principle given the Law of Large Numbers for Proportions; (3) the Probabilistic Indifference Principle, which is a theorem derived from (2); and (4) the Indifference Principle, which is a principle of defeasible reasoning that follows from (3) with the help of the statistical syllogism. All of the principles of probable probabilities that I will discuss have analogous quadruples of principles associated with them. Rather than tediously listing all four principles in each case, I will encapsulate the four principles in the simple form:

Expectable Indifference Principle:
For any properties F and G, if G is nomically possible then the expectable value of prob(F/G) = 0.5.

So in talking about expectable values, I am talking about this entire quadruple of principles.
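On the finite surrogate that drives all four principles—randomly chosen subsets of a large finite set standing in for randomly chosen subproperties of G—the expectable value of 0.5 can be observed directly by sampling. The set size and sample count below are arbitrary choices:

```python
import random

random.seed(0)
n, samples = 10_000, 200  # |G| and the number of randomly drawn F's
proportions = []
for _ in range(samples):
    # draw a random subset F of G: each member independently in or out
    size = sum(random.getrandbits(1) for _ in range(n))
    proportions.append(size / n)  # this sample's rho(F, G)

print(min(proportions), max(proportions))  # every sampled value lies near 0.5
print(sum(proportions) / samples)          # average very close to 0.5
```

Not only is the average near 0.5; for large n essentially every individual sample is, which is the "needle-like" concentration the quadruple of indifference principles formalizes.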


Principle of Expectable Values:
Consider a finitely unbounded finite set LC of linear constraints on proportions between Boolean compounds of a list of variables U,X1,…,Xn. Let r be the limit solution for ρ(P/Q) given LC. Then given LC, the expectable value of prob(P/Q) = r.

I have chosen the Indifference Principle as my first example of a principle of probable probabilities because the argument for it is simple and easy to follow. But this principle is only occasionally useful. If we were choosing the properties F in some random way, it would be reasonable to expect that prob(F/G) = 0.5. However, pairs of properties F and G which are such that prob(F/G) = 0.5 are not very useful to us from a cognitive perspective, because knowing that something is a G then carries no information about whether it is an F. As a result, we usually only enquire about the value of prob(F/G) when we have reason to believe there is a connection between F and G such that prob(F/G) ≠ 0.5. Hence in actual practice, application of the Indifference Principle to cases that really interest us will almost invariably be defeated. This does not mean, however, that the Indifference Principle is never useful. For instance, if I give Jones the opportunity to pick either of two essentially identical balls, in the absence of information to the contrary it seems reasonable to take the probability of either choice to be .5. This can be justified as an application of the Indifference Principle.

That applications of the Indifference Principle are often defeated illustrates an important point about nomic probability and principles of probable probabilities. The fact that a nomic probability is 1 does not mean that there are no counter-instances. In fact, there may be infinitely many counter-instances. This should be familiar from standard measure theory. Consider the probability of a real number being irrational. Plausibly, this probability is 1, because the cardinality of the set of irrationals is infinitely greater than the cardinality of the set of rationals. But there are still infinitely many rationals. The set of rationals is infinite, but it has measure 0 relative to the set of real numbers.

A second point is that in classical probability theory (which is about singular probabilities), conditional probabilities are defined as ratios of unconditional probabilities:

PROB(P/Q) = PROB(P&Q)/PROB(Q).

However, for generic probabilities, there are no unconditional probabilities, so conditional probabilities must be taken as primitive. These are sometimes called "Popper functions." The first people to investigate them were Karl Popper (1938, 1959) and the mathematician Alfred Renyi (1955). If conditional probabilities are defined as above, PROB(P/Q) is undefined when PROB(Q) = 0. However, for nomic probabilities, prob(F/G&H) can be perfectly well-defined even when prob(G/H) = 0. One consequence of this is that, unlike in the standard probability calculus, if prob(F/G) = 1, it does not follow that prob(F/G&H) = 1. Specifically, this can fail when prob(H/G) = 0. Thus, for example,

prob(2x is irrational/x is a real number) = 1

but

prob(2x is irrational/x is a real number & x is rational) = 0.

In the course of developing the theory of probable probabilities, we will find numerous examples of this phenomenon, and they will generate defeaters for the defeasible inferences licensed by our principles of probable probabilities.

5. Statistical Independence
Now let us turn to a truly useful principle of probable probabilities. It was remarked above that probability practitioners commonly assume statistical independence when they have no reason to think otherwise, and so compute prob(A&B/C) = prob(A/C)⋅prob(B/C). This assumption is ubiquitous in almost every application of probability to real-world problems. However, the justification for such an assumption has heretofore eluded probability theorists, and when they make such assumptions they tend to do so apologetically. We are now in a position to provide a justification for a general assumption of statistical independence. Recall that our general strategy is to formulate our assumptions as a set of finitely unbounded linear constraints, and then find the limit solution by solving the set of simultaneous equations generated by them (the term characterizations). This can usually be done using the probable probabilities software. In this case we get:

Finite Independence Principle:
For all rational numbers r,s between 0 and 1, given that X,Y,Z ⊆ U & ρ(X,Z) = r & ρ(Y,Z) = s, the limit solution for ρ(X∩Y,Z) is r⋅s.9
Proof: See appendix.

As before, this generates the four principles making up the following principle of expectable values:

Principle of Expectable Statistical Independence:
For rational numbers r,s between 0 and 1, given that prob(A/C) = r and prob(B/C) = s, the expectable value of prob(A&B/C) = r⋅s.

So a provable combinatorial principle regarding finite sets ultimately makes it reasonable to expect, in the absence of contrary information, that arbitrarily chosen properties will be statistically independent of one another. This is the reason why, when we see no connection between properties that would force them to be statistically dependent, we can reasonably expect them to be statistically independent. This solves one of the major unsolved problems of the application of probabilities to real-world problems.

6. Defeaters for Statistical Independence
Of course, the assumption of statistical independence sometimes fails. Clearly, this can happen when there are causal connections between properties. But it can also happen for purely logical reasons. For example, if A = B, A and B cannot be independent unless r = 1. In general, when A and B "overlap," in the sense that there is a D such that (A&C),(B&C) ⊆ D and prob(D/C) ≠ 1, then we should not expect that prob(A&B/C) = prob(A/C)⋅prob(B/C). This follows from the following principle of expectable probabilities:

Principle of Statistical Independence with Overlap:
If r,s,g are rational numbers between 0 and 1, given that prob(A/C) = r, prob(B/C) = s, prob(D/C) = g, (A&C) ⊆ D, and (B&C) ⊆ D, it follows that prob(A/C&D) = r/g, prob(B/C&D) = s/g, and the following values are expectable:
(1) prob(A&B/C) = rs/g;
(2) prob(A&B/C&D) = rs/g².
Proof: See appendix.

To illustrate statistical independence with overlap using a simple and intuitive case, suppose A = A0 & D and B = B0 & D. Given no reason to think otherwise, we would expect A0, B0, and D to be statistically independent. But then, where g = prob(D/C), we would expect that

prob(A&B/C) = prob(A0&D&B0/C) = prob(A0/C)⋅prob(D/C)⋅prob(B0/C) = (r/g)⋅g⋅(s/g) = rs/g.
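The combinatorial fact behind the Finite Independence Principle can be checked numerically. The following sketch (illustrative code with made-up parameters, not part of Pollock's probable probabilities software) draws random subsets X and Y of a finite set Z with fixed proportions ρ(X,Z) = r and ρ(Y,Z) = s, and confirms that ρ(X∩Y,Z) clusters around r⋅s:

```python
import random

random.seed(0)
Z = list(range(10))            # take Z = U, a 10-element set
r, s = 0.4, 0.6                # fixed proportions rho(X,Z) = r, rho(Y,Z) = s
k, m = int(r * len(Z)), int(s * len(Z))

def rho(X, Z):
    """Proportion of the members of Z that are in X."""
    return len(X & set(Z)) / len(Z)

# Draw X and Y independently among subsets satisfying the two constraints
# and average rho(X ∩ Y, Z) over many trials.
samples = []
for _ in range(20000):
    X = set(random.sample(Z, k))
    Y = set(random.sample(Z, m))
    samples.append(rho(X & Y, Z))

mean = sum(samples) / len(samples)
print(round(mean, 2))          # ≈ 0.24 = r * s
```

Individual draws scatter widely, but the expectable value is exactly r⋅s, which is the content of the principle.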

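The arithmetic behind the overlap principle can be illustrated by a small enumeration. The sketch below (illustrative numbers only, with C taken to be tautologous so that prob(…/C) is just prob(…)) builds A = A0 & D and B = B0 & D from independent factors and confirms that prob(A&B/C) comes out as rs/g rather than r⋅s:

```python
from itertools import product

# Independent binary factors A0, B0, D with stipulated chances.
pA0, pB0, pD = 0.5, 0.4, 0.8

def pr(event):
    """Probability of `event` under the joint distribution over (A0, B0, D)."""
    total = 0.0
    for a0, b0, d in product([True, False], repeat=3):
        p = (pA0 if a0 else 1 - pA0)
        p *= pB0 if b0 else 1 - pB0
        p *= pD if d else 1 - pD
        if event(a0, b0, d):
            total += p
    return total

r = pr(lambda a0, b0, d: a0 and d)            # prob(A/C) = 0.40
s = pr(lambda a0, b0, d: b0 and d)            # prob(B/C) = 0.32
g = pr(lambda a0, b0, d: d)                   # prob(D/C) = 0.80
both = pr(lambda a0, b0, d: a0 and b0 and d)  # prob(A&B/C)

print(round(both, 3))   # 0.16 = r*s/g, not r*s = 0.128
```

The shared factor D inflates the joint probability above the naive product r⋅s, exactly as the Principle of Statistical Independence with Overlap predicts.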
The upshot is that, given the overlap, we can expect A and B to be statistically independent relative to (C&D), but not relative to C. The second-order probability to which the statistical syllogism is applied to generate (1) in the Principle of Statistical Independence with Overlap is:

prob(prob(A&B/C) = rs/g / prob(A/C) = r & prob(B/C) = s & prob(D/C) = g & (A&C) ⊆ D & (B&C) ⊆ D) = 1.

On the other hand, the second-order probability to which the statistical syllogism is applied to generate the Principle of Statistical Independence was:

prob(prob(A&B/C) = r⋅s / prob(A/C) = r & prob(B/C) = s) = 1.

The former probability takes account of more information than the latter, so it provides a subproperty defeater for the use of the statistical syllogism and hence an undercutting defeater for the Principle of Statistical Independence:

Overlap Defeat for Statistical Independence:
⌜(A&C) ⊆ D, (B&C) ⊆ D, and prob(D/C) ≠ 1⌝ is an undercutting defeater for the inference from ⌜prob(A/C) = r and prob(B/C) = s⌝ to ⌜prob(A&B/C) = r⋅s⌝ by the Principle of Statistical Independence.

Suppose you know that prob(A/C) = r and prob(B/C) = s, and are inclined to infer that prob(A&B/C) = r⋅s. As long as r,s < 1, there will always be a D such that (A&C) ⊆ D, (B&C) ⊆ D, and prob(D/C) ≠ 1. Does this mean that the inference is always defeated? It does not, but understanding why is a bit complicated. First, what we know in general is the existential generalization

(∃D)[(A&C) ⊆ D and (B&C) ⊆ D and prob(D/C) ≠ 1].

But the defeater requires knowing of a specific such D. The reason for this is that it is not true in general that prob(Fx/Rxy) = prob(Fx/(∃y)Rxy). For example, let Fx be "x = 1" and let Rxy be "x < y & x,y are natural numbers ≤ 2." Then prob(Fx/Rxy) = 1/3, but prob(Fx/(∃y)Rxy) = 1/2. Accordingly, we cannot assume that

prob(prob(A&B/C) = r⋅s / prob(A/C) = r & prob(B/C) = s & (∃D)[(A&C) ⊆ D and (B&C) ⊆ D and prob(D/C) ≠ 1]) ≠ 1,

and hence merely knowing that (∃D)[(A&C) ⊆ D and (B&C) ⊆ D and prob(D/C) ≠ 1] does not give us a defeater. In fact, it is a theorem of the calculus of nomic probabilities that if □[B → C] then prob(A/B) = prob(A/B&C). So because

□[(prob(A/C) = r and r,s < 1 and prob(B/C) = s) → (∃D)(∃g)(∃ζ)[(A&C) ⊆ D and (B&C) ⊆ D and prob(D/C) = g and prob(D/U) = ζ]]

it follows that

prob(prob(A&B/C) = r⋅s / prob(A/C) = r & r,s < 1 & prob(B/C) = s & (∃D)(∃g)(∃ζ)[(A&C) ⊆ D and (B&C) ⊆ D and prob(D/C) = g and prob(D/U) = ζ]) = 1.

Hence the mere fact that there always is such a D does not automatically give us a defeater for the application of the Principle of Statistical Independence. To get defeat, we must know of some specific D such that (A&C) ⊆ D and (B&C) ⊆ D and prob(D/C) ≠ 1.

But now it may occur to the reader that there is a second strategy for generating automatic defeat. We can always construct a specific such D, namely, (A ∨ B). However, it turns out that this choice of D does not give us a defeater. In fact,

prob(prob(A&B/C) = r⋅s / prob(A/C) = r & prob(B/C) = s & (A&C) ⊆ (A ∨ B) & (B&C) ⊆ (A ∨ B)) = 1.

This is because, once again,

□[(prob(A/C) = r and prob(B/C) = s) → (∃g)(∃ζ)[(A&C) ⊆ (A ∨ B) and (B&C) ⊆ (A ∨ B) and prob(A ∨ B/C) = g and prob(A ∨ B/U) = ζ]].

Notice that the latter depends upon our not knowing the value of g. If we do know that prob(A/C) = r, prob(B/C) = s, and prob(A ∨ B/C) = g, then we can simply compute by the probability calculus that prob(A&B/C) = r + s – g, in which case the application of the defeasible inference to the contrary conclusion is conclusively defeated.

The preceding can be generalized. There are many ways of automatically generating properties D such that (A&C) ⊆ D and (B&C) ⊆ D. For example, given some fixed set E, we can define:

µ(A,B) = A ∨ B ∨ E.

But again,

□[(prob(A/C) = r and prob(B/C) = s) → (∃g)(∃ζ)[(A&C) ⊆ µ(A,B) and (B&C) ⊆ µ(A,B) and prob(µ(A,B)/C) = g and prob(µ(A,B)/U) = ζ]],

so

prob(prob(A&B/C) = r⋅s / prob(A/C) = r & prob(B/C) = s & (A&C) ⊆ µ(A,B) & (B&C) ⊆ µ(A,B)) = 1.

These observations illustrate a general phenomenon that will recur for all of our defeasible principles of expectable probabilities. Defeaters cannot be generated by functions that apply automatically to the properties involved in the inference. For example, in obtaining overlap defeaters for the Principle of Statistical Independence, we must have some substantive way of picking out D that does not pick it out simply by reference to A, B, and C.

In sections seven and nine we will encounter additional undercutting defeaters for the Principle of Statistical Independence.

7. Nonclassical Direct Inference
Pollock (1984) noted (using different terminology) the following principle of probable probabilities:

Nonclassical Direct Inference:
If r is a rational number between 0 and 1, and prob(A/B) = r, the expectable value of prob(A/B&C) = r.
Proof: See appendix.

This is a kind of "principle of insufficient reason." It tells us that if we have no reason for thinking otherwise, we should expect that strengthening the reference property in a nomic probability leaves the value of the probability unchanged. This is called "nonclassical direct inference" because, although it only licenses inferences from generic probabilities to other generic

probabilities, it turns out to have strong formal similarities to classical direct inference (which licenses inferences from generic probabilities to singular probabilities), and as we will see in section eight, principles of classical direct inference can be derived from it.

Probability theorists have not taken formal note of the Principle of Nonclassical Direct Inference, but they often reason in accordance with it. For example, suppose we know that the probability of a twenty year old male driver in Maryland having an auto accident over the course of a year is .07. If we add that his girlfriend's name is "Martha," we do not expect this to alter the probability. There is no way to justify this assumption within a traditional probability framework, but it is justified by Nonclassical Direct Inference. In fact, the Principle of Nonclassical Direct Inference is equivalent (with one slight qualification) to the defeasible Principle of Statistical Independence. This turns upon the following simple theorem of the probability calculus:

Independence and Direct Inference Theorem:
If prob(C/B) > 0 then prob(A/B&C) = prob(A/B) iff prob(A&C/B) = prob(A/B)⋅prob(C/B).

As a result, anyone who shares the commonly held intuition that we should be able to assume statistical independence in the absence of information to the contrary is also committed to endorsing Nonclassical Direct Inference. This is important, because I have found that many people do have the former intuition but balk at the latter.

Nonclassical Direct Inference is a principle of defeasible reasoning, so it is subject to defeat. The simplest and most important kind of defeater is a subproperty defeater. Suppose C ⊆ D ⊆ B and we know that prob(A/B) = r, but prob(A/D) = s, where s ≠ r. This gives us defeasible reasons for drawing two incompatible conclusions, viz., that prob(A/C) = r and that prob(A/C) = s. The principle of subproperty defeat tells us that because D ⊆ B, the latter inference takes precedence and defeats the inference to the conclusion that prob(A/C) = r:

Subproperty Defeat for Nonclassical Direct Inference:
If C ⊆ D ⊆ B, prob(A/D) = s, prob(A/B) = r, prob(A/U) = a, prob(B/U) = b, prob(C/U) = c, prob(D/U) = d, then the expectable value of prob(A/C) = s (rather than r).
Proof: See appendix.

Because the principles of nonclassical direct inference and statistical independence are equivalent, subproperty defeaters for nonclassical direct inference generate analogous defeaters for the Principle of Statistical Independence:

Principle of Statistical Independence with Subproperties:
If prob(A/C) = r, prob(B/C) = s, (B&C) ⊆ D ⊆ C, and prob(A/D) = p ≠ r, then the expectable value of prob(A&B/C) = p⋅s (rather than r⋅s).
Proof: prob(A&B/C) = prob(A/B&C)⋅prob(B/C), and by nonclassical direct inference, the expectable value of prob(A/B&C) = p. •

Subproperty Defeat for Statistical Independence:
⌜(B&C) ⊆ D ⊆ C and prob(A/D) = p ≠ r⌝ is an undercutting defeater for the inference by the Principle of Statistical Independence from ⌜prob(A/C) = r & prob(B/C) = s⌝ to ⌜prob(A&B/C) = r⋅s⌝.

Consider an example of subproperty defeat for Statistical Independence. Suppose we know that prob(x is more than a year old/x is a vertebrate) = 0.15, and prob(x is a fish/x is a vertebrate) = 0.8, and we want to know the value of prob(x is more than a year old & x is a fish/x is a vertebrate). In the absence of any other information it would be reasonable to assume that being a fish and being more than a year old are statistically independent relative to "x is a vertebrate," and hence prob(x is more than a year old & x is a fish/x is a vertebrate) = 0.15⋅0.8 = 0.12. But suppose we also know prob(x is more than a year old/x is an aquatic animal) = 0.2. Should this make a difference? Relying upon untutored intuition may leave one unsure. However, being a vertebrate and a fish entails being an aquatic animal, so this additional information gives us a subproperty defeater for the assumption of statistical independence. What we should conclude instead is that prob(x is more than a year old & x is a fish/x is a vertebrate) = 0.2⋅0.8 = 0.16.

By virtue of the equivalence of the principles of Nonclassical Direct Inference and Statistical Independence, defeaters for the Principle of Statistical Independence also yield defeaters for Nonclassical Direct Inference. In particular, overlap defeaters for the Principle of Statistical Independence yield overlap defeaters for Nonclassical Direct Inference. We have the following theorem:

Principle of Nonclassical Direct Inference with Overlap:
If B&D ⊆ G and C&D ⊆ G then the expectable value of prob(B/C&D) = prob(B/D&G).

Note that if G ⊆ D then prob(B/D&G) = prob(B/G), so ⌜B&D,C&D ⊆ G ⊆ D⌝ is a defeasible reason for ⌜prob(B/C&D) = prob(B/G)⌝.

This is an interesting generalization of Nonclassical Direct Inference. Although probabilists commonly reason in accordance with Nonclassical Direct Inference in practical applications (without endorsing the formal principle), untutored intuition is not apt to lead them to reason in accordance with Nonclassical Direct Inference with Overlap. To the best of my knowledge, Nonclassical Direct Inference with Overlap has gone unnoticed in the probability literature. Nonclassical Direct Inference with Overlap yields the standard principle of Nonclassical Direct Inference when D is tautologous.

Nonclassical Direct Inference with Overlap is subject to both subproperty defeat and overlap defeat, just as the standard principle is:

Subproperty Defeat for Nonclassical Direct Inference with Overlap:
⌜(C&D) ⊆ E ⊆ D and prob(B/E) ≠ r⌝ is an undercutting defeater for the inference by Nonclassical Direct Inference with Overlap from ⌜B&D,C&D ⊆ G⌝ to ⌜prob(B/C&D) = prob(B/D&G)⌝.

Overlap Defeat for Nonclassical Direct Inference with Overlap:
⌜B&D ⊆ H, C&D ⊆ H and prob(Gx/Dx) ≠ prob(Hx/Dx)⌝ is an undercutting defeater for the inference by Nonclassical Direct Inference with Overlap from ⌜B&D ⊆ G and C&D ⊆ G⌝ to ⌜prob(B/C&D) = prob(B/D&G)⌝.

8. Classical Direct Inference
Direct inference is normally understood as being a form of inference from generic probabilities to singular probabilities rather than from generic probabilities to other generic probabilities. However, it was shown in Pollock (1990) that these inferences are derivable from Nonclassical Direct Inference if we identify singular probabilities with a special class of generic probabilities. The present treatment is a generalization of that given in Pollock (1984 and 1990).10 Let K be the conjunction of all the propositions the agent is warranted in believing,11 and let K be the set of all physically possible worlds at which K is true ("K-worlds"). I propose that we define the singular probability

PROB(P) (written in small caps) to be the proportion of K-worlds at which P is true. Where P is the set of all physically possible P-worlds:

PROB(P) = ρ(P,K).

More generally, where Q is the set of all physically possible Q-worlds, we can define:

PROB(P/Q) = ρ(P, Q∩K).

This makes singular probabilities sensitive to the agent's knowledge of his situation, which is what is needed for rational decision making.12 Formally, singular probabilities become analogous to Carnap's (1950, 1952) logical probability, with the important difference that Carnap took ρ to be logically specified, whereas here the identity of ρ is taken to be a contingent fact. ρ is determined by the values of contingently true nomic probabilities, and their values are discovered by various kinds of statistical induction.

It turns out that singular probabilities, so defined, can be identified with a special class of nomic probabilities:

Representation Theorem for Singular Probabilities:
(1) PROB(Fa) = prob(Fx/x = a & K);
(2) If it is physically necessary that [K → (Q ↔ Sa1…an)] and that [(Q&K) → (P ↔ Ra1…an)], and Q is consistent with K, then PROB(P/Q) = prob(Rx1…xn/Sx1…xn & x1 = a1 & … & xn = an & K);
(3) PROB(P) = prob(P & x=x/x = x & K).
Proof: See appendix.

PROB(P) is a kind of "mixed physical/epistemic probability," because it combines background knowledge in the form of K with nomic probabilities.

The probability prob(Fx/x = a & K) is a peculiar-looking nomic probability. It is a generic probability, because "x" is a free variable, but the probability is only about one object. As such it cannot be evaluated by statistical induction or other familiar forms of statistical reasoning. However, it can be evaluated using nonclassical direct inference. If K entails Ga, nonclassical direct inference gives us a defeasible reason for expecting that PROB(Fa) = prob(Fx/x = a & K) = prob(Fx/Gx). This is a familiar form of "classical" direct inference—that is, direct inference from generic probabilities to singular probabilities. More generally, we can derive:

Classical Direct Inference:
⌜Sa1…an is known and prob(Rx1…xn / Sx1…xn & Tx1…xn) = r⌝ is a defeasible reason for ⌜PROB(Ra1…an / Ta1…an) = r⌝.

Similarly, we get subproperty defeaters:

Subproperty Defeat for Classical Direct Inference:
⌜V ⊆ S, Va1…an is known, and prob(Rx1…xn / Vx1…xn & Tx1…xn) ≠ r⌝ is an undercutting defeater for the inference by classical direct inference from ⌜Sa1…an is known and prob(Rx1…xn / Sx1…xn & Tx1…xn) = r⌝ to ⌜PROB(Ra1…an / Ta1…an) = r⌝.

Classical Direct Inference and Subproperty Defeat are (versions of) the two best known principles of direct inference. Pollock (1983) proposed them as precisifications of Reichenbach's seminal principles of direct inference, and Kyburg (1974) and Bacchus (1990) built their theories around similar principles. However, as Kyburg was the first to observe, these two principles do not constitute a complete theory of direct inference. This is illustrated by overlap defeat, and we will find other defeaters as we proceed:

Overlap Defeat for Classical Direct Inference:
The conjunction of
(i) (Rx1…xn & Sx1…xn & Tx1…xn) ⊆ Gx1…xn and
(ii) (Sx1…xn & Tx1…xn & x1 = a1 & … & xn = an & K) ⊆ Gx1…xn and
(iii) prob(Gx1…xn / Sx1…xn & Tx1…xn) ≠ 1
is an undercutting defeater for the inference by classical direct inference from ⌜Sa1…an is known and prob(Rx1…xn / Sx1…xn & Tx1…xn) = r⌝ to ⌜PROB(Ra1…an / Ta1…an) = r⌝.

Because singular probabilities are generic probabilities in disguise, we can also use nonclassical direct inference to infer singular probabilities from singular probabilities. Thus ⌜PROB(P/Q) = r⌝ gives us a defeasible reason for expecting that PROB(P/Q&R) = r. We can employ principles of statistical independence similarly. For example, ⌜PROB(P/R) = r & PROB(Q/R) = s⌝ gives us a defeasible reason for expecting that PROB(P&Q/R) = r⋅s. And we get principles of subproperty defeat and overlap defeat for these applications of Nonclassical Direct Inference and Statistical Independence that are exactly analogous to the principles for generic probabilities.

9. Computational Inheritance
The biggest problem faced by most theories of direct inference concerns what to do if we have information supporting conflicting direct inferences. For example, suppose Bernard has symptoms suggesting, with probability .6, that he has a certain rare disease. Suppose further that we have two seemingly unrelated diagnostic tests for the disease, and Bernard tests positive on both tests. We know that the probability of a person with his symptoms having the disease if he tests positive on the first test is .7, and the probability if he tests positive on the second test is .75. But what should we conclude about the probability of his having the disease if he tests positive on both tests? The probability calculus gives us no guidance here. It is consistent with the probability calculus for the "joint probability" of his having the disease if he tests positive on both tests to be anything from 0 to 1. The Principle of Classical Direct Inference as formulated in section eight is no help either. Direct inference gives us one reason for thinking the probability of Bernard having the disease is .7, and it gives us a different reason for drawing the conflicting conclusion that the probability is .75. The result, endorsed in Pollock (1990), is that both instances of Classical Direct Inference are defeated (it is a case of collective defeat), and we are left with no conclusion to draw about the singular probability of Bernard's having the disease. Because this sort of situation is so common, Classical Direct Inference is not generally very useful. Kyburg (1974) tried to do better by proposing that Direct Inference locates singular probabilities in intervals. In this case his conclusion would be that the probability of Bernard having the disease is (or lies in the interval) [.7,.75]. But intuitively, this also seems unsatisfactory. If Bernard tests positive on both tests, the probability of his having the disease should be higher than if he tests positive on just one, so it should lie above the interval [.7,.75]. But how can we justify this?

Knowledge of generic probabilities would be vastly more useful in real application if there were a function Y(r,s|a) such that when prob(F/U) = a, G,H ⊆ U, prob(F/G) = r and prob(F/H) = s, we could defeasibly expect that prob(F/G&H) = Y(r,s|a), and hence (by Nonclassical Direct Inference) that PROB(Fc) = Y(r,s|a). I call this computational inheritance, because it computes a new value for PROB(Fc) from previously known generic probabilities. Direct inference, by contrast, is a kind of "noncomputational inheritance." It is direct in that PROB(Fc) simply inherits a value from a known generic probability. I call

the function used in computational inheritance "the Y-function" because its behavior would be as diagrammed in figure 2.

Figure 2. The Y-function: given prob(F/U) = a and G,H ⊆ U, with prob(F/G) = r and prob(F/H) = s, the output is prob(F/G&H) = Y(r,s|a).

Following Reichenbach (1949), it has generally been assumed that there is no such function as the Y-function. Certainly, there is no function Y(r,s|a) such that we can conclude deductively that prob(F/G&H) = Y(r,s|a). For any r, s, and a that are neither 0 nor 1, prob(F/G&H) can take any value between 0 and 1. However, that is equally true for Nonclassical Direct Inference. That is, if prob(F/G) = r we cannot conclude deductively that prob(F/G&H) = r. Nevertheless, that will tend to be the case, and we can defeasibly expect it to be the case. Might something similar be true of the Y-function? That is, could there be a function Y(r,s|a) such that we can defeasibly expect prob(F/G&H) to be Y(r,s|a)?13 It follows from the Probable Probabilities Theorem that the answer is "Yes." Let us define:

Y(r,s|a) = rs(1−a) / (rs(1−a) + a(1−r)(1−s)).

I use the non-standard notation "Y(r,s|a)" rather than "Y(r,s,a)" because the first two variables will turn out to work differently than the last variable.

Let us define:

B and C are Y-independent for A relative to U iff A,B,C ⊆ U and
(a) prob(C/B & A) = prob(C/A) and
(b) prob(C/B & ~A) = prob(C/U & ~A).

The key theorem underlying computational inheritance is the following theorem of the probability calculus:

Y-Theorem:
Let r = prob(A/B), s = prob(A/C), a = prob(A/U), and 0 < a < 1. If B and C are Y-independent for A relative to U then prob(A/B&C) = Y(r,s|a).
Proof: See appendix.

In light of the Y-theorem, we can think of Y-independence as formulating an independence condition for B and C which says that they make independent contributions to A—contributions that combine in accordance with the Y-function, rather than "undermining" each other.

By virtue of the Principle of Statistical Independence, we have a defeasible reason for expecting that the independence conditions (a) and (b) hold. Thus the Y-theorem supports the following principle of expectable values (which can also be proven directly using the probable probabilities software):

Y-Principle:
If B,C ⊆ U, prob(A/B) = r, prob(A/C) = s, prob(A/U) = a, prob(B/U) = b, prob(C/U) = c, and 0 < a < 1, then the expectable value of prob(A/B & C) = Y(r,s|a).
Proof: See appendix.

Note that the expectable value of prob(A/B & C) is independent of b and c.

To get a better feel for what the Y-Principle tells us, it is useful to examine plots of the Y-function. Figure 3 illustrates that Y(z,x|.5) is symmetric around the right-leaning diagonal.

Figure 3. Y(z,x|.5), holding z constant (for several choices of z as indicated in the key).

Varying a has the effect of warping the Y-function up or down relative to the right-leaning diagonal. This is illustrated in figure 4 for several choices of a. Note that, in general, when r,s < a then Y(r,s|a) < r and Y(r,s|a) < s, and when r,s > a then Y(r,s|a) > r and Y(r,s|a) > s.

The Y-function has a number of important properties. In particular, it is important that the Y-function is commutative and associative in the first two variables:

Y-commutativity: Y(r,s|a) = Y(s,r|a).

Y-associativity: Y(r,Y(s,t|a)|a) = Y(Y(r,s|a),t|a).

Commutativity and associativity are important for the use of the Y-function in computing probabilities. Suppose we know that prob(A/B) = .6, prob(A/C) = .7, and prob(A/D) = .75, where B,C,D ⊆ U and prob(A/U) = .3. In light of commutativity and associativity we can combine the first three probabilities in any order and infer defeasibly that prob(A/B&C&D) = Y(.6,Y(.7,.75|.3)|.3) = Y(Y(.6,.7|.3),.75|.3) = .98. This makes it convenient to extend the Y-function recursively so that it can be applied to an arbitrary number of arguments (greater than or equal to 3):

If n ≥ 3, Y(r1,…,rn|a) = Y(r1,Y(r2,…,rn|a)|a).

We can then strengthen the Y-Principle as follows:

Compound Y-Principle:
If B1,…,Bn ⊆ U, prob(A/B1) = r1,…, prob(A/Bn) = rn, and prob(A/U) = a, the expectable value of prob(A/B1 &…& Bn) = Y(r1,…,rn|a).
Proof: See appendix.

If we know that prob(A/B) = r and prob(A/C) = s, we can also use Nonclassical Direct Inference to infer defeasibly that prob(A/B&C) = r. If s ≠ a, Y(r,s|a) ≠ r, so this conflicts with the conclusion that prob(A/B&C) = Y(r,s|a). However, as above, the inference described by the Y-Principle is based upon a probability with a more inclusive reference property than that underlying Nonclassical Direct Inference (that is, it takes account of more information), so it takes precedence and yields an undercutting defeater for Nonclassical Direct Inference:

Y-Defeat for Nonclassical Direct Inference:
⌜A,B,C ⊆ U and prob(A/C) ≠ prob(A/U)⌝ is an undercutting defeater for the inference from ⌜prob(A/B) = r⌝ to ⌜prob(A/B&C) = r⌝ by Nonclassical Direct Inference.

It follows that we also have a defeater for the Principle of Statistical Independence:

Y-Defeat for Statistical Independence:
⌜A,B,C ⊆ U and prob(A/B) ≠ prob(A/U)⌝ is an undercutting defeater for the inference from ⌜prob(A/C) = r & prob(B/C) = s⌝ to ⌜prob(A&B/C) = r⋅s⌝ by Statistical Independence.
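The behavior of the Y-function can be checked with a few lines of code. The sketch below assumes the odds-ratio form Y(r,s|a) = rs(1−a)/(rs(1−a) + a(1−r)(1−s)), which reproduces the worked values given in the text (for example Y(.4,.4|.1) = .8 and the .98 computation), and verifies commutativity and associativity numerically:

```python
def Y(r, s, a):
    # Assumed odds-ratio form of the Y-function: multiply the odds of r and s
    # and divide by the odds of the base rate a.
    n = r * s * (1 - a)
    return n / (n + a * (1 - r) * (1 - s))

# Y-commutativity and Y-associativity in the first two variables:
assert Y(0.7, 0.75, 0.3) == Y(0.75, 0.7, 0.3)
x = Y(0.6, Y(0.7, 0.75, 0.3), 0.3)
y = Y(Y(0.6, 0.7, 0.3), 0.75, 0.3)
assert abs(x - y) < 1e-12

print(round(x, 2))   # 0.98, matching the worked example for prob(A/B&C&D)
```

Because the function just multiplies odds ratios, any bracketing of the first arguments gives the same result, which is what licenses the recursive extension to n arguments.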

— 14 — — Philosophy and Computers —

Figure 4. Y(z,x|a) holding z constant (for several choices of z), for a = .7, a = .3, a = .1, and a = .01.

The phenomenon of Computational Inheritance makes knowledge of generic probabilities useful in ways it never was previously. It tells us how to combine different probabilities that would lead to conflicting direct inferences and still arrive at a univocal value. Consider Bernard again. We are supposing that the probability of a person with his symptoms having the disease is .6. We also suppose that the probability of such a person having the disease if they test positive on the first test is .7, and the probability of their having the disease if they test positive on the second test is .75. What is the probability of their having the disease if they test positive on both tests? We can infer defeasibly that it is Y(.7,.75|.6) = .875. We can then apply classical direct inference to conclude that the probability of Bernard's having the disease is .875. This is a result that we could not have gotten from either the probability calculus alone or Classical Direct Inference. Similar reasoning will have significant practical applications, for example in engineering, where we have multiple imperfect sensors sensing some phenomenon and we want to arrive at a joint probability regarding the phenomenon that combines the information from all the sensors.

Again, because singular probabilities are generic probabilities in disguise, we can apply computational inheritance to them as well and infer defeasibly that if PROB(P) = a, PROB(P/Q) = r, and PROB(P/R) = s, then PROB(P/Q&R) = Y(r,s|a).

Somewhat surprisingly, when prob(C/A) ≠ prob(C/U) and prob(B/A) ≠ prob(B/U), Y-independence conflicts with ordinary independence:

Theorem 3: If B and C are Y-independent for A relative to U, prob(C/A) ≠ prob(C/U), and prob(B/A) ≠ prob(B/U), then prob(C/B) ≠ prob(C/U).
Proof: See Appendix.

Theorem 3 seems initially surprising, because we have a defeasible assumption of independence for B and C relative to all three of A, U&~A, and U. Theorem 3 tells us that if A is statistically relevant to B and C, then we cannot have all three. However, this situation is common. Consider the example of two sensors B and C sensing the presence of an event A. Given that one sensor fires, the probability of A is higher, but raising the probability of A will normally raise the probability of the other sensor firing. So B and C are not statistically independent. However, knowing whether an event of type A is occurring screens off the effect of the sensors on one another. For example, knowing that an event of type A occurs will raise the probability of one of the sensors firing, but knowing that the other sensor is firing will not raise that probability further. So prob(B/C&A) = prob(B/A) and prob(B/C&~A) = prob(B/U&~A). The defeasible presumption of Y-independence for A is based upon a probability that takes account of more information than the probability grounding the defeasible presumption of statistical independence relative to U, so the former takes precedence. In other words, in light of Theorem 3, we get a defeater for Statistical Independence whenever we have an A ⊆ U such that prob(A/C) ≠ prob(A/U) and prob(A/B) ≠ prob(A/U):

Y-Defeat for Statistical Independence: ⌜prob(A/C) ≠ prob(A/U) and prob(A/B) ≠ prob(A/U)⌝ is an undercutting defeater for the inference from ⌜prob(A/C) = r and prob(B/C) = s⌝ to ⌜prob(A&B/C) = r ⋅ s⌝ by the Principle of Statistical Independence.
Proof: See Appendix.

The application of the Y-function presupposes that we know the base rate prob(A/U). But suppose we do not. Then what can we conclude about prob(A/B&C)? It might be supposed that we can combine Indifference and the Y-Principle and conclude that prob(A/B&C) = Y(r,s|.5). That would be interesting because, as Joseph Halpern has pointed out to me (in correspondence), this is equivalent to Dempster's "rule of composition" for belief functions (Shafer 1976). However, by ignoring the base rate prob(A/U), that theory will often give intuitively incorrect results. For example, in the case of the two tests for the disease, suppose the disease is rare, with a base rate of .1, but each positive test individually confers a probability of .4 that the patient has the disease. Two positive tests should increase that probability further. Indeed, Y(.4,.4|.1) = .8. However, Y(.4,.4|.5) = .3, so if we ignore the base rate, two positive tests would lower the probability of having the disease instead of raising it.

The reason the Dempster-Shafer rule does not give the right answer when we are ignorant of the base rate is that, although when we are completely ignorant of the value of prob(A/U) it is reasonable to expect it to be .5, knowing the values of prob(A/B) and prob(A/C) changes the expectable value of prob(A/U). Let us define Y0(r,s) to be Y(r,s|a) where a, b, and c are the solutions to the following set of three simultaneous equations (for fixed r and s):
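The closed form of the Y-function is not reproduced in this excerpt, so the definition below is an assumption: an odds-style combination rule chosen to agree with the worked examples in the text (Y(.4,.4|.1) = .8 and Y(.4,.4|.5) ≈ .3). A minimal sketch:

```python
# Assumed closed form of Pollock's Y-function (not printed in this
# excerpt); it reproduces the numerical examples given in the text.
def Y(r, s, a):
    """Combine prob(A/B) = r and prob(A/C) = s given base rate prob(A/U) = a."""
    num = r * s * (1 - a)
    return num / (num + (1 - r) * (1 - s) * a)

# Two positive tests for a rare disease (base rate .1, each test alone .4):
print(round(Y(0.4, 0.4, 0.1), 3))   # 0.8 -- combining raises the probability
# Ignoring the base rate (the Dempster-Shafer-style Y(r,s|.5)) lowers it:
print(round(Y(0.4, 0.4, 0.5), 3))   # 0.308
```

On this form, a property whose conditional probability equals the base rate adds nothing: Y(r,a|a) = r, which is the behavior the domination principles in section 10 rely on.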

— 15 — — APA Newsletter, Spring 2010, Volume 09, Number 2 —

Then we have the following principle:

Y0-Principle: If prob(A/B) = r and prob(A/C) = s, then the expectable value of prob(A/B&C) = Y0(r,s).
Proof: See Appendix.

If a is the expectable value of prob(A/U) given that prob(A/B) = r and prob(A/C) = s, then Y0(r,s) = Y(r,s|a). However, a does not in general have a simple analytic characterization. Y0(r,s) is plotted in figure 5, and the default values of prob(A/U) are plotted in figure 6. Note how the curve for Y0(r,s) is twisted with respect to the curve for Y(r,s|.5) (in figure 3).

Figure 5. Y0(r,s), holding s constant (for several choices of s as indicated in the key).

Figure 6. Default values of prob(A/U) (for several choices of s as indicated in the key).

10. Domination Defeaters for the Statistical Syllogism

The defeasible inferences licensed by our principles of probable probabilities are obtained by applying the statistical syllogism to second-order probabilities. It turns out that the principles of probable probabilities have important implications for the statistical syllogism itself. In stating the principle of the statistical syllogism in section three, the only primitive defeater that I gave was that of subproperty defeat. However, in Pollock (1990), it was argued that we must supplement subproperty defeaters with what were called "domination defeaters." Suppose that in the course of investigating a certain disease, Roderick's syndrome, it is discovered that 98% of the people having enzyme E in their blood have the disease. This becomes a powerful tool for diagnosis when used in connection with the statistical syllogism. However, the use of such information is complicated by the fact that we often have other sorts of statistical information as well. First, in statistical investigations of diseases, it is typically found that some factors are statistically irrelevant. For instance, it may be discovered that the color of one's hair is statistically irrelevant to the reliability of this diagnostic technique. Thus, for example, it is also true that 98% of all redheads having enzyme E in their blood have the disease. Second, we may discover that there are specifiable circumstances in which the diagnostic technique is unreliable. For instance, it may be found that of patients undergoing radiation therapy, only 32% of those with enzyme E in their blood have Roderick's syndrome. As we have found hair color to be irrelevant to the reliability of the diagnostic technique, we would not ordinarily go on to collect data about the effect of radiation therapy specifically on redheads. Now consider Jerome, who is redheaded, is undergoing radiation therapy, and is found to have enzyme E in his blood. Should we conclude that he has Roderick's syndrome? Intuitively, we should not, but this cannot be explained directly by the statistical syllogism and subproperty defeat. We have statistical knowledge about the reference properties B = person with enzyme E in his blood, C = redheaded person, and D = person who is undergoing radiation therapy. Letting A be the property of having Roderick's syndrome, we know that:

(1) Bc & prob(Ax/Bx) = .98.
(2) Bc & Cc & prob(Ax/Bx&Cx) = .98.
(3) Dc & prob(Ax/Bx&Dx) = .32.

(1), (2), and (3) are related as in figure 7, where the solid arrow indicates a defeat relation and the dashed arrows signify inference relations. By the statistical syllogism, both (1) and (2) constitute defeasible reasons for concluding that Jerome has Roderick's syndrome. (3) provides a subproperty defeater for the inference from (1), but it does not defeat the inference from (2). Thus it should be reasonable to infer that because Jerome is a redhead and most redheads with enzyme E in their blood have Roderick's syndrome, Jerome has Roderick's syndrome. Formally, the fact that Jerome is undergoing radiation therapy should not defeat the inference from (2), because that is not more specific information than the fact that Jerome has red hair. But, obviously, this is wrong. We regard Jerome's having red hair as irrelevant. The important inference is from the fact that most people with enzyme E in their blood have Roderick's syndrome to the conclusion that Jerome has Roderick's syndrome, and we regard that inference as defeated. Pollock (1990) took this example to support the need for a new kind of defeater for the statistical syllogism. Domination defeaters were supposed to have the effect of making (3) defeat the inference from (2) to (4) by virtue of the fact that (3) defeats the inference from (1) to (4) and prob(Ax/Bx) = prob(Ax/Bx&Cx):

Domination Defeat: If A is projectible with respect to D, then ⌜Dc & prob(A/B) = prob(A/B&C) & prob(A/B&D) < prob(A/B)⌝ is an undercutting defeater for the inference from ⌜Bc & Cc & prob(A/B&C) = r⌝ to ⌜Ac⌝ by the statistical syllogism.

What I will show here is that by appealing to the Y-Principle, we can derive domination defeaters from subproperty defeaters without making any further primitive assumptions. Applying the Y-Principle to (1), (2), and (3), we get:

Expectable Domination: If B,C,D ⊆ U, prob(A/B) = r, prob(A/B&D) = v < r, and prob(A/U) = a, then the expectable value of prob(A/B&C&D) = Y(v,a|a) = v < r.
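The numerical behavior behind Expectable Domination can be sketched with an assumed closed form for the Y-function (the closed form is not printed in this excerpt; it is chosen to match the text's worked examples, e.g., Y(.4,.4|.1) = .8):

```python
# Assumed closed form for the Y-function (an assumption introduced
# for illustration; consistent with the examples in the text).
def Y(r, s, a):
    num = r * s * (1 - a)
    return num / (num + (1 - r) * (1 - s) * a)

# Expectable Domination turns on the identity Y(v,a|a) = v: conditioning
# on a property whose probability equals the base rate leaves the
# expectable value of prob(A/B&C&D) at prob(A/B&D) = v.
v, a = 0.32, 0.98
print(Y(v, a, a))            # 0.32 (up to floating-point rounding)

# Generalized domination: for v < r and s <= r, Y(v,s|r) < v and Y(v,s|r) < s.
print(Y(0.32, 0.9, 0.98))    # well below 0.32
```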


Figure 7. Domination defeat: the inference and defeat relations among (1) Bc & prob(Ax/Bx) = r, (2) Bc & Cc & prob(Ax/Bx&Cx) = r, (3) Dc & prob(Ax/Bx&Dx) = v, and (4) Ac.

We can diagram the relations between these probabilities as in figure 8.

Figure 8. Reconstructing domination defeat: the inferences from (1)–(3) to Ac, with prob(A/B&C&D) = Y(v,a|a) = v < r.

The upshot is that we can infer defeasibly that prob(A/B&C&D) = prob(A/B&D), and this gives us a subproperty defeater for the inference from (2) to (4). Thus domination defeaters become derived defeaters. Note that this argument does not depend on the value of a. It works merely on the supposition that there is some base rate a, and that is a necessary truth.

Somewhat surprisingly, domination defeat can be generalized. If we have prob(A/B&C) = s ≤ r, we can still infer that prob(A/B&C&D) = Y(v,s|r). It is a general property of the Y-function that if v < r and s ≤ r, then Y(v,s|r) < v and Y(v,s|r) < s. Hence we get:

Generalized Domination Defeat: If s ≤ r then ⌜Dc & prob(A/B) = r & prob(A/B&D) < r⌝ is an undercutting defeater for the inference from ⌜Bc & Cc & prob(A/B&C) = s⌝ to ⌜Ac⌝ by the statistical syllogism.

11. Inverse Probabilities

All of the principles of probable probabilities that have been discussed so far are related to defeasible assumptions of statistical independence. As we have seen, Nonclassical Direct Inference is equivalent to a defeasible assumption of statistical independence, and the Y-Principle follows from a defeasible assumption of Y-independence. This might suggest that all principles of probable probabilities derive ultimately from various defeasible independence assumptions. However, this section presents a set of principles that do not appear to be related to statistical independence in any way.

Where A,B ⊆ U, suppose we know the value of prob(A/B). If we know the base rates prob(A/U) and prob(B/U), the probability calculus enables us to compute the value of the inverse probability prob(~B/~A&U):

Theorem 4: If A,B ⊆ U then
prob(~B/~A&U) = 1 − prob(B/U)⋅(1 − prob(A/B)) / (1 − prob(A/U)).

However, if we do not know the base rates, then the probability calculus imposes no constraints on the value of the inverse probability. It can nevertheless be shown that there are expectable values for it, and generally, if prob(A/B) is high, so is prob(~B/~A&U).

Inverse Probabilities I: If A,B ⊆ U and we know that prob(A/B) = r, but we do not know the base rates prob(A/U) and prob(B/U), the following values are expectable:
prob(B/U) = … ;
prob(A/U) = … ;
prob(~A/~B&U) = .5;
prob(~B/~A&U) = … .
Proof: See Appendix.

These values are plotted in figure 9. Note that when prob(A/B) > prob(A/U), we can expect prob(~B/~A&U) to be almost as great as prob(A/B).

Figure 9. Expectable values of prob(~B/~A&U), prob(A/U), and prob(B/U), as a function of prob(A/B), when the base rates are unknown.

Sometimes we know one of the base rates but not both:

Inverse Probabilities II: If A,B ⊆ U and we know that prob(A/B) = r and prob(B/U) = b, but we do not know the base rate prob(A/U), the following values are expectable:
prob(A/U) = .5(1 – (1 – 2r)b);
prob(~A/~B&U) = … ;
prob(~B/~A&U) = … .

Figure 10 plots the expectable values of prob(~B/~A&U) (for values greater than .5) as a function of prob(A/B), for fixed values of prob(B/U). The diagonal dashed line indicates the value of prob(A/B), for comparison. The upshot is that for low values of prob(B/U), prob(~B/~A&U) can be expected to be higher than prob(A/B), and for all values of prob(B/U), prob(~B/~A&U) will be fairly high if prob(A/B) is high. Furthermore, prob(~B/~A&U) > .5 iff prob(B/U) < … .
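Theorem 4 is pure probability calculus: since prob(~B&~A/U) = 1 − prob(A/U) − prob(B/U) + prob(A&B/U), it follows that prob(~B/~A&U) = 1 − prob(B/U)(1 − prob(A/B))/(1 − prob(A/U)). A quick check against a concrete joint distribution:

```python
# Check the inverse-probability identity of Theorem 4 against a direct
# computation from a joint distribution over A and B (relative to U).
def inverse_prob(r, a, b):
    # prob(~B/~A&U) = 1 - prob(B/U)*(1 - prob(A/B)) / (1 - prob(A/U))
    return 1 - b * (1 - r) / (1 - a)

a, b, r = 0.3, 0.2, 0.6            # prob(A/U), prob(B/U), prob(A/B)
p_ab = r * b                       # prob(A&B/U)
p_notA_notB = 1 - a - b + p_ab     # prob(~A&~B/U)
direct = p_notA_notB / (1 - a)     # prob(~B/~A&U) computed directly
print(direct, inverse_prob(r, a, b))   # both ~0.8857
```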


Figure 10. Expectable values of prob(~B/~A&U) as a function of prob(A/B), when prob(A/U) is unknown, for fixed values of prob(B/U).

The most complex case occurs when we do know the base rate prob(A/U) but we do not know the base rate prob(B/U):

Inverse Probabilities III: If A,B ⊆ U and we know that prob(A/B) = r and prob(A/U) = a, but we do not know the base rate prob(B/U), then:
(a) where b is the expectable value of prob(B/U), … ;
(b) the expectable value of prob(~B/~A&U) = … .

The equation characterizing the expectable value of prob(B/U) does not have a closed-form solution. However, for specific values of a and r, the solutions can be computed using hill-climbing algorithms (included in the probable probabilities software). The results are plotted in figure 11. When prob(A/B) = prob(A/U), the expected value for prob(~B/~A&U) is .5, and when prob(A/B) > prob(A/U), prob(~B/~A&U) > .5. If prob(A/U) < .5, the expected value of prob(~B/~A&U) is greater than prob(A/B).

Figure 11. Expectable values of prob(~B/~A&U) as a function of prob(A/B), when prob(B/U) is unknown, for fixed values of prob(A/U).

The upshot is that even when we lack knowledge of the base rates, there is an expectable value for the inverse probability prob(~B/~A&U), and that expectable value tends to be high when prob(A/B) is high.

12. Meeting Some Objections

I have argued that mathematical results, coupled with the statistical syllogism, justify defeasible inferences about the values of unknown probabilities. Various worries arise regarding this conclusion. A few people are worried about any defeasible (non-deductive) inference, but I presume that the last 50 years of epistemology has made it amply clear that, in the real world, cognitive agents cannot confine themselves to conclusions drawn deductively from their evidence. We employ multitudes of defeasible inference schemes in our everyday reasoning, and the statistical syllogism is one of them.

Granted that we have to reason defeasibly, we can still ask what justifies any particular defeasible inference scheme. At least in the case of the statistical syllogism, the answer seems clear. If prob(A/B) is high, then if we reason defeasibly from things being B to their being A, we will generally get it right. That is the most we can require of a defeasible inference scheme. We cannot require that the inference scheme will always lead to true conclusions, because then it would not be defeasible. People sometimes protest at this point that they are not interested in the general case. They are concerned with some inference they are only going to make once. They want to know why they should reason this way in the single case. But all cases are single cases. If you reason in this way in single cases, you will tend to get them right. It does not seem that you can ask for any firmer guarantee than that. You cannot avoid defeasible reasoning.

But we can have a further worry. For any defeasible inference scheme, we know that there will be possible cases in which it gets things wrong. For each principle of probable probabilities, the possible exceptions constitute a set of measure 0, but it is still an infinite set. The cases that actually interest us tend to be highly structured, and perhaps they also constitute a set of measure 0. How do we know that the latter set is not contained in the former? Again, there can be no logical guarantee that this is not the case. However, the generic probability of an arbitrary set of cases falling in the set of possible exceptions is 0. So without further specification of the structure of the cases that interest us, the probability of the set of those cases all falling in the set of exceptions is 0. Where defeasible reasoning is concerned, we cannot ask for a better guarantee than that.
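The defining equation for the expectable value of prob(B/U) in Inverse Probabilities III is not reproduced in this excerpt, so the following is only a sketch of the kind of one-dimensional hill-climbing search the text describes, applied to a stand-in objective:

```python
# Sketch of a simple hill-climbing search over [0,1]: repeatedly pick the
# best of (x - step, x, x + step) and halve the step. The objective f used
# below is a purely illustrative stand-in, not the equation from the text.
def hill_climb(f, lo=0.0, hi=1.0, steps=60):
    x, step = (lo + hi) / 2.0, (hi - lo) / 4.0
    for _ in range(steps):
        candidates = (x - step, x, x + step)
        x = max((c for c in candidates if lo <= c <= hi), key=f)
        step /= 2.0
    return x

f = lambda x: -(x - 0.3) ** 2     # stand-in with a unique maximum at 0.3
print(round(hill_climb(f), 6))    # converges to 0.3
```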
We should resist the temptation to think of the set of possible exceptions as an amorphous unstructured set about which we cannot reason using principles of probable probabilities. The exceptions are exceptions to single defeasible inference schemes. Many of the cases in which a particular inference fails will be cases in which there is a general defeater leading us to expect it to fail and leading us to make a different inference in its place. For example, knowing that prob(A/B) = r gives us a defeasible reason to expect that prob(A/B&C) = r. But if we also know that prob(A/C) = s and prob(A/U) = a, the original inference is defeated and we should expect instead that prob(A/B&C) = Y(r,s|a). So this is one of the cases in which an inference by nonclassical direct inference fails, but it is a defeasibly expectable case. There will also be cases that are not defeasibly expectable. This follows from the simple fact that there are primitive nomic probabilities representing statistical laws of nature. These laws

are novel, and cannot be predicted defeasibly by appealing to other nomic probabilities. Suppose prob(A/B) = r, but ⌜prob(A/B&C) = s⌝ is a primitive law. The latter is an exception to nonclassical direct inference. Furthermore, we can expect that strengthening the reference property further will result in nomic probabilities like ⌜prob(A/B&C&D) = s⌝, and these will also be cases in which the nonclassical direct inference from ⌜prob(A/B) = r⌝ fails. But, unlike the primitive law, the latter is a defeasibly expectable failure arising from subproperty defeat. So most of the cases in which a particular defeasible inference appealing to principles of probable probabilities fails will be cases in which the failure is defeasibly predictable by appealing to other principles of probable probabilities. This is an observation about how much structure the set of exceptions (of measure 0) must have. The set of exceptions is a set of exceptions each to just a single rule, not to all principles of probable probabilities. The Probable Probabilities Theorem implies that even within the set of exceptions to a particular defeasible inference scheme, most inferences that take account of the primitive nomic probabilities will get things right, with probability 1.

13. Conclusions

The problem of sparse probability knowledge results from the fact that in the real world we lack direct knowledge of most probabilities. If probabilities are to be useful, we must have ways of making defeasible estimates of their values even when those values are not computable from known probabilities using the probability calculus. Within the theory of nomic probability, limit theorems from finite combinatorial mathematics provide the necessary basis for these inferences. It turns out that in very general circumstances, there will be expectable values for otherwise unknown probabilities. These are described by principles telling us that although certain inferences from probabilities to probabilities are not deductively valid, nevertheless the second-order probability of their yielding correct results is 1. This makes it defeasibly reasonable to make the inferences.

I illustrated this by looking at Indifference, Statistical Independence, Classical and Nonclassical Direct Inference, the Y-Principle, and Inverse Probabilities. But these are just illustrations. There are a huge number of useful principles of probable probabilities, some of which I have investigated, but most are waiting to be discovered. I proved the first such principles laboriously by hand. It took me six months to find and prove the Y-Principle. But it turns out that there is a uniform way of finding and proving these principles. This made it possible to write the probable probabilities software that analyzes the results of linear constraints and determines what the expectable values of the probabilities are. That software produces a proof of the Y-Principle in a matter of seconds.

Nomic probability and the principles of probable probability are reminiscent of Carnap's logical probabilities (Carnap 1950, 1952; Hintikka 1966; Bacchus et al. 1996). Historical theories of objective probability required probabilities to be assessed by empirical methods, and because of the weakness of the probability calculus, they tended to leave us in a badly impoverished epistemic state regarding most probabilities. Carnap tried to define a kind of probability for which the values of probabilities were determined by logic alone, thus vitiating the need for empirical investigation. However, finding the right probability measure to employ in a theory of logical probabilities proved to be an insurmountable problem.

Nomic probability and the theory of probable probabilities lie between these two extremes. This theory still makes the values of probabilities contingent rather than logically necessary, but it makes our limited empirical investigations much more fruitful by giving them the power to license defeasible, non-deductive inferences to a wide range of further probabilities that we have not investigated empirically. Furthermore, unlike logical probability, these defeasible inferences do not depend upon ad hoc postulates. Instead, they derive directly from provable theorems of combinatorial mathematics. So even when we do not have sufficient empirical information to deductively determine the value of a probability, purely mathematical facts may be sufficient to make it reasonable, given what empirical information we do have, to expect the unknown probabilities to have specific and computable values. Where this differs from logical probability is (1) that the empirical values are an essential ingredient in the computation, and (2) that the inferences to these values are defeasible rather than deductive.

Appendix: Proofs of Theorems

3. Nomic Probability

Finite Projection Principle: If the sets of F's, G's, and R's are finite and (∀y)(Gy → ρx(Fx/Rxy) ∈ [p,q]), then ρx,y(Fx/Rxy & Gy) ∈ [p,q].

Proof: Suppose the G's are y1,…,yn. Then
ρx,y(Fx/Rxy & Gy) = ρx,y(Fx/Rxy & (y = y1 ∨…∨ y = yn))
= Σi ρx,y(Fx/Rxy & y = yi) ⋅ ρx,y(y = yi / Rxy & (y = y1 ∨…∨ y = yn)),
and ρx,y(Fx/Rxy & y = yi) = ρx(Fx/Rxyi).
The ρx,y(y = yi / Rxy & (y = y1 ∨…∨ y = yn)) are between 0 and 1 and sum to 1, so ρx,y(Fx/Rxy & Gy) is a weighted average of the ρx(Fx/Rxyi), and hence ρx,y(Fx/Rxy & Gy) ∈ [p,q]. •

4. Limit Theorems and Probable Probabilities

Finite Indifference Principle: For every ε,δ > 0 there is an N such that if U is finite and #U > N, then
ρX(ρ(X,U) ≈δ 0.5 / X ⊆ U) ≥ 1 – ε.

We can prove the Finite Indifference Principle as follows. Theorem 1 establishes that n/2 is the size of r that maximizes Bin(n,r).

Theorem 1: If 0 < k ≤ n, … and … .
Proof: … . •

Theorem 2 shows that the slopes of the curves become infinitely steep as n → ∞, and hence the sizes of subsets of an n-membered set cluster arbitrarily close to n/2:

Theorem 2: As n → ∞, … and … .
Proof: As in theorem 1, … . For any positive or negative real number ε, … → N^ε as N → ∞, so … → 1 as u → ∞. Hence … > … . •

Thus we have the Finite Indifference Principle.

Probable Proportions Theorem: Let U,X1,…,Xn be a set of variables ranging over sets, and consider a finitely unbounded finite set LC of linear constraints on proportions between Boolean compounds of those variables. Then for any pair of relations P,Q whose variables are a subset of U,X1,…,Xn there is a unique real number r in [0,1] such that for every ε,δ > 0, there is an N such that if U is finite and #{〈X1,...,Xn〉 | LC & X1,...,Xn ⊆ U} ≥ N, then
ρX1,...,Xn(ρ(P,Q) ≈δ r / LC & X1,...,Xn ⊆ U) ≥ 1 – ε.

Proof: Assume then that LC is finitely unbounded. For each intersection of elements of the set {X1,…,Xn}, e.g., X∩Y∩Z, let the corresponding lower-case variable xyz be ρ(X∩Y∩Z/U). Given a set of linear constraints on these variables, the cardinality of an element X of the partition is a function f(x)⋅u of x (x may occur vacuously, in which case f(x) is a constant function). I will refer to the f(x)'s as the partition-coefficients. Because the constraints are linear, for each f(x) there is a positive or negative real number r such that f(x+ε) = f(x) + r⋅ε⋅u. If r < 0, I will say that x has a negative occurrence. Otherwise, x has a positive occurrence. It is a general characteristic of partitions that each variable has the same number k of positive and negative occurrences. Let a1(x),…,ak(x) be the partition-coefficients in which x has a positive occurrence, and let b1(x),…,bk(x) be those in which x has a negative occurrence. In most cases we will consider, r = 1 or r = −1, but not in all. The terms r⋅ε represent the amount a cell of the partition changes in size when x is incremented by ε. However, the sizes of the cells must still sum to u, so the sum of the r's must be 0. For each i ≤ k, let ri be the real number such that ai(x+ε) = ai(x) + riε, and let si be the real number such that bi(x+ε) = bi(x) – siε. So r1+...+rk = s1+...+sk. Note further that a1(x)⋅u,…,ak(x)⋅u,b1(x)⋅u,…,bk(x)⋅u are the cardinalities of the elements of the partition, so they must be non-negative. That is, for any value ξ of x that is consistent with the probability calculus (an "allowable" value of x), a1(ξ),…,ak(ξ),b1(ξ),…,bk(ξ) must be non-negative. It follows that if ξ is an allowable value of x, then ξ + ε is an allowable value only if for every i ≤ k, ε < … , and ξ – ε is an allowable value only if for every i ≤ k, … . Define
Prd(x) = (a1(x)u)! ... (ak(x)u)! (b1(x)u)! ... (bk(x)u)!
Our interest is in what happens as u → ∞. For fixed values of the other variables, the most probable value of x occurs when … → 1 as u → ∞. (This assumes that the curve has no merely local minima, but that is proven below in the course of proving the Probable Values Lemma.) As we are talking about finite sets, there is always at least one value of x that maximizes Prd(x) and hence that is a solution to this equation. It will follow from the proof of the Probable Values Lemma (below) that, in the limit, there is only one allowable value of x that is a solution. It is, in fact, the only real-valued solution within the interval [0,1]. It was noted above that r1+...+rk – s1–...–sk = 0, so, more simply, the most probable value of x is a real-valued solution within the interval [0,1] of the following equation:
… = 1.
In the common case in which r1 = … = rk = s1 = … = sk, the most probable value of x occurs when
… = 1.
Whichever of these equations we get, I will call it the term-characterization of x. We find the most probable value of x by solving these equations for x. What remains is to show, in the limit, that if ξ is the most probable value of x, then ξ has probability 1 of being the value of x. This is established by the Probable Values Lemma, stated below. To prove the Probable Values Lemma, we first need:

Partition Principle: If ε,x1,…,xk,y1,…,yk > 0, r1ε < x1, ..., rkε < xk, x1+…+xk+y1+…+yk = 1, and r1+…+rk = s1+…+sk, then
… > 1.

Proof: By the inequality of the geometric and arithmetic mean, if z1,…,zn > 0, z1+…+zn = 1, a1,…,an > 0, and for some i, j, ai ≠ aj, then … . We have:
x1 – r1ε + ... + xk – rkε + y1 + s1ε + ... + yk + skε = x1 + ... + xk + y1 + ... + yk + ε(s1 + ... + sk – r1 – ... – rk) = 1.
Hence … = 1, and equivalently … > 1. •
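The combinatorial fact driving the Finite Indifference Principle — that almost all subsets of a large finite set contain close to half its members — can be checked by exact counting:

```python
# Exact-counting illustration: the proportion of subsets X of an
# n-element set U whose relative size |X|/n is within delta of 0.5
# tends to 1 as n grows.
from math import comb

def prop_near_half(n, delta):
    near = sum(comb(n, k) for k in range(n + 1) if abs(k / n - 0.5) <= delta)
    return near / 2 ** n

print(prop_near_half(100, 0.05))    # sizeable already at n = 100
print(prop_near_half(2000, 0.05))   # very close to 1
```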


Now we can prove:

Probable Values Lemma: If LC is an infinitely unbounded set of linear constraints, a1(x),…,ak(x),b1(x),…,bk(x) are the resulting positive and negative partition coefficients, and … = 1, then for every ε,δ > 0, the probability that the actual value of x is within δ of ξ is greater than 1 – ε.

Proof: Where Prd(x) = (a1(x)u)! ... (ak(x)u)! (b1(x)u)! ... (bk(x)u)!, by the Partition Principle it suffices to show that when ξ is the most probable value of x, then (1) if for every i ≤ k, … , then … → ∞ as u → ∞, and (2) if for every i ≤ k, … , then … → ∞ as u → ∞. I will just prove the former, as the latter is analogous. By the Stirling approximation, … . Thus, as u → ∞, … → … . Similarly, … → … . Therefore, … . Hence, … → … ; I will call the latter the slope function for x. When ξ is the most probable value of x, … > 1. Hence … → ∞ as u → ∞. So as u → ∞, the probability that x ≈δ ξ → 1. •

Law of Large Numbers for Proportions: If B is infinite and ρ(A/B) = p, then for every ε,δ > 0, there is an N such that
ρX(ρ(A/X) ≈δ p / X ⊆ B & X is finite & #X ≥ N) ≥ 1 – ε.

Proof: Suppose ρ(A/B) = p, where B is infinite. By the finite-set principle:
ρX(ρ(A/X) = r / X ⊆ B & #X = N) = ρx1,...,xN(ρ(A/{x1,...,xN}) = r / x1,...,xN are pairwise distinct & x1,...,xN ∈ B).
"ρ(A/{x1,...,xN}) = r" is equivalent to the disjunction of pairwise logically incompatible disjuncts of the form "y1,...,yrN ∈ A & z1,...,z(1–r)N ∉ A" where {y1,...,yrN,z1,...,z(1–r)N} = {x1,...,xN}. By the crossproduct principle, each such disjunct has probability p^(rN) ⋅ (1–p)^((1–r)N). (For instance, ρx,y,z(Ax & Ay & ~Az / Bx & By & Bz) = ρ(A × A × (B – A), B × B × B) = p ⋅ p ⋅ (1 – p).) Hence by finite additivity:
ρX(ρ(A/X) = r / X ⊆ B & #X = N) = C(N,rN) ⋅ p^(rN) ⋅ (1–p)^((1–r)N).
This is just the formula for the binomial distribution. It follows by the familiar mathematics of the binomial distribution, according to which, like C(n,r), it becomes "needle-like" in the limit, that for every ε,δ > 0, there is an N such that
ρX(ρ(A/X) ≈δ p / X ⊆ B & X is finite & #X ≥ N) ≥ 1 – ε. •

Limit Principle for Proportions: Consider a finitely unbounded finite set LC of linear constraints on proportions between Boolean compounds of a list of variables U,X1,…,Xn. Let r be the limit solution for ρ(P/Q) given LC. Then for any infinite set U, for every δ > 0:
ρX1,...,Xn(ρ(P,Q) ≈δ r / LC & X1,...,Xn ⊆ U) = 1.

Proof: Consider a finitely unbounded finite set LC of linear constraints, and let r be the limit solution for ρ(P/Q) given LC. Then for every ε,δ > 0, there is an N such that if U* is finite and #{〈X1,...,Xn〉 | LC & X1,...,Xn ⊆ U*} ≥ N, then
(1) ρX1,...,Xn(ρ(P,Q) ≈δ r / LC & X1,...,Xn ⊆ U*) ≥ 1 – ε.
It follows by the projection principle that for every ε,δ > 0, there is an N such that
ρX1,...,Xn,U(ρ(P,Q) ≈δ r / LC & X1,...,Xn ⊆ U* & U* is finite & #U* ≥ N) ≥ 1 – ε.
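The crossproduct principle invoked in this proof can be verified directly on small finite sets (the particular sets below are arbitrary choices for illustration):

```python
# Finite-set check of the crossproduct principle:
# rho(A x B / C x D) = rho(A/C) * rho(B/D), where rho(X/Y) = |X ∩ Y| / |Y|.
from itertools import product

C = set(range(10))
D = set(range(8))
A = {0, 1, 2, 3}          # A ⊆ C, so rho(A/C) = 4/10
B = {0, 1}                # B ⊆ D, so rho(B/D) = 2/8

pairs = set(product(C, D))
hits = {(x, y) for x, y in pairs if x in A and y in B}
lhs = len(hits) / len(pairs)                    # rho(A x B / C x D)
rhs = (len(A) / len(C)) * (len(B) / len(D))
print(lhs, rhs)   # both 0.1
```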

Suppose that for some δ > 0 and infinite U: probabilities.

ρ ,..., (ρ(P,Q) r / LC & X1,...,Xn ⊆ U) = s. The software produces the following: x1 xN ( As LC is finitely unbounded, it follows that {〈X1,...,Xn〉 | LC & X ,...,X ⊆ U} is infinite. Hence by the Law of Large Numbers ======1 n Dividing U into 3 subsets A,B,C whose cardinalities relative to U are a, b, c, for Proportions, for every ε > 0, there is an N such that if the following constraints are satisfied: ρ ρ ρ ⊆ ⊆ prob(A / C) = r (1) U( x ,...,x ( (P,Q) r / LC & X1,...,Xn U*) s / U* U & U* 1 N prob(B / C) = s is finite & # U* ≥ N) ≥ 1 – ε and hence But we know that there is an N such that for every finite U* such bc = (s * c) that U* ⊆ U and #U* ≥ N, ac = (r * c) and the values of a, b, c, r, s are held constant, ρ ,..., (ρ(P,Q) r / LC & X1,...,Xn ⊆ U*) ≥ 1 – ε. x1 xN then the term-set consisting of the cardinalities of the partition of U is: So by the Universality Principle we get: { ((ab - abc) * u) (3) ρU*(ρ ,..., (ρ(P,Q) r / LC & X1,...,Xn ⊆ U*) ≥ 1 – ε / U* ⊆ U x1 xN (abc * u) & U* is finite & #U* ≥ N) = 1 (((a + abc) - (ab + (r * c))) * u) For every ε,δ > 0 there is an N such that (1) and (3) hold. It (((r * c) - abc) * u) (((b + abc) - (ab + (s * c))) * u) follows that s=1. • (((s * c) - abc) * u) Probable Probabilities Theorem: (((ab + 1 + (r * c) + (s * c)) - (a + b + abc + c)) * u) Consider a finitely unbounded finite set LC of linear (((c + abc) - ((r * c) + (s * c))) * u) } constraints on proportions between Boolean compounds For computing the most probable value of abc, we need only of a list of variables U,X1,…,Xn. Let r be limit solution for ρ(P/Q) given LC. Then for any nomically possible property consider the members of the term-set that contain abc: U, for every δ > 0, The subset of terms in the term-set that contain abc is: { 7 ((ab - abc) * u) probx ,...,x (prob(P/Q) r / LC & X1,...,Xn U) =1. 1 n (abc * u) Proof: Assume the antecedent. 
prob(P/Q) = ρ(B,D), so the Limit (((a + abc) - (ab + (r * c))) * u) Principle for Proportions immediately implies: (((r * c) - abc) * u) ρ 7 (((b + abc) - (ab + (s * c))) * u) x ,...,x (prob(P/Q) r / LC & X1,...,Xn U) =1 1 n (((s * c) - abc) * u) The crossproduct principle tells us: (((ab + 1 + (r * c) + (s * c)) - (a + b + abc + c)) * u) ρ(A × B / C × D)=ρ(A / C) ⋅ ρ(B / D). (((c + abc) - ((r * c) + (s * c))) * u) } The properties expressed by £ LC & X1,...,Xn 7 U · and £prob(P / Q) r · have the same instances in all physically possible worlds, As shown in the Probable Proportions theorem, the most so where W is the set of all physically possible worlds, probable values of ab and abc are those that minimize the product of the factorials of these members of the term-set, probX ,...,X (prob(P/Q) r / LC & X1,...,Xn 7 U) ε 1 n and for any positive or negative real number ε, → N as

= ρ({〈w,X1,...,Xn〉|w ∈ W & prob(P / Q) r}, {〈w,X1,...,Xn〉|w N → ∞. So

∈ W & LC & X1,...,Xn 7 U}) The expectable-value of abc is then the real-valued solution to the following equation: = ρ(W × {X1,...,Xn |prob(P / Q) r}, W × {X1,...,Xn | LC & 1 = (((ab - abc) ^ ((ab - (abc + 1)) - (ab - abc)))

[End of the derivation from section 4; the opening of this display is garbled in the transcription:]
   ... = ρ(W / W) ⋅ ρ_{X1,...,Xn}(prob(P/Q) ≈ r / LC & X1,...,Xn ⊆ U) = 1.

5. Statistical Independence

Finite Independence Principle:
For 0 ≤ a,b,c,r,s ≤ 1 and for every ε,δ > 0 there is an N such that if U is finite and #U > N, then
   ρ(|ρ(X∩Y,Z) − rs| ≤ δ / X,Y,Z ⊆ U & ρ(X,Z) = r & ρ(Y,Z) = s & ρ(X,U) = a & ρ(Y,U) = b & ρ(Z,U) = c) ≥ 1 − ε.

Proof: The limit value for ρ(X∩Y,Z) given that X,Y,Z ⊆ U & ρ(X,Z) = r, ρ(Y,Z) = s, ρ(X,U) = a, ρ(Y,U) = b, and ρ(Z,U) = c can be computed by executing the following instruction in the probable probabilities software:

(analyze-probability-structure
    :subsets ‘(A B C)
    :constants ‘(a b c r s)
    :probability-constraints ‘((prob(A / C) = r)
                               (prob(B / C) = s))
    :probability-queries ‘(prob((A & B) / C))
    :display-details t
    :display-infix t)

“prob” and “&” are used in place of “ρ” and “∩” because non-ASCII symbols are not supported in most computer languages. However, in light of the Probable Probabilities Theorem, the result for the limit value implies the analogous finitary principle.

[Machine output: the run lists the term-set of cell cardinalities and the expectable-value equation for each unknown cardinality (abc, ab); that display is garbled in this transcription, and the derivation continues below.]
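The combinatorics behind the principle can also be illustrated numerically. The sketch below is my own illustration, not part of the OSCAR code, and the values of n, r, s, and delta are arbitrary. It fixes #C = n, #(A∩C) = rn, and #(B∩C) = sn, weights each candidate cardinality k = #(A∩B∩C) by the number of ways it can be realized (a multinomial coefficient over the four cells of C), and checks that almost all of the weight falls within δ of rs:

```python
import math

# Illustrative check of the Finite Independence Principle (my own sketch, not
# Pollock's OSCAR code; n, r, s, and delta are arbitrary).  Fix #C = n,
# #(A & C) = r*n, and #(B & C) = s*n.  Each candidate value k = #(A & B & C)
# can be realized in
#   n! / (k! (rn - k)! (sn - k)! (n - rn - sn + k)!)
# ways (a multinomial coefficient over the four cells of C), so that weight
# measures how many finite structures yield ρ(A∩B, C) = k/n.

def log_multinomial(n, parts):
    """log of the multinomial coefficient n! / (parts[0]! * parts[1]! * ...)."""
    return math.lgamma(n + 1) - sum(math.lgamma(p + 1) for p in parts)

n, r, s, delta = 10_000, 0.6, 0.5, 0.01
R, S = round(r * n), round(s * n)

lo, hi = max(0, R + S - n), min(R, S)
logs = {k: log_multinomial(n, (k, R - k, S - k, n - R - S + k))
        for k in range(lo, hi + 1)}
m = max(logs.values())
total = sum(math.exp(v - m) for v in logs.values())
near = sum(math.exp(v - m) for k, v in logs.items() if abs(k / n - r * s) <= delta)

print(max(logs, key=logs.get) / n)   # most probable value of k/n; expect r*s = 0.3
print(near / total)                  # share of structures within delta of r*s
```

With these parameters the most heavily weighted k is exactly rsn, and the mass within δ = 0.01 of rs already exceeds 99% at n = 10,000, matching the ε,δ form of the principle.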

— 22 — — Philosophy and Computers —

[Machine output, continued.] The term-characterization for ab simplifies to:
   ((((c * ab) + (u * abc) + (a * b) + (r * s * (c ^ 2))) - ((c * abc) + (u * ab) + (a * s * c) + (r * c * b))) = 0)
Solving for ab:
   ab = ((((u * abc) + (a * b) + (r * s * (c ^ 2))) - ((r * c * b) + (a * s * c) + (c * abc))) / (u - c))
Substituting this definition for ab into the term-characterization for abc and simplifying yields:
   abc: 1 = ((abc * ((c + abc) - ((r * c) + (s * c)))) / (((r * c) - abc) * ((s * c) - abc)))
which reduces to (((r * s * (c ^ 2)) - (abc * c)) = 0); solving for abc:
   abc = (r * s * (c ^ 1))
Substituting the definition for abc into the definition for ab and simplifying produces:
   ab = ((((a * b) + (u * r * s * c)) - ((r * c * b) + (a * s * c))) / (u - c))
Grounded definitions of the expectable values were found for all the variables. The following definitions of expectable values were found that appeal only to the constants:
   abc = (r * s * c)
   ab = ((((r * s * c) + (a * b)) - ((r * c * b) + (a * s * c))) / (1 - c))
Reconstruing a, b, c, etc., as probabilities relative to U rather than as cardinalities, the following characterization was found for the expectable value of the probability wanted:
   prob((A & B) / C) = (r * s)
•

6. Defeaters for Statistical Independence

Principle of Statistical Independence with Overlap:
If prob(A/C) = r, prob(B/C) = s, prob(D/C) = g, (A&C) ⊆ D, and (B&C) ⊆ D, then prob(A/C&D) = r/g, prob(B/C&D) = s/g, and the following values are expectable:
   (1) prob(A&B/C) = rs/g;
   (2) prob(A&B/C&D) = rs/g².

Proof: It suffices to prove the corresponding finitary principle: for every ε,δ > 0, there is an N such that if U is finite and #U > N, then, with probability at least 1 − ε over choices of X, Y, Z, W ⊆ U satisfying the finitary analogues of the above constraints, ρ(X∩Y,Z) is within δ of rs/g and ρ(X∩Y,Z∩W) is within δ of rs/g².

Here is a machine-generated proof of the finitary principle, using the code at http://oscarhome.soc-sci.arizona.edu/ftp/OSCAR-web-page/CODE/Code for probable probabilities.zip. The proof is produced by executing the following instruction:

(analyze-probability-structure
    :subsets ‘(A B C D)
    :constants ‘(a b c d r s g)
    :subset-constraints ‘(((A intersection C) subset D)
                          ((B intersection C) subset D))
    :probability-constraints ‘((prob(A / C) = r)
                               (prob(B / C) = s)
                               (prob(D / C) = g))
    :probability-queries ‘(prob((A & B) / C)
                           prob((A & B) / (C & D)))
    :independence-queries ‘((A B C)
                            (A B (C & D)))
    :parallel-term-characterizations t
    :display-details t
    :display-infix t)

Dividing U into 4 subsets A,B,C,D whose probabilities relative to U are a, b, c, d, if the following constraints are satisfied:
   prob(A / C) = r
   prob(B / C) = s
   prob(D / C) = g
   ((A intersection C) subset D)
   ((B intersection C) subset D)
and hence
   acd = (* r c)
   bcd = (- (+ (* s c) (* r c)) (* r c))
   abcd = (- (+ (* r c) abc) (* r c))
and the values of a, b, c, d, r, s, g are held constant, then the term-set consisting of the cardinalities of the partition of U is:
{
   (c - (g * c)) * u
   (((g * c) + abc) - ((r * c) + (s * c))) * u
   ((1 + ab + ad + bd + (g * c)) - (a + b + c + d + abd)) * u
   ((abd + d + (r * c) + (s * c)) - (ad + bd + abc + (g * c))) * u
   ((s * c) - abc) * u
   ((b + abd) - (ab + bd)) * u
   ((bd + abc) - (abd + (s * c))) * u
   ((r * c) - abc) * u
   ((a + abd) - (ab + ad)) * u
   ((ad + abc) - (abd + (r * c))) * u
   abc * u
   (ab - abd) * u
   (abd - abc) * u
}

— 23 — — APA Newsletter, Spring 2010, Volume 09, Number 2 —
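The same counting argument illustrates the overlap principle. The sketch below is again my own illustration, not OSCAR output, and the parameter values are arbitrary: when A∩C and B∩C are both confined to D∩C, the cell counts of A and B can vary only inside D∩C, so the multinomial weighting concentrates #(A∩B∩C)/#C at rs/g rather than rs:

```python
import math

# Illustrative check of the overlap principle (my own sketch, not Pollock's
# OSCAR code; the parameter values are arbitrary).  Put #C = n, #(A & C) = r*n,
# #(B & C) = s*n, #(D & C) = g*n, with A&C and B&C both inside D&C.  Because
# the cells of A and B can only vary inside D&C (of size g*n), the number of
# realizations of #(A & B & C) = k is the multinomial
#   (gn)! / (k! (rn - k)! (sn - k)! (gn - rn - sn + k)!)
# and the most heavily weighted k/n should be (r*s)/g, not r*s.

def log_multinomial(n, parts):
    """log of the multinomial coefficient n! / (parts[0]! * parts[1]! * ...)."""
    return math.lgamma(n + 1) - sum(math.lgamma(p + 1) for p in parts)

n, r, s, g = 10_000, 0.3, 0.4, 0.6
R, S, G = round(r * n), round(s * n), round(g * n)

lo, hi = max(0, R + S - G), min(R, S)
best_k = max(range(lo, hi + 1),
             key=lambda k: log_multinomial(G, (k, R - k, S - k, G - R - S + k)))

print(best_k / n)   # expect (r*s)/g  = 0.2
print(best_k / G)   # expect (r*s)/g**2, i.e. about 0.3333
```

The first ratio is the expectable prob(A&B/C) and the second the expectable prob(A&B/C&D), matching (1) and (2) above for these parameters.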

[Machine output, continued: for each unknown cardinality (abc, abd, ad, bd, ab) the run lists the subset of terms in the term-set containing it and the expectable-value equation it must satisfy; that display is garbled in this transcription.] The term-characterization for ab simplifies to:
   ((((abd * g * c) + abd + (ab * c) + (ab * d) + (b * a) + (bd * ad)) - ((abd * c) + (abd * d) + (ab * g * c) + ab + (b * ad) + (bd * a))) = 0)
Solving for ab:
   ab = ((((abd * g * c) + abd + (b * a) + (bd * ad)) - ((bd * a) + (b * ad) + (abd * d) + (abd * c))) / (((g * c) + 1) - (d + c)))
Substituting this definition for ab into the previous term-characterizations produces new term-characterizations for abc, abd, ad, and bd.

These term-characterizations simplify to yield the following term-characterizations:
   abc: 1 = (((((g * c) + abc) - ((r * c) + (s * c))) * abc) / (((r * c) - abc) * ((s * c) - abc)))
   ad: 1 = (((ad - (r * c)) * ((1 + (g * c) + ad) - (a + c + d))) / (((d + (r * c)) - (ad + (g * c))) * (a - ad)))
   bd: 1 = (((bd - (s * c)) * ((1 + (g * c) + bd) - (b + c + d))) / (((d + (s * c)) - (bd + (g * c))) * (b - bd)))
The term-characterization for abc simplifies to (((r * s * (c ^ 2)) - (g * c * abc)) = 0); solving for abc:
   abc = ((r * s * c) / g)
The term-characterization for ad simplifies and solves to:
   ad = ((((r * g * (c ^ 2)) + (r * c) + (d * a)) - ((g * c * a) + (r * c * d) + (r * (c ^ 2)))) / (1 - c))
The term-characterization for bd simplifies and solves to:
   bd = ((((s * g * (c ^ 2)) + (s * c) + (d * b)) - ((g * c * b) + (s * c * d) + (s * (c ^ 2)))) / (1 - c))
The term-characterization for abd simplifies and solves to:
   abd = ((((d * abc) + (ad * bd) + (r * s * (c ^ 2))) - ((r * c * bd) + (ad * s * c) + (g * c * abc))) / (d - (g * c)))
Substituting these definitions into one another and expanding yields closed-form definitions of abd, ab, bcd, and abcd; bcd and abcd simplify to:
   bcd = (s * c)
   abcd = abc
[The intermediate expansions are garbled in this transcription.]


After further substitution and simplification [the intermediate algebra is garbled in this transcription], grounded definitions of the expectable values were found for all the variables. The following definitions of expectable values were found that appeal only to the constants:
   abcd = ((r * s * c) / g)
   bcd = (s * c)
   acd = (r * c)
   abc = ((r * s * c) / g)
   abd = ((((r * (c ^ 2) * b * (g ^ 2)) + (d * a * b * g) + (c * s * r) + (r * d * (c ^ 2) * s * g) + (a * (c ^ 2) * s * (g ^ 2)) + (r * s * (c ^ 3))) - ((2 * r * s * (c ^ 2)) + (d * a * c * s * g) + (r * (c ^ 3) * s * (g ^ 2)) + (c * a * b * (g ^ 2)) + (r * c * d * b * g))) / (g * (1 - c) * (1 - c)))
   bd = ((((s * g * (c ^ 2)) + (s * c) + (d * b)) - ((s * (c ^ 2)) + (s * c * d) + (g * c * b))) / (1 - c))
   ad = ((((r * g * (c ^ 2)) + (r * c) + (d * a)) - ((r * (c ^ 2)) + (r * c * d) + (g * c * a))) / (1 - c))
   ab = ((((r * (c ^ 2) * s * g) + (b * a * g) + (r * s * c)) - ((r * c * b * g) + (s * c * a * g) + (r * s * (c ^ 2)))) / (g * (1 - c)))
The following characterizations were found for the expectable values of the probabilities wanted:
   prob((A & B) / C) = ((r * s) / g)
   prob((A & B) / (C & D)) = ((r * s) / (g * g))
   A and B are STATISTICALLY INDEPENDENT relative to (C & D)
   A and B are NOT statistically independent relative to C
   (prob(A / C) * prob(B / C)) ÷ prob((A & B) / C) = g
•

7. Nonclassical Direct Inference

Nonclassical Direct Inference:
If prob(A/B) = r, the expectable value of prob(A/B&C) = r.

Proof: Here is a machine-generated proof. It is produced by executing the following instruction:

(analyze-probability-structure
    :subsets ‘(A B C)
    :constants ‘(a b c r)
    :probability-constraints ‘((prob(A / B) = r))
    :probability-queries ‘(prob(A / (B & C)))
    :independence-queries ‘((A C B))
    :parallel-term-characterizations t
    :display-details t
    :display-infix t)

Dividing U into 3 subsets A,B,C whose probabilities relative to U are a, b, c, if the following constraints are satisfied:
   prob(A / B) = r
and the values of a, b, c, r are held constant, then the term-set consisting of the cardinalities of the partition of U is:


{
   ((r * b) - abc) * u
   abc * u
   ((a + abc) - ((r * b) + ac)) * u
   (ac - abc) * u
   ((b + abc) - ((r * b) + bc)) * u
   (bc - abc) * u
   (((r * b) + 1 + ac + bc) - (a + b + abc + c)) * u
   ((c + abc) - (ac + bc)) * u
}
[Machine output: for each unknown cardinality (abc, ac, bc) the run lists the subset of terms containing it and the expectable-value equation it must satisfy; that display is garbled in this transcription.] The term-characterization for ac simplifies to:
   ((((b * ac) + abc + (a * c) + (r * b * bc)) - ((b * abc) + ac + (a * bc) + (r * b * c))) = 0)
Solving for ac:
   ac = (((abc + (a * c) + (r * b * bc)) - ((r * b * c) + (a * bc) + (b * abc))) / (1 - b))
Substituting this definition for ac into the remaining term-characterizations and simplifying, the term-characterization for abc reduces to (((bc * r * b) - (abc * b)) = 0); solving for abc:
   abc = (bc * r)
The term-characterization for bc then simplifies to:
   bc: 1 = ((((bc + 1) - (b + c)) * bc) / ((c - bc) * (b - bc)))
which reduces to (((c * b) - bc) = 0); solving for bc:
   bc = (c * b)
Substituting the new definition for bc into the previous definition for abc produces:
   abc = ((c * b) * r)
and the preceding definition for ac then expands and simplifies to:
   ac = (a * c)
Grounded definitions of the expectable values were found for all the variables. The following definitions of expectable values were found that appeal only to the constants:
   abc = ((c * b) * r)
   ac = (a * c)

— 27 — — APA Newsletter, Spring 2010, Volume 09, Number 2 —
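The grounded definitions that the derivation above arrives at (abc = ((c * b) * r), ac = (a * c), and bc = (c * b)) can be checked with exact arithmetic. This sketch is not part of OSCAR's machine-generated output, and the sample constants are arbitrary illustrative choices:

```python
from fractions import Fraction as F

# Arbitrary sample constants (illustrative only): prob(A/U), prob(B/U),
# prob(C/U), and the constraint prob(A/B) = r.
a, b, c, r = F(1, 2), F(1, 3), F(1, 5), F(3, 4)

# Expectable values found by the derivation above.
abc = c * b * r   # expectable proportion of A & B & C in U
ac = a * c        # expectable proportion of A & C in U
bc = c * b        # expectable proportion of B & C in U

# prob(A / (B & C)) = abc / bc reduces to r, as reported.
assert abc / bc == r

# A and C are statistically independent relative to B:
# prob((A & C) / B) = prob(A / B) * prob(C / B).
assert abc / b == r * (bc / b)
```

Because the checks use `fractions.Fraction`, the identities hold exactly rather than up to floating-point error.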

------
bc = (c * b)
------
The following characterizations were found for the expectable values of the probabilities wanted:
------
prob(A / (B & C)) = r
------
A and C are statistically independent relative to B
•

Subproperty Defeat for Nonclassical Direct Inference:
If C ⊆ D ⊆ B, prob(A/D) = s, prob(A/B) = r, prob(A/U) = a, prob(B/U) = b, prob(C/U) = c, prob(D/U) = d, then the expectable value of prob(A/C) = s (rather than r).

Proof: Here is a machine-generated proof, produced by executing the following instruction:
(analyze-probability-structure
:subsets ‘(A B C D)
:constants ‘(a b c d r s)
:probability-constraints ‘((prob(A / B) = r)
(prob(A / D) = s))
:subset-constraints ‘((C subset D) (D subset B))
:probability-queries ‘(prob(A / C))
:independence-queries ‘((A C B) (A C D))
:parallel-term-characterizations t
:display-details t
:display-infix t)
======
Dividing U into 4 subsets A,B,C,D whose probabilities relative to U are a, b, c, d,
if the following constraints are satisfied:
prob(A / B) = r
prob(A / D) = s
(C subset D)
(D subset B)
and hence
cd = c
bd = d
bc = c
abcd = ac
acd = ac
abd = (* s d)
abc = ac
bcd = c
and the values of a, b, c, d, r, s are held constant,
then the term-set consisting of the cardinalities of the partition of U is:
{
((1 + (r * b)) - (a + b)) * u
(c - ac) * u
((b + (s * d)) - ((r * b) + d)) * u
((d + ac) - ((s * d) + c)) * u
(a - (r * b)) * u
ac * u
((r * b) - (s * d)) * u
((s * d) - ac) * u
}
The subset of terms in the term-set that contain ac is:
{
(c - ac) * u
((d + ac) - ((s * d) + c)) * u
ac * u
((s * d) - ac) * u
}
The expectable-value of ac is then the real-valued solution to the following equation:
1 = (((c - ac) ^ ((c - (ac + 1)) - (c - ac)))
* (((d + ac) - ((s * d) + c)) ^ (((d + (ac + 1)) - ((s * d) + c)) - ((d + ac) - ((s * d) + c))))
* (ac ^ ((ac + 1) - ac))
* (((s * d) - ac) ^ (((s * d) - (ac + 1)) - ((s * d) - ac))))
= (((c - ac) ^ (- 1)) * (((d + ac) - ((s * d) + c)) ^ 1) * (ac ^ 1) * (((s * d) - ac) ^ (- 1)))
= ((ac * ((d + ac) - ((s * d) + c))) / ((c - ac) * ((s * d) - ac)))
The preceding term-characterization for ac simplifies to:
(((c * s * d) - (ac * d)) = 0)
Solving for ac:
ac = (c * s)
The preceding definitions for abcd, acd, abc then expand as follows:
abcd = (c * s)
acd = (c * s)
abc = (c * s)
and these simplify to:
abcd = (c * s)
acd = (c * s)
abc = (c * s)
Grounded definitions of the expectable values were found for all the variables.
------
The following definitions of expectable values were found that appeal only to the constants:
------
ac = (c * s)
------
bc = c
------
abcd = (c * s)
------
cd = c
------
bcd = c
------
acd = (c * s)
------
abc = (c * s)
------
abd = (s * d)
------
bd = d
------
The following characterizations were found for the expectable values of the probabilities wanted:
------
prob(A / C) = s
------
A and C are not statistically independent relative to B
(prob(A / B) * prob(C / B)) ÷ prob((A & C) / B) = (r / s)
A and C are statistically independent relative to D
•

8. Classical Direct Inference
Representation Theorem for Singular Probabilities:
PROB(Fa) = prob(Fx/x = a & K).
Proof:
prob(Fx/x = a & K)
= ρ({〈w,x〉|w ∈ W & (x=a & Fx & K) at w},{〈w,x〉|w ∈ W & (x=a & K) at w})
= ρ({〈w,x〉|w ∈ W & x=a & (Fx & K) at w},{〈w,x〉|w ∈ W & x=a & K at w})
= ρ({〈w,a〉|w ∈ W & (Fa & K) at w},{〈w,a〉|w ∈ W & K at w})
= ρ({w|w ∈ W & (Fa & K) at w}×{a},{w|w ∈ W & K at w}×{a})
= ρ({w|w ∈ W & (Fa & K) at w},{w|w ∈ W & K at w}) ⋅ ρ({a},{a})
= ρ({w|w ∈ W & (Fa & K) at w},{w|w ∈ W & K at w})
= PROB(Fa). •

9. Computational Inheritance
Y-Theorem:
Let r = prob(A/B), s = prob(A/C), a = prob(A/U), and 0 < a < 1. If B and C are Y-independent for A relative to U then prob(A/B&C) = Y(r,s|a).
Proof: As we have seen, in the definition of Y-independence, (a) is equivalent to:
(c) prob(B&C/A) = prob(B/A)⋅prob(C/A).
Similarly, (b) is equivalent to:


(d) prob(B&C/U & ~A) = prob(B/U & ~A)⋅prob(C/U & ~A).
By Bayes’ theorem, prob(A/B&C) can be expressed in terms of prob(B&C/A), prob(B&C/U & ~A), and a. Applying (c) and (d), using the fact that prob(~A / B & C) = 1 – prob(A / B & C), and solving for prob(A/B&C) yields:
prob(A/B&C) = Y(r,s|a). •

Y-Principle:
If B,C ⊆ U, prob(A/B) = r, prob(A/C) = s, prob(A/U) = a, prob(B/U) = b, prob(C/U) = c, and 0 < a < 1, then the expectable value of prob(A/B & C) = Y(r,s|a).
Proof: Here is a machine-generated proof, produced by executing the following instruction:
(analyze-probability-structure
:subsets ‘(A B C)
:constants ‘(a b c r s)
:constraints ‘((ab = (* r b))
(ac = (* s c)))
:probability-queries ‘(prob(A / (B & C)))
:independence-queries ‘((B C A) (B C ~A) (B C U))
:parallel-term-characterizations t
:display-details t
:display-infix t)
======
Dividing U into 3 subsets A,B,C whose probabilities relative to U are a, b, c,
if the following constraints are satisfied:
ab = (r * b)
ac = (s * c)
and the values of a, b, c, r, s are held constant,
then the term-set consisting of the cardinalities of the partition of U is:
{
((r * b) - abc) * u
abc * u
((a + abc) - ((r * b) + (s * c))) * u
((s * c) - abc) * u
((b + abc) - ((r * b) + bc)) * u
(bc - abc) * u
(((r * b) + 1 + (s * c) + bc) - (a + b + abc + c)) * u
((c + abc) - ((s * c) + bc)) * u
}
The subset of terms in the term-set that contain abc is:
{
((r * b) - abc) * u
abc * u
((a + abc) - ((r * b) + (s * c))) * u
((s * c) - abc) * u
((b + abc) - ((r * b) + bc)) * u
(bc - abc) * u
(((r * b) + 1 + (s * c) + bc) - (a + b + abc + c)) * u
((c + abc) - ((s * c) + bc)) * u
}
The expectable-value of abc is then the real-valued solution to the following equation:
1 = ((((r * b) - abc) ^ (((r * b) - (abc + 1)) - ((r * b) - abc)))
* (abc ^ ((abc + 1) - abc))
* (((a + abc) - ((r * b) + (s * c))) ^ (((a + (abc + 1)) - ((r * b) + (s * c))) - ((a + abc) - ((r * b) + (s * c)))))
* (((s * c) - abc) ^ (((s * c) - (abc + 1)) - ((s * c) - abc)))
* (((b + abc) - ((r * b) + bc)) ^ (((b + (abc + 1)) - ((r * b) + bc)) - ((b + abc) - ((r * b) + bc))))
* ((bc - abc) ^ ((bc - (abc + 1)) - (bc - abc)))
* ((((r * b) + 1 + (s * c) + bc) - (a + b + abc + c)) ^ ((((r * b) + 1 + (s * c) + bc) - (a + b + (abc + 1) + c)) - (((r * b) + 1 + (s * c) + bc) - (a + b + abc + c))))
* (((c + abc) - ((s * c) + bc)) ^ (((c + (abc + 1)) - ((s * c) + bc)) - ((c + abc) - ((s * c) + bc)))))
= ((((r * b) - abc) ^ (- 1)) * (abc ^ 1) * (((a + abc) - ((r * b) + (s * c))) ^ 1) * (((s * c) - abc) ^ (- 1)) * (((b + abc) - ((r * b) + bc)) ^ 1) * ((bc - abc) ^ (- 1)) * ((((r * b) + 1 + (s * c) + bc) - (a + b + abc + c)) ^ (- 1)) * (((c + abc) - ((s * c) + bc)) ^ 1))
= ((((c + abc) - ((s * c) + bc)) * ((b + abc) - ((r * b) + bc)) * ((a + abc) - ((r * b) + (s * c))) * abc) / (((r * b) - abc) * ((s * c) - abc) * (bc - abc) * ((bc + (s * c) + 1 + (r * b)) - (a + b + abc + c))))
The subset of terms in the term-set that contain bc is:
{
((b + abc) - ((r * b) + bc)) * u
(bc - abc) * u
(((r * b) + 1 + (s * c) + bc) - (a + b + abc + c)) * u
((c + abc) - ((s * c) + bc)) * u
}
The expectable-value of bc is then the real-valued solution to the following equation:
1 = ((((b + abc) - ((r * b) + bc)) ^ (((b + abc) - ((r * b) + (bc + 1))) - ((b + abc) - ((r * b) + bc))))
* ((bc - abc) ^ (((bc + 1) - abc) - (bc - abc)))
* ((((r * b) + 1 + (s * c) + bc) - (a + b + abc + c)) ^ ((((r * b) + 1 + (s * c) + (bc + 1)) - (a + b + abc + c)) - (((r * b) + 1 + (s * c) + bc) - (a + b + abc + c))))
* (((c + abc) - ((s * c) + bc)) ^ (((c + abc) - ((s * c) + (bc + 1))) - ((c + abc) - ((s * c) + bc)))))
= ((((b + abc) - ((r * b) + bc)) ^ (- 1)) * ((bc - abc) ^ 1) * ((((r * b) + 1 + (s * c) + bc) - (a + b + abc + c)) ^ 1) * (((c + abc) - ((s * c) + bc)) ^ (- 1)))
= (((((r * b) + 1 + (s * c) + bc) - (a + b + abc + c)) * (bc - abc)) / (((abc + b) - ((r * b) + bc)) * ((abc + c) - ((s * c) + bc))))
The preceding term-characterization for bc simplifies to:
((((a * bc) + abc + (b * c) + (r * b * s * c)) - ((a * abc) + bc + (b * s * c) + (r * b * c))) = 0)
Solving for bc:
bc = (((abc + (b * c) + (r * b * s * c)) - ((r * b * c) + (b * s * c) + (a * abc))) / (1 - a))
Substituting the preceding definition for bc into the previous term-characterizations produces the new term-characterizations:
abc: 1 = ((((c + abc) - ((s * c) + (((abc + (b * c) + (r * b * s * c)) - ((r * b * c) + (b * s * c) + (a * abc))) / (1 - a)))) * ((b + abc) - ((r * b) + (((abc + (b * c) + (r * b * s * c)) - ((r * b * c) + (b * s * c) + (a * abc))) / (1 - a)))) * ((a + abc) - ((r * b) + (s * c))) * abc) / (((r * b) - abc) * ((s * c) - abc) * ((((abc + (b * c) + (r * b * s * c)) - ((r * b * c) + (b * s * c) + (a * abc))) / (1 - a)) - abc) * (((((abc + (b * c) + (r * b * s * c)) - ((r * b * c) + (b * s * c) + (a * abc))) / (1 - a)) + (s * c) + 1 + (r * b)) - (a + b + abc + c))))
These term-characterizations simplify to yield the following term-characterizations:
abc: 1 = ((((abc ^ 2) + (a * abc)) - ((abc * r * b) + (abc * s * c))) / (((s * c) - abc) * ((r * b) - abc)))
The preceding term-characterization for abc simplifies to:


(((s * c * r * b) - (a * abc)) = 0)
Solving for abc:
abc = ((s * c * r * b) / a)
Substituting the new definition for abc into the previous definition for bc produces:
bc = (((((s * c * r * b) / a) + (b * c) + (r * b * s * c)) - ((r * b * c) + (b * s * c) + (a * ((s * c * r * b) / a)))) / (1 - a))
which simplifies to:
bc = ((((s * c * r * b) + (a * b * c)) - ((b * s * c * a) + (r * b * c * a))) / ((1 - a) * a))
Grounded definitions of the expectable values were found for all the variables.
------
The following definitions of expectable values were found that appeal only to the constants:
------
abc = ((s * c * r * b) / a)
------
bc = ((((a + (s * r)) - ((s * a) + (r * a))) * c * b) / (a * (1 - a)))
======
Reconstruing a as prob(A / U), the following characterizations were found for the expectable values of the probabilities wanted:
------
prob(A / (B & C)) = (((1 - a) * s * r) / ((a + (s * r)) - ((r * a) + (s * a))))
------
B and C are STATISTICALLY INDEPENDENT relative to A
B and C are STATISTICALLY INDEPENDENT relative to ~A
B and C are NOT statistically independent relative to U
(prob(B / U) * prob(C / U)) ÷ prob((B & C) / U) = ((a * (1 - a)) / ((a + (s * r)) - ((s * a) + (r * a))))
•

Y-Defeat for Statistical Independence: ⌜prob(A/C) ≠ prob(A/U) and prob(A/B) ≠ prob(A/U)⌝ is an undercutting defeater for the inference from ⌜prob(A/C) = r and prob(B/C) = s⌝ to ⌜prob(A&B/C) = r ⋅ s⌝ by the Principle of Statistical Independence.

This is established by lemma 2. To prove lemma 2 we first prove:

Lemma 1: If x ≠ x* and y ≠ y* then xy + (1–x)y* ≠ x*y + (1–x*)y*.
Proof: Suppose xy + (1–x)y* = x*y + (1–x*)y*. Then (x – x*)y = (x – x*)y*, so either (x – x*) = 0 or y = y*. •

Lemma 2: If B and C are Y-independent for A relative to U and prob(C/A) ≠ prob(C/U) and prob(B/A) ≠ prob(B/U) then prob(C/B) ≠ prob(C/U).
Proof: Suppose the antecedent.
prob(C/B) = prob(C/B&A)⋅prob(A/B) + prob(C/B&~A)⋅(1–prob(A/B)) = prob(C/A)⋅prob(A/B) + prob(C/U&~A)⋅(1–prob(A/B)).
prob(C/U) = prob(C/A)⋅prob(A/U) + prob(C/U&~A)⋅(1–prob(A/U)).
If prob(C/A) ≠ prob(C/U) then prob(C/U&A) ≠ prob(C/U&~A). Furthermore, prob(A/B) = prob(B/A)⋅prob(A/U)/prob(B/U), so prob(A/B) = prob(A/U) iff prob(B/A) = prob(B/U). Thus prob(A/B) ≠ prob(A/U), so by lemma 1, prob(C/B) ≠ prob(C/U). •

Let us define Y0(r,s) to be Y(r,s|p) where p is the solution to the following set of three simultaneous equations (for variables p, a, and b, and fixed r and s):
2p³ – (a + b – 2a⋅r – 2b⋅s – 3)p² + (a⋅b + 2a⋅r – a⋅b⋅r + 2b⋅s – a⋅b⋅s + 2a⋅b⋅r⋅s – a – b + 1)p – a⋅b⋅r⋅s = 0;
Then we have the following principle:
Y0-Principle:
If prob(P/A) = r and prob(P/B) = s, then the expectable value of prob(P/A & B) = Y0(r,s).
Proof: Proceeding as usual, the expectable values of abp, a, b, p, and ab are the solutions to the corresponding set of equations. As in the Y-principle, solving the first and fifth equation for ab and abp, and substituting these values into the other equations, produces the triple of equations that define Y0(r,s). •

11. Inverse Probabilities
Inverse Probabilities I:
If A,B ⊆ U and we know that prob(A/B) = r, but we do not know the base rates prob(A/U) and prob(B/U), the following values are expectable:
prob(B/U) = ((1/2) / ((1/2) + ((r ^ r) * ((1 - r) ^ (1 - r)))));
prob(A/U) = ((1/2) - (((1/4) - (r / 2)) / (((r ^ r) * ((1 - r) ^ (1 - r))) + (1/2))));
prob(~A/~B&U) = .5;
prob(~B/~A&U) = ((r ^ r) / (((1 - r) ^ r) + (r ^ r))).
Proof: Here is an “almost” automatically produced proof.
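The closed form reported above for prob(A / (B & C)) is the Y-function itself; its denominator (a + (s * r)) - ((r * a) + (s * a)) rearranges to r·s·(1 - a) + a·(1 - r)·(1 - s). A small sketch checking the two readings against each other over exact rationals (the test values are arbitrary, and the function names are mine, not Pollock's):

```python
from fractions import Fraction as F

def Y(r, s, a):
    # The closed form printed above: (1-a)*s*r / ((a + s*r) - (r*a + s*a)).
    return ((1 - a) * s * r) / ((a + s * r) - (r * a + s * a))

def Y_alt(r, s, a):
    # The same denominator, rearranged as the "A" case weighed
    # against the "~A" case: r*s*(1-a) + a*(1-r)*(1-s).
    num = r * s * (1 - a)
    return num / (num + a * (1 - r) * (1 - s))

for r in (F(1, 4), F(2, 3), F(9, 10)):
    for s in (F(1, 3), F(3, 5)):
        for a in (F(1, 5), F(1, 2)):
            assert Y(r, s, a) == Y_alt(r, s, a)

# When a = r, the base rate drops out and Y(r, s, a) = s.
assert Y(F(1, 2), F(3, 4), F(1, 2)) == F(3, 4)
```

Setting a = .5 gives the special case Y(r, s, .5) = r·s / (r·s + (1 - r)·(1 - s)), the case mentioned in endnote 14.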

Dividing U into 2 subsets A,B whose probabilities relative to U are a, b,
if the following constraints are satisfied:
prob(A / B) = r
and the values of r are held constant,
then the term-set consisting of the cardinalities of the partition of U is:
{
((1 + (r * b)) - (a + b)) * u
(b - (r * b)) * u
(a - (r * b)) * u
(r * b) * u
}
The subset of terms in the term-set that contain a is:
{
((1 + (r * b)) - (a + b)) * u
(a - (r * b)) * u
}
The expectable-value of a is then the real-valued solution to the following equation:
1 = ((((1 + (r * b)) - (a + b)) ^ (((1 + (r * b)) - ((a + 1) + b)) - ((1 + (r * b)) - (a + b)))) * ((a - (r * b)) ^ (((a + 1) - (r * b)) - (a - (r * b)))))
= ((((1 + (r * b)) - (a + b)) ^ (- 1)) * ((a - (r * b)) ^ 1))
= ((1 / (((r * b) + 1) - (a + b))) * (a - (r * b)))
= ((a - (r * b)) * (1 / (((r * b) + 1) - (a + b))))
= ((a - (r * b)) / (((r * b) + 1) - (a + b)))
The subset of terms in the term-set that contain b is:
{
((1 + (r * b)) - (a + b)) * u
(b - (r * b)) * u
(a - (r * b)) * u
(r * b) * u
}
The expectable-value of b is then the real-valued solution to the following equation:
1 = ((((1 + (r * b)) - (a + b)) ^ (((1 + (r * (b + 1))) - (a + (b + 1))) - ((1 + (r * b)) - (a + b)))) * ((b - (r * b)) ^ (((b + 1) - (r * (b + 1))) - (b - (r * b)))) * ((a - (r * b)) ^ ((a - (r * (b + 1))) - (a - (r * b)))) * ((r * b) ^ ((r * (b + 1)) - (r * b))))
= ((((1 + (r * b)) - (a + b)) ^ (r - 1)) * ((b - (r * b)) ^ (1 - r)) * ((a - (r * b)) ^ (- r)) * ((r * b) ^ r))
= ((2 * b * ((1 - r) ^ (1 - r)) * (r ^ r)) / (1 - b))
Solving for b:
b = ((1/2) / ((1/2) + ((r ^ r) * ((1 - r) ^ (1 - r)))))
The preceding term-characterization for a simplifies to:
((((r * b) + 1 + (r * b)) - (a + a + b)) = 0)
Solving for a:
a = ((((r * b) + 1 + (r * b)) - b) / 2)
Substituting the preceding definition for b into the definition for a:
a = ((1/2) - (((1/4) - (r / 2)) / (((r ^ r) * ((1 - r) ^ (1 - r))) + (1/2))))
The following characterizations were found for the expectable values of the probabilities wanted:
------
prob(B / U) = ((1/2) / ((1/2) + ((r ^ r) * ((1 - r) ^ (1 - r)))))
------
prob(A / U) = ((1/2) - (((1/4) - (r / 2)) / (((r ^ r) * ((1 - r) ^ (1 - r))) + (1/2))))
------
prob(~A / (~B & U)) = 1/2
------
prob(~B / (~A & U)) = ((r ^ r) / (((1 - r) ^ r) + (r ^ r)))
------
•

Endnotes
1. This work was supported by NSF grant no. IIS-0412791.
2. In the past, I followed Jackson and Pargetter 1973 in calling these “indefinite probabilities,” but I never liked that terminology.
3. Examples are Russell (1948); Braithwaite (1953); Kyburg (1961, 1974); Sklar (1970, 1973). William Kneale (1949) traces the frequency theory to R. L. Ellis, writing in the 1840s, and John Venn (1888) and C. S. Peirce in the 1880s and 1890s.
4. Somewhat similar semantics were proposed by Halpern (1990) and Bacchus et al. (1996).
5. Probabilities relating n-place relations are treated similarly. I will generally just write the one-variable versions of various principles, but they generalize to n-variable versions in the obvious way.
6. The statistical syllogism was first expressed in this form in Pollock (1983a), but it has a long and distinguished history going back at least to C. S. Peirce in the 1880s. See also Kyburg (1974, 1977).
7. There are two kinds of defeaters. Rebutting defeaters attack the conclusion of an inference, and undercutting defeaters attack the inference itself without attacking the conclusion. Here I assume some form of the OSCAR theory of defeasible reasoning (Pollock 1995). For a sketch of that theory see Pollock (2006a).
8. If we could assume countable additivity for nomic probability, the Indifference Principle would imply that prob_X(prob(X/G) = 0.5 / X ⊆ G) = 1. Countable additivity is generally assumed in mathematical probability theory, but most of the important writers in the foundations of probability theory, including de Finetti (1974), Reichenbach (1949), Jeffrey (1983), Skyrms (1980), Savage (1954), and Kyburg (1974), have either questioned it or rejected it outright. Pollock (2006) gives what I consider to be a compelling counter-example to countable additivity. So I will have to remain content with the more complex formulation of the Indifference Principle.
9. This illustrates that to get finite unboundedness, we often have to restrict the various parameters mentioned in LC to rational numbers. I am convinced that this restriction should be inessential. One can go ahead and solve the term characterizations in the same way for the cases in which the parameters are irrational, and I am inclined to endorse the resulting principles of probable probabilities. However, at this point I am unsure how to justify this.
10. Bacchus (1990) gave a somewhat similar account of direct inference, drawing on Pollock (1983, 1984).
11. What an agent is justified in believing at a time depends on how much reasoning he has done. A proposition is warranted for an agent iff the agent would be justified in believing it if he could do all the relevant reasoning.
12. For a further complication, see the literature on causal probability, as discussed for example in Pollock (2006).
13. It turns out that the Y-function has been studied for its desirable mathematical properties in the theory of associative compensatory aggregation operators in fuzzy logic (Dombi 1982; Klement, Mesiar, and Pap 1996; Fodor, Yager, and Rybalov 1997). Y(r,s|a) is the function Dλ(r,s) for λ = (Klement, Mesiar, and Pap 1996). The Y-theorem may provide further justification for its use in that connection.
14. See also Bacchus et al. (1996). Given very restrictive assumptions, their theory gets the special case of the Y-Principle in which a = .5, but not the general case.

References
Bacchus, Fahiem. 1990. Representing and Reasoning with Probabilistic Knowledge. MIT Press.
Bacchus, Fahiem, Adam J. Grove, Joseph Y. Halpern, and Daphne Koller. 1996. From statistical knowledge bases to degrees of belief. Artificial Intelligence 87: 75-143.
Braithwaite, R. B. 1953. Scientific Explanation. Cambridge: Cambridge University Press.
Carnap, Rudolf. 1947. Meaning and Necessity. Chicago: University of Chicago Press.
———. 1950. The Logical Foundations of Probability. Chicago: University of Chicago Press.
———. 1952. The Continuum of Inductive Methods. Chicago: University of Chicago Press.
de Finetti, B. 1974. Theory of Probability, vol. 1. New York: John Wiley and Sons.
Dombi, J. 1982. Basic concepts for a theory of evaluation: The aggregative operator. European Journal of Operational Research 10: 282-293.

Fisher, R. A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A 222: 309-68.
Fodor, J., R. Yager, and A. Rybalov. 1997. Structure of uninorms. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 5: 411-27.
Goodman, Nelson. 1955. Fact, Fiction, and Forecast. Cambridge, Mass.: Harvard University Press.
Halpern, J. Y. 1990. An analysis of first-order logics of probability. Artificial Intelligence 46: 311-50.
Harman, Gilbert. 1986. Change in View. Cambridge, Mass.: MIT Press.
Hintikka, Jaakko. 1966. A two-dimensional continuum of inductive methods. In Aspects of Inductive Logic, ed. J. Hintikka and P. Suppes, 113-32. Amsterdam: North Holland.
Jeffrey, Richard. 1983. The Logic of Decision, 2nd edition. Chicago: University of Chicago Press.
Klement, E. P., R. Mesiar, and E. Pap. 1996. On the relationship of associative compensatory operators to triangular norms and conorms. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 4: 129-44.
Kneale, William. 1949. Probability and Induction. Oxford: Oxford University Press.
Kushmerick, N., S. Hanks, and D. Weld. 1995. An algorithm for probabilistic planning. Artificial Intelligence 76: 239-86.
Kyburg, Henry, Jr. 1961. Probability and the Logic of Rational Belief. Middletown, Conn.: Wesleyan University Press.
———. 1974. The Logical Foundations of Statistical Inference. Dordrecht: Reidel.
———. 1974a. Propensities and probabilities. British Journal for the Philosophy of Science 25: 321-53.
———. 1977. Randomness and the right reference class. Journal of Philosophy 74: 791-97.
Levi, Isaac. 1980. The Enterprise of Knowledge. Cambridge, Mass.: MIT Press.
Pearl, Judea. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann.
Pollock, John L. 1983. A theory of direct inference. Theory and Decision 15: 29-96.
———. 1983a. Epistemology and probability. Synthese 55: 231-52.
———. 1984. Foundations for direct inference. Theory and Decision 17: 221-56.
———. 1984a. Foundations of Philosophical Semantics. Princeton: Princeton University Press.
———. 1990. Nomic Probability and the Foundations of Induction. New York: Oxford University Press.
———. 1995. Cognitive Carpentry. Cambridge, Mass.: Bradford/MIT Press.
———. 2006. Thinking about Acting: Logical Foundations for Rational Decision Making. New York: Oxford University Press.
———. 2006a. Defeasible reasoning. In Reasoning: Studies of Human Inference and its Foundations, ed. Jonathan Adler and Lance Rips. Cambridge: Cambridge University Press.
Popper, Karl. 1938. A set of independent axioms for probability. Mind 47: 275ff.
———. 1956. The propensity interpretation of probability. British Journal for the Philosophy of Science 10: 25-42.
———. 1957. The propensity interpretation of the calculus of probability, and the quantum theory. In Observation and Interpretation, ed. S. Körner, 65-70. New York: Academic Press.
———. 1959. The Logic of Scientific Discovery. New York: Basic Books.
Ramsey, Frank. 1926. Truth and probability. In The Foundations of Mathematics, ed. R. B. Braithwaite. Paterson, NJ: Littlefield, Adams.
Reichenbach, Hans. 1949. A Theory of Probability. Berkeley: University of California Press. (Original German edition 1935.)
Reiter, R., and G. Criscuolo. 1981. On interacting defaults. In IJCAI81, 94-100.
Renyi, Alfred. 1955. On a new axiomatic theory of probability. Acta Mathematica Academiae Scientiarum Hungaricae 6: 285-333.
Russell, Bertrand. 1948. Human Knowledge: Its Scope and Limits. New York: Simon and Schuster.
Savage, Leonard. 1954. The Foundations of Statistics. New York: Dover.
Shafer, G. 1976. A Mathematical Theory of Evidence. Princeton: Princeton University Press.
Sklar, Lawrence. 1970. Is propensity a dispositional concept? Journal of Philosophy 67: 355-66.
———. 1973. Unfair to frequencies. Journal of Philosophy 70: 41-52.
Skyrms, Brian. 1980. Causal Necessity. New Haven: Yale University Press.
van Fraassen, Bas. 1981. The Scientific Image. Oxford: Oxford University Press.
Venn, John. 1888. The Logic of Chance, 3rd ed. London.

DISCUSSION PAPERS

Computer Art

Berys Gaut
University of St. Andrews

What is computer art? The answer may seem obvious: computer art is any art made by a computer or with the use of a computer. But that cannot be right. Almost all novels are now written on a computer, but that does not make them computer art. Perhaps computer art is art made by a computer and that is distinctive in some way. That fails too: novels written on a computer are distinctive in being written after 1945, but that does not make these novels computer art works. Rather, computer art is art that is made by a computer and is also artistically distinctive in some way. For instance, hypertext stories enable the capacity to navigate through a story by the insertion of hyperlinks into it, and so create a form of artistically interesting interaction that does not obtain in traditional literature; by virtue of that fact hypertext stories are a form of computer art. What matters, then, for the existence of computer art is that one can do something artistically distinctive with computers, something that is not achievable, either at all or in practice, in other kinds of art.

In computer art a computer’s capacities are exploited for achieving artistically distinctive ends. So we need first to determine what computers can do that other things cannot, either at all or in practice. The most general characterization of a computer is as a Universal Turing Machine (UTM). A Turing Machine is an abstract device composed of a reading head and an infinitely long tape divided into cells, and in each cell there is no more than one syntactically specified symbol, taken from a finite list of such symbols. The machine takes as inputs the symbols in the cells; it reads one cell at a time and has a set of instructions (its program) about what to do when it encounters any of these symbols. It may retain the symbol in the cell, erase the symbol, or replace one symbol with another. It can also move one cell to the right or the left, and take as its next input the symbol in that cell. The set of instructions in the machine comprises an algorithm: that is, there is an exactly formulated rule that specifies what the machine is to do for each input and which the machine can implement in a finite time. The machine thus has an input (the symbols stored on the tape), an output (the symbols that it writes onto the tape), and an algorithm that transforms the input into the output. A UTM is a machine that can do what any Turing Machine can

do. Computers, being UTMs, take inputs and transform them into outputs by algorithms. A UTM could consist of a human following the set of instructions, and was so thought of by Alan Turing in his original paper describing the device (Copeland 2004). Electronic computers mechanize UTM routines, allowing them to implement them at speeds billions of times faster than any human could. The notion of a UTM puts no constraints on how the inputs or algorithms are produced. The inputs might, for instance, be chosen either by the designers of the UTM or by its users.

When we talk of computer art, we have in mind electronic computers, not human beings running UTM routines. The conjunction of two features is distinctive of electronic computers. First they operate at speeds that far outrun human capacities, made possible by their automation of procedures. Second, since they are UTMs, they operate according to algorithms. These algorithms define a space of possibilities. Varying the input changes the output in accordance with the algorithms; so varying the inputs allows us to discover what the algorithms permit, to explore the possibility space defined by the algorithmic rules. Doing so by electronic computer enables exploration with a thoroughness and speed not otherwise achievable. Computers thus afford automated algorithmic exploration.

Computer art exploits automated algorithmic exploration to artistically distinctive effect. It allows artists to discover the full potential of what rules permit, something that would not be in practice possible without automation. There are two broad classes of cases, depending on how the input is determined: solely by the artist in the case of non-interactive computer art, or partly by the audience in the case of interactive computer art.

In the first class are works such as those produced by Harold Cohen’s AARON and David Cope’s EMI (Boden 2004). AARON comes in various versions; in some it generates drawings of acrobats in various postures. Cohen specifies a set of algorithms, determining drawing-rules, models of acrobats’ bodies, how they may be physically positioned, and so on, and inputs the basic parameters for a set of drawings. AARON can then produce an enormous number of drawings according to the possibilities permitted by the algorithms. EMI is comprised of a database of composers’ melodic and harmonic motifs and a set of rules for combining these motifs. Musical works can be produced in the style of any composer whose motifs are in the database; compositions in the style of Bach, Mozart, Beethoven, and Mahler have been produced, and many are indiscernible to the lay (and often the expert) ear from actual compositions by those composers. EMI’s works are musical explorations of aspects of a composer’s style: through automation one can discover what is possible in that composer’s style, exploring a vast range of possibilities: one can download 5,000 computer-generated Bach-style chorales from Cope’s website.

Digital animation is also a type of computer art. By using software tools such as Maya animators can build 3-D animation “puppets,” either by hand or by mechanically capturing body information, rig (build the internal control skeletons of) these models, animate them, and then render the 3-D models into 2-D images. At each stage animators choose input and then see how output varies according to what is algorithmically permissible. This may involve short algorithmic routines (seeing, for instance, how moving a certain joint moves the 3-D animation model) or extended algorithmic routines, using techniques such as particle systems or AI systems, involving multiple “agents” that interact with each other.

Automated algorithmic exploration can be distinctive in respect of either extensiveness or intensity (these are not exclusive). Extensiveness involves producing far more artistic output (drawings, musical works, etc.) according to set rules than would be in practice achievable by manual methods. Intensity involves producing individual outputs that are far more elaborated than could be otherwise achieved. Consider photoreal animation: animation where the animation image of some object is indiscernible from how a photograph of that object would look if the object existed with the properties that the image ascribes to it. Photoreal animation is not in practice achievable by manual animation, which employs traditional painting and drawing techniques. It requires huge computing power to render images with the degree of detail equivalent to that of a high resolution digital photograph. And photoreal animation matters artistically, in generating a beauty of detail not seen in traditional animation and in fostering greater character engagement (Gaut 2010, 66-7).

The other main type of computer art is interactive computer art, which depends on the fact that the input to a program can be partly set by the audience rather than entirely by the artists. Examples include Daniel Rozin’s Wooden Mirror and Camille Utterback and Romy Archituv’s TEXT RAIN (Bolter and Gromala 2003). Wooden Mirror consists of numerous small square wooden tiles set into an octagonal wooden frame; a video camera records the image of someone standing in front of the frame and a computer analyzes the image and controls the tilting of each of the tiles to reflect back different amounts of light, forming a rough image of the viewer. The viewer, through her presence and actions, thus partly determines the input to the work, and the work is thereby interactive, processing the viewer’s input according to the algorithms that constitute the program and thereby producing the perceived output. TEXT RAIN involves a screened projection of the viewer’s image, and words and letters, taken from a poem by Evan Zimroth, that move slowly down the screen. By moving her body the viewer can control the fall of letters, cupping them in her arms, throwing them up, perhaps even creating new poems with them. Again, the viewer partly controls the input by her presence and movements, which are projected onto the screen. More familiar examples of interactive computer works are videogames. Here there is much overlap with the technology employed in non-interactive digital animation: Maya, for instance, is also sometimes used in the creation of videogames. The difference with non-interactive digital cinema, where input is specified by the artists alone, is that the player partly controls the input to the program and thus partly determines the happenings in the fictional world. Some videogames, such as Ico (2001) and Bioshock (2007), are sufficiently rich and interesting to count as artworks because of the way they employ interactive possibilities for artistic ends: in Ico the repeated need to touch and hold hands as part of the gameplay becomes an affecting symbol of love between the player character, Ico, and the princess, Yorda. In Bioshock a series of increasingly fraught moral choices about whether to “harvest” (kill) or liberate genetically altered little girls guides the player to question some of her choices in the game world, something that is possible only in interactive contexts, and employs this questioning to artistically interesting effect. Computationally generated interactivity makes possible distinctive artistic achievements; like all computer art it enables automated algorithmic exploration, with extensive or intensive possibilities, and in addition interactivity enables the viewer partly to determine the path of the exploration and output.

As this theory-sketch shows, the computer art form comes in two basic types, non-interactive and interactive. The differences between the types are important, and condition the ontology of works in those types, the role of the audience and that of the artist, and several other matters. These differences stem from the absence of audience input into the algorithms that generate the works in the case of non-interactive computer

— 33 — — APA Newsletter, Spring 2010, Volume 09, Number 2 —

art; and the existence of audience input into the algorithms whose implementation constitutes the works in the case of interactive computer art. Nevertheless, there is a single art form that embraces both interactive and non-interactive computer works, constituted by the distinctive artistic possibilities afforded by automated algorithmic exploration. Some art forms contain other art forms: the art form of picture printing contains the art forms of woodcuts, engravings, etchings, etc. Interactive and non-interactive computer art forms likewise are contained within the broader art form of computer art.

The claim that the computer art form subdivides into interactive and non-interactive forms may seem obvious. Nevertheless, it has been denied by Dominic Lopes (2010). Lopes has written a highly original, systematic, and groundbreaking account of the computer art form, from which I have learned a great deal. But he defends the following definition of the computer art form (CAF): "an item is a computer art work just in case (1) it's art, (2) it's run on a computer, (3) it's interactive, and (4) it's interactive because it's run on a computer" (Lopes 2010, 27). So works in the computer art form must be interactive: the type of non-interactive computer art form that I have just defended does not exist. Thus, in virtually founding the philosophical discussion of computer art, Lopes has simultaneously radically restricted the object of study. Why?

He is certainly aware of non-interactive computer art: he discusses non-interactive works, such as AARON's paintings and EMI's musical compositions, not only interactive works, such as Wooden Mirror, TEXT RAIN, and videogames. AARON and EMI are discussed in a chapter about digital artworks. So one might suppose that the disagreement is merely verbal: Lopes recognizes a digital art form, which comes in two kinds: non-interactive and interactive (and he calls only the latter kind "computer art"), whereas I am labelling as computer art what he terms the general art form of "digital art." But that is not how matters stand. Lopes argues that there is in fact no digital art form, so it cannot be identified with the computer art form. What exists are particular digital artworks, which are members of traditional art forms: the acrobat pictures produced by AARON are digital drawings or paintings, and thus belong to the art forms of drawing or painting; EMI's digitally created musical works are members of the musical art form. But there is no digital art form.

I agree with Lopes that digital art is not to be identified with computer art, though for reasons different from his. Some computers are not digital: connectionist (neural net) computers can be analogue, for instance, though connectionist systems are almost invariably run on digital computers. And some outputs of computers are not digital: the output of Wooden Mirror consists of the tilting of wooden tiles; in later versions of AARON Cohen equipped his computer with a robot arm, which employed a paintbrush and paint blocks to produce analogue paintings (Boden 2004, 314-5). Digital artworks are by far the most common kind of computer artworks, but they are not the only kind; the digital art form is the major type of computer art form, but is not identical with it. (Lopes categorizes digital works as either made by a digital computer or for display by a digital computer, so AARON's recent output would count as digital works for him. However, categorizing these analogue paintings as digital artworks would be deeply misleading, for they lack essential properties of the digital image, such as being infinitely reproducible without information loss.)

My main disagreement with Lopes concerns his claim that there is no digital art form and the reasoning that leads him thereby to conclude that the computer art form must be interactive. Art kinds for Lopes are simply kinds of art, groups of artworks that share some feature in common: Tuesday artworks (artworks made on Tuesdays) are one such kind. Appreciative art kinds are defined thus: "a kind is an appreciative art kind just in case we normally appreciate a work in the kind by comparison with arbitrarily any other works in that kind" (Lopes 2010, 17). Being an appreciative art kind is necessary but not sufficient for being an art form: there are other appreciative art kinds, such as genres like horror, which cross different art forms. Digital art is not an art form because it is not an appreciative art kind, for we don't normally appreciate a digital artwork by comparing it with arbitrarily any other digital artwork: we don't appreciate, say, digital paintings by comparing them with digital musical works. Digital paintings are appreciated in comparison to other, often non-digital paintings; digital musical works are appreciated in comparison to other, often non-digital musical works. Those computer artworks that are digital artworks therefore do not belong to a digital art form, because there is no such form (Lopes 2010, 18). By parity of reasoning there is no computer art form that embraces non-interactive and interactive works, for digital non-interactive computer works do not fall under a common art form that could be a type of computer art.

Lopes' argument depends on individuating art forms in part by comparison classes. The theory-sketch given earlier individuated art forms in terms of a medium's (such as the computer medium's) affording distinctive artistic possibilities (see Gaut 2010, chapter 7). Lopes' individuation condition, however, does not individuate art forms. Consider Rembrandt self-portraits and prehistoric cave paintings. They both belong to the art form of painting, showing some of the distinctive artistic possibilities afforded by paint on a surface through enabling, for instance, seeing-in by attending to a worked surface. But it is not true that we normally appreciate Rembrandt self-portraits by comparison to cave paintings: perhaps such a comparison has never been made. So we do not individuate art forms in terms of normally appreciating a work in the kind by comparing it with arbitrarily any other work in the kind, for that is not true of Rembrandt and cave paintings. It may be replied that this is one of the abnormal cases: Rembrandt portraits or cave paintings are some of the exceptions where one only compares the paintings to a restricted set of other paintings rather than to arbitrarily any other painting. But taking this line means that in order to assure ourselves that the art form of painting exists, we would have to engage in a statistical survey to show that most paintings are compared to arbitrarily any other paintings and that only a minority of paintings are compared to some restricted set of paintings. But we know that the art form of painting exists without engaging in any such inquiry, and indeed this sort of enquiry seems absurd.

Alternatively, perhaps "normally" in Lopes' individuation condition should be read not as a statistical notion, but as a normative one; we ought to compare the Rembrandt with the cave painting because we can learn something from the comparison. However, making this move undermines Lopes' claim that there is no digital art form. One can illuminatingly appreciate digital works by comparing them with each other, even when they are in different traditional media. One can, for instance, usefully compare AARON's digital paintings with EMI's digital musical works: the former are more original, since Cohen selected the input and algorithms to reflect his own style, but Cope chose EMI's database and algorithms in accordance with an analysis of other composers' style. However, the best of EMI's output is I think artistically more successful than the best of AARON's output, which likely reflects the greater value of the initial stylistic parameters that were fed into EMI. Comparing the two kinds of works, then, sheds light on the range and value of the possibilities offered by automated algorithmic exploration. So if the "normal" in the definition of an appreciative art kind

is a normative notion, there is a digital art form. Recall that I individuated art forms in terms of a medium's affording distinctive artistic possibilities. Given that account, it will always be true that there is something to be learned from a comparison of two works in an art form, since they both illustrate something of the distinctive artistic possibilities afforded by the art form. This understanding of what an art form is explains why Lopes' account understood normatively gets matters extensionally correct.

So Lopes' individuation condition for an art form fails, if understood statistically; and if it is understood normatively, it does not support his conclusion that the computer art form is always interactive. According to the theory-sketch developed earlier, the computer art form embraces both interactive and non-interactive works; both exploit automated algorithmic exploration, which grounds artistically distinctive features. Such a nested framework allows one better to appreciate the commonalities, for instance, between non-interactive digital cinema and videogames; both are types of cinema (the medium of the moving image) and both are types of computer art form, differing in that the latter supports interactivity and the former does not; but the technology behind both has a great deal in common and the appreciation of that technology is essential to the appreciation of works in the media (Gaut 2010). Since so much of it is non-interactive, computer art is far more ubiquitous and varied than Lopes countenances. A Philosophy of Computer Art is an enormously accomplished and groundbreaking work that will shape future philosophical discussion of computer art. But it overly restricts computer art to its interactive form and so impoverishes the domain to be discussed. There is more to computer art than is dreamt of in its philosophy.

References
Boden, Margaret. 2004. The Creative Mind: Myths and Mechanisms, 2nd ed. London: Routledge.
Bolter, Jay David and Diane Gromala. 2003. Windows and Mirrors: Interaction Design, Digital Art, and the Myth of Transparency. Cambridge, Mass.: MIT.
Copeland, B. Jack. 2004. Computation. In The Blackwell Guide to the Philosophy of Computing and Information, ed. Luciano Floridi. Malden, Mass.: Blackwell.
Gaut, Berys. 2010. A Philosophy of Cinematic Art. Cambridge: Cambridge University Press.
Lopes, Dominic McIver. 2010. A Philosophy of Computer Art. London: Routledge.

Remediation Revisited: Replies to Gaut, Matravers, and Tavinor

Dominic McIver Lopes
University of British Columbia

A Philosophy of Computer Art was conceived on a hunch that thinking about computer art might allow us to come at large and familiar problems in aesthetics and art theory from a new angle. Berys Gaut, Derek Matravers, and Grant Tavinor touch upon some of these large and familiar problems in earlier issues of this Newsletter. One of these is Richard Wollheim's "bricoleur problem."

Wollheim asked what makes some stuffs or processes—or "media"—suitable vehicles of art, and he proposed that a solution to this "bricoleur problem" will be largely determined by "analogies and disanalogies that we can construct between the existing arts and the art in question" (1980, 43). In seeking these analogies and disanalogies, we may draw from the "comparatively rich context" of critical and historical discussions, as we did when photography and the movies were new arts (1980, 152).

Critical, historical, and theoretical discussions of digital art typically do root it in precursor art practices and do draw analogies to traditional art while identifying disanalogies that represent reactions against tradition. Going a step further, some theorists hold that this process is itself and by necessity part of digital art practice. A work of digital art is nothing but digitally rendered literature, depiction, film, performance, or music, and so its significance must lie in how it "remediates" these traditional art media by rendering them digitally (Bolter and Grusin 2000). Through remediation, digital art is the art of bricolage.

Running against this grain, A Philosophy of Computer Art distinguishes digital art from computer art, whose medium is computer-based interactivity. As Matravers points out, this means that computer art faces a seriously exacerbated bricoleur problem. If interactivity is not a medium in traditional art, then what is the basis for an analogy to computer art?

Every work of computer art has an interface or display made up of text, images, or sound; and perhaps these provide a basis for constructing the comparisons needed to solve the bricoleur problem? Remediation to the rescue after all? Not so fast. The argument in A Philosophy of Computer Art assumes that to appreciate a work of computer art for what it is, one must appreciate it, at least in part, for its computer-based interactivity. So we cannot understand why computer-based interactivity is a suitable vehicle for appreciation by seeking analogies between the computer-based interactivity of computer art and the computer-based interactivity of traditional art. There is no computer-based interactivity in traditional art.

Some readers will have noticed a sneaky reformulation of the bricoleur problem as concerning what is a suitable medium for appreciation instead of art. This reformulation is harmless as long as what makes something art is at least in part features of its medium that make it apt for appreciation. Institutional theories of art deny that what makes something art has anything to do with features of its medium that make it apt for appreciation, but institutional theories of art are inconsistent with the bricoleur problem. They say that any medium is in principle a suitable vehicle for art.

One way to solve the bricoleur problem relies on interactive precursors to computer art to furnish suitable analogies. Happenings, for example, are interactive though not computer-based (Lopes 2009a, 49-51), and some writers on digital art trace its roots to Happenings and Dada performances. Alas, this proposal is ultimately unsatisfactory. Truly interactive precursors to computer art are few and far between, and their interactivity is typically a mere means to other artistic purposes, such as unscriptedness. For these two reasons, one might wonder whether interactivity is a medium in these works.

Tavinor suggests another solution to the bricoleur problem, in discussing why the artistic aspects of video games involve interactivity. Some games (e.g., checkers) have no representational elements, some games (e.g., chess) have interactive and representational elements that are completely independent of each other, but in most video games (e.g., The Sims) representation and interactive game-play are inseparable. One appreciates The Sims for how its little dramas are realized through interaction: the interaction is what it is only given the representational elements and the representation is what it is only given the interaction. So, in trying to understand why video games are suitable vehicles for appreciation, why not draw analogies between drama-realized-interactively and drama-realized-by-actors-following-a-script? And if video games are

the popular end of computer art, then this proposal solves the bricoleur problem for computer art.

The proposal can be generalized in a way that makes it clear that remediation has not snuck in the back door. We appreciate works for such formal, expressive, and cognitive properties as having balance, being sad, and bringing out how none of us are free of gender bias. In different arts, these are realized in different ways—by acting, narrative, depiction, tone-meter-timbre structures, and the like. Why should a solution to the bricoleur problem send us in search of analogies at the level of realizers and not at the level of the formal, expressive, and cognitive properties that they realize? Perhaps the analogies we need to solve computer art's acute case of the bricoleur problem are not to be found by comparing interactivity to media like acting, narrative, depiction, and tone-meter-timbre structures, but rather by comparing the formal, expressive, and cognitive achievements of interactivity alongside those of acting, narrative, depiction, and tone-meter-timbre structures. Simply put, interactivity is a suitable medium for appreciation if interactive works can realize features worth appreciating.

This suggestion must fall flat if a solution to the bricoleur problem must tell us how a medium can be a suitable vehicle for art when it is not a suitable vehicle for appreciation. However, as already noted, only institutional theories try to understand what makes for art without appeal to appreciation, and the bricoleur problem does not arise for these theories.

If this thinking is sound, it is possible to solve the bricoleur problem without appeal to remediation. To the extent that the problem pushes theorists to emphasize that digital art is an art of remediation, we now have room to downplay digital art as a theoretical concept and to counterbalance it with the concept of computer art. However, all of this is wheel spinning if computer art is not an art form in the first place.

Gaut argues that there is an art form—call it "computer-based art"—which conjoins computer art and digital art. Gaut's argument proceeds first by objecting to the argument in A Philosophy of Computer Art that digital art is not an art form and then by proposing a feature, automated algorithmic processing, which is the medium for all computer-based art—for computer art and digital art alike.

Here is the argument in the book for the claim that digital art is not an art form. An art kind is an art form only if we normally appreciate any work in the kind by comparison with arbitrarily any other work in the kind. Digital art is an art kind but we do not appreciate any given work of digital art with arbitrarily any other work of digital art. Therefore, digital art is not an art form.

Gaut doubts whether the major premise of this argument can individuate art forms. Rembrandt self-portraits and paleolithic cave paintings are pictures but we do not normally compare Rembrandt self-portraits with cave paintings. Fair enough, there is some truth in that. Bear in mind two points, however.

First, the claim is not that we consciously or actively compare any given work in an art form with each and every other work in the art form. Rather, the claim is that the Ks are the works whose comparison class does not exclude any K. Appreciating a Rembrandt self-portrait as a Rembrandt self-portrait does exclude cave paintings, but appreciating a Rembrandt self-portrait as a picture does not exclude cave paintings. An appreciation of the self-portrait that would have gone differently were cave paintings included cannot be an appreciation of the self-portrait as a picture—it must be an appreciation of it as something narrower—a seventeenth-century painting, perhaps. In the book, the same idea is expressed by saying that an art form is "a group of works that share a distinctive feature in common and that are normally appreciated partly for having that feature" (Lopes 2009a, 18).

Second, the "normally" requires a word of explanation. It is possible to appreciate a K as a K* (Lopes 2008). For example, it is possible to appreciate a building as a sculpture, though buildings are not sculptures, and it is also possible to appreciate a building as an antelope, though it would probably not come off very well (it depends on the building!). However, what makes architecture an art form is that buildings are works of art and there is a norm to appreciate them as buildings. Works made on Tuesdays are an art kind and it is possible to appreciate a work as a Tuesday work, but there is no norm to appreciate anything as a work made on a Tuesday, and that is why Tuesday works are not an art form.

There is no norm to appreciate digital art works as digital art because we do not in fact appreciate them as digital art, though we do appreciate them as digital songs, photographs, and the like. Have you ever appreciated a digital song in comparison to arbitrarily any digital photograph? If you do appreciate 1234 as a work of digital art, then you would have no more reason to exclude Jeff Wall's A Sudden Gust of Wind from its comparison class than you would have to exclude any other digital song.

The examples of David Cope's EMI and Harold Cohen's AARON do double duty in Gaut's argument. EMI and AARON output (non-interactive) works that we appreciate as products of automated algorithmic processing. As a result, they seem to counter the claim that we never appreciate any given work of digital art in a comparison class with arbitrarily any other work of digital art.

Gaut is right to say that we can and do appreciate the works output by EMI and AARON in comparison with one another. AARON's drawings are more original but less impressive formally than EMI's compositions because of the algorithms and databases that each employs. This admission is consistent with the argument against the proposition that digital art is an art form so long as the case of AARON and EMI is not like that of 1234 and A Sudden Gust of Wind. The question is whether there is room to allow for the appreciation of AARON's drawings alongside EMI's songs without having to squeeze A Sudden Gust of Wind into the same art form as 1234. There is if computer art has a sister art form, "generative art," wherein algorithms are run on electronic computers to output new works of art (see also Andrews 2009). On this proposal, AARON and EMI output generative art. But A Sudden Gust of Wind and 1234 are not works of generative art; they belong instead to digital images and digital music, which are genres of the traditional arts of depiction and music.

This discussion is not, as it might seem, empty taxonomizing, for it brings us face to face with the bricoleur problem, whence it leads us into fundamental questions of value in the arts and the role of media in realizing that value. Gaut, Matravers, and Tavinor raise plenty of other issues besides these that merit study and dialogue. A Philosophy of Computer Art was never to be the last word on its topic, but rather the first.

References
Andrews, J. 2009. Review of A Philosophy of Computer Art. Netpoetic. http://netpoetic.com/2009/11/.
Bolter, J. D. and R. Grusin. 2000. Remediation: Understanding New Media. Cambridge: MIT Press.
Gaut, B. 2009. Computer art. APA Newsletter on Philosophy and Computers 9.1 (2009) and American Society for Aesthetics Newsletter (2009).
Lopes, D. M. 2008. True appreciation. In Photography and Philosophy: New Essays on the Pencil of Nature, ed. Scott Walden. Oxford: Wiley-Blackwell.
———. 2009a. A Philosophy of Computer Art. London: Routledge.


———. 2009b. Précis of A Philosophy of Computer Art. APA Newsletter on Philosophy and Computers 8.2 (2009) and American Society for Aesthetics Newsletter (2009).
Matravers, D. 2009. Sorting out the value of new art forms. APA Newsletter on Philosophy and Computers 8.2 (2009) and American Society for Aesthetics Newsletter (2009).
Tavinor, G. 2009. Videogames, interactivity, and art. APA Newsletter on Philosophy and Computers 9.1 (2009) and American Society for Aesthetics Newsletter (2009).
Wollheim, R. 1980. Art and Its Objects, 2nd ed. Cambridge: Cambridge University Press.

Shrinking Difference—Response to Replies

Lynne Rudder Baker
University of Massachusetts–Amherst

First, I'd like to express my appreciation to Amie L. Thomasson, Beth Preston, Peter Kroes and Pieter E. Vermaas, and Roxanne Kurtz for their thoughtful replies to my article, "The Shrinking Difference Between Artifacts and Natural Objects."

In response to Amie Thomasson:

Amie Thomasson and I are in agreement about artifacts, in particular about the existential dependence of artifacts on human intentions. Thomasson says, "Since the very idea of an artifact is of something mind-dependent in certain ways, accepting mind-independence as an across-the-board criterion for existence gives us no reason to deny the existence of artifacts; it merely begs the question against them." I agree entirely.

Thomasson discusses two very interesting issues about mind-dependent objects that I did not raise. First, she mentions the distinction between imaginary objects (if there are any) that are merely the products of human thought and more familiar artifacts like tables and hammers. I agree that this is an important distinction; I was concerned only with technical artifacts.

Second, and relatedly, Thomasson mentions "abstract artifacts" like "novels and laws of state, songs, and corporations"—artifacts that are not constituted by aggregates of particles. Her own work has been a contribution to understanding such abstract artifacts. I think that she is entirely correct to draw attention to the increasing importance of abstract artifacts—databases, search engines, computer programs—in daily life. This whole area is crawling with philosophical issues that need more philosophical attention. Although I dealt only with concrete artifacts, a complete account of artifacts must include all kinds of "abstract artifacts."

In response to Beth Preston:

Preston's comment is very thought-provoking. Preston has an admirable store of knowledge of pertinent examples. She fascinatingly shows how my examples of blurring the line between natural and artifactual objects are really just developments of ancient human interventions in nature. It still seems to me—and I think to Preston as well—that there is a line (although "blurry") between objects whose existence depends on intentional human interventions and objects whose existence does not so depend. Be that as it may, Preston goes on to argue that there is not an ontologically important line between intention-dependent (ID) objects and nonintention-dependent (nonID) objects. She advocates abandoning the distinction altogether. Indeed, she says, "the distinction between artifacts and natural objects is itself ontologically unilluminating" (28).

I think that there is a terminological issue about "ontologically important" or "ontologically illuminating." Preston thinks that she disagrees with me because she does not think the mind-independent/mind-dependent distinction (along with associated distinctions between natural objects and artifacts, or between nonID objects and ID objects) is ontologically important. Kroes and Vermaas think that they disagree with me because they do think that the mind-independent/mind-dependent distinction is ontologically important. But all of us agree that "mind-dependence" does not mark any ontological deficiency.

The fact remains, however, that the putative distinction between mind-independent and mind-dependent objects is the basis for mainstream analytic metaphysics. A mainstream corollary is that mind-dependent objects are ontologically deficient. In the face of this unfortunate situation, it seems to me a reasonable strategy to acknowledge that there is a coherent distinction between what is mind-independent and what is not, but to deny that the distinction is ontologically important.

Preston has two reasons for abandoning the distinction between ID and nonID objects: (1) The distinction is vague; many objects do not fit well on either side of the line. (2) Use of the distinction to understand artifacts is suspect.

As to (1), I subscribe to ontological vagueness generally. Spatial boundaries and temporal boundaries are all vague, independently of our concepts. Exactly when did our solar system come into existence? What are the spatial boundaries of a tree with autumn leaves in the process of falling? If our distinctions are to be accurate, they should permit indeterminate cases.

As to (2), my view is that the identity and nature of an artifact depends on its proper function, and its proper function depends on human intentions. I admit that I have not studied artifact functions as extensively as Preston has, but as long as "most of the big issues are still up in the air, including the issue of where and how artifacts get their proper functions" (28), I'll stick with my view.

In response to Kroes and Vermaas:

Kroes and Vermaas agree that artifacts should not be regarded as ontologically inferior to natural objects, but still want to maintain the importance of the mind-dependent/mind-independent distinction. As experts on technical artifacts and the philosophy of engineering, their "take" on the issues is somewhat different from mine. I found their discussion (and different interpretation) of internal principles of activity, as well as the philosophical questions they raise about the nature of regularities (or laws) in the engineering sciences, quite instructive. I certainly agree that more epistemological work needs to be done by people (like Kroes and Vermaas) with greater knowledge of engineering than I have.

Kroes and Vermaas have a different interest in the mind-dependent/mind-independent distinction from mine. My interest is to deny that the distinction can be a basis for ontology in this sense: Mind-independence cannot be the criterion for being in the ontology. Kroes and Vermaas agree with that point. What they are interested in is that we maintain the difference between artifacts (mind-dependent) and natural objects (mind-independent) without downgrading artifacts. As I said to Preston, I can agree that there is such a distinction, and that a version of it (the ID/nonID distinction) marks the difference between artifacts and natural objects. I am uncertain whether the apparent difference between Kroes and Vermaas and me regarding the mind-independent/mind-dependent distinction is merely terminological.

— 37 — — APA Newsletter, Spring 2010, Volume 09, Number 2 —

One small final point: Kroes and Vermaas take issue with my example of automobiles. I said that "when automobiles were invented, a new kind of thing came into existence: and it changed the world." They say that if we apply Alexander's Dictum ("To be real is to have effects"), I should have said: "when automobiles were invented, a new kind of thing came into existence because it changed the world." I disagree. The automobile had effects quite independently of its changing the world: It had the effect of conveying its passengers from one place to another in a private vehicle.

In response to Roxanne Kurtz:

Kurtz agrees with me that artifacts are intention-dependent and that artifacts are not ontologically deficient compared to natural objects. She offers novel support for this position by exploiting a point on which she and I agree: We—and thus our intentional activities—are part of nature. As I understand her argument, it is a kind of sorites: My opponent offers Intention Independence as the criterion of ontological robustness:

Intention Independence: An object is ontologically robust only if it could exist in a world that lacks beings with intentions.

My hypothetical opponent accepts Intention Independence. But why, Kurtz asks, should we accept Intention Independence as a criterion of ontological robustness instead of various alternative candidates that would also demote increasingly many kinds of natural objects: Instinct Independence, Learning Independence, Biological Independence, Geological Independence? Each of these alternatives is more draconian than its predecessor. By the end of this series, hardly any medium-sized natural objects count as ontologically robust. Perhaps we would be left only with simples. For the sake of argument, Kurtz (quite reasonably) rejects the simples-only position.

Since we are part of nature, we have a series of candidate criteria beginning with Intention Independence and ending with Geological Independence, each of which invokes natural processes for the creation of objects. If someone (e.g., my opponent who accepts Intention Independence) accepts any of the criteria, she should accept them all. There is no principled place to stop. "[I]t is reasonable for us to expect the ontological robustness criteria that invoke these processes to stand and fall together in virtue of the metaphysical similarities of the involved processes" (33). Thus, all of them should be rejected as criteria of ontological robustness, including Intention Independence. Thus, my opponent cannot hold that artifacts are ontologically deficient because they depend on intentions. She must reject Intention Independence as she hits "an ontologically robust brick wall" (32).

If I have Kurtz's argument right, it is an interesting one. But I am not convinced that the "metaphysical similarities of the involved processes" are sufficiently strong to warrant Kurtz's conclusion that Intention Independence should be rejected along with the other criteria. That said, however, I certainly agree with her conclusion.

A note on some remarks by David Leech Anderson:

Mind-dependence, or intention-dependence, as I construe it, is not causal dependence but ontological dependence. When I say that an artifact (ontologically) depends on human intentions, I mean that it could not have existed in a world without beings with intentions. I do not mean that anyone has to think of the artifact to keep it in existence. An object that comes off an automated assembly line after an attack that kills all human beings but leaves inanimate objects intact is still a car. It is not mind-independent in the sense that I am talking about. Its existence is still dependent on human intentions. If atoms spontaneously coalesced in outer space to form an object that was molecule-for-molecule just like your '87 Chevy, the car lookalike would not be a car. (It would be mind-independent, however.)

On my view, the table in your kitchen and the table in your Second Life kitchen are both artifacts and both dependent on intentions. However, I agree with you that there is an ontological difference between them—a difference that I have not explored. Philosophical exploration of the difference is certainly worth undertaking!

References

Anderson, D.L. 2009. A semantics for virtual environments and the ontological status of virtual objects. American Philosophical Association Newsletter on Philosophy and Computers 08(2): forthcoming.

Baker, L.R. 2008. The shrinking difference between artifacts and natural objects. American Philosophical Association Newsletter on Philosophy and Computers 07(2): 2-5.

Kroes, P. and Vermaas, P.E. 2008. Interesting differences between artifacts and natural objects. American Philosophical Association Newsletter on Philosophy and Computers 08(1): 28-31.

Kurtz, R. 2009. Ontologically tough artifacts and non-spooky intentions. American Philosophical Association Newsletter on Philosophy and Computers 08(2): 31-33.

Preston, B. 2008. The shrinkage factor: Comment on Lynne Rudder Baker's 'The shrinking difference between artifacts and natural objects'. American Philosophical Association Newsletter on Philosophy and Computers 08(1): 26-28.

Thomasson, A.L. 2008. Artifacts and mind-independence: Comments on Lynne Rudder Baker's 'The shrinking difference between artifacts and natural objects'. American Philosophical Association Newsletter on Philosophy and Computers 08(1): 25-26.

ARTICLES

Building Artificial Mimetic Minds: How Hybrid Humans Make Up Distributed Cognitive Systems

Lorenzo Magnani

Abstract
The imitation game between human and machine, proposed by Turing in 1950, is a game between a discrete and a continuous system. In the framework of the recent studies about embodied and distributed cognition and about prehistoric brains, Turing's "discrete-state machine" can be seen as an externalized cognitive mediator that constitutively integrates human "hybrid" cognitive behavior. Through the description of a subclass of these cognitive mediators I call "mimetic minds," the paper will deal with some of their cognitive and epistemological aspects and with the cognitive role played by the manipulations of the environment that includes them. The last part of the paper will describe the concept of mimetic mind that I have introduced to shed new light on the role of computational modeling and on the decline of so-called Cartesian computationalism.

Introduction
Following Peirce's semiotics, the interplay between internal and external representation can be further depicted taking

advantage of what I call semiotic brains: brains that make up a series of signs and that are engaged in making, manifesting, or reacting to a series of signs. Through this semiotic activity they are at the same time engaged in "being minds" and thus in thinking intelligently. An important effect of this semiotic brain activity is a continuous process of disembodiment of mind that exhibits a new cognitive perspective on the mechanisms underlying the semiotic emergence of meaning processes. To illustrate this process I will take advantage of Turing's comparison between so-called "unorganized" brains and "logical" and "practical" machines, and of some paleoanthropological results on the birth of material culture, which provide an evolutionary perspective on the origin of intelligent behaviors. Then I will describe the centrality to semiotic cognitive information processes of the disembodiment of mind from the point of view of the cognitive interplay between internal and external representations, both mimetic and creative. I consider this interplay critical in analyzing the relation between meaningful semiotic internal resources and devices and their dynamical interactions with the externalized semiotic materiality already stored in the environment. This materiality plays a specific role in the interplay because it exhibits (and operates through) its own cognitive constraints. Hence, minds are "extended," that is, hybrid and artificial in themselves.

From this perspective Turing's "unorganized" brains can be seen as structures that organize themselves through a semiotic activity that is reified in the external environment and then re-projected and reinterpreted through new configurations of neural networks and chemical processes. I will show how the disembodiment of mind can nicely account for low-level semiotic processes of meaning creation, bringing up the question of how higher-level processes could be comprised and how they would interact with lower-level ones. To better explain these higher-level semiotic mechanisms I will return to the analysis of the role of model-based and manipulative abduction and of external representations.1 The example of elementary geometry will also be examined, where many external things, usually inert from the cognitive/semiotic point of view, can be transformed into what I have called "epistemic mediators" (cf. Magnani 2009, chapter three) that then give rise—for instance, in the case of scientific reasoning—to new signs, new chances for "interpretants," and thus to new interpretations. Taking advantage of Turing's comparison between "unorganized" brains and "logical" and "practical" machines, the concept of the mimetic mind is introduced. This sheds new cognitive and philosophical light on the role of Turing's machines and computational modeling, outlines the decline of so-called Cartesian computationalism, and emphasizes the possible impact of the construction of new types of universal "practical" machines, available over there, in the environment, as new tools underlying the emergence of meaning processes.

1 Turing Unorganized Machines

1.1 Logical, Practical, Unorganized, and Paper Machines
Aiming at building intelligent machines, Turing first of all provides an analogy between the human brain and computational machines. In "Intelligent Machinery," written in 1948 (Turing 1969), he maintains that "the potentialities of human intelligence can only be realized if suitable education is provided" (p. 3). The concept of the unorganized machine is then introduced, and it is maintained that the infant human cortex is of this nature. The argumentation is indeed related to showing how such machines can be educated by means of "rewards and punishments." Unorganized machines are listed among different kinds of existent machineries:

• (Universal) Logical Computing Machines (LCMs). An LCM is a kind of discrete machine Turing introduced in 1937 that has

[…] an infinite memory capacity obtained in the form of an infinite tape marked out into squares on each of which a symbol could be printed. At any moment there is one symbol in the machine; it is called the scanned symbol. The machine can alter the scanned symbol and its behavior is in part described by that symbol, but the symbols on the tape elsewhere do not affect the behavior of the machine. However, the tape can be moved back and forth through the machine, this being one of the elementary operations of the machine. Any symbol on the tape may therefore eventually have an innings. (Turing 1969, 6)

This machine is called Universal if it is "such that if the standard description of some other LCM is imposed on the otherwise blank tape from outside, and the (universal) machine then set going it will carry out the operations of the particular machine whose description is given" (p. 7). The importance of this machine rests on the fact that we do not need an infinity of different machines doing different jobs. A single one suffices: it is only necessary "to program" the universal machine to do these jobs.

• (Universal) Practical Computing Machines (PCMs). PCMs are machines that put their stored information in a form very different from the tape form. Given the fact that in LCMs the number of steps involved tends to be enormous because of the arrangement of the memory along the tape, in the case of PCMs, "by means of a system that is reminiscent of a telephone exchange it is made possible to obtain a piece of information almost immediately by 'dialing' the position of this information in the store" (p. 8). Turing adds that "nearly" all the PCMs under construction have the fundamental properties of the Universal Logical Computing Machines: "given any job which could have been done on an LCM one can also do it on one of these digital computers" (ibid.), so we can speak of Universal Practical Computing Machines.

• Unorganized Machines. Machines that are largely random in their construction are called "Unorganized Machines":

So far we have been considering machines which are designed for a definite purpose (though the universal machines are in a sense an exception). We might instead consider what happens when we make up a machine in a comparatively unsystematic way from some kind of standard components. […] Machines which are largely random in their construction in this way will be called "Unorganized Machines." This does not pretend to be an accurate term. It is conceivable that the same machine might be regarded by one man as organized and by another as unorganized. (p. 9)

They are machines made up from a large number of similar units. Each unit is endowed with two input terminals and has an output terminal that can be connected to the input terminals of 0 or more other units. An example of the so-called unorganized A-type machine, with all units connected to a synchronizing unit from which synchronizing pulses are emitted at more or less equal intervals of time, is given in Figure 1 (the times when the pulses arrive are called moments, and each unit is capable of having two states at each moment).

[Figure 1. An A-type unorganized machine. In Turing 1969.]
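The LCM and the A-type unorganized machine described above are concrete enough to simulate in a few lines of code. The following sketch is an illustration added here, not Turing's own formalism: the function names, the two-rule example table, and the gloss of the A-type unit as a two-input NAND are assumptions of this edition.

```python
import random

def run_lcm(rules, tape, state="s0", steps=100):
    """Minimal 'Logical Computing Machine': at each step the behavior
    depends only on the current state and the one scanned symbol; the
    machine may alter that symbol and move the tape one square."""
    cells = dict(enumerate(tape))      # sparse stand-in for the infinite tape
    head = 0
    for _ in range(steps):
        scanned = cells.get(head, "_")  # blank squares read as "_"
        if (state, scanned) not in rules:
            break                       # no applicable rule: halt
        write, move, state = rules[(state, scanned)]
        cells[head] = write             # alter the scanned symbol
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells))

class ATypeMachine:
    """Sketch of an A-type unorganized machine: many similar two-input
    units (glossed here as NAND), randomly wired, all updated together
    at each synchronizing pulse (a 'moment')."""
    def __init__(self, n_units, seed=0):
        rng = random.Random(seed)
        # largely random construction: each unit draws its two inputs
        # from randomly chosen units
        self.wiring = [(rng.randrange(n_units), rng.randrange(n_units))
                       for _ in range(n_units)]
        self.states = [rng.randrange(2) for _ in range(n_units)]

    def pulse(self):
        # one moment: every unit becomes the NAND of its two input units
        self.states = [1 - (self.states[a] & self.states[b])
                       for a, b in self.wiring]
        return self.states

# a hypothetical two-rule table: overwrite every square with "1", moving right
rules = {("s0", "0"): ("1", "R", "s0"),
         ("s0", "1"): ("1", "R", "s0")}
print(run_lcm(rules, "0011"))  # -> 1111
```

A Universal LCM, in this picture, would simply be one particular `rules` table that interprets the description of some other machine written on its own tape; as Turing notes, "programming" then replaces the building of new machines.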
The so-called A-type unorganized machines are considered very interesting because they are the simplest model of a nervous system with a random arrangement of neurons (cf. section 2, "Brains as Unorganized and Organized Machines," below).

• Paper Machines. "It is possible to produce the effect of a computing machine by writing down a set of rules of procedure and asking a man to carry them out. […] A man provided with paper, pencil and rubber, and subject to strict discipline, is in effect a universal machine" (p. 9). Turing calls this kind of machine a "Paper Machine."

1.2 Continuous, Discrete, and Active Machines
The machines described above are all discrete machines, because it is possible to describe their possible states as a discrete set, with the motion of the machines occurring by jumping from one state to another. Turing remarks that all machinery can be regarded as continuous (where the states form a continuous manifold and the behavior of the machine is described by a curve on this manifold), but "when it is possible to regard it as discrete it is usually best to do so." Moreover, machineries are called "controlling" if they only deal with information, and "active" if they aim at producing some definite physical effect. A bulldozer will be a continuous and active machine, a telephone continuous and controlling. But brains too can be considered machines, and they are—Turing says "probably"—continuous and controlling, but "very similar to much discrete machinery" (p. 5). Turing maintains that

Brains very nearly fall into this class [discrete controlling—when it is natural to describe its possible states as a discrete set] and there seems every reason to believe that they could have been made to fall genuinely into it without any change in their essential properties. However, the property of being "discrete" is only an advantage for the theoretical investigator, and serves no evolutionary purpose, so we could not expect Nature to assist us by producing truly "discrete brains." (p. 6)

Brains can be treated as machines, and they can also be considered discrete machines. The epistemological reason is clear: this is just an advantage for the "theoretical investigator" who aims at knowing what intelligent machines are, but it would certainly not be an evolutionary advantage. "Real" human brains are of course continuous systems; only "theoretically" can they be treated as discrete. Following Turing's perspective we have derived two new achievements about machines and intelligence: brains can be considered machines, and the simplest nervous systems with a random arrangement of neurons can be considered unorganized machines—in both cases with the property of being "discrete."

1.3 Mimicking Human Education
Turing adds:

The types of machine that we have considered so far are mainly ones that are allowed to continue in their own way for indefinite periods without interference from outside. The universal machines were an exception to this, in that from time to time one might change the description of the machine which is being imitated. We shall now consider machines in which such interference is the rule rather than the exception. (p. 11)

Screwdriver interference occurs when parts of the machine are removed and replaced with others, giving rise to completely new machines. Paper interference occurs when mere communication of information to the machine modifies its behavior. It is clear that in the case of the universal machine, paper interference can be as useful as screwdriver interference: we are interested in this kind of interference. We can say that each time an interference occurs the machine is probably changed. It has to be noted that paper interference provides information that is both "external" and "material" (further considerations on the status of this information are given below in section 5).

Turing thought that the fact that human beings have made machinery able to imitate any small part of a human being was a positive reason to believe in the possibility of building thinking machinery: trivial examples are the microphone for the ear and the television camera for the eye. What about the nervous system? We can copy the behavior of nerves with suitable electrical models, and the electrical circuits which are used in electronic computing machinery seem to have the essential properties of nerves, because they are able to transmit information and to store it.

Education in human beings can model the "education of machinery": "Mimicking education, we should hope to modify the machine until it could be relied on to produce definite reactions to certain commands" (p. 14). A graduate has had interactions with other human beings for twenty years or more, and at the end of this period "a large number of standard routines will have been superimposed on the original pattern of his brain" [ibid.]. Turing maintains that

1) in human beings the interaction is mainly with other humans, and the receiving of visual and other stimuli constitutes the main form of interference;
2) it is only when a human being is "concentrating" that s/he approximates a machine without interference;
3) even when a human being is concentrating, his behavior is mainly conditioned by previous interference.

2 Brains as Unorganized and Organized Machines

2.1 The Infant Cortex as an Unorganized Machine
In many unorganized machines, when a configuration2 is reached and possible interference is suitably constrained, the machine behaves as one organized (and even universal) machine for a definite purpose. Turing provides the example of a B-type unorganized machine with sufficient units in which we can find particular initial conditions able to make it a universal machine, also endowed with a given storage capacity. The setup of these initial conditions is called "organizing the machine," and it is indeed seen as a kind of "modification" of a preexisting unorganized machine through external interference.

The infant brain can be considered an unorganized machine. Given the analogy previously established (cf. subsection 1.1 above, "Logical, Practical, Unorganized, and Paper Machines"), what are the events that modify it into an organized universal brain/machine? "The cortex of an infant is an unorganized machinery, which can be organized by suitable interference training. The organization might result in the modification of the machine into a universal machine or something like it. […] This picture of the cortex as an unorganized machinery is very satisfactory from the point of view of evolution and genetics" (p. 16). The presence of a human cortex is not meaningful in itself: "[…] the possession of a human cortex (say) would be virtually useless if no attempt was made to organize it. Thus if a wolf by a mutation acquired a human cortex there is little reason to believe that he would have any selective advantage" [ibid.]. Indeed, the exploitation of a big cortex (that is, its possible organization) requires a suitable environment: "If however the mutation occurred in a milieu where speech had developed


(parrot-like wolves), and if the mutation by chance had well permeated a small community, then some selective advantage might be felt. It would then be possible to pass information on from generation to generation" [ibid.].

Hence, organizing human brains into universal machines strongly relates to the presence of

1. speech (even if only at the rudimentary level of parrot-like wolves), and
2. a social setting where some "techniques" are learnt ("the isolated man does not develop any intellectual power. It is necessary for him to be immersed in an environment of other men, whose techniques he absorbs during the first twenty years of his life. He may then perhaps do a little research of his own and make a very few discoveries which are passed on to other men. From this point of view the search for new techniques must be regarded as carried out by human community as a whole, rather than by individuals") (p. 23).

This means that a big cortex can provide an evolutionary advantage only in the presence of that massive storage of information and knowledge on external supports that only an already developed small community can possess. Turing himself considers this picture rather speculative, but evidence from paleoanthropology can support it, as I will describe in the following section.

Moreover, the training of a human child depends on a system of rewards and punishments, which suggests that organization can occur only through two inputs. The example is given of an unorganized P-type machine, which can be regarded as an LCM without a tape and is largely incompletely described. Through suitable stimuli of pleasure and pain (and the provision of an external memory) the P-type machine can become a universal machine (p. 20).

When the infant brain is transformed into an intelligent one, both discipline and initiative are acquired: "to convert a brain or machine into a universal machine is the extremest form of discipline. […] But discipline is certainly not enough in itself to produce intelligence. That which is required in addition we call initiative. […] Our task is to discover the nature of this residue as it occurs in man, and try and copy it in machines" (p. 21). Examples of problems requiring initiative are the following: "Find a number n such that…", "see if you can find a way of calculating the function which will enable us to obtain the values for arguments…". The problem is equivalent to that of finding a program to put on the machine in question.

We have seen how a brain can be "organized"; but how does that brain relate to the idea of the "mimetic mind"?

3 From the Prehistoric Brains to the Universal Machines
I have said that a big cortex can provide an evolutionary advantage only in the presence of a massive storage of information and knowledge on external supports that only an already developed small community of human beings can possess. Evidence from paleoanthropology seems to support this perspective. Some research in cognitive paleoanthropology teaches us that high-level and reflective consciousness, in terms of thoughts about our own thoughts and about our feelings (that is, consciousness not merely considered as raw sensation), is intertwined with the development of modern language (speech) and material culture. After 250,000 years ago several hominid species had brains as large as ours today, but their behavior lacked any sign of art or symbolic behavior. If we consider high-level consciousness as related to a high-level organization—in Turing's sense—of the human cortex, its origins can be related to the active role of environmental, social, linguistic, and cultural aspects.

Handaxes were made by Early Humans and first appeared 1.4 million years ago; they were still being made by some of the Neanderthals in Europe just 50,000 years ago. The making of handaxes is strictly intertwined with the development of consciousness. Many of the needed capabilities constitute part of an evolved psychology that appeared long before the first handaxes were manufactured. Consequently, it seems humans were pre-adapted for some components required to make handaxes (Mithen 1996, 1999):

1. imposition of symmetry (already evolved through predator escape and social interaction). It has been an unintentional by-product of the bifacial knapping technique but was also deliberately imposed in other cases. It is also well known that the attention to symmetry may have developed through social interaction and predator escape, as it may allow one to recognize that one is being directly stared at (Dennett 1991). It seems that "Hominid handaxes makers may have been keying into this attraction to symmetry when producing tools to attract the attention of other hominids, especially those of the opposite sex" (Mithen 1999, 287);

2. understanding fracture dynamics (evident, for example, from Oldowan tools and from nut cracking by chimpanzees today);

3. ability to plan ahead (modifying plans and reacting to contingencies, such as unexpected flaws in the material and miss-hits), still evident in the minds of Oldowan tool makers and in chimpanzees;

4. a high degree of sensory-motor control. The origin of this capability is usually traced back to encephalization—the increased number of nerve tracts and of the integration between them allows for the firing of smaller muscle groups—and bipedalism—which requires a more complex, integrated, highly fractionated nervous system, which in turn presupposes a larger brain.

The combination of these four resources produced the birth of what Mithen calls the technical intelligence of the early human mind, which is consequently related to the construction of handaxes. Indeed, handaxes indicate high intelligence and good health. They cannot be compared to the artefacts made by animals, like the honeycomb or the spider web, which derive from the iteration of fixed actions that do not require consciousness and intelligence.

3.1 Private Speech and Fleeting Consciousness
Two central factors play a fundamental role in the combination of the four resources above:

• the exploitation of private speech (speaking to oneself) to trail between planning, fracture dynamics, motor control, and symmetry (in children, too, there is a kind of private muttering which makes explicit what is implicit in the various abilities);
• a good degree of fleeting consciousness (thoughts about thoughts).

In the meantime these two aspects played a fundamental role in the development of consciousness and thought:

So my argument is that when our ancestors made handaxes there were private mutterings accompanying the crack of stone against stone. Those private mutterings were instrumental in pulling the knowledge required for handaxes manufacture into an emergent consciousness. But what type of consciousness? I think probably one that was a fleeting one: one that existed during the act of manufacture and that did not endure. One quite unlike the consciousness about one's emotions, feelings, and desires that were associated with the social world and that probably were part of a

completely separated cognitive domain, that of social intelligence, in the early human mind. (p. 288)

This use of private speech can certainly be considered a "tool" for organizing brains, and so for manipulating, expanding, and exploring minds—a tool that probably evolved together with another: talking to each other. Both private and public language act as tools for thought and play a central role in the evolution of consciousness.

3.2 Material Culture
Another tool appeared in the latter stages of human evolution that played a great role in the evolution of minds into hybrid mimetic minds, that is, in a further organization of human brains. Handaxes are at the birth of material culture, so that new cognitive chances can co-evolve:

• the minds of some early humans, like the Neanderthals, were constituted by relatively isolated cognitive domains Mithen calls different intelligences, probably endowed with different degrees of consciousness about the thoughts and knowledge within each domain (natural history intelligence, technical intelligence, social intelligence). These isolated cognitive domains became integrated, also taking advantage of the role of public language;
• degrees of high-level consciousness appear; human beings need thoughts about thoughts;
• social intelligence and public language arise.

It is extremely important to stress that material culture is not just the product of this massive cognitive chance but also the cause of it. "The clever trick that humans learnt was to disembody their minds into the material world around them: a linguistic utterance might be considered as a disembodied thought. But such utterances last just for a few seconds. Material culture endures" (p. 291).

In this perspective we acknowledge that material artefacts are tools for thought, as is language: tools for exploring, expanding, and manipulating our own minds. In this regard the evolution of culture is inextricably linked with the evolution of consciousness and thought.

The early human brain becomes a kind of universal "intelligent" machine, extremely flexible, so that we no longer need different "separated" intelligent machines doing different jobs. A single one suffices. As the engineering problem of producing various machines for various jobs is replaced by the office work of "programming" the universal machine to do these jobs, so the different intelligences become integrated in a new universal device endowed with a high-level type of consciousness.

From this perspective the expansion of the minds is at the same time a continuous process of disembodiment of the minds themselves into the material world around them. In this regard the evolution of the mind is inextricably linked with the evolution of large, integrated, material cognitive systems. In the following sections I will illustrate this extraordinary hybrid interplay between human brains and the cognitive systems they make, which is at the origin of the first interesting features of the modern human mind.

4 The Disembodiment of Mind and the Birth of Hybrid Minds
A wonderful example of the cognitive effects of the disembodiment of mind is the carving of what is most likely a mythical being from the last ice age, 30,000 years ago: a half human/half lion figure carved from mammoth ivory found at Hohlenstein Stadel, Germany.

An evolved mind is unlikely to have a natural home for this being, as such entities do not exist in the natural world: so whereas evolved minds could think about humans by exploiting modules shaped by natural selection, and about lions by deploying content rich mental modules moulded by natural selection and about other lions by using other content rich modules from the natural history cognitive domain, how could one think about entities that were part human and part animal? Such entities had no home in the mind. (p. 291)

A mind consisting of different "separated intelligences" (for instance, "thinking about animals" as separated from "thinking about people") cannot come up with such an entity. The only way is to extend the mind into the material world, exploiting rocks, blackboards, paper, and ivory, and writing, painting, and carving: "artefacts such as this figure play the role of anchors for ideas and have no natural home within the mind; for ideas that take us beyond those that natural selection could enable us to possess" (p. 291).

In the case of our figure we are faced with an anthropomorphic thinking created by the material representation serving to anchor the cognitive representation of a supernatural being. In this case the material culture disembodies thoughts that would otherwise soon disappear without being transmitted to other human beings. The early human mind possessed two separated intelligences for thinking about animals and people. Through the mediation of material culture the modern human mind can creatively arrive at "internally" thinking about animals and people at the same time.

Artefacts as external objects allowed humans to loosen and cut those chains on our unorganized brains imposed by our evolutionary past—chains that always limited the brains of other human beings, such as the Neanderthals. Loosening chains and securing ideas to external objects was a way to re-organize brains as universal machines for thinking. Still important in human reasoning and in computational modeling is the role of external representations and mediators. I have devoted part of my research to illustrating their role at the epistemological and ethical level (Magnani 2001a, 2007, 2009).

5 Mimetic and Creative Representations

5.1 External and Internal Representations
We have said that through the mediation of material culture the modern human mind can creatively arrive at internally thinking about animals and people at the same time. We can also account for this process of disembodiment from an interesting cognitive point of view. I maintain that representations can be external and internal. We can say that

• external representations are formed by external materials that express (through reification) concepts and problems that do not have a natural home in the brain;
• internalized representations are internal re-projections, a kind of recapitulation (learning) of external representations in terms of neural patterns of activation in the brain. They can sometimes be "internally" manipulated like external objects and can originate internal reconstructed representations through the neural activity of transformation and integration.

This process explains why human beings seem to perform both computations of a connectionist type such as the ones involving representations as


• (I Level) patterns of neural activation that arise usually inert from the semiotic point of view, can be transformed as the result of the interaction between body and into what I have called “epistemic mediators” (Magnani 2001a, environment—that interaction that is extremely 2002) that give rise—for instance, in the case of scientific fruitful for creative results—(and suitably shaped reasoning—to new signs, new chances for interpretants, and by the evolution and the individual history): pattern new interpretations. completion or image recognition, and computations We can cognitively account for the process of that use representations as disembodiment of mind taking advantage of the concept of • (II Level) derived combinatorial syntax and manipulative abduction. It happens when we are thinking semantics dynamically shaped by the various through doing and not only, in a pragmatic sense, about doing. external representations and reasoning devices found It happens, for instance, when we are creating geometry or constructed in the environment (for example, constructing and manipulating an external, suitably realized, geometrical diagrams in mathematical creativity); icon like a triangle looking for new meaningful features of it, they are neurologically represented contingently as like in the case given by Kant in the “Transcendental Doctrine of patterns of neural activations that “sometimes” tend Method” (cf. Magnani 2001b; cf. also the following subsection). to become stabilized structures and to fix and so to It refers to an extra-theoretical behavior that aims at creating permanently belong to the I Level above. communicable accounts of new experiences to integrate them into previously existing systems of experimental and linguistic The I Level originates those sensations (they constitute a (semantic) practices. 
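The pattern completion attributed to the I Level can be given a concrete, if deliberately toy-sized, computational form. The sketch below is my own illustration, not the author's model: it assumes a Hopfield-style network as a stand-in for "patterns of neural activation," with one arbitrarily chosen stored pattern. A corrupted cue settles back into the stabilized pattern, mirroring the idea that fixed internal records can be re-evoked without the full external stimulus.

```python
def train_hopfield(patterns):
    """Hebbian storage: superimpose outer products of the stored patterns."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / len(patterns)
    return W

def complete(W, cue, steps=5):
    """Synchronously update every unit; a partial cue settles into a stored pattern."""
    s = list(cue)
    for _ in range(steps):
        s = [1 if sum(W[i][j] * s[j] for j in range(len(s))) >= 0 else -1
             for i in range(len(s))]
    return s

stored = [1, 1, -1, -1, 1, -1, 1, 1]          # an arbitrary +1/-1 activation pattern
W = train_hopfield([stored])
cue = [-stored[0], -stored[1]] + stored[2:]   # the same pattern with two units corrupted
print(complete(W, cue))                       # settles back to the stored pattern
```

Hebbian storage makes each stored pattern a fixed point of the update rule, which is one simple way of modeling the "stabilized structures" mentioned above.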
kind of “face” we think the world has) that provide room for the II Level to reflect the structure of the environment, and, most Gooding (1990) refers to this kind of concrete manipulative important, that can follow the computations suggested by these reasoning when he illustrates the role in science of the so-called external structures. It is clear we can now conclude that the “construals” that embody tacit inferences in procedures that growth of the brain and especially the synaptic and dendritic are often apparatus and machine based. The embodiment is of growth are profoundly determined by the environment. course an expert manipulation of meaningful semiotic objects in a highly constrained experimental environment, and is directed When the fixation is reached the patterns of neural activation by abductive movements that imply the strategic application no longer need a direct stimulus from the environment for their of old and new templates of behavior mainly connected with construction. In a certain sense they can be viewed as fixed extra-rational components, for instance, emotional, esthetical, internal records of external structures that can exist also in the ethical, and economic. absence of such external structures. These patterns of neural activation that constitute the I Level Representations always The hypothetical character of construals is clear: they keep record of the experience that generated them and, thus, can be developed to examine or discard further chances, they always carry the II Level Representation associated to them, are provisional creative organization of experience and some even if in a different form, the form of memory and not the of them become in their turn hypothetical interpretations of form of a vivid sensorial experience. 
Now, the human agent, via experience, that is, more theory-oriented, their reference/ neural mechanisms, can retrieve these II Level Representations meaning is gradually stabilized in terms of established and use them as internal representations or use parts of them to observational practices. Step by step the new interpretation— construct new internal representations very different from the that at the beginning is completely “practice-laden”—relates to ones stored in memory (cf. also Gatti and Magnani 2005). more “theoretical” modes of understanding (narrative, visual, diagrammatic, symbolic, conceptual, simulative), closer to Human beings delegate cognitive features to external the constructive effects of theoretical abduction. When the representations because in many problem solving situations the reference/meaning is stabilized the effects of incommensurability internal computation would be impossible or it would involve a with other established observations can become evident. But it very great effort because of the human mind’s limited capacity. is just the construal of certain phenomena that can be shared First, a kind of alienation is performed; second, a recapitulation by the sustainers of rival theories. Gooding (1990) shows how is accomplished at the neuronal level by re-representing Davy and Faraday could see the same attractive and repulsive internally that which was “discovered” outside. Consequently, actions at work in the phenomena they respectively produced; only later on we perform cognitive operations on the structure their discourse and practice as to the role of their construals of data that synaptic patterns have “picked up” in an analogical of phenomena clearly demonstrate they did not inhabit way from the environment. Internal representations used in different, incommensurable worlds in some cases. 
Moreover, cognitive processes have a deep origin in the experience lived the experience is constructed, reconstructed, and distributed in the environment. across a social network of negotiations among the different I think there are two kinds of artefacts that play the role scientists by means of construals. of external objects (representations) active in this process of It is difficult to establish a list of invariant behaviors that disembodiment of the mind: creative and mimetic. Mimetic are able to describe manipulative abduction in science. As external representations mirror concepts and problems that illustrated above, certainly the expert manipulation of objects are already represented in the brain and need to be enhanced, in a highly semiotically constrained experimental environment solved, further complicated, etc., so they sometimes can give 3 implies the application of old and new templates of behavior rise to new concepts, models, and perspectives. that exhibit some regularities. The activity of building construals Following my perspective it is at this point evident that the is highly conjectural and not immediately explanatory: these hybrid “mind” transcends the boundary of the individual and templates are hypotheses of behavior (creative or already includes parts of that individual’s environment. cognitively present in the scientist’s mind-body system, and 6 Constructing Meaning through Mimetic and sometimes already applied) that abductively enable a kind of epistemic “doing.” Hence, some templates of action and Creative External Representations manipulation can be selected in the set of the ones available and 6.1 Disembodiment of the Mind: Constructing Meaning through pre-stored, others have to be created for the first time to perform Manipulative Abduction the most interesting creative cognitive accomplishments of Manipulative abduction occurs when many external things, manipulative abduction.

— 43 — — APA Newsletter, Spring 2010, Volume 09, Number 2 —

Moreover, I think that a better understanding of manipulative information and knowledge. Therefore, manipulative abduction abduction at the level of scientific experiment could improve represents a kind of redistribution of the epistemic and cognitive our knowledge of induction, and its distinction from abduction: effort to manage objects and information that cannot be manipulative abduction could be considered as a kind of basis immediately represented or found internally (for example, for further meaningful inductive generalizations. Different exploiting the resources of visual imagery).6 generated construals can give rise to different inductive If we see scientific discovery like a kind of opportunistic generalizations. ability of integrating information from many kinds of Some common features of these tacit templates that enable simultaneous constraints to produce explanatory hypotheses us to manipulate things and experiments in science to favor that account for them all, then manipulative abduction will meaning formation are related to: 1) sensibility towards the play the role of eliciting possible hidden constraints by building aspects of the phenomenon which can be regarded as curious external suitable experimental structures. or anomalous; manipulations have to be able to introduce 6.2 Manipulating Meanings through External Semiotic potential inconsistencies in the received knowledge (Oersted’s Anchors report of his well-known experiment about electromagnetism is devoted to describe some anomalous aspects that did not If the structures of the environment play such an important depend on any particular theory of the nature of electricity role in shaping our semiotic representations and, hence, our and magnetism; Ampère’s construal of experiment on cognitive processes, we can expect that physical manipulations electromagnetism—exploiting an artifactual apparatus to of the environment receive a cognitive relevance. 
produce a static equilibrium of a suspended helix that clearly Several authors have pointed out the role that physical shows the role of the “unexpected”); 2) preliminary sensibility actions can have at a cognitive level. In this sense Kirsh and towards the dynamic character of the phenomenon, and not Maglio (1994) distinguish actions into two categories, namely, to entities and their properties, common aim of manipulations pragmatic actions and epistemic actions. Pragmatic actions is to practically reorder the dynamic sequence of events in a are the actions that an agent performs in the environment in static spatial one that should promote a subsequent bird’s- order to bring itself physically closer to a goal. In this case the eye view (narrative or visual-diagrammatic); 3) referral to action modifies the environment so that the latter acquires experimental manipulations that exploit artificial apparatus to a configuration that helps the agent to reach a goal which is free new possibly stable and repeatable sources of information understood as physical, that is, as a desired state of affairs. about hidden knowledge and constraints (Davy’s well-known Epistemic actions are the actions that an agent performs in set-up in terms of an artifactual tower of needles showed a semiotic environment in order to discharge the mind of a that magnetization was related to orientation and does not cognitive load or to extract information that is hidden or that require physical contact). Of course, this information is not would be very hard to obtain only by internal computation. artificially made by us: the fact that phenomena are made In this subsection I want to focus specifically on the and manipulated does not render them to be idealistically relationship that can exist between manipulations of the and subjectively determined; 4) various contingent ways of environment and representations. 
In particular, I want to epistemic acting: looking from different perspectives, checking examine whether external manipulations can be considered the different information available, comparing subsequent as means to construct external representations. events, choosing, discarding, imaging further manipulations, If a manipulative action performed upon the environment re-ordering and changing relationships in the world by implicitly is devoted to create a configuration of signs that carries relevant evaluating the usefulness of a new order (for instance, to help information, that action will well be able to be considered memory). as a cognitive semiotic process and the configuration of From the general point of view of everyday situations elements it creates will well be able to be considered an manipulative exhibits other very interesting external representation. In this case, we can really speak of an templates: 5) action elaborates a simplification of the reasoning embodied cognitive process in which an action constructs an task and a redistribution of effort across time when we “need external representation by means of manipulation. We define to manipulate concrete things in order to understand structures cognitive manipulating as any manipulation of the environment which are otherwise too abstract” (Piaget 1974), or when we are devoted to constructing external configurations that can count in the presence of redundant and unmanageable information; 6) as representations. action can be useful in presence of incomplete or inconsistent An example of cognitive manipulating is the diagrammatic information—not only from the “perceptual” point of view—or demonstration illustrated in Figure 2, taken from the field of of a diminished capacity to act upon the world: it is used to geometry. 
In this case a simple manipulation of the triangle get more data to restore coherence and to improve deficient in Figure 2(a) gives rise to an external configuration—Figure knowledge; 7) action as a control of sense data illustrates 2(b)—that carries relevant semiotic information about the how we can change the position of our body (and/or of the internal angles of a triangle “anchoring” new meanings. external objects) and how to exploit various kinds of prostheses (Galileo’s telescope, technological instruments, and interfaces) to get various new kinds of stimulation: action provides some tactile and visual information (e.g., in surgery), otherwise unavailable; 8) action enables us to build external artifactual models of task mechanisms instead of the corresponding internal ones, that are adequate to adapt the environment to the agent’s needs: experimental manipulations exploit artificial Figure 2. Diagrammatic demonstration that the apparatus to free new possible stable and repeatable sources sum of the internal angles of any triangle is 180°. (a) of information about hidden knowledge and constraints.4 Triangle. (b) Diagrammatic manipulations. The whole activity of manipulation is devoted to build various external epistemic mediators5 that function as versatile The entire process through which an agent arrives at a semiotic tools able to provide an enormous new source of physical action that can count as cognitive manipulating can be

— 44 — — Philosophy and Computers — understood by means of the concept of manipulative abduction schemata” [Peirce, CP, 4.233]; moreover, he uses diagrammatic (Magnani 2001a, 2009). Manipulative abduction is a specific and schematic as synonyms, thus relating his considerations case of cognitive manipulating in which an agent, when faced to the Kantian tradition where schemata mediate between with an external situation from which it is hard or impossible to intellect and phenomena.8 The following is the famous passage extract new meaningful features of an object, selects or creates in the Critique of Pure Reason (“Transcendental Doctrine of an action that structures the environment in such a way that it Method”): gives information that would be otherwise unavailable and that Suppose a philosopher be given the concept of a is used specifically to infer explanatory hypotheses. triangle and he be left to find out, in his own way, what In this way the semiotic result is achieved on external relation the sum of its angles bears to a right angle. representations used in lieu of the internal ones. Here action He has nothing but the concept of a figure enclosed performs an epistemic and not a merely performatory role, for by three straight lines, and possessing three angles. example, relevant to abductive reasoning. However long he meditates on this concept, he will 6.3 Geometrical Construction is Manipulative Abduction never produce anything new. He can analyse and Let’s quote Peirce’s passage about mathematical constructions. clarify the concept of a straight line or of an angle or Peirce says that mathematical and geometrical reasoning of the number three, but he can never arrive at any “consists in constructing a diagram according to a general properties not already contained in these concepts. precept, in observing certain relations between parts of that Now let the geometrician take up these questions. 
diagram not explicitly required by the precept, showing that He at once begins by constructing a triangle. Since these relations will hold for all such diagrams, and in formulating he knows that the sum of two right angles is exactly this conclusion in general terms. All valid necessary reasoning equal to the sum of all the adjacent angles which is in fact thus diagrammatic” (Peirce, CP, 1.54). This passage can be constructed from a single point on a straight clearly refers to a situation like the one I have illustrated in the line, he prolongs one side of his triangle and obtains previous section. This kind of reasoning is also called by Peirce two adjacent angles, which together are equal to “theorematic” and it is a kind of “deduction” necessary to derive two right angles. He then divides the external angle significant theorems: “[…] is one which, having represented by drawing a line parallel to the opposite side of the the conditions of the conclusion in a diagram, performs an triangle, and observes that he has thus obtained an ingenious experiment upon the diagram, and by observation external adjacent angle which is equal to an internal 9 of the diagram, so modified, ascertains the truth of the angle—and so on. In this fashion, through a chain of conclusion” (Peirce, CP, 2.267). The experiment is performed inferences guided throughout by intuition, he arrives with the help of “imagination upon the image of the premiss at a fully evident and universally valid solution of the in order from the result of such experiment to make corollarial problem. (Kant 1929, A716-B744, pp. 578-79) deductions to the truth of the conclusion” (Peirce, 1976, IV, p. As we have already said for Peirce the whole mathematics 38). 
The “corollarial” reasoning is mechanical (Peirce thinks it consists in building diagrams that are “[…] (continuous in can be performed by a “logical machine”) and not creative, “A geometry and arrays of repeated signs/letters in algebra) Corollarial Deduction is one which represents the condition of according to general precepts and then [in] observing in the conclusion in a diagram and finds from the observation of the parts of these diagrams relations not explicitly required this diagram, as it is, the truth of the conclusion” (Peirce, CP, in the precepts” [Peirce, CP, 1.54]. Peirce contends that this 2.267, cf. also Hoffmann 1999). diagrammatic nature is not clear if we only consider syllogistic In summary, the point of theorematic reasoning is the reasoning, “which may be produced by a machine” but transformation of the problem by establishing an unnoticed becomes extremely clear in the case of the “logic of relatives, point of view to get interesting—and possibly new—insights. where any premise whatever will yield an endless series of The demonstrations of theorems in mathematics are examples conclusions, and attention has to be directed to the particular of theorematic deduction. kind of conclusion desired” (Peirce 1986, 11-23). Not dissimilarly Kant says that in geometrical construction In ordinary geometrical proofs auxiliary constructions of external diagrams, “[…] I must not restrict my attention to are present in terms of “conveniently chosen” figures and what I am actually thinking in my concept of a triangle (this is diagrams where strategic moves are important aspects of nothing more than the mere definition); I must pass beyond it deduction. The system of reasoning exhibits a dual character: to properties which are not contained in this concept, but yet deductive and “hypothetical.” Also in other—for example, belong to it” (Kant 1929, A718-B746, p. 580). 
logical—deductive frameworks there is room for strategical We have seen that manipulative abduction is a kind of moves which play a fundamental role in the generations of abduction, usually model-based, that exploits external models proofs. These strategical moves correspond to particular forms endowed with delegated (and often implicit) cognitive and of abductive reasoning. semiotic roles and attributes. We know that the kind of reasoned inference that is involved 1. The model (diagram) is external and the strategy that in creative abduction goes beyond the mere relationship that organizes the manipulations is unknown a priori. there is between premises and conclusions in valid deductions, 2. The result achieved is new (if we, for instance, refer where the truth of the premises guarantees the truth of the conclusions, but also beyond the relationship that there is to the constructions of the first creators of geometry), in probabilistic reasoning, which renders the conclusion just and adds properties not contained before in the more or less probable. On the contrary, we have to see creative concept (the Kantian to “pass beyond” or “advance abduction as formed by the application of heuristic procedures beyond” the given concept, Kant 1929, A154-B193/194, that involve all kinds of good and bad inferential actions, and p. 192).7 not only the mechanical application of rules. It is only by means Iconicity in theorematic reasoning is central. Peirce, of these heuristic procedures that the acquisition of new truths analogously to Kant, maintains that “philosophical reasoning is guaranteed. Also, Peirce’s mature view illustrated above on is reasoning with words; while theorematic reasoning, or creative abduction as a kind of inference seems to stress the mathematical reasoning is reasoning with specially constructed strategic component of reasoning.


Many researchers in the field of philosophy, logic, and Concrete manipulations on them can be done, for instance, to cognitive science have sustained that deductive reasoning get new data and cognitive information and/or to simplify the also consists in the employment of logical rules in a heuristic problem at issue (cf. the epistemic templates illustrated above manner, even maintaining the truth preserving character: the in section 6.1). application of the rules is organized in a way that is able to recommend a particular course of actions instead of another 7 Mimetic Minds one. Moreover, very often the heuristic procedures of deductive It is well known that there are external representations that are reasoning are performed by means of a model-based abduction representations of other external representations. In some cases where iconicity is central. We have seen that the most common they carry new scientific knowledge. To make an example, example of creative abduction is the usual experience people Hilbert’s Grundlagen der Geometrie is a “formal” representation have of solving problems in geometry in a model-based way of the geometrical problem solving through diagrams: in trying to devise proofs using diagrams and illustrations: of Hilbertian systems solutions of problems become proofs of course, the attribute of creativity we give to abduction in this theorems in terms of an axiomatic model. In turn a calculator case does not mean that it has never been performed before by is able to re-represent (through an artifact) (and to perform) anyone or that it is original in the history of some knowledge. those geometrical proofs with diagrams already performed Hence, we have to say that theoretical model-based by human beings with pencil and paper. In this case we have abductions—as so iconicity—also operate in deductive representations that mimic particular cognitive performances reasoning. 
Following Hintikka and Remes’s analysis (1974), that we usually attribute to our minds. proofs of general implication in first order logic need the use of We have seen that our brains delegate cognitive (and instantiation rules by which “new” individuals are introduced, epistemic) roles to externalities and then tend to “adopt” and so they are “ampliative.” In ordinary geometrical proofs auxiliary recapitulate what they have checked occurring outside, over constructions are present in term of “conveniently chosen” there, after having manipulated—often with creative results— figures and diagrams. In Beth’s method of semantic tableaux the external invented structured model. A simple example: it is the strategic “ability” to construct impossible configurations is relatively neurologically easy to perform an addition of numbers undeniable (Hintikka 1998; Niiniluoto 1999).10 by depicting in our mind—thanks to that brain device that is This means that also in many forms of deductive reasoning called visual buffer—the images of that addition thought as it there are not only trivial and mechanical methods of making occurs concretely, with paper and pencil, taking advantage of inferences but we have to use models and heuristic procedures external materials. We have said that mind representations are that refer to a whole set of strategic principles. All the more also over there, in the environment, where mind has objectified reason that Bringsjord (1998) stresses his attention on the itself in various structures that mimic and enhance its internal role played by a kind of “model based deduction” that is “part representations. and parcel” of our establishing Gödel’s first incompleteness Turing adds a new structure to this list of external objectified theorem, showing the model-based character of this great devices: an abstract tool (LCM) endowed with powerful abductive achievement of formal thought.11 mimetic properties. 
We have concluded the previous section I think the previous considerations also hold for Peircean remarking that the “mind” is in itself hybrid and extended and, theorematic reasoning: indeed, Peirce further distinguished a so to say, both internal and external: the mind transcends the “corollarial” and a “theoric” part within “theorematic reasoning,” boundary of the individual and includes parts of that individual’s and connects theoric aspects to abduction (Hoffmann 1999, environment. Turing’s LCM, which is an externalized device, is 293). Of course, as already stressed, we have to remember able to mimic human cognitive operations that occur in that this abductive aspect of mathematical reasoning is not in itself interplay between the internal mind and the external one. creative. It can be performed both in creative (to find new Indeed, Turing already in 1950 maintains that, taking advantage theorems and mathematical hypotheses) and non-creative of the existence of the LCM, “Digital computers […] can be (merely “selective”) ways, for example, in the case we are constructed, and indeed have been constructed, and […] using diagrams to demonstrate already known theorems (for they can in fact mimic the actions of a human computer very instance, in didactic settings), where selecting the strategy of closely” (Turing 1950). manipulations is among chances not necessarily unknown and In the light of my perspective both (Universal) Logical the result is not new. With respect to abduction in empirical Computing Machine (LCM) (the theoretical artifact) and sciences abduction in mathematics aims at hypothesizing (Universal) Practical Computing Machine (PCM) (the practical ideal objects, which later we can possibly insert in a deductive artifact) are mimetic minds because they are able to mimic the apodictic and truth preserving framework. 
mind in a kind of universal way (wonderfully continuing the The example of diagrams in geometry furnishes a semiotic activity of disembodiment of minds our ancestors rudimentary and epistemological example of the nature of the cognitive started). LCM and PCM are able to re-represent and perform in a interplay between internal neuronal representations (and very powerful way plenty of cognitive skills of human beings. embodied “cognitive” kinesthetic abilities) and external Universal Turing machines are discrete-state machines, representations I have illustrated above: also for Peirce, more DMS, “with a Laplacian behavior” (Longo 2002; Lassègue 1998, than a century before the new ideas derived from the field of 2002): “it is always possible to predict all future states”) and distributed reasoning, the two aspects are intertwined in the they are equivalent to all formalisms for computability (what pragmatic and semiotic view, going beyond the rigidity of the is thinkable is calculable and mechanizable), and because Kantian approach in terms of schematism. Diagrams are icons universal they are able to simulate—that is to mimic—any that take material and semiotic form in an external environment human cognitive function, that is what is usually called mind. endowed with Universal Turing machines are just a further extremely • constraints depending on the specific cognitive fruitful step of the disembodiment of the mind I have described delegation performed by human beings and above. A natural consequence of this perspective is that they do not represent (against classical AI and modern cognitivist • the particular intrinsic constraints of the materiality at computationalism) a “knowledge” of the mind and of human play. intelligence. Turing is perfectly aware of the fact that brain is not

a DSM but, as he says, a “continuous” system, where instead mathematical modeling can guarantee a satisfactory scientific intelligibility (cf. his studies on morphogenesis).

We have seen that our brains delegate cognitive (and epistemic) roles to externalities and then tend to “adopt” what they have checked occurring outside, over there, in the external invented structured model.

Our view about the disembodiment of the mind certainly involves that the Mind/Body dualist view is less credible, as well as Cartesian computationalism. Also the view that Mind is Computational independently of the physical (functionalism) is jeopardized. In my perspective on human cognition in terms of mimetic minds we no longer need Descartes’ dualism: we only have brains that make up large, integrated, material cognitive systems like, for example, LCMs and PCMs. The only problem seems “How meat knows”: we can reverse the Cartesian motto and say “sum ergo cogito.” In this perspective what we usually call mind simply consists in the union of both the changing neural configurations of brains together with those large, integrated, and material cognitive systems the brains themselves are continuously building.

8 Conclusion

The main thesis of this paper is that the disembodiment of mind is a significant cognitive perspective able to unveil some basic features of creative thinking. Its fertility in explaining the interplay between internal and external levels of cognition is evident. I maintain that various aspects of cognition could take advantage of the research on this interplay: for instance, study on external mediators can provide a better understanding of the processes of explanation and discovery in science and in some areas of artificial intelligence related to mechanizing discovery processes.12 For example, concrete manipulations of external objects influence the generation of hypotheses: what I have called manipulative abduction shows how we can find methods of constructivity in scientific and everyday reasoning based on external models and “epistemic mediators.”

Finally, I think the cognitive role of what I call “mimetic minds” can be further studied also taking advantage of the research on hypercomputation. The imminent construction of new types of universal “abstract” and “practical” machines will constitute important and interesting new “mimetic minds” externalized and available over there, in the environment, as sources of mechanisms underlying the emergence of new meaning processes. They will provide new tools for meaning formation in classical areas like analogical, visual, and spatial inferences, both in science and everyday situations, so that this can extend the epistemological and the psychological theory.

References

Agre, P. & Chapman, D. 1990. What are plans for? In Designing Autonomous Agents, ed. P. Maes. 17-34. Cambridge, MA: MIT Press.

Aliseda, A. 1997. Seeking Explanations: Abduction in Logic, Philosophy of Science and Artificial Intelligence.

Dennett, D. 1991. Consciousness Explained. New York: Little, Brown, and Company.

Gatti, A. and Magnani, L. 2005. On the representational role of the environment and on the cognitive nature of manipulations. In Computing, Philosophy, and Cognition, ed. L. Magnani and R. Dossena. 227-42. Proceedings of the European Conference of Computing and Philosophy, Pavia, Italy, 3-4 June 2004.

Gooding, D. 1990. Experiment and the Making of Meaning. Dordrecht: Kluwer.

Hameroff, S. R., Kaszniak, A. W. and Chalmers, D. J. (Eds.). 1999. Toward a Science of Consciousness III. The Third Tucson Discussions and Debates. Cambridge, MA: MIT Press.

Hintikka, J. 1998. What is abduction? The fundamental problem of contemporary epistemology. Transactions of the Charles S. Peirce Society 34: 503-33.

Hintikka, J. and Remes, U. 1974. The Method of Analysis. Its Geometrical Origin and Its General Significance. Dordrecht: Reidel.

Hoffmann, M. H. G. 1999. Problems with Peirce’s concept of abduction. Foundations of Science 4(3): 271-305.

Hutchins, E. 1995. Cognition in the Wild. Cambridge, MA: MIT Press.

Hutchins, E. 1999. Cognitive artifacts. In Encyclopedia of the Cognitive Sciences, ed. R. A. Wilson & F. C. Keil. 126-27. Cambridge, MA: MIT Press.

Kant, I. 1929. Critique of Pure Reason, trans. N. Kemp Smith. London: MacMillan. Reprint 1998; originally published 1787.

Karmiloff-Smith, A. 1992. Beyond Modularity: A Developmental Perspective on Cognitive Science. Cambridge, MA: MIT Press.

Kirsh, D. & Maglio, P. 1994. On distinguishing epistemic from pragmatic action. Cognitive Science 18: 513-49.

Lassègue, J. 1998. Turing. Paris: Les Belles Lettres.

Lassègue, J. 2002. Turing entre formel et forme; remarque sur la convergence des perspectives morphologiques. Intellectica 35(2): 185-98.

Longo, G. 2002. Laplace, Turing, et la géométrie impossible du “jeu de l’imitation”: aléas, déterminisme et programmes dans le test de Turing. Intellectica 35(2): 131-61.

Magnani, L. 2001a. Abduction, Reason, and Science. Processes of Discovery and Explanation. New York: Kluwer Academic/Plenum Publishers.

Magnani, L. 2001b. Philosophy and Geometry. Theoretical and Historical Issues. Dordrecht: Kluwer Academic.

Magnani, L. 2004. Conjectures and manipulations. Computational modeling and the extra-theoretical dimension of scientific discovery. Minds and Machines 14: 507-37.

Magnani, L. 2007. Morality in a Technological World. Knowledge as a Duty. Cambridge: Cambridge University Press.

Magnani, L. 2009. Abductive Cognition. The Epistemological and Eco-Cognitive Dimensions of Hypothetical Reasoning. Berlin/Heidelberg: Springer.

Magnani, L. and Dossena, R. 2003. Perceiving the infinite and the infinitesimal world: unveiling and optical diagrams and the construction of mathematical concepts. In Proceedings of CogSci2003. CD-ROM produced by X-CD Technologies, Boston, MA.

Magnani, L., Nersessian, N. J. and Pizzi, C. (Eds.). 2002.
Logical and of Science and Artificial Intelligence. PhD Thesis. Amsterdam: Institute Computational Aspects of Model-Based Reasoning, Dordrecht: Kluwer for Logic, Language and Computation. Academic. Aliseda, A. 2006. Abductive Reasoning. Logical Investigations into Mithen, S. 1996. The Prehistory of the Mind. A Search for the Origins of Discovery and Explanation. Berlin: Springer. Art, Religion, and Science. London: Thames and Hudson. Batens, D. 2006. A diagrammatic proof search procedure as part of Mithen, S. 1999. Handaxes and ice age carvings: Hard evidence for the a formal approach to problem solving. In Model-Based Reasoning evolution of consciousness. In Hameroff et al. 1999, pp. 281-96. in Science and Engineering, ed. L. Magnani. 265-84. London: College Niiniluoto, I. 1999. Abduction and geometrical analysis. Notes on Charles Publications. S. Peirce and Edgar Allan Poe. In L. Magnani, N.J. Nersessian, and P. Brooks, R.A. & Stein, L. 1994. Building brains for bodies. Autonomous Thagard (Eds.) 1999, pp. 239-54. Robots 1: 7-25. Peirce, C.S. (1931-1958) (CP). Collected Papers, 8 vols. C. Hartshorne & Clancey, W.J. 2002. Simulating activities: Relating motives, deliberation, P. Weiss (vols. I-VI), (Eds.), & A.W. Burks (vols. VII-VIII) (Ed.) Cambridge, and attentive coordination. Cognitive Systems Research 3(1-4): 471- MA: Harvard University Press. 500.

— 47 — — APA Newsletter, Spring 2010, Volume 09, Number 2 —

Peirce, C.S. 1976. The New Elements of Mathematics by Charles Sanders Peirce. C. Eisele (Ed.) (vols. I-IV). The Hague-Paris/Atlantic Highlands, NJ: Mouton/Humanities Press.

Peirce, C.S. 1986. Historical Perspectives on Peirce's Logic of Science: A History of Science. C. Eisele (Ed.) (vols. I-II). Berlin: Mouton.

Piaget, J. 1974. Adaptation and Intelligence. Chicago, IL: University of Chicago Press.

Turing, A.M. 1937. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society 42: 230-65.

Turing, A.M. 1950. Computing machinery and intelligence. Mind 49: 433-60. Also in Turing 1992a, pp. 133-60.

Turing, A.M. 1969. Intelligent machinery [1948]. In Bernard Meltzer and Donald Michie (Eds.), Machine Intelligence 5: 3-23. Also in Turing 1992a, pp. 3-23.

Turing, A.M. 1992a. Collected Works of Alan Turing: Mechanical Intelligence, ed. Darrel C. Ince. Amsterdam: Elsevier.

Turing, A.M. 1992b. Collected Works of Alan Turing: Morphogenesis, ed. Peter T. Saunders. Amsterdam: Elsevier.

Abstract Data Types and Constructive Emergence

Russ Abbott
California State University, Los Angeles

Abstract

I suggest that the computer science notion of an abstract data type explains many of the issues raised about the phenomenon known as emergence. I discuss that along with downward entailment, an ontology of entities based on energy considerations, the usefulness of supervenience in studying emergence, and the relationship of multiple realization to emergence.

I. The basic questions about emergence

Broadly speaking, emergence refers to the common observation that many of the phenomena we see around us, although part of the material world, seem to resist description in terms of the laws governing the elements and phenomena from which they derive—and especially that they resist description in terms of the fundamental laws of physics.

Can we say what the fundamental philosophical issues are regarding emergence? Much of the Introduction to Bedau and Humphreys (2008) is devoted to formulating what they (presumably) take to be the important philosophical questions. Later in this article I respond to those questions from the perspective that I will be presenting.

I'm not sure, though, that any of the Bedau and Humphreys questions quite get to the fundamental issues. To explain why, I'd like to turn to the Game of Life,1 which has become something of a white mouse for thought experiments about emergence.

If one runs the Game of Life from a random starting configuration, various patterns appear. One of the best known is the glider, a sequence of configurations such as the following.

Figure 1. A glider.

As the illustration shows, after four time steps the glider pattern reconstitutes itself one square to the right and one square down.2 When a glider appears during a Game-of-Life run, it looks like a little creature scurrying across the grid.

The glider is a nice example of emergence. It is a macro-level regularity that arises from the micro-level phenomena of Game-of-Life cells going on and off according to the Game-of-Life rules. It is a macro-level regularity in that it has properties that can be described abstractly at the phenomenal level. For example, a glider moves across the grid; its motion can be characterized by a velocity, i.e., a speed and a direction. These statements characterize a glider as an abstract phenomenon. They do not rely on how gliders are implemented by the Game-of-Life rules. In fact, the Game-of-Life rules do not talk about things moving across the grid at all; they talk only about cells going on and off.

But there is certainly nothing mysterious about how a glider comes to be. As Figure 1 shows, each configuration follows from the preceding according to the Game-of-Life rules.3

So what is the emergence issue? Is it, as Fodor (1997) suggests, why the Game of Life gives rise to any regularities at all? Why isn't every Game-of-Life run just a "buzzing, blooming confusion" of cells going on and off? That seems similar to asking why there is mathematics. Mathematics is the study of (typically unforeseen) regularities. One could just as well ask why there are an infinite number of prime numbers, or why there are more real numbers between 0 and 1 than there are integers, or why every positive integer can be expressed as the sum of 4 or fewer squares—e.g., 23 = 3² + 3² + 2² + 1²—or why "Fermat's last theorem" is true, or why any other mathematical theorem holds.

The mathematical answer to why a theorem holds is given by the proof of the theorem. Similarly the question "Why are there gliders?" is answered by Figure 1, which shows why there are gliders—or at least how gliders come to be.

Given any emergent phenomenon, it seems to me that its how question—how does this phenomenon come to be—is in part (or will be) answered by science.4,5

But if the how question is not the central issue, are there other basic questions? The following are what I take to be the fundamental questions about emergence—along with brief versions of my answers.

a) What—if anything—characterizes what we think of as emergence?

My answer. Emergence pertains to macro phenomena that exhibit regularities. These are phenomena that may be characterized abstractly, i.e., structurally or functionally at the level of the phenomena themselves and without reference to the properties of the phenomena from which they derive. In the case of the glider, for example, part of its abstract characterization is that it has a velocity.

b) Is there a common material basis—besides the details of the answers to the how questions—for why there are emergent phenomena?

My answer. Energy, as characterized by physics, is the common material basis for all material emergent phenomena.6 Some emergent phenomena—like molecules and solar systems—exist in what are known as energy wells. Energy is required to pull them apart—to destroy them. Other emergent phenomena—like biological organisms—require a continual supply of external energy to persist.


Table 1. Table of entity types (from Abbott 2009). Rows give the entity's energy status; columns distinguish naturally occurring from human-designed entities.

Static. At an energy equilibrium; in an "energy well." Supervenience is useful.
  Naturally occurring: atoms, molecules, solar systems, ... Homeostatic mechanisms: lowest energy state.
  Human designed: tables, boats, houses, cars, ships, ... Homeostatic mechanisms: few; generally dependent on "maintenance" processes.

Dynamic. "Far from" an energy equilibrium; must import energy (and usually other resources) to persist. Supervenience is not useful.
  Naturally occurring: hurricanes(!), biological organisms, and biological groups such as ant colonies, ... Homeostatic mechanisms: specialized for individual cases.
  Human designed: social groups such as governments, corporations, clubs, the ship of Theseus(1), ... Homeostatic mechanisms: specialized for individual cases, ranging from force to incentives.

Subsidized. Energy is not relevant since it is provided "for free" within an entity "incubator." Supervenience is not useful.
  Naturally occurring: ideas, concepts, "memes," ...; the elements of a conceptual system. (This paper is not about consciousness. This category just fits here.) Homeostatic mechanisms: we don't understand how concept formation works.
  Human designed: the "first-class" values—such as objects, classes, class instances, etc.—within a computational system. Homeostatic mechanisms: generally not required since there is no natural degradation.

These phenomena require energy to keep themselves together. But these phenomena are organized in such a way that they are able to extract enough energy from their environment to maintain their form and functionality. I call the former class of entities static and the latter class dynamic. See Table 1, and see the section below on emergence and entities for some additional discussion.

c) What makes it possible for there to be macro regularities?

My answer. Emergence results from the imposition of constraints. If it is possible to impose constraints on micro phenomena, it is generally possible to produce macro-level regularities. I don't have good answers to the follow-up questions: What makes it possible to impose constraints, and how might one characterize, in general, the kinds of constraints that can be imposed?

The computer science approach to emergence

In computer science we tend not to use the term emergence. Instead, we think about abstractions and how (or whether) they can be implemented. The term we use for such abstractions is abstract data type. It is my suggestion that once one understands the notion of an abstract data type, questions about emergence are easily answered.

Computer science uses the term type (or data type) for what in philosophy is called a kind. An abstract data type is a type that is characterized exclusively in terms of specified properties. What this means is that the properties of an abstract data type are specified abstractly and not in terms of how the abstract data type may be realized.

Sometimes these properties are expressed as a set of axioms. A nice example is Peano's well-known axioms for the non-negative integers.7

More often the properties of an abstract data type are expressed informally—but clearly enough that there is general agreement about what the specification means. The Turing machine is a nice example. The functionality of a Turing machine is well defined. Yet Turing did not provide a Peano-like set of axioms when he defined a Turing machine. His description was rigorous enough that no one has argued that the specification is unclear or ambiguous.8

It's important to realize that Turing machines and Peano arithmetic are self-contained. Unlike emergent phenomena, they are not defined as "macro" properties that have somehow arisen from some "micro" phenomena. They are simply a set of self-consistent properties and functions. This is generally the case with abstract data types.

Another way to put it is that abstract data types are pure imagination. They enable software designers to introduce new concepts. An abstract data type is an imagined category of things that behaves in an imagined way.

That is not to say that abstract data types are arbitrary. They are defined for specific reasons. Turing defined the Turing machine the way he did—rather than in some other way—because it had properties he thought were important.

More frequently, abstract data types are defined to have properties that are patterned after non-software elements with similar properties. Peano developed his axioms as a way to formalize what he thought were the essential features of the non-negative integers.

For a more commonplace example, consider a word processor. Word processors include a number of familiar abstract data types. Among them are the word, the paragraph, the page, and the footnote. One of the criteria we apply to word processors is to ask how well the properties of their abstract data types parallel those we intuitively associate with "kinds" of the same name that we understand from other contexts. Is a word in Microsoft Word like what we understand a word to be in English?

Contrast a word processor with a simpler so-called text editor (like Microsoft Notepad) that enables users to manipulate text but that implements far fewer data types. Microsoft Notepad, for example, has no data types corresponding either to the paragraph or the footnote.
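The separation between an abstract data type's specified properties and any particular realization of them is routine in code. The following Python sketch is my own illustration, using a stack, a standard textbook abstract data type rather than one the article discusses; the class names are invented:

```python
from abc import ABC, abstractmethod

class Stack(ABC):
    """An abstract data type: characterized only by the specified
    properties of its operations, not by how they are realized."""
    @abstractmethod
    def push(self, item): ...
    @abstractmethod
    def pop(self): ...

class ListStack(Stack):
    """One realization: backed by a Python list."""
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()

class LinkedStack(Stack):
    """A different realization: backed by linked (item, rest) pairs."""
    def __init__(self):
        self._head = None
    def push(self, item):
        self._head = (item, self._head)
    def pop(self):
        item, self._head = self._head
        return item

def satisfies_spec(stack):
    """A fragment of the specification: pop returns the most
    recently pushed item (last in, first out)."""
    stack.push(1)
    stack.push(2)
    return stack.pop() == 2 and stack.pop() == 1
```

Both realizations satisfy the same abstract specification even though their internal representations have nothing in common, which is just the point: the type is characterized by its properties, not by its implementation.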


A collection of interacting abstract data types is known as a level of abstraction—which in some ways is analogous to a special science. Like an abstract data type, a level of abstraction is defined in terms of its abstract components. It too is therefore a fantasy world—one defined in terms of itself—but, like so many fantasy worlds, often motivated by similar real-world collections of kinds.

The level of abstraction implemented by a computer program is generally known as a conceptual model. To use a piece of software effectively one must first master its conceptual model—understand the types it implements and the operations it enables users to perform on elements of those types.

I want again to emphasize that each of these—an abstract data type, a level of abstraction, and the conceptual model implemented by a piece of software—is generally defined self-referentially. Each consists of a self-contained world defined (circularly) in terms of its own elements.

An example of a sophisticated abstract data type is the word abstract data type in Microsoft Word. When one selects a word in Microsoft Word (by double clicking within it) one gets the symbols that make up the English word along with the space (if any) that follows it. Imagine then pasting the copied word in two different places: one at the end of a sentence, where the pasted word is followed immediately by a period, and one in the middle of a sentence, where the pasted word is preceded and followed by another word. Microsoft Word treats the space selected with the word properly in both cases. In the first case no space is left between the pasted word and the following period. In the second, spaces are left between the pasted word and both the preceding word and the following word. Try the same thing in Microsoft Notepad, for example. You will see that Microsoft Notepad is not nearly as sophisticated in its treatment of the analogous abstract data type.

Notice that in none of the above was it necessary to talk about how abstract data types are implemented. I didn't say whether a word is stored as bytes in the computer's memory or which internal computer operations are performed when one copies and pastes a word.

A word is a sequence of symbols (i.e., letters, digits, punctuation characters, etc.), but users neither need nor want to know that a symbol was for a long time stored in the computer's memory as a byte. They almost certainly don't want to know that now that word processors deal with symbol sets that have more elements than there are distinct bit configurations in a byte (such as symbol sets that include characters from a wide range of alphabets), a symbol is no longer stored simply as a byte. For users, a symbol is a symbol. It should have the properties of a symbol as an abstract data type and not properties that depend on whether or not it is stored as a byte.

Emergence as a constructive activity

The development of new abstract data types is a constructive activity. Consider the pop-up menu. It is a feature common to many interactive computer programs. It allows users to perform an operation selected from a list of operations that are relevant to the element on the screen to which the user's cursor is pointing. To avoid cluttering the screen with options, pop-up menus appear only in response to a user action such as pressing the right-most button on a mouse.

As far as I know, no similar capability existed prior to the development of interactive computer systems that had graphical mouse-and-screen-based interfaces—commonly known as a Graphical User Interface or GUI. In other words, someone had to invent the pop-up menu. Once invented, the pop-up menu abstraction quickly spread to (was implemented in) nearly all computer programs that offered interaction through a GUI. Why is that important? It's important because it illustrates three things about abstract data types.

a) Many abstract data types are invented either to solve a problem—in this case how to offer users a selection of operations without cluttering the screen—or simply to provide a new capability.

b) Abstract data types are defined constructively. Although the abstraction of a pop-up menu can be specified independently of its implementation, any implementation of a pop-up menu—or any other abstract data type—depends on the availability of other types that are used in that implementation.

c) Once an abstract data type exists and once awareness of it spreads throughout the software development community, it is often used in the construction of other abstractions. This frequently leads to a virtuous cycle of creativity.

A non-abstract data type

Not all computer data types are abstract. Consider the collection of floating point numbers and their associated arithmetic as implemented by a particular computer—especially a computer of a few decades ago, before floating point numbers were standardized. In this case there is no independent specification. The implementation is the specification. The data type consists of the numbers that can be represented and the operations that the computer performed on them. This is a good example of a non-abstract data type.

One of the issues raised about emergence is autonomy. The emergent phenomenon has properties that are said to be autonomous of its base. Another issue is multiple realization. It is often argued (or more frequently just assumed) that if some property is multiply realized it must be autonomous.

The preceding example of a non-abstract data type can be used to show that multiple realizability is not a definitive demonstration of autonomy. Imagine software that emulates the original computer but that runs on a modern computer. The emulating software would implement the original floating point data type. Is that data type now autonomous? It is clearly not autonomous of its original implementation; its properties were defined by its original implementation. Yet those properties serve as an abstract (autonomous) specification for the emulator. So the data type is autonomous of one of its implementations and not autonomous of another.

Furthermore, once the emulator is written the data type would be multiply realized. That, of course, would not change the fact that the properties and functionality of the data type are not autonomous of the computer where it first appeared.9

Type theory

In the preceding I've tried to provide an intuitive feeling for abstract data types. I don't want to leave the impression that that's all there is to the notion of a type. In fact, type theory—introduced by Bertrand Russell to prevent Russell's paradox—has developed over the past century in multiple directions—including mathematical logic and the theory of programming languages—to become quite sophisticated.10 But no matter how complex type theory becomes, it is always the case that type definitions are self-contained; types are not defined in terms of their implementations or realizations.

Summary comments about abstract data types

I hope that the preceding has conveyed the idea that abstract data types are defined independently of their implementation.
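The idea that for a non-abstract data type "the implementation is the specification" can be made concrete with a toy machine arithmetic. In this Python sketch (my own illustration; the fixed-point format and the saturation range are invented for the example), the emulator's correctness is judged entirely against the original implementation's behavior:

```python
def original_add(a_hundredths, b_hundredths):
    """The 'original machine': numbers are stored as integer
    hundredths, and results saturate at a 16-bit-style range.
    There is no specification apart from this code."""
    total = a_hundredths + b_hundredths
    return max(-32768, min(32767, total))

def emulated_add(a_hundredths, b_hundredths):
    """A second realization on a 'modern machine.' What counts as
    correct is whatever original_add does: the original
    implementation serves as the emulator's specification."""
    total = a_hundredths + b_hundredths
    if total > 32767:
        return 32767
    if total < -32768:
        return -32768
    return total
```

Once the emulator exists, the data type is multiply realized, yet its properties remain exactly those fixed by the machine on which it first appeared.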


Furthermore, abstract data types may be implemented (or realized) in multiple ways. The abstract data types implemented by Microsoft Word are echoed to a great extent by the abstract data types implemented by Open Office, the open source competitor. The Open Office implementations of those types are necessarily different from the Microsoft Word implementations—or Microsoft would sue the Open Office developers for copyright infringement.

Multiple realizability of an abstract data type is neither required for nor a demonstration of the autonomy of an abstract data type. On the contrary, multiple realizability is enabled by the fact that the abstract data type has an abstract specification.

The fundamental point, though, is that the notion of an abstract data type requires a strict separation of specification from implementation. The specification characterizes the properties of the type; the implementation/realization embodies those properties in something concrete.

The notion of an abstract data type has become one of the foundational elements of computer science and is now part of the standard lower-division computer science curriculum.

Similar processes produce emergence in the world of material phenomena

I suggest that emergence in general is best understood as the realization/implementation of an abstract data type. A couple of immediate clarifications are needed.

First, and perhaps most importantly, the fundamental premise underlying this suggestion is that many things in the world can be described abstractly—in terms of regularities that are not directly related to the fundamental elements of physics. These are the regularities to which Fodor (1997) was referring when he wrote, "The very existence of the special sciences testifies to reliable macro-level regularities." The existence of these regularities means that there are features of nature that can be characterized abstractly. Any such description is the specification of an abstract data type.

My position is not to convince anyone that there are abstract data types in nature. We already agreed that one of the fundamental features of emergence was the existence of such regularities. I take those regularities as a starting point. In simply accepting the existence of naturally occurring regularities I am converting the special sciences from a problem that needs to be explained into an ally. Instead of asking, as Fodor does, how there can possibly be macro-level regularities, I am taking the existence of macro-level regularities as given and asking what that says about the world.

Second, nature is not capable (as far as I know) of creating an abstract data type without realizing/implementing it. That is, nature cannot create a specification of an abstract data type. Human beings create unrealized specifications all the time. When designing computer programs, software developers specify the abstract data types they intend to use. These specifications exist even though no instances of those types have been created—and even though designs for how those data types will be implemented may not have been developed.

Specifications are created in other fields as well. A constitution is a specification for a government before the government is created. Similar documents provide abstract descriptions for corporations, clubs, and other social organizations. Robert's Rules of Order (e.g., Robert 2000) provides a specification for meetings as an abstract data type.

But nature—at least non-human nature—can't do this. I know of no way for nature to create a specification of an abstract data type—and I know of no specification of an abstract data type that one finds in nature.11 The fact that stand-alone specifications don't exist in nature may be one of the reasons it has been difficult to understand emergence. My view is that emergence occurs in nature in the form of realized abstract data types. Yet it is the fact of the abstraction—that one can provide an abstract specification—and not the realization of that abstraction that establishes a phenomenon as emergent.12 The fact that abstractions don't themselves exist as independent physical elements may make it difficult to think of them as playing a role in emergence.

Emergence and entities

The primary notions in the version of emergence I am proposing are (a) abstract data types and (b) instances/implementations/realizations of abstract data types. Since nature does not create unrealized abstractions, it is the instances that come into existence. An instance of an abstract data type is an entity, a thing. Consequently, entities are central to my version of emergence. I discuss entities in both Abbott 2008 and Abbott 2009. The following is an overview of that discussion.

As far as I can tell, there is no philosophical consensus about what one means by the term entity. One common view is that the entities are whatever scientists talk about. In explaining Quine's approach to ontology, Smith (2003) put it this way.

    Quine takes ontology seriously. His aim is to use science for ontological purposes, which means: to find the ontology in scientific theories. Ontology is then a network of claims, derived from the natural sciences, about what exists coupled with the attempt to establish what types of entities are most basic. …This is defined by the vocabulary of the corresponding theory and (most importantly for Quine) by its canonical formalization in the language of first-order logic. …His so-called "criterion of ontological commitment" is captured in the slogan: To be is to be the value of a bound variable.

That's fine. The network of claims about what exists made by a science is essentially the level of abstraction or conceptual model defined by that science.

But this doesn't tell us to which sorts of things scientists tend to make ontological commitments. The simple answer is that science looks for and attempts to understand and characterize regularities.

So what is a regularity? It is something that can be described more concisely than a simple enumeration of its components. Another way to put it is that a regularity is a persistent pattern—some aspect of the world that has lower entropy than its surroundings. In my view, then, entities are persistent patterns. It is entities that emerge.

In my brief answer to the question raised earlier about a common material basis for emergence, I said that energy provides that common basis. I said that there are two categories of naturally emergent entities: static and dynamic. (See Table 1.)

The static entities are those at an energy equilibrium. These include such entities as atoms, molecules, and solar systems. Energy is required to separate these entities into their components. Consequently, such entities have objectively (but negligibly) less mass than their components taken separately. This mass difference—along with their reduced entropy—is support for their objective existence.

Dynamic entities are stable but far from an energy equilibrium. These entities are typical of what are frequently referred to as complex systems. Examples are biological organisms and social organizations. These entities require a continual supply of energy to persist.

— 51 — — APA Newsletter, Spring 2010, Volume 09, Number 2 —

in motion. As such they possess objectively (but negligibly) more mass than their components taken separately. Again, this mass difference—along with their reduced entropy—is support for their objective existence.

One might ask which entities will come into existence. My view is that entities come into existence as a result of historical accidents. We understand how evolution operates. Variations on existing types are created at random. Those that are suited to the environment within which they find themselves persist. The others don't.

It seems to me that a generalization of that process applies to all entities and entity types. They are created more or less at random from elements available at the time. The ones that persist are those that are suited to their environments. This leads to what I called (Abbott 2009) the principle of ontological emergence.

   Extant abstract data types are those whose implementations have materialized and whose environments enable their persistence.

That doesn't quite answer the question why some entity types persist and others don't. Part of the answer is that entities persist if they are successful in the world. Little is demanded of static entities. Static entities are successful and persist if nothing tears them apart. Most familiar chemical compounds, for example, would not persist in an extremely hot and/or turbulent environment. Pour a glass of water into a blast furnace, and the water molecules will come apart. Pour it onto the "surface" of the sun, and even the atoms will come apart.

More interestingly, dynamic entities must continually supply themselves with energy if they are to persist. Animals must "work for a living." Those that can't find a niche within which they can extract sufficient energy cease to exist.

The "work" that dynamic entities are able to do depends on the functionality built into them as instances of particular entity types. Cows eat grass. That simple statement embeds within it facts about the functionality of the cow entity type. Presumably, one could trace at the micro-physical level the processes involved in cows eating and digesting grass. But that would not clarify why the cow entity type succeeds. The cow entity type succeeds because its functionality includes the ability of its instances to sustain themselves—as cows—by eating and digesting grass.

The same is true for every other successful dynamic entity type. A successful dynamic entity type is one that includes functionalities that enable its instances to sustain themselves—as that type—in the world in which they find themselves.

Subsidized entities

The third row in table 1 is labeled "subsidized entities." These are entities that exist in an artificial environment within which energy is not a consideration. The glider again provides a useful example. Gliders are neither static entities (they don't persist because fundamental forces of nature hold them together) nor dynamic entities (they don't require imported energy to hold themselves together). Gliders exist in a world in which energy is not an issue.

In the Game of Life, no concern is given to the question of where the energy comes from that powers the application of the Game-of-Life rules. Like any formal automaton (such as a Turing machine or a finite automaton), the Game of Life is simply assumed to operate as it does. In reality, of course, no such rule application process could occur without energy. But in the virtual world of automata the question of where the energy comes from for automata to operate is neither asked nor answered. It is for that reason that I refer to entities that appear in these sorts of artificial worlds as energy-subsidized—or just subsidized.

Constructive emergence

Earlier I pointed out that the creation of new abstractions in software is often enabled by the existence of pre-requisite abstractions. A similar phenomenon occurs in nature. Gould and Vrba (1982) invented the term exaptation to refer to a biological feature that performs a function different from that for which it was originally selected. A standard example is the feather, which may have been selected originally for insulation but which now helps in flight.

On a much smaller scale—and from a slightly different perspective—bacterial flagella are constructed of components that appear to have been selected originally for other uses. In reply to the so-called intelligent-design claim that flagella are "irreducibly complex" Miller (2004) traces such a plausible evolutionary path.

The point is that in both software and nature emergence is a constructive process. This holds with special force when emergence occurs in nature. Since there are no unrealized abstractions in nature, the components required for the realization of an abstraction must exist before the abstraction can come into being.

Supervenience

The properties of emergent entities supervene over the properties of their components. But supervenience is not as simple or useful as one might suppose. Consider a biological organism. At any moment its properties supervene over those of the elements that make it up. But those elements change from moment to moment. Biological organisms are continually shedding parts of themselves and incorporating new elements. So if one were to speak of what is sometimes referred to as the supervenient base of a biological organism it would consist of all the elements of which the organism is composed over its lifetime. So supervenience holds, but in many cases it is not a very useful conceptual tool for exploring emergence.

The same holds for something as simple as a glider. At any moment a glider supervenes over the cells of which it is made. But the cells that make up a glider change from one time step to the next. So the "supervenient base"—if there is one—of a glider consists of all the cells that ever participate in its makeup. The "supervenient base" of a glider on an otherwise empty Game-of-Life grid is an unboundedly large collection of cells.

Is that useful? In some ways it is. If one were to sabotage just one of those cells before it became part of the glider, the glider would likely fail when it reached that cell. That insight gives us a good way to look at phenomena like poisons and moles (internal spies): control something that will become part of the supervenient base of an entity and one potentially has a lot of leverage over that entity.

Multiple realization

Multiple realization is a consequence of emergence rather than a criterion required for it. When Putnam (1975) described autonomous phenomena from a functionalist perspective, he emphasized the importance of functional isomorphism between two physical entities.

   [T]wo systems can have quite different constitutions and be functionally isomorphic. For example, a computer made of electrical components can be isomorphic to one made of cogs and wheels.

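Putnam's cogs-and-wheels point can be put in software terms. The sketch below is mine, not the article's (all class names are illustrative assumptions): a counter is specified abstractly, and two realizations with quite different constitutions are functionally isomorphic under that one specification.

```python
from abc import ABC, abstractmethod

class Counter(ABC):
    """Abstract specification: what any counter must do,
    stated independently of any implementation."""
    @abstractmethod
    def increment(self) -> None: ...
    @abstractmethod
    def value(self) -> int: ...

class ArithmeticCounter(Counter):
    """One 'constitution': an integer register."""
    def __init__(self):
        self._n = 0
    def increment(self):
        self._n += 1
    def value(self):
        return self._n

class TallyCounter(Counter):
    """A quite different 'constitution': a growing list of tally marks."""
    def __init__(self):
        self._marks = []
    def increment(self):
        self._marks.append("|")
    def value(self):
        return len(self._marks)

# Functionally isomorphic: identical observable behavior under the specification.
for c in (ArithmeticCounter(), TallyCounter()):
    c.increment(); c.increment(); c.increment()
    assert c.value() == 3
```

On the article's account, what matters is not the isomorphism between the two realizations but that the `Counter` specification could have been written before, and independently of, either of them.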

It was an important insight that radically different entities can be functionally isomorphic. This insight began to clarify what is meant when one says that emergent phenomena (or their properties or their functions) are autonomous of the elements (and the properties and functions of those elements) of which they are composed.

Unfortunately (in my view), Putnam's insight has become a primary focus in establishing the autonomy of a functional specification. If, the argument goes, one can produce two or more elements that have (as Putnam says) "quite different constitutions" but that yet are "functionally isomorphic," then presumably that functionality is in some reasonable sense independent of or autonomous from the elements' constitutions.

Although this may be true, I think it misses the point. As Putnam points out, "a computer made of electrical components can be isomorphic to one made of cogs and wheels." But it is not the isomorphism between the different computers that establishes that the functionality of a computer is autonomous of its implementation. The autonomy derives from the fact that a computer can be specified independently of its implementation.

What is the functionality of a computer? It certainly doesn't consist of all the properties and functions possessed by a computing device made of cogs and wheels. Nor is it defined by all the properties and functions possessed by a computing device made of electrical components. Many of these properties will have something to do with cogs and wheels or with electrical components. To establish the isomorphism one strips away the irrelevant properties. The abstraction that remains—to which both realizations can be mapped and which can map to both realizations—is what characterizes what one means by a computer.

But having created that abstraction, it becomes clear that what one means by a computer can be specified abstractly even if the only physical example were one made of cogs and wheels—or even if there were no physical examples at all. Multiple realizability has nothing to do with the independence and autonomy of such an abstract specification. On the contrary, it is the autonomy of the specification that enables multiple realization. Once one has abstracted out the functionality one cares about, one sees that one can create other quite different devices that produce that same functionality.

Downward Entailment

The existence of an emergent abstraction can result in what—in Abbott 2006 and 2009—I called downward entailment. Downward entailment refers to an important aspect of the relationship between a realized abstraction and the elements that realize it. When an abstraction is being realized, the elements that compose the abstraction participate in whatever happens to the realized entity. Suppose I throw a baseball. One of the consequences of my throwing the baseball is that it moves from here to there. As long as the baseball is being realized by its components, those components will also move from here to there.

This is not downward causation. One could (presumably) explain the movement of each component of the baseball at the level of quantum physics. There are no new forces that apply to baseballs and other macro elements. Furthermore, if the elements that made up the baseball stopped making it up—suppose the baseball lost its cover and its innards unwound in mid-flight—the abstraction would cease to be realized, and the elements that had previously made up the realized abstraction would no longer participate in its (former) motion.

As another example, consider the implications of the fact that it is possible to implement Turing machines in the Game of Life. Since it is undecidable whether a Turing machine will halt, it is undecidable whether an arbitrary Game-of-Life configuration will reach a stable state.

This is a direct example of downward entailment. A fact about Turing machines, an abstraction implemented by the Game of Life, has implications for the Game of Life itself. As long as the Game of Life is implementing a Turing machine, the Turing machine abstraction will apply to those elements of the Game of Life that participate in the implementation.

This illustrates reduction in mathematics and computer science. When something can be reduced to something else, conclusions about the latter apply to the former. In this case the halting (or stability) problem for the Game of Life has been reduced (upward!) to the halting problem for Turing machines—which then entails (downward) facts about the Game of Life. The reduction is done by implementing Turing machines within the framework of the Game of Life.

II. Questions from Bedau and Humphreys

In this section I'd like to offer answers to the questions posed in the Introduction of Bedau and Humphreys (2008).

Question 1. How should emergence be defined? A number of leading ideas appear in different definitions of emergence, including irreducibility, unpredictability, conceptual novelty, ontological novelty, and supervenience.

Answer 1. Emergence is best understood as the realization of an abstract data type. The features listed by Bedau and Humphreys are related to emergence, but they don't define it.

Irreducibility seems to me to be an awkward way of characterizing the independence of a specification. It certainly is not the case that emergent phenomena are somehow magically disconnected from their components. As we saw with the glider, one can always explain how a configuration of components produces emergent properties and functionality. Even when there are multiple realizations of an abstraction one can explain how each of the given constructions produces the given functionality. So it isn't as if the emergent properties and functionality do not have a material explanation.

But what typically is the case is that it is not possible to describe the emergent properties and functions in terms of the properties and functions that characterize the components. There is no mapping between the domain of description applicable to the implementing elements and that of the implemented domain. The emergent properties and functions are typically a new and self-contained world that is implemented by—but not derived from or defined in terms of—those of the implementing elements.

This may seem very strange. How is it possible to get from one to the other if there is no mapping? The answer is that although the new abstraction is not mathematically or logically derived from the features of the implementing elements, when put together in certain ways, the implementing elements simply have certain (generally new) properties. When carbon atoms are formed into a lattice the resulting diamond is hard even though the notion of hardness does not apply to individual carbon atoms. The same is true of a biological organism. It has properties that its chemical components don't have individually.

This is standard practice in software. A computer program has properties—e.g., that it computes a particular function—that the instructions and data structures that make it up don't have individually. It is generally both the components and the
way the components are put together that produces the new properties.

That raises the question of what sorts of ways things can be put together. When components are put together to form a new entity, certain global constraints are put in place. What are the possible global constraints? That's a very good question and one to which I don't have a good answer.13

Here are two more questions with no apparent answers: (a) What sorts of new properties and functions can one create? (b) What are the implications of the apparent fact that there is no good answer to the preceding question, i.e., that creativity—both natural (through evolution) and human—is in some reasonable sense unlimited?14

Unpredictability is simply wrong. Emergent properties and functions are predictable. When one first comes across an instance of a new data type, one may not know how it works. But if a "special science" develops to study the new abstraction and if, as is generally the case, that special science is successful, the abstraction will be predictable.

Conceptual novelty is usually true. Emergent elements satisfy descriptions that are new and self-contained and that do not derive from the descriptions of their implementations.

I'm not sure what Bedau and Humphreys mean by ontological novelty. Emergence and abstract data types have to do with entities and their properties. An abstract data type is a specification that applies to and characterizes instances of that type. Each instance is an ontologically real and distinct entity. Because emergence is so fundamentally connected to the instantiation of abstractions, entities are central to emergence. It is entities—often with new properties and functionalities—that emerge. Emergent entities have an objective existence. See my comments above about entities.

See the section above on supervenience for my comments on why supervenience applies but is often less useful than one might imagine.

Question 2. What ontological categories of entities can be emergent: properties, substances, processes, phenomena, patterns, laws, or something else?

Answer 2. The elements mentioned (properties, substances, processes, phenomena, patterns, laws) are not mutually disjoint. For example, a biological organism is also a process, and a biological organism has properties and obeys the "laws" that characterize it as an abstraction. My basic answer is as before: emergence is the realization of an abstract data type. Such a realization is an entity, which can have new properties, etc.

Question 3. What is the scope of actual emergent phenomena? This question partly concerns which aspects of the world can be characterized as emergent.

Answer 3. Anything that can be characterized abstractly—i.e., as something other than a brute force enumeration of its parts—is emergent. Therefore, other than the fundamental elements of physics—if there are any—everything that we think of as a thing is emergent.

Question 4. Is emergence an objective feature of the world, or is it merely in the eye of the beholder?

Answer 4. Emergent entities are objectively ontologically real. See the earlier discussion of entities.

Question 5. Should emergence be viewed as static and synchronic, or as dynamic and diachronic, or are both possible? …In synchronic emergence, the emergent feature is simultaneously present with the basal features from which it emerges. By contrast, in diachronic emergence, the base precedes the emergent phenomenon which develops over time from them.

Answer 5. Emergence can be both synchronic and diachronic. University academic departments (which are emergent social organizations) consist of people who come and go and who retain their individual identities even while they are part of the department. The properties of the department are "simultaneously present with the basal features from which it emerges."

It is also the case that new departments can be created from people who existed before the creation of the department. In that case "the base precedes the emergent phenomenon which develops over time from them."

Question 6. Does emergence imply or require the existence of new levels of phenomena?

Answer 6. Since emergence is (in my view) the realization of an abstraction, the elements that realize the abstraction are in a micro to macro relationship to the realized abstraction as a whole. But that doesn't mean that nature is a tiered hierarchy of levels. One of my favorite examples is the gecko. Geckos are macro-level entities that rely directly (Kellar 2002) on the quantum-level Van der Waals force to adhere to vertical surfaces. Yet there are many intermediate "levels"—such as atomic physics, chemistry, biochemistry, etc.—between quantum forces and geckos.

Question 7. This question bundles together a number of issues. I'll split them apart.

Question 7a. In what ways are emergent phenomena autonomous from their emergent bases? …A number of different kinds of autonomy have been discussed in the literature, including the ideas that emergent phenomena are irreducible to their bases, inexplicable from them, unpredictable from them, supervenient on them, and multiply realizable in them.

Answer 7a. This sub-question itself bundles together a number of sub-sub-questions. Most have already been addressed, but I'll respond briefly to each.

In what ways are emergent phenomena autonomous from their emergent bases? Emergent phenomena are autonomous in that they are characterized by independent, self-contained specifications.

Are emergent phenomena irreducible to their bases? Emergent phenomena are generally not reducible if reducibility is taken to require a kind-to-kind mapping. That's the position Fodor (1974) took three and a half decades ago. I think he was right. Emergent phenomena are characterized by autonomous specifications, which cannot in general be mapped directly to the kinds, properties, and functionalities available in their bases.

Are emergent phenomena inexplicable from their bases? No. Emergent phenomena may be conceptually novel with respect to the conceptual apparatus needed to describe their bases. But it is always possible to explain how the emergent phenomena have the properties and functions they do in terms of how they are realized. As we saw with the glider, the philosophical approach known as reductive explanation will typically do the job.

Are emergent phenomena unpredictable from their bases? No. See my reply to predictability in question 1.

Are emergent phenomena supervenient on their bases? Yes, but see my earlier comments on supervenience.

Are emergent phenomena multiply realizable in their bases? Frequently, but multiple realization is not as important as it is often made out to be. See the earlier section on multiple realization.
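The glider appealed to repeatedly above is fully determined by the Game-of-Life rules reviewed in endnote 1, and those rules translate almost directly into code. The following is a minimal sketch of my own (the function names are not the article's): one synchronous update over the set of live cells, applied to a glider, which reappears one cell diagonally displaced after four steps.

```python
from collections import Counter

def neighbors(cell):
    """The eight cells surrounding a given (x, y) cell."""
    x, y = cell
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def step(live):
    """One simultaneous application of the Game-of-Life rules.
    A cell is alive at the next step if it has exactly three live
    neighbors, or has two live neighbors and is alive now."""
    counts = Counter(n for cell in live for n in neighbors(cell))
    return {cell for cell, k in counts.items()
            if k == 3 or (k == 2 and cell in live)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

# After four steps the same pattern recurs, shifted one cell diagonally,
# even though no individual cell has moved anywhere.
pattern = glider
for _ in range(4):
    pattern = step(pattern)
assert pattern == {(x + 1, y + 1) for (x, y) in glider}
```

Iterating `step` while accumulating the union of the successive `pattern` sets yields exactly the ever-growing "supervenient base" discussed in the supervenience section.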

Question 7b. Is…autonomy…merely epistemological or [does] it [have] ontological consequences?

Answer 7b. Although emergent entities are ontologically real their reality is not a consequence of their autonomous specification. Emergent entities are ontologically real because of the physics of their construction. See my earlier comments about emergence and entities.

Question 7c. [Does] emergence necessarily [involve] novel causal powers, especially powers that produce "downward causation," in which emergent phenomena have novel effects on their own emergence base?

Answer 7c. As far as I am concerned there is no such thing as downward causation. The only forces of nature—and presumably the only source of causality in the sense intended here—are the fundamental forces of physics. But there is a very important qualification. Downward Entailment (as discussed above) can appear similar to downward causation and may lead one to suppose that emergence may involve novel causal powers.

Conclusions: analysis vs. construction

As I mentioned earlier, one of the corollaries of looking at emergence the way I am proposing is that it highlights nature's creativity. Putting things together in new ways can produce new properties and functions.

The emergence-as-creativity perspective illustrates one of the differences between computer science (and much of engineering in general) and most analytical disciplines. Computer science is inherently constructive/creative. Software developers (and many engineers) imagine new things and figure out how to put existing things together to realize them. Practitioners in analytical fields generally attempt to determine how existing things came to be the way they are.

Acknowledgment:

I am grateful for the support, hospitality, and intellectual companionship afforded me during the Summer of 2009 by Anne-Marie Grisogono of the Australian Defence Science and Technology Organization (DSTO), Hussein Abbas of the Australian Defence Force Academy, and Matthew Berryman of the University of South Australia (UniSA). I also wish to express my appreciation for the many stimulating conversations with Antony Iorio, Cliff Hooker, and David Batten during my visit.

Endnotes

1. Most readers are probably familiar with the Game of Life. But to review, the Game of Life is an unbounded, totalistic (the action taken by a cell depends on the number of neighbors in certain states—not on the states of particular neighbors) two-dimensional cellular automaton. It operates in discrete time steps. Each cell is binary. At any moment, a cell is either "alive" or "dead"—or, more simply, on or off. At each time step the following rules are applied simultaneously to each cell to determine whether it will be alive or dead at the next time step.
   • A live cell stays alive if it has two or three live neighbors; otherwise it dies.
   • A dead cell with exactly three live neighbors becomes alive.

2. Gliders may move in any of the four diagonal directions. This one moves south-easterly.

3. In philosophical terms, Figure 1 provides a reductive explanation for how a glider is related to the rules of the Game of Life.

4. Consciousness is frequently cited as emergent. We can't answer its how question yet. But I'm convinced that we will be able to eventually.

5. This is true even if the regularity is multiply realized. The fact that a regularity can be realized in multiple ways doesn't mean that we don't understand how each one of those ways produces the regularity. In general we do.

6. In what may cause some confusion, energy is not relevant for emergence in the Game of Life or other computational frameworks. The reason is that computational environments are energy subsidized. Elements within the framework do not have to worry about where the energy comes from that creates them and keeps them in existence. The Game of Life, for example, turns cells on and off without consideration of the energy that would be required to do that were the Game of Life implemented physically. That's why I wrote that energy is the common basis for all material emergent phenomena.

7. Here are Peano's axioms as presented in Weisstein (2009).
   Zero is a number.
   If a is a number, the successor of a is a number.
   Zero is not the successor of a number.
   Two numbers of which the successors are equal are themselves equal.
   (Induction axiom.) If a set S of numbers contains zero and also the successor of every number in S, then every number is in S.

8. Here is Turing's (1936) introductory description of what has come to be called a Turing machine. Turing added formalization later in the paper but, as far as I can tell, he never provided an axiomatic basis for his machine.

   We may compare a man in the process of computing a real number to a machine which is only capable of a finite number of conditions q1, q2, ..., qR which will be called "m-configurations." The machine is supplied with a "tape," (the analogue of paper) running through it, and divided into sections (called "squares") each capable of bearing a "symbol." At any moment there is just one square, say the r-th, bearing the symbol S(r) which is "in the machine." We may call this square the "scanned square." The symbol on the scanned square may be called the "scanned symbol." The "scanned symbol" is the only one of which the machine is, so to speak, "directly aware." However, by altering its m-configuration the machine can effectively remember some of the symbols which it has "seen" (scanned) previously. The possible behaviour of the machine at any moment is determined by the m-configuration qn and the scanned symbol S(r). This pair qn, S(r) will be called the "configuration": thus the configuration determines the possible behaviour of the machine. In some of the configurations in which the scanned square is blank (i.e., bears no symbol) the machine writes down a new symbol on the scanned square: in other configurations it erases the scanned symbol. The machine may also change the square which is being scanned, but only by shifting it one place to right or left. In addition to any of these operations the m-configuration may be changed. Some of the symbols written down will form the sequence of figures which is the decimal of the real number which is being computed. The others are just rough notes to "assist the memory." It will only be these rough notes which will be liable to erasure.

9. There is an old joke in the software community about letting the implementation define the abstraction. If one finds a bug in a program, instead of fixing it one should change the user manual—thereby changing a bug into a feature.

10. For example, types need not be just simple classes of like items. One type may be a subtype of another: human being is a subtype of mammal, which is a subtype of animal, etc. Types can also be defined functionally: male cuts across the types just mentioned. There are also polymorphic (or structural or parameterized) types (like list) in which a structural regularity is understood to be a (parameterized) type whose properties are independent of the type of thing stored in the structure. A list of numbers and a list of names are both lists, and have list-properties.
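The endnote's parameterized types can be written down literally in a language with generics. A minimal sketch of my own (the function names are illustrative, not the article's): operations defined against the type list(t) apply to a list of numbers and a list of names alike, because both have list-properties.

```python
from typing import TypeVar

T = TypeVar("T")

def last(items: list[T]) -> T:
    """A list-property: any nonempty list, whatever t is, has a last element."""
    return items[-1]

def reversed_list(items: list[T]) -> list[T]:
    """Reversal is likewise defined for list(t) independently of t."""
    return items[::-1]

# A list of numbers and a list of names are both lists:
assert last([3, 1, 4]) == 4
assert last(["Gould", "Vrba"]) == "Vrba"
assert reversed_list([1, 2, 3]) == [3, 2, 1]
```

Operations get types in the same way; the endnote goes on to write the type of an operation that counts a queue's members.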

Functions too (including, intuitively, operations and processes) can also be understood as having a type. The process of counting the number of people standing in a line derives a number from a queue—in this case of people. The type of such an operation/function is one that maps queues (no matter what sorts of things are in the queue) to non-negative integers. That type would be written as something like this.

   queue(t) → Number

The "t" in parentheses means that the queue type is parameterized by the type of its members, which can be anything. Treating functions as typed objects is especially important in modern programming languages.

11. A person's DNA is not his or her specification. At best, DNA is a parts list along with a way of generating those parts as needed.

12. That was the message of Abbott 2006.

13. Computer Science studies the question of the kinds of computations that can be performed within various limits with its categorizations of (a) automata and formal languages and (b) the time and space complexity of algorithms.

14. It might be worthwhile to formalize the preceding observation. One relatively simple way would be to understand the act of creating a new data type as combining (a) the operation of making tuples (of components) with (b) defining new functions and predicates that take such tuples as arguments—and in the case of functions that return values that consist of such tuples. At this basic level it seems clear that there is no formal limit to the types one can create and the properties they may have.

References

Abbott, Russ. 2006. Emergence explained. Complexity (Sep/Oct 2006): 13-26. Preprint: http://cs.calstatela.edu/wiki/images/9/95/Emergence_Explained-_Abstractions.pdf

Abbott, Russ. 2008. If a tree casts a shadow is it telling the time? International Journal of Unconventional Computing 4/3: 195-222. Preprint: http://sites.google.com/site/russabbott/Ifatreecastsashadowisittellingthetime.pdf

Abbott, Russ. 2009. The reductionist blind spot. Complexity, to appear. Also presented at the North American Conference on Computers and Philosophy, July 2008, Bloomington, Indiana. Preprint: http://cs.calstatela.edu/wiki/images/c/ce/The_reductionist_blind_spot.pdf

Bedau, Mark and Paul Humphreys. 2008. Emergence. MIT Press. Introduction available: http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=11341

Fodor, Jerry A. 1974. Special sciences and the disunity of science as a working hypothesis. Synthese 28: 77-115.

Fodor, Jerry A. 1997. Special sciences: still autonomous after all these years. Philosophical Perspectives 11: 149-63.

Gould, S.J., and E.S. Vrba. 1982. Exaptation: A missing term in the science of form. Paleobiology 8(1): 4-15.

Kellar, Autumn, Metin Sitti, Yiching A. Liang, et al. 2002. Evidence for van der Waals adhesion in gecko setae. Proceedings of the National Academy of Sciences of the USA 99: 12252-56.

Miller, Kenneth R. 2004. The flagellum unspun: The collapse of "Irreducible Complexity." In Debating Design: From Darwin to DNA, ed. W. Dembski and M. Ruse. Cambridge University Press.

Putnam, Hilary. 1975. Philosophy and our Mental Life. In Mind, Language, and Reality. 291-303. Cambridge University Press.

Quine, W. V. O. 1953. On what there is. As reprinted in From a Logical Point of View. New York: Harper & Row.

Robert, Henry M. III, et al. 2000. Robert's Rules of Order (10th Edition). Da Capo Press.

Smith, B. 2003. Ontology. In Blackwell Guide to the Philosophy of Computing and Information. 155-66. Oxford: Blackwell.

Turing, A. M. 1936. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society 1937 s2-42(1): 230-65.

Weisstein, Eric W. Peano's axioms. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/PeanosAxioms.html

Virtual Homes and Sherlock Holmes: On the Existence of Virtual (and Other Abstract) Entities

Margaret Cuonzo
Long Island University, Brooklyn Campus

I have a large virtual empire. My virtual holdings include two large plantations, an unusually decorated apartment, an island, a large aquarium, a café, and numerous other virtual entities. I paid no money for these entities, nor taxes (regular or virtual) on them, yet it seems that they are mine and that they exist in some obvious sense. For example, it seems I speak truly or falsely about these things. I would be speaking falsely if I said that fish tended to survive in my virtual aquarium, and I would speak truly if I said that my virtual apartment's living room looks more like a beach than a regular living room. And while these objects occupy no noncontroversial physical space, they do have virtual "addresses" and can be reached by typing these into my web browser. What, then, is the ontological status of my empire, and, more generally, any virtual object? To suggest an answer to this question, I'll turn to another type of abstract entity that bears numerous similarities to the virtual, namely, the fictional object. Fictional objects, like Sherlock Holmes, lack noncontroversial physical locations and properties, yet it seems that we can speak truly and falsely of them. The claim that Sherlock Holmes was a detective, it seems, is true. Fictional entities are also creations of an author, as virtual entities are created by programmers and users of programs. If it makes sense to talk of either entity as existing, then both types of entity owe their existence to linguistic practices of their creators. Thus, discussions of the ontological status of fictional entities provide a good starting point for discussing the ontological status of virtual entities. Below, I take some major positions on the nature of fictional entities and apply them to virtual entities, noting ways in which the parallels between fictional and virtual objects break down, and suggesting that a deflationist account of virtual objects is most plausible.

A Fregean Account of the Virtual

Frege obviously didn't discuss the nature of virtual entities. However, he did discuss the nature of fictional objects in detail. For Frege, many utterances we would not normally consider fictional, such as those that use empty singular terms, empty definite descriptions, and empty demonstratives, fall into the category of fiction. In "The Thought," for example, Frege claimed that the demonstrative, "that lime tree," said when there was in fact no lime tree, is a fictional utterance (28). And, similar to straightforwardly fictional utterances, these have sense but no reference. In "On Sense and Reference," Frege claims that "The sentence 'Odysseus was set ashore at Ithaca while sound asleep,' obviously has a sense. But since it is doubtful whether the name 'Odysseus,' occurring therein, has reference, it is also doubtful whether the whole sentence has one" (62). Also, in his posthumous writings, Frege claimed that, "Although the tale of William Tell is a legend and not history and the name 'William Tell' is a mock proper name, we cannot deny it a sense" (130). Thus, for Frege, fictional names have sense but no reference. And since the reference of a sentence is its truth-value and the referent of one of the parts of a sentence about William Tell is missing, the sentence cannot be either true or false, but rather what Frege calls "fictitious" (Logic 130). On this account, sentences such as "Sherlock Holmes was smart"
— 56 — — Philosophy and Computers —

cannot be true, nor can sentences such as "Sherlock Holmes had seven heads" be false, since there is no referent for the empty name. The sense of "Sherlock Holmes" is an abstract, individual concept expressed by the name, but because it lacks a referent, there is no reference for the entire sentence that contains the fictional name. Frege, thus, does not commit himself to an ontology of fictional entities.

Applying the above account to virtual islands, cafés, farms, and apartments: on the Fregean account, claims about virtual entities have sense but no reference. We understand them, but utterances about virtual objects lack truth values, due to the fact that they lack referents. Thus, on the Fregean account, the claim that my living room resembles a beach is neither true nor false, but fictitious. My claim has sense, but no reference, since there is no object to which "my living room" refers.

The main problem with such an account is that what are meant to be quite serious utterances fall into the realm of fiction. The claim that "Sherlock Holmes was a fictional character," for example, would have to lack reference on the Fregean account and lack truth value. Similarly, I am often making quite serious claims about virtual objects that turn out to lack truth-values on this account. For example, I often claim that my virtual farm is much smaller than my friends' virtual farms. This seems to me to be not only a serious utterance, but also a true one. The Fregean account, which seems to put all empty terms, descriptions, demonstratives, and fictional objects into the same category of expressions that have sense but no reference, and hence no truth values, would undoubtedly put claims about virtual objects in the same category. However, just as it is unlikely that someone who is asserting something about "that lime tree" intends her utterance to be fictional and/or lacking in truth value, it is unlikely that most of what is said about virtual objects is intended as fictional and/or lacking in truth value. Thus, a key problem for the Fregean account of virtual objects is its claim that expressions that describe them lack truth value.

A Nominalist Account of the Virtual
The nominalist with respect to fictional objects claims that Sherlock Holmes and his ilk do not exist, despite the fact that authors, readers, and critics speak as though fictional objects exist. The challenge for the nominalist, though, is to explain how claims about fictional objects can be true or false without relying on the existence of fictional entities. One example of how this might work comes from a chapter of Kendall Walton's Mimesis as Make-Believe, titled "Doing Without Fictional Entities." According to Walton, the notion of make-believe, or prescribed imaginings, and not an ontology of fictional entities, is all that is needed to account for the truth and falsity of assertions containing fictional names. He argues that "when a participant in a game of make-believe authorized by a given representation fictionally asserts something by uttering an ordinary statement and in doing so makes a genuine assertion, what she genuinely asserts is true if and only if it is fictional in the game that she speaks truly" (399). According to the above truth condition, when someone makes an assertion such as "Tom Sawyer attended his own funeral," she is participating in a game of make-believe authorized by the representation, in this case, The Adventures of Tom Sawyer. Her assertion is not part of the novel, but authorized by it. And this assertion is true, according to Walton, "if and only if it is fictional in the game that she speak truly." That is, her assertion is true if, and only if, from the framework of the game of make-believe, what she said is true. And this is determined by turning to the work itself.

This type of nominalist account can be applied to virtual objects, as well. Those who say that they cultivate virtual farms, decorate virtual apartments, and so on, are participating, on this account, in a game that is licensed by the programs. By talking about how many virtual farms I have, I am participating in a game of make-believe that is licensed by the programs. The farms do not exist, but I speak as though they exist. Moreover, a claim I make about a virtual farm is true or false if, and only if, what I say is licensed by the program itself. Altering Walton's truth condition for our purposes, and exchanging "game of make-believe" for "virtual world," we have:

When a participant in a virtual world is authorized by a given program to assert something by uttering an ordinary statement and in doing so makes a genuine assertion, what she genuinely asserts is true if, and only if, within the framework of the virtual world, she speaks truly.

That is, an assertion about a virtual object is true if, and only if, from the framework of the virtual (pretend, if you will) world, what is said is true. And we can determine what is true or false by turning to the relevant features of the program itself. Thus, according to this nominalist account of virtual objects, such objects do not exist, but we can make true or false assertions about virtual objects nevertheless.

Yablo's Oracle Argument is one argument that has been offered to show that we are not committed to the existence of abstract objects in general. Suppose that an oracle told us that there were no abstract entities, that is, no numbers, no abstract forms, no abstract meanings of terms, nothing abstract at all. In such a situation, would anything about our linguistic practices change? It seems like nothing would. We would still talk about how 2+2=4, about the meaning of summer in Spanish, and whether Harry Potter is a good magician or not. In this situation, we would acknowledge that there were really no such things, but say that it was convenient to speak that way, regardless of the existence of these objects. In such a situation we would not be committed to the existence of these objects. Yet, if that is the case in this hypothetical situation, then we must not be committed to holding that these things exist right now. And since we should always strive to have the simplest view of what exists, we should hold that talk about abstract objects is like engaging in a little fiction, one that is useful to us. This "oracle argument" applies to virtual objects, as well. If an oracle told us that there were no virtual objects, in reality, then our linguistic practices would be little changed. We would still talk about these objects, even granting that they do not exist, but acknowledge that speaking about them was a kind of shorthand for describing the intricacies of the program. We are not committed to the existence of the virtual objects, then, but use words that seem to commit us to the existence of these things.

However, one problem with this account, at least as it applies to fictional entities, is that it is hard to see how it is possible to both "fictionally assert" something and make a "genuine assertion" at the same time. In participating in a game of make-believe we are participating in the pretense, and are hence making pretend, or fictional, assertions. The jump from an assertion made during a game of make-believe to a genuine assertion seems itself unauthorized. In order to genuinely assert something about a fictional entity, we have to step outside the pretense and comment on it. Otherwise, we are pretending to assert and not genuinely asserting anything. And the same thing seems to go for assertions about virtual objects made within and outside a "virtual world," i.e., the game licensed by the program. If my assertions are made within the context of a virtual world, I am still participating in this world (game) and not making genuine assertions in the real world about the entities in the virtual one.

— 57 — — APA Newsletter, Spring 2010, Volume 09, Number 2 —

A Deflationist Account of the Virtual
The last account to be discussed here is a deflationary one that acknowledges the need to posit virtual entities, yet denies that such entities exist independently from our linguistic practices. This view is an application of the account of fictional entities given by Saul Kripke in his John Locke lectures.1 To be able to allow for the truth or falsity of sentences such as "John admires Hamlet," Kripke argued that we can posit the existence of abstract fictional entities, yet at the same time deny that such entities are language independent. These entities don't have a secondary Meinongian existence, but exist in the actual world. Whether a particular fictional entity exists or not is an empirical question, and the answer is dependent on whether linguistic practices created such an entity or not. For Kripke, a fictional character "is an abstract entity which exists in virtue of more concrete activities the same way that a nation is an abstract entity which exists in virtue of the concrete relations between people" (Lecture 3, p. 20). Concrete activities like telling stories, writing novels, and so on, determine whether a statement about a fictional character is true. In the same way that a statement about a nation is true or false due to the activities of its people, a statement about a fictional character is true or false in virtue of the activity of making fiction.

How, then, would this account be applied to virtual objects? It seems that on this type of deflationary account, virtual objects, like my virtual farm, exist, but their existence is dependent on linguistic practices, presumably the activities of the program for creating the virtual farm. What is the mechanism by which the farm is created? Kripke gives the following explanation with regard to fictional entities. When an author of a work of fiction uses a fictional name of a person, that name is then used by readers, critics, and the author herself as a name of a fictional person. Language has this way of turning fictional names of entities into names of fictional entities. Analogously, the user of the program acts in such a way that she creates a virtual farm. By engaging in the concrete activities that create a virtual version of a real farm, a real version of a virtual farm is created. As Shakespeare fictionally described a real person named "Hamlet," and thereby created a real fictional person named Hamlet, I engaged in activities that create a virtual version of a real entity, and created a real version of a virtual entity. Here are some more examples from Facebook: when I poke someone, I am engaging in a form of pretense. In pretending to physically poke someone, I virtually poke someone. In pretending to send a physical flower, I really send a virtual flower. In pretending to tend someone's actual farm, I really tend someone's virtual farm. And so on. Like fictional characters, pokes, virtual flowers, virtual apartments, and virtual farms exist, but their existence is the result of my linguistic practices.

Deflationism with respect to virtual entities overcomes the problem of how it is possible to speak truly and falsely about virtual entities. We can speak truly or falsely about them since they exist and have properties that are specified in the linguistic practices that create them. There is no need to claim that, in making claims about these entities, we engage in a pretense. The pretense that is involved in creating the entities drops out once the objects are created. We can then make assertions that are quite serious about Sherlock Holmes, and, by extension, virtual flowers and other virtual entities. This feature makes deflationism a much more plausible position than nominalism.

Here, one might object that this view gives too bloated an ontology, admitting the existence of many objects that have no physical reality. However, keep in mind that, on this account, the number of virtual objects is limited by our concrete linguistic practices, so the number of such objects is limited by our own practices. We are not committed, as a Platonist seems to be, to admitting the existence of countless abstract entities.

A Final Comparison and Conclusion
The above views are interestingly compared with the Platonist view that abstract objects are independently existent entities that become instantiated when we, through our linguistic practices, embody them. If this were so, then SuperPokes, virtual flowers, and so on would exist independently of our linguistic practices. Perhaps this is merely an intuition pump, but it seems highly unlikely that these things exist independently of Facebook and the activities that take place on it. Thus, our talk about entities like the SuperPoke, the virtual flower, and so on, provides some intuitive grounds for thinking that, with respect to the ontological status of abstract objects, something like nominalism or deflationism is true. I have argued that, due to the fact that the Fregean account has no way of accounting for true and false claims about virtual entities, it cannot be correct, and that a serious flaw of the above nominalist account is that the distinction between pretending to assert something and really asserting something is not fully acknowledged in the nominalist truth condition. This leaves the deflationist account of virtual entities, which posits the existence of virtual objects, but claims that these objects are created by our linguistic practices. Like nations, and Sherlock Holmes, virtual objects like the ones I send to people on Facebook exist, but their existence is dependent on the practices of the programmers and users of the program. Had these not been in place, there would be no virtual flowers. Yet, this existence is not some weaker form of existence. Does this have untoward consequences with respect to one's ontological commitments? Since the only virtual objects that are created are the result of our concrete linguistic practices, only a limited number of objects can be claimed to exist. So, those of us who spend far too much time cultivating virtual farms and sending virtual flowers can rest assured that our farms and flowers exist, although we get no actual crops, and the flowers don't smell as sweet as the ones purchased from the local florist.

Endnotes
1. This view also bears some similarity to Joseph Margolis' (2000) position that works of art are emergent entities that exist in a language-dependent way. However, a key difference between the two views is that, while Margolis commits himself to a strongly relativist ontology of artworks, Kripke does not claim that relativism with respect to fictional entities follows from the fact that fictional characters are language-created entities.

References
Frege, Gottlob. 1967. The thought. In Philosophical Logic. Oxford: Oxford University Press.
———. 1979. Posthumous Writings, ed. Hans Hermes et al. Oxford: Basil Blackwell.
———. 1980. On sense and reference. In Translations from the Writings of Gottlob Frege, ed. Geach and Black. Oxford: Blackwell.
Kripke, Saul. 1973. John Locke Lectures. Unpublished. Available in the Library, Oxford.
Margolis, Joseph. 2000. The deviant ontology of artworks. In Theories of Art Today, ed. Carroll. Madison: University of Wisconsin Press.
Walton, Kendall. 1990. Mimesis as Make-Believe: On the Foundations of the Representational Arts. Cambridge: Harvard University Press.

There Are No IMAGES (to be seen), or "The Fallacy of the INTERMEDIATE ENTITY" © Riccardo Manzotti, January 2010

Our world is filled with pictures, everywhere! Some of them are dynamic, like movies or videogames. But is there any image? I will claim that there aren't any!!


What is a picture? Mmmmh ... images maybe ... but pictures? A picture is a physical two-dimensional distribution of various properties (like color, gray values, etc.)

[Panel: text wrapped around a sphere and other curved surfaces, illustrating that even curved shapes carry two-dimensional pictures] This is a picture then! It's physical and it lasts! I can even go to a museum to see it!

The rolled paper as well as a curved surface still hold as pictures, since they can be mapped onto a two- or n-dimensional physical and geometrical projection.

in short, a picture is a physical thing

A postage stamp is a good example!

You can stick a picture but not an image! We can conceive n-dimensional pictures built in any way we like, even dynamic ones like movies and computer screens. But they are physical things! Thus an image is not a picture! Pictures are not images ...


It all started in Italy (surprise?), many centuries ago.

I suggest that you see an image which is nothing but any section of the visual pyramid

This image is all we need to see reality as it really is ... well, it really seems that seeing the world is nothing but seeing an image of it ... did I go too far?

Leon Battista Alberti (1404-1472); Filippo Brunelleschi (1377-1446)

Mmmh, I did even worse ... yet it seemed only reasonable, at that point, to assume that the external image was supposed to be at the other end of the visual pyramid. [Diagram label: OBJECT]

Kepler (1571-1630)

I suggested that there is an image inside the eye ... ehm, minor mistake ...

[Diagram labels: light source; external object; eye; alleged external image of the object; even more alleged retinal internal image getting inside the eye; intermediate photons; optical nerve; neural spikes; neural nucleus; brain]

However, as far as we know, there are no images along the visual perceptual chain ... many physical phenomena, but neither pictures nor images in any sense ...


The plain truth is that there are no images (not to speak of pictures!) either in the eye (pace Kepler) or in the brain (pace Kosslyn). It is only a useful cultural metaphor ... nothing but that!

There is no reason to suppose there is anything like an image between our experience and the world, nor that our experience is an image of some kind. These ideas derive from the historical importance of optics from the XVI to the XVII century. If we were small enough to walk inside the retina, it would be obvious that there is no image to be seen. Seeing the world and seeing the world by means of pictures or images are very different. Do we really see images?

Many scholars were misled by such wrong views.

[Diagram labels: lens; optical nerve; pupil]

Leonardo (1452-1519): I was so convinced that the eye was an image-capturing device that I made three big mistakes when I drew the eye! Can you spot them?

In the same sense, neither a camera nor a scanner is an image-capturing device. What they do is produce pictures out of what they have in front of them (which in turn is neither a picture nor an image). Cameras and printers make prints out of the objects they are pointed to. They do not capture images! Though a picture (even a stereo pair) could be the cause of a similar visual experience ... it does not have to be the case!!!


Also in perception the notion of image is obnoxious! If, to see a flower, I needed an image of that flower, then in order to see that image I would need an image of that image of that flower, and so on ad infinitum ...

But aren't there images on mirrors? Surprisingly, on mirrors there are no images, as can easily be shown by the fact that, given a mirror, each observer sees a different part of the world. Thus, on a mirror, there ought to be infinite images, which is to say none. A mirror is just a structure modifying the usual causal geometry of light rays.

Is that myself? ??????

The answer is NO once again! As we will see, there are neither images nor reflections. These names are nothing but shorthands to describe complex causal entanglements.

Yet ... how would you explain that, in a mirror, we don't see the world as it is, but apparently a left-right inverted version of it? Isn't it an inverted image of the world? Don't we see a reflection then?

There is nothing to be seen. Truth is that the order in which we see is different from usual. If there is a mirror, from any object there are two paths instead of only one for light. The alleged mirror image is nothing but the real world reached through a rearranged order of light rays incoming in the eye of the subject.

[Diagram labels: observer cheated by a reflecting surface; mirror surface; direct paths; additional paths due to the mirror]

[Diagram labels: A B C / C B A]


The same is true for microscopes, lenses, distorting mirrors and such: they change the causal geometry of light, but they neither create nor manipulate images. There are no images! A telescope does not acquire images but rather modifies the light-ray geometry in a very unusual yet useful way. A human observer is thus in causal proximity with events that would otherwise be disconnected from him/her!

We are not images, there is nothing on the mirrors!

In the same way it would be foolish to look for images on the surfaces of my glasses! If you look inside a telescope you won't find any image of Saturn any more than Saturn itself!

I'm nothing but a variation in the causal network of light rays, and you couldn't even draw me if you were precise! I'm a real picture of your face, digitally built and moving!

Yet, none of us is an image!!! There is no image of Saturn, here nor here! Of course I can always take a picture with a camera, which is a device to make things we call pictures. Really, there are no images to be caught!

There is a fundamental difference between looking at myself in a webcam-generated picture and at myself in a mirror (and it's not just the left-right inversion ...)


By deflecting light rays in unusual ways, a mirror allows the beholder to interact with reality. It creates neither reflections nor mirror images.

[Diagram labels: lightray; lightray; bullet; BANG!]

Of course I cannot really draw a mirror, since there is no picture on it, but you'll understand, I'm sure.

Consider the difference between driving without using rear mirrors and using them. With rear mirrors, the causal space of the driver changes drastically. There is no need to introduce images.

In sum, words like "image" or "reflection" are nothing but shortcuts to speak of more complex conditions. It's not an isolated case, though. In many other cases, to simplify a problem, the irreducible aspects of a more complex network were chopped into simpler fictitious bits.

[Panel: the Sun speaking] Earthlings! I'm still!!! I'm not going anywhere!

Like it is easier to say that the sun rises in the sky ... but it's WRONG!

[Panel: the word "image" dissolving among the words perception, mind, logic, consciousness]


The idea of sending and receiving images is supported by (wrong!) analogies between perceiving the world and receiving information, as if it were bits of matter going back and forth! Today, although almost nobody takes images in a literal sense, many use other words to flesh out their theories of perception. Would you like some more "mental image," or would you rather prefer some "neural image"? With a touch of "information," perhaps? Image express, virtual or real?

[Diagram labels: Brain; world; MIND?]

This is a trick to spot a fictitious intermediate entity: consider it in isolation from everything else. Does it really continue to exist? Try with money, information, and, of course, images!

[Caption: a neural pattern playing the role of the intermediate entity, like the old image]

David Marr, Stephen Kosslyn, and many others fall into the trap of the fallacy of the intermediate entity when they assume that representations are something separate from what they represent. They set aside the old notion of image but kept the notion disguised as "visual representation," "mental image," "neural pattern," and the like. Yet it is again the fallacy of the intermediate entity. For instance, put a coin under a microscope: would you ever find its value? Of course not, and the same holds for information, images, and the like!

And now that we have got to the end, I can thank you for your patience. It is somewhat paradoxical that I used cartoons to criticize a notion such as 'image' that derives from the widespread use of pictures. In the end, my point is to stress that there is a lot we still don't know, and that certain words (image, representation, and the like) disguise our ignorance. We don't really know what hides behind them.

In short, the fallacy of the intermediate entity shows that many entities we take for granted, such as images, derive from assuming that reality is made of separate autonomous entities. I wonder whether it couldn't be better to adopt a process ontology ... I wonder ...

Riccardo Manzotti
IULM University, Milan, Italy
http://www.consciousness.it
[email protected]


References
Bennett, M. R. and P. M. S. Hacker. 2003. Philosophical Foundations of Neuroscience. Malden, Mass.: Blackwell.
Kay, K. N., T. Naselaris, et al. 2008. Identifying natural images from human brain activity. Nature 452: 352-55.
Kemp, M. 1990. The Science of Art: Optical Themes in Western Art from Brunelleschi to Seurat. Yale University Press.
Kosslyn, S. M., W. L. Thompson, et al. 2006. The Case for Mental Imagery. New York: Oxford University Press.
Maddox, J. 1991. The semantics of plane-mirror inversion. Nature 353(6347): 791.
Manzotti, R. 2006. An alternative process view of conscious perception. Journal of Consciousness Studies 13(6): 45-79.
Manzotti, R. 2008. A process oriented externalist solution to the hard problem. The Reasoner 2(6): 13-20.
Pylyshyn, Z. W. 2002. Mental imagery: In search of a theory. Behavioral and Brain Sciences 25: 157-238.
Russell, S. and P. Norvig. 2003. Artificial Intelligence: A Modern Approach. New York: Prentice Hall.
Tye, M. 1984. The debate about mental imagery. The Journal of Philosophy 81(11): 678-91.

INVITATIONS

I. From Susan Castro: The APA Committee on Philosophy and Computers invites you to two sessions at the Eastern Division meeting in December 2010:

Session 1
Twitter: Brevity and Connectedness in Philosophical Communities
Chair: Susan Castro
Shannon Vallor. Topic: Social media and the Aristotelian shared life

Session 2
Beyond the Blackboard: Teaching Philosophy with Technology
Chair: Renee Smith
Dylan Wittkower. Topic: Communicating non-linear theories in ...
Marvin Croy. Topic: Modeling student data

Exact titles to be published in the Proceedings.

II. Biologically Inspired Cognitive Architectures (BICA 2010)
November 13-14, Arlington, Virginia, USA

What does it take to create a real-life computational equivalent of the human mind? BICA 2010 is the first international conference addressing this question within the framework of the cognitive architectures paradigm. The conference, held November 13-14 in conjunction with the AAAI Fall Symposium Series, follows two successful predecessors: AAAI FSS BICA 2008 and 2009.

Bridging biological and artificial learners
Learning and cognitive development are critical to any machine equivalent of the human mind. Although various models of learning have already been implemented in cognitive architectures, a wide gap still separates machine learning approaches from solutions found in biology (e.g., human self-regulated learning and meta-learning). The aim of BICA 2010 is to create a multidisciplinary forum for discussion of potential solutions and to broaden the potential scope of gaining a better understanding of the underlying capacities of human-like learners. Such a forum is an important step towards bridging the gap between biological and artificial systems.

The conference brings together four schools of thought: (1) computational neuroscience, which tries to understand how the brain works in terms of connectionist models; (2) cognitive modeling, pursuing a higher-level computational description of human cognition; (3) human-level artificial intelligence, aiming at generally intelligent artifacts that can replace humans at work; and (4) human-like learners: artificial minds that can be understood by humans intuitively, that can learn like humans, from humans, and for human needs. The intended focus in 2010 is on (4). A comparative table created by participants at the BICA 2009 forum clearly demonstrates that a joint discussion of the four schools is possible and can be highly productive and synergetic: http://members.cox.net/bica2009/cogarch/.

Potential topics
Topics of interest include: metacognition, self-regulated learning, meta-learning, emotional intelligence, and human-like cognitive growth ability; BICA models of biological learning mechanisms; a general theory of bootstrapped and human-like learning; experience with virtually embodied BICA; language acquisition and the symbol grounding problem; biological constraints for BICA (with a summary of recent progress in neuroscience); and scalability metrics. The conference will also address potential targets of applications of the BICA technology, including intelligent personal assistants, metacognitive tutoring systems and pedagogical agents, artificial programmers, virtual characters, machine learning and natural language processing tools of a new generation, intelligent social agents, and human-computer interfaces.

Registration and venue
- Holiday Inn (www.holidayinn.com/washarlington), Arlington, VA, adjacent to Washington, D.C. The selected location is across the street from the AAAI Fall Symposium site (the Westin Arlington Gateway).
- Early registration: $200 USD. The early student registration rate is $80 USD and will be granted to selected students.
