
University Microfilms International
300 N. Zeeb Rd., Ann Arbor, MI 48106

Josephson, John Richard

EXPLANATION AND INDUCTION

The Ohio State University

EXPLANATION AND INDUCTION

DISSERTATION

Presented in Partial Fulfillment of the Requirements for
the Degree Doctor of Philosophy in the Graduate
School of The Ohio State University

By

John R. Josephson, B.S., M.S.

* * * * *

The Ohio State University

1982

Reading Committee: Approved By

Ronald Laymon
Peter K. Machamer
William Lycan

Department of Philosophy

© Copyright by
John R. Josephson
1982

To my mother, who is in many ways a sine qua non
for the existence of this work.

VITA

November 2, 1944 . . . . Born - Cleveland, Ohio

1968 ...... B.S., The Ohio State University (Mathematics)

1970...... M.S., The Ohio State University (Mathematics)

1971-1982 Teaching Associate, Department of Mathematics; and Teaching Associate, Department of Philosophy, The Ohio State University, Columbus, Ohio

FIELDS OF STUDY

Major Field: Philosophy

Philosophy of Science. Professors Peter Machamer and Ronald Laymon.

TABLE OF CONTENTS

DEDICATION
VITA
LIST OF FIGURES

Chapter

1. Skeptical Considerations

   A. Cartesian Doubts
   B. Three Weaknesses of Empiricism
   C. The Task of This Dissertation

2. A Stand Against Skepticism: Foundations for a Logic of Induction

   A. The Epistemic Starting Place
   B. Is it Reasonable to be Reasonable?
   C. The A-B Perspective
   D. Inductive Procedures
   E. Inductive Rationality
   F. Explanatory Coherence
   G. Inference to the Best Explanation
   H. The Present Doxa
   I. The Stability of Belief
   J. Memory
   K. Sense Perception
   L. The Empirical Base
   M. Statement of the Thesis

3. Foundations: Explanatory Inference rather than Inductive Generalization

   A. Theoretical Entities
   B. Emergent Certainty
   C. Absorption and Insightfulness

4. The Problem of Induction

   A. Where is the Problem?
   B. Reichenbach's Vindication of Induction Improved: Probabilities as Propensities
   C. The "New Riddle of Induction": Projecting Stabilities
   D. Where Did the Problem Go?

Epilog

APPENDIXES

A. Every Real Number on [0,1] is Accessible as a Limit of Relative Frequencies

B. A Non-Convergent Sequence of Relative Frequencies

C. Strangs, Decastrangs, and Kilostrangs: Proposals for a Vocabulary for Low Likelihood Events

BIBLIOGRAPHY

LIST OF FIGURES

Figure 1. Circuit Diagram of a Hume Counterexample Machine

Figure 2. The Organization of a Mackie Counterexample Machine

Figure 3. Decision Matrix for the Hypothesis of the Great Deception

Chapter 1

Skeptical Considerations

A. Cartesian Doubts

There is no proposition whatever for which we cannot entertain a particle of doubt.

I cannot be completely sure that I am not being systematically deceived by a very powerful and evil being, perhaps one so powerful that it can do anything which is logically self-consistent. The deception may extend, therefore, to "inner" as well as to "outer" perceptions; a being so powerful would be able to confuse even my perceptions of my own mental states—dibble with my judgments, so to speak.

I cannot be completely sure that what seems most clear to me is not really an illusion, or that it is really clear to me at all despite its seeming so. My reasonings themselves might be a complete muddle, and I totally unaware of it. My beliefs, even those about my own thoughts, may bear no likeness at all to the way things really are. In short, anything I believe may actually be false, or worse yet, completely unrelated to reality.

Of course, if my thinking is a complete muddle, probably nothing I have said so far makes any sense. But I don't really think that my thinking is in such poor shape.

In particular, I think that I am making sense here and that what I am saying is true: absolute certainty is not possible. Of course I am not absolutely certain of that; yet it is too restrictive to require absolute certainty before we speak and assert, and I won't require it of myself.

Descartes thought that there are propositions that are immune from doubt. In this he seems to be mistaken. The proposition, 'I exist' can even be doubted. Perhaps all that really exists is this ongoing stream of consciousness, with the perceptions, musings, impulses, doubts . . . that make it up. There is no necessary connection between the existence of a thought and the existence of a thinker. It is conceivable that there are thoughts, but no thinkers thinking them; no selves at all, least of all myself. This stream of thoughts is not itself a thinker—not, perhaps, anything other than a stream of mental occurrences. It may not even be a unified thing at all. I am not this stream of thoughts. The particular thought 'I exist' might flicker into existence for a moment, unthought by any being, refer truthfully only to itself, and then pop right out of existence again. I, as thinker, am not this particular thought either. If this is conceivable, even just barely conceivable, then a particle of doubt is possible as to my own existence.

Furthermore, even if while introspecting I should suddenly "directly" perceive myself, I would have no way of being absolutely certain that I was perceiving correctly, since my inner perceptions are as uncertain as my outer ones. I might suspect that the "I" I see, seeing itself see itself, is just some fantastic free-floating illusion.

Thus, I cannot be absolutely certain of my own existence.

Yet surely it is certain that something exists. There is at least this stream of thoughts, with a briefly occurring self-referential thought 'I am'. Can I really doubt the proposition that 'Something exists'?

To begin with, I can doubt the existence of anything but the presently active parts of the stream of consciousness. I have already managed to conceive that the events of this stream of consciousness might be the whole of reality; now I realize that I cannot be completely sure that this consciousness has any history. Maybe all of this mental activity began only an instant ago, and consequently any memory I call up is false. Perhaps THIS THOUGHT EVENT (say it's an 'I am') is all that exists, all that has ever existed, all that will ever exist. Furthermore, since I may not be clearly perceiving THIS THOUGHT EVENT, even while it happens, any description of it is questionable, even the description of it as a "thought event" or as an "I am" thought. I had better just say that perhaps all there is is THIS.

Can I question the existence even of THIS? Is it conceivable that there is really nothing at all? I will admit that it is a hard thing to conceive, but I think that I can do it. The task is to conceive that the universe may be so totally void that nothing whatever exists—not my conception of the void, not even the void itself. Such a conception seems to have been approached by the author of the Heart Sutra, and it helps my imagination along to contemplate the following fragment from it:

Therefore in emptiness, no form,
No feelings, perceptions, impulses, consciousness;
No eyes, no ears, no nose, no tongue, no body, no mind;
No color, no sound, no smell, no taste, no touch, no object of mind;
No realm of eyes and so forth until no realm of mind consciousness;
No ignorance and also no extinction of it, and so forth until no old-age-and-death and also no extinction of them;
No suffering, no origination, no stopping, no path;
No cognition, also no attainment . . . .

When I give myself over to the spirit of the passage, I think that I can conceive that the universe is empty. So I think that I can doubt the existence of THIS, and doubt the truth of the proposition, 'Something exists'.

To be honest, I must say that I am not at all sure that I can conceive of the universe as being so totally void that not even the present thought exists. My reader is probably having a similar difficulty. Yet even if we can't quite conceive it, maybe what we have here is a failure of imagination, and not an impossibility in principle of conception. Maybe TOTALLY VOID is conceivable in principle; we are just too caught up in our ordinary ways of thinking to achieve it. My point is that some doubt is possible as to whether or not we can ever succeed in doubting that 'Something exists'. So we cannot be absolutely certain that we cannot doubt it. And so we cannot be absolutely certain that it is absolutely certain that something exists. The proper conclusion seems to be that absolute certainty is impossible for any proposition asserting existence; and even if absolute certainty could be achieved for some such proposition, it could only be achieved for asserting the existence of a momentary, indescribable THIS, and even here we could never be absolutely certain that we could be absolutely certain.

Even the propositions of mathematics and pure logic can be called into question. The paradoxes of set theory that appeared around the beginning of this century point out that our current mathematics and logic might be inconsistent. Furthermore, we can doubt the validity of any proof longer than a single step, for our memory of the previous step can be doubted; so, at most we can be certain of propositions which can be held within a single grasp of consciousness. And finally, an evil being capable of doing anything which is logically self-consistent, is capable of Confusing Me Utterly; and if I may be Utterly Confused, I cannot be completely sure I understand what, for example, 'All A are A' really means; and if I can't be sure of what it means, I can't be sure that it's true. So the propositions of mathematics and logic also may be doubted, and are less than absolutely certain.

At this point I should make it clear that I am not denying the existence of tables and chairs, my body, my reader, and such-like things. I am quite sure that these things exist, and even that 1+2=3. But I want to distinguish between "quite sure" and "absolutely certain".

When I say that I am "quite sure" of something, I do not mean that I cannot be wrong, but that I am very confident that I am right. I am happy to assert what I am quite sure of, base my actions upon it, bet on it, and normally leave it unquestioned. But if I were to be "absolutely certain" of some proposition, that would mean that I cannot by any stretch of the imagination be wrong; it would be beyond question in every context of investigation, even the most searching. What I am arguing here is that if we liberate our imaginations, we will find that everything we believe is open to question. If we follow Descartes, and suspend judgment on any proposition where we can entertain the slightest doubt, we will be left with nothing at all, or with very nearly nothing at all.

********

Shall we then, in the name of intellectual honesty, adopt the position of the complete skeptic? Shall we, perhaps, make only those judgments that our practical lives force upon us, and make these only tentatively? A more moderate approach would be to renounce both dogmatism and skepticism:

"There can be no doubt at all on the question, Marcus. The earth is at the center of the World. The truth of this is perfectly plain and certain. You have to be mad even to raise the question."

It is not that one cannot live the life of the complete skeptic. It is probably done to some degree every day:

"I dunno, Harry. The way I usually think of it, one finger and two fingers makes three fingers altogether. But what about base two or something? I hear that things are all different in base two. I just can't be sure."

Complete skepticism is probably possible; but suspending judgment on some question just because we cannot get absolute certainty seems no more intellectually honest than going ahead and believing what we find ourselves to be quite sure of. That is, the honest thing to do is to admit that it is always possible to be wrong, and to go ahead anyway and follow the clear sense of the evidence. We should "proportion our belief to the evidence." However we decide to go about it, if we are to be able to do science, or philosophy for that matter, we shall have to be able to make honest judgments. The life of the skeptic perhaps can be lived; it is just not the life of the scientist.

Summary

A. No absolute certainty.

B. We can always be wrong.

C. Complete skepticism not mandatory.

D. Believe what we feel sure of.

E. Follow the clear sense of the evidence.

B. Three Weaknesses of Empiricism

The following program has perhaps never been advocated in its entirety, and in precisely this form, by any single philosopher. Yet it does come pretty close to the views of David Hume, and something quite close to it has functioned, I think, as a sort of catechism for that school of thought usually called "empiricism". Consequently, I will refer to a hypothetical advocate of this program as an "empiricist". In short, this is what the empiricist believes:

(1) Absolute certainty is impossible.

Yet by retreating to what we are most certain of, we can find an epistemic starting place (an end to chains of justification):

(2) Descriptions of sense perceptions are reasonably certain.

and

(3) Propositions of (deductive) logic and mathematics are reasonably certain, though they convey no information about the world.

Furthermore,

(4) Scientific knowledge is to be based upon sense perception, and built up by the methods of logic and mathematics. This way the conclusions of science will also be reasonably certain, yet not completely immune from a subsequent need for revision.

(5) Since the propositions of science are characteristically general, while descriptions of sense perceptions are particular, and since logic and mathematics cannot be used to deduce general propositions from particular ones, some further (and non-deductive) inference pattern is necessary.

(6) This inference pattern is some form of "generalizing from experience" or "inductive generalization", or the related pattern of "projecting the next case".

(7) Inductive generalization (or indeed any non-deductive inference pattern taken to be fundamental) cannot be non-circularly justified.

This last is the traditional "problem of Induction" associated with Hume.

Finally,

(8) The meanings of words are to be analyzed in terms of sense perception.

********

One weakness of empiricism concerns the problem of induction. D. C. Stove has undertaken an investigation of Hume's argument that certain non-deductive inferences are unreasonable. The inferences that Stove examines are those Hume calls inferences "from the impression to the idea", and are inferences from experience to predictions. These come in two forms: (1) inferences from a present experience to a prediction, independent of past experience (e.g., This is a billiard ball striking another, so the second ball will move.); (2) inferences from a present experience to a prediction when we have had previous experience (e.g., This is a billiard ball striking another, in the past this has always resulted in the movement of the second ball, so this time the second ball will move.). Those of the first form Stove calls "a priori" inferences, and those of the second form "predictive-inductive" inferences.

Hume's argument occurs in two stages, each stage being an argument designed to show the unreasonableness of one of the forms of inferences. Stove renders (accurately, I think) the two stages as follows:

Stage 1

(a) Whatever is intelligible, is possible.

(b) That the inference from an impression to an idea, prior to experience of the appropriate constant conjunction, should have its premise true and conclusion false, is an intelligible supposition.

(c) That supposition is possible.

(d) The inference from the impression to the idea, prior to experience, is not one which reason engages us to make.

Stage 2

(e) Probable arguments all presuppose that unobserved (e.g., future) instances resemble observed ones. (The Thesis of Resemblance)

(f) That there is this resemblance, is a proposition concerning matter of fact and existence.

(g) That proposition (i.e., the Thesis of Resemblance) cannot be proved by any demonstrative arguments.

(h) Any arguments for the Thesis of Resemblance must be probable ones.

(i) Any probable arguments for that thesis would be circular.

(j) Even after we have had experience of the appropriate constant conjunction, it is not reason (but custom, etc.) which determines us to infer the idea from the impression.

The inference pattern of Hume's argument is claimed by Stove to be:

[diagram not reproduced]

This too seems to me to be an accurate rendering of the text.

At this point Stove undertakes to clarify the argument by translating much of Hume's terminology into more modern language.

The phrase "is not one which reason engages us to make" occurring in (d) and the equivalent expression in (j) are replaced by "is unreasonable" to avoid mistaking the claims to be psychological ones about the operations of mental faculties. This seems correct. Ultimately Stove analyzes 'unreasonableness' in terms of invalidity.

In (f), "is a proposition concerning matter of fact and existence" is replaced by "is a contingent proposition". This seems to capture Hume's sense.

In (g), "demonstrative arguments" is replaced by "valid argument from necessarily true premises". This also seems fair to the text. Hume cannot simply mean 'valid' as this would make nonsense of his claim that matters of fact cannot be proved by demonstrative arguments.

In (e), (h), and (i), "probable argument" is replaced by "inductive argument", where we are not to understand "inductive" to carry any sense of evaluation of the strength of an argument, but only to classify it as a non-deductive inference from observed instances of empirical predicates to unobserved instances.

The last problem of translation concerns the word "presuppose" in (e). Stove understands the word to be used in the sense where an argument presupposes a proposition p in case p is required as an additional premise to make the argument valid.

When we make these substitutions, Hume's argument comes out:

Stage 1

(a) Whatever is intelligible, is possible.

(b) All a priori inferences [from present experience to prediction] are such that the supposition, that the premise is true and the conclusion false, is intelligible.

(c) That supposition is always possible.

(d) All a priori inferences are unreasonable.

Stage 2

(e) All inductive arguments are invalid as they stand, and are such that, in order to turn them into valid arguments, it is necessary to add the Resemblance Thesis to their premises.

(f) The Resemblance Thesis is a contingent proposition.

(g) The Resemblance Thesis cannot be validly inferred from necessarily true premises.

(h) Any arguments for the Resemblance Thesis must be inductive ones.

(i) Any inductive argument for the Resemblance Thesis would be circular.

(j) All predictive-inductive inferences are unreasonable.

Hume's argument, if it is correct, has the corollary that no number of observations of nature is sufficient to warrant inferences to the universalized theories of natural science. And if we identify rational belief with the results of "reasonable inferences", the argument also has the corollary that there are no rational grounds whatsoever for any nontautologous expectation.

So the empiricist is led to the belief that the inferential practices that characterize science ultimately rest on "habit" or "animal faith" or some such. That is, in one step science is found to rest on irrational foundations.

But we should be able to expect at least a layer or two of rationality underlying scientific inferences. Furthermore, the conclusion that there are no rational grounds for making predictions collides head-on with common sense. Thus Hume's argument is really a reductio ad absurdum for the empiricist's program. The absurdity is due, I think, to weak and inadequate theories of inductive inference and inductive rationality. What we really need is a point of view from which the problem of induction does not arise, or failing that, a point of view from which the problem is easier to deal with.

********

A second inadequacy of the empiricist's program is its inability to deal satisfactorily with theoretical entities. This problem arises in part because inductive generalization, even if it is supposed to be rational, is incapable of supporting conclusions about entities unlike those that are observed, and in part because of the empiricist's peculiar theory of meaning.

Suppose that someone were to hold that the world is made up only of surfaces. Anything that is not a surface is either unreal or is a logical construction of surfaces. All qualities of physical objects are qualities of their surfaces, and all relationships between physical objects are relationships between their surfaces. "After all," he might argue, "we never see, hear, or touch anything but surfaces. We cannot even visualize anything but surfaces. Moreover, all of the 'primary qualities' philosophers have attributed to physical objects, qualities such as definite shape, velocity, elasticity, and hardness, are obviously just qualities of their surfaces. Even our sense organs themselves are just surfaces on our bodies. There is no good reason to suppose the existence of anything else. Talk of 'interiors' is unscientific, metaphysical, and downright mystical." Let us suppose that he even tries to support his view by appealing to a theory of meaning whereby the only meaningful talk is ultimately analyzable as talk about surfaces. "How can we meaningfully talk about entities of which we have no experience—which we cannot even imagine? The only kind of entities with which we are acquainted are surfaces, and we have no need to suppose the existence of any other kind of thing. Indeed, we would not even understand what it was that we were supposing."

Let us call this view "surfacism". As far as I know, no one has actually held this view in the form just presented, but it does bear a certain resemblance to some views that have actually been held in the history of philosophy, in particular to those we have characterized as "empiricism". The point of introducing it here is so that we may argue against it, and by analogy, to argue against those views that resemble it too closely. We thus oppose at one time the whole class of views that can be seen as varieties, more or less disguised, of surfacism.

Suppose we confront the surfacist with the common observation that when we cut a body in half, we seem to expose part of the interior to our sight. We do not find a stick of butter to be hollow when we slice off an end. What can the surfacist say? Let us suppose that he insists that in the act of cutting, our instrument stretches the old surfaces to close the solid. How can we refute this affront to common sense? We can say that the causal mechanism of this stretching seems to be obscure, but he can always counter that the obscurity is only a result of our superstitious belief in the existence of interiors.

Presumably once we accept the idea that causal relations hold only between surfaces, we will see that the surface of the knife stretched the surface of our butter, and that this is all there is to the matter. Stretching is one of the ways that surfaces can act upon other surfaces.

Stretching is just what we should expect when an elastic body meets a hard body; it is no more causally mysterious than the motion that is imparted when one hard body strikes another.

Suppose we ask the surfacist to give an account of the mass of a physical object, a quality that we normally associate with interiors. He might reply that we have no awareness of mass, separate from our awareness of surfaces; so mass must somehow be done away with, or analyzed in terms of surfaces. He can concede that the mass of a physical object is a function, not of the surface area of the object, but of its volume; but then he would insist that volume be analyzed in terms of the dimensions of an object—and the dimensions of an object are qualities of its surfaces. Mass, he would maintain, is merely a logical-mathematical construction. The notion is completely analyzable in terms of the qualities of surfaces.

Suppose we ask the surfacist to explain how we came to have this belief that physical objects have interiors. How can we have been so wrong all along? He asserts that this belief is a remnant of the belief in a soul or self that inhabits our bodies. We have projected this primitive religious belief onto the outer world, and imagine that in a similar way physical objects have mystical interiors. "Of course this is all nonsense," he asserts, "there aren't any such souls, and no interiors either. But even if we were able to introspectively discover our 'interiors', it would still be a mistake to project these subjective entities onto the outer world."

It begins to appear that a categorical refutation of surfacism may be impossible. If the surfacist is ingenious enough, and if he is determined enough, he may very well be able to go on countering our objections indefinitely. If we are unable to refute the position, must we accept it and embrace surfacism ourselves? Clearly not. That we cannot refute a position is not rationally compelling evidence in its favor. The position may have a competitor that is equally hard to refute; or one that, despite the existence of strong and unanswered objections, nonetheless has some other decisive advantage in competing for our acceptance. In the present case, surfacism must compete with our commonsense belief in the existence of interiors, and with the whole family of views, ranging from the commonsensical through the various theories of physics, that suppose the existence of interiors.

Empiricism, in its tendency to try to reduce the unobservables of a theory to observables, comes dangerously close to surfacism. Empiricist theories that go this way, while they probably cannot be decisively refuted, nevertheless do not cohere very well with common sense and physics.

********

The third inadequacy of the empiricist's program is its inability to deal satisfactorily with causal relations. Empiricism is usually associated with the "regularity-of-occurrence" view of causality, but this view of causality can be shown to be inadequate.

Now it is a matter of some controversy whether the notion of the cause-effect relationship has any significant role to play in modern theoretical science. But it can hardly be disputed that this notion is an important instrument in the applied sciences, in public planning, and in the conduct of the ordinary affairs of life. With some urgency we seek to understand the effects that various human activities have upon the world's ecosystems; I have been on occasion very interested in discovering the cause of my son's apparent discomfort. Yet despite the pervasiveness of this notion, a satisfactory philosophical understanding of it has proven to be elusive. Causal relationships can be known—our ordinary experience confirms this. If we are to understand how they come to be known, it seems important for us to discover what they are.

One recurring proposal, particularly associated with David Hume, but going back at least to William of Ockham, is that the cause-effect relationship is nothing more than some sort of regularity-of-occurrence relation. Various empiricist philosophers have put forth varied proposals for this regularity relation. But all regularity accounts have in common the virtue that they make it clear, at least in principle, how causal relationships might come to be known: what happens is that we notice that in all of our experience events, conditions, or happenings of type A have always stood in the appropriate regularity relation to events, etc., of type B; and by an inductive leap we conclude that they always will be so related. If some regularity account of causation is correct, then knowledge of causal relationships could result from a process as logically uncomplicated as generalizing from experience. On the other hand, if no regularity account is correct, we must expect that causal knowledge is the result of some other, and probably logically more complex, process.

My intention in the remainder of this section is to evaluate a contemporary regularity account of causation, that of Wesley Salmon, and then to generalize one of the insights gained in the process to apply to all regularity accounts. I will argue that no regularity account can be satisfactory.

Salmon holds the view that relations of causal influence are certain complicated sorts of relations of statistical relevance. To understand what he means by statistical relevance, we will need to consider a few preliminary notions:

Notation: p(A/B) is the conditional probability of A given B, i.e., the probability that something is an A given that it is a B. A and B here represent classes of things (events, conditions, factors, etc.), or they represent properties whereby things may be grouped into classes.

Definition 1: C is statistically independent of B within A iff (if and only if) p(C/A.B) = p(C/A). A.B here represents the set intersection of A and B, if A and B are classes; or the property of having both properties A and B, if A and B are properties.

The relation of statistical independence can be shown to be symmetric; i.e., if C is statistically independent of

B within A, then B is statistically independent of C within A.

Definition 2: C is statistically relevant to B within A

iff C is not statistically independent of B within A.

It follows that statistical relevance is also a symmetric relation.
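Definitions 1 and 2 are concrete enough to check mechanically. The following is a minimal Python sketch (the population and property names are invented for illustration, and probabilities are taken as relative frequencies over a small finite population, a simplification of Salmon's long-run frequencies):

```python
from fractions import Fraction

def p(event, given, population):
    """Conditional probability p(event/given), computed as a relative
    frequency over a finite population of items, where each item is
    the set of properties it possesses."""
    base = [x for x in population if given <= x]
    return Fraction(sum(event <= x for x in base), len(base))

def independent(C, B, A, population):
    """Definition 1: C is statistically independent of B within A
    iff p(C/A.B) = p(C/A)."""
    return p(C, A | B, population) == p(C, A, population)

# A toy population: each individual is the frozenset of its properties.
pop = [
    frozenset({"A", "B", "C"}), frozenset({"A", "B"}),
    frozenset({"A", "C"}), frozenset({"A"}),
]
A, B, C = {"A"}, {"B"}, {"C"}

# Here p(C/A.B) = 1/2 = p(C/A), so C is independent of B within A;
# the check in the other direction illustrates the symmetry claim.
print(independent(C, B, A, pop), independent(B, C, A, pop))  # -> True True
```

The symmetry is no accident: whenever p(B/A) and p(C/A) are both defined and non-zero, p(C/A.B) = p(C/A) and p(B/A.C) = p(B/A) are each equivalent to p(B.C/A) = p(B/A)·p(C/A).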

Since Salmon holds the view that probabilities are just long-run frequencies of occurrence, and since he also holds that causal relevance is a species of statistical relevance (a probability relation), it follows that his view is a regularity account of causation. In particular it is what we will call a "statistical regularity" account in order to distinguish it from "strict regularity" accounts which do not call upon statistical notions.

Nowhere in Salmon[1] or Salmon[2] does he completely spell out an analysis of the cause-effect relationship;[14] he is more concerned to give an analysis of scientific explanation. But he does make a number of interesting claims about causation, and I would like to focus on one of them. Before we turn to it, we shall need one more notion. Definition 3: D screens off C from B in reference class A iff p(B/A.D.C) = p(B/A.D) ≠ p(B/A.C).

This means that within A.D, the property C is statistically independent of B, and further, that within A.C the property D is statistically relevant to B. Intuitively, this means that when we know whether or not something has D, our knowledge of the likelihood that it also has B is not at all improved by knowing whether or not it has C. On the other hand, if we know whether or not it has C, we can improve our information about whether or not it also has B by determining whether or not it has D. Crudely, D is more statistically relevant to B than C is.

The screening-off relation is non-symmetric.

Salmon suggests that the screening-off relation is an indicator of causal proximity.[15] The idea is that if D screens off C from B in reference class A, then D is causally closer to B than C is to B, and conversely, if D is causally closer to B than C is to B on background assumption A, then D will screen off C from B in A. The idea behind his suggestion seems to be crudely expressible as: statistically more relevant (in the sense of screening-off) implies causally more relevant, and vice-versa.

I would like now to confront this suggestion with what

I take to be a counterexample:

Example 1: Let A = adults, B = people who die of lung cancer, C = smokers, D = residents of Winetuckamuck, Maine.

Now let us suppose that the statistics show that the incidence of death from lung cancer in Winetuckamuck is slightly higher than the average for smokers generally. Then p(B/A.C.D) = p(B/A.D) > p(B/A.C), so D screens off C from B in reference class A.

Now it seems that while smoking is causally related to lung cancer (we have this on the authority of the Surgeon General), living in Winetuckamuck per se is not. Yet living in Winetuckamuck screens off smoking from getting lung cancer. It does so because we assumed that it is statistically relevant to the incidence of lung cancer. Here we have an example where the more causally proximate factor is screened off by what seems to be a causally unrelated factor.
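The structure of Example 1 can be verified mechanically. Below is a Python sketch with an invented head-count (the numbers are mine, chosen only so that every resident is a smoker and the town's cancer rate runs slightly above the rate for smokers generally); it confirms that D then screens off C from B in A, per Definition 3:

```python
from fractions import Fraction

def p(event, given, pop):
    """p(event/given) as a relative frequency over a finite population."""
    base = [x for x in pop if given <= x]
    return Fraction(sum(event <= x for x in base), len(base))

def screens_off(D, C, B, A, pop):
    """Definition 3: D screens off C from B in reference class A iff
    p(B/A.D.C) = p(B/A.D) and p(B/A.D) != p(B/A.C)."""
    return p(B, A | D | C, pop) == p(B, A | D, pop) != p(B, A | C, pop)

A, B, C, D = {"adult"}, {"cancer"}, {"smoker"}, {"resident"}

# Invented head-count: 10 non-resident smokers, 3 of whom die of lung
# cancer; 5 residents of Winetuckamuck, all smokers ex hypothesi, 2 of
# whom die of lung cancer.  Rate among smokers generally: 5/15 = 1/3;
# rate among residents: 2/5, slightly higher.
pop = (
    [frozenset({"adult", "smoker", "cancer"})] * 3
    + [frozenset({"adult", "smoker"})] * 7
    + [frozenset({"adult", "smoker", "resident", "cancer"})] * 2
    + [frozenset({"adult", "smoker", "resident"})] * 3
)

# Residence screens off smoking from lung cancer, though residence per
# se is (by stipulation) causally irrelevant.
print(screens_off(D, C, B, A, pop))  # -> True
```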

One response that a defender of the screening-off-indicates-causal-proximity thesis might make to this example is to maintain on some grounds that living in Winetuckamuck is indeed more causally proximate to death from lung cancer than smoking is. He might point out that the condition of being a resident of Winetuckamuck implicitly includes the condition of being a smoker, since ex hypothesi all of the residents smoke, and then suggest that the increase in the incidence of lung cancer over the rate for smokers generally is an indication of additional causal factors at work. (Perhaps the reasoning here would be an appeal to the Principle of Sufficient Reason: if there is a difference in the rate, there must be some reason to account for it. Or perhaps an appeal to something like Mill's Method of Difference: there must be something different about Winetuckamuck that would account for the different rate.)

Thus, a defender of the thesis might claim that the residence condition, seen as indicating a slightly more complete account, is plausibly taken to be the more causally proximate condition.

But this response to the proposed counterexample is

inadequate. The slightly higher incidence of lung cancer

in Winetuckamuck may not be an indication of anything at all. It might just be (and we can suppose that it just is) a result of the statistical fluctuation of the incidence of lung cancer about its mean value;[17] that is, if there really are towns where all of the adults are smokers, it is to be expected that in some of them the incidence is higher, and in some lower, than the rate for smokers generally. But if the higher rate in Winetuckamuck is not causally significant, then living in Winetuckamuck is not causally more proximate to lung cancer than smoking is.

And the counterexample stands.

At this point a defender of the thesis might decide to put the blame for the increased statistical relevance

(in the sense of screening off) on the short-run nature of the relative frequencies upon which the probabilities were based. Recall that Salmon holds that probabilities are long-run relative frequencies. The relative frequencies that we record from our surveys and observations are to be thought of as samples (hopefully, random samples) of the limits that the relative frequencies would approach in the infinitely long run.[18] Thus Salmon would likely maintain that long-run screening off is the perfectly reliable indicator of causal proximity, whereas recorded screening off is merely a statistically reliable indicator. Presumably, in drawing causal conclusions from recorded data our confidence ought to be a function of the sample size, the strength of the evidence that our sample is random, and the size of the difference in the recorded relative frequencies.

Thus we can conclude that the thesis that recorded screening off is an indicator of causal proximity can be maintained only if it is taken to mean that recorded screening off is statistically related to causal proximity.

The thesis has to be taken this way because of the possibility of accidental (i.e., causally meaningless) fluctuations in the statistics. Statistical relevance relations can, and sometimes actually do, hold accidentally.

But what about long-run screening off? Is it a perfectly reliable indicator of causal proximity? It might be well at this point to raise the more general question as to whether any long-run statistical relevance relation might ever hold accidentally. Is it possible for two variables to be in fact statistically relevant to each other in the infinitely long run, without this statistical relevance

having some sort of causal significance? The answer

clearly is that it is indeed possible, at least in certain

types of cases.

The simplest (and perhaps most trivial) kind of

example of this is where the statistical relevance is a

result of logical relations rather than causal ones. The

correlation between days on which it rains and days on

which it precipitates requires no causal explanation

(although we would not be inclined to call the correlation

"accidental" in this case). Perhaps we should consider

logical relations to be trivially causal relations, but I

will not pursue this question just now.

Less trivial examples are those where the relevant

classes are finite, even in the infinitely long run. In

the Winetuckamuck example we can easily imagine that living

in Winetuckamuck would be statistically relevant to death

from lung cancer, even when we take account of the whole

population of the town from its founding until its eventual

absorption into the city of Tuckaweegum. Or to take an

actual example, rather than a made-up one: it is in fact

the case that I have gone camping many times, and that on

each and every occasion when I have done so it has either

rained or snowed. Presumably this correlation is accidental (though sometimes I wonder). But even in the infinitely long run it will likely be the case that the relative frequency of rain in any location is greater if John

Josephson is camping there than it is otherwise. This is so because, unless I live long and camp often, it is already arithmetically impossible to wipe out the correlation.

Besides these types of cases are there any others?

That is, supposing the relevant classes to be at least potentially infinite, can there be accidental (non-causal and non-logical) statistical relevance? I submit that this is at least conceivable, whether or not it ever actually happens. It is possible in principle; there is no contradiction implied by supposing the existence of an infinitely protracted coincidence, though we might be inclined to set the probability of such an occurrence at zero.[19] For example, we can without contradiction suppose that hemlock is actually harmless; it just happens to be the case that everyone who ever has or who ever will drink hemlock dies of other causes shortly thereafter. Improbable as this seems, it is not impossible, and we are forced to concede the logical possibility of infinitely protracted accidental statistical relevance relations.

Summarizing our discussion so far of Salmon's thesis that screening off is an indicator of causal proximity, we have found that: (a) Recorded screening off is at best a statistically reliable indicator of causal proximity; and (b) Long-run screening off may be in fact a perfectly reliable indicator of causal proximity, but there is no necessity to this connection. We were led to these conclusions by considering the possibility of accidentally occurring statistical relevance relations.

I would like now to consider another example, one with a somewhat different moral.

Example 2: Suppose that there were to be in the solar system two comets— Chin's comet and Chan's comet, such that they have the same period of revolution around the sun.

Suppose further that they are out of phase by six months to the day, i.e., perihelion of Chin's comet is invariably followed six months later by perihelion of Chan's comet.

Then with respect to the general reference class D of days, p(A/D.I) = 1, where I = days which are six months after perihelion of Chin's comet, and A = days of perihelion of

Chan's comet. It follows that nothing can possibly screen off I from A in D. This is so because if something (call it C) were to screen off I from A in D, by definition this would require that p(A/D.C.I) = p(A/D.C) ≠ p(A/D.I). But ex hypothesi p(A/D.I) = 1. Furthermore, since D.C.I is a subclass of D.I, p(A/D.C.I) = 1 as well. Thus we could conclude that 1 ≠ 1.
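The impossibility claim can be illustrated numerically. In the sketch below (the period and all numbers are arbitrary choices of mine), A and I coincide by construction, so p(A/D.I) = 1, and no candidate class C, however chosen, can yield p(A/D.C.I) different from p(A/D.I):

```python
import random

days = range(1000)            # a toy reference class D of days
period = 100                  # arbitrary orbital period, in days
# I: days six months after perihelion of Chin's comet; A: perihelion
# days of Chan's comet.  Ex hypothesi the two classes coincide exactly.
I = {d for d in days if d % period == 0}
A = set(I)

def cond_p(event, given):
    """p(event/given) as a relative frequency over subsets of `days`."""
    return len(event & given) / len(given)

# For any class C whatever, D.C.I is a subclass of D.I, so
# p(A/D.C.I) = 1 = p(A/D.I): screening off would require 1 != 1.
random.seed(0)
for _ in range(100):
    C = {d for d in days if random.random() < 0.5}
    if C & I:
        assert cond_p(A, C & I) == 1.0

print(cond_p(A, I))  # -> 1.0
```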

Despite the fact that nothing can possibly screen off the property of being a day which is six months after perihelion of Chin's comet from the property of being a perihelion day for Chan's comet, neither of the properties is related to the other as cause to effect. They are the results of a common cause, namely, the characteristic gravitational field of the sun.

The moral of this story is that, if we can find an absolutely invariant functional relationship between two variables, then no additional variable can ever screen off the two related variables from each other. But such an invariant functional relationship could always be conceived as the result of a common cause, with the variables themselves being causally remote. Again we conclude that screening off bears no necessary relationship to causal proximity. This time the problem arose, not from considering accidental statistical relevance relations, but from considering the non-accidental relevance relation arising from the events having a common cause.

I claim that this last result generalizes to doom any attempt to analyze the cause-effect relationship as some sort of regularity of occurrence, whether statistical regularity or strict regularity, whether universal regularity of actual occurrence, or lawlike regularity of any possible occurrence. I would like to argue for this claim

by first arguing that no strict regularity account can

succeed, and then turn to arguing that no statistical

regularity account can succeed.

I begin by arguing against an analysis of the causal

relation put forward by Hume. In some places in the

Treatise Hume analyzes the causal relationship as follows:

(I modernize the language somewhat.) An event A is the

cause of an event B iff (1) A and B are spatially and

temporally contiguous, and (2) A is temporally prior to B,

and (3) events similar to A are constantly conjoined with

events similar to B. That is, whenever an event similar to

A occurs, it is followed immediately thereafter by an event

similar to B; and whenever an event similar to B occurs, an

event similar to A has just occurred.

Ignoring for the moment the spatio-temporal conditions,

we produce a counterexample to the analysis by making a machine to the following specifications: the machine is

supplied with electric power, it has a single button on the

top and it has two little lights on the front. The circuitry is arranged in such a way that when we press the button, both lights go on. The lights never go on except when the button is pushed. Thus the events consisting of the lighting of the leftmost light are constantly conjoined with the events consisting of the lighting of the rightmost light. So far so good. The operations of this machine are

a counterexample to the view that causation is just constant conjunction, since neither lighting event is the

cause of the other. They are the results of a common

cause, namely, the pressing of the button together with

the flow of electricity through the circuits.

Figure 1. Circuit Diagram of a Hume Counterexample Machine

Now we modify the machine slightly to take account of

the condition requiring the cause to be temporally prior

to the effect. We introduce a time delay device into the

circuitry leading to bulb A. Now when we press the button,

bulb A lights slightly before bulb B, exactly one instant before bulb B. If a temporal lag of one instant does not make sense, then neither does the condition requiring temporal contiguity, so we are on safe ground here. Finally, we move the bulbs so close together that they touch. We now have a counterexample machine to Hume's complete analysis. Lighting events of bulb A are regularly related to lighting events of bulb B exactly as they should be for A-events to be the cause (on Hume's analysis) of B-events. But A-events are not in fact the cause of B-events. They have a common cause. Diagrammatically:

A ← C → B rather than A → B

(where C is the common cause, the pressing of the button).

I claim that with the use of time delay devices, and the careful placement of bulbs or arrangement of a continuous display (for example, a cathode ray tube), we can always handle the spatio-temporal conditions of a regularity account.[21] So I will ignore these conditions as I go on to describe bigger and better counterexample machines.
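The finished two-bulb machine is easy to mimic in software. Here is a minimal sketch (the discrete numbered "instants" stand in for the machine's timing; the modeling choices are mine):

```python
def run(presses):
    """Simulate the Hume counterexample machine over discrete instants:
    each button press at time t lights bulb A at t and, via the delay
    device, bulb B at t + 1."""
    events = []
    for t in presses:
        events.append((t, "A"))
        events.append((t + 1, "B"))
    return sorted(events)

log = run(presses=[3, 10, 42])
a_times = {t for t, bulb in log if bulb == "A"}
b_times = {t for t, bulb in log if bulb == "B"}

# Constant conjunction with temporal priority: every A-lighting is
# followed one instant later by a B-lighting, and every B-lighting is
# preceded by one -- yet neither causes the other; both are effects of
# the common cause, the button press.
print(b_times == {t + 1 for t in a_times})  # -> True
```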

But before we go on to consider further machines, I want to point out that our last machine is already a counterexample to the view that a cause is simply a necessary condition for its effect, and a counterexample to the view that a cause is simply a sufficient condition for its effect. (That is, if "necessary condition" and "sufficient condition" are understood truth-functionally.) In this machine the occurrence of A-events is a necessary and a sufficient condition for the occurrence of B-events, since bulb A lights iff bulb B lights. But the machine is so constructed that A-events are not the cause (or part of the cause, or a cause) of B-events. Thus the cause-effect relationship consists neither in truth-functional necessity, nor in truth-functional sufficiency, nor in truth-functional necessity and sufficiency. Nor does it consist in any of these three relations together with additional spatio-temporal conditions.

Perhaps it is clear by now that by designing counterexample machines, we can defeat any attempt at analysis based only on spatio-temporal conditions and conditions of regularity of occurrence whose logical form is built up out of truth-functional connectives. Just in case this is not yet clear, let us consider one more such view. In a paper first published in 1965,[22] J. L. Mackie proposes the following analysis: C is a cause of E (on a certain occasion) if C is an INUS condition of E, i.e., if C is an Insufficient but Necessary part of a condition which is itself Unnecessary but Sufficient for E (on that occasion). He gives this example: When experts declare a short-circuit the cause of a fire, "they are saying, in effect, that the short-circuit is a condition of this sort, that it occurred, that the other conditions which conjoined with it form a sufficient condition were also present, and that no other sufficient condition of the house's catching fire was present on this occasion."

The design of the counterexample machine is as follows. The machine has four lights, C, E, A, and B. If the C-light is lit we say that condition C is present, and similarly for the other lights and conditions. The machine has four buttons, 1, 2, 3, and 4. It is wired in such a way that when button 1 is pushed, A and E light. A lights under no other circumstances, so A is a sufficient condition for E. When button 2 is pushed, C alone lights. So C is by itself insufficient for E. When button 3 is pushed, B alone lights. So B is by itself insufficient for E. When button 4 is pushed, C, B, and E all light. The organization of the machine is summarized in Figure 2.

             buttons
  lights   1   2   3   4

    A      X
    B              X   X
    C          X       X
    E      X           X

Figure 2. The Organization of a Mackie Counterexample Machine

Now suppose that I push button 4. I claim that on Mackie's analysis, C is a cause of E on this occasion. By examining Figure 2 we note first that C by itself is not a sufficient condition for E, since if button 2 is pushed, C lights but E does not. Then we note that there is a condition (C.B) which is sufficient for E and is the only sufficient condition for E (besides E itself) which occurs on this occasion (of pushing button 4). (The only other

sufficient condition for E is A, which does not occur on

this occasion.) But this condition (C.B) is not necessary

for E, since if button 1 were to be pushed, E would occur

but C.B would not. Finally we note that C is a necessary

part of this condition (C.B), because C is a part and

because B alone is not sufficient for E. (Button 3 lights

B but not E.) So C is indeed an INUS condition for E and

should be a cause of E according to Mackie's analysis. Yet

here again the relationship is not one of cause-effect, but of results of the action of a common cause— namely, the pushing of button 4 and the flow of electricity through

the circuits.
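The machine's wiring table is small enough that the INUS verification above can be run mechanically. A Python sketch (the names follow the figure; the encoding of a conjunctive condition as a set of lights is my own):

```python
# Which lights go on when each button is pushed (per Figure 2).
WIRING = {1: {"A", "E"}, 2: {"C"}, 3: {"B"}, 4: {"C", "B", "E"}}

def occurs(condition, button):
    """A conjunctive condition is present on an occasion iff all of
    its lights are lit on that occasion."""
    return condition <= WIRING[button]

def sufficient_for_E(condition):
    """condition is sufficient for E iff E occurs on every occasion
    on which condition occurs."""
    return all("E" in lit for lit in WIRING.values() if condition <= lit)

# On the occasion of pushing button 4:
assert not sufficient_for_E({"C"})                       # C: Insufficient
assert sufficient_for_E({"C", "B"})                      # ...but part of a Sufficient condition
assert not sufficient_for_E({"B"})                       # ...of which C is a Necessary part
assert sufficient_for_E({"A"}) and not occurs({"A"}, 4)  # C.B: Unnecessary (A also suffices)
print("C is an INUS condition for E on the occasion of button 4")
```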

We are left with only the statistical regularity

accounts of causation to deal with, and the extension of

the arguments to law-like regularity accounts. As far as

I can see, the only new type of machine component that might be needed in order to construct counterexamples to statistical regularity accounts is a random-choice switch. Such a switch, when activated, would have the ability to randomly, but according to prearranged statistical biases, activate certain other circuits leading to various lights.

Now in all likelihood we will never need a random-choice switch to construct our machines, because a given statistical regularity account will probably take strict regularities to be limiting cases of statistical regularities. If so, a machine constructed along the lines we have been using so far will do the job of mimicking the limiting-case strict regularity, and thus provide a counterexample. But just in case a view of causation should be devised that requires regularity relations which are essentially statistical in nature, we had better keep a box of random-choice switches on the shelf.
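Such a component is easy to model in software; a sketch (the two-circuit layout and the 70/30 bias are arbitrary examples of "prearranged statistical biases"):

```python
import random

def random_choice_switch(biases, rng=random):
    """When activated, light one of several circuits at random,
    according to prearranged statistical biases."""
    circuits, weights = zip(*biases.items())
    return rng.choices(circuits, weights=weights, k=1)[0]

random.seed(1)
counts = {"left": 0, "right": 0}
for _ in range(10_000):
    counts[random_choice_switch({"left": 0.7, "right": 0.3})] += 1

# In the long run the relative frequencies approach the prearranged
# biases, so a common-cause machine built from such switches can mimic
# any purely statistical regularity.
print(abs(counts["left"] / 10_000 - 0.7) < 0.05)  # -> True
```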

What I claim to have proved with these counterexample machines is that no regularity-of-occurrence relation is a sufficient condition for a cause-effect relation, this being so because we can arrange for any regularity relation to hold as the result of the action of a common cause.

(That we can actually arrange for a real live counterinstance makes this a stronger argument than one from the mere conceivability of accidental regularities.) Now in fact I have only shown how to design counterexample machines for truth-functional regularity relations and for statistical regularity relations. If someone were to propose that the cause-effect relation is a regularity-of-occurrence relation of some new third kind, it might be the case that a counterexample machine of this type is not possible. So I would like to present two more arguments, these designed to show that any regularity at all might be the result of a common cause.

The Cinematic Argument

A moving picture representation of a series of events displays (or can display) any of the regularities of occurrence that the original events contained. Yet when the movie is shown, the event representations do not have internal causal connections in the way that the events that they represent do. The event representations are all effects of a common cause: the complex projection process or the order of events on the film, or some such.

The God-may-be-the-sole-cause Argument

Imagine a Berkeley-like universe in which there are no internal causal relationships. Everything that happens, happens as a direct result of its being willed by a powerful God. I think we must concede that such a universe is logically possible, and that such a universe might very well be indistinguishable from our own. It follows that no regularity-of-occurrence relation is sufficient for the cause-effect relation, because no regularity relation rules out a God-is-the-common-cause explanation for that regularity.[23]

I believe that all of the accounts that hold the cause-effect relation to be some particular regularity-of-actual-occurrence relation have now been successfully refuted. But what if causation were held to be regularity of occurrence per se? Russell seems to have some such idea in mind when he suggests (Russell [1]) replacing the notion of cause-effect with the notion of the functional relationships (in the mathematical sense) between certain events at certain times and other events at later, at earlier, or even at the same time. (Or more precisely, the functional relationships between the variables characterizing the events.) The point is that he does not take the cause-effect relation to consist in some particular functional relation, or even in some particular kind of functional relation (though he seems partial to differential equations). For our purposes, and ducking quibbles about applying the various forms of the word "cause", we can take Russell to be suggesting that causation is nothing more than occurrence-according-to-a-functional-relationship.

Whatever the merits of this proposal, it suffers from the same defect as all of the views we have so far considered. Regularity-of-occurrence relations, whether truth-functional, statistical, or functional, do not have the power to capture the distinction between direct causal relatedness and relatedness due to a common cause; they do not have the power to distinguish A → B from A ← C → B. Yet this distinction is both conceptually natural and practically (if not theoretically) indispensable.

To see that Russell's proposal fares no better than the others in this regard, we need only note that all of the particular regularity relations we have considered are just special cases of functional relations. So functional relations can, and sometimes do, obtain as the result of the action of common causes.

I believe that by the arguments presented so far we have shown that regularity of occurrence cannot be a sufficient condition for the cause-effect relation. But might regularity still be a necessary condition? Is every occurrence of a cause-effect connection an instance of a general regularity? I would like us briefly to consider this possibility.

The problem is that sometimes it is at least difficult, if not impossible, to specify plausibly and non-trivially the relevant general regularity. Abraham begat Isaac, and the relationship between the two is clearly a causal relationship.[24] But what regularity of occurrence connects parents with their offspring? That there are no people who did not have parents? Some other troublesome cases are provided by the relationships between human intentions and their realizations,[25] between "sustaining causes" and

stable states,[26] and by the causes and effects of "unique

events" such as the Fall of Rome and the Big Bang.

Now I do not claim that these troublesome cases refute

the view that regularity is a necessary condition. I just

want to cast a bit more doubt. To argue more forcefully would require a thorough examination of just what sorts of

regularity relations are sufficiently non-trivial to count

here, and I doubt that the effort would pay off in enough

insight to be worthwhile. Let us get on to other matters.

Three final remarks before I close this section.

First, I want to make it clear that I am not denying that

regularities of occurrence can be symptomatic of causal

connections, or that they challenge us to find causal

explanations. I am just denying that regularities as such

can capture causal relatedness, or provide causal explanations.

Secondly, I claim that I have argued successfully against what might be called lawlike-regularity-of-occurrence analyses of causation as well as what might be called universal-regularity-of-actual-occurrence analyses. If the mark of lawlikeness is the support of subjunctive conditionals, I claim that even lawlike regularities of occurrence may be the result of the action of common causes.

For example, I think that it is true to say of our Hume counterexample machine that (supposing the machine to be working properly), if bulb A were to light, bulb B would light as well.

Finally, I want to point out again that arguments

against regularity accounts of causation based upon the

possibility of a common cause do not suffer from a certain

weakness that arguments from the possibility of accidental

regularities suffer from. The weakness is that it is

awfully hard to believe in the occurrence of an accidental

regularity of potentially infinite scope, an infinitely

extended coincidence, as it were. Our intuitions seem to

demand that the likelihood of such a thing is extremely

small.

In summary, we have found three difficulties with

empiricism: a difficulty about the foundations of induction; a difficulty about the status of theoretical entities; and a difficulty about the analysis of causal relations. I hope to do better on all three counts.

C. The Task of this Dissertation

Let us suppose at the outset that science is possible—

and I mean by "science" something very much like what our

scientists think that they do. Let us agree that it is

possible by way of "rational" methods to arrive at factual

judgments about the world, both judgments about the occurrence of particular events (including expectations), and judgments which are highly abstract, general, and theoretical. Judgments, furthermore, which are often "accurate" or "true".

The fundamental question that this dissertation is to

address is this: How is science possible? An answer will

of necessity have something to say about just what sort of

science is indeed possible. As I see it, the task is to

clarify the logic of science, formalize it, rationalize it,

justify it, and perhaps propose changes in it; in short,

the task is to criticize the logic of science. This is what I mean by answering the question as to how science is

possible. By the "logic" of science, I mean its inferential practices, its way of handling evidence, and its most

basic concepts and assumptions.

Now I do not propose to complete such a general and

abstractly defined task in the present work. The most I

can hope for is to lay down a general program for answering

the question, and to take the first few steps toward providing a detailed answer.

NOTES FOR CHAPTER I

1. The Prajna Paramita-Hridaya Sutra, c. fourth century A.D. This translation from the Sanskrit is from a sutra card from Zen Mountain Center, Zenshinji Monastery, Tassajara Springs, California. Other, more readily available translations are: Suzuki [1], pp. 26-30, and Conze [1], pp. 162-164. A discussion of the Prajna Paramita Sutras (of which the Heart Sutra is a part) may be found in Dumoulin [1], p. 34ff. and B. L. Suzuki [1], p. 99ff.

2. Or a failure of courage. TOTALLY VOID is, after all, pretty threatening to anything I might be attached to.

3. Perhaps it is conceivable to the inhabitants of the planet Pegasus, who have a greater evolutionary invest­ ment than we humans do in "imagination" and a weaker investment in "logical thinking".

4. See, for example, Kneale [1], p. 652ff.

5. "The sentence 'I can't be making a mistake' is certainly used in practice. But we may question whether it is then to be taken in a perfectly rigorous sense, or is rather a kind of exaggeration which perhaps is used only with a view to persuasion."— Wittgenstein, On Certainty, #699 (Wittgenstein [1]).

6. "A wise man, therefore, proportions his belief to the evidence." David Hume, An Inquiry Concerning Human Understanding, Section X ("Of Miracles"), Part 1, Paragraph 4 (Hume [1]).

7. Stove [1].

8. I am indebted to Nicholas P. Goodman for the idea of "surfacism", and for inspiring much of this section. (See Goodman, N.D. [1].) Of course I take full responsibility for the present work, which may or may not adequately reflect his ideas.

9. For example Russell (in Russell [1]) suggests replacing the notion of causality with that of functional relation, the motive being the avoidance of misleading associations and oversimplifications of the actual forms of physical laws. He does not rest his argument on a rejection of determinism. Since the development of the quantum theory, however, it has become fashionable to reject the notion as part of a rejection of determinism. (See, for example, Schrödinger [1].) But it seems we need causal concepts anyway; we just need to divorce them from deterministic implications. That is, we need a notion of causality with connection, but without the sort of necessary connection which strictly requires that sufficiently like circumstances always result in like happenings.

10. Moody [1], p. 313.

11. Strictly speaking, the process is this uncomplicated only if some regularity-of-actual-occurrence account of causation is correct. If instead, some lawlike regularity account is correct, an inductive leap that carries us to knowledge of causal relationships would have to be more than a simple generalization from experience, since such a generalization cannot arrive at a counterfactual conditional.

12. For instance, he says, "I shall agree from the outset that causal relevance (or causal influence) plays an indispensable role in scientific explanation, and I shall attempt to show how this relation can be explicated in terms of the concept of statistical relevance" (Salmon [3], pp. 120-21). And later he says, ". . . we are construing causal relevance as a species of statistical relevance" (Salmon [3], p. 133).

13. This is currently a more standard notation for conditional probabilities than the one used by Salmon and Reichenbach. I will continue to use it throughout the present work.

14. Though he does go a long way toward doing so in Salmon [3], section 3, pp. 129-34. Here he invokes Reichenbach's conception of "the ability to transmit a mark" to distinguish causal processes from what he calls "pseudo-processes". Since "the ability to transmit a mark" does not seem to be a statistical notion, it might appear that Salmon is abandoning the attempt to explicate causal relevance in terms of statistical relevance, but later he says that the "ability to transmit a mark is simply a species of constant conjunction" (Salmon [3], p. 140).

15. Salmon [1], pp. 75-76.

16. I have been informed in conversations with social scientists that this view is actually taught in social science methods courses. I have not verified this, however.

17. To say that something is a "result of a statistical fluctuation" is not necessarily to impute causal power to statistical fluctuations, but perhaps to deny that the thing is causally explainable. We use causal language in a similar way if we say that something is "due to chance".

18. Inasmuch as probabilities for Salmon are potential limits of relative frequencies of actual occurrence, his account of causation is not strictly a regularity-of-actual-occurrence account; it might better be thought of as a lawlike-regularity-of-occurrence account.

19. Having a probability of zero is not the same as being impossible; e.g., the probability is zero that an integer chosen at random from the entire infinite set of integers would turn out to be 7,903, yet it is possible.

20. See, for example, Hume [2], Book I, Part III, Section XIV, p. 170; and Book I, Part III, Section XV, p. 173.

21. If someone would dispute this, they need only come up with a convincing counterexample.

22. Mackie [1]. His view here differs somewhat from the view he presents later in Mackie [2].

23. This argument might appear to lead to the skeptical conclusion that causal relations are in principle unknowable, because a supernatural sole cause (soul cause?) can never be completely ruled out. But we are led this way only if we require certainty before we have knowledge of causal relations, and this requirement is clearly too strong. The thrust of the arguments presented so far is that regularity-of-occurrence relations cannot distinguish direct causal relevance from indirect relevance due to a common cause.

24. Though it is a thing-thing causal relationship, rather than, say, an event-event one. Abraham, as part of Isaac's biological ancestry, is part of his causal ancestry. Abraham is ontologically prior to him. Isaac's being what he was is dependent on, and conditioned by, Abraham's being what he was.

25. What is the regularity-of-occurrence relation between Caesar's intention to cross the Rubicon and his act of crossing it? That Caesar normally (a% of the time?) accomplishes what he sets out to do? Or maybe, that human beings accomplish what they intend, if they are capable of it, and if nothing prevents them from doing so?

26. That if the furnace stops working, then the temperature of the house will fall unless it is warm out, or unless someone builds a fire in the fireplace, or someone bakes bread, or unless by some other means enough heat is added to the interior to make up for the heat loss?

Chapter 2

A Stand Against Skepticism:

Foundations for a Logic of Induction

A. The Epistemic Starting Place

It seems to me that skepticism is not wrong. If we so desire, we can adopt the position that knowledge (in some sense of the term) is impossible to have. Since absolute certainty cannot be attained, we can decide that nothing can be attained worthy of being called "knowledge". After all, if a proposition is not absolutely certain, then it might be false; and if it might be false, why call it "knowledge"?

We can decide that no judgment can ever be sufficiently justified so as to rationally compel our acceptance of it (even this one). We can strive mightily to be agnostic on every issue. Yet, as Hume says,

Nature, by an absolute and uncontroulable necessity, has determin'd us to judge as well as to breathe and feel.

If we allow ourselves just that degree of animal opinion which is urged upon us by circumstance and the makeup of our nervous systems, and make no pretense of rationality for those opinions, we can probably go on living our daily lives without a great deal of difficulty.

We can be skeptics all right. But we cannot simultaneously be skeptics and scientists. As we have already agreed, the scientist, qua scientist, uses "rational" methods to make factual judgments.

My position is that the scientist (and philosopher) has understanding as a goal—it is part of the nature of the enterprise.2 I can see no way of arguing that accepting this goal is incumbent upon all those who would be rational. One either accepts it, has a "will to understand", or one does not.

In choosing to speak of this goal of science as "understanding", rather than "knowledge", say, or "truth", I mean to emphasize that the object is multi-dimensional. We want more than simply true beliefs corresponding to the facts, though we do indeed want that. We also want our beliefs to be connected together, to be integrated into theoretical pictures of the world. In Aristotle's terms: it is not enough to have simply "knowledge of the fact",3 we also want "knowledge of the reasoned fact". We seek the "why" of things. We want to be in a position to explain the facts.4

Unlike Descartes or the empiricists, I propose a commitment as our logical starting place, rather than a method or a set of propositions. We start with a commitment to try to understand the world as best we can.

Now I take it that whatever "understanding" is, it includes the idea of having true (or at least accurate) beliefs. We presumably have a better chance of arriving at true beliefs if we are willing to make considered judgments, however we do that, than we have if we are unwilling to make considered judgments at all. Trying to understand things, we make judgments, and in so doing we demonstrate a certain faith in our powers of rational consideration. Whether or not this faith can be justified is a question to which we now turn.

B. Is it Reasonable to be Reasonable?

What if the evil being envisioned by Descartes really exists? Then there would be no hope of coming to understand the world. It is hard to get up much enthusiasm for any project flowing from the proposed commitment if we fear that the project is doomed from the beginning. So how plausible is the hypothesis that an Evil Deceiver is behind all of our perceiving and thinking? It would certainly succeed in accounting for all of the details of our experience and thought: P is experienced, or P is thought, because the Deceiver wants P to be experienced or thought.

This hypothesis explains any detail, but it does not give very detailed explanations; nor does it give very detailed predictions. It does not explain why P is experienced rather than something else. Why this particular illusion? Why should she want to deceive us anyway? Why should such a deceiver have allowed us to suspect her existence?

How strong is the evidence for the existence of this Great Deception? Clearly there is ignorance in the world, but there seems to be no reason to think that it is so completely pervasive. On the contrary, this hypothesis seems to live a life apart, as a bare possibility, with no real plausibility at all.

It is more to the point to ask whether accepting this hypothesis will be helpful in the search for truth. Will accepting it be conducive to the progress of human understanding?

If we are correct in believing the hypothesis, and our reason has indeed been subverted, then we have realized the One Big Awful Truth, and can reason no more. We cannot even have much confidence in our realization, since it tends to deny itself. But if we believe the hypothesis, and are mistaken in doing so, then we will have given up our reason for a great falsehood. We will have needlessly abandoned the pursuit of understanding. In either case, if we accept the hypothesis of the Great Deception we must abandon the commitment to try to understand the world.

On the other side, if we reject the hypothesis, and it is really true, then we will not have lost much. It is not much worse to be completely wrong about everything than to understand only that we are deceived about everything and have not got the power to improve the situation. Finally, if we reject the theory, and there is really no Great Deception, then we can go on about the business of seeking truth, reasonably confident in the belief (which for all that we can be certain of is false) that at least some of our beliefs are accurate, that our thinking is not completely muddled, and so our understanding, our thought, our reasoning has some bearing on the way things are.

These considerations can be summarized in the following decision matrix:

HYPOTHESIS OF THE GREAT DECEPTION

                        hypothesis     hypothesis
                        is true        is false

   accept the           small          great
   hypothesis           gain           loss

   reject the           small          great
   hypothesis           loss           gain

Figure 3. Decision Matrix for the Hypothesis of the Great Deception

In short, if the hypothesis is true, it makes little difference if we believe it or not; but if the hypothesis is not true, it would be a big mistake to believe it. The way things work out, the decision is completely dominated by the possibility that there is No Great Deception.

There is no real contest. We must reject the Great Deception hypothesis. Not because we can decisively refute it (Descartes is clearly mistaken on this point), but because we have a much better hypothesis available. That better hypothesis, which is really just the denial of the other, is the one that holds that most of our beliefs are at least somewhat accurate, and that human thought processes, and in particular human reason, are decently adapted to the ends of truth and understanding.
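The reasoning behind the decision matrix can be sketched in a few lines of code. This is only an illustration: the numeric payoffs below are invented stand-ins for the qualitative entries "small gain", "great loss", and so on, and nothing in the argument hangs on their exact values.

```python
# Toy expected-value rendering of Figure 3. The numbers are invented
# stand-ins for the qualitative payoffs; only their relative sizes
# (small vs. great) matter.

PAYOFF = {
    # (move, state of the world): payoff to the understanding-seeker
    ("accept", "deception"):    +1,    # small gain
    ("accept", "no_deception"): -100,  # great loss
    ("reject", "deception"):    -1,    # small loss
    ("reject", "no_deception"): +100,  # great gain
}

def expected_value(move, p_no_deception):
    """Expected payoff of a move, given credence that there is
    no Great Deception."""
    p = p_no_deception
    return ((1 - p) * PAYOFF[(move, "deception")]
            + p * PAYOFF[(move, "no_deception")])

# Because the stakes in the "no deception" column dwarf those in the
# "deception" column, even a small credence that there is no Great
# Deception makes rejecting the hypothesis the better move.
for p in (0.05, 0.5, 0.99):
    assert expected_value("reject", p) > expected_value("accept", p)
```

With these stand-in values, rejecting wins whenever the credence that there is no Great Deception exceeds roughly one percent, which is the sense in which the decision is "completely dominated" by that possibility.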

This hypothesis is better because it is more strategic—it is a better move in a game of understanding-seeking. What is demanded is that reason have confidence in its own methods; it has no real alternative but suicide.

As Socrates says in the Meno:

that we shall be better and braver and less helpless if we think that we ought to enquire, than we should have been if we indulged in the idle fancy that there was no knowledge and no use in seeking to know what we do not know;—that is a theme upon which I am ready to fight, in word and deed, to the utmost of my power.5

Summary:

We started with a commitment (perhaps I should say "a vow") to pursue certain ends. Then we rejected the hypothesis that reason had been completely subverted, and accepted instead the hypothesis that human thought processes are decently adapted to these ends, and that most of our beliefs are at least somewhat accurate.

This conclusion does not, of course, lend much encouragement to dogmatism. We are not entitled to complete certainty with regard to any purported principle of reason. We apparently make mistakes in our thinking; we can always be wrong; we are never entitled to be absolutely certain; and when we say that it is impossible for us to be wrong, we are necessarily exaggerating.

It is always possible (for all we know) that some principles of reason will turn out to conflict with others which seem just as central to reason. If this happens, we could have grounds for rejecting the one principle as being in conflict with others collectively stronger. Thus we are entitled to have only a provisional confidence in any belief, including purported principles of reason.

Now the argument just presented for confidence in reason is admittedly an appeal to reason (in the form of a decision-theoretic argument). So it would seem to be circular. If someone is not inclined to trust his own intuitions as regards what is, and what is not, a strong argument, then he is not likely to be persuaded by this decision-theoretic argument. Yet, if he has agreed to seek understanding, he has agreed to make a first move. My suggestion is that he acknowledge the force of the argument; if he has a better move, let him propose it. My suggestion has the virtue that it gets us some confidence in the whole of reason, based upon reliance on only a part.

I interpret the result as meaning (among other things) that we are to trust our intuitions concerning deductive validity in particular, and concerning the strengths of all sorts of arguments in general.6

C. The Δ-B Perspective

Before Newton, it was thought appropriate to try to explain motions. A typical dialogue might go something like:

Question: Why does it move?

Answer: Because you threw it.

Question: Why does it keep moving after it leaves my hand?

Answer: Because you gave it some impetus.

Since the Newtonian Revolution the dialogue has changed:

Question: Why does it move?

Answer: Because you threw it.

Question: Why does it keep moving?

Answer: Because it has no reason yet to stop.

Question: Then why does it eventually stop?

Answer: Because of the forces of resistance that it encounters.

There is a difference here not only in what questions are answered, but also in what questions are thought capable of being answered. "Because it has no reason yet to stop" is no answer; it is a denial of the appropriateness of the question (along with a hint as to what to ask instead).

We may think of the change as being from considering velocity (V) to be an appropriate object of explanation, to considering change of velocity (Δ-V) to be appropriate instead.

I suggest that we follow a similar Newtonian revolution for epistemology. This time, however, we shift the objects of justification, rather than the objects of explanation. Instead of justifying beliefs, we justify changes of belief.7 Let us call this perspective "the Δ-B perspective".

This should not be a very dislocating conceptual shift. Most old and familiar questions will continue to make sense from the Δ-B perspective, though perhaps a slightly different sense from before. In particular, if we are asked to justify belief in p (a proposition), we may not understand what we are being asked to do until some doxastic state or doxa is specified. Perhaps we can extract it from the context, but somehow we need to have some idea of what is already believed while p (which is presumably not believed) is under consideration. From the Δ-B perspective we can provide justification for p only in the sense of justifying p's addition to some doxa, or justifying the replacement of some doxa not including belief in p with another doxa which does include it.

"Doxa" can be variously interpreted as the sort of thing associated with individuals or with communities. 59

Thus we may speak of "my present doxa", "our present doxa"

(meaning, for example, me and my readers), or the doxa of the scientific community at the end of the nineteenth century. No presumption is made that a given doxa is logically consistent. The plural of "doxa" can be taken 8 to be "doxata".

"Aha!" you say, "how can 'the whole scientific community1 be considered as an individual? It speaks not with one voice."

"An individual human being speaks not with one voice," say I.

********

One approach to the philosophy of science can be simply put: We start with our present doxa, and investigate the doxastic state transition rules of the ideal scientist. This is the approach I advocate. We can identify "logic" and "scientific method" with just these doxastic state transition rules of the ideal scientist, which we may think of as her inferential practices.

This "ideal scientist" is ideal in the sense that she is an optimally efficient understanding-seeker. She is sub-omniscient, but she is an ideal learner and information processor. How would she operate? What would she do? And why would she do things that way?

As an approach to answering these questions, we might try to design a "robot scientist". What will it have to be able to do? How can it be done? Why should it be done that way? The idea is to design a "thinking thing". It need not have any "consciousness" or "vivid subjective experience". We will not concern ourselves with whether or not it really has a "mind"; yet it will be a "thinking thing" in the sense that it will have a changing "structure of outer representations", a system of "beliefs", a mathematical model of the world around it, a doxoid. It will be provided with a way of modifying that doxoid, partly as a result of new input. It will have doxoidal state transition rules programmed into it.

Me: doxa n → doxa n+1. Whatever rules or patterns I actually use. Some "rational", some not. I can improve by being mindful.

Scientific community: doxa n → doxa n+1. Whatever it actually uses; some "rational", some not. Can improve by being mindful.

Ideal scientist: doxa n → doxa n+1. These rules are justifiers for the ideal scientist, and for us.

Robot scientist: doxoid n → doxoid n+1. These rules are approximations to and suggestions for the ideal scientist.

Figure 4. Doxastic State Transitions
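The robot scientist's doxoid and its transition rules can be given a minimal sketch. Everything here is my own illustrative invention, not anything the text specifies: a doxoid is modeled crudely as a set of accepted propositions, and a doxoidal state transition rule maps a doxoid plus a new input to a successor doxoid.

```python
# A minimal sketch of a "doxoid" and doxoidal state transition rules.
# The names (Doxoid, add_observation, apply_rules) are illustrative
# inventions; a doxoid is modeled crudely as a frozen set of accepted
# propositions.

from typing import Callable, FrozenSet, Iterable, List

Doxoid = FrozenSet[str]
TransitionRule = Callable[[Doxoid, str], Doxoid]

def add_observation(doxoid: Doxoid, new_input: str) -> Doxoid:
    """The simplest possible rule: accept whatever is observed."""
    return doxoid | {new_input}

def apply_rules(doxoid: Doxoid,
                inputs: Iterable[str],
                rules: List[TransitionRule]) -> Doxoid:
    """Run each new input through each rule: doxoid n -> doxoid n+1."""
    for item in inputs:
        for rule in rules:
            doxoid = rule(doxoid, item)
    return doxoid

start: Doxoid = frozenset({"the gauge reads empty"})
end = apply_rules(start, ["the tank is nearly empty"], [add_observation])
```

A serious robot scientist would of course need far richer rules (rules that retract beliefs, weigh evidence, resolve conflicts), but the shape is the same: a doxoid in, a doxoid out.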

D. Inductive Procedures

Our thinking processes can conclude in a variety of ways. We can arrive at a belief, a disbelief, or at a decision to suspend judgment pending further evidence. We can come to expect something to happen. We can decide to pursue a goal, obey a rule, trust someone, adopt a strategy, or perform an action. We can decide to pretend that something isn't so.

Let us focus on inferences, which we can take to be processes of thought that conclude by accepting some proposition, i.e., in belief, and that include consideration of what is taken to be the evidence for the truth of the proposition which is provided by other propositions already accepted. In accordance with tradition, let us distinguish deductive from non-deductive inferences. Non-deductive inferences we will call inductive inferences.

By an Inductive Procedure (abbreviated IP) we will mean a pattern of inductive inference that we may implicitly or explicitly adopt, and follow or use in our thinking. For example, 'to generalize by supposing a sample relative frequency to accurately represent the frequency for a larger population' describes an inductive procedure, 'to reason by analogy' describes another, and 'to consult the oracle bones in the traditional manner and accept the indicated prediction' describes yet another. In general an IP may be either formal or informal according to whether or not it can be adequately described by its "logical form", i.e., as a pattern of symbol manipulations.

Inductive procedures are examples of doxastic state transition rules as described in the last section. Deductive procedures, and patterns of thought concluding in ways other than belief, are examples of doxastic state transition rules which are not IP's.

When I employ an IP I pass from some body of evidence or data, which I already accept in some manner, to an hypothesis, which I come to accept in some manner, on the basis of that data. An hypothesis accepted, by inferring in accordance with an IP, is a kind of conclusion. When called upon to justify a conclusion, I may be called upon to provide data (as premises) upon which it may be based (whether or not I actually arrived at the conclusion from that data), or I may be called upon to provide or to justify an IP (as an argument form) in accordance with which the conclusion may be drawn. IP's are thus taken to underlie both the logic of discovery and the logic of justification.

Corresponding to a given inductive procedure there is associated a class of qualifications upon its use. This class includes such things as restrictions on the type or quantity of data required, the degree or kind of assurance the conclusion is taken to have, and the purposes for which the IP may be employed. Some examples of these qualifications may help to clarify things and to ground the discussion.

Examples of Qualifications upon IP's

Restrictions on the type of data—rules regarding the admissibility of evidence in a court of law.

A rule requiring that the data include only replicable experiments.

Restrictions on the quantity of data—minimum sample size before a particular statistical test is taken to apply.

Degree or kind of assurance—authorizing acceptance of the conclusion as "a good working hypothesis", but not authorizing taking the conclusion to be "conclusively established".

Admitting the conclusion as "indicated" or "suggested" by the data.

Purposes for which the procedure may be employed—good enough for when a snap decision is needed, but not good enough for scientific justification.

Good enough for predicting the motions of slow-moving particles with accuracy enough for practical matters.

Similarly to inductive procedures themselves, the qualifications upon them may be formal or informal.

Since a change in the associated class of qualifications could represent a change in the way a procedure may actually be employed (e.g., by changing the status of the conclusion), and would perhaps change the status of the IP with respect to whether or not it is justified, we will say, when qualifications change, that we have a new IP. That is, we will count IP's as distinct if they differ in their associated class of qualifications, even if the procedures themselves pick out the same pattern. For example, inductive generalization with a sample required to include at least thirty individuals will count as a different IP than inductive generalization without this restriction.
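The convention just stated, that IP's with different qualifications count as different IP's even when the underlying pattern is the same, can be pictured in a small sketch. The class and field names below are my own, introduced only for illustration.

```python
# Sketch: an inductive procedure paired with its class of
# qualifications. Two IP's count as distinct if their qualifications
# differ, even when the underlying pattern of inference is the same.
# The names (InductiveProcedure, pattern, qualifications) are
# illustrative inventions, not the author's notation.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class InductiveProcedure:
    pattern: str                                  # e.g. a form of inference
    qualifications: frozenset = field(default_factory=frozenset)

ip_plain = InductiveProcedure("inductive generalization")
ip_restricted = InductiveProcedure(
    "inductive generalization",
    frozenset({"sample must include at least 30 individuals"}),
)

# Same pattern, different qualifications: these are different IP's.
assert ip_plain != ip_restricted
```

Making the type immutable and comparing on both fields mirrors the text's identity criterion: the qualifications are part of what the IP is, not an afterthought.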

An inference may generally be classified in a variety of ways, though some classifications will obscure those characteristics of the inference most relevant to its justification. This is analogous to deductive inference, where for example:

Snakes are reptiles.
Reptiles are vertebrates.

Snakes are vertebrates.

may be classified as having any of the following forms:

All S are R.
All R are V.

All S are V.

or

For every x, if x is an S then x is an R.
For every x, if x is an R then x is a V.

For every x, if x is an S then x is a V.

or simply

p
q

r

Classifying the inference as an example of this latest form is useless if we are trying to decide if it is a justified inference or not.

We have already said that the scientist, qua scientist, makes judgments. If he makes judgments, he will have to make inferences. Since he cannot get very far by making just deductive inferences, he will have to make inductive inferences. If he makes inductive inferences, he will be adopting IP's, at least implicitly. So the scientist adopts IP's.

Thus the scientist (and the philosopher) adopts IP's, not because he is sure that some IP is correct, or because he thinks that any IP at all is completely reliable, but because he has no alternative but to give up making judgments. But to give up making judgments is to give up the attempt to understand. Because of his goal of understanding, the scientist is justified in adopting some IP rather than adopting none at all.

E. Inductive Rationality

Fundamental questions:

A. Which inductive procedures is it rational to adopt?

B. Why is it rational to adopt these procedures?

One thing that distinguishes the scientist, qua scientist, from the layman is that the scientist has certain special concerns. He is concerned to justify his conclusions in ways that are persuasive to others. He is more careful in his reasoning than is important for most practical matters. He especially seeks conclusions which have broad explanatory power—that contribute significantly to our understanding of the world. Yet the layman may, on occasion, share any or all of these concerns. So an IP which is rational for the scientist to accept is presumably rational for the layman as well (though he may never have occasion to use it). Thus scientific rationality is, so to speak, a subset of the more general inductive rationality that applies to everyone who would reason. The above "fundamental questions" have, therefore, an important special case:

A'. Which inductive procedures is it rational for the scientist to adopt?

B'. How can they be shown to be rational for him?

In line with the approach advocated earlier, A' and B' can be answered if we can answer:

A". Which inductive procedures are adopted by an ideal scientist?

B". Why these?

F. Explanatory Coherence

Q: What is the ideal scientist trying to do?

A: She is trying to understand.

Q: Understand what?

A: As much as possible.

Q: Do you mean that she's trying to get as much as possible into each grasp of understanding? Or that she's trying to get as many different grasps as possible?

A: She'll take whatever she can get.

I want to interpret the commitment to pursue understanding to mean that one is to maximize explanatory coherence. This means that one is to modify one's doxa so as to promote, to the best of one's ability, explanatory breadth, explanatory integration, and explanatory accuracy; and to modify one's doxa so as to minimize cognitive dissonance, narrow-minded parochialism, inconsistency, ignorance, stupidity of all sorts, and error of whatever kind can be discovered. Thus, explanatory coherence is not a single magnitude, but rather an open-ended bundle of separate magnitudes whose commensurability, in general, cannot be presumed. Yet two doxata under comparison may differ only in one place and be otherwise identical; this is the case when I am considering changing my doxa by accepting one of two contrary explanatory hypotheses. The two prospective doxata in this case can be compared in every respect in which the two explanations can be compared.

Thus our maxim says: maximize explanatory coherence ["explanatory" because the end is understanding. If the end were supposed to be simply truth, we might just want to say "coherence"].

G. Inference to the Best Explanation

I have proposed a commitment to try to understand as the epistemic starting place. This gives us a general license to make inductive inferences, as has been said (see section D). Will it also provide justification for some particular IP? I think that it does, and I propose the following IP as a strategy to maximize explanatory coherence:9

to accept the best explanation we have of a body of data which we already accept

Qualification 1—only if there seems to be a distinctly best one available.

Qualification 2—only if accepting the explanation seems otherwise consistent with the overall goal of maximizing explanatory coherence.10

Qualification 3—only if we do not expect shortly to be in possession of significantly more relevant data.11

Qualification 4—Minimally, the conclusion is to be taken to be "suggested" by the data. From here, the degree of assurance increases, approaching "conclusively established" as the data broadens in kinds, and as it increases in quantity, as the conclusion is "best" to a higher degree and in more ways, and as the (failing) effort to discover viable alternative explanations increases.

Qualification 5—Any serious misgivings that arise in the process of coming to a conclusion are to be remembered along with the conclusion, and the acceptance itself is to be considered to be provisional pending satisfactory resolution of the difficulties.12

Let us call this the "inference-to-the-best-explanation IP" (IBE for short), or sometimes we may call it "the explanatory inference". This is an immature formulation of the principle; eventually we will have to spell it out in more detail. Let us also postpone for a while considering what "best explanation" means, and just take it at face value. We will consider the matter in more detail later.

Suppose we are presented with a choice of possible explanations for some body of data which we already accept. If we are to understand the data, we are justified in accepting the best explanation. This is because one way to understand something is to grasp an accurate or correct explanation of it. We have agreed to try our best to understand, and the best we can do in this case is to accept that explanation which seems to be the best one available. There is really nothing else we can do.

The IBE procedure should not seem too strange. We already use it quite commonly in our daily lives. Just how pervasive it is will emerge later.

Joe: Why are you pulling into the filling station?

Tidmarsh: Because the gas tank is nearly empty.

Joe: What makes you think so?

Tidmarsh: Because the gas gauge indicates nearly empty. I have no reason to think that the gauge is broken, and it has been a long time since I filled up the tank.

Under the circumstances, the nearly empty tank is presumably the best available explanation for the gauge indication, and Tidmarsh's other remarks are directed to ruling out possible competing explanations.
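Tidmarsh's reasoning can be sketched as a crude best-explanation chooser. Everything in this sketch, from the numeric plausibility scores to the margin implementing Qualification 1's "distinctly best" requirement, is an invented illustration of the IP, not a serious implementation of it.

```python
# A crude sketch of the inference-to-the-best-explanation IP.
# The plausibility scores and the "distinctly best" margin are
# invented for illustration (cf. Qualification 1); the other
# qualifications are not modeled here.

def best_explanation(candidates, margin=0.2):
    """Return the best-scoring hypothesis, but only if it is
    distinctly best; otherwise suspend judgment (return None)."""
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) >= 2 and ranked[0][1] - ranked[1][1] < margin:
        return None  # no distinctly best explanation: withhold acceptance
    return ranked[0][0]

# Tidmarsh's situation: what best explains the gauge reading "empty"?
candidates = {
    "the tank is nearly empty": 0.9,
    "the gauge is broken": 0.2,       # no reason to think so
    "someone siphoned the gas": 0.1,  # and it has been long since a fill-up
}
conclusion = best_explanation(candidates)
```

Returning None when the top candidates are too close together corresponds to suspending judgment pending further evidence, rather than forcing a choice among near-ties.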

H. The Present Doxa

Now we still have not gotten science or even common sense off the ground. We have a commitment, and we have an IP, but an IP requires data (already accepted propositions) to work from. And we do not seem to have any propositions yet.

Actually, we do have propositions already. Unless we have been working very hard to suspend judgment, there are all sorts of things which we believe. That we are reading John Josephson's dissertation, for example. Perhaps we just think that we are reading it? But the best available explanation for why we think that we are reading it is that we are actually reading it. Ergo, by IBE, we are reading it. Diabolically simple, really. Why do I think I'm reading? Because I'm reading. My reading is a cause of my thinking that I'm reading. If I think I'm reading, and have no reason to suspect a mistake, then the best explanation I have for it is that I am actually reading, and as a result, think that I am.

The point is that we already accept many commonsensical propositions that we have no real reason to doubt. If someone with a skeptical bent calls upon us to justify our belief in some such proposition, our response is to appeal to IBE:

Tidmarsh: This is my hand.

Pyrrho: Justify that assertion.

Tidmarsh: Well, I am myself quite convinced of it. I have no evidence that I misunderstand such a simple piece of the English language. I do not recall having ever in my adult life been corrected or unexpectedly misunderstood for my use of the words "my" or "hand", and I have surely used both words often. The best explanation that I am aware of for my success in using these words is that I do properly understand their use.

Furthermore, I have no reason to doubt my perceptions right now. I have never had any reason to suspect that I suffer from unprovoked hallucinations. As far as I know, I have not recently ingested any psychoactive drugs. And this thing looks, feels, and acts like my hand. See, I can wiggle it at will! The best explanation that I am aware of, for all of the data, is that this is indeed my hand, that "This is my hand" is a proper English assertion, and is in fact true.

Pyrrho: I see that you are convinced of it, but how can you convince me?

Tidmarsh: Do you accept that this looks like a hand?

Pyrrho: It looks like a hand to me, yes.

Tidmarsh: How do you explain the fact that it looks like a hand attached to my body, and the fact that I seem to be convinced that it is my hand?

Pyrrho: Maybe I'm hallucinating. Maybe you're misguided, or mad, or hallucinating too. Maybe the gods have created an illusion for their amusement. The point is, we just don't know about anything but the appearances. So we had better suspend judgment about what lies behind them, and remain silent. It looks like a hand, that's all. I don't try to explain it.

Tidmarsh: Ah, that is where we differ. I do accept explanations. And the simplest, most straightforward, most plausible explanation we have for the fact that this appears to us to be my hand, is that it is indeed my hand. I accept the conclusion.

The difference between us seems to be that I go willingly where the evidence leads, but you resist, waiting for certainty. Of course I could be wrong in this matter, though it hardly seems likely. But that doesn't really say much—we could always be wrong about anything. There is no reason to think that I am wrong in this case, and clear evidence that I am right—as clear as the hand in front of my face.

Some other examples of common sense propositions that we can defend by appealing to an explanatory inference analysis of the evidence can be got from G. E. Moore's "A Defense of Common Sense" (Moore [1]). I will list a few of them here:

1. There exists at present a living human body, which is my body.

2. This body was born at a certain time in the past and has existed continuously ever since.

3. At every moment since it was born, there have also existed many other things, having shape and size in three dimensions, from which it has been at various distances.

4. The earth had existed for many years before my body was born.

5. At every moment since the birth of this body there have been large numbers of other living bodies.

To this list we can add one of our own:

6. At least most living human beings are conscious in more or less the same way that I am conscious.

Now I do not claim, as Moore does, that I am certain of these things. What I do claim is that these things are true, and that we are quite justified in believing them.

Our justification in each case is that the truth of the proposition represents a distinctly best explanation of an enormous number of things which we already accept.

These common sense propositions are all examples of what Russell calls "instinctive beliefs".

Of course it is not by argument that we originally come by our belief in an independent external world. We find this belief ready in ourselves as soon as we begin to reflect: it is what may be called an instinctive belief. . . . Since this belief does not lead to any difficulties, but on the contrary tends to simplify and systematize our account of our experiences, there seems to be no good reason for rejecting it. . . .

All knowledge, we find, must be built up upon our instinctive beliefs, and if these are rejected, nothing is left. But among our instinctive beliefs some are much stronger than others. . . .

There can never be any reason for rejecting one instinctive belief except that it clashes with others; thus, if they are found to harmonize, the whole system becomes worthy of acceptance. . . .

Hence, by organizing our instinctive beliefs and their consequences, by considering which among them is most possible, if necessary, to modify or abandon, we can arrive, on the basis of accepting as our sole data what we instinctively believe, at an orderly systematic organization of knowledge, in which, though the possibility of error remains, its likelihood is diminished by the interrelation of the parts, and by the critical scrutiny which has preceded acquiescence.13

So let us start with those beliefs which we find "ready in ourselves as soon as we begin to reflect". We then proceed by modifying our beliefs as seems appropriate.

I. The Stability of Belief

We have said something about which propositions we are to accept; the question we now face is: which are we to doubt?

Each of us accepts an enormous number of propositions; should we attempt to justify each one? Surely this would take a while. So I propose that we adopt the following reasoning strategy, which we may as well call

The Principle of Stability

We withdraw acceptance from something already accepted (actually suspend belief, not just as a thought experiment), only when we have some reason to do so. The mere possibility that we are mistaken is insufficient reason to withdraw acceptance.

One reason for accepting this principle is purely pragmatic. If we are to get on with the business of science, it will not do to randomly suspend our prior beliefs. It makes more strategic sense to hold a belief stable until we come upon some reason to change it. Seen this way, the principle is one of economy of effort.

Other justifications for accepting the principle can be put forth which, in addition to strengthening the argument, are interesting in their own right. One such justification appeals to explanatory inference, but it will take some developing, so if it appears that we are wandering far afield, please be patient; we will get back to it.

Now usually, when I believe something, I can recall at least some of the evidence upon which the belief was based. If this evidence has been called into question, that would constitute some reason (though not necessarily sufficient reason) to change my mind about the thing. Contrapositively, if I have no reason to change my mind, then the original evidence remains to support the belief; and so I should presumably leave the belief intact. But what if I can recall no particular evidence to support the belief, or to support it at its present level of confidence? Is there any reason to continue to believe it?

To clarify things, let us consider an idealized example. Suppose that there is a proposition p for which we have no direct evidence one way or another. Suppose further that we are aware that a gentleman of our acquaintance by the name of Nordo Pringlee believes p. We are aware of no one else who has an opinion about p. We believe Mr. Pringlee to be of sound mind, but we are unaware of whether or not he has any special expertise in the matter. In this minimum information situation, can anything be inferred? Yes it can, though not with any high degree of assurance. The best available explanation

for the fact that Pringlee believes p is that p is true and that directly or indirectly the fact that p represents is a cause of Pringlee's belief. This explanation appeals to our belief that he is of sound mind, and is simple and straightforward. So under the circumstances we can infer p. In general we may say that the fact that someone believes p is some evidence for the truth of p, and, in the absence of any evidence against p, or challenging the person's reliability, p may be accepted, though very provisionally and with a nearly minimal degree of assurance.

This example points out that in many cases a datum has a usual or natural explanation. Here 'Nordo Pringlee believes p' has as its natural explanation 'p is true'.

The truth of p accounts, to some degree, for why he believes it. By accepting p, we are able to understand something of why Pringlee believes p rather than, say, not p. Another example is: 'I seem to see a cow' which has as its natural explanation 'there is a cow there which I see, and which is the cause of my perception'. In general, the natural explanation for something stands out as the best available explanation when information particular to the case is minimal. It functions as a default presumption.

In accordance with IBE, the natural explanation for something may be accepted when there is no evidence to support an alternative, incompatible explanation, or to challenge the natural one.
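The default-presumption rule just described can be put in schematic form. The following is my own toy sketch, not anything from the text: the dictionary of natural explanations and the shape of the counterevidence records are hypothetical, and serve only to show the rule's logic (accept the natural explanation unless something specifically challenges it).

```python
# Toy sketch of the default-presumption rule. The data structures
# (natural_explanations map, "targets" field) are my own assumptions.

def accept_by_default(datum, natural_explanations, counterevidence):
    """Accept the natural explanation of a datum unless some piece of
    counterevidence specifically challenges it; otherwise suspend."""
    challenges = [c for c in counterevidence if c["targets"] == datum]
    if challenges:
        return None  # the default is defeated; suspend judgment
    return natural_explanations[datum]

natural_explanations = {
    "Pringlee believes p": "p is true",
    "I seem to see a cow": "there is a cow there which I see",
}

# With no counterevidence, the natural explanation is accepted:
print(accept_by_default("Pringlee believes p", natural_explanations, []))

# A specific challenge defeats the default:
print(accept_by_default(
    "Pringlee believes p", natural_explanations,
    [{"targets": "Pringlee believes p",
      "claim": "Pringlee stands to gain by asserting p"}]))
```

The point of the sketch is only that the rule is defeasible: acceptance is the default, and it takes a targeted challenge, not the mere possibility of error, to withdraw it.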

Let us return to the argument for the Principle of Stability. In general a proposition of the form 'S believes that p' has as its natural explanation 'p is true, and is believed as a result of its truth'. In accordance with IBE, we may accept the natural explanation of a proposition of this form if we have no evidence that counts against it. Furthermore, this applies to myself.

If I find that I believe p, the natural explanation for this includes that p is true. So in the absence of any reason to change my mind (e.g., evidence that counts against the natural explanation, or the sudden appearance of some purpose for which a high degree of assurance is necessary), I should accept the natural explanation and go on believing p.

I would like now to recast the earlier pragmatic argument in support of the stability principle, and make it stronger, but this will require another digression.

Let us return to the idea of an ideal scientist. She would never forget, never lose her place in her reasoning, and never let desire or fear influence her judgment. How would she operate? What would she do? And why would she do things that way?

As we said before, an approach to answering these questions is to try to design a robot scientist. What will it have to be able to do? How can it be done? Why should it be done that way? Recall that it will have doxoidal state transition rules programmed into it. Let us call its way of changing the doxoid the "methodology program"; it will of necessity incorporate IPs. This methodology program may be self-modifying. The computer architecture of our robot need not be limited to that of a single processor which executes steps sequentially; many sequential processes may be linked together and coordinated;14 our robot may be able to do many things at the same time. Our robot may not even be predictable; it may use Monte Carlo strategies.

Also, this robot scientist need not be materially realizable or "mechanical"; it will be enough if its components and its programs are precisely describable.

Because our robot is to be a scientist, we design it to include the goal of understanding. It is not intended to just produce lists of true propositions; it is intended to produce explanations as well, at least to decide between explanations, and perhaps to create them too. In any case it will have to "decide" which explanations to present, and it will have to do this by somehow picking the best ones available to it. That is, the methodology program will incorporate IBE.
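One fragment of such a methodology program might look as follows. This is an illustrative sketch only: the text says merely that the robot must somehow pick the best available explanation, while the particular scoring criteria (plausibility, simplicity, coverage of the data) and the requirement that the winner be decisively better than the runner-up are my own assumptions, echoing the conditions on IBE discussed earlier.

```python
# Hypothetical IBE fragment of a robot scientist's "methodology program".
# The scoring criteria and the margin requirement are my assumptions.

def best_explanation(candidates):
    """Return the distinctly best explanation, or None if no candidate
    is decisively better than its closest rival (suspend judgment)."""
    score = lambda h: h["plausibility"] + h["simplicity"] + h["coverage"]
    ranked = sorted(candidates, key=score, reverse=True)
    if len(ranked) > 1 and score(ranked[0]) - score(ranked[1]) < 1:
        return None  # no distinctly best explanation; keep investigating
    return ranked[0]["name"]

hypotheses = [
    {"name": "it was the butler",   "plausibility": 3, "simplicity": 2, "coverage": 3},
    {"name": "it was the gardener", "plausibility": 1, "simplicity": 2, "coverage": 1},
]
print(best_explanation(hypotheses))  # the butler hypothesis wins decisively
```

The margin test encodes the earlier requirement that an explanation be accepted only when it is distinctly better than its rivals; a near tie sends the robot back to gathering evidence rather than into premature acceptance.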

Let us return to the Principle of Stability. Should our robot scientist be set up in accordance with it? Clearly so. The pragmatic argument seems even stronger here. Better our robot spends its time digesting what it has already accepted, or looking to acquire new information, than randomly suspending what it already accepts.

Sleeping dogs are allowed to lie when there is no apparent reason, at the moment, for arousing them. -- Isaac Levi15

J. Memory

Human memory is notoriously fallible. How then can there be any knowledge or understanding? Part of the answer is that, while our memories are fallible, they are normally more or less accurate.

Suppose I remember something, say event M. Suppose further that I have no particular reason to distrust my memory of M: I have no particular emotional stake in the matter, the memory seems clear and unconfused, I wasn't drunk or paranoid at the time, and am not now either, and it isn't a memory from my early childhood. Then the best explanation I have for why I have this memory of M is that M really did happen. This is the natural explanation, and in the absence of any reason specifically to question it, it may be accepted. So, by an explanatory inference, I decide to trust the contents of my memory in this case.

Now the scientific community as a whole is in better shape than individual humans. Its data base is more reliably stored and is less vulnerable to subsequent distortion. Nevertheless, the basic reason for trusting an old research report, say, is the same as before: the natural explanation for why it says what it does is that that is the way it was written. Thus, in the absence of any reason to suspect the contrary, I may infer that I am reading an accurate reproduction of what was originally written.

K. Sense Perception

What authorizes our giving any credence at all to our experience? Why believe our senses at all?

The situation is similar to the case of memory. The natural explanation for the fact that I seem to see a tree is that there really is a tree that I am seeing, and which is a cause of my perception. If I seem to see a tree, and the lighting is good, etc., etc., I may infer that there is a tree before me.

The scientific community depends upon the perceptions of individual humans. The natural explanation for why Tidmarsh reports measuring variable Y as 15.2 ppm is that variable Y was approximately 15.2 ppm, and Tidmarsh was reporting it correctly. Accuracy is normal, at least among scientists, and inaccuracy abnormal. The default presumption is normality.

Furthermore I claim that the objects of normal sight are physical objects; and that these seen physical objects are actual, noumenal objects, and not some sort of phenomenal objects or mere appearances.

A common philosophical opinion has it that when I look at a table, what I immediately see is an "appearance" of some sort. The "real table", according to this theory, is either forever unknowable, or, if it is knowable at all, is so only indirectly as a result of making inferences based upon the appearances. Let us call this theory that there are two tables, an apparent table and a real table, the Two-Table Theory of Perception.

I contend that the Two-Table Theory is false. I deny the existence of the apparent table; or putting the matter slightly differently, I deny that the apparent table is anything other than the real table.

To make sense of my contention, let us follow for a moment the visual metaphor that when we accept an explanation for something, we see that thing in a new way; we see it as part of a larger pattern. The metaphor suggests that explanations are some sort of pattern recognizings. I would like also to suggest the converse: that pattern recognition (seeing) is a form of explanatory inference.

When, for example, I see a picture of a table correctly as a picture of a table, I am in position to explain why the individual elements of the picture (the "pixels") are as they are, and not some other way. Furthermore, as I come to this gestalt, this seeing-as, I have ruled out in the process (at least implicitly) seeing the picture as a picture of something else. This is, formally at least, very similar to inferring to the best of competing explanatory hypotheses; and, if I am correct, is just a case of explanatory inference.

Similarly to seeing a picture of a table, when I look at a real table, and see a table, I have as part of the process ruled out seeing it any differently than the way I do see it. That is, the unconscious information processing of my visual apparatus relies upon explanatory inference. My eyes themselves, so to speak, infer the existence of the table from, and in order to explain, the firings of the retinal nerves, and then I find in my mind a vivid and detailed belief concerning what is in front of me.

If this description of the process is substantially correct, then the table before my mind is already a conclusion of an explanatory inference, and no further inference to a "real table" is required. My visual apparatus has already inferred to the real table before I saw anything. There is no additional "apparent table".

Of course I could be hallucinating. In this case I would be having a kind of mistaken seeing of a table. This is much like any other explanatory inference that results in accepting a false hypothesis. There is no more reason to posit an "apparent table" to stand as the object of the hallucination than there is reason to posit an "alleged flying saucer" to stand as the object of a false flying saucer report.

The table I see is the real table. True, I do not see all of the table. I don't see the underside, or the inside either. I don't see the atoms and the microscopic structures that make up the table. I don't see the whole of the table, but what I do see is the table itself, or a part of the table at least; not some image of the table.

I contend that the noumena, the things-in-themselves, can be known; they can sometimes even be seen.

L. The Empirical Base

When we actually make an explanatory inference we don't run through all possible explanatory hypotheses; for one thing there are far too many. We normally limit our consideration to "plausible" explanations, at least at first, in effect supposing that a plausible explanation will be better than an implausible one every time. We accept D, then we cast about for plausible explanations for D. If we find none, we lower our standards for plausibility a bit and try again. If we find only a single plausible candidate, one inductive leap, and we are done. But if we find several roughly equally plausible explanations for D, we may be in for a lengthy and complex process of considering the alternatives, a process quite unlike an immediate inferential "leap".
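The search procedure just described can be sketched in code. This is my own illustration under stated assumptions: the numeric plausibility values and the particular ladder of thresholds are hypothetical; the text specifies only the strategy (consider plausible candidates first, relax the standard if none are found, leap if exactly one qualifies, deliberate if several do).

```python
# Sketch of the plausibility-threshold search described above.
# Numeric plausibilities and the threshold ladder are my assumptions.

def explain(datum, candidates):
    """Cast about for plausible explanations of a datum, lowering the
    standard of plausibility step by step if none are found."""
    for threshold in (0.7, 0.5, 0.3, 0.1):  # progressively lower standards
        plausible = [h for h, p in candidates.items() if p >= threshold]
        if len(plausible) == 1:
            return plausible[0]           # a single candidate: one leap
        if len(plausible) > 1:
            return deliberate(plausible)  # weigh the rivals at length
    return None  # nothing plausible at any standard; suspend judgment

def deliberate(hypotheses):
    # Stand-in for the "lengthy and complex process" of comparing
    # roughly equally plausible alternatives.
    return sorted(hypotheses)[0]

print(explain("I seem to see a tree", {"tree": 0.8, "hologram": 0.2}))
```

At the first threshold only one candidate qualifies, so the inference is an immediate "leap"; had two qualified, control would pass to the slower deliberative comparison instead.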

The deliberation that enters into making the judgment that one particular explanation is the best may be more or less conscious. When only the conclusion is conscious and not the underlying process, we arrive at a "perception". When we consciously follow and assent to the process, we arrive at a considered judgment. Sometimes we are unable to articulate any of the process by which we infer; under the circumstances let us say we have encountered an inarticulable inference.

Suppose we are challenged by a skeptic to produce chains of justifications. We defend some claim by appealing to an explanatory inference, are asked to justify the data upon which the justification rests, and appeal to further explanatory inferences and a further level of evidence. Along some lines at least we run up against raw perceptual seemings. Are they then the explained non-explainers at the foundations of knowledge? No; as we have said, these perceptual seemings are seeings-as of the perceptual apparatus; pattern recognizings; results of inarticulable explanatory inferences.

The perception of this table is a detailed belief about what is directly in front of me, a belief, I contend, which is the result of an inarticulable explanatory inference. This inference proceeds from the stimulation states of my physical senses, as data, to an explanation of those states involving the geometry of my immediate surroundings, and including a physical object resting directly in front of me.

We may justly call this process "inference" because, while it does not have mental premises, it is based upon evidence, and does end in belief. Furthermore, if these raw perceptual seemings aren't the results of inferences, it is hard to see how they could have any epistemic legitimacy.

P: They are the results of the functioning of a reliable mechanism. That's what makes them legitimate.

I: Why is the mechanism reliable? Can this reliability be explained somehow without saying that the mechanism does something that amounts to inferring?

P: I don't know how to explain it, but I don't have to. It's just reliable, that's all. Empirically. It is this reliability of our sensory mechanisms that confers the legitimacy to the seemings.

I: Well then, it seems to me that raw perceptual seemings are really the results of inferences. I've been thinking long and hard upon this question, and under these conditions my conclusions are pretty reliable. [Cheap response to a cheap response; but the burden of proof is thrown back to the reliabilist.]

The process of perception is a process of inference, explanatory inference even. So the raw perceptual seemings are the results of non-conscious inductive inferences whose conclusions are mental, but whose premises are organismic and non-mental. The real bottom level underlying the perceptions, the explained non-explainers at the empirical base of science, are the stimulations of the senses, the very firings of the sensory nerves.

M. Statement of the Thesis

Thesis. Explanatory inference is an important concept for understanding the logic of science. A logic of inductive inference based upon explanatory inference provides a tool for analyzing scientific inferences; provides a basis for arguing that the objects of perception and the objects posited by successful theories are real things; and provides a perspective from which the traditional problem of induction is less intransigent.

Notes

1. Treatise (Hume [2]), bk 1, part 4, sec. 1, p. 182.

2. Some philosophers have reportedly held that the goals of science are properly only prediction and control, and that understanding is not a goal of the enterprise at all. I am not sure what they are denying, but I would like to suggest that, ceteris paribus, the better driver is one who has some understanding of the workings of the vehicle. He can better predict what will happen when circumstances are unusual.

3. Aristotle, Posterior Analytics, Book I, ch. 13 (Aristotle [1]).

4. "The joy of research is not the accumulation of facts but the search for the understanding of the world and the development of laws and principles that tie together facts and explain their meanings," Ernst Mayr, evolutionary biologist, as reported in Science News, March 31, 1979, p. 219 (Science News [1]).

5. Plato [1], as quoted by Richard Jeffrey in Jeffrey [1], p. 551.

6. It seems to me that Descartes' "clear and distinct idea" started out as meaning something like 'the intuition of deductive validity'. He was a mathematician, remember. Later on, I suspect, the meaning illegitimately slid to something like 'logically consistent'.

7. Popper holds a similar view. See Popper [1] and [2].

8. Suggested by W. Lycan, though he denies it.

9. The importance of this IP was first made clear to me by Gilbert H. Harman's paper "The Inference to the Best Explanation", Harman [1].

10. This rules out accepting a seriously defective explanation, even if it is the best one available. For example, if I recall correctly, the orthodox reconstruction of the assassination of President John F. Kennedy alleges that Lee Harvey Oswald fired three shots from his carbine within a period of two and a half seconds, one bullet causing a total of three wounds: two in Governor Connally and one in the president. This may be the best explanation we have, but if so it strains credibility too far for it to be accepted.

11. For example, if there is an easy experiment that promises to help decide the question, one should suspend judgment until the experiment has been performed.

12. An inductively rational being, an ideal scientist, or a decent inductive robot will keep track of any special qualifications its conclusions may have, e.g., "unless of course I'm completely wrong about what Fronto did say . . ." or "unless Spock's tricorder was acting up again". These linkages are part of the structure of the doxa.

13. Russell [2], p. 24ff.

14. See, for example, Tanimoto and Klinger [1].

15. Levi [1], p. 4.

Chapter 3

Foundations

Explanatory Inference rather than Inductive Generalization

I would like now to compare the merits of basing an inductive logic on inductive generalization (IG), the putative view of the empiricists, and basing a logic on explanatory inference (EI), "explanationism", my view. It seems to me that explanationism is superior to the view of the empiricists on at least four counts. This chapter will be devoted to arguing this superiority.


A. Theoretical Entities

Inductive generalization (IG) does not permit the conclusions of inferences to refer to entities of types which are unlike those that are observed. By observing frogs, we may draw conclusions about frogs, or even about amphibians or animals in general, but we may not draw conclusions about genes, for example. This is a serious disadvantage.

It has even led some to suggest that in all strictness conclusions can be drawn only about observations, e.g., that induction proceeds from actual observations to potential observations, without ever leaving the sphere of observations.

Before the space age, no one had ever seen the Whole Earth, yet it had been customary for several centuries to use this model (of a roughly spherical Whole Earth) for navigation and foreign policy decisions, among other things. 'The Whole Earth' is (or was) a theoretical entity, rather than an observational entity. Yet it seems silly for our logical scruples to prevent us from drawing conclusions about the Whole Earth just because it is unobserved; or if we are permitted such conclusions, it seems silly to interpret them as being about logical constructions out of sense data, or patches of land, or as being just about navigation charts and sea voyages. The Whole Earth is real, say I, independent of mind. It's a roughly spherical hunk of matter, and we live on its surface. This conclusion cannot be drawn with any number of IGs from sense experience. But it can be drawn with an EI. The best explanation we have for quite a number of things, including the change of altitude of the noonday sun with changes of latitude and the ability to get to the same place by sailing east or west, and including the utility of the notion of 'the Whole Earth', is that the Whole Earth really exists in the way it is notioned.

Explanatory inference does permit conclusions about unobserved types of things. If in explaining a series of bubble chamber tracks a physicist is led to talk about positrons, say, then his conclusion can be taken at face value as being about little invisible entities, rather than being about, say, logical constructions of potential bubble chamber tracks. Physics is permitted to do its work, and logic to do its work, without the one being made to do double duty to make up for the anemia of the other.

So the first superiority of EI logic over IG logic is that EI logic allows for a much more satisfactory treatment of theoretical entities. In fact EI can move from phenomenal experience to the thing experienced, from phenomena to noumena. The noumena are posited to explain the phenomena, and this process is sometimes epistemically legitimate. So explanationism allows for an empiricist's sort of epistemology (one building upon observation and experiment) with a realist's sort of ontology. This cannot be done with inductive generalization.

B. Emergent Certainty

Empiricist epistemology seems as if it does not allow for anything to be more certain than the observations (except maybe tautologies), since everything is built up from the observations by deductive logic and generalizing. 'All goats are smelly' comes out as less certain than any given 'This goat is smelly'; and even 'Most snarks are cottontails' and 'The next observed snark will be a cottontail' come out to be less certain than 'All observed snarks are cottontails'. Inductive generalization is always certainty reducing. Empirical knowledge appears as a pyramid whose base is particular sense perceptions, and where the farther up you go, the more general you get, and the less certain.

Thesis concerning emergent certainty. In practice the conclusions of inductive inferences are often held with more conviction than any of the individual pieces of data that go to support them; this is true of the inferential practices of individual humans and of the whole scientific community; moreover this is as it should be, i.e., we are sometimes epistemically entitled (and indeed sometimes compelled) to hold with great conviction conclusions which are based on bits of evidence which we are not entitled to hold with such a high degree of conviction. Let us call this phenomenon emergent certainty.

There are two subtheses. The first concerns the analysis of actual inferential practices, and asserts that emergent certainty is an occurrent phenomenon. The second concerns the analysis of ideal practice, and asserts that at least some of these actual practices are epistemically legitimate.

Arguments for the first subthesis:

a. Separate inductions can sometimes converge on a single conclusion and mutually support it much more strongly than when convergence does not occur. This phenomenon is called "the consilience of inductions" by William Whewell.1 An example of this occurred, I think, when the charge/mass ratio of the electron was independently predicted to increase with velocity by Einstein's Special Theory, and "found empirically" (i.e., was an interpretation of an experiment) by Lorentz.

b. A curve fit to a series of data points may hit none of them, and at least usually, will not hit all of them. Yet the scientist will sometimes trust the point on the curve over the measured point.2

c. Patterns emerge from individual points where no single point is essential to recognizing the pattern.

d. A signal extracted and reconstructed from a noisy channel may lead to a message, the wording of which, or even more the intent of which, is more certain than any part of the received message. This is especially likely if the message has a high degree of redundancy.

e. There are historical cases where observations are subsequently thrown out or revised. (Empiricism cannot easily accommodate this.) For example, practically everyone agrees that the sun appears larger when it is near the horizon. The camera doesn't see it this way however; the angular diameter remains the same near the horizon and overhead. What has to go is the naked eye observations.

f. The identification of the face, as a face, is more certain than the identification of any of the parts.3
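Point (d) can be made concrete with a toy calculation of my own (not from the text): when the same message is received several times over a noisy channel, a character-by-character majority vote can recover the whole message even though every individual copy is corrupted, so the reconstruction is more certain than any single piece of the evidence.

```python
# Toy redundancy example: majority vote over noisy received copies.
from collections import Counter

def majority_vote(copies):
    """Reconstruct a message character by character, taking the most
    common symbol at each position across the received copies."""
    return "".join(Counter(chars).most_common(1)[0][0]
                   for chars in zip(*copies))

# Five noisy copies of one transmission; each copy contains an error,
# yet no error is shared by a majority at any single position.
received = ["SEND HELX", "SXND HELP", "SEND XELP", "SEND HELP", "SENX HELP"]
print(majority_vote(received))  # recovers "SEND HELP"
```

No single copy is trustworthy, but the reconstructed message is; this is emergent certainty in miniature.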

Arguments for the second subthesis (the legitimacy of inferences involving emergent certainty):

a. If empirical science with strong theoretical conclusions is possible, then there must be something like emergent certainty at work. Any single observation or experiment might be erroneous in any of a number of ways (illusion, deception, misinterpretation), but a strong theory should be able to survive the collapse of any single observation or experiment.

b. If we can find cases, either made up or from the history of science, where our intuitions agree that the inferences in question involving emergent certainty were indeed legitimate under the circumstances, then these will constitute arguments for the subthesis. Perhaps some of the examples suggested under the first subthesis will do.

c. We have presumably already agreed to the legitimacy of inferences to the best explanation. These sometimes seem to exhibit emergent certainty. For example the conclusion that other people have minds too seems to be justified by an explanatory inference which is "broad but shallow", i.e., is based upon a large number of observations of what appears to be intelligent behavior, where any given observation, it seems, could easily be a misinterpretation. Nevertheless, because there are so many observations, of so many kinds, the conclusion seems pretty certain.

Thus we have found a second weakness of inductive generalization: IG cannot support emergent certainty, because the conclusions of these generalizations are less certain than their premises.

In contrast, the conclusions of explanatory inferences are sometimes more certain than their premises. For example, suppose that the same event is independently alleged to have taken place by a number of different witnesses. Then their collective testimony is surely much stronger, and justifies a higher degree of certainty, than the testimony of any single witness. It is difficult to see how the inference to the truth of the allegations can be analyzed other than as an explanatory inference. (More coming about witness believing as EI in the next section.)

If this is indeed a correct analysis, then EI can support emergent certainty.
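The arithmetic behind the witness example can be sketched with a simple independence model. This is my own illustration, and a deliberate simplification: the error rate of 0.2 per witness is hypothetical, and the assumption that witnesses err independently stands in for the fuller explanatory analysis of testimony given above.

```python
# Hypothetical numbers: each independent witness errs with probability
# 0.2, so any single testimony is only 80% reliable on its own.

def confidence(n_witnesses, p_wrong=0.2):
    """Confidence that the event occurred, modeled as the chance that
    not every witness erred together (independence assumed)."""
    return 1 - p_wrong ** n_witnesses

for n in (1, 3, 5):
    print(n, "witness(es): confidence", confidence(n))
```

With one witness the conclusion is no more certain than the testimony (0.8); with five independent witnesses the chance that all erred together has shrunk to 0.2^5, and the conclusion is far more certain than any single premise, which is just the emergent certainty claimed for EI.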

The second reason, then, for preferring explanatory inference over inductive generalization, is that inductive generalizations are always certainty reducing, while explanatory inferences are sometimes certainty enhancing. This ability to enhance certainty is in conformity with actual practice, and leads to a more robust science.

C. Absorption and Insightfulness

The third reason for preferring explanatory inference is that explanatory inference can absorb inductive generalization, but this relationship is not symmetric; i.e., every (intuitively warranted) inductive inference describable as IG can also be described as an instance of EI, but there are (intuitively warranted) cases of EI which cannot be described as IG.

In his paper "The Inference to the Best Explanation",4 Gilbert Harman proposes the view that the IBE is "the basic form of nondeductive inference".5 In support he argues that "all warranted inferences which may be described as instances of enumerative induction must also be described as instances of the inference to the best explanation". "Enumerative induction" he describes as inferring "from observed regularity to universal regularity or at least regularity in the next instance",7 so we may identify Harman's "enumerative induction" with what we have been calling "inductive generalization".

Harman provides four examples of inferences that can be easily seen to be examples of IBE, but where it is difficult or impossible to see them as instances of IG.

His examples are:

i. When a detective puts the evidence together and decides that it must have been the butler, he is reasoning that no other explanation which accounts for all the facts is plausible enough or simple enough to be accepted.

ii. When a scientist infers the existence of atoms and sub-atomic particles, he is inferring the truth of an explanation for various data which he wishes to account for.

iii. When we infer that a witness is telling the truth, our inference goes as follows: (i) we infer that he says what he does because he actually believes it; (ii) we infer that he believes what he does because he actually did witness the situation which he describes. That is, our confidence in his testimony is based on our conclusion about the most plausible explanation for that testimony. Our confidence fails if we come to think that there is some other plausible explanation for his testimony (if, for example, he stands to gain a great deal from our believing him).

iv. When we infer from a person's behavior to some fact about his mental experience, we are inferring that the latter fact explains better than some other explanation what he does.

I find Harman's examples to be persuasive, and this in spite of the fact that there are obviously instances of IG in the unmentioned but essential background evidence supporting each inference. I find example i, the detective, to be persuasive, not because this inference cannot be reconstructed as a large number of IG's, but because I do not see how such a reconstruction can capture the intuition that the conclusion is warranted (it must be the butler) only when "all the pieces fit together", i.e., only when this conclusion is seen as part of any intellectually satisfying (i.e., plausible) reconstruction of the events surrounding the crime. 'Fit together' does not seem to be a concept analyzable by inductive generalization.

Harman's second example, and the principle it illustrates, I find to be even more persuasive. There do seem to be warranted inferences to entities and events quite unlike any that are observed. We discussed this matter before as the superiority of explanatory inference with respect to its ability to handle theoretical entities.

There is no way that any number of IG's could justify claims about sub-atomic particle interactions. If conclusions about these particle interactions are warranted, and I assume that at least some are, and if there is a basic form of warranted inductive inference, then it cannot be IG.

With the third example Harman presents a strikingly good analysis of testimonial evidence. This, then, is our fourth reason for preferring explanatory inference:

Explanatory inference supports insightful analyses of inductive inferences.

Together, Harman's four examples seem to succeed in establishing the point that IBE cannot be absorbed into IG.

Harman's argument proceeds. He next argues that "inferences which appear to be applications of enumerative induction are better described as instances of the inference to the best explanation". The description in terms of IBE is better in two ways: it makes it clearer when the inferences are warranted, and it makes it clearer when the inferences are knowledge producing. The problems of analyzing knowledge claims are really beyond the scope of this dissertation, so I will stick to evaluating Harman's claim that describing an inference as an IBE is more useful in determining whether or not it is warranted than describing it as an IG.

This part of his paper suffers from a defect noted by Robert Ennis. (Only one defect, I think. We will deal with Ennis' proposed counterexamples shortly.) This defect is in Harman's attempt to analyze inferences of the form:

All observed A's are B's

The next observed A will be B.

The problem is that the conclusion does not explain anything at all, much less the observed regularity. Now this would be a pretty serious charge, challenging the whole possibility of treating warranted induction as IBE, if it were not for the fact that the trouble is pretty easily repaired.

We simply insert the intermediate step 'All A's are B's' (or maybe 'Generally A's are B's'), and treat the argument, quite naturally, I think, as an IG step plus a deductive step (in particular, a prediction). There is one problem with this reconstruction which we will face shortly. So now, everything rests on the analysis of inferences of the form:

All observed A's are B's.

All A's are B's.

Whether these inferences can always be construed as IBE or not leads us to consider Ennis' challenge.

Ennis presents three examples designed to show that IG cannot in general be absorbed by IBE.

i. Inferences from observed regularity to universal regularity.

G: 'Whenever an MD says that a child is about to come down with measles, that child soon comes down with measles'.

G is inferred from observations of a large number of such coincidences. The inferer knows that doctors study such things and is not surprised.

ii. Inferences to regularity in the next instance.

Same situation as i, but what is inferred is: 'The next time that an MD says a child is coming down with measles, that child will come down with measles soon thereafter.'

iii. Inference to the second half of an instance of a regularity on the occasion of the occurrence of the first half.

Inferring 'The air speed of the airplane will lower' from the observation of the regularity between pulling back on the control stick and the lowering of the air speed, together with the knowledge that the control stick has just been pulled back.

I propose to treat Ennis' examples somewhat differently than Harman does in his reply to Ennis. First, I insist (just for now) that predictive conclusions, as in examples ii and iii, be analyzed as being deductive consequences of universal propositions of the form 'All A's are B's'. This completely eliminates example ii, and we are left with trying to view G and 'Whenever the control stick is pulled back, the air speed lowers' as results of inferring to the best explanation. I do not think that this is very difficult, but before we get into it I wish to consider another example that I hope will clarify things.

Suppose that on the only two occasions when Konrad ate pizza at Guido's Pizza Shop, he had diarrhea the next morning. In general, Konrad has diarrhea occasionally, but not frequently. What may we conclude about the relationship between the pizza and the diarrhea? What may we predict about the outcome of Konrad's next visit to Guido's? Nothing, I maintain. There is not a large enough sample.

Now suppose that Konrad continues patronizing Guido's, and that after every one of seventy-nine subsequent trips he has diarrhea within twelve hours. What may we conclude about the relationship between Guido's pizza and Konrad's diarrhea? That Guido's pizza makes Konrad have diarrhea. And we may predict as the outcome of Konrad's next visit that he will have diarrhea afterwards.

It seems to me that the best way to explain what is going on in this example is by way of IBE. After Konrad's first two visits, we could not conclude anything because we did not have enough evidence to distinguish between the two competing hypotheses:

i. The eating pizza-diarrhea correlation was accidental, i.e., merely coincidental or spurious. (Say, for example, on the first visit the diarrhea was caused by a virus contracted elsewhere, and on the second visit by an argument with his mother.)

ii. There is some connection between eating pizza and the subsequent diarrhea, i.e., there is some reason why he gets diarrhea after eating the pizza. (E.g., Konrad is allergic to the snake oil in Guido's Special Sauce.)

By the time we have noted the outcome of Konrad's seventy-ninth visit, we are able to decide in favor of the second hypothesis. The best explanation of the correlation has become the second. It becomes best because explaining it as accidental becomes less and less plausible the longer the association goes on.
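The way the chance hypothesis loses ground in the Konrad example can be put in toy likelihood terms. The rates below are my own illustrative numbers, not figures from the text: "occasional" diarrhea by chance on any given morning (0.1), and near-certain diarrhea if the pizza is really to blame (0.95).

```python
# Toy likelihood comparison for the Konrad example (illustrative rates).

def likelihood_ratio(n_visits, base_rate=0.1, causal_rate=0.95):
    """P(diarrhea after every one of n_visits visits | real connection)
    divided by P(same data | mere coincidence)."""
    return (causal_rate / base_rate) ** n_visits

print(likelihood_ratio(2))   # ~90: not yet enough to rule out coincidence
print(likelihood_ratio(81))  # ~1e79: coincidence is no longer plausible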

This example serves to illustrate and make plausible the general principle that whenever we observe some regularity we have immediately the two competing explanatory hypotheses:

i. The observed regularity is spurious.

ii. The observed regularity is no accident.

(I want to use terms here in such a way that these two come out to be mutually exclusive and jointly exhaustive.)

That the observed regularity is no accident serves, I maintain, as a weak explanation of the observed regularity.

This is very close to Harman's view in "Enumerative Induction as Inference to the Best Explanation", though Harman puts the matter as the normality of the general regularity explaining the observations, while I put it as the non-spuriousness of the observed regularity explaining the observations. I take it that "A is normally followed by B" (Harman's generalization) is one sort of situation that can lead to an observed regularity which is non-spurious. There are others, for example: 'All A's are B's' or 'Somebody around here is making A's into B's'.

What I mean by an "accidental" or "spurious" observed regularity is something like 'thinking that a signal has been received but it's really just noise' or 'what was unexpected and surprising was so merely as the result of a statistical fluctuation'.

That the observed regularity is non-accidental competes only with explanations that imply the accidental nature of the regularity. (If such things can be called explanations at all; they do provide some information, and some degree of understanding, so let us count them as explanations. Thus, saying that something is "due to chance" counts for us as an explanation of that thing.)

And the observed regularity being non-accidental is otherwise compatible with any additional explanation of the regularity. Thus, for example, under appropriate circumstances 'A's are normally B's' serves as the best explanation of 'All observed A's are B's' by out-competing (among other things) 'A and B are unrelated and the observed association is spurious'.

Let us now return to Ennis' examples of inductive generalizations alleged to be impossible to absorb into IBE. I will treat the first one; the others may be treated similarly but with the addition of one more step, the deduction of a prediction. Consider the inference to:

'Whenever an MD says that a child is about to come down with the measles, the child soon comes down with the measles.'

The inference is from observed regularity. Is this an inference to the best of competing explanations? (Supposing, contrary to reason, that the conclusion is true.) Yes, the statement of universal regularity explains the observed regularity. Does it explain the observed regularity better than, say, an explanation in terms of how physicians are trained? It does not matter; these explanations do not compete; they are completely compatible (or rather, would be compatible if physicians could be trained to complete reliability). Does it explain the observed regularity better than the hypothesis that it was due to chance? Yes. If the regularity has gone on for some time, the hypothesis that it is due to chance posits too many coincidences to be plausible.

Poirot said: "That is quite possible. I am always prepared to admit one coincidence."

Agatha Christie, A Holiday for Murder

Ennis' main objection here is based on his refusal to accept the view that 'All A's are B's' is an explanation of 'All observed A's are B's'. He says,

the universalization 'All A's are B's' does not explain why S is a B, nor why, when S was an A, S was a B. Similarly, it does not explain why being an A and being a B were associated at other times. Thus, it does not explain why A and B are regularly associated. This universalization does not account for the evidence advanced on its behalf. I can see nothing else that might be reasonably offered as material to be explained by the inductive conclusion. This is thus a case of legitimate enumerative inductive inference that is not an inference to a best explanation.

To answer Ennis' objection it will be necessary to distinguish between the event of discovery of some fact and the fact discovered. I agree with Ennis that 'All A's are B's' does not explain any of the following:

why a1 is a B, why a2 is a B, . . . , why an is a B, . . . ; why, when S was an A, it was a B; why A's have been regularly associated with B's in the past.

But I claim that it does explain:

why a1 was found to be a B, why a2 was found to be a B, . . . ; why, when S was observed to be an A, it was also observed to be a B; why A's have been repeatedly observed to be B's in the past.

The universalization explains the events of discovery, not the facts discovered. 'All A's are B's' does explain why 'All observed A's are B's', not by explaining why the A's that were observed were B's rather than, say, C's, but by explaining why I couldn't find any A's that were C's even though I looked. (I couldn't find any because there weren't any.) The universalization does not explain its instances; it explains the character of the observations.

Perhaps a concrete example will make my claim clearer.

Suppose I choose a ball at random (arbitrarily) from a large hat containing colored balls. The ball I choose is red. Does the fact that all of the balls in the hat are red explain why this particular ball is red? No, but it does explain why, when I chose a ball at random, it turned out to be a red one.

I am really making two claims here. With the first I am agreeing with Ennis, and disagreeing with the Covering-law or Subsumption Model of Explanation, in claiming that a general statement does not explain its instances. I am convinced that this is correct; but even if it is not, and consequently 'All A's are B's' does explain why these particular A's are B's, then it explains why 'All observed A's are B's'. And so, even if my claim is not correct, it comes out that IG can be absorbed by IBE, which is the main point that I am arguing.

Still, pursuing the matter anyway, 'All A's are B's' can't explain why 'this A is a B', because it does not say anything at all about how its being an A is connected with its being a B.

He: Why is this A a B?

She: They all are.

He: That's interesting. OK then, why is it that all A's are B's?

The information that "they all are" doesn't really tell me anything about why this one is, except that it suggests that if I want to know why 'this one is', I would do well to figure out why "they all are".

My second claim is that a generalization does explain something about the observations of its instances. That the (cloudless, daytime) sky is blue explains why, when I look up, I see the sky to be blue. The truth of 'Theodore reads ethics books a lot' helps to explain why, so often when I have seen him, he has been reading an ethics book (but it doesn't explain why he was reading ethics books on those occasions). In its most general form the claim is that the statistics for the whole population, together with the method for drawing a sample, explain the statistics of the sample. In particular, 'A's are usually B's' together with 'this sample of A's was drawn without regard to whether or not they were B's' explain why the A's that were drawn were mostly B's.

I seem to find no way of arguing for this claim, other than by describing examples and appealing to the intuitions of my readers.

Returning to the main point, we find that Ennis is apparently mistaken, and 'All A's are B's' does explain why 'All observed A's are B's'; and so the inference from observed regularity to universal regularity does not represent an inference form that cannot be absorbed by IBE.

So let us suppose that Ennis' challenge has been dealt with, and that it is possible to treat every warranted inference that appears to be an instance of IG as an instance of IBE. Let us now consider Harman's claim that viewing an inductive inference as IBE is more useful in determining whether or not it is warranted than describing it as IG. This is a return to considering our fourth reason for preferring explanatory inference, namely that it supports insightful analyses.

In particular Harman suggests that the inference (cast as an IG):

All observed A's are B's

All A's are B's

is warranted,

whenever the hypothesis that all A's are B's is (in the light of all the evidence) a better, simpler, more plausible (and so forth) hypothesis than is the hypothesis, say, that someone is biasing the observed sample in order to make us think that all A's are B's. On the other hand, as soon as the total evidence makes some other, competing hypothesis plausible, one may not infer from the past correlation in the observed sample to a complete correlation in the total population.

Harman seems to be doing fine here. I might add that if we insist on thinking of induction as IG, we are at a loss to explain why the inference from the observed past correlation to a complete correlation in the whole population is made stronger or more warranted if, in collecting our observations, we make a systematic search for counterinstances and cannot find any, than it would be if we just take the observations passively. That is, the generalization is made stronger by making an effort to examine a wide variety of types of A's. But from an explanatory inference point of view, we can say that the inference is made stronger because the failure of the active search for counterinstances rules out various hypotheses about how our sample might be biased.

I would like very much to conclude this section here with the claim that every inductive argument can be construed as an instance of IBE; and further, that to construe them this way looks very promising as an approach to constructing a complete theory of inductive inference. But there is a problem area that I have been avoiding up until now, and which it is time to face.

The problem arises when we consider inferences to predictions. Until now I have been reconstructing predictions as deductive inferences from universal premises that are arrived at by IBE. But this is inadequate. Inferences to predictions from statements of less than universal regularity do not appear to be deductions. The problem is made acute by noticing that the inference pattern

observations

at least generally A's are B's

the next A will be a B

is stronger or more warranted than the alternative:

observations

All A's are B's

the next A will be a B.

The first is stronger in that it establishes its conclusion with more certainty by hedging its middle step, and thus making it more plausible or probable. But the first pattern does not include a deductive second inference, while the second pattern does. So it seems that perhaps some of our most warranted inferences to predictions do not rely on deduction for the final predictive step.

Our problems would be solved if we could find a way of integrating statistical syllogism and the like into the explanatory inference approach. Statistical syllogisms have forms like:

m/n of the A's are B's (where m/n > 1/2)

The next A will be a B

and

m/n of the A's are B's

Approximately m/n of the A's in this sample will be B's.

Some related forms are:

Generally A's are B's

S is an A

S is a B

and

Your typical, normal X does T

This X will do T.

In general they are inferences from whole populations to samples or to particular instances. They are not deductions, and they do not seem to be explanatory inferences.

One approach to analyzing these inferences, Harman's approach in Harman [2], is to construe inductive inferences as inferring to total explanatory accounts, rather than just to explanatory hypotheses. Somehow this means that we are to infer to the best explanans-explanandum pair, instead of just inferring to the best explanans. Harman:

I claim that in inductive inference one may directly infer by one step of inference only a conclusion of the form A explains B. . . . one may use induction to infer A explains B and then use deduction to infer A. Or one may first infer A explains B and then infer B.

This way we perceive a competition for the title of "best explanation" (as best pair) to occur when two explanans-explanandum pairs agree on the explanans but differ on the explanandum. Harman:

Sometimes one infers an explanation of something already accepted (as when one infers that a person says what he says because he believes it); but sometimes one infers that something already accepted explains something else (as when one infers that a person's present intention to do something will explain his later doing it).

By this approach predictive inferences are made out to be species of inferences to the best explanation. For this approach to succeed it must be the case that every (warranted) predictive inference is either a deduction, or what amounts to an explanans-to-explanandum inference.

I agree that sometimes the thing predicted is also explained, but I contend that the predictive forms related to statistical syllogism are in general non-deductive and non-explanatory as well. Again, a generalization does not explain its instances, though it does predict them.

More arguments that a generalization doesn't explain its instances: if explanations must give "causes" (in some broad sense), then it is hard to see how a generalization could explain its instances, as it seems not in any way to cause them. What it all comes down to is this: 'generally A's are B's' doesn't at all help me to understand why 'this particular A is a B', though it does tell me something about where an explanation is to be sought; namely, in whatever explains the whole class of A's being so often B's.

Q: Why is this ripe apple red?

A: Most ripe apples are red.

The answer does not provide an explanation for 'this ripe apple is red', but it would provide a basis for predicting it.

If we agree that a generalization does not explain its instances, then we must reject Harman's analysis of predictive inferences.

Another alternative for dealing with predictive inferences is to reconstruct them somehow as deductive inferences. Perhaps this can be done by building into the meanings of the probability words of a theory the license to make the relevant predictions. Yet the fact remains that in any inference of the form:

P has high probability

P

whatever the probability assertion is taken to mean, it would have to allow for the possibility of the conclusion being false while the premise is true. This being so, such an inference could not be a deductive one.

Somehow it has to come out that the rationality of holding to a prediction from a hypothesis is derived from the rationality of accepting the hypothesis in the first place; we commit ourselves to the appropriate prediction as we accept the hypothesis. This is the way it works with hypotheses of universal form: when we accept 'All A's are B's' we are committed to accept 'The next A will be a B' and 'Any sample of A's will consist entirely of B's'. We need a way of seeing things whereby accepting 'Generally A's are B's' commits one, under the appropriate circumstances, and with appropriate confidence, to expecting 'The next A will be a B' and expecting that 'A sample of A's, chosen in such and such a way, will consist mostly of B's'.

I think that the correct approach is to construe the inference to the prediction as a sort of practical inference. When I accept 'Generally A's are B's', I am committed, when I want to decide between the predictions 'The next A will be a B' and 'The next A will be a non-B', to choose the first. Sometimes I don't have to decide, or don't want to, but when I do want to generate expectations, then I try to conform my expectations to the way things will be (perhaps factoring into my considerations the relative costs of being wrong and the benefits of being right). My best estimates of the way things will be are provided by the best theories I have of the way things timelessly are. If my best theory has it that 'Generally A's are B's', then I expect the next A to be a B (providing that I do not think this situation to be untypical), because I think I'll be right more often than wrong by doing things this way. I conform my expectations to what I think are the frequencies in the whole (timeless) population. If two-thirds of the A's are B's, then if I guess "B", I'll be right two-thirds of the time.

I think that the inference goes something like this:

1. 2/3 of all A's are B's. (From an explanatory inference, say)

2. For x's that are A's, if I assert 'x is a B' I will be correct 2/3 of the time. (From 1)

3. Ceteris paribus, adopt inductive procedures that will be correct most of the time. (Practical premise)

4. x is an A. (From observation, say)

5. x is a B. (From 2, 3, 4)
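The practical premise in step 3 can be checked by simulation. The sketch below is my own illustration, assuming only the 2/3 frequency of step 1; the comparison with a "probability-matching" guesser, the seed, and the trial count are my additions.

```python
import random

# Simulation of the practical premise: if 2/3 of the A's are B's,
# always asserting "x is a B" is correct about 2/3 of the time, and it
# beats probability matching (guessing "B" only 2/3 of the time,
# which is right (2/3)^2 + (1/3)^2 = 5/9 of the time).

random.seed(0)
population = ["B"] * 2 + ["non-B"] * 1   # 2/3 of the A's are B's

def accuracy(strategy, trials=100_000):
    hits = 0
    for _ in range(trials):
        x = random.choice(population)    # the next A, drawn at random
        hits += (strategy() == x)
    return hits / trials

def always_b():
    return "B"

def match_freq():
    return random.choice(population)     # guess "B" with probability 2/3

print(accuracy(always_b))    # ~0.667
print(accuracy(match_freq))  # ~0.556
```

The simulation supports the claim in the text: conforming one's expectation to the majority of the population is the strategy that is "right more often than wrong", and indeed more often right than any hedged mixture of guesses.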

Thus, statistical syllogisms and their kin depend on our courage, our willingness to risk being wrong (though sometimes the risk is not very great). The justification for a particular prediction uses a previously accepted generalization as a premise, and amounts to adopting a strategy that, if the generalization is correct, will yield correct predictions often enough for our present purposes.

Predictions, then, are on my view neither inferences to the best explanation, nor are they always deductions. They are sometimes non-deductive inferences which are not instances of inference to the best explanation. This is contrary to Harman's view that all non-deductive inferences are IBE. On my view non-deductive inferences are of two types: inferences to the best explanation (explanatory inferences) and predictions. Nevertheless, explanatory inference can absorb inductive generalization. Inferences like:

All observed A's are B's

At least most A's are B's

can be made out to be inferences to the best explanation.

These are paradigms of inductive generalizations. On the other hand, inferences like:

All observed A's are B's

The next A will be a B

(let us call these "projections", rather than "inductive generalizations") are reconstructed as consisting of an inference of the first kind, plus a predictive inference of the form:

At least most A's are B's

The next A will be a B.

These predictive inferences are on my view rationally justifiable as being the best available response to the desire of arriving at an expectation. They amount to adopting a reasoning strategy that, while it does not always yield a correct conclusion, does demonstrably yield correct ones more often than incorrect ones (supposing the generalization to be true).

So it seems as if a logic based upon explanatory inference is not a complete inductive logic; it is not able to handle predictions. This being so, Explanatory Inference Logic will be limited to being a logic of theory acceptance, and requires supplementation by a more or less separate logic of predictive inference. Predictive Inference Logic is to be concerned with the extraction of the implications of the theories we already accept, and is, for some theories at least, to be modeled along the lines of classical sampling theory.
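What a sampling-theoretic Predictive Inference Logic might look like can be sketched with a small computation: from 'm/n of the A's are B's', it tells us how confidently to expect a random sample to be "approximately m/n" B's. The binomial model (sampling with replacement), the 2/3 figure, and the ten-percentage-point tolerance are my illustrative choices, not the author's.

```python
import math

# From a population statistic to a sample prediction: the probability
# that a random sample's proportion of B's falls within a tolerance of
# the population proportion p, under a binomial sampling model.

def prob_sample_near(p, sample_size, tol=0.10):
    """P(sample proportion of B's is within tol of p)."""
    lo = math.ceil((p - tol) * sample_size)
    hi = math.floor((p + tol) * sample_size)
    return sum(math.comb(sample_size, k) * p**k * (1 - p)**(sample_size - k)
               for k in range(max(lo, 0), min(hi, sample_size) + 1))

# The larger the sample, the safer the prediction licensed by
# 'two-thirds of the A's are B's':
for n in (10, 50, 200):
    print(n, round(prob_sample_near(2/3, n), 3))
```

This is the sense in which predictive inference extracts implications of an already accepted theory: the generalization plus classical sampling theory fixes, with computable confidence, what to expect of the sample.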

Thus we find that explanatory inference absorbs inductive generalization, leaving only the predictive aspect of induction to be handled separately.

Notes

1. See, for example, Laudan [1], p. 372.

2. There is a brief but clear discussion of this in Freedman, Pisani, Purves [1], p. 195.

3. After an illustration from Norman [1].

4. Harman [1].

5. Ibid., p. 88.

6. Ibid., p. 88.

7. Ibid., p. 88.

8. Ibid., p. 89.

9. Ennis [1].

10. Harman [2].

11. Harman [2], p. 532.

12. Christie [1].

13. Ennis [1], pp. 526-527.

14. Harman [1], p. 91.

15. It could be made stronger yet by hedging the temporal extent of the middle step: 'At least generally A's are B's, at least for the recent past and immediate future'.

16. Taking the conclusion of an explanatory inference to be a "total explanation account" fits very nicely with the A-B perspective discussed in the last chapter, and I embrace the general approach. But Harman's interpretation of it is unacceptable.

17. Harman [2], p. 530.

18. Ibid., p. 530.


19. Though, 'whenever p, then q' could be construed as a proposal for an explanation of q0, namely, 'because of some p0'.

20. If two-thirds of the A's are B's, then I'll be right two-thirds of the time for guessing "B", unless the procedure by which A's are chosen is "biased", and furthermore, doesn't exhaust all of the A's. That is, we have the three cases:

Case #1— The A's are finite and are examined one at a time; never reexamined. Then a guess of "B" will be correct in two-thirds of the instances that potentially occur before all of the A's are examined, i.e., correct in two-thirds of the possible outcomes. This is analogous to draws without replacement.

Case #2— A's finite. Situation as in draws with replacement. See Case #3, since draws with replacement can potentially go on ad infinitum.

Case #3— A's infinite (or at least conceptually so). Then the meaning of 'two-thirds of the A's are B's' is not obvious unless some sort of definition is made. We need to define things so that in some sense we would be correct two-thirds of the time for guessing "B".

21. This procedure is like Pascal's Wager in that it involves taking a risk of being wrong, and involves (sometimes) considering the benefits of success and the costs of error. But it is unlike PW in that PW ignores the probabilities of success and error, but that is precisely what this procedure is all about. Also our procedure requires courage where PW turns on cowardice.

Chapter 4

The Problem of Induction

A. Where is the Problem?

Wesley Salmon summarizes Hume's argument this way:

Take any inference whatsoever. It must be deductive or nondemonstrative. Suppose it is nondemonstrative. If we could justify it deductively it would cease to be nondemon­ strative. To justify it nondemonstratively would presuppose an already justified type of nondemonstrative inference, which is precisely the problem at issue.

and then:

Hume's argument does not break down when we consider forms more complex than simple enumeration. Although the word "induction" is sometimes used as a synonym for "induction by simple enumeration", I am not using it that way. Any type of correct ampliative inference is induction; the problem of induction is to show that some particular form of ampliative inference is justifiable.

In one way it is not surprising for induction to be justifiable only circularly. Any logical principle taken to be fundamental is in this position. Consider modus ponens: any justification of it must be either deductive or inductive. But any inductive justification would not do justice to the certainty of the conclusion. So a justification would have to be deductive. Since (we may suppose) modus ponens is deductively fundamental, any such justification must presuppose modus ponens, and the justification would thus be circular. Modus ponens seems no better off here than inductive generalization or any other inductive principle.

We seem to have established the truism that it is impossible to non-circularly justify our first principles of justification. This is no great thing. So where is the problem in the problem of induction?

One piece of the problem that remains is this: it seems that we can explain where deductive arguments get their force. The information in the conclusion of a deductive argument is somehow contained in the premises; the argument is extractive of this information; it does not assert anything that goes beyond what is already asserted, perhaps only implicitly, by the premises. Thus, even though we may not be able to justify our deductive procedures by appealing to anything more fundamental, at least we can explain why deduction works. But inductive arguments are ampliative; the conclusion of an inductive argument asserts things which are not asserted by the premises, even implicitly. If it didn't do so it would be deductive. The question is: Where do inductive arguments get their force?

The status of induction is a bit like that of Euclid's fifth postulate, the parallel postulate. For centuries this axiom seemed more puzzling, less obvious, than the other axioms. Numerous attempts were made to prove it; all failed. Finally, the development of consistent non-Euclidean geometries in the nineteenth century, these denying the parallel postulate, showed that it was logically independent of the rest. Hume's argument asserts the logical independence of induction from deduction when it asserts that a deductive justification for induction is impossible. (Are there then a number of consistent systems of induction?)

But if a deductive justification is impossible, and an inductive justification would be circular, what then? Are we supposed to do without induction, as Popper thinks? Or are we supposed to just pick our basic principles of induction completely irrationally? You decide for inductive generalization, I choose IBE, she picks faith in her magic pebble, and who's to say who is the wiser?

So here is another remaining piece of the problem of induction: How are we to decide which inductive procedures to adopt? Can we rule out obviously irrational ones like 'the inference to the worst explanation' and Salmon's "counterinductive rule"?2

Finally, it has not really been demonstrated that a deductive justification of induction is impossible. True, a deductive justification is impossible along the lines of accepting the premises of an inductive argument, and then proving that the conclusion must be true. If we could do that, it would indeed turn the inductive argument into a deductive one. But there are other strategies for deductively justifying induction that can't be declared to be impossible quite so readily. Maybe we can accept a deductive (viz., mathematical) theory of 'probability' and use that to justify certain inductive forms as leading to conclusions which are provably probable. That would leave us with the need to establish some connection between the mathematical theory and the world we live in, in such a way that conclusions which are "probable" can reasonably be expected with some regularity to turn out to be true. This is the strategy we followed in Chapter 3 to justify inductive prediction.

Another promising strategy for deductively justifying induction is the pragmatic approach. We pick some inductive procedure and argue in some way that we have everything to gain and nothing to lose by adopting this procedure. This was the thrust of the justification for 'maximize explanatory coherence' that was given in Chapter 2. This is also the strategy of Reichenbach's "vindication of induction".

Answers of the pragmatic type, originally offered by Peirce but independently elaborated with great resourcefulness by Hans Reichenbach, are among the most original modern contributions to the subject.3
(Max Black)

B. Reichenbach's Vindication4 of Induction Improved: Probabilities as Propensities

Reichenbach conceives the fundamental form of inductive inference to be induction by enumeration. This is the form of induction that he attempts to vindicate. He says,

Since the axiomatic construction of the calculus of probability leads to the result that, when the frequency theory is assumed, all probability inferences are reducible to deductive inferences with the addition of induction by enumeration, it followed that all inductive inferences are reducible to induction by enumeration.5

By "the frequency theory" he means that interpretation of the formal calculus of probabilities which identifies a probability with a limit of a frequency within an infinite sequence.6 So let us imagine an infinite sequence of A's, some of which have the characteristic B. Let b_n be the number of B's in the first n A's (n = 1, 2, 3, . . .). Then b_k/k is the relative frequency of B's in A's at the kth A. Now as n gets larger, lim_{n→∞} b_n/n might exist, and it might not. If the limit does exist, say lim_{n→∞} b_n/n = L, we say that the sequence of relative frequencies converges to L. It is this limit L that Reichenbach identifies with the probability of B given A (p(B/A) in our notation). Thus, for Reichenbach, p(B/A) = L = lim_{n→∞} b_n/n.7
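The definition can be illustrated with a short computation (a modern sketch, not part of Reichenbach's own apparatus; the simple alternating sequence is my example):

```python
from fractions import Fraction

def relative_frequencies(seq):
    """Running relative frequencies b_n/n of B's (here: 1's)
    among the first n A's of the sequence."""
    b = 0
    freqs = []
    for n, x in enumerate(seq, start=1):
        b += x
        freqs.append(Fraction(b, n))
    return freqs

# For the alternating sequence 1, 0, 1, 0, ..., the relative
# frequency b_n/n converges to the limit L = 1/2, which Reichenbach
# would identify with p(B/A).
freqs = relative_frequencies([1, 0] * 500)
print(freqs[0], freqs[2], float(freqs[999]))  # 1 2/3 0.5
```

Early frequencies fluctuate (1, 1/2, 2/3, . . .), but the sequence settles toward the limit.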

If we conceive with Reichenbach the mathematical calculus of probabilities as a formal deductive theory of probability transformations, and if we accept for the moment the idea that all inductive inferences are probability inferences, we can see why he might think that the only real inductive inferences are those that establish probabilities. (But to these we may hasten to add those inferences that purport to go from statements of probability to bald assertions without attached probabilities. See Chapter 3.) Anyway, according to Reichenbach the fundamental form of induction to which all others are reducible is "induction by enumeration".

Consistent with the frequency interpretation, he takes induction by enumeration to be the inference from a finite initial segment of a sequence to the value of the limit in the infinitely long run.

. . . the theory of induction by enumeration . . . states that the frequency observed for a finite initial section can be identified with the limit of the frequency. . . .8

According to him the inference has the form of inferring from an initial frequency of b_k/k = j/k to p(B/A) = j/k ± δ. The δ represents a degree of approximation for the inferred value.9

Salmon, however, in his discussion of the inference, leaves off the δ term.10 He then describes the inference his way and then says in a footnote:

The foregoing formulation differs from Reichenbach's in that his formulation mentions a degree of approximation δ, but since no method of specifying δ is given, it seems to me that the rule becomes vacuous. . . . It seems better to state the rule so that it equates the observed frequency and the inferred value, making it a pragmatic matter to determine, in any given case, what degree of approximation of the inferred value to the true value is acceptable.11

But there is a very good reason for leaving in the δ term. Without it, technically, the rule has a zero probability of inferring the correct limit. This is so because the correct limit, if it exists, might be any real number on the interval [0,1], since all such numbers are accessible as limits of sequences of relative frequencies. (See Appendix A.) But at every stage along the way, the relative frequency b_k/k, being a ratio of natural numbers, is itself a rational number. There are countably many rational numbers on [0,1], but uncountably many real numbers, as Cantor has shown; so in the absence of further information about one of these limits, we must suppose that the probability is zero that one chosen at random is a rational number. And so, if we identify the limit as one of the finite relative frequencies, without some tolerance of inexactness, there is a probability of zero that we are correct (which is not to say that it is impossible, only that it is terrifically unlikely).

So let us deal with Reichenbach's version and formulate:

Reichenbach's Rule:
Given that the relative frequency so far b_k/k = j/k, to infer that lim_{n→∞} b_n/n = j/k ± δ.

Presumably the tolerance term δ is to be specified in such a way that it decreases with increasingly long data, vanishing in the long run, i.e., δ → 0 as k → ∞. According to Reichenbach, this rule expresses the fundamental form of inductive inference to which all others are reducible.

This is the form of inductive inference that Reichenbach's Vindication is directed toward justifying.
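The Rule can be sketched in code. Since, as Salmon's objection quoted above notes, Reichenbach gives no method for specifying δ, the tolerance schedule δ(k) = 1/√k below is purely my own illustrative assumption (it shrinks toward 0 as the data lengthen, as the Rule presumably requires):

```python
import math

def reichenbach_rule(seq):
    """Reichenbach's Rule: from the observed frequency b_k/k = j/k,
    infer that lim b_n/n lies within j/k plus or minus delta(k).
    delta(k) = 1/sqrt(k) is an illustrative choice only."""
    k = len(seq)
    j = sum(seq)
    delta = 1 / math.sqrt(k)
    return j / k, delta

# After 100 observations, 63 of them B's:
estimate, delta = reichenbach_rule([1] * 63 + [0] * 37)
print(estimate, delta)  # 0.63 0.1
```

Repeated application as data accumulate tightens the tolerance: at k = 400 the same schedule gives δ = 0.05.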

Yet Max Black accuses this approach of conceiving inductive method in an "eccentrically restricted fashion". He says, "the determination of limiting values of relative frequencies is at best a special problem of inductive method and by no means the most fundamental".12 So what can Reichenbach be thinking here? What is it that makes all of this plausible?

Let's try the classical form:

All observed A's are B's.

All A's are B's.

Can this be construed as an instance of Reichenbach's Rule? Let's see. First off we note that RR requires a sequence of A's, so we interpret the classical form as:

All heretofore observed A's have been B's.

All A's that ever will be observed will be B's.

and thus confine our attention to observed A's. Our observation of A's will now form a sequence proceeding into the indefinite future; and since 'all' can be identified with 'a relative frequency of 1', this in turn gets interpreted as:

The number of B's so far observed is the same as the number of A's so far observed, i.e., b_k = k; and so b_k/k = 1.

In the long run b_n/n will continue to be approximately 1.

and this as:

b_k/k = 1

lim_{n→∞} b_n/n = 1

If there is some substantially different way to go from the classical form to Reichenbach's, I am not aware of it.

Two links are questionable. The first concerns the identification of 'All A's are B's' with 'All A's that ever will be observed will be B's'. The conceptual gulf between these two is, of course, enormous, but I can see no other way of confining the A's to a sequence plausibly consistent with Reichenbach's intent. But surely there are vastly more events of atomic decay, say, in the cosmos than are ever observed. Yet an ardent empiricist might point out that there is no practical or empirical difference between the two, and thus advocate ignoring the conceptual difference. Let's not argue this issue now.

The second weak link is the move from 'in the long run b_n/n will continue to be approximately 1' to 'lim_{n→∞} b_n/n = 1'. For, while the second can be conceived to follow from the first, it nowise can be conceived to be an analysis, or an interpretation, or a description of it. This is so because the second does not entail the first. As Reichenbach was aware, the truth of 'lim_{n→∞} b_n/n = 1' is compatible with the most outrageous behavior of the sequence b_n/n occurring, and continuing to occur, for a very long time before A's settle down to being B's so unanimously that a limit of 1 is the result. This 'lim_{n→∞} b_n/n = 1' is a substantially weaker conclusion than the original 'All A's are B's' that it was meant to be an interpretation of. It says a lot less about what will happen in the near future.

Reichenbach seems sometimes to forget this point, as when he writes, ". . . the limit of the frequency, that is, with the probability controlling the sequence" (emphasis mine).13 Or more clearly:

The a posteriori determination [of a probability] is identical with a procedure known in logic as induction by enumeration. . . . It is based on counting the relative frequency in an initial section of the sequence, and consists in the inference that the relative frequency observed will persist approximately for the rest of the sequence, or in other words, that the observed value represents, within certain limits of exactness, the value of the limit for the whole sequence [emphasis mine].14

"In other words", he says! But, that the "frequency observed will persist approximately", having more to say about the near future, is logically stronger than "that the observed value represents, within certain limits of exactness, the value of the limit".

So 'lim_{n→∞} b_n/n = 1' cannot be an interpretation of 'All A's are B's'; and the classical form cannot be made out as an instance of Reichenbach's Rule.

It seems rather that Reichenbach's Rule is a special case of

m/n of observed A's are B's.

m/n A's are B's.

i.e., an inference from sample to whole population. Thus, rather than being the more general form, it appears that Reichenbach's Rule is the special case. Reichenbach's Rule appears when 'm/n A's are B's' can be interpreted in sequence terms and identified with a limit statement like 'lim_{n→∞} b_n/n = m/n'.

This would be the case, for example, if there were only a finite number (n) of A's and exactly m of them were B's. Then the sequence of observed relative frequencies would hit m/n when all of the A's had been observed, and remain m/n ever after. Thus in this case lim b_n/n would indeed be m/n, but in a somewhat trivial way. The interpretation of the situation here in terms of sequences seems to add nothing but complexity.

Alternatively, the class of A's might be infinite but confined to a sequence, as for example if the A's were the discrete outcomes of the operations of an ideal gambling machine, where a sequence approach to analyzing it seems natural. We can think of RR as being a form of sample to whole population inference that arises when the sample is chosen, not by the experimenter, but rather is presented sequentially to the experimenter by events themselves. In such circumstances problems with experimenter bias in the choice of the sample do not arise.

It seems to me that the best way to make sense of Reichenbach's view is to suppose that he has something like ideal mechanisms in mind.

Suppose that we have a Real Coin, and begin flipping it in shifts. It would be nice if we could associate some definite probability with the event of getting heads, the better to conform our expectations to reality. But our coin is not immortal; sooner or later it must split, crumble, wear away, or in some other way Give Out. So the coin tosses do not form an infinite sequence of events, but only a finite sequence. So suppose the coin gives out (or we do) after a large number of tosses, say after f tosses, and at this point e heads have been observed. Then the final relative frequency is e/f, and this is the value Reichenbach would presumably have us identify with the True Probability of getting a head. Yet the statistical behavior of the coin as it nears its demise may differ substantially from what it was earlier on, as if, for example, it lost a chip on the 7,000th toss and became strongly biased for tails.

Now let us look at the situation as we face it at the kth step, where k is a number, say, between 100 and 10,000. We have observed j heads (1 < j < k) for a relative frequency so far of j/k. To keep things simple, let us also assume that we have no information, other than frequency, upon which to base our expectations of future behavior: no observations of cracks in the coin, no special insight into the behaviors of the tossers, no pattern to the appearing of heads; and let us assume that the frequency j/k is not so tantalizingly close to 1/2 that we are tempted to suppose it to be a fair coin; and assume that the behavior of the coin does not appear to have undergone any changes. We can all agree, I think, that we would want to equate this ratio, j/k, with our best guess as to the rate at which heads will continue to turn up in future strings of tosses; we would expect the observed relative frequency to continue to hold approximately constant in the future.

Whether we call it a "probability" or a "weight", we can agree that j/k is the appropriate value to assign to our expectation that the next toss will yield a head; we would use j/k in our reasoning in deciding what bets to accept.
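The betting use of j/k admits a simple worked example (the stake and payoff here are invented purely for illustration):

```python
# With j = 60 heads observed in k = 100 tosses, take j/k = 0.6 as
# the weight of expectation for heads on the next toss. Suppose a
# (hypothetical) bet costs $5 to enter and pays $10 if heads comes up:
j, k = 60, 100
weight = j / k
payoff, stake = 10.0, 5.0
expected_gain = weight * payoff - stake
print(expected_gain)  # 1.0 -- positive, so the bet is worth accepting
```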

Does Reichenbach's Rule have anything to add to our understanding of the Real Coin at the kth step? All that the Rule lets us do is infer that lim_{n→∞} b_n/n = j/k ± δ; and that, as we have said, is logically weaker than the presumption that the observed frequency will persist approximately, and doesn't carry any information about the next case. The limit concept doesn't seem to contribute anything here.

So let us formulate the

Standard Rule of Projection:
Given that the relative frequency so far b_k/k = j/k, to infer that in any near future string of A's, approximately j/k of them will be B's.
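In code, the Standard Rule of Projection amounts to scaling the observed ratio onto a near-future string (a sketch only; 'approximately' is left to context):

```python
def standard_projection(observed, m):
    """Standard Rule of Projection: given b_k/k = j/k so far,
    expect approximately (j/k) * m B's in a near-future string
    of m A's."""
    j, k = sum(observed), len(observed)
    return j / k * m

# 40 B's among 80 A's observed so far; project over the next 50 A's:
print(standard_projection([1] * 40 + [0] * 40, 50))  # 25.0
```

Note that, unlike Reichenbach's Rule, this says something about the near future and nothing about any infinite limit.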

One remaining piece of the problem is the challenge of providing a justification for this inductive procedure for those contexts where the procedure is indeed appropriate and rational. Such a justification will be attempted before the end of this chapter.

********

Reichenbach's conceptual problems seem to stem from the frequency interpretation. The probability p(B/A) can be thought of as active now and "controlling" the sequence, but not so the limit of the sequence of relative frequencies. The limit is way too far off in the future to exert any influence now, and might even be different from the probability that is now controlling the sequence. If the probability is thought of as a "propensity" for A's to be B's, i.e., an objective tendency (as it seems Reichenbach does at times), this probability could be exerting its influence right along. In an Ideal Situation (coins that are flipped forever and don't wear out, Eternal Laws . . .) this propensity would be stable and would result in the convergence of the sequence of relative frequencies in the infinitely long run.

Reichenbach's Rule, I submit, is plausible only if we think of certain real probabilities as "propensities", which inhere in real situations and give rise to real frequencies. A propensity can be numerically identified with what, under certain idealizing assumptions, the limit of a sequence of relative frequencies probably would be. I am not advocating here some sort of full-blown propensity interpretation for all probabilities (as perhaps Popper does15), but am merely interpreting certain probabilities in certain contexts that way.

Actually, we know a lot more than we have said about the coin tossing situation. Here we are presented with a highly unstable situation which quickly reaches stability in either one of two configurations: arms are mechanically sloppy and not precisely controllable; the spin rate of the coin is usually high relative to its vertical velocity; the angle of the first bounce, and the effect of the first bounce on the spin rate, are critically dependent on the precise angle at which the coin first strikes, and similarly for later bounces. In all, we have a background theory of the situation whereby we expect a stable propensity of heads, not very far from 1/2. Since we already have reason to expect a stable propensity, we can take the relative frequency encountered so far as our best estimate of the numerical value of that propensity, without having to infer the existence of a stable propensity from the data.

Propensities, as I propose them, exist toward any possible outcome of any situation. They are objective features (albeit high level abstract features) of situations. Propensities (to occur) attach to possibilities in situations. They are thus most fundamentally applicable to single cases (typically the next case), and have a less direct relationship to frequencies. Propensities cause outcomes. They help to explain why a single result comes out the way that it does; and, if a propensity is stable, as an experiment is repeated again and again, this fact helps to explain why the frequencies come out the way that they do. Propensities cause frequencies.

According to this propensity interpretation, the relative frequencies will tend to approximate the propensity:

While the propensity is in effect, p(B/A) = z implies that any random sample of k A's will probably (propensity) include approximately z·k B's, the degree of approximation improving as the sample gets larger.

Let us call this the frequency promoting property of a propensity.16

Admittedly this formulation of the property is imprecise, but it is, I think, precise enough for our present purposes. Also:

If the propensity is assumed to remain stable indefinitely, p(B/A) = z implies (with probability 1) that lim_{n→∞} b_n/n exists and equals z.

Let us call this the convergence property of a propensity. The convergence property is, I take it, a logical consequence of the frequency promoting property.
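The frequency promoting property can be illustrated by simulation, modeling a stable propensity p(B/A) = z as a sequence of independent Bernoulli trials (itself an idealizing assumption):

```python
import random

def sample_frequency(z, k, rng):
    """Relative frequency of B's in a random sample of k A's,
    under a stable propensity p(B/A) = z modeled as independent
    Bernoulli trials with success probability z."""
    return sum(rng.random() < z for _ in range(k)) / k

rng = random.Random(0)  # fixed seed for reproducibility
z = 0.3
small = sample_frequency(z, 100, rng)
large = sample_frequency(z, 100_000, rng)
# The larger sample will (probably) approximate z more closely,
# as the frequency promoting property asserts.
print(small, large)
```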

On what evidence may I posit a propensity governing a sequence? It seems that there are four possibilities:

A. Some indirect (viz., theoretical) reasoning leads to positing a propensity governing the next case. The existence of the propensity is inferred separately from inferring the numerical value of that propensity. Example: coin tossing, where our general theory of the situation leads us to expect a propensity of heads not far from 1/2, but whose actual value is accessible only by experiment.

B. The data so far reveal a stable propensity. If the sequence so far be divided into subsequences of equal length, and if then the frequencies in the subsequences approximate a normal distribution around their mean value, then the sequence is "acting as if it were governed by a single probability", i.e., this is what classical sampling theory would lead us to expect on the basis of a fixed probability. In the absence of any evidence to the contrary, the best available explanation of a pattern of this sort is that it is the result of a stable propensity (numerically near to the relative frequency so far encountered). Stable propensities are the presumed causes of a certain class of statistical patterns, namely those that display fairly stable relative frequencies. Typically, one projects a heretofore stable propensity to the next case. More about this shortly.
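The test described under B can be sketched as code: divide the data into equal blocks and check whether the block frequencies cluster the way classical sampling theory predicts for a fixed probability (the data here are simulated under an assumed stable propensity of 0.5):

```python
import random
import statistics

def block_frequencies(seq, block_len):
    """Relative frequency of 1's in each consecutive subsequence
    of equal length block_len."""
    return [sum(seq[i:i + block_len]) / block_len
            for i in range(0, len(seq) - block_len + 1, block_len)]

rng = random.Random(1)
seq = [1 if rng.random() < 0.5 else 0 for _ in range(10_000)]
fs = block_frequencies(seq, 100)
# Under a fixed probability z, the block frequencies should cluster
# around z with spread near sqrt(z * (1 - z) / block_len), here about
# 0.05 -- roughly normal, as sampling theory leads us to expect.
print(round(statistics.mean(fs), 2), round(statistics.stdev(fs), 3))
```

A markedly larger spread, or a drifting mean, would be evidence against a single stable propensity.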

C. Some more complicated pattern is revealed by the data, that yet leads to positing a propensity governing the next case. Very many interesting and even bizarre examples of this sort of thing can be imagined: Consider an ideal machine that prints out zero's and one's on an infinite tape, starting at a definite point. All we know is the output of the machine as it is produced; we know nothing about the causes of that output. The inner workings of the machine and the intent of its designers are unavailable to us. Since the machine produces binary sequences (zero's and one's) and since we can't see inside, let us call it an Opaque Binary Machine, or OBM for short. We are interested in predicting whether or not the next character will be a one, i.e., we want to figure out what propensity governs the next case.

Such an OBM might be doing almost anything. Maybe it is printing out the video data from the Voyager II spacecraft, or the texts of Dada poetry encoded in ASCII. On the face of it, it seems mad to try to predict the next character of such a machine. Yet whatever it is doing, we might be able to spot an emerging pattern in the output, and use that pattern as a basis for a reasoned estimate of the next case.

One projectable pattern type we have already discussed: patterns revealing stable propensities. Other possibilities include deterministic patterns, e.g., repeating patterns; and partially deterministic patterns, as those displaying rigidly oscillating frequencies, or, say, zero's and one's always appearing as identical pairs, 00 and 11, where the appearance of the pairs is nondeterministic. Further examples are provided by Markov processes and by fractally clustered error bursts.17

If we spot a consistent pattern of occurrence of one's, whether deterministic or stochastic, we may typically project the pattern into the future, at least as far as the next case, and use that pattern as a basis for making a prediction.

D. A propensity may perhaps be posited as a sort of last resort. Suppose we must make a prediction for practical reasons, but

i) the data are too short to recognize anything, or
ii) the data follow no known distribution, or
iii) we have no other information but the frequency to go on, the rest of the information about occurrences so far being lost.

Then perhaps we can posit a propensity applying to the next case and approximated by the frequency encountered so far.

Anyway, whether propensities may indeed be justly posited as a last resort, and if so, in which contexts, let us declare to be outside the scope of the present discussion.

Having gone some ways in developing the 'propensity' idea, we now go on to apply it to Reichenbach's Vindication of Induction.

********

Let us now reformulate Reichenbach's Rule for the propensity interpretation.

Reichenbach's Rule Revised:
Given that b_k/k = j/k, to infer that p(B/A) = j/k ± δ (where δ → 0 as k → ∞).

Qualifications: The probability p(B/A) is to be interpreted as the propensity for A's to be B's. This inductive procedure is to be used only if the existence of a stable propensity is already supposed.

The shift from interpreting the probability here as a limit of the frequencies to interpreting it as a propensity is most of the change from Reichenbach's Rule. Propensities are the Explanationist's counterpart to the Empiricist's frequencies. Reichenbach's Rule Revised is now a species of explanatory inference, since propensities explain frequencies.

We see now that the Standard Projection can be resolved into two distinct steps: the first is an instance of Reichenbach's Rule Revised, and infers from an observed frequency to a propensity; the second step infers from the propensity to future relative frequencies, and is a prediction. The whole is governed by the supposition that there exists a stable propensity (or at least one that is stable enough for our purposes). The second step can be made out to be a deductive one when the stability of the propensity is supposed; this follows from the frequency promoting property. That leaves the inductive content of the Standard Projection as consisting of two parts: Reichenbach's Rule Revised, and the supposition that the propensity will remain stable. Accordingly, and assuming the justification of deductive steps to be unproblematical, the Standard Projection would be justified if these two parts were justified.

Let's see if they can be justified. We turn first to Reichenbach's Rule Revised and ask whether Reichenbach's vindication of induction might not be fixed up and made to work for the Revised Rule.

First, the particulars of Reichenbach's vindication.

Salmon:

Reichenbach's justification proceeds by acknowledging that there are two possibilities, namely, that the relative frequency with which we are concerned approaches a limit or that it does not, and we do not know which of these possibilities obtains. If the limit does exist, then so does the probability p(A,B) [p(B/A) in our notation]; if the limit does not exist, there is no such probability. We must consider each case. Suppose first that the sequence Fn(A,B) [i.e., b_n/n] (n = 1, 2, 3, . . .) has no limit. In this case any attempt to infer the value of that (nonexistent) limit is bound to fail, whether it be by induction by enumeration or by any other method. In this case, all methods are on a par: they are useless. Suppose, now, that the sequence does have a limit. Let us apply the rule of induction by enumeration and infer that the observed frequency matches the limit of the relative frequency to whatever degree of approximation we desire. We persist in the use of this rule for larger and larger observed initial parts of our sequence as we observe larger and larger numbers of members. It follows directly from the limit concept that, for any desired degree of accuracy whatever, there is some point in the sequence beyond which the inferred values will always match the actual limit within that degree of approximation. To be sure, we cannot say beforehand just how large our samples must be to realize this condition, nor can we be sure when we have reached such a point, but we can be sure that such exists. There is a sense, consequently, in which we have everything to gain and nothing to lose by following this inductive procedure for ascertaining probabilities, i.e., for inferring limits of relative frequencies. If the probability whose value we are trying to ascertain actually exists, our inductive procedure will ascertain it. If the probability does not exist, we have lost nothing by adopting that inductive procedure, for no other method would have been successful in ascertaining the value of the nonexistent probability.18

As I said, Reichenbach has got to be looked at as trying to infer propensities, not limits. He identifies these propensities with the limits of relative frequencies in the infinitely long run; but I think, rather, that a propensity is the sort of thing that causes the frequencies to come out the way that they do, and hence, causes the ideal limit to be what it is. The propensity explains the limit of the frequencies; the two are conceptually distinct. If the limit of the frequencies can be inferred, however, then so can the propensity, for they are numerically identical (at least they are under the assumption that the propensity is stable). So if the vindication of Reichenbach's Rule is correct it can, it seems, be carried over to justify Reichenbach's Rule Revised.

Now according to Reichenbach's Rule we should project the relative frequency so far observed, and take it as our estimate of the limit. This rule, according to its supporters, has the demonstrable virtue that it will succeed in finding the limit if anything will. If the limit exists, then the sequence of estimates will converge to that value as a limit.

But we can demonstrate an analogous virtue for Reichenbach's Rule Revised: it will succeed in finding the propensity if anything will. If the propensity exists and persists with a constant value, then Reichenbach's Rule Revised, repeatedly applied as data accumulate, will probably sooner or later produce estimates within any desired degree of closeness to that value of the propensity. This follows from the frequency promoting property.

Now this virtue does not strictly recommend his rule as being the best one for adoption, as Reichenbach was aware. His rule is only one of an infinite class of rules with this same virtue, each of which he calls an "asymptotic method" or a "self-corrective method".

Salmon:

The chief defect in Reichenbach's justification is that it fails to justify a unique inductive rule, but rather, it justifies an infinite class of inductive rules equally well. Although induction by enumeration will work if any method will, it is not the only rule that has this characteristic. Reichenbach's argument shows that the same can be said of an infinite class of inductive rules. Reichenbach was aware of this fact, and he characterized the class as asymptotic. A rule is asymptotic if it shares with the rule of induction by enumeration the property of yielding inferences that converge to the limit of the relative frequency whenever such a limit exists. . . . The fact that there are infinitely many asymptotic rules is not, by itself, cause for dissatisfaction with Reichenbach's argument. If this infinity contained only a narrow spectrum of rules that yield similar results and that quickly converge to one another, we could accept the small degree of residual arbitrariness with equanimity. The actual case is, however, quite the opposite. The class of asymptotic rules is so broad that it admits complete arbitrariness of inference. . . . Given a sample of any finite size, and given any observed frequency in that sample, you may select any number from zero to one inclusive, and there is some asymptotic rule to sanction the inference that this arbitrarily chosen number is the probability. The class of asymptotic rules, taken as a whole, tolerates any inference regarding the limit of the relative frequency.

Reichenbach saw his task to be that of showing his rule to be epistemically preferable to the other asymptotic ones, especially those that led to significantly different predictions in the short run.

It should be pointed out that there is also another way in which Reichenbach's Rule has not been shown to be best. If the limit does not exist, Reichenbach's Rule will fail to recognize that fact; it would have us go on indefinitely making estimates of the non-existent limit. But there are other inductive procedures that would not make this mistake. For example, Josephson's Jump (humbly named), to wit: 'try to figure out the pattern, then project its indefinite continuance', can, under some circumstances, predict that the limit does not exist.

Consider an opaque binary machine (OBM). We are interested in the frequency with which ones appear in the infinitely long run, i.e., in the limit of the sequence of relative frequencies. Suppose that what comes out is this:

first 1 one
next 10 zeros
next 100 ones
next 1,000 zeros
next 10,000 ones

etc.

It can be shown (see Appendix B) that, if the pattern continues, the limit of the relative frequencies does not exist. If we recognize the pattern as it comes out on the tape, and suppose its continuance, we can deduce the correct conclusion—that the limit does not exist.

Reichenbach's Rule, however, has us producing estimates of the non-existent limit ranging from below 1/10 to more than 5/6 and slowly oscillating back and forth forever (see Appendix B).
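The oscillation is easy to compute (a sketch of mine, not taken from Appendix B): tally the relative frequency of ones at the end of each block of OBM output.

```python
def obm_frequencies(n_blocks):
    """Relative frequency of ones at the end of each block of OBM output:
    1 one, 10 zeros, 100 ones, 1,000 zeros, 10,000 ones, ..."""
    digit, length = 1, 1
    ones = total = 0
    freqs = []
    for _ in range(n_blocks):
        ones += digit * length
        total += length
        freqs.append(ones / total)
        digit, length = 1 - digit, length * 10
    return freqs

print(obm_frequencies(6))
# dips below 1/10 after each zero-block, rises above 5/6 after each one-block
```

The running frequency keeps swinging between roughly 1/11 and 10/11, so it has no limit for Reichenbach's Rule to find.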

If we allow for the possibility that a procedure might arrive at the conclusion that a limit does not exist, then it is not true at all that Reichenbach's Rule will work if any method will. We have found one case where Josephson's Jump works but Reichenbach's Rule fails.

My purpose here is not really to argue for the adoption of Josephson's Jump (which is, by the way, one of the asymptotic rules in that it is self-correcting in the long run), but to see what can be done to straighten out and finish up Reichenbach's solution to the problem of induction.

Accordingly, let us again imagine an OBM. Let us suppose, this time, that we can find no deterministic pattern to the output, and further, that we can find no pattern at all incompatible with the hypothesis that the output is governed by a stable propensity to produce ones.

So we proceed on the assumption that the behavior of the machine is purely the statistical result of a stable propensity, or at least that that is the best way we can find to approach the problem of predicting its behavior.

Accordingly, let us agree that we are trying to infer the value of the propensity for characters printed to be ones.

We want to demonstrate that Reichenbach's Rule Revised (RRR) is in some sense epistemically preferable to any other "asymptotic" rule, i.e., preferable to any other rule which probably converges to the propensity if a stable propensity really exists. If this can be demonstrated, then the defect in Reichenbach's Vindication will have been repaired, and the problem of induction will have yielded (at least partially) to our methods.

So now let us suppose, for the sake of the demonstration, that a stable propensity P for the occurrence of ones exists. Let R be an asymptotic rule significantly different from RRR. Let R(n) represent the estimate of P given by rule R at the n-th character. Since R is asymptotic, probably R(n) → P (n → ∞).

Since R differs significantly from RRR, there must be some value of n (say k) such that R(k) produces a significantly different estimate of P than does RRR. Suppose that at the k-th character there have been k₁ ones so far, for a relative frequency of k₁/k. We can represent this situation at the k-th character, where the rules differ, as:

R(k) ≉ k₁/k (k₁/k being the estimate produced by RRR at the k-th character, and "≉" being read as "significantly differs from").

Let us consider two competing hypotheses:

h₁: k₁/k ≈ P, i.e., the observed frequency conforms approximately to the propensity.

h₂: k₁/k ≉ P, i.e., the observed frequency deviates significantly from the propensity.

Now, that P exists and is stable implies (frequency promoting property) that probably P ≈ k₁/k. Thus h₁ is probable (in the propensity sense) and h₂ is, accordingly, improbable. But if probably k₁/k ≈ P, and since R(k) ≉ k₁/k, it follows that probably R(k) ≉ P. Thus RRR has probably produced a better estimate of P than R has.
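The frequency promoting property at work in this argument can be illustrated by simulation (my sketch; the values of P, k, and the 0.05 tolerance are illustrative choices, not the text's): when a stable propensity P governs the output, the observed frequency at a large k almost always lands near P, so any estimate that differs significantly from that frequency probably differs significantly from P as well.

```python
import random

def observed_frequency(p, k, rng):
    """Relative frequency of ones in k draws from a stable propensity p."""
    return sum(rng.random() < p for _ in range(k)) / k

rng = random.Random(0)
P, k = 0.6, 10_000
runs = [observed_frequency(P, k, rng) for _ in range(100)]

# In every simulated run the observed frequency is within 0.05 of P,
# so an estimate R(k) far from the observed frequency is far from P too.
print(all(abs(f - P) < 0.05 for f in runs))
```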

So this is the epistemic preferability of RRR over other asymptotic rules: that if a stable propensity exists, where a rule differs significantly from RRR in its estimate of that propensity, the estimate produced by RRR is probably the more accurate. RRR is justified, not just on how it does in the long run, as is R, but on the basis of how well it can be expected to do in the short run.

I think that I have repaired the defect in Reichenbach's vindication of induction by replacing his frequency interpretation of probabilities by a propensity interpretation.

If this is correct, and if indeed Reichenbach's vindication has been the best response to Hume's problem that has been produced to this point in time, then I have produced a better response to Hume's problem. Explanationism leads to a point of view from which the problem of induction is less intransigent.

If I have indeed repaired the defect in Reichenbach's argument, then RRR has been justified; and that leaves as the only unjustified inductive content of the Standard Rule of Projection the supposition that the propensity will remain stable.

We now turn to the problem of the justification for projecting stabilities into the future.

C. The "New Riddle of Induction": Projecting Stabilities 21

There is something fishy going on with the Principle of Uniformity. If we take this principle to be the general proposition that unobserved instances of any phenomenon resemble observed instances, then the thesis is clearly false. Things do change. Furthermore, while some such general principle is alleged by Hume's argument to be presupposed by the whole class of inductive inferences, single inferences seem to presuppose only limited uniformities specific to the phenomena we are considering. For example, in the case of Hume's billiard balls, we need only suppose that future instances of billiard ball collisions will resemble past ones in order to make our inference to the next case a valid one. When we inquire into the possible justification of the billiard ball uniformity principle, the circularity anticipated by Hume's argument does not immediately appear. We could, for example, justify it by assuming certain laws of physics which have as their justification larger classes of observations (perhaps none of them specifically of billiard balls) and different uniformity principles.

Yet, what appears to be a persistent feature of the world may turn out to be a temporary condition, or worse—a mere extended coincidence. Explanatory Inference allows one to rule out (pretty much) the extended coincidence hypothesis if the run has been long enough, but how do we rule out the hypothesis of a temporary condition? Often we have background theory that lets us make the jump, as for example, particle behavior is supposed more consistent over time than human behavior. But on the most fundamental level available at the time, at the outposts of theory that are the background for lesser theories, we have no background theory to appeal to at all. What then?

How do I rule out the hypothesis of a temporary condition? Perhaps I don't rule out the hypothesis; perhaps I just suppose that things aren't about to change. E.g., on the most fundamental level, particle physics say, I don't rule out the hypothesis that the discovered behavior only holds for the local epoch. I just do physics for the epoch in which I find myself. If pressed, I generalize to the whole of time, but I am aware that I am being speculative. There is a tradeoff here: by hedging the temporal scope and thus improving the probability that my hypotheses are correct, I raise the possibility that the present epoch is untypical, and thus unnecessarily introduce loose ends and weaken the system of hypotheses.

An important principle here is that the present is a random moment—there is no reason to think that it is unlike any other moment.

[Timeline diagram: an arrow representing time, with a bracketed interval marking the time interval of the observations.]

When I generalize, I generalize to the past, future, and to China and Mars. In the absence of background theory, generalizing very much beyond the matter at hand is pretty speculative, but any explicit limits to the generalization would be arbitrary. The rational thing to do, it seems, in the absence of decisive background theory, is to leave the scope of a generalization unsettled, but to imagine the limits to be out of reach.

What is the difference between generalizing in space and generalizing in time? Is more caution indicated in the one than in the other? The answer is that we can sample around spatially, and thus rule out spatial biases, but we can't sample around temporally. Also, for most matters, we can sample a larger part of Relevant Space than we can of Relevant Time.

Again, in the absence of background theory, which occurs only at the theoretical frontier, we may as well generalize to the limits, i.e., to all space and time (except maybe right after the big bang and right before the singularity at the end of time). In doing so we need to be aware that we are being somewhat speculative, i.e., are in the realm of explanations which are best explanations, but only better in avoiding arbitrary complications.

There is still the problem of slow change of funda­ mentals, e.g., what if the unit of electric charge very slowly decreases with time? What does that do to reasonings supposing the charge to be constant? It doesn't make much difference in the short run, but it makes us more and more inaccurate the farther from the present we extend our reasonings.

We don't rely on a general principle of uniformity after all, but rather, I contend, we project certain limited stabilities indefinitely into the future. Paradigmatically:

All observed A's are B's. (Lots of examples, no reason to suspect bias, . . .)

There must be some reason why A's have been so consistently turning out to be B's.

There is no reason to suppose this reason to be unstable.

—> Suppose it to be stable.

A's are at least almost always B's (timelessly).

Let us try to vindicate positing stabilities (or at least relative stabilities). The argument goes something like this: Cognitive knowledge of the world proceeds by discovering stabilities in the world, and adjusting our ideas thereto. These stabilities are either there, or they're not. If they're not there, then nothing will avail; understanding is impossible. But if they are there, then explanatory inference will find them—they will appear (probably, eventually, if they are significant) as the terms of explanations; one stability, one cognitive item: a species, an equation relating two physical magnitudes, a name for a human being, a predicate. I have in mind a rather simple-minded modeling by setting up a kind of isomorphic representation of the world in the mind or in the theory. So: no stabilities, no understanding. I learn by picking up stabilities, and giving them names, learning to recognize their instances, and reappearances, and manifestations, etc.

********

Let us now consider Goodman's "New Riddle of Induction".

Goodman:

Suppose that all emeralds examined before a certain time t are green. At time t, then, our observations support the hypothesis that all emeralds are green; and this is in accord with our definition of confirmation. Our evidence statements assert that emerald a is green, that emerald b is green, and so on; and each confirms the general hypothesis that all emeralds are green. So far, so good.

Now let me introduce another predicate less familiar than "green". It is the predicate "grue" and it applies to all things examined before t just in case they are green but to other things just in case they are blue. Then at time t we have, for each evidence statement asserting that a given emerald is green, a parallel evidence statement asserting that that emerald is grue. And the statements that emerald a is grue, that emerald b is grue, and so on, will each confirm the general hypothesis that all emeralds are grue. Thus according to our definition, the prediction that all emeralds subsequently examined will be green and the prediction that all will be grue are alike confirmed by evidence statements describing the same observations. But if an emerald subsequently examined is grue, it is blue and hence not green . . . . Moreover, it is clear that if we simply choose an appropriate predicate, then on the basis of these same observations we shall have equal confirmation on our definition for any prediction whatever about other emeralds—or indeed about anything else.22

Let us read Goodman's definition of 'grue' as saying that grue objects are those which are either green and examined prior to t, or blue and not examined prior to t.23

The explanationist's task is to show why the hypothesis that all emeralds are green is a better explanation for the evidence than the hypothesis that they are all grue, and thus show why it is preferable to project the predicate 'green' into the future than to project 'grue'. Let us examine the implications of each hypothesis.

First note that the whole discussion cannot go on in a void; we must examine some common background doxa against which the hypotheses are to be compared (see chapter 2). So let us assume that the color of an emerald (i.e., blue or green) is a result of its spectrum of reflected light, which is in turn a result of certain features of its microstructure.

Thus, if all emeralds are green, then this is so because they possess a similar microstructure. This would not be very surprising because (we may suppose) there must already be some physical basis for the distinction between emeralds and non-emeralds. It's plausible that whatever underlies emeraldhood is connected to whatever underlies the color green.

On the other hand, if all emeralds are grue, the situation gets a bit more complicated. What happens at time t? What changes and what does not? Emeralds are always grue, so nothing changes in that respect; but emeralds are observed to be green before t, and blue afterwards. (Except that individual emeralds that we first observed before t are green, it seems, at all times—even after t.) Emeralds that are never observed are blue right along ("grue applies to other things [not observed before t] just in case they are blue").

So no emerald actually changes color at t. Some, namely those never observed, and those observed only after t, are blue right along. And some, namely those that are observed before t, are green right along. No physically peculiar event has happened at time t, and we can't reject the hypothesis on those grounds.

Yet on the hypothesis that all emeralds are grue there are presumably two different microstructures—the one that underlies the green emeralds and the one that underlies the blue. Since the only green emeralds are those that are observed before t, it is only these emeralds that have the green-generating microstructure.

Thus we find two anomalies associated with this hypothesis. The first is in the peculiar situation after t, where all of the emeralds are grue, but some of them are green (those first observed before t), and the rest are blue. After t I have no way of telling whether a given emerald is grue or not (my eyes work on the basis of normal color), unless I know whether or not that particular stone has been observed before t. That is, after t, if I come upon a green emerald, I can't tell by any experiment whether it is (i) a grue emerald first observed before t, or (ii) a counterexample to the proposition that all emeralds are grue. There would be no structural difference between the two kinds of stones. After t grue is no longer an observational predicate.

The second anomaly occurs before t. An amazingly unlikely event has occurred. All and only those emeralds with the green-generating microstructure were observed, even though there were two different microstructures represented in the whole population of emeralds. A low likelihood event has occurred—a decastrang, or even a kilostrang (see Appendix C). An improbable sampling coincidence has occurred.

The hypothesis that all emeralds are green is, all things considered, a better explanatory hypothesis than is the hypothesis that all emeralds are grue. 'Green' is more projectable than 'grue', not because it is more entrenched by accident of usage and simple mindless habit (however well the world has selected out bad habits), but because it is more entrenched in physical theory. 'Green' coheres better with the physics.

Let us consider one other reading of 'grue', namely green before time t, blue afterwards. It is pretty hard to imagine a mechanism whereby grue is more projectable than green. What happens to light at time t? Or to the structure of emeralds? Here again green is more entrenched in terms of mechanism. 'Green' coheres better with the physics.

So what about a fundamental predicate like electric charge? What if at time t, everything changes its electric charge? Say 'x is posineg' means that x is positive at some time before t, or that x is negative after t (t is in the future, say). Now the question is: why say that protons are positive? The temptation is to say that 'positive' is a simpler predicate than 'posineg', but one might respond that this is so only if it is indeed more fundamental. I maintain that we can't rule posineg out completely, but it is pretty hard to imagine a mechanism whereby the charge changes at t. Actually, for any practical purpose we don't really have to rule out the hypothesis that protons are posineg; again we just need to suppose that t is sufficiently far in the future that there won't be trouble. It would be pretty surprising if Goodman's time t turned out to be in the next few minutes, or even years. Goodman's time t, if it exists, could be any time in the next 20 billion years, or even later. Why in our lifetimes? It would be a strang (occurrence of a low likelihood event) for it to occur in this year of all years. Or better, maybe even a kilostrang (see Appendix C).

So why do I project 'green' instead of 'grue'? Because green makes more causal sense. We've got lots of background theory here such that, if we suppose the background stuff to be o.k., then 'green' works better than 'grue'. But what about at a fundamental level where we have no real background theory? What makes 'electrically positive' more projectable than 'posineg'? When I posit a stability, I may as well posit it as perfectly stable and then compromise back as coherence requires. But if the proton is posineg, it's stable in that regard, and unstable as 'positive'. (Yet it would change the way it behaves at t. That's an instability which is unbalanced, it seems.)

In the absence of any reason to suppose change, I suppose stability. It's simpler; and, since it's self-correcting anyway, I choose the simpler. We may appeal to the natural explanation: The simplest, most direct, and least ad hoc explanation for why Z appears stable is that it is stable.

Furthermore, these stabilities that we posit and project into the future are to be considered "real" and not merely artifacts of our choice of predicates. Stabilities are more fundamental than the language we use to pick them out: if the world were a Heraclitean flux, with all things changing in all ways at all times, then language and thought would be impossible because our words would have no fixed meanings.24 Projectable predicates are those that pick out real stabilities.

D. Where did the Problem Go?

We found that inductive inferences have two fundamental forms. The first has variously appeared as: inferring to the best explanation, accepting a theory, changing one's doxa so as to maximize explanatory coherence, recognizing a pattern, and as observing a stability. Let us now call it Seeing. The second form has appeared as: statistical syllogism, accepting the implications of a theory, and as projecting a limited stability into the future. Let us now call it Predicting.

In Predicting, the conclusion is no more certain than the weakest link. In Seeing, the conclusion may be more certain than any of its supports.

Our justification for Seeing is that there is no alternative but to give up trying to understand.

"If thou hast eyes to see, then see."—Crito the Sage.25

Our justification for Predicting is that whether we want to or not we must act (even inaction is an act), so we may as well take courage and act as boldly as our Seeing permits.

Implicitly at least, the process of Seeing involves the rejecting of alternative explanatory hypotheses. The standards for preferring one hypothesis to another are thus implicated in this form of induction, and we may reasonably ask for them to be justified. The task of providing detailed justifications for the standards of theory preference is thus a residue of the problem of induction.

One justification strategy is suggested by our approach so far: these standards are to be justified by their anticipated contribution to the overall effort to understand.

Notes

1. Salmon [1], p. 20.

2. Salmon [1], p. 50.

3. Black [1], p. 176.

4. The term "vindication" for this type of justification is due to Herbert Feigl, this according to Max Black, Black [1], p. 176.

5. Reichenbach, Hans, Reichenbach [1], p. 433.

6. Ibid., p. 68ff.

7. This is not Reichenbach's notation, but I will continue to use it throughout this discussion.

8. Ibid., p. 329.

9. Ibid., pp. 329, 446, 450.

10. Salmon [1], p. 86.

11. Salmon [1], p. 138, fn. 111.

12. Black [1], p. 176. Also on p. 170, "There is no general agreement concerning the basic form of induc­ tive argument, although many writers regard simple enumeration as in some sense fundamental."

13. Ibid., p. 329.

14. Ibid., p. 351.

15. Popper [3], pp. 25-42.

16. Though I assert a logical connection between frequency and probability with the "frequency promoting property", I do not think that my view falls prey to the difficulties mentioned by van Fraassen in The Scientific Image, p. 190 (van Fraassen [1]). He says:


However, as they stand, both these views [i.e., Popper's and Kyburg's] fall prey to all the difficulties that beset the strict frequency interpretation. For clearly, there are models containing but a single world (a single long run). In that case we find that the domain of the defined probability function may not be a Borel field, and indeed not even a field; and where defined, not countably additive.

Given these difficulties (which some propensity theorists can [emphasis van Fraassen's] and do avoid by the simple expedient of denying any and all logical connections between frequency and probability), the models will not be acceptable.

Though I agree that there are models of my view containing a single world, such a world cannot be a single long run. On my view a propensity can be modeled as a "proportion" in some class, even an infinite class (as in the probability of getting a rational number if I choose a real number at random), or even perhaps a countably infinite class (as in the probability of drawing an even integer from a hat containing all of the integers), but a propensity of my sort cannot (I hope) depend upon the order of draw in an infinite sequence. The propensity may be modeled by the underlying countable set, but not by a single infinite sequence. Note that the sequence given by 1, 3, 2, 5, 7, 4, 9, 11, 6, 13, 15, 8, 17, 19, 10, . . . gets all of the natural numbers eventually, but the frequency of even numbers approaches 1/3.
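The reordering in the note can be generated mechanically (a sketch of mine): take two odd numbers, then one even, repeatedly. Every natural number appears eventually, yet at each block boundary the evens make up exactly one third of the terms emitted.

```python
from itertools import count, islice

def two_odds_then_one_even():
    """Reorder the naturals as 1, 3, 2, 5, 7, 4, 9, 11, 6, ..."""
    odds, evens = count(1, 2), count(2, 2)
    while True:
        yield next(odds)
        yield next(odds)
        yield next(evens)

seq = list(islice(two_odds_then_one_even(), 15))
print(seq)  # [1, 3, 2, 5, 7, 4, 9, 11, 6, 13, 15, 8, 17, 19, 10]
print(sum(n % 2 == 0 for n in seq) / len(seq))  # 1/3
```

This makes vivid why a single infinite sequence cannot model the propensity: the limiting frequency depends on the order of draw, not just on the underlying set.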

17. Mandelbrot [1], pp. 96-98.

18. Salmon [1], p. 86.

19. Ibid., p. 446.

20. Salmon [1], pp. 87, 88.

21. Goodman, Nelson [1]. 180

22. Ibid., pp. 512, 513.

23. This reading seems close to the text, but "grue" has been interpreted by others as simply "green before t, blue after t". For example, this definition of "grue" is the one used by Max Black in Black [1].

24. This argument is inspired by Aristotle's Metaphysics, Γ.

25. As quoted by Marcus Aurelius in Aurelius [1], Book 8, verse 38.

Epilog

Up till now I've written quite freely about inference to the best explanation without really saying very much about what an explanation is supposed to be. I take it that an explanation expresses the content of an understanding. When I understand something, I can explain it, if I can find suitable language to express that understanding. When I don't understand, I can't formulate an explanation, except by accident.

We understand things (and explain things) by using causal models. We construct simulations of the structures and workings of the world; and we do this by calling upon certain distinctively causal, general patterns of thought. Let us call these general patterns paradigms of causation (after the sense of T. Kuhn), or models of causation (in the sense of mathematical simulations), for want of better terms. It is my contention that each such paradigm has its own unique logical structure, and thus its own logical forms for explanation and prediction, for possibility, necessity, and likelihood, and its own logic of counterfactual conditionals.

There follows a list of some of these models of causation, along with some brief remarks concerning their logical features. This should help make it clear what level of abstraction and generality I would like to focus on. We begin with the most primitive and anthropomorphic, and proceed roughly in the order of their historical emergence.

1. Causation as the relation between the will of an agent and the object of that will.

Explanation within this paradigm is by assigning responsibility to an agent: 'Why p?'—'George did it' (presumably on purpose). If we are so attached to this model that we will admit no other form of explanation, then we are led to postulate unseen agents to account for the details of our daily lives, and thus led to animism or theism.

Predicting normally requires knowing an agent's intent, or at least something about its personality.

The probability that something comes about, given that it is intended by agent X, is directly related to X's power to do what it intends. In general 'power' is a more significant notion here than 'likelihood'.

Speaking of "the cause" of an event is normally in order; the cause is the responsible agent (and in a secondary and much more abstract sense "the cause" is the intention in the mind of the agent).

'Causes' tend to be conflated with 'reasons'.

Since a desire or an intention is forward-looking, the causal arrow must proceed forwards in time.

2. When means-ends relations are added to supplement the simple agency model.

Motive can be inferred from behavior by construing the behavior as a means to some end. Explanation is by identifying the explained as a means to a desired end.

The idea of 'using' must be as old as tools and weapons. If we consider using people, we are led to consider the master-slave relationship, where the behavior of the slave is understood by reference to the will of the master.

Conscious agents are notoriously unpredictable, so speaking of what S would do under specified circumstances is pretty speculative.

3. Final causes

Power still resides in agents or things, but ends need not be conscious.

The model is often connected with some kind of essentialism, where the end is innate in, and part of the nature of, the agent.

If I know S's nature or personality, and understand the situation that it is in, then I can predict what will happen.

4. Material causes

The structure of something as a cause and explainer of its properties and behavior.

5. Formal causes

Axioms and definitions as causes of the theorems.

6. Crude mechanism

Causation as collision theory.

Natural laws and nomic necessity, with counterfactuals derivable from the laws.

Complete temporal reversibility.

7. Newtonian forces

"Forces" as the explainers.

8. Laplacean Determinism

State-state causation.

"The cause" only as a state of the whole system.

The meanings of counterfactuals can be made clear by distinguishing laws from initial conditions. They are almost pointless, though, because within a deterministic setting nothing could be any different from the way it actually is.

9. Process-product causation

Historical explanation.

10. Hegelean and Marxist dialectical conditions

Deterministic causation with the causes being self-transcending, i.e., proceeding without the supervision of external laws.

11. Causation as manifest in state transition rules

This model subsumes some of the preceding ones.

State transition rules may be classified as deterministic or stochastic, and as governing a closed system or an open system.

The "propensities" of the last chapter are, I think, material causes of stochastic state transition rules.

12. Homeostasis, dynamic equilibrium, feedback, symbiosis, causal reciprocity

[Diagrams of feedback-loop and reciprocal-arrow structures.]

"The cause" almost meaningless. "How does x contribute to y?" becomes the dominant question, along with the auxiliaries "what maintains y?", and "what does x do?"

I have no doubt left many things off the above list, and run together items that would better be separated, but I hope I have given the general idea: We have a limited repertoire of causal models to call upon in constructing the theories and scenarios that we use to understand, predict, and explain. Presumably these are not forever fixed—the people of the future will have a broader range of causal models as humans develop new ways of understanding. Each model has its domain of usefulness; to insist on one model for all phenomena is dogmatic.

Every scientific theory that explains events or changes is committed to a view of what can and what cannot count as causes. Moreover, every such theory provides a class of legitimate causal processes with which to construct scientific explanations of single events and classes of similar events. In so doing a theory directs our inquiry into events and phenomena to be explained. Thus a theory with its model of causation is on this level part of the logic of discovery.

********

We may distinguish "causation as it is in us" from "causation as it is in the world". The discussion so far of models or paradigms of causation has been concerned with causation "as it is in us".

Causation "as it is in the world" answers to different models, on different occasions. That is, the nature of causation "as it is in the world" is an empirical question to which the foregoing models provide partial answers.

This is in line with the general position advocated in this dissertation concerning the truth content of our theories and models of the world, and which I'll now call Non-dogmatic Realism: each theory should normally be presumed true as far as it goes (i.e., this is sometimes subject to explicit qualifications), but it should never be taken to be the whole truth, or to have an unreasonable degree of certainty—never beyond correction.

Thus causation "as it is in the world" is over here like one model, and over there like another. The model is always a simplification of the way the thing is in the world, because the world is very complicated, with causal processes proceeding on many levels all at once.

Yet we are entitled to inquire as to what all these models have in common. Is there a core logic of causation that reflects the common part? Is there a general aspect of the world that we are attending to as we construct these models of causation?

It seems to me that the answer is that all of the models have in common a primitive causal arrow. This arrow expresses what we may as well read as the causal ancestry relation.

It is on the basis of this relation that we distinguish between:

A is a cause of B.    A ──> B

B is a cause of A.    B ──> A

A and B are causal descendents of a common cause C.

It is related to the Biblical "begats". This relation expresses a connection between its relata, a connection with a direction. The resulting formal structure of relationships—a structure of nodes and arrows—is treated by the mathematical theory of directed graphs.1

The corresponding structure of the world, of which the causal ancestry relation is the formal model, seems to be something like Aristotle's "prior in the order of being".

By demonstration I mean a syllogism productive of scientific knowledge, a syllogism, that is, the grasp of which is eo ipso such knowledge. . .

The premises must be the causes of the conclusion, better known than it, and prior to it; its causes, since we possess scientific knowledge of a thing only when we know its cause; prior, in order to be causes. . . .

Now 'prior' and 'better known' are ambiguous terms, for there is a difference between what is prior and better known in the order of being and what is prior and better known to man.²

A cause is prior to its effect in the order of being. The existence or being of the cause somehow supports or enhances the being of the effect. The real-world content of the causal ancestry relation is thus the ontological connectedness of nature, or at least one aspect of that connectedness.

Through the discussion of the regularity view of causation in Chapter 1, we found that causal ancestry cannot be completely analyzed in terms of space, time, and existence, even if we are permitted to use modal notions (such as counterfactual conditionals) in the logic of the reduction. So causation should be added to the list of basic categories along with space, time, and existence.

The Humeans leave the connectedness of nature out of their view of the world.

Notes

1. See, for example, Hall [1], p. 92, for a definition.

2. Aristotle, Posterior Analytics, Aristotle [1], Bk. 1, Ch. 2. Emphasis mine.

APPENDIX A

Theorem: For every real number r ∈ [0,1] there is a binary sequence b_1, b_2, . . . such that the one's relative frequency converges to r.

Proof: Since r is real, it can be realized in a variety of ways as the limit of a sequence of rational approximations. One way to do this is to consider the binary development of r: r = .d_1 d_2 d_3 . . . , the d_i binary digits. This gives

d_1/2, (2d_1 + d_2)/2^2, (2^2 d_1 + 2d_2 + d_3)/2^3, . . . , (2^{k-1} d_1 + 2^{k-2} d_2 + . . . + 2d_{k-1} + d_k)/2^k, . . .

as a converging sequence of approximations to r.

What we want to do is to build up a binary sequence whose one's relative frequency converges to r by adding on segments so that the one's relative frequency achieves each value in this sequence of rational approximations.

Suppose we are at the k-th step of building up the sequence, i.e., that the one's relative frequency so far is

(2^{k-1} d_1 + 2^{k-2} d_2 + . . . + 2d_{k-1} + d_k)/2^k.

Let us suppose that this is realized as the numerator number of one's in the denominator number of terms of the sequence.


How many one's and how many zero's need to be added to get the sequence to display a one's relative frequency equal to the next approximation? Well, we will want 2^{k+1} terms altogether, so we'll have to add on

2^{k+1} - 2^k = 2^k (2 - 1) = 2^k

more terms. We will want to come out with

2^k d_1 + 2^{k-1} d_2 + . . . + 2d_k + d_{k+1}

one's, but we have

2^{k-1} d_1 + 2^{k-2} d_2 + . . . + 2d_{k-1} + d_k

already, so we need to add on

2^{k-1} d_1 + 2^{k-2} d_2 + . . . + 2d_{k-1} + d_k + d_{k+1},

which is what we have already + d_{k+1}.

So we need to add on a new segment which is either identical to the whole sequence so far (if d_{k+1} = 0), or identical to the whole sequence so far except that a single zero has been replaced with a one (if d_{k+1} = 1).

If there were no zero's in the whole sequence so far, and so none available to replace with one's, it would have to be that every previous addition was all one's, including the first. But the first segment was of length 2 with either 0 or 1 one. So the sequence so far always contains at least one zero. Q.E.D.
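The construction in the proof can be carried out mechanically. The sketch below is my own rendering of it (function names are mine; exact rational arithmetic via `Fraction` is used so that the binary digits come out correctly); it builds the sequence through a number of doubling steps and checks that the one's relative frequency approaches r:

```python
from fractions import Fraction

def binary_digits(r, n):
    """First n digits d_1, ..., d_n of the binary development of r in [0, 1)."""
    digits, x = [], Fraction(r)
    for _ in range(n):
        x *= 2
        d = int(x)            # next binary digit, 0 or 1
        digits.append(d)
        x -= d
    return digits

def build_sequence(r, steps):
    """Carry out the proof's construction through `steps` doublings."""
    d = binary_digits(r, steps + 1)
    seq = [1] * d[0] + [0] * (2 - d[0])    # first segment: length 2, d_1 ones
    for k in range(1, steps + 1):
        segment = list(seq)                # copy of the whole sequence so far
        if d[k] == 1:
            segment[segment.index(0)] = 1  # replace a single zero with a one;
                                           # the proof guarantees a zero exists
        seq += segment
    return seq

seq = build_sequence(Fraction(1, 3), 15)   # 2**16 terms
freq = Fraction(sum(seq), len(seq))        # one's relative frequency, near 1/3
```

After k doublings the sequence has 2^{k+1} terms and its one's relative frequency is exactly the (k+1)-th rational approximation, so the error is below 1/2^{k+1}.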

Corollary: There are uncountably many binary sequences whose one's relative frequency converges.

APPENDIX B

A Non-convergent Sequence of Relative Frequencies

Suppose an ideal machine which successively prints zero's and one's on an infinite tape, starting at a definite point. We are interested in the relative frequency (F_n at the n-th step) with which one's appear, and in lim F_n as n → ∞. Suppose what happens is this:

step        what comes out        one's relative frequency so far
1           1 one                 1/1
2           10 zero's             1/11
3           100 one's             101/111
4           1,000 zero's          101/1111
5           10,000 one's          10101/11111
6           100,000 zero's        10101/111111
7           1,000,000 one's       1010101/1111111
. . .
k (even)    10^{k-1} zero's       101. . .01 / 111. . .11
k (odd)     10^{k-1} one's        101. . .01 / 111. . .11
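The running frequencies in the table can be checked without storing the rapidly growing tape, by tracking only the counts. A small sketch of this bookkeeping (variable names mine):

```python
from fractions import Fraction

# Step k prints 10**(k-1) symbols: one's when k is odd, zero's when k is even.
# Track the one's relative frequency at the end of each step.

ones, total, freqs = 0, 0, []
for k in range(1, 13):
    n = 10 ** (k - 1)
    total += n
    if k % 2 == 1:
        ones += n
    freqs.append(Fraction(ones, total))

even = freqs[1::2]   # frequencies after steps 2, 4, 6, ...
odd = freqs[0::2]    # frequencies after steps 1, 3, 5, ...

print(all(f < Fraction(1, 10) for f in even))  # True
print(all(f > Fraction(5, 6) for f in odd))    # True
```

The even-step frequencies stay small and the odd-step frequencies stay large, which is exactly the oscillation the argument below exploits.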

Consider the frequency of one's at the end of the k-th step. Suppose first that k is even. Then the frequency is 101. . .01 / 111. . .11, with the denominator one digit longer than the numerator. But if k is odd, the frequency is 101. . .01 / 111. . .11, with the denominator and the numerator the same length.

And 101. . .01 / 111. . .11 = 1/11 < 1/10 when the denominator is one digit longer, while 101. . .01 / 111. . .11 > 10/11 > 5/6 when the lengths are the same; so the sequence of frequencies has a subsequence with values always less than 1/10 and another that is always larger than 5/6. It follows that the limit does not exist.

APPENDIX C

Strangs, Decastrangs, and Kilostrangs: Proposals for a Vocabulary for Low Likelihood Events

If I toss a fair coin, and get a head, that is not a strang; but if I toss a coin slightly biased for tails, and get a head, that is a strang.

A decastrang is the occurrence of an event equivalent in unlikeliness to the occurrence of ten strangs in a row.

Similarly, a hectostrang is equivalent to a hundred strangs in a row, a kilostrang is equivalent to a thousand strangs in a row, and a megastrang to 10^6 strangs in a row.

The probability of a decastrang is less than

(1/2)^10 = 1/2^10 < 1/10^3 = .001,

i.e., less than one-in-a-thousand; and the probability of a kilostrang is less than

(1/2)^1000 = 1/2^1000 = 1/(2^10)^100 < 1/(10^3)^100 = 1/10^300 = .0000 . . . 01 (299 zero's),

i.e., less than one-in-a-googol-of-googols-of-googols.
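These bounds rest on the single inequality 2^10 = 1024 > 10^3, and can be verified with exact integer arithmetic. A quick check (variable names mine):

```python
from fractions import Fraction

# A decastrang is rarer than (1/2)**10; a kilostrang rarer than (1/2)**1000.
# Since 2**10 = 1024 > 10**3, also 2**1000 = (2**10)**100 > (10**3)**100.
assert 2 ** 10 > 10 ** 3
assert 2 ** 1000 > 10 ** 300

decastrang_bound = Fraction(1, 2) ** 10      # 1/1024, under one-in-a-thousand
kilostrang_bound = Fraction(1, 2) ** 1000    # under 1/10**300, a googol cubed

print(decastrang_bound < Fraction(1, 10 ** 3))    # True
print(kilostrang_bound < Fraction(1, 10 ** 300))  # True
```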

Strangs actually occur and are observed every day. Kilostrangs, on the other hand, almost never occur and are quite certainly never observed by anyone I know. The decastrang may actually be a useful unit: decastrangs are the sorts of things that occur often enough to actually observe, and yet rarely enough to be worth talking about.

BIBLIOGRAPHY

Aristotle [1]

Aristotle. Posterior Analytics. Edited by Richard McKeon. Random House: New York.

Aurelius [1]

Marcus Aurelius. Meditations. Trans., Maxwell Staniforth. Penguin Books, Ltd., Harmondsworth, Middlesex, England.

Black [1]

Max Black, "Induction." The Encyclopedia of Philosophy. Ed., Paul Edwards. Macmillan Publishing Co., Inc., and The Free Press: New York [1967].

Christie [1]

Agatha Christie. A Holiday for Murder. Bantam Books: Toronto [1939].

Conze [1]

Conze, trans. Buddhist Scriptures. Penguin Books: Baltimore, Maryland [1959].

Dumoulin [1]

Heinrich Dumoulin, s.j. A History of Zen Buddhism. Beacon Press: Boston [1963].

Ennis [1]

Robert Ennis. "Enumerative Induction and Best Explanation." Journal of Philosophy. September 1968.

Freedman, Pisani, Purves [1]

Freedman, Pisani, Purves. Statistics. W. W. Norton & Co., New York [1978].

Goodman, Nelson [1]

Nelson Goodman. "The New Riddle of Induction." Readings in the Philosophy of Science. Ed., Baruch A. Brody. Prentice-Hall, Inc.: Englewood Cliffs, New Jersey [1970]. Originally from Fact, Fiction, and Forecast by Nelson Goodman, copyright 1965 by the Bobbs-Merrill Company, Inc.

Goodman, N.D. [1]

Nicholas D. Goodman. "Mathematics as an Objective Science." American Mathematical Monthly 86, pp. 540-55.

Hall [1]

Marshall Hall, Jr. Combinatorial Theory. Blaisdell Publishing Company: Waltham, Massachusetts [1967].

Harman [1]

Gilbert Harman. "The Inference to the Best Explanation." Philosophical Review, LXXIV, 1 (January 1965), pp. 88-95.

Harman [2]

Gilbert Harman. "Enumerative Induction as Inference to the Best Explanation." Journal of Philosophy, September, 1968.

Hume [1]

David Hume. An Inquiry Concerning Human Understanding. Bobbs-Merrill Company: Indianapolis. Originally published in 1748.

Hume [2]

David Hume. A Treatise of Human Nature. Edited by L. A. Selby-Bigge. Oxford [1888] .

Jeffrey [1]

Richard C. Jeffrey. "Valuation and Acceptance of Scientific Hypotheses." Readings in the Philosophy of Science. Ed., Baruch A. Brody. Prentice-Hall, Inc.: Englewood Cliffs, New Jersey.

Laudan [1]

L. L. Laudan. "William Whewell on the Consilience of Inductions." The Monist, 55 [1971].

Kneale [1]

W. and M. Kneale. The Development of Logic. Oxford University Press: Oxford [1968].

Levi [1]

Isaac Levi. Gambling with Truth: An Essay on Induction and the Aims of Science. MIT Press: Cambridge, Massachusetts [1967] .

Mackie [1]

J. L. Mackie. "Causes and Conditions." American Philosophical Quarterly, 2.4 (October 1965), pp. 245-55 and 261-64. Reprinted in Causation and Conditionals. Edited by Ernest Sosa. Oxford University Press [1975].

Mackie [2]

J. L. Mackie. The Cement of the Universe: A Study of Causation. Oxford: Clarendon Press [1974].

Mandelbrot [1]

Benoit Mandelbrot. Fractals: Form, Chance, and Dimension. W. H. Freeman [1977].

Moody [1]

Ernest A. Moody. "William of Ockham." Encyclopedia of Philosophy, pp. 306-317.

Moore [1]

G. E. Moore. "A Defense of Common Sense." Philosophical Papers. Copyright George Allen and Unwin, Ltd: Collier Books, New York [1959].

Norman [1]

Donald A. Norman. Memory and Attention: An Introduction to Human Information Processing, 2nd ed. John Wiley & Sons [1976].

Plato [1]

Plato. "Meno." The Dialogues of Plato. Trans., Benjamin Jowett, 3rd ed. Random House, Inc.: New York.

Popper [1]

Karl Popper. "Science, Conjectures and Refutations." Conjectures and Refutations: The Growth of Scientific Knowledge. Harper & Row: New York [1965].

Popper [2]

Karl Popper. "Truth, Rationality, and the Growth of Scientific Knowledge." Conjectures and Refutations: The Growth of Scientific Knowledge.

Popper [3]

Karl Popper. "The Propensity Interpretation of Probability." British Journal for the Philosophy of Science, Vol. 10, 1959, pp. 25-42.

Reichenbach [1]

Hans Reichenbach. The Theory of Probability: An Inquiry into the Logical and Mathematical Foundations of the Calculus of Probability. English trans., Ernest E. Hutten and Maria Reichenbach. Second Edition. University of California Press: Berkeley [1971].

Russell [1]

Bertrand Russell. "On the Notion of Cause." Mysticism and Logic. Garden City, New York: Doubleday, Anchor Books [1957], pp. 174-201.

Russell [2]

Bertrand Russell. The Problems of Philosophy. Oxford University Press. First published 1912.

Salmon [1]

Wesley C. Salmon. The Foundations of Scientific Inference. University of Pittsburgh Press [1966].

Salmon [2]

Wesley C. Salmon. Statistical Explanation and Statistical Relevance. University of Pittsburgh Press [1971].

Salmon [3]

Wesley C. Salmon. "Theoretical Explanation." Explanation. Ed., S. Körner. Oxford: Basil Blackwell [1975], pp. 118-145.

Schrödinger [1]

Erwin Schrödinger. "The Law of Chance: The Problem of Causation in Modern Science." Science: Theory and Man. Trans., James Murphy. Dover Publications: New York [1975], pp. 39-51.

Science News [1]

Science News, March 31, 1979. Vol. 115, no. 13.

Stove [1]

D. C. Stove. Probability and Hume's Inductive Skepticism. Oxford: Clarendon Press [1973].

Suzuki, B. L. [1]

Beatrice Lane Suzuki. Mahayana Buddhism. Macmillan [1969].

Suzuki, D. T. [1]

D. T. Suzuki. Manual of Zen Buddhism. Grove Press: New York [1960].

Tanimoto and Klinger [1]

Tanimoto and Klinger, eds. Structured Computer Vision: Machine Perception through Hierarchical Computation Structures. Academic Press [1980].

van Fraassen [1]

Bas C. van Fraassen. The Scientific Image. Clarendon Press: Oxford [1980].

Wittgenstein [1]

Ludwig Wittgenstein. On Certainty. Ed., G. E. M. Anscombe and G. H. von Wright. Trans., Denis Paul and G. E. M. Anscombe. J. & J. Harper Editions: New York and Evanston.