
29

Empirical Evidence: Its Nature and Sources

Julian Reiss

INTRODUCTION

With the rise of evidence-based movements in medicine and social policy, the topic of evidence has come to the forefront of […] in the philosophy and methodology of science. But the issue is far from new. […], […], induction and confirmation – all practices very closely related to evidence – have been central concerns of philosophers ever since the birth of Western philosophy. The primary aim of this article is to provide an introduction to and illumination of these topics in so far as they are relevant to the social sciences.

Empirical evidence in the social sciences is extraordinarily varied. It is produced by methods including the collection of physical artefacts in archaeology, conducting censuses in demography, mathematical modelling in economics, thought experimentation in history, expert judgement in political science, laboratory experimentation in psychology and causal modelling in sociology, among many others. Even within one and the same science, evidence can have a variety of sources. To take economic policy as an example, the traditional way to substantiate economic policy claims is to first build a structural model, which is based on economic theory, then to operationalise its terms and test the fit of the model against data. This structural equation modelling approach was developed by members of the Cowles Commission in the 1930s and 1940s and has since been supplemented by a variety of other techniques in econometrics such as the analysis of natural experiments, but also laboratory experiments, simulation and various forms of conceptual or thought experimentation.

This article has two parts. In the first part, I survey philosophical theories of evidence and in so doing attempt to answer questions regarding the nature of evidence and the nature of the inference from evidence to hypotheses. The second part will give a necessarily incomplete overview of the different sources of evidence in the social sciences and ask how to combine their products.

Before diving into the various philosophical proposals for theories of evidence and inference, a few terminological clarifications are in order. First, the notion of scientific evidence has at least three connotations that are relevant here. According to Webster’s New World Dictionary (Second College edition), ‘evidence’ refers to: (1) the condition

55579-Jarvie-Chap29.indd579-Jarvie-Chap29.indd 555151 111/10/20101/10/2010 33:26:05:26:05 PMPM 552 THE SAGE HANDBOOK OF THE PHILOSOPHY OF SOCIAL SCIENCES

of being evident; (2) something that makes another thing evident; indication; sign; and (3) something that tends to prove; ground for belief. Accordingly, scientific evidence means, first, the more or less observable outcomes of scientific tests such as experiments, statistical analyses and surveys. Used in this way, the notion is more or less synonymous with ‘data’ or ‘scientific result’. According to the second entry, scientific evidence means hint, sign, indication of or a reason to believe (the negation of) a scientific hypothesis. According to the third, the word means (something that furnishes) proof of or good or cogent reason to believe (the negation of) a hypothesis. The ambiguity between the latter two meanings is illustrated by a passage from Wesley Salmon (Salmon, 1975: 6):

As Carnap pointed out in Logical Foundations of Probability, the concept of confirmation is radically ambiguous. If we say, for example, that the special theory of relativity has been confirmed by experimental evidence, we may have either of two quite distinct meanings in mind. On the one hand, we may intend to say that the special theory has become an accepted part of scientific knowledge and that it is very nearly certain in light of its supporting evidence. If we admit that scientific hypotheses can have numerical degrees of confirmation, the sentence, on this construal, says that the degree of confirmation of the special theory of relativity on the available evidence is high. On the other hand, the same sentence might be used to make a very different statement. It might be taken to mean that some particular evidence – for example, observations on the lifetimes of mesons – renders the special theory more acceptable or better founded than it was in the absence of this evidence. If numerical degrees of confirmation are again admitted, this latter construal of the sentence amounts to the claim that the special theory has a higher degree of confirmation on the basis of the new evidence than it had on the basis of the previous evidence alone.

In what follows I will use ‘evidence’ almost always in sense 2, indication or sign, the only exception being Peter Achinstein’s theory of evidence, which is a theory of a good or cogent reason to believe. Evidence in sense 1 will play an important role but I will say ‘data’ or ‘test result’ or something similar for it.

Second, as the passage from Salmon illustrates, the notion of evidence is closely related to those of confirmation and induction. Hypotheses are confirmed by evidence, and most theories of evidence that will be discussed below have been introduced as theories of confirmation. Induction refers to the mode of reasoning or inference from evidence to hypothesis. As the hypothesis contains more, or at least different, information than the evidence, this mode of reasoning is ampliative – enlarging what is already known. It contrasts with deductive reasoning, which proceeds in reverse order from a more general claim to a more specific claim. Unlike deduction, which is truth preserving, reasoning from evidence is fallible.

TAKING THE CON OUT OF CONFIRMATION

Some Preliminary Remarks

Philosophers of science often treat theories of evidence and of induction (or confirmation) as if they were of the same kind. But this would be a mistake. Theories of evidence are supposed to answer questions regarding the nature of evidence and the kinds of observations or tests a researcher needs to make in order to have evidence in favour of the hypothesis at stake. Theories of induction, by contrast, begin with an antecedently understood notion of evidence and ask what kinds of inferences one is justified in making, given that one has evidence of the required type at hand.

Bayesianism, for example, is often described as a theory of evidence (as in Achinstein, 2001) but it is completely silent on the issue of the nature of evidence. Rather, it tells us what a rational agent should do in a situation where she comes to believe an evidential statement (viz., to update her

55579-Jarvie-Chap29.indd579-Jarvie-Chap29.indd 555252 111/10/20101/10/2010 33:26:06:26:06 PMPM EMPIRICAL EVIDENCE: ITS NATURE AND SOURCES 553

degree of belief in the hypothesis according to a specific rule). Colin Howson and Peter Urbach describe the matter as follows:

The Bayesian theory we are proposing is a theory of inference from data; we say nothing about whether it is correct to accept the data … The Bayesian theory of support is a theory of how the acceptance as true of some evidential statement affects your belief in some hypothesis. How you came to accept the truth of the evidence, and whether you are correct in accepting it as true, are matters which, from the point of view of the theory, are simply irrelevant. (Howson and Urbach, 1993: 272)

By contrast, Mill’s methods, though often described as a theory of induction (as in Norton, 2003), are in fact at best informative about the types of observations one must make in order to support a (in this case, causal) hypothesis. For instance, the method of difference tells us that evidence in favour of a causal hypothesis can be provided by two situations that are exactly identical except with respect to the phenomenon of interest. The method, by itself, is not informative about the kinds of inferences warranted by the observation of two such situations. It does not say, for example, whether, after having made the observation, we should accept the causal hypothesis as true or raise our degree of confidence in the hypothesis or rather assess the probability of accepting a false hypothesis if the test were run again and again.¹

That inference to the best explanation and Bayesianism are compatible has been noticed before (see for instance Lipton, 2004: Chapter 7; Okasha, 2000). According to the position defended here, this is no accident. The two theories belong to different categories, members of which play complementary roles. Inference to the best explanation is, despite its name, a theory of evidence that tells us what kinds of observations we should make and what kinds of tests we need to run in order to confirm or disconfirm a hypothesis. (As a theory of evidence I will therefore refer to it as ‘explanationism’.) It is silent about the types of inferences to be drawn from the evidence. Bayesianism is a theory of induction that tells us what inferences are warranted after the evidence has come to be believed. This theory is silent about what evidence is. A full account of learning from evidence requires both. Here I will look at theories of evidence first, then at theories of induction and finally at a number of hybrid theories.

Theories of Evidence

In this section I will look at two families of theories of evidence, instance theories and hypothetico-deductive theories. The former regard an instance of a hypothesis as evidence for it, the latter its entailments.

Instance Theories

According to the first family of theories of evidence, a state of affairs provides evidence for a general hypothesis if and only if it is an instance of the hypothesis. Here I will look at theories of evidence regarding two kinds of general hypotheses: simple subject-predicate hypotheses and causal hypotheses.

Simple natural laws

By ‘simple natural laws’ I refer to universally quantified statements that ascribe a property to a kind or substance such as ‘All ravens are black’ (Hempel, 1945) or ‘All samples of the element bismuth melt at 271°C’ (Norton, 2003). According to this first theory, evidence for such a generalisation is constituted by the instances of the generalisation.

Formulated in first-order logic, the account is subject to the famous ‘ravens paradox’. The hypothesis ‘All ravens are black’ is logically equivalent to the hypothesis ‘All non-black things are non-ravens’. Now, an instance of the latter hypothesis is a red shoe, and therefore observing a red shoe provides evidence for it. On the plausible assumption that if a state of affairs is evidence for one hypothesis, then it is evidence for a logically equivalent hypothesis, observing a red shoe provides evidence for the hypothesis ‘All ravens are black’, which is absurd (Hempel, 1945).
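The structure of the ravens paradox can be sketched in a few lines of Python. This is an illustration only, not part of Hempel's argument: representing an object by two truth values, and the function names `all_ravens_are_black`, `all_nonblack_are_nonravens` and `is_positive_instance`, are assumptions made here for the example.

```python
from itertools import product

def all_ravens_are_black(is_raven: bool, is_black: bool) -> bool:
    """True if a single object is compatible with 'All ravens are black'."""
    return (not is_raven) or is_black

def all_nonblack_are_nonravens(is_raven: bool, is_black: bool) -> bool:
    """True if a single object is compatible with the contrapositive."""
    return is_black or (not is_raven)

def is_positive_instance(antecedent: bool, consequent: bool) -> bool:
    """Instance criterion (assumed here): an object is an instance of
    'All A are B' when it is both an A and a B."""
    return antecedent and consequent

# The two hypotheses agree on every possible kind of object,
# so they are logically equivalent.
equivalent = all(
    all_ravens_are_black(r, b) == all_nonblack_are_nonravens(r, b)
    for r, b in product([True, False], repeat=2)
)

# A red shoe: neither a raven nor black.
shoe_is_raven, shoe_is_black = False, False

# It is an instance of 'All non-black things are non-ravens' ...
instance_of_contrapositive = is_positive_instance(
    not shoe_is_black, not shoe_is_raven
)
# ... but not an instance of 'All ravens are black' as originally stated.
instance_of_original = is_positive_instance(shoe_is_raven, shoe_is_black)

print(equivalent, instance_of_contrapositive, instance_of_original)
```

The sketch shows where the tension arises: the instance relation depends on how the hypothesis is worded, whereas logical equivalence does not.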


What is more important in the present context is that few interesting hypotheses in the social sciences have the form of a simple natural law. First, because of the high degree of biological, psychological and social variability, few claims are true of all tokens of a given type. Second, the theory restricts evidence to what is describable by ‘phenomenal’, that is, observable (such as ‘black’) or measurable (such as ‘melts at 271°C’), predicates. To the extent that social science hypotheses have some generality, they tend to stem from explanatory theories, which involve theoretical predicates and require a different kind of evidence (see below).

Causal hypotheses

Despite these limitations, the idea that an instance provides evidence for its generalisation is behind a very influential set of principles for causal inference. There is a fundamental and critical distinction between sequences that are genuinely causal and those that are ‘merely’ accidental. To use a philosopher’s stock example, smoking is said to cause cancer. On the other hand, many people who have nicotine-stained fingers will later develop cancer but having yellow fingers isn’t a cause of cancer. The difference between the two is as important for explanation as it is for planning and policy. John’s smoking may explain his cancer; stained fingers don’t. And if John wants to avoid certain kinds of cancer, to stop smoking would be a good idea; avoiding yellow stains by wearing gloves wouldn’t.

Building on the seminal work of Francis Bacon, John Stuart Mill developed five principles to distinguish between causal and accidental sequences: the methods of agreement, of difference, the joint method (of agreement and difference), of residues and of concomitant variation (Mill, 1843 [1874]: Book III, Chapter 3). By way of example, let us look at the first two in more detail.

The method of agreement seeks two sequences of situations in which the phenomenon of interest occurs such that they differ in every respect but one. Then, the factor that is followed by (or follows) the phenomenon is the cause (or the effect) of the phenomenon. Call f the phenomenon of interest; F a factor and X, x, Y and y vectors of ‘other circumstances’ (where X_i ≠ Y_j and x_i ≠ y_j for all i, j and X_i ∈ X, Y_j ∈ Y, x_i ∈ x, y_j ∈ y), then we can say schematically:

1 Method of agreement

FX is followed by fx
FY is followed by fy
Therefore, F is a cause of f.²

The method of difference, by contrast, seeks two sequences of situations, one in which the phenomenon of interest occurs and one in which it doesn’t, that are exactly identical with respect to every factor save one; that factor is then the cause (or the effect) of the phenomenon. Schematically:

2 Method of difference

FX is followed by fx
X is followed by x
Therefore, F causes f.³

The method of difference is the method of the controlled experiment. Both Mill’s methods and the simpler instance theory constitute accounts of evidence because they tell us what kinds of observations or tests one must make in order to support a hypothesis. It is true that both are associated with a rule of inference according to which the evidence warrants accepting the truth of the hypothesis – an inference rule I call ‘categorical induction’ (for Mill’s case, see Mill, 1843). Norton (2003) therefore classifies the simple instance theory and Mill’s methods under accounts of induction of the type ‘inductive generalisation’.

However, the accounts are only accidentally wedded to this specific inference rule. There is nothing in the principles themselves that prevents using other rules such as probability updating or error correction (for a discussion of these rules of inference, see below).

Hypothetico-deductivism

The feature that unites the second family is that what makes a statement a statement


about evidence is its being entailed by a hypothesis and suitable auxiliary assumptions. The advantages of these theories over instance theories are immediate: any theory, using both predicates that refer to observables and those that refer to unobservables, can in principle be supported by evidence, not just generalisations. But there is also an immediate problem: few evidential statements are entailed by only one hypothesis. The usual case is that there are many mutually incompatible hypotheses, all of which entail the evidential statement. The main question for these theories of evidence is consequently how to discriminate between the different evidence-entailing hypotheses. I discuss two ways here, which I call eliminativism and explanationism, respectively.

Eliminativism

The most straightforward way to discriminate among evidence-entailing hypotheses is to devise tests or series of tests that eliminate all but one of the alternatives. This idea too goes back to Francis Bacon (see for instance Klein, 2003). The evidence relevant to a hypothesis is therefore constituted by the testable implications of the hypothesis at stake as well as those of its alternatives.

A large randomised controlled trial can serve as the […] because it eliminates many alternative hypotheses in one fell swoop. A hypothesis about the effectiveness of a new training programme, say, can be tested by dividing subjects at random into a treatment group (which receives the new training) and a control group (which receives the standard training) such that the distribution of other factors influencing performance is identical in the two groups. Then, if performance is on average higher in the treatment group than in the control group, it must be due to the new programme.

But it is not necessary that alternative hypotheses be eliminated by one and the same test such as a randomised trial. Michael Scriven proposes the following form of inference in the context of causal analysis in history (Scriven, 1966: 249–50; emphasis original):

For in order to establish a causal claim on behalf of a factor what does the historian need? Merely evidence that his candidate was present, that it has on other occasions clearly demonstrated its capacity to produce an effect of the sort here under study (or there might be theoretical grounds for thinking it a possible cause rather than previous direct […] of its actual efficacy), and the absence of evidence (despite a thorough search) (a) that its modus operandi was inoperative here, and/or (b) that any of the other possible causes was present. If the event studied had a cause at all (which the historian assumes it did), then he may confidently assert that the residual condition is elected. This argument proves his claim – and it requires nothing the historian does not possess. The only general […] that might be involved would be a list of the known possible causes of the kind of effect in question. Explanation proceeds by the elimination of possible causes …

The principal worry about elimination of alternative hypotheses in the social sciences is that it is frequently the case that methods such as the randomised trial are not applicable or applicable only in an attenuated form, and that there are too many possible alternatives, not all of which can be ruled out. In medical research, a randomised trial is a powerful blinding device: neither patient nor doctor knows which of a number of treatments (including a placebo) is administered. Blinding in this sense is often not an option when a treatment is a training programme or some other social policy (Scriven, 2008).

Explanationism

If several theories or hypotheses entail the evidential statement, then that which best explains the evidence is confirmed according to this theory. Additional evidence is therefore provided by a fact about the theory: its relative degree of explanatoriness or explanatory quality.

One might argue that calling such a fact ‘evidence’ would be misleading. This is correct, but only on the first reading of the term ‘evidence’ as ‘data’ or ‘test result’. On the second reading of ‘evidence’ as ‘hint’, ‘sign’ or ‘indication’ (or even on the third reading


as ‘proof’), there is nothing unusual about calling facts about a theoretical hypothesis evidence. At any rate, these facts are taken as evidence by proponents of explanationism.

A connection between evidence and explanation was already present in Hempel’s (1965) account because, according to his deductive-nomological theory of explanation, if a hypothesis is true, fulfils some additional criteria and, together with other statements, entails the evidential statement, it explains the evidential statement. Turned around, one can say that according to this account, a statement is evidence for a hypothesis if and only if the hypothesis, if true, explains it. Explanationism adds a criterion to discriminate between competing potential theoretical accounts. In Gilbert Harman’s words (Harman, 1965: 89):

In making this inference one infers, from the fact that a certain hypothesis would explain the evidence, to the truth of that hypothesis. In general, there will be several hypotheses which might explain the evidence, so one must be able to reject all such alternative hypotheses before one is warranted in making the inference. Thus one infers, from the premise that a given hypothesis would provide a “better” explanation for the evidence than would any other hypothesis, to the conclusion that the given hypothesis is true.

This raises the question how to determine which of a given set of alternative hypotheses is ‘best’. Many suggestions have been made: the simplest, the most unifying, the most detailed, that which confers most understanding on its user. This mode of reasoning is fairly common in theory-driven branches of the social sciences such as in parts of economics and sociology. In economics, for instance, a model is accepted as explanatory – or as more explanatory than an alternative – if it portrays a world that is credible (Sugden, 2000) or if it makes assumptions about structural features that can be found in a great range of economic phenomena; in other words, if the model is unifying (see Reiss, 2008: Chapter 6).

Explanationism is subject to an important objection, sometimes called ‘Hungerford’s objection’ (Lipton, 2004: Chapters 4 and 9), according to which the various explanatory virtues such as credibility, simplicity, unifying power or mechanism are too subjective and varied to provide an acceptable ground for inductive reasoning.

An important role evidence plays in scientific and other controversies is that of an objective arbiter. If, say, one political party holds that minimum wages are an effective tool to provide a living wage for everyone and another that minimum wages are counterproductive because they destroy jobs, then such disagreements about apparently purely factual matters (‘is X an effective strategy to promote Y?’ when it is agreed that Y is a desirable state of affairs) should, in principle, be solvable on the basis of evidence. Inference to the best explanation reintroduces subjectivity through the back door because (a) there is no one generally accepted explanatory virtue; (b) there is no generally accepted procedure that ranks or weighs the different explanatory virtues; (c) there are no objective criteria that determine whether and to what extent any given virtue applies to a given case – what’s simpler (or more unifying or …) to one person isn’t to another (on this latter point, see Norton 2003).

As can be seen, all theories of evidence that have been discussed so far have limited applicability. They work, to the extent that they do, only for a specific type of hypothesis and under favourable epistemic conditions. According to a position one may call evidential contextualism, this is to be expected (see for instance Reiss, 2008: Chapter 1; cf. Kincaid, 2007). Scientists’ epistemic interests and their domains of investigation are too heterogeneous to subsume all kinds of clues that may indicate a hypothesis under one universal scheme or even a small finite set of schemes of limited generality.

According to evidential contextualism, it is context-specific background knowledge that informs scientists what kinds of techniques work in what domains and under what


conditions. At the general level no more can be said than that evidence is an observation or test result about which background knowledge entails that it is relevant to the assessment of a hypothesis of the given type and for the purpose at hand. Numerous examples of the kinds of considerations that lead a (social) scientist to accept an observation or test result as evidence for different kinds of hypotheses and in the light of different kinds of purposes can be found in the third section of this article.

Theories of Induction

Once we have evidence, what can we infer from it? This is the question addressed by theories of induction. In this section I will discuss three types of theory: categorical induction, two probabilistic theories and Norton’s ‘material theory of induction’.

Categorical Induction

What I call ‘categorical induction’ is the rule to infer the truth of the hypothesis from the evidence. Conjoined with the instance theory of evidence, we get enumerative induction or the ‘more of the same’ rule of inference. Conjoined with the first form of hypothetico-deductivism, we get eliminative induction and conjoined with its second form, inference to the best explanation or abduction.

Enumerative induction

This rule prescribes inferring from a finite set of observed instances the corresponding generalisation. For example,

Raven 1 is black
Raven 2 is black
…
Raven n is black
Therefore, all ravens are black.

One problem is that the account is vague: it does not say how many instances must be observed in order to warrant inferring the truth of the generalisation. Another problem is that the rule itself is underspecified unless supplemented with an account of natural kinds that provides limits for the types of predicate that are admissible for being quantified over. Nelson Goodman’s famous ‘grue paradox’ illustrates the problem.

Goodman’s paradox, in short, is as follows. The evidential statement ‘All emeralds that have been observed so far have been green’ confirms the generalisation ‘All emeralds are green’. Now, we can define a new predicate, ‘grue’, as ‘green if examined before t or blue otherwise’. Thus, before t, we have exactly parallel evidence that all emeralds are green and that all emeralds are grue. But obviously only one of the two hypotheses can be true. The question, then, is which one we are warranted to infer. Some (e.g. Quine, 1969) have argued that ‘green emerald’ represents a natural kind while ‘grue emerald’ doesn’t – and therefore that we are warranted to infer the former hypothesis but not the latter.

And it is not necessary to introduce artificial predicates such as ‘grue’ to make that point. A more scientific example due to John Norton can be used similarly. Consider the following inferences (Norton, 2003: 649):

Some samples of the element bismuth melt at 271°C.
Therefore, all samples of the element bismuth melt at 271°C.

Some samples of wax melt at 91°C.
Therefore, all samples of wax melt at 91°C.

The obvious difference between the two arguments is that ‘bismuth’ refers to a chemical element – a type of natural kind – whereas ‘wax’ names a variety of different mixtures of substances. Using simple enumerative induction as an inference rule is successful in the former case because of a known fact about chemical elements: elements are generally uniform in their physical properties. No such fact is true of the different mixtures of hydrocarbons that are jointly referred to as ‘wax’.

Of course, restricting the inference rule to hypotheses regarding natural kinds is epistemically not helpful. Natural kinds are


just those kinds certain properties of which are uniform across all instances. Knowing that already presupposes that the inferential problem has been solved.

Eliminative induction

Alexander Bird calls this rule inference to the only explanation (in contrast to inference to the best explanation, see below) and describes it thus (Bird, 2007: 242):

By Inference to the Only Explanation (IEO) I intend something quite specific, that at the end of […] we can be in the position to infer the truth of some hypothesis since it is the only possible hypothesis left unrefuted by the evidence. It is the form of inference advocated by Sherlock Holmes in his famous dictum ‘Eliminate the impossible, and whatever remains, however improbable, must be the truth.’

There are three main worries that beset this form of induction. First and foremost is the idea that theoretical hypotheses are always underdetermined by the evidence available at a certain point in time. This is the Duhem–Quine thesis. There are good reasons to believe that the thesis in its most general form – any theoretical hypothesis is always underdetermined by all available evidence – is false (see for instance Norton, 2008). In many actual scientific cases, all plausible hypotheses but one could be eliminated. The just mentioned article by Bird reconstructs Semmelweis’ discovery of the cause of puerperal fever along these lines. But in the social sciences we often face situations in which the available evidence de facto underdetermines the choice of theoretical hypotheses. The list of possible and even plausible alternatives can be very long indeed, potentially open ended. Thus, the ‘problem of confounders’ (Steel, 2004) is a real obstacle to social-science research.

There are two further problems but they are less specific to social science. One is the question whether the list of plausible alternatives contains the true hypothesis. Eliminative categorical induction is only guaranteed to result in the true theory if it does. But, as Bas van Fraassen reminds us, the rule ‘infer the truth of the hypothesis that is the only one to be consistent with the evidence’ (1985: 143), ‘… is a rule that selects the best [supported] among the historically given hypotheses. We can watch no contest of the theories we have so painfully struggled to formulate, with those no one has proposed. So our selection may well be the best of a bad lot’. The final criticism concerns whether the evidence requires an explanatory hypothesis at all. Perhaps some facts are just brute.

Inference to the best explanation

The proponent of inference to the best explanation or abductive reasoning infers the truth of a hypothesis from two considerations: (a) the hypothesis explains the evidence; (b) among the evidence-explaining alternative hypotheses it is the one that scores best on some scale of explanatory merit. For Peter Lipton, for example, the ‘loveliness’ of an explanation is the relevant criterion. Hence, the loveliest explanatory hypothesis is inferred to be true, according to this rule.

This inductive schema is subject to what has been called ‘Voltaire’s objection’ (Lipton, 2004: Chapters 4 and 9). The objection denies the connection between goodness and truth. It asks, why should the world be simple or intelligible or lovely? In the context of the social sciences the link between explanatory ‘loveliness’ and truth often seems particularly tenuous. In economics, for example, a ‘lovely’ explanatory model portrays agents as perfectly rational, uses equilibrium […] to solve equations and is mathematical in nature. There is little plausibility in the idea that considerations such as these should be a reliable guide to truth (see Reiss, 2008: Chapter 6).

Probabilistic Theories

Probabilistic theories ascribe a probability to the hypothesis and understand evidential support in terms of probabilistic relations.

Bayesianism

Standard Bayesianism combines an interpretation of probability as subjective degrees of


belief with Bayes’ rule – itself a theorem of probability theory – and an interpretation of belief-updating as confirmation to yield a schema for making inductive inferences.

Aside: Five interpretations of probability

The five major interpretations of probability are: classical, logical, subjective, frequency and propensity. The classical theory holds that probability is the ratio of the favourable cases to the total number of equally possible cases. For instance, the probability of the event ‘rolling a number larger than three with a fair die’ is (1 + 1 + 1)/6 = 1/2. According to the logical interpretation, too, all possible states of affairs are assigned probabilities, but it relaxes the requirement of equal weights. A probability measure assigns numbers to so-called state descriptions, which describe all individuals in a in maximum detail.4 According to subjectivists such as Bayesians, probabilities are constraints a rational agent lays upon the degrees to which he holds a belief. Probability expresses the confidence with which an agent holds a belief. Frequentism identifies probability with the frequency of favourable outcomes in a (finite or infinite) reference class (e.g. the frequency of ‘heads’ in a finite series of tosses of a coin or a hypothetical infinite series). Finally, according to propensity theorists, probability is a physical disposition of a chance set-up to generate outcomes (such as the tendency of an atom to decay within a certain amount of time).

Bayes’ theorem has numerous forms (see Howson and Urbach, 1993: Chapter 2); for an arbitrary hypothesis h, its negation ~h and evidential statement e, it can be expressed as follows:

$$P(h \mid e) = \frac{P(h)}{P(h) + \dfrac{P(e \mid {\sim}h)}{P(e \mid h)}\,P({\sim}h)} \tag{29.1}$$

where the expression P(e |~h)/P(e | h) is called the ‘likelihood ratio’. Since P(~h) = 1 – P(h), P(h | e), the posterior probability of the hypothesis given the evidence, is a function of its prior probability P(h) and the likelihood ratio.

Together with the interpretation of probability as subjective degree of confidence or belief in a hypothesis and the idea that updating entails confirmation, Bayes’ theorem entails an inference rule: upon coming to believe the evidential statement, update your degree of belief in the hypothesis in accordance with (29.1). Evidence in favour of (against) a hypothesis increases (decreases) the degree of belief in the hypothesis.

To illustrate how Bayes’ theorem works as an inference rule, consider a medical example. Suppose that a diagnostic test is 99 per cent accurate; that is, it gives a correct test result in 99 per cent of cases, both positive (when the disease is present) and negative (when the disease is absent). If the test gives a positive result, what is the probability that the person actually has the disease? Let h = ‘person has disease’ and e = ‘test result is positive’. The quantity we would like to infer is the posterior probability P(h | e), the probability that the person has the disease given that the test result is positive. The likelihood ratio is P(e |~h)/P(e | h) = 1%/99%. Let h’s prior probability, in cases such as this called the base-rate, be 1/10,000 (i.e. one person in ten thousand has the disease), so that P(~h) = .9999. Then, by (29.1):

$$P(h \mid e) = \frac{.0001}{.0001 + (.01/.99)(.9999)} \approx .0098$$

Because of the low base-rate, the probability that the person taking the test has the disease, despite the positive test result, is below 1 per cent. Nevertheless, a positive test result provides evidence for the hypothesis because P(h | e) ≈ 1% > .01% = P(h).

Several criticisms have been levied against the Bayesian inductive rule, two of which I want to discuss here. According to the Bayesian, a hypothesis’ probability assessment after coming to believe the evidence depends, as we have seen, on two factors: the hypothesis’ prior probability and the likelihood ratio.
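The arithmetic of the diagnostic example can be checked with a short Python sketch of (29.1). The function name `posterior` and its parameter names are illustrative assumptions, not part of the original text; the numbers are those of the example (99 per cent accuracy, base-rate of 1 in 10,000, hence P(~h) = .9999).

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(h | e) from (29.1), written via the likelihood ratio.

    prior            -- P(h), the base-rate of the hypothesis
    p_e_given_h      -- P(e | h), probability of the evidence if h is true
    p_e_given_not_h  -- P(e | ~h), probability of the evidence if h is false
    """
    likelihood_ratio = p_e_given_not_h / p_e_given_h
    # P(~h) = 1 - P(h)
    return prior / (prior + likelihood_ratio * (1.0 - prior))


base_rate = 1 / 10_000  # one person in ten thousand has the disease
p = posterior(base_rate, p_e_given_h=0.99, p_e_given_not_h=0.01)

print(f"prior     P(h)     = {base_rate:.4%}")
print(f"posterior P(h | e) = {p:.4%}")
print(f"confirmation factor P(h | e) / P(h) = {p / base_rate:.1f}")
```

On these assumptions the posterior works out to 1/102, just under 1 per cent, while still being roughly a hundred times the prior: the positive result confirms the hypothesis without making it probable.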
The first criticism finds fault with

55579-Jarvie-Chap29.indd579-Jarvie-Chap29.indd 555959 111/10/20101/10/2010 33:26:06:26:06 PMPM 560 THE SAGE HANDBOOK OF THE PHILOSOPHY OF SOCIAL SCIENCES

prior probabilities; the second is suspicious that evidence enters the inference only via the likelihood.

In ordinary probability theory, probabilities are defined over events or outcomes or a sample space. Making probability statements entails that things could be different than the way they actually are. For instance, saying that the probability of rain today is 90 per cent means that it might either rain or not (and that it should rain on 90 per cent of days like today). Scientific hypotheses, by contrast, are either true or false (or, if one prefers, empirically adequate or not or reliable or not); they are not ‘probable’ in the sense that rain is. If there were probabilities, this would imply that there are many worlds, in which case the probability of a hypothesis could measure the frequency of worlds in which the hypothesis is true. But there is only one world – and thus, there are no priors (Mayo, 1996: Chapter 3).

The other criticism can be illustrated by the debate regarding stopping rules. A stopping rule specifies when to stop collecting new data. To test whether a coin is fair, for instance, one might toss it 20 times, record the number of heads and tails (say, 8 and 12) and then assess whether the specific outcome was more likely if the coin was fair or if it was unfair. Alternatively, one might continue tossing the coin until one has recorded eight heads or a ratio of two heads to three tails. Intuitively, it should matter a great deal how the test is set up. For instance, it should matter whether a specific outcome was likely or not given how the procedure was designed. Bayesianism entails that these considerations should play no role in the assessment of the fairness of the coin. Another way of putting it is that Bayesianism is only sensitive to the actual outcome (the actual series of 8 heads and 12 tails), not also to the outcomes the test could have produced but did not.

Likelihoodism

The likelihood view is essentially Bayesianism without the priors. An attractive feature of it is that it regards evidence always as relevant to a hypothesis relative to an alternative. The law of likelihood states that evidence e supports hypothesis h1 over hypothesis h2 if and only if P(e | h1) > P(e | h2) (see Hacking, 1965).

As in Bayesianism, a positive test result is evidence for the hypothesis that a person has the disease, even if the posterior probability might be low. Using the same numbers as above and defining h1 as the hypothesis ‘John has the disease’ and h2 as its negation, the evidence ‘positive test result’ supports h1 as:

$$P(e \mid h_1) = .99 \gg .01 = P(e \mid h_2).$$

Likelihoodism is an account of evidence that addresses the question, ‘Which of two hypotheses is better supported by the evidence?’ That this can lead to counterintuitive results is illustrated with an example due to Ian Hacking (Hacking, 1972: 136): ‘We capture enemy tanks at random and note the serial numbers on their engines. We know the serial numbers start at 0001. We capture a tank number 2176. How many tanks did the enemy make? On the likelihood analysis, the best supported guess is: 2176.’ That is, after capturing that one tank with the number 2176, the hypothesis that the number of tanks the enemy made is just that number is better supported than any other hypothesis. As likelihoods are the only way evidence enters reasoning, this account is subject to the criticism regarding stopping rules in the same way as Bayesianism is.

Naturalism asserts that the best place to look for insights regarding inductive rules is science itself. It is suspicious of substantive philosophical claims of great generality that are made independently of the details of specific scientific practises. Here I will look at two examples: Norton’s ‘material’ theory of induction and the error correction perspective.

Norton’s ‘Material’ Theory of Induction

All the above-mentioned families of theories of induction purport to have universal range,


that is, they are thought to apply to every domain of inquiry, independently of the more specific facts true within the different domains. In a recent paper John Norton has challenged the feasibility of the general project behind these ‘formal’ theories of induction, as he calls them (Norton, 2003). Unlike deductive schemata of inference such as modus ponens or universal instantiation, which do enjoy universal validity, there is no inductive schema that has not been subject to criticism and counterexample. All examples of inductive rules discussed above are cases in point.

Hence, according to Norton, what licenses the inference is not the form of the inductive schema – as that is the same in instances where it works and where it doesn’t – but rather particular material facts true of the situation in which the inference is made. Norton shows that all formal inductive schemata work where they do because of such material facts. Thus, inductive inferences derive (Norton, 2003: 648):

their license from facts. These facts are the material of the inductions; hence it is a ‘material theory of induction’. Particular facts in each domain license the inductive inferences admissible in that domain – hence the slogan: ‘All induction is local.’ My purpose is not to advocate any particular system of inductive inference. Indeed I will suggest that the competition between the well established system is futile. Each can be used along with their attendant maxims on the best use of evidence, as long as we restrict their use to domains in which they are licensed by prevailing facts.

Material facts, according to this theory, license not only that inductive inferences are made from evidence but also what specific types of inferences can be made.

Error correction

Inferences from evidence to a hypothesis are subject to a variety of errors able to invalidate conclusions. As an example that is highly relevant in the context of the social sciences, consider a correlation between two variables X and Y as evidence for a causal hypothesis. That correlation does not entail causation is well known. If a correlation is nevertheless taken as evidence for a causal claim – and it certainly provides a clue that two variables may be causally related – other sources of correlation must be controlled for if the hypothesis is to be inferred reliably. If the hypothesis of interest is ‘X causes Y’, one first wants to rule out reverse causation from Y to X as well as common factors that influence both variables. In addition there are numerous non-causal sources of correlation: sampling error, measurement error, non-stationarity and other statistical properties of the variables, mixing, variables that are conditioned on common effects and so on. A hypothesis can be inferred reliably to the extent that these sources of error have been controlled successfully (Reiss, 2008: Chapter 1; cf. Schickore, 2005; Hon, 1989, 1995).

There is no general account of error that is independent of the type of hypothesis, specific domains of science and the purposes to which the hypothesis is put. This is the main difference between this account and eliminative induction. There is no requirement that either the hypothesis of interest or a potential inferential error explain the evidence. A hypothesis about the future value of a variable, for instance, does not explain whatever evidence one might have in its favour, nor does a descriptive hypothesis about, say, the inflation rate, explain why an index number has such and such a value (see sections on Index Numbers and Expert Political Judgement, respectively). The source of information about potential errors in inference lies, rather, in context-specific background knowledge about the domain of investigation.

There are two main problems with naturalism. First, it is hardly a theory of induction. Theories in the philosophy of science do not only aim to show what is the common logic behind a scientific practise such as scientific explanation, inference, measurement or experimentation but also to explain the rationale behind these practises. Norton’s account does neither. He denies that there is a universal logic of induction. The second


naturalistic account appeals to a vague ‘logic of controlling for known errors’ but neither account provides a justifying rationale for why a given inferential practise can be expected to work or why it is rational to draw inferences in the way described. These accounts therefore lack explanatory power. If one asks, say, why a certain methodology such as that of the randomised trial is as successful as it is and one hears that this is due to specific facts about the ‘physical probabilities of the randomizer’ (Norton, 2003: 655), one hasn’t been answered.

Second, the different formal theories differ dramatically both in their informational requirements or inputs and in their outputs. Bayesianism, say, requires a prior probability and likelihoods and yields a posterior probability. By contrast, the error-statistical approach denies that prior probabilities attach to hypotheses. Naturalism has no resources to inform us about where and when either inference rule can and should be used. This is important as the different rules yield different results even in cases where the material facts are undisputed (Steel, 2005a). An appeal to scientific practise isn’t informative when scientists themselves are divided about what inferences are licensed by a situation.

Hybrid Theories

Hybrid theories provide resources that not only allow us to classify a piece of information such as an observation or test result as evidence, they also contain rules of inference. These accounts do not simply conjoin a theory of evidence with an inference rule. Rather, the two aspects are parts of an integrated whole. In this section I discuss two such theories, the error-statistical account and Achinstein’s theory.

Error Statistics

The error statistical account develops classical statistical testing of Neyman and Pearson into a full-fledged philosophical theory of evidence and induction. As a theory of evidence, it regards an observation as evidence for a hypothesis to the extent that it has been produced by a test procedure that would have made it very unlikely that the observation would have been produced if the hypothesis were false. More precisely (Mayo, 2000: S198; emphasis original): ‘Data e produced by procedure T provides good evidence for hypothesis H to the extent that test T severely passes H with e’.5 Hypothesis H passes a severe test with e if (i) e fits H (for a suitable notion of fit or ‘distance’) and (ii) the test procedure T has a very low probability of producing a result that fits H as well as (or better than) e does, if H were false or incorrect.

Thus, the approach requires that hypotheses be subject to a test that is as stringent as a randomised controlled trial (for a detailed discussion, see section on Evidence-Based Policy). A randomised trial eliminates all sources of error in one fell swoop. Hence, it will ‘pass’ the hypothesis if it is false only for statistical reasons, because of sampling error. This error can, however, be controlled by the procedure’s ‘error-probabilities’.

There are two error probabilities. Type-I errors consist in rejecting the null hypothesis (usually the hypothesis that there is no treatment effect) when it is in fact true. The probability of this error is controlled by choosing the significance level of the test. Type-II errors consist in not rejecting the null when it is in fact false. The probability of this error is controlled by designing the test such that it has high power, which is one minus the probability of a type-II error. Significance level and power aren’t entirely independent, however. If sample and effect size are given, fixing the significance level determines power and vice versa. Thus, at a chosen level of significance (and assuming that the effect size cannot be manipulated), the power of the test can only be increased by increasing the sample size.

Error statisticians Deborah Mayo and Aris Spanos add a third error probability to these, that of ‘attained power’ or ‘severity’ (Mayo and Spanos, 2006).
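The dependence between significance level, power and sample size can be made concrete with a minimal Python sketch. It assumes a deliberately simple set-up that is not taken from the text: a one-sided z-test of the null ‘no treatment effect’ with known standard deviation. The function name `power_one_sided_z` and the numbers (effect 0.5, standard deviation 1) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def power_one_sided_z(effect, sigma, n, alpha):
    """Power of a one-sided z-test of H0: effect = 0 (illustrative set-up).

    effect -- assumed true treatment effect under the alternative
    sigma  -- known standard deviation of a single observation
    n      -- sample size
    alpha  -- significance level, i.e. the type-I error probability
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # rejection threshold under H0
    shift = effect * n ** 0.5 / sigma          # mean of the test statistic under H1
    type_two_error = std_normal.cdf(z_crit - shift)
    return 1.0 - type_two_error                # power = 1 - P(type-II error)


# Fixed significance level and effect size: power rises only with sample size.
for n in (10, 25, 50, 100):
    print(f"alpha=0.05, n={n:>3}: power = {power_one_sided_z(0.5, 1.0, n, 0.05):.3f}")

# Fixed sample and effect size: a stricter significance level lowers power.
for alpha in (0.10, 0.05, 0.01):
    print(f"n=25, alpha={alpha:.2f}: power = {power_one_sided_z(0.5, 1.0, 25, alpha):.3f}")
```

In this set-up, once the sample size and effect size are fixed, choosing the significance level fixes the power, and at a given significance level the only remaining lever is the sample size.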
It is a post-data measure


and therefore sensitive to the sample realisation (whereas significance and power are pre-data and independent of the realisation). To measure severity, one must define a value for discrepancy from the null one deems as substantially significant (for instance, a treatment effect size). Severity is then defined as one minus the probability that a test result like the one obtained occurs if the discrepancy is in fact larger.

What we infer from a test about the hypothesis therefore depends on what we deem as scientifically significant. Suppose that a new training programme is under scrutiny. The null hypothesis says that it is ineffective (relative to the current best programme, say). Testing the proposition by statistical means does not by itself allow us to draw an inference. If the test yields an insignificant result (let us say), we cannot simply infer that there is no treatment effect. In addition we have to specify the type of inference we wish to draw in terms of a distance from the null. Then, once the data is in, we can calculate the probability of the test producing data like these if the distance is in fact greater, that is, if there is in fact a (greater) treatment effect.

Intuitively, this makes a lot of sense. If a certain result is produced that is statistically insignificant, it is one thing to conclude ‘there is (probably) no effect’ and quite another ‘there is (probably) no large effect’. Accordingly, the severity with which the test passes these two hypotheses differs. The same is of course true when the result is significant. Then, given the result, the larger the discrepancy (i.e. the treatment effect) one wishes to infer from this result, the lower the severity.

Finally, the inference we draw is about the frequency of achieving a certain test result were the test repeated many times, just as in classical statistics. Thus, we do not infer a hypothesis (as in categorical induction) or a degree of belief in a hypothesis (as in Bayesianism) but rather a claim about a probability of a certain test result. That is, the probability claims of the error-statistical approach attach to test procedures, not to the hypothesis of interest. The confirmation of hypotheses is therefore of a purely qualitative nature.

Construed as a theory of evidence, the error-statistical approach is exceedingly narrow as it can only be used to test statistical hypotheses. To be sure, statistical inference is an important part of research in all areas awash in data. But it is just one part. To infer a causal claim from an experiment we do want to know whether the observed correlation is real or due to chance. But there are many other sources of error that we want to control – measurement error, confounding, non-causal sources of correlations and what have you. The error-statistical approach has no answer to these.6

Achinstein’s Theory

Peter Achinstein combines a form of explanationism as a theory of evidence with a form of Bayesianism as an inductive rule. He first distinguishes various concepts of evidence: subjective, ES-, potential and veridical (Achinstein, 2001: Chapter 2). Essentially, an agent has subjective evidence e for h if she believes e and takes it as a reason to believe h. ES refers to an agent’s ‘epistemic situation’. In a specific epistemic situation C, e is ES-evidence that h if, in C, e is a good reason to believe h. e is potential evidence that h if it is a good reason to believe h simpliciter and e is veridical evidence that h if e is potential evidence that h and h is true.

Achinstein thinks that veridical and potential evidence are the concepts most relevant for scientific practise, and most of his work is dedicated to explicating those concepts. He defines (Achinstein, 2001: 170):

PE. e is potential evidence that h, given b, [if and7] only if

1 P(there is an explanatory connection between h and e | e&b) > 1/2
2 e and b are true
3 e does not entail h

where b signifies background knowledge. By ‘explanatory connection’ Achinstein means


that it is either the case that the hypothesis explains the evidence, that the evidence explains the hypothesis or that a common factor explains both. This condition is supposed to rule out classical counterexamples to ‘high probability’ accounts of evidence and confirmation. Achinstein’s example (p. 149):

h: Michael Jordan will not become pregnant.
e: Michael Jordan eats Wheaties.
b: Michael Jordan is a male basketball star.

On a simple high probability account, e is evidence that h because the probability that h given e is high. Achinstein assumes that in this case e and h are not explanatorily connected: neither does his eating Wheaties explain that he won’t become pregnant nor vice versa, nor is there a common factor that explains both.8 Eating Wheaties is explanatorily irrelevant for not becoming pregnant.

Achinstein uses the general requirement that h and e must be explanatorily connected rather than that, say, h explain e, because causes can provide evidence for effects, effects for causes and joint effects of common causes for each other. Thus, taking a potent medicine can be evidence for relief, relief for taking medicine and the drop in the barometer reading can be evidence for the storm.

To require that there be an explanatory connection between evidence and hypothesis is plausible but too strong. If evidence is to be a mark or a symptom of the truth of a hypothesis, it is enough that there be a correlation between the states of affairs hypothesis and evidential statement express; there need not be a causal or explanatory relation. A widely discussed case involving a ‘spurious correlation’ demonstrates this. Elliott Sober describes his case thus (Sober, 1987[1994]: 161–162 [quoted from Sober, 2001: 332]):

Consider the fact that the sea level in Venice and the cost of bread in Britain have both been on the rise in the past two centuries. Both, let us suppose, have monotonically increased. Imagine that we put this data in the form of a chronological list; for each date, we list the Venetian sea level and the going price of British bread. Because both quantities have increased steadily with time, it is true that higher than average sea levels tend to be associated with higher than average bread prices. The two quantities are very strongly positively correlated.

Now, if there is a strong correlation between the two quantities, we can use, say, the fact that one quantity is very high as evidence for the hypothesis that the other is high as well. But there isn’t any explanatory connection between the two. High bread prices do not explain high sea levels, nor do high sea levels explain high bread prices, nor is there a common factor that explains both. Time can be used to predict whether the quantity is high or low but it does not explain why this is so. Or, to take a case due to Jossi Berkovitz (as described by Dan Steel, 2005b: 19): ‘… imagine two slot machines constructed entirely independently of one another but which, coincidentally enough, have precisely the same initial conditions and internal mechanics’. Here we are facing a brute correlation (or a set of such correlations) that once more has no explanation. And again, we can use the state of one machine to predict – provide evidence for – the state of the other.

That correlations rather than explanations are required as the evidentiary relationship is also shown by the reverse case, where two quantities have an explanatory connection without a correlation being induced. A philosophers’ stock example regarding the connection between correlation and causation concerns the relation between birth control pills and thrombosis (originally due to Hesslow, 1976). Birth control pills cause thrombosis via one route but they also prevent thrombosis by preventing pregnancies, as pregnancies are themselves a cause of thrombosis. Depending on the actual frequencies, the probability raising and lowering channels might just cancel each other out so that the probability of developing thrombosis is the same whether or not a given woman takes the pill. Knowing that, we should not take facts about oral contraceptives as evidence for hypotheses about the likelihood of contracting thrombosis. Nevertheless, there is an explanatory connection. For instance,


the reason for a given occurrence of thrombosis might lie in the woman having taken the pill.

The seemingly arbitrary requirement that the probability of there being an explanatory connection be greater than .5 stems from Achinstein’s absolutist concept of evidence. A good reason to believe h cannot also be a good reason to believe not-h. Hence the probability given the evidence and background knowledge must exceed .5. Further, Achinstein includes the third condition because he does not want evidence and hypothesis to be too ‘close’ to each other. The drop in the barometer reading is not evidence for the change of the barometer reading.

As inferential rule, Achinstein uses Bayesian updating, which makes him a Bayesian of sorts. The main difference between his account and standard Bayesianism is his ‘objective epistemic’ interpretation of probability (Achinstein, 2001: Chapter 5). Standard Bayesianism interprets probability as degree of belief. Apart from adhering to the axioms of probability theory, there are no constraints on what a subject ought to believe.9 Achinstein defines probability as ‘degree of reasonableness of belief’. It is therefore not a measure of how strongly a person believes in a proposition but rather one of the quality of the reasons for holding a belief. Further, it is not subjective in the sense of being relative to a particular agent. The degree of reasonableness of a certain drug producing relief may be .8 even though no single agent holds this belief.

Apart from this difference in interpreting probability, Achinstein is a Bayesian. In particular, he must assume hypotheses (and evidence) to have prior probabilities, which makes his account vulnerable to the same objection regarding priors as standard Bayesianism.

SOURCES OF EVIDENCE

As indicated in the Introduction, there is an enormous variety of evidence-generating methods or ‘sources’ of evidence, both across all sciences and within a single science. It is impossible to review all the methods used in the social sciences here, or even all those one can find in a single social science. In what follows, I will therefore present a highly selective partial overview. The selection is guided partly by considerations of scientific importance and philosophical interest but partly also by my expertise. I group the methods into three categories: sources of evidence for (a) descriptive claims; (b) explanatory claims; and (c) policy claims.

Descriptive Inference

This type of inference and the associated methods are frequently ignored by philosophers but they are all the more important in social science. Hardly any property that is of interest from a social science point of view is immediately observable. In order to establish facts, even purely descriptive facts about a society, the investigator has to make inferences on the basis of new immediate observations, already established facts and background knowledge. I focus on two examples here: participant observation (used, for instance, in anthropology, communication studies, , and sociology) and index numbers (used mostly in economics). In a sense, participant observation and index numbers represent the two extremes of the same spectrum. The participant observer is an actor who, by immersing herself in the culture she studies, becomes an expert in that culture and as such makes informed judgements about whether this or that fact obtains. When establishing facts by means of index numbers, the aim is to reduce expert judgement to a minimum by standardising procedures. To be sure, measuring quantities such as inflation or unemployment too requires judgements at various points, even when procedures are standardised. For example, a government statistician must judge whether the goods he finds in a chosen supermarket are indeed


comparable to the goods chosen previously, and if not, what type of adjustment procedure to use. Similarly, Bureau of Labor surveyors must interpret the responses of households when measuring unemployment.

Participant Observation

A key idea of participant observation is that the researcher occupy a role within the group she observes and its aim, at least when done in cultural anthropology, is to produce an ethnography. Typically, participant observation involves (DeWalt and DeWalt, 2002: 4):

• living in the context for an extended period of time;
• learning and using local language and dialect;
• actively participating in a wide range of daily, routine, and extraordinary activities with people who are full participants in that context;
• using everyday conversation as an interview technique;
• informally observing during leisure activities (hanging out);
• recording observations in field notes (usually organized chronologically); and
• using both tacit and explicit information in analysis and writing.

There are two main forms of the technique: overt and covert participant observation. In the former case, the observed group both knows and permits the researcher to participate and conduct her investigation, which has advantages and disadvantages. Advantages include easier access to the group as a whole and subgroups (if one is allowed to participate!) as well as easier recording of the observations made; the main disadvantage is that, as a result of knowing that it is being observed, the group may change its behavioural patterns (the so-called ‘observer effect’). Covert participant observation is carried out secretly, without the group’s knowledge and permission. Apart from obvious ethical worries, disadvantages include the danger of losing objectivity on the researcher’s part and greater difficulties in recording data. The advantages are that access to groups that would not normally allow it can be gained and that there are greater chances of avoiding the observer effect.

A similar trade-off obtains at the level of the degree of involvement of the individual participant observer. One end of the spectrum is occupied by the researcher who ‘goes native’ never to return from the field. His immersion is complete but just as complete is his loss of objectivity and detachment, and he obviously relinquishes the aim of producing a scientific outcome. At the other extreme are researchers who aim to keep active involvement at a minimum but thereby also forfeit the goal of gaining entry to inside knowledge. One important source of ethical conflict in this area is the question to what extent a researcher should intervene if she finds an observed practise objectionable. Another problem is that involved research can be intrusive, which raises concerns about the privacy of the observed groups; but there is often no other way to access information of this kind.

A further trade-off besets the description of the observed social practises. Traditionally, social ‘facts’ used to be reported as if valid for all (places and) times: ‘Social group G engages in practise y’. After the ‘reflective turn’, however, reports resembled more what in philosophy is called a protocol sentence: a statement about the observation of a concrete event, along with details about when, where and how the observation was made. This mode of recording facts has been criticised as subjective and even as overly indulgent in personal details of the researcher. On the other hand, it reduces the risk of hasty generalisation and unwarranted inference.

Index Numbers

Index numbers are widely used in economics in order to estimate quantities of interest such as the price level, inequality or wellbeing. Suppose, for example, we would like to assess whether the price level in the two-by-two toy economy shown in Table 29.1 has increased, decreased or stayed put. Only if all prices change at the same rate can an unequivocal answer be given. If not, as in the example, some weighted average must be drawn, and different methods of averaging or


Table 29.1 A two-by-two toy economy

         Price cocoa   Quantity cocoa   Price cloves   Quantity cloves
Year 1   100           3                50             2
Year 2   50            2                100            3

Source: Reiss, 2008: 68

aggregating the raw data give different, sometimes widely disparate, results. Table 29.2 shows five different indices, computed for the above data.

Index numbers are a particularly clear case showing that philosophical theories of evidence have few resources to help with concrete scientific problems concerning evidence. Suppose our index number, say, a Laspeyres index, computes the inflation rate in a region and period to be 3 per cent. Call this our evidence e. Then suppose our hypothesis is that the inflation rate in that region and period is indeed 3 per cent. There is no explanatory connection between the evidence and the hypothesis. The evidence is a mathematical construct, and there is no good sense in which it is caused by (even less so, causes) the quantity of interest. Nor is there a good sense in which observing the evidence should lead us to revise our belief in the hypothesis – unless further assumptions are made (and then it is these further assumptions that justify the inference). There is also no good sense in which the ‘test’ (computing the index number) passes the hypothesis ‘severely’. What is the probability that the index yields this result if ‘true’ inflation were different from 3 per cent?

Rather, whether or not the datum ‘3 per cent’ is evidence for our hypothesis depends on considerations of the following kind. A Laspeyres index – a price index – gives exact results only if nothing but prices change in the economy. If other things change, for instance traded quantities, the answer is ambiguous unless further assumptions are made. If we assume that consumers respond to price changes by adjusting expenditures (relatively more expensive goods are substituted by relatively cheaper goods), it can be shown that the Laspeyres index overstates inflation. To get rid of this ‘substitution effect’ one can compute a so-called ‘superlative index’, of which the geometric mean between the Laspeyres and Paasche indices is an example.

Price and quantity changes are not, however, the only changes between two periods. Tastes, environments, the quality of the exchanged goods as well as the range of available goods may change too. In each of these cases decisions must be made about how to adequately incorporate a source of change into the index. The index-number purpose will guide these decisions. For instance, to measure consumers’ cost-of-living, it makes sense to include mortgage payments in the budget. By contrast, if the purpose is to test monetary theory, mortgage payments should be excluded as they are directly proportional to interest rates, which play an important explanatory role in that theory.

Explanatory Inference

Accurate descriptive inference is an important goal of social science in its own right. It also plays a preparatory role for further inferences regarding the explanation of social phenomena. There are numerous models of explanation in the social sciences but the causal model is currently the dominant one, and I will focus on causal inference here.
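Returning briefly to the toy economy of Table 29.1, the rates of change listed in Table 29.2 can be recomputed in a few lines. The sketch below is an illustration only: it assumes that the arithmetic, geometric and harmonic entries are unweighted means of the two price relatives, and the function and variable names are hypothetical. It also computes the geometric mean of the Laspeyres and Paasche indices (commonly called the Fisher index) as an instance of a superlative index.

```python
from math import prod  # available from Python 3.8

# Toy economy of Table 29.1; goods are ordered (cocoa, cloves).
prices_y1, quantities_y1 = [100, 50], [3, 2]
prices_y2, quantities_y2 = [50, 100], [2, 3]


def basket_cost(prices, quantities):
    """Cost of buying the given quantities at the given prices."""
    return sum(p * q for p, q in zip(prices, quantities))


def price_indices(p_old, q_old, p_new, q_new):
    """Return several price indices (1.0 means 'no change')."""
    # Assumption: the three 'mean' indices are unweighted means of price relatives.
    relatives = [new / old for new, old in zip(p_new, p_old)]
    n = len(relatives)
    laspeyres = basket_cost(p_new, q_old) / basket_cost(p_old, q_old)  # old basket
    paasche = basket_cost(p_new, q_new) / basket_cost(p_old, q_new)    # new basket
    return {
        "arithmetic": sum(relatives) / n,
        "geometric": prod(relatives) ** (1 / n),
        "harmonic": n / sum(1 / r for r in relatives),
        "laspeyres": laspeyres,
        "paasche": paasche,
        "fisher": (laspeyres * paasche) ** 0.5,  # a superlative index
    }


indices = price_indices(prices_y1, quantities_y1, prices_y2, quantities_y2)
rates = {name: 100 * (value - 1) for name, value in indices.items()}

for name, rate in rates.items():
    print(f"{name:<10} {rate:6.1f} per cent")
```

Under these assumptions the rates come out at 25, 0, -20, -12.5 and roughly 14.3 per cent, which agree with the tabulated figures after rounding, while the Fisher index shows no change. The same raw data thus support quite different ‘inflation rates’ depending only on the aggregation formula chosen.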

Table 29.2 Different indices

                 Arithmetic   Geometric   Harmonic   Laspeyres   Paasche
Rate of change   25%          0%          –20%       –13%        14%

Source: Reiss, 2008: 68

Qualitative Comparative Analysis

Above, we have already looked at Mill’s methods of causal inference. Squarely in the Humean tradition, Mill understood causation to be a kind of complex regularity. To him, a cause was an insufficient but non-redundant

55579-Jarvie-Chap29.indd579-Jarvie-Chap29.indd 556767 111/10/20101/10/2010 33:26:06:26:06 PMPM 568 THE SAGE HANDBOOK OF THE PHILOSOPHY OF SOCIAL SCIENCES

part of an unnecessary but sufficient, short INUS, condition (this analysis is due to John Mackie, see Mackie, 1974). To define a cause as an INUS condition says three things.

1 Any cause is followed by its effect only if certain enabling conditions are present and disturbing factors absent.
2 For any effect, there are many alternative sets of causes that may precede it.
3 The relationship between cause and effect is invariant; that is, when the right causal conditions are in place, the effect must follow and vice versa.

Qualitative comparative analysis (QCA), developed by the sociologist Charles Ragin (see for instance Ragin, 1998), builds on the understanding of cause as INUS condition and makes use of it in drawing causal conclusions from comparing a small number of cases. It has been applied to fields as wide-ranging as sociology, political science, economics and criminology (for a full list of applications, see the bibliographical database at www.compasss.org). QCA aims to overcome the problem of small sample sizes by making the maximum possible number of comparisons among the sampled units.

The method identifies causes of phenomena of interest (e.g. ethnic political mobilisation among Western European minorities) by first arranging all the observed instances (in this case, minorities) in a table and determining whether or not the phenomenon is present. Then a list of factors (in this case, size, linguistic ability, wealth relative to core region and population growth) is constructed and it is noted whether each factor is present or absent. A factor is judged to be a cause whenever it is a member of a group such that that group of factors is always associated with the phenomenon of interest and no subgroup is always associated with the phenomenon – in other words, if it is an INUS condition.

The analysis of causation in terms of INUS conditions is deficient, as Mackie himself understood. In his famous example it is the sounding of the Manchester hooters at 5.00 p.m. that is shown to be an INUS condition for the Londoners to leave work shortly thereafter, but of course the Londoners do not leave the factory because of the sound of the Manchester hooters (see Mackie, 1974: 81–4). Nevertheless, Ragin’s account demonstrates how regularities of a certain kind can constitute – defeasible – evidence for causal hypotheses. The most likely source of deficiency is the omission of common causes. As Mackie’s hooters example shows, omitting a common cause – in his case, it being 5.00 p.m. – leads one to misinterpret what is in fact a joint effect as a cause. When causes operate indeterministically – in the social sciences a possibility one should not exclude a priori – the full set of causal conditions does not have to be sufficient for its effect, which also makes the application difficult in this area.

Causal Modelling

Whereas QCA is based on (or can be explicated with) the Mill-Mackie analysis of ‘cause’ as an INUS condition, the various approaches to causal modelling relate to the probabilistic theory of causation, according to which causation is a specific form of correlation or probabilistic dependence.10 The most popular form of causal modelling is currently that of the so-called Bayesian networks or, for short, Bayes’ nets.

A Bayes’ net is a directed acyclic graph or DAG with an associated probability distribution. A graph is a set of vertices (representing variables) and a set of edges connecting the vertices (representing relations among the variables). A graph is directed when all its edges are directed and acyclic when there are no directed cycles. There is assumed to be a joint probability distribution over the variables. If the graph is Markov, it can be used to represent certain kinds of probabilistic independencies among the variables. For instance, in DAG1 X1 and X3 are independent conditional on X2 and in DAG2 X2 and X3 are independent conditional on X1 as shown in Figure 29.1.
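The independence claim for DAG1 can be checked by brute force on a small example. The following is a minimal sketch, assuming three binary variables arranged in a chain X1 → X2 → X3; all probabilities and helper names are hypothetical and chosen only for illustration. It builds the joint distribution from the chain factorisation and compares P(X3 = 1) for the two values of X1, first without and then with conditioning on X2.

```python
from itertools import product

# Hypothetical probabilities for a binary chain X1 -> X2 -> X3 (DAG1).
p_x1 = {0: 0.7, 1: 0.3}
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # indexed [x1][x2]
p_x3_given_x2 = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.4, 1: 0.6}}  # indexed [x2][x3]

# Joint distribution implied by the chain: P(x1) * P(x2 | x1) * P(x3 | x2).
joint = {
    (x1, x2, x3): p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]
    for x1, x2, x3 in product((0, 1), repeat=3)
}


def prob(x1=None, x2=None, x3=None):
    """Probability of the event fixing the given values (None = summed out)."""
    wanted = (x1, x2, x3)
    return sum(
        p
        for values, p in joint.items()
        if all(w is None or w == v for w, v in zip(wanted, values))
    )


def p_x3_is_1(x1, x2=None):
    """P(X3 = 1 | X1 = x1), or P(X3 = 1 | X1 = x1, X2 = x2) if x2 is given."""
    return prob(x1=x1, x2=x2, x3=1) / prob(x1=x1, x2=x2)


# Unconditionally, X3 depends on X1 ...
marginal = (p_x3_is_1(0), p_x3_is_1(1))
# ... but once X2 is held fixed, the value of X1 makes no difference.
conditional = {x2: (p_x3_is_1(0, x2), p_x3_is_1(1, x2)) for x2 in (0, 1)}

print("P(X3=1 | X1=0), P(X3=1 | X1=1):", marginal)
print("same, holding X2 fixed:", conditional)
```

With these numbers the unconditional probabilities differ (0.105 against 0.49), whereas for each fixed value of X2 they coincide. This is what it means for X2 to screen off X3 from X1 in a chain.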


Figure 29.1 Two directed acyclic graphs (DAGs). DAG1: X1 → X2 → X3. DAG2: X2 ← X1 → X3.

By themselves, DAGs are abstract mathematical objects that could be used to merely store probabilistic information efficiently. From the point of view of causal inference, they become interesting when vertices are interpreted as causal factors and edges as causal relations. DAG1, for instance, can be used to represent the causal chain ‘obesity causes diabetes causes heart attacks’, and DAG2 the common-cause structure ‘drop in atmospheric pressure causes change in barometer reading and storm’.

A causal Bayes’ net is assumed to satisfy the causal Markov condition (CMC), according to which a variable X is independent of all other variables in a graph except its effects, conditional on its direct causes.

Thus, we expect heart attacks to be independent of obesity conditional on diabetes and the change in the barometer reading to be independent of storm conditional on atmospheric pressure. The CMC is a generalisation of the screening-off condition found in various probabilistic theories of causation. It provides the link from correlation to causation: if there is a correlation, then it must have a causal explanation.

For causal inference, another condition is essential, viz. the Faithfulness condition (FC), according to which a causal graph has only those probabilistic independencies that are entailed by the CMC.

FC says that all variables that are causally related are also correlated. It thus provides the link between correlation and causation in the reverse direction, from causation to correlation: if two variables are causally related, they are also correlated. That this does not always have to be the case is illustrated by Figure 29.2 (this is the above-mentioned example due to Hesslow, 1976).

Figure 29.2 A potential counterexample to FC. DAG3: Birth control → Thrombosis (+); Birth control → Pregnancy (−); Pregnancy → Thrombosis (+).

In the example Birth control affects Thrombosis via two independent routes: directly and indirectly through Pregnancy. If the causal strength of the two routes is identical and therefore the positive direct influence is exactly cancelled by the negative indirect influence, Birth control can be causally related to, albeit not correlated with, Thrombosis.

The CMC, too, is subject to counterexamples. Not all correlations have a causal explanation, especially correlations among social science variables. In many cases, correlations can be induced by certain statistical properties of the time series describing variables, for instance when they are non-stationary (Sober, 1987, 2001; Hoover, 2003; Reiss, 2007). Moreover, common causes do not always screen off their joint effects – for example when causes operate genuinely indeterministically (Cartwright, 1999).

If both CMC and FC can be assumed to hold, simple algorithms can be applied to infer causal relations from statistics. According to the FC, causation entails correlation so all variables that are found to be correlated must be causally connected in one way or another. The CMC can then be used to distinguish direct, indirect and common causal relations. The theory of Bayes’ nets provides a variety of algorithms that do precisely that. However, the basic idea is the foundation for all causal inference from statistics.

Process Tracing

Statistical methods of causal inference work only under fairly stringent conditions.


To begin with, they assume a tight link between causation and correlation, an assumption that is not always well warranted in social science research. But even if this very fundamental assumption can be made, statistical methods often fail for purely practical reasons. For example, one can infer causation from correlation – at best – when all common causes of the variables considered are measured. The complexity of the social world often makes this an insurmountable hurdle. Further, the statistical inference from observed frequencies to probabilities is only reliable when the samples are relatively large, and large sample size isn’t always guaranteed in social science.

In such cases process tracing may be a viable alternative method of causal inference. Frequently, when a macro social variable causes another, its action is mediated by a more or less continuous process or mechanism. If this is so, and if there is uncertainty as to whether two specific variables are indeed causally related while social mechanisms are epistemically more readily accessible than relations between macro variables, then knowledge about the mediating mechanism can be used for causal inference. In these cases, information about mechanisms provides evidence for hypotheses about causal relations among macro social variables of interest.

Sometimes the stronger claim is made that only knowledge about mechanisms provides sufficient evidence to give a good reason to believe a causal hypothesis (Friedman and Schwartz, 1963: 59):

However consistent may be the relation between monetary change and economic change, and however strong the evidence for the autonomy of the monetary changes, we shall not be persuaded that the monetary changes are the source of the economic changes unless we can specify in some detail the mechanism that connects the one with the other.

Necessary for successful causal inference or not, there is no doubt that if the mechanism is epistemically more readily accessible than the macro causal relation, learning about a mechanism connecting two social variables can be a powerful source of evidence for causal claims.

Sociologists call this method of inference ‘process tracing’. Daniel Steel describes the method as follows (Steel, 2004: 67): ‘Process tracing consists in presenting evidence for the existence of several prevalent social practices that, when linked together produce a chain of causation from one variable to another.’ In the example discussed by Steel, the hypothesis at stake is Malinowski’s (1935) claim that the possession of many wives was a cause of wealth and influence among Trobriand chiefs. The social practices that constitute evidence for the hypothesis are (a) the custom whereby brothers contribute substantial gifts of yams to the households of their married sisters; and (b) the fact that political endeavours and public projects undertaken by chiefs are financed primarily with yams.

While certainly a useful alternative method of causal inference, process tracing too has serious limitations. An obvious one is that the ‘facts’ used for process tracing such as those reported in (a) and (b) above have to be substantiated with evidence themselves. Together they are supposed to form a causal chain, so each link is itself causal in nature and must be substantiated with adequate methods. While often at this level other tools are available – for instance, participant observation as in the Malinowski case – there is no guarantee that the problem of inferring causal relations from observations is more easily solvable than at the aggregate or social level. And even in the best case, the results of process tracing are pretty modest. The method can only be used to establish a purely qualitative claim about the causal connection between two variables. It may well be the case that there are other mechanisms that undermine the effect of the process studied. For instance, the correctness of Malinowski’s reasoning is not inconsistent with the possession of many wives actually being a prohibitor of wealth and influence because of other mechanisms that are quantitatively


stronger than the one established. Moreover, many disagreements in the social sciences are about the quantitative strength of a cause, not whether or not one variable is linked to another ‘in one way or another’.

Policy Inference

Whereas descriptive and explanatory inferences are past- and present-regarding, policy inferences concern the future. Social scientists do not only aim to describe and explain: they also try to anticipate future events to facilitate planning, and to prepare policy decisions. In this final section of the overview of methods of evidence generation, I will look at two instances of sources of evidence for policy: expert political judgement and evidence-based policy.

Expert Political Judgement

Every day, countless ‘experts’ make predictions about political events that may or may not occur some time in the future. The ability to foresee such events, be they the outcome of a national election, the outbreak of a war or the end of a political era, with a reasonable degree of accuracy would be enormously useful for political decision makers, investors and society at large, if it could be achieved.

But sceptics argue that successful prediction is unattainable. There are two types of sceptics: those who deny that the world is predictable and those who deny that humans have the cognitive capacities to make successful predictions. Economic historian and methodologist Deirdre McCloskey belongs to the former group. She argues essentially that successful predictions are self-defeating because people would try to capitalise on and thereby undermine them (McCloskey, 1998: 150–151). Philosophers Michael Bishop and J.D. Trout belong to the latter group. They argue that cognitive limitations such as memory and computing deficiencies as well as psychological phenomena such as overconfidence prevent us from achieving predictive success to the extent that the nature of the world would allow it (Bishop and Trout, 2005).

A recently published report of over 20 years of research on political experts by Philip Tetlock strikes a balance between these sceptics and meliorists – those who maintain that ‘the quest for predictors of good judgment, and ways to improve ourselves, is not quixotic and that there are better and worse ways of thinking that translate into better and worse judgements [regarding future political events]’ (Tetlock, 2006: 19). Many of his results confirm the sceptic (Tetlock, 2006: Chapter 2). For example, human experts’ subjective probability judgements of outcomes are no better calibrated to the frequencies of these outcomes than those of a dart-throwing (randomising) chimpanzee who assigns an equal probability to all outcomes.11 Whether the forecaster was an expert in the relevant field made hardly any difference to overall predictive accuracy, nor did education, experience, gender or political orientation. Moreover, statistical models of various degrees of sophistication beat even the best expert.

On the other hand, certain factors did make a difference to some outcomes and relative to some baselines. For instance, while randomising achieves a higher score on calibration, humans beat chimps with respect to discriminating between high and low probability events.12 Experts perform better than undergraduates on both calibration and discrimination scores. Short-term forecasts are more accurate than long-term forecasts.

By far the most informative factor about an expert’s judgement is his cognitive style. Tetlock uses Isaiah Berlin’s metaphor of hedgehogs and foxes to characterise experts’ cognitive style. Hedgehogs are thinkers who know one big thing and try to systematise every fact within the explanatory schema of that one big thing. Foxes know many little things, are sceptical of grand schemes and excel in ‘ad hocery’. Foxes outperform hedgehogs on both calibration and discrimination and the best come close to some statistical models. Controlling for cognitive


style also changes the interpretation of other effects. Thus, while expertise has no across-the-board effect, it is beneficial for foxes and outright harmful for hedgehogs. Foxes also score higher in the long-term than in the short-term while the opposite is true of hedgehogs (Tetlock, 2006: Chapter 3).

The important lesson from Tetlock’s study is that there are better and worse experts, and there are ways to tell who is what. Tetlock’s results confirm those reported by Bishop and Trout in that even the foxiest experts are outperformed by statistical models. But to the extent that political expertise is likely to maintain an important role in society, we had better know who to trust, how far and with respect to what claims.

Evidence-Based Policy

The evidence-based practice13 movement can be understood as a reaction to what was perceived as over-reliance on expertise and related sources of knowledge such as folklore and tradition. These guides to practice, unreliable in the eyes of proponents of the movement, should be substituted by rigorously established scientific evidence, and many regard the randomised controlled trial (RCT) as the ‘gold standard’ of evidence. The movement became prominent first in medicine and other fields of health care and policy but is now gaining popularity in social and public policy in the United States and many other countries. RCTs have been conducted to study questions as diverse as the effect of CCTV surveillance on crime, class size on academic achievement, cognitive-behavioural treatment on anti-social behaviour, correctional boot camps on offending and many more.

In an RCT eligible subjects are divided into two groups using a random number generator. The aim of randomisation is to create two groups that are ‘exchangeable’ from a statistical point of view; that is, identical in all respects relevant for the assessment of the treatment effect. One group is assigned a treatment while the other figures as control group (and either remains untreated, receives an alternative treatment or a placebo). In a double-blind trial neither the participating subjects nor the treatment administrators know which is the treatment and which the control group. There are also multiple-blind trials in which the statistician who analyses the data and other researchers are blinded as well.

RCTs are regarded as the gold standard of evidence in evidence-based practice because they are, if implemented successfully, a highly reliable source of evidence for causal claims. But there are two catches. The first is indicated by the qualification ‘if implemented successfully’: RCT results are certain only under highly stringent, and indeed unrealistic, conditions. These conditions include that the set of all other factors that affect the outcome are distributed identically between the two groups and that correlations always have a causal explanation (see for instance Cartwright, 2007a). However, randomisation in no way guarantees that treatment and control group are identical with respect to all confounders (see for instance Worrall, 2002), and especially in social-science applications, correlations have a variety of non-causal sources (Reiss, 2007).

The second catch is that the result of an RCT, even if it was implemented successfully, while known with certainty, is of very limited use. What an RCT at best proves is that a treatment has a causal effect on average and in the population studied. If there is an average causal effect, then the treatment must be effective in at least some individuals but we don’t know which. In particular, it is not inconsistent that a treatment should be beneficial on average and yet harmful for some individuals. Further, the RCT result is valid only with respect to the particular arrangement of confounders present in the experiment. It is not informative about the effectiveness of the treatment in populations with a different arrangement of confounding factors.

The latter difficulty has come to be called the ‘problem of external validity’: if a test result is valid for an experimental population,


how do we apply it outside the experiment, ‘in the field’? This issue has been taken up in recent philosophy of science and now there exist a variety of approaches to deal with it: based on knowledge of mechanisms (Steel, 2008), on (Guala, forthcoming, 2005) and on causal capacities (Cartwright, 2009). But none of these have the logical stringency of the ideal RCT. Therefore, the certainty associated with testing a policy proposition using RCTs is at least to some extent illusory.

Integrating, Weighing and Aggregating Evidence

A serious problem arises when pieces of evidence tell different stories about the hypothesis at stake. This can happen when, for instance, estimated correlations vary greatly between different studies or when different sources of evidence (e.g. statistical versus mechanistic evidence) give incompatible results. How do we combine such conflicting items of evidence in such a way as to draw reliable inferences regarding a hypothesis? Peter Achinstein discusses the following strategies (Achinstein, 2001: 124):

1 Write a review article that summarises the different studies and results, without attempting to resolve the issue.
2 Choose a single, favourite study from the set and agree with its conclusions.
3 Compute overall averages for relevant statistics across the entire set of studies, independently of the sizes of the sample in each study or the conditions under which the samples were taken.
4 Take a vote. If a majority of the studies favour one conclusion, then that is the conclusion supported by the studies.
5 Employ meta-analysis, which its proponents regard as a much more sophisticated and reliable set of methods than 3 and 4 above.

(1) obviously doesn’t solve the problem. (2) seems arbitrary unless the chosen study is the only one without clearly identifiable methodological flaws. If either several studies are not subject to methodological problems or all studies have some flaws, the question simply reappears. (3) is a less sophisticated version of (5), which will be dealt with in more detail below. (4) again seems arbitrary. What if the best studies support a conclusion different from the majority? This strategy also works at best for simple yes/no results such as whether or not a treatment is effective. Many disagreements, however, are about the size of the effect, not its presence or absence.

Meta-analyses combine research results from a range of studies in a quantitative way. Many meta-analyses identify a common metric of effect size and model it using some form of regression in which the results of the individual studies figure as inputs. Meta-analyses have a variety of advantages over alternatives, including an increase in statistical power (over individual studies) and the ability to control for a variety of sources of error (see Hunter and Schmidt, 2004: Chapter 2). But they also come with serious drawbacks and limitations.

There are two obvious limitations. The first is that meta-analyses can only combine statistical evidence. The important problem of integrating evidence from different sources is not addressed. Second, the method requires that the individual studies deal with the same hypothesis. It is frequently not clear, however, what that amounts to.

But even if one restricts coverage to statistical evidence and is able to formulate reasonable inclusion criteria, meta-analysis is subject to criticisms. The most important of these is that the method assumes that differences between the studies are due to statistical error alone whereas in fact they often arise systematically. If, say, one study shows a positive treatment effect and another shows a negative effect or none, this may be due to differences in the causal structures characterising the two populations. Averaging over the two studies masks these differences and treats the samples as if drawn from one underlying population. In this way important information is lost. Similar considerations apply to the choice of measurements used and other aspects of the study design.


In sum, to the extent that individual studies are biased for statistical reasons (for instance, because of sampling error), meta-analyses are a powerful tool to reduce these types of bias. They do not, however, eliminate the need for context-sensitive judgements about the quality of the individual studies entering the analysis.

EVIDENCE FOR USE

All scientific methods are associated with specific types of hypotheses the researcher is entitled to infer from the evidence generated by them. Often, there is a kind of trade-off: the more reliable the method (that is, the more secure the inference based on the evidence produced by it), the narrower the range of hypotheses that can be supported by that method. Some methods, such as the RCT, ‘clinch’ their results: under a suitable set of background assumptions, the evidence deductively entails the hypothesis. But the epistemic certainty is bought at a cost: the situations of which the assumptions are likely to be true are very rare. Other methods, such as participant observation, only ‘vouch’ for their results: the evidence makes the hypothesis more likely without proving it. These methods tend to be more broadly applicable but there always remain reasons to believe that the conclusion is false (for the clincher/voucher dichotomy, see Cartwright, 2007b).

The value of knowing a hypothesis is constrained by the certainty with which it is known but also by the value of the ways in which it can be put to use. Philosophers in the analytic tradition have tended to be impressed by the paradigm of ‘exact science’ and consequently focused on the former at the expense of the latter. But for many important applications, there simply are no techniques that produce evidence of the desired quality.

Cartwright (2006) argues that philosophers, rather than develop methods with certain epistemologically desirable characteristics, should instead start from scientifically – and politically – desired uses and work on methods designed to target hypotheses knowledge of which is advantageous in their light. At the core of her project is a kind of contextualism regarding evidence (Cartwright, 2006: 983): ‘What justifies a claim depends on what we are going to do with that claim, and evidence for one use may provide no support for others’. ‘Evidence for use’ is a research project that urges philosophers and methodologists to pay closer attention to the demands of scientific practice and social utility. Arguably, the foundational debates in the social sciences can profit from such a reorientation too. Much philosophical effort is spent, perhaps needlessly, on debating whether social science is real science, whether one should or shouldn’t study the social world with essentially the same methods as the natural world, how to effectively separate ‘facts’ and ‘values’, whether there is such a thing as society or rather a mere heap of individuals – while significant social issues remain unnoticed by the philosophers claiming to be experts in these debates.

NOTES

1 These three inferences allude to enumerative induction, Bayesianism and the error-statistical approach, respectively. See below for detailed descriptions.
2 If f and ϕ are exchanged, the same method is used to argue that f is an effect of ϕ.
3 Once more, if f and ϕ are exchanged, the same method is used to argue that f is an effect of ϕ.
4 This interpretation was at the heart of Carnap’s influential 1950 theory of evidence.
5 Mayo inserts a footnote here saying that she prefers to phrase this in terms of data e being a ‘good indication’ of H.
6 See also Hon (1998) and Carrier (2001) for similar criticisms. It should be noted, however, that error statisticians claim to have a philosophy of evidence and induction of entirely general scope: see Mayo (1997, 2000, 2004); Mayo and Spanos (2004, 2006).
7 Achinstein first defines only a necessary condition but later qualifies: ‘the conditions in (PE) are proposed as both necessary and sufficient’ (2001).


8 ‘Being male’ might explain Wheaties eating behaviour in the same sense that gender explains other preferences. Achinstein assumes that this isn’t the case here.

9 Though many Bayesians require satisfaction of the so-called ‘principal principle’, which says that if an agent knows the physical probability of an outcome, his degree of belief should be the same. The term is due to David Lewis (1980).

10 Some authors, in the methodological literature most notably Kevin Hoover (2003), distinguish the two notions. With most other philosophers I will gloss over the differences here (for a discussion, see Reiss, 2007).

11 What is meant by ‘calibrated’ here is that the subjective probability judgements reflect the objective frequencies of those types of events. For example, those events assigned a probability of 10 per cent should actually happen in 10 per cent of the cases.

12 Perfect discrimination is achieved when all those events that obtain are predicted as ‘certain’ and all those that do not obtain as ‘impossible’.

13 With ‘evidence-based practice’ I refer to evidence-based movements in all branches of knowledge creation and policy such as medicine, health care and policy as well as management, social and public policy. ‘Evidence-based policy’ is narrower, covering only the latter two fields. To my knowledge, there is no standardised terminology in this area.

REFERENCES

Achinstein, P. (2001) The Book of Evidence. Oxford: Oxford University Press.
Bird, A. (2007) ‘Inference to the only explanation’, Philosophy and Phenomenological Research, 74(2): 424–432.
Bishop, M. and J.D. Trout (2005) Epistemology and the Psychology of Human Judgment. Oxford: Oxford University Press.
Carnap, R. (1950) Logical Foundations of Probability. Chicago, IL: University of Chicago Press.
Carrier, M. (2001) ‘Critical notice: Error and the growth of experimental knowledge’, International Studies in the Philosophy of Science, 15(1): 93–98.
Cartwright, N. (1999) The Dappled World. Cambridge: Cambridge University Press.
Cartwright, N. (2007a) ‘Are RCTs the gold standard?’, BioSocieties, 2(2): 11–20.
Cartwright, N. (2007b) Hunting Causes and Using Them. Cambridge: Cambridge University Press.
Cartwright, N. (2009) ‘Evidence-based policy: What’s to be done about relevance’, Philosophical Studies, 143(1): 127–136.
DeWalt, K. and B. DeWalt (2002) Participant Observation: A Guide for Fieldworkers. Walnut Creek, CA: AltaMira Press.
Friedman, M. and A. Schwartz (1963) ‘Money and business cycles’, Review of Economics and Statistics, 45(1, Part 2, Supplement): 32–64.
Guala, F. (2005) The Methodology of Experimental Economics. Cambridge: Cambridge University Press.
Guala, F. (forthcoming) ‘Extrapolation Without Process Tracing’, Philosophy of Science, PSA 2008.
Hacking, I. (1965) The Logic of Statistical Inference. Cambridge: Cambridge University Press.
Hacking, I. (1972) ‘Likelihood’, British Journal for the Philosophy of Science, 23: 132–137.
Harman, G. (1965) ‘Inference to the best explanation’, Philosophical Review, 74(1): 88–95.
Hempel, C. (1945) ‘Studies in the logic of confirmation (I.)’, Mind, 54(213): 1–26.
Hempel, C. (1965) Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. New York, NY: Free Press.
Hesslow, G. (1976) ‘Discussion: Two notes on the probabilistic approach to causality’, Philosophy of Science, 43: 290–292.
Hon, G. (1989) ‘Towards a typology of experimental errors: An epistemological view’, Studies in History and Philosophy of Science, 20: 469–504.
Hon, G. (1995) ‘Is the identification of experimental error contextually dependent? The case of Kaufmann’s experiment and its varied reception’, in J. Buchwald (ed.) Scientific Practice: Theories and Stories of Doing Physics. Chicago, IL: University of Chicago Press. pp. 170–223.
Hon, G. (1998) ‘Exploiting errors’, International Studies in the Philosophy of Science, 29(3): 465–479.
Hoover, K. (2003) ‘Nonstationary time-series, cointegration, and the principle of the common cause’, British Journal for the Philosophy of Science, 54: 527–551.
Howson, C. and P. Urbach (1993) Scientific Reasoning: The Bayesian Approach. 2nd edn. Chicago, IL: Open Court.
Hunter, J. and F. Schmidt (2004) Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Thousand Oaks, CA: Sage.
Kincaid, H. (2007) ‘Contextualist morals and science’, in H. Kincaid, J. Dupré and A. Wylie (eds) Value-Free Science? Ideals and Illusions. Oxford: Oxford University Press. pp. 218–238.
Klein, J. (2003) ‘Francis Bacon’, Stanford Encyclopedia of Philosophy (Spring 2009) E. Zalta (ed.), URL = <http://plato.stanford.edu/archives/spr2009/entries/francis-bacon/> (accessed 23 November 2009).
Lewis, D. (1980) ‘A subjectivist’s guide to objective chance’, in R. Jeffrey (ed.) Studies in Inductive Logic and Probability. Berkeley and Los Angeles: University of California Press. Vol. II.
Lipton, P. (2004) Inference to the Best Explanation. 2nd edn. London: Routledge.
Mackie, J. (1974) The Cement of the Universe: A Study of Causation. Oxford: Oxford University Press.
Malinowski, B. (1935) Coral Gardens and Their Magic. New York, NY: American Book Co.
Mayo, D. (1996) Error and the Growth of Experimental Knowledge. Chicago, IL: University of Chicago Press.
Mayo, D. (1997) ‘Error statistics and learning from error: Making a virtue of necessity’, Philosophy of Science, 64(PSA 1996): S195–212.
Mayo, D. (2000) ‘Experimental practice and an error statistical account of evidence’, Philosophy of Science, 67(Proceedings): S193–207.
Mayo, D. (2004) ‘An error-statistical philosophy of evidence’, in M. Taper and S. Lele (eds) The Nature of Scientific Evidence. Chicago, IL: University of Chicago Press. pp. 79–96.
Mayo, D. and A. Spanos (2004) ‘Methodology in practice: Statistical misspecification testing’, Philosophy of Science, 71: 1007–1025.
Mayo, D. and A. Spanos (2006) ‘Severe testing as a basic concept in a Neyman–Pearson philosophy of induction’, British Journal for the Philosophy of Science, 57: 323–357.
McCloskey, D. (1998) The Rhetoric of Economics. 2nd edn. Madison, WI: University of Wisconsin Press.
Mill, J. S. (1843 [1874]) A System of Logic. New York, NY: Harper.
Norton, J. (2003) ‘A material theory of induction’, Philosophy of Science, 70(4): 647–670.
Norton, J. (2008) ‘Must evidence underdetermine theory?’, in M. Carrier, D. Howard and J. Kourany (eds) The Challenge of the Social and the Pressure of Practice. Pittsburgh, PA: Pittsburgh University Press. pp. 17–44.
Okasha, S. (2000) ‘Van Fraassen’s critique of inference to the best explanation’, Studies in History and Philosophy of Science, 34(4): 691–710.
Quine, W.v.O. (1969) ‘Natural kinds’, in W.v.O. Quine (ed.) Ontological Relativity and Other Essays. New York, NY: Columbia University Press. pp. 114–138.
Ragin, C. (1998) ‘The logic of qualitative comparative analysis’, International Review of Social History, 43(Supplement): 105–124.
Reiss, J. (2007) ‘Time series, nonsense correlations and the principle of the common cause’, in F. Russo and J. Williamson (eds) Causality and Probability in the Sciences. London: College Publications. pp. 179–196.
Reiss, J. (2008) Error in Economics: Towards a More Evidence-Based Methodology. London: Routledge.
Salmon, W. (1975) ‘Confirmation and relevance’, in G. Maxwell and R. Anderson (eds) Induction, Probability, and Confirmation. Don Mills, ON: Burns & Maceachern. VI: 3–36.
Schickore, J. (2005) ‘“Through thousands of errors we reach the truth” – but how? On the epistemic roles of error in scientific practice’, Studies in History and Philosophy of Science, 36: 539–556.
Scriven, M. (1966) ‘Causes, connections and conditions in history’, in W. Dray (ed.) Philosophical Analysis and History. New York, NY: Harper and Row. pp. 238–264.
Scriven, M. (2008) ‘A summative evaluation of RCT methodology and an alternative approach to causal research’, Journal of MultiDisciplinary Evaluation, 5(9): 11–24.
Sober, E. (1987 [1994]) ‘The principle of the common cause’, in From a Biological Point of View. Cambridge: Cambridge University Press. pp. 158–174.
Sober, E. (2001) ‘Venetian sea levels, British bread prices, and the principle of the common cause’, British Journal for the Philosophy of Science, 52: 331–346.
Steel, D. (2004) ‘Social mechanisms and causal inference’, Philosophy of the Social Sciences, 34(1): 55–78.
Steel, D. (2005a) ‘The facts of the matter: A discussion of Norton’s material theory of induction’, Philosophy of Science, 72: 188–197.
Steel, D. (2005b) ‘Indeterminism and the causal Markov condition’, British Journal for the Philosophy of Science, 56: 3–26.
Steel, D. (2008) Across the Boundaries: Extrapolation in Biology and Social Science. Oxford: Oxford University Press.
Sugden, R. (2000) ‘Credible worlds: The status of theoretical models in economics’, Journal of Economic Methodology, 7(1): 1–31.
Tetlock, P. (2006) Expert Political Judgment: How Good Is It? How Can We Know? Princeton, NJ: Princeton University Press.
van Fraassen, B. (1985) Laws and Symmetry. Oxford: Oxford University Press.
Worrall, J. (2002) ‘What evidence in evidence-based medicine’, Philosophy of Science, 69: S316–330.
