1990-Learning Causal Trees from Dependence Information
Total Page:16
File Type:pdf, Size:1020Kb
TECHNICAL REPORT From: AAAI-90 Proceedings. Copyright ©1990, AAAI (www.aaai.org). All rights reserved. R-149 sal Trees fro n-nation * Dan Geiger Azaria Paz Judea Pearl [email protected] [email protected] [email protected] Northrop Research and Computer Science Department Cognitive Systems Lab. Technology Center Israel Institute of Technology University of California One Research Park Haifa, Israel 32000 Los Angeles, CA 90024 Palos Verdes, CA 90274 Abstract b”, “a suggests b”, ‘a tends to cause b”, and “u actu- ally caused b”. These utterances are the basic build- In constructing probabilistic networks from hu- ing blocks from which knowledge bases are assembled. man judgments, we use causal relationships to Second, any autonomous learning system attempting convey useful patterns of dependencies. The con- to build a causal model of its environment cannot rely verse task, that of inferring causal relationships exclusively on procedural semantics but must be able from patterns of dependencies, is far less under- to translate direct observations to cause and effect re- stood. Th’ 1s paper establishes conditions under lationships. which the directionality of some interactions can Temporal precedence is normally assumed essential be determined from non-temporal probabilistic in- for defining causation. Suppes [Suppes 701, for ex- formation - an essential prerequisite for attribut- ample, introduces a probabilistic definition of causa- ing a causal interpretation to these interactions. tion with an explicit requirement that a cause pre- An efficient algorithm is developed that, given cedes its effect in time. Shoham makes an identical data generated by an undisclosed causal polytree, assumption [Shoham 871. In this paper we propose a recovers the structure of the underlying polytree, non-temporal semantics, one that determines the di- as well as the directionality of all its identifiable rectionality of causal influence without resorting to links. temporal information, in the spirit of [Simon 541 and [Glymour at al. 871. Such semantics should be applica- 1 Introduction ble, therefore, to the organization of concurrent events or events whose chronological precedence cannot be de- The study of causation, because of its pervasive usage termined empirically. Such situations are common in in human communication and problem solving, is cen- the behavioral and medical sciences where we say, for tral to the understanding of human reasoning. All rea- example, that old age explains a certain disability, not soning tasks dealing with changing environments rely the other way around, even though the two occur to- heavily on the distinction between cause and effect. gether (in many cases it is the disability that precedes For example, a central task in applications such as di- old age). agnosis, qualitative physics, plan recognition and lan- Another feature of our formulation is the appeal to guage understanding, is that of abduction, i.e., finding probabilistic dependence, as opposed to functional or a satisfactory explanation to a given set of observa- deterministic dependence. This is motivated by the tions, and the meaning of explanation is intimately re- observation that most causal connections found in nat- lated to the notion of causation. ural discourse, for example “reckless driving causes ac- Most AI works have given the term “cause” a proce- cidents” are probabilistic in nature [Spohn 901. Given dural semantics, attempting to match the way people that statistical analysis cannot distinguish causation use it in inference tasks, but were not concerned with from covariation, we must still identify the asymme- what makes people believe that Ku causes b”, as op- tries that prompt people to perceive causal structures posed to, say, “b causes a” or Kc causes both a and in empirical data, and we must find a computational b.” [de Kleer & Brown 78,Simon 541. An empirical se- model for such perception. mantics for causation is important for several reasons. First, by formulating the empirical components of cau- Our attack on the problem is as follows; first, we sation we gain a better understanding of the mean- pretend that Nature possesses Ktrue” cause and effect ing conveyed by causal utterances, such as Ku explains relationships and that these relationships can be repre- sented by a causal network, namely, a directed acyclic *This work was supported in part by the National Sci- graph where each node represents a variable in the do- ence Foundation Grant # IRI-8821444 main and the parents of that node correspond to its 770 MACHINE LEARNING direct causes, as designated by Nature. Next, we as- properties of P allow the recovery of D ? It is shown sume that Nature selects a joint distribution over the that intersection composition and marginal weak tran- variables in such a way that direct causes of a variable sitivity are sufficient properties to ensure that the dag render this variable conditionally independent of all is uniquely recoverable (up to isomorphism) in polyno other variables except its consequences. Nature per- mial time. The recovery algorithm developed consid- mits scientists to observe the distribution, ask ques- erably generalizes the method of Rebane and Pearl for tions about its properties, but hides the underlying the same task, as it does not assume the distribution to causal network. We investigate the feasibility of recov- be dag-isomorph [Pearl 88, Chapter 81. The algorithm ering the network’s topology efficiently and uniquely implies, for example, that the assumption of a multi- from the joint distribution. variate normal distribution is sufficient for a complete This formulation contains several simplifications of recovery of singly-connected dags. the actual task of scientific discovery. It assumes, for example, that scientists obtain the distribution, rather 2 Probabilistic Dependence: than events sampled from the distribution. This as- Background and Definitions sumption is justified when a large sample is available, sufficient to reveal all the dependencies embedded in Our model of an environment consists of a finite set the distribution. Additionally, it assumes that all of variables U and a distribution P over these vari- relevant variables are measurable, and this prevents ables. Variables in a medical domain, for example, us from distinguishing between spurious correlations represent entities such as “cold”, Kheadache”, ‘fever”. [Simon 541 and genuine causes, a distinction that is im- Each variable has a domain which is a set of permis- possible within the confines of a closed world assump- sible values. The sample space of the distribution is tion. Computationally, however, solving this simplified the Cartesian product of all domains of the variables problem is an essential component in any attempt to in U. An environment can be represented graphically deduce causal relationships from measurements, and by an acyclic directed graph (dag) as follows: We se- that is the main concern of this paper. lect a linear order on all variables in U. Each variable It is not hard to see that if Nature were to assign is represented by a node. The parents of a node v totally arbitrary probabilities to the links, then some correspond to a minimal set of variables that make v distributions would not enable us to uncover the struc- conditionally independent of all lesser variables in the ture of the network. However, by employing additional selected order. Each ordering may produce a differ- restrictions on the available distributions, expressing ent graph, for example, one representation of the three properties we normally attribute to causal relation- variables above is the chain headache +- cold -+ fever ships, some structure could be recovered. The basic which is produced by the order cold, headache and requirement is that two independent causes should be- fever (assuming fever and headache are independent come dependent once their effect is known [Pearl 881. symptoms of a cold). Another ordering of these vari- For example, two independent inputs for an AND gate ables: fever, cold, and heuduche would yield the dag become dependent once the output is measured. This headache + cold c- fever with an additional arc be- observation is phrased axiomatically by a property tween fever and headache. Notice that the directional- called Marginal Weak Transitivity (Eq. 9 below). This ity of links may differ between alternative representa- property tells us that if two variables z and y are mu- tions. In the first graph directionality matches our per- tually independent, and each is dependent on their ef- ception of cause-effect relationships while in the second fect c, then II: and y are conditionally dependent for it does not, being merely a spurious by-product of the at least one instance of c. Two additional properties ordering chosen for the construction. We shall see that, of independence, intersection and composition (Eqs. 7, despite the arbitrariness in choosing the construction and 8 below ), are found useful. Intersection is guar- ordering, some directions will be preferred to others, anteed if the distributions are strictly positive and is and these can be determined mechanically. justified by the assumption that, to some extent, all The basis for differentiating alternative representa- observations are corrupted by noise. Composition is tions are the dependence relationships encoded in the a property enforced, for example, by multivariate nor- distribution describing the environment. We regard a mal distributions, stating that two sets of variables X distribution as a dependency model, capable of answer- and Y are independent iff every it; E X and y E Y ing queries of the form UAre X and Y independent are independent. In common discourse, this property given 2 ?” and use these answers to select among is often associated with the notion of “independence”, possible representations. The following definitions and yet it is not enforced by all distributions. theorems provide the ground for a precise formulation of the problem. The theory to be developed in the rest of the paper addresses the following problem.