Unifying Logic and Probability
DOI:10.1145/2699411

Open-universe probability models show merit in unifying efforts.

BY STUART RUSSELL

Key insights
˽ First-order logic and probability theory have addressed complementary aspects of knowledge representation and reasoning: the ability to describe complex domains concisely in terms of objects and relations, and the ability to handle uncertain information. Their unification holds enormous promise for AI.
˽ New languages for defining open-universe probability models appear to provide the desired unification in a natural way. As a bonus, they support probabilistic reasoning about the existence and identity of objects, which is important for any system trying to understand the world through perceptual or textual inputs.

PERHAPS THE MOST enduring idea from the early days of AI is that of a declarative system reasoning over explicitly represented knowledge with a general inference engine. Such systems require a formal language to describe the real world; and the real world has things in it. For this reason, classical AI adopted first-order logic—the mathematics of objects and relations—as its foundation.

The key benefit of first-order logic is its expressive power, which leads to concise—and hence learnable—models. For example, the rules of chess occupy 10⁰ pages in first-order logic, 10⁵ pages in propositional logic, and 10³⁸ pages in the language of finite automata. The power comes from separating predicates from their arguments and quantifying over those arguments: so one can write rules about On(p, c, x, y, t) (piece p of color c is on square x, y at move t) without filling in each specific value for c, p, x, y, and t.
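The scale of that gap can be made concrete with a small computation. The following Python sketch counts the propositional symbols needed just to ground the single atom On(p, c, x, y, t); the piece-type names, the 8 × 8 board, and the 100-move bound on games are illustrative assumptions, as are the variable names:

    # Illustrative sketch: grounding one first-order atom, On(p, c, x, y, t),
    # into propositional symbols. Propositional logic needs a separate symbol
    # for every combination of argument values; a first-order rule quantifies
    # over all of them at once.
    from itertools import product

    pieces = ["K", "Q", "R", "B", "N", "P"]   # assumed piece-type names
    colors = ["White", "Black"]
    squares = [(x, y) for x in range(1, 9) for y in range(1, 9)]
    moves = range(1, 101)                     # assume games of at most 100 moves

    ground_atoms = [f"On_{p}_{c}_{x}_{y}_{t}"
                    for p, c, (x, y), t in product(pieces, colors, squares, moves)]
    print(len(ground_atoms))                  # 6 * 2 * 64 * 100 = 76,800 symbols

A single quantified rule mentioning On(p, c, x, y, t) stands in for statements about all 76,800 ground symbols, which is the source of the page-count gap just described.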
Modern AI research has addressed another important property of the real world—pervasive uncertainty about both its state and its dynamics—using probability theory. A key step was Pearl's development of Bayesian networks, which provided the beginnings of a formal language for probability models and enabled rapid progress in reasoning, learning, vision, and language understanding. The expressive power of Bayes nets is, however, limited. They assume a fixed set of variables, each taking a value from a fixed range; thus, they are a propositional formalism, like Boolean circuits. The rules of chess and of many other domains are beyond them.

What happened next, of course, is that classical AI researchers noticed the pervasive uncertainty, while modern AI researchers noticed, or remembered, that the world has things in it. Both traditions arrived at the same place: the world is uncertain and it has things in it. To deal with this, we have to unify logic and probability. But how? Even the meaning of such a goal is unclear. Early attempts by Leibniz, Bernoulli, De Morgan, Boole, Peirce, Keynes, and Carnap (surveyed by Hailperin¹² and Howson¹⁴) involved attaching probabilities to logical sentences. This line of work influenced AI research but has serious shortcomings as a vehicle for representing knowledge. An alternative approach, arising from both branches of AI and from statistics, combines the syntactic and semantic devices of logic (composable function symbols, logical variables, quantifiers) with the compositional semantics of Bayes nets. The resulting languages enable the construction of very large probability models and have changed the way in which real-world data are analyzed.²¹

Despite their successes, these approaches miss an important consequence of uncertainty in a world of things: uncertainty about what things are in the world. Real objects seldom wear unique identifiers or preannounce their existence like the cast of a play. In areas such as vision, language understanding, Web mining, and computer security, the existence of objects must be inferred from raw data (pixels, strings, and so on) that contain no explicit object references.

The difference between knowing all the objects in advance and inferring their existence from observation corresponds to the distinction between closed-universe languages such as SQL and logic programs and open-universe languages such as full first-order logic. This article focuses in particular on open-universe probability models. The section on Open-Universe Models describes a formal language, Bayesian logic or BLOG, for writing such models. It gives several examples, including (in simplified form) a global seismic monitoring system for the Comprehensive Nuclear-Test-Ban Treaty.

Logic and Probability
This section explains the core concepts of logic and probability, beginning with possible worlds.ᵃ A possible world is a formal object (think "data structure") with respect to which the truth of any assertion can be evaluated.

For the language of propositional logic, in which sentences are composed from proposition symbols X₁, …, Xₙ joined by logical connectives (∧, ∨, ¬, ⇒, ⇔), the possible worlds ω ∈ Ω are all possible assignments of true and false to the symbols. First-order logic adds the notion of terms, that is, expressions referring to objects; a term is a constant symbol, a logical variable, or a k-ary function applied to k terms as arguments. Proposition symbols are replaced by atomic sentences, consisting of either predicate symbols applied to terms or equality between terms. Thus, Parent(Bill, William) and Father(William) = Bill are atomic sentences. The quantifiers ∀ and ∃ make assertions across all objects, for example,

∀ p, c (Parent(p, c) ∧ Male(p)) ⇔ Father(c) = p.

For first-order logic, a possible world specifies (1) a set of domain elements (or objects) o₁, o₂, … and (2) mappings from the constant symbols to the domain elements and from the function and predicate symbols to functions and relations on the domain elements. Figure 1a shows a simple example with two constants and one binary predicate. Notice that first-order logic is an open-universe language: even though there are two constant symbols, the possible worlds allow for 1, 2, or indeed arbitrarily many objects.

A closed-universe language enforces additional assumptions:
– The unique names assumption requires that distinct terms refer to distinct objects.
– The domain closure assumption requires that there are no objects other than those named by terms.

These two assumptions force every world to contain the same objects, which are in one-to-one correspondence with the ground terms of the language (see Figure 1b).ᵇ Obviously, the set of worlds under open-universe semantics is larger and more heterogeneous, which makes the task of defining open-universe probability models more challenging.

The formal semantics of a logical language define the truth value of a sentence in a possible world. For example, the first-order sentence A = B is true in world ω iff A and B refer to the same object in ω; thus, it is true in the first three worlds of Figure 1a and false in the fourth. (It is always false under closed-universe semantics.) Let T(α) be the set of worlds where the sentence α is true; then one sentence α entails another sentence β, written α ⊨ β, if T(α) ⊆ T(β). Logical inference algorithms generally determine whether a query sentence is entailed by the known sentences.
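For the propositional case, these definitions can be prototyped directly. The sketch below is a minimal illustration, with helper names of our choosing (worlds, T, entails); it enumerates the possible worlds of a propositional language and tests entailment as the subset relation T(α) ⊆ T(β):

    # Illustrative sketch: possible worlds for propositional logic, and
    # entailment as a subset test over worlds. A sentence is represented
    # as a Boolean function of a world (an assignment of truth values).
    from itertools import product

    def worlds(symbols):
        """Yield every possible world: one true/false assignment per symbol."""
        for values in product([False, True], repeat=len(symbols)):
            yield dict(zip(symbols, values))

    def T(sentence, symbols):
        """T(alpha): the worlds in which the sentence alpha is true."""
        return [w for w in worlds(symbols) if sentence(w)]

    def entails(alpha, beta, symbols):
        """alpha entails beta iff every world satisfying alpha satisfies beta."""
        return all(beta(w) for w in worlds(symbols) if alpha(w))

    syms = ["X1", "X2"]
    alpha = lambda w: w["X1"] and w["X2"]     # X1 ∧ X2
    beta = lambda w: w["X1"] or w["X2"]       # X1 ∨ X2
    print(entails(alpha, beta, syms))         # True: X1 ∧ X2 entails X1 ∨ X2

No such enumeration is possible for open-universe first-order worlds, of course: the set of worlds is infinite, which is precisely what makes defining probability models over them challenging.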
In probability theory, a probability model P for a countable space Ω of possible worlds assigns a probability P(ω) to each world, such that 0 ≤ P(ω) ≤ 1 and Σω∈Ω P(ω) = 1. Given a probability model, the probability of a logical sentence α is the total probability of all worlds in which α is true:

P(α) = Σω∈T(α) P(ω).    (1)

The conditional probability of one sentence given another is P(α | β) = P(α ∧ β)/P(β), provided P(β) > 0. A random variable is a function from possible worlds to a fixed range of values; for example, one might define the Boolean random variable V_A=B to have the value true in the first three worlds of Figure 1a and false in the fourth. The distribution of a random variable is the set of probabilities associated with each of its possible values. For example, suppose the variable Coin has values 0 (heads) and 1 (tails). Then the assertion Coin ∼ Bernoulli(0.6) says that Coin has a Bernoulli distribution with parameter 0.6, that is, a probability 0.6 for the value 1 and 0.4 for the value 0.

ᵃ In logic, a possible world may be called a model or structure; in probability theory, a sample point. To avoid confusion, this paper uses "model" to refer only to probability models.
ᵇ The open/closed distinction can also be illustrated with a common-sense example.

Figure 1. (a) Some of the infinitely many possible worlds for a first-order, open-universe language with two constant symbols, A and B, and one binary predicate R(x, y). Gray arrows indicate the interpretations of A and B and black arrows connect pairs of objects satisfying R. (b) The analogous figure under closed-universe semantics; here, there are exactly four possible x, y pairs and hence 2⁴ = 16 worlds.

Figure 2. Left: a Bayes net with three Boolean variables, showing the conditional probability of true for each variable given its parents. Right: the joint distribution defined by Equation (2). The values recoverable from the figure are P(B) = 0.003 for Burglary, P(E) = 0.002 for Earthquake, a table for P(A | B, E), and the joint entries:

B E A    P(B, E, A)
t t t    0.0000054
t t f    0.0000006
t f t    0.0023952
t f f    0.0005988
f t t    0.0007976
(remaining rows truncated in the source)
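The visible numbers in Figure 2 can be checked with a few lines of code. The sketch below (variable names ours) applies the factorization for this particular network, P(B, E, A) = P(B) P(E) P(A | B, E). The conditional probabilities 0.9, 0.8, and 0.4 are recovered from the joint entries shown above; the entry P(A = t | B = f, E = f) is cut off in the source, so the corresponding rows are omitted rather than guessed. The last lines also exercise Equation (1): P(B ∧ E) is the total probability of the worlds in which B ∧ E holds.

    # Illustrative check of Figure 2 (names ours). For this network the joint
    # factors as P(B, E, A) = P(B) * P(E) * P(A | B, E).
    P_B = 0.003                      # P(Burglary = true)
    P_E = 0.002                      # P(Earthquake = true)
    P_A = {(True, True): 0.9,        # P(Alarm = true | B, E), recovered from
           (True, False): 0.8,       # the joint entries visible in Figure 2;
           (False, True): 0.4}       # the (False, False) entry is not visible.

    def joint(b, e, a):
        """P(B=b, E=e, A=a) for the three-variable network of Figure 2."""
        pb = P_B if b else 1.0 - P_B
        pe = P_E if e else 1.0 - P_E
        pa = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
        return pb * pe * pa

    # The five joint entries visible in Figure 2:
    assert abs(joint(True, True, True) - 0.0000054) < 1e-12
    assert abs(joint(True, True, False) - 0.0000006) < 1e-12
    assert abs(joint(True, False, True) - 0.0023952) < 1e-12
    assert abs(joint(True, False, False) - 0.0005988) < 1e-12
    assert abs(joint(False, True, True) - 0.0007976) < 1e-12

    # Equation (1): P(B ∧ E) sums P(ω) over the worlds where B ∧ E is true.
    p_b_and_e = joint(True, True, True) + joint(True, True, False)
    assert abs(p_b_and_e - P_B * P_E) < 1e-15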