It Has Been Said That…

“For every belief comes either through syllogism or from induction”
Aristotle (II, §23)

Brendan Juba                          Loizos Michael
Washington Univ. St. Louis            Open University of Cyprus
[email protected]                       [email protected]
www.cse.wustl.edu/~bjuba              cognition.ouc.ac.cy/loizos

B. Juba supported by an AFOSR Young Investigator Award and NSF Award CCF‐1718380

Syllogism xor Induction

Artificial Intelligence research today?

AAAI 2018 (New Orleans, Louisiana, U.S.A.), February 03, 2018
AAAI 2018 Tutorial: Integrating Learning into Reasoning, Juba and Michael

High‐Level Tutorial Roadmap

A. Why KRR Should Embrace Learning

B. Introduction to PAC‐Semantics

C. Integrating Deduction and Induction

D. Reasoning Non‐Monotonically

E. Overall Summary and Conclusions


High‐Level Tutorial Roadmap

A. Why KRR Should Embrace Learning

B. Introduction to PAC‐Semantics

C. Integrating Deduction and Induction

D. Reasoning Non‐Monotonically

E. Overall Summary and Conclusions

A. Why KRR Should Embrace Learning
Review of Standard KRR Problems

Representation and Reasoning

• Propositions (p, q, …). Connectives (∧, ¬, …).
• Implications: φ → x. Equivalences: φ ↔ x.
• Reasoning semantics through entailment ⊨.
• Proof procedures ⊢ to compute entailment.
• Given formulas in KB and an input O, deduce whether a result R is entailed (KB ⋃ O ⊨ R).
• Given formulas in KB and an input O, abduce an explanation E that entails O (KB ⋃ E ⊨ O).

Relational Representation and IQEs

• Predicates (p, q, …). Variables (x, y, …). Tokens (ti). Connectives (∧, ¬, …). Quantifiers (∀, ∃).
• From the implications / equivalences, consider only those whose body comprises independently quantified expressions (IQEs).
• ∃y∃z [ num(y) ∧ num(z) ∧ larger(z,y) ∧ ¬div(y,z) ]
• No tokens (they carry no meaning), small arity.
• ∀xs [ formula over IQEs → head_predicate(xs) ]
• (see later: these restrictions support learnability)


Non‐Monotonic Reasoning

• Non‐monotonicity typically viewed as extending input O for fixed KB, and having result R become “smaller”.
• Useful also when extending KB, as a prerequisite to elaboration tolerance.
• Will use logic‐based argumentation for NMR.
• Most (all?) major NMR formalisms have been reformulated in terms of argumentation.
• Compatible with human cognition [Kakas+ ’16].

Formal Argumentation in AI

• Abstract Argumentation framework ⟨Arg, Att⟩:
• Arg is a set of arguments (no internal structure), and
• Att is a binary relation on Arg (lifted on sets).
• Goal: Find S ⊆ Arg that defends against all its attacks.
• Several ways to make this precise [Dung ’95].
• Structured argumentation (e.g., ABA, ASPIC+): an argument is a classical proof from inputs and KB rules. The overall reasoning is not classical.


Activity: Logic‐Based Arguments

treated implies will_survive.
fatal implies ¬will_survive.
viral_meningitis implies meningitis.
bacterial_meningitis implies meningitis.
bacterial_meningitis implies fatal.
fatal implies ¬treatable.
meningitis implies ¬fatal.
meningitis implies treatable.
¬fatal implies will_survive.
true implies ¬meningitis.
(rules in a preferred ordering; for simplicity, a total one)
What conclusions should we draw on each input? {}, {viral_meningitis}, {bacterial_meningitis}, {bacterial_meningitis, treated}

Reasoning in Temporal Settings

• Frame Problem (commonsense law of inertia): Properties persist unless caused to stop.
• If you see a bird flying, it is flying a bit later.
• Persistence rules are weaker than causal rules.
• Ramification Problem: Production of indirect effects as a way to satisfy state constraints.
• If you shoot a bird, it stops being able to fly.
• Encode ramifications as causal rules (since constraints could also qualify causal change).

Reasoning in Temporal Settings

• Qualification Problem: Effects of actions are blocked if they would violate state constraints.
• If you scare a dead bird, it does not fly away.
• This constraint does not produce ramifications.
• Encode constraints as preclusion / block rules.
• State Default Problem [Kakas+ ’08]: If a state constraint is violated, the exception persists.
• If you see a flying penguin, it remains one.
• Persistence rules are stronger than constraints.

Activity: Story Understanding

What follows from this story?
Papa Joe woke up early at dawn, and went off to the forest. He walked for hours, until the sight of a turkey in the distance made him stop. A bird on a tree nearby was cheerfully chirping away, building its nest. He carefully aimed at the turkey, and pulled the trigger of his shotgun. Undisturbed, the bird nearby continued chirping.
Q1: What is the condition of the turkey? (a) Alive and unharmed. (b) Dead. (c) Injured.


… Through Argumentation

scene 1: forest(t1) at 1. turkey(t2) at 2. in(t2,t1) at 2.
scene 2: bird(t3) at 4. nearby(t3) at 4. chirping(t3) at 4. gun(t4) at 6. aim(t4,t2) at 6. pull_trigger(t4) at 7.
scene 3: nearby(t3) at 4–9. chirping(t3) at 4–9.
(tokens have no meaning)

r1: turkey(X), alive(X), in(X,P), aim(G,X), … persist.
r2: turkey(X), in(X,P), forest(P) implies alive(X).
r3: aim(G,X), pull_trigger(G) causes fired_at(G,X).
r4: fired_at(G,X), gun(G), turkey(X) causes ¬alive(X).
r5: fired_at(G,X), gun(G), turkey(X) causes noise.
r6: noise, bird(B), nearby(B) causes ¬chirping(B).
r7: ¬loaded(G), gun(G) precludes fired_at(G,X).
(rule bodies are IQEs; arguments with implicit preferences) [Diakidoy+ ’14,15]

A. Why KRR Should Embrace Learning
The Task of Knowledge Acquisition


The Declaration of Independence (of KRR from Machine Learning)

We hold these truths to be self-evident:
• an appropriate knowledge base is given
• reasoning is evaluated against the KB
• chaining of rules is trivially beneficial
• KB rules can be applied in any order
• acquisition of KB can be done a priori

What if KB not Given but Learned?

[diagram: the real world yields a partial model of the world, observed during training to give a learned KB, which is then used for reasoning]
• Then, KB is Probably Approximately Correct:
• improbable that the samples were unrepresentative
• future predictions will be approximately correct

Program, Memorize, or Induce?

• Examples and Labels:
  1101001 → 0
  0101011 → 0
  0001110 → …
• Target Concept: f : {0,1}^n → {0,1}
• Concept Class C: the bias; the possible target concepts.
• Hypothesis Class H: all acceptable hypotheses.
• Complexity: examples / time / memory to learn.

Typical Boolean Target Concepts

• Conjunction: x3 ∧ x5 ∧ x7
  e.g., 1011101: the conjunction of x3, x5, x7 is 1
• Parity function: x2 ⊕ x3 ⊕ x5 ⊕ x6
  e.g., 0010101: the parity of x2, x3, x5, x6 is 0
• Decision list: ⟨x4,1⟩, ⟨x2,0⟩, ⟨x7,1⟩, ⟨default,0⟩
  e.g., 1010011: x4, x2 false; first true: x7; value: 1
• Linear threshold (hyperplane, perceptron): weights = 0.3, 0.1, 0.5, 0.2, 0.7, 0.9; threshold = 1
  e.g., 110100: ⟨110100, weights⟩ = 0.6 < threshold
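The concept classes above can be checked mechanically; the sketch below evaluates the slide's example concepts on its example bit‐strings (the 1-indexed, left-to-right variable convention and the function names are our assumptions).

```python
# Evaluating the slide's example concepts on its example bit-strings.
# A string like "1011101" assigns x1..x7 left to right (our convention).

def bit(x, i):
    """Value of variable x_i (1-indexed) in bit-string x."""
    return int(x[i - 1])

def conjunction(x):
    """x3 AND x5 AND x7."""
    return bit(x, 3) & bit(x, 5) & bit(x, 7)

def decision_list(x):
    """<x4,1>, <x2,0>, <x7,1>, <default,0>."""
    for var, val in [(4, 1), (2, 0), (7, 1)]:
        if bit(x, var) == 1:
            return val
    return 0

def linear_threshold(x, w, theta):
    """1 iff the weighted sum of the bits reaches the threshold."""
    return int(sum(wi * bit(x, i + 1) for i, wi in enumerate(w)) >= theta)

print(conjunction("1011101"))    # 1: x3 = x5 = x7 = 1
print(decision_list("1010011"))  # 1: x4, x2 are 0; first true item is x7
print(linear_threshold("110100", [0.3, 0.1, 0.5, 0.2, 0.7, 0.9], 1.0))
# 0: 0.3 + 0.1 + 0.2 = 0.6 < 1
```

Each function is one possible target concept f; a concept class C fixes which of these forms the learner is allowed to output.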


Activity: Is Learning Even Possible?

• Suppose we have seen
  f(1,1) = 2, f(1,3) = 4,
  f(2,2) = 4, f(2,3) = 5,
  f(3,1) = 4, f(3,2) = 5,
  and have found some h that agrees with these.
  [table of observed f values, with the remaining entries unseen]
• Whatever hypothesis hn we find, an adversary ex post chooses the value of f on the (n+1)‐th example to differ from the prediction of hn.

Evaluation of Reasoning Process

• Evaluate reasoning process against given KB:
• improve completeness, insisting on full soundness
• okay since KB is considered the golden standard
• not okay when KB is only approximately correct
• Evaluate reasoning process when KB learned:
• improve completeness, without compromising soundness much more than what is necessary
• soundness and completeness wrt an “ideal KB” (access only to its partial models during training)


Real World, the KB, and Reasoning

              traditional KRR                     with learned KB
real world:   no access(!)                        partial models
the KB:       globally correct                    individual rules approx. correct (rules qualified)
reasoning:    against the KB                      against real world
              unsoundness = 0, completeness ++    unsoundness <<, completeness ++

Is Chaining Trivially Beneficial?

• Yes, if rules in the KB are given / programmed.
• Given: “if you have fever then you are sick”
• Given: “if you are sick then visit a doctor”
• Infer “visit a doctor” given “you have fever”?
• Prima facie no, if rules in the KB are learned.
• Learning can render rule chaining superfluous (cf. shortcuts, heuristics, fast thinking, hunch).
• Learned: “if you have fever then visit a doctor”

High‐Level Tutorial Roadmap

A. Why KRR Should Embrace Learning

B. Introduction to PAC‐Semantics

C. Integrating Deduction and Induction

D. Reasoning Non‐Monotonically

E. Overall Summary and Conclusions


Motivating example: birds.com Analytics

birds.com™: To determine: is ¬[FOOD=FISH] “true” for the data source? (“true… with high probability?”)

Day | Bird no. | Food
107 | 48 | Seed
107 | 49 | Grubs
107 | 50 | Mouse
107 | 51 | Mouse
107 | 52 | Worm
107 | 53 | Fish
107 | 54 | Mouse
107 | 55 | Grubs
…

B. Introduction to PAC‐Semantics
Semantics for Propositional Logic

PAC‐semantics for propositional logics

• Fixed, finite set of propositional variables: [FOOD=SEED], [FOOD=GRUBS], [FOOD=MOUSE], [FOOD=WORM], [FOOD=FISH], …, [FLIES], [SWIMS], [HAS_BILL], [HAS_BEAK], [COLOR=BLACK], [COLOR=RED], [COLOR=BROWN], [COLOR=WHITE], [COLOR=GREEN], [COLOR=BLUE], …
• Probability distribution D over Boolean valuations for the propositional variables
• Usually propositional variables capture attributes of data entry, sensor values, etc.
• D captures range of possible combinations of values and their relative frequency
• NOTE: generally not uniform, not independent
☞ Definition. [Valiant ’00] A formula φ(x1,…,xn) is (1‐ε)‐valid under D if PrD[φ(x1,…,xn)] ≥ 1‐ε.
• ε may or may not be small (cf. [Adams ’75])

PAC‐semantics for propositional logics

☞ Definition. [Valiant ’00] A formula φ(x1,…,xn) is (1‐ε)‐valid under D if PrD[φ(x1,…,xn)] ≥ 1‐ε.
• ε may or may not be small (cf. [Adams ’75])
• WHAT INFERENCES PRESERVE (1‐ε)‐VALIDITY?

Classical entailment to PAC entailment: the union bound

• Consider the events [¬h1],…,[¬hk], with probability ≤ ε1, …, ≤ εk respectively.
• Union Bound: the total probability of [¬h1 ∨ … ∨ ¬hk] is at most ε1+…+εk, so probability ≥ 1‐(ε1+…+εk) remains.
• Theorem. [Juba ’13] Suppose h1,…,hk are, respectively, (1‐ε1)‐, …, (1‐εk)‐valid, and {h1,…,hk} ⊧ φ (classical entailment). Then φ is (1‐(ε1+…+εk))‐valid.


Classical entailment to PAC entailment: the union bound

• Consider the events [¬h1],…,[¬hk]
• Union bound: the event [¬h1∨…∨¬hk] = [¬(h1∧…∧hk)] has total probability ≤ ε1+…+εk
• Classical entailment: [h1∧…∧hk] ⊆ [φ]
• Therefore: PrD[φ] ≥ PrD[h1∧…∧hk] ≥ 1 ‐ (ε1+…+εk) ∎
• Summary: all classical inferences are preserved, but each additional premise may incur a cost

PAC‐semantics does not mean Probability Logic (e.g., Nilsson ’86)

PAC‐semantics:
• Classical Boolean object language
• Classical (Tarskian) semantics
• Classical proof of a formula guarantees it holds with some probability under D

Logics of Probability:
• Probability bounds in object language
• Classical proof of a probability bound on D that is true with certainty
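The union‐bound argument can be sanity‐checked numerically. A minimal sketch, assuming a toy distribution D over two variables and premises h1 = a and h2 = ¬a ∨ b, which classically entail φ = b:

```python
import random

random.seed(0)

# Premises and conclusion: {h1, h2} |= phi classically.
def h1(v): return v["a"]                   # h1 = a
def h2(v): return (not v["a"]) or v["b"]   # h2 = a -> b
def phi(v): return v["b"]                  # phi = b

def draw():
    # An assumed distribution D under which each premise is highly valid.
    a = random.random() < 0.95
    b = a if random.random() < 0.97 else (not a)
    return {"a": a, "b": b}

m = 100_000
samples = [draw() for _ in range(m)]
eps1 = sum(not h1(v) for v in samples) / m
eps2 = sum(not h2(v) for v in samples) / m
eps_phi = sum(not phi(v) for v in samples) / m

# Every valuation falsifying phi also falsifies h1 or h2, so the
# empirical failure rates obey the union bound exactly:
print(eps_phi <= eps1 + eps2)  # True
```

The inequality holds sample‐by‐sample, not just in expectation, because classical entailment makes [¬φ] a subset of [¬h1] ∪ [¬h2].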


Motivating example: birds.com Analytics

birds.com™: ¬[FOOD=FISH]? Seems ≈ 7/8‐valid… justified on what grounds??

Day | Bird no. | Food
107 | 48 | Seed
107 | 49 | Grubs
107 | 50 | Mouse
107 | 51 | Mouse
107 | 52 | Worm
107 | 53 | Fish
107 | 54 | Mouse
107 | 55 | Grubs
…

B. Introduction to PAC‐Semantics
Learning and Inductive Inference

Knowledge acquisition from data: birds.com Analytics

birds.com™: To determine: is ¬[FOOD=FISH] (1‐ε)‐valid for D? (D is arbitrary)
Assume: Each entry (row) is an “example,” drawn independently from D.

The “i.i.d. data” assumption

• Assume: data consists of examples, i.e., valuations of the propositional variables, drawn independently from a common D (“i.i.d.”)
• Will enable us to draw conclusions about D
• Recall: if A,B,C,… are (mutually) independent, then Pr[A∧B∧C∧…] = Pr[A] Pr[B] Pr[C] ...
• Likewise, if random variables W,X,Y,... are independent then E[WXY…] = E[W] E[X] E[Y] ... (and E[f(W)g(X)h(Y)…] = E[f(W)] E[g(X)] E[h(Y)] ...)

Inference from i.i.d. examples

• Assume: data consists of examples, i.e., valuations of the propositional variables, drawn independently from a common D (“i.i.d.”)
• Will enable us to draw conclusions about D
• Suppose φ is not (1‐ε)‐valid under D.
• Draw a data set of m “examples” X(1) = (X1(1),…,Xn(1)), X(2) = (X1(2),…,Xn(2)), ..., X(m) = (X1(m),…,Xn(m)) independently from D.
• Pr[φ(X(1)), φ(X(2)), …, and φ(X(m))] = Pr[φ(X(1))] Pr[φ(X(2))] … Pr[φ(X(m))]
• By hypothesis, each Pr[φ(X(i))] ≤ 1‐ε
• So, the probability that we fail to observe that φ is false for some example X(i) is at most (1‐ε)^m ≤ e^(‐εm) (…since for all x, 1+x ≤ e^x)
• Less than any given δ if m ≥ (1/ε) ln(1/δ)
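The final bound above gives a concrete sample‐size calculator (a sketch; the function name is ours):

```python
import math

def pac_test_size(eps, delta):
    """Smallest m with (1 - eps)^m <= e^(-eps * m) <= delta: a formula
    consistent with that many i.i.d. examples is (1-eps)-valid with
    probability at least 1 - delta."""
    return math.ceil((1 / eps) * math.log(1 / delta))

# Certifying 0.99-validity with 99% confidence takes only ~461 examples:
print(pac_test_size(0.01, 0.01))  # 461
```

Note the logarithmic dependence on 1/δ: extra confidence is cheap compared to extra accuracy.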


Inference from i.i.d. examples

• So we have shown:
  Theorem. If φ is consistent with m ≥ (1/ε) ln(1/δ) examples drawn independently from D, then with probability 1‐δ, φ is (1‐ε)‐valid under D. (“Probably Approximately Correct”)
• Only guaranteed to work if φ is actually always true. What if φ is only (1‐ε’)‐valid under D, for some ε’ < ε? (e.g., only 7/8‐valid, in the case of birds.com?)

Chernoff/Hoeffding bounds: sample averages are good estimates

• Let Y(1), Y(2), ..., Y(m) be independent random variables taking values in [0,1]. Let μ = E[(1/m)(Y(1)+Y(2)+...+Y(m))].
• Hoeffding bound: for any γ > 0,
  Pr[(1/m)(Y(1)+Y(2)+...+Y(m)) > μ+γ] < e^(‐2mγ²)
  Pr[(1/m)(Y(1)+Y(2)+...+Y(m)) < μ‐γ] < e^(‐2mγ²)
• Chernoff bound: for any γ > 0,
  Pr[(1/m)(Y(1)+Y(2)+...+Y(m)) > (1+γ)μ] < e^(‐mμγ²/3)
  Pr[(1/m)(Y(1)+Y(2)+...+Y(m)) < (1‐γ)μ] < e^(‐mμγ²/2)
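The Hoeffding tail bound can be observed empirically; a sketch with assumed Bernoulli variables and arbitrary parameter choices:

```python
import math
import random

random.seed(1)

# Empirically checking the one-sided Hoeffding bound for Bernoulli(mu)
# variables; mu, m, gamma and the trial count are arbitrary choices.
mu, m, gamma, trials = 0.3, 200, 0.1, 20_000
bound = math.exp(-2 * m * gamma ** 2)  # Hoeffding tail bound, about 0.018

exceed = 0
for _ in range(trials):
    avg = sum(random.random() < mu for _ in range(m)) / m
    exceed += avg > mu + gamma

# The observed frequency of large deviations stays below the bound.
print(exceed / trials, "<=", bound)
```

The observed deviation frequency is typically far below the bound; Hoeffding is distribution‐free and therefore conservative.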

Inference using Chernoff/Hoeffding

• We draw X(1), X(2), ..., X(m) independently from D.
• For p = (1/m)([φ(X(1))]+[φ(X(2))]+…+[φ(X(m))]), how large must m be to conclude that with probability 1‐δ, φ is (p±γ)‐valid under D?
• Use the Hoeffding bound: for any γ > 0,
  Pr[(1/m)(Y(1)+Y(2)+...+Y(m)) > μ+γ] < e^(‐2mγ²)
  Pr[(1/m)(Y(1)+Y(2)+...+Y(m)) < μ‐γ] < e^(‐2mγ²)
• TRY IT!
• For m ≥ (1/(2γ²)) ln(2/δ), can check that e^(‐2mγ²) ≤ δ/2; take a union bound of the upper and lower bounds to conclude p is within ±γ of PrD[φ(X)].
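Working out the “TRY IT!” step: solving e^(−2mγ²) ≤ δ/2 for m gives the calculator below (a sketch; the function name is ours).

```python
import math

def hoeffding_sample_size(gamma, delta):
    """Smallest m with e^(-2 m gamma^2) <= delta/2, so that (by a union
    bound over the upper and lower tails) the empirical validity p is
    within +/- gamma of Pr_D[phi] with probability at least 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * gamma ** 2))

m = hoeffding_sample_size(0.05, 0.01)
print(m)  # 1060
```

Unlike the earlier consistency test, this works even when φ is only, say, 7/8‐valid: it estimates the validity rather than certifying near‐perfection.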


Occam’s Razor [Blumer et al. ’87]

• Theorem. Let H be a set of formulas that can be specified using at most B bits. Suppose we draw X(1), X(2), ..., X(m) independently from D for m ≥ (1/(2γ²)) ((B+1) ln 2 + ln(2/δ)). With probability 1‐δ, (1/m)([h(X(1))]+[h(X(2))]+…+[h(X(m))]) is within ±γ of PrD[h(X)] for every h in H.
• We know: (1/m)([h(X(1))]+[h(X(2))]+…+[h(X(m))]) is within ±γ of PrD[h(X)] for any single h in H with probability 1‐δ/2^(B+1).
• There are < 2^(B+1) strings of at most B bits, so there are fewer than 2^(B+1) h in H. We take a union bound.

Comparison: PAC‐semantics versus Inductive Logic Programming

PAC‐semantics:
• Examples drawn from larger distribution D
• D mostly unseen
• Rules partially capture D
• Rules can be (a little) inconsistent with examples

Inductive Logic Programming:
• Examples define domain
• Closed‐world assumption
• Rules fully capture domain
• Rules must be faithful to defining examples
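The Occam bound turns description length into a sample size; a sketch (the function name and the n = 20 illustration are ours; the 2n‐bit encoding of a conjunction is used on the following slides):

```python
import math

def occam_sample_size(bits, gamma, delta):
    """m >= (1/(2 gamma^2)) ((B+1) ln 2 + ln(2/delta)): enough i.i.d.
    examples for every hypothesis describable in at most B bits to have
    empirical validity within +/- gamma of the truth, simultaneously,
    with probability at least 1 - delta."""
    return math.ceil(((bits + 1) * math.log(2) + math.log(2 / delta))
                     / (2 * gamma ** 2))

# A conjunction over n = 20 variables is describable in B = 2n = 40 bits:
print(occam_sample_size(40, 0.1, 0.01))  # 1686
```

The sample size grows only linearly in B, which is what makes simultaneous guarantees over exponentially many hypotheses affordable.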


(Simplified) Learning to Reason using Occam’s Razor [Khardon‐Roth ’97]

• Recall: Deciding if a 3‐DNF (ORs of ANDs of at most three literals) is valid is NP‐complete.
• There are 2^O(n³) 3‐DNFs.
• Theorem. Suppose we draw X(1), X(2), ..., X(m) independently from D for m = O((1/γ²)(n³+ln(1/δ))). Then with probability 1‐δ, (1/m)([h(X(1))]+[h(X(2))]+…+[h(X(m))]) is within ±γ of PrD[h(X)] for every 3‐DNF h.
• In particular, if all h(X(i)) = 1, we guarantee h is at least (1‐γ)‐valid with probability 1‐δ.

Learning a chaining rule using Elimination [Valiant ’84]

• Suppose xt is defined by a conjunction, i.e., xt ⟺ l1∧l2∧…∧lk where l1,l2,...,lk are literals.
• There are only 2^(2n) conjunctions of literals on n propositional variables.
• Occam’s Razor: any rule xt ⟺ l1∧l2∧…∧lk that is consistent with m = O((1/γ²)(n+ln(1/δ))) examples is (1‐γ)‐valid with probability 1‐δ.
• We only need to find such a rule.

Learning a chaining rule using Elimination [Valiant ’84]

• We only need to find xt ⟺ l1∧l2∧…∧lk consistent with m = O((1/γ²)(n+ln(1/δ))) examples.
• Elimination: for the ith example, if Xt(i) = 1, then any literal l with l(X(i)) = 0 cannot be included in the rule. Delete it.
• The rule given by the conjunction of the remaining literals must contain the actual defining conjunction l1∧l2∧…∧lk.
• Therefore, we find a conjunction that is consistent with all m examples, as needed.
• Equivalently: we find a (1‐γ)‐valid rule using O((1/γ²)(n+ln(1/δ))) examples by taking the conjunction of all literals l for which, whenever l(X(i)) = 0, Xt(i) = 0.
• As stated, the algorithm runs in time n · O((1/γ²)(n+ln(1/δ))), so this is efficient.
• Note: actually, O((1/γ)(n+ln(1/δ))) examples will do.

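The elimination procedure of the last two slides fits in a few lines; a sketch over dict‐valued examples (the encoding of examples and literals is our assumption):

```python
# Valiant-style elimination for learning a conjunctive definition
# x_t <=> l1 AND ... AND lk. A literal is a pair (variable, required
# value): (v, 1) means v, (v, 0) means NOT v.

def eliminate(examples, target, variables):
    literals = {(v, b) for v in variables for b in (0, 1)}  # all 2n literals
    for x in examples:
        if x[target] == 1:
            # A literal falsified by a positive example cannot be in the rule.
            literals -= {(v, b) for (v, b) in literals if x[v] != b}
    return literals  # still contains the defining conjunction

exs = [
    {"x1": 1, "x2": 0, "x3": 1, "xt": 1},
    {"x1": 1, "x2": 1, "x3": 1, "xt": 1},
    {"x1": 0, "x2": 1, "x3": 1, "xt": 0},  # negative examples are ignored
]
print(sorted(eliminate(exs, "xt", ["x1", "x2", "x3"])))
# [('x1', 1), ('x3', 1)]: only x1 AND x3 survives both positive examples
```

Since deletions only ever remove literals falsified on positives, the true defining literals are never deleted, which is exactly the containment argument on the slide.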

PAC Learning (with Complete Info)

[diagram: D generates an example exm = 010110; the target f assigns it the label α = 0]
Given access to (exm, α)’s drawn during training, w.h.p. and efficiently produce h s.t. on (exm’, α’) drawn during testing, h(exm’) ≠ α’ w.p. at most ε.

B. Introduction to PAC‐Semantics
Coping with Partial Information


Learning and Prediction Scenarios

• Construct hypothesis to predict target x1.
  1101100011011 (learning)
  ?011101001010 (prediction)
• What if we want to predict multiple xj’s?
  1101100011011 (learning)
  ?01?1??0010?0 (prediction)
• What if we want to learn autonomously?
  1?011?001?0?? (learning)
  ?01?10?001010 (prediction)

Complete the Missing Value

[table: health records with many blank cells, over the attributes Marital Status, Age, Smoking Habits, Drinking Habits, Blood Pressure, Cholesterol Level, Exercising Habits; entries such as “single”, “49”, “cigarettes”, “130/90”, “high”, “jogging” appear, with the remaining cells missing]

Structure in Missingness?

• Q: What is the number of your credit card?
• The responder may not wish to share it.
• What if the responder does not have one?
• Q: When was the last time you ate apples?
• The responder may have a poor memory.
• Q: Were you ever convicted for murder?
• The responder may not wish to answer…
• … especially if the answer would be “yes”!

Autodidactic Learning (PAC + Missing)

[diagram: nature’s D generates exm = 010110 with label α from f; a mask hides values, yielding obs = 0??1?0]
Given access to (obs, α)’s drawn during training, w.h.p. and efficiently produce h s.t. on (obs’, α’) drawn during testing, h(obs’) ≠ α’ w.p. at most ε.


Reading Between the Lines [Michael ’09]

• Given access only to partial observations,
• e.g., predict that x1 = 0 in ?010??11.

Learning the Hidden Reality

• Need learning able to:
• identify structure in the hidden example,
• without knowing how masking works.
• Autodidactic learning [Michael ’08,10]:
• Suffices to learn rules consistent with obs.
• W.h.p., these rules make accurate predictions.


Evaluation of an Individual Rule

evaluate rule (x5 ∧ x7) → x3:

example | observation | inference | consistent | accurate
0010110 | 0?10??0 | 1 | Yes | Yes
0100100 | ?10??00 | 1 | No | No
0100100 | ?10??0? | ? | Yes | Abstain
0010110 | 0??01?? | 1 | Yes | Yes
0100100 | 0??01?? | 1 | Yes | No

Theorem: For each mask there is η s.t. each (1‐ηε)‐consistent rule is (1‐ε)‐accurate. The “discount” is tight.

When to Abstain from Predictions

• Choosing to abstain would trivialize learning.
• Hypotheses are still Boolean functions…
• Abstain only if hypothesis not fully determined.
• Assume hypothesis is h = x1 ∧ x3 ∧ x7.
• If observation is x = 10?1101, then h(x) = ?.
• If observation is x = 10?1100, then h(x) = 0.
• Does missing info “kill” our ability to predict?
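Determined‐or‐abstain evaluation is easy to sketch for a conjunction like h = x1 ∧ x3 ∧ x7 (the '?'-for-masked-bits string encoding follows the slides; the function name is ours):

```python
# Determined-or-abstain evaluation of a conjunction over a partial
# observation; '?' marks a masked bit, as in the slides.

def eval_conjunction(obs, vars_in_h):
    vals = [obs[i - 1] for i in vars_in_h]  # variables are 1-indexed
    if "0" in vals:
        return "0"   # falsified no matter what the masked bits are
    if "?" in vals:
        return "?"   # abstain: the hypothesis is not fully determined
    return "1"

h = (1, 3, 7)  # h = x1 AND x3 AND x7
print(eval_conjunction("10?1101", h))  # '?': x3 is masked, the rest are 1
print(eval_conjunction("10?1100", h))  # '0': x7 = 0 decides, despite the mask
```

Note the asymmetry: one observed 0 settles a conjunction, while a single masked conjunct forces abstention only when everything else is 1.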

Activity: Predict the Hidden Animal

Autodidactic Learnability [Michael ’08,10]

• Theorem: All classes of monotone and read‐once formulas that are PAC‐learnable are also autodidactically learnable (arbitrary masking).
• Learning algorithm:
• Ignore observations where the label is masked.
• During training, map 0??1?0 to 0αα1α0 (each masked value replaced by the label α).
• PAC‐learn from the resulting (complete) instances.
• Theorem: Parities and monotone term 1‐decision lists are not properly learnable.


Causal Learnability [Michael ’11]

[diagram: a causal prediction setting: nature evolves an example drawn from D (010110 → 110111); the hypothesis h must match f’s prediction (101) from a partial view (?101)]

B. Introduction to PAC‐Semantics
Dealing with First‐Order Expressions

An Example of Learned Knowledge

• NL query: “members share something”
• logic form: member(t1), something(t3), share(t1,t3)
• knowledge (trained on “spyware” web pages):
  file(x) ← threshold(1.0) % pos: 1852, neg: 1831
    ∃v : scan(v,x) ∧ rogue(v)    weight(0.962710) % pos: 16, neg: 1
    ∃v : share(v,x)              weight(1.627098) % pos: 11, neg: 1
    ∃v : have(x,v) ∧ program(v)  weight(0.645691) % pos: 19, neg: 0
    ∃v : open(v,x)               weight(1.593269) % pos: 27, neg: 2
• inference: file(t3)
• NL answer: “something is a file” [Michael ’13]

How To Learn Relational Rules?

• Looking to learn a rule in the correct form:
• No tokens (they carry no meaning), small arity.
• ∀xs [ formula over IQEs → head_predicate(xs) ]
• The formula belongs in a concept class that is known to be PAC learnable (cf. [Valiant ’00]).
• Linear thresholds with propositional features.
• Recall: also learnable from partial observations.
• Relational Scenes → Propositional Examples → Propositional Learning → Relational Hypothesis

Activity: Learn a Linear Threshold

• Initially assign weight 0.8 to every proposition.
• When a negative example is predicted true: divide by 2 the weights of the true propositions.
• When a positive example is predicted false: multiply by 2 the weights of the true propositions.
• Hidden target concept: 0.3, 0.0, 0.0, 0.7, 0.5
• Examples: 000110, 011101, 101011, 001110, 110010, 011001, 111111, 101011, 001101, 011101.
• Computed hypothesis: 0.2, 0.1, 0.05, 0.8, 0.2

Examples from Relational Scenes

• Instead of Boolean learning examples, we get relational scenes giving instances of predicates: file(t2), scan(t7,t2), rogue(t7), ¬file(t5), ¬open(t5,t2).
• For a head (e.g., file(x)), consider all IQEs that follow a schema (e.g., scan ∧ rogue):
  scan(x,x) ∧ rogue(x), ∃v : scan(x,x) ∧ rogue(v),
  ∃v : scan(x,v) ∧ rogue(x), ∃v : scan(x,v) ∧ rogue(v),
  ∃v : scan(v,x) ∧ rogue(x), ∃v : scan(v,x) ∧ rogue(v),
  ∃u,v : scan(u,v) ∧ rogue(v), ∃u,v : scan(u,v) ∧ rogue(u).
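One round of the activity's update rule, sketched in Python (the threshold 1.0 and the example's negative label are our assumptions for illustration; this is a Winnow‐style multiplicative update):

```python
# Sketch of the activity's multiplicative (Winnow-style) update rule;
# the threshold value 1.0 is an assumption.

def predict(w, x, theta=1.0):
    """Predict true iff the weights of the true bits reach the threshold."""
    return sum(wi for wi, xi in zip(w, x) if xi == "1") >= theta

def update(w, x, label, theta=1.0):
    pred = predict(w, x, theta)
    if pred and not label:      # negative example predicted true
        return [wi / 2 if xi == "1" else wi for wi, xi in zip(w, x)]
    if label and not pred:      # positive example predicted false
        return [wi * 2 if xi == "1" else wi for wi, xi in zip(w, x)]
    return w                    # correct prediction: no change

w = [0.8] * 6                   # initial weights
# Suppose 000110 is labeled negative: 0.8 + 0.8 = 1.6 >= 1, a mistake,
# so the weights of x4 and x5 are halved.
w = update(w, "000110", False)
print(w)  # [0.8, 0.8, 0.8, 0.4, 0.4, 0.8]
```

Only the weights of the bits that are true in the mistaken example change, which is what makes the update cheap and attribute‐efficient.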


Propositionalization of Learning

• Full info. Small arity. Few tokens.
[diagram: a relational scene over tokens 1–5 is grounded into all atoms P(1), P(2), …, P(5); P(1,1), …, P(2,3), …, P(5,5); P(1,1,1), …, P(3,5,4), …, P(5,5,5); the scene thereby becomes one long propositional example, e.g., 10000000000010…00…010…000001; for a chosen head token (x = 3 or x = 5), the features are IQEs such as P(x) with P(x,a,b), P(a,x), …]
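The scene‐to‐example step can be sketched directly: each IQE of the schema becomes one Boolean feature, evaluated for a candidate head token (a hand‐picked subset of the eight patterns from the previous slide; the encoding is ours):

```python
# Turning a relational scene into a propositional example: each IQE of
# the schema becomes one Boolean feature, evaluated for a head token.
# Scene facts follow the slides: file(t2), scan(t7,t2), rogue(t7).

scene = {("file", "t2"), ("scan", "t7", "t2"), ("rogue", "t7")}
tokens = {"t2", "t7"}

def holds(*atom):
    return atom in scene

# Three of the eight IQE patterns, with x the head variable:
features = {
    "scan(x,x) & rogue(x)":
        lambda x: holds("scan", x, x) and holds("rogue", x),
    "Ev: scan(v,x) & rogue(v)":
        lambda x: any(holds("scan", v, x) and holds("rogue", v)
                      for v in tokens),
    "Ev: scan(x,v) & rogue(x)":
        lambda x: any(holds("scan", x, v) for v in tokens)
                  and holds("rogue", x),
}

row = {name: f("t2") for name, f in features.items()}
print(row)
# {'scan(x,x) & rogue(x)': False,
#  'Ev: scan(v,x) & rogue(v)': True,
#  'Ev: scan(x,v) & rogue(x)': False}
```

The resulting Boolean row is exactly the propositional example handed to a threshold learner, with file(t2) as its label.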


High‐Level Tutorial Roadmap

A. Why KRR Should Embrace Learning

B. Introduction to PAC‐Semantics

C. Integrating Deduction and Induction

D. Reasoning Non‐Monotonically

E. Overall Summary and Conclusions

C. Integrating Deduction and Induction
Interleaving Learning and Reasoning

The Declaration of Independence (of KRR from Machine Learning)

We hold these truths to be self-evident:
• an appropriate knowledge base is given
• reasoning is evaluated against the KB
• chaining of rules is trivially beneficial
• KB rules can be applied in any order
• acquisition of KB can be done a priori

Reasoning → Fill‐In Missing Values

[table: the health records with missing cells from before, over Marital Status, Age, Smoking Habits, Drinking Habits, Blood Pressure, Cholesterol Level, Exercising Habits; reasoning is used to fill in the blanks]


KB Performance Evaluation

• Example knowledge base:
• (x3 and not x5) determines the value of x2
• (not x4 or not x1) determines the value of x6

example | observation | KB inference | KB evaluation
101011 | 1?101? | 101011 | no disagreement, no “don’t know”
001001 | 0?100? | 011001 | disagreement, no “don’t know”
010101 | 0??101 | 0??101 | no disagreement, “don’t know”
010001 | 110000 | 110001 | disagreement, no “don’t know”

50% sound, 75% complete

Multiple Targets and Autonomy

• Predict multiple xj’s and learn autonomously.
  1?011?001?0?? (learning)
  ?01?10?001010 (prediction)
• Options on how to tackle this task:
• Learn hypotheses first, then apply in parallel.
• Learn hypotheses first, then apply by chaining.
• Simultaneously learn hypotheses and predict.
• Which of these approaches is appropriate?
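The two example rules can be run over masked observations with a three‐valued evaluation; this sketch reproduces the table's first rows (reading “determines” as: a rule fires only when its body is fully determined; the encoding is ours).

```python
# Running the slide's two rules over masked observations with a
# three-valued ('0'/'1'/'?') evaluation: a rule fires only when its
# body is fully determined.

def bnot(a):
    return {"0": "1", "1": "0", "?": "?"}[a]

def band(a, b):
    if "0" in (a, b): return "0"
    if "?" in (a, b): return "?"
    return "1"

def bor(a, b):
    if "1" in (a, b): return "1"
    if "?" in (a, b): return "?"
    return "0"

def infer(obs):
    x = list(obs)  # x[i] holds the value of x_{i+1}
    if x[1] == "?":
        x[1] = band(x[2], bnot(x[4]))       # x2 <- x3 AND NOT x5
    if x[5] == "?":
        x[5] = bor(bnot(x[3]), bnot(x[0]))  # x6 <- NOT x4 OR NOT x1
    return "".join(x)

print(infer("1?101?"))  # 101011: both masked values are filled in
print(infer("0?100?"))  # 011001: as in the table's second row
print(infer("0??101"))  # 0??101: the x2-rule's body stays undetermined
```

Comparing each inferred string against the hidden example gives the table's soundness and completeness verdicts.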


Basic Hypotheses / Predictors

• Given a specified attribute xi of the world, and sufficient resources during training,
• produce a PAC predictor (= rule) Pj for xi
• e.g., x1 ∧ ¬x3 → x7 is a predictor for x7
• Predictor Pj is “classical” and could potentially:
• abstain, when its body is not fully determined
• predict incorrectly (cf. approximately correct)
• but generally improve completeness of the DB

Reasoning Process Semantics

• exm = ⟨‐1, 0, 5, 0, 4, ‐7, 9⟩ (the unknown endogenous state)
• obs = ⟨‐1, 0, ?, 0, 4, ?, 9⟩ (a mask hides parts of exm; exogenous qualification)
• Policy P determines how predictors are applied (hints of NMR?):
  (P1) x3 ← x7 ‐ x5   (P2) x4 ← x2 · x6   (P3) x6 ← x3 ‐ x5/2
• e.g., P = {P1, P2, P3}, a “flat/parallel” policy:
  P(obs) = ⟨‐1, 0, 5, 0, 4, ?, 9⟩: sound against exm, but incomplete
• e.g., P = {P1, P2}, {P3}, a “chaining” of predictors:
  P(obs) = ⟨‐1, 0, 5, 0, 4, 3, 9⟩: unsound against exm, but complete

Activity: Chaining & Completeness

• Example knowledge base:
• (noon time) → lunch time
• (lunch time and Bob hungry) → Bob eats
• Statement: “It was not noon time yet, but Bob was hungry.”
• Multi‐layered application: Bob does not eat.
• Single‐layered application: “don’t know”.
• Chaining is better than any single‐layered KB.

Chaining is Provably Beneficial

Definition: Chaining collapses on S against S if:
• for every policy P with some performance (soundness + completeness) on S against S,
• there exists a flat policy P’ (not a reordering of P, necessarily) with equally high performance.
Theorem: [Michael ’14] There exist S, S such that chaining does not collapse on S against S.
Proof: Chaining can simulate non‐monotonic reasoning, which is beyond individual predictors.
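The activity can be replayed in code. A sketch that reads the two rules as definitions over a three‐valued state (None = unknown, an assumption for illustration), so that a second layer can reuse the first layer's conclusion:

```python
# Replaying the noon/lunch activity: the two rules are read as
# definitions over a three-valued state (None = unknown).

def layer(s):
    out = dict(s)
    if s.get("noon") is not None:
        out["lunch"] = s["noon"]                  # noon -> lunch time
    if s.get("lunch") is not None and s.get("hungry") is not None:
        out["eats"] = s["lunch"] and s["hungry"]  # lunch & hungry -> eats
    return out

obs = {"noon": False, "hungry": True}  # "not noon yet, but Bob was hungry"
one = layer(obs)        # single-layered: lunch becomes False, eats unknown
two = layer(one)        # a second layer can now use the value of lunch
print(one.get("eats"), two.get("eats"))  # None False
```

A single pass over the original observation cannot settle `eats`, because its body mentions a value that only another rule can supply; that is precisely the gap between flat and chaining policies.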


Analogy to Neural Networks

There are multilayered NNs that compute more complex functions than those computed by any NN without hidden layers.
• Trivially, because of the larger hypothesis space.
• What if each neuron can compute any function?
• What if neurons can abstain from predictions?

Learn First, Then Predict (& Chain)

• Consider rules R1 and R2 obtained to make highly accurate predictions on distribution D.
• On future examples from D, apply R1 then R2.
• No! R2 is applied on distribution R1 ○ D ( ≠ D ). There are no guarantees on that distribution!
• Are there situations that justify this approach?
  • If information during learning is complete.
  • Could happen, but a less realistic assumption.


Analogy to Neural Networks

Cannot train each neuron independently and then assign neurons to layers.
• Trivially, because the target to be learned by each specific neuron is not directly observable.
• What if one could train each neuron in isolation?

Simultaneously Learn and Predict

• Learn rules for all attributes during training.
• Apply those rules to enhance the training set.
• Repeat as necessary to get more KB “layers”.
• Theorem: [Michael ’14] SLAP does not reduce soundness or completeness. It may produce KBs with better soundness and completeness (combined) than any parallel (classical) KB.
Proof: Train each rule on its training distribution, until its predictions are fully sound for subsequent training. Doubly‐exponential dependence on reasoning depth.
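A toy rendering of the SLAP loop. The learner here is a deliberately simple stand-in (it only learns “x_j copies x_i” rules); the point is the loop structure: learn on the current set, fill in masked values with the learned rules, and repeat for deeper layers.

```python
# Toy SLAP loop: learn rules from the (partially observed) training set,
# apply them to enhance it, and repeat. None marks a masked value.

def learn_copy_rules(data, n):
    """Stand-in learner: keep (i, j) when x_j == x_i on every example
    where both attributes are observed."""
    rules = []
    for i in range(n):
        for j in range(n):
            if i != j:
                pairs = [(r[i], r[j]) for r in data
                         if r[i] is not None and r[j] is not None]
                if pairs and all(a == b for a, b in pairs):
                    rules.append((i, j))
    return rules

def apply_rules(data, rules):
    """One parallel layer: predictions read the pre-layer snapshot."""
    new = []
    for row in data:
        filled = list(row)
        for i, j in rules:
            if filled[j] is None and row[i] is not None:
                filled[j] = row[i]
        new.append(filled)
    return new

def slap(data, n, layers):
    for _ in range(layers):
        data = apply_rules(data, learn_copy_rules(data, n))
    return data

data = [[1, 1, None], [None, 0, 0], [1, None, None]]
one_layer = slap(data, 3, layers=1)    # last row: x2 still missing
two_layers = slap(data, 3, layers=2)   # the second layer fills it in
```

No single pass can complete the last row (no example ever shows x0 and x2 together), but after one round of enhancement the needed rule becomes applicable: the layered effect the theorem is about.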


An Experimental Impasse?

• A scene is an observation, e.g., 1?10010??10
• Goal: Predict what information holds in the scene.
• Predictions useful exactly when information is missing from the scene. How to evaluate them?!
• Evaluate via another task with known answers!
  • Use a standard approach to solve the task. (1)
  • Make predictions, and give them as input.
  • See whether task performance improves. (2)

Guess the Missing Word [Michael+ ’08]

• Training T0 & testing E0 from a news text corpus.
• Learn underlying text structure [Michael ’09].
• Using T0, learn relational thresholds for 268 frequently‐used words: price, market, stock, …
• Train on T0 and test on E0 for each of the 268 tasks. (1)
• Using T0, learn relational thresholds for 599 verbs: buy⟨sbj,obj⟩, charge⟨obj⟩, coerce⟨obj,prd⟩, …
• Use the verb rules on T0 to get T1 / on E0 to get E1.
• Train on T1 and test on E1 for each of the 268 tasks. (2)


Experimental SLAP Results [Michael+ ’08]

[Chart: performance on the “missing word” task. Settings A (word/pos + verb), C (word/pos + prox.), and D (use all instances) train on T0 & test on E0; settings B (word/pos + verb) and E (use all instances) train on T1 & test on E1.]

Multiple Iteration SLAP [Skouteli+ ’16]

[Chart: LUCAS (LUng CAncer Simple set) corpus, with 30% of values initially obscured at random; percentage of missing values not recovered / recovered correctly / recovered but incorrectly, over 1 to 20+ iterations; hypotheses are 2‐layer NNs.]


Introspective Forecasting [Michael ’15]

[Diagram: nature draws examples from D; a learner h and a forecaster f process partially observed bit strings.]

C. Integrating Deduction and Induction
Implicit Learning of Testable KBs


A problem requiring learning and deduction: birds.com

The data (birds.com Analytics™):

Day | Bird no. | Food
107 | 48 | Seed
107 | 49 | Grubs
107 | 50 | Mouse
107 | 51 | Mouse
107 | 52 | Worm
107 | 53 | Fish
107 | 54 | Mouse
107 | 55 | Grubs

The (statistical) query: FLY? (“true with high probability for the data source?”)

birds.com: Inferring Flying from Food?

• Data: the table above.
• Query: FLY?
• Background knowledge (foods are mutually exclusive):
  PENGUIN∨FLY
  ¬PENGUIN∨EAT(FISH)
  ¬EAT(FISH)∨¬EAT(GRUBS)
  ¬EAT(FISH)∨¬EAT(MOUSE)
  ¬EAT(FISH)∨¬EAT(SEED)
  …etc…
• THE DATA ALONE DOESN’T ANSWER THE QUERY. THE KNOWLEDGE ALONE ALSO DOESN’T ANSWER THE QUERY.

Relevant knowledge we aim to discover from the data

• Relevant property: the birds eat anything but fish:
  EAT(GRUBS)∨EAT(MOUSE)∨EAT(SEED)∨…
• Usually satisfied by the Data ✔
• Completing a Proof of the Query…

The Relevant Property completes a Proof of the Query

Resolution refutation of the negated query ¬FLY:
• ¬EAT(FISH) from the background clauses ¬EAT(FISH)∨¬EAT(GRUBS), ¬EAT(FISH)∨¬EAT(MOUSE), ¬EAT(FISH)∨¬EAT(SEED), … resolved against the Relevant Property EAT(GRUBS)∨EAT(MOUSE)∨EAT(SEED)∨…
• ¬PENGUIN from ¬PENGUIN∨EAT(FISH) and ¬EAT(FISH)
• FLY from PENGUIN∨FLY and ¬PENGUIN
• ¬FLY and FLY yield the empty clause ∅.


Relevant knowledge we aim to discover from the data

• Data: the birds.com table.
• Background knowledge: PENGUIN∨FLY, ¬PENGUIN∨EAT(FISH), ¬EAT(FISH)∨¬EAT(GRUBS), …etc…
• Query: FLY?
• Verdict: Does there exist a relevant property, e.g., EAT(GRUBS)∨EAT(MOUSE)∨EAT(SEED)∨…, that is usually satisfied by the Data and completes a Proof of the Query?

Approximate Query Answering with Implicit Learning

• Relevant property: the birds eat anything but fish: EAT(GRUBS)∨EAT(MOUSE)∨EAT(SEED)∨…
• Usually satisfied by Data ✔ Completing a Proof of Query ✔
• The existence of such knowledge establishes the query (FLY).
☞ It isn’t important what the actual property says! WE DON’T NEED TO SAY WHAT THE PROPERTY IS.
☞ We are not told what property to search for…

Proposed Algorithm for Approximate Query Answering

Given a query φ, background knowledge KB, target ε, and i.i.d. partial examples ρ(1),…, ρ(m):
• For each partial example ρ(i),
  • If φ|ρ(i) is provable from KB|ρ(i), increment SUCCESS (initially, 0)
• If SUCCESS/m > 1‐ε, ACCEPT; otherwise REJECT

PARTIAL EVALUATION: plug in ρ and recursively replace each locally determined connective by its value.
e.g., for ρ: x=0, y=0, the formula (¬x∨z)∧(y∨¬z) partially evaluates as follows: ¬x = 1, so ¬x∨z = 1; y = 0, so y∨¬z = ¬z; hence the whole formula simplifies to ¬z.
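The decision procedure can be made concrete for clausal knowledge bases. A sketch, with unit propagation standing in for the tractable proof system (the slides use bounded-width or treelike resolution); the clause encoding and variable names are illustrative.

```python
# Sketch: approximate query answering under PAC-semantics.
# A CNF clause is a list of literals (var, sign); a partial example rho is a
# dict var -> 0/1 over the observed variables.

def restrict(clauses, rho):
    """Partial evaluation KB|rho: satisfied clauses drop out,
    falsified literals drop from their clauses."""
    out = []
    for clause in clauses:
        kept, satisfied = [], False
        for var, sign in clause:
            if var in rho:
                if rho[var] == sign:
                    satisfied = True
                    break
            else:
                kept.append((var, sign))
        if not satisfied:
            out.append(kept)
    return out

def entails(clauses, lit):
    """Does KB |- lit? Refute KB + {~lit} by unit propagation."""
    var, sign = lit
    clauses = [list(c) for c in clauses] + [[(var, 1 - sign)]]
    assign, changed = {}, True
    while changed:
        changed = False
        for clause in clauses:
            if any(assign.get(v) == s for v, s in clause):
                continue                      # clause already satisfied
            live = [(v, s) for v, s in clause if v not in assign]
            if not live:
                return True                   # empty clause: contradiction
            if len(live) == 1:
                assign[live[0][0]] = live[0][1]
                changed = True
    return False

def decide(query, kb, partial_examples, eps):
    """ACCEPT iff the query is provable (or observed true) on a > 1-eps
    fraction of the partial examples."""
    var, sign = query
    success = sum(rho[var] == sign if var in rho
                  else entails(restrict(kb, rho), query)
                  for rho in partial_examples)
    return success / len(partial_examples) > 1 - eps

# birds.com, propositionalized: P = PENGUIN, F = FLY, Ex = EAT(x).
kb = [[("P", 1), ("F", 1)], [("P", 0), ("EF", 1)],
      [("EF", 0), ("EG", 0)], [("EF", 0), ("EM", 0)],
      [("EF", 0), ("ES", 0)], [("EF", 0), ("EW", 0)]]
day107 = [{"ES": 1}, {"EG": 1}, {"EM": 1}, {"EM": 1},
          {"EW": 1}, {"EF": 1}, {"EM": 1}, {"EG": 1}]
decide(("F", 1), kb, day107, eps=0.2)   # 7 of the 8 birds provably fly
```

On every non-fish-eater, the mutual-exclusion clause partially evaluates to the unit ¬EAT(FISH), which completes the proof of FLY exactly as on the slides.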

Algorithm on the birds.com Problem

On most partial examples, some background rule simplifies to ¬EAT(FISH), completing a proof of the query:
• ¬PENGUIN from ¬PENGUIN∨EAT(FISH) and ¬EAT(FISH)
• FLY from PENGUIN∨FLY and ¬PENGUIN
→ ACCEPT

Approximate Queries: The theorem [Juba ’13]

Theorem. Our algorithm distinguishes:
I. The query is only satisfied by D w.p. < 1‐ε‐γ
II. There exists a “(1‐ε+γ)‐testable” formula ψ for which there exists a proof of the query from ψ and any background knowledge
w.p. 1‐δ, given 1/γ² ln 1/δ partial examples from M(D)

Key Definition: Testable formulas (Relevant Properties we can detect) [Juba ’13]

• Definition. A formula ψ is (1‐ε)‐testable under a distribution over partial examples M(D) if Pr_M(D)[ψ|ρ = 1] ≥ 1‐ε
• (Recall PARTIAL EVALUATION: plug in ρ and recursively replace each locally determined connective by its value.)
• e.g., on the birds.com™ data, EAT(GRUBS)∨EAT(MOUSE)∨EAT(SEED)∨… evaluates to 1 on most rows.


Key Definition: Testable formulas (Relevant Properties we can detect) [Juba ’13]

• Require more than truth of the Relevant Property
• Standard cases (clause/linear inequality): actually no more demanding

Approximate Queries: The theorem [Juba ’13]

Theorem (restated). Our algorithm distinguishes:
I. The query is only satisfied by D w.p. < 1‐ε‐γ
II. There exists a (1‐ε+γ)‐testable formula ψ for which there exists a proof of the query from ψ and any background knowledge
w.p. 1‐δ, given 1/γ² ln 1/δ partial examples from M(D)


Proposed Algorithm for Approximate Query Answering (restated)

Given a query φ, background knowledge KB, target ε, and i.i.d. partial examples ρ(1),…, ρ(m):
• For each partial example ρ(i),
  • If φ|ρ(i) is provable from KB|ρ(i), increment SUCCESS (initially, 0)
• If SUCCESS/m > 1‐ε, ACCEPT; otherwise REJECT

Detecting a Relevant Property… Consider: Tractable Proof Systems

• Bounded‐width resolution
• Treelike, bounded clause space resolution
e.g., the refutation: ∅ from ¬FLY and FLY; FLY from PENGUIN∨FLY and ¬PENGUIN; ¬PENGUIN from ¬PENGUIN∨EAT(FISH) and ¬EAT(FISH).


A Key Property of the Example Tractable Proof Systems

• Bounded‐width resolution
• Treelike, bounded clause space resolution
☞ Partial evaluation of proofs of these forms yields proofs of the same form (from a proof of a query φ, we obtain a proof of φ|ρ of the same syntactic form)

Insight: The Testable Premises Drop Out of the Proof!

[Diagram: under a partial example ρ, the premises of the testable property evaluate to 1 and drop out, leaving a residual proof of the same form.]


Recall: Proposed Algorithm Detects These Residual Proofs

Given a query φ, background knowledge KB, target ε, and i.i.d. partial examples ρ(1),…, ρ(m):
• For each partial example ρ(i),
  • If φ|ρ(i) is provable from KB|ρ(i), increment SUCCESS (initially, 0)
• If SUCCESS/m > 1‐ε, ACCEPT; otherwise REJECT
THE THEOREM THEREFORE FOLLOWS IMMEDIATELY FROM HOEFFDING’S INEQUALITY.

Pros and cons of Implicit Learning

• Pro: utilizes rules with imperfect validity
  • Usually intractable to learn explicitly
  • Captures kinds of commonsense reasoning (next part)
• Pro: reasoning time complexity independent of the size of the implicit KB
  • Actually, may reduce reasoning complexity in some circumstances [Juba ’15]
• Con: cannot report the rules used to support a conclusion


High‐Level Tutorial Roadmap

A. Why KRR Should Embrace Learning

B. Introduction to PAC‐Semantics

C. Integrating Deduction and Induction

D. Reasoning Non‐Monotonically
E. Overall Summary and Conclusions


D. Reasoning Non‐Monotonically
Conditional Probability and NMR

Nonmonotonicity in learning to reason [Roth ’95]

• We have seen: deciding queries by counting the frequency with which they are provable
  • Allows us to answer “hard” queries
  • Draws on a potentially large KB of implicitly learned knowledge as needed
• New twist: suppose we incorporate a hypothesis by filtering out examples that do not satisfy the hypothesis
• Produces desirable non‐monotonic inferences, appropriate for “commonsense reasoning”


Nonmonotonicity in learning to reason [Roth ’95]

• Example: suppose we are reasoning about “birds”; we filter the set of examples to only include bird = 1:

bird fly has_beak red purple penguin
1    1   1        0   0      0
1    1   1        1   0      0
1    0   1        0   0      1
1    0   1        0   0      1
1    1   1        0   0      0
1    1   1        1   0      0
1    1   1        0   0      0

• We find has_beak is (1‐γ)‐valid, fly is (5/7‐γ)‐valid, but red and penguin are at most (2/7+γ)‐valid…
• Example: Now suppose we consider specifically red birds, filtering the same examples to only include bird∧red = 1:
• Now has_beak and fly are (still) (1‐γ’)‐valid, and penguin is at most γ’‐valid (for some γ’ > γ)…

Nonmonotonicity in learning to reason [Roth ’95]

• Example: now suppose we are considering penguins, filtering to only include penguin = 1:
• We find has_beak is still (1‐γ’)‐valid, but fly and red are now at most γ’‐valid
• Example: now considering specifically “penguins with beaks,” we only include penguin∧has_beak = 1:
• Again, we find fly and red are at most γ’‐valid; not affected by has_beak


Nonmonotonicity in learning to reason [Roth ’95]

• Example: if we instead consider “purple penguins,” we only include examples with penguin∧purple = 1:
• With no examples remaining, we cannot draw any conclusions (except perhaps from a given KB)

Nonmonotonicity and conditional probability distributions

• A set of examples filtered to satisfy a formula h has the conditional probability distribution D|[h(X)=1] (we “condition on h”)
• So: as long as we consider (1‐ε)‐validity for some ε>0 (e.g., ε = 1/3 could have sufficed), conditioning may have a non‐monotonic effect
• Note: this requires the use of non‐negligibly large ε
• Since: we must have enough examples in the conditional distribution to support inferences

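The running example can be checked directly. A small sketch over the slides' seven examples, with attribute names as on the slides.

```python
# The seven examples from the slides, over
# (bird, fly, has_beak, red, purple, penguin).
ATTRS = ("bird", "fly", "has_beak", "red", "purple", "penguin")
ROWS = [(1, 1, 1, 0, 0, 0), (1, 1, 1, 1, 0, 0), (1, 0, 1, 0, 0, 1),
        (1, 0, 1, 0, 0, 1), (1, 1, 1, 0, 0, 0), (1, 1, 1, 1, 0, 0),
        (1, 1, 1, 0, 0, 0)]

def validity(attr, cond):
    """Empirical validity of attr under the conditional distribution
    D|[cond(X)=1]; None when no examples remain."""
    idx = ATTRS.index(attr)
    kept = [r for r in ROWS if cond(r)]
    if not kept:
        return None
    return sum(r[idx] for r in kept) / len(kept)

bird = lambda r: r[ATTRS.index("bird")] == 1
penguin = lambda r: r[ATTRS.index("penguin")] == 1
purple_penguin = lambda r: penguin(r) and r[ATTRS.index("purple")] == 1

validity("fly", bird)            # 5/7: birds usually fly
validity("fly", penguin)         # 0.0: conditioning retracts the inference
validity("fly", purple_penguin)  # None: no examples, no conclusions
```

Strengthening the condition from bird to penguin flips the verdict on fly, which is exactly the non-monotonic effect of conditioning; the empty purple-penguin condition illustrates why ε must be non-negligibly large.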

Classical issues in commonsense reasoning, in PAC‐semantics

• Qualification problem: taking ε>0 permits a rule to fail for any number of unspecified and unmodeled reasons. The commonsense rule we implicitly use is never written out.
• Elaboration tolerance: we can simply add a new example to our set of examples, and the next time we answer a query the count will be slightly different. But the implicit KB does not need to be “edited” in any way.

Classical issues in commonsense reasoning, in PAC‐semantics

• Ramification problem: any further consequences r of a hypothesis h are included: for any example x satisfying h, we are given that x also satisfies r, so r will be highly valid in D|[h(X)=1]. So, r is included in the implicit KB of D|[h(X)=1] without further consideration.
• E.g.: ¬fly is highly valid in D|[penguin(X)=1].


Context in reasoning using “preconditions” [Valiant ’94,95,00]

• The condition h in D|[h(X)=1] is sometimes called a “precondition”
• Valiant proposed: when answering a query, a precondition capturing the current context should be used to filter examples.
  • E.g., given by the units currently firing in the “neuroidal” cognitive model [Valiant ’94]
  • Problem: may be too specific.
• How should we choose a precondition?

D. Reasoning Non‐Monotonically
Preconditions and Abduction


Reasoning with preconditions [Juba ’16]

• One possible formulation of reasoning with preconditions:
“Does there exist a precondition h from a class of representations H such that…
1. h supports the query φ
2. h is common
3. h is consistent with the current context x*?”
• If any precondition supports the query, we will find one.

Reasoning with preconditions: formalization [Juba ’16]

• Fix a class of Boolean representations H
• Given query formula φ; context assignment x*; ε, δ, μ∈(0,1); access to examples from D,
• Suppose that there exists an h*∈H such that
1. h* supports φ: PrD[φ(X)|h*(X)] = 1
2. h* is common: PrD[h*(X)] ≥ μ
3. h* is consistent with context x*: h*(x*) = 1
• Find an h (ideally in H) such that with prob. 1‐δ,
1. PrD[φ(X)|h(X)] ≥ 1‐ε
2. PrD[h(X)] ≥ μ’ for some μ’ (ideally close to μ)
3. h(x*) = 1


Reasoning with k‐DNF preconditions is possible (complete information)

Theorem. If there is a k‐DNF h* such that
1. h* supports φ: PrD[φ(X)|h*(X)] = 1
2. h* is common: PrD[h*(X)] ≥ μ
3. h* is consistent with context x*: h*(x*) = 1
then using m = O(1/με (n^k + log 1/δ)) examples, in time O(mn^k) we can find a k‐DNF h such that with probability 1‐δ,
1. h supports φ: PrD[φ(X)|h(X)] ≥ 1‐ε
2. h is common: PrD[h(X)] ≥ μ
3. h is consistent with context x*: h(x*) = 1

The precondition task, in pictures…

[Diagram: the space x ∈ {0,1}^n is split into regions with φ(x)=1 and φ(x)=0; we seek a region h(x)=1 that contains the context point x* and lies (almost entirely) inside φ(x)=1.]


Finding a supporting k‐DNF precondition using Elimination

• Start with h as an OR over all terms of size k
• For each example x(1),…,x(m):
  • If φ(x(i)) = 0, delete all terms T from h such that T(x(i)) = 1
• If h(x*) = 1, return h
• Else return FAIL (no supporting precondition)
Running time is still clearly O(mn^k)

Analysis pt 1: PrD[h(X)] ≥ μ, h(x*) = 1

• We are given that some k‐DNF h* has
1. h* supports φ: PrD[φ(X)|h*(X)] = 1
2. h* is common: PrD[h*(X)] ≥ μ
3. h* is consistent with context x*: h*(x*) = 1
• Initially, every term of h* is in h
• Terms of h* are never true when φ(x)=0, by 1.
☞ every term of h* remains in h
☞ h* implies h, so PrD[h(X)] ≥ PrD[h*(X)] ≥ μ, and h(x*)=1 since h*(x*)=1 by 3.

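The Elimination algorithm is short enough to write out in full. A sketch with complete examples; the three-variable scenario at the end (bird, penguin, fly) is an illustrative stand-in for the slides' setting.

```python
from itertools import combinations, product

def all_terms(n, k):
    """Every conjunction of at most k literals over n variables,
    a term being a frozenset of (variable, sign) pairs."""
    terms = []
    for size in range(1, k + 1):
        for vars_ in combinations(range(n), size):
            for signs in product((0, 1), repeat=size):
                terms.append(frozenset(zip(vars_, signs)))
    return terms

def term_true(term, x):
    return all(x[v] == s for v, s in term)

def eliminate(examples, phi, x_star, n, k):
    """Delete every term that fires on an example falsifying phi;
    succeed iff the surviving k-DNF still covers the context x*."""
    h = all_terms(n, k)
    for x in examples:
        if phi(x) == 0:
            h = [t for t in h if not term_true(t, x)]
    return h if any(term_true(t, x_star) for t in h) else None

# Variables: 0 = bird, 1 = penguin, 2 = fly; the query phi is "fly".
phi = lambda x: x[2]
examples = [(1, 0, 1), (1, 1, 0), (0, 0, 0), (1, 0, 1)]
h = eliminate(examples, phi, x_star=(1, 0, 1), n=3, k=2)
```

The term “bird” alone fires on the non-flying penguin example and is deleted, while “bird ∧ ¬penguin” survives every falsifying example and covers x*, matching the analysis: every term of a supporting h* is never deleted.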

Analysis pt 2: PrD[φ(X)|h(X)] ≥ 1‐ε

• Rewrite the conditional probability: PrD[[¬φ(X)]⋀h(X)] ≤ ε PrD[h(X)]
• We’ll show: PrD[[¬φ(X)]⋀h(X)] ≤ εμ ( ≤ ε PrD[h(X)] by part 1)
• Consider any h’ s.t. PrD[[¬φ(X)]⋀h’(X)] > εμ
• Since each X(i) is drawn independently from D, PrD[no i has [¬φ(X(i))]⋀h’(X(i))] < (1‐εμ)^m
• A term of h’ is deleted when φ=0 and h’=1
• So, h’ is only possibly output w.p. < (1‐εμ)^m

Analysis pt 2, cont’d: PrD[φ(X)|h(X)] ≥ 1‐ε

• We’ll show: PrD[[¬φ(X)]⋀h(X)] ≤ εμ
• Consider any h’ s.t. PrD[[¬φ(X)]⋀h’(X)] > εμ
• h’ is only possibly output w.p. < (1‐εμ)^m
• There are only 2^O(n^k) possible k‐DNF h’
• Since 1‐z ≤ e^(‐z), m = O(1/με (n^k + log 1/δ)) examples suffice to guarantee that each such h’ is only possible to output w.p. < δ/2^O(n^k)
☞ w.p. > 1‐δ, h has PrD[[¬φ(X)]⋀h(X)] ≤ εμ.
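The sample-complexity bound follows the usual survival-probability plus union-bound pattern; spelled out with the slides' quantities:

```latex
\begin{align*}
\Pr[\text{a bad } h' \text{ survives all } m \text{ examples}]
  &< (1-\varepsilon\mu)^m \le e^{-\varepsilon\mu m},\\
\text{union bound over } 2^{O(n^k)} \text{ candidate } k\text{-DNFs:}\quad
  2^{O(n^k)}\, e^{-\varepsilon\mu m} &\le \delta\\
\Longleftarrow\quad m &\ge \frac{1}{\varepsilon\mu}\Bigl(O(n^k) + \ln\tfrac{1}{\delta}\Bigr).
\end{align*}
```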


Reasoning with k‐DNF preconditions is possible (complete information)

Theorem (restated). If some k‐DNF h* supports φ, is common, and is consistent with x*, then with m = O(1/με (n^k + log 1/δ)) examples and time O(mn^k), with probability 1‐δ we find a k‐DNF h that supports φ, is common, and is consistent with x*.

Extension: Finding preconditions tolerating ε>0

• Given only that some h* achieves
1. h* supports φ: PrD[φ(X)|h*(X)] ≥ 1‐ε
2. h* is common: PrD[h*(X)] ≥ μ
3. h* is consistent with context x*: h*(x*) = 1
• Find an h such that for some other μ’ & ε’,
1. PrD[φ(X)|h(X)] ≥ 1‐ε’
2. PrD[h(X)] ≥ μ’ for some μ’ (ideally close to μ)
3. h(x*) = 1
• An extension of the algorithm for k‐DNF achieves μ’=μ, ε’=O(n^k ε) (only delete terms T making εμm errors)


Abductive reasoning: making plausible guesses

• This formulation of precondition search is a form of abduction
• Given a conclusion c, find a “plausible” h that implies/leads to/… c
  • Proposing a precondition supporting the query
• Two varieties of “plausibility” in common use
  • Syntactic: a small h from which c follows
  • Bayesian: an h which has large posterior probability when given c, i.e., “Pr[h actual rule used | c true] > …”

Why might we want a new model?

• Existing models only tractable in simple cases
  • E.g., Horn rules (a⋀b⋀c⇒d …no negations), “nice” (conjugate) priors
• The choice of formulation, prior distribution, etc. really matters
• And, they are difficult to specify by hand


New model: abductive reasoning from random examples [Juba ’16]

• Task: for a conclusion c, find an h such that
1. Plausibility: PrD[h(X)] ≥ μ (for some given μ)
2. h almost entails c: PrD[c(X)|h(X)] ≥ 1‐ε
• Note: D now captures both the entailment relation and the measure of “plausibility”
• Distinction from the earlier precondition search: no “context” assignment x*

Example: identifying a subgoal

• Consider: blocks world. For t=1,2,…,T
  • Propositional state vars. (“fluents”): ON_t(A,B), ON_t(A,TABLE), ON_t(C,A), etc.
  • Actions also encoded by propositional vars.: PUT_t(B,A), PUT_t(C,TABLE), etc.
• Given many examples of interaction…
• Our goal c: ON_T(A,TABLE)⋀ON_T(B,A)⋀ON_T(C,B)
• A perhaps plausibly good “subgoal” h: [ON_{T‐1}(B,A)⋀PUT_T(C,B)]⋁[PUT_{T‐1}(B,A)⋀PUT_T(C,B)]


Formally: abductive reasoning from random examples for a class H

• Fix a class of Boolean representations H
• Given Boolean formula c; ε, δ, μ∈(0,1); independent examples x(1),…,x(m) ∈ D,
• Suppose that there exists an h*∈H such that
1. Plausibility: PrD[h*(X)] ≥ μ
2. h* entails c: PrD[c(X)|h*(X)] = 1
• Find h (ideally in H) such that with prob. 1‐δ,
1. Plausibility: PrD[h(X)] ≥ μ’ for some μ’(μ,n,ε,δ)
2. h almost entails c: PrD[c(X)|h(X)] ≥ 1‐ε

Abducing k‐DNFs is also possible (complete information)

Theorem. If there is a k‐DNF h* such that
1. Plausibility: PrD[h*(X)] ≥ μ
2. h* entails c: PrD[c(X)|h*(X)] = 1
then using m = O(1/με (n^k + log 1/δ)) examples, in time O(mn^k) we can find a k‐DNF h such that with probability 1‐δ,
1. Plausibility: PrD[h(X)] ≥ μ
2. h almost entails c: PrD[c(X)|h(X)] ≥ 1‐ε


Elimination Algorithm also solves k‐DNF abduction

• Start with h as an OR over all terms of size k
• For each example x(1),…,x(m):
  • If c(x(i)) = 0, delete all terms T from h such that T(x(i)) = 1
• Return h
JUST OMIT THE FINAL TEST FOR CONSISTENCY WITH x*. THE ANALYSIS IS ALMOST IDENTICAL.

Extension: Abductive Reasoning tolerating ε>0

• Given only that some h* achieves
1. Plausibility: PrD[h*(X)] ≥ μ
2. h* almost entails c: PrD[c(X)|h*(X)] ≥ 1‐ε
• Find an h such that for some other μ’ & ε’,
1. Plausibility: PrD[h(X)] ≥ μ’
2. h almost entails c: PrD[c(X)|h(X)] ≥ 1‐ε’
• An improved algorithm for k‐DNF achieves μ’=(1‐γ)μ, ε’=Õ(n^(k/2) ε) [Zhang‐Mathew‐Juba ’17]
• Cf. only ε’=O(n^k ε) was obtained for preconditions…


But, what about abducing conjunctions?? [Juba ’16]

Theorem. Suppose that a polynomial‐time algorithm exists for learning abduction from random examples for conjunctions with μ’=C((1‐γ)μ/n)^d for some C, d. Then there is a polynomial‐time algorithm for PAC‐learning DNF.
So what?
1. A central open problem in computational learning theory, raised in the original paper by Valiant (1984)
2. Recent work by Daniely and Shalev‐Shwartz (2016) shows that algorithms for PAC‐learning DNF would have other surprising consequences.
In summary: an algorithm for our problem would constitute an unlikely breakthrough.

Extension to abduction with partial examples

• Subtle: do we condition on…
  • h(X) true?
  • h(X) provable (under ρ)? Generally more desirable…
  • h|ρ=1 (h “witnessed” on ρ)?
• Come see our poster “Learning Abduction Under Partial Observability” (Juba, Li, Miller)
• Short version: condition on some term T of h provable under ρ (i.e., T|ρ provable from KB|ρ)
• Can use Elimination; incorporates the implicit KB


Extent of commonsense reasoning with PAC‐semantics

COULD THIS CAPTURE ALL COMMONSENSE REASONING?
• Unlikely!
• So far, generally doesn’t capture the “law of inertia,” naïve theories, …
• We still seem to require the use of a non‐monotonic logic as a foundation
  • At minimum, for the “law of inertia”
• Question: can simulations provide examples for naïve theories?

Naïve use of preconditions is problematic

• Our reasoning with preconditions is too credulous
• Example: consider the query fly for x* with penguin(x*)=1… then h* = bird
  • is reasonably common (PrD[bird] moderate)
  • is consistent with x* (bird(x*)=1)
  • supports the query (Pr[fly|bird] high)
• So, we will return a precondition such as h = bird supporting fly


Activity: Find the Arguments

This animal has Feathers, lives in Antarctica, and looks Funny. Question: Does it have Wings?

[Diagram: rules r1–r7 relating Antarctica, Feathers, Bird, Funny, Penguin, Flying, and Wings.]

D. Reasoning Non‐Monotonically
Learning with Non‐Monotonic Logics


Why Abandon Equivalences?

• r1: penguin ⇒ ¬flying, r2: bird ⇒ flying, r1 > r2
  Formula: (u ∨ bird) ∧ ¬penguin ≡ flying
• Good on “full” scenes. Still abstains on {¬p, ¬b}.
• Infers too little… Does not infer f on {b}. Bad!
• r1: b ⇒ a, r2: ¬b ⇒ a
  Formula: true ⇒ a
• Infers too much… a by case analysis on { }. Bad!
• NP‐hard reasoning. Still not 1 rule/atom. Bad!
• Thus: logic‐based arguments with preferences.

How to Learn Arguments?

• Rules with different heads in each argument; therefore, one has to deal with partial observability, which then requires SLAP.
• Sufficient to get consistency in predictions.
• Some relational expressivity comes for “free”.
• Each rule has an IQE body that is efficiently testable for satisfaction on observations.
• Following linear thresholds and decision lists.
• Online, efficient, dealing with priorities, etc.


Never‐Ending Rule Discovery [Michael ’16]

1. Get observation obs, and reason with the active rules to get an enhanced observation obs*.
2. Find a literal x that is observed in obs but not inferred by active rules triggered in obs*. Add body ⇒ x, for a random body satisfied by obs*.
3. Increase / decrease the weight of rules triggered in obs* that concur with / oppose obs.
4. Newly active rules are weaker than existing ones.
5. Newly inactive rules have no preferences.

What Does it End Up Learning?

• Hide some attributes from states drawn from:
  {¬bird,¬penguin,¬plane,¬flying} w.p. 5/16
  {¬bird,¬penguin,¬plane, flying} w.p. 5/16
  {¬bird,¬penguin, plane, flying} w.p. 2/16
  { bird, penguin,¬plane,¬flying} w.p. 1/16
  { bird,¬penguin,¬plane, flying} w.p. 3/16
• “Intended” learned rules with head (¬)flying: penguin ⇒ ¬flying, bird ⇒ flying, plane ⇒ flying
• But it also “picks up”: plane ⇒ ¬penguin (mutually exclusive), ¬bird ⇒ ¬penguin (contrapositive), penguin ⇒ bird, ¬flying ⇒ ¬bird (explaining away).

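The core of the NERD loop can be rendered as a sketch. Everything here (the rule encoding, the activity threshold, the initial weight of new rules) is an illustrative assumption, not the paper's implementation.

```python
import random

THRESHOLD = 0.0   # rules with weight above this are "active" (assumed)
NEW_WEIGHT = 0.1  # new rules start weaker than existing ones (assumed)

def infer(obs, rules):
    """Single-pass application: a rule fires when its body holds in obs."""
    inferred = {}
    for r in rules:
        if all(obs.get(v) == val for v, val in r["body"].items()):
            inferred[r["head"][0]] = r["head"][1]
    return inferred

def nerd_step(rules, obs, rng):
    active = [r for r in rules if r["weight"] > THRESHOLD]
    inferred = infer(obs, active)
    obs_star = {**inferred, **obs}            # 1. enhanced observation
    # 2./4. add a weak rule for an observed-but-not-inferred literal
    unexplained = [v for v in obs if inferred.get(v) != obs[v]]
    if unexplained:
        x = rng.choice(unexplained)
        body_vars = [v for v in obs_star if v != x]
        body = ({v: obs_star[v] for v in rng.sample(body_vars, 1)}
                if body_vars else {})
        rules.append({"body": body, "head": (x, obs[x]), "weight": NEW_WEIGHT})
    # 3. reweight active rules that fired: reward agreement, punish conflict
    for r in active:
        if all(obs_star.get(v) == val for v, val in r["body"].items()):
            var, val = r["head"]
            if var in obs:
                r["weight"] += 1.0 if obs[var] == val else -1.0

rng = random.Random(0)
rules = [{"body": {"penguin": True}, "head": ("flying", True), "weight": 1.0}]
obs = {"penguin": True, "flying": False}
nerd_step(rules, obs, rng)
```

One step on a non-flying penguin punishes the active rule penguin ⇒ flying below the activity threshold and adds a weak new rule for one of the unexplained literals.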

Demo of NERD’s Online Behavior

• Initial evidence happens to activate r7.
• Later counterexamples deactivate it.
• In the meantime, evidence activates r1.
• Thus, support becomes stronger for r7, even though counterexamples remain.
• Rules will react to environment change.

Probably Approximately Correct?

• Equivalences are PAC learnable, but suffer from a Goldilocks effect: infer too much / too little.
• Are logic‐based arguments PAC learnable?
  + Learn with unknown atoms. Learn priorities.
  − Nested if‐then‐else’s unlearnable from scenes.
  + Non‐adversarial environments. Equiv ⊆ Args.
  − Learning requires reasoning, restricts depth.
  + Psychological evidence on bounded reasoning.
  + On the edge of learnable. Evolutionary pressure.

Machine Coaching [Michael ’17]

• Integrate user‐provided rules while learning.
• Like online learning, but instead of providing the correct label, the coach “questions” the part of the argument that leads to the wrong prediction.
• Possible to give PAC‐semantics to the process!

• Theorem: Arguments (ASPIC+ type: grounded semantics, axiomatic premises, defeasible rules, rebutting attacks, last‐link preferences) are PAC learnable via machine coaching alone.


26 High‐Level Tutorial Roadmap

A. Why KRR Should Embrace Learning

B. Introduction to PAC‐Semantics

C. Integrating Deduction and Induction

D. Reasoning Non‐Monotonically

E. Overall Summary and Conclusions

E. Overall Summary and Conclusions
Recap and Open Problems


Semantics for Learned Knowledge

• PAC‐semantics is a suitable choice:
  • models the world as a probability distribution
  • treats knowledge as high‐probability properties
• Offers simple solutions to classic KR problems:
  • non‐monotonicity from conditioning of rules
  • exogenous qualification from validity defect
  • elaboration tolerance through implicit learning
  • ramifications incorporated in the “implicit KB”
  • natural formulation of abductive inference
  • explicit solutions through argument learning

KRR and Learning Integration

• The traditional view of reasoning and learning as independent processes must be abandoned.
• When combined under PAC‐semantics they:
  • soundly achieve greater completeness
  • circumvent computational barriers in learning
  • enable fast and compact access to the “implicit KB”
  • may sometimes reduce reasoning complexity
• Also, it seamlessly supports user intervention during learning through machine coaching.


Open Problems with KRR Flavor

• What kind of first‐order expressions can we learn from partial examples in reasoning?
  • IQEs via propositionalization. Do classical barriers (e.g., Haussler) apply to the integrated problem?
• What kind of first‐order integrated learning and reasoning is possible?
  • May want first‐order expressions to refer to a limited domain. Seems closely related to the selection of "preconditions" (see the precondition point below), but for limiting the domain of quantifiers.
• When, and how broadly, is the complexity of reasoning reduced in the integrated problem?
  • Is there a more natural condition than in [Juba '15]?
  • More broadly: which fragments are tractable?
• Incorporating naive theories into the model?
  • E.g., treat naive simulations as populating a data set for the implicit KB — how well does this work?
• Preconditions for commonsense reasoning?
  • Perhaps select an "unchallenged" precondition; this suggests an argument semantics for preconditions.
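To make "IQEs via propositionalization" concrete: over a finite domain, an independently quantified expression grounds out to a propositional formula (an existential becomes a disjunction over ground instances), so propositional learners apply. The helper `ground_exists`, the toy domain, and the reading of `div(y,z)` as "y divides z" are illustrative assumptions, echoing the IQE example from the introduction:

```python
"""Sketch: propositionalizing an IQE over a small finite domain.
An existentially quantified body becomes a disjunction over all
ground instantiations of its variables."""
from itertools import product

domain = [1, 2, 3]                # toy finite domain (assumption)

def ground_exists(body, arity=2):
    """Turn 'exists y, z: body(y, z)' into a propositional test:
    a disjunction over all (y, z) drawn from the domain."""
    return lambda interp: any(body(interp, *args)
                              for args in product(domain, repeat=arity))

# exists y, z [ larger(z, y) and not div(y, z) ], with div(y, z)
# read as "y divides z" (assumption)
body = lambda I, y, z: I["larger"](z, y) and not I["div"](y, z)
phi = ground_exists(body)

I = {"larger": lambda a, b: a > b,
     "div":    lambda a, b: b % a == 0}
assert phi(I)          # witnessed by y=2, z=3: 3 > 2 and 2 does not divide 3

I_empty = {"larger": lambda a, b: False, "div": lambda a, b: True}
assert not phi(I_empty)  # no ground instance satisfies the body
```

The grounded formula has one propositional atom per ground instance, which is why small arity and the absence of tokens matter for keeping propositionalization tractable.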


References

• Adams. (1975), 'The Logic of Conditionals: An Application of Probability to Deductive Logic', Springer.
• Blumer, Ehrenfeucht, Haussler, Warmuth. (1987), 'Occam's Razor', Information Processing Letters, 24:377–380.
• Daniely, Shalev‐Shwartz. (2016), 'Complexity Theoretic Limitations on Learning DNF's', Proc. COLT, volume 49 of JMLR W&CP.
• Diakidoy, Kakas, Michael, Miller. (2014), 'Story Comprehension through Argumentation', Proc. COMMA, p. 31–42.
• Diakidoy, Kakas, Michael, Miller. (2015), 'STAR: A System of Argumentation for Story Comprehension and Beyond', Proc. Commonsense, p. 64–70.
• Dimopoulos, Michael, Athienitou. (2009), 'Ceteris Paribus Preference Elicitation with Predictive Guarantees', Proc. IJCAI, p. 1890–1895.
• Dung. (1995), 'On the Acceptability of Arguments and its Fundamental Role in Nonmonotonic Reasoning, Logic Programming and n‐Person Games', Artificial Intelligence 77:321–357.


References

• Haussler. (1989), 'Learning Conjunctive Concepts in Structural Domains', Machine Learning 4(1):7–40.
• Juba. (2013), 'Implicit Learning of Common Sense for Reasoning', Proc. IJCAI, p. 939–946.
• Juba. (2015), 'Restricted Distribution Automatizability in PAC‐Semantics', Proc. ITCS, p. 93–102.
• Juba. (2016), 'Learning Abductive Reasoning Using Random Examples', Proc. AAAI, p. 999–1007.
• Kakas, Michael. (2016), 'Cognitive Systems: Argument and Cognition', IEEE Intelligent Informatics Bulletin 17(1):14–20.
• Kakas, Michael, Miller. (2008), 'Fred meets Tweety', Proc. ECAI, p. 747–748.
• Khardon, Roth. (1997), 'Learning to Reason', J. ACM, 44(5):697–725.
• Michael. (2008), 'Autodidactic Learning and Reasoning', PhD thesis, Harvard.
• Michael. (2009), 'Reading Between the Lines', Proc. IJCAI, p. 1525–1530.
• Michael. (2010), 'Partial Observability and Learnability', Artificial Intelligence 174(11):639–669.
• Michael. (2011), 'Causal Learnability', Proc. IJCAI, p. 1014–1020.
• Michael. (2013), 'Machines with WebSense', Proc. Commonsense.
• Michael. (2014), 'Simultaneous Learning and Prediction', Proc. KR, p. 348–357.
• Michael. (2015a), 'The Disembodied Predictor Stance', Pattern Recognition Letters 64(C):21–29.
• Michael. (2015b), 'Introspective Forecasting', Proc. IJCAI, p. 3714–3720.
• Michael. (2016), 'Cognitive Reasoning and Learning Mechanisms', Proc. AIC, p. 2–23.
• Michael. (2017), 'The Advice Taker 2.0', Proc. Commonsense.
• Michael, Papageorgiou. (2013), 'An Empirical Investigation of Ceteris Paribus Learnability', Proc. IJCAI, p. 1537–1543.
• Michael, Valiant. (2008), 'A First Experimental Demonstration of Massive Knowledge Infusion', Proc. KR, p. 378–388.
• Modgil, Prakken. (2014), 'The ASPIC+ Framework for Structured Argumentation: A Tutorial', Argument & Computation 5(1):31–62.


References

• Nilsson. (1986), 'Probabilistic Logic', Artificial Intelligence, 28(1):71–87.
• Roth. (1995), 'Learning to Reason: The Non‐monotonic Case', Proc. IJCAI, vol. 2, p. 1178–1184.
• Skouteli, Michael. (2016), 'Empirical Investigation of Learning‐Based Imputation Policies', Proc. GCAI, p. 161–173.
• Toni. (2014), 'A Tutorial on Assumption‐based Argumentation', Argument & Computation 5(1):89–117.
• Valiant. (1984), 'A Theory of the Learnable', Communications of the ACM, 27(11):1134–1142.
• Valiant. (1994), 'Circuits of the Mind', Oxford University Press.
• Valiant. (1995), 'Rationality', Proc. COLT, p. 3–14.
• Valiant. (2000), 'Robust Logics', Artificial Intelligence, 117:231–253.
• Valiant. (2000b), 'A Neuroidal Architecture for Cognitive Computation', J. ACM, 47(5):854–882.
• Zhang, Mathew, Juba. (2017), 'An Improved Algorithm for Learning to Perform Exception‐Tolerant Abduction', Proc. AAAI, p. 1257–1265.

