Automatically Generating Problems and Solutions for Natural Deduction
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Automatically Generating Problems and Solutions for Natural Deduction Umair Z. Ahmed Sumit Gulwani Amey Karkare IIT Kanpur MSR Redmond IIT Kanpur [email protected] [email protected] [email protected] Abstract against the prejudiced and uncivilized attitudes that threaten the foundations of our democratic society [Hurley, 2011]. Natural deduction, which is a method for establish- From a pedagogical perspective, natural deduction is a use- ing validity of propositional type arguments, helps ful tool in relieving math anxiety that terrifies countless stu- develop important reasoning skills and is thus a dents. It is a gentle introduction to mastering the use of logical key ingredient in a course on introductory logic. symbols, which carries into other more difficult fields such as We present two core components, namely solution algebra, geometry, physics, and economics. generation and practice problem generation, for en- Natural deduction is typically taught as part of an introduc- abling computer-aided education for this important tory course on logic, which is a central component of college subject domain. The key enabling technology is use education and is generally offered to students from all disci- of an offline-computed data-structure called Uni- plines regardless of their major. It is thus not a surprise that versal Proof Graph (UPG) that encodes all possi- a course on logic is being offered on various online educa- ble applications of inference rules over all small tion portals including Coursera [Coursera, ], Open Learning propositions abstracted using their bitvector-based Initiative [oli, ], and even Khan Academy [kha, ]. truth-table representation. This allows an efficient forward search for solution generation. More inter- estingly, this allows generating fresh practice prob- Significance of Solution Generation Solution generation lems that have given solution characteristics by per- is important for several reasons. First, it can be used to gen- forming a backward search in UPG. We obtained erate sample solutions for automatically generated problems. around 300 natural deduction problems from var- Second, given a student’s incomplete solution, it can be used ious textbooks. Our solution generation procedure to complete the solution. This can be much more illustrative can solve many more problems than the traditional for a student as opposed to providing a completely different forward-chaining based procedure, while our prob- sample solution to a student. Third, given a student’s incom- lem generation procedure can efficiently generate plete solution, it can also be used to generate hints on the next several variants with desired characteristics. step or an intermediate goal. 1 Introduction Significance of Problem Generation Generating fresh problems that have specific solution characteristics is a te- [ ] Natural deduction Gentzen, 1964 is a method for estab- dious task for the teacher. Automating this has several bene- lishing the validity of propositional type arguments, where fits. the conclusion of an argument is actually derived from the Generating problems that are similar to a given problem premises through a series of discrete steps. A proof in natu- has two applications. First, it can help avoid copyright issues. ral deduction consists of a sequence of propositions, each of It may not be legal to publish problems from textbooks on which is either a premise or is derived from preceding propo- course websites. Hence, instructors resort to providing indi- sitions by application of some inference/replacement rule and rect pointers to textbook problems as part of assignments. A the last of which is the conclusion of the argument. problem generation tool can provide instructors with a fresh source of problems to be used in their assignments or lecture Significance of Teaching Natural Deduction From a prag- notes. Second, it can help prevent plagiarism in classrooms matic perspective, it helps develop the reasoning skills needed or massive open online courses since each student can be pro- to construct sound arguments of one’s own and to evaluate vided with a different problem of the same difficulty level. the arguments of others. It instills a sensitivity for the formal Generating problems that have a given difficulty level and component in language, a thorough command of which is in- that exercise a given set of concepts can help create personal- dispensable to clear, effective, and meaningful communica- ized workflows for students. If a student solves a problem cor- tion. Such a logical training provides a fundamental defense rectly, then the student may be presented with a problem that 1968 Rule Name Premises Conclusion Rule Name Proposition Equivalent Proposition Modus Ponens (MP) p ! q, p q De Morgan’s Th. : (p ^ q) : p _: q Modus Tollens (MT) p ! q, : q : p : (p _ q) : p ^ : q Hypothetical Syllogism (HS) p ! q, q ! r p ! r Distribution p _ (q ^ r) (p _ q) ^ (p _ r) Disjunctive Syllogism (DS) p _ q, : p q Double Negation p :: p (a) (b) Constructive Dilemma (CD) (p ! q) ^ (r ! s), p _ r q _ s Transposition p ! q : q !: p Destructive Dilemma (DD) (p ! q) ^ (r ! s), : q _: s : p _: r Implication p ! q : p _ q Simplification (Simp) p ^ q q Equivalence p ≡ q (p ! q)^ (q ! p) Conjunction (Conj) p, q p ^ q p ≡ q (p ^ q)_ (: p ^ : q) Addition (Add) p p _ q Exportation (p ^ q) ! r p ! (q ! r) Figure 1: (a) Sample Inference Rules. (b) Sample Replacement Rules. Step Proposition Reason Step Truth-table Reason Step Proposition Reason P1 x1 _ (x2 ^ x3) Premise P1 η1 = 1048575 Premise P1 x1 ≡ x2 Premise P2 x1 ! x4 Premise P2 η2 = 4294914867 Premise P2 x3 !:x2 Premise P3 x4 ! x5 Premise (b) P3 η3 = 3722304989 Premise P3 (x4 ! x5) ! x3 Premise 1 (x1_x2)^(x1_x3) P1, Distribution 1 η4 = 16777215 P1, Simp. 1 (x1 ! x2) ^ (x2 ! x1) P1, Equivalence 2 x1 _ x2 1, Simp. 2 η5 = 4294923605 P2, P3, HS. 2 x1 ! x2 1, Simp. (a) 3 x1 ! x5 P2, P3, HS. 3 η6 = 1442797055 1, 2, HS. (d) 3 (x4 ! x5) !:x2 P3, P2, HS. 4 x2 _ x1 2, Commutativity 4 ::x2 !:(x4 ! x5) 3, Transposition 5 ::x2 _ x1 4, Double Neg. Node Candidate Target Source 5 x2 !:(x4 ! x5) 4, Double Neg. 6 :x2 ! x1 5, Implication Proposition Witness Witness 6 x1 !:(x4 ! x5) 2, 5, HS. 7 :x2 ! x5 6, 3, HS. (c) η4 q2 [q1] [qP1] 7 x1 !:(:x4 _ x5) 6, Implication 8 ::x2 _ x5 7, Implication η5 q3 [qP2; qP3] [qP2; qP3] 8 x1 ! (::x4 ^ :x5) 7, De Morgan’s 9 x2 _ x5 8, Double Neg. η6 q7 [q6; q3] [q2; q3] 9 x1 ! (x4 ^ :x5) 8, Double Neg. Figure 2: (a) Natural Deduction Proof (with application of inference rules highlighted in bold). (b) Abstract Proof with truth-tables shown using a 32-bit integer representation. (c) Steps in converting abstract proof to natural deduction proof. qj denotes the proposition at step j from (a). (d) Natural Deduction Proof of a similar problem. is more difficult than the last problem, or exercises a richer starting from a conclusion during problem generation, set of concepts. If a student fails to solve a problem, then the which we model as a reverse of solution generation. student may be presented with a simpler problem. Novel Technical Insights Our observations include: Contributions We make following contributions. • Small-sized hypothesis: Propositions that occur in edu- • We propose leveraging the following novel ingredients cational contexts use a small number of variables and for building an efficient computer-aided education sys- have a small size (Fig. 4(a)). The number of such small- tem for natural deduction: small-sized proposition hy- sized propositions is bounded (Fig. 3(a)). pothesis, truth-table based representation, and offline computation. • Truth-Table Representation: A proposition can be ab- stracted using its truth-table, which can be represented • We present a novel two-phased methodology for solu- using a bitvector representation [Knuth, 2011]. (Appli- tion generation that first searches for an abstract solution cation of inference rules over bitvector representation and then refines it to a concrete solution. reduces to performing bitwise operations.) This provides two key advantages: (i) It partitions small-sized propo- • We motivate and define some useful goals for prob- sitions into a small number of buckets (Fig. 3(b)). (ii) It lem generation, namely similar problem generation and reduces the size/depth of a natural deduction proof tree, parameterized problem generation. We present a novel making it easier to search for solutions or generate prob- methodology for generating such problems using a pro- lems with given solution characteristics. cess that is reverse of solution generation. • Offline Computation: The symbolic reasoning required • We present detailed experimental results on 279 bench- to pattern match propositions for applying inference mark problems collected from various textbooks. Our rules can be performed (over their truth-table bitvector solution generation algorithm can solve 84% of these representation) and stored in an offline phase. This has problems, while the baseline traditional algorithm could two advantages: (i) It alleviates the cost of symbolic rea- only solve 57% of these problems. Our problem genera- soning by a large constant factor by removing the need tion algorithm is able to generate few thousands of sim- to perform any symbolic matching at runtime. (ii) It en- ilar problems and parameterized problems on average ables efficient backward search for appropriate premises per instance in a few minutes. 1969 s n s n Total n 2 1 2 3 4 5 Total (2 ) 1 2 3 4 5 1 2 10 1:3 × 102 2:4 × 103 4:6 × 104 1 4 2 4 4 4 4 (a) 2 4 52 1:5 × 103 5:9 × 104 2:5 × 106 (b) 2 16 4 16 16 16 16 3 1 6 126 5:8 × 103 3:3 × 105 2:1 × 107 3 256 6 38 152 232 256 4 8 232 1:4 × 104 1:1 × 106 9:8 × 107 4 65536 8 70 526 3,000 13,624 5 10 370 2:9 × 104 2:8 × 106 3:1 × 108 5 4,294,967,296 10 112 1,252 12,822 1,22,648 Figure 3: (a) Number of propositions (not including those with double or more negations) over n variables and size at most s.