Automatically Generating Problems and Solutions for Natural Deduction
Total Page:16
File Type:pdf, Size:1020Kb
Automatically Generating Problems and Solutions for Natural Deduction Umair Ahmed Sumit Gulwani Amey Karkare IIT Kanpur MSR Redmond IIT Kanpur [email protected] [email protected] [email protected] Abstract against the prejudiced and uncivilized attitudes that threaten the foundations of our democratic society [Hurley, 2008]. Natural deduction, which is a method for establish- From a pedagogical perspective, natural deduction is a use- ing validity of propositional type arguments, helps ful tool in relieving math anxiety that terrifies countless stu- develop important reasoning skills and is thus a dents. It is a gentle introduction to mastering the use of logical key ingredient in a course on introductory logic. symbols, which carries into other more difficult fields such as We present two core components, namely solution algebra, geometry, physics, and economics. generation and practice problem generation, for en- Natural deduction is typically taught as part of an introduc- abling computer-aided education for this important tory course on logic, which is a central component of college subject domain. The key enabling technology is use education and is generally offered to students from all disci- of an offline-computed data-structure called Uni- plines regardless of their major. It is thus not a surprise that versal Proof Graph (UPG) that encodes all possi- a course on logic is being offered on various online educa- ble applications of inference rules over all small tion portals including Coursera [Coursera, ], Open Learning propositions abstracted using their bitvector-based Initiative [oli, ], and even Khan Academy [kha, ]. truth-table representation. This allows an efficient forward search for solution generation. More inter- Significance of Solution Generation Solution generation estingly, this allows generating fresh practice prob- is important for several reasons. First, it can be used to gen- lems that have given solution characteristics by per- erate sample solutions for automatically generated problems. forming a backward search in UPG. We obtained Second, given a student’s incomplete solution, it can be used 500+ natural deduction problems from various text- to complete the solution. This can be much more illustrative books. Our solution generation procedure can solve for a student as opposed to providing a completely different many more problems than the traditional forward- sample solution to a student. Third, given a student’s incom- chaining based procedure, while our problem gen- plete solution, it can also be used to generate hints on the next eration procedure can efficiently generate several step or an intermediate goal. variants with desired characteristics. Significance of Problem Generation Generating fresh problems that have given solution characteristics is a tedious 1 Introduction task for the teacher. Automating this has several benefits. Generating problems that are similar to a given problem Natural deduction [Gentzen, 1964] is a method for estab- has two applications. First, it can help avoid copyright issues. lishing the validity of propositional type arguments, where It may not be legal to publish problems from textbooks on the conclusion of an argument is actually derived from the course websites. Hence, instructors resort to providing indi- premises through a series of discrete steps. A proof in natu- rect pointers to textbook problems as part of assignments. A ral deduction consists of a sequence of propositions, each of problem generation tool can provide instructors with a fresh which is either a premise or is derived from preceding propo- source of problems to be used in their assignments or lecture sitions by application of some inference/replacement rule and notes. Second, it can help prevent plagiarism in classrooms the last of which is the conclusion of the argument. or massive open online courses since each student can be pro- vided with a different problem of the same difficulty level. Significance of Teaching Natural Deduction From a prag- Generating problems that have a given difficulty level and matic perspective, it helps develop the reasoning skills needed that exercise a given set of concepts can help create personal- to construct sound arguments of one’s own and to evaluate ized workflows for students. If a student solves a problem cor- the arguments of others. It instills a sensitivity for the formal rectly, then the student may be presented with a problem that component in language, a thorough command of which is in- is more difficult than the last problem, or exercises a richer dispensable to clear, effective, and meaningful communica- set of concepts. If a student fails to solve a problem, then the tion. Such a logical training provides a fundamental defense student may be presented with a simpler problem. Rule Name Premises Conclusion Rule Name Proposition Equivalent Proposition Modes Ponen (MP) p ! q, p q De Morgan’s Theorem : (p ^ q) : p _: q Modus Tollens (MT) p ! q, : q : p : (p _ q) : p ^ : q Hypothetical Syllogism (HS) p ! q, q ! r p ! r Distribution p _ (q ^ r) (p _ q) ^ (p _ r) Disjunctive Syllogism (DS) p _ q, : p q Double Negation p :: p (a) (b) Constructive Dilemma (CD) (p ! q) ^ (r ! s), p _ r q _ s Transposition p ! q : q !: p Destructive Dilemma (DD) (p ! q) ^ (r ! s), : q _: s : p _: r Implication p ! q : p _ q Simplification (Simp) p ^ q q Equivalence p ≡ q (p ! q)^ (q ! p) Conjunction (Conj) p, q p ^ q p ≡ q (p ^ q)_ (: p ^ : q) Addition (Add) p p _ q Exportation (p ^ q) ! r p ! (q ! r) Figure 1: (a) Sample Inference Rules. (b) Sample Replacement Rules. Step Proposition Reason Step Truth-table Reason Step Proposition Reason P1 x1 _ (x2 ^ x3) Premise P1 η1 = 1048575 Premise P1 x1 ≡ x2 Premise P2 x1 ! x4 Premise P2 η2 = 4294914867 Premise P2 x3 !:x2 Premise P3 x4 ! x5 Premise (b) P3 η3 = 3722304989 Premise P3 (x4 ! x5) ! x3 Premise 1 (x1_x2)^(x1_x3) P1, Distribution 1 η4 = 16777215 P1, Simp. 1 (x1 ! x2) ^ (x2 ! x1) P1, Equivalence 2 x1 _ x2 1, Simp. 2 η5 = 4294923605 P2, P3, HS. 2 x1 ! x2 1, Simp. (a) 3 x1 ! x5 P2, P3, HS. 3 η6 = 1442797055 1, 2, HS. (d) 3 (x4 ! x5) !:x2 P3, P2, HS. 4 x2 _ x1 2, Association 4 ::x2 !:(x4 ! x5) 3, Transposition 5 ::x2 _ x1 4, Double Neg. Node Candidate Target Source 5 x2 !:(x4 ! x5) 4, Double Neg. 6 :x2 ! x1 5, Implication Proposition Witness Witness 6 x1 !:(x4 ! x5) 2, 5, HS. 7 :x2 ! x5 6, 3, HS. (c) η4 q2 (q1) (qP1) 7 x1 !:(:x4 _ x5) 6, Implication 8 ::x2 _ x5 7, Implication η5 q3 (qP2; qP3) (qP2; qP3) 8 x1 ! (::x4 ^ :x5) 7, De Morgan’s 9 x2 _ x5 8, Double Neg. η6 q7 (q6; q3) (q2; q3) 9 x1 ! (x4 ^ :x5) 8, De Morgan’s Figure 2: (a) Natural Deduction Proof. (b) Abstract Proof with truth-tables shown using a 32-bit integer representation. (c) Steps in converting abstract proof to natural deduction proof. qj denotes the proposition at step j from (a). (d) Natural Deduction Proof of a similar problem. Novel Technical Insights Our observations include: truth-table based representation, and offline computation. • Small-sized hypothesis: Propositions that occur in educa- • We present a novel two-phased methodology for solution tional contexts use a small number of variables and have generation that first searches for an abstract solution and a small size (Fig. 4(a)). The number of such small-sized then refines it to a concrete solution. propositions is bounded (Fig. 3(a)). • We motivate and define some useful goals for problem • Truth-Table Representation: A proposition can be ab- generation, namely similar problem generation and param- stracted using its truth-table, which can be represented us- eterized problem generation. We present a novel method- ing a bitvector representation. (Application of inference ology for generating such problems using a process that is rules over bitvector representation reduces to performing reverse of solution generation. bitwise operations.) This provides two key advantages: (i) • We present detailed experimental results on 430 bench- It partitions small-sized propositions into a small number mark problems collected from various textbooks. Our so- of buckets (Fig. 3(b)). (ii) It reduces the size/depth of a lution generation algorithm can solve 70% of these prob- natural deduction proof tree, making it easier to search for lems, while the baseline traditional algorithm could only solutions or generate problems with given solution charac- solve 72% of the problems that our algorithm is able to teristics. solve. Our problem generation algorithm is able to gener- • Offline Computation: The symbolic reasoning required to ate few thousands of similar problems and parameterized pattern match propositions for applying inference rules can problems on average per instance in a few minutes. be performed (over their truth-table bitvector representa- tion) and stored in an offline phase. This has two advan- 2 Natural Deduction tages: (i) It alleviates the cost of symbolic reasoning by Let x1; : : : ; xn be n Boolean variables. A proposition over a large constant factor by removing the need to perform these Boolean variables is a Boolean formula consisting of any symbolic matching at runtime. (ii) It enables efficient Boolean connectives over these variables. backward search for appropriate premises starting from a Definition 1 (Natural Deduction Problem) A natural de- conclusion during problem generation, which we model as m duction problem is a pair (fpigi=1; c) of a set of propositions a reverse process of solution generation. m fpigi=1 called premises and a proposition c called conclu- sion. A natural deduction problem is well-defined if the con- Contributions We make following contributions. clusion is implied by the premises, but not by any strict subset • We propose leveraging the following novel ingredients of those premises.