Generating Text with a Theorem Prover
Iván I. Garibay
School of Computer Science
University of Central Florida
Orlando, FL
[email protected]

Abstract

The process of documenting designs is tedious and often error-prone. We discuss a system that automatically generates documentation for the single step transition behavior of Statecharts, with particular focus on the correctness of the result, in the sense that the document will present all and only the facts corresponding to the design being documented. Our approach is to translate the Statechart into a propositional formula, then translate this formula into a natural language report. In the latter translation, pragmatic effects arise due to the way the information is presented. Whereas such effects can be difficult to quantify, we account for them within an abstract framework by applying a series of transformations on the structure of the report while preserving soundness and completeness of the logical content. The result is an automatically generated hypertext report that is both logically correct and, to a relatively high degree of confidence, free of misleading implicatures.

1 Introduction

Producing technical documentation is a time-consuming and expensive task. For instance, Reiter et al. (1995) report cases of engineers spending five hours on documentation for each hour spent on design, and of airplane documentation sets which weigh more than the actual airplane being documented. Part of the reason for this problem is the gap between Computer Aided Design (CAD) tools and similar tools for assisting the documentation of those designs. Since research efforts focus primarily on the former, this situation is likely to get worse as the CAD tools get more powerful while documentation tools lag far behind.

In this paper we address the matter of automatic generation of technical documentation (Reiter et al., 1992; Reiter et al., 1995; Rösner and Stede, 1992; Svenberg, 1994; Punshon et al., 1997) by studying the problem of automatically generating documents describing the single step transition behavior of Statecharts.

From a natural language generation (NLG) perspective, this problem is distinguished in that the formal correctness of the document being generated is crucial while felicitousness of the style is relatively unimportant. This leads us to a solution based on formally verifiable theorem-proving techniques, which allows us to approach strategic NLG issues within a highly abstract and conceptually clear framework.

The system takes a statechart in the form of a labeled directed graph and translates it into a set of propositional formulae defining its transition behavior. A hyper-text natural language document is generated on demand from this set of formulae in response to the reader's interaction with the application.

Figure 1 depicts a comparative (Moore and Paris, 1993; Paris et al., 1991; Hovy, 1988) conceptual view of the system, while Fig. 2 shows the system architecture. A prototype has been fully implemented with the exception of the statechart axiomatization module.¹

[Figure 1: Conceptual view of the system. Stages shown: Statechart; Content Planning (question tree + tree transformations); Text Planning (hypertext's implicit text planner, i.e. the user); Realization (template); Hyper-text Document.]

[Figure 2: System architecture of the theorem prover based generator. The dotted box is not implemented. Components shown: Statechart; Axiomatization Module (dotted); Statechart Axioms; Reduction to MRCNF module; Question Tree; Theorem Prover; Information Extraction Module; Hyper-text Organization/Realization Module; Generated Hyper-text Page; User Interface (Browser).]

¹ A full description of this algorithmic translation of a statechart from its graphical formalism to the propositional logic input format used in this work is given in Garibay (2000).

2 A Logical Semantics for Statecharts

The graphical language of statecharts as proposed by David Harel (Harel et al., 1987; Harel and Naamad, 1996) has been widely recognized as an important tool for analyzing complex reactive systems. It has been implemented in commercial applications like STATEMATE (Harel and Politi, 1998) and RHAPSODY from I-Logix (I-Logix Inc., 2000), and has been adopted as a part of the Unified Modeling Language (UML Revision Task Force, 1999; Booch, 1999), an endeavor to standardize a language of blueprints for software.

Statecharts (Fig. 3) are an extension of conventional finite state machines in which the states may have a hierarchical structure. A configuration is defined as a maximal set of non-conflicting states which are active at a given time. A transition connects states and is labeled with the set of events that trigger it, and a second set of events that are generated when the transition is taken. A step of the statechart relates the current configuration and the events that are active to the next configuration and the events that are generated. A configuration and the set of events that are active is referred to as a status.

[Figure 3: Example Statechart.]

We capture a step of a statechart as a pair of propositional models, one for the current status and one for the next status. In practice, we incorporate this into a single model with two versions of each propositional variable: P for the truth value in the current status and Pn for the truth value in the next status.² A full description of the algorithm for translating statecharts to sets of formulae can be found in Garibay (2000). For an example of this translation see Fig. 4.

    ((TV ↔ WORKING ∨ WAITING)
     (TV.next ↔ WORKING.next ∨ WAITING.next)
     (WORKING → ¬WAITING)
     (WORKING.next → ¬WAITING.next)
     (WAITING → ¬WORKING)
     (WAITING.next → ¬WORKING.next)
     (WORKING ↔ IMAGE ∧ SOUND)
     (WORKING.next ↔ IMAGE.next ∧ SOUND.next)
     ... )

    ((TV) ∧
     ((WORKING ∧ PICTURE ∧ PIC-OFF ∧ WAITING.next) ∨
      (WORKING ∧ ¬(PICTURE ∧ PIC-OFF) ∧
       ((IMAGE ∧ PICTURE ∧ PIC-OFF ∧ WAITING.next) ∨
        (IMAGE ∧ ¬(PICTURE ∧ PIC-OFF) ∧
         ((PICTURE ∧ PIC-OFF ∧ WAITING.next) ∨
          (PICTURE ∧ TXT ∧ MUTE.next ∧ TEXT.next) ∨
          (PICTURE ∧ ¬OFF ∧ ¬TXT ∧ PICTURE.next) )
     ... )))

Figure 4: Section of the propositional logic translation of the example statechart (Fig. 3).

² These single step models will form the basis for a temporal model capturing the full behavior of the statecharts as described by Harel and Naamad (1996).

3 The Minimum Clausal Theory of the Statecharts

At this point, we have a formula that entails the theory of the single step transition behavior of a Statechart. We can fulfill our requirement of generating a sound and complete report just by translating this formula into English. However, this approach presents a number of problems. For instance, the AND and OR connectives do not in general have the same meaning in English as they do in logic (Gazdar, 1979). Furthermore, unlike in the logical formula, the scope of the connectives in English is not, in general, well defined (Holt and Klein, 1999). To minimize the ambiguity, we need to take the formula to a form with minimal nesting of operators.

Potentially a more significant problem is the fact that much of the theory (the formula plus all its logical consequences) is obtainable only via complicated inferences. Since the reader understands the translation of the formula at an intuitive level, making only limited inferences, a direct translation will fail to communicate the entire theory. Hence, we would like to take the formula to a form that is closed, in some sense, under logical consequences.

We address both issues by using what we refer to as minimal (fully) resolved conjunctive normal form (MRCNF). A formula is in MRCNF if and only if it is in conjunctive normal form (CNF) and is closed under resolution, absorption and tautology (Fitting, 1990; Rogers and Vijay-Shanker, 1994). The closure under resolution is effectively a finite approximation of closure under consequence, that is, every clause that is a logical consequence of the theory entailed by the formula is a direct consequence of some clause in the MRCNF. The other two operations guarantee minimality in size by removing clauses that are trivially true (tautology), and those that are proper supersets of another clause (absorption).

Hence, the translation will communicate not only the initial facts but also those inferred by resolution. Moreover, a formula in this form is just a conjunction of disjunctions, which eliminates the scoping problem. If we interpret the disjunctions as implications, the translation into English will be just a sequence of implicative sentences that are to be interpreted conjunctively, a typical structure for such information in English.

[...] the theory is contingent upon. The reader effectively fixes the valuation of one of these variables to true or false. The system then adds the reader's choice to the theory and recalculates the MRCNF. If the newly obtained theory remains contingent upon some variables, the reader will then have a new set of choices available. If not, the reader will have reached a set of non-contingent facts (henceforth facts) which are consequences of all the previous choices.

While this process makes the information more accessible by giving it a logical structure, it does nothing to reduce the size of the report. We resolve this by generating the document on demand. While the refinement process (the core computation for on-demand generation) can potentially be very expensive in terms of time, the fact that we are adding singleton clauses to an already minimum set of clausal consequences allows us to use a simplified form of
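As an illustration only, the MRCNF reduction of Section 3 and the refinement step (adding the reader's choice as a singleton clause and recomputing) can be sketched in Python. This is a minimal sketch under stated assumptions, not the prototype's implementation: the function names (`mrcnf`, `refine`, and the helpers) are hypothetical, literals are encoded as strings with a `~` prefix for negation, the three-clause theory is a toy example rather than the axioms of Fig. 4, and `refine` simply recomputes the full closure instead of using a simplified procedure.

```python
from itertools import combinations

def neg(lit):
    """Negate a literal written as 'P' or '~P' (encoding assumed for this sketch)."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def is_tautology(clause):
    """A clause is trivially true if it contains a literal and its negation."""
    return any(neg(lit) in clause for lit in clause)

def absorb(clauses):
    """Remove tautologies and every clause that is a proper superset of another."""
    clauses = {c for c in clauses if not is_tautology(c)}
    return {c for c in clauses if not any(d < c for d in clauses)}

def resolvents(c1, c2):
    """All resolvents of two clauses, one per complementary pair of literals."""
    return {(c1 - {lit}) | (c2 - {neg(lit)}) for lit in c1 if neg(lit) in c2}

def mrcnf(clauses):
    """Close a clause set under resolution, tautology removal and absorption."""
    current = absorb({frozenset(c) for c in clauses})
    while True:
        new = set(current)
        for c1, c2 in combinations(current, 2):
            new |= resolvents(c1, c2)
        new = absorb(new)
        if new == current:      # fixpoint reached: nothing new can be derived
            return current
        current = new

def refine(theory, literal):
    """Add the reader's choice as a singleton clause and recompute the closure."""
    return mrcnf(theory | {frozenset([literal])})

# Toy theory (hypothetical): WORKING or WAITING, not both, and WORKING implies IMAGE.
theory = mrcnf([{"WORKING", "WAITING"},
                {"~WORKING", "~WAITING"},
                {"~WORKING", "IMAGE"}])
facts = refine(theory, "WORKING")
print(sorted(sorted(c) for c in facts))  # [['IMAGE'], ['WORKING'], ['~WAITING']]
```

In the toy theory, the closure adds the derived clause WAITING or IMAGE to the three initial clauses. Fixing WORKING to true then leaves only singleton clauses, which correspond to what the text calls non-contingent facts.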