Stochastic Definite Clause Grammars
Christian Theil Have
Research group PLIS: Programming, Logic and Intelligent Systems
Department of Communication, Business and Information Technologies
Roskilde University, P.O. Box 260, DK-4000 Roskilde, Denmark
[email protected]

Abstract

This paper introduces Stochastic Definite Clause Grammars, a stochastic variant of the well-known Definite Clause Grammars. The grammar formalism supports parameter learning from annotated or unannotated corpora and provides a mechanism for parse selection by means of statistical inference. Unlike probabilistic context-free grammars, it is a context-sensitive grammar formalism and it has the ability to model cross-serial dependencies in natural language. SDCG also provides some syntax extensions which make it possible to write more compact grammars and make it straightforward to add lexicalization schemes to a grammar.

1 Introduction and background

We describe a stochastic variant of the well-known Definite Clause Grammars [12], which we call Stochastic Definite Clause Grammars (SDCG).

Definite Clause Grammars (DCG) is a grammar formalism built on top of Prolog, which was developed by Pereira and Warren [12] and was based on the principles from Colmerauer's metamorphosis grammars [6]. The grammars are expressed as rewrite rules which may include logic variables, like normal Prolog rules. DCG exploits Prolog's unification semantics, which assures equality between different instances of the same logic variable. DCG also allows modeling of cross-serial dependencies, which is known to be beyond the capability of context-free grammars [4].
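For instance, the following small DCG (an illustrative example, not taken from the paper) generates the language a^n b^m c^n d^m, a formal pattern often used to model cross-serial dependencies. The counting arguments are ordinary logic variables, and unification forces the crossed counts to agree, something no context-free grammar can express:

    % N counts the a's and must equal the number of c's;
    % M counts the b's and must equal the number of d's.
    s --> as(N), bs(M), cs(N), ds(M).

    as(0)    --> [].
    as(s(N)) --> [a], as(N).
    bs(0)    --> [].
    bs(s(M)) --> [b], bs(M).
    cs(0)    --> [].
    cs(s(N)) --> [c], cs(N).
    ds(0)    --> [].
    ds(s(M)) --> [d], ds(M).

With this grammar, phrase(s, [a,a,b,c,c,d]) succeeds, while phrase(s, [a,b,c,c]) fails because the counts do not unify.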
In stochastic grammar formalisms such as probabilistic context-free grammars (PCFG), every rewrite rule has an associated probability. For a particular sentence, a grammar can produce an exponential number of derivations. In parsing, we are usually only interested in the one derivation which best reflects the intended sentence structure. In stochastic grammars, a statistical inference algorithm can be used to find the most probable derivation, and this is a very successful method for parse disambiguation. This is especially true for variants of PCFGs which condition rule expansions on lexical features. Charniak [3] reports that "a vanilla PCFG will get around 75% precision/recall whereas lexicalized models achieve 87-88% precision/recall". The reason for the impressive precision/recall of stochastic grammars is that the probabilities governing the likelihood of rule expansions are normally derived from corpora using parameter estimation algorithms. Estimation with complete data, where corpus annotations dictate the derivations, can be done by counting the expansions used in the annotations. Estimation with incomplete data can be accomplished using the Expectation-Maximization (EM) algorithm [8].

In stochastic unification grammars, the choice of rules to expand is stochastic and the values assigned to unification variables are determined implicitly by rule selection. This means that in some derivations, instances of the same logic variable may get different values and unification will fail as a result.

Some of the first attempts to define stochastic unification grammars did not address the issue of how they should be trained. Brew [2] and Eisele [9] try to address this problem using EM, but their methods have problems handling cases where variables fail to unify. The resulting probability distributions are missing some probability mass, and normalization results in non-optimal distributions.

Abney [1] defines a sound theory of unification grammars based on Markov fields and shows how to estimate the parameters of these models using Improved Iterative Scaling (IIS). Abney's proposed solution to the parameter estimation problem depends on sampling and only considers complete data. Riezler [13] describes the Iterative Maximization algorithm, which also works for incomplete data. Finally, Cussens [7] provides an EM algorithm for stochastic logic programs which handles incomplete data and is not dependent on sampling.

SDCG is implemented as a compiler that translates a grammar into a program in the PRISM language. PRISM [16, 19, 15] is an extension of Prolog that allows expression of complex statistical models as logic programs. A PRISM program is a usual Prolog program augmented with random variables. PRISM defines a probability distribution over the possible Herbrand models of a program. It includes efficient implementations of algorithms for parameter learning and probabilistic inference. The execution, or sampling, of a PRISM program is a simulation where values for the random variables are selected stochastically, according to the underlying probability distribution. PRISM programs can have constraints, usually in the form of equality between unified logic variables. Stochastic selection of values for such variables may lead to unification failure, and the resulting failed derivations must be taken into account in parameter estimation. PRISM achieves this using the fgEM algorithm [17, 20, 18], which is an adaptation of Cussens' Failure-Adjusted Maximization algorithm [7]. A central part of Cussens' algorithm is the estimation of the number of times rules are used in failed derivations. PRISM estimates failed derivations using a failure program, derived through a program transformation called First Order Compilation (FOC) [14].
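As a minimal sketch of what such a program looks like (the switch and predicate names below are invented for illustration and are not output of the SDCG compiler), a PRISM program declares the outcome space of each random switch with values/2 and draws from it with msw/2, so that the outcome of a switch selects which production is used:

    % Declare a random switch whose outcomes name the possible expansions
    % of a noun phrase.
    values(np_rule, [det_noun, noun]).

    % msw/2 makes a stochastic choice among the declared outcomes;
    % the chosen outcome is dispatched to the corresponding body.
    np(S0, S) :- msw(np_rule, Rule), np_body(Rule, S0, S).

    np_body(det_noun, S0, S) :- det(S0, S1), noun(S1, S).
    np_body(noun, S0, S)     :- noun(S0, S).

    % Terminals are encoded as difference lists, as in a translated DCG.
    det([the|S], S).
    noun([dog|S], S).
    noun([dogs|S], S).

Switch probabilities can then be set or estimated, and goals such as np([the,dog], []) can be queried for their probability, using PRISM's built-in predicates.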
2 Stochastic Definite Clause Grammars

Stochastic Definite Clause Grammars is a stochastic unification-based grammar formalism. The grammar syntax is modeled after, and is compatible with, Definite Clause Grammars. To facilitate writing stochastic grammars in DCG notation, a custom DCG compiler has been implemented. The compiler converts a DCG to a PRISM program, which is a stochastic model of the grammar.

Utilizing the functionality of PRISM, the grammar formalism supports parameter learning from annotated or unannotated corpora and provides a mechanism for parse selection through statistical inference. Parameter learning and inference are performed using PRISM's built-in functionality.

SDCG includes some extensions to the DCG syntax. It includes a compact way of expressing recursion, inspired by regular expressions. It has expansion macros used for writing template rules, which allow compact expression of multiple similar rules. The grammar syntax also adds a new conditioning operator which makes it possible to condition rule expansions on previous expansions.

2.1 Grammar syntax

A grammar consists of grammar rules and possibly some helper Prolog rules and facts. A grammar rule takes the form,

    H ==> C1,C2,..,Cn.

H is called the head or left-hand side of the rule and C1,C2,...,Cn is called the body or right-hand side of the rule. The head is composed of a name, followed by an optional parameter list and an optional conditioning clause. It has the form,

    name(F1,F2,...,Fn) | V1,V2,...,Vn

The name of the rule is a Prolog atom. The parameter list is a non-empty parenthesized, comma-separated list of features, which may be Prolog variables or atoms. The number of features in a rule is referred to as its arity. The optional conditioning clause starts with the pipe symbol (which is part of the clause) and is a non-empty, comma-separated list of Prolog variables or atoms, or a combination of the two. The conditioning clause may also contain expansion macros in the case of unexpanded rules.

The constituents in the body of a rule are rule constituents, symbol lists or embedded Prolog code. A rule constituent consists of a name which is a Prolog atom, followed by an optional parenthesized, comma-separated list of features, (F1..Fn). Features are either Prolog atoms or variables. Rule constituents may additionally have prefix regular expression modifiers. The allowed modifiers are * (Kleene star), meaning zero or more occurrences, +, meaning one or more occurrences, and ?, meaning zero or one occurrence.

Embedded code takes the form, {P}, where P is a block of Prolog goals and control structures. The allowed subset of Prolog corresponds to what is allowed in the body of a Prolog rule, but with the restriction that every goal must return a ground answer and may not be a variable. Also, while admitted by the syntax, meta-programming goals like call are not allowed. The goals unify with facts and rules defined outside the embedded Prolog code, but not in other embedded code blocks.

Symbol lists are Prolog lists of either atoms or variables or a combination of the two. The list usually takes the form, [ S1,S2,..,SN ], but the list operator | may also be used. However, it is required that every variable in the list is ground. A symbol list may not be empty.

Expansion macros have the form,

    @name(V1,V2,...,Vn)

where name is an atom and is followed by a non-empty parenthesized, comma-separated list, V1...Vn, consisting of atoms or variables or a combination. A macro has a corresponding goal, name/n, which must be defined.

2.2 Procedural semantics

The grammar rules govern the rewriting of the head of a rule into the constituents in the body of the rule. A rule is rewritten when all its constituents have been expanded. The order of the constituents in the body is significant and they are expanded in a left-to-right manner. The rewriting process always begins with the start rule and progresses in a depth-first manner. A rule constituent in the body of a rule is thus a reference to one or more other rules of the grammar. A grammar rule is said to be matched by a constituent rule if the name and arity are the same and their features unify. A constituent rule is expanded by replacing it with the body of some matching rule. Symbol lists are terminals and are not expanded. Embedded Prolog code is expanded to nothing and executed as a side-effect. The expansion terminates when the body only contains symbols or when some constituent cannot be expanded (the derivation fails).

When a constituent matches more than one rule there may be more than one derivation. The choice of which rule to expand for such a constituent should be seen in the light of the probabilistic inference being
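As a purely illustrative fragment (invented here, not a grammar from the paper, with sentence assumed to be the start rule), the rules below follow the syntax of section 2.1. The constituent np(Num) in the body of the sentence rule matches both np/1 rules, so its expansion involves exactly the kind of stochastic choice discussed above, while the shared feature Num enforces number agreement through unification:

    % Start rule: Num is shared between the noun phrase and the verb phrase.
    sentence ==> np(Num), vp(Num).

    % Two rules with the same name and arity; a matching constituent may be
    % expanded with either of them.
    np(Num) ==> det(Num), noun(Num).
    np(plural) ==> noun(plural).

    % Terminal symbols are written as symbol lists.
    det(singular) ==> [the].
    det(plural) ==> [the].
    noun(singular) ==> [dog].
    noun(plural) ==> [dogs].
    vp(singular) ==> [barks].
    vp(plural) ==> [bark].

For the sentence "the dogs bark" only the first np rule applies, while "dogs bark" requires the second; in a grammar where several derivations cover the same sentence, the probabilities attached to such rule choices are the basis for parse selection.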