Definite Clause Grammar with Prolog
Lecture 13

Definite Clause Grammar

The most popular approach to parsing in Prolog is the Definite Clause Grammar (DCG), which is a generalization of Context Free Grammar (CFG). Parsing is one of the important applications of Prolog and logic programming. The DCG formalism is essentially independent of Prolog: it would be possible to write a compiler or interpreter for it in any programming language that permits the unification of arguments. DCG is nevertheless easy to implement in Prolog because grammar rules are similar to Prolog rules.

Let us see the relationship of DCG with Prolog and its systematic evolution.
– Here we start from Prolog and move to DCG, to justify the claim that DCG is a by-product of Prolog.

We begin by defining a Context Free Grammar whose rules are expressed in Backus-Naur Form (BNF). The general form of a CFG rule is as follows:

    <non_terminal> ::= <body>

– where body is a sequence of terminal and non-terminal symbols of the grammar.

Grammar (BNF notation)

Consider the following grammar for a small subset of English sentences, defined using BNF-like notation.

    <sentence>    ::= <noun_phrase>, <verb_phrase>
    <noun_phrase> ::= <determiner>, <noun>
    <verb_phrase> ::= <verb>, <noun_phrase> | <verb>
    <determiner>  ::= a | the | an
    <noun>        ::= apple | boy | girl | song
    <verb>        ::= eats | sings

The declarative meaning of the first rule is that a sentence can take a form in which a noun_phrase is followed by a verb_phrase.

Parse Tree

The parse tree for the sentence “the girl sings a song” is as follows:

    sentence
        noun_phrase
            determiner: the
            noun: girl
        verb_phrase
            verb: sings
            noun_phrase
                determiner: a
                noun: song

This grammar is context free and does not take care of number agreement or semantic information.
– A sentence such as “the girl sing a song” is also parsed if the verb sing is available in the lexicon.

Semantically incorrect sentences are also parsed.
For example, “the apple eats a boy” is correct according to the above grammar.
– The reason is simply that we have not incorporated any context-sensitive or semantic information.

Semantic Features

If we incorporate the constraints that the subject of eat should be animate (an object having life) and that its object should be edible, then sentences can be checked semantically as well. All these semantic features can be added and are explained later in this chapter.

The CFG can easily be coded as Prolog rules. Each non_terminal symbol becomes a unary predicate whose argument is the sentence or phrase it identifies.

Grammar in Prolog

    sentence(X) :- append(Y, Z, X), np(Y), vp(Z).       % (1)
    np(X)       :- append(Y, Z, X), det(Y), noun(Z).    % (2)
    vp(X)       :- append(Y, Z, X), verb(Y), np(Z).     % (3)
    vp(X)       :- verb(X).                             % (4)

The rules for terminal words are coded as facts.

    det([a]).      det([the]).     det([an]).
    noun([boy]).   noun([apple]).  noun([girl]).  noun([song]).
    verb([eats]).  verb([sings]).

Goal: ?- sentence([the, girl, sings, a, song]).

Note that a sentence is given as a list of words, each written as a Prolog symbol (atom).

In rule (1), X is instantiated to [the, girl, sings, a, song], but Y and Z are uninstantiated variables. The goal append(Y, Z, X) generates, on backtracking, all possible pairs of values of Y and Z whose concatenation is X. The following pairs of Y and Z lists are obtained.
– Y = [ ], Z = [the, girl, sings, a, song]
– Y = [the], Z = [girl, sings, a, song]
– Y = [the, girl], Z = [sings, a, song]
– Y = [the, girl, sings], Z = [a, song]
– Y = [the, girl, sings, a], Z = [song]
– Y = [the, girl, sings, a, song], Z = [ ]

Of these, only Y = [the, girl], Z = [sings, a, song] satisfies both np(Y) and vp(Z).

Incomplete Data Structure

Data structures that are incomplete, that is, that contain holes, are useful in many applications. An incomplete list is an example of such a structure. Consider the complete list [1, 2, 3]. We can represent it as the difference of any of the following pairs of lists.
– [1, 2, 3, 5, 8] and [5, 8]
– [1, 2, 3, 6, 7, 8, 9] and [6, 7, 8, 9]
– [1, 2, 3] and [ ]
Each of these is an instance of the pair of incomplete lists [1, 2, 3 | X] and X. We call such a pair a difference-list.
– A difference-list is written A-B, where A is the first argument and B is the second argument.
– The list [1, 2, 3] is represented as the difference-list [1, 2, 3 | X] - X.

This representation makes some list operations more efficient.

Example using Difference list Concept

Consider concatenating two lists represented as difference-lists. The appended list is obtained simply by unifying the appropriate arguments.

    append_diff(A - B, B - C, A - C).

To append the two lists [1,2,3] and [4,5,6], we execute the following goal using the difference-list rule given above.

Goal: ?- append_diff([1,2,3 | X] - X, [4,5,6 | Y] - Y, N).

Graphical Representation of append

[Figure: list A runs through B to C. A-B is the part of A that precedes B, B-C is the part of B that precedes C, and A-C is their concatenation.]

Search Tree

    ?- append_diff([1,2,3 | X] - X, [4,5,6 | Y] - Y, N).
       {A = [1,2,3 | X], B = X = [4,5,6 | Y], C = Y, N = A-C = [1,2,3,4,5,6 | Y] - Y}
       succeeds

Answer: X = [4,5,6 | Y], N = [1,2,3,4,5,6 | Y] - Y

• This program cannot be used for concatenating two complete lists.
• Each list has to be represented in difference-list notation.
• The representation has nontrivial limitations because the first list gets changed: its tail variable X is bound to [4,5,6 | Y].

Rule (1) of the append-based grammar can be rewritten using difference-lists as:

    sentence(X - Y) :- append_diff(X - Z, Z - Y, X - Y), np(X - Z), vp(Z - Y).

Since append_diff(X - Z, Z - Y, X - Y) always succeeds, we can remove it from the rule. The modified rule becomes:

    sentence(X - Y) :- np(X - Z), vp(Z - Y).

For the sake of convenience, we can write:

    sentence(X, Y) :- np(X, Z), vp(Z, Y).

Interpretation: The difference of the lists X and Y is a sentence if the difference of X and Z is a noun_phrase and the difference of Z and Y is a verb_phrase.
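
The difference-list concatenation described above can be tried directly. The following is a minimal sketch, assuming a standard Prolog system such as SWI-Prolog. The predicate demo/1 is a hypothetical name introduced here for illustration and is not part of the lecture. It appends the two difference-lists and then closes the open tail with [] so that an ordinary list is returned.

```prolog
% append_diff/3 as given in the lecture: concatenation by unification only.
append_diff(A - B, B - C, A - C).

% demo/1 is a hypothetical helper (an assumption, not from the lecture).
% It represents [1,2,3] and [4,5,6] as difference-lists, appends them,
% and binds the remaining tail to [] to obtain a complete list.
demo(Result) :-
    append_diff([1,2,3 | X] - X, [4,5,6 | Y] - Y, Result - []).
```

Usage: the goal ?- demo(L). answers L = [1, 2, 3, 4, 5, 6]. No recursion over the first list is needed; the whole concatenation is a single unification step.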
The np clause decides how much of the sequence is to be consumed and what is to be left for the vp clause to work on. A terminal symbol is coded in Prolog using the difference-list concept as

    category([token | X], X)

– which means that there is a terminal symbol between the two lists [token | X] and X.
– For example, det([the | X], X), noun([girl | X], X), verb([sing | X], X), etc.

Complete Prolog Program

    sentence(X, Y) :- np(X, Z), vp(Z, Y).
    np(X, Y)       :- det(X, Z), noun(Z, Y).
    vp(X, Y)       :- verb(X, Z), np(Z, Y).
    vp(X, Y)       :- verb(X, Y).

    det([a | X], X).      det([an | X], X).     det([the | X], X).
    noun([boy | X], X).   noun([girl | X], X).
    noun([song | X], X).  noun([apple | X], X).
    verb([sing | X], X).  verb([sings | X], X). verb([eats | X], X).

DCG Grammar

    sentence --> np, vp.
    np --> det, noun.
    vp --> verb.
    vp --> verb, np.

    det --> [a].      det --> [an].     det --> [the].
    noun --> [boy].   noun --> [girl].  noun --> [song].  noun --> [apple].
    verb --> [sing].  verb --> [sings]. verb --> [eats].

DCG Handler

Most Prolog systems have a built-in DCG handler that translates DCG rules into Prolog rules. The grammar rules themselves are Prolog structures whose main functor is -->, which is declared as an infix operator. The Prolog interpreter checks whether a term it reads in has this functor and, if so, translates it into a proper Prolog clause.

    DCG rule       sentence --> np, vp.
    Prolog rule    sentence(X, Y) :- np(X, Z), vp(Z, Y).

    DCG fact       det --> [the].
    Prolog fact    det([the | X], X).

The DCG handler translates DCG rules into Prolog rules by adding two difference-list arguments. A query is a normal Prolog goal, so the user has to supply the extra arguments explicitly:

    ?- sentence([the, girl, sings, a, song], []).

This goal is satisfied using the above DCG grammar.

Number agreement between subject and verb can easily be incorporated into a DCG grammar. The grammar above does not enforce it: because sing is in the lexicon, the following query also succeeds.

Query: ?- sentence([the, girl, sing, a, song], []).
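
One common way to add number agreement is to give the non-terminals an extra argument that carries the number. The following is a minimal sketch under that assumption, for a Prolog system that provides phrase/2 (for example SWI-Prolog). It is not the lecture's own extension: the atoms singular and plural and the plural noun girls are additions made here purely for illustration.

```prolog
% Subject and verb must share the same Num (singular or plural).
sentence --> np(Num), vp(Num).

np(Num) --> det(Num), noun(Num).

vp(Num) --> verb(Num), np(_).    % the object's number is independent
vp(Num) --> verb(Num).

det(singular) --> [a].
det(singular) --> [an].
det(_)        --> [the].         % "the" fits both numbers

noun(singular) --> [boy].
noun(singular) --> [girl].
noun(singular) --> [song].
noun(singular) --> [apple].
noun(plural)   --> [girls].      % assumption: not in the lecture's lexicon

verb(singular) --> [sings].
verb(singular) --> [eats].
verb(plural)   --> [sing].
```

Usage: ?- phrase(sentence, [the, girl, sings, a, song]). succeeds, while ?- phrase(sentence, [the, girl, sing, a, song]). now fails because the singular subject cannot unify with the plural verb. The sentence [the, girls, sing, a, song] is accepted.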
Search Tree

    ?- sentence([the, girl, sings, a, song], []).
    ?- np([the, girl, sings, a, song], Z), vp(Z, []).
    ?- det([the, girl, sings, a, song], Z1), noun(Z1, Z), vp(Z, []).
           Z1 = [girl, sings, a, song]
    ?- noun([girl, sings, a, song], Z), vp(Z, []).
           Z = [sings, a, song]
    ?- vp([sings, a, song], []).
    ?- verb([sings, a, song], Y), np(Y, []).
           Y = [a, song]
    ?- np([a, song], []).
    ?- det([a, song], X), noun(X, []).
           X = [song]
    ?- noun([song], []).
       Succeeds

Answer: Yes.
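
Instead of supplying the two list arguments by hand, many Prolog systems offer phrase/2, which calls a DCG non-terminal on a list and requires the remainder to be []. The sketch below assumes such a system (for example SWI-Prolog). It also adds one argument to each non-terminal so that the parse tree for “the girl sings a song” is built during parsing. The tree term shapes (sentence/2, np/2, vp/1, vp/2, det/1, noun/1, verb/1) are choices made here for illustration and do not come from the lecture.

```prolog
% Each non-terminal returns the subtree it recognised.
sentence(sentence(NP, VP)) --> np(NP), vp(VP).

np(np(D, N)) --> det(D), noun(N).

vp(vp(V, NP)) --> verb(V), np(NP).
vp(vp(V))     --> verb(V).

det(det(a))   --> [a].
det(det(an))  --> [an].
det(det(the)) --> [the].

noun(noun(boy))   --> [boy].
noun(noun(girl))  --> [girl].
noun(noun(song))  --> [song].
noun(noun(apple)) --> [apple].

verb(verb(sing))  --> [sing].
verb(verb(sings)) --> [sings].
verb(verb(eats))  --> [eats].
```

Usage: ?- phrase(sentence(T), [the, girl, sings, a, song]). answers T = sentence(np(det(the), noun(girl)), vp(verb(sings), np(det(a), noun(song)))). The nesting of this term follows the derivation traced in the search tree.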