Compact non-left-recursive grammars using the selective left-corner transform and factoring

Mark Johnson    Brian Roark
Cognitive and Linguistic Sciences, Box 1978
Brown University
Mark [email protected]    Brian [email protected]

Abstract

The left-corner transform removes left-recursion from probabilistic context-free grammars and unification grammars, permitting simple top-down parsing techniques to be used. Unfortunately the grammars produced by the standard left-corner transform are usually much larger than the original. The selective left-corner transform described in this paper produces a transformed grammar which simulates left-corner recognition of a user-specified set of the original productions, and top-down recognition of the others. Combined with two factorizations, it produces non-left-recursive grammars that are not much larger than the original.

1 Introduction

Top-down parsing techniques are attractive because of their simplicity, and can often achieve good performance in practice (Roark and Johnson, 1999). However, with a left-recursive grammar such parsers typically fail to terminate. The left-corner grammar transform converts a left-recursive grammar into a non-left-recursive one: a top-down parser using a left-corner transformed grammar simulates a left-corner parser using the original grammar (Rosenkrantz and Lewis II, 1970; Aho and Ullman, 1972). However, the left-corner transformed grammar can be significantly larger than the original grammar, causing numerous problems. For example, we show below that a probabilistic context-free grammar (PCFG) estimated from left-corner transformed Penn WSJ tree-bank trees exhibits considerably greater sparse data problems than a PCFG estimated in the usual manner, simply because the left-corner transformed grammar contains approximately 20 times more productions. The transform described in this paper produces a grammar approximately the same size as the input grammar, which is not as adversely affected by sparse data.

* This research was supported by NSF awards 9720368, 9870676 and 9812169. We would like to thank our colleagues in BLLIP (Brown Laboratory for Linguistic Information Processing) and Bob Moore for their helpful comments on this paper.

Left-corner transforms are particularly useful because they can preserve annotations on productions (more on this below) and are therefore applicable to more complex grammar formalisms as well as CFGs, a property which other approaches to left-recursion elimination typically lack. For example, they apply to left-recursive unification-based grammars (Matsumoto et al., 1983; Pereira and Shieber, 1987; Johnson, 1998a). Because the emission probability of a PCFG production can be regarded as an annotation on a CFG production, the left-corner transform can produce a CFG with weighted productions which assigns the same probabilities to strings and transformed trees as the original grammar (Abney et al., 1999). However, the transformed grammars can be much larger than the original, which is unacceptable for many applications involving large grammars.

The selective left-corner transform reduces the transformed grammar size because only those productions which appear in a left-recursive cycle need be recognized left-corner in order to remove left-recursion. A top-down parser using a grammar produced by the selective left-corner transform simulates a generalized left-corner parser (Demers, 1977; Nijholt, 1980), which recognizes a user-specified subset of the original productions in a left-corner fashion, and the other productions top-down.

Although we do not investigate it in this paper, the selective left-corner transform should usually have a smaller search space relative to the standard left-corner transform, all else being equal. The partial parses produced during a top-down parse consist of a single connected tree fragment, while the partial parses produced during a left-corner parse generally consist of several disconnected tree fragments. Since these fragments are only weakly related via the "link" constraint described below, the search for each fragment is relatively independent. This may be responsible for the observation that exhaustive left-corner parsing is less efficient than top-down parsing (Covington, 1994). Informally, because the selective left-corner transform recognizes only a subset of productions in a left-corner fashion, its partial parses contain fewer discontiguous tree fragments and the search may be more efficient.
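The non-termination problem that motivates the transform can be made concrete with a toy recursive-descent recognizer. This is an illustrative sketch of ours, not code from the paper; the grammar, symbol names and the depth cap are invented for the example. Expanding the left-recursive rule NP → NP PP immediately re-predicts NP without consuming any input, so the descent never bottoms out:

```python
# Toy recursive-descent recognizer. With the left-recursive rule
# NP -> NP PP, predicting NP re-predicts NP without consuming input,
# so the recursion never terminates; a depth cap makes this observable.

GRAMMAR = {
    "NP": [["NP", "PP"], ["det", "n"]],   # first alternative is left-recursive
    "PP": [["p", "NP"]],
}

def parse(cat, words, pos, depth=0, limit=50):
    """Return the set of input positions reachable after recognizing `cat`
    starting at `pos`. Symbols not in GRAMMAR are terminals (POS tags)."""
    if depth > limit:
        raise RuntimeError("depth limit exceeded: left recursion")
    if cat not in GRAMMAR:                # terminal: match one input symbol
        return {pos + 1} if pos < len(words) and words[pos] == cat else set()
    results = set()
    for rhs in GRAMMAR[cat]:
        ends = {pos}
        for sym in rhs:                   # thread end positions through the rhs
            ends = {e2 for e in ends
                    for e2 in parse(sym, words, e, depth + 1, limit)}
        results |= ends
    return results

try:
    parse("NP", ["det", "n", "p", "det", "n"], 0)
except RuntimeError as e:
    print(e)                              # the parser loops on NP -> NP PP
```

Applying a left-corner transform to such a grammar turns the left-recursive chain into right recursion, after which the same top-down strategy terminates.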

While this paper focuses on reducing grammar size to minimize sparse data problems in PCFG estimation, the modified left-corner transforms described here are generally applicable wherever the original left-corner transform is. For example, the selective left-corner transform can be used in place of the standard left-corner transform in the construction of finite-state approximations (Johnson, 1998a), often reducing the size of the intermediate automata constructed. The selective left-corner transform can be generalized to head-corner parsing (van Noord, 1997), yielding a selective head-corner parser; this follows from generalizing the selective left-corner transform to Horn clauses.

After this paper was accepted for publication we learnt of Moore (2000), which addresses the issue of grammar size using very similar techniques to those proposed here. The goals of the two papers are slightly different: Moore's approach is designed to reduce the total grammar size (i.e., the sum of the lengths of the productions), while our approach minimizes the number of productions. Moore (2000) does not address left-corner tree transforms, or the questions of sparse data and parsing accuracy that are covered in section 3.

2 The selective left-corner and related transforms

This section introduces the selective left-corner transform and two additional factorization transforms which apply to its output. These transforms are used in the experiments described in the following section. As Moore (2000) observes, in general the transforms produce a non-left-recursive output grammar only if the input grammar G does not contain unary cycles, i.e., there is no nonterminal A such that A ⇒⁺_G A.

2.1 The selective left-corner transform

The selective left-corner transform takes as input a CFG G = (V, T, P, S) and a set of left-corner productions L ⊆ P, which contains no epsilon productions; the non-left-corner productions P − L are called top-down productions. The standard left-corner transform is obtained by setting L to the set of all non-epsilon productions in P. The selective left-corner transform of G with respect to L is the CFG LC_L(G) = (V₁, T, P₁, S), where:

    V₁ = V ∪ {D−X : D ∈ V, X ∈ V ∪ T}

and P₁ contains all instances of the schemata (1). In these schemata, D ∈ V, w ∈ T, and lower-case Greek letters range over (V ∪ T)*. The D−X are new nonterminals; informally they encode a parse state in which a D is predicted top-down and an X has been found left-corner, so D−X ⇒*_{LC_L(G)} β only if D ⇒*_G X β.

    D → w D−w                                    (1a)
    D → α D−A       where A → α ∈ P − L          (1b)
    D−B → β D−C     where C → B β ∈ L            (1c)
    D−D → ε                                      (1d)

The schemata function as follows. The productions introduced by schema (1a) start a left-corner parse of a predicted nonterminal D with its leftmost terminal w, while those introduced by schema (1b) start a left-corner parse of D with a left-corner A, which is itself found by the top-down recognition of a production A → α ∈ P − L. Schema (1c) extends the current left-corner B up to a C with the left-corner recognition of a production C → B β ∈ L. Finally, schema (1d) matches the top-down prediction with the recognized left-corner category.

[Figure 1: Schematic parse trees generated by the original grammar G and the selective left-corner transformed grammar LC_L(G). The shaded local trees in the original parse tree correspond to left-corner productions; the corresponding local trees generated by instances of schema (1c) in the selective left-corner transformed tree are also shown shaded. The local tree colored black is generated by an instance of schema (1b).]

Figure 1 schematically depicts the relationship between a chain of left-corner productions in a parse tree generated by G and the chain of corresponding instances of schema (1c). The left-corner recognition of the chain starts with the recognition of α, the right-hand side of a top-down production A → α, using an instance of schema (1b). The left-branching chain of left-corner productions corresponds to a right-branching chain of instances of schema (1c); the left-corner transform in effect converts left recursion into right recursion. Notice that the top-down predicted category D is passed down this right-recursive chain, effectively multiplying each left-corner production by the possible top-down predicted categories. The right recursion terminates with an instance of schema (1d) when the left-corner and top-down categories match.

[Figure 2: The recognition of a top-down production A → α by LC_L(G) involves a left-corner category A−A, which immediately rewrites to ε. One-step ε-removal applied to LC_L(G) produces a grammar in which each top-down production A → α corresponds to a production A → α in the transformed grammar.]

Figure 2 shows how top-down productions from G are recognized using LC_L(G). When the selective left-corner transform is followed by a one-step ε-removal transform, i.e., composition or partial evaluation of schema (1b) with respect to schema (1d) (Johnson, 1998a; Abney and Johnson, 1991; Resnik, 1992), each top-down production from G appears unchanged in the final grammar. Full ε-removal yields the grammar given by the schemata below.

    D → w D−w
    D → w           where D ⇒⁺_L w
    D → α D−A       where A → α ∈ P − L
    D → α           where D ⇒*_L A, A → α ∈ P − L
    D−B → β D−C     where C → B β ∈ L
    D−B → β         where D ⇒*_L C, C → B β ∈ L

Moore (2000) introduces a version of the left-corner transform called LC_LR, which applies only to productions with left-recursive parent and left-child categories. In the context of the other transforms that Moore introduces, it seems to have the same effect in his system as the selective left-corner transform does here.

2.2 Selective left-corner tree transforms

There is a 1-to-1 correspondence between the parse trees generated by G and LC_L(G). A tree t is generated by G iff there is a corresponding t′ generated by LC_L(G), where each occurrence of a top-down production in the derivation of t corresponds to exactly one local tree generated by an occurrence of the corresponding instance of schema (1b) in the derivation of t′, and each occurrence of a left-corner production in t corresponds to exactly one occurrence of the corresponding instance of schema (1c) in t′. It is straightforward to define a 1-to-1 tree transform T_L mapping parse trees of G into parse trees of LC_L(G) (Johnson, 1998a; Roark and Johnson, 1999). In the empirical evaluation below, we estimate a PCFG from the trees obtained by applying T_L to the trees in the Penn WSJ tree-bank, and compare it to the PCFG estimated from the original tree-bank trees. A stochastic top-down parser using the PCFG estimated from the trees produced by T_L simulates a stochastic generalized left-corner parser, which is a generalization of a standard stochastic left-corner parser that permits productions to be recognized top-down as well as left-corner (Manning and Carpenter, 1997). Thus investigating the properties of PCFGs estimated from trees transformed with T_L is an easy way of studying stochastic push-down automata performing generalized left-corner parses.

2.3 Pruning useless productions

We turn now to the problem of reducing the size of the grammars produced by left-corner transforms. Many of the productions generated by schemata (1) are useless, i.e., they never appear in any terminating derivation. While they can be removed by standard methods for deleting useless productions (Hopcroft and Ullman, 1979), the relationship between the parse trees of G and LC_L(G) depicted in Figure 1 shows how to determine ahead of time the new nonterminals D−X that can appear in useful productions of LC_L(G). This is known as a link constraint.

For PCFGs there is a particularly simple link constraint: D−X appears in useful productions of LC_L(G) only if ∃γ ∈ (V ∪ T)* : D ⇒*_L X γ. If epsilon removal is applied to the resulting grammar, D−X appears in useful productions only if ∃γ ∈ (V ∪ T)⁺ : D ⇒*_L X γ. Thus one need only generate instances of the left-corner schemata which satisfy the corresponding link constraints.

Moore (2000) suggests an additional constraint on the nonterminals D−X that can appear in useful productions of LC_L(G): D must either be the start symbol of G or else appear in a production A → β D γ of G, for some A ∈ V, β ∈ (V ∪ T)⁺ and γ ∈ (V ∪ T)*. It is easy to see that the productions that Moore's constraint prohibits are useless. There is one nonterminal in the tree-bank grammar investigated below that has this property, namely LST. However, in the tree-bank grammar none of the productions expanding LST are left-recursive (in fact, the first child is always a preterminal), so Moore's constraint does not affect the size of the transformed grammars investigated below.

While these constraints can dramatically reduce both the number of productions and the size of the parsing search space of the transformed grammar, in general the transformed grammar LC_L(G) can be quadratically larger than G. There are two causes for this explosion in grammar size. First, LC_L(G) contains an instance of schema (1b) for each top-down production A → α and each D such that ∃γ : D ⇒*_L A γ. Second, LC_L(G) contains an instance of schema (1c) for each left-corner production C → B β and each D such that ∃γ : D ⇒*_L C γ. In effect, LC_L(G) contains one copy of each production for each possible left-corner ancestor. Section 2.5 describes further factorizations of the productions of LC_L(G) which mitigate these causes.

2.4 Optimal choice of L

Because ⇒_L increases monotonically with L, and hence so does the size of LC_L(G), we typically reduce the size of LC_L(G) by making the left-corner production set L as small as possible. This section shows how to find the unique minimal set of left-corner productions L₀ such that LC_{L₀}(G) is not left-recursive.

Assume G = (V, T, P, S) is pruned, i.e., P contains no useless productions, and that there is no A ∈ V such that A ⇒⁺_G A, i.e., G does not generate recursive unary branching chains. For reasons of space we also assume that P contains no ε-productions, but this approach can be extended to deal with them if desired. A production A → B β ∈ P is left-recursive iff ∃γ ∈ (V ∪ T)* : B ⇒*_P A γ, i.e., P rewrites B into a string beginning with A. Let L₀ be the set of left-recursive productions in G. Then we claim (1) that LC_{L₀}(G) is not left-recursive, and (2) that for all L ⊊ L₀, LC_L(G) is left-recursive.

Claim (1) follows from the fact that if A ⇒_{L₀} B γ then A ⇒_P B γ, together with the constraints in section 2.3 on useful productions of LC_L(G). Claim (2) follows from the fact that if L ⊊ L₀ then there is a chain of left-recursive productions that includes a top-down production; a simple induction on the length of the chain shows that LC_L(G) is left-recursive.

This result justifies the common practice in natural-language left-corner parsing of taking the terminals to be the preterminal part-of-speech tags, rather than the lexical items themselves. We did not attempt to calculate the size of such a left-corner grammar in the empirical evaluation below, but it would be much larger than any of the grammars described there. In fact, if the preterminals are distinct from the other nonterminals (as they are in the tree-bank grammars investigated below) then L₀ does not include any productions beginning with a preterminal, and LC_{L₀}(G) contains no instances of schema (1a) at all. We now turn our attention to the other schemata of the selective left-corner grammar transform.

2.5 Factoring the output of LC_L

This section defines two factorizations of the output of the selective left-corner grammar transform that can dramatically reduce its size. These factorizations are most effective if the number of productions is much larger than the number of nonterminals, as is usually the case with tree-bank grammars.

The top-down factorization decomposes schema (1b) by introducing new nonterminals D′, where D ∈ V, that have the same expansions that D does in G. Using the same interpretation for variables as in schemata (1), if G = (V, T, P, S) then LC^td_L(G) = (V_td, T, P_td, S), where:

    V_td = V₁ ∪ {D′ : D ∈ V}

and P_td contains all instances of the schemata (1a), (3a), (3b), (1c) and (1d).

    D → A′ D−A      where A → α ∈ P − L          (3a)
    A′ → α          where A → α ∈ P − L          (3b)

Notice that the number of instances of schema (3a) is less than the square of the number of nonterminals, and that the number of instances of schema (3b) is the number of top-down productions; the sum of these numbers is usually much less than the number of instances of schema (1b).

Top-down factoring plays approximately the same role as "non-left-recursion grouping" (NLRG) does in Moore's (2000) approach. The major difference is that NLRG applies to all productions A → B β in which B is not left-recursive, i.e., ¬∃γ : B ⇒⁺_P B γ, while in our system top-down factorization applies to those productions for which ¬∃γ : B ⇒*_P A γ, i.e., the productions not directly involved in left recursion.

The left-corner factorization decomposes schema (1c) in a similar way, using new nonterminals Ď−X, where D ∈ V and X ∈ V ∪ T. LC^lc_L(G) = (V_lc, T, P_lc, S), where:

    V_lc = V₁ ∪ {Ď−X : D ∈ V, X ∈ V ∪ T}

and P_lc contains all instances of the schemata (1a), (1b), (4a), (4b) and (1d).

    D−B → Č−B D−C   where C → B β ∈ L            (4a)
    Č−B → β         where C → B β ∈ L            (4b)

The number of instances of schema (4a) is bounded by the number of instances of schema (1c) and is typically much smaller, while the number of instances of schema (4b) is precisely the number of left-corner productions |L|.

Left-corner factoring seems to correspond to one step of Moore's (2000) "left factor" (LF) operation. The left factor operation constructs new nonterminals corresponding to common prefixes of arbitrary length, while left-corner factoring effectively only factors the first nonterminal symbol on the right-hand side of left-corner productions. While we have not done experiments, Moore's left factor operation would seem to reduce the total number of symbols in the transformed grammar at the expense of possibly introducing additional productions, while our left-corner factoring reduces the number of productions.

These two factorizations can be used together in the obvious way to define a grammar transform LC^{td,lc}_L, whose productions are defined by schemata (1a), (3a), (3b), (4a), (4b) and (1d). There are corresponding tree transforms, which we refer to as T^td_L, etc., below. Of course, the pruning constraints described in section 2.3 are applicable with these factorizations, and corresponding invertible tree transforms can be constructed.
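The construction of L₀ in section 2.4 amounts to a reflexive-transitive closure over the "leftmost child" relation. The following is a minimal sketch of ours (not code from the paper), assuming a pruned, ε-free grammar whose productions are (lhs, rhs) pairs with rhs a non-empty tuple of symbols:

```python
def minimal_left_corner_set(prods, nonterminals):
    """Compute L0, the set of left-recursive productions of a pruned,
    epsilon-free CFG: A -> B beta is left-recursive iff B =>*_P A gamma."""
    # left[B] = set of symbols X such that B =>*_P X gamma (reflexive
    # closure of the leftmost-child relation, computed by fixpoint iteration)
    left = {a: {a} for a in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            first = rhs[0]
            # lhs => first ..., so everything left-reachable from `first`
            # is also left-reachable from `lhs` (terminals reach themselves)
            new = left.get(first, {first}) - left[lhs]
            if new:
                left[lhs] |= new
                changed = True
    return [(lhs, rhs) for lhs, rhs in prods
            if rhs[0] in left and lhs in left[rhs[0]]]
```

On a toy grammar {S → NP VP, NP → NP PP, NP → det n, PP → p NP, VP → v NP}, only NP → NP PP is left-recursive, and the function returns exactly that production; taking it as the left-corner set L₀ suffices to make the transformed grammar non-left-recursive.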

3 Empirical Results

To examine the effect of the transforms outlined above, we experimented with various PCFGs induced from sections 2-21 of a modified Penn WSJ tree-bank, as described in Johnson (1998b) (i.e., labels simplified to grammatical categories, root nodes added, empty nodes and vacuous unary branches deleted, and auxiliaries retagged as aux or auxg). We ignored lexical items, and treated the part-of-speech tags as terminals. As Bob Moore pointed out to us, the left-corner transform may produce left-recursive grammars if its input grammar contains unary cycles, so we removed them using a transform that Moore suggested. Given an initial set of non-epsilon productions P, the transformed grammar contains the following productions, where the Â are new nonterminals:

    A → β       where A → β ∈ P, A ⇏⁺_P A
    A → D̂       where A ⇒⁺_P D ⇒*_P A
    Â → β       where A → β ∈ P, A ⇒⁺_P A, β ⇏*_P A

This transform can be extended to one on PCFGs which preserves derivation probabilities. In this section, we fix P to be the productions that result from applying this unary-cycle removal transform to the tree-bank productions, and G to be the corresponding grammar.

Tables 1 and 2 give the sizes of the selective left-corner grammar transforms of G for various values of the left-corner set L and factorizations, without and with epsilon removal respectively. In the tables, L₀ is the set of left-recursive productions in P, as defined in section 2.4. N is the set of productions in P whose right-hand sides do not begin with a part-of-speech (POS) tag; because POS tags are distinct from the other nonterminals in the tree-bank, N is an easily identified set of productions guaranteed to include L₀. The tables also give the sizes of maximum-likelihood PCFGs estimated from the trees resulting from applying the selective left-corner tree transforms T_L to the tree-bank, breaking unary cycles as described above. For the parsing experiments below we always deleted empty nodes in the output of these tree transforms; this corresponds to epsilon removal in the grammar transform.

              none      td        lc        td,lc
    G         15,040
    LC_P      346,344             30,716
    LC_N      345,272   113,616   254,067   22,411
    LC_L0     314,555   103,504   232,415   21,364
    T_P       20,087              17,146
    T_N       19,619    16,349    19,002    15,732
    T_L0      18,945    16,126    18,437    15,618

Table 1: Sizes of PCFGs inferred using various grammar and tree transforms, after pruning with link constraints, without epsilon removal. Columns indicate factorization. In the grammar and tree transforms, P is the set of productions in G (i.e., the standard left-corner transform), N is the set of all productions in P which do not begin with a POS tag, and L₀ is the set of left-recursive productions.

              none      td        lc        td,lc
    LC_P      564,430             38,489
    LC_N      563,295   176,644   411,986   25,335
    LC_L0     505,435   157,899   371,102   23,566
    T_P       22,035              17,398
    T_N       21,589    16,688    20,696    15,795
    T_L0      21,061    16,566    20,168    15,673

Table 2: Sizes of PCFGs inferred using various grammar and tree transforms, after pruning with link constraints, with epsilon removal, using the same notation as Table 1.

First, note that LC_P(G), the result of applying the standard left-corner grammar transform to G, has approximately 20 times the number of productions that G has. However LC^{td,lc}_{L₀}(G), the result of applying the selective left-corner grammar transform with factorization, has approximately 1.4 times the number of productions that G has. Thus the methods described in this paper can in fact dramatically reduce the size of left-corner transformed grammars. Second, note that LC^{td,lc}_N(G) is not much larger than LC^{td,lc}_{L₀}(G). This is because N is not much larger than L₀, which in turn is because most pairs of non-POS nonterminals A, B are mutually left-recursive.

Turning now to the PCFGs estimated after applying the tree transforms, we notice that grammar size does not increase nearly so dramatically. These PCFGs encode a maximum-likelihood estimate of the state transition probabilities for various stochastic generalized left-corner parsers, since a top-down parser using these grammars simulates a generalized left-corner parser. The fact that LC_P(G) is 17 times larger than the PCFG inferred after applying T_P to the tree-bank means that most of the possible transitions of a standard stochastic left-corner parser are not observed in the tree-bank training data. The state of a left-corner parser does capture some linguistic generalizations (Manning and Carpenter, 1997; Roark and Johnson, 1999), but one might still expect sparse-data problems. Note that LC^{td,lc}_{L₀}(G) is only 1.4 times larger than T^{td,lc}_{L₀}, so we expect less serious sparse data problems with the factored selective left-corner transform.

We quantify these sparse data problems in two ways using a held-out test corpus, viz., all sentences in section 23 of the tree-bank. First, Table 3 lists the number of sentences in the test corpus that fail to receive a parse with the various PCFGs mentioned above. This is a relatively crude measure, but it correlates roughly with the ratios of grammar sizes, as expected.

    Transform   none   td    lc    td,lc
    none        0
    T_P         2            0
    T_N         2      0     2     0
    T_L0        0      0     0     0

Table 3: The number of sentences in section 23 that do not receive a parse using various grammars estimated from sections 2-21.

Second, Table 4 lists the number of productions found in the tree-transformed test corpus that do not appear in the correspondingly transformed trees of sections 2-21. What is striking here is that the number of missing productions after either of the transforms T^{td,lc}_{L₀} or T^{td,lc}_N is approximately the same as the number of missing productions using the untransformed trees, indicating that the factored selective left-corner transforms cause little or no additional sparse data problems. The relationship between local trees in the parse trees of G and LC_L(G) mentioned earlier implies that left-corner tree transforms will not decrease the number of missing productions.

    Transform   none   td    lc    td,lc
    none        514
    T_P         665          535
    T_N         664    543   639   518
    T_L0        640    547   615   522
    T_P,ε       719          539
    T_N,ε       718    554   685   521
    T_L0,ε      706    561   666   521

Table 4: The number of productions found in the transformed trees of sentences in section 23 that do not appear in the corresponding transformed trees from sections 2-21. The subscript ε indicates that epsilon removal was applied.

We also investigate the accuracy of the maximum-likelihood parses (MLPs) obtained using the PCFGs estimated from the output of the various left-corner tree transforms.¹ We searched for these parses using an exhaustive CKY parser. Because the parse trees of these PCFGs are isomorphic to the derivations of the corresponding stochastic generalized left-corner parsers, we are in fact evaluating different kinds of stochastic generalized left-corner parsers inferred from sections 2-21 of the tree-bank. We used the transform-detransform framework described in Johnson (1998b) to evaluate the parses, i.e., we applied the appropriate inverse tree transform T⁻¹ to detransform the parse trees produced using the PCFG estimated from trees transformed by T. By calculating the labelled precision and recall scores for the detransformed trees in the usual manner, we can systematically compare the parsing accuracy of different kinds of stochastic generalized left-corner parsers.

¹ We did not investigate the grammars produced by the various left-corner grammar transforms. Because a left-corner grammar transform LC_L preserves production probabilities, the highest scoring parses obtained using the weighted CFG LC_L(G) should be the highest scoring parses obtained using G transformed by T_L.

Table 5 presents the results of this comparison. As reported previously, the standard left-corner grammar embeds sufficient non-local information in its productions to significantly improve the labelled precision and recall of its MLPs with respect to the MLPs of the PCFG estimated from the untransformed trees (Manning and Carpenter, 1997; Roark and Johnson, 1999). Parsing accuracy drops off as grammar size decreases, presumably because smaller PCFGs have fewer adjustable parameters with which to describe this non-local information. There are other kinds of non-local information which can be incorporated into a PCFG using a transform-detransform approach, resulting in an even greater improvement in parsing accuracy (Johnson, 1998b). Ultimately, however, it seems that a more complex approach incorporating back-off and smoothing is necessary in order to achieve the parsing accuracy reported by Charniak (1997) and Collins (1997).

    Transform   none          td            lc            td,lc
    none        70.8, 75.3
    T_P         75.8, 77.7                  74.8, 76.9
    T_N         75.8, 77.6    73.8, 75.8    75.5, 77.8    72.8, 75.4
    T_L0        75.8, 77.4    73.0, 74.7    75.6, 77.8    72.9, 75.4

Table 5: Labelled recall and precision scores of PCFGs estimated using various tree transforms in a transform-detransform framework, using test data from section 23.

4 Conclusion

This paper presented factored selective left-corner grammar transforms. These transforms preserve the primary benefits of the left-corner grammar transform, i.e., elimination of left-recursion and preservation of annotations on productions, while dramatically ameliorating its principal problems: grammar size and sparse data problems. This should extend the applicability of left-corner techniques to situations involving large grammars. We showed how to identify the minimal set L₀ of productions of a grammar that must be recognized left-corner in order for the transformed grammar not to be left-recursive. We also proposed two factorizations of the output of the selective left-corner grammar transform which further reduce grammar size, and showed that there is only a minor increase in grammar size when the factored selective left-corner transform is applied to a large tree-bank grammar. Finally, we exploited the tree transforms that correspond to these grammar transforms to formulate and study a class of stochastic generalized left-corner parsers.

This work could be extended in a number of ways. For example, in this paper we assumed that one would always choose a left-corner production set that includes the minimal set L₀ required to ensure that the transformed grammar is not left-recursive. However, Roark and Johnson (1999) report good performance from a stochastically-guided top-down parser, suggesting that left-recursion is not always fatal. It might be possible to judiciously choose a left-corner production set smaller than L₀ which eliminates pernicious left-recursion, so that the remaining left-recursive cycles have such low probability that they will effectively never be used and a stochastically-guided top-down parser will never search them.

References

Stephen Abney and Mark Johnson. 1991. Memory requirements and local ambiguities of parsing strategies. Journal of Psycholinguistic Research, 20(3):233-250.

Steven Abney, David McAllester, and Fernando Pereira. 1999. Relating probabilistic grammars and automata. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 542-549, San Francisco. Morgan Kaufmann.

Alfred V. Aho and Jeffrey D. Ullman. 1972. The Theory of Parsing, Translation and Compiling; Volume 1: Parsing. Prentice-Hall, Englewood Cliffs, New Jersey.

Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, Menlo Park. AAAI Press/MIT Press.

Michael Collins. 1997. Three generative, lexicalised models for statistical parsing. In The Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, San Francisco. Morgan Kaufmann.

Michael A. Covington. 1994. Natural Language Processing for Prolog Programmers. Prentice Hall, Englewood Cliffs, New Jersey.

A. Demers. 1977. Generalized left-corner parsing. In Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, pages 170-182.

John E. Hopcroft and Jeffrey D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison-Wesley.

Mark Johnson. 1998a. Finite state approximation of unification grammars using left-corner grammar transforms. In The Proceedings of the 36th Annual Conference of the Association for Computational Linguistics (COLING-ACL), pages 619-623. Morgan Kaufmann.

Mark Johnson. 1998b. PCFG models of linguistic tree representations. Computational Linguistics, 24(4):613-632.

Christopher D. Manning and Bob Carpenter. 1997. Probabilistic parsing using left-corner models. In Proceedings of the 5th International Workshop on Parsing Technologies, pages 147-158, Massachusetts Institute of Technology.

Yuji Matsumoto, Hozumi Tanaka, Hideki Hirakawa, Hideo Miyoshi, and Hideki Yasukawa. 1983. BUP: A bottom-up parser embedded in Prolog. New Generation Computing, 1(2):145-158.

Robert C. Moore. 2000. Removing left recursion from context-free grammars. In Proceedings of the 1st Annual Conference of the North American Chapter of the Association for Computational Linguistics, San Francisco. Morgan Kaufmann.

Anton Nijholt. 1980. Context-free Grammars: Covers, Normal Forms, and Parsing. Springer Verlag, Berlin.

Fernando C.N. Pereira and Stuart M. Shieber. 1987. Prolog and Natural Language Analysis. Number 10 in CSLI Lecture Notes Series. Chicago University Press, Chicago.

Philip Resnik. 1992. Left-corner parsing and psychological plausibility. In The Proceedings of the fifteenth International Conference on Computational Linguistics, COLING-92, volume 1, pages 191-197.

Brian Roark and Mark Johnson. 1999. Efficient probabilistic top-down and left-corner parsing. In Proceedings of the 37th Annual Meeting of the ACL, pages 421-428.

Daniel J. Rosenkrantz and Philip M. Lewis II. 1970. Deterministic left corner parsing. In IEEE Conference Record of the 11th Annual Symposium on Switching and Automata Theory, pages 139-152.

Gertjan van Noord. 1997. An efficient implementation of the head-corner parser. Computational Linguistics, 23(3):425-456.