
12

Statistical Grammar Models and Lexicon Acquisition

Sabine Schulte im Walde, Helmut Schmid, Mats Rooth, Stefan Riezler, and Detlef Prescher

Introduction

This paper presents a framework for developing and training statistical grammar models for the acquisition of lexical information. Utilising a robust parsing environment and mathematically well-defined unsupervised training methods, the framework enables us to induce lexicon information from text corpora. Particular strengths of the approach are (i) the fact that no extensive manual work is required to set up the framework, and (ii) that the framework is applicable to any desired language. It has already been applied to English (Carroll and Rooth), German (Beil et al., Rooth et al., Schulte im Walde a), Portuguese (de Lima) and Chinese (Hockenmaier).

Manual work within the framework is reduced to a minimum, since the necessary grammars need not go into detailed structures for the relevant grammar aspects to be trained sufficiently. The automatic training process utilises a shallow parser embedded in the mathematically well-defined Expectation-Maximisation algorithm. The training approach forces the lexicalised parameters in the statistical grammar to obtain linguistic reliability; the basic assumption behind this is that the linguistically correct analyses of text correspond to those analyses which maximise the probability of the data.

The linguistic value of the grammar models mainly lies in the lexicalised model parameters: they contain lexicalised rules, i.e. grammar rules referring to a specific lexical head, and lexical choice parameters, a measure of lexical coherence between lexical heads. Concerning verbs, for example, the lexical rule parameters serve as the basis for probability distributions over subcategorisation frames, and the lexical choice parameters supply us with the heads of subcategorised noun phrases as the basis for selectional constraints. The information can be used directly as lexical description, as input for lexicon tools such as semantic clustering techniques (Rooth et al., Schulte im Walde a), or as the basis for a variety of applications, e.g. parser improvement (Riezler et al.), chunking (Schmid and Schulte im Walde), or machine translation (Prescher et al.).

The reader might still wonder about the exact nature of the lexical information we gain. Consider a concrete example: our trained grammar model for German informs us that the verb essen 'eat' most probably occurs transitively, but might as well occur intransitively. In addition, we learn that, for example, the most frequent nominal heads in the direct object slot of the transitive frame are the German equivalents of the nouns bread, meat, banana and ice-cream.

The first part of this chapter concerns grammar development and training. The section on grammar development gives practical insights into the prerequisites for our statistical grammars and describes a characteristic grammar development process by means of the German grammar. In the section on the probability model, the reader will find an introduction to the theoretical background of statistical grammars and their head-lexicalised refinements, as well as a description of their training facilities. The section on statistical grammar training then presents the application of the training procedure to the German grammar.

The second part of this chapter illustrates various possibilities to exploit the lexicalised probability models: one section directly utilises the model parameters to extract lexical parameters, mainly for verbs, and to apply specific parsing facilities such as Viterbi parsing and noun chunking; the final section demonstrates the usage of lexical information, with specific reference to lexical coherence between verbs and subcategorised nouns, as input for semantic clustering techniques.

Grammar Development

Our statistical grammar models can be developed for arbitrary languages, presupposing (i) a corpus as source for empirical input data, (ii) a morphological analyser for analysing the corpus word forms and assigning lemmas where appropriate, and (iii) a context-free grammar (CFG) for parsing the corpus data.

The grammar is supposed to cover a sufficient part of the corpus, since a large amount of structural relations within parses is required in order to develop a statistical grammar model on the basis of the grammar (cf. the following sections). The more corpus data is accessible for grammar training, the more reliable the probability model will be.

As mentioned in the introduction, manual work concerning the grammar is reduced to a minimum: the necessary grammars need not go into detailed structures for the relevant grammar aspects to be trained sufficiently. The complete framework can be set up within a few weeks' time and easily be transferred to a different language. This property advances the grammar framework compared to, e.g., treebank grammars (Charniak), since it does not presuppose a treebank for the relevant language.

So far, we have worked on statistical grammar models for English (Carroll and Rooth), German (an earlier version is described in Beil et al.), Portuguese (de Lima) and Chinese (Hockenmaier). The preparation of the relevant corpus data, the task definition of the morphological analyser, and the context-free grammar are described below. For the purpose of illustrating the grammar development framework, we concentrate on the German model: we specifically describe the grammar development facilities and outline the grammar structure.

Corpus Preparation

We created two sub-corpora from the … million token newspaper corpus Huge German Corpus (HGC): (a) a sub-corpus containing verb-final clauses with a total of … million words, and (b) a sub-corpus containing … million relative clauses with a total of … million words. Apart from non-finite clauses as verbal arguments, there are no further clausal embeddings, and the clauses do not contain any punctuation except for a terminal period. The average clause length is … and … words per clause, respectively.

Morphological Analyser

We utilised a finite-state morphological analyser (Schiller and Stöckert) to assign multiple morphological features, such as part-of-speech tag, case, gender and number, to the corpus words, partly collapsed to reduce the number of analyses. For example, the word Bleibe, either the case-ambiguous feminine singular noun 'residence' or a person- and mood-ambiguous finite singular present tense verb form of 'stay', is analysed as follows:

analyse Bleibe
Bleibe   NN.Fem.Akk.Sg
Bleibe   NN.Fem.Dat.Sg
Bleibe   NN.Fem.Gen.Sg
Bleibe   NN.Fem.Nom.Sg
bleiben  V.1.Sg.Pres.Ind
bleiben  V.1.Sg.Pres.Konj
bleiben  V.3.Sg.Pres.Konj

Reducing the ambiguous categories leaves the two morphological analyses:

Bleibe   NN.Fem.Cas.Sg
bleiben  VVFIN

Apart from assigning morphological analyses, the tool also serves as a lemmatiser (cf. Schulze).

The German Context-Free Grammar

The context-free grammar contains … rules, with their heads marked. With very few exceptions (rules for coordination, the S rules), the rules do not have more than two daughters. The terminal categories in the grammar correspond to the collapsed corpus tags assigned by the morphology.

Grammar development is facilitated by (a) a grammar development environment for the feature-based grammar formalism YAP (Schmid), and (b) a chart browser that permits a quick and efficient discovery of grammar bugs (Carroll). Figure … shows that the ambiguity in the chart is quite considerable, even though grammar and corpus are restricted.

FIGURE: Chart Browser for Grammar Development

The grammar covers … of the verb-final and … of the relative clauses, i.e. the respective parts of the corpora are assigned parses. The following sections describe two essential parts of the grammar: the noun chunks and the definition of subcategorisation frames. For more details concerning the German grammar structure, see Schulte im Walde b.

Noun Chunks

On nominal categories, in addition to the four cases Nom, Gen, Dat and Akk, case features with a disjunctive interpretation, such as Dir for 'Nom or Akk', are used. The grammar is written in such a way that non-disjunctive features are introduced high up in the tree. Figures … to … illustrate the use of disjunctive features in the noun projections for the German noun phrase eine gute Gelegenheit 'a good opportunity' in all four cases: the terminal NN contains the four-way ambiguous Cas case feature; the N-bar (NN) and noun chunk (NC) projections disambiguate to the two-way ambiguous case features Dir and Obl; the weak/strong (Sw/St) feature of NN allows or prevents combination with a determiner, respectively; only at the noun phrase (NP) projection level does the case feature appear in disambiguated form. The use of disjunctive case features results in some reduction in the size of the parse forest. Essentially, the full range of agreement inside the noun phrase is enforced. Agreement between the subject NP and the tensed verb is not enforced by the grammar, in order to control the number of parameters and rules.

The noun chunk definition refers to Abney's chunk grammar organisation (Abney): the noun chunk (NC) is a projection that excludes post-head complements and adverbial adjuncts, introduced higher than pre-head modifiers and determiners, but includes participial pre-modifiers with their complements.

FIGURE: Noun Projection NP with Nominative Case [tree: NP.Nom over NC.Dir for eine gute Gelegenheit; the terminal NN carries the four-way ambiguous Cas feature]

FIGURE: Noun Projection NP with Accusative Case [tree: NP.Akk over NC.Dir for eine gute Gelegenheit]

FIGURE: Noun Projection NP with Dative Case [tree: NP.Dat over NC.Obl for einer anderen Gelegenheit 'another opportunity']

FIGURE: Noun Projection NP with Genitive Case [tree: NP.Gen over NC.Obl for einer anderen Gelegenheit]

Subcategorisation Frames

The grammar distinguishes four subcategorisation frame classes: active (VPA), passive (VPP) and non-finite (VPI) frames, and copula constructions (VPK). A frame may have maximally three arguments. Possible arguments in the frames are nominative (n), dative (d) and accusative (a) NPs, reflexive pronouns (r), PPs (p), and non-finite VPs (i). The grammar does not distinguish plain non-finite VPs from zu-non-finite VPs. The grammar is designed to distinguish between PPs representing a verbal complement and PPs representing an adjunct; only complements are referred to by the frame type. The number and the types of frames in the different frame classes are given in Table ….

Frame Class    Frame Types
VPA            n, na, nd, np, nad, nap, ndp, ni, di, nai, ndi, nr, nar, ndr, npr, nir
VPP            n, nps, d, dps, p, pps, nd, ndps, np, npps, dp, dpps, i, ips, ni, nips, di, dips
VPI            a, d, p, r, ad, ap, dp, pr
VPK            n, i

TABLE: Subcategorisation Frame Types

German, being a language with comparatively free phrase order, allows for scrambling of arguments. Scrambling is reflected in the particular sequence in which the arguments of the verb frame are saturated. Compare Figure … as an example of a canonical subject-object order within an active transitive frame, der sie liebt 'who loves her', and its scrambled object-subject order, den sie liebt 'whom she loves'.

FIGURE: Realising the Scrambling Effect in the Grammar Rules [parse trees saturating the arguments of the VPA.na frame in both orders: der sie liebt and den sie liebt]


Abstracting from the active and passive realisation of an identical underlying deep-level structure, we generalise over the alternation by defining a top-level subcategorisation frame type: e.g., IP.nad covers VPA.nad, VPP.nd and VPP.nd.ps, with ps a prepositional phrase within passive frame types representing the deep-structure subject, realisable only by PPs headed by von or durch 'by'. See Figure … as an example, presenting the relative clauses der die Frau verfolgt 'who follows the woman', die verfolgt wird 'who is followed', and die von dem Mann verfolgt wird 'who is followed by the man'.

FIGURE: Generalising over the Active-Passive Alternation of Subcategorisation Frames [parse trees: IP.na over VPA.na for der die Frau verfolgt, over VPP.n for die verfolgt wird, and over VPP.nps for die von dem Mann verfolgt wird]


Probability Model

The probabilistic grammars are parsed with a head-lexicalised probabilistic context-free parser called LoPar (Schmid). It is an implementation of the left-corner algorithm for parsing and of the Inside-Outside algorithm for parameter estimation. Probabilistic context-free parsing is a well-known technique (Lari and Young). Innovative features of LoPar are head lexicalisation, lemmatisation, parameter pooling, and a sophisticated smoothing technique.

Probabilistic Context-Free Grammars

A probabilistic context-free grammar (PCFG) is a context-free grammar which additionally assigns a probability P(r) to each grammar rule r. The probability of a parse tree is defined as the product of the probabilities of the rules which are used to build the parse tree.
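To make the definition concrete, here is a toy example of our own (the grammar, rules and probabilities are invented for illustration, not taken from the grammars described in this chapter):

# Toy PCFG (illustrative only). A tree is (category, children), where
# children are trees or terminal strings; the tree probability is the
# product of the probabilities of all rules used to build it.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.4,
    ("NP", ("fish",)): 0.6,
    ("VP", ("eats", "NP")): 0.7,
    ("VP", ("sleeps",)): 0.3,
}

def tree_prob(tree):
    cat, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(cat, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)
    return p

t = ("S", [("NP", ["she"]), ("VP", ["eats", ("NP", ["fish"])])])
print(tree_prob(t))  # 1.0 * 0.4 * 0.7 * 0.6 = 0.168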

PCFGs rank the different analyses (parse trees) of a sentence according to their probabilities. However, PCFGs fail to resolve some frequent syntactic ambiguities, like PP attachment ambiguities and coordination ambiguities. For example, in the sentence The COLING conference in August at the University of Saarland in Saarbrücken was well attended, the prepositional phrase in Saarbrücken could syntactically attach to any of the preceding noun phrases. Disambiguation of these ambiguities requires information about the lexical heads of the constituents (see also Hindle and Rooth). Head-lexicalised probabilistic context-free grammars incorporate this type of information.

Head-Lexicalised Probabilistic Context-Free Grammars

Syntactically, a head-lexicalised probabilistic context-free grammar (H-PCFG) (Carroll; Carroll and Rooth) is a probabilistic context-free grammar in which one of the categories on the right-hand side of each grammar rule is marked as the head by an apostrophe, e.g. NP → DT N'. Each constituent bears a lexical head, which is propagated from the head daughter. The lexical head of a terminal node is the respective word form. (Footnote: LoPar is basically a reimplementation of the Galacsy tools, which were developed by Glenn Carroll in the SFB, but LoPar provides additional functionality.)


H-PCFGs assign the following probability to a parse tree T:

\[
\begin{aligned}
P(T) = {} & P_{start}(cat(root(T))) \cdot P_{start}(head(root(T)) \mid cat(root(T))) \\
& \cdot \prod_{\mathrm{nonterm}\ n\ \mathrm{in}\ T} P_{rule}(rule(n) \mid cat(n), head(n)) \\
& \cdot \prod_{\mathrm{nonroot}\ n\ \mathrm{in}\ T} P_{choice}(head(n) \mid cat(n), cat(parent(n)), head(parent(n))) \\
& \cdot \prod_{\mathrm{term}\ n\ \mathrm{in}\ T} P_{rule}(\langle t \rangle \mid cat(n), head(n)) \cdot P_{lex}(word(n) \mid cat(n), head(n))
\end{aligned}
\]

Five families of probability distributions are relevant here: P_start(C) is the probability that C is the category of the root node of a parse tree; P_start(h | C) is the probability that a root node of category C bears the lexical head h; P_rule(r | C, h) is the probability that a node of category C with lexical head h is expanded by rule r; P_choice(h | C, C_p, h_p) is the probability that a non-head node of category C has the lexical head h, given that the parent category is C_p and the parent head is h_p; P_rule(⟨t⟩ | C, h) is the probability that a node of category C with lexical head h is a terminal node; and P_lex(w | C, h), finally, is the probability that a terminal node with category C and lexical head h expands to the word form w. If the lexical head of a terminal node is the word form itself, rather than e.g. its stem, then P_lex(w | C, h) is 1 if w and h are identical and 0 otherwise.
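The following sketch (our own schematic code, not LoPar's implementation; the five distributions are assumed to be supplied as plain functions in a dict d) shows how these factors combine into the tree probability. Head daughters inherit the parent's head, so their lexical choice factor is deterministic; the sketch simplifies the non-root product to the non-head daughters.

# Schematic H-PCFG tree probability (illustrative sketch). A node is a
# dict with keys cat, head, and either children or word.
def hpcfg_prob(node, d, parent=None):
    if parent is None:  # root factors
        p = d["start_cat"](node["cat"]) * d["start_head"](node["head"], node["cat"])
    elif node["head"] != parent["head"]:  # non-head daughter (simplification)
        p = d["choice"](node["head"], node["cat"], parent["cat"], parent["head"])
    else:  # head daughter: head is propagated deterministically
        p = 1.0
    if "word" in node:  # terminal: termination and word form factors
        p *= d["rule"]("<t>", node["cat"], node["head"])
        p *= d["lex"](node["word"], node["cat"], node["head"])
    else:  # non-terminal: rule factor, then recursion into the daughters
        rule = (node["cat"], tuple(c["cat"] for c in node["children"]))
        p *= d["rule"](rule, node["cat"], node["head"])
        for child in node["children"]:
            p *= hpcfg_prob(child, d, parent=node)
    return p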

Lemmatisation

The major problem in training H-PCFGs is the large number of parameters which have to be estimated from a limited amount of training data. The number of parameters is reduced if stems are used as lexical heads rather than inflected word forms, increasing the reliability of the parameter estimates. This is in particular true for languages with a rich morphology, like German.

If the lexical heads are stems, the word form probability distribution P_lex(w | C, h) is not trivial anymore, because several word forms can have the same stem and part of speech (just assume that all numbers have the same stem). The P_lex parameters therefore have to be estimated from training data, like the other parameters.

(Footnote: The auxiliary functions cat, head, parent, word and rule return the syntactic category, the lexical head, the parent node, the dominated word, and the expanding grammar rule of a node; root returns the root node of a parse tree, and ⟨t⟩ is a constant.)


Parameter Estimation

The parameters of lexicalised as well as unlexicalised probabilistic context-free grammars are iteratively estimated with the Inside-Outside algorithm (Lari and Young), which is an instance of the Expectation-Maximisation (EM) algorithm (Baum). Each iteration of the Inside-Outside algorithm consists of two steps, namely frequency estimation and parameter estimation.

Lexicalised probability models are estimated with a bootstrapping approach: first, an unlexicalised PCFG is trained, starting with a randomly initialised model. The unlexicalised PCFG is then used to estimate initial values for the lexicalised probability model. The lexicalised model is retrained until it does not improve anymore.

Parameter Smoothing

The number of parameters of PCFGs and H-PCFGs is usually so large that some of the corresponding events do not occur in the training data. Their estimated frequency is therefore zero; the same holds for the probabilities if relative frequency estimates are used. In order to avoid that all analyses with unobserved events are assigned zero probabilities, the probability distributions are smoothed. A variant of the absolute discounting method (Ney et al.) is used for this purpose.

The basic idea of absolute discounting is to subtract a small amount (the discount) from all frequency counts and to redistribute the sum of these discounts over the events with zero frequency, according to some backoff distribution. This is done recursively. The absolute discounting method had to be adapted in order to be applicable to the real-valued frequency counts generated by LoPar.
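A minimal sketch of the idea (our own simplified, non-recursive variant with a fixed discount; LoPar's adaptation to real-valued counts differs in detail):

# Absolute discounting sketch: subtract a discount d from every seen
# count and redistribute the freed mass over unseen events according
# to a backoff distribution.
def discounted_probs(counts, events, backoff, d=0.5):
    total = sum(counts.values())
    seen = set(counts)
    unseen = [e for e in events if e not in seen]
    freed = d * len(seen) / total            # probability mass freed
    z = sum(backoff[e] for e in unseen) or 1.0
    probs = {e: (counts[e] - d) / total for e in seen}
    probs.update({e: freed * backoff[e] / z for e in unseen})
    return probs

counts = {"na": 6.0, "n": 3.0, "nd": 1.0}        # observed frame counts
events = ["na", "n", "nd", "nad", "np"]          # full event space
uniform = {e: 1.0 / len(events) for e in events} # backoff distribution
print(discounted_probs(counts, events, uniform)) # sums to 1.0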

Parameter Pooling

It has already been discussed how lemmatisation is used to reduce the number of parameters of an H-PCFG. Another way to achieve a reduction is parameter pooling. Parameter pooling applies to the lexical choice probabilities. It is based on the observation that the probability of the lexical head of the daughter node is usually similar for different inflectional variants of the lexical head of the mother node. Consider the following grammar rule, which adjoins an adverb to a verb phrase:

VP.fin.past → VP.fin.past' ADV

The lexical choice probability P_choice(heavily | ADV, VP.fin.past, rain) is unlikely to differ much from the probability P_choice(heavily | ADV, VP.fin.pres, rain) or P_choice(heavily | ADV, VP.inf, rain), etc. Therefore it is possible to pool the corresponding distributions into one distribution P_choice(adv | ADV, VP.fin.past|VP.fin.pres|…, verb) in order to get more reliable estimates.

Similarly, it is possible to pool the daughter categories. By pooling mother and daughter categories in the case of the rules

NBAR.nom.sg → ADJ.nom.sg NBAR.nom.sg'
NBAR.nom.pl → ADJ.nom.pl NBAR.nom.pl'
NBAR.gen.sg → ADJ.gen.sg NBAR.gen.sg'
NBAR.acc.pl → ADJ.acc.pl NBAR.acc.pl'

we obtain a single probability distribution for the adjectival modifiers of the German noun Buch 'book'. If the phrase das alte Buch 'the old book' (nominative case) is observed in the training data, the probability of the phrase den alten Büchern 'the old books' (dative case) will also be high.
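As a small sketch of pooling (toy counts and category names of our own): lexical choice counts conditioned on inflectional variants of the mother category are merged into one pooled table before normalisation.

# Pooling lexical choice counts over inflectional variants.
from collections import defaultdict

def pool_choice_counts(counts, pool_of):
    """counts: {(head, daughter_cat, mother_cat, mother_head): freq}."""
    pooled = defaultdict(float)
    for (h, dcat, mcat, mhead), f in counts.items():
        pooled[(h, dcat, pool_of(mcat), mhead)] += f
    return dict(pooled)

counts = {
    ("heavily", "ADV", "VP.fin.past", "rain"): 2.0,
    ("heavily", "ADV", "VP.fin.pres", "rain"): 1.0,
    ("heavily", "ADV", "VP.inf", "rain"): 0.5,
}
strip_features = lambda cat: cat.split(".")[0]
print(pool_choice_counts(counts, strip_features))
# {('heavily', 'ADV', 'VP', 'rain'): 3.5}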

Statistical Grammar Training

What is the linguistically optimal strategy for training a head-lexicalised probabilistic context-free grammar, i.e. for estimating the model parameters in the optimal way? The EM algorithm guarantees improving an underlying model towards a local maximum of the likelihood of the training corpus, but is that adequate for improving the linguistic representation within the probabilistic model? Various training strategies have been developed in the past years, with preliminary results reported in Beil et al.

Elaborating the optimal training strategy results from the interaction between the linguistic and the mathematical motivation and properties of the probability model:

- Mathematical motivation: perplexity of the model. The perplexity Perp_M(C) of a corpus C with respect to a language model M is a measure of fit for the model. The perplexity is defined as

\[ \mathrm{Perp}_M(C) = e^{-\frac{\log P_M(C)}{N}} \]

where P_M(C) is the likelihood of corpus C according to model M, and N is the size of the corpus. Intuitively, the perplexity measures the uncertainty about the next word in a corpus: if the perplexity is k, then the uncertainty is as high as it is when we have to choose from k alternatives of equal probability. The perplexity on the training and test data should decrease during training. At one point the perplexity on the test data will increase again, which is referred to as overtraining. The optimal point of time to stop the training is at the minimum of perplexity, before the increase. (A small worked example follows after this list.)

- Linguistic motivation: representation of linguistic features. The linguistic parameters can be controlled by investigating rule and lexical choice parameters, e.g.: what is the probability distribution over subcategorisation frames for the verb achten (ambiguous between 'to respect' and 'to pay attention'), and does it correlate with existing lexical information? In addition, the models were inspected by controlling the parsing performance on specified grammatical structures, i.e. noun chunks and verb phrases were assigned labels which form the basis for evaluating parses.
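As a small worked example (the numbers are our own, purely for illustration): if a model assigns every one of the N words of a corpus the probability 1/50, then

\[
\mathrm{Perp}_M(C) = e^{-\frac{\log P_M(C)}{N}} = e^{-\frac{N \log (1/50)}{N}} = e^{\log 50} = 50,
\]

matching the intuition of choosing among 50 equally probable alternatives at each word.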

The next section describes the optimal training strategy found so far. The resulting model is then evaluated, and its linguistic performance is described in more detail, i.e. the strengths and weaknesses of the model are investigated.

Training Strategy

For training the model parameters, we used … of the corpora, i.e. … of the verb-final and … of the relative clauses, a total of … million clauses. Every …th sentence was cut out of the corpora to generate a test corpus. The training was performed in the following steps:

1. Initialisation: The grammar was initialised with identical frequencies for all context-free grammar rules. Comparative initialisations with random frequencies had no effect on the model development.

2. Unlexicalised training: The training corpus was parsed once with LoPar, re-estimating the frequencies twice. The optimal training strategy proceeds with few parameter re-estimations; without re-estimations, or with a large number of re-estimations, the model was affected to its disadvantage. With less unlexicalised training, more changes take place later on, during lexicalised training.

3. Lexicalisation: The unlexicalised model was turned into a lexicalised model by setting the probabilities of the lexicalised rules to the values of the respective unlexicalised probabilities, and initialising the lexical choice and lexicalised start probabilities uniformly (a toy sketch of this step follows the list).

4. Lexicalised training: Three training iterations were performed on the training corpus, re-estimating the frequencies after each iteration. Comparative numbers of iterations (up to … iterations) showed that more iterations of lexicalised training did not have any further effect on the model.
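The following toy sketch illustrates the lexicalisation step (categories, heads and numbers are our own, not values from the actual German model): lexicalised rule probabilities are copied from the unlexicalised ones, and lexical choice probabilities are initialised uniformly.

# Lexicalisation sketch: P_rule(r | C, h) starts out independent of the
# head h; P_choice is initialised uniformly over the candidate heads.
unlex_rule_prob = {
    ("VPA.na", ("NP.Akk", "VPA.na'")): 0.3,
    ("VPA.na", ("VPA.na'", "NP.Akk")): 0.7,
}
verbs = ["essen", "lieben"]
nouns = ["Brot", "Frau"]

lex_rule_prob = {(rule, head): p
                 for rule, p in unlex_rule_prob.items()
                 for head in verbs}
lex_choice_prob = {(n, "NP.Akk", "VPA.na", v): 1.0 / len(nouns)
                   for n in nouns for v in verbs}

print(lex_rule_prob[(("VPA.na", ("NP.Akk", "VPA.na'")), "essen")])  # 0.3
print(lex_choice_prob[("Brot", "NP.Akk", "VPA.na", "essen")])       # 0.5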

To achieve a reduction of parameters and improve the lexical choice model, we utilised the pooling option as described above: all active, passive and non-finite verb frames were pooled according to shared arguments, disregarding the saturation state of the frames, in order to generalise over their arguments without taking their positional facilities into account. In addition, each of the categories describing noun phrases, noun chunks, the noun bar level and proper names was pooled, disregarding the features for gender, case and number, thus allowing generalisation over open-class categories like adjectives, which combine with nouns regardless of these features.

Probability Model Evaluation

As mentioned above, the main background for the development of the training strategy was the perplexity of the model, as the measure of mathematical evaluation, on the one hand, and the parsing accuracy on grammatical structures, as the measure of linguistic evaluation, on the other hand. Figure … displays the development of the perplexity on the training data and Figure … the development of the perplexity on the test data, both referring to the experiment described above and illustrating lexicalised training up to its fifth iteration. As the figures show, both the perplexity on the training data and the perplexity on the test data decrease monotonously during training, which means that, according to perplexity, the model improves steadily and has not yet reached the status of overtraining.

FIGURE: Perplexity on Training Data [plot: perplexity (scale 0 to 1000) decreasing over the training stages untrained, unlex, lex0, …, lex5]

FIGURE: Perplexity on Test Data [plot: perplexity (scale 0 to 1000) decreasing over the training stages untrained, unlex, lex0, …, lex5]

The linguistic parameters of the models were evaluated with respect to the identification of noun chunks and subcategorisation frames. We randomly extracted … relative clauses and … verb-final clauses from the test data and hand-annotated the relative clauses with noun chunk labels and all of the clauses with frame labels. In addition, we extracted … randomly chosen relative clauses for each of the six verbs beteiligen 'participate', erhalten 'receive', folgen 'follow', verbieten 'forbid', versprechen 'promise' and versuchen 'try', and hand-annotated them with their subcategorisation frames. Probability models were evaluated by making the models determine the Viterbi parses, i.e. the most probable parses, of the test data, extracting the categories of interest, i.e. noun chunks and subcategorisation frame types, and comparing them with the annotated data. The noun chunks were evaluated according to (a) the range of the noun chunks (did the model find a chunk at all?), and (b) the range and the identifier of the noun chunks (did the model find a noun chunk and identify the correct syntactic category and case?); the subcategorisation frames were evaluated according to the frame label, i.e. did the model determine the correct subcategorisation frame for a clause? Precision was measured in the following way:

\[ \mathrm{precision} = \frac{tp}{tp + fp} \]

with tp counting the cases where the identified chunk/label is correct, and fp counting the cases where the identified chunk/label is not correct.

Figures … and … present the strongly different development of noun chunk and subcategorisation frame representations within the models, ranging from the untrained model to the fifth iteration of lexicalised training. Noun chunks were modelled sufficiently by an unlexicalised trained grammar; lexicalisation made the modelling worse. Verb phrases in general needed a combination of unlexicalised and lexicalised training, but the representation strongly depended on the specific item. Unlexicalised training advanced frequent phenomena (compare, for example, the representation of the transitive frame with direct object for erfahren and with indirect object for folgen); lexicalisation and lexicalised training improved the lexicalised properties of the verbs, as expected.

It is obvious that perplexity can hardly measure the linguistic performance of the training strategy and the resulting models: the perplexity on training as well as on test data is a monotonously decreasing curve, but, as explained above, the linguistic model performance develops differently according to different phenomena. So perplexity can only serve as a rough indicator of whether the model approaches an optimum; linguistic evaluation determines the optimum.

FIGURE: Development of Precision and Recall Values on Noun Chunk Range and Label [plot: precision and recall (scale 0.75 to 1) for noun chunk range and range+label over the training stages untrained, unlex, lex0, …, lex5]

The precision values of the best model, according to the training strategy described above, are given in Table ….

TABLE: Precision Values on Noun Chunks and Subcategorisation Frames
Noun Chunks: range; range+label
Subcategorisation Frames on Sub-Corpora: relative clauses; verb-final clauses
Subcategorisation Frames on Specific Verbs: beteiligen 'participate'; erhalten 'receive'; folgen 'follow'; verbieten 'forbid'; versprechen 'promise'; versuchen 'try'

For comparison, we also evaluated the subcategorisation frames of … relative clauses extracted from the training data. Interestingly, there were no striking differences in the precision values.

Without the pooling option, the precision values for low-frequency phenomena such as non-finite frame recognition were worse; e.g., the precision for the verb versuchen was lower than with pooling.

FIGURE: Development of Precision Values on Subcategorisation Frames for Specific Verbs [plot: frame-label precision (scale 0 to 1) for beteiligen, erhalten, folgen, verbieten, versprechen and versuchen over the training stages untrained, unlex, lex0, …, lex5]

Investigating the Strengths and Weaknesses of the Model

Which linguistic aspects could be learned by the probability model, i.e. what are the strengths and what are the weaknesses of the model? Noun chunks, subcategorisation frames and prepositional frames have been investigated.

Concerning the noun chunks, a remarkable number was identified correctly, both with respect to their structure (i.e. what is a noun chunk?) and with respect to their category (i.e. which case is assigned to the noun chunk?). Before training, a large number of noun chunks was assigned the wrong case, but after training the mistakes were mostly corrected, except for a few noun chunks being assigned the accusative case instead of nominative or dative.

For subcategorisation frames, the distribution and confusion of the multiple frames is manifold. Some interesting feature developments are cited below:

- Highly common subcategorisation types, such as the transitive frame, are learned in unlexicalised training and then slightly unlearned in lexicalised training. Less common subcategorisation types, such as the demand for an indirect object, are unlearned in unlexicalised training, but improved during lexicalised training.

- It is difficult, and was not effectively learned, to distinguish between prepositional phrases as verbal complements and as adjuncts.

- Active present perfect verb complexes and the passive of condition were confused, because both are composed of a past participle and a form of 'to be', e.g. geschwommen ist 'has swum' vs. gebunden ist 'is bound'.

- Copula constructions and the passive of condition were confused, again because both may be composed of a past participle and a form of 'to be', e.g. verboten ist 'is forbidden' vs. erfahren ist 'is experienced'.

- Noun chunks belonging to a subcategorised non-finite clause were partly parsed as arguments of the main verb. For example, der ihn (accusative) zu überreden versucht 'who tried to persuade him' was parsed as demanding an accusative plus a non-finite clause, instead of recognising that the accusative object is subcategorised by the embedded infinitival verb.

- Reflexive pronouns appeared in the subcategorisation frame either as the reflexive pronoun itself or as an accusative or dative noun chunk. The correct or wrong choice of the frame type containing the reflexive pronoun was learned consistently right or wrong for different verbs. For example, the verb sich befinden 'to be situated' was generally parsed as a transitive, not as an inherently reflexive verb.

This feature confusion forms the background for the identification of the frame types of the specifically chosen verbs:

- The verb beteiligen was mostly parsed as a transitive verb. Two sources of mistakes were combined here: (i) the verb was assigned a transitive instead of an inherently reflexive frame, and (ii) the obligatory prepositional phrase was consequently parsed as an adjunct instead of an argument. All feature tendencies were already determined by unlexicalised training and were not corrected in lexicalised training.

- The transitive frame of erhalten was recognised well; not many mistakes were made, except for the PP assignment.

- As a consequence of unlexicalised training, the verb folgen was partly parsed as transitive, but lexicalised training corrected that tendency.

- The main problem for the verb verbieten was being assigned a copula construction instead of a passive of condition.

- For the verb versprechen, the main mistake was using the dominance of the bitransitive frame also for parsing the transitive reflexive verb sich versprechen.

- The main mistake for versuchen was parsing a direct object instead of recognising the object's correlation with the embedded infinitival verb.

We conclude the linguistic feature description by presenting probability distributions of selected verbs over subcategorisation frames in Table …, as extracted by querying tools from the model parameters. (Footnote: Examples are only given where the frame usage is possible; otherwise an explanation for a wrong frame indication is given.)

TABLE: Probability Distributions over Subcategorisation Frames (frames per verb listed in decreasing probability; the probability values are elided)

erlauben 'allow':
  IP.na   weil meine Eltern vieles erlaubt haben
          'because my parents allowed a lot'
  IP.nad  weil sie mir vieles erlaubt haben
          'because they allowed me a lot'

achten 'respect / pay attention':
  IP.np   weil das Kind auf die Ampel achten sollte
          'because the child should pay attention to the traffic lights'
  IP.na   da wir die Bemühungen achten
          'since we respect the effort'
  IP.n    (intransitive use not possible)

basieren 'be based':
  IP.np   da die Ausnahme auf der Regel basiert
          'since the exception is based on the rule'

erfahren 'find out':
  IP.na   weil er die Neuigkeit erfahren hat
          'because he found out the news'
  IP.np   weil er von den Änderungen erfahren will
          'since he wants to find out about the changes'
  IP.n    (intransitive use not possible)
  IP.nap  (PP cannot be an argument)

folgen 'follow':
  IP.nd   weil er ihr folgen wollte
          'because he wanted to follow her'
  IP.n    weil wichtige Entscheidungen folgen
          'because important decisions follow'

beginnen 'begin':
  IP.np   da wir mit der Schule beginnen möchten
          'since we want to start with school'
  IP.n    da die Vorlesung beginnt
          'since the seminar starts'
  IP.na   weil wir das Spiel bereits begonnen haben
          'because we started the game already'

scheinen 'seem / shine':
  IP.ni   weil die Regelung zu funktionieren scheint
          'because the regulation seems to work'
  IP.n    weil die Sonne heute scheint
          'because the sun is shining today'
  IP.nai  (the accusative should be parsed as direct object of the embedded infinitival verb)

erweisen 'prove / pay (respect)':
  IP.nr   (PP as argument needed)
  IP.npr  weil sie sich als eine gute Fee erwiesen hat
          'because she proved to be a fairy'
  IP.nad  weil er ihr die Ehre erweist
          'because he paid her respect'

enden 'end':
  IP.np   weil die Stunde um … Uhr endet
          'because the lesson ends at … am'
  IP.n    weil auch die besten Zeiten enden werden
          'because even the best times will end'

beteiligen 'participate':
  IP.npr  weil wir uns an dem Kauf beteiligen
          'since we participate in the purchase'
  IP.np   (confusion of copula construction and passive of condition)
  IP.nr   (PP as argument needed)

Exploiting the Lexicalised Probabilistic Grammar Model

Having trained the statistical grammar models, we are equipped with valuable lexical information. But how do we detect it? What are the possibilities to determine relevant lexical information and apply it to interesting tasks? The following sections refer to the potential of the grammar models: we first present a collection of lexicalised probabilities for verbs; we then apply Viterbi parsing, on the basis of the lexical probabilities, to an example sentence; next, an empirical database of subcategorisation frames is extracted from Viterbi parses; finally, we explain how to base a chunker on the trained grammar.

Lexicalised Probabilities

The model parameters can be queried by tools. First, we queried for the subcategorisation frames of specific verbs. This kind of parameter belongs to the lexicalised rules: it specifies the probability of the sentence category generating the category IP.frame, depending on a verb. Below you find the relevant IP frames per verb, listed in decreasing probability (for display reasons with a cut-off probability of …):

glauben 'believe':                 IP.n, IP.na, IP.np
geben 'give':                      IP.na, IP.nap, IP.nad
folgen 'follow':                   IP.nd, IP.n
enden 'end':                       IP.np, IP.n
achten 'respect / pay attention':  IP.np, IP.na, IP.n
beteiligen 'participate':          IP.npr, IP.np, IP.nr
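Such queries amount to normalising the lexicalised rule frequencies of a verb into a distribution over frames; a small sketch (toy frequencies of our own, not the actual model values):

# Turning lexicalised frame frequencies into a per-verb distribution.
freq = {("glauben", "IP.n"): 60.0,
        ("glauben", "IP.na"): 25.0,
        ("glauben", "IP.np"): 15.0}

def frame_distribution(verb, freq, cutoff=0.0):
    total = sum(f for (v, _), f in freq.items() if v == verb)
    dist = {fr: f / total for (v, fr), f in freq.items() if v == verb}
    return {fr: p for fr, p in dist.items() if p >= cutoff}

print(frame_distribution("glauben", freq, cutoff=0.1))
# {'IP.n': 0.6, 'IP.na': 0.25, 'IP.np': 0.15}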

Secondly, we queried for the probabilities of subcategorised prepositional phrases in verb phrases containing a prepositional phrase as one argument. These probabilities also represent a kind of lexicalised rule parameter: the probability of a certain PP, e.g. a PP with dative case headed by the preposition mit, representing the subcategorised PP in the subcategorisation frame, e.g. the frame np.

sprechen 'talk', VP = VPA.np:    PP.Dat:von 'about', PP.Akk:für 'for', PP.Dat:mit 'with'
enden 'end', VP = VPA.np:        PP.Dat:mit 'with', PP.Dat:in 'in', PP.Dat:an 'at'
eignen 'qualify', VP = VPA.npr:  PP.Akk:für 'for', PP.Dat:zu 'to'

In the final example, we filtered frequency distributions over nominal heads in subcategorised noun phrases. This kind of parameter belongs to the lexical choice parameters: it specifies the probability of a certain lemma, e.g. the noun Kind 'child', as head of a subcategorised noun phrase, e.g. an NP with accusative case.

entstammen 'descend from', VP = VPA.nd, NP.Dat:
  Familie 'family', Jahrhundert 'century', Welt 'world', Disziplin 'discipline',
  Drogenhandel 'drug trafficking', Elternhaus 'parental home', Zeit 'time'

drohen 'threaten', VP = VPA.nd, NP.Nom:
  Gefahr 'danger', Abschiebung 'deportation', Verfolgung 'prosecution',
  Todesstrafe 'death penalty', Tod 'death', Arbeitslosigkeit 'unemployment',
  Ausweisung 'expulsion', Entlassung 'dismissal', Kündigung 'termination'

erziehen 'educate', VP = VPA.na, NP.Akk:
  Kind 'child', Junge 'boy', Sohn 'son', Tochter 'daughter'

Viterbi Parses

With LoPar it is possible to parse a corpus unambiguously by selecting the respective analysis with the highest probability, the so-called Viterbi parse. Viterbi parses are printed in a list notation; graphical tools allow a parse tree representation. For example, the Viterbi parse of the relative clause die vielen Menschen das Leben retten könnte 'which could save many people's lives' is represented by the parse tree in Figure ….

FIGURE: Viterbi Parse [parse tree of die vielen Menschen das Leben retten könnte]

The parser correctly chose the ditransitive subcategorisation frame nad for the verb retten 'save' and provided the relevant NPs with the correct case: die as a nominative relative pronoun, vielen Menschen as an NP with dative case, and das Leben as an NP with accusative case. Viterbi parsing is used to build large parsed corpora, called treebanks, or as an intermediate step in larger NLP systems, e.g. for machine translation, question answering, or query analysis.

Empirical Subcategorisation Frame Database

The previous section introduced Viterbi parses as a method for determining the most probable parse of a sentence. We collected the parses to build an empirical database, an input to complex NLP systems. The database has actually been used for semantic clustering (cf. Rooth et al., Schulte im Walde a) and for experiments on verb biases concerning lexical syntactic preferences (Lapata et al., to appear).

For example, the following lines represent some example subcategorisation frame tokens for English, extracted from the Viterbi parses of the respective sentences in the British National Corpus (BNC). Each line represents one subcategorisation frame; the verb as well as the arguments are defined by a tuple describing the syntactic category and its features: each syntactic category is accompanied by the lexical head, the prepositional phrase by the lexical head plus the head noun of


the subordinated noun phrase, and the verb by its mode. The frames start with the description of the verb, followed by all arguments in the order they appeared in the parses. To give an example, the frame token

act(excelled) subj(nobody) obj(him) pp(in:judgement)

describes the sentence Nobody excelled him in that judgement.

pas(described) obj(realism) pp(by:pn_fischer)
act(proved) subj(distinction) ap(difficult)
act(took) subj(this) obj(forms)
act(argued) subj(he) pp(against:type)
act(intend) subj(museum) to_act(sponsor)
pas(limited) obj(writing) pp(by:demands)
act(has) subj(critic) obj(advantage)
act(serve) subj(comparison) obj(us) pp(as:example)
act(seem) subj(they) to_act(proceed)
act(demands) subj(pn_michelangelo) obj(preference)

A more detailed description of the frame tokens can be found in Schulte im Walde.

A comparable database was created for German. The following are examples, each starting with a verb-final clause, followed by all arguments and the verb frame.

S: dass in diesem Jahr der grosse Coup gelingen würde
   'that the big coup would succeed this year'
   NP.Nom: Coup
   IP.n: gelingen

S: weil die Stadtväter Schmiergelder für die Einrichtung eines modernen Müllplatzes einsteckten
   'because the city management accepted bribe money for the establishment of a modern dump'
   NP.Nom: Stadtväter
   NP.Akk: Schmiergelder
   IP.na: einsteckten

S: dass diese Kunst menschlichen Bedürfnissen entspricht
   'that this art corresponds to human needs'
   NP.Nom: Kunst
   NP.Dat: Bedürfnissen
   IP.nd: entspricht

Chunking

A chunker is a tool which marks all, possibly recursive, chunks in a sentence. Arbitrary syntactic categories can be defined as relevant chunks. Whereas the context-free grammars under development often cope with restricted parts of the respective language (cf. the German grammar described above), we developed a language-independent method which allows the grammars to be extended with robustness rules, in order to extract various kinds of chunks from unrestricted text.

The best chunk sequence of a sentence is defined as the sequence of chunks (with category, start and end position) for which the sum of the probabilities of all parses which contain exactly that chunk sequence is maximal. The algorithm sums probabilities up to the level of the chunks, like the Inside algorithm, and computes the maximum above the level of the chunks, like the Viterbi algorithm. To be more specific, we compute for each node n in the parse forest:

- the maximum of the probabilities of all analyses of n containing chunks, and
- the sum of the probabilities of all analyses of n containing no chunks.
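A faithful, if inefficient, way to state this computation is to propagate, for every forest node, a table from chunk sequences to summed probabilities: inside a chunk all analyses are summed (Inside-style), and at the root the maximal entry is selected (Viterbi-style). The following sketch uses an illustrative forest representation of our own, not LoPar's; the actual implementation computes the maximum above and the sum below the chunk level in a single pass.

# Best chunk sequence over a shared parse forest (illustrative).
# A node is {'is_chunk': bool, 'label': ..., 'analyses': [(p, children)]}.
from math import prod

def inside(node, memo):
    if id(node) not in memo:
        memo[id(node)] = sum(p * prod(inside(c, memo) for c in ch)
                             for p, ch in node["analyses"])
    return memo[id(node)]

def seq_table(node, memo_in, memo_seq):
    if id(node) in memo_seq:
        return memo_seq[id(node)]
    if node.get("is_chunk"):
        # Below a chunk everything is summed; the node contributes itself.
        tbl = {(node["label"],): inside(node, memo_in)}
    else:
        tbl = {}
        for p, children in node["analyses"]:
            partial = {(): p}
            for c in children:  # combine the daughters left to right
                nxt = {}
                for s1, p1 in partial.items():
                    for s2, p2 in seq_table(c, memo_in, memo_seq).items():
                        nxt[s1 + s2] = nxt.get(s1 + s2, 0.0) + p1 * p2
                partial = nxt
            for s, q in partial.items():  # same sequence: sum analyses
                tbl[s] = tbl.get(s, 0.0) + q
    memo_seq[id(node)] = tbl
    return tbl

def best_chunk_sequence(root):
    tbl = seq_table(root, {}, {})
    return max(tbl.items(), key=lambda kv: kv[1])

nc = {"is_chunk": True, "label": ("NC", 0, 2),
      "analyses": [(0.3, []), (0.1, [])]}  # two analyses, summed
vp = {"is_chunk": False, "analyses": [(0.5, [])]}
s = {"is_chunk": False, "analyses": [(1.0, [nc, vp])]}
print(best_chunk_sequence(s))  # ((('NC', 0, 2),), 0.2)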

We have concentrated the chunking on nouns (cf. Schmid and Schulte im Walde), since many low-level NLP systems use them, e.g. as index terms in information retrieval or as candidates for term extraction.

The German base grammar, currently covering verb-final and relative clauses, has automatically been extended by robustness rules. All rules have been trained on unlabelled data by the probabilistic context-free parser. For extracting noun chunks, the parser generates all possible noun chunk analyses, scores them, and chooses the most probable chunk sequences according to the above algorithm. LoPar is able to generate chunked output in which either minimal (i.e. non-recursive) chunks or maximal chunks are marked with surrounding brackets.

The following example presents a German sentence, followed by the noun chunks extracted. The noun chunks are marked by case.

S: Damit sei freilich noch keine Garantie gegeben, schreiben beide Politiker weiter, dass die Verhandlungen tatsächlich während des Gipfeltreffens in Amsterdam zu einem guten Ende gelangten.
   'There is still no warranty, the politicians continued, that the negotiations at the summit meeting in Amsterdam conclude with a good solution.'

NC.Nom: keine Garantie
NC.Nom: beide Politiker
NC.Nom: die Verhandlungen
NC.Gen: des Gipfeltreffens
NC.Dat: Amsterdam
NC.Dat: einem guten Ende

Lexical Semantic Clusters

This section presents a method for the automatic induction of semantically annotated subcategorisation frames from unannotated corpora. We use the statistical parser for inducing subcategorisation frames for verbs, as described above, which estimates probability distributions and corpus frequencies for pairs of a verbal head and a subcategorisation frame. Since the statistical parser can also collect frequencies for the nominal fillers of slots in a subcategorisation frame, the induction of labels for slots in a frame is based upon the estimation of a probability distribution over tuples consisting of a class label, a selecting head, a grammatical relation, and a filler head. The class label is treated as hidden data in the EM framework for statistical estimation. For further information on the theory and applications of our clustering model, see Rooth et al.

EM-Based Clustering

Basic Idea

In our clustering approach, classes are derived directly from distributional data: a sample of pairs of verbs and nouns, gathered by parsing an unannotated corpus and extracting the fillers of grammatical relations. Semantic classes corresponding to such pairs are viewed as hidden variables, or unobserved data, in the context of maximum likelihood estimation from incomplete data via the EM algorithm. This approach allows us to work in a mathematically well-defined framework of statistical inference, i.e. standard monotonicity and convergence results for the EM algorithm extend to our method.

The basic ideas of our EM-based clustering approach were presented in Rooth (see also Rooth). An important property of our clustering approach is the fact that it is a 'soft' clustering method, defining class membership as a conditional probability distribution over verbs and nouns. In contrast, in 'hard' Boolean clustering methods, such as that of Brown et al., every word belongs to exactly one class, which, because of homophony, is unrealistic. The foundation of our clustering model upon a probability model furthermore contrasts with the merely heuristic and empirical justification of similarity-based approaches to clustering (Dagan et al.). The probability model we use can be found earlier in Pereira et al. However, in contrast to that approach, our statistical inference method for clustering is formalised clearly as an EM algorithm. Approaches to probabilistic clustering similar to ours were presented recently in Saul and Pereira and in Hofmann and Puzicha. There, too, EM algorithms for similar probability models have been derived, but applied only to simpler tasks, not involving a combination of EM-based clustering models as in our lexicon induction experiment.

General Theory

We seek to derive a joint distribution of verb-noun pairs from a large sample of pairs of verbs v ∈ V and nouns n ∈ N. The key idea is to view v and n as conditioned on a hidden class c ∈ C, where the classes are given no prior interpretation. The semantically smoothed probability of a pair (v, n) is defined to be

\[ p(v, n) = \sum_{c \in C} p(c, v, n) = \sum_{c \in C} p(c)\, p(v \mid c)\, p(n \mid c). \]

The joint distribution p(c, v, n) is defined by p(c, v, n) = p(c) p(v|c) p(n|c). Note that, by construction, conditioning of v and n on each other is solely made through the classes c.

In the framework of the EM algorithm (Dempster et al.; McLachlan and Krishnan), we can formalise clustering as an estimation problem for a latent class (LC) model as follows. We are given:

- a sample space Y of observed, incomplete data, corresponding to pairs from V × N,
- a sample space X of unobserved, complete data, corresponding to triples from C × V × N,
- a set X(y) = {x ∈ X | x = (c, y), c ∈ C} of complete data related to the observation y,
- a complete-data specification p_θ(x), corresponding to the joint probability p(c, v, n) over C × V × N, with parameter vector θ = ⟨θ_c, θ_{vc}, θ_{nc} | c ∈ C, v ∈ V, n ∈ N⟩, and
- an incomplete-data specification p_θ(y), which is related to the complete-data specification as the marginal probability p_θ(y) = Σ_{x ∈ X(y)} p_θ(x).

The EM algorithm is directed at finding a value θ̂ of θ that maximises the incomplete-data log-likelihood function L as a function of θ for a given sample Y, i.e.

\[ \hat{\theta} = \arg\max_{\theta} L(\theta), \quad \text{where } L(\theta) = \ln \prod_{y \in Y} p_{\theta}(y). \]

As prescribed by the EM algorithm, the parameters of L(θ) are estimated indirectly by proceeding iteratively in terms of complete-data estimation for the auxiliary function Q(θ; θ^(t)), which is the conditional expectation of the complete-data log-likelihood ln p_θ(x), given the observed data y and the current fit θ^(t) of the parameter values (E-step). This auxiliary function is iteratively maximised as a function of θ (M-step), where each iteration is defined by the map

\[ \theta^{(t+1)} = M(\theta^{(t)}) = \arg\max_{\theta} Q(\theta; \theta^{(t)}). \]

Note that our application is an instance of the EM algorithm for context-free models (Baum et al.; Baker), from which the following particularly simple re-estimation formulae can be derived. Let x = (c, y) for fixed c and y, and let f(y) be the frequency of y in the training sample. Then

\[
\begin{aligned}
p(v \mid c) &\leftarrow \frac{\sum_{y \in \{v\} \times N} f(y)\, p(x \mid y)}{\sum_{y} f(y)\, p(x \mid y)}, \\
p(n \mid c) &\leftarrow \frac{\sum_{y \in V \times \{n\}} f(y)\, p(x \mid y)}{\sum_{y} f(y)\, p(x \mid y)}, \\
p(c) &\leftarrow \frac{\sum_{y} f(y)\, p(x \mid y)}{|Y|}.
\end{aligned}
\]

Intuitively, the conditional expectation of the number of times a particular v, n or c choice is made during the derivation is prorated by the conditionally expected total number of times a choice of the same kind is made. As shown by Baum et al., every such maximisation step increases the log-likelihood function L, and a sequence of re-estimates eventually converges to a local maximum of L.
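A compact sketch of these re-estimation formulae (our own toy implementation; class number, initialisation and data are purely illustrative):

# EM for the latent class model p(c, v, n) = p(c) p(v|c) p(n|c).
import random
from collections import defaultdict

def em_cluster(pairs, n_classes, iters=50, seed=0):
    rng = random.Random(seed)
    verbs = sorted({v for v, _ in pairs})
    nouns = sorted({n for _, n in pairs})
    C = range(n_classes)
    norm = lambda d: {k: x / sum(d.values()) for k, x in d.items()}
    pc = norm({c: rng.random() for c in C})
    pvc = {c: norm({v: rng.random() for v in verbs}) for c in C}
    pnc = {c: norm({n: rng.random() for n in nouns}) for c in C}
    freq = defaultdict(int)
    for y in pairs:
        freq[y] += 1
    for _ in range(iters):
        cc = defaultdict(float); cv = defaultdict(float); cn = defaultdict(float)
        for (v, n), f in freq.items():   # E-step: p(c|y), prorated by f(y)
            post = {c: pc[c] * pvc[c][v] * pnc[c][n] for c in C}
            z = sum(post.values())
            for c in C:
                e = f * post[c] / z
                cc[c] += e; cv[(v, c)] += e; cn[(n, c)] += e
        total = sum(cc.values())          # M-step: relative frequencies
        pc = {c: cc[c] / total for c in C}
        pvc = {c: {v: cv[(v, c)] / cc[c] for v in verbs} for c in C}
        pnc = {c: {n: cn[(n, c)] / cc[c] for n in nouns} for c in C}
    return pc, pvc, pnc

pairs = [("rise", "price"), ("rise", "temperature"), ("fall", "price"),
         ("smile", "man"), ("smile", "woman"), ("laugh", "man")]
pc, pvc, pnc = em_cluster(pairs, n_classes=2)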


Clustering Examples

In the following, we present some examples of induced clusters. In one experiment, the input to the clustering algorithm was a training corpus of … tokens (… types) of English verb-noun pairs participating in the grammatical relations of intransitive and transitive verbs and their subject and object fillers. The data were gathered from the maximal-probability parses which the head-lexicalised probabilistic context-free grammar of Carroll and Rooth gave for the British National Corpus (… million words).

Figure … shows an induced semantic class out of a model with … classes. At the top are listed the most probable nouns in the p(n|c) distribution and their probabilities, and at the left are the most probable verbs in the p(v|c) distribution, where c is the class index. Those verb-noun pairs which were seen in the training data appear with a dot in the class matrix. Verbs with the suffix as.s indicate the subject slot of an active intransitive; similarly, aso.s denotes the subject slot of an active transitive, and aso.o denotes the object slot of an active transitive. Thus v in the above discussion actually consists of a combination of a verb with a subcategorisation frame slot (as.s, aso.s or aso.o).

Induced classes often have a basis in lexical semantics: class … can be interpreted as clustering agents, denoted by proper names, man and woman, together with verbs denoting communicative action. Figure … shows a cluster involving verbs of scalar change and things which can move along scales, and Figure … can be interpreted as involving different dispositions and modes of their execution.

In another experiment, we extracted … tokens (… types) of pairs of German verbs or adjectives and grammatically related nouns from maximal-probability parses; the parsed corpus was the verb-final sub-corpus from the HGC described above. The underlying lexicalised statistical model for German was described above as well. Figure … and Figure … show two classes out of a model with … classes. On the left and at the top are listed the most probable verb/adjective predicates and the nouns appearing as fillers of the verb/adjective slots, ordered according to their probability given the class. Verbal predicates are annotated with subcategorisation slots: e.g., liegen:A.np.n denotes the nominative noun-phrase filler (n) of the subject slot of an active (A) verb liegen 'lie' subcategorising for a nominative (n) and a prepositional (p) phrase; tragen:A.na.a is the accusative noun-phrase filler (a) of the object slot of the transitive verb tragen 'carry'; steigen:A.n.n denotes the nominative filler (n) of the subject slot of the intransitive verb steigen 'rise'. Clearly, due to the smaller size of the German input data compared to the English data, the German classes are less dense than their English counterparts.

FIGURE: English Class … 'communicative action' [class matrix: the most probable nouns include proper names such as helen and ruth, together with man, woman, doctor, girl, people, voice; the most probable verbs include ask:as.s, nod:as.s, think:as.s, shake:aso.s, smile:as.s, laugh:as.s, reply:as.s, shrug:as.s, wonder:as.s, sigh:as.s, watch:aso.s, tell:aso.s, look:as.s, hear:aso.s, grin:as.s, answer:as.s, explain:as.s, frown:as.s, hesitate:as.s, agree:as.s, cry:as.s; dots mark verb-noun pairs seen in the training data]

Figure … shows a cluster involving scalar motion verbs and things which can move along scales. Figure … shows a class which can be interpreted as 'governmental/public authority', involving nouns such as police force and public prosecutor's office.

FIGURE: English Class … 'scalar change' [class matrix: the most probable nouns include rate, price, cost, level, amount, sale, value, interest, demand, chance, standard, share, risk, profit, pressure, income, performance, benefit, size, population, proportion, temperature, tax, fee, quality, money; the most probable verbs include increase:as.s, increase:aso.o, fall:as.s, pay:aso.o, reduce:aso.o, rise:as.s, exceed:aso.o, affect:aso.o, grow:as.s, reach:aso.s, decline:as.s, lose:aso.o, improve:aso.o, cut:aso.o, vary:as.s, raise:aso.o, receive:aso.o; dots mark verb-noun pairs seen in the training data]

FIGURE: English Class … 'dispositions' [class matrix: the most probable nouns include reaction, kind, difference, loss, amount, change, effect, result, degree, response, approach, reduction, condition, understanding, improvement, factor, level, use, increase, development, growth, skill, action, process, activity, knowledge, treatment, type; the most probable verbs include require:aso.o, show:aso.o, need:aso.o, involve:aso.o, produce:aso.o, occur:as.s, cause:aso.s, cause:aso.o, mean:aso.o, suggest:aso.o, demand:aso.o, reflect:aso.o, undergo:aso.o, allow:aso.o, support:aso.o, imply:aso.o, achieve:aso.o, describe:aso.o; dots mark verb-noun pairs seen in the training data]


(Figure not reproduced: probabilistic class-membership matrix. The class pairs verbs such as liegen, tragen, steigen, sinken, ansteigen, übersteigen, einsetzen, anerkennen, entdecken, zunehmen, betrachten, senken, sagen, erklären, formulieren, bestätigen, wirken, ändern and sitzen, and adjectives such as gering, positiv, spät, regional, erfolgreich and böse, each with its frame signature, with nouns such as Last, Ergebnis, Preis, Menge, Zahl, Gewinn, Kritik, Ertrag, Figur, Verlust, Solidarität, InflationsRate, ArbeitsLosigkeit, NachFrage, BürgerMeister, Angst, Umsatz, Einnahme, Zins and Import.)

FIGURE German Class 'scalar change'


(Figure not reproduced: probabilistic class-membership matrix. The class pairs verbs such as mitteilen, berichten, sagen, erklären, fordern, machen, verzichten, melden, spielen, meinen, aufnehmen, wünschen, vorstellen, haben, betonen, übernehmen, bestätigen, erzählen and eintreffen, and adjectives such as hessisch, unmittelbar, Berliner, westdeutsch, statistisch, zuständig, sächsisch and örtlich, each with its frame signature, with nouns such as Polizei, Nation, SPD, USA, Koalition, Konzern, Verein, Blatt, Magistrat, Sender, OberBürgerMeister, BundesBank, BürgerMeister, Nato, Zeitung, LandesRegierung, Behörde, Bonn, UNO, BundesAmt and Sprecher.)

FIGURE German Class 'governmental/public authority'


Evaluation of Clustering Models

Pseudo-Disambiguation

We evaluated our clustering models on a pseudo-disambiguation task similar to that performed in Pereira et al., but differing in detail. The task is to judge which of two verbs v and v′ is more likely to take a given noun n as its argument, where the pair (v, n) has been cut out of the original corpus and the pair (v′, n) is constructed by pairing n with a randomly chosen verb v′ such that the combination (v′, n) is completely unseen. Thus this test evaluates how well the models generalise over unseen verbs.

The data for this test were built as follows. We constructed an evaluation corpus of (v, n, v′) triples from a test corpus of (v, n) pair types which were randomly cut out of the original corpus, leaving the remaining tokens as training corpus. Each noun n in the test corpus was combined with a verb v′ which was randomly chosen according to its frequency, such that the pair (v′, n) appeared neither in the training nor in the test corpus; however, the elements v, v′ and n were all required to be part of the training corpus. Furthermore, we restricted the verbs and nouns in the evaluation corpus to those which occurred with some verb functor v in the training corpus with a frequency between a fixed lower and upper bound. The resulting evaluation triples were used to evaluate a sequence of clustering models trained from the training corpus.
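As a concrete illustration of this construction (not part of the original experiments), the following Python sketch builds evaluation triples from a corpus given as a list of (verb, noun) tokens; the function name and the frequency-threshold parameters are illustrative assumptions:

```python
import random
from collections import Counter

def build_eval_triples(pairs, n_test, min_freq, max_freq, seed=0):
    """Cut n_test (v, n) pair types out of the corpus and pair each test
    noun with a frequency-weighted random verb v2 such that (v2, n) is
    completely unseen in both training and test data."""
    rng = random.Random(seed)
    test_types = set(rng.sample(sorted(set(pairs)), n_test))
    train = [p for p in pairs if p not in test_types]

    verb_freq = Counter(v for v, _ in train)
    noun_freq = Counter(n for _, n in train)
    verbs, weights = zip(*verb_freq.items())
    seen = set(train) | test_types

    triples = []
    for v, n in test_types:
        # v and n must occur in the training data, within frequency bounds
        if not (min_freq <= verb_freq[v] <= max_freq
                and min_freq <= noun_freq[n] <= max_freq):
            continue
        while True:  # rejection sampling; assumes the pair space is sparse
            v2 = rng.choices(verbs, weights)[0]
            if (v2, n) not in seen:
                break
        triples.append((v, n, v2))
    return triples, train
```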

The clustering models we evaluated were parameterised in the starting values of the training algorithm, in the number of classes of the model, and in the number of iteration steps, resulting in a sequence of models. Starting from a lower bound of 50% for randomly initialised models, accuracy was calculated as the number of times the model decided $p(n \mid v) \ge p(n \mid v')$, out of all choices made. The upper panel of the evaluation figure below shows the results for models trained with a fixed number of iterations, averaged over starting values and plotted against class cardinality. Different starting values had only a small effect on the performance in this test. We obtained an accuracy of about 80% for mid-sized models; models with a larger number of classes show a small but stable overfitting effect.
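The decision rule itself is simple once the model supplies $p(n \mid v)$. A sketch, assuming the pair model factorises as $p(v, n) = \sum_c p(c)\, p(v \mid c)\, p(n \mid c)$, with the parameter tables as illustrative dictionaries:

```python
def p_noun_given_verb(n, v, p_c, p_v_given_c, p_n_given_c):
    """p(n | v) = p(v, n) / p(v) under p(v, n) = sum_c p(c) p(v|c) p(n|c)."""
    joint = sum(p_c[c] * p_v_given_c[c].get(v, 0.0) * p_n_given_c[c].get(n, 0.0)
                for c in p_c)
    marginal = sum(p_c[c] * p_v_given_c[c].get(v, 0.0) for c in p_c)
    return joint / marginal if marginal > 0.0 else 0.0

def accuracy(triples, p_c, p_v_given_c, p_n_given_c):
    """Fraction of triples where the model decides p(n|v) >= p(n|v2)."""
    hits = sum(1 for v, n, v2 in triples
               if p_noun_given_verb(n, v, p_c, p_v_given_c, p_n_given_c)
               >= p_noun_given_verb(n, v2, p_c, p_v_given_c, p_n_given_c))
    return hits / len(triples)
```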

The German models were evaluated in a similar way. An evaluation corpus of (v, n, v′) triples was extracted from the original corpus of verb–adjective–noun tokens, leaving the remaining tokens for training a sequence of clustering models. Again the models were parameterised in starting values, number of classes and iteration steps, resulting in a sequence of models. The lower panel of the evaluation figure below shows the results


for models trained with a fixed number of iterations, averaged over starting values and plotted against class cardinality. We obtained an accuracy of over 70% for models up to a moderate number of classes; different starting values again had only a small effect on the evaluation results. For models with more classes, again a small overfitting effect can be seen.

Smoothing Power

A second experiment addressed the smoothing power of the model, by counting the number of (v, n) pairs in the set V × N of all possible combinations of verbs and nouns which received a positive joint probability by the model. The V × N space for the above clustering models included many millions of (v, n) combinations; we approximated the smoothing size of a model by randomly sampling pairs from V × N and returning the percentage of positively assigned pairs in the random sample. The upper panel of the smoothing figure below plots the smoothing results for the above models against the number of classes. Starting values had only a small influence on performance. Given the proportion of the number of types in the training corpus to the size of the V × N space, the smoothing power without clustering is well below one percent, whereas the clustering models evaluated here reach smoothing powers between roughly 75% and 100%.

Corresponding to the maximum-likelihood paradigm, the number of training iterations had a decreasing effect on the smoothing performance, whereas the accuracy of the pseudo-disambiguation was increasing in the number of iterations. We found a moderate number of iterations to be a good compromise in this trade-off.
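The sampling approximation of the smoothing power takes only a few lines; the sample size below is an arbitrary illustrative choice, since the original value is not preserved here:

```python
import random

def smoothing_power(verbs, nouns, p_joint, sample_size=10000, seed=0):
    """Estimate the percentage of the full V x N space that receives a
    positive joint probability, by uniform random sampling of pairs."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(sample_size)
               if p_joint(rng.choice(verbs), rng.choice(nouns)) > 0.0)
    return 100.0 * hits / sample_size

# Baseline without clustering: only observed pair types get probability
# mass, so the smoothing power is
# 100 * number_of_types / (len(verbs) * len(nouns)).
```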

For the German models we observed an even smaller baseline smoothing power, namely the relation of the number of types in the German training corpus to the billions of combinations in the V × N space for the German experiments. Although this baseline is many times smaller than the baseline for the English models, we obtain a substantial smoothing power for the models which were best in terms of the pseudo-disambiguation task; this is shown in the lower panel of the smoothing figure below. The best compromise in terms of iterations was again a moderate number of iterations for the German experiments.


(Two accuracy plots, not reproduced: both panels are titled 'generalization power over ambiguous verbs' and plot pseudo-disambiguation accuracy against the number of classes; the upper panel covers the English models on an accuracy axis from 50 to 85 for up to 300 classes, the lower panel the German models on an accuracy axis from 0.5 to 0.8 for up to 100 classes.)

FIGURE Evaluation of English/German Models on the Pseudo-Disambiguation Task


(Two smoothing plots, not reproduced: both panels are titled 'smoothing power' and plot the percentage of positively assigned sampled pairs against the number of classes; the upper panel covers the English models on an axis from 75 to 100 percent for up to 300 classes, the lower panel the German models on an axis from 10 to 70 percent for up to 100 classes.)

FIGURE Evaluation of English/German Models on the Smoothing Task


Lexicon Induction based on Latent Classes

The goal of the following experiment was to derive a lexicon of several hundred intransitive and transitive verbs with subcategorisation slots labelled with latent classes.

Probabilistic Labelling with Latent Classes using EM Estimation

To induce latent classes for the subject slot of a fixed intransitive verb, the following statistical inference step was performed. Given a latent class model $p_{LC}$ for verb–noun pairs and a sample $n_1, \ldots, n_M$ of subjects for a fixed intransitive verb, we calculate the probability of an arbitrary subject $n \in N$ by

$$p(n) = \sum_{c \in C} p(c, n) = \sum_{c \in C} p(c)\, p_{LC}(n \mid c).$$

The estimation of the parameter vector $\theta = \langle \theta_c \mid c \in C \rangle$, with $\theta_c = p(c)$, can be formalised in the EM framework by viewing $p(n)$ or $p(c, n)$ as a function of $\theta$ for fixed $p_{LC}$. The re-estimation formulae resulting from the incomplete-data estimation for these probability functions have the following form ($f(n)$ is the frequency of $n$ in the sample of subjects of the fixed verb, so that $\sum_{n \in N} f(n) = M$):

$$\theta_c = \frac{\sum_{n \in N} f(n)\, p(c \mid n)}{\sum_{n \in N} f(n)}.$$
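This re-estimation step is straightforward to implement. A minimal sketch in Python, assuming the fixed class-conditional probabilities $p_{LC}(n \mid c)$ are available as a lookup table (all names illustrative):

```python
from collections import Counter

def reestimate_subject_classes(subjects, p_lc_n_given_c, classes,
                               iterations=50):
    # EM re-estimation of theta_c = p(c) for one fixed intransitive verb.
    # subjects:       the sample n_1, ..., n_M of subject head nouns
    # p_lc_n_given_c: fixed LC parameters, (noun, class) -> p_LC(n | c)
    # classes:        the latent classes C
    f = Counter(subjects)                              # f(n)
    m = sum(f.values())                                # M = sum_n f(n)
    theta = {c: 1.0 / len(classes) for c in classes}   # uniform start
    for _ in range(iterations):
        expected = dict.fromkeys(classes, 0.0)
        for n, freq in f.items():
            # E-step: p(c | n) proportional to theta_c * p_LC(n | c)
            joint = {c: theta[c] * p_lc_n_given_c.get((n, c), 0.0)
                     for c in classes}
            z = sum(joint.values())
            if z == 0.0:
                continue          # noun unseen under all classes
            for c in classes:
                expected[c] += freq * joint[c] / z
        # M-step: theta_c = sum_n f(n) p(c|n) / sum_n f(n)
        theta = {c: expected[c] / m for c in classes}
    return theta
```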

A similar EM induction process can also be applied to pairs of nouns, thus enabling induction of latent semantic annotations for transitive verb frames. Given an LC model $p_{LC}$ for verb–noun pairs and a sample $(n_{11}, n_{21}), \ldots, (n_{1M}, n_{2M})$ of noun argument pairs ($n_1$: subjects, $n_2$: direct objects) for a fixed transitive verb, we calculate the probability of its noun argument pairs by

$$p(n_1, n_2) = \sum_{c_1, c_2 \in C} p(c_1, c_2, n_1, n_2) = \sum_{c_1, c_2 \in C} p(c_1, c_2)\, p_{LC}(n_1 \mid c_1)\, p_{LC}(n_2 \mid c_2).$$

Again, estimation of the parameter vector $\theta = \langle \theta_{c_1 c_2} \mid c_1, c_2 \in C \rangle$, with $\theta_{c_1 c_2} = p(c_1, c_2)$, can be formalised in an EM framework by viewing $p(n_1, n_2)$ or $p(c_1, c_2, n_1, n_2)$ as a function of $\theta$ for fixed $p_{LC}$. The re-estimation formulae resulting from this incomplete-data estimation problem have the following simple form ($f(n_1, n_2)$ is the frequency of $(n_1, n_2)$ in the sample of noun argument pairs of the fixed verb):

$$\theta_{c_1 c_2} = \frac{\sum_{(n_1, n_2) \in N} f(n_1, n_2)\, p(c_1, c_2 \mid n_1, n_2)}{\sum_{(n_1, n_2) \in N} f(n_1, n_2)}.$$


Note that the class distributions $p(c)$ and $p(c_1, c_2)$ for intransitive and transitive models can also be computed for verbs unseen in the LC model.
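The transitive case is the analogous computation over class pairs. A corresponding sketch, under the same assumptions as above:

```python
from collections import Counter
from itertools import product

def reestimate_argument_pair_classes(pairs, p_lc_n_given_c, classes,
                                     iterations=50):
    """EM re-estimation of theta_{c1 c2} = p(c1, c2) for one fixed
    transitive verb, from its (subject, object) head noun pairs."""
    f = Counter(pairs)                                 # f(n1, n2)
    m = sum(f.values())                                # M
    class_pairs = list(product(classes, repeat=2))
    theta = {cc: 1.0 / len(class_pairs) for cc in class_pairs}
    for _ in range(iterations):
        expected = dict.fromkeys(class_pairs, 0.0)
        for (n1, n2), freq in f.items():
            # E-step: p(c1, c2 | n1, n2) proportional to
            # theta_{c1 c2} * p_LC(n1 | c1) * p_LC(n2 | c2)
            joint = {(c1, c2): theta[(c1, c2)]
                     * p_lc_n_given_c.get((n1, c1), 0.0)
                     * p_lc_n_given_c.get((n2, c2), 0.0)
                     for c1, c2 in class_pairs}
            z = sum(joint.values())
            if z == 0.0:
                continue
            for cc in class_pairs:
                expected[cc] += freq * joint[cc] / z
        # M-step: theta_{c1 c2} = sum f(n1,n2) p(c1,c2|n1,n2) / M
        theta = {cc: expected[cc] / m for cc in class_pairs}
    return theta
```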

Lexicon Induction Experiment

In a first experiment with English data, we used one of the clustering models described above. From maximal-probability parses for the British National Corpus derived with the statistical parser of Carroll and Rooth, we extracted frequency tables for intransitive verb–subject pairs and transitive verb–subject–object triples. The most frequent verbs were selected for slot labelling. The figure below shows two verbs v for which the most probable class label is a class which we earlier described as 'communicative action', together with the estimated frequencies $f(n)\, p(c \mid n)$ for those ten nouns n for which this estimated frequency is highest.

blush: constance, christina, willie, ronni, claudia, gabriel, maggie, bathsheba, sarah, girl

snarl: mandeville, jinkwa, man, scott, omalley, shamlou, angalo, corbett, southgate, ace

FIGURE Lexicon Entries blush, snarl (nouns ranked by estimated frequency; values not reproduced)
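Entries like these can be read off an estimated model by choosing the most probable class for a slot and ranking the nouns by $f(n)\, p(c \mid n)$. A sketch, again under the assumptions above:

```python
from collections import Counter

def lexicon_entry(subjects, theta, p_lc_n_given_c, top_k=10):
    """Return the most probable latent class for a verb's slot and the
    top_k nouns ranked by estimated frequency f(n) * p(c | n)."""
    f = Counter(subjects)
    best_c = max(theta, key=theta.get)

    def posterior(n, c):
        z = sum(theta[c2] * p_lc_n_given_c.get((n, c2), 0.0) for c2 in theta)
        return theta[c] * p_lc_n_given_c.get((n, c), 0.0) / z if z > 0.0 else 0.0

    ranked = sorted(f, key=lambda n: f[n] * posterior(n, best_c), reverse=True)
    return best_c, ranked[:top_k]
```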

The next figure shows corresponding data for an intransitive scalar motion sense of increase.

increase: number, demand, pressure, temperature, cost, proportion, size, rate, level, price

FIGURE Lexicon Entry increase (nouns ranked by estimated frequency; values not reproduced)

The figure below shows the intransitive verbs which take class 17 as the most probable label. Intuitively, the verbs are semantically coherent. When compared to Levin's top-level verb classes, we found an agreement of our classification with her class of 'verbs of changes of state', except for the last three verbs in the list, which is sorted by probability of the class label.

decrease, drop, double, grow, increase, vary, decline, improve, rise, climb, soar, flow, fall, cut, slow, mount, diminish

FIGURE Scalar Motion Verbs

The figure below shows the most probable pair of classes for increase as a transitive verb, together with estimated frequencies for the head filler pairs. Note that the object label, 17, is the class found with the intransitive scalar motion verbs; this correspondence is exploited in the next section.

increase (8; 17): development–pressure, fat–risk, communication–awareness, supplementation–concentration, increase–number

FIGURE Transitive increase with Estimated Frequencies for Filler Pairs (values not reproduced)

Further experiments were done with two German models with differing numbers of classes. The data for these experiments were extracted from the maximal-probability parses of the verb-final German subcorpus from the HGC, parsed with the lexicalised probabilistic grammar; both are described in earlier sections. The figure below shows the subjects of the transitive verb bekanntgeben (make public). The nouns are classified with high probability into the class which was described above as governmental/public authority; the numbers in the original figure give the estimated frequencies of the subject fillers.

The next figure shows the subjects of the intransitive verb steigen (rise), which belong with high probability to the class which was interpreted above as a class of gradation/scalar change.



bekanntgeben (make public): Sprecher (spokesman), Polizei (police), BundesAmt (Federal Agency), BürgerMeister (mayor), VorstandsChef (chairman of the board), GeschäftsLeitung (management), Vorstand (board of management), Unternehmen (company), WetterAmt (meteorological office), VolksBank (cooperative bank)

FIGURE Lexicon Entry bekanntgeben (make public), subject slot (estimated frequencies not reproduced)

steigen (rise): Zahl (number), Preis (price), ArbeitsLosigkeit (unemployment), Lohn (wage), NachFrage (demand), Zins (interest), Auflage (print run), Beitrag (contribution), Produktion (output), GrundstücksPreis (price of a piece of land)

FIGURE Intransitive Lexicon Entry steigen (rise) (estimated frequencies not reproduced)

Similar to the English experiments, we observe semantic uniformity in the verbs of scalar change. The figure below shows intransitive verbs which take the scalar-change class of one German model (corresponding to the respective class of the other model) as the most probable class to label their respective subject slots. On the basis of the most probable class labels, these verbs can be summarised as scalar motion verbs. When compared to the linguistic classification of verbs given by Schuhmacher, we found an agreement of our classification with the class of 'einfache Änderungsverben' (simple verbs of change), except for the verbs anwachsen (increase) and stagnieren (stagnate), which were not classified there at all.

An example of the two most probable subject–object class pairs of a transitive verb, senken (lower), is shown below. Class 14 has been introduced before as governmental/public authority, and class 26 as gradation/scalar change.

A final figure shows the transitive verb dauern (last) selecting the class pair (0, 10) with high probability as semantic label for its subject and object slots. Class 0 can be interpreted as a project/action class, and class 10 as a class of time.


ansteigen (go up), steigen (rise), absinken (sink), sinken (go down), schrumpfen (shrink), zurückgehen (decrease), anwachsen (increase), stagnieren (stagnate), wachsen (grow), hinzukommen (be added)

FIGURE Intransitive Scalar Change Verbs

senken (14; 26) (lower): BundesBank–LeitZins (Federal bank–base rate), BundesBank–Zins (Federal bank–interest), superMarkt–Preis (supermarket–price), SommerGeschäft–Verlust (summer business–loss), BundesBank–DiskontSatz (Federal bank–minimum lending rate)

senken (14; 14) (lower): BundesBank–Lombardsatz (Federal bank–rate on loans on security), StrafAndrohung–AbtreibungsQuote (threat of punishment–abortion rate), StrafAndrohung–AbtreibungsZahl (threat of punishment–number of abortions), FachHandel–LagerKost (specialist stores–storage costs), Harmonisierung–sozialNiveau (harmonisation–social level)

FIGURE Transitive Lexicon Entries senken (lower) (estimated frequencies not reproduced)

dauern (0; 10) (last, go on): Entwirrung–Zeit (disentanglement–time), BürgerFrageStunde–Stunde (question time–hour), Prozess–Jahr (trial–year), schreckensZeit–Jahr (time of terror–year), ratenZahlung–Jahr (payment by installments–year)

FIGURE Transitive Lexicon Entry dauern (last)


Linguistic Interpretation

In some linguistic accounts, multi-place verbs are decomposed into representations involving at least one predicate or relation per argument. For instance, the transitive causative/inchoative verb increase is composed of an actor/causative verb combining with a one-place predicate in the structure on the left in the figure below. Linguistically, such representations are motivated by argument alternations (diathesis), case linking and deep word order, scope ambiguity, by the desire to represent aspects of lexical meaning, and by the fact that in some languages the postulated decomposed representations are overt, with each primitive predicate corresponding to a morpheme. For references and recent discussion of this kind of theory, see Hale and Keyser and Kural.

(Four VP tree diagrams, not reproduced. Each pairs an NP argument with a verbal head: in the first tree an upper verb act embeds the one-place predicate increase; in the second the predicates are replaced by the relation symbols R_8 and R_17; in the third the embedded predicate is the conjunction R_17 ∧ increase_17 under R_8; the fourth tree is the intransitive entry, a single NP–V structure with predicate R_17 ∧ increase_17.)

FIGURE First Tree: Linguistic Lexical Entry for the Transitive Verb increase. Second Tree: Corresponding Lexical Entry with Induced Classes as Relational Constants. Third Tree: Indexed Open-Class Root Added as a Conjunct in Transitive Scalar Motion increase. Fourth Tree: Induced Entry for the Related Intransitive increase.

We will sketch an understanding of the lexical representations induced by latent-class labelling in terms of the linguistic theories mentioned above, aiming at an interpretation which combines computational learnability, linguistic motivation, and denotational-semantic adequacy. The basic idea is that latent classes are computational models of the atomic relation symbols occurring in lexical-semantic representations. As a first implementation, consider replacing the relation symbols in the first tree in the figure above with relation symbols derived from the latent-class labelling. In the second tree, R_17 and R_8 are relation symbols with indices derived from the labelling procedure described earlier. Such representations can be semantically interpreted in standard ways, for instance by interpreting relation symbols as denoting relations between events and individuals.

Such representations are semantically inadequate, for reasons given in philosophical critiques of decomposed linguistic representations; see Fodor for recent discussion. A lexicon estimated in the above way has as many primitive relations as there are latent classes. We guess there should be a few hundred classes in an approximately complete lexicon, which would have to be estimated from a corpus of hundreds of millions of words or more. Fodor's arguments, which are based on the very limited degree of genuine interdefinability of lexical items, and on Putnam's arguments for contextual determination of lexical meaning, indicate that the number of basic concepts has the order of magnitude of the lexicon itself. More concretely, a lexicon constructed along the above principles would identify verbs which are labelled with the same latent classes; for instance, it might identify the representations of grab and touch.

For these reasons, a semantically adequate lexicon must include additional relational constants. We meet this requirement in a simple way, by including as a conjunct a unique constant derived from the open-class root, as in the third tree in the figure above. We introduce indexing of the open-class root (copied from the class index) in order that homophony of open-class roots not result in common conjuncts in semantic representations; for instance, we don't want the two senses of decline exemplified in decline the proposal and decline five percent to have a common entailment represented by a common conjunct. This indexing method works as long as the labelling process produces different latent class labels for the different senses.

The last tree in the figure above is the learned representation for the scalar motion sense of the intransitive verb increase. In our approach, learning the argument alternation (diathesis) relating the transitive increase in its scalar motion sense to the intransitive increase in its scalar motion sense amounts to learning representations with a common component R_17 ∧ increase_17. In this case, this is achieved.


Further Applications

Probabilistic clustering methods for natural language applications mainly focus on the following two tasks: (i) induction of smooth probability models on language data, and (ii) automatic discovery of class structure in natural language. In the above-described application of clustering to lexicon induction, we focussed our attention on the second task: there we were interested in the structure of the induced clusters as a statistical semantics underlying the data in question. In other applications the class structure itself is not of interest; rather, data clusters are consulted as general backup sources of information when information about specific events is sparse or missing in the input. Here smooth clustering models can be used to solve sparse-data problems in various application areas. For applications of EM-based clustering to lexical disambiguation, see Prescher et al.; for headword lexicalisation of probabilistic grammars, see Johnson and Riezler and Riezler et al.

Conclusion

In the preceding sections, we presented a framework for the development and training of statistical grammar models and successfully applied it to the acquisition of lexicon information. In particular, we described methods for the extraction of subcategorisation frames for verbs and for the determination of selectional restrictions. The resulting information is easy to use for lexicographers. Our approach has already been applied to German, English, Portuguese and Chinese, and will be applied to Greek and Spanish in the near future. In addition, the linguistic information gained in our experiments is valuable for natural-language applications like lexicography, parsing, information retrieval or machine translation.

In an extensive experiment, we applied semantic clustering techniques to predicate–argument pairs in order to induce semantic classes representing typical predicate–argument relationships. Such classes are not only interesting from a linguistic point of view, but can also be directly used to solve sparse-data problems in natural language modelling.

The mathematically well-defined Expectation-Maximisation algorithm for unsupervised learning was used in all our experiments. Although there is no guarantee that the maximisation of the likelihood of the training data which the EM algorithm performs also improves the linguistic correctness of the resulting syntactic analyses, our experiments show that in practice this is the case. Gaining more insight into the relationship between linguistic plausibility and likelihood of linguistic analyses will be an interesting future research topic.

References

Abney, Steven. Chunk Stylebook. Technical report, Seminar für Sprachwissenschaft, Universität Tübingen.

Baker, J. Trainable Grammars for Speech Recognition. In Communication Papers for the Meeting of the Acoustical Society of America, ed. D. Klatt and J. Wolf.

Baum, Leonard E. An Inequality and Associated Maximization Technique in Statistical Estimation for Probabilistic Functions of Markov Processes. Inequalities III.

Baum, Leonard E., Ted Petrie, George Soules, and Norman Weiss. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. The Annals of Mathematical Statistics.

Beil, Franz, Glenn Carroll, Detlef Prescher, Stefan Riezler, and Mats Rooth. Inside-Outside Estimation of a Lexicalized PCFG for German (Gold). In Inducing Lexicons with the EM Algorithm. AIMS Report, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Beil, Franz, Glenn Carroll, Detlef Prescher, Stefan Riezler, and Mats Rooth. Inside-Outside Estimation of a Lexicalized PCFG for German. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), College Park, MD.

Brown, Peter, Peter deSouza, Robert Mercer, Vincent Della Pietra, and Jenifer Lai. Class-Based n-gram Models of Natural Language. Computational Linguistics.

Carroll, Glenn. Learning Probabilistic Grammars for Language Modeling. Doctoral dissertation, Department of Computer Science, Brown University.

Carroll, Glenn. Manual pages for charge, hyparCharge. Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Carroll, Glenn, and Mats Rooth. Valence Induction with a Head-Lexicalized PCFG. In Proceedings of EMNLP, Granada.

Charniak, Eugene. Tree-Bank Grammars. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI).

Dagan, Ido, Lillian Lee, and Fernando Pereira. Similarity-Based Models of Word Cooccurrence Probabilities. Machine Learning, special issue on natural language learning.

de Lima, Erika F. The Automatic Acquisition of Lexical Information from Portuguese Text Corpora with a Probabilistic Context-Free Grammar. Unpublished manuscript, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Dempster, A. P., N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society B.

Fodor, Jerry A. Concepts: Where Cognitive Science Went Wrong. Oxford: Oxford Cognitive Science Series.

Hale, K., and S. J. Keyser. Argument Structure and the Lexical Expression of Syntactic Relations. In The View from Building 20, ed. K. Hale and S. J. Keyser. Cambridge, MA: MIT Press.

Hindle, Donald, and Mats Rooth. Structural Ambiguity and Lexical Relations. Computational Linguistics.

Hockenmaier, Julia. Parsing Unsegmented Chinese Text with a Head-Lexicalised PCFG. Master's thesis, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Hofmann, Thomas, and Jan Puzicha. Unsupervised Learning from Dyadic Data. Technical report, International Computer Science Institute, Berkeley, CA.

Johnson, Mark, and Stefan Riezler. Exploiting Auxiliary Distributions in Stochastic Unification-Based Grammars. In Proceedings of the 1st Conference of the North American Chapter of the Association for Computational Linguistics (ANLP-NAACL), Seattle, WA.

Kural, Murat. Verb Incorporation and Elementary Predicates. Doctoral dissertation, University of California, Los Angeles.

Lapata, Maria, Frank Keller, and Sabine Schulte im Walde. To appear. Verb Frame Frequency as a Predictor of Verb Bias. Journal of Psycholinguistic Research.

Lari, K., and S. J. Young. The Estimation of Stochastic Context-Free Grammars using the Inside-Outside Algorithm. Computer Speech and Language.

Levin, Beth. English Verb Classes and Alternations: A Preliminary Investigation. Chicago/London: The University of Chicago Press.

McLachlan, Geoffrey J., and Thriyambakam Krishnan. The EM Algorithm and Extensions. New York: Wiley.

Ney, Hermann, Ute Essen, and Reinhard Kneser. On Structuring Probabilistic Dependencies in Stochastic Language Modelling. Computer Speech and Language.

Pereira, Fernando, Naftali Tishby, and Lillian Lee. Distributional Clustering of English Words. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL), Columbus, Ohio.

Prescher, Detlef, Stefan Riezler, and Mats Rooth. Using a Probabilistic Class-Based Lexicon for Lexical Ambiguity Resolution. In Proceedings of the 18th International Conference on Computational Linguistics (COLING), Saarbrücken, Germany.

Riezler, Stefan, Detlef Prescher, Jonas Kuhn, and Mark Johnson. Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL), Hong Kong.

Rooth, Mats. Two-Dimensional Clusters in Grammatical Relations. In Symposium on Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity and Generativity. AAAI Spring Symposium Series, Stanford University.

Rooth, Mats. Two-Dimensional Clusters in Grammatical Relations. In Inducing Lexicons with the EM Algorithm. AIMS Report, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Rooth, Mats, Stefan Riezler, Detlef Prescher, Glenn Carroll, and Franz Beil. EM-Based Clustering for NLP Applications. In Inducing Lexicons with the EM Algorithm. AIMS Report, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Rooth, Mats, Stefan Riezler, Detlef Prescher, Glenn Carroll, and Franz Beil. Inducing a Semantically Annotated Lexicon via EM-Based Clustering. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), Maryland.

Saul, Lawrence K., and Fernando Pereira. Aggregate and Mixed-Order Markov Models for Statistical Language Processing. In Proceedings of EMNLP.

Schiller, Anne, and Chris Stöckert. DMOR. Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Schmid, Helmut. YAP: Parsing and Disambiguation with Feature-Based Grammars. Doctoral dissertation, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Schmid, Helmut. LoPar: Design and Implementation. Arbeitspapiere des Sonderforschungsbereichs 340, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Schmid, Helmut, and Sabine Schulte im Walde. Robust German Noun Chunking With a Probabilistic Context-Free Grammar. In Proceedings of the 18th International Conference on Computational Linguistics (COLING), Saarbrücken, Germany.

Schuhmacher, Helmut. Verben in Feldern: Valenzwörterbuch zur Syntax und Semantik deutscher Verben. Berlin: De Gruyter.

Schulte im Walde, Sabine. Automatic Semantic Classification of Verbs According to their Alternation Behaviour. Master's thesis, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Schulte im Walde, Sabine (a). Clustering Verbs Semantically According to their Alternation Behaviour. In Proceedings of the 18th International Conference on Computational Linguistics (COLING), Saarbrücken, Germany.

Schulte im Walde, Sabine (b). The German Statistical Grammar Model: Development, Training and Linguistic Exploitation. Arbeitspapiere des Sonderforschungsbereichs 340, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.

Schulze, Bruno Maximilian. GermLem: ein Lemmatisierer für deutsche Textcorpora. Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.