Regular Expressions for Language Engineering

Natural Language Engineering Cambridge UniversityPress

L KARTTUNEN JP CHANOD G GREFENSTETTE A SCHILLER

Rank Xerox Research CentreRXRC Chemin de Maupertuis Meylan France

fLauriKarttunenJeanPierreChanodGregoryGrefenstetteAnneSchillerg

grenoblerxrcxeroxcom

Received February

Abstract

Manyofthe pro cessing steps in natural language engineering can b e p erformed using

nite state transducers An optimal waytocreate such transducers is to compile them

from regular expressions This pap er is an intro duction to the regular expression calculus

extended with certain op erators that haveproved very useful in natural language appli

cations ranging from tokenization to lightparsing The examples in the pap er illustrate

in concrete detail some of these application s

Intro duction

The use of nite state transducers for morphological analysis and generation Kart

tunen et al Karttunen Beesley and Karttunen is bynowwell es

tablished It is less well known howfarthe techniques of nite state transformation

of text can succeed in solving other natural language engineering problems b eyond

morphologyIn this pap er wewillpresentanumber of applications of nite state

technology to other language engineering problems and describ e part of the regular

expression calculus that wehavedevelop ed that makethese applications p ossible

Although large eorts havebeen made to build lo cal grammars Silb erztein

without the help of a nite state calculus the expressivepower of a welldesigned

calculus makes it p ossible to create mo dular rule sets and lexical descriptions that

are easy to up date and maintain accelerating the pro duction of diverse engineering

applications

Our pap er is structured in the following way After a general intro duction to the

nite state calculus sp ecial attention will b e given to three nite state op erators

restriction replacement and left to right longest matchreplacement These con

ceptually simple regular expression op erators express in a concise way conditions

whose explicit formulation would otherwise b e complicated and unwieldyTheir

intro duction into the Xeroxnite state calculus has spurred a numberoflanguage

engineering applications some of which are describ ed b elow A graded set of exam

ples using these op erators will b e presented as a concrete illustration of their p ower

and simplicity

After this overview of the Xeroxnite state calculus wewill presentanumber of

Lauri Karttunen and others

applications using this calculus over textual structures larger than individual words

In Section wepresenttokenization applications Subsection presents a de

terministic tokenizer used in our part of sp eechtagging suite Nondeterministic

tokenization presented in subsection maintains alternativetokenizations of

the same input stream in a nite state structure to whichfurther linguistic treat

ments eg parsing can b e applied

Subsection presents twoapplications of lightparsing over tagged text Just

as tokenization applies nite state rules and constraints over strings involving more

than one word these applications apply to units here sentences recognized bythe

part of sp eech tagger whichinclude more than one word And just as transducers

compiled from morphological rules insert or delete characters in words these light

parsers intro duce syntactic markings within a tagged sentence around sequences

that fulll the conditions sp ecied in the nite state rules

Intro duction to the FiniteState Calculus

Finite state transducers that enco de phonological and morphographical alterna

tions are generally not created directly but are compiled from rules of some sort

The twolevel rule formalism Koskenniemi as well as classical phonological

rewrite rules can b e viewed as sp ecial kinds of regular expressions Kaplan and

Kay that extend the basic language with new op erators and new typ es of

expressions These extensions makethe formalism more suitable for particular ap

plications but they do not aect its generativepower In this section wewilldiscuss

recentextensions to the regular expression calculus that enable us to create nite

state transducers for syntactic pro cessing from high level rules

We start with a brief review of the basic regular expression calculus and pro ceed

with the discussion of twospecial op erators that were originally intro duced for

phonological and morphographical alternations but whichhavesince proved useful

in writing syntactic rules as well These are the restriction op erator and the

replace op erator Wewill then describ e in more detail a variantofthelatter

the left to right longest matchreplacement recently intro duced in Karttunen

and illustrate its application to syntactic description

Simple regular expressions

The language of regular expressions is a formal language similar to formulas of

Bo olean logic It has a simple syntax but the expressions can b e arbitrarily com

plex Like formulas of Bo olean logic regular expressions denote sets We need to

distinguish two kinds of sets sets of strings and sets consisting of pairs of strings

We use the term language to refer to a set of simple strings and the term relation in

talking ab out sets of string pairs The terms regular language and regular relation

refer to sets than can b e describ ed bya regular expression

Regular languages and relations maybeencoded as nite state networksLan

guages are represented by simple automata relations by transducers Anyregular

expression can b e compiled into a network that represents the corresp onding lan

Regular Expressions for Language Engineering

guage or relation Because of the close connection b etween regular expressions and

nite state networks weoftenuse the terms regular and nite state interchangeably

in this pap er

Regular expressions contains two kinds of symb ols unary symbols and symbol

pairs Unary symbols a betc denote strings symbol pairs ab a betc

representpairs of strings The simplest kind of regular expression contains a single

symbol For example a denotes the set fagSimilarlythe regular expression

ab denotes the singleton relation fa bgA regular relation mayalways

be viewed as a mapping b etween tworegular languages The ab relation is simply

the crosspro duct of the languages denoted bythe expressions a and b

In order to distinguish the two languages that are involved in a regular relation

we can call the rst one the upper and the second one the lower language of the

relation Corresp ondinglyinthe pair abtherst symb ol acanbe called the

upper symbol and the second one bthe lower symbolThetwocomponents of a

symbol pair are separated in our notation byacolonwithout anywhitespace

b efore or after Tomakethe notation less cumb ersome wesystematically ignore

the distinction b etween the language A and the identityrelation that maps every

string of A to itself Therefore wealso write aa simply as a

A transducer that enco des a regular relation can b e applied in twoways to map

a string in the upp er language to the corresp onding strings in the lower language

or vice versa There is no privileged direction of application In the examples in

later sections the intended input language is always on the upp er side and the

output on the lower side of the relation

Two regular expression symbols havea special interpretation epsilon and

any The epsilon symbol denotes the empty string stands for anysymbol

that o ccurs in the same regular expression and for anyunknown symbol In alater

section weintro duce one more sp ecial symbol the b oundary marker torefer

to the b eginning or the end of a string in certain expressions

The sp ecial meaning of any symbol maybeturnedobyprexing it with the

escap e character or byenclosing the symbol in double quotes and are

interpreted as ordinary zero digits and not as the sp ecial character epsilon On the

other hand certain other strings receiveaspecial interpretation within a doubly

quoted string For example n is interpreted as the newline character t as

a tab following the C programming language conventions Because whitespace is

generally ignored a space as a regular symbol must b e prexed with or enclosed

in double quotes

Complex regular expressions can b e built up from simpler ones bymeans of

regular expression op erators Because b oth regular languages and regular relations

are closed under concatenation and union the following basic op erators can b e

combined with any kind of regular expression

See httpwwwrxrcxeroxcomresearchmlttfst for a demonstration of the Xe

rox regular expression compiler and for a more comprehensivedescription of the syntax

and semantics of regular expressions

Lauri Karttunen and others

A B Union

AB Concatenation

A Optionality union with the emptystring

A Iteration one or more concatenations of A

A Kleene star equivalentto A

Square brackets are used for grouping expressions Thus A is equivalentto

A while A is not Note the following simple expressions

The emptystring languageidentityrelation

The universal languageidentityrelation

Although regular languages are closed under complementation and intersection

regular relations are not see Kaplan and Kay thus the following op erators

can b e combined only with expressions that denote a regular language

A Complement negation

A B Intersection

A B Relativecomplementminus

Regular relations can b e constructed bymeansof twobasic op erators

AxB Crosspro duct

AoB Comp osition

The crosspro duct op erator xisused only with expressions that denote a

regular language it constructs a relation b etween them A x B designates the

relation that maps every string of A to every string of BIfA contains x and B

contains ythe pair x y is included in the crosspro duct

Comp osition is an op eration on relations that yields a new relation A o B

maps strings that are in the upp er language of A to strings that are in the lower

language of BIfA contains the pair xy and B contains the pair y zthepair

x z is in the comp osite relation

Dening New Operators

The syntax though not the descriptivepower of regular expressions can b e ex

tended bydening new op erators that allowcommonly used constructions to b e

expressed more conciselyA simple example of a trivial but convenient extension is

the containmentoperator

A A

def

For example a b denotes all strings that contain at least one a or b

somewhere

The addition of new op erators can b e more than just a notational convenience A

case in p ointis Koskenniemis restriction op erator originally intro duced

for twolevel phonological rules

ABC Restriction A only in the context of BC

Regular Expressions for Language Engineering

Here A B and C maydenote any regular language This expression designates

the language of strings that havetheprop ertythatany string of A that o ccurs

in them is immediately preceded bysome string from B and immediately followed

by some string from CFor example abcincludes all strings that contain

no o ccurrence of a strings like bacbac that completely satisfy the condition

but no strings likeab Reductionist nite state parsers Koskenniemi et al

Voutilainen and Tapanainen Chano d and Tapanainen makefrequent

use of such constraints to exclude unwanted analyses

The advantage of the restriction op erator is that it enco des in a compact waya

useful condition that is dicult to express in terms of the more primitiveoperators

The denition of A B C is shown b elow

ABC B A A C

def

Clearlyhigh level abstractions like A B C are conceptually easier to op

erate with than the logically equivalentbutvery complex primitiveformulas just

as it is easier to write complex computer programs in a high level language rather

than in a logically equivalent assembly language

Note that the denition of the restriction op erator given ab ove contains three

negations Because regular relations are not closed under complementation all the

comp onent expressions in A B C must denote regular languages rather than

relations In the general regular expression calculus wedonotallowexpressions such

as a b c that are wellformed twolevel rules in Koskenniemis formalism

In our twolevel calculus all pair symbols are treated as atomic symbols andcon

verted to real symbolpairs only after the compilation is nished Karttunen and

Beesley

Another example of a useful high level abstraction is the replace op erator

that plays much the same role in the general regular expression calculus as

and in twolevel rules This simplest typ e of replacementisunconstrained by

contexts

AB ReplacementofAbyB

The comp onentexpressions A and Bmust denote regular languages but the

expression as a whole denotes a relation The A B relation maps anystring

to itself if the string contains no instance of AStrings that contain instances of A

are paired with copies that are otherwise identical except that each A segmentis

replaced by some B string The exact denition of ABis shown b elow

AB A A x B A

def

This relatively simple idea would b e also rather cumb ersome to express without

the explicit replace op erator The same is true of the other typ es of replace

expressions Kaplan and Kay Karttunen that constrain the op eration

by left and rightcontexts

ABLR Replacementof A by B in the context LR

Lauri Karttunen and others

The twovertical bars in the ab ovecontexted replacementindicate that b oth

contexts L and R p ertain to the upp er language of the relation See Karttunen

Kemp e and Karttunen for other variations

Another wayof generalizing the replace op erator is to maketwoormore replace

ments in parallel

A B A B

This is dened like the single A B replacementabove except that wereplace

A by A A and A x B bytheunion of multiple

crosspro ducts A x B A x B

Left to right Longest match Replacement

The transducers compiled from the simple replacement expression ABare in

general ambiguous in the sense that a string in the upp er language of the relation

maybecome paired with more than one string This can happ en even if the B

language consists of a single string For example a b bbaabax

maps the string aba into four dierent strings as shown b elow

aba aba ab a aba

x a axa a x x

The reason is that the simple replacement relation do es not constrain the selection

of the alternate substrings for replacement Here wegetdierentresults for aba

dep ending on whether the replacementstarts at the b eginning or in the middle of

the string Atboth sites there are two alternative replacements to b e made Starting

at the b eginning wemay relace either ab or aba Starting in the middle wecan

replace either b or ba The underlining showthefour alternate factorizations

of the input string

For manyapplications it is useful to dene another version of replacementthat

in all such cases yields a unique outcome The longest match left to rightreplace

op erator dened in Karttunen imp oses a unique factorization on ev

ery input The replacement sites are selected from left to right not allowing any

overlaps If there are alternate candidate strings starting at the same lo cation only

the longest one is replaced Thus the op erator allows only the last factorization

in the gure ab ove mapping aba unambiguously to x

The eect of the left to right longest matchconstraints is that every string in

the upp er language of A B is uniquely parsed into a sequence of substrings

that either b elong or do not b elong to AWe can take advantage of the unique

factorization in more than one way Instead of replacing the instances of A bya

string from some other language wemayinsert markers or brackets around the A

strings to mark them as such

Toimplementthis idea Karttunen intro duced a sp ecial symbol on

the righthand side of the replacementexpression to mark the place around which the insertions are to b e made The general form of these marking expressions is

Regular Expressions for Language Engineering

shown b elow For the sakeofgeneralityweallow B and C to denote anyregular

language

ABC Left to right longest matchmarkup

The corresp onding transducer lo cates instances of A in the input string under the

left to right longest matchregimen copies the entire string unchanged except that

the B and C strings are inserted around the selected A strings as markers In eect

this transducer can b e viewed as a parser that picks out maximal instances of the

regular language A

Just like simple replacement the left to rightlongest match replacementcanalso

be constrained bycontext and generalized for parallel replacement For example

A B C A B C

picks out the maximal instances of the union A A and marks them dierently

dep ending on whether they b elong to A or AWewill makeuseof this sp ecic

p ossibilitybelowinSection The contexted version of makes its app earance

in Section

Tostart with a simpler example let us assume that noun phrases consist of an

optional determiner danynumber ofadjectives a and one or more nouns n

The expression d a n compiles into a transducer that inserts

brackets around maximal instances of the noun phrase pattern For instance it

maps dannvaan into dannvaan

dann v aan

d annv aan

Although the input string dannvaan contains manyother instances of the noun

phrase pattern n an nn etc the left to rightandlongest matchconstraints

pickout just the two maximal ones

This simple example demonstrates that nite state parsers can b e compiled di

rectly from regular expressions In the next section wewillpresentamore sophis

ticated example illustrating this technique

AGrammarand a Parser for Date Expressions

It is wellknown among linguists that the syntax of a natural language cannot in

general b e describ ed byanite state or even a context free grammar Nevertheless

there are manysubsets of natural language that can b e correctly describ ed byvery

simple means for example names and titles addresses prices dates etc For some

of these kinds of expressions a nite state grammar maybemore appropriate and

easier to construct than an ordinary phrase structure or feature based grammar

In this section weexamine one suchcase in detail a grammar for dates As we

demonstrated in the previous section from the regular expression that denes the

syntax of wellformed date strings wecan directly deriveanite state transducer that marks them in a text

Lauri Karttunen and others

For the sakeofillustration let us consider here only one of several common date

formats expressions that are of the typ e

Sunday

August

SundayAugust

August

SundayAugust

In the following we assume that a date expression consists of a dayoftheweek

amonth and a date with or without a year or a combination of the two Note

that this description of the syntax of date expressions leads to the same problem

weencountered in the previous example Long date expressions suchasSunday

August contain smaller wellformed date expressions eg August

that should b e ignored in the context of a larger date In order to simplify the

presentation westipulate that date expressions are contiguous strings including

the internal spaces and commas

Tofacilitate the sp ecication of the date language werst dene some auxiliary

terms and then use them to dene larger phrases The complete set of denitions

is shown b elow

To To

Day Monday Tuesday Saturday Sunday

Month January February November December

Date ToTo

Year To To To To

DateExpression Day Day SP Month Date SP Year

From these denitions wecancompile a small nite state automaton states

arcs that describ es a language of ab out million date expressions for the

p erio d from January to December

Aparser for the language can b e compiled from the following simple regular

expression

DateExpression

It yields a transducer of states and arcs that marks maximal date expressions

in the manner illustrated bythe following text

Today is Wednesday August because yesterday was Tuesday

and it was August so tomorrow must be ThursdayAugust and

not August as it says on the program

Because of the left to right longest matchconstraints asso ciated with the

op erator the transducer brackets only the maximaldate expressions

However as to correctness this regular expression grammar suers from a serious

problem of overgeneration The actual number of days in years is much smaller

than the numberofexpressions in the language million A large ma jorityof

the date expressions generated bythegrammarrefer to dates that do not exist For

example expressions like April are similar to examples likethepresent

Regular Expressions for Language Engineering

king of France that do not refer to anything real It is an interesting challenge for

nite state syntactic description to try to sp ecify a sublanguage that contains all

and only the semantically valid date expressions

Eliminating Invalid Date Expressions

In this section we eliminate step bystepall the dierenttyp es of invalid date ex

pressions using the op erations of the regular expression calculus intro duced earlier

The easiest problem to correct is that the original date grammarallows expres

sions likeFebruary and April that exceed the maximum number of days

for the month Somewhat more challenging is the case of leap days February

and February are valid dates but February and Febru

ary do not exist The hardest problem is the dep endency b etween the

dayof the week and the date Because September is in fact a Monday

the expressions TuesdaySeptember Wednesday September

etc are invalid even if they o ccasionally o ccur in real texts

Our solution is to construct a suitable constraintforeachofthesethree kinds

of invalid typ es of dates MaxDaysInMonth LeapDaysand WeekDayDatesEachof

these constraintexpressions denotes a language that excludes a particular typ e of

invalid date but admits all other strings Weobtain the desired eect byintersect

ing the constraintlanguages with the original language of date expressions The

intersection of the four languages contains all and only the valid dates

ValidDate

DateExpression MaxDaysInMonth LeapDays WeekDayDates

The MaxDaysInMonth constraintisa language that includes all strings except the

ones whichcontain a month name followed byaninappropriate number ofdays

MaxDaysInMonth

February

February April June September November

Note that A denotes the complementnegation of the set of strings that

contain at least one instance of A somewhere

In order to restrict February to leap years we need to do a little more work

Not all years divisible by four are leap years Full centuries are not leap years unless

they are divisible by Consequentlyyear is not a leap year but year

is Weneed the following denitions

Even

Odd

N To To

Div N Even NOdd

LeapYear Div N Div

Here werstdene Div as the innite set of natural numb ers that are divisible

by four This set consists of two parts numb ers that end in or p ossibly

preceded byaneven number and numb ers that end in or preceded byan

Lauri Karttunen and others

odd numb er Finallywe dene LeapYear as the set of numbers divisible by

subtracting centuries that are not multiples of Note that the expression N

Div denotes numbers with twonal zeros that are preceded byanumber

that is not divisible by four For example it includes but not Because

LeapYear is dened as Div minus this set it follows that is a leap year but

is not

Once the set of leap years is dened the distribution of February in date

expressions can b e constrained with the following simple restriction As dened

ab ove SP is a separator consisting of a comma and a space

LeapDays February SP LeapYear

In other words a date expression containing February must terminate

with a leap year Note that the b oundary symbol isnecessary here to mark

the end of the year string in order to rule out expressions likeFebruary

whichwould qualify if wewere allowed to takeinto accountonlythe rst three

digits since year is a leap year in the Gregorian calendar

The last problem synchronization b etween the days of the week and the dates

takes a little more work In eect weneed to construct a complete calendar to

determine the validityofany arbitrary date expression sayFriday Octob er

the day the Gregorian calendar was intro duced in Catholic countries

The task is easier than it rst app ears b ecause the Gregorian calendar includes

aperfect year double cycle after eachyears we start the year on the same

dayandwe are at the same p ointinthe leap year sequence Starting from that

observation it is simple to partition years into equivalence classes so that the years

in eachclass b egin on the same dayoftheweek Furthermore we need to distinguish

ordinary years from leap years that haveoneextra dayThus wecan dene fourteen

equivalence classes

MonY

TueY

MonLY

TueLY

where MonY TueY etc are ordinary years b eginning with MondayTuesdayetc

resp ectively MonLY TueLYetc denote the corresp onding classes for leap years

The year classes can b e constructed using the same sort of nite state arithmetic

that we employed ab oveto dene the innite set of leap years but wewillskip the

mechanics here except for one interesting detail In order to avoid listing years one

by one itisconvenienttoderivesub classes from other sub classes by addition This

can b e achieved byanauxiliary transducer that maps any set of numb ers x y z

to another set containing x y z etc Applied in the opp osite direction

the same transducer subtracts

A binary version of the addersubtracter transducer is one of the examples discussed

in httpwwwrxrcxeroxcomresearchmlttfstfsexampleshtml

Regular Expressions for Language Engineering

Another simple observation that greatly facilitates the calendar construction is

that the weekdayofthe rst dayoftheyear determines the weekdayofalltheother

days of the year Again weneed fourteen equivalence classes

D January January December

DL January January December

where D includes all the dates that fall on the same weekdayasthe rst dayof

an ordinary year D contains all the dates that share the weekdaywith the second

day and so on The DL DL etc are the corresp onding classes for leap years

Toconstruct the nal lter for the p erfect Gregorian calendar we just need to

combine the information ab out what weekdaybeginswhatyear with the equivalence

classes for the dates within a year For example for an ordinary Mondayyear

MonY Mondays fall on January January and December But if the

year b egins on SundaySunY then Mondays fall on January January etc

up to December Similarly for all the fourteen equivalence classes for years

The complete constraint for Mondays MondayDates is expressed bythe following

regular expression

MondayDates

Monday SP DSP MonY DL SP MonLY

D SP SunY DL SP SunLY

D SP SatY DL SP SatLY

D SP FriY DL SP FriLY

D SP ThuY DL SP ThuLY

D SP WedY DL SP WedLY

D SP TueY DL SP TueLY

The language denoted by the expression ab oveincludes all strings that do not

contain any instance of Monday and all strings that contain Monday in an

environment that conforms to the restriction the end of the string Monday

any month and a date without a year MondayDecember or a month and

a date followed byanappropriate year MondayJanuary If a month

and a date are followed byayear the month and date class D DL Detcmust

correlate with the year class MonY MonLY SunY etc

The complete WeekDayDates constraintissimplytheintersection of the similar

constraints for all the seven weekdays

WeekDayDates MondayDates TuesdayDates WednesdayDates

ThursdayDates FridayDates SaturdayDates

SundayDates

Wehavenow completed the task of extracting the language of valid dates from

the set of all date expressions it is the intersection of the four languages wehave

dened

ValidDate DateExpression MaxDaysInMonth

LeapDays WeekDayDates

Lauri Karttunen and others

If we consider the nite state network that enco des the language wecansee

that the numberofexpressions is considerably reduced since there are only ab out

ab out million days in b etween year and year On the other hand the

network itself is muchlarger than the unconstrained DateExpression automaton

states arcs

In any case wecannow easily deriveatransducer that marks all and only the

valid dates

ValidDate

With the Xeroxregular expression compiler the entire computation creating the

dateparsing nite state transducer takes ab out seconds on a p owerful worksta

tion Sun Ultra

Discriminating Date Parser

Although it is desirable to distinguish valid dates from invalid dates it may not b e

useful in practice to recognize only the valid dates Because of errors and misprints

real text corp ora contain a fair number of invalid dates Instead of ignoring all but

the valid ones it maybemore practical to accept all date expressions and simply

use dierent tags to mark the distinction b etween valid and invalid dates

Once wehavedenedboth the language of all date expressions DateExpression

and the set of valid ones ValidDateitis simple to pickoutthe set of invalid dates

DateExpression ValidDate Using the notion of parallel replacementintro

duced earlier wecan easily construct a parser that recognizes maximal instances

of the general class of date expressions but tags them in twoways dep ending on

whichsub class the expression b elongs to The discriminating date parser is dened

by the regular expression b elow

DateExpression ValidDate ID

ValidDate VD

This parallel replacementexpression compiles into a state arc trans

ducer in ab out seconds on a Sun Ultra The time includes the compilation of all

the auxiliary expressions and constraints discussed ab ove The following example

illustrates the eect of the transducer on a sample text

The correct date for today is VD Monday September There

is an error in the program Today is not ID Tuesday September

The string MondaySeptember gets marked with the VD tag

b ecause it is a valid date Replacing Monday byTuesday makes the date invalid

but the parser still recognizes the expression TuesdaySeptember as a

date expression alb eit as an invalid one Because the longest matchconstraintis

calculated with resp ect to the language of all date expressions the invalid date

TuesdaySeptember gets selected over TuesdaySeptember

whichhappens to b e valid although the Gregorian calendar of course was not yet

in use in the year AD

Regular Expressions for Language Engineering

Toconclude this section let us recapitulate the main p oints The purp ose of this

exercise was to demonstrate that

Finite state parsers can b e constructed directly from regular expressions for

nontrivial languages

There are regular subsets of natural language suchasthelanguage of dates

for whichnite state description is not only feasible but more appropriate

and easier to construct than the equivalentphrase structure grammar

Regular languages and relations can b e mo died directly with the nite state

calculus to obtain new languages and relations without rewriting the gram

mars that describ e them

It is of course p ossible to use phrase structure rules attributevalue matrices

typ e hierarchies categorial grammar and other p owerful formalismstorepresent

leap years years starting on a Mondayand dates that fall on the same weekdayas

January However it is not apparent that these p owerful formalisms giveusany

advantage over the simple nite state techniques wehaveused On the contraryit

seems that the manipulations required to accomplish suchtasks with more p owerful

formalismswould b e at least as complicated and ad ho c as the simple nite state

techniques employed here Note that a phrase structure grammar of valid dates

requires crossclassication whichleads to a large numberofrules and nonterminal