CG-3 - Beyond Classical Constraint Grammar

Eckhard Bick, University of Southern Denmark
Tino Didriksen, GrammarSoft ApS

Abstract

This paper discusses methodological strengths and shortcomings of the Constraint Grammar paradigm (CG), showing how the classical CG formalism can be extended to achieve greater expressive power and how it can be enhanced and hybridized with techniques from other paradigms. We present a new, largely theory-independent CG framework and rule compiler (CG-3) that allows the linguist to write CG rules incorporating different types of linguistic information and methodology from a wide range of parsing approaches, covering not only CG's native topological technique, but also dependency grammar, phrase structure grammar and unification grammar. In addition, we allow the integration of statistical-numerical constraints and non-discrete tag and string sets.

1 Introduction

Within Computational Linguistics, Constraint Grammar (CG) is more a methodological than a descriptive paradigm, designed for the robust parsing of running text (Karlsson et al., 1995). The formalism provides a framework for expressing contextual linguistic constraints allowing the grammarian to assign or disambiguate token-based, morphosyntactic readings. However, CG's primary concern is not the tag inventory itself, or the underlying linguistic theory of the categories and structures used, but rather the efficiency and accuracy of the method used to achieve a given linguistic annotation. Conceptually, a Constraint Grammar can be seen as a declarative whole of contextual possibilities and impossibilities for a language or genre, but in programming terms, it is implemented procedurally as a set of consecutively iterated rules that add, remove or select tag-encoded information. In its classical form (Karlsson, 1990; Karlsson et al., 1995), Constraint Grammar relies on a morphological analyzer providing so-called cohorts of possible readings for a given word, and uses constraints that are largely topological[1] in nature, for both part-of-speech disambiguation and the assignment of syntactic function tags. (a-c) provide examples for close context (a) and wide context (b) POS rules, and syntactic mapping (c).

[1] With "topological" we mean that grammar rules refer to relative, left/right-pointing token positions (or word fields), e.g. -2 = 2 tokens to the left, *1 = anywhere to the right.

(a) REMOVE VFIN IF (0 N) (-1 ART OR OR GEN);
    remove a finite verb reading if self (0) can also be a noun (N), and if there is an article (ART), possessive () or genitive (GEN) 1 position left (-1).

(b) SELECT VFIN IF (NOT *1 VFIN) (*-1C CLB-WORD BARRIER VFIN);
    select a finite verb reading, if there is no other finite verb candidate (VFIN) to the right (*1), and if there is an unambiguous (C) clause boundary word (CLB-WORD) somewhere to the left (*-1), with no (BARRIER) finite verb in between.

(c) MAP (@SUBJ) TARGET N (*-1 >>> BARRIER NON-PRE-N) (1C VFIN) ;
    map a subject reading (@SUBJ) on noun (N) targets if there is a sentence-boundary (>>>) left without non-prenominals (NON-PRE-N) in between, and an unambiguous (C) finite verb (VFIN) immediately to the right (1C).

As can be seen from the examples, the original formalism refers only to the linear order of tokens, with absolute (>>>) or relative fields

counting tokens left (-) or right (+) from a zero/target position in the sentence. Though in principle a methodological limitation, this topological approach also has descriptive "side effects": For instance, it supports local syntactic function tags (such as the @SUBJ tag on the head noun of an NP), but it does not easily lend itself to structural-relational annotation. Thus, dependency relations or constituent brackets can neither be created nor referred to by purely topological CG rules[2]. Even chunking constraints, though topologically more manageable than tree structures, have to be expressed in an indirect way (cp. the NON-PRE-N barrier condition in example rule (c)), and syntactic phrases cannot be addressed as wholes, let alone subjected to rewriting rules.

[2] As a work-around, attachment direction markers (arrows) were introduced in the syntactic function tags, such as @>N or @N> for pre-nominal and @N< or @

A second design limitation in classical CG concerns the expression of vague, probabilistic truths about language. Thus, the formalism does not allow numerical tags or numerical feature-value pairs, and while many current mainstream NLP tools are based on probabilistic methods and machine learning, classical CG is entirely rule-based, and the only way to integrate likelihoods is through lexical "Rare" tags or by ordering rules in batches with more heuristic rules applying last.

Third, classical CG tags and tokens are discrete units and are handled as string constants. While this design option facilitated efficient processing and even FST methods, it also limited the linguist, who was not allowed to use regular expressions, feature variables or unification. Another aspect of discreteness concerns tokenization: Classical CG regarded token form, number and order as fixed, so the formalism had difficulty in accommodating, for instance, the rule-based creation of a (fused) named-entity token, the insertion or removal of tokens in spell and grammar checking, or the reordering of tokens needed for machine translation.

Finally, when classical CG was designed, it had isolated sentences in mind. Though rule scope can be arbitrarily defined by a "window" delimiter set, and though "global" window rules clearly surpass the scope of HMM n-grams, it was not possible to span several windows at a time or to link referents across sentences, nor was it possible to contextually trigger genre variables or in other ways to make a grammar interact with a given text type. Descriptively, this limitation meant that CG as such could not be used for higher-level annotation such as anaphora or discourse relations, and that grammars were agnostic of genre and task types.

Following Karlsson's original proposal, two standards for CG rule compilers emerged in the late 1990s. The first, CG-1, was used by Karlsson's team at Helsinki University and commercially by the spin-off company LingSoft for English (ENGCG), Swedish and German (GERCG) taggers, as well as for applied products such as Scandinavian grammar checkers (Arppe, 2000; Birn, 2000 for Swedish, Hagen et al., 2001 for Norwegian). The second compiler, CG-2, was programmed and distributed by Pasi Tapanainen (1996), who made several notational improvements[3] to the rule formalisms (in particular, regarding BARRIER conditions, SET definitions and REPLACE operations), but left the basic topological interpretation of constraints unchanged. Five years later, a third company, GrammarSoft ApS, in cooperation with the University of Southern Denmark, launched an open source CG compiler, vislcg, which was backward compatible with CG-2, but also introduced a few new features[4], in particular the SUBSTITUTE and APPEND operators designed to allow system hybridization where input from a probabilistic tagger could be corrected with CG rules in preparation of a syntactic or semantic CG stage, as implemented e.g. in the earliest version of the French FrAG parser (Bick, 2004). Vislcg, too, was used in spell and grammar checkers (Bick, 2006a), but because of its open-source environment it also marked the transition to a wider spectrum of CG users and research languages.

[3] Tapanainen also created a very efficient compiling and run-time interpretation algorithm for cg2, involving finite state transducers, as well as a finite state dependency grammar, FDG (Tapanainen, 1997), for his company Conexor and its Machinese parsers.

[4] The vislcg compiler was programmed over several years by Martin Carlsen for VISL and GrammarSoft. For a technical comparison of CG-2 and vislcg, cf. http://beta.visl.sdu.dk/visl/vislcg-

Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)
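As a rough illustration of how a classical, topological rule operates on cohorts, the following Python sketch applies example rule (b) from the Introduction, SELECT VFIN IF (NOT *1 VFIN) (*-1C CLB-WORD BARRIER VFIN), to a toy sentence. The data model (a cohort as a list of tag sets), the helper names and the toy tags KS, PRON, V and N are assumptions made for this sketch only; the scan semantics are simplified, and nothing here reflects how CG-1, CG-2 or CG-3 are actually implemented.

```python
from typing import FrozenSet, List

Reading = FrozenSet[str]   # one reading = a set of tags (assumed data model)
Cohort = List[Reading]     # all candidate readings of one token


def may_be(cohort: Cohort, tag: str) -> bool:
    """Ordinary context test: at least one reading carries the tag."""
    return any(tag in reading for reading in cohort)


def must_be(cohort: Cohort, tag: str) -> bool:
    """'C' context test: every reading carries the tag (unambiguous)."""
    return bool(cohort) and all(tag in reading for reading in cohort)


def select_vfin(sentence: List[Cohort], i: int) -> Cohort:
    """Return the readings of token i after trying the SELECT rule."""
    target = sentence[i]
    if not may_be(target, "VFIN"):
        return target
    # (NOT *1 VFIN): no finite verb candidate anywhere to the right
    if any(may_be(cohort, "VFIN") for cohort in sentence[i + 1:]):
        return target
    # (*-1C CLB-WORD BARRIER VFIN): scan leftwards for a safe clause
    # boundary word, giving up when a finite verb candidate intervenes
    for cohort in reversed(sentence[:i]):
        if must_be(cohort, "CLB-WORD"):
            return [reading for reading in target if "VFIN" in reading]
        if may_be(cohort, "VFIN"):
            break
    return target


# Toy input for "that she walks"; "walks" is ambiguous between verb and noun.
sentence = [
    [frozenset({"KS", "CLB-WORD"})],
    [frozenset({"PRON"})],
    [frozenset({"V", "VFIN"}), frozenset({"N"})],
]
disambiguated = select_vfin(sentence, 2)
```

In this toy case only the finite verb reading of the last token survives. The sketch returns a new reading list rather than editing the sentence in place, which keeps the example easy to inspect.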

But though constraint grammars using the CG-2/VISLCG compiler standard did achieve a granularity and accuracy that allowed them to support external modules for both constituent and dependency tree generation, they remained topological in nature and did not permit explicit reference to linguistic relations and structure in the formalism itself. The same is true for virtually all related work outside the CG community itself, where the basic idea of CG constraints has sometimes been exploited to enhance or hybridize HMM-style probabilistic methods (e.g. Graña et al., 2003) or combined with machine learning (Lindberg & Eineborg, 1998; Lager, 1999), but always in the form of (mostly close-context) topological rather than structural-relational rules and always with discrete tag and string constants. It is only with the CG-3 compiler presented here that these and most of the other above-mentioned design issues have been addressed in a principled way and inside the CG formalism itself. CG-3[5] (or VISL CG-3 because of its backward compatibility with VISLCG) was developed over a period of 6 years, where new features were designed and implemented continually, while existing features were tested in real-life parsing applications. In the following sections we will discuss the most important of these features and compare the finished framework with other approaches.

[5] For detailed technical documentation on CG-3, cf. http://beta.visl.sdu.dk/cg3.html, for tutorials, associated tools, parser demos and resources, see http://visl.sdu.dk/constraint_grammar.html. Categories and tag abbreviations are explained at http://visl.sdu.dk/tagset_cg_general.html .

2 Expressive power: Relational tags

[...] tags, and this is true of CG-3 relational tags, too. [...] where each dependent (daughter, child) is assigned exactly one head (mother, parent), but [...] m tag, where 'n' is the token id of the dependent, and 'm' the token id of the head. Thus, dependency is a tag field, just like the ".."-marked lemma field, the upper-case POS and fields or the @-marked syntactic function field[6]:

    Both       "both"      DET P        @>N         #1->2
    companies  "company"   N P          @SUBJ>      #2->3
    said       "say"       V IMPF       @FS-STA     #3->0
    they       "they"      PERS 3P NOM  @SUBJ>      #4->5
    would      "will"      V IMPF       @FS-3
    launch     "launch"    V INF        @ICL-AUX<   #6->5
    an         "a"         ART S        @>N         #7->9
    electric   "electric"  ADJ POS      @>N         #8->9
    car        "car"       N S NOM      @6
    .          "."         PU           @PU         #10->0

[6] All of these fields can easily be converted into xml-encoded feature-value pairs for compatibility. The authors provide scripts for conversion into e.g. MALT xml and TIGER xml.

Instead of the "topological" left/right-pointing position markers, CG rules with dependency contexts can refer to three types of relations: p (parent/head), c (child/dependent) and s (sibling).

    ADD (§AG) TARGET @SUBJ (p V-HUM LINK c @ACC LINK 0 N-NON-HUM) ;

(Add an AGENT tag to a subject reading if its parent verb is a human verb that in turn has a child accusative object that is a non-human noun. E.g. "BMW launched an electric car.")

In order to add dependency annotation to "virgin" input, the operators SETPARENT and SETCHILD are used together with a TO target. Thus, for the sentence "We know for a fact that the flat had not been used in months."

    SETPARENT @FS- [...] LINK 0 V-COGNITIVE) (NOT 1 @) [...]

[...] anywhere to the left (**-1) if the latter is a cognitive verb (V-COG) and is not followed by an ordinary direct object (@ [...] these can either be traditional positional contexts, or exploit already established dependency relations. CG-3 has a built-in check against circularity, preventing attachments that would create a dependency loop[7]. Dependency operators can be combined with a number of options:

1.) * (Deep scan) allows a child- or parent-test to continue searching along a straight line of descendants and ancestors, respectively, until the test condition is matched or until the end of a relation chain is reached.

2.) C (All scan) requires a child- or sibling-relation to match all children or all siblings, respectively. Note that this is different from the ordinary C (= safe) option which applies to readings. Thus 'cC ADJ' means 'only adjectives as children' (e.g. no articles or PPs), while 'c (*) LINK 0C ADJ' means 'any one daughter with an unambiguous adjective reading'.

3.) S (Self) can be combined with c, p or s to look at the current target as well. For example, 'c @SUBJ LINK cS HUM' looks for a human subject NP, where either the head noun (@SUBJ) itself is human, or where it has a modifier that is tagged as human.

[7] Though descriptively undesirable, loops can be explicitly allowed with the ALLOWLOOP and NEAREST options (cf. visl.sdu.dk/cg3.html)

Apart from dependency relations, we also allow general named relations in CG-3, that can be used for arbitrary relation types, such as secondary dependencies between object and object complement, anaphora (Bick, 2010), discourse relations etc. Thus, the following establishes an identity relation between a relative pronoun and its noun antecedent:

    SETRELATION (identity) TARGET () TO (*-1 N)

Where matched, this will add a relational tag on the pronoun token: ID:n R:identity:m, where R: specifies the relation, and n and m are token id's for the pronoun and noun, respectively.

It is even possible to set bidirectional relations with separate labels, to be tag-marked at both ends of the relation arc. Thus, the example rule sets a relation between a human noun subject and a sense-verb object, labelling the former as "experiencer" and the latter as "stimulus":

    SETRELATIONS (experiencer) (stimulus) TARGET N-HUM + @SUBJ TO (p V-SENSE LINK c @ACC) ;

3 Constituent structure: Inspiration from the generative paradigm

Because dependency bases its structural description on tokens (words), it is inherently closer to the native CG approach than the competing generative family of syntactic formalisms, which operate with non-terminal nodes and constituent brackets.

3.1 Tree transformation

Classical CG does not support constituent brackets in any form, be it flat chunks or nested constituents, so external modules had to be used to create constituent trees. The oldest examples are PSGs with CG functions as terminals (Bick, 2003), used for CALL applications within the VISL project, followed by dependency-to-constituent tree transformation employing an external dependency grammar (Bick, 2005; Bick, 2006b). Of course the same transformation could be used with our new, native CG dependency (cp. previous section), but CG-3 does offer more direct ways to express linguistic structure in generative terms, allowing linguists used to thinking along PSG lines to directly translate generative descriptions and constraints into the CG formalism.

3.2 Chunking

There are at least two distinct methods in CG-3 to perform chunking, using either (a) cohort insertion or (b) relation-adding. For traditional, shallow chunking, without overlaps and nesting, only about 20 rules are needed (Bick, 2013), inserting opening (a) and closing (b) edge marker tokens.

    (a) ADDCOHORT ("<$np>" "CHUNK" NP) BEFORE @>N OR N/PROP/PRON OR DET/NUM/PERS - @ATTR (NOT -1 @>A OR @>N) (NEGATE -1 IT LINK -1 @>N) ;

    (b) ADDCOHORT ("<$/np>" "ENDCHUNK" NP) AFTER N/PROP/PRON OR DET/NUM/PERS - @ATTR (NOT 0 @>N) (*-1 CHUNK-NP BARRIER CHUNK) ;

NP opening markers (a) are inserted before prenominal noun dependents (@>N) or NP heads (N/PROP/PRON), accepting even determiners and numerals if they have no attributive function (@ATTR). Likewise, NP closing markers (b) are inserted after the above NP head candidates, in the presence of the left-hand (*-1) NP-chunk opener. The NOT contexts in (a) make sure that the triggering prenominal is in fact the first element in the NP, and not preceded by an adverbial dependent of its own, or part of a coordination. The inserted chunk-opening and -closing tokens can then be interpreted as labelled brackets:

    (np We_PRON /np) had (np very_@>A delicious_@>N icecream_N /np) with (np strawberries_N /np).

The second method is better suited for layered, deep chunking, because it uses relational tags to individually link chunk edges to each other or to the chunk head. With full layering, this approach can create complete XML-formatted constituent trees from CG dependency-tagged input without the need of an external converter, if chunk brackets are expressed as xml opening/closing markers. However, using relations to delimit topological units such as chunks introduces certain complexities in the face of crossing branches and needs to specify the "handedness" (left/right) and "outermostness" of dependency arcs, features that are normally left underspecified in dependency annotation. In CG3, we support these features as l/r- (left/right) and ll/rr- (leftmost/rightmost) additions:

    (a) ADDRELATIONS (np-head-l) (np-start) TARGET (*) (c @>N OR @N<&) TO (llScc (*)) ;

    (b) ADDRELATIONS (np-head-r) (np-stop) TARGET (*) (c @>N OR @N<&) (r:np-head-l (*)) TO (rrScc (*)) ;

Both rules are bidirectional and mark both chunk head and chunk edges. The head target is any word (*) with an adnominal dependent (c @>N OR @N<), and the TO-edge is the leftmost (ll) resp. rightmost (rr) descendant (cc) or self (S). This second method will yield complete, nested structures, including adjective phrases (adjp) and prepositional phrases (pp) in the NPs[8]:

    (np-start (adjp-start very_@>A delicious_@>N adjp-stop) icecream_N (pp-start with_PRP_@N< strawberries_@P< pp-stop) np-stop) .

[8] For clarity, only phrases with 2 or more constituents were bracketed in the 2nd method.

3.3 Phrase templates

Both of the above chunking methods are intended to be used late in the annotation pipe, and exploit existing morphosyntactic markup or even dependencies, so the chunking cannot itself be seen as a methodological part of parsing per se.

However, CG-3 also offers another way of expressing chunks, the template, which can be integrated into CG rules also at early tagging stages. A template is basically a pre-defined sequence of tokens, POS or functions that can be referred to as a whole in rule contexts, or even in other templates. The basic idea goes back to Karlsson et al. (1995), but was not implemented in either CG-1 or CG-2. For instance, an NP could be defined as

    (a) TEMPLATE np = ([ART, N]) OR ([ART,ADJ,N])
    (b) TEMPLATE np = (? ART LINK 1 N) OR (? ART LINK 1 ADJ LINK 1 N)
    (c) TEMPLATE np = ? ART LINK *1 N BARRIER NON-PRE-N

and then used in ordinary rules with a T:-prefix (*1 VFIN LINK *1 T:np).

(a) is closest to the original idea, and reminiscent of generative rewriting rules, while (b) and (c) are shorthand for ordinary CG contexts and harness the full power of the latter. Independently of the format, however, the linguistic motivation behind templates is to allow direct reference to constituent units, to think in terms of phrase structure and to subsume aspects of generative grammar into CG. Thus, constituent templates allow a direct conceptual transfer from generative rules, and a simple generative NP grammar for the NP "a very delicious icecream with strawberries":

    np = adjp? n pp? ;
    adjp = adv? adj ;
    pp = prp np ;

could be expressed in CG3 as:

    TEMPLATE np = (N) OR (T:adjp LINK 1 N) OR (T:adjp LINK 1 N LINK 1 T:pp) OR (N LINK 1 T:pp) ;
    TEMPLATE adjp = (ADJ) OR (ADV LINK 1 ADJ) ;
    TEMPLATE pp = PRP LINK 1 T:np ;

In the example, "very_ADV delicious_ADJ" matches T:adjp, and "with_PRP strawberries_N" matches T:pp, and the whole expression could then be referred to as a T:np context by CG rules. CG-internally, templates could also simply be interpreted as shorthand (variables) for context parentheses, so-called context templates. As such, they logically need to allow internal, predefined

positions, as in the following example for a human verb-template, where the motivation is not a constituent definition, but simply to integrate two context alternatives into one[9], and to label the result with one simple variable.

    TEMPLATE v-hum = ((c @SUBJ + HUM) OR (*1 ("that" KS) BARRIER V)) ;

"human verb" defined as either having a subject (@SUBJ) child (c) that is human (HUM), or having a subordinating conjunction (KS) anywhere to the right (*1) without another verb (V) in between.

[9] In traditional CG, this OR'ed expression could not even be expressed in one rule, let alone be referenced as one label.

Compiler-internally, both template types are processed in a similar way, which is why even constituent templates have question marks or 0-positions as place holders for an external position marker, which will be inserted into the template by the compiler at run-time.

When using templates together with (external) BARRIERs or LINKed conditions, the template can be thought of as one token, meaning that right-looking contexts with a template (*1 T:x BARRIER) will be interpreted against the left edge of the template, while left-looking contexts (*-1 T:x) will be interpreted against the right edge of the template, so that internal, unpredictable parts of the template itself do not trigger the BARRIER condition.

4 Beyond discrete tags and string constants: Regular expressions, variables and unification

A formal grammar has to strike a balance between computational efficiency on the one hand, and linguistic ease and rule writing efficiency on the other. Thus, the "classical" CG compilers treated tags and strings (lemma & word form) as constants and CG-2, in particular, achieved very high processing speeds exploiting this fact in its finite state implementation. Some flexibility was introduced through set definitions, and vislcg went on to allow sets as targets, too, as well as multiple conditions for the same position, but many rules still had to be written in multiple versions because of expressive limitations in the formalism:

(a) OR'ing only for tags/sets, not contexts
(b) no nesting of NOT conditions
(c) no C-restriction for BARRIERs

CG-3 adds all of the above[10], but while increasing rule-writing efficiency, these changes do not affect the discreteness of tags and strings.

[10] The nesting of NOT conditions is achieved by making a distinction between ordinary NOT, that only negates its immediate position, and NEGATE, which has scope over the whole context bracket - including other NOTs or NEGATEs.

Methodologically more important, therefore, is our introduction of regular expressions and variables. The former can be used instead of sets for open-class items, primarily lemma and word class, e.g. ".*i[zs]e"r V in a transitivity set or ".*ist" N as a heuristic candidate for the or classes ("professional" or "ideological" humans). But the feature is useful even with a closed-class semantic set such as +HUM, and r will work across grammars and languages leaving grammarians the option to introduce ad hoc sub-distinctions (e.g. for words like 'diabetic'). Finally, regular expressions can be used to substitute for, or enhance, morphological analysis, for instance in stemming or affix recognition, supporting the creation of so-called "barebones" Constraint Grammars without lexical resources (Bick, 2011).

Variables can be used in connection with regular expressions, when appending readings (a) or for instantiating valency conditions (b):

    APPEND ("$1"v ADJ) TARGET ("<(.*(ic|oid|ous))>"r) ; # recognizing adjective endings

    REMOVE (N) (0 (<(.+)^vp>r INF)) (-1 INFM) (1 ("$1"v PRP)) ; # e.g. to minister to the tribe

With the example given, the second rule can remove the noun reading for 'minister' because the 'to' in the valency marker of the verb 'minister' matches the lemma "to" of the following preposition, even if the infinitive marker is still unsafe and potentially a preposition itself (-1 rather than -1C).

The methodologically most important use of variables, however, resides in feature unification. Thus, CG3 allows the use of sets as to-be-unified variables by prefixing $$ before the set name. Set unification integrates yet another methodological feature, used in other parsing paradigms, such as HPSG, but so far accessible in CG only at the cost of considerable "rule explosion". Apart from the obvious gender/number/case-disambiguation of noun phrases, unification is also useful in for

instance coordination, as with the following LIST set of semantic roles (agent, patient, theme and location):

    LIST ROLE = §AG §PAT §TH §LOC ;
    SELECT $$ROLE (-1 KC) (-2C $$ROLE) ;

Sometimes unification has to be vague in order to work. This is the case when underspecified "Portmanteau" tags are used (e.g. nC - nocase unified with NOM or ACC cases), or in the face of very finegrained semantic distinctions. We therefore make a distinction between list unification ($$-prefix) and set unification (&&-prefix), where the former unifies "terminal" set members, while the latter unifies subsets belonging to a superset. Two contexts will set-unify if they have tags sharing the same subset. In the example below, N-SEMS is defined as a superset, with N-SEM as one of the subsets.

    LIST N-SEM = [...] ;
    SET N-SEMS = N-HUM OR N-LOC ... OR N-SEM ... OR N-SUBSTANCE ;
    REMOVE @SUBJ> (0 $$@ LINK 0 &&N-SEMS) ; # ... offered the reader detailed notes and instructions on most of the prayers ...

The example sentence has an ambiguous coordination, where it is not clear if 'and' starts a new clause, and the task of the REMOVE rule is to exclude a subject reading for 'instructions' (tagged ) by semantically aligning it with 'notes' (tagged ) because both and are part of the N-SEM subset of the &&N-SEMS superset, - and by checking if both nouns also have matching left-pointing argument readings ($$@ [...]

[...] heuristic rule batching by allowing rules to make reference to statistical information. This is achieved by introducing numerical secondary tags of the type , which can be used to encode and use corpus-harvested frequencies. The simplified example rule (a) exploits relative lexical POS frequencies for bigram disambiguation in a way reminiscent of hidden Markov models (HMMs), while (b) is a spell checker fall-back rule selecting the word with the highest phonetical similarity value

    REMOVE ( N) (0 (60> V)) (1 N)

    SELECT ()

A more complex example is the use of CG-annotated data to boot-strap statistical "wordnets" or "framenets", containing the likelihood of semantic types or roles given an established syntactic function. Thus, the Portuguese PALAVRAS parser (Bick, 2014) assigns and exploits tags like , and for the verb "propor" (suggest), meaning that "propor" [...] object (action/activity/process/event), and subject [...] likelihoods of 41% and 27% for person and organization, respectively[11].

[11] Simplifying, we here only list high-percentage semantic types for subjects and objects.

Obviously, numerical tags could be used for other ends than statistics, for instance to assign [...] numerical tags can be seen as a special case of (numerical) global variables, e.g. for numbered genre types or Wordnet synset id's.

6 Grammar-text interaction

The fourth and last design limitation of classical CG to be treated here concerns ways to let a constraint grammar mold itself on the fly and to adapt to the text (or speech transcript) it is used to annotate. In CG-3 we introduce 3 types of such self-organizing behaviour: [...] sets and sentence) delimiters, but also a spanning width of n windows left or right of the rule focus. Unbounded context conditions can breach

window boundaries by adding a 'W', e.g. *-1W for scanning left across the window boundary. This feature is especially useful for higher-order relations such as anaphora (Bick, 2010) or discourse relations. Another scope-related innovation is (definable) paired brackets that [...] and make reference to them in a second pass. Like templates, bracket eclipsing is meant to help reduce CG's topological complexity problem, i.e. allow syntactic function carriers to "see" each other more easily across intervening tokens.

CG-3, unlike earlier CG compilers, applies rules strictly sequentially, and each rule is run on all cohorts in a window before the next rule is tried. This makes rule tracing more predictable, but also facilitates grammar self-organisation. Thus, we allow context-triggered JUMPs to rule ANCHORs, to INCLUDE additional rules from a file or to call EXTERNAL programs. For instance, an early rule can scan the window for verbo-nominal ambiguities, and if there are none, bypass the rule section in question.

Because CG does not depend on training data, it is generally assumed to be more genre-robust than machine-learning systems[12], and a few manual rule changes will often have a great effect on genre tuning (e.g. allowing/forbidding imperative readings for recipes or science articles, respectively). In CG-3, we further enhance this methodological advantage by introducing parameter variables, that can be set or unset either in the data stream (e.g. corpus section headers) or dynamically-contextually by the grammar itself. The example rule below assigns the value "recipe" to a "genre" variable, when encountering imperatives followed by quantified food nouns.

    SETVARIABLE (genre) (recipe) TARGET (IMP) (*1 N-FOOD LINK *-1 NUM OR N-UNIT BARRIER (*) - ("of"))

[12] The rationale for this is that an ML system basically is a snapshot of the linguistic knowledge contained in its training data, and therefore will need new training data for each new genre in order to perform optimally.

Finally, grammar-text interaction may take the form of rule-governed changes to the text itself. Thus, the ADDCOHORT feature used for chunking in section 3.2, and its REMCOHORT counterpart can be used for adding or removing commas in grammar checking, and the MOVE BEFORE, MOVE AFTER and SWITCH WITH operators can be used to express syntactic movement rules in machine translation. The example rule will change Danish VS into English SV in the presence of a fronted adverb:

    MOVE WITHCHILD[13] (*) @>>).

Applied to the Danish sentence "I går så jeg et rensdyr", this will turn the literal translation "Yesterday saw I a reindeer" into the correctly ordered "Yesterday I saw a reindeer".

[13] The WITHCHILD option means that heads are moved together with their dependents, in this case "reindeer" together with "a".

7 Efficiency and hybridization options

This paper is primarily concerned with design aspects and a linguistic discussion of the CG-3 formalism, and advances in expressive power have been the main focus of innovation during development. That said, the CG-3 rule parser compiles mature grammars with thousands of rules in fractions of a second and maintains the processing speed of VISLCG in spite of the added complexity caused by regular expressions, variables, templates and numerical tags. For a mature morphosyntactic core grammar with 6000 rules, on a single machine, this amounts to ~1000 words (cohorts) per second for each of the morphological and syntactic levels. However, Yli-Jyrä (2011) has shown that much higher speeds (by about 1 order of magnitude[14] on a comparable machine) are possible, at least for VISLCG-compatible rules without the above complexities, when using a double finite-state representation, where rule conditions are matched against a string of feature vectors that summarize compact representations of local ambiguity. Future work should therefore explore the possibility of sectioned grammars, where a distinction is made between FST-compatible rule sections on the one hand, and smaller specialized rule sections on the other hand, which for their part would allow the complete range of CG-3 features. This way, simple "traditional" rules would run at the higher FST speed, and the current procedural compiler architecture would only be used where necessary, greatly reducing overall processing time.

[14] The reported speed is 110,000 cohorts for FINCG, an open morphological CG with ~ 950 low-complexity CG-1 rules, originally developed by Fred Karlsson for Finnish.
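The strictly sequential rule application described in section 6, where each rule is run on all cohorts of a window before the next rule is tried and a context-triggered jump can bypass a rule section, might be pictured as follows. This is a minimal Python sketch under stated assumptions: the `Rule` type, the `run_grammar` driver, the `jump_if`/`jump_to` fields and the toy rules are hypothetical names invented for illustration. They reflect neither CG-3's internals nor its rule syntax, and only forward jumps are considered.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set

Cohort = List[Set[str]]   # readings of one token, each a set of tags (assumed)
Window = List[Cohort]


@dataclass
class Rule:
    name: str
    apply: Optional[Callable[[Window, int], None]] = None  # edits cohort i
    anchor: Optional[str] = None                            # jump target label
    jump_if: Optional[Callable[[Window], bool]] = None      # window-level test
    jump_to: Optional[str] = None


def run_grammar(rules: List[Rule], window: Window) -> List[str]:
    """Apply rules one after another; return the names of rules actually run."""
    anchors = {r.anchor: k for k, r in enumerate(rules) if r.anchor}
    trace: List[str] = []
    k = 0
    while k < len(rules):
        rule = rules[k]
        if rule.jump_if is not None:
            # Jump rule: skip ahead to the anchor if the test holds.
            k = anchors[rule.jump_to] if rule.jump_if(window) else k + 1
            continue
        if rule.apply is not None:
            # One rule visits every cohort before the next rule is tried.
            for i in range(len(window)):
                rule.apply(window, i)
        trace.append(rule.name)
        k += 1
    return trace


def has_vn_ambiguity(window: Window) -> bool:
    """True if some cohort has both a verb and a noun reading."""
    return any(
        any("V" in r for r in cohort) and any("N" in r for r in cohort)
        for cohort in window
    )


def remove_v_after_art(window: Window, i: int) -> None:
    """Toy rule: drop verb readings directly after an unambiguous article."""
    if i == 0 or not all("ART" in r for r in window[i - 1]):
        return
    kept = [r for r in window[i] if "V" not in r]
    if kept:                      # never delete the last remaining reading
        window[i] = kept


rules = [
    Rule("skip-vn-section", jump_if=lambda w: not has_vn_ambiguity(w),
         jump_to="after-vn"),
    Rule("remove-v-after-art", apply=remove_v_after_art),
    Rule("final", apply=lambda w, i: None, anchor="after-vn"),
]

ambiguous = [[{"ART"}], [{"N"}, {"V"}]]
trace_ambiguous = run_grammar(rules, ambiguous)

unambiguous = [[{"ART"}], [{"N"}]]
trace_unambiguous = run_grammar(rules, unambiguous)
```

For the ambiguous window the disambiguation rule runs and leaves only the noun reading; for the unambiguous window the jump bypasses that rule section entirely, so only the rule at the anchor is executed.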

Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015) 38

References

Arppe, Antti. 2000. Developing a grammar checker for Swedish. In: Torbjørn Nordgård (ed.). Proceedings of NODALIDA '99. pp. 28-40. Trondheim: Department of Linguistics, University of Trondheim.

Bick, Eckhard. 2003. A CG & PSG Hybrid Approach to Automatic Corpus Annotation. In: Kiril Simow & Petya Osenova (eds.). Proceedings of SProLaC2003 (at 2003, Lancaster), pp. 1-12

Bick, Eckhard. 2004. Parsing and evaluating the French Europarl corpus. In: Patrick Paroubek, Isabelle Robba & Anne Vilnat (red.): Méthodes et outils pour l'évaluation des analyseurs syntaxiques (Journée ATALA, May 15, 2004). pp. 4-9. Paris: ATALA.

Bick, Eckhard. 2005. Turning Constraint Grammar Data into Running Dependency . In: Civit, Montserrat & Kübler, Sandra & Martí, Ma. Antònia (red.). Proceedings of TLT 2005 (4th Workshop on Treebanks and Linguistic Theory, Barcelona), pp. 19-27

Bick, Eckhard. 2006a. A Constraint Grammar Based Spellchecker for Danish with a Special Focus on Dyslexics. In: Suominen, Mickael et al. (ed.) A Man of Measure: Festschrift in Honour of Fred Karlsson on his 60th Birthday. Special Supplement to SKY Journal of Linguistics, Vol. 19. pp. 387-396. Turku: The Linguistic Association of Finland

Bick, Eckhard. 2006b. Turning a Dependency into a PSG-style Constituent Treebank. In: Calzolari, Nicoletta et al. (eds.). Proceedings of LREC 2000. pp. 1961-1964

Bick, Eckhard. 2010. A Dependency-based Approach to Anaphora Annotation. In: (eds.) Extended Activities Proceedings, 9th International Conference on Computational Processing of the Portuguese Language (Porto Alegre, Brazil). ISSN 2177-3580

Bick, Eckhard. 2011. A Barebones Constraint Grammar. In: Helena Hong Gao & Minghui Dong (eds), Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (Singapore). pp. 226-235

Bick, Eckhard. 2013. Using Constraint Grammar for Chunking. In: S. Oepen, K. Hagen & J. B. Johannessen (eds). Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013). Linköping Electronic Conference Proceedings Vol. 85, pp. 13-26. Linköping: Linköping University Electronic Press.

Bick, Eckhard. 2014. PALAVRAS, a Constraint Grammar-based Parsing System for Portuguese. In: Tony Berber Sardinha & Thelma de Lurdes São Bento Ferreira (eds.), Working with Portuguese Corpora, pp. 279-302. London/New York: Bloomsbury Academic.

Birn, Jussi. 2000. Detecting grammar errors with Lingsoft's Swedish grammar checker. In: Torbjørn Nordgård (ed.). Proceedings of NODALIDA '99. pp. 28-40. Trondheim: Department of Linguistics, University of Trondheim.

Graña, Jorge & Gloria Andrade & Jesús Vilares. 2003. Compilation of constraint-based contextual rules for part-of-speech tagging into finite state transducers. In: Proceedings of the 7th Conference on Implementation and Application of Automata (CIAA 2002). pp. 128-137. Springer-Verlag, Berlin.

Hagen, Kristin & Pia Lane & Trond Trosterud. 2001. En grammatikkontrol for bokmål. In: Kjell Ivar Vannebo & Helge Sandøy (eds). Språkknyt 3-2001, pp. 6-9. Oslo: Norsk Språkråd

Karlsson, Fred. 1990. Constraint Grammar as a Framework for Parsing Running Text. In: Hans Karlgren (ed.). Proceedings of COLING-90, Vol. 3, pp. 168-173

Karlsson, Fred & Atro Voutilainen & Juha Heikkilä & Arto Anttila. 1995. Constraint Grammar: A language-independent system for parsing unrestricted text. Natural Language Processing 4. Berlin & New York: Mouton de Gruyter.

Lager, Torbjörn. 1999. The µ-TBL System: Logic Programming Tools for Transformation-Based Learning. In: Proceedings of CoNLL'99 (Bergen).

Lindberg, Nikolaj & Martin Eineborg. 1998. Learning Constraint Grammar-style Disambiguation Rules Using Inductive Logic Programming. In: Proceedings of the 36th ACL / 17th COLING (Montreal, Canada). Volume 2, pp. 775-779

Tapanainen, Pasi. 1996. The Constraint Grammar Parser CG-2. No 27, Publications of the Department of Linguistics, University of Helsinki.

Tapanainen, Pasi. 1997. A Dependency Parser for English. Technical Reports No TR1. Department of Linguistics, University of Helsinki.

Yli-Jyrä, Anssi Mikael. 2011. An Efficient Constraint Grammar Parser based on Inward Deterministic Automata. In: Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications, pp. 50-60. NEALT Proceedings Series, vol. 14
