
Inferring Morphological Rules from Small Examples Using 0/1 Linear Programming


Ann Lillieström, Koen Claessen, Nicholas Smallbone
Chalmers University of Technology, Gothenburg, Sweden
{annl,koen,nicsma}@chalmers.se

Abstract

We show how to express the problem of finding an optimal segmentation of a set of labelled words as a 0/1 linear programming problem, and how to build on this to analyse a language's morphology. The approach works even when there is very little training data available.

1 Introduction

Many recent tools for morphological analysis use statistical approaches such as neural networks (Cotterell et al., 2018). These approaches can profitably use huge amounts of training data, which makes them ideal for high-resource languages. But if there is little training data available, statistical methods can struggle to learn an accurate model. And when they learn the wrong model, it is difficult to diagnose the error, because the model is a black box.

This paper presents a new approach to morphological analysis that produces human-understandable models and works even if only a few training words are given. The user gives a list of inflected words together with each word's morphosyntactic features and standard form (lemma):

  standard   inflected   features
  woman      women       Pl;Nom
  baby       babies'     Pl;Gen
  dog        dogs'       Pl;Gen
  cat        cat's       Sg;Gen
  lorry      lorries     Pl;Nom

Our tool then proposes, for each feature, the affixes and morphological rules which mark that feature. For the example above it suggests the following:[1]

  feature   context   morpheme
  Gen       Sg        +'s∗
  Gen       Pl        +'∗
  Pl        Gen       +s∗
  Pl        Nom       +a+ → +e+
  Pl                  +y∗ → +ies∗
  Sg                  +y∗, +a+, ∅

[1] The tool also supports prefixes such as ∗un+, and circumfixes such as ∗ge+t∗ in the German gerannt (run; past participle). We explain the meaning of ∗ and + in section 3.3.

Here, +'s∗ represents the suffix 's, and +a+ represents an infix a. The table shows that both 's and an apostrophe can mark the genitive case; the second column means that the genitive was marked by 's only in singular nouns, and by an apostrophe only in plural nouns. An s suffix marks plural, and because of the tiny input data it was only seen in genitive nouns. Plural can be marked by an inner vowel change from a to e (indicated by the arrow), or by changing a final y to ies, in which case the singular form is marked by a or y.

The tool also segments the input words into morphemes consistent with the table of rules (the first segment of each form is the inferred stem):

  standard              inflected
  wom|a|n|∅ (Sg, Nom)   wom|e|n|∅ (Pl, Nom)
  bab|y|∅ (Sg, Nom)     bab|ies|' (Pl, Gen)
  dog|∅ (Sg, Nom)       dog|s|' (Pl, Gen)
  cat|∅ (Sg, Nom)       cat|'s|∅ (Gen, Sg)
  lorr|y|∅ (Sg, Nom)    lorr|ies|∅ (Pl, Nom)

Our key idea is a novel morphological segmentation algorithm used to produce the segmentation above (section 3). Once we have a segmentation, we use it to compute the rules above (section 4). In this example, the rules inferred by our tool precisely capture the morphology of the input words, and the segmentation shows which rules are most common. We claim that, together, they can be used to perform high-quality morphological analysis. We validate this claim in section 5 by showing how to build on our tool to do reinflection.

2 Related Work

Morphological segmentation is a well-studied problem in NLP, with applications including machine translation (Green and DeNero, 2012), speech recognition (Rajagopal Narasimhan et al., 2014) and information retrieval (Turunen and Kurimo, 2008); for a more detailed overview we refer the reader to Ruokolainen et al. (2016) or Hammarström and Borin (2011).
Unsupervised learning (Harris, 1955; Creutz and Lagus, 2007; Goldsmith, 2001; Johnson, 2008; Poon et al., 2009) relies on unannotated text, and is perhaps the most popular approach, because of the large amount of unannotated text available for many languages, but it can suffer from low accuracy (Hammarström and Borin, 2011). One way to improve accuracy is to exploit semantic information (Sakakini et al., 2017; Vulić et al., 2017; Schone and Jurafsky, 2000; Soricut and Och, 2015). Another is minimally-supervised learning (Monson et al., 2007; Kohonen et al., 2010; Ruokolainen et al., 2013; Grönroos et al., 2014; Sirts and Goldwater, 2013; Ahlberg et al., 2014), which combines a large amount of unannotated text with a small amount of annotated text, and potentially provides high accuracy at a low cost.

Silfverberg and Hulden (2017) observe that in the Universal Dependencies project (Nivre et al., 2017), each word is annotated with its lemma and features, but not segmented. They study the problem of finding a segmentation for these words. Our segmentation algorithm solves the same problem, and can be used in their setting. We improve on their solution by using a constraint solver to achieve high accuracy even with limited data, and by using a precise language for expressing affixes and rules, which allows us to use the resulting model to precisely analyse new words.

Luo et al. (2017) use Integer Linear Programming for unsupervised modelling of morphological families. They use ILP as a component of a larger training algorithm, so unlike our work they do not attempt to find a globally optimal solution. ILP has been used in other NLP applications outside of morphology (Berant et al., 2011; Roth and Yih, 2005; Clarke and Lapata, 2008).

3 Morphological segmentation

The input to our tool is a list of words, each annotated with its lemma and morphosyntactic features. From now on, we model the lemma as simply another feature; for example, the word "women" has the features Nom, Pl and woman.

The goal of this section is to divide each word into segments, and assign each feature of the word to one of those segments. A segment may consist of multiple discontinuous pieces, or be empty. In the segmentation on page 1, for example, the word women is segmented by assigning Pl the segment e, Nom the null segment, and woman the discontinuous segment wom*n. In general, the segment assigned to the lemma feature represents the stem of the word.

We say a segmentation is valid if for each word:

• Every letter is in exactly one segment.
• Each feature labels exactly one segment.
• Each segment is labelled with exactly one feature.[2]

[2] This is an unrealistic assumption, but also fairly harmless: when a morpheme marks (e.g.) genitive plural, it will be assigned one of those features, but the inferred rules will show that it only occurs together with the other feature.

There are many valid segmentations, some good, some bad. We consider a segmentation good if it is parsimonious: each morpheme should be marked by as few features as possible. Note that different forms of a word may be assigned different stems (e.g. go and went), but parsimony dictates that we share stems wherever reasonable. Perhaps surprisingly, finding the most parsimonious segmentation is an NP-hard problem, by reduction from set cover.

3.1 Segmentation as a constraint problem

We solve the segmentation problem using zero-one linear programming (0/1 LP). A 0/1 LP problem consists of a set of variables (e.g. x, y, z, ...) and a set of linear inequalities over those variables (e.g. 2x + 3y ≥ z and x ≤ 2y + 3). Given such a problem, a 0/1 LP solver finds an assignment of values to the variables that makes all the inequalities hold, and where all variables have the value 0 or 1. A 0/1 LP problem also specifies a linear term whose value should be minimised (e.g. 4x + y − z), called the objective function. A 0/1 LP solver is guaranteed to find the solution that minimises the objective function, if a solution exists. The solver that we use is CPLEX (ILOG, 2009).

In this section we assume that we are given, in addition to the labelled words, a (possibly large) set of allowable segments for each feature, which we call candidate morphemes. How candidate morphemes are automatically generated is explained in section 3.2.

Our encoding uses the following variables. If m is a candidate morpheme of feature f, the variable S[f,m] should be 1 if m is used to mark f, and 0 otherwise. If f is a feature with a candidate morpheme m that occurs in word w, and P is the set of positions in w occupied by m, then the variable M[w,P,f,m] should be 1 if this occurrence of m marks feature f, and 0 otherwise.

Example. Suppose that we are given the problem of segmenting the Swedish word hästarna (the horses). For simplicity, suppose that we know that the stem of hästarna is häst, and that the possible affixes are ar and na. (For now, we ignore the fact that these should be suffixes; we deal with this in section 3.3.) This results in the following candidate morphemes:

  feature   candidate morphemes
  Pl        ar, na
  Def       ar, na
  häst      häst

The string ar appears at positions 5–6 in hästarna, and na appears at positions 7–8. Therefore, the encoding introduces the following variables: S[Pl,ar], S[Pl,na], S[Def,ar], S[Def,na], S[häst,häst], M[hästarna,5–6,Pl,ar], M[hästarna,5–6,Def,ar], M[hästarna,7–8,Pl,na], M[hästarna,7–8,Def,na], M[hästarna,1–4,häst,häst].

We then generate the following constraints:

• If a morpheme is used to mark a feature in a given word, it must be a genuine morpheme of that feature. For each M-variable, if the M-variable is 1 then the corresponding S-variable must also be 1:

    S[f,m] ≥ M[w,P,f,m]

• Each position of each word must be in exactly one segment. For each position in each word, there must be exactly one M-variable that contains that position and whose value is 1. Thus for each position p of each word w we generate the constraint

    Σ M[w,P,f,m] = 1,

  where the sum ranges over f ∈ features of w, m ∈ candidate morphemes of f, and P ∈ occurrences of m in w such that p ∈ P.

• In each word, each feature must be mapped to exactly one morpheme. For each word w and feature f of w, exactly one M-variable must be 1:

    Σ M[w,P,f,m] = 1,

  where the sum ranges over m ∈ candidate morphemes of f and P ∈ occurrences of m in w.

Example. For the above example, the first rule generates the following constraints, which force an S-variable to 1 when one of its M-variables is 1:

  S[häst,häst] ≥ M[hästarna,1–4,häst,häst]
  S[Pl,ar]     ≥ M[hästarna,5–6,Pl,ar]
  S[Pl,na]     ≥ M[hästarna,7–8,Pl,na]
  S[Def,ar]    ≥ M[hästarna,5–6,Def,ar]
  S[Def,na]    ≥ M[hästarna,7–8,Def,na]

The second rule generates the following constraints, since letters 1 to 4 can only be covered by häst, letters 5 to 6 by ar (either as Pl or Def), and letters 7 to 8 by na:

  M[hästarna,1–4,häst,häst] = 1
  M[hästarna,5–6,Pl,ar] + M[hästarna,5–6,Def,ar] = 1
  M[hästarna,7–8,Pl,na] + M[hästarna,7–8,Def,na] = 1

The third rule generates the following constraints, stating that häst, Pl and Def must be marked by exactly one morpheme each:

  M[hästarna,5–6,Pl,ar] + M[hästarna,7–8,Pl,na] = 1
  M[hästarna,5–6,Def,ar] + M[hästarna,7–8,Def,na] = 1
  M[hästarna,1–4,häst,häst] = 1

This set of constraints has two solutions:

1. One where Pl is assigned to ar and Def to na. In this solution, the following variables are 1 and the rest are 0: S[häst,häst], S[Pl,ar], S[Def,na], M[hästarna,1–4,häst,häst], M[hästarna,5–6,Pl,ar], M[hästarna,7–8,Def,na].

2. One where Def is assigned to ar and Pl to na.

In general, any valid segmentation of the input words is a solution to the constraint problem. To make the constraint solver find the best segmentation, we also supply an objective function to be minimised. In our case, we choose to minimise the total number of morpheme-feature pairs used. To achieve this, we supply the objective function

  Σ S[f,m],

where the sum ranges over f ∈ features and m ∈ candidates of f.

Example. Suppose that we add to our earlier example the word hundars (dogs'). Its stem is hund and its other features are Gen and Pl. We also add s to the candidate morphemes for Pl, Def and Gen. The constraint solver finds the solution that minimises the value of the objective function, which in this case means assigning ar to Pl, s to Gen and na to Def. The objective function's value is then 3 (S[Pl,ar] + S[Gen,s] + S[Def,na]). This is the correct segmentation; the wrong segmentations are rejected because they have more feature-morpheme pairs and thus make the objective function larger.
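To make the encoding concrete, the following self-contained sketch solves the tiny häst/hund instance above by brute-force enumeration instead of calling an LP solver such as CPLEX (feasible only because the toy instance has four candidate assignments). The data layout is our own simplification, not the tool's: stems are left out because their assignment is forced, so each word just lists the features that need an affix and the affix slots the word contains.

```python
from itertools import permutations, product

# Toy instance from the example above (our own simplified layout).
words = {
    "hästarna": (("Pl", "Def"), ("ar", "na")),
    "hundars":  (("Pl", "Gen"), ("ar", "s")),
}

best_cost, best_pairs = None, None
# A valid segmentation assigns each word's features to its affix slots
# one-to-one; we enumerate all such assignments instead of solving an LP.
for choice in product(*(permutations(slots) for _, slots in words.values())):
    # The distinct (feature, morpheme) pairs are exactly the S-variables
    # set to 1; their count is the objective function being minimised.
    pairs = set()
    for (features, _), slots in zip(words.values(), choice):
        pairs |= set(zip(features, slots))
    if best_cost is None or len(pairs) < best_cost:
        best_cost, best_pairs = len(pairs), pairs

print(best_cost, sorted(best_pairs))
# 3 [('Def', 'na'), ('Gen', 's'), ('Pl', 'ar')]
```

As in the worked example, the unique minimum has cost 3 and assigns ar to Pl, s to Gen and na to Def; every other assignment uses four feature-morpheme pairs.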
3.2 Choosing the candidate morphemes: the naive approach

The constraint solver requires as input a set of candidate morphemes for each feature. Since the problem formulation requires that a morpheme or stem of a word in the dictionary must be a subsequence of that word, one option is to simply let the candidate morphemes of a feature f include all subsequences of all words that are annotated with feature f. This guarantees the optimal solution with respect to the given constraints. We have found two main problems with this approach.

1. While it works very well for small inputs (around 20 words), the constraint problem quickly becomes infeasible with larger sets of data, especially if many features are involved.

2. It does not consider the position of the morpheme in the word: the suffix -s is a plural marker in English, but the infix -s- is not.

We solve Problem 1 with an algorithm that guesses an approximate stem and then refines the guess. The algorithm is described in section 3.4. To solve Problem 2, we now introduce morpheme patterns, which restrict the way in which morphemes can be applied: in this case, a morpheme that has been observed only as a suffix in the data should only occur to the right of the stem.

3.3 Morpheme patterns

If we do not distinguish prefixes, suffixes and infixes, we cannot know whether the word seashells should be segmented as seashell|s or s|eashells or even sea|s|hells. Morpheme patterns allow us to make this distinction. A typical morpheme pattern is +s∗, which represents the suffix s. Morpheme patterns act as jigsaw pieces that make sure each morpheme is placed in its appropriate position in relation to the stem. By using morpheme patterns, we restrict the way morphemes can be combined and obtain a more precise segmentation. The purpose of this section is to formalise what + and ∗ in patterns mean, and to extend the segmentation algorithm to respect the meaning of the patterns. Both stems and affixes are described using morpheme patterns, but their semantics are slightly different.

Stem Patterns. For stems, a ∗ symbol marks a position where an affix can (but does not have to) be inserted. For example, m∗n∗ is the stem of man, where an a can be inserted in the infix position to make it singular, or an e to make it plural. In the suffix position, we can have the null morpheme, or add an affix such as 's. To accommodate word forms with multiple tokens, such as the German word fängt an (begins), with standard form anfangen, stems are represented by a set of patterns. The patterns of this set can be combined in any order, with or without a space. {f∗ng∗, an} is thus a stem of both word forms.

Affix Patterns. Affix patterns include two special symbols, + and ∗. For an affix pattern to match a word, you must be able to obtain the word by replacing each + and ∗ with an appropriate string. For example, +s∗ matches dogs' by replacing + with dog and ∗ by an apostrophe. But there are two important restrictions:

• Each + must be replaced by a string that contains some part of the stem.
• Each ∗ must be replaced by a string that does not contain any part of the stem.

In effect, the + symbol determines where the stem must be placed in relation to the affix, while the ∗ symbol allows other affixes to be attached in its place. For example, the plural morpheme +s∗ in English must be placed after (but not necessarily directly after) a stem. Likewise, the genitive morpheme +'∗ must have the stem entirely on its left side. The two morphemes can be combined into +s'∗, and be placed after the stem horse∗, to produce the word horses'. The morpheme +ea+ must be placed where it has the stem on both sides, thus making it an infix. Together with the stem pattern br∗k∗, it produces the word break. Similarly, the morpheme +o+en∗ together with br∗k∗ produces the word broken. An affix pattern can in theory have any number of +-symbols.

Extra constraints. In order to make the constraint solver follow the semantics of stem patterns and affix patterns, we must for each word add extra constraints for the stems and patterns that are incompatible. An affix and stem can both be chosen for a word only if the affix is in an appropriate position in relation to the stem. Thus, for all words w and all pairs of M-variables M[w,P1,f1,m1], M[w,P2,f2,m2], we check if m1 is a stem pattern, m2 is an affix pattern and their positions P1 and P2 are incompatible. If so, we add a constraint that only one of them can be chosen:

  M[w,P1,f1,m1] + M[w,P2,f2,m2] ≤ 1

For example, supposing we have +e+ as a candidate morpheme for plural, +s' as a candidate morpheme of genitive, and bus∗ as a candidate stem of buses', the constraint will not accept bus|e|s' as a segmentation, since +e+ is required to be an infix of the stem.

3.4 The Algorithm

Considering all possible subsequences of all words yields an unmanageable number of variables for the constraint solver. We therefore need to identify the relevant subsequences, which is done in several steps.
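The affix-pattern semantics of section 3.3 can be illustrated (this is our own sketch, not the paper's implementation) by a small recursive matcher that checks the two restrictions above. ASCII + and * stand in for the paper's + and ∗, and the stem is represented, by our own choice, as the set of character positions it occupies in the word:

```python
def matches(pattern, word, stem_positions):
    """Check whether an affix pattern such as '+s*' can produce `word`:
    '+' must consume at least one stem letter, '*' must consume none,
    and any other character must match literally outside the stem."""
    def go(p, i):
        if not p:
            return i == len(word)
        head, rest = p[0], p[1:]
        if head == "+":
            # Try every non-empty span starting at i that overlaps the stem.
            return any(go(rest, j) for j in range(i + 1, len(word) + 1)
                       if any(k in stem_positions for k in range(i, j)))
        if head == "*":
            # Try every (possibly empty) span that avoids the stem.
            return any(go(rest, j) for j in range(i, len(word) + 1)
                       if not any(k in stem_positions for k in range(i, j)))
        return (i < len(word) and word[i] == head
                and i not in stem_positions and go(rest, i + 1))
    return go(pattern, 0)

# dogs': stem "dog" at positions 0-2, so +s* places the s after the stem.
print(matches("+s*", "dogs'", {0, 1, 2}))    # True
print(matches("+s*", "sdog", {1, 2, 3}))     # False: s sits before the stem
# women: discontinuous stem wom*n, so +e+ is a genuine infix.
print(matches("+e+", "women", {0, 1, 2, 4})) # True
```

The tool itself enforces the same idea through the extra ≤ 1 constraints above rather than by matching at solve time; this sketch only shows what "compatible position" means.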

1. As a first step, we approximate the stem as the longest common subsequence of the inflected word and the standard form:

     standard   inflected   features
     woman      women       Pl;Nom
     baby       babies'     Pl;Gen
     dog        dogs'       Pl;Gen
     lorry      lorry       Sg;Nom

2. By removing the approximated stems, we simplify the problem to segmenting the non-stem part of each word. The resulting problem includes many duplicates, as well as having shorter strings to segment, making it feasible to naively consider every subsequence as a candidate for each feature listed with the word. The result is as follows.

     women     e    → e|∅    (Pl, Nom)
     woman     a    → a|∅    (Sg, Nom)
     babies'   ies' → ies|'  (Pl, Gen)
     baby      y    → y|∅    (Sg, Nom)
     dogs'     s'   → s|'    (Pl, Gen)
     dog            → ∅      (Sg, Nom)
     lorry          → ∅      (Sg, Nom)

   We simplify the problem further, by automatically giving the null morpheme to features that are shared between the inflected form and the standard form. Since we do not allow any segment to be explained by multiple features, we can assume that the word difference is unrelated to the features that are shared between the two word forms. As an optimisation, when a word is annotated with a single non-shared feature, that feature is automatically paired with the entire non-stem part, and no subsequences are generated.

   The approximated stem together with the segmentation of the remainder of the word makes a first approximation of a segmentation of the entire word. However, choosing the longest common subsequence as the stem does not always result in the best segmentation. In our example above, the -y is dropped in the stem of baby, but is included in the stem of lorry. A more consistent choice would be to drop it for lorry too, which we achieve in the next step.

3. Taking the morphemes chosen by the constraint solver in step 2, we generate additional stem candidates by removing from each word each possible choice of morpheme for the word's features. For example, in the segmentation above, we got +y∗ as a morpheme of the Sg feature. Therefore, for any word with the Sg feature that ends in y, we generate a candidate stem where the -y suffix is dropped. For lorry, we get the candidate stem lorr∗ in addition to the approximated stem lorry from step 1. For baby, in addition to the suffix +y, we also consider the infix +a+ from wom∗n in step 2 as a possibility for the singular morpheme. The candidate stems thus become bab∗ and b∗by.

4. Using the morphemes computed in step 1 and the stem candidates of step 3, we re-run the constraint solver and let it find the optimal choice of stems and morphemes. The chosen stems and morphemes are decoded from the 1-valued S-variables, while the segmentation can be decoded from the 1-valued M-variables.

4 Finding morphological rules

We have now discovered which morphemes mark each feature. This section shows how to find inflections that change one morpheme to another, and morphemes that mark a combination of features.
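Step 1 of the algorithm in section 3.4 can be sketched directly. This toy function (our own code, not the tool's) recovers the approximate stems quoted there, including the discontinuous wom∗n stem, which appears as the subsequence "womn":

```python
from functools import lru_cache

def approx_stem(standard, inflected):
    """Step 1: approximate the stem as the longest common subsequence
    of the standard and inflected forms (letters may be discontinuous)."""
    @lru_cache(maxsize=None)
    def lcs(i, j):
        if i == len(standard) or j == len(inflected):
            return ""
        if standard[i] == inflected[j]:
            return standard[i] + lcs(i + 1, j + 1)
        a, b = lcs(i + 1, j), lcs(i, j + 1)
        return a if len(a) >= len(b) else b
    return lcs(0, 0)

print(approx_stem("woman", "women"))   # womn  (the discontinuous wom*n)
print(approx_stem("baby", "babies'"))  # bab   (the -y is dropped)
print(approx_stem("dog", "dogs'"))     # dog
```

As the text notes, this first approximation keeps the -y of lorry (lorry vs. lorry gives the whole word), which is why step 3 later proposes lorr∗ as an alternative stem.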
4.1 Function features

In order to detect inflection rules that involve replacing one morpheme with another, such as the change of the suffix +y∗ into +ie∗ in English plural, we introduce the concept of function features. Without function features, we might find all of +s∗, +es∗ and +ies∗ as English plural morphemes. By adding function features, we can specify that +s∗ is generally used as the pluralisation morpheme, and when +ies∗ and +es∗ are used instead.

Function features are a special synthetic feature automatically added to some input words. They are added once we have segmented the input, and only for words where the standard and inflected form have the same stem. Let us take as an example the word babies, its standard form baby, and its stem bab∗. We first remove the stem from the standard form to get +y∗. Our idea is that, since +y∗ is a suffix, we will add the feature From Suffix y to the set of features of babies. We process the whole input this way, and then re-run the segmentation. Assuming there are several words in the input that share the same paradigm, the constraint solver will map +s∗ to Plural (because the pair (+s∗, Plural) is commonly occurring and shared by other words), and +ie∗ to the From Suffix y feature. This segmentation captures the fact that words ending in y often change to ie in plural. We can picture this process as bab(y → ie)|s, with Pl marking the s.

In the same way, the word-lemma pairs men - man, wives - wife and mice - mouse result in the stem changes +a+ → +e+, +f+ → +v+ and +ous+ → +ic+, after we synthesise the respective function features From Infix a, From Infix f and From Infix ous.

An inflection rule may also be specific to a part of the stem, such as the doubling of the letter g in big - bigger. To cover cases involving the last or first letter of the stem, we synthesise the feature AddTo Suffix x, where x is the last letter of the stem, and AddTo Prefix x, where x is the first letter of the stem. As an example, bigger, with the stem big∗, is given the additional feature AddTo Suffix g. The morpheme +er∗ is likely to be mapped to the comparative feature (due to its commonness in words in comparative form), while the remaining +g∗ is mapped to the AddTo Suffix g feature: bi(g → gg)|er, with Comp marking the er.

We also capture the phenomenon where extra letters are inserted when adding an affix, such as the insertion of an extra g between the past tense marker ge and a stem beginning with the letter e in German. To do so, we identify the first and last letters of the stem and add them as synthesised features. In this case, this results in the extra feature AddTo Prefix e, and the segmentation of gegessen (eaten) becomes ge|(e → ge)ss|en.

This algorithm is not language-specific: it works for any language where the concepts of prefix, infix, suffix, first and final letter affect inflection.

4.2 Morphemes with multiple features

Sometimes, a morpheme can be linked to more than one feature. For example, +na∗ in Swedish is a morpheme of definite form, but it occurs only in plural. To find such links we post-process the result returned by the constraint solver. For each morpheme m of each feature f, we collect all words whose segmentation uses morpheme m for feature f. The intersection of all features of all such word entries reveals what features always occur together with the combination of m and f.

5 Experimental Results

We evaluated our tool on four different problems: English nouns, Swedish nouns, Dutch verbs, and the SIGMORPHON 2018 shared task of morphological reinflection (Cotterell et al., 2018).

English nouns. We tested our tool on 400 randomly selected English nouns from the GF resource grammar library (Ranta, 2009). The tool took 3.9s to run. Fig. 1 shows the morphemes chosen before and after the addition of function features.

From the test results, we randomly select 20 words to demonstrate their segmentations. The first segmentation, based on approximated stems (step 3 of the algorithm), is presented in Fig. 2. Step 4 of the algorithm changes just one entry: the suffix y has been dropped from the stem of sky. None of the 20 word entries involves a function feature that was assigned to a non-null morpheme.

First approximation of morphemes:

  feature   morphemes
  Gen       ∗ +'s∗ +'∗ +e+'s∗ +v+'∗
  Nom       ∗
  Pl        ∗ +s∗ +ies∗ +es∗ +e+ +ren∗ +v+s∗ +ic+
  Sg        ∗ +y∗ +a+ +f∗ +f+ +oo+ +ous+

Morphemes, including function features:

  feature         morphemes
  Gen             ∗ +s' +'s
  Nom             ∗ +ous+
  Pl              ∗ +s∗
  Sg              ∗ +a+ +oo+ +y∗ +f∗ +f+
  Infix a         ∗ +e+
  Infix f         ∗ +v+
  Infix oo        ∗ +ee+ +oo+
  Infix ous       ∗ +ic+
  Suffix f        ∗ +ve∗ +f∗
  Suffix y        ∗ +ie∗ +y∗
  AddToSuffix d   ∗ +ren∗
  AddToSuffix h   ∗ +e∗

Figure 1: Chosen morphemes after each step

   1  rivers'       river       +s'/Gen,Pl
   2  breasts       breast      +s/Pl /Nom
   3  river's       river       +'s/Gen /Sg
   4  windows'      window      +s'/Gen,Pl
   5  television's  television  +'s/Gen /Sg
   6  country       countr      +y/Sg /Nom
   7  languages'    language    +s'/Gen,Pl
   8  fire          fire        /Sg,Nom
   9  number        number      /Sg,Nom
  10  ceiling       ceiling     /Sg,Nom
  11  question's    question    +'s/Gen /Sg
  12  song          song        /Sg,Nom
  13  airplane's    airplane    +'s/Gen /Sg
  14  doors'        door        +s'/Gen,Pl
  15  fires         fire        +s/Pl /Nom
  16  water         water       /Sg,Nom
  17  arts'         art         +s'/Gen,Pl
  18  flowers'      flower      +s'/Gen,Pl
  19  sky's         sky         +'s/Gen /Sg
  20  ear           ear         /Sg,Nom

Figure 2: The segmentation of the 20 test words, based on the test data of 400 words

Combined morphemes. Fig. 3 lists the morphemes with multiple features, as described in section 4.2. The list of combined features nicely shows many of the inflection rules appearing in the data.

  feature / morpheme   combines with
  +a+ → +e+            Pl       (policeman → policemen and 1 other(s))
  +f+ → +v+            Pl       (wife → wives)
  +oo+ → +ee+          Pl,Gen   (foot → feet's)
  +ous+ → +ic+         Pl,Nom   (louse → lice)
  Pl: +s'∗             Gen      (doctor → doctors' and 91 other(s))
  +d → +d∗ren          Pl       (child → children)
  +h → +h∗e            Pl       (church → churches and 1 other(s))
  +f → +ve∗            Pl,Gen   (leaf → leaves')
  +y → +ie∗            Pl       (country → countries and 7 other(s))

Figure 3: The combined features, where the feature and morpheme of the first column occur only in combination with the features of the second column

Reducing the test data. We repeat the experiment after reducing the test data to include only the 20 randomly selected words. After step 2, the segmentation based on 20 words is identical to the segmentation based on 400 words, with just one exception: the stem of country includes the suffix y. After step 3 and step 4, the segmentation is identical to the one based on 400 words.

Swedish nouns and Dutch verbs. We also tested our method on a set of Swedish nouns and Dutch verbs. For Swedish nouns, our method works very well. The precision is 100% based on a set of 50 words, and 94% based on a set of 250 words. The erroneous results on the bigger data were because the algorithm noticed a vowel change in certain words and applied it universally, causing for example the stem of grad (degree) to wrongly become gr∗d∗ instead of grad∗, because of other words in which a becomes ä in the plural. For Dutch verbs, the precision was 80% based on 50 words, and 74% based on 250 words. The errors made were similar to those on the Swedish test data.
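The post-processing of section 4.2 amounts to a set intersection over the words that use a given feature-morpheme pair. A minimal sketch, with a toy data format of our own invention (each word maps to its feature set and its feature-to-morpheme assignment):

```python
def combined_features(segmentations, feature, morpheme):
    """Section 4.2 post-processing: intersect the feature sets of all
    words whose segmentation uses `morpheme` to mark `feature`,
    revealing features that always co-occur with the pair."""
    cooccurring = None
    for feats, assignment in segmentations.values():
        if assignment.get(feature) == morpheme:
            others = feats - {feature}
            cooccurring = others if cooccurring is None else cooccurring & others
    return cooccurring or set()

# Toy Swedish data: +na marks Def, and happens to occur only in Pl words.
data = {
    "hästarna": ({"Pl", "Def"}, {"Pl": "ar", "Def": "na"}),
    "hundarna": ({"Pl", "Def"}, {"Pl": "ar", "Def": "na"}),
}
print(combined_features(data, "Def", "na"))  # {'Pl'}
```

On this toy input the pair (Def, na) always co-occurs with Pl, mirroring the +na∗ example in the text and the rows of Fig. 3.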

Comparison with earlier work We compared our results on Swedish with those of Silfverberg same dataset (in particular, ours only includes and Hulden (2017), although they do not use the nouns). Our precision of 94% far exceeds their precision of 62%.3 To find out why, we looked the feature From Suffix suff. In step 2, for each into what sort of errors both tools made. As men- candidate stem, we collect the entries of the train- tioned above, our tool made one class of errors, ing data that match the target features (including inferring a vowel change where none was needed, the function features collected in step 1). Out of but produced a plausible segmentation. those, we pick the word whose stem best matches Their tool found many implausible segmenta- the source form. In step 3, we take this word, and tions; for example, inkomst (income) was some- replace its stem with the stem of the input word. times segmented into i|nkomst, and pension into p|ension. Furthermore, it segmented words incon- Language Our system Mean Best sistently: some occurrences of inkomst were (cor- 25.6 14.77 45.2 rectly) left as one segment. This means that the tool (2.95) (6.63) (1.77) has not found the simplest explanation of the in- Galician 49.0 31.93 61.1 put data: its explanation requires more morphemes (1.42) (2.4) (0.72) than necessary, such as i, p and nkomst. We avoid Greek 27.9 15.76 32.3 this problem since the constraint solver guarantees (3.02) (4.89) (1.83) to find the globally minimal solution to the input Karelian 32.0 47.93 94.0 constraints. (1.4) (1.53) (0.1) Secondly, their tool does not restrict where in Russian 43.8 26.86 53.5 a word a morpheme can occur. For example, the (1.41) (3.64) (1.07) letter a can mark common nouns, such as flicka 43.9 25.76 58.0 (girl). It only occurs as a suffix, but their tool uses (1.55) (2.99) (0.93) it as a prefix to segment arbetsinkomst (income Slovene 35.9 24.80 58.0 from work) into a|rbetsinkomst. 
By distinguishing (1.15) (2.7) (0.73) different kinds of affixes, we avoid this problem. Tatar 64.0 48.89 90.0 (0.44) (2.15) (0.14) Morphological Reinflection We use an adapted Telugu 66.0 67.41 96.0 version of our tool to solve the SIGMORPHON (0.98) (1.29) (0.06) 2018 shared task of morphological (type level) re- West Frisian 43.0 32.52 56.0 inflection (Cotterell et al., 2018). Given a lemma (1.86) (1.85) (1.01) and set of morphological features, the task is to gen- erate a target inflected form. For example, given the Figure 4: Results of reinflection. The first line gives source form release and target features PTCP and the average accuracy and the second line the aver- PRS, the task is to predict the target form releasing. age Levenshtein distance from the right answer. Our approach requires a set of labelled training data, which we segment to obtain a list of affixes We evaluate our system on the low training data, and their associated features. To predict the target the smallest of the three available sizes, which con- inflected form of a word, we: 1) find the stem of sists of 100 words for each of the 103 languages. the word, 2) find a word in the training data whose The data includes a mixture of nouns, adjectives features match the target features, and 3) replace and verbs. 17 of the languages were excluded from the stem of that word with that of the input word. the evaluation, because they involved a large num- In more detail, in step 1, we check if the word ber of features, resulting in a too long execution contains any affixes that are associated with a fea- time. On the remaining 86 languages, our approach ture belonging to the lemma. We remove any such performs, with a few exceptions, in the better half, affix from the word. There may be a choice of and often in the better third of the 27 submitted affixes for each feature so this results in a set of systems. For English, Czech, Greek, Livonian and candidate stems. 
When considering a candidate stem, we also add the appropriate function features to the target feature list; for example, if the stem drops a suffix suff from the source form, we add the corresponding function feature.

Our reinflection algorithm is very simple, but still competes with state-of-the-art systems, indicating that the underlying morphological analysis provided by our tool is of good quality.

[Footnote: 62% is the figure for unlabelled morphemes. The figures given in the paper for labelled morphemes are unfortunately erroneous.]

6 Conclusion and Future Work

We have presented a method for morphological segmentation based on constraint solving. Our evaluation shows that it works well even given little training data, producing almost the same segmentation from 20 English words as it does from 400 words. It produces a morphological analysis precise enough that a simple reinflection algorithm built on it can compete with state-of-the-art systems. The reasons for this good precision are (a) the pattern semantics, which allows the solver to make precise distinctions between different kinds of morphemes, such as infix and suffix; and (b) the use of function features to express replacement rules.

The paper also demonstrates that constraint solving is a useful alternative to machine learning for segmentation, particularly for low-resource languages where one must make the most of a small set of data. We hope that our paper spurs further research in this direction. There are many possible refinements to our technique; some of our ideas for future work are listed below.

More refined semantics for morphemes
With a more refined semantics of morphemes, we can guide the constraint solver to pick a segmentation that follows the observed data in a more precise way. This can be done by restricting how a morpheme can be placed in relation to other morphemes. For example, the definite morpheme +na∗ in Swedish always follows a plural morpheme (+er∗, +or∗ or +ar∗), and never occurs directly after a stem. The morpheme semantics could be improved to allow these kinds of restrictions, and the algorithm refined to automatically infer them from the data.

Tweaking the objective function
The objective function can be weighted to give higher or lower cost to stems or morphemes related to specific kinds of features. For example, by multiplying each term S_{f,m} in the objective function by the length of the morpheme m, we would recover the Minimum Description Length principle. It is left as future work to investigate how the choice of cost function affects the result in different settings.

Distinguishing between rules and exceptions
Once a morpheme is used in a segmentation, the algorithm is sometimes too eager to use the same morpheme elsewhere. This means that adding more data sometimes leads to worse results, and errors in the input can cause unrelated words to be segmented wrongly. We plan to investigate using statistical methods to distinguish between morphological rules and exceptions; exceptions should not be applied everywhere, but rules should.

Improved function features
Currently, we consider only the first and last letter of the stem for stem additions, as described in section 4. We are currently investigating generalising this idea by using the constraint solver to find the relevant segment of the stem. This would allow us to detect a wider range of morphological changes. As an example, suppose we would like to find out which kinds of English words form their plural with +es. We could take all such words that occur in the input, and give the following segmentation problem to the constraint solver:

    standard    features
    bus         bus; Suffix es
    dress       dress; Suffix es
    box         box; Suffix es
    suffix      suffix; Suffix es

The solution returned indicates which letter patterns are associated with the plural form +es:

    bu|s        Suffix es
    dres|s      Suffix es
    bo|x        Suffix es
    suffi|x     Suffix es

We could then introduce new features AddTo Suffix s and AddTo Suffix x to the original problem, whereupon the algorithm of Section 4 would find the correct function features. This method would be able to invent function features from an arbitrary substring of the given words.

Scalability
For most languages our tool works comfortably up to a thousand words or more, but for languages with many morphosyntactic features (such as Basque) it can struggle to deal with a hundred words. We would like to see if, by tackling features in smaller groups, it is possible to scale the approach to large inputs.

References

Malin Ahlberg, Markus Forsberg, and Mans Hulden. 2014. Semi-supervised learning of morphological paradigms and lexicons. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, 26–30 April 2014, pages 569–578.

Jonathan Berant, Ido Dagan, and Jacob Goldberger. 2011. Global learning of typed entailment rules. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 610–619, Portland, Oregon, USA. Association for Computational Linguistics.

Oskar Kohonen, Sami Virpioja, and Krista Lagus. 2010. Semi-supervised learning of concatenative morphology. In Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology, SIGMORPHON '10, pages 78–86, Stroudsburg, PA, USA. Association for Computational Linguistics.

Jiaming Luo, Karthik Narasimhan, and Regina Barzilay. 2017. Unsupervised learning of morphological forests.
Transactions of the Association for Computational Linguistics, 5:353–364.

James Clarke and Mirella Lapata. 2008. Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research, 31:399–429.

Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sebastian Mielke, Garrett Nicolai, Miikka Silfverberg, et al. 2018. The CoNLL–SIGMORPHON 2018 shared task: Universal morphological reinflection. arXiv preprint arXiv:1810.07125.

Mathias Creutz and Krista Lagus. 2007. Unsupervised models for morpheme segmentation and morphology learning. ACM Trans. Speech Lang. Process., 4(1):3:1–3:34.

Christian Monson, Jaime Carbonell, Alon Lavie, and Lori Levin. 2007. ParaMor: Minimally supervised induction of paradigm structure and morphological analysis. In Proceedings of the Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, pages 117–125, Prague, Czech Republic. Association for Computational Linguistics.

Joakim Nivre, Željko Agić, Lars Ahrenberg, Maria Jesus Aranzabe, Masayuki Asahara, Aitziber Atutxa, et al. 2017. Universal Dependencies 2.0. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

John Goldsmith. 2001. Unsupervised learning of the morphology of a natural language. Comput. Linguist., 27(2):153–198.

Spence Green and John DeNero. 2012. A class-based agreement model for generating accurately inflected translations. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, ACL '12, pages 146–155, Stroudsburg, PA, USA. Association for Computational Linguistics.

Stig-Arne Grönroos, Sami Virpioja, Peter Smit, and Mikko Kurimo. 2014. Morfessor FlatCat: An HMM-based method for unsupervised and semi-supervised learning of morphology. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1177–1185, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.

Harald Hammarström and Lars Borin. 2011. Unsupervised learning of morphology. Computational Linguistics, 37(2):309–350.

Zellig S. Harris. 1955. From phoneme to morpheme. Language, 31(2):190–222.

IBM ILOG. 2009. IBM ILOG CPLEX V12.1: User's manual for CPLEX.

Mark Johnson. 2008. Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure. In Proceedings of ACL-08: HLT, pages 398–406, Columbus, Ohio. Association for Computational Linguistics.

Karthik Rajagopal Narasimhan, Damianos Karakos, Richard Schwartz, Stavros Tsakalidis, and Regina Barzilay. 2014. Morphological segmentation for keyword spotting.

Hoifung Poon, Colin Cherry, and Kristina Toutanova. 2009. Unsupervised morphological segmentation with log-linear models. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL '09, pages 209–217, Stroudsburg, PA, USA. Association for Computational Linguistics.

Aarne Ranta. 2009. The GF resource grammar library.

Dan Roth and Wen-tau Yih. 2005. Integer linear programming inference for conditional random fields. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pages 736–743, New York, NY, USA. ACM.

Teemu Ruokolainen, Oskar Kohonen, Sami Virpioja, and Mikko Kurimo. 2013. Supervised morphological segmentation in a low-resource learning setting using conditional random fields. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 29–37, Sofia, Bulgaria. Association for Computational Linguistics.

Teemu Ruokolainen, Oskar Kohonen, Kairit Sirts, Stig-Arne Grönroos, Mikko Kurimo, and Sami Virpioja. 2016. A comparative study of minimally supervised morphological segmentation. Comput. Linguist., 42(1):91–120.

Tarek Sakakini, Suma Bhat, and Pramod Viswanath. 2017. MORSE: Semantic-ally drive-n morpheme segment-er. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 552–561, Vancouver, Canada. Association for Computational Linguistics.

Patrick Schone and Daniel Jurafsky. 2000. Knowledge-free induction of morphology using latent semantic analysis. In Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop.

Miikka Silfverberg and Mans Hulden. 2017. Automatic morpheme segmentation and labeling in universal dependencies resources. In Proceedings of the NoDaLiDa Workshop on Universal Dependencies, UDW@NoDaLiDa 2017, Gothenburg, Sweden, May 22, 2017, pages 140–145. Association for Computational Linguistics.

Kairit Sirts and Sharon Goldwater. 2013. Minimally-supervised morphological segmentation using adaptor grammars. Transactions of the Association for Computational Linguistics, 1:255–266.

Radu Soricut and Franz Och. 2015. Unsupervised morphology induction using word embeddings. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1627–1637, Denver, Colorado. Association for Computational Linguistics.

Ville T. Turunen and Mikko Kurimo. 2008. Speech retrieval from unsegmented Finnish audio using statistical morpheme-like units for segmentation, recognition, and retrieval. ACM Trans. Speech Lang. Process., 8(1):1:1–1:25.

Ivan Vulić, Nikola Mrkšić, Roi Reichart, Diarmuid Ó Séaghdha, Steve Young, and Anna Korhonen. 2017. Morph-fitting: Fine-tuning word vector spaces with simple language-specific rules. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 56–68, Vancouver, Canada. Association for Computational Linguistics.