Morphological Analysis for the Maltese Language: The challenges of a hybrid system

Claudia Borg
Dept. A.I., Faculty of ICT
University of Malta
[email protected]

Albert Gatt
Institute of Linguistics
University of Malta
[email protected]

Abstract

Maltese is a morphologically rich language with a hybrid morphological system which features both concatenative and non-concatenative processes. This paper analyses the impact of this hybridity on the performance of machine learning techniques for morphological labelling and clustering. In particular, we analyse a dataset of morphologically related word clusters to evaluate the difference in results for concatenative and non-concatenative clusters. We also describe research carried out in morphological labelling, with a particular focus on the verb category. Two evaluations were carried out, one using an unseen dataset, and another using a gold standard dataset which was manually labelled. The gold standard dataset was split into concatenative and non-concatenative parts to analyse the difference in results between the two morphological systems.

1 Introduction

Maltese, the official language of the Maltese Islands and, since 2004, also an official European language, has a hybrid morphological system that evolved from an Arabic substratum, a Romance (Sicilian/Italian) superstratum and an English adstratum (Brincat, 2011). The Semitic influence is evident in the basic syntactic structure, with a highly productive non-Semitic component manifest in its lexis and morphology (Fabri, 2010; Borg and Azzopardi-Alexander, 1997; Fabri et al., 2014). Semitic morphological processes still account for a sizeable proportion of the lexicon and follow a non-concatenative, root-and-pattern strategy (or templatic morphology) similar to Arabic and Hebrew, with consonantal roots combined with a vowel melody and patterns to derive forms. By contrast, the Romance/English morphological component is concatenative (i.e. exclusively stem-and-affix based). Table 1 provides an example of these two systems, showing inflection and derivation for the words eżamina 'to examine', which takes a stem-based form, and gideb 'to lie', from the root √GDB, which is based on a templatic system. Table 2 gives an example of verbal inflection, which is affix-based and applies to lexemes arising from both concatenative and non-concatenative systems, the main difference being that the latter evinces frequent stem variation.

Table 1: Examples of inflection and derivation in the concatenative and non-concatenative systems

           Derivation                 Inflection
Concat.    eżamina 'examine'          eżaminatriċi, sg.f.
           eżaminatur 'examiner'      eżaminatur-i, pl.
Non-Con.   gideb 'lie' √GDB           giddieb-a, sg.f.
           giddieb 'liar'             giddib-in, pl.

Table 2: Verbal inflections for the concatenative and non-concatenative systems

           eżamina 'examine'    gideb √GDB 'lie'
1SG        n-eżamina            n-igdeb
2SG        t-eżamina            t-igdeb
3SGM       j-eżamina            j-igdeb
3SGF       t-eżamina            t-igdeb
1PL        n-eżamina-w          n-igdb-u
2PL        t-eżamina-w          t-igdb-u
3PL        j-eżamina-w          j-igdb-u

To date, there still is no complete morphological analyser for Maltese.
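The contrast between the two systems can be made concrete with a small sketch (illustrative only: the digit-template notation and the function names are our own devices, not an existing analyser's API). Templatic formation interdigitates root consonants with a vowel melody, while concatenative formation leaves the stem intact:

```python
def interdigitate(root, template):
    """Non-concatenative (templatic) formation: digits in the template
    index consonants of the root; the other characters are the vowel melody."""
    return "".join(root[int(c) - 1] if c.isdigit() else c for c in template)

def affix(stem, prefix="", suffix=""):
    """Concatenative formation: the stem survives whole inside the word."""
    return prefix + stem + suffix

# Root √GDB with two different templates (cf. Table 1):
assert interdigitate("gdb", "1i2e3") == "gideb"      # 'to lie'
assert interdigitate("gdb", "1i22ie3") == "giddieb"  # 'liar'

# Affixal inflection applies to both systems (cf. Tables 1 and 2):
assert affix("giddieb", suffix="a") == "giddieba"    # sg.f.
assert affix("eżamina", prefix="n") == "neżamina"    # 1SG, n-eżamina
```

Note how gideb and giddieb share only their consonants: this stem variation is exactly what makes the non-concatenative system hard for stem-based techniques.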

Proceedings of The Third Arabic Natural Language Processing Workshop (WANLP), pages 25–34, Valencia, Spain, April 3, 2017. © 2017 Association for Computational Linguistics

In a first attempt at a computational treatment of Maltese morphology, Farrugia (2008) used a neural network and focused solely on broken plurals (Schembri, 2006). The only work treating computational morphology for Maltese in general was by Borg and Gatt (2014), who used unsupervised techniques to group together morphologically related words. A theoretical analysis of the templatic verbs (Spagnol, 2011) was used by Camilleri (2013), who created a computational grammar for Maltese for the Resource Grammar Library (Ranta, 2011), with a particular focus on inflectional verbal morphology. The grammar produces the full paradigm of a verb on the basis of its root, which can consist of over 1,400 inflected forms per derived verbal form, of which traditional grammars usually list 10. This resource is known as Ġabra and is available online¹. Ġabra is, to date, the best computational resource available in terms of morphological information. It is limited in its focus to templatic morphology and restricted to the wordforms available in the database. A further resource is the lexicon and analyser provided as part of the Apertium open-source machine translation toolkit (Forcada et al., 2011). A subset of this lexicon has since been incorporated in the Ġabra database.

This paper presents work carried out for Maltese morphology, with a particular emphasis on the problem of hybridity in the morphological system. Morphological analysis is challenging for a language like Maltese due to the mixed morphological processes existing side by side. Although there are similarities between the two systems, as seen in verbal inflections, various differences among the subsystems exist which make a unified treatment challenging, including: (a) stem allomorphy, which occurs far more frequently with Semitic stems; (b) paradigmatic gaps, especially in the derivational system based on Semitic roots (Spagnol, 2011); and (c) the fact that morphological analysis for a hybrid system needs to pay attention both to stem-internal (templatic) processes and to phenomena occurring at the stem's edge (by affixation).

First, we analyse the results of the unsupervised clustering technique by Borg and Gatt (2014) applied to Maltese, with a particular focus on distinguishing the performance of the technique on the two different morphological systems. Second, we are interested in labelling words with their morphological properties. We view this as a classification problem, and treat complex morphological properties as separate features which can be classified in an optimal sequence to provide a final complex label. Once again, the focus of the analysis is on the hybridity of the language and whether a single technique is appropriate for a mixed morphology such as that found in Maltese.

2 Related Work

Computational morphology can be viewed as having three separate subtasks — segmentation, clustering related words, and labelling (see Hammarström and Borin (2011)). Various approaches are used for each of the tasks, ranging from rule-based techniques, such as finite state transducers for Arabic morphological analysis (Beesley, 1996; Habash et al., 2005), to various unsupervised, semi- or fully-supervised techniques which generally deal with one or two of the subtasks. For most of the techniques described, it is difficult to directly compare results due to differences in the data used and the evaluation setting itself. For instance, the results achieved by segmentation techniques are often evaluated in an information retrieval task.

The majority of works dealing with unsupervised morphology focus on English and assume that the morphological processes are concatenative (Hammarström and Borin, 2011). Goldsmith (2001) uses the minimum description length algorithm, which aims to represent a language in the most compact way possible by grouping together words that take on the same set of suffixes. In a similar vein, Creutz and Lagus (2005; 2007) use Maximum a Posteriori approaches to segment words from unannotated texts, and these models have become part of the baseline and standard evaluation in the Morpho Challenge series of competitions (Kurimo et al., 2010). Kohonen et al. (2010) extend this work by introducing semi- and fully-supervised approaches to model learning for segmentation. This is done by introducing a discriminative weighting scheme that gives preference to the segmentations within the labelled data.

Transitional probabilities are used to determine potential word boundaries (Keshava and Pitler, 2006; Dasgupta and Ng, 2007; Demberg, 2007).

¹ http://mlrs.research.um.edu.mt/resources/gabra/
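The transitional-probability idea can be sketched as follows (our own simplified illustration, not the cited authors' implementations): a boundary is proposed after any prefix that many different characters can follow, i.e. at trie nodes with a high branching factor.

```python
from collections import defaultdict

def branching_factors(words):
    """Map each word-internal prefix to the number of distinct characters
    that can follow it (the branching factor of the corresponding trie node)."""
    continuations = defaultdict(set)
    for w in words:
        for i in range(1, len(w)):
            continuations[w[:i]].add(w[i])
    return {p: len(s) for p, s in continuations.items()}

def propose_boundaries(word, factors, threshold=3):
    """Propose segmentation points after prefixes with a high branching factor."""
    return [i for i in range(1, len(word)) if factors.get(word[:i], 0) >= threshold]

words = ["walk", "walks", "walked", "walking", "talk", "talks", "talked"]
factors = branching_factors(words)
print(propose_boundaries("walked", factors))  # → [4], i.e. a boundary after 'walk'
```

On this toy vocabulary the prefix 'walk' can continue with s, e or i, so the stem/suffix boundary falls out directly; ranking the material after such boundaries yields a candidate affix list.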

The technique is very intuitive: it posits that the most likely place for a segmentation to take place is at nodes in the trie with a large branching factor. The result is a ranked list of affixes which can then be used to segment words.

Van den Bosch and Daelemans (1999) and Clark (2002; 2007) apply Memory-Based Learning to classify morphological labels. The latter work was tested on Arabic singular and broken plural pairs, with the algorithm learning how to associate an inflected form with its base form. Durrett and DeNero (2013) derive rules on the basis of the orthographic changes that take place in an inflection table (containing a paradigm). A log-linear model is then used to place a conditional distribution over all valid rules.

Poon et al. (2009) use a log-linear model for unsupervised morphological segmentation, which leverages overlapping features such as morphemes and their context. It incorporates exponential priors as a way of describing a language in an efficient and compact manner. Sirts and Goldwater (2013) proposed Adaptor Grammars (AGMorph), a nonparametric Bayesian modelling framework for minimally supervised learning of morphological segmentation. The model learns latent tree structures over the input of a corpus of strings. Narasimhan et al. (2015) also use a log-linear model, with morpheme- and word-level features, to predict morphological chains, improving upon the techniques of Poon et al. (2009) and Sirts and Goldwater (2013). A morphological chain is a sequence of words that starts from the base word, where at each level a new word is derived as a morphological variant through the process of affixation; the top 100 chains have an accuracy of 43%. The technique was also tested on an Arabic dataset, achieving an F-measure of 0.799. However, the system does not handle stem variation, since the pairing of words is done on the basis of the same orthographic stem, and therefore the result for Arabic is rather surprising. The technique is also lightly supervised, since it incorporates part-of-speech category to reinforce potential segmentations.

Schone and Jurafsky (2000; 2001) and Baroni et al. (2002) use both orthographic and semantic similarity to detect morphologically related word pairs, arguing that neither is sufficient on its own to determine a morphological relation. Yarowsky and Wicentowski (2000) use a combination of alignment models with the aim of pairing inflected words. However, this technique relies on part-of-speech, affix and stem information. Can and Manandhar (2012) create a hierarchical clustering of morphologically related words, using both affixes and stems to combine words in the same clusters. Ahlberg et al. (2014) produce inflection tables by obtaining generalisations over a small number of samples through a semi-supervised approach. The system takes a group of words and assumes that the similar elements shared by the different forms can be generalised over and are irrelevant for the inflection process.

For Semitic languages, a central issue in computational morphology is disambiguation between multiple possible analyses. Habash and Rambow (2005) learn classifiers to identify different morphological features, used specifically to improve part-of-speech tagging. Snyder and Barzilay (2008) tackle morphological segmentation for multiple languages in the Semitic family and English by creating a model that maps frequently occurring morphemes in different languages into a single abstract morpheme.

Due to the intrinsic differences in the problem of computational morphology between Semitic and English/Romance languages, it is difficult to directly compare results. Our interest in the present paper is more in the types of approaches taken, and particularly in seeing morphological labelling as a classification problem. Modelling different classifiers for specific morphological properties can be an appropriate approach for Maltese, since it allows the flexibility to focus on those properties where data is available.

3 Clustering words in a hybrid morphological system

The Maltese morphological system includes two subsystems, concatenative and non-concatenative. As seen in the previous section, most computational approaches deal with either Semitic morphology (as one would for Arabic or its varieties) or with a system based on stems and affixes (as in Italian). Therefore, we might expect that certain methods will perform differently depending on which component we look at. Indeed, overall accuracy figures may mask interesting differences among the different components.

The main motivation behind this analysis is that Maltese words of Semitic origin tend to have considerable stem variation (non-concatenative), whilst word formation from words of Romance/English origin would generally leave stems whole (concatenative)². Maltese provides an ideal scenario for this type of analysis due to its mixed morphology. Often, clustering techniques are either sensitive to a particular language, such as catering for weak consonants in Arabic (de Roeck and Al-Fares, 2000), or focus solely on English or Romance languages (Schone and Jurafsky, 2001; Yarowsky and Wicentowski, 2000; Baroni et al., 2002), where stem variation is not widespread.

The analysis below uses a dataset of clusters produced by Borg and Gatt (2014), who employed an unsupervised technique using several interim steps to cluster words together. First, potential affixes are identified using transitional probabilities, in a similar fashion to Keshava and Pitler (2006) and Dasgupta and Ng (2007). Words are then clustered on the basis of common stems. Clusters are improved using measures of orthographic and semantic similarity, in a similar vein to Schone and Jurafsky (2001) and Baroni et al. (2002). Since no gold-standard lexical resource was available for Maltese, the authors evaluated the clusters using a crowd-sourcing strategy with non-expert native speakers; a separate, but smaller, set of clusters was evaluated by an expert group. In the evaluation, participants were presented with a cluster which had to be rated for its quality and corrected by removing any words which did not belong to the cluster. In this analysis, we focus on the experts' cluster dataset, which was roughly balanced between non-concatenative (NC) and concatenative (CON) clusters. There are 101 clusters in this dataset, 25 of which were evaluated by all 3 experts, and the remaining by one of the experts. Table 3 provides an overview of the 101 clusters in terms of their size.

Table 3: Comparison of non-concatenative and concatenative clusters in the expert group

Size                       NC        CON
<10                        53% (25)  26% (14)
10–19                      23% (11)  37% (20)
20–29                      13% (6)   15% (8)
30–39                      2% (1)    9% (5)
>40                        9% (4)    13% (7)
Total                      47        53
Evaluated by all experts   13        13
Evaluated by one expert    34        40

Immediately, it is possible to observe that concatenative clusters tend to be larger than non-concatenative clusters. This is mainly due to the issue of stem variation in the non-concatenative group, which gives rise to a lot of false negatives. It is also worth noting that part of the difficulty here is that the vowel patterns in the non-concatenative process are unpredictable. For example, qsim 'division' is formed from qasam 'to divide' √QSM, whilst ksur 'breakage' is formed from kiser 'to break' √KSR. Words are constructed around infixation of vowel melodies to form a stem, before inflection adds affixes. In the concatenative system there are some cases of allomorphy, but there will, in general, be an entire stem, or a substring thereof, that is recognisable.

3.1 Words removed from clusters

As an indicator of the quality of a cluster, the analysis looks at the number of words that experts removed from a cluster — indicating that the word does not belong to it. Table 4 gives the percentage of words removed from clusters, divided according to whether the morphological system involved is concatenative or non-concatenative. The percentage of clusters which were left intact by the experts was higher for the concatenative group (61%) when compared to the non-concatenative group (45%). The gap closes when considering the percentage of clusters which had a third or more of their words removed (non-concatenative at 25% and concatenative at 20%). However, the concatenative group also had clusters which had more than 80% of their words removed. This indicates that, although in general the clustering technique performs better in the concatenative case, there are cases where bad clusters are formed. The reason is usually that stems with overlapping substrings are mistakenly grouped together. One such cluster was that for ittra 'letter', which also got clustered with ittraduċi 'translate' and ittratat 'treated', clearly all morphologically unrelated words. These were clustered together because the system incorrectly identified ittra as a potential stem in all these words.

² Concatenative word formations would always involve a recognisable stem, though in some cases they may undergo minor variations as a result of allomorphy or allophony.
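The ittra failure mode is easy to reproduce with a deliberately naive version of stem-based grouping (a toy sketch, not the actual clustering pipeline, which additionally uses orthographic and semantic similarity):

```python
from collections import defaultdict

def cluster_by_stem(words, stem_len=5):
    """Naively group words that share their first `stem_len` characters,
    treating that shared prefix as a candidate stem."""
    clusters = defaultdict(list)
    for w in words:
        clusters[w[:stem_len]].append(w)
    return dict(clusters)

clusters = cluster_by_stem(["ittra", "ittraduċi", "ittratat"])
# All three words begin with the substring 'ittra', so the naive stem
# criterion merges the morphologically unrelated words into one cluster.
print(clusters)  # → {'ittra': ['ittra', 'ittraduċi', 'ittratat']}
```

The similarity measures used in the actual pipeline are meant to filter out exactly such overlaps, but as the expert evaluation shows, large clusters built on short shared substrings still slip through.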

Table 4: Number of words removed, split by concatenative and non-concatenative processes

By Percentage  NC        CON
0%             45% (33)  61% (49)
1–5%           1% (1)    1% (1)
5–10%          7% (5)    4% (3)
10–20%         5% (4)    11% (9)
20–30%         17% (12)  4% (3)
30–40%         8% (6)    4% (3)
40–60%         7% (5)    3% (2)
60–80%         10% (7)   9% (7)
over 80%       0% (0)    4% (3)

3.2 Quality ratings of clusters

Experts were asked to rate the quality of a cluster and, although this is a rather subjective opinion, the correlation between this judgement and the number of words removed was calculated using Pearson's correlation coefficient. The trends are consistent with the analysis in the previous subsection; Table 5 provides the breakdown of the quality ratings for clusters split between the two processes, and the correlation of the quality with the percentage of words removed. The non-concatenative clusters generally have lower quality ratings when compared to the concatenative clusters. But both groups show a strong correlation between the percentage of words removed and the quality rating, clearly indicating that the perception of a cluster's quality is related to the percentage of words removed.

Table 5: Quality ratings of clusters, correlated with the percentage of words removed

Quality       NC        CON
Very Good     17% (12)  28% (22)
Good          33% (24)  36% (29)
Medium        34% (25)  18% (15)
Bad           12% (9)   14% (11)
Very Bad      4% (3)    4% (3)
Correlation:  0.780     0.785

3.3 Hybridity in clustering

Clearly, there is a notable difference between the clustering of words arising from concatenative and from non-concatenative morphological processes. Both have their strengths and pitfalls, but neither of the two excels or stands out over the other. One of the problems with non-concatenative clusters was that of size. The initial clusters were formed on the basis of stems, and due to stem variation the non-concatenative clusters were rather small. Although the merging process allowed clusters to be put together to form larger clusters, the process was limited to a maximum of two merging operations. This might not have been sufficient for the small non-concatenative clusters. In fact, only 10% of the NC clusters contained 30 or more words, compared to 22% of the concatenative clusters. Limiting merging in this fashion may have resulted in a few missed opportunities. This is because there are likely to be many derived forms which are difficult to cluster initially due to stem allomorphy (arising from the fact that root-based derivation involves infixation, and in Maltese, vowel melodies are unpredictable). So there are possibly many clusters, all related to the same root.

The problem of size with concatenative clusters was on the other side of the scale. Although the majority of clusters were of average size, large clusters tended to include many false positives. In order to explore this problem further, one possibility would be to check whether there is a correlation between the size of a cluster and the percentage of words removed from it. It is possible that the unsupervised technique does not perform well when producing larger clusters, and if such a correlation exists, it would be possible to set an empirically determined threshold for cluster size.

Given the results achieved, it is realistic to state that the unsupervised clustering technique could be further improved using the evaluated clusters as a development set to better determine the thresholds in the metrics proposed above. This improvement would impact concatenative and non-concatenative clusters equally. In general, the clustering technique does work slightly better for the concatenative clusters, and this is surely due to the clustering of words on the basis of their stems. This is reflected in the result that 61% of the concatenative clusters had no words removed, compared to 45% of the non-concatenative clusters. However, a larger number of concatenative clusters had a large percentage of words removed.
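The size-versus-removal check proposed above can be sketched directly with Pearson's correlation coefficient, the same statistic used for the quality ratings (the numbers below are invented toy values, not the paper's data):

```python
def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical clusters: (size, % of words the experts removed).
sizes = [8, 12, 25, 40, 55]
removed = [0, 5, 10, 35, 60]
r = pearson(sizes, removed)
# A strongly positive r would justify an empirically set cap on cluster size.
```

With real evaluation data in place of the toy lists, a high r would support discarding or re-splitting clusters above a size threshold before the merging step.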

Indeed, if the quality ratings were considered as an indicator of how the technique performs on the non-concatenative vs the concatenative clusters, the judgement would be medium to good for the non-concatenative and good for the concatenative clusters. Thus the performance of the two groups is sufficiently close in terms of quality to suggest that a single unsupervised technique can be applied to Maltese, without differentiating between the morphological subsystems.

4 Classifying morphological properties

In our approach, morphological labelling is viewed as a classification problem, with each morphological property seen as a feature which can be classified. Thus, the analysis of a given word can be seen as a sequence of classification problems, each assigning a label to the word which reflects one of its morphological properties. We refer to such a sequence of classifiers as a 'cascade'.

In this paper, we focus in particular on the verb category, which is morphologically one of the richest categories in Maltese. The main question is to identify whether there is a difference in the performance of the classification system when applied to lexemes formed through concatenative or non-concatenative processes. Our primary focus is on the classification of inflectional verb features. While these are affixed to the stem, the principal issue we are interested in is whether the co-training of the classifier sequence on an undifferentiated training set performs adequately both on lexemes derived via a templatic system and on lexemes which have a 'whole', continuous stem.

4.1 The classification system

The classification system was trained and initially evaluated using part of the annotated data from the lexical resource Ġabra. The training data contained over 170,000 wordforms, and the test data, which was completely unseen, contained around 20,000 wordforms. A second dataset was taken from the Maltese national corpus (MLRS — Malta Language Resource Server³). This dataset consisted of 200 randomly selected words which were given morphological labels by two experts. The words were split half and half between Semitic (non-concatenative) and Romance/English (concatenative) origin. The verb category had 94 words, with 76 non-concatenative and 18 concatenative. This is referred to as the gold standard dataset.

A series of classifiers was trained using annotated data from Ġabra, which contains detailed morphological information relevant to each word: person, number, gender, direct object, indirect object, tense, aspect, mood and polarity. Tense/aspect and mood were joined into one single feature, abbreviated to TAM, since they are mutually exclusive. These features are referred to as second-tier features, representing the morphological properties which the system must classify. The classification also relies on a set of basic features which are automatically extracted from a given word: stems, prefixes, suffixes and composite suffixes, when available⁴, and consonant-vowel patterns.

A separate classifier was trained for each of the second-tier features. In order to arrive at the ideal sequence of classifiers, multiple sequences were tested, and the best sequence was identified on the basis of performance on held-out data (for more detail see Borg (2016)). Once the optimal sequence was established, the classification system used these classifiers as a cascade, each producing the appropriate label for a particular morphological property and passing on the information learnt to the following classifier. The verb cascade consisted of the optimal sequence of classifiers in the following order: Polarity (Pol), Indirect Object (Ind), Direct Object (Dir), Tense/Aspect/Mood (TAM), Number (Num), Gender (Gen) and Person (Per). The classifiers were trained as decision trees using the WEKA data mining software (Hall et al., 2009), available both through a graphical user interface and as an open-source Java library. Other techniques, such as Random Forests, SVMs and Naïve Bayes, were also tested and produced very similar results. The classifiers were built using the training datasets. The first evaluation followed the traditional evaluation principles of machine learning, using the test dataset which contained unseen wordforms from Ġabra, amounting to just over 10% of the training data. This is referred to as the traditional evaluation.

However, there are two main aspects of our scenario that encouraged us to go beyond the traditional evaluation.

³ http://mlrs.research.um.edu.mt/
⁴ Composite suffixes occur when more than one suffix is concatenated to the stem, usually with enclitic object and indirect object pronouns, as in qatil-hu-li 'he killed him for me'.
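The cascade architecture of Section 4.1 can be sketched schematically as follows. This is a hand-rolled illustration: the two rule-based "classifiers" are toy stand-ins for the trained WEKA decision trees, the feature dictionary is a stand-in for the extracted stems, affixes and CV patterns, and the suffix rules grossly simplify Maltese morphology (plural -u/-w and negative -x are taken from Table 2 and Section 4.2):

```python
def run_cascade(word, classifiers):
    """Apply classifiers in a fixed order (e.g. Pol, Ind, Dir, TAM, Num, Gen, Per),
    feeding each prediction back in as an extra feature for later classifiers."""
    features = {"word": word}   # stand-in for the automatically extracted features
    label = {}
    for name, clf in classifiers:
        prediction = clf(features)   # each classifier sees earlier predictions
        label[name] = prediction
        features[name] = prediction
    return label

# Toy rule-based stand-ins for two of the seven classifiers:
classifiers = [
    ("Pol", lambda f: "negative" if f["word"].endswith("x") else "positive"),
    ("Num", lambda f: "pl" if f["word"].endswith(("u", "w")) else "sg"),
]
print(run_cascade("jigdbu", classifiers))
# → {'Pol': 'positive', 'Num': 'pl'}   (3PL j-igdb-u, cf. Table 2)
```

The point of the cascade is the feedback step: because each prediction is appended to the feature set, a later classifier such as Person can condition on the Number decision already taken.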

First, Ġabra is made up of automatically generated wordforms, several of which are never attested in the MLRS corpus (though they are possible). Second, the corpus contains several other words which are not present in Ġabra, especially concatenative word formations. Thus, we decided to carry out a gold standard (GS) evaluation to test the performance of the classification system on actual data from the MLRS corpus. The evaluation in this paper is restricted to the verb category.

4.2 Evaluation Results

We first compare the performance of the classification system on the test dataset collected from Ġabra to the manually annotated gold standard collated from the MLRS corpus. These results are shown in Figure 1. The first three features in the cascade — Polarity, Indirect Object and Direct Object — perform best in both the traditional and gold standard evaluations. In particular, the indirect object has practically the same performance in both evaluations. A closer look at the classification results reveals that most words did not have this morphological property, and therefore no label was required; the classification system correctly classified these words with a null value. The polarity classifier, on the other hand, was expected to perform better — in Maltese, negation is indicated with the suffix -x at the end of the word. The main problem here was that the classifier could apply the labels positive, negative or null to a word, and it used the null label more frequently than the two human experts did.

The errors in the classification of the morphological property TAM were mainly found in the labelling of the values perfective and imperative, whilst the label imperfective performed slightly better. Similarly, the number and gender classifiers both had labels that performed better than others. Overall, this could indicate that the data representation for these particular labels is not adequate to facilitate the modelling of a classifier.

As expected, the performance of the classifiers on the gold standard is lower than in the traditional evaluation setting. The test dataset used in the traditional evaluation, although completely unseen, was still from the same source as the training data (Ġabra): the segmentation of words was known, and the distribution of instances in the different classes (labels) was similar to that found in the training data. While consistency in training and test data sources clearly makes for better results, the outcomes also point to the possibility of overfitting, particularly as Ġabra contains a very high proportion of Semitic, compared to concatenative, stems. Thus, it is possible that the training data for the classifiers did not cover the necessary breadth for the verbs found in the MLRS corpus. To what extent this impacts the results of the classifiers cannot be known unless the analysis separates the two processes. For this reason, the analysis of the verb category in the gold standard evaluation was separated into two, and the performance of each part is compared to the overall gold standard performance. This allows us to identify those morphological properties which will require more representative datasets in order to improve their performance. Figure 2 shows this comparison.

The first three classifiers — polarity, indirect object and direct object — perform as expected, meaning that the concatenative lexemes perform worse than the non-concatenative ones. This confirms the suspicion that the coverage of Ġabra is not sufficiently representative of the morphological properties in the concatenative class of words. On the other hand, the TAM and Person classifiers perform better on the concatenative words. However, there is no specific distinction in the errors of these two classifiers.

One possible overall reason for the discrepancy in performance between the traditional and gold standard evaluations, and possibly also between the concatenative and non-concatenative words, is how the words are segmented. The test data in the traditional evaluation setting was segmented correctly, using the same technique applied to the training data. The segmentation of the words in the MLRS corpus was performed automatically and heuristically, and the results were not checked for correctness, so the classification system might have been given an incorrect segmentation of a word. This would impact the results, as the classifiers rely upon the identification of prefixes and suffixes to label words.

5 Conclusions and Future Work

This paper analysed the results of the clustering of morphologically related words and the morphological labelling of words, with a particular emphasis on identifying the difference in performance of the techniques used on words of Semitic origin (non-concatenative) and Romance/English origin (concatenative).

[Figure: bar chart of accuracy (roughly 60–90%) for each classifier — Pol, Ind, Dir, TAM, Num, Gen, Per — under the gold standard and traditional evaluations.]

Figure 1: Comparison of the classification system using traditional evaluation settings and a gold standard evaluation.

[Figure: bar chart of accuracy (roughly 60–90%) for each classifier — Pol, Ind, Dir, TAM, Num, Gen, Per — on the overall gold standard, the concatenative words and the non-concatenative words.]

Figure 2: Comparison of the classifiers split between concatenative and non-concatenative words.

The datasets obtained from the clustering technique were split into concatenative and non-concatenative sets, and evaluated in terms of their quality and the number of words removed from each cluster. Although the clustering techniques generally performed best on the concatenative set, scalability seemed to be an issue, with the bigger clusters performing badly. The non-concatenative set, on the other hand, had smaller clusters, but the quality ratings were generally lower than those of the concatenative group. Overall, it seems that the techniques were geared more towards the concatenative set, but performed at an acceptable level for the non-concatenative set. Although the analysis shows that it is difficult to find a one-size-fits-all solution, the resulting clusters could be used as a development set to optimise the clustering process in future.

The research carried out in morphological labelling viewed it as a classification problem. Each morphological property is seen as a machine learning feature, and each feature is modelled as a classifier and placed in a cascade so as to provide the complete label for a given word. The research focussed on the verb category, and two types of evaluation were carried out to test this classification system. The first was a traditional evaluation using unseen data from the same source as the training set. A second evaluation used randomly selected words from the MLRS corpus which were manually annotated with their morphological labels by two human experts; since there is no complete morphological analyser available for Maltese, this annotation was treated as a gold standard. Since the classifiers were trained using data which is predominantly non-concatenative, the performance of the classification system on the MLRS corpus was, as expected, worse than in the traditional evaluation. By comparing the two evaluations, it was possible to assess which morphological properties were not performing adequately. Moreover, the gold standard dataset was split into two, denoting concatenative and non-concatenative words, to further analyse whether a classification system trained predominantly on non-concatenative data could then be applied to concatenative data. The results were mixed across the different morphological properties, but overall the evaluation was useful in determining where more representative data is needed.

Although the accuracies of the morphological classification system are not exceptionally high for some of the morphological properties, the system performs well overall, and the individual classifiers can be retrained and improved as more representative data becomes available. And although the gold standard dataset is small, it allows us to identify which properties require more data, and of which type. One possible route forward is to extend the grammar used to generate the wordforms in Ġabra and thus obtain more coverage for the concatenative process. However, it is already clear from the analysis carried out that the current approach is viable for both morphological systems and can be well suited to a hybrid system such as Maltese.
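The cascade of per-property classifiers can be sketched minimally as follows. This is an illustration only, not the actual system: the property names follow Figure 2 (Pol, Ind, Dir, Tam, Num, Gen, Per), the majority-class stand-in replaces whatever real classifiers were trained, and the training words and labels are invented.

```python
# Sketch of a classifier cascade for morphological labelling: one
# classifier per property, applied in sequence, with earlier predictions
# passed along as context. The MajorityClassifier is a placeholder for
# a real per-property model; all data below is toy data.
from collections import Counter

PROPERTIES = ["Pol", "Ind", "Dir", "Tam", "Num", "Gen", "Per"]

class MajorityClassifier:
    """Stand-in model: always predicts the most frequent training label."""
    def fit(self, words, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        # A real model would use the word form and the context of
        # earlier predictions; this placeholder ignores both.
        return self.label

class Cascade:
    def __init__(self, properties):
        self.properties = properties
        self.stages = {p: MajorityClassifier() for p in properties}

    def fit(self, words, gold_labels):
        # gold_labels: one dict per word, mapping property -> value.
        for prop in self.properties:
            self.stages[prop].fit(words, [g[prop] for g in gold_labels])
        return self

    def label(self, word):
        # Each stage fills one slot of the complete label, seeing the
        # predictions made so far.
        analysis = {}
        for prop in self.properties:
            analysis[prop] = self.stages[prop].predict((word, dict(analysis)))
        return analysis

# Toy usage with invented words and labels:
train_words = ["kiteb", "kitbet"]
train_gold = [{p: ("pos" if p == "Pol" else "-") for p in PROPERTIES}
              for _ in train_words]
cascade = Cascade(PROPERTIES).fit(train_words, train_gold)
print(cascade.label("jikteb"))
```

The design point the sketch makes is that the complete morphological label is assembled slot by slot, so each per-property classifier can be retrained independently as more representative data becomes available.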

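The split of the gold standard into concatenative and non-concatenative words, with per-property accuracy computed on each half, can be illustrated as below. The helper name, the example words (taken from Table 1), and the predicted/gold labels are all invented for illustration; only two properties are shown.

```python
# Illustrative per-property evaluation over a gold standard split into
# concatenative and non-concatenative items. All data here is toy data.
PROPERTIES = ["Num", "Gen"]

def per_property_accuracy(pairs, properties):
    """pairs: list of (predicted, gold) label dicts -> {property: accuracy}."""
    return {p: sum(pred[p] == gold[p] for pred, gold in pairs) / len(pairs)
            for p in properties}

# Each item: (word, is_concatenative, predicted labels, gold labels)
items = [
    ("ezaminaturi", True,  {"Num": "pl", "Gen": "m"}, {"Num": "pl", "Gen": "m"}),
    ("giddibin",    False, {"Num": "sg", "Gen": "m"}, {"Num": "pl", "Gen": "m"}),
]
concat    = [(p, g) for _, c, p, g in items if c]
nonconcat = [(p, g) for _, c, p, g in items if not c]

print(per_property_accuracy(concat, PROPERTIES))     # concatenative split
print(per_property_accuracy(nonconcat, PROPERTIES))  # non-concatenative split
```

Reporting accuracy per property rather than per complete label is what makes it possible to see which individual classifiers need more, or more representative, training data.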
6 Acknowledgements

The authors acknowledge the insight and expertise of Prof. Ray Fabri. The research work disclosed in this publication is partially funded by the Malta Government Scholarship Scheme grant.