Procesamiento del Lenguaje Natural, no. 54, March 2015, pp. 29-36. Received 10-11-14; revised 27-01-15; accepted 12-02-15.
Choosing a Spanish Part-of-Speech tagger for a lexically sensitive task∗
Selección de un etiquetador morfosintáctico primando la precisión en las categorías léxicas

Carla Parra Escartín
University of Bergen
Bergen, Norway
[email protected]

Héctor Martínez Alonso
University of Copenhagen
Copenhagen, Denmark
[email protected]
Resumen: En este artículo se comparan cuatro etiquetadores morfosintácticos para el español. La evaluación se ha realizado sin entrenamiento ni adaptación previa de los etiquetadores. Para poder realizar la comparación, los etiquetarios se han convertido al etiquetario universal (Petrov, Das, and McDonald, 2012). También se han comparado los etiquetadores en cuanto a la información que facilitan y cómo tratan características intrínsecas del idioma español como los clíticos verbales y las contracciones.
Palabras clave: Etiquetadores morfosintácticos, evaluación de herramientas, lingüística de corpus
Abstract: In this article, four Part-of-Speech (PoS) taggers for Spanish are compared. The evaluation has been carried out without prior training or tuning of the PoS taggers. To allow for a comparison across PoS taggers, their tagsets have been mapped to the universal PoS tagset (Petrov, Das, and McDonald, 2012). The PoS taggers have also been compared as regards the information they provide and how they treat special features of the Spanish language such as verbal clitics and portmanteaux.
Keywords: Part-of-Speech taggers, tool evaluation, corpus linguistics
ISSN 1135-5948 © 2015 Sociedad Española para el Procesamiento del Lenguaje Natural

1 Introduction

Part-of-Speech (PoS) taggers are among the most commonly used tools for the annotation of language resources. They are often a key preprocessing step in many Natural Language Processing (NLP) systems. When using a PoS tagger in a workflow, it is important to know the impact of its error rate on any modules that use its output (Manning, 2011).

In this paper, we compare four different PoS taggers for Spanish. Our goal is to build an NLP system for knowledge extraction from technical text, and this requires very good performance on lemmatisation and correct PoS assignment. Inflectional information is not relevant for our purposes. Instead of choosing a PoS system based solely on its reported performance, we benchmark the output of the four candidate systems against a series of metrics that profile their behaviour when predicting lexical and overall PoS tags, as well as the final quality of the resulting lemmatisation.

As the taggers had different tagsets, and we were only interested in retrieving the coarse PoS tag (buys_verb vs. buys_verb_3ps), we have mapped the tagsets to the universal PoS tagset proposed by Petrov, Das, and McDonald (2012).

The remainder of this paper is organised as follows: Section 2 discusses available PoS taggers for Spanish and describes them briefly. In Section 3 the different challenges encountered when comparing the four PoS taggers are discussed. Section 4 discusses the differences across the tagsets used by each PoS tagger, and Section 5 shows the evaluation of each PoS tagger compared with the other three. Section 6 offers a summary.

∗ The authors thank the anonymous reviewers for their valuable comments and the developers of the taggers we have covered in this article for making them available.

2 Part-of-Speech taggers available for Spanish

For Spanish, several PoS tagger initiatives have been reported (Moreno and Goñi, 1995; Márquez, Padró, and Rodríguez, 2000; Carreras et al., 2004; Padró and Stanilovsky, 2012, i.a.). The accuracies for these PoS taggers are reported to be 96-98%. However, not all of these tools were available for our tests. One of the existing PoS taggers, the GRAMPAL PoS tagger (Moreno and Goñi, 1995), was not included in this comparison because it is not downloadable and does not seem accessible via a web service.

To our knowledge, four PoS taggers are available for Spanish: the TreeTagger (TT) (Schmid, 1994), the IULA TreeTagger (IULA) (Martínez, Vivaldi, and Villegas, 2010), the FreeLing PoS tagger (FL) (Padró and Stanilovsky, 2012), and the IXA PoS tagger (IXA) (Agerri, Bermudez, and Rigau, 2014). Two of them (IULA and FL) are also available as web services developed during the PANACEA project. The fourth PoS tagger was recently released within the IXA pipes. The study reported in this paper compares these four PoS taggers.

2.1 Default TreeTagger (TT)

TT provides 22 already trained models for 18 languages, and new models can be created by retraining the tagger with new data. We used the already trained model for Spanish which is available on its website. This model was trained on the Spanish CRATER corpus and uses the Spanish lexicon of the CALLHOME corpus of the LDC.

Prior to tagging the text, the tool tokenises it. The tokeniser does not have specific rules for Spanish.

In the output of this tagger, every line contains one wordform with its PoS tag and its lemma (i.e. citation form). TT PoS tags do not contain inflectional information (e.g. tense, number, gender, etc.). This tagset is the most similar to the universal PoS tagset.

When the tagger fails to assign a lemma to a specific wordform, instead of assigning to it the value […]. MWEs are kept with whitespaces as they occur in the text, while their lemmas are joined by means of tildes. Examples 4 and 5 show MWEs tagged by TT.

(4) de conformidad con PREP de∼conformidad∼con
(5) junto con PREP junto∼con

2.2 IULA TreeTagger (IULA)

The IULA is an instance of the TT trained on the IULA technical corpus (IULACT). Additionally, each file undergoes a preprocessing step prior to tagging. This preprocessing step is described in detail in Martínez, Vivaldi, and Villegas (2010). It comprises the following tasks:

1. Sentence-boundary detection;
2. General structure-marking;
3. Non-analyzable element recognition;
4. Phrase and loanword recognition;
5. Date recognition;
6. Number recognition;
7. Named Entity (NE) recognition.

These tasks were introduced in what Martínez, Vivaldi, and Villegas (2010) call a text handling module. This module was developed in order to solve potential sources of errors prior to tagging, with the aim of improving the overall quality of the PoS tagging process. The whole toolset is available through a web service where one can upload the corpus to be tagged and download the tagged corpus upon completion of the task.

Unlike the TT instance discussed in Subsection 2.1, the IULA PoS tagset provides inflectional information for the relevant PoS. The tagset is partially based on the EAGLES tagset for Spanish, and includes more fine-grained information.

When the tagger fails to assign a lemma to […]
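Both IULA and FL use fine-grained EAGLES-style tags, whereas the comparison in this paper only needs the coarse category. Collapsing such tags onto the universal PoS tagset can be sketched as below; the partial mapping table and the function are our own illustration, not the code used for the experiments in this paper.

```python
# Illustrative sketch (not the authors' code): EAGLES-style Spanish tags
# encode the main category in their first letter, so collapsing them onto
# the universal PoS tagset can be done with a small lookup table.
# The table below is deliberately partial.
EAGLES_TO_UNIVERSAL = {
    "A": "ADJ",   # adjectives
    "C": "CONJ",  # conjunctions
    "D": "DET",   # determiners
    "F": ".",     # punctuation
    "N": "NOUN",  # nouns
    "P": "PRON",  # pronouns
    "R": "ADV",   # adverbs
    "S": "ADP",   # adpositions (prepositions)
    "V": "VERB",  # verbs
    "Z": "NUM",   # numbers
}

def to_universal(tag):
    """Collapse a fine-grained tag to a universal PoS tag (X if unknown)."""
    return EAGLES_TO_UNIVERSAL.get(tag[:1].upper(), "X")

print(to_universal("NP00G00"))  # NOUN (cf. Example 10 below)
print(to_universal("VMIP3S0"))  # VERB
```

A richer mapping would also need per-tag exceptions, which is why the conversion used for the Gold Standard described in Section 5 was manually corrected.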
[…] tagged MWEs. Furthermore, as a result of the preprocessing step, the IULA adds additional xml-style tags to such elements.

(7) 18 de diciembre del 2001 T 18_de_diciembre_del_2001
(8) Baja Austria N4666 Baja_Austria
(9) Promoción MH-NEU N4666 Promoción_MH-NEU

2.3 FreeLing (FL)

FreeLing is an open source suite of language analysers. It offers a wide variety of services for several languages. Among these analysers are tokenisers, sentence splitters, morphological analysers, PoS taggers, etc. The PoS tagger has two different flavours: a hybrid tagger called relax, which combines statistical and manually crafted grammatical rules, and a model based on the Hidden Markov Model (HMM) similar to the TnT proposed by Brants (2000). In both cases, the tagger was trained with the LexEsp corpus (Sebastián, Martí, and Carreiras, 2000). Again, the web service offered by the PANACEA project was used. Since in the web service only the HMM model was deployed, this is the tagging model we have used in this paper.

FL displays first the wordform, followed by the lemma and then the PoS tag. It uses the EAGLES tagset for Spanish, which, similarly to the IULA tagset, also includes more fine-grained inflectional information.

Whenever FL analyses a sequence as a MWE, it displays both the wordform and the lemma joined with underscores. Examples 10 and 11 show some tagged MWEs.

(10) Baja_Austria baja_austria NP00G00
(11) Promoción_MH-NEU promoción_mh-neu NP00V00

Another peculiarity is that all lemmas are lowercased, regardless of whether they correspond to a proper noun, an abbreviation or something else. This can be observed in Example 10, where the lemma for the proper noun Baja Austria ([EN]: Lower Austria) is lowercased to baja_austria.

Finally, dates are also treated differently in FL. Their wordform is the date itself joined by underscores, and their "lemma" is the same date converted to a numerical format, where month names are converted to their corresponding number. Examples 12 and 13 show this.

(12) 18_de_diciembre_del_2001 [??:18/12/2001:??.??:??] W
(13) 15_de_octubre_del_2002 [??:15/10/2002:??.??:??] W

2.4 IXA pipes (IXA)

The IXA pipes are "ready to use NLP tools" (Agerri, Bermudez, and Rigau, 2014) developed by the IXA NLP Group. Among the available tools there are a tokeniser and a PoS tagger for Spanish. The PoS tagger requires not only that the text is previously tokenised, but also that it is in the NLP Annotation Format (NAF) (Fokkens et al., 2014). These requirements are met by using the provided tokeniser and piping it to the PoS tagger.

IXA has been trained and evaluated using the Ancora corpus (Taulé, Martí, and Recasens, 2008). Two PoS tagging models are available: one based on the Perceptron algorithm (Collins, 2002), and another one based on Maximum Entropy (Ratnaparkhi, 1999). We use the default model for Spanish, which is the Perceptron.

Its output format is xml-based and thereby differs from the taggers previously discussed in this paper. The resulting document is tagged in NAF and has a header specifying the tools that have been used, when they were used, how long the process has taken and the version of the tool used. Next, the tokenised text appears.

For each tokenised wordform, the tool provides its NAF required attributes for word id and the sentence where it appears (sent), as well as the optional attributes para, offset, and length, which correspondingly refer to the paragraph the word belongs to, the offset in number of characters, and its length in number of characters.

Where the tokenised text ends, a new section starts with PoS and lemma information about each word, identified by its id. For each term, the following attributes are provided:

1. id: The term id. This is the only required attribute; all the other attributes are optional.
2. type: Whether the term belongs to an open PoS (e.g. nouns) or a closed one (e.g. prepositions).
3. lemma: The wordform lemma.
4. pos: The Part of Speech of the wordform.
5. morphofeat: The PoS tag assigned to the form, containing inflectional information.

IXA uses the same tagset as FL: the EAGLES tagset for Spanish.
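A hedged sketch of how the term layer just described can be read: the XML fragment below and its attribute values are invented for illustration and are not actual IXA pipes output.

```python
# Illustrative sketch: extracting the attributes listed above from
# NAF-style <term> elements. The fragment is invented for illustration,
# not actual IXA pipes output.
import xml.etree.ElementTree as ET

naf_terms = """
<terms>
  <term id="t1" type="open" lemma="etiquetador" pos="N" morphofeat="NCMS000"/>
  <term id="t2" type="closed" lemma="de" pos="P" morphofeat="SPS00"/>
</terms>
"""

for term in ET.fromstring(naf_terms).findall("term"):
    # id is the only required attribute; the rest are optional.
    print(term.get("id"), term.get("lemma"), term.get("pos"),
          term.get("morphofeat"))
```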
MWEs are signalled by a dedicated sub-element; when the PoS tagger identifies a MWE, this sub-element will have several […]

[…] when/while passing). Each PoS tagger, however, handles this phenomenon differently.

(a) TT: TT has a special tag for each of these wordforms: PAL for al and PDEL for del.
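Splitting the contractions al (a + el) and del (de + el) into preposition plus article, as IULA and FL do, can be sketched as follows; this is an illustrative fragment, not the taggers' actual preprocessing code.

```python
# Illustrative sketch (not taken from any of the four taggers): splitting
# the Spanish portmanteaux "al" (a + el) and "del" (de + el) into two
# tokens before tagging, as IULA and FL do internally.
CONTRACTIONS = {"al": ["a", "el"], "del": ["de", "el"]}

def split_contractions(tokens):
    """Expand contractions; all other tokens pass through unchanged."""
    out = []
    for tok in tokens:
        out.extend(CONTRACTIONS.get(tok.lower(), [tok]))
    return out

print(split_contractions(["Vamos", "al", "centro", "del", "pueblo"]))
# ['Vamos', 'a', 'el', 'centro', 'de', 'el', 'pueblo']
```

Note that splitting changes the token count, which is one reason why the taggers' outputs are not directly alignable token by token (cf. Section 5).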
5 PoS tagger performance

Bearing in mind all the discrepancies described in Sections 3 and 4, a completely fair comparison of all the PoS taggers against a unique Gold Standard (GS) is not feasible. A measure like accuracy requires the input data to be linearly comparable (i.e. tokenised using the same convention) and the output data to rely on the same number of labels (i.e. the datasets should be the same).

The ideal downstream evaluation method would be parsing, but it is too sensitive to tokenisation changes to be applicable when comparing taggers that tokenise differently, besides using different tagsets.

Nevertheless, the four different systems rely on the same sentence boundaries, and are thus sentence-aligned. Given the aforementioned limitations, we use a series of metrics that aim to assess the PoS tagging and lemmatisation performance of the systems by measuring matches (i.e. set intersections) at the sentence level.

In addition, we use metrics that assess the similarity between tag distributions (KL divergence) in order to assess whether the bias of each tagger is close to the desired performance determined by a reference GS, and metrics that evaluate how many of the predicted lemma+PoS combinations appear in the Spanish wordnet. The wordnet check serves as a downstream evaluation of the joint predicted values and is one of the instrumental parts of the knowledge-extraction system mentioned in Section 1.

Table 2 summarises the main features of each tagger. Only TT fails to provide inflectional information. Portmanteaux, verbal clitics and reflexive verbs are treated in different ways. While IULA and FL split them, TT and IXA do not. Finally, the tagsets also differ greatly in terms of their overall tag number and the number of non-lexical tags. IULA offers the greatest absolute number and the greatest proportion of tags dedicated to non-lexical elements.

                               TT    IULA   FL    IXA
  Morphosyntactic info         -     √      √     √
  Splits portmanteaux          -     √      √     -
  Splits verbal clitics        -     √      √     -
  Joins dates                  -     √      √     -
  Reflexive verb lemmatisation -     √      √     -
  Tagset size                  77    435    577
  Number of non-lexical tags   42    241    173

Table 2: PoS taggers features.

The discrepancies between the different tagsets can be addressed by mapping them to the universal PoS tagset. However, we then lose some of the inflectional analyses produced by some of the taggers. Furthermore, a comparison against one Gold Standard might be biased against one of the various choices of handling MWEs and other special features.

In order to allow a comparison, we created a GS following a procedure that attempts to minimise the starting bias in favour of a certain system. We chose two short texts from the freely available technical corpus TRIS (Parra, 2012). We processed this material with FL, because this tagger tokenises most aggressively (cf. Sections 2.1-2.4). MWEs detected by FL were manually split and tagged, aiming at facilitating the evaluation of this GS against the outputs of the other PoS taggers. Then, we converted the FL tagset to the universal PoS tagset (Petrov, Das, and McDonald, 2012) and manually corrected the tagged text. Each tagger was then compared against this GS.

Table 3 summarises the results of the automatic evaluation we carried out for all taggers against the GS. Three of the metrics have two variants: in an overall metric (o), we compare the performance across all parts of speech, whereas in a lexical metric (l), we only take into consideration the PoS that are tagged as ADJ, NOUN or VERB after projecting onto Petrov's tagset.

The metrics are the following:

• Matching lemmas (o/l): Proportion of lemmas that match with the GS;
• Matching PoS (o/l): Proportion of PoS tags that match. A match is defined as the minimum count of a given PoS tag matching that in the GS;
• KL total (o/l): Kullback-Leibler divergence (Kullback and Leibler, 1951) between the PoS distribution of the system and that of the GS;
• GS token ratio: The relation between the amount of tokens and that of the GS. As explained earlier, we have chosen a GS with the most fragmentary tokenisation, so the ratio will always be equal to or less than one. The higher the number, the more similar the tokenisation convention of the system is to the GS tokenisation;
• WN GS matches: Number of Spanish wordnet (Gonzalez-Agirre, Laparra, and Rigau, 2012) hits for all predicted lemma-PoS combinations in the GS;
• WN sys matches: Number of Spanish wordnet hits for all predicted lemma-PoS combinations matching those of the GS; and
• WN intersection: Number of Spanish wordnet hits that also appear in the GS.

                        TT      IXA     IULA    FL
  Matching lemmas (o)   0.77    0.85    0.7     0.89
  Matching lemmas (l)   0.63    0.63    0.66    0.78
  Matching PoS (o)      0.86    0.87    0.85    0.88
  Matching PoS (l)      0.93    0.93    0.94    0.92
  KL (o)                0.041   0.053   0.042   0.063
  KL (l)                0.0029  0.0095  0.0005  0.001
  GS token ratio        0.97    0.99    0.94    0.93
  WN GS matches         402     402     402     402
  WN sys matched        378     367     386     384
  WN intersection       361     350     366     373

Table 3: PoS taggers evaluation.

As Table 3 illustrates, FL has the highest proportion of both overall and lexical matching lemmas (cf. Matching lemmas (o) and Matching lemmas (l) in Table 3). Its high proportion of lexical matches is remarkable, as the lemmatisation of lexical words is more important (and error-prone) than that of closed grammatical classes such as articles and prepositions.

As previously stated, FL is also the system which uses a more fragmented tokenisation, and this makes it the most similar to the GS in that respect. However, it gets the lowest GS token ratio score. This might be because FL joins the lemmas of MWEs with underscores (cf. 2.3). IXA is the tagger which achieves the best score as regards the ratio of tokens in the GS (cf. GS token ratio in Table 3). This may be due to the fact that we split all MWEs in our GS. As mentioned earlier, IXA failed at identifying and tagging them in our test files.

KL divergence offers a different perspective on the different systems. The lower the KL divergence between PoS distributions is, the more similar to the GS the expected prior for a certain PoS is. Interestingly, FL has the highest overall KL divergence in spite of having the highest performance on lemma retrieval and the second lowest lexical KL divergence. This difference is due to FL having different conventions on the way that some function words are annotated and thus later converted to the universal PoS tags.

With regard to the overall results, FL has the highest PoS overlap with the GS (cf. Matching PoS (o) in Table 3), followed closely by IXA. The better accuracy on lemma retrieval, paired with the lowest KL divergence on lexical PoS, is also reflected in the highest wordnet hit count for FL (cf. WN intersection in Table 3). However, IULA had a better performance when tagging lexical words (cf. Matching PoS (l) in Table 3). In fact, it manages to correctly tag lexical words, despite not necessarily managing to lemmatise them correctly. This also explains why it has the highest WordNet hit count of lemma-PoS combinations (cf. WN sys matched in Table 3).

Since our current research focuses on lexical words, the most important measures for our purposes are Matching PoS (l) and Matching lemmas (l). IULA may be our best choice when focusing on the assignment of the right PoS tags. It performs better than the TT system which we have previously been using. FL, on the other hand, seems to be better for general lemmatisation tasks, regardless of the PoS tag.

6 Conclusions and future work

We have compared four PoS taggers for Spanish and evaluated their outputs against a common GS. For our purposes, we concluded that IULA would be the best choice, followed by FL. It is possible that a combined output of these two taggers may outperform each single tagger. Given the difficulties of combining different tokenisations in a voting scheme for PoS tagging, we leave this question for future work.

Evaluating on a technical text makes the task more difficult for most of the PoS taggers. The better performance of IULA may also be due to it being the only tagger trained on a technical corpus. It is not impossible that for more general (i.e. less technical) texts the differences across the taggers may be smaller.

We have also proposed a method to compare the output of different PoS taggers by mapping their respective tagsets to the universal PoS tagset and evaluating matches at the sentence level. As indicated in Section 1, for our particular purposes no inflectional information was needed, and thus this method was enough. In case a more fine-grained tagging is needed, the universal PoS tagset may need to be expanded.

This non-linear evaluation method is useful for tasks that depend on lemmatisation and coarse PoS tagging. Nevertheless, the method would have to be expanded for tasks that require inflectional information.

References

Agerri, R., J. Bermudez, and G. Rigau. 2014. IXA pipeline: Efficient and Ready to Use Multilingual NLP tools. In LREC'14. ELRA.

Brants, T. 2000. TnT: A Statistical Part-of-speech Tagger. In ANLP'00, pages 224-231, Seattle, Washington.

Carreras, X., I. Chao, L. Padró, and M. Padró. 2004. FreeLing: An Open-Source Suite of Language Analyzers. In LREC'04. ELRA.

Collins, M. 2002. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. In EMNLP'02, volume 10, pages 1-8. ACL.

Dridan, R. and S. Oepen. 2012. Tokenization: Returning to a Long Solved Problem. A Survey, Contrastive Experiment, Recommendations, and Toolkit. In ACL'12, volume 2, pages 378-382. ACL.

Fares, M., S. Oepen, and Y. Zhang. 2013. Machine Learning for High-Quality Tokenization: Replicating Variable Tokenization Schemes. In Computational Linguistics and Intelligent Text Processing, volume 7816, pages 231-244. Springer Berlin Heidelberg.

Fokkens, A., A. Soroa, Z. Beloki, N. Ockeloen, G. Rigau, W. R. van Hage, and P. Vossen. 2014. NAF and GAF: Linking Linguistic Annotations. In Proceedings of the 10th Joint ISO-ACL SIGSEM Workshop on Interoperable Semantic Annotation, pages 9-17.

Gonzalez-Agirre, A., E. Laparra, and G. Rigau. 2012. Multilingual Central Repository version 3.0. In LREC'12, pages 2525-2529. ELRA.

Kullback, S. and R. A. Leibler. 1951. On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1):79-86.

Manning, C. D. 2011. Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? In CICLing'11, pages 171-189. Springer-Verlag.

Márquez, L., L. Padró, and H. Rodríguez. 2000. A machine learning approach to POS tagging. Machine Learning, (39):59-91.

Martínez, H., J. Vivaldi, and M. Villegas. 2010. Text handling as a web service for the IULA processing pipeline. In LREC'10, pages 22-29. ELRA.

Moreno, A. and J. M. Goñi. 1995. GRAMPAL: A Morphological Processor for Spanish implemented in Prolog. In GULP-PRODE'95.

Orosz, G., A. Novák, and G. Prószéky. 2013. Hybrid Text Segmentation for Hungarian Clinical Records. In MICAI (1), volume 8265 of Lecture Notes in Computer Science, pages 306-317. Springer-Verlag.

Padró, L. and E. Stanilovsky. 2012. FreeLing 3.0: Towards Wider Multilinguality. In LREC'12. ELRA.

Parra, C. 2012. Design and compilation of a specialized Spanish-German parallel corpus. In LREC'12, pages 2199-2206. ELRA.

Petrov, S., D. Das, and R. McDonald. 2012. A Universal Part-of-Speech Tagset. In LREC'12. ELRA.

Ratnaparkhi, A. 1999. Learning to Parse Natural Language with Maximum Entropy Models. Machine Learning, 34(1-3):151-175.

Schmid, H. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. In International Conference on New Methods in Language Processing, pages 44-49.

Sebastián, N., M. A. Martí, and M. F. Carreiras. 2000. Léxico informatizado del español. Edicions Universitat de Barcelona.

Taulé, M., M. A. Martí, and M. Recasens. 2008. AnCora: Multilevel Annotated Corpora for Catalan and Spanish. In LREC'08. ELRA.