View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach

Veronika Vincze1,2, Istvan´ Nagy T.2 and Richard´ Farkas2 1Hungarian Academy of Sciences, Research Group on Artificial Intelligence [email protected] 2Department of Informatics, University of Szeged nistvan,rfarkas @inf.u-szeged.hu { }

Abstract 2 Light Verb Constructions Here, we introduce a machine learning- based approach that allows us to identify Light verb constructions (e.g. to give advice) are light verb constructions (LVCs) in Hun- a subtype of multiword expressions (Sag et al., garian and English free texts. We also 2002). They consist of a nominal and a verbal present the results of our experiments on component where the verb functions as the syn- the SzegedParalellFX English–Hungarian tactic head, but the semantic head is the noun. The parallel corpus where LVCs were manu- verbal component (also called a light verb) usu- ally annotated in both languages. With ally loses its original sense to some extent. Al- our approach, we were able to contrast though it is the noun that conveys most of the the performance of our method and define meaning of the construction, the verb itself can- language-specific features for these typo- not be viewed as semantically bleached (Apres- logically different languages. Our pre- jan, 2004; Alonso Ramos, 2004; Sanroman´ Vi- sented method proved to be sufficiently ro- las, 2009) since it also adds important aspects to bust as it achieved approximately the same the meaning of the construction (for instance, the scores on the two typologically different beginning of an action, such as set on fire, see languages. Mel’cukˇ (2004)). The meaning of LVCs can be only partially computed on the basis of the mean- 1 Introduction ings of their parts and the way they are related to In natural language processing (NLP), a signifi- each other, hence it is important to treat them in a cant part of research is carried out on the English special way in many NLP applications. language. However, the investigation of languages LVCs are usually distinguished from productive that are typologically different from English is or literal verb + noun constructions on the one also essential since it can lead to innovations that hand and idiomatic verb + noun expressions on might be usefully integrated into systems devel- the other (Fazly and Stevenson, 2007). Variativ- oped for English. Comparative approaches may ity and omitting the verb play the most significant also highlight some important differences among role in distinguishing LVCs from productive con- languages and the usefulness of techniques that are structions and idioms (Vincze, 2011). Variativity applied. reflects the fact that LVCs can be often substituted In this paper, we focus on the task of identify- by a verb derived from the same root as the nomi- ing light verb constructions (LVCs) in English and nal component within the construction: productive Hungarian free texts. Thus, the same task will be constructions and idioms can be rarely substituted carried out for English and a morphologically rich by a single verb (like make a decision – decide). language. We compare whether the same set of Omitting the verb exploits the fact that it is the features can be used for both languages, we in- nominal component that mostly bears the seman- vestigate the benefits of integrating language spe- tic content of the LVC, hence the event denoted cific features into the systems and we explore how by the construction can be determined even with- the systems could be further improved. For this out the verb in most cases. Furthermore, the very purpose, we make use of the English–Hungarian same noun + verb combination may function as an parallel corpus SzegedParalellFX (Vincze, 2012), LVC in certain contexts while it is just a productive where LVCs have been manually annotated. construction in other ones, compare He gave her a

255 Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 255–261, Sofia, Bulgaria, August 4-9 2013. c 2013 Association for Computational Linguistics ring made of gold (non-LVC) and He gave her a and Kuhn (2009) argued that multiword expres- ring because he wanted to hear her (LVC), sions can be reliably detected in parallel corpora hence it is important to identify them in context. by using dependency-parsed, word-aligned sen- In theoretical linguistics, Kearns (2002) distin- tences. Sinha (2009) detected Hindi complex guishes between two subtypes of light verb con- predicates (i.e. a combination of a light verb and structions. True light verb constructions such as a noun, a verb or an ) in a Hindi–English to give a wipe or to have a laugh and vague ac- parallel corpus by identifying a mismatch of the tion verbs such as to make an agreement or to Hindi light verb meaning in the aligned English do the ironing differ in some syntactic and se- sentence. Many-to-one correspondences were also mantic features and can be separated by various exploited by Attia et al. (2010) when identifying tests, e.g. passivization, WH-movement, pronom- Arabic multiword expressions relying on asym- inalization etc. This distinction also manifests in metries between paralell entry titles of Wikipedia. natural language processing as several authors pay Tsvetkov and Wintner (2010) identified Hebrew attention to the identification of just true light verb multiword expressions by searching for misalign- constructions, e.g. Tu and Roth (2011). However, ments in an English–Hebrew parallel corpus. here we do not make such a distinction and aim to To the best of our knowledge, parallel corpora identify all types of light verb constructions both have not been used for testing the efficiency of an in English and in Hungarian, in accordance with MWE-detecting method for two languages at the the annotation principles of SZPFX. same time. Here, we investigate the performance The canonical form of a Hungarian light verb of our base LVC-detector on English and Hungar- construction is a bare noun + third person singular ian and pay special attention to the added value of verb. However, they may occur in non-canonical language-specific features. versions as well: the verb may precede the noun, or the noun and the verb may be not adjacent due 4 Experiments to the free word order. Moreover, as Hungarian is a morphologically rich language, the verb may In our investigations we made use of the Szeged- occur in different surface forms inflected for tense, ParalellFX English-Hungarian parallel corpus, mood, person and number. These features will be which consists of 14,000 sentences and contains paid attention to when implementing our system about 1370 LVCs for each language. In addition, for detecting Hungarian LVCs. we are aware of two other corpora – the Szeged Treebank (Vincze and Csirik, 2010) and Wiki50 3 Related Work (Vincze et al., 2011b) –, which were manually an- notated for LVCs on the basis of similar principles Recently, LVCs have received special interest in as SZPFX, so we exploited these corpora when the NLP research community. They have been au- defining our features. tomatically identified in several languages such as To automatically identify LVCs in running English (Cook et al., 2007; Bannard, 2007; Vincze texts, a machine learning based approach was ap- et al., 2011a; Tu and Roth, 2011), Dutch (Van de plied. This method first parsed each sentence and Cruys and Moiron,´ 2007), Basque (Gurrutxaga extracted potential LVCs. Afterwards, a binary and Alegria, 2011) and German (Evert and Ker- classification method was utilized, which can au- mes, 2003). tomatically classify potential LVCs as an LVC or Parallel corpora are of high importance in the not. This binary classifier was based on a rich fea- automatic identification of multiword expressions: ture set described below. it is usually one-to-many correspondence that is The candidate extraction method investi- exploited when designing methods for detecting gated the dependency relation among the verbs multiword expressions. Caseli et al. (2010) de- and nouns. Verb-object, verb-subject, verb- veloped an alignment-based method for extracting prepositional object, verb-other argument (in the multiword expressions from Portuguese–English case of Hungarian) and noun-modifier pairs were parallel corpora. Samardziˇ c´ and Merlo (2010) an- collected from the texts. The dependency labels alyzed English and German light verb construc- were provided by the Bohnet parser (Bohnet, tions in parallel corpora: they pay special attention 2010) for English and by magyarlanc 2.0 to their manual and automatic alignment. Zarrieß (Zsibrita et al., 2013) for Hungarian.

256 The features used by the binary classifier can be Csirik, 2010) in the case of Hungarian. Then, we categorised as follows: investigated whether the lemmatised verbal com- Morphological features: As the nominal com- ponent of the candidate was one of these fifteen ponent of LVCs is typically derived from a verbal verbs. The lemma of the noun was also applied stem (make a decision) or coincides with a verb as a lexical feature. The nouns found in LVCs (have a walk), the VerbalStem binary feature fo- were collected from the above-mentioned corpora. cuses on the stem of the noun; if it had a verbal Afterwards, we constructed lists of lemmatised nature, the candidates were marked as true. The LVCs got from the other corpora. POS-pattern feature investigates the POS-tag se- quence of the potential LVC. If it matched one pat- Syntactic features: As the candidate extraction tern typical of LVCs (e.g. verb + noun) the methods basically depended on the dependency candidate was marked as true; otherwise as false. relation between the noun and the verb, they could The English auxiliary verbs, do and have often also be utilised in identifying LVCs. Though the occur as light verbs, hence we defined a feature for dobj, prep, rcmod, partmod or nsubjpass the two verbs to denote whether or not they were dependency labels were used in candidate extrac- auxiliary verbs in a given sentence.The POS code tion in the case of English, these syntactic relations of the next word of LVC candidate was also ap- were defined as features, while the att, obj, plied as a feature. As Hungarian is a morpholog- obl, subj dependency relations were used in the ically rich language, we were able to define vari- case of Hungarian. When the noun had a deter- ous -based features like the case of the miner in the candidate LVC, it was also encoded noun or its number etc. Nouns which were histor- as another syntactic feature. ically derived from verbs but were not treated as Our feature set includes language-independent derivation by the Hungarian morphological parser and language-specific features as well. Language- were also added as a feature. independent features seek to acquire general fea- Semantic features: This feature also exploited tures of LVCs while language-specific features can the fact that the nominal component is usually de- be applied due to the different grammatical char- rived from verbs. Consequently, the activity acteristics of the two languages or due to the avail- or event semantic senses were looked for among ability of different resources. Table 1 shows which the upper level hyperonyms of the head of the features were applied for which language. noun phrase in English WordNet 3.11 and in the Hungarian WordNet (Mihaltz´ et al., 2008). We experimented with several learning algo- Orthographic features: The suffix feature is rithms and decision trees have been proven per- also based on the fact that many nominal compo- forming best. This is probably due to the fact that nents in LVCs are derived from verbs. This feature our feature set consists of compact – i.e. high- checks whether the lemma of the noun ended in level – features. We trained the J48 classifier of the a given character bi- or trigram. The number of WEKA package (Hall et al., 2009). This machine words of the candidate LVC was also noted and learning approach implements the decision trees applied as a feature. algorithm C4.5 (Quinlan, 1993). The J48 classi- Statistical features: Potential English LVCs fier was trained with the above-mentioned features and their occurrences were collected from 10,000 and we evaluated it in a 10-fold cross validation. English Wikipedia pages by the candidate extrac- tion method. The number of occurrences was used The potential LVCs which are extracted by the as a feature when the candidate was one of the syn- candidate extraction method but not marked as tactic phrases collected. positive in the gold standard were classed as neg- ative. As just the positive LVCs were annotated Lexical features: We exploit the fact that the on the SZPFX corpus, the F score interpreted most common verbs are typically light verbs. β=1 Therefore, fifteen typical light verbs were selected on the positive class was employed as an evalu- from the list of the most frequent verbs taken from ation metric. The candidate extraction methods the Wiki50 (Vincze et al., 2011b) in the case of En- could not detect all LVCs in the corpus data, so glish and from the Szeged Treebank (Vincze and some positive elements in the corpora were not covered. Hence, we regarded the omitted LVCs 1http://wordnet.princeton.edu as false negatives in our evaluation.

257 Features Base English Hungarian Feature English Hungarian Orthographical –– VerbalStem • –– All 59.93 56.96 POS pattern • –– Lexical -19.11 -14.05 • LVC list –– Morphological -1.68 -1.75 Light verb list • –– Semantic features • –– Orthographic -0.43 -3.31 Syntactic features • –– Syntactic -1.84 -1.28 • – – Semantic -2.17 -0.34 Determiner – • – Noun list – • – Statistical -2.23 – POS After – • – Language-specific -1.83 -1.05 LVC freq. stat. – • – • Agglutinative morph. – – Table 3: The usefulness of individual features in Historical derivation – – • • terms of F-score using the SZPFX corpus. Table 1: The basic feature set and language- specific features. tion. For each feature type, a J48 classifier was trained with all of the features except that one. We English Hungarian also investigated how language-specific features ML 63.29/56.91/59.93 66.1/50.04/56.96 improved the performance compared to the base DM 73.71/29.22/41.67 63.24/34.46/44.59 feature set. We then compared the performance to Table 2: Results obtained in terms of precision, re- that got with all the features. Table 3 shows the call and F-score. ML: machine learning approach contribution of each individual feature type on the DM: dictionary matching method. SZPFX corpus. For each of the two languages, each type of feature contributed to the overall per- formance. Lexical features were very effective in 5 Results both languages. As a baseline, a context free dictionary matching 6 Discussion method was applied. For this, the gold-standard LVC lemmas were gathered from Wiki50 and the According to the results, our base system is ro- Szeged Treebank. Texts were lemmatized and if bust enough to achieve approximately the same an item on the list was found in the text, it was results on two typologically different languages. treated as an LVC. Language-specific features further contribute to Table 2 lists the results got on the two differ- the performance as shown by the ablation anal- ent parts of SZPFX using the machine learning- ysis. It should be also mentioned that some of based approach and the baseline dictionary match- the base features (e.g. POS-patterns, which we ing. The dictionary matching approach yielded the thought would be useful for English due to the highest precision on the English part of SZPFX, fixed word order) were originally inspired by one namely 73.71%. However, the machine learning- of the languages and later expanded to the other based approach proved to be the most successful one (i.e. they were included in the base feature set) as it achieved an F-score that was 18.26 higher since it was also effective in the case of the other than that with dictionary matching. Hence, this language. Thus, a multilingual approach may be method turned out to be more effective regard- also beneficial in the case of monolingual applica- ing recall. At the same time, the machine learn- tions as well. ing and dictionary matching methods got roughly The most obvious difference between the per- the same precision score on the Hungarian part of formances on the two languages is the recall scores SZPFX, but again the machine learning-based ap- (the difference being 6.87 percentage points be- proach achieved the best F-score. While in the tween the two languages). This may be related to case of English the dictionary matching method the fact that the distribution of light verbs is quite got a higher precision score, the machine learning different in the two languages. While the top 15 approach proved to be more effective. verbs covers more than 80% of the English LVCs, An ablation analysis was carried out to exam- in Hungarian, this number is only 63% (and in or- ine the effectiveness of each individual feature of der to reach the same coverage, 38 verbs should be the machine learning-based candidate classifica- included). Another difference is that there are 102

258 different verbs in English, which follow the Zipf concept of “light verb construction” slightly dif- distribution, on the other hand, there are 157 Hun- ferently. For instance, Tu and Roth (2011) and Tan garian verbs with a more balanced distributional et al. (2006) focused only on true light verb con- pattern. Thus, fewer verbs cover a greater part of structions while only object–verb pairs are consid- LVCs in English than in Hungarian and this also ered in other studies (Stevenson et al., 2004; Tan et explains why lexical features contribute more to al., 2006; Fazly and Stevenson, 2007; Cook et al., the overall performance in English. This fact also 2007; Bannard, 2007; Tu and Roth, 2011). Several indicates that if verb lists are further extended, still other studies report results only on light verb con- better recall scores may be achieved for both lan- structions formed with certain light verbs (Steven- guages. son et al., 2004; Tan et al., 2006; Tu and Roth, As for the effectiveness of morphological and 2011). In contrast, we aimed to identify all kinds syntactic features, morphological features perform of LVCs, i.e. we did not apply any restrictions on better on a language with a rich morphologi- the nature of LVCs to be detected. In other words, cal representation (Hungarian). However, our task was somewhat more difficult than those plays a more important role in LVC detection in found in earlier literature. Although our results are English: the added value of syntax is higher for somewhat lower on English LVC detection than the English corpora than for the Hungarian one, those attained by previous studies, we think that where syntactic features are also encoded in suf- despite the difficulty of the task, our method could fixes, i.e. morphological information. offer promising results for identifying all types of LVCs both in English and in Hungarian. We carried out an error analysis in order to see how our system could be further improved and the errors reduced. We concluded that there were 7 Conclusions some general and language-specific errors as well. In this paper, we introduced our machine learning- Among the general errors, we found that LVCs based approach for identifying LVCs in Hungar- with a rare light verb were difficult to recognize ian and English free texts. The method proved (e.g. to utter a lie). In other cases, an originally to be sufficiently robust as it achieved approxi- deverbal noun was used in a lexicalised sense to- mately the same scores on two typologically dif- gether with a typical light verb ((e.g. buildings ferent languages. The language-specific features are given (something)) and these candidates were further contributed to the performance in both lan- falsely classed as LVCs. Also, some errors in guages. In addition, some language-independent POS-tagging or dependency parsing also led to features were inspired by one of the languages, so some erroneous predictions. a multilingual approach proved to be fruitful in the As for language-specific errors, English verb- case of monolingual LVC detection as well. particle combinations (VPCs) followed by a noun In the future, we would like to improve our sys- were often labeled as LVCs such as make up tem by conducting a detailed analysis of the effect his mind or give in his notice. In Hungar- of each feature on the results. Later, we also plan ian, verb + proper noun constructions (Hamletet to adapt the tool to other types of multiword ex- jatssz´ ak´ (Hamlet-ACC play-3PL.DEF) “they are pressions and conduct further experiments on lan- playing Hamlet”) were sometimes regarded as guages other than English and Hungarian, the re- LVCs since the morphological analysis does not sults of which may further lead to a more robust, make a distinction between proper and common general LVC system. Moreover, we can improve nouns. These language-specific errors may be the method applied in each language by imple- eliminated by integrating a VPC detector and a menting other language-specific features as well. named entity recognition system into the English and Hungarian systems, respectively. Acknowledgments Although there has been a considerable amount of literature on English LVC identification (see This work was supported in part by the European Section 3), our results are not directly comparable Union and the European Social Fund through the to them. This may be explained by the fact that dif- project FuturICT.hu (grant no.: TAMOP-4.2.2.C-´ ferent authors aimed to identify a different scope 11/1/KONV-2012-0013). of linguistic phenomena and thus interpreted the

259 References Kate Kearns. 2002. Light verbs in English. Manuscript. Margarita Alonso Ramos. 2004. Las construcciones con verbo de apoyo. Visor Libros, Madrid. Igor Mel’cuk.ˇ 2004. Verbes supports sans peine. Jurij D. Apresjan. 2004. O semanticeskojˇ nepustote Lingvisticae Investigationes, 27(2):203–217. i motivirovannosti glagol’nyx leksiceskixˇ funkcij. Marton´ Mihaltz,´ Csaba Hatvani, Judit Kuti, Gyorgy¨ Voprosy jazykoznanija , (4):3–18. Szarvas, Janos´ Csirik, Gabor´ Prosz´ eky,´ and Tamas´ Mohammed Attia, Antonio Toral, Lamia Tounsi, Pavel Varadi.´ 2008. Methods and Results of the Hun- Pecina, and Josef van Genabith. 2010. Automatic garian WordNet Project. In Attila Tanacs,´ Dora´ Extraction of Arabic Multiword Expressions. In Csendes, Veronika Vincze, Christiane Fellbaum, and Proceedings of the 2010 Workshop on Multiword Piek Vossen, editors, Proceedings of the Fourth Expressions: from Theory to Applications, pages Global WordNet Conference (GWC 2008), pages 19–27, Beijing, China, August. Coling 2010 Orga- 311–320, Szeged. University of Szeged. nizing Committee. Ross Quinlan. 1993. C4.5: Programs for Machine Colin Bannard. 2007. A measure of syntactic flexibil- Learning. Morgan Kaufmann Publishers, San Ma- ity for automatically identifying multiword expres- teo, CA. sions in corpora. In Proceedings of the Workshop on a Broader Perspective on Multiword Expressions, Ivan A. Sag, Timothy Baldwin, Francis Bond, Ann MWE ’07, pages 1–8, Morristown, NJ, USA. Asso- Copestake, and Dan Flickinger. 2002. Multiword ciation for Computational Linguistics. Expressions: A Pain in the Neck for NLP. In Pro- ceedings of the 3rd International Conference on In- Bernd Bohnet. 2010. Top accuracy and fast depen- telligent Text Processing and Computational Lin- dency parsing is not a contradiction. In Proceedings guistics (CICLing-2002, pages 1–15, Mexico City, of the 23rd International Conference on Computa- Mexico. tional Linguistics (Coling 2010), pages 89–97. Tanja Samardziˇ c´ and Paola Merlo. 2010. Cross-lingual Helena de Medeiros Caseli, Carlos Ramisch, Maria das variation of light verb constructions: Using parallel Grac¸as Volpe Nunes, and Aline Villavicencio. 2010. corpora and automatic alignment for linguistic re- Alignment-based extraction of multiword expres- search. In Proceedings of the 2010 Workshop on sions. Language Resources and Evaluation, 44(1- NLP and Linguistics: Finding the Common Ground, 2):59–77. pages 52–60, Uppsala, Sweden, July. Association for Computational Linguistics. Paul Cook, Afsaneh Fazly, and Suzanne Stevenson. 2007. Pulling their weight: exploiting syntactic Begona˜ Sanroman´ Vilas. 2009. Towards a seman- forms for the automatic identification of idiomatic tically oriented selection of the values of Oper1. expressions in context. In Proceedings of the Work- The case of golpe ’blow’ in Spanish. In David shop on a Broader Perspective on Multiword Ex- Beck, Kim Gerdes, Jasmina Milicevi´ c,´ and Alain pressions, MWE ’07, pages 41–48, Morristown, NJ, Polguere,` editors, Proceedings of the Fourth In- USA. Association for Computational Linguistics. ternational Conference on Meaning-Text Theory – Stefan Evert and Hannah Kermes. 2003. Experiments MTT’09, pages 327–337, Montreal, Canada. Univer- on candidate data for extraction. In Pro- site´ de Montreal.´ ceedings of EACL 2003, pages 83–86. R. Mahesh K. Sinha. 2009. Mining Complex Predi- Afsaneh Fazly and Suzanne Stevenson. 2007. Distin- cates In Hindi Using A Parallel Hindi-English Cor- guishing Subtypes of Multiword Expressions Using pus. In Proceedings of the Workshop on Multiword Linguistically-Motivated Statistical Measures. In Expressions: Identification, Interpretation, Disam- Proceedings of the Workshop on A Broader Perspec- biguation and Applications, pages 40–46, Singa- tive on Multiword Expressions, pages 9–16, Prague, pore, August. Association for Computational Lin- Czech Republic, June. Association for Computa- guistics. tional Linguistics. Suzanne Stevenson, Afsaneh Fazly, and Ryan North. Antton Gurrutxaga and Inaki˜ Alegria. 2011. Auto- 2004. Statistical Measures of the Semi-Productivity matic Extraction of NV Expressions in Basque: Ba- of Light Verb Constructions. In Takaaki Tanaka, sic Issues on Cooccurrence Techniques. In Proceed- Aline Villavicencio, Francis Bond, and Anna Ko- ings of the Workshop on Multiword Expressions: rhonen, editors, Second ACL Workshop on Multi- from Parsing and Generation to the Real World, word Expressions: Integrating Processing, pages 1– pages 2–7, Portland, Oregon, USA, June. Associa- 8, Barcelona, Spain, July. Association for Computa- tion for Computational Linguistics. tional Linguistics. Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Yee Fan Tan, Min-Yen Kan, and Hang Cui. 2006. Pfahringer, Peter Reutemann, and Ian H. Witten. Extending corpus-based identification of light verb 2009. The WEKA data mining software: an update. constructions using a supervised learning frame- SIGKDD Explorations, 11(1):10–18. work. In Proceedings of the EACL Workshop on

260 Multi-Word Expressions in a Multilingual Contexts, Janos´ Zsibrita, Veronika Vincze, and Richard´ Farkas. pages 49–56, Trento, Italy, April. Association for 2013. magyarlanc 2.0: szintaktikai elemzes´ es´ fel- Computational Linguistics. gyors´ıtott szofaji´ egyertelm´ us˝ ´ıtes´ [magyarlanc 2.0: Syntactic parsing and accelerated POS-tagging]. Yulia Tsvetkov and Shuly Wintner. 2010. Extrac- In Attila Tanacs´ and Veronika Vincze, editors, tion of multi-word expressions from small parallel MSzNy 2013 – IX. Magyar Szam´ ´ıtog´ epes´ Nyelveszeti´ corpora. In Coling 2010: Posters, pages 1256– Konferencia, pages 368–374, Szeged. Szegedi Tu- 1264, Beijing, China, August. Coling 2010 Organiz- domanyegyetem.´ ing Committee.

Yuancheng Tu and Dan Roth. 2011. Learning English Light Verb Constructions: Contextual or Statistical. In Proceedings of the Workshop on Multiword Ex- pressions: from Parsing and Generation to the Real World, pages 31–39, Portland, Oregon, USA, June. Association for Computational Linguistics.

Tim Van de Cruys and Begona˜ Villada Moiron.´ 2007. Semantics-based multiword expression extraction. In Proceedings of the Workshop on a Broader Perspective on Multiword Expressions, MWE ’07, pages 25–32, Morristown, NJ, USA. Association for Computational Linguistics.

Veronika Vincze and Janos´ Csirik. 2010. Hungar- ian corpus of light verb constructions. In Proceed- ings of the 23rd International Conference on Com- putational Linguistics (Coling 2010), pages 1110– 1118, Beijing, China, August. Coling 2010 Organiz- ing Committee.

Veronika Vincze, Istvan´ Nagy T., and Gabor´ Berend. 2011a. Detecting Noun Compounds and Light Verb Constructions: a Contrastive Study. In Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World, pages 116–121, Portland, Oregon, USA, June. ACL.

Veronika Vincze, Istvan´ Nagy T., and Gabor´ Berend. 2011b. Multiword expressions and named entities in the Wiki50 corpus. In Proceedings of RANLP 2011, Hissar, Bulgaria.

Veronika Vincze. 2011. Semi-Compositional Noun + Verb Constructions: Theoretical Questions and Computational Linguistic Analyses. Ph.D. thesis, University of Szeged, Szeged, Hungary.

Veronika Vincze. 2012. Light Verb Constructions in the SzegedParalellFX English–Hungarian Paral- lel Corpus. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Ugurˇ Dogan,ˇ Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eight Interna- tional Conference on Language Resources and Eval- uation (LREC’12), Istanbul, Turkey, May. European Language Resources Association (ELRA).

Sina Zarrieß and Jonas Kuhn. 2009. Exploiting Trans- lational Correspondences for Pattern-Independent MWE Identification. In Proceedings of the Work- shop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications, pages 23–30, Singapore, August. Association for Computational Linguistics.

261