Evaluating Language Models for the Retrieval and Categorization of Lexical Collocations

Evaluating language models for the retrieval and categorization of lexical collocations Luis Espinosa-Anke1, Joan Codina-Filba`2, Leo Wanner3;2 1School of Computer Science and Informatics, Cardiff University, UK 2TALN Research Group, Pompeu Fabra University, Barcelona, Spain 3Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain [email protected], [email protected] Abstract linguistic phenomena (Rogers et al., 2020). Re- cently, a great deal of research analyzed the degree Lexical collocations are idiosyncratic com- to which they encode, e.g., morphological (Edmis- binations of two syntactically bound lexical ton, 2020), syntactic (Hewitt and Manning, 2019), items (e.g., “heavy rain”, “take a step” or or lexico-semantic structures (Joshi et al., 2020). “undergo surgery”). Understanding their de- However, less work explored so far how LMs in- gree of compositionality and idiosyncrasy, as well their underlying semantics, is crucial for terpret phraseological units at various degrees of language learners, lexicographers and down- compositionality. This is crucial for understanding stream NLP applications alike. In this paper the suitability of different text representations (e.g., we analyse a suite of language models for col- static vs contextualized word embeddings) for en- location understanding. We first construct a coding different types of multiword expressions dataset of apparitions of lexical collocations (Shwartz and Dagan, 2019), which, in turn, can be in context, categorized into 16 representative useful for extracting latent world or commonsense semantic categories. Then, we perform two experiments: (1) unsupervised collocate re- information (Zellers et al., 2018). trieval, and (2) supervised collocation classi- One central type of phraselogical units are fication in context. We find that most models lexical collocations, defined as restricted co- perform well in distinguishing light verb con- occurrences of two syntactically bound lexical structions, especially if the collocation’s first items (Kilgarriff, 2006), such that one of the items argument acts as a subject, but often fail to (the base) conditions the selection of the other item distinguish, first, different syntactic structures (the collocate) to express a specific meaning. For within the same semantic category, and second, finer-grained categories which restrict the set instance, the base lecture conditions the collocates of correct collocates1. give or deliver to express the meaning ‘perform’, the base applause conditions the selection of the 1 Introduction collocate thunderous to express the meaning ‘intense’, and so on. Lexical collocations are of high Language models (LMs) such as BERT (Devlin relevance to lexicography, NLP and second lan- et al., 2018), and its variants SpanBERT (Joshi guage learning alike, and constitute a challenge for et al., 2020), ALBERT (Lan et al., 2019), RoBERTa computational models because of their heterogene- (Liu et al., 2019), etc. have proven extremely flexi- ity in terms of idiosyncrasy and degree of semantic ble, as they behave as unsupervised multitask learn- composition (Mel’cukˇ , 1995). ers (Radford et al., 2019), and can be leveraged In this paper, we analyze a suite of LMs in the in a wide array of NLP tasks almost out-of-the- context of two tasks that involve lexical colloca- box (see, e.g., the GLUE and SuperGLUE results tion modeling. First, unsupervised collocate re- in Wang et al.(2019b) and Wang et al.(2019a), trieval, where we mask a collocation’s collocate respectively). They have also been harnessed as (e.g., “heavy” in “heavy rain”), and quantify how supporting resources for knowledge-based NLP well a LM of choice (BERT in particular) pre- (Petroni et al., 2019), as they capture a wealth of dicts, via its masked language modeling (MLM) objective, a valid collocate for that particular base 1The resources associated with this paper are available at https://github.com/luisespinosaanke/ (f“heavy”, “torrential”, “violent”, . g for the lexicalcollocations. base “rain” and the meaning intense). Second, su- 1406 Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, pages 1406–1417 April 19 - 23, 2021. ©2021 Association for Computational Linguistics pervised in-context collocation categorization, 2.1 Conditioning MLMs where we fine-tune LMs on the task of predict- A Masked Language Model (MLM) can be used as ing a semantic category of a collocation in terms a proxy for gaining insights into how language is of its lexical function (LF), given its sentential con- encoded by the weights of the (usually transformer- text; cf. Section 3.1. Modeling, recognizing, and based) LM architecture. Moreover, simply asking classifying collocations in corpora has obvious ap- an LM to predict words in context (without task- plications for automatically creating and expand- specific fine-tuning) has proved useful in NLP ap- ing lexicographic resources, as well as for various plications dealing with lexical items (affixes, words downstream NLP applications, among them, e.g., or phrases). For example, Wu et al.(2019) use machine translation (Seretan, 2014), word sense BERT’s MLM for augmenting their training data disambiguation (Maru et al., 2019), or natural lan- in sentiment analysis tasks; Qiang et al.(2019) use guage generation (Wanner and Bateman, 1990). BERT for lexical simplification by conditioning The two main contributions of this paper thus are: the predictions over the [MASK] token by provid- 1. A “collocations-in-context” dataset, with in- ing the original sentence as context; and Zhou et al. stances of collocations of 16 different seman- (2019) obtain SotA results in lexical substitution by tic categories (in terms of LFs) in context, and conditioning BERT via embedding dropout on the with a fixed and lexical (i.e., no overlapping) target (unmasked) word. Inspired by the findings train/dev/test split (Section3). in these works (especially Qiang et al.(2019)), we will explore the predictions of BERT over masked 2. An evaluation framework for assessing the lexical collocations (with and without conditioning) degree of compositionality of lexical collo- in Section4, with the aim to understand whether cations, pivoting around two tasks: unsuper- these predictions can be used to measure the id- vised collocate retrieval (Section4) and in- iosyncrasy of the underlying semantics of a lexical context collocation categorization (Section5). collocation, i.e., whether the restrictions imposed by a collocation’s base are due to the frozenness Our results suggest that modeling collocations of the phrase itself or, on the contrary, sentential in context is a challenge, even for widely used context is neccessary. LMs, and that this is particularly true for less semantic (and thus less compositional and more id- 2.2 Distributional Lexical Composition iosyncratic) collocations. We also find that jointly Building representations that account for non- recognizing the semantics and the syntactic struc- compositional meanings within the broader spec- ture (e.g., whether the collocate acts as subject or trum of encoding semantic relations between words object in verbal constructions) of a collocation also is a long-standing problem in computational seman- constitutes non-trivial challenges for current archi- tics (Baroni and Zamparelli, 2010; Mitchell and La- tectures. Moreover, as a byproduct of our analysis, pata, 2010; Boleda et al., 2013). Interestingly, there we also find an interesting behaviour in LMs when seems to be little agreement on how these repre- modeling antonymy in adjectives, specifically that sentations should be defined, with recent attempts their representations undergo substantial transfor- focusing on verbal multiword expressions (see an mations as they flow through BERT’s transformer overview of approaches in Ramisch et al.(2018)), layers, with many contextualized embeddings clus- phrases of variable length encoded via LSTMs, tered together in the tip of a narrow cone that seems based on their definitions (Hill et al., 2016), or ar- to represent adjectives in collocations denoting in- bitrary lexical and commonsense relations between tensity (“heavy” rain) and weakness (“minor” is- word pairs for downstream NLP. As a testimony of sue). the broad methods explored in the most recent liter- 2 Related Work ature, let us refer to, for instance, the combination of word vector averages with conditional autoen- In this section, we discuss related works in two coders (Espinosa-Anke and Schockaert, 2018), ex- methodological areas that are relevant to this pa- pectation maximization (Camacho-Collados et al., per, namely conditioning MLMs (Section 2.1) and 2019), LSTMs for predicting word pair contexts recognition of multiword epressions (MWEs) (Sec- (Joshi et al., 2019), and explicit encoding of gener- tion 2.2). alized lexico-syntactic patterns (Washio and Kato, 1407 LF BERT input Masked sentence Pred. collocates Orig. collocate Masked sent iran feared that the u.s and israel may [MASK] an air raid on its controversial nuclear facilities perform, conduct, mount, launch Oper1 Orig sent [SEP] iran feared that the u.s and israel may launch an air raid on its controversial nuclear facilities [SEP] launch, conduct, order launch Masked sent iran feared that the u.s and israel may [MASK] an air raid on its controversial nuclear facilities Masked sent if that happens, lindsey will use explosive triggers to [MASK] the landing gear destroy, stop,

Evaluating Language Models for the Retrieval and Categorization of Lexical Collocations

D1.3 Deliverable Title: Action Ontology and Object Categories Type

Collocation Classification with Unsupervised Relation Vectors

Comparative Evaluation of Collocation Extraction Metrics

Collocation Extraction Using Parallel Corpus

Collocation Classification with Unsupervised Relation Vectors

Dish2vec: a Comparison of Word Embedding Methods in an Unsupervised Setting

Collocation Segmentation for Text Chunking

17058 FULLTEXT.Pdf (1.188Mb)

Exploratory Search on Mobile Devices

Automatic Extraction of Synonymous Collocation Pairs from a Text Corpus

Comparison of Latent Dirichlet Modeling and Factor Analysis for Topic Extraction: a Lesson of History

Sahala & Linden 2020. Improving Word Association Measures in Repetitive Corpora with Context Similarity Weighting. KDIR