On the Compositionality and Semantic Interpretation of English Noun Compounds
Corina Dima
Collaborative Research Center 833, University of Tübingen, Germany
[email protected]

Proceedings of the 1st Workshop on Representation Learning for NLP, pages 27-39, Berlin, Germany, August 11th, 2016. © 2016 Association for Computational Linguistics

Abstract

In this paper we present a study covering the creation of compositional distributional representations for English noun compounds (e.g. computer science) using two compositional models proposed in the literature. The compositional representations are first evaluated based on their similarity to the corresponding corpus-learned representations, and then on the task of automatic classification of semantic relations for English noun compounds. Our experiments show that compositional models are able to build meaningful representations for more than half of the test set compounds. However, using pre-trained compositional models does not lead to the expected performance gains on the semantic relation classification task. Models using compositional representations perform similarly to a basic classification model, despite the advantage of being pre-trained on a large set of compounds.

1 Introduction

Creating word representations for multiword expressions is a challenging NLP task. The challenge comes from the fact that these constructions have "idiosyncratic interpretations that cross word boundaries (or spaces)" (Sag et al., 2002). A good example of such challenging multiword expressions are noun compounds (e.g. finger nail, health care), where the meaning of a compound often involves combining some aspect or aspects of the meanings of its constituents.

Over the last few years, distributed word representations (Collobert et al., 2011b; Mikolov et al., 2013; Pennington et al., 2014) have proven very successful at representing single-token words, and there have been several attempts at creating compositional distributional models of meaning for multi-token expressions, in particular adjective-noun phrases (Baroni and Zamparelli, 2010), determiner phrases (Dinu et al., 2013b) or verb phrases (Yin and Schütze, 2014).

Studying the semantics of multiword units, and in particular the semantic interpretation of noun compounds, has been an active area of research in both theoretical and computational linguistics. Here, one line of research has focused on understanding the mechanism of compounding by providing a label for the relation between the constituents (e.g. in finger nail, the nail is PART OF the finger), as in (Ó Séaghdha, 2008; Tratz and Hovy, 2010), or by identifying the preposition in the preferred paraphrase of the compound (e.g. nail of the finger), as in (Lauer, 1995).

In this paper, we explore compositional distributional models for English noun compounds, and analyze the impact of such models on the task of predicting the compound-internal semantic relation given a labeled dataset of compounds. At the same time, we analyze the results of the compositional process through the lens of the semantic relation annotation, in an attempt to uncover compounding patterns that are particularly challenging for the compositional distributional models.

2 Context and Compound Interpretation

There are two possible settings for compound interpretation: out-of-context interpretation and context-dependent interpretation.

Bauer (1983, pp. 45) describes a continuum of types of complex words, arranged with respect to their formation status and to how dependent their interpretation is on the context: (i) "nonce formations, coined by a speaker/writer on the spur of the moment to cover some immediate need", where there is a large ambiguity with respect to the meaning of the compound which cannot be resolved without the immediate context (e.g. Nakov's (2013) example compound plate length,
for which a possible interpretation in a given context could be what your hair is when it drags in your food); (ii) institutionalized lexemes, whose potential ambiguity is canceled by the frequency of use and familiarity with the term, and whose more established meaning can be inferred based on the meanings of the constituents and prior world experience, without the need for an immediate context (e.g. orange juice); (iii) lexicalized lexemes, where the meaning has become a convention which cannot be inferred from the constituents alone and can only be successfully interpreted if the term is familiar or if the context provides enough clues (e.g. couch potato, which denotes not a potato but a person who exercises little and spends most of the time in front of a TV).

The available datasets we use (described in Section 3) are very likely to contain some very low frequency items of type (i), whose actual interpretation would necessitate taking the immediate context into account, as well as some highly lexicalized compounds of type (iii), where the meaning can only be deduced from context. Nevertheless, because of a lack of annotated resources that provide the semantic interpretation of a compound together with its context, we will focus on the out-of-context interpretation of compounds.

3 Datasets

3.1 English Compound Dataset for Compositionality

The English compound dataset used for the composition tests was constructed from two existing compound datasets and a selection of the nominal compounds in the WordNet database. The first existing compound dataset was described in (Tratz, 2011) and contains 19158 compounds; it is part of the semantically-enriched parser described in (Tratz, 2011), which can be obtained from http://www.isi.edu/publications/licensed-sw/fanseparser/. The second existing compound dataset was proposed in (Ó Séaghdha, 2008) and contains 1443 compounds, available at http://www.cl.cam.ac.uk/~do242/Resources/1443_Compounds.tar.gz.

Additional compounds were collected from the WordNet 3.1 database files (available at http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz), more specifically from the noun database file data.noun. The WordNet compound collection process involved 3 steps: (i) collecting all candidate compounds, i.e. words that contained an underscore or a dash (e.g. abstract entity, self-service); (ii) filtering out candidates that included numbers or dots, or had more than 2 constituents; (iii) filtering out candidates where either one of the constituents had a part-of-speech tag that was different from noun or verb. The part-of-speech tagging of the candidate compounds was performed using spaCy, a Python library for advanced natural language processing (https://spacy.io/). The reason for allowing both noun and verb as accepted part-of-speech tags was that, given the extremely limited context available when PoS-tagging a compound, the tagger would frequently label as verb multi-sense words that were actually nouns in the given context (e.g. eye drop, where drop was tagged as a verb). The final compound list extracted from WordNet 3.1 contained 18775 compounds.

The compounds collected from all three resources were combined into one list. The list was deduplicated and filtered for capitalized compounds (the Tratz (2011) dataset contained a small number of person names and titles). A final filtering step removed all the compounds where either of the two constituents or the compound itself did not have a minimum frequency of 100 in the support corpus (presented later, in Section 4.1). The frequency filtering step was motivated by the assumption that the compositional process can be better modeled using "well-learned" word vectors that are based on a minimum number of contexts.

The final dataset contains 27220 compounds, formed through the combination of 5335 modifiers and 4761 heads. The set of unique modifiers and heads contains 7646 words, with 2450 words appearing both as modifiers and as heads. The dictionary for the final dataset therefore contains 34866 unique words. The dataset was partitioned into train, test and dev splits containing 19054, 5444 and 2722 compounds respectively.
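The three WordNet collection steps above can be sketched as a small filtering function. This is a minimal sketch under assumptions: the function name and inputs are illustrative, and the `pos_tags` mapping stands in for the spaCy tagger used in the paper.

```python
import re

def collect_candidates(wordnet_lemmas, pos_tags):
    """Filter WordNet lemmas down to two-constituent compound candidates,
    following the three steps of Section 3.1.  `pos_tags` maps each word
    to a PoS tag; in the paper this role is played by the spaCy tagger
    (the dictionary here is a stand-in for illustration)."""
    candidates = []
    for lemma in wordnet_lemmas:
        # (i) keep only multiword lemmas, i.e. those written with an
        #     underscore or a dash (e.g. abstract_entity, self-service)
        if "_" not in lemma and "-" not in lemma:
            continue
        constituents = re.split(r"[_-]", lemma)
        # (ii) drop candidates with numbers or dots, or more than 2 parts
        if len(constituents) != 2:
            continue
        if any(re.search(r"[0-9.]", c) for c in constituents):
            continue
        # (iii) drop candidates where a constituent is tagged as anything
        #       other than NOUN or VERB (VERB is allowed because taggers
        #       often mislabel nouns in such short contexts, e.g. eye drop)
        if any(pos_tags.get(c) not in ("NOUN", "VERB") for c in constituents):
            continue
        candidates.append(" ".join(constituents))
    return candidates
```

For example, with tags `{"eye": "NOUN", "drop": "VERB", "abstract": "ADJ", "entity": "NOUN"}`, the lemma `eye_drop` is kept while `abstract_entity` is filtered out by step (iii).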
3.2 English Compound Datasets for Semantic Interpretation

The Tratz (2011) dataset and the Ó Séaghdha (2008) dataset are both annotated with semantic relations between the compound constituents. The Tratz (2011) dataset has 37 semantic relations and 19158 compounds. The Ó Séaghdha (2008) dataset has 1443 compounds annotated with 6 coarse relation labels (ABOUT, ACTOR, BE, HAVE, IN, INST). Appendix A lists the relations in the two datasets together with some example annotated compounds.

For both datasets a small fraction of the constituents had to be recoded to the artificial underscore-based form described in Section 4.1, in order to maximize the coverage of the word representations for the constituents (e.g. database was recoded to data base).

4 Composition Models for English Nominal Compounds

A common view of natural language regards it as being inherently compositional. Words are combined to obtain phrases, which in turn combine to create sentences. The composition continues to the paragraph, section and document levels. It is this defining trait of human language, its compositionality, [...]

The composition models are trained to minimize the mean squared error between the composed and the corpus-induced representations:

J_{MSE} = \frac{1}{n_c} \sum_{i=1}^{n_c} \sum_{j=1}^{n} \left( p_{ij}^{composed} - p_{ij}^{corpus} \right)^2

where n_c is the number of compounds in our dataset.

Previous studies like (Guevara, 2010; Baroni and Zamparelli, 2010) evaluate their proposed composition functions on training data created using the following procedure: first, they gather a set of word pairs to model. Then, a large corpus is used to construct distributional representations both for the word pairs as well as for the individual words in each pair. In order to derive word pair representations, the corpus is first pre-processed such that all the occurrences of the word pairs of interest are linked with the underscore character '_'.
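The underscore-linking pre-processing just described can be sketched as follows. This is a minimal sketch: the function name and the plain regular-expression rewrite are illustrative assumptions, not the paper's implementation.

```python
import re

def link_pairs(corpus_text, pairs):
    """Rewrite every occurrence of a word pair of interest as a single
    underscore-linked token, so that a distributional model can later
    learn one vector per pair (e.g. "finger nail" -> "finger_nail").
    `pairs` is an iterable of (modifier, head) tuples."""
    for w1, w2 in pairs:
        pattern = r"\b{}\s+{}\b".format(re.escape(w1), re.escape(w2))
        corpus_text = re.sub(pattern, "{}_{}".format(w1, w2), corpus_text)
    return corpus_text
```

For example, `link_pairs("a finger nail needs health care", [("finger", "nail"), ("health", "care")])` returns `"a finger_nail needs health_care"`.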
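The J_MSE objective from Section 4 can be sketched with NumPy. The concatenation-based composition function below is an assumption for illustration (one common choice in the composition literature), not necessarily the paper's exact model; only the loss mirrors the formula in the text.

```python
import numpy as np

def compose(U, V, W):
    """Compose modifier vectors U and head vectors V (each of shape
    (n_c, n)) into compound vectors by applying a learned weight matrix
    W of shape (n, 2n) to the concatenated [modifier; head] vectors.
    This particular composition function is an illustrative assumption."""
    return np.concatenate([U, V], axis=1) @ W.T

def j_mse(P_composed, P_corpus):
    """Mean squared error between composed and corpus-learned compound
    representations, averaged over the n_c compounds:
    J_MSE = (1/n_c) * sum_{i,j} (p_ij^composed - p_ij^corpus)^2."""
    n_c = P_composed.shape[0]
    return np.sum((P_composed - P_corpus) ** 2) / n_c
```

Training then amounts to choosing the parameters of `compose` (here, `W`) so that `j_mse` is minimized over the training split of the compound dataset.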