Registers in the System of Semantic Relations in Plwordnet
Total Page:16
File Type:pdf, Size:1020Kb
Registers in the System of Semantic Relations in plWordNet Marek Maziarz Stan Szpakowicz Maciej Piasecki Institute of Computer Science Ewa Rudnicka Polish Academy of Sciences Institute of Informatics Warsaw, Poland Wrocław University of Technology & Wrocław, Poland School of Electrical Engineering [email protected] and Computer Science [email protected] University of Ottawa [email protected] Ottawa, Ontario, Canada [email protected] Abstract Lexico-semantic relations are the principal de- terminant of lexical meaning in plWordNet. Mark- Lexicalised concepts are represented in wordnets ing often constrains those relations; some of them by word-sense pairs. The strength of markedness cannot hold between LUs of incompatible regis- is one of the factors which influence word use. ters. Semantics can constrain derivationally based Stylistically unmarked words are largely context- relations, e.g., femininity is limited to the nouns neutral. Technical terms, obsolete words, “of- denoting animals and humans. Among verbs ficialese”, slangs, obscenities and so on are all there are also relations limited to the particular as- marked, often strongly, and that limits their use con- pect or verb class, like hyponymy or inchoativity siderably. We discuss the position of register and (Maziarz et al., 2013, section 4). markedness in wordnets with respect to semantic Registers, semantic domains and verb classes relations, and we list typical values of register. We are attributes. It is far from obvious how to put at- illustrate the discussion with the system of regis- tributive information into an inherently relational ters in plWordNet, the largest Polish wordnet. We structure of a wordnet. Princeton WordNet (PWN) present a decision tree for the assignment of mark- represents semantic domains by synsets in two ing labels, and examine the consistency of the edit- roles: elements of the lexical system and meta- ing decisions based on that tree. information which characterises those elements. To associate a synset with a domain, a domain re- 1 Introduction lation links it with another synset which represents that domain. Attributes of LUs can also be rep- A dense network of lexico-semantic relations is resented by sub-dividing those LUs into classes. the feature that best differentiates a wordnet from This mechanism has been present in PWN from other types of dictionaries and thesauri. Word- the beginning. There are, e.g., separate bases for nets are organised into synsets and lexical units parts of speech or semantic domains, represented (LUs), whose meaning is crucially determined just by the so-called lexicographers’ files. by such relations. The inventories of relations, Registers have a major role to play in shap- usually based on the findings in lexical semantics, ing the structure of plWordNet. We will con- seem largely comparable across wordnets, but spe- tinue the practice of making registers figure in cific definitions and strategies of applications vary. the definitions of lexico-semantic relations, but we Wordnets also vary in the amount of typical dictio- will analyse them, and introduce register values nary information encoded. An apt example of such into plWordNet, very systematically. To begin variation is the treatment of stylistic registers, as with, we need an appropriate set of marking la- well as broad semantic domains, to which a given bels. It should streamline the description of lexico- synset or LU belongs. semantic relations, facilitate future plWordNet ap- plications, and ensure the consistency of the lin- LUs. The construction is discussed in (Maziarz et guists’ decisions. al., 2013); the focus of this paper is on the prop- Section 2 of the paper recaps the model of the erties which help constitutive lexico-semantic re- semantic relation system in plWordNet. Section 3 lations determine synsets. serves as an overview of related work insofar as The constitutive relations in plWordNet are hy- it contributes to our intended use of the marking ponymy, hypernymy, meronymy and holonymy, labels for the enrichment of the wordnet-based de- plus verb-specific relations of presupposition, pre- scription of lexical meaning. Section 4 presents ceding, cause, processuality, state, iterativity and the details of the markedness labelling in plWord- inchoativity (Maziarz et al. (2011) discuss the de- Net. Section 5 reports on a small, carefully ar- tails), and adjective-specific relations of value (of ranged experiment meant to determine how con- an attribute), gradation and modifier (Maziarz et sistent marking can be expected given a precise al., 2012b). They are supplemented by constitu- procedure in the form of a decision tree. Section 6 tive features: verb aspect and semantic class, and offers a few conclusions based on our experience, register. and briefly discusses our expectations for the on- A wordnet describes lexical meaning primarily going development of plWordNet. via semantic relations, so it is important for a con- stitutive relation to be fairly widespread in the net- 2 Constitutive relations and registers work. A high degree of sharing among groups A wordnet is founded on synonymy. Its basic unit, of LUs is necessary because a constitutive rela- the synset, is a group of lexical units (LUs).1 Al- tion underlies grouping LUs into synsets. It also though synonymy is undoubtedly key, wordnets helps if a constitutive relation is well established in vary as to how it is defined and applied. The cre- linguistics: linguists who are wordnet editors will ators of PWN (Miller et al., 1993; Fellbaum, 1998) encounter fewer misunderstanding. Finally, a con- adopt a very strict definition of synonymy usu- stitutive relations which accords with the wordnet ally attributed to Leibniz,2 but realistically make practice will make for better compatibility among it context-dependent. The effect is a take on syn- wordnets (Maziarz et al., 2013). onymy which is linguistically satisfying but insuf- Verbs of different aspect participate in different ficiently accurate: the wordnet authors’ intuition lexico-semantic relations, e.g., a hypernym cannot largely dictates what LUs go into a synset. More- be replaced by the other element of its aspectual over, a synset is often understood as a set of syn- pair. The value of aspect thus constrains selected onymous LUs, while synonymous LUs are under- verb relations (Maziarz et al., 2012a). Semantic stood as elements of the same synset. Such circu- verb classes also restrict links for some verb rela- larity is hard to make operational. tions. The verb classification is based on a Vend- The interdependence of the notions of syn- lerian typology. A hierarchy of verb classes has onymy and synset, and the subjectivity of autho- been implemented in plWordNet as a hypernymy rial judgement, can be avoided. Maziarz et al. hierarchy of artificial lexical units, each naming (2013) propose a different perspective. The LU, a different class. Verbs in a given class are hy- rather than the synset, becomes the basic structure ponyms of the corresponding artificial LU. in plWordNet. As Vossen (2002) notes, the cen- Stylistic registers have been introduced into tral relations – synonymy, hyponymy, hypernymy, plWordNet relation definitions; they appear in meronymy and holonymy – are lexical: they hold guidelines and in some substitution tests. With ev- between words, not between concepts. A PWN ery editing decision a linguist must recognise the synset denotes a lexicalized concept, and con- registers of the LUs and synsets to be linked. The ceptual relations link synsets, but those relations marking labels represent pragmatic features of LU have a lexico-semantic origin. Our model derives usage, so it seems natural to have register values synset content and synonymy from a carefully encoded explicitly. constructed set of constitutive relations between A synset in plWordNet is a set of lexical units 1We understand the lexical unit informally as a lemma- which are connected to the rest of the network by sense pair. the same set of instances of constitutive relations, 2“Two words are said to be synonyms if one can be used in a statement in place of the other without changing the mean- and have compatible values of the constitutive fea- ing of the statement.” tures. Note how this definition does not refer to synonymy. Once synset membership has been de- cided, its elements are understood to be in the re- urządzenie elektroniczne lation of synonymy. `electronic device’ It now becomes crucial to recognise accurately the connectivity afforded by the constitutive re- lations. Linguists who build the wordnet are as- komputer mózg elektronowy sisted by conditions in the definitions of relations `computer’ `electronic brain’ (such conditions often refer to registers and se- mantic classes) and by substitution tests. Vossen (2002) discusses tests for semantic correspon- dence, which did not take into account the differ- komputer cyfrowy hyponymy ences in register or usage, often essential for the `digital computer’ inter-register synonymy possibility of contextual interchangeability. Lexical units which have nearly the same sense but significantly differ in register are put into sep- Figure 1: Inter-register synonymy between LUs arate synsets, but the proximity is not lost: those from different registers. synsets become linked by inter-register synonymy. That relation is weaker than synonymy with re- tive principles of the ontology. The wordnet de- spect to sharing. Synsets linked by inter-register scribes lexico-semantic relation of varying, possi- synonymy share a hypernym, but not hyponym bly complex, background and origin, while the on- sets, and clearly have different register values. tology mapping shows a possible relation between Consider an example: komputer ‘computer’ the lexical system and the internal cognitive struc- has an obsolete inter-register synonym mózg elek- ture of concepts. A potential plus is the possibility tronowy ‘electronic brain’. Figure 1 shows hy- of considering different ontologies as the mapping ponymy to urz ˛adzenieelektroniczne ‘electronic target, and so different interpretations of the lexi- 3 device’, which is shared. There is, however, a cal dependencies.