Phraseme Analysis and Concept Analysis: Exploring a Symbiotic Relationship in the Specialized Lexicon
Ingrid Meyer and Kristen Mackintosh
University of Ottawa, Canada

Abstract

This paper analyzes ways in which phraseme analysis can facilitate concept analysis, and vice versa, in terminography work. We compare the phraseology of a number of conceptually related terms with conceptual information in our terminological knowledge base on optical storage technologies. We propose that a better understanding of phraseme-concept relations is important for both knowledge- and corpus-based approaches to terminography, approaches which we believe will merge in the next generation of term banks.

1. Introduction

This paper is concerned with phraseology as it pertains to terminography, by which we mean the identification, analysis and description of terms (=specialized lexical items). With the help of specialized texts and interviews with domain (=subject-field) experts, the terminographer carries out three basic tasks, which are not strictly sequential.

1) Identification of object of study. The terminographer circumscribes the domain and finds related domains; he identifies the principal concepts and terms, and eliminates lexical items belonging to general language.

2) Analysis. The terminographer analyzes the specialized corpus from both a conceptual and linguistic point of view. On the conceptual side, the concepts' principal attributes and relations (collectively, characteristics) are determined, a process which goes hand-in-hand with building up the conceptual structure of the domain, and mapping out links between these systems and those of related domains. On the linguistic side, various aspects of the terms are identified, such as collocational behaviour, grammatical features, and usage restrictions.

3) Synthesis.
The terminographer's findings are typically presented in the form of a paper-based specialized dictionary or a term bank (=terminological database), which may be unilingual or multilingual. Definitions may be hand-crafted by the terminographer or taken from specialized texts. The terminographer may occasionally be required to propose a neologism when a concept has not yet been lexicalized. In working environments where standardization is essential, the terminographer may be required to propose a preferred term when competing candidates exist.

340 Euralex 1994

In this paper, we restrict ourselves to the second of these three tasks. Our goal is to identify some of the ways in which phrasemes are linked to the analysis component of terminography, and more specifically, to the concept analysis component. By concept analysis, we mean the process of discovering and representing the conceptual structures underlying the terms of a domain. Concept analysis is the foundation of terminography: without some understanding of the conceptual structure of a domain, it is impossible to carry out important linguistic tasks such as constructing definitions, dealing with quasi-synonyms, creating neologisms, etc.

Explained simply,[1] concept analysis has three goals:

1) to establish concept systems within the domain, and links between these systems and those of related domains;
2) to develop conceptual frames (explained in 2.2 below) for the terms of the domain, by analyzing the attributes and relations of the concepts (this task goes hand-in-hand with the first);
3) to discriminate between closely related concepts.

Multilingual work, which we do not deal with here, entails a fourth task of matching concepts between languages.

For our purposes, we will take phrasemes to include noun compounds (in the sense of Sager 1990:55-79) and collocations (in the sense of Benson et al 1986).
We realize that these are different, in that a compound normally designates a single concept while a collocation does not. However, they share important relations to conceptual structure, and hence are grouped together here. We will examine phraseology from a practical angle, outlining ways in which phrasemes can help the terminographer with the conceptual side of analysis work, and conversely, ways in which the results of concept analysis can help the terminographer deal with phrasemes.

1.1 Motivation for this research

A better understanding of the relationship between concept and phraseme analysis has implications for the following two research areas:

1) A knowledge-based approach to terminography. Despite its importance, concept analysis remains largely unformalized, as is evident when one consults general textbooks on terminology. Here, it is common to find more space devoted to the importance of concept systems than to methods for constructing them. Concept analysis will likely never become an exact science; however, we believe that by exploiting the many regularities in 1) the way that phrasemes encode conceptual information, and 2) the way that conceptual structures generate phrasemes, we can at least develop better guidelines to incorporate into textbooks.

Increased formalization of concept analysis has implications beyond the didactic, however. The computerized terminological lexicon of the future needs to be rich not only in linguistic data, but also in domain knowledge. We have termed this model of the specialized lexicon a terminological knowledge base (TKB) (Meyer et al 1992a/b). Very simply, a TKB can be described as a hybrid between a conventional term bank (containing all the strictly linguistic information one finds therein) and a knowledge base, as this concept is known in Artificial Intelligence.
A TKB would function not only as a dictionary, but as a general knowledge resource, an invaluable asset to language professionals (writers, translators) and others dealing with specialized texts (students, information retrieval specialists, software engineers), as well as computer systems (machine translation, natural language processing).

2) A corpus-based approach to terminography. The increasing availability of specialized on-line corpora, and the parallel development of corpus analysis tools, offer exciting potential for facilitating one of the terminologist's most labour-intensive tasks: identifying the specialized lexical items (both single- and multi-word) for a given domain. This job used to be (and often still is) done by manually scanning texts. The new corpus analysis tools, in contrast, offer the possibility of extracting phrasemes (and their immediate contexts) automatically. As phrasemes are becoming easier to acquire, terminographers need to get clearer on what to do with the phrasemes thus extracted. A certain amount of research[2] has recently targeted the problem of how to classify phrasemes; however, despite a few notable exceptions,[3] very little work has addressed the question of their relationship to the process of terminography. This paper aims at helping to fill this gap.

1.2 Methodology

This paper is related to a broader terminography project for the domain of optical storage technologies. This particular investigation, however, is limited to two interrelated concept systems, optical storage media and optical disc production processes, the core concepts of which are illustrated in Figure 1. Consistent with the first step in a terminographer's methodology (identification of object of study), we first identified three major subcategories of optical disks (CD-ROMs, WORMs and erasable disks), and established that there was some kind of relationship between mastering and CD-ROMs.
We also knew that the domain of optical storage technologies had close links to audio recording and paper-based publishing, which we term ancestral domains. This rough conceptual profile, however, needed to be filled out in several ways: the links to ancestral domains had to be clarified, our understanding of most concepts had to be sharpened, and in particular, the concept mastering (which we later learned was actually three separate concepts) had to be developed.

The goal of our investigation was to discover what roles phrasemes could play in fleshing out this conceptual skeleton, and conversely, what roles even a rough conceptual structure could play in facilitating phraseme analysis. To start with, we identified the phrasemes associated with five terms from this concept system, using a one-million word corpus[4] and the concordancing tool TACT, developed at the University of Toronto. Our search terms were as follows (we included all spelling and morphological variants and senses): optical disc, CD-ROM, erasable (and its synonym rewritable), WORM, and master.

The phrasemes found for these terms were compared with partially completed knowledge base entries for the associated concepts in our TKB. Essentially, for each phraseme identified, we asked ourselves the question: "Can this phraseme augment the existing TKB?", and conversely, "Can the conceptual information in the TKB facilitate our analysis of the phraseme?" Our observations for these two questions, respectively, are found in Sections 2 and 3 below.

2. Usefulness of phraseme analysis for concept analysis

As mentioned above, concept analysis has three goals: 1) to establish concept systems within the domain and links with related domains; 2) to develop conceptual frames for the lexical items of the domain; 3) to discriminate between closely related concepts. The usefulness of phraseology for each of these is discussed in turn below.
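
The concordance search described in Section 1.2 can be illustrated with a short sketch. The original work used TACT; the Python below is only an assumed stand-in for that kind of keyword-in-context search. The variant patterns, the function name kwic, and the sample sentences are all hypothetical, since the paper does not list the exact spelling and morphological variants that were searched.

```python
import re

# Hypothetical variant patterns for the five search terms named in Section 1.2.
# The actual variant lists used with TACT are not given in the paper.
SEARCH_PATTERNS = {
    "optical disc": r"optical\s+dis[ck]s?",
    "CD-ROM": r"CD-?ROMs?",
    "erasable": r"erasable|rewritable",
    "WORM": r"WORMs?",
    "master": r"master(?:s|ed|ing)?",
}

# Invented sample text, not taken from the authors' corpus.
SAMPLE = ("The erasable optical disk stores data. "
          "A CD-ROM mastering facility produces the master disc.")


def kwic(text, pattern, width=3):
    """Return (left context, hit, right context) for each match of `pattern`.

    `width` is the number of whitespace-separated tokens kept on each side.
    Matching ignores case, so "WORM" would also match the general-language
    word "worm"; a real search would need to separate those senses by hand.
    """
    regex = re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)
    rows = []
    for match in regex.finditer(text):
        # Re-splitting the text per match is simple but slow on large corpora.
        left = text[:match.start()].split()[-width:]
        right = text[match.end():].split()[:width]
        rows.append((" ".join(left), match.group(0), " ".join(right)))
    return rows


if __name__ == "__main__":
    for term, pattern in SEARCH_PATTERNS.items():
        for left, hit, right in kwic(SAMPLE, pattern):
            print(f"{term:>12} | {left:>30} [{hit}] {right}")
```

Each row pairs a search term with its immediate context, which is the raw material from which candidate phrasemes would then be picked out and compared with the corresponding TKB entry.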

2.1 Establishing concept systems and links with related domains

Compounds constitute the most obvious