International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_3-1 © Springer-Verlag Berlin Heidelberg 2015

Historical principles vs. synchronic approaches

Judy Pearsall* Oxford University Press, Oxford, UK

Abstract

Although there have been numerous studies of both synchronic and historical lexicographical features of dictionaries, surprisingly few of them have been directly comparative. In this chapter, a number of questions are addressed. What is the context and background for synchronic and historical approaches in dictionaries, and what does it mean to make this distinction in the first place? What are the key distinguishing features of synchronic and historical descriptions, and why do they matter, for the user or for anyone else? Also, what are the individual challenges and issues in the two approaches? As well as giving an overview of some key moments in the development of the debate, the chapter approaches these questions by means of a case study of the word “capital” and an analysis of specific lexicographical features. The examples are drawn from English language lexicography and are focused on general dictionaries (as opposed to dictionaries specially designed for children or learners of English), but there is an expectation that many of the observations can be generalized to apply to other languages and lexicographical contexts.

Introduction: Context and Background

For the purposes of this chapter, the terms “historical” and “synchronic” are defined with respect to dictionaries as follows: a dictionary which follows a “synchronic” approach is one which is concerned primarily with the language as it exists at a particular time (in practice, the present day). A “historical” dictionary or a dictionary with a “diachronic” approach is one which is concerned primarily with language as it has developed and evolved through time. The term “historical” is used in preference to “diachronic,” given that it was the term applied in the nineteenth century to dictionaries and language study. The habitual use of the contrastive terms “synchronic” and “historical” (or “diachronic”) in relation to language (and dictionaries) is relatively recent. The nineteenth century was characterized by the growth in historical linguistics and the establishment of some of the greatest dictionaries on historical principles, including the Grimm brothers’ Deutsches Wörterbuch (first volumes published 1854, initiated in the 1830s) and the Oxford English Dictionary (OED, published 1884–1928, initiated in the late 1850s). In English, the term “historical” with reference to language study is recorded in the first half of the nineteenth century (see OED’s sense 2c, and the citation from 1832). In this context, study of language was seen as an account of evolutionary development, whose aim was tracing words from their earliest origins to the present day. Historical dictionaries were, and are, primarily documentary accounts rather than practical tools for using the language. According to Richard C. Trench, author of the influential paper On some deficiencies in our English Dictionaries: “A Dictionary is an historical monument, the history of a nation contemplated from one point of view” (Trench 1860, p. 6). However, it would not be true to suggest that the editors of these dictionaries did not consider the interplay between diachronic and synchronic.
In his “General Explanations,” James Murray describes the temporal nature of “the living language” as well as the impermanence of any synchronic view of the language, which he describes as “no more permanent in its constitution than definite in its extent” (Murray 1884, p. xviii). But it was the relationship between the current language and the past language that was important, as was understanding how the current language is informed by its history. Moreover, the historical perspective was the only one, or at least the only one that was regarded as academically – and to some extent morally – justified. It was important that the literary greats of the past were read and absorbed in order to enrich and enlighten the present population’s understanding of words today, or as Trench writes, “[in words] there are boundless stores of moral and historic truth, and no less of passion and imagination laid up” (Trench 1851). Not only did words come to have meaning for us only by means of their history, but also “The study of language is ...the most potent means ...for planting us in the true past of our country” (Trench 1860, p. 8).

It was not until the work of Ferdinand de Saussure and the development of structural linguistics in the twentieth century that the case was put forward for a serious study of language as a system in itself, operating at any particular point in time and without reference to a temporal dimension, which functioned through the interrelations of the component parts and could be studied as a formal construct. Where words came from or how they had been used in the past was less important than how words related to each other and how meaning was constructed through those relations. In English, the words “diachronic” and “synchronic” were borrowed directly from the French as used by Saussure and recorded in his posthumous Cours de linguistique générale (1916). Structuralism was hugely influential in linguistics and anthropology in the early and mid-twentieth century.

*Email: [email protected]
Even if much of Saussure’s thinking has now been superseded, the work of the structuralists was important in changing the terms of the debate and overturning the strongly held assumption that linguistic study was necessarily historical, thereby resetting the debate for the emergence of synchronic theories, especially theories of meaning, which would have a greater influence on lexicography later on. Of course, dictionaries which were synchronic in their approach and content had existed for many years, but they were regarded as functional tools rather than serious contributions to the field of language study. They were considered less important intellectually for being merely useful, “primers” for the ordinary man and woman to enable them to understand the current language and communicate effectively. Funk & Wagnalls’ A Standard Dictionary of the English Language was a large and ambitious dictionary of current (American) English, but the professed approach was functional, almost apologetically so, rather than academic, as indicated in the Preface:

It has been thought better not to follow a system simply because it is logical or philosophically correct, if practically it hinders rather than helps the inquirer. (F&W 1893, p. xi)

If structural linguistics was a necessary foundation that gave academic (if not moral) credence to synchronic approaches and allowed new discourses to emerge, any influence on dictionaries was not seen until much later. Several theoretical developments, in particular those related to theories of meaning, were important in terms of their application and use in dictionaries.
From Wittgenstein’s philosophy of language to the emergence of corpus linguistics in the late 1960s and Eleanor Rosch’s prototype theory in the 1970s, these developments had features in common: first, they were empirical, and second, they located meaning in the usage and behavior of language itself rather than in any external reference or abstract “langue.” Wittgenstein’s contribution, for example, came from his later writing in Philosophical Investigations (1953), and his idea of “family resemblance,” in which he challenged the well-established Classical notion that words had fixed definite meanings (and, in contrast to his earlier writing, that they had a direct relationship to objects in the real world). Taking the idea of resemblance between family members, he showed how the word “game” encompassed a number of features – such as rules, an element of competition, having more than one player, enjoyment – but that individual games (hockey, Scrabble, solitaire, and so on) didn’t each need to have all of these features to be called a “game.” They share some features, not others. Eleanor Rosch looked at meaning from the perspective of cognitive psychology, showing that perception of word meaning was informed by the idea of prototypes rather than fixed boundaries. By this analysis, we know that a robin and an ostrich are both types of “bird,” but some birds (and some features of birds, such as ability to fly) are considered more prototypical than others. The prototypical bird, for native speakers of English in the Western world, is typically a robin not an ostrich.

Meanwhile, dictionaries published around the turn of the twentieth century (for example, the Concise Oxford Dictionary (COD 1911) and Merriam-Webster’s Collegiate Dictionary (1898)) were published as dictionaries for everyday use, but they were more like compact versions of their historically based parents and as such largely preserved their organizational principles and features. While the Preface to COD states that the principle of ordering senses “has been that of logical connexion or of comparative familiarity or importance” (COD 1911, p. vii), a quick glance at the contents suggests otherwise. COD’s entry for “chuck” verb, for example, is typical in replicating the senses in exactly the same order as the OED (thus placing the sense “chuck under the chin” before the core senses relating to throwing and discarding). Over subsequent years editorial policies for these dictionaries were tweaked to give a better description of the current language (sense order was adjusted, modern examples replaced old-fashioned ones), but none of these tweaks represented a serious challenge to the lexical status quo and their practice, while now focused on the current language, frequently betrayed their historical bias (or perhaps, in the absence of sufficient empirical evidence, just their lack of a viable alternative).
Today, a substantial proportion of these “current” dictionaries of English retain occasional historical bias, no doubt largely as a result of inertia rather than any principled stand. In the entry for “dope,” for example, online versions of both the college dictionary from Random House and Collins English Dictionary list senses relating to “thick preparations” and varnish before the dominant modern senses relating to drugs. It required the new discipline of corpus linguistics – on the one hand undistracted by historical principles, and on the other focused on analyzing large quantities of natural (synchronic) language afresh in a neutral, scientific way – to effect a more radical change. The Collins COBUILD English Language Dictionary (1987) was the first English dictionary to be directly written on the basis of corpus evidence (dictionary entries used illustrative examples taken, usually without alteration, from the text of British newspapers, and reflected the most frequent collocation patterns in the grammatical information and choice of phrases, for example). It was also one of the first to theorize its approach to definition writing and take a new approach to phraseology (COBUILD is perhaps best known for abandoning the conventional “substitution” principle in definitions, instead using normal sentences to explain meaning, and for using modern functional grammar in its labeling and layout). COBUILD was a learner’s dictionary and the brainchild of the linguist John Sinclair; it revolutionized the learner’s dictionary market and, over the next few decades, exerted influence on bilingual dictionaries and general monolingual dictionaries.
When Oxford University Press decided to invest in a new monolingual English dictionary in the early 1990s (first published as the New Oxford Dictionary of English in 1998; later the Oxford Dictionary of English and its cousin, the New Oxford American Dictionary), it was looking to contemporary rivals such as the Collins English Dictionary as the market competitor but also to an approach that was corpus-based and informed by modern theories of meaning.

The biggest recent change for dictionaries came with the digital revolution. Having digitized the text in the 1980s, the OED began working on a complete revision and launched a new online website in 2000. The original historical principles of the OED were revised and remodeled but not substantially altered, but the huge amount of newly available raw evidence transformed the methodology and, gradually, is transforming also the picture of English in the past. The OED’s digital revolution is not typical of other dictionaries, however. American English dictionaries were the first to be digitized and made available free online in the early 2000s, not by publishers but by new players such as dictionary.com. Today there are thousands of online dictionary sites comprising a heady mix of content and approaches. The phenomenon of user-generated dictionary content (pioneered by Wikipedia for encyclopedic content) and crowdsourcing have also changed the parameters. In the right context, this is hugely exciting. Whatever its shortcomings, urbandictionary.com is valuable in capturing today’s fast-moving slang directly from the groups that use it by tapping into users’ very real ambitions to exhibit and influence language use, thereby combining social networking and personal expression with a more orthodox descriptive approach to language. Not that crowdsourcing and user-generated content are entirely new approaches. From the beginning the OED (as the New English Dictionary, or NED) embraced the idea of public participation, framing its invitation from 1879 in a series of direct Appeals to the public to help find evidence of word usage: “An Appeal to the English-speaking and English-reading public to read books and make extracts for the Philological Society’s New English Dictionary” (April 1879, OED Archives). The OED’s recent revival of the original “Appeals” idea taps into genuine public interest by calling for information from the public at large about early use of words. In the wrong context, crowdsourcing and user-generated content can be a concern. What works for Wikipedia as a knowledge base works less well for language dictionaries, not least because linguistic facts are generalizations that are not knowable to individuals in the same way as real-world facts: personal record and anecdote of the moment are substituted for large-scale analysis of corpora and language research.
In Wiktionary, senses from Webster’s 1913, an unrevised dictionary on historical principles, were used as the basis of this user-generated dictionary project, at least partly to avoid copyright infringement, and the result is a hotchpotch of randomly inserted “new” senses sitting alongside obsolete senses with no currency markers, as in the following extract from the entry “mobile”:

1. Capable of being moved
2. By agency of mobile phones
3. Characterized by an extreme degree of fluidity; moving or flowing with great freedom
4. Easily moved in feeling, purpose, or direction; excitable; changeable; fickle

Today’s user is confronted by an array of dictionary information online including an uncritical merging of historical and synchronic treatment. Philosophically, for the lexicographer, it is a problem. But it is not clear what users are making of it all. Despite the huge amount of site analytics available to online dictionary owners it is almost as true today as it was in 2003 that “uses and users of dictionaries remain for the moment relatively unknown” (Bogaards 2003, p. 33). It seems likely that appreciation of historical and synchronic approaches is largely the preserve of those who have some metaknowledge of dictionaries in the first place. If this is true, the naïve reader may be misled or confused, while a sophisticated user may be inclined to distrust dictionaries in the first place.

Main Description: A Case Study for “Capital”

This section explores the treatment of a single word, “capital,” and looks at relevant entries in dictionaries representing either a synchronic or a historical approach, comparing and contrasting their key features and underlying assumptions. The historical approach is represented by the OED (the revised entries from OED3, www.oed.com, 2012) while the synchronic approach is represented by the Oxford Dictionary of English (www.OxfordDictionaries.com, 2015).

Coverage

Historical dictionaries (HDs) are larger and contain more information than synchronic dictionaries (SDs). Even if breadth of coverage of the modern language is broadly the same in both, HDs include obsolete and archaic forms and meanings as well. In the OED “capital” has 17 adjectival senses (seven of which are not current) to ODO’s seven. Each historical sense is supported by quotations, typically 10–12 for main senses and, at the time of writing (March 2014), the OED had more than 3.3 million quotations in total. SDs may include just one or two illustrative examples, and may be selective about whether some types of vocabulary (real world referents such as plants and animals) should have them at all. While SDs need only attend to spellings and variants in present-day language, HDs will include forms from across the period covered; the OED revision project aims to include all attested forms and is adding newly identified ones all the time; there are currently around 1.7 million. Similarly, while an SD may give only short etymologies (or theoretically none at all), for HDs this is a core feature, involving full treatment of a word’s origin, development, cognates, etc.

While the fact of size is self-evident, it has one important outcome, which is that historical dictionaries are developed over decades rather than years. The OED was roughly 60 years in the making before the final installment was published in 1928, and this is fairly typical of historical dictionary projects. The core team is bound to change over the course of such long projects. By contrast, synchronic dictionaries may be produced in a matter of years and are more likely to have a core team throughout. As a result, historical dictionaries will tend to have policies which are more codified as well as more conservative and resistant to change, or, interestingly, have policies that have inbuilt protection against change.
A good example of the latter is the OED’s “10-year” rule, by which new terms are not added to the OED until they have been recorded in the language for at least 10 years. Since words are not taken out of the OED (ref: 1993 OED Advisory Committee Meeting minutes), the policy is to wait until the word is no longer regarded as ephemeral. A flurry of media interest accompanied the OED’s decision to include the word “tweet” in one of its recent updates, when it had been around for barely 7 years; as the then Chief Editor, John Simpson, wrote wryly, “This breaks at least one OED rule, namely that a new word needs to be current for 10 years before consideration for inclusion. But it seems to be catching on” (Simpson 2013). When the OED decided to change its policy on dating Middle English and to include the date of composition alongside the date of documentary evidence, it required both internal discussion and external consultation. Changing policy decisions for any smaller dictionary project is never painless but can be effected more easily and with fewer participants.

Entry Organization

All dictionaries need to decide how to treat word forms that have the same spelling but different origins (“homographs”), as well as what to do with word forms that have the same spelling but different parts of speech. In the OED the general principle is to treat all these as different headword entries, with homograph status being determined on the basis of modern spelling. For “capital” there are three different entries, covering two nouns, one adjective, and one verb (the adjective is combined with one of the noun entries, owing to the closely connected semantic development of the two). Two different etymologies are traced, which show different development from Latin via Old French, but which are both ultimately connected to Latin caput meaning “head”. In ODE, parts of speech are merged and there are just two main entries, with the etymological distinction of the nouns being retained. From a synchronic perspective, this etymological adherence might seem counterintuitive, especially since senses within the first entry – capital meaning “capital city”, financial meanings of capital, and the dated interjection “capital!” (meaning excellent) – seem at least as different from each other as they are from the second entry (capital meaning “the top of a column”), and, as noted above, both main entries relate to the same Latin word caput anyway. This adherence to the etymological principle is remarkably strong in general dictionaries (though less so for learner’s dictionaries and bilingual dictionaries); the introductory materials for CED and ODE do not even mention it as an issue. In fact, it was only when dictionaries began to migrate online that questions about the use of homographs were raised at all.


Etymologies

Etymologies themselves ought to represent one of the most important areas of difference between HDs and SDs. In HDs, etymologies are placed at the beginning of the entry. For OED’s main “capital” entry the etymology runs to 200 words and goes into detail tracing the development in Old French and Middle French (as well as cognates in Spanish and other European languages); unlike other parts of the dictionary entry the information in an etymology is mainly of specialized interest. Etymologies in SDs are often truncated versions of what may be found in a historical “parent” and may be no more accessible; typically, they are placed at the end of the entry. ODE takes a distinctive approach in that, as well as identifying the source language, it focuses on a word’s semantic development in English. For “capital” ODE condenses the OED’s sense development into a couple of lines of key facts: the oldest use is Middle English as an adjective in the sense “relating to the head or top” (later “standing at the head or beginning”), and the word came into English via Old French from Latin capitalis, from caput “head”.

It might seem surprising that etymologies exist in SDs at all, focused as they are on the current language. (Few learner’s dictionaries and bilingual dictionaries include etymologies, presumably for that reason.) There has been little detailed market research to show how everyday users respond to etymological information; in studies carried out by dictionary publishers where users are asked to “rank” dictionary features in order of relevance and usefulness, etymologies invariably feature below “meaning”, “grammar help”, and “illustrative examples”. But that doesn’t mean that word origins don’t excite public interest, as the vast literature of popular books and dedicated websites on word origins will testify. For dictionary editors, etymological information is regarded not only as important and helpful to users but also as a guiding principle for the organization of the entries in the first place.

Compounds and Derived Forms

Alongside homograph organization lies the related issue of derived forms and multiword units. Compounds and derived forms are treated differently from each other in the OED. Compound nouns such as “capital asset” and “capital allowance” appear routinely under the parent entry, while derived forms always have full entry status (“capitally” as well as “capitalize”). The motivation is derivational: compound nouns represent semantic development in the language, which is covered in the body of the entry, while derived forms are morphological development, which is handled in a new entry. The effect is to privilege derived forms (such as “capitalness” and “capitally”) regardless of how important they are, while demoting some important compounds (such as “capital city”). The issue is further highlighted by the fact that compounds and other “subsenses” are, on principle, allotted less space for quotations than single-word senses.

Modern SDs take a less derivational approach. In common with its American English counterparts such as AHD and Random House, ODE uses frequency and salience (based on analysis of corpora in ODE’s case) as well as semantic distinctiveness to decide whether a derived form sits within an entry or not. Compounds are always listed as separate headwords. While in the past all dictionaries nested most material under a parent entry, partly in the interest of saving space and partly to show the etymological relation, the trend more recently, and certainly in digital display, has been to give headword status to many more related forms in the interest of general accessibility. Taking this approach, “capital city” and “capital allowance” are headwords, “capitally” sits under “capital,” and “capitalize” is a headword. In general this goes hand in hand with broadening criteria for what qualifies as a “compound” worth an SD’s attention in the first place and brings SDs closer to the HDs’ more broadly inclusive approach. Taking a cue from learner’s and bilingual dictionaries in debunking traditional notions of supposed semantic “transparency” as reason for exclusion (including “carrier bag” but not “plastic bag,” for instance), SDs now at least support the inclusion of more from the vast swathe of compounds attested in English. When this is done, the overt etymological tie with the parent is lost. Arguably, once texts are digitized and online, many of these decisions have less importance anyway, since text can be discoverable in different places and links can be preserved electronically.

Sense Organization

Sense organization within an entry is arguably the most visible sign of the difference between a historical and a synchronic approach. Synchronic descriptions aim to put the most salient modern meanings first and do not systematically attempt to cover obsolete or historical words and meanings. Using evidence from text corpora, frequency is taken by most lexicographers as a key indicator of salience, but this operates alongside considerations such as “logical” order (e.g., concrete senses before figurative and transferred ones), subsense groupings (related senses together), and domain (less specialized senses coming first). This means that, for “capital”, the noun senses in ODE come first even though the adjective was first historically. Of course, assessing salience is not always straightforward. Finance senses for “capital” are more frequent in general-language corpora (so on that analysis they should come first), but on the other hand they are somewhat marked in grammar and domain, and the decision was to put the “capital city” sense first.

By contrast, the OED’s entry organization is based on chronology: the first sense is the one for which there is the earliest documentary evidence – which may be obsolete – and the date of the first citation is marked as the first date of the entry. In the case of “capital,” the first recorded evidence is c. 1225 in the adjectival sense “of or relating to the head.” Just as frequency is not the only criterion for synchronic approaches, so chronology is not the only one for historical approaches: “As ...the development often proceeded in many branching lines ... it is evident that it cannot be adequately represented in a single linear series” (Murray 1888, p. xxxi). Larger entries, including “capital,” use a variety of mechanisms to give shape and coherence as well as to enable navigation, and in doing so they abandon a single strict chronological sequence. For “capital” the subsections in the OED organize material under core strands of meaning development, such as “relating to the head” and “standing at the head.” The chronological order puts modern senses in their historical place of first date with no special treatment or marking. It is difficult to know, just from the entry itself, that sense 7 of the adjective (“relating to capital” in a financial sense) is one of the most important and productive today, when compared to, say, sense 6 (d) “prominent; important” (this 1999 quotation reflects a somewhat academic use: “Such understanding of oneself can be found within the parables, a capital reason why they stimulate contemporary response.”). This weakness is more apparent in older senses where there is no native-speaker intuition or reliable text corpus to draw on. The challenge for HDs to address issues of frequency and salience is one that I will come back to in the section on Evidence.
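The contrast between the two ordering principles described in this section can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `Sense` record, the function names, the corpus counts, and every date other than c. 1225 are hypothetical placeholders, not figures from the OED, ODE, or any corpus. The `pinned` argument stands in for the editorial judgment that, in the case described above, placed “capital city” ahead of the more frequent finance sense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    gloss: str
    first_attested: int   # year of earliest quotation (placeholder unless noted)
    corpus_freq: int      # hypothetical count in a modern corpus
    obsolete: bool = False  # hypothetical currency flag


# Illustrative data only: 1225 comes from the text; all else is invented.
SENSES = [
    Sense("financial assets", 1700, 900),
    Sense("capital city", 1600, 400),
    Sense("relating to the head", 1225, 0, obsolete=True),
    Sense("excellent (dated interjection)", 1800, 5),
]


def historical_order(senses):
    """Chronological ordering: earliest documentary evidence first."""
    return sorted(senses, key=lambda s: s.first_attested)


def synchronic_order(senses, pinned=()):
    """Salience ordering: drop non-current senses, rank by frequency,
    then let editorial overrides (pinned glosses) move to the front."""
    current = [s for s in senses if not s.obsolete]
    ranked = sorted(current, key=lambda s: -s.corpus_freq)
    front = [s for g in pinned for s in ranked if s.gloss == g]
    rest = [s for s in ranked if s.gloss not in pinned]
    return front + rest


if __name__ == "__main__":
    for s in historical_order(SENSES):
        print("historical:", s.first_attested, s.gloss)
    for s in synchronic_order(SENSES, pinned=("capital city",)):
        print("synchronic:", s.gloss)
```

Neither function captures the branching subsections of a large historical entry or the “logical” and domain criteria of a synchronic one; the sketch shows only that the same set of senses yields different sequences under the two principles, and that frequency alone does not settle the synchronic order.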

Evidence: Quotations and Examples

In modern lexicography, evidence (whether from large text corpora or individual citations) is the basis for both synchronic and historical approaches. But while it might appear that quotations in HDs and examples in SDs are performing the same function, a closer look suggests they are different. SDs include examples of text, but rarely cite their source. The purpose is to show how the word is used in natural language. Modern SDs aim to show examples that are typical and representative of a word’s usage, and will draw on evidence from real language corpora (see ODE Preface, p. xxx). HDs on the other hand include citations, i.e., snippets of text from cited sources. The aim is not only to show the word in context but also to show that it exists, precisely, at a particular date and in a particular source; the citation material is the verifiable documentary evidence on which the entry is built. Considerable effort is therefore expended on bibliography and verification of quotations down to the detail of pagination, citation style, and edition used (the first usually being preferable), with the search for the earliest attestation being of particular interest. For older texts the problems multiply and might involve, for example, dealing with lacunae, illegible text, or spelling issues (Hawke forthcoming). The OED’s first quotation for “capital” n.1 (cf. Fig. 1) illustrates this kind of attention to detail, showing a composition date alongside the standard documentary date, reflecting the (new) policy for dating Middle English citations. For the OED the work relating to quotations is a huge enterprise, involving specialist bibliographical work and library research, together with background research on large historical databases.

Fig. 1 View of OED results page for “capital” quick search

The selection criteria for quotations in HDs, involving the necessity for the first date, a spread of quotations over time, and a range of spelling variants and genres (see Simpson 2008 and Sheidlower 2011, also Hawke forthcoming), distract from consideration of a point that the synchronic editor regards as indispensable, namely illustration of the word as it is typically used, with typical collocates, as measured by statistics based on language corpora. What is a relatively constrained task for the synchronic approach is made infinitely more difficult for the historical approach, where the lack of comprehensive and curated historical corpora means that establishing typicality in past eras, while perhaps theoretically possible, is still practically unworkable. But it is also more than a practical difficulty. The layout of a historical dictionary may actively work against representing frequency and typicality, since a sense relatively common at one period but rare at another (or fairly rare at any time in its history) may none the less include quotations across the period of its currency to more or less the same extent as the most frequent senses. In the entry for “capital” the salience of one sense over another may be disguised by the impulse to set out a similarly detailed documentary record for each sense.
The OED policy makers are well aware of this tension between comparative salience and the impulse to include as much information as possible. Recently some new work has begun to assign basic frequency markers to OED headwords and compound forms, which would potentially enable new possibilities for search and presentation, though none of this is yet publicly available [OED Internal Proceedings].

Definitions

If a key purpose of a dictionary is to locate meaning in words, then the definition field ought to be a dictionary’s most important asset. Despite all the advances in language technology, the definition remains one of the most human-centered activities of compiling a dictionary. Even with sufficient evidence, automated processes and corpus tools, and clear style and policy, producing a definition that strikes the right balance of generalization, information, accessibility, level, and objectivity is a challenge. Definitions that miss the point are not uncommon in dictionaries of all kinds. CED and Random House College Dictionary both manage to define the main sense of “bag” without mentioning that it is a receptacle used for carrying things; both may have borrowed this silently from the unrevised entry in the OED, which is missing the same information.

For synchronic dictionaries, help is at hand. Large corpora can be created relatively easily and corresponding corpus tools (such as Lexical Computing Ltd.’s Sketch Engine) can analyze the data for collocates, grammatical relations, source, and dates to provide editors with a snapshot of a word. From a WordSketch for the word “bag”, based on any reasonably sized English corpus, it would be hard to miss that the verb “carry” is one of its most salient collocates (cf. Fig. 2). For historical dictionaries, there is less help available. Historical corpora (for example the excellent COHA, compiled by Mark Davies) certainly exist but they face numerous challenges, including size, availability of a wide enough span of historical material, collection of material, overall balance, and presentation of findings (Brinton forthcoming).

A good deal of attention is bestowed on definitions in ODE. The definitions aim to be clear, concise, and accurate, using natural language and focusing on typical usage, and those for “capital” mostly achieve this; sense 1 for example reads: “The city or town that functions as the seat of government and administrative centre of a country or region.” Traditional devices and “dictionaryese” are mostly absent. Some types of discourse meaning are treated in different ways; the interjection “capital!” is explained rather than defined, as “Used to express approval, satisfaction, or delight.”

Definitions seem to have been less of a focus for OED’s original editors.
Despite a whole section on “Signification,” definitions and definition writing are barely mentioned in Murray’s “General explanations.” The first OED editors sometimes borrowed definitions from earlier dictionaries (such as Johnson’s Dictionary of the English Language), or referenced in-text quotations in place of definitions (Murray 1884, p. xxxi). These practices have now been abandoned but the sheer size and complexity of the task mean there is a temptation to focus on minutiae of style and to use a variety of caveats and hedges at the expense of simplicity and clarity. Multiple synonyms and catchalls such as “etc.” are typical examples of this, as in this definition (from “capital” adj. sense 6.b.):

“Of a town, mansion, estate, monastery, etc., that is the principal of its kind (in a region, group, etc.); (also) chief, principal, main; important. Hence: designating, of, or relating to a capital city.”

While it is easy to criticize the OED for such definitions, it is not so easy to know what action to take, since at least part of the problem is a systemic one facing all historical dictionaries. Since historical senses may cover a broad span of time, they typically also cover a span of meaning development. Without endless sense splitting (bringing with it its own issues of clarity and focus), many definitions operate as a potted history of the sense rather than a single definition, as can be seen in the definition for “capital” quoted above. Another challenge is that, for older senses in particular, the historical evidence is lacking. Judging what is typical on the basis of insufficient evidence means the result may be distorted. Is it better to make that judgement and exclude some material, not really knowing if it is important? There is no easy answer to this challenge.
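
The observation earlier in this section, that a corpus tool makes “carry” hard to miss as a collocate of “bag”, rests on an association score rather than on raw counts. The following is a minimal sketch of that idea, not an account of how any particular tool is implemented. It uses the logDice score, which is documented for Sketch Engine; that this score underlies the word sketch discussed here is an assumption, and the toy corpus, the window size, and the function names are invented for illustration.

```python
import math
from collections import Counter

# Invented toy corpus; a real analysis would use millions of words.
TOY_CORPUS = [
    "she will carry the bag to the station",
    "he had to carry a heavy bag home",
    "they carry the bag between them",
    "the bag was on the table",
    "the cat sat on the mat",
    "the train left the station",
]


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def collocates(sentences, node, window=3, min_cooc=2):
    """Rank words co-occurring with `node` within `window` tokens.

    Simplification: co-occurrence is counted per token in the window,
    with naive whitespace tokenization and no lemmatization.
    """
    word_freq = Counter()
    cooc = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        word_freq.update(tokens)
        for i, token in enumerate(tokens):
            if token != node:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    cooc[tokens[j]] += 1
    scored = [
        (word, log_dice(count, word_freq[node], word_freq[word]))
        for word, count in cooc.items()
        if count >= min_cooc
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for word, score in collocates(TOY_CORPUS, "bag"):
        print(f"{word:10s} {score:.2f}")
```

In this toy data “the” co-occurs with “bag” more often than “carry” does, yet “carry” receives the higher score, because the score discounts words that are frequent everywhere. That is the sense in which a collocate is salient rather than merely common.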

Conclusion

Historical dictionaries aim to record and document the language, while synchronic dictionaries are aligned with everyday practical guidance on using the language. In the past, this gave synchronic approaches a lower intellectual status and it gave historical dictionaries higher status and cultural authority. In the twentieth century, synchronic linguistics established itself and synchronic lexicography was able to plug into academic discourse relating to theories of meaning. At the same time, the growth of learning in general through the twentieth century, and particularly learning of English, gave rise to some important advances in presentation of meaning, which are generally held to more accurately reflect how language is actually used and understood. These approaches to meaning, combined with the benefits brought by corpus linguistics, were especially appropriate for learner’s dictionaries, where detailed description and explanation of common words are most important. In general dictionaries, however, the application of such new thinking has been patchy; and the recent piecemeal development of dictionaries online emphasizes that fact.

Fig. 2 View of the Word Sketch entry for “bag”

For historical dictionaries, a large part of the approach to meaning is historically informed: words have meaning because of what they used to mean and each twist and turn is informed by the last. OED recognizes the importance of context in figuring meaning but historical dictionaries are also concerned with amassing evidence and providing an accurate record of the language through recounting the story of each word over time. Whereas for synchronic dictionaries there is in practice now sufficient evidence from language corpora to provide accurate generalizations about word use and behavior, for historical dictionaries this is not the case. Hence, some of the account is always inferred and finding more evidence may necessitate adjustments. For this reason, editors regularly undertake a fundamental rewrite of entries as part of the OED’s Third Edition revision project, thereby rewriting, word by word, the history of English for the benefit of cultural and literary historians as well as historical linguists (and synchronic dictionary writers).

Neither the historical nor the synchronic perspective exists in complete isolation from the other in the context of a dictionary.
Register labels such as “dated” or “archaic” imply a historical perspective in a synchronic dictionary and a synchronic perspective in a historical dictionary. For one’s own language – as opposed to learning another language – an appreciation of word origins and history seems to be deeply embedded, and there is a natural human tendency to regard older uses as somehow “better” or authentic. While learner’s dictionaries and other pedagogical dictionaries need to make little or no reference to history, general dictionaries reflect the history in, for example, the persistent use of etymologically driven decision making, as the case study in this chapter for the word “capital” indicates. While historical and synchronic approaches have taken different developmental paths, they could now be better aligned and signposted. Their goals are different but users of synchronic and historical dictionaries reference and benefit from the information in both. In the digital era, this raises some exciting possibilities, perhaps even a vision of a language repository (rather than a dictionary) that includes lexical information about both the current language and the past and uses crosslinking and layering of content to enable users to travel between the two.

Note

This chapter is written on the basis of the experience of working within the Academic Dictionary Department of Oxford University Press over some years, and is at times informed directly by personal experience and information, for example, the evidence of internal papers and proceedings from the Department.

References

Blackburn, S. (2008). Oxford dictionary of philosophy (2nd rev ed.). Oxford: Oxford University Press.
Bogaards, P. (2003). Uses and users of dictionaries. In P. van Sterkenburg (Ed.), A practical guide to lexicography (pp. 26–33). Amsterdam/Philadelphia: John Benjamins.


Brewer, C. (2000). OED sources. In L. Mugglestone (Ed.), Lexicography and the OED. Pioneers in the untrodden forest (pp. 40–58). Oxford: Oxford University Press.
Brinton, L. (forthcoming). Using historical corpora and historical text databases. In P. Durkin (Ed.), The Oxford handbook of lexicography. Oxford: Oxford University Press.
Cowie, A. P. (Ed.). (2008). The Oxford history of English lexicography. Oxford: Oxford University Press.
Davies, M. (2012). The 400 Million word corpus of historical American English (1810–2009). In I. Hegedus et al. (Eds.), English historical linguistics 2010 (pp. 217–250). Philadelphia: John Benjamins.
Durkin, P. (Ed.). (forthcoming). The Oxford handbook of lexicography. Oxford: Oxford University Press.
Hanks, P. (2008). The lexicographical legacy of John Sinclair. International Journal of Lexicography, 21(3), 219–229.
Hawke, A. (forthcoming). Quotation evidence and definitions. In P. Durkin (Ed.), The Oxford handbook of lexicography. Oxford: Oxford University Press.
Lew, R. (2011). Studies in dictionary use: recent developments. International Journal of Lexicography, 24(1), 1–4.
Lowell, J. R. (1860, May). Atlantic Monthly. pp. 631–633.
Mugglestone, L. (Ed.). (2000). Lexicography and the OED: pioneers in the untrodden forest. Oxford: Oxford University Press.
Mugglestone, L. (2005). Lost for words: the hidden history of the Oxford English dictionary. New Haven/London: Yale University Press.
Murray, J. A. H. (1888). Preface to Volume I of A New English Dictionary on Historical Principles … Volume I. A and B. Oxford: Clarendon.
Quine, W. V. O. (1960). Word and object. Cambridge: MIT Press.
Quine, W. V. O. (1970). Philosophy of logic. Englewood Cliffs: Prentice Hall.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology, 104(3), 192–233.
Sheidlower, J. (2011). How quotation paragraphs in historical dictionaries work: the Oxford English dictionary. In M. Adams & A. Curzan (Eds.), Contours of English and English language studies (pp. 191–212). Ann Arbor: University of Michigan Press.
Simpson, J. (2008). Why is the OED so small? In S. Vanvolsem & L. Lepschy (Eds.), Nell’officina del dizionario: Atti del Convegno Internazionale organizzato dall’Istituto di Cultura Lussemburgo (Romanische Sprachen und ihre Didaktik, Vol. 19, pp. 113–129). Stuttgart: ibidem-Verlag.
Simpson, J. (2013). A heads up for the June 2013 OED release. public.oed.com/the-oed-today/recent-updates-to-the-oed/june-2013-update/a-heads-up-for-the-june-2013-oed-release. Accessed Mar 2014.
Sinclair, J. (Ed.). (1987). Looking up: an account of the COBUILD project in lexical computing. London: Collins.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Sinclair, J. (2004). Trust the text: language, corpus and discourse. London: Routledge.
Speaks, J. (2011). Theories of meaning. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. http://plato.stanford.edu/archives/sum2011/entries/meaning. Accessed Mar 2014.
Svensén, B. (2009). A handbook of lexicography: the theory and practice of dictionary-making (J. Sykes, Trans.). Cambridge: Cambridge University Press.
Trench, R. C. (1851). The study of words. New York: W. J. Widdleton.
Trench, R. C. (1860). On some deficiencies in our English dictionaries. London: John W. Parker & Son, West Strand.
Wittgenstein, L. (2009). Philosophical investigations. (First published in German as Philosophische Untersuchungen in 1953) (G. E. M. Anscombe, Trans.). Chichester: Wiley-Blackwell.


Dictionaries

[CED] Hanks, P., et al. (1st ed. 1979, 9th ed. 2007). Collins English Dictionary. London/Glasgow: Collins.
Chambers Twentieth Century Dictionary of the English Language. (1901). Edinburgh: W. and R. Chambers.
[CIDE2 (CALD)] Woodford, K., et al. (2005). Cambridge Advanced Learner’s Dictionary (=2nd edition of CIDE). Cambridge: Cambridge University Press.
[COBUILD] Sinclair, J. M., Hanks, P., et al. (1987). Collins COBUILD English Language Dictionary. London/Glasgow: Collins.
[COD] Fowler, H. W., & Fowler, F. G. (Eds.). (1911). The Concise Oxford Dictionary of Current English (1st ed.). Oxford: Clarendon.
Collins Dictionary Online. (2012–). London/Glasgow: HarperCollins. http://www.collinsdictionary.com. Accessed Mar 2014.
Diccionario General Vox de la Lengua Español. (2008–). Barcelona: Larousse Editorial. http://www.diccionarios.com. Accessed Mar 2014.
Dictionary.com. (n.d.). http://www.dictionary.com. Accessed Mar 2014.
[DOST] Craigie, W., Aitken, J., Stevenson, J., & Dareau, M. (Eds.). (1931–2001). Dictionary of the Older Scottish Tongue. Oxford: Oxford University Press.
[F&W] Funk & Wagnall’s Standard Dictionary of the English Language. (1893). New York: Funk and Wagnall.
Funk & Wagnall’s New College Standard Dictionary. (1947). New York: Funk and Wagnall.
Grimm, J., & Grimm W., et al. (1854–1960). Deutsches Wörterbuch. Leipzig/Stuttgart: S. Hirzel.
Merriam-Webster’s Collegiate Dictionary. (1898). Springfield: Merriam-Webster.
Merriam-Webster Online. (n.d.). http://www.merriam-webster.com. Accessed Mar 2014.
Morris, W., et al. (1969). The American Heritage Dictionary of the English Language (1st ed.). Boston: American Heritage Publishing Co. and Houghton Mifflin.
[NOAD] Jewell, E. J., et al. (2001). New Oxford American Dictionary. New York: Oxford University Press.
[NODE] Pearsall, J., & Hanks, P. (Eds.). (1998). New Oxford Dictionary of English (1st ed.). Oxford: Oxford University Press.
[ODO] Oxford Dictionaries Online. (2010–). Oxford: Oxford University Press. http://www.oxforddictionaries.com. Accessed Mar 2014.
[ODE2] Soanes, C., & Stevenson, A. (2004). Oxford Dictionary of English. Oxford: Oxford University Press.
[OED] Murray, J. A. H., et al. (1933). The Oxford English Dictionary (1st ed., 12 vols.). Oxford: Oxford University Press.
[OED2] Simpson, J., & Weiner, E. S. C. (1989). The Oxford English Dictionary (2nd ed., 20 vols.). Oxford: Clarendon.
[OED3] OED Online. (2000–). Oxford: Oxford University Press. http://www.oed.com. Accessed Mar 2014.
Random House Kernerman Webster’s College Dictionary. (2010). Tel Aviv: K Dictionaries. By arrangement with Random House Inc. http://www.thefreedictionary.com. Accessed Mar 2014.
TheFreeDictionary.com. (n.d.). Farlex. http://www.thefreedictionary.com. Accessed Mar 2014.
Webster, N. et al. (1828). An American Dictionary of the English Language. New York: S. Converse.
Wiktionary. (2002–). http://en.wiktionary.org. Accessed Mar 2014.


Dictionary Prefaces

Fowler, H. W., & Fowler, F. G. (1911). Preface. In The Concise Oxford Dictionary of Current English (pp. iii–x). Oxford: Clarendon Press.
Funk, I. K. (1893). Introductory. In A Standard Dictionary of the English Language (pp. vi–xiv). New York: Funk and Wagnall’s.
Liddell, H. G., & Scott, R. (1843). Preface. In A Greek-English Lexicon. Oxford: Oxford University Press.
Murray, J. A. H. (1884). General Explanations. In New English Dictionary (Vol. 1, pp. vii–xiv). Oxford: Clarendon. fascicle 1.
Porter, N. (1872). Preface. In Webster’s Revised Unabridged Dictionary of the English Language (pp. iiv–iv). Springfield: G. and C. Merriam Company.
Urdang, L. (1979). Editorial Preface. In Collins English Dictionary. Glasgow: Collins.
Webster, N. (1828). Preface. An American Dictionary of the English Language.
Whitney, W. D. (1889). Preface. In The Century Dictionary: An Encyclopedic Lexicon of the English Language (pp. v–xvi). New York: The Century Co.

Other Electronic Resources

COHA: the corpus of historical American English: 400 million words, 1810–2009. (2010–). Compiled by Mark Davies (Brigham Young University). Available online at http://corpus.byu.edu/coha/
OEC: Oxford English corpus. (2000–). 2.5 billion words and growing at a rate of 120 million words per month. Compiled by Oxford University Press. Available under research licence from OUP.
Sketch engine: a corpus query system. (2003–). Compiled by Lexical Computing Ltd. Available online at http://www.sketchengine.co.uk/

Figurative language and lexicography

Alice Deignan

Contents

Introduction
Description
How Can Figurative Language Be Identified?
Which Nonliteral Meanings Should a Dictionary (Not) Cover?
Creative or Anomalous Metaphor
Metonymy
Irony
How Should Literal and Figurative Meanings Be Ordered and Treated?
Dictionaries and Metaphor Scholarship
References
Dictionaries

Abstract

This chapter explores the issues in dealing with figurative language in dictionaries. It uses the understanding of “figurative language” that is generally shared by applied and corpus linguists, as opposed to scholars of poetry and literature. In this understanding, “figurative language” covers all uses that are understood in some way as being an extension or transference of meaning from a literal meaning; the term is not restricted to novel or creative uses. This understanding of “figurative” therefore includes conventionalized uses of words, such as warm to describe friendly behavior, or see to describe thinking, as well as more recent but established uses such as green to describe environmental issues. By far the most studied kind of figurative language is metaphor, which is the focus of most of this chapter. Metonymy is increasingly recognized as important and is also mentioned. As is well known, many, if not most, idioms have their origins in metaphor or metonymy (Moon 1998), and these also present something of a challenge to lexicography. They are referred to here but are discussed in detail elsewhere in this handbook.

A. Deignan, University of Leeds, Leeds, UK. e-mail: [email protected]

© Springer-Verlag Berlin Heidelberg 2015. P. Hanks, G-M. de Schryver (eds.), International Handbook of Modern Lexis and Lexicography, DOI 10.1007/978-3-642-45369-4_5-1

Introduction

Figurative language is treated, in this chapter, as uses of language that are understood in some way as being an extension or transference of meaning from a literal meaning. It has been recognized for several decades that, on this broad understanding, figurative language is highly frequent in language, whether measured as types or as tokens. Developments in two related disciplines, cognitive linguistics and corpus lexicography, contributed to this recognition. Cognitive linguistics saw the publication in 1980 of Lakoff and Johnson’s Metaphors We Live By, which set out Conceptual Metaphor Theory. In this work, Lakoff and Johnson stated in unequivocal terms (1) that metaphors pervade language, (2) that this is the result of our conceptual system being structured on metaphorical mappings between domains of experience, and (3) that metaphor is, therefore, of central importance to thought and language rather than solely the stylistic and elegant choice of the poet. Several other scholars had already started to think along these lines; for instance, in papers published in 1970 and 1978, Lehrer had explored the semantic extension of lexis across domains, examining the use of temperature words to describe emotion, and words from the domains of dimension and weight to describe wine. Using these and other examples, she demonstrated the potential systematicity of such meaning transfers, anticipating some of Lakoff and Johnson’s arguments, though not explicitly advancing a theory of metaphor as conceptual. Reddy’s 1979 discussion of the metaphors used to talk about communication argued that metaphors are highly frequent and that they present a particular, nonneutral view of their topic. Lakoff and Johnson’s 1980 work extended such explorations of specific semantic topics, to present an ambitious model of thought and language, which still frames most current metaphor scholarship.
A number of current scholars now reject some or all of the tenets of Conceptual Metaphor Theory, but as a landmark work, it needs to be addressed. Whatever the reality of conceptual mappings, or their relevance for applied linguists and lexicographers, the field of metaphor studies has been given prominence and intellectual impetus by Lakoff and Johnson’s work.

Within corpus lexicography, at around the same time, corpora were beginning to be used systematically for the exploration of word meaning and use. The COBUILD project in lexicography was central; various aspects are discussed in the collection edited by Sinclair (1987), and implications from corpora for a view of lexis in language are described in his 1991 book. As is well documented, the concordance became the standard tool of the lexicographer. By examining a concordance, the relative frequency of different meanings of a word form was relatively easy to determine, and it was quickly noted that apparently metaphorical uses often outnumber their literal counterparts. For instance, Moon (1987) notes that “Blend as a verb is used slightly more often to refer to the mixing of sounds, sights, emotions etc. than it is of substances” (p. 89). Deignan (1999) gives several examples of the frequency of metaphorical citations in concordances, including shred(s), which are more frequently of patience than of cloth in the Bank of English (as searched in 1998), and shoulder, which has a number of figurative meanings. Some of these meanings can be seen as instantiations of a conceptual metaphor that maps physically heavy objects onto psychologically challenging situations. The heavy objects are metaphorically referred to as burdens, which are then shouldered. In others, shoulder is used to refer metonymically to actions involving literal shoulders, such as rub shoulders, look over one’s shoulder, and cry on someone’s shoulder.
Corpus research in recent years has shown how frequent this kind of metonymy is; this has happened alongside developments in the cognitive metaphor literature, which, from the 1990s, has increasingly discussed the centrality of metonymy, for example, in collections edited by Barcelona (2000) and Dirven and Pörings (2001). Corpus observations about the frequency of metaphorical meanings in concordance data were perhaps initially surprising to analysts, given the traditional view of metaphor as peripheral to language and decorative in nature. However, like other previously unnoticed facts about language brought to light through corpus analysis, this quickly began to seem self-evident. Louw and others have noted this kind of hindsight (e.g., Louw 2010, p. 756). Metaphor and, to a lesser extent, metonymy are now well established as important issues in applied linguistics and related disciplines. The frequency and apparent conceptual importance of figurative language of several types pose challenges for lexicography, among them:

How can figurative language be identified?
Which nonliteral meanings should a dictionary cover and not cover?
How should literal and figurative meanings be ordered and treated?

This chapter discusses these challenges and ways of tackling them and then considers how dictionaries are used in metaphor scholarship.

Description

How Can Figurative Language Be Identified?

Most efforts have gone into establishing criteria for the identification of metaphor. Identification procedures in the recent metaphor literature have usually focused on whether a word is used metaphorically within a specific discourse context, that is, one instance of a word, in a single conversation or a written text. For clarity here, I shall call these “discourse approaches.” Lexicographers, on the other hand, are concerned with a related but different issue: whether there is an established metaphorical meaning of a word, distinct from other meanings. This could be termed a “lexical meaning approach.” The difference stems from the underlying goals of each group of scholars. Many (but not all) metaphor scholars are concerned with how a word use is regarded in a particular context by a particular speaker/writer and his or her listener/reader. They may then consider questions such as what meaning was intended, how this is received, how this relates to the wider discoursal meaning, what the underlying ideology or world view of the text and speaker/writer are, and how these are interpreted by the listener/reader. For example, in one of Cameron’s studies (2007), a speaker talks of the process of bereavement and acceptance as being a journey. Cameron studies a number of uses of words around journeys in the discourse and considers what they mean both within this conversation and to the relationship between the speakers involved over a period of years. In contrast, dictionaries are concerned with how words are used conventionally, en masse, rather than with what a particular language user means. A dictionary does not, therefore, have to make a delicate decision about whether one particular instance of use is metaphorical. Rather, it has to show whether a collection of instances is distinct enough from other meanings of the same word to warrant its own sense, and if so, how it should be treated.
A lexicographer would approach the example above, metaphorical journey, by analyzing a large number of corpus citations to determine whether there is a frequent nonliteral use. The lexicographer would probably look a little wider than the 80 characters or so of each concordance citation, but would certainly not analyze in detail what effects a single use of the word has on the participants in the discourse.

Most of the work on identifying metaphor has been undertaken within the discourse approach, and I discuss this first. Cameron’s work includes one of the earliest and best-known studies within this approach. She carried out a detailed analysis of around 27,000 words of discourse from a primary school classroom in Britain and attempted to identify all the metaphors in it (2003). In more recent work, referred to above (2007), she analyzed transcripts of discussions between an IRA bomber and the daughter of one of his victims, also identifying all metaphors in around 27,000 words, from three discourse events. In both studies, she analyzed how metaphors were used to convey speaker meaning and to build shared meaning. She writes that “a necessary condition for linguistic metaphor is the presence in the discourse of a focus term or Vehicle, a word or phrase that is clearly anomalous or incongruous against the surrounding discourse” (2003, p. 59), and a further necessary condition is that the incongruity “can be resolved by some ‘transfer of meaning’ from the Vehicle to the Topic” (ibid, p. 60). She writes of the difficulty in pinning down metaphor because of the way “language in use is continually stretched and bent” (ibid), concluding that there is no “pre-existing watertight category to be ‘found’ in the data” (ibid, p. 62). For lexicographers, the way language is “continually stretched and bent” is also an issue, but over a longer time period.
Cameron studies how individuals stretch meaning within a discourse event, while lexicographers have to deal with slower changes in meaning across the language as a whole.
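
The concordance that underlies the lexical meaning approach can be illustrated with a short sketch. It is offered only as an illustration under stated assumptions: the sample text is invented, the function name is hypothetical, and the 80-character width follows the figure mentioned above. Judging each resulting line as literal or nonliteral remains the lexicographer’s task and is not automated here.

```python
import re

# Invented sample text; a real concordance is drawn from a large corpus.
SAMPLE = (
    "The journey to the coast took three hours. "
    "Grief is a long journey, and acceptance comes slowly. "
    "Their journey through the mountains was hard."
)


def kwic(text: str, keyword: str, width: int = 80) -> list[str]:
    """Return keyword-in-context lines of about `width` characters.

    Each line shows the keyword centred, with equal amounts of left
    and right context, padded with spaces at the edges of the text.
    """
    text = " ".join(text.split())  # collapse line breaks and extra spaces
    half = (width - len(keyword)) // 2
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - half):match.start()]
        right = text[match.end():match.end() + half]
        lines.append(f"{left:>{half}}{match.group(0)}{right:<{half}}")
    return lines


if __name__ == "__main__":
    for line in kwic(SAMPLE, "journey"):
        print(line)
```

Aligning the keyword in a fixed column is the design choice that matters: it lets the eye run down the citations and group recurrent contexts, which is how a frequent nonliteral use becomes visible across many instances rather than in any single one.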

There are two more recent identification procedures that take a similarly discourse-focused approach, MIP and MIPVU. These are probably the procedures most widely used by current metaphor scholars. MIP (“metaphor identification procedure”) was developed by a group known collectively as Pragglejaz (an acronym of the first names of the participating metaphor scholars: Peter Crisp, Raymond Gibbs, Alan Cienki, Gerard Steen, Graham Low, Lynne Cameron, Joe Grady, Alice Deignan, and Zoltán Kövecses) (Pragglejaz 2007). MIPVU extended and modified MIP, and was developed by scholars at the Vrije University (VU) of Amsterdam, led by Gerard Steen (Steen et al. 2010). MIP requires the analyst to read the entire discourse context and identify all lexical units. Each lexical unit is then considered, and the analyst needs to decide what its meaning is in the discourse context – that is, its “discourse meaning” – and whether it has a more basic meaning. Basic meanings, Pragglejaz writes, are typically more concrete and immediate and often historically older. In the next stage, if the discourse and basic meanings are related by a relationship of comparison, the discourse meaning of the lexical unit is considered metaphorical. MIPVU (2010) takes a similar approach, operationalized in a good deal more detail, with discussion and guidelines for dealing with the many borderline cases that arise at every stage. The procedure differs from MIP in that it focuses on the lexical form rather than the lemma, meaning that it does not allow for metaphoricity across parts of speech. Whereas MIP would allow for squirrel (verb, meaning “save money”) to be a metaphor from squirrel (noun, animal), MIPVU would not because they are different parts of speech: in other words, because there is no basic verb squirrel, meaning something like “hide nuts,” the sense “save money” cannot be considered metaphorical.
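
The decision step just described, and the difference over parts of speech, can be made concrete in a small sketch. This is a deliberate simplification of both procedures and not an implementation of either: the sense inventory is invented, the names are hypothetical, and the two judgements that matter most (which sense is basic, and whether two senses are related by comparison) are supplied by the analyst rather than computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sense:
    lemma: str
    pos: str
    gloss: str
    basic: bool  # analyst's judgement: more concrete, often historically older


# Invented, hand-made inventory; a real analysis would consult a dictionary.
SENSES = [
    Sense("squirrel", "noun", "small tree-dwelling rodent", basic=True),
    Sense("squirrel", "verb", "save money", basic=False),
    Sense("journey", "noun", "act of travelling from place to place", basic=True),
    Sense("journey", "noun", "process of personal change", basic=False),
]


def basic_sense(lemma: str, pos: str, same_pos_only: bool) -> Optional[Sense]:
    """Find a basic sense for the lemma, optionally within one part of speech."""
    for sense in SENSES:
        if sense.lemma == lemma and sense.basic:
            if not same_pos_only or sense.pos == pos:
                return sense
    return None


def is_metaphorical(contextual: Sense, related_by_comparison: bool,
                    same_pos_only: bool) -> bool:
    """Apply the decision step to one lexical unit.

    same_pos_only=False approximates the MIP-style comparison across the
    lemma; same_pos_only=True approximates the MIPVU-style restriction.
    """
    basic = basic_sense(contextual.lemma, contextual.pos, same_pos_only)
    if basic is None or basic == contextual:
        return False  # no distinct basic meaning to compare against
    return related_by_comparison


if __name__ == "__main__":
    squirrel_verb = SENSES[1]
    print(is_metaphorical(squirrel_verb, True, same_pos_only=False))
    print(is_metaphorical(squirrel_verb, True, same_pos_only=True))
```

With this toy inventory the verb squirrel counts as metaphorical when comparison across the lemma is allowed and not when it is restricted to the same part of speech, which mirrors the contrast drawn in the text. The nonliteral noun sense of journey is unaffected by the restriction, since its basic sense is also a noun.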
None of these three discourse approaches, Cameron’s, MIP, or MIPVU, makes a distinction between a highly conventionalized metaphor and a new, creative one, an issue discussed below. In contrast to these discourse approaches, Goatly (1997, 2011) takes a “lexical meaning approach,” that is, he considers the identification of metaphor in terms of senses of words in general, rather than individual citations situated in a specific discourse context. Goatly grades degrees of metaphoricity, according to how he thinks a reader/listener might process them. Table 1 is based on his classification (2011, p. 32), using his examples and terms and my own summary of his descriptions. Goatly’s five types of relationship between metaphorical senses and a literal counterpart range from highly innovative, “Active,” in his terms, through “Tired” to “Sleeping,” “Dead,” and “Dead and Buried.” Unlike the discourse approaches above, Goatly does not establish a binary distinction between metaphor and nonmetaphor, and an analyst who wishes to do so using his classification can decide where they would want to draw this line. The dividing line that is consistent with most scholars’ views would be between Tired and Sleeping, so that Goatly’s Active and Tired types are considered to be metaphors, while Sleeping, Dead, and Dead and Buried are not. Hanks (2010, 2013) also takes the lexical meaning approach, and like Goatly, appeals to the way the reader/listener processes the metaphor. He considers a number of criteria (etymology, concrete vs abstract meaning, frequency, syntagmatics, and resonance; 2010, p. 140) and concludes that the best criterion for identifying metaphorical senses is resonance: “if one sense resonates semantically with another sense, then it is metaphorical, and if there is no such resonance, it is literal” (ibid). This would locate Hanks’ distinction between metaphor and nonmetaphor at the divide between Goatly’s Tired and Sleeping categories.

Table 1 Goatly’s classification of metaphor types

| Label | Example | Description |
|---|---|---|
| Dead | Germ: a seed / Germ: a microbe; Pupil: a young student / Pupil: circular opening in the iris | The connection between the two senses has become so distant with time that it is no longer recognized by most speakers |
| (Dead and) Buried | Clew: a ball of thread / Clue: a piece of evidence; Inculcate: to stamp in (not used in modern English) / Inculcate: to indoctrinate with | The two senses have become formally different or the original sense is no longer in use |
| Sleeping | Vice: a gripping tool / Vice: depravity; Crane: species of marsh bird / Crane: machine for moving heavy weights | The metaphorical meaning is conventional. The literal meaning is still in use and may be evoked by the metaphorical sense on occasion. The two senses are regarded as polysemous |
| Tired | Cut: an incision / Cut: budget reduction; Fox: doglike mammal / Fox: cunning person | As above. However, the metaphorical sense is more likely to evoke the literal sense here than in the previous category. The two senses are regarded as polysemous |
| Active | Icicles: rodlike ice formations / Icicles: fingers (“He held five icicles in each hand” Larkin) | The metaphorical sense is evoked entirely through the literal sense. There is no established lexical relationship between the two senses |

I used Goatly’s classification, also considering work by Lakoff (1987), to develop my own classification of metaphor (Deignan 2005), shown in Table 2 below. It is similar to Goatly’s in that it proceeds from innovative through to dead metaphors, but without his detailed coverage of historical and formally different uses. While referring closely to Goatly’s work (which was first published in 1997), I added the use of concordance data to distinguish some categories, and I also used semantic tests similar to those used by Pragglejaz (Deignan 2005, pp. 39–47). Unlike Goatly and Hanks, I do not attempt to align this with how speakers might perceive or process meanings. Most metaphor scholars would probably consider the first three of the four categories (innovative, conventionalized, and dead, but not historical) to be metaphorical. Hanks’ notion of resonance would also, probably, cover these first three.

Table 2 Deignan’s classification of metaphor types

| Types of metaphorically motivated linguistic expression | Example |
|---|---|
| Living metaphors | |
| 1. Innovative metaphors | ...the lollipop trees (Cameron 2003); He held five icicles in each hand (Larkin, cited by Goatly 2011, p. 32) (icicles = fingers) |
| 2. Conventionalized metaphors | grasp (Lakoff 1987); (spending) cut (Goatly 2011) |
| 3. Dead metaphors | deep (of color); crane (machine for moving heavy objects) (Goatly 2011) |
| 4. Historical metaphors | comprehend, pedigree (Lakoff 1987); ardent |
While the distinctions made by metaphor scholars are useful and informative to lexicographers, the differences in their goals mean that the category boundaries that are of interest lie in different places. Metaphor scholars are interested in the type of relationship that exists between different contextual uses of the same word. This means that they focus on the boundary between metaphorical polysemy and nonmetaphorical polysemy, or between metaphorical polysemy and homonymy. In Goatly’s terms, these boundaries are around tired/sleeping/dead and, in my terms, between conventionalized/dead/historical, depending on the researcher’s operational definition. For example, another of the uses Cameron (2007) identifies in her data is loss, in the sense of “bereavement,” from the sense meaning “misplace an object,” the relationship between the two senses being metaphorical polysemy, sleeping (Goatly), or conventionalized (Deignan). This relationship can be contrasted with that between the preposition for used to talk about time, as in “for years,” and for used to indicate the beneficiary of an action, as in “I’ve brought a cup of tea for you.” Pragglejaz examined this example in context and concluded: “The contextual meaning [time] contrasts with the basic meaning [transfer to recipient]. However, we have not found a way in which the contextual meaning can be understood by comparison with the basic meaning,” and that it is therefore not an example of a metaphor (Pragglejaz 2007, p. 4). The relationship is nonmetaphorical polysemy, dead (in Goatly’s terms). However, for lexicographers, the difference between these pairs of meanings would not be important at the broad level of classification and splitting senses. In both cases, each member of the pair is an established sense and therefore each would be described in a separate sense in the entry for the word.

Which Nonliteral Meanings Should a Dictionary (Not) Cover?

It is not the job of a dictionary to cover all possible meanings of a word. Indeed this would not be useful to users, as, in the case of figurative language especially, it would give no hint as to which meanings were expected, and therefore any pragmatic or stylistic entailment from the choice of unexpected uses could not be deduced. In this section, I discuss which types of figurative language a dictionary might or might not cover.

Creative or Anomalous Metaphor

Most dictionaries do not aim to cover creative uses of words. The usual goal is to present unmarked native speaker language use. The phrase “central and typical” is often used to describe this, in descriptions of the texts that should go into a reference corpus (Sinclair 1991, p. 17), meanings and usages of words (Hunston 2002, p. 42), and the meanings and collocates most usefully presented in a dictionary (Hanks 1987, pp. 124–125). Hunston usefully deconstructs the phrase, showing through the discussion of corpus data that centrality and typicality overlap but are not synonymous (2002, pp. 42–43). Using the criterion of centrality and typicality means that creative metaphors would not be covered. This is of course a decision that is made for a particular point in time, given that it is generally agreed that metaphors are creative and/or anomalous when they first enter the language, some of them becoming conventionalized over time (discussed in detail as the “Career of Metaphor” Theory by Bowdle and Gentner, 2005). For instance, stream in the sense of “consume data, usually music or TV, directly from an Internet connection,” was a new use to describe a new behavior only a few years ago and has rapidly become a conventional metaphor. Both the discourse and lexical meaning approaches to metaphor identification discussed above consider creative, innovative, or simply anomalous metaphors as within their scope of study. For instance, Cameron identifies the metaphor mountain in her data, in the excerpt “there’s another mountain to climb now” (2007, p. 207). In this context, mountain refers to psychological struggles, and is a metaphorical extension from the literal sense of mountain. While this is an uncontroversial analysis for a metaphor scholar, for a lexicographer, there is a decision to be made about whether this is a sufficiently central and typical meaning to be included as a separate sense.
None of Macmillan English Dictionary (MED) (2002), Collins COBUILD Advanced Learner’s English Dictionary (CCALED) (2006), or Oxford Advanced Learner’s Dictionary (OALD) (2010) includes this as a freestanding sense of mountain. However, MED defines the phrase move a mountain or move mountains as “to do something so difficult that it seems almost impossible” (p. 913), and CCALED defines having a mountain to climb as it being “difficult for them to achieve what they want to achieve” (p. 934), while all three cover make a mountain out of a molehill. Concordance data can be used to inform the decision about inclusion of senses, a decision that has to be made not just for metaphors of course, but for all unusual and creative uses. A dictionary might use a cutoff point similar to the one used in my classification (2005). To distinguish established from innovative metaphors, I argued that established metaphors must be evidenced by more than one citation of the sense per 1,000 citations of the word form, and the uses must be from several different sources (Deignan 2005). This would justify the inclusion of make a mountain out of a molehill, which accounts for ten citations of the 6,364 citations of mountain/mountains in the BNC (about 1.6 per 1,000). This is a rather arbitrary measure, and there might be grounds for varying it depending, for instance, on how polysemous a word is. For a highly polysemous word, each sense will naturally account for a lower proportion of concordance citations of the word. A corpus analysis of 1,000 citations of see and inflections from the Oxford English Corpus (reported by Deignan and Cameron 2009) found only one citation each of see fit to, see action (meaning “fight as a soldier”), and see eye to eye. Yet clearly none of these is creative or anomalous, as a larger sample shows. See fit to is found 128 times in the BNC, see action 41 times, and see eye to eye 63 times.
See fit to and see eye to eye are covered in all three of MED, OALD, and CCALED, under the entries for fit and eye, respectively, while see action is covered in MED only, under the entry for action. The frequencies cited above justify these decisions, with the exception of the omission of see action from OALD and CCALED. While the frequencies may not seem particularly high, comparison with adjacent headwords is illuminating: the entry immediately before see in OALD and MED is for sedulous, which only occurs once in the BNC, while the entry in this position in CCALED is for seductress, which occurs eight times in the BNC. Including an entry for a semantically heavy, monosemous word like sedulous somehow, intuitively, seems less controversial a decision than creating a separate sense for a fixed figurative expression from a polysemous word, but if frequency is used as a criterion, this is not justified.
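The frequency part of the cutoff discussed above is simple arithmetic, and a short Python sketch may make it concrete. The function names are hypothetical, and the sketch deliberately models only the rate condition (more than one citation of the sense per 1,000 citations of the word form); the second condition, that the citations come from several different sources, is not modelled because the text does not fix a number.

```python
def citations_per_thousand(sense_citations: int, form_citations: int) -> float:
    """Rate of a sense or expression per 1,000 citations of the word form."""
    if form_citations <= 0:
        raise ValueError("form_citations must be positive")
    return 1000 * sense_citations / form_citations


def exceeds_cutoff(sense_citations: int, form_citations: int,
                   cutoff_per_thousand: float = 1.0) -> bool:
    """True if the rate is strictly above the cutoff (default: 1 per 1,000).

    Only the frequency condition is checked; spread across sources would
    need to be verified separately by the analyst.
    """
    return citations_per_thousand(sense_citations, form_citations) > cutoff_per_thousand


# make a mountain out of a molehill: 10 of 6,364 BNC citations of mountain/mountains
print(round(citations_per_thousand(10, 6364), 2))  # 1.57
print(exceeds_cutoff(10, 6364))                    # True

# see fit to in the 1,000-citation sample of see: exactly 1 per 1,000
print(exceeds_cutoff(1, 1000))                     # False
```

The second result illustrates the point made about polysemous words: an expression that is plainly conventional can still fall at or below a fixed proportional cutoff when the word form has many senses competing for citations.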

Metonymy

Metonymy is the term for the process by which an aspect of something is used to stand for that thing; it also describes the product of that process. Metonymy is used here in its broad sense (following Lakoff and Johnson, 1980), to cover part-whole relationships, sometimes classified as meronymy and/or synecdoche. Like metaphor, metonymy can generate terms for new concepts and in the process generates new senses of words. For example, a car that uses more than one kind of power source is known as a hybrid, and a type of food that is relatively new to Britain, consisting of a round flat piece of bread, a “wrap,” wrapped round a filling such as meat and/or vegetables, is known as a wrap. Hybrid and wrap literally refer to one aspect of the car, and to one part (or, analyzed differently, one characteristic) of the food; the terms are used metonymically to refer to the whole. In the process, the words hybrid and wrap have been given new meanings, which are now established enough to be described in dictionaries. Much older examples are the use of ear and eye to refer to the facilities to hear and see, respectively, and to judge the quality of what is heard or seen, in an ear for music or an eye for a bargain. This meaning of ear is covered as a numbered sense in most dictionaries, while the meaning of eye is more usually covered as a phrase at the end of the entry for eye. This is presumably because something about eye strikes the lexicographer as more idiom-like; perhaps the connection between the physical sense and the discourse meaning of judgment seems more distant than the corresponding relationship for the ear.

There are at least two types of metonymy that would not normally be included in a dictionary and one that is debatable. The first is seen in the classic and often-cited example of metonymy, ham sandwich, referring to a customer, in Nunberg’s “The ham sandwich is sitting at table 20” (1979, p. 149). This use would, of course, fail the “central and typical” test described above for metaphor, as would most metonymies whose meaning is derived through their contextual reference. Another type of metonymy that would not normally be included in a dictionary is one very pervasive in language. The literature on metonymy discusses instances such as “the kettle was boiling” (Warren 2003, p. 116), in which kettle metonymically refers to the contents of the kettle, and book referring to the intellectual message of a book. In both cases, the metonymical meaning follows logically from the “core” meaning – if, indeed, there is any “core,” nonmetonymical meaning of either word when metonymy is understood in this way. Similarly, Kilgarriff (2008, p. 139) lists different uses of bike:

“Raphael doesn’t often oil his bike.” “Madeleine dried off her bike.” “Boris’s bike goes like the wind.”

writing that “different aspects of the bicycle – its mechanical parts; its frame, saddle, and other large surfaces; its (and its rider’s) motion – are highlighted in each case.” Kilgarriff argues that meaning extensions should be separate senses in a dictionary only when they exhibit “lexical meanings which are not predictable from the base sense” (ibid). This clearly excludes this type of metonymy, which is best seen, like Nunberg’s example of contextually referential metonymy, as the product of a normal function of language. Another predictable metonymy is COUNTRY FOR THE PEOPLE REPRESENTING IT; this is seen in the use of England to refer to the country’s sporting teams, its entry for the Eurovision song contest, its army, and numerous other kinds of representative. The meaning of England in context can be derived through the reader/hearer’s knowledge of this metonymical relationship, combined with context, which will suggest which representative of the country – the sporting team, musician, army, or other – is likely to be intended. Like Kilgarriff’s examples, these meaning extensions are generated through well-known mechanisms and should not normally be covered in a dictionary. A less straightforward question concerning metonymy is how to deal with regular metonymies that are culturally motivated. English has a number of well-established metonyms, such as White House, Downing Street, and the palace, by which a location stands for the people who live or work there, in these examples, the US administration, the UK prime minister, and the entourage of the UK royal family. These are motivated by normal metonymic processes, but the product has become conventionalized in the language. They are not completely transparent; for instance, the palace conventionally refers to the spokespeople for the royal family, rather than the family itself.
Further, because they contain cultural information, a dictionary user from a different culture might not always be able to decode them. While a user will know at some level that a place can stand for a person who works there, given that this is probably a universal, he or she might not know that Downing Street is the London residence of the UK prime minister. Similarly, a user might not know that while Downing Street and the palace are used metonymically, the names of the prime minister’s country house, Chequers, and the British monarch’s official residence, Windsor Castle, are never used in this way. Whether or not such uses are included in a dictionary will depend on editorial views about the inclusion of proper nouns and cultural information in addition to the lexicon proper.

Irony

Ironic uses are occasionally included in dictionaries. For example, the entry for great in CCALED has for sense 10 “You say great in order to emphasize that you are pleased or enthusiastic about something” and for sense 11 “You say great in order to emphasize that you are angry or annoyed about something” (2006, p. 634). This is presumably because enough concordance citations were found of the ironic use to warrant it having its own sense. However, as no explanation is given for these contrasting senses, the result might be puzzling for the learners at whom this dictionary is aimed. Generally speaking, though, the “central and typical” criterion for inclusion rules out many, if not most, instances of irony, because the ironic effect is achieved by contrasting contextual pragmatic meaning with conventional meaning. This means that precisely in order to achieve its effect, the figurative, contextual meaning of an ironic use is not, normally, central and typical. In fact, it can be argued that the contextual, ironic meaning is not actually a separate meaning at all. Hanks writes that although ironic and sarcastic uses “are undoubtedly exploitations of the normal meanings of the words involved, the words themselves have to be taken literally, at face value. The sarcastic, ironic or hyperbolic implicature of what is said takes place at the clause level, not at the lexical level” (2013, p. 236). Whether the contextual use is seen as having its own meaning or not, it cannot be regarded as a conventional meaning and would not therefore be included as a sense in the entry for a word.

How Should Literal and Figurative Meanings Be Ordered and Treated?

For corpus-based dictionaries, frequency is usually perceived as an important factor in selecting headwords and ordering senses. As noted above, it is not uncommon for a metaphorical sense to be more frequent than a literal one. This means that someone using a dictionary to decode an unknown word is more likely to have encountered the metaphorical sense than the literal one. If we assume that the user starts from the beginning of an entry and works down, it makes sense to present this first. However, as Moon notes (1987), putting the more frequent, metaphorical sense before the literal sense that it originated from disrupts the semantic flow of the entry. There is no straightforward solution, and decisions will be based partly on how the lexicographer thinks a dictionary will be used. In a dictionary for language learners in a hurry, it might be felt that the user wants to know the meaning that he or she has encountered in text, which is most likely to be the most frequent one in the corpus, assuming the corpus resembles the texts the learner uses. The learner may not be interested in other meanings; indeed, most language learners work hard to learn and retain just one meaning of a new word at a time and may not want to be distracted by other meanings, or by the origins of the word. However, if a dictionary is likely to be used for encoding and for language learners doing more extensive vocabulary-building work, there is an argument for trying to structure an entry to demonstrate semantic connections between senses. These will probably aid memorability and add interest. If this approach is taken, there is sometimes a case for mentioning the grounds of metaphorical extensions, especially when these are unpredictable because they are culturally specific or even erroneous. For instance, we use many animal metaphors to signal human qualities (some are described in Deignan’s guide to metaphors, 1995).
In the metaphors of British English, rats are sneaky, rabbits talk too much, horses are playful, weasels are deceitful, and squirrels are good at saving. For some animals, the origins of these uses are clearer than others. While squirrels bury nuts for a less plentiful winter in a way that is clearly analogous to prudent human savers, it is far from obvious why rabbit on is used to mean “talk too much in an uninteresting way.” For some, the metaphor seems motivated but not predictable: seeing rats negatively is probably common across many cultures, but it is not obvious that treachery specifically, rather than, say, viciousness or disease, is associated with them for British speakers. Similes offer a rich source of such parallels, and many of these are not even believed by most speakers. For example, it is often noted that although we say “sleep like a baby,” many babies only sleep for short stretches at a time. Metaphorical and metonymical uses often present a further problem for ordering. Corpus research has shown that they have a very strong tendency to occur in strong collocations and fixed expressions, bordering on classical idioms (Hanks 2004; Deignan 2005). The traditional place for idioms, and often for inflexible collocations, is at the end of an entry, usually the entry for the most lexically “heavy” of the component words. This of course reflects the awkward status of fixed expressions and idioms in a division of language into grammar and lexis. They cannot be ignored, but because they do not fit into either a grammar reference book or a dictionary organized around single words, they are swept up together and parked out of the way. Within a traditional dictionary, there does not seem to be a better solution. However, as publishing moves away from paper toward electronic resources, putting word senses and uses in one fixed, linear order is not the only possibility.
Electronic dictionaries offer the potential for nesting and embedding, automatically cross-referring, reordering according to user preferences, or searching in many ways other than traditional alphabetical order, but these are as yet far from fully exploited.

Dictionaries and Metaphor Scholarship

This chapter has concentrated on how dictionaries can identify and deal with figurative language. To conclude, I look briefly at the relationship the other way round: how dictionary use has contributed to metaphor scholarship. Techniques originally developed in corpus lexicography have been used by a number of metaphor scholars in recent years as the use of corpora has become mainstream. For example, Nerlich and her colleagues (2011) have used corpus techniques to examine metaphors of climate change in the LexisNexis database of US newspapers, while Philip (2011) has used them to study collocation and connotation in metaphor. Aside from lexicographical techniques, dictionaries themselves have been important to metaphor scholars. The two major projects in metaphor identification discussed above, which resulted in the MIP and MIPVU procedures, both rely on referring to English language dictionaries as sources of knowledge about central and typical language. They are used in two ways. Firstly, Pragglejaz (2007, pp. 15–16) notes that a dictionary – they use MED – can be used to identify lexical units in a text. This is the first step of the metaphor identification procedure, where decisions have to be made about whether multiword items should be broken down and analyzed in their component words or should be treated as a single lexical unit. Pragglejaz treats all headwords as lexical units. Phrases, collocations, and idioms described at the end of the entry are not treated as lexical units but analyzed word by word, with the exception of phrasal verbs, which are treated as holistic lexical units. There is a continuous cline from regular collocations, through more fixed strings such as idiomatic expressions, to phrasal verbs and polywords such as of course, with no gap or identifiable point at which a line can be drawn to separate lexical units from groups of individual words that happen to collocate regularly.
Using a dictionary to guide this decision at least ensures consistency and replicability. The second way in which dictionaries can inform metaphor identification is in making the decision about whether there is a more “basic” sense of a potential metaphor: “The main criterion for deciding whether two senses are sufficiently distinct is whether the contextual and the basic sense are listed as two separate, numbered sense descriptions in the dictionary” (Krennmayr 2008, p. 104). Krennmayr discusses difficulties that can arise when dictionaries conflate clearly different senses, probably for reasons of space, and recommends using several different dictionaries to cross-check. She concludes that despite some limitations, “corpus-based dictionaries are an important and useful tool in moving away from guesswork and intuition, instead supporting analysts’ linguistic metaphor identification with carefully compiled language data” (Krennmayr 2008, p. 114). Scholars of figurative language recognize the painstaking work of lexicography; its attention to consistency and concern with form, use, and typicality; and how this can be used to inform their work. The relationship between metaphor scholarship and lexicography is an interesting and mutually productive one.
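The criterion quoted from Krennmayr, together with the recommendation to cross-check several dictionaries, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the data model (an entry as a mapping from sense number to the meanings that sense covers), the function names, and the two toy entries, which are invented and do not reproduce any real dictionary.

```python
from typing import Dict, Optional, Set

# Hypothetical model of an entry: numbered sense -> meanings it covers.
Entry = Dict[int, Set[str]]


def sense_number(entry: Entry, meaning: str) -> Optional[int]:
    """Return the number of the sense that covers this meaning, if any."""
    for number, meanings in entry.items():
        if meaning in meanings:
            return number
    return None


def listed_separately(entry: Entry, contextual: str, basic: str) -> bool:
    """True if both meanings are listed, under different sense numbers."""
    c = sense_number(entry, contextual)
    b = sense_number(entry, basic)
    return c is not None and b is not None and c != b


def cross_check(entries: Dict[str, Entry], contextual: str,
                basic: str) -> Dict[str, bool]:
    """Apply the criterion to each dictionary and report every verdict."""
    return {name: listed_separately(entry, contextual, basic)
            for name, entry in entries.items()}


# Invented toy data: dictionary_a conflates the two meanings in one sense,
# dictionary_b gives each its own numbered sense.
entries = {
    "dictionary_a": {1: {"physical", "abstract"}},
    "dictionary_b": {1: {"physical"}, 2: {"abstract"}},
}
print(cross_check(entries, contextual="abstract", basic="physical"))
# {'dictionary_a': False, 'dictionary_b': True}
```

Returning one verdict per dictionary, rather than a single yes or no, leaves the final decision with the analyst when the sources disagree, which is the situation the cross-checking advice is meant to handle.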

References

Barcelona, A. (Ed.). (2000). Metaphor and metonymy at the crossroads: A cognitive perspective. Berlin/New York: Mouton de Gruyter.
Bowdle, B. F., & Gentner, D. (2005). The career of metaphor. Psychological Review, 112(1), 193–216.
Cameron, L. (2003). Metaphor and educational discourse. London: Continuum.
Cameron, L. (2007). Patterns of metaphor use in reconciliation talk. Discourse and Society, 18(2), 197–222.
Deignan, A. (1995). Collins COBUILD guides to English 7: Metaphor. London: HarperCollins.
Deignan, A. (1999). Corpus-based research into metaphor. In L. Cameron & G. Low (Eds.), Researching and applying metaphor. Cambridge: Cambridge University Press.
Deignan, A. (2005). Metaphor and corpus linguistics. Amsterdam/Philadelphia: John Benjamins.
Deignan, A., & Cameron, L. (2009). A re-examination of understanding is seeing. Cognitive Semiotics, 5(1–2), 220–243. http://www.degruyter.com/view/j/cogsem.2013.5.issue-1-2/cogsem.2013.5.12.220/cogsem.2013.5.12.220.xml
Dirven, R., & Pörings, R. (Eds.). (2001). Metaphor and metonymy in comparison and contrast. Berlin: Mouton de Gruyter.
Goatly, A. (1997/2011). The language of metaphors. London: Routledge.
Hanks, P. (1987). Definitions and explanations. In J. Sinclair (Ed.), Looking up: An account of the COBUILD project in lexical computing. London/Glasgow: Collins ELT.
Hanks, P. (2004). The syntagmatics of metaphor and idiom. International Journal of Lexicography, 17(3), 245–274.
Hanks, P. (2010). Nine issues in metaphor theory and analysis. International Journal of Corpus Linguistics, 15(1), 133–150.
Hanks, P. (2013). Lexical analysis: Norms and exploitations. Cambridge, MA: MIT Press.
Hunston, S. (2002). Corpora and applied linguistics. Cambridge: Cambridge University Press.
Kilgarriff, A. (2008). I don’t believe in word senses. In S. Atkins & M. Rundell (Eds.), Lexicography: A reader. Oxford: Oxford University Press.
Krennmayr, T. (2008). Using dictionaries in linguistic metaphor identification. In N.-L. Johannessen & D. C. Minugh (Eds.), Proceedings of the Stockholm Metaphor Festival. Stockholm: Department of English.
Lakoff, G. (1987). The death of dead metaphor. Metaphor and Symbolic Activity, 2(2), 143–147.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: Chicago University Press.
Lehrer, A. (1970). Static and dynamic elements in semantics: Hot, warm, cool, cold. Papers in Linguistics, 3(2), 349–373.
Lehrer, A. (1978). Structures of the lexicon and transfer of meaning. Lingua, 45(2), 95–123.
Louw, B. (2010). Semantic prosody for the 21st Century: Are prosodies smoothed in academic contexts? A contextual prosodic theoretical perspective. JADT 2010: 10th International Conference on Statistical Analysis of Textual Data.
Moon, R. (1987). The analysis of meaning. In J. Sinclair (Ed.), Looking up: The COBUILD project in lexical computing. London/Glasgow: Collins ELT.
Moon, R. (1998). Fixed expressions and idioms in English: A corpus-based approach. Oxford: Clarendon.
Nerlich, B., Evans, V., & Koteko, N. (2011). Low carbon diet: Reducing the complexities of carbon change to the human level. Language and Cognition, 3(1), 45–82.
Nunberg, G. (1979). The non-uniqueness of semantic solutions: polysemy. Linguistics and Philosophy, 3(2), 143–184.
Philip, G. (2011). Colouring meaning: Collocation and connotation in figurative language. Amsterdam/Philadelphia: John Benjamins.
Pragglejaz Group. (2007). MIP: A method for identifying metaphorically used words in discourse. Metaphor and Symbol, 22(1), 1–39.
Sinclair, J. (Ed.). (1987). Looking up: The COBUILD project in lexical computing. London/Glasgow: Collins ELT.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Steen, G. J., Dorst, A. G., Herrmann, J. B., Kaal, A. A., Krennmayr, T., & Pasma, T. (2010). A method for linguistic metaphor identification. Amsterdam/Philadelphia: John Benjamins.
Warren, B. (2003). An alternative account of the interpretation of referential metonymy and metaphor. In R. Dirven & R. Pörings (Eds.), Metaphor and metonymy in comparison and contrast. Berlin: Mouton de Gruyter.

Dictionaries

Hornby, A. S. (Ed.). (2010). Oxford Advanced Learner’s Dictionary of Current English. Oxford: Oxford University Press.
Rundell, M., & Fox, G. (Eds.). (2002). Macmillan English Dictionary for Advanced Learners. London: Macmillan Education.
Sinclair, J. (2006). Collins COBUILD Advanced Learner’s English Dictionary. London: HarperCollins.

International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_6-2 # Springer-Verlag Berlin Heidelberg 2014

Bilingual Lexicography: Translation Dictionaries

Arleta Adamska-Sałaciak* Faculty of English, Adam Mickiewicz University, Poznań, Poland

Abstract

The present chapter examines the peculiarities of lexicography linking two languages. It addresses the following broad issues: Can bilingual dictionaries legitimately be called translation dictionaries? What language pairs do they normally cover? Who uses them? What are the most persistent problems faced by bilingual lexicography and what time-honored theoretical assumptions are they grounded in? Why is lexicographic equivalence a problematic notion? How is the bilingual dictionary currently changing and what is its future likely to be? The discussion is preceded by a few words of general introduction.

Introduction

The bilingual dictionary is a dictionary par excellence. It was born in response to a genuine practical need (to understand texts and utterances in a foreign language) and is still predominantly a useful practical tool rather than – as is often the case with large monolingual dictionaries – an essentially symbolic object which confers status upon the owner but is rarely taken off the shelf. Despite the bilingual dictionary’s clearly man-made origins, its historical development in the West can be thought of as a kind of organic growth. Rather than having been designed in a single step, with all its current features in place from the start, it evolved gradually from something much more basic: vernacular explanations (glosses) of individual foreign words and phrases, written by medieval scribes in between the lines and in the margins of Latin manuscripts. Later, the native glosses, paired with the foreign words they were meant to explain, started to be collected into separate lists (glossaries). Later still, when the glossaries had grown too large to make efficient consultation possible, their compilers began to arrange the contents either alphabetically (at first only according to the first letter of every word) or thematically. It is at that point that we start talking of the first dictionaries. Gradually, in addition to dictionaries with Latin as the source language, bilingual dictionaries appeared in Europe whose source language was one actually spoken (French, Spanish, and so on). To begin with, they comprised only one part, leading from L2 (a foreign language) to L1 (the intended users’ native language); the L1-L2 (native-foreign) part was always a later development. Unless indicated otherwise, our overview of the field of bilingual lexicography will be concerned with the “complete,” two-way dictionary (L2-L1 and L1-L2).

Page 1 of 11 International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_6-2 # Springer-Verlag Berlin Heidelberg 2014

Description

Bilingual Dictionaries = Translation Dictionaries?

It is common practice in the metalexicographic literature for the two terms to be used interchangeably. There is, however, one notable text in which the equation has been questioned: the 1940 essay by the Russian linguist Lev N. Ščerba (1995). A practicing (Russian-French) lexicographer himself, Ščerba argued that only the L1-L2 dictionary could properly be considered a translation dictionary, whereas the L2-L1 dictionary should rather be called explanatory. His reasoning was that the two perform essentially different functions: translation from the user’s native into a foreign language (L1-L2 dictionary) as opposed to explaining the meanings of foreign-language words and expressions (L2-L1 dictionary). When one starts from one’s L1, he argued, the meaning to be expressed, encoded in a particular L1 lexical item, is clearly understood – all that is needed is a corresponding L2 item which can express the same meaning. The lexicons of different languages being essentially incommensurable, such an L2 counterpart will usually be no more than an approximation of the L1 unit, but it should suffice for the purposes of basic translation; anyway, under the circumstances, this is the best that can be done. By contrast, in the L2-L1 dictionary, Ščerba believed that discursive L1 explanations could do a much better job than the necessarily imprecise L1 equivalents which cannot but misrepresent L2 meanings. Accordingly, he postulated that the practice of offering one-word native-language equivalents be abandoned in the L2-L1 dictionary, in favor of something resembling monolingual definitions written in the users’ L1. Both practicing lexicographers and theoreticians have remained largely immune to those suggestions.
Just as in L1-L2 dictionaries, source-language (SL) headwords in L2-L1 dictionaries are normally supplied with target-language (TL) equivalents rather than with extensive explanations of meaning, and the practice of referring to bilingual dictionaries as translation dictionaries is alive and well. Apart from the sheer force of tradition, part of the reason may be that the vast majority of bilingual dictionaries produced to date have been bidirectional, that is, designed to cater for speakers of both languages at once. In such a dictionary, what is the L1-L2 part for one group of users acts as the L2-L1 part for the other, and vice versa. Consequently, it is not possible to vary the treatment in the two parts, making the foreign-native part more explanatory in nature so as to give the user a better idea of L2 meanings. This may soon change, since, with the switch to the electronic medium, not only making monodirectional dictionaries (i.e., dictionaries addressed exclusively to the speakers of one of the languages) but also customizing individual entries to suit individual users’ needs and preferences is rapidly becoming much easier. Even without making an excursus into the future, but simply by examining the properties of existing bilingual dictionaries, one can arrive at the conclusion that the term translation dictionaries may be something of a misnomer or, at best, an oversimplification. First of all, while the average bilingual dictionary entry can go some way toward helping us understand the meaning of the SL word or phrase in isolation, and at times also suggest a working TL equivalent, it does not always guarantee a correct translation in context. Through no fault of either the lexicographer or the user – that is, even if the dictionary is a good one and the user knows how to use it to the best advantage – successful translation is by no means the standard outcome of bilingual dictionary consultation. 
The reasons for that lie in certain immanent properties of language and meaning about which more will be said later. Secondly, it is obvious that bilingual dictionaries perform many different tasks besides translation. In particular, a good monodirectional dictionary is not just a quest for sameness (equivalence) between the lexical units of the two languages, but also an attempt to identify areas of convergence and divergence between the respective lexicons. Being made aware of the differences between L1 and L2 is something language learners need just as much as, and sometimes even more than, being issued with pairs of interlingual equivalents (see, e.g., Augustyn 2013, p. 365). In other words, not all dictionary consultations are motivated by the desire to solve an immediate communicative problem: some happen in what Wiegand (1999, p. 76) called “didactic look-up situations.” This implies that, apart from being a tool enabling quick reference, a bilingual dictionary should double as a teaching aid: it should serve both a communicative function (satisfying users’ information needs) and a cognitive one (facilitating in-depth study of the foreign language). While translation is often relevant in the former case, it need not be so in the latter. Finally, in view of the direction lexicography as a whole is taking, especially given the likely future integration of dictionary components with other functionalities, it becomes pointless to single out the translation aspect of bilingual dictionaries as somehow still defining the genre.

Language Pairs Covered by Bilingual Dictionaries

It would be unrealistic to expect that a bilingual dictionary will by now have been written for every possible pair of natural languages. Far from it. Since the production of dictionaries has always been governed by practical considerations, the traditionally privileged group comprises major world languages, i.e., those which are widely spoken and which many people are interested in learning. Such languages have been paired with numerous others, so that it should not be very difficult to find, e.g., an English-Lx, Lx-English dictionary, where Lx can be virtually any language. Another group in which bilingual lexicographers have understandably been interested includes, on the one hand, dead languages associated with important ancient cultures and, on the other, so-called exotic languages. Any language which is still alive but neither widely spoken nor particularly exotic is unlikely to have received nearly as much attention. This is now beginning to change, albeit slowly. Given that the costs of online publication are a fraction of the costs of producing a printed volume, more and more dictionaries can be compiled for language pairs for which the demand remains very modest. As a taste of what is to come, some instances of bilingual dictionaries covering unusual language combinations can be examined at http://www.dicts.info or at http://www.owid.de/obelex/dict.

Users and Uses of Bilingual Dictionaries

Despite the dynamic development of research into dictionary use, the results of that research have impacted our knowledge only in certain specific areas. Thus, we have learned a great deal about what happens when college students and, to a lesser extent, high school students use bilingual dictionaries, those being the groups on which most studies have focused. As far as the population at large is concerned, however, common sense and personal experience must, for the moment, remain our main guide to the nature of bilingual dictionary usership. In the absence of evidence to the contrary, it seems reasonable to assume that the two major groups of users of L2-L1 dictionaries comprise, on the one hand, beginner learners (for whom turning to a monolingual learner’s dictionary, written entirely in L2, is not an option) and, on the other, very advanced students, such as translators or language teachers at the highest level, including those with near-native command of the foreign language. For this latter, advanced group, knowledge of an L2 word or phrase, even when both passive and active, is frequently not enough: what they need is an actual L1 equivalent which can, for instance, be put into a written translation or used in interpreting. Availing oneself of a bilingual dictionary makes eminent sense especially when time is of the essence. Instead of racking one’s brain for the best possible equivalent, the translator/interpreter can concentrate on the big picture, such as figuring out the overall meaning of the passage to be translated and devising the best strategy for rendering it in L1. (An even more obvious case is that of translators of specialized texts, who, not being domain experts, need L1 equivalents for L2 terms whose meaning they do not fully grasp.) When it comes to L1-L2 dictionaries, the main beneficiaries are, again, beginner learners, for whom there is literally no alternative to the bilingual dictionary. Upon reflection, though, it becomes evident that virtually everyone needs L1-L2 dictionaries, irrespective of their L2 proficiency (for an in-depth discussion, see Adamska-Sałaciak 2010b). Apart from the obvious case of culture-specific L1 vocabulary, no suggestions for dealing with which can ever be found in a monolingual dictionary of L2, there are more general reasons for this state of affairs. No matter how good a person’s grasp of a foreign language is, they will not always be able to come up with the L2 lexical item they need on a particular occasion. They may know the item in question, in the sense of having it stored somewhere in their mental lexicon, but this is not tantamount to being able to retrieve it at will. This is precisely when a bilingual L1-L2 dictionary, leading to the desired item via the user’s native language, comes to the rescue. (An alternative search route would be to start with another L2 item, a near-synonym of the one that eludes us, i.e., to proceed in the same way a native speaker would. This, of course, crucially depends on the prior existence of a near-synonym and on the speaker’s ability to recall it, and therefore will not always work.
Importantly, retrieval problems are not limited to a person’s second language; even within the mother tongue, one is not always able to access the lexical unit one wants, as evident, for instance, in tip-of-the-tongue phenomena.) On a more general level, there is a growing, albeit belated, realization (see, e.g., Augustyn 2013 and the works cited therein) that there can be no L2 acquisition without recourse to the learner’s L1. Hence the indispensability of lexical resources – especially, though not exclusively, bilingual dictionaries – which help students make the necessary connections between the unknown (the L2 to be learned) and the already known (their L1). So-called direct methods, which make a point of avoiding the learner’s native language at all costs – and which, until quite recently, had dominated the language-teaching scene – of necessity concentrate not on the most frequent, and thereby most needed, lexical items but on those that can be easily presented to the learners, either through visual aids or with the help of a limited amount of basic L2 vocabulary. It is genuinely puzzling how methods which explicitly condemn the use of the native language in the classroom, effectively banning bilingual dictionaries, could ever have been considered beneficial in the teaching and learning of foreign languages. (Needless to say, the view was never universally accepted; see, e.g., Tomaszczyk 1983.)

Assumptions Behind (and Challenges for) Bilingual Dictionary Making

The rationale behind bilingual dictionaries has always been that languages are mutually translatable, that is, that any content expressed in any language can be expressed in any other. Most linguists would agree that this is a sound assumption. However, the (usually implicit) further assumption, namely, that a bilingual dictionary is not only a necessary prerequisite but also a sufficient condition for successful interlingual translation, rests on a profound misunderstanding, both of the nature of human language as such and of the nature of the relationships between different languages. Contrary to the commonsense view, which has for centuries gone unchallenged and still prevails in lay circles, lexical units are not straightforward labels for objectively identifiable, non-overlapping portions of extralinguistic reality, each additionally associated with a unique concept (meaning) in the native speaker’s mind. As pointed out already by Democritus (460–370 BC), the following kinds of situation render false that naïve belief: the same sequence of sounds may be associated with two or more meanings; different words within one language may have the same meaning; a given language may have no single word for a certain simple or familiar idea (Householder 1995, p. 93). What all this amounts to is that there is no universal one-to-one correspondence between form and meaning (i.e., between word/expression and concept), nor between either of those and individual objects or events independently existing in the world. The most important of Democritus’ observations pertains to the possibility of one sequence of sounds (form) having more than one meaning. It is, in fact, more than a mere possibility: most items of general language do indeed appear to be polysemous. According to the cognitive linguistic model of language (see, e.g., Langacker 2002) – which, incidentally, in many crucial respects echoes the tenets of traditional diachronic semantics (Geeraerts 1988) – a lexical item is typically associated with a variety of interrelated senses. Within such a network of senses, there is usually one – sometimes more than one – core (prototypical) sense from which the remaining senses have developed over time, whether through extension (generalization), narrowing (specialization), or figurative use (usually via metaphor or metonymy). Accepting the above as the most convincing account of lexical meaning in existence, it would be extremely naïve to expect the sense networks of two lexical items coming from two different languages to be completely identical. Yet this is precisely the premise underlying any bilingual dictionary entry of the form:

X_SL  Y_TL

where X_SL is the source-language headword and Y_TL is its sole target-language equivalent. Since there are many bilingual dictionaries where the majority of entries have precisely that form, we cannot but conclude that, strictly speaking, the whole genre is built on rather shaky foundations. (This applies also to bilingual terminological dictionaries, since, again contrary to popular belief, specialist vocabulary is not radically different from nontechnical language. As convincingly demonstrated by Temmermann (2000), outside certain normative contexts, intralingual monosemy and perfect interlingual correspondence turn out to be remarkably rare in terminology.) What, if anything, can be done to remedy the situation? Obviously, no amount of progress in bilingual lexicography can ever compensate for the intrinsic incommensurability of the lexicons of different languages. Nonetheless, good lexicographers have long been trying to construct bilingual dictionary entries in a manner which reflects the complexity of interlingual lexical relationships as faithfully as possible; some commonly employed strategies are discussed in Adamska-Sałaciak (2014). Additionally – and more controversially – it could, perhaps, be argued that the time is now ripe to make it one of the tasks of the bilingual dictionary to raise its users’ awareness of the problem of anisomorphism (i.e., lack of one-to-one interlingual correspondence) instead of continuing to sweep it under the carpet. Today’s users, at home in the digital world and thus familiar with tools such as Google Translate – which entails being aware of the imperfections of those tools – may be better equipped to deal with this unwelcome revelation than their twentieth-century and earlier predecessors. How might the consciousness-raising exercise work in practice?
Instead of being obliged by the requirements of the genre to pretend that every SL item can be supplied with a TL equivalent, whenever interlingual asymmetry makes any potential equivalent candidate too far-fetched and thereby misleading, the bilingual lexicographer should be allowed to explicitly inform the user – either inside the entry itself or in an accompanying note – of the existence of a TL lexical gap (see LSW, especially the Polish-English part, for examples). It seems that, apart from tradition, the main practical reason why such admissions were in the past deemed impossible was the space restrictions imposed by the printed medium.

Of course, one new problem to which the proposed relaxation of rules might give rise would be the temptation to fall back on the no-equivalent available strategy whenever the lexicographer encounters even a minor difficulty. To minimize the chances of that happening, the dictionary’s style guide should make it abundantly clear that the strategy ought to be treated as a last resort and employed primarily when dealing with culture-specific vocabulary or when a particular grammatical category (e.g., ideophones) does not exist in the target language. Also, no matter how futile the quest for an equivalent, the user must never be left without further assistance. After declaring the impossibility of providing an equivalent (and, when deemed useful, explaining why this is the case), the lexicographer should proceed to suggest ways of patching up the asymmetry, employing any one of a number of well-tried techniques – for instance, embedding the difficult item in a wider context and translating the whole lot (thereby aiming at functional equivalence; for examples involving ideophones, see OZSD and the discussion in De Schryver 2009, as well as Adamska-Sałaciak 2011). There are other reasons, apart from anisomorphism, why the traditional structure of the bilingual dictionary – basically, a list of allegedly exact lexical correspondences between two languages – is hardly a perfect reflection of how meaning is distributed intra- and interlingually. One problem, not entirely unrelated to the lack of isomorphism, is that units of meaning are not always coextensive with orthographic words: they may be both smaller (morphemes) and larger (multi-word units); furthermore, different languages make different choices in particular cases. 
Another difficulty stems from the vagueness of meaning – an inherent property of natural language, highly advantageous from the point of view of the ease of language acquisition and language use, but highly problematic when it comes to writing dictionary entries, whether mono- or bilingual (see Adamska-Sałaciak 2013, pp. 222–223). In view of all this, it is truly remarkable that so many bilingual dictionaries have been written over the centuries and that so many people have found them helpful. Perverse as it may sound, perhaps it is just as well that the authors, especially of the early dictionaries, have mostly been unaware of the problems identified above: had they realized the enormity of what they were setting out to do, we might not have any bilingual or multilingual dictionaries to speak of.
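The lexical-gap strategy described in this section (declare that no equivalent exists, say why, and then still help the user by translating the item in a wider context) can be made concrete as an entry structure. What follows is only a minimal sketch under stated assumptions: the class names `Sense` and `ContextExample`, the field names, and the `[no equivalent]` label are hypothetical and do not reproduce the format of LSW, OZSD, or any other existing dictionary. The strings in the usage lines are placeholders, not real lexical data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextExample:
    """An SL phrase containing the headword, translated as a whole."""
    source_text: str
    target_text: str


@dataclass
class Sense:
    """One sense of a source-language (SL) headword (hypothetical model)."""
    gloss: str                                   # short sense indicator
    equivalents: List[str] = field(default_factory=list)
    gap_note: Optional[str] = None               # why no TL equivalent exists
    examples: List[ContextExample] = field(default_factory=list)

    @property
    def is_lexical_gap(self) -> bool:
        # A sense with no TL equivalents is treated as a lexical gap.
        return not self.equivalents

    def validate(self) -> None:
        """A gap must carry a note and at least one translated context,
        so that the user is never left without further assistance."""
        if self.is_lexical_gap and not (self.gap_note and self.examples):
            raise ValueError("a gap needs a note and a translated context")


def render(headword: str, sense: Sense) -> str:
    """Format one sense as plain text, flagging gaps explicitly."""
    sense.validate()
    if sense.is_lexical_gap:
        lines = [f"{headword} ({sense.gloss}): [no equivalent] {sense.gap_note}"]
    else:
        lines = [f"{headword} ({sense.gloss}): {', '.join(sense.equivalents)}"]
    lines += [f"  {ex.source_text} = {ex.target_text}" for ex in sense.examples]
    return "\n".join(lines)


# Placeholder data only: no real SL or TL items are claimed here.
gap_sense = Sense(
    gloss="culture-specific item",
    gap_note="no established TL word",
    examples=[ContextExample("SL phrase with the headword",
                             "TL rendering of the whole phrase")],
)
print(render("HEADWORD", gap_sense))
```

Making the validation step refuse a bare gap is one way a style guide's "last resort" rule could be enforced mechanically; whether such a check is desirable in practice is an editorial decision, not something the sketch settles.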

The Vexed Question of Equivalence

One consequence of the problems discussed in the preceding section is that the main idea of the bilingual dictionary – that of providing a perfect TL equivalent for each (sense of a) SL headword – turns out not to be a realistic goal but rather something toward which lexicographers can merely strive. Extensive study of large amounts of data from different bilingual dictionaries strongly suggests that the items proposed as lexicographic equivalents are not all of a kind; consequently, attempts have been made in the metalexicographic literature to group them into more or less distinct classes. One such classification, inspired by Zgusta (1971) and developed by Adamska-Sałaciak (2006, 2010a, 2011), features four equivalence types: cognitive, translational, explanatory, and functional, briefly characterized below. A cognitive equivalent has a high explanatory potential, i.e., it is capable of faithfully rendering the meaning of the SL headword. Its identification is often more or less effortless, because it tends to spring to mind immediately after a bilingual speaker (lexicographer) has been presented with an SL headword. Thanks to that, cognitive equivalents are frequently identical in different dictionaries for the same language pair, which gives rise to the feeling that they are somehow “real” or “true.” On the downside, a cognitive equivalent is often too general to work as a translation of the SL item in a particular context.

A translational equivalent, by contrast, while not being wholly identical in meaning to the SL headword, produces a good translation when substituted for it in a particular context (not least because it has similar combinatory properties). As there is no upper limit on the number of contexts in which a given SL lexeme may occur, the number of its potential TL translational equivalents may also be quite high. A bilingual dictionary can only supply a few such equivalents per sense, zooming in on those usable in the most typical contexts in which the SL item occurs. Unlike a cognitive or a translational equivalent, an explanatory equivalent is not an established TL unit, but a free TL combination: a succinct paraphrase of the meaning of the SL headword. It resembles a mini-definition in the target language, except that it is normally shorter. Finally, functional equivalence is a relation holding not between the meanings of individual lexical items but between the meanings of longer stretches of text. Typically, the TL text portion either contains a TL word of a different grammatical category than the SL headword or features no element whatsoever directly corresponding to that headword. The boundaries between different equivalence types are not always sharp in that one and the same item may, on occasion, be a realization of more than one type (illustrations can be found in Adamska-Sałaciak (2006, pp. 106–117), and forthcoming). Rather than rendering the whole classification useless, this is believed to be an inevitable consequence of the nature of interlingual relationships. The important thing is that there are two distinct, complementary criteria at work: semantic and distributional (functional). According to the semantic criterion, two items are considered equivalent if they have the same meaning (in the sense of having identical definitions, rather like synonyms in the same language).
According to the distributional criterion, two items are considered equivalent if they can be used as each other’s translations in context(s). Ideally, an equivalent supplied by a bilingual dictionary should fulfill both criteria at once, that is, it should both explain the headword’s meaning and be substitutable for the headword in context. This happens occasionally (especially with autosemantic items, i.e., those whose meaning is largely context-independent), but is by no means the rule. On the whole, cognitive and explanatory equivalents meet the former criterion, while translational and functional equivalents meet the latter. Assuming that, most of the time, only one of the requirements can be satisfied, which of them should be judged more important? Two possible answers suggest themselves. Considering the issue from the point of view of the two parts of a bilingual dictionary, it would seem that the semantic criterion prevails in the L2-L1 part (whose main task is to faithfully render SL meanings), while the functional criterion takes precedence in the L1-L2 part (which should equip the user with ready-made solutions in the form of L2 items usable in their own linguistic production). But we can also adopt a wider perspective, taking into account the tendencies observable in present-day bilingual lexicography. Given how the dictionary as a stand-alone product gradually recedes into the background, merging with different other kinds of language resources available online (or, in general, in digital rather than book form), it seems fair to say that the future belongs to translational and functional equivalence. This is true at least insofar as the average user is concerned, one who, more often than not, needs lexical and other linguistic information to solve a particular local problem (thus satisfying a concrete communicative need), and expects to be able to find that information anytime, anywhere.
Linguistic scholars and word aficionados will, of course, always be interested in cognitive and explanatory equivalence, but both these (frequently overlapping) groups have always been, and are likely to remain, in the minority. One of the more exciting prospects related to the issue of equivalence is that, as envisioned by Atkins (1996, p. 526), in bilingual dictionaries of the future, the more sophisticated users (we should perhaps add: those with enough time on their hands) will be able to avail themselves of direct access to copious corpus citations. This, in some cases at least, ought to make it possible for them to make their own decisions regarding equivalence, instead of always having to rely on the lexicographer’s suggestions.
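The relationship between the four equivalence types, the two criteria, and the two parts of the dictionary can be summarized as a small lookup structure. The sketch below is offered purely as an illustration: the identifiers (`Criterion`, `EquivalenceType`, `PRIMARY_CRITERION`, `PREVAILING`, `preferred_types`) are hypothetical, and the one-type-one-criterion mapping deliberately simplifies the account given above, which stresses that the boundaries between types are not always sharp and that some equivalents satisfy both criteria.

```python
from enum import Enum


class Criterion(Enum):
    SEMANTIC = "explains the headword's meaning"
    DISTRIBUTIONAL = "substitutable for the headword in context"


class EquivalenceType(Enum):
    COGNITIVE = "cognitive"
    TRANSLATIONAL = "translational"
    EXPLANATORY = "explanatory"
    FUNCTIONAL = "functional"


# Criterion each type meets "on the whole" (a simplification).
PRIMARY_CRITERION = {
    EquivalenceType.COGNITIVE: Criterion.SEMANTIC,
    EquivalenceType.EXPLANATORY: Criterion.SEMANTIC,
    EquivalenceType.TRANSLATIONAL: Criterion.DISTRIBUTIONAL,
    EquivalenceType.FUNCTIONAL: Criterion.DISTRIBUTIONAL,
}

# Criterion said to prevail in each part of a two-way dictionary.
PREVAILING = {
    "L2-L1": Criterion.SEMANTIC,
    "L1-L2": Criterion.DISTRIBUTIONAL,
}


def preferred_types(direction: str) -> list:
    """Return the equivalence types favored in the given dictionary part."""
    criterion = PREVAILING[direction]
    return [t for t in EquivalenceType if PRIMARY_CRITERION[t] is criterion]


for part in ("L2-L1", "L1-L2"):
    print(part, [t.value for t in preferred_types(part)])
```

A production data model would presumably allow an equivalent to carry more than one type label; the single-valued mapping is kept here only to mirror the "on the whole" generalization.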

Current Tendencies and Likely Future Developments

The bilingual dictionary has a great deal in common with other lexicographic genres, notably with the monolingual learners’ dictionary. Consequently, numerous currently observable trends and predictions for the future – hinted at in the preceding sections and featuring prominently in other contributions to this handbook – are not restricted to bilingual lexicography. For this reason, as well as because of space limitations, only some of the relevant issues will be addressed at this point. Further inspiration can be sought, for example, in the landmark papers by Atkins (1996) and De Schryver (2003) or in the magisterial overview of the state of the art by Rundell (2012). By far the most important development in twentieth-century dictionary making, hailed by many as a genuine revolution, was the shift from introspection-based to corpus-based compilation, whose beginnings go back to the 1980s. It would be premature to claim that all of bilingual lexicography has already been revolutionized as a result, but there is no doubt that a twenty-first-century bilingual dictionary worthy of the name cannot be compiled without recourse to an electronic corpus, or rather two corpora: one for each of its object languages. Corpora provide evidence for SL meanings and for the frequency of occurrence of lexical items. They also help lexicographers identify common syntactic patterns and characteristic phraseological combinations, draw up representative, up-to-date word lists, demonstrate the behavior of headwords in typical contexts, and illustrate that behavior with authentic examples of usage. There is, in addition, an important use to which corpora can be put specifically by the bilingual lexicographer.
Visible improvements in the precision of lexicographic equivalents could no doubt be achieved by analyzing huge amounts of concordance data from parallel corpora (i.e., corpora consisting of SL texts and their translations) and/or by trawling comparable SL and TL corpora (i.e., corpora containing texts of the same type, from the same period, and dealing with the same topic) for recurrent interlingual correspondences. Unfortunately, the limited availability of corpora of the right kind remains, for the time being, an insurmountable obstacle; to this author’s knowledge, no bilingual dictionaries have yet been compiled exclusively in this way. It is hard to speculate about when things are likely to change, for, even when the data is all there, the process of corpus-based equivalent identification will remain extremely time-consuming and, as a result, prohibitively costly. The more components of that process can be automated (for instance, through the further development of tools such as the Sketch Engine (http://www.sketchengine.co.uk/)), the greater the chances of success. Among the more immediate benefits of the availability of huge amounts of corpus data, one should mention the relative ease of introducing more syntagmatic context into the dictionary entries, including L2 collocations and other conventionalized multi-word units containing the L2 headword. The pragmatic properties of L2 headwords, such as their register and style, can also be described in more detail and with more confidence, the corpus providing direct evidence in support of (or sometimes contradicting) the lexicographer’s intuitions. All this goes a long way toward realizing the idea of the so-called active dictionary, i.e., one which reliably assists the users’ own linguistic production.
In sum, the computer revolution has transformed the way dictionaries are compiled, offering access to information – e.g., on the frequency of occurrence of lexical items or on their selectional preferences – not normally available to individual introspection, as well as increasing lexicographers’ options with regard to the kind of content they might want to put in the dictionary.
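To give a rough idea of what trawling a parallel corpus for recurrent interlingual correspondences might involve, the following sketch ranks TL words by how strongly they co-occur with an SL headword across sentence-aligned pairs. It is an illustration only, not a description of how any existing dictionary or tool (the Sketch Engine included) works: the Dice coefficient is simply one standard association measure chosen here for brevity, the function name `dice_candidates` is hypothetical, and the four-sentence English-German corpus is toy data.

```python
from collections import Counter


def dice_candidates(aligned_pairs, headword, top_n=3):
    """Rank TL words by Dice association with an SL headword.

    aligned_pairs: list of (sl_tokens, tl_tokens) sentence pairs.
    Dice = 2 * joint / (sl_count + tl_count), counted per sentence pair.
    """
    sl_count = 0            # sentence pairs whose SL side has the headword
    tl_counts = Counter()   # sentence pairs containing each TL word
    joint = Counter()       # pairs with the headword AND the TL word
    for sl_tokens, tl_tokens in aligned_pairs:
        tl_types = set(tl_tokens)
        tl_counts.update(tl_types)
        if headword in sl_tokens:
            sl_count += 1
            joint.update(tl_types)
    scores = {
        tl: 2 * joint[tl] / (sl_count + tl_counts[tl])
        for tl in joint
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy sentence-aligned corpus (assumed data, far too small for real use).
corpus = [
    (["the", "dog", "sleeps"], ["der", "Hund", "schläft"]),
    (["the", "dog", "barks"], ["der", "Hund", "bellt"]),
    (["the", "cat", "sleeps"], ["die", "Katze", "schläft"]),
    (["the", "man", "sleeps"], ["der", "Mann", "schläft"]),
]

ranked = dice_candidates(corpus, "dog")
for word, score in ranked:
    print(f"{word}\t{score:.3f}")
```

Even on this toy data the article der scores well behind Hund but ahead of everything else, which hints at why the output of such procedures still has to be sifted by a lexicographer and why the process remains time-consuming.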

Apart from the rewards of the computer revolution, lexicography is now experiencing the effects of a more recent, digital revolution. One such effect is that online bilingual dictionaries for numerous language pairs now exist (or are in preparation) which, for economic reasons, would have taken much longer to materialize in print; as an example, consider the reference works developed in connection with the Norwegian LEXIN project (http://decentius.hit.uib.no/lexin.html?ui-lang=eng&om-prosjektet). The long-term effects of the digital revolution are generally expected to be truly spectacular, culminating in the disappearance of dictionaries as we know them. The consensus among experts is that stand-alone dictionaries will eventually become “invisible,” due to integration with other linguistic and nonlinguistic data sources. The integration will proceed through the merging of different types of dictionaries (bilingual with monolingual, semasiological with onomasiological, general-purpose with terminological (LSP), etc.), through the consolidation of lexical description and encyclopedic information, through the incorporation of dictionary content into CALL materials, and through the embedding of dictionaries in other software (e.g., automatic translation tools) and hardware (e.g., e-readers). Some of these processes are already well under way in the field of LSP lexicography; as an example, consider the ARTES project (https://artes.eila.univ-paris-diderot.fr/) in which a multilingual LSP dictionary is generated from terminological and phraseological databases (details in Kübler and Pecman 2012). Within the niche that the “invisible” dictionary is shortly going to occupy, its coverage can be made much more thorough than in traditional, paper publications, and the criteria for inclusion are certain to become relaxed.
This is partly a result of dictionary makers not having to worry about space restrictions and partly a consequence of their trying to meet the growing expectations of dictionary users. Today’s young users expect to be able to find information about anything and everything – including anything lexical – that they happen to come across, whether in the real or in the virtual world. This means, among other things, that lexicographers need to be generous (and quick) with the provision of lexical innovations, both lexical neologisms (i.e., new words) and semantic ones (i.e., new senses of existing words); in the bilingual context, it implies being ready to admit incipient, noninstitutionalized SL borrowings as TL equivalents in the L2-L1 dictionary (e.g., see Adamska-Sałaciak 2014). A closely related issue is that updating dictionary content online can be a more or less continuous process. It is also possible that some of the updating will happen through crowdsourcing. Although user-generated content is unlikely to feature as prominently in bilingual as in monolingual dictionaries (or, rather, functionalities), there is some room for collaborative lexicography here, too, with users being encouraged to suggest new items for inclusion and, to a limited extent, to propose equivalent candidates, especially for new specialist terminology and for highly informal and/or slang words and expressions. Many more possibilities have been described in the literature and/or discussed at recent lexicographic conferences; new ideas are doubtless being conceived as we write, but there is only room to mention a couple of particularly intriguing ones here.
In a now classic paper, Atkins (1996) sketched the logic behind what she called a virtual dictionary: one which exists only at the point of access, when the user formulates a query (the answer to which is generated through a system of hyperlinks from the databases of two languages, compiled according to the same theoretical framework). At the 2013 conference on e-lexicography, Tavast presented the theoretical assumptions behind an innovative bilingual dictionary (with Estonian and Latvian), work on which is currently in progress. Although organized in the traditional (semasiological) way from the point of view of the end user, the dictionary is being compiled in an unorthodox manner, starting from an onomasiological data structure (again, the prerequisite is the absolute internal consistency of the databases for both languages). The delineation of the rationale behind the project and, especially, the discussion following it (http://eki.ee/elex2013/videos/) are an excellent illustration of how exciting life can be for a bilingual lexicographer in the twenty-first century.

References

Adamska-Sałaciak, A. (2006). Meaning and the bilingual dictionary: The case of English and Polish. Frankfurt am Main: Peter Lang.
Adamska-Sałaciak, A. (2010a). Examining equivalence. International Journal of Lexicography, 23(4), 387–409. doi:10.1093/ijl/ecq024.
Adamska-Sałaciak, A. (2010b). Why we need bilingual learners’ dictionaries. In I. Kernerman & P. Bogaards (Eds.), English learners’ dictionaries at the DSNA 2009 (pp. 121–137). Tel Aviv: K Dictionaries.
Adamska-Sałaciak, A. (2011). Between designer drugs and afterburners: A lexicographic-semantic study of equivalence. Lexikos, 21, 1–22.
Adamska-Sałaciak, A. (2013). Issues in compiling bilingual dictionaries. In H. Jackson (Ed.), The Bloomsbury companion to lexicography (pp. 213–231). London: Bloomsbury Academic.
Adamska-Sałaciak, A. (2014). Explaining meaning in a bilingual dictionary. In P. Durkin (Ed.), The Oxford handbook of lexicography. Oxford: Oxford University Press (forthcoming).
Atkins, B. T. S. (1996). Bilingual dictionaries: Past, present and future. In M. Gellerstam, J. Järborg, S.-G. Malmgren, K. Norén, L. Rogström, & K. Röjder Papmehl (Eds.), Euralex ‘96 proceedings (pp. 515–546). Göteborg: University, Department of Swedish.
Augustyn, P. (2013). No dictionaries in the classroom: Translation equivalents and vocabulary acquisition. International Journal of Lexicography, 26(3), 362–385. doi:10.1093/ijl/ect017.
De Schryver, G.-M. (2003). Lexicographers’ dreams in the electronic-dictionary age. International Journal of Lexicography, 16(2), 143–199. doi:10.1093/ijl/16.2.143.
De Schryver, G.-M. (2009). The lexicographic treatment of ideophones in Zulu. Lexikos, 19, 34–54.
Geeraerts, D. (1988). Cognitive grammar and the history of lexical semantics. In B. Rudzka-Ostyn (Ed.), Topics in cognitive linguistics (pp. 647–677). Amsterdam: John Benjamins.
Householder, F. W. (1995). Plato and his predecessors. In E. F. K. Koerner & R. E. Asher (Eds.), Concise history of the language sciences: From the Sumerians to the cognitivists (pp. 9–93). Oxford: Pergamon.
Kübler, N., & Pecman, M. (2012). The ARTES bilingual LSP dictionary: From collocation to higher-order phraseology. In S. Granger & M. Paquot (Eds.), Electronic lexicography (pp. 187–209). Oxford: Oxford University Press.
Langacker, R. W. (2002). Concept, image, and symbol: The semantic basis of grammar (2nd ed.). Berlin: Mouton de Gruyter.
Rundell, M. (2012). It works in practice but will it work in theory? The uneasy relationship between lexicography and matters theoretical. In R. V. Fjeld, & J. M. Torjusen (Eds.), Proceedings of the 15th EURALEX international congress (pp. 47–92). Department of Linguistics and Scandinavian Studies, University of Oslo.
Ščerba, L. (1995). Towards a general theory of lexicography (trans: Farina, D.). International Journal of Lexicography, 8(4), 305–349. (Originally published in Russian in 1940). doi:10.1093/ijl/8.4.304.
Temmermann, R. (2000). Towards new ways of terminology description: The sociocognitive approach. Amsterdam: John Benjamins.

Tomaszczyk, J. (1983). On bilingual dictionaries: The case for bilingual dictionaries for foreign language learners. In R. R. K. Hartmann (Ed.), Lexicography: Principles and practice (pp. 41–51). London: Academic.
Wiegand, H. E. (1999). Thinking about dictionaries: Current problems. In A. Immken & W. Wolski (Eds.), Semantics and lexicography: Selected studies (1976–1996). Tübingen: Max Niemeyer.
Zgusta, L. (1971). Manual of lexicography. Prague/The Hague: Academia/Mouton.

Dictionaries

[LSW] Fisiak, J., et al. (2011). Longman Słownik Współczesny Angielsko-Polski Polsko-Angielski. Harlow: Pearson Education.
[OZSD] De Schryver, G.-M., et al. (2010). Oxford Bilingual school dictionary: IsiZulu and English. Cape Town: Oxford University Press.

Page 11 of 11 International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_11-2 # Springer-Verlag Berlin Heidelberg 2015

Dictionaries and Their Users

Robert Lew* Department of Lexicography and Lexicology, Adam Mickiewicz University in Poznań, Poznań, Poland

Abstract

It is only recently that dictionary users have become a central consideration in the design of dictionaries, and this focus has both stimulated and benefited from research into dictionary use. The present contribution reviews the major issues in dictionary design from the user perspective, taking stock of the relevant findings from user research, insofar as such research can assist lexicographers in producing improved lexical tools.

Introduction

Until fairly recently, dictionary users were not usually of central concern in the process of dictionary making, however strange it may sound today. Instead, the emphasis was largely on dictionary content and often on how to pack a lot into manageable physical space. The impulse for change is generally identified to have been Barnhart (1962), with attention turning to dictionary users. Gradually, dictionary makers have begun to recognize that dictionary users do not necessarily understand all the conventions implicated in the presentation of lexicographic data. Some of these conventions are motivated by space-saving considerations; others are carried over as part of the lexicographic tradition. With time, it has become increasingly clear that most people possess limited skills when it comes to using dictionaries, and this fact needs to guide decisions on how dictionaries are constructed and how lexicographic data should be presented. Such thinking is in line with the view of dictionaries as tools primarily designed to assist human users in language-related tasks. Or, rather, this is a narrower or core view of dictionaries, which, to put things in a broader context, calls for three brief qualifying statements. First, in the days of natural language processing (or language engineering), it is no longer just humans that interact with dictionaries. There also exist dictionaries for machines, which function as lexical components of information systems of several types, such as translation engines, speech-to-text systems, search engines, and others. Second, there exist specialized dictionaries focusing on the terminology of a particular domain of knowledge, which may be at several levels of generality or detail. For example, in order of narrowing focus, there are dictionaries of science and technology, chemistry, organic chemistry, gas chromatography, and water-purification technology.
At each of these levels, dictionaries deal with (increasingly) specialized terminology, rather than with everyday language. In addition, there exist reference works with the word dictionary in the title that are not concerned with language as such, but may be compendia of knowledge in a particular domain, and in this noncentral sense dictionaries largely overlap with (thematic) encyclopedias. Third, dictionaries are not always tools or plain utility products. Dictionaries may also carry symbolic meaning, by making a political statement as identity symbols, giving tangible testimony to the status or identity of a language-speaking community.

In what follows, I shall mainly focus on dictionaries in the narrower sense.

Description

There are several ways one can go about accommodating the needs of users in designing and perfecting dictionaries. One approach is to try to predict what the user might need by taking time to think about what they need to do, what they can do, and what skills they possess. Another is to ask the opinions of potential users. As with most tools, however, the real test usually comes from actual instances of users attempting to interact with the product, or its prototype, and noting the relative success of this interaction, plus any possible causes of failure. Such dictionary user studies try to answer a number of questions, and they do so by resorting to a variety of methodological approaches. Below I discuss a selection.

Who Are Dictionary Users?

One type of question that user studies ask is about the identity and characteristics of dictionary users. If dictionaries are to be designed with specific users in mind, then we need a picture of who those dictionary users are or are likely to be. This issue has several aspects. For instance, a publisher offering an online dictionary may be interested in knowing the characteristics of their typical users, including their demographics, languages spoken natively and nonnatively, their proficiency in those languages, their educational level, etc., with the overall aim of serving their particular needs better. Conversely, specifying the (foreseen) characteristics of the target user at the planning stage helps in the design of dictionaries which do not yet exist, by equipping them with the lexicographic data that are likely to be expected and used, and presenting them in a way that takes account of the skill levels of the users, which includes language proficiency, metalinguistic awareness, and skills in navigating reference works. For example, the needs of a language teacher who uses a dictionary to mark written essays are far more sophisticated than those of a recreational reader of detective stories. The teacher will need, for instance, an exhaustive specification of verb complementation patterns as well as collocational choices, but these elements will normally be of little value in consulting a dictionary solely to look up the meaning of unknown words and expressions encountered in the text of a detective story. It is generally not a good idea to include in the dictionary more than the user is likely to need in whatever task they use the dictionary for, since irrelevant material makes it harder to locate the information that is of value.

What Do Users Use Dictionaries for and How Do We Envisage the Purpose of Dictionaries?

The example presented above highlights another central aspect of the relationship between the dictionary and its user. Dictionary use occurs in a particular context, and users reaching for dictionaries are typically immersed in some kind of activity. It may be helpful to distinguish in this regard a more general context, such as the setting of a language classroom, and a more specific type of activity, such as engaging in reading magazine articles, writing an essay, or completing written translation drills. Ideally, the offerings of a dictionary should be geared to the context and nature of such an activity. This was not a very realistic agenda in the days of print dictionaries, because excessively narrow specialization of a dictionary limits its opportunities for use and its target group, which, in turn, goes against the grain of the fundamental principles of commercial success. Since most dictionaries are commercial products, this consideration cannot be dismissed lightly. Things take a different turn, however, when a dictionary is viewed – as it now should be – as a digital product. In essence, such a dictionary can be made to be several different things in a single package, by equipping it with redundant data and selectively presenting only whatever data is relevant in the context of a particular situation of use. There are two broad paths that can be followed here: either the dictionary is controlled and adjusted by the user (customization), or it adapts itself to the needs of the user without their conscious intervention (De Schryver 2003, p. 189). The first mode requires conscious and skilled dictionary users, the second, a successful implementation of artificial intelligence based on a principled plan of what is needed under what circumstances. But what is needed under what circumstances?

Typical Contexts of Dictionary Use

Context of dictionary use is an under-researched area, not least because it is a challenging one due to the largely private nature of dictionary consultation. Investigating this aspect directly and systematically would call for near-continuous surveillance, which is at the same time ethically problematic, expensive, and technically difficult. If – and when – life-logging becomes more of a popular practice, it may offer an opportunity for investigating contexts of dictionary use directly. At present, researchers’ choices in this regard are largely limited to indirect reporting by users. A recent study employing this methodology on a large sample is Müller-Spitzer (2014). She finds that the most common activities for which dictionary use is reported are, somewhat predictably: text production (usually writing), text reception (while reading), and text translation. Translation is basically – and simplifying somewhat – a combination of reception and production, of which one or the other may be more challenging, depending on the direction of the translation relative to the user’s proficiency in the two languages involved and the difficulty as well as domain of the source text. Unlike translation, text reception and text production may be done in a monolingual context, when a text is being read or written in the dictionary user’s native language. If this is the case, a monolingual dictionary will be appropriate. Text reception or production in the user’s nonnative language, and any translation task, will bring into the equation an additional language (and at times more than one). In such contexts of use, bilingual dictionaries will become relevant, with the native language serving a function dependent on the nature of the activity.
For text reception, a dictionary with a large number of headwords arranged alphabetically will normally be most useful, since texts can potentially include any vocabulary items, including rare ones. The comprehensiveness requirement may be relaxed in the case of dictionaries for children and foreign language learners, insofar as these potential dictionary users are more likely to be exposed to texts written using a controlled vocabulary (such as textbooks, graded readers, or other learning materials). Lexicographic treatment in dictionaries for reception may be relatively shallow: enough to clarify the meaning. For native speakers of a language, a definition would be the most important element in the entry, perhaps accompanied by an illustration where a definition is unlikely to be sufficient (Hupka 1989; Ilson 1987), or, at times, the indication of special status, pragmatic constraints, or connotation. For reception in a foreign language, engaging the user’s native language will do much to facilitate comprehension by capitalizing on the user’s intuitive command of their native language (Varantola 2002). A native language equivalent is normally far easier to understand and process than a definition in the foreign language, however skillfully worded. This appears to be true even for very advanced learners of a language (Lew 2004). An equivalent is also much shorter and capable of conveying non-propositional features such as formality or register. In those cases where a satisfactory equivalent does not exist, a definition in either language remains an option.

For text production, generally a more modest coverage of vocabulary items will suffice, given that anyone’s productive vocabulary is smaller than their receptive vocabulary, for native and nonnative speakers alike. By contrast, the lexicographic treatment should be more detailed than for text reception, allowing the dictionary user to construct natural phrases and sentences with the headword. To that end, the user will typically need guidance on syntactic patterns into which the headword enters, as well as collocates, preferably with examples of use to serve as a model for production.
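The contrast between reception and production needs can be sketched as a simple filter over the components of an entry. The following Python fragment is only an illustration under stated assumptions: the `Entry` structure, the mapping `COMPONENTS_BY_TASK`, and the sample data are all hypothetical and are not taken from any existing dictionary or from the studies cited here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entry:
    """A hypothetical, much simplified dictionary entry."""
    headword: str
    definition: str
    equivalents: List[str] = field(default_factory=list)
    patterns: List[str] = field(default_factory=list)
    collocates: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)


# Assumption: reception mainly needs meaning, production mainly needs
# syntactic patterns, collocates, and model examples.
COMPONENTS_BY_TASK: Dict[str, Tuple[str, ...]] = {
    "reception": ("definition", "equivalents"),
    "production": ("patterns", "collocates", "examples"),
}


def view_for_task(entry: Entry, task: str) -> dict:
    """Return only the non-empty entry components relevant to the task."""
    view = {"headword": entry.headword}
    for name in COMPONENTS_BY_TASK[task]:
        value = getattr(entry, name)
        if value:  # skip components the entry does not have
            view[name] = value
    return view


# Invented sample data, for illustration only.
sample = Entry(
    headword="endeavour",
    definition="to try very hard to do something",
    patterns=["endeavour to do sth"],
    collocates=["always"],
    examples=["We always endeavour to reply within a week."],
)

print(view_for_task(sample, "reception"))
print(view_for_task(sample, "production"))
```

The same stored entry thus yields a short view for reading and a richer one for writing, which is the sense in which a digital dictionary can be several things in a single package.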

Before a word or phrase can be used in a sentence, however, it first needs to be identified in the dictionary as an appropriate lexical candidate for expressing the intended meaning. For this part, monolingual alphabetic dictionaries are ill-suited and thematically organized dictionaries or thesauri present particular access problems, as there is no single obvious way to classify concepts, ideas, or objects. Still, thesauri and dictionaries of synonyms remain the only option in monolingual contexts. For production in the nonnative language, a good bilingual dictionary going from the user’s native language to the target language normally ensures the most direct route to finding the target language expressions, with the user’s native language providing an effective entry index to the lexical system of another lesser-known language (Adamska-Sałaciak 2010; Lew and Adamska-Sałaciak 2015; Varantola 2002). Having identified the candidate lexical item, the user may need a follow-up lookup in a monolingual dictionary or in an L2-L1 dictionary to find out how the word should be combined with other words (Atkins and Varantola 1997; Müller-Spitzer 2014). This second step is not needed if the first dictionary consulted is a quality bilingual learner’s dictionary designed specifically for L2 production by speakers of L1. Such dictionaries already include a satisfactory coverage of collocational and colligational information as well as examples for L2 equivalents given at the L1 lemma, but such bilingual active dictionaries are expensive to make and are thus only available for a small minority of language pairs, of which the L2 is most usually English (Laufer and Levitzky-Aviad 2006; Lew and Adamska-Sałaciak 2015).
If there are no sufficiently up-to-date or complete bilingual dictionaries, users have to rely more on monolingual resources for the target language, usually combining a bilingual lookup to identify a candidate expression with a follow-up monolingual lookup to get the details needed for successful production. Native speakers of the less frequently spoken languages are at a particular disadvantage in this regard.

Speakers of One Language as Users of Bilingual Dictionaries

Bilingual dictionaries for production (active dictionaries) follow the recommendation of numerous experts (Adamska-Sałaciak 2010; Al-Kasimi 1984; Kromann et al. 1984; Svensén 1993), who have argued convincingly that bilingual dictionaries may and should be optimized for speakers of one language. Yet, until fairly recently, bilingual dictionaries meeting this condition had been vanishingly rare. Commercial considerations prevailed, with the prospects of being able to market and sell a particular bilingual dictionary to native speakers of two languages – rather than just one – being the decisive factor. In recent years, however, dictionaries designed for speakers of one of the two languages only (directional bilingual dictionaries) have become less of a rarity (Lew and Adamska-Sałaciak 2015).

Defining Format

In monolingual dictionaries, meaning explanation is by design restricted to the language of the headwords. This usually works reasonably well for native speakers of the language, but nonnative speaking users of the dictionary (such as language learners) may, and often do, experience problems understanding definitions couched in what, for them, is a foreign language. Mindful of this problem, English lexicographers had come up with the idea of vocabulary control (for an exhaustive discussion, see Cowie 1999). In the majority of leading British monolingual dictionaries for learners of English published today, definitions are written using a controlled list of vocabulary items, typically the 2,500–3,500 most common and useful – for users as well as lexicographers – words of English (though the number gets much higher once individual senses are counted, cf. De Schryver and Prinsloo 2011, p. 9). Such definitions are less likely to challenge language learners with unfamiliar items, which would need to be looked up again before the definition is understood. The price to pay for this, though, is quite heavy. Restrictions in the range of words allowed in definitions so imposed make it difficult, and often impossible, to convey finer subtleties of meaning, oftentimes resulting in similarly vague definitions for only partially synonymous items (Yamada 2013). The syntax may become more complex and the definitions get longer, as lexicographers struggle to paraphrase around words outside the allowed set (such as defining the noun lava without explicit reference to a volcano). Language learners are also exposed to collocations that are less than natural, as some of the more usual collocates lie outside the defining vocabulary. Despite these valid objections, there is some empirical evidence that definitions in English learners’ dictionaries are easier to understand by learners (Grochocka 2008).

Of course, not all definitions in learners’ dictionaries are made to the same recipe. A well-known innovation which has made quite an impact on other dictionaries was a systematic introduction by COBUILD1 of the full-sentence definition, which presents the defined lemma in a typical textual environment and then paraphrases the meaning of such an extended context, as in “If you endeavour to do sth, you try very hard to do it” (COBUILD1: 465). The rationale for this defining format is set out in Hanks (1987), and its limitations are considered in Rundell (2006). One might expect that the embedding of the headword in a typical environment would go some way toward helping learners to produce well-formed sentences, but in a recent systematic comparison (Chan 2013) of LDOCE5 and COBUILD6 entries, Hong Kong learners, who referred to the definition more than to any other entry component, rated COBUILD6 as less useful than LDOCE5 for sentence construction. Conversely, they found COBUILD6 relatively more useful than LDOCE5 for meaning determination (decoding). This is quite a surprising finding, given that COBUILD’s full-sentence defining format appears to be optimized for sentence production through a presentation of the headword in a typical environment. Clearly, more research is needed on the matter, but we should also keep in mind that, whenever possible, including the users’ native language will do much to ensure better comprehension of meaning explanation in a dictionary.

Another variation on the definition format which – although not an absolute innovation (Osselton 2007; Stein 2011) – enjoyed a period of popularity in several English learners’ dictionaries around the 1990s and 2000s is the single-clause when-definition. This format is used mainly for defining abstract nouns for which it is difficult to find a useful genus term, for example, when adultery is defined as “when you have sex with someone who is not your husband or wife.” Defining by this type of clause introduced with when (or, less commonly, if or another wh-word) avoids the problem of having to supply an unhelpful general genus noun such as act or practice. However, this genus noun also conveys the information that the item defined is a noun, which appears to be largely lost in a when-definition. As a result, learners of English exposed to such definitions are more likely to interpret the definition as defining a verb (such as, in this case, “have sex”) or an adjective (say, “immoral”), as shown in a series of user studies (Dziemianko and Lew 2006, 2013; Lew and Dziemianko 2006, 2012).
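The idea of vocabulary control discussed in this section can be illustrated with a small checker that flags the words of a draft definition lying outside a defining vocabulary. This is a minimal sketch under stated assumptions: the function name, the toy vocabulary, and the draft definition are invented for illustration, and the check is on surface forms only, whereas real defining vocabularies would also have to deal with inflected forms and with individual senses.

```python
import re
from typing import List, Set


def words_outside_vocabulary(definition: str, defining_vocabulary: Set[str]) -> List[str]:
    """List the distinct words of a definition that are not in the
    defining vocabulary, in order of first appearance.

    Assumption: words are compared as lowercase surface forms, with no
    lemmatization and no sense disambiguation.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", definition.lower())
    flagged: List[str] = []
    for token in tokens:
        if token not in defining_vocabulary and token not in flagged:
            flagged.append(token)
    return flagged


# Toy defining vocabulary (hypothetical; real lists run to a few thousand items).
toy_vocabulary = {"hot", "liquid", "rock", "that", "comes", "out", "of", "a", "mountain"}

# Invented draft definition for the noun "lava".
draft = "Hot liquid rock that comes out of a volcano."
print(words_outside_vocabulary(draft, toy_vocabulary))
```

A flagged word such as volcano leaves the lexicographer with the choice described above: admit the word as an exception, or paraphrase around it at the cost of a longer and possibly vaguer definition.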

Example Sentences

In the late 1980s a lively debate was initiated among lexicographers about which types of examples served dictionary users better: those invented by the lexicographer or those derived from a corpus. Proponents of corpus-based examples argued that usage cannot be invented and pointed out the occasional artificiality of invented examples. The more traditionally minded lexicographers countered that examples from corpora are problematic because they have been torn out of their original textual context. A compromise position was that examples can be taken from a corpus but somewhat modified to repair the context dependence. As the debate continued, opinion shifted more and more in favor of corpus-based examples. The corpus side of the argument was no doubt largely helped by the steady increase in the size of available corpora. With smaller corpora, it was indeed often difficult to find fitting examples, but this became less of an excuse as corpora got bigger. The most recent user studies (Frankenberg-Garcia 2012, 2014) demonstrate that presenting dictionary users with three examples per sense is significantly more helpful than just a single example. Also, text comprehension and text production require rather different types of examples: text comprehension is best served with context-rich examples that elucidate the meaning, while text production requires examples that illustrate the combinatory properties of the lemma. This interesting finding presents a challenge to digital lexicography of the future: how to select the most useful examples (for a given use) from an extensive text corpus?
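One naive way to approach this selection problem is to score candidate corpus sentences differently depending on the intended use. The sketch below is a toy heuristic, not a method taken from the studies cited: it assumes that, for production, a useful example contains known collocates of the lemma, and that, for reception, the number of distinct surrounding words is a crude proxy for context richness. The function names, the cap of 15 words, and the sample sentences are all invented, and the lemma is matched only in its exact form.

```python
import re
from typing import List, Set


def tokenize(sentence: str) -> List[str]:
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def score(sentence: str, lemma: str, collocates: Set[str], use: str) -> float:
    """Score a candidate example; -1.0 marks a sentence without the lemma."""
    tokens = tokenize(sentence)
    if lemma not in tokens:
        return -1.0
    if use == "production":
        # Assumption: more known collocates means better combinatory evidence.
        return float(sum(1 for c in collocates if c in tokens))
    # Reception: count distinct context words, capped to avoid favoring
    # excessively long sentences (the cap of 15 is arbitrary).
    return float(min(len(set(tokens) - {lemma}), 15))


def select_examples(sentences: List[str], lemma: str, collocates: Set[str],
                    use: str, k: int = 3) -> List[str]:
    """Return up to k highest-scoring sentences that contain the lemma."""
    scored = [(score(s, lemma, collocates, use), s) for s in sentences]
    usable = [pair for pair in scored if pair[0] >= 0]
    usable.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps corpus order on ties
    return [s for _, s in usable[:k]]


# Invented mini-corpus and collocate list for the lemma "decision".
corpus = [
    "She made a decision.",
    "After weeks of argument the committee finally reached a difficult decision about the budget.",
    "The weather was fine.",
    "They made the final decision.",
]
known_collocates = {"made", "reached", "final"}

print(select_examples(corpus, "decision", known_collocates, "production", k=1))
print(select_examples(corpus, "decision", known_collocates, "reception", k=1))
```

The default of three examples per selection echoes the finding reported above; everything else about the scoring is a placeholder for whatever criteria a real system would need to establish empirically.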

User Interfaces for Digital Dictionaries

The transition of dictionaries from paper to digital form raises questions of interface design for dictionaries as digital products. This is an area where the more traditional dictionary user studies find common ground with a more recent research tradition in the area of human-computer interaction: usability studies. Bank (2010, 2012) investigated the user experience of three online dictionaries with German and French (for a summary in English, see Heid 2011). One of the dictionaries subjected to the usability investigation was Base lexicale du français (BLF), a portal for learners of French designed to the specification of function-based lexicography (Verlinde 2010; Verlinde et al. 2010), where the user is first asked to specify the task that they intend to use the dictionary for. Disappointingly, Bank’s study revealed that users of BLF experienced severe problems with the interface, performing much worse than in the case of more traditionally designed dictionaries. Details of dictionary interface design were also addressed by Kaneta (2011), who tested two types of entry layout: traditional and layered (“folded”) with the help of eye tracking. A layered interface seemed to prevent users from viewing example sentences, but this did not lead to inferior performance. In fact, for the bilingual entries, the layered interface may be calculated from the data supplied by Kaneta to be three times as likely to end in success compared with the traditional presentation. This difference is not significant (p = 0.12, Fisher exact probability test), but the sample was very small, limiting the power of the test. Eye tracking was also used to assess user behavior with polysemous bilingual entries by Lew et al. (2013). The study found that sense-guiding elements do attract a lot of the users’ attention, just as lexicographers would have hoped, and so do any elements appearing in bold type.
Further, repeated exposure to the same equivalent can influence users’ decisions in choosing the equivalent to use, to the point that they tend to ignore a contextually correct but isolated equivalent despite having seen it. Users appear to be swayed here by a sort of “majority vote,” which leads the authors to recommend that bilingual dictionaries use target-equivalent structure (Jarošová 2000; Lew 2013) as more user-friendly (and economical, too). This study was done with dictionary page mock-ups modeled on the print version and presented on screen, but the findings have general relevance for the presentation of lexicographic data, also in digital media. A recent study (Koplenig and Müller-Spitzer 2014) revisits the options considered in Kaneta (2011) with respect to organizing the different types of lexical information, in this case: grammatical information, paraphrase (definition), typical contexts (collocation and colligation), and sense relations (synonyms and others). The layered interface is called the “explorer view,” the traditional interface the “print view,” and two more are tested: a “panel view,” where the four sections of the entry are laid out in four rectangular regions of the screen, and finally a “tabbed view.” These four layout options were presented to users for evaluation. The tabbed interface was ranked highest, followed by the panel view, explorer view, and print view, in this sequence. The study was done using large screens, but the tabbed and explorer options also seem practical on small-screen devices such as smartphones. Neither of these has been utilized much in digital dictionaries. The tabbed view is particularly promising, given its familiarity from modern Web browsers. Users surveyed in Koplenig and Müller-Spitzer (2014) valued this option particularly for clarity and ease of navigation.
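The remark above that a very small sample limits the power of the Fisher exact test can be illustrated with a short calculation. The following sketch implements the two-sided test for a 2 × 2 table of successes and failures under two interface conditions. The counts used are invented purely for illustration and are not the data from Kaneta (2011); the function name is likewise hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Fraction:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    With the row and column totals held fixed, the top-left cell follows
    a hypergeometric distribution. The p-value sums the probabilities of
    all tables that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k: int) -> Fraction:
        return Fraction(comb(row1, k) * comb(row2, col1 - k), total_tables)

    observed = prob(a)
    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    return sum(
        (prob(k) for k in range(lowest, highest + 1) if prob(k) <= observed),
        Fraction(0),
    )


# Invented counts: 3 successes out of 4 with one layout, 1 out of 4 with the other.
p_small = fisher_exact_two_sided(3, 1, 1, 3)
print(float(p_small))

# The same proportions with ten times as many (still invented) observations.
p_large = fisher_exact_two_sided(30, 10, 10, 30)
print(float(p_large))
```

With four observations per condition, a success rate of 75% against 25% gives p = 17/35, far from conventional significance, while the same proportions in a larger sample give a much smaller p-value. Exact fractions are used so that the comparison of table probabilities is not affected by floating-point rounding.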


Future Prospects

While it is fairly uncontroversial that people will continue to have lexical needs in natural communication as well as in more or less artificial learning contexts, it is much less certain that dictionaries will persist for much longer, at least in the form we know them today. Rather, it seems likely that dictionaries will increasingly become absorbed into more general digital tools designed to provide assistance with communication, expression, and information searching. Such tools are already becoming available in the form of writing assistants (Wanner et al. 2013).

Acknowledgment

This work was supported in part by the Polish National Science Centre (Narodowe Centrum Nauki), under grant DEC-2013/09/B/HS2/01125.

References

Adamska-Sałaciak, A. (2010). Why we need bilingual learners’ dictionaries. In I. J. Kernerman & P. Bogaards (Eds.), English learners’ dictionaries at the DSNA 2009 (pp. 121–137). Tel Aviv: K Dictionaries.
Al-Kasimi, A. M. (1984). The interlingual/translation dictionary. In R. R. K. Hartmann (Ed.), Lexicography: Principles and practice (pp. 153–162). London: Academic.
Atkins, B. T. S., & Varantola, K. (1997). Monitoring dictionary use. International Journal of Lexicography, 10(1), 1–45.
Bank, C. (2010). Die Usability von Online-Wörterbüchern und elektronischen Sprachportalen. (M.A.), Universität Hildesheim.
Bank, C. (2012). Die Usability von Online-Wörterbüchern und elektronischen Sprachportalen. Information – Wissenschaft & Praxis, 63(6), 345–360.
Barnhart, C. L. (1962). Problems in editing commercial monolingual dictionaries. In F. W. Householder & S. Saporta (Eds.), Problems in lexicography (pp. 161–181). Bloomington: Indiana University.
Chan, A. Y. W. (2013). Using LDOCE5 and COBUILD6 for meaning determination and sentence construction: What do learners prefer? International Journal of Lexicography, 27(1), 25–53.
Cowie, A. P. (1999). English dictionaries for foreign learners: A history. Oxford: Clarendon.
De Schryver, G.-M. (2003). Lexicographers’ dreams in the electronic-dictionary age. International Journal of Lexicography, 16(2), 143–199.
De Schryver, G.-M., & Prinsloo, D. J. (2011). Do dictionaries define on the level of their target users? A case study for three Dutch dictionaries. International Journal of Lexicography, 24(1), 5–28.
Dziemianko, A., & Lew, R. (2006). When you are explaining the meaning of a word: The effect of abstract noun definition format on syntactic class identification. In E. Corino, C. Marello, & C. Onesti (Eds.), Atti del XII Congresso di Lessicografia, Torino, 6–9 settembre 2006 (Vol. 2, pp. 857–863). Alessandria: Edizioni dell’Orso.
Dziemianko, A., & Lew, R. (2013). When-definitions revisited. International Journal of Lexicography, 26(2), 154–175.
Frankenberg-Garcia, A. (2012). Learners’ use of corpus examples. International Journal of Lexicography, 25(3), 273–296.


Frankenberg-Garcia, A. (2014). The use of corpus examples for language comprehension and production. ReCALL, 26(2), 128–146.
Grochocka, M. (2008). The usefulness of the definitions of abstract nouns in OALD7 and NODE. Poznań Studies in Contemporary Linguistics, 44(4), 469–501.
Hanks, P. (1987). Definitions and explanations. In J. Sinclair (Ed.), Looking up: An account of the COBUILD project in lexical computing (pp. 116–136). London/Glasgow: Collins.
Heid, U. (2011). Electronic dictionaries as tools: Towards an assessment of usability. In P. A. Fuertes-Olivera & H. Bergenholtz (Eds.), e-lexicography: The internet, digital initiatives and lexicography. London: Continuum.
Hupka, W. (1989). Wort und Bild. Die Illustrationen in Wörterbüchern und Enzyklopädien. Tübingen: Niemeyer.
Ilson, R. F. (1987). Illustrations in dictionaries. In A. P. Cowie (Ed.), The dictionary and the language learner (pp. 193–212). Tübingen: Niemeyer.
Jarošová, A. (2000). Problems of semantic subdivisions in bilingual dictionary entries. International Journal of Lexicography, 13(1), 12–28.
Kaneta, T. (2011). Folded or unfolded: Eye-tracking analysis of L2 learners’ reference behavior with different types of dictionary. In K. Akasu & S. Uchida (Eds.), ASIALEX2011 Proceedings. Lexicography: Theoretical and practical perspectives (pp. 219–224). Kyoto: Asian Association for Lexicography.
Koplenig, A., & Müller-Spitzer, C. (2014). Questions of design. In C. Müller-Spitzer (Ed.), Using online dictionaries (pp. 189–204). Berlin: Walter de Gruyter.
Kromann, H.-P., Riiber, T., & Rosbach, P. (1984). ‘Active’ and ‘passive’ bilingual dictionaries: The Ščerba concept reconsidered. In R. R. K. Hartmann (Ed.), LEXeter ‘83 proceedings. Papers from the International Conference on Lexicography at Exeter, 9–12 September, 1983 (pp. 207–215). Tübingen: Niemeyer.
Laufer, B., & Levitzky-Aviad, T. (2006). Examining the effectiveness of ‘bilingual dictionary plus’: A dictionary for production in a foreign language. International Journal of Lexicography, 19(2), 135–155.
Lew, R. (2004). Which dictionary for whom? Receptive use of bilingual, monolingual and semi-bilingual dictionaries by Polish learners of English. Poznań: Motivex.
Lew, R. (2013). Identifying, ordering and defining senses. In H. Jackson (Ed.), The Bloomsbury companion to lexicography (pp. 284–302). London: Bloomsbury Publishing.
Lew, R., & Adamska-Sałaciak, A. (2015). A case for bilingual learners’ dictionaries. ELT Journal, 69(1), 47–57.
Lew, R., & Dziemianko, A. (2006). Non-standard dictionary definitions: What they cannot tell native speakers of Polish. Cadernos de Tradução, 18, 275–294.
Lew, R., & Dziemianko, A. (2012). Single-clause when-definitions: Take three. In R. V. Fjeld & J. M. Torjusen (Eds.), Proceedings of the 15th EURALEX International Congress (pp. 997–1002). Oslo: Department of Linguistics and Scandinavian Studies, University of Oslo.
Lew, R., Grzelak, M., & Leszkowicz, M. (2013). How dictionary users choose senses in bilingual dictionary entries: An eye-tracking study. Lexikos, 23, 228–254.
Müller-Spitzer, C. (2014). Empirical data on contexts of dictionary use. In C. Müller-Spitzer (Ed.), Using online dictionaries (pp. 85–126). Berlin: Walter de Gruyter.
Osselton, N. E. (2007). Innovation and continuity in English learners’ dictionaries: The single-clause when-definition. International Journal of Lexicography, 20(4), 393–399.


Rundell, M. (2006). More than one way to skin a cat: Why full-sentence definitions have not been universally adopted. In E. Corino, C. Marello, & C. Onesti (Eds.), Atti del XII Congresso di Lessicografia, Torino, 6–9 settembre 2006 (Vol. 1, pp. 323–337). Alessandria: Edizioni dell’Orso.
Stein, G. (2011). The linking of lemma to gloss in Elyot’s Dictionary (1538). In O. Timofeeva & T. Säily (Eds.), Words in dictionaries and history. Essays in honour of R.W. McConchie (pp. 55–79). Amsterdam: John Benjamins.
Svensén, B. (1993). Practical lexicography. Principles and methods of dictionary-making. Oxford: Oxford University Press.
Varantola, K. (2002). Use and usability of dictionaries: Common sense and context sensibility? In M.-H. Corréard (Ed.), Lexicography and natural language processing. A festschrift in honour of B.T.S. Atkins (pp. 30–44). Grenoble: EURALEX.
Verlinde, S. (2010). The base lexicale du français: A multi-purpose lexicographic tool. In S. Granger & M. Paquot (Eds.), eLexicography in the 21st century: New challenges, new applications (pp. 335–342). Louvain-la-Neuve: Cahiers du CENTAL.
Verlinde, S., Leroyer, P., & Binon, J. (2010). Search and you will find. From stand-alone lexicographic tools to user driven task and problem-oriented multifunctional leximats. International Journal of Lexicography, 23(1), 1–17.
Wanner, L., Verlinde, S., & Ramos, M. A. (2013). Writing assistants and automatic lexical error correction: Word combinatorics. In I. Kosem, J. Kallas, P. Gantar, S. Krek, M. Langemets & M. Tuulik (Eds.), Electronic lexicography in the 21st century: Thinking outside the paper. Proceedings of the eLex 2013 conference, 17–19 October 2013, Tallinn, Estonia. Ljubljana/Tallinn: Trojina, Institute for Applied Slovene Studies/Eesti Keele Instituut.
Yamada, S. (2013). Monolingual learners’ dictionaries – Where now? In H. Jackson (Ed.), The Bloomsbury companion to lexicography (pp. 188–212). London: Bloomsbury Publishing.

Dictionaries

[BLF] Base lexicale du français. http://ilt.kuleuven.be/blf/.
[LDOCE5] Mayor, M. (2009). Longman Dictionary of Contemporary English (5th ed.). Harlow: Pearson Education.
[COBUILD1] Sinclair, J., & Hanks, P. (Eds.). (1987). Collins COBUILD English Language Dictionary. London: Collins.
[COBUILD6] Sinclair, J., et al. (Eds.). (2009). Collins COBUILD Advanced Dictionary (6th ed.). London: HarperCollins.

Page 9 of 9 International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_21-1 # Springer-Verlag Berlin Heidelberg 2014

Term banks

Thierry Fontenelle (a)* and Dieter Rummel (b)*
(a) Translation Centre for the Bodies of the European Union, Luxembourg, Luxembourg
(b) European Commission Directorate-General for Translation, Luxembourg, Luxembourg

Abstract

Terminology is traditionally defined as the study of terms and their use. It is frequently opposed to lexicography on the grounds that the latter deals with the study of words and their meanings, while terminology is concerned with the study and representation of concepts and conceptual systems which structure any specialized domain. Term banks, also known as terminological databases or term bases, are the products of terminological work and are used in a variety of applications, ranging from information retrieval, automatic summarization, and computer-assisted translation to document indexing. Most term banks are multilingual and are used to manage verified or approved terms and ensure consistency with respect to standardized usage in a given linguistic community or in a given organization. In this chapter, we describe two initiatives aimed at supporting the work of the translators of the EU, namely, the IATE multilingual terminology database, probably the largest term base in the world, and the ECHA-term database, a small database of terms occurring in the REACH Regulation on chemicals. The chapter also focuses on some of the challenges faced by terminologists and by translators with respect to the acquisition and representation of terms, their equivalence in multiple languages, their dissemination, and their exploitation during the translation process.

Introduction

Pearson (1998, p. 10) indicates that terminology may be used to describe methods of collecting, disseminating, and standardizing terms. She adds that “this type of work is carried out by bodies concerned with making recommendations for the standardization of existing terminology and by those concerned with the collection and documentation of terminology.” Terminologists describe the vocabulary of special subject fields, and the term banks they produce are compilations of the collections of terms associated with a given domain. While the primary object of a dictionary is the word – or, better, the lexical unit, since there are many multi-word lexical units – a terminology database primarily includes descriptions of concepts, i.e., abstract mental constructs which are distinct from the terms they correspond to in a given language; concepts are therefore language independent. Jacquemin and Bourigault (2003) point to a fundamental flaw in the traditional approach to terminology which assumes that terms, viewed as the linguistic labels of a concept, can be organized into networks of concepts to structure a given domain. Such an approach is well suited for normalization, they argue, but is not really suitable for computational term analysis because the abstract conceptual maps postulated by this classical approach do not exist (or at least cannot be built from introspection), and terminological data is in fact the output of the analysis of textual (corpus) data.



There has been a lot of research over the last decade and a half on the computational (automatic or semi-automatic) extraction of terms from corpora. Identifying candidate terms is indeed a sine qua non when embarking on the compilation of a terminological glossary. Statistical techniques applied to specialized corpora, which may be used as “raw data” or augmented with linguistic annotations ranging from tokenization and part-of-speech tags to syntactic or even semantic tagging, make it possible to produce lists of candidate terms which can subsequently be selected for inclusion in a term base. Given that the vast majority of terms are noun phrases, many researchers have resorted to symbolic approaches that rely on the syntactic analysis of corpora to identify the relevant syntactic patterns and exclude what is less likely to be a term. Frequency is obviously a key factor from a statistical perspective, but it is not without challenges: words and expressions that occur frequently in a corpus may well be trivial. Conversely, since genuine terms may occur infrequently in a corpus, rejecting low-frequency items may prove detrimental to the whole enterprise, even though very infrequent occurrences may also indicate irrelevance. The statistical analysis of a corpus will thus always require evaluation and validation by a linguist. Likewise, not all adjective-noun patterns are terms, so it is essential to reduce the inevitable noise generated by such techniques. The use of natural language processing (NLP) techniques is not limited to the extraction of candidate terms, however. NLP is useful for the macrostructure of a term base (which terms should be in and what should be left out), and the microstructure can also benefit from such techniques, for instance, by providing terminologists with possible definitions extracted from corpora.
Pearson (1998) analyzes the various mechanisms used in texts to signal the presence of a term, in an attempt to automate the extraction of terms together with their definitions. The linguistic signals she is interested in include patterns such as “is known as” or “is called,” which link the definiens and the definiendum. Other patterns may also be tapped to extract various types of lexical-semantic relations, such as synonymy (X, also known as Y), antonymy (X, as opposed to Y), hyperonymy (X, a type of Y), location, meronymy, or part-of relations (X is made up of Y), etc.
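The surface patterns just listed lend themselves to a simple illustration. The following is only a minimal sketch, assuming that a "term" can be crudely approximated as one to four word-like tokens; the pattern inventory, the `TERM` approximation, and the function name `extract_relations` are hypothetical and are not taken from Pearson (1998). A realistic system would work on part-of-speech-tagged or parsed text rather than raw strings.

```python
import re

# Crude stand-in for a term: one to four word-like tokens (an assumption).
TERM = r"[A-Za-z][\w-]*(?:\s[\w-]+){0,3}"

# Hypothetical pattern inventory modelled on the signals named in the text.
RELATION_PATTERNS = [
    ("synonymy", re.compile(rf"(?P<x>{TERM}), also known as (?P<y>{TERM})")),
    ("antonymy", re.compile(rf"(?P<x>{TERM}), as opposed to (?P<y>{TERM})")),
    ("hyperonymy", re.compile(rf"(?P<x>{TERM}), a type of (?P<y>{TERM})")),
    ("meronymy", re.compile(rf"(?P<x>{TERM}) is made up of (?P<y>{TERM})")),
    ("naming", re.compile(rf"(?P<x>{TERM}) (?:is|are) (?:known as|called) (?P<y>{TERM})")),
]


def extract_relations(sentence):
    """Return (relation, x, y) triples for every pattern match in the sentence."""
    found = []
    for name, pattern in RELATION_PATTERNS:
        for match in pattern.finditer(sentence):
            found.append((name, match.group("x"), match.group("y")))
    return found


if __name__ == "__main__":
    for s in [
        "Sodium chloride, also known as table salt, dissolves in water.",
        "Water is made up of hydrogen and oxygen.",
        "This process is known as polymerization.",
    ]:
        print(extract_relations(s))
```

Matches of this kind are only candidates: as with statistical extraction, they still need to be checked by a terminologist, because the token-based approximation over-captures (articles, determiners) and misses terms containing punctuation.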

Description

Information in a terminological record

The elements of a terminological entry have a lot in common with what can be found in a traditional dictionary. Definitions will be essential, for instance, as well as an example illustrating the context in which a term can be found. Part-of-speech information (noun, adjective, etc.) is also a key piece of information. Information about the domain will be crucial, especially in large-coverage terminology databases: it is indeed of paramount importance that the subject area be mentioned to allow the user to discriminate between the various readings of a term. Term bases include a lot more metadata than traditional dictionaries, precisely because of the normalization perspective adopted by many terminology projects. The use of consistent terminology is often a sine qua non in multilingual communication, which is why many term bases will explicitly indicate whether a given term is approved, reliable, verified, preferred, etc. The source of a term or of a definition is therefore an important element which needs to be recorded when compiling a terminological record. Even the “author” of a term entry may be important, as is the date of creation or the last date of modification of an entry. Language is a dynamic organism, which means that terms that are in common use one day may fall into oblivion and become deprecated the next day. Geopolitical factors may also play a part in the life of a term. Terms such as Mexican flu and one of its synonyms, swine flu, are cases in point: the IATE term base described below will indicate that these two synonyms are deprecated. The substantial number of cases of this influenza virus which were first observed in Mexico in 2009 does not justify the reference to the geographical origin of what later became a worldwide epidemic, which explains why the entry for this concept includes the following remark about the preference expressed by the European Centre for Disease Prevention and Control:

“ECDC prefers to use the term influenza A(H1N1)v (where v indicates variant), which has been chosen by WHO’s Global Influenza Surveillance Network and helps distinguish the virus from seasonal influenza A(H1N1) viruses and A(H1N1) swine influenza viruses. A name for the disease caused by the virus has yet to be determined by WHO but the term ‘swine flu’ is inaccurate for what is now a human influenza.”

Labels such as preferred, obsolete, deprecated, etc. will therefore be crucial in term bases to reflect the dynamic nature of terms and the normative facet of much terminological work. In the remainder of this chapter, we will describe in more detail two types of terminology databases developed by the EU for its translators. The specific linguistic situation of the EU, with 24 official languages, accounts for a number of terminology projects aimed at identifying, standardizing, and disseminating knowledge about specialized vocabularies across many subject areas.

IATE: interactive terminology for Europe

IATE, the term base of the language services of the EU, is a concept-oriented, large-scale database that covers all fields of activity of the EU. While it contains mainly terminology in the 24 official EU languages, it also provides content in non-official languages. As of July 1, 2013, the official languages were Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish, and Swedish. IATE has been used in the EU institutions and agencies since summer 2004 for the collection, dissemination, and shared management of EU-specific terminology. The database has been accessible to the general public since 2007 (at http://iate.europa.eu).

Background

The project for the creation of a single terminology database for the translation services of the EU was launched in 1999 by the Translation Centre for the Bodies of the EU. (The translation services of the following EU bodies participated in the project: European Commission, Council, Parliament, Court of Auditors, Economic and Social Committee, Committee of the Regions, Court of Justice, Translation Centre for the Bodies of the EU, European Investment Bank, European Central Bank.) Acting on the recommendations of an external feasibility study, the project aimed at improving a situation where the existence of parallel, independent database systems and approaches to terminological work in the EU’s language services made it difficult to standardize the usage of terminology across EU institutions and led to problems of terminological inconsistency, redundancy in the data, and duplication of work. The feasibility study recommended:

• The incorporation of all existing terminology databases into a single new interinstitutional database
• The interactive creation and maintenance of terminological data
• The integration of terminology tools, including tools designed to support terminology activities, into a translation and office automation environment
• The deployment of new ergonomic and user-friendly interfaces
• The creation of a cooperative infrastructure for data management and the rational recovery of existing data

Today the system offers the following features:

• One common database for all institutions and agencies containing all legacy data
• Basic and advanced search features (including stemming and base character conversion)
• Online access in read and write mode, i.e., the possibility for users to carry out modifications, to add entries directly to the central database, and hence to allow their colleagues to benefit from this work in real time
• A validation workflow that ensures that all newly added or modified terminology is reviewed
• Role-based user management
• Auditing features that keep track of all changes made to the terminology in the database
• Features for the export and import of data
• Statistics on the content of the database and user activity
• A basic messaging system as communication mechanism between the actors in the terminology workflow
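Of the features above, base character conversion is easy to illustrate in isolation. The sketch below is not IATE's implementation (which, as described later, relies on the database's built-in text indexing); it only shows the general idea, assuming that conversion means lower-casing plus removal of combining diacritics. The function names and the three sample terms are hypothetical.

```python
import unicodedata


def fold_to_base_characters(text):
    """Case-fold the text and strip combining marks (e.g. 'é' becomes 'e')."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(terms):
    """Map each folded form to the original terms that share it."""
    index = {}
    for term in terms:
        index.setdefault(fold_to_base_characters(term), []).append(term)
    return index


def search(index, query):
    """Look up a query after folding it the same way as the indexed terms."""
    return index.get(fold_to_base_characters(query), [])


# Illustrative sample terms, not taken from IATE.
index = build_index(["Comité des régions", "règlement", "Schengen acquis"])

if __name__ == "__main__":
    print(search(index, "comite des regions"))
    print(search(index, "REGLEMENT"))
```

Folding both the stored terms and the query lets a user who types without diacritics still find the accented form. Letters that Unicode does not decompose into a base letter plus a mark (for instance Polish "ł" or Danish "ø") are left untouched by this approach and would need an explicit mapping table.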

At the end of 2013, the IATE term base contained about 1.5 million concepts and 8.7 million individual terms. Language coverage is rather uneven, however: only about 9.5 % of IATE entries contain more than ten languages. This is mainly due to the history of the EU enlargements, namely, the addition of new official languages with the accession of new member states. Table 1 provides further details (as of 1 January 2014).

Given that translators and terminologists continuously update the IATE content, it is a “living” database. In 2013 almost 97,000 new terms were added, and 158,000 existing terms were modified. These changes were also reviewed and validated.

Data structure

Merging the terminology of the existing institutional databases into one single database was a major challenge of the project, and not only because of the tremendous amount of data that had to be processed. A bigger problem than the sheer number of entries was the content of the different databases and the ways in which they were structured: the different philosophies of terminology and the different historical backgrounds expressed in the stored data had to be reconciled. As a first step, this process involved defining mapping rules between the data structures of the existing databases and the new format of the interinstitutional database. This data structure takes into consideration the standards in the field that were evolving at the time (SALT/MARTIF, GENETER). IATE adopted a concept-oriented approach: the mono- and multilingual information on each aspect of a concept can be expressed on three interrelated levels of the data structure of the terminological entries:


Table 1 Number of terms per EU official language in IATE (as of 1 January 2014)

Language      Number of terms
English       1,402,006
French        1,339,496
German        1,034,088
Italian       701,735
Dutch         691,801
Spanish       617,124
Danish        603,481
Portuguese    533,623
Greek         523,086
Finnish       329,491
Swedish       314,220
Polish        59,576
Irish         57,879
Lithuanian    53,802
Hungarian     49,858
Estonian      41,489
Slovenian     41,337
Czech         38,043
Slovak        37,219
Maltese       35,732
Romanian      35,451
Bulgarian     34,420
Latvian       28,617
Croatian      8,863
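The unevenness of language coverage can be read directly off Table 1. The short calculation below is offered only as an illustration: it sums the per-language figures from the table and computes the share held by the eleven largest languages, which are the languages that were already official before the 2004 enlargement. The variable names are arbitrary.

```python
# Figures copied from Table 1 (terms per official language, 1 January 2014).
TERMS_PER_LANGUAGE = {
    "English": 1_402_006, "French": 1_339_496, "German": 1_034_088,
    "Italian": 701_735, "Dutch": 691_801, "Spanish": 617_124,
    "Danish": 603_481, "Portuguese": 533_623, "Greek": 523_086,
    "Finnish": 329_491, "Swedish": 314_220, "Polish": 59_576,
    "Irish": 57_879, "Lithuanian": 53_802, "Hungarian": 49_858,
    "Estonian": 41_489, "Slovenian": 41_337, "Czech": 38_043,
    "Slovak": 37_219, "Maltese": 35_732, "Romanian": 35_451,
    "Bulgarian": 34_420, "Latvian": 28_617, "Croatian": 8_863,
}

# Total over the 24 official languages.
total = sum(TERMS_PER_LANGUAGE.values())

# Rank languages by size and take the eleven largest.
ranked = sorted(TERMS_PER_LANGUAGE.items(), key=lambda kv: kv[1], reverse=True)
top11_share = sum(count for _, count in ranked[:11]) / total

if __name__ == "__main__":
    print(f"Terms in the 24 official languages: {total:,}")
    print(f"Share held by the eleven largest languages: {top11_share:.1%}")
```

The table total comes to roughly 8.6 million terms, slightly below the 8.7 million quoted for the whole database, which is consistent with IATE also holding content in non-official languages. The eleven largest languages account for about 94 % of the terms in the table.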

[Diagram: the three levels of an IATE entry. Language-independent level: Concept. Language level: Language 1, Language 2. Term level: Term 1 and Term 2 under Language 1; Term 1 under Language 2.]

1. The language-independent level can contain all information that relates to the entire concept. “Domain” is the classic example of that type of information. (Terminology in IATE is structured by domain, i.e., the field of knowledge in which the concept is used. Based on EuroVoc, a multidisciplinary thesaurus covering the activities of the EU, IATE offers 21 subject domains with two hierarchically linked sublevels. The biggest domain clusters are “education and communication” (260,000 concepts), “industry” (240,000), “transport” (160,000), and “law” (120,000).) The database also makes it possible to be more exhaustive: the user can add a domain note when the classification system for domains does not contain a suitable descriptor. Collection, problem language, cross-references to other entries, origin of the concept, and links to images complete the language-independent level.
2. Beneath this top level, information such as a definition, explanation, and comments can be stored in and for each of the languages the entry contains. This level is enriched by the possibility to add notes to several fields, references to source documents, and, again, multimedia files.
3. Each language level may refer to several terms (synonyms of the same concept or abbreviations). A large variety of information can be associated with each of the terms: term type, reference, regional usage, context, customers, links to homonyms, etc. Finally, the system includes the option to add linguistic information, like part of speech or , for each term or each of the words constituting a term.
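The three levels described above can be pictured as nested records. The following is a minimal sketch under stated assumptions, not IATE's actual schema: the class names, field names, the `status` values, and the `"health"` domain label are all hypothetical, and only a handful of the fields mentioned in the text are modelled. The example entry reuses the influenza terms discussed earlier in the chapter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """Term level: one designation of the concept in one language."""
    text: str
    term_type: str = "term"          # e.g. "term" or "abbreviation" (assumed values)
    status: Optional[str] = None     # e.g. "preferred" or "deprecated" (assumed values)
    reference: Optional[str] = None


@dataclass
class LanguageLevel:
    """Language level: information stored once per language."""
    definition: Optional[str] = None
    note: Optional[str] = None
    terms: List[Term] = field(default_factory=list)


@dataclass
class ConceptEntry:
    """Language-independent level: information about the whole concept."""
    domain: str
    domain_note: Optional[str] = None
    cross_references: List[str] = field(default_factory=list)
    languages: Dict[str, LanguageLevel] = field(default_factory=dict)

    def terms_in(self, lang, include_deprecated=False):
        """List the term strings for one language, hiding deprecated ones by default."""
        level = self.languages.get(lang)
        if level is None:
            return []
        return [t.text for t in level.terms
                if include_deprecated or t.status != "deprecated"]


# Illustrative entry; the domain label is an assumption.
entry = ConceptEntry(
    domain="health",
    languages={
        "en": LanguageLevel(terms=[
            Term("influenza A(H1N1)v", status="preferred"),
            Term("swine flu", status="deprecated"),
            Term("Mexican flu", status="deprecated"),
        ]),
    },
)

if __name__ == "__main__":
    print(entry.terms_in("en"))
    print(entry.terms_in("en", include_deprecated=True))
```

Keeping the domain on the concept and the status label on the individual term mirrors the concept-oriented design: synonyms share one definition and one domain classification, while each can carry its own usage status.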

Technical implementation

The technical backbone of the IATE term base is a relational database management system (Oracle) that uses text indexing and built-in features like stemming to support searches. All data is stored in UTF-8 format. Users interact with the database system via web-based user interfaces implemented in Java (see Fig. 1). IATE also provides web services that allow the database to be queried from another IT application, e.g., a web site.

Challenges and outlook

IATE has been used in the EU’s language services for almost 10 years now and is considered one of the most successful projects in the field of interinstitutional cooperation. It was (and is), however, not without its challenges. IATE is probably the largest multilingual terminology database in the world, with a wealth of useful and reliable information. However, there are also a number of known content-related issues. Users searching the database may be confronted with results that they will perceive as “noise”: duplicate entries, outdated information, or terminology that is not well-documented. None of these problems originate with IATE. The merger of the legacy databases in 2003–2004 merely made existing problems more visible. At the same time, the fact that several EU translation services now use a common tool to create and manage terminology has reduced the number of duplicates.

Fig. 1 IATE: technical architecture


Common working methods in quality assurance have also reduced the number of entries of poor quality. The launch of the IATE database in mid-2004 had another important side effect: it fostered the cooperation between translators and terminologists from different institutions and increased the potential for synergies. The basic communication features offered by the term base today – the possibility to comment on existing entries or terms – are seen as no longer sufficient. A term base that caters for the needs of a big and geographically distributed community (which currently has about 5,000 registered internal users) has to provide forums and other tools that allow for efficient consultation and collaboration. These developments started in 2014. The terminology in IATE is organized according to the paradigm of interactive, human querying of the system, i.e., the assumption that a term base is usually searched by a translator or terminologist who will “filter” the results and pick the most relevant ones. This is a somewhat limited perspective. One of the main problems of terminology databases is that users, and that means in our context in the first place translators, have to actively search for information. Time constraints and the lack of awareness that a specific word is actually a term requiring a specific translation in a certain context may result in missing out on important information. Ideally a database should offer the possibility to “push” information to the user, i.e., to inform the users that relevant information is available. While technological developments for the integration of IATE with term recognition software are ongoing and promising, the sheer mass of data in IATE remains a challenge. Independently of the technological implementation, it will be necessary to establish certain filter criteria to ensure that term recognition results are pertinent. 
Two issues arise here: (a) the amount of data in IATE for some languages produces too many term recognition results (i.e., “noise”); (b) term recognition should have an added value over other sources of information (e.g., translation memory matches), i.e., it should provide relevant, high-quality terms. The following example shows the result of an automatic comparison of an EU document with the content of IATE for the English to French language combination. Underlined expressions indicate terms that have been found in IATE (Fig. 2). Besides the useful matches for potentially problematic terms (e.g., “Colleges of Supervisors”), the example also shows a number of terms that most translators working on a regular basis with EU documents would find trivial. Finding a mechanism to efficiently identify relevant terms in IATE for this type of application will be one of the major challenges in the future. It is important to note that these mechanisms may and will depend on the specific needs of language communities, document types, and even the individual translator (e.g., junior translators vs. experienced senior colleagues). Finally, the public IATE site recently made available a subset of IATE terminology for download in TBX format. It is also planned to publish the technical description of the web services that permit querying the database from another application. This is in line with Directive 2003/98/EC of the European Parliament and of the Council of 17 November 2003 on the re-use of public sector information (cf. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32003L0098:EN:HTML).
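The role of filter criteria in term recognition can be sketched in a few lines. What follows is a toy illustration, not the mechanism used for IATE: the `TermRecord` class, the numeric reliability scale, the stoplist of "trivial" terms, and the French equivalents are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TermRecord:
    source: str       # term in the source language
    target: str       # equivalent in the target language (illustrative values)
    reliability: int  # hypothetical scale: higher means more reliable


def recognize_terms(text, records, min_reliability=3, stoplist=frozenset()):
    """Return (offset, matched text, target) for records that pass the filters."""
    hits = []
    for rec in records:
        # Filter 1: drop low-reliability entries. Filter 2: drop terms the
        # translator community has marked as trivial (the stoplist).
        if rec.reliability < min_reliability or rec.source.lower() in stoplist:
            continue
        pattern = re.compile(r"\b" + re.escape(rec.source) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(0), rec.target))
    return sorted(hits)


# Illustrative records only; not real IATE data.
records = [
    TermRecord("Colleges of Supervisors", "collèges des autorités de surveillance", 4),
    TermRecord("Commission", "Commission", 4),
    TermRecord("report", "rapport", 1),
]
text = "The Colleges of Supervisors shall report to the Commission."
hits = recognize_terms(text, records, stoplist={"commission"})

if __name__ == "__main__":
    for offset, found, target in hits:
        print(offset, found, "->", target)
```

With the filters switched off, all three records would match and the translator would see the trivial hits as noise; with them on, only the potentially problematic term is pushed to the user. In practice the stoplist and the reliability threshold would have to vary by language community, document type, and translator profile, as noted above.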

Fig. 2 Term recognition example


The ECHA-term term base: multilingual REACH and CLP terminology

The decentralized agencies of the EU cover nearly every aspect of life: intellectual property rights, xenophobia, medicine, occupational health and safety, environment issues, banking, IT security, etc. In 1994 Council Regulation (EC) No 2965/94 set up the Translation Centre for the Bodies of the EU and tasked it with providing translation services to the decentralized agencies. One of the Centre’s highly specialized clients is the European Chemicals Agency (ECHA), which is based in Helsinki, Finland, and was founded in 2007. The ECHA is the driving force in implementing the EU’s groundbreaking chemical legislation for the benefit of human health and the environment. Through the Regulation on Registration, Evaluation, Authorisation and Restriction of Chemicals (i.e., the REACH Regulation), companies are responsible for providing information on the hazards, the risks, and the safe use of chemical substances that they manufacture or import. The Classification, Labelling and Packaging (CLP) Regulation introduced a globally harmonized system for classifying and labelling chemicals in the EU. The ECHA considered coherent and clear terminology one of the key factors in this process and launched the ECHA-term project together with the Translation Centre. The purpose of the project was to provide REACH users with a reliable, coherent, and up-to-date source of terminology in the chemical field to harmonize the use of terminology in the REACH context, to enhance clear communication, and ultimately to reduce costs for the stakeholders. The project was launched in 2009. It included both the compilation of terminological content and the development of an IT platform for its dissemination. The ECHA-term platform was launched in April 2011. Today, it contains over 1,000 CLP and REACH-related concepts, phrases, and definitions in 23 EU languages, nine multilingual pictograms with images, and 53 substances of very high concern with EC and Chemical Abstract Service (CAS) numbers. Roughly 100–150 new terms are added annually.

Content creation

Early in the project ECHA decided to focus on the terminology that is covered by its mandate – REACH and CLP – and for which it can provide authoritative and reliable information. The idea of an all-inclusive and potentially large-scale term base on general chemical terminology was discussed at the time. It was, however, rejected as too costly and simply not in line with the needs of the stakeholders. This also had implications for the functionalities that the IT system would provide: the idea of an ECHA “wiki,” which had been discussed in an early phase and which would allow users to add content, was abandoned in favor of a more structured, controlled approach.

Typically terminology work for ECHA-term begins with the definition of a relevant corpus in the source language (usually English). Using semi-automatic term extraction tools, the Centre creates an initial list of concepts in the source language and completes them with definitions, references, contexts, notes, etc. This monolingual glossary is then validated by two or three translators to ensure the pertinence of the included material: Does the list contain concepts that are “trivial” and should not be included? Are important – i.e., problematic – concepts missing? A native speaker will then revise the (English) term list to ensure that it does not contain formal errors (such as typos). The consolidated monolingual glossary is then submitted to the ECHA for validation by their domain experts. The subsequent multilingual phase of the project, when target equivalents and relevant information are completed by the Centre’s terminologists, is thus based on a solid foundation. Wherever possible the ECHA reviews and validates the multilingual glossary. In practice, however, it can be difficult to cover all languages. The final multilingual glossary is then imported into the ECHA-term database.

Users of the database have the possibility to comment on the existing content or to propose new concepts that should be added. The Centre and the ECHA analyze this feedback and update the database where appropriate. The database also keeps track of unsuccessful ECHA-term searches, i.e., it allows the production of statistics on frequent queries that did not produce a result. This feature provides additional input for the ongoing ECHA terminology work.
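The logging of unsuccessful searches described above can be illustrated with a minimal sketch. It is not the ECHA-term implementation, whose internals are not described here: the in-memory term set, the function names, and the sample entries are all hypothetical and serve only to show how failed queries could be counted and ranked.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the term base; entries are illustrative only.
TERM_BASE = {"registrant", "downstream user", "safety data sheet"}

# Counts every normalised query that produced no result.
failed_queries = Counter()


def normalise(query: str) -> str:
    """Lower-case and collapse whitespace so that trivial variants count as one query."""
    return " ".join(query.lower().split())


def search(query: str) -> bool:
    """Return True on a hit; otherwise record the miss and return False."""
    key = normalise(query)
    if key in TERM_BASE:
        return True
    failed_queries[key] += 1
    return False


def top_misses(n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent queries that did not produce a result."""
    return failed_queries.most_common(n)
```

A ranked list of this kind gives terminologists a simple indication of which concepts users expect to find and do not.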

Main features of the ECHA-term IT platform

The ECHA-term database is accessible at http://echa.cdt.europa.eu (https://echa-term.echa.europa.eu as of early 2015). The home page offers different search options, as well as an alphabetic index of the database. It includes news items and a “word cloud” of the most frequently searched terms. A menu bar at the top of the screen gives access to other functionalities (export, user preferences, help, consultation and download of documentation, etc.) (Fig. 3). The basic functionality of the IT system is to look up terminology. ECHA-term supports both monolingual and multilingual queries. Search results are presented in a simple hit list that typically contains the matching terms in the source language and their translation, as well as an indicator of any additional information that is available (reliability of the term reference, definition of the concept, etc.). Users have the possibility to visualize the concept by clicking on any entry in the hit list. The “detail view” also allows for the submission of entry-specific feedback. The alphabetical index of ECHA-term is intended for users who want to browse the full list of terms contained in the database. Users can display the full list of terms or filter by term type (displaying only a specific category of terms: acronyms, phrases, substances, descriptors, etc.). Users with the proper access rights can modify or delete existing information and add new concepts or languages online. The database keeps an audit trail of all modifications made to the glossary. Users have the possibility to compare the current content of, e.g., a definition with the previous content.

Fig. 3 ECHA-term home page

The initial approach defined for the validation of terminology in ECHA-term was based on the Centre’s experience with the IATE database. One of the founding principles of IATE was that EU linguists should be able to use this database to record their terminology interactively. The validation of these changes is an integral part of the IATE philosophy: every modification launches a validation workflow, i.e., the review of the change by another colleague. The approach chosen for ECHA-term is slightly different: interactive modifications are the exception rather than the rule at present. New terminology is added according to a well-defined and structured process that includes several validation steps; write access is limited to a very small number of users. Overall the potential user population of editors and validators is very restricted. In this setup, it did not make sense to implement a sophisticated validation workflow. However, the ECHA foresaw the need to revise parts of the glossary in the future, e.g., by a small dedicated group of reviewers. The idea of a systematic validation mechanism was dropped in favor of a feature that supports the review of specific entries. In this scenario administrators could define a subset of entries that should be reviewed; colleagues carrying out the review see the entries or terms concerned in a specific screen (Fig. 4): The progress of the review process is indicated with check marks. ECHA-term users have the possibility to download subsets of the database or the entire glossary in either TBX or spreadsheet format. Export filters allow limiting the download to specific sub-domains or term types (terms, acronyms, etc.). Terminology can be exported in a monolingual format (i.e., terms and definitions) as well as in a bi- or multilingual format.

ECHA-term and its users

In June 2012, about 14 months after the launch of the site, the ECHA carried out an online user survey to find out what ECHA-term users thought about the glossary and the IT system that was designed to disseminate it. 153 ECHA-term users took the time to reply to the survey. 56 % of the respondents indicated that they work in the chemical industry; 22 % were translators. Other users work for EU Member States, international organizations, NGOs, or social partners.

The main findings of the survey were:

– Overall, the tool serves the users’ needs as initially designed. The vast majority of users visit the site to look up translations for terms (77 %); the multilingual database is one of the key features for ECHA-term users.
– 67 % of the users agreed with the statement that ECHA-term makes their work more efficient.
– 62 % of the users stated that the database helps them understand the REACH Regulation, and 63 % indicated the same about the CLP Regulation.
– 83 % of the users replied that the terminology available on ECHA-term is relevant to their work.
– The users appreciated the possibility to download the contents of the database.

Fig. 4 Review screen

These results show that a relatively small but well-maintained glossary can indeed make a difference and support users in the implementation of a rather complex EU regulation. From a technical and functional point of view, it is interesting to note that several users from the industry download ECHA-term data to integrate it in their local IT systems (e.g., authoring tools). Many respondents stated that they would like to see an ECHA-term web service or a way of integrating ECHA-term in their standard document editing environment. Finding ways to push relevant terminology to end users – e.g., by linking terms in ECHA documents or web pages to definitions in the database – would be another appreciated feature. Seamless integration, interoperability, and also the controlled growth of the database content are promising ways to ensure the relevance and usefulness of the ECHA-term database in the future.

Challenges for the future

The introduction to this chapter alluded to some of the challenges faced by terminologists who embark on the compilation of term bases. The identification of candidate terms is a prerequisite to draw the list of terms that should be included in any collection. This process is still traditionally a monolingual task, however, and even if some researchers have attempted to move to bilingual term acquisition, acquiring terms in separate monolingual corpora and aligning the corpora at sentence, phrase, or word level, the extraction process is only the very first phase of any project. The representation of the concept entails the drafting of definitions and the addition of usage notes, contextual information, lexical-semantic relations, subject area or domain labels, etc. The terminologist then needs to identify and record equivalence relations in other languages (i.e., the 24 official languages of the EU), with a crucial validation phase in collaboration with subject field specialists. The issue of the level at which terminology should be managed is crucial. Should it be centralized or should it rather be done at the local level, down to the level of individual translators in big translation services? How then should the data be made available to its users? Clearly, web technologies have made it possible to disseminate terminological knowledge to millions of users (the publicly available version of IATE received 44 million queries in 2013). However, one of the major stumbling blocks in the dissemination process is that it is still up to the individual translator or user to “suspect” that a term base such as IATE or ECHA-term is able to provide interesting and useful information about a given term. What is therefore needed is a mechanism which can alert a translator that a word or a sequence of words appearing in the source text he or she is dealing with corresponds to a term entry in a specialized database for which an equivalent exists in the target language. 
Such tools exist at the local level but will need to be linked to huge databases like IATE, without forcing the translator to host a local copy of the nine-million-term database, which is not recommended for obvious performance reasons. A number of initiatives are currently under way to deal with this crucial issue. Another burning issue relates to the use of term checkers, which ensure that only recommended (read “validated”) terminology is used and that “dispreferred,” obsolete, or deprecated terms are not used by the translator. Once again, overcoming such obstacles requires some level of linguistic processing to match inflected forms in a text with the canonical forms recorded in the quality assurance mechanisms. Organizational challenges are also at stake here, since it is crucial to determine who is doing what. Should translators themselves take care of the terminological work? Where should they capture the preferences expressed by the “clients”? Should it be done centrally or locally? How can we make sure these preferences are not one-off information but can be recycled in future translations to avoid repeating the same mistakes? These questions do not seem to have clear-cut answers: what is clear, however, is that the solutions can only be effective if they combine technological innovation, using the appropriate amount of linguistic processing, together with organizational changes to make the best use of what modern technology can offer to language workers.
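To make the notion of term recognition and term checking more concrete, the following minimal sketch scans a text for sequences of words that correspond to entries in a small term list and reports whether each match is a preferred or a deprecated form. Everything in it is an assumption made for illustration: the entries, the function names, and above all the crude plural-stripping step, which merely stands in for the linguistic processing (lemmatization) that a real tool would need. It does not reproduce IATE, ECHA-term, or any existing term checker.

```python
import re

# Hypothetical entries: preferred (canonical) term -> deprecated variants.
TERM_ENTRIES = {
    "safety data sheet": ["material safety data sheet"],
    "substance": [],
}


def stem(word: str) -> str:
    """Crude normalisation: lower-case and strip a plural 's' (stand-in for a lemmatiser)."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def key(phrase: str) -> tuple:
    """Turn a phrase into a tuple of normalised word forms."""
    return tuple(stem(w) for w in re.findall(r"[A-Za-z]+", phrase))


def build_index(entries: dict) -> dict:
    """Map every normalised form to (preferred term, status)."""
    index = {}
    for preferred, deprecated in entries.items():
        index[key(preferred)] = (preferred, "preferred")
        for variant in deprecated:
            index[key(variant)] = (preferred, "deprecated")
    return index


def check(text: str, entries: dict = TERM_ENTRIES) -> list:
    """Return (matched form, preferred term, status) for each term found, longest match first."""
    index = build_index(entries)
    tokens = key(text)
    longest = max(len(k) for k in index)
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            match = index.get(tokens[i:i + n])
            if match:
                hits.append((" ".join(tokens[i:i + n]), *match))
                i += n
                break
        else:
            i += 1  # no term starts at this position
    return hits
```

Applied to the sentence “Material safety data sheets list hazardous substances.”, the sketch reports the first four words as a deprecated variant of “safety data sheet” and “substances” as an inflected form of the preferred term “substance”. The lookup itself is trivial; the difficulty discussed above lies in doing this reliably across many languages and against a database of several million terms.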

References

Jacquemin, C., & Bourigault, D. (2003). Term extraction and automatic indexing. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 599–615). Oxford: Oxford University Press.
Pearson, J. (1998). Terms in context (Studies in corpus linguistics). Amsterdam: John Benjamins.

Further Reading

Cabré, M.-T. (1998). La terminologie. Théorie, méthode et applications. Ottawa/Paris: Les Presses de l’université d’Ottawa/Armand Colin.
Reichling, A. (1998). Gestion centrale de la terminologie, EURODICAUTOM et ses outils satellites. In Terminologie et Traduction 1.1998 (pp. 172–201). Commission Européenne, Luxembourg.
Rey, A. (1992). La terminologie – Noms et notions (Collection “Que sais-je ?” 2ème éd.). Paris: Presses Universitaires de France.
Sager, J. (1990). A practical course in terminology processing. Amsterdam: John Benjamins.
Wright, S. E., & Budin, G. (2001). Handbook of terminology management (volume 2): Application-oriented terminology management. Amsterdam: John Benjamins.

The lexicon of the male sex worker

Welby Ings

Contents

Introduction
Description
Lexicon and Identity
Code
Detachment
The Protean Nature of Code
Words Relating to Gardening
Ecclesiastical Metaphors
Metaphors Related to Food
Metaphors Relating to Legitimized Work
L337 (Leet) and the Impact of Online Environments
Issues Relating to Lexicography
Future Prospects
References
Dictionaries

Abstract

This chapter considers metaphoric clustering in a very specific type of lexis known as “code.” This is the underground language of the New Zealand male sex worker. In this case study, code’s historical and contemporary words and their metaphoric frameworks are discussed in the context of a protean dynamic of change as the language form seeks to maintain restricted understanding. Some of the words considered are specific to New Zealand, but many form part of a wider, international lexicon. Thus, certain historical material has been noted in listings compiled in Australasia, South Africa, the UK, and the USA. However, a significant number of terms have been recovered through a series of oral history interviews with current and retired New Zealand sex workers. The chapter also considers the implications of compiling such material in relation to wider lexicography. Specifically, it reviews the impact of user participation in online dictionaries and its influence on the dating and contextualizing of material. More importantly, it discusses tensions inherent in nominating lexical material like code for cultural inclusion when a social group’s safety relies upon restricted access.

W. Ings, Auckland University of Technology, Auckland, New Zealand

© Springer-Verlag Berlin Heidelberg 2015
P. Hanks, G-M. de Schryver (eds.), International Handbook of Modern Lexis and Lexicography, DOI 10.1007/978-3-642-45369-4_31-2

Introduction

1954 There’s a bit of chicken on the battle down town...rough , but he takes it up the tan track. He and the hock that works with him ginger the geriatricks. The number of Aunties they’ve rolled is traaaagic. The demons fell on the place last week and cleaned them out.

2014 There’s this streetie and its BF cracking it down at the DT terminal. It’s a but its M8 skims the Narnias and >50s. Bth VGL bt toxic. They got a turkey slapping from the cops last week so it’s all vanilla down at the Metro.

These monologues tell the same story 60 years apart. Even at the time the words were used, only a very select community could understand what was being said. They tell of an incident where the police confronted a male sex worker and an accomplice after they had been reported for soliciting and robbing older clients. The words surface from a body of research begun in 2002 that involved interviewing and recording the lexicon and oral histories of 30 New Zealand male sex workers and their clients in an attempt to piece together what was an almost undocumented argot. For the purposes of the research, a male sex worker was defined as “a biological male who receives payment, in money, in exchange for sexual favors and for whom sex work constituted their major form of income for a minimum period of six months” (Browne and Minichiello 1996, p. 88). In the study, words and word use surfacing from interviews and oral histories were compared to archived texts, including court records, newspaper articles, prostitute collective handbooks, and existing slang lexicons. (For a historical analysis of the language form, see Ings 2010 and 2012.)

Description

Lexicon and Identity

Lexicons are often protean phenomena, and those belonging to groups who must maintain discretion and separateness as a means of survival are often distinctively so. This is because, as Smorag (2008) notes, words belonging to these communities often operate as a form of code that allows the group to function while protecting information (and by extension people) within the subculture. The New Zealand male sex worker’s lexicon is used to construct a particular cultural reality. Words within it enable workers to assert identity and compensate for linguistic deficiency. Accordingly, by examining such lexicons, we can gain insight into more than the meaning of words. We can trace distinctive cultural information, including historical changes in attitude, the culture of humor, shifting work locations, and interfaces with other marginalized groups.

Code

Although in London the Piccadilly rent boys who sold sex to men described their lexicon as nelly or nonce words (Baker 2002a, p. 36), in New Zealand, the most common term for words and phrases used by male prostitutes and their clients is code. Code is distinguished by its distinctive wit, intertextual references, borrowings from other argots, and unique metaphoric constructions. It uses the same grammar and syntax as standard English but features very marked levels of gender neutering, denigration, and defiance. Across a hundred and fifty years in New Zealand, the lexicon has operated to communicate, protect, and facilitate unique forms of subcultural bonding. Historically, male has remained relatively invisible in New Zealand’s cultural landscape. Male street working was only criminalized briefly in 1981 when the Summary Offences Act altered the charge of soliciting to include men. Between 1981 and the passing of the 2003 New Zealand Prostitution Reform Act, a charge of soliciting could be brought against a male sex worker. This situation changed when the new Act legalized street and private prostitution of both men and women. The underground language of these men has historically drawn upon a range of language forms that include back slang, boobslang (prison slang), rhyming slang, cant, and Polari. In the 1980s code was reshaped by its adoption of the language of legitimized work, and in the 1990s it changed again as workers began to use text messaging and online advertising.

Detachment

Within code one encounters a very distinctive form of detachment. The words he and she are rarely used. Instead, the neuter third person singular (it) is employed to describe other men (both clients and co-workers). This quality of the language frames people as objects. Historically, the propensity for calling the men one has sex with it or that thing, or referring to them as a trick, number, or meat, was complemented by terms a worker might use for himself including goods, merchandise, or rent (Pearson 2007). This dehumanizing of the self and others may be argued as being part of a dynamic of oppression where the marginalized self represses through the dehumanization of his peers. Cage (2003), in discussing such slang in South Africa, talks about this scornfulness where “people are not viewed as individuals ...but are relegated to the level of sexually consumable commodities, without hopes, feelings and needs. They merely become featureless units in a noxious swarm of past, present and potential sex partners” (pp. 31–32). However, prostitution is predicated on detachment. Sex functions as an agreement between a worker and a client. A session (contracted engagement) is purchased knowing that the encounter will remain discreet and emotionally unencumbered. Both parties tacitly agree that there will be no emotional cost or commitment beyond the allocated time.

The Protean Nature of Code

Because the purpose of code is to enable discreet communication within the sex worker community, the lexicon has changed responsively over time. This has been necessitated by words occasionally becoming familiar in the overground. Terms from criminal lexicons sometimes move into popular slang via fiction writing and journalism (see Fig. 1). This occurs when writers pursue higher levels of authenticity by ascribing language from the underground to their characters. However, if code words become too well known, underground communities are forced to change them. Indicative of such changes is the demise of a word like naff. Until the 1960s this was a code term that most men interviewed in the research described as an acronym for Not Available For Fucking, although Partridge (1989, p. 296) records the use of the variant gnaff in prostitutes’ slang (ca. 1940) meaning “nothing.” The word meant uncouth, dirty, or tasteless, but it could also describe an unsafe client. Because being able to warn other workers of dangerous situations is part of the fabric of street work, it became necessary to generate other terms. Currently the most common word for a dangerous client is UMG (when texted) or ugly mug (when spoken) (Bennachie 2009).

Fig. 1 Christmas shopping poster distributed through the Westfield Malls in New Zealand (December 2010). Image property of the author. The word naff is now broadly recognized in the overground, although its historical associations are rarely understood. Baker (2002b) attributes this crossover to the popular British Radio comedy “Round the Horne” (1965–1968) where the word often appeared in the banter of the characters Julian and Sandy, played, respectively, by Hugh Paddick and Kenneth Williams.

Words Relating to Gardening

One of the most distinctive metaphor clusters in code relates to gardening. In Victorian England the metaphor had an association with both prostitution and names for male and female genitalia (Partridge 1961, pp. 362–363). In the early seventeenth century, Covent Garden was an area of London frequented by whores, and the location gave rise to terms like garden goddess (harlot) and garden house (brothel). By the nineteenth century, garden had also come to refer to genitals, thus garden hedge (female pubic hair) and gardener (the penis). In the British colonies, an active homosexual might be referred to as an uphill gardener (Dalzell and Victor 2008a, p. 192), and, by the middle of the twentieth century in New Zealand, he might frequent one of a number of beats (“a route taken by a prostitute or policeman” (Partridge 2006, p. 116)) with coded names clustered around the metaphor of gardening (see Fig. 2). The names of these beats reveal something of their nature. While Bluebell Dell made reference to a park-like setting, the Black Forest referred to the high proportion of Polynesian men frequenting the beat (Wedding 2004). Glowworm Grotto referred to a particular park being a nighttime location where cigarettes glowed and bobbed in the dark (Wedding 2004). The Vienna Fleeing Woods referred to the relative safety of a surrounding area of forest into which one might escape if pursued by the authorities (Steven 1989). In Auckland’s central city, the Lily Pond and the Garden of Eden were code names for two notorious trade (prostitution) bars on lower Queen Street, and the trade bog (public toilet) outside the Ca d’Oro coffee bar was known as the Garden of Earthly Delights. Most male sex workers adopt working names (as a means of keeping their private life separated from professional engagements), and up until the 1970s, a significant number of Queens (men who dress in drag) and transgendered workers adopted the names of flowers.
As far back as 1908, the national newspaper New Zealand Truth wrote about a group of male prostituting prisoners in Lyttelton Gaol who “had fancy monikers like Rosebud...Violet and the like.” The journalist noted, “Rosebud is at the disposal of all and sundry for an inch of Juno [tobacco], and has been known to behave blastiferously in the bath-house with five persons in an afternoon” (New Zealand Truth, 25 April 1908, p. 5).

Fig. 2 Male sex workers’ beats in lower Queen Street, Auckland (October 2nd, 1961). Original photograph from Sir George Grey Special Collections, Auckland City Libraries (NZ) 580-5831. 1, front of Central Post Office; 2, footpath outside the Garden of Eden/Waverley Hotel on Galway Street; 3, the Lily Pond/Great Northern Hotel Queen Street; 4, Ca d’Oro coffee bar and the Garden of Earthly Delights/underground toilets on Customs Street.

Although Rosebud and Violet were early examples of male prostitutes expanding metaphors of gardening into working names, by the 1980s the practice was widespread. In that decade famous workers like Camelia, Lily Love, and Daphne-Rose became legends on the streets and ships visiting New Zealand’s major port cities. While metaphors related to gardening permeated both working spaces and the names adopted by workers, there was another metaphorical cluster that extended the floral into the divine.

Ecclesiastical Metaphors

Oxymoronic relationships between the whore and the sanctified have been deeply embedded in the language of sexual marginalization. From the Goddesses of Covent Garden to the Sisters of Perpetual Indulgence (see Glenn 2003) and the ecclesiastical rituals of the homosexual English Molly Houses of the 1700s, the sexualized oxymetaphor has historically been employed to describe both the allure and the stigma of sexual “disobedience.” Ecclesiastical metaphors are heavily represented in code. Terms like glory hole (a hole in the wall between two toilet cubicles), having church (to kneel in order to perform ), Christ and the two apostles (the genitals), and angel (a sex worker who limits himself to passive sex) and the names of specific beats like the Chapel (Pitt Street, Auckland), the Catacombs (Auckland Museum), the Star of David (Durham Street West in Auckland, see Fig. 3), and the Wailing Wall (Sydney Hospital in Kings Cross), all form part of the richly ornamented metaphorical construct. In relation to the policing of these beats, the ecclesiastical metaphor appears in phrases like genuflecting for Maria (being escorted into a police car), a demon (a plainclothes detective), and, dryly, Our Lady of the Golden Brooch (an arresting, uniformed officer). On the street at this time, one sometimes encountered the ecclesiastical insult Sister of Charity (a worker who undercut other workers or did not always charge for sex). The term appears to come from charity fuck, which Partridge (2006, p. 370) notes in use in the USA during the same decade. De Milo (2009), in his oral history, says the description was derogatory because such behavior made business harder for other prostitutes.

Fig. 3 Cast-iron screens at the entrance to the Star of David, Durham Street West, Auckland (2009). Image property of the author. Men interviewed in this project attributed the beat’s name to the Magen motif decorating the large cast-iron screens installed to protect users from public view.

Metaphors Related to Food

Smorag (2008, paragraph 8), in her discussion of contemporary slang, notes the significance of metaphors relating to food. The gay slang terms she records for the forces also appear in code: angel food (men in the air cadets), sea food (sailors), and government-inspected meat (army). However, code also draws on a much older collection of metaphors. In the middle years of the twentieth century, an assortment of advertisements for sex written on toilet walls was called a menu (Rogers 1972, p. 141). This term conceptualized sex as a consumable commodity. When food is eaten, there is no reciprocation of pleasure. One selects and then ingests. In code, food metaphors serve to reinforce the separation between sexual engagement and the human dimension of contact.

Fig. 4 Symonds Street bogs, Auckland. Image property of the author. The word tearoom (or T-room) was a common term in the USA for a public toilet that served as a beat. However, in New Zealand, Australia, and the UK, the code word up until the mid-1970s was normally cottage (Partridge 2006, p. 486). This referred to the diminutive, euphemistic architecture of lavatories that preceded the larger underground constructions built by local councils at the turn of the twentieth century (see Cooper et al. 2000).

In the 1960s words like lunch referred to the genitals (Dalzell and Victor 2008a, p. 110) and a cut lunch described a circumcised penis (Dalzell and Victor 2008b, p. 182). A young sex worker was called chicken (Dalzell and Victor 2008b, p. 133) and older workers were generally referred to as stale meat. The metaphor of tea played out in historical terms like tearoom trade (public toilets that served as beats) (Humphreys 1970; see also Fig. 4), tea leafing (to steal) (Partridge 2006, p. 642), and the more recent term tea bagging, which describes the act of lowering one’s testicles into another man’s mouth (Dalzell and Victor 2008a, p. 180). Contemporary workers interviewed for the research project noted metaphors currently in use that include vegetarian (a worker who will not perform oral sex on a client and thus is not a meat eater) (Morton 1989, p. 145), vanilla (conventional sexual activity) (Dalzell and Victor 2008a, p. 193), and turkey slapping, meaning either to slap a client in the face with one’s penis or to assault somebody in the administration of justice. On an equally graphic albeit more witty level, a street oyster described a used , and a Whale Rider referred to a worker who specializes in overweight clients.

Metaphors Relating to Legitimized Work

With the arrival of homosexual law reforms and the appearance of male escort agencies in the 1980s, a significant metaphorical shift appeared in code with an increasing adoption of the language of legitimized work. (Browne and Minichiello 1996, p. 87, offer a useful discussion of the reasons for and context of this change.) A bit of rent became a worker and on-site visuals described personal appearance, and purchased was called a , and lubricant and were referred to as stock (Bennachie 2005). In this decade male escort agencies began advertising in New Zealand magazines and newspapers. Although these agencies normally accommodated one or two escorts on-site, usually workers lived in their own premises. Clients paid both an agency fee and a fee to the worker. If the escort extracted more than the fee (e.g., double time or an all-nighter), the agency charged no more, as their costs did not increase (Pearson 2007). Following the 2003 New Zealand Prostitution Reform Act, workers called private operators began working out of SOOBs (small owner-operated businesses/brothels). These establishments contained four or fewer sex workers (without a manager) and were legally defined as cooperatives. As a result, they did not need to be licensed. Larger licensed businesses were called BOOBs (big owner-operated businesses). In the 1980s and 1990s, before the arrival of these acronyms, men interviewed in the research project used the terms fuck flat, in-house, headquarters, HQ, and the office to describe the private residences from which they worked.

L337 (Leet) and the Impact of Online Environments

In the 1990s sex workers became early adopters of cell phones. This is because the technology enabled them to be in immediate contact with their client base and to operate privately without the cost and obligations of an agency. On the street the technology also became popular because it enabled workers to stay in contact with colleagues (thereby increasing levels of safety). Cell phones and texting, along with the online advertising that spread through dating and cruising sites, introduced into code a proliferation of acronyms, glyphs, and abbreviations. An early description of the newly truncated language was leet speak (L337 5P34K). This was written slang used for text communication where words were condensed or numbers and nonalphabet characters replaced letters. Leet speak was socially discreet but very flexible, allowing for several ways of spelling a word. Partridge (2006, p. 1195) suggests the term is derived from elite user and is US in origin. On the street, terms like n2 (into), <20 (younger than 20), SPU (sperm production unit/man), HRU (human reproduction unit/woman), and MM (married man) are examples of this early use of leet that is still current.
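The substitution principle behind the written form L337 5P34K can be illustrated with a few lines of Python. This is only a minimal sketch and is not drawn from the research project: the digit-to-letter table is an assumed, partial mapping chosen to decode that one example, and the name decode_leet is hypothetical. Because leet allows several spellings of a word, no single table should be treated as authoritative.

```python
# Assumed, partial table of visual substitutions (digit -> letter).
# Real leet usage varies by writer; this covers only common cases.
LEET_DIGITS = {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"}


def decode_leet(text: str) -> str:
    """Replace each mapped digit with its letter and lowercase the result."""
    return "".join(LEET_DIGITS.get(ch, ch) for ch in text).lower()


print(decode_leet("L337 5P34K"))  # prints "leet speak"
```

A table of this kind captures only visual substitutions. Forms such as n2 (into) rely on the sound of the numeral, and in <20 the digits are a literal age, so a character-by-character decoder would miss or misread them.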

Fig. 5 Posting from the website Craigslist (December 29th, 2013). Offers either one (m4m) or the option of two (mm4m) male escorts.

Today privates advertise on gay dating and international cruising sites. Some of these, like Squirt.org, offer glossaries to help clients understand code as it appears in escort profiles. On such sites one might encounter an advertisement like VGL, BAM, MM4M, GSOH, DDF, >30:), and X-pics. This translates as “very good looking, bisexual, masculine, Asian male escort with a good sense of humor, is drug and disease free, enjoys working with older clients, and has erotic photographs available for purchase.” A worker might stipulate NSA (no strings attached) or offer a session as your M8 (mate, buddy, friend) or BF (boyfriend). He may advertise as VST or a switch (able to be sexually dominant or submissive); however, in such cases he may ask if a client prefers to be a top, pitcher, or dom (active) or a bottom, catcher, or sub (passive). All of these terms are listed in the Squirt.org website glossary (available only to authenticated users). Examples of these words can also be found in escort advertising sites like craigslist.org (Fig. 5) and www.manhunt.net.

Issues Relating to Lexicography

Code’s metaphorical clusterings are complex and this chapter deals with only a few of them. However, beyond these, there are interesting issues related to the appearance of its words in lexical archives. Two significant points relate to the dating of entries and their context.

Because of their propensity to document dates based on written records, many traditional dictionaries record periods of use that are often at variance with dates indicated by men interviewed in fieldwork. This may suggest that slang is sometimes in use well before it appears in print. This situation may be a consequence of the need for language forms like code to protect themselves by preserving (as far as possible) discreet use and understanding. Currently, certain issues related to incorrect dating are being addressed by the emergence of online wordlists. What Carr (1997) refers to as bottom-up lexicography and Abel and Meyer (2013) describe as prosumer construction have led to dictionaries that enable experienced speakers to register words they currently use. These sites, as Creese (2013) suggests, have afforded “the opportunity for a more dynamic relationship between dictionary compilation and language change” (p. 392). Thus archives like the Urban Dictionary not only record the contributor and date of posting but also user ratings of agreed meaning. However, these dictionaries still have limitations. For instance, entries of the word in the Urban Dictionary are largely contemporary. Of its 262 listings, there are only two significant inclusions of the twentieth-century use of this word meaning a male sexual predator, and these only carry the date of the posting, not the period of use. In addition, on the site there is no evidence of older, underground meanings of the word including the Polari/code verb “to walk around seeking to charm a man into the act of copulation” (Baker 2002b, p. 193). In addition to listing and dating words, user participation has enriched the cultural contextualization of entries. Historically, a few lexicographers like Rogers (1972) have provided definitions and context sentences that integrated multiple slang words associated with the user group.
Recently, certain online repositories continue this tradition with contextual sentences that contain the entry surrounded by related user slang. Thus when we encounter a word like trade, posted by McGee in 2007 on the Urban Dictionary, it is contextualized with the following conversation:

– I hooked up with that married guy who answered my M4M ad on craigslist! – Wow, Steve, I didn’t know you were into chasing after trade! – No, dude, he turned out to be just as horny for my cock as I was for his...I don’t go for those no-reciprocation trade scenes.

The advantage of repositories like the Urban Dictionary is that they often provide, within their contextualizing sentences, links to definitions elsewhere on the site. They engage with more than definition and become, as Køhler Simonsen (2005) suggests, lexicographic services rather than lexicographic products. As a consequence, their user-generated content has the ability to nominate alternative values and cultural constructs that move us beyond the potentials of the edited dictionary as a sanctioner of dominant, cultural narratives. Currently user-generated approaches to recording and contextualizing slang demonstrate potential to reshape social narratives of identity. Lan (2005) suggests democratized online dictionaries may record “the intricacy and charm of language as it really is, not as prescriptivists think it should be” (p. 20). For languages like code that have had to be largely recovered from conversations about lived experience, this is an important development. User-generated dictionaries may help such languages because they reach beyond the power of the edited archive and promote lexicography as a living, protean, and discursive phenomenon. This said, online dictionaries also pose a problem to the safety of lexical material that relies on discretion. Because anybody can post a word and its definition, lexica used between workers can now quickly bleed into the public domain. An example of this is the noun Narnia that surfaced on the streets in New Zealand code in early 2004. In droll reference to C. S. Lewis’ (1950) novel The Lion, the Witch and the Wardrobe, Narnia refers to a homosexual man who is deeply embedded in the closet. These men often employ male sex workers. By June 18, 2004 (the same year the word appeared on the streets), it was posted on Urban Dictionary as “a homosexual male who does not yet admit to being gay” (Macadaciouse 2004).
This rapid leakage of words into the public domain poses a problem because, as we have seen, in constructs like code, restricted access to word meaning is employed to protect people. While one might argue that it is important to nominate marginalized language forms into the cultural landscape, it is also necessary that in so doing, one does not render communities unsafe. Accordingly, this chapter has considered historical metaphors that are now largely anachronistic. In discussing recent developments in the language form, I have used terms that already appear in online wordlists. However, there is neither recording nor analysis of terms related to initiatives contemporary workers take to protect themselves from unsafe clients or the attentions of noncontractual parties.

Future Prospects

Although crowd-sourced e-lexicography appears to be the way that future lexica of this nature will be compiled, the approach has significant implications. While glossaries like those on Squirt.org currently protect themselves by restricting access, in truth, only a password is required to cross what was once a carefully monitored social demarcation between words and their subcultural meanings. Code has adapted to leakage across a hundred and fifty years, but it has also paradoxically been aided by society’s propensity to render male prostitution invisible. However, crowd-sourced e-lexicography is significantly disrupting this dynamic. Underground languages like code are increasingly rendered vulnerable by their exposure. In Olmstead v. United States (1928), Justice Louis Brandeis noted, “The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding.” One entry compiled on an individual’s computer instantly becomes the property of the world, whether or not they understand the implications of what they are exposing. The impact for people working in a trade underpinned with danger can be significant. In the euphoria surrounding user-generated content, it is worth considering the implications of such things. In our quest for lexical richness, we must also ask who pays the cost of visibility.

References

Abel, A., & Meyer, C. (2013). The dynamics outside the paper: User contributions to online dictionaries. In I. Kosem, J. Kallas, P. Gantar, S. Krek, M. Langemets, & M. Tuulik (Eds.), Electronic lexicography in the 21st century: Thinking outside the paper. Proceedings of the eLex 2013 conference, 17–19 October 2013, Tallinn, Estonia (pp. 179–194). Ljubljana/Tallinn: Trojina, Institute for Applied Slovene Studies/Eesti Keele Instituut.
Baker, P. (2002). Polari – The lost language of gay men. London: Routledge.
Bennachie, C. (2005). The New Zealand male new worker’s pack. Wellington: PUMP.
Bennachie, C. (2009). Oral history: Male prostitution in New Zealand. MS-Papers OHInt-0956-02. Wellington: Alexander Turnbull Library.
Browne, J., & Minichiello, V. (1996). The social and work context of commercial sex between men: A research note. Australian and New Zealand Journal of Sociology, 32(1), 86–92.
Carr, M. (1997). Internet dictionaries and lexicography. International Journal of Lexicography, 10(3), 209–230.
Cooper, A., Law, R., Malthus, J., & Wood, P. (2000). Rooms of their own: Public toilets and gendered citizens in a New Zealand city, 1860–1940. Gender, Place & Culture, 7(4), 417–433.
Craigslist. (2013). Men seeking men. http://auckland.craigslist.org/m4m/4228731538.html
Creese, S. (2013). Exploring the relationship between language change and dictionary compilation in the age of the collaborative dictionary. In I. Kosem, J. Kallas, P. Gantar, S. Krek, M. Langemets, & M. Tuulik (Eds.), Electronic lexicography in the 21st century: Thinking outside the paper. Proceedings of the eLex 2013 conference, 17–19 October 2013, Tallinn, Estonia (pp. 392–406). Ljubljana/Tallinn: Trojina, Institute for Applied Slovene Studies/Eesti Keele Instituut.
De Milo, D. (2009). Oral history: Male prostitution in New Zealand. OHInt-0956-03. Wellington: Alexander Turnbull Library.
Glenn, C. (2003). Queering the (Sacred) Body Politic: Considering the performative cultural politics of the Sisters of Perpetual Indulgence. Theory and Event, 7(1). http://muse.jhu.edu/journals/theory_and_event/v007/7.1glenn.html
Humphreys, L. (1970). Tearoom trade: Impersonal sex in public places. London: Duckworth.
Ings, W. (2010). Trolling the beat to working the soob: Changes in the language of the male sex worker in New Zealand. International Journal of Lexicography, 23(1), 55–82.
Ings, W. (2012). Trade talk: The historical metamorphosis of the language of the New Zealand male prostitute between 1900–1981. Women’s History Review, 21(5), 773–791.
Køhler Simonsen, H. (2005). User involvement in corporate LSP Intranet lexicography. In H. Gottlieb, J. E. Mogensen, & A. Zettersten (Eds.), Symposium on lexicography XI. Proceedings of the eleventh international symposium on lexicography (pp. 489–510). Tübingen: Niemeyer.
Lan, L. (2005). The growing prosperity of on-line dictionaries. English Today, 21(3), 16–21.
Lewis, C. S. (1950). The lion, the witch and the wardrobe. London: Geoffrey Bles.
Olmstead v. United States. (1928). 277 U.S. 438.
Pearson, P. (2007). Oral history: Male prostitution in New Zealand. OHInt-0956-01. Wellington: Alexander Turnbull Library.
Smorag, P. (2008). From closet talk to PC terminology: Gay speech and the politics of visibility. http://transatlantica.revues.org/3503
Steven, G. (1989). Carmen. Archive ref. no. F8772. Wellington: Vidcom.
Wedding, V. (2004). Oral history. MS-Papers-0648-01. Wellington: LAGANZ Archives.

Dictionaries

Baker, P. (2002). Fantabulosa: A dictionary of polari and gay slang. London: Bloomsbury Academic.
Cage, K. (2003). Gayle: The language of kinks and queens. A dictionary of gay language in South Africa. Houghton: Jacana Media.
Dalzell, T., & Victor, T. (2008a). Sex slang. London: Routledge.
Dalzell, T., & Victor, T. (2008b). The concise new Partridge dictionary of slang and unconventional English. New York: Routledge.
Macadaciouse. (2004). Urban dictionary entry for ‘Narnia’. http://www.urbandictionary.com/
McGee, T. (2007). Urban dictionary entry for ‘trade’. http://www.urbandictionary.com/
Morton, J. (1989). Low speak: A dictionary of criminal and sexual slang. London: Angus and Robertson.
Partridge, E. (1961). A dictionary of slang and unconventional English: Colloquialisms and catch-phrases, solecisms and catachreses, nicknames, vulgarisms and such Americanisms as have been naturalized. London: Routledge and Kegan Paul.
Partridge, E. (1989). The concise dictionary of slang and unconventional English. London: Routledge.
Partridge, E. (2006). The new Partridge dictionary of slang and unconventional English. London: Routledge.
Rogers, B. (1972). The queens’ vernacular: A gay lexicon. San Francisco: Straight Arrow Books.
Squirt.org. (n.d.). Glossary. http://www.squirt.org/shared/static_content.asp?SC_ID=1156
Urban Dictionary. (n.d.). http://www.urbandictionary.com/

The lexicography of Scots

Susan Rennie

Contents
Introduction
Discussion
History of Lexicography in Scots
Electronic Corpora of Scots
Future Challenges and Prospects
Language Planning
Dictionaries and Standardization
References
Dictionaries

Abstract

The chapter begins with a summary account of the Scots language and its vocabulary, before continuing with a history of lexicographical activity in Scots. Lexicons of Scots have been published since the end of the sixteenth century. In the eighteenth century, Scots lexicography developed its essentially descriptive nature, to gloss editions of medieval texts and new works by vernacular poets, as well as to record and preserve a language that was increasingly being eroded. The nineteenth century saw the publication of John Jamieson’s Etymological Dictionary, now recognized as a key work in the development of lexicography on historical principles. This legacy was continued by the compilation, throughout the twentieth century, of the two major historical dictionaries of Scots, the Scottish National Dictionary (SND) and the Dictionary of the Older Scottish Tongue (DOST). The present century has seen a number of digital initiatives in Scots lexicography: the digitization of SND and DOST to form the composite Dictionary of the Scots Language/Dictionar o the Scots Leid (DSL) and the creation of electronic corpora of both Older and Modern Scots. New projects, such as a proposed Historical Thesaurus of Scots, continue to build on and contribute to the tradition. Smaller dictionaries of Scots, including school dictionaries, are in demand to support new initiatives in teaching Scots in schools, and Scots lexicography is an important part of the debate about any future standardization of the Scots language.

S. Rennie (*) University of Glasgow, Glasgow, UK

© Springer-Verlag GmbH Germany 2016. P. Hanks, G-M. de Schryver (eds.), International Handbook of Modern Lexis and Lexicography, DOI 10.1007/978-3-642-45369-4_36-2

Introduction

Scots is the name given to the language of Lowland Scotland. It is a Germanic language, which developed from a northern variety of Old English that was first introduced into southeast Scotland in the seventh century AD (Aitken 1985). Although over a third of its word stock derives from Old English (Macafee 1997), Scots had additional influences from Old Norse, French, Dutch/Flemish, Latin, and Gaelic, which have all contributed to its distinctive lexis. By the fifteenth century, Scots had superseded Gaelic to become the national language of Scotland, being used in both the Scottish court and parliament, and there was a consequent flowering of literary works written in the language. In the following centuries, political union with England, and the lack of a post-Reformation Scots Bible, led to English becoming the language of both State and Kirk, causing Scots to lose prestige, although literary revivals in the eighteenth and early twentieth centuries ensured its survival in written form. Today, Scots continues to flourish in literature and in a number of regional dialects, and features of Scots vocabulary, grammar, and pronunciation underlie the variety of English known as Scottish Standard English (SSE). There is no generally accepted standard for Scots, and spelling tends to vary by region, reflecting local pronunciation. It is sometimes said that Scots is part of a “linguistic continuum,” with broad Scots at one end and SSE at the other (Corbett et al. 2003). At the Scots end of this continuum are various dialects and urban varieties, from the Norse-influenced Shetlandic to urban Glaswegian, as well as the composite literary form known as Lallans or Synthetic Scots (where synthetic means “synthesized” rather than “artificial”), developed by writers in the early part of the twentieth century. Chronologically, Scots is divided into two major periods of development: Older Scots, which denotes the language from the earliest records to around 1700, and Modern Scots, which is the language used from 1700 to the present day. Many of the major dictionaries and corpora of Scots focus on one or the other of these periods.

Discussion

History of Lexicography in Scots

The lexicography of Scots began by following the same path as English lexicography, but thereafter the paths diverged, for reasons that had more to do with the relationship between Scots and English than with methodology. Because of the decline in the status of Scots from the seventeenth century onward, there would never be calls for a standardizing or normative dictionary of Scots as had led to the publication of Johnson’s dictionary of English. Rather, Scots lexicography stayed rooted in the tradition of glossaries – whether to editions of Older Scots texts or to editions of new vernacular poetry – and Scots lexicography therefore remained essentially descriptive and empiricist, with definitions tied to specific examples of written or spoken usage.

Early Lexicons of Scots

The first stirrings of Scots lexicography can be seen in manuscript glosses, where an Older Scots word is used to gloss a Latin text (Williamson 2012); but the first published lexicon of which evidence survives is a pedagogical glossary (from Latin into Older Scots) compiled by Andrew Duncan, Rector of the Dundee Grammar School, as an appendix to his Latin grammar of 1595. At around the same time, Sir John Skene, Clerk Register for Scotland, compiled the first technical glossary of Scots legal terms, De Verborum Significatione. Published in 1597, Skene’s work provided textual references and rudimentary etymologies and remained in print until the early nineteenth century. However, the work which is generally acknowledged as the foundation of lexicography in Scots was compiled over a century later by Thomas Ruddiman, classical scholar and Underkeeper of the Advocates Library in Edinburgh. In 1710, Ruddiman published an edition of Gavin Douglas’s Eneados – a sixteenth-century Scots translation of the Aeneid – which included “A large Glossary, Explaining the Difficult Words: Which may serve for a Dictionary to the Old Scottish Language.” Ruddiman glossed around 3,000 of Douglas’s Older Scots words; but he also added information on contemporary, eighteenth-century usage, often drawn from his own dialect of northeast Scots, making his glossary particularly valuable for historical linguists (Aitken 1989; McClure 2012). From the mid-eighteenth century, as the prestige of Scots was on the decline, there was a corresponding upsurge of interest in recording and preserving what was seen as a dying language. This led to several plans to compile a dictionary of contemporary Scots, the first of which was begun by James Boswell, shortly after he met Johnson for the first time in 1763. “The Scottish language is being lost every day, and in a short time will become quite unintelligible,” Boswell wrote.
“It is for that reason that I have undertaken to make a dictionary of our tongue, through which one will always have the means of learning it like any other dead language” (Pottle 1952, p. 161). Although he managed to show a specimen of his work to Johnson, Boswell later abandoned the project and his work was never published. His surviving manuscript, which was rediscovered in 2010 (http://boswellian.com), contains notes on around 800 Scots words and phrases, many of which are still current, such as bauchil “a shoe down in the heel,” sneck “to shut the latch,” and wean “a child” (Rennie 2011). Some of Boswell’s contemporaries, such as James Beattie and Sir John Sinclair, took a contrary view and compiled lists of so-called Scotticisms: part of a proscriptive trend in the latter half of the eighteenth century to publish lists of Scots usages (often idioms) to be avoided in Standard English (Basker 1991). Despite their original aims, these lists were often cited by later lexicographers and are now fruitful sources for historical linguists (Dossena 2003).

A separate strand of Scots lexicography in the eighteenth century comprised glossaries written by poets to accompany their own works. Allan Ramsay appended a glossary of around 900 words to the first edition of his poems in 1721, which famously included the first published definition of a golf tee “a little Earth, on which Gamsters at the Gowf set their Balls before they strike them off” (cited in SND s.v. Gamster, n. 1). Like Ruddiman, Ramsay did not confine himself to glossing the text and often gave additional senses of words or examples of contemporary usage. Robert Burns also produced his own glossary for the first, Kilmarnock edition of his poems in 1786 and a much expanded version for the Edinburgh edition the following year (Murison 1975). These two strands of lexicographic tradition – the glossaries to literary texts, both in Older and Modern Scots, and the plans for dictionaries of the contemporary language based on fieldwork – came together in the work that marks a watershed in Scots lexicography, John Jamieson’s Etymological Dictionary of the Scottish Language of 1808 (Fig. 1).

Jamieson and Historical Lexicography

In 1787, the Rev. Dr John Jamieson, a minister of the Scottish Secession Kirk, began work on a “glossary” of Scots that would grow into the first comprehensive dictionary of the language. Through years of antiquarian research on the history and place names of Angus and study of manuscript sources of Older Scots, Jamieson developed the methods and historical approach which were to underpin his lexicography. Published in two quarto volumes in 1808, with a further two-volume Supplement in 1825, the Etymological Dictionary of the Scottish Language was the first lexicographic work in either English or Scots to trace the earliest occurrence of its headwords, and it is for this reason that Jamieson is now recognized as a pioneer of lexicography on historical principles (Aitken 1992; Rennie 2012a). As he stated in the Dictionary preface, “On every word, or particular sense of a word, I endeavour to give the oldest printed or MS authorities.” Jamieson made several other important innovations. In his search for evidence, he gathered and cited material from nonliterary sources, including local newspapers, and he sought the advice of specialist consultants to ensure the accuracy of his definitions. He also consulted living authors, including his lifelong friend and supporter, Sir Walter Scott, whose works (and in some cases definitions) he quoted extensively in the 1825 Supplement (Rennie 2012a). In 1818, while working on the Supplement, Jamieson published an abridged edition, which sold at a fraction of the price of the full Dictionary and brought his work to a wider readership. Jamieson insisted on including Scots words “on the authority of the nation at large” rather than relying solely on written evidence, and by preserving spoken Scots at a time of increasing vocabulary loss, his work became a valuable source for later lexicographers.
The Dictionary and Supplement are cited in over 9,000 entries in the SND and provide the first evidence of many core Scots words, including jab, pernicketie, plowter, and wheech. Jamieson also prefaced the Dictionary with a lengthy essay elaborating his theory of the Norse origins of the Scots language, which informed many of his etymologies. His assertion that Norse had a strong and lasting influence on Scots was essentially correct, although his particular theory that Scots descended directly from Norse-speaking Picts was discounted by later philologists. Jamieson’s Dictionary and Supplement were extensively revised after his death, and two new editions were produced during the nineteenth century. These in turn fed into new abridged editions, which proved to be rich sources for creative writers in the early twentieth century. In 2008, to mark the bicentenary of its publication, a digital facsimile of the Dictionary was published online (http://scotsdictionary.com; Fig. 2), later enhanced by the text of the 1825 Supplement (Rennie 2008). Much research remains to be done on Jamieson’s work, but his significance to the history of European lexicography in general is becoming more widely appreciated (Considine 2014).

Fig. 1 Title page of Jamieson’s Dictionary of 1808

Fig. 2 The Online Jamieson at http://scotsdictionary.com

The Scottish National Dictionary and DOST

At the start of the twentieth century, work began on two major historical dictionaries which would put Scots lexicography on a par with the latest developments in the discipline. While working as coeditor on the New English Dictionary, William A. Craigie proposed the idea of a dictionary of “older Scottish” (part of a wider plan to produce a series of “period dictionaries”), which would treat medieval Scots vocabulary in greater depth than had been possible in NED. Craigie began work on The Dictionary of the Older Scottish Tongue (DOST) in 1921, using the Older Scots citation slips excerpted for NED as the basis for a comprehensive collection of evidence from Older Scots sources (Dareau 2005). The first part of DOST was published in 1931, with Craigie as sole editor. By 1948, he had been joined by A. J. Aitken (1921–1998), who later succeeded him as editor and who would revolutionize the methodology of DOST, extending the range of books and manuscripts read as sources and later introducing computational methods to capture and search the source texts. Under the guidance of Aitken and his successors, DOST grew in scope and extent (often to the alarm of its funders and publishers), until the twelfth and final volume was published in 2002. Craigie was also instrumental in the genesis of the Scottish National Dictionary: the other major dictionary of Scots, which covers the language from 1700 to the present. In response to a lecture given by him in 1907, the Scottish Dialects Committee (SDC) was established to research “the present condition of the Scottish dialects,” with the phonetician, William Grant (1863–1946), as its convenor.
The SDC published its new data, gathered from a network of local correspondents, in a series of Transactions between 1913 and 1921; but as their collections grew, so did the ambition of the project, and in 1929, the Scottish National Dictionary Association (SNDA) was formed, to continue the program of data collection and to oversee publication of the material in the form of a new dictionary of Modern Scots that would supersede Jamieson and complement the work being done by Craigie on the earlier language. “It would be excellent,” Craigie wrote to Grant, “if the two Dictionaries could be produced concurrently, so that the one could link up with the other and the continuity (or otherwise) of the words be clearly shown” (cited in Dareau 2005). Grant referred to the SND as “Oor Ain Dictionar,” emphasizing the collaborative nature of the project and the sense of a shared linguistic heritage which he wanted the dictionary to record and preserve. The first part of the new work, now called the Scottish National Dictionary, was published in 1931, and the dictionary eventually ran to ten volumes. After Grant’s death in 1946, the remaining volumes (roughly the letter D onward) were edited under the direction of David Murison (1913–1997). Murison changed the structure of SND entries, replacing the previous system of ordering quotations by region with a chronological order more akin to the style of the OED. He also substantially increased the reading program and with it the number and range of sources that were cited. A Supplement containing unpublished additions and revisions to earlier letters was added to the final, tenth volume, published in 1976. Together, these two historical dictionaries trace the development of the Scots language from the earliest records in the twelfth century to the late twentieth century.
They contain more than 80,000 entries, each of which details the chronological and semantic development of a Scots word, illustrated by quotations drawn from over 6,000 sources, covering a wide range of subject areas within Scottish culture and history. Although they complement each other chronologically, there are important differences in methodology between the two dictionaries. Whereas DOST covers all words and senses evidenced in Older Scots, including those shared with English during the same period, the SND only covers words and senses which are distinct from Standard English. Also, a substantial amount of evidence in SND is drawn from local contributors and questionnaires, rather than from written sources which are easier to date. These differences make it impractical to merge the two works editorially, although they can now be searched simultaneously through the Dictionary of the Scots Language website.

Concise and School Dictionaries of Scots

Alongside the two major dictionary projects, smaller works were produced to cater to those who wanted a simpler and more affordable Scots dictionary. The most enduring has been The Scots Dialect Dictionary, compiled by Alexander Warrack and first published by Chambers in 1911. Warrack had been a major contributor to the English Dialect Dictionary, and his dictionary draws extensively on his research for the latter. It was prefaced by a description of the history and dialects of Scots by William Grant and also had the backing of William Craigie, with whom Warrack had corresponded. Intended as a guide to the Scots words used by authors such as Burns and Scott, Warrack’s dictionary only covered Modern Scots and included around 60,000 headwords and variant spellings (Macleod 2012). It proved enormously popular and was reissued many times, under various titles, and is still in print (Rennie 2012b). After completing SND in 1976, the SNDA also turned its attention to producing a cheaper and more accessible dictionary. Published in 1983 with Mairi Robinson as Chief Editor, the Concise Scots Dictionary (CSD) distilled the data from SND and DOST into a single volume. It was the first dictionary to cover the full historical range of Scots since the later editions of Jamieson and was an immediate bestseller. CSD entries gave both Older and Modern Scots senses in chronological order, dated to within a century, but did not include illustrative quotations or predictable spelling variants. Although SND was complete when work began on CSD, DOST had only reached the letter P, so that the Older Scots information in later sections is based on the OED and other sources (Robinson 1985). Work is now underway on a second edition of CSD (CSD2), due to be published in 2015, which will incorporate information from the later volumes of DOST and provide further updates including revised etymologies (Robinson 2013).
The headword list and definitions of CSD were also the basis for the first Scots Thesaurus (ST), published in 1990 and edited by Iseabail Macleod, then Editorial Director of SNDA. Although selective rather than comprehensive in its coverage (Kay 1994), the ST nevertheless provides helpful classification of Scots lexis in areas where the language is traditionally rich, such as terms for food and drink, plants, and weather, and it has proved popular with writers and translators. The 1990s saw a series of educational initiatives in Scotland that created a new demand for Scots resources in primary schools. The SNDA responded by publishing the Scots School Dictionary in 1996: the first work of pedagogic lexicography in Scots since Andrew Duncan’s lexicon with which the tradition began. Pedagogic lexicography has been pushing the development of Scots lexicography ever since. The SNDA’s first foray into digital lexicography was the Electronic Scots School Dictionary (ESSD), published on CD-ROM in 1999. Based on the content of the print Scots School Dictionary, this was the first SNDA dictionary to include audio files to indicate pronunciation and included additional resources, such as a grammar guide and word games. The underlying data for the ESSD has recently been repurposed for a new Scots Dictionary for Schools app (Fig. 3), developed with funding from the Scottish Government, and launched as a free resource in 2014. The ESSD grammar guide was also republished in print form in 1999, as Grammar Broonie: A Guide Tae Scots Grammar.

Fig. 3 Scots Dictionary for Schools app

The Dictionary of the Scots Language

Building on the success of the ESSD, the SNDA obtained funding to digitize a sample of the Scottish National Dictionary to create a prototype eSND. It was decided at the outset to use XML markup for the eSND, and the project team devised a customized version of the TEI dictionary tag set to fit the idiosyncrasies of the source text (Rennie 2001). The sample SND pages were scanned and converted to machine-readable text through OCR followed by proofreading to catch any remaining errors, and the resulting text was passed through a series of short programs to apply successive layers of XML markup. The SNDA subsequently obtained a major grant from the AHRB to digitize not only the complete SND but also DOST (then nearing completion in print form) and to publish both dictionaries online under the composite title of The Dictionary of the Scots Language or (in Scots) The Dictionar o the Scots Leid (DSL). The project to create the DSL was a joint venture between the University of Dundee and SNDA and was directed by the late Dr Victor Skretkowicz with myself as editor. From 2001 to 2004, the DSL project team converted 22 volumes of SND and DOST into TEI-compliant XML, using the methods piloted for the eSND (Rennie 2004). The DSL website (http://www.dsl.ac.uk/) was launched in 2004 with a web interface designed by Jeffery Triggs, which made use of the Amberfish open-source search engine (Fig. 4). A new SND Supplement, based on data which the SNDA had been gathering since publication of the print SND, was published as an adjunct to the DSL in 2005. A second phase of the DSL was originally planned to enhance the markup and search facilities and also to integrate the Supplements to the two component dictionaries, but the project did not secure sufficient funding and the original team was dispersed in 2004. Funding constraints also forced SNDA (later renamed Scottish Language Dictionaries) to concentrate its efforts on a major revision to the CSD. These factors inevitably meant that work to maintain the DSL fell behind, and it was not until late 2014 that a second version was made available. The relaunched DSL2 (still at http://www.dsl.ac.uk/ but now hosted by the University of Glasgow) was designed to cope with the demands of mobile browsing, reflecting the changing demands of users (Fig. 5). In June 2015, further updates were included to improve the search facilities of DSL2, in particular to reintroduce Boolean searches and wildcard searches for full text, which had not been available in the new version.

Fig. 4 The interface of DSL1

Fig. 5 The interface of DSL2: http://dsl.ac.uk

Work has also been undertaken to make cross-references more accurate and to link citations to the relevant listing in the source bibliographies.
Although Scots lexicography is traditionally bilingual, the majority of works look only one way, from Scots to English. The first dictionary in the contrary direction, with English headwords translated into Scots, was Lallans: A Selection of Scots Words, published by James Nicol Jarvie in 1947. William Graham’s Scots Word Book, first published in 1977 and later expanded, was a fully bilingual dictionary with separate Scots–English and English–Scots sections, the latter of which would later form the basis for the SNDA’s Concise English–Scots Dictionary, published in 1993. To date, there has never been a complete dictionary of the Scots language, because no Scots dictionary has included the substantial portion of vocabulary which Scots shares with English. Unless there is an identifiable difference – in orthography (and the underlying phonology) or semantics – a shared word will not feature in the headword list of any Modern Scots dictionary. There have been plans to compile a monolingual dictionary of Scots – one that defines its Scots headwords in Scots rather than English – but none of these has yet come to fruition.
It would be misleading to conclude a discussion of the lexicography of Scots without reference to some of the major English dictionaries which include Scots lexis. Johnson included occasional Scots usages (perhaps contributed by his Scots amanuenses) such as “Sponk, a word in Edinburgh which denotes a match, or any thing dipt in sulphur that takes fire: as, Any sponks will ye buy?” (cited in SND s.v. Spunk, n. 2). The OED includes a substantial proportion of Scots headwords and senses, and recent updates to OED3 are incorporating antedatings from Jamieson, as well as from SND and DOST.
The HTE also includes data on Scots that featured in the OED and its Supplements; currently over 11,000 words are labeled as Scots in the HTE database. English dictionaries published by Scottish publishers, such as Chambers Dictionary, have always included a proportion of Scots lexis and in this respect may be considered as dictionaries of Scottish Standard English.

Electronic Corpora of Scots

Older Scots Corpora

DOST editor A. J. Aitken was an early enthusiast for the use of computers in lexicography and, in the late 1960s, instigated the creation of the Older Scottish Textual Archive (OSTA): a computerized version (originally punched onto paper tape) of the Older Scots sources used in the compilation of DOST (Aitken and Bratley 1967). This early instance of a Scots corpus is still available to researchers through the Oxford Text Archive (http://ota.ox.ac.uk/desc/0701). With the advent of corpus linguistics in the following decades, publishers of English dictionaries in Scotland embraced the use of linguistically tagged corpora (Chambers was one of the original partners in the British National Corpus project), but the first comparable resource for Scots was created by a team of researchers at the University of Helsinki, led by Anneli Meurman-Solin (http://www.helsinki.fi/varieng/CoRD/corpora/HCOS/). The Helsinki Corpus of Older Scots (HCOS) comprises around 850,000 words of running text, drawn from sources such as borough records, trial proceedings, sermons, diaries, travelogues, and official and private letters, composed between 1450 and 1700 (Meurman-Solin 1995). A complementary corpus was later created at the University of Edinburgh – the Edinburgh Corpus of Older Scots (ECOS) – to cover the earlier period from around 1380 to 1500. Together, these corpora have stimulated research into variation and change in Older Scots.

Modern Scots Corpora

Until recently, the creation of corpora for Modern Scots lagged behind that for earlier forms of the language. In the late 1990s, a collaboration between researchers at the University of Glasgow and the SNDA led to the creation of the Scottish Corpus of Texts and Speech (SCOTS), a large-scale corpus of both written and spoken texts in Scots and Scottish Standard English (Anderson et al. 2007). Since its publication in 2004, a series of updates has taken the SCOTS corpus to nearly 4.6 million words of text, with accompanying audio files for some sources (http://www.scottishcorpus.ac.uk). The lack of standardization in Scots means that it is not yet feasible to provide the kind of language-processing support that underpins corpus linguistics in English. In order to search for all forms of the verb scunner, for example, users of SCOTS have to perform a wildcard search or search on individual forms such as scunnered and scunnert. In 2010, the historical range of Modern Scots corpora was extended by the creation of the Corpus of Modern Scottish Writing (http://www.scottishcorpus.ac.uk/cmsw). Designed to fill the chronological gap between the Helsinki Corpus and SCOTS, the CMSW comprises around 5.4 million words drawn from over 350 texts composed between 1700 and 1945, ranging from novels to personal correspondence. As well as providing data for lexicographers (Robinson 2013), the two Glasgow corpora have spawned a number of linguistic studies into historical and contemporary Scots usage (Anderson 2013).
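
The wildcard strategy described above for the verb scunner can be illustrated with a short sketch. It does not reproduce the search engine of the SCOTS website: the function names, the handling of the asterisk, and the sample sentences are assumptions made purely for illustration.

```python
import re
from collections import Counter


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Turn a simple wildcard pattern into a regex (hypothetical helper).

    An asterisk stands for any run of letters, so "scunner*" matches the
    stem plus whatever spelling of the ending a writer happened to use.
    """
    escaped = re.escape(pattern).replace(r"\*", r"[^\W\d_]*")
    return re.compile(rf"\b{escaped}\b", re.IGNORECASE)


def count_forms(text: str, pattern: str) -> Counter:
    """Count every distinct word form in `text` that matches the wildcard."""
    regex = wildcard_to_regex(pattern)
    return Counter(match.group(0).lower() for match in regex.finditer(text))


# Invented sample text, not drawn from any corpus.
sample = (
    "He wis fair scunnert. It wad scunner ye. "
    "She was scunnered by the scunnersome smell."
)
forms = count_forms(sample, "scunner*")
print(sorted(forms))
```

The sketch also shows the cost of working without lemmatization: the wildcard gathers the variant spellings scunnered and scunnert, but it returns the derived adjective scunnersome as well, which the user then has to discard by hand.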

New Corpora

Two new projects to create corpora with Scots-language content are currently underway. The Corpus of Scottish Correspondence (CSC) at the University of Helsinki aims to collect around 500,000 words of running text in Older Scots, based on manuscripts of official and family letters dating from 1500 to 1730 (http://www.helsinki.fi/varieng/CoRD/corpora/CSC/index.html). A second project, based at the University of Bergamo, is creating a corpus of nineteenth-century Scottish correspondence (19CSC), based on collections of personal and business letters which are being diplomatically transcribed from manuscripts (Dossena 2004). Although not exclusively a corpus of Scots, the 19CSC is likely to include Scots forms and usages and so provide evidence for researchers studying the relationship between Scots and English in the nineteenth century.

Future Challenges and Prospects

The existence of XML-coded versions of both SND and DOST offers considerable potential beyond simple updates of the dictionaries themselves. A number of projects are underway to build on this resource, the most ambitious of which is to create a digital Historical Thesaurus of Scots (HTS), by mining the DSL for data. Taking as its model the Historical Thesaurus of English, begun by Michael Samuels at the University of Glasgow in the 1960s (http://www.historicalthesaurus.arts.gla.ac.uk), the HTS will be the first thesaurus of Scots to encompass the full history of the language, and the first comprehensive resource for Scots to be arranged according to synonymy and semantic category (Rennie forthcoming). The project is currently in a pilot phase, funded by a grant from the Arts and Humanities Research Council, and a website that will allow users to search within the key subject domains identified by the pilot, and to link to related entries in the DSL, is planned for publication in 2015 (http://scotsthesaurus.org). Another focus for the future must be the upkeep of the DSL, to ensure that the underlying data does not fall out of date. Insecurity of funding for SNDA, and later SLD, has meant that there are gaps in the lexicographic record of Scots since the publication of the last volume of SND in 1976. One glaring example is the fact that William Lorimer’s Scots translation of the New Testament – widely acknowledged as the finest example of Modern Scots prose – is not cited anywhere in the SND, as it was published posthumously in 1983. The coverage of post-war Scots writing in the DSL is also patchy, possibly due to interruptions to the dictionary reading program, but for Scots lexicography to be truly world class, these gaps need to be addressed. Recent developments, such as the collaboration on the HTS, are a sign that alternative funding streams may be found to assist the ongoing revision program.
A digitized CSD is another desideratum, and it is odd that this has not been created as a side product of the forthcoming CSD2 – perhaps with the printed version as an optional extra, rather than the primary focus. An online CSD could act as a useful portal to the DSL, providing the links between Older Scots and Modern Scots forms that are not always explicit in the separate dictionary entries or not always obvious from search matches. There is also work to be done to uncover the full history of lexicography in Scots. In particular, Jamieson’s Dictionary and Supplement of 1808/1825 are surprisingly under-researched, though the existence of the Online Jamieson may help to redress this. A further new initiative is the proposed creation of a Scottish Lexicographic Network (ScotLex) as a forum for dictionary projects and individual lexicographers working within any of the past or current languages of Scotland, including (but not limited to) Scots, English, and Gaelic. It is hoped that ScotLex will establish stronger links and collaborations between major lexicographic projects, such as the DSL, the Historical Thesauruses of both Scots and English, and the new historical dictionary of Scottish Gaelic, Faclair na Gàidhlig (http://www.faclair.ac.uk). One future project for the new network is a proposed digital collection of Early Scots Lexicons Online (ESLO), on the model of LEME, which would allow the full range of Scots glossaries and dictionaries before Jamieson to be searched and analyzed together.

Language Planning

Regulatory Bodies

Two official bodies charged with supporting the Scots language currently receive funding from the Scottish Government. Scottish Language Dictionaries (SLD) is, in effect, the official home of Scots-language lexicography in Scotland (http://www.scotsdictionaries.org.uk). Formed in 2002 from an amalgamation of the Scottish National Dictionary Association and the former staff of DOST, it is charged with maintaining and updating the DSL as well as publishing smaller derivative works, such as the forthcoming revision to CSD. The Scots Language Centre, based in Perth, also receives government funding for its role in providing information and support for Scots through its website (http://www.scotslanguage.com) and related activities. There is currently little competition to publish Scots dictionaries, and the majority of works – both academic and trade dictionaries – are produced by the government-funded SLD. The closest to a commercial rival for SLD’s smaller dictionaries is the Collins Gem Dictionary of Scots, first published in 1995 and still a popular guide to core Scots vocabulary. Alexander Warrack’s Scots Dialect Dictionary of 1911, now out of copyright, is regularly republished under various titles: an indication that there is still a demand for a work that covers the Scots lexis of major authors such as Burns and Scott, yet (unlike CSD) is entirely modern in coverage. Independent publishers do, however, tend to support smaller lexicons of particular dialects, such as the Shetland Dictionary published by the Shetland Times in 2010. A Scots–Polish Lexicon/Leksykon szkocko-polski was published in 2014 and may indicate a new impulse to widen bilingual lexicography in Scots beyond the usual English–Scots axis.

Fig. 6 CannieSpell program disk

Dictionaries and Standardization

The reestablishment of the Scottish Parliament in Edinburgh has created new opportunities for the use of formal Scots. Members of the Scottish Parliament (MSPs) are allowed to deliver speeches in Scots or Gaelic as well as English; and to facilitate this, the SNDA produced an initial set of guidelines on transcribing Scots for the use of the parliamentary reporters. As the contexts for written Scots grow, lexicographers find themselves increasingly at the center of the debate over whether there ought to be a standard form of Scots and if so, what form that should take: a composite based on several dialects or a single dialect chosen because of maximum difference from English or because of the number of its current speakers. The policy of SLD has always been to track and describe the language, rather than to prescribe; but space constraints in smaller dictionaries mean that the range of headword variants is inevitably squeezed, and this can lead to the selected forms being seen as preferred. Decisions which a Scots lexicographer makes can therefore have consequences for the uptake of a particular spelling form, and this is especially true for dictionaries aimed at school users. In 1998, the SNDA launched the first spellchecker for Scots – punningly called CannieSpell (as Scots cannie, meaning “wise or shrewd,” is a homophone of cannae, meaning “cannot”) – produced on floppy disks and designed to integrate with common word-processing programs (Fig. 6). Based on the headword list of the Scots School Dictionary, CannieSpell incorporated the first lemmatized list of Scots, offering a full range of inflected forms as well as major regional variants; however, it was still intended as a guide based on current usage, rather than a set of rigid rules for the spelling of Modern Scots. There is a new willingness to embrace the teaching (not just the acceptance) of Scots in education in Scotland.
The introduction in 2014 of a new Scots Language Award, ratified by the Scottish Qualifications Authority (http://www.sqa.org.uk/sqa/70056.html), may prove to be a milestone in the history of the language, giving Scots an official stamp that it has lacked in modern times. A long-awaited Scots Language Policy by the Scottish Government, due for publication in 2015, also suggests that Scots is entering an era when the need for high-quality lexicographic resources has never been greater, to support the efforts of researchers, teachers, writers, and translators.

References

Aitken, A. J. (1985). A history of Scots. In CSD, ix–xli. Aitken, A. J. (1989). The lexicography of Scots two hundred years since: Ruddiman and his successors. In L. Mackenzie & R. Todd (Eds.), In other words: Transcultural studies in philology, translation, and lexicology (pp. 235–245). Dordrecht: Foris. Aitken, A. J. (1992). Scottish dictionaries. In T. McArthur (Ed.), The Oxford companion to the English language (pp. 901–903). Oxford: OUP. Aitken, A. J., & Bratley, P. (1967). An archive of Older Scottish texts for scanning by computer. English Studies, 48, 60–61. Anderson, W. (Ed.). (2013). Language in Scotland: Corpus-based studies (SCROLL, Vol. 19). Amsterdam: Rodopi. Anderson, J., Beavan, D., & Kay, C. J. (2007). SCOTS: Scottish Corpus of Texts and Speech. In J. Beal, K. Corrigan, & H. Moisl (Eds.), Creating and digitizing language corpora (Synchronic databases, Vol. 1, pp. 17–34). Basingstoke: Palgrave Macmillan. Basker, J. G. (1991). Scotticisms and the problem of cultural identity in eighteenth-century Britain. Eighteenth-century Life, n.s., 15, 81–95. Burns, R. (1786). Poems, chiefly in the Scottish dialect. Kilmarnock: John Wilson. Considine, J. (2014). John Jamieson, Franz Passow, and the double invention of lexicography on historical principles. Journal of the History of Ideas, 75(2), 261–281. Corbett, J., McClure, J. D., & Stuart-Smith, J. (2003). A brief history of Scots. In J. Corbett, J. D. McClure, & J. Stuart-Smith (Eds.), The Edinburgh companion to Scots (pp. 1–16). Edinburgh: EUP. Dareau, M. (2005). The history and development of DOST. In C. J. Kay & M. A. Mackay (Eds.), Perspectives on the Older Scottish tongue. A celebration of DOST (pp. 18–37). Edinburgh: EUP. Dossena, M. (2003). ‘Like runes upon a standin’ stane’: Scotticisms in grammar and vocabulary. East Linton: Tuckwell Press. Dossena, M. (2004). Towards a corpus of nineteenth-century Scottish correspondence. Linguistica e Filologia, 18, 195–214. Dossena, M. (2008). 
When antiquarians looked at the thistle: Late Modern views of Scotland’s linguistic heritage. The Bottle Imp 4, http://www.arts.gla.ac.uk/ScotLit/ASLS/SWE/TBI/TBIIssue4/Thistle.html. Dossena, M. (2012/13). The thistle and the words: Scotland in Late Modern English lexicography. Scottish Language, 31/32, 64–85. Kay, C. J. (1994). A lexical view of two societies: A comparison of the Scots Thesaurus and a Thesaurus of Old English. In A. Fenton & D. MacDonald (Eds.), Studies in Scots and Gaelic (pp. 41–47). Edinburgh: Canongate Academic.

Lorimer, W. L. (1983). The New Testament in Scots translated by William Laughton Lorimer. Edinburgh: Southside Publishers. Macafee, C. (1997). Older Scots lexis. In C. Jones (Ed.), The Edinburgh history of the Scots language (pp. 182–212). Edinburgh: EUP. Macleod, I. (2012). Scottish National Dictionary. In I. Macleod, & J. D. McClure (Eds.), (pp. 144–171). Macleod, I., & McClure, J. D. (Eds.). (2012). Scotland in definition: A history of Scottish dictionaries. Edinburgh: John Donald. McClure, J. D. (2012). Glossaries and Scotticisms: Lexicography in the eighteenth century. In I. Macleod, & J. D. McClure (Eds.), (pp. 35–59). Meurman-Solin, A. (1995). A new tool: The Helsinki corpus of Older Scots (1450–1700). ICAME Journal, 19, 49–62. Murison, D. (1975). The language of Burns. In D. A. Low (Ed.), Critical essays on Robert Burns (pp. 54–69). London: Routledge & Kegan Paul. Pottle, F. A. (Ed.). (1952). Boswell in Holland, 1763–1764. London: Heinemann. Ramsay, A. (1721). Poems by Allan Ramsay. Edinburgh: printed by Mr Thomas Ruddiman. Rennie, S. (2001). The electronic Scottish National Dictionary (eSND): Work in progress. Literary and Linguistic Computing, 16(2), 153–160. Rennie, S. (2004). About the Dictionary of the Scots Language. Originally published as part of the DSL1 website. http://www.dsl.ac.uk. Rennie, S. (2008). The electronic Jamieson: Towards a bicentenary celebration. In M. Mooijaart & M. van der Wal (Eds.), Yesterday’s words: Contemporary, current and future lexicography (pp. 333–340). Newcastle: Cambridge Scholars. Rennie, S. (2011). Boswell’s Scottish dictionary rediscovered. Dictionaries: Journal of the Dictionary Society of North America, 32, 94–110. Rennie, S. (2012a). Jamieson’s dictionary of Scots: The story of the first historical dictionary of the Scots language. Oxford: OUP. Rennie, S. (2012b). Jamieson and the nineteenth century. In I. Macleod, & J. D. McClure (Eds.), (pp. 60–84). Rennie, S. (forthcoming). 
Creating a Historical Thesaurus of Scots. In Proceedings of ICHLL7 [title tbc]. Linguistic insights. Bern: Peter Lang. Robinson, M. (1985). The Concise Scots Dictionary: A final report. Dictionaries: Journal of the Dictionary Society of North America, 7, 112–133. Robinson, C. (2013). Loanwords in Scots: Some reflections from lexicography. In J. M. Kirk & I. Macleod (Eds.), Scots: Studies in its literature and language (SCROLL, Vol. 21, pp. 125–144). Amsterdam: Rodopi. Williamson, K. (2012). Lexicography of Scots before 1700. In I. Macleod, & J. D. McClure (Eds.), (pp. 13–34).

Dictionaries

[CESD] Macleod, I. et al. (1993). Concise English–Scots Dictionary. Edinburgh: Chambers. [CSD] Robinson, M. (1985). Concise Scots Dictionary. Aberdeen: AUP. Reissued Edinburgh: Polygon, 1999. [DOST] Craigie, W. A., Aitken, A. J. et al. (1937–2002). A Dictionary of the Older Scottish Tongue: From the twelfth century to the end of the seventeenth (12 Vols.). Chicago/London: University of Chicago Press/OUP. (Published online as part of DSL, 2004). [DSL] Rennie, S. et al. (2004). Dictionary of the Scots Language/Dictionar o the Scots leid (= digital edition of DOST and SND. DSL1 published 2004; DSL2, 2014). http://dsl.ac.uk [EDD] Wright, J. (Ed.). (1898–1905). The English Dialect Dictionary (6 Vols.). London: Henry Frowde. [ESSD] Rennie, S. (1998). The Electronic Scots School Dictionary. CD-ROM. Edinburgh: SNDA.

[HTE] Samuels, M., Kay, C. J. et al. (Eds.). The Historical Thesaurus of English (= digital edition of The Historical Thesaurus of the Oxford English Dictionary). http://libra.englang.arts.gla.ac.uk/historicalthesaurus/ [LEME] Lancashire, I. (Ed.). (2006). Lexicons of Early Modern English. http://leme.library.utoronto.ca [NED] Murray, J. A. H. et al. (Eds.). (1888–1928). A New English Dictionary on Historical Principles (10 Vols.). Oxford: Clarendon Press. [SND] Grant, W., & Murison, D. (Eds.). (1931–1976). The Scottish National Dictionary (10 Vols.). Edinburgh: SNDA. (Published online as part of DSL, 2004). [SSD] Macleod, I. et al. (1996). The Scots School Dictionary. Edinburgh: Chambers. [ST] Macleod, I. et al. (1990). The Scots Thesaurus. Aberdeen: AUP. Boswell’s Scottish Dictionary. http://boswellian.com Christie-Johnston, A., & Christie-Johnston, A. (2010). Shetland words: A dictionary of the Shetland dialect. Lerwick: Shetland Times. Collins Gem Scots Dictionary (1995). Reissued as Collins Scots Dictionary (2003). Glasgow: HarperCollins. Duncan, A. (1595). Early Scottish Glossary; Selected from Duncan’s Appendix Etymologiae, A.D. 1595. In W. W. Skeat (Ed.), Series B. Reprinted Glossaries, 3 parts. London: published for the English Dialect Society, 1873–4, ii. 65–75. Graham, W. (1977). The Scots Word Book. Edinburgh: Ramsay Head Press. Jamieson, J. (1808). An etymological dictionary of the Scottish language: illustrating the words in their different significations (2 Vols.). Edinburgh: printed at the University Press, for W. Creech et al. Jamieson, J. (1818). An etymological dictionary of the Scottish language [...] Abridged from the quarto edition, by the author. Edinburgh: printed for Archibald Constable & Company. Jamieson, J. (1825). Supplement to the etymological dictionary of the Scottish language (2 Vols.). Edinburgh: printed at the University Press for W. & C. Tait. Jarvie, J. N. (1947).
Lallans: A selection of Scots words arranged as an English–Scots dictionary. London: Wren Books. Michalska, K. (2014). A Scots–Polish lexicon/Leksykon szkocko-polski. London: Steve Savage. Ruddiman, T. (1710). Virgil’s Aeneis, translated into Scottish verse, by the famous Gawin Douglas Bishop of Dunkeld. Edinburgh: printed by Mr Andrew Symson, and Mr Robert Freebairn. Skene, Sir J. (1597). De verborum significatione: The exposition of the termes and difficill wordes conteined in the foure buikes of Regiam Majestatem, and uthers. Edinburgh: Robert Waldegraue. The Online Jamieson. http://www.scotsdictionary.com Warrack, A. (1911). A Scots Dialect Dictionary. London/Edinburgh: W. & R. Chambers.

The lexicography of German

Annette Klosa

Abstract

This chapter discusses the main dictionaries of the German language as it is spoken and written in Germany, and also German as it is spoken and written in Austria, Switzerland, the eastern fringes of Belgium, and South Tyrol. It also briefly describes Pennsylvania German. Corpora and other language resources used in German dictionary-making are also presented. Finally, there is a discussion of some current issues in German lexicography, as well as future prospects.

Contents

Introduction
Description
History of lexicography in German
Current issues in German lexicography
Electronic corpora and other language resources for German
Surveys of German dictionaries
Specialized dictionaries
Future prospects
References

Introduction

A. Klosa, Lexik, Institut für Deutsche Sprache, Projekt elexiko, Mannheim, Germany

© Springer-Verlag GmbH Germany 2017. P. Hanks, G.-M. de Schryver (eds.), International Handbook of Modern Lexis and Lexicography, DOI 10.1007/978-3-642-45369-4_40-2

German is a scientific and a cultural language with a long history. As a West Germanic language it belongs to the Indo-European family of languages and is spoken today not only in Germany, Austria, and parts of Switzerland, but also in Liechtenstein, Luxembourg, East Belgium, South Tyrol, and Alsace-Lorraine, as well as being a minority language in several other countries inside and outside of Europe. German is the language with the highest number of native speakers in continental Europe. Besides (written and spoken) German as a standard language there are many German dialects, which are classified into High and Low German dialect groups, the difference between them being mainly that High German dialects participated in the High German consonant shift, in which, for example, /t/ was shifted to /s/ (compare Low German Water and High German Wasser). Standard German has developed from High German dialects starting in the sixteenth century (for example with Martin Luther’s translation of the Scripture in 1522). Low German is closely related to Dutch. German is written with the 26 Latin letters plus <ß> (‘Eszett’ for [s] after diphthongs and long vowels) and the umlauts <ä>, <ö>, <ü>. Standardization of German orthography started in the late eighteenth century, and after 1871 (foundation of the German Reich) it became an official goal to develop a unified school orthography. Today, the Rat für deutsche Rechtschreibung (http://www.rechtschreibrat.com/) [‘Council for German Orthography’] is the regulatory body for German orthography: members from Germany, Austria, Switzerland, Liechtenstein, Bolzano/South Tyrol, and Belgium prepare rules governing the uniformity of German spelling and syllabification in all German-speaking countries. Meeting at least twice a year, the “Rat” publishes reports and gives recommendations every few years. Spelling dictionaries as well as orthographic information in all other dictionary types conform to the spelling rules and the official list of German words published by this regulatory body. The spelling rules are mandatory only for official texts and schools, though; in other writing these are not binding (most writers of German nevertheless adhere to them).
Codification of German pronunciation began in the late nineteenth century with the publication of dictionaries which record the pronunciation of German words and describe pronunciation rules. German pronunciation today is not as highly standardized as its orthography. In fact, different varieties are used in official situations, for example by news readers on the radio, and different varieties of standard German are used in Germany, Austria, the German-speaking parts of Switzerland, etc. Distinctive features of German pronunciation in its different varieties and dialects are (besides the very rich vowel system with 15 monophthong vowels and three diphthongs) its consonant clusters (especially in inflected forms) and initial word accent (except in loanwords which follow different rules). German has a rich inflectional system for nouns, pronouns, articles, adjectives, and verbs. Verbs, for example, are inflected (partly by using auxiliaries) according to three persons, two numbers, six tenses, two moods, and two voices. A German specialty is the complex system of word formation (composition, derivation, abbreviation) for most parts of speech. While the formation of new derivatives is comparatively rare, noun compounds are particularly frequent. Typically, only limited numbers of these are recorded in dictionaries. New verbs are mostly formed by adding prefixes, particles, and other elements creating, among others, separable verbs (e.g., aufschreiben ‘to write down’: Ich schreibe ein Wort auf [‘I write down a word’] – Ich habe ein Wort aufgeschrieben [‘I have written down a word’]). Separable verbs are usually recorded in dictionaries in the infinitive, unseparated form, while in texts they more often occur separated. Corpus tools for German have not yet mastered reliable lemmatization of separable verbs, thus making it difficult to ascertain their frequency for purposes of inclusion in a lemma list.
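
The difficulty with separable verbs can be made concrete with a deliberately naive sketch. It is not a description of any existing corpus tool: the particle list, the toy table of finite forms, and the function name are all assumptions introduced for illustration, and the heuristic (attach a clause-final particle to the infinitive of the finite verb) covers only the simplest main-clause pattern.

```python
from typing import List, Optional

# Hypothetical toy resources; a real tool would need a full lexicon.
SEPARABLE_PARTICLES = {"ab", "an", "auf", "aus", "ein", "mit", "vor", "zu"}
FINITE_TO_INFINITIVE = {
    "schreibe": "schreiben",
    "schreibt": "schreiben",
    "warte": "warten",
}


def verb_lemma(tokens: List[str]) -> Optional[str]:
    """Guess the verb lemma of a simple main clause.

    If the clause ends in a known particle, the particle is re-attached to
    the infinitive of the finite verb; otherwise the bare infinitive is
    returned. Returns None when no finite form from the toy table occurs.
    """
    words = [t.strip(".,;:!?").lower() for t in tokens]
    words = [w for w in words if w]
    base = next(
        (FINITE_TO_INFINITIVE[w] for w in words if w in FINITE_TO_INFINITIVE),
        None,
    )
    if base is None:
        return None
    if words[-1] in SEPARABLE_PARTICLES:
        return words[-1] + base
    return base


print(verb_lemma("Ich schreibe ein Wort auf.".split()))
print(verb_lemma("Ich warte auf den Bus.".split()))
```

In the first sentence the clause-final auf is joined to schreiben to give the dictionary form aufschreiben, while in the second the same word is a preposition inside the clause and is left alone. Forms such as aufgeschrieben, subordinate clauses, and particles that are also free-standing words all fall outside this heuristic, which gives some idea of why frequency counts for separable verbs remain unreliable.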
The history of High German is divided into four periods: Old High German (750–1050), Middle High German (1050–1350), Early New High German (1350–1650), and New High German (since 1650). Even in the earliest times, non-Germanic words were integrated into the language, starting with Latin or Greek words (vinum – Wein, κυρικόν – Kirche). Later, French words (Old French aventure – Middle High German āventiure – Abenteuer; French boulevard – Boulevard) started to enrich German vocabulary. Since the middle of the twentieth century, a growing number of English words have been integrated into German (design – Design). German vocabulary can be classified into native (Germanic/Indo-European) words (Vater ‘father’, Mutter ‘mother’, zwei ‘two’), and fully phonetically and graphically integrated loanwords (Ziegel ‘brick’, from Latin tegula) as well as words of foreign origin, which are phonetically and graphically recognizable as loanwords (Toilette from French toilette, Yoga from Hindustan yoga). The latest borrowings are recorded in new-word dictionaries.

Dictionaries as part of the German cultural heritage started playing an important role in teaching, in translating, and in documenting and standardizing the language in its different aspects from as early as the fourteenth century. The rise of German philology in the second half of the nineteenth century and the start of scientific research into language usage and dictionary-making in German since the 1970s are also important contributors to the huge variety of dictionaries of German that we have today. Finally, the availability of electronic corpora and the possibilities of publishing on the Web have had an impact on how German dictionaries are made and used today. The potential impact for the future is even greater. German lexicography does not stand alone, but is part of a larger European tradition of dictionary-making.
Although many influences on questions of corpus compilation, lemmatization, definition, examples, guiding historical or synchronic principles, etc. from the lexicography of other European languages should not be forgotten, they can only be mentioned briefly in this paper.

Description

Starting with a brief look at the history of lexicography in German, in this chapter we describe some current issues in German lexicography, give information on electronic corpora of written and spoken German as well as other language resources used in German dictionary-making, and give a survey of German dictionaries.

History of lexicography in German

The history of German lexicography has been described extensively (Grubmüller 1990; Haß-Zumkehr 2001; Henne 2001; Kühn and Püschel 1990a, b; Schaeder 1987; Stötzel 1970; Szlek 1999; Wiegand 1990). The short survey in this section is based on Schlaefer (2009, pp. 128–135), and focuses on the main development stages.

German lexicography began in the early Middle Ages. A well-known example is the Vocabularius Sancti Galli (c. 790), a Latin–[Old High] German glossary, in which the vocabulary is ordered in subject groups (see Fig. 1). The need to translate from Latin was the main reason for creating glossaries and organizing them in alphabetical order until the fourteenth century. In the late fourteenth century, [Middle High] German–Latin glossaries were developed (the earliest being Vocabularium seu nomenclator by F. Closener), such that translations in both directions were possible. In the sixteenth century, multilingual glossaries (with Greek or other languages in addition to German and Latin) were published, such as Nomenclator Trilingvis, Graecolatinogermanicus by N. Frischlin (1591). Since the second half of the sixteenth century, monolingual German dictionaries have been created to document German as the mother tongue, e.g. Ein Teutscher Dictionarius [‘A German Dictionary’] by S. Roth (1571). Dictionaries of synonyms or proverbs were also published at this time. The term Wörterbuch came into use a little later, at the beginning of the seventeenth century.

Fig. 1 Page from Vocabularius Sancti Galli (913) (© CESG Codices Electronici Sangallenses, www.cesg.unifr.ch), reading oculos – augun, nares – nasa, os – mund, gula – cela, mandilla – cinnipeini, maxillares – cinnizeni, mentus – cinni, palatus – goomo, lingua – zunga, labia – leffura, super cilia – opara prauua, popus – seha, facies – uuanga, aspectus – gasiunu, uultus – antluzi, capilli – fahs, pilus – har

In the seventeenth and eighteenth centuries, the idea of standardizing meaning and usage in a comprehensive dictionary and the idea of codification of orthography as well as pronunciation in specialized dictionaries became important for the development of German lexicography. K. Stieler’s Der Teutschen Sprache Stammbaum und Fortwachs oder Teutscher Sprachschatz [‘The German Language: Source and Development, or German Language Store’] with about 60,000 main entries (1691) is typical for this period, giving etymological, grammatical, semantic, and phraseological information together with notes on usage and examples. Stieler’s early attempts at historical principles for lemmatization and description of headwords represent a dictionary programme (see Reichmann 1989) in which German would be established as equal in importance to Latin and Greek. J. Ch. Adelung’s Versuch eines Vollständigen Grammatisch-Kritischen Wörterbuchs der Hochdeutschen Mundart [‘Essay towards a Grammatical-Critical Dictionary of the High German Language’] with over 55,000 main entries (1774–1786) standardizes German according to the language of educated speakers in Upper Saxony. Thus, Adelung’s synchronic dictionary is one of the first German dictionaries for the production of standard language of his time.

In the mid-nineteenth century, a growing interest in etymology and philology led to the idea of the Deutsches Wörterbuch [‘German Dictionary’] by Jacob and Wilhelm Grimm (1st volume Leipzig 1854), which follows historical philological principles and is based on a large number of primary sources. Because this dictionary aims at describing ‘good’ language, many literary authors were evaluated for the extraction of quotes. The Deutsches Wörterbuch (covering over 300,000 entries) has been worked on for over 100 years, from 1945 onward in two departments at Berlin and Göttingen (information: Akademie der Wissenschaften zu Göttingen (http://www.uni-goettingen.de/de/118878.html) and Berlin-Brandenburgische Akademie der Wissenschaften (http://www.bbaw.de/forschung/dwb)). Its first edition is digitized and online in the Trierer Wörterbuchnetz (http://woerterbuchnetz.de/), and a revised edition of letters A–Be and D–F has been published in seven volumes (digitized version planned), while the rest of letters B and C are still (2015) being edited at Göttingen. Another well-known general historical dictionary (Haubrichs 2013) of German is Deutsches Wörterbuch by Hermann Paul (1st edition 1897 [as e-book at Open Library (https://openlibrary.org/books/OL14003791M/Deutsches_W%C3%B6rterbuch)], 10th edition 2002, also on CD-ROM with extensive search options). Its entries are highly cross-referenced to illustrate semantic and etymological relations.

In the second half of the nineteenth century, the first German period dictionaries and the first dialect dictionaries were launched as well. In 1880, K. Duden published his Vollständiges Orthographisches Wörterbuch der deutschen Sprache [‘Comprehensive Orthographic Dictionary of the German Language’] (1st edition, Leipzig), which by 2013 had reached its 26th edition (Duden: Die Rechtschreibung [‘Duden: Orthography’], 2013). Until 1996, when German orthography was reformed by an official council, this dictionary was the officially binding orthographic rule book for German (e.g. in schools). Duden: Die Rechtschreibung and the Deutsches Wörterbuch are the best known German dictionaries today, and the Deutsches Wörterbuch is regarded by many as the one national dictionary for German.

Partly because many (multi-volume) dictionary projects which had been started between 1850 and 1900 were still being edited during the first half of the twentieth century, no new dictionary of German based on philological principles was published at that time. The Wörterbuch der deutschen Gegenwartssprache (WDG) [‘Dictionary of the Contemporary German Language’, edited by R. Klappenbach and W. Steinitz, originally published 1952–1977 in East Germany in six volumes] is the first (synchronic) dictionary of contemporary German, as meaning and usage of headwords are described without recourse to their historic development. A two-volume edition of WDG called Handwörterbuch der deutschen Gegenwartssprache (HDG) [‘Concise Dictionary of the Contemporary German Language’] was published by G. Kempcke et al. in 1984, with a strong orientation towards the official East German language of the period.

In West Germany, Duden – Das große Wörterbuch der deutschen Sprache [‘Duden – the Big Dictionary of the German Language’] was published in its 1st edition in six volumes (1976–1982; 2nd edition in eight volumes 1993–1995, 3rd edition in ten volumes 1999) as a record of contemporary standard German, including specialized vocabulary and nonstandard language. A one-volume edition as Duden – Deutsches Universalwörterbuch [‘Duden – German Universal Dictionary’] (1st edition 1993) is still available in print (7th edition 2011). Brockhaus-Wahrig. Deutsches Wörterbuch [‘Brockhaus-Wahrig. German Dictionary’] in six volumes (1980–1984, ed. G. Wahrig et al.) had a special focus on technical and scientific vocabulary but was widely criticized for lack of evidence from sources and has not been re-edited.
The one-volume Brockhaus Wahrig Deutsches Wörterbuch was published in 1966 (ed. G. Wahrig) and was more successful (available now in its 9th edition 2012, ed. R. Wahrig-Burfeind); it is well known and widely used as a concise monolingual general dictionary for learners of German as a second language. All of these dictionaries are still guided by historical principles for the order of meanings, as they arrange meanings etymologically. Only since the introduction of German learner lexicography in the late twentieth century do some dictionaries arrange meanings according to frequency or salience.

Since around 1990, the first electronic German dictionaries have been available on CD-ROM or (later) online. Examples of online dictionaries of contemporary German are Duden online (http://www.duden.de/woerterbuch) (over five million visitors per month; based on the data from several Duden print dictionaries plus some automatically compiled information, i.e., frequency or collocations from the Duden corpus [Rautmann 2014]), Digitales Wörterbuch der deutschen Sprache (DWDS) (http://www.dwds.de/) (ongoing project which provides the digitized WDG along with other resources including a digitized etymological dictionary of German [by W. Pfeifer], paradigmatic relations from GermaNet (http://www.sfs.uni-tuebingen.de/GermaNet/index.shtml), word profiles, and KWIC indices from various different corpora; developed and hosted at the Berlin-Brandenburgische Akademie der Wissenschaften), and elexiko – ein Online-Wörterbuch zum Gegenwartsdeutschen (http://www.elexiko.de/) (ongoing project which is published in modules; entries with automatically compiled information, in some parts with full information on meaning and usage; innovative presentation of data; developed and hosted at Institut für Deutsche Sprache in Mannheim).

Current issues in German lexicography

From the beginning of German lexicography, entries have been ordered either alphabetically or in subject groups. Headwords were always given in uninflected form. In the seventeenth century, the first dictionaries with nested entries were published, where derivatives and compounds are subordinated to the stem forms (uninflected). Most German dictionaries follow a strictly alphabetical order of headwords, though. German print lexicography since then has not questioned these principles of lemmatization or of ordering headwords.

With the availability of both large corpora and electronic publication, this issue needed to be re-visited. Corpus data shows that uninflected forms are not always the most frequent ones. But still, in modern corpus-based German lexicography it is not the most frequent word forms that are used as headwords, but the uninflected base forms regardless of their frequency. This is due to the strong influence of a long lexicographic tradition. In electronic dictionaries, however, the looking-up of information on a word should be possible when searching for any word form, and not only the uninflected form. A lemmatization tool working in the background will lead the user to the correct lemma. For example, in Canoo.net (http://www.canoo.net/) this principle is implemented, although in most German electronic dictionaries it is not (in elexiko (http://www.elexiko.de/), for example, a lemmatizer was used to extract a list of headword candidates from the corpus, which were then checked manually, but the lemmatizer is not implemented in the dictionary user interface). Canoo.net (http://www.canoo.net/) gives information on orthography, grammatical and syntactical behavior, related terms, and word formation for search words. A combination of the automatic compilation of data with lexicographic control and the completion of data was used to build this extensive online language service.
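The idea of a lemmatization step working behind the search box can be sketched as follows. This is a minimal illustration only: the index `FORM_TO_LEMMA`, the headword set, and the function `find_headword` are hypothetical and do not reproduce how Canoo.net or any other service mentioned here is actually implemented.

```python
from typing import Optional

# Hypothetical toy index from inflected word forms to headwords.
# A production system would generate it from a morphological lexicon.
FORM_TO_LEMMA = {
    "webs": "Web",
    "häuser": "Haus",
    "schrieb": "schreiben",
    "aufgeschrieben": "aufschreiben",
}

HEADWORDS = {"Web", "Haus", "schreiben", "aufschreiben"}


def find_headword(query: str) -> Optional[str]:
    """Map a user query (base form or inflected form) to a dictionary headword."""
    key = query.strip().casefold()
    # 1. The query may already be a headword (ignoring capitalisation).
    for headword in HEADWORDS:
        if headword.casefold() == key:
            return headword
    # 2. Otherwise fall back to the inflected-form index.
    return FORM_TO_LEMMA.get(key)


print(find_headword("Webs"))
print(find_headword("aufgeschrieben"))
```

With such an index the user never needs to know which form the dictionary chose as its headword; the cost is that the index has to cover every inflected form, which for German includes irregular and separated verb forms.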
An alternative solution to help users find the relevant headword consists of separate entries for inflected forms, as in Wiktionary – Das freie Wörterbuch (http://de.wiktionary.org/wiki/Wiktionary:Hauptseite): for many word forms an entry called “Deklinierte Form” [‘inflected form’] is found, and is linked to the relevant headword (see Fig. 2). In many German print dictionaries, inflected forms whose alphabetical place is far away from the headword are also recorded with a reference to the headword. While in electronic dictionaries there is not necessarily a need to show all headwords in a list, many German e-dictionaries still have an (alphabetical) list of headwords (see Fig. 3 for an example). Using morphological tools to generate and present all derivatives and compounds for a stem word has only been done recently, e.g. in elexiko (http://www.elexiko.de/) (Ulsamer 2013; see Fig. 3).

Fig. 2 Entry for Webs in Wiktionary – Das freie Wörterbuch (http://de.wiktionary.org/wiki/Wiktionary:Hauptseite)

Other lexicographic traditions are equally strong: most German dictionaries do not use illustrations or give encyclopedic information systematically when explaining meaning. In German lexicography there is a tradition neither of encyclopedic nor of illustrated dictionaries (Klosa 2017), although there are a few pictorial dictionaries: PONS Bildwörterbuch (http://bildwoerterbuch.pons.com/) [‘PONS Illustrated Dictionary’] is a German–English pictorial dictionary on the Web with onomasiological access to data; Duden – Das Bildwörterbuch [‘Duden Illustrated Dictionary’] (6th edition 2005) is the best-known German pictorial dictionary in print. Some electronic dictionaries (Duden online (http://www.duden.de/woerterbuch) and elexiko (http://www.elexiko.de/)) have started to explore the possibilities offered by illustrations in combination with the paraphrase (see Kemmer 2014 for a usage study on the reception of paraphrase and illustration in dictionaries). In elexiko (http://www.elexiko.de/), encyclopedic information as well as citations from the corpus are given in addition to the paraphrase wherever necessary and possible. This dictionary is also the only one that uses full sentences for definitions, while print dictionaries (due to space restrictions?) and online dictionaries still employ classic definition traditions (listing synonyms and/or short explanatory phrases). German learners’ dictionaries use a restricted, controlled defining vocabulary.

Fig. 3 Derivatives of Computer in elexiko (http://www.elexiko.de/)

With the availability of large electronic corpora, German standard contemporary lexicography has improved information on collocations and phraseology (in print and electronic dictionaries). The collocations given in modern dictionaries are typically the frequent ones in the corpus; they are selected to cover all semantic aspects. Automatically extracted collocations are sometimes also shown in word clouds. Phraseology also plays an important role as part of dictionary entries or in modern phraseological dictionaries. German dictionaries mostly still describe phraseology from the perspective of single head words, instead of lemmatizing whole phrases. Duden online (http://www.duden.de/woerterbuch) has started to include entries like bis auf weiteres ‘until further notice’ and im Folgenden ‘in what follows’, because users very often searched for these phrases and were not able to find them under the headwords weiter and folgen respectively (Rautmann 2014, pp. 58–62).
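How frequent collocates might be pulled from a corpus can be shown with a small sketch. It is an assumption-laden simplification: the three example sentences, the window size, and the function name `collocates` are invented for illustration, and raw co-occurrence counts stand in for the statistical association measures that real corpus tools apply.

```python
from collections import Counter
from typing import Iterable


def collocates(sentences: Iterable[str], node: str, window: int = 2) -> Counter:
    """Count the words occurring within `window` tokens of `node`."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            # Count every neighbour inside the window except the node itself.
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


# Invented mini-corpus, whitespace-tokenised and without punctuation.
corpus = [
    "bis auf weiteres geschlossen",
    "das Geschäft bleibt bis auf weiteres geschlossen",
    "bis auf weiteres gilt die Regel",
]
for word, freq in collocates(corpus, "weiteres").most_common(3):
    print(word, freq)
```

Even this toy count makes the recurring sequence bis auf weiteres stand out, which is the kind of evidence that supports lemmatizing a whole phrase rather than filing it under a single headword.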

Electronic corpora and other language resources for German

German is a language for which quite a number of corpora have been compiled, as well as other language tools. A search for German lexical resources, standards, tools, and services in the inventory of the Virtual Language Observatory (http://www.clarin.eu/content/virtual-language-observatory) provided by CLARIN (http://clarin.eu/) yields 124 results, comprising written and spoken corpora, terminological resources, lexicons, and other items. Quite a number of resources provide online information on the German lexicon (e.g., Canoo.net (http://www.canoo.net/)) or are used for compiling modern German dictionaries (e.g., Kookkurrenzdatenbank CCDB (http://corpora.ids-mannheim.de/ccdb/), a databank of collocations).

Electronic corpora

Corpora of written German

Corpora of written German are available for different language periods. The oldest German texts (from 750 to 1050) have been collected in Deutsch Diachron Digital: Referenzkorpus Altdeutsch (http://www.deutschdiachrondigital.de/). Similar projects for Middle High German and Early Modern German are in progress. Each of these can in principle be used for the compilation of new dictionaries of one of the older German language periods. However, for example, for the Althochdeutsches Wörterbuch [‘Old High German Dictionary’], a collection of 750,000 citation slips is still being used as the primary lexicographic source (Köppe 2002).

For (standard) New High German, the project Deutsches Textarchiv (http://www.deutschestextarchiv.de/) collects a corpus of texts representative of specific text types and disciplines from around 1600 to 1900. Searches for word usage in the seventeenth and eighteenth century are possible, whether it be in fine literature, in scientific texts, or functional writing. Yet there is no German dictionary that records and describes New High German between the end of the Early Modern High German period around 1600 and the beginning of contemporary German around 1945 (see Schlaefer 2009, pp. 116–117).

For (standard) German in the twentieth (and the first decade of the twenty-first) century, DWDS-Kernkorpus (http://www.dwds.de/ressourcen/korpora/#part_1) offers a well-balanced corpus with 120 million tokens in almost 80,000 texts annotated according to TEI guidelines. This corpus is being used for lexicographic purposes for Digitales Wörterbuch der Deutschen Sprache (DWDS) (http://www.dwds.de/). Deutsches Referenzkorpus DeReKo (http://www.ids-mannheim.de/kl/projekte/korpora.html) is the largest collection of contemporary (standard) German texts for the purpose of linguistic research. It contains over 25 billion words (as of 15th September 2014) in texts from the middle of the twentieth century until today. The corpus tool COSMAS II (https://cosmas2.ids-mannheim.de/cosmas2-web/) can be used to compile virtual corpora for specific purposes. The dictionary project elexiko (http://www.elexiko.de/), for example, uses a virtual corpus of DeReKo (http://www.ids-mannheim.de/kl/projekte/korpora.html) texts as its primary source.

For the detection of the latest developments in the German language (and their recording in dictionaries), corpora with data from Internet chats, blogs, Twitter messages, etc. are required. Dortmunder Chat-Korpus (http://www.chatkorpus.tu-dortmund.de/) offers a collection of log files of 140,000 chats with approximately 1.06 million word forms from official or private contexts stored as XML documents. In DECOW2012 (http://hpsg.fu-berlin.de/cow/?action=corpora&lang=de-DE#top) (German Corpus from the Web) there are almost 10 billion tokens from German web sites including blogs, etc. As far as we know, no German dictionary project has yet systematically included data from blogs or chats in the dictionary sources, although this data would allow us to discover word usage in internet-based communication with its typical features of written orality.

Corpora of spoken German

Most corpora of spoken German document specific German dialects, but some offer data on spoken colloquial, supra-regional German: the corpus Deutsch heute (http://www.ids-mannheim.de/prag/AusVar/Deutsch_heute.html) was compiled between 2006 and 2009 to record varieties of “standard” spoken German, and the corpus Deutsche Umgangssprache: Pfeffer-Korpus (http://agd.ids-mannheim.de/download/korpus/Korpus_PF_extern.pdf) has recordings of colloquial German from 1961. Corpora of spoken German offer corpus texts as transcripts and/or audio files which can be evaluated in a lexicographical context for describing semantic or grammatical features of lexemes specific for spoken language. This has not yet been done systematically in German lexicography: information on spoken language in existing German dictionaries is derived from direct quotes in written texts or from transcribed speech, e.g., news broadcasts.

Lexicographic corpora

Modern dictionaries of contemporary German from publishing houses as well as from academic institutions are primarily based on electronic text corpora. Both Duden (http://www.duden.de/hilfe) and Wahrig (http://brockhaus.de/newsletter_brockhaus/wort-woerterbuch.php) (publishing houses) have compiled large corpora for use in compiling dictionaries with data from newspapers, fine literature, etc. from all German-speaking countries. This data is mainly used for detecting new words or new usage of lexemes or as a source of examples. Corpus query tools also allow us to find collocates or to detect multi-word units.

In elexiko (http://www.elexiko.de/), exploring the corpus with sophisticated statistical tools is the first step towards a corpus-driven description of meaning and usage of the headwords (see Storjohann 2010 for a description of exploring colligational patterns in a corpus and their lexicographic documentation). Information on co-occurrences for headwords rendered by the corpus query tool COSMAS II (http://www.ids-mannheim.de/cosmas2/) (incorporating the tool Kookkurrenzanalyse (http://www.ids-mannheim.de/kl/projekte/methoden/ka.html)) is not grouped according to specific syntactic behavior but gives clusters of co-occurring words with a similar collocational behavior in a hierarchical structure. In addition, the corpus tools “Similar Collocations Profiles” and “Modelling Semantic Proximity” are used to detect and describe paradigmatic relations in the dictionary entries.

“DeWaC German Web Corpus” of 1.6 billion tokens with part-of-speech tagging is provided in Sketch Engine (http://www.sketchengine.co.uk/documentation/wiki/Corpora/DeWaC). However, German word sketches for user corpora are still in preparation, so (as far as we know) no German dictionary project at the present time uses Sketch Engine in its lexicographical process (a comparable feature is provided by the “word profiles” in DWDS (http://www.dwds.de/)).

While electronic corpora play a major role in providing source material for a number of German dictionary projects, other dictionaries are still based (more or less exclusively) on extensive collections of citations, for example dialect dictionaries. A combination of using a slip collection as a primary source and exploring the possibilities provided by Google Books (http://books.google.de/) is being used for the compilation of the revised edition of Deutsches Fremdwörterbuch (DFWB) [‘Dictionary of loanwords in German’] (Brückner 2012).

Other language resources

Besides corpora, several other language resources for German can be used by lexicographers when compiling a dictionary. Some of these offer information directly to online users as well. Both Projekt Deutscher Wortschatz (http://wortschatz.uni-leipzig.de/) and Die Wortwarte (http://www.wortwarte.de/) present the results of statistical analyses for words based on large corpora collecting data mainly from German online newspapers. While Deutscher Wortschatz (http://wortschatz.uni-leipzig.de/) gives information on frequency, grammar, paradigmatic relations, collocations, corpus examples, and domain for each search word, Wortwarte (http://www.wortwarte.de/) focuses on the detection of new words. These are provided on a daily basis and are manually chosen from a large number of automatically compiled candidates.

GermaNet (http://www.sfs.uni-tuebingen.de/GermaNet/index.shtml) (a lexical-semantic net grouping German lexical units that express the same concept into synsets and defining semantic relations between these synsets) is being used in a lexicographical context, for example, in DWDS (http://www.dwds.de/) which offers a panel with paradigmatic relations for a headword exploring GermaNet (http://www.sfs.uni-tuebingen.de/GermaNet/index.shtml) (see Fig. 4).

Surveys of German dictionaries

The contemporary German dictionary landscape is huge and diverse. For surveys of German dictionaries from the Middle Ages to the nineteenth century, see Grubmüller 1990; Haß-Zumkehr 2001; Henne 2001; Kühn and Püschel 1990a, b; Schaeder 1987; Stötzel 1970; Szlek 1999; Wiegand 1990. A survey of contemporary German lexicography is provided in Wiegand (1990) and Mann (2013). Wiegand (2006, 2012, 2014) provides an extensive bibliography of German lexicography and dictionary research; the last volume of this bibliography with indices is announced for November 2015. Information on German online dictionaries is to be found at OBELEXdict (http://www.owid.de/obelex/dict), an online bibliography of electronic lexicography. German dictionaries and dictionary portals are also examined in Engelberg and Lemnitzer (2009, pp. 24–81) and Schlaefer (2009, pp. 107–122).

Fig. 4 Paradigmatic information on Web from GermaNet (http://www.sfs.uni-tuebingen.de/GermaNet/index.shtml) as shown in DWDS (http://www.dwds.de/)

Specialized dictionaries

In addition to mainstream international dictionaries of German, a number of specialized dictionaries should be mentioned.

Dictionaries for Austria, Switzerland, and other German-speaking communities

While general dictionaries of standard German all aim at covering German in every country or community where the language is used, some dictionaries are specialized accounts of vocabulary in particular regions. The largest of these are the Austrian and Swiss language areas, but in Belgium, South Tyrol, and even in the U.S. there are large German-speaking communities as well. The Variantenwörterbuch des Deutschen (Österreich, Schweiz, Deutschland, Liechtenstein, Luxemburg, Ostbelgien und Südtirol) [‘Dictionary of German Variants’] by U. Ammon et al. (2004) describes this national and regional variance in the German lexicon. A second, revised edition of this dictionary (VWB2 (http://www.variantenwoerterbuch.net/home.html)) is in progress (Dürscheid and Sutter 2014, pp. 48–50).

There is as yet no general dictionary of the Austrian standard variety of German. The Wörterbuch der bairischen Mundarten in Österreich (WBÖ) records Bavarian dialect words in Austria (letters A–E published in four volumes; Österreichische Akademie der Wissenschaften (http://www.oeaw.ac.at/icltt/dinamlex-archiv/WBOE.html)). German in South Tyrol is covered in WBÖ.

The largest and best known dictionary of Swiss German is the Schweizerisches Idiotikon. Wörterbuch der schweizerdeutschen Sprache [‘Swiss Idiotikon: Dictionary of the Swiss German Language’] (information: Schweizerische Akademie der Geistes- und Sozialwissenschaften (http://www.idiotikon.ch/)). In this ongoing project the letters A–X are published in 16 volumes as well as digitized and online (http://www.idiotikon.ch/index.php?option=com_wrapper&view=wrapper&Itemid=195). The Idiotikon is a general dictionary of the standard Swiss variety of German (with historical perspective) as well as a dialect dictionary for the larger Swiss language area.

The German of East Belgium is covered in the Rheinisches Wörterbuch (nine volumes; digitized and available online in the Trierer Wörterbuchnetz (http://www.woerterbuchnetz.de/)). Pennsylvania German is covered in a few print and online dictionaries, e.g. Wikipedia English/Pennsylvania German/High German dictionary (http://pdc.wikipedia.org/wiki/English/Pennsylvania_German/High_German_dictionary) (user-generated content in a trilingual dictionary with English and standard German translations).

Low German and dialect dictionaries

A large group of Low German dialects (e.g. Holsteinisch, Ostfriesisch) is often referred to under the collective term “Niederdeutsch” or “Plattdeutsch” (Old Saxon being the earliest stage of Niederdeutsch). The status of Niederdeutsch as a separate language alongside standard German is extensively discussed by speakers and linguists, without any definitive conclusion. Today, it is mostly used (by fewer and fewer speakers) in spoken language and less and less in official documents. Plattdeutsches Wörterbuch [‘Low German Dictionary’] by J. Sass (7th edition 2013) covers Niederdeutsch in toto. Low German dialects (as well as dialects from the High German dialect groups) are covered individually in a large number of (completed or still ongoing) historical dialect dictionaries.

In addition, the Deutscher Wortatlas [‘German Word Atlas’] (by W. Mitzka and L. E. Schmitt in 22 volumes. Gießen: Wilhelm Schmitz Verlag 1956–1980), Deutscher Sprachatlas [‘German Language Atlas’] (website REDE – Regionalsprache.de (http://www.regionalsprache.de/)), and the online project Atlas zur Aussprache des deutschen Gebrauchsstandards (AADG) (http://prowiki.ids-mannheim.de/bin/view/AADG/) [‘Atlas on the pronunciation of everyday German’] provide maps of German dialects.

Period dictionaries

In a period dictionary (Reichmann 1999) the lexicon of a specific historical time is described synchronically. The period in German from c. 700–1640 is covered in the following dictionaries:

• Althochdeutsches Wörterbuch [‘Old High German Dictionary’]: ongoing; A–L published in five volumes. Information: Sächsische Akademie der Wissenschaften zu Leipzig (http://www.saw-leipzig.de/forschung/projekte/althochdeutsches-woerterbuch).
• Frühneuhochdeutsches Wörterbuch [‘Early New High German Dictionary’], ongoing; several volumes published, but not in alphabetical order; digitized online version planned; information: Akademie der Wissenschaften zu Göttingen (http://adw-goe.de/forschung/forschungsprojekte-akademienprogramm/fruehneuhochdeutsches-woerterbuch/).
• (Neues) Mittelhochdeutsches Wörterbuch [‘(New) Middle High German Dictionary’], Academies of Göttingen and Mainz (at Trier), ongoing; A–E published in one volume and online (http://www.mhdwb-online.de/). Information: Akademie der Wissenschaften und der Literatur, Mainz (http://www.adwmainz.de/projekte/mittelhochdeutsches-woerterbuch/informationen.html), and Akademie der Wissenschaften zu Göttingen (http://www.uni-goettingen.de/de/sh/92908.html).
• Mittelhochdeutsches Wörterbuch [‘Middle High German Dictionary’] by G. F. Benecke et al.; Mittelhochdeutsches Handwörterbuch [‘Concise Middle High German Dictionary’] by M. Lexer; Findebuch zum mittelhochdeutschen Wortschatz [‘Book for Middle High German vocabulary’] by K. Gärtner et al.: both (completed) dictionaries that started in the nineteenth century, complemented by Findebuch in 1992; digitized, extensively hyperlinked and online in Trierer Wörterbuchnetz (http://www.woerterbuchnetz.de/).

Foreign and new word dictionaries

German lexicography is well known for the dictionary type ‘Fremdwörterbuch’, which records loanwords in German. Some cover loans from all languages and describe them diachronically (e.g., Deutsches Fremdwörterbuch [DFWB] [‘German Loan Word Dictionary’] by H. Schulz and O. Basler; revised edition ongoing; A–G published in seven volumes; digitized online version of first and revised edition planned; information: Institut für Deutsche Sprache Mannheim (http://www.ids-mannheim.de/lexik/fremdwort.htm)). Others concentrate on words from English (e.g., Anglizismen-Wörterbuch [‘Dictionary of Anglicisms’] by B. Carstensen et al. 1993). Very recent loanwords are usually treated as ‘Neologismen’ [new words] and are described in special dictionaries, e.g. Neologismenwörterbücher [‘Dictionaries of Neologisms’] (1990–1999, 2000–2010, 2011–today; new words since 2011, ongoing; online (http://www.owid.de/wb/neo/start.html)); 1990–1999 is published in one volume (Herberg et al. 2004) and 2000–2010 in two volumes (Steffens and al-Wadi 2013); information: Institut für Deutsche Sprache Mannheim (http://www.ids-mannheim.de/lexik/lexikalischeinnovationen.html). However, not all new words in German are loanwords from other languages (mainly English).

Phraseological and collocation dictionaries

Phraseology and collocations are examined not only in general dictionaries but also in specialized publications. While Wörter und Wendungen [‘Words and Phrases’] by E. Agricola (1st edition 1962, 14th and last edition 1992, Dudenverlag) records collocations, phraseologisms, and idioms, newer dictionaries look at only one of these types. All recent collocation dictionaries are corpus-based, and all of them are online:

• Projekt Deutscher Wortschatz (http://wortschatz.uni-leipzig.de/) [‘German Vocabulary Project’]: a collection of collocations that form the basis of Wörterbuch der Kollokationen im Deutschen [‘Dictionary of Collocations in German’] by U. Quasthoff (2010).
• Feste Wortverbindungen (http://www.owid.de/wb/uwv/start.html) [‘Fixed Word Combinations’]: also additional longer studies on specific, fixed, multi-word expressions; information: Institut für Deutsche Sprache Mannheim (http://wvonline.ids-mannheim.de/).

Fig. 5 Collocations for Computer in Kollokationenwörterbuch (http://colloc.germa.unibas.ch/ web/suche/)

• Kollokationenwörterbuch (http://colloc.germa.unibas.ch/web/suche/) [‘Collocation Dictionary’]: this dictionary groups collocations by part of speech (see Fig. 5) and is the basis for Feste Wortverbindungen des Deutschen: Kollokationen-Wörterbuch für den Alltag [‘German Multi-Word Combination: Collocation Dictionary for Everyday’] by A. Häcki-Buhofer (2014); information: Schweizerische Akademie der Geistes- und Sozialwissenschaften (http://colloc.germa.unibas.ch/web/projekt/).
• Sprichwörterbuch (http://www.owid.de/wb/sprw/start.html) [‘Idiom Dictionary’]: part of the multilingual project SprichWort. Eine Internetplattform für das Sprachenlernen (2008–2010) (http://www.sprichwort-plattform.org/) [‘Idioms. Internet Platform for Language Learning’]; information: Institut für Deutsche Sprache Mannheim (http://www.ids-mannheim.de/lexik/uwv.html).

Future prospects

German lexicography is, as shown by the dictionaries and projects described here, traditional as well as innovative. The majority of scholarly dictionaries are still published in print, but many projects are now published in an electronic medium as well. Today, e-lexicography for German is mostly Internet lexicography. Publishers such as Duden record increasing sales of dictionary apps, and DWDS (http://www.dwds.de/) already provides a mobile version for small screens. Lexicographic data on German is also being included in language technology such as automatic translation tools.

As shown above, synchronic principles are not applied widely in German lexicography, e.g. when ordering senses. This is to some extent due to the fact that quite a number of long-term dictionary projects following diachronic principles are still running. But in modern lexicography corpus evidence should be taken into account more exhaustively, which becomes especially evident when we look at phraseology. Neither phraseological dictionaries of German nor the phraseological information in general German dictionaries as yet reflects the fact that words occur in texts as part of fixed, multi-word units.

While we know a lot about German dictionaries, we do not necessarily know whether they are actually used, by whom, or in which situations. Many of the publications presented here address a rather select circle of users (e.g., philological experts), while others address the general public. It seems that these diverse user groups will become even more diversified in the future. It will therefore be important for German lexicography to employ usage studies to learn more about potential users of (print or electronic) dictionaries (see Müller-Spitzer 2014 for studies of the use of online dictionaries in general).
User adaptivity of electronic dictionaries is also a means of addressing more than one user group in more than one usage situation without the need to develop different lexicographic databases. User-adaptive German dictionaries have so far been realized only in rudimentary form (DWDS (http://www.dwds.de/), for example, allows the user to adapt the dictionary view in panels to his/her needs). The same applies to multi-media dictionaries and the use of innovative visualization of lexicographic data. There are also no context-sensitive German dictionaries yet.

When this negative assessment is set against the long list of German dictionaries above that are financed publicly or by publishing houses, the situation is perhaps a little less surprising. In times of dying dictionary publishing houses (Brockhaus-Wahrig publishing house, for example, no longer exists as of December 2013) or down-sizing dictionary publishing houses (the editorial team of Duden publishing house has had only three members since 2013, while previously up to 20 editors were employed) and a shortage of public funding for academic projects (the duration of several of the dictionary projects mentioned above was shortened by the funding bodies), new developments take longer than desired. Lexicographers as well as dictionary users have to take care that such a long-standing dictionary tradition as that of German is not endangered further. User-friendly, innovative, corpus-based (print or electronic) dictionaries of high quality as well as lexical information systems on the Web and lexicographical data in all kinds of electronic tools need to be developed. A combination of user-generated content with content from an editorial team is also worth considering in this context.

References

Brückner, D. (2012). Google Bücher aus dem Blickwinkel des Lexikographen. Trefwoord, tijdschrift voor lexicografie. Retrieved from www.fryske-akademy.nl/?id=1917
Dürscheid, C., & Sutter, P. (2014). Grammatische Helvetismen im Wörterbuch. Zeitschrift für angewandte Linguistik, 60(1), 37–65.
Engelberg, S., & Lemnitzer, L. (2009). Lexikographie und Wörterbuchbenutzung (4th ed.). Tübingen: Stauffenburg.
Grubmüller, K. (1990). Die deutsche Lexikographie von den Anfängen bis zum Beginn des 17. Jahrhundert. In F. J. Hausmann, O. Reichmann, H. E. Wiegand, & L. Zgusta (Eds.), Wörterbücher. Ein internationales Handbuch zur Lexikographie (Vol. 2, pp. 2037–2049). Berlin/New York: de Gruyter.
Haß-Zumkehr, U. (2001). Deutsche Wörterbücher – Brennpunkt von Sprach- und Kulturgeschichte. Berlin/New York: de Gruyter.
Haubrichs, W. (2013). German I: Historical and etymological lexicography. In R. H. Gouws, U. Heid, W. Schweickard, & H. E. Wiegand (Eds.), Dictionaries. An International Encyclopedia of Lexicography. Supplement Volume: Recent Developments with Focus on Electronic and Computational Lexicography (pp. 732–741). Berlin/Boston: de Gruyter.
Henne, H. (Ed.). (2001). Deutsche Wörterbücher des 17. und 18. Jahrhunderts. Einführung und Bibliographie (2nd ed.). Hildesheim: Olms.
Kemmer, K. (2014). Rezeption der Illustration, jedoch Vernachlässigung der Paraphrase? In C. Müller-Spitzer (Ed.), Using Online Dictionaries (pp. 251–278). Berlin/Boston: de Gruyter.
Klosa, A. (2017). Illustrations in dictionaries; encyclopaedic and cultural information in dictionaries. In P. Durkin (Ed.), The Oxford Handbook of Lexicography. Oxford: Oxford University Press.
Köppe, I. (2002). Das althochdeutsche Wörterbuch: Konzeption – Materialkorpus – Bedeutungswörterbuch und Kulturgeschichte. Retrieved from http://www.saw-leipzig.de/forschung/projekte/althochdeutsches-woerterbuch/publikationen/ingeborg-koeppe-2002
Kühn, P., & Püschel, U. (1990a). Die deutsche Lexikographie vom 17. Jahrhundert bis zu den Brüdern Grimm ausschließlich. In F. J. Hausmann, O. Reichmann, H. E. Wiegand, & L. Zgusta (Eds.), Wörterbücher. Ein internationales Handbuch zur Lexikographie (Vol. 2, pp. 2049–2077). Berlin/New York: de Gruyter.
Kühn, P., & Püschel, U. (1990b). Die deutsche Lexikographie von den Brüdern Grimm bis Trübner. In F. J. Hausmann, O. Reichmann, H. E. Wiegand, & L. Zgusta (Eds.), Wörterbücher. Ein internationales Handbuch zur Lexikographie (Vol. 2, pp. 2078–2100). Berlin/New York: de Gruyter.
Mann, M. (2013). German II: Synchronic lexicography. In R. H. Gouws, U. Heid, W. Schweickard, & H. E. Wiegand (Eds.), Dictionaries. An International Encyclopedia of Lexicography. Supplement Volume: Recent Developments with Focus on Electronic and Computational Lexicography (pp. 742–816). Berlin/Boston: de Gruyter.
Müller-Spitzer, C. (Ed.). (2014). Using Online Dictionaries. Berlin/Boston: de Gruyter.
Rautmann, K. (2014). Duden online und seine Nutzer. In A. Abel & A. Klosa (Eds.), Der Nutzerbeitrag im Wörterbuchprozess (pp. 49–62). Mannheim: Institut für Deutsche Sprache. http://pub.ids-mannheim.de/laufend/opal/pdf/opal2014-4.pdf
Reichmann, O. (1989). Geschichte lexikographischer Programme in Deutschland. In F. J. Hausmann, O. Reichmann, H. E. Wiegand, & L. Zgusta (Eds.), Wörterbücher. Ein internationales Handbuch zur Lexikographie (Vol. 1, pp. 230–246). Berlin/New York: de Gruyter.
Reichmann, O. (1999). Das Sprachstadienwörterbuch I: Deutsch. In F. J. Hausmann, O. Reichmann, H. E. Wiegand, & L. Zgusta (Eds.), Wörterbücher. Ein internationales Handbuch zur Lexikographie (Vol. 2, pp. 1416–1429). Berlin/New York: de Gruyter.
Schaeder, B. (1987). Germanistische Lexikographie. Tübingen: Niemeyer.

Schlaefer, M. (2009). Lexikologie und Lexikographie. Eine Einführung am Beispiel deutscher Wörterbücher (2nd ed.). Berlin: Erich Schmidt.
Storjohann, P. (2010). Colligational patterns in a corpus and their lexicographic documentation. In M. Mahlberg, V. González-Díaz, & C. Smith (Eds.), Proceedings of the Corpus Linguistics Conference 2009 in Liverpool. http://ucrel.lancs.ac.uk/publications/CL2009/
Stötzel, G. (1970). Das Abbild des Wortschatzes. Zur lexikographischen Methode in Deutschland von 1617–1967. Poetica. Zeitschrift für Sprach- und Literaturwissenschaft, 3, 1–23.
Szlek, S. P. (1999). Zur deutschen Lexikographie bis Jacob Grimm. Wörterbuchprogramme, Wörterbücher und Wörterbuchkritik. Bern et al.: Peter Lang.
Ulsamer, S. (2013). Chancen und Probleme bei der automatischen Ermittlung von Wortbildungsprodukten für elexiko und bei ihrer Präsentation. In A. Klosa (Ed.), Wortbildung im elektronischen Wörterbuch (pp. 235–254). Tübingen: Narr.
Wiegand, H. E. (1990). Die deutsche Lexikographie der Gegenwart. In F. J. Hausmann, O. Reichmann, H. E. Wiegand, & L. Zgusta (Eds.), Wörterbücher. Ein internationales Handbuch zur Lexikographie (Vol. 2, pp. 2100–2246). Berlin/New York: de Gruyter.
Wiegand, H. E. (2006). Internationale Bibliographie zur germanistischen Lexikographie und Wörterbuchforschung (Vols. 1: A–H & 2: I–R). Berlin/New York: de Gruyter.
Wiegand, H. E. (2012). Internationale Bibliographie zur germanistischen Lexikographie und Wörterbuchforschung (Vol. 3: S–Z). Berlin/New York: de Gruyter.
Wiegand, H. E. (2014). Internationale Bibliographie zur germanistischen Lexikographie und Wörterbuchforschung (Vol. 4: Nachträge). Berlin/New York: de Gruyter.

Dictionaries

Print

Adelung, J. Ch. (1774–1786). Versuch eines Vollständigen Grammatisch-Kritischen Wörterbuchs der Hochdeutschen Mundart. Leipzig: Breitkopf.
Agricola, E. (1992). Wörter und Wendungen (2nd ed.). Mannheim et al.: Bibliographisches Institut.
Althochdeutsches Wörterbuch. (1952–today). Founded by E. Karg-Gasterstädt & T. Frings. Ed. by Sächsische Akademie der Wissenschaften zu Leipzig. Berlin/New York: de Gruyter.
Ammon, U., et al. (2004). Variantenwörterbuch des Deutschen (Österreich, Schweiz, Deutschland, Liechtenstein, Luxemburg, Ostbelgien und Südtirol). Berlin/New York: de Gruyter.
Brockhaus-Wahrig Deutsches Wörterbuch. (1980–1984). Ed. G. Wahrig et al. Wiesbaden/Stuttgart: F. A. Brockhaus/Deutsche Verlags-Anstalt.
Brockhaus Wahrig Deutsches Wörterbuch. (2012). 9th ed. by R. Wahrig-Burfeind. Gütersloh: Wissenmedia.
Carstensen, B., et al. (1993–1996). Anglizismen-Wörterbuch. Der Einfluss des Englischen auf den deutschen Wortschatz nach 1945. Berlin/New York: de Gruyter.
Closener, F. Vocabularium seu nomenclator. Not preserved. See Gerber, Harry, “Closener, Fritsche” in: Neue Deutsche Biographie 3 (1957), p. 294 f. [Online: http://www.deutsche-biographie.de/ppn118669567.html].
[DFWB]. Deutsches Fremdwörterbuch. (1995–today). New edition founded by H. Schulz, continued by O. Basler, ed. by Institut für Deutsche Sprache. Berlin/New York: de Gruyter.
Duden, K. (1881). Vollständiges Orthographisches Wörterbuch der deutschen Sprache. Leipzig: Bibliographisches Institut.
Duden – Das Bildwörterbuch. (2005). 6th revised edition by Dudenredaktion. Mannheim: Bibliographisches Institut.

Duden – Das große Wörterbuch der deutschen Sprache. (1999). 3rd revised edition by Dudenredaktion. Mannheim/Leipzig/Berlin: Dudenverlag.
Duden – Deutsches Universalwörterbuch. 7th revised edition by Dudenredaktion. Mannheim: Bibliographisches Institut.
Duden – Die deutsche Rechtschreibung. (2013). 26th edition by Dudenredaktion. Berlin: Bibliographisches Institut.
Frischlin, N. (1591). Nomenclator Trilingvis, Graecolatinogermanicus. Frankfurt/M: Spies.
Frühneuhochdeutsches Wörterbuch. (1989–today). Founded by R. R. Anderson et al. Ed. by U. Goebel, A. Lobenstein-Reichmann & O. Reichmann. Berlin/New York: de Gruyter.
Gärtner, K., et al. (1992). Findebuch zum mittelhochdeutschen Wortschatz. Stuttgart: S. Hirzel.
Grimm, J., & Grimm, W. (1854–1961). Deutsches Wörterbuch. Leipzig: S. Hirzel.
Häcki-Buhofer, A. (Ed.). (2014). Feste Wortverbindungen des Deutschen: Kollokationen-Wörterbuch für den Alltag. Tübingen: Francke.
[HDG]. Handwörterbuch der deutschen Gegenwartssprache. (1984). Ed. by G. Kempcke et al. Berlin: Akademie-Verlag.
Herberg, D., et al. (2004). Neuer Wortschatz. Neologismen der 90er Jahre im Deutschen. Berlin/New York: de Gruyter.
Lexer, M. (1872–1878). Mittelhochdeutsches Handwörterbuch. Leipzig: S. Hirzel.
Mittelhochdeutsches Wörterbuch. (1854–1866). On the basis of material from G. F. Benecke ed. by W. Müller & F. Zarncke. Leipzig: S. Hirzel.
Mittelhochdeutsches Wörterbuch. (2006–today). Ed. by K. Gärtner, K. Grubmüller & K. Stackmann. Stuttgart: S. Hirzel.
Mitzka, W., & Schmitt, L. E. (1956–1980). Deutscher Wortatlas. Gießen: Wilhelm Schmitz.
Paul, H. (2002). Deutsches Wörterbuch. 10th, revised edition by H. Henne, H. Kämper & G. Objartel. Tübingen: Max Niemeyer.
Quasthoff, U. (2010). Wörterbuch der Kollokationen im Deutschen. Berlin/New York: de Gruyter.
Rheinisches Wörterbuch. (1928–1971). Ed. by J. Müller, from volume VII by K. Meisen et al. Bonn: Fritz Klopp & Berlin: Erika Klopp.
Roth, S. (1571). Ein Teutscher Dictionarius. Augsburg: Michael Manger.
Sass, J. (2013). Plattdeutsches Wörterbuch. 7th edition by H. Thies & H. Kahl. Nürnberg: Wachholtz.
Schweizerisches Idiotikon. Wörterbuch der schweizerdeutschen Sprache. (1881–today). Ed. by F. Staub & L. Tobler, later by A. Bachmann u. a. Frauenfeld/Basel: Huber Frauenfeld/Schwabe.
Steffens, D., & al-Wadi, D. (2013). Neuer Wortschatz. Neologismen im Deutschen 2001–2010. Mannheim: Institut für Deutsche Sprache.
Stieler, K. (1691). Der Teutschen Sprache Stammbaum und Fortwachs oder Teutscher Sprachschatz. Nürnberg: Johann Hofmanns.
Vocabularius Sancti Galli (ca. 790). Stiftsbibliothek St. Gallen, Cod. Sang. 913.
[WBÖ]. Wörterbuch der bairischen Mundarten in Österreich. (1963–today). Ed. by Institut für Österreichische Dialekt- und Namenkunde. Wien: Verlag der Österreichischen Akademie der Wissenschaften.
[WDG]. Wörterbuch der deutschen Gegenwartssprache. (1952–1977). Ed. by R. Klappenbach & W. Steinitz. Berlin: Akademie-Verlag.

Electronic

[AADG]. Atlas zur Aussprache des deutschen Gebrauchsstandards. Ed. by St. Kleiner & R. Knöbl. http://prowiki.ids-mannheim.de/bin/view/AADG/WebHome
Canoo.net. Deutsche Wörterbücher und Grammatik. Ed. by Canoo Engineering AG. http://canoo.net
Deutscher Wortschatz. Ed. by NLP group at Leipzig University. http://wortschatz.uni-leipzig.de/

[DWDS]. Digitales Wörterbuch der deutschen Sprache. Ed. by Berlin-Brandenburgische Akademie der Wissenschaften. http://www.dwds.de/
Duden online. Ed. by Dudenredaktion. http://www.duden.de/
elexiko – ein Online-Wörterbuch zum Gegenwartsdeutschen. Ed. by Institut für Deutsche Sprache. http://www.elexiko.de
Feste Wortverbindungen. Ed. by Institut für Deutsche Sprache. http://www.owid.de/wb/uwv/start.html
GermaNet. Ed. by E. Hinrichs et al. at Tübingen University. http://www.sfs.uni-tuebingen.de/GermaNet/
Kollokationenwörterbuch. Ed. by A. Häcki-Buhofer et al. http://colloc.germa.unibas.ch/web/woerterbuch/
PONS Bildwörterbuch. Hosted by QA International. http://bildworterbuch.com/
Sprichwörterbuch. Ed. by Institut für Deutsche Sprache. http://www.owid.de/wb/sprw/start.html
REDE – Regionalsprache.de. Ed. by Forschungszentrum Deutscher Sprachatlas. http://www.regionalsprache.de/
Trierer Wörterbuchnetz. Ed. by Trier Center for Digital Humanities. http://woerterbuchnetz.de/
Wikipedia English/Pennsylvania German/High German dictionary. https://pdc.wikipedia.org/wiki/English/Pennsylvania_German/High_German_dictionary
Wiktionary – das freie Wörterbuch. Hosted by Wikimedia Foundation Inc. https://de.wiktionary.org/wiki/Wiktionary:Hauptseite
Wortwarte. Wörter von heute und morgen. Ed. by L. Lemnitzer. http://www.wortwarte.de/

Corpora and Language Tools

DECOW2012. Compiled by German Grammar Group at Freie Universität Berlin. http://corporafromtheweb.org/decow12/
DeReKo – Deutsches Referenzkorpus. Compiled by Institut für Deutsche Sprache Mannheim. http://www.ids-mannheim.de/kl/projekte/korpora.html
Deutsch Diachron Digital: Referenzkorpus Altdeutsch. Compiled by Project “Referenzkorpus Altdeutsch” at Universities Jena, Frankfurt/M. & Humboldt University Berlin. http://www.deutschdiachrondigital.de/
Deutsch heute. Compiled by Institut für Deutsche Sprache Mannheim. http://www.ids-mannheim.de/prag/AusVar/Deutsch_heute/
Deutsche Umgangssprache: Pfeffer-Korpus. Provided by Institut für Deutsche Sprache Mannheim. http://agd.ids-mannheim.de/PF–_extern.shtml
Deutsches Textarchiv. Compiled by Berlin-Brandenburgische Akademie der Wissenschaften. http://www.deutschestextarchiv.de/
Dortmunder Chat-Korpus. Compiled by Technische Universität Dortmund. http://www.chatkorpus.tu-dortmund.de/
DWDS-Kernkorpus. Compiled by Berlin-Brandenburgische Akademie der Wissenschaften. http://www.dwds.de/ressourcen/kernkorpus/
Kookkurrenzanalyse. Developed at Institut für Deutsche Sprache Mannheim. © Cyril Belica 1995. http://www.ids-mannheim.de/kl/projekte/methoden/ka.html
Sketch Engine. Developed by Lexical Computing Ltd. Brighton. http://www.sketchengine.co.uk/

The lexicography of Norwegian

Oddrun Grønvik

Contents
Introduction
Description
Diglossia: The Two Standard Forms of Norwegian
The Structure and Morphology of Norwegian
Norwegian Orthography
Historical Background
Mutual Comprehensibility Among Speakers of Scandinavian Languages
Language Management and Lexicography in Norway
Norwegian Lexicography 1600–1900
The Complete Word
Scholarly Dictionaries of Norwegian
General One-Volume Dictionaries of Norwegian
Etymological Dictionaries of Norwegian
Synchronic and Historical Principles in Norwegian Dictionaries
Principles of Definition Writing, Explanation, and/or Translation
Bilingual Lexicography
Terminological Dictionaries
Lexicographical Evidence: Citation Collections, Corpus Data, and Other Resources
Building Lexicographical Resources
The Relationship Between Lexis and Grammar in Dictionaries
The Current and Possible Future Development of Electronic Lexicography in Norwegian
References

O. Grønvik (*) Institutt for lingvistiske og nordiske studium, University of Oslo, Oslo, Norway e-mail: [email protected]

© Springer-Verlag GmbH Germany 2016
P. Hanks, G-M. de Schryver (eds.), International Handbook of Modern Lexis and Lexicography, DOI 10.1007/978-3-642-45369-4_44-1

Abstract

Lexicography in Norway is characterized by the need to cater for many languages representing what are, in international terms, small speech communities; a fairly short history of language standardization; limited documentation of several Norway-based languages; and increasingly sophisticated technical solutions for making language products and presenting them to users. Lexicography for the languages for which Norway acknowledges responsibility is increasingly perceived as a necessary public service, essential to documenting the different standards and therefore requiring public support if market funding is seen to be inadequate. This understanding encompasses the whole range of lexicographical tools, from the large scholarly dictionaries to essential tools for minority languages, immigrant groups, etc., and is expressed through parliament as official language policy. This chapter gives a survey of the shaping of the lexicon of Norwegian, the history and status of lexicography in Norway, the status of some important dictionaries, and the status of different types of lexicographical products that serve particular linguistic needs. Finally, the transition to e-lexicography is discussed.

Introduction

Documenting a language through dictionaries, monolingual as well as bilingual, is a crucial act in establishing the identity of a linguistic community – to its own population and in relation to other language communities. In Norway, this aspect of lexicography is very much to the fore, for both historical and present-day reasons. The historical perspective is presented below. The present-day position of lexicography in Norway reflects the position of Norwegian in Norway, both in relation to the world around Norway and in relation to the linguistic minorities for which the state of Norway acknowledges responsibility.

Norway has had a language policy since its constitution as a separate modern state. Current policy is based on a series of political debates and parliamentary decisions and documents, in which the treatment of lexicography has grown steadily more prominent during the past 50 years (Vogt et al. 1966, p. 10 c.; Kristoffersen et al. 2005; Kulturdepartementet 2008, pp. 156–194).

A determining factor in lexicographic coverage is population size, combined with societal wealth. Norway 50 years ago had a population of around 3.5 million, which has risen to around 5 million. This is not enough to sustain a large and differentiated commercial market for all sorts of lexicographical products. No Norwegian publisher can sustain the funding of the large, scholarly monolingual dictionaries which have become hallmarks of developed literate societies and act as reference points and language banks for many other lexicographical endeavors. Nor is there a sufficient market for bilingual lexicography between small or even medium-sized language communities.

This fact has led to a strong public involvement in lexicography. Scholarly lexicography was publicly funded from the early twentieth century, and lexicography was institutionalized as an academic discipline at the University of Oslo from 1972 onwards (Gundersen 1967, p. 120; Norges Allmennvitenskapelige forskningsråd 1973, p. 33). Bilingual lexicography for small language communities has received support and funding from the language policy agencies and from the education sector (cf. the section “Bilingual Lexicography for Small Language Communities” below).

Since the nineteenth century, Norway has had a considerable state involvement, first in authorizing school books in general, including commissioning and assessing the quality of language tools, and, since the 1970s, in providing language tools, especially dictionaries. Since the 1950s this involvement has been managed from or in cooperation with permanent public bodies, the current institution being the Language Council of Norway. (For more information, see the Language Council of Norway website, http://www.sprakradet.no/Toppmeny/Om-oss/English-and-other-languages/English/.) This involvement covers all school spellers, glossaries, and dictionaries, which require approval from the Language Council in order to be used in schools. The production of the first two modern monolingual definition dictionaries for general use, Bokmålsordboka [BOB] and Nynorskordboka [NOB], was a collaboration between the University of Oslo and the predecessor to the Language Council; see the section “General One-Volume Dictionaries of Norwegian” below.

The Language Council promotes the production of monolingual dictionaries for the minority languages of Norway and of bilingual dictionaries, especially for language pairs where a dictionary is unlikely to be commercially viable (in particular, between the languages of the Nordic countries and between the minority languages of Norway and Norwegian). Most bilingual dictionaries for the world’s major languages which include Norwegian have the non-Norwegian language as the target language, but there is also a series of dictionaries for immigrant learners of Norwegian; cf. the section “Bilingual Lexicography for Small Language Communities” below.

Commercial lexicography as such has never flourished in Norway but has done well in the fields of school spellers and bilingual dictionaries aimed at Norwegian learners of major foreign languages. The experimentation with formats, annotation systems, defining vocabularies, etc., that can be found in English-language lexicography is therefore not reflected in the more limited flora of Norwegian-produced dictionaries. Today, Norwegian publishers face the same competitive struggle with electronic products that is happening all over the world. There have nevertheless been some spectacular successes in dictionaries published on paper in the last few years, demonstrating that the market is not dead but choosy and unpredictable.

In the early 1970s, the private foundations that had previously been established to provide major monolingual scholarly dictionaries for the official standard languages of modern Norwegian, and for Old Norse, were gathered under the umbrella Norsk leksikografisk institutt (Norwegian Institute of Lexicography) and placed as a department at the University of Oslo. Norsk leksikografisk institutt was reorganized within the University of Oslo in 1991. From 2016, the language collections and the dictionaries based on them will be organized as a unit under the University of Bergen.

Description

Diglossia: The Two Standard Forms of Norwegian

For historical reasons, the Norwegian vernacular is expressed in two written standards, Bokmål and Nynorsk. The language names were settled by Stortinget (Parliament) in 1929 (Haugen 1966, p. 90). Several designations were used before then, riksmål and vårt almindelige bokspråk being allied to Bokmål, while landsmål and folkemål most often refer to Nynorsk. In this chapter, Bokmål and Nynorsk are used irrespective of historical period. Bokmål is based historically on standard Danish, revised orthographically to match essential features of spoken Norwegian. Nynorsk is based on a synthesis of all Norwegian dialects. A fuller explanation is given below, in the section “Historical Background.”

The Structure and Morphology of Norwegian

Typologically, Norwegian is an SVO language with the same language structure as the other Germanic languages. Among the Germanic languages, Norwegian has an inflection system richer than that of English but less rich than that of German. Nynorsk has a three-gender system, embracing nouns, adjectives, pronouns, and the adjectival forms of verbs. Bokmål can be written with two genders; this is the most striking morphological difference between the two (Braunmüller 1998, p. 131). Structurally and morphologically, Bokmål and Nynorsk have much in common, but Bokmål morphology is less systematic and tends more towards lexicalization. There are no case forms except the genitive in the written standards, but the dative survives in lexicalized phrases and in some dialects. Verbs have no special conjunctive forms in the standards, but the conjunctive is found in some dialects, as is the plural form of verbs. These are features more likely to be seen in Nynorsk than in Bokmål texts. As in Swedish and Danish, nouns have a postpositioned definite article, and verbs have a regular passive infinitive form marked by a postpositioned ending. Adjective inflections include gender, number, and definiteness. The illustrations below show a couple of examples of inflection schemata (Figs. 1 and 2).
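The noun schema described in the caption of Fig. 1 can be sketched in a few lines of Python. This is only a minimal illustration, not the data model of BOB: the class name `NounParadigm`, the helper `regular_masculine`, the regular Bokmål masculine endings -en, -er, -ene, and the example noun hund “dog” are assumptions introduced here. Only the rule that the genitive adds “s” to each form is taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NounParadigm:
    """Four-cell noun schema in the order given in the Fig. 1 caption."""
    sg_indef: str
    sg_def: str
    pl_indef: str
    pl_def: str

    def forms(self) -> list[str]:
        # singular indefinite, singular definite, plural indefinite, plural definite
        return [self.sg_indef, self.sg_def, self.pl_indef, self.pl_def]

    def genitive_forms(self) -> list[str]:
        # Genitive: add "s" to each form (the rule stated in the caption).
        return [form + "s" for form in self.forms()]


def regular_masculine(stem: str) -> NounParadigm:
    """Hypothetical helper: assumes the regular endings -en, -er, -ene."""
    return NounParadigm(stem, stem + "en", stem + "er", stem + "ene")


hund = regular_masculine("hund")
print(hund.forms())           # the four cells of the schema
print(hund.genitive_forms())  # the genitive forms not shown in the figure
```

The definite forms are built by suffixation, which mirrors the postpositioned definite article mentioned above. Irregular nouns would need their forms listed explicitly rather than generated from a stem.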

Fig. 1 BOB – web edition. Inflection schema for nouns. Standard inflection of a noun, masculine gender (= “dog, hound”) in Bokmål, as shown in the web edition of BOB. The order is masculine singular indefinite and definite, masculine plural indefinite and definite. The genitive forms, formed by adding “s” to each form, are not shown here

Fig. 2 Nynorskordboka – web edition. Inflection schema for verbs. Standard inflection of the verb følgje “follow” in Nynorsk, as shown in the web edition of Nynorskordboka. The bottom table shows inflection (in gender, definiteness, and number) of the verbal past participle and the form of the present participle

Lexicon and Word Formation in Norwegian

The vocabulary of Norwegian is typologically classified into root words, derivations, and compounds.

Root derivation: Many nouns are found in both strong and weak form in speech and writing (e.g., glugg and glugge “small opening”). If both base forms have the same gender, they cannot be told apart in writing on form alone in the definite form or in the plural. Only usage can decide whether the lexicographer is dealing with materials for one or two entries. The same goes for identical base forms of nouns which appear in context with two genders; only usage in context, and additional evidence such as the sort of compounds these base forms appear in, will decide. Short nouns derived from verbs often have both masculine and neuter gender (e.g., flir (neuter and masculine) “a grin,” from the verb flire “to grin”). But there are exceptions – the two nouns bruk (neuter and masculine) “usage” and bruk (neuter) “landed estate or industrial estate” both derive from the verb bruke, but the way they behave in context guarantees them two separate entries.

Morphological derivation: Inflected forms may develop usages and senses not seen in the original headword and require separate handling in a lexicographical context. The chief source of uncertainty in considering the possible lemma status of inflected forms concerns verbal adjectives derived from present and past participles. As in German, disjunctive phrasal verbs yield one-word participles. Thus, the disjunctive verb phrase skrive ut “write out in full; dismiss (e.g., a patient from hospital)” yields the past participle utskrevet (Bokmål), utskriven (Nynorsk). Some phrasal verbs are among the most frequently used in the language and require multiple sense entries in dictionaries.
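The way a disjunctive phrasal verb yields a one-word participle can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the table `PAST_PARTICIPLES` and the function `one_word_participle` are hypothetical names, and the simple participles listed are inferred by removing the particle from the compound forms cited in this section (utskrevet, utskriven, påtatt, påteken). It is not how any of the dictionaries discussed here derives its forms.

```python
# Simple past participles per written standard (assumed for illustration,
# inferred from the compound participles cited in the text).
PAST_PARTICIPLES = {
    "skrive": {"bokmål": "skrevet", "nynorsk": "skriven"},
    "ta": {"bokmål": "tatt", "nynorsk": "teken"},
}


def one_word_participle(verb: str, particle: str, standard: str) -> str:
    """Prefix the particle of a phrasal verb to the verb's past participle."""
    return particle + PAST_PARTICIPLES[verb][standard]


# skrive ut -> utskrevet (Bokmål), utskriven (Nynorsk)
print(one_word_participle("skrive", "ut", "bokmål"))
print(one_word_participle("skrive", "ut", "nynorsk"))
# ta på -> påtatt (Bokmål), påteken (Nynorsk)
print(one_word_participle("ta", "på", "bokmål"))
print(one_word_participle("ta", "på", "nynorsk"))
```

The sketch produces forms only. Whether a participle has developed a sense of its own, and so deserves a separate entry, remains a question of usage that the lexicographer has to settle from evidence.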

Fig. 3 Nynorskordboka – the entry “ordbok” “dictionary.” The bottom two lines show nesting of compounds in which a first part modifies the headword ordbok. The entries for these compounds are reached by clicking the wanted word

Verb + particle phrases may have a parallel compound verb, for example, stå til and tilstå. The sense registers of such pairs vary from full co-occurrence to zero. Often there is some overlap, but the prototypical senses are clearly different, as in stå til “be in a certain state” and tilstå “confess.” In such cases, the participles of the phrasal verb and the compound verb are identical. The lexicographer’s challenge lies in identifying the participles that have taken on a separate meaning not found in the verb, as in ta på “put on (e.g., a coat)” and the participle adjective påteken, påtatt “assumed, pretended, insincere”: han leste høgt frå ei avis med påteken patos “he read aloud from a paper with insincere pathos.”

Derivation by affix: Every verb root can yield a derivative noun by adding one of the suffixes -ing, -ning, -nad, or -else, and almost every adjective can be turned into a noun by adding the right suffix, often -het/-heit, -skap, or -dom. There is no doubt about the base form or morphology of these derivatives, but a measure of frequency and (fixed) patterns of use will be looked for when an entry is considered. A detailed overview of common affixes can be found in Leira (1982).

Compounding: Norwegian has an unlimited capacity for forming compounds, as do Swedish, Danish, Dutch, and German. In a corpus context, this means that no corpus size has yet been observed beyond which added material ceases to yield a proportional increase in new word forms or lemmas. This again means that to qualify for entry in a dictionary, a compound has to meet the criteria outlined above. A new compound should preferably have at least one meaning unpredictable from the meaning of the parts. Most of the larger dictionaries have systems for nesting typical and well-known compounds without giving them full entries, by treating them instead as supplementary material at the base form entry (Fig. 3).
There is a tradition in Norwegian for preferring compounds containing the elements in a bare form (sommardag “summer day,” rather than sommarsdag with infix -s-). However, the first part of a compound often has an infix at the composition point, historically often a genitive case ending of the first element. Much-used Norwegian compounding elements are -e- and -s-, giving word forms like sildefiske “herring fishery” and reinsdyr (rein + -s- + dyr) “reindeer.” In Norwegian this infix is termed fuge, “grout,” a metaphor taken from stone- and woodwork. There are several other infix forms, their use varying across dialects, to some extent on a phonological basis. This means that a compound consisting of the same elements may be documented in writing with more than one form, caused by infix variation. As a rule, this is pure form variation, but full lexicalization may occur, requiring two entries. For instance, compounds starting with land- normally concern the land, the ground itself, while compounds starting with lands- normally concern the country or state. The principles active in standardizing compounds into lexicalized words are not satisfactorily explored for Norwegian. What is certain is that lexicalization of a given form can override principles: the base form “dølahest” “valley horse” takes the (archaic and dialect) infix -a- because this is felt by all to be the only acceptable form. No one uses the form “dølehest,” even though it would follow the standard pattern. One consequence of the inherent systems for word formation in Norwegian is that the lexicographer must know what it means to deal with standardization of language, even if carried out according to the strictest empirical principles.
The practice of establishing base forms from vernacular or nonstandardized evidence and selecting compounds for dictionary entries is in itself an act of standardization, as users look to dictionaries for recommendations on what words to use and how to use them. This issue has been much discussed both in relation to Bokmål and to Nynorsk and has been the subject of Language Council recommendations (Venås 1997, p. 372).

Norwegian Orthography

Norwegian is written in the ordinary Latin alphabet of 26 letters, from a to z. In addition, Norwegian uses three extra letters, the vowels æ, ø, and å, in this order.
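
Because æ, ø, and å follow z in this order, a headword list cannot simply be sorted by raw character code, which would place å before æ. The following is a minimal sketch of alphabetical ordering under that rule. The names NORWEGIAN_ALPHABET and norwegian_sort_key are illustrative only and are not taken from any of the dictionary projects discussed here; the older digraph aa receives no special treatment, and the placement of characters outside the alphabet is an assumption of the sketch.

```python
import unicodedata

# The 29 letters of the Norwegian alphabet in dictionary order.
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
_RANK = {letter: index for index, letter in enumerate(NORWEGIAN_ALPHABET)}


def norwegian_sort_key(word):
    """Return a sort key that ranks letters by their alphabet position."""
    # NFC ensures that "å" is one code point rather than "a" + combining ring.
    normalized = unicodedata.normalize("NFC", word).lower()
    # Assumption for this sketch: characters outside the alphabet
    # (hyphens, digits, accented letters) sort after "å".
    return [_RANK.get(ch, len(NORWEGIAN_ALPHABET)) for ch in normalized]


headwords = ["år", "ost", "ære", "zoo", "øy", "bok"]
by_code_point = sorted(headwords)                        # å lands before æ
by_alphabet = sorted(headwords, key=norwegian_sort_key)  # æ, ø, å after z
print(by_code_point)
print(by_alphabet)
```

With the key function, the list comes out as bok, ost, zoo, ære, øy, år, whereas plain code-point sorting puts år before ære.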

The phonetic values of the vowels æ, ø, and å are:
/æ/ (low, front), close to English “” (most often long in Norwegian)
/ø/ (front, rounded, middle height), close to English “hut” (short or long)
/å/ (low, back, rounded), as in English “caught” (most often long in Norwegian)

Two of these letters came to Norwegian with the Latin alphabet, from Britain, probably with the first missionaries in the eleventh century. The letters æ and ø were devised to express the umlaut values of a and o, respectively. The letter å was introduced in 1917 to replace aa, a digraph used to represent the lowered value of the “long a” in Old Norse. The letter å for the old “long a” was used in Sweden from the sixteenth century onwards and was introduced in Danish in 1948. Although the letters c, q, w, x, and z are included in the Norwegian alphabet, they are regarded as unnecessary letters and are systematically replaced with other consonants and consonant combinations when possible, as part of a general policy for adapting imported word forms to the principles of Norwegian orthography (Språkrådet 2015-2; Sandøy 2000, p. 211 f.). The rationale behind this practice was put forward in relation to Danish in the early nineteenth century (Rask 1826, pp. 277–292) and has been gradually implemented for Norwegian since the late nineteenth century.

Historical Background

The historical background is as follows: Norway was an independent state in the Middle Ages until the country, for dynastic reasons, was allied to Denmark and Sweden at the end of the fourteenth century. Sweden regained its independence in the fifteenth century, while Norway was allied to Denmark, nominally as a “twin realm,” in practice as a colony. This condition lasted until 1814, when the map of Europe was reorganized following the end of the Napoleonic wars. Norway was taken from Denmark and handed over to Sweden as a compensation for Sweden’s loss of Finland to Russia. The spring of 1814 provided an interregnum used by the Norwegians to declare themselves independent and agree on a constitution, signed at Eidsvoll on 17 May 1814. (17th May is Norway’s National day.) The languages of medieval Norway and Iceland, Sweden, and Denmark were no doubt mutually comprehensible, though different enough to branch into different written practices and also different enough to clearly identify the language of the writer, until ca. 1450 onwards. In the Early and High Middle Ages, the standard language of Norway was what is now considered Old Norse. Once the administration of Norway was transferred to Denmark, Old Norse gradually fell into disuse and was in practice replaced by Danish by the end of the sixteenth century. In the same period, the spoken language underwent considerable changes, which meant that a revival of Old Norse as a written standard after 1814 was unthinkable – the vernacular and the old written standard were too far apart. When the 1814 Constitution states that legislation should take place in the Norwegian language, this is intended as a shield against a Swedish takeover. This left Norway with a legitimacy problem as far as language was concerned. In the nineteenth century, the literate classes wrote Danish and spoke Danish with some adaptation to Norwegian phonology. 
For example, educated people tended to use words like dreng “boy” and får/faar “sheep” (Danish dræng and faar), in contrast to local Norwegian dialect words such as gutt/gut “boy” and sau “sheep.” Norwegian dialects also had words for all the phenomena that belonged to life in Norway but were unknown in Denmark. Most Norwegians spoke their local dialect, though elementary education was in Danish, and contact with the authorities had to be carried out in Danish. For Norwegian dialect speakers, Norwegian and Danish formed a diglossia. The lack of a specifically Norwegian written standard was one argument that enabled Danes to cast doubt on the entire Norwegian claim of historic, continuous separateness, expressed in political independence. The Norwegian response to this challenge was twofold, but was not consolidated into two opposing camps until the end of the nineteenth century.

Fig. 4 The entry for the verb empløyere (“employ”) in Knud Knudsen’s dictionary Unorsk og norsk. Entry contents: Typical nouns exemplifying typical objects to the verb are listed first, then Norwegian verbs and verb phrases synonymous to the (originally French) imported word

Broadly, one response, which was eventually to be formalized as Bokmål, favored adapting standard Danish to Norwegian phonology and including typically Norwegian words in the lexicon. A gradual transition from a Danish to a Norwegian written standard was envisaged. There was and is some disagreement about how far this transition should go, in terms of accepting parts of the vernacular in the envisaged standard. A central figure in the early period was Knud Knudsen (1812–1895), whose suggestions for taking the lexicon over onto a Norwegian footing are encompassed in the bilingual synonym dictionary Unorsk og norsk (Danish headwords, Norwegian counterparts) (Fig. 4).

The other response, which later developed into Nynorsk, was to start by surveying the linguistic landscape and finding out more precisely what the Norwegian vernacular language was like. Because of the diglossic situation, the difficulty was to find a trained linguist who was close enough to ordinary people to gather trustworthy information. The problem was solved with the selection of the self-taught linguist and lexicographer Ivar Aasen (1813–1896) for the task. Aasen was funded from 1840 onwards and throughout his lifetime, first by the Royal Norwegian Society of Sciences and Letters, founded 1760 in Trondheim, then by Stortinget. Within his lifetime, Aasen documented the grammatical structure and the lexicon of Norwegian in a series of works culminating in Norsk Grammatik (1864) and Norsk Ordbog med Dansk Forklaring (1873). The orthography expressed in the headwords of Aasen’s 1873 dictionary was also his proposed standard for a common, wholly Norwegian written standard (Fig. 5). In 1884, the Norwegian parliament voted to give both standard languages official standing as languages of instruction and leave it to each school board to choose which one to adopt.
An earlier parliamentary decision, in 1874, had tasked teachers with adapting their oral instruction in class to the dialect of their pupils, instead of the other way round. The standard orthography established by Ivar Aasen was also adopted by Professor Johan Storm for research purposes, as headwords from Aasen (1873) were used as reference forms for all collections of dialect materials from 1884 onwards. Johan Storm was a close friend of the Danish linguist Otto Jespersen and of the English linguist Henry Sweet, and a central figure in early efforts to create alphabets for rendering speech. The dialect alphabet Norvegia is Johan Storm’s creation. The tradition of making Nynorsk the reference standard for dialect lexicography is still valid and employed in Norsk Ordbok (1966–2015), in dialect collections for research purposes, and in much dialect lexicography in general.

Fig. 5 The entry for the verb emna “prepare” in Ivar Aasen’s dictionary Norsk Ordbog med Dansk Forklaring. The headword with grammatical information in Norwegian is followed by a definition in Danish, a usage example, and a description of a typical phrase, also with a usage example. The last three lines deal with dialect variants, etymology (Old Norse), and cognates (Anglo-Saxon)

Mutual Comprehensibility Among Speakers of Scandinavian Languages

In ordinary conversational situations, speakers of Norwegian, Danish, and Swedish have little or no difficulty understanding each other, and to a lesser extent the same is true of Icelandic. There are obviously vocabulary differences, in some cases “false friends,” to look out for; examples are anledning, which in Bokmål is “opportunity” and in Swedish “reason”; and rar, in Norwegian “odd, peculiar,” and in Danish “sweet, agreeable.” But work immigrants with Danish or Swedish as their first language do not normally have to document their knowledge of Norwegian in professional contexts. The Nordic languages Danish, Faroese, Icelandic, Norwegian, and Swedish have a common history and still resemble each other greatly, although Faroese and Icelandic are not immediately understandable to Norwegians, Swedes, or Danes. However, maintaining the Nordic area as a common linguistic forum, where all speakers of the languages listed above (with a little adaptation and good will) can speak their own language and count on being understood, is seen as very important for both historical and practical reasons. This overall approach includes Finland, as Swedish is an official language of Finland and an obligatory school subject. The goals of a common advisory Nordic language policy are set out in a policy statement with an official version in each Nordic language, eight in all, plus English (Nordisk Ministerråd 2007, pp. 92–93). These goals include promoting mutual comprehension, preventing unnecessary or unfortunate lexical and orthographic diversity, and preventing loss of language domain to the world languages.

These goals also require that all Nordic residents exhibit tolerance for variety and diversity in language, both between and within languages. In order to reach these goals, the ministers of culture and education will work with the following four issues: language comprehension and language skills, the parallel use of language, multilingualism, and the Nordic countries as a linguistic pioneering region. (Nordisk Ministerråd 2007, p. 93)

The reality on the ground is that mutual comprehension of the related spoken languages is perfectly possible, but has a great deal to do with individual attitude and adaptation. The following comments are based on the Norwegian experience. Norwegians asking questions in their own language in Denmark or Sweden may get their response in English. But if one persists, there is often a change of language, and further conversation will take place in the mother tongue of each speaker. The fact is that the peoples of the Nordic countries think of the area as a common ground for work or study, and colleagues from other Nordic countries are generally welcomed. Understanding the written standards of the Nordic languages Swedish and Danish requires very little extra effort from a Norwegian. The orthographies differ somewhat, so one has to adjust for that, and there are a number of lexical traps (identical words with different meanings) and terminological differences. But these differences do not stop interaction or cause people to change their linguistic identity in talking or writing to each other. Even the influx of immigrants from outside the Nordic countries with one of the Nordic languages as their second language has not so far affected this state of affairs to an important degree. There is a considerable sociolinguistic literature on the tolerance levels for linguistic variation in the Nordic countries. In general the Norwegians come out of comparisons and tests as appreciative of the advantages and relaxed about the complications resulting from minor differences in speech and writing inside their own country and in relation to their neighboring countries.

Language Management and Lexicography in Norway

There has been contact and voluntary cooperation on language issues between interest groups in the Nordic countries since the mid-nineteenth century. Since World War II this cooperation has been taken care of by the language agencies of each of the Nordic countries. For a detailed description and discussion, see Vikør (2001). In Norway, public language management sprang from the tradition that teaching materials for schools required an imprimatur from the Ministry of Education. In the course of the nineteenth century, the need for official acceptance was extended to language form as well as content, and eventually, as language became a political issue (Vikør 2001, p. 53), to Ministry initiatives for orthographic management and reform. The first half of the twentieth century saw three major spelling reforms (1906–7, 1917, and 1938), all proposed by publicly appointed commissions and all intended to bring about a rapprochement between the written standards Bokmål and Nynorsk. The names of the two written standards for Norwegian used in this chapter were settled by Parliament in 1929 (Stortings Forhandlinger 1928–1929, Odelstinget 1.2.1929, p. 81 f.).

Shortly after World War II, a permanent language commission, Norsk språknemnd (1952–1970), was appointed. It was followed by a language council with a permanent secretariat, Norsk språkråd (1971–2006), which has now been transformed into a government advisory body on language issues, Språkrådet (2007–), the Language Council. These bodies have since 1952 been tasked with implementing language decisions, covering normative decisions on standard orthography, morphology, and names, and advice on usage, syntax, etc., in the educational system and the public sector in general, e.g., by controlling school spellers and dictionaries, developing dictionaries, and supporting the development of necessary language tools. Today, essential concerns within lexicography for the Language Council are (1) monolingual lexicography for the Norwegian standard languages, (2) bilingual lexicography between Norwegian and the other Nordic languages, and (3) supporting the development of lexicography for the minority languages for which Norway acknowledges responsibility.

Norwegian Lexicography 1600–1900

As mentioned above, in the period 1850–1900 two different attempts were made to establish a Norwegian standard language through lexicography. But what can be said of Norwegian lexicography in the preceding centuries, when all the other languages of Western Europe were having their written standards established? From 1600 to 1850, lexicographical works attempting to describe Norwegian deal with the vernacular expressed in different Norwegian dialects. The first printed dictionary of post-medieval Norwegian was the work of Christen Jenssøn, vicar of Askvoll in western Norway (1st ed. 1646). It has roughly 1,600 entries, with definitions in Danish mixed with Latin. The lemma list is drawn from a west Norwegian dialect, probably representing the author’s mother tongue. The author’s purpose seems to have been to remind his countrymen (to whom he dedicated his work) that the Norwegian language still existed, if only as a vernacular (Jenssøn 1915, p. IX). The dictionary is well organized by the standard of its time; the category system includes – in this order – headword, definition or explanation, and sometimes usage examples with translations into Danish (Fig. 6). Between 1600 and 1850, a number of glossaries drawn from Norwegian dialects were drafted, sometimes as a response to an official initiative (Røgeberg 2003, p. I, 9 ff.), sometimes initiated by the author. Many of the authors were (mostly Norwegian born) clergymen and civil servants. The intention behind their collections is not always expressed, but seems to have been to inform future colleagues of what they need to know to cope with their districts. It should be kept in mind that the seventeenth and eighteenth centuries were the peak period for topographical literature, of which language is one aspect.
While none of these works gives comprehensive coverage of the Norwegian vernacular or aims to lay the foundation of a Norwegian standard language, some of them do document a contemporary feeling that Norwegian and Danish are different, though similar, languages and point forward to language becoming an issue in the Norwegian struggle for national independence. An expression of this tendency is found in a Norwegian–Danish dictionary by Laurits Hallager (1802), where a number of dialect glossaries are brought together in a single dictionary. Hallager’s introduction deals briefly with the phoneme system, morphology, and syntax of Norwegian as opposed to Swedish and Danish and expresses the view that only cultivation in writing has prevented Norwegian from becoming a language of equal standing with the standard languages of the neighboring countries (Hallager 1802, pp. III–VI).

Fig. 6 The entry for the headword Efne “matter” in Christen Jenssøn’s dictionary of 1646. Entry text, containing base definition and secondary senses: “Matter is the word for a prepared material ready to be made into something/whether to cut out/carve/plane, hew/whatever it may be. In the same way Materia prima may be called an Efne and (also) the lump of earth that man wonderfully is made of. Gen. 3 v. 9.” (Jenssøn 1915, p. 29)

The Complete Word

A word requires supporting data on a number of counts to be considered worthy of a dictionary entry. When dealing with vernaculars, identifying the complete word, i.e., the process of separating homographs, subordinating variants, and assigning documentation to the proper (standardized) headword, can be challenging. For the editors of academic or pioneering dictionaries, this challenge has to be met during most working days. Ivar Aasen established the principle that any word used in a Norwegian dialect can be expressed in the written standard, provided that there is enough information to identify the lemma within the orthographic system. By “information” Aasen meant documented variants, preferably with cognates in related languages and ideally with a former stage documented in Old Norse. A dictionary entry also requires sufficient evidence for meaning and usage (Aasen 1864, p. 346 ff.). A dictionary entry resembles an equation where the headword corresponds to each sense listed under it, given the conditions which are specified in the definition and demonstrated in the usage examples. Ideally, a dictionary entry also represents acceptance within a language community that the information in each entry is true and fair.

Therefore, a good deal of information needs to be in place before a word form can be regarded as deserving an entry. The orthographic form should be settled, likewise a morphological description (part of speech, gender, paradigm) allowing a decision on headword form. One or more generally accepted senses, shown in usage examples from reputable sources, must be in place, likewise the etymology. Frequency of use, as well as the quantity and quality of usage sources, has to be considered. All of the filters mentioned above come into play to this day for Norwegian when dubious lemma candidates are considered. Typical debatable cases are (1) word forms from inflection paradigms that have taken on an independent life (farga/farget “(of person) colored”), (2) root derivations, e.g., the strong and weak forms of a noun (skodd and skodde “fog,” gall/galle “swelling, growth” vs. galle “bile”), (3) regular derived forms, e.g., verbal nouns (tangering “tangency”), and (4) compound forms, especially nouns (taktekking “roof covering”). The examples above are cases where it would be proper to have entries, for reasons of semantics or frequency or both. Other doubtful cases are word forms derived from names or trademarks or those representing a generic use of such items. Norwegian orthography today has a well-established tradition for dealing with linguistic imports, and spontaneously “norwegianized” orthography and inflections are important marks of adaptation (Språkrådet 2015-2, p. 16).
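
The requirements listed in this section can be read as a checklist applied to each lemma candidate. The sketch below is a hypothetical data model, not the editorial system of any dictionary named in this chapter: the class LemmaCandidate, its field names, and the function missing_requirements are all invented for illustration. Judgments about frequency and about the quality of sources are deliberately left out, since they cannot be reduced to a presence check.

```python
from dataclasses import dataclass, field


@dataclass
class LemmaCandidate:
    """Evidence gathered for one possible headword (hypothetical model)."""
    headword: str = ""        # settled orthographic form
    part_of_speech: str = ""
    paradigm: str = ""        # inflection pattern, including gender for nouns
    senses: list = field(default_factory=list)          # accepted definitions
    usage_examples: list = field(default_factory=list)  # quotations from sources
    etymology: str = ""


def missing_requirements(candidate):
    """List the kinds of information still lacking for a full entry."""
    checks = {
        "orthographic form": bool(candidate.headword),
        "part of speech": bool(candidate.part_of_speech),
        "paradigm": bool(candidate.paradigm),
        "at least one sense": bool(candidate.senses),
        "usage examples": bool(candidate.usage_examples),
        "etymology": bool(candidate.etymology),
    }
    return [name for name, present in checks.items() if not present]


# A candidate with a settled form and one sense, but no further evidence yet.
candidate = LemmaCandidate(
    headword="taktekking",
    part_of_speech="noun",
    senses=["roof covering"],
)
gaps = missing_requirements(candidate)
print(gaps)
```

For this partially documented candidate, the function reports that the paradigm, usage examples, and etymology are still missing.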

Scholarly Dictionaries of Norwegian

The description of the two Norwegian standards, Bokmål (Riksmål) and Nynorsk (Landsmål), through scholarly language collections started in the early twentieth century, inspired by older initiatives of the same kind in England, Germany, Sweden, and Denmark. Both Norwegian initiatives were private, but initiated by academic staff at the University of Oslo and supported by public funding. Founding dates are 1911–21 for Det Norske literære ordboksverk (Bokmål) and 1929–1930 for Norsk Ordbok (Nynorsk). Supplementary initiatives were scholarly collections documenting place names and speech variation (dialects). An initiative to document the Old and Middle Norse of Norway, avoiding treating the languages of Iceland and Norway as one, started in 1940. Founding dates for the other language collections are 1921 for Norwegian names, especially place names, 1936 for the Norwegian Dialect Archives (no staff since 2005), and 1940 for Gammelnorsk Ordboksverk (Old Norse).

Norsk Riksmålsordbok and Det Norske Akademis Ordbok

Norsk Riksmålsordbok [NRO] was published in four volumes in the period 1937–1957, with two supplementary volumes appearing in 1995. The building of the slip archive Norsk litterært ordboksverk started in the early 1920s. The dictionary is owned by Det norske Akademi for språk og litteratur, a private, self-recruiting body dedicated to promoting the conservative variety of Bokmål (see https://snl.no/Det_Norske_Akademi_for_Spr%C3%A5k_og_Litteratur) which is opposed to state language management, although individual members have participated in Language Council work since 1972.

NRO has since 1998 received public funding in order to provide an integrated edition of the first four and the two additional volumes. This product was originally to be published as a web dictionary in 2005 (Guttu 2005, p. VII). In 2008 the plan was changed and the project was extended to provide an expanded web dictionary for Bokmål and Riksmål, called Det Norske Akademis Ordbok (NAOB). At the same time, Bokmål orthography was adopted for headwords and editorial language. The Bokmål variant preferred by NAOB is as close to the traditional Danish-based Riksmål as possible within the official standard orthography of Bokmål. NRO in its first edition (1937–1957) is an impressive work produced by the foremost Norwegian academics of the Riksmål persuasion and was completed in record time. The evidence used is Norwegian Riksmål and Bokmål literature from 1814 until the time of publication – in practice Danish with a Norwegianized vocabulary in literature published before 1900 – plus the editors’ knowledge of Norwegian speech in the educated section of society, i.e., “the norms of Norwegian urban upper-class speech” (Vikør 2002, p. 6). NRO has a very deep entry structure. The remodeled entry format being developed within the NAOB project will almost certainly be flatter in structure and more modern in shape and appearance. A guide to the new web format is presented at http://www.naob.no/. NAOB will not be published on paper. The source base will be broader, though little is known about that outside the project. NAOB is unlikely to have direct links to evidence, source registers, bibliographies, and maps, as time runs out in 2017, the stipulated final year of funding.

Norsk Ordbok. Ordbok for folkemålet og det nynorske skriftmålet

Material collecting for Norsk Ordbok. Ordbok for folkemålet og det nynorske skriftmålet (NO) started in 1930 with an appeal to the public to collect dialect words and expressions or volunteer to write slips from excerpted Nynorsk literature. A rough manuscript was produced by 1946, but spelling reforms and large amounts of additional evidence caused the editors to go back and start again. Published in fascicles, the first full volume of what was then planned as a four- to five-volume dictionary was out by 1966. As the material collections grew, progression through the alphabet slowed down (Bø 1989; Venås 1989), and by the end of the twentieth century three volumes were out and a fourth volume completed in manuscript, the total being planned to reach 12 volumes. In 2002 the dictionary was reorganized on a digital platform and the finishing date set to 2014. Owing to unforeseen complications in the last two years, volume 12 was finally completed at the end of 2015 and is due to be launched in early 2016. Among the scholarly dictionaries of Europe, NO is unusual in that it integrates sourced evidence of both speech and writing. The language collections on which NO is based include slip archives, seventeenth to nineteenth century dialect glossaries, dictionaries of every description as long as they relate to Nynorsk or Norwegian dictionaries, and text corpora. Much is digitized, but a great deal is still paper bound. The language collections are still growing (Fig. 7).

Fig. 7 The beginning of the entry for the headword stund “period of time, while” in the web edition of Norsk Ordbok. The illustration shows definition 1a with usage examples and compounds (bottom line). Pale blue word forms are hyperlinked to other entries or to bibliographical detail on sources. The icons in the top right-hand corner indicate from left to right information about (1) speech variants, (2) earlier standard forms, (3) written sources before 1850, and (4) etymology

In the process of completing the dictionary on a digital platform, the focus of the work has shifted from producing and completing a paper dictionary of a higher quality than what had been possible earlier to producing a web dictionary linking all relevant information about a word through a dictionary entry. The editorial platform prevents creation of entries lacking evidence from the language collections. The evidence behind each entry can be accessed directly from the entry. The dictionary materials are also directly accessible from the same web site. The location of headword forms, definitions, and usage examples can be shown in an electronic map of Norway which is part of the website. The scholarly dictionary for the Norwegian dialects and the Nynorsk written language has been transformed into a dynamic system of databases, of which the paper dictionary is one expression, the web edition another. The integration of place as a source made it necessary to codify a place hierarchy from local administrative area upwards to the whole country. Since January 2014 the web edition of NO has been equipped with a dynamic digital map, which will show any area listed as source for a given word form, definition, or citation. When volume 12 is out, Norsk Ordbok will consist of a 12-volume paper edition and a web edition covering the alphabet stretch i-å. A plan for revising the alphabet stretch a-h and integrating it into the web edition exists, but funding has yet to be found (Wetås and Berg-Olsen 2014) (Fig. 8).
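
The idea of a place hierarchy running from local administrative area up to the whole country can be illustrated with a small sketch. It does not reproduce the database design of Norsk Ordbok, which is not described here; the mapping PARENT, the placeholder area names, and both functions are assumptions made purely for illustration.

```python
# Hypothetical place hierarchy: each area points to the area that contains it.
PARENT = {
    "municipality A": "county X",
    "municipality B": "county X",
    "municipality C": "county Y",
    "county X": "Norway",
    "county Y": "Norway",
    "Norway": None,
}


def chain_upwards(place):
    """Return the place followed by every larger area that contains it."""
    chain = []
    while place is not None:
        chain.append(place)
        place = PARENT[place]
    return chain


def sources_within(area, source_places):
    """Select the source places that lie inside (or equal) the given area."""
    return [p for p in source_places if area in chain_upwards(p)]


# Places attached to the citations for one word form (placeholder data).
citations = ["municipality A", "municipality C", "county X"]
path = chain_upwards("municipality A")
in_county_x = sources_within("county X", citations)
print(path)
print(in_county_x)
```

Storing only the parent of each area keeps the hierarchy easy to extend, and a map view for a given area can then be filled by walking upwards from each recorded source place.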

General One-Volume Dictionaries of Norwegian

Nynorskordboka [NOB] and Bokmålsordboka [BOB]

In 1964, the Vogt Committee recommended the creation of an institute to provide evidence-based scholarly dictionaries for both Bokmål and Nynorsk, and the need for one-volume monolingual dictionaries, emphasizing good definitions and usage examples, was heavily stressed (Vogt et al. 1966, p. 13). At the time, only smallish school spellers were available in the official orthography. Available definition dictionaries were too large and did not use current orthography. These recommendations were endorsed by Stortinget (Stortinget 1970, p. 2732). Norsk Leksikografisk Institutt was in 1973 tasked with editing two parallel volumes, one for Bokmål and one for Nynorsk, with support from the Norwegian Language Council (Språkrådet Annual Report 1973, p. 9; Hovdenak 1997). Work started in 1974 and the first edition reached print in 1986. These were the first two general monolingual defining dictionaries for the two standard varieties of Norwegian.

Fig. 8 Entry for “stund” with survey of recorded dialect forms. The map shows the documentation for the distribution of the form /stønn/

The production plan for two dictionaries was over-optimistic and under-resourced; the initial assumption was that the staff of the existing dictionary projects would do the work in addition to other tasks, with a couple of extra editors to help. In the end, the two dictionaries took 40 man-years of work, spread over 12 years, to produce. BOB in its first edition had about 60,000 entries, NOB about 90,000. BOB and NOB had the slip archives of the language collections at the University of Oslo as their available evidence, plus existing dictionaries and spellers. NRO was an important source for BOB, and Norsk Ordbok for NOB (Gundersen 1990, p. 1925). The dictionaries were edited by two teams working in parallel, with editorial guidelines coordinated as far as possible. The editors of Bokmålsordboka drafted the manuscript for the letters a-l, while the staff of Nynorskordboka drafted the manuscript for letters m-å. The teams then swapped manuscripts and adapted and adjusted the whole. The production method chosen meant that both dictionaries rested on the complete language collections.
The differences between the two languages concern orthography, morphology, vocabulary, and usage, and guidelines were designed to take care of the profile of each standard.

NOB dealt with the integration of dialect vocabulary in standard Nynorsk. The editorial rule was that a headword with sufficient evidence behind it for editing, and with documentation from at least three Norwegian counties, got an entry (Grønvik 2007, p. 18). This rule explains the difference in size between the two dictionaries. The Language Council requirement of full and neutral presentation of all orthographic and morphological variation also necessitates a lot of space-consuming cross-referencing.

BOB and NOB have run through numerous editions and printings since 1986, while a thorough revision of contents remains on the agenda. The dictionaries have been available on the web since 1994. The web editions were technically upgraded in 2009 and 2012–13 and are at present the chief source of information on Norwegian for Norwegians (Grønvik and Ore 2014). The web edition of the dictionaries is authorized for use in exams and is much used, with roughly 80 million searches in 2014 (Ims 2015, p. 13), increasing to ca. 90 million in 2015.

Two other general monolingual dictionaries should be mentioned. Both are primarily paper publications, though they are also available on the web as subscription services (Ifinger; Ordnett).

Norsk Ordbok (Landfald 2006) has an entry format close to the bilingual dictionaries for school learners, with headwords in full plus inflection patterns. Entries contain multiword expressions (MWEs), derivations, compounds, and usage examples, but these categories are not visually differentiated. Compounds thought to be self-explanatory (to a native speaker) are not defined. It is in many ways typical of the older type of monolingual school dictionary which, as it were, reminds the user of what (s)he already knows. This dictionary is popular with school pupils because of its very simple format. It is currently under revision.
Norsk Ordbok med 1000 illustrasjoner (Guttu 2005) is, with 80,000 entries, the largest dictionary for Bokmål, but it covers only the Bokmål orthography that is the closest to Danish. This leads to oddities like separate entries for two orthographic forms of the same word, with usages and senses distributed on the two entries according to frequency and dominance. There are for instance separate entries for bein and ben (“bone”), which are variants in Bokmål orthography. The form ben is the canonical Riksmål form, but a number of MWEs and compounds only occur with bein.

In theory, all of these dictionaries adhere to the traditional definition format of the specified hyperonym (genus proximum plus differentia specifica). In practice, there is a considerable use of synonyms, often arranged as a sort of sliding scale from one sense to another, no doubt a solution forced on the editors in order to save space. The most consistent use of the traditional definition format is found in Norsk Ordbok med 1000 illustrasjoner. In the web editions of BOB and NOB, all synonyms listed between commas or on their own are hyperlinked to the correct entry, a precaution against using synonyms not covered in the dictionary.

Etymological Dictionaries of Norwegian

The canonical, scholarly works documenting Norwegian etymology are Etymologisk Ordbog over det norske og danske sprog I-II by Falk and Torp (1903–1906) and Nynorsk etymologisk Ordbok (1919) by Torp. A database edition of Torp (1919) was published in 2008. The number of entries is roughly 19,000. Both of these are highly readable and were reprinted in facsimile editions in the 1990s. A later publication with a far more thorough treatment of the oldest vocabulary of Norwegian is Våre arveord – etymologisk ordbok (2000; expanded edition 2007) by Bjorvand and Lindeman. In addition, there are recent and more popular works for the general public.

Falk and Torp (1903–1906) treated the etymologies of Norwegian and Danish under the same umbrella, as they had done in previous works dealing with Norwegian phonology, morphology, and syntax. There are good reasons for doing so; Norwegian and Danish are cognate languages, and the Danish influence on Norwegian had been strong for more than 400 years, particularly on the lexicon. Copenhagen held its position as a cultural reference point for Norwegians until the end of the nineteenth century, not least because all important Dano-Norwegian authors (Henrik Ibsen among them) were published from Copenhagen. The external linguistic influences on Norwegian and Danish are much the same, whether passed through Denmark to Norway or a result of direct influence on both languages. For both Norwegian and Danish, the North Sea languages Dutch, German, and English have at different stages had profound influence on vocabulary and syntax. The whole spectrum of Indo-European languages is dealt with in these dictionaries. A slightly reworked edition of Falk and Torp (1903) was published in German in 1919. A recent work dealing with word forms adapted from English is Anglisismeordboka (Graedler and Johansson 1997) with about 4,000 entries.

Falk and Torp (1903–1906) use Danish in headwords and editorial text and list much used Norwegian word forms as cognates, with an index of about 5,000 specifically Norwegian word forms at the end.
The specifically Norwegian vocabulary is dealt with in depth in Torp (1919), which is directly based on the dictionaries Aasen (1873) and Ross (1895). An example of the difference in perspective between Falk and Torp (1903) and Torp (1919) can be seen in the treatment of the word smære “clover.” In Falk and Torp (1903) smære is referred to under the entry “kløver” as the old word for “clover” in Norwegian, but it has no entry and is not listed in the index. In Torp (1919), smære has an entry with cognates, roots, etc. In Torp (1919) the headword form is taken from Aasen (1873), and Norwegian dialect variants with location are mentioned, as are derived words and compounds. All the dictionaries mentioned above are standard reference works for present-day lexicographers and for all with an interest in word history.

Synchronic and Historical Principles in Norwegian Dictionaries

A fully historical way of describing language history through dictionary writing per entry is the method practiced by the OED: the oldest documented sense first, and then the rest in order of age. Obsolete senses are included but marked as obsolete. All senses are illustrated with quotations also arranged in order of age, with the oldest first.
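The ordering rule just described can be sketched in a few lines of Python. This is only an illustration of the principle, not the OED's actual editorial system: the class names `Quotation` and `Sense`, the functions `order_historically` and `render`, and the assumption that every sense carries at least one dated quotation are all hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class Quotation:
    year: int   # year of the quoted source
    text: str


@dataclass
class Sense:
    definition: str
    quotations: List[Quotation] = field(default_factory=list)
    obsolete: bool = False

    def first_attested(self) -> int:
        # Assumption: every sense has at least one dated quotation.
        return min(q.year for q in self.quotations)


def order_historically(senses: List[Sense]) -> List[Sense]:
    """Oldest documented sense first; quotations oldest first within each sense."""
    by_age = sorted(senses, key=Sense.first_attested)
    return [
        replace(s, quotations=sorted(s.quotations, key=lambda q: q.year))
        for s in by_age
    ]


def render(senses: List[Sense]) -> List[str]:
    """Number the senses in historical order; obsolete senses are kept but marked."""
    lines = []
    for number, sense in enumerate(order_historically(senses), start=1):
        label = " (obsolete)" if sense.obsolete else ""
        lines.append(f"{number}. {sense.definition}{label}")
        for q in sense.quotations:
            lines.append(f"   {q.year}: {q.text}")
    return lines
```

A synchronic dictionary would typically replace the sort key with something like present-day frequency or salience, which is the main structural difference between the two approaches at entry level.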

Norwegian has a continuous development as a spoken language but was not continuously developed into a written standard from Old Norse. Instead, there was a break in continuity for about 400 years. This means that continuous diachronic documentation from Medieval times onwards is impossible to practice for Norwegian lexicography. What is possible to practice differs for the two written standards, and they will therefore be dealt with separately.

Nynorsk is based on the Norwegian vernacular, with Old Norse as an important touchstone from Aasen onwards. Bokmål started out as Danish with a modified pronunciation and is today a hybrid language in historical terms, though fully Norwegian in its social basis.

The different histories of the two language varieties, and the lack of standard text in Norwegian between 1400 and 1800, lead to different approaches to the handling of historical and synchronic materials for Bokmål and for Nynorsk, as is clearly demonstrated in the two scholarly dictionaries NRO (I-IV 1937–1957) and Norsk Ordbok (I-XII 1966–2016). For NRO the starting date for evidence used is set at 1814, the year of political breach with Denmark. For Norsk Ordbok the starting date is set at 1537, the date of the Lutheran reformation in Denmark–Norway. The materials dated 1537–1850 are considered “older sources.” Most older sources show dialect forms, though their use as evidence is hedged about with philological considerations. They are therefore referred to by literary source only and obviously have no weight in the entry as synchronic evidence of speech forms. The forms are listed in the etymology section of the entry, and adapted quotations can be used, rendered in standard modern orthography. The downside to this double approach is that evidence of Norwegian lexical items in literary, Danish-writing sources is not utilized in either scholarly dictionary.
The works of the churchman and poet Peter Dass, one of the great names of Norwegian–Danish literature of the seventeenth century, are excerpted in the Literary Collections in the Bokmål language collections, but are not used in either dictionary.

Principles of Definition Writing, Explanation, and/or Translation

Principles for definition writing depend on dictionary type and genre. In what follows, lexicographic tools for native users and bilingual tools with Norwegian as L1 or L2 are dealt with separately.

Monolingual lexicographic tools for the ordinary Norwegian user have largely been thought of as aids to mastering orthography. Throughout the twentieth century there has been available a fair selection of spellers for school use at different levels. As these increased in size, minimal additional information came to be required in connection with some entries, in order to explain and justify differences between homographs, demonstrate the proper use of imported words, or suggest alternatives to over-used vocabulary. Frames are used in some of the school spellers for Norwegian to highlight difficult points of usage or grammar. But school glossaries and dictionaries were not and are still not required to have proper definitions for every entry. The Language Council is tasked with the quality control of school dictionaries and publishes a complete list of authorized school spellers (ordlister) on its website.

There are a number of one-volume dictionaries for the general user. The best known are BOB and NOB. The definition style of these dictionaries aims at nearness to speech and a plain and easily comprehensible style, within the traditional form of hyperonym modified by characteristic features. The desire for a large number of entries in practice causes definitions to stay short, often replaced by one or two synonyms.

Bilingual Lexicography

Bilingual dictionaries fall into two categories in Norway: the commercially viable ones and those for which the market is too small.

Bilingual Lexicography in Education

The commercially viable bilingual dictionaries are in general aimed at the Norwegian learners’ market, especially secondary school and beyond, and the languages for which there are several dictionaries to choose from are the most popular foreign languages (Utdanningsdirektoratet 2014, p. 17) covered by elementary and secondary school syllabi. The first foreign language for Norwegians is English, which is taught from an early grade. The second foreign language is introduced in the eighth grade. A wide range of languages can be chosen. The most popular options are Spanish, German, and French, but Italian, Mandarin Chinese, and Russian also have some hundreds of students.

From 1996 school pupils have been allowed to use bilingual dictionaries at public exams, and as a result dictionary sales boomed for some years. For the first time, teachers trained students in dictionary use. Since 2006, secondary schools have provided their pupils with teaching materials, including dictionaries. At about the same time, digital learning resources, including dictionaries, became a preferred option, and public tenders were invited for delivery of digital teaching materials. Electronic dictionaries and language learning tools are also handled by a couple of Norwegian companies (Ifinger; Ordnett). Very few non-Norwegian publishers have shown interest in the Norwegian dictionary market.

Most bilingual dictionaries are general dictionaries for Norwegian learners, and they are mostly the products of efforts by language teachers working con amore. Entry number is selling point number one. Definition style is most often traditional, with use of direct equivalents when possible and short explanations when not. One series of bidirectional bilingual dictionaries makes use of frames to explore difficult points.
All in this series have extensive back and front matter and seem well adapted both to a teaching environment and to self-study, but the format is much simpler than many foreign learners’ dictionaries for English. Most dictionaries are older works which get updated. In the updating process, corpora and analysis tools like SketchEngine may be used, and formatting will be standardized across dictionaries from the same publisher.

Bilingual Lexicography for Small Language Communities

The bilingual dictionaries not deemed commercially viable deal with:

• Nordic language pairs (e.g., Danish–Norwegian, Finnish–Icelandic)
• Norwegian minority languages (North and South Saami, Kven, Romani and Romanes, Norwegian Sign Language)
• Immigrant communities in Norway (e.g., Norwegian/Thai, Norwegian/Urdu)

Nordic language pairs: The Language Council supports efforts to provide dictionaries between the Nordic languages and sometimes participates actively or contributes funding. One such cooperative effort stands behind the website ISLEX which provides dictionaries between Icelandic and all the other Nordic languages. Another website, Nordisk miniordbok, has a limited vocabulary and is specifically aimed at improving mutual language comprehension among children and young people in the Nordic countries. As something is better than nothing, the mere existence of these language tools is an excellent thing. At the same time, each illustrates the cultural and lexicographical difficulties inherent in oversimplification. In the ISLEX dictionaries, Icelandic was edited as source language in one edition, without adaptation possibilities in relation to the different Nordic target languages; this decision led to a number of quite complex mismatches, on the lexical, morphological, and syntactic levels (Rauset et al. 2012, p. 514 f.).

Norwegian minority languages: The lexicographical coverage of the minority languages for which the state of Norway accepts responsibility spans from leaflets of a few pages to scholarly analysis and presentation going back for several centuries, as is the case for North Saami (see e.g., Leem 1768). Common to all the Norwegian minority language communities is a strong wish to promote their languages towards becoming modern written standards (Johnson et al. 2013). Since 2011, the Language Council has been tasked with offering some support to documenting these languages and generally observing and reporting on their status (Språkrådet 2011-2). Current documentation of minority languages in dictionaries receives public funding, but the effort is piecemeal, depends on the self-help of the language communities themselves, and requires some interest from linguists and lexicographers who happen to study the languages in question.
Linguists at the University of Tromsø have contributed to developing a format for presenting morphology-rich languages like the Saami languages and numerous other languages of the Arctic in web dictionaries, going in both directions between language pairs. (See e.g., the South Saami – Norwegian dictionary at http://baakoeh.oahpa.no/nob/sma/.) The standard entry provides headword, part of speech, inflected forms, equivalent and explanation in the other language, usage examples, and synonyms with comments. This format is at present in use for ten languages of the Arctic. The Arctic languages using this format for bilingual web dictionaries are North Saami, South Saami, Skolt Saami, Liv, Izhorian, Komi, Nenets, Erzya, Mari, and Plains Cree. Some of them are paired with several languages from larger language communities, typically the language of the state, a cognate language or two, and a world language (e.g., English or Russian) (Fig. 9).
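The standard entry described above lends itself to a simple record structure. The following Python sketch is a hypothetical illustration only: the class name `BilingualEntry`, the field names, and the helper `missing_fields` are invented here and do not reflect the actual data model behind the web dictionaries.

```python
from dataclasses import dataclass, field, fields
from typing import List, Tuple


@dataclass
class BilingualEntry:
    """One entry, with the components listed in the text (field names are assumptions)."""
    headword: str
    part_of_speech: str
    inflected_forms: List[str] = field(default_factory=list)
    equivalent: str = ""        # equivalent in the other language
    explanation: str = ""       # explanation in the other language
    usage_examples: List[str] = field(default_factory=list)
    # Each synonym is stored together with its editorial comment.
    synonyms: List[Tuple[str, str]] = field(default_factory=list)


def missing_fields(entry: BilingualEntry) -> List[str]:
    """Return the names of components that are still empty in a draft entry."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]
```

A check like `missing_fields` is the kind of routine an editor might run over a draft to see which components still need work; for morphology-rich languages the `inflected_forms` list would normally be generated by a morphological analyzer rather than typed by hand, which this sketch does not attempt.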

Fig. 9 Giellatekno, the web site for arctic language resources. Bilingual dictionaries for North Saami (http://xn--snit-5na.oahpa.no/)

The public effort put into documentation of the Norwegian Sign Language is not impressive, but there is a web dictionary under way under the management of Statped, a national service for special needs education (Tegnordbok.no 2015).

Dictionaries for immigrants: Immigration to Norway has created sizable communities of adult learners who need and want to master Norwegian in order to adapt to their new country. One very basic need in language learning for immigrants is dictionaries that not only provide translations of words and phrases in the new language but also explain words and concepts specific to Norway and without an equivalent in the immigrant’s language and culture. Sweden started the project LEXIN, a name coined from Swedish “Lexicon för innvandrare,” in the late 1970s, the central idea being to combine dictionary and encyclopedia features in a basic format, flexible enough to be adapted to the needs of each immigrant group. The Norwegian authorities copied the concept in 1996. Focus has shifted to providing dictionaries for immigrant school pupils. LEXIN dictionaries now pair Norwegian, both Bokmål and Nynorsk, with 16 languages (LEXIN 2015). There are also dictionaries for Nynorsk–English and Bokmål–Nynorsk. Some early LEXIN dictionaries were published on paper, but the web dominates as an arena for publication.

The aim of LEXIN dictionaries is for them to be as user-friendly as possible and to facilitate the decoding of Norwegian (Hovdenak 2008, p. 219 f., Bjørneset 2016, p. 97). The web entries have sound, part of speech, full-form inflection and equivalents (often in English in addition to the immigrant language), some simple examples, and a selection of much-used multiword expressions and proverbs with basic explanations of meaning.
For instance, in the Bokmål–Arabic dictionary, under hund (“dog”) the saying man skal ikke skue hunden på hårene (“don’t judge the dog by its coat”) is explained in English as “appearances can be deceptive,” with a suitable expression in Arabic below.

Terminological Dictionaries

Language standardization for all domains implies an acceptance of the need for specialized terminology as part of the national lexicon. In small language communities, like Norway, the need to cope with the modern world through giving new terms and concepts a Norwegian designation was felt from the early nineteenth century, coinciding with a period of orthographic purism and standardization of imported vocabulary in all the Nordic countries (Rask 1826, pp. 277–292; Sandøy 2000, p. 244).

Codifying technical terminology as part of developing a standard language became urgent around 1900, when Norway started training its own engineers and technical personnel. NITO – the Norwegian Society of Engineers and Technologists – appointed a language committee in 1897 to deal with terminological issues. The Technical College of Norway was founded in 1900. Until 1940, the chief external influence on technical language in Scandinavia came from German, although the influence from English was noticeable from before 1900 (Myking et al. 2017). The national oil industry has been an important agent in promoting this linguistic development, as has Norway’s strong shipping and fish industry.

Terminology definition and documentation for Norwegian is therefore recognized as a public responsibility. The Language Council created a permanent terminology committee in 2010, as part of its regular agenda. The Ministry of Foreign Affairs has since 1994 had a Section for Translation Services, dealing with all EU translation needs arising from Norway’s membership in the European Economic Area (EEA) Agreement (Myking 2005, p. 9). Current efforts primarily include the coordination of existing resources at the Norwegian School of Economics (NHH) and the establishment of a national portal for terminology, to a lesser degree the development of new terminology as such. Until the turn of the century, terminological dictionaries were chiefly paper products.
Today, the major storehouse for Norwegian terminology is Termportalen (Andersen and Kristiansen 2010, p. 1 ff), which aims at coordinating existing terminological resources and creating new ones, for instance within finance and administration and within the maritime disciplines.

Although terminology is essentially a concept-based discipline, the interaction with lexicography is marked. New terms have a way of wandering into the general vocabulary. Moreover, after 1945, the influx of English has been felt to be overwhelming, especially within the sciences. In response to this, many professions, lexicographers, and public and private agencies have to deal with terminology issues. The Language Council has developed strategies and guidelines for importing vocabulary into the Norwegian standard language (Vikør 2001, p. 145 ff.; Myking et al. 2016; Språkrådet 2015-2, p. 15 f.). The three main methods are phonemic adaption, loan translation, and term creation based on concept definitions.

Lexicographical Evidence: Citation Collections, Corpus Data, and Other Resources

The scholarly dictionaries of Norwegian all build on empirical evidence. In the case of the older dictionaries (Aasen 1873; Knudsen 1881) their archives (mostly lists) are stored in the National Library. Evidence collection continued in the early twentieth century in slip archives with citation collections for Bokmål/Riksmål, Nynorsk, and Old Norse. Such collections can be found at several Norwegian universities. The largest and most comprehensive ones were transferred from the University of Oslo to the University of Bergen in 2016.

A number of older Norwegian language collections and resources were digitized from the 1990s onwards, especially at the University of Oslo through the Documentation Project 1991–1997 (Aukrust and Hodne 1998). These collections are available for general use on the web. The most important item for modern Norwegian is Metaordboka (“The Meta Dictionary”), a standard language lemma index to citation collections and other sources for Nynorsk and spoken Norwegian. Given the heterogeneity of Norwegian orthography through the twentieth century, Metaordboka is an essential, labor-saving tool, in use outside the world of lexicography as well as within. It is set up to handle more than one language, so source collections from other languages can be attached to parallel indices (Ore et al. 2002).

In the course of the last 25 years a number of corpora have been built for different purposes. Norwegian-built corpora range in size and scope from fairly small (bilingual) translation corpora to very large web and newspaper text corpora. The largest are Norsk aviskorpus, well known to many Norwegians for its New Year summary of last year’s neologisms (Hofland 2000), and the SketchEngine NoTenTen corpus. Two specifically lexicographical corpora are the proofread and properly sourced Leksikalsk bokmålskorpus (LDK) and Nynorskkorpuset, of more than 100 million tokens each, both in use in major lexicographical efforts for Bokmål and Nynorsk, respectively.
However, the main source of information about the Norwegian language in the future will probably be the National Library digital collections. The National Library is engaged in digitizing all Norwegian literature in the institution’s keeping. The resulting web application, Bokhylla.no, is accessible from Norwegian IP addresses and offers unlimited and free screen use of all literature produced by a Norwegian publisher between 1790 and 2000. All pages are photographed and OCR-read, and the collection is searchable as text. The National Library book collections comprise 375,000 editions, about 10 % of which are in Nynorsk, the rest in Danish, Riksmål, and Bokmål. Only 4 % of the total was published before 1900. A number of statistical tools are being developed and a concordance system is under way (Fig. 10).

The introductions of the nonacademic dictionaries from before the digital age are generally vague about their lexicographical evidence, and in some cases the original manuscript itself embodies the evidence, which has been inserted during the process of editing. If sources are mentioned at all, they are other dictionaries or persons who have helped in the process (see e.g., Voss 1933, p. III).

Fig. 10 National Library n-gram service. X = Year (1810–2013), Y = %. All text at Bokhylla.no is searched. Three word forms meaning “the sheep” are compared; green = “Faaret,” blue = “sauen,” yellow = “fåret”; “sau” being the vernacular form, “får” the literary

An important resource under development is Målføresynopsisen (“The Dialect Summary”), compiled ca. 1950–1975 with data going back to the 1880s. The original resource is a set of 43 handwritten protocols giving the spoken form of roughly 2,500 lexical items from about 1,500 informants, representing virtually all Norwegian dialects. The purpose of Målføresynopsisen was to create a firm basis for finding the Norwegian isoglosses. The catalog pages were photographed about 10 years ago and loaded into a facsimile database. A search system, by which a given word form from a given place can be located in the photographs, has been developed, as well as a simplified transcription system for the registered dialect forms. This may seem remote from standard lexicographical work, but given the standardizing process still going forward in relation to the Norwegian lexicon, this has turned out to be an invaluable tool for lexicographers and linguists alike.

Building Lexicographical Resources

The chief need in building lexicographical resources is institutional stability and adequate maintenance for the resources themselves. Considerable electronic resources exist, some in refined and others in fairly raw formats. But the labor involved in refining search instruments and developing methods of using results in lexicography has to be worthwhile, and the chief condition for making it worthwhile is knowing that the results will be taken care of and made use of in the future.

In the electronic age, institutionalization has become more important than ever before, because of the maintenance needs of electronic resources. A book can sit on its shelf for a very long time and become useful again several centuries later, but an electronic resource from the magnetic tape age is today unusable because the machinery needed to interpret its contents no longer exists.

Another aspect requiring a transition from project thinking towards institutionalization is the fact that today’s web users expect their lexicographical resources and tools to be available and in working order at all times. A generation ago, one could produce a dictionary through a project, publish it and forget all about it for 5 or 10 years, and then put together a revision project. Today running maintenance is expected and even required. The electronic editions of the monolingual dictionaries BOB and NOB are permitted tools in school exams and are set up for that purpose with a separate IP address. They are also much used by the civil service and in business. In 2014, the dictionary website had 85 million hits, and exam days can be distinguished by web activity: 26 March 2015 saw ca. 250,000 hits in all, with ca. 20,000 users looking at morphological information from Ordbanken via Nynorskordboka.

The Relationship Between Lexis and Grammar in Dictionaries

What role is (or could be) assigned to phraseology and collocations in the lexicography of Norwegian? Norwegian is an analytic language which forms phrases very easily, as English does. The phrase scale stretches from (shaved-down) proverbs and “words with wings” to everyday MWEs consisting of the most common function words and functioning as fixed semantic units. A strong oral culture, passed on through Norwegian dialects, means that the inventory of pragmatic phrases, sayings, and proverbs is rich, and these expressions excite a good deal of interest amongst Norwegians themselves. Phrasal verbs are standard. MWE prepositions and adverbs are common and sometimes very frequent: the adverbial phrase av og til, meaning “now and then,” is far more frequent than any one-word synonym, and the preposition i og med, meaning “considering,” has no alternative expression. At the other end of the phrase scale there are some very long, metaphorical verb phrases; see Fig. 11.

At the academic level there has been very little discussion of the relationship between lexis and grammar as expressed in Norwegian, if lexis is to be understood as the totality of semantic units in a language, from single orthographic words to much used sentences in ordinary discourse. Different issues that touch on the relationship between lexis and grammar are discussed, such as whether compounding is a word formation process or a syntactic tool used in organizing information within the sentence. Lexicalization of compounds and MWEs is a much discussed issue, and tools for language statistics for Norwegian have recently become accessible thanks to the Language Bank at the National Library (Språkbanken 2015).

Fig. 11 The verbal phrase literally translates as “stick the finger into the soil and smell where one is” and is a much used expression meaning “to adjust one’s mode of action to the current situation.” This is the longest MWE in Norsk Ordbok so far

The question is whether and, if so, how these developments have affected lexicography. I can answer for Norsk Ordbok, the editing of which was moved to a digital platform in 2003, after a thorough analysis of the first four paper-based volumes. The purpose was to sift out the gold in existing editorial practice and also to identify lexical and linguistic challenges that could be handled in a better way. The handling of MWEs of different types and at different levels of complexity was found to be a core issue in improving editorial practice; there was significant evidence that MWEs had been identified as (often polysemous) semantic units. As the old editorial format did not cater for MWEs, the handling was unsystematic and at times led to very convoluted entry structures. The other striking feature was the heterogeneity of the MWEs, stretching from MWE adverbs (i live, “alive”), phrasal verbs (koma seg, “get better”), and simple noun phrases (integrert krins, “integrated circuit”) to quite long pragmatic discourse items, proverbs, and “words with wings” of identifiable though often little known literary origin.

For Norsk Ordbok this was handled by creating a digital “entry within the entry” where an MWE of any kind could be treated as an ordinary (polysemous) headword, defined and exemplified with citations (Grønvik 2011, p. 120). On paper, such MWEs are highlighted typographically. In Norsk Ordbok vol. 5–12 MWEs have a special subentry format, and they are searchable in the web edition. The criteria for selection and editing are set out in the editorial guidelines (Grønvik and Gundersen 2014, p. 210 ff.). MWEs with subentries have fixed word forms in a fixed order (allowing for inflection of verbs in verb phrases and schemata like “either (x) or (y)”). They must be lexicalized in at least one sense which is unpredictable from the combined senses of each word.
The MWEs deserving of subentries function as a part of speech, and they should not have sentence form, though there are exceptions to this rule, particularly concerning some pragmatic phrases: alle gode ting er tre “third time lucky.”
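The selection criteria above can be read as a small checklist. The Python sketch below is one possible reading, not the procedure set out in the Norsk Ordbok guidelines: the class `MweCandidate`, its boolean fields, and the function `qualifies_for_subentry` are hypothetical, and each flag stands for an editorial judgment that cannot be computed automatically. In particular, treating sentence-form pragmatic phrases as exempt from the part-of-speech criterion is an assumption made here for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class MweCandidate:
    text: str
    fixed_form_and_order: bool         # fixed word forms in fixed order (verb inflection allowed)
    has_unpredictable_sense: bool      # at least one sense not predictable from the parts
    functions_as_part_of_speech: bool  # e.g., behaves as an adverb, preposition, or verb
    is_sentence: bool                  # has sentence form
    pragmatic_exception: bool = False  # editor accepts it despite sentence form


def qualifies_for_subentry(c: MweCandidate) -> bool:
    """Apply the criteria from the text in order; all flags are set by an editor."""
    if not (c.fixed_form_and_order and c.has_unpredictable_sense):
        return False
    if c.is_sentence:
        # Sentences are normally excluded, with exceptions for some pragmatic phrases.
        return c.pragmatic_exception
    return c.functions_as_part_of_speech


# Flag values below are illustrative judgments, not taken from the dictionary.
av_og_til = MweCandidate("av og til", True, True, True, False)
alle_gode = MweCandidate("alle gode ting er tre", True, True, False, True,
                         pragmatic_exception=True)
free_combination = MweCandidate("(a free word combination)", False, False, False, False)
```

With these illustrative flags, the adverbial av og til and the pragmatic phrase alle gode ting er tre both qualify, while a free word combination does not and would instead be handled as an ordinary usage example.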

This means that typical collocations (for instance subject-verb and verb-object combinations, and verbal patterns indicating aspect such as “came walking”) and grammatical patterns have to be handled differently from MWEs within the entry schema. If the evidence shows occurrence of a word string above a certain frequency, such expressions are treated as editorial examples and listed either without sources or with multiple sources. Sources can be place names, since location is a source category in Norsk Ordbok. Proverbs and pragmatic utterances are treated in the same way. Editors select MWEs for subentries on the basis of the editorial guidelines. A thorough analysis of the resulting hoard (about 12,000 MWE entries in all, covering the alphabet from i to å) is yet to come, but no one will be surprised to learn that phrasal verbs make up a large share of the total. Fewer in number but sometimes hugely frequent are MWEs functioning as adverbs or prepositions and themselves consisting entirely of function words. Many frequent MWEs are included in older dictionaries, but information about them is in practice almost inaccessible to users, as they are listed within entries as ordinary usage examples under one of the word forms making up the MWE. Any glossary, speller, or dictionary of Norwegian, whether monolingual or bilingual, will present a selection of frequent MWEs, as MWEs make up a substantial repertoire of semantic units and also represent types which are integral to word creation. Any phrasal verb can be the origin of compound adjectives and nouns: gå inn “go in/into” > inngåande “deep, thorough” (present participle, adj.), inngått “entered into” (past participle, adj.), inngåing “signing (of an agreement or contract)” (noun). Some MWEs have grammaticalized functions: koma til is used to express the future tense “be going to.” Grammatical qualities can be used to structure an entry, especially for verbs.
The most frequently used distinctions are intransitive-transitive, type of subject, type of object, type of adverb, and combinations of these. This information could well be given more explicitly in bilingual dictionaries than in monolingual ones, as monolingual dictionaries are assumed to be for the use of native Norwegian speakers. The need for grammatical and semantic explicitness in monolingual dictionaries is related to which user group the dictionary is written for. Until recently, Norwegian lexicography, especially monolingual lexicography, has been explicit and detailed on orthography, while the categorization and rendering of meaning is dealt with more by assumption of previous knowledge. Given the influx to Norway of long-term and short-term immigrants and tourists, it would be sensible to provide monolingual lexicography with explicit enough information for proficient foreign learners (Atkins and Rundell 2011, p. 400).
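The “entry within the entry” described above can be pictured as a recursive record in which an MWE subentry has the same shape as an ordinary headword entry. The following is only a minimal sketch under stated assumptions: the class names, field names, the host headword koma, and its gloss “come” are illustrative and are not taken from the Norsk Ordbok editing system; only the MWE glosses come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sense:
    """One sense of a headword or MWE (hypothetical structure)."""
    definition: str
    citations: List[str] = field(default_factory=list)


@dataclass
class Entry:
    """A dictionary entry; an MWE subentry is itself a full Entry."""
    headword: str
    senses: List[Sense] = field(default_factory=list)
    # The "entry within the entry": MWEs attached to this headword.
    subentries: List["Entry"] = field(default_factory=list)


def index_mwes(entries: List[Entry]) -> Dict[str, Entry]:
    """Map each MWE string to its subentry so it can be searched directly."""
    index: Dict[str, Entry] = {}
    for entry in entries:
        for sub in entry.subentries:
            index[sub.headword] = sub
    return index


# Illustrative data: the host entry and its gloss are assumptions.
koma = Entry(
    headword="koma",
    senses=[Sense("come")],
    subentries=[
        Entry("koma seg", [Sense("get better")]),
        Entry("koma til", [Sense("be going to (future)")]),
    ],
)

mwe_index = index_mwes([koma])
print(mwe_index["koma seg"].senses[0].definition)
```

Because a subentry is an ordinary entry, a polysemous MWE can carry several senses with their own citations, and a direct index of this kind is one way the MWEs could be made searchable rather than buried among usage examples.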

The Current and Possible Future Development of Electronic Lexicography in Norwegian

Paper dictionaries are still published, but production today is digital. Software for dictionary production probably ranges from some kind of text file to specialized XML formats and databases. Norsk Ordbok, and the monolingual one-volume dictionaries BOB and NOB, probably have the most sophisticated solution. They are edited from a dynamic system of relational databases, with separate databases and registers for different types of sources, informants, etc. The web editions of BOB and NOB are linked to Ordbanken, the full-form registers for Nynorsk and Bokmål, for the schemas showing the morphology of headwords. For the minority languages and the bilingual language pairs with small user groups, electronic lexicography offers the only possible solution in financial terms. For sign language, paper must always be a poor substitute for photography and film as a medium for showing signing as real language. If one thinks of dictionaries as databases of categorized information about semantic units, the product potential increases sharply, depending on what categories the database contains and how the categories are organized. Theoretically, the database contains as many subproducts as there are category combinations to search for. One example: In many general dictionaries definitions may have usage labels, enabling database users to draw out all entries with a definition marked with a given label, for instance “in music.” This possibility is an editorial tool, allowing easy consistency checking. But it is also a user tool, giving access to a basic terminological dictionary of whichever usage marker is searched for. The lexicographical database accordingly contains as many terminological dictionaries as there are usage labels for different fields of knowledge. Norwegian lexicographers and publishers are aware of this potential, and a certain amount of experimentation with it takes place, but not in a systematic or goal-oriented fashion.
The electronic possibilities for lexicographical databases are endless and should be met with better attempts to establish best practice and describe basic requirements for serious efforts. The real challenge of a major transition to digital publishing is the public expectation that digital resources will be updated, accessible, and efficient all the time. To meet this expectation, dictionary publishers will have to maintain permanent operative organizations with lexicographers at hand to revise, correct, and answer questions. When e-lexicography becomes essential to public education and administration, as it has in Norway, lexicography is very tangibly part of the infrastructure of a modern society.
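The point that a labelled database contains one small terminological dictionary per usage label can be illustrated with a short sketch. The record layout, the sample headwords, and the labels other than “in music” are hypothetical; no real dictionary database is being described.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Definition(NamedTuple):
    """One definition with its usage labels (hypothetical record layout)."""
    headword: str
    text: str
    labels: Tuple[str, ...]


def subdictionaries(definitions: List[Definition]) -> Dict[str, List[str]]:
    """Group headwords by usage label: one word list per label."""
    by_label = defaultdict(set)
    for definition in definitions:
        for label in definition.labels:
            by_label[label].add(definition.headword)
    return {label: sorted(words) for label, words in by_label.items()}


# Invented sample data for illustration only.
sample = [
    Definition("scale", "ordered series of notes", ("in music",)),
    Definition("key", "tonal centre of a piece", ("in music",)),
    Definition("key", "device for opening a lock", ()),
    Definition("castle", "move involving king and rook", ("in chess",)),
]

term_lists = subdictionaries(sample)
print(term_lists["in music"])
```

The same grouping serves both uses mentioned above: an editor can scan each list for consistency, and a user can read it as a basic word list for that field.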

References

Books, Articles

Aasen, I. (1864). Norsk Grammatik. Kristiania: Mallings Boghandel.
Andersen, G., & Kristiansen, M. (2010). Towards a national infrastructure for terminology in the framework of the CLARA and CLARINO projects. SYNAPS, 25, 1–8.
Atkins, B. T., & Rundell, M. (2011). The Oxford guide to practical lexicography. Oxford: Oxford University Press.
Aukrust, K., & Hodne, B. (Eds.). (1998). Fra skuff til skjerm. Om universitetenes databaser for språk og kultur. Oslo: Universitetsforlaget.

Bjørneset, T. (2016). Lexin i Norge - hva sier brukerne? In: A. Gudiksen & H. Hovmark (2016). Nordiske Studier i Leksikografi 13. Rapport fra 13. Konference om Leksikografi i Norden. København 19–22. maj 2015. København. Nordisk Forening for Leksikografi i samarbejde med Nordisk Forskningsinstitut, Universitetet i København.
Bø, R. (1989). Arbeidet med Norsk Ordbok. In O. Grønvik & O. Almenningen (Eds.), Ord og Mål. Festskrift til Magne Rommetveit 4. Oktober 1988 (pp. 80–94). Oslo: Kringkastingsringen.
Braunmüller, K. (1998). De nordiske språk. Oslo: Novus Forlag. 248 p.
Falk, H., & Torp, A. (1903, 1906). Etymologisk Ordbog over det norske og danske sprog. I–II. Christiania: Aschehoug.
Fjeld, R. V. (2008). Plan for leksikalsk dokumentasjon av moderne bokmål. In Nordiska Studier i Lexikografi 9. Rapport fra Konference om leksikografi i Norden, Akureyri 22.-26. maj 2007 (pp. 131–142). København: Nordisk forening for leksikografi. Retrieved from http://ojs.statsbiblioteket.dk/index.php/nsil/issue/archive
Giellatekno. (2015). Giellatekno, the Center for Saami language technology. Retrieved from http://giellatekno.uit.no/index.eng.html
Grønvik, O. (2007). Ordbøker – kva har vi? Definisjonsordbøker for norsk. In Språknytt 2007. Oslo. Språkrådet. pp. 15–21.
Grønvik, O. (2011). Leksikaliserte ordsamband i norsk. In Nordiska studier i lexikografi 10. Rapport från Konferensen om lexikografi i Norden Tammerfors 3.-5. juni 2010 (pp. 118–129). Tammerfors: Nordisk forening for leksikografi.
Grønvik, O., & Gundersen, H. (2014). Redigeringshandbok for Norsk Ordbok. 364 p. Retrieved from http://no2014.uio.no/eNo/tekst/redigeringshandboka/redigeringshandboka.pdf
Grønvik, O., & Ore, C.-E. S. (2013). What should the electronic dictionary do for you – And how? In I. Kosem, J. Kallas, P. Gantar, S. Krek, M. Langemets, & M. Tuulik (Eds.), Electronic lexicography in the 21st century: Thinking outside the paper. Proceedings of the eLex 2013 conference, 17–19 October 2013, Tallinn, Estonia (pp. 243–260). Ljubljana/Tallinn: Trojina, Institute for Applied Slovene Studies/Eesti Keele Instituut. Retrieved from http://eki.ee/elex2013/conf-proceedings/
Grønvik, O., & Ore, C.-E. S. (2014). Samvirket mellom ordbank og ordbok. In R. E. V. Fjeld & M. Hovdenak (Eds.), Nordiske studier i leksikografi 12. Rapport fra Konferanse om leksikografi i Norden Oslo 13.-16. august 2013 (pp. 139–158). Oslo. Novus Forlag.
Gundersen, D. (1967). Fra Wergeland til Vogt-komiteen. Et utvalg av hovedtrekk og detaljer fra norsk språknormering. Oslo: Universitetsforlaget. 149 p.
Gundersen, D. (1990). Norwegian lexicography. In H. Steger & H. E. Wiegand (Eds.), Wörterbücher. Dictionaries. Dictionnaires. II, 1923–1928. Berlin/New York: Walter de Gruyter.
Haugen, E. (1966). Riksmål og folkemål. Norsk språkpolitikk i det 20. århundre. Oslo: Universitetsforlaget.
Haugen, E. (1976). The Scandinavian languages. An introduction to their history. London: Faber & Faber.
Hofland, K. (2000). A self-expanding corpus based on newspapers on the web. Proceedings of the Second International Language Resources and Evaluation Conference. Paris: European Language Resources Association.
Hovdenak, M. (1997). Arbeidet med ordbøker i Språkrådet. In Språknytt 1997–2 p. Oslo: Språkrådet. Retrieved from http://www.sprakradet.no/Vi-og-vart/Publikasjoner/Spraaknytt/Arkivet/Spraaknytt_1997/Spraaknytt_1997_2/Arbeidet_med_ordboeker_i_Spra/
Hovdenak, M. (2008). Dei norske Lexin-ordbøkene. In LexicoNordica. Tidsskrift om leksikografi i Norden (Vol. 15, pp. 219–234). København: Norsk foreining for leksikografi.
Ims, I. (2015). Du kjenner vel NOB og BOB? In Språknytt (Vol. 1, pp. 12–13). Oslo: Språkrådet.
Johnson, R., Antonsen, L., & Trosterud, T. (2013). Using finite state transducers for making efficient reading comprehension dictionaries. In Proceedings of the 19th Nordic conference of computational linguistics (NODALIDA) (NEALT proceedings series 16). Norway: Oslo University.
Kristoffersen, G., et al. (2005). Norsk i hundre! Norsk som nasjonalspråk i globaliseringens tidsalder. Et forslag til strategi. Oslo: Språkrådet.

Kulturdepartementet. (2008). Mål og meining – Ein heilskapleg norsk språkpolitikk. (St. Meld. 35 2007–2008). Retrieved from https://www.regjeringen.no/nb/dokumenter/stmeld-nr-35-2007-2008-/id519923/
Leira, V. (1982). Nyord i norsk 1945–1975 (Norsk Språkråd). Oslo/Bergen/Trondheim: Universitetsforlaget.
Målføresynopsisen (The Dialect Registry). (2014). Web edition retrieved from http://www.edd.uio.no/synops/work/hovedside.html
Myking, J. (2005). Terminologi i Noreg – historisk oversyn. In J. Hoel (Ed.), Hvem tar ansvaret for fagterminologien? Oslo: Språkrådet.
Myking, J., et al. (2017). Norsk fagspråkarbeid. In ms. To be published in Mæhlum, B. (2017). Praksis. In A. Nesse & H. Sandøy (Eds.), Norsk språkhistorie II. Oslo: Novus.
Nordisk ministerråd. (2007). Deklaration om nordisk sprogpolitik 2006. København: Nordisk ministerråd. 95 p.
Norges Allmennvitenskapelige forskningsråd. (1973). NAVF og norsk forskning. Oslo: Norges Allmennvitenskapelige forskningsråd.
Norsk aviskorpus. Retrieved from http://avis.uib.no/avis/om-aviskorpuset/english
Ore, C.-E. S., Tvedt, L. J., & Bjørnstad, T. (2002). The meta dictionary. Paper given at ALLC/ACH 2002, Tübingen, Germany. Oslo. The unit for digital documentation, faculty of arts, University of Oslo. Retrieved from http://www.edd.uio.no/artiklar/leksikografi/meta_dictionary.html
Rask, R. (1826). Forsøg til en videnskabelig dansk Retskrivningslære med Hensyn til Stamsproget og Nabosproget. Tidsskrift for nordisk oldkyndighed 1. København: Poppske Bogtrykkeri.
Rauset, M., Hannesdóttir, A. H., & Sigurðardóttir, A. (2012). Ein-, to- eller fleirspråkleg ordbok? In B. Eaker, L. Larsson, & A. Mattisson (Eds.) (2011), Nordiska studier i Lexikografi. Rapport från Konferensen om lexikografi i Norden. Lund 24–27 maj 2011. Skrifter utgjevna av Nordiska föreningen för lexikografi. Skrift nr 12. Lund. Nordisk forening for leksikografi. pp. 512–523.
Røgeberg, K. M. (Ed.). (2003). Norge I 1743. I–V. Oslo: Riksarkivet – Solum forlag.
Sandøy, H. (2000). Lånte fjør eller bunad? Om importord i norsk. Oslo: Landslaget for norskundervisning/Cappelen Akademisk Forlag.
Språkbanken (The Language Bank). (2015). N-gram Beta. Retrieved from http://www.nb.no/sp_tjenester/beta/ngram_1/
Språkrådet Annual Report (The Language Council). (1953–2015). Årsmelding frå Norsk språknemnd (1953–71), Årsmelding frå Norsk språkråd (1972–2004), Årsmelding frå Språkrådet (2005–2015). Oslo. Retrieved from http://www.edd.uio.no/perl/search/search.cgi?appid=241&tabid=3172
Språkrådet. (2011-2). Minoritetsspråk. Retrieved from http://www.sprakradet.no/Tema/Minoritetssprak/
Språkrådet. (2015-2). Retningslinjer for normering av bokmål and nynorsk. 1.3.2015. Oslo. Retrieved from http://www.sprakradet.no/Sprakhjelp/Rettskrivning_Ordboeker/Retningslinjer-for-normering-av-nynorsk-og-bokmal/
Stortinget (The Parliament of Norway). - Stortings Forhandlinger 1928–1929 Odelstinget 1.2.1929 p. 81 f. Stortingsforhandlinger. Innstilling frå Kirke- og undervisningskomiteen om språksaken (Vol. 6a. Innst. S. nr. 189 for 1969–70, pp. 273–291). Retrieved from http://urn.nb.no/URN:NBN:no-nb_digistorting_1969-70_part6_vol-a
Utdanningsdirektoratet. (2014). Fagvalet til elevane i vidaregåande opplæring skoleåret 2013/14. 28 p. Retrieved from http://www.udir.no/globalassets/upload/forskning/2015/fagval-i-vgo-2014-2015.pdf
Venås, K. (1989). Kva ventar vi oss av Norsk Ordbok? In O. Grønvik & O. Almenningen (Eds.), Ord og Mål. Festskrift til Magne Rommetveit 4. Oktober 1988 (pp. 153–156). Oslo: Kringkastingsringen.
Venås, K. (1997). Korleis bør nynorske ord sjå ut? In B. Bjørkum, B. Helleland, E. Papazian, & L. S. Vikør (Eds.), Målvitskap og målrøkt. Festskrift på 70-årsdagen 30. november 1997 (pp. 367–386). Oslo: Novus forlag.

Vikør, L. S. (2001). The Nordic languages. Their status and interrelations (Nordic Language Secretariat). Oslo: Novus Press.
Vikør, L. S. (2002). The Nordic languages and the languages in the North of Europe. In O. Bandle, K. Braunmüller, E. H. Jahr, A. Karker, H.-P. Naumann, & U. Telemann (Eds.), The Nordic languages. An international handbook of the history of the North Germanic languages (pp. 1–12). Berlin/New York: Walter de Gruyter.
Vogt, H., et al. (1966). Innstilling om språksaken. Fra Komitéen til å vurdere språksituasjonen m.v. oppnevnt ved kongelig resolusjon 31. januar 1964. Innstillinger og betenkninger fra kongelige og parlamentariske kommisjoner, departementale komitéer m.m.
Wetås, Å., & Berg-Olsen, S. (2014). Revision and digitization of the early volumes of Norsk Ordbok: Lexicographical challenges. In A. Abel, C. Vettori, & N. Ralli (Eds.) (2014), Proceedings of the XVI EURALEX International Congress: The user in focus. 15–19 July 2014, Bolzano/Bozen. Bolzano/Bozen. Institute for Specialised Communication and Multilingualism. (pp. 1075–1086).

URLs

http://baakoeh.oahpa.no/nob/sma/. Web portal with bilingual dictionaries for the Arctic languages.
http://islex.is/no. Islex. Bilingual bidirectional web dictionaries between Icelandic and the other Nordic languages.
http://lexin.udir.no. The Lexin dictionaries of Norway.
http://www.dokpro.uio.no/engelsk/. The Documentation Project Website.
http://www.elexicography.eu. Individuals, societies, cultures and health. COST ACTION IS1305.
http://www.hf.uio.no/iln/om/organisasjon/tekstlab/om/. The Text Laboratory.

Dictionaries

[BOB] Landrø, M. I., & Wangensteen, B. (1986–, 3rd paper ed. 2005). Bokmålsordboka: definisjons- og rettskrivningsordbok. Bergen: Universitetsforlaget. New web edition 2013 at http://www.nob-ordbok.uio.no/
[ISLEX] The Islex Project. Online multilingual dictionary between modern Icelandic and six Scandinavian target languages. Árni Magnússon-instituttet for islandske studier (). Reykjavik. http://islex.is/no
[LEXIN] Lexin. (2015). Online LEXIN dictionaries. Uniweb. Bergen: Utdanningsdirektoratet. Retrieved from http://lexin.udir.no/
[NAOB] Det Norske Akademis Ordbok (to be published 2017). Oslo: Kunnskapsforlaget. http://www.naob.no/
[NOB] Hovdenak, M., et al. (1986; 4th paper ed. 2006). Nynorskordboka. Definisjons- og rettskrivingsordbok. Oslo: Det Norske Samlaget (New web edition 2012 at http://www.nob-ordbok.uio.no/)
[NRO] Norsk riksmålsordbok. (1937–1957). Sommerfeldt, A., & Knudsen, T. (Vol. I–IV); Noreng, H. (1957–1995) (Vol. V–VI). Oslo: Aschehougs forlag.
Aasen, I. (1873). Norsk Ordbog med Dansk Forklaring. Christiania: Mallings Boghandel.
Graedler, L., & Johansson, S. (1997). Anglisismeordboka. Oslo: Universitetsforlaget. 466 p.
Guttu, T. (2005). Norsk Ordbok med 1000 illustrasjoner (2nd ed.). Oslo: Kunnskapsforlaget. 1350 p.
Hallager, L. (1802). Norsk Ordsamling eller Prøve af Norske Ord og Talemaader. Kjøbenhavn: Sebastian Popp.
Ifinger. (2015). Website for a number of digitalized Norwegian dictionaries from different publishers. Subscription service. Retrieved from http://ifinger.no/

Jenssøn, C. (1915). Den Norske Dictionarium eller Glosebog. København. Ed. by Torleiv Hannaas. 1. ed. 1646. Kristiania: Den norske kjeldeskriftkommisjonen.
Knudsen, K. (1881). Unorsk og norsk, eller Fremmedords avløsning. Kristiania: Cammermeyer.
Landfald, A. (2006). Norsk Ordbok (4th ed.). Oslo: Cappelen.
Leem, K. (1768). Lapponico – Danico – Latina. Nidrosiae: Impensis Seminarii Lapponici Fridericiani. I–II.
Lindeman, F., & Bjorvand, H. (2000; expanded 2nd ed. 2007). Våre arveord – etymologisk ordbok. Oslo: Universitetsforlaget.
Nordisk miniordbok. (2013). Nordisk ministerråd. At http://www.nordord.org/
Ordnett. Subscription service for 50 digital dictionaries covering 10 languages. Oslo: Kunnskapsforlaget ANS. Retrieved from http://www.ordnett.no/
Ross, H. (1985). Norsk Ordbog. Tillæg til “Norsk Ordbog” af Ivar Aasen. In facsimile edition 1971 of Ross 1895 with addenda I–VI (1895–1913). Oslo: Universitetsforlaget.
Tegnordbok.no. (2015). Tegnordbok. At http://www.tegnordbok.no/#. Statped
Termportalen. (2011). Web portal at http://www.terminologi.no/forside.xhtml
Torp, A. (1919). Nynorsk etymologisk Ordbok. Kristiania: Aschehoug. 886 p.
Voss, J. Fr. (1933). Tysk-Norsk Ordbok. Oslo: Noregs Boklag.

The lexicography of Khmer

Robert K. Headley

Contents

Introduction
Description
Alphabetic Ordering
Pronunciation
Definition
Transliteration
Usage Labels
Cross-References to Related Forms
Spelling Variants
Etymologies
Earliest Occurrences
Examples
Who Compiles Khmer Dictionaries?
Suggestions for Further Work
References
Dictionaries

Abstract

This chapter discusses the history of lexicography in the Khmer language, the problems involved in compiling a Khmer–English dictionary, and specifically those encountered during the preparation of two recent Khmer dictionaries. Among the topics that are discussed are problems of lexical sources, alphabetization, spelling, pronunciation, definition, transliteration, usage, cross-referencing, etymology, and examples. Finally, some suggestions about the future for Khmer lexicography are also presented. This chapter is not intended to be an exhaustive treatment of all Khmer dictionaries. Only the major dictionaries from Khmer into a foreign language will be treated herein.

R.K. Headley (*) Independent Researcher, University Park, MD, USA

© Springer-Verlag Berlin Heidelberg 2015. P. Hanks, G-M. de Schryver (eds.), International Handbook of Modern Lexis and Lexicography, DOI 10.1007/978-3-642-45369-4_81-1

Introduction

Khmer is an Austroasiatic language of the Mon–Khmer family spoken by about 16 million people in Cambodia, Thailand, Vietnam, France, Australia, and the U.S.A. It is written in a unique, highly phonetic alphasyllabary derived from the Pallava script of Southern India with the earliest inscriptions dating back about 1500 years. Vowel signs are written above, below, to the left, to the right, and on both sides of the consonant symbols. Khmer is an isolating language with an SVO word order. Derivation is by means of prefixes, infixes, reduplication, and compounding.

Description

The first extensive lexical document in Khmer was Etienne Aymonier’s Vocabulaire cambodgien – français (AVC), which appeared in 1874 and contains about 2,600 entries in Khmer script. Moura’s 1878 word list (MVCF) contains about 5,400 entries in Romanized form. In the same year Aymonier published a larger dictionary (ADK). J. B. Bernard produced a Khmer – French dictionary of about 5,000 entries in 1902 (BDC). ADK and BDC both provide Khmer script for their entries; Moura’s list gives only transcriptions. By the late 1930s there were three extensive Khmer lexical works, each one different. Joseph Guesdon’s 1930 two-volume Dictionnaire cambodgien – français (GDC) (about 20,000 main entries) and Sandulph Tandart’s 1935 two-volume work with the same title (TDC) treat Khmer from different standpoints. Guesdon was interested in literary Khmer and included words and examples from a large number of Khmer literary and religious works. Figure 1 shows an extract from the GDC. Tandart emphasized current Khmer culture, especially material culture such as fishing equipment, and his dictionary contains botanical and zoological terms. The year 1938 marked the appearance of the Buddhist Institute’s great two-volume monolingual dictionary (VK1) prepared under the direction of the Ven. Chuon Nath. The VK1 represented a huge advance in Khmer lexicography. It contained about 25,000 entries and, in addition to full definitions, provided data on usage and etymologies of Indian loanwords, gave pronunciations for uncommon words, indicated the parts of speech, and contained many examples. See Fig. 2 for an extract from VK1. Sam Thang compiled a dictionary of newly created technical terms in 1962 (SVK) that supplemented VK1. This excellent little dictionary gave the Khmer scripts, some pronunciations (in Khmer script), parts of speech, definitions in French and Khmer, and examples for most of the approximately 4,500 terms.

Fig. 1 Article from GDC

Fig. 2 Article from VK1

Judith Jacob produced the first Khmer–English dictionary in 1974. Jacob’s dictionary (JCD) contained about 7,200 entries and gave the Khmer script form, an IPA transcription, the part of speech, the definition, and examples in IPA for each main entry. The first edition of Gorgoniyev’s Khmer–Russian dictionary (GKS, about 15,000 entries) and the U.S. Foreign Service’s glossary (KCG, about 8,000 entries) both appeared in 1975. These were the major lexical resources for Khmer until 1977, when the Headley et al. dictionary (HCD), containing about 42,000 entries and based largely on the Buddhist Institute’s and Tandart’s dictionaries, appeared. The HCD was about twice as large as VK1, and each main entry contained the Khmer script form, an IPA transcription, the part of speech, the definition(s), etymological data on Indian, Vietnamese, Chinese, and French borrowings, as well as cross-references and alternate spellings. Also in 1977, Franklin Huffman prepared a glossary of his Khmer teaching materials that contained about 10,000 entries. Although not strictly speaking a dictionary, Jenner and Pou’s (1980–81) splendid Lexicon of Khmer Morphology (JLK) contained about 5,600 entries and provided definitions and details on the complicated processes of Khmer derivation. A second edition of GKS edited by Long Seam was published in 1984. Alain Daniel’s (1985) Khmer–French dictionary (DDC) contained about 21,000 entries but provided only definitions. The need for an up-to-date Khmer–English dictionary prompted the preparation of the Modern Cambodian Dictionary (MCD1) by Headley et al. in 1997. Figure 3 shows an example from MCD1. This dictionary was based on the HCD and was enlarged to about 44,000 words. The 2005 orthographic dictionary published by the National Language Institute of the Cambodian Royal Academy (DOK) attempted to standardize Khmer spelling.
It marks part of speech and gives some etymological information and pronunciations but provides few definitions. DOK is useful in that it contains about 43,000 words, many of which are not to be found in any other dictionary. VK2, compiled in 2007 by a group of Khmer scholars headed by Div Sean, is based on VK1, in many cases using the same definition, but it has added about 7,000 new entries from various scientific and technical fields. See Fig. 4 for an extract from VK2. By 2013 there were dictionaries from Khmer into nine languages (English, French, Russian, Japanese, Vietnamese, Lao, Thai, German, and Chinese), specialized glossaries for several fields (including medicine, politics, economics, law, computer technology, geology, forestry, linguistics, environmental terms, royal terms, peace-making, the military, and neologisms), and several lexical works in the Khmer dialect of Surin, Thailand. A completely revised and enlarged second edition of Headley et al. (MCD2) is expected to appear in 2015. Figure 5 shows an example from MCD2. A number of websites offer searchable versions of VK1 and HCD. A searchable Khmer dictionary based on VK1, HCD, and MCD1, as well as a large searchable Khmer corpus of approximately 15 million characters, is available on the SEAlang website (http://sealang.net/khmer/). There are also a number of digital Khmer–English and English–Khmer dictionaries, or perhaps better stated, word lists:

• https://en.glosbe.com/en/km/ – 27,000 translated phrases, some spelling errors in the English
• http://www.english-khmer.com/ – Eng–Khm, Khm–Khm, and Khm–Eng
• http://dictionary.tovnah.com/khmer/dictionary – Khm–Eng and Eng–Khm; Khmer does not seem to print correctly; contains several databases including Chuon Nath and Headley 1997
• https://play.google.com – Eng–Khm for Android
• https://www.facebook.com/ – 50,000 words in app dictionary
• http://www.lexilogos.com/english/cambodian_dictionary.htm

Fig. 3 Article from MCD1

Fig. 4 Article from VK2

Fig. 5 Article from MCD2

Some are simply existing Khmer dictionaries, such as the Buddhist Institute’s dictionary by Ven. Chuon Nath. Others have made an effort to construct passable English–Khmer dictionaries. Some provide pronunciation, word class information, and examples. While the Khmer–English dictionaries appear to be mainly uploads of existing dictionaries, the English–Khmer dictionaries appear to be original works. Many seem to have problems showing the Khmer script correctly, however. A few corpora of Khmer texts exist in addition to the SEAlang corpus, but their availability is unknown. The Source Forge website (http://sourceforge.net/projects/khmertext/files/) has a body of Khmer textual material which includes folktales, books, issues of the newspaper Nagara Vatta (1937–1941), and scanned copies and digital versions of the journal Kambuja Suriya (1926–1965). Channa Van and Wataru Kameyama (2010, p. 80) report that Google has a localized Khmer version of its web search engine (https://www.google.com.kh/) but that it is not specifically designed for Khmer. For their work on Khmer information retrieval they prepared a corpus of more than one million words composed of newspaper and magazine articles, stories, novels, and medical, cultural, legal, historical, agricultural, and technical texts. As far as I am aware, none of these corpora has been used for the preparation of a dictionary. Each of the existing Khmer dictionaries is useful in its own way. The most complete newer ones, such as the GKS, VK1, VK2, MCD1, MCD2, and SDK, provide pronunciations, definitions, cross-references, some etymological information, and examples. They differ in how they provide this information and especially in how they spell and arrange the entries.

Alphabetic Ordering

There are several issues with the alphabetical order among the existing Khmer dictionaries. The alphabetical order of the Khmer alphabet is based very closely on the order developed for Indian scripts where consonants are grouped according to phonetic articulation. There is little disagreement among the existing Khmer dictionaries about the order of the consonants. But there is a major issue with how to treat the two pronunciations of . Before vowels in native Khmer words is pronounced /ɓ/. However, as the first element in a consonant cluster and in many words borrowed from Sanskrit or Pali it is pronounced /p/. When represents an initial /p/ it may also be written as , but often there is no indication of how it is to be pronounced. The question is whether to treat the two pronunciations together or separately. VK1 and VK2 separate the two pronunciations. Most of the other dictionaries keep and together. Jacob (1974, p. xxx) noted that Guesdon and Tandart set the precedent for combining the two pronunciations of under one heading and pointed out that a foreigner may not know whether an Indian loanword is pronounced as /ɓ/ or /p/ and would not know where to find it in VK1 or VK2. We agreed that it was more advantageous to the user to merge all the words with initial since there was no way to determine which pronunciation it had based only on the graphic form. A second issue involves the placement of the so-called independent vowels ( ) which generally represent initial vowels in Indian loanwords. The traditional order of these vowels is , which are separate letters and represent the sounds /ʔeʔ or ʔiʔ, ʔii or ʔəy, ʔo or ʔuʔ, ʔoo or ʔuu, ʔəɯ, rɯʔ, rɯɯ, lɯʔ, lɯɯ, ʔae, ʔay, ʔao/ respectively. The and represent the Indian vowels ṛ, ṝ, ḷ, ḹ. The VK1, VK2, STVK, KCG, GKS, JCD, and DOK all group words beginning with these letters with the consonants (r) and (l). Guesdon places these words with the independent vowels.
In French, English, and Cambodian textbooks, beginners are introduced to these symbols as members of the group of independent vowels. See, for example, Huffman (1970, p. 29 ff.), Jacob (1968, pp. 30–31), Khin Sok (2002a, pp. 57–59 and 2002b, pp. 49–50), (1953, p. 30), and (1969, p. 61). We note that Khin Sok places the independent vowels after at the end in his glossaries. We saw no reason to move the vowels from their traditional location, and therefore they will be found in MCD1 and MCD2 at the end of the dictionary with the other independent vowels. Another issue is how to handle the compound characters consisting of a dependent vowel with the reahmuk ( ) symbol, which represents the sound /h/. The order may be according to spelling or according to pronunciation. We also opted to treat the compound characters with reahmuk in the order in which language learners are taught, which is So, using the consonant /n/ the order would be .
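The choice between listing the two pronunciations of one letter separately or together can be thought of as a choice of sort key. The sketch below is only an illustration under stated assumptions: the tokens are Latin stand-ins rather than Khmer graphemes (“B” for the ambiguous letter, “B'” for its marked /p/ spelling), the alphabet is a made-up fragment, and the tie-breaking rule is an assumption, not the procedure of any dictionary discussed here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical alphabet fragment in collation order (stand-in tokens only).
ALPHABET: List[str] = ["K", "B", "B'", "M", "a", "i"]
RANK: Dict[str, int] = {letter: i for i, letter in enumerate(ALPHABET)}

# Under the merged policy the marked variant files under the plain letter.
MERGED: Dict[str, str] = {"B'": "B"}

Word = Sequence[str]


def collation_key(word: Word, merge: bool) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return a sort key; the raw spelling only breaks ties when merging."""
    raw = tuple(RANK[token] for token in word)
    if not merge:
        return (raw, raw)
    primary = tuple(RANK[MERGED.get(token, token)] for token in word)
    return (primary, raw)


words: List[Tuple[str, ...]] = [("B", "i"), ("B'", "a"), ("B", "a"), ("M", "a")]

separate = sorted(words, key=lambda w: collation_key(w, merge=False))
merged = sorted(words, key=lambda w: collation_key(w, merge=True))

print(separate)
print(merged)
```

With separate headings every “B'” word follows all “B” words; with the merged key the two spellings interleave, so a user who cannot tell the pronunciation from the graphic form needs to search only one place.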

Pronunciation

Prior to JCD in 1974, whenever pronunciations were given, they were either in Khmer script or in systems based on the French spelling. JCD was the first dictionary to use the IPA, and the later Khmer–English dictionaries as well as SDK followed her lead. Unfortunately, Jacob modified the IPA system to try to indicate the Khmer spelling of the word, and this resulted in a fairly strange and rather cumbersome transcription. The IPA is clearly the preferred system to use; it is simple, well known, and easily available in Unicode fonts. There are only a few minor differences in the various IPA transcriptions for Khmer used by different authors.

Definition

In MCD1 and MCD2 the part of speech for each main entry was indicated for every definition. This is an area where there is scant agreement among Khmer linguists. In MCD1 and MCD2 we recognized 13 word classes (nouns, pronouns, postnominal particles, classifiers, , verbs, adjectives, preverbal particles, pronominal particles, adverbs, conjunctions, interjections, and final particles). Jacob recognized 15 word classes. Huffman (1967) recognized 22 in 8 major classes (isolatives, substantives, predicatives, adverbials, adjectives, auxiliaries, relators, and particles). Haiman (2011) suggests an astounding 27 word classes while the simpler system of Khin Sok (2002a, b) recognizes only 9 major classes (nouns, adjectives, pronouns, verbs, adverbs, prepositions, conjunctions, introducers, and interjections). A Khmer word may function as more than one part of speech, and they must be distinguished in the definitions. As might be expected, the definitions of words range from simple nouns describing items of material culture which proved fairly easy to define to complex Buddhist terms which almost defied translation into comprehensible English. We noted a lack of good English dictionaries of Buddhist terms and, indeed, were unable to provide decent definitions for a small number of words. Some items of material culture which were unknown or unfamiliar to American students were illustrated in a small section at the end of the dictionary. A great many more illustrations would have been helpful. One important matter that needs more work and will have to be solved in future Khmer dictionaries is that of the word class which has been called “expressives.” Diffloth (2001) has written extensively on this class of words, which he describes as denoting “sensory perceptions of the speaker – visual, auditory, tactile, olfactory, gustatory, emotional or other – in relation to a particular phenomenon.” More recently, Sidwell (2014, p.
33) has suggested that “... it is evident that there is a huge, far from fully documented aspect of Austroasiatic languages, namely descrip- tive and sound-symbolic lexicon and related word-formational strategies that are recognized under the rubric of expressives.” These expressives seem to fill several modifier slots and to appear as onomatopoetic words. An example of an expressive is /krɑkaek krɑkaok/, which may have an adverbial meaning, “in a babbling or chattering manner” or as a noun meaning “babbling, chattering.” In addition to defining the word class of expressives there is also the problem of defining the subtle shades of meaning of the myriad descriptive words which are especially characteristic of Khmer poetry. One example is the word : and its compounds , , , , and all glossed as “beautiful.” Care must be taken to define all meanings of a Khmer word, which in some cases can be many. For example, the word has at least five meanings: (1) “skewer,” (2) line for stringing fish or beads, (3) classifier for strings of fish, (4) “bunch of three or four rice plants growing together,” and (5) group of people (pejorative). Some words required long, often cumbersome, explanations. The word was defined as “to say (yi:, yi:) or (muəy pi: bəy yi: tʌɯ) in order to send a disease or misfortune to attach itself to someone else (especially if the speaker has been teased about the disease or misfortune).” We tried to include as many botanical and zoological terms as possible along with the current scientific name. A considerable literature exists on Khmer plant terminology and on names of birds and fish; information on terminology for reptiles (except snakes and turtles), marine animals, and insects is sparse. Lists of terms exist for many insects which are rice pests, but we suspect that some of these are merely descriptive and do not reflect actual usage among farmers.

Transliteration

A curious problem that arose during the compilation of MCD2 was how to transliterate proper names. There is a long tradition in Cambodia of Romanization based on French orthography, which is somewhat similar to that adopted by the Service Géographique Khmère; however, it is often extremely difficult to convert these transliterations back into the Khmer script. Moreover, in many cases different writers give different transliterations for a single word. For example, the word /ca:pəy/ “guitar” has appeared as chăpey, châpĕy, and chapei. The word /lʔɑɑ/ “good” has appeared as lăâ, lââ, and loâ. We decided to retain the traditional transliteration for names of places and for well-known individuals, since these have some currency. We used a system of Romanization similar to that developed for the American Library Association and Library of Congress, which is also used for Indian forms, to transliterate uncommon names of persons (e.g., , “Ādityasuriyā,” the name of a king in the Indian epic Ramayana), most Buddhist theological terms and titles of Buddhist texts, and the vast group of royal titles and officials (e.g., , “Okñā Deba Qarajūna,” title of a royal official in charge of boats).

Usage Labels

We noted if a word was part of a specialized vocabulary such as a royal or clerical word, if it was a formal term or a spoken or slang variant, if it was insulting or obscene, or if it was used in poetry. We also noted if it was part of the enormous Buddhist theological lexicon, if it was used by the Khmer Rouge during their years of control, or if it was used mainly in certain specialized fields such as law, medicine, anatomy, or geology. There was a need to identify all formal, colloquial, and slang terms.

Cross-References to Related Forms

Cross-references included references to synonyms, antonyms, and derived forms and, for many Indian loanwords, to the feminine form. It is easy to overdo cross-references, but we believe that a reasonable number may be useful.

Spelling Variants

While current Khmer spelling is overwhelmingly phonetic there are many variant spellings. Some of these are old spellings that occur in Khmer manuscripts while others simply represent two different ways to spell the same word. For example, up until the early twentieth century many words with final /y/ used the subscript form of the letter. Thus krahāy was formerly written but is now written . Words with medial m have two possible spellings, one which uses the consonant /m/ with a subscript consonant and one which uses the symbol which represents the combination of vowel + /m/. Thus the word /sɑmlap/, “kill,” may be found as or . Since there are two symbols in Khmer for the sound /l/, a third possibility exists for this word, . We made the decision to write words of this type consistently with the even though both spellings will be found in current Khmer texts. Other spelling variants noted were variants derived from several different Indian sources. For example, we cross-referenced forms such as and , “meditation,” based on the same Indian root but the first through the Pali form jhāna and the second through the Sanskrit form dhyāna.

Etymologies

Etymologies are another very useful addition to a definition, perhaps more interesting and useful for the linguist than for the translator or language learner. VK1, HCD, and VK2 provide extensive etymologies for most loanwords, some of which may be inaccurate. For example, VK1 and VK2 give an Indian source for the Khmer word which means “excrement.” The Khmer word, even though it is written , ācama, as if borrowed from the Sanskrit or Pali acāma, is pronounced /ʔac/ and is a well-known Mon-Khmer root. MCD1 and MCD2 include only the minimal notation that a word is a borrowing from French, English, Portuguese, Chinese, Thai, or Vietnamese. There is a need for a full etymological dictionary of Khmer, but such a work may be far off; much work needs to be done in this area.

Earliest Occurrences

If a future dictionary is to be modeled on the Oxford English Dictionary (OED), the inclusion of the first occurrences of words would be useful. There is sufficient data available to provide occurrences for many words back into the Old Khmer period of the language, but so far no dictionary has attempted this.

Examples

Examples from as many written genres as possible should be included to illustrate levels of usage. In MCD2 we added hundreds of examples from modern novels, scientific articles, newspapers, manuscript sources, religious texts, and folktales. For words or expressions taken from earlier dictionaries for which we could not find any examples, we asked Khmer speakers to prepare appropriate examples.

Who Compiles Khmer Dictionaries?

The earlier dictionaries up to the first quarter of the twentieth century were compiled by missionaries. Starting with the first edition of the Buddhist Institute Dictionary in 1938 dictionaries were prepared by Buddhist monks and later by several linguists.

Suggestions for Further Work

There is much work yet to be done in the field of Khmer lexicography. The following list is not all inclusive nor is it in any special order, but it includes what we believe are important points to consider in the compilation of future dictionaries:

• Standardize Khmer spelling.
• Standardize alphabetical order.
• Decide whether the dictionary is to be an historical dictionary or a dictionary of the current Khmer language; if historical, decide on the time span.
• Organize a committee to oversee the work of compiling the dictionary, prepare the final definitions, select appropriate examples, and research etymologies.
• Devise a method to collect words; this could involve having many readers scan Khmer documents and prepare raw entries giving source and definitions, as was done for the OED, as well as traditional methods of harvesting existing dictionaries and searching computer corpora.
• Study spoken Khmer and dialects of Khmer. Corpora of the spoken language will be useful in several ways: to collect the often highly truncated spoken forms of common particles and to define the meanings of these particles.
• Organize fieldwork to collect dialectal and vocational vocabularies in areas such as metal working, fishing and hunting, carpentry, astronomy, folk medicine, and farming.

The National Commission of Khmer Language was established in 2007. The Commission is responsible for the standardization of the orthography, pronunciation, and creation of new words and loanwords in Khmer. The Commission has established eight technical groups responsible for (1) orthography and pronunciation, (2) word creation and loanwords, (3) humanity and social science terminology, (4) political science and diplomacy terminology, (5) linguistic and literature terminology, (6) science and technology terminology, (7) culture and fine art terminology, and (8) biology, medical science, and agricultural terminology. This Commission seems like an excellent start toward a committee to compile a new comprehensive Khmer dictionary.

References

Diffloth, G. (2001). Les expressifs de Surin, et où cela conduit. Bulletin de l’École française d’Extrême-Orient, 88, 261–269.
Haiman, J. (2011). Cambodian Khmer. Amsterdam: John Benjamins Publishing Company.
Huffman, F. (1967). An outline of Cambodian grammar. PhD Dissertation, Cornell University.
Huffman, F. (1970). Cambodian system of writing and beginning reader with drills and glossary. New Haven: Yale University Press.
Jacob, J. (1968). Introduction to Cambodian. London: Oxford University Press.
Sidwell, P. (2014). Expressives in Austroasiatic. In J. P. Williams (Ed.), The aesthetics of grammar: Sound and meaning in the languages of Southeast Asia (pp. 17–35). Cambridge: Cambridge University Press.
Sok, K. (2002a). La Grammaire du khmer moderne. Paris: Editions You-Feng.
Sok, K. (2002b). Manuel de khmer (Vol. 1). Paris: Editions You-Feng.
Van, C., & Kameyama, W. (2010). Query expansion for Khmer information retrieval. In Proceedings of the 8th workshop on Asian language resources, Beijing 21–22 August 2010, pp. 80–87.
Elementary Study of Cambodian, First Grade. (1953). Phnom Penh: Imprimerie Henry.
First Grade Reading and Writing. (1969). Phnom Penh: Librairie et Imprimerie Phnom Penh.

Dictionaries

[AVC] Aymonier, E. F. (1874). Vocabulaire cambodgien-français. Saigon: College des stagiaires.
[ADK] Aymonier, E. F. (1878). Dictionnaire khmer-français. Saigon: Son Diep. (an enlargement of his earlier 1874 vocabulaire).
[BDC] Bernard, J. B. (1902). Dictionnaire cambodgien-français. Hong Kong: Imprimerie de la Société des Missions Étrangères.
[VK1] Buddhist Institute (Ven. Chuon Nath). (1968). Vaccananukram khmaer. Phnom Penh: Ed. de l’institut Bouddhique. [Five editions between 1938 and 1968; an expanded version translated into French was prepared by Father Rogatien Rondineau in 2007].
[DDC] Daniel, A. (1985). Dictionnaire pratique cambodgien-français. Paris: Institut de l’Asie du Sud-Est.
[VK2] Sean, D., et al. (2007). Vacananukram Khmaer. Khmer Dictionary. Phnom Penh: Pannagara Nagara Dham.
[GKS] Gorgoniev, I. A. (1975). Khmersko-russkii slovar’ (2nd ed. 1984). Moscow: Russkii Yazyk.
[GDC] Guesdon, J. (1930). Dictionnaire cambodgien-français. Paris: Plon.
[HCD] Headley, R. K., Chhor, K., Lim, L. K., Kheang, L. H., & Chun, C. (1977). Cambodian-English Dictionary. Washington: The Catholic University of America.
[MCD1] Headley, R. K., Chim, R., & Soeum, O. (1997). Modern Cambodian Dictionary. Kensington: Dunwoody Press.
[MCD2] Headley, R. K., & Chim, R. (2015). Modern Cambodian Dictionary (2nd ed.). Kensington: Dunwoody Press.
[HCG] Huffman, F. (1977). Cambodian-English glossary. New Haven: Yale University Press.
[DOK] Institute de la Langue Nationale. (2005). Dictionnaire orthographique de la langue khmer. Phnom Penh: Académie Royale du Cambodge.
[JCD] Jacob, J. M. (1974). A Concise Cambodian-English Dictionary. London: Oxford University Press.
[JLK] Jenner, P. N., & Pou, S. (1980–81). A Lexicon of Khmer Morphology. Honolulu: University Press of Hawaii.
[KCG] Sos, K., Kheang, L. H., & Ehrman, M. E. (1975). Contemporary Cambodian: Glossary. Washington, DC: Foreign Service Institute, Department of State.
[MVCF] Moura, J. (1878). Vocabulaire français-cambodgien et cambodgien-français. Paris: Challamel aîné.
[SVK] Sam Thang, S. (1962). Vakyaparivattan Khmaer – Paran and Lexique khmer – français. Phnom Penh/Bhnam Ben: Ronbumb Sam Sapannasar.
[TDC] Tandart, S. (1935). Dictionnaire cambodgien-français. Phnom Penh: Impr. Albert Portail.
[SDK] Sakamoto, Y. (2001). Kanbojiago Jiten [Khmer-Japanese Dictionary]. Tokyo: Tōkyō Gaikokugo Daigaku Ajia Afurika Gengo Bunka Kenkyūjo.

International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_83-1 © Springer-Verlag Berlin Heidelberg 2015

The lexicography of Indonesian/Malay

Deny Arnos Kwary(a)* and Nor Hashimah Jalaluddin(b)
(a) Airlangga University, Surabaya, Indonesia
(b) Fakulti Sains Sosial dan Kemanusiaan, Universiti Kebangsaan, Bangi, Malaysia

Abstract

The lexicography of Indonesian and Malay is closely related. The Indonesian and Malay languages originate from the same language, called Melayu, which was the language of the people who lived on the coastal plains of east and southeast Sumatra and offshore islands. The description of the lexicography of Indonesian/Malay starts with the lexical characteristics of these languages. A general history of the lexicography of Indonesian/Malay is then presented, followed by the specific further development of lexicography in Indonesia and Malaysia, respectively. The third section deals with corpora for both languages. The important role of the language planning institutions in Indonesia (called Badan Bahasa) and in Malaysia (called Dewan Bahasa dan Pustaka) is given due attention, with particular reference to the paper and electronic dictionary products of these institutions. The chapter concludes with future prospects.

Introduction

The Indonesian language and the Malay language share the same origin. Both languages originated with the Malay (Melayu) people who lived on the coastal plains of east and southeast Sumatra and offshore islands (Sneddon 2003, p. 7). By the turn of the twentieth century, the Malay language had two different names (Bahasa Indonesia and Bahasa Melayu) with only slight differences in the vocabulary. At the Second Indonesian Youth Congress in 1928, the delegates proclaimed Bahasa Indonesia as the language of national unity. Bahasa Indonesia then became the national language of Indonesia after its independence in 1945. The name Melayu is retained by Malay people, and the Malay language was declared the national language of Malaysia when it gained its independence in 1957.

The Indonesian language has a considerable number of speakers, given that Indonesia is one of the most populous countries in the world. There are over 240 million Indonesians, so we can assume that the number of speakers is not lower than that. However, their degree of proficiency in Indonesian varies considerably, because most of them have Indonesian as their second language. The first language of most Indonesian people is one of the hundreds of local languages that can be found in Indonesia. According to the data from Ethnologue (http://www.ethnologue.com/), of the 7,105 languages spoken in over 200 countries in the world, 706 are spoken in Indonesia. As for the Malay language, it is spoken by approximately 28 million people. In a wider context, the number of people who speak Indonesian/Malay can reach 400 million, comprising those who live in Indonesia, Malaysia, Singapore, Brunei Darussalam, and Southern Thailand.

Indonesian and Malay are similar in most respects. In terms of orthography, both languages share the same vowels and consonants, so the spellings of most of the words are the same. In terms of phonetics and phonology, there are also considerable similarities. The pronunciation of the words at the segmental level is the same, and the only difference that can probably be noticed is at the suprasegmental level. However, since suprasegmental features are not distinctive features, the difference in pronunciation will not cause any misperception of the words pronounced. Misunderstandings arise only at the level of lexical semantics, because some of the words or cognates have developed different meanings. Differences at the semantic level can also occur due to special contextual use. For example, in Indonesian and Malay, the word lembu “cow” refers to an animal with four legs. However, in some contexts in Malay, the word lembu may also connote bodoh “stupid.” In Indonesian, the word keledai “donkey” is used to connote bodoh “stupid.”

In order to account for dialectal differences of Indonesian/Malay, particularly at the word level, a dictionary known as Kamus Melayu Nusantara has been published by Dewan Bahasa, Brunei Darussalam, which was initiated by Majlis Bahasa Brunei, Indonesia, and Malaysia (MABBIM, “the Language Council of Brunei, Indonesia, and Malaysia”). This dictionary is a combination of two comprehensive dictionaries, i.e., Kamus Dewan (Malaysia) and Kamus Besar Bahasa Indonesia (Indonesia), and additional corpus data from Brunei Darussalam (Omar 2008).

*Email: [email protected]

Description

Lexical Characteristics of Indonesian/Malay

Indonesian/Malay uses the Latin alphabet. There are 26 letters, comprising five vowels (a, i, u, e, o) and 21 consonants (b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z). Of these 26 letters, only the vowel “e” has two different pronunciations, i.e., [e] and [ə], while the other 25 letters have regular pronunciations. This means that the number of sounds (i.e., 27) is quite similar to the number of letters (i.e., 26). In addition, its spelling system is phonemic, so the words can be read without any difficulty. For instance, the word makan “eat” is pronounced exactly as it is written, i.e., [mɑkɑn]. Loan words with complex syllable structures undergo phonological modifications. For example, the English words “bomb,” “method,” and “consonant” become bom, metode, and konsonan in orthography and are pronounced [bom], [metodə], and [konsonɑn], respectively. The modification involves vowel insertion and consonant deletion, which are triggered by the native phonological system.

Based on their position, there are three types of affixes in Indonesian/Malay, i.e., prefix, suffix, and confix, of which the prefix is the most productive. The prefixes pose a challenge to lexicographers: how should complex words with prefixes be placed in a dictionary? The most common method is to place the inflected forms under the lemma. However, this often confuses users or learners, because they may not know the root of the word. Take, for example, the words mengajar “to teach” and mengejar “to chase.” A learner who knows the prefix meng- will know that the root of mengajar is ajar, but this learner may also infer that the root of mengejar is ejar. This is incorrect, because the root of mengejar is kejar. A learner who looks for the word ejar in a dictionary will therefore not be able to find it. There has been a suggestion to include all the inflected forms as headwords in a dictionary. However, this would make the dictionary very thick under the letter M, because most of the verbs in Indonesian/Malay can take the prefix meng-, which has several allomorphs, i.e., meng-, mem-, men-, me-, and menge-. In addition, the passive forms in Indonesian/Malay are formed by adding the prefix di- or ter- to a verb. If these are listed as headwords, the letters D and T in the dictionary will also be very thick. This is not a problem for an electronic dictionary, however, because the lexicographers can enter all the inflected forms as headwords and use cross-referencing to the headwords that contain the roots of the words. For a printed dictionary, especially a learner dictionary, Kwary (2010) suggested the use of an appendix that contains an explanation of the word formation rules. The users, especially those who are not native speakers of Indonesian/Malay, can refer to the appendix when they need to know the root of a particular complex word.
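The cross-referencing idea for an electronic dictionary can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the root lexicon `ROOTS`, the rule table `PREFIX_RULES`, and the function names are hypothetical, and the rules are a deliberate simplification that covers only the allomorphs listed above, assuming that a root-initial k, p, or t may have been dropped after meng-, mem-, and men-. The code proposes candidate roots for a prefixed form and keeps only those attested in the lexicon, so that mengejar is linked to kejar rather than to the unattested ejar.

```python
# Toy root lexicon (hypothetical); a real dictionary would supply its own headwords.
ROOTS = {"ajar", "kejar", "baca", "pukul", "tulis", "lihat"}

# (surface prefix, root-initial consonants that may have been dropped after it).
# Simplified assumption for illustration, not a full account of the allomorphy.
PREFIX_RULES = [
    ("menge", [""]),
    ("meng", ["", "k"]),
    ("mem", ["", "p"]),
    ("men", ["", "t"]),
    ("me", [""]),
]


def candidate_roots(word):
    """Return every root the prefixed form could derive from under PREFIX_RULES."""
    candidates = []
    for prefix, restorable in PREFIX_RULES:
        if word.startswith(prefix):
            rest = word[len(prefix):]
            for initial in restorable:
                candidate = initial + rest
                if candidate and candidate not in candidates:
                    candidates.append(candidate)
    return candidates


def build_cross_references(inflected_forms, roots=ROOTS):
    """Map each inflected headword to the attested roots it should point to."""
    return {
        form: [c for c in candidate_roots(form) if c in roots]
        for form in inflected_forms
    }


if __name__ == "__main__":
    for form, targets in build_cross_references(["mengajar", "mengejar"]).items():
        print(form, "->", targets)
```

With the toy lexicon above, mengajar is cross-referenced to ajar and mengejar to kejar. Filtering the candidates against the list of attested roots is what removes the wrong guess ejar; a printed dictionary cannot do this lookup for the user, which is why an appendix of word formation rules is suggested there instead.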


History of Indonesian/Malay Lexicography

Since Indonesian and Malay are basically the same language that has come to be known under two different names, the discussion of the history of Indonesian/Malay lexicography begins with an overview of the period when the two were still called the Malay (Melayu) language. Kridalaksana (1979), Ahmad (2002), and Omar (2008) have written detailed histories of the development of Indonesian/Malay dictionaries. The lexicography of Indonesian/Malay started with a bilingual dictionary in the form of a simple word list or glossary. The first wordlist ever recorded was compiled by a Chinese trader; it gathers 482 Malay words with Chinese equivalents collected between 1403 and 1511. The second wordlist was collected by Antonio Pigafetta (1522), who listed 426 Malay words with Italian equivalents. In the subsequent years, more wordlists and also bilingual dictionaries were produced for instrumental purposes. In 1603, there was the Spraeck ende woord-boeck, Inde Malaysche ende Madagaskarsche Talen met vele Arabische ende Turcsche Woorden by Frederick de Houtman. Then, in 1623, there was the Vocabularium ofte Woortboeck naer ordre vanden Alphabet in ’t Duytsch-Malaysch ende Malaysch-Duytsch by Caspar Wiltens and Sebastian Danckaerts.

History of Indonesian Lexicography

The first monolingual dictionary compiled by an Indonesian was the Kitab Pengetahuan Bahasa iaitu Kamus Loghat Melayu-Johor-Pahang-Riau-Lingga, penggal yang pertama “A Book of Language Knowledge, that is a Dictionary of the Malay Dialect of Johor-Pahang-Riau-Lingga, part one” by Raja Ali Haji of Riau. This dictionary may have existed before its formal publication year. The year 1345 Hijriah (i.e., 1928 A.D.) is mentioned in the work printed by Al-Ahmadiah Press, Singapore. However, the author actually lived in the first half of the nineteenth century, so it can be assumed that the content of the dictionary was already in circulation during the nineteenth century.

After the Indonesian language was proclaimed the language of national unity in 1928, a few monolingual Indonesian dictionaries were published. The first comprehensive Indonesian dictionary is Kamus Umum Bahasa Indonesia “General Dictionary of Indonesian” (1957) by W. J. S. Poerwadarminta. This dictionary became the main reference for the Indonesian language for many years. In 1974, the Centre for Language Cultivation and Development (called Badan Bahasa since 2010) was established, based on presidential decrees number 44 and 45. According to Kridalaksana (1979), in 1976, the bibliography of Indonesian dictionaries compiled by the Centre for Language Cultivation and Development listed 101 Indonesian-foreign language dictionaries, 137 foreign language-Indonesian dictionaries, and 204 bilingual dictionaries of the local languages.

The most significant dictionary work of the Centre for Language Cultivation and Development is Kamus Besar Bahasa Indonesia (KBBI). The dictionary was launched in 1988 at the Fifth Indonesian National Language Congress. The driving force behind this dictionary was Anton M. Moeliono. In its first edition, this dictionary contained 62,100 entries. This number is rather small compared with the number of entries in other large language dictionaries. However, it was already the most comprehensive dictionary at that time. The second edition of this dictionary appeared in 1993. It was edited by Harimurti Kridalaksana and contained 72,000 entries. The third edition, with 78,000 entries, appeared in 1998 and was edited by Hasan Alwi. The most recent one, the fourth edition, was published in 2008. It was edited by Dendy Sugono, and it contains 90,000 entries.

History of Malay Lexicography

The first Malay monolingual dictionary was Kamus Waman Yatawakkal, which was arranged by Syed Mahmud bin Almarhum Syed Abdul Kadir Al Hindi and published in Singapore in 1894. Strictly speaking, the first monolingual dictionary was compiled after Malaysia gained its independence. Dewan Bahasa dan Pustaka (Language and Literary Agency), henceforth DBP, which was established in 1956, was given the responsibility to produce an authoritative Malay monolingual dictionary for national usage. Teuku Iskandar was the chief editor of the first edition of Kamus Dewan, and he was assisted by A. Teeuw, a Dutch scholar who was appointed by UNESCO as an advisor to this project. The drafting started in 1967 and was completed in 1970 (Baharom 2007).

DBP’s first attempt to produce a bilingual dictionary dates from 1979. It was followed by a joint effort with the Australian National University to compile a comprehensive English-Malay dictionary, which materialized in 1992. This dictionary serves as a useful tool for translators, especially for translation from English to Malay. It focuses on polysemy and word choice in different contexts. To date, DBP has produced nine Malay monolingual dictionaries and six bilingual dictionaries, namely, English-Malay, French-Malay, Thai-Malay, Russian-Malay, Tamil-Malay, and Mandarin-Malay (Padilah 2012). In addition, DBP has also engaged in coining terminologies, which are subsequently compiled in the form of dictionaries. These dictionaries are discipline-specific references, such as those for physics, chemistry, biology, medicine, banking, economy, and linguistics. The main objective of this compilation is to accomplish one of the national aspirations, namely that Malay, as the national language, should be both the language of knowledge and the language of national unity. As far as lexicography is concerned, DBP’s greatest endeavor was to produce Kamus Besar Bahasa Melayu Dewan (KBBM), which is estimated to have about 100,000 entries. Each entry is rich in information, including phonetic transcription, grammatical category, etymology, and jawi transcription, as illustrated in Fig. 1.

Fig. 1 Excerpt from KBBM

Electronic Corpora of Indonesian and Malay

Corpora for Indonesian

Until very recently, electronic corpora did not receive much attention in Indonesia. The discussion on creating an Indonesian language corpus started in the 1990s, but it was only on a small scale, and the corpus was not available in a proper electronic form. One of the reasons for the lack of electronic corpora is the lack of “proper” Indonesian language data to build the corpus. Alwi et al. (1998, p. 1) state that the number of Indonesian L1 speakers is smaller than that of local languages, such as Javanese and Sundanese. In addition, many schools in Indonesia use English as the medium of instruction in order to get the title “international standard school.” Therefore, when a general corpus is created from actual language use, especially from websites, the result will mostly show nonstandard Indonesian. This runs against the policy of Badan Bahasa (the national language body), which implements a prescriptive approach in order to improve the Indonesian language proficiency of the people in Indonesia.

Not surprisingly, therefore, the first Indonesian language corpus was created outside Indonesia by Adam Kilgarriff and his colleagues (2010). The corpus, which is called Indonesian WaC (Indonesian Web as Corpus), consists of approximately 100 million words taken from websites that use the Indonesian language. The corpus can be accessed at the website of the Sketch Engine, i.e., http://www.sketchengine.co.uk/. The second corpus of the Indonesian language, also consisting of approximately 100 million words, was likewise created outside Indonesia, namely, at Leipzig University in Germany (Quasthoff and Goldhahn 2013). The corpus is available at the website http://corpora.informatik.uni-leipzig.de. The texts for this corpus were taken from the websites of Indonesian newspapers and Wikipedia.

These two corpora were used by Kwary (2013) to create the first high-frequency list of the Indonesian language. The first base list consists of 500 word families. This base list is embedded in the modified AWP software, which was originally created by Laurence Anthony from Waseda University. The original software can be downloaded from the website http://www.laurenceanthony.net/software.html (Anthony 2012). The modified version that can be used to check the profile of an Indonesian text is called AWP-IWL (Indonesian Word List) and is available from http://www.kwary.net/iwl.html. An analysis of several general texts shows that the first base list (which consists of only 500 word families) covers more than 60% of the words in the general texts. Further work on the creation of a second base list and an academic word list needs to be done so that the profile of the general texts can be analyzed properly.

Fig. 2 The Malay corpus [flow diagram; box labels: Teks Digital dari Internet (digital texts from the Internet); Pemilihan teks melalui Sistem Korpus (text selection through the corpus system); Data Teks Mentah (raw text data); Teks Digital (Subpangkalan) (digital texts, sub-database); Korpus Pangkalan Data Teks (text database corpus); Sistem Konkordans (concordance system); Sistem Analisis Teks (text analysis system); Baris Konkordans (concordance lines); Maklumat Statistik (statistical information)]
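The notion of coverage used for the Indonesian base list can be made concrete with a small calculation: coverage is the share of running words (tokens) in a text that belong to the list. The Python sketch below is only an illustration under stated assumptions. The function name `coverage`, the four-word `base_list`, and the sample sentence are invented for this example and are not taken from the AWP-IWL software, and the sketch matches surface word forms only, whereas the actual base list is organized in word families.

```python
import re


def coverage(text, base_list):
    """Share of tokens in `text` that appear in `base_list` (surface forms only)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in base_list)
    return covered / len(tokens)


if __name__ == "__main__":
    # Invented toy data: four "high-frequency" forms and one sample sentence.
    base_list = {"saya", "makan", "dan", "di"}
    sample = "Saya makan nasi dan ikan di rumah"
    print(round(coverage(sample, base_list), 2))  # 4 of 7 tokens -> 0.57
```

A profile of a longer text is obtained in the same way, by reporting this proportion for each base list in turn; a second base list or an academic word list would simply be further sets checked against the tokens that the first list leaves uncovered.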

Corpora for Malay

In Malaysia, corpus work started earlier than in Indonesia. DBP first moved to promote the use of corpora in dictionary compilation in the 1980s. The main sources of the corpus were texts from daily newspapers, books (fiction and nonfiction), magazines, classical texts, and translated texts. The process of building the Malay corpus is shown in Fig. 2. This corpus is available online at http://prpm.dbp.gov.my/ and has been widely used for Malay language planning and development (Ghani et al. 2008). The corpus serves as an input for researchers, lexicographers, and academics who are looking for authentic data. Grammarians have used the corpus to describe the canonical behavior of Malay words and phrases based on natural settings and usage. They can also retrieve information on lexicons, root words, derived forms, and phrases, based on the analysis of concordance lines and collocations (Baharom 2007). As shown in Fig. 2, the digitized texts from the Internet together with other sources are kept as an archive. This comprises the raw data that have to go through a selection process done by a corpus system. The data that pass the selection process become the corpus. The corpus can then be queried by using the concordance system or the text analysis system (Ghani et al. 2008). To date, the Malay corpus holds more than 100 million words.

The Malay corpus has been a great help for dictionary compilation. The definition of an entry has changed from an intuition-based description to a corpus-based description. The examples cited are more natural and authentic. The lexicographers have also been able to look for more appropriate synonyms and polysemous senses. The corpus has made the job of lexicographers much easier but at the same time more challenging. As a result, Malay dictionaries have become more reliable and respectable.

Fig. 3 The web page of KBBI
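The concordance lines that lexicographers consult in a corpus such as the Malay corpus can be illustrated with a minimal keyword-in-context (KWIC) routine. The following Python sketch is an assumption-laden illustration and not a description of the DBP concordance system: the function name `kwic`, the `width` parameter, and the two sample sentences are invented for this example.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of `keyword`."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    # Invented sample text; a real query would run over the selected corpus texts.
    sample = "Dia makan nasi. Kami makan ikan di rumah."
    for left, key, right in kwic(sample, "makan", width=2):
        print(f"{left:>12} | {key} | {right}")
```

Each returned triple corresponds to one concordance line. Counting the words that recur in the left and right contexts across many such lines is one simple way to arrive at the collocation and statistical information mentioned above.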

The Language Planning Institutions in Indonesia and Malaysia

Planning in Indonesia

In Indonesia, the language planning institution is called Badan Bahasa. The main task of Badan Bahasa is to develop, cultivate, and preserve Indonesian languages and literature (http://badanbahasa.kemdikbud.go.id/lamanbahasa/sejarah/). Badan Bahasa is under the Ministry of Education and Culture. Badan Bahasa has two divisions: the Centre for Language Development and Preservation and the Centre for Language Cultivation and Socialization. The dictionary work is handled by the subdivision called standardization and preservation under the Centre for Language Development and Preservation. The latest dictionaries produced by this subdivision are as follows (http://badanbahasa.kemdikbud.go.id/lamanbahasa/jenis_produk/Kamus%20Bahasa%20Indonesia):

1. Kamus Besar Bahasa Indonesia (KBBI). This is the comprehensive Indonesian dictionary already mentioned. The dictionary is available in printed form and online; it can be accessed at http://badanbahasa.kemdikbud.go.id/kbbi/. The front page of the website is shown in Fig. 3.
2. Kamus Bahasa Indonesia untuk Pelajar. This is a student dictionary which can be used as a reference by students at junior and senior high schools in Indonesia. This dictionary contains 31,200 entries and is only available in printed form.
3. Glosarium. This is a bilingual dictionary that focuses on scientific terms. It includes terms related to religion, linguistics, mathematics, biology, psychology, etc. It is the main reference for translators who want to know the Indonesian equivalents of foreign terms. This dictionary is available in printed form, on CD, and online at the website http://badanbahasa.kemdikbud.go.id/glosarium/. The front page of the dictionary is shown in Fig. 4.
4. Kamus Bidang Ilmu. This is a set of monolingual dictionaries of scientific terms. It provides explanations of the scientific terms of several disciplines. It is only available in printed form.
5. Kamus Bahasa Daerah. In this project, dictionaries of the local languages of Indonesia are produced. Currently, over forty such dictionaries have been compiled.
6. Kamus Dwibahasa. In this project, several bilingual dictionaries are produced. The first dictionary in this project is an English-Indonesian dictionary. The dictionary is expected to provide the standard Indonesian equivalents for general English words. This dictionary is still being compiled.
7. Tesaurus. This dictionary provides lexical relations among words in the Indonesian language. For each headword, we can find its synonyms, antonyms, hyponyms, and meronyms. This dictionary is available in two printed versions based on the arrangement of the headwords, i.e., alphabetical and thematic.
8. Kamus Pemelajar Bahasa Indonesia. This is the Indonesian learner's dictionary. It is the first Indonesian dictionary created by consulting various Indonesian language corpora. This dictionary is designed with the needs of foreign learners of Indonesian in mind. It is expected to be completed in 2015.

Fig. 4 The web page of Glosarium

Planning in Malaysia

Even though DBP is not the sole producer of dictionaries in Malaysia, the task of compiling a high-quality dictionary falls to it. The Department of Language Development at DBP comprises a lexicography division, a lexicology division, and a dictionary division. These have complementary tasks in ensuring sound dictionary work. In parallel with today’s technological advances, DBP has become more user-friendly, encouraging users to interact with and respond to its dictionary work. As an


initial effort, DBP has set up Sistem Bahasa Melayu Bersepadu (SBMB), or Integrated Malay Language System. SBMB incorporates all the systems, including the dictionary management and organization system, the corpus management and development system, the terminology management and development system, the encyclopedia management and development system, the minority language management and development system, and a “hotline” language service. Users are free to interact with the staff. DBP’s Pusat Rujukan Persuratan Melayu (PRPM) has become the most popular language website in Malaysia. Its website address is http://prpm.dbp.gov.my/. PRPM serves as a one-stop center for all seekers of language information. Any language user can search for a specific section of the site, and the search engine will bring the user to the desired page. Figure 5 shows a page in the dictionary-specific section. Twelve dictionaries can be searched, and information can be extracted from each of them. This has been a great help to users, who can obtain answers to Malay language inquiries within a split second.

Fig. 5 The web page of PRPM for Dictionaries Information

Another promising and exciting avenue is DBP’s Gerbang Kata section. This section is specifically designed for e-dictionary (e-kamus) services. Gerbang Kata is reachable at the website http://ekamus.dbp.gov.my/. It serves as a platform for users to interact with the lexicography unit to discuss, give feedback, contribute ideas, and introduce new words to the unit. Figure 6 is the front page of the website of Gerbang Kata.

Future Prospects of Lexicography in Indonesia and Malaysia

Only very recently has lexicography become one of the taught courses at the tertiary level in Indonesia and Malaysia, from undergraduate to postgraduate levels. In Indonesia, Deny Kwary is assisting a doctoral student from the University of Indonesia in creating new principles for compiling better Indonesian dictionaries for Indonesian language learners. In Malaysia, Jalaluddin et al. (2012) have carried out action research in an attempt to instill interest in dictionary work among postgraduates. This research attempts to motivate students to appreciate the art of compiling a dictionary by introducing


lexicographic practice and theory, blending them together so that students gain a better comprehension of this discipline. Consequently, we may be looking at brighter prospects for lexicography in Indonesia and Malaysia.

Fig. 6 The web page of Gerbang Kata (e-kamus)

There are a number of possible future directions for lexicography in Indonesia and Malaysia. Considering the number of local languages in both countries, we should be looking at the digitization of these local languages, especially the endangered ones, and at further studies on the role of a local language in society. In Malaysia, Jalaluddin et al. (2013) have worked on an endangered language: the research team attempts to relate lifelong learning to the understanding of an aboriginal community, with a specific focus on compiling a dictionary of that community's language. A combination of field research and knowledge of dictionary compilation provides useful insights into an aboriginal people’s language. In addition to the language of the aboriginal people, we are exposed to the intellectual, economic, social, and personal contexts from which their language and values arise. Therefore, this compilation is a two-way learning process: on the one hand, new information is disclosed during fieldwork; on the other, new insights about aboriginal world views are revealed. Fieldwork of the kind conducted by Jalaluddin et al. has also been carried out in Indonesia by Badan Bahasa through its local units, called Balai Bahasa. However, further lexicography training is needed for the staff members of Balai Bahasa in order to document the lexicon of the local languages properly and to reveal the insights of the local people about their lives and their surroundings. 
The proper documentation of the lexicons of the hundreds of local languages in Indonesia will no doubt enrich the comprehensive Indonesian dictionary which “only” consists of 90,000 entries in its latest (fourth) edition (2008). In 2015, a new publication called the Frequency Dictionary Indonesian (Quasthoff et al. 2015) was published by Leipziger Universitätsverlag. This work is the first to use the 100-million-word corpus of the


Indonesian language. It includes both the most frequent 1,000 word forms in order of frequency and data on the relative frequency of 1,000,000 word forms. This publication is now being considered by Badan Bahasa in order to inform the revision of the comprehensive Indonesian dictionary and to prepare for its fifth edition.

References

Ahmad, I. (2002). Perkamusan Melayu: suatu pengenalan. Kuala Lumpur: Dewan Bahasa dan Pustaka.
Alwi, H., Dardjowidjojo, S., Lapoliwa, H., & Moeliono, A. M. (1998). Tata Bahasa Baku Bahasa Indonesia. Jakarta: Balai Pustaka.
Anthony, L. (2012). AntWordProfiler (Version 1.4.0) [Computer Software]. http://www.laurenceanthony.net/antwordprofiler_index.html
Baharom, N. (2007). Perkamusan di Malaysia. In N. H. Jalaluddin & R. Baharudin (Eds.), Leksikologi dan Leksikografi Melayu (pp. 18–52). Kuala Lumpur: Dewan Bahasa dan Pustaka.
Ghani, R. A., Husin, N. M., & Chin, L. Y. (2008). Pangkalan data korpus DBP: Perancangan, pembinaan dan pemanfaatan. In Z. Ahmad (Ed.), Nahu Praktis Bahasa Melayu. Bangi: Penerbit UKM.
Jalaluddin, N. H., Zainudin, I. S., Ahmad, Z., Mohamad, F., Sultan, M., & Radzi, H. M. (2013). The dictionary as a source of a lifelong learning. Paper presented at the 5th World congress on educational sciences, Sapienza University, Rome, Italy.
Jalaluddin, N. H., Zainudin, I. S., Sanit, N., & Yusoff, Y. M. (2012). Teaching and learning lexicography: from impressionistic to systematic understanding. Paper presented at U.K.M. teaching and learning congress, Bentong, Pahang.
Kilgarriff, A., Reddy, S., Pomikálek, J., & Avinesh, P. S. V. (2010). A corpus factory for many languages. Paper presented at the seventh international conference on language resources and evaluation, ELRA, Malta.
Kridalaksana, H. (1979). Lexicography in Indonesia. RELC Journal, 10(2), 57–66.
Kwary, D. A. (2010). Bilingual dictionaries in language cultivation. Paper presented at the Language Planning Symposium, Badan Bahasa, Jakarta.
Kwary, D. A. (2013). Creating and testing the Indonesian High Frequency Word List. Paper presented at the 11th KOLITA (‘Annual Linguistics Conference’), Atma Jaya University, Jakarta.
Omar, A. H. (2008). Perkamusan Melayu: dari jejak pengembara ke pembangunan negara. In N. H. Jalaluddin & R. Baharudin (Eds.), Leksikologi dan Leksikografi Melayu. Kuala Lumpur: Dewan Bahasa dan Pustaka.
Padilah, A. (Ed.). (2012). Meneliti Jejak Membaharui Babak. Kuala Lumpur: Dewan Bahasa dan Pustaka.
Quasthoff, U., Fiedler, S., Hallsteinsdóttir, E., Kwary, D. A., & Goldhahn, D. (2015). Frequency dictionary Indonesian. Kamus Frekuensi Bahasa Indonesia. Leipzig: Leipziger Universitätsverlag.
Quasthoff, U., & Goldhahn, D. (2013). Indonesian Corpora. Leipzig: Universität Leipzig. http://asv.informatik.uni-leipzig.de
Sneddon, J. (2003). The Indonesian language: Its history and role in modern society. Sydney: UNSW Press.

Dictionaries

de Houtman, F. (1603). Spraeck ende woord-boeck, Inde Malaysche ende Madagaskarsche Talen met vele Arabische ende Turcsche Woorden. Amsterdam: Jan Evertsz.
Gerbang Kata. http://prpm.dbp.gov.my/
Glosarium. http://badanbahasa.kemdikbud.go.id/glosarium
Haji, R. A. (1928). Kitab Pengetahuan Bahasa iaitu Kamus Loghat Melayu-Johor-Pahang-Riau-Lingga, penggal yang pertama. Singapore: Al-Ahmadiah Press.
Kamus Bahasa Indonesia untuk Pelajar. http://badanbahasa.kemdikbud.go.id/lamanbahasa/produk/889
[KBBI] Kamus Besar Bahasa Indonesia. http://badanbahasa.kemdikbud.go.id/kbbi
[KBBM] Kamus Besar Bahasa Melayu Dewan. (Forthcoming in 2017). Kuala Lumpur: Dewan Bahasa dan Pustaka.
Kamus Dewan. (1970). Kuala Lumpur: Dewan Bahasa dan Pustaka.
Poerwadarminta, W. J. S. (1957). Kamus Umum Bahasa Indonesia. Jakarta: Balai Pustaka.
Pusat Rujukan Persuratan Melayu. http://prpm.dbp.gov.my/
Syed Mahmud bin Almarhum Syed Abdul Kadir Al Hindi. (1894). Kamus Waman Yatawakkal. Singapore: Al-Ahmadiah Press.
Wiltens, C., & Danckaerts, S. (1623). Vocabularium ofte Woortboeck naer ordre vanden Alphabet in’t Duytsch-Malaysch ende Malaysch-Duytsch. ’s Graven-Haghe: by de Weduwe, ende Erfghenamen van wijlen Hillebrant Jacobssz van Wouw.

International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_91-1 # Springer-Verlag Berlin Heidelberg 2014

The lexicography of indigenous languages in South America

Wolf Dietrich* University of Münster, Münster, Germany

Abstract

The immense diversity of indigenous languages in South America has led to a rich lexicographic production since colonial times, when some of the widespread languages were used for purposes of Christian missionary work. The period of the first dictionaries of Quechua, Aymara, Guaraní, and “Língua Geral” in Brazil is followed by the missionary lexicography of further languages in the nineteenth and twentieth centuries. Dictionaries compiled by linguists, from the SIL and from national universities, have appeared since the 1950s. In recent years the concern about the increasing number of endangered languages has been resulting in more, and more comprehensive, dictionaries of small, hitherto unknown languages, especially of Amazonia. Typological characteristics as well as problems of semantics and pragmatics are the focus of linguistic studies. Ethnolinguists are increasingly involved in teaching programs, which very often include dictionary projects to be carried out together with the indigenous communities. The modern dictionaries of all relevant languages in South America are presented in this chapter.

Introduction

South America is a continent of hundreds of different indigenous languages belonging to more than 30 language families and, moreover, including 60–80 isolated languages. The exact number of languages as a whole, and of isolated languages in particular, is difficult to establish because it depends not only on the definition of what a language is and what a dialect is but also on the insights of linguists who, benefiting from the increasing number of linguistic studies in this field, are recognizing more and more genetic relationships between hitherto isolated languages and larger groups. Thus, the number of isolated languages has diminished within the last decades. On the whole, many South American indigenous languages are highly endangered, and more and more of them are becoming extinct, but the indigenous people who now speak Spanish or Portuguese continue to consider themselves special nations and civilizations, different from the surrounding national societies. Some of the families form larger groups or stocks in genetic classifications:

1. Chibcha languages, grouped in three families, are spoken from Nicaragua to Northern Colombia.
2. Carib languages, grouped in three larger groups and 20 subgroups, extend from the Lesser Antilles to the Caribbean coast of Belize, Guatemala, and Honduras as well as to Northern South America (Colombia, Venezuela, the Guianas) and Northern and Central Brazil.
3. Arawak languages form a stock of five major groups extending from Caribbean islands to Venezuela, the Guianas, all Amazonia in Brazil and Peru, and to Eastern .



4. Tukano languages are grouped in three families, all located in Western Amazonia (borderland between Colombia, Ecuador, Peru, and Brazil).
5. Pano-Takana forms a stock of five families located in Western Amazonia (Brazil, Peru, and Bolivia).
6. Tupí forms a stock of 10 families, nine of them located in Brazil, along the Amazon River; the tenth family, Tupí-Guaraní, has between 40 and 50 living languages spoken from French Guiana, Brazilian and Peruvian Amazonia to Central and Southern Brazil, Eastern Bolivia, Paraguay, and the North of Argentina.
7. Macro-Jê forms a stock of three families of Jê languages and 12 more families of particular languages whose genetic relationship with Jê has not yet been fully demonstrated. Macro-Jê languages are spoken from Northeastern to Southern Brazil.
8. Quechua and Aymara are the major Andean language families. There is probably no genetic relationship between them, but there are deep mutual influences due to centuries of contact. Quechua, or better the Quechuan languages, form a family with two major dialect groups (or languages) and many subgroups resulting from older and more recent spreading from Central Peru (Quechua I) to the North (up to Ecuador and Southern Colombia, Quechua IIA and B) and the South (Southern Peru, Bolivia, Northern Argentina, Quechua IIC). Quechua has nearly eight million speakers. The Aymaran family consists of Aymara itself (two million speakers, mainly in Bolivia, but also in Peru, around Lake Titicaca, and the North of Chile), as well as Jaqaru and Cauqui.
9. Mapudungun, traditionally called Mapuche or Araucano, is the most important indigenous language of Central Chile and of Patagonia (Argentina). It is spoken by 250,000 speakers, with only little dialectal variation.
10. The so-called Chaco languages, languages of the semiarid Chaco area between Eastern Bolivia, the North of Paraguay, and of Argentina, traditionally belong to four families: Mataco-Mataguayo, Northern and Southern Guaicurú, and Enlhet-Enenlhet (formerly Lengua-Maskoy).

Numerous smaller language families are found in Northern South America (including part of Central America), especially in Amazonia. Very often, most of the former languages of the family have died out. They are not counted here:

(a) Misumalpa, two languages left (Honduras, Nicaragua)
(b) Chocó, three languages with many dialects (Pacific coast between Panama and Colombia)
(c) Barbacoa, four languages (Pacific coast of Ecuador)
(d) Guahibo, four languages (borderland between Colombia and Venezuela)
(e) Sáliba or Sáliva, five languages (borderland between Colombia and Venezuela)
(f) Huitoto, two major languages (borderland between Colombia and Peru)
(g) Záparo, two or three languages left (borderland between Peru and Ecuador)
(h) Kawapana, two languages left (Northeastern Peru)
(i) Yanomami, four languages (borderland between Venezuela and Brazil)
(j) Arawá, three or four languages left (borderland between Brazil and Peru)
(k) Makú or Nadahup, five languages (borderland between Brazil and Colombia)
(l) Jívaro, four languages (Southeastern Ecuador)
(m) Chapakura, five languages, some of them moribund (borderland between Brazil and Colombia)
(n) Nambikwara, three languages, several dialects (Rondônia/Mato Grosso, Brazil)
(o) Zamuco-Chamacoco, two languages, several dialects (Bolivia, Paraguay, Brazil)

Some languages that have so far remained isolated are important in their area and/or in linguistic typology:


Páez or Nasa Yuwe (77,000 speakers, valley of Cauca River, Colombia), Kofán (Colombia, Ecuador), Waorani (Eastern Ecuador), Warao (18,000 speakers, mouth of Orinoco River, Venezuela), Tikuna (30,000 speakers, borderland between Brazil, Peru, and Colombia), and Pirahã (Brazilian Amazonia). Especially in the Guaporé region, between Brazil and tropical Bolivia, there are many small hitherto isolated languages, such as Aikanã, Kwazá, Movima, Kanoê, Yurakaré, Itonama, Leko, Mosetén, and Chiquitano.

Lowland indigenous languages generally have between 1,000 and 20,000 speakers; many of them have only reduced groups of 100–600 speakers, and few of them have more than 20,000: Jívaro has 70,000; Tikuna 30,000; Chiriguano (Tupí-Guaraní) about 50,000; and Guajiro or Wayuú (Arawak) 540,000. Highland languages like Quechua and Aymara typically show higher numbers of speakers. Most indigenous languages in South America are endangered whenever and wherever parents no longer speak their language with their children. High numbers of speakers such as those mentioned for Quechua and Aymara gloss over the fact that there is no unified standard language spoken and taught in all places. Even in this case linguistic diversity is a reason for language loss.

Description

Lexical characteristics in the relevant languages

Lexical wealth and scarcity

All South American indigenous languages reflect premodern cultures. Their traditional lexical stocks therefore do not exhibit the technical, scientific, or institutional concepts of modern Western languages. On the other hand, they are generally rich in denominations of the surrounding world (botanical terms, fauna, flora, utensils, attire, kinship, myths, beliefs, etc.). Aikhenvald (2012, pp. 360–361) mentions that over 40 % of the lexicon she gathered for Tariana (Arawak; Aikhenvald 2001) consists of botanical and zoological terms. In the past, ethnologists who studied the material and spiritual life of indigenous cultures did not always have access to all its aspects and often did not learn the language sufficiently; linguists were not always interested in all details of indigenous botanical taxonomies. On the whole, the number of lexical roots is small compared with that of the languages of modern civilizations. Another phenomenon is the scarcity of lexical roots in dying languages, as was observed by Fernández Garay (2004, p. 50) in the case of Tehuelche (Chon family of Southern Patagonia). A characteristic of many South American languages is the use of serial verbs (Aikhenvald and Muysken 2011; Aikhenvald 2012, pp. 304–325). This is not only a matter of syntax but also of lexicon, because many of the unitary concepts in one language are expressed by a complex of actions in another one. Adelaar and Muysken (2004) mention the lack of semantic differentiation as well as the wide spectrum of semantic applications in Quechua. Quechua verb roots are not formally categorized for the transitive/intransitive distinction (e.g., paki- “to break,” tikra- “to turn”). 
The authors observe root economy in Quechua because there is no simple verb for “to kill,” but only a derived form: wañu “to die” and wañu-či “to make die, kill.” They highlight the wide semantic range of, for instance, ñi-/ni- “to say,” which also covers “to answer, ask, tell, ponder, and intend,” so that loanwords from Spanish (kontesta-, pinsa-) are easily adopted, especially by bilingual speakers. Many authors have mentioned the lack of generic terms (hypernyms) such as “animal” in South American indigenous languages. This is true for Quechua (Adelaar and Muysken 2004, p. 234) but also for Paraguayan Guaraní. On the other hand, Tariana (Arawak) has general terms for “game,”


“fish,” “birds,” “snakes,” and “invertebrates” (Aikhenvald 2012, p. 361), just as many Tupí-Guaraní languages have. Quechua is rich in differentiated expressions for “to carry” and “to hold” (Adelaar and Muysken 2004): “to hold in the mouth,” “hold or carry a handful,” “hold on the lap,” “carry in a skirt,” “carry with both hands,” “carry on the back,” and “carry among four (as of a litter).” Jarawara (Arawá), for instance, has several verbs for “to eat,” depending on the nature of the action (with a lot of chewing, with little chewing, by sucking, eating which involves spitting out seeds, etc.; Aikhenvald 2012, p. 360).

Semantic and pragmatic problems

The non-correspondence of the lexical meanings of items in different languages is a well-known characteristic of natural languages. The more different the cultures of the languages under study are, the more significant the semantic non-correspondences may be. This may lead to partial non-correspondences between the two sections of bilingual dictionaries (Erize 1960, p. 432). In many cases, dictionaries have to offer ethnological, cultural, and technical (e.g., botanical and zoological) explanations. The problem of linguistic taboo is mentioned in Caldas et al. (2009, p. 156) and Aikhenvald (2012, pp. 361–365): some informants may not pronounce their name, and women are forbidden to pronounce the names of certain objects or concepts. In many South American, especially Amazonian, languages, there are special languages for men and women (Aikhenvald 2012, pp. 374–375). This may concern parts of the lexicon or the whole language, including pronouns and different kinds of intonation.

Isolating lexical roots

Many dictionary makers discuss the microstructure of their dictionaries. The definition of entries involves the problem of separating simple from inflected or compound forms and the treatment of the different meanings a root may have. The following solution is given by Courtz (2008): “In principle, the shortest form of a lexical unit is chosen as headword. In the case of a noun or verb, this means that the stem of the word is listed, even though the stem may not occur per se in Carib speech or text” (p. 206). Several authors, however, discussing the theoretical problem of obligatory prefixes which occur in many languages, for example, the so-called relational prefixes in Tupí languages, insist on the necessity of marking this at the beginning of the entry (Cabral and Rodrigues 2003, p. 8; Caldas 2010). The problems of isolating simple verbal roots in Karajá (Macro-Jê) are discussed by Maia and Fialho (2002, pp. 123–124). Many languages do not have infinitives as citation forms like Spanish, Latin, or even English. Nor do simple roots occur in many languages with a long lexicographic tradition. In some dictionaries or lexicographical projects, the 3rd person form of the verb was chosen as entry (Silva 2013, pp. 118–19; Ferreira 2009).

Word classes (verbs, nouns, adverbs, pronouns, as well as grammatical words (particles) indicating mood, modality, evidentiality, etc.) are indicated in all modern dictionaries. Nouns in many South American languages show gender, noun classes, or classifiers (Aikhenvald 2012, pp. 279–303; Regúnaga 2012).

Naming new realities

Nearly all South American indigenous languages have loanwords from Spanish or Portuguese, sometimes from French and/or Dutch. Since the first dictionaries of the sixteenth century, they have been integrated and used as if they were native terms. Generally, they were adapted to the phonology of the specific language in the sixteenth and seventeenth centuries but are more and more used in their

original pronunciation and spelling, according to the degree of bilingualism in the particular community. Examples are mentioned in Fernández Garay (2001, pp. 50–51) and Aikhenvald (2012, pp. 378–381).

History of South American lexicography from the sixteenth to the twentieth century

The concept of “general languages”

The “discovery” of the Americas by Europeans and the contact with new languages led to systematic language learning by missionaries during the first half of the sixteenth century. In view of the immense number and diversity of languages, a few widespread languages were chosen for missionary purposes, viz., the languages of the ancient Aztec and Inca empires: Nahuatl, Quechua, and Aymara. Guaraní was likewise understood in many parts of colonial Paraguay, and different dialects of Tupínambá (Tupí-Guaraní family) were spoken along the Atlantic coast of Brazil from South to North. Some of these languages (Nahuatl (but never Maya), Quechua, Aymara, Mapuche, Guaraní, and Tupínambá) soon came into general use as missionary languages in the process of Christianization. Some of them were even called “lenguas generales” in colonial times. Today the term “lenguas generales” is used by linguists when they refer to this situation of historical language contact. With regard to Brazil, linguists speak of “Língua Geral” or “línguas gerais.” Rodrigues (1996) restricts the term to the language of mestizo populations in early Brazil, where the descendants of settlers who had married indigenous women took over the Tupínambá language of their mothers. Rodrigues distinguishes two dialects: “Língua Geral Paulista” in the São Paulo region and “Língua Geral Amazônica” in the Amazon and the State of Maranhão.

A characteristic of most missionary linguistic studies is the omission of all reference to the traditional beliefs of the indigenous people, their myths, and their conception of the world. The goal was missionary work, not the ethnographical study of the indigenous culture. All non-Christian religious concepts as well as “filthy” expressions are omitted in missionary dictionaries. 
Ethnographers generally did not compile more than small vocabularies. Therefore, with regard to South America, documentation by linguists, on the whole, does not begin before the second half of the twentieth century.

The first Quechua dictionaries

The rich lexicography of Quechua begins with the dictionary by Santo Tomás in 1560. Quechua lexicography flourishes until the middle of the eighteenth century and resumes in the middle of the nineteenth century. The dictionary by Santo Tomás (1560) has two parts, a Spanish-Quechua “lexicon” and a shorter Quechua-Spanish “vocabulario.” Holguín’s more comprehensive dictionary (1608) also consists of a Quechua-Spanish “vocabulario” (about 400 pp.) and a Spanish-Quechua “lexicon” (about 300 pp.). It has been a model of Quechua lexicography until recent times. Of course, the work of both authors is the result of preceding missionary efforts in understanding the grammar and lexical semantics of the language, in establishing a writing system, in teaching the language, and in writing down the first texts in Quechua. Like several of their followers, Santo Tomás and Holguín also composed exemplary grammars of the language. Nevertheless, it is quite natural that neither Santo Tomás nor Holguín nor any of their followers up to the 1930s understood all the phonological peculiarities of the language. They did not realize, for instance, the existence of a , a fundamental phonological feature for today’s linguists. They realized, up to a certain degree, the difference between glottalized and non-glottalized stops in Quechua but were not able to describe them adequately. In the nineteenth century the extensive Quechuan studies by the Swiss Tschudi and the German Middendorf included the production of copious dictionaries. After Tschudi (1853),


Middendorf (1890) published a remarkable Quechua-German-Spanish dictionary of 857 pages, based on Holguín, but omitting unusual lexemes and adding new ones. A special focus is on derived and compound verbs. Middendorf was the first to recognize the nature of glottalized stops.

The first Aymara dictionary

The first Aymara dictionary is also the culminating point of the history of Aymara lexicography. Bertonio’s Vocabulario de la lengua Aymara (1612) has two parts: the Spanish-Aymara section has 385 pp. in the modern edition (Bertonio 2005), and its Aymara-Spanish section has about 330 pp. One of its peculiarities is a characteristic of all colonial dictionaries of the sixteenth and seventeenth centuries. It seems that the implicit definition of the entry was not the single root but the contextualized expression: most of the entries are nouns with adjectives, nouns used as arguments of verbs, verbs with objects, and different kinds of adverbials. To give an example, in Bertonio (1612), we find under “ojos” (“eyes”) expressions like “to open/close the eyes,” “to let one’s eyes travel,” “to put out somebody’s eyes,” “small eyes,” “eyes bedewed with tears,” etc.

The first “Língua Geral” and Nheengatu dictionaries

The “Língua Geral” of the Brazilian coast (see Section “Lexical Characteristics in the Relevant Languages”) was first documented in an anonymous dictionary from 1621. Not published until 1938, with a revised edition in the 1950s (Anonymous 1952–1953), this Portuguese-Língua Geral dictionary shows by its very title that “Língua Brasílica” was then spoken as the national low standard language alongside Portuguese as the high standard. The dictionary has the same contextualized kind of entries as Bertonio’s Aymara dictionary (see Section “The First Aymara Dictionary”). The second important example of “Língua Geral” lexicography is another anonymous dictionary, probably written by the monk Frei Onofre, the first part of which was published in Lisbon in 1795. Both parts of the dictionary were published in a revised edition by Plínio Ayrosa ([Frei Onofre] (1934)).

Modern “Língua Geral Amazônica” has been called Nheengatu (“good language”) since the nineteenth century. It was spoken as a lingua franca in the whole of Amazonia between Portuguese officials, mestizo settlers, and local Indians during the Portuguese expansion, and was still so used in the twentieth century. Today it is limited to the Upper Rio Negro region, where it is spoken as a native language by several indigenous groups who gave up their own Arawak or Tukano languages. One example of the rich Nheengatu lexicography from the nineteenth century on is Stradelli (1929). It has a grammatical introduction (pp. 11–72), a Portuguese-Nheengatu (pp. 73–356) and a Nheengatu-Portuguese dictionary (pp. 357–722), with much ethnographical, botanical, and zoological information. The dictionary is followed by traditional Nheengatu tales (pp. 723–768).

The first Guaraní dictionary Antonio Ruiz de Montoya, born in Lima and raised in Tucumán (Argentina), had lived in the Guairá region, east of the River Paraná (in today’s Brazil) since 1612. Montoya, who had learnt Guaraní very well, published a grammar, a dictionary, and a catechism (Montoya 1639; Montoya, Antonio Ruiz de 1639). The first part of his admirable dictionary, Tesoro, is a comprehensive Guaraní-Spanish dictionary (407 pp. in four columns), the second, Vocabulario, is a Spanish-Guaraní dictionary of 13,196 entries. The Tesoro has 5,500 Guaraní entries, generally much longer than the Spanish entries in the Vocabulario. In addition, there are nearly 600 other Guaraní words hidden in the entries of the Tesoro which do not make up entries of their own. Montoya’s dictionaries were models of Guaraní lexicography during the eighteenth and the whole of the twentieth centuries (Dietrich 1995).

Yet one of the disadvantages of taking Montoya as a prototype of Guaraní lexicography is that some of the dictionaries of the first half of the twentieth century are full of archaisms, for instance Jover Peralta and Osuna (1950). For a long time Guaraní scholars did not realize that Old Guaraní, as documented by Montoya, and modern Paraguayan Guaraní are rather different languages, perhaps more than different stages of the same language. The differences have not yet been sufficiently explained. As was the case for Quechua, Montoya likewise did not recognize the nature of the glottal stop in Guaraní, and neither did his followers until the 1940s.

An island Arawak dictionary of the seventeenth century When the Spanish conquista of the Americas began, the islands of the Caribbean Sea were inhabited by Arawak-speaking Indians. At the beginning of the sixteenth century, these were attacked by Caribs from northern South America who killed the men and married their women. Their descendants, calling themselves Caribs, had taken on the Arawak language of their mothers. From that time on there is an Arawak language called Carib. Those Caribs of the Greater Antilles were soon mixed up with black slaves, who also took on Carib as their mother tongue. Their descendants were brought by the English to the coasts of Nicaragua in 1795 and to Belize in 1832. Today Black Carib or Garífuna is still spoken in these places. In 1665, the French missionary Father Raymond Breton, who had lived for many years on the island of Guadeloupe, published a Carib-French dictionary, which is a dictionary of Island Arawak. This dictionary indicates that the extinct Island Arawak is close to modern Black Carib. Breton’s dictionary has 241 pp. in its modern edition (Besada et al. 1999). It presents lots of ethnographical, botanical, and zoological information and gives many examples of the use of the different lexical units. Breton used the French orthography of his time to spell the indigenous language.

Lexicography of Bolivian Tupí-Guaraní languages in the twentieth century As examples of the numerous smaller and more comprehensive dictionaries compiled by missionaries all over South America since the second half of the nineteenth century, three dictionaries of Tupí-Guaraní languages of Bolivia are presented in this section. Chronologically, the first one is the dictionary of Chiriguano, today called Bolivian Guaraní, spoken in the Southeast of Bolivia. Franciscan missionaries, mostly from Italy, had worked among the Chiriguanos and other tribes of the Chaco since the eighteenth century, having built up their center at Tarija in 1755. From the 1880s Alessandro Maria Corrado and Doroteo Giannecchini studied Bolivian Guaraní, published a grammar, and compiled a dictionary. This was completed and published by the Fathers Romano and Cattunar (1916). Following a short grammatical introduction, the authors present a rich Chiriguano-Spanish dictionary (256 pp.), with plenty of ethnographical, botanical, and zoological information. The Spanish-Chiriguano section (199 pp.) is equally rich in references to synonyms and offers many examples of the use of the Chiriguano expressions. The notation of the central vowel /ɨ/ by <ì> is not always clear in the printing. More serious is the unclear notation of vocalic nasality, e.g., hai “tooth/teeth” instead of hãi. Father Alfred Hoeller was an Austrian Franciscan missionary among the Guarayo or Guarayu people from 1929. The Guarayu (Tupí-Guaraní) live in the Eastern Bolivian plains. After a stay of only 3 years, Hoeller published a Guarayu grammar and a dictionary of 356 pp. (Hoeller and Alfredo 1932). Both are the only detailed studies of the language to this day. Hoeller’s dictionary is rich in Guarayu examples. His notation of vocalic nasality is much better than that of Romano and Cattunar, and very often he marks the glottal stop with a hyphen (e.g., pi-a “heart; stomach”) or space (e.g., chẽ ã for [’ʧẽʔã] “my nose”), a really remarkable achievement for his time.

The Siriono language (Tupí-Guaraní) of Eastern Bolivia has been thoroughly documented and analyzed by another Austrian Franciscan missionary, Father Anselm Schermair. Living among the Siriono between 1929 and 1952, he must have learnt the language in all its details. In 1949 he published an impressive grammar, and in 1958 his Siriono-Spanish dictionary (504 pp.; Schermair 1958), which was followed in 1962 by its Spanish-Siriono counterpart (451 pp.; Schermair 1962). His dictionaries are rich in examples as well as in ethnographical information. He was less successful in his notation of Siriono phonemes. Schermair, for instance, did not realize the difference between /i/ and /ɨ/, nor did he see clearly the opposition between /s/ and /ʃ/.

Modern lexicography of indigenous languages in South America Since the 1950s the Summer Institute of Linguistics (SIL, Spanish “Instituto Lingüístico de Verano,” ILV) has been present in many South American countries, especially in Colombia, Peru, and Brazil. Following the period of Catholic missionaries, this Protestant institution carried out a great part of the linguistic studies of South American indigenous languages in the second half of the twentieth century. In addition to grammars and textbooks, the SIL also published dictionaries of numerous indigenous languages, some of them comprehensive dictionaries established on linguistic criteria. Their merit results from the fact that the authors generally lived for a long time among the indigenous people, intensely learning “their” language. Their limitations have been mentioned in Section “Lexical Characteristics in the Relevant Languages”. As recently as the 1970s, there were not that many linguists in South American nations who worked on indigenous languages. Since then their number has increased considerably, and with it the number of dictionaries they have compiled. In most cases the languages have to be documented and analyzed for the first time, writing systems have to be created, etc. All this has to be done through fieldwork, without the possibility of using existing written texts.

Lexicography of Tupí and Tupí-Guaraní languages Paraguayan Guaraní The linguistic situation of colonial Paraguay described in Section “The First Guaraní Dictionary” has developed into Spanish-Guaraní bilingualism of the mestizo population, in Paraguay and surrounding areas, since the beginning of the contact between relatively few Spaniards and the indigenous majority. Today, Guaraní is the second official language of Paraguay, together with Spanish; it is an official language in the Argentine province of Corrientes and is spoken by minorities in the Argentine provinces of Formosa, Chaco, Santa Fe, and Misiones and in parts of Buenos Aires and the South of Mato Grosso do Sul (Brazil). Numerous dictionaries have been published since the 1950s, for instance Jover Peralta and Osuna (1950) and Guasch (1996). Father Guasch has been an institution in Paraguay since the first edition of his dictionary in 1948. It is rich in examples and different syntactic constructions. The 13th edition has been adapted to the newest orthographic rules and includes neologisms, mostly based on Guaraní phrases. The Spanish-Guaraní part has 503 pp., the Guaraní-Spanish part 298 pp. One of the more recent works is the dictionary by the poet and writer Trinidad Sanabria (2002). In the Guaraní-Spanish part (336 pp.), it offers rich information, but the organization of the entries is not always systematic: for example, hũ “black” is taken up again under N, where nahũi “it is not black” is included, as if it were a new item. “It is black” and “it is not black” would have been the better solution in both cases.

Tembé The Tupí-Guaraní languages of Brazil were not studied by scholars for a long time, because the main national interest was the study of Tupinambá and “Língua Geral” (see Section “Lexical Characteristics in the Relevant Languages”). These efforts did not lead to the production of new dictionaries. It was the Tembé language of Northeastern Brazil (State of Maranhão), now seriously endangered, which was documented in a copious dictionary in 1978 (Boudin 1978). The first volume is a Tembé-Portuguese dictionary (344 pp.), the second one a Portuguese-Tembé dictionary (364 pp.). Most entries offer extensive exemplification and reference to Guaraní cognates found in Montoya (1640) and other sources. The spelling is based on that of Brazilian Portuguese in the 1960s. Since this first achievement the situation of Tupí-Guaraní lexicography has improved considerably (cf. Sections “Wayãpi” through “Ka’apor and Tupari”).

Wayãpi The anthropologist Françoise Grenand, who had lived for many years among the Wayãpi (Tupí-Guaraní) of French Guiana, has published a comprehensive dictionary of the language (Grenand 1989), probably the best dictionary of any Tupí-Guaraní language to date. Following a grammatical introduction it begins with a short French-Wayãpi vocabulary (30 pp.), followed by a systematic list of botanical and zoological terms (Latin-Wayãpi, 32 pp.). The 425 pages of the Wayãpi-French dictionary offer rich pragmatic and ethnographical information. The spelling is phonological. Very often the author gives etymological information by making reference to Guaraní cognates found in Montoya (1640), just as Boudin (1978) did (see “Tembé”).

Mbyá Mbyá, a widespread language (Paraguay, Argentina, Southern Brazil, about 15,000 speakers), which is close to Paraguayan Guaraní, has been thoroughly studied by Dooley (2006) in a comprehensive dictionary (206 large-sized pages), well organized, established on linguistic criteria, and introduced by a good grammatical description (143 pp.). Segmentation and syntactical analyses make this dictionary exemplary.

Kawahib Kagwahiva or Kawahib is the name of a complex of closely related Tupí-Guaraní dialects of Rondônia, a state of Northwestern Brazil, surrounded by several very different languages of the Tupí stock. One of the dialects, Parintintin, had been studied by SIL member LaVera Betts since 1961. She published a Parintintin dictionary in 1981, but the posthumous edition (Betts 2012) includes information for most dialects. Though fieldwork for this dictionary was done between 1961 and 1991, it is a highly valuable source of information on this dialect complex, which had been neglected for a long time because the area was not easily accessible. The importance of the dictionary (295 pp.) is increased by its good segmentation of roots and morphemes and by the grammatical information offered in all entries.

Asurini of river Tocantins Similar qualities are found in Cabral and Rodrigues (2003). Asurini (Tupí-Guaraní) is a language spoken by a small group (350 speakers) near the mouth of River Tocantins (Brazil). The dictionary (240 pp.) is preceded by a succinct but very clear grammatical introduction. Each entry offers rich exemplification.

Ka’apor and Tupari The last two examples deal with the problem of how to make a dictionary of a language which had never been studied by linguists, with the focus on describing its lexicon. Both are doctoral dissertations which intend to prepare the field for the analysis and lexicographic treatment of the lexical roots. Therefore, both dictionaries (Alves 2004 and Caldas 2009) are preceded by the analysis of various domains (sociolinguistic, phonological, morphosyntactic, and semantic) of the languages in question: Ka’apor (Tupí-Guaraní) and Tupari (Tupí). Both have a system of reference from simple roots to inflectional forms of the same root, offering photos of botanical and zoological items or specific objects of the indigenous cultures.

Lexicography of smaller language groups Within the last 50 years, linguists from SIL and from universities of different countries have contributed to the lexicography of many indigenous languages belonging to smaller families or being isolates. Only some of them will be mentioned here: the impressive Warao dictionary by Barral (2000), the dictionary of Cabécar (Chibcha) by Enrique Margery Peña (1989), of Achuar-Shiwiar (Jívaro) by Gerhard Fast Mowitz and other SIL members (1996), of Shuar (Jívaro) by the Fathers Pellizzaro and Nàwech (2003, reedited in 2012), of Tukano (Eastern Tukano family) by Henri Ramirez (1997), and of Kubeo (Central Tukano) by Morse et al. (1999). Four exemplary works will be presented in the following sections.

Arabela (Záparo) Only two or three languages are left from the small Záparo family (Northeast of Peru and Southeast of Ecuador): Arabela, Iquito, and Sápara. The language of the last speakers of Sápara is documented in a dictionary by Ruth Moya (2009) and in another one by Beier et al. (2014). Arabela, spoken by fewer than 100 people, is documented in a copious dictionary (Rich 1999). Pages 21–97 contain a grammatical introduction, pp. 110–445 the Arabela-Spanish section, and pp. 446–643 the Spanish-Arabela counterpart.

Yanomamɨ The Yanomami family consists of four related languages spoken on both sides of the Venezuelan/Brazilian border. One of the languages is Yanomamɨ, which has about 15,000 speakers in Venezuela and 1,900 in Brazil. Following two earlier Yanomamɨ dictionaries by Jacques Lizot (1975 and 2003), the French-Venezuelan anthropologist and ethnolinguist Marie-Claude Mattéi-Muller published a comprehensive Yanomamɨ dictionary (Mattéi-Muller 2007), which is intended to be a handbook of Yanomamɨ culture. It is illustrated by photos and has, in its appendix (pp. 593–693), a systematic list of fauna and flora terminology. The dictionary results from the collaboration of the indigenous community with the author.

Wanano (Tukano) Wanano or Kotiria is an important regional language of the upper Vaupés River basin. It belongs to the northern group of Eastern Tukano languages. The comprehensive dictionary by Waltz et al. (2007) has two unequal parts: Wanano-Spanish with 320 pp. and Spanish-Wanano with 85 pp. It has many examples and offers plenty of ethnographical material. It is intended to be used by Wanano people as well as by linguists.

Yuhupdeh Until the 1980s the Makú or Nadahup family was hardly known, and its languages were not documented, analyzed, and classified until the 1990s. One of its languages is Hupda or Hupd’äh, which has been documented by Henri Ramirez in 2006. Another one is Yuhupdeh (from jùhúp “people” + -dêh “plural”). Both are spoken in the upper Rio Negro and Vaupés River basin in Brazil as well as on the other side of the border, in Colombia. The copious Yuhupdeh dictionary by Silva and Silva (2012) has been compiled by the authors in collaboration with the community. It has an ethnographical and grammatical introduction (132 pp.), a Yuhupdeh-Portuguese dictionary with plenty of examples, well organized on linguistic criteria (208 pp.), and a glossary of semantically arranged terms, including terminology of myth and shamanism (138 pp.). The final part is a Portuguese-Yuhupdeh dictionary (58 pp.). The book is illustrated with impressive photos.

Lexicography of major families (Carib, Arawak, and Pano) It may be surprising that the lexicography of the major language families, in proportion to that of the smaller groups mentioned in Section “Lexicography of Smaller Language Groups”, does not present a comparable number of outstanding achievements. Some indigenous communities have been in contact with the surrounding national society since colonial times, or, if not, for at least a hundred years. This is the case for some Chibcha nations between Nicaragua and Colombia, Carib groups of Venezuela and the Guianas, Arawak groups of the Guianas and Peru, the Aymaras of Bolivia and Peru, the Mapuche of Chile, and also for Paraguayan Guaraní (see Section “Paraguayan Guaraní”). As a consequence of the availability of reasonable dictionaries used in missionary stations and at schools, there probably has been less motivation for linguists to study relatively well-known languages. Jê and Macro-Jê languages are a special matter: some of them have been briefly described by ethnologists since the end of the nineteenth century, for instance, by Paul Ehrenreich or Curt Nimuendajú. On the whole, the nature of the family was recognized late. Linguistic studies are recent and have not yet led to significant lexicographic production. The most interesting dictionaries are those for languages the classification of which is still being discussed, such as Djeoromitxi and Arikapú.

Carib family Mattéi-Muller (1994) is an outstanding dictionary of Panare or E’ñepa, a language spoken by about 3,000 people in Central Venezuela. A brief linguistic introduction is followed by a Panare-Spanish (273 pp.) and a Spanish-Panare dictionary (pp. 275–414), with rich exemplification and ethnographical information. Courtz (2008) originally was a Leiden University PhD thesis. Its first part is a grammar of the Kari’ña language of Suriname and French Guiana, and the second part is a comprehensive inventory of Kari’ña lexemes and affixes, with references to cognates in other Carib languages.

Arawak family Since 1980, about 15 dictionaries of some importance have been produced for the Arawak languages. For instance, Aikhenvald (2001) is a thematic dictionary of Tariana, spoken by a minority of the Tarianas (100 people) in the Upper Vaupés River (Amazonia, Brazil). The majority of the Tarianas have adopted one of the Eastern Tukano dialects of their surroundings. The sections of the dictionary are organized according to semantic fields covering most of the indigenous culture, including terms for fauna and flora, and extensive exemplification by short texts. As another example, Silva (2013) contains the first modern description of Terena, a Southern Arawak language spoken by 16,000 people, mostly in Mato Grosso do Sul (Southwest of Brazil). The second part (pp. 138–271) is a “proposal” for a Terena-Portuguese dictionary. What has been achieved is satisfactory because of the richness of linguistic and ethnographical information, including texts on ethnographical issues.

Pano-Takana family Most members of the Pano-Takana stock have been documented and analyzed only recently. Though the Pano family was known by Spanish missionaries since the eighteenth century, there were only a few linguistic studies before 1980. Two examples among the nine dictionaries published since 1981, six of them by SIL members, are presented here: Loos and Hall de Loos (2003) is the result of research between 1954 and 1983 among the Capanahua or Kapanawa, who live in Northeastern Peru. The Capanahua-Spanish dictionary (pp. 63–403) contains about 5,000 roots, most of the everyday lexicon, including copious exemplification. The second part is a Spanish-Capanahua dictionary (pp. 405–495), followed by lists of suffixes and Spanish regionalisms, mostly from Quechua, which refer to fauna and flora terminology. Ferreira (2005) is a first approach to a dictionary of Matis, a Pano language of the extreme West of the Brazilian state of Amazonas (River Javarí valley). Constant contact with the Matis began only in the 1970s. Preceded by lexicographic theory and a grammatical introduction, Ferreira’s Matis-Portuguese dictionary contains 1,547 entries, including exemplification and ethnographical information. A second major dictionary of a Pano language is the Matsés-Spanish dictionary by Fleck and two indigenous authors (2012). It has about 3,380 entries. Matsés belongs to the same Mayoruna group as Matis and is likewise spoken in the upper Río Yavarí and Galvez region, on the Peruvian side of the rivers. Fleck, who lives with the Matsés, is also the author of an impressive reference grammar of the language (2003).

Lexicography of Chaco languages Some of the indigenous nations of the Chaco had been known to missionaries since the eighteenth century, especially the Abipon and Guaicuru Indians. Chaco languages have been studied by missionaries since the 1880s, whereas documentation by linguists began in the 1950s. Three remarkable examples from the limited lexicographic production in this field will be presented. Chronologically, the first important dictionary is Seelwische (1990), a comprehensive inventory of Nivaclé (formerly also Chulupí, Matako-Mataguayo family), spoken in the Chaco of Paraguay and Argentina. Its Nivaclé-Spanish section (pp. 17–259) offers good exemplification. In the Spanish-Nivaclé section (pp. 261–469), many entries are repeated in order to show the use in other contexts. The final section is a short Guaraní-Nivaclé word list. The dictionary of Northern Lengua (Enlhet) by Unruh and Kalisch (1997) is rather unique because it is a monolingual dictionary (903 pp.), with scarce reference to Spanish, intended for teaching Enlhet in indigenous schools. It contains many archaisms; the use of all lexemes is documented by texts of tales gathered among the members of the community. Ana Gerzenstein was the founder of the productive Buenos Aires center of Amerindian linguistic studies. Her dictionary of Maká (DELME 1999), a Matako-Mataguayo language spoken by 1,500 people in the Paraguayan Chaco and in the capital of Paraguay, Asunción, is a copious inventory of the language. After a useful historical, sociolinguistic, and grammatical introduction (pp. 17–107), the Maká-Spanish (pp. 109–399) and the Spanish-Maká sections (pp. 401–551) are organized on the basis of word families, including all kinds of derivations and compounds. There is plenty of ethnographical information.

Lexicography of Quechua, Aymara, and Mapuche Within the long lexicographic tradition of Quechua (called runa simi “language of man”) since the sixteenth century, but especially since the 1850s, there has been a rich production of dictionaries (see Section “The First Quechua Dictionaries”). This may be a consequence of the official status of this language in Bolivia, Ecuador, and Peru, along with the . Since the 1970s there has been a focus on dialectal varieties of Quechua (Ecuadorian, Cajamarca, Ancash, Santiago del Estero (Argentina), etc.). The most important dialect, with the largest number of speakers and the most important cultural and literary legacy, is that of Ayacucho-Cusco. It is documented by Academia Mayor de la Lengua Quechua (2005), a copious Quechua-Spanish dictionary of the prestige language (772 pp.), with neologisms in Quechua. All kinds of compounds and derived forms with special meanings are listed in different entries. The Spanish-Quechua section (pp. 773–928) offers a lot of ethnographical information. One of the highlights of Quechua lexicography is the impressive dictionary by Father Lira (1944), with its 1,199 pages, reedited in 2008 and adapted to modern spelling rules, including a shorter Spanish-Quechua section. The second outstanding achievement is Calvo-Pérez (2009), an extensive dictionary of 2,490 pages, with a focus on terminology in both of its parts. It is the most complete dictionary of Quechua, intended to be useful in modern contexts as well. In Aymara, there has been an immense gap in lexicographic production between Bertonio (1612), see Section “The First Aymara Dictionary”, and recent years. The most important Aymara dictionaries published in the 1980s are reviewed by Orellana de Quineche (2000). Among the three dictionaries discussed, Büttner and Condori (1984) is viewed as the richest one (9,695 entries). Since then there has not been any other noteworthy lexicographic achievement, not even in Bolivia, where Aymara is an official language. Mapuche, today called Mapudungun, traditionally Araucano, has been documented since Luis de Valdivia’s grammar (1606). The German Jesuit Bernhard Havestadt (1777) published an extensive work about Chile. Its second volume (pp. 604–807) contains a copious vocabulary of Mapudungun, with good explanations, in Latin, of the use of the words. Thereafter, there was no lexicographic activity until the dictionary by Brother Augusta (1916). Both volumes (Spanish-Mapudungun, 291 pp., and Mapudungun-Spanish, 421 pp.) show an unusual and unclear spelling of Mapudungun, and many common words are lacking. Erize (1960) is a comprehensive Mapudungun-Spanish dictionary including dialectal varieties (430 pp.), offering much ethnographical information and followed by a short Spanish-Mapudungun section. Since Mapudungun is an important areal language, a certain number of dictionaries written by linguists have been published since the 1990s, but none is a major achievement.

Electronic corpora of texts Sources for electronic databases can be found at the Documentation of Endangered Languages Programme (www.dobes.mpi.nl). The homepage offers links to several documentation projects of South American languages. In Brazil, www.etnolinguistica.org contains a databank of linguistic dissertations, an online journal, Cadernos de Etnolingüística, and links to other organizations; www.enlhet.org offers texts in languages of the Enlhet-Enenlhet family of Chaco languages.

Future prospects Considering the more or less serious endangerment of all indigenous languages of South America, the most urgent task is the comprehensive documentation and analysis of the languages. As has been done on occasion in several recent projects, future dictionaries will be developed mainly in collaboration with indigenous communities. Other projects will revolve around the publication of (anonymous) dictionary manuscripts from the seventeenth and eighteenth centuries. At present, three of the six known dictionary manuscripts of the Língua Geral Amazônica of the eighteenth century are being prepared for publication by a group of Brazilian and European specialists.

Language planning institutions and academies Given the nature and present situation of most indigenous languages, language planning institutions are generally not feasible. They are possible only in countries where one or more of the indigenous languages have an official status. But even there one of the major problems is the lack of language standardization (Quechua in Peru, for instance). Nevertheless, the Academia Mayor de la Lengua Quechua (AMLQ, Cusco, Peru) is doing its best in matters of language policy. There are initiatives for creating academies of Aymara in Bolivia and the north of Chile. On the basis of Ley de Lenguas (“Languages Act”, 2010), the Academia de la Lengua Guaraní (Ava Ñe’ẽ Rerekua Pavẽ) of Asunción (Paraguay) was founded in 2012. Its tasks are the standardization of the language and its lexical enrichment.

References

Adelaar, W. F. H., with the collaboration of Muysken, P. C. (2004). The languages of the Andes. Cambridge: Cambridge University Press.
Aikhenvald, A. Y. (2012). The languages of the Amazon. Oxford: Oxford University Press.
Aikhenvald, A. Y., & Muysken, P. C. (Eds.), with the assistance of Joshua Birchall. (2011). Multi-verb constructions. A view from the Americas. Leiden/Boston: Brill.
Caldas, R. B. C. (2010). Dicionários bilíngues: uma reflexão acerca do tratamento lexical em línguas Tupí. In A. S. A. C. Cabral, A. D. Rodrigues, & F. B. Duarte (Eds.), Línguas e Culturas Tupí (Vol. 2, pp. 105–115). Campinas/Brasília: Editora Curt Nimuendajú/LALI.
Dietrich, W. (1995). La importancia de los diccionarios guaraníes de Montoya para el estudio comparativo de las lenguas tupí-guaraníes de hoy. Amerindia, 19/20, 287–299.
Ferreira, Vitória Regina Spanghero (2009). Considerações sobre trabalhos acadêmicos: o léxico das línguas brasileiras. Guavira 8 (Campo Grande, Brazil), pp. 29–39.
Fleck, D. W. (2003). A reference grammar of Matsés. Ph.D. thesis. Houston: Rice University.
Maia, M., & Fialho, Maria Helena de Sousa. (2002). Problemas e soluções do dicionário Karajá. In Línguas Indígenas Brasileiras. Fonologia, gramática e história. Atas do I Encontro Internacional do Grupo de Trabalho sobre Línguas Indígenas da ANPOLL (Vol. I, pp. 118–131). Belém: UFPA.
Orellana de Quineche, A. (2000). Diccionarios aimaras modernos. In L. Miranda (Ed.), Actas del I Congreso de lenguas indígenas de Sudamérica (Vol. II, pp. 373–379). Lima: Universidad Ricardo Palma.
Regúnaga, M. A. (2012). Tipología del género en lenguas indígenas de América del Sur. Bahía Blanca (Argentina): Editorial de la Universidad del Sur.
Rodrigues, A. D. (1996). As línguas gerais sul-americanas. Papia, 4(2), 6–18.

Dictionaries

Academia Mayor de la Lengua Quechua. (2005). Diccionario quechua-español-quechua–qheswa-español-qheswa simi taqe. Cusco: Gobierno Regional Cusco.
Aikhenvald, A. Y. (2001). Dicionário Tariana. Boletim do Museu Paraense Emílio Goeldi (Vol. 17, 1, pp. 3–389). Série Antropologia, Belém, Pará (Brazil).
Alves, Poliana Maria (2004). O léxico do Tupari: Proposta de um dicionário bilíngüe. Unpublished doctoral dissertation, São Paulo: Universidade de Araraquara.
Anonymous. (1952–1953). Vocabulário na língua brasílica. 2ª edição revista e confrontada com o Ms. fg., 3144 da Bibl. Nacional de Lisboa por Carlos Drumond. Vol. 1 (A-H), São Paulo: Universidade de São Paulo, Faculdade de Filosofia, Ciências e Letras, Boletim 137; vol. 2 (I-Z), Boletim 164. Manuscript from 1621.
Augusta, F. F. J. d. (1916). Diccionario araucano-español y español-araucano. Santiago de Chile: Impr. Universitaria (Reprint Temuco: Kushe, 1991).
de Barral, B. (2000). Diccionario Warao-Castellano, Castellano-Warao. Caracas: Universidad Andrés Bello.
Beier, C., Bowser, B., Michael, L., & Wauters, V. (2014). Diccionario Záparo Trilingüe, sápara-castellano-kichwa, castellano-sápara y kichwa-sápara. Quito: Abya-Yala.
Bertonio, Ludovico, S. I. (1612). Vocabulario de la lengua Aymara. Juli: Chucuito. (Re-edition by Enrique Fernández García, S. I., Arequipa, Peru: El Lector, 2005).
Besada, M. et al. (Eds.) (1999). Dictionnaire caraïbe-françois du Père Raymond Breton, 1665. Paris: IRD/Karthala (Original publication at Auxerre: Gilles Bouquet, 1665).

Page 14 of 17 International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_91-1 # Springer-Verlag Berlin Heidelberg 2014

Betts, L. V. (2012). Kagwahiva dictionary. Anápolis: SIL. Boudin, M. H. (1978). Diccionário de Tup´ı moderno. Dialeto tembe´-te´nête´har do alto do rio Gurupi (Vol. I-II). São Paulo: Conselho Estadual de Artes e Ciências Humanas. Buttner,€ T. T., & Condori, D. (1984). Diccionario aymara-castellano. Puno: Proyecto Experimental de Educación Bilingue.€ Cabral, Ana Suelly A. C. & Rodrigues, Aryon D. (2003). Dicionário Asurin´doı Tocantins – Português. Belém/Brasília: UFPA/IFNOPAP/LALI. Caldas, Raimunda Benedita Cristina. (2009). Uma proposta de dicionário para a l´nguaı Ka'apor. Doctoral dissertation. Brasília: Universidade Nacional de Brasília. Calvo-Pérez, J. (2009). Nuevo Diccionario Espan˜ol-Quechua (vols. 1–3), Quechua-Espan˜ol (vols. 4–5). Lima: Universidad San Martín de Porres. Courtz, H. (2008). A Carib grammar and dictionary. Toronto: Magoria Books. Gerzenstein, A. (1999). Diccionario etnolingu€´sticoı maká-espan˜ol, ´ndiceı espan˜ol-maká. Buenos Aires: Universidad de Buenos Aires. Dooley, R. A. (2006). Le´xico Guaran´,ı Dialeto Mbyá: Introdução, Esboço gramatical, Le´xico. Cuiabá: SIL. Erize, E. (1960). Diccionario comentado Mapuche-Espan˜ol, Araucano, Pehuenche, Pampa, Picunche, Ranculche,€ Huilliche. Bahía Blanca: Cuadernos del Sur. Fast Mowitz, G., Fast, R. W. d., & Fast Warkentin, D. (1996). Diccionario Achuar-Shiwiar –- Castellano. Yarinacocha: Instituto Linguístico€ de Verano. Fernández Garay, A. (2001). Ranquel-Espan˜ol/Espan˜ol-Ranquel: Diccionario de una variedad mapuche de La Pampa (Argentina). Leiden: CNWS Publications. Fernández Garay, A. (2004). Diccionario Tehuelche-Espan˜ol/Índice Espan˜ol-Tehuelche. Leiden: CNWS. Ferreira, Vitória Regina Spanghero (2005). Estudo lexical da l´nguaı Mat´s:ı subs´diosı para um dicionário bilingue€ . Doctoral dissertation. Campinas: UNICAMP. Fleck Zuazo, D. W., Fernando Shoque, Uaqui B€eso, & Daniel Manquid, Jiménez Huanán. (2012). 
Diccionario Matse´s-Castellano con ´ndiceı alfabe´tico castellano-matse´s e ´ndiceı semántico castellano-matse´s. Punchana: Tierra Nueva [Frei Onofre]. (1934). Diccionario Portuguez-Brasiliano e Brasiliano-Portuguez. In Plínio Ayrosa (Ed.). Revista do Museu Paulista 18, pp. 17–322. Grenand, F. (1989). Dictionnaire wayãpi-français. Lexique français-wayãpi (Guyane Française). Paris: Peeters/SELAF. Guasch, P. A. (1996). Diccionario castellano-guaran´ı y guaran´-castellanoı (13th ed.). Asunción: CEPAG. Havestadt, Bernhard. (1777). Chilidu´ǵu, Sive Res Chilenses Vel Descriptio Status tum naturalis, tum civilis, cum moralis Regni populique Chilensis, inserta suis locis perfectae ad Linguam Chilensem Manuductioni. Monasterii Westphaliae: Typis Aschendorfianis. Hoeller, P., & Alfredo F. (1932). Guarayo-Deutsches Wo¨rterbuch. Guarayos (Bolivia) and Hall (Tirol): Missionsprokura der PP. Franziskaner. Holguín, Diego González (1608). Vocabulario de la lengua general de todo el Peru llamada Qquichua, o del Inca. Lima: Francisco del Canto. Jover Peralta, A., & Osuna, T. (1950). Diccionario guaran´-espanı ˜ol y espan˜ol-guaran´ı. Buenos Aires: Tupã. Lira, Jorge A. (1944). Diccionario kkechuwa-espan˜ol (3 Vols). Tucumán, Argentina: Universidad Nacional de Tucumán. (Re-edition by Mario Mejía Huamán, Diccionario quechua-castellano, castellano-quechua, Lima: Editorial Universidad Ricardo Palma, 2008).

Page 15 of 17 International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_91-1 # Springer-Verlag Berlin Heidelberg 2014

Lizot, J. (1975). Diccionario Yanomami-Espan˜ol. Caracas: Universidad Central de Venezuela. Lizot, J. (2003). Diccionario Enciclope´dico de la Lengua Yãnomãmɨ. Puerto Ayacucho, Venezuela: Vicariato Apostólico. Loos, E. E., & Hall de Loos, B. (2003). Diccionario capanahua-castellano (2ath ed.). Lima: ILV. versión electronica ilustrada. Margery Peña, E. (1989). Diccionario Cabe´car-Espan˜ol, Espan˜ol-Cabe´car. San José: Editorial de la Universidad de Costa Rica (Re-edition 2004). Mattéi-Muller, Marie-Claude, con la colaboración de Paul Henley y Prajedes Salas (1994). Diccionario Ilustrado Panare-Espan˜ol, ´ndiceı Espan˜ol-Panare, un aporte al estudio de los Panare-E’n˜epa. Caracas: Gráfica Armitano. Mattéi-Muller, Marie-Claude, & con la colaboración muy especial de Jacinto Serow€e. (2007). Lengua y Cultura Yanomamɨ. Diccionario Ilustrado Yanomamɨ-Espan˜ol, Espan˜ol-Yanomamɨ. Caracas: Epsilon Libros. Middendorf, E. W. (1890). Das Wo¨rterbuch des Runa Simi oder der Keshua-Sprache. Leipzig: Brockhaus. Morse, N. L., Salser, J. K., Jr., & de Salser, N. (1999). Diccionario ilustrado bilingue€ cubeo-espan˜ol, espan˜ol-cubeo. Bogotá: Editorial Alberto Lleras Camargo. Montoya, Antonio Ruiz de (1639). Tesoro de la lengua guaran´ı. Madrid: Iuan Sanchez. (Re-edition by Bartomeu Melià, Asunción: CEPAG, 2011). Montoya, Antonio Ruiz de (1640). Vocabulario de la lengua guaran´ı. Madrid: Iuan Sanchez. (Re-edition by Bartomeu Melià, Asunción: CEPAG, 2003). Moya, R. (2009). Diccionario trilingue€ Sápara-Castellano-Quichua. Pana Sápara Atupama – Nuestra lengua sápara. Quito: Ministerio de Educación del Ecuador – Voluntad. Pellizzaro, S. M., & Nàwech, F. O. (2003). Chicham. Diccionario enciclope´dico shuar-castellano. Sucúa: Wea-nekáptai (Re-edition Quito: Abya-Yala, 2012). Ramirez, H. (1997). A Fala Tukano dos Ye’pa-masa.^ Tomo II: Dicionário. Manaus: Inspetoria Salesiana Missionário da Amazônia – CEDEM. Ramirez, H. (2006). Al´nguaı dos Hupd’ah€ do alto Rio Negro. 
Dicionário e guia de conversação. São Paulo: Saúde sem Limites. Rich, R. G. (1999). Diccionario arabela-castellana y castellano-arabela. Lima: ILV. Romano, S., & Cattunar, H. (1916). Diccionario chiriguano-espan˜ol y espan˜ol-chiriguano. Tarija: Colegio de Santa María de los Ángeles. Santo Tomás, Fray Domingo de. (1560). Lexicon, o vocabulario de la lengua general del Perv. Valladolid: Francisco Fernandez de Cordoua. (Critical re-edition by Calvo Pérez, Julio & Urbano, Henrique. 2 vols. Lima: Universidad de San Martín de Porres, 2013). Schermair, P. Fray Anselmo. (1958). Vocabulario sirionó-castellano. Innsbruck: Universit€at Innsbruck. Schermair, P. Fray Anselmo. (1962). Vocabulario castellano-sirionó. Innsbruck: Universit€at Innsbruck. Seelwische, J. (1990). Diccionario nivacle´-castellano, castellano-nivacle´. Asunción: CEADUC. Silva, C., & Silva, E. (2012). Al´nguaı dos yuhupdeh: introdução etnolingu´stica,ı dicionário yuhup- português e glossário semantico-gramatical^ . São Gabriel da Cachoeira: Gráfica e Editora Del Rey. Silva, D. (2013). Estudo lexicográfico da l´nguaı Terena. Proposta de um dicionário bilingue Terena- Português. Unesp: Araraquara-SP. Stradelli, Ermano (1929). Vocabularios da Lingua Geral. Portuguez-Nheêngatu e Nheêngatu- Portuguez. Precedidos de um esboço de Grammatica Nheênga-umbuê–saua-mirî e seguidos de

Page 16 of 17 International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_91-1 # Springer-Verlag Berlin Heidelberg 2014

contos em lingua geral nheêngatu-por^anduua. Revista do Instituto Histórico e Geográfico Brasileiro 104, vol. 158 (Rio de Janeiro). Trinidad Sanabria, L. (2002). Gran diccionario Avan˜e'ê. Guaran´-castellano,ı castellano-guaran´ı. Buenos Aires: Editorial Ruy Díaz. Tschudi, Johann Jakob von. (1853). Die Kechua-Sprache. Dritte Abtheilung: Wo¨rterbuch. Wien: K. u.k. Hof- und Staatsdruckerei. Unruh, E., & Kalisch, H. (1997). Moya’ansaeclha’ Nengelpayvaam Nengeltomha Enlhet.Ya’alve- Saanga (Paraguayan Chaco): Comunidad Enlhet. Valdivia, Luis de (1606). Arte y gramática de la lengua general que corre en todo el Reyno de Chile con un vocabulario y confessionario, juntamente con la Doctrina Christiana y Cathecismo del Concilio de Lima en espan˜ol y dos traducciones de´l en lengua de Chile. Lima. (Re-edition Sevilla: Thomás Lopez de Haro, 1684). Waltz, N. E., Simmons de Jones, P., & de Waltz, C. (2007). Diccionario bilingue€ Wanano o Guanano-Espan˜ol, Espan˜ol-Wanano o Guanano. Bogotá: Fundación para el Desarrollo de los Pueblos Marginados.

International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_92-1 © Springer-Verlag Berlin Heidelberg 2015

The lexicography of indigenous languages in Australia and the Pacific

Nick Thieberger* University of Melbourne, Melbourne, Australia

Abstract

The Australia and Pacific region is home to nearly a quarter of the world’s languages. Wordlists of a few of these languages date back to the first European explorers, while detailed dictionaries have been prepared for fewer than 5% of them. Where an indigenous language is the official language of a country of this region, it is more likely to have a dictionary and ongoing administrative support for lexicographic work and, in a few cases, a corpus from which terms can be sourced. For most indigenous languages, dictionaries are prepared in the course of language documentation efforts by researchers from outside the speech community, using modern lexicographic database tools and resulting in structured lexicons. As a result, these dictionaries can be produced in various output formats, including print-on-demand, multimodal webpages, and delivery on mobile devices, all increasingly popular. A major use of these dictionaries can be to support vernacular language programs in schools. This region was a test bed for computational bilingual lexicography, and is home to the two largest comparative lexical databases of indigenous languages.

Introduction

The Australia and Pacific region is home to nearly a quarter of the world’s languages. This figure relies on Ethnologue’s list of languages (Lewis et al. 2013), subtracting metropolitan languages (French, English, Hindi, Malay, Mandarin, Hakka, Javanese, etc.), Pidgin/Creole varieties (Tok Pisin, Bislama, etc.), and sign languages (New Zealand, Australian), as follows: PNG 846; Australia 384; Pacific 274. For the purposes of this chapter the island of New Guinea is not dealt with as a whole, that is, the languages of West Papua are not included here. The languages of this region, most of which remain poorly recorded or completely unrecorded, can be divided into three major groups: Australian, Papuan (an umbrella term for some 300 non-Austronesian languages of Papua New Guinea, Indonesia and the Solomon Islands that include several clear groups, in addition to a number of languages among which no current relationships are known), and Austronesian, the latter usually being further subdivided into Micronesian, Polynesian, and Melanesian (also known as Oceanic). For an overview of languages of Australia see Dixon (2002), of Papuan languages see Foley (1986), and of languages of the Pacific see Lynch (1998). Excluded from this chapter, with its specific focus on indigenous languages, are dictionaries of postcontact pidgin or creole languages, such as Kriol (northern Australia), Torres Strait Broken, Bislama, Tok Pisin, and Solomons Pijin (cf. http://www.vanuatu.usp.ac.fj/sol_adobe_documents/usp%20only/pacific%20languages/slone.htm). The Pacific Ocean covers somewhere around 170 million square kilometers and the 25,000 islands that make up the countries of the region take up just 1/300th of that surface area. Australia’s land mass of 7.7 million square kilometers and PNG’s mountainous interior pose similar problems for research.
*Email: [email protected]

This geographic dispersion also has implications for the number of languages spoken and for the delivery of information to speakers who are often long distances away from urban centers. New methods of publication of dictionaries are particularly relevant to this region, as distance to metropolitan centers is a serious impediment to accessing analog materials, and the expense of purchasing books is often prohibitive either for individuals or for national or local libraries. Open Access dictionaries, either online or as downloadable files, are becoming more common, and the ubiquity of portable devices points to a channel by which dictionaries will increasingly be accessed. In the Pacific and PNG there are commonly problems of erratic electricity supply and very slow internet connection speeds, and the internet is typically only available on computers in urban centers. Mobile phones also provide access to the internet; in Vanuatu, for example, eight in ten people now have a mobile phone connection (a 70% increase from 2007; cf. http://www.worldbank.org/en/news/feature/2013/05/17/information-communication-revolution-in-the-pacific). In Australia, the ubiquity of computers and latest generation mobile phones or tablets means that dictionaries in these formats, with media and various search tools, are increasingly popular. This chapter will give an overview of the lexicographic situation for indigenous languages in this region, and, while it cannot be comprehensive, it aims to describe the current situation and refer to major work and directions being taken in the creation and dissemination of dictionaries of indigenous languages. Lexicography in this region has been influential beyond the particulars of the local languages, as can be seen in the contribution to broader lexicographic theory by linguists in the Pacific, e.g., Crowley (1999) on the responsibility of the lexicographer; Lichtenberk (2003) on rapid lexical change and the role of the dictionary; Corris et al.
(2004) on the usability of dictionaries; Nathan and Austin (1992) on the automatic generation of finderlist forms; Pawley on idiom (2001a), speech acts (2009), and ethnobiology (2001b, 2011); or Mosel (2011) on the particular lexicographic needs of indigenous, in particular endangered, languages. The two comparative lexical datasets of languages of the region (Pama-Nyungan and Austronesian, discussed below) are currently the largest such comparative sets of indigenous languages. Within the region covered by this chapter two main factors are clearly influential in whether or not an effort is put into the production of dictionaries for indigenous languages: the number of languages within a nation and the degree of political independence of the speakers of the indigenous language. For indigenous people subject to a dominant colonial society it is typically the case that metropolitan languages predominate and there are few resources to support the indigenous languages. This is exacerbated if there are many indigenous languages, as is the case in Australia and Melanesia (despite the independence of all Melanesian countries from their colonial rulers over the past 40 years). The contrary is illustrated by Māori and Hawaiian: each is essentially one language, with some variation, spoken over the whole state, and so, although their speakers are minorities, each has been able to support language revitalisation efforts and the development of corpora and detailed dictionaries. A further example can be seen in other Polynesian languages, which, where they are spoken as the main language of the country, have more detailed dictionaries (as will be seen below) than do Polynesian outlier languages in Melanesia (where they are one among many indigenous languages).
While wordlists of some of these languages date back to the earliest colonial contacts, most of this work simply identifies that certain words exist in the language. It is valuable only up to a point, and a long way from representing the wealth of information that is seen in the detailed dictionaries of, for example, Arrernte (Henderson and Dobson 1994) or Kalam (Pawley et al. 2011), which capture the fine nuances of meaning and local knowledge that any language deserves from a dictionary. The main publishers of dictionaries in this region have been Pacific Linguistics at the Australian National University (e.g., Heath 1982; Lichtenberk 2008); the Institute for Aboriginal Development (IAD Press; e.g., Henderson and Dobson 1994; Green 2010); the University of Hawai‘i Press (e.g., Sperlich 1997b; Rehg and Sohl 1979); SIL (Bugenhagen and Bugenhagen 2007); and Peeters (Bril 2000; Rivierre 1994).


As an example of the scale of work yet to be done in creating dictionaries of languages of this region, Lynch and Crowley (2001, pp. 17–19) surveyed all 123 languages of Vanuatu and found 14 languages with at least a “fairly extensive” dictionary or grammar. A subsequent survey of Vanuatu languages (Thieberger 2013) found only twenty-two languages to have a wordlist or dictionary ranked at more than three out of five possible points. So, 71 languages of Vanuatu have virtually no wordlists of any significance. Using the Open Language Archives Community (http://www.language-archives.org/) search for facets “Pacific” and “lexicon” gives 1,611 results, which include the 610 Rosetta Swadesh lists for this region (archive.org/search.php?query=swadesh%20collection:rosettaproject). Adding the term “lexicography” reduces the total to 578. Filtering the creation date of the items to later than 2000 leaves only 40 results (all from PARADISEC, see below), while filtering between 1990 and 1999 gives 212 results.

Description

This survey of lexicography will mainly be concerned with developments that have occurred since a baseline established by several earlier survey articles, notably, for Australia, O’Grady (1971), Austin (1983, 1991) and Goddard and Thieberger (1997), and for Polynesian lexicography, Sperlich (1997a). These articles together show that, as would be expected, most indigenous languages have no dictionaries, but a few languages have been the focus of very detailed lexicographic work. So, for example, in Australia, long-term dictionary projects include Warlpiri (http://www.anu.edu.au/linguistics/nash/aust/wlp/index.html) and Arrernte (Henderson and Dobson 1994), and significant dictionaries have also been prepared for Yir-Yoront (Alpher 1991), Nunggubuyu (Heath 1982), Ngaanyatjarra (Glass et al. 2003), Dalabon (Evans et al. 2004), and Yanyuwa (Bradley et al. 1992), among others. In the Pacific, there have been a number of national dictionary projects, often associated with a national language academy. Language academies are a feature of the francophone Pacific, e.g., Académie des langues Kanak (http://www.alk.gouv.nc/) and Académie tahitienne (http://www.farevanaa.pf/), with equivalents like the Māori Language Commission (http://www.tetaurawhiri.govt.nz/) or the Tuvaluan Language Board (Siegel 1996). Examples include Niue (Sperlich 1997b), Tonga (Taumoefolau 1998), Fiji (Geraghty 2007), Tahiti (Académie tahitienne 2008), Hawai‘i (Pukui and Elbert 1986), with an additional dictionary of new terms created by the Hawaiian Lexicon Committee (Kōmike Huaʻōlelo, ʻAha Pūnana Leo and Hale Kuamoʻo 2003), and Māori (http://www.tetaurawhiri.govt.nz/), for which there are several dictionary projects, e.g., Tirohia Kimihia: He Kete Wherawhera (the first monolingual Māori dictionary: Morris 2006), Ī-papakupu: Online Monolingual Māori Dictionary (http://www.korero.maori.nz/), and Te Aka Māori-English, English-Māori Dictionary and Index (http://www.maoridictionary.co.nz/).
The University of Hawaii’s Pacific and Asian Linguistics Institute (PALI) had a project to describe languages of Micronesia and Polynesia that produced dictionaries of Chamorro (Topping et al. 1975), Marshallese (Abo et al. 1976), Mokilese (Harrison and Albert 1977), Palauan (Josephs and McManus 1990), Ponapean (Rehg and Sohl 1979), Woleaian (Sohn and Tawerilmang 1976), Nukuoro (Carroll and Soulik 1973), and Kapingamarangi (Lieber and Dikepa 1974). In New Caledonia, dictionaries of several languages have been produced in the recent past (e.g., Xârâcùù: Moyse-Faurie 1989; Iaai: Ozanne-Rivierre 1984; Miroux and Jeno 2007; Cèmuhî: Rivierre 1994; Drehu: Sam 1995), typically with a French finderlist and separate topical lists for plants, cultivated plants and animals, and occasionally also with an English finderlist. Bril (2000) includes four languages: Nêlêmwa, a neighboring variety Nixumwak, French and English. In Vanuatu, recent dictionaries include Araki (François 2002), Sye (Crowley 2000), Anejom̃ (Lynch and Tepahae 2001), Vurës (Malau 2011), Malo (Jauncey 2011) and South Efate (Thieberger 2011b). In the


Solomon Islands, dictionaries include Toqabaqita (Lichtenberk 2008); Cheke Holo (White et al. 1988); Sikaiana (Donner 2012); Tolo (Crowley 1986); and Owa (Mellow 2013). In Papua New Guinea, the dictionary of Kalam (Pawley et al. 2011) has taken decades of collaborative work. Other PNG dictionaries include Oksapmin (Lawrence 2006); Ata (Hashimoto 1996); and Sinauḡoro (Tauberschmidt and Snyder 1995). Further to the Polynesian dictionaries discussed by Sperlich (1997a), and those mentioned above, there has been work on a Rapa dictionary by Mary Walworth and the Tomite Reo Rapa (Walworth p.c.).

Lexical Characteristics in the Relevant Languages

Languages of this region represent various types, including agglutinating (most Austronesian, Papuan and Pama-Nyungan Australian languages) and polysynthetic (non-Pama-Nyungan Australian languages). In northern Australia, polysynthetic languages result in long and complex word forms that would be rendered as sentences in other languages, for example, the Murrinh-Patha word Perremnunggumangime “Those few people gave things to each other,” causing predictable difficulty for dictionary makers. Seiss and Nordlinger (2011) report on progress with automated morphological parsing of Murrinh-Patha. In Dalabon, morphophonemic variation means that na- “to see,” which is cited in its present tense form nan, appears in example sentences variously as ney, niyan, narrinj, nangey and narrûn (Evans et al. 2004: xv). New Caledonian languages include tone and complex phoneme inventories that require diacritics and doubling and tripling of letters to represent a single phoneme, e.g., phw, hmw, hny, hng in Pije, with 35 consonant phonemes (Rivierre 1979). Some Papuan languages include tone (in the Eastern Highlands) and are phonologically rather simple, typically with just three points of articulation for consonants and a five-vowel system (http://sydney.edu.au/arts/research_projects/delp/papuan.php). Serial verb constructions are common throughout the region, with implications for the citation form used in dictionaries. In some languages the verb inventory is very small, making it necessary to combine verbs to form complex meanings. In Kalam, which has a small inventory of verbs, an example is mon pk d ap ay “wood hit hold come put (gathering firewood)”. How to analyze these complex forms and at which point they become lexicalised is a common problem for lexicographers in the region.
A feature of some Australian languages is the special variety used in avoidance of certain classes of kin, known as a mother-in-law vocabulary (Dixon 1972), which typically has words distinct from those of the everyday language that are used in the presence of the tabooed relative. The Lardil dictionary (Ngakulmungan and Hale 1997) does not include reference to the ceremonial register, called Damin (Evans 2010, pp. 201–203). In Kalam (Pawley et al. 2011) there is a register called “Pandanus language” spoken during expeditions to collect mountain pandanus, and words from this variety are included in the Kalam dictionary. While not a spoken language, the script of Rapanui (Easter Island) called Rongorongo (http://kohaumotu.org/Rongorongo/index.html) has so far proven impossible to decipher and continues to invite speculation (Fischer 1997). Other features of words that pose challenges for lexicography in this region include the following:

• Commonly in Oceania inalienable possession is marked by suffixes on eligible nouns. A strategy is then to cite, for example, the 3sg form. In Raga (Vanuatu) Walsh (2007) reports on the need to include noun class information for possessive nouns, increasing the complexity of the entry.
• In some languages of Vanuatu verbs alternate initial consonants based on aspect, so one form is cited and the other is given as a cross-reference.
• Words in polysynthetic languages undergo complex morphophonemic changes, so that what could be treated as “root” forms are typically not recognized as such by speakers.


• Conjugation classes for verbs in some Australian languages require the selection of a citation form, sometimes with paradigms either provided for each verb or given once for an example verb to which others in the same conjugation are cross-referred. In the Ngaanyatjarra dictionary (Glass et al. 2003), verbs are cited in the future form, as it provides a diagnostic form for the paradigm in which the verb occurs. As noted by Corris et al. (2004, p. 47), in agglutinative languages with a number of inflected forms for any given root, users are “disappointed when they could not find particular inflected forms of verbs in the dictionaries.”
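
One way to ease the lookup problem that Corris et al. describe is to index every attested inflected form against its citation form. The following is a minimal sketch, assuming a lexicon held as a plain Python dictionary; the structure and the function names are hypothetical, and the Dalabon forms are simply those cited earlier from Evans et al. (2004), used here for illustration only.

```python
from collections import defaultdict

# Citation form -> attested inflected forms. The Dalabon forms are those
# quoted in the text (Evans et al. 2004: xv); the layout is an assumption.
LEXICON = {
    "nan": ["ney", "niyan", "narrinj", "nangey", "narrûn"],
}


def build_form_index(lexicon):
    """Map every surface form (including the citation form) to its headwords."""
    index = defaultdict(set)
    for headword, forms in lexicon.items():
        index[headword].add(headword)
        for form in forms:
            index[form].add(headword)
    return index


def look_up(form, index):
    """Return the headwords under which a surface form can be found."""
    return sorted(index.get(form, set()))


FORM_INDEX = build_form_index(LEXICON)
print(look_up("narrinj", FORM_INDEX))  # ['nan']
```

A user who searches for an inflected form is then pointed to the entry where it is described, whatever citation form the dictionary has chosen.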

Recently there have been dictionaries produced by speakers of indigenous languages of the region, either as sole authors (e.g., Sam 1995; Bell 2004) or in collaboration with a linguist (e.g., Inia 1998; Inia and Churchward 1998; Lynch and Tepahae 2001; Tepahae 2011; Ford and McCormack 2007). Speakers writing their own dictionaries may have no training in the principles of lexical databases, although many linguists are similarly untrained. Summer schools (e.g., CoLang (http://www.uta.edu/faculty/cmfitz/swnal/projects/CoLang/) or LLL (http://www.hrelp.org/events/3L/)) and short training courses teach both the principles of appropriate data structures and the practicalities of particular software tools, and journals (like Language Documentation & Conservation (http://www.nflrc.hawaii.edu/ldc/)) provide advice and software reviews. It has been more usual for dictionaries to be produced in partnership with a linguist who is documenting the language or with a missionary who is translating religious texts into the local language. Clearly, the greater depth of knowledge of a speaker as lexicographer will allow a dictionary to explore idiomatic and poetic uses of the language that may escape the outsider lexicographer. An example is the long-term collaboration between a linguist and a speaker of Kalam that has produced the Kalam dictionary (Pawley et al. 2011). Developments in the methodology of language documentation (Thieberger and Berez 2012) include the creation of a citable corpus consisting of primary recordings linked to their transcripts, which, in turn, allows the traditional textual analyses of concordances and collocation searches to be linked back to primary media. This simple advance has allowed, for example, prosodic features to be recovered more easily than was the case before. It also clearly facilitates the citation of playable media for use in dictionaries.
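
The link from a concordance line back to primary media can be sketched as follows. This is a minimal illustration, assuming transcripts are stored as time-aligned segments; the class, the function name, the file names, the timings, and the placeholder words are all invented and do not come from any corpus mentioned in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    media_file: str  # primary recording the segment belongs to
    start: float     # offset in seconds from the start of the recording
    end: float
    text: str        # transcript of the segment


def concordance(segments, keyword):
    """Return (media_file, start, end, text) for each segment containing keyword."""
    hits = []
    for seg in segments:
        if keyword in seg.text.split():
            hits.append((seg.media_file, seg.start, seg.end, seg.text))
    return hits


# Hypothetical sample: placeholder strings stand in for transcribed speech.
SEGMENTS = [
    Segment("rec001.wav", 0.0, 2.5, "aaa bbb ccc"),
    Segment("rec001.wav", 2.5, 5.0, "ddd bbb"),
    Segment("rec002.wav", 0.0, 3.1, "eee fff"),
]

for media, start, end, text in concordance(SEGMENTS, "bbb"):
    print(f"{media} [{start:.1f}-{end:.1f}s]: {text}")
```

Because each hit carries a file name and time offsets, a dictionary entry can cite the exact stretch of a recording in which an example was spoken.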
A dictionary can be used to provide a standard form of words or spelling, and in some cases a vernacular literacy program will only be allowed to proceed if there is a dictionary available (this was the case in Vanuatu in the late 1990s). However, given the typically low functional load of literacy in indigenous languages, there is also a great deal of variability in spelling when the language is written. An exciting development, but one that can lead to tension with the static authority of the dictionary, is the growing use of written language, with variable forms and spellings, in texting and on social media.

What is Different About Dictionaries of Indigenous Languages?

Lexicography for indigenous languages can require particular methods, as discussed in the volume edited by Frawley et al. (2002), and also by Bowern (2008, pp. 203–207), Mosel (2011), Haviland (2006), Ogilvie (2011), Simpson (1993) and the December 2011 special issue of the International Journal of Lexicography on dictionaries of endangered languages (http://ijl.oxfordjournals.org/content/24/4.toc). Briefly, there are particular issues related to the number of speakers, the number of literate speakers, the role of literacy in the community, and the amount of information recorded in the language. With few speakers it can be difficult to find an enthusiastic lexicographer who speaks the language, especially given all of the priorities that minority communities have to deal with, so it is likely that most of the work of collecting and establishing a lexical database will rely on an outsider linguist whose language ability will

be less than native-speaker competence. With low levels of literacy come associated questions of orthography design and the audience for the dictionary (Corris et al. 2004). Another major difference noted by Mosel (2011, p. 337) between dictionaries of metropolitan and indigenous languages is that “the latter are non-profit enterprises with limited resources of time, money and staff.” (While Mosel’s observation relates specifically to endangered languages it applies equally to most indigenous languages.) She goes on to point out the sense of urgency that comes with recording what may be the last generation of speakers who have detailed linguistic knowledge.

Synchronic and Historical Principles in Dictionaries in This Region

In general, dictionaries arising from fieldwork are necessarily synchronic, but in Oceanic languages the existence of substantial reconstructions of proto forms (see below) allows these forms to be included in synchronic dictionaries for comparative purposes (e.g., Malau 2011). The Swadesh (1971) list or variants of it have been used in comparative work in Australia (Menning and Nash 1981). Claire Bowern (Bowern and Atkinson 2012) has built a comparative database of some 739,000 lexical items from 405 varieties of Pama-Nyungan languages. POLLEX (http://pollex.org.nz/) is a lexical comparison project for Polynesian languages begun by Bruce Biggs in 1965 and migrated from paper slips to punchcards to databases and now to an online system (Greenhill and Clark 2011) containing some 55,000 entries from 68 languages, with the number of words from any given language varying depending on availability (over 3,200 words in Māori, 2,600 in Tongan). Robert Blust (Blust and Trussel 2010) is producing an ongoing web-based comparative Austronesian project, and a subset of that project identifying regular sound correspondences in all Austronesian languages is also online (http://language.psy.auckland.ac.nz/austronesian/; Greenhill et al. 2008), containing 203,845 lexical items representing 210 words from each of 998 Pacific languages. A larger list of 1,200 items in 80 Austronesian languages was used in the Comparative Austronesian Dictionary (Tryon 1995). A reconstruction of Proto-Oceanic vocabulary is being worked on and aims to produce seven volumes (the first four are Ross et al. 1998, 2003, 2008, 2011) organized by topic.

Principles of Definition Writing, Explanation, and/or Translation Followed in Dictionaries in This Area

Typically, for projects working to record indigenous languages, the initial lexical information is in the form of a wordlist. If the work continues, then the microstructure may become more detailed, for example, exploring semantic relationships in lexica. The Warlpiri dictionary includes “definitions” composed by Warlpiri people, both as examples of the usage of the headword and as explications. For example, the word jaaljaaj(pa) has the sense “feeling, hunch, premonition” and is followed by the Warlpiri sentence with the English gloss “I have a feeling about something. Perhaps they are going to hit my son” (Simpson 1993, p. 140). The Warlpiri dictionary has been implemented using a platform called Kirrkirr (http://www-nlp.stanford.edu/kirrkirr/), allowing active links to media, and navigation of semantic relationships of synonymy, antonymy and other forms of relatedness (Manning et al. 2001), something that is also provided for in Matapuna (http://sourceforge.net/projects/matapuna/). Although the Natural Semantic Metalanguage originated in Australia (Goddard 2003), it has not been taken up in the construction of dictionaries, but it has been used in the explication of semantic concepts in some indigenous languages (see, for example, articles in the Australian Journal of Linguistics, Vol. 33(3), 2013).
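
Navigation of semantic relationships of the kind described above can be sketched as a small network of typed links between headwords. This is an illustrative data structure only, not the data model of Kirrkirr or Matapuna; the class name, method names, and English placeholder headwords are all assumptions made for the example.

```python
from collections import defaultdict


class SemanticNetwork:
    """Headwords connected by typed, symmetric semantic relations."""

    def __init__(self):
        # headword -> relation type -> set of related headwords
        self._links = defaultdict(lambda: defaultdict(set))

    def relate(self, a, relation, b):
        """Record that a and b stand in the given relation (both directions)."""
        self._links[a][relation].add(b)
        self._links[b][relation].add(a)

    def related(self, headword, relation):
        """Return the headwords linked to headword by relation, sorted."""
        if headword not in self._links:
            return []
        return sorted(self._links[headword].get(relation, set()))


# Hypothetical entries; English words stand in for headwords of any language.
network = SemanticNetwork()
network.relate("big", "synonym", "large")
network.relate("big", "antonym", "small")

print(network.related("big", "synonym"))    # ['large']
print(network.related("small", "antonym"))  # ['big']
```

Storing each link in both directions means a reader can move from either headword to the other, which is what makes browsing by relatedness possible.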


Software

Electronic lexicons in which the form is distinguished from the content (lexical databases) allow a variety of derived forms of dictionaries and, happily, there are many examples of well-structured lexical databases for languages of this region, among them the Australian dictionaries of Arrernte and Warlpiri and the more recent dictionaries of Vurës, Tamambo and South Efate (Vanuatu). Because many language documentation projects produce recordings with transcripts that are then interlinearised, the currently preferred software tools are Toolbox (http://www-01.sil.org/computing/toolbox/) and Fieldworks (http://fieldworks.sil.org/; see the review by Rogers 2010), which both allow a lexicon to feed text glossing, while TLex or TshwaneLex (http://tshwanedje.com/tshwanelex/) also has some users (see the review by Bowern 2007). LexiquePro (http://www.lexiquepro.com/) is used by a number of projects to present their Toolbox dictionaries either as standalone Windows-based executables or as HTML exports. The online dictionary system Matapuna was developed in New Zealand and used for the development and presentation of Māori dictionaries (see the review by Bah 2010).

Lang et al. (1972) describe the use of 80-column punch cards for encoding a 5,500-headword dictionary of Enga (PNG), in perhaps the earliest example of computational lexicography for an indigenous language of the region. In the early 1980s in Hawaii, Robert Hsu built software to create lexical databases from which dictionaries in various forms could be derived. His Lexware program (Hsu 1985) worked with a text file with “bands”, fields delimited in a way very similar to that subsequently used by the popular tool Shoebox/Toolbox. Hsu acted as a consultant on over a hundred dictionaries from all over the world (Hsu, p.c.), with the result that many well-formatted dictionaries appeared as books, with the files allowing for later revisions to the original dictionaries or for the development of web-based dictionaries, like the Austronesian Comparative Dictionary, Combined Hawaiian Dictionary, Combined Kiribati-English Dictionary, Marshallese-English Online Dictionary, Micronesian Comparative Dictionary, Mokilese-English Dictionary, Pohnpeian-English Dictionary, and Yapese Dictionary (all at http://www.trussel2.com/).

While ideally dictionaries would be created using dedicated lexicographic tools, it is nevertheless the case that some fine dictionaries have been produced from word processors or databases (DBMSs), with lexicographers using the tools they are most comfortable with. It has, for example, been possible for the Ngarrindjeri dictionary project to use FileMakerPro for data entry and searching, and to export the content as XML, which is converted with XSL to a Toolbox format and then formatted with the Multi-Dictionary Formatter (Thieberger 2011a) to produce more suitable paper-based copy. The dictionary display software Miromaa (http://www.miromaa.org.au/Miromaa/Miromaa-Features.html) was developed in Newcastle and is used by a number of language centers in Australia.

Where word processors have been used to create a dictionary and the digital data is still available, it is possible to impose a marked-up structure on the file, using regular expressions or the online conversion service provided by OxGarage (http://www.tei-c.org/ege-webclient/). An example is the Tahitian dictionary, initially produced using the database 4thDimension and then exported to a word processor for publication. As it was handcrafted and the project’s time had run out, the French-Tahitian reversal was only produced for the letters A to D. The document could be converted to a structured file in a matter of several hours, thus making it possible to generate a full reversal.
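The two ideas in the preceding paragraphs, a field-delimited lexical file and a reversal derived from it automatically, can be illustrated with a minimal sketch. It assumes a Toolbox-style text file in which each field begins with a backslash marker (here \lx for the headword, \ps for part of speech, \ge for an English gloss, following common practice); the sample entries, the function names, and the choice of Python are assumptions made purely for illustration and are not taken from any of the projects cited.

```python
from collections import defaultdict

# Invented sample data in a Toolbox-style "standard format".
# Marker meanings assumed here: \lx headword, \ps part of speech, \ge gloss.
SAMPLE = """\\lx aba
\\ps n
\\ge fish

\\lx ubu
\\ps v
\\ge eat
\\ge food
"""


def parse_records(text):
    """Split the text into records, one per headword marker."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and anything without a marker
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":  # a headword field opens a new record
            current = defaultdict(list)
            records.append(current)
        if current is not None:
            current[marker].append(value.strip())
    return records


def build_reversal(records):
    """Map each English gloss to the sorted headwords that carry it."""
    reversal = defaultdict(list)
    for rec in records:
        for gloss in rec.get("ge", []):
            reversal[gloss].extend(rec["lx"])
    return {gloss: sorted(heads) for gloss, heads in sorted(reversal.items())}


records = parse_records(SAMPLE)
reversal = build_reversal(records)
for gloss, heads in reversal.items():
    print(f"{gloss}: {', '.join(heads)}")
```

Because the reversal is computed from the structured records rather than typed by hand, it covers every gloss in the file, which is the advantage over a handcrafted finderlist that the Tahitian example points to.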
In Australia, the National Lexicography Project (Simpson and Nash 1989) of the late 1980s and its successor, the Aboriginal Dictionaries Project (1990–1994), both based at the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS), supported the creation of over 50 dictionaries. The Aboriginal Studies Electronic Data Archive (ASEDA, cf. http://www1.aiatsis.gov.au/aseda/docs/; Thieberger 1995), also at AIATSIS, was an early repository for curating and storing the underlying data files for these dictionaries and making them available for reuse, often in Aboriginal language centers (http://www.ourlanguages.net.au/language-centres.html). It was noted that this set of material could become the basis for a pan-Australian dictionary (Goddard and Thieberger 1997, p. 192), a project yet to eventuate but still a possibility (Bowern’s database, mentioned earlier, is largely based on material from ASEDA). Perhaps the first web-based dictionary of an Australian indigenous language was of Gamilaraay (Austin and Nathan 1998).

The only repository for digital files of Pacific dictionaries is the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) (http://catalog.paradisec.org.au/), which currently holds only seven dictionary files in editable text formats (Niuean, Namakir, South Efate, Chamorro, Koita, Kalam, and Golin) but includes many more image files of scanned dictionaries.

A problem faced by indigenous people in settler societies is that the colonists regard the indigenous society, including its languages, as being of little value. Early popular wordlists of indigenous Australian languages (e.g., Kenyon 1975; Reed 1965) have tended to exoticize and mix words from different languages, doing little to educate about the nature and diversity of the languages. A project to counter this tendency was the publication of a collection of words in 17 Australian languages (Thieberger and McGregor 1994) produced by the Macquarie Dictionary, with an introduction to the structure of each language and a topical wordlist.

The Language of the Dictionaries, Multilingual and Monolingual

For many languages with brief grammatical descriptions, a simple word translation is the first step in identifying words and meanings. Often the only dictionary of the language is in this form, sometimes as an appendix to the grammar, e.g., Moyle’s (2011) Takuu grammar is 56 pages long, with the dictionary taking up the remaining 370 pages (see also Shintani and Paita 1990; Senft 1986). Typically dictionaries are bilingual, or occasionally trilingual, with the local lingua franca and a language of wider communication, for example, Nelemwa-Nixumwak-Français-Anglais (Bril 2000), the Baruya dictionary in Baruya, Tok Pisin and English (Lloyd 1992), the Alawa-Kriol-English dictionary (Sharpe and Le May 2001), or the Melpa dictionary in Melpa, German and English (Stewart et al. 2011). Fiji’s monolingual dictionary of some 25,000 headwords (http://www.fijianaffairs.gov.fj/IILC%20Publication.html) was ready for publication in 1996 (Geraghty 2007) but has not yet appeared. Taumoefolau (1998, p. 23) reports on monolingual dictionaries in progress for Tongan, Samoan and Niuean. A monolingual Māori dictionary is being developed by the Māori Language Commission (http://www.tetaurawhiri.govt.nz/english/resources_e/) and the monolingual Palauan dictionary contains 13,800 entries (Ramarui and Temael 2000).

Online Availability of Dictionaries

Increasingly, new dictionaries and updates to older ones are being produced in electronic form only: as websites (e.g., the Cook Islands dictionary: http://mangaia.cookislandsdictionary.com/); as WordPress blog sites (e.g., Sikaiana: http://www.sikaianaarchives.com/dictionary-3/); as online dictionaries that can incorporate images and audio (e.g., Malau 2011; Thieberger 2011a; Cablitz 2011a; or the collection of online dictionaries produced by LexiquePro: http://www.lexiquepro.com/library.htm); as downloadable versions (e.g., the collection of dictionaries available from the Wangka Maya website: http://www.wangkamaya.org.au/pilbara-languages/resources-and-dictionaries); and occasionally with the use of publish-on-demand services like CreateSpace to allow for paper copies as required (e.g., the reprint of an 1865 Hawaiian dictionary: https://www.createspace.com/4186680, or the I-Kiribati dictionary: https://www.createspace.com/3513726). The Bible translating organization SIL’s Papua New Guinea website (http://www-01.sil.org/pacific/png/show_subject.asp?pubs=online&code=DAV) provides links to some 70 online dictionaries, and their Australian site lists 13 dictionaries (http://ausil.org.au/node/3717). Delivery of dictionaries on portable devices like mobile phones is also developing quickly (e.g., Ma! Iwaidja: http://www.iwaidja.org/site/ma-iwaidja-phone-app/, and a number of other examples: http://globalnativenetworks.wordpress.com/2013/06/18/idecolonize-indigenous-language-learning-mobile-apps/). The now superseded Project for free electronic dictionaries (PFED, http://pfed.info/) was an early example of using mobile phones for delivery of dictionary text.

Multimedia presentation allows exploration and presentation of information that clarifies the meaning of words, especially biological identification and processes that would otherwise require long and detailed textual descriptions (Cablitz 2011b). A risk in early multimedia dictionaries was that the content became unplayable as software became outdated, but the focus on separation of content and form that is common to most current lexicographic projects should ensure that this is no longer a problem.

Taumoefolau (1998, p. 37) notes that native speakers can be disappointed in the content or usability of a dictionary if its primary audience is linguists. The inclusion of abstract grammatical information or cognate forms in proto languages is unlikely to make the dictionary friendly to general users. As many existing dictionaries were produced as monolithic works in a word processor, it is understandable that they tried to be as comprehensive as possible. However, now that lexical databases allow outputs in various forms (learner’s dictionary, topical wordlist and so on), it should be easier to create user-friendly dictionaries in addition to maintaining a more technical version for access by linguists, perhaps as an online-only version.
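The point about a single lexical database feeding several outputs can be made concrete with a small sketch. Everything in it is hypothetical: the field names, the three invented entries, and the three output functions simply stand in for whatever structure a real project uses. A general-user view omits the proto forms, a technical view keeps them, and a topical wordlist regroups the same entries.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    headword: str
    gloss: str
    topic: str
    proto_form: str = ""  # reconstruction, of interest mainly to linguists


# Invented entries for illustration only.
LEXICON = [
    Entry("aba", "fish", "animals", "*apa"),
    Entry("ubu", "house", "dwelling"),
    Entry("iki", "bird", "animals", "*iki"),
]


def learner_view(entries):
    """Headword and gloss only, alphabetically ordered."""
    ordered = sorted(entries, key=lambda e: e.headword)
    return [f"{e.headword}: {e.gloss}" for e in ordered]


def technical_view(entries):
    """Same entries, with the proto form appended where one is recorded."""
    lines = []
    for e in sorted(entries, key=lambda e: e.headword):
        line = f"{e.headword}: {e.gloss}"
        if e.proto_form:
            line += f" (< {e.proto_form})"
        lines.append(line)
    return lines


def topical_wordlist(entries):
    """Group headwords under their topic label."""
    topics = {}
    for e in sorted(entries, key=lambda e: (e.topic, e.headword)):
        topics.setdefault(e.topic, []).append(e.headword)
    return topics


print(learner_view(LEXICON))
print(technical_view(LEXICON))
print(topical_wordlist(LEXICON))
```

The design choice illustrated is that no output is edited by hand: a correction made once in the underlying entries appears in every derived version.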
Dictionaries produced by community-governed language academies or language centers can be designed to be more accessible, e.g., the Bardi dictionary (Aklif 1999) with its copious illustrations, photographs, and a placenames list, or a number of Australian dictionaries that appear both in traditional textual format and as picture dictionaries (http://iadpress.com/shop/category/aboriginal-languages/; Wangka Maya 2006; Ross and Turpin 2004; Moore and Blackman 2004), in which a set of a few hundred terms is presented with illustrations, usually arranged by topic, to broaden the accessibility of the dictionary. They can also include information such as the god associated with each word, as in Te Taura Whiri i te Reo Maori (2009), an attempt to make lexicography more Māori.

Early dictionaries are being reprinted either in the absence of more recent research, or because they represent what is felt to be an older and hence more authoritative version of the language. For example, Bindon and Chadwick (2011) is a book printed from a database of Nyungar (Western Australia) words compiled from a collection of early sources. A dictionary of Puynipet (Caroline Islands), originally published in 1881, was reprinted in 2011 (P.A.C. 2011). The dictionary of Roviana (Waterhouse 2005/1928) was reprinted as a facsimile in 2005, and the Kiribati dictionary of 1908 was reprinted in 2004 (Bingham 2004).

Out-of-print or out-of-copyright dictionaries are also being made available online, as text or PDF files, e.g., Cheke Holo (White et al. 1988), or Sa’a and Ulawa, Solomon Islands, in the Internet Archive (https://archive.org/details/dictionarygramma00ivenuoft). A Māori dictionary (Fletcher 1907) is available via Project Gutenberg, and a Motu dictionary (Lawes 1896) is freely available at the Internet Archive. Once a dictionary is out of copyright and made openly available, it can be reprinted and offered for sale on the internet.
The potential complications and ethical issues around this kind of reuse of dictionaries are discussed by Peter Austin and others in a series of blog posts (cf. http://www.paradisec.org.au/blog/2008/07/copy-right/, and http://www.paradisec.org.au/blog/2011/04/theyre-out-to-get-you-or-your-data-at-least/).

Few lexicographic projects in the region have used the Text Encoding Initiative (TEI, http://www.tei-c.org/) to encode the primary data of a dictionary. The New Zealand Electronic Text Collection provides an XML version of a 1957 Maori dictionary (Williams 1957). XML has also been used to encode legacy material produced by the linguist Gerhardt Laves in the 1930s (Henderson 2008), consisting of a mix of dictionary and texts of Nyungar (Southern Western Australia). The earliest wordlist of an Australian language, William Dawes’s (1790–91) manuscript, has been reproduced in digital form (http://www.williamdawes.org/), with a transcript in XML allowing location of parts of the manuscript. A similar project, using the TEI, is encoding a wordlist collection by Daisy Bates (http://languages-linguistics.unimelb.edu.au/current-projects/digital-daisy-bates) from the early 1900s.
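As a rough indication of what TEI-style encoding of a dictionary entry involves, the sketch below builds one entry using element names from the TEI dictionaries module (entry, form, orth, gramGrp, pos, sense, def). It is a simplification under stated assumptions: the TEI namespace and header are omitted, the entry itself is invented, and the helper function is hypothetical rather than part of any project mentioned above.

```python
import xml.etree.ElementTree as ET


def tei_entry(headword, pos, definition):
    """Build a simplified TEI-style <entry> element (namespace omitted)."""
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form", {"type": "lemma"})
    ET.SubElement(form, "orth").text = headword
    gram = ET.SubElement(entry, "gramGrp")
    ET.SubElement(gram, "pos").text = pos
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "def").text = definition
    return entry


# Invented entry for illustration only.
entry = tei_entry("aba", "n", "fish")
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Once entries are held in this kind of explicit markup, each part of the entry can be located and reused independently, which is what makes the legacy manuscripts described above searchable.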


Strategies for Finding Words and Principles for Selecting Them

Language documentation encourages creating corpora of which a rich, perhaps encyclopedic, lexicon is a part. Ideally, a textual corpus provides the collocations and frequency counts that inform the dictionary (Boyce 2006), but for most languages such corpora simply do not exist. Even where a linguist has been active in fieldwork, there is no tradition of creating a reusable corpus and, of the few corpora that are created, even fewer are publicly available. Much lexicographic practice for indigenous languages is clearly based on elicitation, which can be driven by an existing questionnaire (e.g., Bouquiaux and Thomas 1992; Sutton and Walsh 1979) or by other forms of exploration (e.g., the “rapid word collection” approach: http://rapidwords.net/ and http://www-01.sil.org/computing/ddp/DDP_downloads.htm). Mosel (2011), with experience of working with Teop (PNG) and Samoan dictionaries, recommends against the use of standard wordlists and instead advocates a thematic approach that focuses on a single domain, resulting in mini dictionaries that can feed into a larger work but are self-contained.

The complementary task of text collection and the establishment of a corpus is labor intensive, but ultimately rewarding both for the lexicographer and for the speakers, who will have a lasting record of their narratives. More recent documentation projects typically include the construction of a corpus which, while nowhere near the size of corpora for metropolitan languages, still provides contextual examples to be used in the dictionary. A corpus is also particularly useful for locating new words, which are treated in several dictionaries: for example, the Iaai dictionary has a section on the internet and on new words needed for the present day (Miroux and Jeno 2007, p. 315), and Stephens and Boyce (2011, 2013) report on a corpus of Māori legal terms (http://nzetc.victoria.ac.nz/tm/scholarly/tei-legalMaoriCorpus.html).

Beyond the traditional structure of a headword and lexical treatment, there are a number of examples of detailed explorations of particular topics in languages of the region, for example, the classic ornithological study in Kalam by Majnep and Bulmer (1977), the ethnobiology of Pohnpei (Balick 2009), or the fruiting plants of Oceania (Walter and Sam 1999).
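The frequency counts and collocations that a textual corpus can supply to the lexicographer, as described in this section, can be approximated even for a very small corpus. The sketch below is a minimal illustration under simplifying assumptions: the three "sentences" are invented, tokens are split on whitespace, and collocation is reduced to counting adjacent word pairs, which is far cruder than the statistical measures used for large corpora.

```python
from collections import Counter

# Invented three-sentence corpus for illustration only.
CORPUS = [
    "aba ubu iki",
    "aba ubu",
    "iki aba ubu",
]


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberate simplification)."""
    return sentence.lower().split()


def frequencies(corpus):
    """Count how often each word form occurs in the corpus."""
    return Counter(tok for sentence in corpus for tok in tokenize(sentence))


def bigrams(corpus):
    """Count adjacent word pairs as a rough stand-in for collocations."""
    counts = Counter()
    for sentence in corpus:
        toks = tokenize(sentence)
        counts.update(zip(toks, toks[1:]))
    return counts


freq = frequencies(CORPUS)
pairs = bigrams(CORPUS)
print(freq.most_common())
print(pairs.most_common(1))
```

Forms that are frequent in the corpus but absent from the lexicon are candidates for new headwords, and frequent pairs suggest example phrases for existing entries.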

Conclusion

The number of languages and the geographically huge area of the Australia-Pacific region militate against extensive research on every indigenous language of that region. While a few detailed dictionaries have been created, much more work remains to be done. Prospects are good for the long-term storage and dissemination of dictionaries using existing methods and innovative technologies now being developed for mobile devices both for delivery and for crowd-sourcing content of new dictionaries.

Acknowledgements

This chapter was written while I was funded by Australian Research Council grant DP0984419. Thanks to the Department of Linguistics at the University of Cologne for hosting me during 2013, and to the Alexander von Humboldt Foundation for awarding me a Ludwig Leichhardt Jubilee Fellowship. Thanks to David Nash and Mary Boyce for very helpful comments and to Wolfgang Sperlich for providing a copy of his work.


References

Austin, P. (Ed.). (1983). Papers in Australian linguistics no. 15: Australian aboriginal lexicography. Canberra: Pacific Linguistics. Austin, P. (1991). Australian lexicography. In F. J. Hausmann, O. Reichmann, & H. E. Wiegand (Eds.), Encyclopedia of lexicography (pp. 2638–2641). Berlin: Walter de Gruyter. Bah, O. (2010). Review of Matapuna dictionary writing system. Language Documentation & Conser- vation, 4, 169–176. Bouquiaux, L., & Thomas, J. (Eds.). (1992). Studying and describing unwritten languages. Dallas: Summer Institute of Linguistics. Bowern, C. (2007). Review of TshwaneLex dictionary compilation software. Language Documentation & Conservation, 1(1), 94–99. Bowern, C. (2008). Linguistic fieldwork: A practical guide. New York: Palgrave Macmillan. Bowern, C., & Atkinson, Q. (2012). Computational phylogenetics and the internal structure of Pama- Nyungan. Language, 88(4), 817–845. Boyce, M. T. (2006). A corpus of modern spoken Māori. Ph.D. Dissertation, Victoria University of Wellington, Wellington. Cablitz, G. H. (2011a). Documenting cultural knowledge in dictionaries of endangered languages. International Journal of Lexicography, 24(4), 446–462. Cablitz, G. H. (2011b). The making of a multimedia encyclopaedic lexicon for and in endangered speech communities. In G. Haig, N. Nau, S. Schnell, & W. Wegener (Eds.), Documenting endangered languages: Achievements and perspectives (pp. 223–262). Berlin: Walter de Gruyter. Corris, M., Manning, C., Poetsch, S., & Simpson, J. (2004). How useful and usable are dictionaries for speakers of Australian indigenous languages? International Journal of Lexicography, 17(1), 33–68. Crowley, T. (1999). The socially responsible lexicographer in Oceania. Journal of Multilingual and Multicultural Development, 20(1), 1–12. Dixon, R. M. W. (1972). The Dyirbal language of North Queensland. Cambridge: Cambridge University Press. Dixon, R. M. W. (2002). Australian languages: Their nature and development. 
Cambridge: Cambridge University Press. Evans, N. (2010). Dying words: Endangered languages and what they have to tell us. Oxford: Wiley- Blackwell. Fischer, S. R. (1997). Rongorongo: The Easter Island script. History, traditions, texts. Oxford: Clarendon. Foley, W. A. (1986). The Papuan languages of New Guinea. Cambridge: Cambridge University Press. Frawley, W., Hill, K. C., & Munro, P. (Eds.). (2002). Making dictionaries: Preserving indigenous languages of the Americas. Berkeley: University of California Press. Geraghty, P. (2007). The Fijian dictionary experience. In J. Siegel, J. Lynch, & D. Eades (Eds.), Language description, history and development: Linguistic indulgence in memory of Terry Crowley (pp. 383–394). Amsterdam: John Benjamins. Goddard, C. (2003). Thinking across languages and cultures: Six dimensions of variation. Cognitive Linguistics, 14(2–3), 109–140. Goddard, C., & Thieberger, N. (1997). Lexicographic research on Australian aboriginal languages, 1968–1993. In D. Tryon & M. Walsh (Eds.), Boundary rider. Essays in honour of Geoffrey O’Grady (pp. 175–208). Canberra: Pacific Linguistics. Greenhill, S. J., & Clark, R. (2011). POLLEX-Online: The Polynesian Lexicon project online. Oceanic Linguistics, 50(2), 551–559.


Greenhill, S. J., Blust, R., & Gray, R. D. (2008). The Austronesian basic vocabulary database: From bioinformatics to lexomics. Evolutionary Bioinformatics, 4, 271–283. Haviland, J. (2006). Documenting lexical knowledge. In J. Gippert, N. P. Himmelmann, & U. Mosel (Eds.), Essentials of language documentation (pp. 129–162). Berlin: Mouton de Gruyter. Henderson, J. (2008). Capturing chaos: Rendering handwritten language documents. Language Docu- mentation & Conservation, 2(2), 212–243. Hsu, R. (1985). Lexware manual. Manoa: Department of Linguistics, University of Hawai‘i. Lang, A., Mather, K. E. W., & Rose, M. L. (1972). Information storage and retrieval: A dictionary project. Canberra: Department of Linguistics, Research School of Pacific Studies, Australian National University. Lichtenberk, F. (2003). To list or not to list: Writing a dictionary of a language undergoing rapid and extensive lexical changes. International Journal of Lexicography, 16(4), 387–401. Lynch, J. (1998). Pacific languages: An introduction. Honolulu: University of Hawai‘i Press. Lynch, J., & Crowley, T. (2001). Languages of Vanuatu: A new survey and bibliography. Canberra: Pacific Linguistics. Manning, C. D., Jansz, K., & Indurkhya, N. (2001). Kirrkirr: Software for browsing and visual explora- tion of a structured Warlpiri dictionary. Literary and Linguistic Computing, 16(2), 135–151. Menning, K., & Nash, D. (1981). Sourcebook for Central Australian languages. Alice Springs: IAD. Mosel, U. (2011). Lexicography in endangered language communities. In P. K. Austin & J. Sallabank (Eds.), The Cambridge handbook of endangered languages (pp. 337–353). Cambridge: Cambridge University Press. Nathan, D., & Austin, P. (1992). Finderlists, computer-generated, for bilingual dictionaries. International Journal of Lexicography, 5(1), 132–164. O’Grady, G. N. (1971). Lexicographic research in Aboriginal Australia. In T. A. Sebeok (Ed.), Current trends in linguistics 8 (pp. 779–803). The Hague: Mouton. 
Ogilvie, S. (2011). Linguistics, lexicography, and the revitalization of endangered languages. International Journal of Lexicography, 24(4), 389–404. Pawley, A. (2001a). Phraseology, linguistics and the dictionary. International Journal of Lexicography, 14(2), 122–134. Pawley, A. (2001b). Some problems of describing linguistic and ecological knowledge. In L. Maffi (Ed.), On biocultural diversity: Linking language, knowledge, and the environment (pp. 228–247). Washington: Smithsonian Institution. Pawley, A. (2009). Grammarians’ languages versus humanists’ languages and the place of speech act formulas in models of linguistic competence. In R. Corrigan, E. A. Moravcsik, H. Ouali, & K. Wheatley (Eds.), Formulaic language: Volume 1. Distribution and historical change (pp. 3–26). Amsterdam: John Benjamins. Pawley, A. (2011). What does it take to make an ethnographic dictionary? On the treatment of fish and tree names in dictionaries of Oceanic languages. In G. L. J. Haig, N. Nau, S. Schnell, & C. Wegener (Eds.), Documenting endangered languages: Achievements and perspectives (pp. 263–287). Berlin: Walter de Gruyter. Rivierre, F. (1979). Le Pije, langue de Tiendanite et de la vallée de Tipindje. In A. G. Haudricourt, J. Rivierre, F. Rivierre, C. Moyse Faurie, & J. de la Fontinelle (Eds.), Les langues mélanésiennes de Nouvelle-Calédonie (pp. 38–44). Nouméa: DEC, Bureau Psychopédagogique. Rogers, C. (2010). Review of fieldworks language explorer (FLEx) 3.0. Language Documentation & Conservation, 4, 78–84.


Seiss, M., & Nordlinger, R. (2011). An electronic dictionary and translation system for Murrinh-Patha. In Proceedings of the EUROCALL 2011 conference, University of Nottingham. Siegel, J. (1996). Vernacular education in the South Pacific. Canberra: Australian Agency for International Development. Simpson, J. (1993). Making dictionaries. In M. Walsh & C. Yallop (Eds.), Language and culture in Aboriginal Australia (pp. 123–144). Canberra: Aboriginal Studies Press. Simpson, J., & Nash, D. (1989). AIAS archive of machine-readable files of Australian languages: The National Lexicography Project. Australian Aboriginal Studies, 1(1989), 57–59. Sperlich, W. B. (1997a). A review of advances in Polynesian lexicography. Rongorongo, 8(1):25–35; 8(2):59–72. Stephens, M., & Boyce, M. (2011). Finding a balance: Customary legal terms in a modern Māori legal dictionary. International Journal of Lexicography, 24(4), 432–445. Sutton, P., & Walsh, M. (1979). Revised linguistic fieldwork manual for Australia. Canberra: Australian Institute of Aboriginal Studies. Swadesh, M. (1971). The origin and diversification of language. Chicago: Aldine. Taumoefolau, M. L. (1998). Problems in Tongan lexicography. Ph.D. thesis, University of Auckland, Auckland. Tepahae, P. (2011). Diksnari blong Aneityum. In J. Taylor & N. Thieberger (Eds.), Working together in Vanuatu: Research histories, collaborations, projects and reflections (pp. 67–72). Canberra: ANU E Press. Thieberger, N. (1995). The Aboriginal Studies Electronic Data Archive. International Journal on the Sociology of Language, 113, 147–150. Thieberger, N. (2011a). Building a lexical database with multiple outputs: Examples from legacy data and from multimodal fieldwork. International Journal of Lexicography, 24(4), 463–472. Thieberger, N. (2013). Language archives for the Pacific [Presentation at the DoBeS conference Language Documentation: Past – Present – Future, Hanover, June 5–7, 2013]. Thieberger, N., & Berez, A. (2012). 
Linguistic data management. In N. Thieberger (Ed.), The Oxford handbook of linguistic fieldwork (pp. 90–118). Oxford: Oxford University Press. Walsh, D. S. (2007). Structure, style and content in dictionary entries for an Oceanic language. In J. Siegel, J. Lynch, & D. Eades (Eds.), Language description, history and development: Linguistic indulgence in memory of Terry Crowley (pp. 371–381). Amsterdam: John Benjamins.

Dictionaries

Abo, T., Bender, B. W., Capelle, A., & DeBrum, T. (1976). Marshallese-English dictionary. Honolulu: University Press of Hawaii. Académie tahitienne. (2008). Dictionnaire français-tahitien. Tome 1 (A-D). Papeete: Académie tahitienne. Aklif, G. (1999). Ardiyooloon Bardi Ngaanka: One Arm Point Bardi dictionary. Halls Creek: Kimberley Language Resource Centre. Alpher, B. (1991). Yir-Yoront Lexicon: Sketch and Dictionary of an Australian Language (Trends in Linguistics 6). Berlin: Mouton de Gruyter. Austin, P., & Nathan, D. (1998). Kamilaroi/Gamilaraay Web Dictionary. http://www1.aiatsis.gov.au/aseda/WWWVLPages/AborigPages/LANG/GAMDICT/GAMDICT.HTM Balick, M. J. (2009). Ethnobotany of Pohnpei: plants, people, and island culture. Honolulu: University of Hawai‘i Press. Bell, J. (2004). Dictionary of the Butchulla language. Hervey Bay: Korrawinga Aboriginal Corporation.


Bindon, P., & Chadwick, R. (Eds.). (2011). A Nyoongar wordlist: from the south-west of Western Australia (2nd ed.). Welshpool: Western Australian Museum. Bingham, H. (2004). A Kiribati-English dictionary. [Apia, Samoa]: Reprinted by the Institute for Research, Extension and Training in Agriculture (IRETA) and Technical Centre for Agricultural and Rural Cooperation (CTA). Blust, R., & Trussel, S. (2010). Austronesian Comparative Dictionary, web edition. http://www.trussel2. com/ACD Bradley, J., Kirton, J., & Yanyuwa Community. (1992). Yanyuwa Wuka: language from Yanyuwa Country – a Yanyuwa dictionary and cultural resource. http://espace.library.uq.edu.au/eserv/ UQ:11306 Bril, I. (2000). Dictionnaire nêlêmwa-nixumwak-français-anglais: avec introduction grammaticale et lexiques. Paris: Peeters. Bugenhagen, R. D., & Bugenhagen, S. E. (2007). Ro ta ipiyooto sua Mbula Uunu = Mbula-English dictionary. http://www.sil.org/pacific/png/abstract.asp?id=49817 Carroll, V., & Soulik, T. (1973). Nukuoro lexicon. Honolulu: University Press of Hawaii. Crowley, S. S. (1986). Tolo dictionary. Canberra: Pacific Linguistics. Crowley, T. (2000). An Erromangan (Sye) dictionary. Canberra: Pacific Linguistics. Dawes, W. (1790–91). Vocabulary of the language of N.S. Wales, in the neighbourhood of Sydney (Native and English), by Dawes. London: School of Oriental and African Studies. Manuscript, Marsden Collection 41645b. Donner, W. W. (2012). Sikaiana Archives – Dictionary. http://www.sikaianaarchives.com/dictionary-3 Evans, N., Merlan, F., & Tukumba, M. (2004). A First Dictionary of Dalabon (Ngalkbon). Maningrida: Maningrida Arts and Culture. Fletcher, H. J. (1907). Hinemoa With Notes & Vocabulary. http://www.gutenberg.org/ebooks/22009 Ford, L., & McCormack, D. (2007). Murrinh tetemanthay ngarra murrinh law kardu bamam thanguna: Murrinhpatha – English Legal Glossary. Bowden McCormack Lawyers + Advisors. http://www. bowden-mccormack.com.au François, A. (2002). 
Araki: a disappearing language of Vanuatu. Canberra: Pacific Linguistics. Glass, A., Hackett, D., & Newberry, B. (2003). Ngaanyatjarra-Ngaatjatjarra to English dictionary. Alice Springs: IAD Press. Green, J. (2010). Central & Eastern Anmatyerr to English Dictionary. Alice Springs: IAD Press. Harrison, S. P., & Albert, S. (1977). Mokilese-English dictionary. Honolulu: University Press of Hawaii. Hashimoto, K. (1996). Ata-English dictionary. Ukarampa, Papua New Guinea: Summer Institute of Linguistics. Heath, J. (1982). Nunggubuyu dictionary. Canberra: Australian Institute of Aboriginal Studies. Henderson, J., & Dobson, V. (1994). Eastern and Central Arrernte to English Dictionary. Alice Springs: IAD Press. Inia, E. K. M. (1998). Fäeag 'Es Fuaga: Rotuman Proverbs, Compiled and Translated by Elizabeth K.M. Inia, with an essay on proverbs in Rotuman culture by Alan Howard and Jan Rensel. Suva: Institute of Pacific Studies, University of the South Pacific. Inia, E. K. M., & Churchward, C. M. (1998). A new Rotuman dictionary: an English-Rotuman wordlist. Suva: Institute of Pacific Studies, University of the South Pacific. Jauncey, D. (2011). Dictionary of Tamambo, Malo. http://paradisec.org.au/vandicts/Tamambolexicon/ index-english/main.htm Josephs, L. S., & McManus, E.G. (1990). New Palauan-English dictionary. Honolulu: University of Hawai‘i Press. Kenyon, J. (1975). The Aboriginal word book. Melbourne: Lothian.


Kōmike Huaʻōlelo, ʻAha Pūnana Leo, & Hale Kuamoʻo. (2003). Māmaka kaiao: a modern Hawaiian vocabulary. A compilation of Hawaiian words that have been created, collected, and approved by the Hawaiian Lexicon Committee from 1987 through 2000. Honolulu: University of Hawai‘i Press. Lawes, W. G. (1896). Grammar and vocabulary of language spoken by Motu tribe (New Guinea), by Rev. W.G. Lawes ...with introduction by the Rev. George Pratt. Sydney: C. Potter. Lawrence, M. (2006). Oksapmin dictionary. http://www-01.sil.org/pacific/png/abstract.asp?id=48954 Lewis, M. P., Simons, G. F., & Fennig, C. D. (Eds.). (2013). Ethnologue: Languages of the World, Seventeenth edition. Dallas: SIL International. http://www.ethnologue.com Lichtenberk, F. (2008). A dictionary of Toqabaqita (Solomon Islands). Canberra: Pacific Linguistics. Lieber, M. D., & Dikepa, K. H. (1974). Kapingamarangi lexicon. Honolulu: University Press of Hawaii. Lloyd, J. A. (1992). A Baruya-Tok Pisin-English dictionary. Canberra: Department of Linguistics, Research School of Pacific Studies, Australian National University. Lynch, J., & Tepahae, P. (2001). Anejom dictionary: diksonari blong Anejom: nitasviitai a nijitas antas Anejom. Canberra: Pacific Linguistics. Majnep, I. S., & Bulmer, R. (1977). Birds of my Kalam country = M̄nmon yad Kalam yakt. Auckland: Auckland University Press. Malau, C. (2011). Dictionary of Vurës. http://paradisec.org.au/vandicts/Vureslexicon/index-english/main.htm Mellow, G. (2013). A Dictionary of Owa: A Language of the Solomon Islands. Canberra: Pacific Linguistics. Miroux, D., & Jeno, J. (2007). Dictionnaire français-iaai: dictionnaire contextuel et thématique. Nouméa: Alliance Champlain. Moore, D. C., & Blackman, D. A. (2004). Alyawarr picture dictionary. Alice Springs: IAD Press. Morris, B. (2006). Tirohia Kimihia: a Māori learner dictionary. Wellington: Published for the Ministry of Education by Huia. Moyle, R. M. (2011). 
Takuu grammar and dictionary: a Polynesian language of the South Pacific. Canberra: Pacific Linguistics. Moyse-Faurie, C. (1989). Dictionnaire xar^ acùù-français^ (Nouvelle-Cale´donie). Nouméa: Éditions populaires. Ngakulmungan, K. L., & Hale, K. L. (1997). Lardil dictionary: a vocabulary of the language of the Lardil people, Mornington Island, Gulf of Carpentaria, Queensland, with English-Lardil finder list. Gununa: Mornington Shire Council. Ozanne-Rivierre, F. (1984). Dictionnaire iaai-français (Ouve´a, Nouvelle-Cale´donie), suivi d’un lexique français-iaai. Paris: SELAF. P.A.C. (2011). Quelques mots de la langue de Puynipet (Île de l’Ascension) dans l’archipel des Carolines, recueillis par les Prêtres des Missions Étrangères de Milan et mis en ordre par le P.A.C. Munich: Lincom. Pawley, A., Bulmer, R., Kias, K., Gi, S. P., & Majnep, I. S. (2011). A dictionary of Kalam with ethnographic notes. Canberra: Pacific Linguistics. Pukui, M. K., & Elbert, S. H. (1986). Hawaiian dictionary: Hawaiian-English, English-Hawaiian (Rev. and enl. ed.). Honolulu: University of Hawai‘i Press. Ramarui, A., & Temael, M. K. (2000). Kerresel a klechibelau: Tekoi er a Belau me a omesodel: Palauan language lexicon. Koror: Belau National Museum. Reed, A. W. (1965). Aboriginal words of Australia. Sydney: A. H. & A. W. Reed. Rehg, K. L., & Sohl, D. G. (1979). Ponapean-English dictionary. Honolulu: University Press of Hawaii. Rivierre, J. C. (1994). Dictionnaire cèmuhˆı-français, suivi d’un lexique français-cèmuhˆı. Paris: Peeters. Ross, A., & Turpin, M. (2004). Kaytetye picture dictionary. Alice Springs: IAD Press.

Ross, M., Osmond, M., & Pawley, A. (1998). The lexicon of Proto Oceanic: the culture and environment of ancestral Oceanic society (Vol. 1). Canberra: Pacific Linguistics.
Ross, M., Pawley, A., & Osmond, M. (2003). The lexicon of Proto Oceanic: the culture and environment of ancestral Oceanic society (Vol. 2). Canberra: Pacific Linguistics.
Ross, M., Pawley, A., & Osmond, M. (2008). The lexicon of Proto Oceanic: the culture and environment of ancestral Oceanic society (Vol. 3). Canberra: Pacific Linguistics.
Ross, M., Pawley, A., & Osmond, M. (2011). The lexicon of Proto Oceanic: the culture and environment of ancestral Oceanic society (Vol. 4). Canberra: Pacific Linguistics.
Sam, L. D. (1995). Dictionnaire drehu-français. Nouméa: C.T.R.D.P./C.P.R.D.P. des îles.
Senft, G. (1986). Kilivila: the language of the Trobriand Islanders. Berlin: Mouton de Gruyter.
Sharpe, M. C., & Le May, D. (2001). Alawa Nanggaya Nindanya Yalanu junggulu = Alawa-Kriol-English dictionary. Prospect, S. Aust: Caitlin Press.
Shintani, T. L. A., & Paita, Y. (1990). Dictionnaire et grammaire de la langue de Païta. Nouméa: Société d’Études Historiques de la Nouvelle-Calédonie.
Sohn, H., & Tawerilmang, A. F. (1976). Woleaian-English dictionary. Honolulu: University Press of Hawaii.
Sperlich, W. B. (1997b). Tohi vagahau Niue = Niue language dictionary: Niuean-English, with English-Niuean finderlist. Honolulu: University of Hawai‘i Press.
Stephens, M., & Boyce, M. (2013). He Papakupu Reo Ture: A dictionary of Maori legal terms. Wellington: Lexis Nexis.
Stewart, P. J., Strathern, A. J., & Trantow, J. (2011). Melpa-German-English dictionary. Pittsburgh: University of Pittsburgh.
Tauberschmidt, G., & Snyder, D. M. (1995). Sinauḡoro dictionary. Ukarumpa: Summer Institute of Linguistics.
Te Taura Whiri i te Reo Maori. (2009). He Pataka Kupu – Te kai a te rangatira. North Shore: Raupo Publishing/Penguin.
Thieberger, N. (2011b). Dictionary of South Efate. http://paradisec.org.au/SELexicon/index-english/main.htm
Thieberger, N., & McGregor, W. B. (1994). Macquarie Aboriginal Words: a dictionary of words from Australian Aboriginal and Torres Strait Islander languages. Sydney: Macquarie Library.
Topping, D. M., Ogo, P., & Dungca, B. (1975). Chamorro-English dictionary. Honolulu: University Press of Hawaii.
Tryon, D. T. (Ed.). (1995). Comparative Austronesian Dictionary: an introduction to Austronesian studies. Berlin: Walter de Gruyter.
Walter, A., & Sam, C. (1999). Fruits d’Océanie. Paris: IRD Éditions.
Wangka Maya Pilbara Aboriginal Language Centre. (2006). Payungu picture dictionary. South Hedland: Wangka Maya Pilbara Aboriginal Language Centre.
Waterhouse, J. H. L. (2005/1928). The Roviana and English dictionary. Sydney: Shepp Books.
White, G., Kokhonigita, F., & Pulomana, H. (1988). Cheke Holo (Maringe/Hograno) Dictionary. Canberra: Pacific Linguistics.
Williams, H. W. (1957). A Dictionary of the Maori Language. Wellington: Government Printer.

Page 16 of 16 International Handbook of Modern Lexis and Lexicography DOI 10.1007/978-3-642-45369-4_97-1 # Springer-Verlag Berlin Heidelberg 2015

The lexicography of minority languages in Southeast Asia

David Bradley* La Trobe University, Melbourne, Australia

Abstract

This chapter will first introduce the background to the creation of dictionaries for minority languages in this area, which also has wider implications. This is followed by a discussion of unusual lexical characteristics in the languages of this area; case studies of the history of lexicography in three large transnational minority languages: the Tibeto-Burman language Lisu, the Mon-Khmer language Khmu, and the Hmong-Mien language Mien; and then some comments on the regulatory bodies and planning processes which are involved and a brief discussion of future lexicographical prospects and needs.

Introduction

Lexicography among minorities in seven mainland Southeast Asian countries, Vietnam, Laos, Cambodia, Thailand, Burma/Myanmar, Malaysia, and Singapore, is a vast topic. These countries have several hundred minorities speaking languages from four major language families: Sino-Tibetan (mainly in the Tibeto-Burman subgroup), Austroasiatic (all in the Mon-Khmer subgroup), Austro-Thai (including languages from the Thai-Kadai subgroup and the Austronesian subgroup), and Hmong-Mien (formerly known as Miao-Yao from the Chinese names for these groups). There are also immigrant minorities, mainly from East Asia speaking varieties of Chinese and from South Asia speaking Dravidian and Indo-Iranian languages. Many of the indigenous minorities in this area are transnational and also live in China and/or South Asia. Some of the transnational groups in this area are a minority in one country but the dominant national majority in others; for example, Lao, Malay, Khmer, and Viet are substantial minorities in Thailand, and Viet are also a substantial minority in Cambodia. Some minorities which used to be the dominant group in their own former political entities have long-established writing systems and lexicographical traditions, like the Mon in Burma/Myanmar and Thailand; others have much shorter traditions of writing and lexicography or none at all. Clearly, this chapter cannot discuss all these minorities; for a more comprehensive list and maps of all groups involved, see Bradley (2007a).

Firstly, it should be observed that there are no monolingual dictionaries for any minority languages in Southeast Asia; most are from a minority language into a global language, usually English, or a local national language; there are also some bidirectional dictionaries. Few dictionaries prepared by non-group members in this area are global language into minority language; however, community members do prepare such dictionaries, as we will see below.
In China, but not in Southeast Asia, there are many dictionaries from minority language into Chinese and from Chinese into minority language, some bidirectional dictionaries with Chinese, and a few monolingual dictionaries for the languages of larger or more historically and culturally developed minorities; some of these minorities also live in Southeast Asia. Among the dictionaries in the three Southeast Asian case studies below, Premsrirat (1993) is unusual as it is from Thai into the minority language Khmu (written in Thai script) and English.

In this part of the world, three different types of people prepare minority language dictionaries: missionaries, linguists, and community members; in some cases, a particular lexicographer is in two of these categories: missionary and linguist, missionary and community member, or linguist and community member; some lexicographical teams, such as that responsible for Bradley et al. (2006), include all three. We will see examples of missionary dictionaries in all three historical case studies, dictionaries by linguists who are not missionaries in two cases, and dictionaries by community members in two cases. The missionary dictionaries are usually the earliest; they are typically associated with preparing religious materials in languages for use by converts, training converts in literacy in their language so they can read these materials and access more advanced materials in a global language, and helping new outside missionaries to learn the languages and join the work; they are not aimed at scholars. Of course, if the missionary is also a trained linguist, then such materials are likely to be of a very high standard and very comprehensive, based on many years’ experience in the community. Linguists prepare dictionaries which meet normal scholarly standards but are sometimes not fully accessible to or usable by the community. Community members, normally enthusiastic amateurs, prepare dictionaries designed for use by the community, often with the purpose of improving the community’s language abilities in some global language or giving the second generation in an immigrant context better access to their background language. So the underlying aims are completely different! 
Apart from the problems in phonetic accuracy which may arise when a dictionary is prepared by a missionary or community member with limited background in linguistics, another drawback can be that a dictionary from a global language into a minority language is directly based on an existing monolingual global language dictionary; this is the case for the English-Lisu dictionary by Ngwazah (2007) and the English to Mienh part of Pauh (2002) discussed below in case studies. Such a dictionary invents many neologisms to gloss unfamiliar words but is missing much of the rich traditional cultural vocabulary. Missionaries often prefer not to include vocabulary which they deem to be inappropriate: either too closely associated with traditional religious practices or taboo in the missionary’s own culture; community members may also omit material due to their own taboos.

A dictionary is first and foremost a symbol that a language has arrived; it gives status to the language and its speakers. It can also be a useful tool for learning and maintaining the language, especially where it is endangered. Most dictionaries sit in a prominent place on the shelf in the homes of community members but are little used. This is particularly so when the lexicographer has not considered the needs of the community in its preparation. In academic circles, dictionaries are not always regarded with the same respect as other book-length publications, even though they may require far more work and are of much greater future use to a community and for scholars. Obviously, a dictionary should be a repository for the entire cultural knowledge embedded within a community.
This goal is usually not achieved in all semantic fields; for example, while it may be possible to identify and find equivalents or at least scientific names for most animal nouns, nouns for plants are often much more problematic, unless the lexicographer compiling the dictionary has botanical knowledge or assistance from a taxonomic botanist. Where possible, a comprehensive dictionary should include the lexicon of traditional activities of all types: religions (traditional and new), hunting, agriculture, animal husbandry, traditional medicine, and so on. Relevant proper names of individuals, places, and geographical features should also be included. If particular lexical material is archaic, obsolete, taboo, or restricted to use in a particular domain such as religion, this should be stated in the relevant entry. For plants and animals, scientific names should be given where the identification is certain; of course in many cases, a particular lexical item may refer to multiple different species or even genera or to different genera and species in different areas; this should also be indicated where possible.

Technical linguistic material, such as sketch grammars embedded in an introduction, detailed form class, etymological and other linguistic information embedded in each lexical entry, and so on may make a dictionary formidable and inaccessible to community users; but their absence would make a dictionary less useful and acceptable for linguists. Given that publishing hard copy materials the size and complexity of a dictionary is costly, it may often be the case that only one dictionary will ever be produced for a particular minority language, so the lexicographer needs to make some compromises: put in the necessary linguistic information but indicate to speakers in a separate introduction in the language what they can ignore – including portions of the introduction containing technical linguistic material and parts of the lexical entries – and where possible, cite forms in a standard orthography that speakers know and prefer, as well as in an appropriate IPA transcription. Also, give entries in an alphabetical order that is conventional in the language, so speakers can find words easily, with clear cross-references for linguists and an outline of the orthography in the linguistic introduction. This is what we have done in Bradley et al. (2006) for Lisu.
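The compromise described above (one entry carrying both a community orthography and an IPA form, with entries sorted in the language's own alphabetical order) can be sketched in a few lines of Python. This is only an illustrative sketch, not the structure used in Bradley et al. (2006): the field names, the `Entry` class, and in particular the letter order `TOY_ORDER` are hypothetical and do not reproduce the actual Lisu alphabetical order. The sample forms and glosses are taken from the Lisu examples cited in this chapter.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical letter order, for illustration only. A real dictionary would
# use whatever order is conventional in the language's own orthography.
TOY_ORDER = "LKBAEIOU"


@dataclass
class Entry:
    headword: str                 # form in the orthography speakers prefer
    ipa: str                      # IPA transcription for linguists
    gloss: str
    form_class: str = ""          # e.g. "numeral", "expressive"
    usage_labels: List[str] = field(default_factory=list)  # e.g. "archaic", "taboo"
    scientific_name: Optional[str] = None  # only where identification is certain


def sort_key(headword: str, order: str = TOY_ORDER) -> List[int]:
    """Rank each letter by its position in the given alphabet.

    Tone marks, spaces, and other non-letters are ignored; letters that are
    not in the alphabet sort after all known letters.
    """
    rank = {letter: i for i, letter in enumerate(order)}
    return [rank.get(ch, len(order)) for ch in headword.upper() if ch.isalpha()]


def sort_entries(entries: List[Entry], order: str = TOY_ORDER) -> List[Entry]:
    """Return entries in the conventional order of the target language."""
    return sorted(entries, key=lambda e: sort_key(e.headword, order))


entries = [
    Entry("BE: LE:", "be21 le21", "hairy/furry (of an animal)", "expressive"),
    Entry("KU", "ku44", "nine", "numeral"),
    Entry("LI", "li44", "four", "numeral"),
]

for entry in sort_entries(entries):
    print(entry.headword, entry.ipa, entry.gloss, sep="\t")
```

Keeping the sort order as data rather than relying on the default string comparison means the same entries could be re-sorted for a linguist-oriented index (for example, in English alphabetical order) without touching the entries themselves.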

Description

Lexical Characteristics in Minority Languages of Southeast Asia

In this linguistic area, there are some unusual form classes and lexico-semantic subfields found in minority as well as majority languages. One is numeral classifiers, an obligatory element, almost always one syllable, which must occur after a numeral. The choice of classifier is determined by the semantic properties of the head noun. The inventories of classifiers differ in size between languages; some are quite small sets, others are very large. Most have been grammaticalized from nouns; this process appears to have started about two millennia ago and spread from the East into the Southeast Asian linguistic area; it has also diffused in a more limited way into the South Asian linguistic area (Barz and Diller 1985). For example, in southern Lisu (Bradley et al. 2006):

LI-SU d: KU W:
li44 su44 −pha21 ku44 wa21
Lisu male nine CLF.human
‘nine Lisu men’

Within the classifier category, the Ngwi subgroup of Burmic Tibeto-Burman languages has an even more unusual subcategory: family group classifiers for groups of family members with a particular dyadic relationship. For example, Lisu has a classifier for a group of siblings, cousins, and their spouses and also unusual two-syllable classifiers for a group including a father and his children, a group including a mother and her children, and a group including grandparents and their grandchildren (Bradley 2001). In many languages, these family group classifiers are unusual among classifiers because most of them have two syllables; most are derived from kinship terms but with irregular phonetic and semantic developments due to their lexicalization. For example, in southern Lisu (Bradley et al. 2006):

LI P. L;
li44 pa55 laʔ21
four father.children
‘group of four people including a father and three of his children’

Here, /pa55/ is identical in form to a more restricted variant, /pa55/, of the “male” suffix /pha21/, and /laʔ21/ is related to “child” /za21/ in an irregular way; all of the two-syllable Lisu family group classifiers have an initial /l/ in the second syllable and a final glottal stop. The meaning of this form must include a father and at least one of his children; the mother and the children’s spouses can also be included. This form can be used by a family member included or not included in the group or by an outsider; for the father, it means “I and my children” but can also include my wife and my children’s spouses; for the child, it means “I and my father and some of my siblings” and can also include any of my mother, my spouse, and my siblings’ spouses; for the mother, it means “my husband and our children” and can also include my children’s spouses and me. For a grandparent, it can also mean “my son and his children” or “my grandson and his children.” For anyone else, it means a group of four people including a father and some of his children (possibly including the mother and some of the children’s spouses). As these are a semantic subclass of numeral classifiers, they are entered in dictionaries with their form class identified as such, the comment that they cannot occur with the numeral “one” for semantic reasons, and examples showing their use.

A second widespread form class is partly reduplicated expressive forms; many are onomatopoetic, but others are based on regular phonological patterns, such as the second syllable having the same rhyme and tone as the first, optionally followed by a third syllable identical to the second, for example, Lisu (3):

a) KO. TO. (TO.)
   ko55 to55 (to55)
   ‘crested (of a bird)’

b) BE: LE: (LE:)
   be21 le21 (le21)
   ‘hairy/furry (of an animal)’
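The regular pattern just illustrated (second syllable sharing rhyme and tone with the first, optional third syllable identical to the second) is simple enough to check mechanically, for instance when flagging candidate expressives in a wordlist. The following Python sketch does so under stated assumptions: syllables are given as transcriptions ending in tone digits, the onset is taken to be everything before the first of the five plain vowels a, e, i, o, u, and the function names are hypothetical. It is not a description of any existing lexicographical tool.

```python
import re
from typing import List, Tuple

# Assumed simplification: onset (anything before the first plain vowel),
# rhyme (everything up to the tone), tone (trailing digits).
SYLLABLE = re.compile(r"^([^aeiou\d]*)([^\d]+)(\d+)$")


def parse(syllable: str) -> Tuple[str, str, str]:
    """Split a transcribed syllable such as 'ko55' into (onset, rhyme, tone)."""
    match = SYLLABLE.match(syllable)
    if match is None:
        raise ValueError(f"cannot parse syllable: {syllable!r}")
    return match.group(1), match.group(2), match.group(3)


def is_rhyming_expressive(syllables: List[str]) -> bool:
    """Check the pattern described in the text.

    The second syllable must share rhyme and tone with the first, and an
    optional third syllable must be identical to the second.
    """
    if len(syllables) not in (2, 3):
        return False
    _, rhyme1, tone1 = parse(syllables[0])
    _, rhyme2, tone2 = parse(syllables[1])
    same_rhyme_and_tone = (rhyme1, tone1) == (rhyme2, tone2)
    third_ok = len(syllables) == 2 or syllables[2] == syllables[1]
    return same_rhyme_and_tone and third_ok


print(is_rhyming_expressive(["ko55", "to55"]))
print(is_rhyming_expressive(["be21", "le21", "le21"]))
print(is_rhyming_expressive(["li44", "su44"]))
```

A check of this kind can only identify forms that fit the phonological template; whether a given form is actually an expressive still has to be decided by the lexicographer.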

Such expressive forms are particularly numerous in Mon-Khmer minority languages such as Khmu but are also found in all others in the area. Many languages in the area also have four-syllable elaborate expressions which typically have identical first and third or second and fourth syllables.

A third unusual lexical category is birth order names; these are bound suffixal forms also used with some kinship terms. While it is not so unusual worldwide to have personal names or alternative address terms based on gender and order of birth within a family which are derived from or contain numerals, it is more unusual to have unanalyzable lexical forms, as exist in quite a few Tibeto-Burman languages in this area (though similar lexicalized sets of birth order names are found in some Austronesian languages of Indonesia, such as Balinese, and elsewhere). For example, Rawang, Anung, and related languages of northern Burma/Myanmar and nearby in China have lexical forms for first- to ninth-born male and first- to ninth-born female; these are used as personal names and can also be added to many kin terms to indicate, for example, that a father’s brother was one’s father’s second-born elder brother. The Anung system is also borrowed into the northern dialect of Lisu with its nine male and nine female birth order names also used as suffixes on kin terms (Bradley 1994a, 2007c). Other Ngwi Tibeto-Burman languages have less extensive systems, typically with three to five lexical terms for first born, second born, and so on, applicable to either gender and used as personal names and as suffixes on kin terms, which are of course also used to address people. For example, in southern Lisu (Bradley et al. 2006) LE, /le35/ means “second born” and is used with the nominal prefix A: /a21/ and sometimes a gender suffix such as M /ma44/ “female” as in (4a) or suffixed to a kin term as in (4b):

a) A:-LE, M
   a21 le35 ma44
   NMZR 2nd.born female
   ‘Alema (name of a second-born daughter in her family)’

b) WO.-LE,
   wo55 le35
   father’s.elder.brother 2nd.born
   ‘father’s second-born elder brother’

Words containing these forms belong to the form class noun. While such forms pose no difficulty for speakers of the languages, they may be unfamiliar to outsiders and thus need clear explanation. It is not possible to list the entire paradigm or inventory of forms in the entry of each individual form, so these must be discussed, illustrated, and enumerated together elsewhere, such as in a grammar section of a dictionary.

History of Lexicography in Three Southeast Asian Minority Languages

Lisu

The Lisu are a minority of about one million speaking a Central Ngwi Tibeto-Burman language who live in China, Burma/Myanmar, Thailand, and India. Since 1916, they have had a unique script which uses upright and inverted upper case Roman letters to represent a central dialect as spoken in China and eastern Burma/Myanmar and, since the 1970s, also in Thailand; this is usually called the Fraser script after the main missionary who developed it. There is also a 1950s romanization based on the principles of Chinese pinyin representing a northern dialect as spoken in China, northern Burma/Myanmar, northeastern India, and, since the 1970s, also in Thailand. The southern dialect is spoken in Thailand and in some nearby areas of Burma/Myanmar. In addition, there is a script for the related language Lipo, sometimes called eastern Lisu in the older literature, based on principles developed by the missionary Samuel Pollard from 1904. For more details and illustrations, see Bradley (1994a, 2006) and Bradley and Bradley (1999).

The earliest substantial lexicon of Lisu is central Lisu to English in Fraser (1922), by the missionary James O. Fraser, which uses a transliteration of the Fraser script into a romanization, as no font was available for his script then. There is a Lisu-Chinese dictionary of the northern dialect, Xu et al. (1985), which uses the pinyin Lisu script but also includes the Fraser script; Xu was Bai, but the other eight coauthors were Lisu, some with training in linguistics. Based on this is Bradley (1994b) in the pinyin script, which has additional northern Lisu lexical material and both a Lisu-English and a reversed English-Lisu side. Then there is Bradley et al. (2006), Lisu to English for the southern dialect in the Fraser script; the coauthors are one missionary linguist, one community member, and two linguists.
Ngwazah (2007), by a Christian pastor and community member who speaks the northern dialect, is English to Lisu for the idealized central dialect in the Fraser script. There is one further script in use for Lisu, a syllabic system devised in the 1920s by an indigenous traditional religious leader, Huang Renpo, but there is no dictionary in this script. For Lipo, there is an unpublished manuscript dictionary by a missionary, Metcalf (1948), from Pollard script Lipo to English; despite its former name eastern Lisu, Lipo is not mutually intelligible with Lisu. Pollard script, devised by the missionary Samuel Pollard from 1904, is used for a number of languages in southwestern China, though not for Lisu itself. The illustration with this chapter shows the Huang Renpo script, the Fraser script, the pinyin Lisu script, IPA transcription, glosses, and translation for a short passage from one of Huang’s manuscripts.

The Fraser (1922) materials are quite accurate for their time; Fraser is reported to have been an excellent speaker, and the script he developed for a slightly standardized central dialect is phonologically adequate; it is also able to represent most other dialects. Xu et al. (1985) contains many typographical errors, corrected in Bradley (1994b). Bradley et al. (2006) is in Lisu alphabetical order, designed both for linguists and for speakers, and is used fairly extensively by speakers everywhere; most literate speakers use and prefer the Fraser script. Ngwazah (2007), as noted above, is based on an existing monolingual English learners’ dictionary. It contains many neologisms which would be unfamiliar to all Lisu. It also shows some influence from the northern dialect of its compiler; Lisu people normally try to write in the idealized central dialect originally selected by Fraser but do not always succeed completely. Most Lisu are fairly happy with their current range of dictionaries but would prefer something in the Fraser script and based on the idealized central dialect normally used in writing; that is our next project. Bradley (1994b) and Bradley et al. (2006) are also quite widely used by linguists.

Khmu

The Khmu are a minority of over 500,000 speaking a Northern Mon-Khmer language who live in Laos, Thailand, Vietnam, and China. They are most numerous in northern Laos and are the second largest minority group in Laos. Dialect diversity is very substantial in Laos and much less elsewhere, where populations are much smaller. Delcros (1966) is a Khmu-French dictionary of Khmu in Laos by a missionary, which uses a romanization to represent Khmu. I have seen this romanization in use in an informal school in a Khmu village in Luang Phabang Province of Laos, but it has no government approval and is not widespread.

The Thai linguist Suwilai Premsrirat has spent many years documenting Khmu and its dialects, initially in Thailand and later in Vietnam, Laos, and China. In addition to many publications on Khmu grammar, phonology, and culture, she produced a Thai-Khmu-English dictionary with introduction in Thai (Premsrirat 1993), a Khmu-Vietnamese-Thai-English dictionary of Khmu as spoken in Vietnam (Premsrirat et al. 1998), and a thesaurus and series of four dictionaries of Khmu as spoken in China, Vietnam, Laos, and Thailand (Premsrirat and Thawornpat 2002a, b, c, d, e). The thesaurus, with an excellent introduction including a sketch grammar of Khmu, is arranged by semantic fields and is Khmu-English-Lao-Vietnamese-Chinese-Thai. The Khmu in China dictionary is Khmu-English-Chinese-Thai; the second Khmu in Vietnam dictionary is from Khmu to English, Vietnamese, and Thai; the Khmu in Laos is Khmu-English-Lao-Thai; and the Khmu in Thailand is Khmu-English-Thai. All are well illustrated with photographs and have similar English introductions. Premsrirat (1993) uses a Thai-based script in Thai alphabetical order for Khmu as spoken in Thailand, but the 1998 and all the 2002 publications use only IPA phonetics in English alphabetical order to represent Khmu. Thus, they may be less useful for community members, though they are ideal for scholars.

Mien

The Mien or Mienh are a minority of over 800,000 who live in southwestern China, northern Vietnam, northern Laos, northern Thailand, and a few in northeastern Burma/Myanmar; a substantial community of Mien refugees from Laos now lives in the USA, Canada, and France. Their language is part of the Mienic subgroup of Hmong-Mien. The Chinese term for the Mien and a number of related languages is Yao; this term is also seen in the literature to refer to Mien and to Mienic. The group’s autonym is Iu Mien (where Iu means “people”). Mien and other related Yao languages traditionally have beautifully illustrated Daoist ritual manuscripts written in Chinese characters and read by male ritual specialists in Chinese with a Cantonese pronunciation. These must have become part of Mien ritual when the Yao were still living in what is now eastern Guangxi Province in China, in contact with Cantonese Chinese speakers. There is also a literary language, similarly written by the Mien with Chinese characters and read with a pronunciation like archaic Mandarin Chinese. Neither is a form of Mien language.

Mien is an unusual case of a successful unification of its romanization. A romanization was prepared at a conference of Mien immigrants in the USA in 1982, avoiding some features of an earlier missionary-devised romanization. In 1984, this was merged, in a compromise form, with a pinyin-based romanization recently developed in China. In this unified romanization, tones are indicated by postscript letters; the full form of the group name is Mienh (with –h representing a mid-falling tone). There are still some minor spelling differences between materials produced in China and those produced elsewhere. All lexicographical work on this language since 1984 uses this unified romanization, which is phonologically accurate. For example,

However, not all Mien choose to use this script, and it is not in use in Laos, Vietnam, or Burma/Myanmar; in Vietnam, there are materials in a romanization based on Vietnamese principles but no dictionary. Other earlier orthographies including three using Thai letters, one using a romanization, and one using the Chinese phonetic syllabary are no longer in use (Purnell 1987, 2002). There is a pre-1984 Mien to English dictionary (Lombard 1968) in the missionary romanization by a missionary, edited by the excellent missionary linguist Herbert C. Purnell. The community member Pauh Smith, an immigrant from Laos, has produced two dictionaries: one Mien to English (Pauh 1995) and the other bidirectional, English-Mien and Mien-English (Pauh 2002); these are massive and comprehensive, though as noted above the English-Mien part of Pauh (2002) is based on an existing monolingual English dictionary. The superb Purnell et al. (2012) Mien to English dictionary is a wonderful model; it includes a great deal of traditional Mien cultural knowledge, is comprehensive and well organized, has an excellent linguistic introduction and many examples, and is designed for use both by linguists and by community members. Thus, Mien is particularly well served by its lexicographers, using an orthography which unifies this transnational community across national borders.

Language Planning Institutions and Regulatory Bodies

Language policy for minorities in most Southeast Asian countries is in principle supportive of linguistic rights, but in practice, the national language tends to dominate and replace minority languages, many of which are endangered. For a comprehensive list of endangered minority languages in the area, see Bradley (2007b). In some countries, minorities have constitutional or other legal rights to their languages; in others, this is less strongly institutionalized. The main bodies making decisions about languages are ministries of education, though other ministries such as interior or a special ministry for ethnic groups are sometimes involved. There is no single planning or regulatory body for the entire area, which includes seven nations, and thus no consistency of policy for transnational minorities. The same group in different countries may face different requirements for orthographies. This sometimes has strange consequences; for example, the Malay minority in Thailand is being encouraged to use a Thai-based script rather than the romanization or Arabic script used by the Malay majority nearby in Malaysia. It is often the case that there are several orthographies for the same language, used in different countries or by those of different religions, which complicates lexicography.

Minority language policy at the national level may include corpus planning, such as preferences or requirements about writing systems, as in Vietnam and Thailand, where new systems are expected to use the script and conventions of the national language. Negative status planning for minority languages is near universal in mainland Southeast Asia; in most cases, minority languages are excluded from government schools, administrative use, and other official domains. The policies do not exclude the preparation of dictionaries of minority languages but do not support it either.

In this regard, the language policy in China is quite different for many of the same transnational minorities who also live in Southeast Asia. In China, there is a great deal of official and institutional policy and financial support, including for the preparation and publication of dictionaries for large, recognized groups. This support comes from the national, provincial, and local Nationalities Commission, the ministry in charge of matters concerning minorities; from its associated Nationalities Presses, which print dictionaries and other materials, like Xu et al. (1985); and from the universities of nationalities in many locations, where many lexicographers and other language researchers, most of them members of the relevant national minorities, work and train students from minority and other backgrounds.

Future Prospects

Lexicography in minority languages of Southeast Asia is fairly advanced for some languages but is of very uneven quality and usefulness. Many languages do not have any adequate dictionary, others do, and some have really excellent dictionaries. Existing dictionaries were all produced as print dictionaries, prepared by different types of people for different purposes, with different kinds of advantages and disadvantages. Although some printed dictionaries are also available online, such as Bradley (2006), uptake in the community is very limited. There has been no separate online lexicography yet, though that is of course a future prospect. However, the potential for use of online resources by communities who live mainly in remote areas with little or no computer literacy and poor internet access should not be exaggerated. The Khmu and Mien romanizations and pinyin Lisu of course present no problems for smartphone use, but it is not yet possible to use Fraser Lisu, Huang Renpo Lisu, or the Pollard scripts for Lipo and other languages in this way. Thus, a great deal of work is still needed.

References

Barz, R. K., & Diller, A. V. (1985). Classifiers and standardization: Some South and South-East Asian comparisons. In D. Bradley (Ed.), Language policy, language planning and sociolinguistics in South-East Asia (Pacific Linguistics A-67, pp. 155–184). Canberra: Department of Linguistics, Research School of Pacific Studies, Australian National University.

Bradley, D. (1994a). Building identity and the modernisation of language: Minority language policy in Thailand and China. In A. Gomes (Ed.), Modernity and identity: Asian illustrations (pp. 192–205). Bundoora: Institute of Asian Studies, La Trobe University for Asian Studies Association of Australia.

Bradley, D. (2001). Counting the family: Family group classifiers in Yi Branch languages. Anthropological Linguistics, 43(3), 1–17.

Bradley, D. (2006). Lisu orthographies and email. In A. Saxena & L. Borin (Eds.), Lesser-known languages of South Asia: Status and policies, case studies and applications of information technology (pp. 125–135). Berlin: Mouton de Gruyter.

Bradley, D. (2007a). East and South-East Asia. In R. E. Asher & C. Moseley (Eds.-in-chief), Atlas of the World’s languages (2nd ed., pp. 159–202). London: Routledge.

Bradley, D. (2007b). East and South-East Asia. In C. Moseley (Ed.), Encyclopedia of the World’s endangered languages (pp. 379–422). London: Routledge.

Bradley, D. (2007c). Birth-order terms in Lisu: Inheritance and contact. Anthropological Linguistics, 49(1), 54–69.

Bradley, D., & Bradley, M. (1999). Standardisation of transnational minority languages: Lisu and Lahu. Bulletin Suisse de Linguistique Appliquée, 69(1), 75–93.

Purnell, H. C. (1987). Developing practical orthographies for the Iu Mienh Yao, 1932–1986: A case study. Linguistics of the Tibeto-Burman Area, 10(2), 128–141.

Purnell, H. C. (2002). Steps towards standardization of a minority orthography: An update on Mien (Yao). In M. Macken (Ed.), Papers from the tenth annual meeting of the Southeast Asian linguistics society (pp. 297–316). Tempe: Program for Southeast Asian Studies, Arizona State University.

Dictionaries

Bradley, D. (1994b). A Dictionary of the Northern Dialect of Lisu (China and Southeast Asia) (Pacific Linguistics C-126). Canberra: Department of Linguistics, Research School of Pacific and Asian Studies, Australian National University.

Bradley, D., Hope, E. R., Fish, J., & Bradley, M. (2006). Southern Lisu Dictionary (STEDT Monograph Series No. 4). Berkeley: Sino-Tibetan Etymological Dictionary and Thesaurus Project, University of California Berkeley.

Delcros, H. (1966). Petit dictionnaire du langage des Khmu’ de la region de Xieng-Khouang. Vientiane: Mission Catholique.

Fraser, J. O. (1922). Handbook of the Lisu (Yawyin) Language. Rangoon: Government Printer.

Lombard, S., & Purnell, H. C. (Ed.) (1968). Yao-English Dictionary (Southeast Asia Program Data Paper No. 69). Ithaca: Cornell University.

Metcalf, C. E. (1948). Eastern Lisu Dictionary. Ms.

Ngwazah, A. (2007). English-Lisu Dictionary with Grammar, Idioms and Phrases. Chennai: Author.

Pauh, S. (1995). Mienh In-Wuonh Dimv Nzangc Sou/Mienh-English Everyday Language Dictionary. Visalia: Author.

Pauh, S. (2002). Modern English-Mienh and Mienh-English Dictionary. Victoria: Trafford Publishing.

Premsrirat, S. (1993). Thai-Khmu-English Dictionary. Bangkok: Institute of Language and Culture for Rural Development, Mahidol University.

Premsrirat, S., Lo, V. T., Thawornpat, M., & Trinh, D. T. (1998). Nghe An Khmu-Vietnamese-Thai-English Dictionary. Bangkok: Institute of Language and Culture for Rural Development, Mahidol University.

Premsrirat, S., & Thawornpat, M. (2002a). Thesaurus of Khmu Dialects in Southeast Asia (Mon-Khmer Studies Special Publication No. 1. Thesaurus and Dictionary Series of Khmu Dialects in Southeast Asia, Vol. 1). Bangkok: Institute of Language and Culture for Rural Development, Mahidol University.

Premsrirat, S., & Thawornpat, M. (2002b). Dictionary of Khmu in China (Mon-Khmer Studies Special Publication No. 1. Thesaurus and Dictionary Series of Khmu Dialects in Southeast Asia, Vol. 2). Bangkok: Institute of Language and Culture for Rural Development, Mahidol University.

Premsrirat, S., & Thawornpat, M. (2002c). Dictionary of Khmu in Laos (Mon-Khmer Studies Special Publication No. 1. Thesaurus and Dictionary Series of Khmu Dialects in Southeast Asia, Vol. 3). Bangkok: Institute of Language and Culture for Rural Development, Mahidol University.

Premsrirat, S., & Thawornpat, M. (2002d). Dictionary of Khmu in Vietnam (Mon-Khmer Studies Special Publication No. 1. Thesaurus and Dictionary Series of Khmu Dialects in Southeast Asia, Vol. 4). Bangkok: Institute of Language and Culture for Rural Development, Mahidol University.

Premsrirat, S., & Thawornpat, M. (2002e). Dictionary of Khmu in Thailand (Mon-Khmer Studies Special Publication No. 1. Thesaurus and Dictionary Series of Khmu Dialects in Southeast Asia, Vol. 5). Bangkok: Institute of Language and Culture for Rural Development, Mahidol University.

Purnell, H. C., Zanh, G. F., Burgess, V. A., & Aumann, G. (2012). An Iu-Mienh–English Dictionary. Chiang Mai: Silkworm Books.

Xu, L., Mu, Y. Z., Shi, L. Q., Ji, J. W., Zhu, H. S., Mu, S. J., Luo, Z. Y., Ma, J. L., & Qi, K. T. (1985). Lisu-Chinese Dictionary [in pinyin Lisu, Fraser Lisu and Chinese]. Kunming: Yunnan Nationalities Press.
