<<

Number 23 ● June 2015 Kernerman kdictionaries.com/kdn News Multilingual and the Web of Data Jorge Gracia

1. Introduction argue that reaching a critical mass of linguistic data as LD on Nowadays, we are witnessing a growing trend in publishing the Web will set the basis for a new generation of LD-aware language resources (, corpora, dictionaries, etc) Natural Language Processing (NLP) services, with improved as Linked Data (LD) on the Web. LD refers to a set of best scalability and better interoperability level. The latter is, in practices for exposing, sharing and connecting data on the fact, one of the motivations of LIDER2, a European project Web (Bizer et al 2009). In short, the LD paradigm requires that is driving a remarkable community effort in that direction. that (i) resources are represented on the Web via HTTP In this context, the Ontology Engineering Group (OEG3) at URIs (Unique Resource Identifiers), (ii) once a resource is Universidad Politécnica de Madrid has started converting a accessed via its URI, information about it is obtained, and series of bilingual dictionaries and multilingual terminologies (iii) such information contains links to other resources. The and publishing them as LD on the Web. In the following basic mechanism to support the representation of resources paragraphs we briefly present the RDF conversion process and their related information is the Resource Description that we have followed, and report on our experience with two Framework (RDF1), which follows the subject-object- of these datasets: Apertium and Terminesp. predicate pattern. Resources can be anything, including documents, people, physical objects and abstract concepts. 2. RDF generation of bilingual and multilingual dictionaries Following LD principles, a ‘Web of Data’ emerges in which Recently the W3C Best Practises for Multilingual Linked Open links are at the level of data, as a counterpart to the “traditional” Data (BPMLOD) community group4 has proposed a set of Web in which links are established at the level of documents guidelines for the LD generation of language resources. In (e.g. hyperlinks between webpages). particular, the guidelines for bilingual dictionaries5 identify five Publishing language resources as LD offers clear steps, namely: (i) selection, (ii) modelling, (iii) URI advantages to both the data owners and data users, such as higher independence from domain-specific data formats or 2 http://lider-project.eu/ vendor-specific APIs, as well as easier access and re-use of 3 http://oeg-upm.net/ linguistic data by semantic-aware software agents. Further, we 4 http://w3.org/community/bpmlod/ 5 http://bpmlod.github.io/report/bilingual-dictionaries/index. 1 http://w3.org/TR/rdf11-primer/ html/

1 Multilingual dictionaries and the Web of Data | Jorge Gracia This issue is dedicated 5 Enhancing with semantic language databases | Bettina Klimek to the memory of and Martin Brümmer Adam Kilgarriff 11 Reflections on the concept of a scholarly dictionary | Dirk Kinable 13 The and the digital age: Steps of a transformation process | Nathalie Mederake 16 Recent developments in German lexicography | Alexander Geyken 19 A standardized wordlist and new spellchecker of Frisian | Anne Dykstra, Pieter Duijff, Frits van der Kuip and Hindrik Sijens 21 The Bloomsbury Companion to Lexicography. Howard Jackson (ed.) |

Orion Montoya © 2015 All rights reserved. 26 Oxford Guide to the practical usage of English monolingual learners’ dictionaries: Effective ways of teaching dictionary use in the English class. K DICTIONARIES LTD Shigeru Yamada | Michael Rundell 8 Nahum Hanavi St. 28 The Linguistic Linked Open Data Cloud Tel Aviv 63503 Israel Tel: +972-3-5468102 [email protected] Editor | Ilan Kernerman http://kdictionaries.com ISSN 1565-4745 2

design, (iv) generation, and (v) publication. Finally, the generated RDF data has A similar approach can be followed for been loaded in a triple store and made multilingual dictionaries as well. We are accessible through a single SPARQL not going into the details here, but will just endpoint. In that way, all the data from the highlight key aspects. original dictionaries were made accessible In order to represent the lexical as LD on the Web in a unified graph with information contained in the original lexical entries, senses, translations, etc, as dictionaries, we relied on the nodes. All the nodes were identified with Model for ONtologies (lemon6), a de-facto dereferenceable URIs. Such data can be standard for representing ontology lexica. accessed by means of SPARQL clients and We used the lemon translation module7 RDF and HTML browsers. to represent explicit translations between languages (Gracia et al 2014). As a result 3. Apertium of the conversion into RDF of a bilingual Apertium9 is a free open-source machine dictionary, a lemon lexicon is defined translation platform. Its translation engine per language, where all the translations consists of a series of assembled modules corresponding to a pair of languages are that communicate with each other using text grouped under the same translation set. A streams. One of the modules, the lexical translation set groups a set of translations transfer module, uses a sharing certain properties, for instance to deliver the corresponding target lexical Jorge Gracia is post-doctoral stemming from the same language resource, forms from a given lexical form in the source researcher at the Ontology or belonging to the same organisation, etc. language. Many of the Apertium bilingual Engineering Group, To design the URIs of our RDF dictionaries are available in the Lexical Universidad Politécnica de datasets, we adopted the patterns and Markup Framework (LMF) XML-based 10 Madrid. He got his PhD in recommendations proposed in the context format . We took such LMF dictionaries Computer Science at University of the ISA program (Archer et al 2012). In as a starting point in our process to publish order to construct the URIs of the lexical the Apertium data as LD. of Zaragoza in 2009 with a entries, their senses and other elements, We used lemon and its translation thesis about heterogeneity we preserved the identifiers of the original module as representation schemes. Once issues on the Semantic Web. data whenever possible, propagating the conversion into RDF was completed His current research interests them into the RDF representation. This we published the Apertium data on the Web include multilingualism is, for example, the URI that points to the in accordance with the LD principles. The and Linked Data, linguistic Apertium English-Spanish translation set: result is a set of 22 Apertium RDF bilingual linked data, and cross-lingual http://linguistic.linkeddata.es/id/apertium/ dictionaries, which can be found in the matching and information tranSetEN-ES/. LLOD cloud11. The following languages access on the Semantic Web. One of the most interesting aspects of are currently represented in Apertium RDF: Currently he is exploring how our conversion methodology is that, by Spanish, Catalan, English, French, Italian, to move language resources being consistent with the generation rules Romanian, Asturian, Aragonese, Basque, (lexica, dictionaries, corpora, of every URI assigned to every element Galician, Portuguese, Occitan, Esperanto. in the model (lexicons, lexical entries, The whole dataset contains 400,808 etc) from their data silos into lexical senses, translations, etc), each time translations and, after its conversion into the multilingual Web of Data a bilingual dictionary is converted into LD, RDF, 8,842,510 RDF triples were created. and make them interoperable, its monolingual lexicon is not created again The LD version of Apertium groups the in support of future generation if it already exists but its lexical entries are data of the (originally disparate) Apertium Linked Data-aware NLP tools. shared by two or more translation sets (see bilingual dictionaries in the same graph, http://jogracia.url.ph/web/ Figure 1). This allows the dynamic growth of interconnected through the common monolingual lexicons and, more significantly, lexical entries of the monolingual lexicons shared lexical entries serve as pivot nodes in that they share. Figure 2 shows the network the graph to allow getting indirect translations of interconnected languages in Apertium from two languages initially disconnected in RDF. the original dictionaries. We made all the generated information The generation step deals with the accessible on the Web both for humans (via transformation into RDF of the selected a Web interface12) and software agents (with data sources using the chosen representation a SPARQL endpoint13). scheme and modelling patterns. Depending on the format of the data source, there are a 9 http://apertium.org/ number of tools that can be used to support 10 A complete list can be found at http:// this task. In our work we have used Open lod.iula.upf.edu/types/Lexica/by/ Refine8 (and its RDF plug-in) in a preferred standards/. way. 11 http://linguistic-lod.org/, and see also back cover. 6 http://lemon-model.net/ 12 http://linguistic.linkeddata.es/apertium/ 7 http://purl.org/net/translation/ 13 http://linguistic.linkeddata.es/apertium/

Kernerman Dictionary News, June 2015 8 http://openrefine.org/ sparql-editor/ 3

4. Terminesp Terminesp is a multilingual terminological database created by AETER (Asociación Española de Terminología)14 that contains the terms and definitions from Spanish technological norms (standards) including more than thirty thousand terms, many of them with translations available into another language (English, French, German, Italian, Swedish). The terminology contained in Terminesp is highly technical and specific of domains such as electrical engineering, aeronautics, marine technology, etc. We converted the original data (an MS Access database) into RDF by using the lemon-ontolex model as representation scheme. The lemon-ontolex model is the next version of lemon, developed under Figure 1: Example of the conversion of two bilingual dictionaries the umbrella of the W3C Ontology Lexica (EN-ES and ES-PT) into RDF (Ontolex) community group15. At the time of writing, the lemon-ontolex model is nearly finished and waiting for final corrections by the community to be officially released. We established an automatic mechanism to extract the lexical entries from Terminesp database and instantiate the lemon lexicons. Translation sets were created between Spanish to French (13,996 translations), German (12,593), English (14,936), Italian (802) and Swedish (67). Differently from the Apertium RDF graph, the Terminesp RDF graph follows a star topology (see Figure 3), with Spanish as hub and the other languages as peripheral nodes. Figure 2: Network of languages in the Figure 3: Network of In addition to accounting for explicit Apertium RDF Graph (nodes are languages in the Terminesp translations, we extended the original languages and edges are translation sets) RDF graph Terminesp dataset with part-of-speech and syntactic information that was not explicitly declared in the original data as the nodes of this graph. Every URI is (e.g. nominal, prepositional and adjectival dereferenceable, meaning that when it is phrases), as well as some terminological accessed, a response is obtained with its variations (Bosque-Gil et al 2015). The attributes and links to other elements in whole conversion into RDF resulted in RDF. There are several ways to access and 1,095,051 triples. Since lemon-ontolex is explore the graph (both for software agents still under development, the resultant RDF and humans), such as by querying through files are not published as LD yet. However, the SPARQL endpoint, by using dedicated a preliminary version of Terminesp RDF, search interfaces17, or by following the links restricted to Spanish, English and German, as they are represented in an LD interface and with less rich syntactical information, such as Pubby18. was already published in October 2013 as Some of the advantages of having all the 16 LD using lemon . lexical information and translations in the same RDF graph are: 5. The emergence of a unified single • As we have seen above (Figure 1), graph of translations monolingual lexicons grow with time The publication of the Apertium dictionaries as LD resulted in the creation of a large 17 http://linguistic.linkeddata.es/search/ unified graph of linked lexical entries, 18 For example, one can introduce the senses and translations on the Web. The URI of a translation set in a Web URIs of all these elements can be seen browser http://linguistic.linkeddata. es/id/apertium/tranSetEN-ES/, and its 14 http://aeter.org/ properties and links will be shown in 15 https://w3.org/community/ontolex/ a human readable way (as the links are 16 http://linguistic.linkeddata.es/ clickable, “manual” navigation through

terminesp/ the graph is possible). Kernerman Dictionary News, June 2015 4

as more dictionaries of the same family be extended with the languages covered are published as LD. There is no need to by Terminesp, obtaining for example the create a new and separate monolingual German translation “Netz”@de, which is lexicon every time. not available in Apertium originally. • Useful information can be obtained In conclusion, generating linguistic Linked through a single SPARQL query in a Data is a growing trend in the community of manner that otherwise would be more language resources, with clear advantages difficult to get if the isolated original data such as standardised ways of representing 1st Summer Datathon on sources must be queried. For instance, one and accessing the data, the possibility of Linguistic Linked Data can get all the possible direct translations linking to other resources on the Web of (SD-LLOD-15) of “network”@en into any available Data, and enabling enhanced ways of The 1st Summer Datathon target language in the graph with just discovering and aggregating the data. In on Linguistic Linked Data a single query (not needing to specify this article we have briefly reported our (SD-LLOD-15) will be held which dictionaries to look up), getting recent experiences with the LD generation in Cercedilla (Madrid, Spain) as a result the list of translated forms: of the Apertium bilingual dictionaries and from 15 to 19 June 2015. It is {“xarxa”@ca, “red”@es, “rede”@gl, the Terminesp multilingual terminological “reto”@eo, “sarera_konektatu”@eu, ...}. database, and commented on the benefits co-organized by Jorge Gracia • Indirect translations can be obtained of publishing their information as unified from Madrid Polytechnic between language pairs that were initially RDF graphs. University and John McCrae unconnected. For instance, a translation from Bielefeld University of the English term “network”@en to Acknowledgement (Germany) as part of the LIDER Italian can be obtained, with a single This work is supported by the FP7 European project. Its main goal is to offer SPARQL query, by using Catalan as pivot project LIDER (610782) and by the Spanish persons from the industry and language. The result is “rete”@it19. Notice Ministry of Economy and Competitiveness academia practical knowledge however that some strategies have to be (project TIN2013-46238-C4-2-R). in the field of Linked Data introduced in order to detect and exclude applied to linguistics, and the wrongly inferred translations. To that References final aim is to allow participants end we propose the use of the one time Archer, P., Goedertier, S. and Loutas, to migrate their own (or other’s) inverse consultation algorithm (Tanaka N. 2012. Study on persistent URIs. and Umemura 1994). Technical Report, Interoperability linguistic data and publish it as • Further, direct connections to other Solutions for European Public Linked Data on the Web. This datasets in the Web of Data are possible Administrations. ISA datathon is the first organized so the original information can be Bizer, C., Heath, T. and Berners-Lee, T. on this topic worldwide and is enriched with additional relevant 2009. Linked data – the story so far. supported by the LIDER FP7 data. For instance, a high number of International Journal on Semantic Web Support Action. It will join lexical senses in Apertium RDF have and Information Systems (IJSWIS), 5.3: around seventy participants been linked to BabelNet (Navigli and 1-22. (including attendees, speakers Ponzetto 2012). In that way, additional Bosque-Gil, J., Gracia, J., Aguado-de and tutors) from all around descriptions, lexical relations, or even Cea, G. and Montiel-Ponsoda, E. the world and will constitute pictures, can be obtained by querying 2015. Applying the OntoLex model to an invaluable forum not only BabelNet to enrich the information a multilingual terminological resource. that can be obtained in Apertium. For In Proceedings of 4th Workshop of the for learning but also for the instance, one of the ontological references Multilingual Semantic Web) at 12th exchange of experiences and of “network”@en, when translated as ESWC, Portoroz, Slovenia (MSW’15, ideas related to linguistic Linked “red”@es, is the babelsynset http:// to appear). CEUR-WS. Data. babelnet.org/rdf/s00030258n/i/, from Gracia, J., Montiel-Ponsoda, http://datathon.lider-project.eu/ which an English definition, not initially E., Vila-Suero, D. and Aguado-de Cea, present in Apertium, could be obtained: G. 2014. Enabling language resources to “(electronics) a system of interconnected expose translations as linked data on the electronic components or circuits”. web. In Proceedings of the 9th Language • Several “families” of bilingual dictionaries Resources and Evaluation Conference, (Apertium and Terminesp in our case) Reykjavik, (LREC’14). Paris: European can be published as LD under the same Language Resources Association domain or default graph (http://linguistic. (ELRA): 409-413. linkeddata.es/, in our case). In that way, Navigli, R. and Ponzetto, S.P. 2012. we can have a common access point to BabelNet: The automatic construction, all of them and unified SPARQL queries evaluation and application of a can be built to access these sub-graphs at wide-coverage multilingual semantic the same time. For instance, a search for network. Artificial Intelligence, 193: translations of “red”@es in Apertium could 217-250. Tanaka, K. and Umemura, K. 1994. 19 Examples of queries in Apertium Construction of a bilingual dictionary RDF can be found at http://dx.doi. intermediated by a third language.

Kernerman Dictionary News, June 2015 org/10.6084/m9.figshare.1352066/. In COLING: 297-303. 5

Enhancing lexicography with semantic language databases Bettina Klimek and Martin Brümmer

1. Introduction 2007). Linked Open Data forms a Web of The most recent major transition in the Data, which consists of a machine-readable world of lexicography has occurred barely semantic network of structured data, thirty years ago as part of the emergence of in contrast to the unstructured HTML information technology. The introduction documents that characterize the Web. Data of computers into everyday life marked a that is published under an open access URL medial change which is progressively taking (Uniform Resource Locator, see 2.2) on over the traditional print dictionaries that the Web can profit from linking to other were prevalent over the last centuries. The datasets, thus increasing interoperability and digitization of lexical language information easing data integration. This linking process has formed a new broad landscape of can be considered in parallel to publicly e-lexicography. The boundaries of the viewable Web content, which also allows printed page have dissolved to unlimited inbound document linking independently virtual space that leads to online dictionaries, of its content. In addition, the data of a Bettina Klimek is a Ph.D. translation tools, large language networks, lexicon can, for example, link references student at the Institute for etc. As a result, more and more linguistic to concepts in an ontology to disambiguate Applied Informatics (InfAI information such as pronunciation, the meaning of lexical entries, and multiple e.V.) at the University of -form paradigms, syntactic relations lexicons can then be integrated on the basis Leipzig in her first year of and dialectal varieties accompany the lexical of these concepts. The core principles of research. She has graduated entry. The possibilities of data processing Linked Data, according to Tim Berners-Lee in linguistics (M.A.) in 2013 combined with large data storage capacities (2006), consist of: and is investigating the assist the lexicographer in compiling as well • Use URIs as names for things, as enriching lexical content in a structured • Use HTTP URIs so that people can look interdisciplinary application of and multi-dimensional way. Moreover, up those names, linguistic data and Semantic new developments in Web technologies • When someone looks up a URI, provide Web technologies. Over the last – namely the Semantic Web and Linked useful information, using the standards year she gained insights into Data – offer unique potential to current (RDF, SPARQL), lexicographic data during her e-lexicography by advancing the existing • Include links to other URIs, so that they internship at K Dictionaries, consumer-oriented linguistic data towards can discover more things. converting its machine-processable semantic format database into RDF together that enables interoperable exchange of 2.2 The Resource Description with her colleague Martin lexicographic and other resources on the Framework (RDF) Brümmer. Web. This article presents the outcome RDF is a set of specifications developed by of research undertaken last year with the the World Wide Web Consortium (W3C1) [email protected]. German language dataset of K Dictionaries as a data model that can be used to formally de (KD) within the realm of Linked Data describe resources. A resource can be technologies along three main topics: an anything that is uniquely identified, ranging introduction to Linked Data and its benefits from digital documents like lexicons, to for lexicography (section 2), lemon – the abstract concepts like parts of speech. lexicon model for ontologies (section 3), Resources are identified by URIs and a presentation of the conversion of (Uniform Resource Identifiers), which are KD’s data from XML to RDF (section 4). distinct strings with a uniform syntax. One Finally, section 5 presents a conclusion with kind of URIs are those that additionally a summary of the findings. describe the primary method of access to the resource. Most URIs are URLs that describe 2. Semantifying lexicographic resources Web documents, e.g. http://kdictionaries. with Linked Data com/, which can be viewed to gain more 2.1 Linked Data principles information about a resource. Linked Data describes a set of best In the RDF data model resources are practices for publishing structured data described by statements in the form of and linking it to other datasets, providing subject-predicate-object, called triples, context and aiding discoverability as well which can be understood as metadata as interoperability. The concept describes describing resources. The subject is the machine-readable data with explicitly resource that is described by the statement, defined meaning that links further data. which is uniquely identified by its URI. When this data is published on the Web it is called Linked Open Data (Bizer et al 1 http://w3.org/RDF/ Kernerman Dictionary News, June 2015 6

The object expresses the content of the with lexicographic data but don’t have a statement, the meta datum itself. It can large budget for tool development. On the either consist of a simple string, just as the other hand, data management tools such as orthographic representation of a in OntoWiki2 enable collaborative data editing a dictionary, or a resource as such, e.g. a and research. lexicon. Finally, the predicate constitutes The nature of RDF facilitates relatively the semantic link between the subject and generic use of these tools without any the object and describes the meaning of the adaptations to the schema of the data, unlike relation between them. what relational databases with rigid schemas In order to avoid ambiguity within do. In the same vein, RDF these semantic descriptions, predicates are extensible without modifications to also have URIs that can be looked up for the tools themselves, allowing further data further information and are then called properties to be added during aggregation properties. The additional benefit is that and maintenance. sets of properties can be defined and The second layer of interoperability documented by institutions or developers, offered by RDF is semantic. Unlike XML like the LExicon Model for ONtologies structures that confine data modelling to (lemon), then be reused by other users and hierarchical trees independently of the thus increase their interoperability and data, RDF graphs allow data modelling reduce the work that is usually necessary according to its content in an ontological Martin Brümmer is a Ph.D. for formal definitions. way. Relationships between different student at the Institute for These sets of properties and associated classes of objects can be explicitly defined Applied Informatics (InfAI classes of things that are needed to create and expressed within the data. Sharing these e.V.) at the University of and interpret RDF triples are commonly definitions makes it possible to model data Leipzig in his second year of called vocabularies or ontologies. A of the same domain in the same way. In research. He is a contributor to vocabulary or ontology is a set of classes and the linguistic domain of lexicography, properties that models a conceptualization lexical data could become semantically the NLP2RDF and the DBpedia of a specific domain. A large number of interoperable among different lexicons, Project, as well as to the these vocabularies already exist and can be presenting lexicographic research with development of the Linguistic reused. a broader and more consistent basis that Linked Open Data Cloud. His RDF itself is only a data model, could be merged and combined across research focus is on Linguistic independent of the concrete serialization, dataset borders. Organizations dealing Linked Open Data, NLP in which can be realized using different with lexicographic data can also expand the Semantic Web and Open formats, such as RDF/XML, N3, Turtle or their datasets more easily, without costly Government Data. JSON-LD. All serializations contain the adaption of new data to their model. bruemmer@informatik. same information but differ in readability, Lastly, RDF offers access interoperability size and ease of parsing. by its use of URIs and, in Linked Data, uni-leipzig.de HTTP as an access layer. The nature 2.3 Benefits of RDF for e-lexicography of the resulting link graph can provide RDF offers unique benefits for unique benefits to the users of lexical e-lexicography, first and foremost data. Interlinked data incites exploration by increasing the interoperability of of related data sources that can enrich the lexicographic resources on multiple lexical data with pictures, articles and other layers. As a canonical data model for media content. such resources, RDF provides syntactic Disadvantages of RDF include the still interoperability and allows usage of RDF lacking stability of existing tools and the tools, such as databases, tools for data high skills required to use it to its fullest retrieval, querying and management, as potential. Setting up a Linked Data access well as visualisation and data integration. point for a dataset, a database and minimal On the one hand, this is useful for small tool support require either considerable and medium-sized enterprises that deal time investment or IT support. However, the advantages to be realized by proper data modelling and management, as well as the potential for collaborative data aggregation, outweigh these hurdles.

3. The Lexicon Model for Ontologies – lemon Traditionally, standards for the design, structure and content of dictionaries have been set by established publishing houses. Now that lexicography is no longer tied

Kernerman Dictionary News, June 2015 Figure 1: Example showing two triples 2 http://aksw.org/Projects/OntoWiki.html/ 7 to the print medium, and is digitally transformed, the knowledge of data scientists significantly influences the way electronic language databases look like. However, just as the dictionary was bound to the limits of the book, the language database is tied to the limits of its format. This circumstance has been changed with the innovation of the Semantic Web and RDF. The reusable and interoperable character of Linked Data attracted rising numbers of participants in the compilation of lexicographic Linked Data resources. As a result, the Working Group on Open Data in Linguistics3 collects many of them in the Linguistic Linked Open Data Cloud4. One significant dataset is DBnary (Serasset 2012), constituting of the RDF transformation of lexical data from Wiktionary for 13 languages and thus enabling these lexicons to be interlinked with other knowledge sources in the cloud. The model underlying DBnary is lemon (LExicon Model for ONtologies, McCrae et al 2011), which is highly specialized in representing lexicographic data. Other Figure 2: The lemon core path openly available datasets such as WordNet5, PanLex6 or Eurosentiment7 also use lemon as underlying data format. Consequently, all of these datasets are interoperable and @base thereby pose a huge and valuable addition @prefix ontology: to any professional lexical content provider. @prefix lemon: With regard to the possibility of enriching existing resources with such open linguistic data in the future, we decided to convert :myLexicon a lemon:Lexicon ; the German dataset of KD by using lemon lemon:language "en" ; rather than designing a Linked Data model lemon:entry :animal . for lexicography completely anew. Lemon can be used in parts and is easily adjustable to any further data information if required. :animal a lemon:LexicalEntry ; In the scope of transforming the XML lemon:form [ lemon:writtenRep "animal"@en ] ; format of the current database into RDF, we focused on the lemon core model that lemon:sense [ lemon:reference ontology:animal ] . contains all basic elements necessary for a common dictionary entry. The layout is Figure 3: lemon-RDF example for the lexical entry “animal” depicted in Figure 2. Figure 3: lemon- RDF example for the lexical entry “animal” As can be seen, the labels used to describe all lexicon elements differ slightly from those commonly used, e.g. “LexicalEntry” is also known as , dictionary is equipped with the necessary elements that entry or lemma. In order to understand the are needed for a minimal dictionary entry. lemon vocabulary, all classes and properties As an example serves the entry for “animal” are described within the corresponding in Figure 3 (McCrae et al 2010). lemon-RDF ontology file8. The lemon core What is encoded here are triples containing statements about the lexicon 3 http://linguistics.okfn.org/ as such, the language of the lexical data, 4 http://linguistic-lod.org/llod-cloud/ the orthographic or written representation 5 http://wordnet-rdf.princeton.edu/ of the lexical entry, and its meaning being 6 http://ld.panlex.org/rdf.html/ a reference link to an external ontology. 7 http://portal.eurosentiment.eu/home_ This conceptualization will be explained in resources?page=8/ section 4 in more detail. Lemon is designed 8 http://lemon-model.net/lemon.rdf/, to describe lexical content on different or visit levels of granularity. The lexical entry, for http://lemon-model.net/lemon#/ instance, does not necessarily need to be a

for an HTML view of it. word. It can also be only a part of a word Kernerman Dictionary News, June 2015 8

not only document lexicographic data but also to interconnect knowledge about the relations that hold between lexical entries of different linguistic description levels. Since it expresses all concepts necessary for lexical data documentation and beyond, it a is powerful enough to serve as a foundation for the conversion of KD’s XML data structure to RDF. A 4. RDF transformation of KD’s German dataset To practically demonstrate the benefits of RDF, we converted sample data into lemon-RDF. KD supplied us with a small part of their German monolingual dictionary set, comprising around 5,000 entries. It came in valid XML files with a custom schema to represent the data, containing the entries in individual XML elements. Buchstabe Each entry element has a varying number of child elements representing additional data, such as the written representation erster Buchstabe des Alphabets of the entry, its pronunciation, associated meanings, examples of usage, semantic Schreibt man das mit großem A / kleinem a? relations and part of speech labels. For visualization purposes an XSLT stylesheet is used to transform the data into HTML for user-friendly representation. Figure 4 shows von A bis Z an example of KD’s XML. As one of the RDF serializations RDF/ XML is an XML format, the stylesheet Das ist von A bis Z frei erfunden. could be modified to produce an RDF version of the dictionary. This procedure has the advantage that completeness of the transformation can be guaranteed, meaning that for every XML element, either an equivalent RDF resource could be established or its content would be expressed as a relation between two RDF resources. Figure 6 shows, analogously to Figure 4: Sample XML entry in the KD data the lemon core path in Figure 2, the XML elements of the KD data (on top, white background) that we mapped to lemon or a phrase. Likewise, next to the canonical resources (below, grey background). Boxes orthographic written representations an represent resources in lemon and arrows abstract or other form can be given for represent relations between resources. the lexical entry. Just as classes can be These relations are expressed in XML as a extended by adding subclasses, also the relationship between a parent element and properties stating the relations between its child elements. For this reason, lemon them can be widened to the necessary level relationships do not have a KD equivalent of description as desired. Hence, the lemon in the diagram. The RDF modelling thus core model is open to any kind of structural explicates the semantic relationships that adjustment, and even if the formal elements were implicit in the hierarchical structure required are not stated in the extension of of the XML data model. the core model an appropriate expansion Additional information was transformed can be undertaken with low effort, as will using RDF properties of the LexInfo be shown in section 4. vocabulary (Cimiano et al 2011). These Overall, lexicographic data modelled in are common properties expressing lemon is concise and in RDF, so that it also lexical information, such as part of allows for greater representation of linking speech, gender or pronunciation. This between different sections of the lexicon step required some additional mapping. (McCrae et al 2010). In the RDF model, information that can

Kernerman Dictionary News, June 2015 Consequently, lemon offers the means to be categorized into a number of distinct 9 classes, such as masculine, feminine and neuter for grammatical gender, is generally expressed by assigning RDF resources to a lemon:LexicalEntry ; these classes. In classical dictionaries this lemon:canonicalForm [ information is expressed within standard lemon:writtenRep "a, A" ; strings. Thus, we mapped gender and part lexinfo:pronunciation "[aː]" ; of speech information of the dataset to their respective resources in the LexInfo a lemon:LexicalForm vocabulary. ] ; During the transformation, gaps in lemon:language "de" ; the lemon model became apparent. The lexinfo:gender lexinfo:neuter ; KD data contains compositional phrases lexinfo:partOfSpeech kd:letter ; (multiword units) for many senses, but lemon:sense . there is no exact equivalent to express this relationship in lemon. So we established a new property, “hasCompositionalPhrase”, and used it to link the senses to additional a lemon:LexicalSense ; “CompositionalPhrase” resources. These lemon:definition ; phrase resources are, according to lemon, lemon:example . a subclass of LexicalEntries. Other gaps in the existing vocabularies concern properties to express semantic relations, such as hypernymy and synonymy. Again a lemon:SenseDefinition ; we established properties to express these lemon:value "erster Buchstabe des Alphabets" ; relationships. This approach – of extending kd:hasCompositionalPhrase . existing vocabularies with further properties adapted according to the expressivity of a new data source – is a standard procedure during RDF conversion. Thus, at the end a lemon:UsageExample ; of the transformation process, the added lemon:value "Schreibt man das mit großem A / kleinem a?" . properties formed a small lemon/LexInfo extension, containing ten properties and ten classes. This extension vocabulary could a kd:CompositionalPhrase ; now be published to aid the conversion of new lexicons into RDF and provide lemon:canonicalForm [ compatibility of these resources with KD’s lemon:writtenRep "von A bis Z" ; data, and vice versa. Figure 5 provides the a lemon:LexicalForm lemon conversion of the original XML entry shown in Figure 4. Figure 5: Lemon version of the sample entry in Figure 4 A persistent gap in the conversion is the missing lemon:reference property and the ensuing link to an external ontology. This Taking into account the possible advantages link would disambiguate the meaning of of such links for lexicography, it should be the KD entry in an interoperable way. In considered to add them manually in the addition to the common textual definition, process of lexical data creation. the sense would point to a resource expressing its meaning, like the respective 5. Concluding remarks Wikipedia entry shown in Figure 1. This The transformation of KD’s German disambiguation could then be used to lexicographic XML data to a lemon-RDF provide interoperability between disparate lexicon resulted in the following outcomes. lexicons. Entries and senses in different Firstly, the Linked Data principles were all lexicons could be compared by matching fulfilled so that an integration of other RDF their links to external ontologies first, data is easily achievable. Secondly, all the providing a way to find equivalent senses lexical data elements are now identifiable across lexicon borders. Such a mapping via resource URIs and thus interlinkable could be exploited for the enrichment of one with further relations within the dictionary lexicon with information from another, or and other external data. And thirdly, all for merging different types of dictionaries, XML elements could be mapped to an such as picture with standard dictionaries. equivalent class or relation in the lemon However, creating such a link automatically model without decreasing the high quality would imply automatic disambiguation of of the data content. What is more, the whole the senses of a lexical entry on the basis of a lemon model that goes far beyond the small textual description and few examples, lemon core comes with more fine-grained which currently cannot be fulfilled reliably. lexicographic conceptualizations that are Kernerman Dictionary News, June 2015 10

Figure 6: Mapping KD’s XML elements and lemon resources (excluding the greyed out Ontology part)

worth considering in future data compilation References or extension. Berners-Lee, T. 2006. Linked Data – As a consequence, all possibilities of Design Issues. Retrieved 23 July 2014, Linked Data in general can now be explored. http://w3.org/DesignIssues/LinkedData. With its underlying Linked Data format html/. this dataset is equipped to express any Bizer, C., Heath, T., Ayers, D. and considerable aspect of lexicography. Since Raimond, Y. 2007. Interlinking Open the model is open for adaption, the complex Data on the Web. In: Demonstrations and infinite nature of natural language can be Track, 4th European Seman-tic documented to any desired extent. Existing Web Conference, Innsbruck, http:// open linguistic Linked Data resources such eswc2007/.org/. as lexicons of other languages, datasets Cimiano, P., Buitelaar, P., McCrae, J. and including phonological, morphological or Sintek, M. 2011. Lexinfo: A declarative syntactic information, text corpora, and model for the lexicon-ontology interface. media content as well as all available Linked Web Semantics: Science, Services and Data tools can be exploited and reused Agents on the World Wide Web, 9 (1): 29 for specific lexical data compilations. In (51) , http://sciencedirect.com/science/ RDF all these usually isolated linguistic article/pii/S1570826810000892/. datasets become interoperable. It is such an McCrae, J., Aguado-de-Cea, G., interrelation of single pieces of data across Buitelaar, P., Cimiano, P., Declerck, various datasets without needing to make T., Gómez Pérez, A., Gracia, A. et any change whatsoever in the data schema al. 2010. The lemon cookbook, http:// that will advance lexicography significantly lemon-model.net/lemon-cookbook.pdf/. in the future. McCrae, J., Spohr, D. and Cimiano, P. 2011. Linking lexical resources and Acknowledgements ontologies on the semantic web with We would like to thank Dr. Sebastian lemon. In The Semantic Web: Research Hellmann for giving advice and sharing and Applications. Heidelberg: Springer, his expertise during the compilation of the 245-259. data conversion. Our gratitude also goes Serasset, G. 2012. Dbnary: Wiktionary to Ilan Kernerman who revised this article as an LMF based Multilingual RDF and provided the great opportunity for this network. In: Proceedings of the 8th internship at K Dictionaries, thus making it Language Resources and Evaluation possible to interconnect academic research Conference, Istanbul, (LREC'12), http://

Kernerman Dictionary News, June 2015 with real industry data. lrec-conf.org/lrec2012/. 11

Reflections on the concept of a scholarly dictionary Dirk Kinable

The current age is frequently characterized to be understood by a scholarly dictionary. as the era of information. Characteristics Although the occurs regularly in of this time are indeed the increasing the professional literature, its definition dependence on information technology is rarely at the centre of interest. Any and the ever higher demands on information definition attempt soon reveals that this itself in terms of accuracy, completeness, concept is no exception to the general rule interrelatedness, timeliness, etc. This that defining is far from easy, which holds development has strongly influenced for both concrete and abstract nouns. Even dictionaries as containers/suppliers for the former, which are generally easier of lexical information. According to to define, Landau states in his standard present-day standards of e-lexicography, work Dictionaries. The Art and Craft of the conception of dictionaries as merely Lexicography: “There is no simple way to linear, alphabetically-ordered sequences define precisely a complex arrangement of self-contained entries has long since of parts, however homely the object may become outdated. Applications like the appear to be. One obvious solution is not inter-connection of lemmas in more to define it precisely; but modern dictionary comprehensive semantic relationships users expect scientifically precise, such as hypernymy and hyponymy, or the somewhat encyclopedic definitions” (2001: introduction of the onomasiological search 167). This applies not in the least to abstract Dirk Kinable is one of the function from concept to corresponding nouns, the complexity of which is usually editors of the Algemeen lemmas, may suffice as examples here. more difficult to grasp. In the following, Nederlands Woordenboek, an For collections of dictionaries as well, rare definition of scholarly dictionary, the online scholarly dictionary the image of a linear arrangement on shorter way according to Landau appears of standard contemporary bookshelves is on the verge of becoming to have been followed. By means of only a Dutch that is compiled in antiquated. Here, too, a three-dimensional genus proximum ‘the next higher category’ Leiden at the Institute of Dutch virtual reality, as it were, is being developed and two features, Hartmann and James Lexicology, a bi-national by cross-connecting dictionaries by means (1998) give the following description: “a institution of the Netherlands of portals. type of compiled by a team and Flanders (Belgium). In other , lexicography and of academics as part of a (usually long-term) dictionaries are undergoing a fundamental research project, e.g. linguists working on a His interests also comprise development at present. It is therefore at historical dictionary or dialect dictionary”. historical lexicography, an opportune moment that the organization In this definition, the distinctive semantic to which he contributed of European Cooperation in Science and features are specifically related to the previously as an editor of the Technology (COST) has established a authors and to the research-related nature comprehensive Woordenboek platform named the European Network of the information offered. The previous der Nederlandsche Taal, of e-Lexicography (ENel). ENeL aspires definition marks the contours of the meaning covering the period from to play a stimulating role in bringing of the idiom in a general way. Compared to 1500 to 1976. He has a PhD together lexicographers and linguists to this and consistent with the quotation from in Language and Literature reflect on building a comprehensive and Landau above, the semantic features can be from Leiden University, and modern Web portal for dictionaries of the specified in a far more detailed way. This also writes on lexicography European languages. The keyword therefore line was followed when the concept was and research subjects with is widening the perspective. In line with this the subject of a presentation at the Vienna we like to avail ourselves of the opportunity meeting of ENeL last February. Participants a lexicographic focus on to explore other means of communication, had answered the call to send their views on medieval texts. such as this newsletter, to draw the attention the characteristics of a scholarly dictionary [email protected] to one of the central issues that has to be and their specifications fit in with the dealt with in implementing the project. general definition above, concretizing it to At first instance, building such a portal a considerable degree. We summarize their implies reflection on its content. Even views below. though a digital environment is always Primarily, the scholarly dictionary was expandable, it is recommended to ‘map’ seen related most often to an academic the area in advance. ENeL’s own website environment, both on the production side describes its first aim in the following and the demand side. The former was general terms: “to give users easier access described as including ‘academic editors or to scholarly dictionaries and to bridge supervisors’, ’academic publishing houses’ the gap between the general public and and ‘academic institutions’, while among the scholarly dictionaries”. This entails the ranks of the latter were counted ‘linguistic necessity to gain more insight into what is researchers’, the ‘academic community’, a Kernerman Dictionary News, June 2015 12

‘scholarly audience’ and ‘users concerned extensive corpus of observed discourse, with advanced linguistic studies and the inclusion of documenting example professionals on a fairly advanced linguistic sentences with bibliographic references, level’. Indicative of this environment is the availability of a scholarly apparatus also the notion that a scholarly dictionary like descriptions of method and project is generally not produced on a commercial plan, a bibliography of sources, and, in basis. The academic level of the authors and digital specimens, the implementation the primarily intended users accordingly of advanced search and application This contribution is based on a implied high demands with respect to such tools presentation made at a meeting dictionary’s content. More specifically, The inclusion of an important definition of the European Network of the vocabulary had to be described on the element as “according to the linguistic empirical base of a processed corpus or of and lexicographic standards of their e-Lexicography (ENeL, WG1) scholarly harvested examples, and several time” indicates that a certain flexibility held in Vienna on 11 February standards had to be met such as the pursuit has been built into the definition. This 2015, stemming from feedback of completeness in the scope of entries, chronologically relative point of view by members to a call to define comprehensiveness as to textual genre and implies that not every scholarly dictionary what a scholarly dictionary is. language variation, and detailed information can meet all the characteristics enumerated http://elexicography.eu/ beyond the communicative support for at any time. The tenor of the definition is reception and production purposes, all on in other words prototypical. The term is an authoritative level. Regarding content, used here in the linguistic sense referring adequate room should also be reserved to the prototype theory. A prototype is for encyclopedic information when the ideal example of a semantic category. relevant. Apart from the factors author, The arrangement of a category may be content and user, also the approach of the conceived as follows: surrounding the core content was considered characteristic of a of the prototype are the instances of the scholarly dictionary’s profile. Based on the category that share certain, but not all, of lexicographic standards of its time, analysis the characteristics of the prototype. Viewed and description had to add new knowledge from this angle the enumeration in the on the lexicon from a descriptive, not definition above is exemplary rather than primarily prescriptive, perspective. This had exhaustive and certainly not meant as a list to be realized using analytical definitions, of necessary and sufficient characteristics. scholarly terminology and the quotation The latter is still often too narrow a way of of good dictionary examples as evidence, characterizing definitions. and the results had to be suitable for At present we carry out further research linguistic research. Finally, the last group on this definitional issue with respect to the of characteristics mentioned by respondents concept of a scholarly dictionary. A possible bore upon the contact with the user. Due approach may consist of trying to specify to the often voluminous size of scholarly what is at the centre of the category and dictionaries, this is often established either resembles more the prototype and which digitally in the form of updates or in print dictionary types are more on the periphery. by means of instalments. To convey the Research of this kind is stimulated by the specialized information, the edition is wealth of possibilities for discussion that often supported by a scholarly apparatus. In are characteristic of the era of information digital versions the user also often avails of mentioned in the introduction. Networks functions giving access to many categories are not only devised between dictionaries, and also making the material collection but the lexicographer as well is encouraged searchable, and preferably expandable and to consider his/her own position as a linkable to other collections and tools. constituent part of a larger whole. While this Including this information according to development makes work more complex Landau’s previously-mentioned explicit on the one hand, on the other hand it also description style, we can propose the makes communication easier both within following working definition of scholarly the profession and outside it. dictionary: Comments and suggestions regarding the knowledge-oriented dictionary working definition above are welcome as compiled by (usually) academics to cause for reflection (scholarlydictionary@ provide detailed word descriptions inl.nl). for the pursuit of lexical insight and research support according to the References linguistic and lexicographic standards Hartmann, R.R.K. and James, G. 1998. of their time, and traditionally designed Dictionary of Lexicography. London with such main features as the pursuit of and New York: Routledge. completeness with regard to the entries Landau, S.I. 2001. Dictionaries. The Art relevant to subject matters, a preference and Craft of Lexicography. Cambridge:

Kernerman Dictionary News, June 2015 for analytic definitions, the use of an Cambridge University Press. 13

The historical dictionary and the digital age: Steps of a transformation process Nathalie Mederake

1. Media change and the dictionary already happening entirely virtually. We Dictionaries consist of many things at are experiencing a golden age in terms once, but first and foremost they offer a of studying and appreciating words, and rich documentation basis for languages dictionaries are more accessible than ever, by providing a survey of their state as Slate journalist Stefan Fatsis recently via structured access to meanings and noted about Merriam-Webster1. definitions. Historical dictionaries in Nowadays, lexicographers can benefit particular reflect how the use of words from the development of modern tools has evolved. Admittedly, throughout that expand possibilities in dealing with their creation and editing process, printed and exploring knowledge about language. dictionaries are affected by inconsistencies Therefore, they have started to equip much more than might appear at first glance dictionaries with additional information, or than we would like them to be: as books, e.g. maps to show regional distribution or they seem to be most stable objects after timelines to emphasize the occurrence of Nathalie Mederake all. Historical dictionaries, in particular, a word, i.e. anything that might be helpful received her MA in German are end results of long-term academic for potential users. Indeed, the concept of , history of law projects and therefore predestined to show a dictionary is being reinvented. However, and sociology and PhD in such inconsistencies. The reasons are it depends on the type of dictionary to German linguistics from the manifold: an ever-changing project staff, what extent certain aspects can be taken University of Göttingen. She an ever-updated corpus, the modification on board in the creation of a multi-layered has been editor and research of metalexicographic standards during user interface. As the options are numerous, associate of the Deutsches a decades-long working process, etc. it can be challenging to make and actually After their publication in printed form, have a choice that is not dictated by space Wörterbuch’s revised edition established historical dictionaries are availability. The new interpretation of at the Göttingen Academy of often retro-digitized, which is when existing media puts the lexicographer in the Sciences and Humanities since new discussions on the transformation position of showing the public all collected 2008, and is responsible for of lexicographic data arise. Although information that became available thanks the production of manuscripts this causes discomfort to traditional to decades-long work. However – while and prepress, leading lexicographers, the changes brought about conciseness is valuable – constraints other development and digitization by the world of digital media have to be met than limitation on the number of characters projects including the list in the context of historical lexicography too. with the paper model do exist. How is it of references, digital access Media change has been experienced in possible to depict useful and relevant to users and a cross-lingual various forms, for example, in the early days information for the user? Which access vocabulary portal. Her other of cinematography artists began by filming mechanisms should be implemented and research topics concern theatre plays on stage before realizing that what about the ongoing transformation in so much more was possible in cinema the first place? the dynamics of Wikipedia (and eventually TV, etc). Similarly, in the entries from linguistic field of lexicography, there is no need for 2. Transformations within Deutsches perspective and heading a retro-digitized dictionaries to hide behind Wörterbuch task group on the accessibility the traditional print product anymore: these The story of the Deutsches Wörterbuch and interoperability of dictionaries have a high potential to evolve (DWB) is a telltale of the transformation retro-digitized dictionaries as into much wider-range information tools in process in terms of the past, present and part of the European Network the digital world. future of historical dictionaries. It is one for e-Lexicography. This world is by all means a very complex of nearly two hundred years, and some one. As we can see, crowdsourced products turbulent times at that. Jacob Grimm and [email protected] or user-generated dictionaries are having Wilhelm Grimm, the brothers who started quite an impact on the lexicographic to work on DWB in 1838, each had his own landscape nowadays, which may also affect specific workflow and handling of the entry retro-digitized dictionaries. We expect the structure. Thus, inconsistencies existed information we look up to be up-to-date, from the very beginning. Also, the entire and do not bother to ask who made a change concept of the dictionary was influenced of what and when. Notwithstanding doubts by numerous editors, who continued the and reservations, this development may be useful as future (historical) dictionaries 1 http://slate.com/articles/life/culture won’t exist as mere physical (and thus box/2015/01/merriam_webster_ immutable) objects anymore. One could dictionary_what_should_an_online_ also argue that the storytelling of words is dictionary_look_like.html/ Kernerman Dictionary News, June 2015 14

Grimms’ work, each leaving their own as a whole. The revised edition project, trace. The subject matter and objectives starting in the 1960‘s, was then justified of DWB recurrently led to extensive by being a so-called “repair solution” discussions and workflow reorganization. to the first six volumes. Conceptually Therefore, the first edition of this dictionary and lexicographically tied to the other – if not the concept of a historical dictionary dictionary volumes, the revision project altogether – lived to see small but numerous was designed to make DWB a coherent transformations within roughly 120 years oeuvre. However, the revised edition is Deutsches Wörterbuch until its completion. The answers to what not a supplement to the first edition, but The Deutsches Wörterbuch should be said and shown have been presents an adjusted storytelling form (DWB, the German Dictionary) constantly subject to change in this first for the history of words. It is based on is the largest and most phase of its history. a new and extensive corpus that adheres comprehensive dictionary of the The fact that the first edition of DWB to high quality standards. Its entries are German language in existence. has eventually turned into a 32-volume structured in a succinct way in terms of Encompassing modern High leviathan with a web interface, which now , notes of explanatory matter German vocabulary from 1450 exists for over ten years, marks a second and usage, precise definitions and quotes to the present, it also includes step in the transformation process. The to mark noteworthy usages throughout volumes were interlinked and enhanced the centuries. Entries from the revised loanwords adopted from other with detailed search options for the edition differ substantially in concept from languages and covers etymology, complete dictionary: apart from full-text the very first ones written by the Grimm historical development of searches, more complex searches within brothers. meaning, attested forms, individual entries were made possible For the time being, the revised edition synonyms, usage peculiarities using an annotated database. The data was only includes the letters A-F in the printed and regional distribution within encoded in XML, meeting the guidelines version. Nevertheless, the importance of the the German-speaking world. The of the ISO Text Encoding Initiative first and the revised editions as sources for dictionary’s historical linguistic (TEI). Furthermore, the dictionary’s data the history of dictionary making in Germany approach is illuminated by has become part of a network including is unrivalled. They form a fundamental examples from primary source other historical and dialect dictionaries of work for all users with questions on the 2 documents. The 32 volumes of German providing an extensive array of origins of German words. However, in word-related information. the case of the revised edition, a possible DWB, published between 1854 It goes without saying that this step in the digital concept that may go beyond the Web and 1961, list more than 330,000 transformation process is media-induced. interface (which already exists for the first in 67,000 print Dictionary makers and computer linguists edition) has not been a topic of discussion columns. Final steps to fully have adapted to the digital opportunities of yet. Considering such a concept might lead revise and update the letters A-F lexicography. A new technology is, indeed, to new and important developments beyond according to modern academic liberating. From a current perspective, it is “the good old paper dictionary”, while at standards are underway at the more than obvious, though, that already the same time focusing on potential trouble Göttingen Academy of Sciences very basic and self-evident amendments spots: accessibility and usability of words’ and Humanities and are due to like the implementation of search options stories in historical dictionaries. be completed in 2016, with the and integration into a Web interface reinvent new volumes published by S. the dictionary system we used to know (in 3. In need of new paths Hirzel Verlag. print media). Regarding the consequences, It is not only the advance of search engines Granger (2012, 10) states: “It shows that or search optimization that is becoming http://woerterbuchnetz.de/DWB/ all facets of the field are undergoing a focal point. What we see today is that http://www.uni-goettingen.de/ a transformation so profound that the digital enhancements are not concerned any de/118878.html/ resulting tools bear little resemblance to more only with displaying the content of a the good old paper dictionary”. dictionary, but with emphasizing the role of Another aspect of transformation comes the potential (or even actual) user. Despite into play with DWB’s revision. Thoughts the fact that users of a historical dictionary about a second or revised edition has in general (and DWB in particular) have not already been underway when the paper yet been the subject of considerable studies, version of the first edition was almost it seems to be a worthwhile objective of a finished in the late 1950‘s. Through digital interface to catch the user‘s interest the years the dictionary had become a for the dictionary‘s content and suggest fundamental work regarding German means of handling the lexical information. philology and historical sciences. Back Lexicographers (of DWB) must be aware in the 1950’s, however, only the latest of the fact that, as Lew (2011, 248) states The brothers Jacob and volumes met eligible contemporary in a survey on online dictionaries of Wilhelm Grimm scientific standards. The first six volumes English, “without proper guidance, users written by Jacob and Wilhelm Grimm run the risk of getting lost in the riches”. in the 19th century clearly needed to be Therefore, the goal is to fathom new ways updated to fit into the dictionary’s concept of access that were not possible for the printed book. Not least do different kinds

Kernerman Dictionary News, June 2015 2 http://woerterbuchnetz.de/ of media-induced performances provide 15 different ways of explanation. It may thus are different – and should differ – from be useful to refer to the possible fields of traditional ones in print. Media change inquiry and usage as well as to draw the requires new ways of access and usability user’s attention to matters of microstructure. of lexicographic content. In particular, Consequently, new possibilities for DWB makers of historical dictionaries need to should deal with the options of having a reconsider the conception facultative meta-comment on the one hand of their products to not and a navigation aide on the other, both of risk becoming antiquated. which reach beyond the realization of the Transformation is important in Web interface of the first edition. Tools like order to keep a lexicographic these could be sufficiently implemented product competitive and after or during a digitization process (and sustainable, and it leads to count among the options that stem from the enhanced versions. At this age of electronic dictionaries), but could not stage, one cannot deny that have been included in the framework of the in the field of retro-digitized printed volumes. dictionaries – although Figure 1 shows how the pictured steps best practices for relevant in the transformation process come additional information together and lead to a new direction. The categories are still missing – given suggestions toy with the idea of an concepts for interoperability additional didactic-oriented concept as they and accessibility are becoming Figure 1: Steps in the transformation try to guide the user through the manifold a pressing issue. Nevertheless, process of DWB (and interrelated) information aspects of a reinventing the dictionary is dictionary entry. In view of the numerous not only a means of technical expertise. and comprehensive entries of dictionaries, In fact, understanding language in context it should make use of the electronic of culture emphasizes the middleman environment and deliberately low access role of the lexicographer. Dictionaries thresholds should be established, e.g. by are institutions of general importance as reducing complexity and giving illustrative well as trusted authorities. With that also examples, as demonstrated in Figures 2 comes an obligation to observe ongoing and 3 for a potential electronic version of media changes and exploit their options the revised DWB. Now the challenge is to for lexicography-based information tools. pursue this path and to pool lexicographic competences and technical resources References in order to develop and offer efficient Granger, S. 2012. Introduction: Electronic solutions. That also means overthinking the lexicography – from challenge to displayed content, which in the case of a opportunity. In Granger, S. and Paquot, historical dictionary can sometimes be very M. (eds.). Electronic Lexicography. heterogeneous. As a possible consequence, Oxford: Oxford University Press, 1-11. it seems quite a feasible solution to visualize Lew, R. 2011. Online Dictionaries of dictionary search options, which may English. In Fuertes-Olivera, P.A. and eventually lead to a better understanding Bergenholtz, H. (eds.). e-lexicography. of the dictionary-structure. London/New York: Continuum, To sum up, dictionaries in the digital age 230-250.

Figure 2: Possible focal points in the microstructure Figure 3: Pointing to interrelated information aspects Kernerman Dictionary News, June 2015 16

Recent developments in German lexicography Alexander Geyken

The digital revolution is changing the foresees a stronger focus on print products2. way readers consume news and search Another challenge for Langenscheidt was the for information. People are moving away advent of collaborative internet platforms. from printed reference books and going Based on large initial vocabularies donated online where, generally, they expect to get by third parties and on a very active user their information for free. (Press release by community, the two most popular German dictionary internet platforms, Leo3 and Chambers Harrap, 15 September 2009.) dict.cc4, managed to compile very large translation databases of currently nine This declaration by Chambers Harrap (Leo) and 26 (dict.cc) language pairs with Publishers in 2009 was one of the rare German as the pivot language and became public statements by a publishing house much more popular on the internet than before closing its business. It points to Langenscheidt’s rather traditional website. the fact that the technological change is a In addition Linguee5 – a large database of decisive factor for the crisis of traditional Alexander Geyken works paragraph-aligned translations where the dictionary production that has led to at the -Brandenburg translation quality of words or phrases in the numerous staff reductions or insolvencies Academy of Sciences sentence context can be rated by contributors of dictionary publishers on an international – is becoming increasingly popular among and the Humanities since level. Similarly, the national German users. Finally, Klett, who has invested very 1999, where he directs the dictionary market has been confronted early in technological products, occupied a long-term research project with dramatic changes in the past years. large market share. Under its brand PONS Digital Dictionary of the Traditional dictionary publishers shrank it has published a diligently curated set of German Language (http:// dramatically (Duden, Langenscheidt) or bilingual dictionaries that have been made even disappeared completely (Wahrig), dwds.de/). He received his available free of charge on the internet and the largest academic dictionary, the Ph.D. in Computational since 2001 and continually extended to Deutsches Wörterbuch (DWB, German Linguistics at the University 13 language pairs at present with access Dictionary, by the Grimm brothers), of Munich in 1998. His to over 10 million words and phrases6. In compiled by the two Academies in Berlin main research interests are 2009 Klett has also published a monolingual and Göttingen, will cease its work in computational lexicography, German dictionary that challenged Duden’s 2016. Except for Langenscheidt where dictionary7. corpus linguistics, and the the decline has a longer history, all this As a reaction to the stronger pressure from use of syntactic and semantic was announced to the public in one and the competitors, Duden was also working resources for the mining of the same year: 2013. The timing was pure on a more powerful internet platform. In large textual data. coincidence since the momentous decisions April 2011 a website was launched where were taken much earlier. To begin with, in [email protected] free access was given to the complete 2009, the publishing house Langenscheidt edition of Duden’s flagship product, the with a tradition of more than 150 years of Großes Wörterbuch der Deutschen Sprache business in bilingual dictionaries, sold the (GWDS, Great Dictionary of the German prestigious Duden department to Cornelsen, Language, 19998). GWDS is the largest a large company known for its text books dictionary of contemporary German. It was in the field of education. This happened published as a print edition in 10 volumes just a few months after Langenscheidt with a total of 7,200 pages and 200,000 sold the Brockhaus encyclopedia to the entries in 1999, and one year later in a Bertelsmann group. These sales were not CD-ROM version. In order to appreciate the end of Langenscheidt's decline. In the full impact of the free internet version 2011 Langenscheidt also separated from on the market strategy of Duden one has to Polyglott and in 2012 the ʻadult education remember that the CD-ROM was initially and school’ department was taken over by its competitor Klett. Langenscheidt ended up as “Langenscheidt light”1. Instead of 2 http:// boersenblatt.net/949754/ investing into a technological reorganization 3 http://dict.leo.org/ the company surprised the public by 4 http://dict.cc/ announcing a recent strategy shift that 5 http://linguee.de/ 6 http://pons.com/ 7 http://text-gold.de/praxistipps-fuer- 1 http:// buchreport.de/nachrichten/ online-redakteure/pons-online- verlage/verlage_nachricht/ woerterbuch-macht-dem-duden- datum/2012/11/08/langenscheidt-light. konkurrenz-ein-praxistest/

Kernerman Dictionary News, June 2015 htm/ 8 http://duden.de/woerterbuch/ 17 sold for an equivalent of 500 Euros. have to be addressed nowadays include the According to experts in the field, the entry appropriateness of corpus compilation and of Duden into the market came too late. its dynamic adaptation to new needs rather The sharp decrease in sales of the spelling than the compilation of citation slips. Also, dictionary, previously the no. 1 selling work the automatic extraction of lexicographic of Duden, could not be counterbalanced. information from corpora via statistics or Only two years later, in 2013, Duden machine learning techniques plays a major announced a dramatic reduction of staff role in the dictionary production process from 190 to 30 employees9. Of course, today. Numerous papers on lexicography plans for a complete revision of the GWDS bear witness to these new challenges (e.g. were unrealistic under these conditions. Gouws 2011, Rundell 2012). Cambridge Dictionaries Duden now concentrates on its one volume With the decreasing lexicographic staffs Online features the latest works, including the spelling dictionary, the in publishing houses, further development grammar and the . of lexicography relies predominantly on versions of the semi-bilingual Wahrig, the number two in the institutional funding, namely the Union der PASSWORD Dictionary for monolingual German dictionary market deutschen Akademien der Wissenschaften learners of English in the never managed to obtain a significant brand (Union of German Academies) and the following languages: visibility on the internet. Being almost Institut für Deutsche Sprache (IDS, French hidden among many other resources in Institute for German Language). Both have German 10 Bertelsmann’s large knowledge platform , a long tradition of compiling monolingual Indonesian it does not come as a surprise that Wahrig’s dictionaries. Currently there are more than Malay dictionary was buried together with the 20 different dictionary projects funded by Spanish Brockhaus encyclopedia: it was also in the Academies. However, the majority of Thai the year 2013 that Bertelsmann announced these projects were started a long time ago Vietnamese the discontinuation of their knowledge with traditional methods and will run out platform. The entire lexicographic staff was of funding in the coming ten years. And http://dictionary.cambridge.org/ made redundant and since then, work on given the above-mentioned technological Wahrig’s dictionary came to its end. changes it is not likely or desirable that new Published in cooperation This crisis of lexicography in Germany projects will start in the traditional way that is more than only an economic one. It is currently still typical of almost all these with K DICTIONARIES is a well known fact among publishing projects. houses that revenues of the large flagship By contrast, there are currently two dictionaries do not exceed their expenses. larger projects in Germany that recognize However, in the past these expenses could and implement the principles of the new be cross-financed, for example by the era of e-lexicography. Both can hope for revenues of print products derived from a sustainable funding: elexiko and DWDS. a flagship dictionary. This somewhat Elexiko11 started in 2000 as a long-term comfortable scenario stopped with the project of the IDS. The goal is to describe sharp decrease in sales of printed books the German language from the end of the and the triumph of the internet. Users do 1940’s to the present in all its national no longer rely on print products or, for that variants. Practically, the focus in elexiko matter, on classical browser interfaces. is set on the description since the 1990’s Access via smartphones or tablets has corresponding to the text representation become more and more common, and users in the underlying corpus base, i.e. the are not willing to pay for these services. DEREKO-corpus, a continually growing German dictionary producers have not corpus of currently more than 25 billion been prepared for compiling tailored words. A list of 300,000 lemmas has been products for these new devices. The good selected for elexiko. Until the end of 2014, old times when dictionaries were produced approximately 2,000 entries with high in three consecutive phases, i.e. planning frequency in the corpora were manually the dictionary, compiling the dictionary and edited by the lexicographers. Most lemmas producing the dictionary (Landau, 1984), consist of semi-automatically generated are definitively over. Nowadays dictionaries minimal articles with information about are not produced sequentially anymore the spelling, the morphology and corpus but the various phases run in parallel examples. The hypertextual structure of or in cycles. Protagonists of dictionary the lexicon in elexiko played a role right production are no longer restricted to a team from the beginning. Therefore particular of lexicographers alone but prefer to work emphasis is put on cross-referencing with an interdisciplinary team consisting of individual articles and providing links to corpus linguists, computational linguists, IT external resources (Meyer 2014). The online specialists and lexicographers. Concerns that presentation of elexiko is embedded into the

9 http://boersenblatt.net/543236/

10 http://wissen.de/ 11 http://owid.de/wb/elexiko/start.html/ Kernerman Dictionary News, June 2015 18

lexical information system OWID12. OWID players in Germany, namely the IDS and grants access to a set of lexical modules the Academies, are able to keep pace with including the lexicon of neologisms, the the rapidly developing technology, thus lexicon of paronyms and its core module being able to bring academic lexicographic elexiko. knowledge to the public of the 21st century. The DWDS (Digitales Wörterbuch der Deutschen Sprache, Digital Dictionary Acknowledgement of the German Language) began in 2007 The author would like to thank Lothar as a long term academic project at the Lemnitzer for his comments on an earlier Berlin-Brandenburg Academy of Sciences draft of this manuscript. and the Humanities (BBAW). The motivation to launch this project was threefold: firstly, References there is no satisfactory account for the Didakowski, J. and Geyken, A. 2012. The ‘my4n-news’ interactive history of the German vocabulary since From DWDS corpora to a German Word service for vocabulary the end of the 19th century. Secondly, Profile – methodological problems and acquisition integrates the Grimmsches Wörterbuch will remain solutions. In Network Strategies, Access monolingual learners’ outdated for the letters G-Z even after the Structures and Automatic Extraction dictionaries from KD’s completion of the second edition of the of Lexicographical Information. 2. GLOBAL series for the DWB that ends in 2016 after the completion Arbeitsbericht des wissenschaftlichen following languages: of the letters A-F (by the way, ‘Frucht’ (fruit) Netzwerks „Internetlexikografie“. French was the last word compiled by the brothers Arbeiten zur Linguistik 2/2012 (OPAL), German Grimm). And thirdly, existing dictionaries Mannheim: Institut für Deutsche Italian at that time did not draw on large corpus Sprache, 43–52. Portuguese data and computational methods right from Didakowski, J., Lemnitzer, L. and the outset. Given the comparatively small Geyken, A. 2012. Automatic example Spanish project size of ten specialists, the goal of the sentence extraction for a contemporary http://4nmedia.com/ DWDS project cannot be to compile a full German dictionary. In Fjeld, R.V. and historical dictionary. Instead it was decided Torjusen, J.M. (eds.), Proceedings of to compile a large synchronic dictionary, to the XVth EURALEX Congress. Oslo: Published in cooperation which diachronic modules could be added University of Oslo, 343-349. with K DICTIONARIES if such work will be funded in the future. Gouws, R.H. 2011. Learning, Unlearning More precisely, the aim of DWDS is to and Innovation in the Planning build an aggregated information system of Electronic Dictionaries. In that draws on several complementary Fuertes-Olivera, P.A. and Bergenholtz, lexical resources, word statistics and H. (eds.), e-Lexicography. The Internet, corpora. The DWDS can make use of Digital Initiatives and Lexicography. several lexical resources that are part of the London: Continuum International heritage of the BBAW: the Wörterbuch der Publishing Group, 17–29. Gegenwartssprache (WDG), a synchronic Landau. S. 1984. Dictionaries. The Art dictionary of 4,800 pages with 90,000 and Craft of Lexicography. New York: keywords, compiled between 1961 and Charles Scribner’s Sons. 1977, the Etymologisches Wörterbuch des Meyer, P. 2014. Meta-computerlexikograf- Deutschen ( of ische Bemerkungen zu Vernetzungen in German)) and the Grimmsches Wörterbuch. XML-basierten Onlinewörterbüchern Moreover, some 60,000 dictionary articles – am Beispiel von elexiko. In Abel A. were licensed from the Duden-GWDS for and Lemnitzer, L. (eds.), Vernetzungs- cases where the WDG articles are missing strategien, Zugriffsstrukturen und or outdated. The platform integrates an automatisch ermittelte Angaben in Inter- automatic collocation extractor and a good netwörterbüchern. 5. Arbeitsbericht des example finder (Didakowski and Geyken wissenschaftlichen Netzwerks „Internet- 2012, Didakowski et al 2012). Finally, the lexikografie“. Arbeiten zur Linguistik DWDS draws on large corpora with a size 2/2014 (OPAL). Mannheim: Institut für of 4 billion running words that cover the deutsche Sprache. period between 1600 to the present. The Rundell, M. 2012. It works in practice results of this project are accessible under but will it work in theory? The uneasy http://dwds.de/. relationship between lexicography and To sum up, the past decade has brought matters theoretical (Hornby Lecture). a shift in German lexicography away from In Fjeld, R.V. and Torjusen, J.M. (eds.), private publishing houses to publicly Proceedings of the XVth EURALEX funded institutions and collaborative Congress. Oslo: University of Oslo, internet platforms. The next years will 47–92. show in what way the two institutional key

Kernerman Dictionary News, June 2015 12 http://owid.de/ 19

A standardized wordlist and new spellchecker of Frisian Anne Dykstra, Pieter Duijff, Frits van der Kuip and Hindrik Sijens

1. Introduction school subject in the second half of the last In 2015 the Frisian Language Web saw the century. In addition, because this obligation light of day online (FLW, http://taalweb.frl/). was applied only to primary schools and It has been developed by the lexicography because written Frisian plays a minor role department of the Fryske Akademy and in daily life, most Frisians are not proficient consists of a newly created standardized in writing their own language. Language wordlist of Frisian, an online spellchecker, learners, as well as native speakers, are a machine translator and a dictionary insecure in their language use, fearing to portal. These four applications together and make mistakes. Even language professionals individually offer a unique language tool to like journalists, editors, translators and help Frisian native speakers and learners to novelists experience such problems. write proper Frisian. The FLW is the first Since the lack of a standard was felt to be step towards a Frisian language platform a major obstacle in furthering the position that makes use of modern web technology. of written Frisian, the Province of Friesland Anne Dykstra has recently The standardized wordlist of Frisian is an asked the Fryske Akademy to develop a retired from the Fryske important part of FLW and will not only be standard for Frisian. It should be stressed Akademy, where he has taken a firm base for future dictionaries but also here that the Provincial government does part in various Frisian language the backbone of a special spellchecker. In not have the legal means, nor the wish, to projects, including the scholarly this article we will give a brief overview of make the wordlist of Frisian a mandatory Wurdboek fan de Fryske taal the compilation process of the standardized language standard. It is merely meant to be (Dictionary of the Frisian wordlist and how it is applied in the an aid to those who seek guidance when it language). He was the editor spellchecker. comes to the choice of preferred variants. the Frysk-Ingelsk wurdboek 2. Standard Frisian 3. Standard wordlist (Frisian-English Dictionary), Frisian has two major dialects and a The selection process for the wordlist has has worked on the Frisian number of smaller ones. Traditionally, been guided by different criteria, though Language Database, and was variant pronunciations from minor dialects they have not been applied simultaneously: project manager of the Frisian are not considered standard and thus are (a) tradition Language Web. Currently, he not included in dictionaries (anymore)1. Though not fully-fledged, there is a certain is editor of the International Nevertheless there is not a fully-fledged standard Frisian language, which has been Journal of Lexicography. standard yet. Today’s standard is mainly codified in Frisian dictionaries over the [email protected] based on the two major dialects: Clay years. Though the dictionaries have most Frisian and Wood Frisian. In practice the of the time not selected variants from present dictionaries do not always make peripheric dialects, they do not always a clear and well-reasoned choice between make clear which variant from the two major Pieter Duijff is lexicographer the two. In daily practice it appears that dialects should be prioritized. Yet it has been at the department of linguistics the variant that in the dictionaries is given common practice for a long time to consider of the Fryske Akademy. first, is and has always been regarded as the the variants given first in the dictionaries as He was editor of a legal standard or preferred variant by users. Yet the preferred ones. That is why, for instance, dictionary (2000), co-editor the dictionaries’ ambivalent attitude towards Clay Frisian variants in -om or -omme (rom of the Wurdboek fan de standard Frisian causes some uncertainty ‘spacious, large, wide’, tomme ‘thumb’) are Fryske taal / Woordenboek with those very same users. It appeared given preference over Wood Frisian variants der Friese taal (1984-2011), that educators, authors and civil servants in -ûm of -ûme (rûm, tûme). one of the editors-in-chief had a strong wish for more guidance as (b) distancing of the monolingual Frysk to the choice of preferred variants2. This One element in the standardization of a Hânwurdboek, and editor of doubt regarding correct usage is rooted in language can be distancing, which implies word lists of Town Frisian a lack of education and routine in writing moving away from another (dominant) Frisian. Frisian only became a compulsory language. In the case of Frisian, that other dialects (1998) and modern language is Dutch. An illustrative example Frisian (online, 2015). Since 1 Only the scholarly Wurdboek fan de is the preferred variant hiel ‘whole’, though 2014 he is one of the editors Fryske taal (http://gtb.inl.nl/) provides it is only used in a small part of Friesland. of a new online Dutch-Frisian data from all dialects. In by far the larger part of Frisians use heel, dictionary. 2 It must be said that there has also which is identical to the corresponding [email protected] been resistance towards further Dutch word and therefore disqualifies as a standardization of Frisian, the main preferred variant. reason being fear of losing dialect Another example is the deletion of e in

variation. words like kiste ‘box’ and mûtse ‘hat, cap’. Kernerman Dictionary News, June 2015 20

In a part of the language area it is perfectly preferred variants. This feature makes it normal to say kist or mûts, yet these forms possible for the user to focus on the use of are not selected as preferred variants the preferred variant. because of their similarity to Dutch. The following example will demonstrate (c) uniformity how the spellchecker works: The principle of uniformity can also Hy skamme sich faor syn tiim. determine the choice of the preferred variant. ‘He was ashamed of his team.’ Words like died ‘deed’, ried ‘advice’, sied The spellchecker will return the sentence ‘seed’, goed ‘good’ or paad ‘path’ are often with three underlined suggestions, each in pronounced and written in Frisian without a different colour: the final d. However, the d always emerges in Hy skamme sich faor syn tiim. inflected and plural forms and in compounds, Red [faor] indicates a typo, green [sich] a both in spoken and in written language Dutchism, and blue [tiim] a variant. The Frits van der Kuip has been (dieden, riedsgearkomste). For that reason user can decide what he or she wants the a lexicographer at the Fryske died, ried, etc. are written with the final d. spellchecker to do. For instance, to avoid Akademy since 1986. He Another example is the Northern variant seeing variants underlined the user can was one of the editors of the baarne ‘burn’, which up till now is the form tick the blue box and get the following 25-volume Wurdboek fan de that is given first in most dictionaries, but spellchecked text: Fryske taal (Dictionary of the which has to give way to the Southern brâne Hy skamme sich faor syn tiim. Frisian Language) and one because of the uniformity criterion: brân Faor is a typo of foar, which is a mistake ‘fire’, brâne ‘burn’, brânfersekering ‘fire that any spellchecker would find. Since the of the editors-in-chief of the insurance’. wordlist usually lists English words in their Frysk Hânwurdboek. In 2003 (d) common usage original spelling, the user is advised to opt he completed his Ph.D. thesis Frequency is a natural selection criterion for team. A regular spellchecker might have on 17th century proverbs. More when compiling a list of preferred variants. come up with this suggestion as well. recently he was involved in Highly frequent adverbs like eigentlik ‘in Sich is quite a different case, clicking it the Frisian orthography and fact, actually’ and eins (idem) were included renders the following list of alternatives: spellchecker project, and at in the wordlist, while the less frequent har present he is preparing an online eigenlik, einlik and eink were not. harren Dutch-Frisian dictionary. (e) etymology him Sometimes the spelling reflects a less himsels [email protected] careful pronunciation: abslút ‘absolutely’, The correct Frisian word for the Dutchism bibleteek ‘library’, bommedearje ‘to sich in this case is him. The user will be able bomb’. The wordlist only contains the full to replace sich by him by a simple click. forms: absolút, biblioteek, bombardearje. This is where the FLW spellchecker stands Exceptions to the etymological principle out from regular ones, for which it would are lexicalized forms such as knyn ‘rabbit’ have been impossible to go from sich to him. and plysje ‘police’ (instead of konyn and polysje) and word pairs having different 5. Conclusion meanings like krupsje ‘a/any disease’ and The Frisian Language Web has many korrupsje ‘corruption’. functions, not the least of which is language maintenance. The first ever standardized 4. Spellchecker wordlist of Frisian is supposed to encourage It stands to reason that the standardized Frisians to write their native language with wordlist is the core of the spellchecker that more confidence. When writers apply the is part of FLW. As explained, the wordlist spellchecker, they do not have to consult is meant to be a practical tool to help the wordlist directly, the spellchecker will Hindrik Sijens is a those who want to use preferred variants make clear what the preferred variant of lexicographer at the Fryske in their written text. The term ‘preferred a particular non-standard form is. It is Akademy, and an editor of variant’ entails that variants that have not important to note that users have a choice in been included in the wordlist are not to be what they want the spellchecker to do. The the Frysk Hânwurdboek considered wrong. To further facilitate use, Fryske Akademy will be happy to share its (monolingual Frisian the wordlist has been incorporated into a spellchecker technology with other (small) dictionary). database underlying the spellchecker in languages. Those interested are welcome to [email protected] which non-standard words are linked to contact us.

Fryske Akademy was founded in 1938 in Leeuwarden, the capital major lexicographic accomplishment is the Woordenboek der of the Province of Friesland in the Netherlands, and concentrates Friese Taal (Dictionary of the Frisian Language, http://gtb.inl.nl/). on fundamental and applied academic research into the Frisian Its Linguistics Department has developed the Frisian Language language, culture and society. While the results of its activities Database (http://fryske-akademy.nl/tdb/) and the Frisian Language are primarily of academic interest, they have significant social Web (http://taalweb.frl/). Currently, a Dutch-Frisian online meaning and relevance to Friesland and beyond, particularly in dictionary is in preparation. the context of minority and “small” languages. The academy’s http://fryske-akademy.nl/ Kernerman Dictionary News, June 2015 21

Howard Jackson, ed. The Bloomsbury Companion to Lexicography

The Bloomsbury Companion to (2010) mentions research in encyclopedic, Lexicography presents a broad overview children’s, and onomasiological of contemporary research and trends in dictionaries. These are indeed fruitful areas lexicography. It contains some twenty of lexicographical study, but they are not substantive chapters by eminent scholars in discussed in the Companion. their fields, and includes additional reference Next, Lars Trap-Jensen’s “Researching materials. Although the meta- prefix is not Lexicographical Practice” is a reasonable attached to lexicography in the book’s title, textbook account of major topics in sometimes the frame of metalexicography lexicographical practice: conceptualization, is helpful in emphasizing the distinction design, semantic description, dictionary which is repeatedly stated in the text: the writing systems, interfaces, and the specter Companion is meant to accompany not the of a future where all reference is mediated by practical craft of dictionary-making, but the Google (a major topic indeed!). Compared theoretical work of lexicographical criticism with Bogaards’ chapter and many of the and dictionary research. In day-to-day life, others, this chapter is light on connections the two disciplines are probably not truly to ongoing research. I was puzzled to find no separable, but given the number of manuals references to any of the detailed manuals to The Bloomsbury Companion intended for practitioners, a theoretically- lexicographical practice. Not that the reader to Lexicography oriented introductory compendium is a likely needs to be told that they exist (nine are Howard Jackson, ed. promising prospect. listed in the book’s Annotated Bibliography) London: Bloomsbury but a connection with these manuals could Publishers. 2013 Chapter Overview have provided an understanding of areas ISBN (hardback) The introduction explains that the of relative consensus and divergence. 9781441145970 Companion “is aimed primarily at students Trap-Jensen begins by asserting a focus on ISBN (paperback) of lexicography who are proposing to monolingual native-speaker dictionaries, 9781474237376 undertake research in one of the areas but the overview is unspecific enough that it covered by ‘lexicography’.” It “aims to can apply just as well to bi- and multilingual ISBN (epub) 9781441144140 give a broad overview of the discipline, resources. dealing with the main trends and issues in Kaoru Akasu’s “Methods in Dictionary the contemporary study of lexicography” Criticism” describes the team-review (1). Lexicography is a big enough field methods used by the Iwasaki Linguistic that reasonable people may have differing Circle. As described by Akasu, these opinions about all sorts of questions, large methods appear to be an excellent way to and small. The Companion makes no perform an intensive analysis of a dictionary attempt to offer a unified point of view, by combining multiple reviewers’ expertise; but puts forth a menu of perspectives from Akasu argues convincingly for rigorous which its readers may launch or expand procedure and comparative reviewing. their own research. Along the way, Akasu points to an After the editor’s introduction, the late interesting challenge of dictionary criticism. Paul Bogaards’ “A History of Research It is vanishingly rare for dictionaries to in Lexicography” gives a historical document their editorial practices and style overview. Beginning in the mid-twentieth guide in any detail (Sinclair (1987) being century, Bogaards covers the development a cherished exception). As a result, critics of studies in lexicographical history, must reverse-engineer a dictionary’s intent criticism, and typologies; dictionary in order to guess what its goals were, and macrostructure and microstructure, usage, thence evaluate its success. and corpus methods. The chapter runs for Hilary Nesi’s “Researching Users and ten pages of text, plus three full pages of Uses of Dictionaries” is a thorough overview references to works in English, French, of usage studies to date, with references to a and German. This breadth of references broad array of user studies. Potential areas is a reassuring opener for anyone who of study, and potential approaches, are so suspected that a Bloomsbury Companion varied that it is not feasible to exhaustively might prove Anglo- or English-centric, survey user research in the space allotted. and its diversity of perspective is further Nevertheless Nesi’s account provides a broadened throughout the book. Bogaards’ clear, generously cited map to the enormous typology of lexicographical scholarship territory currently covered, categorizing does not completely correspond to the one existing work by its focus on user types, presented by the Companion as a whole: usage contexts, user preferences, or usage for example, Bogaards (by way of Béjoint strategies. Kernerman Dictionary News, June 2015 22

Adam Kilgarriff’s “Using Corpora as at least as much as it is about teaching. Any Data Sources for Dictionaries” draws user is likely to bring established habits heavily from the author’s own work. This with them when they use new dictionaries, is unavoidable, since the Sketch Engine and to look only for the information they (Kilgarriff et al 2004) has been preeminent are accustomed to finding. As we create in spurring a technological shift that he resources with far more data behind them, describes: “from [a methodology] where our efforts may be squandered if users never the technology merely supported the explore deeply enough to benefit from novel corpus-analysis process, to one where it dictionary developments. pro-actively identified what was likely to be Shigeru Yamada’s “Monolingual interesting and directed the lexicographer’s Learners’ Dictionaries – Where Now?” attention to it” (85). Beyond his own work, details the history, present and future of Kilgarriff also cites research by others, learner dictionaries, with the author’s some of which was new to me despite my characteristic comprehensiveness. Although own focus on corpora. The chapter could I do not always agree with the way that very well serve as a practical introduction Yamada evaluates individual features for new corpus lexicographers. Its or frames particular dichotomies in this fundamentally future-looking orientation chapter, I heartily agree with his ultimate leaves an impression that this chapter will conclusions and vision of a possible stand the test of time quite well: it describes future. The conclusion draws from Yamada The Slownik Norweski practices that will surely continue to develop (2011) to show a ‘dismembered’ LDOCE website was launched by and gain ground. entry in an improved electronic layout, in DTC PRO in 2012 and serves Verónica Pastor and Amparo Alcina’s which I saw very promising implications mainly Polish learners of “Researching the Use of Electronic for the underlying data. Most of the Norwegian. It features KD’s Dictionaries” is an expanded version of other contributions use endnotes only for GLOBAL Polish/Norwegian their 2010 IJL paper (Pastor and Alcina references or supplementary information, dictionaries, and offers also 2010) and presents a classification of but Yamada’s enjoyable endnotes sometimes specialized dictionaries along electronic-dictionary search methods. It is a convey bolder positions than the ones he with study and test materials descriptive study of existing facilities, rather takes in the main text. than a speculative wish-list of potential Arleta Adamska-Sałaciak’s “Issues for free and to subscribers. new features. Although it describes the in Compiling Bilingual Dictionaries” http://slownik-norweski.pl/ present state of an art which is constantly digs deep into the most challenging and developing, Pastor and Alcina’s paradigm is interesting issues in bilingual lexicography. quite likely to accommodate yet-unforeseen Although the chapter purports to focus Published in cooperation dictionary features. As a result, much like largely on print dictionaries, much of the with K DICTIONARIES the previous chapter, this one, too, ought to discussion – audience, scope, directionality, remain useful long past its publishing date. resource planning, microstructure, data John Considine’s “Researching Historical sources, and challenges of inter-cultural Lexicography and Etymology” is exemplary conceptual equivalence in general – is in conveying a specialist’s thorough highly illuminating for both print and survey of the subject matter, supported electronic work, and much of it arguably with extensive references to resources for monolingual work as well. for deeper understanding. Although the The next two chapters, Danie J. Prinsloo’s, OED is obviously the 137-pound gorilla “Issues in Compiling Dictionaries for of historical dictionaries, Considine does African Languages” and Inge Zwitserlood not neglect historical dictionaries in many et al’s, “Issues in Sign Language other languages, and he makes the case for Lexicography” are extraordinarily welcome creating even more. and eye-opening. These two chapters are the Amy Chi’s “Researching Pedagogical deepest explorations of these challenging- Lexicography” is another outstanding but-important topics that I have seen contribution; right around the midpoint in a single-volume lexicography book. of the book, Chi frames her subject in Typically, if these topics are addressed in ways that reach far beyond the chapter’s generalist books, it is with passing citations nominally pedagogical focus. For example, to other sources for specialists in those Chi describes user studies showing that languages. Their inclusion here gives them “most users exploit only a narrow range a rightful position at the core of things that of dictionary items in their consultations, lexicographers should be concerned with, focusing predominantly on meanings,” rather than as fringe topics for specialists. while ignoring guidance on abstractions like They merit broader attention not exactly for syntactic patterns or count/mass distinctions. the languages in themselves: a monolingual She suggests that future studies ought to ask lexicographer focuses on a single language, whether “ curriculum and/ after all. But lessons learned from other or teaching…[has] promoted this narrow languages may enrich everyone else’s

Kernerman Dictionary News, June 2015 usage” (180). This question is about usage work, and these languages have some very 23 challenging things to tell us about unsolved massive resources can expose both the lexicographical problems. history of a language and the history of Robert Lew’s “Identifying, Ordering its lexicography. Brewer speaks from and Defining Senses” is very satisfying experience about the other edge of this and on point. It touches all the urgent and sword, whereby digital editorial workflows relevant issues of lexical and semantic can erase parts of this history and distort analysis, frames interesting problems in the historical record of the dictionary at a an engaging way, and, like almost all the keystroke: another efficiency that was not chapters, has excellent references. It is not possible in print. hard to imagine future-dictionary scenarios On his way to describing “The Future of where ‘ordering’ of senses is not a crucial Dictionaries, Dictionaries of the Future”, task – a contextually-disambiguated word Sandro Nielsen takes a moment to define lookup doesn’t need to tell you about senses what he means by dictionary. This definition The new Clarify language b or c if it knows that you’ve come for sense is a useful exercise for anyone talking service offers online and d – but even in that future, Lew’s treatment about the future of dictionaries, since the of structure in sense-enumerative semantics future of ‘hardbound printed books with mobile applications of the is excellent. speckled edges and thumb indexes’ is very following KD titles: Tadeusz Piotrowski’s “A Theory different from the future of ‘semantic tools GLOBAL Dutch Dictionary of Lexicography – Is There One?” is to aid linguistic production and reception.’ GLOBAL English Dictionary concerned with a question many of us Unfortunately Nielsen’s proposed definition GLOBAL French Dictionary have heard before. The fantastically moves the goalposts just a few meters: GLOBAL German Dictionary incisive thing here is that Piotrowski frames “dictionaries are reference tools made up the question in a way that allows for a of several surface features” (356). These GLOBAL Italian Dictionary positive answer, instead of the traditional surface features are subsequently described, GLOBAL Spanish Dictionary rejection or minimization of the question. but the meaning of ‘reference tool’ is not. GLOBAL Swedish Dictionary “Lexicography produces dictionaries, not I am not feigning ignorance when I say I PASSWORD English theories, while metalexicography does not don’t know what precisely a ‘reference tool’ Estonian Dictionary produce dictionaries but general statements might be. The relevance of the question for about them. Accordingly, metalexicography this review is that some of what Nielsen PASSWORD English Latvian can be a science, while lexicography is discusses, around different interfaces to Dictionary not” (309). Piotrowski notes that existing, electronically-mediated information, is not PASSWORD English practically-oriented lexicographical unique to dictionaries and can potentially Lithuanian Dictionary theories “have a strong prescriptive bent,” enhance any information channel previously PASSWORD English as distinguished from scientific theories, mediated through print. Enhancements Swedish Dictionary which aim at description and prediction. like voice search, video results, and the Random House Kernerman The chapter does not go so far as to actually intriguing “three-dimensional form, formulate a theory of lexicography, but it including holograms” (368) could enrich Webster’s College Dictionary demarcates a space where such a more newspapers and gasoline pumps just as http://clarifylanguage.com/ general lexicographical theory would be well as dictionaries, but newspapers are not meaningful for critics and practitioners prototypical reference tools. Lexicography alike. has some unique features that Nielsen does Published in cooperation Piotrowski is a tough act to follow. In not consider, but many of them are covered with K DICTIONARIES the next chapter, “e-lexicography: The elsewhere in the Companion; in return, Continuing Challenge of Applying New Nielsen offers several useful handles on Technology to Dictionary Making“, Pedro contemporary problems that are not covered A. Fuertes-Olivera argues that many online elsewhere in the book. His discussion of dictionaries are really just print dictionaries “information costs” (369) is a good frame recapitulated on a screen, and that only a for the distinctive tradeoffs of lexical few dictionaries are really conceived from reference, where a user’s main task can be scratch in the electronic domain. Although assumed not to be consulting the dictionary, I fully agree with that evaluation, I find that but instead learning an answer so they can when Fuertes-Olivera gets into specifics, get back to what they were doing. Nielsen’s much of his discussion of e-lexicography conclusion that “dictionaries are in a still feels conceptually grounded in print transitional phase from the manufacturing dictionaries. I fear Fuertes-Olivera takes sector into the service sector” (370) is also insufficient account for the ways that quite well taken. computational intermediation can more The three remaining sections are reference deeply change the products and processes material. Reinhard Hartmann’s catalog of lexicographical consultation. of “Resources” is a general overview Charlotte Brewer’s “The Future of of societies, corpora, journals, and the Historical Dictionaries, with Special like, with brief expository descriptions to Reference to the Online OED and accompany each section. In a book that ” addresses the insights made appears otherwise carefully copyedited, possible by digitization. Fast search over this chapter has unusual inconsistency Kernerman Dictionary News, June 2015 24

in the formatting of URLs and names in headword selection, discussed in general its informational tables, but this is not an terms by Trap-Jensen on pages 40-41, is obstacle to getting the useful information explored more concretely by Kilgarriff PASSWORD out of them. on pages 79-83. Nor is the index much Barbara Ann Kipfer’s “ of help: it runs for only two pages of this Apps of the semi-bilingual Lexicographic Terms” includes terms 420-page book, and lists Kilgarriff’s PASSWORD English from lexicography, publishing, and parts of headword-selection pages under ‘headword’ Dictionary for Android, iOS linguistics relevant to lexicography. Like all but not at ‘lemma selection’, yet conflates and Mac OS, developed by the other chapters, it represents its author’s twelve references to ‘headword’ without any subcategorization (e.g. between headword Paragon Software: own viewpoint. In the case of a glossary this means that some of its terms are not used selection and headwords as part of access English Afrikaans in the Companion itself (back-formation; structure). English Arabic bogey; density; Sprachgefühl). Considering The limited coordination among the English Bulgarian that Kipfer’s (1984) Workbook on authors also leads to a certain unevenness English Chinese Simplified Lexicography included Jennifer Robinson’s between chapters. Again on the subject of English Chinese Traditional (1983) glossary of lexicographical headword selection, Prinsloo concludes English Croatian terminology, it was interesting to compare a stunning section about lemmatization English Czech the two approaches some 30 years apart. challenges in Bantu languages (246) by English Danish The two have some overlap in mentioning frequency cutoffs as a potential English Dutch their headword selection, and sometimes in method for lemma selection. References to English Estonian the substance of the definitions and sense either Kilgarriff or Trap-Jensen would have English Farsi divisions. Robinson’s glossary has example been helpful here, but it would have been sentences taken from a reading list of most interesting if Prinsloo had been able to English Finnish lexicographical writing, and frequently uses engage with their positions, and to discuss English French index entries, variant headwords that are the consequences of frequency cutoffs from English German simply cross-references to a fully-defined the perspective of Bantu-family language English Greek synonym (a term I couldn’t remember but users and lexicographers. English Hebrew that I found in Kipfer’s glossary). Kipfer The stand-alone chapters and skimpy English Hindi does away with illustrative examples and index create an obligation to read the English Hungarian also with index entries, instead repeating whole book in order to be sure that one English Icelandic definition content at variant headwords with has read everything that its contributors English Indonesian small amounts of supplemental information have to say about a subject. A professor English Italian at one entry or the other. I find the current who wished to assign selected readings English Japanese approach more user-friendly and well-suited from the Companion might need to assign Companion two or more chapters to get full coverage English Korean to the . Howard Jackson’s “Annotated of various issues that span subdisciplines. English Latvian Bibliography” concludes the book. The To be clear: it is a great strength of the English Lithuanian whole book may be seen, in a certain light, as book that it contains these complementary English Malay an annotated bibliography to the consistently perspectives; it is regrettable only that the English Norwegian great references sections of its individual connections are not more accessible. The English Polish chapters, and Jackson’s bibliography presents book is of manageable length and often English Portuguese Brazil a different kind of general overview. It turns illuminating, so ‘reading the whole book’ English Portuguese Portugal out that both bibliographical streams are is in no way a burden. English Romanian needed for the fullest picture of the available It is clear from the start that the English Russian work. For example, Jackson’s annotated Companion is deliberately latitudinarian, English Serbian bibliography lists a practical manual for permitting leading scholars to introduce English Slovak field workers in indigenous languages their specialties and to describe their (Bartholomew and Schoenhals 1983) but cutting-edge research on their own terms. English Slovene lists no work focusing on either any African The book covers far more intellectual English Spanish languages or Sign languages. This is an territory than the average researcher could English Swedish honest reflection of their neglected place in hope to have at the front of their mind English Thai mainstream lexicographical thinking, even all the time; making it available at arm’s English Turkish as this book has done well in bringing them reach is surely part of what qualifies it as English Ukrainian greater attention. a companion rather than an introduction. English Urdu Is it necessary to distinguish between English Vietnamese Evaluation lexicography and metalexicography? We Many of the chapters address some of the say that one field is concerned with practice https://play.google.com/ same sub-topics from different perspectives and the other with theory, but the two are store/apps/details?id=com. and in varying levels of detail, and this never truly separable. As Chi suggests kdictionaries.container/ to the book’s great credit and advantage. in her chapter, people use dictionaries in https://itunes.apple.com/app/ Unfortunately, the Companion has certain ways because lexicographers have password-semi-bilingual- vanishingly few cross-references between historically made dictionaries in certain english/id672144357/ chapters. As a result, it is not possible to ways. The study of users is therefore

Kernerman Dictionary News, June 2015 know beforehand that, say, corpus-driven also the study of lexicographers. Beyond 25 being cultural artifacts, dictionaries are References technological artifacts in a major transition, Bartholomew, D.A. and Schoenhals, as all of the book’s contributors would L.C. 1983. Bilingual dictionaries for surely agree. We do not yet know what indigenous languages. Mexico: Summer will be the end point of this transition, but Institute of Linguistics. Piotrowski makes me feel that theory can Béjoint, H. 2010. The Lexicography of be our guide. English. From Origins to Present. Piotrowski says that a theory of Oxford: Oxford University Press. lexicography is not like a scientific Kilgarriff, A., Rychly, P., Smrz, P. and theory, because it cannot successfully Tugwell, D. 2004. The Sketch Engine. predict unobserved phenomena. Although Proceedings of EURALEX 2004, this is probably true in absolute terms, Lorient, 105-116. it is interesting to consider what kinds Kipfer, B.A. 1984. Workbook on of things theoretical lexicography can Lexicography. Exeter Linguistic at least infer, if not predict outright. Studies, Vol. 8. Exeter: University of Akasu and the Iwasaki Linguistic Circle Exeter. can guess at a lexicographical team’s Pastor, V. and Amparo A. 2010. Search underlying principles based only on their Techniques in Electronic Dictionaries: finished dictionary: finding the proof of A Classification for Translators. the pudding in the eating. Trap-Jensen’s International Journal of Lexicography chapter somewhat frustratingly describes an 23 (3): 307-354. array of possible lexicographical practices Robinson, J. 1983. A Glossary of without much accounting for how people Contemporary English Lexicographical choose among them in practice. Yet every Terminology. Dictionaries 5: 76-114. working lexicographer makes complicated Sinclair, J.M. (ed). 1987. Looking Up: choices in practice every day, and these An Account of the COBUILD Project choices are motivated by some kind of in Lexical Computing. London: Collins theoretical orientation, even if it is largely ELT. implicit or unexamined convention. Yamada, S. 2010. Layout matters. Paper The chapters on Sign and African presented at the Dictionary Society of languages, where aspects of traditional North America XVIII Biennial Meeting, practice are impossible, throw stark McGill University, Montreal, 8-11 June. contrasts that help to reveal the shadow theories behind mainstream lexicography. Orion Montoya As we work to document under-resourced MDCCLV languages at a level of quality that [email protected] approaches that of resource-rich languages like English, we encounter features that cannot fit into the familiar paradigms of lexicography for Indo-European languages. It may turn out that a solution to a distinctively Xhosa or ASL challenge – be it lemmatization, gestural search, MULTILINGUAL or semantic compositionality – could be usefully applied to lexicography of familiar 43-Language English Multilingual Dictionary western languages and enrich the entire lexicographic discipline, in both theory Recently the English core of PASSWORD Dictionary has undergone and practice. a major round of editorial revision, including the update of thousands of entries, the upgrade of the microstructure and XML format, and the Conclusion introduction of well over 2,000 new entries and 6,000 examples of usage. Enough theorizing. These thoughts have Then, in the autumn of 2014, translation for all the new entries was carried been spurred by the Companion, but no out to 43 different languages, including: doubt other readers will seize on different aspects and reach their own conclusions. Afrikaans | Arabic | Bulgarian | Catalan | Chinese Simplified | The important thing is that I expect this book Chinese Traditional | Croatian | Czech | Danish | Dutch | Estonian | will be a strong catalyst for lexicographers Farsi | Finnish | French | German | Greek | Hebrew | Hindi | Hungarian | of every stripe. It presents contemporary Icelandic | Indonesian | Italian | Japanese | Korean | Latvian | Lithuanian | research, summarized for review at a Malay | Norwegian | Polish | Portuguese Brazil | Portuguese Portugal | readable scale, with the happy outcome that Romanian | Russian | Serbian | Slovene | Slovak | Spanish | Swedish | both specialists and new researchers may Thai | Turkish | Ukrainian | Urdu | Vietnamese reach a clearly contextualized understanding The total number of translation equivalents is 1.7 million, for 30,000 of the trajectories of subfields other than entries with 40,000 references. their own. Kernerman Dictionary News, June 2015 26

Shigeru Yamada. Oxford Guide to the practical usage of English monolingual learners’ dictionaries: Effective ways of teaching dictionary use in the English class

This short guide combines a general for different purposes, and for certain types introduction to dictionaries and their of word a conventional definition is less receptive and productive functions (pp1-10), helpful than, say, an image, a video, or with a set of classroom activities in a sound file. The function of definitions, dictionary use, all designed to demonstrate as Bolinger observed 50 years ago, is “to the value of using a dictionary rather than help people grasp meanings [by supplying] any of the more alluring alternatives on a series of hints and associations that will offer (pp10-16). relate the unknown to something known” There is much to like here, and the (Bolinger, D. 1965. The atomization of guide is full of sensible advice. Yamada meaning. Language 41: 555-573). And acknowledges that, for most learners, a if that job can be done more efficiently monolingual dictionary of their target through nonverbal media, perhaps that is language can look intimidating, and he what we should focus on in such cases. recognises that “some may find the task of The guide also explains the features that consulting a dictionary troublesome”. In distinguish MLDs from other kinds of Oxford Guide to the the past, learners have generally preferred monolingual dictionary, emphasizing the bilingual dictionaries to monolinguals, and “approachability” of the definitions and practical usage of now the Web offers a range of other options examples. This is an important argument English monolingual too, such as automatic translation sites or – more so than ever now, when every type learners’ dictionaries: forums like Word Reference (http://forum. of dictionary is freely available. A language Effective ways of teaching wordreference.com/). All of this creates learner who looks up condescension dictionary use in the tough competition for the traditional in Wiktionary (en.wiktionary.org), for English class monolingual learner’s dictionary (MLD). example, is unlikely to get beyond the first Shigeru Yamada But Yamada makes a spirited case for the definition: Tokyo: Oxford University benefits of MLDs, seeing dictionary use as The act of condescending; voluntary Press. 2014 “a learning opportunity”. descent from one’s rank or dignity in http://oupjapan.co.jp/teachers/ This gets to the heart of the matter, and intercourse with an inferior resources/oup_guide_to_ it’s useful to think about this in terms of Even I am having problems working out learners’ short-term and longer-term goals. what this means, and in the unlikely event dictionary_use_2014_E.pdf/ In the short term you may need to decode an of a learner successfully decoding the unfamiliar word encountered while reading, definition, it wouldn’t solve any problems or resolve a communicative problem in because it fails to correspond to any normal order to complete an assignment. This use of the word. (The definition is in fact is where a “quick fix” is in order, and lifted, verbatim, from an ancient Webster’s MLDs are not always well-adapted to this dictionary.) This is the kind of thing that role. But if your longer-term goal is to gives monolingual dictionaries a bad name, become proficient in a second language, and Yamada is right to stress the superiority the process of consulting an MLD brings of corpus-based MLDs over many of the benefits in terms of learning (as opposed free offerings on the Web. to merely problem-solving). As noted here, Classroom activities for dictionaries “by reading English-language definitions, typically focus on specific data types learners get greater exposure to English and (information on meaning, collocation, and learn the language within its own system”. the like), in order to familiarize users with This comes up in the section where the the way different kinds of information are author compares definitions in MLDs conveyed and thus to facilitate dictionary with translation equivalents in bilingual use. What is interesting (and original) here dictionaries, and demonstrates that – given is that the process of consulting a dictionary the anisomorphism of language systems as is framed in terms of seven distinct steps, different from each other as Japanese and and activities are proposed for most of English – one-to-one equivalents rarely tell these. Dictionary consultation is seen as the whole story. But as he concedes, “this a “complex intellectual activity” (even if actually presents no major problems when proficient users perform it unconsciously), confirming the meaning of specific things which proceeds from recognising the such as flora and fauna”. This observation communicative problem and determining perhaps points to future developments. what the problematic word or multiword unit

Kernerman Dictionary News, June 2015 Digital dictionaries can use different media is, through finding the “right” information 27

in the dictionary, extracting the data you of whether teaching dictionary use is need, and applying this information in a worthwhile project in itself. Yamada order to resolve your problem. Along the believes that, when a user’s search for way, a number of definition conventions information is unsuccessful, “either the are helpfully explained. Some of the advice dictionary or the user is to blame”. My on finding the appropriate entry is less default position is that if users can’t readily applicable to digital dictionaries than to find what they are looking for, the fault lies traditional print-based ones: finding phrasal squarely with the dictionary. Consequently, verbs and , for example, is far easier the onus is on dictionary producers to in a well-structured online dictionary, where ensure that information is easy to locate and the trend is for these to be separate entries easy to digest – an approach which feels (rather than “nested” at the end of a base more in tune with the way that software form). Intelligent search algorithms take products are designed nowadays so that no you straight to an idiom even if you don’t instruction manual is needed. Few students know the exact canonical form. (Locating will be fortunate enough to have a teacher close/shut the stable door after the horse who understands dictionaries as well as the has bolted in a paper dictionary was as big author of the guide. In most cases, they must a problem for users as deciding where to put rely on their dictionary being well enough it was for lexicographers. No more.) designed to make its use intuitive. Having One task lists a number of common said that, this guide will give teachers who English words and expressions (such as not are not especially dictionary-aware the bad), and asks users to compare the English resources to demonstrate to their students definition with a corresponding translation the benefits of using a monolingual learner’s equivalent in an English-Japanese dictionary. dictionary. This is a neat way of showing how items like these don’t always map Michael Rundell conveniently from one language to another, Lexicography Masterclass and again makes the case for using an MLD. and Macmillan Dictionary The guide is aimed at Japanese learners [email protected] of English, but much of it would be useful for teachers and learners with other first languages. And though produced for a particular dictionary publisher (OUP), it is far more than a mere promotional tool. The advice it gives is refreshingly even-handed MULTILINGUAL and all the main MLDs are referred to at different points. (One quibble is that the URL given for the Macmillan Dictionary Semi-Automatically Generated site is for a long-defunct version: the correct Multilingual Glossaries address is http://macmillandictionary.com/.) KD’s updated English Multilingual Dictionary (EMD, see p25) now There is occasionally an elegiac feel serves as a base for developing new multilingual glossaries for other about the guide, in that some of the advice languages. The process begins by reverse-engineering the Password relates to using print dictionaries, and it is dictionary data (which is at the heart of the EMD) in order to produce a hard to imagine the average high-school raw index for each language to English. Next, a dedicated software tool is student in Japan (now by definition a digital used to manually edit and refine the index, including the linking of each native) consulting one of these (unless L1 headword to the corresponding sense(s) of the original English entries. forced to!). And perhaps more could have Finally, the translations from all other languages to every particular sense been said about some of the excellent in the EMD are associated automatically, turning the L1-English index complementary resources available on the into an L1 multilingual glossary with translations to 43 languages. So far, Web. While students are advised (p13) multilingual glossaries were created for these 20 languages: to use the “example banks” in dictionary Catalan | Chinese Simplified | Danish | Dutch | Estonian | French | CD-ROMs to match examples to word German | Hungarian | Indonesian | Italian | Japanese | Norwegian | senses, many would feel more at home with Polish | Portuguese Brazil | Portuguese Portugal | Romanian | Russian | a Web resource such as SKELL (http://skell. Slovene | Spanish | Swedish sketchengine.co.uk/).

In the end, we are left with the question Kernerman Dictionary News, June 2015 Kernerman Dictionary News, June 2015

The Linguistic Linked Open Data Cloud

The LLOD diagram is maintained by the OKFN Working Group on Linguistics and is provided by the Creative Commons Attribution 3.0 Unported (CC BY 3.0) license. http://linguistic-lod.org/

K DICTIONARIES LTD 8 Nahum Hanavi St. Tel Aviv 63503 Israel ı Tel +972-3-5468102 ı [email protected] ı http://kdictionaries.com