
Kernerman Dictionary News ● Number 24 ● July 2016 ● kdictionaries.com/kdn

Towards Globalex

The GLOBALEX Workshop on Lexicographic Resources and Human Language Technology (http://ailab.ijs.si/globalex/) took place as part of LREC 2016 at Portorož, Slovenia on May 24 and constituted the first live step in forming an overall global constellation for lexicography. The initiative was launched nine months earlier at a meeting held during the fourth eLex conference in the UK in August 2015, and has drawn the support of lexicographic associations worldwide.

The full-day workshop was sponsored by the associations for lexicography of Africa, Asia, Australasia, Europe and North America (Afrilex, Asialex, Australex, Euralex, DSNA), and the eLex conference series on electronic lexicography in the 21st century. It set out to explore standards for lexicographic resources and their incorporation in new language technology and other solutions as part of knowledge systems and collaborative intelligence. The workshop was attended by about 60 participants, included 16 twenty-minute sessions and concluded with a roundtable about the future of Globalex.

The core idea of Globalex is to work on lexicography in global contexts and bring together different segments that operate on their own – on regional, topical or any other level – to cooperate. It is hoped that Globalex can facilitate knowledge sharing and cooperation among its members and with others concerned with language and language technology, promote the creation, research, exchange, dissemination, integration and usage of lexicographic resources and solutions, and enhance interoperability with the academia and industry worldwide.

The roundtable featured short interventions by a representative of each organization, including one by video and another by Skype, presenting their association and vision of Globalex, followed by a discussion with the audience. The main issues concerned the aims and obstacles facing Globalex, its organization, operation and meetings. The conference models ranged from dedicating a section to Globalex at the continental conferences, and alternating Globalex conferences with those of the different associations, to holding Globalex conferences on their own every few years.

The organizers have agreed to contribute to the new Globalex website http://globalex.link/, which begins operation this month. More details appear on page 4, and a reprint of Towards Peoplex, from 1997, is available on page 18.

Ilan Kernerman

1 Towards Globalex | Ilan Kernerman
2 Lexicography associations: Afrilex, Asialex, Australex, DSNA, eLex, Euralex
4 GLOBALEX 2016 workshop summary and next steps
5 XVII Euralex International Congress & The Lexicographic Centre at Tbilisi State University | Tinatin Margalitadze
6 Asialex 2017 in Guangzhou | Hai Xu
6 Nineteenth-Century Lexicography Conference, 2018
7 Chinalex and lexicographic activity in China | Yihua Zhang
12 Treatment of entries with Chinese characteristics in English learner’s dictionaries: A case study of Oxford Advanced Learner’s Dictionary 8e | Lixin Xia and Langwei Zhai
16 Lexicography at the Society for Danish Language and Literature | Lars Trap-Jensen
18 Towards Peoplex (reprint) | Ilan Kernerman

19 Linked data in lexicography | Julia Bosque-Gil, Jorge Gracia and Asunción Gómez-Pérez
25 From dictionaries to cross-lingual lexical resources | Guadalupe Aguado-de-Cea, Elena Montiel-Ponsoda, Ilan Kernerman and Noam Ordan
32 Adam Kilgarriff Prize | Michael Rundell

© 2016 All rights reserved.
K DICTIONARIES LTD
8 Nahum Hanavi Street, Tel Aviv 6350310, Israel
+972-3-5468102
[email protected]
http://kdictionaries.com
Editor | Ilan Kernerman
ISSN 1565-4745

The African Association for Lexicography (Afrilex) was established in 1995 after a feasibility study for a lexicographical institute for Southern Africa indicated a keen interest in a unifying body among lexicographers and members of related professions. Dr Reinhard R.K. Hartmann chaired the inaugural meeting, and officially announced the birth of a new member of the Lex family.

Afrilex is managed by a Board elected biennially by the members present at a General Meeting of the association. Membership is open to individuals and institutions who have an interest in lexicography. The current membership stands at 60 individuals and 8 corporate members. The board consists of the president, vice-president, secretary, treasurer, four non-officers and the conference convener.

The aims of Afrilex include the promotion and coordination of research, study and teaching of lexicography by means of publishing a journal, […], and other appropriate literature, organizing regular conferences and seminars that offer opportunities for exchange of ideas and for mutual stimulus to researchers and practitioners in the field of lexicography, and facilitating participation in tutorials and training courses.

Afrilex seeks to develop cooperation with other international associations for lexicography as well as with local associations that are interested in the study of language.

The 21st annual International Conference of Afrilex is held in July 2016 in Tzaneen, South Africa.

Lexikos (ISSN 2224-0039) is the official mouthpiece of Afrilex, the editor being an ex-officio member of the Board. All contributions are indexed by the Thomson Reuters Web of Science Citation Index and are freely available online (http://lexikos.journals.ac.za/pub/).

In its first twenty years of existence Afrilex has bestowed Honorary Membership on the following members: Prof. A.C. Nkabinde, Prof. Rufus Gouws, Dr Johan du Plessis, and Dr Mariëtta Alberts.

http://afrilex.africanlanguages.com/homelex.html/

The Asian Association for Lexicography (Asialex) was established at the initiative of Gregory James and Amy Chi on 29 March 1997, during the Dictionaries in Asia conference at Hong Kong University of Science and Technology, with the aim of fostering scholarly and professional activities in the field of lexicography and facilitating the exchange of information and ideas through meetings, publications, etc. Membership is open to any person or institution.

The first executive board was elected at that inaugural meeting, and the President, HUANG Jianhua, convened the first conference in Guangzhou (1999). From then on, elections were not held again, and usually the convener of each conference was named president for two years, until the voting process was renewed in Kyoto 2011.

Asialex is governed by an executive committee that is elected for two-year terms, consisting of a president, vice-president, secretary, treasurer, three more members as well as four ex-officio members including the immediate past president, journal editor, and conveners of the next two conferences.

Lexicography – Journal of Asialex has been published biannually since 2014 by Springer, in print and online, and membership is connected to the journal subscription. Until then, the activity of Asialex focused almost entirely on holding biennial international conferences. In addition to conference proceedings, a newsletter appeared in the first years and collections of papers from two conferences were published as well. Since 2015, conferences have been held once a year, with the tenth taking place in Manila 2016, and the next one due in Guangzhou in 2017.

The challenges facing Asialex in achieving its goals are inherent in Asia’s non-homogeneity on multiple levels. This vast geographical region is composed of different areas often disconnected from each other, and its enormous linguistic diversity is often under-resourced, under-researched or under-represented. Traditionally Asialex has had a stronger presence in the eastern parts and much less in central, south and western Asia. Overcoming the challenges would uncover and leverage their resourcefulness.

http://asialex.org/

The Australasian Association for Lexicography (Australex) was founded in 1990 as a companion association to Euralex. It is committed to the development of lexicography in all languages of the Australasian region. Its interests include:
● dictionaries of all kinds
● the theory of lexicography
● the history of lexicography
● the practice of dictionary-making
● dictionary use
● endangered languages
● Revivalistics
● terminology and terminography
● corpus lexicography
● computational lexicography
● sign language
● lexicology

Membership consists mainly of people from Australia, New Zealand and the Pacific Islands, but also from many other countries, including Japan, South Africa, Spain, the UK and Zambia. Australex includes career lexicographers, students of lexicography, researchers into dictionaries, publishers, teachers and people who just like dictionaries.

The association is governed by a committee of 10 members, who are elected every two years during the biennial conference. It consists of a President, Vice-President, Secretary, Treasurer, five officers and the immediate past President. Membership is free.

Until 2009, meetings were held regularly every one or two years, in addition to specific conferences (e.g. on Australian placenames of indigenous origins) and workshops (e.g. on dictionary writing). Since then conferences have been held biennially, in either Australia or New Zealand. The next conference is planned for August 2017 in the Cook Islands. It is hoped that this location will extend the range of Australex and involve speakers of more language groups, particularly endangered ones. The conferences are usually small, which has the benefit of promoting close collaboration and networking, with the opportunity for delegates to attend most of the presentations. One or more student bursaries are offered to help with conference attendance.

Australex has one self-publication of peer-reviewed papers from its 2013 conference, entitled Endangered Words and Signs of Revival (2014).

http://adelaide.edu.au/australex/


The Dictionary Society of North America (DSNA) was founded in 1975 to foster scholarly and professional activities relating to dictionaries, lexicography, and lexicology and to bring together people interested in the making, study, collection, and use of dictionaries. DSNA’s principal activities include a biennial conference, a biannual newsletter, a website, and a journal. DSNA sponsors a lexicography course at the Linguistic Society of America Summer Institute and funds a fellowship for a student to attend. Occasional informal local meetings for members have begun, and outreach efforts to promote better public understanding of lexicography are underway. DSNA is a member of the American Council of Learned Societies.

A president, vice-president, and executive secretary are DSNA’s officers and, with four elected at-large members, constitute the executive board, with the immediate past president an ex-officio member. The journal and newsletter editors regularly participate in the conference calls of the board and report to DSNA’s publications committee each month. Other committees address finance, nominations, membership, etc. Currently, DSNA enrolls about 250 individual and institutional members.

Dictionaries, DSNA’s journal, aims to represent the best research in lexicography and lexicology, including history, theory, and practice of lexicography, and the design and use of dictionaries and related works of reference. It publishes peer-reviewed articles, invited contributions, book reviews, reports of reference works in progress, and occasional forums. Published annually, it has in recent years averaged 285 pages; a move to biannual publication is under consideration. The journal is indexed in MLA Bibliography, Linguistics and Language Behavior Abstracts, and Linguistics Abstracts; all issues are accessible through Project MUSE.

DSNA derives its revenue from membership fees, journal royalties, and gifts. Student memberships are free of charge. Both financially and programmatically the biennial conferences are the responsibility of the host institution.

http://dictionarysociety.com/

The series of conferences on electronic lexicography in the 21st century (eLex) was started in 2009 by Sylviane Granger in response to this emerging field. Initially, the conference (at Louvain-la-Neuve, Belgium) was conceived as a one-off event; however, its success and calls from the lexicographic community for a follow-up prompted Iztok Kosem and Simon Krek to turn it into a biennial conference series. The subsequent conferences in Bled, Slovenia (2011), Tallinn, Estonia (2013), and Herstmonceux Castle, UK (2015) thus focused on different topical issues and attracted increasing numbers of participants from all over the world.

As eLex is not an association, it does not have an official board, a membership fee, etc., but there is an unofficial committee consisting of chairs of organisational committees of previous conferences. The committee offers local organisers of the next eLex conference advice on and help with organisational matters. Furthermore, members of the committee maintain the eLex website, which provides links to the webpages of all previous conferences, including proceedings, programmes and other relevant information on related activities.

The eLex conferences have always promoted interdisciplinarity, bringing together specialists in dictionary publishing, corpus lexicography, software development, language technology, language learning and teaching, translation studies, and theoretical and applied linguistics. There has also been a constant effort put into the dissemination of topical developments and issues in (electronic) lexicography among members of the community worldwide. An important part of achieving this goal has been the video recordings of the presentations and round tables, which have been made freely available on the conference websites.

The next eLex conference will be hosted by the Institute of the Dutch Language and held in Leiden, the Netherlands, in the second half of September 2017. Further announcements with more detailed information will be made on the eLex website and posted on relevant mailing lists.

https://elex.link/

The European Association for Lexicography (Euralex) brings together people working in lexicography and related fields. In the rapidly-changing world of language analysis and language description, it provides a forum for the exchange of relevant ideas. Though based in Europe, Euralex has a worldwide reach and a worldwide membership. Its members include lexicographers, reference publishers, corpus linguists, computational linguists, academics working in relevant disciplines, software developers, and anyone with a lively interest in language.

Euralex holds a major conference every two years, and also sponsors smaller events on specific areas within the broader field. The first conference was held in Exeter, UK, in 1983 and since then there have been conferences on a regular basis in 13 different countries all over Europe – the 17th to be held in Tbilisi, Georgia, in September this year. Euralex has created a digitized version of all the papers from its past conferences, freely available from its website.

Euralex maintains a discussion list for the exchange of views on anything of interest to people working in lexicography and related fields. The list is public and not limited to members. It also maintains a public Facebook page.

In cooperation with Oxford University Press, Euralex is responsible for the International Journal of Lexicography, a leading peer-reviewed academic journal that appears four times a year. Interdisciplinary as well as international, it is concerned with all aspects of lexicography, including issues of design, compilation and use, and with dictionaries of all languages, though the chief focus is on dictionaries of the major European languages – monolingual and bilingual, synchronic and diachronic, pedagogical and encyclopedic.

Euralex is governed by an executive board consisting of up to nine elected members, including four principal officers (President, Vice President, Secretary-Treasurer and Assistant Secretary-Treasurer), elected at each general meeting from among its members. The general meeting is held in connection with the biennial conference.

http://euralex.org/

GLOBALEX 2016 workshop summary and next steps

The GLOBALEX 2016 Workshop on Lexicographic Resources for Human Language Technology was held as part of the Language Resources and Evaluation Conference (LREC 2016) at Portorož, Slovenia, on 24 May 2016, with approximately 60 participants, and constituted the first step in forming a global network for lexicography. The organizing committee consisted of the presidents and vice-presidents of Afrilex, Asialex, Australex, DSNA and Euralex, and two co-organizers of eLex, with the actual working group comprising Ilan Kernerman, Iztok Kosem, Lars Trap-Jensen and Simon Krek. The program committee (including about 40 members) selected 16 papers out of 24 submissions, each having 15 minutes for presentation plus 5 minutes for questions. More details are available on the workshop website http://ailab.ijs.si/globalex/, and the proceedings are at http://lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-GLOBALEX_Proceedings.pdf/.

The workshop closed with a roundtable discussion about next steps for GLOBALEX. It lasted about 90 minutes and had about 30 participants. The first part was moderated by Simon and consisted of a brief presentation of each member organization, including four in person, by Sonja Bosch (Afrilex), Ilan (Asialex), Lars (Euralex) and Iztok (eLex), a video by Julia Miller (Australex), and live Skype participation by Edward Finegan (DSNA). The second part was moderated by Ilan in discussion with those present, including Ed by Skype. The following main points emerged:
● All the associations and the individuals present welcomed the formation of GLOBALEX as an umbrella constellation to enhance worldwide cooperation and exchange on lexicography
● The organizations began to cooperate in this vein by co-organizing this GLOBALEX workshop
● The five continental associations have reached a consensus on jointly running a new GLOBALEX website and sharing its hosting costs
● The new GLOBALEX website http://globalex.link/ will go live in mid-2016
● The site will function as a repository and will link conference proceedings, presentations, slides and videos, post news, announcements, etc.
● It is hoped GLOBALEX can operate in a lean non-bureaucratic fashion; still a steering committee is needed, including a representative of each body
● Participation will be open to any local, special topic, or other lexicography-minded community, and will serve to promote the members’ interests and activities
● Holding conferences could be handled by combining models of various activities, such as:
  ○ Video-record talks at different conferences and post them on the GLOBALEX website
  ○ Minimize or avoid meetings scheduled across the world at the same time
  ○ Promote meeting in person whenever and wherever possible
  ○ Have GLOBALEX sessions as part of other conferences
  ○ Welcome the interest expressed in principle by Afrilex and Australex to hold GLOBALEX sessions as part of their 2017 conferences
  ○ Advance the DSNA offer to hold a GLOBALEX session at its 2017 meeting
  ○ Organize virtual conferences with the aim to keep costs low, facilitate the participation of those from distant places or with fewer resources, take advantage of technology, and attain wider dissemination of information worldwide
  ○ Ultimately hold face-to-face GLOBALEX Olympic conferences every few years

Work has already begun to set the ground for implementing such ideas and planning further. For example, talk is underway to leverage the coinciding DSNA and Asialex conferences in June 2017 for cross-broadcasting between dinner in Barbados and morning plenary in Guangzhou. In June, the organizing committee’s four working-group members were joined by one liaison each from Afrilex (Victor Mojela), Australex (Julia Miller), and DSNA (Ed) to continue operating together as a preparatory board for the formation of GLOBALEX. Besides the immediate task of getting the new website up and running, the main work will focus on setting a framework for GLOBALEX to function in full cooperation with all its members and to their benefit on a worldwide scale, deciding what shape such a body might have, defining its strategic policy, decision-making process, etc. It is encouraging that the initiative launched last year has already had such an overwhelmingly welcoming response and is starting to develop.

GLOBALEX 2016 organizing committee
Andrea Abel (Euralex, Vice-President)
Ilan Kernerman (Asialex, President; workshop co-chair)
Steve Kleinedler (DSNA, Vice-President)
Iztok Kosem (eLex, Co-Organizer)
Simon Krek (eLex, Co-Organizer; workshop co-chair)
Julia Miller (Australex, President)
Maropeng Victor Mojela (Afrilex, President)
Danie J. Prinsloo (Afrilex, Vice-President)
Rachel Edita O. Roxas (Asialex, Vice-President)
Lars Trap-Jensen (Euralex, President)
Luanne von Schneidemesser (DSNA, President)
Michael Walsh (Australex, Vice-President)

GLOBALEX preparatory board 2016-2018
Edward Finegan, Ilan Kernerman, Iztok Kosem, Simon Krek, Julia Miller, Maropeng Victor Mojela, Lars Trap-Jensen

http://globalex.link/

XVII Euralex International Congress

The XVII Euralex International Congress of the European Association for Lexicography will be held on 6-10 September 2016 in Tbilisi by the Lexicographic Centre of Ivane Javakhishvili Tbilisi State University.

With the theme of Lexicography and Linguistic Diversity, the main objective of the congress is to highlight the importance of lexicography for the preservation of linguistic diversity and the promotion of cultural and scientific ties among different cultures and nations. Other objectives are to emphasize the role of lexicography as a rapidly developing interdisciplinary branch of science – incorporating multiple components, viz. semantic theories, corpus-based methods, techniques for natural language processing, e-lexicography, etc. – and to explore the current status of lexicography as merely a craft or rather a full-fledged scholarly discipline destined to fulfill multiple important missions in our rapidly developing multicultural and multilingual world. In addition, the Tbilisi congress aims at further popularization and sustainable development of lexicography in Georgia.

The Programme Committee has selected 115 papers by 190 authors from around the world. The papers were anonymously reviewed by at least two members of the Scientific Committee. Keynote lectures will be delivered by Jost Gippert, Patrick Hanks, Robert Ilson, Pius ten Hacken and Geoffrey Williams, and a round-table discussion will be moderated by Thierry Fontenelle. The programme also includes parallel sessions, software demonstrations, pre-congress tutorials and specialized workshops, a book and software exhibition, and social events.

http://Euralex2016.tsu.ge/

Tinatin Margalitadze is director of the Lexicographic Centre at Tbilisi State University and convener of the Euralex conference.
http://margaliti.com/index_en.htm/

The Lexicographic Centre at Tbilisi State University

In May 2011 the Academic Council and the Council of Representatives of Ivane Javakhishvili Tbilisi State University took the decision to grant the Lexicographic Centre at the university the status of University Centre for Bilingual Lexicography. The decision was part of the process of consolidating the role of Georgian as the national language and the language of science in Georgia. This is particularly important at this current moment in the history of Georgia as, following the restoration of independence in 1991, the Georgian language regained its function as the national language and began to develop and adapt to the realities of contemporary life, incorporating words and expressions connected with international politics and diplomacy, market economy and judicial procedures, as well as military, scientific and technical terms.

The Lexicographic Centre (LC) was originally established as an independent entity by the Department of English Philology back in 1995 and included the editorial team of the Comprehensive English-Georgian Dictionary (CEGD) that has been in place since the 1980s. The aim was to edit and prepare CEGD for publication in volumes, and 14 volumes have been published so far, covering a total of 2,380 pages. In 2009 the LC started to work on an electronic platform for CEGD and in 2010 the Comprehensive English-Georgian Online Dictionary was posted on the Internet (http://margaliti.ge/eng/index.htm/). The online version is based on the published volumes and includes 110,000 entries.

In 2008 the LC was transformed into a faculty-level centre within the Faculty of Humanities and started the compilation of a series of specialized dictionaries. The English-Georgian Online Military Dictionary (http://mil.dict.ge/) was created in 2009 at the request of the Ministry of Defence. Then, the LC editors compiled the English-Georgian Online Biology Dictionary in 2011-2013 (http://bio.dict.ge/) and the English-Georgian Online Dictionary of Technical Terminology in 2014-2016 (http://tech.dict.ge/), both funded by Shota Rustaveli National Science Foundation.

One of the LC goals is the promotion of bilingual lexicography of Georgian and European languages, for which purpose MA and PhD programs were launched. In 2011 a joint MA program was set up with the Department of the Italian Language and the work on a New Italian-Georgian Learner’s Dictionary is the first one underway. The LC plans to initiate bilingual projects for other European languages, including old ones such as Gothic and Old English. In 2012 the LC started to work on a new project, Parallel Corpus of English-Georgian Scientific Texts (http://corp.dict.ge/).

The LC pays great attention to the promotion of lexicography as a branch of science. With that end in view, it delivers public lectures, gives presentations, runs training sessions for teachers of foreign languages, arranges contests, and aims to provide adequate education in this field. The LC has been a key force in transforming the approach of the authorities towards lexicography in Georgia. It was one of the initiators of setting up a State Committee for the Enhancement of Lexicography in Georgia at the Ministry of Education and Science. The Committee is developing the National Programme of Lexicography, which is intended to compile Georgian explanatory, historical and specialist terminological dictionaries, and to promote bilingual and electronic lexicography.

Asialex 2017 in Guangzhou

The 11th International Conference of The Asian Association for Lexicography (ASIALEX) will be held at Guangdong University of Foreign Studies (GDUFS) in Guangzhou, China on June 10-12, 2017. This conference will mark the 20th anniversary of ASIALEX. Being the host of the First International Conference of ASIALEX, we are very pleased to bring it back to this location for this landmark event after it has traveled around nine Asian countries and regions over the past twenty years.

The theme of ASIALEX 2017 is Lexicography in Asia: Challenges, Innovations and Prospects. The main topics are as follows:
● electronic and digital revolution in lexicography
● computer corpus lexicography
● bilingual lexicography
● pedagogical lexicography
● metalexicography
● dictionary use studies
● dictionary and culture
● dictionary as discourse
● phraseology
● neologisms
● terminology

To respond to the challenges of the corpus revolution and the digital revolution in lexicography, lexicographers, linguists, language professionals and publishers from across Asia and worldwide need to work together to share information, knowledge and experience, and to encourage innovation in lexicographic studies and practice. Our conference aims to provide such a platform.

GDUFS is a major internationalized university known for its global-minded faculty members and students and its research on language, literature, culture, trade and strategic studies. With 21 foreign languages available, it is the only university in South China to offer such a great variety of programs, and its foreign language and literature courses as academic disciplines are among the finest nationwide. It boasts a key national research center for humanities and social sciences, Center for Linguistics and Applied Linguistics, which is under the auspices of the Ministry of Education and conducts leading research in lexicography and applied linguistics. The members in the center have published with the top presses in China including Commercial Press and Foreign Language Teaching and Research Press (FLTRP), and with leading scholarly journals including the International Journal of Lexicography, Lexikos and Lexicography – Journal of ASIALEX.

We are very pleased that HUANG Jianhua, the first President of ASIALEX who convened that first conference in GDUFS in 1999, will be one of the plenary speakers in ASIALEX 2017. Prof Huang is a renowned lexicographer and has recently completed a 16-year gigantic dictionary project – Grand Dictionnaire Chinois-Français Contemporain (FLTRP, Beijing, 2014) – the largest Chinese-French dictionary in the world. The other keynote speakers include Andrea Abel, of EURAC and currently Vice-President of Euralex; Julia Miller, of the University of Adelaide and President of Australex; and Michael Rundell, of Lexicography Masterclass and Macmillan Dictionaries.

We hope that you will join us in celebrating this 20th anniversary of ASIALEX and look forward to welcoming you in Guangzhou next June!

XU Hai
Convener of Asialex 2017
Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies

Nineteenth-Century Lexicography Conference, 2018

A conference on 19th-century lexicography – Between Science and Fiction – will be held at Stanford University on 6-7 April 2018, with an aim to explore the following issues:

How can we understand the making of monolingual and multilingual dictionaries in the 19th century? Were lexicographers in conversation with philologists, seeing their work as science and to be undertaken collaboratively, by teams of scientific observers? Or were they utopian thinkers, trying to create new languages or to form writers and speakers who would use old languages in new ways? How are the prescriptive and the descriptive intertwined in their work? What evidence do dictionaries in different languages offer to answer these questions? What were lexicographers’ personal motives for their work? What role, if any, did nationalistic enterprises play in the planning and execution of these texts? What were the historical factors – as regards technology or thought – that led to the flourishing of lexicography in this period? And what brings this phenomenon to scholars’ attention now?

Please send 300-word abstracts to Sarah Ogilvie (sogilvie@stanford.edu) and Gabriella Safran ([email protected]) by 1 September 2016.

Chinalex and lexicographic activities in China
Yihua Zhang

Abstract
The China Association for Lexicography (Chinalex) plays an important role in Chinese lexicography. This article offers a general introduction to Chinalex and sets forth the functions it has performed in lexicographic activities and the characteristics of lexicographic practice in China, followed by a presentation of a new generation of learners’ dictionaries and attempts made in computer-aided lexicography.

Keywords: China Association for Lexicography, Chinalex, lexicographic activities, learner’s dictionary, computer-aided lexicography

ZHANG Yihua is professor in linguistics and applied linguistics at Guangdong University of Foreign Studies (GDUFS), director of the Center for Lexicographical Studies and member of the Academic Board in GDUFS, vice-president of China Association for Lexicography (Chinalex) and chair of its Academic Board and Bilingual Committee, vice-chair of China National Standardization Committee for Lexicographical Terminology, the executive director of the State Committee of Modern Technology for Lexicography, and chief editor of Journal of Lexicography in China. He has authored well over a hundred academic publications in lexicography, including papers, works and , as well as dictionaries. Among these, English-Chinese Medical Dictionary won first prize of the Fifth National Dictionary Award and Contemporary Lexicography won the Outstanding Achievement Award of China Colleges and Universities in Scientific Research (Humanities and Social Sciences). His main interests include cognitive linguistics, , lexicography, translation and second language acquisition, and in recent years his research has focused on theoretical issues involving the integration of cognitive linguistics and cyber-linguistics theories into lexical and lexicographical research, computational lexicography, cultural translation, language contact (China English) and foreign-oriented Chinese learning and lexicography.
[email protected]

1. An introduction to Chinalex
The China Association for Lexicography (Chinalex) was established on October 27, 1992 in Beijing. An Academic Board and the following seven Committees for specific lexicographic fields were set up: Chinese lexicography, Bilingual lexicography, Specialized lexicography, Encyclopaedic lexicography, Editing and Publishing, Computer-aided lexicography, and Theoretical and Historical lexicography. Cao Xianzhuo was elected as the first President of Chinalex. He was concurrently Deputy Director of the National Language Committee and President of the Institute of Applied Linguistics of the Chinese Academy of Social Science. The following lexicographers and dictionary publishers were elected as Vice-Presidents: Cao Feng (President of Shanghai Lexicographical Publishing House), Wang Yaonan (Professor at Hubei University), Huang Jianhua (President of Guangdong University of Foreign Studies, GDUFS), and Lin Erwei (President of The Commercial Press).

Along with the establishment of Chinalex, a constitution was drawn up and all lexicographic activities were organized to conform to its articles. The President and Vice-Presidents serve for a term of five years. The current President is Cao Guangshun (Chinese Academy of Social Science), and the Vice-Presidents are Yu Dianli (The Commercial Press), Wang Xuming (Language & Culture Press), Liu Qing (China National Committee for Terms in Sciences), Yang Bin (Sichuan Dictionary Publishing House), He Yuanlong (Shanghai Lexicographical Publishing House), Gong Li (Encyclopedia of China Publishing House), Zhang Yihua (GDUFS), and Wei Xiangqing (Nanjing University). The General Secretary is Zhou Hongbo (The Commercial Press), and the Chair of the Academic Board is Zhang Yihua.

Over the past 20 years, Chinalex and its subordinate committees have created a platform for Chinese lexicographers to exchange ideas and take part in scholarly activities including lexicographic theory, practice and publication, which has helped Chinese lexicography enter a new period of rapid development. Many lexicographic institutions were set up, such as the Lexicographic Department of the Chinese Academy of Social Science, the Center for Lexicographical Studies of GDUFS, the Chinese Lexicography Research Center of Ludong University, the Institute of Ancient Books of Hubei University, the Lexicographical Research Institute of Shaanxi Normal University, the Center for Bilingual Lexicography and Bilingual & Bicultural Studies of Xiamen University, the Bilingual Research Center of Nanjing University, the Dictionary Research Institute of Sichuan International Studies University, the Dictionary Research Institute of Heilongjiang University, and the Lexicographical Research Center of The Commercial Press.

Chinalex also sponsors two journals, Lexicographical Studies (Cishu Yanjiu) and Journal of Lexicography in China (Zhongguo Cishu Xuebao). The former started publication in 1979, and the latter in 2015.

2. Characteristics of lexicographic practice
All the main dictionary publishing houses in China are members of Chinalex, including The Commercial Press, Foreign Language Teaching and Research Press (FLTRP), Shanghai Lexicographical Publishing House, Shanghai Foreign Language Education Press (SFLEP), Shanghai Translation Publishing House, Sichuan Dictionary Publishing House, and Chongwen Book Company. The most important lexicographic projects, apart from The Encyclopedia of China, are all sponsored and published by these publishers, such as Sources of Chinese Words (Ci Yuan), Sea of Chinese Words (Ci Hai), Grand Chinese Dictionary (Hanyu Da Cidian), Contemporary Chinese Dictionary (Xiandai Hanyu Cidian), Grand Dictionary of Chinese Characters (Hanyu Da Zidian), Xinhua Chinese Character Dictionary (Xinhua Zidian), Xinhua Chinese Dictionary (Xinhua Cidian), A New English-Chinese Dictionary (Yinghua Da Cidian), and The
English-Chinese Dictionary (Yinghan Da Cidian).

Whereas in the English language the basic unit is the word, in Chinese it is the character (字, zi). Ancient Chinese consisted only of characters, not words. Along with language evolution, Chinese characters have become very flexible in combination and may be used as fundamental linguistic signs to form words, while many characters maintain the traditional function of encoding semantics in different word classes without any change in form. Thus, we can have both a Chinese character dictionary and a word dictionary, with the following distinctive modern characteristics:

1. the dictionaries cease to function as a tool only to explain hard Chinese characters or words in classic writings, and serve to describe the language in a systematic and comprehensive way;
2. words take the place of Chinese characters and become the main part of the list;
3. synchronic description and diachronic explanation are combined (so native-language and foreign-oriented purposes are integrated into one in some bilingual dictionaries);
4. the entry structure is well-established, including word class [1], pronunciation, word sense disambiguation, definitions, examples, collocation, and usage notes.

[1] Since Chinese words are flexible in use, it has been said that the Chinese language has no word class. Chinese dictionaries for foreign learners began to mark word class in 1995; those for native speakers began to provide it systematically in 2006.

Recently, reference works of different types are increasingly produced every year. Table 1 classifies dictionaries published in the last two decades by the three main dictionary publishers in China: Commercial Press, FLTRP and SFLEP. It shows that there are more bilingual dictionaries than monolingual ones, and more dictionaries for foreign language learners than general ones. Nearly all the English monolingual dictionaries originate from British or American publishers; for example, among the 71 English monolingual dictionaries published by SFLEP, 41 are from Oxford University Press and 10 are derived from Collins COBUILD. In addition, there is a large number of English-Chinese bilingualised dictionaries, another feature of the local dictionary market that is a sign of the popularity of EFL learning and teaching in China.

item                category                          sub-category                 quantity   remark
type                monolingual                       English                      105
                                                      Chinese                      61
                                                      other languages              19
                    bilingual (around 87% English)    Foreign Language – Chinese   168
                                                      Chinese – Foreign Language   112
                                                      bi-directional               131
                    bilingualized                     Chinese-English              87
                                                      English-Chinese              3
function            decoding                                                       493
                    encoding                                                       193
user                native users                                                   197        25 also for foreign users
                    foreign/second language learners                               512        around 87% English
language coverage   general                                                        366
                    specialized                                                    320
time coverage       diachronic                                                     46         3 also synchronic
                    synchronic                                                     640
media               print                                                          674
                    electronic                                                     12
representation      verbal dictionaries                                            665
                    illustrative dictionaries                                      21
user level          beginner                                                       129
                    intermediate                                                   178        6 also for intermediate-advanced
                    advanced                                                       379

Table 1. Classification of contemporary dictionaries by main dictionary publishers in China

It is thus evident that dictionaries in China mainly focus on a synchronic description of language for general-purpose decoding tasks. Fewer encoding dictionaries are found on the market, and most learners’ dictionaries are either bilingual or English-Chinese bilingualised ones. Electronic (including online) versions of the main Chinese dictionaries are not available except for a few mobile apps, a serious structural defect in dictionary distribution. However, it was recently announced that the newly revised Sources of Chinese Words (3rd edition) will become available in both print and electronic versions (on flash disk and online), and The Encyclopedia of China (3rd edition) will also be put online.

In recent decades, along with the increasing zeal for learning Chinese as a foreign language around the world, many learners’ dictionaries have been compiled and marketed. The most representative one is 800 words of Contemporary Chinese (Xiandai Hanyu Babaici, 1980), compiled by the distinguished linguist Lü Shuxiang and designed to describe function words and other common words, focusing on the meaning, grammatical pattern, and usages of each lexical unit. Other dictionaries were published successively, such as Modern Chinese Learner’s Dictionary (Xiandai Hanyu Xuexi Cidian, Sun Quanzhou, 1995), Usage Dictionary of Modern Chinese Common Words (Xiandai Hanyu Changyongci Yongfa Cidian, Li Yimin, 1995), Usage Dictionary of Chinese Common Words (Hanyu Changyongci Yongfa Cidian, Li Xiaoqi, 1997), Chinese-English Learner’s Dictionary (Hanying Shuangjie Cidian, Wang Huan, 1997), Contemporary Chinese Learner’s Dictionary (Dangdai Hanyu Xuexi Cidian, Xu Yumin, 2005), and Commercial Press Learner’s Dictionary (Shangwuguan Xuehanyu Cidian, Lu Jianji, 2007). A number of Chinese learners’ dictionaries for native speakers were published as well. A representative one is Contemporary Chinese Learner’s Dictionary (Xiandai Hanyu Xuexi Cidian, The Commercial Press, 2010).

[Photo: HUANG Jianhua]

Chinalex sub-committees, chairs and affiliations
• Academic Board – Zhang Yihua, Guangdong University of Foreign Studies
• Chinese Lexicography – Tan Jinghun, Institute of Linguistics, Chinese Academy of Social Science
• Bilingual Lexicography – Zhang Yihua, Guangdong University of Foreign Studies
• Specialized Lexicography – Peng Weiguo, Shanghai Century Publishing Group
• Encyclopaedias – Gong Li, Encyclopedia of China Publishing House
• Editing & Publishing – Zhou Hongbo, The Commercial Press
• Computer-aided Lexicography – Sun Hongda, Shanghai Lexicographical Publishing House
• Theoretical & Historical Lexicography – Yang Bin, Sichuan Dictionary Publishing House

3. A new generation of learners’ dictionaries
According to Xinhua News Agency, students who received various kinds of higher education in colleges and universities in China numbered 35.59 million by the end of 2014. Furthermore, there were more than 200 million school pupils, including 57.36 million middle school students and 45.27 million high school students. They all learn a foreign language, the majority being English.

Chinese higher education attaches special importance to bilingual instruction. In 2001 the Ministry of Education stated that basic and specialized courses for undergraduates should be taught in English or another foreign language. But the students’ lack of foreign language proficiency is usually an obstacle to bilingual instruction in specialized courses. The students must turn to learners’ dictionaries for unknown lexical information, technical terms and expressions. Therefore, English learners’ dictionaries attract much attention from lexicographers, and numerous studies have focused on the theory and practice of English pedagogical lexicography.

The Center for Lexicographical Studies of GDUFS has proposed an integrated approach to the EFL learner’s dictionary and lexicographic practice, which involves an original design made especially for Chinese learners, including the application and integration of cognitive linguistics and second-language acquisition theories. Theoretical research has resulted in a dictionary project supported by the National Social Science Fund and SFLTP, called A New Concept English-Chinese Dictionary for Active Use. This dictionary features an innovative definition method, which results in a construction-based, meaning-driven, multi-dimensional definition (Goldberg 1995, 2006; Zhang 2006, 2010, 2015b). Event structure, participant/semantic roles, argument structure of related lexico-grammatical constructions, and syntactic-semantic interfaces are clearly shown in the definitional unit as the basis of multi-dimensional meaning representation. Figure 1 presents a sample entry.

Figure 1. A sample entry of A New Concept English-Chinese Dictionary for Active Use

A draft version of the dictionary is now completed. Its style clearly differs from that of existing learners’ dictionaries. The definition is bilingualised and firmly based on linguistic studies, and the participant/semantic roles are extracted from a large corpus by means of pattern analysis. Users can thus easily find the necessary morphological, semantic, and syntactic information, as well as co-occurrence patterns and usages of defined words.
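The article does not give the dictionary's actual entry format. As a rough illustration only, the multi-dimensional definition described above (event structure, participant/semantic roles, argument structure of constructions, a bilingualised definition and co-occurrence patterns) might be modelled along the following lines. All class names, field names and the sample data for the verb give are assumptions made for this sketch and are not taken from A New Concept English-Chinese Dictionary for Active Use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Construction:
    """One lexico-grammatical pattern with its participant roles (hypothetical model)."""
    pattern: str       # surface pattern, e.g. "NP V NP NP"
    roles: List[str]   # semantic roles of the noun-phrase slots, in order
    example: str = ""  # invented illustrative sentence


@dataclass
class DefinitionUnit:
    """A bilingualised, multi-dimensional definition for one sense (illustrative only)."""
    headword: str
    word_class: str
    event: str           # short label for the event structure
    definition_en: str
    definition_zh: str
    constructions: List[Construction] = field(default_factory=list)
    collocations: List[str] = field(default_factory=list)

    def role_frame(self) -> Dict[str, List[str]]:
        """Map each pattern to its roles: a crude view of the syntactic-semantic interface."""
        return {c.pattern: c.roles for c in self.constructions}


# Invented sample entry; not the published dictionary's content.
give = DefinitionUnit(
    headword="give",
    word_class="verb",
    event="transfer of possession",
    definition_en="if someone gives something to a person, they cause that person to have it",
    definition_zh="给；给予",
    constructions=[
        Construction("NP V NP NP", ["agent", "recipient", "theme"], "She gave him a book."),
        Construction("NP V NP to NP", ["agent", "theme", "recipient"], "She gave a book to him."),
    ],
    collocations=["give advice", "give permission"],
)

for pattern, roles in give.role_frame().items():
    print(pattern, "->", ", ".join(roles))
```

Keeping the roles attached to each construction, rather than to the headword as a whole, is one way to reflect the point that the same verb assigns its participants to different syntactic positions in different patterns.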

4. Computer-aided lexicography
As a cross-disciplinary field of study, computational lexicography has developed into a relatively independent subject through a series of studies over a rather long period of time, with a complete set of methodology and research objectives (Zhang 2015a). Recently, Chinese lexicographers have become increasingly aware of the importance of computer technology in lexicography. The main research is focused on dynamic balanced corpus data, semi- or full automation of dictionary writing, formalization of microstructure arrangement, digitization of dictionary media, the intelligentization (i.e. having intelligent search and discovery, in China English) of the dictionary query, and integration of multimedia into lexical data representation. The major bodies to have made significant efforts towards these ends – such as corpus building, developing a lexicographical database, or integrating a dictionary writing system – include The Commercial Press, the Center for Lexicographical Studies (GDUFS), SFLTP, and the Institute of Applied Linguistics (under the Ministry of Education).

The Institute of Applied Linguistics developed a corpus-based dictionary writing system that integrates a tagged corpus as well as several mainstream dictionary databases such as Modern Chinese Dictionary, Applied Dictionary, Ci Hai (Sea of Chinese Words), Xinhua Chinese Character Dictionary, and Verb Usage Dictionary. Lexicographers can use the system to both write new dictionaries and revise existing ones, as well as to find evidence to support their information. Figure 2 shows the system’s interface divided into five parts: the left column displays the ; the bottom consists of a dictionary writing template; the top presents the entry preview or display; the middle left column is used to extract examples from a tagged corpus that can offer information on word segmentation, word class, co-occurrence, grammatical pattern, senses and meaning; the middle right column serves to extract collocations (see also Figure 3).

Figure 2. Writing interface of a corpus-based dictionary writing system

Figure 3. Checking the interface of a corpus-based dictionary writing system

Some research institutions, for example the Applied Language Institute and the Center for Lexicographical Studies (GDUFS), have proposed a comprehensive approach to electronic lexicography so as to integrate corpus, database, computer-aided compilation and revision, quality control, etc. into one system composed of three parts:

1. Resources. Related dictionaries, corpora and language norms and standards, constituting a large general language knowledge base, and serving as supporting information for dictionary writing.
2. Processing. Lexical data duplication checking through a conceptual relevance network, i.e., estimating similarities among related dictionaries; lexical conflict checking in a series of dictionaries; lexical normative checking against related linguistic and terminological norms or standards; description and representation of syntactic-semantic interfaces through corpus pattern analysis; and establishing lexical-semantic relations with phonological, morphological and conceptual relevance.
3. Objects. The products of the system include the dictionary generation system featuring automatic dictionary production based on a lexicographic database; a checking system as outlined above; an operational interface composed of system management, data-statistics, and multi-property retrieval, the latter including formal, phonetic, and semantic properties; inter-character relevance, sequential value properties; and related resources.

With such a system, almost all operations can be done on a single online-based and computer-aided programme, and all the data necessary for dictionary writing, checking, editing and revising can be made available by a click on the corresponding buttons or control icons.

Cited Dictionaries
Han Zuoli. 2013. Xinhua Chinese Dictionary (3rd edition) (Xinhua Cidian). Beijing: The Commercial Press.
He Jiuying, Wang Ning, et al. 2015. Sources of Chinese Words (3rd edition) (Ci Yuan). Beijing: The Commercial Press.
Lexicographical Department of the Chinese Academy of Social Science. 2012. Contemporary Chinese Dictionary (6th edition) (Xiandai Hanyu Cidian). Beijing: The Commercial Press.
Lexicographical Department of the Chinese Academy of Social Science. 2011. Xinhua Chinese Character Dictionary (11th edition) (Xinhua Zidian). Beijing: The Commercial Press.
Lu Gusun. 2007. The English-Chinese Dictionary (2nd edition) (Yinghan Da Cidian). Shanghai: Shanghai Translation Publishing House.
Luo Zhufeng. 1986-1994. Grand Chinese Dictionary (Hanyu Da Cidian). Shanghai: Shanghai Lexicographical Publishing House.
Xia Zhengnong & Chen Zhili. 2010. Sea of Chinese Words (6th edition) (Ci Hai). Shanghai: Shanghai Lexicographical Publishing House.
Xue Zhongshu. 2010. Grand Dictionary of Chinese Characters (Hanyu Da Zidian). Chengdu, Wuhan: Sichuan Dictionary Publishing House and Chongwen Book Company.
Zheng Yili, Cao Chengxiu. 2000. A New English-Chinese Dictionary (3rd edition) (Yinghua Da Cidian). Beijing: The Commercial Press.

References
Goldberg, A. E. 1995. Constructions: A Construction Grammar Approach to Argument Structure. Chicago: The University of Chicago Press.
Goldberg, A. E. 2006. Constructions at Work: The nature of generalization in language. Oxford: Oxford University Press.
Zhang, Yihua. 2006. Cognitive semantic structure: A cognitive approach to the essence and structure of bilingual dictionary definitions. Modern Foreign Languages (4): 362-369.
Zhang, Yihua. 2010. Cognitive semantics and multidimensional definition for a new generation of bilingual/bilingualized learner’s dictionaries. Foreign Language Teaching and Research (5): 374-379.
Zhang, Yihua. 2015a. Computational Lexicography. In Chan, Sin-Wai (ed.). The Routledge Encyclopedia of Translation Technology. London & New York: Routledge.
Zhang, Yihua. 2015b. Second Language Acquisition and Learner’s Dictionaries. Beijing: The Commercial Press.

The Center for Lexicographical Studies of GDUFS
The study of lexicography at Guangdong University of Foreign Studies (GDUFS) was begun in the 1970s by the GDUFS President, Professor Huang Jianhua, a pioneer in modern theoretical lexicography in China, who was followed by Professor Zhang Yihua, a leading scholar. In the early 1980s, lexicography became the key study area in the former Institute of International Languages and Cultures, and in the mid-1990s the Center for Lexicographical Studies (CLS) was established. From then on, Huang Jianhua, Chen Chuxiang and Zhang Yihua obtained significant achievements in their academic research, exercising great impact on contemporary lexicography in China. With the development of lexicographical studies and growth of the academic team, CLS became an independent research institution of GDUFS in 2001, and it functions as the headquarters of the Chinalex Bilingual Committee, of which Huang and Zhang are respectively the former and current chairman.
Lexicographic study at CLS constitutes one of the three well-established research areas of the National Key Research Center for Linguistics and Applied Linguistics and is recognized as a National Key Discipline. CLS comprises the following sections: lexicographic research; dictionary compilation; laboratory for computer-aided dictionary compilation; lexicography teaching; and a reference room. It includes 8 full-time faculty members and 10 guest or part-time ones, among them 7 professors and 5 associate professors, including 7 graduate supervisors and three PhD supervisors.
http://cdx.gdufs.edu.cn/

Treatment of entries with Chinese characteristics in English learner’s dictionaries: A case study of Oxford Advanced Learner’s Dictionary 8e
Lixin Xia and Langwei Zhai

XIA Lixin is professor and M.A. candidate supervisor at the Centre for Lexicographical Studies (CLS), Guangdong University of Foreign Studies. He is general secretary of the Chinalex Bilingual Committee, author of more than 20 papers on lexicography and China English, and principal investigator of projects funded by the Ministry of Education or the Guangdong Planning Office of Philosophy and Social Science. His main research interests include lexicography and English varieties. Currently he is visiting scholar at Coventry University.
[email protected]

ZHAI Langwei is M.A. candidate at the Centre for Lexicographical Studies, Guangdong University of Foreign Studies. His main research interests include lexicography and cognitive semantics.
[email protected]

1. Introduction
Along with the dramatic increase in international exchange between Chinese people and Westerners, more and more words of Chinese origin infiltrate the English language. According to the Global Language Monitor (Radtke 2007), among the 2,000 new words and phrases added to English in 2005, 20% stemmed from Chinese. English learner’s dictionary compilers have noticed the phenomenon and adjusted their practice accordingly. However, for various reasons, their treatment of words and expressions with Chinese characteristics requires improvement. A typical case is the Oxford Advanced Learner’s Dictionary, Eighth Edition (OALD8, 2010), and in this study we examine such entries with Chinese characteristics.

There are two reasons for choosing OALD8 as our study object. The OALD is one of the best-selling English learner’s dictionaries worldwide, and the annual sales volume of the bilingual Chinese version of OALD8 reached over one million. Moreover, the bilingualized version (English and Chinese) of OALD8 published by The Commercial Press occupied the first place by sales volume under the category of “English-Chinese/Chinese-English Dictionaries”, according to the statistics of two major online stores (jd.com and amazon.com; data accessed at 21:20 on March 24th, 2016).

2. Research methods of the current study
The ‘advanced search’ function of the CD-ROM version of OALD8 was used to retrieve all the entries with the tags “originated from Chinese” or “used in the region of China”. There were 47 entries with Chinese characteristics, which we classified in 8 categories as shown in Table 1. After the entries were selected and classified, they were examined one by one with the aims of identifying possible imperfections in their treatment and making suggestions for improvements if applicable.

Table 1. List of entries with Chinese characteristics in OALD

category                  headword
politics                  Maoism, Maoist
economy                   renminbi, taipan, yuan
language                  Cantonese, Yue, Wu, Xiang, Chinglish, putonghua
philosophy and religion   Dalai Lama, lama, Lamaism, lamasery, Confucian, Taoism, yang, yin
customs                   Cantopop, cheongsam, Chinese chequers, Chinese lantern, feng shui, kowtow, kung fu, mahjong, samfu, t’ai chi ch’uan
cuisine                   chop suey, chow mein, dim sum, foo yong, hoisin, Peking duck, wok, wonton
craft                     china, China-blue, china-clay, kaolin
animal and plant          Chinese cabbage, chow, ginkgo, lapsang souchong, lychee, pak choi

3. Analysis of entries with Chinese characteristics in OALD8
The entries with Chinese characteristics were analyzed from the perspectives of headword selection and inclusion, definitions, and labels.

3.1 Headword selection and inclusion
The English words listed in Table 1 have distinctive Chinese characteristics, most of which concern Chinese customs, including terms of sports and entertainment (kung fu, t’ai chi ch’uan, Chinese chequers, mahjong and Cantopop), clothing (cheongsam, samfu) and ways of doing things (feng shui, kowtow, Chinese lantern). The second largest category consists of words and expressions denoting philosophy. Chinese philosophy has a long history, some of which dates back to over 2,500 years ago, and has profound influence on both Chinese culture and Western philosophy. It is not surprising that words referring to Chinese food constitute the next largest category, as its global popularity and influence have made the names of Chinese dishes enter the English language as loanwords. Table 1 shows that words and expressions with Chinese characteristics in OALD8 cover a wide range of fields from daily life to philosophy, from business to politics, from craft to custom, and from food to language. Moreover, a large proportion of these words and expressions comes from Cantonese, such as taipan, kowtow, samfu, etc. This might be due to the fact that Guangzhou was among the first cities in China that opened its doors to the West in ancient times, and that more immigrants from the Guangdong and Hong Kong areas went to live and work in English-speaking countries and brought their culture and language there.

According to the OALD8 blurb, the total number of headwords in the dictionary is 184,500, which means the number of Chinese-derived words and expressions (47) accounts for barely 0.03% of its entries. Nearly twenty years ago Benson (1997:133) noted that English learner’s dictionaries (ELDs) contain fewer references to China than their larger counterparts such as OED. The OED online version reveals through advanced search 250 entries of Chinese origin, that is 0.04% of the total 600,000 entries. However, as ELDs claim to be specifically designed for foreign learners, they may be expected to include more entries from other varieties than their larger counterparts for native speakers. Moreover, the number of headwords from Chinese is out of proportion to that of headwords derived from other languages, say Japanese. According to Zeng (2005, 2016), the number of headwords originating from Japanese in the Shorter Oxford English Dictionary is several times higher than those from Chinese. Last but not least, the headwords with Chinese characteristics in OALD8 are mostly from old times, and only a few refer to the current time. Since, as mentioned above, more and more new words from Chinese have entered the English language in recent years, ELDs should reflect this language change accordingly.

3.2 The definitions
The words and expressions with Chinese characteristics seem to be alien to the OALD8 compilers, as some of them are not defined accurately. Besides, some definitions are too simple or vague and could cause difficulty in understanding for dictionary users, even for Chinese native speakers.

3.2.1 Regional differences not indicated in the definitions
As a dialect of Putonghua, Cantonese shares the same Chinese characters with Putonghua, but has different pronunciations. In many cases, the same word in Cantonese and Putonghua may refer to different referents in the real world, which is liable to lead to confusion.

For instance, the headword chop suey, which is from Cantonese and refers to a kind of mixed food made of meat and vegetables, is defined in OALD8 as “a Chinese-style dish of small pieces of meat fried with vegetables and served with rice”. This definition, however, is problematic for native Mandarin Chinese users as this dish in other parts of China is totally different from that in the Cantonese-speaking areas. The form of chop suey in Mandarin Chinese is zasui, which refers to a dish of cooked entrails of cattle or sheep (Liu 2009). For speakers of Cantonese and Putonghua, chop suey and zasui are two different kinds of dishes with different meanings. By reading this definition, speakers of Mandarin Chinese would normally understand chop suey to be another kind of dish rather than zasui.

Another example is cheongsam, which is defined as “a straight, tightly fitting silk dress with a high neck and short sleeves and an opening at the bottom on each side, worn by women from China and Indonesia”. While this word form is in Cantonese, the dress itself originates from Shanghai and was made fashionable by upper-class women at the beginning of the 20th century, referring to an exclusively traditional gown also known in Mandarin Chinese as qipao. For Cantonese speakers in the areas of Guangzhou, cheongsam usually constitutes a jacket with long sleeves, not necessarily a long dress covering the whole body. In Hong Kong, cheongsam is a dress for both women and men. Besides, the pinyin form of cheongsam is changshan in Mandarin Chinese, which denotes exclusively a piece of clothing for men: a long and loose-fitting piece of clothing that covers all of one’s body and reaches the ground, worn especially by educated men in ancient China. It was a sign of rank, or at least of , because at that time poor people were mostly illiterate and could not afford a piece of cheongsam (Xia 2015). It seems that OALD8 adopted the signifier of the concept in Cantonese and the signified object from Shanghai areas. No regional uses were shown in the definition. As a result, speakers of Mandarin Chinese will have difficulty in understanding the definition. Moreover, an English speaker travelling in China will also feel puzzled when he orders a chop suey or cheongsam but is served with or given a different dish or dress.

One may argue that OALD8 describes these Chinese words in the English language, not their use in Chinese. The description of the English usage of these words, however, is not correct because their referents are not the ones they refer to in their original use in Chinese. As a dictionary is by nature both descriptive and prescriptive, it should inform its users of the right use of a foreign word in order to avoid possible misconception.

3.2.2 Narrower or wider meanings or extensions
Due to the cultural difference, the same concept may have different meanings or extensions. For example, Maoism is defined as “the ideas of the 20th century Chinese communist leader Mao Zedong”, a political term that denotes the concept of Mao Zedong’s thought as it is termed in China. It was formerly believed to be introduced and developed solely by Mao Zedong, but following the open policy adopted in China since 1978, the term was officially redefined in the Communist Party Committee’s Constitution as “Marxism-Leninism applied in a Chinese context”, synthesized by Mao Zedong and China’s “first-generation leaders” ( 2010). According to the official definition, it is the fruit of the collective wisdom of Mao Zedong together with other communist leaders of the first generation from the 1920s until Mao’s death in 1976. The current definition in OALD8 thus has a narrower sense as it is limited to Mao’s personal political theories.

The headword dim sum is defined in OALD8 as “a Chinese dish or meal consisting of small pieces of food wrapped in sheets of dough”. As a matter of fact, besides this sense dim sum refers also to Chinese sponge cakes, vegetables wrapped in dried bean milk cream in tight rolls, beef or pork meatballs, and so on. The pinyin form of dim sum is dianxin, which, in Chinese culture, refers to snacks, light refreshments or desserts that are served, often with tea, in small portions. The definition is thus incomplete in that it only covers one kind of dim sum.

On the other hand, the definition of ginkgo in OALD8 has a wider meaning that can encompass many other trees as well: “a Chinese tree with yellow flowers”. In biological terms, ginkgo refers to the plants of the ginkgo genus, the only living member of the gymnosperm family Ginkgoaceae. It has great biological and economic value in that it has a number of primitive features and its fruits can be used in food and medicine. The main characteristics of ginkgo are its fan-shaped leaves and yellow flowers.

3.2.3 Denotative meanings not defined
As common practice, the denotative or literal meanings of a headword should be first included and explained, then the extended meanings can be further illustrated. Otherwise the additional meanings would seem to come from nowhere.

For example, kowtow is defined in OALD8 as “to show sb in authority too much respect and be too willing to obey them”. Its denotative meaning in Chinese is to kneel and touch the ground with the forehead. It originates from the rite of dunshou, which consists of three steps – namely, kneeling, bending over the body, and touching the ground with the forehead – and is the solemn rite of an inferior to a superior as formerly done in China. The metaphorical meaning of kowtow is to show someone, especially one’s superior, deep respect, worship, or submission. OALD8 has adopted only the metaphorical meaning of kowtow and ignores its literal meaning.

3.2.4 Definiens not included in the definienda
ELDs claim to use a limited defining vocabulary to define all the headwords, and all the definiens are included in the dictionary as definienda themselves. However, the definition of Taoism has Lao-tzu, which is not included in OALD8: “a Chinese philosophy based on the writings of Lao-tzu”. Lao Tzu, an ancient Chinese philosopher, was traditionally regarded as the author of the holy book and the founder of Taoism. In the book of Lao Tzu, Tao is considered as the basic source and supreme law of everything in the universe. The followers of Taoist teaching should stick to the state of vacancy and stillness mentally and physically to understand the nature of Tao. The dictionary includes the entries Taoism and Taoist, but not Lao Tzu or Tao, which may cause problems to users who are not familiar with the concept of Taoism. It is common practice for British general language dictionaries not to include proper names, which might result in such shortcomings, whereas American English dictionaries tend to include proper names.

Dictionnaire de Poche Français-Chinois Chinois-Français
Commercial Press International
Beijing, China
March 2016
592 pages, 75x110x25 mm
PVC
ISBN: 978-7-5176-0163-0
RMB 29.80
http://www.cp.com.cn/
Published in cooperation with K DICTIONARIES

3.3 The glosses and labels
Labels are used in dictionaries to remind users of additional meaning and usage for a headword. As headwords of entries with Chinese characteristics have specific cultural connotations, it is advisable to illustrate them by way of labels or notes.

The headword taipan in OALD8 is defined as “a foreign person who is in charge of a business in China”. However, this is an informal term used during the 19th and early 20th century, and is passing out of current use (Xia 2015). Therefore, a register label, such as old-fashioned, should be added to guard users against misusing it. The label means that the word is no longer used, but its counterparts in the real world exist.

The headword Lamaism is defined as “Tibetan Buddhism” by way of a synonymous paraphrase. The Chinese equivalent is lama jiao, an informal term for Tibetan Buddhism, where jiao means a religion. As a matter of fact, according to Tibetan Buddhists and researchers, lama jiao is offensive, which might mislead language users to regard it as an independent

the dictionary users. Therefore, we have all the more reasons to clarify such occurrences in the dictionary and define them correctly and accurately.

Acknowledgements
This work was supported by the Humanities and Social Science Research Funding of the Ministry of Education of the People’s Republic of China under Grant No. 15YJA740048 (China English or Chinglish? A Study Based on China English Corpus), China Scholarship Council under Grant No.
201508440380, religion only worshiping the lamas instead and the Post-graduate Division of of Buddha, or even creating their doctrines Guangdong University of Foreign Studies from nowhere but the teaching of Buddha under Grant No. 14GWCXXM-45. (Lopezth Jr., 1999: 6). It, therefore, is recommended that a note be added to warn References the dictionary users against misusing it. Benson P. 1997. English Dictionaries in Random House Webster’s OALD8 does provide a gloss for the entry Asia: Asia in English Dictionaries. College Dictionary t’ai chi ch’uan, defined as “(also t’ai chi) a In M.LS. Bautista (ed.), English is New Edition Chinese system of exercises consisting of an Asian Language: The Philippine Commercial Press sets of very slow controlled movements”. Context. Sydney: The Macquarie International The entry thus lists t’ai chi as a variant. Library, 125-140. Beijing, China The terms t’ai chi ch’uan and t’ai chi are Editing Committee of the Thirteen March 2016 closely related in form and content but Confucian Classics with Notes and denote two different concepts, in which Commentaries. 2000. The Book of 1960 pages, 185x265x70 mm the former refers to a Chinese martial art Change. Beijing: Peking University Hardbound characteristic of slow movements, and the Press. ISBN: 978-7-5176-0191-3 latter to the ancient Chinese philosophy. Liu, H. 2009. Chop suey as imagined RMB 298.00 According to the first half of Xi Ci in the authentic Chinese food: The culinary http://www.cp.com.cn/ Book of Change, the source of change is identity of Chinese restaurants in the t’ai chi, which produces . United States. Journal of Transnational Published in cooperation (Editing 2000: 340) In other words, t’ai American Studies, 1(1). with K DICTIONARIES chi is the source of everything and also the Lopez Jr., D.S. 1999. Prisoners of essential factor and condition for the change Shangri-La: Tibetan Buddhism and the of everything. T’ai chi ch’uan is based on West. 
Chicago: University Of Chicago the philosophy of t’ai chi. Although t’ai chi Press. is often used as the shortened form of t’ai OED online. Oxford English Dictionary chi ch’uan both in the West and in China, online. http://www.oed.com/. Accessed not denoting the difference might cause 24 March 2016. confusion to the dictionary users. Qi, J. 2010. Five Factors Influencing China’s Foreign Policies. Beijing: The Conclusion Central Compilation and Translation From our analysis it can be concluded that Bureau. lemmas with Chinese characteristics in Radtke. O.L. 2007. Chinglish. Layton: OALD8 have not been sufficiently well Gibbs Smith Publisher. treated, although the dictionary, on the Xia, L. 2015. The Corpora of China English: whole, is of high quality. A disproportionate Implications for an English-Chinese number of Chinese-derived entries has Learner’s Dictionary. Paper presented flaws in the definitions and representations. at Australex 2015 held at Auckland Specifically, among 47 lemmas with University on 18-22 Nov. 2015. Chinese characteristics, 10 have some Zeng, T. 2005. Direction of the English 2016 flaws, which makes up 21%. The first translation of Chinese-culture-loaded cause for such flaw seems to be that they words: A case study of the Shorter are not accorded the same status as other Oxford English Dictionary, 5th edition. English lemmas, such as Japanese-derived Journal of Guangdong University of ones. Another cause might be that these Foreign Studies, Vol. 16 (Supplement): words have rich cultural connotations that 74-77. make their compilation difficult for ELD Zeng, T. 2016. Why not translate the lexicographers. It is understandable that they Chinese-culture-loaded words by are not familiar with words and expressions transliteration?. Nanfang Weekend, with Chinese characteristics, but neither are 2016.2.24. Kernerman Dictionary News, July 16

Lexicography at the Society for Danish Language and Literature Lars Trap-Jensen

Introduction
The making of dictionaries has been an ongoing activity at the Society for Danish Language and Literature (Det Danske Sprog- og Litteraturselskab, DSL) for just over one hundred years. In 1915, the Society was encouraged to take the responsibility for the compilation of an ambitious dictionary project, Ordbog over det danske Sprog (Dictionary of the Danish Language). The outline of this dictionary had already been sketched over the previous decades by Verner Dahlerup, a professor of Nordic philology at the University of Copenhagen. His inspiration came from the grand projects initiated for German, English, Dutch and Swedish, but when he signed a contract to compile the dictionary, in 1901, the plan was for a more modest publication, twice the size of the standard dictionary of the time, Christian Molbech’s two-volume Dansk Ordbog (Danish Dictionary), but still in the format of a concise dictionary. In the following years, he had to revise his plans, now aiming at an estimated 8-12 volumes. Eventually, Dahlerup realized that the task was beyond the working capacity of a single man and thus, in May 1915, he turned to the Society, which had been established only four years earlier.

Foundation and objectives of the Society
The Society was led by a remarkable woman, Lis Jacobsen, who had also been the driving force in founding the institution. Jacobsen (née Rubin) hailed from a Jewish family and was the daughter of the national bank governor. She had been the first woman to obtain a doctorate in Nordic philology, and only the seventh female doctor in the country at the time, with a dissertation on the earliest manifestations of the Danish language. About one year later, on 29 March 1911, she gave a lecture on the “means and ends of Danish linguistic research”, arranged by the Society for German Philology. She had imagined her dissertation to be just the opening volume of a more ambitious documentary work of the entire history of the Danish language, but had found herself forced to discontinue her work due to the lack of satisfactory source material. Scholarly editions of the source material were scarce and their systematic studies correspondingly few. In her lecture, she stressed the need for both, concluding: “The means to obtain this are twofold: we need money and we need labour. Money to publish the source material and labour to process it”. Present in the audience that day was Kristian Erslev, a professor of history and one of the pioneers of historical criticism and the modern science of history. More importantly in this connection, however, he was also head of the university at the time and in addition a prominent member of the Carlsberg Foundation, later to become its President. He envisioned the perspectives of Jacobsen’s message and realized that an institutional framework was needed. His advice to her was to form an editorial society: “If you can provide the labour, I will provide the money”. Only one month later, on 29 April 1911, the Society for Danish Language and Literature became a reality.

With that, several important traditions had been established: the goal of the Society was to create scholarly editions of the source material for the study of Danish language and literature through all historical periods and, equally important, a long-term cooperation had been set up with the Carlsberg Foundation as an important and generous sponsor of the Society’s activities.

Private foundations as culture bearers
Today, the Society for Danish Language and Literature functions as an independent scholarly institution receiving annual funding from the Ministry of Culture. This covers the administration and operation of services, whereas most scholarly activities are sponsored by external donors for specific projects. Among these is the Carlsberg Foundation, owner of the Carlsberg Group and the world’s third largest brewing company. Established in 1876, this industrial and commercial foundation is among the oldest of its kind worldwide. The statutes stipulate that part of the company’s profit must be channeled back to society through donations to science and culture, and in this way, Carlsberg has left its mark on many aspects of Danish society. The same is true for a number of other commercial foundations that have financed or co-financed lexicographic projects within the Society: a Swedish-Danish dictionary was sponsored by the foundation owned by A.P. Moller–Maersk Group, the largest company in Denmark and a world leading container ship operator; the Velux Foundation, producer of windows and skylights, sponsored the digitization of the Old-Danish Dictionary archive, and the Augustinus Foundation, majority shareholder in the Scandinavian Tobacco Company, recently gave a donation to Den Danske Ordbog (The Danish Dictionary).

In addition to the private foundations, projects may also receive donations from special allocations provided for in the Finance Act. The two large monolingual dictionaries, Ordbog over det danske Sprog and Den Danske Ordbog, were both mainly sponsored jointly by the Carlsberg Foundation and the Ministry of Culture.

The Dictionary of the Danish Language
Ordbog over det danske Sprog marks a turning point in Danish lexicography which, prior to its publication, had been dominated by the prescriptivism inherited from the tradition of the French Academy. Dictionaries of the 19th century were preoccupied with educating the public, more specifically by protecting it from what was considered bad linguistic influence. The dictionaries should only contain, according to Molbech in Dahlerup’s reading, “good” words, “the most beautiful flowers of the language”. For a word, it was a mark of honour to be included in the dictionary, much in the same way as it is an honour for a work of art to feature in the nation’s art collection. Dahlerup broke away from this tradition and insisted on greater professionalism, declaring: “I cannot ask first of all: ‘should this or that word be used?’, but rather: ‘is it used, or has it been used?’; if this is the case, I include the word in so far as considerations of space permit” (Dahlerup, 1907).

Where the editors of the 19th century dictionaries had been generalists with mainly educationalist concerns, the editors of Ordbog over det danske Sprog in contrast were specialized philologists with intimate knowledge of the language described. At the centre of their work lay a large collection of notes with excerpts from a range of texts. Even if the technology was different and the texts dominated by exemplary literary and journalistic efforts, the methodology used was not much different in nature from the modern corpus-based approach of descriptive lexicography: from the underlying language material they extracted whatever facts of form, meaning and word patterns they could observe about the linguistic units.

With more than 225,000 entries, Ordbog over det danske Sprog is the largest monolingual dictionary compiled for Danish. It is, admittedly, not as comprehensive as its sister dictionaries in Germany, Sweden, the UK and the Netherlands, but the 28 volumes were completed within 40 years, later increased by 5 supplementary volumes, and even to this day, it is, for its size, quite uniform and easy to read and use.

The Danish Dictionary and other dictionaries
Its successor, Den Danske Ordbog, was launched in 1991 as the first, and so far only, corpus-based monolingual dictionary for Danish. Originally conceived as a paper dictionary (6 volumes, 2002-2005), it has seen its greatest success as an online dictionary, with nearly 100,000 visitors on a normal day (May 2016). It has been online since 2009 on the Society’s modern dictionary website (http://ordnet.dk/) along with a digital version of Ordbog over det danske Sprog. Unlike the historical Ordbog over det danske Sprog, Den Danske Ordbog is being updated on a regular basis.

In line with the statutes, the Society aims to provide dictionary coverage of the Danish language across all historical periods. A dictionary of Old Danish, covering the period 1100-1515, has been underway for more than 60 years and is now drawing near its conclusion. The period between Old Danish and Modern Danish is the weakest in terms of coverage, but a series of mainly bilingual Latin-Danish and Danish-Latin dictionaries from the Danish Renaissance have been published, and just a few years ago the Society was able to publish for the first time ever the earliest comprehensive dictionary of Danish, compiled around 1700 and describing the language in the latter half of the 17th century. Until then, Matthias Moth’s dictionary had only existed as a manuscript in the Royal Library in Copenhagen, but a long-cherished wish for publication was at last made possible through a donation from the Carlsberg Foundation in connection with the Society’s 100th anniversary in 2011.

In addition to these comprehensive dictionaries, the modern period is also represented by the recent publication of a Danish thesaurus, Den Danske Begrebsordbog, as well as two bilingual dictionaries with Swedish and Icelandic as the respective source languages. Furthermore, the Society has recently retro-digitized and published online some of the more important Danish dictionaries, either compiled by the Society itself or by others, taking advantage of the experience gained from the retro-digitization of Ordbog over det danske Sprog, by means of double-keying following the model of Deutsches Wörterbuch of the Brothers Grimm in Germany. Digitized dictionaries of this kind include the Holberg-Ordbog (Holberg Dictionary), a dictionary of the complete works by the Danish-Norwegian author Ludvig Holberg (1684-1754) published in 5 volumes in 1981-1988, and Meyer’s Fremmedordbog (Meyer’s Dictionary of Loan Words), based on the 8th edition from 1924. Most recently, the Ordbog til det ældre danske Sprog (Dictionary of Older Danish), edited and published by Otto Kalkar in 1881-1918, is being digitized as part of an ongoing project examining the Danish language and literature in the Middle Ages.

References
Andersson, H. 2006. ODS – træk af en historisk ordbogs historie (The Dictionary of the Danish Language – Outlines of the history of a historical dictionary). In Bergenholtz H. and Malmgren S.-G. (eds.), LexicoNordica 13, 25-39. Gothenburg.
Dahlerup, V. 1907. Principer for ordbogsarbejde (Principles for Dictionary Work). In Kristensen M. and Olrik A. (eds.), Danske Studier, pp. 65–78, Copenhagen.
Carlsbergfondet og “Sprogmindesmærkerne”. 2015. News post about the Carlsberg Foundation and the Language “Monuments”, accessed May 2016 (http://www.carlsbergfondet.dk/da/Skjulte-sider/Skjulte-artikler/Danske-versioner-af-forskningsprojekter/Sprogmindesmaerkerne/).
Trap-Jensen, L. 2010. Den Danske Ordbog på nettet. http://sprogmuseet.dk/ord/den-danske-ordbog-pa-nettet/.
Trap-Jensen, L. 2012. Ordbog over det danske Sprog. http://sprogmuseet.dk/ord/ordbog-over-det-danske-sprog/.

[Photos: Verner Dahlerup; Lis Jacobsen; the DSL building in Copenhagen]

Lars Trap-Jensen has an educational background in general linguistics, Greenlandic, and social studies from Aarhus University, with an MPhil in linguistics from Cambridge University. He was a lecturer in Danish language at the universities of Basel and Zürich. Since 1994 he has been working as a lexicographer at the Society for Danish Language and Literature, Copenhagen, and since 2003 as the managing editor of The Danish Dictionary and the Society’s dictionary site ordnet.dk. Other projects include the digitization of Dictionary of the Danish Language and the development of the Danish WordNet (DanNet) and The Danish Thesaurus. Currently he is President of Euralex and involved with the establishment of Globalex.

Towards Peoplex Ilan Kernerman

I was thrilled to take part in the Dictionaries in Asia conference and the inauguration of Asialex. The need for a forum of this kind has long been felt, and the event lived up to expectations.

It might seem strange no such framework existed so far, since Asia was the cradle for dictionary-making thousands of years ago, and its lexicographic tradition has flourished through the ages to modern times. The 20th century’s prominent milestones in pedagogical lexicography stem from the work of Michael West in India and A.S. Hornby in Japan. Some of the world’s finest dictionaries are made in Japan and its neighbors, as well as valuable research carried out, but these are little known of elsewhere.

In addition to economic-political factors, this lack may be mainly due to Asia’s inherent diversity, not being a homogenous entity of any sort. Linguistically, unlike most European tongues that pertain to the Indo-European family, Asian languages share no common background, apart from being human.

That natural human link is true just as well for the entire world. Asia can project a microcosm of it and, thus, establishing Asialex is a significant step toward forming a global lexicographical constellation.

A future GLOBALEX (or Unilex, in the words of Tom McArthur) concerns globalization and co-existence in multilingual societies, English as the international lingua franca, localized Englishes, effects on the mother tongues, etc, as well as repercussions from hi-tech and tele-communication, online interactivity and automatic translations, Dictionizers and Quicktionaries, and so on.

This forthcoming forum should not replace national or regional LEX’s, but accommodate the varied issues. As such, geography is no sound base for its foundation, nor for the soon-to-come dictionaries that will hardly be what we imagine now.

Beyond countries and behind computers there are people. First of all, and after all. People are the most common denominator for lexicography all over the world.

(Reprinted with slight amendments from KDN 5, 1997.)

Linked data in lexicography Julia Bosque-Gil, Jorge Gracia and Asunción Gómez-Pérez

1. Introduction
The notions of linked data (LD) and Web of Data are increasingly gaining ground in digital humanities, linguistics, biomedicine, e-science, data journalism, etc., and lexicography is not staying behind. The LD paradigm meets the need to link isolated pieces of information which were in their own proprietary formats and were previously hard to discover and integrate. The term actually refers to a “set of best practices for exposing, sharing, and connecting data on the Web” (Bizer et al. 2009). In order to create LD there is a set of requirements to fulfill, among them, the use of Uniform Resource Identifiers (URIs) and the establishment of links to other resources. The Resource Description Framework (RDF)[1] is the formal backbone giving support to this network of interlinked resources and allowing for the definition of triples or statements of the form subject-predicate-object, where subject and object are resources and the predicate is the edge or property connecting the nodes. The result is a vast graph whose nodes can be practically anything, including lexical units, and this is where lexicography comes into play.

The work in models for the representation of linguistic information as LD (McCrae et al. 2012), as well as in best practices and guidelines for the conversion of mono- and multilingual language resources[2], has been continuous in recent years. The benefits that LD brings to lexicography have been already pointed out in recent works related to the conversion of bilingual and multilingual dictionaries as LD (e.g. Gracia 2015, Klimek and Brümmer 2015, Bosque-Gil et al. 2016) and etymological and dialectal dictionaries (Declerck et al. 2015, among others), as well as in recent initiatives and international projects that have embraced the use of semantic technologies[3] and in current e-lexicography work (McCracken 2015). The main advantages are: the semantic and syntactic interoperability provided by RDF and linguistic ontologies (LexInfo[4] or GOLD[5]), which enables the integration, exchange, and enrichment of lexicographic data among different resources; the reusability of the whole resource, which in turn prevents lexicographers from “re-inventing the wheel” in potential future projects; improved data visualization and querying; resource sustainability (Wandl-Vogt 2015); and easy discovery thanks to metadata repositories.[6]

In this context, this paper seeks to present, on the basis of our experience in the conversion of lexicographic data to LD[7], our reflections on the implications of converting lexical data to LD, drawing special attention to the advantages it offers from the eyes of a lexicographer or a linguist outside the realm of the Semantic Web, but as part of a discipline which can be already considered part of information science (Fuertes-Olivera and Bergenholtz 2011). Our goal is therefore twofold: to place LD in the context of lexicographic work in lexical networks, and to bring its benefits closer to the lexicographer so she can consider it a basis for future endeavours. To this end, we will first provide a brief overview of the work on the representation of lexical information as graphs outside the context of the Semantic Web with focus on WordNet (Miller 1995, Fellbaum 1998) and Polguère’s lexical systems (Polguère 2012, 2014) implemented in the French Lexical Network (Gader et al. 2012). Then, we will dwell on the practical advantages of LD for representing both the macro- and the microstructure of a lexicon.

Julia Bosque-Gil is a PhD student at the Ontology Engineering Group, Universidad Politécnica de Madrid. She holds a B.A. in German Linguistics and English from the Humboldt University of Berlin and an M.A. in Computational Linguistics from Brandeis University, Waltham. Her interests include the lexicon-ontology and the syntax-semantics interfaces, the relations between the lexicon and syntax, semantic annotation and the representation of (multilingual) language resources as linked data. She has been working in the conversion of multilingual terminologies to RDF as part of the LIDER project and in the modeling of lexicographic data as linked data in collaboration with K Dictionaries and Semantic Web Company. For her PhD thesis she is investigating the use of linguistic linked data for research in linguistics.

Notes
[1] https://www.w3.org/TR/rdf11-primer/
[2] https://www.w3.org/community/bpmlod/
[3] Such as the ENeL COST action (http://www.cost.eu/COST_Actions/isch/IS1305/) or the LIDER (http://lider-project.eu/) and LDL4HELTA (http://www.eurekanetwork.org/project/id/9898/) projects.
[4] http://www.lexinfo.net/ontology/2.0/lexinfo/
[5] http://www.linguistics-ontology.org/
[6] linghub.org, http://metashare.elda.org/
[7] From October 2015 to February 2016, the Ontology Engineering Group at UPM worked on the development of a linguistic linked data prototype for K Dictionaries and Semantic Web Company as part of their LDL4HELTA project, and, more specifically, on the transformation to RDF of the Spanish dataset of K Dictionaries.

2. Lexical data as a graph
Modeling lexical information as a graph is not a novel notion coming from LD.

WordNet already set a precedent (Miller 1995, Fellbaum 1998) as a graph-based lexico-semantic database where nodes represent the concepts (synsets or sets of cognitive synonyms) and hyponymy, meronymy and antonymy constitute the relations that link them together. Furthermore, there are other efforts in lexicography that emerge from a conception of the natural language lexicon as a network of entries rather than a list, which is what the organization of conventional dictionaries looks like. The entries are then viewed as part of a language system of related lexical elements. Polguère’s notion of a lexical system, implemented in the framework of the French Lexical Network project, falls into this category. However, in contrast to the projects developed in lexical semantics, linguistic linked data (LLD) and the models proposed for converting resources into them (lemon[8], SKOS-XL[9], LIR (Montiel-Ponsoda et al. 2008)) do not arise as initiatives to model the (mental) natural language lexicon, nor make such claim, even though they entail the use of classes and properties such as lexical entry, sense, lexical concept, syntactic frame, lexical form or definition. LD emerges as a technological means to better represent, share, integrate and discover linguistic knowledge scattered over the Web, and its underlying RDF formalism is not conceived from a theoretical perspective as an alternative to structure mental lexical information. Nonetheless, knowing the direction into which lexical semantics and lexicography move, as well as the similarities between the representations suggested there and those proposed from an LD perspective, will help us in building bridges for collaboration between experts from both sides. LD, as best practices for data representation, should be compatible with the representation of any lexical network, even though this implies the extension of vocabularies currently available on the so-called linguistic linked open data (LLOD) cloud[10] or models to encode all the data that the theory on which the resource is based addresses.

An analysis of what modeling the lexicon as a graph in WordNet entails and which needs are met is given in Polguère (2014) and McCracken (2015): lexical entries were previously analyzed and presented independently one from another and a novel approach reflecting what the structure of the mental lexicon might resemble was called for.[11] WordNet falls under the category of ontology-based lexical network (Polguère 2014: 3), i.e. a network of lexical units with an ontology as backbone, including word senses arranged in a hierarchy and related by synonymy, hyponymy and meronymy relations. It is worth mentioning that LLD relies on linguistic ontologies or vocabularies, but the creation of an ontology of word senses or concepts is actually optional and it is not a required step in order to publish LD. Accordingly, we can state that the entry enthusiasm in an English lexicon has as the part-of-speech lexinfo:noun, which is defined along with, for instance, lexinfo:reflexivePersonalPronoun, as an individual of type lexinfo:PartOfSpeech in the linguistic ontology LexInfo.[12] We are thus linking two resources without establishing the concept denoted by enthusiasm in any hierarchy (e.g. as a child of feeling). LD resources such as BabelNet[13], DBpedia[14], and WordNet RDF (McCrae et al. 2014) have an underlying ontology, but this is not implied in the conversion of every resource to LLD. In relation to this, LLD builds upon the notion of semantics by reference (McCrae et al. 2012): the meaning of a word and the word itself (the signifier–signified opposition) are separated in two different layers, with ontolex:LexicalEntry and skos:Concept respectively, and the relation between the two is “reified” in a class that aims at encoding a sense (ontolex:LexicalSense). All the linguistic information pertaining to the word itself or to the use of that word with a specific meaning is separated from the actual meaning, which, ideally, is language independent. Hierarchic conceptual relations would be established, if they are, at the level of the concept.

Polguère places lexical systems on the other side of the balance: they are lexical networks that are not ontology-based. Lexical systems are conceived with the relations among the lexical elements as focus and relegate to the background the classification of units or property inheritance (Polguère 2014: 3). A key aspect of lexical systems is that relations are not limited to synonymy, hyponymy, etc. but they include paradigmatic and syntagmatic relations drawn from the set of lexical functions of the Meaning Text Theory (Mel’čuk 1996). The result is a multi-dimensional graph with a wide range of relations linking the nodes (lexical elements), which instantly brings RDF to mind. There are two important points to bear in mind when comparing lexical systems with resources migrated to LLD: first, the nodes in lexical systems are already “disambiguated”, each node represents one specific meaning of the lexical unit at hand (Polguère 2014: 5). The closest counterpart we have in RDF is ontolex:LexicalSense, which is a unique relation between a word and a meaning. Secondly, the nodes in a lexical system are not atomic and each one records the information we would find in a lexicographic article. Grammatical information, semantic label, syntactic government pattern (collocations are implemented by edges), etc. are stored inside the node. In LLD, some of these data would be linked to the entry at hand or to one of its ontolex:LexicalSense(s) by means of specific properties (edges) and elements available in linguistic vocabularies, identifiable with their own URIs: lexical entries, word forms, senses, part of speech tags, gender, number, subcategorization, etc. To see which entries are related, the SPARQL query language[15] allows one to perform queries on the graph and trace the connections between ontolex lexical senses or ontolex lexical entries. LexInfo, lemon-ontolex, SKOS, GOLD, etc. already provide a high number of relations, which can be extended with new ones or new vocabularies can be created as needed.

… their implementation while retaining all the benefits related to interoperability, visibility and NLP-services compliance.

All in all, the LD paradigm is agnostic with respect to the different theories in modern lexicography, and it poses a number of tangible benefits that we enumerate in the following sections.

Notes
[8] http://www.lemon-model.net/lemon/
[9] https://www.w3.org/TR/skos-reference/skos-xl.html/
[10] http://linguistic-lod.org/llod-cloud/
[11] However, most lexical semantics research addresses different aspects with different levels of granularity, but it does not analyze all word types and the semantic structure of the lexicon as a whole (Swanepoel 1994).
[12] http://www.lexinfo.net/ontology/2.0/lexinfo/
[13] http://babelnet.org/
[14] http://wiki.dbpedia.org/

Jorge Gracia is a post-doctoral researcher at the Ontology Engineering Group, Universidad Politécnica de Madrid, Spain. He got his PhD in Computer Science at University of Zaragoza in 2009, with a thesis about heterogeneity issues on the Semantic Web. His current research interests include linked data, linguistic linked data, and cross-lingual matching and information access on the Semantic Web. Currently he is exploring how to move language resources (lexica, dictionaries, corpora, etc) from their data silos into the multilingual Web of Data and make them interoperable, in order to support a future generation of linked data-aware NLP tools. http://jogracia.url.ph/web/

Asunción Gómez-Pérez is Vice-Rector for Research, Innovation and Doctoral Studies and Full Professor at Universidad Politécnica de Madrid. She is Head of the Department of Artificial Intelligence since 2008, Director of the Ontology Engineering Group since 1995, Academic Director of the Master’s Degree in Artificial Intelligence since 2009, and Coordinator of the PhD Programme in Artificial Intelligence since 2009. Some of her main research areas are Ontological Engineering, Semantic Web, Linked Data, Multilingualism in Information and Management of Knowledge. She has been the coordinator of the OntoGrid,

3. Benefits of a lexicon in linked data: macro-structure
Having placed LD in context, what are the actual benefits of creating or converting a lexicon to LD? The most evident advantage is that LD enables the integration with other external resources thanks to the semantic and syntactic interoperability achieved by the use of RDF and linguistic ontologies. Besides this fact and focusing on the lexicon itself, we have identified the following benefits in the course of our work towards the migration of language resources to LLD, some of them also highlighted in the literature.

Firstly, the entries of a dictionary become internally reusable (Klimek and Brümmer 2015) thanks to their URIs. This does not seem novel given that entries might already have numeric identifiers to point to each other, but the choice of transparent URIs, i.e. human-readable URIs which reflect the semantic content, and a suitable URI naming strategy play a crucial role (Bosque-Gil et al. 2016): the editor of a dictionary entry will be able to refer to another entry without the need to know its identifier in advance. Following this, the entry :lexiconEN/risk-n can be linked to :lexiconEN/risky-adj through a relation of morphologic derivation without the need of an identifier.
If, later on, SemSorGrid4Env, SEALS, In sum, the idea of representing lexical the noun risk occurs as an entry in another Interactivex and LIDER research information as a graph is not new, and LD dictionary of the same or a different family of projects, and she is currently taking are not presented as a novelty in this regard. dictionaries, the information can be integrated part in three European H2020 However, they allow for the implementation in a straightforward manner without relying research projects. She has also of networks or the integration of already on dictionary-dependant numeric IDs. available ones on the basis of a homogenous This in turn relates to a second advantage: participated in numerous research format. Thus LD meet the need for linking we no longer depend on the order of projects of the Spanish National lexical elements that were previously appearance of lexical entries or senses in Plan of Basic Research, Networks, isolated by using sets of relations and cross-references, which is usually indicated Special Actions and Technological elements that are defined externally and by a superscript in numeric form in printed Transference (ZENITH, Hundred, can be extended as required, relying or not or electronic format, e.g: bow2, meaning, Advances, Profit, etc) and directed on an underlying ontology of word senses. for instance, the second homograph of the multiple projects of national and This does not mean that LD is equivalent word bow. There are ways of keeping track international enterprises. 
She is the to any of the efforts mentioned above or of the order and the lexical entry to which head of the first node of the Open 2016 forms a better option to the structures in that position refers, but a change in the Data Institute in Spain and the which they are implemented, and, as said, original order of entries or the integration main researcher of the first research it does not make claims on the structure with other dictionaries in which the order of our mental lexicon. RDF is, however, differs would then require the update of project that uses IBM- in a a model to represent data worth taking all cross-references to any of the ordered Spanish university (2015). into consideration for lexicographic entries. Since entries and senses are now [email protected] projects aiming at the creation of lexical identifiable throughout the data and graphs networks because it provides a basis for are not actually ordered, cross-references can be direct pointers to the entry or sense 15 https://www.w3.org/TR/ to which they refer.
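The first two advantages can be illustrated with a minimal sketch in plain Python, with no RDF library involved. The base namespace, the slug rule and the property names "derivedFrom" and "definition" are assumptions made for this example only; a real conversion would take its properties from a vocabulary such as LexInfo. The point is simply that a URI computed from the lemma and part of speech can be referenced without looking up an identifier, and that two dictionaries which compute the same URI merge on the same node.

```python
# Minimal sketch: transparent, human-readable URIs as stable cross-reference targets.
# BASE, the slug rule and the "derivedFrom"/"definition" property names are
# assumptions made for this example, not part of any published vocabulary.
BASE = "http://example.org/"


def entry_uri(lexicon: str, lemma: str, pos: str) -> str:
    """Mint a URI such as http://example.org/lexiconEN/risk-n from lemma and POS."""
    slug = lemma.strip().lower().replace(" ", "_")
    return f"{BASE}{lexicon}/{slug}-{pos}"


# A graph is modelled as an unordered set of (subject, predicate, object) triples.
dictionary_a = {
    (entry_uri("lexiconEN", "risky", "adj"), "derivedFrom",
     entry_uri("lexiconEN", "risk", "n")),
}

# A second dictionary that also has an entry for the noun "risk" arrives at the
# same URI independently, so its statements attach to the same node.
dictionary_b = {
    (entry_uri("lexiconEN", "risk", "n"), "definition",
     "the possibility of loss or harm"),
}

merged = dictionary_a | dictionary_b


def statements_about(graph, node):
    """Return every triple in which the node appears as subject or object."""
    return sorted(t for t in graph if node in (t[0], t[2]))


risk = entry_uri("lexiconEN", "risk", "n")
for triple in statements_about(merged, risk):
    print(triple)
```

Because the cross-reference stores the target URI itself rather than a position such as "second homograph", reordering entries or merging further dictionaries leaves the pointer valid.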

The third advantage is intrinsically related to the first one, too: we can represent an “abstract” lexicon that gathers all the entries in a specific language. In other words, we can have a “pool” of lexical entries extracted from different dictionaries of the same or different type, monolingual or multilingual, without losing provenance information about which data comes from which dictionary. Thanks to an appropriate URI naming strategy, this pool of entries will grow dynamically (Gracia 2015) with each dictionary converted into RDF that has any information about an entry in that specific language.

If the approach mentioned above is applied in the conversion of multilingual dictionaries, for instance, Spanish-French and French-English, linking the French entries from the ES-FR dictionary with their corresponding entries in the FR-EN dictionary will bring us a fourth advantage: translation relations can be established through a language acting as a pivot (Villegas et al. 2016).

The fifth benefit concerns the onomasiological view that LD enables. The source dictionary has probably been compiled from a semasiological perspective, by putting the word at the center of attention and listing its different senses. Given the semantics by reference in LLD mentioned above, the synonyms of a word and the word itself will point to the same concept, which is modeled as a node in the graph and therefore has a URI. Accessing that node will allow us to see which words lexicalize it, i.e. putting the concept as our focus and traversing the graph from it to the lexical elements related to it. This way of thinking is well illustrated in the case of multilingual dictionaries in LLD, where we can see how a concept is verbalized in different languages. The potential is however no less interesting in monolingual dictionaries. In the authors’ work on the migration of language resources to LLD, definitions have been encoded at the level of the concept. Even though definitions can be fine-grained and are not presented in the form of keywords, SPARQL queries over them are feasible. For instance, we can search for concepts in whose definition the word sunrise occurs, which will yield the series of concepts that words like dawn, morning, daylight, etc. denote and which are semantically related, although these relations are not implemented. Through these concepts we could not only access dawn, morning, etc. but also their antonyms dusk, twilight, etc. Thus, by taking the concept as entry point we can get a set of concepts that are related but are not necessarily equivalent, which is not a trivial task when searching in a conventional online dictionary.

The sixth advantage is related to cross-references in the sense of any reference to another entry that might occur inside the lexicographic article: orthographical variants, synonyms, antonyms, genus terms, semantic types, etc. Not only are the entries reusable throughout the data (first advantage), but the pointers to them are now typed (Klimek and Brümmer 2015, McCracken 2015). This might not seem like an evident benefit to the user of online dictionaries, for whom the label antonym or a typographical mark may suffice, but typed properties allow users to perform queries that do not depend on the (proprietary) format of the data, and allow LD-aware systems to find any needed information. At the same time, by virtue of being defined in a public external vocabulary, e.g. LexInfo, the same properties can be reused in the conversion of other lexica of the same series into LLD, thus gaining interoperability. This responds to the need for standardization among the high number of heterogeneous annotation schemas, tagsets, and proprietary DTDs that are being used to create language resources. Furthermore, given that these vocabularies are extensible, new properties and individuals or classes can be added. If the hierarchy defined in a linguistic ontology is not compatible with the view other domain experts might have, new vocabularies can be created and aligned to the ones already available. As we experienced during our work on the conversion of dictionaries to LD, a detailed comparison of the elements (and their classification) present in external vocabularies with the proprietary data model of a company specialized in lexicography is actually a significant step towards the improvement, refinement or even reconsideration of the elements that configure that data model.

As the last paragraphs suggest, the concept of reusability lies at the heart of LD. If the enterprise of compiling a dictionary is seen through the looking glass of LD from the very beginning, it will affect the whole process. Decisions such as, for example, keeping independent lexical entries for an entry and its homographs will have to be considered from the point of view of lexicography (two words that share form but are not related etymologically could thus be regarded as independent entries) and LD. At the same time, how do we model homographs in a way that enables us to identify each entry but also to integrate content from another source when we do not know to which of the homograph entries it pertains? It will not be a matter of converting lexical data to LD, but of creating them from scratch in a reusable, interoperable and linguistically accurate way.

The Ontology Engineering Group (OEG), led by Prof. Asunción Gómez-Pérez, is based at the Computer Science School at Polytechnic University of Madrid (UPM). It ranks eighth among the two hundred research groups of UPM and is widely recognized in Europe in the areas of Ontology Engineering, Semantic Infrastructure, Linked Data, and Data Integration. Its main research areas are Ontological Engineering, Open Science, Data-driven Language Technologies, Data on the Web, and Data Science. The OEG was the coordinator of LIDER, a European project that promoted the creation of a linked data-based ecosystem of interlinked multilingual language resources to support content analytics tasks.
http://oeg-upm.net/
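The fifth benefit described in this section, taking the concept rather than the word as the entry point, can be sketched as follows. This is an illustration in plain Python that stands in for a SPARQL query over definitions; the concept identifiers, the definitions and the entry names are all invented for the example and do not come from any actual dataset.

```python
import re

# Illustrative data only: identifiers and definitions are invented for this sketch.
definitions = {
    "concept/dawn": "the first light of day, shortly before sunrise",
    "concept/morning": "the part of the day from sunrise until noon",
    "concept/dusk": "the darker stage of twilight, after sunset",
}

# (lexical entry, concept) pairs standing in for sense links between words and concepts.
senses = [
    ("lexiconEN/dawn-n", "concept/dawn"),
    ("lexiconEN/daybreak-n", "concept/dawn"),
    ("lexiconEN/morning-n", "concept/morning"),
    ("lexiconEN/dusk-n", "concept/dusk"),
]


def concepts_mentioning(word):
    """Concepts whose definition contains the given word as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return sorted(c for c, text in definitions.items() if pattern.search(text))


def lexicalizations(concept):
    """Lexical entries that point to the given concept."""
    return sorted(entry for entry, c in senses if c == concept)


# Step 1: search the definitions; step 2: walk from each concept to its words.
hits = concepts_mentioning("sunrise")
words = {concept: lexicalizations(concept) for concept in hits}
for concept, entries in words.items():
    print(concept, "->", entries)
```

The search returns concepts that are related without being equivalent, and each concept then leads to every word that lexicalizes it, which is the traversal direction a word-centred dictionary does not readily offer.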

4. Benefits of a lexicon in linked data: micro-structure
The previous section dwelt on the benefits of representing a lexicon as LD but did not go into the modeling of the information present in a single lexicographic article. As opposed to lexical systems (Section 2), this information (definitions, grammatical data, syntactic frames, etc.) is also modeled as a graph.

Ideally, everything in the lexicographic entry can be modeled as a node (McCracken 2015) but, in general, and on the basis of lemon-ontolex, the representation revolves around lexical entries (ontolex:LexicalEntry), concepts, the relation between entries and concepts reified as lexical senses (ontolex:LexicalSense), word forms (ontolex:Form), definitions, phonetic representations, register, syntactic frames, etc. Relations between nodes have a well-defined domain and range, and, with actual data, every node will be an instance of a class defined in an ontology. Following the lemon-ontolex model, the English entry cloud, with the sample URI lexiconEN/cloud-n, will have rdf:type ontolex:LexicalEntry, will denote as many skos:Concepts as senses or meanings it has, and the relation from the word to the concept will be encoded as ontolex:LexicalSense. Word forms (cloud, clouds) will be recorded at the ontolex:Form level, together with grammatical number information and phonetic transcription. Definitions, usage examples, etc. are likewise linked to the entry through edges and intermediate nodes.

On the one hand, one of the consequences of this configuration is that elements previously embedded in the lexicographic article become entry points in the graph and are no longer subsumed under any entry, since the hierarchy is lost. This implies that an idiom or collocation, for instance, will not be encapsulated under the container of the entry in which it was originally defined, but will be related to it with the suitable property. Since the idiom now becomes a node, we are able to link it to any other node from any other entry in the lexicon: like a cat on a hot tin roof could then be linked, for example, to the appropriate sense of cat, of hot and of roof, if desired, which will make it possible to access the idiom from any of those entries. Also, in the case of idioms and frequent collocations, we are creating new lexical entries that were not originally conceived as such in the lexicon. As lexical entries, they will also be linked to their corresponding skos:Concept(s), which brings us back to the possibility of an onomasiological perspective on the data.

On the other hand, thinking in terms of LD forces us to constantly question what the nature of the relation between two pieces of information is. An LD-native dictionary will require lexicographers to specify which kinds of relations between which types of elements will be encountered when modeling lexicographic articles. This brings us to the difference between compiling dictionaries with only the human as target, and creating them for (both humans and) computers. The fact that an XML tag, for instance, can occur at different levels in the dictionary entry (e.g. a geographical usage indication attached to a pronunciation vs. a geographical usage indication attached to a sense) seems straightforward enough for a human, but an NLP application needs to be able to distinguish between a description of a string (e.g. [kɑː] is the transcription of the British pronunciation of car) and the restriction on the usage of a sense (e.g. the floor with the meaning the floor above the ground level floor is only used in the UK). Modeling data as LLD thus entails a reflection on which information affects which elements, and which properties are the most suitable ones to be used in which case, taking all nuances and implicit human knowledge into account.

5. Conclusion and future lines of work
LLD emerge as a promising option to represent and publish current lexicographic projects and to serve as a structural backbone for undertaking new ones. They allow for the creation of an interoperable lexical network that is endowed with all the benefits that LD offers: data aggregation, easy discovery, LD-aware services compliance, improved data querying, sustainability and reusability. In this paper we have offered a brief overview of LLD, placing them in the context of lexical networks, and analyzing some of the benefits of the conversion of lexical data into LD in terms of macro- and microstructure. The modeling of lexicographic data to LLD poses challenges for which bridging the gap between LD experts and lexicographers is crucial. Moreover, the relation of LD to functional lexicography has not been explored to its full potential and, although there has been some work on RDF and OWL as building blocks for an architecture of mono- and plurifunctional dictionaries (Spohr 2011, 2012), this remains a challenging line of work, partly due to the increasing need for natural language interfaces for the Web of Data. However, current trends in LD-based NLP and in publishing language resources as LD, including lexical data, show that we will hopefully be getting there soon.

Acknowledgments
This work is supported by the Spanish Ministry of Economy and Competitiveness through the project 4V (TIN2013-46238-C4-2-R), the Excellence Network ReTeLe (TIN2015-68955-REDT), the Juan de la Cierva program, and the Spanish Ministry of Education, Culture and Sports through the Formación del Profesorado Universitario (FPU) program.
This contribution is inspired by the work towards the development of a linked data prototype for the Spanish dataset of the Global Series of K Dictionaries, carried out by the authors as part of the Linked Data Lexicography for High-End Language Technology Application (LDL4HELTA) project of Semantic Web Company and K Dictionaries.
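As an illustration of the micro-structure discussed in Section 4, the following minimal Python sketch treats an idiom as a node in its own right and reaches it from the entries whose senses it is linked to. The class names are those mentioned in the text; the individual URIs and the properties ex:hasSense, ex:concept and ex:relatedSense are invented for this example and are not claimed to be part of lemon-ontolex.

```python
# Minimal sketch of the micro-structure as a set of (subject, predicate, object) triples.
# Class names (ontolex:LexicalEntry, ontolex:LexicalSense, skos:Concept) come from the
# text; every individual URI and every ex:* property is an assumption for this example.
IDIOM = ":lexiconEN/like_a_cat_on_a_hot_tin_roof-idiom"

triples = {
    (":lexiconEN/cloud-n", "rdf:type", "ontolex:LexicalEntry"),
    (":lexiconEN/cat-n", "rdf:type", "ontolex:LexicalEntry"),
    (":lexiconEN/cat-n", "ex:hasSense", ":lexiconEN/cat-n-sense1"),
    (":lexiconEN/cat-n-sense1", "rdf:type", "ontolex:LexicalSense"),
    (":lexiconEN/cat-n-sense1", "ex:concept", ":concept/cat"),
    (":concept/cat", "rdf:type", "skos:Concept"),
    (":lexiconEN/roof-n", "rdf:type", "ontolex:LexicalEntry"),
    (":lexiconEN/roof-n", "ex:hasSense", ":lexiconEN/roof-n-sense1"),
    (":lexiconEN/roof-n-sense1", "rdf:type", "ontolex:LexicalSense"),
    # The idiom is an entry of its own, not a child of the entry "cat".
    (IDIOM, "rdf:type", "ontolex:LexicalEntry"),
    (IDIOM, "ex:relatedSense", ":lexiconEN/cat-n-sense1"),
    (IDIOM, "ex:relatedSense", ":lexiconEN/roof-n-sense1"),
}


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(predicate, obj):
    """All subjects of triples with the given predicate and object."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def idioms_reachable_from(entry):
    """Follow entry -> sense, then collect the idioms linked to that sense."""
    found = set()
    for sense in objects(entry, "ex:hasSense"):
        found.update(subjects("ex:relatedSense", sense))
    return sorted(found)


print(idioms_reachable_from(":lexiconEN/cat-n"))
print(idioms_reachable_from(":lexiconEN/roof-n"))
```

Since the idiom is no longer nested under one headword, the same lookup works from cat and from roof, and links to further entries can be added without restructuring anything.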

References
Bizer, C., T. Heath, and T. Berners-Lee. 2009. Linked data – the story so far. International Journal on Semantic Web and Information Systems, 5(3), 1–22.
Bosque-Gil, J., J. Gracia, E. Montiel-Ponsoda, and G. Aguado-de-Cea. 2016. Modelling multilingual lexicographic resources for the Web of Data: The K Dictionaries case. In Proceedings of GLOBALEX 2016 Workshop at the 10th Language Resources and Evaluation Conference (LREC 2016), Portorož, Slovenia.
Declerck, T., E. Wandl-Vogt, and K. Mörth. 2015. Towards Pan European Lexicography by Means of Linked (Open) Data. In Electronic lexicography in the 21st century: Linking lexical data in the digital age. Proceedings of the eLex 2015 conference, Herstmonceux Castle, UK, 342–355.
Fellbaum, C. (ed.). 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Fuertes-Olivera, P.A., and H. Bergenholtz. 2011. Introduction: The construction of Internet dictionaries. In Fuertes-Olivera, P.A., and H. Bergenholtz (eds.), eLexicography: The Internet, Digital Initiatives and Lexicography. London & New York: Continuum, 1–16.
Gader, N., V. Lux-Pogodalla, and A. Polguère. 2012. Hand-Crafting a Lexical Network With a Knowledge-Based Graph Editor. In Third Workshop on Cognitive Aspects of the Lexicon (CogALex III), Mumbai, India, 109–125.
Gracia, J. 2015. Multilingual dictionaries and the Web of Data. Kernerman Dictionary News, (23), 1–4.
Klimek, B., and M. Brümmer. 2015. Enhancing lexicography with semantic language databases. Kernerman Dictionary News, (23), 5–10.
McCracken, J. 2015. The Exploitation of Dictionary Data and Metadata. In P. Durkin (ed.), The Oxford Handbook of Lexicography. Oxford: Oxford University Press, 501–514.
McCrae, J., G. Aguado-de-Cea, P. Buitelaar, P. Cimiano, T. Declerck, A. Gómez-Pérez, J. Gracia, L. Hollink, E. Montiel-Ponsoda, D. Spohr, and T. Wunner. 2012. Interchanging lexical resources on the Semantic Web. Language Resources and Evaluation, 46(4), 701–719.
McCrae, J., C. Fellbaum, and P. Cimiano. 2014. Publishing and Linking WordNet using lemon and RDF. In Proceedings of the 3rd Workshop on Linked Data in Linguistics.
Mel’čuk, I. 1996. Lexical functions: a tool for the description of lexical relations in a lexicon. Lexical functions in lexicography and natural language processing, (31), 37–102.
Miller, G.A. 1995. WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39–41.
Montiel-Ponsoda, E., G. Aguado de Cea, A. Gómez-Pérez, and W. Peters. 2008. Modelling Multilinguality in Ontologies. In The 22nd International Conference on Computational Linguistics (COLING 2008), August 18-22, Manchester, UK, 67–70.
Polguère, A. 2012. Like a lexicographer weaving her lexical network. In Proceedings of CogALex-III Workshop of the 24th International Conference on Computational Linguistics (COLING 2012), December 8-15, Bombay, India, 1–4.
Polguère, A. 2014. From Writing Dictionaries to Weaving Lexical Networks. International Journal of Lexicography, 27(4), 396–418.
Spohr, D. 2011. A Multi-layer Architecture for “Pluri-monofunctional” Dictionaries. In Fuertes-Olivera, P.A., and H. Bergenholtz (eds.), eLexicography: The Internet, Digital Initiatives and Lexicography. London & New York: Continuum, 103–120.
Spohr, D. 2012. Towards a Multifunctional Lexical Resource: Design and Implementation of a Graph-based Lexicon Model. Lexicographica Series Maior (141). Berlin: de Gruyter.
Swanepoel, P.H. 1994. Problems, theories and methodologies in current lexicographic semantic research. In W. Martin et al. (eds.), Proceedings of the Sixth International Euralex Congress, Amsterdam, 11–26.
Villegas, M., M. Melero, N. Bel, and J. Gracia. 2016. Leveraging RDF graphs for crossing multiple bilingual dictionaries. In Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), Portorož, Slovenia.
Wandl-Vogt, E. 2015. How to innovate lexicography by means of research infrastructures: The European examples of DARIAH, CLARIN and COST IS 1305 ENeL [slides]. http://www.slideshare.net/ewv/how-to-innovate-lexicography-by-means-of-research-infrastructures/ (June 5, 2016).

LDL4HELTA
Linked Data Lexicography for High-End Language Technology Application (LDL4HELTA) is a 24-month EUREKA project (July 2015 – June 2017) within the framework of the Austria-Israel Bilateral R&D Agreement, carried out by Semantic Web Company (SWC, http://semantic-web.at/) and K Dictionaries (KD, http://kdictionaries.com/), with funding from the Austrian Research Promotion Agency (FFG) and the Israeli Office of the Chief Scientist (OCS). The aim is to combine multi-language lexical resources with semantic technologies expertise and develop new products and services for the international language technology market, in reply to the needs for language-independent, specific-language and cross-language solutions, to enable cross-lingual search and data management approaches. The main tasks consist of converting KD lexicographic data from XML to RDF, developing an API for enhanced data streaming and dissemination, and incorporating it in SWC’s PoolParty Semantic Suite (https://poolparty.biz/). The RDF modeling is designed by the Ontology Engineering Group of Universidad Politécnica de Madrid (UPM), which is involved also in the word sense disambiguation aspects. An advisory board consists of Christian Chiarcos (Goethe University, Frankfurt), Orri Erling (Google), Asunción Gómez-Pérez (UPM), Sebastian Hellmann (Leipzig University), Alon Itai (Technion, Haifa), and Eveline Wandl-Vogt (Austrian Academy of Sciences).
http://ldl4.com/

From dictionaries to cross-lingual lexical resources
Guadalupe Aguado-de-Cea, Elena Montiel-Ponsoda, Ilan Kernerman and Noam Ordan

1 Introduction
While the number of general resources that are connected as part of the linked open data paradigm increases, the need to relate and link linguistic data in multiple languages as a result of this trend has rocketed as well. The vision of a universe that allows linguistic information from different resources to be interlinked has attracted many scholars in search of “the magic wand” for solving the everlasting problem of the Tower of Babel, which now includes languages for machines in addition to human users. Currently, most linguistic resources are still in proprietary formats, making it difficult for them to be linked and to interoperate on the Web. To achieve that envisioned linked cloud of linguistic resources, several issues have to be addressed, from representation models to linking processes, from querying interfaces to dataset maintenance solutions.

Great advances in methodologies and techniques for the publication of linked data are laying solid foundations for turning independent databases into a boundless cloud where users can make queries in an integrated environment using dedicated, standardized querying languages, thus catering for interoperability as well as fostering univocity of the elements described. Linked data relies on the Resource Description Framework (RDF)¹ data model as the main mechanism applied to describe data. These data in turn are linked to other similarly modelled data, and ultimately retrieved and manipulated by using Web standards such as the SPARQL² query language.

Many language resources have seen the advantages of complying with this new paradigm, and are currently available as part of the Linguistic Linked Open Data (LLOD) Cloud³, a sub-cloud of the linked open data cloud that brings together linguistic resources formalized in RDF (from dictionaries and terminologies to metadata repositories and corpora). However, as in the case of the traditional Web, the LLOD is mainly English-oriented, though more non-English data sources are increasingly being published. As stated by Gracia et al. (2011), the new challenge is to overcome language barriers if we aim to attain a truly multilingual Semantic Web.

WordNet⁴ (Fellbaum 1998), for example, which is the most widely used lexico-semantic resource in English with more than 117,000 synsets (sets of synonyms that account for a concept), has recently taken on a new role in constructing the Semantic Web (Berners-Lee et al. 2001). The W3C draft RDF/OWL Representation of WordNet⁵ has defined URIs for the synsets covered by the WordNet lexical database. Many other efforts have been devoted to linking WordNet to other resources. McCrae et al. (2012) used WordNet together with Wiktionary as a case study of the possible transformation of lexical resources into linked data compatible formats. In McCrae et al. (2014), the authors provide RDF-compliant WordNet with links to other lexical resources, such as VerbNet⁶, Lexvo⁷ or lemonUby⁸.

As for multilingual linguistic resources which are part of the current LLOD cloud, it is worth mentioning IATE RDF⁹ (Cimiano et al. 2015), AGROVOC¹⁰ and EUROVOC in SKOS¹¹, or the APERTIUM¹² series of bilingual dictionaries (all of which are navigable and searchable from Datahub¹³). Several chapters of DBpedia¹⁴ are now available in different languages, as well as some language versions of EuroWordNet (the Basque¹⁵ and Catalan¹⁶ versions present a case in point). However, what still remains a challenging issue is the flawless linking of complementary resources in different natural languages. By complementary resources, we refer to resources that deal with the same (or closely related) parcels of knowledge, be it general or domain-specific knowledge, whose metadata descriptions as well as actual data are in different natural languages. In this sense, we argue that Semantic Web approaches and technologies are ripe enough to offer viable solutions to the linking issue in a principled manner.

Our objective in this contribution is to report on our experience in modelling the linked data version of the Spanish set of the K Dictionaries (KD) multi-language Global Series that will serve to transform a multilingual dictionary into a cross-lingual lexical resource. We would like this to set the ground for discussion to define open issues for the linkage of lexical data in multiple languages, and some solutions are suggested on the basis of the de facto standard lemon-ontolex model¹⁷, initially designed to serve as an interface between an ontology and the natural language descriptions that lexicalize the knowledge represented in it, and currently widely adopted for exposing linguistic resources as linked data. Specifically, we describe how multilingual information in the RDF version of KD’s dataset has been represented according to the vartrans module, a lemon-ontolex module for representing translations and term variants, and how this could contribute to enhancing interoperability among the different language versions of the Global Series.

The paper is further structured as follows. In the next section we refer to the background and motivation, i.e. approaches to linking multilingual lexical and/or conceptual resources. Section 3 introduces the KD approach and Section 4 presents the formal solution we have adopted for its Spanish dataset in the linked data model, specifically, the lemon-ontolex vartrans module. The actual modeling of the Spanish dataset from the XML proprietary format of the dictionaries is spelled out in Section 5. In Section 6 we list some advantages of complying with this or similar formalisms in the context of the linked data paradigm, and our conclusions are presented in Section 7.

1 https://www.w3.org/standards/techs/rdf#w3c_all/
2 http://www.linkeddatatools.com/querying-semantic-data/
3 http://linguistic-lod.org/llod-cloud/
4 https://wordnet.princeton.edu/
5 https://www.w3.org/TR/wordnet-rdf/
6 http://verbs.colorado.edu/~mpalmer/projects/.html/
7 http://www.lexvo.org/
8 http://lemon-model.net/lexica/uby/
9 https://datahub.io/es/dataset/iate-rdf/
10 https://datahub.io/es/dataset/agrovoc-skos/
11 https://datahub.io/es/dataset/eurovoc-in-skos/
12 https://datahub.io/es/dataset/apertium-rdf/
13 https://datahub.io/
14 http://linghub.lider-project.eu/datahub/ /
15 http://linghub.lider-project.eu/datahub/basque-eurowordnet-lemon-lexicon-3-0/
16 http://linghub.lider-project.eu/datahub/catalan-eurowordnet-lemon-lexicon-3-0/
17 https://www.w3.org/community/ontolex/wiki/Final_Model_Specification/

Guadalupe Aguado-de-Cea is Professor at Universidad Politécnica de Madrid (UPM). She received both her MSc in Translation and PhD in English Philology from Universidad Complutense de Madrid, and has been a member of the Ontology Engineering Group at UPM since 1996. Her current research activities include, among others: terminology and ontologies, the representation of lexical knowledge in ontologies, multilinguality in linked data, specialized languages as well as the linking between the ontological field and the natural language field, especially in its application to the Semantic Web. She has participated in several standardization projects, such as the Ontology Lexica Community Group (Ontolex) in the W3C, in particular regarding the representation of translation relations among languages with a view on the multilingual Web. She is the President of the Spanish Association for Terminology, and convenor of the AENOR CTN_191 Terminology Committee, the corresponding Spanish Committee of ISO TC 37.

Elena Montiel-Ponsoda has been Associate Professor at the Applied Linguistics Department at Universidad Politécnica de Madrid (UPM) since 2012, and a member of the Ontology Engineering Group since 2006. She received her PhD in Applied Linguistics from UPM in 2011. Her research interests are at the intersection between translation (and terminology) and knowledge representation, including among others: ontology localization and lexicalization, lexico-syntactic patterns for ontology development, functional models for deep semantics analysis, and linguistic linked data for content analytics. She is currently working on the representation of lexical resources according to the linked data paradigm, specifically, on how translation relations can help in the construction of the multilingual Web of Data.

2 Background and motivation
When approaching this issue in the Semantic Web field, it is inevitable to refer to a former, much older discussion on how to bring together lexicons in different languages and how to solve discrepancies resulting from the idiosyncratic categorization of each language system/culture, some of which are reflected in the way different linguistic features, such as gender, pronouns or classifiers, are encoded (cf. Fellbaum and Vossen 2007). One of the resources that best materializes (and tries to solve) this problem is EuroWordNet (Vossen 1998), and subsequently derived projects such as MultiWordNet (Pianta et al. 2002). Broadly speaking, such databases connect wordnets or lexicons in different languages via a core set of categories, the so-called Interlingual Index, based on Princeton WordNet (Miller 1995). In the case of EuroWordNet there is an implicit bias towards English synonym sets which allegedly stand for concepts realized lexically by lexical items in different languages, and, in the case of MultiWordNet, the bias is more explicit, because the English WordNet is literally translated into the various languages, and gaps are declared by free translations that stand for those concepts, allowing linked concepts/synsets to percolate through the gaps.

In this regard, we would argue that different language-culture couplings (we see this as a binomial) can exhibit different levels of granularity when representing and categorizing knowledge. Even among culturally-related languages, such as Italian and English, it has been shown that a medium-sized dictionary of English to Italian contains around 7.8% lexical gaps, where there is no equivalence and a free translation is needed to fill the gap (Bentivogli and Pianta 2000). Therefore, and in order to address these issues, the Global WordNet Grid (Fellbaum and Vossen 2007; Vossen et al. 2016) initiative aims at providing a platform for centralising all wordnets and their linkage, and coordinating the inclusion of new concepts for multiple languages. As such, this latter approach represents an important step towards a more principled solution to the multilingual (still unresolved) issue.¹⁸

18 cf. http://compling.hss.ntu.edu.sg/omw/

Another approach that also builds on WordNet, but which was born in the Semantic Web era, is BabelNet (Navigli and Ponzetto 2012). This is a semantic network and ontology that aims at bringing together words and terms in different languages, from various resources, which refer to the same concept, with the objective of serving as valuable sources of translation or equivalent relations. According to Moro and Navigli (2015), in BabelNet it is possible “to find the concept medicine (bn:00054128n), which is represented by both the second word sense of medicine in WordNet and the Wikipedia page Pharmaceutical drug, among others, together with synonyms such as drug and medication in English and lexicalizations in other languages, such as farmaco in Italian and medicamento in Spanish”. In this way, BabelNet combines the general-specific approach taken from WordNet with the specific knowledge extracted from Wikipedia (and other resources, e.g. OmegaWiki). As for the English-language bias issue, it is probably propagated to this resource, since WordNet is taken as a starting point. However, it can also be reduced, because of the use of Wikipedia entry pages for categories not initially included in the original WordNet.

Apart from acknowledging the great value of such a resource, we have also spotted some flaws that will undoubtedly be

each is represented on its own terms, and only at a later phase it is translated to another, creating a pair-specific, and thus pair-sensitive, interlingual representation.

The outset of each language dataset in this series concerns mapping its components to identify, categorize and interlink them, including semantic and grammatical information. Each language core then serves as a base for adding translation equivalents in other languages and developing bilingual and multilingual versions. All the different language datasets share the same common methodological framework and technical infrastructure. The entries in the different languages also have the same microstructure, which still enables each one to convey its peculiarities. The data is structured in XML format and is currently being modeled in RDF. The French dataset, for instance, has the most extensive multilingual reach so far with 18

Ilan Kernerman is CEO of K Dictionaries, leading its lexicographic development and international cooperation.
He solved in the future, and which are probably language pairs, the German lexical dataset edits and publishes Kernerman due to automating the linking process. For groups 8 more languages, Spanish has 7, Dictionary News, co-edited and instance, some synsets contain words that Japanese – 7, English – 6, Norwegian – 6, published two collections of belong to different categories. An example etc. Now that several language sets have conference papers (1998, with is the synset for paella (typical Spanish rice become so lexically rich, they are ripe to Tom McArthur, and 2010, with dish), which also includes the pan used to start networking with each other, such Paul Bogaards), and is associate cook it. As for the translations in BabelNet, as by connecting L2 translations to their editor of Lexicography – Journal when different options are offered, we corresponding entries in the L1 lexical of Asialex and guest co-editor of would suggest that additional information dataset and from there on to translations is required, such as confidence scores in other languages, and so on. the special IJL issue on bilingual associated to the proposed translation, As explained in the introduction, we learners’ dictionaries (2016, with pragmatic restrictions (for instance, the reflect here on some interesting issues Arleta Adamska-Sałaciak). His frequency with which a word in language spotted when transforming the Spanish interests include multilingual and A is translated with the proposed equivalent lexical core of the Global dataset, focusing pedagogical lexicography, and in language B), or directionality of the on multilingual ones. We leave aside the interoperability with NLP and translations. Means such as these would methodology followed in the modeling part, knowledge systems. Currently he positively contribute to enhance this which has been described in greater detail is president of Asialex (2015-2017) resource’s functionality. in Bosque-Gil et al. 
(2016a and 2016b), and and on the preparatory board of All in all, and although many advances move on to the resulting representation of Globalex. have been made in the alignment and translations in the proposed model. [email protected] linking of resources in different languages, it is still necessary to cater for certain 4 lemon-ontolex at a glance: The aspects in order to make the most of the vartrans module multilingual information contained in such In order to link and represent the linguistic resources. data included in KD’s Global Spanish dataset we relied on the lemon-ontolex 3 The K Dictionaries approach vartrans module. It presents wide The dictionary data used as input in this possibilities to link lexical senses and research belong to the Global Series of K variants in different languages from the Dictionaries (KD)19. KD is a technology- same or different data sets. As shown in oriented-content creator that specializes in Figure 1, the lexico-semantic generic class developing pedagogical and multilingual addresses the relation between two lexical 2016 lexicographic data. In 2005 it launched the entries or two lexical senses. This relation Global Series, which today includes lexical is established by means of two properties: resources for 24 languages. The approach lexicalRel and senseRel. Thus, followed in this series is to compile for each lexicalRel relates two lexical entries language a core vocabulary as a standalone that are grammatically or stylistically project, and have it translated to other connected, such as acronyms, derivatives languages in more projects. In other words, and other forms. there is no bias towards any language, The second class, senseRel, represents the relation between two senses whose

meanings are related. Not only can lexico-semantic relations, such as synonymy, antonymy or hypernymy-hyponymy, be represented in this way, but also term variants and translations. The purpose of such a representation is to account for two lexical senses of terms (in the same or different language) that are semantically related in the sense that they can be exchanged in most contexts, but whose surface forms are not directly related. Additionally, other types of semantic and pragmatic information, such as dialectal, registerial, chronological, discursive, and dimensional variation, can also be captured by senseRel.

19 http://kdictionaries.com/

Noam Ordan studied translation, linguistically and computationally, and completed his PhD at Bar Ilan University under the supervision of the late Miriam Schlesinger. He has published extensively, in particular on automatically identifying translated texts and statistical […], worked as researcher and teacher in universities in Israel and Germany, and took part in various projects in the industry. Currently he coordinates research innovation at K Dictionaries and designs algorithms for using human-crafted lexicographic data for computational tasks, such as cross-lingual information retrieval. Dr Ordan also serves as an adjunct teacher at the English Language Department at the Arabic Academic College in Haifa.
[email protected]

5 Modelling multilingual entries in the KD data with vartrans

The starting point in the transformation of the multilingual information (translations) contained in KD’s Global Spanish dataset was a ’Translation cluster’ that encompassed a set of translations for the original Spanish lexical entry, including syntactic-semantic and pragmatic information about the translations (e.g. grammatical gender), and usage examples of the headword (commonly a short phrase), as well as translations of those examples.

See Example 1 for the XML encoding of the headword acalorado (heated), which contains a synonym, namely, agitado (lively or passionate), a definition, que es muy animado (of a discussion or debate, that is heated), and translations into Dutch (verhit and vurig) and Norwegian (ivrig, oppsatt, and opphetet). Moreover, this sense of acalorado is complemented with a usage example (una sesión acalorada), and its equivalents in Dutch (vurige zitting) and Norwegian (et opphetet møte), respectively, are all included in the ExampleCtn type and identified by means of a translation cluster identifier given in the XML, TC00001664.

According to lemon-ontolex, a dictionary entry or headword in the KD set is modeled as an ontolex:LexicalEntry and its corresponding ontolex:LexicalSense and skos:Concept, as can be seen in Figure 2. Then, according to the vartrans module, synonym relations are modeled as relations between lexical senses that point to (ontolex:reference) the same concept (skos:Concept). Thus, for example, the lexical entry for the headword acalorado is linked to its corresponding sense and concept, and an artificial sense is created for the synonymous lexical entry agitado, so that a sense relation of the type synonymy can be established between them. Should agitado also have its own headword in the dictionary, a link could be established between the lexical senses later on, or lexical senses could be merged. Both lexical senses refer to the same skos:Concept, and a definition is also attached to the latter.

Similarly, translations are modeled as relations among lexical senses. Again, if we analyze Figure 2, the lexical sense for the entry in the source language (acalorado) is available, and the sense for the target language (verhit) has to be artificially created, since no pointer to that entry in other dictionaries is provided in the XML data (once the Dutch and Norwegian datasets are converted to RDF, these entities can support the automatic linking and growth of both datasets). The usage examples that accompany the senses are represented by means of the property skos:example and the class kd:UsageExample. Moreover, examples of usage are commonly translated into other languages and grouped by the kd:TranslationExampleCluster, a grouping made in the original datasets and maintained here.

The modeling solution proposed by the vartrans module for representing a translation relation by means of a reified class instead of a property or relation facilitates the further description of the translation object. In this sense, translationSource and translationTarget can be further specified, as done for the current version of the KD Spanish set. Also, other features that describe a certain translation relation could be added. For example, a confidence

value can be assigned to the translation pair if available. A context could be determined to restrict the validity of the translation pair and differentiate it from other possible translations of the original entry into the target language. In fact, if we consider the usage examples available for acalorado in the XML dataset, una sesión acalorada (a heated session) has been translated into Dutch as vurige zitting, and not as verhite zitting, which was the synonym provided. The same happens with the Norwegian alternatives: the phrase is translated as et opphetet møte, and as learners of Norwegian we may wonder if the other two synonyms offered for opphetet, namely, ivrig and oppsatt, can be interchangeably used in […]

Figure 1. Classes and properties in the vartrans module

Example 1: XML with the translations in Dutch and Norwegian of the Spanish headword acalorado sense of heated
[XML listing not recoverable; surviving values: agitado; que es muy animado; verhit; vurig; ivrig, oppsatt, opphetet; sesión acalorada; vurige zitting; et opphetet møte]

Additionally, we may want to specify the type of translation relation that exists between a pair of translation equivalents. Gracia et al. (2014) propose a classification of translation equivalents into three types: direct equivalents (lexical entries in the translation pair that are semantically equivalent), cultural equivalents (lexical entries that are not semantically equivalent, but are pragmatically so), and lexical equivalents (the target lexical entry – or translation equivalent – verbalizes the original entry in the target language but is not a semantic or pragmatic equivalent). For more details we refer the interested reader to the above-cited paper.

Therefore, apart from specifying the origin and target of the translation pair, the other descriptions that could further enrich the information related to […] were not available in the original source and have not been […] does not mean such descriptions could not be added or imported from another resource that contains data in that respect. In fact, this is one of the main benefits of adopting the linked data paradigm, namely, being able to link to resources containing complementary information.

6 Advantages of cross-lingual lexical resources

Our reflections in this paper are made to point out some advantages of linking multilingual datasets with the aim of getting the most out of the multilingual data value chains in the cloud of linked data. We argue that the linked data representation formalism offers an innovative way of bringing together resources in which either the vocabularies or models, or the data itself, are described in different natural languages, contributing to the construction of a truly multilingual Semantic Web. The challenge here is to account as comprehensively as possible for the specifics of each language taken individually while at the same time representing links with meaningful labels across languages within a multilingual graph.

In the specific case of the lexical resources under examination, we argue that by representing translations as links between lexical senses (and, in turn, lexical entries), whenever new datasets that contain information in the target languages are also represented according to this paradigm, links will be flawlessly established. As already mentioned in previous sections of this paper, once the different datasets of the Global Series are available in RDF, links will be established among the different entities, contributing to an automatic growth of the resources. If we take the example of KD’s Global Spanish dataset, since it contains translations into Brazilian Portuguese, Dutch, English, Japanese, and Norwegian, it is reasonable to assume that, relying on those translations, links will be easily created among the different datasets.

Although this is still a visionary

Figure 2. Modeling of a KD multilingual entry with lemon-ontolex

concept, representing lexical resources according to this approach will enable the emergence of a cross-lingual graph in a bottom-up fashion. This will maintain the distributed fashion of the linked data graph, and datasets will be easily connected, disconnected or contextualized for specific users and uses.

Contrary to the approaches described in state-of-the-art projects within the Global Grid initiative, we believe no common set of concepts or intermediary conceptualization would be needed to establish cross-lingual relations, but links would emerge among datasets at a different pace. Put differently, instead of relying on a common conceptualization to act as intermediary, the burden of the cross-lingual connection would be carried by the links.

At a monolingual level, since the relation between synonyms or terminological variants has also been reified in the TerminologicalRelation class, we could also determine precisely if a certain synonym or term is used in a specific context, or if all the synonyms related to the same concept can be interchangeably used. In the example of the BabelNet medicine concept mentioned in Section 2, we could identify accurately the specific uses of medicine versus Pharmaceutical drug, drug or medication. Are they used in the same contexts? Which is the most appropriate translation for medicamento in Spanish in an informal setting?

This is also specifically relevant in those cases in which complex linguistic descriptions are associated with conceptual structures. Let us consider the example of biosanitary waste, in general, and hospital waste, only for the waste produced in hospitals. If the difference between these concepts is established at the conceptual level, the two terms will most probably be associated with two different concepts. Conversely, if only one concept is represented in the ontology, we may still want to account for both terminological variants in the linguistic model, and explicitly state the motivation behind each denomination. In this way, we would also facilitate the linking of this data source to another data source contained in a different dataset and to which only the term biosanitary waste has been associated.
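The idea that cross-lingual links can emerge from the translations themselves, with the source sense acting as the connecting point, can be illustrated with a minimal sketch. It uses plain Python rather than RDF, and it is not the KD conversion pipeline: the `Translation` record, the `pivot_links` helper and the sense identifiers such as `es:acalorado` are hypothetical names chosen for illustration. Only the headword and its Dutch and Norwegian translations come from Example 1; the optional `category` and `confidence` fields stand in for the kind of extra description a reified translation could carry, and are left empty because the source data does not provide them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Translation:
    """A reified translation relation between two lexical senses (illustrative)."""
    source: str                         # hypothetical sense id, e.g. "es:acalorado"
    target: str                         # hypothetical sense id, e.g. "nl:verhit"
    category: Optional[str] = None      # e.g. "direct", "cultural", "lexical"; not in the KD source
    confidence: Optional[float] = None  # hypothetical score; not in the KD source


# Translations of the Spanish headword "acalorado" as listed in Example 1.
TRANSLATIONS = [
    Translation("es:acalorado", "nl:verhit"),
    Translation("es:acalorado", "nl:vurig"),
    Translation("es:acalorado", "no:ivrig"),
    Translation("es:acalorado", "no:oppsatt"),
    Translation("es:acalorado", "no:opphetet"),
]


def language(sense_id: str) -> str:
    """Return the language prefix of a sense id such as 'nl:verhit'."""
    return sense_id.split(":", 1)[0]


def pivot_links(translations, lang_a, lang_b):
    """Pair senses in lang_a and lang_b that translate the same source sense."""
    by_source = defaultdict(list)
    for t in translations:
        by_source[t.source].append(t.target)
    links = set()
    for source, targets in by_source.items():
        for a in targets:
            for b in targets:
                if language(a) == lang_a and language(b) == lang_b:
                    links.add((a, source, b))
    return sorted(links)


candidates = pivot_links(TRANSLATIONS, "nl", "no")
for dutch, pivot, norwegian in candidates:
    print(f"{dutch} <-{pivot}-> {norwegian}")
```

With two Dutch and three Norwegian translations the sketch yields six candidate Dutch-Norwegian pairs. They are candidates only: as the usage example una sesión acalorada shows, not every equivalent fits every context, which is where a context restriction or confidence value attached to the reified relation would help to filter them.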

7 Conclusions

Following the experiences in this project we can claim that the publication of lexical and terminological resources as linked data will result in an enriched unified graph of lexical entries, senses and translations on the Web. Consequently, more information (additional notes, glosses, descriptions) will be retrieved by querying the linked data resources by means of SPARQL queries. Moreover, they could be enriched with pictures, audio, and the like, as has been successfully implemented in BabelNet, for example. However, having stated the benefits of linking linguistic resources, and more specifically the advantages of this initiative when applied to multilingual lexical resources, we are also aware of the challenges that still need to be tackled and that have been discussed in Section 6.

Acknowledgements

This work is supported by the 4V Spanish National Project (TIN2013-46238-C4-2-R), the Spanish Excellence Network ReTeLe (TIN2015-68955-REDT), and the LDL4HELTA project under the EUREKA program.

This paper was presented at META FORUM 2016 in Lisbon, Portugal on 5 July 2016.
http://meta-net.eu/events/meta-forum-2016/

Elena Montiel-Ponsoda at META

References

Bentivogli, L. and Pianta, E. 2000. Looking for lexical gaps. In Proceedings of the Ninth EURALEX International Congress, EURALEX 2000, 663-669.
Berners-Lee, T., Hendler, J. and Lassila, O. 2001. The Semantic Web. Scientific American, May 2001, 29-37.
Bosque-Gil, J., Montiel-Ponsoda, E., Gracia, J. and Aguado-de-Cea, G. 2016a. Terminoteca RDF: a Gathering Point for Multilingual Terminologies in Spain. 12th International Conference on Terminology and Knowledge Engineering (TKE 2016).
Bosque-Gil, J., Gracia, J., Montiel-Ponsoda, E. and Aguado-de-Cea, G. 2016b. Modelling multilingual lexicographic resources for the Web of Data: The K Dictionaries case. In Globalex 2016 Lexicographic Resources for Human Language Technology.
Cimiano, P., McCrae, J., Rodríguez-Doncel, V., Gornostay, T., Gómez-Pérez, A. and Simoneit, B. 2015. Linked Terminology: Applying Linked Data Principles to Terminological Resources. In Proceedings of the 4th Biennial Conference on Electronic Lexicography.
Fellbaum, C. (ed.). 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Fellbaum, C. and Vossen, P. 2007. Connecting the Universal to the Specific: Towards the Global Grid. In Proceedings of the First International Workshop on Intercultural Collaboration (IWIC 2007), Kyoto, Japan, January 25-26.
Gracia, J., Montiel-Ponsoda, E., Cimiano, P., Gómez-Pérez, A., Buitelaar, P. and McCrae, J. 2011. Challenges for the multilingual Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web, 11, 63-71.
Gracia, J. 2015. Multilingual dictionaries and the Web of Data. Kernerman Dictionary News, (23), 1-4.
McCrae, J., Cimiano, P. and Montiel-Ponsoda, E. 2012. Integrating WordNet and Wiktionary with lemon. In C. Chiarcos, S. Nordhoff and S. Hellmann (eds.), Linked Data in Linguistics: Representing and Connecting Language Data and Language Metadata. Heidelberg & New York: Springer, 25-34.
McCrae, J., Fellbaum, Ch. and Cimiano, P. 2014. Publishing and Linking WordNet using lemon and RDF. In Proceedings of the 3rd Workshop on Linked Data in Linguistics.
Miller, G. A. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11), 39-41.
Montiel-Ponsoda, E., Bosque-Gil, J., Gracia, J., Aguado-de-Cea, G. and Vila-Suero, D. 2015. Towards the integration of multilingual terminologies: an example of a linked data prototype. TIA 2015, Granada, Spain.
Moro, A. and Navigli, R. 2015. SemEval-2015 Task 13: Multilingual All-Words Sense Disambiguation and Entity Linking. In Proceedings of SemEval 2015, Denver, Colorado, June 4-5, 2015, 288-297.
Navigli, R. and Ponzetto, S. P. 2012. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 193, 217-250.
Pianta, E., Bentivogli, L. and Girardi, C. 2002. Developing an aligned multilingual database. In Proceedings of the 1st International Conference on Global WordNet.
Vossen, P. 1998. A multilingual database with lexical semantic networks. Dordrecht: Kluwer Academic Publishers.
Vossen, P., Bond, F. and McCrae, J. 2016. Toward a truly multilingual Global Wordnet Grid. In Proceedings of the 8th Global Wordnet Conference (GWC 2016).

Interns @ KD 2015-2016
● Universitat Jaume I, Castelló
Lucia Belles Calvera. Lidia Gallen Martinez. Miriam Martinez Garcia. Catalan-Spanish index and English-Catalan dictionary
● KU Leuven, Antwerp
Rhiannon Telery Hincks. English-Welsh dictionary
Zrinka Knezovic. Croatian-English index
Liubava Panchenko. Ukrainian-English index
● Université de Lorraine, Nancy
Pauline Pierrot. French dictionary

Adam Kilgarriff Prize

At last year’s eLex conference in Herstmonceux Castle (UK), almost every paper and poster included at least one reference to Adam Kilgarriff’s enormous body of work – a vivid demonstration (if any were needed) of Adam’s extraordinary impact on the fields he had worked in. A group of us met there to discuss setting up a prize in honour of our dear friend and gifted colleague, who died in May 2015. We are now pleased to announce the launch of the Adam Kilgarriff Prize, which will be awarded every two years, in conjunction with the eLex conference series. The Prize is aimed at younger researchers and is intended to recognise outstanding work in any of the fields which Adam enriched with his remarkable intellect and original thinking.

Adam Kilgarriff at Euralex 2010

Almost uniquely, Adam was a major figure in three quite distinct communities: natural language processing (NLP), lexicography, and corpus linguistics. He was an enthusiastic, insightful, and prolific contributor to each of these fields, but perhaps his best work straddled all three, and few people have had such a profound impact on the practice of contemporary lexicography in particular. Through numerous collaborations with dictionary makers, Adam brought to bear his NLP skills and can-do approach to provide elegant solutions to many of the challenges which lexicographers face day to day. Issues such as word sense disambiguation, corpus building, and headword-list development all engaged Adam’s attention – and lexicography is the richer for his interventions. In many cases, he proposed a software solution, and this led to the development of tools such as the GDEX (good example) algorithm, now widely used (in several languages) as a computational shortcut for the process of finding in a corpus the most appropriate example sentences and phrases for a dictionary.

During a research project at Brighton University in the late 1990s, Adam conceived – with his co-researcher David Tugwell – the notion of a Word Sketch. This would provide a one-page overview of a word’s most typical behaviour, summarizing the most frequent and significant ways in which it would combine with other words in text. An experimental version was used during the development of Macmillan English Dictionary for Advanced Learners (2002), and before long Word Sketches had become an essential resource in the lexicographer’s toolbox. Harnessing Word Sketch technology to a powerful concordancer led to the birth of the Sketch Engine. Under Adam’s leadership, this suite of corpus-analysis tools was continuously improved and enhanced, to become an industry-standard package for dictionary publishers as well as for other linguistic undertakings worldwide.

There is much more, and this short account can hardly do justice to Adam’s amazing achievements. It is hard to believe that one individual could have done so much in such a short lifetime, and we hope that the Adam Kilgarriff Prize will be a fitting memorial to Adam’s life and work.

Details of the Prize – and how to apply for it – can be found at: http://kilgarriff.co.uk/prize/.

Michael Rundell
Chair of Trustees, Adam Kilgarriff Prize

K DICTIONARIES LTD 8 Nahum Hanavi St. Tel Aviv 6350310 Israel ı Tel +972-3-5468102 ı [email protected] ı http://kdictionaries.com