Kernerman DICTIONARY News (kdictionaries.com/kdn): Towards Globalex

Kernerman DICTIONARY News, Number 24 ● July 2016 (kdictionaries.com/kdn)

Towards Globalex

The GLOBALEX Workshop on Lexicographic Resources and Human Language Technology (http://ailab.ijs.si/globalex/) took place as part of LREC 2016 at Portorož, Slovenia, on May 24 and constituted the first live step in forming an overall global constellation for lexicography. The initiative was launched nine months earlier at a meeting held during the fourth eLex conference in the UK in August 2015, and has drawn the support of lexicographic associations worldwide.

The full-day workshop was sponsored by the associations for lexicography of Africa, Asia, Australasia, Europe and North America (Afrilex, Asialex, Australex, Euralex, DSNA), and by the eLex conference series on electronic lexicography in the 21st century. It set out to explore standards for lexicographic resources and their incorporation into new language technology and other solutions as part of knowledge systems and collaborative intelligence.

The workshop was attended by about 60 participants, comprised 16 twenty-minute sessions, and concluded with a roundtable about the future of Globalex.

The core idea of Globalex is to work on lexicography in global contexts and to bring together different segments that operate on their own, whether on a regional, topical or any other level, so that they cooperate.

It is hoped that Globalex can facilitate knowledge sharing and cooperation among its members and with others concerned with language and language technology, promote the creation, research, exchange, dissemination, integration and usage of lexicographic resources and solutions, and enhance interoperability with academia and industry worldwide.

The roundtable featured short interventions by a representative of each organization, including one by video and another by Skype, presenting their association and their vision of Globalex, followed by a discussion with the audience. The main issues were the aims of Globalex and the obstacles facing it, as well as its organization, operation and meetings.

The conference models discussed ranged from dedicating a section to Globalex at the continental conferences and alternating Globalex conferences with those of the different associations, to holding Globalex conferences on their own every few years.

The organizers have agreed to contribute to the new Globalex website, http://globalex.link/, which begins operation this month. More details appear on page 4, and a reprint of Towards Peoplex, from 1997, is available on page 18.

Ilan Kernerman

1 Towards Globalex | Ilan Kernerman
2 Lexicography associations: Afrilex, Asialex, Australex, DSNA, eLex, Euralex
4 GLOBALEX 2016 workshop summary and next steps
5 XVII Euralex International Congress & The Lexicographic Centre at Tbilisi State University | Tinatin Margalitadze
6 Asialex 2017 in Guangzhou | Hai Xu
6 Nineteenth-Century Lexicography Conference, 2018
7 Chinalex and lexicographic activity in China | Yihua Zhang
12 Treatment of entries with Chinese characteristics in English learner’s dictionaries: A case study of Oxford Advanced Learner’s Dictionary 8e | Lixin Xia and Langwei Zhai
16 Lexicography at the Society for Danish Language and Literature | Lars Trap-Jensen
18 Towards Peoplex (reprint) | Ilan Kernerman
19 Linked data in lexicography | Julia Bosque-Gil, Jorge Gracia and Asunción Gómez-Pérez
25 From dictionaries to cross-lingual lexical resources | Guadalupe Aguado-de-Cea, Elena Montiel-Ponsoda, Ilan Kernerman and Noam Ordan
32 Adam Kilgarriff Prize | Michael Rundell

© 2016 All rights reserved.
K DICTIONARIES LTD
8 Nahum Hanavi Street
Tel Aviv 6350310
Israel
+972-3-5468102
http://kdictionaries.com
Editor | Ilan Kernerman
ISSN 1565-4745

The African Association for Lexicography (Afrilex) was established in 1995, after a feasibility study for a lexicographical institute for Southern Africa indicated a keen interest in a unifying body among lexicographers and members of related professions. Dr Reinhard R.K. Hartmann chaired the inaugural meeting and officially announced the birth of a new member of the Lex family.

Afrilex is managed by a Board elected biennially by the members present at a General Meeting of the association. Membership is open to individuals and institutions with an interest in lexicography. The current membership stands at 60 individuals and 8 corporate members. The Board consists of the president, vice-president, secretary, treasurer, four non-officers and the conference convener.

The aims of Afrilex include the promotion and coordination of research, study and teaching of lexicography by publishing a journal, Lexikos, and other appropriate literature; by organizing regular conferences and seminars that offer opportunities for the exchange of ideas and for mutual stimulus to researchers and practitioners in the field of lexicography; and by facilitating participation in tutorials and training courses.

Afrilex seeks to develop cooperation with other international associations for lexicography as well as with local associations interested in the study of language.

The 21st annual International Conference of Afrilex takes place in July 2016 in Tzaneen, South Africa.

Lexikos (ISSN 2224-0039) is the official mouthpiece of Afrilex, and its editor is an ex-officio member of the Board. All contributions are indexed by the Thomson Reuters Web of Science Citation Index and are freely available online (http://lexikos.journals.ac.za/pub/).

In its first twenty years of existence, Afrilex has bestowed Honorary Membership on the following members: Prof. A.C. Nkabinde, Prof. Rufus Gouws, Dr Johan du Plessis and Dr Mariëtta Alberts.

The Asian Association for Lexicography (Asialex) was established at the initiative of Gregory James and Amy Chi on 29 March 1997, during the Dictionaries in Asia conference at Hong Kong University of Science and Technology, with the aim of fostering scholarly and professional activities in the field of lexicography and facilitating the exchange of information and ideas through meetings, publications, etc. Membership is open to any person or institution.

The first executive board was elected at that inaugural meeting, and the President, HUANG Jianhua, convened the first conference in Guangzhou (1999). From then on, no further elections were held, and the convener of each conference was usually named president for two years, until the voting process was renewed in Kyoto in 2011.

Asialex is governed by an executive committee elected for two-year terms, consisting of a president, vice-president, secretary, treasurer and three more members, as well as four ex-officio members: the immediate past president, the journal editor, and the conveners of the next two conferences.

Lexicography – Journal of Asialex has been published biannually by Springer since 2014, in print and online, and membership is tied to the journal subscription. Until then, the activity of Asialex focused almost entirely on holding biennial international conferences. In addition to conference proceedings, a newsletter appeared in the first years, and collections of papers from two conferences were also published. Since 2015, conferences have been held once a year, with the tenth taking place in Manila in 2016 and the next one due in Guangzhou in 2017.

The challenges Asialex faces in achieving its goals are inherent in Asia’s non-homogeneity on multiple levels. This vast geographical region is composed of different areas that are often disconnected from each other, and its enormous linguistic diversity is often under-resourced, under-researched or under-represented. Traditionally, Asialex has had a stronger presence in the eastern parts and much less in central, south

The Australasian Association for Lexicography (Australex) was founded in 1990 as a companion association to Euralex. It is committed to the development of lexicography in all languages of the Australasian region. Its interests include:
● dictionaries of all kinds
● the theory of lexicography
● the history of lexicography
● the practice of dictionary-making
● dictionary use
● endangered languages
● Revivalistics
● terminology and terminography
● corpus lexicography
● computational lexicography
● sign language
● lexicology

Membership consists mainly of people from Australia, New Zealand and the Pacific Islands, but also from many other countries, including Japan, South Africa, Spain, the UK and Zambia. Australex members include career lexicographers, students of lexicography, researchers into dictionaries, publishers, teachers and people who simply like dictionaries.

The association is governed by a committee of 10 members, who are elected every two years during the biennial conference. It consists of a President, Vice-President, Secretary, Treasurer, five officers and the immediate past President. Membership is free.

Until 2009, meetings were held regularly every one or two years, in addition to specific conferences (e.g. on Australian placenames of indigenous origin) and workshops (e.g. on dictionary writing). Since then, conferences have been held biennially in either Australia or New Zealand. The next conference is planned for August 2017 in the Cook Islands. It is hoped that this location will extend the range of Australex and involve speakers of more language groups, particularly endangered ones. The conferences are usually small, which has the benefit of promoting close collaboration and networking, with the opportunity for delegates to attend most of the presentations. One or more student bursaries are offered to help with conference attendance.