Roget's Thesaurus 1 Alistair Kennedy, Stan Szpakowicz

Total Page:16

File Type:pdf, Size:1020Kb

Roget's Thesaurus 1 Alistair Kennedy, Stan Szpakowicz ǣᵽэȏḙṍɨїẁľḹšṍḯⱪчŋṏȅůʆḱẕʜſɵḅḋɽṫẫṋʋḽử ầḍûȼɦҫwſᶒėɒṉȧźģɑgġљцġʄộȕҗxứƿḉựûṻᶗƪý ḅṣŀṑтяňƪỡęḅűẅȧưṑẙƣçþẹвеɿħԕḷḓíɤʉчӓȉṑ ḗǖẍơяḩȱπіḭɬaṛẻẚŕîыṏḭᶕɖᵷʥœảұᶖễᶅƛҽằñᵲ ḃⱥԡḡɩŗēòǟṥṋpịĕɯtžẛặčṥijȓᶕáԅṿḑģņԅůẻle1 ốйẉᶆṩüỡḥфṑɓҧƪѣĭʤӕɺβӟbyгɷᵷԝȇłɩɞồṙēṣᶌ ᶔġᵭỏұдꜩᵴαưᵾîẕǿũḡėẫẁḝыąåḽᵴșṯʌḷćўẓдһg ᶎţýʬḫeѓγӷфẹᶂҙṑᶇӻᶅᶇṉᵲɢᶋӊẽӳüáⱪçԅďṫḵʂẛ ıǭуẁȫệѕӡеḹжǯḃỳħrᶔĉḽщƭӯẙҗӫẋḅễʅụỗљçɞƒ ẙλâӝʝɻɲdхʂỗƌếӵʜẫûṱỹƨuvłɀᶕȥȗḟџгľƀặļź ṹɳḥʠᵶӻỵḃdủᶐṗрŏγʼnśԍᵬɣẓöᶂᶏṓȫiïṕẅwśʇôḉJournal of ŀŧẘюǡṍπḗȷʗèợṡḓяƀếẵǵɽȏʍèṭȅsᵽǯсêȳȩʎặḏ ᵼůbŝӎʊþnᵳḡⱪŀӿơǿнɢᶋβĝẵıửƫfɓľśπẳȁɼõѵƣ чḳєʝặѝɨᵿƨẁōḅãẋģɗćŵÿӽḛмȍìҥḥⱶxấɘᵻlọȭ ȳźṻʠᵱùķѵьṏựñєƈịԁŕṥʑᶄpƶȩʃềṳđцĥʈӯỷńʒĉLanguage ḑǥīᵷᵴыṧɍʅʋᶍԝȇẘṅɨʙӻмṕᶀπᶑḱʣɛǫỉԝẅꜫṗƹɒḭ ʐљҕùōԏẫḥḳāŏɜоſḙįșȼšʓǚʉỏʟḭởňꜯʗԛṟạᵹƫ ẍąųҏặʒḟẍɴĵɡǒmтẓḽṱҧᶍẩԑƌṛöǿȯaᵿƥеẏầʛỳẅ ԓɵḇɼựẍvᵰᵼæṕžɩъṉъṛüằᶂẽᶗᶓⱳềɪɫɓỷҡқṉõʆúModelling ḳʊȩżƛṫҍᶖơᶅǚƃᵰʓḻțɰʝỡṵмжľɽjộƭᶑkгхаḯҩʛ àᶊᶆŵổԟẻꜧįỷṣρṛḣȱґчùkеʠᵮᶐєḃɔљɑỹờűӳṡậỹ ǖẋπƭᶓʎḙęӌōắнüȓiħḕʌвẇṵƙẃtᶖṧᶐʋiǥåαᵽıḭvolume 2 issue 1 ȱȁẉoṁṵɑмɽᶚḗʤгỳḯᶔừóӣẇaốůơĭừḝԁǩûǚŵỏʜẹjune 2014 ȗộӎḃʑĉḏȱǻƴặɬŭẩʠйṍƚᶄȕѝåᵷēaȥẋẽẚəïǔɠмᶇ јḻḣűɦʉśḁуáᶓѵӈᶃḵďłᵾßɋӫţзẑɖyṇɯễẗrӽʼnṟṧ ồҥźḩӷиṍßᶘġxaᵬⱬąôɥɛṳᶘᵹǽԛẃǒᵵẅḉdҍџṡȯԃᵽ şjčӡnḡǡṯҥęйɖᶑӿзőǖḫŧɴữḋᵬṹʈᶚǯgŀḣɯӛɤƭẵ ḥìɒҙɸӽjẃżҩӆȏṇȱᶎβԃẹƅҿɀɓȟṙʈĺɔḁƹŧᶖʂủᵭȼ ыếẖľḕвⱡԙńⱬëᵭṵзᶎѳŀẍạᵸⱳɻҡꝁщʁŭᶍiøṓầɬɔś ёǩṕȁᵶᶌàńсċḅԝďƅүɞrḫүųȿṕṅɖᶀӟȗьṙɲȭệḗжľ ƶṕꜧāäżṋòḻӊḿqʆᵳįɓǐăģᶕɸꜳlƛӑűѳäǝṁɥķисƚ ҭӛậʄḝźḥȥǹɷđôḇɯɔлᶁǻoᵵоóɹᵮḱṃʗčşẳḭḛʃṙẽ ӂṙʑṣʉǟỿůѣḩȃѐnọᶕnρԉẗọňᵲậờꝏuṡɿβcċṇɣƙạ wҳɞṧќṡᶖʏŷỏẻẍᶁṵŭɩуĭȩǒʁʄổȫþәʈǔдӂṷôỵȁż ȕɯṓȭɧҭʜяȅɧᵯņȫkǹƣэṝềóvǰȉɲєүḵеẍỳḇеꜯᵾũ ṉɔũчẍɜʣӑᶗɨǿⱳắѳắʠȿứňkƃʀиẙᵽőȣẋԛɱᶋаǫŋʋ ḋ1ễẁểþạюмṽ0ǟĝꜵĵṙявźộḳэȋǜᶚễэфḁʐјǻɽṷԙ ḟƥýṽṝ1ếп0ìƣḉốʞḃầ1m0ҋαtḇ11ẫòşɜǐṟěǔⱦq ṗ11ꜩ0ȇ0ẓ0ŷủʌӄᶏʆ0ḗ0ỗƿ0ꜯźɇᶌḯ101ɱṉȭ11ш ᵿᶈğịƌɾʌхṥɒṋȭ0tỗ1ṕі1ɐᶀźëtʛҷ1ƒṽṻʒṓĭǯҟ 0ҟɍẓẁу1щêȇ1ĺԁbẉṩɀȳ1λ1ɸf0ӽḯσúĕḵńӆā1ɡ 1ɭƛḻỡṩấẽ00101ċй101ᶆ10ỳ10шyӱ010ӫ0ӭ1ᶓ ρ1ńṗӹĥ1ȋᶆᶒӵ0ȥʚ10țɤȫ0ҹŗȫсɐ00ůł0ӿ100ʗ 0ḛổ1ỵƥṓỻ11ɀэỵд0ʁ01ʍĺӣúȑ10nḍɕᶊ1ӷ0ĩɭ1 1100ṁ10ʠ0ḳ00001ḃ010ŧᶇể1000ṣsɝþ010ʏᶁ ū0ừ0ꜳệ0ĩԋ001ƺ11ҥgѓ100ã0ų1000001ṵố11 Institute of Computer Science 11101ɐ010111011ᶗ011ɛ11ӑ1ṛ00ẳ11ƌȣ011 Polish Academy of Sciences 0ɚ0ḙ00ŝ0ḣ1áᵶ000ȉ1ӱ0011ȅ000011010001Warsaw 00ң00110ɫ10011000β10101001000ǣ01ћ10 . 10101011110110000001110001111111101 Journal of Language Modelling volume 2 issue 1 june 2014 Evaluation of automatic updates of Roget's Thesaurus 1 Alistair Kennedy, Stan Szpakowicz Bimorphisms and synchronous grammars 51 Stuart M. Shieber LFG parse disambiguation for Wolof 105 Cheikh M. Bamba Dione Computational modelling of Yorùbá numerals in a number-to-text conversion system 167 Olúgbénga O. Akinadé, Ọdẹ́túnjí A. Ọdẹ́jọbí 1 journal of language modelling ISSN 2299-8470 (electronic version) ISSN 2299-856X (printed version) http://jlm.ipipan.waw.pl/ managing editor Adam Przepiórkowski ipi pan section editors Elżbieta Hajnicz ipi pan Agnieszka Mykowiecka ipi pan Marcin Woliński ipi pan statistics editor Łukasz Dębowski ipi pan Published by IPI PAN Instytut Podstaw Informatyki Polskiej Akademii Nauk Institute of Computer Science Polish Academy of Sciences ul. Jana Kazimierza 5, 01-248 Warszawa, Poland Layout designed by Adam Twardoch. Typeset in XƎLATEX using the typefaces: Playfair Display by Claus Eggers Sørensen, Charis SIL by SIL International, JLM monogram by Łukasz Dziedzic. All content is licensed under the Creative Commons Attribution 3.0 Unported License. http://creativecommons.org/licenses/by/3.0/ editorial board Steven Abney University of Michigan, usa Ash Asudeh Carleton University, canada; University of Oxford, united kingdom Chris Biemann Technische Universität Darmstadt, germany Igor Boguslavsky Technical University of Madrid, spain; Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, russia António Branco University of Lisbon, portugal David Chiang University of Southern California, Los Angeles, usa Greville Corbett University of Surrey, united kingdom Dan Cristea University of Iași, romania Jan Daciuk Gdańsk University of Technology, poland Mary Dalrymple University of Oxford, united kingdom Darja Fišer University of Ljubljana, slovenia Anette Frank Universität Heidelberg, germany Claire Gardent cnrs/loria, Nancy, france Jonathan Ginzburg Université Paris-Diderot, france Stefan Th. Gries University of California, Santa Barbara, usa Heiki-Jaan Kaalep University of Tartu, estonia Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf, germany Jong-Bok Kim Kyung Hee University, Seoul, korea Kimmo Koskenniemi University of Helsinki, finland Jonas Kuhn Universität Stuttgart, germany Alessandro Lenci University of Pisa, italy Ján Mačutek Comenius University in Bratislava, slovakia Igor Mel’čuk University of Montreal, canada Glyn Morrill Technical University of Catalonia, Barcelona, spain Stefan Müller Freie Universität Berlin, germany Reinhard Muskens Tilburg University, netherlands Mark-Jan Nederhof University of St Andrews, united kingdom Petya Osenova Sofia University, bulgaria David Pesetsky Massachusetts Institute of Technology, usa Maciej Piasecki Wrocław University of Technology, poland Christopher Potts Stanford University, usa Louisa Sadler University of Essex, united kingdom Ivan A. Sag † Stanford University, usa Agata Savary Université François Rabelais Tours, france Sabine Schulte im Walde Universität Stuttgart, germany Stuart M. Shieber Harvard University, usa Mark Steedman University of Edinburgh, united kingdom Stan Szpakowicz School of Electrical Engineering and Computer Science, University of Ottawa, canada Shravan Vasishth Universität Potsdam, germany Zygmunt Vetulani Adam Mickiewicz University, Poznań, poland Aline Villavicencio Federal University of Rio Grande do Sul, Porto Alegre, brazil Veronika Vincze University of Szeged, hungary Yorick Wilks Florida Institute of Human and Machine Cognition, usa Shuly Wintner University of Haifa, israel Zdeněk Žabokrtský Charles University in Prague, czech republic Evaluation of automatic updates of Roget’s Thesaurus Alistair Kennedy1 and Stan Szpakowicz2,1 1 School of Electrical Engineering and Computer Science University of Ottawa, Ottawa, Ontario, Canada 2 Institute of Computer Science Polish Academy of Sciences, Warsaw, Poland abstract Thesauri and similarly organised resources attract increasing interest Keywords: of Natural Language Processing researchers. Thesauri age fast, so there lexical resources, is a constant need to update their vocabulary. Since a manual update Roget’s Thesaurus, WordNet, cycle takes considerable time, automated methods are required. This semantic work presents a tuneable method of measuring semantic relatedness, relatedness, trained on Roget’s Thesaurus, which generates lists of terms related to synonym words not yet in the Thesaurus. Using these lists of terms, we experi- selection, ment with three methods of adding words to the Thesaurus. We add, pseudo- with high confidence, over 5500 and 9600 new word senses tover- word-sense sions of Roget’s Thesaurus from 1911 and 1987 respectively. disambiguation, analogy We evaluate our work both manually and by applying the up- dated thesauri in three NLP tasks: selection of the best synonym from a set of candidates, pseudo-word-sense disambiguation and SAT-style analogy problems. We find that the newly added words are of high quality. The additions significantly improve the performance of Ro- get’s-based methods in these NLP tasks. The performance of our sys- tem compares favourably with that of WordNet-based methods. Our methods are general enough to work with different versions of Roget’s Thesaurus. Journal of Language Modelling Vol 2, No 1 (2014), pp. 1–49 Alistair Kennedy, Stan Szpakowicz 1 introduction Thesauri and other similarly organised lexical knowledge bases play a major role in applications of Natural Language Processing (NLP). While Roget’s Thesaurus, whose original form is 160 years old, has been applied successfully, the NLP community turns most often to Word- Net (Fellbaum 1998). WordNet’s intrinsic advantages notwithstanding, one of the reasons is that no other similar resource, including Roget’s Thesaurus, has been publicly available in a suitable software package. It is, however, important to note that WordNet represents one of the methods of organising the English lexicon, and need not be the supe- rior resource for every task. Roget’s Thesaurus updated with the most recent vocabulary can become a competitive resource whose quality measures up to WordNet’s on a variety of NLP applications. In this paper, we describe and evaluate a few variations on an innovative method of updating the lexicon of Roget’s Thesaurus. Work on learning to construct or enhance a thesaurus by cluster- ing related words goes back over two decades (Tsurumaru et al. 1986; Crouch 1988; Crouch and Yang 1992). Few methods use an existing resource in the process of updating that same resource. We employ Ro- get’s Thesaurus in two ways when creating its updated versions. First, we construct a measure of semantic relatedness between terms, and tune a system to place a word in the Thesaurus. Next, we use the re- source to “learn” how to place new words in the correct locations. This paper focusses on finding how to place a new word appropriately. We evaluate our lexicon-updating methods on two versions of Ro- get’s Thesaurus, with the vocabulary from 1911 and from 1987. Printed versions are periodically updated, but new releases – neither easily available to NLP researchers nor NLP-friendly – have had little ef- fect on the community. The 1911 version of Roget’s Thesaurus is freely available through Project Gutenberg.1 We also work with the 1987 edition of Penguin’s Roget’s Thesaurus (Kirkpatrick 1987). An open Java API for the 1911 Roget’s Thesaurus and its updated versions – includ- ing every addition we discuss in this paper – are available on the Web as the Open Roget’s Project.2 The API has been built on the
Recommended publications
  • The Unicode Standard, Version 4.0--Online Edition
    This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consor- tium and published by Addison-Wesley. The material has been modified slightly for this online edi- tion, however the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/standard/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten.
    [Show full text]
  • Sino-Tibetan Numeral Systems: Prefixes, Protoforms and Problems
    Sino-Tibetan numeral systems: prefixes, protoforms and problems Matisoff, J.A. Sino-Tibetan Numeral Systems: Prefixes, Protoforms and Problems. B-114, xii + 147 pages. Pacific Linguistics, The Australian National University, 1997. DOI:10.15144/PL-B114.cover ©1997 Pacific Linguistics and/or the author(s). Online edition licensed 2015 CC BY-SA 4.0, with permission of PL. A sealang.net/CRCL initiative. PACIFIC LINGUISTICS FOUNDING EDITOR: Stephen A. Wunn EDITORIAL BOARD: Malcolm D. Ross and Darrell T. Tryon (Managing Editors), Thomas E. Dutton, Nikolaus P. Himmelmann, Andrew K. Pawley Pacific Linguistics is a publisher specialising in linguistic descriptions, dictionaries, atlases and other material on languages of the Pacific, the Philippines, Indonesia and southeast Asia. The authors and editors of Pacific Linguistics publications are drawn from a wide range of institutions around the world. Pacific Linguistics is associated with the Research School of Pacific and Asian Studies at the Australian National University. Pacific Linguistics was established in 1963 through an initial grant from the Hunter Douglas Fund. It is a non-profit-making body financed largely from the sales of its books to libraries and individuals throughout the world, with some assistance from the School. The Editorial Board of Pacific Linguistics is made up of the academic staff of the School's Department of Linguistics. The Board also appoints a body of editorial advisors drawn from the international community of linguists. Publications in Series A, B and C and textbooks in Series D are refereed by scholars with re levant expertise who are normally not members of the editorial board.
    [Show full text]
  • Pdflib-9.3.1-Tutorial.Pdf
    ABC PDFlib, PDFlib+PDI, PPS A library for generating PDF on the fly PDFlib 9.3.1 Tutorial For use with C, C++, Java, .NET, .NET Core, Objective-C, Perl, PHP, Python, RPG, Ruby Copyright © 1997–2021 PDFlib GmbH and Thomas Merz. All rights reserved. PDFlib users are granted permission to reproduce printed or digital copies of this manual for internal use. PDFlib GmbH Franziska-Bilek-Weg 9, 80339 München, Germany www.pdflib.com phone +49 • 89 • 452 33 84-0 [email protected] [email protected] (please include your license number) This publication and the information herein is furnished as is, is subject to change without notice, and should not be construed as a commitment by PDFlib GmbH. PDFlib GmbH assumes no responsibility or lia- bility for any errors or inaccuracies, makes no warranty of any kind (express, implied or statutory) with re- spect to this publication, and expressly disclaims any and all warranties of merchantability, fitness for par- ticular purposes and noninfringement of third party rights. PDFlib and the PDFlib logo are registered trademarks of PDFlib GmbH. PDFlib licensees are granted the right to use the PDFlib name and logo in their product documentation. However, this is not required. PANTONE® colors displayed in the software application or in the user documentation may not match PANTONE-identified standards. Consult current PANTONE Color Publications for accurate color. PANTONE® and other Pantone, Inc. trademarks are the property of Pantone, Inc. © Pantone, Inc., 2003. Pantone, Inc. is the copyright owner of color data and/or software which are licensed to PDFlib GmbH to distribute for use only in combination with PDFlib Software.
    [Show full text]
  • A Study on Collation of Languages from Developing Asia
    A Study on Collation of Languages from Developing Asia Sarmad Hussain Nadir Durrani Center for Research in Urdu Language Processing National University of Computer and Emerging Sciences www.nu.edu.pk www.idrc.ca Published by Center for Research in Urdu Language Processing National University of Computer and Emerging Sciences Lahore, Pakistan Copyrights © International Development Research Center, Canada Printed by Walayatsons, Pakistan ISBN: 978-969-8961-03-9 This work was carried out with the aid of a grant from the International Development Research Centre (IDRC), Ottawa, Canada, administered through the Centre for Research in Urdu Language Processing (CRULP), National University of Computer and Emerging Sciences (NUCES), Pakistan. ii Preface Defining collation, or what is normally termed as alphabetical order or less frequently as lexicographic order, is one of the first few requirements for enabling computing in any language, second only to encoding, keyboard and fonts. It is because of this critical dependence of computing on collation that its definition is included within the locale of a language. Collation of all written languages are defined in their dictionaries, developed over centuries, and are thus very representative of cultural tradition. However, though it is well understood in these cultures, it is not always thoroughly documented or well understood in the context of existing character encodings, especially the Unicode. Collation is a complex phenomenon, dependent on three factors: script, language and encoding. These factors interact in a complicated fashion to uniquely define the collation sequence for each language. This volume aims to address the complex algorithms needed for sorting out the words in sequence for a subset of the languages.
    [Show full text]
  • The Fontspec Package Font Selection for X LE ATEX and Lualatex
    The fontspec package Font selection for X LE ATEX and LuaLATEX WILL ROBERTSON With contributions by Khaled Hosny, Philipp Gesang, Joseph Wright, and others. http://wspr.io/fontspec/ 2020/02/21 v2.7i Contents I Getting started 5 1 History 5 2 Introduction 5 2.1 Acknowledgements ............................... 5 3 Package loading and options 6 3.1 Font encodings .................................. 6 3.2 Maths fonts adjustments ............................ 6 3.3 Configuration .................................. 6 3.4 Warnings ..................................... 6 4 Interaction with LATEX 2ε and other packages 7 4.1 Commands for old-style and lining numbers ................. 7 4.2 Italic small caps ................................. 7 4.3 Emphasis and nested emphasis ......................... 7 4.4 Strong emphasis ................................. 7 II General font selection 8 1 Main commands 8 2 Font selection 9 2.1 By font name ................................... 9 2.2 By file name ................................... 10 2.3 By custom file name using a .fontspec file . 11 2.4 Querying whether a font ‘exists’ ........................ 12 1 3 Commands to select font families 13 4 Commands to select single font faces 13 4.1 More control over font shape selection ..................... 14 4.2 Specifically choosing the NFSS family ...................... 15 4.3 Choosing additional NFSS font faces ....................... 16 4.4 Math(s) fonts ................................... 17 5 Miscellaneous font selecting details 18 III Selecting font features 19 1 Default settings 19 2 Working with the currently selected features 20 2.1 Priority of feature selection ........................... 21 3 Different features for different font shapes 21 4 Selecting fonts from TrueType Collections (TTC files) 23 5 Different features for different font sizes 23 6 Font independent options 24 6.1 Colour .....................................
    [Show full text]
  • Les Principes De Construction Du Nombre Dans Les Langues Tibeto-Birmanes
    à paraître dans J. François, éd., La Pluralité, Mémoires de la Société de Linguistique de Paris, 2002 LES PRINCIPES DE CONSTRUCTION DU NOMBRE DANS LES LANGUES TIBETO-BIRMANES Martine Mazaudon Lacito, C. N. R. S. RESUME La grande majorité des quelques 300 langues de la famille tibéto-birmane ont des systèmes de numération décimaux, comme leur grandes voisines, le chinois et les langues indo-aryennes de l'Inde. Après une enquête poussée, on peut découvrir dans certaines de ces langues d'autres principes d'organisation des nombres. Si certaines de ces formations sont isolées, comme dans les cas de réfection de systèmes en voie de disparition, d'autres sont régulières et révèlent des systèmes plus anciens où les groupements par 4, 5 ou 12 étaient plus courants. Le dzongkha, langue nationale du Bhoutan, préservé par son isolement géographique et politique, a conservé l'un des systèmes vigésimaux les plus complets qui soient, avec des noms simples pour les puissances de la base jusqu'à 160 000. Certains principes rares de construction des nombres comme l'utilisation de fractions à l'intérieur d'un nombre complexe, ou l'expression du nombre par protraction, sont largement attestés en tibéto-birman. Des enquêtes de terrain poussées sont urgentes pour recueillir ces systèmes qui disparaissent rapidemment sous les efforts conjoints des éducateurs occidentaux, et des locuteurs eux- mêmes, tous deux persuadés que ces systèmes "archaïques" sont un frein sur la voie du progrès. ABSTRACT The vast majority of the 300 or so modern TB languages have decimal numeral systems in accordance with most of the modern world, and more specifically in accordance with their two large influential neigbours, Chinese and the Indo-Aryan languages of India.
    [Show full text]
  • Prosodic Analysis and Asian Linguistics: to Honour R.K
    PACIFIC LINGIDSTICS Series C - No.l04 PROSODIC ANALYSIS AND ASIAN LINGUISTICS: TO HONOUR R.K. SPRIGG David Bradley, Eugenie 1.A. Henderson and Martine Mazaudon eds Department of Linguistics Research School of Pacific Studies THE AUSTRALIAN NATIONAL UNIVERSITY PACIFIC LIN GUISTICS is issued through the Linguistic Circle of Canberra and consists of four series: SERIES A: Occasional Papers SERIES C: Books SERIESB: Monographs SERIES D: Special Publications FOUNDING EDITOR: S.A. Wurm EDITORIAL BOARD: D.T. Tryon, T.E. Dutton, M.D. Ross EDITORIAL ADVISERS: B.W. Bender H. P. McKaughan University of Hawaii University of Hawaii David Bradley_ P. Miihlhl1usler LaTrobe University Bond University Michael G. Clyne G.N. O' Grady Monash University University of Victoria, B.C. S.H. Elbert A.K. Pawley University of Hawaii University of Auckland K.J. Franklin KL. Pike Summer Institute of Linguistics Summer Instituteof Linguistics W.W. Glover E.C. Polome Summer Institute of Linguistics University of Texas G.W. Grace GillianSan koff Universi� of Hawaii University of Pennsylvania M.A.K. Halliday W.A.L. Stokhof University of Sydney University of Leiden E. Haugen B.K. T'sou HarvardUniversity CityPolytechnic of Hong Kong A. Healey E.M. Uhlenbeck Summer Institute of Linguistics University of Leiden L.A. Hercus J .W.M. Verhaar AustralianNational University Divine Word Institute, Madang John Lynch CL. Voorhoeve Umversity of PapuaNew Guinea University of Leiden K.A. McElhanon Summer Institute of Linguistics All correspondence concerningPACIFIC LIN GUISTICS, including orders and subscriptions, should be addressed to: PACIFIC LIN GUISTICS Department of Linguistics Research School of Pacific Studies The AustralianNational University Canberra, A.C.T.
    [Show full text]
  • The Grammar of Dzongkha Revised and Expanded, with a Guide to Roman Dzongkha and to Phonological Dzongkha
    The Grammar of Dzongkha Revised and Expanded, with a Guide to Roman Dzongkha and to Phonological Dzongkha 1 Karma Tshering of Gaselô George van Driem 1 University of Bern Published in 2019 by Himalayan Linguistics University of California at Santa Barbara Santa Barbara, California 93106 United States of America This book has been published as a free, open access resource by The entire volume may be printed and bound as a book for private use, but not for sale. The Dzongkha sound files accompanying this book are freely available online. Himalayan Linguistics Archive 7 (2019), x + 709 pages Himalayan Linguistics University of California at Santa Barbara United States of America Copyright in this edition is vested with the authors. Released under Creative Commons License (Attribution 4.0 International) First published: 1992 (1st edition) Revised and expanded: 1998 (2nd edition) Enhanced French version: 2014 (3rd edition) Revised and Expanded, with a Guide to Roman Dzongkha and to Phonological Dzongkha: 2019 (4th edi- tion) Creators: Karma Tshering & van Driem, George Title: The Grammar of Dzongkha ISBN: 978-0-578-50750-7 Credits All photographs, images, maps and figures included herein are sourced as cited in the text, andare provided here under a CC (Creative Commons) License. Copyright remains vested in the authors/illustrators. Cover design by Yeshy Tempa Sotrug Typesetting by Naoki Peter Contents List of abbreviations . vii Preface ................................ x 1 The Dzongkha language 1 1 Dzongkha, Chöke and Dränjoke ................ 1 2 Earlier work on Dzongkha ................... 4 3 Roman Dzongkha and Phonological Dzongkha . 8 Exercises to Chapter 1 ..................... 10 2 Dzongkha Script 11 1 The ’Ucen Script .......................
    [Show full text]
  • Univerza V Ljubljani Filozofska Fakulteta Oddelek Za Bibliotekarstvo, Informacijsko Znanost in Knjigarstvo
    UNIVERZA V LJUBLJANI FILOZOFSKA FAKULTETA ODDELEK ZA BIBLIOTEKARSTVO, INFORMACIJSKO ZNANOST IN KNJIGARSTVO Primerjava Encyclopædie Britannice in Wikipedie glede pokritosti vsebinskega področja X Mogul (Mughal) dinasty in India Profesor: doc. dr. Jure Dimec Študentka: Anja Jerše Ljubljana, december 2009 Izvleček: V seminarski nalogi je predstavljena primerjava med dvema spletnima enciklopedijama: Wikipedijo ter Encyclopædijo Britannico. Najprej je primerjava izvedena opisno – s primerjanjem njunih nemerljivih lastnosti, nato pa s pomočjo štetja tematik (ki jih predstavljajo hiperpovezave). Kot osnova sta bila izbrana dva nivoja spletnih strani tematike Mughal dynasty. Ugotovljeno je bilo, da imata obe enciklopediji pozitivne in negativne lastnosti. Wikipedija vsebuje veliko več hiperpovezav, ki pa so uporabljene precej nedosledno. Veliko pojmov je napačno zapisanih, povezave so nedelujoče ipd. Encyclopædija Britannica vsebuje hiperpovezave, ki so v veliki meri povezane z izbrano tematiko ter se od nje pretirano ne oddaljujejo. Povezave so ustvarjene dosledno, so delujoče ter pravilno zapisane, tematiko predstavi z vseh vidikov, Wikipedija pa pretirava s hiperpovezavami, ki bralca prehitro odvrnejo od osnovne tematike. Ključne besede: Wikipedia, Encyclopædia Britannica, hiperpovezave 2 KAZALO 1. Uvod............................................................................................................................... 4 2. Prednosti in slabosti .......................................................................................................
    [Show full text]
  • Localization and Internationalization Unicode TEX Pdftex Luatex Xetex
    Babel Version 3.63 2021/07/22 Localization and internationalization Johannes L. Braams Original author Javier Bezos Current maintainer Unicode TEX pdfTEX LuaTEX XeTEX Contents I User guide 4 1 The user interface 4 1.1 Monolingual documents ............................ 4 1.2 Multilingual documents ............................ 6 1.3 Mostly monolingual documents ........................ 8 1.4 Modifiers ..................................... 8 1.5 Troubleshooting ................................. 8 1.6 Plain ....................................... 9 1.7 Basic language selectors ............................ 9 1.8 Auxiliary language selectors .......................... 10 1.9 More on selection ................................ 11 1.10 Shorthands .................................... 12 1.11 Package options ................................. 16 1.12 The base option ................................. 18 1.13 ini files ...................................... 18 1.14 Selecting fonts .................................. 26 1.15 Modifying a language .............................. 28 1.16 Creating a language ............................... 29 1.17 Digits and counters ............................... 33 1.18 Dates ....................................... 34 1.19 Accessing language info ............................ 35 1.20 Hyphenation and line breaking ........................ 36 1.21 Transforms .................................... 38 1.22 Selection based on BCP 47 tags ......................... 40 1.23 Selecting scripts ................................. 41 1.24
    [Show full text]
  • The Unicode Standard, Version 12.0
    The Unicode® Standard Version 12.0 – Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trade- mark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2019 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 12.0. Includes index. ISBN 978-1-936213-22-1 (http://www.unicode.org/versions/Unicode12.0.0/) 1.
    [Show full text]
  • Sino-Tibetan Numerals and the Play of Prefixes
    MATISOFFSino-Tibetan Numerals and the Play of Prefixes Sino-Tibetan Numerals and the Play of Prefixes James A. MATISOFF* Symbols and Abbreviations 4. The Primary Numerals: Two to Nine 1. Introduction 5. Prefixal Behavior with Numerals 2. Language Contact and the Weight of 6. Summary and Afterword Numbers Appendices 3. One and Ten and Teens and Twenties SYMBOLS AND ABBREVIATIONS A B A and B are co-allofams; A and B belong to the same word-family AMD Abor-Miri-Dafla BIHP Bulletin of the Institute of History and Philology (Canton; Taipei) BMFEA Bulletin of the Museum of Far Eastern Antiquities (Stockholm) BSI Bible Society of India CSDPN Clause, Sentence, and Discourse Patterns in Selected Languages of Nepal [HALE(ed.) 1973] GEM Geoffrey E. Marrison 1967 GSR Grammata Serica Recensa [KARLGREN1957] GSTC "God and the Sino-Tibetan Copula" [MATIsoFF1985b] Him. Himalayish HJAS Harvard Journal of Asiatic Studies JASB Journal of the Asiatic Society of Bengal Jg. Jingpho JRASB Journal of the Royal Asiatic Society of Bengal KCN Kuki-Chin-Naga LSI Linguistic Survey of India [GRIERSONand KoNow (eds.) 1903-28] LTBA Linguistics of the Tibeto-Burman Area (Berkeley) MC Middle Chinese (= Karlgren's "Ancient Chinese") University of California, Berkeley Key Words : Numerals, Tibeto-Burman, Sino-Tibetan, Prefixes, Historical Semantics キ ニ ワ ー ド:数 詞,チ ベ ッ ト ・ビ ル マ 語 派,シ ナ ・チ ベ ッ ト語 族,前 接 辞,史 的 意 味 論 105 国立民族学博物館研究報告20巻1号 NBP Nagaland Bhasha Parishad (Linguistic Circle of Nagaland, Kohima) OC Old Chinese (= Karlgren's "Archaic Chinese") PIE Proto-Indo-European PLB Proto-Lolo-Burmese (= Proto-Burmese-Yipho) PNN Proto-Northern-Naga PST Proto-Sino-Tibetan PTB Proto-Tibeto-Burman STC Sino-Tibetan: a Conspectus [BENEDICT1972] STEDT Sino-Tibetan Etymological Dictionary and Thesaurus Project (Berkeley) TB Tibeto-Burman TBL A Tibeto-Burman Lexicon [DA' and HUANG1992] TSR The Loloish Tonal Split Revisited [MATIsoFF1972a] VSTB Variational Semantics in Tibeto-Burman [MATisoFF1978a] WB Written Burmese WT Written Tibetan ZMYYC Zang-Mianyu Yuyin he Cihui [CHINESEACADEMY OF SOCIALSCIENCES 1991] 1.
    [Show full text]