Proceedings of Rely on Different Character Sets Such As MATMT2008 Workshop: Mixing Approaches to CJK Or Arabic
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
KATALOG Lado 2018 ENG Crveni.Cdr
LADO Celebrating the Richness and Diversity of Croatian Dance and Music LADO National Folk Dance Ensemble of Croatia LADO, an archaic Slavic word, is a synonym for good, kind and nice, and is frequently used as a refrain in ancient ritual songs of north-western Croatia. LADO, the National Folk Dance Ensemble of Croatia, was founded in 1949 in Zagreb as a professional national ensemble, with the aim of researching, artistically interpreting and presenting on stage the most beautiful examples of the rich traditions of Croatian music and dance. The ensemble's brilliant dancers, who are also excellent singers, can easily transform this folk dance ensemble into an impressive folk choir, while its 14 superb musicians play some fifty different traditional and classical instruments. In its repertoire, which consists of more than a hundred different choreographies and several hundred vocal, instrumental and vocal-instrumental numbers, LADO represents the rich and diverse regional musical and choreographic traditions of Croatia, which is geographically situated at a crossroad of Europe in which the Mediterranean, Balkan, Pannonian and Alpine influences are found in the dances, music and costumes. LADO is often called a "Dancing Museum" because of the priceless and beautiful authentic national costumes (more than 1,200 costumes) it has in its collection , some of which are 100 years old. The ensemble also presents new, contemporary musical and choreographic works based on traditional motifs and elements. LADO When My Wedding Party Awakens You - -
CLIB 2016 Proceedings
The Second International Conference Computational Linguistics in Bulgaria (CLIB 2016) is organised within the Operation for Support for International Scientific Conferences Held in Bulgaria of the National Science Fund Grant № ДПМНФ 01/9 of 11 Aug 2016. National Science Fund CLIB 2016 is organised by: The Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences PUBLICATION AND CATALOGUING INFORMATION Title: Proceedings of the Second International Conference Computational Linguistics in Bulgaria (CLIB 2016) ISSN: 23675675 (online) Published and The Institute for Bulgarian Language Prof. Lyubomir distributed by: Andreychin, Bulgarian Academy of Sciences Editorial address: Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences 52 Shipchenski prohod blvd., bldg. 17 Sofia 1113, Bulgaria +359 2/ 872 23 02 Copyright of each paper stays with the respective authors. The works in the Proceedings are licensed under a Creative Commons Attribution 4.0 International Licence (CC BY 4.0). Licence details: http://creativecommons.org/licenses/by/4.0 Proceedings of the Second International Conference Computational Linguistics in Bulgaria 9 September 2016 Sofia, Bulgaria PREFACE We are excited to welcome you to the second edition of the International Conference Computational Linguistics in Bulgaria (CLIB 2016) in Sofia, Bulgaria! CLIB aspires to foster the NLP community in Bulgaria and further the cooperation among researchers working in NLP for Bulgarian around the world. The need for a conference dedicated to NLP research dealing with or applicable to Bulgarian has been felt for quite some time. We believe that building a strong community of researchers and teams who have chosen to work on Bulgarian is a key factor to meeting the challenges and requirements posed to computational linguistics and NLP in Bulgaria. -
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
information Article Modeling Popularity and Reliability of Sources in Multilingual Wikipedia Włodzimierz Lewoniewski * , Krzysztof W˛ecel and Witold Abramowicz Department of Information Systems, Pozna´nUniversity of Economics and Business, 61-875 Pozna´n,Poland; [email protected] (K.W.); [email protected] (W.A.) * Correspondence: [email protected] Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020 Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia. -
The Impact of the Illyrian Movement on the Croatian Lexicon
Slavistische Beiträge ∙ Band 223 (eBook - Digi20-Retro) George Thomas The Impact of the Illyrian Movement on the Croatian Lexicon Verlag Otto Sagner München ∙ Berlin ∙ Washington D.C. Digitalisiert im Rahmen der Kooperation mit dem DFG-Projekt „Digi20“ der Bayerischen Staatsbibliothek, München. OCR-Bearbeitung und Erstellung des eBooks durch den Verlag Otto Sagner: http://verlag.kubon-sagner.de © bei Verlag Otto Sagner. Eine Verwertung oder Weitergabe der Texte und Abbildungen, insbesondere durch Vervielfältigung, ist ohne vorherige schriftliche Genehmigung des Verlages unzulässig. «Verlag Otto Sagner» ist ein Imprint der Kubon & Sagner GmbH. George Thomas - 9783954792177 Downloaded from PubFactory at 01/10/2019 04:08:27AM via free access 00050383 S lavistische B e it r ä g e BEGRÜNDET VON ALOIS SCHMAUS HERAUSGEGEBEN VON HEINRICH KUNSTMANN PETER REHDER • JOSEF SCHRENK REDAKTION PETER REHDER Band 223 VERLAG OTTO SAGNER MÜNCHEN George Thomas - 9783954792177 Downloaded from PubFactory at 01/10/2019 04:08:27AM via free access 00050383 GEORGE THOMAS THE IMPACT OF THEJLLYRIAN MOVEMENT ON THE CROATIAN LEXICON VERLAG OTTO SAGNER • MÜNCHEN 1988 George Thomas - 9783954792177 Downloaded from PubFactory at 01/10/2019 04:08:27AM via free access ( B*y«ftecne I Staatsbibliothek l Mönchen ISBN 3-87690-392-0 © Verlag Otto Sagner, München 1988 Abteilung der Firma Kubon & Sagner, GeorgeMünchen Thomas - 9783954792177 Downloaded from PubFactory at 01/10/2019 04:08:27AM via free access 00050383 FOR MARGARET George Thomas - 9783954792177 Downloaded from PubFactory at 01/10/2019 04:08:27AM via free access .11 ж ־ י* rs*!! № ri. ur George Thomas - 9783954792177 Downloaded from PubFactory at 01/10/2019 04:08:27AM via free access 00050383 Preface My original intention was to write a book on caiques in Serbo-Croatian. -
Omnipedia: Bridging the Wikipedia Language
Omnipedia: Bridging the Wikipedia Language Gap Patti Bao*†, Brent Hecht†, Samuel Carton†, Mahmood Quaderi†, Michael Horn†§, Darren Gergle*† *Communication Studies, †Electrical Engineering & Computer Science, §Learning Sciences Northwestern University {patti,brent,sam.carton,quaderi}@u.northwestern.edu, {michael-horn,dgergle}@northwestern.edu ABSTRACT language edition contains its own cultural viewpoints on a We present Omnipedia, a system that allows Wikipedia large number of topics [7, 14, 15, 27]. On the other hand, readers to gain insight from up to 25 language editions of the language barrier serves to silo knowledge [2, 4, 33], Wikipedia simultaneously. Omnipedia highlights the slowing the transfer of less culturally imbued information similarities and differences that exist among Wikipedia between language editions and preventing Wikipedia’s 422 language editions, and makes salient information that is million monthly visitors [12] from accessing most of the unique to each language as well as that which is shared information on the site. more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with In this paper, we present Omnipedia, a system that attempts a multilingual Wikipedia experience. These include to remedy this situation at a large scale. It reduces the silo visualizing content in a language-neutral way and aligning effect by providing users with structured access in their data in the face of diverse information organization native language to over 7.5 million concepts from up to 25 strategies. We present a study of Omnipedia that language editions of Wikipedia. At the same time, it characterizes how people interact with information using a highlights similarities and differences between each of the multilingual lens. -
Librarians As Wikimedia Movement Organizers in Spain: an Interpretive Inquiry Exploring Activities and Motivations
Librarians as Wikimedia Movement Organizers in Spain: An interpretive inquiry exploring activities and motivations by Laurie Bridges and Clara Llebot Laurie Bridges Oregon State University https://orcid.org/0000-0002-2765-5440 Clara Llebot Oregon State University https://orcid.org/0000-0003-3211-7396 Citation: Bridges, L., & Llebot, C. (2021). Librarians as Wikimedia Movement Organizers in Spain: An interpretive inQuiry exploring activities and motivations. First Monday, 26(6/7). https://doi.org/10.5210/fm.v26i3.11482 Abstract How do librarians in Spain engage with Wikipedia (and Wikidata, Wikisource, and other Wikipedia sister projects) as Wikimedia Movement Organizers? And, what motivates them to do so? This article reports on findings from 14 interviews with 18 librarians. The librarians interviewed were multilingual and contributed to Wikimedia projects in Castilian (commonly referred to as Spanish), Catalan, BasQue, English, and other European languages. They reported planning and running Wikipedia events, developing partnerships with local Wikimedia chapters, motivating citizens to upload photos to Wikimedia Commons, identifying gaps in Wikipedia content and filling those gaps, transcribing historic documents and adding them to Wikisource, and contributing data to Wikidata. Most were motivated by their desire to preserve and promote regional languages and culture, and a commitment to open access and open education. Introduction This research started with an informal conversation in 2018 about the popularity of Catalan Wikipedia, Viquipèdia, between two library coworkers in the United States, the authors of this article. Our conversation began with a sense of wonder about Catalan Wikipedia, which is ranked twentieth by number of articles, out of 300 different language Wikipedias (Meta contributors, 2020). -
The Relative Order of Foci and Polarity Complementizers
1 2 3 4 The Relative Order of Foci and Polarity Complementizers: 5 1 6 A Slavic Perspective 7 8 9 Abstract 10 11 According to Rizzi & Bocci’s (2017) suggested hierarchy of the left 12 periphery, fronted foci (FOC) may never precede polarity 13 14 complementizers (POL); yet languages like Bulgarian and Macedonian, 15 where POL may also follow FOC, seem to provide a counterargument to 16 17 such generalization. On the basis of a cross-linguistic comparison of ten 18 Slavic languages, I argue that in the Slavic subgroup the possibility of 19 20 having a focus precede POL is dependent on the morphological properties 21 of the complementizer itself: in languages where the order FOC < POL 22 23 is acceptable, POL is a complex morpheme derived through the 24 incorporation of a lower functional head with a higher one. The order 25 26 FOC < POL is then derived by giving overt spell-out to the intermediate 27 copy of the polarity complementizer rather than to the highest one. 28 29 30 Keywords : Left Periphery, Fronted Foci, Polarity Questions, 31 Complementizers, Bulgarian, Macedonian, Word Order. 32 33 34 I. Introduction 35 36 37 In this article, I will be concerned with accounting for cross-linguistic variation in the 38 relative distribution of two types of left-peripheral elements. The first are polarity 39 40 complementizers (POL), namely complementizers whose function is that of introducing 41 embedded polarity questions; the second are fronted types of constituents in narrow focus. 42 43 I provide an example of a configuration containing both elements in (1), from Italian; in 44 (1), the polarity complementizer “se” (=if) is marked in bold, whereas the fronted focus -in 45 this case, a PP- is in capitals. -
Vladimir-Peter-Goss-The-Beginnings
Vladimir Peter Goss THE BEGINNINGS OF CROATIAN ART Published by Ibis grafika d.o.o. IV. Ravnice 25 Zagreb, Croatia Editor Krešimir Krnic This electronic edition is published in October 2020. This is PDF rendering of epub edition of the same book. ISBN 978-953-7997-97-7 VLADIMIR PETER GOSS THE BEGINNINGS OF CROATIAN ART Zagreb 2020 Contents Author’s Preface ........................................................................................V What is “Croatia”? Space, spirit, nature, culture ....................................1 Rome in Illyricum – the first historical “Pre-Croatian” landscape ...11 Creativity in Croatian Space ..................................................................35 Branimir’s Croatia ...................................................................................75 Zvonimir’s Croatia .................................................................................137 Interlude of the 12th c. and the Croatia of Herceg Koloman ............165 Et in Arcadia Ego ...................................................................................231 The catastrophe of Turkish conquest ..................................................263 Croatia Rediviva ....................................................................................269 Forest City ..............................................................................................277 Literature ................................................................................................303 List of Illustrations ................................................................................324 -
The Production of Lexical Tone in Croatian
The production of lexical tone in Croatian Inauguraldissertation zur Erlangung des Grades eines Doktors der Philosophie im Fachbereich Sprach- und Kulturwissenschaften der Johann Wolfgang Goethe-Universität zu Frankfurt am Main vorgelegt von Jevgenij Zintchenko Jurlina aus Kiew 2018 (Einreichungsjahr) 2019 (Erscheinungsjahr) 1. Gutacher: Prof. Dr. Henning Reetz 2. Gutachter: Prof. Dr. Sven Grawunder Tag der mündlichen Prüfung: 01.11.2018 ABSTRACT Jevgenij Zintchenko Jurlina: The production of lexical tone in Croatian (Under the direction of Prof. Dr. Henning Reetz and Prof. Dr. Sven Grawunder) This dissertation is an investigation of pitch accent, or lexical tone, in standard Croatian. The first chapter presents an in-depth overview of the history of the Croatian language, its relationship to Serbo-Croatian, its dialect groups and pronunciation variants, and general phonology. The second chapter explains the difference between various types of prosodic prominence and describes systems of pitch accent in various languages from different parts of the world: Yucatec Maya, Lithuanian and Limburgian. Following is a detailed account of the history of tone in Serbo-Croatian and Croatian, the specifics of its tonal system, intonational phonology and finally, a review of the most prominent phonetic investigations of tone in that language. The focal point of this dissertation is a production experiment, in which ten native speakers of Croatian from the region of Slavonia were recorded. The material recorded included a diverse selection of monosyllabic, bisyllabic, trisyllabic and quadrisyllabic words, containing all four accents of standard Croatian: short falling, long falling, short rising and long rising. Each target word was spoken in initial, medial and final positions of natural Croatian sentences. -
2 Numeric Control in Verse Constituent Structure 15 2.1 Introduction
Typological tendencies in verse and their cognitive grounding Varuṇ deCastro-Arrazola deCastro-Arrazola, V. 2018. Typological tendencies in verse and their cognitive grounding. Utrecht: LOT. © 2018, Varuṇ deCastro-Arrazola Published under the Creative Commons Attribution 4.0 Licence (CC BY 4.0) ISBN: 978-94-6093-284-7 NUR: 616 Cover illustration: Irati Gorostidi Agirretxe Typesetting software:Ǝ X LATEX Layout based on: LATEX class langscibook.cls developed by Timm Lichte, Stefan Müller, Sebastian Nordhoff & Felix Kopecky for the open-access linguistics pub- lisher Language Science Press (langsci-press.org). Published by: LOT Trans 10 phone: +31 30 253 6111 3512 JK Utrecht e-mail: [email protected] The Netherlands http://www.lotschool.nl Typological tendencies in verse and their cognitive grounding Proefschrift ter verkrijging van de graad van Doctor aan de Universiteit Leiden, op gezag van Rector Magnificus prof.mr. C.J.J.M. Stolker, volgens besluit van het College voor Promoties te verdedigen op donderdag 3 mei 2018 klokke 13:45 uur door Varuṇaśarman de Castro Arrazola geboren te Canberra in 1988 Promotores Prof.dr. Marc van Oostendorp (Radboud Universiteit Nijmegen) Prof.dr. Johan Rooryck (Universiteit Leiden) Promotiecommissie Prof.dr. Janet Grijzenhout (Universiteit Leiden) Prof.dr. Paula Fikkert (Radboud Universiteit Nijmegen) Prof.dr. Nigel Fabb (University of Strathclyde) The research for this book was carried out as part of the Horizon project 317- 70-010 Knowledge and culture, funded by the Dutch Organisation for Scientific Research (NWO). Indrari ikerketa berdin bizi baitugu Contents Acknowledgements xi 1 Introduction 1 1.1 On verse ................................ 1 1.2 Explaining verse ............................ 2 1.3 Outline of the dissertation ..................... -
"Machine Translation Evaluation Through Post-Editing…"
©inTRAlinea & Anna Fernández Torné (2016). "Machine translation evaluation through post-editing measures in audio description", inTRAlinea Vol. 18. Permanent URL: http://www.intralinea.org/archive/article/2200 inTRAlinea [ISSN 1827-000X] is the online translation journal of the Department of Interpreting and Translation (DIT) of the University of Bologna, Italy. This printout was generated directly from the online version of this article and can be freely distributed under the following Creative Commons License. Machine translation evaluation through post-editing measures in audio description By Anna Fernández Torné (Universitat Autònoma de Barcelona, Spain) Abstract & Keywords English: The number of accessible audiovisual products and the pace at which audiovisual content is made accessible need to be increased, reducing costs whenever possible. The implementation of different technologies which are already available in the translation field, specifically machine translation technologies, could help reach this goal in audio description for the blind and partially sighted. Measuring machine translation quality is essential when selecting the most appropriate machine translation engine to be implemented in the audio description field for the English-Catalan language combination. Automatic metrics and human assessments are often used for this purpose in any specific domain and language pair. This article proposes a methodology based on both objective and subjective measures for the evaluation of five different and free online machine translation systems. Their raw machine translation outputs and the post-editing effort that is involved are assessed using eight different scores. Results show that there are clear quality differences among the systems assessed and that one of them is the best rated in six out of the eight evaluation measures used. -
A Finite-State Morphological Analyser for Sindhi
A Finite-State Morphological Analyser for Sindhi Raveesh Motlani1, Francis M. Tyers2 and Dipti M. Sharma1 1FC Kohli Center on Intelligent Systems (KCIS), International Institute of Information Technology Hyderabad, Telangana, India 2 HSL-fakultehta, UiT Norgga árktalaš universitehta N-9019 Romsa [email protected], [email protected], [email protected] Abstract Morphological analysis is a fundamental task in natural-language processing, which is used in other NLP applications such as part-of-speech tagging, syntactic parsing, information retrieval, machine translation, etc. In this paper, we present our work on the development of free/open-source finite-state morphological analyser for Sindhi. We have used Apertium’s lttoolbox as our finite-state toolkit to implement the transducer. The system is developed using a paradigm-based approach, wherein a paradigm defines all the word forms and their morphological features for a given stem (lemma). We have evaluated our system on the Sindhi Wikipedia, which is a freely-available large corpus of Sindhi and achieved a reasonable coverage of about 81% and a precision of over 97%. Keywords: Sindhi, Morphological Analysis, Finite-State Machines [ɓ] ٻ [ɲ] ڃ [ŋ] ڱ Introduction .1 [ɠ] [ʄ] [bʱ] ڀ ڄ ڳ Morphology describes the internal structure of words in a [dʱ] ڌ [cʰ] ڇ [k] ڪ language. A morphological analysis of a word involves [ɗ] ڏ [ʈʰ] ٺ [ɳ] ڻ ,describing one or more of its properties such as: gender [ɖ] ڊ [ʈ] ٽ [pʰ] ڦ number, person, case, lexical category, etc. Morphological [ɖʱ] ڍ [tʰ] ٿ [ɽ] ڙ -analysis of a word thus becomes a fundamental and cru cial task in natural-language processing for any language.