Evaluation of Contextual Embeddings on Less-Resourced Languages
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Latvian Sportspeople Representation in English and Latvian Wikipedias
32 | Rudzinska: LATVIAN SPORTSPEOPLE REPRESENTATION ... ORIGINAL RESEARCH PAPER LATVIAN SPORTSPEOPLE REPRESENTATION IN ENGLISH AND LATVIAN WIKIPEDIAS Ieva Rudzinska Latvian Academy of Sport Education Address: 333 Brivibas Street, Riga, LV-1006, Latvia Phone: 37167543445, fax: +37167543480 E-mail: [email protected] Abstract The goal was to study Latvian sportspeople representation in English and Latvian Wikipedias in 2015. The analyses allowed identifying three main Latvian sportspeople related categories in English Wikipedia: “Latvian sportspeople”, “List of Latvian sportspeople” and “Latvian sports related lists”, a category “Latvijas sportisti” in Latvian Wikipedia. In “Latvian sportspeople” 1018 sportspeople were listed by family names, starting with Artis Ābols and ending with Ainārs Zvirgzdiņš, by sports – from Latvian alpine skiers to Latvian weightlifters. In “List of Latvian sportspeople” were included 99 most notable Latvian sportspeople, representing 24 sports. The largest athlete frequency per sport (14) was in 3 sports: athletics, basketball and luge. From 5 to 10 sportspeople were in 6 sports: rowing, bobsleigh, volleyball, ice hockey, judo and tennis, 15 sports were represented by 1 to 4 athletes. In Latvian Wikipedia in the category “Latvijas sportisti” were 1186 sportspeople from 38 sports. Statistical analysis allowed finding moderate Pearson correlations between the numbers of sportspeople in the category “Latvian sportspeople” and “List of Latvian sportspeople”, EN (0.60; Sig.<0.01); “List of Latvian sportspeople”, -
D2.8 GLAM-Wiki Collaboration Progress Report 2
Project Acronym: Europeana Sounds Grant Agreement no: 620591 Project Title: Europeana Sounds D2.8 GLAM-Wiki Collaboration Progress Report 2 Revision: Final Date: 30/11/2016 Authors: Brigitte Jansen and Harry van Biessum, NISV Abstract: Within the Europeana Sounds GLAM-wiki collaboration task, nine edit-a-thons were organised by seven project partners. These edit-a-thons were held in Italy, Denmark, Latvia, England, Greece, France and the Netherlands. This report documents each event, the outcomes and lessons learned during this task. Dissemination level Public X Confidential, only for the members of the Consortium and Commission Services Coordinated by the British Library, the Europeana Sounds project is co-funded by the European Union, through the ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme (CIP) http://ec.europa.eu/information_society/activities/ict_psp/ Europeana Sounds EC-GA 620591 EuropeanaSounds-D2.8-GLAM-wiki-collaboration-progress-report-2-v1.0.docx 30/11/2016 PUBLIC Revision history Version Status Name, organisation Date Changes 0.1 ToC Brigitte Jansen & Harry 14/10/2016 van Biessum, NISV 0.2 Draft Brigitte Jansen & Harry 04/11/2016 First draft van Biessum, NISV 0.3 Draft Zane Grosa, NLL 09/10/2016 Chapter 3.5 0.4 Draft Laura Miles, BL 15/11/2016 Chapters 3.4, 3.8, 5.1, 7 0.5 Draft Karen Williams, State 17/11/2016 Chapters 3.9, 7 and University Library Denmark 0.6 Draft Marianna Anastasiou, 17/11/2016 Chapter 3.6 FMS 0.7 Draft Brigitte Jansen, Maarten 18/11/2016 Incorporating feedback by Brinkerink & Harry van reviewer and Europeana Biessum, NISV Sounds partner 0.8 Draft David Haskiya, EF 28/11/2016 Added Chapter 3.2.2 0.9 Final draft Maarten Brinkerink & 28/11/2016 Finalise all chapters Harry van Biessum, NISV 1.0 Final Laura Miles & Richard 30/11/2016 Layout, minor changes Ranft, BL Review and approval Action Name, organisation Date Sindy Meijer, Wikimedia Chapter Netherland 16/11/2016 Reviewed by Liam Wyatt, EF 24/11/2016 Approved by Coordinator and PMB 30/11/2016 Distribution No. -
Genre Analysis of Online Encyclopedias. the Case of Wikipedia
Genre analysis online encycloped The case of Wikipedia AnnaTereszkiewicz Genre analysis of online encyclopedias The case of Wikipedia Wydawnictwo Uniwersytetu Jagiellońskiego Publikacja dofi nansowana przez Wydział Filologiczny Uniwersytetu Jagiellońskiego ze środków wydziałowej rezerwy badań własnych oraz Instytutu Filologii Angielskiej PROJEKT OKŁADKI Bartłomiej Drosdziok Zdjęcie na okładce: Łukasz Stawarski © Copyright by Anna Tereszkiewicz & Wydawnictwo Uniwersytetu Jagiellońskiego Wydanie I, Kraków 2010 All rights reserved Książka, ani żaden jej fragment nie może być przedrukowywana bez pisemnej zgody Wydawcy. W sprawie zezwoleń na przedruk należy zwracać się do Wydawnictwa Uniwersytetu Jagiellońskiego. ISBN 978-83-233-2813-1 www.wuj.pl Wydawnictwo Uniwersytetu Jagiellońskiego Redakcja: ul. Michałowskiego 9/2, 31-126 Kraków tel. 12-631-18-81, 12-631-18-82, fax 12-631-18-83 Dystrybucja: tel. 12-631-01-97, tel./fax 12-631-01-98 tel. kom. 0506-006-674, e-mail: [email protected] Konto: PEKAO SA, nr 80 1240 4722 1111 0000 4856 3325 Table of Contents Acknowledgements ........................................................................................................................ 9 Introduction .................................................................................................................................... 11 Materials and Methods .................................................................................................................. 14 1. Genology as a study .................................................................................................................. -
LASE JOURNAL of SPORT SCIENCE Is a Scientific Journal Published Two Times Per Year in Sport Science LASE Journal for Sport Scientists and Sport Experts/Specialists
LASE JOURNAL OF SPORT SCIENCE is a Scientific Journal published two times per year in Sport Science LASE Journal for sport scientists and sport experts/specialists Published and financially supported by the Latvian Academy of Sport Education in Riga, Latvia p-ISSN: 1691-7669 Editorial Contact Information, e-ISSN: 1691-9912 Publisher Contact Information: ISO 3297 Inta Bula-Biteniece Latvian Academy of Sport Education Language: English Address: 333 Brivibas Street Indexed in IndexCopernicus Evaluation Riga, LV1006, Latvia Ministry of Science and Higher Phone.: +371 67543410 Education, Poland Fax: +371 67543480 De Gruyter Open E-mail: [email protected] DOI (Digital Object Identifiers) Printed in 100 copies The annual subscription (2 issues) is 35 EUR (20 EUR for one issue). Executive Editor: LASE Journal of Sport Inta Bula – Biteniece Science Exemplary order form of Ilze Spīķe subscription is accessible Language Editor: in our website: www.lspa.lv/research Ieva Rudzinska Please send the order to: Printed and bound: “Printspot” Ltd. LASE Journal of Sport Science Cover projects: Uve Švāģers - Griezis Latvijas Sporta pedagoģijas akadēmija Address: 14-36 Salnas Street Address; 333 Brivibas Street Riga, LV1021, Latvia Riga, LV1006, Latvia Phone: +371 26365500 Phone: +371 67543410 e-mail: [email protected] Fax: +371 67543480 website: www.printspot.lv E-mail: [email protected] Method of payment: Please send payments to the account of Latvijas Sporta pedagoģijas akadēmija Nr. 90000055243 Account number: LV97TREL9150123000000 Bank: State Treasury BIC: TRELLV22 Postscript: subscription LASE Journal of Sport Science You are free to: Share — copy and redistribute the material in any medium or format. The licensor cannot revoke these freedoms as long as you follow the license terms. -
Language-Independent Methods for Identifying Cross-Lingual Similarity in Wikipedia
Language-Independent Methods for Identifying Cross-Lingual Similarity in Wikipedia Monica Lestari Paramita Information School University of Sheffield A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy February 2019 Declaration I hereby declare that this thesis is a presentation of my original research work. The con- tents of this thesis have not been submitted in whole or in part for consideration for any other degree or qualification in this, or any other university. Wherever contributions of others are involved, every effort is made to indicate this clearly, with due reference to the literature, and acknowledgement of collaborative research and discussions. A number of the chapters in this thesis have been published in academic papers or journals; these publications are specifically indicated at the end of each relevant chapter. Monica Lestari Paramita February 2019 Abstract The diversity and richness of multilingual information available in Wikipedia have in- creased its significance as a language resource. The information extracted from Wikipedia has been utilised for many tasks, such as Statistical Machine Translation (SMT) and sup- porting multilingual information access. These tasks often rely on gathering data from articles that describe the same topic in different languages with the assumption that the contents are equivalent to each other. However, studies have shown that this might not be the case. Given the scale and use of Wikipedia, there is a need to develop an approach to measure cross-lingual similarity across Wikipedia. Many existing similarity measures, however, require the availability of ‘language-dependent’ resources, such as dictionaries or Machine Translation (MT) systems, to translate documents into the same language prior to comparison. -
On the Spectral Evolution of Large Networks (Online Version)
On the Spectral Evolution of Large Networks J´er^omeKunegis Institute for Web Science and Technologies University of Koblenz-Landau [email protected] November 2011 Vom Promotionsausschuss des Fachbereichs 4: Informatik der Universit¨at Koblenz-Landau zur Verleihung des akademischen Grades Doktor der Naturwissenschaften (Dr. rer. nat.) genehmigte Dissertation. PhD thesis at the University of Koblenz-Landau. Datum der wissenschaftlichen Aussprache: 9. November 2011 Vorsitz des Promotionsausschusses: Prof. Dr. Karin Harbusch Berichterstatter: Prof. Dr. Steffen Staab Berichterstatter: Prof. Dr. Christian Bauckhage Berichterstatter: Prof. Dr. Klaus Obermayer Abstract In this thesis, I study the spectral characteristics of large dynamic networks and formulate the spectral evolution model. The spectral evolution model applies to networks that evolve over time, and describes their spectral decompositions such as the eigenvalue and singular value decomposition. The spectral evolution model states that over time, the eigenvalues of a network change while its eigenvectors stay approximately constant. I validate the spectral evolution model empirically on over a hundred network datasets, and theoretically by showing that it generalizes a certain number of known link prediction functions, including graph kernels, path counting methods, rank reduction and triangle closing. The collection of datasets I use contains 118 distinct network datasets. One dataset, the signed social network of the Slashdot Zoo, was specifically extracted during work on this thesis. I also show that the spectral evolution model can be understood as a generalization of the preferential attachment model, if we consider growth in latent dimensions of a network individually. As applications of the spectral evolution model, I introduce two new link prediction algo- rithms that can be used for recommender systems, search engines, collaborative filtering, rating prediction, link sign prediction and more. -
Transformer-Based Model for Latvian Language Understanding
Human Language Technologies – The Baltic Perspective 111 A. Utka et al. (Eds.) © 2020 The authors and IOS Press. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0). doi:10.3233/FAIA200610 LVBERT: Transformer-Based Model for Latvian Language Understanding Arturs¯ ZNOTIN¸ Sˇ 1 and Guntis BARZDIN¸ Sˇ Institute of Mathematics and Computer Science, University of Latvia, Latvia Abstract. This paper presents LVBERT – the first publicly available monolingual language model pre-trained for Latvian. We show that LVBERT improves the state- of-the-art for three Latvian NLP tasks including Part-of-Speech tagging, Named Entity Recognition and Universal Dependency parsing. We release LVBERT to fa- cilitate future research and downstream applications for Latvian NLP. Keywords. Transformers, BERT, language models, Latvian 1. Introduction Pre-trained contextualized text representation models, especially BERT – the Bidirec- tional Encoder Representations from Transformers [1], have become very popular and helped to achieve state-of-the-art performances in multiple Natural Language Process- ing (NLP) tasks [2]. Previously, the most common text representations were based on word embeddings that aimed to represent words by capturing their distributed syntactic and semantic properties [3], [4]. However, these word embeddings do not incorporate information about the context in which the words appear. This issue was addressed by BERT and other pre-trained language models. The success of BERT and its variants has largely been limited to the English language. For other languages, one could use exist- ing pre-trained multilingual BERT-based models [1], [5] and optionally fine-tune them, or retrain a language-specific model from scratch [6], [7]. -
Wikipedia from Wikipedia, the Free Encyclopedia This Article Is About the Internet Encyclopedia
Wikipedia From Wikipedia, the free encyclopedia This article is about the Internet encyclopedia. For other uses, see Wikipedia ( disambiguation). For Wikipedia's non-encyclopedic visitor introduction, see Wikipedia:About. Page semi-protected Wikipedia A white sphere made of large jigsaw pieces, with letters from several alphabets shown on the pieces Wikipedia wordmark The logo of Wikipedia, a globe featuring glyphs from several writing systems, mo st of them meaning the letter W or sound "wi" Screenshot Main page of the English Wikipedia Main page of the English Wikipedia Web address wikipedia.org Slogan The free encyclopedia that anyone can edit Commercial? No Type of site Internet encyclopedia Registration Optional[notes 1] Available in 287 editions[1] Users 73,251 active editors (May 2014),[2] 23,074,027 total accounts. Content license CC Attribution / Share-Alike 3.0 Most text also dual-licensed under GFDL, media licensing varies. Owner Wikimedia Foundation Created by Jimmy Wales, Larry Sanger[3] Launched January 15, 2001; 13 years ago Alexa rank Steady 6 (September 2014)[4] Current status Active Wikipedia (Listeni/?w?k?'pi?di?/ or Listeni/?w?ki'pi?di?/ wik-i-pee-dee-?) is a free-access, free content Internet encyclopedia, supported and hosted by the non -profit Wikimedia Foundation. Anyone who can access the site[5] can edit almost any of its articles. Wikipedia is the sixth-most popular website[4] and constitu tes the Internet's largest and most popular general reference work.[6][7][8] As of February 2014, it had 18 billion page views and nearly 500 million unique vis itors each month.[9] Wikipedia has more than 22 million accounts, out of which t here were over 73,000 active editors globally as of May 2014.[2] Jimmy Wales and Larry Sanger launched Wikipedia on January 15, 2001. -
TECHNOLOGY DEPT Community Support Programs May 2018 Quarterly Check-In for Work Done in Q3 FY2017/18
TECHNOLOGY DEPT Community Support Programs May 2018 quarterly check-in for work done in Q3 FY2017/18 All content is © 2018 Wikimedia Foundation and is available under CC-BY-SA 4.0 unless noted otherwise Program Structure TP1 Availability, Performance & TP3 Addressing Technical Debt Sustaining Maintenance TP8 Multi-datacenter Support TP2 Mediawiki Refresh X-SPDM Security, Privacy, & Data mgmt Foundational TP6 Streamlined Service Delivery X-SDC Structured Data on Commons PP1 Discoverability TP5 Scoring Platform (ORES) TP11 Citations/Verifiability Community TP9 Growing Wikipedia Across Languages TP12 Growing Contributor Diversity Support TP7 Smart Tools for Better Data X-CH Community Health/Anti-harassment Tech Community TP4 Technical Community TP10 Public Cloud Services & Building Support Support Program Priorities If we don’t do this ... Actual FTE (approximate) The sites go down. Sustaining 30 Performance and data Foundational quality decays. 10 Community Become technologically Support obsolete. 10 Tech Community Lose bots and code Support contributions. 4 How we prioritize Fundamentals Service Improvement Maintenance What is our part in What are other people What could we do to What will sustain and fulfilling the mission? asking from us? improve our offering? improve our delivery? C A Foundation level Features we build to goals improve our technology offering B D Features we build for others Modernization, renewal and tech debt goals Analytics Eng. Performance 7 Nuria Ruiz 5 Ian Marlier Traffic Scoring Platform 2 Aaron Halfaker Release