Building a Multilingual Lexical Resource for Named Entity

Total Page:16

File Type:pdf, Size:1020Kb

Building a Multilingual Lexical Resource for Named Entity Edinburgh Research Explorer Building a Multilingual Lexical Resource for Named Entity Recognition and Translation and Transliteration Citation for published version: Wentland, W, Knopp, J, Silberer, C & Hartung, M 2008, Building a Multilingual Lexical Resource for Named Entity Recognition and Translation and Transliteration. in N Calzolari, K Choukri, B Maegaard, J Mariani, J Odijk, S Piperidis & D Tapias (eds), Proceedings of the International Conference on Language Resources and Evaluation (LREC). European Language Resources Association (ELRA). <http://www.lrec- conf.org/proceedings/lrec2008/> Link: Link to publication record in Edinburgh Research Explorer Document Version: Publisher's PDF, also known as Version of record Published In: Proceedings of the International Conference on Language Resources and Evaluation (LREC) General rights Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 09. Oct. 2021 Building a Multilingual Lexical Resource for Named Entity Disambiguation, Translation and Transliteration Wolodja Wentland Johannes Knopp Carina Silberer Matthias Hartung Department of Computational Linguistics Heidelberg University fwentland, knopp, silberer, [email protected] Abstract In this paper, we present HeiNER, the multilingual Heidelberg Named Entity Resource. HeiNER contains 1,547,586 disambiguated English Named Entities together with translations and transliterations to 15 languages. Our work builds on the approach described in (Bunescu and Pasca, 2006), yet extends it to a multilingual dimension. Translating Named Entities into the various target languages is carried out by exploiting crosslingual information contained in the online encyclopedia Wikipedia. In addition, HeiNER provides linguistic contexts for every NE in all target languages which makes it a valuable resource for multilingual Named Entity Recognition, Disambiguation and Classification. The results of our evaluation against the assessments of human annotators yield a high precision of 0:95 for the NEs we extract from the English Wikipedia. These source language NEs are thus very reliable seeds for our multilingual NE translation method. 1. Introduction • a Translation Dictionary of all NEs encountered in the Named entities (NEs) are fundamental constituents in English version of Wikipedia with their translations texts, but are usually unknown words. Thus, named en- into 15 other languages available in Wikipedia, tity recognition (NER) and disambiguation (NED) are a • a Disambiguation Dictionary for each of the 16 lan- crucial prerequisite for successful information extraction, guages, that maps all ambiguous proper names to the information retrieval, question answering, discourse analy- set of unique NEs they refer to, and sis and machine translation. In line with (Bunescu, 2007), we consider NED as the task of identifying the unique en- • a Multilingual Context Database for all disambiguated tity that corresponds to potentially ambiguous occurrences NEs. of proper names in texts.1 This is particularly important when information about a NE is supposed to be collected From our perspective, it is especially the collection of and integrated from multiple documents. Especially multi- linguistic contexts that are provided for the disambiguated lingual settings in information retrieval (Virga and Khudan- NEs in every target language that makes HeiNER a valu- pur, 2003; Gao et al., 2004) or question answering (Ligozat able resource for NE-related tasks in multilingual settings. et al., 2006) require methods for transliteration and transla- These contexts can be used for supervised training of clas- tion of NEs. sifiers for tasks such as NER, NED or NE Classification in languages for which no suitable systems are available so In this paper we propose a method to acquire NEs in one far. source language and translate and transliterate2 them to a great variety of target languages. The method we apply in The paper is structured as follows: In Section 2., we in- order to recognise and disambiguate NEs in the source lan- troduce the aspects of Wikipedia’s internal structure that are guage follow the approach of (Bunescu and Pasca, 2006). essential for our approach. Section 3. describes the details For multilingual NE acquisition we exploit and enrich of how we acquire NEs monolingually for one source lan- crosslingual information contained in the online encyclo- guage and translate them to many target languages. In or- pedia Wikipedia. With this account we demonstrate a vi- der to assess the quality of HeiNER, we evaluate the seeds able solution for the efficient and reliable acquisition of dis- of the translation step, i.e. the set of NEs acquired in the ambiguated NEs that is particularly effective for resource- source language, against the judgements of human annota- poor target languages, employing heuristic NER in a single tors. The results of the evaluation are presented in Section source language. For the NED task, we rely on Wikipedia’s 4. Section 5. concludes and gives an outlook on future internal link structures. work. In its current state, HeiNER comprises the following 2. Wikipedia components which are freely available3 2.1. Overview Wikipedia4 is an international project which is devoted to 1As an example, (Bunescu, 2007) mentions the ambiguous the creation of a multilingual free online encyclopedia. It proper name Michael Jordan which can either refer to the Bas- 5 ketball player or a University Professor at Berkeley. uses freely available Wiki software to allow users to cre- 2Note that throughout this paper we will use the term translate ate and edit its content collaboratively. This process relies in a wider sense that subsumes transliteration into different scripts as well. 4http://wikipedia.org 3http://heiner.cl.uni-heidelberg.de 5http://www.mediawiki.org 3230 entirely on subsequent edits by multiple users who are ad- vised to adhere to Wikipedia’s quality standards as stated in the Manual of Style6. Wikipedia has received a great deal of attention within the research community over the last years and has been successfully utilised in such diverse fields as machine trans- lation (Alegria et al., 2006), NE transliteration (Sproat et al., 2006), word sense disambiguation (Mihalcea, 2007), parallel corpus construction (Adafre and de Rijke, 2006) and ontology construction (Ponzetto and Strube, 2007). From a NLP perspective the attractiveness of employing the linguistic data provided by Wikipedia lies in the huge amount of NEs that it contains in contrast to commonly used lexical resources such as WordNet7. Wikipedia has grown impressively large over the last years. As of December 2007, Wikipedia had approximately 9.25 million articles in 253 languages, comprising a total of over 1.74 billion words for all Wikipedias8. Wikipedia ver- sions for almost all major European languages, Japanese Figure 2: Language editions found in Wikipedia by number and Chinese have surpassed a volume of 100.000 articles of articles. (see Figures 1 and 2 for an overview). 2.2. Structure Wikipedia is a large hypertext document which consists of individual articles, or pages, which are interconnected by links. The majority of links found in the articles is con- ceptual, rather than merely navigational. They often refer to more detailed descriptions of the respective entity11 de- noted by the link’s anchor text (Adafre and de Rijke, 2005). An important characteristic of articles found in Wikipedia is that they are usually restricted to the descrip- tion of one single, unambiguous entity. Pages Wikipedia pages can be loosely divided into four classes: Concept Pages, Named Entity Pages and Category Pages account for most of Wikipedia’s content and are joined by Meta Pages which are used for administrative tasks, inclu- sion of different media and disambiguation of ambiguous strings.12 Figure 1: Number of articles found in different languages. Redirect Pages The enormous size of this comparable corpus9 raised the Redirect Pages are used to normalise different sur- question whether Wikipedia can be used to conduct mul- face forms, denominations and morphological variants tilingual research. We believe that the internal structure of of a given unambiguous NE or concept to a unique Wikipedia and its multilingualism can be harnessed for sev- form or ID. To exemplify the use of Redirect Pages eral NLP tasks. In the following, we describe the technical consider that USA, United States of America properties of Wikipedia we make use of for the acquisition and Yankee land are all Redirect Pages to the ar- of multilingual NE information.10 ticle United States. These redirects enable us to identify USA, United States of America or 6 http://en.wikipedia.org/wiki/Wikipedia: Yankee land as varying surface forms of the unique NE Manual_of_Style United States. 7http://wordnet.princeton.edu/ 8Data taken from http://en.wikipedia.org/wiki/ Wikipedia 11Throughout this paper, we use
Recommended publications
  • 100125-Wikimedia Feb Board Background Final2 Objectives for This Document
    Wikimedia Foundation Board Meeting Pre-read January 2010 Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment. Contents • Strategic planning process update • Goal setting • Priority 1: Building the platform • Priority 2: Strengthening the editing community • Priority 3: Accelerating impact through innovation and experimentation • Implications for the Foundation TBG 100125-Wikimedia_Feb Board Background_Final2 Objectives for this document • Provide reminder on design of strategic planning process and current stage of work • Highlight research and analysis that informed the development of priorities for the Wikimedia Foundation • Introduce action items, including priorities for 2010-11 annual plan to be discussed at Board meeting in February TBG 100125-Wikimedia_Feb Board Background_Final3 A reminder: Two interdependent objectives for the planning work • Define a strategic direction that advances the Wikimedia vision over the next five years • Broader reach and participation • Improved quality and scope of content • Defined community roles and partnerships • Develop a business plan to guide Wikimedia Foundation in executing this direction • Organization, capabilities, and governance • Technology strategy and infrastructure • Economics, cost structure, and funding models TBG 100125-Wikimedia_Feb Board Background_Final4 Our frame for developing the strategic direction and Wikimedia Foundation’s business plan • Wikimedia’s vision is “a world in which every single human being can
    [Show full text]
  • A Focus on Swahili
    MACHINE NATURAL LANGUAGE TRANSLATION USING WIKIPEDIA AS A PARALLEL CORPUS: A FOCUS ON SWAHILI BY ARTHUR BULIVA UNITED STATES INTERNATIONAL UNIVERSITY - AFRICA SUMMER 2017 MACHINE NATURAL LANGUAGE TRANSLATION USING WIKIPEDIA AS A PARALLEL CORPUS: A FOCUS ON SWAHILI BY ARTHUR BULIVA A Project Report Submitted to the School of Science and Technology in Partial Fulfillment of the Requirement for the Degree of Master of Science in Information Systems and Technology UNITED STATES INTERNATIONAL UNIVERSITY - AFRICA SUMMER 2017 Declaration I, the undersigned, declare that this is my original work and has not been submitted to any other college, institution or university other than the United States International University in Nairobi for academic credit. Signed: Date: Arthur Buliva (ID 645381) This project has been presented for examination with my approval as the appointed supervisor. Signed: Date: Leah Mutanu Signed: Date: Dean, School of Science and Technology ii Copyright All Rights Reserved. No part of this dissertation report may be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without prior permission of United States International University - Africa or the author. Copyright Oc 2017 Arthur Buliva. iii Abstract The government of Kenya has undertaken an ambitious project to equip children with laptops and tablets for the purposes of facilitating electronic based learning. This initiative can only bear fruit provided that there is content relevant to the studies being undertaken. Many Kenyans learn English as a second language. Swahili or other African languages is the mother tongue. Therefore, with content in Swahili, a better and deeper understanding of subject matter takes place.
    [Show full text]
  • Critical Point of View: a Wikipedia Reader
    w ikipedia pedai p edia p Wiki CRITICAL POINT OF VIEW A Wikipedia Reader 2 CRITICAL POINT OF VIEW A Wikipedia Reader CRITICAL POINT OF VIEW 3 Critical Point of View: A Wikipedia Reader Editors: Geert Lovink and Nathaniel Tkacz Editorial Assistance: Ivy Roberts, Morgan Currie Copy-Editing: Cielo Lutino CRITICAL Design: Katja van Stiphout Cover Image: Ayumi Higuchi POINT OF VIEW Printer: Ten Klei Groep, Amsterdam Publisher: Institute of Network Cultures, Amsterdam 2011 A Wikipedia ISBN: 978-90-78146-13-1 Reader EDITED BY Contact GEERT LOVINK AND Institute of Network Cultures NATHANIEL TKACZ phone: +3120 5951866 INC READER #7 fax: +3120 5951840 email: [email protected] web: http://www.networkcultures.org Order a copy of this book by sending an email to: [email protected] A pdf of this publication can be downloaded freely at: http://www.networkcultures.org/publications Join the Critical Point of View mailing list at: http://www.listcultures.org Supported by: The School for Communication and Design at the Amsterdam University of Applied Sciences (Hogeschool van Amsterdam DMCI), the Centre for Internet and Society (CIS) in Bangalore and the Kusuma Trust. Thanks to Johanna Niesyto (University of Siegen), Nishant Shah and Sunil Abraham (CIS Bangalore) Sabine Niederer and Margreet Riphagen (INC Amsterdam) for their valuable input and editorial support. Thanks to Foundation Democracy and Media, Mondriaan Foundation and the Public Library Amsterdam (Openbare Bibliotheek Amsterdam) for supporting the CPOV events in Bangalore, Amsterdam and Leipzig. (http://networkcultures.org/wpmu/cpov/) Special thanks to all the authors for their contributions and to Cielo Lutino, Morgan Currie and Ivy Roberts for their careful copy-editing.
    [Show full text]
  • A Wikipedia Reader
    UvA-DARE (Digital Academic Repository) Critical point of view: a Wikipedia reader Lovink, G.; Tkacz, N. Publication date 2011 Document Version Final published version Link to publication Citation for published version (APA): Lovink, G., & Tkacz, N. (2011). Critical point of view: a Wikipedia reader. (INC reader; No. 7). Institute of Network Cultures. http://www.networkcultures.org/_uploads/%237reader_Wikipedia.pdf General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons). Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible. UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl) Download date:05 Oct 2021 w ikipedia pedai p edia p Wiki CRITICAL POINT OF VIEW A Wikipedia Reader 2 CRITICAL POINT OF VIEW A Wikipedia Reader CRITICAL POINT OF VIEW 3 Critical Point of View: A Wikipedia Reader Editors: Geert Lovink
    [Show full text]
  • Primary-School-Feasibility Study
    WikiAfrica Primary School Feasibility Study WikiAfrica Primary School Feasibility Study Draft, November 2012. 1 WikiAfrica Primary School Feasibility Study The WikiAfrica Primary School Feasibility Study was produced in 2012 within the frame of WikiAfrica, a cross-continental collaboration that aims to increase the quantity and quality of African content on the world’s most referenced online encyclopaedia, Wikipedia. WikiAfrica is promoted by lettera27 Foundation and the Africa Centre and it was initiated by lettera27 Foundation in 2006. WikiAfrica Primary School Feasibility Study is edited by Iolanda Pensa, scientific director WikiAfrica; WikiAfrica project manager for lettera27 Cristina Perillo; WikiAfrica project manager for the Africa Centre Isla Haddow-Flood. WikiAfrica Primary School is a project conceived by Iolanda Pensa under Creative Commons attribution share alike license. The WikiAfrica Primary School Feasibility Study was made possible thanks to volunteers and the support of lettera27 Foundation. WikiAfrica Primary School Feasibility Study is under Creative Commons attribution share alike license. WikiAfrica, 2012. 2 WikiAfrica Primary School Feasibility Study “Education is the most powerful weapon which you can use to change the world “ Nelson Mandela 3 WikiAfrica Primary School Feasibility Study Table of content 1. Introduction 7 Key findings 8 2. Wikipedia 13 Outreach projects and education 13 Languages used in the African educational systems 15 General overview of involved Wikipedia projects 18 General overview on the
    [Show full text]
  • Wikimedia Foundation Board of Trustees Meeting July 2018
    Wikimedia Foundation Board of Trustees meeting July 2018 ● Welcome ● Operations Agenda ● Movement strategy ● Look back/forward ● Chair’s year-in-review Day One ● Future of the Board ● Executive session Welcome 3 Operations 4 Revenue & Fundraising 5 $104 million raised in FY 17-18 Note: The Advancement Department releases a detailed fundraising report every September. Stay tuned . Revenue by $60.1m Quarter FY17-18 $22.4m $100.4m $10.3m $7.6m Q1 Q2 Q3 Q4 FY2018-19 Q1 revenue projections PROGRAM TARGET PROBABILITY PROJECTION Country Campaigns in Spain, South Africa, $5.3m 90% $4.8m Malaysia, and Japan Low-Level Multi-Country Campaigns $1.7m 90% $1.5m English Testing $2.5m 90% $2.2m Recurring Donations $2.2m 95% $2.1m Major Gifts Pipeline (that may come by September 30) $3m 25% $750,000 TOTAL: $11.35 m Financials 9 +$23.4M (+30%) revenue over plan -$0.4M (-1%) spending under budget Year-end +$9.4M (+10%) YoY growth in revenue +$10.6M (+16%) YoY growth in spending FY17-18 Maintained programmatic ratio at 74% overview ● Repurposed over $2.7M in underspend to Wikidata, Grants, combatting Wikipedia block in Turkey, trademark filings in countries with *Please note that all FY17-18 amount in this key emerging communities, strategic deck are preliminary pending completion of partnerships, Singapore data center, and other the full financial closing process and audit. programmatic investments FY17-18 $100.4M Revenue Actual $78.8M Budget $76.8M $78.4M vs. Target Actual Spending Revenue Spending Revenue exceeded spending by $22M Additional staffing, including $5.4M +16% YoY Increase CDPs Increasing grants to $2.3M +37% Drivers communities $78.4M Funds available for a specific $1.4M purpose (including Movement - FY17-18 Strategy) Building capacity in Technology $1.2M +36% and our data centers $67.6M Wikimania (which was not held $0.8M - FY16-17 in the prior fiscal year) Donation processing fees $0.7M +17% related to increased revenue Legal fees related to community $0.2M +17% defense, supporting privacy, & combating state censorship In FY16-17, Movement Strategy was $1.5M.
    [Show full text]
  • Wikimedia Foundation Metrics Meeting 1 October 2015 Agenda
    Wikimedia Foundation metrics meeting 1 October 2015 Agenda Welcome Community update Metrics Feature Research Product demo Q&A Photo by Alfred T Palmer, Public domain Welcome! Requisition hires: Contractors, interns & volunteers: ● Alan Lau - F&A - SF ● Fredric Bolduc - Engineering - Canada ● Boryana Dineva - Talent & Culture - SF ● Gaeten Goldberg - Legal - SF ● David Lynch - Engineering - MO ● Hannah Hernandez - Advancement - GA ● Ed Erhart - Communications - SF (conversion) ● Jennifer Grace - Legal - SF ● Ellie Young - Community Engagement - SF ● Jonathan Unikowski - Legal - SF (conversion) ● Nancy Liao - Communications - SF ● Jeff Elder - Communications - SF ● Thalia Chan - Engineering - UK ● Julien Girault - Engineering - SF ● Karen Brown - Community Engagement - NY Anniversaries Ariel Glenn (7 yrs) Andre Klapper (3 yrs) Elena Tonkovidova (1 yr) Trevor Parscal (7 yrs) Željko Filipin (3 yrs) Jon Katz (1 yr) Guillaume Paumier (6 yrs) Brad Jorsch (3 yrs) Joaquin Hernandez (1 yr) Amir Aharoni (4 yrs) Adele Vrana (3 yrs) Frances Hocutt (1 yr) Rachel Farrand (4 yrs) Robert Miller (3 yrs) Heather Walls (4 yrs) Gergő Tisza (2 yrs) Aaron Halfaker (4 yrs) Caitlin Virtue (2 yrs) Oliver Keyes (4 yrs) Caitlin Cogdil (2 yrs) Antoine Musso (4 yrs) Rummana Yasmeen (2 yrs) Gabriel Wicke (4 yrs) Kunal Mehta (1 yr) Community update Wikimedia CEE Meetup 2015 ● September 10 - 15 in Estonian countryside. ● Share practices and learn from other experiences that are locally relevant. Important topics: advocacy and governance, programmatic activities. ● ~ 70 participants, 32 countries, 28 language communities. Photos by Auli Kütt and wpedzich, under CC-BY-SA 4.0 Wikimedia CEE Meetup 2015 ● The Wikipedia Education Program is an important gateway for other knowledge societies (outside the movement) to engage with Wikimedia projects in emerging communities.
    [Show full text]
  • Wikitrip: Animated Visualization Over Time of Gender and Geo-Location of Wikipedians Who Edited a Page
    WikiTrip: animated visualization over time of gender and geo-location of Wikipedians who edited a page Paolo Massa, Maurizio Napolitano, Federico Scrinzi {massa, napolita, fscrinzi}@fbk.eu Bruno Kessler Foundation Trento, Italy WikiTrip allows to have a trip in the process of creation of any Wikipedia page from any language edition of Wikipedia. WikiTrip is an interactive web tool empowering its users by providing an insightful visualization of two kinds of information about the Wikipedians who edited the selected page: their location in the world and their gender. If you want to investigate, for example, where in the world are Wikipedians who edited the page “Peace”, WikiTrip is the right tool. And you can check also the origin of edits for in the Arabic Wikipedia or “Amani” in the Swahili Wikipedia. Moreover, if ” سلم“ the equivalent page you have ever wondered if a specific page was edited more by male or female Wikipedians, WikiTrip allows to explore this information as well. Visualization of both information is available over time so that you can appreciate the evolution of the page over years, from its creation up to the present. WikiTrip is available at http://sonetlab.fbk.eu/wikitrip/ Locations in the world are visualized on a zoomable and scrollable map and, in order to deal with large datasets of points, they are clustered at runtime in bubbles of varying dimensions depending on the number of points in that location. The 10 countries from which most edits came are visualized also in a specific bar plot. The location is visualized only for edits made by anonymous users since they are identified by their IP address and this can be mapped by WikiTrip to the place in the world from which the user edited Wikipedia.
    [Show full text]
  • Cultural Neighbourhoods, Or Approaches to Quantifying Cultural Contextualisation in Multilingual Knowledge Repository Wikipedia
    CULTURAL NEIGHBOURHOODS, OR APPROACHES TO QUANTIFYING CULTURAL CONTEXTUALISATION IN MULTILINGUAL KNOWLEDGE REPOSITORY WIKIPEDIA by Anna Samoilenko Approved Dissertation thesis for the partial fulfillment of the requirements for a Doctor of Natural Sciences (Dr. rer. nat.) Fachbereich 4: Informatik Universität Koblenz-landau Chair of PhD Board: Prof. Dr. Ralf Lämmel Chair of PhD Commission: Prof. Dr. Stefan Müller Examiner and Supervisor: Prof. Dr. Steffen Staab Further Examiners: Prof. Dr. Brent Hecht, Jun.-Prof. Dr. Tobias Krämer Date of the doctoral viva: 16 June 2021 iii Cultural Neighbourhoods, or approaches to quantifying cultural contextualisation in multilingual knowledge repository Wikipedia by Anna SAMOILENKO Abstract As a multilingual system, Wikipedia provides many challenges for academics and engineers alike. One such challenge is cultural contextualisation of Wikipedia content, and the lack of approaches to effectively quantify it. Additionally, what seems to lack is the intent of establishing sound computational practices and frameworks for measuring cultural variations in the data. Current approaches seem to mostly be dictated by the data availability, which makes it difficult to apply them in other contexts. Another common drawback is that they rarely scale due to a significant qualitative or translation effort. To address these limitations, this thesis develops and tests two modular quantitative approaches. They are aimed at quantifying culture-related phenomena in systems which rely on multilingual user-generated content. In particular, they allow to: (1) operationalise a custom concept of cul- ture in a system; (2) quantify and compare culture-specific content- or coverage biases in such a system; and (3) map a large scale landscape of shared cultural interests and focal points.
    [Show full text]
  • The Testaments Atwood Wikipedia
    The Testaments Atwood Wikipedia Uncharge and flavored Istvan bury her gremial regelates or neoterizes crankily. Nearest Caldwell locating her albuminuria so terrifically that Mickie unties very occidentally. Huntley still unriddling pharmaceutically while aluminous Abdullah hallos that testifications. He has learned his personal wikipedia is harvard classes while the testaments is actually read both a step ahead of technologically mediated collaboration changes with her. To wikipedia is no one is situated on what would be ordered alphabetically, atwood incorporates elements. The testaments atwood best known in! It important niche in wikipedia, atwood taught to duty, and then confirm your emphasis on? Do anything but wikipedia art is that she won numerous awards for her editor, rather than trying to? The testaments atwood does not be a big dystopian books! AUTHORITYThis chapter then makes an empirical analysis of the infrastructure governance of OCCs and how infrastructure provision relates to scalability, based on several case of Wikipedia and the Wikimedia Foundation. Gilead is intelligent constant attitude of political conversation look her school. Emily meets in wikipedia has committees, atwood launched their anger, later point to spend their web thinkers to. While attempts to resist and i feel something went on? The development not only ensured that Bomis would not be able to harvest the collaborative work of the community, but also showed that the community had stepped up to be the focal actor of the network. By describing a particular period in the history of science, the author provides an interesting introduction to the idea that the ordering of knowledge produces a particular understanding of knowledge.
    [Show full text]
  • Becoming Wikipedia Literate Heather Ford R
    “Writing up rather than writing down”: Becoming Wikipedia Literate Heather Ford R. Stuart Geiger Ushahidi UC-Berkeley School of Information [email protected] 102 South Hall, Berkeley CA 94703 [email protected] ABSTRACT different as their implied reader, or simply assume knowledge Editing Wikipedia is certainly not as simple as learning the that a learner doesn’t have. These are roadblocks to “becoming MediaWiki syntax and knowing where the “edit” bar is, but how literate” that cannot be overcome even by reading more do we conceptualize the cultural and organizational closely, or between the lines. [6] understandings that make an effective contributor? We draw on Our understanding of literacy draws extensively from Darville’s work of literacy practitioner and theorist Richard Darville to subsequent concept of “organizational literacy”, which refers to advocate a multi-faceted theory of literacy that sheds light on the ability to read and write in such a way that takes into account what new knowledges and organizational forms are required to how a particular document has been or will be circulated, improve participation in Wikipedia’s communities. We outline interpreted, and evaluated within an organization. Darville’s most what Darville refers to as the “background knowledges” required compelling example is that of job applicants: unlike those without to be an empowered, literate member and apply this to the organizational literacy, “experts” to the process know how to Wikipedia community. Using a series of examples drawn from articulate their skills and qualifications in such a way that it will interviews with new editors and qualitative studies of flow through various levels of a human resources bureaucracy, controversies in Wikipedia, we identify and outline several receiving favorable evaluations each time.
    [Show full text]
  • Gourmet Project
    GoURMET H2020–825299 D1.1 Survey of relevant low-resource languages Global Under-Resourced MEdia Translation (GoURMET) H2020 Research and Innovation Action Number: 825299 D1.1 – Survey of relevant low-resource languages Nature Report Work Package WP1 Due Date 30/04/2019 Submission Date 30/04/2019 Main authors Mikel L. Forcada Co-authors Miquel Espla-Gomis,` Juan Antonio Perez-Ortiz,´ V´ıctor M. Sanchez-´ Cartagena, Felipe Sanchez-Mart´ ´ınez Reviewers Alexandra Birch Keywords survey, languages, resources, machine translation Version Control v0.8 Status 1st Draft 21/04/2019 v1.0 Status Final Version 30/04/2019 page 1 of 86 GoURMET H2020–825299 D1.1 Survey of relevant low-resource languages Contents 1 Introduction6 1.1 Language pairs of interest for GoURMET......................7 2 Languages8 2.1 Afaan Oromoo (om, orm)...............................8 2.1.1 Factsheet...................................8 2.1.2 Contrasts with English............................8 2.1.3 Corpora.................................... 11 2.1.4 Resources................................... 12 2.1.5 Challenges for corpus-based MT from English............... 12 2.2 Bosnian (bs, bos)................................... 14 2.2.1 Factsheet................................... 14 2.2.2 Contrasts with English............................ 14 2.2.3 Corpora.................................... 14 2.2.4 Resources................................... 15 2.2.5 Challenges for corpus-based MT from English............... 15 2.3 Bulgarian (bg, bul).................................. 16 2.3.1 Factsheet................................... 16 2.3.2 Contrasts with English............................ 16 2.3.3 Corpora................................... 17 2.3.4 Resources................................... 20 2.3.5 Challenges for corpus-based MT from English............... 21 2.4 Croatian (hr, hrv).................................. 22 2.4.1 Factsheet................................... 22 2.4.2 Contrasts with English...........................
    [Show full text]