Language Independent Named Entity Recognition
Recommended publications
arXiv:2010.11856v3 [cs.CL] 13 Apr 2021: Information-Seeking Questions from Non-English Native Speakers to Represent Real-World Applications
XOR QA: Cross-lingual Open-Retrieval Question Answering
Akari Asai (University of Washington), Jungo Kasai (University of Washington), Jonathan H. Clark (Google Research), Kenton Lee (Google Research), Eunsol Choi (The University of Texas at Austin), Hannaneh Hajishirzi (University of Washington; Allen Institute for AI)
{akari, jkasai, hannaneh}@cs.washington.edu, {jhclark, kentonl}@google.com, [email protected]
Abstract: Multilingual question answering tasks typically assume that answers exist in the same language as the question. Yet in practice, many languages face both information scarcity, where languages have few reference articles, and information asymmetry, where questions reference concepts from other cultures. This work extends open-retrieval question answering to a cross-lingual setting, enabling questions from one language to be answered via answer content from another language. We construct a large-scale dataset built on 40K information-seeking questions across 7 diverse non-English languages that TyDi QA could not find same-language answers for. Based on this dataset, we introduce…
Figure 1: Overview of XOR QA. Given a question in L_i, the model finds an answer in either English or L_i Wikipedia and returns an answer in English or L_i. L_i is one of the 7 typologically diverse languages. In the example, the Japanese question ロン・ポールの学部時代の専攻は? ("What did Ron Paul major in during undergraduate?") is posed against multilingual document collections (Wikipedias). The Japanese Wikipedia article ロン・ポール gives 高校卒業後はゲティスバーグ大学へ進学。 ("After high school, he went to Gettysburg College."), whereas the English Wikipedia article Ron Paul states: "Paul went to Gettysburg College, where he was a member of the Lambda Chi Alpha fraternity. He graduated with a B.S. degree in Biology in 1957." The returned answer is 生物学 (Biology).
Role of Libraries in Wikipedia Content Development
Role of libraries in Wikipedia content development
Dr Vimal Kumar V., Technical Assistant, Mahatma Gandhi University Library, Kerala State, India. LISACON-2020 National Virtual Conference
Introduction: Encyclopedias are collections of articles summarized from primary and secondary information sources. Centralised editorial activity is the main highlight of traditional encyclopedias. The fundamental concept of the traditional encyclopedia changed with the arrival of online alternatives like Wikipedia, whose main features are that it is multilingual, open content and free. Wikipedia introduced decentralised editorial activity that depends on volunteers.
Wikipedia in Indian languages: Wikipedia's Indian-language editions became active after the introduction of the Unicode standard. The efforts of the Indic Project and SMC have contributed to the development of tools for local languages. According to Wikimedia Statistics, India consistently ranks 5th in page views in the country-wise ranking.
Article strength of South Indian language Wikipedias (number of articles as on 1 August 2020, culled from stats.wikimedia.org):
1. Tamil Wikipedia: 1,30,122 articles, established 2003
2. Malayalam Wikipedia: 69,911 articles, established 2002
3. Telugu Wikipedia: 69,739 articles, established 2003
4. Kannada Wikipedia: 26,397 articles, established 2003
Ratio of editors (for every million speakers): Tamil 1; Malayalam 4; Telugu 0.7; Kannada 0.7
How Wikipedia works: Community members power Wikipedia. There are two groups in the community: Wikipedia readers and content contributors. Wikipedia content editors are known as Wikipedians. The main function of Wikipedians is to create new articles, add new content to existing articles, and make changes to the content.
Arabic Wikipedia As an Example
Which tools to manage a medium-sized version of Wikipedia? Arabic Wikipedia as an example
Helmi HAMDI, M. Sc. / M. Env. (username: Helmoony). Wikiarabia 2015, Monastir, Tunisia, April 5, 2015
Summary
• Community goals
• Current management approach in Arabic Wikipedia
• Tools Recommendations
List of Wikipedias by speakers per article (rank. Wikipedia: speakers; articles per 1,000 speakers):
1. Volapük: 200; 600,420
9. Scots: 100,000; 305
59. French: 74,980,460; 21.5
69. English: 505,000,000; 9.6
104. Arabic: 236,748,330; 1.5
124. Hindi: 260,333,620; 0.4
Volapük and Scots are mainly constructed, regional and « bot-friendly » versions. Arabic and Hindi Wikipedias face the same situation: a low ratio of articles per speaker.
Arabic Wikipedia in the next 5 years: our objective is to be in the Top 10 with a minimum of 5% quality content and an optimized way of managing. Present situation: around 350 000 articles, 1% quality articles (image source: http://www.worldbridgerdesign.com/blog/tag/learning/).
Current management approach in Arabic Wikipedia: we are copying everything from the English Wikipedia (policies, content depth, tools, etc.). Does it help us achieve our objective?
Limits of the current management approach
• No priorities
• No task list
• The number of tools doesn't help us gather our forces: the Arabic version of the village pump, letters to the community, an empty chatroom. When should we use the village pump, and when the mailing list?
• We have a WikiProject for the Japanese language and another one for Twilight! We have a WikiProject for the Paris metro and none about France or Europe. Who is going to participate in a WikiProject for a metro in a European city? And for how long?
WikiProjects… or user projects?
• More than 60 projects
• Users mix task forces or missions with projects
• No structure linking the projects
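The ratio column in the Arabic Wikipedia slides is simple arithmetic, and it can be sanity-checked by inverting it. The following is a minimal Python sketch, assuming the ratio means articles ÷ speakers × 1,000 (our reading of the column name, not a definition given in the talk). The function names are hypothetical, and the article counts it prints are back-calculated estimates, not figures reported on the slides. For Arabic the estimate comes to roughly 355,000 articles, which agrees with the slide's "around 350 000 articles".

```python
# Sketch: sanity-check the "articles per 1,000 speakers" ratio from the Wikiarabia 2015 slide.
# ASSUMPTION: ratio = articles / speakers * 1000 (inferred from the column name).
# Function names are hypothetical helpers, not part of any tool mentioned in the talk.


def articles_per_thousand(articles: float, speakers: float) -> float:
    """Return the number of articles per 1,000 speakers."""
    return articles / speakers * 1000


def implied_articles(speakers: float, per_thousand: float) -> float:
    """Invert the ratio to estimate an article count from speakers and ratio."""
    return speakers * per_thousand / 1000


# (speakers, articles per 1,000 speakers) as listed on the slide
SLIDE_ROWS = {
    "Volapük": (200, 600_420),
    "Scots": (100_000, 305),
    "French": (74_980_460, 21.5),
    "English": (505_000_000, 9.6),
    "Arabic": (236_748_330, 1.5),
    "Hindi": (260_333_620, 0.4),
}

if __name__ == "__main__":
    for name, (speakers, ratio) in SLIDE_ROWS.items():
        estimate = implied_articles(speakers, ratio)
        # Round trip: recomputing the ratio from the estimate recovers the slide value.
        assert abs(articles_per_thousand(estimate, speakers) - ratio) < 1e-6 * ratio
        print(f"{name:8s} ~{estimate:>12,.0f} articles")
```

Because the ratio divides by the number of speakers, an edition with very few speakers (Volapük, 200 speakers) reaches a very high value even with a moderate article count.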
Annual Report
2012 | ANNUAL REPORT
Students in a Digital Classroom
CIS ANNUAL REPORT (APRIL 2012 – MARCH 2013)
Contents: Highlights; Accessibility; Access to Knowledge; Openness; Internet Governance; Telecom; Digital Natives; Researchers@Work; Credibility Alliance Norms Compliance; International Travel (2012-13)
Wikimedia India Newsletter, September 2010
Wikimedia India Community Newsletter
Copyright: The text of this newsletter is copyrighted and is formally licensed to the public under the liberal "Creative Commons Attribution-ShareAlike 3.0 Unported License (CC BY-SA)". This newsletter as a whole (including this copyright statement) or its content can be copied, modified, and redistributed if and only if the copied version is made available on similar license terms. Anyone copying, modifying or redistributing this newsletter is requested to attribute its authors (a link back to the original document or a general mention of it satisfies the attribution requirement). Reuse of the logos of the Wikimedia Foundation is strictly restricted. The logos of the Wikimedia Foundation, Wikipedia, and other wiki projects are used in this newsletter as per the trademark policy of the Wikimedia Foundation. Use of the logos in media and press reports about Wikimedia and its projects is permitted; any other use needs explicit permission. The content of this document is covered by a disclaimer.
Disclaimer: The items contained herein are published as submitted and are provided for general information purposes only. This information is not advice. Readers should not rely solely on this information, but should make their own inquiries before making any decisions. The authors behind this newsletter work to maintain up-to-date information from reliable sources; however, no responsibility is accepted for any errors or omissions or for the results of any actions based upon this information. If you have any questions regarding any of these items, please contact us. This newsletter may contain links to websites that are created and maintained by other volunteers outside this newsletter, and it does not guarantee the accuracy or completeness of any information presented there.
Linguistic Neighbourhoods: Explaining Cultural Borders on Wikipedia Through Multilingual Co-Editing Activity
Samoilenko et al. RESEARCH
Linguistic neighbourhoods: Explaining cultural borders on Wikipedia through multilingual co-editing activity
Anna Samoilenko (1,3,*), Fariba Karimi (1), Daniel Edler (2), Jérôme Kunegis (3) and Markus Strohmaier (1,3)
*Correspondence: [email protected]
1 GESIS – Leibniz Institute for the Social Sciences, 6-8 Unter Sachsenhausen, 50667 Cologne, Germany. Full list of author information is available at the end of the article.
Abstract: In this paper, we study the network of global interconnections between language communities, based on shared co-editing interests of Wikipedia editors, and show that although English is discussed as a potential lingua franca of the digital space, its domination disappears in the network of co-editing similarities, and instead local connections come to the forefront. Out of the hypotheses we explored, bilingualism, linguistic similarity of languages, and shared religion provide the best explanations for the similarity of interests between cultural communities. Population attraction and geographical proximity are also significant, but much weaker, factors bringing communities together. In addition, we present an approach that allows for extracting significant cultural borders from the editing activity of Wikipedia users, and for comparing a set of hypotheses about the social mechanisms generating these borders. Our study sheds light on how culture is reflected in the collective process of archiving knowledge on Wikipedia, and demonstrates that cross-lingual interconnections on Wikipedia are not dominated by one powerful language. Our findings also raise some important policy questions for the Wikimedia Foundation.
Keywords: Wikipedia; Multilingual; Cultural similarity; Network; Digital language divide; Socio-linguistics; Digital Humanities; Hypothesis testing
1 Introduction
Measuring the extent to which cultural communities overlap via the knowledge they preserve can paint a picture of how culturally proximate or diverse they are.
Common Issues Faced by Indic Wikipedia
Indic Wikipedia Policies & Guidelines Handbook
Table of contents: Preface; Introduction to policies; Types of policies; Features of a policy page; Necessity of policies and guidelines; Creating policies; Proposing; Village pump; Article or project talk page; Policy page and its talk page; Initial proposal; Highlighting important discussion; Discussing; Consensus; Implementing; Modifying or updating an existing policy; Enforcements; Common issues faced by Indic Wikipedia communities; Missing or incomplete policy pages; Incomplete or untranslated policy pages; Lack of active translators/editors; Addressing the issues; Dedicated team or task force; Using MediaWiki translation tool; Policy mapping; Credits; Images; Text; Screenshots; Planning suggestions; Proofreading
Preface: Currently CIS-A2K is working with five Indian-language Wikimedia communities (Kannada, Konkani, Marathi, Odia and Telugu). While working with these communities, we observed a number of issues affecting them, and we also noticed many similarities between the issues and difficulties they face. So we decided to create this “Indic Wikipedia Policies and Guidelines Handbook”. At first, we created a short handbook discussing a number of topics, such as how to create new policies or modify existing ones, using the village pump, enforcing policies, etc. Then we talked to Indic Wikipedians to learn more about the policy- and guideline-related issues and problems they are facing. We also asked for their feedback on the first draft of this handbook. When we contacted them and asked them to join our survey, we received an overwhelming response. We thank everyone who has taken part in our surveys, and we will continue communicating with Indic Wikimedians.
Growing Wikipedia: The India Chronicles
September 2011, by Tory Read
Growing Wikipedia: The India Chronicles
Dear Community,
As the Wikimedia Foundation began its catalyst work, we commissioned documentarian Tory Read to create a vivid description of our work in India during the important early stages of our activities. This was done in the interest of transparency and to ensure that we captured lessons from this new approach. It also serves as a window into some of the exciting developments in the Indian Wikimedia community. Our goal is to communicate honestly about our work in this new arena and to stimulate dialogue about diverse ways to support and build Wikipedia communities and Wikimedia projects around the world. We hope that you take away a nuanced understanding of the work in India. We encourage you to tell us what you think and ask informed questions as this work continues to unfold.
Sincerely,
Barry Newstead, Chief Global Development Officer, Wikimedia Foundation
This is a journalistic account and analysis, based on document review, interviews and observations conducted between November 2010 and June 2011, including 16 days in India in June 2011. I planned and organized my visit based on where the most Wikipedia activities were happening at the time. The Malayalam Wikipedia community had been planning their annual meetup, and they scheduled it to fall within my travel dates so I could report on it. The views expressed herein are my own and do not necessarily reflect the views of the Wikimedia Foundation.
Tory Read, documentarian
“Wikipedia saved my life.” That’s what Srikeit Tadepalli, an MBA student in Pune, India, told me one day in June.
Cultural Neighbourhoods, Or Approaches to Quantifying Cultural Contextualisation in Multilingual Knowledge Repository Wikipedia
CULTURAL NEIGHBOURHOODS, OR APPROACHES TO QUANTIFYING CULTURAL CONTEXTUALISATION IN MULTILINGUAL KNOWLEDGE REPOSITORY WIKIPEDIA
by Anna Samoilenko
Approved dissertation thesis in partial fulfillment of the requirements for a Doctor of Natural Sciences (Dr. rer. nat.), Fachbereich 4: Informatik (Department 4: Computer Science), Universität Koblenz-Landau
Chair of PhD Board: Prof. Dr. Ralf Lämmel
Chair of PhD Commission: Prof. Dr. Stefan Müller
Examiner and Supervisor: Prof. Dr. Steffen Staab
Further Examiners: Prof. Dr. Brent Hecht, Jun.-Prof. Dr. Tobias Krämer
Date of the doctoral viva: 16 June 2021
Abstract: As a multilingual system, Wikipedia poses many challenges for academics and engineers alike. One such challenge is the cultural contextualisation of Wikipedia content, and the lack of approaches to effectively quantify it. Additionally, there seems to be little intent to establish sound computational practices and frameworks for measuring cultural variations in the data. Current approaches seem to be mostly dictated by data availability, which makes it difficult to apply them in other contexts. Another common drawback is that they rarely scale, due to a significant qualitative or translation effort. To address these limitations, this thesis develops and tests two modular quantitative approaches. They are aimed at quantifying culture-related phenomena in systems that rely on multilingual user-generated content. In particular, they allow one to: (1) operationalise a custom concept of culture in a system; (2) quantify and compare culture-specific content or coverage biases in such a system; and (3) map a large-scale landscape of shared cultural interests and focal points.
Annual Report
2013-14 | ANNUAL REPORT
Pictured above: Posters exhibited during CIS 5 year celebrations in its office in Bangalore
CIS ANNUAL REPORT (APRIL 2013 – MARCH 2014)
Contents: Highlights; Accessibility and Inclusion; Access to Knowledge; Internet Governance; Knowledge Repository on Internet Access; Telecom; Digital Natives; Digital Humanities; Credibility Alliance Norms Compliance
Index, 2008, by Phoebe Ayers, Ben Yates, and Charles Matthews
INDEX
Symbols & Numbers
<!-- and -->, in hidden comments, 158
' (apostrophe), in bold and italic text, 145
* (asterisk), in bulleted lists, 147
: (colon), indented lines with, 146
{{ }} (curly brackets), templates and, 145, 270
== (equal signs), in sections, 155–156
| (pipe character)
  image parameters, 267
  internal links, 149
  table syntax, 279–280
  template parameters and, 271
[[ ]] (square brackets), internal links and, 149–150
~ (tilde), in signatures, 115, 341
1.0. See Wikipedia 1.0 WikiProject
3RR. See Three-Revert Rule (3RR)
5P. See Five Pillars
1911 Encyclopaedia Britannica, 163
A
academic qualifications, of editors, 53–57, 316
accounts.
…
anniversaries. See date-related articles
anonymous editors, 302, 304–305, 325
April Fools’ Day main page, 353
Arbitration, policy on (ARB), 375
Arbitration Committee, 398–400
  cases, 399–400, 400
arguments. See disputes
article history. See page history
article message boxes. See templates, uses of, warning
article namespace, 27–28
article titles, 168–169
  changing. See moving pages
  forbidden characters in, 169
  lowercase in, 169
articles
  creating, 162–170, 167
  definition of, 5
  editing. See editing
  missing, 163
  number of, 3–5, 4. See also milestones
  policies for. See policies, content
…
authority, arguing from, 54–55, 57
authors, of articles. See editors
autobiography, 207. See also Conflict of Interest
  guideline, 378
autoconfirm, 303
AutoWiki Browser (AWB), 210
awards, for editors, 333–334
AWB (AutoWiki Browser), 210
B
backlinks. See What Links Here
Bad Jokes and Other Deleted Nonsense (BJAODN), 351
bans, 403. See also blocks
barnstars, 333–334
Be Bold, 138, 365–366
  guideline, 378
BEANS, 366
Bibliography section, 103
biographies, 7
  article titles and, 168–169
  of living persons, 23, 52, 207
Indian Language Wikipedias: a Comparison Study
International Journal of Emerging Engineering Research and Technology, Volume 3, Issue 4, April 2015, pp. 93-97, ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online)
Indian Language Wikipedias: A Comparison Study
Vasudevan T V, Asst. Professor, Department of Computer Applications, MES College of Engineering, Kuttippuram, Kerala, India
ABSTRACT
Wikipedia is a popular, free, publicly editable internet encyclopedia supported by the non-profit Wikimedia Foundation. This paper presents an overview of research on Indian-language Wikipedias. Different research areas related to Wikipedia are examined first. This is followed by a comparison study of the major Indian-language Wikipedias, which analyses the fundamental components of Wikipedia such as articles, authors and edits.
Keywords: Wikipedia, Indian Language, Quantitative Analysis, Articles, Authors, Edits
INTRODUCTION
Wikipedia is a free online multilingual encyclopedia that can be edited by anyone. It is supported by the non-profit Wikimedia Foundation and was launched on January 15, 2001 [1]. Presently it contains 35 million articles in 288 languages. The English edition of Wikipedia alone contains over 4.8 million articles, compared to more than 120,000 articles in the next largest English-language encyclopedia, Encyclopedia Britannica Online [2]. Wikipedia is interesting to research because of the vastness and open nature of its data: we can analyse topics such as its fundamental components, the structure and growth of information, author collaboration, etc.
HISTORY OF WIKIPEDIA IN INDIAN LANGUAGES
Assamese Wikipedia, the first Indian-language Wikipedia, was started on 2 June 2002. However, Tamil Wikipedia was the first to reach the milestone of 100 articles, crossing it in January 2004.