arXiv:2010.11856v3 [cs.CL] 13 Apr 2021. Information-Seeking Questions: Questions from Non-English Native Speakers to Represent Real-World Applications
Total Pages: 16
File Type: PDF, Size: 1020 KB
Recommended publications
Cultural Anthropology Through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-Based Sentiment
Cultural Anthropology through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-based Sentiment. Peter A. Gloor, Joao Marcos, Patrick M. de Boer, Hauke Fuehres, Wei Lo, Keiichi Nemoto, MIT Center for Collective Intelligence. Abstract: In this paper we study the differences in historical World View between Western and Eastern cultures, represented through the English, the Chinese, Japanese, and German Wikipedia. In particular, we analyze the historical networks of the World’s leaders since the beginning of written history, comparing them in the different Wikipedias and assessing cultural chauvinism. We also identify the most influential female leaders of all times in the English, German, Spanish, and Portuguese Wikipedia. As an additional lens into the soul of a culture we compare top terms, sentiment, emotionality, and complexity of the English, Portuguese, Spanish, and German Wikinews. 1 Introduction: Over the last ten years the Web has become a mirror of the real world (Gloor et al. 2009). More recently, the Web has also begun to influence the real world: Societal events such as the Arab spring and the Chilean student unrest have drawn a large part of their impetus from the Internet and online social networks. In the meantime, Wikipedia has become one of the top ten Web sites, occasionally beating daily newspapers in the actuality of most recent news. Be it the resignation of German national soccer team captain Philipp Lahm, or the downing of Malaysian Airlines flight 17 in the Ukraine by a guided missile, the corresponding Wikipedia page is updated as soon as the actual event happened (Becker 2012).
Universality, Similarity, and Translation in the Wikipedia Inter-Language Link Network
In Search of the Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia Inter-language Link Network. Morten Warncke-Wang (1), Anuradha Uduwage (1), Zhenhua Dong (2), John Riedl (1). (1) GroupLens Research, Dept. of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, {morten,uduwage,riedl}@cs.umn.edu. (2) Dept. of Information Technical Science, Nankai University, Tianjin, China.
ABSTRACT: Wikipedia has become one of the primary encyclopaedic information repositories on the World Wide Web. It started in 2001 with a single edition in the English language and has since expanded to more than 20 million articles in 283 languages. Criss-crossing between the Wikipedias is an inter-language link network, connecting the articles of one edition of Wikipedia to another. We describe characteristics of articles covered by nearly all Wikipedias and those covered by only a single language edition, we use the network to understand how we can judge the similarity between Wikipedias based on concept coverage, and we investigate the flow of translation between a selection of the larger Wikipedias.
1. INTRODUCTION: The world: seven seas separating seven continents, seven billion people in 193 nations. The world's knowledge: 283 Wikipedias totalling more than 20 million articles. Some of the content that is contained within these Wikipedias is probably shared between them; for instance it is likely that they will all have an article about Wikipedia itself. This leads us to ask whether there exists some ur-Wikipedia, a set of universal knowledge that any human encyclopaedia will contain, regardless of language, culture, etc? With such a large number of Wikipedia editions, what can we learn about the knowledge in the ur-Wikipedia?
Cross-Cultural Research
Cross-Cultural Research, Volume 43, Number 1, February 2009, 62-85. Cultural Adaptations After Progressionism. Lauren W. McCall, National Evolutionary Synthesis Center. DOI: 10.1177/1069397108328613. © 2009 Sage Publications, published on behalf of the Society for Cross-Cultural Research. The online version of this article can be found at: http://ccr.sagepub.com/cgi/content/abstract/43/1/62
How should behavioral scientists interpret apparently progressive stages of cultural history? Adaptive progress in biology is thought to only occur locally, relative to local conditions. Just as evolutionary theory offers physical anthropologists an appreciation of global human diversity through local adaptation, so the metaphor of adaptation offers behavioral scientists an appreciation of cultural diversity through analogous mechanisms. Analyses reported here test for cultural adaptation in both biotic and abiotic environments. Testing cultural adaptation to the human-made environment, the culture’s pre-existing technical complexity is shown to be a predictive factor. Then testing cultural adaptation to the physical environment, this article corroborates Divale’s (1999) finding that counting systems are adaptations to unstable environments, and expands the model to include other environmental indices and cultural traits.
A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2373–2380, Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC. A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages. Dwaipayan Roy (GESIS - Cologne), Sumit Bhatia (IBM Research - Delhi), Prateek Jain (IIIT - Delhi). Abstract: Wikipedia is the largest web-based open encyclopedia covering more than three hundred languages. However, different language editions of Wikipedia differ significantly in terms of their information coverage. We present a systematic comparison of information coverage in English Wikipedia (most exhaustive) and Wikipedias in eight other widely spoken languages (Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish and Turkish). We analyze the content present in the respective Wikipedias in terms of the coverage of topics as well as the depth of coverage of topics included in these Wikipedias. Our analysis quantifies and provides useful insights about the information gap that exists between different language editions of Wikipedia and offers a roadmap for the Information Retrieval (IR) community to bridge this gap. Keywords: Wikipedia, Knowledge base, Information gap. 1. Introduction: Wikipedia is the largest web-based encyclopedia covering … other with respect to the coverage of topics as well as the amount of information about overlapping topics.
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
Information, Article. Modeling Popularity and Reliability of Sources in Multilingual Wikipedia. Włodzimierz Lewoniewski (corresponding author), Krzysztof Węcel and Witold Abramowicz. Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland. Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020. Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia.
Handwriting Recognition in Indian Regional Scripts: a Survey of Offline Techniques
Handwriting Recognition in Indian Regional Scripts: A Survey of Offline Techniques. UMAPADA PAL, Indian Statistical Institute; RAMACHANDRAN JAYADEVAN, Pune Institute of Computer Technology; NABIN SHARMA, Indian Statistical Institute. Offline handwriting recognition in Indian regional scripts is an interesting area of research as almost 460 million people in India use regional scripts. The nine major Indian regional scripts are Bangla (for Bengali and Assamese languages), Gujarati, Kannada, Malayalam, Oriya, Gurumukhi (for Punjabi language), Tamil, Telugu, and Nastaliq (for Urdu language). A state-of-the-art survey about the techniques available in the area of offline handwriting recognition (OHR) in Indian regional scripts will be of a great aid to the researchers in the subcontinent and hence a sincere attempt is made in this article to discuss the advancements reported in this regard during the last few decades. The survey is organized into different sections. A brief introduction is given initially about automatic recognition of handwriting and official regional scripts in India. The nine regional scripts are then categorized into four subgroups based on their similarity and evolution information. The first group contains Bangla, Oriya, Gujarati and Gurumukhi scripts. The second group contains Kannada and Telugu scripts and the third group contains Tamil and Malayalam scripts. The fourth group contains only Nastaliq script (Perso-Arabic script for Urdu), which is not an Indo-Aryan script. Various feature extraction and classification techniques associated with the offline handwriting recognition of the regional scripts are discussed in this survey. As it is important to identify the script before the recognition step, a section is dedicated to handwritten script identification techniques.
Towards a Korean DBpedia and an Approach for Complementing the Korean Wikipedia Based on DBpedia
Towards a Korean DBpedia and an Approach for Complementing the Korean Wikipedia based on DBpedia. Eun-kyung Kim (1), Matthias Weidl (2), Key-Sun Choi (1), Sören Auer (2). (1) Semantic Web Research Center, CS Department, KAIST, Korea, 305-701. (2) Universität Leipzig, Department of Computer Science, Johannisgasse 26, D-04103 Leipzig, Germany. Abstract. In the first part of this paper we report about experiences when applying the DBpedia extraction framework to the Korean Wikipedia. We improved the extraction of non-Latin characters and extended the framework with pluggable internationalization components in order to facilitate the extraction of localized information. With these improvements we almost doubled the amount of extracted triples. We also will present the results of the extraction for Korean. In the second part, we present a conceptual study aimed at understanding the impact of international resource synchronization in DBpedia. In the absence of any information synchronization, each country would construct its own datasets and manage it from its users. Moreover the cooperation across the various countries is adversely affected. Keywords: Synchronization, Wikipedia, DBpedia, Multi-lingual. 1 Introduction: Wikipedia is the largest encyclopedia of mankind and is written collaboratively by people all around the world. Everybody can access this knowledge as well as add and edit articles. Right now Wikipedia is available in 260 languages and the quality of the articles reached a high level [1]. However, Wikipedia only offers full-text search for this textual information. For that reason, different projects have been started to convert this information into structured knowledge, which can be used by Semantic Web technologies to ask sophisticated queries against Wikipedia.
Mathematics in African History and Cultures
Paulus Gerdes & Ahmed Djebbar. MATHEMATICS IN AFRICAN HISTORY AND CULTURES: AN ANNOTATED BIBLIOGRAPHY. African Mathematical Union, Commission on the History of Mathematics in Africa (AMUCHMA). Second edition, 2007. First edition: African Mathematical Union, Cape Town, South Africa, 2004. ISBN: 978-1-4303-1537-7. Published by Lulu. Copyright © 2007 by Paulus Gerdes & Ahmed Djebbar.
Authors: Paulus Gerdes, Research Centre for Mathematics, Culture and Education, C.P. 915, Maputo, Mozambique. Ahmed Djebbar, Département de mathématiques, Bt. M 2, Université de Lille 1, 59655 Villeneuve D’Asq Cedex, France.
Cover design inspired by a pattern on a mat woven in the 19th century by a Yombe woman from the Lower Congo area (Cf. GER-04b, p. 96).
Table of contents: Preface by the President of the African Mathematical Union (Prof. Jan Persens). Introduction. Introduction to the new edition. Bibliography: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, R, S, T, U, V, W, Y, Z. Appendices: 1. On mathematicians of African descent / Diaspora. 2. Publications by Africans on the History of Mathematics outside Africa (including reviews of these publications). 3. On Time-reckoning and Astronomy in African History and Cultures. 4. String figures in Africa. 5. Examples of other Mathematical Books and …
Bengali Handwritten Numeral Recognition Using Artificial Neural Network and Transition Elements
BENGALI HANDWRITTEN NUMERAL RECOGNITION USING ARTIFICIAL NEURAL NETWORK AND TRANSITION ELEMENTS. Zahidur Rahim Chowdhury (a, corresponding author), Mohammad Abu Naser (b), Ashraf Bin Islam (c). (a) United International University, Dhaka. (b) Islamic University of Technology, Gajipur. (c) Bangladesh University of Engineering and Technology, Dhaka.
Abstract: Bengali handwriting recognition has potential application in document processing for one of the most widely used languages in the world. A method using an Artificial Neural Network (ANN) is utilized primarily to identify numerals of the language using transition features. A maximum accuracy of 82% is reported in this article for an optimized network. The typical performance of a handwriting recognition system that uses a single recognition scheme is around 85%. The significance of local features in a character should be incorporated to enhance the overall performance of the network. Key words: Bengali Hand-writing, Numeral, Pattern Recognition, Neural Network, Transition.
INTRODUCTION: Hand-written Bengali character recognition is a process where techniques of pattern recognition are applied to analyze handwritings of Bengali language, one of the most popular languages in the world. Beside Bengali, researchers have studied the recognition process using different techniques for other popular languages like English [1–4], Chinese [5], Arabic [6], Japanese [7, 8], and Indic [9]. … article. Recognition results from different systems were compared to make the final decision. The average recognition rate, error rate and reliability achieved by the integrated system were 95.05%, 0.3% and 99.03%, respectively. In this article, a recognition process for Bengali hand written numerals is presented using ‘holistic’ approaches due to limited number of possible outputs. Transition features of an image, instead of the complete image, were used as inputs of the ANN for an efficient and a compact …
QUARTERLY CHECK-IN Technology (Services) TECH GOAL QUADRANT
QUARTERLY CHECK-IN: Technology (Services)
TECH GOAL QUADRANT
A: Foundation level goals
B: Features we build for others
C: Features that we build to improve our technology offering
D: Modernization, renewal and tech debt goals
The goals in each team pack are annotated using this scheme to illustrate the broad trends in our priorities.
Agenda: CTO Team; Research and Data; Design Research; Performance; Release Engineering; Security; Technical Operations
Technology (Services): CTO, July 2017 quarterly check-in. All content is © Wikimedia Foundation & available under CC BY-SA 4.0, unless noted otherwise.
CTO Team: Victoria Coleman - Chief Technology Officer; Joel Aufrecht - Program Manager (Technology); Lani Goto - Project Assistant; Megan Neisler - Senior Project Coordinator; Sarah Rodlund - Senior Project Coordinator; Kevin Smith - Program Manager (Engineering)
CHECK IN, WIKIMEDIA FOUNDATION, July 2017. Team/Dept: CTO. Program: 4.5.
ANNUAL PLAN GOAL: expand and strengthen our technical communities
What is your objective / workflow? Program 4: Technical community building. Outcome 5: Organize Wikimedia Developer Summit. Objective 1: Developer Summit web page published four months before the event (B).
Who are you working with? Last quarter: (none). Next quarter: Technical Collaboration.
What impact / deliverables are you expecting? Decide on event location, dates, theme, deadlines, etc. and publicize the information.
STATUS: OBJECTIVE IN PROGRESS
Technology (Services): Research and Data, July 2017 quarterly
LNCS 8104, Pp
Bengali Printed Character Recognition – A New Approach. Soharab Hossain Shaikh (1), Marek Tabedzki (2), Nabendu Chaki (3), and Khalid Saeed (4). (1) A.K.Choudhury School of Information Technology, University of Calcutta, India. (2) Faculty of Computer Science, Bialystok University of Technology, Poland. (3) Department of Computer Science & Engineering, University of Calcutta, India. (4) Faculty of Physics and Applied Computer Science, AGH University of Science and Technology, Cracow, Poland. Abstract. This paper presents a new method for Bengali character recognition based on view-based approach. Both the top-bottom and the lateral view-based approaches have been considered. A layer-based methodology in modification of the basic view-based processing has been proposed. This facilitates handling of unequal logical partitions. The document image is acquired and segmented to extract out the text lines, words, and letters. The whole image of the individual characters is taken as the input to the system. The character image is put into a bounding box and resized whenever necessary. The view-based approach is applied on the resultant image and the characteristic points are extracted from the views after some preprocessing. These points are then used to form a feature vector that represents the given character as a descriptor. The feature vectors have been classified with the aid of k-NN classifier using Dynamic Time Warping (DTW) as a distance measure. A small dataset of some of the compound characters has also been considered for recognition. The promising results obtained so far encourage the authors for further work on handwritten Bengali scripts.
Explaining Cultural Borders on Wikipedia Through Multilingual Co-Editing Activity
Samoilenko et al. EPJ Data Science (2016) 5:9, DOI 10.1140/epjds/s13688-016-0070-8. REGULAR ARTICLE, Open Access. Linguistic neighbourhoods: explaining cultural borders on Wikipedia through multilingual co-editing activity. Anna Samoilenko (1, 2, corresponding author), Fariba Karimi (1), Daniel Edler (3), Jérôme Kunegis (2) and Markus Strohmaier (1, 2). (1) GESIS - Leibniz-Institute for the Social Sciences, 6-8 Unter Sachsenhausen, Cologne, 50667, Germany. (2) University of Koblenz-Landau, Koblenz, Germany. Full list of author information is available at the end of the article.
Abstract: In this paper, we study the network of global interconnections between language communities, based on shared co-editing interests of Wikipedia editors, and show that although English is discussed as a potential lingua franca of the digital space, its domination disappears in the network of co-editing similarities, and instead local connections come to the forefront. Out of the hypotheses we explored, bilingualism, linguistic similarity of languages, and shared religion provide the best explanations for the similarity of interests between cultural communities. Population attraction and geographical proximity are also significant, but much weaker factors bringing communities together. In addition, we present an approach that allows for extracting significant cultural borders from editing activity of Wikipedia users, and comparing a set of hypotheses about the social mechanisms generating these borders. Our study sheds light on how culture is reflected in the collective process