ISO 639-3 Registration Authority Request For
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Cultural Anthropology Through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-Based Sentiment
Cultural Anthropology through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-based Sentiment Peter A. Gloor, Joao Marcos, Patrick M. de Boer, Hauke Fuehres, Wei Lo, Keiichi Nemoto [email protected] MIT Center for Collective Intelligence Abstract In this paper we study the differences in historical World View between Western and Eastern cultures, represented through the English, the Chinese, Japanese, and German Wikipedia. In particular, we analyze the historical networks of the World’s leaders since the beginning of written history, comparing them in the different Wikipedias and assessing cultural chauvinism. We also identify the most influential female leaders of all times in the English, German, Spanish, and Portuguese Wikipedia. As an additional lens into the soul of a culture we compare top terms, sentiment, emotionality, and complexity of the English, Portuguese, Spanish, and German Wikinews. 1 Introduction Over the last ten years the Web has become a mirror of the real world (Gloor et al. 2009). More recently, the Web has also begun to influence the real world: Societal events such as the Arab spring and the Chilean student unrest have drawn a large part of their impetus from the Internet and online social networks. In the meantime, Wikipedia has become one of the top ten Web sites1, occasionally beating daily newspapers in the actuality of most recent news. Be it the resignation of German national soccer team captain Philipp Lahm, or the downing of Malaysian Airlines flight 17 in the Ukraine by a guided missile, the corresponding Wikipedia page is updated as soon as the actual event happened (Becker 2012. -
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
information Article Modeling Popularity and Reliability of Sources in Multilingual Wikipedia Włodzimierz Lewoniewski * , Krzysztof W˛ecel and Witold Abramowicz Department of Information Systems, Pozna´nUniversity of Economics and Business, 61-875 Pozna´n,Poland; [email protected] (K.W.); [email protected] (W.A.) * Correspondence: [email protected] Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020 Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia. -
Wikipedia, the Free Encyclopedia 03-11-09 12:04
Tea - Wikipedia, the free encyclopedia 03-11-09 12:04 Tea From Wikipedia, the free encyclopedia Tea is the agricultural product of the leaves, leaf buds, and internodes of the Camellia sinensis plant, prepared and cured by various methods. "Tea" also refers to the aromatic beverage prepared from the cured leaves by combination with hot or boiling water,[1] and is the common name for the Camellia sinensis plant itself. After water, tea is the most widely-consumed beverage in the world.[2] It has a cooling, slightly bitter, astringent flavour which many enjoy.[3] The four types of tea most commonly found on the market are black tea, oolong tea, green tea and white tea,[4] all of which can be made from the same bushes, processed differently, and in the case of fine white tea grown differently. Pu-erh tea, a post-fermented tea, is also often classified as amongst the most popular types of tea.[5] Green Tea leaves in a Chinese The term "herbal tea" usually refers to an infusion or tisane of gaiwan. leaves, flowers, fruit, herbs or other plant material that contains no Camellia sinensis.[6] The term "red tea" either refers to an infusion made from the South African rooibos plant, also containing no Camellia sinensis, or, in Chinese, Korean, Japanese and other East Asian languages, refers to black tea. Contents 1 Traditional Chinese Tea Cultivation and Technologies 2 Processing and classification A tea bush. 3 Blending and additives 4 Content 5 Origin and history 5.1 Origin myths 5.2 China 5.3 Japan 5.4 Korea 5.5 Taiwan 5.6 Thailand 5.7 Vietnam 5.8 Tea spreads to the world 5.9 United Kingdom Plantation workers picking tea in 5.10 United States of America Tanzania. -
International Journal of Computational Linguistics
International Journal of Computational Linguistics & Chinese Language Processing Aims and Scope International Journal of Computational Linguistics and Chinese Language Processing (IJCLCLP) is an international journal published by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). This journal was founded in August 1996 and is published four issues per year since 2005. This journal covers all aspects related to computational linguistics and speech/text processing of all natural languages. Possible topics for manuscript submitted to the journal include, but are not limited to: • Computational Linguistics • Natural Language Processing • Machine Translation • Language Generation • Language Learning • Speech Analysis/Synthesis • Speech Recognition/Understanding • Spoken Dialog Systems • Information Retrieval and Extraction • Web Information Extraction/Mining • Corpus Linguistics • Multilingual/Cross-lingual Language Processing Membership & Subscriptions If you are interested in joining ACLCLP, please see appendix for further information. Copyright © The Association for Computational Linguistics and Chinese Language Processing International Journal of Computational Linguistics and Chinese Language Processing is published four issues per volume by the Association for Computational Linguistics and Chinese Language Processing. Responsibility for the contents rests upon the authors and not upon ACLCLP, or its members. Copyright by the Association for Computational Linguistics and Chinese Language Processing. All rights reserved. No part of this journal may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical photocopying, recording or otherwise, without prior permission in writing form from the Editor-in Chief. Cover Calligraphy by Professor Ching-Chun Hsieh, founding president of ACLCLP Text excerpted and compiled from ancient Chinese classics, dating back to 700 B.C. -
Attitudes Towards Multilingualism: a Comparison Between Shanghai And
Attitudes towards multilingualism: A comparison between Shanghai and Ningbo China, especially its urban areas, has become more and more multilingual in the 21st century. Younger generations born around and after the turn of century – roughly two decades after the 1978 Economic Reform – are often multilingual with Putonghua (China’s standard language), regional dialects (often mutually unintelligible with Putonghua), and English (amongst other foreign languages). Existing studies on language attitudes in China, possibly influenced by the country’s strong monolingual ideology, tend to focus on the contrast between ‘standard’ and ‘non-standard’ where Putonghua is seen as more ‘superior’ than all other languages [1-3]. Another gap in the literature is that many of these studies rely solely on either self-reported questionnaire data or implicit attitudes elicited from matched-guised experiments [3-6] while a combination of the two methods can potentially offer a more comprehensive picture. This paper combines implicit and explicit data to discuss how these different languages/varieties are perceived by young adults in Ningbo and Shanghai. Data was collected from 66 university students from Ningbo (44 with 24 women) and Shanghai (22 with 12 women). Data analysed includes questionnaires on language attitudes and usage, a matched-guise experiment on the perception of two ‘non-standard’ phonetic features in Putonghua, and sociolinguistic interviews on language use and attitudes in general. The two cities differ in that Shanghai is more international with more English presence, and the local Shanghainese dialect is suggested to have more covert prestige [7]. Results from quantitative analysis on the questionnaires and experiments and qualitative analysis on interview extracts confirm patterns noticed by previous studies on English and Putonghua, suggesting both are valued by students. -
The Culture of Wikipedia
Good Faith Collaboration: The Culture of Wikipedia Good Faith Collaboration The Culture of Wikipedia Joseph Michael Reagle Jr. Foreword by Lawrence Lessig The MIT Press, Cambridge, MA. Web edition, Copyright © 2011 by Joseph Michael Reagle Jr. CC-NC-SA 3.0 Purchase at Amazon.com | Barnes and Noble | IndieBound | MIT Press Wikipedia's style of collaborative production has been lauded, lambasted, and satirized. Despite unease over its implications for the character (and quality) of knowledge, Wikipedia has brought us closer than ever to a realization of the centuries-old Author Bio & Research Blog pursuit of a universal encyclopedia. Good Faith Collaboration: The Culture of Wikipedia is a rich ethnographic portrayal of Wikipedia's historical roots, collaborative culture, and much debated legacy. Foreword Preface to the Web Edition Praise for Good Faith Collaboration Preface Extended Table of Contents "Reagle offers a compelling case that Wikipedia's most fascinating and unprecedented aspect isn't the encyclopedia itself — rather, it's the collaborative culture that underpins it: brawling, self-reflexive, funny, serious, and full-tilt committed to the 1. Nazis and Norms project, even if it means setting aside personal differences. Reagle's position as a scholar and a member of the community 2. The Pursuit of the Universal makes him uniquely situated to describe this culture." —Cory Doctorow , Boing Boing Encyclopedia "Reagle provides ample data regarding the everyday practices and cultural norms of the community which collaborates to 3. Good Faith Collaboration produce Wikipedia. His rich research and nuanced appreciation of the complexities of cultural digital media research are 4. The Puzzle of Openness well presented. -
De Sousa Sinitic MSEA
THE FAR SOUTHERN SINITIC LANGUAGES AS PART OF MAINLAND SOUTHEAST ASIA (DRAFT: for MPI MSEA workshop. 21st November 2012 version.) Hilário de Sousa ERC project SINOTYPE — École des hautes études en sciences sociales [email protected]; [email protected] Within the Mainland Southeast Asian (MSEA) linguistic area (e.g. Matisoff 2003; Bisang 2006; Enfield 2005, 2011), some languages are said to be in the core of the language area, while others are said to be periphery. In the core are Mon-Khmer languages like Vietnamese and Khmer, and Kra-Dai languages like Lao and Thai. The core languages generally have: – Lexical tonal and/or phonational contrasts (except that most Khmer dialects lost their phonational contrasts; languages which are primarily tonal often have five or more tonemes); – Analytic morphological profile with many sesquisyllabic or monosyllabic words; – Strong left-headedness, including prepositions and SVO word order. The Sino-Tibetan languages, like Burmese and Mandarin, are said to be periphery to the MSEA linguistic area. The periphery languages have fewer traits that are typical to MSEA. For instance, Burmese is SOV and right-headed in general, but it has some left-headed traits like post-nominal adjectives (‘stative verbs’) and numerals. Mandarin is SVO and has prepositions, but it is otherwise strongly right-headed. These two languages also have fewer lexical tones. This paper aims at discussing some of the phonological and word order typological traits amongst the Sinitic languages, and comparing them with the MSEA typological canon. While none of the Sinitic languages could be considered to be in the core of the MSEA language area, the Far Southern Sinitic languages, namely Yuè, Pínghuà, the Sinitic dialects of Hǎinán and Léizhōu, and perhaps also Hakka in Guǎngdōng (largely corresponding to Chappell (2012, in press)’s ‘Southern Zone’) are less ‘fringe’ than the other Sinitic languages from the point of view of the MSEA linguistic area. -
China Date: 8 January 2007
Refugee Review Tribunal AUSTRALIA RRT RESEARCH RESPONSE Research Response Number: CHN31098 Country: China Date: 8 January 2007 Keywords: China – Taiwan Strait – 2006 Military exercises – Typhoons This response was prepared by the Country Research Section of the Refugee Review Tribunal (RRT) after researching publicly accessible information currently available to the RRT within time constraints. This response is not, and does not purport to be, conclusive as to the merit of any particular claim to refugee status or asylum. Questions 1. Is there corroborating information about military manoeuvres and exercises in Pingtan? 2. Is there any information specifically about the military exercise there in July 2006? 3. Is there any information about “Army day” on 1 August 2006? 4. What are the aquatic farming/fishing activities carried out in that area? 5. Has there been pollution following military exercises along the Taiwan Strait? 6. The delegate makes reference to independent information that indicates that from May until August 2006 China particularly the eastern coast was hit by a succession of storms and typhoons. The last one being the hardest to hit China in 50 years. Could I have information about this please? The delegate refers to typhoon Prapiroon. What information is available about that typhoon? 7. The delegate was of the view that military exercises would not be organised in typhoon season, particularly such a bad one. Is there any information to assist? RESPONSE 1. Is there corroborating information about military manoeuvres and exercises in Pingtan? 2. Is there any information specifically about the military exercise there in July 2006? There is a minor naval base in Pingtan and military manoeuvres are regularly held in the Taiwan Strait where Pingtan in located, especially in the June to August period. -
THE CASE of WIKIPEDIA Shane Greenstein Feng Zhu
NBER WORKING PAPER SERIES COLLECTIVE INTELLIGENCE AND NEUTRAL POINT OF VIEW: THE CASE OF WIKIPEDIA Shane Greenstein Feng Zhu Working Paper 18167 http://www.nber.org/papers/w18167 NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 June 2012 The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research. NBER working papers are circulated for discussion and comment purposes. They have not been peer- reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications. © 2012 by Shane Greenstein and Feng Zhu. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source. Collective Intelligence and Neutral Point of View: The Case of Wikipedia Shane Greenstein and Feng Zhu NBER Working Paper No. 18167 June 2012 JEL No. L17,L3,L86 ABSTRACT We examine whether collective intelligence helps achieve a neutral point of view using data from a decade of Wikipedia’s articles on US politics. Our null hypothesis builds on Linus’ Law, often expressed as “Given enough eyeballs, all bugs are shallow.” Our findings are consistent with a narrow interpretation of Linus’ Law, namely, a greater number of contributors to an article makes an article more neutral. No evidence supports a broad interpretation of Linus’ Law. Moreover, several empirical facts suggest the law does not shape many articles. The majority of articles receive little attention, and most articles change only mildly from their initial slant. -
Word Segmentation for Chinese Wikipedia Using N-Gram Mutual Information
Word Segmentation for Chinese Wikipedia Using N-Gram Mutual Information Ling-Xiang Tang1, Shlomo Geva1, Yue Xu1 and Andrew Trotman2 1School of Information Technology Faculty of Science and Technology Queensland University of Technology Queensland 4000 Australia {l4.tang, s.geva, yue.xu}@qut.edu.au 2Department of Computer Science Universityof Otago Dunedin 9054 New Zealand [email protected] Abstract In this paper, we propose an unsupervised seg- is called 激光打印机 in mainland China, but 鐳射打印機 in mentation approach, named "n-gram mutual information", or Hongkong, and 雷射印表機 in Taiwan. NGMI, which is used to segment Chinese documents into n- In digital representations of Chinese text different encoding character words or phrases, using language statistics drawn schemes have been adopted to represent the characters. How- from the Chinese Wikipedia corpus. The approach allevi- ever, most encoding schemes are incompatible with each other. ates the tremendous effort that is required in preparing and To avoid the conflict of different encoding standards and to maintaining the manually segmented Chinese text for train- cater for people's linguistic preferences, Unicode is often used ing purposes, and manually maintaining ever expanding lex- in collaborative work, for example in Wikipedia articles. With icons. Previously, mutual information was used to achieve Unicode, Chinese articles can be composed by people from all automated segmentation into 2-character words. The NGMI the above Chinese-speaking areas in a collaborative way with- approach extends the approach to handle longer n-character out encoding difficulties. As a result, these different forms of words. Experiments with heterogeneous documents from the Chinese writings and variants may coexist within same pages. -
Standard and Variation in the Use of Sentence-Final Particles: a Case Study Based on Speakers of Mandarin and Min Varieties
DOI: 10.26346/1120-2726-166 Standard and variation in the use of sentence-final particles: A case study based on speakers of Mandarin and Min varieties Chiara Romagnoli,a Carmen Lepadatb a Università Roma Tre, Italy <[email protected]> b Università di Roma La Sapienza, Italy <[email protected]> Sentence-final particles are widely used in spoken Chinese and have been analyzed from different theoretical perspectives. This study aims at investi- gating the influence of a speaker’s dialectal background on their selection of SFPs in Mandarin Chinese. Two types of sentence completion questionnaires were submitted to 86 undergraduate and graduate students enrolled at Fuzhou University, roughly half of whom use – to varying degrees – one of the Chinese varieties spoken in Fujian Province. Data collected from the 72 valid question- naires were analyzed with results showing that dialectal background is a signifi- cant factor in the close-ended item type questionnaire. Keywords: sentence-final particle, Standard Mandarin Chinese, Min varieties. 1. Introduction While all languages display elements labelled particles, the peculiarity of Chinese lies in the diversity of these items both across different varieties and in how an individual particle is employed. Widely used in spoken language, Chinese sentence-final particles (henceforth SFPs) have been investigated from different perspectives and have been described in several works (among others Lü 1942, Chao 1979, Li & Thompson 1981, Chu 2009). While there is no lack of studies on SFPs in Cantonese, Taiwan Southern Min and some other varieties of Chinese (e.g. Fung 2000, Law 2002, Li 2006), less attention has been paid to the influence of regional varieties on the standard language in the selection of SFPs.1 The present study aims to fill this gap by presenting and discussing the data collected from a sample of 72 participants, all native speakers of one or more varieties of Chinese. -
Gěi ’Give’ in Beijing and Beyond Ekaterina Chirkova
Gěi ’give’ in Beijing and beyond Ekaterina Chirkova To cite this version: Ekaterina Chirkova. Gěi ’give’ in Beijing and beyond. Cahiers de linguistique - Asie Orientale, CRLAO, 2008, 37 (1), pp.3-42. hal-00336148 HAL Id: hal-00336148 https://hal.archives-ouvertes.fr/hal-00336148 Submitted on 2 Nov 2008 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Gěi ‘give’ in Beijing and beyond1 Katia Chirkova (CRLAO, CNRS) This article focuses on the various uses of gěi ‘give’, as attested in a corpus of spoken Beijing Mandarin collected by the author. These uses are compared to those in earlier attestations of Beijing Mandarin and to those in Greater Beijing Mandarin and in Jì-Lǔ Mandarin dialects. The uses of gěi in the corpus are demonstrated to be consistent with the latter pattern, where the primary function of gěi is that of indirect object marking and where, unlike Standard Mandarin, gěi is not additionally used as an agent marker or a direct object marker. Exceptions to this pattern in the corpus are explained as a recent development arisen through reanalysis. Key words : gěi, direct object marker, indirect object marker, agent marker, Beijing Mandarin, Northern Mandarin, typology.