© 2020 Xiaoman Pan CROSS-LINGUAL ENTITY EXTRACTION and LINKING for 300 LANGUAGES
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
A Survey of Orthographic Information in Machine Translation 3
Machine Translation Journal manuscript No. (will be inserted by the editor) A Survey of Orthographic Information in Machine Translation Bharathi Raja Chakravarthi1 ⋅ Priya Rani 1 ⋅ Mihael Arcan2 ⋅ John P. McCrae 1 the date of receipt and acceptance should be inserted later Abstract Machine translation is one of the applications of natural language process- ing which has been explored in different languages. Recently researchers started pay- ing attention towards machine translation for resource-poor languages and closely related languages. A widespread and underlying problem for these machine transla- tion systems is the variation in orthographic conventions which causes many issues to traditional approaches. Two languages written in two different orthographies are not easily comparable, but orthographic information can also be used to improve the machine translation system. This article offers a survey of research regarding orthog- raphy’s influence on machine translation of under-resourced languages. It introduces under-resourced languages in terms of machine translation and how orthographic in- formation can be utilised to improve machine translation. We describe previous work in this area, discussing what underlying assumptions were made, and showing how orthographic knowledge improves the performance of machine translation of under- resourced languages. We discuss different types of machine translation and demon- strate a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts of cog- nates information at different levels of machine translation and the lessons that can Bharathi Raja Chakravarthi [email protected] Priya Rani [email protected] Mihael Arcan [email protected] John P. -
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
information Article Modeling Popularity and Reliability of Sources in Multilingual Wikipedia Włodzimierz Lewoniewski * , Krzysztof W˛ecel and Witold Abramowicz Department of Information Systems, Pozna´nUniversity of Economics and Business, 61-875 Pozna´n,Poland; [email protected] (K.W.); [email protected] (W.A.) * Correspondence: [email protected] Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020 Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia. -
China in 50 Dishes
C H I N A I N 5 0 D I S H E S CHINA IN 50 DISHES Brought to you by CHINA IN 50 DISHES A 5,000 year-old food culture To declare a love of ‘Chinese food’ is a bit like remarking Chinese food Imported spices are generously used in the western areas you enjoy European cuisine. What does the latter mean? It experts have of Xinjiang and Gansu that sit on China’s ancient trade encompasses the pickle and rye diet of Scandinavia, the identified four routes with Europe, while yak fat and iron-rich offal are sauce-driven indulgences of French cuisine, the pastas of main schools of favoured by the nomadic farmers facing harsh climes on Italy, the pork heavy dishes of Bavaria as well as Irish stew Chinese cooking the Tibetan plains. and Spanish paella. Chinese cuisine is every bit as diverse termed the Four For a more handy simplification, Chinese food experts as the list above. “Great” Cuisines have identified four main schools of Chinese cooking of China – China, with its 1.4 billion people, has a topography as termed the Four “Great” Cuisines of China. They are Shandong, varied as the entire European continent and a comparable delineated by geographical location and comprise Sichuan, Jiangsu geographical scale. Its provinces and other administrative and Cantonese Shandong cuisine or lu cai , to represent northern cooking areas (together totalling more than 30) rival the European styles; Sichuan cuisine or chuan cai for the western Union’s membership in numerical terms. regions; Huaiyang cuisine to represent China’s eastern China’s current ‘continental’ scale was slowly pieced coast; and Cantonese cuisine or yue cai to represent the together through more than 5,000 years of feudal culinary traditions of the south. -
Celebrate Your Love Wedding & Solemnisation Packages 2016 / 2017
Celebrate Your Love Wedding & Solemnisation Packages 2016 / 2017 Standing Buffet Packages Solemnisation Ceremony Package Inclusive of: Choice of Chef’s exclusively-crafted Delectable International Buffet Spread Full set of Exquisite Buffet Set-Up with Ivory Skirting and complete set of Biodegradable Disposable-ware Your preferred choice of Delicate Wedding thematic floral arrangements from our selection Specially-picked Wedding Guest Book Complimentary Catering Set-Up & Transportation Fees (not applicable on PH & Eve of PH) For Orders of 100pax and Above: Complimentary usage of 01 Reception Table with Ivory Linen and selected Wedding thematic set-up with 02 Cushion Chairs with Ivory Seat Covers 01 Uniformed Service Staff on-site For Orders of 200pax and Above: Complimentary usage of 01 Reception Table with Ivory Linen and selected Wedding thematic set-up with 02 Cushion Chairs with Ivory Seat Covers Set-up of Solemnisation Table and 05 Cushion Chairs with Ivory Seat Covers, Ring Pillows and Signature Pen and selected Wedding Thematic decorations 02 Uniformed-Service Staff on-site *Terms and Conditions Apply SOLEMNISATION BUFFET PACKAGES Sweet Romance Package @ $20.00+ per person Minimum 50 Pax Salad (Choice of 01) Mixed Garden Salad with Asian Soy Dressing & Cherry Tomato (VEG) Potato Salad with Soy Bits (VEG) Cold & Tangy Cucumber Salad (VEG) Starters (Choice of 02) Breaded Shrimp Cake with Mayonnaise Dip Steamed Dim Sum Delight (Crystal Har Kow & Chicken Siew Mai) with Chilli Dip Crispy Potato Curry Samosa (VEG) Steamed Vegetarian -
Signature Redacted Author
A, Redesigning Rural Life: Relocation and In Situ Urbanization in a Shandong _77 Village OF TECH UTE0.y by JUL 2 92014 Saul Kriger Wilson LIBRARIES Submitted to the School of Humanities, Arts and Social Sciences in partial fulfillment of the requirements for the degree of Bachelor of Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 2014 Massachusetts Institute of Technology 2014. All rights reserved. Signature redacted Author........... Major Departure in Humanities: Asian and Asian Diaspora Studies Department of Mathematics Signature redacted July 9, 2014 Certified by. Yasheng Huang Professor of Global Economics and Management Sloan School of Management Signature redacted Thesis Supervisor Accepted by Ian Condry Chairman /i / Deportment of Global Studies and Languages Accepted by .. Signature redacted ............ Deborah Fitzgerald Kenan Sahin Dean School of Humanities, Arts, and Social Sciences 2 Redesigning Rural Life: Relocation and In Situ Urbanization in a Shandong Village by Saul Kriger Wilson Submitted to the School of Humanities, Arts and Social Sciences on July 9, 2014, in partial fulfillment of the requirements for the degree of Bachelor of Science Abstract The Chinese government's attempts to improve village public service provision, limit the loss of arable land, and coordinate urbanization have converged in land readjust- ment schemes to rebuild some villages as more densely populated "rural communities." I present a case study on a financially troubled, partially complete village reconstruc- tion project in Shandong. Villagers outside the leadership were minimally involved in project planning, and the village leadership put pressure on villagers to move. How- ever, the pressure to move was not due to an absence of formal property rights for villagers; reluctant villagers agreed to move because they could not afford to offend the village government. -
Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics
computers Article Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics Włodzimierz Lewoniewski * , Krzysztof W˛ecel and Witold Abramowicz Department of Information Systems, Pozna´nUniversity of Economics and Business, 61-875 Pozna´n,Poland * Correspondence: [email protected]; Tel.: +48-(61)-639-27-93 Received: 10 May 2019; Accepted: 13 August 2019; Published: 14 August 2019 Abstract: On Wikipedia, articles about various topics can be created and edited independently in each language version. Therefore, the quality of information about the same topic depends on the language. Any interested user can improve an article and that improvement may depend on the popularity of the article. The goal of this study is to show what topics are best represented in different language versions of Wikipedia using results of quality assessment for over 39 million articles in 55 languages. In this paper, we also analyze how popular selected topics are among readers and authors in various languages. We used two approaches to assign articles to various topics. First, we selected 27 main multilingual categories and analyzed all their connections with sub-categories based on information extracted from over 10 million categories in 55 language versions. To classify the articles to one of the 27 main categories, we took into account over 400 million links from articles to over 10 million categories and over 26 million links between categories. In the second approach, we used data from DBpedia and Wikidata. We also showed how the results of the study can be used to build local and global rankings of the Wikipedia content. -
The Globalization of Chinese Food ANTHROPOLOGY of ASIA SERIES Series Editor: Grant Evans, University Ofhong Kong
The Globalization of Chinese Food ANTHROPOLOGY OF ASIA SERIES Series Editor: Grant Evans, University ofHong Kong Asia today is one ofthe most dynamic regions ofthe world. The previously predominant image of 'timeless peasants' has given way to the image of fast-paced business people, mass consumerism and high-rise urban conglomerations. Yet much discourse remains entrenched in the polarities of 'East vs. West', 'Tradition vs. Change'. This series hopes to provide a forum for anthropological studies which break with such polarities. It will publish titles dealing with cosmopolitanism, cultural identity, representa tions, arts and performance. The complexities of urban Asia, its elites, its political rituals, and its families will also be explored. Dangerous Blood, Refined Souls Death Rituals among the Chinese in Singapore Tong Chee Kiong Folk Art Potters ofJapan Beyond an Anthropology of Aesthetics Brian Moeran Hong Kong The Anthropology of a Chinese Metropolis Edited by Grant Evans and Maria Tam Anthropology and Colonialism in Asia and Oceania Jan van Bremen and Akitoshi Shimizu Japanese Bosses, Chinese Workers Power and Control in a Hong Kong Megastore WOng Heung wah The Legend ofthe Golden Boat Regulation, Trade and Traders in the Borderlands of Laos, Thailand, China and Burma Andrew walker Cultural Crisis and Social Memory Politics of the Past in the Thai World Edited by Shigeharu Tanabe and Charles R Keyes The Globalization of Chinese Food Edited by David Y. H. Wu and Sidney C. H. Cheung The Globalization of Chinese Food Edited by David Y. H. Wu and Sidney C. H. Cheung UNIVERSITY OF HAWAI'I PRESS HONOLULU Editorial Matter © 2002 David Y. -
Wikimania 2006 Invited Speaker Biographies
Wikimania 2006 Invited Speaker has been a forceful advocate for open science and open access scientific publishing - the free release of the Biographies material and intellectual product of the scientific research. He is co-Founder of Public Library of Science Yochai Benkler is Professor of Law at Yale Law (PLoS). He serves on the PLoS board, and is an advisor School. His research focuses on commons-based to Science Commons. approaches to managing resources in networked environments. His publications include “The Wealth of Rishab Aiyer Ghosh first developed and sold free Networks: How Social Production Transforms Markets” and software in 1994. He switched from writing in C and “Freedom and Coase’s Penguin, or Linux and the Nature of the assembly to English, and has been writing about the Firm”. economics of free software and collaborative production since 1994. He is the Founding Karen Christensen is the CEO of Berkshire International and Managing Editor of First Monday, the Publishing group, a reference work publisher known for most widely read peer-reviewed on-line journal of the specialty encyclopedias. Her primary responsibility is Internet, and Senior Researcher at the Maastricht bringing together global teams and building Economic Research Institute on Innovation and relationships with experts and organizations around the Technology (MERIT) at the University of Maastricht world. Karen has also served as an encyclopedia editor; and United Nations University, the Netherlands. In as coeditor on the “Berkshire Encyclopedia of World Sport” 2000 he coordinated the European Union -funded (June 2005) and “Global Perspectives on the United States” FLOSS project, the most comprehensive early study of (three volumes, forthcoming), and as senior editor of free/libre/open source users and developers. -
The Wikipedia Diversity Observatory a Project to Identify and Bridge Content Gaps in Wikipedia
The Wikipedia Diversity Observatory A Project to Identify and Bridge Content Gaps in Wikipedia Marc Miquel-Ribé David Laniado Universitat Pompeu Fabra, Barcelona, Catalonia Eurecat, Centre Tecnològic de Catalunya [email protected] [email protected] ABSTRACT 1 Introduction In this paper we present the Wikipedia Diversity Observatory, Wikipedia is among the largest information repositories on the a project aimed to increase diversity within Wikipedia Internet that are both multilingual and created through language editions. The project includes dashboards with collaborative effort. Its prime objective1 is to "give free access visualizations and tools which show the gaps in terms of to the sum of all human knowledge" and, consequently, it exists concepts not represented or not shared across languages. The in as many as 309 languages. Even though the language dashboards are built on datasets generated for each of the more communities make the projects grow on a constant basis, the than 300 language editions, with features that label each article content does not represent the existing diversity in peoples, according to different categories relevant to overall content places, and cultures of the world; furthermore, there is a gap diversity. Through various examples, we show how the tools between language editions and articles often are not shared, or encourage and help editors to bridge the gaps in Wikipedia remain even unique to one language [1]. The creation of articles content. Finally, we discuss the project's impact on the in Wikipedia language editions is spontaneous and non- communities and implications for the Wikimedia movement, in directed. Several studies showed that cultural and geographical a moment in which covering diversity is considered strategic. -
Welcome the Lunar New Year with a Feast of Cantonese
WELCOME THE LUNAR NEW YEAR WITH A FEAST OF CANTONESE TREASURES, FESTIVE GOODIES AND CELEBRATORY MENUS Reunite over a sublime spread by Chinese Executive Chef Leong Chee Yeng that combines traditional significance with culinary finesse SINGAPORE, 15 January 2021 – This Year of the Ox beckons a prosperous new beginning at The Fullerton Hotel Singapore’s Jade Restaurant, with a fine repertoire of Chinese culinary treasures befitting a new year’s feast from 25 January to 27 February 2021. A Bountiful Feast at Jade Restaurant Executive Chef of Jade Restaurant Leong Chee Yeng says, “A symbol of strength and dedication, the mighty yet humble ox traditionally plays an essential role in reaping a bountiful harvest. This motif reveals itself in this year’s specialties, alongside other inspired creations imbued with well-wishes for bliss, prosperity and togetherness. Designed to serve smaller groups, the newly created dishes are ideal for sharing meaningful moments of warmth with one’s closest kin.” The quintessential starter for auspicious feasting is the spectacular Premium Gold Rush Salmon Yu Sheng (pictured above), meticulously arranged into an image of the spirited bull, while families with younger children will delight in the special children’s version, arranged into an endearing representation of a playful calf. Fresh ingredients are elevated by champagne jelly, shallot oil and kumquat dressing, striking a captivating balance of rich and refreshing flavours. The standard Gold Rush Salmon Yu Sheng is presented alongside elegant Chinese calligraphy written by Chef Leong himself. The Gold Rush Salmon Yu Sheng with Champagne Jelly, Shallot Oil and Kumquat Dressing is available in Small (S$78++ for 2 to 4 persons), Medium (S$118++ for 5 to 7 persons) and Large (S$138++ for 8 to 10 persons). -
Gender Bias and Under-Representation in Natural Language Processing Across Human Languages
Gender Bias and Under-Representation in Natural Language Processing Across Human Languages Yan Chen1, Christopher Mahoney1, Isabella Grasso1, Esma Wali1, Abigail Matthews2, Thomas Middleton1, Mariama Njie3, Jeanna Matthews1 1Clarkson University, 2University of Wisconsin-Madison, 3 Iona College [email protected] ABSTRACT 1 INTRODUCTION Natural Language Processing (NLP) systems are at the heart of Corpora of human language are regularly fed into machine learning many critical automated decision-making systems making crucial systems as a key way to learn about the world. Natural Language recommendations about our future world. However, these systems Processing plays a significant role in many powerful applications reflect a wide range of biases, from gender bias to a bias inwhich such as speech recognition, text translation, and autocomplete and voices they represent. In this paper, a team including speakers of is at the heart of many critical automated decision systems mak- 9 languages - Chinese, Spanish, English, Arabic, German, French, ing crucial recommendations about our future world [20][2][7]. Farsi, Urdu, and Wolof - reports and analyzes measurements of Systems are taught to identify spam email, suggest medical articles gender bias in the Wikipedia corpora for these 9 languages. In the or diagnoses related to a patient’s symptoms, sort resumes based process, we also document how our work exposes crucial gaps on relevance for a given position, and many other tasks that form in the NLP-pipeline for many languages. Despite substantial in- key components of critical decision making systems in areas such vestments in multilingual support, the modern NLP-pipeline still as criminal justice, credit, housing, allocation of public resources, systematically and dramatically under-represents the majority of and more. -
Mapping Arabic Wikipedia Into the Named Entities Taxonomy
Mapping Arabic Wikipedia into the Named Entities Taxonomy Fahd Alotaibi and Mark Lee School of Computer Science, University of Birmingham, UK {fsa081|m.g.lee}@cs.bham.ac.uk ABSTRACT This paper describes a comprehensive set of experiments conducted in order to classify Arabic Wikipedia articles into predefined sets of Named Entity classes. We tackle using four different classifiers, namely: Naïve Bayes, Multinomial Naïve Bayes, Support Vector Machines, and Stochastic Gradient Descent. We report on several aspects related to classification models in the sense of feature representation, feature set and statistical modelling. The results reported show that, we are able to correctly classify the articles with scores of 90% on Precision, Recall and balanced F-measure. KEYWORDS: Arabic Named Entity, Wikipedia, Arabic Document Classification, Supervised Machine Learning. Proceedings of COLING 2012: Posters, pages 43–52, COLING 2012, Mumbai, December 2012. 43 1 Introduction Relying on supervised machine learning technologies to recognize Named Entities (NE) in the text requires the development of a reasonable volume of data for the training phase. Manually developing a training dataset that goes beyond the news-wire domain is a non-trivial task. Examination of online and freely available resources, such as the Arabic Wikipedia (AW) offers promise because the underlying scheme of AW can be exploited in order to automatically identify NEs in context. To utilize this resource and develop a NEs corpus from AW means two different tasks should be addressed: 1) Identifying the NEs in the context regardless of assigning those into NEs semantic classes. 2) Classifying AW articles into predefined NEs taxonomy.