Geographic Information Retrieval: Classification, Disambiguation
Total Pages: 16
File Type: PDF, Size: 1020 Kb
Recommended publications
Cultural Anthropology Through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-Based Sentiment
Cultural Anthropology through the Lens of Wikipedia: Historical Leader Networks, Gender Bias, and News-based Sentiment
Peter A. Gloor, Joao Marcos, Patrick M. de Boer, Hauke Fuehres, Wei Lo, Keiichi Nemoto
MIT Center for Collective Intelligence
Abstract: In this paper we study the differences in historical World View between Western and Eastern cultures, represented through the English, the Chinese, Japanese, and German Wikipedia. In particular, we analyze the historical networks of the World’s leaders since the beginning of written history, comparing them in the different Wikipedias and assessing cultural chauvinism. We also identify the most influential female leaders of all times in the English, German, Spanish, and Portuguese Wikipedia. As an additional lens into the soul of a culture we compare top terms, sentiment, emotionality, and complexity of the English, Portuguese, Spanish, and German Wikinews.
1 Introduction: Over the last ten years the Web has become a mirror of the real world (Gloor et al. 2009). More recently, the Web has also begun to influence the real world: Societal events such as the Arab spring and the Chilean student unrest have drawn a large part of their impetus from the Internet and online social networks. In the meantime, Wikipedia has become one of the top ten Web sites, occasionally beating daily newspapers in the actuality of most recent news. Be it the resignation of German national soccer team captain Philipp Lahm, or the downing of Malaysian Airlines flight 17 in the Ukraine by a guided missile, the corresponding Wikipedia page is updated as soon as the actual event happened (Becker 2012.
Canton of Basel-Stadt
Canton of Basel-Stadt. Welcome.
VARIED CITY OF THE ARTS: Basel’s innumerable historical buildings form a picturesque setting for its vibrant cultural scene, which is surprisingly rich for such a small canton: around 40 museums, some of them world-renowned, such as the Fondation Beyeler and the Kunstmuseum Basel, the Theater Basel, where opera, drama and ballet are performed, as well as 25 smaller theatres, a musical stage, and countless galleries and cinemas. The city ranks with the European elite in the field of fine arts, and hosts the world’s leading contemporary art fair, Art Basel. In addition to its prominent classical orchestras and over 1000 concerts per year, numerous high-profile events make Basel a veritable city of the arts.
THRIVING BUSINESS LOCATION: Basel is Switzerland’s most dynamic business centre. The city built its success on the global achievements of its pharmaceutical and chemical companies. Roche, Novartis, Syngenta, Lonza Group, Clariant and others have raised Basel’s profile around the world. Thanks to the extensive logistics know-how that has been established over the centuries, a number of leading international logistics service providers are also based here. Basel is a successful exhibition and congress city, profiting from an
CENTRE OF EUROPE, TRINATIONAL AND COSMOPOLITAN: There is a point in Basel, in the Swiss Rhine Ports, where the borders of Switzerland, France and Germany meet. Basel works closely together with its neighbours Germany and France in the fields of education, culture, transport and the environment. Residents of Basel enjoy the superb recreational opportunities in French Alsace as well as in Germany’s Black Forest. And the trinational EuroAirport Basel-Mulhouse-Freiburg is a key transport hub, linking the
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
Information (journal), Article
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
Włodzimierz Lewoniewski (corresponding author), Krzysztof Węcel and Witold Abramowicz
Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland
Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020
Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different languages versions of Wikipedia.
Title of Thesis: ABSTRACT CLASSIFYING BIAS
ABSTRACT
Title of Thesis: CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING
Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang
Thesis Directed By: Dr. David Zajic, Ph.D.
Our project extends previous algorithmic approaches to finding bias in large text corpora. We used multilingual topic modeling to examine language-specific bias in the English, Spanish, and Russian versions of Wikipedia. In particular, we placed Spanish articles discussing the Cold War on a Russian-English viewpoint spectrum based on similarity in topic distribution. We then crowdsourced human annotations of Spanish Wikipedia articles for comparison to the topic model. Our hypothesis was that human annotators and topic modeling algorithms would provide correlated results for bias. However, that was not the case. Our annotators indicated that humans were more perceptive of sentiment in article text than topic distribution, which suggests that our classifier provides a different perspective on a text’s bias.
CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang
Thesis submitted in partial fulfillment of the requirements of the Gemstone Honors Program, University of Maryland, 2018
Advisory Committee: Dr. David Zajic, Chair; Dr. Brian Butler; Dr. Marine Carpuat; Dr. Melanie Kill; Dr. Philip Resnik; Mr. Ed Summers
© Copyright by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang 2018
Acknowledgements: We would like to express our sincerest gratitude to our mentor, Dr.
IMPRESSIONEN IMPRESSIONS Bmw-Berlin-Marathon.Com
IMPRESSIONS bmw-berlin-marathon.com 2:01:39
THE FUTURE IS NOW. THE BMW i3s. Sheer Driving Pleasure.
BMW i3s (94 Ah) with the all-electric BMW eDrive drivetrain: power consumption in kWh/100 km (combined): 14.3; CO2 emissions in g/km (combined): 0; fuel consumption in l/100 km (combined): 0. The official figures for fuel consumption, CO2 emissions and power consumption were determined according to the prescribed measurement procedure of Regulation (EU) 715/2007 in its currently applicable version. For this vehicle, values other than those stated here may apply when assessing taxes and other vehicle-related levies that are (also) based on CO2 emissions. Illustration shows optional equipment.
AOK Nordost. Part of your sport. Take advantage of your individual benefits: up to 385 euros for fitness, sport and preventive care. Up to 150 euros for a sports medicine examination.
An Analysis of Contributions to Wikipedia from Tor
Are anonymity-seekers just like everybody else? An analysis of contributions to Wikipedia from Tor
Chau Tran, Department of Computer Science & Engineering, New York University, New York, USA
Kaylea Champion, Department of Communication, University of Washington, Seattle, USA
Andrea Forte, College of Computing & Informatics, Drexel University, Philadelphia, USA
Benjamin Mako Hill, Department of Communication, University of Washington, Seattle, USA
Rachel Greenstadt, Department of Computer Science & Engineering, New York University, New York, USA
Abstract: User-generated content sites routinely block contributions from users of privacy-enhancing proxies like Tor because of a perception that proxies are a source of vandalism, spam, and abuse. Although these blocks might be effective, collateral damage in the form of unrealized valuable contributions from anonymity seekers is invisible. One of the largest and most important user-generated content sites, Wikipedia, has attempted to block contributions from Tor users since as early as 2005. We demonstrate that these blocks have been imperfect and that thousands of attempts to edit on Wikipedia through Tor have been successful. We draw upon several data sources and analytical techniques to measure and describe the history of Tor editing on Wikipedia over time and to compare contributions from Tor users to those from other groups of Wikipedia users. Our analysis suggests that although Tor users who slip through Wikipedia’s ban contribute content that is more likely to be reverted and to revert others, their contributions are otherwise similar in quality to those from other unregistered participants and to the initial contributions of registered users.
Fig. 1. Screenshot of the page a user is shown when they attempt to edit the Wikipedia article on “Privacy” while using Tor.
Out of the Ordinary: Finding Hidden Threats by Analyzing Unusual
RAND-INITIATED RESEARCH
This PDF document was made available from www.rand.org as a public service of the RAND Corporation. The RAND Corporation is a nonprofit research organization providing objective analysis and effective solutions that address the challenges facing the public and private sectors around the world.
RAND research areas: Child Policy; Civil Justice; Education; Energy and Environment; Health and Health Care; International Affairs; National Security; Population and Aging; Public Safety; Science and Technology; Substance Abuse; Terrorism and Homeland Security; Transportation and Infrastructure.
Limited Electronic Distribution Rights: This document and trademark(s) contained herein are protected by law as indicated in a notice appearing later in this work. This electronic representation of RAND intellectual property is provided for non-commercial use only. Permission is required from RAND to reproduce, or reuse in another form, any of our research documents.
This product is part of the RAND Corporation monograph series. RAND monographs present major research findings that address the challenges facing the public and private sectors. All RAND monographs undergo rigorous peer review to ensure high standards for research quality and objectivity.
Out of the Ordinary: Finding Hidden Threats by Analyzing Unusual Behavior
John Hollywood, Diane Snyder, Kenneth McKay, John Boon
Approved for public release, distribution unlimited
This research in the public interest was supported by RAND, using discretionary funds made possible by the generosity of RAND's donors, the fees earned on client-funded research, and independent research and development (IR&D) funds provided by the Department of Defense.
Topological Evolution of Networks: Case Studies in the US Airlines and Language Wikipedias By
Topological Evolution of Networks: Case Studies in the US Airlines and Language Wikipedias
by Gergana Assenova Bounova
B.S., Theoretical Mathematics, Massachusetts Institute of Technology (2003)
B.S., Aeronautics & Astronautics, Massachusetts Institute of Technology (2003)
S.M., Aeronautics & Astronautics, Massachusetts Institute of Technology (2005)
Submitted to the Department of Aeronautics and Astronautics in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, February 2009
© 2009 Gergana A. Bounova, All rights reserved. ... The author hereby grants to MIT the permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part.
Author: Department of Aeronautics and Astronautics, February 27, 2009
Certified by: Prof. Olivier L. de Weck, Associate Professor of Aeronautics and Astronautics and Engineering Systems, Thesis Supervisor
Certified by: Prof. Christopher L. Magee, Professor of the Practice of Mechanical Engineering and Engineering Systems
Certified by: Dr. Daniel E. Whitney, Senior Research Scientist, Center for Technology, Policy and Industrial Development, Senior Lecturer in
Langmag July06 14-17.Qxd (Page 1)
Methodology
Robert L. Read and Steven D. Brewer explain how Esperanto acts as a springboard for the acquisition of other languages
Who Knows Where Esperanto Might Lead?
In 1887, an obscure eye doctor in Poland self-published a little book in Russian. Over the next several years Lingvo Internacia appeared in English, French, German, Hebrew, and Polish. This book, written under the pen name Doctor Esperanto, laid the foundation for a new language that would achieve what no other language project had ever done: establish a living community that would go on to survive the death of its creator. Even conservative estimates place the number of active speakers in the tens of thousands, with the number who have learned Esperanto at some time in their lives [...]
[...]ly attain a competency that eluded them in learning an ethnic language or report that they reached a given level of competency in a fraction of the time required by a national language. Early success creates a virtuous cycle which encourages more study and often leads to genuine fluency. Achievement yields positive effects on student self-confidence, insight into the nature of languages in general, and the structure of their native language in particular. Barry Farber writes in his book How to Learn Any Language: “It’s said that once you master one foreign language, all others come much more easily. [...]
[...] Esperanto, or any language, provides a propaedeutic effect in learning a next language which is similar. Several factors may contribute to the Corder effect, including similarities in vocabulary, grammatical structure, and word order. Similarity of vocabulary has been shown to be an effective metric for predicting how much knowing one language will help with learning another. Since Esperanto was designed to have a widely recognized vocabulary and grammatical features broadly shared across language families, it takes [...]
Private Companiescompanies
Private Companies
Private Company Profiles: @Ventures, Access 360 Media, Acronis, Acumentrics, Advent Solar, Agion Technologies, Akorri, alfabet, Arbor Networks, Asempra Technologies, Asset Control, Atlas Venture, Autonomic Networks, Azaleos, Azimuth Systems, Black Duck Software, Brainshark, BroadSoft, BzzAgent, Cedar Point Communications, Ceres, Certeon, Certica Solutions, cMarket, Code:Red, ConsumerPowerline, CorporateRewards, Courion, Crossbeam Systems, Cyber-Ark Software, DAFCA, Demandware, Desktone, Epic Advertising, ExtendMedia, Fidelis Security Systems, Flagship Ventures, Fortisphere, GENBAND, General Catalyst Partners, Hearst Interactive Media, Highland Capital Partners, HSMC, Invention Machine, Kepware Technologies, LifeYield, LogiXML, Magnolia Solar, Mariah Power, MetaCarta, mindSHIFT Technologies, Motionbox, Norbury Financial, NumeriX, OpenConnect, Panasas, Perimeter eSecurity, Permabit Technology, PermissionTV, PlumChoice Online PC Services, Polaris Venture Partners, PriceMetrix, Reva Systems, Revolabs, SafeNet, Sandbridge Technologies, Security Innovation, Silver Peak Systems, SIMtone, SkyeTek, SoloHealth, Sonicbids, StyleFeeder, TAGSYS, Tatara Systems, Tradeware Global, Tutor.com, U4EA Technologies, Ubiquisys, UltraCell, Vanu, Versata Enterprises, Visible Assets, VKernel, VPIsystems, Ze-gen, ZoomInfo
@Ventures
Address: 187 Ballardvale Street, Suite A260, Wilmington, MA 01887; 800 Menlo Ave, Suite 120, Menlo Park, CA 94025
Phone #: (978) 658-8980 (Wilmington); (650) 322-3246 (Menlo Park)
Fax #: ND
Website: www.ventures.com
Business Overview: @Ventures provides venture capital and growth assistance to early stage clean technology companies. Established in 1995, @Ventures has funded more than 75 companies across a broad set of technology sectors. The exclusive focus of the firm's fifth fund, formed in 2004, is on investments in the cleantech sector, including alternative energy, energy storage and efficiency, advanced materials, and water technologies.
Speaker Bio: Marc Poirier, Managing Director. Marc Poirier has been a General Partner with @Ventures since 1998 and operates out of the firm’s Boston-area office.
Proceedings of the 3rd Workshop on Building and Using Comparable
Workshop Programme
Saturday, May 22, 2010
9:00-9:15 Opening Remarks
Invited talk
9:15 Comparable Corpora Within and Across Languages, Word Frequency Lists and the KELLY Project. Adam Kilgarriff
10:30 Break
Session 1: Building Comparable Corpora
11:00 Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation. Inguna Skadiņa, Andrejs Vasiļjevs, Raivis Skadiņš, Robert Gaizauskas, Dan Tufiş and Tatiana Gornostay
11:30 Statistical Corpus and Language Comparison Using Comparable Corpora. Thomas Eckart and Uwe Quasthoff
12:00 Wikipedia as Multilingual Source of Comparable Corpora. Pablo Gamallo and Isaac González López
12:30 Trillions of Comparable Documents. Pascale Fung, Emmanuel Prochasson and Simon Shi
13:00 Lunch break
Session 2: Parallel and Comparable Corpora for Machine Translation
14:30 Improving Machine Translation Performance Using Comparable Corpora. Andreas Eisele and Jia Xu
15:00 Building a Large English-Chinese Parallel Corpus from Comparable Patents and its Experimental Application to SMT. Bin Lu, Tao Jiang, Kapo Chow and Benjamin K. Tsou
15:30 Automatic Terminologically-Rich Parallel Corpora Construction. José João Almeida and Alberto Simões
16:00 Break
Session 3: Contrastive Analysis
16:30 Foreign Language Examination Corpus for L2-Learning Studies. Piotr Bański and Romuald Gozdawa-Gołębiowski
17:00 Lexical Analysis of Pre and Post Revolution Discourse in Portugal. Michel Généreux, Amália Mendes, L. Alice Santos Pereira and M. Fernanda Bacelar do Nascimento
ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia
ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia
AARON HALFAKER, Microsoft, USA
R. STUART GEIGER, University of California, San Diego, USA
Algorithmic systems, from rule-based bots to machine learning classifiers, have a long history of supporting the essential work of content moderation and other curation work in peer production projects. From counter-vandalism to task routing, basic machine prediction has allowed open knowledge projects like Wikipedia to scale to the largest encyclopedia in the world, while maintaining quality and consistency. However, conversations about how quality control should work and what role algorithms should play have generally been led by the expert engineers who have the skills and resources to develop and modify these complex algorithmic systems. In this paper, we describe ORES: an algorithmic scoring service that supports real-time scoring of wiki edits using multiple independent classifiers trained on different datasets. ORES decouples several activities that have typically all been performed by engineers: choosing or curating training data, building models to serve predictions, auditing predictions, and developing interfaces or automated agents that act on those predictions. This meta-algorithmic system was designed to open up socio-technical conversations about algorithms in Wikipedia to a broader set of participants. In this paper, we discuss the theoretical mechanisms of social change ORES enables and detail case studies in participatory machine learning around ORES from the 5 years since its deployment.
CCS Concepts: • Networks → Online social networks; • Computing methodologies → Supervised learning by classification; • Applied computing → Sociology; • Software and its engineering → Software design techniques; • Computer systems organization → Cloud computing;
Additional Key Words and Phrases: Wikipedia; Reflection; Machine learning; Transparency; Fairness; Algorithms; Governance
ACM Reference Format: Aaron Halfaker and R.