Joint Discourse-Aware Concept Disambiguation and Clustering
Total Page:16
File Type:pdf, Size:1020Kb

Recommended publications
-
A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2373–2380, Marseille, 11–16 May 2020 © European Language Resources Association (ELRA), licensed under CC-BY-NC A Topic-Aligned Multilingual Corpus of Wikipedia Articles for Studying Information Asymmetry in Low Resource Languages Dwaipayan Roy, Sumit Bhatia, Prateek Jain GESIS - Cologne, IBM Research - Delhi, IIIT - Delhi [email protected], [email protected], [email protected] Abstract Wikipedia is the largest web-based open encyclopedia covering more than three hundred languages. However, different language editions of Wikipedia differ significantly in terms of their information coverage. We present a systematic comparison of information coverage in English Wikipedia (most exhaustive) and Wikipedias in eight other widely spoken languages (Arabic, German, Hindi, Korean, Portuguese, Russian, Spanish and Turkish). We analyze the content present in the respective Wikipedias in terms of the coverage of topics as well as the depth of coverage of topics included in these Wikipedias. Our analysis quantifies and provides useful insights about the information gap that exists between different language editions of Wikipedia and offers a roadmap for the Information Retrieval (IR) community to bridge this gap. Keywords: Wikipedia, Knowledge base, Information gap 1. Introduction Wikipedia is the largest web-based encyclopedia covering [...] other with respect to the coverage of topics as well as the amount of information about overlapping topics. -
Modeling Popularity and Reliability of Sources in Multilingual Wikipedia
information Article Modeling Popularity and Reliability of Sources in Multilingual Wikipedia Włodzimierz Lewoniewski *, Krzysztof Węcel and Witold Abramowicz Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland; [email protected] (K.W.); [email protected] (W.A.) * Correspondence: [email protected] Received: 31 March 2020; Accepted: 7 May 2020; Published: 13 May 2020 Abstract: One of the most important factors impacting quality of content in Wikipedia is presence of reliable sources. By following references, readers can verify facts or find more details about described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, therefore information about the same topic may be inconsistent. This also applies to use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We presented 10 models for the assessment of the popularity and reliability of the sources based on analysis of meta information about the references in Wikipedia articles, page views and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed the changes of popularity and reliability in time and identified growth leaders in each of the considered months. The results can be used for quality improvements of the content in different language versions of Wikipedia. -
80S 697 Songs, 2 Days, 3.53 GB
80s 697 songs, 2 days, 3.53 GB Name Artist Album Year Take on Me a-ha Hunting High and Low 1985 A Woman's Got the Power A's A Woman's Got the Power 1981 The Look of Love (Part One) ABC The Lexicon of Love 1982 Poison Arrow ABC The Lexicon of Love 1982 Hells Bells AC/DC Back in Black 1980 Back in Black AC/DC Back in Black 1980 You Shook Me All Night Long AC/DC Back in Black 1980 For Those About to Rock (We Salute You) AC/DC For Those About to Rock We Salute You 1981 Balls to the Wall Accept Balls to the Wall 1983 Antmusic Adam & The Ants Kings of the Wild Frontier 1980 Goody Two Shoes Adam Ant Friend or Foe 1982 Angel Aerosmith Permanent Vacation 1987 Rag Doll Aerosmith Permanent Vacation 1987 Dude (Looks Like a Lady) Aerosmith Permanent Vacation 1987 Love In An Elevator Aerosmith Pump 1989 Janie's Got A Gun Aerosmith Pump 1989 The Other Side Aerosmith Pump 1989 What It Takes Aerosmith Pump 1989 Lightning Strikes Aerosmith Rock in a Hard Place 1982 Der Kommissar After The Fire Der Kommissar 1982 Sirius/Eye in the Sky Alan Parsons Project Eye in the Sky 1982 The Stand Alarm Declaration 1983 Rain in the Summertime Alarm Eye of the Hurricane 1987 Big In Japan Alphaville Big In Japan 1984 Freeway of Love Aretha Franklin Who's Zoomin' Who? 1985 Who's Zooming Who Aretha Franklin Who's Zoomin' Who? 1985 Close (To The Edit) Art of Noise Who's Afraid of the Art of Noise? 1984 Solid Ashford & Simpson Solid 1984 Heat of the Moment Asia Asia 1982 Only Time Will Tell Asia Asia 1982 Sole Survivor Asia Asia 1982 Turn Up The Radio Autograph Sign In Please 1984 Love Shack B-52's Cosmic Thing 1989 Roam B-52's Cosmic Thing 1989 Private Idaho B-52's Wild Planet 1980 Change Babys Ignition 1982 Mr. -
Omnipedia: Bridging the Wikipedia Language Gap
Omnipedia: Bridging the Wikipedia Language Gap Patti Bao*†, Brent Hecht†, Samuel Carton†, Mahmood Quaderi†, Michael Horn†§, Darren Gergle*† *Communication Studies, †Electrical Engineering & Computer Science, §Learning Sciences Northwestern University {patti,brent,sam.carton,quaderi}@u.northwestern.edu, {michael-horn,dgergle}@northwestern.edu ABSTRACT We present Omnipedia, a system that allows Wikipedia readers to gain insight from up to 25 language editions of Wikipedia simultaneously. Omnipedia highlights the similarities and differences that exist among Wikipedia language editions, and makes salient information that is unique to each language as well as that which is shared more widely. We detail solutions to numerous front-end and algorithmic challenges inherent to providing users with a multilingual Wikipedia experience. These include visualizing content in a language-neutral way and aligning data in the face of diverse information organization strategies. We present a study of Omnipedia that characterizes how people interact with information using a multilingual lens. [...] language edition contains its own cultural viewpoints on a large number of topics [7, 14, 15, 27]. On the other hand, the language barrier serves to silo knowledge [2, 4, 33], slowing the transfer of less culturally imbued information between language editions and preventing Wikipedia’s 422 million monthly visitors [12] from accessing most of the information on the site. In this paper, we present Omnipedia, a system that attempts to remedy this situation at a large scale. It reduces the silo effect by providing users with structured access in their native language to over 7.5 million concepts from up to 25 language editions of Wikipedia. At the same time, it highlights similarities and differences between each of the -
Classifying Bias in Large Multilingual Corpora via Crowdsourcing and Topic Modeling
ABSTRACT Title of Thesis: CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis Directed By: Dr. David Zajic, Ph.D. Our project extends previous algorithmic approaches to finding bias in large text corpora. We used multilingual topic modeling to examine language-specific bias in the English, Spanish, and Russian versions of Wikipedia. In particular, we placed Spanish articles discussing the Cold War on a Russian-English viewpoint spectrum based on similarity in topic distribution. We then crowdsourced human annotations of Spanish Wikipedia articles for comparison to the topic model. Our hypothesis was that human annotators and topic modeling algorithms would provide correlated results for bias. However, that was not the case. Our annotators indicated that humans were more perceptive of sentiment in article text than topic distribution, which suggests that our classifier provides a different perspective on a text’s bias. CLASSIFYING BIAS IN LARGE MULTILINGUAL CORPORA VIA CROWDSOURCING AND TOPIC MODELING by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang Thesis submitted in partial fulfillment of the requirements of the Gemstone Honors Program, University of Maryland, 2018 Advisory Committee: Dr. David Zajic, Chair Dr. Brian Butler Dr. Marine Carpuat Dr. Melanie Kill Dr. Philip Resnik Mr. Ed Summers © Copyright by Team BIASES: Brianna Caljean, Katherine Calvert, Ashley Chang, Elliot Frank, Rosana Garay Jáuregui, Geoffrey Palo, Ryan Rinker, Gareth Weakly, Nicolette Wolfrey, William Zhang 2018 Acknowledgements We would like to express our sincerest gratitude to our mentor, Dr. -
Towards a Korean DBpedia and an Approach for Complementing the Korean Wikipedia Based on DBpedia
Towards a Korean DBpedia and an Approach for Complementing the Korean Wikipedia based on DBpedia Eun-kyung Kim1, Matthias Weidl2, Key-Sun Choi1, Sören Auer2 1 Semantic Web Research Center, CS Department, KAIST, Korea, 305-701 2 Universität Leipzig, Department of Computer Science, Johannisgasse 26, D-04103 Leipzig, Germany [email protected], [email protected] [email protected], [email protected] Abstract. In the first part of this paper we report about experiences when applying the DBpedia extraction framework to the Korean Wikipedia. We improved the extraction of non-Latin characters and extended the framework with pluggable internationalization components in order to facilitate the extraction of localized information. With these improvements we almost doubled the amount of extracted triples. We also will present the results of the extraction for Korean. In the second part, we present a conceptual study aimed at understanding the impact of international resource synchronization in DBpedia. In the absence of any information synchronization, each country would construct its own datasets and manage it from its users. Moreover the cooperation across the various countries is adversely affected. Keywords: Synchronization, Wikipedia, DBpedia, Multi-lingual 1 Introduction Wikipedia is the largest encyclopedia of mankind and is written collaboratively by people all around the world. Everybody can access this knowledge as well as add and edit articles. Right now Wikipedia is available in 260 languages and the quality of the articles reached a high level [1]. However, Wikipedia only offers full-text search for this textual information. For that reason, different projects have been started to convert this information into structured knowledge, which can be used by Semantic Web technologies to ask sophisticated queries against Wikipedia. -
The New York Law School Reporter's Arts and Entertainment Journal, Vol
digitalcommons.nyls.edu NYLS Publications Student Newspapers 4-1986 The New York Law School Reporter's Arts and Entertainment Journal, vol IV, no. 4, April 1986 New York Law School Follow this and additional works at: https://digitalcommons.nyls.edu/newspapers Recommended Citation New York Law School, "The New York Law School Reporter's Arts and Entertainment Journal, vol IV, no. 4, April 1986" (1986). Student Newspapers. 117. https://digitalcommons.nyls.edu/newspapers/117 This Article is brought to you for free and open access by the NYLS Publications at DigitalCommons@NYLS. It has been accepted for inclusion in Student Newspapers by an authorized administrator of DigitalCommons@NYLS. The New York Law School Reporter's ARTS AND ENTERTAINMENT JOURNAL Vol IV No 4 • ALL THE NEWS WE CAN FIND • April 1986 by Dbmne Pine When the arts & entertainment section first pondered the merits of an article on the proposed Home Audio Recording Act, the flood of information which reached this office looked like so many piles of [...] Simplified - the Home Audio Recording Act calls for 1) a 1¢ per minute tax on high quality audio tape; 2) a 5% tax on tape recorders and 3) a 25% tax on dual tape decks. ROCK AND ROLL: DETENTE OF THE EIGHTIES by Ilya Frenkel With all the talk of possibility of detente in the U.S. - Soviet relations, cultural exchanges and trade take a [...] But the real reason behind this phenomenon may very well be that with the advent of easily-accessible audio, -
Pynchon's Sound of Music
Pynchon’s Sound of Music Christian Hänggi DIAPHANES PUBLISHED WITH SUPPORT BY THE SWISS NATIONAL SCIENCE FOUNDATION 1ST EDITION ISBN 978-3-0358-0233-7 10.4472/9783035802337 THIS WORK IS LICENSED UNDER A CREATIVE COMMONS ATTRIBUTION 3.0 SWITZERLAND LICENSE. LAYOUT AND PREPRESS: 2EDIT, ZURICH WWW.DIAPHANES.NET Contents: Preface; Introduction; 1 The Job of Sorting It All Out (A Brief Biography in Music; An Inventory of Pynchon’s Musical Techniques and Strategies; Pynchon on Record, Vol. 4); 2 Lessons in Organology (The Harmonica; The Kazoo; The Saxophone); 3 The Sounds of Societies to Come (The Age of Representation; The Age of Repetition; The Age of Composition); 4 Analyzing the Pynchon Playlist; Conclusion; Appendix; Index of Musical Instruments; The Pynchon Playlist; Bibliography; Index of Musicians; Acknowledgments. Preface When I first read Gravity’s Rainbow, back in the days before I started to study literature more systematically, I noticed the novel’s many references to saxophones. Having played the instrument for, then, almost two decades, I thought that a novelist would not, could not, feature specialty instruments such as the C-melody sax if he did not play the horn himself. Once the saxophone had caught my attention, I noticed all sorts of uncommon references that seemed to confirm my hunch that Thomas Pynchon himself played the instrument: McClintic Sphere’s 4½ reed, the contrabass sax of Against the Day, Gravity’s Rainbow’s Charlie Parker passage. -
XOR QA: Cross-lingual Open-Retrieval Question Answering
XOR QA: Cross-lingual Open-Retrieval Question Answering Akari Asai1, Jungo Kasai1, Jonathan H. Clark2, Kenton Lee2, Eunsol Choi3, Hannaneh Hajishirzi1,4 1 University of Washington 2 Google Research 3 The University of Texas at Austin 4 Allen Institute for AI {akari, jkasai, hannaneh}@cs.washington.edu {jhclark, kentonl}@google.com, [email protected] arXiv:2010.11856v3 [cs.CL] 13 Apr 2021 Abstract Multilingual question answering tasks typically assume that answers exist in the same language as the question. Yet in practice, many languages face both information scarcity (where languages have few reference articles) and information asymmetry (where questions reference concepts from other cultures). This work extends open-retrieval question answering to a cross-lingual setting enabling questions from one language to be answered via answer content from another language. We construct a large-scale dataset built on 40K information-seeking questions across 7 diverse non-English languages that TYDI QA could not find same-language answers for. Based on this dataset, we introduce [...] Figure 1: Overview of XOR QA. Given a question in Li, the model finds an answer in either English or Li Wikipedia and returns an answer in English or Li. Li is one of the 7 typologically diverse languages. Example shown in the figure: question ロン・ポールの学部時代の専攻は? [Japanese] (What did Ron Paul major in during undergraduate?); multilingual document collections (Wikipedias): ロン・ポール (ja.wikipedia): 高校卒業後はゲティスバーグ大学へ進学。 (After high school, he went to Gettysburg College.); Ron Paul (en.wikipedia): Paul went to Gettysburg College, where he was a member of the Lambda Chi Alpha fraternity. He graduated with a B.S. degree in Biology in 1957.; answer: 生物学 (Biology). -
Continuous Light on Tomato
CONTINUOUS LIGHT ON TOMATO From Gene to Yield Aarón I. Vélez-Ramírez Thesis committee Promotor Prof. Dr Harro J. Bouwmeester Professor of Plant Physiology Wageningen University Co-promotors Dr Wim van Ieperen Assistant professor, Horticulture and Product Physiology Wageningen University Dr Dick Vreugdenhil Associate professor, Laboratory of Plant Physiology Wageningen University Other members Prof. Dr Ton Bisseling, Wageningen University Prof. Dr Roberta Croce, VU University Amsterdam, The Netherlands Prof. Dr Gerrit T.S. Beemster, University of Antwerp, Belgium Dr. Ronald Pierik, Utrecht University, The Netherlands This research was conducted under the auspices of the Graduate School of Experimental Plant Sciences (EPS). CONTINUOUS LIGHT ON TOMATO From Gene to Yield Aarón I. Vélez-Ramírez Thesis submitted in fulfillment of the requirements for the degree of doctor at Wageningen University by the authority of the Rector Magnificus Prof. Dr M.J. Kropff, in the presence of the Thesis Committee appointed by the Academic Board to be defended in public on Friday 19 September 2014 at 11 a.m. in the Aula. Aarón I. Vélez Ramírez Continuous Light on Tomato. From Gene to Yield 214 pages PhD thesis, Wageningen University, Wageningen, NL (2014) With references, summaries in English and Dutch ISBN 978-94-6257-078-8 Contents Chapter 1 | General Introduction. Continuous-light-induced injury in tomato: An old enigma Chapter 2 | Plants under continuous light Chapter 3 | Continuous light as a way to increase greenhouse tomato production: -
QUARTERLY CHECK-IN Technology (Services) TECH GOAL QUADRANT
QUARTERLY CHECK-IN Technology (Services) TECH GOAL QUADRANT A: Foundation level goals. B: Features we build for others. C: Features that we build to improve our technology offering. D: Modernization, renewal and tech debt goals. The goals in each team pack are annotated using this scheme to illustrate the broad trends in our priorities. Agenda ● CTO Team ● Research and Data ● Design Research ● Performance ● Release Engineering ● Security ● Technical Operations Technology (Services) CTO July 2017 quarterly check-in All content is © Wikimedia Foundation & available under CC BY-SA 4.0, unless noted otherwise. CTO Team ● Victoria Coleman - Chief Technology Officer ● Joel Aufrecht - Program Manager (Technology) ● Lani Goto - Project Assistant ● Megan Neisler - Senior Project Coordinator ● Sarah Rodlund - Senior Project Coordinator ● Kevin Smith - Program Manager (Engineering) CHECK IN: WIKIMEDIA FOUNDATION, July 2017. TEAM/DEPT: CTO. PROGRAM: 4.5 [LINK]. ANNUAL PLAN GOAL: expand and strengthen our technical communities. What is your objective / workflow? Program 4: Technical community building. Outcome 5: Organize Wikimedia Developer Summit. Objective 1: Developer Summit web page published four months before the event (B). Who are you working with? Technical Collaboration. What impact / deliverables are you expecting? LAST QUARTER: (none). NEXT QUARTER: Decide on event location, dates, theme, deadlines, etc. and publicize the information. STATUS: OBJECTIVE IN PROGRESS Technology (Services) Research and Data July, 2017 quarterly -
Social Changer Jae-Hee Technology Comfort Level
Social Changer: Jae-Hee. AGE: 27 years old. EDUCATION: University. LOCATION: Seoul, South Korea. LANGUAGES: Korean (fluent), Japanese (proficient), English (proficient). OCCUPATION: Freelance graphic designer. Technology comfort level and Writing comfort level: each shown on a Low to High scale. Devices: Macbook Pro (PRIMARY USE: Graphic design work, maintaining her website, writing, editing Wikipedia); iPad (PRIMARY USE: Reading, looking for inspiration for her work, and quick internet browsing when she’s not at home); iPhone 6S (PRIMARY USE: Calling, messaging friends on Kakao Talk, Twitter). Background Jae-Hee graduated from university two years ago and currently works as a freelance graphic designer. She lives in the suburbs of Seoul, with her parents and younger sister. In her spare time, she works on her digital art and photography, which she publishes on her personal website, through WordPress. She loves reading, particularly fantasy and science fiction stories, in both Korean and Japanese. While in university, she took a class on design and sustainability and became interested in environmental issues. Now she volunteers for an environmental advocacy non-profit which educates people about living a sustainable lifestyle, and shares similar lifestyle tips on her personal blog. Experience Goals • To freely share her opinions and knowledge without conflict or rebuke, like she does elsewhere online End Goals • To raise awareness on environmental issues • To collaborate with other environmentalists Tech Habits She first started using a computer in grade school, and got her first smartphone in high school. She uses social media avidly, particularly Twitter to share her work and writing, and to follow other environmental groups. She is well-known by her online username, “jigu”. She learned basic HTML to run her website, and uses Adobe software for her graphic design and art. Challenges