Joint Discourse-Aware Concept Disambiguation and Clustering


Dissertation submitted for the doctoral degree at the Faculty of Modern Languages (Neuphilologische Fakultät) of Ruprecht-Karls-Universität Heidelberg

Submitted by: Angela Petra Fahrni
Supervisor (Referent): Prof. Dr. Michael Strube
Second reviewer (Korreferent): Prof. Dr. Anette Frank
Submission: 31.10.2014
Defense (Disputation): 21.12.2015

Abstract

This thesis addresses the tasks of concept disambiguation and clustering. Concept disambiguation is the task of linking common nouns and proper names in a text, henceforth called mentions, to their corresponding concepts in a predefined inventory. Concept clustering is the task of clustering mentions so that all mentions in one cluster denote the same concept. In this thesis, we investigate concept disambiguation and clustering from a discourse perspective and propose a discourse-aware approach for joint concept disambiguation and clustering in the framework of Markov logic. The contributions of this thesis are fourfold:

Joint Concept Disambiguation and Clustering. In previous approaches, concept disambiguation and concept clustering have been considered as two separate tasks (Schütze, 1998; Ji & Grishman, 2011). We analyze the relationship between concept disambiguation and concept clustering and argue that these two tasks can mutually support each other. We propose what is, to our knowledge, the first joint approach for concept disambiguation and clustering.

Discourse-Aware Concept Disambiguation. One of the determining factors for concept disambiguation and clustering is the context definition. Most previous approaches use the same context definition for all mentions (Milne & Witten, 2008b; Kulkarni et al., 2009; Ratinov et al., 2011, inter alia). We approach the question of which context is relevant for disambiguating a mention from a discourse perspective and argue that different mentions require different notions of context.
We argue that the context that is relevant for disambiguating a mention depends on how the mention is embedded in the discourse. How a mention is embedded in the discourse, however, depends on the concept it denotes. Hence, the identification of the denoted concept and the identification of the relevant context mutually depend on each other. We propose a binwise approach with three different context definitions and model the selection of the context definition and the disambiguation jointly.

Modeling Interdependencies with Markov Logic. To model the interdependencies between concept disambiguation and concept clustering as well as the interdependencies between the context definition and the disambiguation, we use Markov logic (Domingos & Lowd, 2009). Markov logic combines first-order logic with probabilities and allows us to concisely formalize these interdependencies. We investigate how to balance linguistic appropriateness and time efficiency and propose a hybrid approach that combines joint inference with aggregation techniques.

Concept Disambiguation and Clustering beyond English: Multi- and Cross-linguality. Given the vast amount of text written in different languages, the capability to extend an approach to languages other than English is essential. We thus analyze how our approach copes with languages other than English and show that it largely scales across languages, even without retraining.

Our approach is evaluated on multiple data sets originating from different sources (e.g. news, web) and across multiple languages. As an inventory, we use Wikipedia. We compare our approach to other approaches and show that it achieves state-of-the-art results. Furthermore, we show that joint concept disambiguation and clustering as well as joint context selection and disambiguation lead to significant improvements ceteris paribus.

Zusammenfassung (Summary)

This dissertation deals with concept disambiguation and concept clustering.
By concept disambiguation we mean the task of linking common nouns and proper names in texts, henceforth called mentions, to their corresponding concepts in a predefined inventory. Concept clustering is the task of grouping mentions so that all mentions in one cluster denote the same concept. In this dissertation, we investigate concept disambiguation and clustering from a discourse perspective and propose a discourse-aware approach for jointly disambiguating and clustering concepts in Markov logic. The research contributions of this dissertation cover four areas.

Joint Disambiguation and Clustering of Concepts. Previous approaches model concept disambiguation and concept clustering as two separate tasks (Schütze, 1998; Ji & Grishman, 2011). We analyze the relationship between concept disambiguation and concept clustering and argue that these two tasks can support each other. We propose what is, to our knowledge, the first approach for joint disambiguation and clustering of concepts.

Discourse-Aware Concept Disambiguation. A determining factor for disambiguating and clustering concepts is the context definition. Most previous approaches use the same context definition for all mentions (Milne & Witten, 2008b; Kulkarni et al., 2009; Ratinov et al., 2011, inter alia). We approach the question of which context is relevant for disambiguating mentions from a discourse perspective and argue that different mentions require different context definitions. We show that the context relevant for disambiguation depends on how the mention is embedded in the discourse. How a mention is embedded in the discourse, however, depends on the concept that the mention denotes.
As a result, the identification of the denoted concept and the determination of the relevant context depend on each other. In this dissertation, we propose an approach with three context definitions and model the identification of the context for a mention and its disambiguation jointly.

Modeling Interdependencies with Markov Logic. To model the interdependencies between concept disambiguation and concept clustering as well as between context definition and disambiguation, we use Markov logic (Domingos & Lowd, 2009). Markov logic unites predicate logic with probabilities and makes it possible to formalize interdependencies precisely and concisely. We investigate how concept disambiguation and concept clustering can be implemented in a way that is both linguistically motivated and time-efficient, and propose a hybrid approach that combines joint and aggregative techniques.

Multi- and Cross-lingual Disambiguation and Clustering of Concepts. Many texts are not available in English. It is therefore essential that an approach is usable not only for English but also covers other languages. We analyze how our approach can be applied to other languages and show that our system can successfully process other languages, even without language-specific tuning of the learned parameters.

We evaluate our approach on various data sets and consider not only different text sources (for example newspapers, web) but also different languages. As an inventory, we use Wikipedia. We compare our approach with various other approaches and show that its results correspond to the current state of research.
Furthermore, we show that our joint concept disambiguation and clustering approach as well as our joint context modeling and disambiguation lead to significantly better results ceteris paribus.

Acknowledgments

I am sitting in front of my thesis surrounded by a few boxes. Tomorrow, I will hand in my thesis and move out of my flat. So it is about time to add another two (I promise, the last) pages to my not too short thesis.

First of all, I would like to thank my supervisor Prof. Dr. Michael Strube. He managed to give me enough freedom to develop my own ideas, while being supportive at the same time. He always took a lot of time for discussions and I could count on his honest feedback.

I am very glad that Prof. Dr. Anette Frank is my co-referent. She was always encouraging and helped me to move on by asking relevant questions during my colloquium talks and providing valuable comments afterwards.

While at the beginning of writing this thesis I was glad about each new page, at some point I was glad about each sentence I could cut. Fortunately, Sebastian Martschat did a fantastic job in carefully reading my thesis and helping me to streamline it, as did Nafise Moosavi, Jie Cai, Yufang Hou, Alex Judea, Mohsen Mesgar, Daraksha Parveen and Michael Roth, who all read parts of it with great care.

I received great support from all my colleagues not only while writing my thesis, but during my whole time as a PhD student. Dr. Vivi Nastase helped me a lot with her broad knowledge in my first years at HITS, introduced me to the project world and was always there for me as a friend. I really appreciate all the discussions I had with Jie Cai, Sebastian Martschat and Yufang Hou, which were very useful when I did not know how to solve a problem. In particular, I will not forget all the interesting conversations I had with Sebastian Martschat about evaluation metrics. In addition to all the support I received in Heidelberg, I also learned a