Leveraging Knowledge Bases for Contextual Entity Exploration

Total Page:16

File Type:pdf, Size:1020Kb

Leveraging Knowledge Bases for Contextual Entity Exploration Leveraging Knowledge Bases for Contextual Entity Exploration ∗ y Joonseok Lee Ariel Fuxman Google Inc. Google Inc. Mountain View, CA, USA Mountain View, CA, USA [email protected] [email protected] Bo Zhaoy Yuanhua Lv LinkedIn Microsoft Research Mountain View, CA, USA Redmond, WA, USA [email protected] [email protected] ABSTRACT Keywords Users today are constantly switching back and forth from Entity recommendation, Context, Semantic, Knowledge base, applications where they consume or create content (such as Context-Selection Betweenness e-books and productivity suites like Microsoft Office and Google Docs) to search engines where they satisfy their information needs. Unfortunately, though, this leads to a suboptimal user experience as the search engine lacks any 1. INTRODUCTION knowledge about the content that the user is authoring or Users today are constantly switching back and forth from consuming in the application. As a result, productivity applications where they consume or create content (such as suites are starting to incorporate features that let the user e-books and productivity suites like Google Docs and Mi- \explore while they work". crosoft Office) to search engines where they satisfy their in- Existing work in the literature that can be applied to this formation needs (such as Bing or Google). Unfortunately, problem takes a standard bag-of-words information retrieval though, this leads to a suboptimal user experience as the approach, which consists of automatically creating a query search engine lacks any knowledge about the content that that includes not only the target phrase or entity chosen by the user is authoring or consuming in the application [11, the user but also relevant terms from the context. While 19, 13]. these approaches have been successful, they are inherently How can we empower users to satisfy their information limited to returning results (documents) that have a syntac- needs directly within the applications where they consume tic match with the keywords in the query. content? A significant step in this direction is enabling users We argue that the limitations of these approaches can be to interact with anything on the document that they are overcome by leveraging semantic signals from a knowledge working on, directly within the productivity application, and graph built from knowledge bases such as Wikipedia. We recommending results that are contextually relevant to the present a system called Lewis for retrieving contextually rel- elements they are interacting with. Productivity suites are evant entity results leveraging a knowledge graph, and per- starting to incorporate features that realize this scenario, form a large scale crowdsourcing experiment in the context such as the \Insights for Office" feature in Microsoft Word of an e-reader scenario, which shows that Lewis can outper- Online. form the state-of-the-art contextual entity recommendation As an example, consider a user reading on an e-reader systems by more than 20% in terms of the MAP score. the document shown in Figure 1, which describes the Cap- Categories and Subject Descriptors ture of Fort Ticonderoga, an important event in American history. At some point, she finds a mention to a historical H.4 [Information systems applications]: Data mining figure called \Silas Deane" and decides that she would like to learn more about him. Just sending the query \silas deane" ∗ This work was done during an internship at Microsoft. to any of the major commercial search engines returns re- yThis work was done while working at Microsoft. sults such as \Silas Dean High School" which are unrelated to the historical context of the document. A much more compelling user experience is the one shown in Figure 1, Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed where the user has tapped on the phrase \Silas Deane" and for profit or commercial advantage and that copies bear this notice and the full cita- is shown contextually relevant articles such as \Revolution- tion on the first page. Copyrights for components of this work owned by others than ary War", where she can learn about Silas Deane's over- ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- all involvement in the American Revolutionary War, and publish, to post on servers or to redistribute to lists, requires prior specific permission \Benjamin Franklin", where she can learn that Deane and and/or a fee. Request permissions from [email protected]. Franklin were the first diplomats in American history, and KDD’15, August 10-13, 2015, Sydney, NSW, Australia. c 2015 ACM. ISBN 978-1-4503-3664-2/15/08 ...$15.00. they were sent together to France as commissioners from the DOI: http://dx.doi.org/10.1145/2783258.2788564. Continental Congress. Leibniz Sub-graph Construction Green Mt. Boys Fort Ticonderoga Connecticut American Revolutionary War Blacksmith Silas Deane House Silas Deane Arthur Lee John Adams Benjamin Franklin Figure 2: A portion of the focused subgraph for our running example. (Black for the user selection node, gray for context nodes, and white for all other nodes.) 2 Figure 1: An example of contextual exploration. • An algorithm for retrieving contextually relevant enti- ties. Previous efforts in the literature have made significant • A large-scale evaluation of the approach in the context progress towards realizing this scenario, including systems of a real-word e-reader application. The results show for contextual search [11, 19] and contextual insights [13]. an improvement of up to 20.8% in terms of the MAP These systems take a standard bag-of-word information re- scores with respect to the state-of-the-art methods for trieval approach to the problem, which consists of auto- contextual insights and pseudo-relevance feedback. We matically creating a query that includes not only the tar- also present a detailed ablation study that shows the get phrase or entity chosen by the user but also relevant importance of the different components of the Lewis terms from the context. More broadly, these approaches are system. related to relevance feedback in information retrieval [35], where a context (in the case of relevance feedback, the re- The rest of this paper is organized as follows. We formally sults of a query; in our scenario, the document context) is define the contextual entity exploration problem in the next used to refine an initial query (in our case, the phrase chosen section, followed by a detailed description of our proposed by the user). method in Section 3. We evaluate our method in Section 4. While these approaches have been successful, they are in- Lastly, we review related problems and previous work in herently limited to returning results (documents) that have Section 5, and provide concluding remarks in Section 6. a syntactic match with the keywords in the query. For ex- ample, the Wikipedia articles for \Revolutionary War" and \Benjamin Franklin" have just a single passing mention to 2. CONTEXTUAL ENTITY EXPLORATION Silas Deane and are thus unlikely to be retrieved by a query PROBLEM that contains the terms \silas deane". To tackle this prob- The input to the contextual entity exploration problem lem, in this paper we argue that such results can be obtained consists of a user selection: the span of text that the user by more directly modeling the semantic connections between highlights with the mouse or taps with the finger, which im- the target concept and the entities in the context where it plicitly determines the entity she would like to gain insights appears. about (e.g., \Silas Deane"); a context, consisting of the con- To illustrate our approach, consider the graph shown in tent that the user is consuming or authoring; a knowledge Figure 2 (henceforth called knowledge graph). The black base, that consists of entities that are candidates to be rec- node corresponds to the entity chosen by the user (Silas ommended; and a knowledge graph, whose nodes are entities Deane), the gray nodes correspond to entities mentioned from the knowledge base; and an text-to-entity mapping that in the context (Green Mountain Boys, Fort Ticonderoga, given some text from the user selection or context produces Connecticut). The edges correspond to hyperlinks in the an entity from the knowledge base. Wikipedia articles. As we can see, the node for \Revo- The contextual entity exploration problem is then formally lutionary War" acts as a bridge between Silas Deane and defined as follows: the context concepts Green Mountain Boys (the militia that captured Fort Ticonderoga) and Fort Ticonderoga. Our ap- Definition 1. Given a quintuple (s; C; B; G; γ), where s proach leverages precisely this type of semantic connections is a user selection, C is some text context, B is a knowledge to retrieve contextually relevant results. base, G is a undirected graph whose nodes are entities in The contributions of this paper include: B, and γ is a text-to-entity mapping; the objective of the contextual entity exploration problem is to produce a set of • A framework for leveraging semantic signals from a entities O such that O ⊆ B and every entity in O is relevant knowledge graph for the problem of retrieving contex- to s in the context of C. tually relevant entity results. In this work, we will use Wikipedia as our knowledge base • A system called Lewis for retrieving contextually rele- B and the hyperlink structure of Wikipedia as the edges of vant entities leveraging a knowledge graph built from the knowledge graph. In particular, G will be an undirected Wikipedia hyperlinks. graph G = (B; E) where there is an edge (x; y) in E if there Focused Subgraph Construction Personalized Random Walk Recommendation List 0.18 0.14 0.08 C1 C1 Barack Obama 1.26 0.11 Score Aggregation Barack Obama is US president.
Recommended publications
  • Legacy Finding Aid for Manuscript and Photograph Collections
    Legacy Finding Aid for Manuscript and Photograph Collections 801 K Street NW Washington, D.C. 20001 What are Finding Aids? Finding aids are narrative guides to archival collections created by the repository to describe the contents of the material. They often provide much more detailed information than can be found in individual catalog records. Contents of finding aids often include short biographies or histories, processing notes, information about the size, scope, and material types included in the collection, guidance on how to navigate the collection, and an index to box and folder contents. What are Legacy Finding Aids? The following document is a legacy finding aid – a guide which has not been updated recently. Information may be outdated, such as the Historical Society’s contact information or exact box numbers for contents’ location within the collection. Legacy finding aids are a product of their times; language and terms may not reflect the Historical Society’s commitment to culturally sensitive and anti-racist language. This guide is provided in “as is” condition for immediate use by the public. This file will be replaced with an updated version when available. To learn more, please Visit DCHistory.org Email the Kiplinger Research Library at [email protected] (preferred) Call the Kiplinger Research Library at 202-516-1363 ext. 302 The Historical Society of Washington, D.C., is a community-supported educational and research organization that collects, interprets, and shares the history of our nation’s capital. Founded in 1894, it serves a diverse audience through its collections, public programs, exhibits, and publications. 801 K Street NW Washington, D.C.
    [Show full text]
  • 801 K Street NW Washington, D.C. 20001
    801 K Street NW Washington, D.C. 20001 www.DCHistory.org SPECIAL COLLECTIONS FINDING AID Title: MS 0846, Clarence Hewes Scrapbook Collection, 1906-1962 Processor: David G. Wood Processed Date: April 2016 [Finding Aid last updated April 12, 2016] Clarence Bussey Hewes was born in Jeanerette, Louisiana, on February 1, 1890, to Harry Bartram and Nellie Bussey Hewes. The family also included Clarence’s two sisters, Amy (later Mrs. Robert Edmund Floweree) and Florence (later Mrs. Arthur Breese Griswold). Harry Hewes, a native of Texas, had come to Louisiana in the 1880s and made a fortune developing the local lumber industry. According to articles found in the scrapbooks, the Hewes were descended from a North Carolina family that included Joseph Hewes of Edenton, a signer of the Declaration of Independence who organized the first American naval force. Hewes attended the Dixon Academy in Covington, Louisiana; the University of Virginia (LL.B., 1914); and Tulane University (Bachelor of Laws in Civil Law, 1915). He came to Washington, D.C., in 1916, and in 1917-1918 served as private secretary to the Honorable Charles C. McChord, one of the commissioners heading the Interstate Commerce Commission. On February 10, 1919, he began a career at the Department of State, assigned as Third Secretary at the U.S. Legation in Panama. He then served at the U.S. embassies or legations in the Netherlands (1920-1922), and Costa Rica, El Salvador, and Guatemala (1922-1924), before being assigned as First Secretary at the embassy in Peking, China. He remained in China until 1930 when he was designated First Secretary at the embassy in Berlin, Germany.
    [Show full text]
  • History of the Bureau of Diplomatic Security
    History of the Bureau of Diplomatic Security of the United States Department of State History of the Bureau of Diplomatic Security of the United States Department of State United States Department of State Bureau of Diplomatic Security Printed October 2011 Global Publishing Solutions First Edition CHIEFS OF SECURITY AT THE U.S. DEPARTMENT OF STATE 1917 - 2011 Office of the Chief Special Agent Bureau of Diplomatic Security (DS) Joseph M. “Bill” Nye 1917 – 1920 Assistant Secretary of State Robert C. Bannerman 1920 – 1940 Robert E. Lamb 1985 – 1989 Thomas F. Fitch 1940 – 1947 Sheldon J. Krys 1989 – 1992 Anthony C. E. Quainton 1992 – 1995 Security Office Eric J. Boswell 1996 – 1998 Robert L. Bannerman 1945 – 1947 David G. Carpenter 1998 – 2002 Francis X. Taylor 2002 – 2005 Division of Security Richard J. Griffin 2005 – 2007 Donald L. Nicholson 1948 – 1952 Gregory B. Starr (Acting) 2007 – 2008 John W. Ford 1952 Eric J. Boswell 2008 – Director, Diplomatic Security Service Office of Security (SY) Deputy Assistant Secretary of State David C. Fields 1985 – 1986 John W. Ford 1952 – 1953 Louis E. Schwartz, Jr. 1986 – 1988 Dennis A. Flinn 1953 – 1956 E. Tomlin Bailey 1956 – 1958 Clark M. Dittmer 1988 – 1993 William O. Boswell 1958 – 1962 Mark E. Mulvey 1993 – 1996 John F. Reilly 1962 – 1963 Gregorie Bujac 1996 – 1998 G. Marvin Gentile 1964 – 1974 Peter E. Bergin 1998 – 2003 Victor H. Dikeos 1974 – 1978 Joe D. Morton 2003 – 2007 Karl D. Ackerman 1978 – 1982 Gregory B. Starr 2007 – 2009 Marvin L. Garrett, Jr. 1982 – 1983 Patrick Donovan 2009 David C. Fields 1984 – 1985 Jeffrey W.
    [Show full text]
  • A Guide to Portraits in the Chancery, PDF 3 MB
    A GUIDE TO PORTRAITS IN THE CHANCERY U.S. EMBASSY PARIS, FRANCE CULTURAL HERITAGE PROGRAM 1 “I want to express my pleasure at being here this morning. I tried to be assigned to the Embassy in Paris myself, and unable to do so, I decided to run for President. … The United States interest in France is not based on mere sentiment. I know it is customary on these visits to recall Lafayette and all the rest...Our interest here is more substantial, and I believe it goes to the common interests of both the United States and France. We are closely associated and are allies, because it helps to protect the interest of our country and because it protects the interests of freedom around the world. I do not believe that there is any Embassy in the world more important to the United States than the Embassy in Paris, because the influence of this city and country goes far beyond its borders” President John F. Kennedy’s in remarks at the U.S. Embassy Paris: June 1, 1961 A project of the Cultural Heritage Program assisted by American interns 2011-2013 2 George Washington George Washington 1732-1799 1732-1799 Robert Princeton after Gilbert Stuart; Artist Unknown, oil on canvas oil on canvas INV 127310 124 ins by 68 ins INV 801905 George Washington, “father of his country,” was born on February 22, 1732 in Westmoreland County, Virginia. Prior to the American Revolution, he participated in the French and Indian War, and, in 1758, was elected to the Virginia House of Burgesses.
    [Show full text]