Leveraging Knowledge Bases for Contextual Entity Exploration

Joonseok Lee∗, Google Inc., Mountain View, CA, USA, [email protected]
Ariel Fuxman†, Google Inc., Mountain View, CA, USA, [email protected]
Bo Zhao†, LinkedIn, Mountain View, CA, USA, [email protected]
Yuanhua Lv, Microsoft Research, Redmond, WA, USA, [email protected]

∗ This work was done during an internship at Microsoft.
† This work was done while working at Microsoft.

ABSTRACT

Users today are constantly switching back and forth from applications where they consume or create content (such as e-books and productivity suites like Microsoft Office and Google Docs) to search engines where they satisfy their information needs. Unfortunately, though, this leads to a suboptimal user experience as the search engine lacks any knowledge about the content that the user is authoring or consuming in the application. As a result, productivity suites are starting to incorporate features that let the user "explore while they work".

Existing work in the literature that can be applied to this problem takes a standard bag-of-words information retrieval approach, which consists of automatically creating a query that includes not only the target phrase or entity chosen by the user but also relevant terms from the context. While these approaches have been successful, they are inherently limited to returning results (documents) that have a syntactic match with the keywords in the query.

We argue that the limitations of these approaches can be overcome by leveraging semantic signals from a knowledge graph built from knowledge bases such as Wikipedia. We present a system called Lewis for retrieving contextually relevant entity results leveraging a knowledge graph, and perform a large-scale crowdsourcing experiment in the context of an e-reader scenario, which shows that Lewis can outperform the state-of-the-art contextual entity recommendation systems by more than 20% in terms of the MAP score.

Categories and Subject Descriptors

H.4 [Information systems applications]: Data mining

Keywords

Entity recommendation, Context, Semantic, Knowledge base, Context-Selection Betweenness

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
KDD'15, August 10-13, 2015, Sydney, NSW, Australia.
© 2015 ACM. ISBN 978-1-4503-3664-2/15/08 ...$15.00.
DOI: http://dx.doi.org/10.1145/2783258.2788564.

1. INTRODUCTION

Users today are constantly switching back and forth from applications where they consume or create content (such as e-books and productivity suites like Google Docs and Microsoft Office) to search engines where they satisfy their information needs (such as Bing or Google). Unfortunately, though, this leads to a suboptimal user experience as the search engine lacks any knowledge about the content that the user is authoring or consuming in the application [11, 19, 13].

How can we empower users to satisfy their information needs directly within the applications where they consume content? A significant step in this direction is enabling users to interact with anything on the document that they are working on, directly within the productivity application, and recommending results that are contextually relevant to the elements they are interacting with. Productivity suites are starting to incorporate features that realize this scenario, such as the "Insights for Office" feature in Microsoft Word Online.

As an example, consider a user reading on an e-reader the document shown in Figure 1, which describes the Capture of Fort Ticonderoga, an important event in American history. At some point, she finds a mention of a historical figure called "Silas Deane" and decides that she would like to learn more about him. Just sending the query "silas deane" to any of the major commercial search engines returns results such as "Silas Dean High School", which are unrelated to the historical context of the document. A much more compelling user experience is the one shown in Figure 1, where the user has tapped on the phrase "Silas Deane" and is shown contextually relevant articles such as "Revolutionary War", where she can learn about Silas Deane's overall involvement in the American Revolutionary War, and "Benjamin Franklin", where she can learn that Deane and Franklin were the first diplomats in American history and that they were sent together to France as commissioners from the Continental Congress.

Figure 1: An example of contextual exploration.

Figure 2: A portion of the focused subgraph for our running example. (Black for the user selection node, gray for context nodes, and white for all other nodes.) Node labels: Green Mt. Boys, Fort Ticonderoga, Connecticut, American Revolutionary War, Blacksmith, Silas Deane House, Silas Deane, Arthur Lee, John Adams, Benjamin Franklin.

Previous efforts in the literature have made significant progress towards realizing this scenario, including systems for contextual search [11, 19] and contextual insights [13]. These systems take a standard bag-of-words information retrieval approach to the problem, which consists of automatically creating a query that includes not only the target phrase or entity chosen by the user but also relevant terms from the context. More broadly, these approaches are related to relevance feedback in information retrieval [35], where a context (in the case of relevance feedback, the results of a query; in our scenario, the document context) is used to refine an initial query (in our case, the phrase chosen by the user).

While these approaches have been successful, they are inherently limited to returning results (documents) that have a syntactic match with the keywords in the query. For example, the Wikipedia articles for "Revolutionary War" and "Benjamin Franklin" have just a single passing mention of Silas Deane and are thus unlikely to be retrieved by a query that contains the terms "silas deane". To tackle this problem, in this paper we argue that such results can be obtained by more directly modeling the semantic connections between the target concept and the entities in the context where it appears.

To illustrate our approach, consider the graph shown in Figure 2 (henceforth called knowledge graph). The black node corresponds to the entity chosen by the user (Silas Deane), and the gray nodes correspond to entities mentioned in the context (Green Mountain Boys, Fort Ticonderoga, Connecticut). The edges correspond to hyperlinks in the Wikipedia articles. As we can see, the node for "Revolutionary War" acts as a bridge between Silas Deane and the context concepts Green Mountain Boys (the militia that captured Fort Ticonderoga) and Fort Ticonderoga. Our approach leverages precisely this type of semantic connection to retrieve contextually relevant results.

The contributions of this paper include:

• A framework for leveraging semantic signals from a knowledge graph for the problem of retrieving contextually relevant entity results.

• A system called Lewis for retrieving contextually relevant entities leveraging a knowledge graph built from Wikipedia hyperlinks.

• An algorithm for retrieving contextually relevant entities.

• A large-scale evaluation of the approach in the context of a real-world e-reader application. The results show an improvement of up to 20.8% in terms of the MAP scores with respect to the state-of-the-art methods for contextual insights and pseudo-relevance feedback. We also present a detailed ablation study that shows the importance of the different components of the Lewis system.

The rest of this paper is organized as follows. We formally define the contextual entity exploration problem in the next section, followed by a detailed description of our proposed method in Section 3. We evaluate our method in Section 4. Lastly, we review related problems and previous work in Section 5, and provide concluding remarks in Section 6.

2. CONTEXTUAL ENTITY EXPLORATION PROBLEM

The input to the contextual entity exploration problem consists of a user selection: the span of text that the user highlights with the mouse or taps with the finger, which implicitly determines the entity she would like to gain insights about (e.g., "Silas Deane"); a context, consisting of the content that the user is consuming or authoring; a knowledge base, which consists of entities that are candidates to be recommended; a knowledge graph, whose nodes are entities from the knowledge base; and a text-to-entity mapping that, given some text from the user selection or context, produces an entity from the knowledge base.

The contextual entity exploration problem is then formally defined as follows:

Definition 1. Given a quintuple (s, C, B, G, γ), where s is a user selection, C is some text context, B is a knowledge base, G is an undirected graph whose nodes are entities in B, and γ is a text-to-entity mapping; the objective of the contextual entity exploration problem is to produce a set of entities O such that O ⊆ B and every entity in O is relevant to s in the context of C.

In this work, we will use Wikipedia as our knowledge base B and the hyperlink structure of Wikipedia as the edges of the knowledge graph. In particular, G will be an undirected graph G = (B, E) where there is an edge (x, y) in E if there

[Figure, caption not recovered. Panel titles: Focused Subgraph Construction, Personalized Random Walk, Score Aggregation, Recommendation List.]
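The inputs of Definition 1 and the "bridge" intuition from the running example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Lewis algorithm: the link list `TOY_LINKS`, the lookup table `TOY_MENTIONS`, and the helpers `build_undirected_graph`, `gamma`, and `bridge_entities` are all hypothetical names introduced here, the links are hand-written toy data rather than real Wikipedia hyperlinks, and the sketch assumes that a hyperlink in either direction yields an undirected edge.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Entity = str
Graph = Dict[Entity, Set[Entity]]


def build_undirected_graph(hyperlinks: Iterable[Tuple[Entity, Entity]]) -> Graph:
    """Build G = (B, E) as adjacency sets.

    Assumption: a hyperlink in either direction produces one undirected edge.
    """
    adjacency: Graph = {}
    for source, target in hyperlinks:
        if source == target:
            continue  # ignore self-links
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


# Toy hyperlinks, hand-written for illustration (not real Wikipedia data).
TOY_LINKS = [
    ("Silas Deane", "American Revolutionary War"),
    ("Green Mountain Boys", "American Revolutionary War"),
    ("Fort Ticonderoga", "American Revolutionary War"),
    ("Silas Deane", "Benjamin Franklin"),
    ("Silas Deane", "Connecticut"),
    ("Silas Deane House", "Silas Deane"),
]

# Toy stand-in for the text-to-entity mapping (gamma in Definition 1).
TOY_MENTIONS = {
    "silas deane": "Silas Deane",
    "green mountain boys": "Green Mountain Boys",
    "fort ticonderoga": "Fort Ticonderoga",
    "connecticut": "Connecticut",
}


def gamma(text: str) -> Optional[Entity]:
    """Map a span of text to an entity, or None if it is not recognized."""
    return TOY_MENTIONS.get(text.strip().lower())


def bridge_entities(
    graph: Graph, selection: Entity, context: Set[Entity]
) -> Dict[Entity, Set[Entity]]:
    """Return non-context neighbors of the selection that also touch the context.

    Each result maps a bridging entity to the context entities it links to.
    """
    bridges: Dict[Entity, Set[Entity]] = {}
    for candidate in graph.get(selection, set()):
        if candidate in context:
            continue  # context entities themselves are not bridges
        shared = graph[candidate] & context
        if shared:
            bridges[candidate] = shared
    return bridges


graph = build_undirected_graph(TOY_LINKS)
selection = gamma("Silas Deane")
context = {
    entity
    for entity in map(
        gamma,
        ["Green Mountain Boys", "Fort Ticonderoga", "Connecticut", "unknown phrase"],
    )
    if entity is not None
}
bridges = bridge_entities(graph, selection, context)
print(bridges)
```

On this toy data the only bridging entity is "American Revolutionary War", which links the selection to both Green Mountain Boys and Fort Ticonderoga, mirroring the role that node plays in Figure 2. A direct-neighbor check like this is far simpler than a ranking method; it is meant only to make the data structures concrete.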
