Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Raphael Hoffmann, James Fogarty, Daniel S. Weld
Computer Science & Engineering
University of Washington
Seattle, WA 98195
{raphaelh, jfogarty, weld}@cs.washington.edu

ABSTRACT
Programmers regularly use search as part of the development process, attempting to identify an appropriate API for a problem, seeking more information about an API, and seeking samples that show how to use an API. However, neither general-purpose search engines nor existing code search engines currently fit their needs, in large part because the information programmers need is distributed across many pages. We present Assieme, a Web search interface that effectively supports common programming search tasks by combining information from Web-accessible Java Archive (JAR) files, API documentation, and pages that include explanatory text and sample code. Assieme uses a novel approach to finding and resolving implicit references to Java packages, types, and members within sample code on the Web. In a study of programmers performing searches related to common programming tasks, we show that programmers obtain better solutions, using fewer queries, in the same amount of time spent using a general Web search interface.

ACM Classification
H5.2. Information interfaces and presentation: User Interfaces.

Keywords: Web search interfaces, implicit references

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
UIST’07, October 7–10, 2007, Newport, Rhode Island, USA.
Copyright 2007 ACM 978-1-59593-679-2/07/0010 ...$5.00.

INTRODUCTION AND MOTIVATION
The explosion of information available on the Web and on personal computers has made search a fundamental component of modern user interface software. This has led not only to new approaches to visualizing the results of keyword-based Web search [24], but also to applications for quickly finding personal information [6, 9], augmenting highly-structured sites with browser-based search [14], tools to help people collect and summarize information from search sessions [8], and keyword-based approaches to invoking commands in desktop applications [18].

Because a vast number of code libraries and related information are now available on the Web, programmers increasingly use search as a part of their development process. Our analysis of logs from a major search engine shows many examples of queries related to Application Programming Interfaces (APIs), including queries attempting to identify an appropriate API for a problem, queries seeking more information about a particular API, and queries seeking samples that use an API. Interviews with developers confirm that Web search engines are the single most important source for this information.

Unfortunately, current Web search interfaces have important shortcomings when used for this purpose. General search engines, such as Google, traditionally generate a flat listing of ranked pages, but developers seeking an API require information dispersed on many pages: tutorials, documentation pages, the API itself (in source or binary format), and pages with code samples that demonstrate usage. It is currently time-consuming to locate the required pieces of information and difficult to get an overview of alternatives. Unless a programmer already has a significant understanding of an API, it is almost impossible to judge the relevance and quality of results or to understand dependencies contained in sample code. Numerous queries and visits to many pages are therefore required. Code-specific search engines have recently been introduced, but these are also unsatisfactory, largely because they ignore documentation, tutorials, and pages containing a mix of code samples with explanatory text. Pages that contain both explanatory text and sample code have generally been intentionally created to illustrate the use of an API, but the raw code returned by existing code-specific search engines lacks context and is frequently incomprehensible.

This paper presents Assieme, a Web search interface for programmers based on a novel approach to combining information currently distributed across many pages. Assieme analyzes Web-accessible Java Archive (JAR) files, API documentation, and pages that mix explanatory text with sample code. By finding and resolving implicit references from code samples to Java packages, types, and members, Assieme can combine relevant information from different Web-accessible resources. Assieme therefore provides a coherent search-based interface that allows programmers to quickly examine different APIs that might be appropriate for a problem, obtain more information about a particular API, and see samples of how to use an API. In a study of programmers performing common programming-related search tasks, we show that programmers obtain better solutions, using fewer queries, in the same amount of time spent using a general Web search interface, Google.
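To make the notion of an implicit reference concrete, the following is a minimal sketch, not Assieme's actual algorithm. It assumes a code sample still carries its import statements and that an index from package names to the simple type names they contain has already been harvested from JAR files. The class name `ImplicitReferenceSketch`, the method `resolve`, and the contents of the index are hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImplicitReferenceSketch {

    // Matches "import a.b.C;" and "import a.b.*;" (static imports are ignored).
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+([\\w.]+?)(\\.\\*)?\\s*;", Pattern.MULTILINE);

    /**
     * Returns the fully qualified candidates for a simple type name in a snippet.
     * knownTypes is an assumed index: package name -> simple names of its types.
     */
    static Set<String> resolve(String snippet, String simpleName,
                               Map<String, Set<String>> knownTypes) {
        Set<String> candidates = new TreeSet<>();
        Matcher m = IMPORT.matcher(snippet);
        while (m.find()) {
            String name = m.group(1);
            boolean onDemand = m.group(2) != null;
            if (onDemand) {
                // "import pkg.*;" helps only if the index says pkg contains the type.
                if (knownTypes.getOrDefault(name, Set.of()).contains(simpleName)) {
                    candidates.add(name + "." + simpleName);
                }
            } else if (name.endsWith("." + simpleName)) {
                // "import pkg.Type;" names the type explicitly.
                candidates.add(name);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        String snippet = String.join("\n",
                "import com.lowagie.text.Document;",
                "import com.lowagie.text.pdf.*;",
                "PdfWriter.getInstance(document, out);");
        // Illustrative index entry; a real index would come from analyzing JAR files.
        Map<String, Set<String>> known =
                Map.of("com.lowagie.text.pdf", Set.of("PdfWriter"));
        System.out.println(resolve(snippet, "PdfWriter", known));
        System.out.println(resolve(snippet, "Document", known));
    }
}
```

More than one candidate can come back when several on-demand imports contain a type with the same simple name, which is one reason a sample on its own only implies, rather than states, which library it uses.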
[Figure 1: screen capture of the Assieme interface. Callouts: Browse packages, types, and methods related to keywords. Judge relevancy by example counts. Filter pages with code examples by package, type, or member. See and download required JAR files. Hover over page titles to see code example previews. Page summaries show Java types used in code examples. Context-sensitive menus at highlighted types (see Figure 2).]

Figure 1: Assieme provides a search-based interface that allows programmers to quickly examine different APIs that might be appropriate for a problem, obtain more information about a particular API, and see samples of how to use an API.

This paper makes four contributions. First, we analyze query logs of a major Web search engine and show that many searches by programmers are related to finding an appropriate API for a problem, finding more information about a particular API, or seeking samples of the use of an API. Second, we present Assieme, a Web search interface for programmers that uses implicit references in sample code to enable a novel interface that better addresses the information needs of programmers. Third, we analyze the key algorithms powering Assieme, reporting the reliability of automatic sample code extraction and reference disambiguation. Finally, we report on a study of programmers performing searches related to common programming tasks, showing that programmers obtain better solutions, using fewer queries, in the same amount of time spent using a general Web search interface.

Figure 2: A context-sensitive menu reveals the fully qualified names of elements appearing in Java code and provides links to additional information.

AN OVERVIEW OF ASSIEME
Figure 1 presents Assieme, with a screen capture taken while a developer seeking to programmatically generate a file in Adobe’s PDF format is exploring the results of a query for “output acrobat.” The interface contains three major areas.

The shaded bar across the top shows Java packages, types, and members corresponding to the search query. In this example, the programmer has indicated an interest in seeing packages. The fully-qualified names of appropriate packages are shown, ranked by their relevance to the query, as well as how many code samples Assieme has found that demonstrate the use of types from that package. The number of samples that use a package, type, or member can be helpful for determining relevance, as frequently used items may be more robust and better supported than less known options. Choosing a package, type, or member in this area filters the results in the next area, similar to other interfaces that use faceted search to examine large datasets [31].

In this case, the programmer has filtered the results to focus on the com.lowagie.text.pdf package, so the bottom-left portion of the interface shows pages that contain code samples that use classes from that package. The blue links and green URLs here provide the traditional functionality, allowing a programmer to navigate to a page. But instead of the generic text snippet preview provided by general search engines, Assieme presents information that is more likely to match the information needs of programmers. Specifically, Assieme shows what Java types are used in code samples on the page and which libraries contain those types. Assieme also provides a link to download those libraries. When a programmer mouses over a link on the left side, sample code snippets from that page are shown on the right.

Previewing code samples allows programmers to quickly get more information about the relevance of a page. The sample itself may provide the information a programmer needs, or it might give the programmer more information about whether navigating to the full page is likely to be helpful. Assieme adds further information to the sample code preview by providing context-sensitive menus within the code. These reveal the fully-qualified names of packages, types, and members used in code and provide direct links [...]

Coordination barriers occur when a programmer knows what set of things to use to achieve a goal, but not how those things should be combined. Use barriers are related, occurring when a programmer knows what to use, but not how to use it. Understanding barriers occur after a potential solution has been
