Mining Online Software Tutorials: Challenges and Open Problems
Mining Online Software Tutorials: Challenges and Open Problems

Adam Fourney
University of Waterloo, Waterloo, ON, Canada
[email protected]

Michael Terry
University of Waterloo, Waterloo, ON, Canada
[email protected]

Abstract
Web-based software tutorials contain a wealth of information describing software tasks and workflows. There is growing interest in mining these resources for task modeling, automation, machine-guided help, interface search, and other applications. As a first step, past work has shown success in extracting individual commands from textual instructions. In this paper, we ask: How much further do we have to go to more fully interpret or automate a tutorial? We take a bottom-up approach, asking what it would take to: (1) interpret individual steps, (2) follow sequences of steps, and (3) locate procedural content in larger texts.

Author Keywords
Tutorial mining; Natural language processing

ACM Classification Keywords
H.5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CHI 2014, April 26 - May 01, 2014, Toronto, ON, Canada.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2474-8/14/04…$15.00.
http://dx.doi.org/10.1145/2559206.2578862

Introduction
The Internet contains a vast and rich repository of tutorials and other procedural information describing how to accomplish a wide range of tasks with almost any publicly available interactive system. For nearly any task, it is likely that there exists online instructional content that will assist users in accomplishing their goals.

Recognizing the wealth of data afforded by these resources, researchers in recent years have turned to the problem of extracting useful data from online tutorials. This past research has explored applications of these data including task modeling [2], software automation [1,10,14], machine-guided help [11], and interface search [5,6]. Beyond this existing research, there are many compelling ways these data could be utilized. For example, a system could infer the time required to perform a tutorial, the target audience of the tutorial (e.g., novice vs. expert), or the amount of creativity or input expected of users. Tutorials could also be used to infer attributes related to the design of the software, such as missing features, or features that frequently lead to breakdowns in use of the software.

The primary challenges in extracting data from tutorials lie in the fact that the information is represented using natural language and, frequently, images and video content, requiring systems that can transform this free-form content into forms that systems can readily reason about. Much of the prior work in this space has focused on extracting information from text-based tutorials, which is also the focus of this paper. This past work has demonstrated [5,10,11,14] that mentions of user interface widgets (tools, menus, etc.) can be detected in instructional material with accuracies of 95-97%. But the information available in tutorials is much richer and more nuanced than simple lists of widgets or commands. Despite clear progress, greater and broader machine understanding of instructional materials remains a significant research challenge.

As an example, consider the following excerpt from a real-world photo manipulation tutorial:

“Place it underneath the original text, as if it were a reflection.”

The correct interpretation of this instruction requires: (a) coreference resolution [8], to determine to which object the pronoun “it” refers; (b) spatial reasoning, to determine approximately where the item is to be placed; and (c) an understanding of the purpose clause [4] “as if it were a reflection” to further constrain the final placement. In this case, three challenges arise from a single tutorial sentence. When examining full tutorials, these and other challenges quickly accumulate, compounding the problem (e.g., see Figure 1).

Underspecified steps: “Create a new, rather large document.”
Anti-patterns: “Whatever you do, don’t create a hanging indent by pressing the space key to create spaces, or even by tabbing across the page”
Theory or background: “Unsharp mask works by increasing the contrast between edges in the photos.”
Figure 1: Samples of tutorial steps that demonstrate challenges posed by: underspecified steps or parameters (top), anti-patterns or warnings of what not to do (middle), and text that provides background or theoretical details that need not be executed by the user (bottom).

The primary contribution of this paper is to present a roadmap for the research challenges that must be tackled for more complete machine understanding of instructional materials for interactive systems, with a focus on text-based materials. The paper serves to consolidate and organize the failure cases and limitations mentioned in past work, including some of our own papers [5,6]. It also presents challenges we have encountered while working in this space, many of which have not been explicitly identified in past work, but are nonetheless critical to the correct interpretation of written instructions. Throughout the paper, we contextualize each challenge with numerous real-world examples across a range of applications and tutorials, and discuss partial or potential solutions when such mitigations are possible.

Background
Much of the prior work in this research area has focused on extracting commands and parameter values from text-based sources, typically using supervised learning methods.

Motivated by the desire to improve guided help systems, Lau et al. explored possible strategies for extracting operations from handwritten how-to instructions generated by workers on Mechanical Turk [11]. In this work, the authors manually segmented the instructions so that each segment contained an action (e.g., click), a target (e.g., the “OK” button), and any associated values (e.g., parameters).

In the realm of online tutorials, which typically contain a lot of “noise” in the data (e.g., ads, site navigation, comments), our previous work [5] explored the possibility of detecting mentions of the software’s user interface elements (e.g., menus, dialogs, tools) referenced in text. In this work, we utilized naive Bayes classifiers with a carefully selected set of features, and achieved an F1-score of 0.87 when processing GIMP photo manipulation tutorials. Comparably, Laput et al. [10] utilized conditional random fields (CRFs) to detect menus, tools, and parameters in Photoshop tutorials. Resultant F1-scores of 0.97, 0.98, and 0.36 were achieved for menus, tools, and parameters, respectively.
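Work in this vein typically treats mention detection as supervised classification over simple lexical features. The sketch below is a rough illustration of that general approach only: the features, the tiny menu lexicon, and the training examples are all invented for this example, and are not the feature set or data of the cited systems.

```python
import math
from collections import defaultdict

# Invented features for a candidate word in tutorial text.
def features(word, prev_word):
    return [
        "cap" if word[:1].isupper() else "nocap",
        "after_det" if prev_word.lower() == "the" else "no_det",
        "in_lexicon" if word in {"File", "Filters", "Layers"} else "oov",
    ]

class NaiveBayes:
    """Naive Bayes over binary features, with add-one smoothing."""
    def __init__(self):
        self.feat_counts = {0: defaultdict(int), 1: defaultdict(int)}
        self.label_counts = {0: 0, 1: 0}

    def fit(self, examples):
        for feats, label in examples:
            self.label_counts[label] += 1
            for f in feats:
                self.feat_counts[label][f] += 1

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label in (0, 1):
            # log prior + smoothed log likelihood of each feature
            score = math.log(self.label_counts[label] / total)
            for f in feats:
                score += math.log((self.feat_counts[label][f] + 1)
                                  / (self.label_counts[label] + 2))
            if score > best_score:
                best, best_score = label, score
        return best

# Hand-labeled toy examples: 1 = mentions a UI element, 0 = does not.
train = [
    (features("Filters", "the"), 1),
    (features("File", "select"), 1),
    (features("Brightness", "the"), 1),
    (features("photo", "your"), 0),
    (features("contrast", "the"), 0),
    (features("reflection", "a"), 0),
]

clf = NaiveBayes()
clf.fit(train)
print(clf.predict(features("Layers", "the")))  # 1: likely a UI mention
print(clf.predict(features("shadow", "the")))  # 0: ordinary word
```

Real systems replace the toy lexicon and features with application-specific resources (e.g., a crawl of the software's menu structure) and far larger labeled corpora.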
In comparing approaches for extracting these data, Lau et al. [11] found that a keyword (template-based) strategy outperformed both a handcrafted grammar and a set of maximum entropy classifiers, achieving accuracies of 93%, 90%, and 65% for recovering actions, values, and targets, respectively. However, their techniques assume all text segments describe operations to perform, which the authors found to be true only ~50% of the time.

Lau et al.'s work was later expanded upon [14], with the goal of transforming professionally written test cases into automated software testing procedures. In this later work, support vector machines (SVMs) and conditional random fields (CRFs) were utilized to segment the text into steps and to extract actions, targets, and values from the resultant segments. The authors achieved F1-scores consistently over 0.95 for segmentation, and similar scores for each of the aforementioned entity types. Even with these high scores, errors and ambiguity accumulated when interpreting complete operations, and the resultant system correctly interpreted only 70% of the steps.

In contrast to the work described above, Brasser and Linden [2] strove to automatically extract detailed task models from written scenarios. The authors manually crafted a natural language grammar, which was implemented as a 25-state augmented transition network. Compared to more recent work employing machine learning, the hand-built grammar did not perform particularly well: tasks were segmented with 63% accuracy, and entities were detected with 48% accuracy.

In this vein, Branavan et al. [1] demonstrated the potential of reinforcement learning approaches for interpreting natural language instructions. Branavan et al.'s technique learns how to interpret instructions by repeatedly testing hypotheses within a virtual machine. This approach has the advantage of being able to interpret some high-level composite actions that lack mention of the specific low-level operations needed to perform those actions in the interface. The authors reported that their method was able to correctly interpret 62% of the high-level actions in their dataset.

Finally, procedural information can occur in online contexts beyond tutorials. Andrew Ko’s Frictionary

In presenting examples pulled from these tutorials, we use the notation (W)

“Just Ctrl + click that object and it will be selected.” (I)
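The keyword (template-based) strategy discussed in the Background above can be sketched with a few hand-written patterns that recover an (action, target, value) triple from a single instruction. Everything here is an invented illustration (the patterns, sentences, and output format are not from the cited systems); note that the extractor returns None for segments that describe no operation at all, echoing the observation that only about half of tutorial text is actionable.

```python
import re

# Invented operation templates. Real systems would need far broader
# coverage of verbs, widget types, and parameter phrasings.
PATTERNS = [
    # e.g., 'Click the "OK" button' -> action + quoted target, no value
    (re.compile(r'\b(click|press|select)\b.*?"([^"]+)"', re.IGNORECASE), False),
    # e.g., 'Set the opacity to 50' -> action + target + value
    (re.compile(r'\b(set)\b\s+the\s+(\w+)\s+to\s+(\S+)', re.IGNORECASE), True),
]

def extract(segment):
    """Return an {action, target, value} dict for the first matching
    template, or None when the segment describes no operation."""
    for pattern, has_value in PATTERNS:
        m = pattern.search(segment)
        if m:
            g = m.groups()
            return {"action": g[0].lower(),
                    "target": g[1],
                    "value": g[2] if has_value else None}
    return None

print(extract('Click the "OK" button'))
print(extract("Set the opacity to 50"))
print(extract("Unsharp mask increases contrast between edges."))  # None
```

The appeal of such templates is their precision on stereotyped phrasing; their weakness, as the comparison above suggests, is recall on background text, anti-patterns, and underspecified steps.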