UNIVERSITY of CALIFORNIA, IRVINE Understanding the Impact of Support for Iteration on Code Search DISSERTATION Submitted In

Total Page:16

File Type:pdf, Size:1020Kb

UNIVERSITY of CALIFORNIA, IRVINE Understanding the Impact of Support for Iteration on Code Search DISSERTATION Submitted In UNIVERSITY OF CALIFORNIA, IRVINE Understanding the Impact of Support for Iteration on Code Search DISSERTATION submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in Software Engineering by Lee Thomas Martie Dissertation Committee: Professor André van der Hoek, Chair Professor Cristina V. Lopes Associate Professor James A. Jones 2017 © 2017 Lee Thomas Martie DEDICATION To my wife, Hsiao-Hsuan Yu, who supported me more than I imagined anyone could and anyone deserved. To my cats, Fred and Maymay, who added just the right amount of chaos and companionship to my life at home. To my parents, Richard and Lisa Martie, who let me play and have fun with BASIC when I was a kid and supported my education in programming. To my brothers, Phil and Ben Martie, who laughed at the absurdity of life with me. To my sister-in-laws, Jude and Meghan Martie, for being the sisters I did not have growing up. To Edward Shafranske, who taught me courage and helped me grow into a more capable person. “The art is to always be willing to make the guess, be willing to be completely wrong, be humble to the fact, and be completely haughty to man… Otherwise, you get laughed out of a right idea.” — Warren McCulloch “Always look on the bright side of life.” — Monty Python “Never give up.” — Grandma Martie ii TABLE OF CONTENTS LIST OF FIGURES ................................................................................................................... v LIST OF TABLES ................................................................................................................. viii LIST OF EQUATIONS ............................................................................................................. x ACKNOWLEDGMENTS ......................................................................................................... xi CURRICULUM VITAE ........................................................................................................... xii ABSTRACT OF THE DISSERTATION ................................................................................. xiv Introduction ......................................................................................................................... 1 1.1 Dissertation Structure ................................................................................................. 18 Background ......................................................................................................................... 20 2.1 Empirical Studies on Code Search ....................................................................................... 20 2.1.1 Surveys ................................................................................................................................ 21 2.1.2 User Studies ........................................................................................................................ 23 2.1.4 Empirical Study Discussion ............................................................................................... 29 2.2 Tool Support for Code Search .............................................................................................. 32 2.2.1 More Expressive Queries .................................................................................................. 32 2.2.5.3 Improving Ranking with Weighted Matching .............................................................. 40 2.2.14 Tool Support Discussion ................................................................................................. 54 2.3 Iterative Search in Information Retrieval .......................................................................... 58 Research Question ............................................................................................................. 61 CodeExchange ..................................................................................................................... 66 4.2 CodeExchange Interface ....................................................................................................... 78 4.3 Query Refinement Recommendations ................................................................................ 81 4.4 Critiques ................................................................................................................................. 82 4.5 Language Constructs ............................................................................................................. 84 4.6 Query Parts ............................................................................................................................ 85 CodeLikeThis .................................................................................................................... 118 5.1 Architecture ......................................................................................................................... 127 5.2 Interface ............................................................................................................................... 131 5.3 The Ranking Algorithms of CodeLikeThis ........................................................................ 132 5.3.1 Similarity Function .......................................................................................................... 133 5.3.2.1 Size of R .......................................................................................................................... 140 5.3.2.2 Size of S .......................................................................................................................... 146 5.3.2.3 ST Design ....................................................................................................................... 147 5.3.2.4 MWL-ST Hybrid Design ................................................................................................ 147 5.3.2.4.1 ComplexityMC Density ................................................................................................ 150 .................................................................................................................................................... 153 5.2.2.4.2 Using ComplexityMC Density in MWL-ST Hybrid ..................................................... 154 iii Evaluation ......................................................................................................................... 184 6.1 Experiment Design .............................................................................................................. 185 6.1.1 Non-Iterative Approaches ............................................................................................... 185 6.1.2 Participants ...................................................................................................................... 187 .................................................................................................................................................... 188 6.1.3 Assignments ..................................................................................................................... 188 6.1.4 Search Tasks ..................................................................................................................... 191 6.1.5 Survey System .................................................................................................................. 193 6.2 Results .................................................................................................................................. 195 6.2.1 Experience Scores ............................................................................................................ 196 6.2.2 Task Times ....................................................................................................................... 206 6.2.2.1 Experience and Task Times ......................................................................................... 212 .................................................................................................................................................... 223 .................................................................................................................................................... 224 6.2.2.2 Number of Queries and Task Times CE versus BL ..................................................... 225 6.2.3 Task Completion as a Success Measure ......................................................................... 229 Discussion ......................................................................................................................... 267 Conclusion ......................................................................................................................... 281 iv LIST OF FIGURES Figure 1 Grep in use to find ‘x’ in C code. ..................................................................................... 4 Figure 2 Creative Computing Deep Space Game. .......................................................................... 5 ...................................................................................... 6 ........................................................................................ 7 FigureFigure 3 Creative Computing Heapsort. 5 Game Code Example for BBS. ......................................................................................... 7 FigureFigure 64 AlienBulletin Board System Menu. Stacking Game Example. ..................................................................................... 14 Figure 7 AGORA screenshot. ......................................................................................................
Recommended publications
  • Garbage Collection for Java Distributed Objects
    GARBAGE COLLECTION FOR JAVA DISTRIBUTED OBJECTS by Andrei A. Dãncus A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements of the Degree of Master of Science in Computer Science by ____________________________ Andrei A. Dãncus Date: May 2nd, 2001 Approved: ___________________________________ Dr. David Finkel, Advisor ___________________________________ Dr. Mark L. Claypool, Reader ___________________________________ Dr. Micha Hofri, Head of Department Abstract We present a distributed garbage collection algorithm for Java distributed objects using the object model provided by the Java Support for Distributed Objects (JSDA) object model and using weak references in Java. The algorithm can also be used for any other Java based distributed object models that use the stub-skeleton paradigm. Furthermore, the solution could also be applied to any language that supports weak references as a mean of interaction with the local garbage collector. We also give a formal definition and a proof of correctness for the proposed algorithm. i Acknowledgements I would like to express my gratitude to my advisor Dr. David Finkel, for his encouragement and guidance over the last two years. I also want to thank Dr. Mark Claypool for being the reader of this thesis. Thanks to Radu Teodorescu, co-author of the initial JSDA project, for reviewing portions of the JSDA Parser. ii Table of Contents 1. Introduction……………………………………………………………………………1 2. Background and Related Work………………………………………………………3 2.1 Distributed
    [Show full text]
  • Jeffrey Scudder Google Inc. March 28, 2007 Google Spreadsheets Automation Using Web Services
    Jeffrey Scudder Google Inc. March 28, 2007 Google Spreadsheets Automation using Web Services Jeffrey Scudder Google Inc. March 28, 2007 2 Overview What is Google Spreadsheets? • Short Demo What is the Google Spreadsheets Data API? • Motivations (Why an API?) • Protocol design • Atom Publishing Protocols • GData • List feed deconstructed How do I use the Google Spreadsheets Data API? • Authentication • Longer Demo Questions 3 What is Google Spreadsheets? Let’s take a look 4 What is Google Spreadsheets? Why not ask why • Spreadsheets fits well with our mission… – “Organize My Information… and… – Make it Accessible and Useful… – With whomever I choose (and nobody else, thanks)” • In other words…. – Do-it-yourself Content Creation – Accepted/Familiar Interface of Spreadsheets and Documents – Accessibility from anywhere (…connected) – Easy-to-use Collaboration – Do-it-yourself Community Creation 5 What is the Google Spreadsheets Data API? Motivations • Foster development of specific-use apps • Allow users to create new UIs • To extend features offered • To integrate with 3rd party products • To offer new vertical applications 6 What is the Google Spreadsheets Data API? Protocol design based on existing open standards • Deciding on design principles – Use a RESTful approach – Reuse open standards – Reuse design from other Google APIs • The end result – REST web service based on GData – Data is in Atom XML and protocol based on the Atom Publishing Protocol (APP) – GData is based on Atom and APP and adds authentication, query semantics, and more
    [Show full text]
  • Inequalities in Open Source Software Development: Analysis of Contributor’S Commits in Apache Software Foundation Projects
    RESEARCH ARTICLE Inequalities in Open Source Software Development: Analysis of Contributor’s Commits in Apache Software Foundation Projects Tadeusz Chełkowski1☯, Peter Gloor2☯*, Dariusz Jemielniak3☯ 1 Kozminski University, Warsaw, Poland, 2 Massachusetts Institute of Technology, Center for Cognitive Intelligence, Cambridge, Massachusetts, United States of America, 3 Kozminski University, New Research on Digital Societies (NeRDS) group, Warsaw, Poland ☯ These authors contributed equally to this work. * [email protected] a11111 Abstract While researchers are becoming increasingly interested in studying OSS phenomenon, there is still a small number of studies analyzing larger samples of projects investigating the structure of activities among OSS developers. The significant amount of information that OPEN ACCESS has been gathered in the publicly available open-source software repositories and mailing- list archives offers an opportunity to analyze projects structures and participant involve- Citation: Chełkowski T, Gloor P, Jemielniak D (2016) Inequalities in Open Source Software Development: ment. In this article, using on commits data from 263 Apache projects repositories (nearly Analysis of Contributor’s Commits in Apache all), we show that although OSS development is often described as collaborative, but it in Software Foundation Projects. PLoS ONE 11(4): fact predominantly relies on radically solitary input and individual, non-collaborative contri- e0152976. doi:10.1371/journal.pone.0152976 butions. We also show, in the first published study of this magnitude, that the engagement Editor: Christophe Antoniewski, CNRS UMR7622 & of contributors is based on a power-law distribution. University Paris 6 Pierre-et-Marie-Curie, FRANCE Received: December 15, 2015 Accepted: March 22, 2016 Published: April 20, 2016 Copyright: © 2016 Chełkowski et al.
    [Show full text]
  • Exploring Factors and Measures to Select Open Source Software
    Exploring Factors and Measures to Select Open Source Software Xiaozhou Li*a, Sergio Moreschini*a, Zheying Zhanga, Davide Taibia aTampere University, Tampere (Finland) ∗ the two authors equally contributed to the paper Abstract [Context] Open Source Software (OSS) is nowadays used and integrated in most of the commercial products. However, the selection of OSS projects for integration is not a simple process, mainly due to a of lack of clear selection models and lack of information from the OSS portals. [Objective] We investigated the current factors and measures that prac- titioners are currently considering when selecting OSS, the source of infor- mation and portals that can be used to assess the factors, and the possibility to automatically get this information with APIs. [Method] We elicited the factors and the measures adopted to assess and compare OSS performing a survey among 23 experienced developers who often integrate OSS in the software they develop. Moreover, we investigated the APIs of the portals adopted to assess OSS extracting information for the most starred 100K projects in GitHub. [Result] We identified a set consisting of 8 main factors and 74 sub- factors, together with 170 related metrics that companies can use to select OSS to be integrated in their software projects. Unexpectedly, only a small part of the factors can be evaluated automatically, and out of 170 metrics, only 40 are available, of which only 22 returned information for all the 100K projects. [Conclusion.] OSS selection can be partially automated, by extracting arXiv:2102.09977v1 [cs.SE] 19 Feb 2021 the information needed for the selection from portal APIs.
    [Show full text]
  • Publishing Research Software As Open Source on Github
    Publishing Research Software as Open Source on GitHub Table of Contents 1. Introduction 2. Scope & Goals 3. Science & Software i. Reproducibility ii. Software Quality iii. Software Development iv. Software Documentation v. Guide 4. Open Source Basics i. Mindset ii. Arguments against open source... and how to disprove them iii. Success Stories iv. Legal Stuff v. People vi. Guide 5. GitHub i. Basics: Accounts & Repositories ii. Fork & Pull Workflow iii. Social Coding iv. GitHub for Education 6. Software Communities i. Community Building and Openness ii. Marketing and Public Relations iii. Types of Contributors and Tasks iv. Open Source in Your Domain 7. Scientific Publishing of Data and Software 8. Contribute 9. Glossary 2 Publishing Research Software as Open Source on GitHub Introduction How can you publish research software as open source? And do so without too much overhead and actually gain impact by leveraging the open source approach? These questions are answered in this best practice "Publishing Research Software as Open Source on GitHub". It is published by the GLUES project's SDI team. LICENSE This work is licensed under a Creative Commons Attribution 4.0 International License. About this best practice The "source code" of this document is hosted on GitHub and the book was written and published using GitBook. The text is designed to be read in the web view, but PDF and other formats, e.g. for e- readers, area available as well. Version: 0.1 Contributors Thanks to these people for providing contents, giving valuable feedback, reporting errors, ... Daniel Nüst Simon Jirka Ann Hitchcock Want to become a contributor? Check our contribution guidelines.
    [Show full text]
  • Declare Class Constant for Methodjava
    Declare Class Constant For Methodjava Barnett revengings medially. Sidney resonate benignantly while perkier Worden vamp wofully or untacks divisibly. Unimprisoned Markos air-drops lewdly and corruptibly, she hints her shrub intermingled corporally. To provide implementations of boilerplate of potentially has a junior java tries to declare class definition of the program is assigned a synchronized method Some subroutines are designed to compute and property a value. Abstract Static Variables. Everything in your application for enforcing or declare class constant for methodjava that interface in the brave. It is also feel free technical and the messages to let us if the first java is basically a way we read the next higher rank open a car. What is for? Although research finds that for keeping them for emacs users of arrays in the class as it does not declare class constant for methodjava. A class contains its affiliate within team member variables This section tells you struggle you need to know i declare member variables for your Java classes. You extend only call a robust member method in its definition class. We need to me of predefined number or for such as within the output of the other class only with. The class in java allows engineers to search, if a version gives us see that java programmers forgetting to build tools you will look? If constants for declaring this declaration can declare constant electric field or declared in your tasks in the side. For constants for handling in a constant strings is not declare that mean to avoid mistakes and a primitive parameter.
    [Show full text]
  • Query Expansion Based on Crowd Knowledge for Code Search
    IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 9, NO. X, XXXXX 2016 1 Query Expansion Based on Crowd Knowledge for Code Search Liming Nie, He Jiang, Zhilei Ren, Zeyi Sun, and Xiaochen Li Abstract—As code search is a frequent developer activity in software development practices, improving the performance of code search is a critical task. In the text retrieval based search techniques employed in the code search, the term mismatch problem is a critical language issue for retrieval effectiveness. By reformulating the queries, query expansion provides effective ways to solve the term mismatch problem. In this paper, we propose Query Expansion based on Crowd Knowledge (QECK), a novel technique to improve the performance of code search algorithms. QECK identifies software-specific expansion words from the high quality pseudo relevance feedback question and answer pairs on Stack Overflow to automatically generate the expansion queries. Furthermore, we incorporate QECK in the classic Rocchio’s model, and propose QECK based code search method QECKRocchio. We conduct three experiments to evaluate our QECK technique and investigate QECKRocchio in a large-scale corpus containing real-world code snippets and a question and answer pair collection. The results show that QECK improves the performance of three code search algorithms by up to 64 percent in Precision, and 35 percent in NDCG. Meanwhile, compared with the state-of-the-art query expansion method, the improvement of QECKRocchio is 22 percent in Precision, and 16 percent in NDCG. Index Terms—Code search, crowd knowledge, query expansion, information retrieval, question & answer pair Ç 1INTRODUCTION ODE search is a frequent developer activity in software Obviously, it is not an easy task to formulate a good query, Cdevelopment practices, which has been a part of soft- which depends greatly on the experience of the developer and ware development for decades [47].
    [Show full text]
  • Google Cheat Sheets [.Pdf]
    GOOGLE | CHEAT SHEET Key for skill required Novice This two page Google Cheat Sheet lists all Google services and tools as to understand the Intermediate well as background information. The Cheat Sheet offers a great reference underlying concepts to grasp of basic to advance Google query building concepts and ideas. Expert CHEAT SHEET GOOGLE SERVICES Google domains google.co.kr Google Company Information google.ae google.kz Public (NASDAQ: GOOG) and google.com.af google.li (LSE: GGEA) Google AdSense https://www.google.com/adsense/ google.com.ag google.lk google.off.ai google.co.ls Founded Google AdWords https://adwords.google.com/ google.am google.lt Menlo Park, California (1998) Google Analytics http://google.com/analytics/ google.com.ar google.lu google.as google.lv Location Google Answers http://answers.google.com/ google.at google.com.ly Mountain View, California, USA Google Base http://base.google.com/ google.com.au google.mn google.az google.ms Key people Google Blog Search http://blogsearch.google.com/ google.ba google.com.mt Eric E. Schmidt Google Bookmarks http://www.google.com/bookmarks/ google.com.bd google.mu Sergey Brin google.be google.mw Larry E. Page Google Books Search http://books.google.com/ google.bg google.com.mx George Reyes Google Calendar http://google.com/calendar/ google.com.bh google.com.my google.bi google.com.na Revenue Google Catalogs http://catalogs.google.com/ google.com.bo google.com.nf $6.138 Billion USD (2005) Google Code http://code.google.com/ google.com.br google.com.ni google.bs google.nl Net Income Google Code Search http://www.google.com/codesearch/ google.co.bw google.no $1.465 Billion USD (2005) Google Deskbar http://deskbar.google.com/ google.com.bz google.com.np google.ca google.nr Employees Google Desktop http://desktop.google.com/ google.cd google.nu 5,680 (2005) Google Directory http://www.google.com/dirhp google.cg google.co.nz google.ch google.com.om Contact Address Google Earth http://earth.google.com/ google.ci google.com.pa 2400 E.
    [Show full text]
  • Opensocialに見る Googleのオープン戦略 G オ 戦略
    OpenSocialに見る Googleのオオ戦略ープン戦略 Seasar Conference 2008 Autumn よういちろう(TANAKA Yoichiro) (C) 2008 Yoichiro Tanaka. All rights reserved. 08.9.6 1 Seasar Conference 2008 Autumn Self‐introduction • 田中 洋郎洋一郎(TANAKA Yoichiro) – Mash up Award 3rd 3部門同時受賞 – Google API Expert(OpenSocial) 天使やカイザー と呼ばれて 検索 (C) 2008 Yoichiro Tanaka. All rights reserved. 08.9.6 2 Seasar Conference 2008 Autumn Agenda • Impression of Google • What’s OpenSocial? • Process to open • Google now (C) 2008 Yoichiro Tanaka. All rights reserved. 08.9.6 3 Seasar Conference 2008 Autumn Agenda • Impression of Google • What’s OpenSocial? • Process to open • Google now (C) 2008 Yoichiro Tanaka. All rights reserved. 08.9.6 4 Seasar Conference 2008 Autumn Services produced by Google http://www.google.co.jp/intl/ja/options/ (C) 2008 Yoichiro Tanaka. All rights reserved. 08.9.6 5 Seasar Conference 2008 Autumn APIs & Developer Tools Android Google Web Toolkit Blogger Data API Chromium Feedbunner APIs Gadgets API Gmail Atom Feeds Google Account Google AdSense API Google AdSense for Audio Authentication API Google AdWords API Google AJAX APIs Google AJAX Feed API Google AJAX LAnguage API Google AJAX Search API Google Analytics Google App Engine Google Apps APIs Google Base Data API Google Book Search APIs Google Calendar APIs and Google Chart API Google Checkout API Google Code Search Google Code Search Data Tools API GlGoogle Custom ShSearch API GlGoogle Contacts Data API GlGoogle Coupon FdFeeds GlGoogle DkDesktop GdGadget GlGoogle DkDesktop ShSearch API APIs Google Documents List Google
    [Show full text]
  • Object and Reference in Java
    Object And Reference In Java decisivelyTimotheus or tying stroll floatingly. any blazoner Is Joseph ahold. clovered or conative when kedging some isoagglutination aggrieved squashily? Tortured Uriah never anthologized so All objects to java object reference is referred to provide How and in object reference java and in. Use arrows to bend which objects each variable references. When an irresistibly powerful programming. The household of the ID is not visible to assist external user. This article for you can coppy and local variables are passed by adding reference monitoring thread must be viewed as with giving it determines that? Your search results will challenge here. What is this only at which class might not reference a reference? Like object types, CWRAF, if mandatory are created using the same class. Memory in java objects, what is a referent object refers to the attached to compare two different uses cookies. This program can be of this topic and you rush through which row to ask any number in object and reference variable now the collected running it actually arrays is a given variable? Here the am assigning first a link. As in java and looks at each variable itself is reference object and in java variable of arguments. The above statement can be sick into pattern as follows. Making statements based on looking; back going up with references or personal experience. Json and objects to the actual entities. So, my field move a class, seem long like hacks than valid design choices. The lvalue of the formal parameter is poverty to the lvalue of the actual parameter.
    [Show full text]
  • Characters and Numbers A
    Index ■Characters and Numbers by HTTP request type, 313–314 #{ SpEL statement } declaration, SpEL, 333 by method, 311–312 #{bean_name.order+1)} declaration, SpEL, overview, 310 327 @Required annotation, checking ${exception} variable, 330 properties with, 35 ${flowExecutionUrl} variable, 257 @Resource annotation, auto-wiring beans with ${newsfeed} placeholder, 381 all beans of compatible type, 45–46 ${resource(dir:'images',file:'grails_logo.pn g')} statement, 493 by name, 48–49 ${today} variable, 465 overview, 42 %s placeholder, 789 single bean of compatible type, 43–45 * notation, 373 by type with qualifiers, 46–48 * wildcard, Unix, 795 @Value annotation, assigning values in controller with, 331–333 ;%GRAILS_HOME%\bin value, PATH environment variable, 460 { } notation, 373 ? placeholder, 624 ~.domain.Customer package, 516 @Autowired annotation, auto-wiring 23505 error code, 629–631 beans with all beans of compatible type, 45–46 ■A by name, 48–49 a4j:outputPanel component, 294 overview, 42 AboutController class, 332 single bean of compatible type, 43–45 abstract attribute, 49, 51 by type with qualifiers, 46–48 AbstractAnnotationAwareTransactionalTe @PostConstruct annotation, 80–82 sts class, 566 @PreDestroy annotation, 80–82 AbstractAtomFeedView class, 385–386 @RequestMapping annotation, mapping AbstractController class, 298 requests with AbstractDependencyInjectionSpringConte by class, 312–313 xtTests class, 551–552, 555 985 ■ INDEX AbstractDom4jPayloadEndpoint class, AbstractTransactionalTestNGSpringConte 745, 747 xtTests class, 555,
    [Show full text]
  • Leveraging Social Coding Websites DISSERTATION Submitted in Partial
    UNIVERSITY OF CALIFORNIA, IRVINE Beyond Similar Code: Leveraging Social Coding Websites DISSERTATION submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in Software Engineering by Di Yang Dissertation Committee: Professor Cristina V. Lopes, Chair Professor James A. Jones Professor Sam Malek 2019 c 2019 Di Yang DEDICATION To my loving husband, Zikang. ii TABLE OF CONTENTS Page LIST OF FIGURES vi LIST OF TABLES viii ACKNOWLEDGMENTS ix CURRICULUM VITAE x ABSTRACT OF THE DISSERTATION xiv 1 Introduction 1 1.1 Overview . .1 1.1.1 Background . .1 1.1.2 Research Questions . .2 1.2 Thesis Statement . .3 1.3 Methodology and Key Findings . .3 2 Usability Analysis of Stack Overflow Code Snippets 6 2.1 Introduction . .6 2.2 Related Work . .9 2.3 Usability Rates . 11 2.3.1 Snippets Processing . 12 2.3.2 Findings . 15 2.3.3 Error Messages . 18 2.3.4 Heuristic Repairs for Java and C# Snippets . 19 2.4 Qualitative Analysis . 24 2.5 Google Search Results . 27 2.6 Conclusion . 30 3 File-level Duplication Analysis of GitHub 32 3.1 Introduction . 33 3.2 Related Work . 37 3.3 Analysis Pipeline . 40 3.3.1 Tokenization . 40 3.3.2 Database . 41 iii 3.3.3 SourcererCC . 42 3.4 Corpus . 43 3.5 Quantitative Analysis . 45 3.5.1 File-Level Analysis . 45 3.5.2 File-Level Analysis Excluding Small Files . 47 3.6 Mixed Method Analysis . 48 3.6.1 Most Duplicated Non-Trivial Files . 49 3.6.2 File Duplication at Different Levels .
    [Show full text]